Ruslan Dautkhanov created HIVE-14989:
----------------------------------------

             Summary: FIELDS TERMINATED BY parsing broken when delimiter is more than 1 byte
                 Key: HIVE-14989
                 URL: https://issues.apache.org/jira/browse/HIVE-14989
             Project: Hive
          Issue Type: Bug
          Components: File Formats, Parser, Reader
    Affects Versions: 0.13.1, 0.13.0
            Reporter: Ruslan Dautkhanov

FIELDS TERMINATED BY parsing is broken when the delimiter is more than 1 byte. Starting from its 2nd character, the delimiter becomes part of the returned data instead of being parsed as a separator.

Test case:
{noformat}
CREATE external TABLE test_muldelim
( string1 STRING,
  string2 STRING,
  string3 STRING
)
 ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '<>'
  LINES TERMINATED BY '\n'
 STORED AS TEXTFILE
  location '/user/hive/test_muldelim'
{noformat}

Create a text file under /user/hive/test_muldelim with the following two lines:
{noformat}
data1<>data2<>data3
aa<>bb<>cc
{noformat}

Notice that the two-character delimiter isn't parsed properly:
{noformat}
jdbc:hive2://host.domain.com:1> select * from ruslan_test.test_muldelim ;
+------------------------+------------------------+------------------------+--+
| test_muldelim.string1  | test_muldelim.string2  | test_muldelim.string3  |
+------------------------+------------------------+------------------------+--+
| data1                  | >data2                 | >data3                 |
| aa                     | >bb                    | >cc                    |
+------------------------+------------------------+------------------------+--+
2 rows selected (0.453 seconds)
{noformat}

The delimiter's second character ('>') became part of the columns to its right (`string2` and `string3`).
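To make the symptom concrete, here is a small Java sketch that reproduces the rows above under one assumption: that the reader uses only the first character of the FIELDS TERMINATED BY string ('<') as the field separator. This is a simplified model consistent with the observed output, not Hive's actual SerDe code; the class `DelimiterSplitDemo` and its methods are hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DelimiterSplitDemo {

    // Hypothetical model of the reported behavior (an assumption, not Hive code):
    // only the first character of the declared delimiter separates fields,
    // so the remaining delimiter characters leak into the next field.
    static String[] splitFirstCharOnly(String line, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        char sep = delimiter.charAt(0);
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == sep) {
                fields.add(line.substring(start, i));
                start = i + 1; // skips just one character of the delimiter
            }
        }
        fields.add(line.substring(start));
        return fields.toArray(new String[0]);
    }

    // Expected behavior: the whole delimiter string separates fields.
    static String[] splitFullDelimiter(String line, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> fields = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = line.indexOf(delimiter, start)) >= 0) {
            fields.add(line.substring(start, idx));
            start = idx + delimiter.length(); // skips the entire delimiter
        }
        fields.add(line.substring(start));
        return fields.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] lines = {"data1<>data2<>data3", "aa<>bb<>cc"};
        for (String line : lines) {
            // first line prints [data1, >data2, >data3], then [data1, data2, data3]
            System.out.println(Arrays.toString(splitFirstCharOnly(line, "<>")));
            System.out.println(Arrays.toString(splitFullDelimiter(line, "<>")));
        }
    }
}
```

The first-character split yields exactly the `>data2` / `>data3` and `>bb` / `>cc` values seen in the query result, while the full-delimiter split yields the three clean columns the DDL asks for.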

Table DDL:
{noformat}
0: jdbc:hive2://host.domain.com:1> show create table dafault.test_muldelim ;
+-----------------------------------------------------------------+--+
| createtab_stmt                                                  |
+-----------------------------------------------------------------+--+
| CREATE EXTERNAL TABLE `default.test_muldelim`(                  |
|   `string1` string,                                             |
|   `string2` string,                                             |
|   `string3` string)                                             |
| ROW FORMAT DELIMITED                                            |
|   FIELDS TERMINATED BY '<>'                                     |
|   LINES TERMINATED BY '\n'                                      |
| STORED AS INPUTFORMAT                                           |
|   'org.apache.hadoop.mapred.TextInputFormat'                    |
| OUTPUTFORMAT                                                    |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'  |
| LOCATION                                                        |
|   'hdfs://epsdatalake/user/hive/test_muldelim'                  |
| TBLPROPERTIES (                                                 |
|   'transient_lastDdlTime'='1476727100')                         |
+-----------------------------------------------------------------+--+
15 rows selected (0.286 seconds)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)