[ https://issues.apache.org/jira/browse/HIVE-14989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15584420#comment-15584420 ]
Niklaus Xiao commented on HIVE-14989:
-------------------------------------

You should use {{MultiDelimitSerDe}} in this case.

> FIELDS TERMINATED BY parsing broken when delimiter is more than 1 byte
> ----------------------------------------------------------------------
>
>                 Key: HIVE-14989
>                 URL: https://issues.apache.org/jira/browse/HIVE-14989
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats, Parser, Reader
>    Affects Versions: 0.13.0, 0.13.1
>            Reporter: Ruslan Dautkhanov
>
> FIELDS TERMINATED BY parsing is broken when the delimiter is more than 1 byte.
> The delimiter's characters from the 2nd onward become part of the returned
> data instead of being parsed as part of the delimiter.
> Test case:
> {noformat}
> CREATE external TABLE test_muldelim
> ( string1 STRING,
>   string2 STRING,
>   string3 STRING
> )
> ROW FORMAT
>   DELIMITED FIELDS TERMINATED BY '<>'
>   LINES TERMINATED BY '\n'
> STORED AS TEXTFILE
> location '/user/hive/test_muldelim'
> {noformat}
> Create a text file under /user/hive/test_muldelim with the following 2 lines:
> {noformat}
> data1<>data2<>data3
> aa<>bb<>cc
> {noformat}
> Now notice that the two-character delimiter wasn't parsed properly:
> {noformat}
> jdbc:hive2://host.domain.com:1> select * from ruslan_test.test_muldelim ;
> +------------------------+------------------------+------------------------+--+
> | test_muldelim.string1  | test_muldelim.string2  | test_muldelim.string3  |
> +------------------------+------------------------+------------------------+--+
> | data1                  | >data2                 | >data3                 |
> | aa                     | >bb                    | >cc                    |
> +------------------------+------------------------+------------------------+--+
> 2 rows selected (0.453 seconds)
> {noformat}
> The delimiter's second character ('>') became part of the columns to the
> right (`string2` and `string3`).
> Table DDL:
> {noformat}
> 0: jdbc:hive2://host.domain.com:1> show create table dafault.test_muldelim ;
> +-----------------------------------------------------------------+--+
> | createtab_stmt                                                  |
> +-----------------------------------------------------------------+--+
> | CREATE EXTERNAL TABLE `default.test_muldelim`(                  |
> |   `string1` string,                                             |
> |   `string2` string,                                             |
> |   `string3` string)                                             |
> | ROW FORMAT DELIMITED                                            |
> |   FIELDS TERMINATED BY '<>'                                     |
> |   LINES TERMINATED BY '\n'                                      |
> | STORED AS INPUTFORMAT                                           |
> |   'org.apache.hadoop.mapred.TextInputFormat'                    |
> | OUTPUTFORMAT                                                    |
> |   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'  |
> | LOCATION                                                        |
> |   'hdfs://epsdatalake/user/hive/test_muldelim'                  |
> | TBLPROPERTIES (                                                 |
> |   'transient_lastDdlTime'='1476727100')                         |
> +-----------------------------------------------------------------+--+
> 15 rows selected (0.286 seconds)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
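
For reference, the workaround suggested in the comment above could look roughly like the following. This is a minimal sketch, not a tested fix: it assumes a Hive release that ships org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe in the hive-contrib jar (which may need to be added to the session classpath), and it reuses the table name, columns, and location from the test case in the issue.

```sql
-- Sketch only: assumes the hive-contrib jar that provides MultiDelimitSerDe
-- is available to the session; this is not confirmed by the issue itself.
CREATE EXTERNAL TABLE test_muldelim
( string1 STRING,
  string2 STRING,
  string3 STRING
)
-- Replace "ROW FORMAT DELIMITED FIELDS TERMINATED BY '<>'" with a SerDe
-- that accepts a multi-character field delimiter.
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="<>")
STORED AS TEXTFILE
LOCATION '/user/hive/test_muldelim';
```

With a definition along these lines the whole `<>` sequence should be treated as the field separator, so the first row would be expected to come back as data1, data2, data3 rather than data1, >data2, >data3.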