[ https://issues.apache.org/jira/browse/HIVE-18265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hui Huang updated HIVE-18265:
-----------------------------
    Attachment: HIVE-18265.2.patch

> desc formatted/extended or show create table can not fully display the result when field or table comment contains tab character
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18265
>                 URL: https://issues.apache.org/jira/browse/HIVE-18265
>             Project: Hive
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 1.2.1, 3.1.0
>            Reporter: Hui Huang
>            Assignee: Hui Huang
>            Priority: Major
>             Fix For: 3.1.0
>
>         Attachments: HIVE-18265.1.patch, HIVE-18265.2.patch, HIVE-18265.patch
>
>
> Here is an example:
> create table test_comment (id1 string comment 'full_\tname1', id2 string
> comment 'full_\tname2', id3 string comment 'full_\tname3') stored as textfile;
> When executing `show create table test_comment`, the console shows the
> following content:
> {quote}
> createtab_stmt
> CREATE TABLE `test_comment`(
>   `id1` string COMMENT 'full_
>   `id2` string COMMENT 'full_
>   `id3` string COMMENT 'full_
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'hdfs://xxx/user/huanghui/warehouse/huanghuitest.db/test_comment'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1513095570')
> {quote}
> The output of `desc formatted table` is similar:
> {quote}
> col_name data_type comment
> \# col_name data_type comment
> id1 string full_
> id2 string full_
> id3 string full_
> \# Detailed Table Information
> (ignore)...
> {quote}
> When executing `desc extended test_comment`, the problem is more obvious:
> {quote}
> col_name data_type comment
> id1 string full_
> id2 string full_
> id3 string full_
> Detailed Table Information Table(tableName:test_comment,
> dbName:huanghuitest, owner:huanghui, createTime:1513095570, lastAccessTime:0,
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id1, type:string,
> comment:full_ name1), FieldSchema(name:id2, type:string, comment:full_
> {quote}
> *the rest of the content is lost*.
> The content is not really lost; it just cannot be displayed correctly, because
> Hive stores the result in LazyStruct, and LazyStruct uses '\t' as the field
> separator:
> {code:java}
> // LazyStruct.java#parse()
> // Go through all bytes in the byte[]
> while (fieldByteEnd <= structByteEnd) {
>   if (fieldByteEnd == structByteEnd || bytes[fieldByteEnd] == separator) {
>     // Reached the end of a field?
>     if (lastColumnTakesRest && fieldId == fields.length - 1) {
>       fieldByteEnd = structByteEnd;
>     }
>     startPosition[fieldId] = fieldByteBegin;
>     fieldId++;
>     if (fieldId == fields.length || fieldByteEnd == structByteEnd) {
>       // All fields have been parsed, or bytes have been parsed.
>       // We need to set the startPosition of fields.length to ensure we
>       // can use the same formula to calculate the length of each field.
>       // For missing fields, their starting positions will all be the same,
>       // which will make their lengths to be -1 and uncheckedGetField will
>       // return these fields as NULLs.
>       for (int i = fieldId; i <= fields.length; i++) {
>         startPosition[i] = fieldByteEnd + 1;
>       }
>       break;
>     }
>     fieldByteBegin = fieldByteEnd + 1;
>     fieldByteEnd++;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
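
To illustrate the truncation described in the report, here is a minimal, self-contained sketch of a fixed-width row parser that splits on '\t'. It is a simplified model and not Hive's LazyStruct: the class `TabSeparatedRowDemo` and the methods `parseRow` and `escapeTabs` are hypothetical names introduced only for this example. The escaping helper shows one possible mitigation and is not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of tab-separated row parsing; NOT Hive's LazyStruct.
final class TabSeparatedRowDemo {

    // Splits a row on '\t' and keeps at most fieldCount fields. Anything after
    // the last expected field is dropped, which is how a tab inside a comment
    // ends up hiding the rest of the row.
    static List<String> parseRow(String row, int fieldCount) {
        List<String> fields = new ArrayList<>();
        int begin = 0;
        for (int end = 0; end <= row.length() && fields.size() < fieldCount; end++) {
            if (end == row.length() || row.charAt(end) == '\t') {
                fields.add(row.substring(begin, end));
                begin = end + 1;
            }
        }
        return fields;
    }

    // Hypothetical mitigation (an assumption, not the actual fix): replace each
    // real tab with the two characters '\' and 't' before building the row.
    static String escapeTabs(String value) {
        return value.replace("\t", "\\t");
    }

    public static void main(String[] args) {
        String comment = "full_\tname1"; // comment containing a real tab

        // Row with the expected columns: col_name, data_type, comment.
        String row = String.join("\t", "id1", "string", comment);
        System.out.println(parseRow(row, 3));        // comment cut at the tab

        String escapedRow = String.join("\t", "id1", "string", escapeTabs(comment));
        System.out.println(parseRow(escapedRow, 3)); // comment kept whole
    }
}
```

With three expected columns, the unescaped row yields the comment field `full_` only, matching the `desc formatted` output quoted in the report, while the escaped row keeps the full comment text in the third field.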