-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/#review62744
-----------------------------------------------------------

data/files/parquet_types.txt
<https://reviews.apache.org/r/28147/#comment104827>

    I think this is a bit confusing, since the 0b prefix gives the impression that the data is read in binary format, whereas it is actually read as a string.

    I think we can either write (preferably non-ASCII) binary data instead (for example, see data/files/string.txt), or alternatively write it legibly in hex, like 68656c6c6f ("hello"), and convert it to binary using unhex() in the INSERT OVERWRITE query.

    What do you think?


ql/src/test/queries/clientpositive/parquet_types.q
<https://reviews.apache.org/r/28147/#comment104828>

    If we write hex format (like 68656c6c6f) in parquet_types.q, we can just use unhex() to convert it to binary:

    INSERT OVERWRITE TABLE parquet_types
    SELECT cint, ctinyint, csmallint, cfloat, cdouble, cstring1, t, cchar, cvarchar,
    unhex(cbinary), m1, l1, st1 FROM parquet_types_staging;


ql/src/test/queries/clientpositive/parquet_types.q
<https://reviews.apache.org/r/28147/#comment104830>

    Instead of "select * from parquet_types": since the cbinary column may contain unprintable characters, you can pass it through hex() to make it legible:

    SELECT cint, ctinyint, csmallint, cfloat, cdouble, cstring1, t, cchar, cvarchar,
    hex(cbinary), m1, l1, st1 FROM parquet_types;


ql/src/test/queries/clientpositive/parquet_types.q
<https://reviews.apache.org/r/28147/#comment104829>

    No need to unhex here. It can just be:

    SELECT cchar, LENGTH(cchar), cvarchar, LENGTH(cvarchar), cbinary FROM parquet_types

    Or you can pass it through hex() if the original data has unprintable characters:

    SELECT cchar, LENGTH(cchar), cvarchar, LENGTH(cvarchar), hex(cbinary) FROM parquet_types


- Mohit Sabharwal


On Nov. 21, 2014, 8:53 a.m., cheng xu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28147/
> -----------------------------------------------------------
>
> (Updated Nov. 21, 2014, 8:53 a.m.)
>
>
> Review request for hive.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> This patch includes:
> 1. binary support for ParquetHiveSerde
> 2. related test cases both in unit and ql test
>
>
> Diffs
> -----
>
>   data/files/parquet_types.txt d342062
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864
>   ql/src/test/queries/clientpositive/parquet_types.q 22585c3
>   ql/src/test/results/clientpositive/parquet_types.q.out 275897c
>
> Diff: https://reviews.apache.org/r/28147/diff/
>
>
> Testing
> -------
>
> related UT and QL tests passed
>
>
> Thanks,
>
> cheng xu
>
>
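
For reference, the hex round trip suggested in the review comments can be sketched outside Hive. This is only an illustration in Python, not the Hive UDFs themselves: the helper names unhex_text and hex_bytes are hypothetical, and Hive's hex() may use a different letter case in its output than the lowercase form shown here.

```python
# Illustrative sketch only: mimics the idea behind Hive's unhex()/hex()
# using the Python standard library. Helper names are hypothetical.

def unhex_text(hex_text: str) -> bytes:
    """Decode a hex string (as it would appear in the staging file) to raw bytes."""
    return bytes.fromhex(hex_text)


def hex_bytes(raw: bytes) -> str:
    """Encode raw bytes as lowercase hex so unprintable data stays legible."""
    return raw.hex()


if __name__ == "__main__":
    staged = "68656c6c6f"           # the example value from the review comment
    raw = unhex_text(staged)        # what the binary column would hold
    print(raw)
    print(hex_bytes(raw))           # legible form for the .q.out file
```

The point of the round trip is that the staging file and the expected-output file both stay plain text, while the Parquet table itself still stores real binary data.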