[ https://issues.apache.org/jira/browse/IMPALA-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong updated IMPALA-886:
---------------------------------
    Labels: catalog-server hbase usability  (was: catalog-server usability)

> Always display HBase cols in same order as CREATE TABLE statement
> -----------------------------------------------------------------
>
>                 Key: IMPALA-886
>                 URL: https://issues.apache.org/jira/browse/IMPALA-886
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 1.3
>            Reporter: John Russell
>            Priority: Minor
>              Labels: catalog-server, hbase, usability
>
> I noticed a discrepancy with Hive, in how Impala handles column order for
> HBase tables.
> I think it would be preferable to use the same behavior as Hive, otherwise
> life becomes more complicated for anyone doing INSERT or SELECT * with an
> HBase table through Impala.
> (And I have to add caveats and usage notes in the docs.)
> Repro:
> In HBase shell, create a table with a single column family. I think most
> Impala tests use 1 column family per column, where you won't notice this
> behavior.
> hbase(main):008:0> create 'sample_data_fast','cols'
> 0 row(s) in 71.8750 seconds
> In Hive shell, create a mapping table. Notice how DESCRIBE repeats back the
> columns in the same order as in CREATE TABLE.
> hive> create external table sample_data_fast (id string, val int, zfill string, name string, assertion boolean)
>     > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     > WITH SERDEPROPERTIES (
>     > "hbase.columns.mapping" =
>     > ":key,cols:val,cols:zfill,cols:name,cols:assertion")
>     > TBLPROPERTIES("hbase.table.name" = "sample_data_fast")
>     > ;
> OK
> Time taken: 1.7 seconds
> hive> desc sample_data_fast;
> OK
> id          string     from deserializer
> val         int        from deserializer
> zfill       string     from deserializer
> name        string     from deserializer
> assertion   boolean    from deserializer
> Time taken: 0.302 seconds
> Now try the same DESCRIBE in impala-shell. The key column (id) is listed
> first. Then all the other columns, part of the same column family, are listed
> in alphabetical order rather than the order from CREATE TABLE:
> [localhost:21000] > desc sample_data_fast;
> Query: describe sample_data_fast
> +-----------+---------+---------+
> | name      | type    | comment |
> +-----------+---------+---------+
> | id        | string  |         |
> | assertion | boolean |         |
> | name      | string  |         |
> | val       | int     |         |
> | zfill     | string  |         |
> +-----------+---------+---------+
> Returned 5 row(s) in 0.02s
> Thus if you already had Hive code that was doing SELECT * from an HBase table
> like this, you would get a different result set (different column order) in
> Impala.
> If you tried to copy from an HDFS table via 'INSERT INTO hbase_table SELECT *
> FROM hdfs_table', you would get an error because the columns don't match. If
> you made a separate column family for each column, the discrepancy is masked
> because you need more than one column per column family to experience the
> alphabetical ordering.
> Since Hive is preserving the column order, the relevant info must be there in
> the metastore.
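
To make the reported discrepancy concrete, here is a minimal Python sketch that models the two orderings using the column list and "hbase.columns.mapping" string from the repro. It is only an illustration of the observed DESCRIBE output, not Impala's or Hive's actual code: the function names (hive_order, observed_impala_order) are hypothetical, and the sort key (column family, then qualifier) is an assumption inferred from the output above.

```python
# Column names and mapping string copied from the CREATE TABLE in the repro.
COLUMNS = ["id", "val", "zfill", "name", "assertion"]
MAPPING = ":key,cols:val,cols:zfill,cols:name,cols:assertion"


def hive_order(columns):
    """Order echoed by Hive's DESCRIBE in the report: the CREATE TABLE order."""
    return list(columns)


def observed_impala_order(columns, mapping):
    """Model of the Impala 1.3 DESCRIBE output in the report.

    Assumption: the row key comes first, then the remaining columns are
    sorted by (column family, qualifier). This mirrors the observed output
    only; it is not taken from Impala's source.
    """
    pairs = list(zip(columns, mapping.split(",")))
    key_cols = [name for name, spec in pairs if spec == ":key"]
    others = []
    for name, spec in pairs:
        if spec == ":key":
            continue
        family, qualifier = spec.split(":", 1)
        others.append((family, qualifier, name))
    others.sort()  # sorts by family, then qualifier
    return key_cols + [name for _, _, name in others]


if __name__ == "__main__":
    print("Hive:  ", hive_order(COLUMNS))
    print("Impala:", observed_impala_order(COLUMNS, MAPPING))
```

With a single column family the two lists differ, which is why SELECT * returns columns in a different order and 'INSERT INTO hbase_table SELECT * FROM hdfs_table' fails on mismatched columns.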