[ 
https://issues.apache.org/jira/browse/HAWQ-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Radar Lei reassigned HAWQ-1303:
-------------------------------

    Assignee: Oleksandr Diachenko  (was: Radar Lei)

> Load each partition as separate table for heterogenous tables in HCatalog
> -------------------------------------------------------------------------
>
>                 Key: HAWQ-1303
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1303
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Hcatalog, PXF
>            Reporter: Oleksandr Diachenko
>            Assignee: Oleksandr Diachenko
>
> Changes introduced in HAWQ-1228 made HAWQ use the optimal profile/format for
> Hive tables. There is a limitation, however: when HAWQ loads a Hive table
> into memory, it loads it as one table even if the table has multiple
> partitions with different output formats (GPDBWritable, TEXT). In that case
> it currently uses the GPDBWritable format for the whole table. The idea is to
> load each set of partitions that share an output format as a separate table,
> so that not only the optimal profile but also the optimal output format can
> be used.
> Example:
> We have a Hive table with four partitions in the following formats: Text, RC,
> ORC, Sequence file.
> Currently, HAWQ will load it into memory with the GPDBWritable format.
> The GPDBWritable format is optimal for the HiveORC and Hive profiles but not
> for the HiveText and HiveRC profiles.
> With the proposed changes, HAWQ should load two tables, one with the TEXT
> format and one with the GPDBWritable format, and use the following
> profile/format pairs to read partitions: HiveText/TEXT, HiveRC/TEXT,
> HiveORC/GPDBWritable, Hive/GPDBWritable.
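
The grouping described in the example can be sketched roughly as follows. This
is only an illustration in Python, not HAWQ or PXF code: the function name, the
partition names (p1..p4), and the storage-format keys are hypothetical, and
only the profile/format pairs come from the issue description.

```python
from collections import defaultdict

# Profile/output-format pairs as listed in the issue description.
# The dictionary keys (storage format labels) are illustrative only.
PROFILE_AND_OUTPUT = {
    "Text": ("HiveText", "TEXT"),
    "RC": ("HiveRC", "TEXT"),
    "ORC": ("HiveORC", "GPDBWritable"),
    "SequenceFile": ("Hive", "GPDBWritable"),
}


def group_partitions_by_output_format(partitions):
    """Group (partition_name, storage_format) pairs by output format.

    Returns a dict mapping each output format to the list of
    (partition_name, profile) pairs that would be read with it,
    i.e. one logical table per output format.
    """
    tables = defaultdict(list)
    for name, storage_format in partitions:
        profile, output_format = PROFILE_AND_OUTPUT[storage_format]
        tables[output_format].append((name, profile))
    return dict(tables)


# Hypothetical partitions matching the four formats in the example.
partitions = [
    ("p1", "Text"),
    ("p2", "RC"),
    ("p3", "ORC"),
    ("p4", "SequenceFile"),
]
tables = group_partitions_by_output_format(partitions)
for output_format, members in tables.items():
    print(output_format, members)
```

Under these assumptions the four partitions collapse into two tables, one per
output format, while each partition still keeps its own profile.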



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
