[
https://issues.apache.org/jira/browse/HAWQ-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Radar Lei reassigned HAWQ-1303:
-------------------------------
Assignee: Oleksandr Diachenko (was: Radar Lei)
> Load each partition as separate table for heterogeneous tables in HCatalog
> --------------------------------------------------------------------------
>
> Key: HAWQ-1303
> URL: https://issues.apache.org/jira/browse/HAWQ-1303
> Project: Apache HAWQ
> Issue Type: Improvement
> Components: Hcatalog, PXF
> Reporter: Oleksandr Diachenko
> Assignee: Oleksandr Diachenko
>
> Changes introduced in HAWQ-1228 made HAWQ use the optimal profile/format for
> Hive tables. However, there is a limitation: when HAWQ loads a Hive table into
> memory, it loads it as one table even if the table has multiple partitions
> with different output formats (GPDBWritable, TEXT). In that case HAWQ
> currently falls back to the GPDBWritable format. The idea is to load each set
> of partitions that share one output format as a separate table, so that not
> only the optimal profile but also the optimal output format can be used.
> Example:
> Consider a Hive table with four partitions in the following formats: Text,
> RC, ORC, and Sequence file.
> Currently, HAWQ loads it into memory with the GPDBWritable format.
> The GPDBWritable format is optimal for the HiveORC and Hive profiles, but not
> for the HiveText and HiveRC profiles.
> With the proposed changes, HAWQ should load two tables, one with the TEXT
> format and one with the GPDBWritable format, and use the following
> profile/format pairs to read the partitions: HiveText/TEXT, HiveRC/TEXT,
> HiveORC/GPDBWritable, Hive/GPDBWritable.
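A minimal sketch of the proposed grouping, assuming a simplified model that is not HAWQ's actual implementation: each partition is a hypothetical (name, storage format) pair, and the storage-to-profile mapping (in particular, reading Sequence file partitions with the generic Hive profile) is inferred from the example rather than taken from PXF source. The profile-to-output-format pairs are the ones listed in the issue. Partitions are grouped by output format, so each group corresponds to one separately loaded table.

```python
from collections import defaultdict

# ASSUMPTION: mapping from a partition's storage format to the PXF profile
# used to read it, inferred from the example in this issue. The generic
# "Hive" profile is assumed to handle Sequence file partitions.
PROFILE_FOR_STORAGE = {
    "Text": "HiveText",
    "RC": "HiveRC",
    "ORC": "HiveORC",
    "SequenceFile": "Hive",
}

# Optimal output format per profile, as listed in the issue.
OUTPUT_FORMAT_FOR_PROFILE = {
    "HiveText": "TEXT",
    "HiveRC": "TEXT",
    "HiveORC": "GPDBWritable",
    "Hive": "GPDBWritable",
}


def group_partitions_by_output_format(partitions):
    """Group (partition_name, storage_format) pairs into one logical table
    per output format, keeping each partition's profile alongside it."""
    tables = defaultdict(list)
    for name, storage in partitions:
        profile = PROFILE_FOR_STORAGE[storage]
        output_format = OUTPUT_FORMAT_FOR_PROFILE[profile]
        tables[output_format].append((name, profile))
    return dict(tables)


if __name__ == "__main__":
    # Hypothetical partition names for the four-partition example.
    partitions = [
        ("p_text", "Text"),
        ("p_rc", "RC"),
        ("p_orc", "ORC"),
        ("p_seq", "SequenceFile"),
    ]
    for fmt, parts in group_partitions_by_output_format(partitions).items():
        print(fmt, parts)
```

For the four-partition example, this yields two groups: TEXT (HiveText, HiveRC) and GPDBWritable (HiveORC, Hive). That matches the two tables the issue proposes, instead of the single GPDBWritable table used today.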
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)