[ https://issues.apache.org/jira/browse/HAWQ-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Radar Lei reassigned HAWQ-1303:
-------------------------------

    Assignee: Oleksandr Diachenko  (was: Radar Lei)

> Load each partition as separate table for heterogenous tables in HCatalog
> -------------------------------------------------------------------------
>
>                 Key: HAWQ-1303
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1303
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Hcatalog, PXF
>            Reporter: Oleksandr Diachenko
>            Assignee: Oleksandr Diachenko
>
> Changes introduced in HAWQ-1228 made HAWQ use the optimal profile/format for
> Hive tables. However, there is a limitation: when HAWQ loads a Hive table
> into memory, it loads it as a single table even if the table has multiple
> partitions with different output formats (GPDBWritable, TEXT). In that case
> it currently uses the GPDBWritable format for the whole table. The idea is
> to load each set of partitions that share one output format as a separate
> table, so that the optimal output format, and not only the optimal profile,
> can be used.
>
> Example:
> We have a Hive table with four partitions in the following formats: Text,
> RC, ORC, and Sequence file.
> Currently, HAWQ loads it into memory with the GPDBWritable format.
> GPDBWritable is optimal for the HiveORC and Hive profiles but not for the
> HiveText and HiveRC profiles.
> With the proposed changes, HAWQ should load two tables, one with the TEXT
> format and one with the GPDBWritable format, and use the following
> profile/format pairs to read partitions: HiveText/TEXT, HiveRC/TEXT,
> HiveORC/GPDBWritable, Hive/GPDBWritable.
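
The grouping proposed in the description can be illustrated with a minimal Python sketch. This is not HAWQ or PXF code. The profile/output-format pairs are the ones listed in the issue. The mapping from storage format to profile is an assumption inferred from the order in which the description lists them, and the function name and partition names (p1..p4) are hypothetical.

```python
from collections import defaultdict

# Profile -> output format pairs, as listed in the issue description.
PROFILE_OUTPUT_FORMAT = {
    "HiveText": "TEXT",
    "HiveRC": "TEXT",
    "HiveORC": "GPDBWritable",
    "Hive": "GPDBWritable",
}

# ASSUMPTION: storage format -> profile, inferred from the order in which
# the description lists formats and pairs; not taken from HAWQ source code.
STORAGE_TO_PROFILE = {
    "Text": "HiveText",
    "RC": "HiveRC",
    "ORC": "HiveORC",
    "Sequence file": "Hive",
}


def group_partitions_by_output_format(partitions):
    """Group (partition_name, storage_format) pairs by output format.

    Returns a dict mapping each output format to a list of
    (partition_name, profile) tuples. Each key corresponds to one
    table that would be loaded under the proposal.
    """
    groups = defaultdict(list)
    for name, storage_format in partitions:
        profile = STORAGE_TO_PROFILE[storage_format]
        output_format = PROFILE_OUTPUT_FORMAT[profile]
        groups[output_format].append((name, profile))
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical partitions matching the example in the description.
    example = [
        ("p1", "Text"),
        ("p2", "RC"),
        ("p3", "ORC"),
        ("p4", "Sequence file"),
    ]
    for fmt, members in group_partitions_by_output_format(example).items():
        print(fmt, members)
```

For the four-partition example, this yields two groups, TEXT and GPDBWritable, matching the two tables the description expects instead of a single GPDBWritable table.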