Gunther Hagleitner created HIVE-6262:
----------------------------------------
Summary: Remove unnecessary copies of schema + table desc from
serialized plan
Key: HIVE-6262
URL: https://issues.apache.org/jira/browse/HIVE-6262
Project: Hive
Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Currently, for a partitioned table, the following are true:
- for each partitiondesc we send a copy of the corresponding tabledesc
- for each partitiondesc we send two copies of the schema (in different
formats).
Obviously we need to send different schemas when schema evolution requires
them, but as things stand we always end up with multiple copies.
The effect can be dramatic. Removing these copies can easily reduce plan size
by 8-10x on partitioned tables. Plans themselves can be tens to hundreds of
MB (even with Kryo). The size difference also plays out in every task we run
on the cluster.
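
To make the duplication concrete, here is a minimal sketch that compares the serialized size of a plan in which every partition carries its own copy of the table-level schema against one in which all partitions reference a single shared object. Everything in it is an assumption made for illustration: TableInfo, PartitionInfo and PlanSizeSketch are hypothetical stand-ins for the tabledesc and partitiondesc mentioned above, not Hive classes, and plain Java serialization stands in for the real plan serializer. The ratio it prints depends entirely on the made-up schema width and partition count, so it should not be read as a measurement of Hive.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: these classes are hypothetical, not part of Hive.
class PlanSizeSketch {

    // Stand-in for table-level metadata, including the schema.
    static class TableInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        final String schema;
        TableInfo(String schema) { this.schema = schema; }
    }

    // Stand-in for per-partition metadata that points at its table.
    static class PartitionInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        final String partitionSpec;
        final TableInfo table;
        PartitionInfo(String partitionSpec, TableInfo table) {
            this.partitionSpec = partitionSpec;
            this.table = table;
        }
    }

    // Builds a made-up schema string such as "col0 string,col1 string,...".
    static String wideSchema(int columns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns; i++) {
            if (i > 0) sb.append(',');
            sb.append("col").append(i).append(" string");
        }
        return sb.toString();
    }

    // shareTable=false gives every partition its own TableInfo and schema copy;
    // shareTable=true makes all partitions reference one TableInfo instance.
    static List<PartitionInfo> buildPlan(int partitions, String schema, boolean shareTable) {
        TableInfo shared = new TableInfo(schema);
        List<PartitionInfo> plan = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            // new String(...) forces a distinct object so the copy is really written out.
            TableInfo table = shareTable ? shared : new TableInfo(new String(schema));
            plan.add(new PartitionInfo("ds=" + i, table));
        }
        return plan;
    }

    // Java serialization writes an object once and emits back-references afterwards,
    // so a shared TableInfo costs its full size only for the first partition.
    static int serializedSize(Object plan) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(plan);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String schema = wideSchema(200);
        int copied = serializedSize(buildPlan(1000, schema, false));
        int shared = serializedSize(buildPlan(1000, schema, true));
        System.out.println("copied per partition: " + copied + " bytes");
        System.out.println("shared table info:    " + shared + " bytes");
    }
}
```

The sketch only helps when the serializer preserves object identity. With a serializer that does not track references, sharing the object in memory would not by itself shrink the output, and the duplication would have to be removed from the plan structure instead.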
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)