Ratandeep Ratti created HIVE-18410:
--------------------------------------
Summary: [Performance][Avro] Reading flat Avro tables is very
expensive in Hive
Key: HIVE-18410
URL: https://issues.apache.org/jira/browse/HIVE-18410
Project: Hive
Issue Type: Improvement
Reporter: Ratandeep Ratti
Assignee: Ratandeep Ratti
There's a performance penalty when reading flat (no nested fields) Avro tables
in Hive. Reading the same flat dataset in Pig takes half the time. Profiling
shows that a lot of time is spent in
{{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of that time
is spent in {{GenericData.get().resolveUnion()}}, which calls
{{GenericData.getSchemaName(Object datum)}}, which performs many {{instanceof}}
checks. This could be simplified, with performance benefits. An approach is
described in this patch, which almost halves the runtime.
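
To illustrate the kind of simplification meant here, below is a minimal, self-contained sketch. It is not the patch itself and does not use the Avro API: {{Kind}}, {{resolveGeneric}}, {{kindOf}} and {{resolveNullable}} are hypothetical stand-ins. It assumes that for a union with exactly two branches, one of them null, the branch can be picked with a single null check instead of classifying the datum through a chain of {{instanceof}} tests.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class NullableUnionSketch {
    // Hypothetical stand-in for schema types; this is NOT the Avro API.
    public enum Kind { NULL, STRING, INT, LONG, DOUBLE, BOOLEAN, MAP, ARRAY }

    // General path: classify the datum, then search the union for that branch.
    public static int resolveGeneric(List<Kind> union, Object datum) {
        int index = union.indexOf(kindOf(datum));
        if (index < 0) {
            throw new IllegalArgumentException("Not in union: " + datum);
        }
        return index;
    }

    // Mimics the cost described above: a chain of instanceof checks per value.
    public static Kind kindOf(Object datum) {
        if (datum == null) return Kind.NULL;
        if (datum instanceof Map) return Kind.MAP;
        if (datum instanceof List) return Kind.ARRAY;
        if (datum instanceof CharSequence) return Kind.STRING;
        if (datum instanceof Integer) return Kind.INT;
        if (datum instanceof Long) return Kind.LONG;
        if (datum instanceof Double) return Kind.DOUBLE;
        if (datum instanceof Boolean) return Kind.BOOLEAN;
        throw new IllegalArgumentException("Unknown datum: " + datum);
    }

    // Fast path. Assumes the union has exactly two branches, exactly one of
    // which is NULL, so a null check alone decides the branch.
    public static int resolveNullable(List<Kind> union, Object datum) {
        int nullIndex = (union.get(0) == Kind.NULL) ? 0 : 1;
        return (datum == null) ? nullIndex : 1 - nullIndex;
    }

    public static void main(String[] args) {
        List<Kind> union = Arrays.asList(Kind.NULL, Kind.LONG);
        // Both paths agree on the selected branch for this union.
        System.out.println(resolveGeneric(union, 42L) == resolveNullable(union, 42L));
        System.out.println(resolveGeneric(union, null) == resolveNullable(union, null));
    }
}
```

The fast path skips validating that a non-null datum actually matches the non-null branch, so it is only appropriate where the schema has already been checked to be a single-item nullable union.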