[jira] [Created] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

Ratandeep Ratti (JIRA) Mon, 08 Jan 2018 17:41:25 -0800

Ratandeep Ratti created HIVE-18410:
--------------------------------------

             Summary: [Performance][Avro] Reading flat Avro tables is very expensive in Hive
                 Key: HIVE-18410
                 URL: https://issues.apache.org/jira/browse/HIVE-18410
             Project: Hive
          Issue Type: Improvement
            Reporter: Ratandeep Ratti
            Assignee: Ratandeep Ratti



There's a performance penalty when reading flat [no nested fields] Avro tables:
reading the same flat dataset in Pig takes about half the time. Profiling shows
that much of the time is spent in
{{AvroDeserializer.deserializeSingleItemNullableUnion()}}, and most of that goes
to {{GenericData.get().resolveUnion()}}, which calls
{{GenericData.getSchemaName(Object datum)}}, which performs a long series of
instanceof checks on every value. This path could be simplified for a
performance gain. The approach described in this patch almost halves the
runtime.
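To illustrate why a nullable union can be resolved more cheaply than the generic path, here is a minimal, self-contained sketch. It is NOT the patch's code and does not use the real Avro API: the `Type` enum, `typeOf`, `resolveGeneric` and `NullableUnionResolver` are hypothetical stand-ins. The generic path derives a type from the datum through a chain of instanceof checks and then searches the union branches, which mirrors the cost pattern reported above. The fast path assumes a two-branch union such as [null, T] and that a non-null datum always matches the non-null branch (as the writer schema guarantees during deserialization), so each value needs only a null check.

```java
import java.util.List;

public class NullableUnionSketch {
    // Hypothetical stand-in for Avro schema types; NOT org.apache.avro.Schema.
    enum Type { NULL, BOOLEAN, INT, LONG, FLOAT, DOUBLE, STRING, BYTES }

    // Generic path: map the datum to a type via a chain of instanceof checks.
    static Type typeOf(Object datum) {
        if (datum == null) return Type.NULL;
        if (datum instanceof Boolean) return Type.BOOLEAN;
        if (datum instanceof Integer) return Type.INT;
        if (datum instanceof Long) return Type.LONG;
        if (datum instanceof Float) return Type.FLOAT;
        if (datum instanceof Double) return Type.DOUBLE;
        if (datum instanceof CharSequence) return Type.STRING;
        if (datum instanceof byte[]) return Type.BYTES;
        throw new IllegalArgumentException("unsupported datum: " + datum.getClass());
    }

    // Generic path: after type detection, search the branches for a match.
    static int resolveGeneric(List<Type> branches, Object datum) {
        Type t = typeOf(datum);
        int i = branches.indexOf(t);
        if (i < 0) throw new IllegalArgumentException("no branch for " + t);
        return i;
    }

    // Fast path for a two-branch nullable union: the branch indexes are
    // computed once per schema, so each datum costs only a null check.
    static final class NullableUnionResolver {
        private final int nullIndex;
        private final int valueIndex;

        NullableUnionResolver(List<Type> branches) {
            if (branches.size() != 2 || !branches.contains(Type.NULL)) {
                throw new IllegalArgumentException("not a nullable union: " + branches);
            }
            this.nullIndex = branches.indexOf(Type.NULL);
            this.valueIndex = 1 - nullIndex;
        }

        int resolve(Object datum) {
            return datum == null ? nullIndex : valueIndex;
        }
    }

    public static void main(String[] args) {
        List<Type> union = List.of(Type.NULL, Type.STRING);
        NullableUnionResolver fast = new NullableUnionResolver(union);
        // Both paths agree; the fast path skips the instanceof chain.
        System.out.println(resolveGeneric(union, "abc") + " " + fast.resolve("abc")); // prints "1 1"
        System.out.println(resolveGeneric(union, null) + " " + fast.resolve(null));   // prints "0 0"
    }
}
```

The design trade-off is that the fast path trusts the schema instead of inspecting the value, which is only safe when the data is known to conform to the union (true for data decoded from a valid Avro file).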



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
