[
https://issues.apache.org/jira/browse/HIVE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099343#comment-14099343
]
David Chen commented on HIVE-4329:
----------------------------------
Hi Sushanth,
Thank you for taking a look at this ticket.
I agree that it would be ideal to get Hive to a point where a unified
StorageHandler interface can replace the current use of HiveOutputFormat and
FileSinkOperator.RecordWriter (which should really be named HiveRecordWriter).
However, that is a larger, longer-term undertaking, whereas this ticket fixes
the fact that it is currently not possible to write through HCatalog to
storage formats whose (Hive)OutputFormats implement only getHiveRecordWriter
and not getRecordWriter.
The new tests I added as part of HIVE-7286 have demonstrated that solving only
the type-compatibility issue mentioned earlier in this ticket is not
sufficient. The type error for AvroContainerOutputFormat masks the real issue,
which is that AvroContainerOutputFormat's getRecordWriter (like
ParquetHiveOutputFormat's) does nothing but throw an exception saying that
"this method should not be called."
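To illustrate the failure mode, here is a minimal, self-contained sketch with hypothetical stand-in types (not the real Hive/Hadoop classes): a format in the AvroContainerOutputFormat/ParquetHiveOutputFormat style implements only the Hive-side writer, and its MR-side getRecordWriter is a stub that always throws, so fixing key types alone cannot make it usable.

```java
// Stand-in for FileSinkOperator.RecordWriter / an MR RecordWriter.
interface RecordWriter {
    void write(Object key, Object value);
}

// Hypothetical sketch of an Avro/Parquet-style output format: the Hive-side
// entry point works, the MR-side entry point is a placeholder that throws.
class AvroStyleOutputFormat {
    // The useful entry point -- what Hive itself ends up calling.
    RecordWriter getHiveRecordWriter() {
        return (key, value) -> { /* write the value; the key is ignored */ };
    }

    // The MR-side entry point is a stub, as described for the real formats.
    RecordWriter getRecordWriter() {
        throw new UnsupportedOperationException("this method should not be called.");
    }
}
```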
That is why my fix takes this approach, which mirrors the one taken by core
Hive. To my understanding, Hive accepts both MR OutputFormats and
HiveOutputFormats but ends up calling getHiveRecordWriter in both cases. When
given an MR OutputFormat, Hive detects that it is not a HiveOutputFormat and
wraps it in HivePassThroughOutputFormat.
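To make that dispatch concrete, here is a minimal, self-contained sketch of the pattern as I understand it. These are hypothetical stand-in types, not the real Hadoop/Hive classes; only the shape of the wrap-then-dispatch logic reflects what is described above.

```java
// Stand-in for a record writer.
interface RecordWriter {
    void write(Object key, Object value);
}

// Stand-in for org.apache.hadoop.mapred.OutputFormat.
interface OutputFormat {
    RecordWriter getRecordWriter();
}

// Stand-in for HiveOutputFormat: adds the Hive-side entry point.
interface HiveOutputFormat extends OutputFormat {
    RecordWriter getHiveRecordWriter();
}

// Stand-in for HivePassThroughOutputFormat: adapts a plain MR OutputFormat
// so that the caller can treat every format as a HiveOutputFormat.
class PassThroughOutputFormat implements HiveOutputFormat {
    private final OutputFormat wrapped;
    PassThroughOutputFormat(OutputFormat wrapped) { this.wrapped = wrapped; }
    public RecordWriter getRecordWriter() { return wrapped.getRecordWriter(); }
    public RecordWriter getHiveRecordWriter() { return wrapped.getRecordWriter(); }
}

class Dispatch {
    // Core of the approach: detect non-Hive formats and wrap them, then call
    // getHiveRecordWriter unconditionally in both cases.
    static RecordWriter writerFor(OutputFormat of) {
        HiveOutputFormat hof = (of instanceof HiveOutputFormat)
                ? (HiveOutputFormat) of
                : new PassThroughOutputFormat(of);
        return hof.getHiveRecordWriter();
    }
}
```

The point is that getHiveRecordWriter becomes the single entry point, so formats that implement only the Hive-side writer keep working while plain MR formats are adapted rather than rejected.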
My understanding is that your main concern is that this patch turns
HCatOutputFormat into a HiveOutputFormat. That is not the case. The patch does
not change the HCatalog interface; it changes how HCatOutputFormat wraps the
underlying OutputFormat so that HiveOutputFormats are handled properly, which
is what is required to make writing Avro and Parquet through HCatalog
possible.
> HCatalog should use getHiveRecordWriter rather than getRecordWriter
> -------------------------------------------------------------------
>
> Key: HIVE-4329
> URL: https://issues.apache.org/jira/browse/HIVE-4329
> Project: Hive
> Issue Type: Bug
> Components: HCatalog, Serializers/Deserializers
> Affects Versions: 0.14.0
> Environment: discovered in Pig, but it looks like the root cause
> impacts all non-Hive users
> Reporter: Sean Busbey
> Assignee: David Chen
> Attachments: HIVE-4329.0.patch
>
>
> Attempting to write to a HCatalog defined table backed by the AvroSerde fails
> with the following stacktrace:
> {code}
> java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.io.LongWritable
>     at org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat$1.write(AvroContainerOutputFormat.java:84)
>     at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:253)
>     at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:53)
>     at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:242)
>     at org.apache.hcatalog.pig.HCatStorer.putNext(HCatStorer.java:52)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:559)
>     at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> {code}
> The proximal cause of this failure is that the AvroContainerOutputFormat's
> signature mandates a LongWritable key and HCat's FileRecordWriterContainer
> forces a NullWritable. I'm not sure of a general fix, other than redefining
> HiveOutputFormat to mandate a WritableComparable.
> It looks like accepting WritableComparable is what's done in the other Hive
> OutputFormats, and there's no reason AvroContainerOutputFormat couldn't also
> be changed, since it's ignoring the key. That way, fixing
> FileRecordWriterContainer so that it can always use NullWritable could be
> spun off into a separate issue?
> The underlying cause for failure to write to AvroSerde tables is that
> AvroContainerOutputFormat doesn't meaningfully implement getRecordWriter, so
> fixing the above will just push the failure into the placeholder RecordWriter.
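For illustration, the proximal failure quoted above can be reproduced in miniature with hypothetical stand-in types (not the real Hadoop writables): a writer whose generic signature mandates a LongWritable key fails at runtime when a container feeds it a NullWritable through an erased (raw) reference, because the compiler-generated bridge method performs the cast.

```java
// Stand-ins for the Hadoop writable types.
interface Writable {}
class LongWritable implements Writable {}
class NullWritable implements Writable {}

// A writer parameterized on its key type, like a RecordWriter signature.
interface KeyedWriter<K extends Writable> {
    void write(K key, Object value);
}

// Mirrors AvroContainerOutputFormat's writer: mandates LongWritable keys
// even though it never actually uses the key.
class LongKeyedWriter implements KeyedWriter<LongWritable> {
    public void write(LongWritable key, Object value) { /* key is ignored */ }
}
```

Calling `write(new NullWritable(), row)` on a raw `KeyedWriter` reference to a `LongKeyedWriter` throws a ClassCastException, just as the container does in the stack trace, even though the key is never used.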
--
This message was sent by Atlassian JIRA
(v6.2#6252)