Harsh J created HIVE-13275:
------------------------------
Summary: Add a toString method to BytesRefArrayWritable
Key: HIVE-13275
URL: https://issues.apache.org/jira/browse/HIVE-13275
Project: Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Affects Versions: 1.1.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial
Attachments: HIVE-13275.000.patch
RCFileInputFormat cannot be used externally for Hadoop Streaming today, because
Streaming generally relies on the K/V pairs being able to emit text
representations (via toString()).
Since BytesRefArrayWritable has no toString() method, using
RCFileInputFormat produces default object-representation prints, which are not useful.
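
To illustrate the symptom, here is a minimal sketch using a hypothetical RawRow class (a stand-in only, not Hive's BytesRefArrayWritable). A class that does not override toString() falls back to Object.toString(), which yields the class name and an identity hash rather than the data:

```java
import java.nio.charset.StandardCharsets;

final class ToStringSketch {

    // Hypothetical value type that, like the class discussed here, has no toString() override.
    static final class RawRow {
        final byte[][] columns;

        RawRow(byte[][] columns) {
            this.columns = columns;
        }
    }

    public static void main(String[] args) {
        RawRow row = new RawRow(new byte[][] {
            "1".getBytes(StandardCharsets.UTF_8),
            "alice".getBytes(StandardCharsets.UTF_8)
        });
        // Falls back to Object.toString(): class name, '@', then a hex hash that varies per run.
        System.out.println(row);
    }
}
```

Any consumer that only calls toString() on the value would see this form, with none of the column contents.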
Also, unlike SequenceFiles, RCFiles store multiple "values" per row (i.e. an
array), so it's important to output them in a valid, parseable manner, as opposed
to choosing a simple joining delimiter over the string representations of the
inner elements.
I propose adding a standardised CSV formatting of the array data, such that
users of Streaming can then parse the results in their own script. Since we
have OpenCSV as a dependency already, we can make use of it for this purpose.
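
A rough sketch of the kind of output I have in mind is below. It is not the attached patch: the class CsvRowSketch and its methods are hypothetical, the quoting is hand-rolled with the JDK only so that the example is self-contained (a real change would presumably delegate to OpenCSV), and decoding each column as UTF-8 is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; stands in for a toString() on a row of column byte arrays.
final class CsvRowSketch {

    // Quote every field and double any embedded quotes, so that commas,
    // quotes and newlines inside a value cannot be mistaken for structure.
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Decode each column as UTF-8 (an assumption) and join into one CSV record.
    static String toCsv(List<byte[]> columns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(new String(columns.get(i), StandardCharsets.UTF_8)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<byte[]> row = new ArrayList<>();
        row.add("1".getBytes(StandardCharsets.UTF_8));
        row.add("a,b".getBytes(StandardCharsets.UTF_8));
        row.add("say \"hi\"".getBytes(StandardCharsets.UTF_8));
        // prints "1","a,b","say ""hi"""
        System.out.println(toCsv(row));
    }
}
```

A plain comma join of the same row would give 1,a,b,say "hi", which a script cannot split back into three columns. With quoting, any standard CSV reader on the Streaming side recovers the original values.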