[ https://issues.apache.org/jira/browse/HADOOP-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-732:
------------------------------

    Attachment: seqFileMetadata.patch


Attached is a patch for this issue.

SequenceFile has a new header field: a TreeMap<Text, Text> object wrapped in a
class, Metadata, that implements the Writable interface. To accommodate this,
the version number is bumped up to 6.
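The Metadata idea is a sorted map of string pairs that can serialize itself into the header. Below is a minimal sketch under stated assumptions. `MetadataSketch` is a hypothetical name. It uses plain `String` values and `DataOutput.writeUTF` in place of Hadoop's `Text`, and it only mirrors the `write`/`readFields` method pair of the `Writable` interface instead of implementing it. The byte layout is therefore illustrative and is not the actual version 6 SequenceFile format.

```java
import java.io.*;
import java.util.*;

/**
 * Hypothetical stand-in for the patch's Metadata class: a sorted map of
 * string key/value pairs that writes itself to, and reads itself from, a
 * binary stream. Assumption: String + writeUTF replace Hadoop's Text, so the
 * bytes differ from a real SequenceFile header.
 */
class MetadataSketch {
    private final TreeMap<String, String> entries = new TreeMap<>();

    void set(String key, String value) { entries.put(key, value); }
    String get(String key) { return entries.get(key); }
    int size() { return entries.size(); }

    // Serialize as an entry count followed by each pair in sorted key order.
    void write(DataOutput out) throws IOException {
        out.writeInt(entries.size());
        for (Map.Entry<String, String> e : entries.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    // Replace the current contents with the pairs read from the stream.
    void readFields(DataInput in) throws IOException {
        entries.clear();
        int n = in.readInt();
        if (n < 0) throw new IOException("Invalid metadata entry count: " + n);
        for (int i = 0; i < n; i++) {
            String key = in.readUTF();
            String value = in.readUTF();
            entries.put(key, value);
        }
    }

    // Helper: serialize to an in-memory buffer and deserialize a fresh copy.
    static MetadataSketch roundTrip(MetadataSketch m) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            m.write(new DataOutputStream(buf));
            MetadataSketch copy = new MetadataSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MetadataSketch && entries.equals(((MetadataSketch) o).entries);
    }

    @Override
    public int hashCode() { return entries.hashCode(); }
}
```

Using a TreeMap keeps the serialized order deterministic, so two files with the same metadata produce the same header bytes.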

The Reader class has a new member variable for the metadata, and a new method
returns the metadata object. The new code can still read files written by
older versions.
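Reading old files works if the reader checks the header's version before looking for a metadata section. The following is a minimal sketch of that version gate, assuming a simplified header that is just a version byte followed (for version 6 and later) by the metadata pairs. `HeaderReaderSketch`, `readMetadata`, `encode`, and `decode` are hypothetical names, and the real SequenceFile header contains more fields than shown here.

```java
import java.io.*;
import java.util.*;

/**
 * Hypothetical sketch of version-gated header parsing. Assumption: the
 * header is a single version byte, then (from version 6 on) an entry count
 * and key/value pairs. Older files get an empty metadata map instead of an
 * error.
 */
class HeaderReaderSketch {
    // First version that carries metadata, per the patch description.
    static final byte METADATA_VERSION = 6;

    static Map<String, String> readMetadata(DataInput in, byte version) throws IOException {
        TreeMap<String, String> meta = new TreeMap<>();
        if (version < METADATA_VERSION) {
            return meta; // old file: no metadata section, so return an empty map
        }
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            String key = in.readUTF();
            String value = in.readUTF();
            meta.put(key, value);
        }
        return meta;
    }

    // Build a simplified header; metadata is written only for version 6+.
    static byte[] encode(byte version, Map<String, String> meta) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(version);
            if (version >= METADATA_VERSION) {
                out.writeInt(meta.size());
                for (Map.Entry<String, String> e : new TreeMap<>(meta).entrySet()) {
                    out.writeUTF(e.getKey());
                    out.writeUTF(e.getValue());
                }
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parse a simplified header: read the version, then dispatch on it.
    static Map<String, String> decode(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            byte version = in.readByte();
            return readMetadata(in, version);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```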

New constructors for the various Writer classes take a metadata object as their
last parameter. New static createWriter functions with metadata as the last
parameter are also introduced. All of these additions are backward compatible.
A new unit test is added to TestSequenceFile for writing and reading sequence
files with metadata.
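One common way to add a trailing parameter without breaking existing callers is to keep the old signature and have it delegate to the new overload with an empty default. The sketch below illustrates that pattern only. `WriterFactorySketch` and its nested `Writer` are hypothetical, and the real `createWriter` functions take more parameters (file system, configuration, and so on) that are omitted here.

```java
import java.util.*;

/**
 * Hypothetical sketch of backward-compatible overloads: the original
 * factory signature is kept and delegates to a new overload that takes
 * metadata as its last parameter, passing an empty map.
 */
class WriterFactorySketch {
    static final class Writer {
        final String path;
        final Class<?> keyClass;
        final Class<?> valueClass;
        final SortedMap<String, String> metadata;

        Writer(String path, Class<?> keyClass, Class<?> valueClass,
               SortedMap<String, String> metadata) {
            this.path = path;
            this.keyClass = keyClass;
            this.valueClass = valueClass;
            // Defensive, read-only copy so callers cannot mutate it later.
            this.metadata = Collections.unmodifiableSortedMap(new TreeMap<String, String>(metadata));
        }
    }

    // Pre-existing signature: unchanged, so existing callers still compile.
    static Writer createWriter(String path, Class<?> keyClass, Class<?> valueClass) {
        return createWriter(path, keyClass, valueClass, new TreeMap<String, String>());
    }

    // New overload: metadata appended as the last parameter.
    static Writer createWriter(String path, Class<?> keyClass, Class<?> valueClass,
                               SortedMap<String, String> metadata) {
        return new Writer(path, keyClass, valueClass, metadata);
    }
}
```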
All unit tests passed.



> SequenceFile's header should allow to store metadata in the form of key/value pairs
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-732
>                 URL: https://issues.apache.org/jira/browse/HADOOP-732
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Runping Qi
>         Assigned To: Runping Qi
>
> The sequence file currently stores a fixed list of metadata attributes, such
> as the key/value class names, compression method, etc. To make sequence files
> more self-describing, the header should also be able to store a list of
> key/value pairs. One attribute of particular interest would indicate whether
> the key/value classes are actually Hadoop record classes and, if so, store the
> DDLs for those records. This way, we could create tools that extract the DDL
> from a sequence file and then generate the necessary classes. It would also
> make it possible to provide an interpretive version of Hadoop records, so that
> even when Hadoop or the application does not have the necessary classes, a
> sequence file of Hadoop records can be read and deserialized "interpretively".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
