[ https://issues.apache.org/jira/browse/HADOOP-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer resolved HADOOP-4243.
--------------------------------------
    Resolution: Duplicate

> Serialization framework use SequenceFile/TFile/Other metadata to instantiate deserializer
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4243
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4243
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: contrib/serialization
>            Reporter: Pete Wyckoff
>
> SequenceFile metadata is useful for storing additional information about the serialized data: for RecordIO, for example, whether the data is CSV or Binary; for Thrift, the same thing (Binary, JSON, ...).
> For Hive this may be especially important, because it has a dynamic, generic serializer/deserializer that takes its DDL at runtime (as opposed to RecordIO and Thrift, which require pre-compilation into a specific class whose name can be stored as the SequenceFile key or value class). In that case the class name is like Record.java in RecordIO: it tells you nothing without the DDL.
> One way to address this could be to pass the SequenceFile metadata to the getDeserializer call in the Serialization interface. The API would then be something like getDeserializer(Class<?>, Map<Text, Text> metadata), or with Properties for the metadata.
> But I am open to other proposals.
> This also means that saying a class implements Writable is not necessarily enough to deserialize it, since the deserializer may take different actions based on the metadata; e.g., RecordIO might choose CSV rather than the default Binary deserialization.
> There is also the question of getSerializer returning the metadata to be written to the SequenceFile/TFile.
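As a minimal sketch of the API shape described above, the interface below layers a metadata-aware variant on top of Hadoop's existing org.apache.hadoop.io.serializer.Serialization. The name MetadataAwareSerialization and both method signatures are illustrative assumptions for this proposal, not an existing or agreed-upon API.

{code:java}
import java.util.Map;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.serializer.Deserializer;
import org.apache.hadoop.io.serializer.Serialization;
import org.apache.hadoop.io.serializer.Serializer;

/**
 * Hypothetical extension of the Serialization interface that threads
 * container-file (SequenceFile/TFile) metadata through serializer and
 * deserializer creation. Sketch only; name and signatures are assumptions.
 */
public interface MetadataAwareSerialization<T> extends Serialization<T> {

  /**
   * Instantiate a deserializer using both the class and the metadata read
   * from the container file, e.g. a format key distinguishing CSV from
   * Binary for RecordIO, or a DDL string for Hive's dynamic deserializer.
   */
  Deserializer<T> getDeserializer(Class<T> c, Map<Text, Text> metadata);

  /**
   * Instantiate a serializer and populate the metadata that should be
   * written into the container file alongside the serialized records.
   */
  Serializer<T> getSerializer(Class<T> c, Map<Text, Text> metadataOut);
}
{code}

Map<Text, Text> is used here only because it lines up with what SequenceFile already stores in its per-file metadata block; a Properties-based variant would look the same apart from the key/value type.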