[jira] Commented: (HADOOP-6729) serializer.JavaSerialization should be added to io.serializations by default

Tom White (JIRA) Wed, 28 Apr 2010 10:22:13 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861847#action_12861847
 ]


Tom White commented on HADOOP-6729:
-----------------------------------

One inefficiency of JavaSerialization is that it stores the class name with 
every record. This is actually worse than normal Java serialization, which 
uses back-references to class names to make the resulting stream more compact. 
This optimization is disabled in Hadoop (see 
JavaSerializationSerializer#serialize()) because records are reordered in the 
shuffle, which would break the back-references.
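
To give a rough feel for the size difference, here is a minimal sketch using 
only java.io (it is not Hadoop code). The Point class and the serializedSize() 
helper are hypothetical names made up for the illustration, and the sketch 
assumes that resetting the stream before each record is what turns off the 
back-references.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class BackReferenceDemo {
    // Hypothetical record type, used only for this illustration.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Writes n records to a single stream and returns the number of bytes produced.
     * When resetEachRecord is true, the stream forgets everything it has written
     * before each record, so the class descriptor is emitted again every time.
     */
    static int serializedSize(int n, boolean resetEachRecord) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int i = 0; i < n; i++) {
                if (resetEachRecord) {
                    out.reset(); // discard back-reference state
                }
                out.writeObject(new Point(i, i));
            }
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("with back-references: " + serializedSize(1000, false) + " bytes");
        System.out.println("reset before each record: " + serializedSize(1000, true) + " bytes");
    }
}
```

The second figure should come out noticeably larger, since each record then 
carries its own copy of the class descriptor.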

Another inefficiency is that JavaSerialization creates a new object every time 
deserialize() is called. In the context of large-scale data processing, 
where there may be billions of records, this is very expensive, which is why 
Writables and Avro reuse instances.
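
The reuse pattern can be sketched roughly as follows, again with plain java.io 
only. PointRecord is a hypothetical class that imitates the write()/readFields() 
shape of a Writable; it is not Hadoop's actual interface, and the helper 
methods are likewise invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ReuseDemo {
    // Hypothetical record imitating the Writable pattern; not a Hadoop class.
    static class PointRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        int x;
        int y;

        void write(DataOutput out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }

        // Overwrites the fields of this existing instance; no allocation.
        void readFields(DataInput in) throws IOException {
            x = in.readInt();
            y = in.readInt();
        }
    }

    /** Encodes n records (x = i, y = -i) in the compact field-by-field form. */
    static byte[] encode(int n) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        PointRecord record = new PointRecord();
        for (int i = 0; i < n; i++) {
            record.x = i;
            record.y = -i;
            record.write(out);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Sums x over n records while reusing a single PointRecord instance. */
    static long sumReusing(byte[] data, int n) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        PointRecord record = new PointRecord(); // allocated once, refilled n times
        long sum = 0;
        for (int i = 0; i < n; i++) {
            record.readFields(in);
            sum += record.x;
        }
        return sum;
    }

    /** With Java serialization, each readObject() call hands back a separate object. */
    static boolean javaReadsAllocate() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new PointRecord());
            out.writeObject(new PointRecord());
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            Object first = in.readObject();
            Object second = in.readObject();
            return first != second;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("sum of x: " + sumReusing(encode(1000), 1000));
        System.out.println("distinct objects per read: " + javaReadsAllocate());
    }
}
```

With readFields() the loop allocates one record regardless of input size, 
whereas the readObject() path allocates once per record.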

> serializer.JavaSerialization should be added to io.serializations by default
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-6729
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6729
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.20.2
>            Reporter: Ted Yu
>
> org.apache.hadoop.io.serializer.JavaSerialization isn't included in 
> io.serializations by default.
> When a class that implements the Serializable interface is used, the user 
> would see the following without serializer.JavaSerialization:
> java.lang.NullPointerException
>    at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
>    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:759)
>    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:487)
>    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:575)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>    at org.apache.hadoop.mapred.Child.main(Child.java:170)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
