[ https://issues.apache.org/jira/browse/HADOOP-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mikhail Bernadsky updated HADOOP-10669:
---------------------------------------
    Attachment: HADOOP-10669_alt.patch

> Avro serialization does not flush buffered serialized values causing data lost
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-10669
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10669
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 2.4.0
>            Reporter: Mikhail Bernadsky
>         Attachments: HADOOP-10669.patch, HADOOP-10669_alt.patch
>
>
> Found this debugging Nutch.
> MapTask serializes keys and values to the same stream, in pairs:
>     keySerializer.serialize(key);
>     .....
>     valSerializer.serialize(value);
>     .....
>     bb.write(b0, 0, 0);
> AvroSerializer does not flush its buffer after each serialization. So if it
> is used for valSerializer, the values are only partially written or not
> written at all to the output stream before the record is marked as complete
> (the last line above).

--
This message was sent by Atlassian JIRA
(v6.2#6252)
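
The failure mode described in the issue can be reproduced in miniature without Hadoop or Avro. The sketch below is only an illustration under stated assumptions: `FlushDemo` and `BufferingSerializer` are hypothetical names, and a plain `java.io.BufferedOutputStream` stands in for the serializer's internal buffer. It is not the code from either attached patch. It measures how many bytes have reached the shared stream at the point where the caller would mark the record as complete.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class FlushDemo {

    // Hypothetical stand-in for a serializer that buffers internally.
    static final class BufferingSerializer {
        private final OutputStream buffered;
        private final boolean flushAfterEach;

        BufferingSerializer(OutputStream target, boolean flushAfterEach) {
            // 64-byte buffer: small values stay in memory until flushed.
            this.buffered = new BufferedOutputStream(target, 64);
            this.flushAfterEach = flushAfterEach;
        }

        void serialize(String value) throws IOException {
            buffered.write(value.getBytes(StandardCharsets.UTF_8));
            if (flushAfterEach) {
                buffered.flush(); // push the bytes to the shared stream now
            }
        }
    }

    // Bytes visible in the shared stream when the record would be marked complete.
    static int bytesVisibleAtRecordEnd(boolean flushAfterEach) {
        try {
            ByteArrayOutputStream shared = new ByteArrayOutputStream();
            BufferingSerializer valSerializer =
                    new BufferingSerializer(shared, flushAfterEach);
            valSerializer.serialize("value-1"); // 7 bytes in UTF-8
            return shared.size();               // what the caller sees at this point
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without flush: " + bytesVisibleAtRecordEnd(false));
        System.out.println("with flush:    " + bytesVisibleAtRecordEnd(true));
    }
}
```

Without the flush, the value is still sitting in the serializer's buffer when the caller inspects the stream, which matches the "partially written or not written at all" symptom in the report. Flushing after each `serialize` call is one plausible remedy; whether the attached patches take that approach is not stated in this message.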