[ 
https://issues.apache.org/jira/browse/HADOOP-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bernadsky updated HADOOP-10669:
---------------------------------------

    Attachment: HADOOP-10669_alt.patch

> Avro serialization does not flush buffered serialized values causing data loss
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-10669
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10669
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 2.4.0
>            Reporter: Mikhail Bernadsky
>         Attachments: HADOOP-10669.patch, HADOOP-10669_alt.patch
>
>
> Found this while debugging Nutch.
> MapTask serializes each key and value to the same stream, in pairs:
> keySerializer.serialize(key);
> .....
> valSerializer.serialize(value);
> .....
> bb.write(b0, 0, 0);
> AvroSerializer does not flush its buffer after each serialization. So if it
> is used as valSerializer, the values are written only partially, or not at
> all, to the output stream before the record is marked as complete
> (the last line above).
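
The failure mode described above can be reproduced without Hadoop or Avro.
The following is only a minimal sketch: it assumes that a plain
java.io.BufferedOutputStream is a fair stand-in for the buffering done inside
the Avro serializer, and the class and method names (FlushDemo, bytesVisible)
are hypothetical, not taken from the Hadoop code base or from either patch.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Hadoop or Avro.
final class FlushDemo {

    /**
     * Writes a key directly to a shared stream and a value through a
     * buffered wrapper, then reports how many bytes have actually
     * reached the shared stream.
     */
    static int bytesVisible(boolean flushAfterValue) {
        ByteArrayOutputStream shared = new ByteArrayOutputStream();
        byte[] key = "key".getBytes(StandardCharsets.UTF_8);     // 3 bytes
        byte[] value = "value".getBytes(StandardCharsets.UTF_8); // 5 bytes
        try {
            // The "key serializer" writes straight to the shared stream.
            shared.write(key, 0, key.length);

            // The "value serializer" buffers internally (assumed stand-in
            // for the Avro encoder); 64 bytes is larger than the value,
            // so nothing is passed through on write.
            OutputStream buffered = new BufferedOutputStream(shared, 64);
            buffered.write(value, 0, value.length);

            if (flushAfterValue) {
                // Push the buffered value into the shared stream before
                // the caller treats the record as complete.
                buffered.flush();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println("without flush: " + bytesVisible(false)); // 3
        System.out.println("with flush:    " + bytesVisible(true));  // 8
    }
}
```

Without the flush, only the 3 key bytes are in the shared stream at the point
where the record would be marked complete; with the flush, all 8 bytes are
there. Whether the real fix belongs in the serializer or in its caller is a
design question this sketch does not try to answer.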



--
This message was sent by Atlassian JIRA
(v6.2#6252)
