[ 
https://issues.apache.org/jira/browse/AVRO-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16717787#comment-16717787
 ] 

ASF subversion and git services commented on AVRO-2109:
-------------------------------------------------------

Commit a731fab500606404ecfd755717b441109ccf7337 in avro's branch 
refs/heads/branch-1.8 from [~gszadovszky]
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=a731fab ]

AVRO-2109: Reset buffers in case of IOException

Closes #260

Signed-off-by: Zoltan Ivanfi <z...@cloudera.com>
Signed-off-by: sacharya <su...@apache.org>
Signed-off-by: Nandor Kollar <nkol...@apache.org>
(cherry picked from commit 673261c8656124cc58bee65fe5e8c779350779ee)


> Reset buffers in case of IOException
> ------------------------------------
>
>                 Key: AVRO-2109
>                 URL: https://issues.apache.org/jira/browse/AVRO-2109
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.8.2
>            Reporter: Gabor Szadovszky
>            Assignee: Gabor Szadovszky
>            Priority: Major
>             Fix For: 1.7.8, 1.9.0, 1.8.3
>
>
> If an {{IOException}} is thrown from {{DataFileWriter.writeBlock}}, the 
> {{buffer}} and {{blockCount}} are not reset, so duplicated data is written 
> out on {{close}}/{{flush}}.
> Whether we should reset the buffer after an exception is really a 
> conceptual question. If an exception occurs while writing the file, we 
> should expect the file to be corrupt, so the possible duplication of data 
> should not matter.
> On the other hand, if the file is already corrupt, why would we try to write 
> anything again at file close?
> This issue comes from a Flume issue where the HDFS wait thread is interrupted 
> because of a timeout while writing an Avro file. The current block has 
> already been written properly, but because of the {{IOException}} caused by 
> the thread interrupt we invoke {{close()}} on the writer, which writes the 
> block again along with some other data (possibly a duplicated sync marker), 
> and that makes the file corrupt.
> [~busbey], [~nkollar], [~zi], any thoughts?
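
The failure mode described above can be illustrated with a minimal Java sketch. This is not the actual patch and does not use Avro's API: BlockWriter, FailFirstFlushStream and pendingRecords() are hypothetical names invented for this example. It assumes that one reasonable way to "reset buffers in case of IOException" is to clear the buffer and the record count in a finally block, so that a later close() has nothing left to write a second time.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical block-buffering writer; a stand-in for illustration, not Avro's DataFileWriter. */
class BlockWriter {
    private final OutputStream out;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private int blockCount = 0;

    BlockWriter(OutputStream out) {
        this.out = out;
    }

    /** Buffers one record in memory until the next writeBlock(). */
    void append(byte[] record) {
        buffer.write(record, 0, record.length);
        blockCount++;
    }

    int pendingRecords() {
        return blockCount;
    }

    /** Writes the buffered block; the buffer is reset even if the write fails. */
    void writeBlock() throws IOException {
        if (blockCount == 0) {
            return; // nothing buffered, nothing to write
        }
        try {
            out.write(blockCount); // toy one-byte block header (assumption, not Avro's format)
            buffer.writeTo(out);
            out.flush();
        } finally {
            // Reset on success AND on failure, so close() cannot re-emit this block.
            buffer.reset();
            blockCount = 0;
        }
    }

    void close() throws IOException {
        writeBlock();
        out.close();
    }
}

/** Test double: stores all bytes, then fails the first flush(), like an interrupted wait. */
class FailFirstFlushStream extends OutputStream {
    final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    private boolean failed = false;

    @Override
    public void write(int b) {
        sink.write(b);
    }

    @Override
    public void flush() throws IOException {
        if (!failed) {
            failed = true;
            throw new IOException("simulated interrupt after the block was written");
        }
    }
}
```

Without the finally block, the same sequence (append, failed writeBlock, close) would send the block to the underlying stream twice, which is the duplication reported in this issue.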



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
