[ 
https://issues.apache.org/jira/browse/BEAM-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyoungha Min updated BEAM-9743:
-------------------------------
    Description: 
Issue # 1: TFRecordCodec tries only once to read the header/footer. This is 
likely to fail near the end of the channel buffer.  

Issue # 2: (minor) TFRecordCodec currently does not check how much it writes. 

 

This seems to happen only with Zstd compression (or any other picky input 
stream that refuses to read fully). ZstdInputStream seems very picky about 
giving out data.

The affected lines are:

[https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L672]

[https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L699]

 

This is not much of a problem within Beam itself (as all, or most, of the 
WritableByteChannels in beam-java-sdk-core are backed by some OutputStream), 
but the code still does not follow the WritableByteChannel specification: 

[https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L720-L727]

 

The ReadableByteChannel/WritableByteChannel Javadoc specifies that channels 
are not required to read/write fully, and may decline to read/write from time 
to time.

  was:
Issue # 1: TFRecordCodec tries only once to read the header/footer. This is 
likely to fail near the end of the channel buffer.  

Issue # 2: (minor) TFRecordCodec currently does not check how much it writes. 

 

This seems to happen only with Zstd compression (or any other picky input 
stream that refuses to read fully). ZstdInputStream seems very picky about 
giving out data.

The affected lines are:

[https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L672]

[https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L699]

 

This is not much of a problem within Beam itself (as all WritableByteChannels 
in beam-java-sdk-core are backed by some OutputStream), but the code still 
does not follow the WritableByteChannel specification: 

[https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L720-L727]

 

The ReadableByteChannel/WritableByteChannel Javadoc specifies that channels 
are not required to read/write fully, and may decline to read/write from time 
to time.


> TFRecordCodec not attempt to fully read/write
> ---------------------------------------------
>
>                 Key: BEAM-9743
>                 URL: https://issues.apache.org/jira/browse/BEAM-9743
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Kyoungha Min
>            Assignee: Kyoungha Min
>            Priority: Critical
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Issue # 1: TFRecordCodec tries only once to read the header/footer. This is 
> likely to fail near the end of the channel buffer.  
> Issue # 2: (minor) TFRecordCodec currently does not check how much it 
> writes. 
>  
> This seems to happen only with Zstd compression (or any other picky input 
> stream that refuses to read fully). ZstdInputStream seems very picky about 
> giving out data.
> The affected lines are:
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L672]
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L699]
>  
> This is not much of a problem within Beam itself (as all, or most, of the 
> WritableByteChannels in beam-java-sdk-core are backed by some OutputStream), 
> but the code still does not follow the WritableByteChannel specification: 
> [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L720-L727]
>  
> The ReadableByteChannel/WritableByteChannel Javadoc specifies that channels 
> are not required to read/write fully, and may decline to read/write from 
> time to time.
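
For illustration, the two issues above amount to looping until the buffer is
filled (on read) or drained (on write). The following is only a minimal
sketch of that idea, not the actual Beam fix: the class FullChannelIo, the
helpers readFully and writeFully, and the PickyReader/PickyWriter test
doubles are all hypothetical names introduced here. It assumes blocking
channels, so a return value of 0 is simply retried.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helpers; names are illustrative and not part of Beam. */
final class FullChannelIo {
  private FullChannelIo() {}

  /**
   * Reads until dst is full. Returns false if the channel is already at
   * end-of-stream before any byte is read; throws EOFException if the
   * stream ends part-way through.
   */
  static boolean readFully(ReadableByteChannel in, ByteBuffer dst) throws IOException {
    int start = dst.position();
    while (dst.hasRemaining()) {
      int n = in.read(dst);
      if (n < 0) {
        if (dst.position() == start) {
          return false; // clean end-of-stream, nothing was read
        }
        throw new EOFException("stream ended after " + (dst.position() - start) + " bytes");
      }
      // n == 0 is permitted by the ReadableByteChannel contract: just retry.
    }
    return true;
  }

  /** Writes until src is drained; a single write() may accept only part of it. */
  static void writeFully(WritableByteChannel out, ByteBuffer src) throws IOException {
    while (src.hasRemaining()) {
      out.write(src);
    }
  }

  /** Test double: returns 0 bytes on every other call and at most 1 byte otherwise. */
  static final class PickyReader implements ReadableByteChannel {
    private final ByteBuffer src;
    private boolean skip;

    PickyReader(byte[] data) {
      this.src = ByteBuffer.wrap(data);
    }

    @Override
    public int read(ByteBuffer dst) {
      if (!src.hasRemaining()) {
        return -1;
      }
      skip = !skip;
      if (skip || !dst.hasRemaining()) {
        return 0;
      }
      dst.put(src.get());
      return 1;
    }

    @Override
    public boolean isOpen() {
      return true;
    }

    @Override
    public void close() {}
  }

  /** Test double: accepts at most 1 byte per write() call. */
  static final class PickyWriter implements WritableByteChannel {
    final ByteArrayOutputStream sink = new ByteArrayOutputStream();

    @Override
    public int write(ByteBuffer src) {
      if (!src.hasRemaining()) {
        return 0;
      }
      sink.write(src.get());
      return 1;
    }

    @Override
    public boolean isOpen() {
      return true;
    }

    @Override
    public void close() {}
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "tfrecord".getBytes(StandardCharsets.UTF_8);
    ByteBuffer header = ByteBuffer.allocate(data.length);
    boolean filled = readFully(new PickyReader(data), header);
    PickyWriter writer = new PickyWriter();
    writeFully(writer, ByteBuffer.wrap(data));
    System.out.println("read filled buffer: " + filled);
    System.out.println("read bytes match: " + Arrays.equals(header.array(), data));
    System.out.println("written bytes match: " + Arrays.equals(writer.sink.toByteArray(), data));
  }
}
```

A single read() or write() against the picky doubles above would move at most
one byte, which is the failure mode described in the issue. With a
non-blocking channel the retry loops would spin, so a real implementation
might need a different strategy there.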



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
