[ 
https://issues.apache.org/jira/browse/FLINK-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673211#comment-16673211
 ] 

ASF GitHub Bot commented on FLINK-10356:
----------------------------------------

NicoK commented on a change in pull request #6705: [FLINK-10356][network] add sanity checks to SpillingAdaptiveSpanningRecordDeserializer
URL: https://github.com/apache/flink/pull/6705#discussion_r230404808
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/serialization/SpillingAdaptiveSpanningRecordDeserializer.java
 ##########
 @@ -549,21 +578,67 @@ private void addNextChunkFromMemorySegment(MemorySegment segment, int offset, in
                                }
                                else {
                                        spillingChannel.close();
+                                       spillingChannel = null;
 
-                                       BufferedInputStream inStream = new BufferedInputStream(new FileInputStream(spillFile), 2 * 1024 * 1024);
+                                       BufferedInputStream inStream =
+                                               new BufferedInputStream(
+                                                       new FileInputStream(checkNotNull(spillFile)),
+                                                       2 * 1024 * 1024);
                                        this.spillFileReader = new DataInputViewStreamWrapper(inStream);
                                }
                        }
                }
 
-               private void moveRemainderToNonSpanningDeserializer(NonSpanningWrapper deserializer) {
+               private void moveRemainderToNonSpanningDeserializer(NonSpanningWrapper deserializer) throws IOException {
+                       checkForDeserializationError(null);
+
                        deserializer.clear();
 
                        if (leftOverData != null) {
                                deserializer.initializeFromMemorySegment(leftOverData, leftOverStart, leftOverLimit);
                        }
                }
 
+               /**
+                * In addition to a potentially thrown {@link EOFException}, checks for further
+                * deserialization errors and tries to throw an exception with a more meaningful error
+                * message.
+                *
+                * @param eofException
+                *              exception thrown before the check (or <tt>null</tt> if there was none)
+                *
+                * @throws IOException
+                *              in case of too few or too many bytes read, an exception with more useful data for
+                *              debugging
+                */
+               private void checkForDeserializationError(@Nullable EOFException eofException) throws IOException {
 
 Review comment:
   Actually, `null` values play quite nicely with exception causes and do not 
require additional branches and overloads - this boils down to a lot less code 
here. I'd keep it here - let's see if you can live with it after the other 
refactorings ;)
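
   For illustration, a minimal sketch of the point about `null` causes. The class `NullCauseDemo` and the helper `describe` are hypothetical names, not code from this pull request; the only fact relied on is that the `(String, Throwable)` constructors of `Throwable` and `IOException` accept a `null` cause.

```java
import java.io.EOFException;
import java.io.IOException;

class NullCauseDemo {

    // Hypothetical helper (not the method from the PR): builds an IOException
    // with a more descriptive message and attaches the earlier EOFException, if any.
    static IOException describe(String message, EOFException eofException) {
        // A null cause is permitted and simply means "no cause", so the
        // "no previous exception" case needs no branch and no second overload.
        return new IOException(message, eofException);
    }

    public static void main(String[] args) {
        IOException withoutCause = describe("read too few bytes", null);
        IOException withCause = describe("read too many bytes", new EOFException("eof"));

        System.out.println(withoutCause.getCause());            // prints "null"
        System.out.println(withCause.getCause().getMessage());  // prints "eof"
    }
}
```

   With a single nullable parameter, both call sites (with and without a preceding `EOFException`) go through the same code path.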

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add sanity checks to SpillingAdaptiveSpanningRecordDeserializer
> ---------------------------------------------------------------
>
>                 Key: FLINK-10356
>                 URL: https://issues.apache.org/jira/browse/FLINK-10356
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>    Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.5.4, 1.6.0, 1.6.1, 1.7.0
>            Reporter: Nico Kruber
>            Assignee: Nico Kruber
>            Priority: Major
>              Labels: pull-request-available
>
> {{SpillingAdaptiveSpanningRecordDeserializer}} doesn't have any consistency 
> checks for usage calls or serializers behaving properly, e.g. to read only as 
> many bytes as available/promised for that record. At least these checks 
> should be added:
>  # Check that buffers have not been read from yet before adding them (this is 
> an invariant {{SpillingAdaptiveSpanningRecordDeserializer}} works with and, 
> from what I can see, it is followed now).
>  # Check that after deserialization, we actually consumed {{recordLength}} 
> bytes
>  ** If not, in the spanning deserializer, we currently simply skip the 
> remaining bytes.
>  ** But in the non-spanning deserializer, we currently continue from the 
> wrong offset.
>  # Protect against {{setNextBuffer}} being called before draining all 
> available records



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
