[GitHub] [flink] pnowojski commented on a change in pull request #13614: [FLINK-19547][runtime] Clean up partial record when reconnecting for Approximate Local Recovery

GitBox Thu, 15 Oct 2020 04:25:17 -0700


pnowojski commented on a change in pull request #13614:
URL: https://github.com/apache/flink/pull/13614#discussion_r505464538




##########
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/buffer/BufferConsumerWithPartialRecordLength.java
##########
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.buffer;
+
+import javax.annotation.concurrent.NotThreadSafe;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * BufferConsumer with partial record length if a record is spanning over 
buffers
+ *
+ * <p>`partialRecordLength` is the length of bytes to skip in order to start 
with a complete record,
+ * from position index 0 of the underlying MemorySegment. 
`partialRecordLength` is used in approximate
+ * local recovery to find the start position of a complete record on a 
BufferConsumer, so called
+ * `partial record clean-up`.
+ *
+ * <p>Partial records happen if a record can not fit into one buffer, then the 
remaining part of the same record
+ * is put into the next buffer. Hence partial records only exist at the 
beginning of a buffer.
+ * Partial record clean-up is needed in the mode of approximate local recovery.
+ * If a record is spanning over multiple buffers, and the first (several) 
buffers have got lost due to the failure
+ * of the receiver task, the remaining data belonging to the same record in 
transition should be cleaned up.
+ *
+ * <p> If partialRecordLength == 0, the buffer starts with a complete 
record</p>
+ * <p> If partialRecordLength > 0, the buffer starts with a partial record, 
its length = partialRecordLength</p>
+ * <p> If partialRecordLength < 0, partialRecordLength is undefined. It is 
currently used in
+ *                                                                     {@cite 
ResultSubpartitionRecoveredStateHandler#recover}</p>
+ */
+@NotThreadSafe
+public class BufferConsumerWithPartialRecordLength {
+       private final BufferConsumer bufferConsumer;
+       private final int partialRecordLength;
+
+       public BufferConsumerWithPartialRecordLength(BufferConsumer 
bufferConsumer, int partialRecordLength) {
+               this.bufferConsumer = checkNotNull(bufferConsumer);
+               this.partialRecordLength = partialRecordLength;
+       }
+
+       public BufferConsumer getBufferConsumer() {
+               return bufferConsumer;
+       }
+
+       public int getPartialRecordLength() {
+               return partialRecordLength;
+       }
+
+       public Buffer build() {
+               return bufferConsumer.build();
+       }
+
+       public PartialRecordCleanupResult cleanupPartialRecord() {

Review comment:
       > In the current build(), currentReaderPosition can never be 
buffer.getMaxCapacity()
   
   is that really the case? I think the `BufferConsumer` should be able to 
handle building empty buffers, after writer has appended and committed the 
data, but hasn't yet called `BufferBuilder#finish()`.  After all those are two 
separate calls happening in a different thread.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [flink] pnowojski commented on a change in pull request #13614: [FLINK-19547][runtime] Clean up partial record when reconnecting for Approximate Local Recovery

Reply via email to