alnzng opened a new pull request, #1615:
URL: https://github.com/apache/samza/pull/1615

   **Symptom**: 
   Using startpoints to trigger full bootstrapping is not reliable in the 
current implementation. We observed that bootstrapping happened on only 
some of the expected partitions. 
   
   **Cause**:
   Within Samza (the main class to pay attention to is OffsetManager.scala), 
there is a bug in which a startpoint can be deleted before it is actually 
used for message consumption. If a container gets into this situation, the 
startpoint is ignored and consumption continues from the last message 
processed before the startpoint was applied. The sequence of events is:
   - Load last processed offsets and startpoints
   - Use startpoints to register starting offsets for consumers
   - Message processing starts, but messages for only some of the partitions 
are received
   - Write checkpoint using last processed offsets
      - If a partition did not get messages, then the last processed offset is 
still the offset from before the startpoint.
   - Delete startpoints
   - Container dies (e.g. due to running out of memory)
   - On restart, load last processed offsets (startpoints have been deleted)
      - The partitions that did have messages in the previous deployment will 
have the correct checkpoint.
      - The partitions that did not have messages will have the checkpoint set 
to the offset from before the startpoint was applied. This is unexpected, and 
it means that bootstrapping does not happen for these partitions.
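
   The sequence above can be reproduced with a toy model. This is only a 
sketch: plain maps stand in for the checkpoint and startpoint stores, and 
`StartpointLossDemo`, `resumeOffset` and `runUntilCommit` are hypothetical 
names for illustration, not Samza classes or methods. Offsets are simplified 
so that a partition resumes at its startpoint if one exists, otherwise at its 
checkpoint.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: plain maps stand in for the checkpoint and startpoint stores.
class StartpointLossDemo {
    final Map<Integer, Long> checkpoints = new HashMap<>();
    final Map<Integer, Long> startpoints = new HashMap<>();

    // Position a partition resumes from: its startpoint if one exists, else its checkpoint.
    long resumeOffset(int partition) {
        Long startpoint = startpoints.get(partition);
        return startpoint != null ? startpoint : checkpoints.get(partition);
    }

    // One container run in which only some partitions receive messages before the commit.
    void runUntilCommit(Set<Integer> partitionsWithMessages, long processedOffset) {
        // Load last processed offsets.
        Map<Integer, Long> lastProcessed = new HashMap<>(checkpoints);
        // Only the partitions that received messages advance.
        for (int partition : partitionsWithMessages) {
            lastProcessed.put(partition, processedOffset);
        }
        // Write the checkpoint, then delete every startpoint (the problematic step).
        checkpoints.putAll(lastProcessed);
        startpoints.clear();
    }

    public static void main(String[] args) {
        StartpointLossDemo demo = new StartpointLossDemo();
        demo.checkpoints.put(0, 100L);
        demo.checkpoints.put(1, 100L);
        // Startpoints ask both partitions to re-bootstrap from offset 0.
        demo.startpoints.put(0, 0L);
        demo.startpoints.put(1, 0L);

        // Partition 1 is idle during this run; assume the container dies afterwards.
        demo.runUntilCommit(Set.of(0), 5L);

        System.out.println("partition 0 resumes at " + demo.resumeOffset(0));
        System.out.println("partition 1 resumes at " + demo.resumeOffset(1));
    }
}
```

   In this model, partition 1 resumes at its old checkpoint (100) instead of 
the startpoint (0), which is the skipped bootstrap described above.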
   
   **Changes**:
   - Keep track of the partitions whose processed offsets have been updated, 
and only delete the startpoints for those partitions after checkpointing.
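
   A minimal sketch of this idea, assuming a simplified offset tracker. The 
class and method names (`SelectiveStartpointCleanup`, `update`, `commit`, 
`resumeOffset`) are hypothetical and do not reflect the actual OffsetManager 
or StartpointManager code in this PR; plain maps stand in for the stores.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: deletes startpoints solely for partitions that made progress.
class SelectiveStartpointCleanup {
    final Map<Integer, Long> checkpoints = new HashMap<>();
    final Map<Integer, Long> startpoints = new HashMap<>();
    private final Map<Integer, Long> lastProcessed = new HashMap<>();
    private final Set<Integer> updatedPartitions = new HashSet<>();

    // Called when a message from the given partition has been processed.
    void update(int partition, long offset) {
        lastProcessed.put(partition, offset);
        updatedPartitions.add(partition);
    }

    // Write the checkpoint, then remove startpoints only for partitions updated since the last commit.
    void commit() {
        checkpoints.putAll(lastProcessed);
        startpoints.keySet().removeAll(updatedPartitions);
        updatedPartitions.clear();
    }

    // Position a partition resumes from: its startpoint if one exists, else its checkpoint.
    long resumeOffset(int partition) {
        Long startpoint = startpoints.get(partition);
        return startpoint != null ? startpoint : checkpoints.get(partition);
    }

    public static void main(String[] args) {
        SelectiveStartpointCleanup tracker = new SelectiveStartpointCleanup();
        tracker.checkpoints.put(0, 100L);
        tracker.checkpoints.put(1, 100L);
        tracker.startpoints.put(0, 0L);
        tracker.startpoints.put(1, 0L);

        tracker.update(0, 5L); // only partition 0 receives a message
        tracker.commit();      // assume the container dies after this commit

        System.out.println("partition 0 resumes at " + tracker.resumeOffset(0));
        System.out.println("partition 1 resumes at " + tracker.resumeOffset(1));
    }
}
```

   Here the idle partition 1 keeps its startpoint across the restart, so it 
still resumes at offset 0, while partition 0 resumes from its new checkpoint.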
   
   **Tests**:
   - Added new unit tests and modified the existing unit tests for the new logic
   
   **API Changes**:
   - Added a new API `removeFanOutForTaskSSPs` in StartpointManager to allow 
cleaning up the fan-outs at partition granularity
   
   **Upgrade Instructions**: None
   **Usage Instructions**: None


