ilooner commented on a change in pull request #1472: DRILL-6747: Fixed IOB in 
HashJoin for inner and right joins on empty probe side
URL: https://github.com/apache/drill/pull/1472#discussion_r219367526
 
 

 ##########
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java
 ##########
 @@ -115,6 +115,14 @@
      *   This value will be returned only after {@link #OK_NEW_SCHEMA} has been
      *   returned at least once (not necessarily <em>immediately</em> after).
      * </p>
+     * <p>
+     *   Also after a RecordBatch returns NONE a RecordBatch should:
+     *   <ul>
+     *     <li>Contain the last valid schema seen by the operator.</li>
+     *     <li>Contain a VectorContainer with empty columns corresponding to 
the last valid schema.</li>
+     *     <li>Return a record count of 0.</li>
+     *   </ul>
+     * </p>
      */
     NONE(false),
 
 Review comment:
   @sohami I agree that guarantee was not provided before. However, I am 
proposing to change the contract in the way described in the comment. Note no 
data would be associated with a record batch after it returns NONE. It would 
only have a VectorContainer with **EMPTY** columns corresponding to the last 
seen valid schema, and return a record count of 0. In practice almost all 
operators already do this (except they don't always zero their container).
   
   The advantage of adding this new guarantee is that downstream operators do 
not have to do book keeping when the upstream can do it for us very easily.
   
   The details of why this was done are in the JIRA. There were many ways to 
fix the issue in HashJoin but this is by far the cleanest, and less risky since 
most of the code in HashJoin has no unit tests. It also has the advantage of 
clarifying the behavior of RecordBatches for future operator implementations. 
So I felt this approach was a win win and the best way to go about fixing the 
issue. However, there should be a follow up Jira to validate the new contract I 
am proposing for the rest of the operators, not just unordered reciever.
   
    

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to