[ 
https://issues.apache.org/jira/browse/DRILL-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285105#comment-16285105
 ] 

ASF GitHub Bot commented on DRILL-5851:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r155939394
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/AbstractRecordBatch.java ---
    @@ -228,4 +228,20 @@ public WritableBatch getWritableBatch() {
       public VectorContainer getOutgoingContainer() {
         throw new UnsupportedOperationException(String.format(" You should not call getOutgoingContainer() for class %s", this.getClass().getCanonicalName()));
       }
    +
    +  public void drainStream(IterOutcome stream, int input, RecordBatch batch) {
    +    if (stream == IterOutcome.OK_NEW_SCHEMA || stream == IterOutcome.OK) {
    +      for (final VectorWrapper<?> wrapper : batch) {
    +        wrapper.getValueVector().clear();
    +      }
    +      batch.kill(true);
    +      stream = next(input, batch);
    +      while (stream == IterOutcome.OK_NEW_SCHEMA || stream == IterOutcome.OK) {
    +        for (final VectorWrapper<?> wrapper : batch) {
    +          wrapper.getValueVector().clear();
    +        }
    +        stream = next(input, batch);
    +      }
    --- End diff --
    
    Let's think a bit about this. Each fragment is synchronous and resides in a 
single thread. The `kill()` call will tell the upstream batch that we don't 
want any more batches. Under what conditions would the upstream operator ignore 
our request and still send us more batches?
    
    Given that the upstream batch is in the same thread, there are no race 
conditions. That is, the upstream can't be busy producing batches and 
adding them to a queue. Why? It is in the same thread, and that thread is 
executing here.
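
To make the single-thread argument concrete, here is a minimal sketch. It assumes a simplified pull model with hypothetical types (`Outcome`, `Upstream`, `CountingSource`); none of these are Drill classes, and the real `RecordBatch` contract may differ. The point is only that the upstream produces a batch when, and only when, the downstream calls `next()` on the same thread, so nothing can pile up between `kill()` and the following `next()`.

```java
// Hypothetical names throughout; this is not Drill's RecordBatch API.
enum Outcome { OK, NONE }

interface Upstream {
  Outcome next();   // produces at most one batch, on the caller's thread
  void kill();      // asks the operator to stop producing
}

// An upstream that holds a fixed number of batches and honors kill().
final class CountingSource implements Upstream {
  private int remaining;
  int produced = 0;  // how many batches were actually handed downstream

  CountingSource(int batches) { this.remaining = batches; }

  @Override public Outcome next() {
    if (remaining == 0) { return Outcome.NONE; }
    remaining--;
    produced++;
    return Outcome.OK;
  }

  @Override public void kill() { remaining = 0; }
}

final class PullModelSketch {
  /** Reads one batch, cancels, then counts batches that still arrive. */
  static int batchesAfterKill(Upstream up) {
    up.next();
    up.kill();
    int late = 0;
    while (up.next() == Outcome.OK) { late++; }
    return late;
  }

  public static void main(String[] args) {
    CountingSource source = new CountingSource(5);
    System.out.println(batchesAfterKill(source)); // prints 0
    System.out.println(source.produced);          // prints 1
  }
}
```

In this model, a well-behaved upstream never delivers a batch after `kill()`, because it only runs inside the caller's `next()`. Whether every real operator behaves this way is exactly the open question.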
    
    If the upstream is a network receiver, then the network layer should handle 
the race conditions so we don't expose those issues to the entire operator 
stack.
    
    Given all of this, I wonder, was this tested? Can it be tested? How can we 
verify that the mechanism actually works other than trying it in production? 
Any way to unit test this?
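
One possible shape for such a test, sketched under heavy assumptions: `FakeUpstream` and `drain` below are hypothetical stand-ins that mirror the structure of the `drainStream()` loop in the diff, not Drill's operator or test-framework classes. The fake can be configured to keep delivering a few batches after `kill()`, so the same test covers both the well-behaved case and the case the loop is meant to guard against.

```java
final class DrainTestSketch {
  enum Outcome { OK, NONE }

  /** Hypothetical stand-in for an upstream operator; not a Drill class. */
  static final class FakeUpstream {
    private int remaining;           // batches left to deliver
    private final int lateBatches;   // batches still delivered after kill()
    private boolean killed = false;
    int heldBuffers = 0;             // batches delivered but not yet cleared
    int nextCallsAfterKill = 0;

    FakeUpstream(int batches, int lateBatches) {
      this.remaining = batches;
      this.lateBatches = lateBatches;
    }

    Outcome next() {
      if (killed) { nextCallsAfterKill++; }
      if (remaining == 0) { return Outcome.NONE; }
      remaining--;
      heldBuffers++;
      return Outcome.OK;
    }

    void kill() {
      killed = true;
      remaining = Math.min(remaining, lateBatches);
    }

    // Models clearing the value vectors of the current batch.
    void clearCurrent() {
      if (heldBuffers > 0) { heldBuffers--; }
    }
  }

  /** Same shape as the drain loop in the diff: clear, kill, read to the end. */
  static void drain(Outcome first, FakeUpstream up) {
    if (first != Outcome.OK) { return; }
    up.clearCurrent();
    up.kill();
    Outcome outcome = up.next();
    while (outcome == Outcome.OK) {
      up.clearCurrent();
      outcome = up.next();
    }
  }

  public static void main(String[] args) {
    FakeUpstream polite = new FakeUpstream(5, 0);
    drain(polite.next(), polite);
    System.out.println(polite.heldBuffers + " " + polite.nextCallsAfterKill); // prints "0 1"

    FakeUpstream stubborn = new FakeUpstream(5, 2);
    drain(stubborn.next(), stubborn);
    System.out.println(stubborn.heldBuffers + " " + stubborn.nextCallsAfterKill); // prints "0 3"
  }
}
```

If the polite case is the only one that can occur in practice, the loop after `kill()` only ever sees the terminal outcome, which would be worth knowing before merging.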


> Empty table during a join operation with a non empty table produces cast 
> exception 
> -----------------------------------------------------------------------------------
>
>                 Key: DRILL-5851
>                 URL: https://issues.apache.org/jira/browse/DRILL-5851
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.11.0
>            Reporter: Hanumath Rao Maduri
>            Assignee: Hanumath Rao Maduri
>
> Hash Join operation on tables with one table empty and the other non empty 
> throws an exception 
> {code} 
> Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts 
> between 1. Numeric data
>  2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARCHAR, Right 
> type: INT. Add explicit casts to avoid this error
> {code}
> Here is an example query with which it is reproducible.
> {code}
> select * from cp.`sample-data/nation.parquet` nation left outer join 
> dfs.tmp.`2.csv` as two on two.a = nation.`N_COMMENT`;
> {code}
> The file 2.csv is empty (i.e., it has no content, not even a header line).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
