[ 
https://issues.apache.org/jira/browse/DRILL-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16186403#comment-16186403
 ] 

Vitalii Diravka commented on DRILL-5826:
----------------------------------------

[~Paul.Rogers]
I have the same observations.

Can we skip this first empty batch the same way the other empty batches are skipped (["skip over empty batches"|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java#L161])?
It looks like the following change would resolve the issue:
{code}
// skip over empty batches. we do this since these are basically control messages.
while (batch != null && batch.getHeader().getDef().getRecordCount() == 0) {
  batch = getNextBatch();
}
{code}
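
As a rough, self-contained illustration of that loop outside of Drill: the sketch below assumes a hypothetical {{Batch}} class that carries only a record count and a plain queue in place of {{getNextBatch()}}. Neither is a Drill class; they only show the control flow.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

final class SkipEmptyBatchesSketch {
    // Hypothetical stand-in for a received batch; only the record count matters here.
    static final class Batch {
        final int recordCount;

        Batch(int recordCount) {
            this.recordCount = recordCount;
        }
    }

    // Returns the first batch with at least one record, or null once the queue is drained.
    static Batch nextNonEmpty(Queue<Batch> incoming) {
        Batch batch = incoming.poll();
        // Skip every empty batch, including the very first one.
        while (batch != null && batch.recordCount == 0) {
            batch = incoming.poll();
        }
        return batch;
    }

    public static void main(String[] args) {
        Queue<Batch> incoming = new ArrayDeque<>(
            Arrays.asList(new Batch(0), new Batch(0), new Batch(1)));
        Batch first = nextNonEmpty(incoming);
        System.out.println(first.recordCount);               // the one-row batch
        System.out.println(nextNonEmpty(incoming) == null);  // queue is now drained
    }
}
```

In this sketch, a stream that contains only empty batches yields {{null}}, so nothing (and no schema) would reach the caller in that case.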


> UnorderedReceiverBatch fails to detect a schema change within a map
> -------------------------------------------------------------------
>
>                 Key: DRILL-5826
>                 URL: https://issues.apache.org/jira/browse/DRILL-5826
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.11.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>
> Run the following HBase query:
> {code}
> select * from `hbase`.browser_action2 a
> {code}
> Table is defined as:
> {code}
> > create 'browser_action2', 'v', {SPLITS => ['0','1','2','3','4','5','6','7','8','9']}
> ...
> > scan 'browser_action2'
> ROW                                   COLUMN+CELL
>  1                                    column=v:e0, timestamp=1506560555979, value=abc1
>  2                                    column=v:e0, timestamp=1506560564807, value=abc2
> {code}
> Step through the {{UnorderedReceiverBatch}} with a parallelization of 1. 
> Observe the following (behavior is random):
> * The first batch has schema (row_key, v) where v is an empty map 
> (corresponding to a column family), but no data (zero rows.)
> * Because the first batch has columns, it is sent downstream with 
> {{OK_NEW_SCHEMA}}.
> * The second batch has schema (row_key, v{e0}), where v is a map with column 
> e0 (corresponding to a column family with one column) and one row.
> * The code loads the batch, asking the batch itself if it has a new schema.
> * The batch does not have a new schema, so it returns false.
> * The {{UnorderedReceiverBatch}} returns {{OK}}, indicating to the downstream 
> operator that the second batch has the same schema as the first (which, in 
> this case, is not true).
> Code in question:
> {code}
>       final boolean schemaChanged = batchLoader.load(rbd, batch.getBody());
> {code}
> In point of fact, each sender has no visibility into the schemas of other 
> senders, and the order in which batches arrive is undefined. Therefore, an 
> input batch has no way of knowing whether it has the same schema as the 
> previous output batch.
> The obvious, correct logic is to compare the incoming batch schema with the 
> current receiver schema, and send {{OK}} or {{OK_NEW_SCHEMA}} based on the 
> result of that comparison.
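
The comparison described in the issue can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Drill code: the {{ReceiverSchemaSketch}} class, its {{Outcome}} enum, and the representation of a schema as an ordered list of column paths are all hypothetical. The receiver remembers the schema it last sent downstream and compares each incoming schema against it, rather than asking the incoming batch whether its own schema changed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ReceiverSchemaSketch {
    // Hypothetical stand-ins for the OK and OK_NEW_SCHEMA outcomes named in the issue.
    enum Outcome { OK, OK_NEW_SCHEMA }

    // Schema of the last batch sent downstream; null until the first batch arrives.
    private List<String> currentSchema;

    // Compares the incoming schema with the receiver's own current schema,
    // independent of which sender produced the batch.
    Outcome receive(List<String> incomingSchema) {
        if (incomingSchema.equals(currentSchema)) {
            return Outcome.OK;
        }
        // Keep a private copy so later changes by the caller cannot alter it.
        currentSchema = new ArrayList<>(incomingSchema);
        return Outcome.OK_NEW_SCHEMA;
    }

    public static void main(String[] args) {
        ReceiverSchemaSketch receiver = new ReceiverSchemaSketch();
        // First batch: (row_key, v) with v an empty map.
        System.out.println(receiver.receive(Arrays.asList("row_key", "v")));
        // Second batch: (row_key, v{e0}); the map gained a member, so the schema differs.
        System.out.println(receiver.receive(Arrays.asList("row_key", "v", "v.e0")));
        // Third batch with the same schema as the second.
        System.out.println(receiver.receive(Arrays.asList("row_key", "v", "v.e0")));
    }
}
```

With the two batches from the description, this sketch reports a new schema for both the first and the second batch, and {{OK}} only once a batch repeats the previous schema.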



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
