[jira] [Commented] (DRILL-5546) Schema change problems caused by empty batch

ASF GitHub Bot (JIRA) Mon, 21 Aug 2017 12:54:27 -0700

    [ https://issues.apache.org/jira/browse/DRILL-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135707#comment-16135707 ]


ASF GitHub Bot commented on DRILL-5546:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/906#discussion_r134301372
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
    @@ -152,97 +157,75 @@ public void kill(boolean sendUpstream) {
         }
       }
     
    -  private void releaseAssets() {
    -    container.zeroVectors();
    -  }
    -
    -  private void clearFieldVectorMap() {
    -    for (final ValueVector v : mutator.fieldVectorMap().values()) {
    -      v.clear();
    -    }
    -  }
    -
       @Override
       public IterOutcome next() {
         if (done) {
           return IterOutcome.NONE;
         }
         oContext.getStats().startProcessing();
         try {
    -      try {
    -        injector.injectChecked(context.getExecutionControls(), "next-allocate", OutOfMemoryException.class);
    -
    -        currentReader.allocate(mutator.fieldVectorMap());
    -      } catch (OutOfMemoryException e) {
    -        clearFieldVectorMap();
    -        throw UserException.memoryError(e).build(logger);
    -      }
    -      while ((recordCount = currentReader.next()) == 0) {
    +      while (true) {
             try {
    -          if (!readers.hasNext()) {
    -            // We're on the last reader, and it has no (more) rows.
    -            currentReader.close();
    -            releaseAssets();
    -            done = true;  // have any future call to next() return NONE
    -
    -            if (mutator.isNewSchema()) {
    -              // This last reader has a new schema (e.g., we have a zero-row
    -              // file or other source).  (Note that some sources have a non-
    -              // null/non-trivial schema even when there are no rows.)
    +          injector.injectChecked(context.getExecutionControls(), "next-allocate", OutOfMemoryException.class);
    +          currentReader.allocate(mutator.fieldVectorMap());
    +        } catch (OutOfMemoryException e) {
    +          clearFieldVectorMap();
    +          throw UserException.memoryError(e).build(logger);
    +        }
     
    -              container.buildSchema(SelectionVectorMode.NONE);
    -              schema = container.getSchema();
    +        recordCount = currentReader.next();
    +        Preconditions.checkArgument(recordCount >= 0,
    +            "recordCount from RecordReader.next() should not be negative");
     
    -              return IterOutcome.OK_NEW_SCHEMA;
    -            }
    -            return IterOutcome.NONE;
    -          }
    -          // At this point, the reader that hit its end is not the last reader.
    +        boolean isNewRegularSchema = mutator.isNewSchema();
    +        // We should skip the reader, when recordCount = 0 && ! isNewRegularSchema.
    +        // Add/set implicit column vectors, only when reader gets > 0 row, or
    +        // when reader gets 0 row but with a schema with new field added
    +        if (recordCount > 0 || isNewRegularSchema) {
    +          addImplicitVectors();
    +          populateImplicitVectors();
    +        }
     
    -          // If all the files we have read so far are just empty, the schema is not useful
    -          if (! hasReadNonEmptyFile) {
    -            container.clear();
    -            clearFieldVectorMap();
    -            mutator.clear();
    -          }
    +        boolean isNewImplicitSchema = mutator.isNewSchema();
    +        for (VectorWrapper<?> w : container) {
    +          w.getValueVector().getMutator().setValueCount(recordCount);
    +        }
    +        final boolean isNewSchema = isNewRegularSchema || isNewImplicitSchema;
    +        oContext.getStats().batchReceived(0, recordCount, isNewSchema);
     
    +        if (recordCount == 0) {
               currentReader.close();
    -          currentReader = readers.next();
    -          implicitValues = implicitColumns.hasNext() ? implicitColumns.next() : null;
    -          currentReader.setup(oContext, mutator);
    -          try {
    -            currentReader.allocate(mutator.fieldVectorMap());
    -          } catch (OutOfMemoryException e) {
    -            clearFieldVectorMap();
    -            throw UserException.memoryError(e).build(logger);
    +          if (isNewSchema) {
    +            // current reader presents a new schema in mutator even though it has 0 row.
    --- End diff --
    
    Thanks for the comments here and in the PR. Could the PR description be 
moved into a class Javadoc comment so that it is available to future readers? 
That would also make the code easier to review, since the description would 
sit close to the code.


> Schema change problems caused by empty batch
> --------------------------------------------
>
>                 Key: DRILL-5546
>                 URL: https://issues.apache.org/jira/browse/DRILL-5546
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>
> Several JIRAs have been opened for schema change failures caused by empty 
> batches. This JIRA serves as an umbrella for all of those related JIRAs 
> (such as DRILL-4686, DRILL-4734, DRILL-4476, DRILL-4255, etc.).
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
