vdiravka commented on pull request #2364:
URL: https://github.com/apache/drill/pull/2364#issuecomment-1030429509


   Hi @paul-rogers I have rebased the branch to master branch. And in separate 
new commit removed the hack, which hid the schema change in the 
`HashAggTemplate` (and actually one row is missing in query result, just 
actually test case doesn't check it).
   Thanks for explanation how vectors is working, it helped me. It is clear 
now, that schema is changing due to `RepeatedMapVector` [can't be obtained from 
the 
cache](https://github.com/apache/drill/blob/317f164791bbbe8f937eb452b49e92c34f1c0333/exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ColumnBuilder.java#L220):
   ```
         // Don't get the map vector from the vector cache. Map vectors may
         // have content that varies from batch to batch. Only the leaf
         // vectors can be cached.
   ```
   Obtaining vector from cache here leads to errors in this and other test 
cases:
   `mapVector = (RepeatedMapVector) 
parent.vectorCache().vectorFor(mapColSchema.schema());`
   
   ```
   org.apache.drill.common.exceptions.UserRemoteException: EXECUTION_ERROR 
ERROR: null
   
   Read failed for reader: JsonBatchReader
   ....
   Caused by: java.lang.AssertionError: 
        at 
org.apache.drill.exec.physical.resultSet.impl.TupleState$MapState.addOutputColumn(TupleState.java:475)
        at 
org.apache.drill.exec.physical.resultSet.impl.ColumnState.buildOutput(ColumnState.java:321)
        at 
org.apache.drill.exec.physical.resultSet.impl.TupleState.updateOutput(TupleState.java:206)
        at 
org.apache.drill.exec.physical.resultSet.impl.TupleState.updateOutput(TupleState.java:217)
        at 
org.apache.drill.exec.physical.resultSet.impl.TupleState$RowState.updateOutput(TupleState.java:430)
        at 
org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl.harvest(ResultSetLoaderImpl.java:716)
   ```
   So as for me looks like we need to implement supporting schema change for 
hashAgg operator or obtaining `RepeatedMapVector` from the cache. I lean 
towards the latter. What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to