[
https://issues.apache.org/jira/browse/DRILL-8511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882786#comment-17882786
]
ASF GitHub Bot commented on DRILL-8511:
---------------------------------------
rymarm opened a new pull request, #2943:
URL: https://github.com/apache/drill/pull/2943
# [DRILL-8511](https://issues.apache.org/jira/browse/DRILL-8511): Overflow appeared when the batch reached rows limit
## Description
The size-aware scan framework fails to end a batch: on batch end it unexpectedly tries to reallocate the vector. The cause is a hidden, minor bug in `BitColumnWriter` that is normally harmless, but surfaces in one specific case: when the initial vector allocation size limit has been exceeded and the reader reaches the batch row limit at the same time. `BitColumnWriter` uses the value count instead of the write index, which makes the writer believe it needs one more slot than was actually written and triggers the unexpected vector reallocation (see the changes).
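The off-by-one can be sketched with a hypothetical, simplified example (the class, method names, and the 4096-value limit below are illustrative only, not Drill's actual implementation): a value count of N means indexes 0..N-1 were written, so treating the count itself as a write position makes an exactly-full batch look like it needs one more slot and forces a resize.

```java
// Hypothetical, simplified sketch of the off-by-one described above.
// Not Drill's real BitColumnWriter; just the arithmetic of the bug.
public class BitWriterSketch {
    static final int CAPACITY = 4096;  // pretend vector allocation limit, in values

    // Returns true if writing at `index` would force a reallocation.
    static boolean needsResize(int index) {
        return index >= CAPACITY;
    }

    public static void main(String[] args) {
        int valueCount = CAPACITY;  // batch is exactly full, nothing overflowed

        // Buggy variant: passes the value count as if it were a write position,
        // so a full-but-not-overflowing batch appears to need one more slot.
        boolean buggy = needsResize(valueCount);

        // Fixed variant: on endWrite() only the last written index matters.
        boolean fixed = needsResize(valueCount - 1);

        System.out.println("buggy=" + buggy + " fixed=" + fixed);
    }
}
```

With the buggy arithmetic the resize path runs during `endWrite()`, which is exactly when the result set loader is in the `FULL_BATCH` state and treats any overflow as illegal, matching the stack trace in the issue below.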
## Documentation
No changes required.
## Testing
Manual tests
> Overflow appeared when the batch reached rows limit
> ---------------------------------------------------
>
> Key: DRILL-8511
> URL: https://issues.apache.org/jira/browse/DRILL-8511
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.21.2
> Reporter: Maksym Rymar
> Assignee: Maksym Rymar
> Priority: Major
> Attachments: complex.zip
>
>
>
> Drill fails to read a JSON file with the exception:
> {{java.lang.IllegalStateException: Unexpected state: FULL_BATCH}}
> {code:java}
> Caused by: java.lang.IllegalStateException: Unexpected state: FULL_BATCH
> at org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl.overflowed(ResultSetLoaderImpl.java:639)
> at org.apache.drill.exec.physical.resultSet.impl.ColumnState$PrimitiveColumnState.overflowed(ColumnState.java:73)
> at org.apache.drill.exec.vector.accessor.writer.BaseScalarWriter.overflowed(BaseScalarWriter.java:214)
> at org.apache.drill.exec.vector.accessor.writer.AbstractFixedWidthWriter.resize(AbstractFixedWidthWriter.java:249)
> at org.apache.drill.exec.vector.accessor.writer.BitColumnWriter.prepareWrite(BitColumnWriter.java:77)
> at org.apache.drill.exec.vector.accessor.writer.BitColumnWriter.setValueCount(BitColumnWriter.java:87)
> at org.apache.drill.exec.vector.accessor.writer.AbstractFixedWidthWriter.endWrite(AbstractFixedWidthWriter.java:299)
> at org.apache.drill.exec.vector.accessor.writer.NullableScalarWriter.endWrite(NullableScalarWriter.java:298)
> at org.apache.drill.exec.vector.accessor.writer.AbstractTupleWriter.endWrite(AbstractTupleWriter.java:366)
> at org.apache.drill.exec.physical.resultSet.impl.RowSetLoaderImpl.endBatch(RowSetLoaderImpl.java:101)
> at org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl.harvestNormalBatch(ResultSetLoaderImpl.java:730)
> at org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl.harvest(ResultSetLoaderImpl.java:700)
> at org.apache.drill.exec.physical.impl.scan.project.ReaderSchemaOrchestrator.endBatch(ReaderSchemaOrchestrator.java:137)
> at org.apache.drill.exec.physical.impl.scan.framework.ShimBatchReader.next(ShimBatchReader.java:148)
> at org.apache.drill.exec.physical.impl.scan.ReaderState.readBatch(ReaderState.java:400)
> at org.apache.drill.exec.physical.impl.scan.ReaderState.next(ReaderState.java:361)
> at org.apache.drill.exec.physical.impl.scan.ScanOperatorExec.nextAction(ScanOperatorExec.java:270)
> at org.apache.drill.exec.physical.impl.scan.ScanOperatorExec.next(ScanOperatorExec.java:242)
> at org.apache.drill.exec.physical.impl.protocol.OperatorDriver.doNext(OperatorDriver.java:201)
> at org.apache.drill.exec.physical.impl.protocol.OperatorDriver.start(OperatorDriver.java:179)
> at org.apache.drill.exec.physical.impl.protocol.OperatorDriver.next(OperatorDriver.java:129)
> at org.apache.drill.exec.physical.impl.protocol.OperatorRecordBatch.next(OperatorRecordBatch.java:149)
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:101)
> at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59)
> at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:93)
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:161)
> at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:103)
> at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
> at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93)
> at org.apache.drill.exec.work.fragment.FragmentExecutor.lambda$run$0(FragmentExecutor.java:324)
> at .......(:0)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2012)
> at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:313)
> at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> at .......(:0) {code}
> The overflow appears when a batch reaches the row limit in the JSON reader.
> To reproduce the issue, execute the following query against the attached
> file:
>
> {code:java}
> SELECT id,
> gbyi,
> gbyt,
> fl,
> nul,
> bool,
> str,
> sia,
> sfa,
> soa,
> ooa,
> oooi,
> ooof,
> ooos,
> oooa
> FROM dfs.tmp.`complex.json` {code}
>
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)