Hi Charles,
Looks like I forgot the extra newlines needed to make my e-mail provider work
with Apache's mailer. Let me try again.
In the "classic" readers, each reader picks some number of rows per batch,
often 1K, 4K, 4000, etc. The idea is that, on average, this row count will give
us a decently-sized record batch.
EVF uses a different approach. It tallies the total space used by the batch
and the space taken by each vector, and uses those sizes to decide when the
batch is full.
(You can also set a row count limit. By default, the limit is 64K, the maximum
row count allowed in a Drill batch.)
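To make the size-based idea concrete, here is a toy Java model. It is only a sketch under stated assumptions and is not Drill's actual loader: the class BatchSizeModel, its methods, and the batchByteLimit parameter are invented for illustration. The 16 MB per-vector cap and the 64K row limit are the figures mentioned in this thread. The model also ignores what happens to the row that overflows a batch.

```java
/** Toy model of size-based batch limits. Hypothetical; not Drill code. */
final class BatchSizeModel {
  static final long MAX_VECTOR_BYTES = 16L * 1024 * 1024; // per-vector cap from this thread
  static final int DEFAULT_ROW_LIMIT = 64 * 1024;         // 64K rows

  private final long[] vectorBytes;   // running size of each column's vector
  private final long batchByteLimit;  // hypothetical cap on the whole batch
  private final int rowLimit;
  private long totalBytes;
  private int rowCount;

  BatchSizeModel(int columnCount, long batchByteLimit, int rowLimit) {
    this.vectorBytes = new long[columnCount];
    this.batchByteLimit = batchByteLimit;
    this.rowLimit = rowLimit;
  }

  /** Records one row; bytesPerColumn[i] is the size written to column i. */
  void addRow(int... bytesPerColumn) {
    if (bytesPerColumn.length != vectorBytes.length) {
      throw new IllegalArgumentException("one size per column is required");
    }
    if (isFull()) {
      throw new IllegalStateException("batch is full");
    }
    for (int i = 0; i < vectorBytes.length; i++) {
      vectorBytes[i] += bytesPerColumn[i];
      totalBytes += bytesPerColumn[i];
    }
    rowCount++;
  }

  /** Full when the row limit, the batch size limit, or any vector cap is hit. */
  boolean isFull() {
    if (rowCount >= rowLimit || totalBytes >= batchByteLimit) {
      return true;
    }
    for (long bytes : vectorBytes) {
      if (bytes >= MAX_VECTOR_BYTES) {
        return true;
      }
    }
    return false;
  }

  int rowCount() {
    return rowCount;
  }

  /** Adds identical rows until the batch reports full; returns the row count. */
  static int fill(BatchSizeModel batch, int... bytesPerColumn) {
    while (!batch.isFull()) {
      batch.addRow(bytesPerColumn);
    }
    return batch.rowCount();
  }

  public static void main(String[] args) {
    // Narrow rows: the row count limit is reached first.
    System.out.println(fill(new BatchSizeModel(2, Long.MAX_VALUE, DEFAULT_ROW_LIMIT), 4, 8));
    // 1 MB values: the per-vector cap is reached first.
    System.out.println(fill(new BatchSizeModel(1, Long.MAX_VALUE, DEFAULT_ROW_LIMIT), 1024 * 1024));
  }
}
```

With narrow rows the model stops at the row limit, and with 1 MB values it stops after a handful of rows. That is why a fixed, reader-chosen row count is a poor proxy for batch size.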
So, if you are converting from a "classic" reader to EVF, you will want to
remove the reader-imposed row count limit. (Or, if there is reason to do so,
you can pass that limit to EVF and let EVF enforce it.)
For example, convert code that looks like this:
  for (int rowCount = 0; rowCount < MAX_BATCH_SIZE; rowCount++) {
    // Load a row
  }
Into code that looks like this:
  RowSetLoader rowWriter = ...; // get from EVF, per the examples
  while (! rowWriter.isFull()) {
    // Load a row
    rowWriter.save();
  }
Note that the "save" is needed because EVF lets you discard a row: you can
load it, check it, and decide to skip it. This will, eventually, allow us to
push filtering down to the reader. For now, you just need to call save().
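The load, check, and skip idea can be sketched with a toy stand-in for the row writer. Everything here is hypothetical and is not Drill's RowSetLoader: ToyRowWriter, its load() method, and SkipRowsExample exist only for illustration, and the "empty record" test is an arbitrary stand-in for whatever check a real reader would apply. The loop also stops when the input runs out, which the short snippets above leave implicit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a row writer; not Drill's RowSetLoader. */
final class ToyRowWriter {
  private final int rowLimit;
  private final List<String> saved = new ArrayList<>();
  private String pending; // row being loaded, not yet saved

  ToyRowWriter(int rowLimit) {
    this.rowLimit = rowLimit;
  }

  boolean isFull() {
    return saved.size() >= rowLimit;
  }

  /** Loads a value into the in-flight row, replacing any unsaved one. */
  void load(String value) {
    pending = value;
  }

  /** Commits the in-flight row to the batch. */
  void save() {
    if (isFull()) {
      throw new IllegalStateException("Unexpected state: FULL_BATCH");
    }
    if (pending == null) {
      throw new IllegalStateException("no row loaded");
    }
    saved.add(pending);
    pending = null;
  }

  List<String> rows() {
    return saved;
  }
}

final class SkipRowsExample {
  /** Fills one batch starting at index 'from'; returns the next unread index. */
  static int readBatch(List<String> input, int from, ToyRowWriter writer) {
    int next = from;
    while (!writer.isFull() && next < input.size()) {
      String record = input.get(next++);
      writer.load(record);
      if (record.isEmpty()) {
        continue; // checked and rejected: no save(), so the row is dropped
      }
      writer.save();
    }
    return next;
  }

  public static void main(String[] args) {
    ToyRowWriter writer = new ToyRowWriter(3);
    int next = readBatch(Arrays.asList("a", "", "b", "c", "d"), 0, writer);
    System.out.println(writer.rows() + " next=" + next); // [a, b, c] next=4
  }
}
```

The skipped record costs nothing in the batch because it was never saved, and the returned index tells the caller where the next batch should resume.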
Thanks,
- Paul
On Tuesday, September 24, 2019, 09:56:34 AM PDT, Paul Rogers wrote:
So the usual pattern is:
  while (! rowWriter.isFull()) {
    // Load the row
    rowWriter.save();
  }
Is it the case that PCAP is trying to force the row count to, say 4K or 8K or
whatever? If so, ignore that count.
The error is telling you that at least one vector has reached 16 MB in size (or
that you've reached the row count limit, if you set one).
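A toy reproduction of that failure mode, as a sketch only: ToyLoader and FullBatchDemo are hypothetical names, and the capacity of 100 rows is arbitrary. The stand-in simply refuses save() once it is full, which mirrors the "Unexpected state: FULL_BATCH" message in the trace without claiming to match Drill's internals.

```java
/** Hypothetical stand-in that fills up after a fixed capacity; not Drill code. */
final class ToyLoader {
  private final int capacity;
  private int rows;

  ToyLoader(int capacity) {
    this.capacity = capacity;
  }

  boolean isFull() {
    return rows >= capacity;
  }

  void save() {
    if (isFull()) {
      throw new IllegalStateException("Unexpected state: FULL_BATCH");
    }
    rows++;
  }

  int rowCount() {
    return rows;
  }
}

final class FullBatchDemo {
  /** Reader-imposed row count: keeps saving even after the loader fills. */
  static int fixedCountLoop(ToyLoader loader, int readerBatchSize) {
    for (int i = 0; i < readerBatchSize; i++) {
      loader.save(); // throws once the loader is full
    }
    return loader.rowCount();
  }

  /** Loader-driven loop: stops as soon as the loader reports full. */
  static int isFullLoop(ToyLoader loader) {
    while (!loader.isFull()) {
      loader.save();
    }
    return loader.rowCount();
  }

  public static void main(String[] args) {
    System.out.println(isFullLoop(new ToyLoader(100))); // 100
    try {
      fixedCountLoop(new ToyLoader(100), 4096);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // Unexpected state: FULL_BATCH
    }
  }
}
```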
Thanks,
- Paul
On Monday, September 23, 2019, 08:09:38 PM PDT, Charles Givre wrote:
Ok... so I have yet another question relating to the EVF. I'm working on a
project to improve (hopefully) the PCAP plugin with the ultimate goal being to
include parsed PCAP packet data. In any event, I've run into a snag. In one
unit test, I'm getting the error below when I call rowWriter.save(). I suspect
what is happening is that the TupleWriter is not full when it starts writing
the row, but by the time it is finished writing the fields, the batch is full.
Does that even make sense?
Here's a link to the offending code
<https://github.com/cgivre/drill/blob/caa69e7f27f68aeedaa28902596a2250ab32cd84/exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapBatchReader.java#L232>:
Any suggestions?
Thanks!
--C
[Error Id: bf6446fe-95d5-4ba7-bf7f-32e8a5dd4d1e on 192.168.1.21:31010]
(java.lang.IllegalStateException) Unexpected state: FULL_BATCH
org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl.saveRow():530
org.apache.drill.exec.physical.resultSet.impl.RowSetLoaderImpl.save():73
org.apache.drill.exec.store.pcap.PcapBatchReader.addDataToTable():232
org.apache.drill.exec.store.pcap.PcapBatchReader.parsePcapFilesAndPutItToTable():174
org.apache.drill.exec.store.pcap.PcapBatchReader.next():102
org.apache.drill.exec.physical.impl.scan.framework.ShimBatchReader.next():132
org.apache.drill.exec.physical.impl.scan.ReaderState.readBatch():412
org.apache.drill.exec.physical.impl.scan.ReaderState.next():369
org.apache.drill.exec.physical.impl.scan.ScanOperatorExec.nextAction():261
org.apache.drill.exec.physical.impl.scan.ScanOperatorExec.next():232
org.apache.drill.exec.physical.impl.protocol.OperatorDriver.doNext():196
org.apache.drill.exec.physical.impl.protocol.OperatorDriver.start():174
org.apache.drill.exec.physical.impl.protocol.OperatorDriver.next():124
org.apache.drill.exec.physical.impl.protocol.OperatorRecordBatch.next():148
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():237
org.apache.drill.exec.record.AbstractRecordBatch.next():126
org.apache.drill.exec.record.AbstractRecordBatch.next():116
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
org.apache.drill.exec.record.AbstractRecordBatch.next():186
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():237
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1746
org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748
at org.apache.drill.exec.rpc.RpcException.mapException(RpcException.java:60) ~[classes/:na]
at org.apache.drill.exec.client.DrillClient$ListHoldingResultsListener.getResults(DrillClient.java:881) ~[classes/:na]
at org.apache.drill.exec.client.DrillClient.runQuery(DrillClient.java:583) ~[classes/:na]
at org.apache.drill.test.BaseTestQuery.testRunAndReturn(BaseTestQuery.java:340) ~[test-classes/:na]
at org.apache.drill.test.BaseTestQuery.testSqlWithResults(BaseTestQuery.java:321) ~[test-classes/:na]
at org.apache.drill.exec.store.pcap.TestPcapRecordReader.runSQLWithResults(TestPcapRecordReader.java:90) ~[test-classes/:na]
at org.apache.drill.exec.store.pcap.TestPcapRecordReader.runSQLVerifyCount(TestPcapRecordReader.java:85) ~[test-classes/:na]
at org.apache.drill.exec.store.pcap.TestPcapRecordReader.testCorruptPCAPQuery(TestPcapRecordReader.java:47) ~[test-classes/:na]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_211]
Caused by: org.apache.drill.common.exceptions.UserRemoteException: EXECUTION_ERROR ERROR: Unexpected state: FULL_BATCH
Read failed for reader PcapBatchReader
Fragment 0:0
[Error Id: bf6446fe-95d5-4ba7-bf7f-32e8a5dd4d1e on 192.168.1.21:31010]
(java.lang.IllegalStateException) Unexpected state: FULL_BATCH
org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl.saveRow():530
org.apache.drill.exec.physical.resultSet.impl.RowSetLoaderImpl.save():73
org.apache.drill.exec.store.pcap.PcapBatchReader.addDataToTable():232
org.apache.drill.exec.store.pcap.PcapBatchReader.parsePcapFilesAndPutItToTable():174
org.apache.drill.exec.store.pcap.PcapBatchReader.next():102
org.apache.drill.exec.physical.impl.scan.framework.ShimBatchReader.next():132
org.apache.drill.exec.physical.impl.scan.ReaderState.readBatch():412
org.apache.drill.exec.physical.impl.scan.ReaderState.next():369
org.apache.drill.exec.physical.impl.scan.ScanOperatorExec.nextAction():261
org.apache.drill.exec.physical.impl.scan.ScanOperatorExec.next():232
org.apache.drill.exec.physical.impl.protocol.OperatorDriver.doNext():196
org.apache.drill.exec.physical.impl.protocol.OperatorDriver.start():174
org.apache.drill.exec.physical.impl.protocol.OperatorDriver.next():124
org.apache.drill.exec.physical.impl.protocol.OperatorRecordBatch.next():148
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():237
org.apache.drill.exec.record.AbstractRecordBatch.next():126
org.apache.drill.exec.record.AbstractRecordBatch.next():116
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
org.apache.drill.exec.record.AbstractRecordBatch.next():186
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():237
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1746
org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748