[ 
https://issues.apache.org/jira/browse/DRILL-7324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985731#comment-16985731
 ] 

ASF GitHub Bot commented on DRILL-7324:
---------------------------------------

paul-rogers commented on pull request #1912: DRILL-7324: Final set of "batch 
count" fixes
URL: https://github.com/apache/drill/pull/1912#discussion_r352383577
 
 

 ##########
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/statistics/StatisticsMergeBatch.java
 ##########
 @@ -82,17 +83,27 @@
  *       "sales_city" : BIGINT - nonnullstatcount(sales_city)
  *       "cnt"        : BIGINT - nonnullstatcount(cnt)
  *   .... another map for next stats function ....
+ * </pre>
+ * <p>
+ * Note that the above schema is not a valid Drill record batch: the varous
 
 Review comment:
   Thanks for the note. Taking a closer look, you are right: the batch does 
create just one record. The handling of the `recordCount` variable was 
misleading.
   
   I adjusted the code and now the vector checks pass. This is cool as it means 
all batches pass all checks and I was able to remove the special case code from 
the batch verifier.
   
   I recall having a long conversation about this design back when the PR was 
first submitted. A better design would be to use the new DICT type to convert 
the maps into DICTs of (field, value) pairs. I think a DICT of fields and 
values would be simpler and clearer than a wide MAP whose structure differs 
for every input file. Anyway, something to consider for later.
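
   As a rough illustration of the idea only: the sketch below uses plain Java 
collections, not Drill's actual MAP or DICT vector classes, and the names 
`StatsShapes`, `Entry` and `toEntries` as well as the counts are made up. It 
shows how a wide map with one member per input column can be flattened into 
(field, value) pairs whose entry schema is the same for every input file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain collections standing in for Drill's MAP and DICT types.
final class StatsShapes {

  /** One (field, value) pair. Its schema does not depend on the input columns. */
  static final class Entry {
    final String field;
    final long value;

    Entry(String field, long value) {
      this.field = field;
      this.value = value;
    }
  }

  /**
   * Flattens a "wide" per-statistic map (one member per input column) into a
   * list of (field, value) entries, preserving the map's iteration order.
   */
  static List<Entry> toEntries(Map<String, Long> wideMap) {
    List<Entry> entries = new ArrayList<>();
    for (Map.Entry<String, Long> e : wideMap.entrySet()) {
      entries.add(new Entry(e.getKey(), e.getValue()));
    }
    return entries;
  }

  public static void main(String[] args) {
    // Wide shape: the set of members changes with the columns of each input file.
    Map<String, Long> nonNullCounts = new LinkedHashMap<>();
    nonNullCounts.put("sales_city", 1000L); // made-up count
    nonNullCounts.put("cnt", 1000L);        // made-up count

    // DICT-like shape: always the same two fields per entry.
    for (Entry e : toEntries(nonNullCounts)) {
      System.out.println(e.field + " -> " + e.value);
    }
  }
}
```

   With the second shape, adding or removing an input column changes the number 
of entries but not the schema of the batch.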
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Many vector-validity errors from unit tests
> -------------------------------------------
>
>                 Key: DRILL-7324
>                 URL: https://issues.apache.org/jira/browse/DRILL-7324
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> Drill's value vectors contain many counts that must be maintained in sync. 
> Drill provides a utility, {{BatchValidator}}, to check (a subset of) these 
> values for consistency.
> The {{IteratorValidatorBatchIterator}} class is used in tests to validate the 
> state of each operator (AKA "record batch") as Drill runs the Volcano 
> iterator. This class can also validate vectors by setting the 
> {{VALIDATE_VECTORS}} constant to {{true}}.
> This was done, then unit tests were run. Many tests failed. Examples:
> {noformat}
> [INFO] Running org.apache.drill.TestUnionDistinct
> 18:44:26.742 [22d42585-74c2-d418-6f59-9b1870d04770:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> LimitRecordBatch
> key - NullableBitVector: Row count = 0, but value count = 2
> 18:44:26.745 [22d42585-74c2-d418-6f59-9b1870d04770:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> LimitRecordBatch
> key - NullableBitVector: Row count = 0, but value count = 2
> [INFO] Running org.apache.drill.TestUnionDistinct
> 8:44:48.302 [22d4256e-c90b-847c-5104-02d6cdf5223e:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> LimitRecordBatch
> key - NullableBitVector: Row count = 0, but value count = 2
> 18:44:48.703 [22d4256e-ccf3-2af6-f56a-140e9c3e55bb:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> n_nationkey - IntVector: Row count = 2, but value count = 25
> n_regionkey - IntVector: Row count = 2, but value count = 25
> 18:44:48.731 [22d4256e-ccf3-2af6-f56a-140e9c3e55bb:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> n_nationkey - IntVector: Row count = 4, but value count = 25
> n_regionkey - IntVector: Row count = 4, but value count = 25
> 18:44:49.039 [22d4256f-6b39-d2ab-d145-4f2b0db315a3:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> n_nationkey - IntVector: Row count = 2, but value count = 25
> 18:44:49.363 [22d4256e-3d91-850f-9ab4-5939219ac0d0:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> c_custkey - IntVector: Row count = 4, but value count = 1500
> 18:44:49.597 [22d4256d-c113-ae5c-6f31-4dd1ec091365:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> n_nationkey - IntVector: Row count = 5, but value count = 25
> n_regionkey - IntVector: Row count = 5, but value count = 25
> 18:44:49.610 [22d4256d-c113-ae5c-6f31-4dd1ec091365:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> r_regionkey - IntVector: Row count = 1, but value count = 5
> 18:44:53.029 [22d4256a-8b70-5f3b-f79b-806e194c5ed2:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> LimitRecordBatch
> n_nationkey - IntVector: Row count = 0, but value count = 25
> n_name - VarCharVector: Row count = 0, but value count = 25
> n_regionkey - IntVector: Row count = 0, but value count = 25
> 18:44:53.033 [22d4256a-8b70-5f3b-f79b-806e194c5ed2:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> LimitRecordBatch
> n_regionkey - IntVector: Row count = 5, but value count = 25
> 18:44:53.331 [22d4256a-526c-7815-c216-8e45752a4a6c:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> LimitRecordBatch
> n_nationkey - IntVector: Row count = 5, but value count = 25
> n_name - VarCharVector: Row count = 5, but value count = 25
> n_regionkey - IntVector: Row count = 5, but value count = 25
> 18:44:53.337 [22d4256a-526c-7815-c216-8e45752a4a6c:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> LimitRecordBatch
> n_regionkey - IntVector: Row count = 0, but value count = 25
> 18:44:53.646 [22d42569-c293-ced0-c3d0-e9153cc4a70a:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> LimitRecordBatch
> key - NullableBitVector: Row count = 0, but value count = 2
> Running org.apache.drill.TestTpchSingleMode
> 18:45:01.299 [22d42563-0ed6-1501-86a1-4cb375a9cad4:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> Running org.apache.drill.TestMergeFilterPlan
> 18:45:03.738 [22d4255f-b322-fd56-2f93-34b7f5c709c1:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> o_orderkey - IntVector: Row count = 561, but value count = 15000
> o_orderdate - DateVector: Row count = 561, but value count = 15000
> o_orderpriority - VarCharVector: Row count = 561, but value count = 15000
> 18:45:03.828 [22d4255f-b322-fd56-2f93-34b7f5c709c1:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> l_orderkey - IntVector: Row count = 20580, but value count = 32767
> l_commitdate - DateVector: Row count = 20580, but value count = 32767
> l_receiptdate - DateVector: Row count = 20580, but value count = 32767
> 18:45:03.990 [22d4255f-b322-fd56-2f93-34b7f5c709c1:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> l_orderkey - IntVector: Row count = 17317, but value count = 27408
> l_commitdate - DateVector: Row count = 17317, but value count = 27408
> l_receiptdate - DateVector: Row count = 17317, but value count = 27408
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.041 
> s - in org.apache.drill.TestMergeFilterPlan
> 18:45:04.929 [22d4255f-040c-f4c9-7d23-b90702db4a1e:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> o_orderkey - IntVector: Row count = 2287, but value count = 15000
> o_custkey - IntVector: Row count = 2287, but value count = 15000
> o_orderdate - DateVector: Row count = 2287, but value count = 15000
> 18:45:04.944 [22d4255f-040c-f4c9-7d23-b90702db4a1e:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> r_regionkey - IntVector: Row count = 1, but value count = 5
> r_name - VarCharVector: Row count = 1, but value count = 5
> [INFO] Running org.apache.drill.TestSelectWithOption
> 18:45:06.120 [22d4255e-5f13-aabb-40bb-bd09dc3d35e1:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> l_quantity - Float8Vector: Row count = 594, but value count = 32767
> l_extendedprice - Float8Vector: Row count = 594, but value count = 32767
> l_discount - Float8Vector: Row count = 594, but value count = 32767
> l_shipdate - DateVector: Row count = 594, but value count = 32767
> 18:45:06.156 [22d4255e-5f13-aabb-40bb-bd09dc3d35e1:frag:0:0] ERROR 
> o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from 
> FilterRecordBatch
> l_quantity - Float8Vector: Row count = 543, but value count = 27408
> l_extendedprice - Float8Vector: Row count = 543, but value count = 27408
> l_discount - Float8Vector: Row count = 543, but value count = 27408
> l_shipdate - DateVector: Row count = 543, but value count = 27408
> {noformat}
> And many, many more. (Note that the test names might not be accurate: Maven 
> runs multiple tests in parallel and it is hard to correlate log messages with 
> tests in this output format.)
> The problem with these errors is that they make operators very fragile: once 
> we accept invalid vectors, it is very hard to detect when an operator makes 
> vectors even more invalid. It is also hard to reason about the code if the 
> inputs (or outputs) can be corrupt in normal operation.
> Suggestions:
> 1. Extend {{BatchValidator}} with the vectors not yet covered (maps, repeated 
> maps).
> 2. Work step-by-step through tests.
> 3. Identify operators that corrupt vectors.
> 4. Fix the source of corruption and retest.
> 5. Continue until no vector corruption errors occur.
> 6. Change the {{IteratorValidatorBatchIterator}} to check vectors by default, 
> and to throw a fatal error if corruption is found.
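
The invariant behind the log messages quoted above is that every vector in a 
batch must hold as many values as the batch has rows. The following is a 
minimal sketch of that check and of the fail-fast behavior proposed in 
suggestion 6. It is not Drill's {{BatchValidator}}: `RowCountCheck`, 
`VectorInfo`, `validate` and `validateOrThrow` are hypothetical names, and a 
vector is reduced to a name, a type label and a value count. The sample 
numbers are taken from the FilterRecordBatch messages in the log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the row count vs. value count check; not Drill code.
final class RowCountCheck {

  /** Stand-in for a value vector: only the fields needed for this check. */
  static final class VectorInfo {
    final String name;
    final String type;
    final int valueCount;

    VectorInfo(String name, String type, int valueCount) {
      this.name = name;
      this.type = type;
      this.valueCount = valueCount;
    }
  }

  /** Returns one message per vector whose value count differs from the row count. */
  static List<String> validate(int rowCount, List<VectorInfo> vectors) {
    List<String> errors = new ArrayList<>();
    for (VectorInfo v : vectors) {
      if (v.valueCount != rowCount) {
        errors.add(String.format("%s - %s: Row count = %d, but value count = %d",
            v.name, v.type, rowCount, v.valueCount));
      }
    }
    return errors;
  }

  /** Strict variant: throws instead of only reporting, so corruption cannot pass silently. */
  static void validateOrThrow(String operator, int rowCount, List<VectorInfo> vectors) {
    List<String> errors = validate(rowCount, vectors);
    if (!errors.isEmpty()) {
      throw new IllegalStateException("Found one or more vector errors from "
          + operator + "\n" + String.join("\n", errors));
    }
  }

  public static void main(String[] args) {
    List<VectorInfo> batch = Arrays.asList(
        new VectorInfo("n_nationkey", "IntVector", 25),
        new VectorInfo("n_regionkey", "IntVector", 25));

    // Row count 2 with 25 values per vector: both vectors are reported.
    for (String error : validate(2, batch)) {
      System.out.println(error);
    }

    // Row count 25 matches every vector, so the strict check passes quietly.
    validateOrThrow("FilterRecordBatch", 25, batch);
  }
}
```

Returning the messages from `validate` keeps a log-only mode possible, while 
`validateOrThrow` corresponds to turning the check into a fatal error.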



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
