[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues
[ https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035804#comment-16035804 ]

ASF GitHub Bot commented on DRILL-5504:
---
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/832

> Vector validator to diagnose offset vector issues
> -
>
> Key: DRILL-5504
> URL: https://issues.apache.org/jira/browse/DRILL-5504
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
> Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> DRILL-5470 describes a case in which an offset vector appears to have become
> corrupted, yielding a bogus field-length value that is orders of magnitude
> larger than the vector that contains the data.
> Debugging such cases is slow and tedious. To help, we propose to create a
> "vector validator" that spins through vectors looking for problems.
> Then, to allow the validator to be used in the field, extend the "iterator
> validator batch iterator" to optionally allow vector validation on each batch.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues
[ https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025573#comment-16025573 ]

ASF GitHub Bot commented on DRILL-5504:
---
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/832

    Fixed typo in log message and rebased onto latest master.
[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues
[ https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017674#comment-16017674 ]

ASF GitHub Bot commented on DRILL-5504:
---
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/832

    Commits squashed.
[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues
[ https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017612#comment-16017612 ]

ASF GitHub Bot commented on DRILL-5504:
---
Github user sudheeshkatkam commented on the issue:

    https://github.com/apache/drill/pull/832

    +1

    Please squash the commits.
[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues
[ https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016609#comment-16016609 ]

ASF GitHub Bot commented on DRILL-5504:
---
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/832#discussion_r117362498

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/BatchValidator.java ---
    @@ -0,0 +1,205 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements. See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership. The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + **/
    +package org.apache.drill.exec.physical.impl.validate;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.drill.exec.record.SimpleVectorWrapper;
    +import org.apache.drill.exec.record.VectorAccessible;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.vector.BaseDataValueVector;
    +import org.apache.drill.exec.vector.FixedWidthVector;
    +import org.apache.drill.exec.vector.NullableVarCharVector;
    +import org.apache.drill.exec.vector.NullableVector;
    +import org.apache.drill.exec.vector.RepeatedVarCharVector;
    +import org.apache.drill.exec.vector.UInt4Vector;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.VarCharVector;
    +import org.apache.drill.exec.vector.VariableWidthVector;
    +import org.apache.drill.exec.vector.complex.BaseRepeatedValueVector;
    +import org.apache.drill.exec.vector.complex.RepeatedFixedWidthVectorLike;
    +
    +
    +/**
    + * Validate a batch of value vectors. It is not possible to validate the
    + * data, but we can validate the structure, especially offset vectors.
    + * Only handles single (non-hyper) vectors at present. Current form is
    + * self-contained. Better checks can be done by moving checks inside
    + * vectors or by exposing more metadata from vectors.
    + */
    +
    +public class BatchValidator {
    +  private static final org.slf4j.Logger logger =
    +      org.slf4j.LoggerFactory.getLogger(BatchValidator.class);
    +
    +  public static final int MAX_ERRORS = 100;
    +
    +  private final int rowCount;
    +  private final VectorAccessible batch;
    +  private final List errorList;
    +  private int errorCount;
    +
    +  public BatchValidator(VectorAccessible batch) {
    +    rowCount = batch.getRecordCount();
    +    this.batch = batch;
    +    errorList = null;
    +  }
    +
    +  public BatchValidator(VectorAccessible batch, boolean captureErrors) {
    +    rowCount = batch.getRecordCount();
    +    this.batch = batch;
    +    if (captureErrors) {
    +      errorList = new ArrayList<>();
    +    } else {
    +      errorList = null;
    +    }
    +  }
    +
    +  public void validate() {
    +    for (VectorWrapper w : batch) {
    +      validateWrapper(w);
    +    }
    +  }
    +
    +  private void validateWrapper(VectorWrapper w) {
    +    if (w instanceof SimpleVectorWrapper) {
    +      validateVector(w.getValueVector());
    +    }
    --- End diff --

    Done. See DRILL-5526.
[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues
[ https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016610#comment-16016610 ]

ASF GitHub Bot commented on DRILL-5504:
---
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/832#discussion_r117359430

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ImplCreator.java ---
    @@ -69,9 +70,18 @@ public static RootExec getExec(FragmentContext context, FragmentRoot root) throw
         Preconditions.checkNotNull(root);
         Preconditions.checkNotNull(context);

    -    if (AssertionUtil.isAssertionsEnabled()) {
    +    // Enable iterator (operator) validation if assertions are enabled (debug mode)
    +    // or if in production mode and the ENABLE_ITERATOR_VALIDATION option is set
    +    // to true.
    +
    +    boolean enableValidation = AssertionUtil.isAssertionsEnabled();
    +    if (! enableValidation) {
    +      enableValidation = context.getOptionSet().getOption(ExecConstants.ENABLE_ITERATOR_VALIDATOR);
    +    }
    +    if (enableValidation) {
    --- End diff --

    Fixed.
[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues
[ https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016608#comment-16016608 ]

ASF GitHub Bot commented on DRILL-5504:
---
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/832#discussion_r117361619

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/BatchValidator.java ---
    @@ -0,0 +1,205 @@
    [...]
    +  public BatchValidator(VectorAccessible batch, boolean captureErrors) {
    +    rowCount = batch.getRecordCount();
    +    this.batch = batch;
    +    if (captureErrors) {
    +      errorList = new ArrayList<>();
    +    } else {
    +      errorList = null;
    +    }
    +  }
    +
    +  public void validate() {
    --- End diff --

    Great idea! Added a config option that forces vector validation. Add the following to the pom.xml file in the Surefire options:

    {code}
    -Ddrill.exec.debug.validate_vectors=true
    {code}

    Will try this out and enable the checks as a different JIRA ticket and PR.
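The `-D` flag in the comment above sets a JVM system property for the test JVM. As a minimal sketch of what such a flag amounts to on the Java side, the snippet below reads the property with the standard `Boolean.getBoolean` call. The class name `ValidateVectorsFlag` is hypothetical, and reading the property directly is an assumption made for illustration; the thread does not show how Drill's own configuration layer consumes it.

```java
// Hypothetical helper, not part of Drill: reads the flag as a plain JVM system property.
final class ValidateVectorsFlag {
  // Property name taken from the review comment above.
  static final String PROPERTY = "drill.exec.debug.validate_vectors";

  // Boolean.getBoolean returns true only when the system property exists
  // and equals "true" (case-insensitive); otherwise it returns false.
  static boolean isEnabled() {
    return Boolean.getBoolean(PROPERTY);
  }

  public static void main(String[] args) {
    System.out.println(PROPERTY + " = " + isEnabled());
  }
}
```

Because an unset property reads as false, a check written this way stays off unless the flag is passed explicitly, which matches the reviewer's request elsewhere in the thread for a default of false.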
[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues
[ https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014873#comment-16014873 ]

ASF GitHub Bot commented on DRILL-5504:
---
Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/832#discussion_r116091914

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
    @@ -449,4 +449,19 @@
       String PERSISTENT_TABLE_UMASK = "exec.persistent_table.umask";
       StringValidator PERSISTENT_TABLE_UMASK_VALIDATOR = new StringValidator(PERSISTENT_TABLE_UMASK, "002");

    +  /**
    +   * When iterator validation is enabled, additionally validates the vectors in
    +   * each batch passed to each iterator.
    +   */
    +  String ENABLE_VECTOR_VALIDATION = "debug.validate_vectors";
    +  BooleanValidator ENABLE_VECTOR_VALIDATOR = new BooleanValidator(ENABLE_VECTOR_VALIDATION, true);
    --- End diff --

    false, by default, here and below.
[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues
[ https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014871#comment-16014871 ]

ASF GitHub Bot commented on DRILL-5504:
---
Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/832#discussion_r116092232

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ImplCreator.java ---
    @@ -69,9 +70,18 @@ public static RootExec getExec(FragmentContext context, FragmentRoot root) throw
         Preconditions.checkNotNull(root);
         Preconditions.checkNotNull(context);

    -    if (AssertionUtil.isAssertionsEnabled()) {
    +    // Enable iterator (operator) validation if assertions are enabled (debug mode)
    +    // or if in production mode and the ENABLE_ITERATOR_VALIDATION option is set
    +    // to true.
    +
    +    boolean enableValidation = AssertionUtil.isAssertionsEnabled();
    +    if (! enableValidation) {
    +      enableValidation = context.getOptionSet().getOption(ExecConstants.ENABLE_ITERATOR_VALIDATOR);
    +    }
    +    if (enableValidation) {
    --- End diff --

    ```
    if (AssertionUtil.isAssertionsEnabled() ||
        context.getOptionSet().getOption(ExecConstants.ENABLE_ITERATOR_VALIDATOR)) {
      ...
    }
    ```
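The suggestion above relies on Java's short-circuit `||`: the option lookup runs only when assertions are disabled, exactly as in the two-step form from the diff. A minimal sketch of that equivalence follows; `ValidationToggle` and its method names are hypothetical, and a `BooleanSupplier` stands in for the real option lookup, which is not reproduced here.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration, not Drill code: compares the two ways of writing the check.
final class ValidationToggle {
  // Two-step form, mirroring the structure of the diff under review.
  static boolean twoStep(boolean assertionsEnabled, BooleanSupplier option) {
    boolean enableValidation = assertionsEnabled;
    if (!enableValidation) {
      enableValidation = option.getAsBoolean();
    }
    return enableValidation;
  }

  // Collapsed form, mirroring the reviewer's suggestion. The right-hand side
  // of || is evaluated only when the left-hand side is false.
  static boolean collapsed(boolean assertionsEnabled, BooleanSupplier option) {
    return assertionsEnabled || option.getAsBoolean();
  }

  public static void main(String[] args) {
    boolean[] values = {false, true};
    for (boolean assertions : values) {
      for (boolean option : values) {
        final boolean opt = option;
        System.out.println("assertions=" + assertions + ", option=" + opt
            + " -> twoStep=" + twoStep(assertions, () -> opt)
            + ", collapsed=" + collapsed(assertions, () -> opt));
      }
    }
  }
}
```

Both forms return the same value for all four input combinations, so the choice between them is a matter of readability rather than behavior.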
[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues
[ https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014874#comment-16014874 ]

ASF GitHub Bot commented on DRILL-5504:
---
Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/832#discussion_r116094668

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/BatchValidator.java ---
    @@ -0,0 +1,205 @@
    [...]
    +  public BatchValidator(VectorAccessible batch, boolean captureErrors) {
    +    rowCount = batch.getRecordCount();
    +    this.batch = batch;
    +    if (captureErrors) {
    +      errorList = new ArrayList<>();
    +    } else {
    +      errorList = null;
    +    }
    +  }
    +
    +  public void validate() {
    --- End diff --

    Just a thought. Is there a way to enable these checks (and fail if invalid) for pre-commit tests as well?
[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues
[ https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014872#comment-16014872 ]

ASF GitHub Bot commented on DRILL-5504:
---
Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/832#discussion_r116092613

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/BatchValidator.java ---
    @@ -0,0 +1,205 @@
    [...]
    +  public void validate() {
    +    for (VectorWrapper w : batch) {
    +      validateWrapper(w);
    +    }
    +  }
    +
    +  private void validateWrapper(VectorWrapper w) {
    +    if (w instanceof SimpleVectorWrapper) {
    +      validateVector(w.getValueVector());
    +    }
    --- End diff --

    You mentioned above that HyperVectorWrapper is not validated. Can you open a ticket for the functionality to be implemented in this validator?
[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues
[ https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007072#comment-16007072 ]

ASF GitHub Bot commented on DRILL-5504:
---
GitHub user paul-rogers opened a pull request:

    https://github.com/apache/drill/pull/832

    DRILL-5504: Vector validator to diagnose offset vector issues

    Validates offset vectors in VarChar and repeated vectors. Validates the special case of repeated VarChar vectors (two layers of offsets).

    Provides two new session variables to turn on validation. One enables the existing operator (iterator) validation, the other adds vector validation. This allows validation to occur in a “production” Drill (without restarting Drill with assertions, as previously required).

    Unit tests validate the validator. Another test validates the integration, but requires manual steps, so it is ignored by default.

    This version is a first cut: all work is done within a single class. This allows back-porting to an earlier version to solve a specific issue. A revision should move some of the work into generated code (or refactor vectors to allow outside access), since offset vectors appear in each subclass, not on a base class that would allow generic operations.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-5504

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/832.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #832

commit 175e592419ca6bda1fd0259cc42b033616facc3d
Author: Paul Rogers
Date: 2017-05-11T19:46:15Z

    DRILL-5504: Vector validator to diagnose offset vector issues
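To make the idea of validating an offset vector concrete, here is a minimal, self-contained sketch that works on plain `int` arrays rather than Drill vectors. It assumes the usual layout for variable-width data: `valueCount + 1` offsets, the first equal to 0, each offset no smaller than the previous one, and none past the end of the data buffer. The class `OffsetCheck`, its method, and the error wording are hypothetical; only the `MAX_ERRORS` cap of 100 echoes the constant visible in the diff. This is not the code from the pull request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of structural offset checks; not the BatchValidator from the PR.
final class OffsetCheck {
  // Cap on reported errors, echoing MAX_ERRORS in the diff.
  static final int MAX_ERRORS = 100;

  // Checks an offset array for valueCount values stored in a data buffer of dataLength bytes.
  static List<String> validateOffsets(String name, int[] offsets, int valueCount, int dataLength) {
    List<String> errors = new ArrayList<>();
    if (offsets.length < valueCount + 1) {
      errors.add(name + ": expected " + (valueCount + 1) + " offsets, found " + offsets.length);
      return errors;
    }
    if (offsets[0] != 0) {
      errors.add(name + ": offset (0) must be 0 but was " + offsets[0]);
    }
    int prev = offsets[0];
    for (int i = 1; i <= valueCount && errors.size() < MAX_ERRORS; i++) {
      int cur = offsets[i];
      if (cur < prev) {
        errors.add(name + ": offset (" + i + ") = " + cur + " is less than previous offset " + prev);
      } else if (cur > dataLength) {
        errors.add(name + ": offset (" + i + ") = " + cur + " is beyond the data length " + dataLength);
      }
      prev = cur;
    }
    return errors;
  }

  public static void main(String[] args) {
    // Three values of lengths 3, 0 and 4 in a 7-byte data buffer: structurally valid.
    System.out.println(validateOffsets("good", new int[] {0, 3, 3, 7}, 3, 7));
    // One corrupted entry far larger than the data buffer, as in the DRILL-5470 symptom.
    System.out.println(validateOffsets("bad", new int[] {0, 3, 900000, 7}, 3, 7));
  }
}
```

For the two-layer case of a repeated VarChar vector, the same check could plausibly be applied twice: once to the outer offsets, bounded by the number of inner values, and once to the inner offsets, bounded by the data buffer length.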
[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues
[ https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005685#comment-16005685 ]

Paul Rogers commented on DRILL-5504:
---
Mini-design:

1. Create a BatchValidator class to iterate over vectors in a batch and validate each.
2. Create a VectorValidator class to validate each kind of vector that lends itself to validation.
3. Add a new session option, debug.validate-vectors, to enable validation.
4. Modify IteratorValidatorBatchIterator to invoke the batch validator on each batch, if the above option is set.
5. Add another new session option, debug.validate-iterator, to enable the iterator validator.
6. Modify ImplCreator to consider both assertions enabled OR the debug.validate-iterator option to decide to inject the iterator validator.

The result will be that a production, non-debug user can enable validation for just a problematic query, without having to restart the Drillbit with assertions enabled.
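Steps 1 through 4 of the mini-design can be sketched in a few lines. Everything here is hypothetical scaffolding: `CheckableVector` stands in for a vector plus its per-kind validator, a `Map` stands in for session options, and `onBatch` stands in for the hook in the iterator validator. Only the option name `debug.validate-vectors` comes from the design text, and the thread shows a different spelling in the eventual diff.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical outline of the mini-design; none of these types are Drill classes.
final class ValidationDesignSketch {
  // Stand-in for "a vector that lends itself to validation" (step 2).
  interface CheckableVector {
    List<String> check(int rowCount);
  }

  // Step 1: iterate over the vectors in a batch and validate each.
  static List<String> validateBatch(List<CheckableVector> batch, int rowCount) {
    List<String> errors = new ArrayList<>();
    for (CheckableVector vector : batch) {
      errors.addAll(vector.check(rowCount));
    }
    return errors;
  }

  // Steps 3 and 4: run the batch validator on a batch only when the option is set.
  static List<String> onBatch(Map<String, Boolean> options, List<CheckableVector> batch, int rowCount) {
    if (!options.getOrDefault("debug.validate-vectors", false)) {
      return Collections.emptyList();
    }
    return validateBatch(batch, rowCount);
  }

  public static void main(String[] args) {
    List<CheckableVector> batch = new ArrayList<>();
    batch.add(rowCount -> Collections.emptyList());
    batch.add(rowCount -> Collections.singletonList("offset vector is corrupt"));

    Map<String, Boolean> options = new HashMap<>();
    System.out.println("option off: " + onBatch(options, batch, 10));
    options.put("debug.validate-vectors", true);
    System.out.println("option on:  " + onBatch(options, batch, 10));
  }
}
```

Keeping the option check outside `validateBatch` means the validator itself stays usable from unit tests without any option plumbing, which fits the pull request's note that unit tests validate the validator directly.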