[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues

2017-06-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035804#comment-16035804
 ] 

ASF GitHub Bot commented on DRILL-5504:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/832


> Vector validator to diagnose offset vector issues
> -------------------------------------------------
>
>                 Key: DRILL-5504
>                 URL: https://issues.apache.org/jira/browse/DRILL-5504
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>              Labels: ready-to-commit
>             Fix For: 1.11.0
>
>
> DRILL-5470 describes a case in which an offset vector appears to have become 
> corrupted, yielding a bogus field-length value that is orders of magnitude 
> larger than the vector that contains the data.
> Debugging such cases is slow and tedious. To help, we propose to create a 
> "vector validator" that spins through vectors looking for problems.
> Then, to allow the validator to be used in the field, extend the "iterator 
> validator batch iterator" to optionally allow vector validation on each batch.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025573#comment-16025573
 ] 

ASF GitHub Bot commented on DRILL-5504:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/832
  
Fixed typo in log message and rebased onto latest master.




[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues

2017-05-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017674#comment-16017674
 ] 

ASF GitHub Bot commented on DRILL-5504:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/832
  
Commits squashed.




[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues

2017-05-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017612#comment-16017612
 ] 

ASF GitHub Bot commented on DRILL-5504:
---

Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/832
  
+1

Please squash the commits.




[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues

2017-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016609#comment-16016609
 ] 

ASF GitHub Bot commented on DRILL-5504:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/832#discussion_r117362498
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/BatchValidator.java
 ---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ **/
+package org.apache.drill.exec.physical.impl.validate;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.exec.record.SimpleVectorWrapper;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.vector.BaseDataValueVector;
+import org.apache.drill.exec.vector.FixedWidthVector;
+import org.apache.drill.exec.vector.NullableVarCharVector;
+import org.apache.drill.exec.vector.NullableVector;
+import org.apache.drill.exec.vector.RepeatedVarCharVector;
+import org.apache.drill.exec.vector.UInt4Vector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.VarCharVector;
+import org.apache.drill.exec.vector.VariableWidthVector;
+import org.apache.drill.exec.vector.complex.BaseRepeatedValueVector;
+import org.apache.drill.exec.vector.complex.RepeatedFixedWidthVectorLike;
+
+
+/**
+ * Validate a batch of value vectors. It is not possible to validate the
+ * data, but we can validate the structure, especially offset vectors.
+ * Only handles single (non-hyper) vectors at present. Current form is
+ * self-contained. Better checks can be done by moving checks inside
+ * vectors or by exposing more metadata from vectors.
+ */
+
+public class BatchValidator {
+  private static final org.slf4j.Logger logger =
+  org.slf4j.LoggerFactory.getLogger(BatchValidator.class);
+
+  public static final int MAX_ERRORS = 100;
+
+  private final int rowCount;
+  private final VectorAccessible batch;
+  private final List errorList;
+  private int errorCount;
+
+  public BatchValidator(VectorAccessible batch) {
+    rowCount = batch.getRecordCount();
+    this.batch = batch;
+    errorList = null;
+  }
+
+  public BatchValidator(VectorAccessible batch, boolean captureErrors) {
+    rowCount = batch.getRecordCount();
+    this.batch = batch;
+    if (captureErrors) {
+      errorList = new ArrayList<>();
+    } else {
+      errorList = null;
+    }
+  }
+
+  public void validate() {
+    for (VectorWrapper w : batch) {
+      validateWrapper(w);
+    }
+  }
+
+  private void validateWrapper(VectorWrapper w) {
+    if (w instanceof SimpleVectorWrapper) {
+      validateVector(w.getValueVector());
+    }
--- End diff --

Done. See DRILL-5526.



[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues

2017-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016610#comment-16016610
 ] 

ASF GitHub Bot commented on DRILL-5504:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/832#discussion_r117359430
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ImplCreator.java
 ---
@@ -69,9 +70,18 @@ public static RootExec getExec(FragmentContext context, FragmentRoot root) throw
     Preconditions.checkNotNull(root);
     Preconditions.checkNotNull(context);
 
-    if (AssertionUtil.isAssertionsEnabled()) {
+    // Enable iterator (operator) validation if assertions are enabled (debug mode)
+    // or if in production mode and the ENABLE_ITERATOR_VALIDATION option is set
+    // to true.
+
+    boolean enableValidation = AssertionUtil.isAssertionsEnabled();
+    if (! enableValidation) {
+      enableValidation = context.getOptionSet().getOption(ExecConstants.ENABLE_ITERATOR_VALIDATOR);
+    }
+    if (enableValidation) {
--- End diff --

Fixed.




[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues

2017-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016608#comment-16016608
 ] 

ASF GitHub Bot commented on DRILL-5504:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/832#discussion_r117361619
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/BatchValidator.java
 ---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ **/
+package org.apache.drill.exec.physical.impl.validate;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.exec.record.SimpleVectorWrapper;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.vector.BaseDataValueVector;
+import org.apache.drill.exec.vector.FixedWidthVector;
+import org.apache.drill.exec.vector.NullableVarCharVector;
+import org.apache.drill.exec.vector.NullableVector;
+import org.apache.drill.exec.vector.RepeatedVarCharVector;
+import org.apache.drill.exec.vector.UInt4Vector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.VarCharVector;
+import org.apache.drill.exec.vector.VariableWidthVector;
+import org.apache.drill.exec.vector.complex.BaseRepeatedValueVector;
+import org.apache.drill.exec.vector.complex.RepeatedFixedWidthVectorLike;
+
+
+/**
+ * Validate a batch of value vectors. It is not possible to validate the
+ * data, but we can validate the structure, especially offset vectors.
+ * Only handles single (non-hyper) vectors at present. Current form is
+ * self-contained. Better checks can be done by moving checks inside
+ * vectors or by exposing more metadata from vectors.
+ */
+
+public class BatchValidator {
+  private static final org.slf4j.Logger logger =
+  org.slf4j.LoggerFactory.getLogger(BatchValidator.class);
+
+  public static final int MAX_ERRORS = 100;
+
+  private final int rowCount;
+  private final VectorAccessible batch;
+  private final List errorList;
+  private int errorCount;
+
+  public BatchValidator(VectorAccessible batch) {
+    rowCount = batch.getRecordCount();
+    this.batch = batch;
+    errorList = null;
+  }
+
+  public BatchValidator(VectorAccessible batch, boolean captureErrors) {
+    rowCount = batch.getRecordCount();
+    this.batch = batch;
+    if (captureErrors) {
+      errorList = new ArrayList<>();
+    } else {
+      errorList = null;
+    }
+  }
+
+  public void validate() {
--- End diff --

Great idea! Added a config option that forces vector validation. To use it, 
add the following to the Surefire options in the pom.xml file:

{code}
-Ddrill.exec.debug.validate_vectors=true
{code}

I will try this out and enable the checks under a separate JIRA ticket and PR.
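
For readers unfamiliar with -D flags: a flag of this form sets a JVM system
property that code running in the forked test JVM can read. The sketch below
shows only that generic JVM mechanism. It is an assumption-laden illustration,
not Drill code: the class name ValidateVectorsFlag is hypothetical, and Drill's
own configuration layer may resolve the property differently.

```java
/** Hypothetical helper showing how a -D flag surfaces inside the JVM; not Drill code. */
class ValidateVectorsFlag {
  // Property name copied from the comment above.
  static final String PROPERTY = "drill.exec.debug.validate_vectors";

  /** Returns true only if the property is present and equals "true" (case-insensitive). */
  static boolean isEnabled() {
    return Boolean.parseBoolean(System.getProperty(PROPERTY, "false"));
  }

  public static void main(String[] args) {
    // Run with: java -Ddrill.exec.debug.validate_vectors=true ValidateVectorsFlag.java
    System.out.println("vector validation requested: " + isEnabled());
  }
}
```

Defaulting to "false" when the property is absent matches the review request
that validation be off unless explicitly enabled.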



[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues

2017-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014873#comment-16014873
 ] 

ASF GitHub Bot commented on DRILL-5504:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/832#discussion_r116091914
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
@@ -449,4 +449,19 @@
   String PERSISTENT_TABLE_UMASK = "exec.persistent_table.umask";
   StringValidator PERSISTENT_TABLE_UMASK_VALIDATOR = new StringValidator(PERSISTENT_TABLE_UMASK, "002");
 
+  /**
+   * When iterator validation is enabled, additionally validates the vectors in
+   * each batch passed to each iterator.
+   */
+  String ENABLE_VECTOR_VALIDATION = "debug.validate_vectors";
+  BooleanValidator ENABLE_VECTOR_VALIDATOR = new BooleanValidator(ENABLE_VECTOR_VALIDATION, true);
--- End diff --

This should be false by default, here and below.




[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues

2017-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014871#comment-16014871
 ] 

ASF GitHub Bot commented on DRILL-5504:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/832#discussion_r116092232
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ImplCreator.java
 ---
@@ -69,9 +70,18 @@ public static RootExec getExec(FragmentContext context, FragmentRoot root) throw
     Preconditions.checkNotNull(root);
     Preconditions.checkNotNull(context);
 
-    if (AssertionUtil.isAssertionsEnabled()) {
+    // Enable iterator (operator) validation if assertions are enabled (debug mode)
+    // or if in production mode and the ENABLE_ITERATOR_VALIDATION option is set
+    // to true.
+
+    boolean enableValidation = AssertionUtil.isAssertionsEnabled();
+    if (! enableValidation) {
+      enableValidation = context.getOptionSet().getOption(ExecConstants.ENABLE_ITERATOR_VALIDATOR);
+    }
+    if (enableValidation) {
--- End diff --

```
if (AssertionUtil.isAssertionsEnabled() ||
    context.getOptionSet().getOption(ExecConstants.ENABLE_ITERATOR_VALIDATOR)) { ...
}
```




[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues

2017-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014874#comment-16014874
 ] 

ASF GitHub Bot commented on DRILL-5504:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/832#discussion_r116094668
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/BatchValidator.java
 ---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ **/
+package org.apache.drill.exec.physical.impl.validate;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.exec.record.SimpleVectorWrapper;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.vector.BaseDataValueVector;
+import org.apache.drill.exec.vector.FixedWidthVector;
+import org.apache.drill.exec.vector.NullableVarCharVector;
+import org.apache.drill.exec.vector.NullableVector;
+import org.apache.drill.exec.vector.RepeatedVarCharVector;
+import org.apache.drill.exec.vector.UInt4Vector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.VarCharVector;
+import org.apache.drill.exec.vector.VariableWidthVector;
+import org.apache.drill.exec.vector.complex.BaseRepeatedValueVector;
+import org.apache.drill.exec.vector.complex.RepeatedFixedWidthVectorLike;
+
+
+/**
+ * Validate a batch of value vectors. It is not possible to validate the
+ * data, but we can validate the structure, especially offset vectors.
+ * Only handles single (non-hyper) vectors at present. Current form is
+ * self-contained. Better checks can be done by moving checks inside
+ * vectors or by exposing more metadata from vectors.
+ */
+
+public class BatchValidator {
+  private static final org.slf4j.Logger logger =
+  org.slf4j.LoggerFactory.getLogger(BatchValidator.class);
+
+  public static final int MAX_ERRORS = 100;
+
+  private final int rowCount;
+  private final VectorAccessible batch;
+  private final List errorList;
+  private int errorCount;
+
+  public BatchValidator(VectorAccessible batch) {
+    rowCount = batch.getRecordCount();
+    this.batch = batch;
+    errorList = null;
+  }
+
+  public BatchValidator(VectorAccessible batch, boolean captureErrors) {
+    rowCount = batch.getRecordCount();
+    this.batch = batch;
+    if (captureErrors) {
+      errorList = new ArrayList<>();
+    } else {
+      errorList = null;
+    }
+  }
+
+  public void validate() {
--- End diff --

Just a thought. Is there a way to enable these checks (and fail if invalid) 
for pre-commit tests as well?




[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues

2017-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014872#comment-16014872
 ] 

ASF GitHub Bot commented on DRILL-5504:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/832#discussion_r116092613
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/BatchValidator.java
 ---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ **/
+package org.apache.drill.exec.physical.impl.validate;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.exec.record.SimpleVectorWrapper;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.vector.BaseDataValueVector;
+import org.apache.drill.exec.vector.FixedWidthVector;
+import org.apache.drill.exec.vector.NullableVarCharVector;
+import org.apache.drill.exec.vector.NullableVector;
+import org.apache.drill.exec.vector.RepeatedVarCharVector;
+import org.apache.drill.exec.vector.UInt4Vector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.VarCharVector;
+import org.apache.drill.exec.vector.VariableWidthVector;
+import org.apache.drill.exec.vector.complex.BaseRepeatedValueVector;
+import org.apache.drill.exec.vector.complex.RepeatedFixedWidthVectorLike;
+
+
+/**
+ * Validate a batch of value vectors. It is not possible to validate the
+ * data, but we can validate the structure, especially offset vectors.
+ * Only handles single (non-hyper) vectors at present. Current form is
+ * self-contained. Better checks can be done by moving checks inside
+ * vectors or by exposing more metadata from vectors.
+ */
+
+public class BatchValidator {
+  private static final org.slf4j.Logger logger =
+  org.slf4j.LoggerFactory.getLogger(BatchValidator.class);
+
+  public static final int MAX_ERRORS = 100;
+
+  private final int rowCount;
+  private final VectorAccessible batch;
+  private final List errorList;
+  private int errorCount;
+
+  public BatchValidator(VectorAccessible batch) {
+    rowCount = batch.getRecordCount();
+    this.batch = batch;
+    errorList = null;
+  }
+
+  public BatchValidator(VectorAccessible batch, boolean captureErrors) {
+    rowCount = batch.getRecordCount();
+    this.batch = batch;
+    if (captureErrors) {
+      errorList = new ArrayList<>();
+    } else {
+      errorList = null;
+    }
+  }
+
+  public void validate() {
+    for (VectorWrapper w : batch) {
+      validateWrapper(w);
+    }
+  }
+
+  private void validateWrapper(VectorWrapper w) {
+    if (w instanceof SimpleVectorWrapper) {
+      validateVector(w.getValueVector());
+    }
--- End diff --

You mentioned above that HyperVectorWrapper is not validated. Can you open 
a ticket for the functionality that remains to be implemented in this validator?



[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues

2017-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007072#comment-16007072
 ] 

ASF GitHub Bot commented on DRILL-5504:
---

GitHub user paul-rogers opened a pull request:

https://github.com/apache/drill/pull/832

DRILL-5504: Vector validator to diagnose offset vector issues

Validates offset vectors in VarChar and repeated vectors. Validates the
special case of repeated VarChar vectors (two layers of offsets).

Provides two new session variables to turn on validation. One enables
the existing operator (iterator) validation, the other adds vector
validation. This allows validation to occur in a “production” Drill
(without restarting Drill with assertions enabled, as previously required).

Unit tests validate the validator. Another test validates the
integration, but requires manual steps, so is ignored by default.

This version is a first cut: all work is done within a single class,
which allows back-porting to an earlier version to solve a specific issue. A
revision should move some of the work into generated code (or refactor
vectors to allow outside access), since offset vectors are declared on each
subclass, not on a base class that would allow generic operations.
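
As a rough illustration of what validating an offset vector involves, the
sketch below checks a plain int[] of offsets. It is not the code in this pull
request: the class OffsetVectorCheck, its validate method, and the array-based
representation are hypothetical, and it assumes the conventional layout of
rowCount + 1 offsets that start at 0, never decrease, and never point past the
data buffer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical array-based stand-in for an offset-vector check; not Drill's BatchValidator. */
class OffsetVectorCheck {

  /**
   * Checks the first rowCount + 1 offsets against a data buffer of dataLength bytes.
   * Returns a list of problems found; an empty list means the structure looks sane.
   */
  static List<String> validate(int[] offsets, int rowCount, int dataLength) {
    List<String> errors = new ArrayList<>();
    if (offsets.length < rowCount + 1) {
      errors.add("offset vector has " + offsets.length + " entries, need " + (rowCount + 1));
      return errors;
    }
    if (offsets[0] != 0) {
      errors.add("offset[0] = " + offsets[0] + ", expected 0");
    }
    int prev = offsets[0];
    for (int i = 1; i <= rowCount; i++) {
      int cur = offsets[i];
      if (cur < prev) {
        // A decreasing offset would imply a negative field length.
        errors.add("offset[" + i + "] = " + cur + " is less than previous offset " + prev);
      } else if (cur > dataLength) {
        // An offset beyond the buffer would imply a field longer than the data itself.
        errors.add("offset[" + i + "] = " + cur + " exceeds data length " + dataLength);
      }
      prev = cur;
    }
    return errors;
  }

  public static void main(String[] args) {
    // Three values "ab", "", "cde" stored in a 5-byte buffer.
    System.out.println(validate(new int[] {0, 2, 2, 5}, 3, 5));
    // One corrupted entry that is far larger than the data buffer.
    System.out.println(validate(new int[] {0, 2, 1_000_000, 5}, 3, 5));
  }
}
```

For a repeated VarChar vector, the same check would presumably run twice: once
for the outer offsets against the number of inner values, and once for the
inner offsets against the byte length of the data.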

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/paul-rogers/drill DRILL-5504

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/832.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #832


commit 175e592419ca6bda1fd0259cc42b033616facc3d
Author: Paul Rogers 
Date:   2017-05-11T19:46:15Z

DRILL-5504: Vector validator to diagnose offset vector issues







[jira] [Commented] (DRILL-5504) Vector validator to diagnose offset vector issues

2017-05-10 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005685#comment-16005685
 ] 

Paul Rogers commented on DRILL-5504:


Mini-design:

1. Create a BatchValidator class to iterate over vectors in a batch and 
validate each.
2. Create a VectorValidator class to validate each kind of vector that lends 
itself to validation.
3. Add a new session option, debug.validate-vectors to enable validation.
4. Modify IteratorValidatorBatchIterator to invoke the batch validator on each 
batch, if the above option is set.
5. Add another new session option, debug.validate-iterator, to enable the 
iterator validator.
6. Modify ImplCreator to inject the iterator validator if either assertions 
are enabled or the debug.validate-iterator option is set.

The result will be that a production, non-debug user can enable validation for 
just a problematic query, without having to restart the Drillbit with 
assertions enabled.
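
The enablement rules in this mini-design can be sketched in plain Java. This
is an illustration only, not Drill code: the class ValidationSwitches, its
methods, and the use of a Map as a stand-in for session options are all
assumptions. The option names are the ones written in the list above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the enablement rules in the mini-design; not Drill code. */
class ValidationSwitches {
  // Option names as written in the mini-design.
  static final String VALIDATE_ITERATOR = "debug.validate-iterator";
  static final String VALIDATE_VECTORS = "debug.validate-vectors";

  /** The iterator validator is injected if assertions are enabled OR the option is true. */
  static boolean iteratorValidation(boolean assertionsEnabled, Map<String, Boolean> options) {
    return assertionsEnabled || options.getOrDefault(VALIDATE_ITERATOR, false);
  }

  /** Vector validation runs inside the iterator validator, so it needs that validator too. */
  static boolean vectorValidation(boolean assertionsEnabled, Map<String, Boolean> options) {
    return iteratorValidation(assertionsEnabled, options)
        && options.getOrDefault(VALIDATE_VECTORS, false);
  }

  public static void main(String[] args) {
    // A production Drillbit (assertions off) with both session options set for one query.
    Map<String, Boolean> session = new HashMap<>();
    session.put(VALIDATE_ITERATOR, true);
    session.put(VALIDATE_VECTORS, true);
    System.out.println("iterator validation: " + iteratorValidation(false, session));
    System.out.println("vector validation:   " + vectorValidation(false, session));
  }
}
```

Under these assumptions, setting only the vector option on a Drillbit without
assertions has no effect, because no iterator validator is present to invoke
the batch validator.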
