[GitHub] vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling

2018-06-29 Thread GitBox
vrozov commented on a change in pull request #1349: DRILL-6554: Minor code 
improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199217282
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ##
 @@ -417,16 +417,9 @@ public static DateCorruptionStatus checkForCorruptDateValuesInStatistics(Parquet
           // column does not appear in this file, skip it
           continue;
         }
-        Statistics statistics = footer.getBlocks().get(rowGroupIndex).getColumns().get(colIndex).getStatistics();
-        Integer max = (Integer) statistics.genericGetMax();
-        if (statistics.hasNonNullValue()) {
-          if (max > ParquetReaderUtility.DATE_CORRUPTION_THRESHOLD) {
-            return DateCorruptionStatus.META_SHOWS_CORRUPTION;
-          }
-        } else {
-          // no statistics, go check the first page
-          return DateCorruptionStatus.META_UNCLEAR_TEST_VALUES;
-        }
+        IntStatistics statistics = (IntStatistics)footer.getBlocks().get(rowGroupIndex).getColumns().get(colIndex).getStatistics();
 
 Review comment:
   I don't see any specific code style with regard to spacing after a cast
(`grep -I -R \(\([a-zA-Z0-9_]*\)[a-zA-Z0-9_] * | grep -c java`). It seems to be
a contributor's preference.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling

2018-06-29 Thread GitBox
vrozov commented on a change in pull request #1349: DRILL-6554: Minor code 
improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199215493
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java
 ##
 @@ -39,22 +40,21 @@ static boolean isNullOrEmpty(Statistics stat) {
    *
    * @param stat parquet column statistics
    * @param rowCount number of rows in the parquet file
-   * @return True if all rows are null in the parquet file
-   *  False if at least one row is not null.
+   * @return true if all rows are null in the parquet file and false otherwise
    */
   static boolean isAllNulls(Statistics stat, long rowCount) {
-    return stat.isNumNullsSet() && stat.getNumNulls() == rowCount;
+    Preconditions.checkArgument(rowCount >= 0, String.format("negative rowCount %d is not valid", rowCount));
 
 Review comment:
   It does not matter where it comes from (it actually comes from 
`RowGroupInfo`). The condition needs to be intercepted prior to calling 
`isAllNulls()` as `isAllNulls()` would return the wrong result (whether true or 
false) for a negative row count.




[GitHub] vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling

2018-06-29 Thread GitBox
vrozov commented on a change in pull request #1349: DRILL-6554: Minor code 
improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199210267
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java
 ##
 @@ -39,22 +40,21 @@ static boolean isNullOrEmpty(Statistics stat) {
    *
    * @param stat parquet column statistics
    * @param rowCount number of rows in the parquet file
-   * @return True if all rows are null in the parquet file
-   *  False if at least one row is not null.
+   * @return true if all rows are null in the parquet file and false otherwise
    */
   static boolean isAllNulls(Statistics stat, long rowCount) {
-    return stat.isNumNullsSet() && stat.getNumNulls() == rowCount;
+    Preconditions.checkArgument(rowCount >= 0, String.format("negative rowCount %d is not valid", rowCount));
 
 Review comment:
   I hope for the reverse. A negative row count passed to this method indicates
a bug, and a bug means an incorrect result. I'd rather fail a query than give a
wrong result.
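
   A minimal sketch of the fail-fast check being discussed, written without
Guava or Parquet on the classpath so that it runs on its own. `ColumnStats` is
a hypothetical stand-in for the Parquet `Statistics` class, and the plain
`IllegalArgumentException` is an assumption standing in for
`Preconditions.checkArgument`; only `isNumNullsSet()`, `getNumNulls()` and the
error message come from the diff above.

```java
public class IsAllNullsSketch {

  /** Hypothetical stand-in for Parquet column statistics (not the real Statistics class). */
  static final class ColumnStats {
    private final long numNulls; // a negative value means "null count not set" in this sketch

    ColumnStats(long numNulls) {
      this.numNulls = numNulls;
    }

    boolean isNumNullsSet() {
      return numNulls >= 0;
    }

    long getNumNulls() {
      return numNulls;
    }
  }

  /** Returns true if every row is null; rejects a negative row count instead of guessing. */
  static boolean isAllNulls(ColumnStats stat, long rowCount) {
    if (rowCount < 0) {
      // Fail the call rather than return an answer computed from junk input.
      throw new IllegalArgumentException(
          String.format("negative rowCount %d is not valid", rowCount));
    }
    return stat.isNumNullsSet() && stat.getNumNulls() == rowCount;
  }

  public static void main(String[] args) {
    System.out.println(isAllNulls(new ColumnStats(3), 3)); // all three rows are null
    System.out.println(isAllNulls(new ColumnStats(1), 3)); // only one of three rows is null
    try {
      isAllNulls(new ColumnStats(3), -1);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

   With the check in place, a caller that passes a negative count gets an
exception that names the bad value, rather than a `true` or `false` that looks
valid.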




[GitHub] vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling

2018-06-29 Thread GitBox
vrozov commented on a change in pull request #1349: DRILL-6554: Minor code 
improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199208654
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java
 ##
 @@ -39,22 +40,21 @@ static boolean isNullOrEmpty(Statistics stat) {
    *
    * @param stat parquet column statistics
    * @param rowCount number of rows in the parquet file
-   * @return True if all rows are null in the parquet file
-   *  False if at least one row is not null.
+   * @return true if all rows are null in the parquet file and false otherwise
    */
   static boolean isAllNulls(Statistics stat, long rowCount) {
-    return stat.isNumNullsSet() && stat.getNumNulls() == rowCount;
+    Preconditions.checkArgument(rowCount >= 0, String.format("negative rowCount %d is not valid", rowCount));
 
 Review comment:
   To validate input. It can't give the correct answer if the input is junk
(`rowCount` is negative).




[GitHub] vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling

2018-06-29 Thread GitBox
vrozov commented on a change in pull request #1349: DRILL-6554: Minor code 
improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199207853
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ##
 @@ -417,16 +417,9 @@ public static DateCorruptionStatus checkForCorruptDateValuesInStatistics(Parquet
           // column does not appear in this file, skip it
           continue;
         }
-        Statistics statistics = footer.getBlocks().get(rowGroupIndex).getColumns().get(colIndex).getStatistics();
-        Integer max = (Integer) statistics.genericGetMax();
-        if (statistics.hasNonNullValue()) {
-          if (max > ParquetReaderUtility.DATE_CORRUPTION_THRESHOLD) {
-            return DateCorruptionStatus.META_SHOWS_CORRUPTION;
-          }
-        } else {
-          // no statistics, go check the first page
-          return DateCorruptionStatus.META_UNCLEAR_TEST_VALUES;
-        }
+        IntStatistics statistics = (IntStatistics)footer.getBlocks().get(rowGroupIndex).getColumns().get(colIndex).getStatistics();
 
 Review comment:
   Please see the Parquet format spec. `DATE` is always `int32`.
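
   Because the column is known to be `int32`, the per-column check can work
with a primitive `int` maximum instead of casting a boxed generic value. The
sketch below illustrates that idea only; it is not the code from the pull
request. `IntStats` is a hypothetical stand-in for `IntStatistics`, the
threshold value is an assumption chosen for illustration, and an empty
`Optional` is this sketch's own way of saying "nothing conclusive, keep
scanning the remaining columns".

```java
import java.time.LocalDate;
import java.util.Optional;

public class DateStatsSketch {

  /** The two statuses that appear in the diff above. */
  enum DateCorruptionStatus { META_SHOWS_CORRUPTION, META_UNCLEAR_TEST_VALUES }

  // Assumed threshold for illustration only: days from the Unix epoch to 5000-01-01.
  static final int DATE_CORRUPTION_THRESHOLD = (int) LocalDate.of(5000, 1, 1).toEpochDay();

  /** Hypothetical stand-in for int32 column statistics (not the real IntStatistics class). */
  static final class IntStats {
    private final boolean hasNonNullValue;
    private final int max;

    IntStats(boolean hasNonNullValue, int max) {
      this.hasNonNullValue = hasNonNullValue;
      this.max = max;
    }

    boolean hasNonNullValue() {
      return hasNonNullValue;
    }

    int getMax() {
      return max;
    }
  }

  /** Checks one DATE column; an empty result means the caller should move on to the next column. */
  static Optional<DateCorruptionStatus> checkColumn(IntStats statistics) {
    if (!statistics.hasNonNullValue()) {
      // no statistics, go check the first page
      return Optional.of(DateCorruptionStatus.META_UNCLEAR_TEST_VALUES);
    }
    // The maximum is read only after confirming the statistics hold a value.
    if (statistics.getMax() > DATE_CORRUPTION_THRESHOLD) {
      return Optional.of(DateCorruptionStatus.META_SHOWS_CORRUPTION);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(checkColumn(new IntStats(true, Integer.MAX_VALUE)));
    System.out.println(checkColumn(new IntStats(false, 0)));
    System.out.println(checkColumn(new IntStats(true, 17000)));
  }
}
```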




[GitHub] vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling

2018-06-29 Thread GitBox
vrozov commented on a change in pull request #1349: DRILL-6554: Minor code 
improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199205436
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java
 ##
 @@ -39,22 +40,21 @@ static boolean isNullOrEmpty(Statistics stat) {
    *
    * @param stat parquet column statistics
    * @param rowCount number of rows in the parquet file
-   * @return True if all rows are null in the parquet file
-   *  False if at least one row is not null.
+   * @return true if all rows are null in the parquet file and false otherwise
    */
   static boolean isAllNulls(Statistics stat, long rowCount) {
-    return stat.isNumNullsSet() && stat.getNumNulls() == rowCount;
+    Preconditions.checkArgument(rowCount >= 0, String.format("negative rowCount %d is not valid", rowCount));
 
 Review comment:
   To avoid bugs :). It is invalid to call `isAllNulls()` with negative 
`rowCount`.




[GitHub] vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling

2018-06-29 Thread GitBox
vrozov commented on a change in pull request #1349: DRILL-6554: Minor code 
improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199204147
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java
 ##
 @@ -39,22 +40,21 @@ static boolean isNullOrEmpty(Statistics stat) {
    *
    * @param stat parquet column statistics
    * @param rowCount number of rows in the parquet file
-   * @return True if all rows are null in the parquet file
-   *  False if at least one row is not null.
+   * @return true if all rows are null in the parquet file and false otherwise
    */
   static boolean isAllNulls(Statistics stat, long rowCount) {
-    return stat.isNumNullsSet() && stat.getNumNulls() == rowCount;
+    Preconditions.checkArgument(rowCount >= 0, String.format("negative rowCount %d is not valid", rowCount));
 
 Review comment:
   When can rowCount be negative?

