Github user vvysotskyi commented on a diff in the pull request:
https://github.com/apache/drill/pull/805#discussion_r140057495
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java
---
@@ -1054,8 +1057,36 @@ public void setMax(Object max) {
return nulls;
}
- @Override public boolean hasSingleValue() {
- return (max != null && min != null && max.equals(min));
+ /**
+ * Checks that the column chunk has a single value.
+ * Returns {@code true} if {@code min} and {@code max} are the same but not null
+ * and nulls count is 0 or equal to the rows count.
+ * <p>
+ * Returns {@code true} if {@code min} and {@code max} are null and the number of null values
+ * in the column chunk is equal to the rows count.
+ * <p>
+ * Comparison of nulls and rows count is needed for the cases:
+ * <ul>
+ * <li>column with primitive type has single value and null values</li>
+ *
+ * <li>column <b>with primitive type</b> has only null values, min/max couldn't be null,
+ * but column has single value</li>
+ * </ul>
+ *
+ * @param rowCount rows count in column chunk
+ * @return true if column has single value
+ */
+ @Override
+ public boolean hasSingleValue(long rowCount) {
+ if (nulls != null) {
+ if (min != null) {
+ // Objects.deepEquals() is used here, since min and max may be byte arrays
+ return Objects.deepEquals(min, max) && (nulls == 0 || nulls == rowCount);
--- End diff --
The Statistics classes [1] for most Parquet types store min and max values in
Java primitive fields, so min/max cannot be null even when the table has null values.
[1]
https://github.com/apache/parquet-mr/tree/e54ca615f213f5db6d34d9163c97eec98920d7a7/parquet-column/src/main/java/org/apache/parquet/column/statistics
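To illustrate the point, here is a minimal sketch, not the actual Drill or parquet-mr code. `IntStats` is a hypothetical stand-in for a primitive-backed statistics holder, and the `min == null` branch of `hasSingleValue` is inferred from the Javadoc in the diff, since the quoted diff is cut off before that branch.

```java
import java.util.Objects;

class SingleValueSketch {

    // Hypothetical stand-in for a statistics holder that keeps min/max in primitive fields.
    static final class IntStats {
        private int min;          // primitive: defaults to 0 and can never be null
        private int max;
        private long nulls;
        private boolean seenValue;

        void update(Integer value) {
            if (value == null) {
                nulls++;
                return;
            }
            if (!seenValue) {
                min = value;
                max = value;
                seenValue = true;
            } else {
                min = Math.min(min, value);
                max = Math.max(max, value);
            }
        }

        Object min() { return min; }   // boxed to Integer, so never null
        Object max() { return max; }
        long nulls() { return nulls; }
    }

    // Mirrors the check from the diff; the branch for min == null is an assumption
    // based on the Javadoc ("min and max are null and nulls == rowCount").
    static boolean hasSingleValue(Object min, Object max, Long nulls, long rowCount) {
        if (nulls == null) {
            return false;
        }
        if (min != null) {
            // deepEquals compares array contents, so byte[] min/max work too
            return Objects.deepEquals(min, max) && (nulls == 0 || nulls == rowCount);
        }
        return max == null && nulls == rowCount;
    }

    public static void main(String[] args) {
        // Column chunk with only nulls: min/max stay at the primitive default.
        IntStats allNulls = new IntStats();
        allNulls.update(null);
        allNulls.update(null);
        allNulls.update(null);
        System.out.println(allNulls.min() != null);  // true
        System.out.println(hasSingleValue(allNulls.min(), allNulls.max(), allNulls.nulls(), 3));  // true

        // Column chunk with one distinct value plus a null: not a single value.
        IntStats mixed = new IntStats();
        mixed.update(5);
        mixed.update(null);
        mixed.update(5);
        System.out.println(hasSingleValue(mixed.min(), mixed.max(), mixed.nulls(), 3));  // false
    }
}
```

With the old check (`max.equals(min)` alone), the `mixed` chunk above would have been reported as single-valued, and two equal byte arrays would have been reported as different.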
---