Github user vvysotskyi commented on a diff in the pull request:
https://github.com/apache/drill/pull/805#discussion_r112650691
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
@@ -1668,9 +1666,25 @@ public ColumnMetadata_v3(String[] name, PrimitiveTypeName primitiveType, Object
return nulls;
}
+ /**
+ * Checks that the column chunk has single value.
+ * Returns true if minValue and maxValue are the same, but not null,
+ * and all column chunk values are not null.
+ * Returns true if minValue and maxValue are null and null values count in
+ * the column chunk is greater than 0.
+ *
+ * @return true if column has single value
+ */
@Override
public boolean hasSingleValue() {
- return (minValue !=null && maxValue != null && minValue.equals(maxValue));
+ if (nulls != null) {
--- End diff --
Yes, we should apply it to ColumnMetadata_v1 as well, thanks.
In ColumnMetadata_v2, mxValue is set only when the min value is the same as
the max value. So when mxValue == null and the nulls count is greater than
zero, we cannot be sure that the column has a single value.
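The ambiguity described above can be illustrated with a minimal sketch. The class `V2EntrySketch` and its factory `fromChunk` are hypothetical names invented for illustration, not Drill code; the only behavior assumed is the one stated above, that mxValue is stored only when min equals max.

```java
// Hypothetical stand-in for a v2-style metadata entry (not the Drill class).
// Assumption: mxValue is recorded only when min and max are equal.
final class V2EntrySketch {
  final Object mxValue; // null unless min equals max
  final long nulls;     // number of null values in the column chunk

  private V2EntrySketch(Object mxValue, long nulls) {
    this.mxValue = mxValue;
    this.nulls = nulls;
  }

  // Builds an entry from the min, max and null count of a column chunk.
  static V2EntrySketch fromChunk(Object min, Object max, long nulls) {
    // Keep the value only when min and max are the same non-null value.
    Object mx = (min != null && min.equals(max)) ? max : null;
    return new V2EntrySketch(mx, nulls);
  }
}
```

Under these assumptions, `fromChunk(1, 5, 3L)` and `fromChunk(null, null, 3L)` both produce mxValue == null with nulls == 3, although only the second chunk consists of nulls alone. The stored fields cannot tell the two apart.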
I have also changed the result for the second case described in your
previous comment. Statistics [1] for most Parquet types use Java primitive
types to store the min and max values, so min/max cannot be null even if the
table has null values. So I removed the nulls == 0 check.
[1]
https://github.com/apache/parquet-mr/tree/e54ca615f213f5db6d34d9163c97eec98920d7a7/parquet-column/src/main/java/org/apache/parquet/column/statistics
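A minimal sketch of why primitive-backed statistics never yield a null min or max. `IntStatsSketch` is a hypothetical, simplified holder written for illustration. It mimics only the idea of keeping min/max in primitive fields and is not the parquet-mr API.

```java
// Hypothetical, simplified statistics holder (not the parquet-mr classes).
// Assumption: min and max live in primitive int fields, nulls are only counted.
final class IntStatsSketch {
  private int min;
  private int max;
  private long numNulls;
  private boolean hasNonNullValue;

  // Records one column value; a null only increments the null counter.
  void update(Integer value) {
    if (value == null) {
      numNulls++;
      return;
    }
    if (!hasNonNullValue) {
      min = value;
      max = value;
      hasNonNullValue = true;
    } else {
      min = Math.min(min, value);
      max = Math.max(max, value);
    }
  }

  // Boxing a primitive always produces a non-null object.
  Object boxedMin() { return min; }
  Object boxedMax() { return max; }
  long numNulls() { return numNulls; }
  boolean hasNonNullValue() { return hasNonNullValue; }
}
```

In this sketch, a chunk that contains only nulls still reports non-null boxed min and max values (the field defaults), so a `minValue != null` check alone says nothing about whether nulls are present. The null count has to be consulted separately.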