[
https://issues.apache.org/jira/browse/PARQUET-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769386#comment-17769386
]
ASF GitHub Bot commented on PARQUET-1647:
-----------------------------------------
wgtmac commented on code in PR #1142:
URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1337924016
##########
parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java:
##########
@@ -139,6 +140,43 @@ public Statistics<?> build() {
}
}
+ // Builder for FLOAT16 type to handle special cases of min/max values like
NaN, -0.0, and 0.0
+ private static class Float16Builder extends Builder {
+ public Float16Builder(PrimitiveType type) {
+ super(type);
+ assert type.getPrimitiveTypeName() == PrimitiveTypeName.BINARY;
Review Comment:
```suggestion
assert type.getPrimitiveTypeName() ==
PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY;
```
##########
parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java:
##########
@@ -139,6 +140,43 @@ public Statistics<?> build() {
}
}
+ // Builder for FLOAT16 type to handle special cases of min/max values like
NaN, -0.0, and 0.0
+ private static class Float16Builder extends Builder {
+ public Float16Builder(PrimitiveType type) {
+ super(type);
+ assert type.getPrimitiveTypeName() == PrimitiveTypeName.BINARY;
+ }
+
+ @Override
+ public Statistics<?> build() {
+ Float16Statistics stats = (Float16Statistics) super.build();
+ if (stats.hasNonNullValue()) {
+ Binary bMin = stats.genericGetMin();
+ Binary bMax = stats.genericGetMax();
+ short min = Float16.fromBytesLittleEndian(bMin.getBytes());
+ short max = Float16.fromBytesLittleEndian(bMax.getBytes());
+ // Drop min/max values in case of NaN as the sorting order of values
is undefined for this case
+ if (Float16.isNaN(min) || Float16.isNaN(max)) {
+ bMin =
Binary.fromConstantByteArray(Float16.toBytesLittleEndian(Float16.POSITIVE_ZERO));
Review Comment:
It seems worth adding two static constants (of Binary type) to Float16 for
POSITIVE_ZERO and NEGATIVE_ZERO as they are repeatedly constructed.
##########
parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java:
##########
@@ -226,6 +268,11 @@ public static Builder getBuilderForReading(PrimitiveType
type) {
return new FloatBuilder(type);
case DOUBLE:
return new DoubleBuilder(type);
+ case BINARY:
Review Comment:
```suggestion
case FIXED_LEN_BYTE_ARRAY:
```
##########
parquet-column/src/main/java/org/apache/parquet/column/statistics/Float16Statistics.java:
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.column.statistics;
+
+import org.apache.parquet.schema.LogicalTypeAnnotation;
+import org.apache.parquet.schema.PrimitiveType;
+import org.apache.parquet.schema.Types;
+
+public class Float16Statistics extends BinaryStatistics
+{
+ // A fake type object to be used to generate the proper comparator
+ private static final PrimitiveType DEFAULT_FAKE_TYPE =
Types.optional(PrimitiveType.PrimitiveTypeName.BINARY)
+
.named("fake_binary_float16_type").withLogicalTypeAnnotation(LogicalTypeAnnotation.float16Type());
+
+ /**
+ * @deprecated will be removed in 2.0.0. Use {@link
Statistics#createStats(org.apache.parquet.schema.Type)} instead
+ */
+ @Deprecated
Review Comment:
We shouldn't even add this if it is a deprecated one.
##########
parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java:
##########
@@ -139,6 +140,43 @@ public Statistics<?> build() {
}
}
+ // Builder for FLOAT16 type to handle special cases of min/max values like
NaN, -0.0, and 0.0
+ private static class Float16Builder extends Builder {
+ public Float16Builder(PrimitiveType type) {
+ super(type);
+ assert type.getPrimitiveTypeName() == PrimitiveTypeName.BINARY;
Review Comment:
Please check the fixed length (2) as well.
##########
parquet-common/src/main/java/org/apache/parquet/type/Float16.java:
##########
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.type;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+/**
+ * The class is a utility class to manipulate half-precision 16-bit
+ * <a
href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">IEEE
754</a>
+ * floating point data types (also called fp16 or binary16). A half-precision
float can be
+ * created from or converted to single-precision floats, and is stored in a
short data type.
+ * The IEEE 754 standard specifies an float16 as having the following format:
+ * <ul>
+ * <li>Sign bit: 1 bit</li>
+ * <li>Exponent width: 5 bits</li>
+ * <li>Significand: 10 bits</li>
+ * </ul>
+ *
+ * <p>The format is laid out as follows:</p>
+ * <pre>
+ * 1 11111 1111111111
+ * ^ --^
> [Java] support for Arrow's float16
> ----------------------------------
>
> Key: PARQUET-1647
> URL: https://issues.apache.org/jira/browse/PARQUET-1647
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-format, parquet-thrift
> Reporter: The Alchemist
> Priority: Minor
>
> h2. DESCRIPTION
>
> I'm wondering if there's any interest in supporting Arrow's {{float16}} type
> in Parquet.
> There seem to be one or two {{float16}} / {{halffloat}} tickets here (e.g.,
> PARQUET-1403) but nothing that speaks to adding half-float support to Parquet
> in-general.
>
> h2. PLANS
> I'm able to spend some time on this, if someone points me in the right
> direction.
>
> # Add the {{HALFFLOAT}} or {{FLOAT16}} enum (any preferred naming
> convention?) to
> [https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L32]
> # Add {{HALFFLOAT}} to {{org.apache.parquet.schema.PrimitiveType}}
> # Add {{HALFFLOAT}} support to
> {{org.apache.parquet.arrow.schema.SchemaConverter}}
> # Add encoding for new type at {{org.apache.parquet.column.Encoding}}
> # ??
> If anyone has any interest in this, pointers, or comments, they would be
> greatly appreciated!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)