[
https://issues.apache.org/jira/browse/PARQUET-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732592#comment-17732592
]
ASF GitHub Bot commented on PARQUET-758:
----------------------------------------
pitrou commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1591494344
> FWIW, I rather think it should be a physical type for the following
> reasons:
>
> * encodings are currently only defined on the physical type, not the
> logical one. So allowing BYTE_STREAM_SPLIT for this type would actually break
> this if it is a logical type.
This seems reasonable, but BYTE_STREAM_SPLIT is the only relevant encoding
here (and it's probably not widely used yet).
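For context, here is a minimal sketch of what BYTE_STREAM_SPLIT would do to 2-byte values: byte 0 of every value goes into one stream, byte 1 into another, and the streams are concatenated. It assumes little-endian IEEE 754 half-precision storage, which is an assumption and not something settled in this thread. The function names are illustrative and not taken from any Parquet implementation.

```python
import struct


def byte_stream_split_encode(data: bytes, width: int) -> bytes:
    """Scatter byte i of every `width`-byte value into stream i, then concatenate the streams."""
    if len(data) % width != 0:
        raise ValueError("buffer length is not a multiple of the value width")
    return b"".join(data[i::width] for i in range(width))


def byte_stream_split_decode(encoded: bytes, width: int) -> bytes:
    """Inverse of byte_stream_split_encode: interleave the streams back into values."""
    if len(encoded) % width != 0:
        raise ValueError("buffer length is not a multiple of the value width")
    count = len(encoded) // width
    out = bytearray(len(encoded))
    for i in range(width):
        out[i::width] = encoded[i * count:(i + 1) * count]
    return bytes(out)


# Three half-precision values, packed little-endian (assumed byte order).
raw = struct.pack("<3e", 1.0, -2.5, 0.5)      # 00 3c | 00 c1 | 00 38
encoded = byte_stream_split_encode(raw, 2)    # 00 00 00 | 3c c1 38
assert byte_stream_split_decode(encoded, 2) == raw
```

Nothing in the transform depends on the values being floats; only the byte width matters.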
> * Having this be a logical type while float and double are physical types
> seems inconsistent.
While I agree it seems inconsistent, this is an idealistic argument. If
Float16 had been included from scratch, it would be logical ( :-) ) to make it
a physical type because there would not be any compatibility problem.
> * There might eventually be hardware support or native language support
> for this type. In this case, having it as a physical type would make it
> easier to leverage this hardware / language support, as most libraries
> instantiate encoders/decoders based on the physical type.
I'm not sure how much easier making Float16 a physical type would make
encoding/decoding. The main change would be that the byte width is known at
compile time, instead of at runtime. Other than that, having HW support for
Float16 would not change the ease of copying data two bytes at a time...
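To illustrate the point about byte width, here is a rough sketch contrasting a generic fixed-length decoder, where the width is a runtime parameter, with one specialised for a width of 2. Both function names are hypothetical, and little-endian half-precision storage is assumed.

```python
import struct
from typing import List


def decode_fixed_len(buf: bytes, width: int) -> List[bytes]:
    """Generic path: the value width is only known at runtime (as for FIXED_LEN_BYTE_ARRAY)."""
    if len(buf) % width != 0:
        raise ValueError("buffer length is not a multiple of the value width")
    return [buf[i:i + width] for i in range(0, len(buf), width)]


def decode_float16(buf: bytes) -> List[float]:
    """Specialised path: the width is fixed at 2, as it would be for a dedicated physical type."""
    count, remainder = divmod(len(buf), 2)
    if remainder:
        raise ValueError("buffer length is not a multiple of 2")
    # "e" is the struct format code for IEEE 754 half precision; "<" assumes little-endian.
    return list(struct.unpack(f"<{count}e", buf))


buf = struct.pack("<2e", 1.0, 0.5)
raw_values = decode_fixed_len(buf, 2)   # opaque 2-byte chunks
float_values = decode_float16(buf)      # interpreted as half-precision floats
```

Either way the copy itself is two bytes per value; the only difference is whether `width` is a parameter or a constant.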
> * IMHO, the basic idea behind physical and logical types is not to support
> forward compatibility; that is just a byproduct. Otherwise, there should just
> be one or two physical types in the first place (FIXED_LEN_BYTE_ARRAY and
> BYTE_ARRAY).
That's partly true, but once the Parquet format is widely used, forward
compatibility becomes a significant concern.
If Float16 is a new physical type, existing systems will probably not be
able to read data with such a column _at all_. If Float16 is a new logical
type, existing systems should be able to read the data; they just won't be able
to draw any insight from the Float16 columns (but will be able to process the
other columns).
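A toy model of that difference, as a sketch only: it assumes a reader that dispatches on the physical type and ignores logical annotations it does not recognise. The reader function, the type-name strings (including "FLOAT16"), and the little-endian layout are all hypothetical and chosen for illustration.

```python
import struct
from typing import List, Optional

# Physical types a hypothetical pre-Float16 reader can decode (a subset, for illustration).
LEGACY_PHYSICAL_TYPES = {"FLOAT", "DOUBLE", "FIXED_LEN_BYTE_ARRAY"}


def legacy_read(physical: str, logical: Optional[str], width: int, buf: bytes) -> List:
    """Decode a column the way an older reader might: dispatch on the physical type only."""
    if physical not in LEGACY_PHYSICAL_TYPES:
        # A new physical type leaves the reader with no decoder at all.
        raise NotImplementedError(f"unknown physical type: {physical}")
    if physical == "FLOAT":
        return list(struct.unpack(f"<{len(buf) // 4}f", buf))
    if physical == "DOUBLE":
        return list(struct.unpack(f"<{len(buf) // 8}d", buf))
    # FIXED_LEN_BYTE_ARRAY: an unrecognised logical annotation is ignored,
    # so the values come back as opaque byte strings.
    return [buf[i:i + width] for i in range(0, len(buf), width)]


column = struct.pack("<2e", 1.0, 0.5)

# Float16 as a logical type over FIXED_LEN_BYTE_ARRAY(2): readable, but uninterpreted.
opaque = legacy_read("FIXED_LEN_BYTE_ARRAY", "FLOAT16", 2, column)

# Float16 as a new physical type: the legacy reader cannot decode the column.
try:
    legacy_read("FLOAT16", None, 2, column)
    readable = True
except NotImplementedError:
    readable = False
```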
To sum it up:
* making Float16 a logical type would make adoption of the new type faster
as compatibility with existing systems would be preserved
* making Float16 a physical type would be more consistent overall, and would
allow applying the BYTE_STREAM_SPLIT encoding
> [Format] HALF precision FLOAT Logical type
> ------------------------------------------
>
> Key: PARQUET-758
> URL: https://issues.apache.org/jira/browse/PARQUET-758
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-format
> Reporter: Julien Le Dem
> Priority: Minor
>