pitrou commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1591494344
> FWIW, I rather think it should be a physical type for the following reasons:
>
> * encodings are currently only defined on the physical type, not the logical one. So allowing BYTE_STREAM_SPLIT for this type would actually break this if it is a logical type.

This seems reasonable, but BYTE_STREAM_SPLIT is the only relevant encoding here (and it's probably not widely used yet).

> * Having this be a logical type while float and double are physical types seems inconsistent.

While I agree it seems inconsistent, this is an idealistic argument. If Float16 had been included from scratch, it would be logical ( :-) ) to make it a physical type because there would not be any compatibility problem.

> * There might eventually be hardware support or native language support for this for this type. In this case, having it as physical type would allow easier to leverage this hardware / language support, as most libraries instantiate encoders/decoders based on the physical type.

I'm not sure by how much making Float16 a physical type would make encoding/decoding easier. The main change would be that bytewidth is known at compile time, instead of at runtime. Other than that, having HW support for Float16 would not change the ease of copying data two bytes at a time...

> * IMHO, the basic idea behind physical and logical types is not to support forward compatibility; that is just a byproduct. Otherwise, there should just be one or two physical types in the first place (FIXED_LEN_BYTE_ARRAY and BYTE_ARRAY).

That's partly true, but once the Parquet format is widely used, forward compatibility becomes a significant concern. If Float16 is a new physical type, existing systems will probably not be able to read data with such a column _at all_. If Float16 is a new logical type, existing systems should be able to read the data; they just won't be able to draw any insight from the Float16 columns (but will be able to process the other columns).
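To illustrate the bytewidth point, here is a minimal Python sketch of a BYTE_STREAM_SPLIT-style transform where the value width is a runtime parameter. It follows the general idea of the encoding (byte k of every value goes to stream k, then the streams are concatenated). The function names are hypothetical and not taken from any Parquet library, and applying the transform to 2-byte values is an assumption made for illustration only.

```python
import struct


def byte_stream_split_encode(data: bytes, width: int) -> bytes:
    """Scatter byte k of every value into stream k, then concatenate the streams.

    `width` is the value size in bytes, passed at runtime (hypothetical helper).
    """
    if width <= 0 or len(data) % width != 0:
        raise ValueError("data length must be a multiple of the value width")
    return b"".join(data[k::width] for k in range(width))


def byte_stream_split_decode(encoded: bytes, width: int) -> bytes:
    """Inverse transform: interleave the `width` streams back into values."""
    if width <= 0 or len(encoded) % width != 0:
        raise ValueError("encoded length must be a multiple of the value width")
    count = len(encoded) // width
    out = bytearray(len(encoded))
    for k in range(width):
        # Stream k holds byte k of each value, in value order.
        out[k::width] = encoded[k * count:(k + 1) * count]
    return bytes(out)


# Three little-endian half-precision values (struct format "e"), 2 bytes each.
values = struct.pack("<3e", 1.0, 2.0, 0.5)
encoded = byte_stream_split_encode(values, width=2)
decoded = byte_stream_split_decode(encoded, width=2)
```

The same two functions work for widths of 2, 4 or 8 bytes; the only thing a dedicated physical type would change in this sketch is that `width` could become a constant.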
To sum it up:

* making Float16 a logical type would make adoption of the new type faster as compatibility with existing systems would be preserved
* making Float16 a physical type would be more consistent overall, and would allow applying the BYTE_STREAM_SPLIT encoding
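The compatibility argument can be sketched as follows, assuming Float16 were a logical type annotating a 2-byte FIXED_LEN_BYTE_ARRAY stored little-endian. Everything here is hypothetical (the `read_flba2_column` helper, the `"FLOAT16"` annotation string, the set of known annotations) and only models how a reader might behave, not how any existing implementation does.

```python
import struct


def read_flba2_column(raw: bytes, logical_type, known_logical_types):
    """Read a column of 2-byte fixed-length values (hypothetical reader model).

    A reader that knows the assumed "FLOAT16" annotation decodes half floats;
    any other reader falls back to the opaque 2-byte values.
    """
    if len(raw) % 2 != 0:
        raise ValueError("column data must be a whole number of 2-byte values")
    if logical_type == "FLOAT16" and logical_type in known_logical_types:
        # struct format "e" is IEEE 754 binary16; "<" selects little-endian.
        return list(struct.unpack(f"<{len(raw) // 2}e", raw))
    # Unknown annotation: the physical type is still readable as raw bytes.
    return [raw[i:i + 2] for i in range(0, len(raw), 2)]


raw = struct.pack("<2e", 1.5, -2.0)

# A reader that predates the annotation still gets the values as bytes.
old_reader = read_flba2_column(raw, "FLOAT16", known_logical_types=set())
# A reader that understands the annotation gets floating-point numbers.
new_reader = read_flba2_column(raw, "FLOAT16", known_logical_types={"FLOAT16"})
```

With a new physical type there would be no such fallback branch: a reader that does not know the type has no way to interpret the column's pages.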