pitrou commented on PR #184:
URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1591494344

   > FWIW, I rather think it should be a physical type for the following 
reasons:
   > 
   > * encodings are currently only defined on the physical type, not the 
logical one. So allowing BYTE_STREAM_SPLIT for this type would actually break 
this if it is a logical type.
   
   This seems reasonable, but BYTE_STREAM_SPLIT is the only relevant encoding 
here (and it's probably not widely used yet).
   
   > * Having this be a logical type while float and double are physical types 
seems inconsistent.
   
   While I agree it seems inconsistent, this is an idealistic argument. If 
Float16 had been included from scratch, it would be logical ( :-) ) to make it 
a physical type because there would not be any compatibility problem.
   
   > * There might eventually be hardware support or native language support 
for this type. In this case, having it as a physical type would make it 
easier to leverage this hardware / language support, as most libraries 
instantiate encoders/decoders based on the physical type.
   
   I'm not sure how much easier encoding/decoding would become if Float16 
were a physical type. The main change would be that the byte width is known at 
compile time instead of at runtime. Other than that, having HW support for 
Float16 would not change the ease of copying data two bytes at a time...
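
   To illustrate the point about byte width, here is a minimal Python sketch 
of a BYTE_STREAM_SPLIT-style transform that takes the value width as a runtime 
parameter. The function names are hypothetical and not taken from any Parquet 
implementation. The sketch assumes the usual description of the encoding (byte 
k of every value goes to stream k, and the streams are concatenated) and that 
Float16 values are stored as 2-byte little-endian IEEE 754 half-precision 
numbers.

```python
import struct


def byte_stream_split_encode(data: bytes, width: int) -> bytes:
    """Scatter byte k of every value into stream k, then concatenate the streams.

    `width` is the per-value byte width, supplied at runtime (2 for Float16
    under the assumption stated above).
    """
    if width <= 0 or len(data) % width != 0:
        raise ValueError("data length must be a positive multiple of width")
    # data[k::width] selects byte k of each value, in value order.
    return b"".join(data[k::width] for k in range(width))


def byte_stream_split_decode(encoded: bytes, width: int) -> bytes:
    """Inverse transform: interleave the `width` streams back into values."""
    if width <= 0 or len(encoded) % width != 0:
        raise ValueError("encoded length must be a positive multiple of width")
    count = len(encoded) // width  # number of values
    out = bytearray(len(encoded))
    for k in range(width):
        # Stream k holds `count` bytes; put them back at positions k, k+width, ...
        out[k::width] = encoded[k * count:(k + 1) * count]
    return bytes(out)


# Three half-precision values packed as little-endian 2-byte words.
raw = struct.pack("<3e", 1.0, -2.5, 0.5)
encoded = byte_stream_split_encode(raw, 2)
assert byte_stream_split_decode(encoded, 2) == raw
```

   The same two functions handle width 2, 4 or 8. A width known at compile 
time would presumably only let an implementation specialize the inner loop.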
   
   > * IMHO, the basic idea behind physical and logical types is not to support 
forward compatibility; that is just a byproduct. Otherwise, there should just 
be one or two physical types in the first place (FIXED_LEN_BYTE_ARRAY and 
BYTE_ARRAY).
   
   That's partly true, but once the Parquet format is widely used, forward 
compatibility becomes a significant concern.
   
   If Float16 is a new physical type, existing systems will probably not be 
able to read data with such a column _at all_. If Float16 is a new logical 
type, existing systems should be able to read the data; they just won't be able 
to draw any insight from the Float16 columns (but will be able to process the 
other columns).
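
   As a rough sketch of that difference (in Python, with hypothetical function 
names, and assuming Float16 would be carried as a 2-byte FIXED_LEN_BYTE_ARRAY 
holding little-endian IEEE 754 half-precision values): a reader that does not 
know the logical type can still slice the column into opaque 2-byte values, 
and only a reader that knows the annotation turns them into numbers.

```python
import struct
from typing import List


def read_flba_column(page: bytes, type_length: int) -> List[bytes]:
    """What a reader unaware of the annotation can still do:
    slice the buffer into fixed-width opaque values."""
    if type_length <= 0 or len(page) % type_length != 0:
        raise ValueError("page length must be a positive multiple of type_length")
    return [page[i:i + type_length] for i in range(0, len(page), type_length)]


def interpret_as_float16(raw_values: List[bytes]) -> List[float]:
    """What a reader aware of the (assumed) Float16 annotation adds:
    decode each 2-byte value as a little-endian half-precision float."""
    return [struct.unpack("<e", value)[0] for value in raw_values]


# Two half-precision values as they might sit in a column buffer.
page = struct.pack("<2e", 1.5, -0.25)

opaque = read_flba_column(page, 2)      # older reader: bytes it cannot interpret
numbers = interpret_as_float16(opaque)  # newer reader: actual floats
```

   With a new physical type there would be no equivalent of the first 
function in an older reader, which is the compatibility concern described 
above.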
   
   To sum it up:
   * making Float16 a logical type would make adoption of the new type faster 
as compatibility with existing systems would be preserved
   * making Float16 a physical type would be more consistent overall, and would 
allow applying the BYTE_STREAM_SPLIT encoding
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.