andrewthad opened a new issue, #36188:
URL: https://github.com/apache/arrow/issues/36188

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   In `format/Message.fbs`, the Arrow project documents two options for 
compression:
   
   ```
   enum CompressionType:byte {
     // LZ4 frame format, for portability, as provided by lz4frame.h or wrappers
     // thereof. Not to be confused with "raw" (also called "block") format
     // provided by lz4.h
     LZ4_FRAME,
   
     // Zstandard
     ZSTD
   }
   ```
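
To make the gap concrete, here is a minimal sketch of what the enum above implies. It assumes only the standard FlatBuffers rule that enum members without explicit values are numbered from 0 in declaration order. The Python class and the `ipc_compression_value` helper are hypothetical illustrations, not part of pyarrow or the Arrow format, and the lookup matches exact names only (aliases are not modeled).

```python
from enum import IntEnum
from typing import Optional


class CompressionType(IntEnum):
    """Hypothetical Python mirror of CompressionType in format/Message.fbs."""
    LZ4_FRAME = 0  # first member, so FlatBuffers assigns 0 implicitly
    ZSTD = 1       # previous value + 1


# Codec names taken from the pyarrow.Codec page, as listed in this issue.
PYARROW_CODEC_NAMES = [
    "gzip", "bz2", "brotli", "lz4", "lz4_frame", "lz4_raw", "zstd", "snappy",
]


def ipc_compression_value(codec_name: str) -> Optional[int]:
    """Return the enum value for a codec name, or None if the schema has no member for it."""
    member = CompressionType.__members__.get(codec_name.upper())
    return None if member is None else int(member)


if __name__ == "__main__":
    for name in PYARROW_CODEC_NAMES:
        print(f"{name:10s} -> {ipc_compression_value(name)}")
```

Under these assumptions only `lz4_frame` and `zstd` resolve to a value; `lz4_raw` and the other names have no member in the schema as quoted.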
   
   However, [pyarrow.Codec](https://arrow.apache.org/docs/python/generated/pyarrow.Codec.html) suggests that there are many more options, at least: gzip, bz2, brotli, lz4, lz4_frame, lz4_raw, zstd, snappy. This is confirmed by [the 5.0.0 release notes](https://arrow.apache.org/blog/2021/07/29/5.0.0-release/):
   
   > The new LZ4_RAW compression scheme was implemented (PARQUET-1998). Unlike 
the legacy LZ4 compression scheme, it is defined unambiguously and should 
provide better portability once other Parquet implementations catch up.
   
   But the FlatBuffers definition lags behind. I've looked through some of the generated C++ files, and I cannot figure out what the FlatBuffers value for `LZ4_RAW` is supposed to be. I would greatly appreciate it if someone who understands the living spec could update the formal spec.
   
   ### Component(s)
   
   Documentation

