andrewthad opened a new issue, #36188:
URL: https://github.com/apache/arrow/issues/36188
### Describe the bug, including details regarding any error messages, version, and platform.
In `format/Message.fbs`, the Arrow project documents two options for
compression:
```
enum CompressionType:byte {
  // LZ4 frame format, for portability, as provided by lz4frame.h or wrappers
  // thereof. Not to be confused with "raw" (also called "block") format
  // provided by lz4.h
  LZ4_FRAME,
  // Zstandard
  ZSTD
}
```
However,
[pyarrow.Codec](https://arrow.apache.org/docs/python/generated/pyarrow.Codec.html)
suggests that there are many more options, at least: gzip, bz2, brotli,
lz4, lz4_frame, lz4_raw, zstd, and snappy. This is confirmed by [the 5.0.0
release notes](https://arrow.apache.org/blog/2021/07/29/5.0.0-release/):
> The new LZ4_RAW compression scheme was implemented (PARQUET-1998). Unlike
> the legacy LZ4 compression scheme, it is defined unambiguously and should
> provide better portability once other Parquet implementations catch up.

But the FlatBuffers definition lags behind. I've looked through some of the
generated C++ files, and I cannot figure out what the FlatBuffers value for
`LZ4_RAW` is supposed to be. I would greatly appreciate it if someone who
understands the living spec could update the formal spec.
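
To make the gap concrete, here is a minimal Python sketch of the numbering implied by the enum as quoted above. It assumes the usual FlatBuffers rule that members without explicit values are numbered from 0 in declaration order. The `flatbuffers_value` helper and the `codec_names` list are hypothetical, for illustration only, and are not part of any Arrow API.

```python
from enum import IntEnum


class CompressionType(IntEnum):
    """Mirror of the CompressionType enum quoted from format/Message.fbs.

    Assumption: FlatBuffers numbers members from 0 in declaration order
    when no explicit values are given.
    """

    LZ4_FRAME = 0  # LZ4 frame format (lz4frame.h)
    ZSTD = 1       # Zstandard


def flatbuffers_value(name):
    """Return the enum value for `name`, or None if the enum has no such member.

    Hypothetical helper: matching is a literal, case-insensitive name lookup.
    """
    member = CompressionType.__members__.get(name.upper())
    return None if member is None else int(member)


# Codec names as listed in the pyarrow.Codec documentation cited above.
codec_names = ["gzip", "bz2", "brotli", "lz4", "lz4_frame", "lz4_raw", "zstd", "snappy"]

# Map each codec name to its value in the quoted enum (None if absent).
values = {name: flatbuffers_value(name) for name in codec_names}

for name, value in values.items():
    print(f"{name}: {value}")
```

Under that assumption, only `lz4_frame` and `zstd` resolve to a value; every other name, including `lz4_raw`, has no entry in the enum as written.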
### Component(s)
Documentation