kylebarron edited a comment on issue #180:
URL: https://github.com/apache/arrow-rs/issues/180#issuecomment-1058827130


   Another update on the compression codecs:
   
   - [x] Uncompressed
   - [x] Snappy: works out of the box with the `snap` feature
   - [x] Gzip: works out of the box with the `flate2` feature
   - [x] Brotli: works out of the box with the `brotli` feature. (I thought 
this had failed for me before, so [updating my 
llvm/clang](https://github.com/kylebarron/parquet-wasm/pull/2) while debugging 
ZSTD may be what made it work.)
   - [x] ZSTD: works with [this unreleased 
commit](https://github.com/gyscos/zstd-rs/commit/d6bfa32d09b8e4ef747c9b57109974c270ffab72)
 on `zstd-rs`, merged in January 2022, a day after the `0.10.0` release was cut 
🥲 . I tested it against the latest `zstd-rs` master with 
`default-features = false` and it works. Hopefully they make a new release 
soon. (Should work as of #1414)
   - [ ] LZ4: The currently used 
[`lz4-rs`](https://github.com/10XGenomics/lz4-rs) hasn't had a release since 
June 8, 2020. It accepted a PR from July 2020 that [purported to add support 
for `wasm32-unknown-unknown`](https://github.com/10XGenomics/lz4-rs/pull/11) 
but I pulled their repo and couldn't get it to build for that target.
   
        [`lz4_flex`](https://github.com/PSeitz/lz4_flex) successfully compiles 
to WASM, but the API is slightly different. I 
[tried](https://github.com/kylebarron/arrow-rs/pull/3/files#diff-73978efa44253b6c1cafc48e0fd042b761ebfff35cb32c9f53717d1641dab0fe)
 to update `parquet/src/compression.rs` to the `lz4_flex` API. It compiles fine 
but I get panics when testing the WASM with an LZ4 Parquet file in the browser. 
I don't really know what I'm doing wrong 😅 . Switching to `lz4_flex` seems 
very achievable for someone who knows Rust better than I do 😄 .
   - [ ] LZO: I see references to LZO compression in `parquet/src/basic.rs` but 
it doesn't seem to be implemented in `parquet/src/compression.rs`? I hadn't 
heard of LZO before. According to Wes, [LZO isn't really used 
anymore](https://github.com/apache/arrow/issues/2209#issuecomment-402859258), 
so I don't see myself working on this.
   
   As for the Arrow IPC files being malformed, I switched from 
`arrow::ipc::writer::StreamWriter` to `arrow::ipc::writer::FileWriter`. Now all 
the Arrow files generated by `parquet-wasm` from my Parquet test files are 
readable in _Python_ using `pa.ipc.open_file`, so presumably the JS errors 
arising from `arrow.tableFromIPC(tableBytes)` are issues with the JS library 
and its IPC parser. (It looks like there are known issues with the IPC support 
in JS: [ARROW-15642](https://issues.apache.org/jira/browse/ARROW-15642), 
[ARROW-13818](https://issues.apache.org/jira/browse/ARROW-13818), 
[ARROW-8674](https://issues.apache.org/jira/browse/ARROW-8674))
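
A quick sanity check for the stream-versus-file distinction above is to look at the magic bytes of the output buffer. The Arrow IPC file format begins and ends with the ASCII magic `ARROW1`, while the stream format has no such marker. The following is only a minimal sketch using the Rust standard library: `IpcKind` and `classify_ipc` are hypothetical names that are not part of `arrow-rs` or `parquet-wasm`, and the check inspects the framing only, so it does not validate the footer or any record batch.

```rust
/// Magic bytes that open and close an Arrow IPC *file*.
/// The stream format carries no such marker.
const ARROW_MAGIC: &[u8] = b"ARROW1";

/// Coarse classification of a byte buffer (hypothetical helper type).
#[derive(Debug, PartialEq)]
enum IpcKind {
    /// Framed like an Arrow IPC file: magic at both ends.
    File,
    /// Anything else, including the IPC stream format.
    NotFile,
}

/// Check only the framing of `bytes`; the contents are not parsed.
fn classify_ipc(bytes: &[u8]) -> IpcKind {
    // Smallest conceivable file: leading magic padded to 8 bytes,
    // a 4-byte footer length, and the 6-byte trailing magic.
    let long_enough = bytes.len() >= 8 + 4 + ARROW_MAGIC.len();
    if long_enough && bytes.starts_with(ARROW_MAGIC) && bytes.ends_with(ARROW_MAGIC) {
        IpcKind::File
    } else {
        IpcKind::NotFile
    }
}
```

Running `classify_ipc` on the bytes returned from the WASM module before handing them to JS would at least show which of the two formats the reader is being given.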


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


