[ 
https://issues.apache.org/jira/browse/ARROW-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523985#comment-17523985
 ] 

Dominik Moritz commented on ARROW-8674:
---------------------------------------

> For one, gzip compression is much slower than LZ4 or ZSTD compression.

Maybe. Let's make sure we compare the native gzip compression that a web 
server uses against JS lz4/zstd compression.

> I think it would be possible to force the `compress` and `decompress` 
> functions in the plugin system to be synchronous. That would just force the 
> user to finish any async initialization before trying to read/write a file, 
> since wasm bundles can't be instantiated synchronously I think.

It would unfortunately also preclude people from putting decompression into a 
worker. Maybe we can make the relevant IPC methods return promises when the 
compression/decompression method is async (returns a promise).
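
As a rough sketch of that idea, assuming a hypothetical `Codec` interface and 
a hypothetical `decompressBuffers` helper (neither is part of the Arrow JS 
API), the read path could stay synchronous for synchronous codecs and only 
return a promise when the codec itself does:

```typescript
// Hypothetical types for illustration only; not the Arrow JS API.
type MaybePromise<T> = T | Promise<T>;

interface Codec {
  compress(data: Uint8Array): MaybePromise<Uint8Array>;
  decompress(data: Uint8Array): MaybePromise<Uint8Array>;
}

// Treat anything with a callable `then` as a promise (covers thenables too).
function isPromiseLike<T>(value: MaybePromise<T>): value is Promise<T> {
  return typeof (value as any)?.then === "function";
}

// Hypothetical stand-in for an IPC read step that decompresses body buffers.
// Returns a plain array when every result is synchronous, and a promise
// only when the codec returned at least one promise.
function decompressBuffers(
  codec: Codec,
  buffers: Uint8Array[],
): MaybePromise<Uint8Array[]> {
  const results = buffers.map((buffer) => codec.decompress(buffer));
  if (results.some((result) => isPromiseLike(result))) {
    return Promise.all(results);
  }
  return results as Uint8Array[];
}
```

With this shape a caller using a synchronous codec never has to `await`, 
while a codec that forwards work to a worker can simply return a promise.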

> None of the ZSTD libraries I came across were pure JS. The only LZ4 one that 
> was pure JS was lz4js.

We could consider inlining the wasm code with base64 if it's tiny, but I 
suspect it will not be. Worth considering, though.
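
For reference, a minimal sketch of what inlining could look like. The 
constant below is just the 8-byte empty wasm module (magic number plus 
version), standing in for a real codec binary, and `loadInlinedWasm` is a 
hypothetical helper name:

```typescript
// Smallest valid wasm module (magic number + version), base64-encoded.
// A real bundle would embed the codec's compiled .wasm bytes here instead.
const EMPTY_WASM_BASE64 = "AGFzbQEAAAA=";

// Decode a base64 string into bytes.
// `atob` is available in browsers and in Node.js 16+.
function decodeBase64(b64: string) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Hypothetical loader: no separate .wasm fetch is needed, but instantiation
// itself is still asynchronous here.
async function loadInlinedWasm(
  b64: string,
  imports: WebAssembly.Imports = {},
): Promise<WebAssembly.Instance> {
  const { instance } = await WebAssembly.instantiate(decodeBase64(b64), imports);
  return instance;
}
```

Note that base64 encodes every 3 bytes as 4 characters, so the inlined string 
is about a third larger than the binary before any transport compression.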

Anyway, I think it makes sense to work on this and send a pull request. We 
should definitely have a way to pass in/register compression algorithms. Then 
let's look into whether we want to bundle any algorithms. Let's start with lz4 
and try a few libraries (e.g. https://github.com/gorhill/lz4-wasm, 
https://github.com/Benzinga/lz4js, https://github.com/pierrec/node-lz4). If 
they are small enough, I would consider including a default lz4 implementation. 
Sounds good?
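
To make the "pass in/register" part concrete, here is a minimal sketch of a 
registry. The names `CompressionCodec` and `CodecRegistry` are hypothetical 
and not an existing Arrow JS API; the numeric values follow the 
`CompressionType` enum in the Arrow IPC format (LZ4_FRAME = 0, ZSTD = 1). The 
codec is kept synchronous here for brevity.

```typescript
// Values follow the CompressionType enum in the Arrow IPC format.
const CompressionType = { LZ4_FRAME: 0, ZSTD: 1 } as const;
type CompressionType = (typeof CompressionType)[keyof typeof CompressionType];

// Hypothetical codec shape; a user would wrap a library of their choice.
interface CompressionCodec {
  encode(data: Uint8Array): Uint8Array;
  decode(data: Uint8Array): Uint8Array;
}

// Hypothetical registry: nothing is bundled by default, users opt in.
class CodecRegistry {
  private codecs = new Map<CompressionType, CompressionCodec>();

  set(type: CompressionType, codec: CompressionCodec): this {
    this.codecs.set(type, codec);
    return this;
  }

  has(type: CompressionType): boolean {
    return this.codecs.has(type);
  }

  // Fail loudly when a file needs a codec that was never registered.
  get(type: CompressionType): CompressionCodec {
    const codec = this.codecs.get(type);
    if (codec === undefined) {
      throw new Error(`No codec registered for compression type ${type}`);
    }
    return codec;
  }
}
```

A default lz4 implementation, if we decide to bundle one, would then just be 
a codec that is registered ahead of time.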

> [JS] Implement IPC RecordBatch body buffer compression from ARROW-300
> ---------------------------------------------------------------------
>
>                 Key: ARROW-8674
>                 URL: https://issues.apache.org/jira/browse/ARROW-8674
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: JavaScript
>            Reporter: Wes McKinney
>            Priority: Major
>
> This may not be a hard requirement for JS because this would require pulling 
> in implementations of LZ4 and ZSTD which not all users may want



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
