[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468498#comment-17468498
 ] 

Adrien Grand commented on LUCENE-8739:
--------------------------------------

bq. Would such an increase even make sense or would this cause other issues?

It would require reading more data from disk. This read would be sequential so 
I suspect it wouldn't hurt much, including on slower I/O. The main drawback is 
probably that it would trash a bit more of the filesystem cache. That said, I 
agree with you that we should probably look into increasing the block size with 
ZStandard. I just did a run with 1.5x larger blocks and level=6; it slightly 
outperforms our current BEST_COMPRESSION mode across indexing time, disk usage 
and compression.

||Codec ||Indexing time (ms) ||Disk usage (MB) ||Retrieval time per 10k docs (ms) ||
| ZSTD dict level=6 1.5x larger blocks | 43228 | 57.455 | 1269.22127 |
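
To make the block-size tradeoff concrete, here is a minimal sketch that compresses the same synthetic data in independent blocks of different sizes and compares the total compressed size. It rests on several assumptions: it uses java.util.zip.Deflater as a stand-in because the JDK ships no ZSTD implementation, the class name, the sample documents and the 8 KB base block size are hypothetical, and it does not reproduce Lucene's stored fields format or the numbers in the table above.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class BlockSizeSketch {

    /**
     * Compresses data as independent blocks of at most blockSize bytes and
     * returns the total number of compressed bytes. Each block gets its own
     * Deflater, so no history is shared across blocks.
     */
    static long compressedSize(byte[] data, int blockSize, int level) {
        long total = 0;
        byte[] out = new byte[8192]; // scratch buffer; the output itself is discarded
        for (int off = 0; off < data.length; off += blockSize) {
            int len = Math.min(blockSize, data.length - off);
            Deflater deflater = new Deflater(level);
            deflater.setInput(data, off, len);
            deflater.finish();
            while (!deflater.finished()) {
                total += deflater.deflate(out);
            }
            deflater.end();
        }
        return total;
    }

    /** Builds repetitive synthetic "documents" (hypothetical sample data). */
    static byte[] sampleData(int numDocs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < numDocs; i++) {
            sb.append("{\"id\":").append(i)
              .append(",\"title\":\"document number ").append(i)
              .append("\",\"body\":\"lorem ipsum dolor sit amet\"}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleData(20_000);
        int base = 8 * 1024; // hypothetical base block size, not Lucene's actual value
        // Compare the base size, a 1.5x larger block, and a 4x larger block.
        for (int blockSize : new int[] {base, base * 3 / 2, base * 4}) {
            System.out.printf("block=%d bytes -> %d compressed bytes%n",
                blockSize, compressedSize(data, blockSize, 6));
        }
    }
}
```

Larger blocks give the compressor more shared history and fewer per-block headers, which is where the disk savings come from. The cost is that retrieving a single document means reading and decompressing a larger block.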

bq. Or would 3 presets be too much choice?

IMO it would be too much, but I like the fact that ZSTD could help us have two 
options for compression that share the exact same read logic, e.g. if we 
replaced BEST_SPEED with what you suggested for BALANCED: low level ZSTD 
compression with a small block size.

bq. Anyway I see potential for good tradeoffs here.

+1, ZSTD is quite great. I wouldn't use it in the Lucene default codec yet, 
because lucene-core shouldn't have dependencies and we don't want to use JNI in 
the lucene-core build. Maybe we can reconsider when Project Panama lands and it 
gets easier to interact with native libraries.

> ZSTD Compressor support in Lucene
> ---------------------------------
>
>                 Key: LUCENE-8739
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8739
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/codecs
>            Reporter: Sean Torres
>            Priority: Minor
>              Labels: features
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> ZStandard has a great speed and compression ratio tradeoff. 
> ZStandard is open source compression from Facebook.
> More about ZSTD
> [https://github.com/facebook/zstd]
> [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]


