[ 
https://issues.apache.org/jira/browse/LUCENE-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-9525.
----------------------------------
    Fix Version/s: 8.7
       Resolution: Fixed

> Better handle small documents with the new Lucene87StoredFieldsFormat
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-9525
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9525
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: 8.7
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Stored fields configure a maximum number of documents per block, whose goal 
> is to make sure that you don't decompress more than X documents to get 
> access to a single one. However, this has interesting effects with the new 
> format.
> For instance, we use 4kB of dictionary and blocks of 60kB for at most 512 
> documents per block. So if your documents are very small, say 10 bytes, the 
> block will be 5120 bytes overall, and we'll first compress 4096 bytes 
> independently, and then 5120-4096=1024 bytes with 4096 bytes of dictionary. 
> In this case, training the dictionary takes more time than actually 
> compressing the data, and it's not even clear that it's worth it, since only 
> 1024 bytes out of the 5120 bytes of the block get compressed with a preset 
> dictionary.
> I'm considering adapting the dictionary size and the block size to the total 
> block size in order to better handle such cases.


