[ 
https://issues.apache.org/jira/browse/LUCENE-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196903#comment-17196903
 ] 

ASF subversion and git services commented on LUCENE-9525:
---------------------------------------------------------

Commit 9cd3af50f8093ddf9c70c90fa7cc8e1103ecabb7 in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9cd3af5 ]

LUCENE-9525: Better handle small documents with Lucene87StoredFieldsFormat. (#1876)

Instead of configuring a dictionary size and a block size, the format
now aims for 10 sub-blocks per block and adapts the sizes of the
dictionary and of the sub-blocks to this overall block size.
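
The sizing rule in the commit message can be sketched in Java as follows. This is a hypothetical illustration, not the code from the commit: the class AdaptiveBlockSizing, its methods, and the DICT_SIZE_FACTOR ratio are assumptions introduced here, since this message states only the target of 10 sub-blocks per block. It assumes the dictionary is taken from the start of the block and the remaining bytes are split into roughly equal sub-blocks.

```java
/**
 * Hypothetical sketch (not Lucene's actual code) of the adaptive sizing rule:
 * aim for NUM_SUB_BLOCKS sub-blocks per block and derive both the dictionary
 * length and the sub-block length from the block's total length.
 */
public class AdaptiveBlockSizing {

    // Target number of sub-blocks per block, as stated in the commit message.
    static final int NUM_SUB_BLOCKS = 10;

    // Hypothetical tuning knob: the dictionary gets
    // 1 / (NUM_SUB_BLOCKS * DICT_SIZE_FACTOR) of the block.
    // The real ratio is not given in this message.
    static final int DICT_SIZE_FACTOR = 2;

    /** Dictionary length and sub-block length chosen for one block. */
    static final class Layout {
        final int dictLength;
        final int subBlockLength;

        Layout(int dictLength, int subBlockLength) {
            this.dictLength = dictLength;
            this.subBlockLength = subBlockLength;
        }

        /** Number of sub-blocks needed to cover the bytes after the dictionary. */
        int numSubBlocks(int totalLength) {
            return ceilDiv(totalLength - dictLength, subBlockLength);
        }
    }

    // Ceiling division for a >= 0 and b > 0 (block-sized values, no overflow).
    static int ceilDiv(int a, int b) {
        return (a + b - 1) / b;
    }

    static Layout layout(int totalLength) {
        if (totalLength < 0) {
            throw new IllegalArgumentException("negative block length");
        }
        // The dictionary grows with the block instead of being a fixed 4kB.
        int dictLength = totalLength / (NUM_SUB_BLOCKS * DICT_SIZE_FACTOR);
        // Split the remaining bytes into NUM_SUB_BLOCKS roughly equal sub-blocks;
        // Math.max keeps the length positive for an empty block.
        int subBlockLength = Math.max(1, ceilDiv(totalLength - dictLength, NUM_SUB_BLOCKS));
        return new Layout(dictLength, subBlockLength);
    }

    public static void main(String[] args) {
        // 5120 bytes: 512 documents of 10 bytes; 61440 bytes: a full 60kB block.
        for (int total : new int[] {5120, 61440}) {
            Layout l = layout(total);
            System.out.println(total + " bytes -> dictionary " + l.dictLength
                + ", " + l.numSubBlocks(total) + " sub-blocks of " + l.subBlockLength);
        }
    }
}
```

With DICT_SIZE_FACTOR = 2, the 5120-byte block yields a 256-byte dictionary and 10 sub-blocks of 487 bytes, so the dictionary stays small relative to the data it helps compress, which is the imbalance the issue describes.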

> Better handle small documents with the new Lucene87StoredFieldsFormat
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-9525
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9525
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Stored fields configure a maximum number of documents per block, whose goal
> is to make sure that you don't decompress more than X documents to get access
> to a single one. However, this has interesting effects with the new format.
> For instance, we use a 4kB dictionary and 60kB blocks, with at most 512
> documents per block. So if your documents are very small, say 10 bytes, the
> block will be 5120 bytes overall: we'll first compress 4096 bytes
> independently, and then the remaining 5120-4096=1024 bytes with the 4096
> bytes of dictionary. In this case, training the dictionary takes more time
> than actually compressing the data, and it's not even clear that it's worth
> it, since only 1024 of the block's 5120 bytes get compressed with a preset
> dictionary.
> I'm considering adapting the dictionary size and the block size to the total 
> block size in order to better handle such cases.
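
The arithmetic in the issue description can be checked with a short Java sketch of the fixed-dictionary scheme. FixedDictArithmetic and split are hypothetical names; the model assumes kB means 1024 bytes (consistent with 4kB = 4096 above) and simplifies the flush rule to min(512 * docSize, 60kB), which is not the exact flush logic of the format.

```java
/**
 * Hypothetical worked example (not Lucene code) of the fixed scheme from the
 * issue: a 4kB preset dictionary, 60kB blocks and at most 512 docs per block.
 */
public class FixedDictArithmetic {

    static final int DICT_SIZE = 4 * 1024;        // 4kB preset dictionary
    static final int MAX_BLOCK_SIZE = 60 * 1024;  // 60kB block size
    static final int MAX_DOCS_PER_BLOCK = 512;

    /** Returns {blockSize, bytesCompressedAlone, bytesCompressedWithDict}. */
    static int[] split(int docSize) {
        // Simplification: the block ends at 512 docs or 60kB, whichever comes first.
        int blockSize = (int) Math.min((long) MAX_DOCS_PER_BLOCK * docSize, MAX_BLOCK_SIZE);
        // The first DICT_SIZE bytes are compressed independently.
        int alone = Math.min(DICT_SIZE, blockSize);
        // Only the bytes after them benefit from the preset dictionary.
        int withDict = blockSize - alone;
        return new int[] {blockSize, alone, withDict};
    }

    public static void main(String[] args) {
        for (int docSize : new int[] {10, 1000}) {
            int[] s = split(docSize);
            System.out.printf("%d-byte docs: block %d, %d alone, %d with dictionary (%.0f%%)%n",
                docSize, s[0], s[1], s[2], 100.0 * s[2] / s[0]);
        }
    }
}
```

For 10-byte documents this gives a 5120-byte block in which only 1024 bytes (20%) are compressed with the dictionary; for 1000-byte documents the 60kB cap applies and 57344 of 61440 bytes benefit from it.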



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
