[ https://issues.apache.org/jira/browse/LUCENE-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand resolved LUCENE-9525.
----------------------------------
    Fix Version/s: 8.7
       Resolution: Fixed

> Better handle small documents with the new Lucene87StoredFieldsFormat
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-9525
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9525
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: 8.7
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Stored fields configure a maximum number of documents per block, whose goal
> is to make sure that you don't decompress more than X documents to get
> access to a single one. However, this has interesting effects with the new
> format.
> For instance, we use 4kB of dictionary and blocks of 60kB for at most 512
> documents per block. So if your documents are very small, say 10 bytes, the
> block will be 512 * 10 = 5120 bytes overall, and we'll first compress 4096
> bytes independently, and then 5120-4096=1024 bytes with 4096 bytes of
> dictionary.
> In this case, training the dictionary takes more time than actually
> compressing the data, and it is not clear that it is worth it, since only
> 1024 bytes out of the 5120 bytes of the block get compressed with a preset
> dictionary.
> I'm considering adapting the dictionary size and the block size to the total
> block size in order to better handle such cases.
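
The arithmetic in the description can be checked with a short sketch. This is only an illustration under stated assumptions: the class and method names (SmallDocBlockLayout, blockBytes, dictionaryBytes, bytesUsingDictionary) are hypothetical, the constants are the figures quoted in the issue, and nothing here calls into Lucene or reflects how the fix was implemented.

```java
public class SmallDocBlockLayout {
    // Figures quoted in the issue, hard-coded here as assumptions
    // (not read from any Lucene class).
    static final int DICT_BYTES = 4 * 1024;
    static final int MAX_DOCS_PER_BLOCK = 512;

    /** Uncompressed size of a block that is full by document count. */
    static int blockBytes(int docBytes) {
        return MAX_DOCS_PER_BLOCK * docBytes;
    }

    /** Bytes compressed on their own, which then serve as the dictionary. */
    static int dictionaryBytes(int blockBytes) {
        return Math.min(blockBytes, DICT_BYTES);
    }

    /** Bytes that actually get compressed with the preset dictionary. */
    static int bytesUsingDictionary(int blockBytes) {
        return blockBytes - dictionaryBytes(blockBytes);
    }

    public static void main(String[] args) {
        int block = blockBytes(10); // 10-byte documents, as in the issue
        int dict = dictionaryBytes(block);
        int rest = bytesUsingDictionary(block);
        System.out.println("block=" + block
            + " dictionary=" + dict
            + " withDictionary=" + rest
            + " (" + (100 * rest / block) + "% of the block)");
    }
}
```

With 10-byte documents, only 1024 of the 5120 bytes (20%) benefit from the dictionary, which is the imbalance the issue describes.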