[ https://issues.apache.org/jira/browse/LUCENE-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Greg Miller updated LUCENE-9877: -------------------------------- Status: Open (was: Patch Available) > Explore increasing the allowable exceptions in PForUtil > ------------------------------------------------------- > > Key: LUCENE-9877 > URL: https://issues.apache.org/jira/browse/LUCENE-9877 > Project: Lucene - Core > Issue Type: Task > Components: core/codecs > Affects Versions: main (9.0) > Reporter: Greg Miller > Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Piggybacking a little off of the investigation I was doing over in > LUCENE-9850 I thought it might also be worth-while exploring the impact of > increasing the number of allowable exceptions in PForUtil. The aim of this > investigation is to see if we could reduce index size by allowing for more > exceptions without significant negative impact to performance. > PForUtil currently allows for up to 3 exceptions, and it only uses 3 bits to > encode the number of exceptions (using the remaining 3 bits of the byte used > to also encode the number of bits-per-value, which requires 5 bits). Each > exception used is encoded with a two full bytes, using a maximum of 6 bytes > per block. > It seems to me like 7 might be a more ideal number of exceptions if index > size is the driving motivation. My thought process is that, in the > worst-case, 7 exceptions would be used to save only a single bit-per-value in > the corresponding block. With 128 entries per block, this would save 16 > bytes. So with 14 bytes used to encode the exception values (7 x 2 bytes per > exception), we would save a two bytes in total (just slightly better than > breaking even). If we need fewer than the 7 exceptions, or if we're able to > save more than 1 bit-per-value, it's all additional savings. I suppose the > question is what kind of performance hit we might observe due to decoding > more exceptions. > Also note that 7 exceptions is the max we can encode with the 3 bits we > currently have available for the number of exceptions. So moving to 8 > exceptions would not only take 16 bytes to encode the exceptions (if using > all of them), but we'd need one more byte per block to encode the exception > count. So in the worst case of using all 8 exceptions to save 1 bit per > value, we'd actually be worse off. > I'll post some results here for discussion or at least for public record of > my work for future reference. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org