msokolov opened a new pull request #811: LUCENE-8920: Fix bug preventing FST 
duplicate tails from being shared…
URL: https://github.com/apache/lucene-solr/pull/811
 
 
   … when encoded as array-with-gaps
   While trying to reduce the size of FSTs that use array-with-gaps encoding, 
I found that I had neglected to update the comparison function in NodeHash 
that determines when two arcs are equal, which is what allows shared tails to 
be collapsed together. That behavior wasn't tested anywhere, and it relied on 
internal details of the Arc encoding to short-circuit the equality test when 
two array Arcs are of different sizes.
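
   To illustrate why the equality test has to understand the encoding, here 
is a minimal, self-contained sketch of tail sharing through a node hash. It is 
not Lucene's NodeHash code: the class SharedTailSketch, its Arc type, the END 
marker and the methods encodeWithGaps, nodesEqual and addOrShare are all 
hypothetical names invented for this example. It assumes a frozen node is 
stored as an array indexed by (label - firstLabel), with null marking a gap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; every name here is hypothetical, not Lucene API.
final class SharedTailSketch {
    static final int END = -1; // hypothetical id meaning "no further node"

    static final class Arc {
        final int label;
        final int target; // id of an already frozen node, or END
        Arc(int label, int target) { this.label = label; this.target = target; }
        @Override public boolean equals(Object o) {
            return o instanceof Arc && ((Arc) o).label == label && ((Arc) o).target == target;
        }
        @Override public int hashCode() { return 31 * label + target; }
    }

    // Frozen nodes: slots indexed by (label - firstLabel); null marks a gap.
    private final List<Arc[]> frozen = new ArrayList<>();
    // Hash of the logical arc list -> ids of frozen nodes with that hash.
    private final Map<Integer, List<Integer>> table = new HashMap<>();

    // Encodes arcs (sorted by label) as an array with gaps.
    static Arc[] encodeWithGaps(List<Arc> arcs) {
        if (arcs.isEmpty()) return new Arc[0];
        int first = arcs.get(0).label;
        int last = arcs.get(arcs.size() - 1).label;
        Arc[] slots = new Arc[last - first + 1];
        for (Arc a : arcs) slots[a.label - first] = a;
        return slots;
    }

    // Compares a pending node with a frozen one, skipping gap slots so that
    // only the arcs that are really present take part in the comparison.
    static boolean nodesEqual(List<Arc> pending, Arc[] slots) {
        int i = 0;
        for (Arc slot : slots) {
            if (slot == null) continue; // gap: no arc for this label
            if (i == pending.size() || !slot.equals(pending.get(i))) return false;
            i++;
        }
        return i == pending.size();
    }

    // Returns the id of an equal frozen node if one exists (the tail is
    // shared); otherwise freezes the node and returns its new id.
    int addOrShare(List<Arc> arcs) {
        List<Integer> bucket = table.computeIfAbsent(arcs.hashCode(), k -> new ArrayList<>());
        for (int id : bucket) {
            if (nodesEqual(arcs, frozen.get(id))) return id;
        }
        frozen.add(encodeWithGaps(arcs));
        int id = frozen.size() - 1;
        bucket.add(id);
        return id;
    }

    int size() { return frozen.size(); }

    public static void main(String[] args) {
        SharedTailSketch nodes = new SharedTailSketch();
        int first = nodes.addOrShare(Arrays.asList(new Arc('a', END), new Arc('d', END)));
        int second = nodes.addOrShare(Arrays.asList(new Arc('a', END), new Arc('d', END)));
        System.out.println(first == second); // the second node reuses the first
        System.out.println(nodes.size());
    }
}
```

   In this toy model, a comparison that walked the slots one by one without 
skipping gaps would never match a frozen node that contains a gap, so every 
such tail would be stored again instead of being shared.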
   
   This patch adds a function to check whether arcs are encoded as a packed 
array, and are thus amenable to this optimization, along with a unit test that 
demonstrates the size reduction. 
   
   This fix won't address the worst-case example Adrien posted, but it 
addresses a common case, I think. It would be interesting to see how the ES 
benchmarks are impacted by this fix. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


With regards,
Apache Git Services
