benwtrent commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1912598493
OK, I added https://github.com/mikemccand/luceneutil/pull/253.

Doing some local benchmarking, it seems that the more merges occur, the worse the graphs can get. Sometimes I get good graphs like this:

```
Leaf 5 has 5 layers
Leaf 5 has 137196 documents
Graph level=4 size=2, Fanout min=1, mean=1.00, max=1
%   0  10  20  30  40  50  60  70  80  90 100
    0   1   1   1   1   1   1   1   1   1   1   1   1   1   1
Graph level=3 size=34, Fanout min=4, mean=7.82, max=12
%   0  10  20  30  40  50  60  70  80  90 100
    0   5   6   7   7   8   8   9   9  10  12
Graph level=2 size=545, Fanout min=13, mean=15.98, max=16
%   0  10  20  30  40  50  60  70  80  90 100
    0  16  16  16  16  16  16  16  16  16  16
Graph level=1 size=8693, Fanout min=16, mean=16.00, max=16
%   0  10  20  30  40  50  60  70  80  90 100
    0  16  16  16  16  16  16  16  16  16  16
Graph level=0 size=137196, Fanout min=3, mean=24.57, max=32
%   0  10  20  30  40  50  60  70  80  90 100
    0  14  17  19  22  26  31  32  32  32  32
Graph level=4 size=2, connectedness=1.00
Graph level=3 size=34, connectedness=1.00
Graph level=2 size=545, connectedness=1.00
Graph level=1 size=8693, connectedness=1.00
Graph level=0 size=137196, connectedness=1.00
```

Other times I get graphs that are pretty abysmal:

```
Leaf 7 has 4 layers
Leaf 7 has 39628 documents
Graph level=3 size=8, Fanout min=1, mean=1.75, max=7
%   0  10  20  30  40  50  60  70  80  90 100
    0   1   1   1   1   1   1   1   1   1   7   7
Graph level=2 size=153, Fanout min=1, mean=1.10, max=16
%   0  10  20  30  40  50  60  70  80  90 100
    0   1   1   1   1   1   1   1   1   1  16
Graph level=1 size=2503, Fanout min=1, mean=1.01, max=16
%   0  10  20  30  40  50  60  70  80  90 100
    0   1   1   1   1   1   1   1   1   1  16
Graph level=0 size=39628, Fanout min=1, mean=1.00, max=32
%   0  10  20  30  40  50  60  70  80  90 100
    0   1   1   1   1   1   1   1   1   1  32
Graph level=3 size=8, connectedness=1.00
Graph level=2 size=153, connectedness=0.12
Graph level=1 size=2503, connectedness=0.01
Graph level=0 size=39628, connectedness=0.00
```

The numbers are so bad that I almost suspect a bug in my measurements, but it isn't clear to me where it would be. I am going to check older versions of Lucene to see whether this behavior changes.
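For readers trying to interpret the output above, here is a minimal sketch of how per-level fanout and connectedness statistics could be computed. It is not the code from the luceneutil PR. The class `GraphLevelStats`, its methods, the dense adjacency-list representation (each level re-indexed to `0..size-1`), the nearest-rank decile rule, and the definition of connectedness as "fraction of the level's nodes reachable by BFS from an entry node" are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Locale;

// Hypothetical helper, not the luceneutil code: per-level statistics for an
// HNSW-style graph level stored as dense adjacency lists (neighbors[i] holds
// the local indices of node i's neighbors on that level).
public final class GraphLevelStats {

  private GraphLevelStats() {}

  // Summarizes out-degree (fanout) in the shape of the "Fanout min=..., mean=..., max=..." lines.
  public static String fanoutSummary(int[][] neighbors) {
    int min = neighbors.length == 0 ? 0 : Integer.MAX_VALUE;
    int max = 0;
    long total = 0;
    for (int[] nbrs : neighbors) {
      min = Math.min(min, nbrs.length);
      max = Math.max(max, nbrs.length);
      total += nbrs.length;
    }
    double mean = neighbors.length == 0 ? 0.0 : (double) total / neighbors.length;
    return String.format(Locale.ROOT, "size=%d, Fanout min=%d, mean=%.2f, max=%d",
        neighbors.length, min, mean, max);
  }

  // Fanout at the 0th, 10th, ..., 100th percentile using a simple nearest-rank
  // rule on the sorted degrees (may not match luceneutil's histogram exactly).
  public static int[] fanoutDeciles(int[][] neighbors) {
    int n = neighbors.length;
    int[] degrees = new int[n];
    for (int i = 0; i < n; i++) {
      degrees[i] = neighbors[i].length;
    }
    Arrays.sort(degrees);
    int[] deciles = new int[11];
    if (n == 0) {
      return deciles;
    }
    for (int d = 0; d <= 10; d++) {
      deciles[d] = degrees[(n - 1) * d / 10];
    }
    return deciles;
  }

  // Assumed definition of connectedness: fraction of the level's nodes reachable
  // by breadth-first search along outgoing edges from the entry node.
  public static double connectedness(int[][] neighbors, int entry) {
    int n = neighbors.length;
    if (n == 0) {
      return 1.0; // an empty level is treated as trivially connected
    }
    boolean[] seen = new boolean[n];
    Deque<Integer> queue = new ArrayDeque<>();
    seen[entry] = true;
    queue.add(entry);
    int reached = 1;
    while (!queue.isEmpty()) {
      int node = queue.poll();
      for (int nbr : neighbors[node]) {
        if (!seen[nbr]) {
          seen[nbr] = true;
          reached++;
          queue.add(nbr);
        }
      }
    }
    return (double) reached / n;
  }

  public static void main(String[] args) {
    // Tiny example level: nodes 0, 1, 2 link to each other, but no node links
    // back to node 3, so it is unreachable from entry node 0.
    int[][] level = {{1, 2}, {2}, {0}, {0}};
    System.out.println(fanoutSummary(level));
    System.out.println(Arrays.toString(fanoutDeciles(level)));
    System.out.printf(Locale.ROOT, "connectedness=%.2f%n", connectedness(level, 0));
  }
}
```

Under this definition, a level where nearly every node has exactly one outgoing edge (mean fanout around 1.00, as in the second output) lets BFS follow only a single path from the entry node, so it usually reaches a small fraction of the nodes.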