krickert commented on PR #15676:
URL: https://github.com/apache/lucene/pull/15676#issuecomment-3867055536
Initial test results:
* Proof of Concept: In a skewed 4-shard cluster (Literature vs.
Wikipedia), collaborative pruning reduced total cluster work by 46% for a
Wikipedia query.
* The "Cold Shard": Individual shards unrelated to the query (like
Tolstoy shards for a Wiki query) saw up to 66.3% fewer node visits.
* Accuracy: Recall remained at 0.95+ against the baseline.
k=100 was the sweet spot, though.
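For anyone who wants to reproduce these kinds of numbers, here is a minimal sketch of how recall against the baseline and the reduction in node visits could be computed. The function names, shard names, and visit counts are hypothetical and made up for illustration; they are not taken from the PR's test harness.

```python
def recall_at_k(baseline_ids, pruned_ids, k):
    """Fraction of the baseline top-k that the pruned search also returned."""
    baseline_top = set(baseline_ids[:k])
    if not baseline_top:
        return 1.0  # nothing to recall, so treat it as a perfect score
    return len(baseline_top & set(pruned_ids[:k])) / len(baseline_top)


def work_reduction(baseline_visits, pruned_visits):
    """Per-shard and cluster-wide fractional drop in node visits.

    Both arguments map shard name -> visited node count; baseline counts
    are assumed to be non-zero.
    """
    per_shard = {
        shard: 1.0 - pruned_visits[shard] / baseline_visits[shard]
        for shard in baseline_visits
    }
    total = 1.0 - sum(pruned_visits.values()) / sum(baseline_visits.values())
    return per_shard, total


# Made-up counts for illustration only, NOT the measurements reported above.
baseline = {"wiki-0": 1000, "wiki-1": 1000, "tolstoy-0": 1000, "tolstoy-1": 1000}
pruned = {"wiki-0": 800, "wiki-1": 800, "tolstoy-0": 400, "tolstoy-1": 400}

per_shard, total = work_reduction(baseline, pruned)
print(f"cluster-wide reduction: {total:.1%}")
for shard, drop in sorted(per_shard.items()):
    print(f"  {shard}: {drop:.1%} fewer node visits")
```

Recall here is measured against the unpruned result set for the same query and k, so it only captures what pruning loses relative to the baseline, not the baseline's own accuracy against exact nearest neighbors.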
Recall isn't at 100%, but it's still high; I can look deeper into that if it
starts to fall too low. Also, I'm going to write a test that runs against
Wikipedia data to see if I can get k=10000 to be performant. I think with very
large indices per shard (like 200M per shard?) we could see fast searches at
k=10000. That's the theory, anyway.
I'll commit what I have in a few minutes, but it'll be disabled by default.
I'll also figure out a way to make it easier for anyone on this thread to
test it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]