[
https://issues.apache.org/jira/browse/SOLR-17756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950518#comment-17950518
]
Matthew Biscocho commented on SOLR-17756:
-----------------------------------------
I tested the PR above and got some numbers. FYI, my machine has 12 cores. I
created a single core in Solr and indexed ~118 million docs containing only an
ID, which produced 58 segments. Two of the segments had 31 million documents.
I also invalidated the fingerprint cache for my tests.
Sequentially (Original) - ~631 ms
2025-05-08 22:23:32.798 INFO (qtp436094532-32-localhost-7) [c:gettingstarted
s:shard1 r:core_node2 x:gettingstarted_shard1_replica_n1 t:localhost-7]
o.a.s.u.IndexFingerprint IndexFingerprint millis:631.0
result:{maxVersionSpecified=9223372036854775807,
maxVersionEncountered=1831592552090828800, maxInHash=1831592552090828800,
versionsHash=6472754633150858610, numVersions=118657846, numDocs=118657846,
maxDoc=31554300}
2025-05-08 22:23:34.515 INFO (qtp436094532-38-localhost-8) [c:gettingstarted
s:shard1 r:core_node2 x:gettingstarted_shard1_replica_n1 t:localhost-8]
o.a.s.u.IndexFingerprint IndexFingerprint millis:665.0
result:{maxVersionSpecified=9223372036854775807,
maxVersionEncountered=1831592552090828800, maxInHash=1831592552090828800,
versionsHash=6472754633150858610, numVersions=118657846, numDocs=118657846,
maxDoc=31554300}
Parallel (12 Cores) - ~249 ms
2025-05-08 22:19:51.563 INFO (qtp436094532-204-localhost-13345662)
[c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica_n1
t:localhost-13345662] o.a.s.u.IndexFingerprint IndexFingerprint millis:249.0
result:{maxVersionSpecified=9223372036854775807,
maxVersionEncountered=1831592552090828800, maxInHash=1831592552090828800,
versionsHash=6472754633150858610, numVersions=118657846, numDocs=118657846,
maxDoc=31554300}
2025-05-08 22:19:52.304 INFO (qtp436094532-260-localhost-13345663)
[c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica_n1
t:localhost-13345663] o.a.s.u.IndexFingerprint IndexFingerprint millis:249.0
result:{maxVersionSpecified=9223372036854775807,
maxVersionEncountered=1831592552090828800, maxInHash=1831592552090828800,
versionsHash=6472754633150858610, numVersions=118657846, numDocs=118657846,
maxDoc=31554300}
So there is definitely some improvement here, but I'd be curious to see how
much of an improvement there is with much larger documents and more segments.
In a real-life scenario, with the fingerprint cache covering some of the older
untouched segments, the calculation might only go over the new, smaller
segments, and this should help.
> Parallelize calculation of index fingerprint across segments
> ------------------------------------------------------------
>
> Key: SOLR-17756
> URL: https://issues.apache.org/jira/browse/SOLR-17756
> Project: Solr
> Issue Type: Improvement
> Affects Versions: main (10.0), 8.11.4, 9.8.1
> Reporter: Matthew Biscocho
> Assignee: Matthew Biscocho
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The index fingerprint is currently calculated on each segment sequentially.
> While this works fine, the index fingerprint calculation was noticed to be a
> very slow process, and it is blocking during leader election.
> This issue proposes parallelizing the calculation across segments instead.
> Since the fingerprint is just a cumulative sum of a hash on versions, the
> order in which each hash is added to the running sum should not matter.
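
The order-independence argument in the description can be illustrated with a
minimal sketch. This is not the code from the PR: the class name
FingerprintSketch, the mix() function (a stand-in 64-bit mixer, not
necessarily the hash Solr uses), and the long[][] representation of segments
are all assumptions made for illustration. It relies only on the fact that
Java long addition wraps on overflow and is therefore associative and
commutative.

```java
import java.util.Arrays;

public class FingerprintSketch {

    // Stand-in 64-bit mixing function (MurmurHash3 finalizer).
    // Assumption for illustration only; any per-version hash works here.
    static long mix(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Sum of hashed versions for one segment. Overflow wraps modulo 2^64.
    static long segmentHash(long[] versions) {
        long sum = 0;
        for (long v : versions) {
            sum += mix(v);
        }
        return sum;
    }

    // Baseline: visit segments one after another.
    static long sequentialHash(long[][] segments) {
        long total = 0;
        for (long[] segment : segments) {
            total += segmentHash(segment);
        }
        return total;
    }

    // Hypothetical parallel variant: each segment is hashed independently and
    // the per-segment sums are combined in whatever order the stream chooses.
    static long parallelHash(long[][] segments) {
        return Arrays.stream(segments)
                .parallel()
                .mapToLong(FingerprintSketch::segmentHash)
                .sum();
    }

    public static void main(String[] args) {
        // Three made-up "segments" holding made-up version numbers.
        long[][] segments = { {1L, 2L, 3L}, {4L, 5L}, {6L} };
        // prints true
        System.out.println(sequentialHash(segments) == parallelHash(segments));
    }
}
```

Because the per-segment sums are combined with wrapping addition, the two
methods agree regardless of how the work is split across threads. The other
fingerprint fields shown in the logs (maximum version, counts) would need
their own order-independent merge, which this sketch does not cover.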