[ https://issues.apache.org/jira/browse/SOLR-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729558#comment-17729558 ]
Rahul Goswami edited comment on SOLR-16838 at 6/6/23 3:51 AM:
--------------------------------------------------------------

Running further benchmarks (this time for 3 million docs) reveals that the slowness is in the searcher.getFirstMatch() call inside getInputDocument(). The call eventually ends up in Lucene's SegmentTermsEnum.seekExact(), which is where the regression seems to be.

*+Solr 7.7.2+*
2023-06-01 21:17:34.492 WARN (qtp1034094674-41) [ x:techproducts] o.a.s.u.p.LogUpdateProcessorFactory RTG timing stats:: tlogFetchTime: 508053 ; *searcherFetchTime: 3229011*

*+Solr 8+*
2023-06-01 20:43:31.767 WARN (qtp391506011-56) [ x:techproducts] o.a.s.u.p.LogUpdateProcessorFactory RTG timing stats:: tlogFetchTime: 410873 ; *searcherFetchTime: 33296008*

> Atomic updates too slow in Solr 8 vs Solr 7
> -------------------------------------------
>
> Key: SOLR-16838
> URL: https://issues.apache.org/jira/browse/SOLR-16838
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SearchComponents - other
> Affects Versions: 8.11.1
> Reporter: Rahul Goswami
> Priority: Major
>
> Started experiencing slowness with updates in production after upgrading from
> Solr 7.7.2 to 8.11.1.
> Upon comparing the performance, it turns out that
> indexing 20 million docs via atomic updates through the same client program
> (running 15 parallel threads indexing in batches of 1000) takes the following time:
>
> Solr 7: 78 mins
> Solr 8: 370 mins
>
> Environment details:
> - Java 11 on Windows server
> - Xms1536m Xmx3072m
> - Indexing client code running 15 parallel threads indexing in batches of 1000
> - using SimpleFSDirectoryFactory (since Mmap doesn't quite work well on
> Windows for our index sizes, which commonly run north of 1 TB)
>
> Looking at the thread dump, the bottleneck seems to be RealTimeGet, and I can
> see that Solr 7 takes a different code path than Solr 8. Note that the
> performance of regular updates (non-atomic) is still pretty good on Solr 8,
> completing in < 1 hour for the same 20 million docs.
>
> Sharing the indexing code, solrconfig, schema and thread dumps in the link
> below:
> [https://drive.google.com/drive/folders/1q2DPNTYQEU6fi3NeXIKJhaoq3KPnms0h?usp=sharing]
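
To put the numbers reported in this thread side by side, here is a minimal sketch that parses the two "RTG timing stats" log lines quoted in the comment and prints the Solr 8 to Solr 7 ratios. The class name RtgTimingCompare and its helper methods are hypothetical and are not part of Solr or of the reporter's code. The sketch assumes only the log fragment format shown in the comment, and it makes no assumption about the unit of the timing values, which the logs do not state.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: compares the timing values quoted in this thread.
public class RtgTimingCompare {

    // Matches the "tlogFetchTime: N ; *searcherFetchTime: M*" fragment as quoted in the comment.
    // The optional '*' accounts for the Jira bold markup around searcherFetchTime.
    private static final Pattern STATS = Pattern.compile(
            "tlogFetchTime:\\s*(\\d+)\\s*;\\s*\\*?searcherFetchTime:\\s*(\\d+)");

    /** Returns {tlogFetchTime, searcherFetchTime} parsed from one log line. */
    public static long[] parse(String logLine) {
        Matcher m = STATS.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no RTG timing stats in: " + logLine);
        }
        return new long[] { Long.parseLong(m.group(1)), Long.parseLong(m.group(2)) };
    }

    /** Ratio of a Solr 8 measurement to the matching Solr 7 measurement. */
    public static double ratio(long solr8, long solr7) {
        return (double) solr8 / solr7;
    }

    public static void main(String[] args) {
        // Log fragments copied from the comment (timestamps and thread names omitted).
        long[] solr7 = parse("RTG timing stats:: tlogFetchTime: 508053 ; *searcherFetchTime: 3229011*");
        long[] solr8 = parse("RTG timing stats:: tlogFetchTime: 410873 ; *searcherFetchTime: 33296008*");

        // tlogFetchTime is slightly lower on Solr 8: prints 0.8
        System.out.println(String.format(Locale.ROOT, "tlogFetchTime ratio: %.1f", ratio(solr8[0], solr7[0])));
        // searcherFetchTime is about ten times higher on Solr 8: prints 10.3
        System.out.println(String.format(Locale.ROOT, "searcherFetchTime ratio: %.1f", ratio(solr8[1], solr7[1])));
        // End-to-end indexing time from the issue description (370 mins vs 78 mins): prints 4.7
        System.out.println(String.format(Locale.ROOT, "wall-clock ratio: %.1f", ratio(370, 78)));
    }
}
```

On these figures the transaction-log fetch is not slower on Solr 8, while the searcher fetch is roughly an order of magnitude slower. That is consistent with the comment's suspicion that the regression sits in the searcher lookup path and not in the tlog.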