[ https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261002#comment-17261002 ]
ASF subversion and git services commented on SOLR-14923:
--------------------------------------------------------

Commit 2417aa085f646f1c5b13ee1bc55d286460dfe826 in lucene-solr's branch refs/heads/branch_8x from David Smiley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2417aa0 ]

SOLR-14923: Nested docs indexing perf & robustness (#2159)

* When the schema defines _root_, and you want to do atomic/partial updates...
** _root_ needn't be stored or have docValues any more
** _nest_path_ field isn't needed for this any more
** Simplified internal logic
* Allow (and recommend, eventually insist) that the _root_ field be passed for atomic/partial updates to child docs.
** In the absence of _root_, assume the _route_ param is equivalent to ameliorate back-compat scope. This is a temporary hack; remove in SOLR-15064.
** One of the two is required; you'll get an exception if the assumption is false. THIS IS A BACK-COMPAT CHANGE
* Ensure that the update log contains the _root_ field if it's defined in the schema; in some cases it wasn't. It's important for robustness of atomic/partial updates to child docs. Caveat: the buffer replay scenario is not tested with child docs.
* Limited the cases when a realtime searcher is re-opened. It was being applied to any update that included child docs but now only some narrow subset: only for atomic/partial updates, and when the update log contains an in-place update for the same nest because it's complicated to resolve those log entries.
* Internal improvements to RealTimeGetComponent to aid clarity & robustness & probably performance...
** Use SolrDocumentFetcher.solrDoc(docID, ReturnFields) instead of more manual loading. Will do more with this in another PR.
** Clarify when only root doc IDs are expected.
** Use Resolution enum more, add PARTIAL, remove DOC_WITH_CHILDREN; enhance docs.
** When have ReturnFields, a Set of "onlyTheseFields" becomes redundant. Add a child doc resolution via a transformer when needed.
** Clarified where copy-field targets are removed
* NestPathField should default to single valued, instead of inheriting the schema default, which for ancient schemas was multi-valued.
* AddUpdateCommand.getLuceneDocument(s) methods are very internal; made package visible and refactored a bit for clarity
* DocumentBuilder: when in-place update, skip id and _root_ here, thus also simplifying further logic
* NestedShardedAtomicUpdateTest no longer extends AbstractFullDistribZkTestBase because it wasn't really leveraging the "control client" checking, and it added too much complexity to debug failures.

(cherry picked from commit 4cb3ad4a1c40b4326aec64577a7e60018f7f1a5e)

> Indexing performance is unacceptable when child documents are involved
> ----------------------------------------------------------------------
>
> Key: SOLR-14923
> URL: https://issues.apache.org/jira/browse/SOLR-14923
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: update, UpdateRequestProcessors
> Affects Versions: 8.3, 8.4, 8.5, 8.6, 8.7, master (9.0)
> Reporter: Thomas Wöckinger
> Priority: Critical
> Labels: performance, pull-request-available
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Parallel indexing does not make sense at moment when child documents are used.
> The org.apache.solr.update.processor.DistributedUpdateProcessor checks at the
> end of the method doVersionAdd if Ulog caches should be refreshed.
> This check will return true if any child document is included in the
> AddUpdateCommand.
> If so ulog.openRealtimeSearcher(); is called, this call is very expensive,
> and executed in a synchronized block of the UpdateLog instance, therefore all
> other operations on the UpdateLog are blocked too.
> Because every important UpdateLog method (add, delete, ...) is done using a
> synchronized block almost each operation is blocked.
> This reduces multi threaded index update to a single thread behavior.
> The described behavior is not depending on any option of the UpdateRequest,
> so it does not make any difference if 'waitFlush', 'waitSearcher' or
> 'softCommit' is true or false.
> The described behavior makes the usage of ChildDocuments useless, because the
> performance is unacceptable.
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
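
The contention described in the issue can be illustrated with a toy model. The sketch below is not Solr code: ToyUpdateLog, reopenSearcher() and indexWithChildDocs() are hypothetical names invented for illustration, and the 2 ms sleep is an arbitrary stand-in for an expensive call such as ulog.openRealtimeSearcher(). It only shows the general effect of running a slow operation under the same monitor that add() needs: however many indexing threads are started, at most one of them is ever inside the slow section.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy stand-in for an update log. NOT Solr's UpdateLog; every name here is hypothetical. */
final class ToyUpdateLog {
    private final AtomicInteger inside = new AtomicInteger();
    private final AtomicInteger maxInside = new AtomicInteger();
    private final AtomicInteger adds = new AtomicInteger();

    /** Models an expensive operation that runs while holding the log's monitor. */
    synchronized void reopenSearcher() throws InterruptedException {
        int now = inside.incrementAndGet();
        maxInside.accumulateAndGet(now, Math::max); // record peak concurrency in this section
        Thread.sleep(2);                            // assumed cost; arbitrary value
        inside.decrementAndGet();
    }

    /** Models add(): it needs the same monitor, so it waits behind any reopen in progress. */
    synchronized void add() {
        adds.incrementAndGet();
    }

    int addCount() { return adds.get(); }

    int maxThreadsInsideReopen() { return maxInside.get(); }

    /** Each thread "indexes" docsPerThread documents, and every add triggers a reopen. */
    static ToyUpdateLog indexWithChildDocs(int threads, int docsPerThread) {
        ToyUpdateLog log = new ToyUpdateLog();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                start.await(); // release all workers at the same moment
                for (int i = 0; i < docsPerThread; i++) {
                    log.add();
                    log.reopenSearcher();
                }
                return null;
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        ToyUpdateLog log = indexWithChildDocs(4, 10);
        // prints "adds=40 maxThreadsInsideReopen=1"
        System.out.println("adds=" + log.addCount()
                + " maxThreadsInsideReopen=" + log.maxThreadsInsideReopen());
    }
}
```

In this model the four workers complete all 40 adds, but the peak number of threads inside the slow section is always 1, which is the single-thread behavior the reporter describes. Per the commit message, the fix narrows the cases in which the realtime searcher is re-opened rather than removing the synchronization.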