[jira] [Commented] (SOLR-9941) log replay redundantly (pre-)applies DBQs as if they were out of order
[ https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824621#comment-15824621 ]

ASF subversion and git services commented on SOLR-9941:
-------------------------------------------------------

Commit 38af094d175daebe4093782cc06e964cfc2dd14b in lucene-solr's branch refs/heads/master from [~ichattopadhyaya]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=38af094 ]

SOLR-9941: Moving changelog entry from 7.0.0 to 6.5.0

> log replay redundantly (pre-)applies DBQs as if they were out of order
> ----------------------------------------------------------------------
>
>                 Key: SOLR-9941
>                 URL: https://issues.apache.org/jira/browse/SOLR-9941
>             Project: Solr
>          Issue Type: Improvement
>   Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Hoss Man
>            Assignee: Ishan Chattopadhyaya
>         Attachments: SOLR-9941.hoss-test-experiment.patch, SOLR-9941.patch, SOLR-9941.patch, SOLR-9941.patch, SOLR-9941.patch, SOLR-9941.patch
>
>
> There's kind of an odd situation that arises when a Solr node starts up (after a crash) and tries to recover from its tlog, causing deletes to be redundantly & excessively applied -- at a minimum it causes really confusing log messages:
> * {{UpdateLog.init(...)}} creates {{TransactionLog}} instances for the most recent log files found (based on numRecordsToKeep) and then builds a {{RecentUpdates}} instance from them
> * Delete entries from the {{RecentUpdates}} are used to populate 2 lists:
> ** {{deleteByQueries}}
> ** {{oldDeletes}} (for deleteById).
> * Then, when {{UpdateLog.recoverFromLog}} is called, a {{LogReplayer}} is used to replay any (uncommitted) {{TransactionLog}} entries
> ** during replay {{UpdateLog}} delegates to the UpdateRequestProcessorChain for the various adds/deletes, etc.
> ** when an add makes it to {{RunUpdateProcessor}} it delegates to {{DirectUpdateHandler2}}, which (independent of the fact that we're in log replay) calls {{UpdateLog.getDBQNewer}} for every add, looking for any "reordered" deletes that have a version greater than the add
> *** if it finds _any_ DBQs "newer" than the document being added, it does a low-level {{IndexWriter.updateDocument}} and then immediately executes _all_ the newer DBQs ... _once per add_
> ** these deletes are *also* still executed as part of the normal tlog replay, because they are in the tlog.
> Which means that if you are recovering from a tlog with 90 addDocs followed by 5 DBQs, then *each* of those 5 DBQs will be executed 91 times -- and for 90 of those executions, a DUH2 INFO log message will say {{"Reordered DBQs detected. ..."}} even though the only reason they are out of order is that Solr is deliberately applying them out of order.
> * At a minimum we should improve the log messages
> * Ideally we should stop (pre-emptively) applying these deletes during tlog replay.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
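The execution-count arithmetic in the description can be sketched as a toy model (illustrative code only; the class and method names are invented for this example and are not Solr's API):

```java
// Toy model of the redundant delete execution described above: during tlog
// replay, DirectUpdateHandler2 pre-applies every "newer" DBQ once per replayed
// add, and each DBQ is then also replayed once from the tlog itself.
public class RedundantDbqCount {

    // Executions of each individual DBQ: once per preceding add (the
    // pre-emptive path), plus the one genuine execution during normal replay.
    static int executionsPerDbq(int precedingAdds) {
        return precedingAdds + 1;
    }

    // Total delete executions across all DBQs in the tlog.
    static int totalExecutions(int precedingAdds, int dbqs) {
        return executionsPerDbq(precedingAdds) * dbqs;
    }

    public static void main(String[] args) {
        // The scenario from the issue: 90 addDocs followed by 5 DBQs.
        System.out.println(executionsPerDbq(90));   // 91 executions per DBQ
        System.out.println(totalExecutions(90, 5)); // 455 delete executions total
    }
}
```

The point of the sketch is that the work grows with adds × DBQs, even though only one execution per DBQ is actually needed.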
[jira] [Commented] (SOLR-9941) log replay redundantly (pre-)applies DBQs as if they were out of order

[ https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824614#comment-15824614 ]

ASF subversion and git services commented on SOLR-9941:
-------------------------------------------------------

Commit 302ce326d5a2eee043445918fa3e3885dc003b2f in lucene-solr's branch refs/heads/branch_6x from [~ichattopadhyaya]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=302ce32 ]

SOLR-9941: Clear deletes lists before log replay
[jira] [Commented] (SOLR-9941) log replay redundantly (pre-)applies DBQs as if they were out of order

[ https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824615#comment-15824615 ]

ASF subversion and git services commented on SOLR-9941:
-------------------------------------------------------

Commit 7ef8cf7d6aad25888de4cffc4c20239694a67a45 in lucene-solr's branch refs/heads/branch_6x from [~ichattopadhyaya]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7ef8cf7 ]

SOLR-9941: Adding the Optimizations section to the CHANGES.txt
[jira] [Commented] (SOLR-9941) log replay redundantly (pre-)applies DBQs as if they were out of order

[ https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818754#comment-15818754 ]

ASF subversion and git services commented on SOLR-9941:
-------------------------------------------------------

Commit 04f45aab768b053f32feece7343f994d25f0bb26 in lucene-solr's branch refs/heads/master from [~ichattopadhyaya]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=04f45aa ]

SOLR-9941: Clear deletes lists before log replay
[jira] [Commented] (SOLR-9941) log replay redundantly (pre-)applies DBQs as if they were out of order

[ https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815615#comment-15815615 ]

Christine Poerschke commented on SOLR-9941:
-------------------------------------------

Tentatively linking SOLR-8760 and SOLR-9941 as related (but I haven't yet had a chance to catch up on SOLR-9941 here to see whether it would also solve SOLR-8760, or whether the issues are in fact different).
[jira] [Commented] (SOLR-9941) log replay redundantly (pre-)applies DBQs as if they were out of order

[ https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814488#comment-15814488 ]

Cao Manh Dat commented on SOLR-9941:
------------------------------------

+1 for removing {{boolean onStartup}}. The rest looks good to me.
[jira] [Commented] (SOLR-9941) log replay redundantly (pre-)applies DBQs as if they were out of order

[ https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813726#comment-15813726 ]

Cao Manh Dat commented on SOLR-9941:
------------------------------------

Here are the reasons why Hoss's test fails:
- When the test adds updates, they are written to tlog-
- After the core is restarted, the updates are replayed from tlog-
- Meanwhile, the test adds another DBQ; because the ulog is in buffering mode, DUP writes this DBQ to another file (tlog-0001) and does not apply it to the IndexWriter.
- After log replay finishes, we call a commit, which writes a commit update at the end of tlog-0001.

So this DBQ update will never be replayed (because it belongs to tlog-0001, not tlog-), and it has never been applied to the IndexWriter, so the DBQ update is lost. Even if we restart the core hoping that it will replay tlog-0001, it never will, because a commit was written at the end of tlog-0001. So I think this failure belongs to another issue.
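The lost-update sequence described in that comment can be sketched as a toy model (hypothetical code; the record format and helper names are invented for illustration and are not Solr's TransactionLog API):

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the lost-DBQ scenario: a DBQ arriving while the ulog is
// buffering is written to a NEW tlog file rather than applied to the
// IndexWriter, and the commit written after replay caps that new file,
// so replay logic treats it as fully applied and never replays it.
public class LostDbqModel {

    // Replay skips a log whose last record is a commit (it looks "complete").
    static boolean wouldBeReplayed(List<String> logRecords) {
        return logRecords.isEmpty()
                || !logRecords.get(logRecords.size() - 1).equals("COMMIT");
    }

    public static void main(String[] args) {
        // The old tlog being replayed after restart: no trailing commit.
        List<String> oldLog = Arrays.asList("ADD doc1", "ADD doc2");
        // The new tlog written during buffering: the DBQ, followed by the
        // commit issued after replay finished.
        List<String> newLog = Arrays.asList("DBQ *:*", "COMMIT");

        System.out.println(wouldBeReplayed(oldLog)); // true
        System.out.println(wouldBeReplayed(newLog)); // false -> the DBQ is lost
    }
}
```

This illustrates why restarting the core a second time does not help: the trailing commit makes the new log look already applied.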
[jira] [Commented] (SOLR-9941) log replay redundantly (pre-)applies DBQs as if they were out of order

[ https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813309#comment-15813309 ]

Ishan Chattopadhyaya commented on SOLR-9941:
--------------------------------------------

{quote}
Seems to me that updates arriving during the log replay (in TestRecovery) are being silently dropped.
{quote}
One possible explanation came to mind: during log replay, the state of the core is REPLAYING. Hence, perhaps, incoming updates are not applied until the recovery has finished and the state is back to ACTIVE?
[jira] [Commented] (SOLR-9941) log replay redundantly (pre-)applies DBQs as if they were out of order

[ https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813256#comment-15813256 ]

Ishan Chattopadhyaya commented on SOLR-9941:
--------------------------------------------

{quote}
I wanted to beef up Ishan's testLogReplayWithReorderedDBQ to prove that if another (probably ordered) DBQ arrived during log replay it would correctly be applied -- even if some affected docs hadn't been added yet as part of replay. (ie: prove that "RecentUpdates" was still being used during replay)
{quote}
Seems to me that updates arriving *during* the log replay (in TestRecovery) are being silently dropped. There's definitely something fishy about it, but I think that is another issue altogether, since even adds during the log replay are not persisting after the replay.

Here's [0] a modified version of your test that applies the DBQ *after* recovery has finished. Of course, it is not nearly as effective in proving anything, and it also doesn't test the premise you were after: {{prove that "RecentUpdates" was still being used during replay}}.

[0] - https://paste.fedoraproject.org/524485/05608148
[jira] [Commented] (SOLR-9941) log replay redundantly (pre-)applies DBQs as if they were out of order

[ https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813195#comment-15813195 ]

Ishan Chattopadhyaya commented on SOLR-9941:
--------------------------------------------

{quote}
One question i have is: if the only code paths that call recoverFromLog(boolean) are "startup" paths that pass true why do we need the optional argument? why not just refactor the method to always use the new logic?
{quote}
My thought there was that if someone later wanted to reuse the doLogReplay() method on a live node, rather than during the startup of a node/core (for whatever reason that I cannot foresee right now), they would not end up clearing their deletes lists in the process. Though, given the current use of the method, I am also open to eliminating that extra parameter if you suggest.
[jira] [Commented] (SOLR-9941) log replay redundently (pre-)applies DBQs as if they were out of order
[ https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812443#comment-15812443 ] Hoss Man commented on SOLR-9941:

The patch looks pretty good to me -- but I'm not a tlog expert. One question I have is: if the only code paths that call {{recoverFromLog(boolean)}} are "startup" paths that pass {{true}}, why do we need the optional argument? Why not just refactor the method to always use the new logic?
[jira] [Commented] (SOLR-9941) log replay redundently (pre-)applies DBQs as if they were out of order
[ https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812165#comment-15812165 ] Ishan Chattopadhyaya commented on SOLR-9941:

Fyi, [~shalinmangar], [~caomanhdat] please review.
[jira] [Commented] (SOLR-9941) log replay redundently (pre-)applies DBQs as if they were out of order
[ https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806431#comment-15806431 ] Hoss Man commented on SOLR-9941:

I'm guessing this situation exists because the {{RecentUpdates}} code was added to {{UpdateLog}} to handle (true) out-of-order updates while a system was "live", but it's not clear to me why the {{RecentUpdates}} are used the way they are during {{UpdateLog.init}} ... it seems like at a minimum DUH2 could inspect the UpdateCommand flags to see if this is a {{REPLAY}} command and, if it is, skip the {{UpdateLog.getDBQNewer}} call?

Here are some steps that make it easy to see how weird this situation can get...

{noformat}
bin/solr -e techproducts

curl -X POST http://localhost:8983/solr/techproducts/config -H 'Content-Type: application/json' --data-binary '{"set-property":{"updateHandler.autoCommit.maxTime":"-1"}}'

curl -X POST http://localhost:8983/solr/techproducts/update -H 'Content-Type: application/json' --data-binary '{
  "add":{ "doc": { "id": "DOCX", "foo_i": 41 } },
  "delete": { "query":"foo_i:42" },
  "delete": { "query":"foo_i:43" },
  "add":{ "doc": { "id": "DOCX", "foo_i": { "inc" : "1" } } },
  "delete": { "query":"foo_i:41" },
  "delete": { "query":"foo_i:43" },
  "add":{ "doc": { "id": "DOCX", "foo_i": { "inc" : "1" } } },
  "delete": { "query":"foo_i:41" },
  "delete": { "query":"foo_i:42" }
}'

# verify the updates were applied correctly and doc didn't get deleted by mistake
curl 'http://localhost:8983/solr/techproducts/get?wt=json&id=DOCX'
{ "doc": { "id":"DOCX", "foo_i":43, "_version_":1555827152896655360}}

kill -9 PID  # use whatever PID you get from "ps -ef | grep start.jar | grep techproducts"
bin/solr start -s example/techproducts/solr/

# re-verify the updates were applied correctly during replay
curl 'http://localhost:8983/solr/techproducts/get?wt=json&id=DOCX'
{ "doc": { "id":"DOCX", "foo_i":43, "_version_":1555827152896655360}}
{noformat}

And here's a snippet of solr's logs during the replay on (second) startup...

{noformat}
WARN - 2017-01-07 01:27:45.408; [ x:techproducts] org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay tlog{file=/home/hossman/lucene/dev/solr/example/techproducts/solr/techproducts/data/tlog/tlog.001 refcount=2} active=false starting pos=0
INFO - 2017-01-07 01:27:45.422; [ x:techproducts] org.apache.solr.update.DirectUpdateHandler2; Reordered DBQs detected. Update=add{flags=a,_version_=1555827152848420864,id=DOCX} DBQs=[DBQ{version=1555827152905043968,q=foo_i:42}, DBQ{version=1555827152897703936,q=foo_i:41}, DBQ{version=1555827152894558208,q=foo_i:43}, DBQ{version=1555827152883023872,q=foo_i:41}, DBQ{version=1555827152872538112,q=foo_i:43}, DBQ{version=1555827152849469440,q=foo_i:42}]
INFO - 2017-01-07 01:27:45.434; [ x:techproducts] org.apache.solr.core.SolrCore; [techproducts] webapp=null path=null params={q=static+firstSearcher+warming+in+solrconfig.xml&distrib=false&event=firstSearcher} hits=3 status=0 QTime=62
INFO - 2017-01-07 01:27:45.446; [ x:techproducts] org.apache.solr.core.QuerySenderListener; QuerySenderListener done.
INFO - 2017-01-07 01:27:45.446; [ x:techproducts] org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener; Loading spell index for spellchecker: default
INFO - 2017-01-07 01:27:45.446; [ x:techproducts] org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener; Loading spell index for spellchecker: wordbreak
INFO - 2017-01-07 01:27:45.448; [ x:techproducts] org.apache.solr.core.SolrCore; [techproducts] Registered new searcher Searcher@1826a6c8[techproducts] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(7.0.0):C32)))}
INFO - 2017-01-07 01:27:45.532; [ x:techproducts] org.apache.solr.search.SolrIndexSearcher; Opening [Searcher@5bd6b40f[techproducts] realtime]
INFO - 2017-01-07 01:27:45.536; [ x:techproducts] org.apache.solr.update.DirectUpdateHandler2; Reordered DBQs detected. Update=add{flags=a,_version_=1555827152881975296,id=DOCX} DBQs=[DBQ{version=1555827152905043968,q=foo_i:42}, DBQ{version=1555827152897703936,q=foo_i:41}, DBQ{version=1555827152894558208,q=foo_i:43}, DBQ{version=1555827152883023872,q=foo_i:41}]
INFO - 2017-01-07 01:27:45.546; [ x:techproducts] org.apache.solr.search.SolrIndexSearcher; Opening [Searcher@667a386f[techproducts] realtime]
INFO - 2017-01-07 01:27:45.555; [ x:techproducts] org.apache.solr.update.DirectUpdateHandler2; Reordered DBQs detected. Update=add{flags=a,_version_=1555827152896655360,id=DOCX} DBQs=[DBQ{version=1555827152905043968,q=foo_i:42}, DBQ{version=1555827152897703936,q=foo_i:41}]
INFO - 2017-01-07 01:27:45.573; [ x:techproducts] org.apache.solr.search.SolrIndexSearcher; Openi
{noformat}
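The flag check suggested above (skip {{UpdateLog.getDBQNewer}} when the add is itself a replayed command) can be sketched like this. {{UpdateCommand.REPLAY}} is a real bit flag in Solr's update commands, but the constant's value, the class, and the method below are illustrative stand-ins, not the actual {{DirectUpdateHandler2}} code:

```java
// Illustrative stand-in for the proposed guard in DirectUpdateHandler2:
// only scan for "reordered" DBQs on live adds, never on adds that are
// being replayed from the tlog, since the tlog already contains those
// deletes in their recorded order. The REPLAY bit value is illustrative.
public class ReplayFlagSketch {
    // Stand-in for UpdateCommand.REPLAY (a bit in the command's flags).
    public static final int REPLAY = 0x00000002;

    // True when the reordered-DBQ scan (getDBQNewer) should run:
    // i.e. only for adds whose flags do NOT include the REPLAY bit.
    public static boolean shouldCheckReorderedDBQs(int commandFlags) {
        return (commandFlags & REPLAY) == 0;
    }
}
```

With a guard like this in place, the 90-adds-then-5-DBQs scenario from the issue description would no longer trigger the "Reordered DBQs detected" path once per replayed add, because the scan would run only for non-replay updates.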