[jira] [Updated] (SOLR-9941) log replay redundently (pre-)applies DBQs as if they were out of order

2017-01-16 Thread Ishan Chattopadhyaya (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-9941:
---
Fix Version/s: (was: 6x)
   6.5

> log replay redundently (pre-)applies DBQs as if they were out of order
> --
>
> Key: SOLR-9941
> URL: https://issues.apache.org/jira/browse/SOLR-9941
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
>Assignee: Ishan Chattopadhyaya
> Fix For: master (7.0), 6.5
>
> Attachments: SOLR-9941.hoss-test-experiment.patch, SOLR-9941.patch, 
> SOLR-9941.patch, SOLR-9941.patch, SOLR-9941.patch, SOLR-9941.patch
>
>
> There's kind of an odd situation that arises when a Solr node starts up 
> (after a crash) and tries to recover from it's tlog that causes deletes to be 
> redundantly & excessively applied -- at a minimum it causes confusing really 
> log messages
> * {{UpdateLog.init(...)}} creates {{TransactionLog}} instances for the most 
> recent log files found (based on numRecordsToKeep) and then builds a 
> {{RecentUpdates}} instance from them
> * Delete entries from the {{RecentUpdates}} are used to populate 2 lists:
> ** {{deleteByQueries}}
> ** {{oldDeletes}} (for deleteById).
> * Then when {{UpdateLog.recoverFromLog}} is called a {{LogReplayer}} is used 
> to replay any (uncommited) {{TransactionLog}} enteries
> ** during replay {{UpdateLog}} delegates to the UpdateRequestProcessorChain 
> to for the various adds/deletes, etc...
> ** when an add makes it to {{RunUpdateProcessor}} it delegates to 
> {{DirectUpdateHandler2}}, which (independent of the fact that we're in log 
> replay) calls {{UpdateLog.getDBQNewer}} for every add, looking for any 
> "Reordered" deletes that have a version greater then the add
> *** if it finds _any_ DBQs "newer" then the document being added, it does a 
> low level {{IndexWriter.updateDocument}} and then immediately executes _all_ 
> the newer DBQs ... _once per add_
> ** these deletes are *also* still executed as part of the normal tlog replay, 
> because they are in the tlog.
> Which means if you are recovering from a tlog with 90 addDocs, followed by 5 
> DBQs, then *each* of those 5 DBQs will each be executed 91 times -- and for 
> 90 of those executions, a DUH2 INFO log messages will say {{"Reordered DBQs 
> detected. ..."}} even tough the only reason they are out of order is because 
> Solr is deliberately applying them out of order.
> * At a minimum we should improve the log messages
> * Ideally we should stop (pre-emptively) applying these deletes during tlog 
> replay.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9941) log replay redundently (pre-)applies DBQs as if they were out of order

2017-01-10 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-9941:
---
Attachment: SOLR-9941.patch

bq. In TestRecovery, we kinda mixed up single instance code path along with 
SolrCloud mode code path, so we end up with the fail.

yeah, that was the piece i wasn't clear about -- makes sense.

based on Dat's comments, I re-did my previous idea as a new independent 
{{testNewDBQAndDocMatchingOldDBQDuringLogReplay}} method:

* the "client" only sends regular updates (w/o fake versions) as it would to a 
leader/single-node instance
* confirms that a DBQ which arrives during recovery is processed correctly, 
even if the docs being deleted haven't been added yet as part of recovery
* Also includes a DBQ in the tlog which may (randomly) be applied before/after 
the new DBQ arrives
** just to sanity check that the new delete doesn't muck with the replay of old 
deletes
* the DBQ from the tlog also matches a doc which isn't added until recovery is 
in process (randomly before/after the DBQ from the tlog is applied) to verify 
that doc isn't deleted by mistake

...i've added this new test method to the patch to help with the coverage of 
the affected code paths.

bq. Unless someone has any objections or suggests some modifications, I'd like 
to commit this after the 6.4 branch is cut (or alternatively, commit this to 
master and wait for backporting after the 6.4 branch is cut).

The sooner it gets committed to master, the sooner jenkins starts helping us do 
randomized testing -- since Dat thinks the change makes sense, and there are no 
objections I would encourage committing to master early and waiting to backport.



> log replay redundently (pre-)applies DBQs as if they were out of order
> --
>
> Key: SOLR-9941
> URL: https://issues.apache.org/jira/browse/SOLR-9941
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
> Attachments: SOLR-9941.hoss-test-experiment.patch, SOLR-9941.patch, 
> SOLR-9941.patch, SOLR-9941.patch, SOLR-9941.patch, SOLR-9941.patch
>
>
> There's kind of an odd situation that arises when a Solr node starts up 
> (after a crash) and tries to recover from it's tlog that causes deletes to be 
> redundantly & excessively applied -- at a minimum it causes confusing really 
> log messages
> * {{UpdateLog.init(...)}} creates {{TransactionLog}} instances for the most 
> recent log files found (based on numRecordsToKeep) and then builds a 
> {{RecentUpdates}} instance from them
> * Delete entries from the {{RecentUpdates}} are used to populate 2 lists:
> ** {{deleteByQueries}}
> ** {{oldDeletes}} (for deleteById).
> * Then when {{UpdateLog.recoverFromLog}} is called a {{LogReplayer}} is used 
> to replay any (uncommited) {{TransactionLog}} enteries
> ** during replay {{UpdateLog}} delegates to the UpdateRequestProcessorChain 
> to for the various adds/deletes, etc...
> ** when an add makes it to {{RunUpdateProcessor}} it delegates to 
> {{DirectUpdateHandler2}}, which (independent of the fact that we're in log 
> replay) calls {{UpdateLog.getDBQNewer}} for every add, looking for any 
> "Reordered" deletes that have a version greater then the add
> *** if it finds _any_ DBQs "newer" then the document being added, it does a 
> low level {{IndexWriter.updateDocument}} and then immediately executes _all_ 
> the newer DBQs ... _once per add_
> ** these deletes are *also* still executed as part of the normal tlog replay, 
> because they are in the tlog.
> Which means if you are recovering from a tlog with 90 addDocs, followed by 5 
> DBQs, then *each* of those 5 DBQs will each be executed 91 times -- and for 
> 90 of those executions, a DUH2 INFO log messages will say {{"Reordered DBQs 
> detected. ..."}} even tough the only reason they are out of order is because 
> Solr is deliberately applying them out of order.
> * At a minimum we should improve the log messages
> * Ideally we should stop (pre-emptively) applying these deletes during tlog 
> replay.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9941) log replay redundently (pre-)applies DBQs as if they were out of order

2017-01-10 Thread Ishan Chattopadhyaya (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-9941:
---
Attachment: SOLR-9941.patch

Thanks [~caomanhdat]. Updated the patch, removed the conditional parameter for 
startup.

Unless someone has any objections or suggests some modifications, I'd like to 
commit this after the 6.4 branch is cut (or alternatively, commit this to 
master and wait for backporting after the 6.4 branch is cut).

> log replay redundently (pre-)applies DBQs as if they were out of order
> --
>
> Key: SOLR-9941
> URL: https://issues.apache.org/jira/browse/SOLR-9941
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
> Attachments: SOLR-9941.hoss-test-experiment.patch, SOLR-9941.patch, 
> SOLR-9941.patch, SOLR-9941.patch, SOLR-9941.patch
>
>
> There's kind of an odd situation that arises when a Solr node starts up 
> (after a crash) and tries to recover from it's tlog that causes deletes to be 
> redundantly & excessively applied -- at a minimum it causes confusing really 
> log messages
> * {{UpdateLog.init(...)}} creates {{TransactionLog}} instances for the most 
> recent log files found (based on numRecordsToKeep) and then builds a 
> {{RecentUpdates}} instance from them
> * Delete entries from the {{RecentUpdates}} are used to populate 2 lists:
> ** {{deleteByQueries}}
> ** {{oldDeletes}} (for deleteById).
> * Then when {{UpdateLog.recoverFromLog}} is called a {{LogReplayer}} is used 
> to replay any (uncommited) {{TransactionLog}} enteries
> ** during replay {{UpdateLog}} delegates to the UpdateRequestProcessorChain 
> to for the various adds/deletes, etc...
> ** when an add makes it to {{RunUpdateProcessor}} it delegates to 
> {{DirectUpdateHandler2}}, which (independent of the fact that we're in log 
> replay) calls {{UpdateLog.getDBQNewer}} for every add, looking for any 
> "Reordered" deletes that have a version greater then the add
> *** if it finds _any_ DBQs "newer" then the document being added, it does a 
> low level {{IndexWriter.updateDocument}} and then immediately executes _all_ 
> the newer DBQs ... _once per add_
> ** these deletes are *also* still executed as part of the normal tlog replay, 
> because they are in the tlog.
> Which means if you are recovering from a tlog with 90 addDocs, followed by 5 
> DBQs, then *each* of those 5 DBQs will each be executed 91 times -- and for 
> 90 of those executions, a DUH2 INFO log messages will say {{"Reordered DBQs 
> detected. ..."}} even tough the only reason they are out of order is because 
> Solr is deliberately applying them out of order.
> * At a minimum we should improve the log messages
> * Ideally we should stop (pre-emptively) applying these deletes during tlog 
> replay.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9941) log replay redundently (pre-)applies DBQs as if they were out of order

2017-01-09 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-9941:
---
Attachment: SOLR-9941.hoss-test-experiment.patch

I wanted to beef up Ishan's testLogReplayWithReorderedDBQ to prove that if 
another (probably ordered) DBQ arrived _during_ log replay it would correctly 
be applied -- even if some affected docs hadn't been added yet as part of 
replay.  (ie: prove that "RecentUpdates" was still being used during replay)

But for some reason my modified test fails even on master.  I suspect either 
i'm misunderstanding something about how recovery works, or about the way 
TestRecovery works (or i just have a silly bug/typo somewhere)

In anycase, i've attached only my test idea in 
SOLR-9941.hoss-test-experiment.patch if anyone wants to take a look

> log replay redundently (pre-)applies DBQs as if they were out of order
> --
>
> Key: SOLR-9941
> URL: https://issues.apache.org/jira/browse/SOLR-9941
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
> Attachments: SOLR-9941.hoss-test-experiment.patch, SOLR-9941.patch, 
> SOLR-9941.patch, SOLR-9941.patch
>
>
> There's kind of an odd situation that arises when a Solr node starts up 
> (after a crash) and tries to recover from it's tlog that causes deletes to be 
> redundantly & excessively applied -- at a minimum it causes confusing really 
> log messages
> * {{UpdateLog.init(...)}} creates {{TransactionLog}} instances for the most 
> recent log files found (based on numRecordsToKeep) and then builds a 
> {{RecentUpdates}} instance from them
> * Delete entries from the {{RecentUpdates}} are used to populate 2 lists:
> ** {{deleteByQueries}}
> ** {{oldDeletes}} (for deleteById).
> * Then when {{UpdateLog.recoverFromLog}} is called a {{LogReplayer}} is used 
> to replay any (uncommited) {{TransactionLog}} enteries
> ** during replay {{UpdateLog}} delegates to the UpdateRequestProcessorChain 
> to for the various adds/deletes, etc...
> ** when an add makes it to {{RunUpdateProcessor}} it delegates to 
> {{DirectUpdateHandler2}}, which (independent of the fact that we're in log 
> replay) calls {{UpdateLog.getDBQNewer}} for every add, looking for any 
> "Reordered" deletes that have a version greater then the add
> *** if it finds _any_ DBQs "newer" then the document being added, it does a 
> low level {{IndexWriter.updateDocument}} and then immediately executes _all_ 
> the newer DBQs ... _once per add_
> ** these deletes are *also* still executed as part of the normal tlog replay, 
> because they are in the tlog.
> Which means if you are recovering from a tlog with 90 addDocs, followed by 5 
> DBQs, then *each* of those 5 DBQs will each be executed 91 times -- and for 
> 90 of those executions, a DUH2 INFO log messages will say {{"Reordered DBQs 
> detected. ..."}} even tough the only reason they are out of order is because 
> Solr is deliberately applying them out of order.
> * At a minimum we should improve the log messages
> * Ideally we should stop (pre-emptively) applying these deletes during tlog 
> replay.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9941) log replay redundently (pre-)applies DBQs as if they were out of order

2017-01-07 Thread Ishan Chattopadhyaya (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-9941:
---
Attachment: SOLR-9941.patch

Attaching a fix to clear out the DBQs and oldDeletes lists before a log replay 
upon a startup. [~hossman], [~markrmil...@gmail.com], [~yo...@apache.org], can 
you please review?

> log replay redundently (pre-)applies DBQs as if they were out of order
> --
>
> Key: SOLR-9941
> URL: https://issues.apache.org/jira/browse/SOLR-9941
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
> Attachments: SOLR-9941.patch, SOLR-9941.patch, SOLR-9941.patch
>
>
> There's kind of an odd situation that arises when a Solr node starts up 
> (after a crash) and tries to recover from it's tlog that causes deletes to be 
> redundantly & excessively applied -- at a minimum it causes confusing really 
> log messages
> * {{UpdateLog.init(...)}} creates {{TransactionLog}} instances for the most 
> recent log files found (based on numRecordsToKeep) and then builds a 
> {{RecentUpdates}} instance from them
> * Delete entries from the {{RecentUpdates}} are used to populate 2 lists:
> ** {{deleteByQueries}}
> ** {{oldDeletes}} (for deleteById).
> * Then when {{UpdateLog.recoverFromLog}} is called a {{LogReplayer}} is used 
> to replay any (uncommited) {{TransactionLog}} enteries
> ** during replay {{UpdateLog}} delegates to the UpdateRequestProcessorChain 
> to for the various adds/deletes, etc...
> ** when an add makes it to {{RunUpdateProcessor}} it delegates to 
> {{DirectUpdateHandler2}}, which (independent of the fact that we're in log 
> replay) calls {{UpdateLog.getDBQNewer}} for every add, looking for any 
> "Reordered" deletes that have a version greater then the add
> *** if it finds _any_ DBQs "newer" then the document being added, it does a 
> low level {{IndexWriter.updateDocument}} and then immediately executes _all_ 
> the newer DBQs ... _once per add_
> ** these deletes are *also* still executed as part of the normal tlog replay, 
> because they are in the tlog.
> Which means if you are recovering from a tlog with 90 addDocs, followed by 5 
> DBQs, then *each* of those 5 DBQs will each be executed 91 times -- and for 
> 90 of those executions, a DUH2 INFO log messages will say {{"Reordered DBQs 
> detected. ..."}} even tough the only reason they are out of order is because 
> Solr is deliberately applying them out of order.
> * At a minimum we should improve the log messages
> * Ideally we should stop (pre-emptively) applying these deletes during tlog 
> replay.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9941) log replay redundently (pre-)applies DBQs as if they were out of order

2017-01-07 Thread Ishan Chattopadhyaya (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-9941:
---
Attachment: SOLR-9941.patch

Seems like the fix I posted below won't work due to the fact that doing so 
would preclude us from processing actually re-ordered DBQs also, and hence 
leave out some documents that should've been deleted. Added a test for this 
(which should anyway be committed, I think).

> log replay redundently (pre-)applies DBQs as if they were out of order
> --
>
> Key: SOLR-9941
> URL: https://issues.apache.org/jira/browse/SOLR-9941
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
> Attachments: SOLR-9941.patch, SOLR-9941.patch
>
>
> There's kind of an odd situation that arises when a Solr node starts up 
> (after a crash) and tries to recover from it's tlog that causes deletes to be 
> redundantly & excessively applied -- at a minimum it causes confusing really 
> log messages
> * {{UpdateLog.init(...)}} creates {{TransactionLog}} instances for the most 
> recent log files found (based on numRecordsToKeep) and then builds a 
> {{RecentUpdates}} instance from them
> * Delete entries from the {{RecentUpdates}} are used to populate 2 lists:
> ** {{deleteByQueries}}
> ** {{oldDeletes}} (for deleteById).
> * Then when {{UpdateLog.recoverFromLog}} is called a {{LogReplayer}} is used 
> to replay any (uncommited) {{TransactionLog}} enteries
> ** during replay {{UpdateLog}} delegates to the UpdateRequestProcessorChain 
> to for the various adds/deletes, etc...
> ** when an add makes it to {{RunUpdateProcessor}} it delegates to 
> {{DirectUpdateHandler2}}, which (independent of the fact that we're in log 
> replay) calls {{UpdateLog.getDBQNewer}} for every add, looking for any 
> "Reordered" deletes that have a version greater then the add
> *** if it finds _any_ DBQs "newer" then the document being added, it does a 
> low level {{IndexWriter.updateDocument}} and then immediately executes _all_ 
> the newer DBQs ... _once per add_
> ** these deletes are *also* still executed as part of the normal tlog replay, 
> because they are in the tlog.
> Which means if you are recovering from a tlog with 90 addDocs, followed by 5 
> DBQs, then *each* of those 5 DBQs will each be executed 91 times -- and for 
> 90 of those executions, a DUH2 INFO log messages will say {{"Reordered DBQs 
> detected. ..."}} even tough the only reason they are out of order is because 
> Solr is deliberately applying them out of order.
> * At a minimum we should improve the log messages
> * Ideally we should stop (pre-emptively) applying these deletes during tlog 
> replay.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9941) log replay redundently (pre-)applies DBQs as if they were out of order

2017-01-06 Thread Ishan Chattopadhyaya (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-9941:
---
Attachment: SOLR-9941.patch

The attached patch seems to fix the problem, and feels like the right thing to 
do. Based on Hoss' idea:
{quote}
it seems like at a minimum DUH2 could inspect the UpdateCommand flags to see if 
this is a {{REPLAY}} command and if it is skip the {{UpdateLog.getDBQNewer}} 
call?
{quote}

The corresponding test failure in jira/solr-5944 branch passes with this patch 
(even after excluding the getUpdateLog().deleteAll() call).

> log replay redundently (pre-)applies DBQs as if they were out of order
> --
>
> Key: SOLR-9941
> URL: https://issues.apache.org/jira/browse/SOLR-9941
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
> Attachments: SOLR-9941.patch
>
>
> There's kind of an odd situation that arises when a Solr node starts up 
> (after a crash) and tries to recover from it's tlog that causes deletes to be 
> redundantly & excessively applied -- at a minimum it causes confusing really 
> log messages
> * {{UpdateLog.init(...)}} creates {{TransactionLog}} instances for the most 
> recent log files found (based on numRecordsToKeep) and then builds a 
> {{RecentUpdates}} instance from them
> * Delete entries from the {{RecentUpdates}} are used to populate 2 lists:
> ** {{deleteByQueries}}
> ** {{oldDeletes}} (for deleteById).
> * Then when {{UpdateLog.recoverFromLog}} is called a {{LogReplayer}} is used 
> to replay any (uncommited) {{TransactionLog}} enteries
> ** during replay {{UpdateLog}} delegates to the UpdateRequestProcessorChain 
> to for the various adds/deletes, etc...
> ** when an add makes it to {{RunUpdateProcessor}} it delegates to 
> {{DirectUpdateHandler2}}, which (independent of the fact that we're in log 
> replay) calls {{UpdateLog.getDBQNewer}} for every add, looking for any 
> "Reordered" deletes that have a version greater then the add
> *** if it finds _any_ DBQs "newer" then the document being added, it does a 
> low level {{IndexWriter.updateDocument}} and then immediately executes _all_ 
> the newer DBQs ... _once per add_
> ** these deletes are *also* still executed as part of the normal tlog replay, 
> because they are in the tlog.
> Which means if you are recovering from a tlog with 90 addDocs, followed by 5 
> DBQs, then *each* of those 5 DBQs will each be executed 91 times -- and for 
> 90 of those executions, a DUH2 INFO log messages will say {{"Reordered DBQs 
> detected. ..."}} even tough the only reason they are out of order is because 
> Solr is deliberately applying them out of order.
> * At a minimum we should improve the log messages
> * Ideally we should stop (pre-emptively) applying these deletes during tlog 
> replay.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org