[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss
[ https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918555#comment-16918555 ]

ASF subversion and git services commented on SOLR-13718:
Commit 9d68b0d00dd815210674262463b7908f72a1ef30 in lucene-solr's branch refs/heads/branch_7_7 from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9d68b0d ]

SOLR-13718: A more targeted fix for SPLITSHARD, thereby avoiding Backup/Restore test failures

> SPLITSHARD using async can cause data loss
> ------------------------------------------
>
>                 Key: SOLR-13718
>                 URL: https://issues.apache.org/jira/browse/SOLR-13718
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>    Affects Versions: 7.7.2, 8.1, 8.2
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Ishan Chattopadhyaya
>            Priority: Major
>             Fix For: 7.7.3, 8.3
>
>         Attachments: SOLR-13718.patch, solr-13718-reproduce.sh, solr.zip
>
>
> When using SPLITSHARD with async, if there are underlying failures in the
> SPLIT core command or other sub-commands of SPLITSHARD, then SPLITSHARD
> succeeds anyway and results in two empty sub-shards.
> The SPLIT core command can fail in various ways; here's one way to
> reproduce the problem, using a Solr 6.x index in Solr 7.x.
> -Steps to reproduce (in Solr 7.x):-
> {code}
> 1. Import the attached configset, and create a collection.
> 2. Move the attached data directory (index created in Solr 6.x) in place of
> the created collection's data directory. Do a collection RELOAD.
> 3. Issue a *:* query; we see 5 documents.
> 4. Issue a SPLITSHARD (async), and then issue *:*; we see 0 documents.
> {code}
> Check the attached solr-13718-reproduce.sh script, which does the same
> without needing the zip file.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
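The reproduction steps quoted above only reveal the loss by re-querying after the async SPLITSHARD. As a client-side safeguard, a minimal Python sketch might build the Collections API SPLITSHARD call with the `async` parameter and the matching REQUESTSTATUS call, then classify the parsed status response. The base URL, collection, shard, and request id are hypothetical, and the sketch makes no network calls. On affected versions the task can report success even though a sub-operation failed, so the sketch also compares `*:*` document counts (numFound) taken before and after the split. This assumes no concurrent indexing during the split.

```python
from urllib.parse import urlencode

# Hypothetical Solr base URL; adjust for your own cluster.
SOLR = "http://localhost:8983/solr"


def splitshard_url(collection: str, shard: str, request_id: str) -> str:
    """Build an async SPLITSHARD request against the Collections API."""
    params = {"action": "SPLITSHARD", "collection": collection,
              "shard": shard, "async": request_id, "wt": "json"}
    return f"{SOLR}/admin/collections?{urlencode(params)}"


def requeststatus_url(request_id: str) -> str:
    """Build the REQUESTSTATUS call used to poll the async task."""
    params = {"action": "REQUESTSTATUS", "requestid": request_id, "wt": "json"}
    return f"{SOLR}/admin/collections?{urlencode(params)}"


def split_outcome(status_response: dict, docs_before: int, docs_after: int) -> str:
    """Classify a finished async SPLITSHARD.

    status_response is the parsed JSON returned by REQUESTSTATUS.
    docs_before/docs_after are numFound values from *:* queries
    (assumption: nothing else is indexing during the split).
    """
    state = status_response.get("status", {}).get("state")
    if state in ("submitted", "running"):
        return "pending"
    if state != "completed":
        return "failed"  # "failed", "notfound", or an unexpected value
    if docs_after < docs_before:
        return "suspect"  # reported success, but fewer documents are visible
    return "ok"
```

For example, `split_outcome({"status": {"state": "completed"}}, 5, 0)` returns `"suspect"`, matching the drop from 5 to 0 documents in step 4.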
[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss
[ https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918540#comment-16918540 ]

ASF subversion and git services commented on SOLR-13718:
Commit f27665198a87692311e7798e835933fc1e9ff986 in lucene-solr's branch refs/heads/master from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f276651 ]

SOLR-13718: A more targeted fix for SPLITSHARD, thereby avoiding Backup/Restore test failures
[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss
[ https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918541#comment-16918541 ]

ASF subversion and git services commented on SOLR-13718:
Commit 12715da544379ad6bad2f13164cea2f4cfe2c78e in lucene-solr's branch refs/heads/branch_8x from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=12715da ]

SOLR-13718: A more targeted fix for SPLITSHARD, thereby avoiding Backup/Restore test failures
[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss
[ https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918519#comment-16918519 ]

Ishan Chattopadhyaya commented on SOLR-13718:

The above fix caused a test failure in TestLocalFSCloudBackupRestore. There is something wrong with ShardRequestTracker's processResponses() (in OverseerCollectionMessageHandler, OCMH): abortOnError is not respected for async requests. In this fix, I also made async requests abort on error. However, RestoreCmd had been working around that faulty behaviour with additional checks of its own, so the test started failing after my fix. Fixing this properly will require handling async responses uniformly across all Collections API commands, which will be a longer effort. For now, I'm going to revert my fix and handle the SPLITSHARD failure the same way RestoreCmd does.
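The gap described in this comment can be modelled in a few lines: an async sub-request's immediate reply only acknowledges submission, so an error check that inspects only those replies never sees the eventual failure, and abortOnError never fires. The following is a hypothetical Python sketch; `SubResponse`, `process_responses`, and `SubOperationFailed` are illustrative names, not Solr's actual ShardRequestTracker API, and the final async states are assumed to have been obtained by polling each task.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SubResponse:
    """One node's reply to a sub-request (hypothetical model)."""
    node: str
    error: Optional[str] = None     # set when a synchronous call failed
    async_id: Optional[str] = None  # set when the call was submitted async


class SubOperationFailed(Exception):
    pass


def process_responses(responses: List[SubResponse],
                      final_async_states: Dict[str, str],
                      abort_on_error: bool) -> List[str]:
    """Collect failures from both sync and async sub-requests.

    final_async_states maps an async id to the state it finally reached
    (e.g. "completed" or "failed"). Checking only r.error would silently
    miss async failures, which is the behaviour described above.
    """
    failures = []
    for r in responses:
        if r.error is not None:
            failures.append(f"{r.node}: {r.error}")
        elif r.async_id is not None:
            # A missing entry is treated as a failure, not as success.
            state = final_async_states.get(r.async_id, "notfound")
            if state != "completed":
                failures.append(f"{r.node}: async task {r.async_id} {state}")
    if failures and abort_on_error:
        raise SubOperationFailed("; ".join(failures))
    return failures
```

Treating an unknown async state as a failure is a deliberate choice in this sketch: for an operation that later deactivates the parent shard, failing closed is safer than assuming success.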
[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss
[ https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918292#comment-16918292 ]

ASF subversion and git services commented on SOLR-13718:
Commit f5e6334a69d20f6af7b4297c1555885289d1e25a in lucene-solr's branch refs/heads/branch_7_7 from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f5e6334 ]

SOLR-13718: SPLITSHARD (async) with failures in underlying sub-operations can result in data loss

When SPLITSHARD is issued asynchronously, any exception in a sub-operation isn't propagated and the overall SPLITSHARD task proceeds as if there were no failures. This results in marking the active parent shard inactive and can result in two empty sub-shards, thus causing data loss.
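The failure sequence in this commit message suggests an ordering rule: deactivate the parent shard only after every sub-operation is known to have succeeded. Below is a minimal hypothetical Python sketch of that rule. `finish_split` and the plain dict of shard states are illustrative stand-ins for cluster state, not Solr code. On failure, the sketch simply drops the sub-shards from the map as a stand-in for cleaning them up.

```python
from typing import Dict, List


def finish_split(shard_states: Dict[str, str], parent: str,
                 sub_shards: List[str], sub_op_errors: List[str]) -> Dict[str, str]:
    """Return new shard states after a split attempt (hypothetical sketch).

    The parent is deactivated only when no sub-operation reported an
    error; otherwise it stays active so queries keep seeing its documents.
    The input dict is not modified.
    """
    new_states = dict(shard_states)
    if sub_op_errors:
        # Failure path: keep serving from the parent and drop the
        # sub-shards from the map (standing in for cleaning them up).
        new_states[parent] = "active"
        for s in sub_shards:
            new_states.pop(s, None)
        return new_states
    # Success path: switch traffic from the parent to the sub-shards.
    new_states[parent] = "inactive"
    for s in sub_shards:
        new_states[s] = "active"
    return new_states
```

With a non-empty error list the parent stays active, so a `*:*` query would still return the parent's documents instead of 0. The whole approach depends on failures actually reaching `sub_op_errors`, which is exactly what the async path failed to guarantee.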
[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss
[ https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918280#comment-16918280 ]

ASF subversion and git services commented on SOLR-13718:
Commit d606ffdea92513a29bd7d7a1af3cfdf556aae93c in lucene-solr's branch refs/heads/branch_8x from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d606ffd ]

SOLR-13718: SPLITSHARD (async) with failures in underlying sub-operations can result in data loss

When SPLITSHARD is issued asynchronously, any exception in a sub-operation isn't propagated and the overall SPLITSHARD task proceeds as if there were no failures. This results in marking the active parent shard inactive and can result in two empty sub-shards, thus causing data loss.
[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss
[ https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918278#comment-16918278 ]

ASF subversion and git services commented on SOLR-13718:
Commit a8d5bd34bf494da8f59baea52f6578bf3ba44ce8 in lucene-solr's branch refs/heads/master from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a8d5bd3 ]

SOLR-13718: SPLITSHARD (async) with failures in underlying sub-operations can result in data loss

When SPLITSHARD is issued asynchronously, any exception in a sub-operation isn't propagated and the overall SPLITSHARD task proceeds as if there were no failures. This results in marking the active parent shard inactive and can result in two empty sub-shards, thus causing data loss.
[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss
[ https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915713#comment-16915713 ]

Mikhail Khludnev commented on SOLR-13718:

bq. seems like there's a considerable refactoring there around handling of these collection API responses

It's probably due to SOLR-12291. Let me know if you need some comments.
[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss
[ https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915708#comment-16915708 ]

Ishan Chattopadhyaya commented on SOLR-13718:

This problem seems to affect 7.7 and 8x, but not master (seems like there's a considerable refactoring there around handling of these collection API responses). I will look into master in more detail.
[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss
[ https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915700#comment-16915700 ]

Ishan Chattopadhyaya commented on SOLR-13718:

Attaching a patch to fix this.
[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss
[ https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915697#comment-16915697 ]

Ishan Chattopadhyaya commented on SOLR-13718:

This is a spin-off from SOLR-13695. The last comment there describes how I discovered this bug.