[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss

2019-08-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918555#comment-16918555
 ] 

ASF subversion and git services commented on SOLR-13718:


Commit 9d68b0d00dd815210674262463b7908f72a1ef30 in lucene-solr's branch 
refs/heads/branch_7_7 from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9d68b0d ]

SOLR-13718: A more targeted fix for SPLITSHARD, thereby avoiding Backup/Restore 
test failures


> SPLITSHARD using async can cause data loss
> --
>
> Key: SOLR-13718
> URL: https://issues.apache.org/jira/browse/SOLR-13718
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 7.7.2, 8.1, 8.2
> Reporter: Ishan Chattopadhyaya
> Assignee: Ishan Chattopadhyaya
> Priority: Major
> Fix For: 7.7.3, 8.3
>
> Attachments: SOLR-13718.patch, solr-13718-reproduce.sh, solr.zip
>
>
> When using SPLITSHARD with async, if there are underlying failures in the 
> SPLIT core command or other sub-commands of SPLITSHARD, then SPLITSHARD 
> succeeds and results in two empty sub-shards.
> There are various potential failures with the SPLIT core command; here is a
> way to reproduce one using a Solr 6x index in Solr 7x.
> -Steps to reproduce (in Solr 7x):-
> {code}
> 1. Import the attached configset, and create a collection.
> 2. Move in the attached data directory (index created in Solr6x) in place of 
> the created collection's data directory. Do a collection RELOAD.
> 3. Issue a *:* query, we see 5 documents.
> 4. Issue a SPLITSHARD (async), and then issue *:*, we see 0 documents.
> {code}
> Check attached solr-13718-reproduce.sh script to do the same (without needing 
> the zip file).
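
For step 4, the Collections API calls involved can be sketched as below. This
is only an illustration, not the attached solr-13718-reproduce.sh: the base
URL, collection name, shard name and request id are assumed values, and the
helper function names are made up for this example. The SPLITSHARD and
REQUESTSTATUS actions and the async/requestid parameters are the Collections
API's own.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, **params):
    """Build a Collections API URL under base_url (e.g. http://localhost:8983/solr)."""
    return f"{base_url}/admin/collections?{urlencode(params)}"


def splitshard_async_url(base_url, collection, shard, request_id):
    """URL that submits SPLITSHARD asynchronously under the given request id."""
    # "async" is a reserved word in Python, so it is passed via dict unpacking.
    return collections_api_url(
        base_url,
        action="SPLITSHARD",
        collection=collection,
        shard=shard,
        **{"async": request_id},
    )


def requeststatus_url(base_url, request_id):
    """URL that polls the status of a previously submitted async request."""
    return collections_api_url(base_url, action="REQUESTSTATUS", requestid=request_id)


def match_all_query_url(base_url, collection):
    """URL for a *:* query that returns only the document count (rows=0)."""
    return f"{base_url}/{collection}/select?{urlencode({'q': '*:*', 'rows': 0})}"


if __name__ == "__main__":
    # Assumed values for illustration only.
    base = "http://localhost:8983/solr"
    print(match_all_query_url(base, "test"))                        # step 3
    print(splitshard_async_url(base, "test", "shard1", "split-1"))  # step 4
    print(requeststatus_url(base, "split-1"))                       # poll the task
    print(match_all_query_url(base, "test"))                        # count again
```

Since the issue is that SPLITSHARD reports success despite sub-command
failures, the *:* count from step 3 is the thing to compare after the split.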



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss

2019-08-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918540#comment-16918540
 ] 

ASF subversion and git services commented on SOLR-13718:


Commit f27665198a87692311e7798e835933fc1e9ff986 in lucene-solr's branch 
refs/heads/master from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f276651 ]

SOLR-13718: A more targeted fix for SPLITSHARD, thereby avoiding Backup/Restore 
test failures





[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss

2019-08-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918541#comment-16918541
 ] 

ASF subversion and git services commented on SOLR-13718:


Commit 12715da544379ad6bad2f13164cea2f4cfe2c78e in lucene-solr's branch 
refs/heads/branch_8x from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=12715da ]

SOLR-13718: A more targeted fix for SPLITSHARD, thereby avoiding Backup/Restore 
test failures





[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss

2019-08-29 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918519#comment-16918519
 ] 

Ishan Chattopadhyaya commented on SOLR-13718:
-

The above fix caused a test failure in TestLocalFSCloudBackupRestore. There is 
something wrong with ShardRequestTracker (OCMH)'s processResponses(): 
abortOnError is not respected for async requests. In this fix, I tried 
aborting (on error) the async requests as well. However, RestoreCmd had been 
working around that wrong behaviour with additional checks of its own, and 
hence the test started failing after my fix.

Fixing this the right way will require handling these async responses 
uniformly across all collection API commands, and will be a longer effort. For 
now, I'm going to revert my fix and handle the SPLITSHARD failure the same way 
RestoreCmd does.
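
The behaviour described in this comment can be pictured with a toy model. The
sketch below is Python, not Solr's Java code; SubResponse, process_responses
and fail_on_any_error are hypothetical names, and the exact checks RestoreCmd
performs are not shown in this thread. It only illustrates the shape of the
problem (an abort-on-error flag that has no effect for async requests) and of
the interim workaround (the calling command inspects the collected responses
itself).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubResponse:
    """Result of one sub-operation; error is None on success (hypothetical type)."""
    node: str
    error: Optional[str] = None


class SplitAborted(Exception):
    """Raised when the overall command should stop because a sub-operation failed."""


def process_responses(responses: List[SubResponse], abort_on_error: bool,
                      is_async: bool) -> List[SubResponse]:
    """Toy stand-in for the flawed behaviour: the flag is honoured only when sync."""
    for response in responses:
        if response.error and abort_on_error and not is_async:
            raise SplitAborted(f"{response.node}: {response.error}")
    # For async requests, failures fall through silently.
    return responses


def fail_on_any_error(responses: List[SubResponse]) -> None:
    """Explicit check done by the caller, independent of how requests were sent."""
    failures = [r for r in responses if r.error]
    if failures:
        details = "; ".join(f"{r.node}: {r.error}" for r in failures)
        raise SplitAborted(f"sub-operation(s) failed: {details}")


if __name__ == "__main__":
    results = [SubResponse("node1"), SubResponse("node2", error="SPLIT failed")]
    collected = process_responses(results, abort_on_error=True, is_async=True)
    try:
        fail_on_any_error(collected)
    except SplitAborted as exc:
        print("aborting before the parent shard is touched:", exc)
```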




[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss

2019-08-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918292#comment-16918292
 ] 

ASF subversion and git services commented on SOLR-13718:


Commit f5e6334a69d20f6af7b4297c1555885289d1e25a in lucene-solr's branch 
refs/heads/branch_7_7 from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f5e6334 ]

SOLR-13718: SPLITSHARD (async) with failures in underlying sub-operations can 
result in data loss

  When SPLITSHARD is issued asynchronously, any exception in a sub-operation
  isn't propagated and the overall SPLITSHARD task proceeds as if there were
  no failures. This results in marking the active parent shard inactive and
  can result in two empty sub-shards, thus causing data loss.
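
Given the failure mode described in this commit message, a client can guard
against it by comparing document counts around the split. A minimal sketch,
assuming the standard JSON response shape of a select query
(response.numFound); the function names and sample payloads are made up for
illustration, and on a collection receiving updates during the split the
comparison would need to be loosened.

```python
import json


def num_found(select_response_json: str) -> int:
    """Extract the hit count from the JSON body of a select query."""
    return json.loads(select_response_json)["response"]["numFound"]


def check_split_preserved_docs(before_json: str, after_json: str) -> int:
    """Raise if the collection has fewer documents after the split than before."""
    before = num_found(before_json)
    after = num_found(after_json)
    if after < before:
        raise RuntimeError(
            f"document count dropped from {before} to {after} after SPLITSHARD"
        )
    return after


if __name__ == "__main__":
    # Hand-written sample payloads mirroring the reproduction steps (5 docs, then 0).
    before = '{"response": {"numFound": 5, "start": 0, "docs": []}}'
    after = '{"response": {"numFound": 0, "start": 0, "docs": []}}'
    try:
        check_split_preserved_docs(before, after)
    except RuntimeError as exc:
        print("split check failed:", exc)
```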


> SPLITSHARD using async can cause data loss
> --
>
> Key: SOLR-13718
> URL: https://issues.apache.org/jira/browse/SOLR-13718
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.7.2, 8.1, 8.2
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Fix For: 8.3
>
> Attachments: SOLR-13718.patch, solr-13718-reproduce.sh, solr.zip
>
>
> When using SPLITSHARD with async, if there are underlying failures in the 
> SPLIT core command or other sub-commands of SPLITSHARD, then SPLITSHARD 
> succeeds and results in two empty sub-shards.
> There are various potential failures with the SPLIT core command; here is a
> way to reproduce one using a Solr 6x index in Solr 7x.
> Steps to reproduce (in Solr 7x):
> {code}
> 1. Import the attached configset, and create a collection.
> 2. Move in the attached data directory (index created in Solr6x) in place of 
> the created collection's data directory. Do a collection RELOAD.
> 3. Issue a *:* query, we see 5 documents.
> 4. Issue a SPLITSHARD, and then issue *:*, we see 0 documents.
> {code}






[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss

2019-08-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918280#comment-16918280
 ] 

ASF subversion and git services commented on SOLR-13718:


Commit d606ffdea92513a29bd7d7a1af3cfdf556aae93c in lucene-solr's branch 
refs/heads/branch_8x from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d606ffd ]

SOLR-13718: SPLITSHARD (async) with failures in underlying sub-operations can 
result in data loss

  When SPLITSHARD is issued asynchronously, any exception in a sub-operation
  isn't propagated and the overall SPLITSHARD task proceeds as if there were
  no failures. This results in marking the active parent shard inactive and
  can result in two empty sub-shards, thus causing data loss.





[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss

2019-08-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918278#comment-16918278
 ] 

ASF subversion and git services commented on SOLR-13718:


Commit a8d5bd34bf494da8f59baea52f6578bf3ba44ce8 in lucene-solr's branch 
refs/heads/master from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a8d5bd3 ]

SOLR-13718: SPLITSHARD (async) with failures in underlying sub-operations can 
result in data loss

  When SPLITSHARD is issued asynchronously, any exception in a sub-operation
  isn't propagated and the overall SPLITSHARD task proceeds as if there were
  no failures. This results in marking the active parent shard inactive and
  can result in two empty sub-shards, thus causing data loss.


> SPLITSHARD using async can cause data loss
> --
>
> Key: SOLR-13718
> URL: https://issues.apache.org/jira/browse/SOLR-13718
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.7.2
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-13718.patch, solr.zip
>
>
> When using SPLITSHARD with async, if there are underlying failures in the 
> SPLIT core command or other sub-commands of SPLITSHARD, then SPLITSHARD 
> succeeds and results in two empty sub-shards.
> There are various potential failures with the SPLIT core command; here is a
> way to reproduce one using a Solr 6x index in Solr 7x.
> Steps to reproduce (in Solr 7x):
> {code}
> 1. Import the attached configset, and create a collection.
> 2. Move in the attached data directory (index created in Solr6x) in place of 
> the created collection's data directory. Do a collection RELOAD.
> 3. Issue a *:* query, we see 5 documents.
> 4. Issue a SPLITSHARD, and then issue *:*, we see 0 documents.
> {code}






[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss

2019-08-26 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915713#comment-16915713
 ] 

Mikhail Khludnev commented on SOLR-13718:
-

bq. seems like there's a considerable refactoring there around handling of 
these collection API responses
It's probably due to SOLR-12291. Let me know if you need some comments.




[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss

2019-08-26 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915708#comment-16915708
 ] 

Ishan Chattopadhyaya commented on SOLR-13718:
-

This problem seems to affect 7.7 and 8x, but not master (seems like there's a 
considerable refactoring there around handling of these collection API 
responses). Will look into master in more detail.




[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss

2019-08-26 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915700#comment-16915700
 ] 

Ishan Chattopadhyaya commented on SOLR-13718:
-

Attaching a patch to fix this.




[jira] [Commented] (SOLR-13718) SPLITSHARD using async can cause data loss

2019-08-26 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915697#comment-16915697
 ] 

Ishan Chattopadhyaya commented on SOLR-13718:
-

This is a spin-off from SOLR-13695. The last comment there describes how I 
discovered this bug.

> SPLITSHARD using async can cause data loss
> --
>
> Key: SOLR-13718
> URL: https://issues.apache.org/jira/browse/SOLR-13718
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.7.2
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>
> When using SPLITSHARD with async, if there are underlying failures in the 
> SPLIT core command or other sub-commands of SPLITSHARD, then SPLITSHARD 
> succeeds and results in two empty sub-shards.
> There are various potential failures with the SPLIT core command; here is a
> way to reproduce one using a Solr 6x index in Solr 7x.
> Steps to reproduce (in Solr 7x):
> {code}
> 1. Import the attached configset, and create a collection.
> 2. Move in the attached data directory (index created in Solr6x) in place of 
> the created collection's data directory. Do a collection RELOAD.
> 3. Issue a *:* query, we see 5 documents.
> 4. Issue a SPLITSHARD, and then issue *:*, we see 0 documents.
> {code}


