[ 
https://issues.apache.org/jira/browse/LUCENE-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785625#comment-16785625
 ] 

Simon Willnauer commented on LUCENE-8692:
-----------------------------------------

{quote}
For now I've updated the patch to take the simplest possible approach to 
checking for MergeAbortedException
{quote}


+1

{quote}
Well, to flip your question around: is there an example of a Throwable you can 
think of bubbling up out of IndexWriter.startCommit() that should NOT be 
considered fatal?
{quote}
I think we need to be careful here. From my perspective there are 3 types of 
exceptions here:
 * unrecoverable exceptions, a.k.a. VirtualMachineErrors
 * exceptions that happen during indexing and are not recoverable (these are 
handled in DocumentsWriter)
 * exceptions that cause data loss or inconsistencies (we don't handle those as 
fatal yet, at least not consistently; for these we only catch 
VirtualMachineError).

The methods where this third kind can occur are, in particular:

 * getReader()
 * deleteAll()
 * addIndexes()
 * flushNextBuffer()
 * prepareCommitInternal() 
 * doFlush()
 * startCommit()

Failures in those methods might cause documents to go missing etc., but we have 
not treated them as fatal or tragic events, since a user could always call 
rollback() to go back to the last known safe point (the previous commit). Now we 
can debate whether we want to change this, and we can; in fact I am all for 
making it even more strict, especially since it's inconsistent with what we do 
if addDocument fails with an aborting exception. 
If we do that, we need to see whether rollback still has a purpose, and maybe 
remove it?
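
To make the distinction concrete, here is a minimal sketch of the client-side 
pattern described above: try to commit, and on failure either give up (a tragic 
exception was recorded) or call rollback() to return to the previous commit. It 
does not use Lucene itself. The Writer interface, the Outcome enum, 
commitOrRollback() and fakeWriter() are hypothetical names made up for 
illustration; only the method names commit(), rollback() and 
getTragicException() mirror IndexWriter.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class CommitRecoverySketch {

    /** Hypothetical stand-in for the small slice of IndexWriter discussed here. */
    public interface Writer {
        void commit() throws IOException;
        void rollback() throws IOException;
        Throwable getTragicException();
    }

    public enum Outcome { COMMITTED, ROLLED_BACK, TRAGIC }

    /** Commits; on failure, rolls back unless the writer recorded a tragic exception. */
    public static Outcome commitOrRollback(Writer writer) {
        try {
            writer.commit();
            return Outcome.COMMITTED;
        } catch (IOException commitFailure) {
            if (writer.getTragicException() != null) {
                // Assumption of this sketch: a recorded tragedy is terminal,
                // so the caller does not attempt a rollback.
                return Outcome.TRAGIC;
            }
            try {
                // Non-tragic failure: return to the last known safe point.
                writer.rollback();
                return Outcome.ROLLED_BACK;
            } catch (IOException rollbackFailure) {
                rollbackFailure.addSuppressed(commitFailure);
                throw new UncheckedIOException(rollbackFailure);
            }
        }
    }

    /** Test double: commit() fails on request and optionally records a tragedy. */
    public static Writer fakeWriter(boolean failCommit, boolean tragic) {
        return new Writer() {
            private Throwable tragedy;

            @Override public void commit() throws IOException {
                if (failCommit) {
                    IOException e = new IOException("simulated missing file");
                    if (tragic) {
                        tragedy = e;
                    }
                    throw e;
                }
            }

            @Override public void rollback() { /* nothing to undo in the fake */ }

            @Override public Throwable getTragicException() { return tragedy; }
        };
    }

    public static void main(String[] args) {
        System.out.println(commitOrRollback(fakeWriter(false, false)));
        System.out.println(commitOrRollback(fakeWriter(true, false)));
        System.out.println(commitOrRollback(fakeWriter(true, true)));
    }
}
```

If every failure in the listed methods became tragic, the ROLLED_BACK branch 
would never be taken, which is the reason for asking whether rollback would 
still have a purpose.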

Now, speaking of maybeMerge: I don't see why we need to close the index writer 
with a tragic event; there is no data loss and no inconsistency. By that logic, 
I don't think we need to handle these exceptions in such a drastic way.

{quote}
I don't use github for lucene development – I track all contributions as 
patches in the official issue tracker for the project as recommended by our 
official guidelines : )  ... but i'll go ahead and create a jira/LUCENE-8692 
branch if that will help you review.
{quote}

Bummer, I am not sure branches help. Working like it's still 1999 is a pain; we 
should fix our guidelines.



> IndexWriter.getTragicException() may not reflect all corrupting exceptions 
> (notably: NoSuchFileException)
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8692
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Hoss Man
>            Priority: Major
>         Attachments: LUCENE-8692.patch, LUCENE-8692.patch, LUCENE-8692.patch, 
> LUCENE-8692_test.patch
>
>
> Backstory...
> Solr has a "LeaderTragicEventTest" which uses MockDirectoryWrapper's 
> {{corruptFiles}} to introduce corruption into the "leader" node's index and 
> then assert that this solr node gives up its leadership of the shard and 
> another replica takes over.
> This can currently fail sporadically (but usually reproducibly - 
> see SOLR-13237) due to the leader not giving up its leadership even after the 
> corruption causes an update/commit to fail.  Solr's leadership code makes 
> this decision after encountering an exception from the IndexWriter based on 
> whether {{IndexWriter.getTragicException()}} is (non-)null.
> ----
> While investigating this, I created an isolated Lucene-Core equivalent test 
> that demonstrates the same basic situation:
> * Gradually cause corruption on an index until (otherwise) valid execution 
> of IW.add() + IW.commit() calls throw an exception to the IW client.
> * assert that if an exception is thrown to the IW client, 
> {{getTragicException()}} is now non-null.
> It's fairly easy to make my new test fail reproducibly -- in every situation 
> I've seen the underlying exception is a {{NoSuchFileException}} (ie: the 
> randomly introduced corruption was to delete some file).


