[ 
https://issues.apache.org/jira/browse/OAK-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840316#comment-15840316
 ] 

Alexander Klimetschek commented on OAK-5519:
--------------------------------------------

Related issues:
* OAK-4939 addresses this in 1.5 and 1.6, but it considers the entire index 
"corrupted" and isolates it; if this is an important full text index, users 
are still impacted, because they won't find the other content (which itself 
is fine)
* OAK-3813, which I reported earlier, is about the datastore failing to 
resolve blobs (in this case S3, where you might have more failure scenarios)


> Skip problematic binaries instead of blocking indexing
> ------------------------------------------------------
>
>                 Key: OAK-5519
>                 URL: https://issues.apache.org/jira/browse/OAK-5519
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>            Reporter: Alexander Klimetschek
>
> If text extraction fails (weird PDF), a blob cannot be found in the 
> datastore, or any other error outside the scope of the indexer occurs while 
> indexing one item from the repository, it currently halts the complete 
> indexing (lane). Thus one broken item (that maybe isn't important to the 
> users at all) can block the indexing of other, new content (that might be 
> important to users), and it always requires manual intervention to fix 
> (which is also not easy and requires Oak experts).
> Instead, the item could be remembered in a known issue list, proper warnings 
> could be given, and indexing could continue. Maintenance operations should 
> be available to come back and reindex these items once the issue is fixed, 
> or the indexer could automatically retry after some time.
> I think the line should probably be drawn at binary properties. I am not 
> sure whether other JCR property types could trigger a similar issue, or 
> whether a failure in them might actually warrant a halt, as it could lead to 
> an "incorrect" index if these properties are important. But maybe the line 
> is simply a try/catch around "full text extraction".
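
The idea from the description (a try/catch around full text extraction, a
known issue list, warnings, and a later retry) could look roughly like the
sketch below. It is plain Java and does not use any Oak API: SkippingIndexer,
the extractor function, and the String paths are hypothetical names chosen
only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Oak code: skips items whose text extraction fails
// instead of halting the whole indexing run.
final class SkippingIndexer {
    // Assumed stand-in for text extraction: maps an item path to its text
    // and throws a RuntimeException on a broken binary or a missing blob.
    private final Function<String, String> extractor;
    // Path -> extracted text; stands in for the real index.
    private final Map<String, String> index = new LinkedHashMap<>();
    // Path -> last error message; the "known issue list".
    private final Map<String, String> knownIssues = new LinkedHashMap<>();

    SkippingIndexer(Function<String, String> extractor) {
        this.extractor = extractor;
    }

    void indexAll(List<String> paths) {
        for (String path : paths) {
            try {
                index.put(path, extractor.apply(path));
                // A successful (re)index clears any earlier known issue.
                knownIssues.remove(path);
            } catch (RuntimeException e) {
                // Remember the item, warn, and continue with the next one.
                knownIssues.put(path, String.valueOf(e.getMessage()));
                System.err.println("WARN: skipping " + path + ": " + e.getMessage());
            }
        }
    }

    // Maintenance operation: retry everything on the known issue list.
    void retryKnownIssues() {
        // Iterate over a copy, because indexAll modifies knownIssues.
        indexAll(new ArrayList<>(knownIssues.keySet()));
    }

    int indexedCount() {
        return index.size();
    }

    Map<String, String> knownIssues() {
        return Collections.unmodifiableMap(knownIssues);
    }
}
```

In this sketch only failures inside the extractor call are caught, which
matches the suggestion to draw the line at full text extraction; errors
elsewhere would still stop the run.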



