[ https://issues.apache.org/jira/browse/OAK-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840316#comment-15840316 ]
Alexander Klimetschek commented on OAK-5519:
--------------------------------------------

Related issues:
* OAK-4939 addresses this in 1.5 and 1.6, but considers the entire index "corrupted" and isolates it; if this is an important full text index, it would still impact users, as they won't find the other content (which is itself fine)
* OAK-3813, which I reported earlier, is about the datastore failing to resolve blobs (in this case S3, where you might have more failure scenarios)

> Skip problematic binaries instead of blocking indexing
> ------------------------------------------------------
>
>                 Key: OAK-5519
>                 URL: https://issues.apache.org/jira/browse/OAK-5519
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>            Reporter: Alexander Klimetschek
>
> If a text extraction is broken (weird PDF), or a blob cannot be found in the
> datastore, or any other error occurs upon indexing one item from the repository
> that is outside the scope of the indexer, it currently halts the complete indexing
> (lane). Thus one broken item (that maybe isn't important to the users at all)
> can block the indexing of other, new content (that might be important to
> users), and it always requires manual intervention to fix (which is also not
> easy and requires Oak experts).
> Instead, the item could be remembered in a known issue list, proper warnings
> given, and indexing continued. Maintenance operations should be available to
> come back and reindex these items once the issue is fixed, or the indexer could
> automatically retry after some time.
> I think the line should probably be drawn at binary properties. Not sure if
> other JCR property types could trigger a similar issue, or whether a failure in
> them might actually warrant a halt, as it could lead to an "incorrect" index
> if these properties are important. But maybe the line is simply a try & catch
> around "full text extraction".

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
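
The behaviour proposed in the issue description (a try & catch around full text extraction, a known issue list, and a later retry) might look roughly like the sketch below. This is only an illustration of the idea, not Oak code: `TextExtractor`, `SkippingIndexer`, `indexAll` and `retryKnownIssues` are hypothetical names, and plain in-memory maps stand in for the real index and repository.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for whatever performs full text extraction; not an Oak API.
interface TextExtractor {
    String extract(String path, byte[] binary) throws Exception;
}

// Hypothetical indexer that skips items whose extraction fails instead of halting.
class SkippingIndexer {
    private final TextExtractor extractor;
    // path -> extracted text; a simple map stands in for the real index
    private final Map<String, String> index = new LinkedHashMap<>();
    // path -> failure message; this is the "known issue list"
    private final Map<String, String> knownIssues = new LinkedHashMap<>();

    SkippingIndexer(TextExtractor extractor) {
        this.extractor = extractor;
    }

    // Index every item; a failure on one item does not stop the others.
    void indexAll(Map<String, byte[]> items) {
        for (Map.Entry<String, byte[]> e : items.entrySet()) {
            indexOne(e.getKey(), e.getValue());
        }
    }

    private void indexOne(String path, byte[] binary) {
        try {
            index.put(path, extractor.extract(path, binary));
            knownIssues.remove(path); // clear an earlier failure, if any
        } catch (Exception ex) {
            // Remember the item, warn, and carry on with the next one.
            knownIssues.put(path, String.valueOf(ex.getMessage()));
            System.err.println("WARN: skipping " + path + ": " + ex.getMessage());
        }
    }

    // Maintenance operation: retry only the items that failed before.
    void retryKnownIssues(Map<String, byte[]> items) {
        // Iterate over a copy, because indexOne modifies knownIssues.
        for (String path : new ArrayList<>(knownIssues.keySet())) {
            byte[] binary = items.get(path);
            if (binary != null) {
                indexOne(path, binary);
            }
        }
    }

    Map<String, String> index() {
        return index;
    }

    Map<String, String> knownIssues() {
        return knownIssues;
    }
}
```

In this sketch a caller would run `indexAll` on each indexing cycle and invoke `retryKnownIssues` either from a maintenance operation or on a timer. It catches `Exception` only; whether errors such as `OutOfMemoryError` raised during extraction should also be skipped is a separate decision.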