[ https://issues.apache.org/jira/browse/OAK-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976382#comment-15976382 ]
Thomas Mueller commented on OAK-5519:
-------------------------------------

I recently saw an OutOfMemoryError during the index update. I'm not sure whether it was caused by a problematic binary, a bug in the PDF text extraction tool, or something else. It probably makes sense to deal with OOME as well (at least catch it and log the stack trace).

> Skip problematic binaries instead of blocking indexing
> ------------------------------------------------------
>
>                 Key: OAK-5519
>                 URL: https://issues.apache.org/jira/browse/OAK-5519
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: indexing
>            Reporter: Alexander Klimetschek
>              Labels: resilience
>
> If a text extraction is blocked (weird PDF), a blob cannot be found in the
> datastore, or any other error occurs while indexing one item from the
> repository that is outside the scope of the indexer, this currently halts the
> indexing lane. Thus one item (that maybe isn't important to the users at all)
> can block the indexing of other, new content (that might be important to
> users), and it always requires manual intervention (which is also not easy and
> requires Oak experts).
>
> Instead, the item could be remembered in a known-issue list, proper warnings
> given, and indexing continued. Maintenance operations should be available to
> come back and reindex these items, or the indexer could automatically retry
> after some time. This would allow normal user activity to go on without manual
> intervention, and solving the problem (if it's isolated to some binaries) could
> be deferred.
>
> I think the line should probably be drawn at binary properties. I'm not sure
> whether other JCR property types could trigger a similar issue, and whether a
> failure in them might actually warrant a halt, as it could lead to an
> "incorrect" index if these properties are important. But maybe the line is
> simply a try & catch around "full text extraction".

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
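The try & catch idea discussed above could be sketched roughly as follows. This covers catching extraction failures (including OutOfMemoryError), logging them with the stack trace, and keeping a known-issue list that can be retried later. It is not Oak code. `SkippingTextExtractor` is a hypothetical name, and so are its `extract`, `knownIssues`, and `retry` methods. The extractor is modelled as a `Function<String, String>` keyed by a blob ID string, and the known-issue list is kept in memory. All of these are illustrative assumptions, not the real Oak indexing API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical sketch (not an Oak API): wraps a full-text extractor so that a
 * failure on one binary is logged and recorded instead of halting indexing.
 */
class SkippingTextExtractor {
    private static final Logger LOG =
            Logger.getLogger(SkippingTextExtractor.class.getName());

    // Assumed extractor shape: blob ID -> extracted text.
    private final Function<String, String> extractor;
    // In-memory "known issue list": blob ID -> failure description.
    private final Map<String, String> knownIssues = new LinkedHashMap<>();

    SkippingTextExtractor(Function<String, String> extractor) {
        this.extractor = extractor;
    }

    /** Returns the extracted text, or null if this binary is skipped. */
    String extract(String blobId) {
        if (knownIssues.containsKey(blobId)) {
            return null; // already known to fail; skip until retry() is called
        }
        try {
            return extractor.apply(blobId);
        } catch (RuntimeException | OutOfMemoryError e) {
            // Catching OutOfMemoryError only logs it; the JVM may still be in a
            // bad state afterwards, so this is a best-effort measure.
            knownIssues.put(blobId, e.toString());
            LOG.log(Level.WARNING,
                    "Skipping binary " + blobId + " during full-text extraction", e);
            return null;
        }
    }

    /** Read-only view, e.g. for a maintenance operation that reindexes these. */
    Map<String, String> knownIssues() {
        return Collections.unmodifiableMap(knownIssues);
    }

    /** Forgets a recorded failure so the next extract() call tries again. */
    void retry(String blobId) {
        knownIssues.remove(blobId);
    }
}
```

For example, suppose the wrapped extractor throws for one blob ID. The wrapper returns null for that ID and records it in `knownIssues()`. It keeps extracting the other binaries. Note that a try/catch only handles extractions that fail. An extraction that hangs (the "blocked" case in the description) would additionally need something like a timeout around the extractor call, which this sketch does not attempt.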