[ https://issues.apache.org/jira/browse/OAK-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407380#comment-15407380 ]
Thomas Mueller commented on OAK-4585:
-------------------------------------

http://svn.apache.org/r1755145 (trunk)

Thanks Chetan.

* Path: I now mainly log the path, as I see it's better than the content identity for the SegmentStore case. I still log the content identity for the FileDataStore case, so it should be easy to get hold of the binary (and even if the repository is not available).
* Trace level: I still use debug level, not sure why trace level would be better? I think people mainly use debug level when trying to find problems, and this doesn't seem to increase the log file dramatically.
* Log time taken, source size, extracted text size: done
* By the way there was a bug in the first patch, Long.parseLong instead of Long.getLong.

> Text extraction: runtime status monitoring
> ------------------------------------------
>
>                 Key: OAK-4585
>                 URL: https://issues.apache.org/jira/browse/OAK-4585
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>             Fix For: 1.4.6, 1.5.7
>
>
> Text extraction is sometimes slow, and, in case of a bug in the text
> extraction library, can even get stuck in an endless loop.
> Right now, it is not easy to understand what is going on, even when looking
> at full thread dumps. (Debug) log information about the current state of text
> extraction would be nice as well.
> I suggest we add debug level logging for the current extracted binary
> (content identity). For larger binaries, we can also temporarily set the
> thread name (append "Extracting <contentIdentity>"). That way, it is
> relatively easy to see if text extraction is stuck simply looking at full
> thread dumps, without having to change the log level and then reindex.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
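
For readers who want to see the thread-name idea from the issue description in code, here is a minimal sketch. It is not the committed patch (see r1755145 for that). The class name ExtractionMonitor, the method extract, and the system property example.extraction.largeBinaryBytes are all hypothetical. java.util.logging at FINE level stands in for debug-level logging so the example has no external dependencies.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Oak: illustrates the monitoring idea only.
public class ExtractionMonitor {
    private static final Logger LOG =
            Logger.getLogger(ExtractionMonitor.class.getName());

    // Long.getLong reads a *system property* and falls back to the default;
    // Long.parseLong would instead try to parse its argument as a number.
    // The property name and the 1 MB default are assumptions.
    static final long LARGE_BINARY_BYTES =
            Long.getLong("example.extraction.largeBinaryBytes", 1024L * 1024L);

    // Runs the extractor; for large binaries the thread name is temporarily
    // extended so that a full thread dump shows what is being extracted.
    static String extract(String path, long sourceSize, Supplier<String> extractor) {
        Thread current = Thread.currentThread();
        String originalName = current.getName();
        boolean rename = sourceSize >= LARGE_BINARY_BYTES;
        if (rename) {
            current.setName(originalName + " Extracting " + path);
        }
        long start = System.nanoTime();
        try {
            String text = extractor.get();
            if (LOG.isLoggable(Level.FINE)) {
                long millis = (System.nanoTime() - start) / 1_000_000;
                int textSize = text == null ? 0 : text.length();
                LOG.fine("Extracted text from " + path + ": " + millis
                        + " ms, source size " + sourceSize
                        + ", extracted text size " + textSize);
            }
            return text;
        } finally {
            // Always restore the name, even if extraction throws.
            if (rename) {
                current.setName(originalName);
            }
        }
    }
}
```

Restoring the name in a finally block matters here: the thread is typically reused for the next binary, so a leftover "Extracting ..." suffix would make later thread dumps misleading.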