[ 
https://issues.apache.org/jira/browse/OAK-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387714#comment-15387714
 ] 

Chetan Mehrotra commented on OAK-4585:
--------------------------------------

A few minor points:
* Instead of the content identity, we can use the path.
* It might be better to log this at trace level and, once extraction has 
finished, also log the time taken, the source size, and the extracted text size.
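
A rough sketch of the second point, not the actual patch. It uses
java.util.logging (FINEST standing in for trace) only to stay self-contained;
Oak itself logs through SLF4J. The class name, the method names, and the
extractor function are hypothetical.

```java
import java.util.Locale;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Oak: wraps an extractor and logs a summary when it finishes.
class ExtractionLogSketch {
    private static final Logger LOG = Logger.getLogger(ExtractionLogSketch.class.getName());

    /** Builds the summary line that is logged once extraction has finished. */
    static String summary(String path, long millis, long sourceBytes, long textChars) {
        return String.format(Locale.ROOT,
                "Extracted text from %s in %d ms (source: %d bytes, text: %d chars)",
                path, millis, sourceBytes, textChars);
    }

    /** Runs the (assumed) extractor and logs path, duration, and sizes at the finest level. */
    static String extractAndLog(String path, byte[] source, Function<byte[], String> extractor) {
        long start = System.nanoTime();
        String text = extractor.apply(source);
        long millis = (System.nanoTime() - start) / 1_000_000;
        // Guard the call so the message is only formatted when the level is enabled.
        if (LOG.isLoggable(Level.FINEST)) {
            LOG.finest(summary(path, millis, source.length, text.length()));
        }
        return text;
    }
}
```

Logging the path rather than the content identity makes the entry easier to
match against the repository, at the cost of a slightly longer message.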

> Text extraction: runtime status monitoring
> ------------------------------------------
>
>                 Key: OAK-4585
>                 URL: https://issues.apache.org/jira/browse/OAK-4585
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>
> Text extraction is sometimes slow, and, in case of a bug in the text 
> extraction library, can even get stuck in an endless loop.
> Right now, it is not easy to understand what is going on, even when looking 
> at full thread dumps. (Debug) log information about the current state of text 
> extraction would be nice as well.
> I suggest we add debug level logging for the binary currently being 
> extracted (content identity). For larger binaries, we can also temporarily 
> set the thread name (append "Extracting <contentIdentity>"). That way, it is 
> relatively easy to see if text extraction is stuck simply by looking at full 
> thread dumps, without having to change the log level and then reindex.
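
For illustration, the thread-name idea from the description could look
roughly like the following. This is a minimal sketch and not the committed
change; the class name, the method name, and the id parameter (a content
identity or a path) are assumptions.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Oak: tags the current thread while extraction runs.
class ThreadNameSketch {
    /** Appends "Extracting <id>" to the thread name for the duration of the work. */
    static <T> T withExtractionName(String id, Supplier<T> work) {
        Thread current = Thread.currentThread();
        String original = current.getName();
        current.setName(original + " Extracting " + id);
        try {
            return work.get();
        } finally {
            // Restore the name even if extraction throws, so pooled threads are not mislabeled.
            current.setName(original);
        }
    }
}
```

A thread stuck in extraction would then show the id in its name in any full
thread dump, with no log level change needed.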



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
