[ https://issues.apache.org/jira/browse/TIKA-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953639#comment-16953639 ]
Tim Allison commented on TIKA-2966:
-----------------------------------

I'd want this in streaming mode to handle text as it came in by {{characters()}}, but tokenization is critical and we can't guarantee that parsers will call {{characters()}} on logical chunks.

> Create a tika-eval SAXHandler
> -----------------------------
>
>                 Key: TIKA-2966
>                 URL: https://issues.apache.org/jira/browse/TIKA-2966
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Major
>
> One of the improvements coming in 1.23 is the decoupling of the text stats
> calculator from the tika-eval app. To make this even easier to use, let's
> add a handler that will calculate the text stats on .endDocument() and record
> those stats in a metadata object.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
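
To illustrate the chunking concern raised in the comment, here is a minimal sketch of a handler that only buffers text in {{characters()}} and defers tokenization to {{endDocument()}}. It is not the tika-eval implementation: the class name {{TextStatsHandler}}, the {{stats:*}} keys, the plain {{Map}} standing in for a metadata object, and the whitespace tokenizer are all assumptions made for illustration. Only JDK SAX classes are used.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical sketch, not the Tika API: buffers all text passed to
 * characters() and computes simple stats once, in endDocument().
 */
public class TextStatsHandler extends DefaultHandler {

    // Stand-in for a metadata object; the keys used below are made up.
    private final Map<String, String> stats = new LinkedHashMap<>();
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may split one word across several calls, so only buffer here.
        buffer.append(ch, start, length);
    }

    @Override
    public void endDocument() {
        String text = buffer.toString().trim();
        // Naive whitespace tokenization, for illustration only.
        String[] tokens = text.isEmpty() ? new String[0] : text.split("\\s+");
        stats.put("stats:charCount", Integer.toString(buffer.length()));
        stats.put("stats:tokenCount", Integer.toString(tokens.length));
    }

    public Map<String, String> getStats() {
        return stats;
    }

    public static void main(String[] args) throws Exception {
        TextStatsHandler handler = new TextStatsHandler();
        handler.startDocument();
        // Simulate a parser that breaks "tokenization" mid-word.
        for (String chunk : new String[] {"token", "ization is ", "critical"}) {
            char[] c = chunk.toCharArray();
            handler.characters(c, 0, c.length);
        }
        handler.endDocument();
        System.out.println(handler.getStats());
    }
}
```

In this sketch the three chunks yield a token count of 3. Tokenizing each {{characters()}} call separately would count 4, because "token" and "ization" would be treated as separate words. The trade-off is that the whole text is held in memory until {{endDocument()}}, which is what a true streaming mode would need to avoid.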