[ 
https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830079#comment-17830079
 ] 

Tilman Hausherr commented on TIKA-4218:
---------------------------------------

The word "party" appears 36 times in the json file, 18 times in my text 
extraction, but 62 times in the csv file in the TOP_N_TOKENS_A row. The count 
in the json file is doubled because of "xfa_content", but I don't understand 
the 62.
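
One way to check which fields contribute to the JSON total is to count the 
matches per field. Below is a minimal Python sketch, assuming the JSON consists 
of nested objects and lists with string values. The function name 
count_word_by_field and the sample data are made up for illustration and are 
not taken from the actual file. It matches whole words case-insensitively, 
which may differ from how the tool that produces the csv tokenizes the text.

```python
import json
import re
from collections import Counter


def count_word_by_field(node, word, path="", counts=None):
    """Count whole-word, case-insensitive matches of `word` in every
    string value of a parsed JSON structure, keyed by field path."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        for key, value in node.items():
            count_word_by_field(value, word, f"{path}/{key}", counts)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            count_word_by_field(value, word, f"{path}[{i}]", counts)
    elif isinstance(node, str):
        pattern = rf"\b{re.escape(word)}\b"
        hits = len(re.findall(pattern, node, flags=re.IGNORECASE))
        if hits:
            counts[path] += hits
    return counts


# Hypothetical sample data, NOT the real file from this issue.
sample = json.loads(
    '[{"content": "party A and party B",'
    ' "xfa_content": "party A and party B"}]'
)

counts = count_word_by_field(sample, "party")
for field, n in counts.items():
    print(field, n)
print("total", sum(counts.values()))
```

If the same text is stored in two fields, as in the sample, the total is twice 
the per-field count, which would match the 36 versus 18 above.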

Thanks for mentioning the new list (I probably missed it). I'll adjust my 
scripts and use them next time.

> Run regression tests to support 2.9.2 release
> ---------------------------------------------
>
>                 Key: TIKA-4218
>                 URL: https://issues.apache.org/jira/browse/TIKA-4218
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
