[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830079#comment-17830079 ]
Tilman Hausherr commented on TIKA-4218:
---------------------------------------

The word "party" appears 36 times in the JSON file and 18 times in my text extraction, but 62 times in the CSV file, in the TOP_N_TOKENS_A row. The doubled count in the JSON file comes from "xfa_content", but I don't understand the 62.

Thanks for mentioning the new list (I probably missed it). I'll adjust my scripts and use them next time.

> Run regression tests to support 2.9.2 release
> ---------------------------------------------
>
>                 Key: TIKA-4218
>                 URL: https://issues.apache.org/jira/browse/TIKA-4218
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
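A per-field tally is one way to see where the doubled JSON count for "party" comes from. The Python sketch below is only an illustration and rests on a few assumptions. It assumes the JSON output has already been parsed. It uses a case-insensitive whole-word regex in place of real tokenization, so it is not the tokenizer behind the CSV's TOP_N_TOKENS_A values. The function `count_word_by_field` and the `content` key in the sample are hypothetical names. In the sample, the same text is stored under two keys, which doubles the total in the same way as the 18 vs. 36 observation.

```python
import json
import re
from collections import Counter


def count_word_by_field(doc, word):
    """Hypothetical helper: count case-insensitive, whole-word matches of
    `word` in every string value of a parsed JSON structure, keyed by a
    JSONPath-like location of that value."""
    # Regex word boundaries approximate tokenization; a real tokenizer
    # (e.g. the one producing TOP_N_TOKENS_A) may split text differently.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    counts = Counter()

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}")
        elif isinstance(node, list):
            for i, item in enumerate(node):
                walk(item, f"{path}[{i}]")
        elif isinstance(node, str):
            hits = len(pattern.findall(node))
            if hits:
                counts[path] += hits

    walk(doc, "$")
    return counts


# Hypothetical document: "content" is an illustrative key, and "xfa_content"
# is the field named in the comment. Both hold the same text here.
sample = json.loads("""
[{"content": "The Party of the first part and the third party.",
  "xfa_content": "The Party of the first part and the third party."}]
""")
counts = count_word_by_field(sample, "party")
print(dict(counts))          # {'$[0].content': 2, '$[0].xfa_content': 2}
print(sum(counts.values()))  # 4
```

Reporting counts per path, rather than only a single total, shows which fields contribute duplicates. A count that no field-level sum explains would then point at tokenization rather than at duplicated content.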