[ https://issues.apache.org/jira/browse/TIKA-354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tyler Palsulich closed TIKA-354.
--------------------------------
    Resolution: Not a Problem

Closing this off, unless you're still interested in getting it in, [~kkrugler]. We recently had a good improvement to detection speed (TIKA-1549).

> ProfilingHandler should take a length-limiting parameter
> --------------------------------------------------------
>
>                 Key: TIKA-354
>                 URL: https://issues.apache.org/jira/browse/TIKA-354
>             Project: Tika
>          Issue Type: Improvement
>          Components: languageidentifier
>    Affects Versions: 0.5
>            Reporter: Vivek Magotra
>            Assignee: Ken Krugler
>         Attachments: TIKA-354-2.patch, TIKA-354.patch
>
>
> ProfilingHandler currently parses the entire document (thereby analyzing
> n-grams for the entire doc).
> ProfilingHandler should take a length-limiting parameter that allows a user
> to specify the amount of data that should get analyzed.
> In fact, by default that limit should be set to something like 8K.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
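
For readers unfamiliar with the request in TIKA-354, the idea of a length-limited profiler can be sketched roughly as follows. This is not Tika's ProfilingHandler and not the code in the attached patches: the class name LengthLimitedProfiler, its methods, and the choice of character trigrams are all hypothetical, and the limit is assumed to be counted in characters, since the issue only says "something like 8K". The only real API used is the JDK's org.xml.sax.helpers.DefaultHandler.

```java
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, NOT Tika's ProfilingHandler: counts character
// trigrams but stops consuming input once `limit` characters were seen.
class LengthLimitedProfiler extends DefaultHandler {

    // Default cap, assuming the "8K" from the issue means 8192 characters.
    static final int DEFAULT_LIMIT = 8 * 1024;

    private final int limit;
    private int consumed = 0;
    // Sliding window holding the last (up to) three characters.
    private final StringBuilder window = new StringBuilder(3);
    private final Map<String, Integer> counts = new HashMap<>();

    LengthLimitedProfiler() {
        this(DEFAULT_LIMIT);
    }

    LengthLimitedProfiler(int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        this.limit = limit;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Take only as many characters as the remaining budget allows.
        int take = Math.min(length, limit - consumed);
        for (int i = start; i < start + take; i++) {
            if (window.length() == 3) {
                window.deleteCharAt(0);
            }
            window.append(ch[i]);
            if (window.length() == 3) {
                counts.merge(window.toString(), 1, Integer::sum);
            }
        }
        consumed += take;
    }

    // Number of characters analyzed so far (never exceeds the limit).
    int getConsumed() {
        return consumed;
    }

    // Trigram counts gathered from the analyzed prefix.
    Map<String, Integer> getCounts() {
        return counts;
    }

    public static void main(String[] args) {
        LengthLimitedProfiler profiler = new LengthLimitedProfiler(5);
        char[] text = "abcdefgh".toCharArray();
        profiler.characters(text, 0, text.length);
        System.out.println(profiler.getConsumed() + " chars analyzed, "
                + profiler.getCounts().size() + " distinct trigrams");
    }
}
```

Once the budget is used up, further characters() calls become no-ops, so the cost of profiling is bounded by the limit rather than by the document size. Whether a fixed prefix is representative enough for language identification is a separate question that this sketch does not address.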