[ https://issues.apache.org/jira/browse/TIKA-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144652#comment-13144652 ]
Michael McCandless commented on TIKA-529:
-----------------------------------------

This patch looks safe, and avoids crazy allocations inside this detector.... can we commit it (reversing the first 2 conditions)?

> IBM420 charset detection's isLamAlef is allocation-happy
> --------------------------------------------------------
>
>                 Key: TIKA-529
>                 URL: https://issues.apache.org/jira/browse/TIKA-529
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.8
>            Reporter: Radek
>            Assignee: Ken Krugler
>            Priority: Minor
>         Attachments: isLamAlef.diff
>
>
> Two IBM420 charset detectors (rtl and ltr) run isLamAlef() for each byte of
> detection buffer.
> The code is allocating and filling a bytes array every time it runs, which
> makes it responsible for approximately 70% of all object allocations in my
> current test case (many text files).
> Since array is identical every time, and the entire thing can be achieved
> without any array, this is wasteful.
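
For illustration only, here is a minimal sketch of the kind of change the issue describes: replacing a per-call array with plain comparisons. It is not the attached isLamAlef.diff. The class name LamAlefSketch, the method name isLamAlefWithArray, and the specific byte values (0xB2-0xB5, 0xB7, 0xB8) are assumptions made for this example and should be checked against the actual detector source.

```java
public class LamAlefSketch {

    // Allocation-heavy shape described in the issue: a fresh array is
    // created and filled on every call. The byte values are ASSUMED for
    // this sketch, not taken from the issue text.
    static boolean isLamAlefWithArray(byte b) {
        byte[] shapedLamAlef = {
            (byte) 0xB2, (byte) 0xB3, (byte) 0xB4,
            (byte) 0xB5, (byte) 0xB7, (byte) 0xB8
        };
        for (byte candidate : shapedLamAlef) {
            if (b == candidate) {
                return true;
            }
        }
        return false;
    }

    // Allocation-free equivalent: the assumed values form the range
    // 0xB2..0xB8 with 0xB6 excluded, so three comparisons are enough.
    // Masking with 0xFF avoids surprises from Java's signed bytes.
    static boolean isLamAlef(byte b) {
        int u = b & 0xFF;
        return u >= 0xB2 && u <= 0xB8 && u != 0xB6;
    }

    public static void main(String[] args) {
        // Check that both variants agree on every possible byte value.
        for (int i = 0; i < 256; i++) {
            byte b = (byte) i;
            if (isLamAlefWithArray(b) != isLamAlef(b)) {
                throw new AssertionError("mismatch at 0x" + Integer.toHexString(i));
            }
        }
        System.out.println("both variants agree on all 256 byte values");
    }
}
```

Since the method runs once per byte of the detection buffer, dropping the array removes one object allocation per byte examined. The order of the comparisons does not change the result, only which condition short-circuits first.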