[ https://issues.apache.org/jira/browse/TIKA-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920085#action_12920085 ]
Radek commented on TIKA-529:
----------------------------

Actually, if you accept the patch, could you reverse the first two conditions? That way ASCII characters would immediately fail the first test (faster). I meant to code it this way but got lost in all the negative numbers involved.

> IBM420 charset detection's isLamAlef is allocation-happy
> --------------------------------------------------------
>
>                 Key: TIKA-529
>                 URL: https://issues.apache.org/jira/browse/TIKA-529
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.8
>            Reporter: Radek
>            Assignee: Ken Krugler
>            Priority: Minor
>         Attachments: isLamAlef.diff
>
>
> Two IBM420 charset detectors (rtl and ltr) run isLamAlef() for each byte of
> the detection buffer.
> The code allocates and fills a byte array every time it runs, which
> makes it responsible for approximately 70% of all object allocations in my
> current test case (many text files).
> Since the array is identical every time, and the entire thing can be achieved
> without any array, this is wasteful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
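
For illustration only, here is a minimal Java sketch of the kind of change discussed in this message. It is not the attached isLamAlef.diff. The class name LamAlefSketch, the method isLamAlefWithArray, and the specific byte values (0xB2 to 0xB5, 0xB7, 0xB8) are assumptions and are not stated anywhere in the message. The first method allocates an array on every call, as the issue describes. The second uses only comparisons and checks the upper bound first, so that ASCII bytes (non-negative as a Java byte) fail the first test.

```java
public class LamAlefSketch {

    // Allocation-heavy shape described in the issue: a new array on every call.
    // The byte values are ASSUMED for illustration; the message does not list them.
    static boolean isLamAlefWithArray(byte b) {
        byte[] lamAlefBytes = {
            (byte) 0xB2, (byte) 0xB3, (byte) 0xB4,
            (byte) 0xB5, (byte) 0xB7, (byte) 0xB8
        };
        for (byte candidate : lamAlefBytes) {
            if (b == candidate) {
                return true;
            }
        }
        return false;
    }

    // Allocation-free variant. Java bytes are signed, so (byte) 0xB8 is -72
    // and (byte) 0xB2 is -78. Testing the upper bound first means any ASCII
    // byte (0..127) fails the first comparison and short-circuits.
    static boolean isLamAlef(byte b) {
        return b <= (byte) 0xB8 && b >= (byte) 0xB2 && b != (byte) 0xB6;
    }

    public static void main(String[] args) {
        // Check that both variants agree on every possible byte value.
        for (int i = 0; i < 256; i++) {
            byte b = (byte) i;
            if (isLamAlefWithArray(b) != isLamAlef(b)) {
                throw new AssertionError("mismatch at byte value " + i);
            }
        }
        System.out.println("both variants agree on all 256 byte values");
    }
}
```

The order of the two range checks does not change the result, only how quickly the common case (plain ASCII text) is rejected.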