[ https://issues.apache.org/jira/browse/TIKA-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ken Krugler reassigned TIKA-1296:
---------------------------------

    Assignee: Ken Krugler

> Add case insensitive matching for text/html mime type
> -----------------------------------------------------
>
>                 Key: TIKA-1296
>                 URL: https://issues.apache.org/jira/browse/TIKA-1296
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.5
>            Reporter: Phil Lester
>            Assignee: Ken Krugler
>
> Currently in tika-mimetypes.xml for the mime type text/html (and possibly
> others) matches in a couple different cases are provided for the elements so
> that varying HTML writing styles are matched. As of version 1.5 of Tika the
> ability exists to make these case insensitive using the "stringignorecase"
> type. This would allow consolidation of some matches and improve detection of
> poorly-formed HTML that would be rendered by most browsers regardless of case.
> For example:
> <match value="<BODY" type="string" offset="0"/>
> <match value="<body" type="string" offset="0"/>
> could become:
> <match value="<BODY" type="stringignorecase" offset="0"/>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
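
To illustrate the consolidation proposed in the issue, here is a minimal Python sketch of the difference between a case-sensitive and a case-insensitive magic match at a fixed offset. It is not Tika's implementation: the function `magic_match` and its parameters are hypothetical names chosen for this example, and the byte strings simply mirror the `<BODY` / `<body` values quoted in the issue.

```python
def magic_match(data: bytes, value: bytes, offset: int = 0,
                ignore_case: bool = False) -> bool:
    """Return True if `value` occurs in `data` starting at `offset`.

    Hypothetical helper for illustration only; it models a "string"
    match (ignore_case=False) and a "stringignorecase" match
    (ignore_case=True) in the simplest possible way.
    """
    window = data[offset:offset + len(value)]
    if ignore_case:
        # bytes.lower() folds ASCII letters only, which is enough for tag names.
        return window.lower() == value.lower()
    return window == value


# "Before": one case-sensitive rule per spelling.
case_sensitive_rules = [b"<BODY", b"<body"]
# "After": a single case-insensitive rule.
case_insensitive_rule = b"<BODY"

for sample in (b"<BODY>", b"<body>", b"<Body>"):
    old = any(magic_match(sample, rule) for rule in case_sensitive_rules)
    new = magic_match(sample, case_insensitive_rule, ignore_case=True)
    print(sample, "old rules:", old, "new rule:", new)
```

In this sketch the mixed-case `<Body>` sample is missed by the two case-sensitive rules but caught by the single case-insensitive one, which is the detection improvement the issue describes.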