Sebastian Nagel created NUTCH-1754:
--------------------------------------

             Summary: remove BOM from extracted plain text
                 Key: NUTCH-1754
                 URL: https://issues.apache.org/jira/browse/NUTCH-1754
             Project: Nutch
          Issue Type: Bug
          Components: parser
    Affects Versions: 2.3, 1.9
            Reporter: Sebastian Nagel
            Priority: Minor
             Fix For: 2.3, 1.9

(reported by [~jlafitte], see NUTCH-1733)

Plain-text content extracted by parse-html should not contain a leading Unicode Byte Order Mark (BOM), possibly followed by whitespace characters.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
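
To illustrate the requested behaviour, here is a minimal sketch in Java. It is not the actual Nutch patch: the class name `BomStripper` and the method `stripLeadingBom` are hypothetical, and the sketch assumes the extracted text is already a decoded `String`, so the BOM appears as the single character U+FEFF. It removes one leading BOM together with any whitespace that directly follows it, and leaves text without a leading BOM untouched.

```java
public final class BomStripper {

    /** U+FEFF, the Unicode Byte Order Mark as it appears in a decoded String. */
    private static final char BOM = '\uFEFF';

    private BomStripper() {
        // utility class, not meant to be instantiated
    }

    /**
     * Hypothetical helper: drops a leading BOM and the whitespace right after it.
     * Text that does not start with a BOM is returned unchanged.
     */
    public static String stripLeadingBom(String text) {
        if (text == null || text.isEmpty() || text.charAt(0) != BOM) {
            return text;
        }
        int start = 1; // skip the BOM itself
        // Skip whitespace as defined by Character.isWhitespace
        // (this does not include the no-break space U+00A0).
        while (start < text.length() && Character.isWhitespace(text.charAt(start))) {
            start++;
        }
        return text.substring(start);
    }
}
```

A parser could apply such a helper to the extracted text just before storing it; whether leading whitespace without a BOM should also be trimmed is a separate decision that this sketch deliberately leaves alone.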