[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ]
[EMAIL PROTECTED] updated NUTCH-110:
------------------------------------

    Attachment: fixIllegalXmlChars.patch

The attached patch runs all XML text through a check for illegal XML characters. The patch is brutal: it silently drops any illegal character. I wrote it after searching Xalan, the JDK, and Nutch itself for an existing method that performs this filtering and finding none. Perhaps I overlooked one?

> OpenSearchServlet outputs illegal xml characters
> ------------------------------------------------
>
>          Key: NUTCH-110
>          URL: http://issues.apache.org/jira/browse/NUTCH-110
>      Project: Nutch
>         Type: Bug
>   Components: searcher
>     Versions: 0.7
>  Environment: linux, jdk 1.5
>     Reporter: [EMAIL PROTECTED]
>  Attachments: fixIllegalXmlChars.patch
>
> OpenSearchServlet does not check the text it outputs for illegal XML
> characters, so, depending on the search results, it can produce XML that is
> not well-formed. For example, if the text contains the form feed (FF)
> character, i.e. the ASCII character at decimal position 12, the produced XML
> will include the FF character, and that character (raw or as a character
> reference) is not legal in XML according to
> http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char.
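The contents of fixIllegalXmlChars.patch are not reproduced here. The following is only a hedged sketch of the kind of filtering described above. It assumes the check is based on the XML 1.0 `Char` production linked in the issue (`#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`). The class `XmlCharFilter` and the method `stripIllegalXmlChars` are hypothetical names; they do not come from the patch, Nutch, Xalan, or the JDK. The sketch uses only APIs available since Java 5 (`StringBuilder`, `String.codePointAt`, `StringBuilder.appendCodePoint`, `Character.charCount`), which matches the jdk 1.5 environment.

```java
/**
 * Hypothetical helper (not from the patch, Nutch, Xalan, or the JDK) that
 * silently drops every character not allowed by the XML 1.0 "Char" production.
 */
final class XmlCharFilter {

    private XmlCharFilter() {}

    /** Returns true if the code point matches the XML 1.0 Char production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the text with all illegal XML characters removed (null stays null). */
    static String stripIllegalXmlChars(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        // Walk by code point: a valid surrogate pair yields one supplementary
        // code point (kept), while an unpaired surrogate yields a value in
        // 0xD800-0xDFFF, which falls outside every legal range (dropped).
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "page\fone"; // contains FF (U+000C)
        System.out.println(stripIllegalXmlChars(dirty)); // prints "pageone"
    }
}
```

Dropping the characters matches the "brutal" behaviour described in the comment. An alternative design would replace each illegal character with U+FFFD so that the output shows where text was removed.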