[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ]

[EMAIL PROTECTED] updated NUTCH-110:
------------------------------------

    Attachment: fixIllegalXmlChars.patch

Attached patch runs all xml text through a check for bad xml characters.  This 
patch is brutal: it silently drops illegal characters.  I made the patch after 
searching xalan, the jdk, and nutch itself for a method that would do the above 
filtering, but was unable to find one -- perhaps an oversight on my part?
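
For readers without the attachment, here is a minimal sketch of the kind of
filter described above. It is not the contents of fixIllegalXmlChars.patch; the
class and method names (XmlCharFilter, isLegalXmlChar, stripIllegalXmlChars) are
hypothetical, and the ranges are taken from the Char production of the XML 1.0
recommendation.

```java
public final class XmlCharFilter {
    private XmlCharFilter() {}

    /** True if the code point matches the XML 1.0 Char production. */
    public static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the input with every illegal code point silently dropped. */
    public static String stripIllegalXmlChars(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // codePointAt yields the bare surrogate value for an unpaired
            // surrogate, which falls outside the legal ranges and is dropped.
            int cp = text.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The form feed (decimal 12) between the two words is removed.
        System.out.println(stripIllegalXmlChars("page one\fpage two"));
    }
}
```

Dropping characters keeps the output well-formed at the cost of losing data
without notice; substituting a space instead would be an equally simple
alternative if word boundaries matter.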

> OpenSearchServlet outputs illegal xml characters
> ------------------------------------------------
>
>          Key: NUTCH-110
>          URL: http://issues.apache.org/jira/browse/NUTCH-110
>      Project: Nutch
>         Type: Bug
>   Components: searcher
>     Versions: 0.7
>  Environment: linux, jdk 1.5
>     Reporter: [EMAIL PROTECTED]
>  Attachments: fixIllegalXmlChars.patch
>
> OpenSearchServlet does not check text-to-output for illegal xml characters; 
> depending on the search result, it's possible for OSS to output xml that is 
> not well-formed.  For example, if text has the FF (form feed) character in it 
> -- i.e. the ascii character at position (decimal) 12 -- the produced XML will 
> show the FF character as '&#12;'.  The character/entity '&#12;' is not legal 
> in XML according to http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char.
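
The failure reported above can be reproduced with the JDK's own parser. The
following is only a sketch under assumptions: the class name FormFeedCheck, the
helper isWellFormed, and the sample <title> element are made up for
illustration and are not taken from OpenSearchServlet.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public final class FormFeedCheck {
    private FormFeedCheck() {}

    /** Returns true if the given string parses as well-formed XML. */
    public static boolean isWellFormed(String xml) throws Exception {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            // The parser rejects the document, e.g. because of a character
            // reference to a code point outside the XML 1.0 Char production.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // A numeric character reference to the form feed (decimal 12).
        System.out.println(isWellFormed("<title>a&#12;b</title>"));
        // The same element without the offending character.
        System.out.println(isWellFormed("<title>ab</title>"));
    }
}
```

A client consuming the servlet's output would hit the same parse error, which
is why the text has to be filtered before it is written.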

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
