Hi,

 I'm not an XML expert by any means, but wouldn't it be simpler to just wrap
any text where illegal chars are possible in a <![CDATA[ ... ]]> section? That
way, the offending characters wouldn't be dropped and the process wouldn't be
lossy, no?

  If the CDATA method won't work, and there's no other way to solve the
problem without losing text, then your patch has my +1.
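
  For reference, here is a minimal sketch of the kind of filtering the patch
describes: silently dropping anything outside the Char production of the XML
1.0 spec linked in the issue. I have not compared this against the attached
patch, so treat it as an illustration only; the class and method names
(XmlCharFilter, isLegalXmlChar, stripIllegalXmlChars) are made up for this
example.

```java
// Hypothetical helper, not taken from fixIllegalXmlChars.patch.
final class XmlCharFilter {
    private XmlCharFilter() {}

    /** True if the code point matches the XML 1.0 Char production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of text with every illegal code point dropped (lossy). */
    static String stripIllegalXmlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Walk by code point so surrogate pairs are kept together;
            // an unpaired surrogate falls outside Char and is dropped.
            int cp = text.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

  With this sketch, a string containing the FF character from the bug report,
e.g. "a\fb", would come back as "ab", while tab, LF, and CR are left alone.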

Cheers,
 Chris


______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED] 
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                        Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.

> -----Original Message-----
> From: [EMAIL PROTECTED] (JIRA) [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 12, 2005 5:19 PM
> To: nutch-dev@incubator.apache.org
> Subject: [jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml
> characters
> 
>      [ http://issues.apache.org/jira/browse/NUTCH-110?page=all ]
> 
> [EMAIL PROTECTED] updated NUTCH-110:
> ------------------------------------
> 
>     Attachment: fixIllegalXmlChars.patch
> 
> Attached patch runs all xml text through a check for bad xml characters.
> The patch is brutal: it silently drops illegal characters.  It was made
> after hunting through xalan, the jdk, and nutch itself for a method that
> would do the above filtering; I was unable to find one -- perhaps an
> oversight on my part?
> 
> > OpenSearchServlet outputs illegal xml characters
> > ------------------------------------------------
> >
> >          Key: NUTCH-110
> >          URL: http://issues.apache.org/jira/browse/NUTCH-110
> >      Project: Nutch
> >         Type: Bug
> >   Components: searcher
> >     Versions: 0.7
> >  Environment: linux, jdk 1.5
> >     Reporter: [EMAIL PROTECTED]
> >  Attachments: fixIllegalXmlChars.patch
> >
> > OpenSearchServlet does not check text-to-output for illegal xml
> > characters; depending on the search result, it's possible for OSS to
> > output xml that is not well-formed.  For example, if the text contains
> > the FF (form feed) character -- i.e. the ascii character at position
> > (decimal) 12 -- the produced XML will show the FF character as '&#12;'.
> > The character/entity '&#12;' is not legal in XML according to
> > http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char.
> 
> --
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the administrators:
>    http://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see:
>    http://www.atlassian.com/software/jira
