Rafael:

It's important to realise that character entities exist only in the
serialization of XML, and that inside a Cocoon pipeline the XML is not
serialized, so the entities do not exist. Inside Cocoon these "special"
characters are no different to any other characters: as far as Cocoon is
concerned every character is just a Java char. It's only when the characters
are serialized with a particular encoding that entity references are created
at all. And when a document containing these character entities is parsed,
the entities are converted back into Java characters. Therefore they only
appear where Cocoon interfaces with something else.
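
To make that concrete, here is a small sketch using only the plain JAXP
classes that ship with the JDK (not Cocoon itself). The class name
EntityRoundTrip, the element name "w" and the sample Hebrew letter are just
things I made up for illustration. It parses a document containing a
character reference, shows that what you get in memory is an ordinary Java
char, and then serializes that same char with two different output
encodings.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; nothing here is Cocoon-specific.
public class EntityRoundTrip {

    // Parse an XML string and return the text content of its root element.
    // Any character references in the input are resolved by the parser.
    public static String parseText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Build <w>text</w> in memory and serialize it with the given encoding.
    // The serializer decides whether a character must become a reference.
    public static String serialize(String text, String encoding) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("w");
            root.setTextContent(text);
            doc.appendChild(root);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString(encoding);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // U+05E9 (Hebrew letter shin), written as a character reference.
        String parsed = parseText("<w>&#x5E9;</w>");
        System.out.println("chars after parsing: " + parsed.length());

        // US-ASCII cannot represent U+05E9, so a reference is written.
        System.out.println(serialize(parsed, "US-ASCII"));
        // UTF-8 can represent it, so the character is written directly.
        System.out.println(serialize(parsed, "UTF-8"));
    }
}
```

With the transformer bundled in the JDK I would expect the US-ASCII output
to contain a numeric reference and the UTF-8 output to contain the raw
character, though the exact form of the reference is up to the serializer.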

So problems could arise where the text is submitted to the database, or
where the text is serialised to a browser, or at some similar boundary. In
general, though, you should not have to do any translation at all, since the
XML parser and serialiser should do it for you.

Where do the entities cause problems in your system?

Con



> -----Original Message-----
> From: Rafael Alvarado [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 10 March 2004 11:54
> To: [EMAIL PROTECTED]
> Subject: RE: My XSP is entifying my UTF
>
>
> Here is my situation.  I run an etext server with documents written in
> several languages.  In creating a search interface for a collection of
> Hebrew documents, for example, I want to pull a distinct list of words
> from a db and create a set of lists for users to search with.  The values
> have to be in unicode, since they will be sent back to the database as a
> query string.  I don't want to have to translate entities back and forth
> into UTF8 -- I would rather work in UTF8 and forget entities forever.
>
> By the way, I had a similar problem with the HTML generator that uses
> Jtidy -- is this, too, the fault of Xalan?
>
>
> Rafael C. Alvarado
> Manager of Humanities Computing Research Applications
> 316 87 Prospect | Princeton University
>
>
> -----Original Message-----
> From: Joerg Heinicke [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, March 09, 2004 5:46 PM
> To: [EMAIL PROTECTED]
> Subject: Re: My XSP is entifying my UTF
>
> On 09.03.2004 23:31, Rafael Alvarado wrote:
>
> > OK, thanks for the clarification. So, then, if Xalan is the culprit,
> > can it be replaced in Cocoon? My memory says no, but I'll have to take
> > a look.  If it cannot be replaced, then I'll probably have to drop
> > Cocoon!
>
> Much to my regret! Why are the entified characters so problematic?
>
> The good news: Cocoon does not depend on Xalan, but only on a JAXP
> compatible processor. So it can be replaced. I know a few people using
> Saxon, for example. The bad news: if you use JDK 1.4, Xalan is delivered
> with the JDK, and it will be a bit more difficult to keep Cocoon from
> using it (e.g. by means of the ParanoidCocoonServlet).
>
> Joerg
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
>


