Hi,

On Thu, Aug 16, 2012 at 04:58:17AM +0100, Ben Hutchings wrote:
> Attaching an updated patch which corrects many errors in the original
> description of the bug.

One update from me.  Please also read the TODO list.

Your patch makes a very fundamental change to SDATA handling.

The XML backend I made is nothing more than a minor modification of the
original HTML backend.  I kept the HTML backend as it originally was.

 
> -- 
> Ben Hutchings
> I say we take off; nuke the site from orbit.  It's the only way to be sure.

> From b0a67f447be96d4b5df683da6f280d4b24eb3ff4 Mon Sep 17 00:00:00 2001
> From: Ben Hutchings <[email protected]>
> Date: Thu, 16 Aug 2012 04:16:54 +0100
> Subject: [PATCH] Fix mangling of '&' in URLs
> 
> SDATA entities, e.g. '&amp;' are converted on input to different
> escape sequences, e.g. '\|[amp   ]\|'.  These need to be converted
> back to literal characters or entities on output, depending on the
> format.  Currently we fail to do this because:
> 
> 1. The HTML and XML back-ends only do URL-normalization; they never
> process the SDATA escape sequences.

I trust you here.  I vaguely remember thinking something similar, but I
did not have the courage to change this fundamental feature.

> Change them to do CDATA conversion before normalizing URLs.  This is
> not theoretically correct: we should convert to literal text, then
> normalize the URL, then convert to CDATA.  However we know that '&'
> and ';' will not be URL-escaped and therefore the result should be the
> same.
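
To check the ordering argument above, here is a minimal Python sketch.
It is not the debiandoc-sgml code: the function names, the one-entry
escape table and the stand-in URL normalizer are all assumptions made
for illustration.  Only the '\|[amp   ]\|' escape is taken from the
patch description.

```python
import html
from urllib.parse import quote

# Assumed escape table; only the 'amp' entry comes from the patch text.
SDATA_ESCAPES = {r"\|[amp   ]\|": "&"}


def sdata_to_text(s):
    """Replace SDATA escape sequences with literal characters."""
    for esc, ch in SDATA_ESCAPES.items():
        s = s.replace(esc, ch)
    return s


def sdata_to_cdata(s):
    """Replace SDATA escape sequences with HTML/XML entities."""
    for esc, ch in SDATA_ESCAPES.items():
        s = s.replace(esc, html.escape(ch, quote=False))
    return s


def normalize_url(url):
    """Hypothetical stand-in for the back-end URL normalization.

    Percent-encodes unsafe characters but leaves reserved characters,
    including '&' and ';', untouched.
    """
    return quote(url, safe=":/?#[]@!$&'()*+,;=%")


def patched_order(raw):
    # Order used by the patch: CDATA conversion, then URL normalization.
    return normalize_url(sdata_to_cdata(raw))


def strict_order(raw):
    # Theoretically correct order: text, normalize, then CDATA.
    return html.escape(normalize_url(sdata_to_text(raw)), quote=False)


raw = r"http://example.org/?a=1\|[amp   ]\|b=2"
assert patched_order(raw) == strict_order(raw)
```

Under these assumptions both orders agree, because the normalizer never
touches '&' or ';'.  If it did escape either character, the two orders
would diverge.
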
> 
> 2. The driver does some basic normalization of the URL by squashing
> multiple spaces.  Since the spaces are significant in matching of the
> SDATA escape sequences, they are then not converted by any back-end.
> 
> Change it to trim leading and trailing space only; URLs should not
> normally contain any spaces anyway.

I think your assesment is sound.

Osamu

