Laurens M. Fridael <[EMAIL PROTECTED]> wrote:
> Are you still talking about the spaces issue here? Replacing literal spaces
> with "%20" does not break standards because literal spaces are illegal in
> the first place. IOW, this simple fix makes illegal URLs
> standards-compliant.
If you're doing this in the parser, as some have suggested, should you
do it before identification of attributes? Before normalising them?
After?
As I understand it, you can't do this before identification, as a space
may be used as an attribute delimiter; in some versions of HTML, an
unquoted attribute value simply ends at the first space.
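To make that concrete: a blind search-and-replace on raw markup cannot tell
a space inside a URI from a space that separates the tag name and its
attributes. A minimal Python sketch (this is not Plucker's parser code;
`blind_escape` is a hypothetical name used only for illustration):

```python
def blind_escape(markup):
    """Naive 'fix': escape every literal space in the raw markup,
    with no idea which spaces are attribute delimiters."""
    return markup.replace(" ", "%20")


raw = '<a href="my page.html" title="x">link</a>'
print(blind_escape(raw))
# <a%20href="my%20page.html"%20title="x">link</a>
```

The space inside the URI is fixed, but so are the two delimiter spaces, and
the tag is no longer a recognisable `a` element.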
You can do this after identification and before normalisation, but then you
have to handle each URI in an attribute as a special case and make sure
that you do this to every URI, and only to URIs, wherever they occur. That
sounds slow and complicated. If you cut corners by only handling some URIs,
errors can easily creep in as Plucker supports more tags. I could be wrong,
but I can't see another way to write this.
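A rough sketch of the "after identification" approach, using Python's
standard `html.parser` as a stand-in for Plucker's own parser (an
assumption). The `URI_ATTRS` table is hypothetical and deliberately
incomplete: it shows how a missing entry (here `body background`) silently
leaves a broken URI behind.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately incomplete table of (tag, attribute) pairs that
# carry URIs. Every tag the parser learns to handle would need an entry here.
URI_ATTRS = {
    ("a", "href"),
    ("img", "src"),
    ("frame", "src"),
    ("form", "action"),
    ("link", "href"),
}


def escape_spaces(uri):
    """Replace literal spaces with %20; other characters are left alone."""
    return uri.replace(" ", "%20")


class UriFixer(HTMLParser):
    """Records each start tag, escaping spaces only in known URI attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrive already split into (name, value) pairs, i.e. after
        # identification; value is None for valueless attributes.
        fixed = []
        for name, value in attrs:
            if value is not None and (tag, name) in URI_ATTRS:
                value = escape_spaces(value)
            fixed.append((name, value))
        self.tags.append((tag, fixed))


parser = UriFixer()
parser.feed('<body background="bg 1.png">'
            '<a href="my page.html" title="my page">x</a>'
            '<img src="a b.png" alt="a b"></body>')
parser.close()
for tag in parser.tags:
    print(tag)
```

Note that `title` and `alt` correctly keep their spaces, while the
`background` URI is missed because nobody added it to the table.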
You can't do this after normalisation, as you can no longer tell what was
an accidentally unencoded space.
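A small illustration of why the information is gone, again using Python's
`html.parser` as a stand-in (an assumption; `AttrCollector` and
`parse_attrs` are hypothetical helpers). Two different inputs come out of
attribute identification as exactly the same attribute list:

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Keeps the attribute list of every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs.append(attrs)


def parse_attrs(markup):
    """Return the attributes of the first start tag in markup."""
    p = AttrCollector()
    p.feed(markup)
    p.close()
    return p.attrs[0]


sloppy = parse_attrs("<a href=my page.html>")   # author meant "my page.html"
odd = parse_attrs('<a href="my" page.html>')    # a second, valueless attribute
print(sloppy)         # [('href', 'my'), ('page.html', None)]
print(sloppy == odd)  # True
```

Once the markup has been reduced to this form, nothing distinguishes the
misencoded URI from the odd-but-legal attribute.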
If someone can write the code to pick out all the URIs between
identification and normalisation, and can do it without slowing the
parser down even more, come on! Send a patch to the list!
I suspect the situation may be easier for XHTML, but do the sort of
people who misencode URIs speak XHTML yet?
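One reason XHTML might be easier: as XML, it requires every attribute value
to be quoted, so a space inside a value can only belong to that value. A
sketch using Python's standard `xml.etree.ElementTree` as a stand-in for a
strict XHTML parser (an assumption, not what Plucker uses; `href_of` is a
hypothetical helper):

```python
import xml.etree.ElementTree as ET


def href_of(markup):
    """Return the href of a single well-formed element, or None if the
    markup is not well-formed XML."""
    try:
        return ET.fromstring(markup).get("href")
    except ET.ParseError:
        return None


print(href_of('<a href="my page.html">x</a>'))  # my page.html
print(href_of("<a href=my page.html>x</a>"))    # None: unquoted value rejected
```

Here the space is unambiguous, so escaping it afterwards is safe; the
unquoted form never gets past the parser at all.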
It may also be possible to do this in some other part of the process
besides the parser, but I only know the parser, and that seems to be where
people want to put the fix. I can't see how it can go there.
> [...] a simple search-and-replace will help parsing quite a number of
> broken pages - with no side-effects. A win-win situation as far as I
> can see.
See above. Despite some claims, it's not simple to do without breaking
other things. If jpluck does a "simple search and replace" in its parser,
it's time to start praying, IMO.
--
MJR/slef My Opinion Only and possibly not of any group I know.
http://mjr.towers.org.uk/ jabber://[EMAIL PROTECTED]
Creative copyleft computing services via http://www.ttllp.co.uk/
Thought: "Changeset algebra is really difficult."
_______________________________________________
plucker-list mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list