On 9/1/07, David Chisnall <[EMAIL PROTECTED]> wrote:
> On 1 Sep 2007, at 10:59, David Chisnall wrote:
>
> > On 1 Sep 2007, at 04:56, Yen-Ju Chen wrote:
> >
> >> Currently TRXML fails on something like this:
> >> <img width=64 /> because the value is not quoted.
> >> What are the chances of fixing this in TRXML without major changes?
> >> Another approach I can think of is to pre-process such text before
> >> feeding it into the parser.
> >
> > It probably wouldn't be too hard to do. XML requires attributes to
> > be quoted, but TRXML cheats a bit and allows them to be quoted by any
> > character, so it would try to parse this as 4 quoted by 6s, and fail
> > because it can't find the closing 6.
>
> Apparently I already fixed this behaviour, and in XML mode it parses
> things correctly.
Now, it breaks on this:
<html>
<body>
<div class="articleTitleStyle">
</div>
</body></html>
>
> > The fix would be (in SGML mode)
> > to check that the quote is a quote character, and skip to the next
> > space if it isn't. I'll have a look at doing that now.
>
> I have done this. It will now parse your example correctly, but with
> one caveat:
> Things like this:
> <img size=64>
> will still break. Getting it to detect a space, a / or a > as the
> end of the attribute would require some slightly larger changes to
> the parser. Feel free to have a play.
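
For illustration, the termination rule described above (a real quote character closes at its match; anything else runs to a space, a / or a >) could be sketched as below. This is not TRXML code: TRXML's actual parser is not shown in this thread, the sketch is in Python purely for brevity, and the names read_attribute_value, QUOTES and TERMINATORS are hypothetical.

```python
# Hypothetical sketch, not TRXML's implementation.
QUOTES = ('"', "'")
TERMINATORS = (' ', '\t', '\n', '\r', '/', '>')


def read_attribute_value(text, start):
    """Read the attribute value beginning at text[start].

    Returns (value, index) where index is the position just past the
    value (and past the closing quote, if the value was quoted).
    """
    if start >= len(text):
        raise ValueError("no attribute value at end of input")
    first = text[start]
    if first in QUOTES:
        # Quoted value: it ends at the next occurrence of the same quote.
        end = text.find(first, start + 1)
        if end == -1:
            raise ValueError("unterminated quoted attribute value")
        return text[start + 1:end], end + 1
    # Unquoted value: it ends at whitespace, '/' or '>'.
    end = start
    while end < len(text) and text[end] not in TERMINATORS:
        end += 1
    return text[start:end], end
```

Note that treating / as a terminator, as suggested above, would cut an unquoted URL short, so a real fix would have to decide how to handle that case.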
I am starting to feel that it is too much to ask TRXML to support every
kind of broken XML. So I will probably try a pre-processor that makes the
XML valid first.
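
A minimal sketch of what such a pre-processor might look like, assuming Python and regular expressions rather than anything from TRXML itself. It only adds double quotes around unquoted attribute values inside start tags; it does not fix unclosed tags or any other source of invalid XML. The names quote_bare_attributes, TAG_RE and ATTR_RE are hypothetical.

```python
import re

# One start tag, stepping over quoted values so that a '>' inside
# quotes does not end the tag early.
TAG_RE = re.compile(r'''<[A-Za-z][^<>"']*(?:(?:"[^"]*"|'[^']*')[^<>"']*)*>''')

# name = value inside a tag. The value is double-quoted, single-quoted,
# or bare; a bare value may contain '/' unless it is the '/' of '/>'.
ATTR_RE = re.compile(
    r'''([^\s"'<>/=]+)\s*=\s*("[^"]*"|'[^']*'|(?:[^\s"'<>/]|/(?!>))+)'''
)


def _quote_attr(match):
    name, value = match.group(1), match.group(2)
    if value[0] in ('"', "'"):
        return match.group(0)  # already quoted, leave untouched
    return '{}="{}"'.format(name, value)


def quote_bare_attributes(markup):
    """Return markup with unquoted attribute values wrapped in double quotes."""
    return TAG_RE.sub(lambda m: ATTR_RE.sub(_quote_attr, m.group(0)), markup)


if __name__ == "__main__":
    print(quote_bare_attributes('<img width=64 />'))
    print(quote_bare_attributes('<img size=64>'))
```

Text outside tags is left alone, and a tag with an unbalanced quote is not matched at all, so it passes through unchanged rather than being rewritten incorrectly.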
Yen-Ju
>
> David
>
> _______________________________________________
> Etoile-dev mailing list
> [email protected]
> https://mail.gna.org/listinfo/etoile-dev
>