On 9/1/07, David Chisnall <[EMAIL PROTECTED]> wrote:
> On 1 Sep 2007, at 10:59, David Chisnall wrote:
>
> > On 1 Sep 2007, at 04:56, Yen-Ju Chen wrote:
> >
> >> Currently TRXML fails on something like this:
> >> <img width=64 /> because the value is not quoted.
> >> What are the chances of fixing this in TRXML without major changes?
> >> Another option I can think of is to pre-process such text before
> >> feeding it into the parser.
> >
> > It probably wouldn't be too hard to do.  XML requires attributes to
> > be quoted, but TRXML cheats a bit and allows them to be quoted by any
> > character, so it would try to parse this as 4 quoted by 6s, and fail
> > because it can't find the closing 6.
>
> Apparently I already fixed this behaviour, and in XML mode it parses
> things correctly.

Now, it breaks on this:

<html>
<body>
        <div class="articleTitleStyle">
        </div>
</body></html>


>
> > The fix would be (in SGML mode)
> > to check that the quote is a quote character, and skip to the next
> > space if it isn't.  I'll have a look at doing that now.
>
> I have done this.  It will now parse your example correctly, but with
> one caveat:
> Things like this:
> <img size=64>
> will still break.  Getting it to detect a space, a / or a > as the
> end of the attribute would require some slightly larger changes to
> the parser.  Feel free to have a play.

  I am starting to feel that it is too much to expect TRXML to support
  every kind of broken XML. So I will probably write a pre-processor
  that makes the XML valid first.
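
  A minimal sketch of what such a pre-processor could look like, in
  Python rather than in TRXML's own code. It only covers the one case
  discussed in this thread: an unquoted attribute value, which is taken
  to end at whitespace, at ">" or at the "/" of "/>". The names
  quote_bare_attributes, TAG and ATTR are made up for illustration and
  are not part of TRXML. Comments, CDATA sections and script bodies are
  not handled.

```python
import re

# An opening or self-closing tag. Quoted values may contain '>'.
TAG = re.compile(r"""<[A-Za-z](?:"[^"]*"|'[^']*'|[^<>"'])*>""")

# One attribute: whitespace, name, '=', then either a quoted value
# (group 3) or a bare value (group 4). A bare value stops at whitespace,
# at '>' or at a '/' that is immediately followed by '>'.
ATTR = re.compile(
    r"""(\s)([^\s=<>/"']+)\s*=\s*(?:("[^"]*"|'[^']*')|((?:[^\s>/"']|/(?!>))+))"""
)


def _quote_attr(match):
    space, name, quoted, bare = match.groups()
    if quoted is not None:
        # Already quoted: leave the attribute exactly as it was.
        return match.group(0)
    # A bare value cannot contain quote characters, so wrapping is safe.
    return f'{space}{name}="{bare}"'


def quote_bare_attributes(markup):
    """Return markup with unquoted attribute values wrapped in double quotes."""
    return TAG.sub(lambda tag: ATTR.sub(_quote_attr, tag.group(0)), markup)


if __name__ == "__main__":
    for sample in ("<img width=64 />", "<img size=64>",
                   '<div class="articleTitleStyle">'):
        print(quote_bare_attributes(sample))
```

  Text outside tags is left alone, so something like "a=b" in running
  text is not rewritten. The result would then be fed to the parser in
  XML mode.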

  Yen-Ju

>
> David
>
> _______________________________________________
> Etoile-dev mailing list
> [email protected]
> https://mail.gna.org/listinfo/etoile-dev
>
