On Thu, Nov 3, 2011 at 10:27 AM, Terry Brown <terry_n_br...@yahoo.com> wrote:

> Probably not relevant to Leo import/export, but JavaScript often
> contains characters that are not valid in HTML, notably < and &.

Yikes.
>
> Which raises the question of CDATA:
> http://www.w3schools.com/xml/xml_cdata.asp
> Strictly speaking, Leo should not parse anything in a CDATA block.

Ok.  I'll put this on the list for someday.
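
For reference, a minimal sketch of what "not parsing" a CDATA block
means, using Python's standard xml.etree.ElementTree rather than
anything in Leo's importer. The <script> element and its contents are
made up for the illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet: JavaScript containing < and && inside CDATA.
doc = "<script><![CDATA[if (a < b && c) { run(); }]]></script>"

root = ET.fromstring(doc)

# The parser hands back the CDATA contents as plain character data;
# the < and & inside it are never treated as markup.
cdata_text = root.text
child_count = len(root)
```

Without the CDATA wrapper the same string is not well-formed XML, which
is why a scanner that looks for tags has to skip such blocks entirely.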

> I wasn't suggesting using ElementTree / lxml for parsing, just that
> the .text and .tail attribute model allows complete representation of
> HTML's "pernicious mixed content".
> http://www.thaiopensource.com/relaxng/design.html#section:11

Thanks for this.
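
A small sketch of the .text / .tail model Terry describes, using the
standard-library ElementTree. The sample paragraph and the flatten()
helper are my own, purely for illustration:

```python
import xml.etree.ElementTree as ET

# Mixed content: text interleaved with child elements.
root = ET.fromstring("<p>Hello <b>bold</b> and <i>italic</i> text.</p>")

def flatten(elem):
    """Rebuild the character data of elem from .text and .tail."""
    parts = [elem.text or ""]              # text before the first child
    for child in elem:
        parts.append(flatten(child))       # text inside the child
        parts.append(child.tail or "")     # text after the child's end tag
    return "".join(parts)

bold, italic = root[0], root[1]
recovered = flatten(root)
```

Here root.text is "Hello ", bold.tail is " and " and italic.tail is
" text.", so every run of text, whitespace included, has exactly one
home in the tree.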

In any event, my initial enthusiasm for the scanner-based approach was
unfounded.  I had forgotten to remove the code that completely ignores
whitespace.  When I did so, the original whitespace failure
reappeared!  [Sounds of teeth gnashing.]

This is a really nasty problem.  Somehow the HTML code generator must
find a way around it.  Either that, or pretend that the importer is
allowed to insert whitespace in some cases.
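
To make the second option concrete, here is one possible shape for a
more lenient comparison.  This is only a sketch under my own
assumptions, not Leo's actual import check: normalize_ws() and
same_modulo_whitespace() are hypothetical names, and the rule
(whitespace between two tags is optional, other runs collapse to one
space) would be wrong inside <pre> and similar elements.

```python
import re

def normalize_ws(s):
    """Normalize whitespace before comparing (hypothetical helper)."""
    # Whitespace sitting directly between two tags is treated as optional.
    s = re.sub(r">\s+<", "><", s)
    # Any other run of whitespace counts as a single space.
    return re.sub(r"\s+", " ", s).strip()

def same_modulo_whitespace(original, generated):
    """True if the two texts differ only in tolerated whitespace."""
    return normalize_ws(original) == normalize_ws(generated)

original = "<ul><li>a b</li></ul>"
generated = "<ul>\n  <li>a b</li>\n</ul>\n"
```

With these inputs the check accepts the re-indented output, but it
still rejects a change inside the text itself, such as "a b" becoming
"ab".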

In short, the present HTML importer is being overly persnickety.  I
don't know how to cure that problem without gutting a significant part
of the import check...

Edward
