Re: [uf-discuss] stats on well formed XHTML

ryan Wed, 16 Jan 2008 15:13:38 -0800

On Jan 16, 2008, at 12:41 AM, Kevin Burton wrote:

Has anyone done any large scale audits of XHTML in the wild to
determine the percentage that parse correctly?

Yes, Ian Hickson at Google did a survey of about 1B pages and found that over 90% had *well-formedness* errors. I can't find a reference offhand, but it may be buried somewhere in [#webstats].

I'm thinking about deploying one in Spinn3r but I'd rather focus on
other tasks if this has already been done.


I'd suggest working on other tasks. :)

I'm curious about the assumptions one could make when assuming that
XHTML is well formed.


You know what they say about assumptions.

Specifically, the probability that a naive non-XML parser can make
while indexing the content.

I'm not sure what you mean here, but I'd recommend against using an XML parser against web content and instead use something like the HTML5 parsing algorithm [#html5-parsing].
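
To illustrate the difference, here is a minimal sketch in Python using only the standard library. It assumes a made-up page fragment (TAG_SOUP) and hypothetical helper names (is_well_formed, TextCollector, extract_text). Note that html.parser is only a tolerant tokenizer, not an implementation of the HTML5 parsing algorithm; a third-party library such as html5lib would be closer to that. The point is just that a strict XML parser gives up on the first well-formedness error, while a tolerant parser still gets at the content.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TextCollector(HTMLParser):
    """Tolerant tokenizer that collects text and keeps going on tag soup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only non-empty text runs, with surrounding whitespace removed.
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(markup: str) -> list:
    """Return the text runs found in the markup, in document order."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return collector.chunks


# Hypothetical fragment: mis-nested <b>/<p> and an unclosed <p>.
TAG_SOUP = "<html><body><p>Hello <b>world</p></b><p>again</body></html>"

if __name__ == "__main__":
    print(is_well_formed(TAG_SOUP))  # strict XML parsing rejects the fragment
    print(extract_text(TAG_SOUP))    # the tolerant parser still yields the text
```

An indexer built on the strict check alone would throw away any page like this one, which is why the tolerant route is the safer default for content from the open web.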


-ryan

[webstats]: http://code.google.com/webstats/
[html5-parsing]: http://whatwg.org/specs/web-apps/current-work/#parsing
_______________________________________________
microformats-discuss mailing list
microformats-discuss@microformats.org
http://microformats.org/mailman/listinfo/microformats-discuss
