On 2017-08-13 19:51, Adam D. Ruppe wrote:
On Sunday, 13 August 2017 at 15:54:45 UTC, Faux Amis wrote:
Just curious, but is there a spec of sorts which defines which errors should be fixed and such?

The HTML5 spec describes how you are supposed to parse various things, including the recovery paths for broken markup.

My module, however, isn't so formal. I just used it for a web-scraping project at work that hit a few hundred sites, and fixed bugs as they came up until it gave good enough results for me... (One thing I found is that a lot of sites claiming to be UTF-8 are actually Latin-1, so it validates the input and falls back to handle that. My HTTP module, while buggier, is similar: I once hit a server that ignored the Accept-Encoding header and always sent gzip anyway, so I had to handle that... and I noticed curl actually didn't!)

So on the one hand, there are surely still bugs and weird cases, but on the other hand, it did get a fair chunk of real-world use, so I am fairly confident it will be OK for most things.
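For illustration, here is a minimal sketch of the two fallbacks described above: sniffing gzip regardless of what was negotiated, and validating "UTF-8" before trusting it. It is written in Python rather than D and assumes the raw response body is already available as bytes. It is not the module's actual code, and `decode_body` is a hypothetical name.

```python
import gzip


def decode_body(raw: bytes) -> str:
    """Hypothetical helper: decode an HTTP body defensively (sketch only)."""
    # Some servers send gzip even when the client never asked for it.
    # A gzip stream always starts with the magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    try:
        # Trust a declared UTF-8 charset only if the bytes actually validate.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail.
        return raw.decode("latin-1")
```

For example, `decode_body(b"caf\xe9")` is not valid UTF-8, so it falls back and returns `"café"`. A browser-compatible version would probably decode the fallback as Windows-1252, since that is what the WHATWG Encoding Standard maps the "latin1" label to.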


Sounds good!
(Although following the spec would be the first step toward a D HTML layout engine :D )
