On Fri, Jun 1, 2012 at 5:57 PM, Tjerk Meesters <datib...@hotmail.com> wrote:

> Gentlemen,
>
> Regarding this bug report: https://bugs.php.net/bug.php?id=49705
>
> As more developers move away from using regular expressions to parse
> HTML and start using DOMDocument, I've noticed that quite a few
> stumble over encoding "issues". They're not bugs, because it's
> documented (I think) that if a document is loaded using
> ::loadHTMLFile() or if it contains a "content-type" meta tag which
> specifies the character encoding it will work as expected.
>
> So far I've suggested a hack that involves adding the meta-tag in
> front of the string that contains the HTML. As horrible as it seems,
> that does the job!
>
> That said, I'm hoping to get enough internals support to add a
> parameter to ::loadHTML() that sets / overrides the default character
> set when processing the document; when given, any <meta> tags
> pertaining to character set encoding should be ignored (AFAIK that's
> also the browser's behavior).
>
> Btw, there's another patch that also introduces a new parameter to
> ::loadHTML() which has gone into the 5.4 branch
> (https://bugs.php.net/bug.php?id=54037), so it looks like this would
> be the second (optional) parameter then.
>
> Thoughts?
>
>
would be nice.
bump.
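
For anyone landing on this thread, the meta-tag hack mentioned above can be sketched roughly as follows. This is only an illustration of the workaround, not the proposed API: the helper name loadHtmlWithCharset() and its $charset parameter are made up for this example, and it assumes the input string really is encoded in the charset you pass.

```php
<?php
// Hypothetical helper (not part of DOMDocument): prepend a content-type
// meta tag so the HTML parser picks up the intended encoding.
function loadHtmlWithCharset(string $html, string $charset = 'UTF-8'): DOMDocument
{
    // $charset is interpolated as-is; only pass trusted values here.
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset='
          . $charset . '">';

    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($meta . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// Usage: a UTF-8 fragment without any encoding declaration of its own.
$doc = loadHtmlWithCharset('<p>Ferenc Kovács</p>');
echo $doc->getElementsByTagName('p')->item(0)->textContent, "\n";
```

Note that the injected <meta> element stays in the resulting DOM, which is part of why a proper parameter on ::loadHTML() would be nicer than this.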


-- 
Ferenc Kovács
@Tyr43l - http://tyrael.hu
