On Fri, Jun 1, 2012 at 5:57 PM, Tjerk Meesters <datib...@hotmail.com> wrote:
> Gentlemen,
>
> Regarding this bug report: https://bugs.php.net/bug.php?id=49705
>
> As more developers move away from using regular expressions to parse
> HTML and start using DOMDocument, I've noticed that quite a few
> stumble over encoding "issues". They're not bugs, because it's
> documented (I think) that if a document is loaded using
> ::loadHTMLFile() or if it contains a "content-type" meta tag which
> specifies the character encoding, it will work as expected.
>
> So far I've suggested a hack that involves adding the meta tag in
> front of the string that contains the HTML. As horrible as it seems,
> that does the job!
>
> That said, I'm hoping to get enough internals support to add a
> parameter to ::loadHTML() that sets / overrides the default character
> set when processing the document; when given, any <meta> tags
> pertaining to character set encoding should be ignored (AFAIK that's
> also the browser's behavior).
>
> Btw, there's another patch that also introduces a new parameter to
> ::loadHTML() which has gone into the 5.4 branch
> (https://bugs.php.net/bug.php?id=54037), so it looks like this would
> be the second (optional) parameter then.
>
> Thoughts?

would be nice.
bump.

-- 
Ferenc Kovács
@Tyr43l - http://tyrael.hu
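For reference, the meta tag workaround from the quoted message might look roughly like the sketch below. It assumes the input string is UTF-8 and that the libxml2 build behind DOMDocument honours an http-equiv Content-Type meta tag (otherwise libxml2's HTML parser falls back to ISO-8859-1). The helper name load_html_with_charset() is hypothetical and is not part of PHP's DOM API; it is not the proposed new ::loadHTML() parameter either.

```php
<?php
// Hypothetical helper illustrating the "prepend a meta tag" hack.
// Not part of PHP's DOM extension; name and signature are assumptions.
function load_html_with_charset(string $html, string $charset = 'UTF-8'): DOMDocument
{
    $doc = new DOMDocument();
    // A leading Content-Type meta tag tells libxml2's HTML parser which
    // encoding to use for the string, instead of its ISO-8859-1 default.
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset='
        . htmlspecialchars($charset, ENT_QUOTES) . '">';
    $doc->loadHTML($meta . $html);
    return $doc;
}

// Usage: non-ASCII text should survive the round trip intact.
$doc = load_html_with_charset('<p>naïve café</p>');
echo $doc->getElementsByTagName('p')->item(0)->textContent, "\n";
```

The injected meta tag ends up in the parsed document's head, which is part of why the hack feels "horrible"; a dedicated charset parameter, as proposed, would avoid modifying the input at all.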