Re: [PHP-DEV] domdocument loadhtml and encoding

2012-08-07 Thread Ferenc Kovacs
On Fri, Jun 1, 2012 at 5:57 PM, Tjerk Meesters datib...@hotmail.com wrote:

would be nice.
bump.


-- 
Ferenc Kovács
@Tyr43l - http://tyrael.hu


[PHP-DEV] domdocument loadhtml and encoding

2012-06-01 Thread Tjerk Meesters
Gentlemen,

Regarding this bug report: https://bugs.php.net/bug.php?id=49705

As more developers move away from using regular expressions to parse
HTML and start using DOMDocument, I've noticed that quite a few
stumble over encoding issues. They're not bugs, because it's
documented (I think) that if a document is loaded using
::loadHTMLFile(), or if it contains a content-type meta tag which
specifies the character encoding, it will work as expected.

So far I've suggested a hack that involves adding the meta-tag in
front of the string that contains the HTML. As horrible as it seems,
that does the job!
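For illustration, a minimal PHP sketch of that meta-tag workaround. It
assumes the HTML string holds UTF-8 bytes and carries no charset
declaration of its own; what the parser falls back to without the hint
depends on the libxml2 version PHP is linked against.

```php
<?php
// Sketch of the "prepend a meta tag" workaround.
// Assumption: $html is UTF-8 and has no charset declaration of its own.
$html = "<p>Caf\u{E9}</p>"; // "\u{E9}" yields the UTF-8 bytes for "é" (PHP 7+)

// Without a charset hint, the libxml2 releases of this era fall back to
// ISO-8859-1, so the two UTF-8 bytes of "é" are read as two characters.
$plain = new DOMDocument();
$plain->loadHTML($html);
$mangled = $plain->getElementsByTagName('p')->item(0)->textContent;

// Prepending a Content-Type meta tag tells the parser the input is UTF-8.
$meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
$fixed = new DOMDocument();
$fixed->loadHTML($meta . $html);
$text = $fixed->getElementsByTagName('p')->item(0)->textContent;

// textContent is always returned as UTF-8, whatever the input encoding was.
var_dump($mangled, $text);
```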

That said, I'm hoping to get enough internals support to add a
parameter to ::loadHTML() that sets / overrides the default character
set when processing the document; when given, any meta tags
pertaining to character set encoding should be ignored (AFAIK that's
also the browser's behavior).
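Until such a parameter exists, a rough userland approximation of the
proposed behaviour might look like the sketch below. The function name
loadHTMLWithCharset() is hypothetical, not part of PHP. The sketch
assumes the iconv extension is available. It also relies on the
assumption that libxml2 honours only the first charset declaration it
encounters, so a prepended UTF-8 meta tag wins over any charset meta
tag inside the document.

```php
<?php
// Hypothetical userland stand-in for the proposed ::loadHTML() charset
// parameter; loadHTMLWithCharset() is an illustrative name, not a PHP API.
// Assumptions: the iconv extension is loaded, and libxml2 uses only the
// first charset declaration it sees, so later meta charset tags are ignored.
function loadHTMLWithCharset(DOMDocument $doc, string $html, string $charset): bool
{
    // Re-encode the raw bytes from the caller-supplied charset to UTF-8.
    $utf8 = iconv($charset, 'UTF-8', $html);
    if ($utf8 === false) {
        return false; // input was not valid in the declared charset
    }

    // Declare UTF-8 first so it takes precedence over any meta tag in $utf8.
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
    return $doc->loadHTML($meta . $utf8);
}

// ISO-8859-1 bytes ("Caf\xE9") behind a meta tag that wrongly claims windows-1251.
$html = '<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">'
      . "<p>Caf\xE9</p>";

$doc = new DOMDocument();
loadHTMLWithCharset($doc, $html, 'ISO-8859-1');
echo $doc->getElementsByTagName('p')->item(0)->textContent, "\n";
```

Converting to UTF-8 before parsing means the parser only ever has to be
told about one encoding, whatever the caller passed in.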

Btw, there's another patch that also introduces a new parameter to
::loadHTML() which has gone into the 5.4 branch
(https://bugs.php.net/bug.php?id=54037), so it looks like this would
be the second (optional) parameter then.

Thoughts?

-- 
Tjerk

-- 
PHP Internals - PHP Runtime Development Mailing List