Hello Rob,
Thank you for your advice... :-)
> Is there a meta tag that specifies the encoding?
> When loading HTML that is also used to determine the encoding.
> I think I need to clarify the encoding issue:
> I'll bet when the document is loading, the encoding is being properly
> detected. When working with the elements however you are getting
> hung up on the UTF-8 factor....
> you probably do something like the following:
> $myelement = $doc->getElementById('someid');
> print $myelement->textContent;
> That right there will output the textual content in UTF-8
> (the garbled characters). It does not take into consideration the
> encoding used in the original document. This is just how the xml
> functions work. Now...
> You really need to do something like:
> $text = $myelement->textContent;
> print iconv("UTF-8", <output encoding>, $text);
> If the encoding is in the meta tag, typically encountered as:
> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
> If you add the content to a dom node, you do not change the encoding
> since the functions all work on UTF-8. The document to which
> the content is being added however, must be set to use the desired
> encoding. I am assuming you are doing what I previously
> explained though.
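
To make the quoted advice concrete, here is a minimal sketch of reading a node's text and converting it for output. The sample markup in $html and the ISO-8859-1 target encoding are assumptions chosen for illustration; only the id 'someid' comes from the quoted example.

```php
<?php
// Hypothetical sample page. The bytes \xC3\xA4 are "ä" encoded as UTF-8,
// matching the charset declared in the meta tag.
$html = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
      . '</head><body><p id="someid">' . "\xC3\xA4" . '</p></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// DOM strings are always handed back as UTF-8, whatever the source encoding was.
$text = $doc->getElementById('someid')->textContent;

// Convert only at the point of output, and only if the page being served
// is not UTF-8 itself (ISO-8859-1 is an assumed target here).
$latin1 = iconv('UTF-8', 'ISO-8859-1', $text);

print bin2hex($text) . "\n";    // UTF-8 bytes of the node text
print bin2hex($latin1) . "\n";  // the same text as ISO-8859-1 bytes
```

If the page that is finally sent to the browser is itself declared as UTF-8, the iconv() step is unnecessary and $text can be printed directly.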
I tried the following:
I downloaded the ominous HTML page, saved it as UTF-8 (using a text editor option)
and added a meta tag declaring UTF-8 encoding:
<meta http-equiv = 'content-type' content = 'text/html; charset=UTF-8'>
I checked that the special characters were written correctly then (actually I
had to correct them).
Then I used the function:
$doc = new DomDocument('1.0', 'UTF-8');
The result is still the same: special characters are displayed wrongly. Wrong in
a different way than before :-) but still wrong... ("ä" is now "ä").
I tried the same with "ISO-8859-1", but it doesn't get any better...
So, in conclusion, even converting the whole document to UTF-8 and adding a UTF-8
charset declaration to it doesn't help me handle special characters...
And what about the img tags, which are converted into some kind of invisible
characters (they look like empty spaces in the source code)...?
Thank you very much for your help!
LS
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php