Ryan S wrote:


$website_data = file_get_contents('dom_test.html');//load the website data,

$dom = new DomDocument; //make a new DOM container in PHP
$dom->loadHTML($website_data);  //load all the fetched data into the DOM container

I'm not sure what the answer to your issue is, but mind if I make a couple of off-topic recommendations?

1) Use loadXML() instead of loadHTML()

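The reason is that loadHTML() can mangle multibyte UTF-8 characters: unless the document declares its charset, the HTML parser assumes ISO-8859-1, so those characters come out garbled or get replaced with entities.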

You can still use $dom->saveHTML() to present the data if HTML is your target output.
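A minimal sketch of point 1, assuming the markup is already well-formed: the string $markup is a made-up stand-in for the contents of dom_test.html. loadXML() honours the declared UTF-8 encoding, so the multibyte text comes back intact, and saveHTML() still serialises the tree as HTML.

```php
<?php
// Sketch: load UTF-8 markup with loadXML() and serialise it with saveHTML().
// $markup is a hypothetical stand-in for the contents of dom_test.html.
$markup = '<?xml version="1.0" encoding="UTF-8"?>'
        . '<html><body><p>Grüße, 日本語</p></body></html>';

$dom = new DOMDocument();
$ok  = $dom->loadXML($markup); // the XML parser honours the declared UTF-8 encoding

// Read the paragraph text back out; the multibyte characters are preserved.
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
echo $text, "\n";

// saveHTML() still produces HTML output from the XML-loaded tree.
echo $dom->saveHTML();
```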

2) loadXML() is less forgiving of malformed content, but you can work around that by running your data through tidy first:

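$website_data = new tidy('dom_test.html', $tidy_config); // parse the file with tidy
$website_data->cleanRepair();                            // repair the markup in place
$dom->loadXML((string) $website_data);                   // load tidy's cleaned output as a string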

where $tidy_config is the tidy configuration array. Make sure you set

$tidy_config['output-xhtml'] = true;

so that tidy outputs well-formed XHTML that loadXML() can parse.
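Putting point 2 together, here is a sketch of the tidy-then-loadXML() route, assuming the tidy extension is installed. It uses tidy::parseString() on a made-up malformed snippet instead of reading dom_test.html, and it sets one extra tidy option not mentioned above, numeric-entities, so that named entities such as &nbsp; are written as numeric references; without the XHTML DTD, loadXML() may warn about or reject named HTML entities.

```php
<?php
// Sketch of the tidy -> loadXML() pipeline; requires the tidy extension.
// $malformed is a hypothetical stand-in for dom_test.html: unclosed <p>, bare <br>, &nbsp;.
$malformed = '<p>Grüße<br>unclosed paragraph<p>second&nbsp;paragraph';

$tidy_config = [
    'output-xhtml'     => true, // emit well-formed XHTML that loadXML() accepts
    'numeric-entities' => true, // extra option (assumption): &nbsp; becomes &#160;
];

$tidy = new tidy();
$tidy->parseString($malformed, $tidy_config, 'utf8'); // parse the input as UTF-8
$tidy->cleanRepair();                                   // close tags, add html/head/body

$dom = new DOMDocument();
$ok  = $dom->loadXML((string) $tidy);                   // the tidy object casts to its cleaned output

$paragraphs = $dom->getElementsByTagName('p');
echo $paragraphs->length, " paragraphs\n";
echo trim($paragraphs->item(0)->textContent), "\n";
```

If the tidy extension is missing, `new tidy()` fails with a class-not-found error, so check extension_loaded('tidy') first when in doubt.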

--
PHP General Mailing List (http://www.php.net/)