On 1/30/10 1:35 AM, Mari Masuda wrote:
> Hello,
> 
> I have a function that uses tidy to attempt to clean up a bunch of crappy 
> HTML that I inherited.  In order to use tidy, I write the crappy HTML to a 
> temporary file on disk, run tidy, and extract and return the clean(er) HTML.  
> The program itself works fine but with all of the disk access, it runs quite 
> slowly.  I saw on this page (http://www.php.net/manual/en/wrappers.php.php) 
> that I could write to memory by using php://memory.  Unfortunately, I could 
> not quite get it to work.  The problem is that in the below function, the 
> code within the [[[if (file_exists($dirty_file_path))]]] does not get run if 
> I change [[[$dirty_file_path]]] to "php://memory".  Has anyone ever 
> successfully used php://memory before?  If so, what can I do to use it in my 
> code?  Thank you.

Does it matter that it runs slowly? If this is a one-off cleanup, run it once
and be done with it. Alternatively, use the PHP tidy extension and avoid the
file system and shelling out altogether.
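
A minimal sketch of the tidy-extension route, assuming ext/tidy is installed and that its configuration keys correspond to the command-line flags in the quoted exec() call. The function name cleanUpHtmlInProcess() is made up for illustration, and the option mapping is a best guess that should be checked against the installed libtidy version:

```php
<?php
// Sketch only: requires the tidy extension (ext/tidy).
// cleanUpHtmlInProcess() is a hypothetical name, not from the original post.
function cleanUpHtmlInProcess($dirty_html, $enclose_text = true)
{
    // Assumed equivalents of the command-line flags in the quoted exec() call.
    // (-m, -i and -q only affect in-place editing, indentation and console
    // output, so they are left out here.)
    $config = array(
        'wrap'                        => 0,
        'output-xhtml'                => true,
        'doctype'                     => 'strict',
        'preserve-entities'           => true,
        'css-prefix'                  => 'tidy',
        'tidy-mark'                   => false,
        'drop-proprietary-attributes' => true,
        'fix-uri'                     => true,
        'enclose-text'                => (bool) $enclose_text,
    );

    // The third argument is the character encoding; tidy spells it "utf8".
    $tidied_html = tidy_repair_string($dirty_html, $config, 'utf8');
    if (!is_string($tidied_html)) {
        return '';
    }

    // Pull out whatever sits between <body ...> and </body>.
    if (preg_match('~<body[^>]*>(.*)</body>~is', $tidied_html, $matches)) {
        return trim($matches[1]);
    }
    return '';
}
```

Nothing is written to disk and no external process is started, so both suspected sources of slowness go away at once.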

Actually, I'd imagine shelling out from a web server process is the bottleneck,
not saving to and reading from the file system.

Lastly, I don't suppose you've heard of /dev/shm?
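
For illustration, a sketch of the /dev/shm idea: on most Linux systems that directory is a RAM-backed tmpfs, so a file created there never touches the disk but is still a real path that an external tidy process can open. The helper name makeScratchFile() is hypothetical, and the fallback to the system temp directory is an assumption for hosts without /dev/shm:

```php
<?php
// makeScratchFile() is a hypothetical helper, not part of the original code.
function makeScratchFile($contents)
{
    // Prefer the RAM-backed /dev/shm when it exists and is writable;
    // otherwise fall back to the regular temp directory.
    $dir = (is_dir('/dev/shm') && is_writable('/dev/shm'))
        ? '/dev/shm'
        : sys_get_temp_dir();

    // tempnam() creates a uniquely named empty file and returns its path.
    $path = tempnam($dir, 'dirty');
    if ($path === false || file_put_contents($path, $contents) === false) {
        return false;
    }
    return $path;
}
```

The returned path could then be used in place of $dirty_file_path in the quoted function, with unlink() called afterwards as before.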

And, er, no, I don't have experience with php://memory, but you might try
searching for other people's code:

http://www.google.com/codesearch?q=php%3A%2F%2Fmemory&hl=en&btnG=Search+Code
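
As a hedged illustration of why the quoted function stops working with "php://memory": it is a stream that lives inside one open handle, not a path on the file system. The sketch below (variable names are illustrative only) shows that the data can be read back through the same handle, that file_exists() reports false for it, and that a second fopen() gets a fresh, empty buffer. An external program such as tidy has no way to open it either.

```php
<?php
// Write to an in-memory stream and read it back through the SAME handle.
$handle = fopen('php://memory', 'w+');
fwrite($handle, '<p>some markup');
rewind($handle);
$round_trip = stream_get_contents($handle);

// The php:// wrapper has no stat support, so this is false
// and the if (file_exists(...)) branch is skipped.
$visible_as_file = file_exists('php://memory');

// Every fopen() of php://memory creates a new, independent buffer.
$other = fopen('php://memory', 'w+');
$other_contents = stream_get_contents($other);

fclose($handle);
fclose($other);
```

So php://memory is useful for PHP functions that accept a stream handle, but not as a drop-in replacement for a file name passed to exec().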

> //==========================================================
> function cleanUpHtml($dirty_html, $enclose_text=true) {
> 
>       $parent_dir = "/filesWrittenFromPHP/";
>       $now = time();
>       $random = rand();
>       
>       //save dirty html to a file so tidy can process it
>       $dirty_file_path = $parent_dir . "dirty" . $now . "-" . $random . ".txt";
>       $dirty_handle = fopen($dirty_file_path, "w");
>       fwrite($dirty_handle, $dirty_html);
>       fclose($dirty_handle);
> 
>       $cleaned_html = "";
>       $start = 0;
>       $end = 0;
> 
>       if (file_exists($dirty_file_path)) {
>               exec("/usr/local/bin/tidy -miq -wrap 0 -asxhtml --doctype strict"
>                   . " --preserve-entities yes --css-prefix \"tidy\" --tidy-mark no"
>                   . " --char-encoding utf8 --drop-proprietary-attributes yes --fix-uri yes "
>                   . ($enclose_text ? "--enclose-text yes " : "") . $dirty_file_path
>                   . " 2> /dev/null");
> 
>               $tidied_html = file_get_contents($dirty_file_path);
>               
>               $start = strpos($tidied_html, "<body>") + 6;
>               $end = strpos($tidied_html, "</body>") - 1;
>               
>               $cleaned_html = trim(substr($tidied_html, $start, ($end - $start)));
>       }
>       
>       unlink($dirty_file_path);
> 
> 
>       return $cleaned_html;
> }
> //==========================================================
> 
> 


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
