Hi,

I need to do a simple thing: read a binary file
(e.g., a Microsoft Word or Excel document) and
extract only the text from it. I am using fopen()
and fread(), and when I print the contents of the
file I do get the text, but along with it there is
some junk, presumably because the file is binary.
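A minimal sketch of the reading step in PHP, assuming PHP 7 or later (for the type declarations); `read_binary_file` is a hypothetical helper name. Opening with mode `'rb'` keeps the bytes untranslated on Windows. The "junk" itself is expected: Word and Excel files store formatting and structure as binary data alongside the text, so a filtering step is still needed afterwards.

```php
<?php
// Hypothetical helper: read an entire file as raw bytes.
// 'rb' = read-only, binary mode (no newline translation on Windows).
function read_binary_file(string $path): string
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $data = '';
    // Read in 8 KB chunks until end of file.
    while (!feof($fh)) {
        $chunk = fread($fh, 8192);
        if ($chunk === false) {
            break;
        }
        $data .= $chunk;
    }
    fclose($fh);
    return $data;
}
```

For a whole file, `file_get_contents($path)` does the same in a single call.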

Is it possible, with a regular expression, to keep
only certain ASCII characters from the binary
stream? Here is the Perl equivalent:

    /([\040-\176\s]{3,})/g

I want only runs of at least 3 characters, where each
character is either whitespace or in the range octal
040 to 176 (decimal 32 to 126, i.e. printable ASCII).
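A sketch of the same pattern in PHP, using `preg_match_all()`. PHP's PCRE functions have no `/g` modifier (passing one triggers an "Unknown modifier" warning); `preg_match_all()` already returns every non-overlapping match. The function name `extract_strings` and the `$minLen` parameter are hypothetical.

```php
<?php
// Hypothetical helper: extract runs of printable ASCII / whitespace.
// Port of the Perl pattern /([\040-\176\s]{3,})/g.
// \040 and \176 are OCTAL escapes: decimal 32 (space) to 126 ('~').
function extract_strings(string $data, int $minLen = 3): array
{
    $pattern = '/[\040-\176\s]{' . $minLen . ',}/';
    // preg_match_all() finds all matches; $matches[0] holds the full matches.
    if (preg_match_all($pattern, $data, $matches) === false) {
        return [];
    }
    return $matches[0];
}
```

For example, `extract_strings("\x00\x01Hello world\xFF\xFEab\x00Excel\x07")` returns `['Hello world', 'Excel']`; the two-character run `ab` is dropped. Note that Word and Excel may store some text as UTF-16 (a null byte after each ASCII character), which this byte-oriented filter will not pick up.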

I would really appreciate any help.

Thanks!





-- 
PHP General Mailing List (http://www.php.net/)
