Hi,

I am using fopen to open a Word document, loading its contents into a variable and then using substr_count to count the number of times a certain string occurs. This lets me search through the file and report how many times a word appears, and I can even use str_replace to highlight certain words. However, Microsoft Word seems to put a lot of rubbish in the header and footer of the file. Is it possible to filter this rubbish out to get just the text of the document?
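A minimal sketch of the count-and-highlight approach, with a crude "rubbish" filter in front of it. It assumes the older binary .doc format with body text stored as single-byte characters; it is a heuristic, not a Word parser. It keeps only runs of printable ASCII bytes, so formatting junk is dropped, but if a document stores its text as 16-bit Unicode (every other byte \x00) this filter will find little or nothing. The function names are hypothetical. Highlighting uses preg_split instead of str_replace so that matches keep their original case and the surrounding text can be HTML-escaped safely.

```php
<?php
// Heuristic sketch, assuming a binary .doc with single-byte text: keep runs
// of 4+ printable ASCII bytes and discard everything else.

function extract_printable_text(string $raw, int $minRun = 4): string
{
    // Runs of printable ASCII (space through tilde) at least $minRun bytes long.
    preg_match_all('/[\x20-\x7E]{' . $minRun . ',}/', $raw, $matches);
    return implode(' ', $matches[0]);
}

function count_term(string $text, string $term): int
{
    if ($term === '') {
        return 0; // substr_count() rejects an empty needle
    }
    // Lower-case both sides, because substr_count() is case-sensitive.
    return substr_count(strtolower($text), strtolower($term));
}

function highlight_term(string $text, string $term): string
{
    if ($term === '') {
        return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE);
    }
    // Split around case-insensitive matches, keeping the matches themselves.
    $parts = preg_split('/(' . preg_quote($term, '/') . ')/i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $html = '';
    foreach ($parts as $i => $part) {
        // Odd indexes are the captured matches; every piece is HTML-escaped.
        $escaped = htmlspecialchars($part, ENT_QUOTES | ENT_SUBSTITUTE);
        $html .= ($i % 2 === 1) ? '<b>' . $escaped . '</b>' : $escaped;
    }
    return $html;
}

// Demo on an in-memory string standing in for file_get_contents('some.doc').
$raw  = "\x00\x01\xFFThe invoice total\x00\x02\x03 see Invoice 42\x00\x7F";
$text = extract_printable_text($raw);
echo count_term($text, 'invoice'), "\n"; // prints 2
echo highlight_term($text, 'invoice'), "\n";
```

The minimum run length of 4 is an arbitrary choice: raising it drops more junk but also drops short words that sit between formatting bytes.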

I also tried using fopen to open a PDF file, but because PDF stores its content differently, the output contained no readable words at all, just rubbish. Is there any way I can get the text using a simple fopen?
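A plain fopen/fread returns the raw bytes, and in most PDFs the page text lives inside content streams that are compressed (usually with the FlateDecode filter, which is zlib format). So the bytes have to be decompressed before any words appear. The rough sketch below, with a hypothetical function name, assumes streams that are either uncompressed or FlateDecode, and text drawn with the `(...) Tj` operator in a simple single-byte font encoding. Text drawn with TJ arrays, hex strings, or CID fonts will be missed, so treat this as an illustration of the problem rather than a working PDF reader.

```php
<?php
// Rough sketch, not a PDF parser. Assumes uncompressed or FlateDecode (zlib)
// content streams and "(...) Tj" text in a simple single-byte encoding.

function pdf_extract_text(string $pdf): string
{
    $found = [];
    // Stream data sits between the "stream" keyword (plus EOL) and "endstream".
    preg_match_all('/stream\r?\n(.*?)endstream/s', $pdf, $streams);
    foreach ($streams[1] ?? [] as $data) {
        $decoded = false;
        if (function_exists('gzuncompress')) {
            // FlateDecode data is zlib format; retry without the EOL that
            // normally precedes "endstream" if the first attempt fails.
            $decoded = @gzuncompress($data);
            if ($decoded === false) {
                $decoded = @gzuncompress(preg_replace('/\r?\n\z/', '', $data));
            }
        }
        if ($decoded === false) {
            $decoded = $data; // uncompressed, or a filter this sketch ignores
        }
        // "(literal string) Tj", allowing backslash escapes inside the string.
        preg_match_all('/\(((?:\\\\.|[^\\\\)])*)\)\s*Tj/s', $decoded, $m);
        foreach ($m[1] as $s) {
            $found[] = stripcslashes($s); // undo \( \) \\ \n and octal escapes
        }
    }
    return implode(' ', $found);
}

// Demo on a tiny uncompressed fragment standing in for file_get_contents('some.pdf').
$pdf = "1 0 obj\n<< /Length 52 >>\nstream\n"
     . "BT /F1 12 Tf 72 712 Td (Hello \\(PDF\\) world) Tj ET\n"
     . "endstream\nendobj\n";
echo pdf_extract_text($pdf), "\n"; // prints Hello (PDF) world
```

For real-world PDFs, a dedicated PDF library or an external text-extraction tool will be far more reliable than regular expressions.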

I am basically trying to create a search engine that can read within files, similar to Google. The only problem left once I have done all this is weighting the search results. I would probably have to build the results first and then go through them to weight them.
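One simple way to weight results, sketched here as an assumption (it is not how Google ranks pages): score each document by the number of query-word occurrences divided by its total word count, so a long file does not rank higher merely for being long. The function names and sample documents are hypothetical.

```php
<?php
// Sketch of a length-normalised term-frequency score (an assumed scheme):
// score = occurrences of the query words / total words in the document.

function score_document(string $text, array $queryWords): float
{
    $words = str_word_count(strtolower($text), 1); // list of words
    if (count($words) === 0) {
        return 0.0;
    }
    $freq = array_count_values($words); // word => occurrences
    $hits = 0;
    foreach ($queryWords as $q) {
        $hits += $freq[strtolower($q)] ?? 0;
    }
    return $hits / count($words);
}

function rank_documents(array $docs, array $queryWords): array
{
    $scores = [];
    foreach ($docs as $name => $text) {
        $scores[$name] = score_document($text, $queryWords);
    }
    // Drop documents with no hits, then sort best first (keys are kept).
    $scores = array_filter($scores, fn($s) => $s > 0);
    arsort($scores);
    return $scores;
}

// Hypothetical file names and contents, for illustration only.
$docs = [
    'notes.txt'  => 'A long document that mentions PHP once among many other words',
    'search.txt' => 'PHP search engine in PHP',
    'misc.txt'   => 'nothing relevant here',
];
print_r(rank_documents($docs, ['php']));
```

Scoring each file as it is read avoids a second pass over the result list. A common refinement is tf-idf, which also down-weights words that appear in almost every document.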

Does anyone else have experience with this, or could anyone help me out with any of the problems I am having?

Thanks

Kevin

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php