> Hello, I am making an app that reads from an HTML file output by MS
> Word (yes, it's for those people who need to make web pages but don't
> know how to write HTML). Anyway, using MS Word is a requirement. After
> the user saves their .doc file as a web page (now an .htm file), the PHP
> will take that HTML file from a dir on the server, open it, read it, and
> ignore everything from the beginning of the file up to and including the
> opening body tag. It must then also ignore everything at the end of the
> page, from the closing body tag through the closing html tag. So
> basically, after it's done doing its thing, I would have all the content
> of the page ready to be echoed inside another page that would be a sort
> of shell or template.
>
> I am looking right now at regular expressions and file_open etc., but
> just to give you an idea and to see if anybody has any helpful pointers,
> this (yes, can you believe it?) is the beginning of the word2html
> translation that MS Word does: (BAH!) (I have to get rid of this,
> remember?)
Here is an example regular expression that someone on this group gave me. It captures everything between the body tags:

<?php
$html_text = '
<html>
<head>
<title>Untitled</title>
</head>
<body>
Blah Blah Blah Blah
</body>
</html>
';
// The "s" modifier lets "." match newlines; [^>]* allows attributes on <body>.
if (preg_match('/<body[^>]*>(.*)<\/body>/is', $html_text, $matches)) {
    echo $matches[1]; // the captured body content, not the whole document
}
?>

Here is a class that removes unneeded Word 2000 HTML tags:

http://www.phpclasses.org/browse.html/package/277.html

If you need the styling, you will need an extra regular expression to pull it out of the head, and perhaps put it into a file.

If you don't need styling, I would recommend parsing the document itself and removing all the class="" and style="" attributes.

--
JJ Harrison
www.tececo.com

--
Please reply on the list/newsgroup unless the reply is OT.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
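For the last suggestion in the reply (parsing the document and removing the class="" and style="" attributes), a minimal sketch might look like the following. It assumes PHP's DOM extension is available; the function name strip_word_attributes and the sample markup are hypothetical and are not taken from the class linked above. Real Word output may also need its character encoding handled before parsing, which this sketch does not attempt.

```php
<?php
// Hypothetical helper for illustration only; assumes the DOM extension is enabled.
function strip_word_attributes(string $html): string
{
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();

    // Word's HTML is rarely valid, so collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Find every element carrying a class or style attribute and drop both.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@class or @style]') as $element) {
        $element->removeAttribute('class');
        $element->removeAttribute('style');
    }

    // Return only the markup inside <body>, ready to echo into a template page.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Made-up sample standing in for a Word-exported page.
$sample = '<html><head><title>Untitled</title></head>'
        . '<body lang="EN-US"><p class="MsoNormal" style="margin:0">Blah Blah</p></body></html>';

$result = strip_word_attributes($sample);
echo $result;
```

Because the parser does the tag matching, this also copes with attributes on the body tag itself, which a plain <body> pattern would miss.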