> Hello, I am making an app that reads from an HTML file output by MS
> Word (yes, it's for people who need to make web pages but don't know
> how to write HTML). Anyway, using MS Word is a requirement. After the user
> saves their .doc file as a web page (now an .htm file), the PHP will take
> that HTML file from a dir on the server, open it, read it, and ignore
> everything from the beginning of the file up to and including the
> opening body tag. Then it must ignore everything at the end of the page,
> from the closing body tag through the closing html tag. So basically,
> after it's done doing its thing I would have all the content of the page
> ready to be echoed inside another page that would be a sort of shell or
> template.
>
> I am looking right now at regular expressions and file_open etc., but
> just to give you an idea and to see if anybody has any helpful pointers,
> this (yes, can you believe it?) is the beginning of the word2html
> translation that MS Word does: (BAH!) (I have to get rid of this,
> remember?)


Here is an example regular expression that someone on this group gave me. It
gives everything between the body tags.
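<?php
$html_text = '
<html>
<head>
<title>Untitled</title>
</head>
<body>
Blah Blah Blah Blah
</body>
</html>
';
// The "s" modifier lets "." match newlines, and [^>]* allows for
// attributes on the body tag (Word adds some).
if (preg_match('/<body[^>]*>(.*)<\/body>/is', $html_text, $matches)) {
    // $matches[1] holds everything between the body tags.
    echo $matches[1];
}
?>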
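
To run the same match against a file saved by Word rather than a hard-coded
string, something along these lines might work. It is only a sketch: the
function name extract_body() and the path uploads/page.htm are made up for
illustration, and it assumes the page has a single body element.

```php
<?php
// Hypothetical helper: returns whatever sits between <body ...> and </body>,
// or null when no body element is found.
function extract_body(string $html): ?string
{
    // \b[^>]* tolerates attributes on the body tag;
    // "s" lets "." match newlines, "i" ignores case.
    if (preg_match('/<body\b[^>]*>(.*)<\/body>/is', $html, $matches)) {
        return trim($matches[1]);
    }
    return null;
}

// Assumed location of the saved page; adjust to your own directory layout.
$path = 'uploads/page.htm';
if (is_readable($path)) {
    $content = extract_body(file_get_contents($path));
    echo $content !== null ? $content : 'No body tag found.';
}
```

The returned string can then be echoed wherever the template page wants the
content to appear.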

Here is a class that removes unneeded Word 2000 HTML tags:
http://www.phpclasses.org/browse.html/package/277.html

If you need the styling, you will need an extra regular expression to pull
the style block out of the head, and perhaps save it to a file.
If you don't need the styling, I would recommend parsing the document itself
and removing all the class="" and style="" attributes.
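
Both options could be sketched roughly like this. The function names
extract_styles() and strip_class_and_style() are invented for the example,
and the regular expressions are a quick approximation, not a real HTML
parser: text in the page body that happens to look like class=something
would be stripped as well.

```php
<?php
// Hypothetical helper: collects the contents of every <style> block so they
// can be written to a separate stylesheet file.
function extract_styles(string $html): string
{
    // Non-greedy (.*?) so each style block is captured on its own.
    preg_match_all('/<style\b[^>]*>(.*?)<\/style>/is', $html, $matches);
    return implode("\n", $matches[1]);
}

// Hypothetical helper: drops class and style attributes, whether the value
// is double-quoted, single-quoted, or unquoted (e.g. class=MsoNormal).
function strip_class_and_style(string $html): string
{
    return preg_replace(
        '/\s+(?:class|style)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/i',
        '',
        $html
    );
}
```

You would run strip_class_and_style() on the body content only, after the
body has been cut out of the page.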


--
JJ Harrison
[EMAIL PROTECTED]
www.tececo.com

--
Please reply on the list/newsgroup unless the reply is OT.



-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php