Re: [PHP] How can I strip the code from HTML pages to extract the contents of a HTML page.
Hi Charles,

Not sure exactly what you are after, but:

    function displayLinks ($pagecontents) {
        $search = '//im';
        $replace = '';
        return (preg_replace ($search, $replace, $pagecontents));
    }

For me, that takes all the links in $pagecontents and modifies them for a recorder I am building. You could do something similar with it, although if you need the name, you might want...

    $search = '/(.*?)<\/a>/im';

And $4 would be your name.

I hope this helps.

Todd.

----- Original Message -----
From: "Charles Fowler" <[EMAIL PROTECTED]>
To: "Justin French" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Friday, August 30, 2002 10:14 AM
Subject: Re: [PHP] How can I strip the code from HTML pages to extract the contents of a HTML page.

> I was looking into stripping HTML files that contain a lot of links. I
> was trying to avoid the manual way of data entry. The contents I need
> are the name of the link (plain text which sits outside the HTML code)
> and all the a href tags. I would like the a href (i.e. the hyperlink)
> tags to be displayed on the HTML output as plain text. All other HTML
> tags would be kept in place.
>
> The reason why I am doing this is that I am placing a link's name and
> the http:// link into flat files, where they can be updated just by
> appending to them. The script that I have does the rest.
>
> I have looked into the functions suggested but do find the concepts and
> use of the operators to strip the HTML involved esoteric and tricky.
>
> ¬¬Chuck¬¬

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
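A minimal sketch of the approach in Todd's reply, assuming the goal is to pull each link's URL and name out of a page. The pattern here is an assumption made for illustration, not the one Todd used, and the function name extractLinks is hypothetical. A regular expression like this only copes with reasonably tidy markup.

```php
<?php
// Hypothetical helper: return an array of ['url' => ..., 'name' => ...] pairs
// for every <a href="...">...</a> found in $pagecontents.
function extractLinks(string $pagecontents): array
{
    // Group 1: the quote character, group 2: the href value, group 3: the link text.
    // Flags: i = case-insensitive, s = let "." span newlines inside the link text.
    $search = '/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)<\/a>/is';

    preg_match_all($search, $pagecontents, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $m) {
        $links[] = [
            'url'  => $m[2],
            // The link text may contain nested tags such as <b>; drop them.
            'name' => trim(strip_tags($m[3])),
        ];
    }
    return $links;
}

$html = '<p>See <a href="http://www.php.net/">PHP <b>home</b></a> and '
      . "<A HREF='http://example.com/docs'>Docs</A>.</p>";

// Print one "name<TAB>url" line per link.
foreach (extractLinks($html) as $link) {
    echo $link['name'], "\t", $link['url'], "\n";
}
```

Using preg_match_all with PREG_SET_ORDER keeps each link's groups together, so the name and URL can be read from the same match.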
Re: [PHP] How can I strip the code from HTML pages to extract the contents of a HTML page.
I was looking into stripping HTML files that contain a lot of links. I was trying to avoid the manual way of data entry. The contents I need are the name of the link (plain text which sits outside the HTML code) and all the a href tags. I would like the a href (i.e. the hyperlink) tags to be displayed on the HTML output as plain text. All other HTML tags would be kept in place.

The reason why I am doing this is that I am placing a link's name and the http:// link into flat files, where they can be updated just by appending to them. The script that I have does the rest.

I have looked into the functions suggested but do find the concepts and use of the operators to strip the HTML involved esoteric and tricky.

¬¬Chuck¬¬
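One way to read Charles's description, as a hedged sketch: each a href tag is replaced by plain text showing the link's name and URL, every other tag stays in place, and each name/URL pair is appended to a flat file. The function name linksToPlainText, the "NAME (URL)" output format, the tab-separated file layout, and the links.txt path in the temp directory are all assumptions for illustration.

```php
<?php
// Hypothetical helper: replace each <a href="URL">NAME</a> with the plain text
// "NAME (URL)", leaving every other tag untouched. Each NAME/URL pair is also
// collected in $rows so it can be appended to a flat file afterwards.
function linksToPlainText(string $html, array &$rows): string
{
    $pattern = '/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)<\/a>/is';

    $result = preg_replace_callback($pattern, function (array $m) use (&$rows): string {
        $url  = $m[2];
        $name = trim(strip_tags($m[3]));
        $rows[] = [$name, $url];
        return $name . ' (' . $url . ')';
    }, $html);

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

$rows = [];
$page = '<ul><li><a href="http://www.php.net/">PHP</a></li></ul>';
echo linksToPlainText($page, $rows), "\n";
// <ul><li>PHP (http://www.php.net/)</li></ul>

// Assumed flat-file location; one tab-separated "name<TAB>url" line per link.
$flatFile = sys_get_temp_dir() . '/links.txt';
foreach ($rows as [$name, $url]) {
    // FILE_APPEND adds to the end of the file instead of overwriting it.
    file_put_contents($flatFile, $name . "\t" . $url . "\n", FILE_APPEND | LOCK_EX);
}
```

Because the file is only ever appended to, new links can be added later without touching the existing entries.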
Re: [PHP] How can I strip the code from HTML pages to extract the contents of a HTML page.
One of ereg_replace, eregi_replace, or preg_replace has a full working script that does this, returning pretty much plain text.

There's also the strip_tags() function, which strips out all PHP and HTML tags. That is perhaps not enough, since you'd maybe want to remove *some* other stuff, but it's a good start, and may be used in conjunction with other stuff.

You haven't said if you want:

- all the stuff between the body tags
OR
- all the stuff that isn't tags (would include the title, and perhaps other stuff)

As per usual, specifically asking for what you want helps, but there are HEAPS of ways of doing this.

More than likely you'll find/build the components you need in different places:

- recursively run through a directory for each HTML file
- stripping each HTML file
- possibly presenting the raw text in a TEXTAREA for previewing/modifying
- adding the text to the DB, probably assigning the ID based on the original filename, or something

Etc etc

Good luck,

Justin

on 28/08/02 11:58 PM, Charles Fowler ([EMAIL PROTECTED]) wrote:

> This may be an interesting challenge for someone or has it been done
> before
>
> Can some one help me.
>
> I am looking for a labour-saving method of extracting the contents of a
> web page (Text only) and dumping the rest of the html code.
>
> I need the contents to rework the pages and put the contents into flat
> file database. Large but only two columns of data. Simple to work with
> (no need for DB) - They are just a lot of links on a links page.
>
> Scripts would be welcome.
>
> Ciao, Carlos
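The first two components in Justin's list (walking a directory and stripping each HTML file) can be sketched as follows. This is an illustration under assumptions, not a script from the thread: the helper names htmlToText and stripDirectory are hypothetical, and taking only the text between the body tags is one of the two options Justin mentions.

```php
<?php
// Hypothetical helper: return the visible text of one HTML document.
function htmlToText(string $html): string
{
    // Keep only what sits between the body tags, when a body is present.
    if (preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $m)) {
        $html = $m[1];
    }
    // strip_tags() leaves the contents of <script> and <style> behind,
    // so remove those elements whole before stripping the remaining tags.
    $html = preg_replace('/<(script|style)\b[^>]*>.*?<\/\1>/is', '', $html);
    // Put a space before every tag so adjacent blocks do not run together.
    $html = str_replace('<', ' <', $html);
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Collapse the runs of whitespace left behind by the removed markup.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Hypothetical helper: map each .html/.htm file under $dir (recursively)
// to its plain text, keyed by the file's path.
function stripDirectory(string $dir): array
{
    $texts = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (preg_match('/\.html?$/i', $file->getFilename())) {
            $texts[$file->getPathname()] = htmlToText(file_get_contents($file->getPathname()));
        }
    }
    return $texts;
}

echo htmlToText('<html><head><title>T</title></head><body><h1>Links</h1><p>Some &amp; text</p></body></html>'), "\n";
// Links Some & text
```

The array returned by stripDirectory could then feed the later steps in the list, such as a TEXTAREA preview or a database insert keyed on the original filename.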