RE: read source file of .html
That works. I tweaked it a little: the handler now collects the text in $page instead of printing it, so the result can be altered afterwards, and the URL has a trailing '/', because a top-level URL without a file name and without a trailing forward slash gets redirected by the server to the version with the trailing forward slash. Including it is a little quicker.

In detail, I think that a request for http://www.someplace.com/~user would first look for a file called ~user and then say, doh, that must be a directory, redirect to http://www.someplace.com/~user/, and then display the index or default page for the latter, with the trailing slash. Ok, too much information.

Thank you very much!

Gary

    #!perl
    use strict;
    use warnings;
    use HTML::Parser 3;
    use LWP::Simple;

    my $html = get("http://www.mit.edu/") or die "Couldn't fetch the page";

    my $page = '';
    my $parser = HTML::Parser->new(
        unbroken_text   => 1,
        ignore_elements => [qw( script head )],
        # The handler runs once per text chunk, so append rather than assign.
        text_h          => [ sub { $page .= shift; }, 'dtext' ],
    )->parse($html)->eof();

    $page =~ s#\n\s*\n#\n#g;    # collapse runs of blank lines
    print $page;
    __END__

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
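The redirect described above can be observed from the client side. The following is a minimal sketch, assuming LWP::UserAgent is installed; the helper name url_chain is hypothetical, and whether a given server actually redirects the slash-less URL depends on its configuration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Return the URLs visited for a response, oldest first.
# $response->redirects lists the intermediate responses that LWP
# followed; the final response is appended at the end.
sub url_chain {
    my ($response) = @_;
    return map { $_->request->uri->as_string }
               ($response->redirects, $response);
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $ua       = LWP::UserAgent->new(timeout => 10);
    my $response = $ua->get($ARGV[0]);
    die "Couldn't fetch the page: ", $response->status_line, "\n"
        unless $response->is_success;
    print "$_\n" for url_chain($response);
}
```

Run it with a URL as the only argument. If the server redirects, more than one line is printed, and the last line is the URL that was finally served.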
RE: read source file of .html
Depending on what you are doing... I have found a lot of great ways to pull tables out of HTML using HTML::TableExtract, LWP::UserAgent and HTML::TreeBuilder. I really haven't delved into all of the libraries under HTML, but these have been great. See cpan.org or perldoc for more info.

-----Original Message-----
From: Gary Hawkins [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 15, 2002 2:26 AM
To: [EMAIL PROTECTED]
Subject: RE: read source file of .html

> use LWP. it can be as simple as this :
>
> use LWP::Simple;
> print get("http://www.mit.edu");
>
> Tor.

Neat. Along that line, I would like to be able to wind up with pages after retrieval as plain text without html tags, hopefully using a module.

/g
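As an illustration of the table extraction mentioned above, here is a minimal sketch using HTML::TableExtract. The sample HTML and the helper name extract_rows are made up for the example; in practice the HTML would come from LWP.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample page, standing in for HTML fetched with LWP.
my $html = <<'HTML';
<html><body>
<table>
  <tr><th>Module</th><th>Purpose</th></tr>
  <tr><td>LWP::Simple</td><td>fetch a page</td></tr>
  <tr><td>HTML::Parser</td><td>strip tags</td></tr>
</table>
</body></html>
HTML

# Return the data rows of every table whose header row matches @headers.
sub extract_rows {
    my ($html, @headers) = @_;
    my $te = HTML::TableExtract->new(headers => \@headers);
    $te->parse($html);
    my @rows;
    for my $table ($te->tables) {
        push @rows, map { [ @$_ ] } $table->rows;   # copy each row
    }
    return @rows;
}

for my $row (extract_rows($html, 'Module', 'Purpose')) {
    print join(' | ', map { defined $_ ? $_ : '' } @$row), "\n";
}
```

Selecting tables by their header text tends to survive layout changes better than selecting them by position in the page.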
Re: read source file of .html
Gary Hawkins wrote:
> Along that line, I would like to be able to wind up with pages after retrieval
> as plain text without html tags, hopefully using a module.

Here's a really quick way to do so using HTML::Parser; it can probably use some tweaking.

Hope this helps,
Briac

    #!/usr/bin/perl -w
    use strict;
    use HTML::Parser 3;
    use LWP::Simple;

    my $html = get("http://www.mit.edu") or die "Couldn't fetch the page";

    my $parser = HTML::Parser->new(
        unbroken_text   => 1,
        ignore_elements => [qw( script head )],
        # Print each chunk of decoded text as the parser reports it.
        text_h          => [ sub { print shift }, 'dtext' ],
    )->parse($html)->eof();
    __END__

--
briac
A flying lark.
Five trout swim in the pond.
Four foxes under a she-oak.
RE: read source file of .html
> use LWP. it can be as simple as this :
>
> use LWP::Simple;
> print get("http://www.mit.edu");
>
> Tor.

Neat. Along that line, I would like to be able to wind up with pages after retrieval as plain text without html tags, hopefully using a module.

/g
Re: read source file of .html
use LWP. it can be as simple as this :

    use LWP::Simple;
    print get("http://www.mit.edu");

Tor.

yun yun wrote:
> if I want to read the real html file from web, such as
> http://www.mit.edu, should I use sock programming? and
> if then, how could I use,and where can I study this
> aspect? Thanks!
read source file of .html
If I want to read the real HTML file from the web, such as http://www.mit.edu, should I use socket programming? If so, how could I use it, and where can I study this aspect? Thanks!
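For comparison with the LWP answer, this is roughly what the socket route looks like. It is a minimal sketch using the core module IO::Socket::INET, assuming plain HTTP on port 80 and an HTTP/1.0 request; the helper names build_get_request and fetch_raw are hypothetical. It does no redirect handling, no HTTPS and no header parsing, which is the work LWP does for you.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Build a minimal HTTP/1.0 GET request. The Host header is included
# so that name-based virtual hosts can pick the right site.
sub build_get_request {
    my ($host, $path) = @_;
    $path = '/' unless defined $path && length $path;
    return "GET $path HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n";
}

# Send the request and return the raw reply: status line, headers, body.
sub fetch_raw {
    my ($host, $path) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => 80,       # assumption: plain HTTP
        Proto    => 'tcp',
        Timeout  => 10,
    ) or die "Couldn't connect to $host: $@\n";
    print {$sock} build_get_request($host, $path);
    local $/;                 # slurp mode: read until the server closes
    my $reply = <$sock>;
    close $sock;
    return $reply;
}

# Only touch the network when a host (and optional path) is given.
print fetch_raw(@ARGV) if @ARGV;
```

Run it with a host name and, optionally, a path as arguments. The output includes the response headers, and a site that only serves HTTPS will typically answer with a redirect rather than the page itself.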