R.Raghuram wrote:
Hi all,
I need some scripting help to extract URLs from a directory of web pages.
Hi! Here's a quick-and-dirty solution to your problem. Save the following script as href.pl
-----------
#!/usr/bin/perl
# Print the value of every href attribute found on STDIN.
while (<STDIN>) {
    # \1 ensures the closing quote matches the opening one;
    # /g catches multiple links per line, /i also matches HREF=.
    while (/<a\s+href\s*=\s*(["'])([^"'>]*)\1/gi) {
        print "$2\n";
    }
}
----------
Now, from the top level directory (the path from where you wish to extract URLs), run the following command at your bash prompt:
( for i in `find . -name '*.html' -print`; do ./href.pl < "$i"; done ) > url_list.txt
--------- Hope this solves your problem.
Cheers, Chandrashekar Babu.
_______________________________________________ linux-india-help mailing list https://lists.sourceforge.net/lists/listinfo/linux-india-help
