R.Raghuram wrote:

Hi all,

I need some scripting help to extract some urls from a directory of web pages.

Hi! Here's a quick-and-dirty solution to your problem.
Save the following script as href.pl:

-----------

#!/usr/bin/perl
use strict;
use warnings;

# Print every href value found on each line of input.
# The \1 backreference ensures the closing quote matches the opening
# one, and /g catches more than one link per line.
while (<STDIN>) {
    while (/<a\s[^>]*href\s*=\s*(["'])([^"'>]*)\1/gi) {
        print "$2\n";
    }
}

----------
Now, from the top-level directory (the path from which you wish to extract URLs), run the following command at your bash prompt:


( for i in `find . -name '*.html' -print`; do ./href.pl < "$i"; done ) > url_list.txt

---------
Hope this solves your problem.

Cheers,
Chandrashekar Babu.





_______________________________________________
linux-india-help mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/linux-india-help
