>>>>> "CB" == Chandrashekar B <[EMAIL PROTECTED]> writes:
CB> Binand Sethumadhavan wrote:
>> It also won't work in a directory layout like this:
>>
>> .
>> ./foo.html
>> ./morehtmlfiles
>> ./morehtmlfiles/a.html
>> ./morehtmlfiles/b.html
>> ./morehtmlfiles/c.html
>> ...
>>
CB> A quick and dirty modification to the command as below must
CB> handle the above case:
CB>
CB>   ( for i in `find . -type f | grep "\.html$"`; do ./href.pl < $i; done ) >> url_list.txt
CB>
CB> Then again, all this with the assumption that the relevant
CB> files end with .html. This solution isn't bullet-proof
CB> either.
This one seems to work for files with spaces in the names, files with
extensions other than .html, case-insensitive HTML tags and wrapped
hrefs like this:

    <a
    href="a.very.long.url.here.that.gets.wrapped.by.smart.editors">

    find . -type f -print0 | xargs -0 file | grep -F 'HTML document' | cut -d: -f1 |
    while IFS= read -r f ; do
        perl -0777 -ne 'while (s/<a\s+href="([^"]*)"//is) { print "$1\n"; }' "$f"
    done

If you want me to explain it, it'll cost you lots of mango coladas at
CCD -- the real thing, not promises ;)
Regards,
-- Raju
--
Raj Mathur [EMAIL PROTECTED] http://kandalaya.org/
GPG: 78D4 FC67 367F 40E2 0DD5 0FEF C968 D0EF CC68 D17F
It is the mind that moves
_______________________________________________
linux-india-help mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/linux-india-help