Just lurking and I saw this. A simple technique might be to insert a newline before each href, then use grep and cut. For example, open the file in vim and do:

:%s/href=/^Mhref=/gc
:%s/HREF=/^Mhref=/gc

(where ^M is ctrl+v followed by the return key)

Then

grep href filename.html | cut -d '"' -f 2

and optionally

... | sort | uniq

There might be some way to do a case-insensitive find and replace in vim, but I don't know it off the top of my head.

Jeremy.

On 12 September 2011 11:19, Vic <l...@beer.org.uk> wrote:
>
>> You can probably do this quite easily in perl.
>
> You can.
>
>> Are there any nice short programs to do this?
>
> Something like this?
>
> #! /usr/bin/perl
> use strict;
> use warnings;
>
> my $fname = $ARGV[0];
> die "need a filename" unless defined ($fname);
>
> open my $infile, '<', $fname or die "Can't open $fname for reading: $!";
>
> while (<$infile>)
> {
>   # Capture the value of every <a href="..."> on this line
>   my @links = $_ =~ m|<a +href="([^"]+)"|gc;
>   if(scalar(@links) > 0)
>   {
>     foreach my $link (@links)
>     {
>       # Do something here
>       print "Link : $link\n";
>     }
>   }
> }
>
> You could probably write this in a much more compact fashion if you wanted.
>
> Vic.
>
>
> --
> Please post to: Hampshire@mailman.lug.org.uk
> Web Interface: https://mailman.lug.org.uk/mailman/listinfo/hampshire
> LUG URL: http://www.hantslug.org.uk
> --------------------------------------------------------------
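
P.S. The newline-then-grep idea can probably be done without opening vim at all. This is only a rough sketch: it assumes a grep that supports -o (GNU and BSD grep do, but it is not in POSIX), and it only catches double-quoted href values. The sample.html file and the extract_links function name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input; any HTML file would do.
cat > sample.html <<'EOF'
<p><a href="http://a.example/">A</a> <A HREF="http://b.example/">B</A></p>
<a href="http://a.example/">A again</a>
EOF

# extract_links is an illustrative helper, not part of any tool.
# grep -o prints each match on its own line, which replaces the vim
# newline step; -i makes the match case-insensitive (href or HREF).
# cut then takes the text between the first pair of double quotes.
extract_links() {
    grep -o -i 'href="[^"]*"' "$1" | cut -d '"' -f 2 | sort | uniq
}

extract_links sample.html
```

The sort | uniq at the end drops the repeated link, same as the optional step above.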