Just lurking and I saw this.  A simple technique might be to insert a
newline before each href, then use grep and cut.  E.g. open the file in
vim and do:

:%s/href=/^Mhref=/gc
:%s/HREF=/^Mhref=/gc

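(where ^M is typed as ctrl+v followed by the return key; the c flag asks
for confirmation at each match, so drop it to replace everything in one go)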


Then

grep href filename.html | cut -d '"' -f 2

and optionally

... | sort | uniq

There might be some way to do a case-insensitive find and replace in
vim, but I don't know it off the top of my head.
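
For what it's worth, the vim step can probably be skipped altogether.  The
sketch below does the split, the case-insensitive match and the de-duplication
in one pipeline.  It assumes a grep that supports -o (GNU and BSD grep do, but
it is not in POSIX) and that every href value is wrapped in double quotes;
extract_links is just a made-up name for illustration.

```shell
# Sketch only: print the unique href targets found in an HTML file.
# Assumes grep supports -o and that href values use double quotes.
extract_links() {
    # -o prints each match on its own line, so several links per line are fine
    # -i matches href, HREF, Href and so on
    # cut keeps the text between the first pair of double quotes
    # sort -u is equivalent to sort | uniq
    grep -oi 'href="[^"]*"' "$1" | cut -d '"' -f 2 | sort -u
}
```

Run it as: extract_links filename.html
Links written with single quotes or no quotes at all will be missed, same
as with the grep and cut approach above.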

Jeremy.

On 12 September 2011 11:19, Vic <l...@beer.org.uk> wrote:
>
>> You can probably do this quite easily in perl.
>
> You can.
>
>> Are there any nice short programs to do this?
>
> Something like this?
>
> #!/usr/bin/perl
> use strict;
> use warnings;
>
> my $fname = $ARGV[0];
> die "need a filename" unless defined ($fname);
>
> # Three-argument open with a lexical filehandle; $! says why it failed
> open my $infile, '<', $fname or die "Can't open $fname for reading: $!";
>
> while (<$infile>)
> {
>    # /g collects every match on the line, /i also matches <A HREF=...>
>    my @links = $_ =~ m|<a +href="([^"]+)"|gi;
>    foreach my $link (@links)
>    {
>        # Do something here
>        print "Link : $link\n";
>    }
> }
>
> You could probably write this in a much more compact fashion if you wanted.
>
> Vic.
>
>
> --
> Please post to: Hampshire@mailman.lug.org.uk
> Web Interface: https://mailman.lug.org.uk/mailman/listinfo/hampshire
> LUG URL: http://www.hantslug.org.uk
> --------------------------------------------------------------
>
