On Mon, May 9, 2011 at 12:04, Sandip Bhattacharya <sand...@foss-community.com> wrote:

> On Mon, May 9, 2011 at 11:44 PM, Tiago Hori <tiago.h...@gmail.com> wrote:
> > I am trying to write a small script to parse bibliographic references
> > like this:
> >
> > Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
> > reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
> >
> > What I want to be able to do eventually is parse each name separately and
> > associate that with the title. I am not sure how yet, but I haven't even
> > got there.
>
> I took a stab at this. It might not be perfect or catch all possible
> variations. In any case, unless the entries follow fixed formatting rules,
> it is very difficult to catch them all.
>
> =========================================================
> #!/usr/bin/perl
> #
>
> use strict;
> use warnings;
>
> my $text = <<END;
> Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
> reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
> END
>
> my @authors=();
>
> # Extract authors
> # Assuming the author list is one or more matches of:
> #   <SPACE>* SURNAME, <SPACE>* (LETTER PERIOD)+,
> # (?:...) is non-capturing, so each match yields one "Surname, Initials" string
> for my $match ($text =~ m/(\s*\w+,\s*(?:\w\.)+),/g) {
>    # Split on the comma, trim whitespace (including newlines), store "Initials Surname"
>    my @comps = map { s/^\s+//; s/\s+$//; $_ } split /,/, $match;
>    push @authors, join " ", @comps[1,0];
> }
>
> # Extract title
> # Everything from the first period followed by a space up to the next period.
> # For this to work, the periods in the author initials must be followed
> # by a letter or a comma, never by a space.
> if ($text =~ m/\. (.*?)\./s) {
>    my $title = $1;
>    $title =~ s/\n/ /g;
>    foreach(@authors) {
>        print "$title: $_\n";
>    }
> }
> =====================================================================
>
> $ ./match_2.pl
> The effect of stress on reproduction in Atlantic cod: M.J. Morgan
> The effect of stress on reproduction in Atlantic cod: C.E. Wilson
> The effect of stress on reproduction in Atlantic cod: L.W. Crim
>
> All, please let me know if there is a way to combine the two regexes into
> one. I had a brain coredump before I gave up.
>
> Thanks,
>  Sandip
>

Hasn't someone already solved this problem? I'd be surprised if there weren't
a CPAN module to perform standardized bibliographic reference
formatting/parsing. I haven't looked at CPAN; did either of you? If a CPAN
module doesn't exist, one should!
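In the meantime, on Sandip's question about merging the two regexes: as far as I know, a single Perl regex can't hand back each author separately, because a capture group inside a repeated (?:...)+ only keeps its last iteration. Here is a minimal sketch that gets close, checked only against the sample reference. It assumes every author is written "Surname, I.I.," and that a four-digit year followed by a period ends the author list. One regex captures the whole author block plus the title, and a while (m//g) loop then splits the block into names. Like the original, it stops the title at the first period, so titles containing a period will be cut short.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same sample reference as in Sandip's script.
my $text = <<'END';
Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
END

# Collapse line breaks so the title ends up on one line.
(my $flat = $text) =~ s/\s*\n\s*/ /g;

my ($author_block, $title);
my @authors;

# One regex for the header (assumes the format described above):
#   $1 = one or more "Surname, I.I.," groups (the author block)
#   then a four-digit year and a period, which ends the author list
#   $2 = the title, i.e. everything up to the next period
if ($flat =~ m/^\s*((?:\w+,\s*(?:\w\.)+,\s*)+)\d{4}\.\s+([^.]+)\./) {
    ($author_block, $title) = ($1, $2);

    # A repeated group only remembers its last match, so walk the
    # author block with a global match to get each name in turn.
    while ($author_block =~ m/(\w+),\s*((?:\w\.)+)/g) {
        push @authors, "$2 $1";    # "Initials Surname", e.g. "M.J. Morgan"
    }
}

print "$title: $_\n" for @authors;
```

With the sample reference it should print the same three lines as Sandip's match_2.pl.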

Ken Wolcott
