On Mon, May 9, 2011 at 12:04, Sandip Bhattacharya <sand...@foss-community.com> wrote:
> On Mon, May 9, 2011 at 11:44 PM, Tiago Hori <tiago.h...@gmail.com> wrote:
> > I am trying to write a small script to parse bibliographic references like
> > this:
> >
> > Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
> > reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
> >
> > What I want to be able to do eventually is parse each name separately and
> > associate that with the title. I am not sure how yet, but I haven't even got
> > there.
>
> I took a stab at this. It might not be perfect and catch all possible
> variations. But in any case, unless you have rules for the text in
> these entries, it is very difficult to catch them all.
>
> =========================================================
> #!/usr/bin/perl
> #
>
> use strict;
> use warnings;
>
> my $text = <<END;
> Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
> reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
> END
>
> my @authors=();
>
> # Extract authors
> # Assuming each author is composed of one or more matches of:
> # <SPACE>* WORD, <SPACE>* (ALPHABET PERIOD)+
> if (my @matches = $text =~ m/(\s*\w+,\s*(\w\.)+),/gs) {
>     while(@matches) {
>         my $match = shift @matches;
>         my @comps = map {s/^ +//;s/ +$//;$_} (split ",", $match);
>         push @authors, join " ",@comps[1,0];
>         shift @matches;
>     }
> }
>
> # Extract title
> # Everything from the first period followed by a space to the next period.
> # Authors should have periods followed by either a letter or a comma
> # for this to work
> if ($text =~m/\. (.*?)\./s) {
>     my $title = $1;
>     $title =~ s/\n/ /g;
>     foreach(@authors) {
>         print "$title: $_\n";
>     }
> }
> =====================================================================
>
> $ ./match_2.pl
> The effect of stress on reproduction in Atlantic cod: M.J. Morgan
> The effect of stress on reproduction in Atlantic cod: C.E. Wilson
> The effect of stress on reproduction in Atlantic cod: L.W. Crim
>
> All, please let me know if there is a way to combine both the regexes.
> I had a brain coredump before I gave up.
>
> Thanks,
> Sandip

Hasn't someone already solved this problem? Isn't there a CPAN module that
performs standardized bibliographic reference formatting and parsing? I
haven't looked at CPAN; did either of you? If a CPAN module doesn't exist,
one should!

Ken Wolcott
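
P.S. On the question of combining the two regexes: below is a minimal sketch of one way it might be done, not a tested general solution. It assumes every entry follows the layout of the sample exactly (authors written as "Surname, I.N.," followed by a four-digit year, then a title that contains no periods). The names $entry_re and $author_list are made up for this illustration. A single pattern captures the author list, the year and the title in one pass, and a short loop then splits the captured author list into individual names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = <<'END';
Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
END

# One pattern for the whole entry (hypothetical name, sample layout assumed).
my $entry_re = qr/
    ^\s*
    ( (?: \w+ , \s* (?: \w \. )+ , \s* )+ )   # capture 1: author list, each author ends in a comma
    ( \d{4} ) \. \s+                          # capture 2: four-digit year
    ( .*? ) \.                                # capture 3: title, up to the next period
/xs;

my ($author_list, $year, $title) = $text =~ $entry_re
    or die "Entry does not match the assumed layout\n";

# The title may be wrapped across lines; join it back into one line.
$title =~ s/\s*\n\s*/ /g;

# Split the captured author list into "Initials Surname" strings.
my @authors;
while ($author_list =~ /(\w+),\s*((?:\w\.)+)/g) {
    push @authors, "$2 $1";
}

print "$title: $_\n" for @authors;
```

For the sample entry this should print the same three lines as Sandip's script. Surnames with hyphens, apostrophes or spaces, and titles containing abbreviations with periods, would need a looser pattern.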