Re: [CODE4LIB] Citation parsing?

2007-07-20 Thread Joe Atzberger

On 7/20/07, Eric Hellman <[EMAIL PROTECTED]> wrote:


Have people been able to do a decent job of identifying parts of
speech in natural language?



I think trying to import broad NLP findings into our narrower problem of
citation parsing is not likely to be fruitful, but on the other hand,
stealing their tools seems perfectly reasonable, and this group seems to be
familiar with several.

About 8 years ago, I made use of a parser generator called ANTLR (ANother Tool
for Language Recognition) that takes an EBNF grammar spec and builds a
parser.  Since then, developers have improved the tool with some new versions
and even a GUI development environment.  The languages recognized in
practice all seem to be well-defined programming languages, but if you
wanted to roll your own (new) parser for citations, ANTLR might help.

I think ANTLR satisfies Eric's first two criteria, for flexibility and ease
of extension, and might be used to satisfy the third (broad contextual
info).  It can now back itself out of a rule descent and try other
alternatives in the tree if the static grammar fails.  The
license is BSD.  Notably, it supports Unicode, and the new version does NOT
require a pre-specified number of look-ahead tokens. And the userbase is
fairly broad for such a specialized tool.
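
To make the "try other alternatives if one fails" idea concrete, here is a
minimal sketch in Python. It is NOT ANTLR and not generated from a grammar:
the two citation layouts, the rule names, and the parse_citation function are
assumptions made up for illustration, and the regex engine's own backtracking
stands in for what a generated parser would do.

```python
import re

# Two hypothetical citation layouts, tried in order. If the first alternative
# fails to match, we back out and try the next one.
ALTERNATIVES = [
    # Authors (Year). Title. Journal, Volume, Pages.
    ("author-year", re.compile(
        r"^(?P<authors>[^().]+?)\s*\((?P<year>\d{4})\)\.\s*"
        r"(?P<title>[^.]+)\.\s*"
        r"(?P<journal>[^.,\d]+?),\s*(?P<volume>\d+),\s*(?P<pages>\d+-\d+)\.?$")),
    # Authors. Title. Journal Volume (Year): Pages.
    ("volume-year", re.compile(
        r"^(?P<authors>[^().]+?)\.\s*"
        r"(?P<title>[^.]+)\.\s*"
        r"(?P<journal>[^.,\d]+?)\s*(?P<volume>\d+)\s*"
        r"\((?P<year>\d{4})\):\s*(?P<pages>\d+-\d+)\.?$")),
]


def parse_citation(text):
    """Return the fields from the first alternative that matches, else None."""
    text = text.strip()
    for rule_name, pattern in ALTERNATIVES:
        match = pattern.match(text)
        if match:
            return {"rule": rule_name, **match.groupdict()}
    return None


if __name__ == "__main__":
    print(parse_citation(
        "Smith J, Jones K. Parsing citations. Journal of Examples 12 (1999): 34-56."))
```

Adding a new citation style means appending one more alternative, which is
roughly the extensibility argument above; a real grammar would of course need
to cope with initials, missing fields, and so on.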

This might be considered an incongruous solution inasmuch as you are asking
for parser characteristics and I am recommending a parser generator that
*could* produce the kind of parser you want.  But I think that is
appropriate for the task described.

--joe


Re: [CODE4LIB] Citation parsing?

2007-07-20 Thread Nathan Vack

On Jul 20, 2007, at 9:14 AM, Eric Hellman wrote:


Heuristics are perhaps the only way to deal with lack of consistent
format. (e.g. "a cluster of words including 'journal of' is likely
to contain a journal name")


You're right; in a lot of ways, it depends on what you consider a
heuristic; every algorithm involves human intervention to describe
what features it should consider; there's no magic there. In this
case, I'm positing that a system in which we manually identify and
weight rules isn't gonna scale; we'll need to use some kind of
machine learning.

In this specific case, I might instead try a feature that's more like
"the occurrence of this word in journal titles versus the occurrence
of this word in ordinary text" and then let some ML algorithm train
and weight that feature.
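
A rough sketch of that kind of feature, in Python. The tiny word lists are
invented placeholders, and the smoothed log-ratio is just one reasonable way
to express "occurs in journal titles versus ordinary text"; a real system
would feed this number to whatever learner is doing the weighting.

```python
import math
from collections import Counter


def word_counts(texts):
    """Count lowercase whitespace-separated words across a list of strings."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def journal_log_odds(word, journal_counts, ordinary_counts, smoothing=1.0):
    """Log ratio of a word's smoothed frequency in journal titles vs. ordinary text.

    Positive values mean the word is relatively more common in journal titles.
    """
    word = word.lower()
    vocab_size = len(set(journal_counts) | set(ordinary_counts))
    journal_total = sum(journal_counts.values())
    ordinary_total = sum(ordinary_counts.values())
    p_journal = (journal_counts[word] + smoothing) / (journal_total + smoothing * vocab_size)
    p_ordinary = (ordinary_counts[word] + smoothing) / (ordinary_total + smoothing * vocab_size)
    return math.log(p_journal / p_ordinary)


# Placeholder training data, purely illustrative.
JOURNAL_TITLES = [
    "Journal of Applied Physics",
    "The Serials Librarian",
    "Journal of Documentation",
]
ORDINARY_TEXT = [
    "the cat sat on the mat",
    "we walked to the library of the town",
]

journal_counts = word_counts(JOURNAL_TITLES)
ordinary_counts = word_counts(ORDINARY_TEXT)

if __name__ == "__main__":
    for w in ("journal", "the"):
        print(w, round(journal_log_odds(w, journal_counts, ordinary_counts), 3))
```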

And "fun" problems abound in even finding delimiters between parts of
a citation -- are we using ',' or '.' or something else? Is that
delimiter used in other contexts? What happens if a citation's
missing a delimiter...?
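
As a very crude illustration of the delimiter question, the Python sketch
below guesses a field delimiter by counting which punctuation mark is most
often followed by whitespace. The candidate list and the function name are
assumptions for the example, and it deliberately does nothing clever about
the "used in other contexts" and "missing delimiter" cases.

```python
import re
from collections import Counter

# Hypothetical set of punctuation marks that might separate citation fields.
CANDIDATES = [",", ".", ";", ":"]


def guess_delimiter(citation):
    """Return the candidate mark most often followed by whitespace, or None."""
    counts = Counter()
    for mark in CANDIDATES:
        # Only count marks followed by whitespace, so a '.' inside "12.5"
        # is not treated as a field boundary.
        counts[mark] = len(re.findall(re.escape(mark) + r"\s", citation))
    mark, occurrences = counts.most_common(1)[0]
    return mark if occurrences > 0 else None


if __name__ == "__main__":
    print(guess_delimiter("Smith J; Parsing citations; Journal of Examples; 1999; 12; 34-56"))
```

Note that initials such as "J. " inflate the count for '.', which is exactly
the kind of ambiguity that makes a fixed rule fragile.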


Have people been able to do a decent job of identifying parts of
speech in natural language?


Yeah... good PoS taggers (I'm looking at a paper on a Markov
model-based tagger now) run in the 95-98% accuracy range. The standard
dataset, however, seems to be a collection of Wall Street Journal
articles, which are gonna be cleaner than our citation listings. Then
again, general language is more complex than citations, so... who knows?

Oddly, the literature has been relatively quiet on this topic for the
last few years -- lots of papers from the late '90s, but not so much
in the last couple years. But check Scholar; there's a lot to build on.

-n


Re: [CODE4LIB] Citation parsing?

2007-07-20 Thread Eric Hellman

On Jul 18, 2007, at 10:04 PM, Eric Hellman wrote:
Also, even in (many) scholarly journals, editorial consistency is
almost unbelievably poor -- lots of times, the rules just aren't
followed. Punctuation gets missed, journal names (especially
abbreviations!) are misspelled... and so on. Rule-based and heuristic
systems are always going to have problems in those cases.


Heuristics are perhaps the only way to deal with lack of consistent
format. (e.g. "a cluster of words including 'journal of' is likely to
contain a journal name")
If you have a halfway decent journal name parser (such as the one in
our openurl software) it already contains a large list of journal
misspellings.



In a lot of ways, I think the problem is fundamentally similar to
identifying parts of speech in natural language (which has lots of
the same ambiguities) -- and the same techniques that succeed there
will probably yield the most robust results for citation parsing.


Have people been able to do a decent job of identifying parts of
speech in natural language?
--

Eric Hellman, Director
OCLC Openly Informatics Division
[EMAIL PROTECTED]  [EMAIL PROTECTED]
2 Broad St., Suite 208, Bloomfield, NJ 07003
tel 1-973-509-7800 fax 1-734-468-6216
http://openly.oclc.org/1cate/  1 Click Access To Everything


Re: [CODE4LIB] Citation parsing?

2007-07-19 Thread Nathan Vack

On Jul 18, 2007, at 10:04 PM, Eric Hellman wrote:


Anyway, almost all parsers rely on a set of heuristics. I have not
seen any parsers that do a good job of managing their heuristics in a
scaleable way. A successful open-source attack on this problem would
have the following characteristics:
1. able to efficiently handle and manage large numbers of parsing and
scoring heuristics
2. easy for contributors to add parsing and scoring heuristics
3. able to use contextual information (is the citation from a physics
article or from a history monograph?) in application  and scoring of
heuristics


One of the more problematic things is that we don't always get the
contextual information as to where citations occurred -- in fact,
it's quite rare to get that.

Also, even in (many) scholarly journals, editorial consistency is
almost unbelievably poor -- lots of times, the rules just aren't
followed. Punctuation gets missed, journal names (especially
abbreviations!) are misspelled... and so on. Rule-based and heuristic
systems are always going to have problems in those cases.

In a lot of ways, I think the problem is fundamentally similar to
identifying parts of speech in natural language (which has lots of
the same ambiguities) -- and the same techniques that succeed there
will probably yield the most robust results for citation parsing.

-n


Re: [CODE4LIB] Citation parsing?

2007-07-18 Thread Eric Hellman

Having written a pretty decent citation parser 10 years ago (in
AppleScript!), and having seen a lot of people take whacks at the
problem, I have to say that it's pretty easy to write one that works
on 70-80% of citations, particularly if you stick to one scholarly
subject area. On the other hand, it's really quite hard to write a
citation parser that gets better than 90% of citations for a general
corpus.

The main problem is that scholarly works are written by creative,
ingenious people. When applied to citations, creativity and ingenuity
are disasters for automated parsers.

Parsers working on the computer science literature have come the
farthest, mostly because the convention in computer science
literature is to always include the article title. The most
impressive thing to me about Google Scholar when it was first
released was to see how far they had taken the citation parsing
outside of the computer science literature. Still, they have a ways
to go; most of the progress they've made seems to be by cheating
(i.e. backing the citation out of the linking, which means they're
just piggybacking on the work done by Inera and others).

(Hint: one of the very best performing open source citation parsers
was written (in Perl) by Steve Lawrence, who was at NEC at the time,
as part of ResearchIndex AKA CiteSeer. It was released as pseudo open
source, but is not so easy to separate out. It relied heavily on the
availability of the article title. Steve has been at Google for a
while. Steve apparently wasn't involved in Scholar, but you have to
assume he and Anurag did a fair amount of comparing notes.)

Anyway, almost all parsers rely on a set of heuristics. I have not
seen any parsers that do a good job of managing their heuristics in a
scaleable way. A successful open-source attack on this problem would
have the following characteristics:
1. able to efficiently handle and manage large numbers of parsing and
scoring heuristics
2. easy for contributors to add parsing and scoring heuristics
3. able to use contextual information (is the citation from a physics
article or from a history monograph?) in application  and scoring of
heuristics
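
One way the three characteristics above could be sketched in Python: a
registry that contributors add to with a decorator, a weight per heuristic
for scoring, and an optional set of contexts in which a heuristic applies.
Everything here (the Heuristic class, the decorator, the three example rules
and their weights) is hypothetical and only meant to show the shape of the
thing, not an existing parser.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Heuristic:
    name: str
    field: str
    extract: Callable[[str], Optional[str]]
    weight: float = 1.0
    contexts: Optional[frozenset] = None  # None means "applies in any context"


REGISTRY = []


def heuristic(field, weight=1.0, contexts=None):
    """Decorator so a contributor can register a new heuristic in one line."""
    def register(func):
        REGISTRY.append(Heuristic(func.__name__, field, func, weight,
                                  frozenset(contexts) if contexts else None))
        return func
    return register


@heuristic("year", weight=2.0)
def four_digit_year(text):
    match = re.search(r"\b(1[5-9]\d{2}|20\d{2})\b", text)
    return match.group(1) if match else None


@heuristic("journal", weight=1.0)
def journal_of_phrase(text):
    match = re.search(r"\b(Journal of [A-Z][A-Za-z ]*)", text)
    return match.group(1).strip() if match else None


@heuristic("journal", weight=3.0, contexts={"physics"})
def phys_rev_abbreviation(text):
    match = re.search(r"\b(Phys\. Rev\.(?: [A-E]| Lett\.)?)", text)
    return match.group(1) if match else None


def parse(text, context=None):
    """Run every applicable heuristic and keep the highest-weight value per field."""
    best = {}
    for h in REGISTRY:
        if h.contexts is not None and context not in h.contexts:
            continue  # heuristic is restricted to contexts we are not in
        value = h.extract(text)
        if value is None:
            continue
        if h.field not in best or h.weight > best[h.field][1]:
            best[h.field] = (value, h.weight)
    return {field: value for field, (value, _weight) in best.items()}


if __name__ == "__main__":
    print(parse("A. Author, Phys. Rev. B 12, 345 (1999)", context="physics"))
```

Hand-set weights like these are the part that gets hard to manage as the
number of heuristics grows.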

Eric




--

Eric Hellman, Director
OCLC Openly Informatics Division
[EMAIL PROTECTED]  [EMAIL PROTECTED]
2 Broad St., Suite 208, Bloomfield, NJ 07003
tel 1-973-509-7800 fax 1-734-468-6216
http://openly.oclc.org/1cate/  1 Click Access To Everything


Re: [CODE4LIB] Citation parsing?

2007-07-18 Thread Nathan Vack

It's on our list of Big Problems To Solve; I'm hoping to have time to
tackle it later this year :)

-n

On Jul 18, 2007, at 12:57 PM, Jonathan Rochkind wrote:


Ha! If it's not too difficult, then with all the time you've spent
"looking at it extensively", how come you don't have a solution yet?

Just kidding. :)

Jonathan




Re: [CODE4LIB] Citation parsing?

2007-07-18 Thread Jonathan Rochkind

Ha! If it's not too difficult, then with all the time you've spent
"looking at it extensively", how come you don't have a solution yet?

Just kidding. :)

Jonathan

Nathan Vack wrote:

We've looked at this pretty extensively, and we're pretty certain
there's nothing downloadable that does a "good enough" job. However,
it's by no means impossible -- it seems to be undergrad thesis-level
work in Singapore:

http://wing.comp.nus.edu.sg/parsCit/

There used to be a paper describing this approach (essentially
treating citation parsing as a natural language processing task and
using a maximum entropy algorithm) online... the page even cites
it... but it seems to be gone now.

FWIW, it didn't look too difficult.

-Nate






--
Jonathan Rochkind
Sr. Programmer/Analyst
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu


Re: [CODE4LIB] Citation parsing?

2007-07-18 Thread Nathan Vack

We've looked at this pretty extensively, and we're pretty certain
there's nothing downloadable that does a "good enough" job. However,
it's by no means impossible -- it seems to be undergrad thesis-level
work in Singapore:

http://wing.comp.nus.edu.sg/parsCit/

There used to be a paper describing this approach (essentially
treating citation parsing as a natural language processing task and
using a maximum entropy algorithm) online... the page even cites
it... but it seems to be gone now.

FWIW, it didn't look too difficult.

-Nate

On Jul 17, 2007, at 6:16 PM, Jonathan Rochkind wrote:


Does anyone have any decent open source code to parse a citation? I'm
talking about a completely narrative citation like someone might
cut-and-paste from a bibliography or web page. I realize there are a
number of different formats this could be in (not to mention the human
error problems that always occur from human entered free text)--but
thinking about it, I suspect that with some work you could get something
that worked reasonably well (if not perfect). So I'm wondering if anyone
has done this work.

(One of the commercial legal products--I forget if it's Lexis or
West--does this with legal citations--a more limited domain--quite
well.  I'm not sure if any of the commercial bibliographic citation
management software does this?)

The goal, as you can probably guess, is a box that the user can paste a
citation into; make an OpenURL out of it; show the user where to get the
citation.  I'm pretty confident something useful could be created here,
with enough time put into it. But sadly, it's probably more time than
anyone has individually. Unless someone's done it already?

Hopefully,
Jonathan



Re: [CODE4LIB] Citation parsing?

2007-07-18 Thread Alberto Accomazzi

Hi Jonathan,

There is a Perl module by Mike Jewell which was written for this purpose:
http://search.cpan.org/~mjewell/Biblio-Citation-Parser-1.10/

I am not using the code, so I can't comment on how well it may work for
your purpose, but it's probably worth a look.

-- Alberto



On 7/17/07, Jonathan Rochkind <[EMAIL PROTECTED]> wrote:

Does anyone have any decent open source code to parse a citation? I'm
talking about a completely narrative citation like someone might
cut-and-paste from a bibliography or web page. I realize there are a
number of different formats this could be in (not to mention the human
error problems that always occur from human entered free text)--but
thinking about it, I suspect that with some work you could get something
that worked reasonably well (if not perfect). So I'm wondering if anyone
has done this work.

(One of the commercial legal products--I forget if it's Lexis or
West--does this with legal citations--a more limited domain--quite
well.  I'm not sure if any of the commercial bibliographic citation
management software does this?)

The goal, as you can probably guess, is a box that the user can paste a
citation into; make an OpenURL out of it; show the user where to get the
citation.  I'm pretty confident something useful could be created here,
with enough time put into it. But sadly, it's probably more time than
anyone has individually. Unless someone's done it already?

Hopefully,
Jonathan



Dr. Alberto Accomazzi  aaccomazzi(at)cfa harvard edu
NASA Astrophysics Data System  ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics  www.cfa.harvard.edu
60 Garden St, MS 67, Cambridge, MA 02138, USA



Re: [CODE4LIB] Citation parsing?

2007-07-18 Thread Godmar Back

On 7/18/07, Jonathan Rochkind <[EMAIL PROTECTED]> wrote:

Nice, that might be what I need. Maybe I'll take a look at the LibX
code, it's open source, right?

Google Scholar has no API--you're screen scraping it?



Yes and yes.

- Godmar


Re: [CODE4LIB] Citation parsing?

2007-07-18 Thread Godmar Back

On 7/18/07, Steve Toub <[EMAIL PROTECTED]> wrote:


Agreed that a lookup against something like Google Scholar, Web of
Science, or a set of federated search targets instance may yield better
results. We've discussed it but haven't done any testing.


Use your LibX edition, Steve. I can also send a draft of the paper
that describes our method to anyone interested.

Which reminds me: our new edition builder interface is up and being
rolled out agile style.
The URL is http://libx.org/editionbuilder if people would like to give it a spin.

- Godmar


Re: [CODE4LIB] Citation parsing?

2007-07-18 Thread Jonathan Rochkind

Nice, that might be what I need. Maybe I'll take a look at the LibX
code, it's open source, right?

Google Scholar has no API--you're screen scraping it?

Jonathan

Godmar Back wrote:

A year or so ago a couple of students looked into this for LibX. There
are a number of systems that people have published about, although
some are not available and none worked very well or were easy to get
to work. The systems also varied in their computational complexity,
with some not suitable for interactive use. Google for "libx citation
sensing", or generally for citation extraction, automatic record
boundary detection or extraction. (Unfortunately, pubs.dlib.vt.edu
appears to be down at the moment - otherwise, Suresh Menon's report
contains a useful bibliography of work. I'll ping them.)

For citations that contain item titles (which is true for a majority,
but definitely not all, citation styles), LibX's magic button uses
Scholar as a hidden backend to produce an actionable OpenURL. Combined
with a similarity analysis, this "magic button" functionality
produces a usable OpenURL in (on average) 81% of cases for a set of
400 randomly chosen citations from 4 widely read journals from 4
different areas published in 2006 [1].  With some fixes, we could
probably get this number up to 90%. Obviously, this approach only
works for individual use; Google would object to large-scale batch
use.

- Godmar

[1] Annette Bailey and Godmar Back, Retrieving Known Items with LibX.
The Serials Librarian, 2007. To appear.






--
Jonathan Rochkind
Sr. Programmer/Analyst
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu


Re: [CODE4LIB] Citation parsing?

2007-07-18 Thread Steve Toub

Godmar Back wrote:

A year or so ago a couple of students looked into this for LibX. There
are a number of systems that people have published about, although
some are not available and none worked very well or were easy to get
to work. The systems also varied in their computational complexity,
with some not suitable for interactive use. Google for "libx citation
sensing", or generally for citation extraction, automatic record
boundary detection or extraction. (Unfortunately, pubs.dlib.vt.edu
appears to be down at the moment - otherwise, Suresh Menon's report
contains a useful bibliography of work. I'll ping them.)


I've tested ParaTools, but after it choked on most of its own examples,
I tried looking elsewhere.

Inera's eXtyles refXpress claims to do this. You can see it in action
at: . Better than ParaTools
but still missed a lot of things I thought would have been obvious.
Inera said most of the issues I picked out were a problem with
CrossRef's implementation, but the cost of the product was so great that
I didn't explore further.

There was an interesting paper at JCDL 2007 on an unsupervised way of
doing this that had promising results, but I haven't found any of
their code online.


For citations that contain item titles (which is true for a majority,
but definitely not all, citation styles), LibX's magic button uses
Scholar as a hidden backend to produce an actionable OpenURL. Combined
with a similarity analysis, this "magic button" functionality
produces a usable OpenURL in (on average) 81% of cases for a set of
400 randomly chosen citations from 4 widely read journals from 4
different areas published in 2006 [1].  With some fixes, we could
probably get this number up to 90%. Obviously, this approach only
works for individual use; Google would object to large-scale batch
use.


Agreed that a lookup against something like Google Scholar, Web of
Science, or a set of federated search targets instance may yield better
results. We've discussed it but haven't done any testing.
   --SET



- Godmar

[1] Annette Bailey and Godmar Back, Retrieving Known Items with LibX.
The Serials Librarian, 2007. To appear.






Re: [CODE4LIB] Citation parsing?

2007-07-17 Thread Godmar Back

A year or so ago a couple of students looked into this for LibX. There
are a number of systems that people have published about, although
some are not available and none worked very well or were easy to get
to work. The systems also varied in their computational complexity,
with some not suitable for interactive use. Google for "libx citation
sensing", or generally for citation extraction, automatic record
boundary detection or extraction. (Unfortunately, pubs.dlib.vt.edu
appears to be down at the moment - otherwise, Suresh Menon's report
contains a useful bibliography of work. I'll ping them.)

For citations that contain item titles (which is true for a majority,
but definitely not all, citation styles), LibX's magic button uses
Scholar as a hidden backend to produce an actionable OpenURL. Combined
with a similarity analysis, this "magic button" functionality
produces a usable OpenURL in (on average) 81% of cases for a set of
400 randomly chosen citations from 4 widely read journals from 4
different areas published in 2006 [1].  With some fixes, we could
probably get this number up to 90%. Obviously, this approach only
works for individual use; Google would object to large-scale batch
use.
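
For readers wondering what a "similarity analysis" step might look like, here
is a small Python sketch. It is not LibX's actual code or algorithm: it
assumes you already have a list of candidate titles back from some search
backend, uses the standard library's difflib as a stand-in similarity
measure, and the 0.8 threshold and function names are arbitrary choices for
the example.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", title.lower()).split())


def title_similarity(a, b):
    """Similarity in [0, 1] between two titles after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(citation_title, candidate_titles, threshold=0.8):
    """Return the candidate most similar to the citation title, or None
    if nothing reaches the (assumed) threshold."""
    scored = [(title_similarity(citation_title, c), c) for c in candidate_titles]
    if not scored:
        return None
    score, title = max(scored)
    return title if score >= threshold else None


if __name__ == "__main__":
    candidates = ["Retrieving Known Items with LibX.", "Something unrelated entirely"]
    print(best_match("Retrieving known items with LibX", candidates))
```

Rejecting low-scoring candidates matters because a search backend will
usually return something even when the cited item is not indexed.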

- Godmar

[1] Annette Bailey and Godmar Back, Retrieving Known Items with LibX.
The Serials Librarian, 2007. To appear.

On 7/17/07, Jonathan Rochkind <[EMAIL PROTECTED]> wrote:

Does anyone have any decent open source code to parse a citation? I'm
talking about a completely narrative citation like someone might
cut-and-paste from a bibliography or web page. I realize there are a
number of different formats this could be in (not to mention the human
error problems that always occur from human entered free text)--but
thinking about it, I suspect that with some work you could get something
that worked reasonably well (if not perfect). So I'm wondering if anyone
has done this work.

(One of the commercial legal products--I forget if it's Lexis or
West--does this with legal citations--a more limited domain--quite
well.  I'm not sure if any of the commercial bibliographic citation
management software does this?)

The goal, as you can probably guess, is a box that the user can paste a
citation into; make an OpenURL out of it; show the user where to get the
citation.  I'm pretty confident something useful could be created here,
with enough time put into it. But sadly, it's probably more time than
anyone has individually. Unless someone's done it already?

Hopefully,
Jonathan



[CODE4LIB] Citation parsing?

2007-07-17 Thread Jonathan Rochkind

Does anyone have any decent open source code to parse a citation? I'm
talking about a completely narrative citation like someone might
cut-and-paste from a bibliography or web page. I realize there are a
number of different formats this could be in (not to mention the human
error problems that always occur from human entered free text)--but
thinking about it, I suspect that with some work you could get something
that worked reasonably well (if not perfect). So I'm wondering if anyone
has done this work.

(One of the commercial legal products--I forget if it's Lexis or
West--does this with legal citations--a more limited domain--quite
well.  I'm not sure if any of the commercial bibliographic citation
management software does this?)

The goal, as you can probably guess, is a box that the user can paste a
citation into; make an OpenURL out of it; show the user where to get the
citation.  I'm pretty confident something useful could be created here,
with enough time put into it. But sadly, it's probably more time than
anyone has individually. Unless someone's done it already?
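
The "make an OpenURL out of it" half is the easy part once the fields are
known. A minimal Python sketch, assuming the parser has already produced a
dict of fields: the resolver base URL and the input field names are
placeholders invented for the example, while the rft.* keys are the usual
OpenURL 1.0 (Z39.88-2004) key/encoded-value names for journal articles.

```python
from urllib.parse import urlencode

# Placeholder resolver address; substitute your institution's link resolver.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

# Hypothetical parser field names mapped to OpenURL KEV journal keys.
FIELD_TO_KEV = {
    "article_title": "rft.atitle",
    "journal": "rft.jtitle",
    "year": "rft.date",
    "volume": "rft.volume",
    "start_page": "rft.spage",
    "author_last": "rft.aulast",
}


def build_openurl(fields, base=RESOLVER_BASE):
    """Build an OpenURL 1.0 query string from whatever fields were parsed."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    for name, key in FIELD_TO_KEV.items():
        value = fields.get(name)
        if value:  # skip fields the parser could not find
            params.append((key, str(value)))
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_openurl({
        "journal": "The Serials Librarian",
        "year": 2007,
        "author_last": "Bailey",
    }))
```

Missing fields are simply left out, so a partial parse still yields a link
the resolver can try.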

Hopefully,
Jonathan