We've looked at this pretty extensively, and we're fairly certain there's nothing downloadable that does a "good enough" job. However, it's by no means impossible. It seems to be undergrad thesis-level work in Singapore:
http://wing.comp.nus.edu.sg/parsCit/

There used to be a paper online describing this approach (essentially treating citation parsing as a natural language processing task and using a maximum entropy algorithm); the page even cites it, but it seems to be gone now. FWIW, it didn't look too difficult.

-Nate

On Jul 17, 2007, at 6:16 PM, Jonathan Rochkind wrote:
Does anyone have any decent open source code to parse a citation? I'm talking about a completely narrative citation, like someone might cut-and-paste from a bibliography or web page. I realize there are a number of different formats this could be in (not to mention the human-error problems that always occur with human-entered free text), but thinking about it, I suspect that with some work you could get something that worked reasonably well, if not perfectly. So I'm wondering if anyone has done this work.

(One of the commercial legal products, I forget whether it's Lexis or West, does this quite well with legal citations, a more limited domain. I'm not sure whether any of the commercial bibliographic citation management software does this.)

The goal, as you can probably guess, is a box that the user can paste a citation into; we make an OpenURL out of it and show the user where to get it. I'm pretty confident something useful could be created here with enough time put into it. But sadly, it's probably more time than anyone has individually. Unless someone's done it already?

Hopefully,
Jonathan
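For illustration, the paste-parse-OpenURL pipeline described above might be sketched roughly like this. It is a minimal sketch under loud assumptions: instead of the maximum entropy approach ParsCit describes, it uses a single hand-written regular expression that only handles one narrow author-date journal style, and the resolver base URL is a made-up placeholder. The output is an OpenURL 1.0 (Z39.88-2004) KEV query using the journal metadata format.

```python
import re
from urllib.parse import urlencode

# Hypothetical link resolver base URL; substitute a real institutional resolver.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

# Naive pattern for ONE citation shape only (an assumption, not a general parser):
#   Lastname, F. (Year). Article title. Journal Title, Volume(Issue), Start-End.
# Titles containing periods or journals containing commas will not parse.
CITATION_RE = re.compile(
    r"^(?P<aulast>[^,]+),\s*(?P<auinit>[^(]+?)\s*"
    r"\((?P<date>\d{4})\)\.\s*"
    r"(?P<atitle>[^.]+)\.\s*"
    r"(?P<jtitle>[^,]+),\s*"
    r"(?P<volume>\d+)(?:\((?P<issue>\d+)\))?,\s*"
    r"(?P<spage>\d+)(?:[-\u2013](?P<epage>\d+))?"
)


def parse_citation(text):
    """Return a dict of citation fields, or None if the text doesn't match."""
    m = CITATION_RE.match(text.strip())
    if m is None:
        return None
    # Drop optional groups that did not match (e.g. a missing issue number).
    return {k: v.strip() for k, v in m.groupdict().items() if v}


def to_openurl(fields, base=RESOLVER_BASE):
    """Build an OpenURL 1.0 KEV link for a journal article from parsed fields."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    # Only emit the referent keys we actually extracted.
    for key in ("aulast", "auinit", "atitle", "jtitle",
                "date", "volume", "issue", "spage", "epage"):
        if key in fields:
            params.append(("rft." + key, fields[key]))
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # Invented example citation, used only to exercise the sketch.
    cite = ("Smith, J. (2005). Parsing free-text citations. "
            "Journal of Library Automation, 12(3), 45-67.")
    fields = parse_citation(cite)
    if fields is None:
        print("Could not parse citation")
    else:
        print(to_openurl(fields))
```

A real tool would replace the regex with a trained sequence labeler (as ParsCit does) and fall back gracefully when a field is missing; the OpenURL step stays the same regardless of how the fields are extracted.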