On 1 Oct 2007, at 8:23 PM, Michael McCracken wrote:

> On 10/1/07, Christiaan Hofman <[EMAIL PROTECTED]> wrote:
>> I haven't had time to test out or look at the new web group changes.
>> So I don't know what kind of scrapers you have.
>
> citeulike.org, hubmed.org, ACM digital library, and google scholar.
>
>> But would it be
>> possible to add a scraper for arXiv, perhaps? I haven't looked
>> much at what they have added there to make this easier.
>
> I don't see an easy way to get an official BibTeX entry, but it looks
> like we could get the title, authors, and abstract from their HTML,
> which is well formed and uses meaningful HTML class names. If a
> paper has been accepted by a journal, it gets a journal-ref, which would
> give us the journal name/number, pages, etc. That field isn't marked up,
> but it might be in a common format that's regex-parseable.
>
> However, maybe the better way to get arXiv support would be to look at
> the Open Archives Initiative (OAI) support they have:
> http://arxiv.org/help/oa
>
> Other sites use that as well, so if we could just scrape an identifier
> from the page and then fetch the metadata from their OAI-PMH server,
> that might be a good way to support more sites. I'll make a note to
> look into that more, but I can't do it right now.
>
> -mike
>

I was also thinking of that. We can already parse OAI, as there was a
search group for it at some point (but it wasn't very useful, as OAI
has no interface for searches). It may also be useful to be able to
scrape the list pages as well as the abstract pages.
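A minimal sketch of the "scrape an identifier, then ask the OAI-PMH server" idea, in Python. It assumes arXiv's OAI-PMH base URL is http://export.arxiv.org/oai2 and that its OAI identifiers have the form oai:arXiv.org:<arXiv id>; the function names are hypothetical and not part of BibDesk. It pulls the arXiv id out of an abstract-page URL, builds a standard OAI-PMH GetRecord request for the oai_dc (Dublin Core) format, and reads the title, creators and description from the response.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: arXiv's OAI-PMH endpoint; check http://arxiv.org/help/oa for the current one.
OAI_BASE = "http://export.arxiv.org/oai2"

# New-style ids (e.g. 0704.0001) and old-style ids (e.g. hep-th/9901001),
# with an optional version suffix (e.g. v2) that is matched but not captured.
ARXIV_ABS_RE = re.compile(
    r"arxiv\.org/abs/"
    r"(?P<id>\d{4}\.\d{4,5}|[a-z\-]+(?:\.[A-Z]{2})?/\d{7})"
    r"(?:v\d+)?"
)

NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def arxiv_id_from_url(url):
    """Return the arXiv id found in an abstract-page URL, or None."""
    m = ARXIV_ABS_RE.search(url)
    return m.group("id") if m else None


def oai_getrecord_url(arxiv_id, base=OAI_BASE):
    """Build an OAI-PMH GetRecord request for the Dublin Core record."""
    params = {
        "verb": "GetRecord",
        "identifier": "oai:arXiv.org:" + arxiv_id,
        "metadataPrefix": "oai_dc",
    }
    return base + "?" + urlencode(params)


def parse_oai_dc(xml_text):
    """Extract title, authors and abstract from an oai_dc GetRecord response."""
    root = ET.fromstring(xml_text)
    dc = root.find(".//oai_dc:dc", NS)
    if dc is None:
        raise ValueError("response contains no oai_dc record")

    def texts(tag):
        # Collapse line breaks and indentation inside each element's text.
        return [" ".join(e.text.split()) for e in dc.findall("dc:" + tag, NS) if e.text]

    titles = texts("title")
    abstracts = texts("description")
    return {
        "title": titles[0] if titles else "",
        "authors": texts("creator"),
        "abstract": abstracts[0] if abstracts else "",
    }


def fetch_record(abs_url):
    """Network step (not run here): abstract-page URL -> metadata dict."""
    arxiv_id = arxiv_id_from_url(abs_url)
    if arxiv_id is None:
        raise ValueError("not an arXiv abstract URL: " + abs_url)
    with urllib.request.urlopen(oai_getrecord_url(arxiv_id)) as resp:
        return parse_oai_dc(resp.read())
```

Only the id-extraction regex and the base URL are site-specific here; the GetRecord request and oai_dc parsing are the same for any OAI-PMH repository, which is what would make this reusable for other sites. Dublin Core gives one free-text creator element per author, so the list would still need joining with " and " for a BibTeX author field.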

Christiaan

>
>> On 1 Oct 2007, at 7:53 PM, Michael McCracken wrote:
>>
>>> Has anyone had a chance to try out the web group?
>>>
>>> Let me know if there is a site you use to find papers that I could add
>>> support for, if it would help you test things.
>>>
>>> Or does everyone else already enjoy good search support through the
>>> Z39.50 stuff, and it's only those of us in CS with backwards
>>> publishers?
>>>
>>> -mike
>>>
>>> --
>>> Michael McCracken
>>> UCSD CSE PhD Candidate
>>> research: http://www.cse.ucsd.edu/~mmccrack/
>>> misc: http://michael-mccracken.net/wp/
>>>
>>
>> -------------------------------------------------------------------------
>> This SF.net email is sponsored by: Microsoft
>> Defy all challenges. Microsoft(R) Visual Studio 2005.
>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
>> _______________________________________________
>> Bibdesk-develop mailing list
>> Bibdesk-develop@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/bibdesk-develop
>>
>
>

