This is an oddity & a bit of a mystery to me.

The odd thing is that Google supposedly no longer uses OAI-PMH at all. 
They retired support for it back in 2008:
http://googlewebmastercentral.blogspot.com/2008/04/retiring-support-for-oai-pmh-in.html

That being said, I believe Google Scholar runs its own separate crawlers 
(i.e. separate from the main Google crawlers). But, I didn't think they 
used OAI-PMH either.  So, this is news to me that they'd even be finding 
content via OAI-PMH.

I tend to agree that this sounds like it might be something where Google 
Scholar's crawlers are now acting slightly differently than they did 
before.  I'll bring this up in today's Developers Meeting to see if 
anyone else has seen this, or can reach out to Anurag @ Google Scholar 
about it.

- Tim

On 11/14/2012 11:29 AM, Reinhard Engels wrote:
> Hi Helix,
>
> Thanks for the quick reply!
>
> I saw that posting earlier, but from what I can see in my logs, it
> really looks like the METS/ORE links are the explanation.
>
> Both the timing, and the scope of the issue seem to fit (time when OAI
> requests were made vs. pdf.txt requests, the subset of records that
> were crawled this way).
>
> I do see some evidence of redirects, but (from eyeballing them) it
> looks like Googlebot is successfully following them and getting a 200
> for the pdf. Also, even if this has something to do with redirects,
> this is how dspace always behaved and it wasn't an issue until quite
> recently.
>
> I think this may be more of a google scholar problem than a dspace
> problem. Do you guys have an inside connection over there you can run
> this by? (I tried Anurag, but I may not have his current contact
> info). If they make their crawler just work like it used to, I think
> this problem will go away, without tons of dspace installations having
> to be upgraded.
>
> Thanks for the invitation to the developer chat -- I don't think I can
> make it but I appreciate the invitation. I'm happy to investigate more
> methodically if you guys need more data.
>
> Reinhard
>
> On Wed, Nov 14, 2012 at 12:00 PM, helix84 <heli...@centrum.sk> wrote:
>> Hi Reinhard,
>>
>> I just wanted to check with you whether this might be the problem:
>>
>> http://www.mail-archive.com/dspace-tech@lists.sourceforge.net/msg18831.html
>>
>> Please try to answer ASAP. If there's really a problem, we might take
>> it up at today's developer meeting at 20:00 UTC. You're welcome to
>> attend.
>>
>> https://wiki.duraspace.org/display/DSPACE/Developer+Meetings
>>
>>
>> Regards,
>> ~~helix84
>>
>> Compulsory reading: DSpace Mailing List Etiquette
>> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
>
> ------------------------------------------------------------------------------
> Monitor your physical, virtual and cloud infrastructure from a single
> web console. Get in-depth insight into apps, servers, databases, vmware,
> SAP, cloud infrastructure, etc. Download 30-day Free Trial.
> Pricing starts from $795 for 25 servers or applications!
> http://p.sf.net/sfu/zoho_dev2dev_nov
> _______________________________________________
> DSpace-tech mailing list
> DSpace-tech@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>
