Hi,
FYI, our server RERO DOC has been successfully harvested by OAIster
lately, roughly on a weekly basis (last harvesting took place on
09.12.2008). I just made a few tests on their search interface in
order to confirm this, and I can indeed find several of our records
dating from 08.12.2008.
This might be useful: we currently check OAIster activity on our
server simply by grepping the Apache log for '141.211.175.166', which
is obviously quite a fallible method, but has been effective for more
than a year now. I don't even know if they use several harvesting
servers or this single one, but that might help you out for the time
being.
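
For anyone who wants a bit more than a raw grep, here is a minimal
sketch of the same check in Python. It assumes the standard Apache
common/combined log format, and the helper name hits_per_day is made
up for this example. It only counts hits per day for one client IP, so
it is as fallible as the grep if OAIster harvests from other addresses.

```python
import re
from collections import Counter

# Assumes Apache common/combined log format, e.g.
# 141.211.175.166 - - [09/Dec/2008:14:11:02 +0100] "GET ... HTTP/1.1" 200 512
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):')


def hits_per_day(lines, ip):
    """Count requests per day (dd/Mon/yyyy) made by one client IP.

    Hypothetical helper for illustration; `lines` is any iterable of
    access-log lines, such as an open file.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("ip") == ip:
            counts[match.group("day")] += 1
    return counts
```

Typical use would be to open the access log and call
hits_per_day(log_file, "141.211.175.166"), then look at which days
show up.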
Regards,
Miguel.
On Dec 9, 2008, at 14:11, Jerome Caffaro wrote:
Hi Ferran,
Ferran Jorba wrote:
> [..] may we ask you why the default value is
> set to 10 seconds?
I guess the default CFG_OAI_SLEEP value was set to 10 seconds more
or less arbitrarily: to avoid being flooded with requests that
result in potentially bandwidth and CPU hungry responses, a delay
between each request can be necessary, depending on your hardware/
network configuration (do not forget that several harvesters might
try to access the OAI gateway at the same time).
> Is it safe to lower it to zero?
Yes, but you might want to monitor your server load to ensure that
it can serve both regular users and requests on the OAI gateway. At
CERN we kept the value of CFG_OAI_SLEEP at 10, and we never noticed
any problem.
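
To illustrate the kind of throttling discussed here, below is a
minimal sketch in Python. It is not the CDS Invenio code: it assumes a
gateway that answers 503 with a Retry-After header when a request
arrives sooner than the configured delay, and a harvester that reads
that header. The names OaiThrottle and wait_time are invented for the
example, and only the integer-seconds form of Retry-After is handled
(the header may also carry an HTTP date).

```python
import math
import time


class OaiThrottle:
    """Allow one request per `sleep_seconds` (illustrative sketch only)."""

    def __init__(self, sleep_seconds, clock=time.monotonic):
        self.sleep_seconds = sleep_seconds
        self.clock = clock
        self.last_request = None  # time of the last request that was served

    def check(self):
        """Return (status, headers) for an incoming request."""
        now = self.clock()
        if self.last_request is not None:
            remaining = self.sleep_seconds - (now - self.last_request)
            if remaining > 0:
                # Too soon: ask the harvester to come back later.
                return 503, {"Retry-After": str(math.ceil(remaining))}
        self.last_request = now
        return 200, {}


def wait_time(status, headers, default=10):
    """Harvester side: seconds to wait before retrying (0 if served)."""
    if status != 503:
        return 0
    value = headers.get("Retry-After", "")
    # Fall back to `default` if the header is missing or not an integer.
    return int(value) if value.isdigit() else default
```

With sleep_seconds set to 0 the check never throttles, which is the
situation you would be in after lowering the value to zero. A
harvester that ignores the 503 and retries immediately would simply
keep getting refused.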
> we are having some issues when being collected by OAIster, and we
> suspect that their robot doesn't obey the Retry-After HTTP header.
I do not think that Retry-After is a problem. However OAIster
attempts to validate your repository with the "OAI Repository
Explorer" <http://re.cs.uct.ac.za/>. You will notice that it fails
with the version of CDS Invenio you have installed: that's because
the validation tests have become much stricter than before, to
conform more closely to OAI-PMH. These validation problems have been
fixed in later versions of CDS Invenio.
So this means that you cannot yet register your repository in
OAIster or other similar services that validate the repository with
the "OAI Repository Explorer".
In addition, note that it is important to set the value of
CFG_OAI_IDENTIFY_DESCRIPTION correctly, since it contains data that is
checked by the validator. Take care to specify the correct base URL,
which must match the URL you submit (including any trailing "/"), and
do not leave spaces or line breaks inside tags like "<scheme>",
"<repositoryIdentifier>" etc., as is the case by default in the
invenio.conf file. You seem to have configured this correctly, but
I mention it here for others who might not be aware of these
details.
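
As a rough illustration of these two checks, here is a small Python
sketch that inspects an Identify response before you submit the
repository. It is not the logic of the "OAI Repository Explorer"; the
function name check_identify is hypothetical, and the list of tags is
an assumption based on the oai-identifier description format.

```python
import xml.etree.ElementTree as ET

# Elements of the oai-identifier description whose values should not
# contain any whitespace (assumed list, for illustration).
STRICT_TAGS = ("scheme", "repositoryIdentifier", "delimiter",
               "sampleIdentifier")


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def check_identify(xml_text, submitted_url):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    for elem in ET.fromstring(xml_text).iter():
        name = local_name(elem.tag)
        text = elem.text or ""
        if name == "baseURL" and text != submitted_url:
            # Exact comparison, so a missing trailing "/" is reported.
            problems.append("baseURL %r differs from submitted URL %r"
                            % (text, submitted_url))
        if name in STRICT_TAGS and any(ch.isspace() for ch in text):
            problems.append("whitespace inside <%s>: %r" % (name, text))
    return problems
```

You would feed it the body of your own Identify response together with
the exact URL you intend to register; an empty list means neither of
the two pitfalls above was detected.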
> Are you being successfully collected by OAIster?
Not sure: they seem to have only a subset of our data, but I do not
know exactly how they count records, if they do selective
harvesting, etc.
Best regards
--
Jerome Caffaro ** CERN Document Server ** <http://cds.cern.ch/>