I have run into this exact issue when running large bulk ingests. Fedora's 
resource consumption went through the roof and everything slowed to a crawl. 
Plus, the Resource Index (RI) queries Proai was executing always timed out anyway.

My solution was to completely circumvent the Proai polling system and manually 
populate the rcQueue table in MySQL. It is not necessarily elegant, but I thought 
the "pull on demand" approach that is baked into Proai was not appropriate for 
our system -- especially since we add objects in batches of thousands or tens 
of thousands. So this uses more of a "push on change" approach, which is much 
less resource intensive.

Incidentally, I am using Proai 1.2.2 and Fedora 3.5.

This involves first disabling PROAI polling (in the configuration file):

proai.driverPollingEnabled = false

and (in the backend database):

UPDATE rcAdmin SET pollingEnabled=0;
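If you flip this setting often, the configuration-file half can be scripted. Below is a minimal Python sketch; it assumes the Proai settings live in a Java-style properties file (assumed here to be proai.properties) with one "key = value" pair per line. The helper names set_driver_polling and polling_sql are hypothetical, and the sketch only rewrites text: writing the file back and executing the SQL are left to you.

```python
import re

# Matches the polling line in the properties file, e.g. "proai.driverPollingEnabled = false".
# [ \t]* is used instead of \s* so the pattern never swallows neighbouring newlines.
_POLLING_LINE = re.compile(
    r"^([ \t]*proai\.driverPollingEnabled[ \t]*=[ \t]*)\S*[ \t]*$", re.MULTILINE
)


def set_driver_polling(properties_text: str, enabled: bool) -> str:
    """Return properties_text with proai.driverPollingEnabled set to enabled.

    If the property is missing, it is appended as a new last line.
    """
    value = "true" if enabled else "false"
    if _POLLING_LINE.search(properties_text):
        # Keep the key and separator exactly as written; replace only the value.
        return _POLLING_LINE.sub(lambda m: m.group(1) + value, properties_text)
    suffix = "" if properties_text.endswith("\n") or not properties_text else "\n"
    return f"{properties_text}{suffix}proai.driverPollingEnabled = {value}\n"


def polling_sql(enabled: bool) -> str:
    """SQL for the matching flag in Proai's backend database."""
    return f"UPDATE rcAdmin SET pollingEnabled={1 if enabled else 0};"
```

For example, set_driver_polling(text, False) turns "proai.driverPollingEnabled = true" into "proai.driverPollingEnabled = false" and leaves every other line untouched.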

This raises the issue of how to populate the Proai queue. There is no API for 
this, nor is there any Proai documentation for doing this -- I had to read 
through the source code -- and there is no guarantee that this will work in the 
future. With that in mind, I am leveraging Fedora's messaging system (ActiveMQ) 
in such a way that whenever an object is added or updated, I add one or more 
entries to the Proai queue in MySQL. There are a lot of different ways to make 
this happen, but it all comes down to a SQL query such as:

INSERT INTO rcQueue
    (identifier, mdPrefix, sourceInfo, queueSource)
VALUES
    ('{pid}', '{mdPrefix}', '{sourceInfo}', 'R');
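When the row is built in code, it is safer to pass the values as query parameters than to splice them into the SQL string, since sourceInfo contains spaces and a stray quote would break the statement. A minimal Python sketch against any DB-API 2.0 cursor; the enqueue name and the placeholder argument are my own (MySQL drivers such as MySQLdb and PyMySQL use %s, sqlite3 uses ?).

```python
from typing import Any, Sequence

# Same column order as the INSERT statement above; {p} is filled with the
# driver's parameter placeholder.
INSERT_SQL = (
    "INSERT INTO rcQueue (identifier, mdPrefix, sourceInfo, queueSource) "
    "VALUES ({p}, {p}, {p}, {p})"
)


def enqueue(cursor: Any, rows: Sequence[tuple], placeholder: str = "%s") -> int:
    """Insert (identifier, mdPrefix, sourceInfo, queueSource) tuples into rcQueue.

    cursor is any DB-API 2.0 cursor; placeholder must match the driver's
    paramstyle. The caller is responsible for committing.
    Returns the number of rows submitted.
    """
    sql = INSERT_SQL.format(p=placeholder)
    cursor.executemany(sql, list(rows))
    return len(rows)
```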

Please note that each object in fedora may translate into multiple rows in the 
queue, especially if an object is in multiple collections and/or if you provide 
multiple metadata formats for each object. Also, depending on how atomic your 
fedora objects are, you may need a mechanism for filtering the fedora messages.
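As an illustration of such a filter, here is a Python sketch that classifies a single API-M message. It assumes the Fedora 3.x message is an Atom entry whose title holds the API-M method name, whose summary holds the PID, and whose category with scheme "fedora-types:dsID" names the datastream; check this against the messages your own installation emits. REQUEUE_METHODS and RELEVANT_DSIDS are hypothetical choices, and what to do with a "delete" result is left open.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical: API-M methods that should cause an object to be requeued.
REQUEUE_METHODS = {
    "ingest",
    "modifyObject",
    "addDatastream",
    "modifyDatastreamByValue",
    "modifyDatastreamByReference",
    "addRelationship",
    "purgeRelationship",
}
# Hypothetical: only changes to these datastreams affect the OAI records.
RELEVANT_DSIDS = {"MODS", "DC", "RELS-EXT"}


def classify_message(atom_xml: str) -> Optional[Tuple[str, str]]:
    """Return ("requeue" | "delete", pid) for an API-M message, or None to ignore it.

    Assumed layout: method name in <title>, PID in <summary>, datastream ID
    (if any) in a <category> whose scheme is "fedora-types:dsID".
    """
    entry = ET.fromstring(atom_xml)
    method = (entry.findtext(f"{ATOM}title") or "").strip()
    pid = (entry.findtext(f"{ATOM}summary") or "").strip()
    if not pid:
        return None
    if method == "purgeObject":
        return ("delete", pid)
    if method not in REQUEUE_METHODS:
        return None
    dsids = {
        c.get("term")
        for c in entry.findall(f"{ATOM}category")
        if c.get("scheme") == "fedora-types:dsID"
    }
    # Datastream-level changes only count if they touch a relevant datastream.
    if dsids and not dsids & RELEVANT_DSIDS:
        return None
    return ("requeue", pid)
```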

For the values:
pid: the PID of the Fedora object (e.g. asc:17865)
mdPrefix: this means "metadata prefix"; in our case there is one entry for 
'mods' and one for 'oai_dc'
queueSource: this should always be 'R', though I can't say why
sourceInfo: this is more complicated; it is a space-delimited string made up of 
the following pieces of information:
1. Full Fedora URI for the metadata (e.g. info:fedora/{PID}/MODS)
2. null (I have no idea what this value does)
3. false (I also have no idea what this value does)
4. date string in UTC, formatted like: yyyy-MM-ddTHH:mm:ssZ
5. the collection's set spec (in our case, the collection pid) -- you can 
retrieve this by running an RI query such as (depending on how your collections 
are set up):

SELECT ?spec WHERE {
  <fedora:{PID}> <fedora-rels-ext:isMemberOfCollection> ?coll .
  ?coll <http://www.openarchives.org/OAI/2.0/setSpec> ?spec .
}
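To run that query from a script for a given PID, the request URL could be built as in this Python sketch. It assumes the Fedora 3.x risearch endpoint and its type, lang, format and query parameters; the base URL and the function name are placeholders.

```python
from urllib.parse import urlencode

# The set-spec query above, with the PID left as a template field.
# Literal braces are doubled because of str.format.
SET_SPEC_QUERY = """SELECT ?spec WHERE {{
  <fedora:{pid}> <fedora-rels-ext:isMemberOfCollection> ?coll .
  ?coll <http://www.openarchives.org/OAI/2.0/setSpec> ?spec .
}}"""


def set_spec_query_url(fedora_base: str, pid: str) -> str:
    """Build a risearch URL that runs the set-spec query for one PID (CSV output)."""
    params = {
        "type": "tuples",
        "lang": "sparql",
        "format": "CSV",
        "query": SET_SPEC_QUERY.format(pid=pid),
    }
    return f"{fedora_base.rstrip('/')}/risearch?{urlencode(params)}"
```

For example, set_spec_query_url("http://localhost:8080/fedora", "asc:17865") yields a URL whose decoded query parameter contains <fedora:asc:17865>.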

So in my case, a "sourceInfo" value might look like:
"info:fedora/asc:17865/MODS null false 2013-06-05T12:20:15Z collection:asc"

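Putting the pieces together, here is a Python sketch that reproduces that exact string and expands one object into one queue row per metadata format and per collection, as described earlier. The prefix-to-datastream mapping is an assumption: only MODS appears in the example, so the 'DC' datastream for oai_dc is a guess. The "null" and "false" parts are copied verbatim since their meaning is unknown.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping from metadata prefix to the datastream holding it;
# only the MODS case is confirmed by the example above.
DATASTREAMS: Dict[str, str] = {"mods": "MODS", "oai_dc": "DC"}


def source_info(pid: str, dsid: str, modified: datetime, collection: str) -> str:
    """Build the five-part, space-delimited sourceInfo value.

    modified should be timezone-aware; it is converted to UTC.
    """
    stamp = modified.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Parts 2 and 3 are copied as-is ("null", "false").
    return " ".join([f"info:fedora/{pid}/{dsid}", "null", "false", stamp, collection])


def queue_rows(
    pid: str, modified: datetime, collections: Iterable[str]
) -> List[Tuple[str, str, str, str]]:
    """One (identifier, mdPrefix, sourceInfo, queueSource) row per format and collection."""
    colls = list(collections)  # materialize so a generator can be reused per prefix
    return [
        (pid, prefix, source_info(pid, dsid, modified, coll), "R")
        for prefix, dsid in DATASTREAMS.items()
        for coll in colls
    ]
```

With modified set to 2013-06-05 12:20:15 UTC and the single collection "collection:asc", queue_rows("asc:17865", ...) returns two rows, the 'mods' one carrying exactly the sourceInfo string shown above; those tuples can then be fed to the INSERT statement shown earlier.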
The nice thing about using the messaging system is that you can also use it to 
delete objects from the Proai system. It was never clear to me whether objects 
are ever deleted from the Proai cache.

Hope that helps or at least gives you some ideas.

At the very least, try turning off driverPolling while you are running an 
ingest. After the ingest completes, turn the polling back on.
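If your ingests are scripted, the database side of this can be wrapped so that polling is restored even when the ingest fails. A minimal Python sketch, assuming a DB-API 2.0 connection to Proai's backend database and that a running Proai honors the rcAdmin.pollingEnabled flag without a restart (the proai.driverPollingEnabled setting in the configuration file is not touched here); polling_paused is a hypothetical name.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def polling_paused(conn: Any) -> Iterator[None]:
    """Set rcAdmin.pollingEnabled to 0 inside the block, then restore its old value.

    conn is any DB-API 2.0 connection to Proai's backend database.
    """
    cur = conn.cursor()
    cur.execute("SELECT pollingEnabled FROM rcAdmin")
    row = cur.fetchone()
    previous = int(row[0]) if row else 1  # assume "enabled" if no row exists
    cur.execute("UPDATE rcAdmin SET pollingEnabled=0")
    conn.commit()
    try:
        yield
    finally:
        # Runs even if the ingest raised an exception.
        cur.execute(f"UPDATE rcAdmin SET pollingEnabled={previous}")
        conn.commit()
```

Usage would look like "with polling_paused(conn): run_ingest()", where run_ingest stands in for whatever drives your bulk ingest.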

Aaron


--
Aaron Coburn
Systems Administrator and Programmer
Academic Technology Services, Amherst College
[email protected]






On Jun 4, 2013, at 8:54 AM, Grondin Luc <[email protected]> wrote:

Hello,

We are running into a problem where PROAI’s cache cannot be updated because its 
update query against Fedora’s Resource Index never succeeds.

Some time ago, we ran an operation that modified roughly 7000 or 8000 objects 
in a short period of time in one of our repositories (which contains about 
450000 objects). Since then, PROAI has been unable to pick up modified or new 
objects. It appears that the Resource Index query causes Fedora to use a large 
amount of CPU and memory and never succeeds in returning a response.

I have tried executing the query while adding one condition at a time. Here is 
the complete query:

select $item $itemID $state $date
from <#ri>
where $item <http://www.openarchives.org/OAI/2.0/itemID> $itemID
and $item <info:fedora/fedora-system:def/model#state> $state
and $item <info:fedora/fedora-system:def/model#hasModel> $model
and $model <info:fedora/fedora-system:def/model#hasService> $SDef
and $SDef <info:fedora/fedora-system:def/model#definesMethod> 'getOaiDublinCore'
and $SDef <http://mulgara.org/mulgara#is> <info:fedora/erudit-model:unitSDef>
and $item <info:fedora/fedora-system:def/view#lastModifiedDate> $date
and $date <http://mulgara.org/mulgara#after> '2013-04-30T08:18:02.519Z'^^<http://www.w3.org/2001/XMLSchema#dateTime> in <#xsd>
and $date <http://mulgara.org/mulgara#before> '2013-05-28T00:00:00Z'^^<http://www.w3.org/2001/XMLSchema#dateTime> in <#xsd>
order by $date asc

It appears that this query works:

select $item $itemID $state $date
from <#ri>
where $item <http://www.openarchives.org/OAI/2.0/itemID> $itemID
and $item <info:fedora/fedora-system:def/model#state> $state
and $item <info:fedora/fedora-system:def/model#hasModel> $model
and $item <info:fedora/fedora-system:def/view#lastModifiedDate> $date
and $date <http://mulgara.org/mulgara#after> '2013-04-30T08:18:02.519Z'^^<http://www.w3.org/2001/XMLSchema#dateTime> in <#xsd>
and $date <http://mulgara.org/mulgara#before> '2013-05-28T00:00:00Z'^^<http://www.w3.org/2001/XMLSchema#dateTime> in <#xsd>
order by $date asc

But when I add the condition

and $model <info:fedora/fedora-system:def/model#hasService> $SDef

Fedora’s resource consumption jumps sharply. After a while, the query fails 
with an exception.

ERROR 2013-06-03 18:11:43.380 ["http-bio-/10.137.96.15-8082"-exec-3] (RISearchServlet) Unexpected error servicing API-A request
org.trippi.TrippiException: TransactionalAnswer closed
        at org.trippi.impl.mulgara.MulgaraTupleIterator.close(MulgaraTupleIterator.java:39) [trippi-mulgara-1.4.3.jar:na]
        at org.trippi.impl.base.PoolAwareTupleIterator.close(PoolAwareTupleIterator.java:66) [trippi-core-1.4.3.jar:na]
        at org.trippi.server.TrippiServer.find(TrippiServer.java:126) [trippi-core-1.4.3.jar:na]
…
ERROR 2013-06-03 18:21:52.467 ["http-bio-/10.137.96.15-8082"-exec-5] (RISearchServlet) Unexpected error servicing API-A request
org.trippi.TrippiException: Transaction error
        at org.trippi.impl.mulgara.MulgaraTupleIterator.<init>(MulgaraTupleIterator.java:27) [trippi-mulgara-1.4.3.jar:na]
        at org.trippi.impl.mulgara.MulgaraSession.query(MulgaraSession.java:156) [trippi-mulgara-1.4.3.jar:na]
        at org.trippi.impl.base.ConcurrentTriplestoreReader.findTuples(ConcurrentTriplestoreReader.java:79) [trippi-core-1.4.3.j
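The one-condition-at-a-time test described above can be scripted so that each step is a separate query. A Python sketch that generates the variants, starting from the three conditions of the query that works and keeping the date window in every variant (a variant with fewer than three conditions would select $state or $date without binding them). Submitting each query to risearch and timing it is left out, and incremental_queries is a hypothetical name.

```python
from typing import List

XSD_DT = "<http://www.w3.org/2001/XMLSchema#dateTime>"

# The conditions of the complete query, in the order they are added.
CONDITIONS: List[str] = [
    "$item <http://www.openarchives.org/OAI/2.0/itemID> $itemID",
    "$item <info:fedora/fedora-system:def/model#state> $state",
    "$item <info:fedora/fedora-system:def/model#hasModel> $model",
    "$model <info:fedora/fedora-system:def/model#hasService> $SDef",
    "$SDef <info:fedora/fedora-system:def/model#definesMethod> 'getOaiDublinCore'",
    "$SDef <http://mulgara.org/mulgara#is> <info:fedora/erudit-model:unitSDef>",
]
# The date window, kept in every variant so each test query stays bounded.
DATE_WINDOW: List[str] = [
    "$item <info:fedora/fedora-system:def/view#lastModifiedDate> $date",
    f"$date <http://mulgara.org/mulgara#after> '2013-04-30T08:18:02.519Z'^^{XSD_DT} in <#xsd>",
    f"$date <http://mulgara.org/mulgara#before> '2013-05-28T00:00:00Z'^^{XSD_DT} in <#xsd>",
]


def incremental_queries(conditions: List[str], window: List[str], start: int = 3) -> List[str]:
    """Return iTQL queries using the first start, start+1, ..., n conditions plus the window."""
    queries = []
    for k in range(start, len(conditions) + 1):
        where = "\nand ".join(conditions[:k] + window)
        queries.append(
            "select $item $itemID $state $date\nfrom <#ri>\nwhere "
            + where
            + "\norder by $date asc"
        )
    return queries
```

The first generated query matches the working query and the last one matches the complete query, so the first variant that stalls points at the offending condition.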


I tried rebuilding the Resource Index, but that did not help.

I suppose that I could regenerate the complete PROAI cache from scratch, but 
that would reset the OAI datestamps to the current date. This would be a 
last-resort option, since it would affect partners that use our OAI service: 
they would have to reharvest our whole collection to get updates.

Does anybody have a suggestion on how to “cure” this problem or work around it? 
By the way, the Fedora instance runs version 3.4.2 and Proai is 1.2.2.

Thanks,

Luc

 ---
  Luc Grondin
  Digital Information Management Analyst
  Centre d'expertise numérique pour la recherche - Université de Montréal
  telephone: 514-343-6111 ext. 3988  --  [email protected]

------------------------------------------------------------------------------
How ServiceNow helps IT people transform IT departments:
1. A cloud service to automate IT design, transition and operations
2. Dashboards that offer high-level views of enterprise services
3. A single system of record for all IT processes
http://p.sf.net/sfu/servicenow-d2d-j
_______________________________________________
Fedora-commons-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
