We had the same problem (Mulgara stopped responding to the Proai query after
the repository reached approximately 150,000 objects) and ended up with a
similar solution (manually updating the Proai database and using the MPT
triplestore).

Since this is not the first time this query problem has come up on this list,
maybe the oaiprovider could use some redesign? I can imagine it processing
updates the way gsearch does (as a JMS subscriber to Fedora updates).
That could actually work, and it would use server resources more efficiently.

Rasta
Vienna University Computer Center


On 5 June 2013 18:45, Aaron Coburn <[email protected]> wrote:

>  I have run into this exact issue when running large bulk ingests.
> Fedora's resource consumption went through the roof and everything slowed
> to a crawl. Plus the RI queries Proai was executing always timed out
> anyway.
>
>  My solution was to completely circumvent the Proai polling system and
> manually populate the rcQueue table in MySQL. It is not necessarily elegant,
> but I thought the "pull on demand" approach that is baked into Proai was
> not appropriate for our system -- especially since we add objects in
> batches of thousands or tens of thousands. So this uses more of a "push on
> change" approach, which is much less resource intensive.
>
>  Incidentally, I am using Proai 1.2.2 and fedora 3.5.
>
>  This involves first disabling PROAI polling (in the configuration file):
>
>  proai.driverPollingEnabled = false
>
>  and (in the backend database):
>
>  UPDATE rcAdmin SET pollingEnabled=0;
>
>  This raises the issue of how to populate the Proai queue. There is no
> API for this, nor is there any Proai documentation for doing this -- I had
> to read through the source code -- and there is no guarantee that this will
> work in the future. With that in mind, I am leveraging fedora's messaging
> system (ActiveMQ) in such a way that whenever an object is added or
> updated, I add one or more entries to the proai queue in MySQL. There are a
> lot of different ways to make this happen, but it all comes down to a SQL query
> such as:
>
>  INSERT INTO rcQueue
>    (identifier, mdPrefix, sourceInfo, queueSource)
> VALUES
>     ('{pid}', '{mdPrefix}', '{sourceInfo}', 'R');
>
>  Please note that each object in fedora may translate into multiple rows
> in the queue, especially if an object is in multiple collections and/or if
> you provide multiple metadata formats for each object. Also, depending on
> how atomic your fedora objects are, you may need a mechanism for filtering
> the fedora messages.
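
An inline note on the message filtering mentioned above. This is a minimal sketch, not taken from Aaron's setup. It assumes the Fedora 3 API-M messages arrive as Atom entries with the method name in <title>, the PID in <summary>, and the datastream ID in a <category> with scheme "fedora-types:dsID"; check this against the messages your own broker delivers. The function name and both allow-lists are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# API-M methods that should trigger a queue update (assumed list; adjust it).
RELEVANT_METHODS = {
    "ingest",
    "modifyObject",
    "addDatastream",
    "modifyDatastreamByValue",
    "modifyDatastreamByReference",
}

# Datastreams whose changes affect the OAI output (hypothetical choice).
RELEVANT_DATASTREAMS = {"MODS", "DC", "RELS-EXT"}


def pid_to_enqueue(message_xml):
    """Return the PID if the message warrants a queue entry, else None."""
    entry = ET.fromstring(message_xml)
    method = entry.findtext(ATOM + "title")
    pid = entry.findtext(ATOM + "summary")
    if method not in RELEVANT_METHODS or not pid:
        return None
    # Datastream-level messages carry a dsID category; ignore datastreams
    # that do not feed any metadata format.
    for category in entry.findall(ATOM + "category"):
        if category.get("scheme") == "fedora-types:dsID":
            if category.get("term") not in RELEVANT_DATASTREAMS:
                return None
    return pid
```

A message listener would call pid_to_enqueue() on each message body and only touch the queue when it returns a PID.
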
>
>  For the values:
> pid: this is obvious
> mdPrefix: this means "metadata prefix", and in our case this includes an
> entry for 'mods' and 'oai_dc'
> queueSource: this should always be 'R', though I can't say why
> sourceInfo: this is more complicated, but it is a space-delimited string
> that includes the following pieces of information:
> 1. Full fedora URI for the metadata (e.g. info:fedora/{PID}/MODS)
> 2. null (I have no idea what this value does)
> 3. false (I also have no idea what this value does)
> 4. date string formatted like: yyyy-MM-ddTHH:mm:ssZ
> 5. collection pid -- you can retrieve this by running a RI query such as
> (depending on how your collections are set up):
>
>  SELECT ?spec WHERE {
>   <fedora:{PID}> <fedora-rels-ext:isMemberOfCollection> ?coll .
>   ?coll <http://www.openarchives.org/OAI/2.0/setSpec> ?spec .
> }
>
>  So in my case, a "sourceInfo" value might look like:
> "info:fedora/asc:17865/MODS null false 2013-06-05T12:20:15Z collection:asc"
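
An inline note on building the rows described above. This is a rough sketch under stated assumptions, not Aaron's actual code. It uses an in-memory sqlite3 table as a stand-in for the MySQL rcQueue table, reduced to the four columns named in the INSERT statement. All function names are hypothetical, and the mapping of 'oai_dc' to a DC datastream in the usage note is an assumption. With a MySQL driver the placeholder style would typically be %s instead of ?.

```python
import sqlite3
from datetime import datetime, timezone


def open_demo_queue():
    """Create an in-memory stand-in for rcQueue (only the four known columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rcQueue "
        "(identifier TEXT, mdPrefix TEXT, sourceInfo TEXT, queueSource TEXT)"
    )
    return conn


def build_source_info(pid, dsid, modified, set_spec):
    """Assemble the space-delimited sourceInfo string from its five parts."""
    # 'modified' should be a timezone-aware datetime; it is rendered in UTC.
    stamp = modified.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"info:fedora/{pid}/{dsid} null false {stamp} {set_spec}"


def enqueue(conn, pid, formats, modified, set_specs):
    """Insert one row per (metadata prefix, collection) pair; return the count.

    'formats' maps each mdPrefix to the datastream ID that provides it.
    """
    rows = [
        (pid, prefix, build_source_info(pid, dsid, modified, spec), "R")
        for prefix, dsid in formats.items()
        for spec in set_specs
    ]
    # Parameterized values avoid quoting mistakes in hand-built SQL strings.
    conn.executemany(
        "INSERT INTO rcQueue (identifier, mdPrefix, sourceInfo, queueSource) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)
```

For example, enqueue(conn, "asc:17865", {"mods": "MODS", "oai_dc": "DC"}, modified, ["collection:asc"]) would add two rows, one per metadata format.
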
>
>  The nice thing about using the messaging system is that you can also use
> it to delete objects from the Proai system. It was never clear to me that
> objects are ever deleted from the proai cache.
>
>  Hope that helps or at least gives you some ideas.
>
>  At the very least, try turning off driverPolling while you are running
> an ingest. After the ingest completes, try turning the polling back on.
>
>  Aaron
>
>
>    --
> Aaron Coburn
> Systems Administrator and Programmer
> Academic Technology Services, Amherst College
> [email protected]
>
>
>
>
>
>
>  On Jun 4, 2013, at 8:54 AM, Grondin Luc <[email protected]> wrote:
>
>   Hello,
>
>  We are running into a problem where PROAI’s cache cannot be updated
> because its update query against Fedora’s Resource Index never
> succeeds.
>
>  Some time ago, we ran an operation that changed about 7000 or 8000
> objects in one of our repositories (containing about 450000 objects) within
> a short period of time. Since then, PROAI cannot update modified or new
> objects. It appears that the Resource Index query causes Fedora to use a
> large amount of CPU and memory and never returns a response.
>
>  I have tried to execute the query by adding one condition at a time.
> Here is the complete query:
>
>  select $item $itemID $state $date
>  from <#ri>
>  where $item <http://www.openarchives.org/OAI/2.0/itemID> $itemID
>  and $item <info:fedora/fedora-system:def/model#state> $state
>  and $item <info:fedora/fedora-system:def/model#hasModel> $model
>  and $model <info:fedora/fedora-system:def/model#hasService> $SDef
>  and $SDef <info:fedora/fedora-system:def/model#definesMethod> 'getOaiDublinCore'
>  and $SDef <http://mulgara.org/mulgara#is> <info:fedora/erudit-model:unitSDef>
>  and $item <info:fedora/fedora-system:def/view#lastModifiedDate> $date
>  and $date <http://mulgara.org/mulgara#after> '2013-04-30T08:18:02.519Z'^^<http://www.w3.org/2001/XMLSchema#dateTime> in <#xsd>
>  and $date <http://mulgara.org/mulgara#before> '2013-05-28T00:00:00Z'^^<http://www.w3.org/2001/XMLSchema#dateTime> in <#xsd>
>  order by $date asc
>
>  It appears that this query works:
>
>  select $item $itemID $state $date
>  from <#ri>
>  where $item <http://www.openarchives.org/OAI/2.0/itemID> $itemID
>  and $item <info:fedora/fedora-system:def/model#state> $state
>  and $item <info:fedora/fedora-system:def/model#hasModel> $model
>  and $item <info:fedora/fedora-system:def/view#lastModifiedDate> $date
>  and $date <http://mulgara.org/mulgara#after> '2013-04-30T08:18:02.519Z'^^<http://www.w3.org/2001/XMLSchema#dateTime> in <#xsd>
>  and $date <http://mulgara.org/mulgara#before> '2013-05-28T00:00:00Z'^^<http://www.w3.org/2001/XMLSchema#dateTime> in <#xsd>
>  order by $date asc
>
>  But when I add the condition
>
>  and $model <info:fedora/fedora-system:def/model#hasService> $SDef
>
>  the Fedora process jumps to a high level of resource consumption.
> After a while, this ends with an exception.
>
>  ERROR 2013-06-03 18:11:43.380 ["http-bio-/10.137.96.15-8082"-exec-3] (RISearchServlet) Unexpected error servicing API-A request
>  org.trippi.TrippiException: TransactionalAnswer closed
>          at org.trippi.impl.mulgara.MulgaraTupleIterator.close(MulgaraTupleIterator.java:39) [trippi-mulgara-1.4.3.jar:na]
>          at org.trippi.impl.base.PoolAwareTupleIterator.close(PoolAwareTupleIterator.java:66) [trippi-core-1.4.3.jar:na]
>          at org.trippi.server.TrippiServer.find(TrippiServer.java:126) [trippi-core-1.4.3.jar:na]
>  …
>  ERROR 2013-06-03 18:21:52.467 ["http-bio-/10.137.96.15-8082"-exec-5] (RISearchServlet) Unexpected error servicing API-A request
>  org.trippi.TrippiException: Transaction error
>          at org.trippi.impl.mulgara.MulgaraTupleIterator.<init>(MulgaraTupleIterator.java:27) [trippi-mulgara-1.4.3.jar:na]
>          at org.trippi.impl.mulgara.MulgaraSession.query(MulgaraSession.java:156) [trippi-mulgara-1.4.3.jar:na]
>          at org.trippi.impl.base.ConcurrentTriplestoreReader.findTuples(ConcurrentTriplestoreReader.java:79) [trippi-core-1.4.3.j
>
>
>  I tried rebuilding the Resource Index, but that did not help.
>
>  I suppose that I could regenerate the complete PROAI cache from scratch.
> But that would mean resetting the OAI datestamps to the current date. This
> would be a last-resort option, since it would affect the partners that
> use our OAI service. They would have to reharvest our whole collection to
> get updates.
>
>  Would anybody have a suggestion on how to “cure” that problem or
> circumvent it? By the way, the Fedora instance runs version 3.4.2 and
> Proai is 1.2.2.
>
>  Thanks,
>
>  Luc
>
>   ---
>    Luc Grondin
>   Analyste en gestion de l'information numérique
>   Centre d'expertise numérique pour la recherche - Université de Montréal
>   téléphone: 514-343-6111 p. 3988  --  [email protected]
> ------------------------------------------------------------------------------
> How ServiceNow helps IT people transform IT departments:
> 1. A cloud service to automate IT design, transition and operations
> 2. Dashboards that offer high-level views of enterprise services
> 3. A single system of record for all IT processes
>
> http://p.sf.net/sfu/servicenow-d2d-j
> _______________________________________________
> Fedora-commons-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
>
