Hi Jakob,

I am, quite crudely, doing things like this:

  let $total-count :=
    count(
      xdmp:document-properties()/prop:properties/cpf:processing-status
    )
  let $done-count :=
    count(
      xdmp:document-properties()/prop:properties[
        cpf:processing-status/text() = 'done'
        and not(cpf:state/text() = 'http://marklogic.com/states/error')
      ]
    )
  let $error-count :=
    count(
      xdmp:document-properties()/prop:properties[
        cpf:state/text() = 'http://marklogic.com/states/error'
      ]
    )
  let $active-count := $total-count - $error-count - $done-count

No looping, just XPath with predicates wrapped in a count. No special indexes (yet).

Kind regards,
Geert

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Jakob Fix
> Sent: Tuesday, 14 July 2009 16:44
> To: General Mark Logic Developer Discussion
> Subject: Re: [MarkLogic Dev General] triggering after spawning
>
> Geert,
>
> Good question about storing this info at all. Doing a normal XPath
> clearly takes too long (five seconds or so), so yes, you're right,
> I will test the index on the attribute value.
>
> cheers,
> Jakob.
>
> On Tue, Jul 14, 2009 at 16:36, Geert Josten<[email protected]> wrote:
> > I am wondering why you store it in the database at all. Why not
> > calculate it on demand? Putting an index on the boolean element
> > should allow it to perform even when you have processed many, many
> > documents.
> >
> > You might even try doing it without adding a particular index. It
> > might be covered by the word index already.
> >
> > I did a similar thing to keep track of all documents being processed
> > by CPF, using counts on all documents with specific property values
> > to show a progress bar. I haven't tried it with many documents yet,
> > but just showing the progress bar, based on about 4 counts, takes
> > only a few tenths of a second. Didn't need any special indexes at all.
> >
> > Kind regards,
> > Geert
> >
> >> -----Original Message-----
> >> From: [email protected]
> >> [mailto:[email protected]] On Behalf Of Jakob Fix
> >> Sent: Tuesday, 14 July 2009 16:27
> >> To: General Mark Logic Developer Discussion
> >> Subject: Re: [MarkLogic Dev General] triggering after spawning
> >>
> >> Geert,
> >>
> >> thanks for the quick reply. Some more information which explains the
> >> logic behind what I'm doing:
> >>
> >> Each day I get an input document containing a(n increasing) number of
> >> URLs (currently around 23,000) which return XML documents, containing
> >> among other things a boolean value.
> >> Each day, I record the total number of documents actually retrieved,
> >> the number of "true" and the number of "false"
> >> (the total number being a kind of checksum).
> >>
> >> The summary document looks a bit like this:
> >>
> >> <doi-stats>
> >>   ...
> >>   <doi-stat date="2009-07-14"
> >>     recorded="{fn:current-dateTime()}" resolved="123"
> >>     unresolved="456" total="579" />
> >>   ...
> >> </doi-stats>
> >>
> >> Now, you're right, it might be possible for each spawned task to
> >> update this document; however, wouldn't there be a serious
> >> performance impact?
> >>
> >> First, I would have to decrease the number of concurrent tasks
> >> (currently six) to maybe two (or even one?), so that there's not too
> >> much time spent waiting to update the document. Second, for each
> >> document I would need to count all documents in the collection (or
> >> the directory), and third, I'd need to do the two XPaths to retrieve
> >> the booleans ...
> >>
> >> The more I think about this approach, the less I'm convinced that
> >> it's scalable, but I'd be more than happy to be convinced otherwise!
> >>
> >> thanks,
> >> Jakob.
> >>
> >> On Tue, Jul 14, 2009 at 16:02, Geert Josten<[email protected]> wrote:
> >> > Or just have each task update the summary document, each
> >> > incrementing the finished docs counter by one (if there is any)?
> >> >
> >> > Note: that effectively serializes all tasks.
> >> >
> >> > Kind regards,
> >> > Geert
> >> >
> >> > Drs. G.P.H. Josten
> >> > Consultant
> >> >
> >> > http://www.daidalos.nl/
> >> > Daidalos BV
> >> > Source of Innovation
> >> > Hoekeindsehof 1-4
> >> > 2665 JZ Bleiswijk
> >> > Tel.: +31 (0) 10 850 1200
> >> > Fax: +31 (0) 10 850 1199
> >> > http://www.daidalos.nl/
> >> > KvK 27164984
> >> > The information sent in or with this email message originates
> >> > from Daidalos BV and is intended solely for the addressee. If you
> >> > have received this message unintentionally, we ask you to delete
> >> > it. No rights can be derived from this message.
> >> >
> >> >> From: [email protected]
> >> >> [mailto:[email protected]] On Behalf Of Jakob Fix
> >> >> Sent: Tuesday, 14 July 2009 15:55
> >> >> To: General Mark Logic Developer Discussion
> >> >> Subject: [MarkLogic Dev General] triggering after spawning
> >> >>
> >> >> So I managed to spawn some twenty thousand tasks to retrieve
> >> >> documents from a remote server and to store them in MarkLogic.
> >> >> I've also created a user interface with a progress bar to follow
> >> >> its progress (although this won't be used in production).
> >> >>
> >> >> Now, what I'd like to do is to trigger an update of a summary
> >> >> document once all spawned tasks have executed. From my limited
> >> >> experience with ML, I cannot seem to find a satisfying solution to
> >> >> this challenge ...
> >> >>
> >> >> My ideas:
> >> >> - After the spawn, call a function recursively which sleeps for
> >> >>   some time and checks the number of tasks in the task queue, and
> >> >>   once it's empty assumes "that that's that" and updates/creates
> >> >>   a document?
> >> >> - Have each spawned task inspect the task queue and, if there is
> >> >>   just one task in the queue (i.e. itself), trigger the document
> >> >>   update?
> >> >>
> >> >> Hmmm, any better ideas?
> >> >>
> >> >> Thanks,
> >> >> Jakob.

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
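
For anyone wanting to try the counting approach from the top of this message, here is a minimal sketch of how the fragment might look as a complete main module. It is not a tested implementation. The namespace URIs for `prop` and `cpf` are assumed to be the standard MarkLogic ones, the `$ERROR-STATE` variable is introduced here only for readability, and the `<progress>` element with its attribute names is purely hypothetical output for a progress bar.

```xquery
xquery version "1.0-ml";

(: Assumption: standard MarkLogic namespaces for properties and CPF. :)
declare namespace prop = "http://marklogic.com/xdmp/property";
declare namespace cpf  = "http://marklogic.com/cpf";

(: Error state URI as used in the message; the variable is illustrative. :)
declare variable $ERROR-STATE := "http://marklogic.com/states/error";

(: Every properties fragment that carries a CPF processing status. :)
let $total-count :=
  count(xdmp:document-properties()/prop:properties/cpf:processing-status)

(: Finished documents that did not end up in the error state. :)
let $done-count :=
  count(
    xdmp:document-properties()/prop:properties[
      cpf:processing-status/text() = "done"
      and not(cpf:state/text() = $ERROR-STATE)
    ]
  )

(: Documents that ended up in the error state. :)
let $error-count :=
  count(
    xdmp:document-properties()/prop:properties[
      cpf:state/text() = $ERROR-STATE
    ]
  )

(: Whatever is neither done nor in error is still being processed. :)
let $active-count := $total-count - $error-count - $done-count

(: Hypothetical result element; the names are not from the thread. :)
return
  <progress total="{$total-count}" done="{$done-count}"
            errors="{$error-count}" active="{$active-count}"/>
```

The XPath expressions are kept inline inside each `count()` call, as in the original fragment, rather than bound to a shared variable first, so that the sketch does not change how the server evaluates them.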
