Hi Jakob,

I am, quite crudely, doing things like this:

declare namespace cpf = "http://marklogic.com/cpf";

let $total-count := count(
  xdmp:document-properties()/prop:properties/cpf:processing-status )
let $done-count := count(
  xdmp:document-properties()/prop:properties[cpf:processing-status/text() =
  'done' and not(cpf:state/text() = 'http://marklogic.com/states/error')] )
let $error-count := count(
  xdmp:document-properties()/prop:properties[cpf:state/text() =
  'http://marklogic.com/states/error'] )
let $active-count := $total-count - $error-count - $done-count
return ($total-count, $done-count, $error-count, $active-count)

No looping, just XPath with predicates wrapped in a count. No special indexes
(yet).
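If these plain XPath counts become too slow as the number of documents grows, one possible variant is sketched here. It is untested and assumes the CPF namespace URI http://marklogic.com/cpf and the same status values used above. Instead of loading every properties fragment, it asks xdmp:estimate for the unfiltered hit count of a cts:search with cts:properties-query. Because the search is unfiltered, the numbers are only exact when the indexes fully resolve these queries. Depending on the index settings, they can over-count.

```xquery
xquery version "1.0-ml";

(: Sketch only: same element names and values as the XPath version. :)
let $status-qn := fn:QName("http://marklogic.com/cpf", "processing-status")
let $state-qn  := fn:QName("http://marklogic.com/cpf", "state")

(: Documents whose properties contain a cpf:processing-status element at all
   (element-existence idiom: element-query around an empty and-query). :)
let $has-status := cts:properties-query(
  cts:element-query($status-qn, cts:and-query(())))
let $is-done := cts:properties-query(
  cts:element-value-query($status-qn, "done"))
let $is-error := cts:properties-query(
  cts:element-value-query($state-qn, "http://marklogic.com/states/error"))

(: xdmp:estimate returns the unfiltered hit count taken from the indexes. :)
let $total-count := xdmp:estimate(cts:search(fn:doc(), $has-status))
let $done-count := xdmp:estimate(cts:search(fn:doc(),
  cts:and-query(($is-done, cts:not-query($is-error)))))
let $error-count := xdmp:estimate(cts:search(fn:doc(), $is-error))
let $active-count := $total-count - $error-count - $done-count
return ($total-count, $done-count, $error-count, $active-count)
```

The result has the same four-value shape as the XPath version, so a progress bar could use either one.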

Kind regards,
Geert 

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Jakob Fix
> Sent: Tuesday 14 July 2009 16:44
> To: General Mark Logic Developer Discussion
> Subject: Re: [MarkLogic Dev General] triggering after spawning
> 
> Geert,
> 
> Good question about storing this info at all. A plain XPath clearly
> takes too long (five seconds or so), so yes, you're right; I will
> test the index on the attribute value.
> 
> cheers,
> Jakob.
> 
> 
> 
> On Tue, Jul 14, 2009 at 16:36, Geert Josten <[email protected]> wrote:
> > I wonder why you store it in the database at all. Why not calculate
> > it on demand? Putting an index on the boolean element should keep it
> > fast even after you have processed many, many documents.
> >
> > You might even try it without adding a particular index. It might
> > already be covered by the word index.
> >
> > I did a similar thing to keep track of all documents being processed
> > by CPF, using counts on all documents with specific property values
> > to show a progress bar. I haven't tried it with many documents yet,
> > but showing the progress bar based on about four counts takes only a
> > few tenths of a second. I didn't need any special indexes at all.
> >
> > Kind regards,
> > Geert
> >
> >> -----Original Message-----
> >> From: [email protected]
> >> [mailto:[email protected]] On Behalf Of Jakob Fix
> >> Sent: Tuesday 14 July 2009 16:27
> >> To: General Mark Logic Developer Discussion
> >> Subject: Re: [MarkLogic Dev General] triggering after spawning
> >>
> >> Geert,
> >>
> >> Thanks for the quick reply. Here is some more information that
> >> explains the logic behind what I'm doing:
> >>
> >> Each day I get an input document containing a(n increasing) number
> >> of URLs (currently around 23,000) which return XML documents
> >> containing, among other things, a boolean value.
> >> Each day, I record the total number of documents actually retrieved,
> >> the number of "true" and the number of "false" (the total number
> >> being a kind of checksum).
> >>
> >> The summary document looks a bit like this:
> >>
> >> <doi-stats>
> >> ...
> >>   <doi-stat date="2009-07-14"
> >>       recorded="{fn:current-dateTime()}" resolved="123"
> >>       unresolved="456" total="579" /> ...
> >> </doi-stats>
> >>
> >> Now, you're right that it might be possible for each spawned task to
> >> update this document; however, wouldn't there be a serious
> >> performance impact?
> >>
> >> First, I would have to decrease the number of concurrent tasks
> >> (currently six) to maybe two (or even one?), so that not too much
> >> time is spent waiting to update the document. Second, for each
> >> document I would need to count all documents in the collection (or
> >> the directory), and third, I'd need to run the two XPaths that
> >> retrieve the booleans ...
> >>
> >> The more I think about this approach, the less I'm convinced that
> >> it's scalable, but I'd be more than happy to be convinced otherwise!
> >>
> >> thanks,
> >> Jakob.
> >>
> >>
> >>
> >> On Tue, Jul 14, 2009 at 16:02, Geert Josten <[email protected]> wrote:
> >> > Or just have each task update the summary document, each one
> >> > incrementing the finished-documents counter by one (if there is any)?
> >> >
> >> > Note: that effectively serializes all tasks.
> >> >
> >> > Kind regards,
> >> > Geert
> >> >
> >> >>
> >> >
> >> >
> >> > Drs. G.P.H. Josten
> >> > Consultant
> >> >
> >> >
> >> > http://www.daidalos.nl/
> >> > Daidalos BV
> >> > Source of Innovation
> >> > Hoekeindsehof 1-4
> >> > 2665 JZ Bleiswijk
> >> > Tel.: +31 (0) 10 850 1200
> >> > Fax: +31 (0) 10 850 1199
> >> > KvK 27164984
> >> > The information sent in or with this email message comes from
> >> > Daidalos BV and is intended solely for the addressee. If you have
> >> > received this message in error, please delete it. No rights can be
> >> > derived from this message.
> >> >
> >> >
> >> >> From: [email protected]
> >> >> [mailto:[email protected]] On Behalf Of Jakob Fix
> >> >> Sent: Tuesday 14 July 2009 15:55
> >> >> To: General Mark Logic Developer Discussion
> >> >> Subject: [MarkLogic Dev General] triggering after spawning
> >> >>
> >> >> So I managed to spawn some twenty thousand tasks to retrieve
> >> >> documents from a remote server and store them in MarkLogic. I've
> >> >> also created a user interface with a progress bar to follow their
> >> >> progress (although this won't be used in production).
> >> >>
> >> >> Now, what I'd like to do is trigger an update of a summary document
> >> >> once all spawned tasks have executed. With my limited experience of
> >> >> ML, I cannot seem to find a satisfying solution to this challenge ...
> >> >>
> >> >> My ideas:
> >> >> - After the spawn, call a function recursively that sleeps for some
> >> >>   time and checks the number of tasks in the task queue, and once
> >> >>   the queue is empty, assumes "that that's that" and updates/creates
> >> >>   a document?
> >> >> - Have each spawned task inspect the task queue and, if there is
> >> >>   just one task in the queue (i.e. itself), trigger the document
> >> >>   update?
> >> >>
> >> >> Hmmm, any better ideas?
> >> >>
> >> >> Thanks,
> >> >> Jakob.
> >> >> _______________________________________________
> >> >> General mailing list
> >> >> [email protected]
> >> >> http://xqzone.com/mailman/listinfo/general
> >> >>
> >> >