I am wondering why you store it in the database at all. Why not calculate it on 
demand? Putting an index on the boolean element should allow it to perform well 
even when you have processed a very large number of documents.

You might even try doing it without adding a dedicated index. It might be 
covered by the word index already.

I did a similar thing to keep track of all documents being processed by CPF, 
using counts on all documents with specific property values to show a progress 
bar. I haven't tried it with many documents yet, but showing the progress 
bar based on about 4 counts takes only a few tenths of a second. I didn't need 
any special indexes at all.
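
A rough sketch of what calculating it on demand could look like. The 
collection name "doi-docs" and the element name "resolved" are assumptions, 
since I don't know how the retrieved documents are actually stored. Note that 
xdmp:estimate counts matching fragments from the indexes without filtering, 
so the numbers are only exact if the indexes can resolve the queries fully.

```xquery
xquery version "1.0-ml";

(: Sketch only: "doi-docs" and "resolved" are hypothetical names,
   not taken from the actual documents. :)
let $in-collection := cts:collection-query("doi-docs")

(: Builds a query for documents whose boolean element has the given value. :)
let $with-value := function-placeholder()
return ()
```

```xquery
xquery version "1.0-ml";

(: Sketch only: "doi-docs" and "resolved" are hypothetical names,
   not taken from the actual documents. :)
let $all        := cts:collection-query("doi-docs")
let $true       := cts:and-query(($all,
                     cts:element-value-query(xs:QName("resolved"), "true")))
let $false      := cts:and-query(($all,
                     cts:element-value-query(xs:QName("resolved"), "false")))

(: Each estimate is resolved from the indexes; no documents are read. :)
let $total      := xdmp:estimate(cts:search(fn:doc(), $all))
let $resolved   := xdmp:estimate(cts:search(fn:doc(), $true))
let $unresolved := xdmp:estimate(cts:search(fn:doc(), $false))
return
  <doi-stat recorded="{fn:current-dateTime()}"
      resolved="{$resolved}"
      unresolved="{$unresolved}"
      total="{$total}" />
```

Because the total is counted separately from the two boolean counts, it can 
still serve as the checksum you describe.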

Kind regards,
Geert

> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of 
> Jakob Fix
> Sent: Tuesday 14 July 2009 16:27
> To: General Mark Logic Developer Discussion
> Subject: Re: [MarkLogic Dev General] triggering after spawning
> 
> Geert,
> 
> thanks for the quick reply. Some more information which 
> explains the logic behind what I'm doing:
> 
> Each day I get an input document containing a(n increasing) 
> number of URLs (currently around 23.000) which return XML 
> documents, containing among other things a boolean value.  
> Each day, I record the total number of documents actually 
> retrieved, the number of "true" and the number of "false" 
> (the total number being a kind of checksum).
> 
> The summary document looks a bit like this:
> 
> <doi-stats>
> ...
>   <doi-stat date="2009-07-14"
>       recorded="{fn:current-dateTime()}" resolved="123"
>       unresolved="456" total="579" />
> ...
> </doi-stats>
> 
> Now, you're right it might be possible for each spawned task 
> to update this document, however, wouldn't there be a serious 
> performance impact?
> 
> First, I would have to decrease the number of concurrent 
> tasks (currently six) to maybe two (or even one?), so that 
> there's not too much time spent waiting to update the 
> document.  Second, for each document I would need to count 
> all documents in the collection (or the directory), and 
> third, I'd need to do the two XPaths to retrieve the booleans ...
> 
> The more I think about this approach, the less I'm convinced 
> that it's scalable, but I'd be more than happy to be 
> convinced otherwise!
> 
> thanks,
> Jakob.
> 
> 
> 
> On Tue, Jul 14, 2009 at 16:02, Geert 
> Josten<[email protected]> wrote:
> > Or just have each task update the summary document, each
> > incrementing the finished docs counter by one (if there is any)?
> >
> > Note: that effectively serializes all tasks.
> >
> > Kind regards,
> > Geert
> >
> >>
> >
> >
> > Drs. G.P.H. Josten
> > Consultant
> >
> >
> > http://www.daidalos.nl/
> > Daidalos BV
> > Source of Innovation
> > Hoekeindsehof 1-4
> > 2665 JZ Bleiswijk
> > Tel.: +31 (0) 10 850 1200
> > Fax: +31 (0) 10 850 1199
> > KvK 27164984
> >
> >
> >> From: [email protected]
> >> [mailto:[email protected]] On Behalf Of Jakob Fix
> >> Sent: Tuesday 14 July 2009 15:55
> >> To: General Mark Logic Developer Discussion
> >> Subject: [MarkLogic Dev General] triggering after spawning
> >>
> >> So I managed to spawn some twenty thousand tasks to retrieve documents
> >> from a remote server and to store them in MarkLogic.  I've also
> >> created a user interface with a progress bar to follow its progress
> >> (although this won't be used in production).
> >>
> >> Now, what I'd like to do is to trigger an update of a summary 
> >> document once all spawned tasks have executed. From my limited 
> >> experience with ML, I cannot seem to find a satisfying solution to 
> >> this challenge ...
> >>
> >> My ideas:
> >> - After the spawn, recursively call a function which sleeps for some
> >> time and checks the number of tasks in the task queue, and once it's
> >> empty assumes "that that's that" and updates/creates a document?
> >> - Have each spawned task inspect the task queue and, if there is just
> >> one task in the queue (i.e. itself), trigger the document update?
> >>
> >> Hmmm, any better ideas?
> >>
> >> Thanks,
> >> Jakob.
> >> _______________________________________________
> >> General mailing list
> >> [email protected]
> >> http://xqzone.com/mailman/listinfo/general
> >>