Geert,

Thanks for the quick reply. Here is some more information that explains
the logic behind what I'm doing:

Each day I get an input document containing an increasing number of
URLs (currently around 23,000), each of which returns an XML document
containing, among other things, a boolean value.  Each day, I record
the total number of documents actually retrieved, the number of "true"
values and the number of "false" values (the total serving as a kind
of checksum).

The summary document looks a bit like this:

<doi-stats>
...
  <doi-stat date="2009-07-14"
      recorded="{fn:current-dateTime()}" resolved="123"
      unresolved="456" total="579" />
...
</doi-stats>
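
A minimal sketch of how one such entry could be built in a single pass
once all documents are in, under some assumptions that are only
illustrative: the collection name "doi-results" (holding just that day's
documents), a <resolved> element carrying the boolean, and the summary
URI "/doi-stats.xml", which must already exist with a <doi-stats> root.

```xquery
xquery version "1.0-ml";

(: Assumed names, for illustration only: the collection, the <resolved>
   element and the summary URI are not taken from the real setup. :)
declare variable $collection as xs:string := "doi-results";
declare variable $summary-uri as xs:string := "/doi-stats.xml";

let $docs := fn:collection($collection)
(: The two XPath counts over the boolean value :)
let $resolved := fn:count($docs//resolved[. eq "true"])
let $unresolved := fn:count($docs//resolved[. eq "false"])
let $entry :=
  <doi-stat date="{fn:adjust-date-to-timezone(fn:current-date(), ())}"
      recorded="{fn:current-dateTime()}" resolved="{$resolved}"
      unresolved="{$unresolved}" total="{fn:count($docs)}"/>
(: Append today's entry to the existing summary document :)
return xdmp:node-insert-child(fn:doc($summary-uri)/doi-stats, $entry)
```

Run once per day, this counts each document a single time, and total is
counted independently of the two boolean counts so it can still act as
the checksum.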

Now, you're right that each spawned task could update this document
itself.  However, wouldn't there be a serious performance impact?

First, I would have to decrease the number of concurrent tasks
(currently six) to maybe two (or even one?), so that not too much time
is spent waiting to update the document.  Second, for each document I
would need to count all documents in the collection (or the
directory), and third, I'd need to run the two XPath expressions to
retrieve the booleans ...
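
For comparison, here is a rough sketch of what a per-task update could
look like if each task only bumped counters on today's entry instead of
recounting.  The external variable $is-resolved and the URI
"/doi-stats.xml" are assumptions, and the sketch presumes today's
<doi-stat> element already exists with its counters set to zero.

```xquery
xquery version "1.0-ml";

(: Assumed: supplied by the spawning code, true if the fetched XML
   document contained "true". :)
declare variable $is-resolved as xs:boolean external;

let $today := xs:string(fn:adjust-date-to-timezone(fn:current-date(), ()))
(: Assumed URI; today's entry must have been created beforehand :)
let $stat := fn:doc("/doi-stats.xml")/doi-stats/doi-stat[@date eq $today]
let $counter := if ($is-resolved) then $stat/@resolved else $stat/@unresolved
return (
  (: Increment either @resolved or @unresolved by one :)
  xdmp:node-replace($counter,
    attribute { fn:node-name($counter) } { xs:integer($counter) + 1 }),
  (: Keep the checksum in step :)
  xdmp:node-replace($stat/@total,
    attribute total { xs:integer($stat/@total) + 1 })
)
```

Even in this cheaper form, every task has to take a write lock on the
same summary document, so the waiting I mention above would still be
there.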

The more I think about this approach, the less I'm convinced that it's
scalable, but I'd be more than happy to be convinced otherwise!

Thanks,
Jakob.



On Tue, Jul 14, 2009 at 16:02, Geert Josten<[email protected]> wrote:
> Or just have each task update the summary document, each incrementing the 
> finished docs counter by one (if there is any)?
>
> Note: that would effectively serialize all tasks.
>
> Kind regards,
> Geert
>
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of
>> Jakob Fix
>> Sent: dinsdag 14 juli 2009 15:55
>> To: General Mark Logic Developer Discussion
>> Subject: [MarkLogic Dev General] triggering after spawning
>>
>> So I managed to spawn some twenty thousand tasks to retrieve
>> documents from a remote server and to store them in
>> MarkLogic.  I've also created a user interface with a
>> progress bar to follow its progress (although this won't be
>> used in production).
>>
>> Now, what I'd like to do is to trigger an update of a summary
>> document once all spawned tasks have executed. From my
>> limited experience with ML, I cannot seem to find a
>> satisfying solution to this challenge ...
>>
>> My ideas:
>> - After the spawn, call a function recursively that sleeps
>> for some time and checks the number of tasks in the task
>> queue, and once it's empty assumes "that that's that" and
>> updates/creates a document?
>> - Have each spawned task inspect the task queue and if there
>> is just one task in the queue (i.e. itself), trigger the
>> document update?
>>
>> Hmmm, any better ideas?
>>
>> Thanks,
>> Jakob.
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://xqzone.com/mailman/listinfo/general
>>
