Geert,

Good question about storing this info at all. Doing a normal XPath
query clearly takes too long (five seconds or so), so yes, you're right,
I will test the index on the attribute value.

cheers,
Jakob.



On Tue, Jul 14, 2009 at 16:36, Geert Josten <[email protected]> wrote:
> I am wondering why you store it in the database at all. Why not calculate it on
> demand? Putting an index on the boolean element should let it perform well
> even when you have processed many, many, many documents.
>
> You might even try doing it without adding a particular index. It might
> already be covered by the word index.
>
> I did a similar thing to keep track of all documents being processed by CPF,
> using counts on all documents with specific property values to show a
> progress bar. I haven't tried it with many documents yet, but showing
> the progress bar based on about 4 counts takes only a few tenths of a second.
> I didn't need any special indexes at all.
>
> Kind regards,
> Geert
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of
>> Jakob Fix
>> Sent: Tuesday, 14 July 2009 16:27
>> To: General Mark Logic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] triggering after spawning
>>
>> Geert,
>>
>> thanks for the quick reply. Some more information which
>> explains the logic behind what I'm doing:
>>
>> Each day I get an input document containing an increasing
>> number of URLs (currently around 23,000), each of which returns an
>> XML document containing, among other things, a boolean value.
>> Each day, I record the total number of documents actually
>> retrieved, the number of "true" values and the number of "false"
>> values (the total serving as a kind of checksum).
>>
>> The summary document looks a bit like this:
>>
>> <doi-stats>
>> ...
>>   <doi-stat date="2009-07-14"
>>       recorded="{fn:current-dateTime()}" resolved="123"
>>       unresolved="456" total="579" />
>> ...
>> </doi-stats>
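A minimal sketch of the arithmetic behind one `doi-stat` entry, written in Python rather than XQuery purely for illustration. The helper name `make_doi_stat` and the input shape (one `True`/`False` per retrieved document, or `None` when the boolean could not be read) are assumptions, not part of the original setup. Because `total` is counted independently of the two booleans, `resolved + unresolved == total` works as the checksum described above.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def make_doi_stat(day: str, values: list[bool | None]) -> ET.Element:
    """Build one <doi-stat> element from the booleans of a day's documents.

    Hypothetical input shape: one entry per retrieved document, True or
    False, or None when the boolean could not be read.
    """
    resolved = sum(1 for v in values if v is True)
    unresolved = sum(1 for v in values if v is False)
    total = len(values)  # counted independently of the two booleans
    if resolved + unresolved != total:
        # Checksum failed: some documents carry no usable boolean.
        print(f"warning: {total - resolved - unresolved} document(s) unaccounted for")
    return ET.Element(
        "doi-stat",
        {
            "date": day,
            # Plays the role of fn:current-dateTime() in the XQuery version.
            "recorded": datetime.now(timezone.utc).isoformat(),
            "resolved": str(resolved),
            "unresolved": str(unresolved),
            "total": str(total),
        },
    )


stat = make_doi_stat("2009-07-14", [True] * 123 + [False] * 456)
print(stat.get("resolved"), stat.get("unresolved"), stat.get("total"))  # 123 456 579
```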
>>
>> Now, you're right that each spawned task could update this
>> document; however, wouldn't there be a serious performance
>> impact?
>>
>> First, I would have to decrease the number of concurrent
>> tasks (currently six) to maybe two (or even one?), so that
>> not too much time is spent waiting to update the
>> document. Second, for each document I would need to count
>> all documents in the collection (or the directory), and
>> third, I'd need to run the two XPath queries to retrieve the booleans ...
>>
>> The more I think about this approach, the less I'm convinced
>> that it's scalable, but I'd be more than happy to be
>> convinced otherwise!
>>
>> thanks,
>> Jakob.
>>
>>
>>
>> On Tue, Jul 14, 2009 at 16:02, Geert Josten <[email protected]> wrote:
>> > Or just have each task update the summary document, each one
>> > incrementing the finished-docs counter by one (if there is any)?
>> >
>> > Note: that effectively serializes all tasks.
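A minimal sketch of that suggestion, modeled with Python threads rather than MarkLogic's task server: each task bumps a shared counter under a lock, and whichever task brings it to the expected total writes the summary. The lock stands in (by assumption) for the lock on the summary document, which is exactly why the increments end up serialized. `CompletionCounter`, `run_task` and `write_summary` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CompletionCounter:
    """Counts finished tasks; the task that finishes last runs a callback."""

    def __init__(self, expected, on_complete):
        self.expected = expected
        self.done = 0
        self.on_complete = on_complete
        self._lock = threading.Lock()  # stands in for the summary-document lock

    def task_finished(self):
        with self._lock:  # every task waits here, so updates are serialized
            self.done += 1
            is_last = self.done == self.expected
        if is_last:
            self.on_complete(self.done)


summaries = []


def write_summary(count):
    # Hypothetical stand-in for updating the summary document.
    summaries.append(count)


def run_task(counter, n):
    try:
        return n * 2  # placeholder for "retrieve and store one document"
    finally:
        # Count the task even if retrieval fails, or the summary never fires.
        counter.task_finished()


N = 1000
counter = CompletionCounter(N, write_summary)
with ThreadPoolExecutor(max_workers=6) as pool:  # six concurrent tasks
    for i in range(N):
        pool.submit(run_task, counter, i)

print(summaries)  # [1000]: the summary was written exactly once
```

The `try`/`finally` matters: without it, one failed retrieval would leave the counter short and the summary would never be written.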
>> >
>> > Kind regards,
>> > Geert
>> >
>> >>
>> >
>> >
>> > Drs. G.P.H. Josten
>> > Consultant
>> >
>> >
>> > http://www.daidalos.nl/
>> > Daidalos BV
>> > Source of Innovation
>> > Hoekeindsehof 1-4
>> > 2665 JZ Bleiswijk
>> > Tel.: +31 (0) 10 850 1200
>> > Fax: +31 (0) 10 850 1199
>> > http://www.daidalos.nl/
>> > KvK 27164984
>> > The information sent in or with this email message originates from
>> > Daidalos BV and is intended solely for the addressee. If you have
>> > received this message unintentionally, please delete it. No rights
>> > can be derived from this message.
>> >
>> >
>> >> From: [email protected]
>> >> [mailto:[email protected]] On Behalf Of
>> >> Jakob Fix
>> >> Sent: Tuesday, 14 July 2009 15:55
>> >> To: General Mark Logic Developer Discussion
>> >> Subject: [MarkLogic Dev General] triggering after spawning
>> >>
>> >> So I managed to spawn some twenty thousand tasks to retrieve documents
>> >> from a remote server and store them in MarkLogic. I've also
>> >> created a user interface with a progress bar to follow its progress
>> >> (although this won't be used in production).
>> >>
>> >> Now, what I'd like to do is to trigger an update of a summary
>> >> document once all spawned tasks have executed. From my limited
>> >> experience with ML, I cannot seem to find a satisfying solution to
>> >> this challenge ...
>> >>
>> >> My ideas:
>> >> - After the spawn, call a function recursively that sleeps for some
>> >>   time and checks the number of tasks in the task queue, and, once
>> >>   it's empty, assumes "that that's that" and updates/creates a document?
>> >> - Have each spawned task inspect the task queue and, if there is just
>> >>   one task in the queue (i.e. itself), trigger the document update?
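The first idea, sleeping and re-checking until the work has drained, can be sketched in Python with a thread pool standing in for MarkLogic's task server; `fetch`, `wait_then_summarize` and `poll_interval` are hypothetical names. Two deliberate deviations: it loops instead of recursing, so a long run cannot hit a recursion limit, and it waits on the tasks themselves rather than on the queue length, because in a generic executor a task that has started running has already left the queue, so an empty queue does not by itself mean every task has finished.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(n):
    # Placeholder for retrieving one document; returns its boolean value.
    time.sleep(0.001)
    return n % 3 != 0


def wait_then_summarize(futures, poll_interval=0.05):
    # Sleep-and-check loop over the tasks' own completion state.
    while not all(f.done() for f in futures):
        time.sleep(poll_interval)
    values = [f.result() for f in futures]
    resolved = sum(values)  # True counts as 1, False as 0
    return {"resolved": resolved, "unresolved": len(values) - resolved, "total": len(values)}


pool = ThreadPoolExecutor(max_workers=6)
futures = [pool.submit(fetch, i) for i in range(300)]
summary = wait_then_summarize(futures)
pool.shutdown()
print(summary)  # {'resolved': 200, 'unresolved': 100, 'total': 300}
```

In plain Python, `concurrent.futures.wait(futures)` would block without polling; the explicit loop is kept only to mirror the sleep-and-check idea.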
>> >>
>> >> Hmmm, any better ideas?
>> >>
>> >> Thanks,
>> >> Jakob.
>> >> _______________________________________________
>> >> General mailing list
>> >> [email protected]
>> >> http://xqzone.com/mailman/listinfo/general
>> >>