Geert,

Good question about storing this info at all. A plain XPath clearly takes too long (five seconds or so), so yes, you're right, I will test the index on the attribute value.
cheers,
Jakob.

On Tue, Jul 14, 2009 at 16:36, Geert Josten<[email protected]> wrote:
> I am wondering why you store it in the database at all. Why not calculate it on
> demand? Putting an index on the boolean element should allow it to perform
> even when you have processed many many many documents..
>
> You might even try doing it without adding a particular index. It might be
> covered by the word index already..
>
> I did a similar thing to keep track of all documents being processed by CPF,
> using counts on all documents with specific property values to show a
> progress bar. I haven't tried it with many documents yet, but just showing
> the progress bar based on about 4 counts takes only a few tenths of a second..
> Didn't need any special indexes at all..
>
> Kind regards,
> Geert
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of
>> Jakob Fix
>> Sent: Tuesday 14 July 2009 16:27
>> To: General Mark Logic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] triggering after spawning
>>
>> Geert,
>>
>> thanks for the quick reply. Some more information which
>> explains the logic behind what I'm doing:
>>
>> Each day I get an input document containing a(n increasing)
>> number of URLs (currently around 23,000) which return XML
>> documents, containing among other things a boolean value.
>> Each day, I record the total number of documents actually
>> retrieved, the number of "true" and the number of "false"
>> (the total number being a kind of checksum).
>>
>> The summary document looks a bit like this:
>>
>> <doi-stats>
>>   ...
>>   <doi-stat date="2009-07-14"
>>     recorded="{fn:current-dateTime()}" resolved="123"
>>     unresolved="456" total="579" />
>>   ...
>> </doi-stats>
>>
>> Now, you're right it might be possible for each spawned task
>> to update this document, however, wouldn't there be a serious
>> performance impact?
>>
>> First, I would have to decrease the number of concurrent
>> tasks (currently six) to maybe two (or even one?), so that
>> there's not too much time spent waiting to update the
>> document. Second, for each document I would need to count
>> all documents in the collection (or the directory), and
>> third, I'd need to do the two XPaths to retrieve the booleans ...
>>
>> The more I think about this approach, the less I'm convinced
>> that it's scalable, but I'd be more than happy to be
>> convinced otherwise!
>>
>> thanks,
>> Jakob.
>>
>> On Tue, Jul 14, 2009 at 16:02, Geert Josten<[email protected]> wrote:
>> > Or just have each task update the summary document, each
>> > incrementing the finished docs counter by one (if there is any)?
>> >
>> > Note that this effectively serializes all tasks..
>> >
>> > Kind regards,
>> > Geert
>> >
>> > Drs. G.P.H. Josten
>> > Consultant
>> >
>> > http://www.daidalos.nl/
>> > Daidalos BV
>> > Source of Innovation
>> > Hoekeindsehof 1-4
>> > 2665 JZ Bleiswijk
>> > Tel.: +31 (0) 10 850 1200
>> > Fax: +31 (0) 10 850 1199
>> > KvK 27164984
>> > The information sent in or with this email message originates
>> > from Daidalos BV and is intended solely for the addressee. If you
>> > have received this message unintentionally, we ask you to delete
>> > it. No rights can be derived from this message.
>> >
>> >> From: [email protected]
>> >> [mailto:[email protected]] On Behalf Of Jakob
>> >> Fix
>> >> Sent: Tuesday 14 July 2009 15:55
>> >> To: General Mark Logic Developer Discussion
>> >> Subject: [MarkLogic Dev General] triggering after spawning
>> >>
>> >> So I managed to spawn some twenty thousand tasks to retrieve documents
>> >> from a remote server and to store them in MarkLogic. I've also
>> >> created a user interface with a progress bar to follow its progress
>> >> (although this won't be used in production).
>> >>
>> >> Now, what I'd like to do is to trigger an update of a summary
>> >> document once all spawned tasks have executed. From my limited
>> >> experience with ML, I cannot seem to find a satisfying solution to
>> >> this challenge ...
>> >>
>> >> My ideas:
>> >> - After the spawn, call a function recursively which sleeps for some
>> >>   time and checks the number of tasks in the task queue, and once it's
>> >>   empty assumes "that that's that" and updates/creates a document?
>> >> - Have each spawned task inspect the task queue and if there is just
>> >>   one task in the queue (i.e. itself), trigger the document update?
>> >>
>> >> Hmmm, any better ideas?
>> >>
>> >> Thanks,
>> >> Jakob.
>> >> _______________________________________________
>> >> General mailing list
>> >> [email protected]
>> >> http://xqzone.com/mailman/listinfo/general
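
The "calculate it on demand" idea discussed in this thread can be sketched roughly as follows. This is only an illustration in plain Python over an in-memory list, not MarkLogic code: in MarkLogic the counts would come from the server and its indexes. The function name `summarize` and the assumption that each retrieved document contributes the string "true" or "false" are hypothetical; only the `doi-stat` attribute names come from the summary document quoted above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def summarize(flags, date, recorded=None):
    """Build a <doi-stat> element from per-document boolean strings.

    `flags` is a hypothetical input: one "true"/"false" string per
    retrieved document. Nothing is stored incrementally; the counts
    are derived from the documents each time they are needed.
    """
    resolved = sum(1 for f in flags if f == "true")
    unresolved = sum(1 for f in flags if f == "false")
    total = len(flags)

    # The total acts as a checksum: every retrieved document must
    # have been counted as either "true" or "false".
    if resolved + unresolved != total:
        raise ValueError("checksum mismatch: unexpected boolean value")

    if recorded is None:
        recorded = datetime.now(timezone.utc).isoformat()

    return ET.Element("doi-stat", {
        "date": date,
        "recorded": recorded,
        "resolved": str(resolved),
        "unresolved": str(unresolved),
        "total": str(total),
    })


if __name__ == "__main__":
    stat = summarize(["true"] * 123 + ["false"] * 456, "2009-07-14")
    print(ET.tostring(stat, encoding="unicode"))
```

Because the summary is derived from the stored documents rather than incremented by each task, the spawned tasks never contend for a shared summary document, which is the serialization concern raised earlier in the thread.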
