Mike,

No, I hadn't, actually. But now I have. :-)

I have only a few documents in the database, but even with this small set the profile timing drops from 0.2 sec to 0.02 sec. Unfortunately, I was expecting the following numbers:

Total: 1802
Done: 1802 (100%)
Active: 0 (0%)
Error: 0 (0%)

But I am now getting:

Total: 1802
Done: 1802 (100%)
Active: -1775 (-99%)
Error: 1775 (99%)

Apparently, the estimate for error docs differs from the real count:

    let $error-count := xdmp:estimate(
      xdmp:document-properties()/prop:properties[cpf:state/text() = 'http://marklogic.com/states/error']
    )

I am guessing it is some sophisticated 'feature' of xdmp:estimate being fragment-based, but I am having trouble figuring things out. Some database statistics:

Docs: 1,802
Fragments: 3,619
Deleted: 370
Stands: 2

A merge didn't make any difference, other than clearing the deleted fragments.

Any ideas? Anyone?

Kind regards,
Geert

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: Tuesday 14 July 2009 17:39
> To: General Mark Logic Developer Discussion
> Subject: Re: [MarkLogic Dev General] triggering after spawning
>
> Geert,
>
> Have you tried xdmp:estimate() instead of count()? The
> difference is that count() generally drives I/O, while
> xdmp:estimate() does not. For this purpose, I believe that
> both will return the same results using the default indexes.
> I don't think any special indexes are needed.
>
> thanks,
> -- Mike
>
> On 2009-07-14 07:55, Geert Josten wrote:
> > Hi Jakob,
> >
> > I am, quite crudely, doing things like this:
> >
> >     let $total-count := count(
> >       xdmp:document-properties()/prop:properties/cpf:processing-status )
> >
> >     let $done-count := count(
> >       xdmp:document-properties()/prop:properties[cpf:processing-status/text() = 'done'
> >         and not(cpf:state/text() = 'http://marklogic.com/states/error')] )
> >
> >     let $error-count := count(
> >       xdmp:document-properties()/prop:properties[cpf:state/text() = 'http://marklogic.com/states/error'] )
> >
> >     let $active-count := $total-count - $error-count - $done-count
> >
> > No looping, just XPath with predicates wrapped in a count.
> > No special indexes (yet).
> >
> > Kind regards,
> > Geert
> >
> >> -----Original Message-----
> >> From: [email protected]
> >> [mailto:[email protected]] On Behalf Of Jakob Fix
> >> Sent: Tuesday 14 July 2009 16:44
> >> To: General Mark Logic Developer Discussion
> >> Subject: Re: [MarkLogic Dev General] triggering after spawning
> >>
> >> Geert,
> >>
> >> Good question about storing this info at all. Doing a normal XPath
> >> clearly takes too long (five seconds or so), so yes, you're right, I
> >> will test the index on the attribute value.
> >>
> >> cheers,
> >> Jakob.
> >>
> >> On Tue, Jul 14, 2009 at 16:36, Geert Josten<[email protected]> wrote:
> >>> I am wondering why you store it in the database at all. Why not
> >>> calculate it on demand? Putting an index on the boolean element
> >>> should allow it to perform even when you have processed many, many
> >>> documents.
> >>> You might even try doing it without adding a particular index. It
> >>> might be covered by the word index already.
> >>> I did a similar thing to keep track of all documents being
> >>> processed by CPF, using counts on all documents with specific
> >>> property values to show a progress bar.
> >>> I haven't tried it with many documents yet, but just showing the
> >>> progress bar, based on about 4 counts, takes only a few tenths of
> >>> a second. It didn't need any special indexes at all.
> >>>
> >>> Kind regards,
> >>> Geert
> >>>
> >>>> -----Original Message-----
> >>>> From: [email protected]
> >>>> [mailto:[email protected]] On Behalf Of Jakob Fix
> >>>> Sent: Tuesday 14 July 2009 16:27
> >>>> To: General Mark Logic Developer Discussion
> >>>> Subject: Re: [MarkLogic Dev General] triggering after spawning
> >>>>
> >>>> Geert,
> >>>>
> >>>> thanks for the quick reply. Some more information which explains the
> >>>> logic behind what I'm doing:
> >>>>
> >>>> Each day I get an input document containing a(n increasing) number of
> >>>> URLs (currently around 23,000) which return XML documents, containing
> >>>> among other things a boolean value.
> >>>> Each day, I record the total number of documents actually retrieved,
> >>>> the number of "true" and the number of "false"
> >>>> (the total number being a kind of checksum).
> >>>>
> >>>> The summary document looks a bit like this:
> >>>>
> >>>>     <doi-stats>
> >>>>       ...
> >>>>       <doi-stat date="2009-07-14"
> >>>>         recorded="{fn:current-dateTime()}" resolved="123"
> >>>>         unresolved="456" total="579" />
> >>>>       ...
> >>>>     </doi-stats>
> >>>>
> >>>> Now, you're right that it might be possible for each spawned task to
> >>>> update this document; however, wouldn't there be a serious
> >>>> performance impact?
> >>>>
> >>>> First, I would have to decrease the number of concurrent tasks
> >>>> (currently six) to maybe two (or even one?), so that there's not too
> >>>> much time spent waiting to update the document. Second, for each
> >>>> document I would need to count all documents in the collection (or
> >>>> the directory), and third, I'd need to do the two XPaths to retrieve
> >>>> the booleans ...
> >>>>
> >>>> The more I think about this approach, the less I'm convinced that
> >>>> it's scalable, but I'd be more than happy to be convinced otherwise!
> >>>>
> >>>> thanks,
> >>>> Jakob.
> >>>>
> >>>> On Tue, Jul 14, 2009 at 16:02, Geert Josten<[email protected]> wrote:
> >>>>> Or just have each task update the summary document, each
> >>>>> incrementing the finished docs counter by one (if there is any)?
> >>>>> Note: that effectively serializes all tasks.
> >>>>>
> >>>>> Kind regards,
> >>>>> Geert
> >>>>>
> >>>>> Drs. G.P.H. Josten
> >>>>> Consultant
> >>>>>
> >>>>> http://www.daidalos.nl/
> >>>>> Daidalos BV
> >>>>> Source of Innovation
> >>>>> Hoekeindsehof 1-4
> >>>>> 2665 JZ Bleiswijk
> >>>>> Tel.: +31 (0) 10 850 1200
> >>>>> Fax: +31 (0) 10 850 1199
> >>>>> KvK 27164984
> >>>>> The information sent in or with this email message originates
> >>>>> from Daidalos BV and is intended solely for the addressee. If you
> >>>>> have received this message unintentionally, we request that you
> >>>>> delete it. No rights can be derived from this message.
> >>>>>
> >>>>>> From: [email protected]
> >>>>>> [mailto:[email protected]] On Behalf Of Jakob Fix
> >>>>>> Sent: Tuesday 14 July 2009 15:55
> >>>>>> To: General Mark Logic Developer Discussion
> >>>>>> Subject: [MarkLogic Dev General] triggering after spawning
> >>>>>>
> >>>>>> So I managed to spawn some twenty thousand tasks to retrieve documents
> >>>>>> from a remote server and to store them in MarkLogic. I've also
> >>>>>> created a user interface with a progress bar to follow their progress
> >>>>>> (although this won't be used in production).
> >>>>>>
> >>>>>> Now, what I'd like to do is to trigger an update of a summary
> >>>>>> document once all spawned tasks have executed.
> >>>>>> From my limited experience with ML, I cannot seem to find a satisfying
> >>>>>> solution to this challenge ...
> >>>>>>
> >>>>>> My ideas:
> >>>>>> - After the spawn call a function recursively which sleeps for some
> >>>>>>   time and checks the number of tasks in the task queue, and once it's
> >>>>>>   empty assumes "that that's that" and updates/creates a document?
> >>>>>> - Have each spawned task inspect the task queue and if there is just
> >>>>>>   one task in the queue (i.e. itself), trigger the document update?
> >>>>>>
> >>>>>> Hmmm, any better ideas?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Jakob.
> >>>>>> _______________________________________________
> >>>>>> General mailing list
> >>>>>> [email protected]
> >>>>>> http://xqzone.com/mailman/listinfo/general
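
A minimal sketch for narrowing down the estimate/count mismatch reported in the newest message of this thread. It runs the filtered count, the path-based estimate, and a cts:search-based estimate side by side, so the three numbers can be compared. This is only a diagnostic aid, not an explanation of the discrepancy. Assumptions not taken from the thread: the namespace URIs bound to the prop and cpf prefixes, the availability of cts:properties-query in the server version in use, and the wrapper element names (estimate-check and its children), which are purely illustrative. Because xdmp:estimate is resolved from the indexes without filtering, either estimate may still differ from the exact count.

```xquery
xquery version "1.0-ml";

(: Assumed namespace bindings; the thread uses the prefixes without showing them. :)
declare namespace prop = "http://marklogic.com/xdmp/property";
declare namespace cpf  = "http://marklogic.com/cpf";

(: Filtered count: exact, but it has to read the matching fragments. :)
let $exact := fn:count(
  xdmp:document-properties()/prop:properties
    [cpf:state/text() = 'http://marklogic.com/states/error']
)

(: Index-only estimate of the same path, as in the message above. :)
let $path-estimate := xdmp:estimate(
  xdmp:document-properties()/prop:properties
    [cpf:state/text() = 'http://marklogic.com/states/error']
)

(: Index-only estimate expressed as an explicit query on the properties
   fragments. Assumes cts:properties-query is available. :)
let $query-estimate := xdmp:estimate(
  cts:search(
    fn:doc(),
    cts:properties-query(
      cts:element-value-query(
        xs:QName("cpf:state"),
        "http://marklogic.com/states/error"
      )
    )
  )
)

(: Hypothetical wrapper element, only to show the three numbers together. :)
return
  <estimate-check>
    <exact>{ $exact }</exact>
    <path-estimate>{ $path-estimate }</path-estimate>
    <query-estimate>{ $query-estimate }</query-estimate>
  </estimate-check>
```

If the query-based estimate agrees with the exact count while the path-based one does not, that would suggest the predicate as written is not being fully resolved from the indexes. If both estimates are off, the exact count remains the safer choice for the error figure.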
