Thanks, that does the trick!
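
For the archive, a small sketch of why the Active figure quoted below went negative. It only replays the arithmetic from the thread in plain XQuery; the literal counts are the ones reported below, and treating 1775 as an inflated estimate rather than a real error count is the thread's own conclusion, not something this snippet verifies:

```xquery
xquery version "1.0";

(: Counts as reported in the quoted message below. :)
let $total-count := 1802
let $done-count := 1802
(: Value returned by xdmp:estimate() with the '/text()' step in place. :)
let $error-count := 1775
(: Same formula as the original progress query. :)
let $active-count := $total-count - $error-count - $done-count
return $active-count
(: evaluates to -1775 :)
```

With the '/text()' step removed the error estimate should no longer be inflated, so the same formula gives 1802 - 0 - 1802 = 0 for this data set.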
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: Tuesday 14 July 2009 22:44
> To: General Mark Logic Developer Discussion
> Subject: Re: [MarkLogic Dev General] xdmp:estimate..
>
> Geert,
>
> Try removing the '/text()' step. It isn't necessary, and
> seems to confuse the evaluator in this case.
>
> import module namespace cpf = "http://marklogic.com/cpf" at
>   "/MarkLogic/cpf/cpf.xqy";
>
> xdmp:estimate(
>   xdmp:document-properties()/prop:properties[
>     cpf:state/text() = 'http://marklogic.com/states/error'] ),
> xdmp:estimate(
>   xdmp:document-properties()/prop:properties[
>     cpf:state = 'http://marklogic.com/states/error'] ),
> count(
>   xdmp:document-properties()/prop:properties[
>     cpf:state = 'http://marklogic.com/states/error'] )
>
> => 2 0 0
>
> I believe that '2' is the count of property fragments that
> have prop:properties and cpf:state elements, ignoring the
> value of cpf:state.
>
> -- Mike
>
> On 2009-07-14 13:38, Geert Josten wrote:
> > Mike,
> >
> > No I hadn't, actually. But now I have. :-)
> >
> > I have only a few documents in the database, but even for this
> > small set the profile timing drops from 0.2 sec to 0.02 sec.
> > Unfortunately, I was expecting the following numbers:
> >
> > Total: 1802
> > Done: 1802 (100%)
> > Active: 0 (0%)
> > Error: 0 (0%)
> >
> > But am now getting:
> >
> > Total: 1802
> > Done: 1802 (100%)
> > Active: -1775 (-99%)
> > Error: 1775 (99%)
> >
> > Apparently, the estimate for error docs differs from the real count:
> >
> > let $error-count := xdmp:estimate(
> >   xdmp:document-properties()/prop:properties[cpf:state/text() =
> >     'http://marklogic.com/states/error'] )
> >
> > I am guessing it is some sophisticated 'feature' of xdmp:estimate
> > being fragment based, but have trouble figuring things out.
> >
> > Some database statistics:
> > Docs: 1,802
> > Fragments: 3,619
> > Deleted: 370
> > Stands: 2
> >
> > A merge didn't make any difference, other than clearing deleted
> > fragments..
> >
> > Any ideas? Anyone?
> >
> > Kind regards,
> > Geert
> >
> >> -----Original Message-----
> >> From: [email protected]
> >> [mailto:[email protected]] On Behalf Of Michael Blakeley
> >> Sent: Tuesday 14 July 2009 17:39
> >> To: General Mark Logic Developer Discussion
> >> Subject: Re: [MarkLogic Dev General] triggering after spawning
> >>
> >> Geert,
> >>
> >> Have you tried xdmp:estimate() instead of count()? The difference is
> >> that count() generally drives I/O, while xdmp:estimate() does not.
> >> For this purpose, I believe that both will return the same results
> >> using the default indexes. I don't think any special indexes are
> >> needed.
> >>
> >> thanks,
> >> -- Mike
> >>
> >> On 2009-07-14 07:55, Geert Josten wrote:
> >>> Hi Jakob,
> >>>
> >>> I am, quite crudely, doing things like this:
> >>>
> >>> let $total-count := count(
> >>>   xdmp:document-properties()/prop:properties/cpf:processing-status )
> >>>
> >>> let $done-count := count(
> >>>   xdmp:document-properties()/prop:properties[
> >>>     cpf:processing-status/text() = 'done' and
> >>>     not(cpf:state/text() = 'http://marklogic.com/states/error')] )
> >>>
> >>> let $error-count := count(
> >>>   xdmp:document-properties()/prop:properties[
> >>>     cpf:state/text() = 'http://marklogic.com/states/error'] )
> >>>
> >>> let $active-count := $total-count - $error-count - $done-count
> >>>
> >>> No looping, just xpath with predicates wrapped in a count. No
> >>> special indexes (yet)..
> >>> Kind regards,
> >>> Geert
> >>>
> >>>> -----Original Message-----
> >>>> From: [email protected]
> >>>> [mailto:[email protected]] On Behalf Of Jakob Fix
> >>>> Sent: Tuesday 14 July 2009 16:44
> >>>> To: General Mark Logic Developer Discussion
> >>>> Subject: Re: [MarkLogic Dev General] triggering after spawning
> >>>>
> >>>> Geert,
> >>>>
> >>>> Good question about storing this info at all. Doing a normal xpath
> >>>> clearly takes too long (five seconds or so), so yes, you're right,
> >>>> I will test the index on the attribute value.
> >>>>
> >>>> cheers,
> >>>> Jakob.
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Jul 14, 2009 at 16:36, Geert Josten<[email protected]> wrote:
> >>>>> I am wondering why store it in the database at all. Why not
> >>>>> calculate it on demand? Putting an index on the boolean element
> >>>>> should allow it to perform even when you have processed many many
> >>>>> many documents..
> >>>>> You might even try doing it without adding a particular index. It
> >>>>> might be covered by the word index already..
> >>>>> I did a similar thing to keep track of all documents being
> >>>>> processed by CPF, using counts on all documents with specific
> >>>>> property values to show a progress bar. I haven't tried it with
> >>>>> many documents yet, but just showing the progress bar, based on
> >>>>> about 4 counts, takes only a few tenths of a second..
> >>>>> Didn't need any special indexes at all..
> >>>>> Kind regards,
> >>>>> Geert
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: [email protected]
> >>>>>> [mailto:[email protected]] On Behalf Of Jakob Fix
> >>>>>> Sent: Tuesday 14 July 2009 16:27
> >>>>>> To: General Mark Logic Developer Discussion
> >>>>>> Subject: Re: [MarkLogic Dev General] triggering after spawning
> >>>>>>
> >>>>>> Geert,
> >>>>>>
> >>>>>> thanks for the quick reply.
> >>>>>> Some more information which explains the logic behind what I'm
> >>>>>> doing:
> >>>>>>
> >>>>>> Each day I get an input document containing a(n increasing)
> >>>>>> number of URLs (currently around 23,000) which return XML
> >>>>>> documents, containing among other things a boolean value.
> >>>>>> Each day, I record the total number of documents actually
> >>>>>> retrieved, the number of "true" and the number of "false"
> >>>>>> (the total number being a kind of checksum).
> >>>>>>
> >>>>>> The summary document looks a bit like this:
> >>>>>>
> >>>>>> <doi-stats>
> >>>>>>   ...
> >>>>>>   <doi-stat date="2009-07-14"
> >>>>>>     recorded="{fn:current-dateTime()}" resolved="123"
> >>>>>>     unresolved="456" total="579" />
> >>>>>>   ...
> >>>>>> </doi-stats>
> >>>>>>
> >>>>>> Now, you're right it might be possible for each spawned task to
> >>>>>> update this document, however, wouldn't there be a serious
> >>>>>> performance impact?
> >>>>>>
> >>>>>> First, I would have to decrease the number of concurrent tasks
> >>>>>> (currently six) to maybe two (or even one?), so that there's not
> >>>>>> too much time spent waiting to update the document. Second, for
> >>>>>> each document I would need to count all documents in the
> >>>>>> collection (or the directory), and third, I'd need to do the two
> >>>>>> xpaths to retrieve the booleans ...
> >>>>>>
> >>>>>> The more I think about this approach, the less I'm convinced that
> >>>>>> it's scalable, but I'd be more than happy to be convinced
> >>>>>> otherwise!
> >>>>>> thanks,
> >>>>>> Jakob.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Jul 14, 2009 at 16:02, Geert Josten<[email protected]> wrote:
> >>>>>>> Or just have each task update the summary document, each
> >>>>>>> incrementing the finished docs counter by one (if there is any)?
> >>>>>>> Note: that effectively serializes all tasks..
> >>>>>>>
> >>>>>>> Kind regards,
> >>>>>>> Geert
> >>>>>>>
> >>>>>>>
> >>>>>>> Drs.
> >>>>>>> G.P.H. Josten
> >>>>>>> Consultant
> >>>>>>>
> >>>>>>>
> >>>>>>> http://www.daidalos.nl/
> >>>>>>> Daidalos BV
> >>>>>>> Source of Innovation
> >>>>>>> Hoekeindsehof 1-4
> >>>>>>> 2665 JZ Bleiswijk
> >>>>>>> Tel.: +31 (0) 10 850 1200
> >>>>>>> Fax: +31 (0) 10 850 1199
> >>>>>>> http://www.daidalos.nl/
> >>>>>>> KvK 27164984
> >>>>>>> The information sent in or with this email message originates
> >>>>>>> from Daidalos BV and is intended solely for the addressee. If
> >>>>>>> you have received this message unintentionally, we request that
> >>>>>>> you delete it. No rights can be derived from this message.
> >>>>>>>
> >>>>>>>> From: [email protected]
> >>>>>>>> [mailto:[email protected]] On Behalf Of Jakob Fix
> >>>>>>>> Sent: Tuesday 14 July 2009 15:55
> >>>>>>>> To: General Mark Logic Developer Discussion
> >>>>>>>> Subject: [MarkLogic Dev General] triggering after spawning
> >>>>>>>>
> >>>>>>>> So I managed to spawn some twenty thousand tasks to retrieve
> >>>>>>>> documents from a remote server and to store them in MarkLogic.
> >>>>>>>> I've also created a user interface with a progress bar to
> >>>>>>>> follow its progress (although this won't be used in
> >>>>>>>> production).
> >>>>>>>>
> >>>>>>>> Now, what I'd like to do is to trigger an update of a summary
> >>>>>>>> document once all spawned tasks have executed. From my limited
> >>>>>>>> experience with ML, I cannot seem to find a satisfying
> >>>>>>>> solution to this challenge ...
> >>>>>>>>
> >>>>>>>> My ideas:
> >>>>>>>> - After the spawn, call a function recursively which sleeps
> >>>>>>>>   for some time and checks the number of tasks in the task
> >>>>>>>>   queue, and once it's empty assumes "that that's that" and
> >>>>>>>>   updates/creates a document?
> >>>>>>>> - Have each spawned task inspect the task queue and if there
> >>>>>>>>   is just one task in the queue (i.e. itself), trigger the
> >>>>>>>>   document update?
> >>>>>>>> Hmmm, any better ideas?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Jakob.
> >>>>>>>> _______________________________________________
> >>>>>>>> General mailing list
> >>>>>>>> [email protected]
> >>>>>>>> http://xqzone.com/mailman/listinfo/general
