Geert,

Try removing the '/text()' step. It isn't necessary, and seems to confuse the evaluator in this case.

import module namespace cpf = "http://marklogic.com/cpf" at "/MarkLogic/cpf/cpf.xqy";

xdmp:estimate(
  xdmp:document-properties()/prop:properties[
    cpf:state/text() = 'http://marklogic.com/states/error'] )
,
xdmp:estimate(
  xdmp:document-properties()/prop:properties[
    cpf:state = 'http://marklogic.com/states/error'] )
,
count(
  xdmp:document-properties()/prop:properties[
    cpf:state = 'http://marklogic.com/states/error'] )

=> 2 0 0

I believe that '2' is the count of property fragments that have prop:properties and cpf:state elements, ignoring the value of cpf:state.
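
For comparison, here is a minimal sketch of the same check written as an explicit cts:search, so that the value comparison is stated as an index query instead of being left to the XPath optimizer. It is untested against this database and rests on a few assumptions: that cts:properties-query is available (later releases name it cts:properties-fragment-query), that the error state is the exact text value of cpf:state, and that counting documents whose properties match is an acceptable stand-in for counting properties fragments.

```xquery
(: Only the namespace is needed here, so it is declared rather than imported. :)
declare namespace cpf = "http://marklogic.com/cpf";

(: Assumption: match properties fragments whose cpf:state equals the error URI. :)
let $error-query :=
  cts:properties-query(
    cts:element-value-query(
      xs:QName("cpf:state"),
      "http://marklogic.com/states/error" ) )
return (
  (: resolved from the indexes only, no fragment I/O :)
  xdmp:estimate( cts:search( fn:doc(), $error-query ) ),
  (: filtered search for comparison; reads the candidate fragments :)
  count( cts:search( fn:doc(), $error-query ) )
)
```

If the two numbers agree, the indexes are resolving the value comparison accurately. A large gap would suggest the estimate is matching on something weaker than the value.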

-- Mike

On 2009-07-14 13:38, Geert Josten wrote:
Mike,

No I hadn't, actually. But now I have. :-)

I have only a few documents in the database, but even at this small scale
the timing drops from 0.2 sec to 0.02 sec. Unfortunately, I was expecting the
following numbers:

Total: 1802
Done: 1802 (100%)
Active: 0 (0%)
Error: 0 (0%)

But am now getting:

Total: 1802
Done: 1802 (100%)
Active: -1775 (-99%)
Error: 1775 (99%)

Apparently, the estimate for error docs differs from the real count:

let $error-count := xdmp:estimate(
  xdmp:document-properties()/prop:properties[
    cpf:state/text() = 'http://marklogic.com/states/error'] )

I am guessing it is some sophisticated 'feature' of xdmp:estimate being 
fragment based, but have trouble figuring things out.
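
The negative Active figure follows directly from the subtraction in the earlier script once the error estimate overcounts. A small sketch with the reported numbers hard-coded (the variable names mirror the earlier script; rounding to whole percentages is an assumption about how the progress bar formats its output):

```xquery
let $total-count := 1802
let $done-count := 1802
(: the value xdmp:estimate returned with the text() step in the predicate :)
let $error-count := 1775
(: same formula as in the earlier script :)
let $active-count := $total-count - $error-count - $done-count
return (
  $active-count,
  fn:round( 100 * $active-count div $total-count )
)
```

This returns -1775 and -99, matching the Active line above, so the error estimate is the only figure that is off.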

Some database statistics:
Docs: 1,802
Fragments: 3,619
Deleted: 370
Stands: 2

A merge didn't make any difference, other than clearing deleted fragments.

Any ideas? Anyone?

Kind regards,
Geert

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of
Michael Blakeley
Sent: Tuesday, 14 July 2009 17:39
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] triggering after spawning

Geert,

Have you tried xdmp:estimate() instead of count()? The
difference is that count() generally drives I/O, while
xdmp:estimate() does not. For this purpose, I believe that
both will return the same results using the default indexes.
I don't think any special indexes are needed.

thanks,
-- Mike

On 2009-07-14 07:55, Geert Josten wrote:
Hi Jakob,

I am, quite crudely, doing things like this:

let $total-count := count(
  xdmp:document-properties()/prop:properties/cpf:processing-status )
let $done-count := count(
  xdmp:document-properties()/prop:properties[
    cpf:processing-status/text() = 'done'
    and not(cpf:state/text() = 'http://marklogic.com/states/error')] )
let $error-count := count(
  xdmp:document-properties()/prop:properties[
    cpf:state/text() = 'http://marklogic.com/states/error'] )
let $active-count := $total-count - $error-count - $done-count

No looping, just xpath with predicates wrapped in a count.
No special indexes (yet)..
Kind regards,
Geert

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Jakob Fix
Sent: Tuesday, 14 July 2009 16:44
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] triggering after spawning

Geert,

Good question about storing this info at all.  Doing a normal xpath
clearly takes too long (five seconds or so), so yes, you're right, I
will test the index on the attribute value.

cheers,
Jakob.



On Tue, Jul 14, 2009 at 16:36, Geert Josten <[email protected]> wrote:
I am wondering why you store it in the database at all. Why not
calculate it on demand? Putting an index on the boolean element should
allow it to perform even when you have processed many, many documents.

You might even try doing it without adding a particular index. It might
be covered by the word index already.

I did a similar thing to keep track of all documents being processed by
CPF, using counts on all documents with specific property values to show
a progress bar. I haven't tried it with many documents yet, but just
showing the progress bar, based on about 4 counts, takes only a few
tenths of a second. It didn't need any special indexes at all.

Kind regards,
Geert

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Jakob Fix
Sent: Tuesday, 14 July 2009 16:27
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] triggering after spawning

Geert,

thanks for the quick reply. Some more information which explains the
logic behind what I'm doing:

Each day I get an input document containing a(n increasing) number of
URLs (currently around 23,000) which return XML documents, containing
among other things a boolean value.
Each day, I record the total number of documents actually retrieved,
the number of "true" and the number of "false" (the total number being
a kind of checksum).

The summary document looks a bit like this:

<doi-stats>
    ...
    <doi-stat date="2009-07-14"
        recorded="{fn:current-dateTime()}" resolved="123"
        unresolved="456" total="579" />
    ...
</doi-stats>

Now, you're right it might be possible for each spawned task to
update this document, however, wouldn't there be a serious
performance impact?

First, I would have to decrease the number of concurrent tasks
(currently six) to maybe two (or even one?), so that there's not too
much time spent waiting to update the document.  Second, for each
document I would need to count all documents in the collection (or
the directory), and third, I'd need to do the two xpaths to retrieve
the booleans ...

The more I think about this approach, the less I'm convinced that
it's scalable, but I'd be more than happy to be convinced otherwise!

thanks,
Jakob.



On Tue, Jul 14, 2009 at 16:02, Geert Josten <[email protected]> wrote:
Or just have each task update the summary document, each
incrementing the finished docs counter by one (if there is any)?
Note: that effectively serializes all tasks.

Kind regards,
Geert


Drs. G.P.H. Josten
Consultant


http://www.daidalos.nl/
Daidalos BV
Source of Innovation
Hoekeindsehof 1-4
2665 JZ Bleiswijk
Tel.: +31 (0) 10 850 1200
Fax: +31 (0) 10 850 1199
http://www.daidalos.nl/
KvK 27164984
From: [email protected]
[mailto:[email protected]] On Behalf Of Jakob Fix
Sent: Tuesday, 14 July 2009 15:55
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] triggering after spawning

So I managed to spawn some twenty thousand tasks to retrieve documents
from a remote server and to store them in MarkLogic. I've also
created a user interface with a progress bar to follow its progress
(although this won't be used in production).

Now, what I'd like to do is to trigger an update of a summary
document once all spawned tasks have executed. From my limited
experience with ML, I cannot seem to find a satisfying solution to
this challenge ...

My ideas:
- After the spawn, call a function recursively which sleeps for some
  time and checks the number of tasks in the task queue, and once it's
  empty assumes "that that's that" and updates/creates a document?
- Have each spawned task inspect the task queue and, if there is just
  one task in the queue (i.e. itself), trigger the document update?

Hmmm, any better ideas?

Thanks,
Jakob.
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
