There’s also the CORB facility.  I would try Taskbot, but if you’re not 
familiar with it, I would also try to do a version of it using 
xdmp:spawn-function.  It is a sometimes overlooked but extremely useful 
function, and learning to use it is worthwhile.
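
As a rough illustration of the xdmp:spawn-function approach (a sketch only, 
not tested against your data): the query below lists document URIs, splits 
them into batches, and spawns one Task Server task per batch, each applying 
the delete/insert from the original question in its own transaction. 
Assumptions: the URI lexicon is enabled (cts:uris needs it), the server is 
MarkLogic 7 or later, the batch size of 500 is an arbitrary placeholder, and 
the transaction-mode option is the MarkLogic 7/8 style, so check the eval 
options documentation for your release.

```xquery
xquery version "1.0-ml";

(: Hypothetical batch size; tune it for your documents and cluster. :)
declare variable $batch-size as xs:integer := 500;

(: Assumes the URI lexicon is enabled on the database. :)
let $uris := cts:uris((), (),
  cts:element-query(xs:QName("A"), cts:and-query(())))
let $batch-count := xs:integer(fn:ceiling(fn:count($uris) div $batch-size))
for $i in 1 to $batch-count
let $batch := fn:subsequence($uris, ($i - 1) * $batch-size + 1, $batch-size)
return
  (: Each spawned task updates only its own batch of documents. :)
  xdmp:spawn-function(
    function() {
      for $uri in $batch
      (: Skip documents whose root element is not A. :)
      for $root in fn:doc($uri)/A
      return (
        xdmp:node-delete($root/test),
        xdmp:node-insert-child($root, <test>test</test>)
      )
    },
    <options xmlns="xdmp:eval">
      <transaction-mode>update-auto-commit</transaction-mode>
    </options>
  )
```

Parallelism then comes from the Task Server thread count. A very large 
collection may exceed the task queue limit, so you may need to throttle how 
many batches are spawned at once.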

Paul Hoehne
Senior Consultant
MarkLogic Corporation
paul.hoe...@marklogic.com
mobile: +1 571 830 4735
www.marklogic.com

From: David Ennis <david.en...@hinttech.com>
Reply-To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Date: Thursday, January 15, 2015 at 1:43 PM
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] Bulk content processing in MarkLogic

Hi.

I usually spawn these types of things in batches.

There is also a nice utility by Michael Blakeley that helps manage this and 
makes good use of the resources of your particular setup, including a nice 
sample to start with:

https://github.com/mblakele/taskbot

It uses pretty much the same functions you would likely use on your own, but 
organized nicely in a reusable, configurable way.


Kind Regards,
David Ennis


David Ennis
Content Engineer

[HintTech] <http://www.hinttech.com/>
Mastering the value of content
creative | technology | content

Delftechpark 37i
2628 XJ Delft
The Netherlands
T: +31 88 268 25 00
M: +31 63 091 72 80


On 15 January 2015 at 19:33, Alexei Betin <abe...@elevate.com> wrote:
Hello,

I have stumbled upon what seems to be a straightforward task: making a bulk 
modification of XML documents in MarkLogic (such as adding a new element to 
every document in the collection). I looked at CPF first, but it seems that 
it only supports event-based processing (triggers) and does not have any 
facility for batch processing.

So I just wrote a simple XQuery as follows:

for $x in collection()/A  return ( xdmp:node-delete( $x/test ), 
xdmp:node-insert-child( $x, <test>test</test> ) )

but it runs out of memory on a large collection: “Expanded tree cache full”. 
So it looks like the above query is trying to fetch all documents into memory 
first, then iterate over them.

Whereas what I want is to perform the work on smaller chunks of data that fit 
into memory and, ideally, do several such chunks in parallel (think “map” 
without “reduce”).

Is there another approach? I am reading about CoRB that seems to be just the 
thing I need, but I wonder if I am missing another potential solution here.

Also, while the CoRB description mentions that it can run updates on disk (not 
in memory), it does not mention parallelization, which will eventually be quite 
important for my use case.

Thanks,

Alexei Betin

Principal Architect; Big Data
P: (817) 928-1643 | Elevate.com
4150 International Plaza, Suite 300
Fort Worth, TX 76109





_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general

