There’s also the CoRB facility. I would try Taskbot, but if you’re not familiar with it, I would also try writing a version of it yourself using xdmp:spawn-function. Knowing how to use xdmp:spawn-function is a sometimes overlooked but extremely useful skill.
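A minimal sketch of the batch-and-spawn approach, applied to the update from the original question. It is not Taskbot and not a tested module. Assumptions not taken from the thread: the documents sit in a collection named "my-collection" (hypothetical), the URI lexicon is enabled, 500 URIs per batch is a reasonable size (hypothetical), and the server is MarkLogic 8 or later (for the `update` eval option). cts:uris reads URIs from the lexicon without loading any documents, so the driver stays small. Each spawned task then updates one batch in its own transaction on the Task Server, and the Task Server's thread pool can run several batches in parallel.

```xquery
xquery version "1.0-ml";

(: Hypothetical settings, not from the original thread. :)
declare variable $COLLECTION := "my-collection";
declare variable $BATCH-SIZE := 500;

(: URIs come from the URI lexicon; no documents are loaded here. :)
let $uris := cts:uris((), (), cts:collection-query($COLLECTION))
let $batches := xs:integer(fn:ceiling(fn:count($uris) div $BATCH-SIZE))
for $i in 1 to $batches
let $batch := fn:subsequence($uris, ($i - 1) * $BATCH-SIZE + 1, $BATCH-SIZE)
return xdmp:spawn-function(
  function() {
    (: Runs later on the Task Server, one transaction per batch. :)
    for $uri in $batch
    let $root := fn:doc($uri)/A
    where fn:exists($root)
    return (
      (: Same update as in the question, limited to this batch. :)
      xdmp:node-delete($root/test),
      xdmp:node-insert-child($root, <test>test</test>)
    )
  },
  <options xmlns="xdmp:eval">
    <update>true</update>
  </options>
)
```

With a very large collection the number of spawned tasks can exceed the Task Server queue size, so the batch size and the number of tasks queued at once need tuning; that throttling is the kind of bookkeeping Taskbot is meant to take care of.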
Paul Hoehne
Senior Consultant
MarkLogic Corporation
paul.hoe...@marklogic.com
mobile: +1 571 830 4735
www.marklogic.com
Click http://po.st/hMGDFm to get your free NoSQL For Dummies e-book!

From: David Ennis <david.en...@hinttech.com>
Reply-To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Date: Thursday, January 15, 2015 at 1:43 PM
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] Bulk content processing in MarkLogic

Hi,

I usually spawn these types of things in batches. There is also a nice utility by Michael Blakeley to help manage this and make good use of the resources of your particular setup, including a nice sample to start with: https://github.com/mblakele/taskbot

It uses pretty much the same functions you would likely use on your own, but organized in a reusable, configurable way.

Kind Regards,
David Ennis

Content Engineer
HintTech <http://www.hinttech.com/>
Mastering the value of content
creative | technology | content
Delftechpark 37i
2628 XJ Delft
The Netherlands
T: +31 88 268 25 00
M: +31 63 091 72 80
http://www.hinttech.com
https://twitter.com/HintTech
http://www.facebook.com/HintTech
http://www.linkedin.com/company/HintTech

On 15 January 2015 at 19:33, Alexei Betin <abe...@elevate.com> wrote:

Hello,

I stumbled upon what seems to be a straightforward task: making a bulk modification of XML documents in MarkLogic (such as adding a new element to every document in the collection).
I looked at CPF first, but it seems to support only event-based processing (triggers) and has no facility for batch processing. So I wrote a simple XQuery:

for $x in collection()/A
return (
  xdmp:node-delete($x/test),
  xdmp:node-insert-child($x, <test>test</test>)
)

but it runs out of memory on a large collection (“Expanded tree cache full”). It looks like the query tries to fetch all documents into memory first and then iterate over them, whereas what I want is to perform the work on smaller chunks of data that fit into memory and, ideally, to process several such chunks in parallel (think “map” without “reduce”).

Is there another approach? I am reading about CoRB, which seems to be just the thing I need, but I wonder if I am missing another potential solution. Also, while the CoRB description mentions that it can run updates on disk (not in memory), it does not mention parallelization, which will eventually be quite important for my use case.

Thanks,
Alexei Betin
Principal Architect; Big Data
P: (817) 928-1643 | Elevate.com <http://www.elevate.com>
4150 International Plaza, Suite 300
Fort Worth, TX 76109

Privileged and Confidential. This e-mail, and any attachments thereto, is intended only for use by the addressee(s) named herein and may contain privileged and/or confidential information. If you have received this e-mail in error, please notify me immediately by a return e-mail and delete this e-mail. You are hereby notified that any dissemination, distribution or copying of this e-mail and/or any attachments thereto, is strictly prohibited.

_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general
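On the parallelization question: CoRB splits a job like this into a URIs module, which returns a count followed by the URIs to process, and a transform module, which CoRB runs once per URI (by default) across a configurable number of threads. That thread count is where the parallelism comes from. A minimal sketch of a transform module for the update in the question, assuming CoRB's convention of passing the current document URI in the external variable `$URI`; the element names `A` and `test` come from the question, and the exact option names for thread count and modules depend on the CoRB version, so check its documentation.

```xquery
xquery version "1.0-ml";

(: Sketch of a CoRB transform module. CoRB sets $URI to one URI
   returned by the URIs module and runs this module for it. :)
declare variable $URI as xs:string external;

let $root := fn:doc($URI)/A
where fn:exists($root)
return (
  (: Same update as in the question, but for a single document,
     so each request touches only one document. :)
  xdmp:node-delete($root/test),
  xdmp:node-insert-child($root, <test>test</test>)
)
```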