Hello,

I stumbled upon what seems to be a straightforward task: making a bulk 
modification to XML documents in MarkLogic (such as adding a new element to 
every document in a collection). I looked at CPF first, but it seems to 
support only event-based processing (triggers) and has no facility for 
batch processing.

So I just wrote a simple XQuery, as follows:

for $x in collection()/A
return (
  xdmp:node-delete($x/test),
  xdmp:node-insert-child($x, <test>test</test>)
)

but on a large collection it runs out of memory with an "Expanded tree cache 
full" error. So it looks like the above query tries to fetch all documents 
into memory first and then iterate over them.

Whereas what I want is to perform the work on smaller chunks of data that fit 
into memory and, ideally, to process several such chunks in parallel (think 
"map" without "reduce").

Is there another approach? I am reading about CoRB, which seems to be just the 
thing I need, but I wonder if I am missing another potential solution here.

Also, while the CoRB description mentions that it can run updates on disk (not 
in memory), it does not mention parallelization, which will eventually be 
quite important for my use case.
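
And if CoRB is indeed the right tool, I picture the per-document work moving 
into a process module along these lines (again only a sketch; as I read the 
CoRB docs, it invokes the module once per URI, binding the URI to an external 
variable):

xquery version "1.0-ml";

(: Sketch of a CoRB-style process module, invoked once per document URI.
   CoRB binds the URI being processed to this external variable. :)
declare variable $URI as xs:string external;

let $x := fn:doc($URI)/A
return (
  xdmp:node-delete($x/test),
  xdmp:node-insert-child($x, <test>test</test>)
)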

Thanks,

Alexei Betin

Principal Architect; Big Data
P: (817) 928-1643 | Elevate.com <http://www.elevate.com>
4150 International Plaza, Suite 300
Fort Worth, TX 76109

