All data resides in forests, and each forest resides on a specific host. You can restrict an operation to the data in a particular forest, which is one way to operate only on data stored on the local host. To find the forest that holds a document, use the XQuery function xdmp:document-forest. To list all forests on a host, use the XQuery function xdmp:host-forests. To search for documents that match a query in particular forests only, pass the forest IDs as the fifth argument to cts:search.
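As a minimal sketch of the three functions mentioned above: the document URI and the word query are placeholders invented for illustration, and the query assumes it is evaluated on the host whose forests you want to target.

```xquery
xquery version "1.0-ml";

(: IDs of the forests on the host that evaluates this request. :)
let $local-forests := xdmp:host-forests(xdmp:host())

(: ID of the forest holding one document; the URI is a placeholder. :)
let $doc-forest := xdmp:document-forest("/example/doc.xml")

return (
  $doc-forest,
  (: The fifth argument restricts the search to the listed forests. :)
  cts:search(
    fn:doc(),
    cts:word-query("example"),  (: placeholder query :)
    (),                         (: no search options :)
    1.0,                        (: quality weight, left at its default :)
    $local-forests
  )
)
```

Note that an empty sequence of forest IDs means "search all forests", so on a host that holds no forests for the database this sketch would not be limited to local data.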
> Going further, suppose we have sort of map reduce pattern. Is data
> processed (e.g. reduced) on the node it exists and then returned back to the
> caller (e.g. to benefit of caching)?

The server optimizes caching in many different ways, and where appropriate it processes data on the node where it resides and then returns the result to the caller. The good news is that you usually don't have to (or want to) concern yourself with it. Nevertheless, it is well worth understanding, which is why we publish the e-book Inside MarkLogic Server<http://developer.marklogic.com/inside-marklogic>.

In addition, the Java Client API has a new Data Movement SDK which takes care of this for you from the client side. If you need to process massive numbers of documents, it will split them into forest-specific batches for you. You can then process one forest-specific batch at a time on the host with that forest.

Sam Mefford
Senior Engineer
MarkLogic Corporation
www.marklogic.com<http://www.marklogic.com>

________________________________
From: Shmennen
Sent: Tuesday, August 29, 2017 1:01 PM
To: 'MarkLogic Developer Discussion'
Subject: Re: [MarkLogic Dev General] [MarkLogic] Data Locality

Sorry for replying on this thread. I have created a new topic.
Regards
Johnny

On Tue, Aug 29, 2017 at 21:45, Shmennen wrote:

Hello All,

I am quite new to MarkLogic, but I would like to ask you about the following scenario:
- suppose we have a cluster with 2 nodes located on different physical nodes, e.g. A and B
- each node contains replicated data

If I run a query from host A, is there any chance that the data is returned by that host (e.g. D node from host A)? Generally speaking, returning data from the host that contains it and is closer to the caller might be a performance improvement.

Going further, suppose we have a sort of map reduce pattern. Is data processed (e.g. reduced) on the node where it exists and then returned to the caller (e.g. to benefit from caching)?

Please let me know your input.

Best Regards
Johnny
