Re: [MarkLogic Dev General] [MarkLogic] Data Locality

Sam Mefford Tue, 29 Aug 2017 13:46:03 -0700

All data resides in forests, and each forest resides on a specific host.  You 
can restrict an operation to the data in a particular forest, which is one way 
to operate only on data on the local host.  To find the forest that holds a 
document, use the XQuery function xdmp:document-forest.  To list all forests on 
a host, use the XQuery function xdmp:host-forests.  To search for documents 
matching a query in particular forests only, pass the forest IDs as the fifth 
argument to cts:search.
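
As a rough illustration of the three functions above, here is a minimal sketch. 
The document URI and the search term are made up, and filtering the host's 
forests down to those of the current database is my own precaution, not 
something the functions require.  Treat it as a starting point rather than a 
tested recipe.

```xquery
xquery version "1.0-ml";

(: Hypothetical URI, used only for illustration :)
let $uri := "/example/doc.xml"

(: ID of the forest holding the document (empty sequence if it does not exist) :)
let $doc-forest := xdmp:document-forest($uri)

(: Forests on the host evaluating this request, limited to the forests
   of the current database (assumed precaution: a host may also carry
   forests that belong to other databases) :)
let $db-forests := xdmp:database-forests(xdmp:database())
let $local-forests := xdmp:host-forests(xdmp:host())[. = $db-forests]

return (
  (: true if the document lives in one of the local forests :)
  $doc-forest = $local-forests,

  (: Search restricted to the local forests. Note: if $local-forests is
     empty, cts:search falls back to searching all forests. :)
  cts:search(
    fn:doc(),
    cts:word-query("example"),   (: hypothetical query :)
    (),                          (: options: none :)
    1.0,                         (: quality weight: the default :)
    $local-forests               (: fifth argument: forest IDs :)
  )
)
```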

>    Going further, suppose we have a sort of map-reduce pattern. Is data 
> processed (e.g. reduced) on the node it exists and then returned back to the 
> caller (e.g. to benefit from caching)?

The server optimizes caching in many different ways, and where appropriate 
those optimizations take into account processing data on the node where it 
resides and then returning the results to the caller.  The good news is that 
you don't usually have to (or want to) concern yourself with it.  Nevertheless, 
it's very much worth understanding, which is why we publish the e-book Inside 
MarkLogic Server<http://developer.marklogic.com/inside-marklogic>.

In addition, the Java Client API has a new Data Movement SDK, which takes care 
of this for you from the client side.  If you need to process massive numbers 
of documents, it will split them into forest-specific batches for you.  You can 
then process one forest-specific batch at a time on the host that holds that 
forest.

Sam Mefford
Senior Engineer
MarkLogic Corporation
[email protected]
Cell: +1 801 706 9731
www.marklogic.com<http://www.marklogic.com>

________________________________
From: [email protected] 
[[email protected]] on behalf of Shmennen 
[[email protected]]
Sent: Tuesday, August 29, 2017 1:01 PM
To: 'MarkLogic Developer Discussion'
Subject: Re: [MarkLogic Dev General] [MarkLogic] Data Locality

Sorry for replying on this thread. I have created a new topic.

Regards
Johnny

On Tue, Aug 29, 2017 at 21:45, Shmennen
<[email protected]> wrote:
Hello All,

   I am quite new to MarkLogic, but I would like to ask you about the following 
scenario:
- suppose we have a cluster with 2 nodes located on different physical nodes; 
e.g. A and B
- each node contains replicated data

   If I run a query from host A, is there any chance to get data returned by 
that host (e.g. D node from host A)?
Generally speaking, it might improve performance to return data from the host 
that contains it and is closer to the caller...

   Going further, suppose we have a sort of map-reduce pattern. Is data processed 
(e.g. reduced) on the node it exists and then returned back to the caller (e.g. 
to benefit from caching)?

  Please let me know your input.

Best Regards
Johnny

