Re: MapReduce scalability

Bernard Fouché Thu, 28 Feb 2013 10:44:56 -0800

Thanks Christian and BR Rune for your answers. We'll reconsider how we did our processing, since we obviously react badly to tombstones when our MR query meets them.


Best Regards,


 Bernard

On 28/02/2013 16:17, Christian Dahlqvist wrote:

Hi Bernard,
The description in the documentation is entirely accurate and not at all purely theoretical. Riak will automatically select a covering set of vnodes/partitions that hold the data set required to complete the job. Not all physical nodes may therefore need to participate in the job. When performing this selection, the coordinating node will take into account any node outages.
Any map phases will then run on all of these vnodes and use the data stored on each local partition. In order to make it as efficient as possible, it will use only the versions of the data available locally and will not perform a quorum read against all the replicas holding a copy of that data, as this would result in a lot of network traffic when running large jobs. The outputs of any map phases are then sent over to the coordinating node, where any reduce phases would normally run.
As the map phase reads its input from only one replica of every KV pair, results can differ from run to run if all replicas are not in sync. The likelihood of this happening should however be reduced with the introduction of active anti-entropy in release 1.3 of Riak, but will, due to the eventually consistent nature of Riak, never be completely eliminated.
MapReduce is quite resilient to data issues as long as any map phase functions used have been designed to handle notfounds and tombstones. Nodes going down during a MapReduce job will however in many cases cause it to fail.
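As a rough illustration of the kind of guard meant here: Riak map functions are written in Erlang or JavaScript, so the Python below is only a model of the logic, not something that runs inside Riak. The dictionary shape (a `not_found` entry for missing keys, an `X-Riak-Deleted` metadata entry on tombstones, a `values` list of siblings) and the function name `map_values` are assumptions made for this sketch and should be checked against your Riak version.

```python
def map_values(obj):
    """Return the usable values of one map-phase input, or [] if there are none.

    obj is a plain dict modelled loosely on a JavaScript map-phase input;
    the field names used here are assumptions, not a verified API.
    """
    # Missing key: the input carries a not_found marker instead of values.
    if "not_found" in obj:
        return []
    results = []
    for sibling in obj.get("values", []):
        # Tombstone: a deleted object that has not been reaped yet.
        if "X-Riak-Deleted" in sibling.get("metadata", {}):
            continue
        results.append(sibling["data"])
    return results


if __name__ == "__main__":
    inputs = [
        {"not_found": {"bucket": "b", "key": "gone"}},
        {"values": [{"metadata": {"X-Riak-Deleted": "true"}, "data": ""}]},
        {"values": [{"metadata": {}, "data": "payload"}]},
    ]
    for item in inputs:
        print(map_values(item))
```

Returning an empty list for both cases keeps the phase output well-formed, so later phases never see a value they cannot process.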
Although it would technically be possible to create a map phase function in Erlang that performs a quorum read using the internal Riak client and then performs any processing based on this object instead of the one passed in, this is strongly discouraged as it would add a lot of additional network traffic and pose a significant risk of overloading the cluster.
Best regards,

Christian
On 28 Feb 2013, at 13:53, Bernard Fouché <[email protected]> wrote:
Hi Christian,
At http://docs.basho.com/riak/1.3.0/references/appendices/MapReduce-Implementation/, one can read "...any Riak node can also coordinate a MapReduce query by sending a map-step evaluation request directly to the node responsible for maintaining the input data. Map-step results are sent back to the coordinating node, where reduce-step processing can produce a unified result."
What you wrote means that the above description is purely theoretical, since if there is any problem accessing data on a node, the MR fails. We have also seen that deleting a key while an MR is running makes the MR run forever, so I think your description is accurate; for the documentation to be correct, it seems one must first be sure that reading the input data will never trigger any kind of error processing, otherwise the MR job will fail (or get stuck). Please correct me if I've misunderstood!
Now, if I want to split the processing of a list of keys across the cluster, is there a way to know which node is supposed to hold at least one copy of a K/V pair?
If so, we can set up our own kind of MR by sending subsets of keys to nodes known to hold at least one version of the K/V pair. Hence if R==2, there will be one local read on the node receiving the subset and only one more read on another node that holds a copy. This distributed processing can then handle read-repair, aggregate data and send the result to the coordinating node.
Best Regards,

        Bernard
On 28/02/2013 10:32, Christian Dahlqvist wrote:
Hi Boris,
Apart from not scaling quite as well as straight K/V access, emulating multiGET through MapReduce also has another significant drawback. MapReduce has no concept of quorum reads and only works on a single copy of the data, which can be thought of basically as a read with R=1 that does not trigger read-repair. It is therefore possible that it can give inconsistent or incorrect results if all replicas do not have the same data. It is worth noting that MapReduce was designed as a way to efficiently spread compute work across the cluster, and re-appropriating it for data collection is not its designed purpose.
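To make the difference concrete, here is a toy Python model (not Riak code) of three replicas that are out of sync. The replica layout, the integer version counters and both function names are made up for illustration; Riak itself compares vector clocks rather than a simple integer, and which copy a map phase sees depends on the covering set chosen for the job, for which the random choice below is only a stand-in.

```python
import random

# Three replicas of one key; the third missed the latest write.
# Each replica stores (version, value); the integer version is a
# simplification standing in for Riak's vector clocks.
REPLICAS = [(2, "new"), (2, "new"), (1, "old")]


def single_copy_read(replicas, rng):
    """Model of what a map phase sees: one copy, no comparison, no repair."""
    return rng.choice(replicas)[1]


def quorum_read(replicas, r, rng):
    """Model of a GET with quorum r: ask r replicas, keep the newest value."""
    answers = rng.sample(replicas, r)
    return max(answers, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    rng = random.Random()
    # Distinct answers seen over many single-copy reads.
    print(sorted({single_copy_read(REPLICAS, rng) for _ in range(100)}))
    # Distinct answers seen over many quorum reads with r=2.
    print(sorted({quorum_read(REPLICAS, 2, rng) for _ in range(100)}))
```

In this model any two of the three replicas include at least one up-to-date copy, so the quorum read always returns the newest value, while the single-copy read can return the stale one depending on which replica it happens to hit.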
The recommended way to implement efficient multiget is to perform normal GET operations in parallel. If you are retrieving 20 objects, you don't necessarily need to do all 20 GETs in parallel, but could set it up to use perhaps 3 or 4 connections. If you then pair this with a connection pool that can grow and shrink in size (perhaps between a minimum and a maximum value) as load requires, you should be able to retrieve the objects in a reasonable time without overloading the cluster.
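The bounded-parallelism approach described above can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: `fetch_one` is a hypothetical stand-in for whatever single-key GET your Riak client provides, `multi_get` is a name invented here, and the default of 4 workers simply mirrors the "3 or 4 connections" figure above. It does not model a pool that grows and shrinks with load.

```python
from concurrent.futures import ThreadPoolExecutor


def multi_get(fetch_one, keys, max_workers=4):
    """Fetch many keys with a bounded number of concurrent GETs.

    fetch_one is a hypothetical callable (key -> value) standing in for a
    single-key GET issued through your Riak client of choice.
    """
    keys = list(keys)
    if not keys:
        return {}
    # Never start more workers than there are keys to fetch.
    workers = min(max_workers, len(keys))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(keys, pool.map(fetch_one, keys)))


if __name__ == "__main__":
    # Stand-in for a real GET: an in-memory dict plays the role of the cluster.
    store = {"key%d" % i: "value%d" % i for i in range(20)}
    page = multi_get(store.get, sorted(store))
    print(len(page))
```

Capping the worker count per page request is what keeps many simultaneous "page" requests from exhausting the connection pool: each request holds at most `max_workers` connections at a time.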
Best regards,

Christian
On 27 Feb 2013, at 02:18, Boris Okner <[email protected]> wrote:
Thanks Christian,
The problem I'm trying to solve is to find a way to retrieve values for a limited number of keys with the best possible latency (or maybe with decent latency balanced against decent throughput). Let's say we have keys stored in some cache on top of Riak, and want to retrieve values, 20 at a time, to be able to implement pagination. An alternative to mapreduce would be to send multiple asynchronous gets, but then we'd have to worry about the connection pool being exhausted if there are too many such "page" requests. So what would be the proper way to deal with the situation when we need to emulate multiple key retrieval?
On Tue, Feb 26, 2013 at 1:57 AM, Christian Dahlqvist <[email protected]> wrote:
    Hi Boris,

    MapReduce is a very flexible and powerful way of querying Riak
    and allows processing to be performed locally where the data
    resides, which allows for efficient processing of larger data
    sets. A result of this is that every mapreduce job requires a
    covering set of vnodes (all vnodes that hold the data required
    for processing) to participate, meaning that it puts
    considerably more load on the system compared to straight K/V
    access and therefore does not scale quite as well. It is
    primarily designed for batch type processing over reasonably
    large amounts of data and scales well with increased data
    volumes as new nodes are added. We do however usually not
    recommend using it as an interface for realtime queries where
    low and predictable latencies are required and the concurrency
    level, and therefore load level on the cluster, cannot be
    controlled.

    I am not sure I understand what you mean by the performance
    degrading with the number of nodes, unless you are strictly
    measuring latency rather than throughput. As the number of
    nodes increases, it gets more and more likely that multiple
    physical nodes will be involved in the job, which will add to
    the amount of communication and coordination required between
    the nodes, thereby increasing latency. Could you please explain
    in more detail what you are trying to achieve?

    Best regards,

    Christian


    On 25 Feb 2013, at 16:41, Boris Okner <[email protected]> wrote:
    Hello,

    I'm experimenting with 2 Riak 1.3.0 nodes (both are "bare
    metal"), and it looks like mapreduce performs better when one
    of the nodes is down. The mapreduce requests are running on
    20-key blocks. So am I doing something wrong, or is it an
    expected behaviour, i.e. does mapreduce degrade as the
    number of nodes increases? If the former, could
    you give me some pointers on how to set it up to take advantage
    of multiple nodes?

    Thanks in advance for your help,
    Boris
    _______________________________________________
    riak-users mailing list
    [email protected] <mailto:[email protected]>
    http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com