Re: riak performance

Nils Petersohn Fri, 17 Sep 2010 05:02:43 -0700

Same setup with the tip: it did not speed up the map phase. I even generated a 2 MB
POST request file with explicit inputs:


{"inputs":[["actionbucket","10000"],["actionbucket","10001"],["actionbucket","10002"],["actionbucket","10003"],["actionbucket","10004"],["actionbucket","10005"],["actionbucket","10006"],["actionbucket","10007"],["actionbucket","10008"],["actionbucket","10009"],["actionbucket","10010"],["actionbucket","10011"],["actionbucket","10012"],["actionbucket","10013"],["actionbucket","10014"],["actionbucket","1001...

but this did not speed up the process either...
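A file like this does not have to be written by hand. Below is a minimal sketch in JavaScript (the language of the map function in this thread) that builds the same kind of explicit-inputs request body. `buildExplicitInputs` is a hypothetical helper, not part of any Riak client, and the consecutive integer keys are an assumption taken from the truncated example above.

```javascript
// Hypothetical helper: builds a MapReduce request body whose "inputs" are
// explicit [bucket, key] pairs instead of a whole bucket name.
// Assumes keys are consecutive integers stored as strings.
function buildExplicitInputs(bucket, firstKey, count, query) {
  const inputs = [];
  for (let k = firstKey; k < firstKey + count; k++) {
    inputs.push([bucket, String(k)]);
  }
  // JSON.stringify escapes any newlines inside embedded map source code,
  // so the resulting body is always valid JSON.
  return JSON.stringify({ inputs: inputs, query: query });
}

const body = buildExplicitInputs("actionbucket", 10000, 3, [
  { map: { language: "javascript", name: "Riak.mapValuesJson", keep: true } },
]);
console.log(body);
// {"inputs":[["actionbucket","10000"],["actionbucket","10001"],["actionbucket","10002"]],"query":[{"map":{"language":"javascript","name":"Riak.mapValuesJson","keep":true}}]}
```

Generating the body this way also avoids raw line breaks inside JSON strings, which a strict JSON parser would reject.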

My Mac's setup is the following:
  Processor Name:        Intel Core 2 Duo
  Processor Speed:       2.53 GHz
  Number Of Processors:  1
  Total Number Of Cores: 2
  L2 Cache:              3 MB
  Memory:                4 GB

The other computer runs Windows 7 on an Intel quad-core Q6600 CPU with 4 GB RAM.
The two VMware virtual machines are configured with 2 CPUs and ~1400 MB RAM and
run Fedora 12. The firewalls are turned off completely.

Everything is connected via Ethernet through a small switch, with static IPs.

All nine Riak instances have this config (only the IP and port change):
------------------------------------------------------------------------------------------------------------------------------------------
%% -*- tab-width: 4;erlang-indent-level: 4;indent-tabs-mode: nil -*-
%% ex: ts=4 sw=4 et

%%
%% etc/app.config
%%
{ring_state_dir,    "data/ring"}.
{web_ip,            "192.168.0.100"}.
{web_port,          8091}.
{handoff_port,      8101}.
{pb_ip,             "192.168.0.100"}.
{pb_port,           8081}.
{bitcask_data_root, "data/bitcask"}.
{sasl_error_log,    "log/sasl-error.log"}.
{sasl_log_dir,      "log/sasl"}.

%%
%% etc/vm.args
%%
{node,         "[email protected]"}.

%%
%% bin/riak
%%
{runner_script_dir,  "$(cd ${0%/*} && pwd)"}.
{runner_base_dir,    "${RUNNER_SCRIPT_DIR%/*}"}.
{runner_etc_dir,     "$RUNNER_BASE_DIR/etc"}.
{runner_log_dir,     "$RUNNER_BASE_DIR/log"}.
{pipe_dir,           "/tmp/$RUNNER_BASE_DIR/"}.
{runner_user,        ""}.
------------------------------------------------------------------------------------------------------------------------------------------

Is VMware the bottleneck? Should I use Erlang to do the MR job?

best regards
nils



On Sep 17, 2010, at 12:40 AM, Grant Schofield wrote:

> I think the slowness is coming from the older list-keys implementation in
> 0.12.1; list keys has been changed in the tip version of Riak and is quite a
> bit faster now. In addition, there have been a lot of improvements to the
> JavaScript MapReduce implementation that should help the speed of your
> query. For the time being you will need to run Riak tip to get access to
> these enhancements.
> 
> Grant Schofield
> Developer Advocate
> Basho Technologies, Inc.
> 
> 
> On Sep 16, 2010, at 5:17 PM, Nils Petersohn wrote:
> 
>> OK, my ring seems OK now.
>> What I did was change the rel/vars/dev[1,2,3]_vars.config files;
>> in there I just replaced the IPs.
>> The reip approach did not really work out.
>> 
>> Here is my Riak ring now:
>> ([email protected])1> riak_core_ring_manager:get_my_ring().
>> {ok,{chstate,'[email protected]',
>>            [{'[email protected]',{65,63451889794}},
>>             {'[email protected]',{13,63451889512}},
>>             {'[email protected]',{104,63451889512}},
>>             {'[email protected]',{49,63451889512}},
>>             {'[email protected]',{32,63451889009}},
>>             {'[email protected]',{94,63451889253}},
>>             {'[email protected]',{9,63451889769}},
>>             {'[email protected]',{97,63451889494}}],
>>            {64,
>>             [{0,'[email protected]'},
>>              {22835963083295358096932575511191922182123945984,
>>               '[email protected]'},
>>              {45671926166590716193865151022383844364247891968,
>>               '[email protected]'},
>>              {68507889249886074290797726533575766546371837952,
>>               '[email protected]'},
>>              {91343852333181432387730302044767688728495783936,
>>               '[email protected]'},
>>              {114179815416476790484662877555959610910619729920,
>>               '[email protected]'},
>>              {137015778499772148581595453067151533092743675904,
>>               '[email protected]'},
>>              {159851741583067506678528028578343455274867621888,
>>               '[email protected]'},
>>              {182687704666362864775460604089535377456991567872,
>>               '[email protected]'},
>>              {205523667749658222872393179600727299639115513856,
>>               '[email protected]'},
>>              {228359630832953580969325755111919221821239459840,
>>               '[email protected]'},
>>              {251195593916248939066258330623111144003363405824,
>>               '[email protected]'},
>>              {274031556999544297163190906134303066185487351808,
>>               '[email protected]'},
>>              {296867520082839655260123481645494988367611297792,
>>               '[email protected]'},
>>              {319703483166135013357056057156686910549735243776,
>>               '[email protected]'},
>>              {342539446249430371453988632667878832731859189760,
>>               '[email protected]'},
>>              {365375409332725729550921208179070754913983135744,
>>               '[email protected]'},
>>              {388211372416021087647853783690262677096107081728,
>>               '[email protected]'},
>>              {411047335499316445744786359201454599278231027712,
>>               '[email protected]'},
>>              {433883298582611803841718934712646521460354973696,...},
>>              {...}|...]},
>>            {dict,0,16,16,8,80,48,
>>                  {[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
>>                  {{[],[],[],[],[],[],[],[],[],[],[],[],...}}}}}
>> ([email protected])2> 
>> 
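The partition indices in the ring dump above follow a simple pattern. A short sketch, assuming the riak_core ring spans the 160-bit SHA-1 hash space and is split into the 64 equal partitions shown in the dump, reproduces them. `partitionIndices` is a hypothetical helper written for illustration.

```javascript
// The ring covers the 160-bit SHA-1 hash space: integers in [0, 2^160).
const RING_TOP = 2n ** 160n;

// Hypothetical helper: start index of each equal-sized partition.
function partitionIndices(numPartitions) {
  const inc = RING_TOP / BigInt(numPartitions); // width of one partition
  const out = [];
  for (let i = 0; i < numPartitions; i++) {
    out.push(inc * BigInt(i));
  }
  return out;
}

const idx = partitionIndices(64);
console.log(idx[1].toString()); // 22835963083295358096932575511191922182123945984
console.log(idx[2].toString()); // 45671926166590716193865151022383844364247891968
```

Both printed values match the second and third entries of the dump, so each of the nine nodes claims some subset of these 64 fixed slices.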
>> I am using 0.12.1 on my Mac and 0.12 on both VMs. I now have a set of
>> 100,000 entries like this (just for testing):
>> {"id":"42164", "actionTime":"2007-05-11 17:08:55", "action":"some action", 
>> "res":"7024", "user":"5", "client":"2787"}
>> 
>> 
>> And my MR job looks like this (just for testing):
>> {"inputs":"actionbucket",
>> "query":[
>>  {"map":{"language":"javascript", "source":
>>  "function(values, keyData, arg) {
>>       
>>      var value = Riak.mapValuesJson(values)[0];
>>       if(value.reservation == '4084'){
>>              return [value];
>>      }
>>      return [];
>>  }","keep":true}}
>>  ],"timeout": 900000
>> }
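The map function can be exercised locally before submitting it. Note that the sample record stores the reservation number under `res`, while the posted map function reads `value.reservation`, which would be undefined for records shaped like the sample. The sketch below assumes `res` is the intended field and passes the wanted value through the phase `arg` instead of hard-coding it. The `Riak` object here is a simplified, hypothetical stand-in for Riak's built-in `Riak.mapValuesJson`, assuming the map input carries a `values` array whose `data` fields are JSON strings.

```javascript
// Simplified, hypothetical stand-in for Riak's built-in helper.
const Riak = {
  mapValuesJson(value) {
    return value.values.map((v) => JSON.parse(v.data));
  },
};

// Map phase: keep the record only if its "res" field equals the phase arg.
function mapByReservation(value, keyData, arg) {
  const record = Riak.mapValuesJson(value)[0];
  return record.res === arg ? [record] : [];
}

// Simulated stored object built from the sample record in this thread.
const sample = {
  values: [{
    data: JSON.stringify({
      id: "42164", actionTime: "2007-05-11 17:08:55", action: "some action",
      res: "7024", user: "5", client: "2787",
    }),
  }],
};

console.log(mapByReservation(sample, null, "7024").length); // 1
console.log(mapByReservation(sample, null, "4084").length); // 0
```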
>> 
>> 
>> The beam instances all show up in "top" now, and there is some traffic
>> going back and forth (~200 kb/s).
>> 
>> But this job takes about 1:30 min.
>> 
>> I know this is not really comparable with a MySQL query, because you can
>> do more computation in the MR job to produce much more specialized results,
>> and the MR job's running time grows roughly linearly... but ~1:30 min is
>> still pretty bad.
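The "roughly linear" remark can be made concrete with the two timings reported in this thread (100,000 entries in ~1:30 min, and half a million entries in ~15 min). The arithmetic below assumes strictly linear scaling and ignores that the two runs used different setups, so it is only a rough comparison; the helper names are hypothetical.

```javascript
// Objects processed per second for a single run.
function objectsPerSecond(objects, seconds) {
  return objects / seconds;
}

// Predicted running time if cost grows linearly with object count.
function linearEstimateSeconds(objects, rate) {
  return objects / rate;
}

const laterRun = objectsPerSecond(100000, 90);    // 100,000 entries in ~90 s
const earlierRun = objectsPerSecond(500000, 900); // 500,000 entries in ~900 s

console.log(laterRun.toFixed(0));   // "1111"
console.log(earlierRun.toFixed(0)); // "556"
console.log(linearEstimateSeconds(500000, laterRun).toFixed(0)); // "450"
```

Under that assumption the half-million-entry bucket would take about 7.5 minutes at the later rate, still far from a single indexed SQL lookup.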
>> 
>> Is there any way to do much better?
>> 
>> best regards
>> nils
>> 
>> On Sep 16, 2010, at 7:08 PM, Grant Schofield wrote:
>> 
>>> 
>>> On Sep 15, 2010, at 2:40 PM, Nils Petersohn wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I was setting up 9 Riak instances:
>>>> 
>>>> three on my Mac with the appropriate app config,
>>>> and six in two virtual machines on a different computer.
>>>> 
>>>> All 8 others joined [email protected]
>>>> and the join request was sent.
>>>> 
>>>> After setting this up,
>>>> I wanted to put data with the Java client on [email protected], but then I got
>>>> a timeout?!?
>>>> 
>>> 
>>> I am curious: did you start this node and then change its name in the
>>> config file? Errors like this can happen if you don't run riak-admin reip on
>>> the node; the ring file would also be wrong, and this could lead to some of
>>> the other errors you saw below. One thing you may want to look at is the
>>> state of your ring from the Riak console using
>>> riak_core_ring_manager:get_my_ring(). That might show any problems with the
>>> ring; feel free to send that along so we can take a look at it.
>>> 
>>>> When I put data on one of the other machines, only that machine was
>>>> using CPU time and none of the others...
>>>> If consistent hashing works as expected, then all the machines should
>>>> show up in "top".
>>>> 
>>>> When I ran a MapReduce job, only this machine was using CPU time and
>>>> none of the others...
>>>> 
>>>> I had "top" running on all of them.
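With consistent hashing, each key hashes to one point on the ring, so it is the spread of many keys, not a single put, that should keep every machine busy. The sketch below illustrates that spread with Node's built-in `crypto` module. Simplification, stated loudly: Riak hashes an Erlang-encoded `{Bucket, Key}` term, whereas this sketch hashes the plain string "bucket/key", so its slice numbers will not match a real cluster; `ringSlice` is a hypothetical helper.

```javascript
const crypto = require("crypto");

const RING_TOP = 2n ** 160n; // SHA-1 output space

// Hypothetical helper: which of numPartitions equal ring slices a key's
// SHA-1 position falls into. NOT Riak's exact key encoding.
function ringSlice(bucket, key, numPartitions) {
  const digest = crypto.createHash("sha1").update(`${bucket}/${key}`).digest("hex");
  const position = BigInt("0x" + digest);
  const inc = RING_TOP / BigInt(numPartitions);
  return Number(position / inc);
}

// Count how many distinct slices 1,000 consecutive keys land in.
const touched = new Set();
for (let k = 10000; k < 11000; k++) {
  touched.add(ringSlice("actionbucket", String(k), 64));
}
console.log(touched.size);
```

A single put lands on only a handful of slices, while a thousand keys cover nearly all 64, which is why CPU load spreads across nodes only once many keys are written.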
>>>> 
>>>> -------------------------------------------------------
>>>> the other problem is:
>>>> 
>>>> When I have half a million entries in one bucket, with fewer than 100
>>>> characters each, and I run a really simple MapReduce job, it takes forever
>>>> (15 minutes...), while SQL takes 0.005 seconds...
>>>> 
>>>> I know that a MapReduce over a complete bucket takes very long if I
>>>> don't specify keys in the bucket. But how should I know which keys to
>>>> use?
>>> 
>>> What version of Riak are you using? There has been a fair amount of
>>> improvement to the MapReduce system as well as to list keys. Are the
>>> MapReduce jobs you are running written in JavaScript?
>>> 
>>>> ------------------------------------------------------
>>>> 
>>>> If I put data in one bucket and then add a machine with a join request,
>>>> how can I rebalance the bucket so that the other machine takes some of
>>>> the values too?
>>> 
>>> This happens automatically. When the new node joins the cluster, you
>>> should see handoff messages in the erlang.log.X log file. Rebalancing is
>>> handled by the cluster and shouldn't be done manually.
>>> 
>>> Grant Schofield
>>> Developer Advocate
>>> Basho Technologies, Inc.
>>> 
>>> 
>>>> 
>>>> ------------------------------------------------------
>>>> 
>>>> I don't understand these issues/behaviors (timeout, 15 min.,
>>>> rebalancing). Maybe I set one of the three params incorrectly?
>>>> I left everything at the default settings.
>>>> 
>>>> Thanks in advance for any hints...
>>>> 
>>>> nils
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> [email protected]
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>> 
>> 
>> Nils M. Petersohn
>> xing.com/profile/Nils_Petersohn
>> blog.srvme.de
>> twitter.com/snackycracky
>> facebook.com/nils.petersohn
>> myspace.com/electrash
>> 
>> [email protected]
>> 0049 (0)151 40 511 351
>> skype: nilz_berlin
>> 
>> Ebertystr. 47
>> 10249 Berlin
>> 
> 
