Same setup with the tip: it did not speed up the map phase. I even generated a 2 MB
POST request file:
{"inputs":[["actionbucket","10000"],["actionbucket","10001"],["actionbucket","10002"],
["actionbucket","10003"],["actionbucket","10004"],["actionbucket","10005"],
["actionbucket","10006"],["actionbucket","10007"],["actionbucket","10008"],
["actionbucket","10009"],["actionbucket","10010"],["actionbucket","10011"],
["actionbucket","10012"],["actionbucket","10013"],["actionbucket","10014"],
["actionbucket","1001...
But this did not speed up the process either...
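For reference, a request body like this can be generated with a few lines of JavaScript. This is only a sketch: the helper name buildJob and the assumption that the keys are consecutive integers starting at 10000 are mine and not part of Riak. The job layout (inputs, query, timeout) is the same one used in the MR job quoted further down.

```javascript
// Sketch: build a MapReduce job with explicit [bucket, key] inputs.
// Assumption: keys are consecutive integers stored as strings.
function buildJob(bucket, firstKey, count, mapSource) {
  const inputs = [];
  for (let i = 0; i < count; i++) {
    inputs.push([bucket, String(firstKey + i)]);
  }
  return {
    inputs: inputs,
    query: [
      { map: { language: "javascript", source: mapSource, keep: true } },
    ],
    timeout: 900000,
  };
}

// Map function source, sent to Riak as a string inside the JSON body.
const mapSource = [
  "function(values, keyData, arg) {",
  "  var value = Riak.mapValuesJson(values)[0];",
  "  if (value.reservation == '4084') {",
  "    return [value];",
  "  }",
  "  return [];",
  "}",
].join(" ");

// Three keys only, to keep the output readable.
const job = buildJob("actionbucket", 10000, 3, mapSource);
const body = JSON.stringify(job);
console.log(JSON.stringify(job.inputs));
// prints [["actionbucket","10000"],["actionbucket","10001"],["actionbucket","10002"]]
```

The string in body can then be saved to a file and sent as the POST request body; raising the count argument gives a larger input list.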
My computer setup is the following. The Mac:
Processor Name: Intel Core 2 Duo
Processor Speed: 2.53 GHz
Number Of Processors: 1
Total Number Of Cores: 2
L2 Cache: 3 MB
Memory: 4 GB
The other computer runs Win7 on an Intel quad-core 6600 CPU with 4 GB RAM.
The two VMware virtual machines are configured with 2 CPUs and ~1400 MB RAM and run
Fedora 12.
The firewalls are turned off completely.
Everything is connected via Ethernet through a small switch, with static IPs.
All nine Riak instances have this config (only the IP and port change):
------------------------------------------------------------------------------------------------------------------------------------------
%% -*- tab-width: 4;erlang-indent-level: 4;indent-tabs-mode: nil -*-
%% ex: ts=4 sw=4 et
%%
%% etc/app.config
%%
{ring_state_dir, "data/ring"}.
{web_ip, "192.168.0.100"}.
{web_port, 8091}.
{handoff_port, 8101}.
{pb_ip, "192.168.0.100"}.
{pb_port, 8081}.
{bitcask_data_root, "data/bitcask"}.
{sasl_error_log, "log/sasl-error.log"}.
{sasl_log_dir, "log/sasl"}.
%%
%% etc/vm.args
%%
{node, "[email protected]"}.
%%
%% bin/riak
%%
{runner_script_dir, "$(cd ${0%/*} && pwd)"}.
{runner_base_dir, "${RUNNER_SCRIPT_DIR%/*}"}.
{runner_etc_dir, "$RUNNER_BASE_DIR/etc"}.
{runner_log_dir, "$RUNNER_BASE_DIR/log"}.
{pipe_dir, "/tmp/$RUNNER_BASE_DIR/"}.
{runner_user, ""}.
------------------------------------------------------------------------------------------------------------------------------------------
Is VMware the bottleneck? Should I use Erlang to do the MR job?
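One thing that can be checked locally, independent of VMware, is the map function itself. The sketch below runs the filter from the MR job quoted further down against the sample entry, outside of Riak. The Riak object here is a hypothetical stub (it assumes the real Riak.mapValuesJson parses the stored JSON values), and makeMap is only a helper for this check. It suggests that the sample entry carries a field named res rather than reservation, so a filter on value.reservation may return nothing for entries shaped like the sample.

```javascript
// Hypothetical stand-in for the helper Riak provides to JavaScript map
// functions; assumed here to parse the JSON body of each stored value.
const Riak = {
  mapValuesJson(object) {
    return object.values.map((v) => JSON.parse(v.data));
  },
};

// Same logic as the map function in the job, with the field name and the
// wanted value turned into parameters so both spellings can be compared.
function makeMap(field, wanted) {
  return function (values, keyData, arg) {
    const value = Riak.mapValuesJson(values)[0];
    if (value[field] == wanted) {
      return [value];
    }
    return [];
  };
}

// The sample entry from the thread, wrapped in the shape the stub expects.
const sample = {
  values: [
    {
      data: JSON.stringify({
        id: "42164",
        actionTime: "2007-05-11 17:08:55",
        action: "some action",
        res: "7024",
        user: "5",
        client: "2787",
      }),
    },
  ],
};

const byReservation = makeMap("reservation", "7024")(sample);
const byRes = makeMap("res", "7024")(sample);
console.log(byReservation.length, byRes.length);
// prints 0 1
```

If the real entries do use the key res, the filter in the job would need to test value.res instead. This only checks the filter logic and says nothing about where the 1:30 min is spent.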
best regards
nils
On Sep 17, 2010, at 12:40 AM, Grant Schofield wrote:
> I think the slowness is coming from the older list keys implementation in
> 0.12.1, list keys has been changed in the tip version of Riak and is quite a
> bit faster now. In addition there have been a lot of improvements to the
> Javascript map reduce implementation that should help the speed of your
> query. For the time being you will need to run Riak tip to get access to
> these enhancements.
>
> Grant Schofield
> Developer Advocate
> Basho Technologies, Inc.
>
>
> On Sep 16, 2010, at 5:17 PM, Nils Petersohn wrote:
>
>> OK, my ring seems fine now.
>> What I did was change the rel/vars/dev[1,2,3]_vars.config files.
>> In there I just replaced the IPs...
>> The reip approach did not really work out...
>>
>> here is my riak ring now:
>> ([email protected])1> riak_core_ring_manager:get_my_ring().
>> {ok,{chstate,'[email protected]',
>> [{'[email protected]',{65,63451889794}},
>> {'[email protected]',{13,63451889512}},
>> {'[email protected]',{104,63451889512}},
>> {'[email protected]',{49,63451889512}},
>> {'[email protected]',{32,63451889009}},
>> {'[email protected]',{94,63451889253}},
>> {'[email protected]',{9,63451889769}},
>> {'[email protected]',{97,63451889494}}],
>> {64,
>> [{0,'[email protected]'},
>> {22835963083295358096932575511191922182123945984,
>> '[email protected]'},
>> {45671926166590716193865151022383844364247891968,
>> '[email protected]'},
>> {68507889249886074290797726533575766546371837952,
>> '[email protected]'},
>> {91343852333181432387730302044767688728495783936,
>> '[email protected]'},
>> {114179815416476790484662877555959610910619729920,
>> '[email protected]'},
>> {137015778499772148581595453067151533092743675904,
>> '[email protected]'},
>> {159851741583067506678528028578343455274867621888,
>> '[email protected]'},
>> {182687704666362864775460604089535377456991567872,
>> '[email protected]'},
>> {205523667749658222872393179600727299639115513856,
>> '[email protected]'},
>> {228359630832953580969325755111919221821239459840,
>> '[email protected]'},
>> {251195593916248939066258330623111144003363405824,
>> '[email protected]'},
>> {274031556999544297163190906134303066185487351808,
>> '[email protected]'},
>> {296867520082839655260123481645494988367611297792,
>> '[email protected]'},
>> {319703483166135013357056057156686910549735243776,
>> '[email protected]'},
>> {342539446249430371453988632667878832731859189760,
>> '[email protected]'},
>> {365375409332725729550921208179070754913983135744,
>> '[email protected]'},
>> {388211372416021087647853783690262677096107081728,
>> '[email protected]'},
>> {411047335499316445744786359201454599278231027712,
>> '[email protected]'},
>> {433883298582611803841718934712646521460354973696,...},
>> {...}|...]},
>> {dict,0,16,16,8,80,48,
>> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
>> {{[],[],[],[],[],[],[],[],[],[],[],[],...}}}}}
>> ([email protected])2>
>>
>> I am using 0.12.1 on my Mac and 0.12 on both VMs. I now have a set of
>> 100,000 entries like this (just for testing):
>> {"id":"42164", "actionTime":"2007-05-11 17:08:55", "action":"some action",
>> "res":"7024", "user":"5", "client":"2787"}
>>
>>
>> and my mr job looks like this (just for testing):
>> {"inputs":"actionbucket",
>> "query":[
>> {"map":{"language":"javascript", "source":
>> "function(values, keyData, arg) {
>>
>> var value = Riak.mapValuesJson(values)[0];
>> if(value.reservation == '4084'){
>> return [value];
>> }
>> return [];
>> }","keep":true}}
>> ],"timeout": 900000
>> }
>>
>>
>> The beam instances all show up in "top" now, and there is some traffic
>> going back and forth (~200 kb/s).
>>
>> But this job still takes about 1:30 min.
>>
>> I know this is not really comparable with a MySQL query, because you can
>> do more calculations in the MR job to produce much more specialized results,
>> and the MR job has a ~linear "worktime"... but ~1:30 min is still pretty bad.
>>
>> Is there any way to do much better?
>>
>> best regards
>> nils
>>
>> On Sep 16, 2010, at 7:08 PM, Grant Schofield wrote:
>>
>>>
>>> On Sep 15, 2010, at 2:40 PM, Nils Petersohn wrote:
>>>
>>>> hello,
>>>>
>>>> I set up 9 Riak instances:
>>>>
>>>> three on my Mac with the appropriate app config
>>>> and six across two virtual machines on a different computer.
>>>>
>>>> The other 8 all joined [email protected],
>>>> and the join request was sent.
>>>>
>>>> After setting this up,
>>>> I tried to put data with the Java client on [email protected], and then I got
>>>> a timeout?!
>>>>
>>>
>>> I am curious if you started this node and then changed its name in the
>>> config file? Errors like this can happen if you don't riak-admin reip the
>>> node, also the ring file would be wrong and this could lead to some of the
>>> other errors you saw below. One thing you may want to look at is the state
>>> of your ring from the Riak console using
>>> riak_core_ring_manager:get_my_ring(). That might show any problems with the
>>> ring, feel free to send that along so we can take a look at it.
>>>
>>>> When I put data on one of the other machines, only that machine was
>>>> using CPU time and none of the others...
>>>> If consistent hashing works as expected, then all the machines should
>>>> show up in "top".
>>>>
>>>> When I ran a MapReduce job, again only that machine was using CPU time and
>>>> none of the others...
>>>>
>>>> I had "top" running on all of them.
>>>>
>>>> -------------------------------------------------------
>>>> The other problem is:
>>>>
>>>> When I have half a million entries in one bucket, with less than 100 chars
>>>> per entry,
>>>> and I run a really simple MapReduce job, it takes forever (15 minutes
>>>> ...)
>>>> while SQL takes 0.005 seconds...
>>>>
>>>> I know that a MapReduce job over a complete bucket takes very long if I
>>>> don't specify keys in the bucket. But how should I know which keys to use
>>>> ...
>>>
>>> What version of Riak are you using? There has been a fair amount of
>>> improvement to the map reduce system as well as list keys. Are the map
>>> reduce jobs you are running javascript?
>>>
>>>> ------------------------------------------------------
>>>>
>>>> If I put data in one bucket and then add a machine with a join request, how
>>>> can I rebalance the bucket, so that the new machine takes some of the
>>>> values too?
>>>
>>> This happens automatically. When the new node joins the cluster you should
>>> see handoff messages in the erlang.log.X log file. Rebalancing is handled
>>> by the cluster and shouldn't be done manually.
>>>
>>> Grant Schofield
>>> Developer Advocate
>>> Basho Technologies, Inc.
>>>
>>>
>>>>
>>>> ------------------------------------------------------
>>>>
>>>> I don't understand these issues/behaviors (the timeout, the 15 minutes,
>>>> rebalancing). Maybe I set one of the three params incorrectly?
>>>> I left everything at the default settings.
>>>>
>>>> Thanks in advance for any hints...
>>>>
>>>> nils
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> [email protected]
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>
>> Nils M. Petersohn
>> xing.com/profile/Nils_Petersohn
>> blog.srvme.de
>> twitter.com/snackycracky
>> facebook.com/nils.petersohn
>> myspace.com/electrash
>>
>> [email protected]
>> 0049 (0)151 40 511 351
>> skype: nilz_berlin
>>
>> Ebertystr. 47
>> 10249 Berlin
>>
>