Jim,
A couple of things to note. First, bitcask stores all keys in memory, but
eleveldb does not necessarily, so the performance of your disks could be a
factor. Not saying it is, but it's a difference to be aware of between bitcask
and eleveldb.
Second, the latest error you shared was a timeout from the mapreduce operation.
You can increase the timeout for the operation (the value is in milliseconds)
by adding a "timeout" field to your original query like this:
curl -v -d
'{"inputs":{"bucket":"nodes","key_filters":[["eq","user_id-xxxxxxx-info"]]},"query":[{"reduce":{"language":"erlang","module":"riak_kv_mapreduce","function":"reduce_identity"}}],
"timeout": 120000}' -H "Content-Type: application/json"
http://xx.xx.xx.xx:8098/mapred
Finally, you're using a reduce phase in the query, when I think you might be
better served by a map phase, which will allow more parallelization during
query execution. Try using a map phase with the map_identity function instead
of reduce_identity and I suspect you will see better results.
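As a sketch, here's what the same query might look like rewritten with a map
phase (same placeholder host, bucket, and key filter as in your original; the
timeout value is just an example):

```shell
# Same query as before, but with a map phase running
# riak_kv_mapreduce:map_identity instead of a reduce phase.
# Host, bucket, and key-filter values are the placeholders from the
# original query; timeout is in milliseconds.
curl -v -d '{
  "inputs": {"bucket":"nodes","key_filters":[["eq","user_id-xxxxxxx-info"]]},
  "query": [{"map":{"language":"erlang","module":"riak_kv_mapreduce","function":"map_identity"}}],
  "timeout": 120000
}' -H "Content-Type: application/json" http://xx.xx.xx.xx:8098/mapred
```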
Hope that helps and please respond if you have any further questions or
problems. Cheers.
Kelly
On Oct 23, 2011, at 5:40 PM, Jim Adler wrote:
> A little context on my use-case here. I've got about 8M keys in this 3 node
> cluster. I need to clean out some bad keys and some bad data. So, I'm using
> the key filter and search functionality to accomplish this (I tend to use the
> riak python client). But, to be honest, I'm having a helluva time getting
> these basic tasks accomplished before I ramp to hundreds of millions of keys.
>
> Thanks for any help.
>
> Jim
>
> From: Kelly McLaughlin <[email protected]>
> Date: Sun, 23 Oct 2011 14:13:09 -0600
> To: Jim Adler <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Subject: Re: Key Filter Timeout
>
> Jim,
>
> Looks like you are possibly using both the legacy key listing option and the
> legacy map reduce. Assuming all your nodes are on Riak 1.0, check your
> app.config files on all nodes and make sure mapred_system is set to pipe and
> legacy_keylisting is set to false. If that's not already the case you should
> see better performance. If you are still getting the same or similar errors
> with those settings in place, please respond with what they are so we can look
> into it more. Thanks.
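> As a sketch, the relevant riak_kv section of app.config would look something
> like this (other settings omitted; the exact contents of your file will
> differ):
>
> ```erlang
> %% app.config fragment: use the pipe-based MapReduce system and
> %% disable legacy key listing (Riak 1.0+).
> {riak_kv, [
>     {mapred_system, pipe},
>     {legacy_keylisting, false}
> ]}
> ```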
>
> Kelly
>
> On Oct 23, 2011, at 12:38 PM, Jim Adler wrote:
>
>> I'm trying to run a very simplified key filter that's timing out. I've got
>> about 8M keys in a 3-node cluster, 15 GB memory, num_partitions=256, LevelDB
>> backend.
>>
>> I'm thinking this should be pretty quick. What am I doing wrong?
>>
>> Jim
>>
>> Here's the query:
>>
>> curl -v -d
>> '{"inputs":{"bucket":"nodes","key_filters":[["eq","user_id-xxxxxxx-info"]]},"query":[{"reduce":{"language":"erlang","module":"riak_kv_mapreduce","function":"reduce_identity"}}]}'
>> -H "Content-Type: application/json" http://xx.xx.xx.xx:8098/mapred
>>
>> Here's the log:
>>
>> 18:25:08.892 [error] gen_fsm <0.20795.0> in state executing terminated with
>> reason: {error,flow_timeout}
>> 18:25:08.961 [error] CRASH REPORT Process <0.20795.0> with 2 neighbours
>> crashed with reason: {error,flow_timeout}
>> 18:25:08.963 [error] Supervisor luke_flow_sup had child undefined started
>> with {luke_flow,start_link,undefined} at <0.20795.0> exit with reason
>> {error,flow_timeout} in context child_terminated
>> 18:25:08.966 [error] gen_fsm <0.20798.0> in state waiting_kl terminated with
>> reason: {error,flow_timeout}
>> 18:25:08.971 [error] CRASH REPORT Process <0.20798.0> with 0 neighbours
>> crashed with reason: {error,flow_timeout}
>> 18:25:08.980 [error] Supervisor riak_kv_keys_fsm_legacy_sup had child
>> undefined started with {riak_kv_keys_fsm_legacy,start_link,undefined} at
>> <0.20798.0> exit with reason {error,flow_timeout} in context child_terminated
>> 18:25:08.983 [error] Supervisor luke_phase_sup had child undefined started
>> with {luke_phase,start_link,undefined} at <0.20797.0> exit with reason
>> {error,flow_timeout} in context child_terminated
>> 18:25:08.996 [error] Supervisor luke_phase_sup had child undefined started
>> with {luke_phase,start_link,undefined} at <0.20796.0> exit with reason
>> {error,flow_timeout} in context child_terminated
>>
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>