Howdy Sean,
I think I have figured out why running an M/R job with a key filter as the
input and just a single JavaScript reduce phase was causing a crash.
With no preceding map phase, the inputs to the JavaScript reduce phase are
Erlang binary terms that are not compatible with the JavaScript interface.
I duplicated my test using the key filter as an input and using just a single
Erlang reduce phase as follows:
{
  "inputs": {
    "bucket": "junk",
    "key_filters": [ ["string_to_int"], ["less_than", 10] ]
  },
  "query": [
    {
      "reduce": {
        "language": "erlang",
        "module": "riak_kv_mapreduce",
        "function": "reduce_identity"
      }
    }
  ]
}
This worked just fine! The output takes about the same time to generate
whether or not any keys match the input filter, so it seems that this method
does not cause the objects to be loaded.
Thanks for your help and insight!
--gordon
On Jan 25, 2011, at 08:57, Sean Cribbs wrote:
I'm not sure why that's crashing (I suspect it's string_to_int on 0-prefixed
numbers), but your phase has to have "keep":true to return any data to the
client.
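For illustration, here is a minimal sketch of a reduce-only job with "keep"
set on the phase. The bucket name and key filters are copied from Gordon's
example later in this thread; the variable names are made up for the sketch,
and it only builds the request body, it does not contact Riak.

```javascript
// Build a reduce-only MapReduce job whose single phase sets "keep": true,
// so that the phase's results are returned to the client.
const keepJob = {
  inputs: {
    bucket: "junk",
    key_filters: [["string_to_int"], ["less_than", 100]]
  },
  query: [
    {
      reduce: {
        language: "javascript",
        source: "function(v, a) { return v; }",
        keep: true
      }
    }
  ]
};

// Serialize the job; the output could be saved to a file and POSTed to /mapred.
const keepJobBody = JSON.stringify(keepJob, null, 2);
console.log(keepJobBody);
```

The printed JSON could be saved to a file and sent with the same curl command
used elsewhere in this thread.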
Sean Cribbs <[email protected]>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/
On Jan 25, 2011, at 9:53 AM, Gordon Tillman wrote:
Sean thanks again for the feedback.
Using just a reduce function seems to cause Riak problems (unless it's me doing
something wrong).
For example, I populated a bucket called "junk" with the following command:
for s in `seq 00000 10000`; do curl -X POST -H 'Content-Type: text/plain' \
http://localhost:8091/riak/junk/$s -d 0; done
Now if I try and select some keys using key filters and a map function it works:
curl -X POST -H 'Content-Type: application/json' http://localhost:8091/mapred \
-d @kfm.json
["5","25","37","10", ... ,"18","22","17"]
where kfm.json is:
{
  "inputs": {
    "bucket": "junk",
    "key_filters": [ ["string_to_int"], ["less_than", 100] ]
  },
  "query": [
    {
      "map": {
        "language": "javascript",
        "source": "function(v, a) { return [v.key]; }"
      }
    }
  ]
}
But if I try the same thing with just a reduce phase like this:
curl -X POST -H 'Content-Type: application/json' http://localhost:8091/mapred \
-d @kfr.json -i
where kfr.json is:
{
  "inputs": {
    "bucket": "junk",
    "key_filters": [ ["string_to_int"], ["less_than", 100] ]
  },
  "query": [
    {
      "reduce": {
        "language": "javascript",
        "source": "function(v, a) { return v; }"
      }
    }
  ]
}
The curl command just hangs and I see this in the logs:
=CRASH REPORT==== 25-Jan-2011::08:46:11 ===
crasher:
initial call: riak_kv_keys_fsm:init/1
pid: <0.26942.19>
registered_name: []
exception exit: badmsg
in function gen_fsm:terminate/7
in call from proc_lib:init_p_do_apply/3
ancestors: [<0.26941.19>]
messages:
[{EXIT,<0.26942.19>,normal},{EXIT,<0.26942.19>,normal},{$gen_event,{66195534,{kl,1278813932664540053428224228626747642198940975104,[<<"83">>,<<"9">>]}}},{$gen_event,{66195534,{kl,1278813932664540053428224228626747642198940975104,[<<"76">>]}}},{$gen_event,{66195534,{kl,1278813932664540053428224228626747642198940975104,[<<"95">>]}}},{$gen_event,{66195534,{kl,1278813932664540053428224228626747642198940975104,[<<"61">>]}}},{$gen_event,{66195534,1278813932664540053428224228626747642198940975104,done}}]
links: []
dictionary: []
trap_exit: true
status: running
heap_size: 2584
stack_size: 24
reductions: 94951
neighbours:
--gordon
On Jan 25, 2011, at 07:24, Sean Cribbs wrote:
Use a reduce phase instead, which doesn't force loading of the objects. A
simple identity reduce should do what you want: function(values,arg){ return
values; }
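As a rough sketch of what that identity reduce does, the snippet below
evaluates the function source locally and applies it to a small sample list.
The sample bucket/key pairs are hypothetical stand-ins; what Riak actually
passes to a reduce phase that has no preceding map phase is an assumption here.

```javascript
// The identity reduce source, as it would appear in a phase's "source" field.
const identitySource = "function(values, arg) { return values; }";

// Evaluate the source locally so it can be tried outside Riak.
const identityReduce = eval("(" + identitySource + ")");

// Hypothetical sample inputs: bucket/key pairs selected by a key filter.
const sampleInputs = [["junk", "5"], ["junk", "25"], ["junk", "37"]];

// An identity reduce returns its input list unchanged.
const reduced = identityReduce(sampleInputs, null);
console.log(JSON.stringify(reduced));
```

Because the function never inspects the values themselves, it does not need
the object contents to produce its result.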
Sean Cribbs <[email protected]>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/
On Jan 24, 2011, at 7:43 PM, Gordon Tillman wrote:
Greetings All,
I have a use case for our app where I need to fetch a list of keys that match
some pattern and was hoping to be able to use key filters for that.
In my test I defined a key filter for the input phase of mapred and then
defined just a single map phase that returns the object key. But there is
considerable overhead with that map phase because (I'm assuming this part) Riak
is having to load each object to provide the necessary inputs to the map
function.
Is there a way to do this without Riak having to actually load the objects?
Many thanks,
--gordon
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com