For anyone who wants to know how I solved this, and to shed some light on
debugging map/reduce, here is what I did.
Background:
I’m dealing with a set of keys that are mirrored in two buckets, an
authorization expression bucket and a protected objects bucket. My goal is to
use map/reduce to evaluate the authz expressions for the passed-in keys and
return only those protected objects for which a user is authorized. But authz
expressions and protected objects themselves may not exist since they could be
deleted while references to them may not have been cleaned up as yet.
My input is this array of bucket/key pairs, produced by other processing. I have
authz expressions for v1 to v4 but not for v5 to v8, and protected objects for
v1 to v3 but not for v4 to v8.
[["authz", "v1"], ["authz", "v2"], ["authz", "v3"], ["authz", "v4"], …
["authz", "v8"]]
I’m using riak-js in node.js. My map reduce looked like this to begin with:
db.add(pairs)
    .map(evaluation.toMapReduceForm, { 'obj-bucket' : 'v2.tv', 'user-atts' : userAtts })
    .map('Riak.mapValuesJson') // converts the bucket/key pairs into an array of json objects
    .run(function(err, listOfViews) {
        if (err) {
            console.log("ERROR: Unable to obtain tvs for id '" + id
                + "'. Detail: " + JSON.stringify(err));
            send500ToClient(response);
            return;
        }
        callback(listOfViews);
    });
This results in the err object being the unhelpful {"statusCode":500}.
Fortunately, I have an http proxy that I wrote, “google wamulator”, and I had
configured riak-js so that all of its http traffic to riak passes through the
proxy, which exposes what goes across the wire. Here is what I saw:
{
  "phase":0,
  "error":"function_clause",
  "input":"{{error,notfound},{<<"v2.tv.authz">>,<<"v5">>},{struct,[{<<"type">>,<<"FALSE">>}]}}",
  "type":"error",
  "stack":"[{riak_kv_pipe_get,bkey,[{not_found,{<<"v2.tv.authz">>,<<"v5">>},{struct,[{<<"type">>,<<"FALSE">>}]}}]},{riak_kv_pipe_get,bkey_chash,1},{riak_pipe_vnode,queue_work,4},{riak_kv_mrc_map,send_results,2},{riak_pipe_vnode_worker,process_input,3},{riak_pipe_vnode_worker,wait_for_input,2},{gen_fsm,handle_msg,7},{proc_lib,init_p_do_apply,3}]"
}
This is where it got interesting. It appeared that riak wasn’t finding the authz
object for the v5 key, so I assumed it was failing before even hitting my first
map function. On the contrary, _it wasn’t_. On a whim I commented out the
second map and the reduce portions and ran the query again. The following
array was returned:
[
  [
    "v2.tv",
    "v1"
  ],
  {
    "not_found": {
      "bucket": "v2.tv.authz",
      "key": "v5",
      "keydata": "undefined"
    }
  },
  … more not_found objects, one for each missing key,
  [
    "v2.tv",
    "v3"
  ],
]
This gave me some great information:
1) If I don’t have a reduce phase my objects returned from a map phase
make it back to the client as-is. We can use that for debugging!
2) I was getting these weird not_found objects included with my two
objects (of three) for which the user was authorized.
Now where did those not_found objects come from? After _much_ trial and error I
came to the conclusion that each phase is passed an array. A map phase
interprets that array as an array of bucket/key pair arrays. For each pair, the
map phase looks for the corresponding item. If the item is not found, the map
phase puts one of these not_found objects in its output array. If the item _is_
found, the phase passes it to the map function and sticks any returned object
into the output array.
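
To make that concrete, here is a toy in-memory model of the behaviour I just described. It is only a sketch of my conclusion, not riak's actual implementation: runMapPhase, store and authorize are names I made up for illustration, and the map function is assumed to return an array of results.

```javascript
// Toy model of a map phase as described above. NOT riak code:
// `store`, `runMapPhase` and `authorize` are hypothetical names.
function runMapPhase(store, inputs, mapFn) {
  var output = [];
  inputs.forEach(function (pair) {
    var bucket = pair[0];
    var key = pair[1];
    var item = store[bucket] && store[bucket][key];
    if (item === undefined) {
      // Missing item: a not_found marker goes into the output array.
      output.push({ not_found: { bucket: bucket, key: key, keydata: 'undefined' } });
    } else {
      // Found item: whatever the map function returns is appended as-is.
      output = output.concat(mapFn(item, bucket, key));
    }
  });
  return output;
}

// Hypothetical data: authz expressions exist for v1 and v2 but not for v5.
var store = { 'v2.tv.authz': { v1: { type: 'TRUE' }, v2: { type: 'FALSE' } } };
var inputs = [['v2.tv.authz', 'v1'], ['v2.tv.authz', 'v2'], ['v2.tv.authz', 'v5']];

// Hypothetical map function: emit the protected-object pair when authorized.
function authorize(item, bucket, key) {
  return item.type === 'TRUE' ? [['v2.tv', key]] : [];
}

var phaseOutput = runMapPhase(store, inputs, authorize);
console.log(JSON.stringify(phaseOutput));
```

In this model the output mixes one bucket/key pair with one not_found object, which is the same shape as the array I got back from riak.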
Note that I said “returned object”, not “bucket/key pairs”. As noted in item 1
above, the map phase appears to be crafting another input array without
interpreting it. Interpretation appears to belong to the next phase. And if
there is no next phase, that array propagates back to the client as-is,
including any not_founds for bucket/key pairs in the map’s input array whose
objects are missing. In contrast, it appears that a reduce phase takes the
incoming array as-is without treating its items as bucket/key pairs.
Now back to my original error. The not_found error for the v5 key was coming
from the second map phase, the mapValuesJson part. As noted, it tries to
interpret the incoming array as bucket/key pair array objects and sees those
not_found items and throws the error.
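
One way to picture the mismatch is a simple shape check on what the second map phase receives. This is only an illustration of my reasoning: looksLikeBucketKeyPair is a made-up helper, not anything riak provides, and riak's real input validation happens on the Erlang side.

```javascript
// Hypothetical helper: does this input look like a [bucket, key] pair?
function looksLikeBucketKeyPair(input) {
  return Array.isArray(input) &&
    input.length >= 2 &&
    typeof input[0] === 'string' &&
    typeof input[1] === 'string';
}

// Shape of the array handed to the second map phase in my case.
var secondPhaseInputs = [
  ['v2.tv', 'v1'],
  { not_found: { bucket: 'v2.tv.authz', key: 'v5', keydata: 'undefined' } },
  ['v2.tv', 'v3']
];

// Anything that is not a pair cannot be looked up by a map phase.
var rejected = secondPhaseInputs.filter(function (input) {
  return !looksLikeBucketKeyPair(input);
});
console.log(JSON.stringify(rejected));
```

The only rejected entry is the not_found object for v5, which matches the key named in the error.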
So how did I solve this problem?
Riak has some pre-defined javascript functions that can be used in map/reduce,
defined at
https://github.com/basho/riak_kv/blob/master/priv/mapred_builtins.js. I noted
that one of these, filterNotFound, had a single argument with a plural
name, values. That led me to believe that it was solely for use in the reduce
phase. So here is what I did. Notice that after each map phase that may be
passed keys that don’t exist, I have a reduce phase that uses the
filterNotFound function to pull those not_found objects from the array. The
last reduce is there so that I don’t get those not_found objects in the array
returned from riak.
db.add(pairs)
    .map(evaluation.toMapReduceForm, { 'obj-bucket' : 'v2.tv', 'user-atts' : userAtts })
    .reduce('Riak.filterNotFound')
    .map('Riak.mapValuesJson') // converts the bucket/key pairs into an array of json objects
    .reduce('Riak.filterNotFound')
    .run(function(err, listOfViews) { // process the returned array of objects on the client
        if (err) {
            console.log("ERROR: Unable to obtain tvs for id '" + id
                + "'. Detail: " + JSON.stringify(err));
            send500ToClient(response);
            return;
        }
        callback(listOfViews);
    });
Yes, you can have multiple reduce steps and that solves the “not found” issue.
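
For illustration, here is roughly what I understand such a filtering reduce step to do. It is a stand-in written from the behaviour I observed, not a copy of the code in mapred_builtins.js, and dropNotFound is a name I made up.

```javascript
// Stand-in for a reduce function that drops not_found markers.
// Written from observed behaviour; NOT copied from mapred_builtins.js.
function dropNotFound(values) {
  return values.filter(function (value) {
    var isMarker = value !== null &&
      typeof value === 'object' &&
      !Array.isArray(value) &&
      value.not_found !== undefined;
    return !isMarker;
  });
}

// Same shape as the array the first map phase returned to me.
var mapOutput = [
  ['v2.tv', 'v1'],
  { not_found: { bucket: 'v2.tv.authz', key: 'v5', keydata: 'undefined' } },
  ['v2.tv', 'v3']
];

var cleaned = dropNotFound(mapOutput);
console.log(JSON.stringify(cleaned));
```

After filtering, only the two bucket/key pairs are left, so the next map phase has nothing it cannot look up.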
Hope this helps.
Mark
From: Mark Boyd ソフトウェア 建築家
Sent: Sunday, July 15, 2012 10:18 PM
To: [email protected]
Subject: RE: mapreduce with non-existent keys
Never mind. I found the archive search page and this same question posted
earlier here:
http://riak-users.197444.n3.nabble.com/Map-Reduce-behavior-when-key-not-found-td3641739.html
Mark
From: [email protected] On Behalf Of Mark Boyd ソフトウェア 建築家
Sent: Sunday, July 15, 2012 7:55 AM
To: [email protected]
Subject: mapreduce with non-existent keys
I’ve got a set of bucket/key pairs that may contain items that no longer exist
in riak. Is it possible to pass that to map/reduce and explicitly tell riak to
ignore any pairs which aren’t current, i.e. which aren’t found? For example, if
I have compiled a list of pairs but, before I pass the list, one or more of
those items is removed from the database, then my map/reduce appears to fail
since it doesn’t find the referenced item. Can riak be told to ignore such
missing items if they are encountered?
Thanks.
Mark
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com