So, just to provide a bit of context: we want a datastore that can hold over 500 000 000 keys, and those keys will be MapReduced routinely.
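For a rough sense of scale: with the default Bitcask backend, every key of every replica lives in an in-memory keydir, so key count drives RAM directly. A back-of-envelope sketch, using illustrative numbers only (~64-byte keys, ~40 bytes of keydir overhead per entry, and the default n_val of 3 — none of these are measured figures):

```python
# Back-of-envelope RAM estimate for Bitcask's in-memory keydir.
# All constants below are assumptions for illustration, not measurements.

KEYS = 500_000_000   # keys in the bucket
KEY_SIZE = 64        # assumed average key length in bytes
OVERHEAD = 40        # assumed keydir overhead per entry, in bytes
N_VAL = 3            # default Riak replication factor

# Each replica of each key gets its own keydir entry somewhere in the cluster.
total_entries = KEYS * N_VAL
total_bytes = total_entries * (KEY_SIZE + OVERHEAD)
total_gib = total_bytes / 2**30

print(f"cluster-wide keydir: ~{total_gib:.0f} GiB")
print(f"per node, spread over 5 nodes: ~{total_gib / 5:.0f} GiB")
```

Even with generous error bars, that is far more than two 7 GB nodes can hold. At this scale the LevelDB backend, which does not keep all keys in RAM, is usually the more economical choice.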
I would love to use Riak for this, but the question is: can it handle that amount of data (and possibly more), and can it be done cheaply? What sort of hosting would be needed? RAM? CPU? etc.

Thanks for the help

On Wed, May 15, 2013 at 5:33 PM, Dmitri Zagidulin <[email protected]> wrote:

> Kurt,
>
> I'm not sure about the cause of the MapReduce crash (I suspect it's
> running out of resources of some kind, even with the increase of vm count
> and mem).
> One word of advice about the list-keys timeout, though:
> be sure to use streaming list keys.
>
> In Python, this would look something like:
>
>     for keylist in bucket.stream_keys():
>         for key in keylist:
>             # Do something with the key
>
> This will at least avoid the timeout problem (though you may want to
> consider your use case here, and maybe use secondary index queries or
> search queries instead of listing all the keys in a bucket, since even a
> streaming list keys has to iterate over _all_ keys in a cluster).
>
> Dmitri
>
>
> On Wed, May 15, 2013 at 7:02 AM, kurt campher <[email protected]> wrote:
>
>> Hi People
>>
>> I'm running MapReduce on a bucket with more than 100 000 items.
>>
>> The MR runs for 10 seconds, then stops with this error in the logs:
>> *@riak_pipe_vnode:new_worker:766 Pipe worker startup failed: fitting was
>> gone before startup*
>>
>> *And this error in the Python shell:*
>> Error running MapReduce operation.
>> Headers: {'date': 'Tue, 14 May 2013 15:07:27 GMT', 'content-length': '623',
>> 'content-type': 'application/json', 'http_code': 500, 'server':
>> 'MochiWeb/1.1 WebMachine/1.9.0 (someone had painted it blue)'}
>> Body:
>> '{"phase":0,"error":"[preflist_exhausted]","input":"{ok,{r_object,<<\\"real_raw_logs\\">>,<<\\"8a4986cc235ec8690123677460ac05e6:2013-05-14 12:11:08.178628:0.184912287858\\">>,[{r_content,{dict,6,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[[<<\\"Links\\">>]],[],[],[],[],[],[],[],[[<<\\"content-type\\">>,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110],[<<\\"X-Riak-VTag\\">>,49,89,48,122,98,99,66,53,120,86,120,50,90,67,101,51,115,120,79,85,65,79]],[[<<\\"index\\">>]],[],[[<<\\"X-Riak-Last-Modified\\">>|{1368,533468,242947}]],[],[...]}}},...}],...},...}","type":"forward_preflist","stack":"[]"}'
>>
>> Also, I can't list the keys on the bucket. A timeout error occurs.
>>
>> I have Riak running on 2 nodes with 7 GB of RAM each.
>> MapReduce runs fine over 2 000 items.
>> I have increased the js_vm count multiple times.
>> Also increased js_max_vm_mem to 2048.
>> Also increased the MapReduce query's timeout, but it never runs longer
>> than 10 seconds.
>>
>> *Thanks to anyone who looks at this*
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
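For reference, the JavaScript MapReduce knobs mentioned in the thread live in the `riak_kv` section of `app.config`. The values below are illustrative, not recommendations. Note that `js_max_vm_mem` is specified in megabytes, so setting it to 2048 allows up to 2 GB per JS VM — several concurrent VMs at that size would exhaust a 7 GB node, which is consistent with a resource-exhaustion failure (`preflist_exhausted`, "fitting was gone before startup") rather than a query timeout.

```erlang
%% app.config -- riak_kv section (illustrative values only)
{riak_kv, [
    %% Number of JS VMs available to map and reduce phases
    {map_js_vm_count, 24},
    {reduce_js_vm_count, 18},
    %% Memory per JS VM, in MB (default is 8; keep total
    %% vm_count * js_max_vm_mem well under node RAM)
    {js_max_vm_mem, 32}
]}.
```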
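Dmitri's streaming advice can be illustrated without a live cluster. The generator below is hypothetical stand-in scaffolding, not part of the Riak client API; it only shows why consuming keys batch-by-batch (as `bucket.stream_keys()` does) keeps memory bounded, where a non-streaming key listing must materialize every key before returning — which is what times out on large buckets.

```python
# Stand-in for bucket.stream_keys(): yields keys in small batches,
# so only one batch is ever held in memory at a time.
# (stream_keys_batches is a hypothetical simulation, not the Riak API.)

def stream_keys_batches(total, batch_size=1000):
    """Yield lists of keys, batch by batch, like a streaming key list."""
    batch = []
    for i in range(total):
        batch.append(f"key-{i}")
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch

# Consume incrementally, exactly as in Dmitri's example.
count = 0
for keylist in stream_keys_batches(10_500):
    for key in keylist:
        count += 1  # do something with the key

print(count)
```

The same shape applies to the real client: iterate the stream and process each batch as it arrives, rather than collecting all keys first.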
