Hey Christopher,

On Thu, Jan 20, 2011 at 9:58 AM, Christopher Rueber <[email protected]> wrote:
> This seems like a question that would be answered in some of the docs, but I
> can't find the details...
> What kind of upper limitations are there on large buckets? Millions of
> entries? Billions? Directly correlating to the amount of disk space available
> to it? Is there any kind of performance degradation from using one massive
> bucket, over cutting things down into more digestible chunks and splitting
> them into different buckets (which pulls more from a relational database
> mindset)?
There are no hard-coded limits on either the number of buckets your cluster
can have or the number of entries in an individual bucket. Your data model and
application needs should dictate how many buckets you use. If you only need
one bucket, you only need one bucket. That said, more often than not users
choose to use many buckets in their applications, for various reasons.

> Clearly there will be a map/reduce implication of having to iterate over
> millions of entries, but is there an appreciable difference in the
> read/write speed that Riak performs at, when its buckets get quite large?

You're right. Aside from operations that list all the keys in a bucket (a
full-bucket map/reduce, for example), key-based GET, PUT, and DELETE
performance should not be affected by very large buckets.

You're right about the docs, too. There is some language about how many
buckets you can have and the resources they consume --
http://wiki.basho.com/REST-API.html#Bucket-operations -- (which I may or may
not have just tweaked to make it more apparent), but there should be more.
I'll see about adding some additional info around "bucket basics."

Thanks,

Mark
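
P.S. For anyone reading this in the archives, here is a rough, untested sketch
of the distinction above, using Python 2's urllib2 against a node's HTTP
interface on the default port (8098). The bucket and key names are made up,
and the endpoint paths are the ones described on the REST API wiki page linked
above. The point is simply that a key-based GET stays cheap however big the
bucket gets, while listing a bucket's keys has to touch every key and so grows
with the size of the bucket.

    import urllib2

    RIAK = "http://127.0.0.1:8098"

    def get_value(bucket, key):
        # Key-based GET: cost is independent of how many keys the bucket holds.
        return urllib2.urlopen("%s/riak/%s/%s" % (RIAK, bucket, key)).read()

    def list_keys(bucket):
        # Full-bucket key listing: walks every key in the bucket, so it gets
        # slower (and heavier on the cluster) as the bucket grows. This is
        # the expensive part of a full-bucket map/reduce; avoid it on hot paths.
        return urllib2.urlopen("%s/riak/%s?keys=true" % (RIAK, bucket)).read()

    if __name__ == "__main__":
        print get_value("users", "christopher")  # fast regardless of bucket size
        print list_keys("users")                  # cost proportional to bucket size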
