Hey Christopher,

On Thu, Jan 20, 2011 at 9:58 AM, Christopher Rueber <[email protected]> wrote:
> This seems like a question that would be answered in some of the docs, but I
> can't find the details...
> What kind of upper limitations are on large buckets? Millions of entries?
> Billions? Directly correlating to the amount of disk space available to it?
> Is there any kind of performance degradation of using one massive bucket,
> over cutting things down into more digestible chunks and splitting them
> into different buckets (which pulls more from a relational database mindset)?

There are no hard coded limitations on either the number of buckets
your cluster can have, or the number of entries in an individual
bucket. Your data model and application needs should dictate the
number of buckets you need/want. If you only need one bucket, you only
need one bucket. That said, more often than not users choose to use
many buckets in their applications for various reasons.
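To make that concrete, here's a rough sketch of what using a couple of
buckets looks like over the HTTP interface (I'm using Python's requests
library against a local node on the default port 8098; the bucket and
key names are just made up for illustration):

import requests  # plain HTTP client; any Riak client library does the same thing

BASE = "http://127.0.0.1:8098/riak"  # default Riak HTTP interface

# Buckets are created implicitly on first write -- there's no schema step.
requests.put("%s/users/user_1001" % BASE,
             data='{"name": "christopher"}',
             headers={"Content-Type": "application/json"})

requests.put("%s/sessions/sess_abc123" % BASE,
             data='{"user": "user_1001", "started": "2011-01-20"}',
             headers={"Content-Type": "application/json"})

# Reading back is a straight GET against the bucket/key pair.
resp = requests.get("%s/users/user_1001" % BASE)
print(resp.status_code, resp.text)

Whether user and session data belong in one bucket or two is entirely
up to your data model -- Riak doesn't care either way.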

> Clearly there will be a map/reduce implication of having to iterate over
> millions of entries, but is there an appreciable difference in the
> read/write speed that Riak performs at, when its buckets get quite large?

You're right about the map/reduce implication. Outside of listing all
the keys in a bucket (which a full-bucket map/reduce has to do),
key-based GET, PUT, and DELETE performance should not be affected by
very large buckets.
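If it helps, here's the contrast in code (same assumptions as the
sketch above: Python's requests library against a local node on port
8098 -- and check the wiki in case I've misremembered the built-in
function name):

import requests

BASE = "http://127.0.0.1:8098"

# Cheap: a key-based GET goes straight to the object, no matter how
# many other keys live in the bucket.
obj = requests.get("%s/riak/users/user_1001" % BASE)

# Expensive: a full-bucket map/reduce has to list every key in the
# bucket first, so its cost grows with the size of the bucket.
query = """{
  "inputs": "users",
  "query": [{"map": {"language": "javascript",
                     "name": "Riak.mapValuesJson"}}]
}"""
result = requests.post("%s/mapred" % BASE, data=query,
                       headers={"Content-Type": "application/json"})
print(result.status_code, result.text)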

You're right about the docs, too. There is some language about how
many buckets you can have and the resources they consume --
http://wiki.basho.com/REST-API.html#Bucket-operations -- (that I may
or may not have just tweaked to make more apparent) but there should
be more. I'll see about adding some additional info around "bucket
basics."

Thanks,

Mark
