Greetings all, I'm a riak newbie, trying out version 0.12.1. One of the use-cases we're interested in is using riak as a backend for brackup[1], an open source backup tool.
brackup supports pluggable targets/backends, including filesystems, FTP, SFTP, Amazon S3, etc. I've written a first-pass riak target that I'm testing, and it works nicely for small backups. I'm now looking to scale that up, and have a couple of questions.

1. brackup almost exclusively needs per-key lookups and writes. The one exception is garbage collection, where I need to walk the entire set of keys to figure out which chunks are orphaned and can therefore be deleted. So I'm wondering: is there an upper limit on the number of keys where "listing keys is expensive" turns into "listing keys is insane"? I'm looking at millions of keys/chunks for large backups. I guess splitting chunks over multiple buckets and performing multiple queries might help. Is there a recommended upper limit on keys per bucket with bitcask for sane list-keys performance?

2. There's a standard Link header coming back on my bucket key queries that is huge - twice the size of the response body with my 45-byte keys. So for 50k keys the response body is about 1MB, and the Link header is about 2MB! I'm wondering if there's any way of turning this off, given I'm not doing any Link walking?

Thanks,
Gavin

[1] http://code.google.com/p/brackup/

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
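P.S. In case it helps the discussion, here's a rough sketch of the bucket-splitting idea from question 1. Everything here is hypothetical (the bucket names, shard count, and helper functions are mine, not part of brackup or riak): shard chunk keys across N buckets by a hash of the key, so each list-keys call only walks a fraction of the keyspace, and garbage collection becomes a per-bucket set difference.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical shard count; tune to taste


def shard_bucket(key, base="brackup_chunks", n=NUM_BUCKETS):
    """Pick a bucket for a chunk key by hashing the key.

    Hashing (rather than, say, a key prefix) spreads chunks roughly
    evenly, so no single bucket's key list grows out of proportion.
    The mapping is deterministic, so reads find the same bucket
    that writes used.
    """
    h = int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)
    return "%s_%02d" % (base, h % n)


def orphaned_chunks(stored_keys_by_bucket, referenced_keys):
    """Garbage collection as a set difference, one bucket at a time.

    stored_keys_by_bucket: dict mapping bucket name -> iterable of
        keys (e.g. the result of one list-keys query per bucket)
    referenced_keys: set of chunk keys still referenced by any backup

    Returns (bucket, key) pairs that are safe to delete.
    """
    orphans = []
    for bucket, keys in stored_keys_by_bucket.items():
        orphans.extend((bucket, k) for k in set(keys) - referenced_keys)
    return orphans
```

With 16 buckets, a backup of 1.6M chunks means each list-keys call returns ~100k keys instead of one multi-million-key walk, and the GC pass can process buckets one at a time.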
