Oh, I get what the siblings/allow_mult business is for. I was just wondering whether I could use it off-label a little and eventually do 'conflict resolution', which would make the results much more reasonable.
But it sounds like I shouldn't do that. That's totally fine. Since I'm running a write-once, update-never environment, I don't see how allow_mult would help me otherwise. A new write will always be to a new key. There will never be an update. So if that's the case, there's no need for allow_mult. Does that sound right?

-B.

On Wed, Sep 25, 2013 at 6:30 PM, Jeremiah Peschka <jeremiah.pesc...@gmail.com> wrote:

> inline.
>
> ---
> Jeremiah Peschka - Founder, Brent Ozar Unlimited
> MCITP: SQL Server 2008, MVP
> Cloudera Certified Developer for Apache Hadoop
>
>
> On Wed, Sep 25, 2013 at 2:47 PM, Brady Wetherington
> <br...@bespincorp.com> wrote:
>
>> I've built a solid proof-of-concept system on LevelDB, and use some 2i
>> indexes in order to search for certain things - usually just for counts of
>> things.
>>
>> I have two questions so far:
>>
>> First off, why is Bitcask the default? Is it just because it is faster?
>> Or is it considered more 'stable' or something?
>>
>
> Long ago, when Bitcask was elected as the default, LevelDB was not a
> thing.
>
> Databases strive for stability and the principle of least surprise.
> Changing anything can potentially introduce performance regressions,
> stability problems, and a host of other undesirable and
> reputation-destroying things.
>
> Changing the storage back end is high up on the list of things I'd never
> want to do in a database. Why do you think MySQL still defaults to MyISAM?
>
>
>>
>> Next, I've learned about the allow_mult feature you can set on buckets. I
>> wonder if I should use this for my most heavily-used primary-purpose
>> queries? Is there a limit to how many 'siblings' you can have for an entry?
>> Is it inadvisable to do what I'm talking about? Would fetching all of the
>> siblings end up being a disastrous nightmare or something?
>>
>
> The upper limit will depend on the size of your objects. You don't want to
> have object sizes (including siblings) much beyond 6MB. You'll have a lot
> of network congestion. You certainly *could* have bigger object + sibling
> collections, but you'd want to beef up the network backend to something
> like 10GbE, 40GbE, or InfiniBand to deal with the increased gossip.
>
> Fetching all of your siblings is bad if you never resolve siblings since
> you'll have a lot of data.
>
> Allow_mult is typically turned on for production clusters. This is set off
> by default to help new users get a handle on Riak quickly without having to
> worry about siblings. Once you get the hang of how Riak behaves, turning on
> siblings is usually a good thing.
>
> Depending on resolution, it's probably best to read your data, resolve
> siblings, and send that garbage-collected object back to Riak - even if
> you're performing a "read only" query. The new Riak DT features eliminate
> some of the worry about siblings by pushing the responsibility back down to
> Riak. Those features are only available if you're building from source, but
> hopefully Riak 2.0 will be out soon.
>
>
>> I *assume* - and I could be wrong - that a 2i query would be slower than
>> a fetch-of-siblings for a particular key - is that wrong?
>>
>> If I switch from using 2i indexes to using allow_mult and siblings, we'd
>> be talking a few hundred thousand to low millions for a sibling-count.
>>
>
> I do not think 'siblings' means what you think it means.
>
> A sibling would occur if two clients, A and B, read v1 of an object and
> then issue writes.
>
> Client A updates object and sets preferences to ['cat pictures', 'ham
> sandwiches']
> Client B updates object and sets preferences to ['knitting with bacon']
>
> With allow_mult enabled you'd have two versions of the object. These are
> siblings.
>
> If you're thinking of some kind of index created by your application, you
> could look at 2i vs using siblings to build a secondary index:
> http://basho.com/index-for-fun-and-for-profit/
>
> Even when you're creating your own secondary index, you still want to
> perform garbage collection on the data you're storing in Riak.
>
>
>> Thanks for making an excellent product! Can't wait to get this bad boy
>> into production and really see what it can do!
>>
>> -B.
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
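
To make the sibling scenario quoted above concrete, here is a minimal sketch in plain Python. It does not call Riak or any Riak client library; the siblings are simulated as ordinary lists, and `resolve_siblings` is a hypothetical helper name. Merging by union is only one possible resolution strategy and is an assumption here: the right merge rule depends on what the application's data means.

```python
def resolve_siblings(siblings):
    """Merge sibling values by union, keeping first-seen order.

    Union is an illustrative choice, not a rule Riak imposes.
    """
    merged = []
    for value in siblings:
        for item in value:
            if item not in merged:
                merged.append(item)
    return merged


# Clients A and B both read v1, then write without seeing each other's update.
v1 = []
sibling_a = v1 + ["cat pictures", "ham sandwiches"]
sibling_b = v1 + ["knitting with bacon"]

# With allow_mult enabled, a read would hand back both values (simulated here).
siblings = [sibling_a, sibling_b]

resolved = resolve_siblings(siblings)
print(resolved)
# prints ['cat pictures', 'ham sandwiches', 'knitting with bacon']
```

In a real application the resolved value would then be written back to Riak, as described in the quoted reply, so that later reads see a single object. In a write-once, update-never setup where every write goes to a new key, this code path would never be exercised.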