Re: allow_mult vs. 2i

Brady Wetherington Thu, 26 Sep 2013 16:10:59 -0700

Oh, I get what the siblings/allow_mult business is for; I was just wondering if I
could use it a little off-label, and eventually do some 'conflict resolution'
that would make the results much more reasonable.


But it sounds like I shouldn't do that. That's totally fine.

Since I'm working in a write-once, update-never environment, I don't see how
allow_mult would help me otherwise. A new write will always go to a new
key, and there will never be an update. If that's the case, there's no need for
allow_mult. Does that sound right?

-B.


On Wed, Sep 25, 2013 at 6:30 PM, Jeremiah Peschka <jeremiah.pesc...@gmail.com> wrote:

> inline.
>
> ---
> Jeremiah Peschka - Founder, Brent Ozar Unlimited
> MCITP: SQL Server 2008, MVP
> Cloudera Certified Developer for Apache Hadoop
>
>
> On Wed, Sep 25, 2013 at 2:47 PM, Brady Wetherington <br...@bespincorp.com> wrote:
>
>> I've built a solid proof-of-concept system on LevelDB, and I use some 2i
>> indexes to search for certain things, usually just to get counts of
>> things.
>>
>> I have two questions so far:
>>
>> First off, why is Bitcask the default? Is it just because it is faster?
>> Or is it considered more 'stable' or something?
>>
>
> Long ago, when Bitcask was chosen as the default, LevelDB wasn't an
> option yet.
>
> Databases strive for stability and the principle of least surprise.
> Changing anything can introduce performance regressions, stability
> problems, and a host of other undesirable, reputation-destroying
> things.
>
> Changing the storage back end is high on the list of things I'd never
> want to do in a database. Why do you think MySQL kept MyISAM as its
> default until 5.5?
>
>
>>
>> Next, I've learned about the allow_mult feature you can set on buckets. I
>> wonder whether I should use it for my most heavily used, primary-purpose
>> queries. Is there a limit to how many 'siblings' you can have for an entry?
>> Is it inadvisable to do what I'm describing? Would fetching all of the
>> siblings end up being a disastrous nightmare or something?
>>
>
> The upper limit depends on the size of your objects. You don't want
> object sizes (including siblings) much beyond 6MB, or you'll see a lot
> of network congestion. You certainly *could* have bigger object + sibling
> collections, but you'd want to beef up the network backend to something
> like 10GbE, 40GbE, or InfiniBand to deal with the increased intra-cluster
> traffic.
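A rough way to see how quickly siblings push an object past that guideline is to add up the sibling sizes. The sketch below is plain Python, not a Riak API; the 6MB figure comes from the reply above (read here as 6 MiB), the 200-byte sibling size is a hypothetical example, and the total is only a lower bound because it ignores per-sibling metadata such as vector clocks and headers.

```python
# Hypothetical size check: the limit is the ~6MB guideline quoted above,
# interpreted as 6 MiB. Per-sibling metadata (vector clocks, headers) is
# ignored, so real objects will be somewhat larger than these totals.
SOFT_LIMIT_BYTES = 6 * 1024 * 1024


def total_object_size(sibling_sizes):
    """Return the combined size in bytes of every sibling value."""
    return sum(sibling_sizes)


def exceeds_guideline(sibling_sizes, limit=SOFT_LIMIT_BYTES):
    """True if fetching all siblings would move more than `limit` bytes."""
    return total_object_size(sibling_sizes) > limit


if __name__ == "__main__":
    # 300,000 siblings of roughly 200 bytes each (hypothetical numbers).
    sizes = [200] * 300_000
    print(total_object_size(sizes))  # 60000000
    print(exceeds_guideline(sizes))  # True
```

At a few hundred thousand small siblings, the object is already roughly ten times the suggested ceiling before any metadata is counted.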
>
> Fetching all of your siblings is bad if you never resolve them, since
> unresolved siblings keep accumulating and every read returns all of that data.
>
> allow_mult is typically turned on for production clusters. It is off
> by default to help new users get a handle on Riak quickly without having to
> worry about siblings. Once you get the hang of how Riak behaves, turning on
> siblings is usually a good thing.
>
> Depending on your resolution strategy, it's probably best to read your data,
> resolve the siblings, and write the resolved ("garbage collected") object
> back to Riak, even if you're performing a "read-only" query. The new Riak DT
> features eliminate some of the worry about siblings by pushing the
> responsibility back down to Riak. Those features are only available if
> you're building from source, but hopefully Riak 2.0 will be out soon.
>
>
>> I *assume* (and I could be wrong) that a 2i query would be slower than
>> fetching the siblings for a particular key. Is that wrong?
>>
>> If I switch from using 2i indexes to using allow_mult and siblings, we'd
>> be talking about sibling counts from a few hundred thousand to low millions.
>>
>
> I do not think 'siblings' means what you think it means.
>
> A sibling occurs when two clients, A and B, both read v1 of an object and
> then each issue a write.
>
> Client A updates the object and sets preferences to ['cat pictures', 'ham
> sandwiches']
> Client B updates the object and sets preferences to ['knitting with bacon']
>
> With allow_mult enabled, you'd have two versions of the object. These are
> siblings.
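The cat-pictures example can be replayed with a minimal vector-clock model. This is a simplified, Dynamo-style sketch, assuming each client acts as its own actor id (`'A'`, `'B'`, plus a hypothetical `'init'` writer for v1) and that a stored version is discarded only when the incoming clock descends from it. Riak's actual clock handling (vnode vs. client actors, clock pruning) differs in the details.

```python
def descends(a, b):
    """True if vector clock `a` has seen every event recorded in `b`."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def increment(clock, actor):
    """Return a copy of `clock` with `actor`'s counter bumped by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def apply_write(stored, clock, value):
    """Merge one incoming write into the stored list of (clock, value) versions."""
    # A write whose clock is already covered by a stored version is stale.
    if any(descends(c, clock) for c, _ in stored):
        return stored
    # Drop every stored version the new clock supersedes; concurrent ones stay.
    survivors = [(c, v) for c, v in stored if not descends(clock, c)]
    return survivors + [(clock, value)]


if __name__ == "__main__":
    v1_clock = {"init": 1}
    stored = [(v1_clock, {"preferences": []})]

    # Clients A and B both read v1 (and its clock) before either one writes.
    clock_a = increment(v1_clock, "A")
    clock_b = increment(v1_clock, "B")

    stored = apply_write(stored, clock_a, {"preferences": ["cat pictures", "ham sandwiches"]})
    stored = apply_write(stored, clock_b, {"preferences": ["knitting with bacon"]})

    # Neither clock descends from the other, so both versions survive as siblings.
    for clock, value in stored:
        print(clock, value["preferences"])
```

Writing a resolved value whose clock descends from both sibling clocks (for example `{'init': 1, 'A': 2, 'B': 1}`) would collapse the two versions back into one.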
>
> If you're thinking of some kind of index created by your application, you
> could compare 2i with using siblings to build a secondary index:
> http://basho.com/index-for-fun-and-for-profit/ Even when you're creating
> your own secondary index, you still want to perform garbage collection on
> the data you're storing in Riak.
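One way to picture that garbage collection for an application-maintained index: if the index object holds a set of primary keys and concurrent updates leave it with siblings, they can be merged by set union and written back as a single value. The names below (`merge_index_siblings`, `collapse_index`, the `order:N` keys, a plain dict standing in for the store) are hypothetical. Note that a plain union cannot express removals, so a key deleted on one side would reappear; handling that needs extra bookkeeping such as tombstones, which this sketch omits.

```python
def merge_index_siblings(siblings):
    """Union the primary keys held in each sibling of an index object."""
    merged = set()
    for keys in siblings:
        merged.update(keys)
    return sorted(merged)


def collapse_index(store, index_key):
    """Replace the index object's siblings with a single merged value."""
    merged = merge_index_siblings(store.get(index_key, []))
    store[index_key] = [merged]
    return merged


if __name__ == "__main__":
    # Hypothetical index object left with two siblings by concurrent updates.
    store = {"idx:open-orders": [["order:1", "order:2"], ["order:2", "order:3"]]}
    print(collapse_index(store, "idx:open-orders"))  # ['order:1', 'order:2', 'order:3']
    print(len(store["idx:open-orders"]))  # 1
```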
>
>
>> Thanks for making an excellent product! Can't wait to get this bad boy
>> into production and really see what it can do!
>>
>> -B.
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
