Re: use Siblings to implement a message queue?
Hi Alex, I get what you mean, I would say; if you think your project will succeed don't think small, RabbitMQ and Riak can be pretty easy to understand and manage once you play with them, don't be scare to try, high demand systems usually have several subsystems to get their tasks done quickly like: 1. Distributed data. 2. Distributed queues. 3. Text indexing/search. With Riak+Yokozuna you cover 1 and 3 which is a pretty good combination, as a side note, maybe, just maybe, if you "divide and conquer" you may be able to remove the "Set" concept and just work with CRDT or siblings, let me explain briefly: CRDT would warranty your distributed CAS operations per your designed key, instead of treating the keys as a Set, treat them independently and each CRDT key to manage its own states, that way you get the atomicity you want without worrying about locking or distributed concurrency in a collection of items since each key is unique and each would control its own state, or let each key have siblings and conflict resolution, as long as you don't create too many siblings per key in a very small amount of time. The the only show stopper for CRDT is if you need to do operations in more than one atomically, then you will have to use siblings, even for siblings, your key approach would be to try to operate atomically in the minimum amount of states as possible so that you can avoid using Sets or Queues. I hope I have expressed my thoughts well since explaining all this can get complicated really quick, or become a really long talk. Hope that helps, Guido. On 12/09/13 17:45, Alex Rice wrote: Hey all, thanks for the feedback, this is interesting! re: Those CRDT white papers look pretty complicated, I will definitely will leave that to you hardcore mathematicians and computer scientists :) re: Riak 2.0, I see there are git branches for 2.0 related stuff in Riak and in corrugatediron, I am guessing it's still too rough around the edges for a newbie like me to be messing around with? re: RabbitMQ I had heard of it but am hesitant to add another daemon/system dependency. But I am definitely considering your advice, Guido! Also for this particular project (a multiplayer game) I may be able to get away with Sets, don't actually really need proper Queues. Actually I'm not sure. Just thinking out loud here. For a Set implementation in Riak, I am thinking of modeling it after this blog post, I cannot find which at the moment, which described how to implement a lock-free Set in Redis. Redis is not distributed, but I think the data structure is still relevant: * To get around the eventual-consistency issue, a prerequisite would be to always use R=1 and W=all. This will optimize it for reads because there will be a process polling Riak to remove messages out of the Set. * In the Redis implementation, they used the atomic APPEND operation to append entries into the set, as a list but with notation to add or remove, prefixed with + or -. e.g. +itemA +itemB -itemA +itemC => yields { itemB, itemC } * Obviously Riak doesn't have atomic appends, however setting a bucket to allow_mult = true, and always Put-ing with null vector clock, then the Siblings feature should be equivalent. Instead of a value being appended to, the Siblings would be populated with +itemA, +itemB, -itemA, +itemC, etc. * Any process reading the Set is also be responsible for collating and Put-ing the condensed set back, after it reaches some size limit or I think they called it 'fragmentation'. This condense and re-Put operation would be the tricky part, and would need to use vector clock and ensure there are 0 siblings when finished. But it should be possible? It seems like this is an uber-simplified form of a CRDT data structure? Thanks, Alex On Thu, Sep 12, 2013 at 2:12 AM, Guido Medina wrote: Alex, RabbitMQ which is a good high performer, developed in Erlang and scales just as Riak. The old saying, the right tool for the right job, I like how fast Riak is fetching/storing key values on a distributed environment, I don't like Riak for queues, is it because it wasn't designed for that? CRDT like CAS operations on structures (to mention some for Java) like LinkedBlockingQueue and ConcurrentLinkedQueue should be painful enough to develop in a CRDT environment, cause now you would be talking of guarding states of few within the same structure; say AtomicReference of several variables to warranty a non-blocking CAS operation, in this case heap and tail of the queue (Not so true for LinkedBlockingQueue) And performance of several CRDT for high performance queue I don't actually think is going to be good. If I were to use Riak for my big data environment and would like to match Riak on a distributed queueing system then I would use like I said, RabbitMQ. Guido. On 12/09/13 00:29, Alex Rice wrote: Hi I'm very new to Riak. The allow_mult = true / Siblings feature is very interesting. Could it be
Re: use Siblings to implement a message queue?
--- Jeremiah Peschka - Founder, Brent Ozar Unlimited MCITP: SQL Server 2008, MVP Cloudera Certified Developer for Apache Hadoop On Thu, Sep 12, 2013 at 9:45 AM, Alex Rice wrote: > Hey all, thanks for the feedback, this is interesting! > > re: Those CRDT white papers look pretty complicated, I will definitely > will leave that to you hardcore mathematicians and computer scientists > :) > re: Riak 2.0, I see there are git branches for 2.0 related stuff in > Riak and in corrugatediron, I am guessing it's still too rough around > the edges for a newbie like me to be messing around with? > I can't speak for Riak 2.0, but CorrugatedIron's support doesn't have any tests supporting it... I hope to get on that next week to see where things are. The only hold up is time on my end. I believe the riak branch should be workable for development. There are features in CI that need clean up, but I'll start pushing out RC builds through nuget for the adventurous. > re: RabbitMQ I had heard of it but am hesitant to add another > daemon/system dependency. But I am definitely considering your > advice, Guido! > > Also for this particular project (a multiplayer game) I may be able to > get away with Sets, don't actually really need proper Queues. Actually > I'm not sure. Just thinking out loud here. > > For a Set implementation in Riak, I am thinking of modeling it after > this blog post, I cannot find which at the moment, which described how > to implement a lock-free Set in Redis. Redis is not distributed, but I > think the data structure is still relevant: > > * To get around the eventual-consistency issue, a prerequisite would > be to always use R=1 and W=all. This will optimize it for reads > because there will be a process polling Riak to remove messages out of > the Set. > * In the Redis implementation, they used the atomic APPEND operation > to append entries into the set, as a list but with notation to add or > remove, prefixed with + or -. e.g. > +itemA > +itemB > -itemA > +itemC > => yields { itemB, itemC } > * Obviously Riak doesn't have atomic appends, however setting a bucket > to allow_mult = true, and always Put-ing with null vector clock, then > the Siblings feature should be equivalent. Instead of a value being > appended to, the Siblings would be populated with +itemA, +itemB, > -itemA, +itemC, etc. > >From an API perspective, you'll do something similar in Riak 2.0. The CorrugatedIron operation would allow to atomically put individual items into a set. I'm hoping to create change tracking data structures for use with the get/put/delete functionality of Riak's server side CRDTs. > * Any process reading the Set is also be responsible for collating and > Put-ing the condensed set back, after it reaches some size limit or I > think they called it 'fragmentation'. This condense and re-Put > operation would be the tricky part, and would need to use vector clock > and ensure there are 0 siblings when finished. But it should be > possible? It seems like this is an uber-simplified form of a CRDT data > structure? > If you're implementing the CRDT on the client side, you'll need to figure out the garbage collection yourself. If you're able to use Riak's data structures, then you can safely rely on someone else handling the problems for you. If it weren't for my day job, I'd have this CRDT support in CI by now ;) > > Thanks, > Alex > > > > > > > > > > > > > > On Thu, Sep 12, 2013 at 2:12 AM, Guido Medina > wrote: > > Alex, > > > > RabbitMQ which is a good high performer, developed in Erlang and scales > just > > as Riak. > > > > The old saying, the right tool for the right job, I like how fast Riak is > > fetching/storing key values on a distributed environment, I don't like > Riak > > for queues, is it because it wasn't designed for that? CRDT like CAS > > operations on structures (to mention some for Java) like > LinkedBlockingQueue > > and ConcurrentLinkedQueue should be painful enough to develop in a CRDT > > environment, cause now you would be talking of guarding states of few > within > > the same structure; say AtomicReference of several variables to warranty > a > > non-blocking CAS operation, in this case heap and tail of the queue (Not > so > > true for LinkedBlockingQueue) > > > > And performance of several CRDT for high performance queue I don't > actually > > think is going to be good. > > > > If I were to use Riak for my big data environment and would like to match > > Riak on a distributed queueing system then I would use like I said, > > RabbitMQ. > > > > Guido. > > > > > > On 12/09/13 00:29, Alex Rice wrote: > >> > >> Hi I'm very new to Riak. The allow_mult = true / Siblings feature is > >> very interesting. Could it be used to implement a high performance > >> collection like a Queue or Set, in a lock free manner? The Riak docs > >> make it sound like allow_mult is mainly for confict resolution and > >> degenerate cases, rather than a feature which a data structure could > >>
Re: use Siblings to implement a message queue?
Hey all, thanks for the feedback, this is interesting! re: Those CRDT white papers look pretty complicated, I will definitely will leave that to you hardcore mathematicians and computer scientists :) re: Riak 2.0, I see there are git branches for 2.0 related stuff in Riak and in corrugatediron, I am guessing it's still too rough around the edges for a newbie like me to be messing around with? re: RabbitMQ I had heard of it but am hesitant to add another daemon/system dependency. But I am definitely considering your advice, Guido! Also for this particular project (a multiplayer game) I may be able to get away with Sets, don't actually really need proper Queues. Actually I'm not sure. Just thinking out loud here. For a Set implementation in Riak, I am thinking of modeling it after this blog post, I cannot find which at the moment, which described how to implement a lock-free Set in Redis. Redis is not distributed, but I think the data structure is still relevant: * To get around the eventual-consistency issue, a prerequisite would be to always use R=1 and W=all. This will optimize it for reads because there will be a process polling Riak to remove messages out of the Set. * In the Redis implementation, they used the atomic APPEND operation to append entries into the set, as a list but with notation to add or remove, prefixed with + or -. e.g. +itemA +itemB -itemA +itemC => yields { itemB, itemC } * Obviously Riak doesn't have atomic appends, however setting a bucket to allow_mult = true, and always Put-ing with null vector clock, then the Siblings feature should be equivalent. Instead of a value being appended to, the Siblings would be populated with +itemA, +itemB, -itemA, +itemC, etc. * Any process reading the Set is also be responsible for collating and Put-ing the condensed set back, after it reaches some size limit or I think they called it 'fragmentation'. This condense and re-Put operation would be the tricky part, and would need to use vector clock and ensure there are 0 siblings when finished. But it should be possible? It seems like this is an uber-simplified form of a CRDT data structure? Thanks, Alex On Thu, Sep 12, 2013 at 2:12 AM, Guido Medina wrote: > Alex, > > RabbitMQ which is a good high performer, developed in Erlang and scales just > as Riak. > > The old saying, the right tool for the right job, I like how fast Riak is > fetching/storing key values on a distributed environment, I don't like Riak > for queues, is it because it wasn't designed for that? CRDT like CAS > operations on structures (to mention some for Java) like LinkedBlockingQueue > and ConcurrentLinkedQueue should be painful enough to develop in a CRDT > environment, cause now you would be talking of guarding states of few within > the same structure; say AtomicReference of several variables to warranty a > non-blocking CAS operation, in this case heap and tail of the queue (Not so > true for LinkedBlockingQueue) > > And performance of several CRDT for high performance queue I don't actually > think is going to be good. > > If I were to use Riak for my big data environment and would like to match > Riak on a distributed queueing system then I would use like I said, > RabbitMQ. > > Guido. > > > On 12/09/13 00:29, Alex Rice wrote: >> >> Hi I'm very new to Riak. The allow_mult = true / Siblings feature is >> very interesting. Could it be used to implement a high performance >> collection like a Queue or Set, in a lock free manner? The Riak docs >> make it sound like allow_mult is mainly for confict resolution and >> degenerate cases, rather than a feature which a data structure could >> be designed around. Does anyone have any links to share, or thoughts >> about this broader issue? >> Thanks much! >> Alex >> >> ___ >> riak-users mailing list >> riak-users@lists.basho.com >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: use Siblings to implement a message queue?
Alex, RabbitMQ which is a good high performer, developed in Erlang and scales just as Riak. The old saying, the right tool for the right job, I like how fast Riak is fetching/storing key values on a distributed environment, I don't like Riak for queues, is it because it wasn't designed for that? CRDT like CAS operations on structures (to mention some for Java) like LinkedBlockingQueue and ConcurrentLinkedQueue should be painful enough to develop in a CRDT environment, cause now you would be talking of guarding states of few within the same structure; say AtomicReference of several variables to warranty a non-blocking CAS operation, in this case heap and tail of the queue (Not so true for LinkedBlockingQueue) And performance of several CRDT for high performance queue I don't actually think is going to be good. If I were to use Riak for my big data environment and would like to match Riak on a distributed queueing system then I would use like I said, RabbitMQ. Guido. On 12/09/13 00:29, Alex Rice wrote: Hi I'm very new to Riak. The allow_mult = true / Siblings feature is very interesting. Could it be used to implement a high performance collection like a Queue or Set, in a lock free manner? The Riak docs make it sound like allow_mult is mainly for confict resolution and degenerate cases, rather than a feature which a data structure could be designed around. Does anyone have any links to share, or thoughts about this broader issue? Thanks much! Alex ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: use Siblings to implement a message queue?
That doesn't seem bad at all. I think there even were some attempts to make a Riak backend for RabbitMQ. I'd say, keep messages as separate objects, and keep their IDs ordered inside a sibling-based structure, built on top of statebox[1] or riak_dt[2]. The downsides are: two writes and reads every time, potential data leaks if you don't make sure the consumed message is deleted. But storing an entire queue in a single object sounds like a dangerous idea – one might grow too much. Also, keep in mind that your queues will be eventually consistent. Or, as Jeremiah says, you can really wait till Riak 2.0, it might have most of the implementation problems solved, you'll just need to write a bit of code. [1] https://github.com/mochi/statebox [2] https://github.com/basho/riak_dt On Wed, Sep 11, 2013 at 4:29 PM, Alex Rice wrote: > Hi I'm very new to Riak. The allow_mult = true / Siblings feature is > very interesting. Could it be used to implement a high performance > collection like a Queue or Set, in a lock free manner? The Riak docs > make it sound like allow_mult is mainly for confict resolution and > degenerate cases, rather than a feature which a data structure could > be designed around. Does anyone have any links to share, or thoughts > about this broader issue? > Thanks much! > Alex > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > -- Best regards, Dmitry Demeshchuk ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: use Siblings to implement a message queue?
To add to what Jeremiah said about CRDT support in the upcoming release of Riak (2.0) see this GitHub issue: https://github.com/basho/riak/issues/354 On Wed, Sep 11, 2013 at 7:42 PM, Jeremiah Peschka < jeremiah.pesc...@gmail.com> wrote: > Siblings don't have to be limited to terrible scenarios - they're a > feature that makes sense for many workloads using an eventually consistent, > distributed database like Riak. > > Siblings are being used under the covers in Riak 2.0 to implement sets, > maps, and counters. Riak 1.4.x implements counters using siblings, too. > > There's a whole set of data types called CRDTs that can be used to > implement rich data types in a system like Riak. There's work already in > place to do this with some of the existing clients and I started a project > to implement CRDTs through CorrugatedIron: > https://github.com/peschkaj/MoarDT/ > > The only tricky bit is that you can never be quite sure if you've read > everything. Oh, and garbage collecting the CRDT can be tricky if you're > using a client generated CRDT - you can end up with a large number of > siblings if you aren't careful. > > > --- > Jeremiah Peschka - Founder, Brent Ozar Unlimited > MCITP: SQL Server 2008, MVP > Cloudera Certified Developer for Apache Hadoop > > > On Wed, Sep 11, 2013 at 4:29 PM, Alex Rice wrote: > >> Hi I'm very new to Riak. The allow_mult = true / Siblings feature is >> very interesting. Could it be used to implement a high performance >> collection like a Queue or Set, in a lock free manner? The Riak docs >> make it sound like allow_mult is mainly for confict resolution and >> degenerate cases, rather than a feature which a data structure could >> be designed around. Does anyone have any links to share, or thoughts >> about this broader issue? >> Thanks much! >> Alex >> >> ___ >> riak-users mailing list >> riak-users@lists.basho.com >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com >> > > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: use Siblings to implement a message queue?
Siblings don't have to be limited to terrible scenarios - they're a feature that makes sense for many workloads using an eventually consistent, distributed database like Riak. Siblings are being used under the covers in Riak 2.0 to implement sets, maps, and counters. Riak 1.4.x implements counters using siblings, too. There's a whole set of data types called CRDTs that can be used to implement rich data types in a system like Riak. There's work already in place to do this with some of the existing clients and I started a project to implement CRDTs through CorrugatedIron: https://github.com/peschkaj/MoarDT/ The only tricky bit is that you can never be quite sure if you've read everything. Oh, and garbage collecting the CRDT can be tricky if you're using a client generated CRDT - you can end up with a large number of siblings if you aren't careful. --- Jeremiah Peschka - Founder, Brent Ozar Unlimited MCITP: SQL Server 2008, MVP Cloudera Certified Developer for Apache Hadoop On Wed, Sep 11, 2013 at 4:29 PM, Alex Rice wrote: > Hi I'm very new to Riak. The allow_mult = true / Siblings feature is > very interesting. Could it be used to implement a high performance > collection like a Queue or Set, in a lock free manner? The Riak docs > make it sound like allow_mult is mainly for confict resolution and > degenerate cases, rather than a feature which a data structure could > be designed around. Does anyone have any links to share, or thoughts > about this broader issue? > Thanks much! > Alex > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com