Re: use Siblings to implement a message queue?

2013-09-12 Thread Guido Medina

Alex,

RabbitMQ which is a good high performer, developed in Erlang and scales 
just as Riak.


The old saying, the right tool for the right job, I like how fast Riak 
is fetching/storing key values on a distributed environment, I don't 
like Riak for queues, is it because it wasn't designed for that? CRDT 
like CAS operations on structures (to mention some for Java) like 
LinkedBlockingQueue and ConcurrentLinkedQueue should be painful enough 
to develop in a CRDT environment, cause now you would be talking of 
guarding states of few within the same structure; say AtomicReference of 
several variables to warranty a non-blocking CAS operation, in this case 
heap and tail of the queue (Not so true for LinkedBlockingQueue)


And performance of several CRDT for high performance queue I don't 
actually think is going to be good.


If I were to use Riak for my big data environment and would like to 
match Riak on a distributed queueing system then I would use like I 
said, RabbitMQ.


Guido.

On 12/09/13 00:29, Alex Rice wrote:

Hi I'm very new to Riak. The allow_mult = true /  Siblings feature is
very interesting. Could it be used to implement a high performance
collection like a Queue or Set, in a lock free manner? The Riak docs
make it sound like allow_mult is mainly for confict resolution and
degenerate cases, rather than a feature which a data structure could
be designed around.  Does anyone have any links to share, or thoughts
about this broader issue?
Thanks much!
Alex

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: use Siblings to implement a message queue?

2013-09-12 Thread Alex Rice
Hey all, thanks for the feedback, this is interesting!

re: Those CRDT white papers look pretty complicated, I will definitely
will leave that to you hardcore mathematicians and computer scientists
:)
re: Riak 2.0, I see there are git branches for 2.0 related stuff in
Riak and in corrugatediron, I am guessing it's still too rough around
the edges for a newbie like me to be messing around with?
re: RabbitMQ I had heard of it but am hesitant to add another
daemon/system dependency. But I am  definitely considering your
advice, Guido!

Also for this particular project (a multiplayer game) I may be able to
get away with Sets, don't actually really need proper Queues. Actually
I'm not sure. Just thinking out loud here.

For a Set implementation in Riak, I am thinking of modeling it after
this blog post, I cannot find which at the moment, which described how
to implement a lock-free Set in Redis. Redis is not distributed, but I
think the data structure is still relevant:

* To get around the eventual-consistency issue, a prerequisite would
be to always use R=1 and W=all. This will optimize it for reads
because there will be a process polling Riak to remove messages out of
the Set.
*  In the Redis implementation, they used the atomic APPEND operation
to append entries into the set, as a list but with notation to add or
remove, prefixed with + or -. e.g.
 +itemA
 +itemB
 -itemA
 +itemC
 = yields { itemB, itemC }
* Obviously Riak doesn't have atomic appends, however setting a bucket
to allow_mult = true, and always Put-ing with null vector clock, then
the Siblings feature should be equivalent. Instead of a value being
appended to, the Siblings would be populated with +itemA, +itemB,
-itemA, +itemC, etc.
* Any process reading the Set is also be responsible for collating and
Put-ing the condensed set back, after it reaches some size limit or I
think they called it 'fragmentation'.  This condense and re-Put
operation would be the tricky part, and would need to use vector clock
and ensure there are 0 siblings when finished. But it should be
possible? It seems like this is an uber-simplified form of a CRDT data
structure?

Thanks,
Alex













On Thu, Sep 12, 2013 at 2:12 AM, Guido Medina guido.med...@temetra.com wrote:
 Alex,

 RabbitMQ which is a good high performer, developed in Erlang and scales just
 as Riak.

 The old saying, the right tool for the right job, I like how fast Riak is
 fetching/storing key values on a distributed environment, I don't like Riak
 for queues, is it because it wasn't designed for that? CRDT like CAS
 operations on structures (to mention some for Java) like LinkedBlockingQueue
 and ConcurrentLinkedQueue should be painful enough to develop in a CRDT
 environment, cause now you would be talking of guarding states of few within
 the same structure; say AtomicReference of several variables to warranty a
 non-blocking CAS operation, in this case heap and tail of the queue (Not so
 true for LinkedBlockingQueue)

 And performance of several CRDT for high performance queue I don't actually
 think is going to be good.

 If I were to use Riak for my big data environment and would like to match
 Riak on a distributed queueing system then I would use like I said,
 RabbitMQ.

 Guido.


 On 12/09/13 00:29, Alex Rice wrote:

 Hi I'm very new to Riak. The allow_mult = true /  Siblings feature is
 very interesting. Could it be used to implement a high performance
 collection like a Queue or Set, in a lock free manner? The Riak docs
 make it sound like allow_mult is mainly for confict resolution and
 degenerate cases, rather than a feature which a data structure could
 be designed around.  Does anyone have any links to share, or thoughts
 about this broader issue?
 Thanks much!
 Alex

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: use Siblings to implement a message queue?

2013-09-12 Thread Jeremiah Peschka
---
Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop


On Thu, Sep 12, 2013 at 9:45 AM, Alex Rice a...@mindlube.com wrote:

 Hey all, thanks for the feedback, this is interesting!

 re: Those CRDT white papers look pretty complicated, I will definitely
 will leave that to you hardcore mathematicians and computer scientists
 :)
 re: Riak 2.0, I see there are git branches for 2.0 related stuff in
 Riak and in corrugatediron, I am guessing it's still too rough around
 the edges for a newbie like me to be messing around with?


I can't speak for Riak 2.0, but CorrugatedIron's support doesn't have any
tests supporting it... I hope to get on that next week to see where things
are. The only hold up is time on my end. I believe the riak branch should
be workable for development. There are features in CI that need clean up,
but I'll start pushing out RC builds through nuget for the adventurous.


 re: RabbitMQ I had heard of it but am hesitant to add another
 daemon/system dependency. But I am  definitely considering your
 advice, Guido!

 Also for this particular project (a multiplayer game) I may be able to
 get away with Sets, don't actually really need proper Queues. Actually
 I'm not sure. Just thinking out loud here.

 For a Set implementation in Riak, I am thinking of modeling it after
 this blog post, I cannot find which at the moment, which described how
 to implement a lock-free Set in Redis. Redis is not distributed, but I
 think the data structure is still relevant:

 * To get around the eventual-consistency issue, a prerequisite would
 be to always use R=1 and W=all. This will optimize it for reads
 because there will be a process polling Riak to remove messages out of
 the Set.
 *  In the Redis implementation, they used the atomic APPEND operation
 to append entries into the set, as a list but with notation to add or
 remove, prefixed with + or -. e.g.
  +itemA
  +itemB
  -itemA
  +itemC
  = yields { itemB, itemC }
 * Obviously Riak doesn't have atomic appends, however setting a bucket
 to allow_mult = true, and always Put-ing with null vector clock, then
 the Siblings feature should be equivalent. Instead of a value being
 appended to, the Siblings would be populated with +itemA, +itemB,
 -itemA, +itemC, etc.


From an API perspective, you'll do something similar in Riak 2.0. The
CorrugatedIron operation would allow to atomically put individual items
into a set. I'm hoping to create change tracking data structures for use
with the get/put/delete functionality of Riak's server side CRDTs.


 * Any process reading the Set is also be responsible for collating and
 Put-ing the condensed set back, after it reaches some size limit or I
 think they called it 'fragmentation'.  This condense and re-Put
 operation would be the tricky part, and would need to use vector clock
 and ensure there are 0 siblings when finished. But it should be
 possible? It seems like this is an uber-simplified form of a CRDT data
 structure?


If you're implementing the CRDT on the client side, you'll need to figure
out the garbage collection yourself. If you're able to use Riak's data
structures, then you can safely rely on someone else handling the problems
for you.

If it weren't for my day job, I'd have this CRDT support in CI by now ;)



 Thanks,
 Alex













 On Thu, Sep 12, 2013 at 2:12 AM, Guido Medina guido.med...@temetra.com
 wrote:
  Alex,
 
  RabbitMQ which is a good high performer, developed in Erlang and scales
 just
  as Riak.
 
  The old saying, the right tool for the right job, I like how fast Riak is
  fetching/storing key values on a distributed environment, I don't like
 Riak
  for queues, is it because it wasn't designed for that? CRDT like CAS
  operations on structures (to mention some for Java) like
 LinkedBlockingQueue
  and ConcurrentLinkedQueue should be painful enough to develop in a CRDT
  environment, cause now you would be talking of guarding states of few
 within
  the same structure; say AtomicReference of several variables to warranty
 a
  non-blocking CAS operation, in this case heap and tail of the queue (Not
 so
  true for LinkedBlockingQueue)
 
  And performance of several CRDT for high performance queue I don't
 actually
  think is going to be good.
 
  If I were to use Riak for my big data environment and would like to match
  Riak on a distributed queueing system then I would use like I said,
  RabbitMQ.
 
  Guido.
 
 
  On 12/09/13 00:29, Alex Rice wrote:
 
  Hi I'm very new to Riak. The allow_mult = true /  Siblings feature is
  very interesting. Could it be used to implement a high performance
  collection like a Queue or Set, in a lock free manner? The Riak docs
  make it sound like allow_mult is mainly for confict resolution and
  degenerate cases, rather than a feature which a data structure could
  be designed around.  Does anyone have any links to share, or thoughts
  about this broader issue?
  Thanks 

Re: use Siblings to implement a message queue?

2013-09-12 Thread Guido Medina

Hi Alex,

I get what you mean, I would say; if you think your project will succeed 
don't think small, RabbitMQ and Riak can be pretty easy to understand 
and manage once you play with them, don't be scare to try, high demand 
systems usually have several subsystems to get their tasks done quickly 
like:


1. Distributed data.
2. Distributed queues.
3. Text indexing/search.

With Riak+Yokozuna you cover 1 and 3 which is a pretty good combination, 
as a side note, maybe, just maybe, if you divide and conquer you may 
be able to remove the Set concept and just work with CRDT or siblings, 
let me explain briefly:


CRDT would warranty your distributed CAS operations per your designed 
key, instead of treating the keys as a Set, treat them independently and 
each CRDT key to manage its own states, that way you get the atomicity 
you want without worrying about locking or distributed concurrency in a 
collection of items since each key is unique and each would control its 
own state, or let each key have siblings and conflict resolution, as 
long as you don't create too many siblings per key in a very small 
amount of time.


The the only show stopper for CRDT is if you need to do operations in 
more than one atomically, then you will have to use siblings, even for 
siblings, your key approach would be to try to operate atomically in the 
minimum amount of states as possible so that you can avoid using Sets or 
Queues.


I hope I have expressed my thoughts well since explaining all this can 
get complicated really quick, or become a really long talk.


Hope that helps,

Guido.

On 12/09/13 17:45, Alex Rice wrote:

Hey all, thanks for the feedback, this is interesting!

re: Those CRDT white papers look pretty complicated, I will definitely
will leave that to you hardcore mathematicians and computer scientists
:)
re: Riak 2.0, I see there are git branches for 2.0 related stuff in
Riak and in corrugatediron, I am guessing it's still too rough around
the edges for a newbie like me to be messing around with?
re: RabbitMQ I had heard of it but am hesitant to add another
daemon/system dependency. But I am  definitely considering your
advice, Guido!

Also for this particular project (a multiplayer game) I may be able to
get away with Sets, don't actually really need proper Queues. Actually
I'm not sure. Just thinking out loud here.

For a Set implementation in Riak, I am thinking of modeling it after
this blog post, I cannot find which at the moment, which described how
to implement a lock-free Set in Redis. Redis is not distributed, but I
think the data structure is still relevant:

* To get around the eventual-consistency issue, a prerequisite would
be to always use R=1 and W=all. This will optimize it for reads
because there will be a process polling Riak to remove messages out of
the Set.
*  In the Redis implementation, they used the atomic APPEND operation
to append entries into the set, as a list but with notation to add or
remove, prefixed with + or -. e.g.
  +itemA
  +itemB
  -itemA
  +itemC
  = yields { itemB, itemC }
* Obviously Riak doesn't have atomic appends, however setting a bucket
to allow_mult = true, and always Put-ing with null vector clock, then
the Siblings feature should be equivalent. Instead of a value being
appended to, the Siblings would be populated with +itemA, +itemB,
-itemA, +itemC, etc.
* Any process reading the Set is also be responsible for collating and
Put-ing the condensed set back, after it reaches some size limit or I
think they called it 'fragmentation'.  This condense and re-Put
operation would be the tricky part, and would need to use vector clock
and ensure there are 0 siblings when finished. But it should be
possible? It seems like this is an uber-simplified form of a CRDT data
structure?

Thanks,
Alex













On Thu, Sep 12, 2013 at 2:12 AM, Guido Medina guido.med...@temetra.com wrote:

Alex,

RabbitMQ which is a good high performer, developed in Erlang and scales just
as Riak.

The old saying, the right tool for the right job, I like how fast Riak is
fetching/storing key values on a distributed environment, I don't like Riak
for queues, is it because it wasn't designed for that? CRDT like CAS
operations on structures (to mention some for Java) like LinkedBlockingQueue
and ConcurrentLinkedQueue should be painful enough to develop in a CRDT
environment, cause now you would be talking of guarding states of few within
the same structure; say AtomicReference of several variables to warranty a
non-blocking CAS operation, in this case heap and tail of the queue (Not so
true for LinkedBlockingQueue)

And performance of several CRDT for high performance queue I don't actually
think is going to be good.

If I were to use Riak for my big data environment and would like to match
Riak on a distributed queueing system then I would use like I said,
RabbitMQ.

Guido.


On 12/09/13 00:29, Alex Rice wrote:

Hi I'm very new to Riak. The allow_mult = true /  Siblings feature is
very 

use Siblings to implement a message queue?

2013-09-11 Thread Alex Rice
Hi I'm very new to Riak. The allow_mult = true /  Siblings feature is
very interesting. Could it be used to implement a high performance
collection like a Queue or Set, in a lock free manner? The Riak docs
make it sound like allow_mult is mainly for confict resolution and
degenerate cases, rather than a feature which a data structure could
be designed around.  Does anyone have any links to share, or thoughts
about this broader issue?
Thanks much!
Alex

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: use Siblings to implement a message queue?

2013-09-11 Thread Jeremiah Peschka
Siblings don't have to be limited to terrible scenarios - they're a feature
that makes sense for many workloads using an eventually consistent,
distributed database like Riak.

Siblings are being used under the covers in Riak 2.0 to implement sets,
maps, and counters. Riak 1.4.x implements counters using siblings, too.

There's a whole set of data types called CRDTs that can be used to
implement rich data types in a system like Riak. There's work already in
place to do this with some of the existing clients and I started a project
to implement CRDTs through CorrugatedIron:
https://github.com/peschkaj/MoarDT/

The only tricky bit is that you can never be quite sure if you've read
everything. Oh, and garbage collecting the CRDT can be tricky if you're
using a client generated CRDT - you can end up with a large number of
siblings if you aren't careful.


---
Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop


On Wed, Sep 11, 2013 at 4:29 PM, Alex Rice a...@mindlube.com wrote:

 Hi I'm very new to Riak. The allow_mult = true /  Siblings feature is
 very interesting. Could it be used to implement a high performance
 collection like a Queue or Set, in a lock free manner? The Riak docs
 make it sound like allow_mult is mainly for confict resolution and
 degenerate cases, rather than a feature which a data structure could
 be designed around.  Does anyone have any links to share, or thoughts
 about this broader issue?
 Thanks much!
 Alex

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: use Siblings to implement a message queue?

2013-09-11 Thread Tom Santero
To add to what Jeremiah said about CRDT support in the upcoming release of
Riak (2.0) see this GitHub issue: https://github.com/basho/riak/issues/354


On Wed, Sep 11, 2013 at 7:42 PM, Jeremiah Peschka 
jeremiah.pesc...@gmail.com wrote:

 Siblings don't have to be limited to terrible scenarios - they're a
 feature that makes sense for many workloads using an eventually consistent,
 distributed database like Riak.

 Siblings are being used under the covers in Riak 2.0 to implement sets,
 maps, and counters. Riak 1.4.x implements counters using siblings, too.

 There's a whole set of data types called CRDTs that can be used to
 implement rich data types in a system like Riak. There's work already in
 place to do this with some of the existing clients and I started a project
 to implement CRDTs through CorrugatedIron:
 https://github.com/peschkaj/MoarDT/

 The only tricky bit is that you can never be quite sure if you've read
 everything. Oh, and garbage collecting the CRDT can be tricky if you're
 using a client generated CRDT - you can end up with a large number of
 siblings if you aren't careful.


 ---
 Jeremiah Peschka - Founder, Brent Ozar Unlimited
 MCITP: SQL Server 2008, MVP
 Cloudera Certified Developer for Apache Hadoop


 On Wed, Sep 11, 2013 at 4:29 PM, Alex Rice a...@mindlube.com wrote:

 Hi I'm very new to Riak. The allow_mult = true /  Siblings feature is
 very interesting. Could it be used to implement a high performance
 collection like a Queue or Set, in a lock free manner? The Riak docs
 make it sound like allow_mult is mainly for confict resolution and
 degenerate cases, rather than a feature which a data structure could
 be designed around.  Does anyone have any links to share, or thoughts
 about this broader issue?
 Thanks much!
 Alex

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: use Siblings to implement a message queue?

2013-09-11 Thread Dmitry Demeshchuk
That doesn't seem bad at all. I think there even were some attempts to make
a Riak backend for RabbitMQ.

I'd say, keep messages as separate objects, and keep their IDs ordered
inside a sibling-based structure, built on top of statebox[1] or
riak_dt[2]. The downsides are: two writes and reads every time, potential
data leaks if you don't make sure the consumed message is deleted. But
storing an entire queue in a single object sounds like a dangerous idea –
one might grow too much.

Also, keep in mind that your queues will be eventually consistent.


Or, as Jeremiah says, you can really wait till Riak 2.0, it might have most
of the implementation problems solved, you'll just need to write a bit of
code.


[1] https://github.com/mochi/statebox
[2] https://github.com/basho/riak_dt


On Wed, Sep 11, 2013 at 4:29 PM, Alex Rice a...@mindlube.com wrote:

 Hi I'm very new to Riak. The allow_mult = true /  Siblings feature is
 very interesting. Could it be used to implement a high performance
 collection like a Queue or Set, in a lock free manner? The Riak docs
 make it sound like allow_mult is mainly for confict resolution and
 degenerate cases, rather than a feature which a data structure could
 be designed around.  Does anyone have any links to share, or thoughts
 about this broader issue?
 Thanks much!
 Alex

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




-- 
Best regards,
Dmitry Demeshchuk
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com