Re: use Siblings to implement a message queue?

2013-09-12 Thread Guido Medina

Hi Alex,

I get what you mean, I would say; if you think your project will succeed 
don't think small, RabbitMQ and Riak can be pretty easy to understand 
and manage once you play with them, don't be scare to try, high demand 
systems usually have several subsystems to get their tasks done quickly 
like:


1. Distributed data.
2. Distributed queues.
3. Text indexing/search.

With Riak+Yokozuna you cover 1 and 3 which is a pretty good combination, 
as a side note, maybe, just maybe, if you "divide and conquer" you may 
be able to remove the "Set" concept and just work with CRDT or siblings, 
let me explain briefly:


CRDT would warranty your distributed CAS operations per your designed 
key, instead of treating the keys as a Set, treat them independently and 
each CRDT key to manage its own states, that way you get the atomicity 
you want without worrying about locking or distributed concurrency in a 
collection of items since each key is unique and each would control its 
own state, or let each key have siblings and conflict resolution, as 
long as you don't create too many siblings per key in a very small 
amount of time.


The the only show stopper for CRDT is if you need to do operations in 
more than one atomically, then you will have to use siblings, even for 
siblings, your key approach would be to try to operate atomically in the 
minimum amount of states as possible so that you can avoid using Sets or 
Queues.


I hope I have expressed my thoughts well since explaining all this can 
get complicated really quick, or become a really long talk.


Hope that helps,

Guido.

On 12/09/13 17:45, Alex Rice wrote:

Hey all, thanks for the feedback, this is interesting!

re: Those CRDT white papers look pretty complicated, I will definitely
will leave that to you hardcore mathematicians and computer scientists
:)
re: Riak 2.0, I see there are git branches for 2.0 related stuff in
Riak and in corrugatediron, I am guessing it's still too rough around
the edges for a newbie like me to be messing around with?
re: RabbitMQ I had heard of it but am hesitant to add another
daemon/system dependency. But I am  definitely considering your
advice, Guido!

Also for this particular project (a multiplayer game) I may be able to
get away with Sets, don't actually really need proper Queues. Actually
I'm not sure. Just thinking out loud here.

For a Set implementation in Riak, I am thinking of modeling it after
this blog post, I cannot find which at the moment, which described how
to implement a lock-free Set in Redis. Redis is not distributed, but I
think the data structure is still relevant:

* To get around the eventual-consistency issue, a prerequisite would
be to always use R=1 and W=all. This will optimize it for reads
because there will be a process polling Riak to remove messages out of
the Set.
*  In the Redis implementation, they used the atomic APPEND operation
to append entries into the set, as a list but with notation to add or
remove, prefixed with + or -. e.g.
  +itemA
  +itemB
  -itemA
  +itemC
  => yields { itemB, itemC }
* Obviously Riak doesn't have atomic appends, however setting a bucket
to allow_mult = true, and always Put-ing with null vector clock, then
the Siblings feature should be equivalent. Instead of a value being
appended to, the Siblings would be populated with +itemA, +itemB,
-itemA, +itemC, etc.
* Any process reading the Set is also be responsible for collating and
Put-ing the condensed set back, after it reaches some size limit or I
think they called it 'fragmentation'.  This condense and re-Put
operation would be the tricky part, and would need to use vector clock
and ensure there are 0 siblings when finished. But it should be
possible? It seems like this is an uber-simplified form of a CRDT data
structure?

Thanks,
Alex













On Thu, Sep 12, 2013 at 2:12 AM, Guido Medina  wrote:

Alex,

RabbitMQ which is a good high performer, developed in Erlang and scales just
as Riak.

The old saying, the right tool for the right job, I like how fast Riak is
fetching/storing key values on a distributed environment, I don't like Riak
for queues, is it because it wasn't designed for that? CRDT like CAS
operations on structures (to mention some for Java) like LinkedBlockingQueue
and ConcurrentLinkedQueue should be painful enough to develop in a CRDT
environment, cause now you would be talking of guarding states of few within
the same structure; say AtomicReference of several variables to warranty a
non-blocking CAS operation, in this case heap and tail of the queue (Not so
true for LinkedBlockingQueue)

And performance of several CRDT for high performance queue I don't actually
think is going to be good.

If I were to use Riak for my big data environment and would like to match
Riak on a distributed queueing system then I would use like I said,
RabbitMQ.

Guido.


On 12/09/13 00:29, Alex Rice wrote:

Hi I'm very new to Riak. The allow_mult = true /  Siblings feature is
very interesting. Could it be

Re: use Siblings to implement a message queue?

2013-09-12 Thread Jeremiah Peschka
---
Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop


On Thu, Sep 12, 2013 at 9:45 AM, Alex Rice  wrote:

> Hey all, thanks for the feedback, this is interesting!
>
> re: Those CRDT white papers look pretty complicated, I will definitely
> will leave that to you hardcore mathematicians and computer scientists
> :)
> re: Riak 2.0, I see there are git branches for 2.0 related stuff in
> Riak and in corrugatediron, I am guessing it's still too rough around
> the edges for a newbie like me to be messing around with?
>

I can't speak for Riak 2.0, but CorrugatedIron's support doesn't have any
tests supporting it... I hope to get on that next week to see where things
are. The only hold up is time on my end. I believe the riak branch should
be workable for development. There are features in CI that need clean up,
but I'll start pushing out RC builds through nuget for the adventurous.


> re: RabbitMQ I had heard of it but am hesitant to add another
> daemon/system dependency. But I am  definitely considering your
> advice, Guido!
>
> Also for this particular project (a multiplayer game) I may be able to
> get away with Sets, don't actually really need proper Queues. Actually
> I'm not sure. Just thinking out loud here.
>
> For a Set implementation in Riak, I am thinking of modeling it after
> this blog post, I cannot find which at the moment, which described how
> to implement a lock-free Set in Redis. Redis is not distributed, but I
> think the data structure is still relevant:
>
> * To get around the eventual-consistency issue, a prerequisite would
> be to always use R=1 and W=all. This will optimize it for reads
> because there will be a process polling Riak to remove messages out of
> the Set.
> *  In the Redis implementation, they used the atomic APPEND operation
> to append entries into the set, as a list but with notation to add or
> remove, prefixed with + or -. e.g.
>  +itemA
>  +itemB
>  -itemA
>  +itemC
>  => yields { itemB, itemC }
> * Obviously Riak doesn't have atomic appends, however setting a bucket
> to allow_mult = true, and always Put-ing with null vector clock, then
> the Siblings feature should be equivalent. Instead of a value being
> appended to, the Siblings would be populated with +itemA, +itemB,
> -itemA, +itemC, etc.
>

>From an API perspective, you'll do something similar in Riak 2.0. The
CorrugatedIron operation would allow to atomically put individual items
into a set. I'm hoping to create change tracking data structures for use
with the get/put/delete functionality of Riak's server side CRDTs.


> * Any process reading the Set is also be responsible for collating and
> Put-ing the condensed set back, after it reaches some size limit or I
> think they called it 'fragmentation'.  This condense and re-Put
> operation would be the tricky part, and would need to use vector clock
> and ensure there are 0 siblings when finished. But it should be
> possible? It seems like this is an uber-simplified form of a CRDT data
> structure?
>

If you're implementing the CRDT on the client side, you'll need to figure
out the garbage collection yourself. If you're able to use Riak's data
structures, then you can safely rely on someone else handling the problems
for you.

If it weren't for my day job, I'd have this CRDT support in CI by now ;)


>
> Thanks,
> Alex
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Thu, Sep 12, 2013 at 2:12 AM, Guido Medina 
> wrote:
> > Alex,
> >
> > RabbitMQ which is a good high performer, developed in Erlang and scales
> just
> > as Riak.
> >
> > The old saying, the right tool for the right job, I like how fast Riak is
> > fetching/storing key values on a distributed environment, I don't like
> Riak
> > for queues, is it because it wasn't designed for that? CRDT like CAS
> > operations on structures (to mention some for Java) like
> LinkedBlockingQueue
> > and ConcurrentLinkedQueue should be painful enough to develop in a CRDT
> > environment, cause now you would be talking of guarding states of few
> within
> > the same structure; say AtomicReference of several variables to warranty
> a
> > non-blocking CAS operation, in this case heap and tail of the queue (Not
> so
> > true for LinkedBlockingQueue)
> >
> > And performance of several CRDT for high performance queue I don't
> actually
> > think is going to be good.
> >
> > If I were to use Riak for my big data environment and would like to match
> > Riak on a distributed queueing system then I would use like I said,
> > RabbitMQ.
> >
> > Guido.
> >
> >
> > On 12/09/13 00:29, Alex Rice wrote:
> >>
> >> Hi I'm very new to Riak. The allow_mult = true /  Siblings feature is
> >> very interesting. Could it be used to implement a high performance
> >> collection like a Queue or Set, in a lock free manner? The Riak docs
> >> make it sound like allow_mult is mainly for confict resolution and
> >> degenerate cases, rather than a feature which a data structure could
> >> 

Re: use Siblings to implement a message queue?

2013-09-12 Thread Alex Rice
Hey all, thanks for the feedback, this is interesting!

re: Those CRDT white papers look pretty complicated, I will definitely
will leave that to you hardcore mathematicians and computer scientists
:)
re: Riak 2.0, I see there are git branches for 2.0 related stuff in
Riak and in corrugatediron, I am guessing it's still too rough around
the edges for a newbie like me to be messing around with?
re: RabbitMQ I had heard of it but am hesitant to add another
daemon/system dependency. But I am  definitely considering your
advice, Guido!

Also for this particular project (a multiplayer game) I may be able to
get away with Sets, don't actually really need proper Queues. Actually
I'm not sure. Just thinking out loud here.

For a Set implementation in Riak, I am thinking of modeling it after
this blog post, I cannot find which at the moment, which described how
to implement a lock-free Set in Redis. Redis is not distributed, but I
think the data structure is still relevant:

* To get around the eventual-consistency issue, a prerequisite would
be to always use R=1 and W=all. This will optimize it for reads
because there will be a process polling Riak to remove messages out of
the Set.
*  In the Redis implementation, they used the atomic APPEND operation
to append entries into the set, as a list but with notation to add or
remove, prefixed with + or -. e.g.
 +itemA
 +itemB
 -itemA
 +itemC
 => yields { itemB, itemC }
* Obviously Riak doesn't have atomic appends, however setting a bucket
to allow_mult = true, and always Put-ing with null vector clock, then
the Siblings feature should be equivalent. Instead of a value being
appended to, the Siblings would be populated with +itemA, +itemB,
-itemA, +itemC, etc.
* Any process reading the Set is also be responsible for collating and
Put-ing the condensed set back, after it reaches some size limit or I
think they called it 'fragmentation'.  This condense and re-Put
operation would be the tricky part, and would need to use vector clock
and ensure there are 0 siblings when finished. But it should be
possible? It seems like this is an uber-simplified form of a CRDT data
structure?

Thanks,
Alex













On Thu, Sep 12, 2013 at 2:12 AM, Guido Medina  wrote:
> Alex,
>
> RabbitMQ which is a good high performer, developed in Erlang and scales just
> as Riak.
>
> The old saying, the right tool for the right job, I like how fast Riak is
> fetching/storing key values on a distributed environment, I don't like Riak
> for queues, is it because it wasn't designed for that? CRDT like CAS
> operations on structures (to mention some for Java) like LinkedBlockingQueue
> and ConcurrentLinkedQueue should be painful enough to develop in a CRDT
> environment, cause now you would be talking of guarding states of few within
> the same structure; say AtomicReference of several variables to warranty a
> non-blocking CAS operation, in this case heap and tail of the queue (Not so
> true for LinkedBlockingQueue)
>
> And performance of several CRDT for high performance queue I don't actually
> think is going to be good.
>
> If I were to use Riak for my big data environment and would like to match
> Riak on a distributed queueing system then I would use like I said,
> RabbitMQ.
>
> Guido.
>
>
> On 12/09/13 00:29, Alex Rice wrote:
>>
>> Hi I'm very new to Riak. The allow_mult = true /  Siblings feature is
>> very interesting. Could it be used to implement a high performance
>> collection like a Queue or Set, in a lock free manner? The Riak docs
>> make it sound like allow_mult is mainly for confict resolution and
>> degenerate cases, rather than a feature which a data structure could
>> be designed around.  Does anyone have any links to share, or thoughts
>> about this broader issue?
>> Thanks much!
>> Alex
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: use Siblings to implement a message queue?

2013-09-12 Thread Guido Medina

Alex,

RabbitMQ which is a good high performer, developed in Erlang and scales 
just as Riak.


The old saying, the right tool for the right job, I like how fast Riak 
is fetching/storing key values on a distributed environment, I don't 
like Riak for queues, is it because it wasn't designed for that? CRDT 
like CAS operations on structures (to mention some for Java) like 
LinkedBlockingQueue and ConcurrentLinkedQueue should be painful enough 
to develop in a CRDT environment, cause now you would be talking of 
guarding states of few within the same structure; say AtomicReference of 
several variables to warranty a non-blocking CAS operation, in this case 
heap and tail of the queue (Not so true for LinkedBlockingQueue)


And performance of several CRDT for high performance queue I don't 
actually think is going to be good.


If I were to use Riak for my big data environment and would like to 
match Riak on a distributed queueing system then I would use like I 
said, RabbitMQ.


Guido.

On 12/09/13 00:29, Alex Rice wrote:

Hi I'm very new to Riak. The allow_mult = true /  Siblings feature is
very interesting. Could it be used to implement a high performance
collection like a Queue or Set, in a lock free manner? The Riak docs
make it sound like allow_mult is mainly for confict resolution and
degenerate cases, rather than a feature which a data structure could
be designed around.  Does anyone have any links to share, or thoughts
about this broader issue?
Thanks much!
Alex

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: use Siblings to implement a message queue?

2013-09-11 Thread Dmitry Demeshchuk
That doesn't seem bad at all. I think there even were some attempts to make
a Riak backend for RabbitMQ.

I'd say, keep messages as separate objects, and keep their IDs ordered
inside a sibling-based structure, built on top of statebox[1] or
riak_dt[2]. The downsides are: two writes and reads every time, potential
data leaks if you don't make sure the consumed message is deleted. But
storing an entire queue in a single object sounds like a dangerous idea –
one might grow too much.

Also, keep in mind that your queues will be eventually consistent.


Or, as Jeremiah says, you can really wait till Riak 2.0, it might have most
of the implementation problems solved, you'll just need to write a bit of
code.


[1] https://github.com/mochi/statebox
[2] https://github.com/basho/riak_dt


On Wed, Sep 11, 2013 at 4:29 PM, Alex Rice  wrote:

> Hi I'm very new to Riak. The allow_mult = true /  Siblings feature is
> very interesting. Could it be used to implement a high performance
> collection like a Queue or Set, in a lock free manner? The Riak docs
> make it sound like allow_mult is mainly for confict resolution and
> degenerate cases, rather than a feature which a data structure could
> be designed around.  Does anyone have any links to share, or thoughts
> about this broader issue?
> Thanks much!
> Alex
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>



-- 
Best regards,
Dmitry Demeshchuk
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: use Siblings to implement a message queue?

2013-09-11 Thread Tom Santero
To add to what Jeremiah said about CRDT support in the upcoming release of
Riak (2.0) see this GitHub issue: https://github.com/basho/riak/issues/354


On Wed, Sep 11, 2013 at 7:42 PM, Jeremiah Peschka <
jeremiah.pesc...@gmail.com> wrote:

> Siblings don't have to be limited to terrible scenarios - they're a
> feature that makes sense for many workloads using an eventually consistent,
> distributed database like Riak.
>
> Siblings are being used under the covers in Riak 2.0 to implement sets,
> maps, and counters. Riak 1.4.x implements counters using siblings, too.
>
> There's a whole set of data types called CRDTs that can be used to
> implement rich data types in a system like Riak. There's work already in
> place to do this with some of the existing clients and I started a project
> to implement CRDTs through CorrugatedIron:
> https://github.com/peschkaj/MoarDT/
>
> The only tricky bit is that you can never be quite sure if you've read
> everything. Oh, and garbage collecting the CRDT can be tricky if you're
> using a client generated CRDT - you can end up with a large number of
> siblings if you aren't careful.
>
>
> ---
> Jeremiah Peschka - Founder, Brent Ozar Unlimited
> MCITP: SQL Server 2008, MVP
> Cloudera Certified Developer for Apache Hadoop
>
>
> On Wed, Sep 11, 2013 at 4:29 PM, Alex Rice  wrote:
>
>> Hi I'm very new to Riak. The allow_mult = true /  Siblings feature is
>> very interesting. Could it be used to implement a high performance
>> collection like a Queue or Set, in a lock free manner? The Riak docs
>> make it sound like allow_mult is mainly for confict resolution and
>> degenerate cases, rather than a feature which a data structure could
>> be designed around.  Does anyone have any links to share, or thoughts
>> about this broader issue?
>> Thanks much!
>> Alex
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: use Siblings to implement a message queue?

2013-09-11 Thread Jeremiah Peschka
Siblings don't have to be limited to terrible scenarios - they're a feature
that makes sense for many workloads using an eventually consistent,
distributed database like Riak.

Siblings are being used under the covers in Riak 2.0 to implement sets,
maps, and counters. Riak 1.4.x implements counters using siblings, too.

There's a whole set of data types called CRDTs that can be used to
implement rich data types in a system like Riak. There's work already in
place to do this with some of the existing clients and I started a project
to implement CRDTs through CorrugatedIron:
https://github.com/peschkaj/MoarDT/

The only tricky bit is that you can never be quite sure if you've read
everything. Oh, and garbage collecting the CRDT can be tricky if you're
using a client generated CRDT - you can end up with a large number of
siblings if you aren't careful.


---
Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop


On Wed, Sep 11, 2013 at 4:29 PM, Alex Rice  wrote:

> Hi I'm very new to Riak. The allow_mult = true /  Siblings feature is
> very interesting. Could it be used to implement a high performance
> collection like a Queue or Set, in a lock free manner? The Riak docs
> make it sound like allow_mult is mainly for confict resolution and
> degenerate cases, rather than a feature which a data structure could
> be designed around.  Does anyone have any links to share, or thoughts
> about this broader issue?
> Thanks much!
> Alex
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com