Re: Asynchronous API's and monotonicity

2014-06-06 Thread Rakesh Radhakrishnan
Hi Mudit,

Thanks, Flavio, for pointing to BookKeeper. Yes, this is another option you
can explore in your free time :)


I'd like to introduce it briefly; please have a look if you are
interested.

Documentation is available at
http://zookeeper.apache.org/bookkeeper/docs/r4.2.2/

Basic terminology:

 - servers are called "bookies",
 - logs are "ledgers",
 - and each unit of a log (aka record) is a "ledger entry"

A bookie server uses the filesystem to store the ledgers and their entries
(since it uses the filesystem, message size won't be a constraint, I guess).
It also uses ZK to store the ledger metadata.

Basic operations:
1) Open a BookKeeper client.
2) Create a ledger -
BookKeeper internally generates a unique id for the ledger. Like ZooKeeper
sequential znodes, it maintains a sequence internally to generate the ids.
Upon creating a ledger, the BookKeeper client writes metadata about the
ledger to ZK.
3) Write to the ledger - The user can add entries (user data) to the ledger.
BK guarantees a single writer per ledger.
4) After writing, close the ledger.

Assume the user has created four ledgers, so the ledger ids look like L1,
L2, L3, L4. When the user creates a new ledger, the id is incremented
to L5.

Ledger metadata in ZK:
/ledgers/L1
/ledgers/L2
/ledgers/L3
/ledgers/L4
/ledgers/L5

Using these sequential ledger znodes in ZK, one can write the logic of a
distributed queue.
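
A rough sketch of that consumer-side logic, in Python and without a real
ZooKeeper connection. The L<n> names follow the example above; the helper
names (ledger_seq, queue_order) are made up for illustration. Since the child
list of a znode is not returned in any guaranteed order, the consumer sorts
by the numeric id before processing:

```python
import re


def ledger_seq(name):
    """Return the numeric id of a ledger znode name such as 'L12'."""
    match = re.fullmatch(r"L(\d+)", name)
    if match is None:
        raise ValueError(f"not a ledger znode name: {name!r}")
    return int(match.group(1))


def queue_order(children):
    """Sort child names by ledger id, oldest ledger first."""
    return sorted(children, key=ledger_seq)


# Child names as they might come back from listing /ledgers (unordered).
children = ["L10", "L2", "L5", "L1"]
print(queue_order(children))  # ['L1', 'L2', 'L5', 'L10']
```

Sorting numerically rather than as strings matters here, because "L10"
would otherwise sort before "L2".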

If you have any queries, feel free to ping us; we are happy to help :)
You can reach us at bookkeeper-...@zookeeper.apache.org or on the user mailing list.

Regards,
Rakesh


On Sat, Jun 7, 2014 at 2:32 AM, Flavio Junqueira <
fpjunque...@yahoo.com.invalid> wrote:

> You may want to have a look at BookKeeper.
>
> -Flavio
>
> On 05 Jun 2014, at 16:32, Mudit Verma  wrote:
>
> > Hi Zookeeper Users,
> >
> > Lately, I have been working on a research project where I want to use
> zookeeper as a distributed logging service.
> >
> > I want to build a queue on top of zookeeper (also provided in recipes).
> >
> > What for:
> > Intention is to insert some operations performed by different clients in
> a distributed queue, and process them lazily at some later point of time.
>  And I want some ordering between these operations.
> >
> > Setup:
> > 5 physical  zookeeper servers
> >
> > The problem is:
> > In my current setup, I am observing a latency of about 13 ms per enqueue
> operation (using synchronous create APIs with sequential flags). I want to
> significantly reduce this time. The other way could be to use asynchronous
> zookeeper calls, but I am not sure what the side effects would be. Would the
> sequence numbers still be monotonic when used with the SEQUENTIAL flag?
> >
> > For example, a client X creates a SEQUENTIAL node Z1 at time t1 using
> async create, and the same client creates another SEQUENTIAL node Z2 at time
> t2, where t2 > t1. Would the sequence number associated with Z1 be less than
> that of Z2?
> >
> > Your help is much appreciated.
> >
> > Thanks
> > Mudit
> >
>
>


Re: zookeeper watch limitation

2014-06-06 Thread Ted Dunning
It is a problem if you expect subsequent watches to go out in milliseconds.

It isn't a problem if the resulting delays are OK with you.  To me, it
sounds like it will be just fine.  If the herd effect is too much, you can
always split the version flag into many pieces and update one piece
at a time, setting off a small herd each time.  That would also allow you to
do canary testing with new configs.
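
A minimal sketch of the splitting idea, assuming each server is mapped to one
of several version-flag znodes by a stable hash of its name. The znode paths
(/currentVersion/part-NN), the choice of 20 pieces, and the function names are
hypothetical, picked only for illustration:

```python
import zlib


def flag_for(server, num_flags):
    """Map a server name to one of num_flags version-flag znodes (stable hash)."""
    shard = zlib.crc32(server.encode("utf-8")) % num_flags
    return f"/currentVersion/part-{shard:02d}"


def herd_sizes(servers, num_flags):
    """Count how many servers would watch each flag znode."""
    sizes = {}
    for server in servers:
        path = flag_for(server, num_flags)
        sizes[path] = sizes.get(path, 0) + 1
    return sizes


servers = [f"server{i}" for i in range(2000)]
sizes = herd_sizes(servers, 20)
# Each update now wakes only the watchers of one flag instead of all 2000.
print(len(sizes), max(sizes.values()))
```

Updating a single flag first and checking the servers behind it before
touching the rest is one way to get the canary behaviour mentioned above.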




On Fri, Jun 6, 2014 at 2:36 PM, Denis Samoilov  wrote:

> hi,
> I am reading the book "ZooKeeper" by Flavio Junqueira and Benjamin Reed,
> and I am now concerned whether ZooKeeper is the right tool for our scenario:
> configuration management. We have ~2000 servers that are expected to subscribe
> to a znode change notification: the current version number. When the version
> number changes, all clients will read the new value and then read the
> configuration corresponding to this value:
>
> / currentVersion "v3"
> /versions
>   /v1 {server1, server2, server3}
>   /v2 {server1, server2, server5}
>   /v3 {server0, server2, server3}
>
> The idea is that we want to update the configuration within seconds (<5s).
>
> Are 2000 watches on the same znode, followed by two rounds of 2000
> simultaneous reads (one for the version and one for the content), OK for
> ZooKeeper?
>
> According to the book:
> "...One issue to be aware of is that ZooKeeper triggers all watches set for
> a particular znode change when the change occurs. If there are 1,000
> clients that have set a watch on a given znode with a call to exists, then
> 1,000 notifications will be sent out when the znode is created. A change to
> a watched znode might consequently generate a spike of notifications.
> Such a spike could affect, for example, the latency of operations submitted
> around the time of the spike. When possible, we recommend avoiding such a
> use of ZooKeeper in which a large number of clients watch for a change to a
> given znode. It is much better to have only a few clients watching any
> given znode at a time, and ideally at most one..."
>
> 1 vs 2000 is too big a difference, and the book says that even 1000 is a
> problem. On the other hand, Zynga says that they did something similar to our solution:
>
> http://code.zynga.com/2011/08/updating-thousands-of-configuration-files-in-under-a-second/
>
> Thank you,
> Denis
>


zookeeper watch limitation

2014-06-06 Thread Denis Samoilov
hi,
I am reading the book "ZooKeeper" by Flavio Junqueira and Benjamin Reed,
and I am now concerned whether ZooKeeper is the right tool for our scenario:
configuration management. We have ~2000 servers that are expected to subscribe
to a znode change notification: the current version number. When the version
number changes, all clients will read the new value and then read the
configuration corresponding to this value:

/ currentVersion "v3"
/versions
  /v1 {server1, server2, server3}
  /v2 {server1, server2, server5}
  /v3 {server0, server2, server3}

The idea is that we want to update the configuration within seconds (<5s).

Are 2000 watches on the same znode, followed by two rounds of 2000
simultaneous reads (one for the version and one for the content), OK for
ZooKeeper?

According to the book:
"...One issue to be aware of is that ZooKeeper triggers all watches set for
a particular znode change when the change occurs. If there are 1,000
clients that have set a watch on a given znode with a call to exists, then
1,000 notifications will be sent out when the znode is created. A change to
a watched znode might consequently generate a spike of notifications.
Such a spike could affect, for example, the latency of operations submitted
around the time of the spike. When possible, we recommend avoiding such a
use of ZooKeeper in which a large number of clients watch for a change to a
given znode. It is much better to have only a few clients watching any
given znode at a time, and ideally at most one..."

1 vs 2000 is too big a difference, and the book says that even 1000 is a
problem. On the other hand, Zynga says that they did something similar to our solution:
http://code.zynga.com/2011/08/updating-thousands-of-configuration-files-in-under-a-second/

Thank you,
Denis


Re: Asynchronous API's and monotonicity

2014-06-06 Thread James A. Robinson
On Fri, Jun 6, 2014 at 2:19 PM, Flavio Junqueira
 wrote:
> There are ways around this. You can create a hierarchy to get around the 
> problems you're talking about, like we did with the hierarchical ledger 
> manager in bookkeeper. But granted, if all you want to do is a queue, perhaps 
> there are better options out there.

Apache Kafka, for example.  We first installed zookeeper because
we wanted to use Kafka. :)

Jim


Re: Asynchronous API's and monotonicity

2014-06-06 Thread Flavio Junqueira
There are ways around this. You can create a hierarchy to get around the 
problems you're talking about, like we did with the hierarchical ledger manager 
in bookkeeper. But granted, if all you want to do is a queue, perhaps there are 
better options out there.
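
To illustrate the hierarchy idea in general terms: instead of putting every
queue item directly under one parent znode, a zero-padded sequence number can
be split into several path levels so that no single znode has too many
children. The sketch below is an assumption for illustration only; the 2-4-4
digit split, the /queue root, and the function name are hypothetical and are
not taken from this thread:

```python
def hierarchical_path(root, seq, widths=(2, 4, 4)):
    """Split a zero-padded sequence number into nested path components."""
    total = sum(widths)
    digits = f"{seq:0{total}d}"
    if seq < 0 or len(digits) > total:
        raise ValueError(f"sequence number {seq} does not fit in {total} digits")
    parts = []
    pos = 0
    for width in widths:
        parts.append(digits[pos:pos + width])
        pos += width
    return root + "/" + "/".join(parts)


print(hierarchical_path("/queue", 1234567))  # /queue/00/0123/4567
```

With this particular split, the lowest-level parent holds at most 10,000
children, which keeps each child listing small.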

-Flavio

On 06 Jun 2014, at 22:11, Diego Oliveira  wrote:

> Flavio,
> 
>   Queues usually hold a lot of messages, and you may use a znode to
> keep those messages as children. When a client wants to list the messages
> under a znode, the ZooKeeper server must send back the child names, but there
> is a limit on the data transferred between the server and the client, and if
> the child names exceed this limit the queue breaks. I hit this problem in a
> production system. For more details, take a look at the Curator wiki (
> https://github.com/Netflix/curator/wiki/Tech-Note-4).
> 
> 
> 
> 
> On Fri, Jun 6, 2014 at 6:04 PM, Flavio Junqueira <
> fpjunque...@yahoo.com.invalid> wrote:
> 
>> I don't quite understand the correlation among amount of data per znode,
>> queues, and being a well known problem. You might as well be right, though.
>> 
>> -Flavio
>> 
>> On 06 Jun 2014, at 21:25, Diego Oliveira  wrote:
>> 
>>> Mudit,
>>> 
>>>   Just to let you know, ZooKeeper isn't the best choice for a queue; it
>>> has problems with the amount of data that a znode can handle. It is a very
>>> well-known problem.
>>> 
>>> Att,
>>>Diego
>>> 
>>> 
>>> On Fri, Jun 6, 2014 at 5:12 AM, Mudit Verma 
>>> wrote:
>>> 
>>>> Thanks James and Rakesh. It helps :)
>>>>
>>>> On 05 Jun 2014, at 21:07, James A. Robinson  wrote:
>>>>
>>>>> On Thu, Jun 5, 2014 at 9:45 AM, Rakesh Radhakrishnan <
>>>> rakeshr.apa...@gmail.com> wrote:
>>>>> But this behaviour may not be same if we perform operations through
>>>>> different clients. Here network delays or other factors may cause
>>>> different
>>>>> clients to see a change.
>>>>>
>>>>> I'm assuming the other important factor is to ensure that he's
>>>>> either got a single control loop dispatching the async calls to
>>>>> his zookeeper connection or that he's coordinating the threads
>>>>> himself to impose ordering.
>>>>>
>>>>> Otherwise, if one has threads x1 and x2 running in parallel,
>>>>> he'd have no guarantee which thread dispatched its async
>>>>> call to zookeeper first.
>>>>>
>>>>> Jim
>>>>>
>>>>
>>>>
>>> 
>>> 
>>> --
>>> Att.
>>> Diego de Oliveira
>>> System Architect
>>> di...@diegooliveira.com
>>> www.diegooliveira.com
>>> Never argue with a fool -- people might not be able to tell the
>> difference
>> 
>> 
> 
> 
> -- 
> Att.
> Diego de Oliveira
> System Architect
> di...@diegooliveira.com
> www.diegooliveira.com
> Never argue with a fool -- people might not be able to tell the difference



Re: Asynchronous API's and monotonicity

2014-06-06 Thread Diego Oliveira
Flavio,

   Queues usually hold a lot of messages, and you may use a znode to
keep those messages as children. When a client wants to list the messages
under a znode, the ZooKeeper server must send back the child names, but there
is a limit on the data transferred between the server and the client, and if
the child names exceed this limit the queue breaks. I hit this problem in a
production system. For more details, take a look at the Curator wiki (
https://github.com/Netflix/curator/wiki/Tech-Note-4).
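
A back-of-the-envelope sketch of when a child listing hits that limit. It
assumes the documented default of jute.maxbuffer (0xfffff bytes, just under
1 MB) and approximates the encoded list as a 4-byte count followed by each
name as a 4-byte length plus its UTF-8 bytes; response headers are ignored,
so treat the result as an estimate. The name "qn-0000000001" is a
hypothetical sequential child name:

```python
JUTE_MAXBUFFER_DEFAULT = 0xFFFFF  # 1,048,575 bytes


def children_response_bytes(names):
    """Approximate encoded size of a child-name list (assumed encoding)."""
    return 4 + sum(4 + len(name.encode("utf-8")) for name in names)


def max_children(name_len, limit=JUTE_MAXBUFFER_DEFAULT):
    """Estimate how many children with names of name_len bytes fit under limit."""
    return (limit - 4) // (4 + name_len)


name = "qn-0000000001"  # prefix plus a 10-digit sequence suffix
print(max_children(len(name)))
```

With 13-character names the estimate lands in the tens of thousands of
children, which a busy queue with a slow consumer can plausibly reach.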




On Fri, Jun 6, 2014 at 6:04 PM, Flavio Junqueira <
fpjunque...@yahoo.com.invalid> wrote:

> I don't quite understand the correlation among amount of data per znode,
> queues, and being a well known problem. You might as well be right, though.
>
> -Flavio
>
> On 06 Jun 2014, at 21:25, Diego Oliveira  wrote:
>
> > Mudit,
> >
> > Just to let you know, ZooKeeper isn't the best choice for a queue; it
> > has problems with the amount of data that a znode can handle. It is a very
> > well-known problem.
> >
> > Att,
> > Diego
> >
> >
> > On Fri, Jun 6, 2014 at 5:12 AM, Mudit Verma 
> > wrote:
> >
> >> Thanks James and Rakesh. It helps :)
> >>
> >> On 05 Jun 2014, at 21:07, James A. Robinson  wrote:
> >>
> >>> On Thu, Jun 5, 2014 at 9:45 AM, Rakesh Radhakrishnan <
> >> rakeshr.apa...@gmail.com> wrote:
> >>> But this behaviour may not be same if we perform operations through
> >>> different clients. Here network delays or other factors may cause
> >> different
> >>> clients to see a change.
> >>>
> >>> I'm assuming the other important factor is to ensure that he's
> >>> either got a single control loop dispatching the async calls to
> >>> his zookeeper connection or that he's coordinating the threads
> >>> himself to impose ordering.
> >>>
> >>> Otherwise, if one has threads x1 and x2 running in parallel,
> >>> he'd have no guarantee which thread dispatched its async
> >>> call to zookeeper first.
> >>>
> >>> Jim
> >>>
> >>
> >>
> >
> >
> > --
> > Att.
> > Diego de Oliveira
> > System Architect
> > di...@diegooliveira.com
> > www.diegooliveira.com
> > Never argue with a fool -- people might not be able to tell the
> difference
>
>


-- 
Att.
Diego de Oliveira
System Architect
di...@diegooliveira.com
www.diegooliveira.com
Never argue with a fool -- people might not be able to tell the difference


Re: Asynchronous API's and monotonicity

2014-06-06 Thread Flavio Junqueira
I don't quite understand the correlation among amount of data per znode, 
queues, and being a well known problem. You might as well be right, though.

-Flavio

On 06 Jun 2014, at 21:25, Diego Oliveira  wrote:

> Mudit,
> 
> Just to let you know, ZooKeeper isn't the best choice for a queue; it has
> problems with the amount of data that a znode can handle. It is a very
> well-known problem.
> 
> Att,
> Diego
> 
> 
> On Fri, Jun 6, 2014 at 5:12 AM, Mudit Verma 
> wrote:
> 
>> Thanks James and Rakesh. It helps :)
>> 
>> On 05 Jun 2014, at 21:07, James A. Robinson  wrote:
>> 
>>> On Thu, Jun 5, 2014 at 9:45 AM, Rakesh Radhakrishnan <
>> rakeshr.apa...@gmail.com> wrote:
>>> But this behaviour may not be same if we perform operations through
>>> different clients. Here network delays or other factors may cause
>> different
>>> clients to see a change.
>>> 
>>> I'm assuming the other important factor is to ensure that he's
>>> either got a single control loop dispatching the async calls to
>>> his zookeeper connection or that he's coordinating the threads
>>> himself to impose ordering.
>>> 
>>> Otherwise, if one has threads x1 and x2 running in parallel,
>>> he'd have no guarantee which thread dispatched its async
>>> call to zookeeper first.
>>> 
>>> Jim
>>> 
>> 
>> 
> 
> 
> -- 
> Att.
> Diego de Oliveira
> System Architect
> di...@diegooliveira.com
> www.diegooliveira.com
> Never argue with a fool -- people might not be able to tell the difference



Re: Asynchronous API's and monotonicity

2014-06-06 Thread Flavio Junqueira
You may want to have a look at BookKeeper.

-Flavio

On 05 Jun 2014, at 16:32, Mudit Verma  wrote:

> Hi Zookeeper Users, 
> 
> Lately, I have been working on a research project where I want to use 
> zookeeper as a distributed logging service. 
> 
> I want to build a queue on top of zookeeper (also provided in recipes). 
> 
> What for: 
> Intention is to insert some operations performed by different clients in a 
> distributed queue, and process them lazily at some later point of time.  And 
> I want some ordering between these operations.
> 
> Setup: 
> 5 physical  zookeeper servers
> 
> The problem is: 
> In my current setup, I am observing a latency of about 13 ms per enqueue 
> operation (using synchronous create APIs with sequential flags). I want to 
> significantly reduce this time. The other way could be to use asynchronous 
> zookeeper calls, but I am not sure what the side effects would be. Would the 
> sequence numbers still be monotonic when used with the SEQUENTIAL flag?  
> 
> For example, a client X creates a SEQUENTIAL node Z1 at time t1 using async 
> create, and the same client creates another SEQUENTIAL node Z2 at time t2, 
> where t2 > t1. Would the sequence number associated with Z1 be less than that of Z2? 
> 
> Your help is much appreciated. 
> 
> Thanks
> Mudit
> 



Re: Asynchronous API's and monotonicity

2014-06-06 Thread Diego Oliveira
Mudit,

Just to let you know, ZooKeeper isn't the best choice for a queue; it has
problems with the amount of data that a znode can handle. It is a very
well-known problem.

Att,
 Diego


On Fri, Jun 6, 2014 at 5:12 AM, Mudit Verma 
wrote:

> Thanks James and Rakesh. It helps :)
>
> On 05 Jun 2014, at 21:07, James A. Robinson  wrote:
>
> > On Thu, Jun 5, 2014 at 9:45 AM, Rakesh Radhakrishnan <
> rakeshr.apa...@gmail.com> wrote:
> > But this behaviour may not be same if we perform operations through
> > different clients. Here network delays or other factors may cause
> different
> > clients to see a change.
> >
> > I'm assuming the other important factor is to ensure that he's
> > either got a single control loop dispatching the async calls to
> > his zookeeper connection or that he's coordinating the threads
> > himself to impose ordering.
> >
> > Otherwise, if one has threads x1 and x2 running in parallel,
> > he'd have no guarantee which thread dispatched its async
> > call to zookeeper first.
> >
> > Jim
> >
>
>


-- 
Att.
Diego de Oliveira
System Architect
di...@diegooliveira.com
www.diegooliveira.com
Never argue with a fool -- people might not be able to tell the difference


Re: Asynchronous API's and monotonicity

2014-06-06 Thread Mudit Verma
Thanks James and Rakesh. It helps :)

On 05 Jun 2014, at 21:07, James A. Robinson  wrote:

> On Thu, Jun 5, 2014 at 9:45 AM, Rakesh Radhakrishnan 
>  wrote:
> But this behaviour may not be same if we perform operations through
> different clients. Here network delays or other factors may cause different
> clients to see a change.
> 
> I'm assuming the other important factor is to ensure that he's
> either got a single control loop dispatching the async calls to
> his zookeeper connection or that he's coordinating the threads
> himself to impose ordering.
> 
> Otherwise, if one has threads x1 and x2 running in parallel,
> he'd have no guarantee which thread dispatched its async
> call to zookeeper first.
> 
> Jim
>
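
A toy simulation of the single-dispatcher point Jim makes above. It does not
use the ZooKeeper client at all: FakeSequencer is a made-up stand-in that
hands out increasing numbers in the order requests arrive on one FIFO
channel, which plays the role of a single client connection. With one loop
issuing the calls, the numbers follow submission order; with several
unsynchronised threads calling create_async, the arrival order (and so the
numbering) would depend on thread scheduling.

```python
import queue
import threading


class FakeSequencer:
    """Stand-in for a server that numbers requests in arrival order."""

    def __init__(self):
        self._next = 0
        self._requests = queue.Queue()  # FIFO, models one client connection
        self.results = {}
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def create_async(self, label):
        """Queue a request and return immediately, like an async call."""
        self._requests.put(label)

    def _run(self):
        while True:
            label = self._requests.get()
            if label is None:  # shutdown marker
                break
            self.results[label] = self._next
            self._next += 1

    def close(self):
        """Drain outstanding requests and stop the worker."""
        self._requests.put(None)
        self._worker.join()


seq = FakeSequencer()
for label in ["Z1", "Z2", "Z3"]:  # a single control loop dispatching the calls
    seq.create_async(label)
seq.close()
print(seq.results)
```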