Re: Inconsistent data across 3.4.6 ensemble

2015-04-22 Thread jlindwall
I do not know if we were hit by that bug. We do have autopurge enabled
on a 1 hour cycle. How can we tell if that bug caused our issue?



--
View this message in context: 
http://zookeeper-user.578899.n2.nabble.com/Inconsistent-data-across-3-4-6-ensemble-tp7581007p7581013.html
Sent from the zookeeper-user mailing list archive at Nabble.com.


Re: Inconsistent data across 3.4.6 ensemble

2015-04-22 Thread jlindwall
Yes, the log and snaps are in separate dirs. I am afraid I do not have easy
access to the logs right now. When I do I'll see if I can post them.

We restored sanity to the ensemble following the procedure you suggested. It
looks OK now, but this event has made me paranoid, so we'll be keeping a
close eye on it. I guess zk has been so robust up to now that I took that
for granted. I assume the consensus would be that this experience is a
rarity?

Thanks so much for the help!





Re: Zookeeper port 3181

2015-04-22 Thread Rakesh Radhakrishnan
Hi Eyal,

ZooKeeper and BookKeeper are two different services.

Are you trying to set up only a ZooKeeper cluster? Can you show the
configuration and the command you use to start the servers?
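
To narrow down which hosts actually have a listener on which port, a small probe can help. The following is only a sketch: the host names are placeholders, 2181 is assumed to be the ZooKeeper client port (the conventional value, check clientPort in your configuration), and 3181 is the bookie port mentioned in the question. The class and method names are made up for this example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Checks whether something accepts TCP connections on a given host and port. */
final class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, unreachable, unresolved, or timed out: report "no listener".
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder host names (assumption): replace with the three real servers.
        String[] hosts = {"zk1.example.com", "zk2.example.com", "zk3.example.com"};
        // 2181: assumed ZooKeeper client port; 3181: bookie port from the question.
        int[] ports = {2181, 3181};
        for (String host : hosts) {
            for (int port : ports) {
                System.out.printf("%s:%d open=%b%n", host, port, isOpen(host, port, 2000));
            }
        }
    }
}
```

If 2181 answers on all three hosts and 3181 on only one, that would be consistent with a single bookie process running next to a three-node ZooKeeper ensemble, since the two services listen independently.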

Regards,
Rakesh

On Wed, Apr 22, 2015 at 1:17 PM, Eyal Bar  wrote:

> Hi,
>
> The BookKeeper service listens on port 3181, as I read in the
> documentation (
> http://zookeeper.apache.org/doc/r3.3.6/bookkeeperStarted.html).
> The question is why this port is open on only 1 of the 3 installed
> ZooKeeper servers. Is there always exactly one live BookKeeper among the 3
> ZooKeeper servers?
>
> CDH5 = Cloudera Hadoop version 5 not Cassandra.
>
> Best,
> On Apr 21, 2015 5:57 PM, "Flavio Junqueira" wrote:
>
> > I'm confused, you refer to bookie, is it about bookkeeper?
> >
> > Also, you may want to ask Cloudera folks directly about their
> > distribution, you're likely to get a better answer.
> >
> > -Flavio
> >
> > -----Original Message-----
> > From: "Eyal Bar"
> > Sent: 4/21/2015 2:27 PM
> > To: "user@zookeeper.apache.org"
> > Subject: Zookeeper port 3181
> >
> > Hi,
> >
> > I have CDH5 installed with an HA configuration, and part of this
> > installation is 3 ZooKeeper servers.
> >
> > I have noticed that port 3181, which the bookie uses to listen for
> > connection requests from clients, is open on only 1 of the 3 installed
> > ZooKeeper servers.
> >
> > Do any of you know why port 3181 isn't open on *all* 3 ZooKeeper servers?
> >
> > Thanks,
> >
> > --
> > *[ Eyal Bar ]*
> > MySQL and Cassandra Database Administrator - Infrastructure Team  //
> > *Kenshoo*
> > *Office* +972 (3) 746-6500 x473 // *Mobile* +972 (52) 458-6100
> > *eyal@kenshoo.com *
> > *www.Kenshoo.com* 
> >
> > --
> > This e-mail, as well as any attached document, may contain material which
> > is confidential and privileged and may include trademark, copyright and
> > other intellectual property rights that are proprietary to Kenshoo Ltd,
> >  its subsidiaries or affiliates ("Kenshoo"). This e-mail and its
> > attachments may be read, copied and used only by the addressee for the
> > purpose(s) for which it was disclosed herein. If you have received it in
> > error, please destroy the message and any attachment, and contact us
> > immediately. If you are not the intended recipient, be aware that any
> > review, reliance, disclosure, copying, distribution or use of the contents
> > of this message without Kenshoo's express permission is strictly
> > prohibited.
> >


Re: Inconsistent data across 3.4.6 ensemble

2015-04-22 Thread Asad Saeed
John,

Is it possible you are hitting 
https://issues.apache.org/jira/browse/ZOOKEEPER-1797 ?

Asad

From: jlindwall 
Sent: Apr 22, 2015 6:30 PM
To: zookeeper-u...@hadoop.apache.org
Subject: Re: Inconsistent data across 3.4.6 ensemble

One more piece of data: The inconsistencies involve both ephemeral and
persistent znodes.






Re: Inconsistent data across 3.4.6 ensemble

2015-04-22 Thread Asad Saeed
John,

Can you give more detail on how the data is inconsistent, and post the logs
somewhere? Are the log and data directories on different mountpoints?

To recover immediately, you should stop zookeeper on the divergent nodes.
Back up and then delete the log and snap directories on those nodes, and then
restart zookeeper on those nodes.
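
The back-up-then-delete step can be sketched as below. This is an illustration under assumptions, not a vetted procedure: it assumes the server is already stopped, that snapshots and transaction logs live in the version-2 subdirectory of dataDir and dataLogDir (the usual ZooKeeper layout), and that the backup target does not exist yet and is outside the directory being cleared. The class and method names are made up. It is pointed at version-2 rather than dataDir itself so that the myid file in dataDir is left alone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Backs up a directory tree, then empties it. Run only while the server is stopped. */
final class DataDirReset {

    /** Copies everything under dir into backupDir, then deletes the contents of dir (dir itself is kept). */
    static void backupAndClear(Path dir, Path backupDir) throws IOException {
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Pre-order walk: dir first, and each directory before its children.
            entries = walk.collect(Collectors.toList());
        }
        for (Path source : entries) {
            Path target = backupDir.resolve(dir.relativize(source));
            if (Files.isDirectory(source)) {
                Files.createDirectories(target);
            } else {
                Files.createDirectories(target.getParent());
                Files.copy(source, target); // fails if the target exists, so old backups are never overwritten
            }
        }
        // Delete in reverse walk order so children go before their parents; index 0 is dir itself.
        for (int i = entries.size() - 1; i >= 1; i--) {
            Files.delete(entries.get(i));
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java DataDirReset /path/to/dataDir/version-2 /path/to/backup
        backupAndClear(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

When the node restarts with an empty version-2 directory it should pull a fresh copy of the data from the current leader, which is why this is only safe on the divergent nodes and while a healthy quorum remains.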

Asad

From: jlindwall 
Sent: Apr 22, 2015 6:25 PM
To: zookeeper-u...@hadoop.apache.org
Subject: Inconsistent data across 3.4.6 ensemble


We somehow are seeing inconsistent data across our 3-node prod ensemble.
Never saw anything like it in dev or qa. We are running on Solaris.

The dataDirs for the nodes were recently involved in a situation in which
the nfs disk they live on was dismounted and remounted, while zk was
running. Not sure if it is related.

Regardless, this seems like it should never happen with zookeeper.

Any ideas for correcting the situation?  I have 2 ideas, please critique:

1. Bring down follower 1, delete its dataLogDir and dataDir contents,
restart; do same with follower 2
2. Bring down the whole thing; delete all dataLogDir and dataDir contents;
restart

I'd prefer not to do option #2, but I will if I must.

Thanks,
John






Re: Inconsistent data across 3.4.6 ensemble

2015-04-22 Thread jlindwall
One more piece of data: The inconsistencies involve both ephemeral and
persistent znodes.






Inconsistent data across 3.4.6 ensemble

2015-04-22 Thread jlindwall
We somehow are seeing inconsistent data across our 3-node prod ensemble.
Never saw anything like it in dev or qa. We are running on Solaris.

The dataDirs for the nodes were recently involved in a situation in which
the nfs disk they live on was dismounted and remounted, while zk was
running. Not sure if it is related.

Regardless, this seems like it should never happen with zookeeper.

Any ideas for correcting the situation?  I have 2 ideas, please critique:

1. Bring down follower 1, delete its dataLogDir and dataDir contents,
restart; do same with follower 2
2. Bring down the whole thing; delete all dataLogDir and dataDir contents;
restart

I'd prefer not to do option #2, but I will if I must.

Thanks,
John






Zookeeper-based discovery provider: infinite re-connect loop after server restart

2015-04-22 Thread Yuriy Lopotun
Hi guys,



In our client-server OSGi application we are using the ECF Zookeeper-based
discovery provider for remote services discovery (based on Zookeeper
v.3.3.6).

In standalone mode the plugin opens a dedicated Zookeeper connection from
the client to each of the servers.


When testing the application's resiliency, we noticed that when we restart
the server, the connection never gets re-established. In the server logs I
found the following:

2015-04-22 18:20:53,763 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO org.apac.zook.serv.NIOServerCnxn - Accepted socket connection from /10.36.64.250:53022

2015-04-22 18:20:53,763 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] DEBUG org.apac.zook.serv.NIOServerCnxn - Session establishment request from client /10.36.64.250:53022 client's lastZxid is 0x8

2015-04-22 18:20:53,764 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO org.apac.zook.serv.NIOServerCnxn - Refusing session request for client /10.36.64.250:53022 as it has seen zxid 0x8 our last zxid is 0x7 client must try another server

2015-04-22 18:20:53,764 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO org.apac.zook.serv.NIOServerCnxn - Closed socket connection for client /10.36.64.250:53022 (no session established for client)



As far as I understand, this is expected behaviour: the restarted server
cleaned up its DB and reset its transaction id, so the client has seen a
newer zxid (0x8) than the server now has (0x7).
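
The rule behind the "Refusing session request" line can be modelled in a few lines. This is a sketch of the comparison as the log message describes it, not the server's actual source; the class and method names are made up.

```java
/** Models the session-acceptance rule described by the "Refusing session request" log line. */
final class ZxidCheck {

    /** A server accepts a client only if the client has not seen a zxid newer than the server's own. */
    static boolean acceptsSession(long clientLastZxid, long serverLastZxid) {
        return clientLastZxid <= serverLastZxid;
    }

    public static void main(String[] args) {
        // Values from the log above: the client has seen 0x8, the restarted server is at 0x7.
        System.out.println(acceptsSession(0x8L, 0x7L)); // false: refused
        // A client with no history (lastZxid 0) passes the same check.
        System.out.println(acceptsSession(0x0L, 0x7L)); // true: accepted
    }
}
```

With several servers the refused client would move on to one that is up to date; with a single server that has lost its history, the same handle fails this check on every attempt.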


The problem in this case is that the client session keeps trying to
reconnect to this one server, which causes an infinite loop:

2015-04-22 18:21:02,760 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] INFO  org.apac.zook.ClientCnxn - Opening socket connection to server ca-rd-mbernard.miranda.com/10.36.64.250:2001

2015-04-22 18:21:02,761 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] INFO  org.apac.zook.ClientCnxn - Socket connection established to ca-rd-mbernard.miranda.com/10.36.64.250:2001, initiating session

2015-04-22 18:21:02,761 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] DEBUG org.apac.zook.ClientCnxn - Session establishment request sent on ca-rd-mbernard.miranda.com/10.36.64.250:2001

2015-04-22 18:21:02,762 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] INFO  org.apac.zook.ClientCnxn - Unable to read additional data from server sessionid 0x14ce32e178c0002, likely server has closed socket, closing socket connection and attempting reconnect



Again, I think this is correct behaviour when there are several servers. But
in our case there is always exactly 1.

So I wanted to ask for a suggestion: what do you think we can do in this
case to achieve automatic reconnect?

I thought that, if there is only 1 server, maybe we could close the
connection when this happens instead of retrying? Maybe this enhancement is
already done in more recent versions and could be back-ported?
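
One way to picture the "close instead of retrying" idea is a small supervisor that gives up on a handle after a few consecutive refusals and starts over with a fresh one. The sketch below is a simulation with stand-in state, not ZooKeeper client code: the class name, the failure threshold, and the recreations counter are all made up for illustration. With the real client, the marked spot is where the old handle would be closed and a new ZooKeeper(connectString, sessionTimeout, watcher) constructed.

```java
/** Simulates retrying a refused handle and replacing it after too many consecutive failures. */
final class ReconnectSupervisor {
    private final int maxFailures;  // hypothetical threshold before the handle is abandoned
    private long handleLastZxid;    // stand-in for the last zxid the current handle has seen
    private int failures;
    int recreations;                // how many times a fresh handle was created

    ReconnectSupervisor(long handleLastZxid, int maxFailures) {
        this.handleLastZxid = handleLastZxid;
        this.maxFailures = maxFailures;
    }

    /** One connect attempt against a server at serverLastZxid; returns true if a session is established. */
    boolean attempt(long serverLastZxid) {
        if (handleLastZxid <= serverLastZxid) {
            failures = 0;
            return true;
        }
        failures++;
        if (failures >= maxFailures) {
            // Real client: close the old handle and construct a new one here.
            // A new handle has no history, so its last seen zxid starts at 0.
            handleLastZxid = 0;
            failures = 0;
            recreations++;
        }
        return false;
    }

    public static void main(String[] args) {
        ReconnectSupervisor supervisor = new ReconnectSupervisor(0x8L, 3);
        for (int i = 1; i <= 4; i++) {
            System.out.println("attempt " + i + " connected=" + supervisor.attempt(0x7L));
        }
    }
}
```

The trade-off is that a fresh handle also means a fresh session: ephemeral nodes and watches registered under the old session do not carry over, so the discovery provider would have to re-register its services after the switch.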



Thanks,

Yuriy


Re: Intermittent connection loss error

2015-04-22 Thread Harihara Vinayakaram
Are you running on a VM? VMware VMs have a nasty habit of changing the clock.
--Hari

On Tue, Apr 21, 2015 at 8:13 PM, Lahiru Ginnaliya Gamathige <
glah...@gmail.com> wrote:

> Hi Devs,
>
> We are using ZK in Apache Airavata, and after it has been running for some
> time, some connections are lost and never reconnect. I get the following
> error, and since I try to reconnect in my process method, it keeps retrying
> and floods the log. Of course I can fix the log issue, but I am not sure why
> this is happening. I am using ZK in standalone mode, just a single instance.
> Below are the log and the code I use to reconnect.
>
> 2015-04-08 09:43:10,785 [main-SendThread(gw111.iu.xsede.org:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server gw111.iu.xsede.org/149.165.228.109:2181, unexpected error, closing socket connection and attempting reconnect
> java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:66)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041)
>
>
> @Override
> synchronized public void process(WatchedEvent watchedEvent) {
>     logger.info(watchedEvent.getPath());
>     synchronized (mutex) {
>         Event.KeeperState state = watchedEvent.getState();
>         logger.info(state.name());
>         switch(state){
>             case SyncConnected:
>                 mutex.notify();
>             case Expired:case Disconnected:
>                 try {
>                     mutex = -1;
>                     zk = new ZooKeeper(AiravataZKUtils.getZKhostPort(),
>                             AiravataZKUtils.getZKTimeout(), this);
>                     synchronized (mutex) {
>                         mutex.wait();  // waiting for the syncConnected event
>                     }
>                     storeServerConfig();
>                 } catch (IOException e) {
>                     logger.error("Error while synchronizing with zookeeper", e);
>                 } catch (ApplicationSettingsException e) {
>                     logger.error("Error while synchronizing with zookeeper", e);
>                 } catch (InterruptedException e) {
>                     logger.error("Error while synchronizing with zookeeper", e);
>                 } catch (AiravataSystemException e) {
>                     logger.error("Error while synchronizing with zookeeper", e);
>                 }
>         }
>     }
> }
>
>
> Lahiru
>
>
> --
> Research Assistant
> Science Gateways Group
> Indiana University
>
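
Independent of the clock question, the quoted process() method has a few things worth a second look. There is no break after the SyncConnected case, so a successful connect falls through and builds another handle. Disconnected is handled like Expired, although the client library reconnects by itself after a disconnect. The method also waits inside the watcher callback, where blocking holds up event delivery. The sketch below shows one possible shape for the state handling. It uses stand-in types so it runs without the ZooKeeper jar: the State enum, the recreateHandle callback, and the class name are assumptions made for this example, not Airavata or ZooKeeper API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of non-blocking session-state handling with stand-in types. */
final class SessionStateHandler {

    /** Stand-in for the three connection states the quoted watcher switches on. */
    enum State { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Runnable recreateHandle;  // hypothetical: closes the old handle and builds a new one
    final AtomicInteger recreations = new AtomicInteger();
    volatile boolean connected;

    SessionStateHandler(Runnable recreateHandle) {
        this.recreateHandle = recreateHandle;
    }

    /** Intended to be called from the watcher callback; returns quickly and never waits. */
    void process(State state) {
        switch (state) {
            case SYNC_CONNECTED:
                connected = true;
                break;  // without this break the case falls through into the reconnect path
            case DISCONNECTED:
                connected = false;  // keep the handle: the client retries on its own
                break;
            case EXPIRED:
                connected = false;
                // Only an expired session needs a new handle; build it off the event thread.
                worker.execute(() -> {
                    recreateHandle.run();
                    recreations.incrementAndGet();
                });
                break;
        }
    }

    /** Stops the worker and waits briefly for any pending recreation to finish. */
    void shutdownAndWait() {
        worker.shutdown();
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In the real watcher, the recreateHandle callback would be the place for the new ZooKeeper(...) call and storeServerConfig() from the quoted code, so that they only run after an actual session expiry.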


RE: Zookeeper port 3181

2015-04-22 Thread Eyal Bar
Hi,

The BookKeeper service listens on port 3181, as I read in the
documentation (
http://zookeeper.apache.org/doc/r3.3.6/bookkeeperStarted.html).
The question is why this port is open on only 1 of the 3 installed
ZooKeeper servers. Is there always exactly one live BookKeeper among the 3
ZooKeeper servers?

CDH5 = Cloudera Hadoop version 5 not Cassandra.

Best,
On Apr 21, 2015 5:57 PM, "Flavio Junqueira" 
wrote:

> I'm confused, you refer to bookie, is it about bookkeeper?
>
> Also, you may want to ask Cloudera folks directly about their
> distribution, you're likely to get a better answer.
>
> -Flavio
>
> -----Original Message-----
> From: "Eyal Bar"
> Sent: 4/21/2015 2:27 PM
> To: "user@zookeeper.apache.org"
> Subject: Zookeeper port 3181
>
> Hi,
>
> I have CDH5 installed with an HA configuration, and part of this
> installation is 3 ZooKeeper servers.
>
> I have noticed that port 3181, which the bookie uses to listen for
> connection requests from clients, is open on only 1 of the 3 installed
> ZooKeeper servers.
>
> Do any of you know why port 3181 isn't open on *all* 3 ZooKeeper servers?
>
> Thanks,
>
> --
> *[ Eyal Bar ]*
> MySQL and Cassandra Database Administrator - Infrastructure Team  //
> *Kenshoo*
> *Office* +972 (3) 746-6500 x473 // *Mobile* +972 (52) 458-6100
> *eyal@kenshoo.com *
> *www.Kenshoo.com* 
>
