Re: What OpenJDK Version Can I Use?

2023-09-22 Thread Shawn Heisey

On 9/21/23 12:51, DeLella, David wrote:

To all,

I am contemplating a Java upgrade on some local instances of Zookeeper. I 
cannot find any documentation online about the highest compatible version of 
Java I can use. The version numbers of both my Solr and Zookeeper instance are 
fixed and cannot change.

Solr = 8.1.1
Zookeeper = 3.4.14

Currently, we are using OpenJDK 1.8.0_202. According to online documentation, 
Solr 8.1.1 has been tested through Java 13 (pre-release), which probably means 
13 GA will work. Zookeeper just says 1.7+. Does that mean 13 will work with my 
version of Zookeeper?

Thanks
"This email and any attachments may be confidential or legally privileged and are 
intended solely for the use of the individual or entity addressed. If you have received 
this email in error please notify the sender and delete it from your email system. 
Additionally, you are notified that disclosing, copying, distributing or taking any 
action or reliance on the contents of this information is strictly prohibited."


For Solr 8, which requires at least Java 8, I would probably go with OpenJDK 
11.  Java 17 has a bug that Solr 9 can work around, but Solr 8 has no 
workaround for it.  For Solr 9, I would run OpenJDK 17.


If you are considering upgrading beyond your current release, Solr 
8.11.x includes ZK 3.6.2 and Solr 9.3.0 includes ZK 3.9.0 ... which are 
going to require a ZK server upgrade.


Thanks,
Shawn



Re: Is the zookeeper client 3.5x compatible with 3.4 server

2023-06-03 Thread Shawn Heisey

On 6/2/23 00:00, Xie wrote:

No information found in the release notes. But we did find some users who complained 
about backward compatibility with the 3.4 server. So, do we have any tests for this? 
And which functions are broken when using a 3.5+ client against the 3.4 server?


Yes.

There is compatibility info here:

https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement

In simpler terms:  A 3.4 server is guaranteed compatible with clients 
from 3.3.x through 3.5.x -- one minor version in either direction.  It 
is possible for the actual compatible range to be larger, but the info 
there is what the project tries to guarantee.
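
A trivial sketch of that rule, assuming "X.Y.Z" version strings (a 
hypothetical helper for illustration, not a ZK API):

class VersionCompat {
    // same major version, and minor versions at most one apart
    static boolean guaranteedCompatible(String server, String client) {
        String[] s = server.split("\\.");
        String[] c = client.split("\\.");
        return s[0].equals(c[0])
            && Math.abs(Integer.parseInt(s[1]) - Integer.parseInt(c[1])) <= 1;
    }
    public static void main(String[] args) {
        System.out.println(guaranteedCompatible("3.4.14", "3.5.5"));  // true  (one minor apart)
        System.out.println(guaranteedCompatible("3.4.14", "3.6.2"));  // false (two minors apart)
    }
}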


Thanks,
Shawn


Re: Support for Java 17

2023-04-18 Thread Shawn Heisey

On 4/17/2023 7:21 AM, Wilson Leao wrote:

Are there any plans to support Java 17+ ?


I can't speak directly for the ZK project... but I am running a 
development version of Solr, in cloud mode, with the embedded ZK server, 
on OpenJDK 17.  So it is running both the ZK server and the ZK client. 
I haven't seen any problems with this setup, so my unofficial 
observation is that this version of ZK runs in Java 17.


elyograg@bilbo:~$ java -version
openjdk version "17.0.6" 2023-01-17
OpenJDK Runtime Environment (build 17.0.6+10-Ubuntu-0ubuntu120.04.1)
OpenJDK 64-Bit Server VM (build 17.0.6+10-Ubuntu-0ubuntu120.04.1, mixed 
mode, sharing)


elyograg@bilbo:~$ find /opt/solr-9.3.0-SNAPSHOT/server | grep zookeeper
/opt/solr-9.3.0-SNAPSHOT/server/solr-webapp/webapp/WEB-INF/lib/solr-solrj-zookeeper-9.3.0-SNAPSHOT.jar
/opt/solr-9.3.0-SNAPSHOT/server/solr-webapp/webapp/WEB-INF/lib/zookeeper-jute-3.8.1.jar
/opt/solr-9.3.0-SNAPSHOT/server/solr-webapp/webapp/WEB-INF/lib/zookeeper-3.8.1.jar

This dev version of Solr includes ZK 3.8.1, as does the current public 
release of Solr, which is 9.2.0.  I do not know specifically if 3.7.1 
would also work.


Thanks,
Shawn


Re: Backup and restore Solr 8.11.2 collections and configsets in Zookeeper version: 3.7.0

2022-09-16 Thread Shawn Heisey

On 9/16/22 09:37, Szalay-Bekő Máté wrote:

But actually much better would be to do the backup and restore on Solr
level.


Solr doesn't currently have this capability.  We do have functionality 
that can download index configs from ZK to the filesystem, but not all 
the cluster contents in ZK.


If the ZK is dedicated to Solr, I believe you can copy the entire 
"version-2" directory from the ZK datadir and install it in new ZK nodes 
while they are down, then start them up.


Thanks,
Shawn



Re: Apache ZooKeeper Consistency with Majority Failure

2022-07-28 Thread Shawn Heisey

On 7/28/22 13:33, Shawn Heisey wrote:

Node 1 is most likely informed that its database is now out of date 
(or it decides that for itself) so it syncs the whole DB from the 
current leader, which will not know about the znode created in step B.


Not in any way a ZK expert.  But that seems like the most logical way 
for it to work.


I'm just guessing that there is some timestamp which declares the last 
time a database was running with quorum and that comparing those 
timestamps is how ZK decides that a node's database is out of date.  I 
am curious as to whether I have deduced things incorrectly.


Further extrapolation ... I would guess that if at any point the entire 
cluster goes down or runs without quorum, any node that later starts up 
and joins the cluster will have its DB overwritten and be identical to 
whatever nodes ARE running.  A question for the experts on the list ... 
is that correct?


Thanks,
Shawn



Re: Apache ZooKeeper Consistency with Majority Failure

2022-07-28 Thread Shawn Heisey

On 7/28/22 08:35, José Armando García Sancio wrote:

B) I started the majority of the nodes (1, 2). The ensemble was
established and I was able to create a znode using the CLI.

C) I shutdown all of the nodes (1, 2 since I never started node 3). To
simulate a disk failure I deleted the content of the transaction and
snapshot directory (version-2) for node 2.


Note that at this point only node 1 knows about the znode you created in 
step B.



D) I started the majority of the nodes (2, 3). The ensemble was
established and I was able to establish a connection with the CLI.

E) I finally started node 1 which had the committed transactions and
snapshots. The znode created in step B) was not present.


Node 1 is most likely informed that its database is now out of date (or 
it decides that for itself) so it syncs the whole DB from the current 
leader, which will not know about the znode created in step B.


Not in any way a ZK expert.  But that seems like the most logical way 
for it to work.


I'm just guessing that there is some timestamp which declares the last 
time a database was running with quorum and that comparing those 
timestamps is how ZK decides that a node's database is out of date.  I 
am curious as to whether I have deduced things incorrectly.


Thanks,
Shawn



Re: What does Apache ZooKeeper do?

2022-05-18 Thread Shawn Heisey

On 5/18/2022 8:07 AM, Turritopsis Dohrnii Teo En Ming wrote:

I notice my company/organization is using Apache ZooKeeper. What does it do?


Did you visit the website?  It's got a pretty good summary.  And 
following that, there is a link to the wiki, which has even more detail.


https://zookeeper.apache.org/

That webpage is the first hit if you do a Google search for zookeeper.

Thanks,
Shawn



Re: TLS quorum host name verification issue with docker-compose

2022-04-06 Thread Shawn Heisey

On 2022-04-06 14:54, René Buffat wrote:

javax.net.ssl.SSLPeerUnverifiedException: Certificate for  doesn't match
common name of the certificate subject: zookeeper2

java.security.cert.CertificateException: Failed to verify both host address
and host name


Generally speaking, I have never heard of anything SSL-related that 
looks at or cares about reverse DNS.


What should matter is what the CN (or SAN, for certificates that handle 
multiple names) is, and what hostname the client is using to connect to 
the server.  Those have to match.  When using SSL, you do not want to 
specify an IP address for the host, you want to give it a name, because 
it is very unlikely that you'll see an IP address in a certificate 
unless you create it with a private CA.
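
If you want to see exactly what names a server's certificate actually 
contains, a small Java sketch like this will print them (it assumes the cert 
chain is trusted by the default truststore; for a private CA you would add 
-Djavax.net.ssl.trustStore=... to the command line):

import java.security.cert.X509Certificate;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class ShowCertNames {
    public static void main(String[] args) throws Exception {
        String host = args[0];                     // e.g. zookeeper2
        int port = Integer.parseInt(args[1]);      // the TLS port to check
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port)) {
            socket.startHandshake();
            X509Certificate cert =
                (X509Certificate) socket.getSession().getPeerCertificates()[0];
            System.out.println("Subject: " + cert.getSubjectX500Principal());
            System.out.println("SANs:    " + cert.getSubjectAlternativeNames());
        }
    }
}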


If reverse DNS is the only place that the longer name appears, then 
either the SSL verification is entirely too picky, or you have given the 
client an IP address for the server instead of a name, and it is looking 
up the reverse DNS so that it has a name to compare to the cert.  I have 
no idea whether the ZK client (or maybe Java itself) does this, but it 
wouldn't surprise me.


Thanks,
Shawn


Re: 回复: why zookeeper server count is odd !!!

2021-09-01 Thread Shawn Heisey

On 9/1/2021 6:32 PM, 一直以来 wrote:

second email i want know :

if i have 7 zookeeper server,

if internet error, <123>server in one small master-replica, <4567> server in 
one small master-replica too,
so i want know :
<123>server can have new leader??
and 
<4567>server can have new leader??


I still do not understand what you're asking, but I will try to guess.

An ensemble with 7 servers can survive 3 failures.

If you are describing a situation with 7 servers where the 1, 2, and 3 
servers are disconnected from the rest but can still communicate with 
each other, those 3 servers will NOT be able to achieve quorum.  If the 
other 4 servers can talk to each other, then they WILL achieve quorum 
and elect a leader.  This is how ZK is designed, and I really doubt that 
it will ever change.


Thanks,
Shawn


Re: why zookeeper server count is odd !!!

2021-09-01 Thread Shawn Heisey

On 8/31/2021 7:59 PM, 一直以来 wrote:

Can I have a URL address? Or a doc at the zookeeper.apache.org site?



An odd number of servers is recommended because you don't gain anything 
from having one more (making it an even number).


https://zookeeper.apache.org/doc/r3.6.1/zookeeperAdmin.html#sc_CrossMachineRequirements

Relevant info on that page:

---
For the ZooKeeper service to be active, there must be a majority of 
non-failing machines that can communicate with each other. To create a 
deployment that can tolerate the failure of F machines, you should count 
on deploying 2xF+1 machines. Thus, a deployment that consists of three 
machines can handle one failure, and a deployment of five machines can 
handle two failures. Note that a deployment of six machines can only 
handle two failures since three machines is not a majority. For this 
reason, ZooKeeper deployments are usually made up of an odd number of 
machines.

---
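
That arithmetic in code form, as a trivial illustration:

class QuorumMath {
    // how many servers can fail while a majority still survives
    static int tolerableFailures(int ensembleSize) {
        return (ensembleSize - 1) / 2;
    }
    public static void main(String[] args) {
        System.out.println(tolerableFailures(3));  // 1
        System.out.println(tolerableFailures(5));  // 2
        System.out.println(tolerableFailures(6));  // 2 -- no gain over five
    }
}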

I could not figure out what your second email was asking. Subject:
has 7 zookeeper server , internet wrong,<123>,<4567>

Thanks,
Shawn



Re: Zookeeper 3.4.5 with Solr 8.8.0

2021-03-01 Thread Shawn Heisey

On 3/1/2021 6:51 AM, Subhajit Das wrote:

I noticed that Solr 8.8.0 uses the Zookeeper 3.6.2 client, while Solr 6.3.0 uses the 
Zookeeper 3.4.6 client. Is this a client bug or mismatch issue?
If so, how to fix this?


The ZK project guarantees that each minor version (X.Y.Z, where Y is the 
same) will work with the previous minor version or the next minor version.


3.4 and 3.6 are two minor versions apart, and thus compatibility cannot 
be guaranteed.


See the "backward compatibility" matrix here:

https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement

I think you'll need to upgrade your ZK server ensemble to fix it.

Thanks,
Shawn


Re: zookeeper / solr cloud problems

2019-12-13 Thread Shawn Heisey

On 12/13/2019 11:01 AM, Kojo wrote:

We had already changed the OS configuration before the last crash, so I think
that the problem is not there.

ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 257683
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 65535
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 65535
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited


Are you running this ulimit command as the same user that is running 
your Solr process?  It must be the same user to learn anything useful. 
This output indicates that the user that's running the ulimit command is 
allowed to start 64K processes, which I would think should be enough.


Best guess here is that the actual user that's running Solr does *NOT* 
have its limits increased.  It may be a different user than you're using 
to run the ulimit command.



When does Solr try to delete a znode? I'm sorry, but I understand nothing
about this process, and it is the only point that seems suspicious to me.
Do you think that it can cause inconsistency leading to the OOM problem?


OOME isn't caused by inconsistencies at the application level.  It's a 
low-level problem, an indication that Java tried to do something 
required to run the program that it couldn't do.


I assume that it's Solr trying to delete the znode, because the node 
path has solr in it.  It will be the ZK client running inside Solr 
that's actually trying to do the work, but Solr code probably initiated it.



Just after this INFO message above, ZK log starts to log thousands of this
block of lines below. Where it seems that ZK creates and closes thousands
of sessions.


I responded to this thread because I have some knowledge about Solr.  I 
really have no idea what these additional ZK server logs might mean. 
The one that you quoted before was pretty straightforward, so I was able 
to understand it.


Anything that gets logged after an OOME is suspect and may be useless. 
The execution of a Java program after OOME is unpredictable, because 
whatever was being run when the OOME was thrown did NOT successfully 
execute.


Thanks,
Shawn


Re: zookeeper / solr cloud problems

2019-12-13 Thread Shawn Heisey

On 12/13/2019 9:47 AM, Kojo wrote:

My setup is Solr Cloud (two shards) and Zookeeper (one instance) in the
same box. I am having some problems (OutOfMemory) on Solr.

This is the solr oom log:

java.lang.OutOfMemoryError: unable to create new native thread


Solr tried to start a new thread.  This is extremely common in Solr. 
Solr is a strongly multi-threaded application.  Java tried to honor 
Solr's request, but it couldn't -- the operating system said "you can't 
create that thread."


So you need to increase the OS limit that prevents the new thread from 
starting.  In some operating systems, this is actually controlled as a 
process limit, in others it might be something relating specifically to 
threads.
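
If you want to see this failure mode for yourself, here is a small 
demonstration (do NOT run it on a production box -- it deliberately creates 
threads until the OS refuses):

import java.util.ArrayList;
import java.util.List;

public class ThreadLimitDemo {
    public static void main(String[] args) {
        List<Thread> threads = new ArrayList<>();
        try {
            while (true) {
                Thread t = new Thread(() -> {
                    try { Thread.sleep(Long.MAX_VALUE); } catch (InterruptedException ignored) {}
                });
                t.start();
                threads.add(t);
            }
        } catch (OutOfMemoryError e) {
            // the same "unable to create new native thread" failure that Solr hit
            System.err.println("Failed after " + threads.size() + " threads: " + e.getMessage());
            System.exit(1);
        }
    }
}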


If most recent versions of Solr are running on a non-Windows operating 
system, the occurrence of an OutOfMemoryError (OOME) will cause Java to 
start a script which kills Solr.  This is done for safety reasons.  When 
OOME happens, the state of a running Java program becomes completely 
unpredictable.  To protect against undesirable outcomes like index 
corruption, we forcibly terminate Solr when OOME happens.  The same 
protection hasn't yet made it to the Windows startup script.



Just this message below: can you help me understand what this
message means?

2019-12-12 10:00:23,662 [myid:] - INFO  [ProcessThread(sid:0
cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException
when processing sessionid:0x171b8ec4adb type:delete cxid:0x10
zxid:0xafc6 txntype:-1 reqpath:n/a Error
Path:/overseer_elect/election/72058082471721304-192.168.0.61:8983_solr-n_18
Error:KeeperErrorCode = NoNode for
/overseer_elect/election/72058082471721304-192.168.0.61:8983_solr-n_18


Solr tried to delete a znode from zookeeper and that deletion failed 
because the znode did not exist.


I can't offer much about WHY it didn't exist, but my best guess is that 
it would have been created by the thread that Solr could not start.


Thanks,
Shawn


Re: One node crashing in 3.4.11 triggered a full ensemble restart

2019-10-03 Thread Shawn Heisey

On 10/3/2019 2:45 AM, Norbert Kalmar wrote:

As for running a mixed version of 3.5 and 3.4 quorum - I'm afraid it will
not work. From 3.5 we have a check on PROTOCOL_VERSION. 3.4 did not have
this protocol version, so when the nodes try to communicate it will throw
an exception. Plus, it is not a goal to keep quorum protocol backward
compatible, so chances are even without the check it would not work.


This document suggests that a mixed environment of 3.4 and 3.5 will work:

https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement

But you seem to be saying that it won't.

As a committer on the Lucene/Solr project (which uses ZK) I am wondering 
what we can tell our users about upgrading ZK.  I was under the 
impression from the wiki page I linked that they could do a rolling 
upgrade with zero downtime, where they do one ZK server at a time.  Are 
you saying that this is not possible?


The Upgrade FAQ that you linked doesn't say anything about 3.4 and 3.5 
not working together.  The only big gotcha I see there is 
ZOOKEEPER-3056, which has a workaround.


(I think of 4lw whitelisting as just a config problem with a new 
default, not a true upgrade issue)


Thanks,
Shawn


Re: Zookeeper client with single address pointing to multiple servers

2019-09-27 Thread Shawn Heisey

On 9/27/2019 9:24 AM, Benjamin Reed wrote:

are you making the assumption that you have a single machine that will
always be up? that is not a common assumption these days, which is why
solr might be resistant to such a change.

you can have a single DNS name resolve to multiple IP addresses and
ZooKeeper client will use all those addresses if you don't like
specifying a list on all the clients.


Is there something in the ZK client API that will allow Solr to ask the 
ZK client for a list of active servers that it is connected to?


Currently Solr just parses the zkHost string to obtain a server list for 
the "ZK status" portion of our admin UI.  This code was written when we 
used ZK 3.4.x ... but because we're now using 3.5.x which has dynamic 
reconfiguration, the list of active servers can be different than the 
zkHost string used when Solr started.
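
(If I'm reading the 3.5 API correctly, the new ZooKeeper#getConfig method 
reads the special /zookeeper/config znode where the dynamic membership 
lives, so maybe that's the right building block.  A rough sketch of what I 
mean, with a placeholder connect string:)

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ShowEnsembleConfig {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zoo1.example.com:2181", 30000, event -> {});
        // the ensemble's current membership is kept in /zookeeper/config
        byte[] data = zk.getConfig(false, new Stat());
        System.out.println(new String(data, StandardCharsets.UTF_8));
        zk.close();
    }
}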


Thanks,
Shawn


Re: Issues with using ZooKeeper 3.5.5 together with Solr 8.2.0

2019-08-03 Thread Shawn Heisey

On 8/2/2019 10:33 AM, Patrick Hunt wrote:

Right, it prints the membership of the quorum, see (for majority case which
is typical):
org.apache.zookeeper.server.quorum.flexible.QuorumMaj#toString
https://github.com/apache/zookeeper/blob/faa7cec71fddfb959a7d67923acffdb67d93c953/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/flexible/QuorumMaj.java#L112


For our purposes (the Solr project) the output of the "conf" 4lw command 
is inconsistent, changing when there is a multi-server ensemble.  All of 
the lines except the "membership: " one use an equals sign as a 
separator.  Our parsing code fails on that line because there is no 
equals sign.


Whether or not the ZK project should consider this a bug is the question 
that I am asking.


While getting to the bottom of that question, another one arises:  Who 
are the intended audiences of the "conf" 4lw output?  If one of those 
audiences is ZK itself, then the output of the command probably will 
work perfectly for that audience, as ZK uses Java's "properties" API to 
read its config file, which means that both = and : will work as separators.


The current output also works great for a human audience.  Humans are 
quite flexible.


The difficulty is machine-based parsers like the one in Solr, which is 
very simple and just splits lines on an equal sign.  How much 
consistency can an audience like this expect?  I would personally say 
that the way "membership: " is output is a bug.  That line probably 
should be entirely removed, or the colon could be replaced with an equal 
sign.  I think that the line only makes sense for a human audience, and 
that audience probably doesn't really need it.


An alternate path:  One statement in the documentation would remove all 
difficulty, without any code changes in ZK:


"The output from the conf 4lw command should be parsed by the Java 
Properties API for best results."


If that statement is added, then Solr just needs to utilize the 
Properties API, which is very easy to do, and all is well again.
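
For illustration, roughly what that Properties-based parsing would look 
like (the sample "conf" output here is abbreviated and hypothetical):

import java.io.StringReader;
import java.util.Properties;

public class ConfOutputParser {
    public static void main(String[] args) throws Exception {
        String confOutput =
              "clientPort=2181\n"
            + "dataDir=/var/lib/zookeeper/version-2\n"
            + "membership: \n"                      // the odd line with a colon
            + "server.1=zoo1.example.com:2888:3888:participant\n";
        Properties props = new Properties();
        // Properties.load() accepts both '=' and ':' as key/value separators
        props.load(new StringReader(confOutput));
        System.out.println(props.getProperty("clientPort"));  // 2181
        System.out.println(props.getProperty("server.1"));
    }
}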


So... I'm thinking we should open an issue in Jira, and then leave it up 
to the ZK committers whether it's better to change the output or adjust 
the documentation.  I can supply a patch either way.  What does the 
community think?


Thanks,
Shawn


Can SSL capability be satisfied by a smaller dependency than netty-all?

2019-07-30 Thread Shawn Heisey
We neglected to notice that netty is a required dependency for ZK SSL 
when we upgraded to ZK 3.5.5 in Solr.  We have an issue to track this:


https://issues.apache.org/jira/browse/SOLR-13665

I was noticing that the netty-all jar included in ZK is nearly 4MB ... 
and we will have to include it twice in the Solr download because it is 
needed for the SolrJ client as well as the Solr server.  The Solr 
download is already quite large ... increasing it by another 7MB is painful.


I'm hoping that ZK's SSL capability can be satisfied by one of the 
smaller netty jars, rather than netty-all.  Is that a question that can 
be answered here on the ZK list?  The specific class that is mentioned 
by the error is included in netty-transport.


Thanks,
Shawn


Re: Issues with using ZooKeeper 3.5.5 together with Solr 8.2.0

2019-07-30 Thread Shawn Heisey

On 7/29/2019 11:45 PM, Enrico Olivelli wrote:

Due to potential security risks since ZK 3.5 you have to explicitly
whitelist some commands.


The 3.5.5 documentation says that "*" can be used to whitelist all commands.

But what you just said seems to contradict that.  If your statement is 
more accurate, then the documentation should be updated to list the 
commands that are NOT enabled when using a wildcard.


There is a SOLR issue to upgrade the client in Solr to 3.5.5:

https://issues.apache.org/jira/browse/SOLR-8346

A comment was made on this issue saying that the following config is 
needed when the server is running 3.5.x:


4lw.commands.whitelist=mntr,conf,ruok

Thanks,
Shawn


Re: Does mentioning port number with FQDN resolve multiple connections

2018-10-09 Thread Shawn Heisey

On 10/9/2018 12:54 PM, Karthik K G wrote:

We have a scenario where we give the Zookeeper FQDN to our Solr Application.
When we use this we are seeing that zookeeper is accepting connections from
all Solr nodes every minute.


I'm more familiar with the Solr side than the ZK side. Here's a ZKHOST 
string that might be given to Solr, to connect to a three-node ensemble:


zoo1.example.com:2181,zoo2.example.com:2181,zoo3.example.com:2181/solr
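
(For illustration, that entire string, chroot included, is what gets handed 
to the ZK client's constructor -- SolrJ does the equivalent internally.  A 
minimal sketch:)

import org.apache.zookeeper.ZooKeeper;

public class ConnectExample {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper(
            "zoo1.example.com:2181,zoo2.example.com:2181,zoo3.example.com:2181/solr",
            30000,                                        // session timeout, ms
            event -> System.out.println("ZK event: " + event));
        // ... use zk ...
        zk.close();
    }
}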

The way I understand things, the ZK client (which is embedded in Solr) 
should open one TCP connection to each server listed, and leave those 
connections open indefinitely.  I have not examined network traffic to 
verify this, but that is my understanding.  So if my understanding is 
correct, then the following statement should be true:


If you're seeing lots of connections, then you may have some kind of 
problem, where services are being restarted frequently, or where 
connections are getting terminated prematurely outside the control of 
either ZK or Solr.


Thanks,
Shawn



Re: ZooKeeper monuturize clients

2018-10-08 Thread Shawn Heisey

On 10/8/2018 9:53 AM, Celso Diogo da Silva Batista (Academia) wrote:

I have a zookeeper group with 3 nodes, with two clients there connected, kafka 
and flink.

I wonder if there is any way to monuturize which clients are connected in real 
time to the zookeeper and know their status.


I'm going to assume by "monuturize" that you actually mean "monitor" ... 
I can't find a definition for the word you used.


If you make sure that particular four-letter-word command is enabled, 
you can run the "cons" command to see what's connected to a server 
currently.


https://zookeeper.apache.org/doc/r3.4.13/zookeeperAdmin.html#sc_zkCommands
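
The 4lw commands are plain text over the client port, so "echo cons | nc 
host 2181" works, or the same thing from Java (a minimal sketch):

import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

public class FourLetterWord {
    public static void main(String[] args) throws Exception {
        try (Socket s = new Socket("localhost", 2181)) {
            OutputStream out = s.getOutputStream();
            out.write("cons".getBytes());               // the raw four-letter command
            out.flush();
            InputStream in = s.getInputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {            // server writes the reply, then closes
                System.out.write(buf, 0, n);
            }
            System.out.flush();
        }
    }
}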

Thanks,
Shawn



Re: Does Apache ZooKeeper really need a JDK? Is JRE sufficient?

2018-09-15 Thread Shawn Heisey

On 9/15/2018 10:15 AM, M.P. Ardhanareeswaran wrote:

Does Apache ZooKeeper need a JDK?  Is a JRE sufficient?


For running a binary release, JRE is sufficient.

If you want to compile it from source, you'll need the JDK.

Thanks,
Shawn



Re: can not know the process name from zk log

2018-09-12 Thread Shawn Heisey

On 9/12/2018 2:33 AM, wangyongqiang0...@163.com wrote:

From the ZK log, I can get the IP and port.  I think if ZK can print the process 
info with the IP and port, it will help us in some cases.


What precisely are you after?  A java program can typically report what 
PID its process has, but I don't know that any other process information 
is available.  I have not checked to see whether ZK logs the PID it's 
using at any point.  Usually such information is logged at startup (if 
it is ever logged at all) and not anywhere else.
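
For what it's worth, a Java program can read its own PID along these lines 
(the "pid@hostname" format of the runtime name is a JVM convention rather 
than a documented guarantee; Java 9+ also offers ProcessHandle.current().pid()):

public class ShowPid {
    public static void main(String[] args) {
        String jvmName = java.lang.management.ManagementFactory.getRuntimeMXBean().getName();
        System.out.println("PID: " + jvmName.split("@")[0]);   // e.g. "12345"
    }
}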


With the port number, you can use a program like lsof or netstat to 
determine the pid, and I think this works on both the client and server 
side.  Here's an example of that for another Java program.  This isn't 
zookeeper, but the same thing will work for ZK too.


root@smeagol:~# lsof -Pn -i:45499
COMMAND  PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
java    8713 elyograg   35u  IPv6   95610  0t0  TCP 127.0.0.1:45499 
(LISTEN)
java    8713 elyograg   62u  IPv6 6442866  0t0  TCP 
127.0.0.1:52686->127.0.0.1:45499 (CLOSE_WAIT)
java    8713 elyograg   67u  IPv6 6443911  0t0  TCP 
127.0.0.1:52792->127.0.0.1:45499 (CLOSE_WAIT)
java    8713 elyograg   78u  IPv6 6446143  0t0  TCP 
127.0.0.1:52814->127.0.0.1:45499 (ESTABLISHED)
java    8713 elyograg   83u  IPv6 6444628  0t0  TCP 
127.0.0.1:45499->127.0.0.1:52814 (ESTABLISHED)
java    8713 elyograg   84u  IPv6 6443524  0t0  TCP 
127.0.0.1:52710->127.0.0.1:45499 (CLOSE_WAIT)
java    8713 elyograg   85u  IPv6 6442460  0t0  TCP 
127.0.0.1:52360->127.0.0.1:45499 (CLOSE_WAIT)
java    8713 elyograg   87u  IPv6 6445101  0t0  TCP 
127.0.0.1:52766->127.0.0.1:45499 (CLOSE_WAIT)
java    8713 elyograg  113u  IPv6 6443962  0t0  TCP 
127.0.0.1:52844->127.0.0.1:45499 (ESTABLISHED)
java    8713 elyograg  119u  IPv6 6444645  0t0  TCP 
127.0.0.1:45499->127.0.0.1:52844 (ESTABLISHED)
java    8713 elyograg  200u  IPv6 6441819  0t0  TCP 
127.0.0.1:52656->127.0.0.1:45499 (CLOSE_WAIT)


The -Pn parameters instruct lsof to not translate port numbers or IP 
addresses to names.  I do this to make the lsof program run faster.


Thanks,
Shawn



Re: Port 3888 closed on Leader

2018-08-23 Thread Shawn Heisey

On 8/15/2018 7:46 AM, harish lohar wrote:

In a deployment of a 3-node ZK cluster we have seen that sometimes port 3888
is absent after the cluster is formed; this causes a follower node to not be
able to connect to the leader if it restarts.

Shouldn't the leader itself come out of clustering if this happens?


I'm not well-versed in how ZK works internally, and don't have access 
any more to systems I can check, but I seem to remember when looking at 
a live ensemble that not every ZK instance will bind to all three ports 
(2181, 2888, and 3888 if using the example configs).  Surprised me when 
I noticed it, but I didn't worry about it too much since ZK seemed to be 
working correctly.


Thanks,
Shawn



Re: ZooKeeper in different datacenters

2018-08-22 Thread Shawn Heisey

On 8/22/2018 11:10 AM, ilango dhandapani wrote:

So, if I have 3 zk servers on 1st DC, 2 solr servers on 1st DC and 2 solr
servers on 2nd DC, this will work right ? Other than network/latency between
DC1 and DC2 for solr replication.


No.  This is what I was trying to tell you.

With half your ZK servers in one DC and half in the other, if you lose 
either datacenter, ZK loses quorum, and SolrCloud switches to 
read-only.  If you still have a complete copy of your indexes, you'll be 
able to continue making queries, but you will be unable to make any 
changes to any index until the datacenter comes back up.  When there are 
six ZK servers, you must have at least four of them operational and 
reachable, or you do not have quorum.


For fault tolerance on ZK, you need at least three data centers.  So 
that if you lose any datacenter, quorum can still be maintained with the 
ZK servers in the other two datacenters.


Thanks,
Shawn



Re: ZooKeeper in different datacenters

2018-08-22 Thread Shawn Heisey

On 8/22/2018 10:02 AM, ilango dhandapani wrote:

1. To have disaster recovery, planning to have 2 solr servers on 1st DC and
other 2 solr servers on 2nd DC. Seems there should not be any issue here.
Each shard will have 1st node in 1st DC and 2nd node in 2nd DC.


For Solr nodes in a SolrCloud setup, this is fine.  But keep reading, 
because your overall plan isn't going to work.



2. Planing to run 3 zk nodes on 1st DC and 3 zk nodes on 2nd DC. Now will
affect the performance ?


ZooKeeper cannot be made fully fault tolerant with only two 
datacenters.  It's simply not possible.  No matter how you distribute 
the nodes, at least one of your data centers will not have enough nodes 
to achieve quorum.  The way you've described things, NEITHER of the data 
centers will have enough nodes to achieve quorum if the other datacenter 
becomes unreachable. More than half are required. You must have at least 
three datacenters for a distributed fault tolerant ZK setup.  If you put 
4 ZK nodes in DC1 and 3 in DC2, then the loss of DC1 will eliminate quorum.


When a write is made to ZK, it must be acknowledged by a majority of the ZK 
servers before the operation returns to the caller.  Solr does not write to ZK 
often unless a Solr instance goes down or comes up frequently.  Index 
updates do NOT go through ZK.  The ZK database is consulted to discover 
where the replicas are, but the updates themselves are never written to ZK.



3. Will this affect the replication between the solr nodes on different DCs?


This mailing list will have no idea about this -- it's for ZK.  I'm part 
of the Solr community though, so you're not completely out of luck.


The only thing that's going to affect Solr replication between data 
centers is the network latency between those data centers.  If that's 
low, replication will be fast.


Thanks,
Shawn



Re: Configuring SolrCloud with Redundancy on Two Physical Frames

2018-05-02 Thread Shawn Heisey
On 5/2/2018 11:44 AM, Adam Blank wrote:
> Shawn - Sorry if I mixed up terminology, but by standalone I meant having a
> single Zk instead of a Zk ensemble.  So I would reconfigure the remaining
> Zk and Solr node to only use that single Zk.  To your point about the
> config data being stored in Zk, I'm thinking that should be alright in this
> case since the remaining Zk should already have that data stored?

If the goal is to avoid manual intervention in the face of failure, the
scenario you've described does not meet that goal.  You would have to
reconfigure and restart services.  Adding another machine for a third ZK
would eliminate the need to do that.

Andor's assertion that the machine should have similar horsepower to the
others would only apply if the only thing those machines are running is
ZK.  If you've got two beefy machines doing both Solr and ZK and a third
machine doing only ZK, the third machine does not need as much
horsepower.  It should certainly not be slow, but it's not going to need
the same level of storage and memory that the other two will.  A lot of
the CPU horsepower on the two beefy machines would be used for Solr, so
the third machine probably doesn't need as much CPU either.

For an install where you're trying to get by with two machines, it is
highly unlikely that SolrCloud would put much of a load on ZK.  Most of
what SolrCloud writes to ZK is cluster state changes -- machines going
up/down, collections created/deleted, etc.  When a SolrCloud cluster is
stable, there will be very little written to ZK.  Earlier I said that
when ZK loses quorum, SolrCloud will go read-only.  This is true, but it
does NOT mean that index updates go through ZK.  They do not.  SolrCloud
switches to read-only when ZK quorum goes away to protect itself from
split-brain problems.

Thanks,
Shawn



Re: Configuring SolrCloud with Redundancy on Two Physical Frames

2018-05-02 Thread Shawn Heisey
On 5/2/2018 7:07 AM, Adam Blank wrote:
> Thank you everyone for the useful information.  Would it be easy to
> reconfigure an existing clustered deployment to a standalone deployment?
> This is what I'm thinking:
>
> I have two physical servers.  I would have one Zk installed on server 1 and
> two Zk installed on server 2.  I would have a Solr node on each server,
> each with one or more shards.  If server 1 goes down, I should still be
> operational.  If server 2 goes down, I would reconfigure the remaining Solr
> node and Zk on server 1 as a standalone deployment.  Should that work in
> theory?  If so, the only changes that I should need to make would be to
> update the zoo.cfg within Zk and to restart Zk and Solr in standalone
> mode?

This question (and my response) are out of place on this mailing list. 
Adding ZK to Solr means Solr is running in cloud mode.  Standalone mode
means ZK is not involved.  If you want to pursue this further, please
start a thread on the solr-user mailing list.

If you try to start a Solr machine running in cloud mode in standalone
mode (by removing the information about ZK), it's not going to work,
because all the config data for the indexes is in zookeeper.  There is
no active config data on the filesystem with the indexes.  You could
possibly fix that by creating a "conf" directory in each core's
directory with the required configuration.  Assuming that works (and I'm
not 100% sure that it would), if you have collections with more than one
shard, then what you'll end up with is partial indexes that have no
connection to each other.

Generally speaking, a SolrCloud install that loses ZK quorum will still
work, but it will be read-only.

Thanks,
Shawn



Re: Need help installing Zookeeper service in Ubuntu 16.04

2018-04-10 Thread Shawn Heisey

On 4/10/2018 7:43 PM, Gregorius Soedharmo wrote:

Thank you for your help, but unfortunately, it all sounds gibberish to me.
As stated in the stack exchange question, I'm a complete Linux newbie that
couldn't even properly install a piece of software in Ubuntu. I did include
all of my efforts so far in the question.

Do you think it is best to scrap it and try a different installation
approach instead?


Unless you're absolutely certain that you need a new feature only 
available in 3.4.9 or newer, I would just run "apt-get install 
zookeeper" and use the 3.4.8 version provided by Ubuntu.  I do not know 
where that package will install its configuration, but it probably won't 
be all that hard to find.


It is likely that some of the bug fixes from later versions have been 
incorporated into the debian/ubuntu package by the people who maintain 
that package.  Usually new functionality is not backported, but bug 
fixes often are.


Thanks,
Shawn



Re: Need help installing Zookeeper service in Ubuntu 16.04

2018-04-10 Thread Shawn Heisey
On 4/10/2018 10:39 AM, Gregorius Soedharmo wrote:
> I'm having problem installing Zookeeper as a service in Ubuntu, can you
> help by answering either of these stack exchange question?
>
> https://askubuntu.com/questions/1022575/what-is-the-proper-way-to-install-zookeeper-on-ubuntu-16-04-for-both-standalone
>
> https://devops.stackexchange.com/questions/3833/what-is-the-proper-way-to-install-zookeeper-on-ubuntu-16-04-for-both-standalone

Here's what I did on a CentOS system.  This could be adapted to Ubuntu. 
It's probably not the "proper" way to do it, but it worked for me.

First, I extracted the zookeeper tarball to /opt and renamed the
zookeeper-X.Y.Z directory to "zoo".

Then I created a little shell script at /usr/local/sbin/zkrun :


#!/bin/sh

# chkconfig: - 75 50
# description: Starts and stops ZK

# pass the init action (start|stop|restart|status) through to ZK's control script
cd /opt/zoo
bin/zkServer.sh $1


I made the script executable, and then created a symlink in the init.d
directory:

chmod +x /usr/local/sbin/zkrun
ln -s /usr/local/sbin/zkrun /etc/init.d/zookeeper

The following two commands activated the service for the next boot:

chkconfig --add zookeeper
chkconfig zookeeper on

Starting it was simple:

service zookeeper start

On Ubuntu, you can do something similar to what I did with chkconfig
using the update-rc.d command.  I don't know if that command looks for
comments in the script to determine where in the sequence to place the
startup and shutdown, but if it does, you could edit the script to
include those comments.

Or you could just install the zookeeper package that's included with
Ubuntu.  It's not the latest -- on an Ubuntu 16 system, I see version
3.4.8 in the repository.  It's not ancient.  3.4.11 is the newest stable
release.

Thanks,
Shawn



Re: Is the current max packet length available via the API?

2018-04-07 Thread Shawn Heisey

On 4/6/2018 6:46 AM, Martin Gainty wrote:


 ZOOMAIN="-Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.port=$JMXPORT 
-Dcom.sun.management.jmxremote.authenticate=$JMXAUTH 
-Dcom.sun.management.jmxremote.ssl=$JMXSSL -Dzookeeper.jmx.log4j.disable=$JMXLOG4J 
org.apache.zookeeper.server.quorum.QuorumPeerMain"
   fi
else
 echo "JMX disabled by user request" >&2
 ZOOMAIN="org.apache.zookeeper.server.quorum.QuorumPeerMain"
fi
MG> everything you need to set up JMX is located at
MG> http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html


I know how to enable remote JMX when running a java program. I also know 
how to connect a JMX client like jconsole to that program.


What I was saying that I didn't know how to do was access data from JMX 
in a Java program that I've written myself.


Remote JMX seems like overkill, when all I want to know is what the ZK 
client that's part of my application is currently using as a maximum 
packet length.  Reading ClientCnxn.packetLen gets me what I was after.  
It would have taken an extensive code review for me to find that 
particular field, so thank you for letting me know about it.


If the client has an increased jute.maxbuffer but the server doesn't, 
then there could still be a problem with writing data, but I'm not going 
to try and detect that.  In any case, I'm not going to abort the upload, 
I'm just logging a warning.


Thanks,
Shawn



Re: Is the current max packet length available via the API?

2018-04-05 Thread Shawn Heisey

On 4/5/2018 3:44 AM, Andor Molnar wrote:
You can get the current jute.maxbuffer setting from a running 
ZooKeeper instance by querying ZooKeeperServerBean via JMX.


I'm not sure how I would do that in a client program.  It might be 
trivial, but it's not something I've ever done.


Currently there are two usages of the setting in ZK: 1) server-client 
communication, which is by default 4MB, and 2) server-server communication, 
which is by default 1MB.  They can't be set individually, but both can be 
overridden with the jute.maxbuffer system property.


I'm looking for a way to ask the ZK client to give me the value it is 
currently using as its max packet length.  I'm only going to be logging 
a warning to inform the user about which file may have caused a problem 
due to size, not preventing the attempt at uploading the file, so I'm 
not opposed to falling back to a hard-coded value if I can't figure it 
out.  I can look for the jute.maxbuffer sysprop, but if ZK will tell me 
what it's actually using, I'd prefer that.
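
If I end up going the sysprop route, I'm assuming it would look something 
like this, mirroring how I believe the setting is defined internally (a 
sketch, not confirmed against the ZK source):

// 4MB fallback if jute.maxbuffer is not set on this JVM
int maxPacketLength = Integer.getInteger("jute.maxbuffer", 4 * 1024 * 1024);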


Does the max packet length cover ONLY the size of the znode data, or 
does the znode name get included in that?  Asked another way: Should I 
subtract a little bit from the max packet length (maybe 128 or 256) 
before I compare the file size, or just compare the unchanged value?


I did discover that the ZkClientConfig.CLIENT_MAX_PACKET_LENGTH_DEFAULT 
field I mentioned before is not available in 3.4.x, it seems to have 
been added to a 3.5 version.  Since Solr uses 3.4.x and won't upgrade 
until there is a stable 3.5 release, I can't use that.


I do think that the ZK client should log something useful when the max 
packet length is exceeded -- if that's even possible.  The user in this 
scenario is running the latest version of Solr that was available at the 
time, which includes ZK 3.4.10 for its client.  The error message 
indicated socket problems, but didn't have any information about the cause.


When running under java 9, they got this as the error:

WARN - 2018-04-04 09:05:28.194; 
org.apache.zookeeper.ClientCnxn$SendThread; Session 0x100244e8ffb0004 
for server localhost/127.0.0.1:2181, unexpected error, closing socket 
connection and attempting reconnect java.io.IOException: Connection 
reset by peer


With Java 8, they got this:

WARN - 2018-04-04 09:10:11.879; 
org.apache.zookeeper.ClientCnxn$SendThread; Session 0x10024db7e280002 
for server localhost/0:0:0:0:0:0:0:1:2181, unexpected error, closing 
socket connection and attempting reconnect java.io.IOException: Protocol 
wrong type for socket


In both cases, the stacktrace listed a bunch of sun classes and then a 
couple of methods in zookeeper's ClientCnxnSocketNIO class.


When I asked them what their ZK server log said, that's when I figured 
out the problem:


2018-04-04 14:06:01,361 [myid:] - WARN [NIOServerCxn.Factory: 
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@383] - Exception causing close of 
session 0x10024db7e280006: Len error 5327937



Do I understand correctly that Solr uploads files to ZooKeeper?


Solr *itself* won't typically be uploading data to ZK that can exceed 
the max packet size.  It is typically done either with a separate 
commandline program (the ZkCLI class the commandline program uses is 
included in Solr), or by a client program using the SolrJ library (which 
is part of Solr like ZkCLI, but usable by itself).  The action being 
performed is an upload of a configuration for a Solr index.


Solr does sometimes run into the problem described in ZOOKEEPER-1162, 
but this is due to the number of children in a znode, where each one has 
minimal data.


Thanks,
Shawn



Is the current max packet length available via the API?

2018-04-04 Thread Shawn Heisey
Is it possible to get the current max packet length from the API?
(version 3.4.x)

If not, I'm guessing that I need to look for the jute.maxbuffer system
property and fallback to ZkClientConfig.CLIENT_MAX_PACKET_LENGTH_DEFAULT
if it's not defined.

What I'm trying to do is log a useful error message in Solr if somebody
tries to upload a file that's too big for what's allowed.  The error
that they get currently is not helpful, and figuring out what went wrong
seems to require looking at the server log.

Side note:  I can see in current code (and the 3.5.2 programmer's guide)
that the default max packet length is 4MB, but the administrators guide
(even the 3.5.3 version) still says 1MB.

Thanks,
Shawn



Re: Upgrade required Java version to 1.8 on 3.5+

2018-03-08 Thread Shawn Heisey
On 3/7/2018 1:21 PM, Jeff Widman wrote:
> +1 from me to using Java 8 or even going all the way to 9 for the 3.5
> release branch.

I don't think it would be a good idea to require Java 9 at this time. 
It's probably already an uphill battle for sysadmins to get approval to
jump ONE major version.  Getting approval to upgrade through TWO major
versions might prove to be very difficult for some.

A year from now, after Java 8 goes end of support, might be the time to
have that discussion.

I have no idea what kind of overall roadmap there is for ZK major
versions.  Maybe nobody has planned that far ahead.

Ordinarily I would say that requiring a new major Java version should
happen in a major release, which would mean requiring Java 8 with the
4.0 release and Java 9 with the 5.0 release.  But I know that ZK has a
very slow release cycle -- multiple months between *point* releases, and
far longer between minor releases.  I don't even know what kind of cycle
there is for major releases.  Maybe because of the slow release cycle,
waiting for 4.0 would just take too long.  So here's an alternate idea:
require Java 8 in 3.6.x and Java 9 in whatever minor or major release
comes after 3.6.

For comparison purposes -- Lucene/Solr usually puts out a new minor
release every few weeks.  Point releases usually are VERY quick after a
minor release, and typically are only created for really massive bugs.

Thanks,
Shawn



Re: Upgrade required Java version to 1.8 on 3.5+

2018-03-07 Thread Shawn Heisey

On 3/7/2018 4:04 AM, Andor Molnar wrote:

I've quickly checked some of the major components that are heavy Zk clients:

Hadoop/HDFS = 1.8 required
HBase = 1.8 required
Kafka = 1.7 required (has some 1.8 and 1.9 bindings)
Hive = 1.8 required
Curator = 1.7 required (has 1.8-only async module to take advantage of Java
lambdas)
Solr  = 1.8

As always, your feedback is much appreciated.


I come from the Solr world.

Lucene/Solr started requiring Java 7 with the release of 4.8.0, 
announced on 2014-04-28.


Lucene/Solr started requiring Java 8 with the release of 6.0.0, 
announced on 2016-04-08.


The general reaction each time one of these major changes was discussed 
seemed to be "oh, finally!  it's about time!"  I get the strong sense 
that Lucene committers really want to use the new language features, and 
feel limited when they can't. Historically, there have been a few 
changes committed that failed to compile when the officially supported 
minimum JDK version was used.  The authors probably should have noticed 
the problem, but sometimes don't because they're using updated toolchains.


How do the committers on this project generally feel about needing to 
avoid using Java 8 features?  If they don't feel limited, there's 
probably no reason to update the requirement.  If however they feel that 
they could write better code with a refresh, then given general industry 
trends, it probably is time to consider updating the requirement.  Maybe 
you will want to accelerate plans for a 4.0 release, and update the 
requirement there.


Another piece of information to think about:  Oracle isn't providing 
public support/bugfixes for Java 7 any more.  To get support, Oracle 
must be paid.  Java 8 is going to reach that same milestone in January 
2019, so within the next year or so, we are going to begin seeing a lot 
of projects updating to a minimum of Java 9.


Thanks,
Shawn



Re: "ant eclipse" in source code fails, easy fix

2018-03-04 Thread Shawn Heisey

On 3/3/2018 10:10 PM, Edward Ribeiro wrote:

I recommend you open a JIRA issue at
https://issues.apache.org/jira/projects/ZOOKEEPER and then open a PR to
https://www.github.com/apache/zookeeper , please. This is clearly a bug and
the fix is trivial so you can bypass any dev mailing list discussion, IMO.


Jira and linked github PR created.

https://issues.apache.org/jira/browse/ZOOKEEPER-2992

Thanks,
Shawn



"ant eclipse" in source code fails, easy fix

2018-03-03 Thread Shawn Heisey
I know I really should be putting this on the dev list.  The reason I'm 
not doing so is because I'm already subscribed to far too many mailing 
lists.  I don't expect to be making a ton of contributions to ZK, so I 
don't want to join another mailing list for one little discussion.  If 
at some point I *do* find myself more involved with ZK development, I 
will join the dev list.


I just cloned the source to poke around a little bit, not make changes.  
I find eclipse fairly easy to use, so I wanted to prep the repository 
for loading into that software.  I typed "ant eclipse" immediately after 
cloning from the github mirror.  It failed.



C:\Users\elyograg\git\zookeeper>ant eclipse
Buildfile: C:\Users\elyograg\git\zookeeper\build.xml

ant-eclipse-download:
  [get] Getting: 
http://downloads.sourceforge.net/project/ant-eclipse/ant-eclipse/1.0/ant-eclipse-1.0.bin.tar.bz2
  [get] To: 
C:\Users\elyograg\git\zookeeper\src\java\ant-eclipse-1.0.bin.tar.bz2
  [get] 
http://downloads.sourceforge.net/project/ant-eclipse/ant-eclipse/1.0/ant-eclipse-1.0.bin.tar.bz2 
moved to 
https://iweb.dl.sourceforge.net/project/ant-eclipse/ant-eclipse/1.0/ant-eclipse-1.0.bin.tar.bz2


BUILD FAILED
C:\Users\elyograg\git\zookeeper\build.xml:1809: Redirection detected 
from http to https. Protocol switch unsafe, not allowed.


Total time: 0 seconds


The reason is simple -- sourceforge no longer allows unencrypted 
access.  They are nice enough to redirect the request to https, and if 
this had been a browser access, that would have worked without problem.  
But apparently ant doesn't consider such redirects to be safe.


The fix is easy -- add one character to the ant build.  I tested this, 
it worked without issue on Windows.  I expect it would work on the 
better operating systems too.



diff --git a/build.xml b/build.xml
index 639707e7..a6b7617b 100644
--- a/build.xml
+++ b/build.xml
@@ -1805,7 +1805,7 @@ xmlns:cs="antlib:com.puppycrawl.tools.checkstyle.ant">
 
-    <get src="http://downloads.sourceforge.net/project/ant-eclipse/ant-eclipse/1.0/ant-eclipse-1.0.bin.tar.bz2"
+    <get src="https://downloads.sourceforge.net/project/ant-eclipse/ant-eclipse/1.0/ant-eclipse-1.0.bin.tar.bz2"
          dest="${src.dir}/java/ant-eclipse-1.0.bin.tar.bz2"
          usetimestamp="false" />


Do you want an issue in Jira, or is this informal discussion enough?

Thanks,
Shawn



Re: Ensemble fails when one node looses connectivity

2018-03-02 Thread Shawn Heisey

On 3/2/2018 6:54 AM, Jim Keeney wrote:

Thanks for jumping in on the ZK side as well.

I will take a hard look at my config files but I checked and I do not have
any one file over 1MB. The combined files (10 indexes) is 2.2MB.

I am using micros for the nodes which are very limited in memory.

I'm not currently using a java.env file so I guess I'm using the default
values for the JVM which is typically xmx512M if I remember correctly.

Could it be just a memory issue?


A modern 64-bit Java typically picks a default max heap size of one quarter 
of the physical memory actually present on the machine.  Just yesterday, I 
saw Java report a 6GB default heap size on a machine with 24GB of memory.  
Information I can find about AWS instance types says that a micro instance 
has 1GB of memory, so the default heap size there is probably quite small.


Even in small server situations, I would strongly recommend that anytime 
you have a java commandline, you define -Xmx for the max heap, and -Xms 
should probably be set as well, to the same value as -Xmx.  That way 
you're not relying on defaults, you're absolutely sure what the heap 
size is.


For ZK servers handling 2 megabytes of config data plus the rest of a 
small SolrCloud install, something like 256MB or 512MB of heap would 
probably be plenty.  ZK holds a copy of its entire database in memory.  
Small SolrCloud installs won't put much of a load on ZK.  A micro 
instance should be plenty for ZK when the software using it is Solr, as 
long as that's the only thing it's running.


Thanks,
Shawn



Re: Ensemble fails when one node looses connectivity

2018-03-01 Thread Shawn Heisey

On 3/1/2018 7:59 PM, Jim Keeney wrote:

Read about the maxbuffer and am pretty sure that this might explain the
behavior we are seeing since it occurs when there has been a significant
reboot of all the servers. We have over 2 mb of config files for all of our
indexes and if all the Solr nodes are sync ing their configs at once it
seems like that might overflow the buffer.


You probably recognize me from the Solr side.  Hello again.  I do know 
enough to handle this part, so I'm answering. I didn't consider the 
maxbuffer setting, because I didn't see anything about large packets in 
the logs you shared on the Solr mailing list, and it's very rare for 
Solr users to need to increase it.


You only need to worry about the maxbuffer if any single part of the 
config in ZK (what is called a "znode") is over 1MB. Each file in the 
configs that you upload will go into its own znode.  So if none of the 
individual files in your configs is really large, you probably won't 
need to set jute.maxbuffer.


As for the other things that Solr puts in ZK:  Unless you have a REALLY 
huge cluster (tons of collections, shards, replicas, servers, etc) then 
that information should be quite small.



Newbie question, where would i set the -Djute.maxbuffer ? Should I update
the zkServer.sh file so this is applied every time zookeeper is started or
restarted.


If jute.maxbuffer is needed, it must be set on the startup options for 
every ZK server and every client that will access large znodes.  Which 
means all your ZK servers, all your Solr servers, and any invocations of 
things like the scripts Solr includes for uploading configs.


Thanks,
Shawn



Re: Zookeeper session expiration

2017-12-04 Thread Shawn Heisey

On 12/4/2017 12:51 PM, Anthony Shaya wrote:

Thanks Shawn, should I message the developer mailing list for a more definitive 
answer?


The ZK dev list is for discussion around the development of ZK itself, 
NOT for development of software that uses ZK.  For the latter kind of 
development, you want THIS list.


If you're talking about the dev list for whatever software is using the 
ZK client, then that would be the right place to go.


Although a bug in ZK is always possible, I don't think it's very likely 
for the session timeouts you are seeing.  Even if it does turn out to be 
a bug in ZK, this list would still be the correct place to discuss it, 
and further action would then be taken as an issue in Jira.


For most usages, there will be at least three ZK servers, and each 
client will know about all of them.  If there are no problems on the 
client side, then the client would only lose connectivity to one of the 
servers and would be able to communicate with the others.  If there ARE 
problems on the client side, then it would probably lose connection with 
all the servers at nearly the same time.


Thanks,
Shawn


Re: Zookeeper session expiration

2017-12-04 Thread Shawn Heisey

On 12/4/2017 8:22 AM, Anthony Shaya wrote:

My question is related to how session expiration works, I noticed on many of 
the client machines the times across these machines were all off (by anywhere 
from 1 minute to 20 minutes - which was resolved after discovery - haven't 
verified this completely yet). Can this directly affect session expiration 
within the zookeeper cluster?

   *   I read the following in https://wiki.apache.org/hadoop/ZooKeeper/FAQ , 
"Expirations happens when the cluster does not hear from the client within the 
specified session timeout period (i.e. no heartbeat)."  So in some cases it seems 
like, if the times were wrong across the machines, it's possible one of the clients could have 
effectively sent a heartbeat in the past (not sure about this, to be honest) and then the cluster 
expires the session?


I make these comments without any knowledge of what ZK code actually 
does.  I am a member of this list because I'm a representative of the 
Apache Solr project, which uses the ZK client in order to maintain a 
cluster.


IMHO, any software which makes actual decisions based on the timestamps 
in messages from another system is badly designed.  I would hope that 
the ZK designers know this, and always make any decisions related to 
time using the clock in the local system only.


If ZK's designers did the right thing, then a session timeout would 
indicate that quite literally no heartbeats were received in X seconds, 
as measured by the local clock, and the local clock ONLY ... NOT from 
timestamp information received from another system.
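
A toy sketch of that local-clock-only design (my assumption of how it ought 
to work, not actual ZK code).  System.nanoTime() is monotonic, so the skewed 
wall clocks described above cannot affect it:

class SessionTracker {
    private final long timeoutNanos;
    private long lastHeartbeatNanos = System.nanoTime();

    SessionTracker(long timeoutMillis) {
        this.timeoutNanos = timeoutMillis * 1_000_000L;
    }

    void onHeartbeat() {                 // called whenever a ping arrives
        lastHeartbeatNanos = System.nanoTime();
    }

    boolean isExpired() {                // compares only local clock readings
        return System.nanoTime() - lastHeartbeatNanos > timeoutNanos;
    }
}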


Although such a lack of communication could be caused by any number of 
things, including network hardware failure, one of the most common 
reasons I have seen for problems like this is extreme java garbage 
collection pauses in the client software.


Situations where the heap is a little bit too small can cause a java 
program to basically be doing garbage collection constantly, so it 
doesn't have much time to do anything else, like send heartbeats to ZK 
servers.


Situations where the heap is HUGE and garbage collection is not well 
tuned can lead to pauses of a minute or longer while Java does a massive 
full GC.



   *   I don't have the zookeeper node log for the above time to see what was 
going on in zookeeper when the cluster determined the session expired.

   *   Is there any additional logging I can turn on to troubleshoot zk session 
expiration issues?


Hopefully your ZK clients also have logging.  Failing that, you could 
turn on GC logging for the software with the ZK client (assuming it's a 
Java client) and find a program or website that can examine the log and 
give you statistics or a graph of GC pauses.


If there is a problem in software using the client and whatever logging 
is available doesn't help you figure out what's wrong, you're generally 
going to need to talk to whoever wrote that software for help 
troubleshooting it.


Thanks,
Shawn


Re: ...likely client has closed socket...

2017-07-14 Thread Shawn Heisey
On 7/14/2017 5:14 AM, mosto...@gmail.com wrote:
> Using zookeeper 3.5.3-beta we are getting a few log lines like:
>
>[2017-07-14 13:02:30,588] WARN Unable to read additional data from
>client sessionid 0xc00bee319ed0004, likely client has closed socket
>(org.apache.zookeeper.server.NIOServerCnxn)
>
> Why is zookeeper "reading" additional data from a client that has
> already left? Anyway, if the client is gone this shouldn't be
> considered an error, should it? 

I'm no expert in ZK code, so I could be completely wrong in what I'm
going to say:

This kind of message probably means that the normal TCP stack operation
(where a client will notify a server that it is closing the socket) was
somehow interrupted, possibly by a badly configured firewall or a
network problem.  There is always the possibility of a bug, but it is
more likely to be a problem somewhere else.  I do not believe that this
is a message that can be ignored -- it probably indicates the presence
of a real problem that must be fixed.

If I'm wrong, hopefully someone who IS an expert will step up and
correct me.

Thanks,
Shawn



Re: How to add nodes to a Zookeeper 3.5.3-beta ensemble with reconfigEnabled=false

2017-06-23 Thread Shawn Heisey
On 6/22/2017 11:39 PM, Alexander Shraer wrote:
> The described behavior is the intended one - in 3.5 configuration is
> part of the synced state and is updated when the server syncs with the
> leader. The only rolling upgrade I tested was to upgrade the software
> version of the servers - this should still work. But I didn't try to
> support rolling upgrade for upgrading the configuration, since this
> should be done through reconfig. 

If the intent is to get rid of the old way of changing the configuration
(update zoo.cfg and perform rolling restarts) and only support dynamic
reconfiguration, then why is there a reconfigEnabled setting at all, and
why does it default to false?

Based on everything that has been said here, it sounds like when
reconfigEnabled is left alone or explicitly set it to false, the ability
to change the ensemble configuration is entirely lost, because the
server will pull the dynamic config from the ensemble on startup and any
changes made to zoo.cfg are ignored.
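
For reference, this is roughly what the reconfig path looks like with 
the 3.5.3+ Java client.  A minimal sketch, with placeholder host names, 
assuming reconfigEnabled=true and whatever authentication your setup 
requires for reconfig operations:

    import org.apache.zookeeper.admin.ZooKeeperAdmin;

    public class ReconfigDemo {
        public static void main(String[] args) throws Exception {
            ZooKeeperAdmin admin =
                    new ZooKeeperAdmin("zk1:2181", 30000, event -> {});
            // add server 4 as a voting participant; fromConfig = -1
            // means "apply against the current config version"
            admin.reconfigure("server.4=zk4:2888:3888:participant;2181",
                    null, null, -1, null);
            admin.close();
        }
    }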

Thanks,
Shawn



Re: Yet another "two datacenter" discussion

2017-05-26 Thread Shawn Heisey
On 5/26/2017 9:48 AM, Jordan Zimmerman wrote:
> In ZK 3.4.x if you have configuration differences amongst your instances you 
> are susceptible to a split brain. See this email thread, "Rolling Config 
> Change Considered Harmful":
>
> http://zookeeper-user.578899.n2.nabble.com/Rolling-config-change-considered-harmful-td7578761.html
>
> In ZK 3.5.x I'm not even sure it would work. 

Thank you for your reply.

I don't fully understand everything being discussed in that thread, but
it sounds like bad things could happen once connectivity is restored. 
If DC1 and DC2 were both operational from a client perspective, but
unable to communicate with each other, I think the potential for bad
things would be even higher, because there could be confusion about
which Solr servers are leaders, as well as which ZK server is the leader.

Thanks,
Shawn



Yet another "two datacenter" discussion

2017-05-26 Thread Shawn Heisey
I feel fairly certain that this thread will be an annoyance.  I don't
know enough about zookeeper to answer the questions that are being
asked, so I apologize for needing to relay questions about ZK fault
tolerance in two datacenters.

It seems that everyone wants to avoid the expense of a tie-breaker ZK VM
in a third datacenter.

The scenario, which this list has seen over and over:

DC1 - three ZK servers, one or more Solr servers.
DC2 - two ZK servers, one or more Solr servers.

I've already explained that if DC2 goes down, everything's fine, but if
DC1 goes down, Solr goes read-only, and there's no way to prevent that.

The conversation went further, and I'm sure you guys have seen this
before too:  "Is there any way we can get DC2 back to operational with
manual intervention if DC1 goes down?"  I explained that any manual
intervention would briefly take Solr down ... at which point the
following proposal was mentioned:

Add an observer node to DC2, and in the event DC1 goes down, run a
script that reconfigures all the ZK servers to change the observer to a
voting member and does rolling restarts.

Will their proposal work?  What happens when DC1 comes back online?  As
you know, DC1 will contain a partial ensemble that still has quorum,
about to rejoin what it THINKS is a partial ensemble *without* quorum,
which is not what it will find.  I'm guessing that ZK assumes the
question of who has the "real" quorum shouldn't ever need to be
negotiated, because the rules prevent multiple partitions from gaining
quorum.

Solr currently ships with 3.4.6, but the next version of Solr (about to
drop any day now) will have 3.4.10.  Once 3.5 is released and Solr is
updated to use it, does the situation I've described above change in any
meaningful way?

Thanks,
Shawn



Re: odd issue after enabling the firewall

2017-05-10 Thread Shawn Heisey
On 5/10/2017 11:40 AM, msouthwick wrote:
> I have 2 zookeepers, 2 shards and 2 replica shards in my setup.

Followup, noticed this after I had sent the previous reply:  A ZK
ensemble of two servers is LESS fault tolerant than a single server.  If
*either* server were to go down, you would lose quorum.  You need three
servers for fault tolerance.  This is outlined in at least two places in
the ZK documentation.

Your mention of port 8983 (as well as shards and replicas) suggests that
you're running SolrCloud.  I believe that the need for three ZK servers
is also mentioned in the Solr documentation ... and if it's not, then I
need to make sure that gets added.

Thanks,
Shawn



Re: odd issue after enabling the firewall

2017-05-10 Thread Shawn Heisey
On 5/10/2017 11:40 AM, msouthwick wrote:
> I have 2 zookeepers, 2 shards and 2 replica shards in my setup. Everything
> was working just fine until I enabled the firewall. I started by allowing
> ports: 1099, 2181, 2888, 3888, 8983. Now I get the following in the
> zookeeper log.
>
> 2017-05-10 11:04:11,300 [myid:1] - INFO 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] -
> Accepted socket connection from /151.155.70.24:43248

> It looks to me that the port is being changed in this example to 43248. This
> number changes so I opened a range of ports from 43000 to 43300 in hopes
> that this would fix the issue but as you can see it didn't.

That's the source port on the client side of the TCP connection.  2181
is the destination port on the server side.

Although most firewalls are CAPABLE of restricting traffic by the source
port, it is rare for such restrictions to be configured intentionally. 
The source port is basically unpredictable without extensive knowledge
of a client's TCP stack implementation.

The source port range for Linux machines is typically 32768 to 61000. 
It can be configured, but unless you are absolutely certain that you
MUST configure this, you should not worry about changing it.  Other
client operating systems may use a different port range, but it will
generally have thousands of possible ports available.
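
You can see the ephemeral source port with a trivial Java program 
(assumes something is listening on localhost:2181):

    import java.net.Socket;

    public class SourcePortDemo {
        public static void main(String[] args) throws Exception {
            // the OS picks an ephemeral source port per connection
            try (Socket s = new Socket("localhost", 2181)) {
                System.out.println("source port: " + s.getLocalPort());
                System.out.println("destination port: " + s.getPort());
            }
        }
    }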

Thanks,
Shawn



Re: Memory requirement

2017-04-27 Thread Shawn Heisey
On 4/26/2017 12:25 PM, Daniel Chan wrote:
> We have a Zookeeper (3.4.6) data store with
> zk_approximate_data_size  1.88G
> zk_znode_count  4.43 million
>
> 99% of the znodes has dataLen around 600 bytes.
>
> The Zookeeper instance is configured with "-Xms4G -Xmx4G" but it failed on 
> startup.
>
> Is there any way to project memory requirement for running Zookeeper based on 
> its data size? something like 2X or 3X of the data size?

Disclaimer: I'm technically deficient on ZK in many ways.  I hang out
here because the project where I *do* know a thing or two (Solr) uses
ZK.  To the best of my knowledge, what I am saying below is correct, but
I could be wrong.

Even without knowing all that much about ZK or what its memory
requirements are, the first thing that comes to mind is that your
problem description of "it failed on startup" is extremely vague.  What
happens *exactly*?  If there are error messages logged, can you give us
those, including the full Java stacktrace if it's present?  Maybe you
are correct to think that there's not enough memory, but without error
messages, it's not possible to say for sure.

I see that the day before you sent this message, you sent another
message where you DID include an error message seen at startup:

java.lang.OutOfMemoryError: GC overhead limit exceeded

My research on this error says the following:  If this is the error you
are getting when you start with a 4G heap, you're not actually running
out of heap, but the amount of heap needed is so close to the 4G you've
specified that Java is spending most of its time doing garbage
collection without freeing very much memory.

Probably the first thing I would have tried is to run it with an 8G heap
and see whether that works.  Based on the discussion on the thread where
you opened ZOOKEEPER-2714, with that large dataset startup may be VERY
slow even when there is enough memory to avoid massive GC overhead.
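
If it helps, here is a rough back-of-envelope calculation.  The 
per-znode overhead used below is an assumption for illustration, not a 
measured ZooKeeper figure:

    public class HeapEstimate {
        public static void main(String[] args) {
            long znodes = 4_430_000L;
            long dataBytes = 1_880_000_000L;  // ~1.88 GB of znode data
            long perZnodeOverhead = 100;      // assumed, not measured
            long estimate = dataBytes + znodes * perZnodeOverhead;
            // prints roughly 2.32 GB; leaving 2x headroom for GC
            // suggests a heap comfortably above the 4 GB that failed
            System.out.printf("rough live-set estimate: %.2f GB%n",
                    estimate / 1e9);
        }
    }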

Further discussion:  The second thing that comes to mind, and this will
need to be addressed by someone with intimate knowledge, is that your
data size makes me wonder if ZK is the correct choice for whatever you
are doing.  It is not my intent to badmouth the project, but there are
certain things that it does not handle well.  If the amount of data that
is changed and/or read at any given moment is very small, then you MIGHT
be OK aside from a very slow startup.

Thanks,
Shawn



Re: Query: 3.5.x version as alpha

2017-02-22 Thread Shawn Heisey
On 2/20/2017 2:19 AM, Deepti Sharma S wrote:
> Can anyone confirm, is there any specific reason in the naming
> convention in this release and to make it as "alpha". As per my
> understanding Alpha means its beta release and not recommended to use
> in production, however as per below mail it seems many customers using
> the alpha version, so there is no harm to use this version.

The developers have decided that the 3.5 branch is currently alpha. 
This is a different designation than beta.  Here's how I interpret the
stages of public software releases, from the point of view of the
developers, but others may see them differently:

Alpha:  We've got the code finished, we think it works, now we want to
find out whether it works for other people.

Beta:  We fixed problems that brave people helped us find in the alpha
releases.  The latest build works for everyone who's tried it so far,
now we want to get it out to a wider audience that can really pound on
it and see whether it's bulletproof.

Stable: All the problems found during alpha and beta have been fixed,
the software seems to perform well under heavy usage, and now it's ready
for everyone.

Information available on this list indicates that the current plan is
that version 3.5.3 will be beta, and there is no scheduled release
date.  For 3.4.x, there was only one beta release before it changed to
stable, but there's no guarantee that history will repeat itself.  The
release version numbers did not contain the text "alpha" or "beta" for
the 3.4 releases -- that is new this time.

It looks like this project typically has a very slow release cycle.  The
3.5 branch has been in alpha for two and a half years, significantly
longer than 3.4 was in that state.  The last 3.5 release was July of
last year, and the previous release was nearly a year before that.  If
any show-stopper issues had been discovered, it is likely that there
would have been more frequent releases.  If somebody chooses to run
mission-critical systems on 3.5, they do so at their own risk ... but
the risk is probably low.

Thanks,
Shawn



Re: Zookeeper Ensemble Automation

2017-01-05 Thread Shawn Heisey
On 1/5/2017 11:19 AM, Washko, Daniel wrote:
> Thanks for the reply Shawn. I would like to clarify something though.
> Right now, the Dynamic Reconfiguration of Zookeeper works for
> Zookeeper – that is adding/removing nodes automatically without having
> to reconfigure each zookeeper node manually. Once Zookeeper is out of
> Alpha then Solr will be updated to take advantage of the Dynamic
> Reconfiguration capability of Zookeeper and auto-discover any changes.
> Is that correct?

I am not sure whether my understanding is correct, but if it is, then I
don't think a zookeeper 3.4.x client (like the one in Solr) will notice
that the server list (with servers running 3.5.x) has changed. 
Depending on exactly how the membership changed, the SolrCloud instance
might not be able to maintain a viable ZK quorum.  If it loses quorum,
SolrCloud goes read-only.

After ZK 3.5 goes through the beta phase and reaches stable, then Solr
will get the upgrade, and we will make sure that the dynamic
reconfiguration works.  It's a feature that we definitely want, though
we may wait for the second or third stable release before we upgrade to
be absolutely certain that it's solid.

There are a couple of questions I do not know the answer to:  1) Whether
any code changes will be required in Solr to take advantage of dynamic
reconfiguration after the dependency upgrade.  2) Whether a Solr
instance with the 3.5 client could be told about only one ZK server,
then discover the whole cluster and connect to all the servers.  Can a
more knowledgeable member of this community answer these questions for me?

Thanks,
Shawn



Re: Zookeeper Ensemble Automation

2017-01-05 Thread Shawn Heisey
On 1/5/2017 10:28 AM, Washko, Daniel wrote:
> Good day, I am soliciting advice for how to automate setting up and
> maintaining a Zookeeper ensemble. In our environment we try to
> automate everything. We are currently operating out of AWS using
> Scalr. Our goal for Zookeeper would be to automate the creation of a
> Zookeeper ensemble where nodes would join together as they are
> created. For ongoing maintenance, the ability to dynamically add and
> remove nodes is required. We have used Exhibitor for doing this the
> past two years but there is a major problem that we have experienced.
> Every so often the Zookeeper ensemble will lose all the configurations
> stored. We are using Zookeeper with Solr and this causes the cloud to
> fail and collections to be lost. On our Zookeeper Solr implementations
> that are not using Exhibitor we have never had this problem. Given that
> Exhibitor’s future remains in flux and along with the problems we have
> had we are trying to find a solution that does not use Exhibitor.
>
> The Dynamic Reconfiguration in the 3.5.x series seems like a good
> option, but 3.5.x has been in Alpha state since 2014 and I don’t see
> any indication when It will jump to beta or even stable. We are leery
> about running alpha software in production.

As I understand it, the dynamic cluster membership in 3.5.x requires
3.5.x *clients*.  The client in the newest version of Solr is 3.4.6.

I'm a beginner with zookeeper, but I am very active in the Solr
community.  Once ZK 3.5.x gets out of beta (still in alpha), a later
version of Solr will be upgraded to the stable 3.5.x version of
zookeeper, and then Solr should support dynamic cluster membership.

Thanks,
Shawn



Re: Is SSL supported in 3.4.9?

2016-12-07 Thread Shawn Heisey
On 12/7/2016 7:44 AM, Dan Langille wrote:
> I'm getting mixed messages from the documentation, and I'm unable to get 
> Zookeeper 
> to talk on a secureClientPort.
> 
> Is SSL supported in Zookeeper 3.4.9?
> 
> At https://zookeeper.apache.org/doc/r3.4.9/zookeeperAdmin.html 
>  I see: 
> 
> "New in 3.4: Netty is an NIO based client/server communication framework 
> Netty framework 
> has built in support for encryption (SSL)"
> 
> But at 
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide
>  it 
> states that SSL has "been added in ZOOKEEPER-2125", which is 3.5.1, 3.6.0

Here's hoping I can help, and that I'm not giving you incorrect info.
I'm relatively new to this project, and only here because zookeeper is a
critical component for SolrCloud.

The documentation on zookeeper.a.o for 3.4.9, 3.5.0-alpha, and
3.5.1-alpha does not mention any steps or configuration for setting up
SSL.  The documentation for 3.5.2-alpha does.

I'm reasonably certain that SSL configuration is *not* included in any
3.4.x release.
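
For anyone who lands on this thread later: per the 3.5.2-alpha SSL 
User Guide, the client side is switched to SSL with system properties 
along these lines (paths and passwords are placeholders):

    -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
    -Dzookeeper.client.secure=true
    -Dzookeeper.ssl.keyStore.location=/path/client.jks
    -Dzookeeper.ssl.keyStore.password=changeit
    -Dzookeeper.ssl.trustStore.location=/path/trust.jks
    -Dzookeeper.ssl.trustStore.password=changeit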

Because I'm not watching development at all, I have no idea why the 2125
issue says it is in 3.5.1 but the 3.5.1-alpha docs on the website don't
include SSL configuration.  The svn commit to branch-3.5 looks like it
includes an update to the Administrator's Guide with SSL configuration
info.  The 3.5.1-alpha release was announced nearly six months *after*
the commit.  The announcement for that release does mention SSL support.

General observation about all 3.5.X-alpha docs on the site:  The 3.5.0
doc (Administrator's Guide) doesn't have info about what's new in 3.5.0,
the 3.5.1 doc doesn't have info about what's new in 3.5.1, and the 3.5.2
doc doesn't have info about what's new in 3.5.2.  The 3.5.1 doc does
have info about what's new in 3.5.0, and the 3.5.2 doc does have info
about what's new in both 3.5.1 and 3.5.0.  This seems quite strange.

Thanks,
Shawn



Re: Split a large ZooKeeper cluster into multiple separate clusters

2016-09-07 Thread Shawn Heisey
On 9/7/2016 5:15 PM, Eric Young wrote:
> Also, adding/removing ZooKeeper nodes can be problematic to manage
> over a large cluster (partly because 3.4.6 doesn't support live config
> changes but 3.5.0+ does). 

When an actual stable 3.5 release comes out, Solr can begin the process
of upgrading its ZK support to include dynamic clusters.

> Unfortunately, the final goal is to have each final SolrCloud cluster
> to have knowledge of every static collections, but the local ZooKeeper
> clusters should not know about the others in other clusters
> (effectively duplicating the collection in each cluster). So, there is
> no rearranging to do, only removing "extra" nodes after splitting the
> ZooKeeper cluster. This may sound counter productive, but the static
> collections are managed outside of Solr. In the event that I do need
> to update the content in one, I can reload the collection on per
> location basis for a less risky deployment. It's a bit scary when you
> need to reload a large static collection across 20+ Solr servers.

If I correctly understand what you're trying to do with the static
collections, Solr cannot do it directly.  If you have collection A in
cluster 1, and collection A in cluster 2, they are entirely separate and
cannot be managed as a single collection.  Queries and updates sent to
one cluster will remain in that cluster and will not ever be sent to
other clusters.

You might be able to use the CDCR (cross-data-center-replication)
feature that's new in 6.x to keep a collection in one cluster in sync
with another cluster.  I'm not familiar with the feature, but there is
documentation.

Thanks,
Shawn



Re: Split a large ZooKeeper cluster into multiple separate clusters

2016-09-07 Thread Shawn Heisey
On 9/7/2016 4:04 PM, Shawn Heisey wrote:
> You *might* be able to just use the DELETE action on the Collections API
> to delete collections instead of manually editing clusterstate, but I'm
> not 100% positive about that.

On second thought, DON'T TRY THIS.

I wouldn't want to take the chance that the DELETE would actually try to
contact the mentioned servers and truly delete the collection.

Thanks,
Shawn



Re: Split a large ZooKeeper cluster into multiple separate clusters

2016-09-07 Thread Shawn Heisey
On 9/7/2016 3:19 PM, Eric Young wrote:
> I have a very large ZooKeeper cluster which manages config and replication
> for multiple SolrCloud clusters.  I want to split the monolithic ZooKeeper
> cluster into smaller, more manageable clusters in a live migration (i.e.
> minimal or no downtime).

The zookeeper list isn't really the right place for most of this.  The
residents of this list will tend to have zero knowledge of how Solr uses
zookeeper.  I'm on both lists -- and I'm a lot more familiar with Solr
than Zookeeper.

Because Solr normally will not place a large load on zookeeper, I
personally would just use one zookeeper ensemble for both SolrCloud
clusters, each using a different chroot in zookeeper.  I'd use either
three or five ZK servers, depending on how likely I thought it would be
that I would need to survive two servers going down.
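
As a sketch (host names are placeholders), the two SolrCloud clusters 
would then differ only in the chroot suffix of their zkHost strings:

    zk1:2181,zk2:2181,zk3:2181/cluster1
    zk1:2181,zk2:2181,zk3:2181/cluster2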

That's not what you asked about though, so I will attempt to help you
with what you DID ask about.

> I have collections that can be updated dynamically which are already
> separated logically in different SolrCloud clusters.  I also have some
> static collections (never updated) that have replicas across all the
> SolrCloud clusters though.  All my collections only have a single shard.
>
> ZooKeeper version: 3.4.6
> Solr version: 4.8.1
>
> Example current setup (minimal):
> ZK cluster servers:  z1-1, z1-2, z1-3, z2-1, z2-2, z2-3
> Solr cluster 1 servers: s1-1, s1-2
> Solr cluster 2 servers: s2-1, s2-2
>
> Example collections:
> Dynamic collection 1: c1 (sharded on s1-1, s1-2)
> Dynamic collection 2: c2 (sharded on s2-1, s2-2)
> Static collection 1: c3 (sharded on all 4 Solr servers s1-1, s1-2, s2-1,
> s2-2)

If you have a collection that has replicas on all four Solr servers,
then your four solr servers are *one* SolrCloud cluster, not two.  If
they were separate clusters, it would not be possible to have one
collection with shards/replicas on all four servers.

I really don't know what to do for the zookeeper part of this equation. 
Somebody else on this list will need to answer that.

Downtime is not going to be avoidable.  With careful planning and
execution, you might be able to minimize it.

The first thing you need to do is rearrange the static collection so it
only lives on two of the Solr servers.  To do this, you can use
ADDREPLICA if additional replicas are required, then DELETEREPLICA to
remove it from two of the servers.
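
With the Collections API in Solr 4.8.1, that looks roughly like the 
following; the hosts come from your example, and the replica name 
(core_node4) is a placeholder you would look up in the clusterstate:

    http://s1-1:8983/solr/admin/collections?action=ADDREPLICA&collection=c3&shard=shard1&node=s1-2:8983_solr
    http://s1-1:8983/solr/admin/collections?action=DELETEREPLICA&collection=c3&shard=shard1&replica=core_node4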

At this point, you'll need to shut down all instances of Solr, make
whatever changes are required to split the zookeeper cluster (which I
can't help you with), and update zkHost in Solr so that each pair of
servers only talks to the servers in its cluster.

After making sure that both zk ensembles have all the information in
them, you would then start your Solr servers back up.

Then you'll want to manually edit the two clusterstates to remove all
mention of the collections and servers that don't belong in each
cluster, and after making sure each clusterstate is correct, restart all
the Solr servers.

You *might* be able to just use the DELETE action on the Collections API
to delete collections instead of manually editing clusterstate, but I'm
not 100% positive about that.

Thanks,
Shawn



Re: zookeeper deployment strategy for multi data centers

2016-06-03 Thread Shawn Heisey
On 6/3/2016 1:44 PM, Nomar Morado wrote:
> Is there any settings to override the quorum rule? Would you know the
> rationale behind it? Ideally, you will want to operate the application
> even if at least one data center is up.

I do not know if the quorum rule can be overridden, or whether your
application can tell the difference between a loss of quorum and
zookeeper going down entirely.  I really don't know anything about
zookeeper client code or zookeeper internals.

From what I understand, majority quorum is the only way to be
*completely* sure that cluster software like SolrCloud or your
application can handle write operations with confidence that they are
applied correctly.  If you lose quorum, which will happen if only one DC
is operational, then your application should go read-only.  This is what
SolrCloud does.

I am a committer on the Apache Solr project, and Solr uses zookeeper
when it is running in SolrCloud mode.  The cloud code is handled by
other people -- I don't know much about it.

I joined this list because I wanted to have the ZK devs include a
clarification in zookeeper documentation -- oddly enough, related to the
very thing we are discussing.  I wanted to be sure that the
documentation explicitly mentioned that three servers are required for a
fault-tolerant setup.  Some SolrCloud users don't want to accept this as
a fact, and believe that two servers should be enough.

Thanks,
Shawn



Re: zookeeper deployment strategy for multi data centers

2016-06-03 Thread Shawn Heisey
On 6/2/2016 4:06 PM, J316 Services wrote:
> We have two data centers and two servers at each. In the event of a
> data center failure, with the quorum majority rule - the other
> surviving data center seems to be no use at all and we'll be out of luck. 

You are correct -- the scenario you've described is not fault tolerant.

When setting up a geographically diverse zookeeper ensemble, there must
be at least three locations, so if there's a complete power or network
failure at one location, the other two can maintain quorum.  One
solution I saw discussed was a fifth tie-breaker server in a cloud
service like Amazon EC2, or you could go full-scale with two more
servers at a third datacenter.

Thanks,
Shawn



Re: Zookeeper with SSL release date

2016-04-01 Thread Shawn Heisey
On 4/1/2016 10:18 AM, Alexander Shraer wrote:
> Because using reconfig without ACLs any client can remove the servers (or
> replace them with a different set of servers
> or change their configuration parameters) and break the system.

This is a potential worry even without reconfig -- a malicious person
could change or delete the entire database ... yet many people
(including me) run without ACLs.

My ZK ensemble is in a network location that unauthorized people can't
reach without finding and exploiting some vulnerability that has not yet
reached my awareness.

If somebody can gain access to the ZK machines, at least one of my
public-facing servers is already compromised.  ZK will be very low on my
list of things to worry about.  Chances are that even if the attacker
figured out I was using ZK and where it lives, it would be extremely low
on THEIR list of priorities -- it doesn't contain any sensitive info,
and there are far more efficient ways to cause problems.

Thanks,
Shawn



Re: Is it a accepted practice to share Zookeeper ensemble among Kafka, Storm and Solr

2016-03-30 Thread Shawn Heisey
On 3/30/2016 12:08 AM, Flavio Junqueira wrote:
> Sharing is definitely ok, and I'd say that is common practice, but it really 
> depends on the size of the cluster you're talking about and your workload. 
> Storm can be quite demanding on ZK, and Kafka typically isn't except during 
> fail-over. I don't have experience with the workload of Solr. If you have a 
> way of testing it against some sample workload, I'd suggest you do it.

Unless there are tons of nodes/collections/shards/replicas, and/or a lot
of issues that cause clusterstate changes, I would expect Solr's impact
on zookeeper to be fairly light.  I strongly recommend a chroot for
Solr's connection to ZK, especially when sharing the ensemble.
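
One gotcha: the chroot znode must exist before Solr first connects.  A 
minimal sketch that creates it with the plain Java client (the host 
name is a placeholder):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class MakeChroot {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("zk1:2181", 30000, event -> {});
            // create the /solr chroot as an ordinary persistent znode
            zk.create("/solr", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
                    CreateMode.PERSISTENT);
            zk.close();
        }
    }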

Every Solr node and every CloudSolrClient object in client code will use
watchers, though I have no idea how many are created.

Thanks,
Shawn



Re: from zookeper embedded to standalone

2016-03-15 Thread Shawn Heisey
On 3/15/2016 10:13 AM, Flavio Junqueira wrote:
> Have you asked on the Solr list? They are probably better equipped to answer 
> your question.
>
> On our end, if you are switching to a new set of servers, it is unclear how 
> you're going to safely migrate your data from the old ensemble to the new 
> one. It is also unclear what will happen with in-flight requests. If it is 
> the same set of servers, then I don't se why you'd have issues. We'd need to 
> know more about your setup, though.

I saw the message on the Solr list, and was going to send Rachid here,
because migrating the zookeeper database successfully to different
servers is something that you can help with better.

Here's the overall steps I'd use:

* Shut down all Solr processes.
* Migrate the zookeeper DB and start the new ensemble.
* Start each Solr server without the -DzkRun option and with a corrected
-DzkHost option (example below).
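
For the last step, the start command would change along these lines 
(host names and the /solr chroot are placeholders; on Solr 5.x the 
bin/solr script takes the same string via -z):

    java -DzkHost=zk1:2181,zk2:2181,zk3:2181/solr -jar start.jar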

The middle step is what you fine folks will need to help with.

Drawing from elsewhere on this thread ... what "history" would be lost? 
As far as I know, Solr only cares what's in the database at any given
moment ... but my understanding may be incomplete.

Because Solr includes version 3.4.6, adding/removing servers on the fly
isn't supported, which is why I thought a migration with downtime would
be the best option.

Thanks,
Shawn



Re: Multi DC ( DC-1 and DC-2) zookeeper setup

2016-03-08 Thread Shawn Heisey
On 3/8/2016 3:40 PM, s influxdb wrote:
> How does the client fail over to DC2 if DC1 is down? Do the services
> registered on DC1, for example with ephemeral nodes, have to re-register
> with DC2?

Even though Flavio and Camille have both said this, I'm not sure whether
the posters on this thread are hearing it:

If you only have two datacenters, you cannot set up a reliable zookeeper
ensemble.  It's simply not possible.  There are NO combinations of
servers that will achieve fault tolerance with only two datacenters.

The reason this won't work is the same reason that you cannot set up a
reliable ensemble with only two servers.  If either data center goes
down, half of your ZK nodes will be gone, and neither data center will
have enough nodes to achieve quorum.

When you have three datacenters that are all capable of directly
reaching each other, you only need one ZK node in each location.  If any
single DC goes down, the other two will be able to keep the ensemble
running.

Data is replicated among the DCs in exactly the same way that it is if
all the servers are in one place.  I don't know enough about internal ZK
operation to comment further.

=

Some TL;DR information to follow:

If you want to be able to take a node down for maintenance in a multi-DC
situation and *still* survive an entire DC going down, you need three
nodes in each of three data centers -- nine total.  This ensemble is
able to survive any four servers going down, so you can take down a node
in one DC for maintenance, and if one of the other DCs fails entirely,
there will be five functioning servers that can maintain quorum.

Detailed information for the specific situation outlined by Kaushal:

DC-1 1 Leader 2 Followers
DC-2 1 Follower 2 Observers.

A six-node ensemble requires at least four operational nodes to maintain
quorum.  If either of those data centers fails, there are only three
nodes left, which is not enough.
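
The arithmetic behind all of this fits in a one-line Java method:

    public class QuorumMath {
        // majority quorum for an ensemble with N voting members
        static int quorum(int voters) { return voters / 2 + 1; }

        public static void main(String[] args) {
            System.out.println(quorum(6)); // 4: losing a 3-node DC leaves 3
            System.out.println(quorum(9)); // 5: tolerates any 4 failures
        }
    }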

Thanks,
Shawn



Re: Java version and zookeper

2016-01-28 Thread Shawn Heisey

On 1/28/2016 8:54 AM, Muresanu A.V. (Andrei Valentin) wrote:

what is the "supported" oracle jdk version for zookeeper 3.4.6?


In the Administrator's guide:

http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html

This text is included, under "Required Software":

ZooKeeper runs in Java, release 1.6 or greater (JDK 6 or greater).

Thanks,
Shawn



3.4.6 download hard to find

2015-12-11 Thread Shawn Heisey
Someone on the #solr IRC channel asked about where they can get ZK
version 3.4.6, since that's what's included in the most recent version
of Solr (5.4.0 is a day or two from release).

When I first looked, and couldn't even find it on the "archives" link
that's present on each mirror, I initially suggested that the user
should use 3.4.7 instead.  I'd like to know whether there is enough
binary compatibility with 3.4.6 for this to be a reasonable suggestion. 
Since it's a point release and not a new minor version, this seems likely.

Further digging shows that releases up to 3.3.2 are in the
hadoop/zookeeper archive location, with newer archived releases in the
zookeeper archive location.  I'm guessing that 3.3.3 was the first
release after the project was promoted to top level.

Thoughts about making this situation better:  Change the "archives" link
on the mirror page to the zookeeper archive location (newer versions). 
Both archive locations should have links pointing at the other archive
location.

Thanks,
Shawn



Prevent a znode from exceeding jute.maxbuffer

2015-10-01 Thread Shawn Heisey
I was going to open an issue in Jira for this, but I figured I should
discuss it here before I do that, to make sure that's a reasonable
course of action.

I was thinking about a problem that we encounter with SolrCloud, where
our overseer queue (stored in zookeeper) will greatly exceed the default
jute.maxbuffer size.  I encountered this personally while researching
something for a Solr issue:

https://issues.apache.org/jira/browse/SOLR-7191?focusedCommentId=14347834&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14347834

It seems silly that a znode could get to 14 times the allowed size
without notifying the code *inserting* the data.  The structure of our
queue is such that entries in the queue are children of the znode.  This
means that the data stored directly in the znode is not the problem
(which is pretty much nonexistent in this case), it's the number of
children.

It seems like it would be a good idea to reject the creation of new
children if that would cause the znode size to exceed jute.maxbuffer. 
This moves the required error handling to the code that *updates* ZK,
rather than the code that is watching and/or reading ZK, which seems
more appropriate to me.

Alternately, the mechanisms involved could be changed so that the client
can handle accessing a znode with millions of children, without
complaining about the packet length.
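
For context, the call that currently blows up is the plain children 
listing.  A sketch, with a placeholder host and the Solr overseer 
queue as the example path:

    import java.util.List;
    import org.apache.zookeeper.ZooKeeper;

    public class QueueSize {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, event -> {});
            // the reply carries every child name in one packet; with
            // millions of children it exceeds jute.maxbuffer and the
            // client-side read fails
            List<String> children = zk.getChildren("/overseer/queue", false);
            System.out.println(children.size() + " queued items");
            zk.close();
        }
    }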

Thoughts?

Thanks,
Shawn



Re: Prevent a znode from exceeding jute.maxbuffer

2015-10-01 Thread Shawn Heisey
On 10/1/2015 6:35 PM, Edward Ribeiro wrote:
> I agree with you, and I think
> https://issues.apache.org/jira/browse/ZOOKEEPER-2260 comes close to the
> second approach you suggested. wdyt?

Interesting!  That could be helpful.

I think it would require changes to the user application code, to handle
the pagination.  If that could be avoided, it would be better, but I'm
not sure that it can be avoided.

If changes to user code are required, I think I like the idea of
rejecting new child creation more -- user code changes will be about
properly handling exceptions at update time instead of modifying the
consuming code to paginate.

A feature to reject child creation should probably be a mode of
operation that can be enabled, but will not be turned on by default. 
Down the road, after the real-world impact of that option has been
determined, the question of whether to turn it on by default can be
reviewed, and perhaps delayed until the next major release (4.0).

A corollary idea -- allow configurable thresholds (percentages of
jute.maxbuffer, maybe) which will slow down the creation of new
children, with the amount of pause increasing as the size of the znode
increases ... and ultimately reject the creation if the buffer size would
actually be exceeded.  I have mixed feelings about this idea.

Thanks,
Shawn



Set source address in zookeeper client?

2015-09-08 Thread Shawn Heisey
A user wrote to the solr-user mailing list asking how they could set the
source address for zookeeper connections from their multi-homed
SolrCloud install.  Solr doesn't have any way to configure this, but I
was wondering whether there's a system property honored by zookeeper
that would bind the source address.  I have not been able to find
anything in the zookeeper docs or with google.

Thanks,
Shawn