On 25 Aug 2016, at 22:49, kant kodali 
<kanth...@gmail.com<mailto:kanth...@gmail.com>> wrote:

Yeah, so it seems like it's a work in progress. At the very least, Mesos took 
the initiative to provide alternatives to ZK. I am really looking forward to 
this.

https://issues.apache.org/jira/browse/MESOS-3797




I worry about any attempt to implement distributed consensus systems: they take 
time in production to get right.

1. There's the need to prove that what you are building is valid if the 
implementation matches the specification. That has apparently been done for ZK, 
though given the complexity of the maths involved, I cannot vouch for that 
myself:
https://blog.acolyer.org/2015/03/09/zab-high-performance-broadcast-for-primary-backup-systems/

2. You need to run it in production to find the problems. Google's Chubby paper 
hints at the things they found went wrong there. As far as ZK goes, Jepsen 
suggests it's robust:

https://aphyr.com/posts/291-jepsen-zookeeper

If it has weaknesses, I'd point at
 - its security model
 - its lack of helpfulness when there are Kerberos/SASL auth problems (the ZK 
server closes the connection; the client sees a connection failure and retries)
 - the fact that its failure modes aren't always understood by people coding 
against it.

http://blog.cloudera.com/blog/2014/03/zookeeper-resilience-at-pinterest/
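
To make the auth point concrete, here is a minimal sketch in Python. It is a toy simulation, not ZooKeeper client code: `connect`, `connect_with_retries` and `ConnectionClosed` are hypothetical names invented for illustration. It assumes only what is described above, namely that the server closes the connection on an auth failure and the client treats that as a connection failure and retries.

```python
class ConnectionClosed(Exception):
    """Raised when the (simulated) server drops the connection."""


def connect(credentials_ok, network_ok):
    # Hypothetical stand-in for a client connect call; not a real ZooKeeper API.
    # A network fault and a rejected login surface as the same exception,
    # because the simulated server just closes the connection on auth failure.
    if not network_ok or not credentials_ok:
        raise ConnectionClosed("connection closed by server")
    return "session"


def connect_with_retries(credentials_ok, network_ok, attempts=3):
    """Retry on every ConnectionClosed; returns (session, attempts_made)."""
    for attempt in range(1, attempts + 1):
        try:
            return connect(credentials_ok, network_ok), attempt
        except ConnectionClosed:
            # The client cannot tell "bad credentials" from "bad network",
            # so it retries in both cases.
            continue
    return None, attempts


auth_failure = connect_with_retries(credentials_ok=False, network_ok=True)
net_failure = connect_with_retries(credentials_ok=True, network_ok=False)
healthy = connect_with_retries(credentials_ok=True, network_ok=True)
```

In this sketch `auth_failure` and `net_failure` end up identical, which is the problem: retrying cannot fix bad credentials, but nothing the client sees says so.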

The Raft algorithm appears to be easier to implement than Paxos; there are 
things built on it and I look forward to seeing what works and what doesn't in 
production.

Certainly Aphyr found problems when pointing Jepsen at etcd, though as that 
was a 2014 piece of work, I expect those specific problems to have been 
addressed. The main thing is: it shows how hard it is to get things right in 
the presence of complex failures.

Finally, regarding S3

You can use the S3 object store as a source of data in queries/streaming and, 
if done carefully, as a destination. Performance is variable; that is something 
some of us are working on, across S3A, Spark and Hive.

Conference placement: I shall be talking on that topic at Spark Summit Europe 
if you want to find out more: https://spark-summit.org/eu-2016/


On Thu, Aug 25, 2016 2:00 PM, Michael Gummelt 
mgumm...@mesosphere.io<mailto:mgumm...@mesosphere.io> wrote:
Mesos also uses ZK for leader election.  There seems to be some effort in 
supporting etcd, but it's in progress: 
https://issues.apache.org/jira/browse/MESOS-1806

On Thu, Aug 25, 2016 at 1:55 PM, kant kodali 
<kanth...@gmail.com<mailto:kanth...@gmail.com>> wrote:
@Ofir @Sean very good points.

@Mike We don't use Kafka or Hive. I understand that ZooKeeper can do many 
things, but for our use case all we need it for is high availability. Given the 
frustrations of the devops people here in our company, who have extensive 
experience managing large clusters, we would be very happy to avoid ZooKeeper. 
I also heard that Mesos can provide high availability through etcd and Consul, 
and if that is true I will be left with the following stack:





Spark + Mesos scheduler + distributed file system (or, to be precise, I should 
say distributed storage, since S3 is an object store; I guess this will be HDFS 
for us) + etcd & Consul. Now the big question for me is how do I set all this 
up :)





