Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "ZooKeeper/SoC2010Ideas" page has been changed by HenryRobinson.
http://wiki.apache.org/hadoop/ZooKeeper/SoC2010Ideas?action=diff&rev1=2&rev2=3

--------------------------------------------------

  
  More details about Google's Summer of Code can be found here: 
http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs - 
the deadline for our applications is Friday, so get those ideas coming in!
  
+ (Note that the mentorship assignment is tentative in some cases - every 
project can be mentored, however!)
+ 
- ----
+ ------
  === Optimizations for WAN Deployments ===
  ==== Possible Mentor ====
  Henry Robinson (henry at apache dot org)
  ==== Requirements ====
  Java, some networking familiarity
  ==== Description ====
- ZK 3.3.0 added ''observers'' which are non-voting members of a ZK ensemble. 
One use case for observers is as a proxy to a remote voting ensemble, say in a 
different data center. Since observers do not need to vote, there are less 
strict latency requirements on the delivery of messages to them. WAN traffic is 
also expensive. This project would investigate and implement batching of 
messages to observers, and potential mechanisms for decreasing the number of 
messages that need to be sent. For example, a destructive update to a znode 
twice in a row does not theoretically need to be sent twice - although making 
this work correctly with ZAB will be a challenge.  
+ ZK 3.3.0 added 
[[http://www.cloudera.com/blog/2009/12/observers-making-zookeeper-scale-even-further/|''observers'']]
 which are non-voting members of a ZK ensemble. One use case for observers is 
as a proxy to a remote voting ensemble, say in a different data center. Since 
observers do not need to vote, there are less strict latency requirements on 
the delivery of messages to them. WAN traffic is also expensive. This project 
would investigate and implement batching of messages to observers, and 
potential mechanisms for decreasing the number of messages that need to be 
sent. For example, if the same znode is overwritten twice in a row, only the 
second update theoretically needs to be sent - although making this work 
correctly with ZAB will be a challenge.
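To make the batching idea concrete, here is a minimal Java sketch (Java 16+ for records) that buffers updates bound for a remote observer and, at flush time, keeps only the most recent pending write for each znode path. `ObserverBatcher` and `Update` are hypothetical names, not ZooKeeper classes, and the sketch deliberately ignores zxids, ZAB ordering guarantees and operations other than plain overwrites (creates, deletes), which is where the real difficulty of this project lies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;

// Hypothetical sketch: coalesce overwrites per znode path before shipping a
// batch to a remote observer. Not ZAB-aware; ordering across paths is only
// preserved for each path's latest write.
final class ObserverBatcher {

    /** One pending overwrite of a znode's data (hypothetical type). */
    record Update(String path, byte[] data) {}

    // Insertion-ordered so the batch follows the order of each path's latest write.
    private final LinkedHashMap<String, Update> pending = new LinkedHashMap<>();
    private int received = 0;

    /** Queue an update; an earlier pending update to the same path is discarded. */
    synchronized void enqueue(Update u) {
        received++;
        pending.remove(u.path());   // re-insert so the path moves to the tail
        pending.put(u.path(), u);
    }

    /** Drain the coalesced batch that would be sent over the WAN link. */
    synchronized List<Update> flush() {
        List<Update> batch = new ArrayList<>(pending.values());
        pending.clear();
        received = 0;
        return batch;
    }

    /** Number of updates enqueued since the last flush (before coalescing). */
    synchronized int receivedSinceFlush() {
        return received;
    }
}
```

For example, enqueueing writes to `/a`, `/b` and then `/a` again yields a two-element batch, `/b` followed by the second `/a` write, so one of the three messages never crosses the WAN link.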
+ ----
  === FUSE module for BookKeeper ===
  ==== Possible Mentor ====
  Ben Reed (breed at apache dot org) & Patrick Hunt (phunt at apache dot org)
@@ -27, +30 @@

  BookKeeper is a distributed write-ahead log with client & server written in 
Java. The BookKeeper client & server also use ZooKeeper. There is a BookKeeper API 
that clients can use to integrate write-ahead logging into their application. 
It would be a lot easier if applications could use BK without any changes, 
through a file system API (FUSE). The project would involve implementing a C 
interface for BookKeeper (a Java one already exists) and implementing the FUSE 
module.
  
  Example use: MySQL's write-ahead logs, called binlogs, are typically 
written to the local filesystem using the standard filesystem API. We could modify 
MySQL to use BookKeeper; however, with a BK FUSE module we could run MySQL 
unmodified and get the performance/reliability of a 
distributed write-ahead log.
+ ----
+ === Web-based Administrative Interface ===
+ ==== Possible Mentor ====
+ Henry Robinson (henry at apache dot org)
+ ==== Requirements ====
+ Modern web platform - e.g. Django. Some design or UI skills would help. Java 
for adding methods to ZooKeeper. 
+ ==== Description ====
+ ZooKeeper is a complex distributed system. Understanding how well it is 
running is tremendously important. Patrick Hunt has created a Django-based 
dashboard (see http://github.com/phunt/zookeeper_dashboard#readme) that allows 
some insight into how ZooKeeper is running. This is a great foundation on which 
to build; however, there are improvements that could be made! This project would 
capture much more information from ZooKeeper, adding hooks to retrieve it where 
necessary, and visualise it in an appealing and useful way. Integration with 
[[http://ganglia.sourceforge.net/|Ganglia]] would be a definite plus. 
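One likely data source for such a dashboard is ZooKeeper's existing "four-letter word" commands (such as `stat`), which a client sends as plain text to the server's client port. The sketch below fetches a reply and parses its top-level `Key: value` lines into a map that a UI could chart. The class name, the 5-second timeouts and the assumed reply layout are illustrative; the exact output differs between ZooKeeper versions, so a real collector should tolerate missing or extra fields.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for collecting server metrics via four-letter words.
final class ZkStat {

    /** Sends a four-letter command (e.g. "stat") and returns the raw reply. */
    static String fourLetterWord(String host, int port, String cmd) throws IOException {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), 5000); // arbitrary 5 s connect timeout
            s.setSoTimeout(5000);                               // arbitrary 5 s read timeout
            OutputStream out = s.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // The server writes its reply and then closes the connection.
            return new String(s.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Parses top-level "Key: value" lines; indented per-client lines are skipped. */
    static Map<String, String> parse(String reply) {
        Map<String, String> metrics = new LinkedHashMap<>();
        for (String line : reply.split("\n")) {
            if (line.isEmpty() || Character.isWhitespace(line.charAt(0))) continue;
            int sep = line.indexOf(": ");
            if (sep < 0) continue; // e.g. a bare "Clients:" header
            metrics.put(line.substring(0, sep).trim(), line.substring(sep + 2).trim());
        }
        return metrics;
    }
}
```

For instance, `ZkStat.parse(ZkStat.fourLetterWord("localhost", 2181, "stat")).get("Mode")` would report whether the server is the leader, a follower or standalone.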
  
+ ----
+ === Failure Detector Module ===
+ ==== Possible Mentor ====
+ Henry Robinson (henry at apache dot org)
+ ==== Requirements ====
+ Java, some distributed systems knowledge, comfort implementing distributed 
systems protocols
+ ==== Description ====
+ A ZooKeeper server detects the failure of other servers and clients by 
counting the number of 'ticks' for which it doesn't receive a heartbeat from them. 
This is the 'timeout' method of failure detection and it works very 
well; however, it may be too aggressive, and not easily tuned, for 
some more unusual ZooKeeper installations (such as in a wide-area network, or 
even in a mobile ad-hoc network).
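The tick-counting scheme amounts to a very small state machine. The sketch below is a hypothetical illustration, not ZooKeeper's actual code: a peer becomes suspected once it has gone more than a fixed number of ticks without a heartbeat, a limit roughly analogous to the `syncLimit` setting, which ZooKeeper expresses in units of `tickTime`.

```java
// Hypothetical sketch of tick-counting timeout detection: a peer is suspected
// once it has missed more than `limitTicks` consecutive ticks without a heartbeat.
final class TickTimeoutDetector {
    private final int limitTicks;
    private int ticksSinceHeartbeat = 0;

    TickTimeoutDetector(int limitTicks) {
        this.limitTicks = limitTicks;
    }

    /** Any message from the peer resets the counter. */
    void onHeartbeat() {
        ticksSinceHeartbeat = 0;
    }

    /** Called once per tick; returns true if the peer is now suspected. */
    boolean onTick() {
        ticksSinceHeartbeat++;
        return isSuspected();
    }

    boolean isSuspected() {
        return ticksSinceHeartbeat > limitTicks;
    }
}
```

The single integer limit is the only tuning knob, which is what makes this approach hard to adapt to links whose latency varies widely.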
+ 
+ This project would abstract the notion of failure detection to a dedicated 
Java module, and implement several failure detectors to compare and contrast 
their appropriateness for ZooKeeper. For example, 
[[http://incubator.apache.org/cassandra/|Apache Cassandra]] uses a phi-accrual 
failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which is much more 
tunable and has some very interesting properties. This is a great project if 
you are interested in distributed algorithms, or want to help refactor some of 
ZooKeeper's internal code. 
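A pluggable module could be as small as an interface with two methods. The sketch below pairs a hypothetical `FailureDetector` interface with a simplified phi-accrual detector: instead of a fixed timeout it outputs a continuous suspicion level phi, and the caller picks the threshold. Note that it models heartbeat inter-arrival times as exponentially distributed with the observed mean, a simplification of the normal-distribution model in the linked paper; the window size and threshold are illustrative parameters.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Pluggable failure-detector interface (hypothetical; not an existing ZooKeeper API). */
interface FailureDetector {
    void heartbeat(long nowMillis);
    boolean isSuspected(long nowMillis);
}

/**
 * Simplified phi-accrual detector. Inter-arrival times are modelled as exponential
 * with the sample mean, so P_later(t) = exp(-t / mean) and
 * phi = -log10(P_later(t)) = t / (mean * ln 10).
 */
final class PhiAccrualDetector implements FailureDetector {
    private final int windowSize;      // number of recent intervals kept
    private final double threshold;    // suspect the peer once phi exceeds this
    private final Deque<Long> intervals = new ArrayDeque<>();
    private long intervalSum = 0;
    private long lastHeartbeat = -1;

    PhiAccrualDetector(int windowSize, double threshold) {
        this.windowSize = windowSize;
        this.threshold = threshold;
    }

    @Override
    public void heartbeat(long nowMillis) {
        if (lastHeartbeat >= 0) {
            long interval = nowMillis - lastHeartbeat;
            intervals.addLast(interval);
            intervalSum += interval;
            // Sliding window: drop the oldest interval once the window is full.
            if (intervals.size() > windowSize) intervalSum -= intervals.removeFirst();
        }
        lastHeartbeat = nowMillis;
    }

    /** Suspicion level; 0 until at least one interval has been observed. */
    double phi(long nowMillis) {
        if (intervals.isEmpty()) return 0.0;
        // Floor the mean at 1 ms to avoid dividing by zero.
        double mean = Math.max(1.0, (double) intervalSum / intervals.size());
        return (nowMillis - lastHeartbeat) / (mean * Math.log(10.0));
    }

    @Override
    public boolean isSuspected(long nowMillis) {
        return phi(nowMillis) > threshold;
    }
}
```

With heartbeats every second, phi after one silent second is about 0.43, and it crosses a threshold of 3.0 after roughly 7 silent seconds (3 * ln 10, about 6.9 mean intervals).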
+ 
+ ----
+ === Concurrent Primitives Library ===
+ ==== Possible Mentor ====
+ Henry Robinson (henry at apache dot org)
+ ==== Requirements ====
+ Java / C / Python / Ruby
+ ==== Description ====
+ ZooKeeper is very powerful, but sometimes a bit hard to use. This project 
will create a library of concurrent primitives such as locks in many varieties 
(RW, MRSW, MRMW), concurrent queues, barriers, latches and semaphores that 
application developers can use to more easily take advantage of ZooKeeper's 
power. Writing distributed systems code is hard, and therefore should be done 
at most once!
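As an example of the kind of logic such a library would wrap, ZooKeeper's documented lock recipe has each client create an `EPHEMERAL_SEQUENTIAL` child under a lock znode, list the children, and hold the lock if its own node has the lowest sequence number; otherwise it watches only the node immediately before its own, so a release wakes a single waiter rather than all of them. The sketch below shows just that decision step, with the actual ZooKeeper calls left out; `LockRecipe` and `nodeToWatch` are hypothetical names, and it assumes children carry ZooKeeper's 10-digit sequence suffix (e.g. `lock-0000000007`).

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the decision step in the ZooKeeper lock recipe.
final class LockRecipe {

    /** Sequence number from a name such as "lock-0000000007" (10-digit suffix). */
    static long sequenceOf(String child) {
        return Long.parseLong(child.substring(child.length() - 10));
    }

    /**
     * Given the children of the lock znode and the name of our own ephemeral
     * sequential node, returns the child to watch, or empty if we hold the lock.
     */
    static Optional<String> nodeToWatch(List<String> children, String ours) {
        long mine = sequenceOf(ours);
        String predecessor = null;
        long best = -1;
        for (String c : children) {
            long seq = sequenceOf(c);
            // Track the highest sequence number that is still below ours.
            if (seq < mine && seq > best) {
                best = seq;
                predecessor = c;
            }
        }
        return Optional.ofNullable(predecessor);
    }
}
```

If `nodeToWatch` returns a node, the client would call `exists` on it with a watch and re-run the check when the watch fires (or immediately, if `exists` reports the node is already gone).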
+ 
+ This project would contribute a solid library in at least one of our 
supported languages. It is possible that improvements to the client library 
will be required - e.g. wrapper code to retry failed RPC calls. That falls 
under the scope of this project!
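A retry wrapper of the kind mentioned above could start as small as the sketch below: it re-runs an operation on a transient failure with exponential backoff and gives up after a fixed number of attempts. `Retry` and `TransientException` are hypothetical names; in the Java client the closest real counterpart of the latter is `KeeperException.ConnectionLossException`.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper for client operations that may fail transiently.
final class Retry {

    /** Hypothetical stand-in for a transient client error such as a connection loss. */
    static final class TransientException extends Exception {
        TransientException(String msg) {
            super(msg);
        }
    }

    /**
     * Runs op, retrying on TransientException up to maxAttempts times in total,
     * sleeping baseDelayMillis, then 2x, 4x, ... between attempts.
     * Any other exception propagates immediately.
     */
    static <T> T withRetries(Callable<T> op, int maxAttempts, long baseDelayMillis) throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        long delay = baseDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (TransientException e) {
                if (attempt >= maxAttempts) throw e; // out of attempts: surface the error
                Thread.sleep(delay);
                delay *= 2;
            }
        }
    }
}
```

Blind retrying is only safe for idempotent operations. After a connection loss the client cannot tell whether, say, a sequential `create` was applied, so retrying it may leave a duplicate node behind; handling cases like this correctly is part of what makes the library worth writing.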
+ 
+ This would be a very interesting project to work on, as it will directly 
influence the evolution of the ZooKeeper API when you discover things that the 
current API makes needlessly difficult. 
+ 
+ See 
http://www.cloudera.com/blog/2009/05/building-a-distributed-concurrent-queue-with-apache-zookeeper/
 for a detailed example of how to build a distributed queue. 
+ 
+ ----
+ === ZooKeeper DNS Server ===
+ ==== Possible Mentor ====
+ Henry Robinson (henry at apache dot org)
+ ==== Requirements ====
+ Java or Python or C
+ ==== Description ====
+ Although ZooKeeper is primarily used for co-ordination of distributed 
processes, its consistency semantics mean that it's a good candidate for 
serving small (key,value) records as well. The Domain Name Service has similar 
requirements, raising the interesting question of whether ZooKeeper would be a 
capable DNS server for your local network. One intriguing possibility is having 
''versioned'' DNS records, such that known-good configurations can be stored 
and rolled back to in the case of an issue.  If this versioning primitive 
proves to be useful, it's easy to imagine other types of configuration that 
could be stored.
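ZooKeeper already attaches a version number to every znode (the `version` field of its `Stat`), but the server only keeps the current data, so rolling back would require storing the history explicitly, for instance as sequential child znodes. The hypothetical in-memory model below shows the intended semantics: each update appends a version, and a rollback re-publishes a known-good version as the newest one rather than deleting history.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory model of versioned DNS records.
final class VersionedRecords {
    private final Map<String, List<String>> history = new HashMap<>();

    /** Stores a new value for name and returns its version number (0-based). */
    int put(String name, String value) {
        List<String> versions = history.computeIfAbsent(name, k -> new ArrayList<>());
        versions.add(value);
        return versions.size() - 1;
    }

    /** The value a DNS query for name would currently be answered with. */
    Optional<String> current(String name) {
        List<String> versions = history.get(name);
        return (versions == null || versions.isEmpty())
                ? Optional.empty()
                : Optional.of(versions.get(versions.size() - 1));
    }

    /** Re-publishes a known-good version as a new version; history is kept, not rewritten. */
    int rollback(String name, int version) {
        List<String> versions = history.get(name);
        if (versions == null || version < 0 || version >= versions.size()) {
            throw new IllegalArgumentException("no such version: " + name + "@" + version);
        }
        return put(name, versions.get(version));
    }
}
```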
+ 
+ This project would involve designing and building an RFC 1035-compliant DNS 
server and performing a detailed performance study against an existing 
simple DNS server such as tinydns. 
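Any RFC 1035 server starts by decoding the fixed 12-byte message header defined in section 4.1.1 of the RFC: a 16-bit ID, a 16-bit flags word and four 16-bit section counts, all in network byte order. The sketch below decodes it (Java 16+ for records); the record name and the subset of flag fields exposed are illustrative choices, and parsing of the question and answer sections is omitted.

```java
import java.nio.ByteBuffer;

/** The fixed 12-byte DNS message header (RFC 1035, section 4.1.1); hypothetical type name. */
record DnsHeader(int id, boolean isResponse, int opcode, boolean recursionDesired,
                 int rcode, int qdCount, int anCount, int nsCount, int arCount) {

    static DnsHeader parse(byte[] msg) {
        if (msg.length < 12) throw new IllegalArgumentException("DNS header is 12 bytes");
        ByteBuffer b = ByteBuffer.wrap(msg);   // big-endian (network order) by default
        int id = b.getShort() & 0xFFFF;
        int flags = b.getShort() & 0xFFFF;
        return new DnsHeader(
                id,
                (flags & 0x8000) != 0,         // QR: 0 = query, 1 = response
                (flags >>> 11) & 0xF,          // OPCODE: 0 = standard query
                (flags & 0x0100) != 0,         // RD: recursion desired
                flags & 0xF,                   // RCODE: 0 = no error, 3 = name error
                b.getShort() & 0xFFFF,         // QDCOUNT: entries in the question section
                b.getShort() & 0xFFFF,         // ANCOUNT: answer records
                b.getShort() & 0xFFFF,         // NSCOUNT: authority records
                b.getShort() & 0xFFFF);        // ARCOUNT: additional records
    }
}
```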
+ 
+ ----
+ === Read-only mode ===
+ ==== Possible Mentor ====
+ Henry Robinson (henry at apache dot org)
+ ==== Requirements ====
+ Java and TCP/IP networking
+ ==== Description ====
+ When a ZooKeeper server loses contact with over half of the other servers in 
an ensemble ('loses a quorum'), it stops responding to client requests because 
it cannot guarantee that writes will get processed correctly. For some 
applications, it would be beneficial if a server still responded to read 
requests when the quorum is lost, but caused an error condition when a write 
request was attempted. 
+ 
+ This project would implement a 'read-only' mode for ZooKeeper servers (maybe 
only for Observers) that allowed read requests to be served as long as the 
client can contact a server.
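At its core, read-only mode is a gate in front of request processing. The sketch below is a hypothetical illustration with an invented operation enum and exception type: while the server believes it has a quorum, everything is admitted; once the quorum is lost, reads still pass and writes are rejected with an error the client can recognise.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical request gate for a 'read-only' server mode.
final class ReadOnlyGate {

    /** Invented subset of client operations, for illustration only. */
    enum Op { GET_DATA, GET_CHILDREN, EXISTS, CREATE, DELETE, SET_DATA }

    private static final Set<Op> WRITES = EnumSet.of(Op.CREATE, Op.DELETE, Op.SET_DATA);

    /** Thrown instead of processing a write while the quorum is lost. */
    static final class NotWritableException extends RuntimeException {
        NotWritableException(Op op) {
            super(op + " rejected: server is read-only");
        }
    }

    private volatile boolean hasQuorum = true;

    /** Updated by the (omitted) quorum-tracking logic. */
    void setQuorum(boolean hasQuorum) {
        this.hasQuorum = hasQuorum;
    }

    /** Throws for writes when the quorum is lost; reads always pass. */
    void admit(Op op) {
        if (!hasQuorum && WRITES.contains(op)) throw new NotWritableException(op);
    }
}
```

Reads served this way may be stale, since a partitioned server cannot learn about writes committed by the majority, so clients would need to opt in to this weaker guarantee.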
+ 
+ This is a great project for getting really hands-on with the internals of 
ZooKeeper - you must be comfortable with Java and networking otherwise you'll 
have a hard time coming up to speed. 
+ 
