Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "HedWig" page has been changed by mahadevkonar.
http://wiki.apache.org/hadoop/HedWig?action=diff&rev1=8&rev2=9

--------------------------------------------------

  
  {{attachment:hedwig.png}}
  
- Now digging down into a region, it consists of a collection of '''hub 
servers'''. Hub servers aggregate messages published in a region and persist 
them. They also subscribe to hubs in other regions to listen for messages that 
their clients are subscribed to. Clients always subscribe only to  local hub 
servers. Hedwig plans to use [[http://hadoop.apache.org/zookeeper/][Zookeeper]] 
for persistence of metadata, and 
[[https://issues.apache.org/jira/browse/ZOOKEEPER-276][Bookkeeper]] for 
persistence of actual messages.
+ Digging down into a region, we find a collection of '''hub 
servers'''. Hub servers aggregate messages published in a region and persist 
them. They also subscribe to hubs in other regions to listen for messages that 
their clients are subscribed to. Clients always subscribe only to local hub 
servers. Hedwig plans to use [[http://hadoop.apache.org/zookeeper/ | 
Zookeeper]] for persistence of metadata, and 
[[https://issues.apache.org/jira/browse/ZOOKEEPER-276 | Bookkeeper]] for 
persistence of actual messages.
  
+ {{attachment:region.png}}
- <center>
- <img src="%ATTACHURLPATH%/region.png" width="500"/>
- </center>
- 
  
  Topics are randomly split over hubs. When the hub responsible for a topic 
fails, another hub should take over responsibility of the topic.
  
- <hline>
- 
  Now digging into a hub, it consists of 6 components:
  
+ {{attachment:hub.png}}
- <center>
- <img src="%ATTACHURLPATH%/hub.png" width="500"/>
- </center>
  
-    1 *Network-I/O component*: For high throughput, we use Java NIO through a 
framework called [[http://www.jboss.org/netty][Netty]].
+    1. '''Network-I/O component''': For high throughput, we use Java NIO 
through a framework called [[http://www.jboss.org/netty | Netty]].
-    1 *Topic Manager*: Maintains and coordinates ownership of topics among 
hubs. It is responsible for doing automatic failover when a hub dies. It will 
be a Zookeeper client.
+    1. '''Topic Manager''': Maintains and coordinates ownership of topics 
among hubs. It is responsible for doing automatic failover when a hub dies. It 
will be a Zookeeper client.
-    1 *Subscription Manager*: Maintains information about which subscriptions 
exist in the system, and their _consume-points_: the point in the topic until 
which a subscriber has already received and acknowledged, so that the next 
delivery of a message can start from this point.
+    1. '''Subscription Manager''': Maintains information about which 
subscriptions exist in the system, and their '''consume-points''': the point in 
the topic until which a subscriber has already received and acknowledged, so 
that the next delivery of a message can start from this point.
-    1 *Persistence Manager*: Persists messages in a reliable way so that they 
can be retrieved sequentially later. It will be a Bookkeeper client.
+    1. '''Persistence Manager''': Persists messages in a reliable way so that 
they can be retrieved sequentially later. It will be a Bookkeeper client.
-    1 *Remote Subscriber*: Subscribes to hubs in other regions for the topics, 
so that the messages published there can also reach the local clients. This 
component will be our own Hedwig Java client.
+    1. '''Remote Subscriber''': Subscribes to hubs in other regions for the 
topics, so that the messages published there can also reach the local clients. 
This component will be our own Hedwig Java client.
-    1 *Delivery Manager*: This component will be responsible for delivering 
messages to the subscribers. The subscribers can be either our own local 
clients or the hub in other regions.
+    1. '''Delivery Manager''': This component will be responsible for 
delivering messages to the subscribers. The subscribers can be either our own 
local clients or hubs in other regions.
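
The consume-point bookkeeping described for the Subscription Manager can be pictured with a small sketch. This is only an illustration under stated assumptions: the class name `ConsumePointTracker`, its methods, and the choice of sequence numbers starting at 1 are hypothetical and are not taken from the Hedwig code base. The sketch also keeps state in memory, whereas the page says Hedwig plans to persist metadata in Zookeeper.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of per-subscriber consume-point tracking; not Hedwig's actual API. */
final class ConsumePointTracker {
    // Key: "topic/subscriberId"; value: highest sequence number acknowledged so far.
    private final Map<String, Long> consumePoints = new ConcurrentHashMap<>();

    private static String key(String topic, String subscriberId) {
        return topic + "/" + subscriberId;
    }

    /** Records an acknowledgement; the consume-point never moves backwards. */
    void acknowledge(String topic, String subscriberId, long seqId) {
        consumePoints.merge(key(topic, subscriberId), seqId, Long::max);
    }

    /** Returns the sequence number at which the next delivery should start (assumes numbering from 1). */
    long nextDeliveryPoint(String topic, String subscriberId) {
        return consumePoints.getOrDefault(key(topic, subscriberId), 0L) + 1;
    }
}
```

Keeping the maximum acknowledged sequence number means that a late or duplicated acknowledgement cannot cause already-consumed messages to be redelivered.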
  
- ---+++ Code-Level Design & Details
+ === Code-Level Design & Details ===
  
  The Hedwig project consists of 3 modules:
  
-    * 
[[http://svn.corp.yahoo.com/view/yahoo/yrl/hedwig/trunk/hedwig/protocol/src/main/protobuf/PubSubProtocol.proto?view=markup][Protocol]]:
 This is a simple module that specifies the client-server protocol used by 
Hedwig. The protocol is specified as a 
[[http://code.google.com/p/protobuf/][Protocol Buffers]] file. Protocol buffers 
can generate serialization and deserialization code for us in multiple 
languages.
+    * [[ 
http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/src/contrib/hedwig/protocol/src/main/protobuf/PubSubProtocol.proto?view=markup
 | Protocol]]: This is a simple module that specifies the client-server 
protocol used by Hedwig. The protocol is specified as a 
[[http://code.google.com/p/protobuf/ | Protocol Buffers]] file. Protocol 
buffers can generate serialization and deserialization code for us in multiple 
languages.
-    * [[HedwigClient][Client]]: We support both a c++ and a java client 
library. The client module obviously depends on the protocol module. 
+    * [[HedwigClient | Client]]: We support both a C++ and a Java client 
library. The client module depends on the protocol module. 
-    * [[HedwigServer][Server]]: This the major chunk of the system. A server 
is responsible for certain topics, and accepting publish and subscribe requests 
for them. The server uses (and hence depends on) the hedwig java client to 
subscribe to topics in other regions.   
+    * [[HedwigServer | Server]]: This is the major chunk of the system. A server 
is responsible for certain topics, and accepts publish and subscribe requests 
for them. The server uses (and hence depends on) the Hedwig Java client to 
subscribe to topics in other regions.
  
- <blockquote>
  Get started by checking out the repository:
- =svn+ssh://svn.corp.yahoo.com/yahoo/yrl/hedwig/trunk/hedwig=
+ =====  https://svn.apache.org/repos/asf/hadoop/zookeeper/trunk =====
+ Follow the directions in the BUILD.txt in src/contrib/hedwig
  
- Follow the directions in the BUILD.txt
- </blockquote>
+ The following '''coding conventions''' are to be followed:
+    1. Indentation only with 4 spaces, no tabs (adjustable using Eclipse 
settings).
+    1. Use curly braces even for single-line ifs and elses. I am sure you have 
encountered at least 1 bug in your life because the else got associated with 
the wrong if.
+    1. No System.outs (only logging)
+    1. Javadoc (even if brief) for every class. No author tags.
+    1. We are building a high-throughput server. Watch out for the following 
performance crippling gotchas:
+       1. If you are logging something that requires string concatenation, 
even if it is at debug level, you pay the price of concatenation. Hence 
surround such log statements with an =if(isDebugEnabled())= block.
+       1. Use !StringBuilder while building long strings, don't just use +'s.
+    1. Remember that we are multithreaded, which is a double-edged sword.
+       1. Protect your data structures where needed with a lock.
+       1. However, locking needs to be well-understood and documented. Please 
bring it up in the design discussions. 
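
The performance and locking conventions above can be illustrated with a short sketch. It is not taken from the Hedwig sources: the class `ConventionExamples` and its methods are hypothetical, and it uses `java.util.logging` so that it stays self-contained, which means the guard is `isLoggable(Level.FINE)` rather than the `isDebugEnabled()` call named in the conventions.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Illustrative only: names here are hypothetical, not from the Hedwig code base. */
final class ConventionExamples {
    private static final Logger LOG = Logger.getLogger(ConventionExamples.class.getName());

    private final Object lock = new Object(); // guards deliveredCount
    private long deliveredCount = 0;

    /** Builds a comma-separated string with StringBuilder instead of repeated '+'. */
    static String joinTopics(List<String> topics) {
        StringBuilder sb = new StringBuilder();
        for (String topic : topics) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(topic);
        }
        return sb.toString();
    }

    /** Counts a delivery under the lock and logs only if debug-level logging is enabled. */
    long recordDelivery(String topic, long seqId) {
        long count;
        synchronized (lock) {
            count = ++deliveredCount;
        }
        // Guard so the concatenation below is skipped unless the message will be logged.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Delivered " + topic + "#" + seqId + ", total=" + count);
        }
        return count;
    }
}
```

Note that the log call sits outside the synchronized block, so the lock is held only for the counter update.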
  
+ === Test ===
- The following *coding conventions* are to be followed:
-    1 Indentation only with 4 spaces, no tabs (adjustable using Eclipse 
settings).
-    1 Use curly braces even for single-line ifs and elses. I am sure you have 
encountered at least 1 bug in your life because the else got associated with 
the wrong if.
-    1 No System.outs (only logging)
-    1 Javadoc (even if brief) for every class. No author tags.
-    1 We are building a high-throughput server. Watch out for the following 
performance crippling gotchas:
-       * If you are logging something that requires string concatenation, even 
if it is at debug level, you pay the price of concatenation. Hence surround by 
=if(isDebugEnabled())= block
-       * Use !StringBuilder while building long strings, don't just use +'s.
-    1 Remember that we are multithreaded which is a double-edged sword.
-       * Protect your data structures where needed with a lock.
-       * However locking needs to be well-understood and documented. Please 
bring it up in the design discussions. 
- 
- ---+++ Test
  
  Every class that has a non-abstract method must have a test case. The Maven 
directory structure is already set up so that tests are easy to write and run.
   
- ---+++ Current Tasks
  
- ---++++ Protocol (Ben)
- 
-    1 use region name as subscriberID for hub subscribers. This needs only an 
explanatory comment
-    1 Put a transaction-id in publishes. This can help us in firing off 
multiple publishes at a time.
-    1 Put a version string in every request. 
-    1 Figure out our authentication and access control story. It's important 
to have one.
- 
- ---++++ Server
- 
-    1 Design a push interface for reading rather than the current pull-based 
(Utkarsh)
-    1 How do topics get created? Is it on the first subscription? Who gets the 
ownership for the topic, and how do we ensure that this is load balanced across 
the hubs? (Brian)
-    1 Figure out a rich admin interface and how to implement it with the least 
effort - Possibly jmx (Erwin)
-    1 Cross-region subscription problem (and hence log collection problem): 
How do we avoid making the cross-region call when we receive a local 
subscription for a brand new topic. If we don't make the cross-region call, 
then log collection must be performed only after all regions agree that no one 
is interested. (Adam)
-    1 Designing the delivery manager, especially the management of readahead 
of messages for subscribers, and caching the recently published messages. While 
designing this interface, keep it pluggable to different policies. 
-    1 Consider whether bookkeeper client library needs to be written using 
asynchronous network I/O. IOW, is 1 thread per "bookie" acceptable?
- 
- ---+++ Notes
- 
-    * [[HedwigTopicManagement][Topic management]]
- 
