Hi,

In classic Hama mode, the client works directly with an already existing ZooKeeper. "ZooKeeperSyncServerImpl.java" is used for launching a syncServer (= ZooKeeper server) on a YARN cluster.
If a node fails, the job will be restarted from the last checkpoint position.

P.S. Fault tolerance (FT) does not work perfectly yet.

On Fri, May 10, 2013 at 1:51 AM, kishore g <[email protected]> wrote:
> Thanks Edward,
>
> I looked at the code and it looks like it's nicely abstracted. I see some
> comments in the code that say this happens only in YARN. Can you give me
> some additional info on what is the difference when running with YARN.
>
> Another thing I wanted to check is what happens when a node fails, is the
> entire job restarted or just super step or just the sub task of the super
> step. I am interested in the current behavior and what would be nice to
> have.
>
> Is there a document that describes the internal architecture.
>
> Thanks,
> Kishore G
>
> On Wed, May 8, 2013 at 6:21 PM, Edward J. Yoon <[email protected]> wrote:
>
>> Hi,
>>
>> This would be great collaboration. Since we pursue the pluggable
>> interfaces for managing the synchronization[1], messenger, and job
>> scheduling systems (we want to preserve the classic (standalone)
>> cluster mode, while integrating with resource manager systems), the
>> integration with Helix won't be difficult.
>>
>> 1. http://wiki.apache.org/hama/SyncService
>>
>> On Thu, May 9, 2013 at 7:01 AM, kishore g <[email protected]> wrote:
>> > Hello,
>> >
>> > I am starting a discussion thread on potential pros/cons of using Helix in
>> > Hama. I don't know the internal details of Hama, so please correct me if
>> > something does not make sense.
>> >
>> > My source of information is http://wiki.apache.org/hama/Architecture and a
>> > brief chat with Suraj at ApacheCon where he described the need for barriers
>> > between super steps.
>> >
>> > Please read about Apache Helix here http://helix.incubator.apache.org/.
>> >
>> > Architecture wise Helix maps pretty well with the components in Hama.
>> > HelixController can be wrapped inside BSPMaster and GroomServer is the
>> > PARTICIPANT in Helix terminology that wraps Helix Agent.
>> >
>> > The partitioning and assigning tasks to GroomServers can be done via Helix
>> > Apis, it basically boils down to setting the idealstate for a particular
>> > stage. Starting of the next step which basically depends on all tasks in
>> > previous step being completed can be done by watching the ExternalView.
>> >
>> > In the architecture wiki, I see that there is plan to integrate with
>> > Zookeeper for fault tolerance. Helix internally uses Zookeeper to store the
>> > cluster state. So it might make it easier to make the tasks fault tolerant
>> > and probably restartable as well at a task level instead of job/stage level.
>> >
>> > We recently added a recipe in Helix to demonstrate the concept of
>> > dependency between resources.
>> >
>> > http://helix.incubator.apache.org/recipes/task_dag_execution.html
>> > Code:
>> > https://github.com/apache/incubator-helix/tree/master/recipes/task-execution/src/main/java/org/apache/helix/taskexecution
>> >
>> > Let me know your thoughts.
>> >
>> > thanks,
>> > Kishore G
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon

--
Best Regards, Edward J. Yoon
@eddieyoon
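
For illustration only: the behaviour described in the reply at the top of this thread (on a node failure, the whole job resumes from the last checkpoint rather than from the failed superstep or task) can be sketched as a small simulation. This is not Hama's code. The class name, the `run` method, the checkpoint interval, and the single injected failure are all hypothetical, and the sketch assumes a checkpoint is taken after every `checkpointInterval` completed supersteps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of job-level restart from the last checkpoint.
// It does not use any Hama, ZooKeeper, or Helix API.
final class CheckpointRestartSketch {

    /**
     * Simulates a job of totalSupersteps supersteps.
     * A checkpoint is recorded after every checkpointInterval completed supersteps.
     * A single node failure is injected just before superstep failAtSuperstep
     * (pass -1 for no failure). Returns the superstep numbers actually executed,
     * in order, including any that were re-run after the rollback.
     */
    static List<Integer> run(int totalSupersteps, int checkpointInterval, int failAtSuperstep) {
        List<Integer> executed = new ArrayList<>();
        int lastCheckpoint = 0;          // superstep to resume from after a failure
        boolean failureInjected = false; // the failure happens only once
        int step = 0;

        while (step < totalSupersteps) {
            if (step == failAtSuperstep && !failureInjected) {
                failureInjected = true;
                step = lastCheckpoint;   // roll the whole job back to the checkpoint
                continue;
            }
            executed.add(step);          // "run" the superstep, then pass the barrier
            step++;
            if (step % checkpointInterval == 0) {
                lastCheckpoint = step;   // state up to (not including) step is saved
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        // Six supersteps, checkpoint every two, failure before superstep 3.
        System.out.println(run(6, 2, 3)); // prints [0, 1, 2, 2, 3, 4, 5]
    }
}
```

In this run only superstep 2 is repeated, because the checkpoint taken after superstep 1 bounds the rollback. Restarting at task level, as raised in the quoted thread, would instead re-run only the failed task's share of the superstep.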
