In that case I suppose we can simply revert Superstep to the original plain bsp function and sideline any FT-related issues for the moment.

On 10 April 2014 13:24, Edward J. Yoon <[email protected]> wrote:
> As you know, we are still NOT supporting FT job processing, and
> there's no documentation. I might be wrong but we can *simply* restart
> whole tasks from the last checkpoint files on HDFS.
>
> It has been many years since we discussed FT and the superstep
> API. And the main contributors of FT job processing are currently
> inactive.
>
> May I close all old issue tickets? Let's just code it.
>
>
>
> On Thu, Apr 10, 2014 at 2:31 AM, Chia-Hung Lin <[email protected]> wrote:
>> That's why I proposed to use the Superstep api instead, though I prefer
>> the plain bsp function. Unless we want to instrument the source code,
>> which I believe is not what we, including users, want.
>>
>> With the Superstep api we can resume the message from the latest (the new
>> refactored code should be based on this as well) checkpointed message,
>> under some precondition.
>>
>> Alternatively we can implement our own code (not Java, or probably in
>> Java 8) to perform checkpointing, but that would take a very long time
>> to accomplish. I would put that issue in the future
>> roadmap because personally I prefer the plain bsp function instead of
>> Superstep.
>>
>>
>> On 9 April 2014 23:56, Suraj Menon <[email protected]> wrote:
>>> I don't like my patch in HAMA-639 myself, even though I believe it satisfies
>>> all the mentioned requirements. The usage of the superstep chaining API
>>> implementation in the patch is too complicated. A superstep here is like a
>>> transformation function you define on an RDD in Spark. So if you look into
>>> the FT design of Spark, on failure, they rerun the operations on the RDD to get
>>> to the current state. This is similar to what we have in mind using
>>> checkpointing. The challenge is in getting the same messages replayed to
>>> a newly spawned task on checkpointed data. If you don't use the Superstep (or
>>> any other abstraction representing a function) you cannot start processing
>>> from the line of code where the failure occurred. (Java does not support goto
>>> line number.)
>>>
>>> -Suraj
>>>
>>>
>>> On Wed, Apr 9, 2014 at 7:29 AM, Edward J. Yoon <[email protected]> wrote:
>>>
>>>> I just found this: https://issues.apache.org/jira/browse/HAMA-503 and
>>>> HAMA-639.
>>>>
>>>> Do you still think the superstep API is essential for checkpoint/recovery?
>>>> If not, we can drop it. I don't think it's a good idea.
>>>>
>>>> On Wed, Apr 9, 2014 at 7:43 PM, Chia-Hung Lin <[email protected]> wrote:
>>>> > Not very sure if we are on the same page. And sorry, I am not very
>>>> > familiar with the Superstep implementation.
>>>> >
>>>> > I assume that the traditional bsp model means the original bsp interface
>>>> > where there is a bsp function and the user can freely call peer.sync(),
>>>> > etc. methods
>>>> >
>>>> > .... bsp(BSPPeer ... peer) {
>>>> >   // whatever computation
>>>> >   peer.sync();
>>>> > }
>>>> >
>>>> > And the superstep style is with the Superstep abstract class.
>>>> >
>>>> > If this is the case, SuperstepBSP.java already calls sync, as
>>>> > below, outside each Superstep.compute(). So it looks like even though
>>>> > SuperstepPiEstimator doesn't call the sync() method, a barrier sync will be
>>>> > executed because each Superstep is viewed as a superstep in the original
>>>> > BSP definition.
>>>> >
>>>> >   @Override
>>>> >   public void bsp(BSPPeer<K1, V1, K2, V2, M> peer) throws IOException,
>>>> >       SyncException, InterruptedException {
>>>> >     for (int index = startSuperstep; index < supersteps.length; index++) {
>>>> >       Superstep<K1, V1, K2, V2, M> superstep = supersteps[index];
>>>> >       superstep.compute(peer);
>>>> >       if (superstep.haltComputation(peer)) {
>>>> >         break;
>>>> >       }
>>>> >       peer.sync();
>>>> >       startSuperstep = 0;
>>>> >     }
>>>> >   }
>>>> >
>>>> > Within Superstep.compute(), if sync is called again, I would think
>>>> > that another barrier sync will be executed.
>>>> >
>>>> > SuperstepBSP.java
>>>> >
>>>> > for(...) {
>>>> >   superstep.compute() -> { // in compute method
>>>> >     ...
>>>> >     peer.sync()
>>>> >   }
>>>> >   ...
>>>> >   peer.sync()
>>>> > }
>>>> >
>>>> > IIRC each call to sync may trigger the checkpoint (no recovery) method,
>>>> > which serializes messages to HDFS.
>>>> >
>>>> > For SerializePrinting, the following code snippet may move
>>>> >
>>>> > for (String otherPeer : bspPeer.getAllPeerNames()) {
>>>> >   bspPeer.send(otherPeer, new IntegerMessage(bspPeer.getPeerName(), i));
>>>> > }
>>>> >
>>>> > to Superstep.compute()
>>>> >
>>>> > And the outer for loop is what is programmed in SuperstepBSP.java
>>>> >
>>>> > for (int i = 0; i < NUM_SUPERSTEPS; i++) {
>>>> >   // code that should be moved to Superstep.compute()
>>>> > }
>>>> > bspPeer.sync();
>>>> >
>>>> >
>>>> >
>>>> > On 9 April 2014 16:17, Edward J. Yoon <[email protected]> wrote:
>>>> >> As you can see here[1], the sync() method is never called, and the classes
>>>> >> of all supersteps need to be declared within the Job configuration.
>>>> >> Therefore, I thought it's similar to the Pregel style on the BSP model. It's
>>>> >> quite different from the legacy model in my eyes.
>>>> >>
>>>> >> According to HAMA-505, the superstep API seems to be used for FT job processing
>>>> >> (I didn't read closely yet). Right? Here, I have a few questions. What
>>>> >> happens if I call the sync() method within the compute() method? In this
>>>> >> case, does the framework guarantee checkpoint/recovery? And how can I
>>>> >> implement http://wiki.apache.org/hama/SerializePrinting using the
>>>> >> superstep API?
>>>> >>
>>>> >>> What's the difference between pure BSP and FT BSP? Any concrete example?
>>>> >>
>>>> >> I meant the traditional BSP programming model.
>>>> >>
>>>> >> 1. http://svn.apache.org/repos/asf/hama/trunk/examples/src/main/java/org/apache/hama/examples/SuperstepPiEstimator.java
>>>> >>
>>>> >> On Wed, Apr 9, 2014 at 4:25 PM, Chia-Hung Lin <[email protected]> wrote:
>>>> >>> Sorry, I don't catch the point.
>>>> >>>
>>>> >>> What's the difference between pure BSP and FT BSP? Any concrete example?
>>>> >>>
>>>> >>>
>>>> >>> On 9 April 2014 08:29, Edward J. Yoon <[email protected]> wrote:
>>>> >>>> In my eyes, SuperstepPiEstimator[1] looks like a totally new programming
>>>> >>>> model, very similar to Pregel.
>>>> >>>>
>>>> >>>> I personally would like to suggest that we provide both the pure BSP and
>>>> >>>> the fault tolerant BSP model, instead of replacing one with the other.
>>>> >>>>
>>>> >>>> 1. http://svn.apache.org/repos/asf/hama/trunk/examples/src/main/java/org/apache/hama/examples/SuperstepPiEstimator.java
>>>> >>>>
>>>> >>>> --
>>>> >>>> Edward J. Yoon (@eddieyoon)
>>>> >>>> Chief Executive Officer
>>>> >>>> DataSayer, Inc.
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Edward J. Yoon (@eddieyoon)
>>>> >> CEO at DataSayer Co., Ltd.
>>>>
>>>>
>>>>
>>>> --
>>>> Edward J. Yoon (@eddieyoon)
>>>> Chief Executive Officer
>>>> DataSayer Co., Ltd.
>>>>
>
>
>
> --
> Edward J. Yoon (@eddieyoon)
> Chief Executive Officer
> DataSayer Co., Ltd.
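
For readers of the thread, here is a minimal sketch of Suraj's point that a chain of superstep functions can be resumed at a step boundary, while a single bsp function cannot be re-entered at an arbitrary line. It is not Hama code: `Step`, `run`, and `SuperstepResumeSketch` are hypothetical names, the list stands in for real work, and a thrown exception stands in for a task failure. Only the idea of looping from a saved start index is taken from the SuperstepBSP loop quoted above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only; none of these names belong to the Hama API. */
class SuperstepResumeSketch {

    /** Hypothetical stand-in for the body of one superstep's compute(). */
    interface Step {
        void compute(List<String> out);
    }

    /**
     * Runs steps[start], steps[start + 1], ... in order.
     * Returns steps.length if all steps finished, otherwise the index of
     * the step that failed, so a restarted task can resume from there.
     */
    static int run(Step[] steps, int start, List<String> out) {
        for (int i = start; i < steps.length; i++) {
            try {
                steps[i].compute(out);
            } catch (RuntimeException e) {
                return i; // simulated failure: remember where to resume
            }
            // A real framework would do its barrier sync and checkpoint here.
        }
        return steps.length;
    }

    public static void main(String[] args) {
        boolean[] failOnce = {true}; // makes step 1 fail on its first attempt only
        Step[] steps = {
            out -> out.add("step0"),
            out -> {
                if (failOnce[0]) {
                    failOnce[0] = false;
                    throw new RuntimeException("simulated crash");
                }
                out.add("step1");
            },
            out -> out.add("step2")
        };
        List<String> out = new ArrayList<>();
        int resumeAt = run(steps, 0, out);    // stops at the failing step
        int done = run(steps, resumeAt, out); // resumes there, step 0 is not rerun
        System.out.println(out + " resumeAt=" + resumeAt + " done=" + done);
    }
}
```

The sketch leaves out the hard part raised in the thread: replaying the same messages to the restarted task. It only shows why a step index is a usable resume point.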
