Code right now is at https://github.com/chlin501/hama.git
Maven and jdk are required to build the project.

Command to have a clean build:

mvn clean install -DskipTests=true -Dmaven.javadoc.skip=true

To test a specific test case:

mvn -DskipTests=false -Dtest=<TestCaseName> test

On 15 August 2014 18:21, Suraj Menon <[email protected]> wrote:
> Hi Edward, sorry to enter the discussion so late.
>
> Bundling and Unbundling of message queue is not Spilling queue's
> responsibility, it was ended up there to be compatible with the existent
> implementation of BSP Peer communication. Remember Spilling Queue
> implementation was done to immediately remove some OutOfMemory issues on
> sender side first. Spilling Queue gives you a byte array (ByteBuffer) with
> a batch of serialized messages. This is effectively bundling the messages
> in byte array (hence the ByteArrayMessageBundle) and sending them for
> processing. The SpilledDataProcessor's are implemented as a pipeline of
> processing done using inheritance, something like what we may use trait for
> in Scala. So if we have a SpilledDataProcessor that sends this bundled
> message via RPC to the peer, there is no need to write them to file and
> read them back. As I previously mentioned this was done to be compatible
> with the existent implementation of peer.send.
>
> Also, the async checkpoint recovery code was written before spilling queue.
> Today we can remove the single message write and do this in "before peer
> sync" phase to just write the whole file to HDFS.
>
> I would say performance numbers and maintainability comes first and if you
> think removing spilling queue is a solution go for it. As far as async
> checkpointing is to be considered, that was a first proof of concept we did
> and it is high time we move forward from there.
>
> Chiahung, do you have some instruction on where and how I can build the
> scala version of your code?
>
> I am really finding it hard to dedicate time for Hama these days.
>
> - Suraj
>
>
> On Tue, Aug 12, 2014 at 7:15 AM, Edward J. Yoon <[email protected]>
> wrote:
>
>> ChiaHung,
>>
>> Yes, I'm thinking similar things.
>>
>> On Tue, Aug 12, 2014 at 4:11 PM, Chia-Hung Lin <[email protected]>
>> wrote:
>> > I am currently working on this part based on the superstep api,
>> > similar to the Superstep.java in the trunk.
>> >
>> > The checkpointer[1] saves bundle message instead of single message.
>> > Not very sure if this is what you are looking for?
>> >
>> > [1]. https://github.com/chlin501/hama/blob/peer-comm-mech-changed/core/src/main/scala/org/apache/hama/monitor/Checkpointer.scala
>> >
>> >
>> > On 12 August 2014 15:04, Edward J. Yoon <[email protected]> wrote:
>> >> I think that transferring single messages at a time is not a wise way.
>> >> Bundle is used to avoid network overheads and contentions. So, if we
>> >> use Bundle, each processor always sends/receives an bundles.
>> >>
>> >> BSPMessageBundle is Writable (and Iterable). And it manages the
>> >> serialized message as a byte array. If we write an bundles when
>> >> checkpointing or using Disk-queue, it'll be more simple and faster.
>> >>
>> >> In Spilling Queue case, it always requires the process of unbundling
>> >> and putting messages into queue.
>> >>
>> >>
>> >> On Tue, Aug 12, 2014 at 2:41 PM, Tommaso Teofili
>> >> <[email protected]> wrote:
>> >>> -1, can't we first discuss? Also it'd be helpful to be more specific on the
>> >>> problems.
>> >>> Tommaso
>> >>>
>> >>>
>> >>> 2014-08-12 4:25 GMT+02:00 Edward J. Yoon <[email protected]>:
>> >>>
>> >>>> All,
>> >>>>
>> >>>> I'll delete Spilling queue, and rewrite checkpoint/recovery
>> >>>> implementation (checkpointing bundles is better than checkpointing all
>> >>>> messages). Current implementation is quite mess :/ there are huge
>> >>>> deserialization/serialization overheads..
>> >>>>
>> >>>> --
>> >>>> Best Regards, Edward J. Yoon
>> >>>> CEO at DataSayer Co., Ltd.
>> >>>>
>> >>
>> >>
>> >> --
>> >> Best Regards, Edward J. Yoon
>> >> CEO at DataSayer Co., Ltd.
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> CEO at DataSayer Co., Ltd.
>>
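
For readers following the thread, here is a minimal sketch of the bundling idea discussed above: a batch of messages serialized into one byte array, which can then be sent or written to a checkpoint as a unit and unbundled on the other side. It is only an illustration under stated assumptions. The class BundleSketch and its bundle/unbundle methods are hypothetical names, messages are plain strings, and the wire layout (a count followed by each message) is made up for this example; it is not Hama's BSPMessageBundle or ByteArrayMessageBundle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Hama's BSPMessageBundle or ByteArrayMessageBundle.
class BundleSketch {

    // Serialize a whole batch into one byte array: a message count, then each message.
    static byte[] bundle(List<String> messages) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(messages.size());
            for (String message : messages) {
                out.writeUTF(message);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read the count, then read that many messages back out of the byte array.
    static List<String> unbundle(byte[] payload) {
        List<String> messages = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                messages.add(in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("m1", "m2", "m3");
        byte[] payload = bundle(batch);
        // The same byte array could go to an RPC call or a checkpoint file unchanged.
        System.out.println(unbundle(payload).equals(batch)); // prints "true"
    }
}
```

In this sketch the sender and the checkpoint writer would both handle the byte array as a single unit, and only the final consumer pays the cost of deserializing individual messages.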
