Do you have any plan for merging them? As a side opinion: if we want to use Git, I'm now +1.
On Sat, Aug 16, 2014 at 12:00 AM, Chia-Hung Lin wrote:
> The code is currently at https://github.com/chlin501/hama.git
>
> Maven and a JDK are required to build the project.
>
> To do a clean build:
> mvn clean install -DskipTests=true -Dmaven.javadoc.skip=true
>
> To run a specific test case:
> mvn -DskipTests=false -Dtest=<TestCaseName> test
>
>
> On 15 August 2014 18:21, Suraj Menon wrote:
>> Hi Edward, sorry to enter the discussion so late.
>>
>> Bundling and unbundling of the message queue is not the Spilling Queue's
>> responsibility; it ended up there to stay compatible with the existing
>> implementation of BSP peer communication. Remember that the Spilling Queue
>> was implemented primarily as an immediate fix for OutOfMemory issues on the
>> sender side. The Spilling Queue gives you a byte array (ByteBuffer) holding
>> a batch of serialized messages. This effectively bundles the messages in a
>> byte array (hence the ByteArrayMessageBundle) and sends them on for
>> processing. The SpilledDataProcessors are implemented as a processing
>> pipeline built through inheritance, similar to what we might use traits
>> for in Scala. So if we have a SpilledDataProcessor that sends this bundled
>> message via RPC to the peer, there is no need to write the messages to a
>> file and read them back. As I mentioned earlier, this was done for
>> compatibility with the existing implementation of peer.send.
>>
>> Also, the async checkpoint recovery code was written before the Spilling
>> Queue. Today we could remove the single-message writes and instead write
>> the whole file to HDFS in the "before peer sync" phase.
>>
>> I would say performance numbers and maintainability come first, and if you
>> think removing the Spilling Queue is the solution, go for it. As for async
>> checkpointing, that was our first proof of concept, and it is high time we
>> moved on from there.
>>
>> Chiahung, do you have instructions on where and how I can build the
>> Scala version of your code?
>>
>> I am really finding it hard to dedicate time to Hama these days.
>>
>> - Suraj
>>
>>
>> On Tue, Aug 12, 2014 at 7:15 AM, Edward J. Yoon wrote:
>>
>>> ChiaHung,
>>>
>>> Yes, I'm thinking along similar lines.
>>>
>>> On Tue, Aug 12, 2014 at 4:11 PM, Chia-Hung Lin wrote:
>>> > I am currently working on this part based on the superstep API,
>>> > similar to Superstep.java in trunk.
>>> >
>>> > The checkpointer [1] saves bundled messages instead of single messages.
>>> > I'm not sure whether this is what you are looking for.
>>> >
>>> > [1] https://github.com/chlin501/hama/blob/peer-comm-mech-changed/core/src/main/scala/org/apache/hama/monitor/Checkpointer.scala
>>> >
>>> > On 12 August 2014 15:04, Edward J. Yoon wrote:
>>> >> I think transferring one message at a time is not wise. Bundles are
>>> >> used to avoid network overhead and contention. So, if we use bundles,
>>> >> each processor always sends and receives bundles.
>>> >>
>>> >> BSPMessageBundle is Writable (and Iterable), and it manages the
>>> >> serialized messages as a byte array. If we write bundles when
>>> >> checkpointing or when using the disk queue, it'll be simpler and faster.
>>> >>
>>> >> In the Spilling Queue's case, it always requires unbundling the messages
>>> >> and putting them into the queue.
>>> >>
>>> >> On Tue, Aug 12, 2014 at 2:41 PM, Tommaso Teofili wrote:
>>> >>> -1, can't we discuss this first? Also, it'd be helpful to be more
>>> >>> specific about the problems.
>>> >>> Tommaso
>>> >>>
>>> >>> 2014-08-12 4:25 GMT+02:00 Edward J. Yoon:
>>> >>>
>>> >>>> All,
>>> >>>>
>>> >>>> I'll delete the Spilling Queue and rewrite the checkpoint/recovery
>>> >>>> implementation (checkpointing bundles is better than checkpointing
>>> >>>> every message individually). The current implementation is quite a
>>> >>>> mess :/ There are huge deserialization/serialization overheads.
>>> >>>>
>>> >>>> --
>>> >>>> Best Regards, Edward J. Yoon
>>> >>>> CEO at DataSayer Co., Ltd.

--
Best Regards, Edward J. Yoon
CEO at DataSayer Co., Ltd.
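For readers following the bundling argument in this thread, here is a minimal Java sketch of checkpointing a bundle as one record instead of writing messages one at a time. It is a hypothetical illustration, not Hama's BSPMessageBundle or Checkpointer API. The MessageBundle class, its add/checkpoint/recover methods, and the use of plain strings as messages are all assumptions. Messages are serialized once into a shared byte array when they are added. Checkpointing then copies those raw bytes out as a single length-prefixed record, and only recovery unbundles them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: NOT Hama's BSPMessageBundle API.
// Messages are serialized once into a single byte buffer, in the spirit of
// the byte-array bundles discussed in the thread.
public class MessageBundle {
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private final DataOutputStream out = new DataOutputStream(buffer);
  private int count = 0;

  // Serialize one message into the shared buffer (strings for simplicity).
  public void add(String message) throws IOException {
    out.writeUTF(message);
    count++;
  }

  public int size() {
    return count;
  }

  // Checkpoint the whole bundle as one record: message count, byte length,
  // then the raw bytes. No per-message (de)serialization happens here.
  public void checkpoint(DataOutputStream dest) throws IOException {
    out.flush();
    dest.writeInt(count);
    dest.writeInt(buffer.size());
    buffer.writeTo(dest);
  }

  // Recovery: read one record back and unbundle it into individual messages.
  public static List<String> recover(DataInputStream in) throws IOException {
    int n = in.readInt();
    byte[] bytes = new byte[in.readInt()];
    in.readFully(bytes);
    DataInputStream msgs = new DataInputStream(new ByteArrayInputStream(bytes));
    List<String> result = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      result.add(msgs.readUTF());
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    MessageBundle bundle = new MessageBundle();
    bundle.add("vertex-1:0.25");
    bundle.add("vertex-2:0.75");

    // In-memory stream standing in for a checkpoint file.
    ByteArrayOutputStream file = new ByteArrayOutputStream();
    bundle.checkpoint(new DataOutputStream(file));

    List<String> restored =
        recover(new DataInputStream(new ByteArrayInputStream(file.toByteArray())));
    System.out.println(restored); // prints [vertex-1:0.25, vertex-2:0.75]
  }
}
```

In a real deployment the destination would be an HDFS stream. Hadoop's FSDataOutputStream extends java.io.DataOutputStream, so it could be passed to checkpoint unchanged. The in-memory ByteArrayOutputStream here only stands in for it.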
