[ https://issues.apache.org/jira/browse/HAMA-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238679#comment-13238679 ]

Suraj Menon commented on HAMA-521:
----------------------------------

Nice implementation! I have the following design questions, and after them I 
would like to propose a few design changes:

1. Should MessageManager hold socket address information? On failure, the socket 
addresses of some peers would change, because those peers would get rescheduled 
on different machines. If MessageManager holds the socket addresses, they have to 
be updated whenever peers fail.
2. Should we have an identifier for each message? In my opinion we should. It 
would help us remove duplicate messages during cleanup on recovery. If so, we 
need to implement the queue as a Set (LinkedHashSet?). This would also help us 
implement sorting in the message buffer, with a TreeSet implementation 
underneath.
3. For that matter, should we have a header <id, source peer, destination peer>?
4. There should be a simple, reliable transactional protocol between two peers. 
When a transaction completes, the sender gets an acknowledgment that the receiver 
has received all the messages.
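
To make points 2 and 3 concrete, here is a minimal Java sketch. MessageHeader, 
MessageBuffers, and the choice of (source peer, id) as the message identity are 
hypothetical, not existing Hama classes. It only shows how equals/hashCode on the 
header lets a LinkedHashSet drop a re-sent duplicate, and how a TreeSet with a 
comparator consistent with equals gives a sorted buffer.

```java
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical header <id, source peer, destination peer>.
// Identity is (sourcePeer, id), assuming id is a per-sender sequence number,
// so a message re-sent after recovery compares equal to the original copy.
final class MessageHeader {
    final long id;
    final String sourcePeer;
    final String destinationPeer;

    MessageHeader(long id, String sourcePeer, String destinationPeer) {
        this.id = id;
        this.sourcePeer = sourcePeer;
        this.destinationPeer = destinationPeer;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MessageHeader)) return false;
        MessageHeader other = (MessageHeader) o;
        return id == other.id && sourcePeer.equals(other.sourcePeer);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, sourcePeer);
    }
}

final class MessageBuffers {
    // Insertion-ordered queue that silently drops duplicates.
    static Set<MessageHeader> fifoQueue() {
        return new LinkedHashSet<>();
    }

    // Sorted buffer. The comparator orders on the same fields equals uses,
    // otherwise the TreeSet would merge or split messages incorrectly.
    static Set<MessageHeader> sortedQueue() {
        return new TreeSet<>(Comparator
                .comparing((MessageHeader h) -> h.sourcePeer)
                .thenComparingLong(h -> h.id));
    }
}
```

A TreeSet detects duplicates only through its comparator, never through equals, 
so keeping the two consistent is what makes both buffers drop the same messages.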

During a transfer, the sender should send a BEGIN-TRANSACTION flag, then send all 
its messages, then send COMMIT. The transaction is over only once the sender gets 
an ACK for the COMMIT. With this protocol, it does not matter where we write the 
messages: to a local file, to HDFS, or to a remote machine. If the transaction 
fails, the sender can clean up its own side and retry after getting the new 
address of the destination peer. If the sender fails, the receiver can clean up 
and remove the duplicate messages. We still have to figure out how to send the 
transaction commands; this is probably where the headers would help. For 
synchronous checkpointing, we can make sure that the sender sends COMMIT only 
after checkpointing all the messages.
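
The BEGIN / COMMIT / ACK exchange can be sketched in Java as below. This is an 
in-memory illustration under assumptions: method calls stand in for network 
frames, messages are strings that carry their own id, and TransactionalReceiver 
and TransactionalSender are hypothetical names. Messages are staged per sender 
and only become visible on COMMIT, so an aborted transfer leaves nothing behind.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical receiver side of the BEGIN / messages / COMMIT / ACK protocol.
final class TransactionalReceiver {
    private final Map<String, List<String>> staged = new HashMap<>();
    private final Set<String> delivered = new LinkedHashSet<>();

    // BEGIN: a new BEGIN from the same sender discards any half-finished attempt.
    void begin(String sender) {
        staged.put(sender, new ArrayList<>());
    }

    void receive(String sender, String message) {
        List<String> buffer = staged.get(sender);
        if (buffer == null) {
            throw new IllegalStateException("message outside a transaction from " + sender);
        }
        buffer.add(message);
    }

    // COMMIT: staged messages move to the delivered set; the return value is the ACK.
    // The Set drops messages that an earlier, already committed attempt delivered.
    boolean commit(String sender) {
        List<String> buffer = staged.remove(sender);
        if (buffer == null) return false;
        delivered.addAll(buffer);
        return true;
    }

    // Sender failure detected: drop its uncommitted messages.
    void abort(String sender) {
        staged.remove(sender);
    }

    Set<String> delivered() {
        return delivered;
    }
}

final class TransactionalSender {
    // The transfer counts as done only if the receiver ACKs the COMMIT.
    static boolean send(TransactionalReceiver receiver, String self, List<String> messages) {
        receiver.begin(self);
        for (String m : messages) {
            receiver.receive(self, m);
        }
        return receiver.commit(self);
    }
}
```

A retry after a lost ACK re-sends messages that were already committed; keying 
the delivered messages by id (point 2) is what makes that retry harmless.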

> init/close into the queue interface, otherwise the DiskQueue will be a whole 
> mess

We always read all the messages from the DiskQueue, so can we have an Iterator 
that closes the file once the last record is read?
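
A minimal Java sketch of such an iterator, assuming for illustration that the 
DiskQueue stores one record per line in a plain text file (the real queue would 
more likely use a SequenceFile). SelfClosingRecordIterator is a hypothetical 
name. It reads one record ahead, so the file is closed inside the next() call 
that returns the last record, and an empty file is closed right away.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical read side of a DiskQueue: one record per line in a local file.
// The reader is closed as soon as the look-ahead hits end of file.
final class SelfClosingRecordIterator implements Iterator<String> {
    private BufferedReader reader;   // null once closed
    private String next;             // look-ahead record, null at end of file

    SelfClosingRecordIterator(Path file) throws IOException {
        this.reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
        advance();
    }

    private void advance() {
        try {
            next = reader.readLine();
            if (next == null) {
                reader.close();      // no more records: release the file handle
                reader = null;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isClosed() {
        return reader == null;
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String current = next;
        advance();
        return current;
    }
}
```

One caveat: if a caller stops iterating early, the file stays open, so an 
explicit close() may still be worth keeping as a fallback.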

> Improve message buffering to save memory
> ----------------------------------------
>
>                 Key: HAMA-521
>                 URL: https://issues.apache.org/jira/browse/HAMA-521
>             Project: Hama
>          Issue Type: Sub-task
>            Reporter: Thomas Jungblut
>            Assignee: Thomas Jungblut
>         Attachments: HAMA-521.patch, HAMA-521_1.patch
>
>
> Suraj and I had a bit of discussion about incoming and outgoing message 
> buffering and scalability.
> Currently everything lies on the heap, causing a lot of GC and wasted memory. 
> We can do better.
> Therefore we need to extract an abstract Messenger class that sits directly 
> under the interface but above the compressor class.
> It should abstract the use of the queues in the back (currently a lot of 
> duplicated code) and it should be backed by a SequenceFile on local disk.
> Once sync() starts, it should return a message iterator for combining; the 
> messages then get put into a message bundle, which is sent over RPC.
> On the other side, we get a bundle and loop over it, putting everything onto 
> the heap and making it much larger than it needs to be. Here we can also flush 
> to disk, because we only expose a queue-like method to the user side.
> Plus points:
> If we have enough heap (see our new metric system), we can also implement a 
> buffering technique that does not flush everything to disk.
> Open questions:
> I don't know how much slower the whole system will get, but it would save a 
> lot of memory. Maybe we should first evaluate whether it is really needed.
> In any case, the refactoring of the duplicate code in the messengers is 
> needed.
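
The buffering idea from the issue (keep messages on the heap while there is 
room, otherwise flush them to local disk) could look roughly like the Java 
sketch below. It is an illustration under assumptions, not the planned design: 
SpillingMessageBuffer and its maxInMemory threshold are hypothetical, and a 
DataOutputStream over a temp file stands in for the SequenceFile. Once spilling 
starts, every later message also goes to disk, which keeps drain() in insertion 
order.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical outgoing-message buffer: up to maxInMemory messages stay on the
// heap, the rest are appended to a local temp file.
final class SpillingMessageBuffer {
    private final int maxInMemory;
    private final Deque<String> memory = new ArrayDeque<>();
    private Path spillFile;              // created lazily on the first spill
    private DataOutputStream spillOut;   // non-null while spilling
    private int spilledCount;

    SpillingMessageBuffer(int maxInMemory) {
        this.maxInMemory = maxInMemory;
    }

    void add(String message) throws IOException {
        if (spillOut == null && memory.size() < maxInMemory) {
            memory.addLast(message);
            return;
        }
        if (spillOut == null) {
            spillFile = Files.createTempFile("message-spill", ".bin");
            spillOut = new DataOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(spillFile)));
        }
        spillOut.writeUTF(message);      // length-prefixed record
        spilledCount++;
    }

    int inMemoryCount() { return memory.size(); }

    int spilledCount() { return spilledCount; }

    // At sync(): heap messages first, then the spilled ones; the file is deleted.
    List<String> drain() throws IOException {
        List<String> all = new ArrayList<>(memory);
        memory.clear();
        if (spillOut != null) {
            spillOut.close();            // flushes buffered records to the file
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(Files.newInputStream(spillFile)))) {
                for (int i = 0; i < spilledCount; i++) {
                    all.add(in.readUTF());
                }
            }
            Files.delete(spillFile);
            spillOut = null;
            spilledCount = 0;
        }
        return all;
    }
}
```

drain() collects everything into a list for brevity; a streaming Iterator would 
avoid pulling the spilled messages back onto the heap. Note also that writeUTF 
limits each encoded message to 65535 bytes.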
