[ https://issues.apache.org/jira/browse/HDFS-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068817#comment-13068817 ]
Sanjay Radia commented on HDFS-2181:
------------------------------------

I have attached a patch to show how I am approaching the problem. As explained in HADOOP-7347, the proposal is to separate the data types so that the main logic is not polluted by the serialization data types (such as those in PB or Avro), and so that we can write adapters on the client and server side to support multiple protocols whenever there is a discontinuity (such as when we switch to Protocol Buffers (PB) or Avro).

DFSClient and NN both use the data types in hdfs.protocol (unchanged). The actual protocol data types that are serialized are in the following packages:
* package protocolR23Compatible - the data types used in Release 23 onwards that will maintain compatibility as explained in overview.html (basically, do not change method signatures; add new types and new methods, etc.). Note that I have picked R23 as the package name rather than the actual protocol version because from R23 onwards we will not break compatibility until we move to PB. If we move to PB in release 23 then we will delete R23. The separation of data types helps us do PB.
* package protocolR20203Compatible - for the 20.203 protocol
* package protocolProtocolBuffers - for protocol buffers

There are two translators for each protocol:
- Client side (protocolR23Compatible.ClientProtocolTranslator) - translates to R23 types on the client side
- Server side (protocolR23Compatible.ClientProtocolServerSideTranslator) - translates from R23 types on the server side

While these translators are simple logic, they increase the maintenance load because each time a new rpc method is added one has to add a method to these translators. We are exploring how to address this (perhaps by moving the data type translation/mapping to the rpc layer).

The following still remains to be done:
# Plugging in a translator on the server side into the NN. The NN implements multiple protocols and I have not quite figured out how to do this.
# Automatically picking the right protocol on the client side based on the server's version. I will give more details on this later.
# Having the server implement multiple versions (say R20.203 and R23). We want to use the same port, and hence the rpc layer below must demultiplex. Perhaps this will help us solve 1.

> Seperate HDFS wire protocol data types
> --------------------------------------
>
>                 Key: HDFS-2181
>                 URL: https://issues.apache.org/jira/browse/HDFS-2181
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>         Attachments: separateDataType1.patch, separateDataType2.patch
>
>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
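
For readers who want a concrete picture of the client-side and server-side translators described in the comment, here is a minimal sketch. It is not taken from the attached patches. Every class and method name in it (TranslatorSketch, FileStatus, FileStatusR23, ClientSideTranslator, ServerSideTranslator, InMemoryNameNode, getFileInfo, lookup) is a simplified, hypothetical stand-in, and the rpc layer is left out so that the two translators are wired together in one process.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names are the classes in the patch.
class TranslatorSketch {

    // Internal data type used by the main logic (plays the role of a type in hdfs.protocol).
    static final class FileStatus {
        final String path;
        final long length;
        FileStatus(String path, long length) { this.path = path; this.length = length; }
    }

    // Wire data type that would be serialized (plays the role of a type in protocolR23Compatible).
    static final class FileStatusR23 {
        final String path;
        final long length;
        FileStatusR23(String path, long length) { this.path = path; this.length = length; }
    }

    // Protocol as seen by the main logic on both client and server.
    interface ClientProtocol { FileStatus getFileInfo(String path); }

    // Protocol as seen on the wire.
    interface ClientProtocolR23 { FileStatusR23 getFileInfo(String path); }

    // Client side: offers the internal protocol, calls the wire protocol, converts the result.
    static final class ClientSideTranslator implements ClientProtocol {
        private final ClientProtocolR23 rpcProxy;
        ClientSideTranslator(ClientProtocolR23 rpcProxy) { this.rpcProxy = rpcProxy; }
        public FileStatus getFileInfo(String path) {
            FileStatusR23 wire = rpcProxy.getFileInfo(path);
            return wire == null ? null : new FileStatus(wire.path, wire.length);
        }
    }

    // Server side: offers the wire protocol, delegates to the real server, converts the result.
    static final class ServerSideTranslator implements ClientProtocolR23 {
        private final ClientProtocol server;
        ServerSideTranslator(ClientProtocol server) { this.server = server; }
        public FileStatusR23 getFileInfo(String path) {
            FileStatus status = server.getFileInfo(path);
            return status == null ? null : new FileStatusR23(status.path, status.length);
        }
    }

    // Stand-in for the NN: it only ever sees the internal data types.
    static final class InMemoryNameNode implements ClientProtocol {
        private final Map<String, Long> lengths = new HashMap<>();
        InMemoryNameNode() { lengths.put("/user/a.txt", 42L); }
        public FileStatus getFileInfo(String path) {
            Long length = lengths.get(path);
            return length == null ? null : new FileStatus(path, length);
        }
    }

    // Wires client -> client translator -> server translator -> server, with no rpc in between.
    static FileStatus lookup(String path) {
        ClientProtocolR23 serverSide = new ServerSideTranslator(new InMemoryNameNode());
        ClientProtocol client = new ClientSideTranslator(serverSide);
        return client.getFileInfo(path);
    }

    public static void main(String[] args) {
        FileStatus status = lookup("/user/a.txt");
        System.out.println(status.path + " " + status.length);
    }
}
```

The sketch also illustrates the maintenance cost mentioned above: adding a second rpc method to ClientProtocol would require a matching method in both translators.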