[ 
https://issues.apache.org/jira/browse/HDFS-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068817#comment-13068817
 ] 

Sanjay Radia commented on HDFS-2181:
------------------------------------

I have attached a patch to show how I am approaching the problem.
As explained in HADOOP-7347, the proposal is to separate the data types so that
the main logic is not polluted by the serialization data types (such as those
in PB or Avro), and so that we can write adapters on the client and server side
to support multiple protocols whenever there is a discontinuity (such as when
we switch to Protocol Buffers (PB) or Avro).

DFSClient and NN both use the data types in hdfs.protocol (unchanged).
The actual protocol data types that are serialized are in the following
packages:
* package protocolR23Compatible - the data types used from Release 23 onwards,
which will maintain compatibility as explained in overview.html (basically, do
not change method signatures; add new types and new methods, etc.).
Note that I have picked R23 as the package name rather than the actual protocol
version because from R23 onwards we will not break compatibility until we move
to PB. If we move to PB in release 23 then we will delete R23. The separation
of data types helps us do PB.
* package protocolR20203Compatible is for the 20.203 protocol
* package protocolProtocolBuffers is for Protocol Buffers

There are two translators for each protocol:
- Client side (protocolR23Compatible.ClientProtocolTranslator) - translates to
R23 types on the client side
- Server side (protocolR23Compatible.ClientProtocolServerSideTranslator) -
translates from R23 types on the server side.
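
The translator pair described above can be sketched roughly as follows. This
is only an illustration of the adapter idea, not code from the attached patch:
every class, interface, and method name in it (BlockInfo, BlockInfoR23Wire,
NamespaceProtocol, NamespaceProtocolR23, getBlock, and the two translator
classes) is a hypothetical stand-in, and the RPC transport is replaced by a
direct in-process call.

```java
/** Hypothetical sketch of the client/server translator pair; names are illustrative only. */
public class TranslatorSketch {

    /** Stand-in for a stable type in hdfs.protocol, used by the main logic. */
    static final class BlockInfo {
        final long blockId;
        final long numBytes;
        BlockInfo(long blockId, long numBytes) {
            this.blockId = blockId;
            this.numBytes = numBytes;
        }
    }

    /** Stand-in for a wire type in protocolR23Compatible; only this type would be serialized. */
    static final class BlockInfoR23Wire {
        final long blockId;
        final long numBytes;
        BlockInfoR23Wire(long blockId, long numBytes) {
            this.blockId = blockId;
            this.numBytes = numBytes;
        }
    }

    /** Stable interface seen by the client and implemented by the server logic. */
    interface NamespaceProtocol {
        BlockInfo getBlock(long blockId);
    }

    /** Wire-level interface whose signatures use only the R23 wire types. */
    interface NamespaceProtocolR23 {
        BlockInfoR23Wire getBlock(long blockId);
    }

    /** Client side: exposes the stable interface and converts wire types back to stable types. */
    static final class ClientSideTranslator implements NamespaceProtocol {
        private final NamespaceProtocolR23 rpcProxy;
        ClientSideTranslator(NamespaceProtocolR23 rpcProxy) {
            this.rpcProxy = rpcProxy;
        }
        @Override
        public BlockInfo getBlock(long blockId) {
            BlockInfoR23Wire wire = rpcProxy.getBlock(blockId);
            return new BlockInfo(wire.blockId, wire.numBytes);
        }
    }

    /** Server side: receives wire-level calls and converts stable results to wire types. */
    static final class ServerSideTranslator implements NamespaceProtocolR23 {
        private final NamespaceProtocol server;
        ServerSideTranslator(NamespaceProtocol server) {
            this.server = server;
        }
        @Override
        public BlockInfoR23Wire getBlock(long blockId) {
            BlockInfo result = server.getBlock(blockId);
            return new BlockInfoR23Wire(result.blockId, result.numBytes);
        }
    }

    public static void main(String[] args) {
        // Toy server logic that knows nothing about wire types.
        NamespaceProtocol nameNode = id -> new BlockInfo(id, 1024L);
        // In a real system an RPC proxy would sit between these two translators.
        NamespaceProtocolR23 wireEndpoint = new ServerSideTranslator(nameNode);
        NamespaceProtocol client = new ClientSideTranslator(wireEndpoint);
        BlockInfo block = client.getBlock(42L);
        System.out.println(block.blockId + " " + block.numBytes);
    }
}
```

In this sketch, a new wire format would only need a new pair of translators and
wire types; the code written against NamespaceProtocol would stay unchanged.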

While these translators are simple logic, they increase the maintenance load
because each time a new RPC method is added one has to add a method to these
translators. We are exploring how to address this (perhaps by moving the data
type translation/mapping to the RPC layer).

The following still remains to be done:
# Plugging a server-side translator into the NN. The NN implements multiple
protocols and I have not quite figured out how to do this.
# Automatically picking the right protocol on the client side based on the
server's version. I will give more details on this later.
# Having the server implement multiple versions (say R20.203 and R23). We want
to use the same port and hence the RPC layer below must demultiplex. Perhaps
this will help us solve 1.
2)
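
One possible shape for the demultiplexing mentioned in the last item is a
registry on the shared port that maps a protocol version to the matching
server-side translator. The sketch below only illustrates that idea under
stated assumptions: the Handler interface, the Demultiplexer class, the use of
plain strings for versions and requests, and the method names are all
hypothetical and are not part of the patch or of the existing RPC layer.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of version-based demultiplexing on one port; names are illustrative only. */
public class VersionDemuxSketch {

    /** Stand-in for a server-side translator bound to one protocol version. */
    interface Handler {
        String handle(String request);
    }

    /** Routes each incoming call to the handler registered for the caller's protocol version. */
    static final class Demultiplexer {
        private final Map<String, Handler> handlers = new HashMap<>();

        void register(String version, Handler handler) {
            handlers.put(version, handler);
        }

        String dispatch(String version, String request) {
            Handler handler = handlers.get(version);
            if (handler == null) {
                throw new IllegalArgumentException("Unsupported protocol version: " + version);
            }
            return handler.handle(request);
        }
    }

    public static void main(String[] args) {
        Demultiplexer demux = new Demultiplexer();
        // Each lambda stands in for a translator such as the R23 server-side translator.
        demux.register("R20.203", request -> "R20.203 translator handled " + request);
        demux.register("R23", request -> "R23 translator handled " + request);
        System.out.println(demux.dispatch("R23", "getBlockLocations"));
    }
}
```

How the version would actually be carried in the connection header or request
is left open here, as it is in the comment above.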

> Seperate HDFS wire protocol data types
> --------------------------------------
>
>                 Key: HDFS-2181
>                 URL: https://issues.apache.org/jira/browse/HDFS-2181
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>         Attachments: separateDataType1.patch, separateDataType2.patch
>
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
