[
https://issues.apache.org/jira/browse/HDFS-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068817#comment-13068817
]
Sanjay Radia commented on HDFS-2181:
------------------------------------
I have attached a patch to show how I am approaching the problem.
As explained in HADOOP-7347, the proposal is to separate the data types so that
the main logic is not polluted by the serialization data types (such as those
in PB or Avro), and so that we can write adapters on the client and server side
to support multiple protocols whenever there is a discontinuity (such as when
we switch to Protocol Buffers (PB) or Avro).
DFSClient and NN both use the data types in hdfs.protocol (unchanged).
The actual protocol data types that are serialized are in the following packages:
* package protocolR23Compatible - the data types used from Release 23 onwards,
which will maintain compatibility as explained in overview.html (basically, do
not change method signatures; add new types and new methods, etc.).
Note that I have picked R23 as the package name rather than the actual protocol
version because from R23 onwards we will not break compatibility until we move
to PB. If we move to PB in release 23 then we will delete R23. The separation
of data types helps us do PB.
* package protocolR20203Compatible - for the 20.203 protocol
* package protocolProtocolBuffers - for protocol buffers
There are two translators for each protocol:
- Client side (protocolR23Compatible.ClientProtocolTranslator) - translates to
R23 types on the client side
- Server side (protocolR23Compatible.ClientProtocolServerSideTranslator) -
translates from R23 types on the server side.
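To make the translator pair concrete, here is a minimal sketch in Java. Every
name in it (FileInfo, FileInfoWire, ClientApi, WireApi, ClientSideTranslator,
ServerSideTranslator) is made up for illustration and is not a class from the
patch; the rpc hop between the two translators is replaced by a direct call.

```java
// Illustrative only: every name here is hypothetical, not taken from the patch.
class TranslatorSketch {

    // Internal type used by the main logic (the role of types in hdfs.protocol).
    static final class FileInfo {
        final String path;
        final long length;
        FileInfo(String path, long length) { this.path = path; this.length = length; }
    }

    // Wire type for one protocol family (the role of types in protocolR23Compatible).
    static final class FileInfoWire {
        final String path;
        final long length;
        FileInfoWire(String path, long length) { this.path = path; this.length = length; }
        static FileInfoWire fromInternal(FileInfo f) { return new FileInfoWire(f.path, f.length); }
        FileInfo toInternal() { return new FileInfo(path, length); }
    }

    // Protocol as seen by the main logic: internal types only.
    interface ClientApi { FileInfo getFileInfo(String path); }

    // Protocol as serialized: wire types only.
    interface WireApi { FileInfoWire getFileInfo(String path); }

    // Client side: implements the internal protocol on top of a wire-level proxy.
    static final class ClientSideTranslator implements ClientApi {
        private final WireApi rpcProxy;
        ClientSideTranslator(WireApi rpcProxy) { this.rpcProxy = rpcProxy; }
        @Override public FileInfo getFileInfo(String path) {
            FileInfoWire w = rpcProxy.getFileInfo(path);
            return w == null ? null : w.toInternal();
        }
    }

    // Server side: implements the wire protocol by delegating to the real server.
    static final class ServerSideTranslator implements WireApi {
        private final ClientApi server;
        ServerSideTranslator(ClientApi server) { this.server = server; }
        @Override public FileInfoWire getFileInfo(String path) {
            FileInfo f = server.getFileInfo(path);
            return f == null ? null : FileInfoWire.fromInternal(f);
        }
    }

    // Connects the two translators in-process; a real setup has an rpc hop here.
    static ClientApi connect(ClientApi serverImpl) {
        return new ClientSideTranslator(new ServerSideTranslator(serverImpl));
    }

    public static void main(String[] args) {
        ClientApi server = path -> new FileInfo(path, 42L);
        FileInfo info = connect(server).getFileInfo("/a");
        System.out.println(info.path + " " + info.length);
    }
}
```

In this shape neither the caller nor the server implementation ever sees a wire
type, so supporting another wire format would mean adding another pair of
translators rather than touching the main logic.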
While these translators contain only simple logic, they increase the maintenance
load because each time a new rpc method is added, one has to add a method to
each translator. We are exploring how to address this (perhaps by moving the
data type translation/mapping to the rpc layer).
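As a rough illustration of what a generic mapping might look like, the sketch
below uses a java.lang.reflect.Proxy plus a registry of converters, so that no
per-method translator code is written. This is only one possible shape and an
assumption on my part, not something from the patch or from Hadoop's rpc layer;
all type names are hypothetical, and only return values are converted
(arguments are assumed to be types shared by both interfaces).

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: every name here is hypothetical, not taken from the patch.
class GenericTranslationSketch {

    static final class FileInfo {
        final String path;
        final long length;
        FileInfo(String path, long length) { this.path = path; this.length = length; }
    }

    static final class FileInfoWire {
        final String path;
        final long length;
        FileInfoWire(String path, long length) { this.path = path; this.length = length; }
    }

    interface ClientApi { FileInfo getFileInfo(String path); }
    interface WireApi { FileInfoWire getFileInfo(String path); }

    // Registry: wire class -> converter producing the internal object.
    static final Map<Class<?>, Function<Object, Object>> TO_INTERNAL = new HashMap<>();
    static {
        TO_INTERNAL.put(FileInfoWire.class, w -> {
            FileInfoWire x = (FileInfoWire) w;
            return new FileInfo(x.path, x.length);
        });
    }

    // Builds a ClientApi whose calls are forwarded to the same-named WireApi method.
    static ClientApi translate(WireApi wire) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                if (method.getDeclaringClass() == Object.class) {
                    return method.invoke(wire, args); // toString, hashCode, equals
                }
                Method target = WireApi.class.getMethod(method.getName(), method.getParameterTypes());
                Object result = target.invoke(wire, args);
                if (result == null) return null;
                Function<Object, Object> conv = TO_INTERNAL.get(result.getClass());
                return conv == null ? result : conv.apply(result);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the callee's own exception
            }
        };
        return (ClientApi) Proxy.newProxyInstance(
                ClientApi.class.getClassLoader(), new Class<?>[] { ClientApi.class }, handler);
    }

    public static void main(String[] args) {
        ClientApi client = translate(path -> new FileInfoWire(path, 42L));
        FileInfo info = client.getFileInfo("/a");
        System.out.println(info.path + " " + info.length);
    }
}
```

With this approach a new rpc method needs no new translator code as long as its
types already have registered converters; the cost is reflection on every call
and the loss of compile-time checking that the two interfaces stay in step.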
The following still remains to be done:
# Plugging a server-side translator into the NN. The NN implements multiple
protocols and I have not quite figured out how to do this.
# Automatically picking the right protocol on the client side based on the
server's version. I will give more details on this later.
# Having the server implement multiple versions (say R20.203 and R23). We want
to use the same port, and hence the rpc layer below must demultiplex. Perhaps
this will help us solve 1.
2)
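The demultiplexing in the last item could be pictured as a table from protocol
version to a per-version handler, all of which delegate to a single server
implementation. The sketch below is an assumption about the general shape only:
the names (NameService, VersionedHandler, Demux), the use of strings as
requests and replies, and the two reply encodings are invented for
illustration and do not reflect the actual rpc layer.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: every name and encoding here is hypothetical.
class VersionDemuxSketch {

    // Stand-in for the single server implementation shared by all versions.
    interface NameService { long getLength(String path); }

    // Decodes a request in one wire format, calls the service, encodes the reply.
    interface VersionedHandler { String handle(String request); }

    // Picks the handler for the version announced by the client.
    static final class Demux {
        private final Map<String, VersionedHandler> handlers = new HashMap<>();

        void register(String version, VersionedHandler handler) {
            handlers.put(version, handler);
        }

        String dispatch(String version, String request) {
            VersionedHandler h = handlers.get(version);
            if (h == null) {
                throw new IllegalArgumentException("unsupported protocol version: " + version);
            }
            return h.handle(request);
        }
    }

    // Registers one handler per supported version on the same dispatcher.
    static Demux build(NameService nn) {
        Demux d = new Demux();
        // Made-up encodings: the two versions differ only in reply format.
        d.register("R20.203", path -> Long.toString(nn.getLength(path)));
        d.register("R23", path -> "len=" + nn.getLength(path));
        return d;
    }

    public static void main(String[] args) {
        Demux d = build(path -> 42L);
        System.out.println(d.dispatch("R20.203", "/a"));
        System.out.println(d.dispatch("R23", "/a"));
    }
}
```

Each registered handler plays the role of a server-side translator for its
version, so one listener on one port can serve clients of both versions.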
> Seperate HDFS wire protocol data types
> --------------------------------------
>
> Key: HDFS-2181
> URL: https://issues.apache.org/jira/browse/HDFS-2181
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Sanjay Radia
> Assignee: Sanjay Radia
> Attachments: separateDataType1.patch, separateDataType2.patch
>
>
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira