Hello,
With regard to contributing HDFS/ZooKeeper functionality, I was going
through the HDT code and noted some design issues/thoughts that I wanted to
discuss.

1) Client cannot connect with multiple versions of HDFS/MR servers
org.apache.hdt.core.cluster.HadoopCluster, which represents a cluster,
provides direct access to the HDFS and MR Java APIs. This implies that at
any time only one version of the HDFS and MR client libraries can be used
(typically, whichever version gets loaded first by the classloader). So if
there were any use case where interactions with multiple HDFS/MR versions
are required, we would hit runtime issues. The client would be at the mercy
of the backward/forward compatibility capabilities of the
HDFS/MR/[any-service] clients.

In the Hadoop-Eclipse project, to get around this issue, I have created an
extension-point based abstraction, where the Eclipse functionality itself
would never directly use HDFS/ZooKeeper/[service] classes. Rather, from
multiple versions of extension point implementations, the right one would
be used to talk to the server. This keeps the UI and core (headless)
functionality independent of the ever-changing versions of
clients/servers.
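
As a rough illustration of the idea (not the actual Hadoop-Eclipse code),
the sketch below uses plain Java with no Eclipse dependency. The names
HdfsClient, HdfsClientRegistry and RegistryDemo are hypothetical, and the
registry only stands in for an extension-point lookup. In a real Eclipse
setup each implementation would live in its own plugin bundle with its own
classloader, which is what would let several client library versions
coexist; the sketch does not show that isolation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical version-neutral interface: UI/core code would depend only on this,
// never on the HDFS client classes themselves.
interface HdfsClient {
    String describe(String path);
}

// Hypothetical registry standing in for an Eclipse extension-point lookup.
final class HdfsClientRegistry {
    // Maps a server version prefix (e.g. "1.") to the client built for that version.
    private final Map<String, HdfsClient> clientsByVersionPrefix = new LinkedHashMap<>();

    void register(String versionPrefix, HdfsClient client) {
        clientsByVersionPrefix.put(versionPrefix, client);
    }

    // Returns the first registered client whose prefix matches the server version,
    // or an empty Optional if no implementation was contributed for it.
    Optional<HdfsClient> clientFor(String serverVersion) {
        return clientsByVersionPrefix.entrySet().stream()
                .filter(e -> serverVersion.startsWith(e.getKey()))
                .map(Map.Entry::getValue)
                .findFirst();
    }
}

final class RegistryDemo {
    public static void main(String[] args) {
        HdfsClientRegistry registry = new HdfsClientRegistry();
        // Each lambda stands in for an implementation shipped in a separate plugin.
        registry.register("1.", path -> "v1 client handling " + path);
        registry.register("2.", path -> "v2 client handling " + path);

        String result = registry.clientFor("2.0.4")
                .map(client -> client.describe("/user/demo"))
                .orElse("no client available for this server version");
        System.out.println(result);
    }
}
```

The caller only ever sees HdfsClient, so adding support for a new server
version means contributing one more implementation rather than changing
the UI or core code.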


2) No clean separation of UI and non-UI capabilities.
In HDFS, almost all functionality is non-UI (create, read, write, delete of
files/folders). However, currently all HDT plugins are dependent on UI
plugins (starting with org.apache.hdt.core). This goes against the
model-view-controller (MVC) paradigm, because the Eclipse UI (view) is mixed
in with the models and controllers. There is no reason why someone should
not be able to leverage or extend the core/headless/non-UI capabilities of
various Hadoop services in Eclipse without the UI.

In the Hadoop-Eclipse project, plugins are categorized into core
(representing non-UI capabilities) and UI plugins. You can create
connections, create/read/write/delete HDFS/ZooKeeper contents, etc.,
without even having UI plugins. This is helpful for nightly JUnit tests, to
start with, but it also allows others to provide their own UI interactions
on top of ours. The models that are persisted (HDFS/ZooKeeper connections,
metadata, etc.) are Eclipse Modeling Framework (EMF)
<http://www.eclipse.org/modeling/emf/> models, which have a built-in
notification mechanism. They help in a clean separation of Models and
Controllers in MVC.
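
To make the headless model plus notification idea concrete, here is a
minimal sketch in plain Java. It deliberately does not use EMF: the
hand-written listener list only imitates the kind of change notification
that EMF models provide, and ConnectionModel and HeadlessModelDemo are
hypothetical names, not classes from either project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical headless model of a saved connection; nothing here imports UI classes.
final class ConnectionModel {
    private final List<Consumer<String>> listeners = new ArrayList<>();
    private String uri;

    ConnectionModel(String uri) {
        this.uri = uri;
    }

    String getUri() {
        return uri;
    }

    // A view, a controller, or a JUnit test can subscribe without the model knowing which.
    void addListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    // Updates the URI and notifies listeners only when the value actually changes.
    void setUri(String newUri) {
        if (newUri.equals(uri)) {
            return;
        }
        uri = newUri;
        for (Consumer<String> listener : listeners) {
            listener.accept(newUri);
        }
    }
}

final class HeadlessModelDemo {
    public static void main(String[] args) {
        ConnectionModel model = new ConnectionModel("hdfs://localhost:8020");
        // In a UI plugin this listener might refresh a tree viewer; here it just prints.
        model.addListener(uri -> System.out.println("connection changed to " + uri));
        model.setUri("hdfs://example-host:8020");
    }
}
```

Because the model has no UI dependency, the same class can be exercised
from a headless test run and observed by any UI built on top of it.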


The above were some of the major issues that came to mind.
I encourage the community to go through the Hadoop-Eclipse
<http://people.apache.org/~srimanth/hadoop-eclipse/> project
codebase, and discuss any issues/concerns you have.

I am thinking of the best way to merge the functionalities of both
projects, and would like to put forward a proposal.
HDFS is the only functionality common to both projects, along with the
underlying framework. If we can come to a consensus on which parts we want
from where, it will be a smoother effort merging the code. From my end of
the spectrum, I was thinking it might be easier if the MR functionality
could be merged into the HDFS/ZooKeeper functionalities, thus providing a
union of both projects.

I just wanted to get the merging process started, and look forward to
discussing more about it.
Best regards,
Srimanth
