[
https://issues.apache.org/jira/browse/HDT-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804546#comment-13804546
]
Mirko Kaempf edited comment on HDT-24 at 10/24/13 6:58 PM:
-----------------------------------------------------------
This quite old issue is still marked as a blocker, and I have been thinking a
lot about our approach of handling dependencies in the context of multiple
Hadoop versions in multiple clusters. We have two levels of dependencies:
As a developer I want to write code for a certain version of Hadoop. So I have
to know the details of my cluster's setup, and at the same time I want to
build reusable software which also works in other environments. This adds a
lot of complexity.
One option would be to select the appropriate cluster configuration at a
global level. But then I cannot work with two different clusters, which might
be necessary in future projects.
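Just to illustrate what I mean, a minimal sketch (the config paths are made
up): instead of one global setting, each connection could carry its own
Configuration, loaded from that cluster's client config files:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ClusterAClient {
    public static void main(String[] args) throws Exception {
        // Load the client-side *-site.xml files of one specific cluster,
        // instead of relying on whatever is on the global classpath.
        Configuration conf = new Configuration();
        conf.addResource(new Path("/etc/hadoop/conf.cluster-a/core-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf.cluster-a/hdfs-site.xml"));

        // FileSystem.get(conf) binds to the NameNode named in those files
        // (fs.defaultFS / fs.default.name), so the code itself stays
        // reusable across environments.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to: " + fs.getUri());
    }
}

With this, the choice of cluster is made per connection instead of once,
globally, for the whole workspace.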
I see the following situation:
My task is to develop MR code for a project which has two (or more) datasets
in different clusters. We are not able to move the data around. One solution
is to connect to both clusters. Therefore I need the libraries of both
clusters on the classpath of my client, and the IDE is that client in this
moment. So we have to solve the problem of 1 : n relations on the Hadoop
client side if we want to use the IDE as a kind of interactive workspace,
which fits into the mindset of developers. I am not talking about an analysis
platform. But as a developer I might want to connect to a cluster A which also
runs HBase, where I want to build and deploy a Coprocessor, while in cluster B
the Hive and Pig work is done, and for that cluster I want to create SerDes or
UDFs. Do we have such scenarios in mind already, or am I thinking in the wrong
direction?
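To make that scenario concrete, this is roughly what the IDE-as-client would
need to support in a single JVM (the hostnames and ports are made up):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TwoClusterClient {
    public static void main(String[] args) throws Exception {
        // Two independent HDFS clients in the same JVM, one per cluster.
        FileSystem fsA = FileSystem.get(URI.create("hdfs://namenode-a:8020"),
                new Configuration());
        FileSystem fsB = FileSystem.get(URI.create("hdfs://namenode-b:8020"),
                new Configuration());

        // Browse both clusters from the same workspace / IDE session.
        for (FileStatus s : fsA.listStatus(new Path("/")))
            System.out.println("A: " + s.getPath());
        for (FileStatus s : fsB.listStatus(new Path("/")))
            System.out.println("B: " + s.getPath());
    }
}

Of course this only works as long as the one client library version on the
classpath is wire-compatible with both clusters, which is exactly the 1 : n
problem described above.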
I don't want to make things too complicated, but I think this is a critical
issue which might have an impact on the success of the project.
What do you think? And what about a "live session" via Google Hangout or Skype
to get in touch? I am not sure if this is a common thing in the community, but
I would like to connect faces with the names I already know.
> No connection to testcluster
> ----------------------------
>
> Key: HDT-24
> URL: https://issues.apache.org/jira/browse/HDT-24
> Project: Hadoop Development Tools
> Issue Type: Bug
> Components: HDFS
> Environment: Cluster with CDH4.0 and CDH4.2 installation
> Reporter: Mirko Kaempf
> Priority: Blocker
>
> It is not possible to connect to the testcluster.
> !MESSAGE An internal error occurred during: "Connecting to DFS MyResearchCluster".
> !STACK 0
> java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
> at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:37)
> at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:34)
> at org.apache.hadoop.security.UgiInstrumentation.create(UgiInstrumentation.java:51)
> at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:216)
> at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:184)
> at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:236)
> at org.apache.hadoop.security.KerberosName.<clinit>(KerberosName.java:79)
> at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:209)
> at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:184)
> at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:236)
> at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:466)
> at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:452)
> at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1494)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1395)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
> at org.apache.hdt.core.cluster.HadoopCluster.getDFS(HadoopCluster.java:474)
> at org.apache.hdt.dfs.core.DFSPath.getDFS(DFSPath.java:146)
> at org.apache.hdt.dfs.core.DFSFolder.loadDFSFolderChildren(DFSFolder.java:61)
> at org.apache.hdt.dfs.core.DFSFolder$1.run(DFSFolder.java:178)
> at org.eclipse.core.internal.jobs.Worker.run(Worker.java:53)
> Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
> at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:501)
> at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:421)
> at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:412)
> at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:107)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> ... 21 more