[ 
https://issues.apache.org/jira/browse/HDT-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804546#comment-13804546
 ] 

Mirko Kaempf edited comment on HDT-24 at 10/24/13 6:58 PM:
-----------------------------------------------------------

This quite old issue is still marked as a blocker, and I have been thinking a
lot about our approach to handling dependencies in the context of multiple
Hadoop versions in multiple clusters. We have two levels of dependencies:

As a developer I want to write code for a specific version of Hadoop. So I
have to know the details of my cluster's setup, and at the same time I want to
build reusable software which also works in other environments. This adds a
lot of complexity.
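
As a purely illustrative sketch of that first level (file paths are made up),
this is roughly how a client such as the IDE gets tied to one specific
cluster: it loads that cluster's configuration files and relies on the
matching client libraries being on the classpath.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ClusterAClient {
        public static void main(String[] args) throws Exception {
            // Load the *-site.xml files of one specific cluster
            // (hypothetical paths); the client is now bound to that
            // cluster's Hadoop version and settings.
            Configuration conf = new Configuration();
            conf.addResource(new Path("/etc/clusterA/conf/core-site.xml"));
            conf.addResource(new Path("/etc/clusterA/conf/hdfs-site.xml"));
            FileSystem fs = FileSystem.get(conf);
            System.out.println("Connected to " + fs.getUri());
        }
    }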

One way to constrain this could be to select the appropriate cluster
configuration at a global level. But then I cannot work with two different
clusters, which might be necessary in future projects.

I see the following situation:

My task is to develop MapReduce code for a project which has two (or more)
datasets in different clusters. We are not able to move the data around. A
solution would be to connect to both clusters. Therefore I need the libraries
of both clusters in the classpath of my client, and the IDE is that client in
this moment. So we have to solve the problem of 1 : n relations on the Hadoop
client side if we want to use the IDE as a kind of interactive workspace which
fits into the mindset of developers. I am not talking about an analysis
platform. But as a developer I might be interested in connecting to a cluster
A which also runs HBase, where I want to build and deploy a coprocessor. In
cluster B the Hive and Pig work is done, and for this cluster I want to create
SerDes or UDFs. Do we have such scenarios in mind already, or am I thinking in
the wrong direction?
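
Just to make the 1 : n picture concrete, here is a minimal sketch (hostnames
and ports are invented, and it quietly assumes that one set of Hadoop client
jars on the classpath can talk to both clusters, which is exactly what breaks
when the versions differ):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class TwoClusterClient {
        public static void main(String[] args) throws Exception {
            // One client JVM, two clusters: only possible as long as a
            // single set of client libraries is compatible with both.
            Configuration conf = new Configuration();
            FileSystem fsA = FileSystem.get(URI.create("hdfs://namenode-a:8020"), conf);
            FileSystem fsB = FileSystem.get(URI.create("hdfs://namenode-b:8020"), conf);
            System.out.println("Cluster A: " + fsA.getUri());
            System.out.println("Cluster B: " + fsB.getUri());
        }
    }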

I don't want to make things too complicated, but I think this is a critical
issue which might have an impact on the success of the project.

What do you think? And what about a "live session" via Google Hangout or Skype 
to get in touch? I am not sure if this is a common thing in the community, but 
I would like to connect faces with the names I already know. 

    



> No connection to testcluster
> ----------------------------
>
>                 Key: HDT-24
>                 URL: https://issues.apache.org/jira/browse/HDT-24
>             Project: Hadoop Development Tools
>          Issue Type: Bug
>          Components: HDFS
>         Environment: Cluster with CDH4.0 and CDH4.2 installation
>            Reporter: Mirko Kaempf
>            Priority: Blocker
>
> It is not possible to connect to the testcluster.
> !MESSAGE An internal error occurred during: "Connecting to DFS 
> MyResearchCluster".
> !STACK 0
> java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
>       at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:37)
>       at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:34)
>       at 
> org.apache.hadoop.security.UgiInstrumentation.create(UgiInstrumentation.java:51)
>       at 
> org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:216)
>       at 
> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:184)
>       at 
> org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:236)
>       at 
> org.apache.hadoop.security.KerberosName.<clinit>(KerberosName.java:79)
>       at 
> org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:209)
>       at 
> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:184)
>       at 
> org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:236)
>       at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:466)
>       at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:452)
>       at 
> org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1494)
>       at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1395)
>       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
>       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
>       at 
> org.apache.hdt.core.cluster.HadoopCluster.getDFS(HadoopCluster.java:474)
>       at org.apache.hdt.dfs.core.DFSPath.getDFS(DFSPath.java:146)
>       at 
> org.apache.hdt.dfs.core.DFSFolder.loadDFSFolderChildren(DFSFolder.java:61)
>       at org.apache.hdt.dfs.core.DFSFolder$1.run(DFSFolder.java:178)
>       at org.eclipse.core.internal.jobs.Worker.run(Worker.java:53)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.commons.configuration.Configuration
>       at 
> org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:501)
>       at 
> org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:421)
>       at 
> org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:412)
>       at 
> org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:107)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>       ... 21 more


