Re: issue of building with native

2013-09-18 Thread Harsh J
This could be http://jira.codehaus.org/browse/MSITE-683. What version
of Maven are you using btw?
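If this is MSITE-683 (Maven 3.1 replaced Sonatype Aether with Eclipse Aether, which breaks older maven-site-plugin versions with exactly this NoClassDefFoundError), the usual workarounds are to build with Maven 3.0.x or to pin a site-plugin release that supports the newer Maven. A hedged pom.xml sketch; the 3.3 version number is an assumption to check against the plugin's compatibility notes:

```xml
<!-- In pluginManagement: pin a maven-site-plugin release that is
     compatible with the Maven version in use (3.3 is assumed here). -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-site-plugin</artifactId>
  <version>3.3</version>
</plugin>
```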

On Thu, Sep 19, 2013 at 6:31 AM, Hai Huang  wrote:
> Hi,
>
> I built the Java code with native code following the instructions in BUILDING.txt
>
>  $ mvn package -Pdist,native,docs -DskipTests -Dtar
>
> But I hit the error below. Does anyone know the problem? The build passed
> when I built without native code ($ mvn package -Pdist -DskipTests -Dtar).
>
>
> I am using Java 1.6.0_24 and gcc 4.4.6
>
> Thanks
>
> Haifeng
> 
>
>
> [WARNING] Error injecting: 
> org.apache.maven.reporting.exec.DefaultMavenReportExecutor
> java.lang.NoClassDefFoundError: org/sonatype/aether/graph/DependencyFilter
> [remainder of stack trace snipped; it appears in full in the original message below]

Re: issue of building with native

2013-09-18 Thread Hai Huang
Hi,

I built the Java code with native code following the instructions in BUILDING.txt

 $ mvn package -Pdist,native,docs -DskipTests -Dtar

But I hit the error below. Does anyone know the problem? The build passed
when I built without native code ($ mvn package -Pdist -DskipTests -Dtar).


I am using Java 1.6.0_24 and gcc 4.4.6

Thanks

Haifeng



[WARNING] Error injecting: 
org.apache.maven.reporting.exec.DefaultMavenReportExecutor
java.lang.NoClassDefFoundError: org/sonatype/aether/graph/DependencyFilter
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2444)
    at java.lang.Class.getDeclaredConstructors(Class.java:1883)
    at 
com.google.inject.spi.InjectionPoint.forConstructorOf(InjectionPoint.java:245)
    at 
com.google.inject.internal.ConstructorBindingImpl.create(ConstructorBindingImpl.java:99)
    at 
com.google.inject.internal.InjectorImpl.createUninitializedBinding(InjectorImpl.java:653)
    at 
com.google.inject.internal.InjectorImpl.createJustInTimeBinding(InjectorImpl.java:863)
    at 
com.google.inject.internal.InjectorImpl.createJustInTimeBindingRecursive(InjectorImpl.java:790)
    at 
com.google.inject.internal.InjectorImpl.getJustInTimeBinding(InjectorImpl.java:278)
    at 
com.google.inject.internal.InjectorImpl.getBindingOrThrow(InjectorImpl.java:210)
    at 
com.google.inject.internal.InjectorImpl.getProviderOrThrow(InjectorImpl.java:986)
    at 
com.google.inject.internal.InjectorImpl.getProvider(InjectorImpl.java:1019)
    at 
com.google.inject.internal.InjectorImpl.getProvider(InjectorImpl.java:982)
    at 
com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1032)
    at 
org.eclipse.sisu.reflect.AbstractDeferredClass.get(AbstractDeferredClass.java:44)
    at 
com.google.inject.internal.ProviderInternalFactory.provision(ProviderInternalFactory.java:86)
    at 
com.google.inject.internal.InternalFactoryToInitializableAdapter.provision(InternalFactoryToInitializableAdapter.java:55)
    at 
com.google.inject.internal.ProviderInternalFactory$1.call(ProviderInternalFactory.java:70)
    at 
com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:100)
    at 
org.eclipse.sisu.plexus.lifecycles.PlexusLifecycleManager.onProvision(PlexusLifecycleManager.java:134)
    at 
com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:109)
    at 
com.google.inject.internal.ProvisionListenerStackCallback.provision(ProvisionListenerStackCallback.java:55)
    at 
com.google.inject.internal.ProviderInternalFactory.circularGet(ProviderInternalFactory.java:68)
    at 
com.google.inject.internal.InternalFactoryToInitializableAdapter.get(InternalFactoryToInitializableAdapter.java:47)
    at 
com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
    at 
com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1054)
    at 
com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
    at com.google.inject.Scopes$1$1.get(Scopes.java:59)
    at 
com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
    at 
com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:997)
    at 
com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1047)
    at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:993)
    at 
org.eclipse.sisu.locators.LazyBeanEntry.getValue(LazyBeanEntry.java:82)
    at 
org.eclipse.sisu.plexus.locators.LazyPlexusBean.getValue(LazyPlexusBean.java:52)
    at 
org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:259)
    at 
org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:239)
    at 
org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:233)
    at 
org.apache.maven.plugins.site.AbstractSiteRenderingMojo.getReports(AbstractSiteRenderingMojo.java:234)
    at org.apache.maven.plugins.site.SiteMojo.execute(SiteMojo.java:121)
    at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:106)
    at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
    at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
    at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
    at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
    at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProje

Re: symlink support in Hadoop 2 GA

2013-09-18 Thread J. Rottinghuis
However painful protobuf version changes are at build time for Hadoop
developers, at runtime with multiple clusters and many Hadoop users this is
a total nightmare.
Even upgrading clusters from one protobuf version to the next is going to
be very difficult. The same users will run jobs on, and/or read from and
write to, multiple clusters. Does that mean they will have to fork their
code and run multiple instances? At the very least they will have to update
their applications, all in sync with Hadoop cluster changes. And these
changes are not doable in a rolling fashion.
Will all Hadoop and HBase clusters upgrade at the same time, or will we
have to have our users fork / roll multiple versions?
My point is that these things are much harder than "just fix the (Jenkins)
build and we're done". These changes are massively disruptive.

There is a similar situation with symlinks. Having an API that lets users
create symlinks is very problematic. Some users create symlinks and, as Eli
pointed out, somebody else (or an automated process) tries to copy to / from
another (Hadoop 1.x?) cluster over hftp. What will happen?
Having an API that people should not use is also a nightmare. We
experienced this with append: for a while it was there, but users were "not
allowed to use it" (or else there were large numbers of corrupt blocks). If
there is an API to create a symlink, then some of our users are going to
use it and others are going to trip over those symlinks. We already know
that Pig does not work with symlinks yet, and as Steve pointed out, there
is a ton of other code out there that assumes that !isDir() means isFile().
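The !isDir()-implies-isFile() assumption can be made concrete with a small sketch. This is illustrative Java only; the class and method names below are hypothetical stand-ins, not Hadoop's actual FileStatus API:

```java
// Illustrative only: a minimal stand-in for a file-status object, modeling
// the pre-symlink assumption that "not a directory" implies "a file".
// These names are hypothetical, not Hadoop's real API.
enum EntryKind { FILE, DIRECTORY, SYMLINK }

class StatusSketch {
    private final EntryKind kind;
    StatusSketch(EntryKind kind) { this.kind = kind; }
    public boolean isDir()     { return kind == EntryKind.DIRECTORY; }
    public boolean isFile()    { return kind == EntryKind.FILE; }
    public boolean isSymlink() { return kind == EntryKind.SYMLINK; }
}

public class SymlinkAssumption {
    // The widespread 1.x-era idiom: anything that isn't a directory is a file.
    public static boolean legacyTreatsAsFile(StatusSketch s) {
        return !s.isDir();
    }

    public static void main(String[] args) {
        StatusSketch link = new StatusSketch(EntryKind.SYMLINK);
        // Legacy code happily treats the symlink as a plain file...
        System.out.println(legacyTreatsAsFile(link)); // true
        // ...even though it is neither a file nor a directory.
        System.out.println(link.isFile());            // false
    }
}
```

Code written this way will try to open symlinks (possibly dangling ones) as if they were files; symlink-aware code has to check all three predicates.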

I like symlink functionality, but in our migration to Hadoop 2.x this is a
total distraction. If the APIs stay in 2.2 GA we'll have to choose to:
a) Not uprev until symlink support is figured out up and down the stack,
and we've been able to migrate all our 1.x (equivalent) clusters to 2.x
(equivalent). Or
b) rip out the API altogether. Or
c) change the implementation to throw an UnsupportedOperationException
I'm not sure yet which of these I like least.

Thanks,

Joep




On Wed, Sep 18, 2013 at 9:48 AM, Arun C Murthy  wrote:

>
> On Sep 16, 2013, at 6:49 PM, Andrew Wang  wrote:
>
> > Hi all,
> >
> > I wanted to broadcast plans for putting the FileSystem symlinks work
> > (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I
> think
> > it's pretty important we get it in since it's not a compatible change; if
> > it misses the GA train, we're not going to have symlinks until the next
> > major release.
>
> Just catching up, is this an incompatible change, or not? The above reads
> 'not an incompatible change'.
>
> Arun
>
> >
> > However, we're still dealing with ongoing issues revealed via testing.
> > There's user-code out there that only handles files and directories and
> > will barf when given a symlink (perhaps a dangling one!). See HADOOP-9912
> > for a nice example where globStatus returning symlinks broke Pig; some of
> > us had a conference call to talk it through, and one definite conclusion
> > was that this wasn't solvable in a generally compatible manner.
> >
> > There are also still some gaps in symlink support right now. For example,
> > the more esoteric FileSystems like WebHDFS, HttpFS, and HFTP need symlink
> > resolution, and tooling like the FsShell and Distcp still need to be
> > updated as well.
> >
> > So, there's definitely work to be done, but there are a lot of users
> > interested in the feature, and symlinks really should be in GA. Would
> > appreciate any thoughts/input on the matter.
> >
> > Thanks,
> > Andrew
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>


Re: [VOTE] Release Apache Hadoop 2.1.1-beta

2013-09-18 Thread Roman Shaposhnik
On Tue, Sep 17, 2013 at 9:58 PM, Roman Shaposhnik  wrote:
> I'm also running the tests on fully distributed clusters in Bigtop --
> will report the findings tomorrow.

The first result of my test run is this:
https://issues.apache.org/jira/browse/HDFS-5225

Not sure if it qualifies as a blocker, but it looks serious enough. Not
just because it essentially makes DN logs grow without bound (that part can
be mitigated by managing the local FS) but because it suggests a deeper
underlying problem.

I'd love to proceed with more testing, but I'm keeping the current cluster
running in case anybody wants to take a look.

Thanks,
Roman.


[jira] [Created] (HADOOP-9979) HADOOP_IDENT_STRING should not be changed in hadoop-env.sh for hadoop daemons running as services

2013-09-18 Thread Masatake Iwasaki (JIRA)
Masatake Iwasaki created HADOOP-9979:


 Summary: HADOOP_IDENT_STRING should not be changed in 
hadoop-env.sh for hadoop daemons running as services
 Key: HADOOP-9979
 URL: https://issues.apache.org/jira/browse/HADOOP-9979
 Project: Hadoop Common
  Issue Type: Improvement
  Components: conf
Reporter: Masatake Iwasaki
Priority: Trivial


Configurations for services are set in files such as /etc/default/* on
Linux. To avoid these being overridden by hadoop-env.sh, there should be a
cautionary comment there.
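One hedged way to honor an externally supplied value (a sketch, not the fix actually proposed in this JIRA) is shell default-expansion in hadoop-env.sh, so a value exported by /etc/default/* survives:

```shell
# Sketch for hadoop-env.sh (an illustration, not this JIRA's patch):
# only fall back to $USER when a service wrapper such as /etc/default/*
# has not already exported HADOOP_IDENT_STRING.
export HADOOP_IDENT_STRING="${HADOOP_IDENT_STRING:-$USER}"
echo "daemon identity: $HADOOP_IDENT_STRING"
```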

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HADOOP-9974) Trunk Build Failure at HDFS Sub-project

2013-09-18 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang resolved HADOOP-9974.
-

Resolution: Not A Problem

Resolving, please re-open if Steve's fix is insufficient. FWIW, I have not hit 
this in my Ubuntu dev environment.

> Trunk Build Failure at HDFS Sub-project
> ---
>
> Key: HADOOP-9974
> URL: https://issues.apache.org/jira/browse/HADOOP-9974
> Project: Hadoop Common
>  Issue Type: Bug
> Environment: Mac OS X
>Reporter: Zhijie Shen
>
> Recently Hadoop upgraded to Protobuf 2.5.0. To build trunk, I updated my
> installed Protobuf to 2.5.0. With this upgrade I no longer hit the build
> failure due to protoc, but the build failed in the HDFS sub-project. Below
> is the failure message. I'm using Mac OS X.
> {code}
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Apache Hadoop Main  SUCCESS [1.075s]
> [INFO] Apache Hadoop Project POM . SUCCESS [0.805s]
> [INFO] Apache Hadoop Annotations . SUCCESS [2.283s]
> [INFO] Apache Hadoop Assemblies .. SUCCESS [0.343s]
> [INFO] Apache Hadoop Project Dist POM  SUCCESS [1.913s]
> [INFO] Apache Hadoop Maven Plugins ... SUCCESS [2.390s]
> [INFO] Apache Hadoop Auth  SUCCESS [2.597s]
> [INFO] Apache Hadoop Auth Examples ... SUCCESS [1.868s]
> [INFO] Apache Hadoop Common .. SUCCESS [55.798s]
> [INFO] Apache Hadoop NFS . SUCCESS [3.549s]
> [INFO] Apache Hadoop MiniKDC . SUCCESS [1.788s]
> [INFO] Apache Hadoop Common Project .. SUCCESS [0.044s]
> [INFO] Apache Hadoop HDFS  FAILURE [25.219s]
> [INFO] Apache Hadoop HttpFS .. SKIPPED
> [INFO] Apache Hadoop HDFS BookKeeper Journal . SKIPPED
> [INFO] Apache Hadoop HDFS-NFS  SKIPPED
> [INFO] Apache Hadoop HDFS Project  SKIPPED
> [INFO] hadoop-yarn ... SKIPPED
> [INFO] hadoop-yarn-api ... SKIPPED
> [INFO] hadoop-yarn-common  SKIPPED
> [INFO] hadoop-yarn-server  SKIPPED
> [INFO] hadoop-yarn-server-common . SKIPPED
> [INFO] hadoop-yarn-server-nodemanager  SKIPPED
> [INFO] hadoop-yarn-server-web-proxy .. SKIPPED
> [INFO] hadoop-yarn-server-resourcemanager  SKIPPED
> [INFO] hadoop-yarn-server-tests .. SKIPPED
> [INFO] hadoop-yarn-client  SKIPPED
> [INFO] hadoop-yarn-applications .. SKIPPED
> [INFO] hadoop-yarn-applications-distributedshell . SKIPPED
> [INFO] hadoop-mapreduce-client ... SKIPPED
> [INFO] hadoop-mapreduce-client-core .. SKIPPED
> [INFO] hadoop-yarn-applications-unmanaged-am-launcher  SKIPPED
> [INFO] hadoop-yarn-site .. SKIPPED
> [INFO] hadoop-yarn-project ... SKIPPED
> [INFO] hadoop-mapreduce-client-common  SKIPPED
> [INFO] hadoop-mapreduce-client-shuffle ... SKIPPED
> [INFO] hadoop-mapreduce-client-app ... SKIPPED
> [INFO] hadoop-mapreduce-client-hs  SKIPPED
> [INFO] hadoop-mapreduce-client-jobclient . SKIPPED
> [INFO] hadoop-mapreduce-client-hs-plugins  SKIPPED
> [INFO] Apache Hadoop MapReduce Examples .. SKIPPED
> [INFO] hadoop-mapreduce .. SKIPPED
> [INFO] Apache Hadoop MapReduce Streaming . SKIPPED
> [INFO] Apache Hadoop Distributed Copy  SKIPPED
> [INFO] Apache Hadoop Archives  SKIPPED
> [INFO] Apache Hadoop Rumen ... SKIPPED
> [INFO] Apache Hadoop Gridmix . SKIPPED
> [INFO] Apache Hadoop Data Join ... SKIPPED
> [INFO] Apache Hadoop Extras .. SKIPPED
> [INFO] Apache Hadoop Pipes ... SKIPPED
> [INFO] Apache Hadoop Tools Dist .. SKIPPED
> [INFO] Apache Hadoop Tools ... SKIPPED
> [INFO] Apache Hadoop Distribution  SKIPPED
> [INFO] Apache Hadoop Client .. SKIPPED
> [INFO] Apache Hadoop Mini-Cluster  SKIPPED
> [INFO] 
> --

Re: symlink support in Hadoop 2 GA

2013-09-18 Thread Arun C Murthy

On Sep 16, 2013, at 6:49 PM, Andrew Wang  wrote:

> Hi all,
> 
> I wanted to broadcast plans for putting the FileSystem symlinks work
> (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I think
> it's pretty important we get it in since it's not a compatible change; if
> it misses the GA train, we're not going to have symlinks until the next
> major release.

Just catching up, is this an incompatible change, or not? The above reads 'not 
an incompatible change'.

Arun

> 
> However, we're still dealing with ongoing issues revealed via testing.
> There's user-code out there that only handles files and directories and
> will barf when given a symlink (perhaps a dangling one!). See HADOOP-9912
> for a nice example where globStatus returning symlinks broke Pig; some of
> us had a conference call to talk it through, and one definite conclusion
> was that this wasn't solvable in a generally compatible manner.
> 
> There are also still some gaps in symlink support right now. For example,
> the more esoteric FileSystems like WebHDFS, HttpFS, and HFTP need symlink
> resolution, and tooling like the FsShell and Distcp still need to be
> updated as well.
> 
> So, there's definitely work to be done, but there are a lot of users
> interested in the feature, and symlinks really should be in GA. Would
> appreciate any thoughts/input on the matter.
> 
> Thanks,
> Andrew

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/





Re: symlink support in Hadoop 2 GA

2013-09-18 Thread Eli Collins
On Wed, Sep 18, 2013 at 5:45 AM, Steve Loughran wrote:

> On 18 September 2013 12:53, Alejandro Abdelnur  wrote:
>
> > On Wed, Sep 18, 2013 at 11:29 AM, Steve Loughran wrote:
> >
> > > I'm reluctant for this as while delaying the release, because we are
> > going
> > > to find problems all the way up the stack -which will require a
> > > choreographed set of changes. Given the grief of the protbuf update, I
> > > don't want to go near that just before the final release.
> > >
> >
> > Well, I would use the exact same argument used for protobuf (which only
> > complication was getting protoc 2.5.0 in the jenkins boxes and
> communicate
> > developers to do the same, other than that we didn't hit any other issue
> > AFAIK) ...
> >
>
> protobuf was traumatic at build time, as I recall because it was neither
> forwards or backwards compatible. Those of us trying to build different
> branches had to choose which version to have on the path, or set up scripts
> to do the switching. HBase needed rebuilding, so did other things. And I
> still have the pain of downloading and installing protoc on all Linux VMs I
> build up going forward, until apt-get and yum have protoc 2.5 artifacts.
>
> This means it was very painful for developer, added a lot of late breaking
> pain to the developers, but it had one key feature that gave it an edge: it
> was immediately obvious where you had a problem as things didn't compile or
> classload without linkage problems. No latent bugs, unless protobuf 2.5 has
> them internally -for which we have to rely on google's release testing to
> have found.
>
> That is a lot simpler to regression test than adding any new feature to
> HDFS and seeing what breaks -as that is something that only surfaces out in
> the field. Which is why I think it's too late in the 2.1 release timetable
> to add symlinks. We've had a 2.1-beta out there, we've got feedback. Fix
> those problems that are show stoppers, but don't add more stuff. Which is
> precisely why I have not been pushing in any of my recent changes. I may
> seem ruthless arguing against symlinks -but I'm not being inconsistent with
> my own commit history. The only two things I've put in branch-2.1 since
> beta-1 were a separate log for the Configuration deprecation warnings and a
> patch to the POM for a java7 build on OSX: and they weren't even my
> patches.
>
>
> -Steve
>
> (One of these days I should volunteer to be the release manager and it'll
> be obvious that Arun is being quite amenable to all the other developers)
>
>
>
> >
> > IMO, it makes more sense to do this change during the beta rather than
> when
> > GA. That gives us more flexibility to iron out things if necessary.
> >
> >
> I'm arguing this change can go into the beta of the successor to 2.1 -not
> GA.
>
>
What does "this change" refer to?  Symlinks are already in 2.1, and the
existing semantics create problems for programs (e.g. see the Pig example
in HADOOP-9912) that we need to resolve. I don't think doing nothing is an
option for 2.2 GA.

Thanks,
Eli


Re: symlink support in Hadoop 2 GA

2013-09-18 Thread Steve Loughran
On 18 September 2013 12:53, Alejandro Abdelnur  wrote:

> On Wed, Sep 18, 2013 at 11:29 AM, Steve Loughran wrote:
>
> > I'm reluctant for this as while delaying the release, because we are
> going
> > to find problems all the way up the stack -which will require a
> > choreographed set of changes. Given the grief of the protbuf update, I
> > don't want to go near that just before the final release.
> >
>
> Well, I would use the exact same argument used for protobuf (which only
> complication was getting protoc 2.5.0 in the jenkins boxes and communicate
> developers to do the same, other than that we didn't hit any other issue
> AFAIK) ...
>

protobuf was traumatic at build time, as I recall, because it was neither
forwards nor backwards compatible. Those of us trying to build different
branches had to choose which version to have on the path, or set up scripts
to do the switching. HBase needed rebuilding, and so did other things. And I
still have the pain of downloading and installing protoc on every Linux VM I
build going forward, until apt-get and yum have protoc 2.5 artifacts.
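The version-switching scripts mentioned above might look something like the following; the install prefix ~/protobuf/<version> is an assumption for illustration, not something from this thread:

```shell
# Hypothetical per-branch protoc switcher; assumes each protobuf release
# was built and installed under ~/protobuf/<version>.
use_protoc() {
  export PATH="$HOME/protobuf/$1/bin:$PATH"
  echo "protoc now resolved from $HOME/protobuf/$1/bin"
}

use_protoc 2.5.0   # before building branch-2.1 (requires protobuf 2.5.0)
use_protoc 2.4.1   # before building an older branch
```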

This meant a lot of late-breaking pain for developers, but it had one key
feature that gave it an edge: it was immediately obvious where you had a
problem, as things didn't compile or classload without linkage errors. No
latent bugs, unless protobuf 2.5 has them internally -for which we have to
rely on Google's release testing.

That is a lot simpler to regression test than adding any new feature to
HDFS and seeing what breaks -as that is something that only surfaces out in
the field. Which is why I think it's too late in the 2.1 release timetable
to add symlinks. We've had a 2.1-beta out there, we've got feedback. Fix
those problems that are show stoppers, but don't add more stuff. Which is
precisely why I have not been pushing in any of my recent changes. I may
seem ruthless arguing against symlinks -but I'm not being inconsistent with
my own commit history. The only two things I've put in branch-2.1 since
beta-1 were a separate log for the Configuration deprecation warnings and a
patch to the POM for a java7 build on OSX: and they weren't even my patches.


-Steve

(One of these days I should volunteer to be the release manager and it'll
be obvious that Arun is being quite amenable to all the other developers)



>
> IMO, it makes more sense to do this change during the beta rather than when
> GA. That gives us more flexibility to iron out things if necessary.
>
>
I'm arguing this change can go into the beta of the successor to 2.1 -not
GA.



Re: symlink support in Hadoop 2 GA

2013-09-18 Thread Alejandro Abdelnur
On Wed, Sep 18, 2013 at 11:29 AM, Steve Loughran wrote:

> I'm reluctant for this as while delaying the release, because we are going
> to find problems all the way up the stack -which will require a
> choreographed set of changes. Given the grief of the protbuf update, I
> don't want to go near that just before the final release.
>

Well, I would use the exact same argument used for protobuf (which only
complication was getting protoc 2.5.0 in the jenkins boxes and communicate
developers to do the same, other than that we didn't hit any other issue
AFAIK) ...

IMO, it makes more sense to do this change during the beta rather than when
GA. That gives us more flexibility to iron out things if necessary.

thx

-- 
Alejandro


Re: symlink support in Hadoop 2 GA

2013-09-18 Thread Steve Loughran
On 17 September 2013 23:05, Eli Collins  wrote:

> (Looping in Arun since this impacts 2.x releases)
>
> I updated the versions on HADOOP-8040 and sub-tasks to reflect where
> the changes have landed. All of these changes (modulo HADOOP-9417)
> were merged to branch-2.1 and are in the 2.1.0 release.
>
> While symlinks are in 2.1.0 I don't think we can really claim they're
> ready until issues like HADOOP-9912 are resolved, and they are
> supported in the shell, distcp and WebHDFS/HttpFS/Hftp (these are not
> esoteric!).  Someone can create a symlink with FileSystem causing
> someone else's distcp job to fail. Unlikely given they're not exposed
> outside the Java API but still not great.   Ideally this work would
> have been done on a feature branch and then merged when complete, but
> that's water under the bridge.
>
> I see the following options:
>
> 1. Fixup the current symlink support so that symlinks are ready for
> 2.2 (GA), or at least the public APIs. This means the APIs will be in
> GA from the get go so while the functionality might be fully baked we
> don't have to worry about incompatible changes like FileStatus#isDir()
> changing behavior in 2.3 or a later update.  The downside is this will
> take at least a couple weeks (to resolve HADOOP-9912 and potentially
> implement the remaining pieces) and so may impact the 2.2 release
> timing. This option means 2.2 won't remove the new APIs introduced in
> 2.1.  We'd want to spin a 2.1.2 beta with the new API changes so we
> don't introduce new APIs in the beta to GA transition.
>

I'm reluctant for this as while delaying the release, because we are going
to find problems all the way up the stack -which will require a
choreographed set of changes. Given the grief of the protbuf update, I
don't want to go near that just before the final release.


We already have lots of 1.x-era code that assumes !isDir() == isFile() -I
know that from spending lots of time in the FS specification layer. That's
something which is going to break with symlinks, irrespective of when the
feature is rolled out.

The other thing we have to do is push back the API changes into 1.x, at
least at the FileSystem interface layer, so that code which uses
isDirectory, isSymlink, etc. does not need to be edited to compile and run
against both versions. I know Chris Nauroth has been doing this, but I think
we need to make sure it is all there. This will let things like Pig compile
against all versions with symlink-ready code.

The other issue is that it goes on to increase the pressure to get other
features in there: "hey, we've got 2 more weeks! Let's add X!" (where for me,
X = {HADOOP-8545, some restrictions on valid names of app types & instance
names for YARN, ...}).

My vote then: freeze and ship. We're happy with the wire formats, the API
has added knowledge of Symlink and Filesystem features can evolve
afterwards -with layers above handling the changes.



>
> 2. Revert symlinks from branch-2.1-beta and branch-2. Finish up the
> work in trunk (or a feature branch) and merge for a subsequent 2.x
> update.  While this helps get us to GA faster it would be preferable
> to get an API change like this in for 2.2 GA since they may be
> disruptive to introduce in an update (eg see example in #1). And of
> course our users would like symlinks functionality in the GA release.
> This option would mean 2.2 is incompatible with 2.1 because it's
> dropping the new APIs, not ideal for a beta to GA transition.
>


Why just ship as is, with a note "symlinks not live yet, leave alone".
That's what's been in the betas to date.


>
> 3. Revert and punt symlinks to 3.x.  IMO should be the last resort.
>
>
I'd prefer it in 2.3 -which is where I'm targeting all my feature creep.

IMO 2.1 is frozen except for bug fixes
