I don't think we have any such documentation.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
On Tue, May 13, 2014 at 10:55 AM, Sean Busbey <[email protected]> wrote:
> Does anyone know if we have some build/packaging documentation that lists where we currently expect to get our various runtime dependencies from? And when we decide to repackage something that we know is present in the environment?
>
> I know the assembly file lists which dependencies we package in our binary assembly, and git blame will explain why e.g. commons-math is present. But it's easy for the git history to get complicated enough that such a lookup doesn't really work.
>
> I'm looking for detail at the level of which Hadoop sub-component provides a dependency, rather than just "it's in the Hadoop dist," so we have an easier time seeing what the impact of this change would be.
>
> Also, this would make it easier to see if there are other version mismatches like ACCUMULO-2791.
>
> On Mon, May 12, 2014 at 7:36 PM, Joey Echeverria <[email protected]> wrote:
>> Packaging other jars that had been made available at runtime by virtue of their existence in the Hadoop directories.
>>
>> I'm only talking about dependencies that were/are provided by Hadoop.
>>
>> But since you brought up ZooKeeper, my understanding is that ZK intends for dependent projects to rely only on the ZK jar at the top level of the tarball. If you need other jars, you should package those yourself. WARNING: my info about ZK may be out of date, as it's been a long time since I spoke to the project about how they intend to be consumed by services that rely on them.
>>
>> On Mon, May 12, 2014 at 7:30 PM, Christopher <[email protected]> wrote:
>>> Does that mean package everything else?
>>>
>>> What about ZooKeeper?
>>>
>>> --
>>> Christopher L Tubbs II
>>> http://gravatar.com/ctubbsii
>>>
>>> On Mon, May 12, 2014 at 3:38 PM, Joey Echeverria <[email protected]> wrote:
>>>> +1 to only depending on Hadoop client jars.
>>>>
>>>> --
>>>> Joey Echeverria
>>>> Chief Architect
>>>> Cloudera Government Solutions
>>>>
>>>> On Sun, May 11, 2014 at 6:07 PM, Christopher <[email protected]> wrote:
>>>>> In general, I think this is reasonable... especially because Hadoop Client stabilizes things a bit. On the other hand, things get really complicated with dependencies in the POM (somewhat complicated) and packaged dependencies (more complicated) when we're talking about supporting both Hadoop 1 and Hadoop 2. I know some of us want to drop Hadoop 1 support in 2.0.0, and I think this is one more good reason to do that.
>>>>>
>>>>> Another data point that I think is going to complicate things a (very) tiny bit: the work on ACCUMULO-2589 includes things like dropping the dependencies on Hadoop from the API. But we're likely to still have a dependency on Guava (there was a suggestion to use Guava's @Beta annotations in the API). Maybe this is fine... because the packaging considerations for the binary tarball are not the same as the API module dependencies (though they'll have to be compatible), but it's something to consider.
>>>>>
>>>>> --
>>>>> Christopher L Tubbs II
>>>>> http://gravatar.com/ctubbsii
>>>>>
>>>>> On Sun, May 11, 2014 at 4:45 PM, Sean Busbey <[email protected]> wrote:
>>>>>> ACCUMULO-2786 has brought up the issue of which dependencies we bring with Accumulo rather than depend on the environment to provide [1].
>>>>>>
>>>>>> Christopher explains our extant reasoning thus:
>>>>>>
>>>>>>> The precedent has been: if vanilla Apache Hadoop provides it in its bin tarball, we don't need to.
>>>>>>
>>>>>> I'd like us to move to packaging any dependencies that aren't brought in by Hadoop Client.
>>>>>>
>>>>>> 1) Our existing practice developed before Hadoop Client existed, so we essentially *had* to have all of the Hadoop-related deps on our classpath. For versions where we default to Hadoop 2, we can improve things.
>>>>>>
>>>>>> 2) We should encourage users to follow good practice by minimizing the number of jars added to the classpath.
>>>>>>
>>>>>> 3) We still have to include the jars found in Hadoop Client because we use Hadoop.
>>>>>>
>>>>>> 4) Limiting the dependencies we rely on external sources to provide allows us to update more of our dependencies to current versions.
>>>>>>
>>>>>> 5) Minimizing the number of jars we rely on from external sources reduces the chances that they change out from under us (and thus reduces the number of external factors we have to remain cognizant of).
>>>>>>
>>>>>> 6) Minimizing the classpath reduces the chances of having multiple different versions of the same library present.
>>>>>>
>>>>>> I'd also like for us to *not* package any of the jars brought in by Hadoop Client. Due to the additional work it would take to downgrade our version of guava, I'd like to wait to do that.
>>>>>>
>>>>>> [1]: https://issues.apache.org/jira/browse/ACCUMULO-2786
>>>>>>
>>>>>> --
>>>>>> Sean
>
> --
> Sean
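Concretely, the proposal amounts to treating Hadoop Client as the one artifact the runtime environment supplies and bundling everything else ourselves. A minimal POM sketch of that arrangement; the version numbers are illustrative and this is not taken from the actual Accumulo build files:

    <dependencies>
      <!-- Supplied by the environment: hadoop-client and everything it drags in
           are expected to come from the Hadoop installation, so they are marked
           provided and stay out of our packaging. -->
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.2.0</version>
        <scope>provided</scope>
      </dependency>

      <!-- Packaged by us: default (compile) scope, so the binary assembly bundles
           it instead of assuming it is already on the Hadoop classpath. -->
      <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-math</artifactId>
        <version>2.1</version>
      </dependency>
    </dependencies>

With hadoop-client marked provided, an assembly dependencySet built against the default runtime scope would leave the Hadoop tree out of the binary tarball while still bundling compile-scope jars such as commons-math.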
