I don't think we have any such documentation.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
On Tue, May 13, 2014 at 10:55 AM, Sean Busbey <[email protected]> wrote:
> Does anyone know if we have some build/packaging documentation that lists where we currently expect to get our various runtime dependencies from? And when we decide to repackage something that we know is present in the environment?
>
> I know the assembly file lists which dependencies we package in our binary assembly, and git blame will explain why e.g. commons-math is present. But it's easy for the git history to get complicated enough that such a lookup doesn't really work.
>
> I'm looking for detail at the level of which Hadoop sub-component provides a dependency, rather than just "it's in the Hadoop dist," so we have an easier time seeing what the impact of this change would be.
>
> Also, this would make it easier to see if there are other version mismatches like ACCUMULO-2791.
>
> On Mon, May 12, 2014 at 7:36 PM, Joey Echeverria <[email protected]> wrote:
>> Packaging other jars that had been made available at runtime by virtue of their existence in the Hadoop directories.
>>
>> I'm only talking about dependencies that were/are provided by Hadoop.
>>
>> But since you brought up ZooKeeper, my understanding is that ZK intends for dependent projects to rely only on the ZK jar at the top level of the tarball. If you need other jars, you should package those yourself. WARNING: my info about ZK may be out of date, as it's been a long time since I spoke to the project about how they intend to be consumed by services that rely on them.
>>
>> On Mon, May 12, 2014 at 7:30 PM, Christopher <[email protected]> wrote:
>>> Does that mean package everything else?
>>>
>>> What about ZooKeeper?
>>>
>>> --
>>> Christopher L Tubbs II
>>> http://gravatar.com/ctubbsii
>>>
>>> On Mon, May 12, 2014 at 3:38 PM, Joey Echeverria <[email protected]> wrote:
>>>> +1 to only depending on Hadoop client jars.
>>>>
>>>> --
>>>> Joey Echeverria
>>>> Chief Architect
>>>> Cloudera Government Solutions
>>>>
>>>> On Sun, May 11, 2014 at 6:07 PM, Christopher <[email protected]> wrote:
>>>>> In general, I think this is reasonable... especially because Hadoop Client stabilizes things a bit. On the other hand, things get really complicated with dependencies in the POM (somewhat complicated) and packaged dependencies (more complicated) when we're talking about supporting both Hadoop 1 and Hadoop 2. I know some of us want to drop Hadoop 1 support in 2.0.0, and I think this is one more good reason to do that.
>>>>>
>>>>> Another data point that I think is going to complicate things a (very) tiny bit: the work on ACCUMULO-2589 includes things like dropping the dependencies on Hadoop from the API. But we're likely to still have a dependency on Guava (there was a suggestion to use Guava's @Beta annotations in the API). Maybe this is fine... because the packaging considerations for the binary tarball are not the same as the API module dependencies (though they'll have to be compatible), but it's something to consider.
>>>>>
>>>>> --
>>>>> Christopher L Tubbs II
>>>>> http://gravatar.com/ctubbsii
>>>>>
>>>>> On Sun, May 11, 2014 at 4:45 PM, Sean Busbey <[email protected]> wrote:
>>>>>> ACCUMULO-2786 has brought up the issue of which dependencies we bring with Accumulo rather than depend on the environment to provide [1].
>>>>>>
>>>>>> Christopher explains our extant reasoning thus:
>>>>>>
>>>>>>> The precedent has been: if vanilla Apache Hadoop provides it in its bin tarball, we don't need to.
>>>>>>
>>>>>> I'd like us to move to packaging any dependencies that aren't brought in by Hadoop Client.
>>>>>>
>>>>>> 1) Our existing practice developed before Hadoop Client existed, so we essentially *had* to have all of the Hadoop-related deps on our classpath. For versions where we default to Hadoop 2, we can improve things.
>>>>>>
>>>>>> 2) We should encourage users to follow good practice by minimizing the number of jars added to the classpath.
>>>>>>
>>>>>> 3) We still have to include the jars found in Hadoop Client because we use Hadoop.
>>>>>>
>>>>>> 4) Limiting the dependencies we rely on external sources to provide allows us to update more of our dependencies to current versions.
>>>>>>
>>>>>> 5) Minimizing the number of jars we rely on from external sources reduces the chances that they change out from under us (and thus reduces the number of external factors we have to remain cognizant of).
>>>>>>
>>>>>> 6) Minimizing the classpath reduces the chances of having multiple different versions of the same library present.
>>>>>>
>>>>>> I'd also like for us to *not* package any of the jars brought in by Hadoop Client. Due to the additional work it would take to downgrade our version of guava, I'd like to wait to do that.
>>>>>>
>>>>>> [1]: https://issues.apache.org/jira/browse/ACCUMULO-2786
>>>>>>
>>>>>> --
>>>>>> Sean
>
> --
> Sean
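Concretely, the proposal amounts to treating Hadoop Client as the one artifact the runtime environment supplies and bundling everything else ourselves. A minimal POM sketch of that arrangement; the version numbers are illustrative and this is not taken from the actual Accumulo build files:

    <dependencies>
      <!-- Supplied by the environment: hadoop-client and everything it drags in
           are expected to come from the Hadoop installation, so they are marked
           provided and stay out of our packaging. -->
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.2.0</version>
        <scope>provided</scope>
      </dependency>

      <!-- Packaged by us: default (compile) scope, so the binary assembly bundles
           it instead of assuming it is already on the Hadoop classpath. -->
      <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-math</artifactId>
        <version>2.1</version>
      </dependency>
    </dependencies>

With hadoop-client marked provided, an assembly dependencySet built against the default runtime scope would leave the Hadoop tree out of the binary tarball while still bundling compile-scope jars such as commons-math.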
