@Moist
It can be found here
https://github.com/apache/hbase/blob/master/hbase-shaded/hbase-shaded-check-invariants/src/test/resources/ensure-jars-have-correct-contents.sh

On Wed, Nov 15, 2017 at 3:38 PM Brian Towles <[email protected]> wrote:

> HBase does have a ensure-jars-have-correct-contents.sh check that they run
> which we could do something similar to.
>
> On Wed, Nov 15, 2017 at 3:30 PM Stephen Moist <[email protected]> wrote:
>
>> Is there any way to add a check to the maven build to make sure that the
>> classes being imported are the shaded ones?
>>
>>
>> > On Nov 15, 2017, at 3:02 PM, Brian Towles <[email protected]> wrote:
>> >
>> > I can see that.  I can look into that as well.  It would make sense for
>> the
>> > "Sentry as a Library" you talked about as well
>> >
>> > On Wed, Nov 15, 2017 at 2:56 PM Alexander Kolbasov <[email protected]>
>> > wrote:
>> >
>> >> Brian, There are cases where shading is useful for server as well -
>> mostly
>> >> for e2e tests which run server, HDFS, Hive all in the same JVM.
>> >>
>> >> On Wed, Nov 15, 2017 at 12:47 PM, Brian Towles <[email protected]>
>> >> wrote:
>> >>
>> >>> So there would be two different levels of shading.
>> >>>
>> >>> The first would be the known issues shared repackaged dependencies
>> like
>> >>> Guava.  This would only be for libraries that we absolutely know will
>> >> cause
>> >>> collisions with implementing services and implementing third party
>> >>> libraries that need to have the references to the trouble causing
>> >> libraries
>> >>> shifted.  These trouble causing libraries would be the only ones where
>> >> hard
>> >>> coded import path changes would need to take place for.
>> >>>
>> >>> For example a sentry-thridparty/sentry-shaded-miscellaneous module
>> would
>> >>> shared google guava shifting the package to
>> >>> org.apache.sentry.thirdparty.com.google.common. As well as it would
>> shade
>> >>> the BoneCP jar so that it picks up the shift of the Guava package but
>> not
>> >>> shifting the BoneCP packages unless we encounter and a known conflict.
>> >>>
>> >>> The second level os shading would be for the plugins distribution and
>> >> would
>> >>> shade and package shift only the dependencies needed for that plugin.
>> >> This
>> >>> would not need to have any import statement changes as the maven shade
>> >>> plugin would change the bytecode package references on packaging.  As
>> to
>> >>> Kalyan's question, the run time memory used for class loading would
>> only
>> >> be
>> >>> for those items used by the plugin, which is really not much anyways
>> >>> comparatively. The class loader doesn't load up a class until its
>> needed.
>> >>> Till that point it is just a reference to a file in a jar.  The list
>> of
>> >>> files/classes in jars is not a heavy memory load.
>> >>>
>> >>> The main server implementation would not need to have the second level
>> >>> shading done to it as it runs stand alone and is unaffected by the
>> >>> collisions issues.
>> >>>
>> >>> @Alexander For your questions, there is a small amount of refactoring
>> of
>> >>> import statements that need to take place for the first level of
>> shading
>> >>> but there are no logic changes that take place.
>> >>>
>> >>> @Stephen Yeah I can see BouncyCastle being more of an issue but it
>> does
>> >> not
>> >>> appear in any of the Sentry dependency chains right now.
>> >>>
>> >>> -=Brian
>> >>>
>> >>> On Wed, Nov 15, 2017 at 1:30 PM Stephen Moist <[email protected]>
>> >> wrote:
>> >>>
>> >>>> +1
>> >>>>
>> >>>> Agree with Kalyan’s concern.  To me it seems simpler to shade all
>> jars
>> >>>> rather than known issue jars.  One thing though is what if there are
>> >> bug
>> >>>> fixes in our jars that are not included in other components, it may
>> >> cause
>> >>>> subtle error that are hard to track.  I’m thinking along the lines of
>> >> the
>> >>>> KeyValue class change in behavior.  So I propose limited the scope of
>> >>>> shaded jars to known issues.
>> >>>>
>> >>>> However one thing to note is that some jars do not like to be shaded,
>> >> as
>> >>>> an example BouncyCastle has a check to see if the jar has been
>> modified
>> >>> and
>> >>>> will refuse to be used in a shaded jar.
>> >>>>
>> >>>>> On Nov 15, 2017, at 1:07 PM, Kalyan Kumar Kalvagadda <
>> >>>> [email protected]> wrote:
>> >>>>>
>> >>>>> +1
>> >>>>>
>> >>>>> It's good approach but I have a question/concern.
>> >>>>> Is the proposal to shade is for some specific jars OR to shade all
>> >> the
>> >>>>> third party jars?
>> >>>>>
>> >>>>> If proposal is shade all the third-party jars then if would impact
>> >> the
>> >>>> run
>> >>>>> time memory usage as all the classes from the third-party jars would
>> >> be
>> >>>>> loaded regardless..
>> >>>>>
>> >>>>> -Kalyan
>> >>>>>
>> >>>>> On Wed, Nov 15, 2017 at 12:04 PM, Sergio Pena <
>> >>> [email protected]>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> +1
>> >>>>>>
>> >>>>>> I like the idea. It's hard to upgrade our libraries to newer ones
>> >> when
>> >>>>>> other components break due to Sentry being a plugin. I was once
>> >>> thought
>> >>>> of
>> >>>>>> using different jars versions per plugin (i.e. guava11 on hdfs,
>> >>> guava14
>> >>>> on
>> >>>>>> hive and the server, etc), but that is too much to do and not good.
>> >> I
>> >>>> like
>> >>>>>> the hbase idea better so we can use the latest libraries that we
>> >> want
>> >>>>>> instead of depending on other components.
>> >>>>>>
>> >>>>>> - Sergio
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov <
>> >>>> [email protected]>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Agreed, this would be a very useful thing to do. I remember
>> >> spending
>> >>> a
>> >>>>>> lot
>> >>>>>>> of time trying to make Sentry work with DataNucleus 4 - the
>> problem
>> >>> was
>> >>>>>>> that e2e tests combine Sentry with Hive in the same JVM and this
>> >>>> created
>> >>>>>> a
>> >>>>>>> conflict on the DataNucleus libraries and test failures.
>> >>>>>>>
>> >>>>>>> Looking at the HBase proposal it seems to require some (small)
>> code
>> >>>>>> changes
>> >>>>>>> for package names - or I am misreading it? Do you plan to shade
>> all
>> >>>> 3-rd
>> >>>>>>> party packages or just some? I would consider at least
>> >>>>>>> Guava/ZooKeepeer/DataNucleus/Jetty.
>> >>>>>>>
>> >>>>>>> - Alex
>> >>>>>>>
>> >>>>>>> On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <
>> >> [email protected]>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>>> Howdy all,
>> >>>>>>>>
>> >>>>>>>> An issue that keeps coming up seems to be the conflict of
>> >> dependency
>> >>>>>>>> versions between Sentry and the components it is plugging into. A
>> >>>>>> current
>> >>>>>>>> example of this impact is Google Guava with hive2 using v14 and
>> >>> Impala
>> >>>>>>>> using v11 while Sentry needs to have at least v14 in order to fix
>> >>> some
>> >>>>>>> bugs
>> >>>>>>>> in the BoneCP library.  These sort of conflicts come in whenever
>> >> we
>> >>>> are
>> >>>>>>>> embedding a Sentry plugin or using a Sentry library into another
>> >>>>>> project.
>> >>>>>>>>
>> >>>>>>>> I would like to propose a mechanism for offsetting some of these
>> >>>> issues
>> >>>>>>>> using something similar to the third party shading that HBase
>> uses
>> >>> for
>> >>>>>>>> common problems (such as Guava)
>> >>>>>>>> (https://github.com/apache/hbase-thirdparty with
>> >>>>>>>> documentation about it
>> >> http://hbase.apache.org/book.html#thirdparty
>> >>> )
>> >>>>>> and
>> >>>>>>>> is
>> >>>>>>>> used by HBase and Hadoop for packaging of their downstream
>> >>> artifacts (
>> >>>>>>>> https://github.com/apache/hbase/blob/master/hbase-shaded/ and
>> >>>>>>>> https://github.com/apache/hadoop/tree/trunk/hadoop-
>> >>>>>>>> client-modules/hadoop-client-api/
>> >>>>>>>> )
>> >>>>>>>>
>> >>>>>>>> The main benefit of this is that it would allow Sentry to be used
>> >> as
>> >>>>>>>> libraries and plugins with all of the dependencies needed for
>> >> Sentry
>> >>>> to
>> >>>>>>> be
>> >>>>>>>> abstracted away from components implementing it. Sentry could rev
>> >>>>>>> versions
>> >>>>>>>> of libraries easier and not have collisions of library versions
>> >>> needed
>> >>>>>> by
>> >>>>>>>> the implementing component.  As well this would potentially make
>> >>>> Sentry
>> >>>>>>>> downstream usage more stable since it would be used and tested
>> >>> against
>> >>>>>> a
>> >>>>>>>> static set of dependencies and not using libraries based on what
>> >> the
>> >>>>>>>> implementing component has.
>> >>>>>>>>
>> >>>>>>>> On the downside, it would make the on disk size of the Sentry
>> >>> plugins
>> >>>>>> and
>> >>>>>>>> libraries for downstream larger. As well, the number of classes
>> >>> loaded
>> >>>>>>> into
>> >>>>>>>> memory would be larger since there would be potential duplication
>> >> of
>> >>>>>>> actual
>> >>>>>>>> class implementations or multiple versions of a class with
>> >> different
>> >>>>>>>> package names in memory.  But this seems to be a common practice
>> >> and
>> >>>>>> the
>> >>>>>>>> lack of library version collisions and stability make up for
>> these
>> >>>>>>>> downsides.
>> >>>>>>>>
>> >>>>>>>> The third party shading works by using the Maven shade plugin to
>> >> do
>> >>>>>>> package
>> >>>>>>>> name shifting of the third party library (Guava, BoneCP) to a
>> >> sentry
>> >>>>>>>> specific package using the version on the third party library
>> >> needed
>> >>>>>> for
>> >>>>>>>> Sentry.
>> >>>>>>>>
>> >>>>>>>> E.G. *com.google.common* packages could be shifted to
>> >>>>>>>> *org.apache.sentry.shaded.com.google.common*.
>> >>>>>>>>
>> >>>>>>>> Since the Maven Shade plugin can actually change the byte-codes
>> of
>> >>> the
>> >>>>>>>> libraries being shaded, we can do this even for dependencies that
>> >>> have
>> >>>>>>>> shared sub-dependencies .  BoneCP being the main example since it
>> >>> uses
>> >>>>>>>> Guava internal to itself.  We could shade the BoneCP into the
>> same
>> >>>>>> shared
>> >>>>>>>> sentry third party dependency jar and since the bytecode level
>> >>>>>>>> manipulation.
>> >>>>>>>>
>> >>>>>>>> What this means on the development side is that we would need to
>> >>>>>>> reference
>> >>>>>>>> imports from the shaded third party
>> >>>>>>>>
>> >>>>>>>> E.G. *import com.google.common.collect.Maps* becomes *import
>> >>>>>>>> org.apache.sentry.shaded.com.google.common.collect.Maps*.
>> >>>>>>>>
>> >>>>>>>> Ive been looking at this in the context of
>> >>>>>>>> https://issues.apache.org/jira/browse/SENTRY-2044, but I feel
>> >> this
>> >>>>>>> should
>> >>>>>>>> potentially be something that is more of an overall Sentry
>> >> standard
>> >>>>>>>> practice and larger scale implementation.
>> >>>>>>>>
>> >>>>>>>> -=Brian
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> --
>> >>>>>>>> *Brian Towles* | Software Engineer
>> >>>>>>>> t. (512) 415- <0000000000>8105 e. [email protected] <
>> >>>>>>> [email protected]>
>> >>>>>>>> cloudera.com <http://www.cloudera.com/>
>> >>>>>>>>
>> >>>>>>>> [image: Cloudera] <http://www.cloudera.com/>
>> >>>>>>>>
>> >>>>>>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera>
>> >> [image:
>> >>>>>>>> Cloudera on Facebook] <https://www.facebook.com/cloudera>
>> [image:
>> >>>>>>> Cloudera
>> >>>>>>>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
>> >>>>>>>> ------------------------------
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>
>> >>>> --
>> >>> *Brian Towles* | Software Engineer
>> >>> t. (512) 415- <0000000000>8105 e. [email protected] <
>> >> [email protected]>
>> >>> cloudera.com <http://www.cloudera.com/>
>> >>>
>> >>> [image: Cloudera] <http://www.cloudera.com/>
>> >>>
>> >>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
>> >>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
>> >> Cloudera
>> >>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
>> >>> ------------------------------
>> >>>
>> >>
>> > --
>> > *Brian Towles* | Software Engineer
>> > t. (512) 415- <0000000000>8105 e. [email protected] <
>> [email protected]>
>> > cloudera.com <http://www.cloudera.com/>
>> >
>> > [image: Cloudera] <http://www.cloudera.com/>
>> >
>> > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
>> > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
>> Cloudera
>> > on LinkedIn] <https://www.linkedin.com/company/cloudera>
>> > ------------------------------
>>
>> --
> *Brian Towles* | Software Engineer
> t. (512) 415- <0000000000>8105 e. [email protected] <[email protected]>
> cloudera.com <http://www.cloudera.com/>
>
> [image: Cloudera] <http://www.cloudera.com/>
>
> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera>
> ------------------------------
>
-- 
*Brian Towles* | Software Engineer
t. (512) 415- <0000000000>8105 e. [email protected] <[email protected]>
cloudera.com <http://www.cloudera.com/>

[image: Cloudera] <http://www.cloudera.com/>

[image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
on LinkedIn] <https://www.linkedin.com/company/cloudera>
------------------------------

Reply via email to