HBase does have a ensure-jars-have-correct-contents.sh check that they run
which we could do something similar to.

On Wed, Nov 15, 2017 at 3:30 PM Stephen Moist <mo...@cloudera.com> wrote:

> Is there any way to add a check to the maven build to make sure that the
> classes being imported are the shaded ones?
>
>
> > On Nov 15, 2017, at 3:02 PM, Brian Towles <btow...@cloudera.com> wrote:
> >
> > I can see that.  I can look into that as well.  It would make sense for
> the
> > "Sentry as a Library" you talked about as well
> >
> > On Wed, Nov 15, 2017 at 2:56 PM Alexander Kolbasov <ak...@cloudera.com>
> > wrote:
> >
> >> Brian, There are cases where shading is useful for server as well -
> mostly
> >> for e2e tests which run server, HDFS, Hive all in the same JVM.
> >>
> >> On Wed, Nov 15, 2017 at 12:47 PM, Brian Towles <btow...@cloudera.com>
> >> wrote:
> >>
> >>> So there would be two different levels of shading.
> >>>
> >>> The first would be the known issues shared repackaged dependencies like
> >>> Guava.  This would only be for libraries that we absolutely know will
> >> cause
> >>> collisions with implementing services and implementing third party
> >>> libraries that need to have the references to the trouble causing
> >> libraries
> >>> shifted.  These trouble causing libraries would be the only ones where
> >> hard
> >>> coded import path changes would need to take place for.
> >>>
> >>> For example a sentry-thridparty/sentry-shaded-miscellaneous module
> would
> >>> shared google guava shifting the package to
> >>> org.apache.sentry.thirdparty.com.google.common. As well as it would
> shade
> >>> the BoneCP jar so that it picks up the shift of the Guava package but
> not
> >>> shifting the BoneCP packages unless we encounter and a known conflict.
> >>>
> >>> The second level os shading would be for the plugins distribution and
> >> would
> >>> shade and package shift only the dependencies needed for that plugin.
> >> This
> >>> would not need to have any import statement changes as the maven shade
> >>> plugin would change the bytecode package references on packaging.  As
> to
> >>> Kalyan's question, the run time memory used for class loading would
> only
> >> be
> >>> for those items used by the plugin, which is really not much anyways
> >>> comparatively. The class loader doesn't load up a class until its
> needed.
> >>> Till that point it is just a reference to a file in a jar.  The list of
> >>> files/classes in jars is not a heavy memory load.
> >>>
> >>> The main server implementation would not need to have the second level
> >>> shading done to it as it runs stand alone and is unaffected by the
> >>> collisions issues.
> >>>
> >>> @Alexander For your questions, there is a small amount of refactoring
> of
> >>> import statements that need to take place for the first level of
> shading
> >>> but there are no logic changes that take place.
> >>>
> >>> @Stephen Yeah I can see BouncyCastle being more of an issue but it does
> >> not
> >>> appear in any of the Sentry dependency chains right now.
> >>>
> >>> -=Brian
> >>>
> >>> On Wed, Nov 15, 2017 at 1:30 PM Stephen Moist <mo...@cloudera.com>
> >> wrote:
> >>>
> >>>> +1
> >>>>
> >>>> Agree with Kalyan’s concern.  To me it seems simpler to shade all jars
> >>>> rather than known issue jars.  One thing though is what if there are
> >> bug
> >>>> fixes in our jars that are not included in other components, it may
> >> cause
> >>>> subtle error that are hard to track.  I’m thinking along the lines of
> >> the
> >>>> KeyValue class change in behavior.  So I propose limited the scope of
> >>>> shaded jars to known issues.
> >>>>
> >>>> However one thing to note is that some jars do not like to be shaded,
> >> as
> >>>> an example BouncyCastle has a check to see if the jar has been
> modified
> >>> and
> >>>> will refuse to be used in a shaded jar.
> >>>>
> >>>>> On Nov 15, 2017, at 1:07 PM, Kalyan Kumar Kalvagadda <
> >>>> kkal...@cloudera.com> wrote:
> >>>>>
> >>>>> +1
> >>>>>
> >>>>> It's good approach but I have a question/concern.
> >>>>> Is the proposal to shade is for some specific jars OR to shade all
> >> the
> >>>>> third party jars?
> >>>>>
> >>>>> If proposal is shade all the third-party jars then if would impact
> >> the
> >>>> run
> >>>>> time memory usage as all the classes from the third-party jars would
> >> be
> >>>>> loaded regardless..
> >>>>>
> >>>>> -Kalyan
> >>>>>
> >>>>> On Wed, Nov 15, 2017 at 12:04 PM, Sergio Pena <
> >>> sergio.p...@cloudera.com>
> >>>>> wrote:
> >>>>>
> >>>>>> +1
> >>>>>>
> >>>>>> I like the idea. It's hard to upgrade our libraries to newer ones
> >> when
> >>>>>> other components break due to Sentry being a plugin. I was once
> >>> thought
> >>>> of
> >>>>>> using different jars versions per plugin (i.e. guava11 on hdfs,
> >>> guava14
> >>>> on
> >>>>>> hive and the server, etc), but that is too much to do and not good.
> >> I
> >>>> like
> >>>>>> the hbase idea better so we can use the latest libraries that we
> >> want
> >>>>>> instead of depending on other components.
> >>>>>>
> >>>>>> - Sergio
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov <
> >>>> ak...@cloudera.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Agreed, this would be a very useful thing to do. I remember
> >> spending
> >>> a
> >>>>>> lot
> >>>>>>> of time trying to make Sentry work with DataNucleus 4 - the problem
> >>> was
> >>>>>>> that e2e tests combine Sentry with Hive in the same JVM and this
> >>>> created
> >>>>>> a
> >>>>>>> conflict on the DataNucleus libraries and test failures.
> >>>>>>>
> >>>>>>> Looking at the HBase proposal it seems to require some (small) code
> >>>>>> changes
> >>>>>>> for package names - or I am misreading it? Do you plan to shade all
> >>>> 3-rd
> >>>>>>> party packages or just some? I would consider at least
> >>>>>>> Guava/ZooKeepeer/DataNucleus/Jetty.
> >>>>>>>
> >>>>>>> - Alex
> >>>>>>>
> >>>>>>> On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <
> >> btow...@cloudera.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Howdy all,
> >>>>>>>>
> >>>>>>>> An issue that keeps coming up seems to be the conflict of
> >> dependency
> >>>>>>>> versions between Sentry and the components it is plugging into. A
> >>>>>> current
> >>>>>>>> example of this impact is Google Guava with hive2 using v14 and
> >>> Impala
> >>>>>>>> using v11 while Sentry needs to have at least v14 in order to fix
> >>> some
> >>>>>>> bugs
> >>>>>>>> in the BoneCP library.  These sort of conflicts come in whenever
> >> we
> >>>> are
> >>>>>>>> embedding a Sentry plugin or using a Sentry library into another
> >>>>>> project.
> >>>>>>>>
> >>>>>>>> I would like to propose a mechanism for offsetting some of these
> >>>> issues
> >>>>>>>> using something similar to the third party shading that HBase uses
> >>> for
> >>>>>>>> common problems (such as Guava)
> >>>>>>>> (https://github.com/apache/hbase-thirdparty with
> >>>>>>>> documentation about it
> >> http://hbase.apache.org/book.html#thirdparty
> >>> )
> >>>>>> and
> >>>>>>>> is
> >>>>>>>> used by HBase and Hadoop for packaging of their downstream
> >>> artifacts (
> >>>>>>>> https://github.com/apache/hbase/blob/master/hbase-shaded/ and
> >>>>>>>> https://github.com/apache/hadoop/tree/trunk/hadoop-
> >>>>>>>> client-modules/hadoop-client-api/
> >>>>>>>> )
> >>>>>>>>
> >>>>>>>> The main benefit of this is that it would allow Sentry to be used
> >> as
> >>>>>>>> libraries and plugins with all of the dependencies needed for
> >> Sentry
> >>>> to
> >>>>>>> be
> >>>>>>>> abstracted away from components implementing it. Sentry could rev
> >>>>>>> versions
> >>>>>>>> of libraries easier and not have collisions of library versions
> >>> needed
> >>>>>> by
> >>>>>>>> the implementing component.  As well this would potentially make
> >>>> Sentry
> >>>>>>>> downstream usage more stable since it would be used and tested
> >>> against
> >>>>>> a
> >>>>>>>> static set of dependencies and not using libraries based on what
> >> the
> >>>>>>>> implementing component has.
> >>>>>>>>
> >>>>>>>> On the downside, it would make the on disk size of the Sentry
> >>> plugins
> >>>>>> and
> >>>>>>>> libraries for downstream larger. As well, the number of classes
> >>> loaded
> >>>>>>> into
> >>>>>>>> memory would be larger since there would be potential duplication
> >> of
> >>>>>>> actual
> >>>>>>>> class implementations or multiple versions of a class with
> >> different
> >>>>>>>> package names in memory.  But this seems to be a common practice
> >> and
> >>>>>> the
> >>>>>>>> lack of library version collisions and stability make up for these
> >>>>>>>> downsides.
> >>>>>>>>
> >>>>>>>> The third party shading works by using the Maven shade plugin to
> >> do
> >>>>>>> package
> >>>>>>>> name shifting of the third party library (Guava, BoneCP) to a
> >> sentry
> >>>>>>>> specific package using the version on the third party library
> >> needed
> >>>>>> for
> >>>>>>>> Sentry.
> >>>>>>>>
> >>>>>>>> E.G. *com.google.common* packages could be shifted to
> >>>>>>>> *org.apache.sentry.shaded.com.google.common*.
> >>>>>>>>
> >>>>>>>> Since the Maven Shade plugin can actually change the byte-codes of
> >>> the
> >>>>>>>> libraries being shaded, we can do this even for dependencies that
> >>> have
> >>>>>>>> shared sub-dependencies .  BoneCP being the main example since it
> >>> uses
> >>>>>>>> Guava internal to itself.  We could shade the BoneCP into the same
> >>>>>> shared
> >>>>>>>> sentry third party dependency jar and since the bytecode level
> >>>>>>>> manipulation.
> >>>>>>>>
> >>>>>>>> What this means on the development side is that we would need to
> >>>>>>> reference
> >>>>>>>> imports from the shaded third party
> >>>>>>>>
> >>>>>>>> E.G. *import com.google.common.collect.Maps* becomes *import
> >>>>>>>> org.apache.sentry.shaded.com.google.common.collect.Maps*.
> >>>>>>>>
> >>>>>>>> Ive been looking at this in the context of
> >>>>>>>> https://issues.apache.org/jira/browse/SENTRY-2044, but I feel
> >> this
> >>>>>>> should
> >>>>>>>> potentially be something that is more of an overall Sentry
> >> standard
> >>>>>>>> practice and larger scale implementation.
> >>>>>>>>
> >>>>>>>> -=Brian
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> *Brian Towles* | Software Engineer
> >>>>>>>> t. (512) 415- <0000000000>8105 e. btow...@cloudera.com <
> >>>>>>> j...@cloudera.com>
> >>>>>>>> cloudera.com <http://www.cloudera.com/>
> >>>>>>>>
> >>>>>>>> [image: Cloudera] <http://www.cloudera.com/>
> >>>>>>>>
> >>>>>>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera>
> >> [image:
> >>>>>>>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> >>>>>>> Cloudera
> >>>>>>>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
> >>>>>>>> ------------------------------
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>> --
> >>> *Brian Towles* | Software Engineer
> >>> t. (512) 415- <0000000000>8105 e. btow...@cloudera.com <
> >> j...@cloudera.com>
> >>> cloudera.com <http://www.cloudera.com/>
> >>>
> >>> [image: Cloudera] <http://www.cloudera.com/>
> >>>
> >>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> >>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> >> Cloudera
> >>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
> >>> ------------------------------
> >>>
> >>
> > --
> > *Brian Towles* | Software Engineer
> > t. (512) 415- <0000000000>8105 e. btow...@cloudera.com <
> j...@cloudera.com>
> > cloudera.com <http://www.cloudera.com/>
> >
> > [image: Cloudera] <http://www.cloudera.com/>
> >
> > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> Cloudera
> > on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > ------------------------------
>
> --
*Brian Towles* | Software Engineer
t. (512) 415- <0000000000>8105 e. btow...@cloudera.com <j...@cloudera.com>
cloudera.com <http://www.cloudera.com/>

[image: Cloudera] <http://www.cloudera.com/>

[image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
on LinkedIn] <https://www.linkedin.com/company/cloudera>
------------------------------

Reply via email to