I can see that.  I can look into that as well.  It would make sense for the
"Sentry as a Library" you talked about as well

On Wed, Nov 15, 2017 at 2:56 PM Alexander Kolbasov <[email protected]>
wrote:

> Brian, There are cases where shading is useful for server as well - mostly
> for e2e tests which run server, HDFS, Hive all in the same JVM.
>
> On Wed, Nov 15, 2017 at 12:47 PM, Brian Towles <[email protected]>
> wrote:
>
> > So there would be two different levels of shading.
> >
> > The first would be the known issues shared repackaged dependencies like
> > Guava.  This would only be for libraries that we absolutely know will
> cause
> > collisions with implementing services and implementing third party
> > libraries that need to have the references to the trouble causing
> libraries
> > shifted.  These trouble causing libraries would be the only ones where
> hard
> > coded import path changes would need to take place for.
> >
> > For example a sentry-thridparty/sentry-shaded-miscellaneous module would
> > shared google guava shifting the package to
> > org.apache.sentry.thirdparty.com.google.common. As well as it would shade
> > the BoneCP jar so that it picks up the shift of the Guava package but not
> > shifting the BoneCP packages unless we encounter and a known conflict.
> >
> > The second level os shading would be for the plugins distribution and
> would
> > shade and package shift only the dependencies needed for that plugin.
> This
> > would not need to have any import statement changes as the maven shade
> > plugin would change the bytecode package references on packaging.  As to
> > Kalyan's question, the run time memory used for class loading would only
> be
> > for those items used by the plugin, which is really not much anyways
> > comparatively. The class loader doesn't load up a class until its needed.
> > Till that point it is just a reference to a file in a jar.  The list of
> > files/classes in jars is not a heavy memory load.
> >
> > The main server implementation would not need to have the second level
> > shading done to it as it runs stand alone and is unaffected by the
> > collisions issues.
> >
> > @Alexander For your questions, there is a small amount of refactoring of
> > import statements that need to take place for the first level of shading
> > but there are no logic changes that take place.
> >
> > @Stephen Yeah I can see BouncyCastle being more of an issue but it does
> not
> > appear in any of the Sentry dependency chains right now.
> >
> > -=Brian
> >
> > On Wed, Nov 15, 2017 at 1:30 PM Stephen Moist <[email protected]>
> wrote:
> >
> > > +1
> > >
> > > Agree with Kalyan’s concern.  To me it seems simpler to shade all jars
> > > rather than known issue jars.  One thing though is what if there are
> bug
> > > fixes in our jars that are not included in other components, it may
> cause
> > > subtle error that are hard to track.  I’m thinking along the lines of
> the
> > > KeyValue class change in behavior.  So I propose limited the scope of
> > > shaded jars to known issues.
> > >
> > > However one thing to note is that some jars do not like to be shaded,
> as
> > > an example BouncyCastle has a check to see if the jar has been modified
> > and
> > > will refuse to be used in a shaded jar.
> > >
> > > > On Nov 15, 2017, at 1:07 PM, Kalyan Kumar Kalvagadda <
> > > [email protected]> wrote:
> > > >
> > > > +1
> > > >
> > > > It's good approach but I have a question/concern.
> > > > Is the proposal to shade is for some specific jars OR to shade all
> the
> > > > third party jars?
> > > >
> > > > If proposal is shade all the third-party jars then if would impact
> the
> > > run
> > > > time memory usage as all the classes from the third-party jars would
> be
> > > > loaded regardless..
> > > >
> > > > -Kalyan
> > > >
> > > > On Wed, Nov 15, 2017 at 12:04 PM, Sergio Pena <
> > [email protected]>
> > > > wrote:
> > > >
> > > >> +1
> > > >>
> > > >> I like the idea. It's hard to upgrade our libraries to newer ones
> when
> > > >> other components break due to Sentry being a plugin. I was once
> > thought
> > > of
> > > >> using different jars versions per plugin (i.e. guava11 on hdfs,
> > guava14
> > > on
> > > >> hive and the server, etc), but that is too much to do and not good.
> I
> > > like
> > > >> the hbase idea better so we can use the latest libraries that we
> want
> > > >> instead of depending on other components.
> > > >>
> > > >> - Sergio
> > > >>
> > > >>
> > > >>
> > > >> On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov <
> > > [email protected]>
> > > >> wrote:
> > > >>
> > > >>> Agreed, this would be a very useful thing to do. I remember
> spending
> > a
> > > >> lot
> > > >>> of time trying to make Sentry work with DataNucleus 4 - the problem
> > was
> > > >>> that e2e tests combine Sentry with Hive in the same JVM and this
> > > created
> > > >> a
> > > >>> conflict on the DataNucleus libraries and test failures.
> > > >>>
> > > >>> Looking at the HBase proposal it seems to require some (small) code
> > > >> changes
> > > >>> for package names - or I am misreading it? Do you plan to shade all
> > > 3-rd
> > > >>> party packages or just some? I would consider at least
> > > >>> Guava/ZooKeepeer/DataNucleus/Jetty.
> > > >>>
> > > >>> - Alex
> > > >>>
> > > >>> On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <
> [email protected]>
> > > >>> wrote:
> > > >>>
> > > >>>> Howdy all,
> > > >>>>
> > > >>>> An issue that keeps coming up seems to be the conflict of
> dependency
> > > >>>> versions between Sentry and the components it is plugging into. A
> > > >> current
> > > >>>> example of this impact is Google Guava with hive2 using v14 and
> > Impala
> > > >>>> using v11 while Sentry needs to have at least v14 in order to fix
> > some
> > > >>> bugs
> > > >>>> in the BoneCP library.  These sort of conflicts come in whenever
> we
> > > are
> > > >>>> embedding a Sentry plugin or using a Sentry library into another
> > > >> project.
> > > >>>>
> > > >>>> I would like to propose a mechanism for offsetting some of these
> > > issues
> > > >>>> using something similar to the third party shading that HBase uses
> > for
> > > >>>> common problems (such as Guava)
> > > >>>> (https://github.com/apache/hbase-thirdparty with
> > > >>>> documentation about it
> http://hbase.apache.org/book.html#thirdparty
> > )
> > > >> and
> > > >>>> is
> > > >>>> used by HBase and Hadoop for packaging of their downstream
> > artifacts (
> > > >>>> https://github.com/apache/hbase/blob/master/hbase-shaded/ and
> > > >>>> https://github.com/apache/hadoop/tree/trunk/hadoop-
> > > >>>> client-modules/hadoop-client-api/
> > > >>>> )
> > > >>>>
> > > >>>> The main benefit of this is that it would allow Sentry to be used
> as
> > > >>>> libraries and plugins with all of the dependencies needed for
> Sentry
> > > to
> > > >>> be
> > > >>>> abstracted away from components implementing it. Sentry could rev
> > > >>> versions
> > > >>>> of libraries easier and not have collisions of library versions
> > needed
> > > >> by
> > > >>>> the implementing component.  As well this would potentially make
> > > Sentry
> > > >>>> downstream usage more stable since it would be used and tested
> > against
> > > >> a
> > > >>>> static set of dependencies and not using libraries based on what
> the
> > > >>>> implementing component has.
> > > >>>>
> > > >>>> On the downside, it would make the on disk size of the Sentry
> > plugins
> > > >> and
> > > >>>> libraries for downstream larger. As well, the number of classes
> > loaded
> > > >>> into
> > > >>>> memory would be larger since there would be potential duplication
> of
> > > >>> actual
> > > >>>> class implementations or multiple versions of a class with
> different
> > > >>>> package names in memory.  But this seems to be a common practice
> and
> > > >> the
> > > >>>> lack of library version collisions and stability make up for these
> > > >>>> downsides.
> > > >>>>
> > > >>>> The third party shading works by using the Maven shade plugin to
> do
> > > >>> package
> > > >>>> name shifting of the third party library (Guava, BoneCP) to a
> sentry
> > > >>>> specific package using the version on the third party library
> needed
> > > >> for
> > > >>>> Sentry.
> > > >>>>
> > > >>>> E.G. *com.google.common* packages could be shifted to
> > > >>>> *org.apache.sentry.shaded.com.google.common*.
> > > >>>>
> > > >>>> Since the Maven Shade plugin can actually change the byte-codes of
> > the
> > > >>>> libraries being shaded, we can do this even for dependencies that
> > have
> > > >>>> shared sub-dependencies .  BoneCP being the main example since it
> > uses
> > > >>>> Guava internal to itself.  We could shade the BoneCP into the same
> > > >> shared
> > > >>>> sentry third party dependency jar and since the bytecode level
> > > >>>> manipulation.
> > > >>>>
> > > >>>> What this means on the development side is that we would need to
> > > >>> reference
> > > >>>> imports from the shaded third party
> > > >>>>
> > > >>>> E.G. *import com.google.common.collect.Maps* becomes *import
> > > >>>> org.apache.sentry.shaded.com.google.common.collect.Maps*.
> > > >>>>
> > > >>>> Ive been looking at this in the context of
> > > >>>> https://issues.apache.org/jira/browse/SENTRY-2044, but I feel
> this
> > > >>> should
> > > >>>> potentially be something that is more of an overall Sentry
> standard
> > > >>>> practice and larger scale implementation.
> > > >>>>
> > > >>>> -=Brian
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> *Brian Towles* | Software Engineer
> > > >>>> t. (512) 415- <0000000000>8105 e. [email protected] <
> > > >>> [email protected]>
> > > >>>> cloudera.com <http://www.cloudera.com/>
> > > >>>>
> > > >>>> [image: Cloudera] <http://www.cloudera.com/>
> > > >>>>
> > > >>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera>
> [image:
> > > >>>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> > > >>> Cloudera
> > > >>>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > > >>>> ------------------------------
> > > >>>>
> > > >>>
> > > >>
> > >
> > > --
> > *Brian Towles* | Software Engineer
> > t. (512) 415- <0000000000>8105 e. [email protected] <
> [email protected]>
> > cloudera.com <http://www.cloudera.com/>
> >
> > [image: Cloudera] <http://www.cloudera.com/>
> >
> > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> Cloudera
> > on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > ------------------------------
> >
>
-- 
*Brian Towles* | Software Engineer
t. (512) 415- <0000000000>8105 e. [email protected] <[email protected]>
cloudera.com <http://www.cloudera.com/>

[image: Cloudera] <http://www.cloudera.com/>

[image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
on LinkedIn] <https://www.linkedin.com/company/cloudera>
------------------------------

Reply via email to