Brian, There are cases where shading is useful for server as well - mostly
for e2e tests which run server, HDFS, Hive all in the same JVM.

On Wed, Nov 15, 2017 at 12:47 PM, Brian Towles <btow...@cloudera.com> wrote:

> So there would be two different levels of shading.
>
> The first would be the known issues shared repackaged dependencies like
> Guava.  This would only be for libraries that we absolutely know will cause
> collisions with implementing services and implementing third party
> libraries that need to have the references to the trouble causing libraries
> shifted.  These trouble causing libraries would be the only ones where hard
> coded import path changes would need to take place for.
>
> For example a sentry-thridparty/sentry-shaded-miscellaneous module would
> shared google guava shifting the package to
> org.apache.sentry.thirdparty.com.google.common. As well as it would shade
> the BoneCP jar so that it picks up the shift of the Guava package but not
> shifting the BoneCP packages unless we encounter and a known conflict.
>
> The second level os shading would be for the plugins distribution and would
> shade and package shift only the dependencies needed for that plugin.  This
> would not need to have any import statement changes as the maven shade
> plugin would change the bytecode package references on packaging.  As to
> Kalyan's question, the run time memory used for class loading would only be
> for those items used by the plugin, which is really not much anyways
> comparatively. The class loader doesn't load up a class until its needed.
> Till that point it is just a reference to a file in a jar.  The list of
> files/classes in jars is not a heavy memory load.
>
> The main server implementation would not need to have the second level
> shading done to it as it runs stand alone and is unaffected by the
> collisions issues.
>
> @Alexander For your questions, there is a small amount of refactoring of
> import statements that need to take place for the first level of shading
> but there are no logic changes that take place.
>
> @Stephen Yeah I can see BouncyCastle being more of an issue but it does not
> appear in any of the Sentry dependency chains right now.
>
> -=Brian
>
> On Wed, Nov 15, 2017 at 1:30 PM Stephen Moist <mo...@cloudera.com> wrote:
>
> > +1
> >
> > Agree with Kalyan’s concern.  To me it seems simpler to shade all jars
> > rather than known issue jars.  One thing though is what if there are bug
> > fixes in our jars that are not included in other components, it may cause
> > subtle error that are hard to track.  I’m thinking along the lines of the
> > KeyValue class change in behavior.  So I propose limited the scope of
> > shaded jars to known issues.
> >
> > However one thing to note is that some jars do not like to be shaded, as
> > an example BouncyCastle has a check to see if the jar has been modified
> and
> > will refuse to be used in a shaded jar.
> >
> > > On Nov 15, 2017, at 1:07 PM, Kalyan Kumar Kalvagadda <
> > kkal...@cloudera.com> wrote:
> > >
> > > +1
> > >
> > > It's good approach but I have a question/concern.
> > > Is the proposal to shade is for some specific jars OR to shade all the
> > > third party jars?
> > >
> > > If proposal is shade all the third-party jars then if would impact the
> > run
> > > time memory usage as all the classes from the third-party jars would be
> > > loaded regardless..
> > >
> > > -Kalyan
> > >
> > > On Wed, Nov 15, 2017 at 12:04 PM, Sergio Pena <
> sergio.p...@cloudera.com>
> > > wrote:
> > >
> > >> +1
> > >>
> > >> I like the idea. It's hard to upgrade our libraries to newer ones when
> > >> other components break due to Sentry being a plugin. I was once
> thought
> > of
> > >> using different jars versions per plugin (i.e. guava11 on hdfs,
> guava14
> > on
> > >> hive and the server, etc), but that is too much to do and not good. I
> > like
> > >> the hbase idea better so we can use the latest libraries that we want
> > >> instead of depending on other components.
> > >>
> > >> - Sergio
> > >>
> > >>
> > >>
> > >> On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov <
> > ak...@cloudera.com>
> > >> wrote:
> > >>
> > >>> Agreed, this would be a very useful thing to do. I remember spending
> a
> > >> lot
> > >>> of time trying to make Sentry work with DataNucleus 4 - the problem
> was
> > >>> that e2e tests combine Sentry with Hive in the same JVM and this
> > created
> > >> a
> > >>> conflict on the DataNucleus libraries and test failures.
> > >>>
> > >>> Looking at the HBase proposal it seems to require some (small) code
> > >> changes
> > >>> for package names - or I am misreading it? Do you plan to shade all
> > 3-rd
> > >>> party packages or just some? I would consider at least
> > >>> Guava/ZooKeepeer/DataNucleus/Jetty.
> > >>>
> > >>> - Alex
> > >>>
> > >>> On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <btow...@cloudera.com>
> > >>> wrote:
> > >>>
> > >>>> Howdy all,
> > >>>>
> > >>>> An issue that keeps coming up seems to be the conflict of dependency
> > >>>> versions between Sentry and the components it is plugging into. A
> > >> current
> > >>>> example of this impact is Google Guava with hive2 using v14 and
> Impala
> > >>>> using v11 while Sentry needs to have at least v14 in order to fix
> some
> > >>> bugs
> > >>>> in the BoneCP library.  These sort of conflicts come in whenever we
> > are
> > >>>> embedding a Sentry plugin or using a Sentry library into another
> > >> project.
> > >>>>
> > >>>> I would like to propose a mechanism for offsetting some of these
> > issues
> > >>>> using something similar to the third party shading that HBase uses
> for
> > >>>> common problems (such as Guava)
> > >>>> (https://github.com/apache/hbase-thirdparty with
> > >>>> documentation about it http://hbase.apache.org/book.html#thirdparty
> )
> > >> and
> > >>>> is
> > >>>> used by HBase and Hadoop for packaging of their downstream
> artifacts (
> > >>>> https://github.com/apache/hbase/blob/master/hbase-shaded/ and
> > >>>> https://github.com/apache/hadoop/tree/trunk/hadoop-
> > >>>> client-modules/hadoop-client-api/
> > >>>> )
> > >>>>
> > >>>> The main benefit of this is that it would allow Sentry to be used as
> > >>>> libraries and plugins with all of the dependencies needed for Sentry
> > to
> > >>> be
> > >>>> abstracted away from components implementing it. Sentry could rev
> > >>> versions
> > >>>> of libraries easier and not have collisions of library versions
> needed
> > >> by
> > >>>> the implementing component.  As well this would potentially make
> > Sentry
> > >>>> downstream usage more stable since it would be used and tested
> against
> > >> a
> > >>>> static set of dependencies and not using libraries based on what the
> > >>>> implementing component has.
> > >>>>
> > >>>> On the downside, it would make the on disk size of the Sentry
> plugins
> > >> and
> > >>>> libraries for downstream larger. As well, the number of classes
> loaded
> > >>> into
> > >>>> memory would be larger since there would be potential duplication of
> > >>> actual
> > >>>> class implementations or multiple versions of a class with different
> > >>>> package names in memory.  But this seems to be a common practice and
> > >> the
> > >>>> lack of library version collisions and stability make up for these
> > >>>> downsides.
> > >>>>
> > >>>> The third party shading works by using the Maven shade plugin to do
> > >>> package
> > >>>> name shifting of the third party library (Guava, BoneCP) to a sentry
> > >>>> specific package using the version on the third party library needed
> > >> for
> > >>>> Sentry.
> > >>>>
> > >>>> E.G. *com.google.common* packages could be shifted to
> > >>>> *org.apache.sentry.shaded.com.google.common*.
> > >>>>
> > >>>> Since the Maven Shade plugin can actually change the byte-codes of
> the
> > >>>> libraries being shaded, we can do this even for dependencies that
> have
> > >>>> shared sub-dependencies .  BoneCP being the main example since it
> uses
> > >>>> Guava internal to itself.  We could shade the BoneCP into the same
> > >> shared
> > >>>> sentry third party dependency jar and since the bytecode level
> > >>>> manipulation.
> > >>>>
> > >>>> What this means on the development side is that we would need to
> > >>> reference
> > >>>> imports from the shaded third party
> > >>>>
> > >>>> E.G. *import com.google.common.collect.Maps* becomes *import
> > >>>> org.apache.sentry.shaded.com.google.common.collect.Maps*.
> > >>>>
> > >>>> Ive been looking at this in the context of
> > >>>> https://issues.apache.org/jira/browse/SENTRY-2044, but I feel this
> > >>> should
> > >>>> potentially be something that is more of an overall Sentry standard
> > >>>> practice and larger scale implementation.
> > >>>>
> > >>>> -=Brian
> > >>>>
> > >>>>
> > >>>> --
> > >>>> *Brian Towles* | Software Engineer
> > >>>> t. (512) 415- <0000000000>8105 e. btow...@cloudera.com <
> > >>> j...@cloudera.com>
> > >>>> cloudera.com <http://www.cloudera.com/>
> > >>>>
> > >>>> [image: Cloudera] <http://www.cloudera.com/>
> > >>>>
> > >>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> > >>>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> > >>> Cloudera
> > >>>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > >>>> ------------------------------
> > >>>>
> > >>>
> > >>
> >
> > --
> *Brian Towles* | Software Engineer
> t. (512) 415- <0000000000>8105 e. btow...@cloudera.com <j...@cloudera.com>
> cloudera.com <http://www.cloudera.com/>
>
> [image: Cloudera] <http://www.cloudera.com/>
>
> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
> on LinkedIn] <https://www.linkedin.com/company/cloudera>
> ------------------------------
>

Reply via email to