Agreed, this would be a very useful thing to do. I remember spending a lot
of time trying to make Sentry work with DataNucleus 4 - the problem was
that e2e tests combine Sentry with Hive in the same JVM and this created a
conflict on the DataNucleus libraries and test failures.

Looking at the HBase proposal it seems to require some (small) code changes
for package names - or I am misreading it? Do you plan to shade all 3-rd
party packages or just some? I would consider at least
Guava/ZooKeepeer/DataNucleus/Jetty.

- Alex

On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <btow...@cloudera.com> wrote:

> Howdy all,
>
> An issue that keeps coming up seems to be the conflict of dependency
> versions between Sentry and the components it is plugging into. A current
> example of this impact is Google Guava with hive2 using v14 and Impala
> using v11 while Sentry needs to have at least v14 in order to fix some bugs
> in the BoneCP library.  These sort of conflicts come in whenever we are
> embedding a Sentry plugin or using a Sentry library into another project.
>
> I would like to propose a mechanism for offsetting some of these issues
> using something similar to the third party shading that HBase uses for
> common problems (such as Guava)
> (https://github.com/apache/hbase-thirdparty with
> documentation about it http://hbase.apache.org/book.html#thirdparty) and
> is
> used by HBase and Hadoop for packaging of their downstream artifacts (
> https://github.com/apache/hbase/blob/master/hbase-shaded/ and
> https://github.com/apache/hadoop/tree/trunk/hadoop-
> client-modules/hadoop-client-api/
> )
>
> The main benefit of this is that it would allow Sentry to be used as
> libraries and plugins with all of the dependencies needed for Sentry to be
> abstracted away from components implementing it. Sentry could rev versions
> of libraries easier and not have collisions of library versions needed by
> the implementing component.  As well this would potentially make Sentry
> downstream usage more stable since it would be used and tested against a
> static set of dependencies and not using libraries based on what the
> implementing component has.
>
> On the downside, it would make the on disk size of the Sentry plugins and
> libraries for downstream larger. As well, the number of classes loaded into
> memory would be larger since there would be potential duplication of actual
> class implementations or multiple versions of a class with different
> package names in memory.  But this seems to be a common practice and the
> lack of library version collisions and stability make up for these
> downsides.
>
> The third party shading works by using the Maven shade plugin to do package
> name shifting of the third party library (Guava, BoneCP) to a sentry
> specific package using the version on the third party library needed for
> Sentry.
>
> E.G. *com.google.common* packages could be shifted to
> *org.apache.sentry.shaded.com.google.common*.
>
> Since the Maven Shade plugin can actually change the byte-codes of the
> libraries being shaded, we can do this even for dependencies that have
> shared sub-dependencies .  BoneCP being the main example since it uses
> Guava internal to itself.  We could shade the BoneCP into the same shared
> sentry third party dependency jar and since the bytecode level
> manipulation.
>
> What this means on the development side is that we would need to reference
> imports from the shaded third party
>
> E.G. *import com.google.common.collect.Maps* becomes *import
> org.apache.sentry.shaded.com.google.common.collect.Maps*.
>
> Ive been looking at this in the context of
> https://issues.apache.org/jira/browse/SENTRY-2044, but I feel this should
> potentially be something that is more of an overall Sentry standard
> practice and larger scale implementation.
>
> -=Brian
>
>
> --
> *Brian Towles* | Software Engineer
> t. (512) 415- <0000000000>8105 e. btow...@cloudera.com <j...@cloudera.com>
> cloudera.com <http://www.cloudera.com/>
>
> [image: Cloudera] <http://www.cloudera.com/>
>
> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
> on LinkedIn] <https://www.linkedin.com/company/cloudera>
> ------------------------------
>

Reply via email to