Howdy all,

An issue that keeps coming up is conflicting dependency versions between
Sentry and the components it plugs into. A current example is Google
Guava: hive2 uses v14 and Impala uses v11, while Sentry needs at least v14
in order to fix some bugs in the BoneCP library. These sorts of conflicts
come up whenever we embed a Sentry plugin or use a Sentry library in
another project.

I would like to propose a mechanism for offsetting some of these issues
using something similar to the third-party shading that HBase uses for
common problem dependencies such as Guava
(https://github.com/apache/hbase-thirdparty, documented at
http://hbase.apache.org/book.html#thirdparty). The same approach is used
by HBase and Hadoop to package their downstream artifacts
(https://github.com/apache/hbase/blob/master/hbase-shaded/ and
https://github.com/apache/hadoop/tree/trunk/hadoop-client-modules/hadoop-client-api/).

The main benefit is that Sentry could be used as libraries and plugins
with all of its dependencies abstracted away from the components
implementing it. Sentry could rev library versions more easily without
colliding with the library versions the implementing component needs. This
would also potentially make downstream usage of Sentry more stable, since
it would be used and tested against a static set of dependencies rather
than whatever libraries the implementing component happens to provide.

On the downside, it would make the on-disk size of the Sentry plugins and
libraries for downstream larger. The number of classes loaded into memory
would also be larger, since there could be duplicate class
implementations, or multiple versions of a class under different package
names. But this seems to be common practice, and the absence of library
version collisions plus the added stability make up for these downsides.

The third-party shading works by using the Maven Shade plugin to shift the
package names of a third-party library (Guava, BoneCP) into a
Sentry-specific package, using the version of that library that Sentry
needs.

E.g., *com.google.common* packages could be shifted to
*org.apache.sentry.shaded.com.google.common*.
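To make the package name shifting concrete, here is a rough Java sketch of
the prefix-rewriting rule. It is a simplified, hypothetical model: the
RelocationSketch class and its relocate/apply methods are made up for
illustration and are not the Maven Shade plugin's API. The plugin itself is
configured with pattern/shadedPattern relocations in the POM and applies
the rewrite to the class files.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of package relocation: a class whose name
 * falls under a source package gets that package prefix replaced by a
 * shaded prefix. This is NOT the maven-shade-plugin API.
 */
public final class RelocationSketch {

    // Source package prefix -> shaded package prefix, checked in insertion order.
    private final Map<String, String> relocations = new LinkedHashMap<>();

    public RelocationSketch relocate(String pattern, String shadedPattern) {
        relocations.put(pattern, shadedPattern);
        return this;
    }

    /** Returns the relocated fully qualified class name, or the input unchanged. */
    public String apply(String className) {
        for (Map.Entry<String, String> e : relocations.entrySet()) {
            // Match on a package boundary so "com.google.commonx" is not caught.
            String prefix = e.getKey() + ".";
            if (className.startsWith(prefix)) {
                return e.getValue() + "." + className.substring(prefix.length());
            }
        }
        return className;
    }

    public static void main(String[] args) {
        RelocationSketch shade = new RelocationSketch()
            .relocate("com.google.common", "org.apache.sentry.shaded.com.google.common");
        // prints org.apache.sentry.shaded.com.google.common.collect.Maps
        System.out.println(shade.apply("com.google.common.collect.Maps"));
        // prints java.util.HashMap (not under a relocated package)
        System.out.println(shade.apply("java.util.HashMap"));
    }
}
```

Only classes under a relocated package move; everything else, including
the JDK, is left alone.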

Since the Maven Shade plugin actually rewrites the bytecode of the
libraries being shaded, we can do this even for dependencies that share
sub-dependencies. BoneCP is the main example, since it uses Guava
internally. We could shade BoneCP into the same shared Sentry third-party
dependency jar, and because of the bytecode-level manipulation, BoneCP's
internal references to Guava would point at the shaded copy.
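A second rough sketch of the shared sub-dependency case, again a
hypothetical model rather than the plugin's real implementation: a jar is
modeled as a map from class name to the class names it references, and
relocation is applied to both. The class names are illustrative only; I'm
assuming BoneCP's classes live under com.jolbox.bonecp and using a Guava
class as a stand-in for whatever BoneCP actually references.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model: shading rewrites each class's own name AND the names
 * it references, so a shaded BoneCP points at the shaded Guava.
 */
public final class SharedShadingSketch {

    static final Map<String, String> RELOCATIONS = new LinkedHashMap<>();
    static {
        RELOCATIONS.put("com.google.common", "org.apache.sentry.shaded.com.google.common");
        RELOCATIONS.put("com.jolbox.bonecp", "org.apache.sentry.shaded.com.jolbox.bonecp");
    }

    public static String relocate(String className) {
        for (Map.Entry<String, String> e : RELOCATIONS.entrySet()) {
            String prefix = e.getKey() + ".";
            if (className.startsWith(prefix)) {
                return e.getValue() + "." + className.substring(prefix.length());
            }
        }
        return className;
    }

    /** Rewrites a "jar" modeled as class name -> referenced class names. */
    public static Map<String, List<String>> shade(Map<String, List<String>> jar) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> cls : jar.entrySet()) {
            List<String> refs = new ArrayList<>();
            for (String ref : cls.getValue()) {
                refs.add(relocate(ref)); // rewrite references, not just names
            }
            out.put(relocate(cls.getKey()), refs);
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative class names only.
        Map<String, List<String>> jar = new LinkedHashMap<>();
        jar.put("com.google.common.collect.Maps", List.of());
        jar.put("com.jolbox.bonecp.BoneCP",
                List.of("com.google.common.collect.Maps", "java.sql.Connection"));

        // prints:
        // org.apache.sentry.shaded.com.google.common.collect.Maps -> []
        // org.apache.sentry.shaded.com.jolbox.bonecp.BoneCP -> [org.apache.sentry.shaded.com.google.common.collect.Maps, java.sql.Connection]
        shade(jar).forEach((name, refs) -> System.out.println(name + " -> " + refs));
    }
}
```

Because references are rewritten along with class names, the shaded BoneCP
resolves against the shaded Guava in the same jar rather than whatever
Guava version the host component has on its classpath.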

What this means on the development side is that we would need to import
classes from the shaded third-party packages.

E.g., *import com.google.common.collect.Maps* becomes *import
org.apache.sentry.shaded.com.google.common.collect.Maps*.

I've been looking at this in the context of
https://issues.apache.org/jira/browse/SENTRY-2044, but I feel this should
become an overall Sentry standard practice with a larger-scale
implementation.

-=Brian


-- 
*Brian Towles* | Software Engineer
t. (512) 415-8105 e. [email protected]
cloudera.com <http://www.cloudera.com/>
