+1 Agree with Kalyan’s concern. To me it seems simpler to shade all jars rather than known issue jars. One thing though is what if there are bug fixes in our jars that are not included in other components, it may cause subtle error that are hard to track. I’m thinking along the lines of the KeyValue class change in behavior. So I propose limited the scope of shaded jars to known issues.
However one thing to note is that some jars do not like to be shaded, as an example BouncyCastle has a check to see if the jar has been modified and will refuse to be used in a shaded jar. > On Nov 15, 2017, at 1:07 PM, Kalyan Kumar Kalvagadda <[email protected]> > wrote: > > +1 > > It's good approach but I have a question/concern. > Is the proposal to shade is for some specific jars OR to shade all the > third party jars? > > If proposal is shade all the third-party jars then if would impact the run > time memory usage as all the classes from the third-party jars would be > loaded regardless.. > > -Kalyan > > On Wed, Nov 15, 2017 at 12:04 PM, Sergio Pena <[email protected]> > wrote: > >> +1 >> >> I like the idea. It's hard to upgrade our libraries to newer ones when >> other components break due to Sentry being a plugin. I was once thought of >> using different jars versions per plugin (i.e. guava11 on hdfs, guava14 on >> hive and the server, etc), but that is too much to do and not good. I like >> the hbase idea better so we can use the latest libraries that we want >> instead of depending on other components. >> >> - Sergio >> >> >> >> On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov <[email protected]> >> wrote: >> >>> Agreed, this would be a very useful thing to do. I remember spending a >> lot >>> of time trying to make Sentry work with DataNucleus 4 - the problem was >>> that e2e tests combine Sentry with Hive in the same JVM and this created >> a >>> conflict on the DataNucleus libraries and test failures. >>> >>> Looking at the HBase proposal it seems to require some (small) code >> changes >>> for package names - or I am misreading it? Do you plan to shade all 3-rd >>> party packages or just some? I would consider at least >>> Guava/ZooKeepeer/DataNucleus/Jetty. >>> >>> - Alex >>> >>> On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <[email protected]> >>> wrote: >>> >>>> Howdy all, >>>> >>>> An issue that keeps coming up seems to be the conflict of dependency >>>> versions between Sentry and the components it is plugging into. A >> current >>>> example of this impact is Google Guava with hive2 using v14 and Impala >>>> using v11 while Sentry needs to have at least v14 in order to fix some >>> bugs >>>> in the BoneCP library. These sort of conflicts come in whenever we are >>>> embedding a Sentry plugin or using a Sentry library into another >> project. >>>> >>>> I would like to propose a mechanism for offsetting some of these issues >>>> using something similar to the third party shading that HBase uses for >>>> common problems (such as Guava) >>>> (https://github.com/apache/hbase-thirdparty with >>>> documentation about it http://hbase.apache.org/book.html#thirdparty) >> and >>>> is >>>> used by HBase and Hadoop for packaging of their downstream artifacts ( >>>> https://github.com/apache/hbase/blob/master/hbase-shaded/ and >>>> https://github.com/apache/hadoop/tree/trunk/hadoop- >>>> client-modules/hadoop-client-api/ >>>> ) >>>> >>>> The main benefit of this is that it would allow Sentry to be used as >>>> libraries and plugins with all of the dependencies needed for Sentry to >>> be >>>> abstracted away from components implementing it. Sentry could rev >>> versions >>>> of libraries easier and not have collisions of library versions needed >> by >>>> the implementing component. As well this would potentially make Sentry >>>> downstream usage more stable since it would be used and tested against >> a >>>> static set of dependencies and not using libraries based on what the >>>> implementing component has. >>>> >>>> On the downside, it would make the on disk size of the Sentry plugins >> and >>>> libraries for downstream larger. As well, the number of classes loaded >>> into >>>> memory would be larger since there would be potential duplication of >>> actual >>>> class implementations or multiple versions of a class with different >>>> package names in memory. But this seems to be a common practice and >> the >>>> lack of library version collisions and stability make up for these >>>> downsides. >>>> >>>> The third party shading works by using the Maven shade plugin to do >>> package >>>> name shifting of the third party library (Guava, BoneCP) to a sentry >>>> specific package using the version on the third party library needed >> for >>>> Sentry. >>>> >>>> E.G. *com.google.common* packages could be shifted to >>>> *org.apache.sentry.shaded.com.google.common*. >>>> >>>> Since the Maven Shade plugin can actually change the byte-codes of the >>>> libraries being shaded, we can do this even for dependencies that have >>>> shared sub-dependencies . BoneCP being the main example since it uses >>>> Guava internal to itself. We could shade the BoneCP into the same >> shared >>>> sentry third party dependency jar and since the bytecode level >>>> manipulation. >>>> >>>> What this means on the development side is that we would need to >>> reference >>>> imports from the shaded third party >>>> >>>> E.G. *import com.google.common.collect.Maps* becomes *import >>>> org.apache.sentry.shaded.com.google.common.collect.Maps*. >>>> >>>> Ive been looking at this in the context of >>>> https://issues.apache.org/jira/browse/SENTRY-2044, but I feel this >>> should >>>> potentially be something that is more of an overall Sentry standard >>>> practice and larger scale implementation. >>>> >>>> -=Brian >>>> >>>> >>>> -- >>>> *Brian Towles* | Software Engineer >>>> t. (512) 415- <0000000000>8105 e. [email protected] < >>> [email protected]> >>>> cloudera.com <http://www.cloudera.com/> >>>> >>>> [image: Cloudera] <http://www.cloudera.com/> >>>> >>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: >>>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: >>> Cloudera >>>> on LinkedIn] <https://www.linkedin.com/company/cloudera> >>>> ------------------------------ >>>> >>> >>
