Is there any way to add a check to the maven build to make sure that the classes being imported are the shaded ones?
> On Nov 15, 2017, at 3:02 PM, Brian Towles <[email protected]> wrote: > > I can see that. I can look into that as well. It would make sense for the > "Sentry as a Library" you talked about as well > > On Wed, Nov 15, 2017 at 2:56 PM Alexander Kolbasov <[email protected]> > wrote: > >> Brian, There are cases where shading is useful for server as well - mostly >> for e2e tests which run server, HDFS, Hive all in the same JVM. >> >> On Wed, Nov 15, 2017 at 12:47 PM, Brian Towles <[email protected]> >> wrote: >> >>> So there would be two different levels of shading. >>> >>> The first would be the known issues shared repackaged dependencies like >>> Guava. This would only be for libraries that we absolutely know will >> cause >>> collisions with implementing services and implementing third party >>> libraries that need to have the references to the trouble causing >> libraries >>> shifted. These trouble causing libraries would be the only ones where >> hard >>> coded import path changes would need to take place for. >>> >>> For example a sentry-thridparty/sentry-shaded-miscellaneous module would >>> shared google guava shifting the package to >>> org.apache.sentry.thirdparty.com.google.common. As well as it would shade >>> the BoneCP jar so that it picks up the shift of the Guava package but not >>> shifting the BoneCP packages unless we encounter and a known conflict. >>> >>> The second level os shading would be for the plugins distribution and >> would >>> shade and package shift only the dependencies needed for that plugin. >> This >>> would not need to have any import statement changes as the maven shade >>> plugin would change the bytecode package references on packaging. As to >>> Kalyan's question, the run time memory used for class loading would only >> be >>> for those items used by the plugin, which is really not much anyways >>> comparatively. The class loader doesn't load up a class until its needed. >>> Till that point it is just a reference to a file in a jar. The list of >>> files/classes in jars is not a heavy memory load. >>> >>> The main server implementation would not need to have the second level >>> shading done to it as it runs stand alone and is unaffected by the >>> collisions issues. >>> >>> @Alexander For your questions, there is a small amount of refactoring of >>> import statements that need to take place for the first level of shading >>> but there are no logic changes that take place. >>> >>> @Stephen Yeah I can see BouncyCastle being more of an issue but it does >> not >>> appear in any of the Sentry dependency chains right now. >>> >>> -=Brian >>> >>> On Wed, Nov 15, 2017 at 1:30 PM Stephen Moist <[email protected]> >> wrote: >>> >>>> +1 >>>> >>>> Agree with Kalyan’s concern. To me it seems simpler to shade all jars >>>> rather than known issue jars. One thing though is what if there are >> bug >>>> fixes in our jars that are not included in other components, it may >> cause >>>> subtle error that are hard to track. I’m thinking along the lines of >> the >>>> KeyValue class change in behavior. So I propose limited the scope of >>>> shaded jars to known issues. >>>> >>>> However one thing to note is that some jars do not like to be shaded, >> as >>>> an example BouncyCastle has a check to see if the jar has been modified >>> and >>>> will refuse to be used in a shaded jar. >>>> >>>>> On Nov 15, 2017, at 1:07 PM, Kalyan Kumar Kalvagadda < >>>> [email protected]> wrote: >>>>> >>>>> +1 >>>>> >>>>> It's good approach but I have a question/concern. >>>>> Is the proposal to shade is for some specific jars OR to shade all >> the >>>>> third party jars? >>>>> >>>>> If proposal is shade all the third-party jars then if would impact >> the >>>> run >>>>> time memory usage as all the classes from the third-party jars would >> be >>>>> loaded regardless.. >>>>> >>>>> -Kalyan >>>>> >>>>> On Wed, Nov 15, 2017 at 12:04 PM, Sergio Pena < >>> [email protected]> >>>>> wrote: >>>>> >>>>>> +1 >>>>>> >>>>>> I like the idea. It's hard to upgrade our libraries to newer ones >> when >>>>>> other components break due to Sentry being a plugin. I was once >>> thought >>>> of >>>>>> using different jars versions per plugin (i.e. guava11 on hdfs, >>> guava14 >>>> on >>>>>> hive and the server, etc), but that is too much to do and not good. >> I >>>> like >>>>>> the hbase idea better so we can use the latest libraries that we >> want >>>>>> instead of depending on other components. >>>>>> >>>>>> - Sergio >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov < >>>> [email protected]> >>>>>> wrote: >>>>>> >>>>>>> Agreed, this would be a very useful thing to do. I remember >> spending >>> a >>>>>> lot >>>>>>> of time trying to make Sentry work with DataNucleus 4 - the problem >>> was >>>>>>> that e2e tests combine Sentry with Hive in the same JVM and this >>>> created >>>>>> a >>>>>>> conflict on the DataNucleus libraries and test failures. >>>>>>> >>>>>>> Looking at the HBase proposal it seems to require some (small) code >>>>>> changes >>>>>>> for package names - or I am misreading it? Do you plan to shade all >>>> 3-rd >>>>>>> party packages or just some? I would consider at least >>>>>>> Guava/ZooKeepeer/DataNucleus/Jetty. >>>>>>> >>>>>>> - Alex >>>>>>> >>>>>>> On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles < >> [email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Howdy all, >>>>>>>> >>>>>>>> An issue that keeps coming up seems to be the conflict of >> dependency >>>>>>>> versions between Sentry and the components it is plugging into. A >>>>>> current >>>>>>>> example of this impact is Google Guava with hive2 using v14 and >>> Impala >>>>>>>> using v11 while Sentry needs to have at least v14 in order to fix >>> some >>>>>>> bugs >>>>>>>> in the BoneCP library. These sort of conflicts come in whenever >> we >>>> are >>>>>>>> embedding a Sentry plugin or using a Sentry library into another >>>>>> project. >>>>>>>> >>>>>>>> I would like to propose a mechanism for offsetting some of these >>>> issues >>>>>>>> using something similar to the third party shading that HBase uses >>> for >>>>>>>> common problems (such as Guava) >>>>>>>> (https://github.com/apache/hbase-thirdparty with >>>>>>>> documentation about it >> http://hbase.apache.org/book.html#thirdparty >>> ) >>>>>> and >>>>>>>> is >>>>>>>> used by HBase and Hadoop for packaging of their downstream >>> artifacts ( >>>>>>>> https://github.com/apache/hbase/blob/master/hbase-shaded/ and >>>>>>>> https://github.com/apache/hadoop/tree/trunk/hadoop- >>>>>>>> client-modules/hadoop-client-api/ >>>>>>>> ) >>>>>>>> >>>>>>>> The main benefit of this is that it would allow Sentry to be used >> as >>>>>>>> libraries and plugins with all of the dependencies needed for >> Sentry >>>> to >>>>>>> be >>>>>>>> abstracted away from components implementing it. Sentry could rev >>>>>>> versions >>>>>>>> of libraries easier and not have collisions of library versions >>> needed >>>>>> by >>>>>>>> the implementing component. As well this would potentially make >>>> Sentry >>>>>>>> downstream usage more stable since it would be used and tested >>> against >>>>>> a >>>>>>>> static set of dependencies and not using libraries based on what >> the >>>>>>>> implementing component has. >>>>>>>> >>>>>>>> On the downside, it would make the on disk size of the Sentry >>> plugins >>>>>> and >>>>>>>> libraries for downstream larger. As well, the number of classes >>> loaded >>>>>>> into >>>>>>>> memory would be larger since there would be potential duplication >> of >>>>>>> actual >>>>>>>> class implementations or multiple versions of a class with >> different >>>>>>>> package names in memory. But this seems to be a common practice >> and >>>>>> the >>>>>>>> lack of library version collisions and stability make up for these >>>>>>>> downsides. >>>>>>>> >>>>>>>> The third party shading works by using the Maven shade plugin to >> do >>>>>>> package >>>>>>>> name shifting of the third party library (Guava, BoneCP) to a >> sentry >>>>>>>> specific package using the version on the third party library >> needed >>>>>> for >>>>>>>> Sentry. >>>>>>>> >>>>>>>> E.G. *com.google.common* packages could be shifted to >>>>>>>> *org.apache.sentry.shaded.com.google.common*. >>>>>>>> >>>>>>>> Since the Maven Shade plugin can actually change the byte-codes of >>> the >>>>>>>> libraries being shaded, we can do this even for dependencies that >>> have >>>>>>>> shared sub-dependencies . BoneCP being the main example since it >>> uses >>>>>>>> Guava internal to itself. We could shade the BoneCP into the same >>>>>> shared >>>>>>>> sentry third party dependency jar and since the bytecode level >>>>>>>> manipulation. >>>>>>>> >>>>>>>> What this means on the development side is that we would need to >>>>>>> reference >>>>>>>> imports from the shaded third party >>>>>>>> >>>>>>>> E.G. *import com.google.common.collect.Maps* becomes *import >>>>>>>> org.apache.sentry.shaded.com.google.common.collect.Maps*. >>>>>>>> >>>>>>>> Ive been looking at this in the context of >>>>>>>> https://issues.apache.org/jira/browse/SENTRY-2044, but I feel >> this >>>>>>> should >>>>>>>> potentially be something that is more of an overall Sentry >> standard >>>>>>>> practice and larger scale implementation. >>>>>>>> >>>>>>>> -=Brian >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> *Brian Towles* | Software Engineer >>>>>>>> t. (512) 415- <0000000000>8105 e. [email protected] < >>>>>>> [email protected]> >>>>>>>> cloudera.com <http://www.cloudera.com/> >>>>>>>> >>>>>>>> [image: Cloudera] <http://www.cloudera.com/> >>>>>>>> >>>>>>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> >> [image: >>>>>>>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: >>>>>>> Cloudera >>>>>>>> on LinkedIn] <https://www.linkedin.com/company/cloudera> >>>>>>>> ------------------------------ >>>>>>>> >>>>>>> >>>>>> >>>> >>>> -- >>> *Brian Towles* | Software Engineer >>> t. (512) 415- <0000000000>8105 e. [email protected] < >> [email protected]> >>> cloudera.com <http://www.cloudera.com/> >>> >>> [image: Cloudera] <http://www.cloudera.com/> >>> >>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: >>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: >> Cloudera >>> on LinkedIn] <https://www.linkedin.com/company/cloudera> >>> ------------------------------ >>> >> > -- > *Brian Towles* | Software Engineer > t. (512) 415- <0000000000>8105 e. [email protected] <[email protected]> > cloudera.com <http://www.cloudera.com/> > > [image: Cloudera] <http://www.cloudera.com/> > > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera > on LinkedIn] <https://www.linkedin.com/company/cloudera> > ------------------------------
