Brian, There are cases where shading is useful for server as well - mostly for e2e tests which run server, HDFS, Hive all in the same JVM.
On Wed, Nov 15, 2017 at 12:47 PM, Brian Towles <btow...@cloudera.com> wrote: > So there would be two different levels of shading. > > The first would be the known issues shared repackaged dependencies like > Guava. This would only be for libraries that we absolutely know will cause > collisions with implementing services and implementing third party > libraries that need to have the references to the trouble causing libraries > shifted. These trouble causing libraries would be the only ones where hard > coded import path changes would need to take place for. > > For example a sentry-thridparty/sentry-shaded-miscellaneous module would > shared google guava shifting the package to > org.apache.sentry.thirdparty.com.google.common. As well as it would shade > the BoneCP jar so that it picks up the shift of the Guava package but not > shifting the BoneCP packages unless we encounter and a known conflict. > > The second level os shading would be for the plugins distribution and would > shade and package shift only the dependencies needed for that plugin. This > would not need to have any import statement changes as the maven shade > plugin would change the bytecode package references on packaging. As to > Kalyan's question, the run time memory used for class loading would only be > for those items used by the plugin, which is really not much anyways > comparatively. The class loader doesn't load up a class until its needed. > Till that point it is just a reference to a file in a jar. The list of > files/classes in jars is not a heavy memory load. > > The main server implementation would not need to have the second level > shading done to it as it runs stand alone and is unaffected by the > collisions issues. > > @Alexander For your questions, there is a small amount of refactoring of > import statements that need to take place for the first level of shading > but there are no logic changes that take place. > > @Stephen Yeah I can see BouncyCastle being more of an issue but it does not > appear in any of the Sentry dependency chains right now. > > -=Brian > > On Wed, Nov 15, 2017 at 1:30 PM Stephen Moist <mo...@cloudera.com> wrote: > > > +1 > > > > Agree with Kalyan’s concern. To me it seems simpler to shade all jars > > rather than known issue jars. One thing though is what if there are bug > > fixes in our jars that are not included in other components, it may cause > > subtle error that are hard to track. I’m thinking along the lines of the > > KeyValue class change in behavior. So I propose limited the scope of > > shaded jars to known issues. > > > > However one thing to note is that some jars do not like to be shaded, as > > an example BouncyCastle has a check to see if the jar has been modified > and > > will refuse to be used in a shaded jar. > > > > > On Nov 15, 2017, at 1:07 PM, Kalyan Kumar Kalvagadda < > > kkal...@cloudera.com> wrote: > > > > > > +1 > > > > > > It's good approach but I have a question/concern. > > > Is the proposal to shade is for some specific jars OR to shade all the > > > third party jars? > > > > > > If proposal is shade all the third-party jars then if would impact the > > run > > > time memory usage as all the classes from the third-party jars would be > > > loaded regardless.. > > > > > > -Kalyan > > > > > > On Wed, Nov 15, 2017 at 12:04 PM, Sergio Pena < > sergio.p...@cloudera.com> > > > wrote: > > > > > >> +1 > > >> > > >> I like the idea. It's hard to upgrade our libraries to newer ones when > > >> other components break due to Sentry being a plugin. I was once > thought > > of > > >> using different jars versions per plugin (i.e. guava11 on hdfs, > guava14 > > on > > >> hive and the server, etc), but that is too much to do and not good. I > > like > > >> the hbase idea better so we can use the latest libraries that we want > > >> instead of depending on other components. > > >> > > >> - Sergio > > >> > > >> > > >> > > >> On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov < > > ak...@cloudera.com> > > >> wrote: > > >> > > >>> Agreed, this would be a very useful thing to do. I remember spending > a > > >> lot > > >>> of time trying to make Sentry work with DataNucleus 4 - the problem > was > > >>> that e2e tests combine Sentry with Hive in the same JVM and this > > created > > >> a > > >>> conflict on the DataNucleus libraries and test failures. > > >>> > > >>> Looking at the HBase proposal it seems to require some (small) code > > >> changes > > >>> for package names - or I am misreading it? Do you plan to shade all > > 3-rd > > >>> party packages or just some? I would consider at least > > >>> Guava/ZooKeepeer/DataNucleus/Jetty. > > >>> > > >>> - Alex > > >>> > > >>> On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <btow...@cloudera.com> > > >>> wrote: > > >>> > > >>>> Howdy all, > > >>>> > > >>>> An issue that keeps coming up seems to be the conflict of dependency > > >>>> versions between Sentry and the components it is plugging into. A > > >> current > > >>>> example of this impact is Google Guava with hive2 using v14 and > Impala > > >>>> using v11 while Sentry needs to have at least v14 in order to fix > some > > >>> bugs > > >>>> in the BoneCP library. These sort of conflicts come in whenever we > > are > > >>>> embedding a Sentry plugin or using a Sentry library into another > > >> project. > > >>>> > > >>>> I would like to propose a mechanism for offsetting some of these > > issues > > >>>> using something similar to the third party shading that HBase uses > for > > >>>> common problems (such as Guava) > > >>>> (https://github.com/apache/hbase-thirdparty with > > >>>> documentation about it http://hbase.apache.org/book.html#thirdparty > ) > > >> and > > >>>> is > > >>>> used by HBase and Hadoop for packaging of their downstream > artifacts ( > > >>>> https://github.com/apache/hbase/blob/master/hbase-shaded/ and > > >>>> https://github.com/apache/hadoop/tree/trunk/hadoop- > > >>>> client-modules/hadoop-client-api/ > > >>>> ) > > >>>> > > >>>> The main benefit of this is that it would allow Sentry to be used as > > >>>> libraries and plugins with all of the dependencies needed for Sentry > > to > > >>> be > > >>>> abstracted away from components implementing it. Sentry could rev > > >>> versions > > >>>> of libraries easier and not have collisions of library versions > needed > > >> by > > >>>> the implementing component. As well this would potentially make > > Sentry > > >>>> downstream usage more stable since it would be used and tested > against > > >> a > > >>>> static set of dependencies and not using libraries based on what the > > >>>> implementing component has. > > >>>> > > >>>> On the downside, it would make the on disk size of the Sentry > plugins > > >> and > > >>>> libraries for downstream larger. As well, the number of classes > loaded > > >>> into > > >>>> memory would be larger since there would be potential duplication of > > >>> actual > > >>>> class implementations or multiple versions of a class with different > > >>>> package names in memory. But this seems to be a common practice and > > >> the > > >>>> lack of library version collisions and stability make up for these > > >>>> downsides. > > >>>> > > >>>> The third party shading works by using the Maven shade plugin to do > > >>> package > > >>>> name shifting of the third party library (Guava, BoneCP) to a sentry > > >>>> specific package using the version on the third party library needed > > >> for > > >>>> Sentry. > > >>>> > > >>>> E.G. *com.google.common* packages could be shifted to > > >>>> *org.apache.sentry.shaded.com.google.common*. > > >>>> > > >>>> Since the Maven Shade plugin can actually change the byte-codes of > the > > >>>> libraries being shaded, we can do this even for dependencies that > have > > >>>> shared sub-dependencies . BoneCP being the main example since it > uses > > >>>> Guava internal to itself. We could shade the BoneCP into the same > > >> shared > > >>>> sentry third party dependency jar and since the bytecode level > > >>>> manipulation. > > >>>> > > >>>> What this means on the development side is that we would need to > > >>> reference > > >>>> imports from the shaded third party > > >>>> > > >>>> E.G. *import com.google.common.collect.Maps* becomes *import > > >>>> org.apache.sentry.shaded.com.google.common.collect.Maps*. > > >>>> > > >>>> Ive been looking at this in the context of > > >>>> https://issues.apache.org/jira/browse/SENTRY-2044, but I feel this > > >>> should > > >>>> potentially be something that is more of an overall Sentry standard > > >>>> practice and larger scale implementation. > > >>>> > > >>>> -=Brian > > >>>> > > >>>> > > >>>> -- > > >>>> *Brian Towles* | Software Engineer > > >>>> t. (512) 415- <0000000000>8105 e. btow...@cloudera.com < > > >>> j...@cloudera.com> > > >>>> cloudera.com <http://www.cloudera.com/> > > >>>> > > >>>> [image: Cloudera] <http://www.cloudera.com/> > > >>>> > > >>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: > > >>>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: > > >>> Cloudera > > >>>> on LinkedIn] <https://www.linkedin.com/company/cloudera> > > >>>> ------------------------------ > > >>>> > > >>> > > >> > > > > -- > *Brian Towles* | Software Engineer > t. (512) 415- <0000000000>8105 e. btow...@cloudera.com <j...@cloudera.com> > cloudera.com <http://www.cloudera.com/> > > [image: Cloudera] <http://www.cloudera.com/> > > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera > on LinkedIn] <https://www.linkedin.com/company/cloudera> > ------------------------------ >