@Moist It can be found here https://github.com/apache/hbase/blob/master/hbase-shaded/hbase-shaded-check-invariants/src/test/resources/ensure-jars-have-correct-contents.sh
On Wed, Nov 15, 2017 at 3:38 PM Brian Towles <[email protected]> wrote: > HBase does have a ensure-jars-have-correct-contents.sh check that they run > which we could do something similar to. > > On Wed, Nov 15, 2017 at 3:30 PM Stephen Moist <[email protected]> wrote: > >> Is there any way to add a check to the maven build to make sure that the >> classes being imported are the shaded ones? >> >> >> > On Nov 15, 2017, at 3:02 PM, Brian Towles <[email protected]> wrote: >> > >> > I can see that. I can look into that as well. It would make sense for >> the >> > "Sentry as a Library" you talked about as well >> > >> > On Wed, Nov 15, 2017 at 2:56 PM Alexander Kolbasov <[email protected]> >> > wrote: >> > >> >> Brian, There are cases where shading is useful for server as well - >> mostly >> >> for e2e tests which run server, HDFS, Hive all in the same JVM. >> >> >> >> On Wed, Nov 15, 2017 at 12:47 PM, Brian Towles <[email protected]> >> >> wrote: >> >> >> >>> So there would be two different levels of shading. >> >>> >> >>> The first would be the known issues shared repackaged dependencies >> like >> >>> Guava. This would only be for libraries that we absolutely know will >> >> cause >> >>> collisions with implementing services and implementing third party >> >>> libraries that need to have the references to the trouble causing >> >> libraries >> >>> shifted. These trouble causing libraries would be the only ones where >> >> hard >> >>> coded import path changes would need to take place for. >> >>> >> >>> For example a sentry-thridparty/sentry-shaded-miscellaneous module >> would >> >>> shared google guava shifting the package to >> >>> org.apache.sentry.thirdparty.com.google.common. As well as it would >> shade >> >>> the BoneCP jar so that it picks up the shift of the Guava package but >> not >> >>> shifting the BoneCP packages unless we encounter and a known conflict. >> >>> >> >>> The second level os shading would be for the plugins distribution and >> >> would >> >>> shade and package shift only the dependencies needed for that plugin. >> >> This >> >>> would not need to have any import statement changes as the maven shade >> >>> plugin would change the bytecode package references on packaging. As >> to >> >>> Kalyan's question, the run time memory used for class loading would >> only >> >> be >> >>> for those items used by the plugin, which is really not much anyways >> >>> comparatively. The class loader doesn't load up a class until its >> needed. >> >>> Till that point it is just a reference to a file in a jar. The list >> of >> >>> files/classes in jars is not a heavy memory load. >> >>> >> >>> The main server implementation would not need to have the second level >> >>> shading done to it as it runs stand alone and is unaffected by the >> >>> collisions issues. >> >>> >> >>> @Alexander For your questions, there is a small amount of refactoring >> of >> >>> import statements that need to take place for the first level of >> shading >> >>> but there are no logic changes that take place. >> >>> >> >>> @Stephen Yeah I can see BouncyCastle being more of an issue but it >> does >> >> not >> >>> appear in any of the Sentry dependency chains right now. >> >>> >> >>> -=Brian >> >>> >> >>> On Wed, Nov 15, 2017 at 1:30 PM Stephen Moist <[email protected]> >> >> wrote: >> >>> >> >>>> +1 >> >>>> >> >>>> Agree with Kalyan’s concern. To me it seems simpler to shade all >> jars >> >>>> rather than known issue jars. One thing though is what if there are >> >> bug >> >>>> fixes in our jars that are not included in other components, it may >> >> cause >> >>>> subtle error that are hard to track. I’m thinking along the lines of >> >> the >> >>>> KeyValue class change in behavior. So I propose limited the scope of >> >>>> shaded jars to known issues. >> >>>> >> >>>> However one thing to note is that some jars do not like to be shaded, >> >> as >> >>>> an example BouncyCastle has a check to see if the jar has been >> modified >> >>> and >> >>>> will refuse to be used in a shaded jar. >> >>>> >> >>>>> On Nov 15, 2017, at 1:07 PM, Kalyan Kumar Kalvagadda < >> >>>> [email protected]> wrote: >> >>>>> >> >>>>> +1 >> >>>>> >> >>>>> It's good approach but I have a question/concern. >> >>>>> Is the proposal to shade is for some specific jars OR to shade all >> >> the >> >>>>> third party jars? >> >>>>> >> >>>>> If proposal is shade all the third-party jars then if would impact >> >> the >> >>>> run >> >>>>> time memory usage as all the classes from the third-party jars would >> >> be >> >>>>> loaded regardless.. >> >>>>> >> >>>>> -Kalyan >> >>>>> >> >>>>> On Wed, Nov 15, 2017 at 12:04 PM, Sergio Pena < >> >>> [email protected]> >> >>>>> wrote: >> >>>>> >> >>>>>> +1 >> >>>>>> >> >>>>>> I like the idea. It's hard to upgrade our libraries to newer ones >> >> when >> >>>>>> other components break due to Sentry being a plugin. I was once >> >>> thought >> >>>> of >> >>>>>> using different jars versions per plugin (i.e. guava11 on hdfs, >> >>> guava14 >> >>>> on >> >>>>>> hive and the server, etc), but that is too much to do and not good. >> >> I >> >>>> like >> >>>>>> the hbase idea better so we can use the latest libraries that we >> >> want >> >>>>>> instead of depending on other components. >> >>>>>> >> >>>>>> - Sergio >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov < >> >>>> [email protected]> >> >>>>>> wrote: >> >>>>>> >> >>>>>>> Agreed, this would be a very useful thing to do. I remember >> >> spending >> >>> a >> >>>>>> lot >> >>>>>>> of time trying to make Sentry work with DataNucleus 4 - the >> problem >> >>> was >> >>>>>>> that e2e tests combine Sentry with Hive in the same JVM and this >> >>>> created >> >>>>>> a >> >>>>>>> conflict on the DataNucleus libraries and test failures. >> >>>>>>> >> >>>>>>> Looking at the HBase proposal it seems to require some (small) >> code >> >>>>>> changes >> >>>>>>> for package names - or I am misreading it? Do you plan to shade >> all >> >>>> 3-rd >> >>>>>>> party packages or just some? I would consider at least >> >>>>>>> Guava/ZooKeepeer/DataNucleus/Jetty. >> >>>>>>> >> >>>>>>> - Alex >> >>>>>>> >> >>>>>>> On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles < >> >> [email protected]> >> >>>>>>> wrote: >> >>>>>>> >> >>>>>>>> Howdy all, >> >>>>>>>> >> >>>>>>>> An issue that keeps coming up seems to be the conflict of >> >> dependency >> >>>>>>>> versions between Sentry and the components it is plugging into. A >> >>>>>> current >> >>>>>>>> example of this impact is Google Guava with hive2 using v14 and >> >>> Impala >> >>>>>>>> using v11 while Sentry needs to have at least v14 in order to fix >> >>> some >> >>>>>>> bugs >> >>>>>>>> in the BoneCP library. These sort of conflicts come in whenever >> >> we >> >>>> are >> >>>>>>>> embedding a Sentry plugin or using a Sentry library into another >> >>>>>> project. >> >>>>>>>> >> >>>>>>>> I would like to propose a mechanism for offsetting some of these >> >>>> issues >> >>>>>>>> using something similar to the third party shading that HBase >> uses >> >>> for >> >>>>>>>> common problems (such as Guava) >> >>>>>>>> (https://github.com/apache/hbase-thirdparty with >> >>>>>>>> documentation about it >> >> http://hbase.apache.org/book.html#thirdparty >> >>> ) >> >>>>>> and >> >>>>>>>> is >> >>>>>>>> used by HBase and Hadoop for packaging of their downstream >> >>> artifacts ( >> >>>>>>>> https://github.com/apache/hbase/blob/master/hbase-shaded/ and >> >>>>>>>> https://github.com/apache/hadoop/tree/trunk/hadoop- >> >>>>>>>> client-modules/hadoop-client-api/ >> >>>>>>>> ) >> >>>>>>>> >> >>>>>>>> The main benefit of this is that it would allow Sentry to be used >> >> as >> >>>>>>>> libraries and plugins with all of the dependencies needed for >> >> Sentry >> >>>> to >> >>>>>>> be >> >>>>>>>> abstracted away from components implementing it. Sentry could rev >> >>>>>>> versions >> >>>>>>>> of libraries easier and not have collisions of library versions >> >>> needed >> >>>>>> by >> >>>>>>>> the implementing component. As well this would potentially make >> >>>> Sentry >> >>>>>>>> downstream usage more stable since it would be used and tested >> >>> against >> >>>>>> a >> >>>>>>>> static set of dependencies and not using libraries based on what >> >> the >> >>>>>>>> implementing component has. >> >>>>>>>> >> >>>>>>>> On the downside, it would make the on disk size of the Sentry >> >>> plugins >> >>>>>> and >> >>>>>>>> libraries for downstream larger. As well, the number of classes >> >>> loaded >> >>>>>>> into >> >>>>>>>> memory would be larger since there would be potential duplication >> >> of >> >>>>>>> actual >> >>>>>>>> class implementations or multiple versions of a class with >> >> different >> >>>>>>>> package names in memory. But this seems to be a common practice >> >> and >> >>>>>> the >> >>>>>>>> lack of library version collisions and stability make up for >> these >> >>>>>>>> downsides. >> >>>>>>>> >> >>>>>>>> The third party shading works by using the Maven shade plugin to >> >> do >> >>>>>>> package >> >>>>>>>> name shifting of the third party library (Guava, BoneCP) to a >> >> sentry >> >>>>>>>> specific package using the version on the third party library >> >> needed >> >>>>>> for >> >>>>>>>> Sentry. >> >>>>>>>> >> >>>>>>>> E.G. *com.google.common* packages could be shifted to >> >>>>>>>> *org.apache.sentry.shaded.com.google.common*. >> >>>>>>>> >> >>>>>>>> Since the Maven Shade plugin can actually change the byte-codes >> of >> >>> the >> >>>>>>>> libraries being shaded, we can do this even for dependencies that >> >>> have >> >>>>>>>> shared sub-dependencies . BoneCP being the main example since it >> >>> uses >> >>>>>>>> Guava internal to itself. We could shade the BoneCP into the >> same >> >>>>>> shared >> >>>>>>>> sentry third party dependency jar and since the bytecode level >> >>>>>>>> manipulation. >> >>>>>>>> >> >>>>>>>> What this means on the development side is that we would need to >> >>>>>>> reference >> >>>>>>>> imports from the shaded third party >> >>>>>>>> >> >>>>>>>> E.G. *import com.google.common.collect.Maps* becomes *import >> >>>>>>>> org.apache.sentry.shaded.com.google.common.collect.Maps*. >> >>>>>>>> >> >>>>>>>> Ive been looking at this in the context of >> >>>>>>>> https://issues.apache.org/jira/browse/SENTRY-2044, but I feel >> >> this >> >>>>>>> should >> >>>>>>>> potentially be something that is more of an overall Sentry >> >> standard >> >>>>>>>> practice and larger scale implementation. >> >>>>>>>> >> >>>>>>>> -=Brian >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> -- >> >>>>>>>> *Brian Towles* | Software Engineer >> >>>>>>>> t. (512) 415- <0000000000>8105 e. [email protected] < >> >>>>>>> [email protected]> >> >>>>>>>> cloudera.com <http://www.cloudera.com/> >> >>>>>>>> >> >>>>>>>> [image: Cloudera] <http://www.cloudera.com/> >> >>>>>>>> >> >>>>>>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> >> >> [image: >> >>>>>>>> Cloudera on Facebook] <https://www.facebook.com/cloudera> >> [image: >> >>>>>>> Cloudera >> >>>>>>>> on LinkedIn] <https://www.linkedin.com/company/cloudera> >> >>>>>>>> ------------------------------ >> >>>>>>>> >> >>>>>>> >> >>>>>> >> >>>> >> >>>> -- >> >>> *Brian Towles* | Software Engineer >> >>> t. (512) 415- <0000000000>8105 e. [email protected] < >> >> [email protected]> >> >>> cloudera.com <http://www.cloudera.com/> >> >>> >> >>> [image: Cloudera] <http://www.cloudera.com/> >> >>> >> >>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: >> >>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: >> >> Cloudera >> >>> on LinkedIn] <https://www.linkedin.com/company/cloudera> >> >>> ------------------------------ >> >>> >> >> >> > -- >> > *Brian Towles* | Software Engineer >> > t. (512) 415- <0000000000>8105 e. [email protected] < >> [email protected]> >> > cloudera.com <http://www.cloudera.com/> >> > >> > [image: Cloudera] <http://www.cloudera.com/> >> > >> > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: >> > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: >> Cloudera >> > on LinkedIn] <https://www.linkedin.com/company/cloudera> >> > ------------------------------ >> >> -- > *Brian Towles* | Software Engineer > t. (512) 415- <0000000000>8105 e. [email protected] <[email protected]> > cloudera.com <http://www.cloudera.com/> > > [image: Cloudera] <http://www.cloudera.com/> > > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: > Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera> > ------------------------------ > -- *Brian Towles* | Software Engineer t. (512) 415- <0000000000>8105 e. [email protected] <[email protected]> cloudera.com <http://www.cloudera.com/> [image: Cloudera] <http://www.cloudera.com/> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera> ------------------------------
