Is there any way to add a check to the maven build to make sure that the 
classes being imported are the shaded ones?


> On Nov 15, 2017, at 3:02 PM, Brian Towles <[email protected]> wrote:
> 
> I can see that.  I can look into that as well.  It would make sense for the
> "Sentry as a Library" you talked about as well
> 
> On Wed, Nov 15, 2017 at 2:56 PM Alexander Kolbasov <[email protected]>
> wrote:
> 
>> Brian, There are cases where shading is useful for server as well - mostly
>> for e2e tests which run server, HDFS, Hive all in the same JVM.
>> 
>> On Wed, Nov 15, 2017 at 12:47 PM, Brian Towles <[email protected]>
>> wrote:
>> 
>>> So there would be two different levels of shading.
>>> 
>>> The first would be the known issues shared repackaged dependencies like
>>> Guava.  This would only be for libraries that we absolutely know will
>> cause
>>> collisions with implementing services and implementing third party
>>> libraries that need to have the references to the trouble causing
>> libraries
>>> shifted.  These trouble causing libraries would be the only ones where
>> hard
>>> coded import path changes would need to take place for.
>>> 
>>> For example a sentry-thridparty/sentry-shaded-miscellaneous module would
>>> shared google guava shifting the package to
>>> org.apache.sentry.thirdparty.com.google.common. As well as it would shade
>>> the BoneCP jar so that it picks up the shift of the Guava package but not
>>> shifting the BoneCP packages unless we encounter and a known conflict.
>>> 
>>> The second level os shading would be for the plugins distribution and
>> would
>>> shade and package shift only the dependencies needed for that plugin.
>> This
>>> would not need to have any import statement changes as the maven shade
>>> plugin would change the bytecode package references on packaging.  As to
>>> Kalyan's question, the run time memory used for class loading would only
>> be
>>> for those items used by the plugin, which is really not much anyways
>>> comparatively. The class loader doesn't load up a class until its needed.
>>> Till that point it is just a reference to a file in a jar.  The list of
>>> files/classes in jars is not a heavy memory load.
>>> 
>>> The main server implementation would not need to have the second level
>>> shading done to it as it runs stand alone and is unaffected by the
>>> collisions issues.
>>> 
>>> @Alexander For your questions, there is a small amount of refactoring of
>>> import statements that need to take place for the first level of shading
>>> but there are no logic changes that take place.
>>> 
>>> @Stephen Yeah I can see BouncyCastle being more of an issue but it does
>> not
>>> appear in any of the Sentry dependency chains right now.
>>> 
>>> -=Brian
>>> 
>>> On Wed, Nov 15, 2017 at 1:30 PM Stephen Moist <[email protected]>
>> wrote:
>>> 
>>>> +1
>>>> 
>>>> Agree with Kalyan’s concern.  To me it seems simpler to shade all jars
>>>> rather than known issue jars.  One thing though is what if there are
>> bug
>>>> fixes in our jars that are not included in other components, it may
>> cause
>>>> subtle error that are hard to track.  I’m thinking along the lines of
>> the
>>>> KeyValue class change in behavior.  So I propose limited the scope of
>>>> shaded jars to known issues.
>>>> 
>>>> However one thing to note is that some jars do not like to be shaded,
>> as
>>>> an example BouncyCastle has a check to see if the jar has been modified
>>> and
>>>> will refuse to be used in a shaded jar.
>>>> 
>>>>> On Nov 15, 2017, at 1:07 PM, Kalyan Kumar Kalvagadda <
>>>> [email protected]> wrote:
>>>>> 
>>>>> +1
>>>>> 
>>>>> It's good approach but I have a question/concern.
>>>>> Is the proposal to shade is for some specific jars OR to shade all
>> the
>>>>> third party jars?
>>>>> 
>>>>> If proposal is shade all the third-party jars then if would impact
>> the
>>>> run
>>>>> time memory usage as all the classes from the third-party jars would
>> be
>>>>> loaded regardless..
>>>>> 
>>>>> -Kalyan
>>>>> 
>>>>> On Wed, Nov 15, 2017 at 12:04 PM, Sergio Pena <
>>> [email protected]>
>>>>> wrote:
>>>>> 
>>>>>> +1
>>>>>> 
>>>>>> I like the idea. It's hard to upgrade our libraries to newer ones
>> when
>>>>>> other components break due to Sentry being a plugin. I was once
>>> thought
>>>> of
>>>>>> using different jars versions per plugin (i.e. guava11 on hdfs,
>>> guava14
>>>> on
>>>>>> hive and the server, etc), but that is too much to do and not good.
>> I
>>>> like
>>>>>> the hbase idea better so we can use the latest libraries that we
>> want
>>>>>> instead of depending on other components.
>>>>>> 
>>>>>> - Sergio
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Nov 15, 2017 at 11:53 AM, Alexander Kolbasov <
>>>> [email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> Agreed, this would be a very useful thing to do. I remember
>> spending
>>> a
>>>>>> lot
>>>>>>> of time trying to make Sentry work with DataNucleus 4 - the problem
>>> was
>>>>>>> that e2e tests combine Sentry with Hive in the same JVM and this
>>>> created
>>>>>> a
>>>>>>> conflict on the DataNucleus libraries and test failures.
>>>>>>> 
>>>>>>> Looking at the HBase proposal it seems to require some (small) code
>>>>>> changes
>>>>>>> for package names - or I am misreading it? Do you plan to shade all
>>>> 3-rd
>>>>>>> party packages or just some? I would consider at least
>>>>>>> Guava/ZooKeepeer/DataNucleus/Jetty.
>>>>>>> 
>>>>>>> - Alex
>>>>>>> 
>>>>>>> On Wed, Nov 15, 2017 at 8:24 AM, Brian Towles <
>> [email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Howdy all,
>>>>>>>> 
>>>>>>>> An issue that keeps coming up seems to be the conflict of
>> dependency
>>>>>>>> versions between Sentry and the components it is plugging into. A
>>>>>> current
>>>>>>>> example of this impact is Google Guava with hive2 using v14 and
>>> Impala
>>>>>>>> using v11 while Sentry needs to have at least v14 in order to fix
>>> some
>>>>>>> bugs
>>>>>>>> in the BoneCP library.  These sort of conflicts come in whenever
>> we
>>>> are
>>>>>>>> embedding a Sentry plugin or using a Sentry library into another
>>>>>> project.
>>>>>>>> 
>>>>>>>> I would like to propose a mechanism for offsetting some of these
>>>> issues
>>>>>>>> using something similar to the third party shading that HBase uses
>>> for
>>>>>>>> common problems (such as Guava)
>>>>>>>> (https://github.com/apache/hbase-thirdparty with
>>>>>>>> documentation about it
>> http://hbase.apache.org/book.html#thirdparty
>>> )
>>>>>> and
>>>>>>>> is
>>>>>>>> used by HBase and Hadoop for packaging of their downstream
>>> artifacts (
>>>>>>>> https://github.com/apache/hbase/blob/master/hbase-shaded/ and
>>>>>>>> https://github.com/apache/hadoop/tree/trunk/hadoop-
>>>>>>>> client-modules/hadoop-client-api/
>>>>>>>> )
>>>>>>>> 
>>>>>>>> The main benefit of this is that it would allow Sentry to be used
>> as
>>>>>>>> libraries and plugins with all of the dependencies needed for
>> Sentry
>>>> to
>>>>>>> be
>>>>>>>> abstracted away from components implementing it. Sentry could rev
>>>>>>> versions
>>>>>>>> of libraries easier and not have collisions of library versions
>>> needed
>>>>>> by
>>>>>>>> the implementing component.  As well this would potentially make
>>>> Sentry
>>>>>>>> downstream usage more stable since it would be used and tested
>>> against
>>>>>> a
>>>>>>>> static set of dependencies and not using libraries based on what
>> the
>>>>>>>> implementing component has.
>>>>>>>> 
>>>>>>>> On the downside, it would make the on disk size of the Sentry
>>> plugins
>>>>>> and
>>>>>>>> libraries for downstream larger. As well, the number of classes
>>> loaded
>>>>>>> into
>>>>>>>> memory would be larger since there would be potential duplication
>> of
>>>>>>> actual
>>>>>>>> class implementations or multiple versions of a class with
>> different
>>>>>>>> package names in memory.  But this seems to be a common practice
>> and
>>>>>> the
>>>>>>>> lack of library version collisions and stability make up for these
>>>>>>>> downsides.
>>>>>>>> 
>>>>>>>> The third party shading works by using the Maven shade plugin to
>> do
>>>>>>> package
>>>>>>>> name shifting of the third party library (Guava, BoneCP) to a
>> sentry
>>>>>>>> specific package using the version on the third party library
>> needed
>>>>>> for
>>>>>>>> Sentry.
>>>>>>>> 
>>>>>>>> E.G. *com.google.common* packages could be shifted to
>>>>>>>> *org.apache.sentry.shaded.com.google.common*.
>>>>>>>> 
>>>>>>>> Since the Maven Shade plugin can actually change the byte-codes of
>>> the
>>>>>>>> libraries being shaded, we can do this even for dependencies that
>>> have
>>>>>>>> shared sub-dependencies .  BoneCP being the main example since it
>>> uses
>>>>>>>> Guava internal to itself.  We could shade the BoneCP into the same
>>>>>> shared
>>>>>>>> sentry third party dependency jar and since the bytecode level
>>>>>>>> manipulation.
>>>>>>>> 
>>>>>>>> What this means on the development side is that we would need to
>>>>>>> reference
>>>>>>>> imports from the shaded third party
>>>>>>>> 
>>>>>>>> E.G. *import com.google.common.collect.Maps* becomes *import
>>>>>>>> org.apache.sentry.shaded.com.google.common.collect.Maps*.
>>>>>>>> 
>>>>>>>> Ive been looking at this in the context of
>>>>>>>> https://issues.apache.org/jira/browse/SENTRY-2044, but I feel
>> this
>>>>>>> should
>>>>>>>> potentially be something that is more of an overall Sentry
>> standard
>>>>>>>> practice and larger scale implementation.
>>>>>>>> 
>>>>>>>> -=Brian
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> *Brian Towles* | Software Engineer
>>>>>>>> t. (512) 415- <0000000000>8105 e. [email protected] <
>>>>>>> [email protected]>
>>>>>>>> cloudera.com <http://www.cloudera.com/>
>>>>>>>> 
>>>>>>>> [image: Cloudera] <http://www.cloudera.com/>
>>>>>>>> 
>>>>>>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera>
>> [image:
>>>>>>>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
>>>>>>> Cloudera
>>>>>>>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
>>>>>>>> ------------------------------
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>>> --
>>> *Brian Towles* | Software Engineer
>>> t. (512) 415- <0000000000>8105 e. [email protected] <
>> [email protected]>
>>> cloudera.com <http://www.cloudera.com/>
>>> 
>>> [image: Cloudera] <http://www.cloudera.com/>
>>> 
>>> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
>>> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
>> Cloudera
>>> on LinkedIn] <https://www.linkedin.com/company/cloudera>
>>> ------------------------------
>>> 
>> 
> -- 
> *Brian Towles* | Software Engineer
> t. (512) 415- <0000000000>8105 e. [email protected] <[email protected]>
> cloudera.com <http://www.cloudera.com/>
> 
> [image: Cloudera] <http://www.cloudera.com/>
> 
> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
> on LinkedIn] <https://www.linkedin.com/company/cloudera>
> ------------------------------

Reply via email to