Thanks Steve,
       I like the slides on Kerberos; I have enough scars from Kerberos
from trying to integrate it with Pig, MapRed, Hive JDBC, HCatalog,
and Spark, etc.  I am still having trouble making impersonation work for
HCatalog.  I might send you an offline email to ask for some pointers.

      Thanks for the ticket.

Chester


On Thu, Oct 22, 2015 at 1:15 PM, Steve Loughran <ste...@hortonworks.com>
wrote:

>
> On 22 Oct 2015, at 19:32, Chester Chen <ches...@alpinenow.com> wrote:
>
> Steven
>       You summarized it mostly correctly, but there are a couple of points
> I want to emphasize.
>
>      Not every cluster has the Hive service enabled, so the YARN client
> shouldn't try to get the Hive delegation token just because security mode
> is enabled.
>
>
> I agree, but it shouldn't be failing with a stack trace. Log, yes; fail,
> no.
>
>
>      The YARN client code can check whether the service is enabled
> (possibly by checking whether the Hive metastore URI or other hive-site.xml
> elements are present). If the Hive service is not enabled, then we don't
> need to get the Hive delegation token, and hence we avoid the exception.
>
>      If we still try to get the Hive delegation token regardless of whether
> the Hive service is enabled (like the current code is doing now), then the
> code should still launch the YARN container and Spark job, as the user
> could simply be running a job against HDFS, not accessing Hive.  Of course,
> accessing Hive will then fail.
>
>
> That's exactly what should be happening: the token is only needed if the
> code tries to talk to Hive. The problem is the YARN client doesn't know
> whether that's the case, so it tries every time. It shouldn't be failing
> though.
>
> Created an issue to cover this; I'll see what reflection it takes. I'll
> also pull the code out into a method that can be tested standalone: we
> shouldn't have to wait for a run in UGI.isSecure() mode.
>
> https://issues.apache.org/jira/browse/SPARK-11265
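>
> (For illustration, the standalone shape might look something like this --
> a sketch, not the actual patch; the class name is the real Hive one, the
> object and method names are made up, and the point is that the
> ClassNotFoundException is logged and tolerated rather than fatal:)
>
>     import org.slf4j.LoggerFactory
>
>     object HiveClassCheck {
>       private val log = LoggerFactory.getLogger(getClass)
>
>       // Returns the Hive client class if it's on the classpath; a missing
>       // class is logged and tolerated rather than failing the launch.
>       // Unit-testable with no secure cluster at all.
>       def loadHiveClass(): Option[Class[_]] =
>         try Some(Class.forName("org.apache.hadoop.hive.ql.metadata.Hive"))
>         catch {
>           case e: ClassNotFoundException =>
>             log.info("Hive classes not found; continuing without a Hive token", e)
>             None
>         }
>     }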
>
>
> Meanwhile, for the curious, these slides include an animation of what goes
> on when a YARN app is launched in a secure cluster, to help explain why
> things seem a bit complicated
>
> http://people.apache.org/~stevel/kerberos/2015-09-kerberos-the-madness.pptx
>
>      The third point is that I'm not sure why org.spark-project.hive's
> hive-exec and org.apache.hadoop.hive's hive-exec behave differently for the
> same method.
>
> Chester
>
> On Thu, Oct 22, 2015 at 10:18 AM, Charmee Patel <charm...@gmail.com>
> wrote:
>
>> A similar issue occurs when interacting with Hive secured by Sentry.
>> https://issues.apache.org/jira/browse/SPARK-9042
>>
>> By changing how the HiveContext instance is created, this issue might also
>> be resolved.
>>
>> On Thu, Oct 22, 2015 at 11:33 AM Steve Loughran <ste...@hortonworks.com>
>> wrote:
>>
>>> On 22 Oct 2015, at 08:25, Chester Chen <ches...@alpinenow.com> wrote:
>>>
>>> Doug
>>>
>>>    We are not trying to compile against a different version of Hive. The
>>> 1.2.1.spark hive-exec is specified in the Spark 1.5.2 POM file. We are
>>> moving from Spark 1.3.1 to 1.5.1 and simply trying to supply the needed
>>> dependency. The rest of the application (besides Spark) uses Hive 0.13.1.
>>>
>>>    Yes, we are using the YARN client directly; many functions we need
>>> (and have modified) are not provided in the YARN client. The Spark
>>> launcher in its current form does not satisfy our requirements (at least
>>> the last time I looked); there is a discussion thread from several months
>>> ago.
>>>
>>>     From Spark 1.x to 1.3.1, we forked the YARN client to achieve these
>>> goals (YARN listener callbacks, killApplications, YARN capacity callbacks,
>>> etc.). In the current integration for 1.5.1, to avoid forking Spark, we
>>> simply subclass the YARN client and override a few methods, but we lost
>>> the resource capacity callback and estimation by doing this.
>>>
>>>    This is a bit off the original topic.
>>>
>>>     I still think there is a bug related to the Spark YARN client in the
>>> case of Kerberos + the Spark hive-exec dependency.
>>>
>>> Chester
>>>
>>>
>>> I think I understand what's being implied here.
>>>
>>>
>>>    1. In a secure cluster, a Spark app needs a Hive delegation token
>>>     to talk to Hive.
>>>    2. The Spark YARN client (org.apache.spark.deploy.yarn.Client) uses
>>>    reflection to get the delegation token.
>>>    3. The reflection doesn't work, and a ClassNotFoundException is logged.
>>>    4. The app should still launch, but it'll be without a Hive token,
>>>    so attempting to work with Hive will fail.
>>>
>>> I haven't seen this, because while I do test runs against a Kerberos
>>> cluster, I wasn't talking to Hive from the deployed app.
>>>
>>>
>>> It sounds like this workaround works because the Hive RPC protocol is
>>> compatible enough with 0.13 that a 0.13 client can ask Hive for the token,
>>> though then your remote classpath is stuck on 0.13.
>>>
>>> Looking at the Hive class, the metastore has now made the Hive
>>> constructor private and gone to a factory method (public static Hive
>>> get(HiveConf c) throws HiveException) to get an instance. The reflection
>>> code would need to be updated.
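>>>
>>> (Roughly, the reflection would go from invoking the constructor to
>>> invoking the factory method -- a sketch only; the class and method names
>>> are from the Hive source quoted above, the rest is illustrative:)
>>>
>>>     val hiveConfClass = Class.forName("org.apache.hadoop.hive.conf.HiveConf")
>>>     val hiveClass = Class.forName("org.apache.hadoop.hive.ql.metadata.Hive")
>>>     // HiveConf still has a public no-arg constructor:
>>>     val hiveConf =
>>>       hiveConfClass.getConstructor().newInstance().asInstanceOf[Object]
>>>     // Hive.get(HiveConf) replaces the now-private constructor; null
>>>     // receiver because it's a static method:
>>>     val hive = hiveClass.getMethod("get", hiveConfClass).invoke(null, hiveConf)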
>>>
>>> I'll file a bug with my name next to it.
>>>
