Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

Chester Chen Thu, 22 Oct 2015 11:34:45 -0700

Steven
      Your summary is mostly correct, but there are a couple of points I want to
emphasize.


     Not every cluster has the Hive service enabled, so the YARN Client
shouldn't try to get the Hive delegation token just because security mode
is enabled.

     The YARN Client code can check whether the service is enabled
(possibly by checking that the Hive metastore URI is present, or other
hive-site.xml elements). If the Hive service is not enabled, we don't need
to get the Hive delegation token, and hence we avoid the exception.
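
     A minimal sketch of such a check, assuming the configuration is
available as a plain key/value map. The class and method names are
hypothetical and are not part of Spark's YARN Client. Only the property
name hive.metastore.uris comes from Hive. Treating an empty value as "no
Hive service" is also an assumption, since Hive itself uses an empty value
to mean a local/embedded metastore.

```java
import java.util.Map;

// Hypothetical helper, not part of Spark: decides whether a Hive
// delegation token is worth requesting at all.
class HiveTokenGate {
    // Real Hive property naming the remote metastore Thrift URI(s).
    static final String METASTORE_URIS = "hive.metastore.uris";

    // Assumption: a non-blank metastore URI means the Hive service is in use.
    static boolean hiveServiceConfigured(Map<String, String> conf) {
        String uris = conf.get(METASTORE_URIS);
        return uris != null && !uris.trim().isEmpty();
    }

    // Request the token only when security is on AND Hive is configured.
    static boolean shouldFetchHiveToken(boolean securityEnabled,
                                        Map<String, String> conf) {
        return securityEnabled && hiveServiceConfigured(conf);
    }
}
```

     With this gate, a secure cluster without a metastore URI would skip
the token request entirely instead of failing inside the reflection call.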

     If we still try to get the Hive delegation token regardless of whether
the Hive service is enabled (as the current code does), then the code should
still launch the YARN container and Spark job, as the user could simply be
running a job against HDFS without accessing Hive.  Of course, accessing
Hive will then fail.
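
     The "try, but keep launching" behavior could look roughly like the
sketch below. The class and method names are hypothetical, and the fetch
step is passed in as a Callable so the example does not depend on Hive
being on the classpath. It is an illustration of the pattern, not the
actual Client code.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Spark: best-effort token acquisition.
class BestEffortToken {
    // Runs the fetch; on any failure, logs and returns empty so the
    // caller can go on to launch the YARN container without the token.
    static <T> Optional<T> tryObtain(String service, Callable<T> fetch) {
        try {
            return Optional.ofNullable(fetch.call());
        } catch (Exception | LinkageError e) {
            // LinkageError covers NoClassDefFoundError from a missing jar.
            System.err.println("Could not obtain " + service
                + " delegation token, continuing without it: " + e);
            return Optional.empty();
        }
    }
}
```

     A job that only touches HDFS would then run normally, and only code
that actually talks to Hive would fail later for lack of a token.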

     The third point is that I'm not sure why org.spark-project.hive's
hive-exec and org.apache.hadoop.hive's hive-exec behave differently for the
same method.
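
     For reference, the quoted message below mentions that the Hive class
now exposes a static factory, public static Hive get(HiveConf c), instead
of a public constructor. A rough sketch of reflection against that factory
follows. The wrapper class and method are hypothetical, only the two Hive
class names and the get(HiveConf) signature are taken as given, and this
does not claim to be what the Spark Client does.

```java
import java.lang.reflect.Method;
import java.util.Optional;

// Hypothetical probe, not part of Spark: looks up Hive.get(HiveConf)
// reflectively so there is no compile-time dependency on hive-exec.
class HiveFactoryProbe {
    static Optional<Object> loadHive(ClassLoader loader, Object hiveConf) {
        try {
            Class<?> hiveClass =
                loader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive");
            Class<?> confClass =
                loader.loadClass("org.apache.hadoop.hive.conf.HiveConf");
            // Static factory: resolve it with its parameter type, then
            // invoke with a null receiver.
            Method get = hiveClass.getMethod("get", confClass);
            return Optional.ofNullable(get.invoke(null, hiveConf));
        } catch (ReflectiveOperationException | LinkageError
                 | RuntimeException e) {
            System.err.println("Hive not usable via reflection: " + e);
            return Optional.empty();
        }
    }
}
```

     Running the same probe against each hive-exec jar in turn might help
narrow down where the two artifacts differ.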

Chester

On Thu, Oct 22, 2015 at 10:18 AM, Charmee Patel <charm...@gmail.com> wrote:

> A similar issue occurs when interacting with Hive secured by Sentry.
> https://issues.apache.org/jira/browse/SPARK-9042
>
> By changing how Hive Context instance is created, this issue might also be
> resolved.
>
> On Thu, Oct 22, 2015 at 11:33 AM Steve Loughran <ste...@hortonworks.com>
> wrote:
>
>> On 22 Oct 2015, at 08:25, Chester Chen <ches...@alpinenow.com> wrote:
>>
>> Doug
>>
>>    We are not trying to compile against a different version of Hive. The
>> 1.2.1.spark hive-exec is specified in the Spark 1.5.2 POM file. We are moving
>> from Spark 1.3.1 to 1.5.1 and are simply trying to supply the needed
>> dependency. The rest of the application (besides Spark) simply uses Hive 0.13.1.
>>
>>    Yes, we are using the YARN client directly; many of the functions we
>> need, and have modified, are not provided by the YARN client. The Spark
>> launcher in its current form does not satisfy our requirements (at least
>> the last time I looked at it); there was a discussion thread about this
>> several months ago.
>>
>>     From Spark 1.x to 1.3.1, we forked the YARN client to achieve these
>> goals (YARN listener callbacks, killApplications, YARN capacity
>> callbacks, etc.). In the current integration for 1.5.1, to avoid forking
>> Spark, we simply subclass the YARN client and override a few methods. But
>> we lost the resource capacity callback and estimation by doing this.
>>
>>    This is a bit off the original topic.
>>
>>     I still think there is a bug related to the Spark YARN client in the
>> case of Kerberos + the Spark hive-exec dependency.
>>
>> Chester
>>
>>
>> I think I understand what's being implied here.
>>
>>
>>    1. In a secure cluster, a spark app needs a hive delegation token to
>>    talk to hive
>>    2. Spark yarn Client (org.apache.spark.deploy.yarn.Client) uses
>>    reflection to get the delegation token
>>    3. The reflection doesn't work, a CFNE exception is logged
>>    4. The app should still launch, but it'll be without a hive token,
>>    so attempting to work with Hive will fail.
>>
>> I haven't seen this, because while I do test runs against a kerberos
>> cluster, I wasn't talking to hive from the deployed app.
>>
>>
>> It sounds like this workaround works because the hive RPC protocol is
>> compatible enough with 0.13 that a 0.13 client can ask hive for the token,
>> though then your remote CP is stuck on 0.13
>>
>> Looking at the hive class, the metastore has now made the hive
>> constructor private and gone to a factory method (public static Hive
>> get(HiveConf c) throws HiveException) to get an instance. The reflection
>> code would need to be updated.
>>
>> I'll file a bug with my name next to it
>>
>>
>>
>>
