Thanks Steve. Liked the slides on Kerberos; I have enough scars from Kerberos from trying to integrate it with Pig, MapReduce, Hive JDBC, HCatalog, Spark, etc. I am still having trouble making impersonation work for HCatalog. I might send you an offline email to ask for some pointers.
Thanks for the ticket.

Chester

On Thu, Oct 22, 2015 at 1:15 PM, Steve Loughran <ste...@hortonworks.com> wrote:
>
> On 22 Oct 2015, at 19:32, Chester Chen <ches...@alpinenow.com> wrote:
>
> Steven
>    You summarized it mostly correctly, but there are a couple of points I want to emphasize.
>
>    Not every cluster has the Hive service enabled, so the YARN client shouldn't try to get the Hive delegation token just because security mode is enabled.
>
> I agree, but it shouldn't be failing with a stack trace. Log, yes; fail, no.
>
>    The YARN client code can check whether the service is enabled (for example, by checking that the Hive metastore URI or other hive-site.xml elements are present). If the Hive service is not enabled, then we don't need to get the Hive delegation token, and hence we don't hit the exception.
>
>    If we still try to get the Hive delegation token regardless of whether the Hive service is enabled (as the current code does), then the code should still launch the YARN container and the Spark job, since the user could simply be running a job against HDFS without accessing Hive. Of course, accessing Hive would then fail.
>
> That's exactly what should be happening: the token is only needed if the code tries to talk to Hive. The problem is that the YARN client doesn't know whether that's the case, so it tries every time. It shouldn't be failing, though.
>
> I've created an issue to cover this; I'll see what reflection it takes. I'll also pull the code out into a method that can be tested standalone: we shouldn't have to wait for a run in UGI.isSecure() mode.
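The check Chester proposes above can be sketched roughly as below. This is illustrative only, not Spark's actual code: in the real YARN client the settings would come from a Hadoop `Configuration` loaded from hive-site.xml, and a plain `Map` stands in for it here; the class and method names are hypothetical.

```java
import java.util.Map;

// Hypothetical helper: only attempt to fetch a Hive delegation token when
// hive-site.xml actually points at a metastore. "hive.metastore.uris" is
// the real Hive config key; everything else here is illustrative.
public class HiveTokenHelper {
    public static boolean hiveConfigured(Map<String, String> hiveSite) {
        String uris = hiveSite.get("hive.metastore.uris");
        return uris != null && !uris.trim().isEmpty();
    }
}
```

With a check like this, the YARN client could skip token acquisition entirely on clusters without a Hive service, avoiding the stack trace while leaving secure Hive access unaffected where it is configured.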
>
> https://issues.apache.org/jira/browse/SPARK-11265
>
> Meanwhile, for the curious, these slides include an animation of what goes on when a YARN app is launched in a secure cluster, to help explain why things seem a bit complicated:
>
> http://people.apache.org/~stevel/kerberos/2015-09-kerberos-the-madness.pptx
>
>    The third point is that I'm not sure why org.spark-project.hive's hive-exec and org.apache.hadoop.hive's hive-exec behave differently for the same method.
>
>    Chester
>
> On Thu, Oct 22, 2015 at 10:18 AM, Charmee Patel <charm...@gmail.com> wrote:
>
>> A similar issue occurs when interacting with Hive secured by Sentry:
>> https://issues.apache.org/jira/browse/SPARK-9042
>>
>> By changing how the HiveContext instance is created, that issue might also be resolved.
>>
>> On Thu, Oct 22, 2015 at 11:33 AM Steve Loughran <ste...@hortonworks.com> wrote:
>>
>>> On 22 Oct 2015, at 08:25, Chester Chen <ches...@alpinenow.com> wrote:
>>>
>>> Doug,
>>>
>>> We are not trying to compile against a different version of Hive. The 1.2.1.spark hive-exec is specified in the Spark 1.5.2 POM file. We are moving from Spark 1.3.1 to 1.5.1 and are simply trying to supply the needed dependency. The rest of the application (besides Spark) simply uses Hive 0.13.1.
>>>
>>> Yes, we are using the YARN client directly; many functions we need and have modified are not provided by the YARN client. The Spark launcher in its current form does not satisfy our requirements (at least the last time I looked at it); there is a discussion thread about this from several months ago.
>>>
>>> From Spark 1.x to 1.3.1 we forked the YARN client to achieve these goals (YARN listener callbacks, killApplications, YARN capacity callbacks, etc.). In the current integration for 1.5.1, to avoid forking Spark, we simply subclass the YARN client and override a few methods. But we lose the resource capacity callback and estimation by doing this.
>>>
>>> This is a bit off the original topic.
>>>
>>> I still think there is a bug related to the Spark YARN client in the case of Kerberos + the spark hive-exec dependency.
>>>
>>> Chester
>>>
>>> I think I understand what's being implied here:
>>>
>>> 1. In a secure cluster, a Spark app needs a Hive delegation token to talk to Hive.
>>> 2. The Spark YARN client (org.apache.spark.deploy.yarn.Client) uses reflection to get the delegation token.
>>> 3. The reflection doesn't work, and a ClassNotFoundException is logged.
>>> 4. The app should still launch, but it will be without a Hive token, so attempting to work with Hive will fail.
>>>
>>> I haven't seen this because, while I do test runs against a Kerberos cluster, I wasn't talking to Hive from the deployed app.
>>>
>>> It sounds like this workaround works because the Hive RPC protocol is compatible enough with 0.13 that a 0.13 client can ask Hive for the token, though then your remote classpath is stuck on 0.13.
>>>
>>> Looking at the Hive class, the metastore has now made the Hive constructor private and gone to a factory method (public static Hive get(HiveConf c) throws HiveException) to get an instance. The reflection code would need to be updated.
>>>
>>> I'll file a bug with my name next to it.
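The reflection Steve describes, updated for the factory method and made non-fatal, could look roughly like the sketch below. This is a guess at the shape of a fix, not the actual change tracked by SPARK-11265: the Hive class and `Hive.get(HiveConf)` signature are real (quoted in the thread), but the surrounding method is hypothetical. The key behavior is the "log, yes; fail, no" handling: a missing Hive class is logged and the launch continues without a token.

```java
import java.lang.reflect.Method;

// Hypothetical sketch of a tolerant reflective token fetch. Uses the real
// class names org.apache.hadoop.hive.ql.metadata.Hive and
// org.apache.hadoop.hive.conf.HiveConf; everything else is illustrative.
public class HiveTokenFetcher {
    /**
     * Returns a Hive instance obtained via the static factory
     * Hive.get(HiveConf) (the constructor is now private), or null when the
     * Hive classes are not on the classpath, e.g. a cluster with no Hive
     * service. Never propagates ClassNotFoundException to the launcher.
     */
    public static Object tryGetHive() {
        try {
            Class<?> hiveClass = Class.forName("org.apache.hadoop.hive.ql.metadata.Hive");
            Class<?> confClass = Class.forName("org.apache.hadoop.hive.conf.HiveConf");
            Method factory = hiveClass.getMethod("get", confClass);
            Object hiveConf = confClass.getDeclaredConstructor().newInstance();
            return factory.invoke(null, hiveConf);
        } catch (ClassNotFoundException e) {
            // Hive not on the classpath: log and launch without a Hive token.
            System.out.println("Hive classes not found; continuing without a Hive delegation token");
            return null;
        } catch (ReflectiveOperationException e) {
            // Factory shape differs between hive-exec versions: also non-fatal.
            System.out.println("Could not invoke Hive.get reflectively: " + e);
            return null;
        }
    }
}
```

On a classpath without hive-exec, `tryGetHive()` takes the ClassNotFoundException branch and returns null, which is exactly the "app still launches, Hive access later fails" behavior point 4 above calls for.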