[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters
[ https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631505#comment-17631505 ]

luoyuxia commented on FLINK-27191:
----------------------------------

[~straw] Sorry for the late reply; I have been busy with other work. I think you're right. After some investigation, I think we may need a separate process for each kerberos-enabled cluster.

> Support multi kerberos-enabled Hive clusters
> --------------------------------------------
>
>                 Key: FLINK-27191
>                 URL: https://issues.apache.org/jira/browse/FLINK-27191
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Hive
>            Reporter: luoyuxia
>            Priority: Major
>             Fix For: 1.17.0
>
>
> Currently, to access a kerberos-enabled Hive cluster, users are expected to add
> the key/secret in flink-conf. But this allows access to only one Hive cluster
> per Flink cluster; we are also expected to support multiple kerberos-enabled
> Hive clusters in one Flink cluster.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters
[ https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536404#comment-17536404 ]

Yuan Zhu commented on FLINK-27191:
----------------------------------

I'm quite familiar with the Hadoop proxy-user scheme. As far as I understand, a ProxyUser cannot be used to impersonate a user in a different KDC (Key Distribution Center) or a different kerberos-enabled cluster, can it?
[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters
[ https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529156#comment-17529156 ]

luoyuxia commented on FLINK-27191:
----------------------------------

[~straw] Very sorry for the late response; I have been busy with other things. Judging from this issue, it is indeed a problem if we use doAs to switch users. After a brief investigation, there seems to be no better way. Can the ProxyUser proposed in that issue solve your problem?
[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters
[ https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525021#comment-17525021 ]

Yuan Zhu commented on FLINK-27191:
----------------------------------

[~luoyuxia] Your analysis is right; I missed that. But there may be another problem, one we have encountered in our internal environment. You can check this issue.
[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters
[ https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525019#comment-17525019 ]

Yuan Zhu commented on FLINK-27191:
----------------------------------

[~luoyuxia] I checked it again and your analysis is right.
[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters
[ https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522051#comment-17522051 ]

luoyuxia commented on FLINK-27191:
----------------------------------

[~straw] Yes, you are right. I didn't notice that. But I wonder what problem it may cause. When running with a kerberos-disabled cluster, [loginUser.spawnAutoRenewalThreadForUserCreds()|https://github.com/apache/hadoop/blob/1b5c6b3a3b90c6e396e00e991b49d170eb2dac55/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L748] will be invoked, but it starts the thread only under these conditions:
{code:java}
if (isSecurityEnabled()) {
  // spawn thread only if we have kerb credentials
  if (user.getAuthenticationMethod() == AuthenticationMethod.KERBEROS && !isKeytab) {
    Thread t = new Thread(/* ticket renewal loop, elided here */);
    t.start();
  }
}
{code}
IMO, if it's a kerberos-disabled cluster, it won't spawn the renewal thread. I may be missing something; please correct me if so.
[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters
[ https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522028#comment-17522028 ]

Yuan Zhu commented on FLINK-27191:
----------------------------------

There are some static fields in UGI, such as [authenticationMethod|https://github.com/apache/hadoop/blob/1b5c6b3a3b90c6e396e00e991b49d170eb2dac55/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L210]. So the race still exists even though they are different objects.
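The point about static fields can be illustrated with a toy class. This is only a sketch: `Login`, `AuthMethod`, `setConfiguration` and `StaticStateDemo` below are hypothetical stand-ins written for this example, not the real Hadoop UserGroupInformation code. It shows that a second thread creating its own object can still change what the first thread observes, because the configuration lives in a static field.

```java
// Hypothetical stand-in for a class that keeps process-wide configuration
// in a static field. This is NOT Hadoop's UserGroupInformation.
final class Login {
    enum AuthMethod { SIMPLE, KERBEROS }

    // Static: one value shared by every Login instance in the JVM.
    private static volatile AuthMethod configuredMethod = AuthMethod.SIMPLE;

    private final String principal;

    Login(String principal) {
        this.principal = principal;
    }

    static void setConfiguration(AuthMethod method) {
        configuredMethod = method;
    }

    static boolean isSecurityEnabled() {
        return configuredMethod == AuthMethod.KERBEROS;
    }

    String principal() {
        return principal;
    }
}

final class StaticStateDemo {
    /**
     * Returns what the "TM" side sees from isSecurityEnabled() before and
     * after a different thread configures Kerberos for its own login.
     */
    static boolean[] observe() {
        Login.setConfiguration(Login.AuthMethod.SIMPLE);
        boolean before = Login.isSecurityEnabled();

        // The "Hive" thread builds its own Login object, but it has to
        // touch the shared static configuration first.
        Thread hiveThread = new Thread(() -> {
            Login.setConfiguration(Login.AuthMethod.KERBEROS);
            new Login("hive-user@EXAMPLE");
        });
        hiveThread.start();
        try {
            hiveThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        boolean after = Login.isSecurityEnabled();
        return new boolean[] {before, after};
    }
}
```

In this toy, the two `Login` objects never share instance state, yet the value seen by the first thread flips once the second thread reconfigures the static field. Whether this matters in practice depends on which code paths read the static state, which is what the rest of the thread discusses.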
[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters
[ https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522025#comment-17522025 ]

luoyuxia commented on FLINK-27191:
----------------------------------

[~straw] Thanks for your detailed explanation. IIUC, you mean that the thread wrapped with the new principal to access Hive will modify [authenticationMethod|https://github.com/apache/hadoop/blob/1b5c6b3a3b90c6e396e00e991b49d170eb2dac55/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L310], which is also checked by the autoRenewalThread?

From the UGI code, the wrapper will use the method [loginUserFromKeytabAndReturnUGI|https://github.com/apache/hadoop/blob/1b5c6b3a3b90c6e396e00e991b49d170eb2dac55/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1040], which seems to create a new [UGI|https://github.com/apache/hadoop/blob/1b5c6b3a3b90c6e396e00e991b49d170eb2dac55/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1063] and set [authenticationMethod|https://github.com/apache/hadoop/blob/1b5c6b3a3b90c6e396e00e991b49d170eb2dac55/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L310] on the new UGI. They actually modify different objects, so I think there's no race.
[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters
[ https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521544#comment-17521544 ]

Yuan Zhu commented on FLINK-27191:
----------------------------------

Suppose a user runs a kerberos-disabled cluster and accesses a kerberos-enabled Hive. The TM then logs in via HadoopModule#install -> [invoke loginUserFromSubject|https://github.com/apache/flink/blob/ba027b6b1a956b425ff14ed8b55c6aeef3e565c8/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L114] -> UGI#[loginUserFromSubject|https://github.com/apache/hadoop/blob/1b5c6b3a3b90c6e396e00e991b49d170eb2dac55/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L717]. Here loginUser.spawnAutoRenewalThreadForUserCreds() will spawn a renewal thread if [authenticationMethod|https://github.com/apache/hadoop/blob/1b5c6b3a3b90c6e396e00e991b49d170eb2dac55/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L310] has been modified by the thread accessing Hive, because UGI needs setConfiguration to be called before login.
[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters
[ https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521415#comment-17521415 ]

luoyuxia commented on FLINK-27191:
----------------------------------

What do you mean by the concurrency? IIUC, when accessing Hive, the corresponding thread's context will switch to the other principal, while the other threads of the TM will still use the TM's own principal.
[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters
[ https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521408#comment-17521408 ]

Yuan Zhu commented on FLINK-27191:
----------------------------------

Then how do we deal with concurrent UGI invocations from the TM and the wrappers?
[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters
[ https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521407#comment-17521407 ]

luoyuxia commented on FLINK-27191:
----------------------------------

About the conflict with the TM's configuration: the idea is that every call involving hdfs/hive metastore switches the user, using code like the following:
{code:java}
// Needs org.apache.hadoop.conf.Configuration,
// org.apache.hadoop.security.UserGroupInformation and java.security.PrivilegedAction.
final Configuration conf = new Configuration();
// Log in from the keytab; this returns a new UGI and leaves the current login user untouched.
UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(userCode, keytabPath);
ugi.doAs(new PrivilegedAction<Object>() {
  @Override
  public Object run() {
    // access hdfs/hive metastore
    return null;
  }
});
{code}
So that's why we may need a wrapper.
[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters
[ https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521189#comment-17521189 ]

Yuan Zhu commented on FLINK-27191:
----------------------------------

I'm very happy to contribute.
[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters
[ https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521188#comment-17521188 ]

Yuan Zhu commented on FLINK-27191:
----------------------------------

But when the job starts running, the Hive source needs to connect to hdfs/hiveMetaStore with a specific principal. IMHO, the TM will install HadoopModule when it starts. If we try to connect to hdfs/hiveMetaStore with another principal, the configuration will conflict with the TM's. How do we avoid that?
[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters
[ https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520962#comment-17520962 ]

luoyuxia commented on FLINK-27191:
----------------------------------

[~straw] Thanks for your attention. The basic idea is to wrap the Hive Catalog, so that every call involving the Hive Catalog can log in as a specific principal.

BTW, you're welcome to contribute to this ticket in any way, including discussing the solution to make sure it solves your problem, code review, code contribution, etc.
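The wrapping idea described above could be sketched roughly as follows, using a JDK dynamic proxy so that every method call on the catalog goes through a "run as this principal" hook. Everything named here is an assumption made for illustration: `SimpleCatalog` is a hypothetical one-method interface rather than Flink's Catalog API, and `PrincipalRunner` is a hypothetical hook that a real implementation might back with a UGI's doAs. The sketch does not address the static-state concerns raised elsewhere in this thread.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical, minimal catalog interface; NOT Flink's Catalog API.
interface SimpleCatalog {
    List<String> listTables(String database);
}

// Hypothetical hook that runs an action on behalf of a principal.
// A real implementation might delegate to a per-principal UGI's doAs.
interface PrincipalRunner {
    Object runAs(String principal, Callable<Object> action) throws Exception;
}

final class CatalogWrappers {
    private CatalogWrappers() {}

    /** Returns a catalog whose every call is routed through runner.runAs(principal, ...). */
    static SimpleCatalog wrap(SimpleCatalog delegate, String principal, PrincipalRunner runner) {
        return (SimpleCatalog) Proxy.newProxyInstance(
                SimpleCatalog.class.getClassLoader(),
                new Class<?>[] {SimpleCatalog.class},
                (proxy, method, args) -> runner.runAs(principal, () -> {
                    try {
                        // Forward the call to the real catalog inside the runAs scope.
                        return method.invoke(delegate, args);
                    } catch (InvocationTargetException e) {
                        // Surface the delegate's own exception where possible.
                        Throwable cause = e.getCause();
                        if (cause instanceof Exception) {
                            throw (Exception) cause;
                        }
                        throw e;
                    }
                }));
    }
}
```

A caller would build one wrapped catalog per Hive cluster, each with its own principal, for example `CatalogWrappers.wrap(catalogA, "user-a@REALM_A", runner)`; the principal names and the runner are placeholders.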
[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters
[ https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520931#comment-17520931 ]

Yuan Zhu commented on FLINK-27191:
----------------------------------

Hi, [~luoyuxia]. Do you have any idea how to deal with it? I have been struggling with the singleton UserGroupInformation for a long time.