[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters

2022-11-10 Thread luoyuxia (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631505#comment-17631505
 ] 

luoyuxia commented on FLINK-27191:
--

[~straw] Sorry for the late reply; I have been busy with other work.

I think you're right. After some investigation, I think we may need different 
processes for different Kerberos-enabled clusters.

> Support multi kerberos-enabled Hive clusters 
> -
>
> Key: FLINK-27191
> URL: https://issues.apache.org/jira/browse/FLINK-27191
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Reporter: luoyuxia
>Priority: Major
> Fix For: 1.17.0
>
>
> Currently, to access a Kerberos-enabled Hive cluster, users are expected to 
> add the key/secret in flink-conf. But this allows access to only one Hive 
> cluster per Flink cluster; we should also support multiple Kerberos-enabled 
> Hive clusters in one Flink cluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters

2022-05-12 Thread Yuan Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536404#comment-17536404
 ] 

Yuan Zhu commented on FLINK-27191:
--

I'm not so familiar with the Hadoop proxy-user scheme. As I understand it, a 
ProxyUser cannot be used to impersonate a user across different KDCs (Key 
Distribution Centers) or Kerberos-enabled clusters, can it?



[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters

2022-04-27 Thread luoyuxia (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529156#comment-17529156
 ] 

luoyuxia commented on FLINK-27191:
--

[~straw] Very sorry for the late response; I have been busy with some other 
things. From this issue, it does seem to be a problem if we use doAs to switch 
users. After a brief investigation, it seems there's no better way. Can the 
ProxyUser proposed in that issue solve your problem?

> Support multi kerberos-enabled Hive clusters 
> -
>
> Key: FLINK-27191
> URL: https://issues.apache.org/jira/browse/FLINK-27191
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Reporter: luoyuxia
>Priority: Major
> Fix For: 1.16.0
>
>
> Currently, to access a Kerberos-enabled Hive cluster, users are expected to 
> add the key/secret in flink-conf. But this allows access to only one Hive 
> cluster per Flink cluster; we should also support multiple Kerberos-enabled 
> Hive clusters in one Flink cluster.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters

2022-04-20 Thread Yuan Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525021#comment-17525021
 ] 

Yuan Zhu commented on FLINK-27191:
--

[~luoyuxia] Your analysis is right. I missed it.

But there may be another problem, which we have encountered in our internal 
environment. You can check this issue. 



[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters

2022-04-20 Thread Yuan Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525019#comment-17525019
 ] 

Yuan Zhu commented on FLINK-27191:
--

[~luoyuxia] I checked it again and your analysis is right. 



[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters

2022-04-13 Thread luoyuxia (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522051#comment-17522051
 ] 

luoyuxia commented on FLINK-27191:
--

[~straw] Yes, you are right. I didn't notice that. But I wonder what problem 
it may cause. If we run with a Kerberos-disabled cluster, 
[loginUser.spawnAutoRenewalThreadForUserCreds()|https://github.com/apache/hadoop/blob/1b5c6b3a3b90c6e396e00e991b49d170eb2dac55/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L748]
will be invoked, but the renewal thread is only started when both checks below 
pass:
{code:java}
if (isSecurityEnabled()) {
  // spawn thread only if we have kerb credentials
  if (user.getAuthenticationMethod() == AuthenticationMethod.KERBEROS
      && !isKeytab) {
    Thread t = new Thread(/* renewal task elided */);
    t.start();
  }
} {code}
IMO, if it's a Kerberos-disabled cluster, it won't spawn the renewal thread.

I may have missed something; please correct me if so.



[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters

2022-04-13 Thread Yuan Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522028#comment-17522028
 ] 

Yuan Zhu commented on FLINK-27191:
--

There are some static fields in UGI, such as 
[authenticationMethod|https://github.com/apache/hadoop/blob/1b5c6b3a3b90c6e396e00e991b49d170eb2dac55/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L210].
So the race still exists even though they are different objects.
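
To make the point about static state concrete, here is a minimal, self-contained sketch. `FakeUgi`, `AuthMethod` and `StaticStateSketch` are hypothetical stand-ins written for illustration only; they are not Hadoop classes and only mirror the shape described above: a class-level `authenticationMethod` next to per-instance user data.

```java
public class StaticStateSketch {
    enum AuthMethod { SIMPLE, KERBEROS }

    /** Hypothetical stand-in for a UGI-like class; NOT the Hadoop API. */
    static class FakeUgi {
        // Class-level state: one value shared by every instance in the JVM.
        private static volatile AuthMethod authenticationMethod = AuthMethod.SIMPLE;

        // Per-instance state: differs between objects.
        private final String principal;

        FakeUgi(String principal) {
            this.principal = principal;
        }

        static void setConfiguration(AuthMethod method) {
            authenticationMethod = method;
        }

        static boolean isSecurityEnabled() {
            return authenticationMethod != AuthMethod.SIMPLE;
        }

        String getPrincipal() {
            return principal;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // The "TM" user is created while security is disabled.
        FakeUgi tmUser = new FakeUgi("tm-user");
        System.out.println("before: " + FakeUgi.isSecurityEnabled()); // before: false

        // Another thread configures Kerberos and creates its own, separate object.
        Thread hiveThread = new Thread(() -> {
            FakeUgi.setConfiguration(AuthMethod.KERBEROS);
            new FakeUgi("hive-user");
        });
        hiveThread.start();
        hiveThread.join();

        // The TM's object is untouched, but the static check now answers differently.
        System.out.println("after: " + FakeUgi.isSecurityEnabled()); // after: true
        System.out.println(tmUser.getPrincipal()); // tm-user
    }
}
```

In this sketch the two `FakeUgi` objects never share instance data, yet any code path that consults the static check sees the value written by the other thread, which is the kind of interference discussed here.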



[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters

2022-04-13 Thread luoyuxia (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522025#comment-17522025
 ] 

luoyuxia commented on FLINK-27191:
--

[~straw] Thanks for your detailed explanation. IIUC, you mean the thread 
wrapped for the new principal to access Hive will modify 
[authenticationMethod|https://github.com/apache/hadoop/blob/1b5c6b3a3b90c6e396e00e991b49d170eb2dac55/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L310],
which is also checked by the autoRenewalThread?

From the UGI code, the wrapper will use the method 
[loginUserFromKeytabAndReturnUGI|https://github.com/apache/hadoop/blob/1b5c6b3a3b90c6e396e00e991b49d170eb2dac55/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1040],
which seems to create a new 
[UGI|https://github.com/apache/hadoop/blob/1b5c6b3a3b90c6e396e00e991b49d170eb2dac55/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1063]
and set 
[authenticationMethod|https://github.com/apache/hadoop/blob/1b5c6b3a3b90c6e396e00e991b49d170eb2dac55/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L310]
for the new UGI. They actually modify different objects, so I think there's no 
race.



[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters

2022-04-13 Thread Yuan Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521544#comment-17521544
 ] 

Yuan Zhu commented on FLINK-27191:
--

Suppose a user runs a Kerberos-disabled cluster and accesses a Kerberos-enabled 
Hive.

The TM then logs in via HadoopModule#install -> [invoke 
loginUserFromSubject|https://github.com/apache/flink/blob/ba027b6b1a956b425ff14ed8b55c6aeef3e565c8/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L114]
 -> UGI#[loginUserFromSubject|https://github.com/apache/hadoop/blob/1b5c6b3a3b90c6e396e00e991b49d170eb2dac55/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L717].

Here loginUser.spawnAutoRenewalThreadForUserCreds() will spawn a renewal thread 
if 
[authenticationMethod|https://github.com/apache/hadoop/blob/1b5c6b3a3b90c6e396e00e991b49d170eb2dac55/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L310]
has been modified by the thread accessing Hive, because UGI needs 
setConfiguration to be called before login.



[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters

2022-04-12 Thread luoyuxia (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521415#comment-17521415
 ] 

luoyuxia commented on FLINK-27191:
--

What do you mean by the concurrency? IIUC, when accessing Hive, the 
corresponding thread's context will switch to the other principal, while the 
other threads of the TM will still use the TM's own principal.



[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters

2022-04-12 Thread Yuan Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521408#comment-17521408
 ] 

Yuan Zhu commented on FLINK-27191:
--

Then how do we deal with concurrent invocations of UGI from the TM and the 
wrappers?



[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters

2022-04-12 Thread luoyuxia (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521407#comment-17521407
 ] 

luoyuxia commented on FLINK-27191:
--

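About the conflict with the TM's configuration, the idea is that every call 
involving HDFS or the Hive metastore switches the user, using code like the 
following:

{code:java}
import java.security.PrivilegedAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

final Configuration conf = new Configuration();
// Log in from the keytab and get a separate UGI for this principal.
UserGroupInformation ugi =
    UserGroupInformation.loginUserFromKeytabAndReturnUGI(userCode, keytabPath);
ugi.doAs(new PrivilegedAction<Object>() {
    @Override
    public Object run() {
        // access hdfs/hive metastore
        return null;
    }
});{code}

So that's why we may need a wrapper.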



[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters

2022-04-12 Thread Yuan Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521189#comment-17521189
 ] 

Yuan Zhu commented on FLINK-27191:
--

I'm very happy to contribute.



[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters

2022-04-12 Thread Yuan Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521188#comment-17521188
 ] 

Yuan Zhu commented on FLINK-27191:
--

But when the job starts running, the Hive source needs to connect to 
hdfs/hiveMetaStore with a specific principal. IMHO, the TM installs 
HadoopModule when it starts. If we try to connect to hdfs/hiveMetaStore with 
another principal, the configuration will conflict with the TM's. How can we 
avoid that?



[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters

2022-04-12 Thread luoyuxia (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520962#comment-17520962
 ] 

luoyuxia commented on FLINK-27191:
--

[~straw] Thanks for your attention. The basic idea is to wrap the Hive Catalog, 
so that every call involving the Hive Catalog can log in as a specific 
principal.

BTW, you are welcome to contribute to this ticket in any way, including 
discussing the solution to make sure it solves your problem, code review, or 
code contribution. 
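
As a rough illustration of the wrapping idea, the sketch below routes every call on a catalog-like interface through a "run as principal" hook using a JDK dynamic proxy. Everything in it is an assumption made for illustration: `Catalog`, `RunAs`, `CURRENT_PRINCIPAL` and `CatalogWrapperSketch` are hypothetical names, and the thread-local principal only simulates what a real wrapper would presumably delegate to a per-principal UGI's doAs. It is not the Flink or Hive API and not the implementation proposed for this ticket.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;

public class CatalogWrapperSketch {
    /** Hypothetical, tiny stand-in for a catalog interface. */
    interface Catalog {
        List<String> listTables(String database);
    }

    /** Hypothetical hook: run an action in the context of some principal. */
    interface RunAs {
        Object call(Callable<Object> action) throws Exception;
    }

    // Simulated "current principal" of the calling thread (assumption for the demo).
    static final ThreadLocal<String> CURRENT_PRINCIPAL =
            ThreadLocal.withInitial(() -> "tm-user");

    /** Switches the simulated principal for the duration of one action. */
    static RunAs runAs(String principal) {
        return action -> {
            String previous = CURRENT_PRINCIPAL.get();
            CURRENT_PRINCIPAL.set(principal);
            try {
                return action.call();
            } finally {
                CURRENT_PRINCIPAL.set(previous); // always restore the caller's context
            }
        };
    }

    /** Wraps a catalog so that every method call goes through the RunAs hook. */
    static Catalog wrap(Catalog delegate, RunAs runAs) {
        return (Catalog) Proxy.newProxyInstance(
                Catalog.class.getClassLoader(),
                new Class<?>[] {Catalog.class},
                (proxy, method, args) -> {
                    try {
                        return runAs.call(() -> method.invoke(delegate, args));
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // surface the delegate's own exception
                    }
                });
    }

    public static void main(String[] args) {
        Catalog raw = db -> Collections.singletonList(db + ".t1@" + CURRENT_PRINCIPAL.get());
        Catalog wrapped = wrap(raw, runAs("hive-user-a"));

        System.out.println(raw.listTables("db"));     // [db.t1@tm-user]
        System.out.println(wrapped.listTables("db")); // [db.t1@hive-user-a]
        System.out.println(CURRENT_PRINCIPAL.get());  // tm-user
    }
}
```

The proxy keeps the principal switch in one place, so individual catalog methods do not each need to repeat the login logic. Note that a thread-local switch like this does not address any process-wide static state in the underlying security library.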



[jira] [Commented] (FLINK-27191) Support multi kerberos-enabled Hive clusters

2022-04-11 Thread Yuan Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520931#comment-17520931
 ] 

Yuan Zhu commented on FLINK-27191:
--

Hi, [~luoyuxia]. Do you have any idea how to deal with it? I have been 
struggling with the singleton UserGroupInformation for a long time.
