[ 
https://issues.apache.org/jira/browse/GOBBLIN-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Sen updated GOBBLIN-1308:
-----------------------------
    Description: 
Gobblin's Hadoop token/key management:
 Problem: Gobblin only maintains local cluster tokens when key management is 
enabled, and it has no capability to manage tokens for remote Hadoop 
clusters. (Based on my conversations with many folks here, the token files can 
be made available externally, but that would require an external system 
running on cron or something similar.)

Solution: add remote cluster token management to Gobblin, so that a remote 
cluster's keys can be managed the same way it manages the local cluster's keys.

 

The config would look like the following.

(This also changes the enable.key.management config to key.management.enabled.)

 
{code:java}
gobblin.hadoop.key.management {
 enabled = true
 remote.clusters = [ ${gobblin_sync_systems.hadoop_cluster1}, 
${gobblin_sync_systems.hadoop_cluster2} ]
}

// These Gobblin platform configurations can be moved to a database for other 
// use cases, but this layout helps keep the platform modular for each connector.
gobblin_sync_systems {
 hadoop_cluster1 {
 // If hadoop_config_path is specified, the FileSystem will be created from 
 // all the XML configs provided there, which contain all the required info.
 hadoop_config_path = "file:///etc/hadoop_cluster1/hadoop/config"
 // If hadoop_config_path is not specified, you can still list the specific 
 // nodes for each type of token.
 namenode_uri = ["hdfs://nn1.hadoop_cluster1.example.com:8020", 
"hdfs://nn2.hadoop_cluster1.example.com:8020"]
 kms_nodes = [ "kms1.hadoop_cluster1.example.com:9292", 
"kms2.hadoop_cluster1.example.com:9292" ]
 }
 hadoop_cluster2 {
 hadoop_config_path = "file:///etc/hadoop_cluster1/hadoop/config"
 namenode_uri = ["hdfs://nn1.hadoop_cluster2.example.com:8020", 
"hdfs://nn2.hadoop_cluster2.example.com:8020"]
 kms_nodes = [ "kms1.hadoop_cluster2.example.com:9292", 
"kms2.hadoop_cluster2.example.com:9292" ]
 }
}{code}
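
To make the intent concrete, here is a minimal, hypothetical sketch of how a token fetcher could consume this layout: parse the HOCON above, build a Hadoop Configuration per remote cluster, pull HDFS delegation tokens from each listed namenode, and write them all into one token file. The class name, the "gobblin" renewer, the specific *-site.xml file names, and the command-line arguments are placeholders for illustration, not existing Gobblin code.
{code:java}
import java.io.File;
import java.net.URI;

import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;

// Hypothetical sketch only; not an existing Gobblin class.
public class RemoteClusterTokenFetcher {

  public static void main(String[] args) throws Exception {
    // Load the HOCON shown above; resolve() expands the
    // ${gobblin_sync_systems.*} substitutions in remote.clusters.
    Config root = ConfigFactory.parseFile(new File(args[0])).resolve();

    Credentials creds = new Credentials();
    for (Config cluster : root.getConfigList("gobblin.hadoop.key.management.remote.clusters")) {
      Configuration hadoopConf = new Configuration();
      if (cluster.hasPath("hadoop_config_path")) {
        // Point the Configuration at the remote cluster's *-site.xml files.
        String base = cluster.getString("hadoop_config_path");
        hadoopConf.addResource(new Path(base + "/core-site.xml"));
        hadoopConf.addResource(new Path(base + "/hdfs-site.xml"));
      }
      for (String nn : cluster.getStringList("namenode_uri")) {
        // Fetch HDFS delegation tokens from each remote namenode into a
        // single Credentials object; "gobblin" is a placeholder renewer.
        FileSystem fs = FileSystem.get(URI.create(nn), hadoopConf);
        fs.addDelegationTokens("gobblin", creds);
      }
    }

    // Persist all collected tokens in one token file.
    creds.writeTokenStorageFile(new Path(args[1]), new Configuration());
  }
}
{code}
KMS delegation tokens for the kms_nodes entries could be gathered into the same Credentials object in that loop, so downstream jobs would keep reading a single token file per run, just as they do for the local cluster today.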

  was:
Gobblin's Hadoop token/key management:
 Problem: Gobblin only maintains local cluster tokens when key management is 
enabled, and it has no capability to manage tokens for remote Hadoop 
clusters. (Based on my conversations with many folks here, the token files can 
be made available externally, but that would require an external system 
running on cron or something similar.)

Solution: add remote cluster token management to Gobblin, so that a remote 
cluster's keys can be managed the same way it manages the local cluster's keys.

 

The config would look like the following.

(This also changes the enable.key.management config to key.management.enabled.)

 
{code:java}
gobblin.yarn.key.management {
 enabled = true
 remote.clusters = [ ${gobblin_sync_systems.hadoop_cluster1}, 
${gobblin_sync_systems.hadoop_cluster2} ]
}

// These Gobblin platform configurations can be moved to a database for other 
// use cases, but this layout helps keep the platform modular for each connector.
gobblin_sync_systems {
 hadoop_cluster1 {
 // If hadoop_config_path is specified, the FileSystem will be created from 
 // all the XML configs provided there, which contain all the required info.
 hadoop_config_path = "file:///etc/hadoop_cluster1/hadoop/config"
 // If hadoop_config_path is not specified, you can still list the specific 
 // nodes for each type of token.
 namenode_uri = ["hdfs://nn1.hadoop_cluster1.example.com:8020", 
"hdfs://nn2.hadoop_cluster1.example.com:8020"]
 kms_nodes = [ "kms1.hadoop_cluster1.example.com:9292", 
"kms2.hadoop_cluster1.example.com:9292" ]
 }
 hadoop_cluster2 {
 hadoop_config_path = "file:///etc/hadoop_cluster1/hadoop/config"
 namenode_uri = ["hdfs://nn1.hadoop_cluster2.example.com:8020", 
"hdfs://nn2.hadoop_cluster2.example.com:8020"]
 kms_nodes = [ "kms1.hadoop_cluster2.example.com:9292", 
"kms2.hadoop_cluster2.example.com:9292" ]
 }
}{code}


> Gobblin's kerberos token management for remote clusters
> -------------------------------------------------------
>
>                 Key: GOBBLIN-1308
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1308
>             Project: Apache Gobblin
>          Issue Type: Improvement
>    Affects Versions: 0.15.0
>            Reporter: Jay Sen
>            Priority: Major
>             Fix For: 0.16.0
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
