[ https://issues.apache.org/jira/browse/GOBBLIN-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jay Sen updated GOBBLIN-1308:
-----------------------------
Description:

Gobblin's Hadoop token/key management:

Problem: When key management is enabled, Gobblin only maintains tokens for the local cluster; it has no capability to manage tokens for remote Hadoop clusters. (Based on my conversations with several folks here, the token files can be made available externally, but that would require an external system running on cron or something similar.)

Solution: Add remote-cluster token management to Gobblin, so that remote clusters' keys can be managed the same way it manages the local cluster's keys.

The config looks like the following (this also renames the enable.key.management config to key.management.enabled):

{code:java}
gobblin.hadoop.key.management {
  enabled = true
  remote.clusters = [ ${gobblin_sync_systems.hadoop_cluster1}, ${gobblin_sync_systems.hadoop_cluster2} ]
}

// These Gobblin platform configurations can be moved to a database for other
// use-cases, but this layout helps keep the platform modular for each connector.
gobblin_sync_systems {
  hadoop_cluster1 {
    // If a Hadoop config path is specified, the FileSystem will be created from
    // all the XML configs provided there, which carry all the required info.
    hadoop_config_path = "file:///etc/hadoop_cluster1/hadoop/config"
    // If no Hadoop config path is specified, you can still list the specific
    // nodes for each specific type of token.
    namenode_uri = ["hdfs://nn1.hadoop_cluster1.example.com:8020", "hdfs://nn2.hadoop_cluster1.example.com:8020"]
    kms_nodes = ["kms1.hadoop_cluster1.example.com:9292", "kms2.hadoop_cluster1.example.com:9292"]
  }
  hadoop_cluster2 {
    hadoop_config_path = "file:///etc/hadoop_cluster2/hadoop/config"
    namenode_uri = ["hdfs://nn1.hadoop_cluster2.example.com:8020", "hdfs://nn2.hadoop_cluster2.example.com:8020"]
    kms_nodes = ["kms1.hadoop_cluster2.example.com:9292", "kms2.hadoop_cluster2.example.com:9292"]
  }
}
{code}
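Since Gobblin configs are HOCON, the layout above resolves with the standard Typesafe Config library: resolve() expands the ${gobblin_sync_systems.*} substitutions into the remote.clusters array. As a quick illustration only (a sketch, not part of the proposal; the application.conf file name and class name are assumptions), the cluster list could be read back like this:

{code:java}
import java.io.File;
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class RemoteClusterConfigReader {
  public static void main(String[] args) {
    // Parse the HOCON file; resolve() expands the ${gobblin_sync_systems.*}
    // substitutions referenced inside remote.clusters.
    Config root = ConfigFactory.parseFile(new File("application.conf")).resolve();

    for (Config cluster : root.getConfigList("gobblin.hadoop.key.management.remote.clusters")) {
      if (cluster.hasPath("hadoop_config_path")) {
        // Preferred: build the remote FileSystem from the cluster's XML config dir.
        System.out.println("config dir: " + cluster.getString("hadoop_config_path"));
      } else {
        // Fallback: use the explicitly listed NameNode and KMS endpoints.
        System.out.println("namenodes: " + cluster.getStringList("namenode_uri"));
        System.out.println("kms nodes: " + cluster.getStringList("kms_nodes"));
      }
    }
  }
}
{code}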
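For the token management itself, here is a minimal sketch of what fetching delegation tokens from a remote cluster could look like using stock Hadoop APIs (FileSystem.addDelegationTokens and Credentials.writeTokenStorageFile). The class name, renewer, and output path are hypothetical, and the sketch assumes the process already holds a Kerberos login valid for the remote realm:

{code:java}
import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;

public class RemoteClusterTokenFetcher {

  // Fetch HDFS delegation tokens for one remote cluster, identified by its
  // hadoop_config_path, and add them to the shared Credentials object.
  static void fetchTokens(String hadoopConfigDir, String renewer, Credentials creds)
      throws Exception {
    Configuration conf = new Configuration();
    // The remote cluster's site files are added last, so they override local
    // defaults (in particular fs.defaultFS now points at the remote NameNode).
    conf.addResource(new Path(new File(hadoopConfigDir, "core-site.xml").toURI()));
    conf.addResource(new Path(new File(hadoopConfigDir, "hdfs-site.xml").toURI()));

    FileSystem remoteFs = FileSystem.get(conf);
    // Ask the remote NameNode(s) for delegation tokens and store them in creds,
    // skipping any tokens that are already present.
    remoteFs.addDelegationTokens(renewer, creds);
  }

  public static void main(String[] args) throws Exception {
    Credentials creds = new Credentials();
    fetchTokens("/etc/hadoop_cluster1/hadoop/config", "gobblin", creds);
    fetchTokens("/etc/hadoop_cluster2/hadoop/config", "gobblin", creds);
    // Persist all collected tokens into one token file that downstream jobs
    // can pick up, e.g. via HADOOP_TOKEN_FILE_LOCATION.
    creds.writeTokenStorageFile(new Path("file:///tmp/gobblin-remote.token"), new Configuration());
  }
}
{code}

KMS delegation tokens would need a similar step against the configured kms_nodes; that part is omitted here since it goes through the KeyProvider API rather than FileSystem.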
hadoop_config_path = "file:///etc/hadoop_cluster1/hadoop/config" // If hadoop config path is not specified, you can still specify the speecific nodes for the specific type of tokens namenode_uri = ["hdfs://nn1.hadoop_cluster1.example.com:8020", "hdfs://nn2.hadoop_cluster1.example.com:8020"] kms_nodes = [ "kms1.hadoop_cluster1.example.com:9292", "kms2.hadoop_cluster1.example.com:9292" ] } hadoop_cluster2 { hadoop_config_path = "file:///etc/hadoop_cluster1/hadoop/config" namenode_uri = ["hdfs://nn1.hadoop_cluster2.example.com:8020", "hdfs://nn2.hadoop_cluster2.example.com:8020"] kms_nodes = [ "kms1.hadoop_cluster2.example.com:9292", "kms2.hadoop_cluster2.example.com:9292" ] } }{code} > Gobblin's kerberos token management for remote clusters > ------------------------------------------------------- > > Key: GOBBLIN-1308 > URL: https://issues.apache.org/jira/browse/GOBBLIN-1308 > Project: Apache Gobblin > Issue Type: Improvement > Affects Versions: 0.15.0 > Reporter: Jay Sen > Priority: Major > Fix For: 0.16.0 > > > Gobblin's hadoop tokens/ key management : > Problem: Gobblin only maintains local cluster tokens when key management is > enabled. and does not have capability to manage tokens for remote hadoop > cluster. ( based on my conversation with many folks here, the token files can > be made available externally. but that would require that external system > running on cron or something ) > Solution: add remote cluster token management in Gobblin. where remote > clusters key can be managed same way it manages the local clusters keys. > > Config looks like following > ( Changes the enable.key.management config to key.management.enabled ) > > {code:java} > gobblin.hadoop.key.management { > enabled = true > remote.clusters = [ ${gobblin_sync_systems.hadoop_cluster1}, > ${gobblin_sync_systems.hadoop_cluster2} ] > } > // These Gobblin platform configurations can be moved to database for other > use-cases, but this layout helps make the platform moduler for each > connectors. > gobblin_sync_systems { > hadoop_cluster1 { > // if Hadoop config path is specified, the FileSystem will be created based > on all the xml config provided here, which has all the required info. > hadoop_config_path = "file:///etc/hadoop_cluster1/hadoop/config" > // If hadoop config path is not specified, you can still specify the > speecific nodes for the specific type of tokens > namenode_uri = ["hdfs://nn1.hadoop_cluster1.example.com:8020", > "hdfs://nn2.hadoop_cluster1.example.com:8020"] > kms_nodes = [ "kms1.hadoop_cluster1.example.com:9292", > "kms2.hadoop_cluster1.example.com:9292" ] > } > hadoop_cluster2 { > hadoop_config_path = "file:///etc/hadoop_cluster1/hadoop/config" > namenode_uri = ["hdfs://nn1.hadoop_cluster2.example.com:8020", > "hdfs://nn2.hadoop_cluster2.example.com:8020"] > kms_nodes = [ "kms1.hadoop_cluster2.example.com:9292", > "kms2.hadoop_cluster2.example.com:9292" ] > } > }{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)