Zhixiong Chen created GOBBLIN-716:
-------------------------------------

             Summary: Add FileBasedSource lineage event
                 Key: GOBBLIN-716
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-716
             Project: Apache Gobblin
          Issue Type: Bug
            Reporter: Zhixiong Chen
            Assignee: Zhixiong Chen

It'd be useful to support configuration properties that override the default username when connecting to an HDFS cluster, e.g. in the HDFS writers. By default, the system username that owns the Gobblin process is used.

One particular use case is stand-alone Gobblin instances running as the `root` system user within a Docker container. An organization employing a stand-alone Gobblin cluster for data integration across multiple teams may have multiple users submitting jobs meant to touch different parts of the HDFS namespace that are under the control of separate users.

Note that this feature is not really security-relevant: any job configuration file could still specify any username, so no privilege boundaries would be enforced anyway.

One solution that does not appear to work is to specify the `hadoop.job.ugi` property in a job configuration file, despite what this code in [FsDataWriter.java](https://github.com/linkedin/gobblin/blob/7141ec88c255c8c3cbc7054fb8146eebe77fc09d/gobblin-core/src/main/java/gobblin/writer/FsDataWriter.java#L88-L91) appears to suggest:

```java
Configuration conf = new Configuration();
// Add all job configuration properties so they are picked up by Hadoop
JobConfigurationUtils.putStateIntoConfiguration(properties, conf);
this.fs = WriterUtils.getWriterFS(properties, this.numBranches, this.branchId);
```

*Github Url* : https://github.com/linkedin/gobblin/issues/1904
*Github Reporter* : *mgomezch*
*Github Created At* : 2017-05-26T18:58:16Z
*Github Updated At* : 2017-05-26T18:58:16Z
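
To make the request concrete, here is a minimal sketch of the fallback logic being asked for: use a username from the job configuration when one is given, otherwise keep the current default. This is not existing Gobblin code. The class `HdfsUserResolver`, the property key `writer.fs.user.override`, and the username `etl_team_a` are all hypothetical names chosen for illustration, and the default user is approximated here by the JVM's `user.name` system property.

```java
import java.util.Properties;

/**
 * Sketch only: picks the username to use for HDFS access.
 * Nothing here is an existing Gobblin class or setting.
 */
final class HdfsUserResolver {
  /** Hypothetical job property holding the username override. */
  static final String USER_OVERRIDE_KEY = "writer.fs.user.override";

  private HdfsUserResolver() {}

  /**
   * Returns the override from the job properties if it is set and non-blank,
   * otherwise the system user that owns the current process.
   */
  static String resolveUser(Properties jobProps) {
    String override = jobProps.getProperty(USER_OVERRIDE_KEY);
    if (override != null && !override.trim().isEmpty()) {
      return override.trim();
    }
    // Default: approximate the process owner with the JVM's user.name property
    return System.getProperty("user.name");
  }

  public static void main(String[] args) {
    Properties jobProps = new Properties();
    jobProps.setProperty(USER_OVERRIDE_KEY, "etl_team_a"); // hypothetical user
    System.out.println(resolveUser(jobProps));          // the override
    System.out.println(resolveUser(new Properties()));  // the process owner
  }
}
```

The resolved name could then be handed to Hadoop's `FileSystem.get(URI, Configuration, String)` overload, which, as far as I know, opens the file system as the named user under simple authentication. Whether `WriterUtils.getWriterFS` is the right place for that is a separate question for the maintainers.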