[ https://issues.apache.org/jira/browse/BEAM-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996232#comment-15996232 ]
ASF GitHub Bot commented on BEAM-1491:
--------------------------------------

GitHub user 397090770 opened a pull request:

    https://github.com/apache/beam/pull/2890

    [BEAM-1491]Identify HADOOP_CONF_DIR(or YARN_CONF_DIR) environment variables

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:

     - [ ] Make sure the PR title is formatted like:
       `[BEAM-<Jira issue #>] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`.
     - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
           number, if there is one.
     - [ ] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

    ---

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/397090770/beam BEAM-1491

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/2890.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2890

----
commit 9bd067d2a1d3b7210560e8edce2763fe3c000dba
Author: yangping.wu <yangping...@qunar.com>
Date:   2017-05-04T06:04:08Z

    [BEAM-1491]Identify HADOOP_CONF_DIR(or YARN_CONF_DIR) environment variables

commit 59e6302631a716da0472894fb0dfc866a770b533
Author: yangping.wu <yangping...@qunar.com>
Date:   2017-05-04T06:11:45Z

    [BEAM-1491]Identify HADOOP_CONF_DIR(or YARN_CONF_DIR) environment variables

----


> HDFSFileSource should be able to read the HADOOP_CONF_DIR(YARN_CONF_DIR) environment variable
> ---------------------------------------------------------------------------------------------
>
>                 Key: BEAM-1491
>                 URL: https://issues.apache.org/jira/browse/BEAM-1491
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>    Affects Versions: 0.6.0
>            Reporter: wyp
>            Assignee: Jean-Baptiste Onofré
>
> Currently, if we want to read a file stored on HDFS, we do it as
> follows:
> {code}
> Read.Bounded<KV<LongWritable, Text>> from =
>     Read.from(HDFSFileSource.from("hdfs://hadoopserver:8020/tmp/data.txt",
>         TextInputFormat.class, LongWritable.class, Text.class));
> PCollection<KV<LongWritable, Text>> data = p.apply(from);
> {code}
> or
> {code}
> Configuration conf = new Configuration();
> conf.set("fs.default.name", "hdfs://hadoopserver:8020");
> Read.Bounded<KV<LongWritable, Text>> from =
>     Read.from(HDFSFileSource.from("/tmp/data.txt", TextInputFormat.class,
>         LongWritable.class, Text.class).withConfiguration(conf));
> PCollection<KV<LongWritable, Text>> data = p.apply(from);
> {code}
> As shown above, we must specify {{hdfs://hadoopserver:8020}} explicitly,
> either in the file path or in the {{Configuration}}. If {{conf}} were
> initialized from the {{HADOOP_CONF_DIR}} ({{YARN_CONF_DIR}}) environment
> variable, we could read an HDFS file like this:
> {code}
> Read.Bounded<KV<LongWritable, Text>> from =
>     Read.from(HDFSFileSource.from("/tmp/data.txt", TextInputFormat.class,
>         LongWritable.class, Text.class));
> PCollection<KV<LongWritable, Text>> data = p.apply(from);
> {code}
> Note that we don't specify the {{hdfs://hadoopserver:8020}} prefix: the
> program reads it from the {{HADOOP_CONF_DIR}} ({{YARN_CONF_DIR}})
> environment, so it still reads the file from HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
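The environment-variable lookup the issue asks for can be sketched with the JDK alone. This is a hedged illustration, not the code from PR #2890 and not a Beam or Hadoop API: the class `HadoopConfDirSketch` and its methods are hypothetical. It assumes `HADOOP_CONF_DIR` takes precedence over `YARN_CONF_DIR`, that the default file system is declared as `fs.defaultFS` (or the older `fs.default.name`) in a `core-site.xml` inside that directory, and that only absolute, scheme-less paths such as `/tmp/data.txt` need qualifying. With Hadoop on the classpath, the more natural route would be calling `conf.addResource(new Path(dir, "core-site.xml"))` on an `org.apache.hadoop.conf.Configuration`.

```java
import java.io.File;
import java.net.URI;
import java.util.Map;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/**
 * Hypothetical helper, not part of Beam or Hadoop: resolves the default file
 * system from HADOOP_CONF_DIR (or YARN_CONF_DIR) using only the JDK.
 */
public class HadoopConfDirSketch {

  /** Returns HADOOP_CONF_DIR if set and non-empty, else YARN_CONF_DIR, else empty. */
  static Optional<String> confDir(Map<String, String> env) {
    for (String key : new String[] {"HADOOP_CONF_DIR", "YARN_CONF_DIR"}) {
      String value = env.get(key);
      if (value != null && !value.isEmpty()) {
        return Optional.of(value);
      }
    }
    return Optional.empty();
  }

  /** Reads fs.defaultFS, falling back to the deprecated fs.default.name, from core-site.xml. */
  static Optional<String> defaultFs(File coreSite) throws Exception {
    if (!coreSite.isFile()) {
      return Optional.empty();
    }
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(coreSite);
    NodeList props = doc.getElementsByTagName("property");
    String legacy = null;
    for (int i = 0; i < props.getLength(); i++) {
      Element prop = (Element) props.item(i);
      String name = childText(prop, "name");
      if ("fs.defaultFS".equals(name)) {
        return Optional.ofNullable(childText(prop, "value"));
      }
      if ("fs.default.name".equals(name)) {
        legacy = childText(prop, "value");
      }
    }
    return Optional.ofNullable(legacy);
  }

  /** Trimmed text of the first <tag> element under parent, or null if absent. */
  private static String childText(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
  }

  /** Prefixes an absolute, scheme-less path with the default file system URI. */
  static String qualify(String path, String defaultFs) {
    if (URI.create(path).getScheme() != null) {
      return path; // already qualified, e.g. hdfs://host:port/...
    }
    if (!path.startsWith("/")) {
      // Hadoop resolves relative paths against a working directory; out of scope here.
      throw new IllegalArgumentException("expected an absolute path: " + path);
    }
    String base =
        defaultFs.endsWith("/") ? defaultFs.substring(0, defaultFs.length() - 1) : defaultFs;
    return base + path;
  }

  public static void main(String[] args) throws Exception {
    String path = args.length > 0 ? args[0] : "/tmp/data.txt";
    Optional<String> dir = confDir(System.getenv());
    Optional<String> fs =
        dir.isPresent()
            ? defaultFs(new File(dir.get(), "core-site.xml"))
            : Optional.<String>empty();
    // With no configured default file system, print the path unchanged.
    System.out.println(fs.map(f -> qualify(path, f)).orElse(path));
  }
}
```

If `HADOOP_CONF_DIR` points at a directory whose `core-site.xml` sets `fs.defaultFS` to `hdfs://hadoopserver:8020`, running the class prints `hdfs://hadoopserver:8020/tmp/data.txt`; with neither variable set, it prints `/tmp/data.txt` unchanged. Taking the environment as a `Map` parameter keeps the lookup testable without modifying the real process environment.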