[ https://issues.apache.org/jira/browse/BEAM-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yangping wu updated BEAM-1491:
------------------------------
    Summary: HDFSFileSource should be able to read the HADOOP_CONF_DIR(YARN_CONF_DIR) environment variable  (was: HDFSFileSource should be able to read the HADOOP_CONF_DIR(YARN_CONF_DIR) environmen variable)

> HDFSFileSource should be able to read the HADOOP_CONF_DIR(YARN_CONF_DIR) environment variable
> ---------------------------------------------------------------------------------------------
>
>                 Key: BEAM-1491
>                 URL: https://issues.apache.org/jira/browse/BEAM-1491
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>    Affects Versions: 0.6.0
>            Reporter: yangping wu
>            Assignee: Jean-Baptiste Onofré
>
> Currently, to read a file stored on HDFS, we do the following:
> {code}
> Read.Bounded<KV<LongWritable, Text>> from =
>     Read.from(HDFSFileSource.from("hdfs://hadoopserver:8020/tmp/data.txt",
>         TextInputFormat.class, LongWritable.class, Text.class));
> PCollection<KV<LongWritable, Text>> data = p.apply(from);
> {code}
> or
> {code}
> Configuration conf = new Configuration();
> conf.set("fs.default.name", "hdfs://hadoopserver:8020");
> Read.Bounded<KV<LongWritable, Text>> from =
>     Read.from(HDFSFileSource.from("/tmp/data.txt", TextInputFormat.class,
>         LongWritable.class, Text.class).withConfiguration(conf));
> PCollection<KV<LongWritable, Text>> data = p.apply(from);
> {code}
> As seen above, we must specify {{hdfs://hadoopserver:8020}} explicitly, either in the file path or in the configuration.
> If {{conf}} could be initialized by reading the {{HADOOP_CONF_DIR}} ({{YARN_CONF_DIR}}) environment variable, we could read an HDFS file like this:
> {code}
> Read.Bounded<KV<LongWritable, Text>> from =
>     Read.from(HDFSFileSource.from("/tmp/data.txt", TextInputFormat.class,
>         LongWritable.class, Text.class));
> PCollection<KV<LongWritable, Text>> data = p.apply(from);
> {code}
> Note that we don't specify the {{hdfs://hadoopserver:8020}} prefix, because the program reads it from the configuration found via the {{HADOOP_CONF_DIR}} ({{YARN_CONF_DIR}}) environment variable, and the file is then read from HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
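
The lookup requested in this issue could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Beam implementation: the class {{HadoopConfDirs}} and the method {{findSiteFiles}} are hypothetical names, and the choice to check {{HADOOP_CONF_DIR}} before {{YARN_CONF_DIR}} and to look only for {{core-site.xml}} and {{hdfs-site.xml}} is an assumption. The sketch uses only the JDK so it can run without Hadoop on the classpath.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Beam or Hadoop) that locates Hadoop site files. */
final class HadoopConfDirs {
  // Environment variables to consult, in assumed priority order.
  private static final String[] ENV_VARS = {"HADOOP_CONF_DIR", "YARN_CONF_DIR"};
  // Site files that normally carry the default filesystem and HDFS settings.
  private static final String[] SITE_FILES = {"core-site.xml", "hdfs-site.xml"};

  private HadoopConfDirs() {}

  /**
   * Returns the site files found in the first configured directory that
   * contains at least one of them. The environment is passed in as a map
   * (for example System.getenv()) so the lookup is easy to test.
   */
  static List<Path> findSiteFiles(Map<String, String> env) {
    List<Path> found = new ArrayList<>();
    for (String var : ENV_VARS) {
      String dir = env.get(var);
      if (dir == null || dir.isEmpty()) {
        continue; // variable not set, try the next one
      }
      for (String name : SITE_FILES) {
        Path candidate = Paths.get(dir, name);
        if (Files.isRegularFile(candidate)) {
          found.add(candidate);
        }
      }
      if (!found.isEmpty()) {
        break; // the first directory that yields site files wins
      }
    }
    return found;
  }
}
```

With Hadoop on the classpath, each returned path could then be added to a {{Configuration}} through {{conf.addResource(new org.apache.hadoop.fs.Path(p.toString()))}} before calling {{withConfiguration(conf)}}, so that the default filesystem set in {{core-site.xml}} supplies the {{hdfs://hadoopserver:8020}} prefix.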