[ https://issues.apache.org/jira/browse/BEAM-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587202#comment-16587202 ]
Jozef Vilcek commented on BEAM-2277:
------------------------------------

Does anyone see this problem depending on whether the HDFS path is written with or without an authority, i.e. *hdfs:///path/to/dir* vs *hdfs://ha-nn/path/to/dir*?

I suspect the problem is that the empty authority is dropped here:
[https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java#L260]

For
{code:java}
FileSystems.matchNewResource("hdfs:///path/to/dir", false)
{code}
this produces
{code:java}
hdfs:/path/to/dir
{code}
Reading such a path back via FileSystems.matchNewResource() produces a ResourceId with the "file" scheme because of the change in BEAM-5180.

> IllegalArgumentException when using Hadoop file system for WordCount example.
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-2277
>                 URL: https://issues.apache.org/jira/browse/BEAM-2277
>             Project: Beam
>          Issue Type: Bug
>          Components: z-do-not-use-sdk-java-extensions
>    Affects Versions: 2.6.0
>            Reporter: Aviem Zur
>            Assignee: Aviem Zur
>            Priority: Blocker
>             Fix For: 2.0.0
>
>
> IllegalArgumentException when using Hadoop file system for WordCount example.
> Occurred when running WordCount example using Spark runner on a YARN cluster.
> Command-line arguments:
> {code:none}
> --runner=SparkRunner --inputFile=hdfs:///user/myuser/kinglear.txt
> --output=hdfs:///user/myuser/wc/wc
> {code}
> Stack trace:
> {code:none}
> java.lang.IllegalArgumentException: Expect srcResourceIds and destResourceIds have the same scheme, but received file, hdfs.
>     at org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
>     at org.apache.beam.sdk.io.FileSystems.validateSrcDestLists(FileSystems.java:394)
>     at org.apache.beam.sdk.io.FileSystems.copy(FileSystems.java:236)
>     at org.apache.beam.sdk.io.FileBasedSink$WriteOperation.copyToOutputFiles(FileBasedSink.java:626)
>     at org.apache.beam.sdk.io.FileBasedSink$WriteOperation.finalize(FileBasedSink.java:516)
>     at org.apache.beam.sdk.io.WriteFiles$2.processElement(WriteFiles.java:592)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
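
For reference, the suspected loss of the empty authority in the comment above can be reproduced with plain java.net.URI. This is only a minimal sketch under the assumption that the path gets rebuilt from its URI components somewhere along the way; it is not the actual Beam or Hadoop code, and the names EmptyAuthorityDemo and rebuild are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class EmptyAuthorityDemo {

    // Hypothetical helper (not a Beam API): parse a URI string, then
    // rebuild it from its scheme, authority and path components.
    public static String rebuild(String spec) {
        try {
            URI parsed = new URI(spec);
            // For "hdfs:///path/to/dir" the authority is empty and
            // getAuthority() returns null, so the rebuilt URI has no "//".
            URI rebuilt = new URI(
                parsed.getScheme(),
                parsed.getAuthority(),
                parsed.getPath(),
                null,   // query
                null);  // fragment
            return rebuilt.toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid URI: " + spec, e);
        }
    }

    public static void main(String[] args) {
        // Empty authority: the "//" is lost on the round trip.
        System.out.println(rebuild("hdfs:///path/to/dir"));      // hdfs:/path/to/dir
        // Explicit authority: the URI survives unchanged.
        System.out.println(rebuild("hdfs://ha-nn/path/to/dir")); // hdfs://ha-nn/path/to/dir
    }
}
```

If the same kind of round trip happens inside the HDFS file system code, it would explain why only the *hdfs:///...* form is affected while *hdfs://ha-nn/...* keeps working.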