[jira] [Comment Edited] (BEAM-2277) IllegalArgumentException when using Hadoop file system for WordCount example.
[ https://issues.apache.org/jira/browse/BEAM-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587423#comment-16587423 ]

Jozef Vilcek edited comment on BEAM-2277 at 8/21/18 1:31 PM:
-

So after more investigation: the URI backing the HadoopResourceId also drops the '//' of an empty authority when HadoopResourceId.resolve() is called, and produces string versions of the resource in the "hdfs:/path" form. Such a string is not accepted back as the hdfs scheme by FileSystems.matchNewResource(). The question is what an elegant fix for that would be.

was (Author: jozovilcek):
So after more investigation: the URI backing the HadoopResourceId also drops the empty authority when HadoopResourceId.resolve() is called, and produces string versions of the resource in the "hdfs:/path" form. Such a string is not accepted back as the hdfs scheme by FileSystems.matchNewResource(). The question is what an elegant fix for that would be.

> IllegalArgumentException when using Hadoop file system for WordCount example.
> -----------------------------------------------------------------------------
>
> Key: BEAM-2277
> URL: https://issues.apache.org/jira/browse/BEAM-2277
> Project: Beam
> Issue Type: Bug
> Components: z-do-not-use-sdk-java-extensions
> Affects Versions: 2.6.0
> Reporter: Aviem Zur
> Assignee: Aviem Zur
> Priority: Blocker
> Fix For: 2.0.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> IllegalArgumentException when using Hadoop file system for WordCount example.
> Occurred when running WordCount example using Spark runner on a YARN cluster.
> Command-line arguments:
> {code:none}
> --runner=SparkRunner --inputFile=hdfs:///user/myuser/kinglear.txt --output=hdfs:///user/myuser/wc/wc
> {code}
> Stack trace:
> {code:none}
> java.lang.IllegalArgumentException: Expect srcResourceIds and destResourceIds have the same scheme, but received file, hdfs.
>         at org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
>         at org.apache.beam.sdk.io.FileSystems.validateSrcDestLists(FileSystems.java:394)
>         at org.apache.beam.sdk.io.FileSystems.copy(FileSystems.java:236)
>         at org.apache.beam.sdk.io.FileBasedSink$WriteOperation.copyToOutputFiles(FileBasedSink.java:626)
>         at org.apache.beam.sdk.io.FileBasedSink$WriteOperation.finalize(FileBasedSink.java:516)
>         at org.apache.beam.sdk.io.WriteFiles$2.processElement(WriteFiles.java:592)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
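The empty-authority drop described in the comment above can be reproduced in isolation with plain java.net.URI, no Beam classes needed. This is an illustrative sketch (class and variable names are invented, not Beam code):

```java
import java.net.URI;

public class EmptyAuthorityDemo {
    public static void main(String[] args) {
        // "hdfs:///path/to/" has an EMPTY authority, which java.net.URI
        // parses as null:
        URI base = URI.create("hdfs:///path/to/");
        System.out.println(base.getAuthority());  // null

        // resolve() rebuilds the URI from its components; with a null
        // authority the "//" is not re-emitted, so the string form changes:
        URI resolved = base.resolve("dir");
        System.out.println(resolved);             // hdfs:/path/to/dir

        // With a real authority (e.g. an HA nameservice) the "//" survives:
        URI ha = URI.create("hdfs://ha-nn/path/to/").resolve("dir");
        System.out.println(ha);                   // hdfs://ha-nn/path/to/dir
    }
}
```

The original "hdfs:///path/to/" string is only preserved verbatim by toString() as long as the URI is never recomposed; any resolve() call produces the "hdfs:/..." form.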
[jira] [Comment Edited] (BEAM-2277) IllegalArgumentException when using Hadoop file system for WordCount example.
[ https://issues.apache.org/jira/browse/BEAM-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587202#comment-16587202 ]

Jozef Vilcek edited comment on BEAM-2277 at 8/21/18 10:54 AM:
-

Is everyone who has this problem using HDFS paths in the form without an authority (*hdfs:///path/to/dir*) rather than with one (*hdfs://ha-nn/path/to/dir*)? I suspect the problem could be the dropped empty authority here:
[https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java#L260]
which, for
{code:java}
FileSystems.matchNewResource("hdfs:///path/to/dir", false)
{code}
produces
{code:java}
hdfs:/path/to/dir
{code}
Reading back such a path via FileSystems.matchNewResource() produces a ResourceId with the "file" scheme because of the change in BEAM-5180.

was (Author: jozovilcek):
Is everyone who has this problem using HDFS paths in the form without an authority (*hdfs:///path/to/dir*) rather than with one (*hdfs://ha-nn/path/to/dir*)? I suspect the problem could be the dropped empty authority here:
[https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java#L260]
which, for
{code:java}
FileSystems.matchNewResource("hdfs:///path/to/dir", false)
{code}
produces
{code:java}
hdfs:/path/to/dir
{code}
Reading back such a path via FileSystems.matchNewResource() produces a ResourceId with the "file" scheme because of the change in BEAM-5180.
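One possible direction for the "elegant fix" discussed in these comments is to re-attach the "//" that java.net.URI drops for an empty hdfs authority before handing the string to scheme-based dispatch. The helper below is a hypothetical sketch, not actual Beam code; the class and method names are invented:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class RestoreEmptyAuthority {

    // Hypothetical helper: if an hdfs URI lost its empty authority during
    // recomposition (getAuthority() == null), rebuild it with an explicit
    // empty (non-null) authority, which forces "//" back into the string.
    static String toHdfsString(URI uri) {
        if ("hdfs".equals(uri.getScheme()) && uri.getAuthority() == null) {
            try {
                return new URI(uri.getScheme(), "", uri.getPath(),
                               uri.getQuery(), uri.getFragment()).toString();
            } catch (URISyntaxException e) {
                throw new IllegalStateException(e);
            }
        }
        // URIs with a real authority (or other schemes) are left untouched.
        return uri.toString();
    }

    public static void main(String[] args) {
        URI resolved = URI.create("hdfs:///path/to/").resolve("dir");
        System.out.println(resolved);               // hdfs:/path/to/dir
        System.out.println(toHdfsString(resolved)); // hdfs:///path/to/dir
    }
}
```

With such a round-trip-safe string form, FileSystems.matchNewResource() would see the "hdfs://" prefix again and dispatch to the Hadoop file system instead of falling back to "file".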