[jira] [Comment Edited] (BEAM-2277) IllegalArgumentException when using Hadoop file system for WordCount example.

2018-08-21 Thread Jozef Vilcek (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587423#comment-16587423
 ] 

Jozef Vilcek edited comment on BEAM-2277 at 8/21/18 1:31 PM:
-

So after more investigation: the URI backing HadoopResourceId also drops the 
'//' of an empty authority when calling HadoopResourceId.resolve(), and 
produces string versions of the resource in the "hdfs:/path" form. This is not 
accepted back as the hdfs scheme by FileSystems.matchNewResource(). The 
question is what an elegant fix for that would be.
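A minimal sketch of the underlying java.net.URI behavior (the path is illustrative): an empty authority parses as null, and resolve() re-serializes the URI from its components, so the "//" separator is not written back:

```java
import java.net.URI;

public class EmptyAuthorityDemo {
  public static void main(String[] args) {
    // "hdfs:///path/to/dir/" parses with a null (empty) authority.
    URI base = URI.create("hdfs:///path/to/dir/");

    // resolve() rebuilds the string from components; with no authority,
    // the "//" separator is dropped.
    URI resolved = base.resolve("file.txt");
    System.out.println(resolved); // hdfs:/path/to/dir/file.txt
  }
}
```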


was (Author: jozovilcek):
So after more investigation: the URI backing HadoopResourceId also drops the 
empty authority when calling HadoopResourceId.resolve(), and produces string 
versions of the resource in the "hdfs:/path" form. This is not accepted back as 
the hdfs scheme by FileSystems.matchNewResource(). The question is what an 
elegant fix for that would be.

> IllegalArgumentException when using Hadoop file system for WordCount example.
> -
>
> Key: BEAM-2277
> URL: https://issues.apache.org/jira/browse/BEAM-2277
> Project: Beam
>  Issue Type: Bug
>  Components: z-do-not-use-sdk-java-extensions
>Affects Versions: 2.6.0
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Blocker
> Fix For: 2.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> IllegalArgumentException when using Hadoop file system for WordCount example.
> Occurred when running WordCount example using Spark runner on a YARN cluster.
> Command-line arguments:
> {code:none}
> --runner=SparkRunner --inputFile=hdfs:///user/myuser/kinglear.txt 
> --output=hdfs:///user/myuser/wc/wc
> {code}
> Stack trace:
> {code:none}
> java.lang.IllegalArgumentException: Expect srcResourceIds and destResourceIds 
> have the same scheme, but received file, hdfs.
>   at 
> org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
>   at 
> org.apache.beam.sdk.io.FileSystems.validateSrcDestLists(FileSystems.java:394)
>   at org.apache.beam.sdk.io.FileSystems.copy(FileSystems.java:236)
>   at 
> org.apache.beam.sdk.io.FileBasedSink$WriteOperation.copyToOutputFiles(FileBasedSink.java:626)
>   at 
> org.apache.beam.sdk.io.FileBasedSink$WriteOperation.finalize(FileBasedSink.java:516)
>   at 
> org.apache.beam.sdk.io.WriteFiles$2.processElement(WriteFiles.java:592)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-2277) IllegalArgumentException when using Hadoop file system for WordCount example.

2018-08-21 Thread Jozef Vilcek (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587202#comment-16587202
 ] 

Jozef Vilcek edited comment on BEAM-2277 at 8/21/18 10:54 AM:
--

Does anyone have this problem when using HDFS paths in the form with vs. 
without an authority? *hdfs:///path/to/dir*    vs.   *hdfs://ha-nn/path/to/dir* ?

I suspect the problem could be the dropped empty authority here:

[https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java#L260]

which, for
{code:java}
FileSystems.matchNewResource("hdfs:///path/to/dir", false)
{code}
produces
{code:java}
hdfs:/path/to/dir
{code}
Reading back such a path via FileSystems.matchNewResource() produces a 
ResourceId with the "file" scheme because of the change in BEAM-5180.
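For illustration, a small sketch of scheme matching along these lines (the pattern here is an assumption modeled on how FileSystems parses a spec: a scheme must be followed by "://", and anything else falls back to "file"):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchemeSketch {
  // Assumed spec pattern: a scheme name followed by "://".
  // Specs without "://" are treated as local "file" paths.
  private static final Pattern SCHEME =
      Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*)://.*");

  static String parseScheme(String spec) {
    Matcher m = SCHEME.matcher(spec);
    return m.matches() ? m.group("scheme") : "file";
  }

  public static void main(String[] args) {
    System.out.println(parseScheme("hdfs:///path/to/dir")); // hdfs
    System.out.println(parseScheme("hdfs:/path/to/dir"));   // file
  }
}
```

So once resolve() collapses "hdfs:///..." to "hdfs:/...", the spec no longer contains "://" and the hdfs scheme is lost on the round trip.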


was (Author: jozovilcek):
Does anyone have this problem when using HDFS paths in the form with vs. 
without an authority? *hdfs:///path/to/dir*    vs.   *hdfs://ha-nn/path/to/dir* ?

I suspect the problem could be the dropped empty authority here:

[https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java#L260]

which, for
{code:java}
FileSystems.matchNewResource("hdfs:///path/to/dir", false)
{code}
produces
{code:java}
hdfs:/path/to/dir
{code}
Reading back such a path via FileSystems.matchNewResource() produces a 
ResourceId with the "file" scheme because of the change in BEAM-5180.
