[jira] [Commented] (BEAM-2457) Error: "Unable to find registrar for hdfs" - need to prevent/improve error message

2017-08-28 Thread Flavio Fiszman (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144110#comment-16144110
 ] 

Flavio Fiszman commented on BEAM-2457:
--

I'm not sure about the error in 2.2.0-SNAPSHOT,
but for the "Unable to find registrar for hdfs" error, there are these related 
questions that have solved other people's issues:

https://stackoverflow.com/questions/26958865/no-filesystem-for-scheme-hdfs/28135140#28135140
https://stackoverflow.com/questions/44365545/apache-beam-unable-to-find-registrar-for-gs
https://stackoverflow.com/questions/17265002/hadoop-no-filesystem-for-scheme-file
https://issues.apache.org/jira/projects/BEAM/issues/BEAM-2429?filter=allissues

Let me know if that helps you, thanks!

> Error: "Unable to find registrar for hdfs" - need to prevent/improve error 
> message
> --
>
> Key: BEAM-2457
> URL: https://issues.apache.org/jira/browse/BEAM-2457
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.0.0
>Reporter: Stephen Sisk
>Assignee: Flavio Fiszman
>
> I've noticed a number of user reports where jobs are failing with the error 
> message "Unable to find registrar for hdfs": 
> * 
> https://stackoverflow.com/questions/44497662/apache-beamunable-to-find-registrar-for-hdfs/44508533?noredirect=1#comment76026835_44508533
> * 
> https://lists.apache.org/thread.html/144c384e54a141646fcbe854226bb3668da091c5dc7fa2d471626e9b@%3Cuser.beam.apache.org%3E
> * 
> https://lists.apache.org/thread.html/e4d5ac744367f9d036a1f776bba31b9c4fe377d8f11a4b530be9f829@%3Cuser.beam.apache.org%3E
>  
> This isn't too many reports, but it is the only time I can recall so many 
> users reporting the same error message in a such a short amount of time. 
> We believe the problem is one of two things: 
> 1) bad uber jar creation
> 2) incorrect HDFS configuration
> However, it's highly possible this could have some other root cause. 
> It seems like it'd be useful to:
> 1) Follow up with the above reports to see if they've resolved the issue, and 
> if so what fixed it. There may be another root cause out there.
> 2) Improve the error message to include more information about how to resolve 
> it
> 3) See if we can improve detection of the error cases to give more specific 
> information (specifically, if HDFS is miconfigured, can we detect that 
> somehow and tell the user exactly that?)
> 4) update documentation



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2702) Dataflow pipeline stalls after autoscaling

2017-07-31 Thread Flavio Fiszman (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108162#comment-16108162
 ] 

Flavio Fiszman commented on BEAM-2702:
--

Hi Johann,
JIRA issues are primarily used for Beam SDK-related issues. The issue here 
seems to be related to the Dataflow Runner in specific, so a better reporting 
mechanism is to send that information to dataflow-feedb...@google.com .
Thanks!

> Dataflow pipeline stalls after autoscaling
> --
>
> Key: BEAM-2702
> URL: https://issues.apache.org/jira/browse/BEAM-2702
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.0.0
>Reporter: Johann Steinbrecher
>Assignee: Thomas Groh
>
> A 4 step dataflow pipeline (Pubsubio.Read, windowing, message parsing, 
> DatastoreV1.write) stalls as soon as the autoscaling algorithm is increasing 
> the number of workers from 1 to 4. 
> *Expected*:
> Throughput (elements/sec) for each pipeline step increases due to more 
> workers.
> *Actual*:
> Throughput (elements/sec) goes to 0 for all steps. The number of processed 
> elements in the first step equals the number of processed elements in the 
> last step. The number of workers stays high.
> Runner: google-cloud-platform managed dataflow runner
> Sample dataflow job id (log level debug):
> 2017-07-27_14_51_37-4624978117098944513
> Log message after autoscaling:
> Rpc to .. completed with error DEADLINE_EXCEEDED (cause or symptom?)
> autoscaling configuration 
> --autoscalingAlgorithm=THROUGHPUT_BASED 
> --maxNumWorkers=4 
> machine types tested:
> - n1-highmem-2
> - n1-standard-1
> zone: us-east1-d
> sdk version:
> org.apache.beam@2.0.0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2298) Java WordCount doesn't work in Window OS for glob expressions or file: prefixed paths

2017-06-29 Thread Flavio Fiszman (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068632#comment-16068632
 ] 

Flavio Fiszman commented on BEAM-2298:
--

Hi Vinay, thanks for the extra piece of information.
I'm currently working on a fix for this problem and will update this issue with 
progress.

> Java WordCount doesn't work in Window OS for glob expressions or file: 
> prefixed paths
> -
>
> Key: BEAM-2298
> URL: https://issues.apache.org/jira/browse/BEAM-2298
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Flavio Fiszman
> Fix For: 2.1.0
>
>
> I am not able to build beam repo in Windows OS, so I copied the jar file from 
> my Mac.
> WordCount failed with the following cmd:
> java -cp beam-examples-java-2.0.0-jar-with-dependencies.jar
>  org.apache.beam.examples.WordCount --inputFile=input.txt --output=counts
> May 15, 2017 6:09:48 AM org.apache.beam.sdk.io.FileBasedSource 
> getEstimatedSizeB
> ytes
> INFO: Filepattern input.txt matched 1 files with total size 0
> May 15, 2017 6:09:48 AM org.apache.beam.sdk.io.FileBasedSource 
> expandFilePattern
> INFO: Matched 1 files for pattern input.txt
> May 15, 2017 6:09:48 AM org.apache.beam.sdk.io.FileBasedSource split
> INFO: Splitting filepattern input.txt into bundles of size 0 took 0 ms and 
> produ
> ced 1 files and 0 bundles
> May 15, 2017 6:09:48 AM org.apache.beam.sdk.io.WriteFiles$2 processElement
> INFO: Finalizing write operation 
> TextWriteOperation{tempDirectory=C:\Users\Pei\D
> esktop\.temp-beam-2017-05-135_13-09-48-1\, windowedWrites=false}.
> May 15, 2017 6:09:48 AM org.apache.beam.sdk.io.WriteFiles$2 processElement
> INFO: Creating 1 empty output shards in addition to 0 written for a total of 
> 1.
> Exception in thread "main" 
> org.apache.beam.sdk.Pipeline$PipelineExecutionExcepti
> on: java.lang.IllegalStateException: Unable to find registrar for c
> at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.wait
> UntilFinish(DirectRunner.java:322)
> at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.wait
> UntilFinish(DirectRunner.java:292)
> at 
> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:200
> )
> at 
> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:63)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:295)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:281)
> at org.apache.beam.examples.WordCount.main(WordCount.java:184)
> Caused by: java.lang.IllegalStateException: Unable to find registrar for c
> at 
> org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.
> java:447)
> at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:111)
> at 
> org.apache.beam.sdk.io.FileSystems.matchResources(FileSystems.java:17
> 4)
> at 
> org.apache.beam.sdk.io.FileSystems.filterMissingFiles(FileSystems.jav
> a:367)
> at org.apache.beam.sdk.io.FileSystems.copy(FileSystems.java:251)
> at 
> org.apache.beam.sdk.io.FileBasedSink$WriteOperation.copyToOutputFiles
> (FileBasedSink.java:641)
> at 
> org.apache.beam.sdk.io.FileBasedSink$WriteOperation.finalize(FileBase
> dSink.java:529)
> at 
> org.apache.beam.sdk.io.WriteFiles$2.processElement(WriteFiles.java:59
> 2)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2298) Java WordCount doesn't work in Window OS for glob expressions or file: prefixed paths

2017-06-09 Thread Flavio Fiszman (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Fiszman reassigned BEAM-2298:


Assignee: Flavio Fiszman

> Java WordCount doesn't work in Window OS for glob expressions or file: 
> prefixed paths
> -
>
> Key: BEAM-2298
> URL: https://issues.apache.org/jira/browse/BEAM-2298
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Flavio Fiszman
> Fix For: 2.1.0
>
>
> I am not able to build beam repo in Windows OS, so I copied the jar file from 
> my Mac.
> WordCount failed with the following cmd:
> java -cp beam-examples-java-2.0.0-jar-with-dependencies.jar
>  org.apache.beam.examples.WordCount --inputFile=input.txt --output=counts
> May 15, 2017 6:09:48 AM org.apache.beam.sdk.io.FileBasedSource 
> getEstimatedSizeB
> ytes
> INFO: Filepattern input.txt matched 1 files with total size 0
> May 15, 2017 6:09:48 AM org.apache.beam.sdk.io.FileBasedSource 
> expandFilePattern
> INFO: Matched 1 files for pattern input.txt
> May 15, 2017 6:09:48 AM org.apache.beam.sdk.io.FileBasedSource split
> INFO: Splitting filepattern input.txt into bundles of size 0 took 0 ms and 
> produ
> ced 1 files and 0 bundles
> May 15, 2017 6:09:48 AM org.apache.beam.sdk.io.WriteFiles$2 processElement
> INFO: Finalizing write operation 
> TextWriteOperation{tempDirectory=C:\Users\Pei\D
> esktop\.temp-beam-2017-05-135_13-09-48-1\, windowedWrites=false}.
> May 15, 2017 6:09:48 AM org.apache.beam.sdk.io.WriteFiles$2 processElement
> INFO: Creating 1 empty output shards in addition to 0 written for a total of 
> 1.
> Exception in thread "main" 
> org.apache.beam.sdk.Pipeline$PipelineExecutionExcepti
> on: java.lang.IllegalStateException: Unable to find registrar for c
> at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.wait
> UntilFinish(DirectRunner.java:322)
> at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.wait
> UntilFinish(DirectRunner.java:292)
> at 
> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:200
> )
> at 
> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:63)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:295)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:281)
> at org.apache.beam.examples.WordCount.main(WordCount.java:184)
> Caused by: java.lang.IllegalStateException: Unable to find registrar for c
> at 
> org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.
> java:447)
> at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:111)
> at 
> org.apache.beam.sdk.io.FileSystems.matchResources(FileSystems.java:17
> 4)
> at 
> org.apache.beam.sdk.io.FileSystems.filterMissingFiles(FileSystems.jav
> a:367)
> at org.apache.beam.sdk.io.FileSystems.copy(FileSystems.java:251)
> at 
> org.apache.beam.sdk.io.FileBasedSink$WriteOperation.copyToOutputFiles
> (FileBasedSink.java:641)
> at 
> org.apache.beam.sdk.io.FileBasedSink$WriteOperation.finalize(FileBase
> dSink.java:529)
> at 
> org.apache.beam.sdk.io.WriteFiles$2.processElement(WriteFiles.java:59
> 2)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-2429) Conflicting filesystems with used of HadoopFileSystem

2017-06-09 Thread Flavio Fiszman (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044762#comment-16044762
 ] 

Flavio Fiszman commented on BEAM-2429:
--

Forgot to attach the related question with some context, here it is: 
https://stackoverflow.com/questions/12391226/hadoop-hdfs-points-to-file-not-hdfs?rq=1

> Conflicting filesystems with used of HadoopFileSystem
> -
>
> Key: BEAM-2429
> URL: https://issues.apache.org/jira/browse/BEAM-2429
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Affects Versions: 2.0.0
>Reporter: François Wagner
>Assignee: Flavio Fiszman
>
> I'm facing issue when trying to use HadoopFileSystem in my pipeline. It looks 
> like HadoopFileSystem is registring itself under the `file` schema 
> (https://github.com/apache/beam/pull/2777/files#diff-330bd0854dcab6037ef0e52c05d68eb2L79),
>  hence the following Exception is thrown when trying to register 
> HadoopFileSystem.
> java.lang.IllegalStateException: Scheme: [file] has conflicting filesystems: 
> [org.apache.beam.sdk.io.LocalFileSystem, 
> org.apache.beam.sdk.io.hdfs.HadoopFileSystem]
>   at 
> org.apache.beam.sdk.io.FileSystems.verifySchemesAreUnique(FileSystems.java:498)
> What is the correct way to handle `hdfs` url out of the box with TextIO & 
> AvroIO ?
> {code:java}
> String[] args = new String[]{
> "--hdfsConfiguration=[{\"dfs.client.use.datanode.hostname\": 
> \"true\"}]"};
> HadoopFileSystemOptions options = PipelineOptionsFactory
> .fromArgs(args)
> .withValidation()
> .as(HadoopFileSystemOptions.class);
> Pipeline pipeline = Pipeline.create(options); 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-2429) Conflicting filesystems with used of HadoopFileSystem

2017-06-09 Thread Flavio Fiszman (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044757#comment-16044757
 ] 

Flavio Fiszman commented on BEAM-2429:
--

Hi François! In your hdfsConfiguration, you're specifying 
"use.datanode.hostname" for your use case. On top of that, you should also 
specify the field "fs.defaultFS" with hdfs:///, specifying the whole URI. You 
can just add that field to the map.

This is a related question for reference, and notice how fs.defaultFS is the 
new name for fs.default.name, if you see both of them appearing online. Let me 
know if you have any questions, thanks!

> Conflicting filesystems with used of HadoopFileSystem
> -
>
> Key: BEAM-2429
> URL: https://issues.apache.org/jira/browse/BEAM-2429
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Affects Versions: 2.0.0
>Reporter: François Wagner
>Assignee: Flavio Fiszman
>
> I'm facing issue when trying to use HadoopFileSystem in my pipeline. It looks 
> like HadoopFileSystem is registring itself under the `file` schema 
> (https://github.com/apache/beam/pull/2777/files#diff-330bd0854dcab6037ef0e52c05d68eb2L79),
>  hence the following Exception is thrown when trying to register 
> HadoopFileSystem.
> java.lang.IllegalStateException: Scheme: [file] has conflicting filesystems: 
> [org.apache.beam.sdk.io.LocalFileSystem, 
> org.apache.beam.sdk.io.hdfs.HadoopFileSystem]
>   at 
> org.apache.beam.sdk.io.FileSystems.verifySchemesAreUnique(FileSystems.java:498)
> What is the correct way to handle `hdfs` url out of the box with TextIO & 
> AvroIO ?
> {code:java}
> String[] args = new String[]{
> "--hdfsConfiguration=[{\"dfs.client.use.datanode.hostname\": 
> \"true\"}]"};
> HadoopFileSystemOptions options = PipelineOptionsFactory
> .fromArgs(args)
> .withValidation()
> .as(HadoopFileSystemOptions.class);
> Pipeline pipeline = Pipeline.create(options); 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)