[jira] [Commented] (BEAM-645) Running Wordcount in Spark Checks Locally and Outputs in HDFS

Amit Sela (JIRA) Tue, 20 Sep 2016 12:17:55 -0700

    [ 
https://issues.apache.org/jira/browse/BEAM-645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507498#comment-15507498
 ]


Amit Sela commented on BEAM-645:
--------------------------------

I think it's the other way around - TextIO is the one passing knowledge to the 
runner, but before doing so, it validates the filepattern and fails.
The "touch local" hack suffices this validation, and moves on to execution 
where the read operation is translated to "sc.textFile".

What subtlety am i missing ? simply try the SDK's example but call 
"withoutValidation()" on the read, should work as it's the only difference 
between the examples.

And again, doing my best to support Read.Bound before the 27th (right ?) 

> Running Wordcount in Spark Checks Locally and Outputs in HDFS
> -------------------------------------------------------------
>
>                 Key: BEAM-645
>                 URL: https://issues.apache.org/jira/browse/BEAM-645
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>    Affects Versions: 0.3.0-incubating
>            Reporter: Jesse Anderson
>            Assignee: Amit Sela
>
> When running the Wordcount example with the Spark runner, the Spark runner 
> uses the input file in HDFS. When the program performs its startup checks, it 
> looks for the file in the local filesystem.
> To workaround this issue, you have to create a file in the local filesystem 
> and put the actual file in HDFS.
> Here is the stack trace when the file doesn't exist in the local filesystem:
> {quote}Exception in thread "main" java.lang.IllegalStateException: Unable to 
> find any files matching Macbeth.txt
>       at 
> org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:199)
>       at org.apache.beam.sdk.io.TextIO$Read$Bound.apply(TextIO.java:279)
>       at org.apache.beam.sdk.io.TextIO$Read$Bound.apply(TextIO.java:192)
>       at 
> org.apache.beam.sdk.runners.PipelineRunner.apply(PipelineRunner.java:76)
>       at org.apache.beam.runners.spark.SparkRunner.apply(SparkRunner.java:128)
>       at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:400)
>       at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:323)
>       at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:58)
>       at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:173)
>       at org.apache.beam.examples.WordCount.main(WordCount.java:195)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>       at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>       at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-645) Running Wordcount in Spark Checks Locally and Outputs in HDFS

Reply via email to