[jira] [Created] (BEAM-645) Running Wordcount in Spark Checks Locally and Outputs in HDFS

Jesse Anderson (JIRA) Tue, 20 Sep 2016 11:25:48 -0700

Jesse Anderson created BEAM-645:
-----------------------------------

             Summary: Running Wordcount in Spark Checks Locally and Outputs in 
HDFS
                 Key: BEAM-645
                 URL: https://issues.apache.org/jira/browse/BEAM-645
             Project: Beam
          Issue Type: Bug
          Components: runner-spark
    Affects Versions: 0.3.0-incubating
            Reporter: Jesse Anderson
            Assignee: Amit Sela



When running the Wordcount example with the Spark runner, the Spark runner uses 
the input file in HDFS. When the program performs its startup checks, it looks 
for the file in the local filesystem.

To workaround this issue, you have to create a file in the local filesystem and 
put the actual file in HDFS.

Here is the stack trace when the file doesn't exist in the local filesystem:
{quote}Exception in thread "main" java.lang.IllegalStateException: Unable to 
find any files matching Macbeth.txt
        at 
org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:199)
        at org.apache.beam.sdk.io.TextIO$Read$Bound.apply(TextIO.java:279)
        at org.apache.beam.sdk.io.TextIO$Read$Bound.apply(TextIO.java:192)
        at 
org.apache.beam.sdk.runners.PipelineRunner.apply(PipelineRunner.java:76)
        at org.apache.beam.runners.spark.SparkRunner.apply(SparkRunner.java:128)
        at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:400)
        at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:323)
        at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:58)
        at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:173)
        at org.apache.beam.examples.WordCount.main(WordCount.java:195)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (BEAM-645) Running Wordcount in Spark Checks Locally and Outputs in HDFS

Reply via email to