Hi Ravi,

I have seen a similar issue before. Try setting fs.hdfs.impl.disable.cache
to true in your Hadoop configuration. For example, if your Hadoop
Configuration object is hadoopConf, you can call
hadoopConf.setBoolean("fs.hdfs.impl.disable.cache", true)
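
Here is a minimal Scala sketch of how that could look in a Spark job
(the app name is a placeholder, and sc is just your SparkContext):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("snap-alignment") // placeholder
    val sc = new SparkContext(conf)

    // Disable the shared HDFS FileSystem cache so each record reader opens
    // its own client rather than reusing one that another task may already
    // have closed (the "Filesystem closed" IOException in your log).
    sc.hadoopConfiguration.setBoolean("fs.hdfs.impl.disable.cache", true)

You may also be able to set spark.hadoop.fs.hdfs.impl.disable.cache=true in
your Spark properties, since Spark copies spark.hadoop.* entries into the
Hadoop configuration, though I have not verified that on your version.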

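On the coalesce question quoted below: coalesce with shuffle = false merges
partitions that are already on the same node, so it should not force data
across the network the way coalesce(n, true) does. A one-line sketch, where
inputRdd stands in for your paired-FASTQ RDD:

    // Merge down to ~46 partitions without a shuffle; parent partitions are
    // grouped by locality, so HDFS block locality is largely preserved.
    val merged = inputRdd.coalesce(46, shuffle = false)
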
Let me know if that helps.

Best,
Liquan


On Wed, Jul 16, 2014 at 4:56 PM, rpandya <r...@iecommerce.com> wrote:

> Matei - I tried using coalesce(numNodes, true), but it then seemed to run
> too few SNAP tasks - only 2 or 3 when I had specified 46. The job failed,
> perhaps for unrelated reasons, with some odd exceptions in the log (at the
> end of this message). But I really don't want to force data movement
> between nodes. The input data is in HDFS and should already be somewhat
> balanced among the nodes. We've run this scenario using the simple
> "hadoop jar" runner and a custom format jar to break the input into
> 8-line chunks (paired FASTQ). Ideally I'd like Spark to do the minimum
> data movement to balance the work, feeding each task mostly from data
> local to that node.
>
> Daniel - that's a good thought. I could invoke a small stub for each task
> that talks to a single local daemon process over a socket and serializes
> all the tasks on a given machine.
>
> Thanks,
>
> Ravi
>
> P.S. Log exceptions:
>
> 14/07/15 17:02:00 WARN yarn.ApplicationMaster: Unable to retrieve
> SparkContext in spite of waiting for 100000, maxNumTries = 10
> Exception in thread "main" java.lang.NullPointerException
>         at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:233)
>         at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:110)
>
> ...and later...
>
> 14/07/15 17:11:07 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
> 14/07/15 17:11:07 INFO yarn.ApplicationMaster: AppMaster received a signal.
> 14/07/15 17:11:07 WARN rdd.NewHadoopRDD: Exception in RecordReader.close()
> java.io.IOException: Filesystem closed
>         at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:707)
>         at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:619)
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Memory-compute-intensive-tasks-tp9643p9991.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>



-- 
Liquan Pei
Department of Physics
University of Massachusetts Amherst
