Hi Spark users and developers,
I have been trying to use spark-ec2. After I launched the Spark cluster
(1.4.1) with ephemeral HDFS (using Hadoop 2.4.0), I tried to execute a job
whose data is stored in the ephemeral HDFS. No matter what I
tried, there was no data locality at
Hi guys,
I am running some SQL queries, but all my tasks are reported as either
NODE_LOCAL or PROCESS_LOCAL.
In the Hadoop world, reduce tasks are rack-local or non-rack-local because
they have to aggregate data from multiple hosts. However, in Spark even the
aggregation stages are reported
time roughly the same)
spark.storage.memoryFraction 0.9
Mike
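On the spark.storage.memoryFraction 0.9 setting above: in the Spark 1.x line used in this thread, the heap share available for cached blocks is computed as the JVM's max heap times spark.storage.memoryFraction (default 0.6) times spark.storage.safetyFraction (default 0.9). A minimal Python sketch of that arithmetic, assuming a hypothetical 8 GiB executor heap; the helper name is ours, not Spark's:

```python
GIB = 1024 ** 3


def storage_memory_bytes(max_heap_bytes, memory_fraction=0.6, safety_fraction=0.9):
    """Approximate cache capacity in Spark 1.x: heap * memoryFraction * safetyFraction."""
    return int(max_heap_bytes * memory_fraction * safety_fraction)


heap = 8 * GIB  # hypothetical executor heap, as reported by the JVM (often a bit under -Xmx)
default_cache = storage_memory_bytes(heap)                     # 0.6 * 0.9 = 54% of heap
tuned_cache = storage_memory_bytes(heap, memory_fraction=0.9)  # 0.9 * 0.9 = 81% of heap
print(f"default: {default_cache / GIB:.2f} GiB, with 0.9: {tuned_cache / GIB:.2f} GiB")
# prints "default: 4.32 GiB, with 0.9: 6.48 GiB"
```

Raising the fraction to 0.9 grows the cache from about 54% to about 81% of the heap, but in 1.x that also leaves less room for shuffle buffers (spark.shuffle.memoryFraction, default 0.2) and ordinary task objects, so GC pressure is worth watching.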
From: Timothy Chen t...@mesosphere.io
To: Michael V Le/Watson/IBM@IBMUS
Cc: user user@spark.apache.org
Date: 01/10/2015 04:31 AM
Subject: Re: Data locality running Spark on Mesos
Hi Michael,
I see you capped
From: Tim Chen t...@mesosphere.io
To: Michael V Le/Watson/IBM@IBMUS
Cc: user user@spark.apache.org
Date: 01/08/2015 03:04 PM
Subject: Re: Data locality running Spark on Mesos
How did you run this benchmark, and is there an open version I can try it with?
And what are your configurations, e.g. spark.locality.wait?
Tim
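Tim's question about spark.locality.wait goes to how Spark picks a locality level: it uses delay scheduling, keeping a task set at its most local level and relaxing to the next level only after spark.locality.wait (3 seconds by default) passes without a launch. The Python below is a deliberately simplified model of that idea, not Spark's TaskSetManager code: it assumes one wait for every level (Spark also has per-level overrides such as spark.locality.wait.node) and omits the NO_PREF level. The function name is hypothetical.

```python
# Locality levels from most to least local (Spark's NO_PREF level is omitted here).
LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]


def allowed_level(valid_levels, ms_since_last_launch, locality_wait_ms=3000):
    """Return the most relaxed level a task set may use in this simplified model.

    Every full locality_wait_ms that passes without a task launch unlocks the
    next, less local level. A wait of 0 (or less) means no waiting at all.
    """
    if locality_wait_ms <= 0:
        return valid_levels[-1]
    steps = int(ms_since_last_launch // locality_wait_ms)
    return valid_levels[min(steps, len(valid_levels) - 1)]


for elapsed_ms in (0, 2500, 3000, 7000, 12000):
    print(elapsed_ms, allowed_level(LEVELS, elapsed_ms))
# prints: 0 PROCESS_LOCAL, 2500 PROCESS_LOCAL, 3000 NODE_LOCAL, 7000 RACK_LOCAL, 12000 ANY
```

In this model a larger wait holds tasks back longer for a local slot, while a wait of 0 jumps straight to ANY, mirroring how a zero wait disables waiting in Spark.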
On Thu, Jan 8
in the fine-grained case.
BTW: I'm using Spark 1.1.0 and Mesos 0.20.0.
Thanks,
Mike
especially in coarse-grained mode, since the executors
supposedly do not go away until the job completes.
Any ideas?
Thanks,
Mike
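One way to see which locality levels tasks actually ran at, per stage, is Spark's event log (spark.eventLog.enabled=true), which writes one JSON event per line. The sketch below tallies the Locality field of each SparkListenerTaskEnd event by Stage ID; as far as we know these field names match Spark 1.x logs, but check them against your own log file. The function name is ours and the sample lines are hypothetical and trimmed.

```python
import json
from collections import Counter, defaultdict


def locality_by_stage(lines):
    """Count task locality levels per stage from event-log lines (assumed format)."""
    counts = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("Event") != "SparkListenerTaskEnd":
            continue  # only task-end events carry the final task locality
        counts[event["Stage ID"]][event["Task Info"]["Locality"]] += 1
    return counts


# Hypothetical, trimmed event-log lines for illustration only.
sample = [
    '{"Event": "SparkListenerTaskEnd", "Stage ID": 0, "Task Info": {"Locality": "NODE_LOCAL"}}',
    '{"Event": "SparkListenerTaskEnd", "Stage ID": 0, "Task Info": {"Locality": "ANY"}}',
    '{"Event": "SparkListenerTaskEnd", "Stage ID": 1, "Task Info": {"Locality": "PROCESS_LOCAL"}}',
    '{"Event": "SparkListenerStageCompleted"}',
]

for stage, levels in sorted(locality_by_stage(sample).items()):
    print(stage, dict(levels))
```

For a real log, pass an open file object in place of the sample list; a stage full of ANY tasks in coarse-grained mode would point at the scheduler giving up on locality rather than at missing executors.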
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Data-locality-running-Spark-on-Mesos-tp21041.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Performance-of-Akka-or-TCP-Socket-input-sources-vs-HDFS-Data-locality-in-Spark-Streaming-tp7317.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
fault tolerance, and
the ability to checkpoint and recover even if the master fails.
Cheers,
Nilesh