Poor HDFS Data Locality on Spark-EC2

2015-08-04 Thread Jerry Lam
Hi Spark users and developers, I have been trying to use spark-ec2. After I launched the Spark cluster (1.4.1) with ephemeral HDFS (using Hadoop 2.4.0), I tried to execute a job whose data is stored in the ephemeral HDFS. No matter what I tried, there is no data locality at

data locality in spark

2015-04-27 Thread Grandl Robert
Hi guys, I am running some SQL queries, but all my tasks are reported as either NODE_LOCAL or PROCESS_LOCAL. In the Hadoop world, the reduce tasks are RACK or NON_RACK LOCAL because they have to aggregate data from multiple hosts. However, in Spark even the aggregation stages are reported

Re: Data locality running Spark on Mesos

2015-01-11 Thread Michael V Le
time roughly the same) spark.storage.memoryFraction 0.9 Mike From: Timothy Chen t...@mesosphere.io To: Michael V Le/Watson/IBM@IBMUS Cc: user user@spark.apache.org Date: 01/10/2015 04:31 AM Subject:Re: Data locality running Spark on Mesos Hi Michael, I see you capped

Re: Data locality running Spark on Mesos

2015-01-10 Thread Timothy Chen
@IBMUS Cc: user user@spark.apache.org Date: 01/08/2015 03:04 PM Subject: Re: Data locality running Spark on Mesos How did you run this benchmark, and is there an open version I can try it with? And what are your configurations, like spark.locality.wait, etc.? Tim On Thu, Jan 8

Re: Data locality running Spark on Mesos

2015-01-09 Thread Michael V Le
in the fine-grained case. BTW: I'm using Spark ver 1.1.0 and Mesos ver 0.20.0. Thanks, Mike From: Tim Chen t...@mesosphere.io To: Michael V Le/Watson/IBM@IBMUS Cc: user user@spark.apache.org Date: 01/08/2015 03:04 PM Subject: Re: Data locality running Spark on Mesos How

Data locality running Spark on Mesos

2015-01-08 Thread mvle
, especially for coarse-grained mode as the executors supposedly do not go away until job completion. Any ideas? Thanks, Mike -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Data-locality-running-Spark-on-Mesos-tp21041.html Sent from the Apache Spark User

Re: Data locality running Spark on Mesos

2015-01-08 Thread Tim Chen
do not go away until job completion. Any ideas? Thanks, Mike -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Data-locality-running-Spark-on-Mesos-tp21041.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Performance of Akka or TCP Socket input sources vs HDFS: Data locality in Spark Streaming

2014-06-10 Thread Nilesh Chakraborty
in context: http://apache-spark-user-list.1001560.n3.nabble.com/Performance-of-Akka-or-TCP-Socket-input-sources-vs-HDFS-Data-locality-in-Spark-Streaming-tp7317.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Performance of Akka or TCP Socket input sources vs HDFS: Data locality in Spark Streaming

2014-06-10 Thread Michael Cutler
fault tolerance, and the ability to checkpoint and recover even if master fails. Cheers, Nilesh -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Performance-of-Akka-or-TCP-Socket-input-sources-vs-HDFS-Data-locality-in-Spark-Streaming-tp7317.html Sent