Problem with Graphx and number of partitions

2016-08-31 Thread alvarobrandon
Hello everyone: I have a problem when setting the number of partitions inside GraphX with the ConnectedComponents function. When I launch the application with the default number of partitions everything runs smoothly. However, when I increase the number of partitions to 150, for example ( it happens
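Since the snippet is cut off before the code, here is a minimal sketch of how the partition count might be set when building the graph, assuming the edge list is loaded from a file (the path and the count of 150 are placeholders from the post):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

// Sketch: load an edge list with an explicit number of edge partitions,
// repartition the edges, and run ConnectedComponents.
object CCPartitions {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CCPartitions"))
    val graph = GraphLoader
      .edgeListFile(sc, "hdfs:///data/edges.txt", numEdgePartitions = 150)
      .partitionBy(PartitionStrategy.RandomVertexCut)
    val cc = graph.connectedComponents()
    println(cc.vertices.count())  // materialize the result
    sc.stop()
  }
}
```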

Scheduler Delay Time

2016-06-03 Thread alvarobrandon
Hello: I'm doing some instrumentation in Spark and I've realised that some of my tasks take a really long time to complete because of the Scheduler Delay Time. I submit the apps through spark-submit in a YARN cluster. I was wondering if this Delay time also takes into account the period between an

DAG of Spark Sort application spanning two jobs

2016-05-30 Thread alvarobrandon
I've written a very simple Sort Scala program with Spark. object Sort { def main(args: Array[String]): Unit = { if (args.length < 2) { System.err.println("Usage: Sort " + " []") System.exit(1) } val conf = new
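The snippet is truncated after `val conf = new`; a hedged reconstruction of how such a Sort program typically continues (the argument names and the key extraction are assumptions, since the original usage string lost its placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: read text records, sort them by their first field with
// sortByKey, and write the result back out. Paths come from args.
object Sort {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println("Usage: Sort <input> <output> [numPartitions]")
      System.exit(1)
    }
    val conf = new SparkConf().setAppName("Sort")
    val sc = new SparkContext(conf)
    val parts = if (args.length > 2) args(2).toInt else sc.defaultParallelism
    sc.textFile(args(0))
      .map(line => (line.split("\\s+")(0), line)) // key on the first field
      .sortByKey(numPartitions = parts)           // the sort triggers a shuffle
      .map(_._2)
      .saveAsTextFile(args(1))
    sc.stop()
  }
}
```

Note that sortByKey is a transformation whose shuffle plus the final save is what typically makes such a program span two jobs in the DAG.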

Reading Shuffle Data from highly loaded nodes

2016-05-09 Thread alvarobrandon
Hello everyone: I'm running an experiment in a Spark cluster where some of the machines are highly loaded with CPU-, memory- and network-consuming processes (let's call them straggler machines). Obviously the tasks on these machines take longer to execute than on other nodes of the cluster.

Problem with History Server

2016-04-13 Thread alvarobrandon
Hello: I'm using the history server to keep track of the applications I run in my cluster. I'm using Spark with YARN. When I run an application it finishes correctly, and even YARN says that it finished. This is the result of the YARN Resource Manager API {u'app': [{u'runningContainers': -1,
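For context, the settings that tie applications to the history server usually look like this in spark-defaults.conf (the HDFS path is an example; spark.history.fs.logDirectory must point at the same location as spark.eventLog.dir):

```properties
# spark-defaults.conf — event logging for the history server (example path)
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///spark-events
spark.history.fs.logDirectory    hdfs:///spark-events
```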

Dynamic allocation Spark

2016-02-26 Thread alvarobrandon
Hello everyone: I'm trying dynamic allocation in Spark with YARN. I followed these configuration steps: 1. Copied the spark-*-yarn-shuffle.jar to the NodeManager classpath: "cp /opt/spark/lib/spark-*-yarn-shuffle.jar /opt/hadoop/share/hadoop/yarn" 2. Added the shuffle service of
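The snippet cuts off at step 2; the NodeManager side of that step is normally registered in yarn-site.xml like this (a sketch of the standard setup, not necessarily the poster's exact file):

```xml
<!-- yarn-site.xml: register Spark's external shuffle service, which is
     shipped in the spark-*-yarn-shuffle.jar copied in step 1 -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```

On the Spark side this is paired with `spark.shuffle.service.enabled true` and `spark.dynamicAllocation.enabled true` in spark-defaults.conf, followed by a NodeManager restart.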

Re: No event log in /tmp/spark-events

2016-02-26 Thread alvarobrandon
Just write /tmp/sparkserverlog without the file part. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/No-event-log-in-tmp-spark-events-tp26318p26343.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

SPARK REST API on YARN

2016-02-18 Thread alvarobrandon
Hello: I wanted to access the REST API (http://spark.apache.org/docs/latest/monitoring.html#rest-api) of Spark to monitor my jobs. However, I'm running my Spark apps on YARN. When I try to make a request to http://localhost:4040/api/v1 as the documentation says, I don't get any response. My
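On YARN, port 4040 lives on the driver host (client mode) while the app runs; completed apps are served by the history server instead, keyed by application id. A sketch of the two URL shapes (host names and the app id are hypothetical placeholders):

```shell
# Hypothetical YARN application id and hosts — substitute your own.
APP_ID="application_1455720000000_0001"
RUNNING_URL="http://driver-host:4040/api/v1/applications/${APP_ID}/jobs"
HISTORY_URL="http://history-host:18080/api/v1/applications/${APP_ID}/jobs"
echo "$RUNNING_URL"
echo "$HISTORY_URL"
# fetch with, e.g.: curl -s "$HISTORY_URL"
```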

Re: Error when executing Spark application on YARN

2016-02-18 Thread alvarobrandon
Found the solution. I was pointing to the wrong hadoop conf directory. I feel so stupid :P
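For anyone hitting the same issue, the fix usually amounts to pointing spark-env.sh at the directory that actually holds yarn-site.xml and core-site.xml (the path below is an example, not the poster's):

```shell
# spark-env.sh — tell Spark where the Hadoop/YARN client configs live.
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export YARN_CONF_DIR=/opt/hadoop/etc/hadoop
echo "$HADOOP_CONF_DIR"
```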

Re: Error when executing Spark application on YARN

2016-02-18 Thread alvarobrandon
1. It happens to all the classes inside the jar package. 2. I didn't do any changes: I have three nodes, one master and two slaves in the conf/slaves file; in spark-env.sh I just set the HADOOP_CONF_DIR parameter; in spark-defaults.conf I didn't change anything. 3. The

Error when executing Spark application on YARN

2016-02-17 Thread alvarobrandon
Hello: I'm trying to launch an application in a YARN cluster with the following command: /opt/spark/bin/spark-submit --class com.abrandon.upm.GenerateKMeansData --master yarn --deploy-mode client /opt/spark/BenchMark-1.0-SNAPSHOT.jar kMeans 5 4 5 0.9 8 The last bit after the jar file

Monitoring Spark HDFS Reads and Writes

2015-12-30 Thread alvarobrandon
-76de587560c0, blockid: BP-189543387-138.100.13.81-1450715936956:blk_1073741837_1013, duration: 2619119 hadoop-alvarobrandon-datanode-usuariop81.fi.upm.es.log:2015-12-21 18:29:15,429 INFO org.apache.hadoop.hdfs.server.d Is there any trace of these operations to be found in any log? Thanks
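Besides grepping datanode logs, Spark's own task metrics expose per-task HDFS bytes read and written. A sketch of a listener doing this (the metrics layout follows the Spark 2.x API and can differ in older versions):

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Sketch: log per-task input/output byte counts from the task metrics,
// as an alternative to tracing reads/writes in the HDFS datanode logs.
class IOListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null) {
      println(s"stage=${taskEnd.stageId} task=${taskEnd.taskInfo.taskId} " +
        s"bytesRead=${m.inputMetrics.bytesRead} " +
        s"bytesWritten=${m.outputMetrics.bytesWritten}")
    }
  }
}
```

It would be registered on the driver with `sc.addSparkListener(new IOListener)` before running the job.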

Is there anyway to log properties from a Spark application

2015-12-28 Thread alvarobrandon
Hello: I was wondering if it's possible to log properties from Spark applications like spark.yarn.am.memory, spark.driver.cores, spark.reducer.maxSizeInFlight without having to access the SparkConf object programmatically. I'm trying to find some kind of log file that has traces of the execution
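One relevant knob here is the real property spark.logConf, which makes SparkContext log the effective configuration at INFO level on startup; a minimal sketch combining it with an explicit dump:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: spark.logConf=true writes the effective SparkConf into the
// driver log at startup; getAll dumps the same view explicitly.
object LogConf {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LogConf").set("spark.logConf", "true")
    val sc = new SparkContext(conf)
    sc.getConf.getAll.sortBy(_._1).foreach { case (k, v) => println(s"$k=$v") }
    sc.stop()
  }
}
```

Note that getAll only shows explicitly set values; properties left at their defaults (like an unset spark.reducer.maxSizeInFlight) will not appear.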