How to set executor num on spark on yarn
Hi, I want to set the executor number to 16, but strangely the executor cores seem to affect the executor number on Spark on YARN. I don't know why this happens, or how to set the executor number reliably.

./bin/spark-submit --class com.hequn.spark.SparkJoins \
  --master yarn-cluster \
  --num-executors 16 \
  --driver-memory 2g \
  --executor-memory 10g \
  --executor-cores 4 \
  /home/sparkjoins-1.0-SNAPSHOT.jar

The UI shows there are 7 executors.

./bin/spark-submit --class com.hequn.spark.SparkJoins \
  --master yarn-cluster \
  --num-executors 16 \
  --driver-memory 2g \
  --executor-memory 10g \
  --executor-cores 2 \
  /home/sparkjoins-1.0-SNAPSHOT.jar

The UI shows there are 9 executors.

./bin/spark-submit --class com.hequn.spark.SparkJoins \
  --master yarn-cluster \
  --num-executors 16 \
  --driver-memory 2g \
  --executor-memory 10g \
  --executor-cores 1 \
  /home/sparkjoins-1.0-SNAPSHOT.jar

The UI shows there are 9 executors.

The cluster contains 16 nodes. Each node has 64G RAM.
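A plausible explanation (an assumption here, not confirmed by the YARN logs) is that YARN grants only as many executor containers as each node's configured memory and vCores allow, so `--executor-cores` can lower the count even when `--num-executors` asks for more. A minimal sketch of that arithmetic, with made-up resource numbers:

```scala
// Sketch: why --executor-cores can change how many executors YARN grants.
// The numbers are assumptions; real YARN accounting also adds a per-executor
// memory overhead (spark.yarn.executor.memoryOverhead) and depends on the
// yarn.nodemanager.resource.* settings on each node.
def maxExecutors(nodes: Int, nodeMemGb: Int, nodeCores: Int,
                 execMemGb: Int, execCores: Int, requested: Int): Int = {
  // How many containers fit on one node, whichever resource runs out first:
  val perNode = math.min(nodeMemGb / execMemGb, nodeCores / execCores)
  math.min(requested, nodes * perNode)
}

// Hypothetical 4-node cluster, 64g / 8 vCores per node, 10g executors:
// memory allows 6 per node, but 4 cores per executor allows only 2 per node,
// so only 8 of the 16 requested executors are granted.
val granted = maxExecutors(nodes = 4, nodeMemGb = 64, nodeCores = 8,
                           execMemGb = 10, execCores = 4, requested = 16)
```

With `execCores = 2` instead, the same sketch grants all 16, which mirrors the pattern in the runs above: fewer cores per executor leaves room for more executors per node.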
Re: Hadoop 2.3 Centralized Cache vs RDD
I tried centralized cache step by step following the Apache Hadoop official website, but it seems that centralized cache doesn't work. See: http://stackoverflow.com/questions/22293358/centralized-cache-failed-in-hadoop-2-3 . Has anyone succeeded?

2014-05-15 5:30 GMT+08:00 William Kang weliam.cl...@gmail.com:
Hi,
Any comments or thoughts on the implications of the newly released centralized cache feature from Hadoop 2.3? How different is it from RDD? Many thanks.
Cao
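For anyone retrying this, the basic sequence with the Hadoop 2.3 `hdfs cacheadmin` commands looks like the following. The pool name and path are placeholders, and a common silent failure mode is the datanode not being allowed to lock memory:

```shell
# Placeholder pool and path. Caching silently does nothing unless
# dfs.datanode.max.locked.memory is set > 0 in hdfs-site.xml and the
# datanode user's "ulimit -l" (memlock) limit is at least that large.
hdfs cacheadmin -addPool testPool
hdfs cacheadmin -addDirective -path /user/test/data -pool testPool

# Verify: BYTES_CACHED should grow once the datanodes pick the directive up.
hdfs cacheadmin -listDirectives -stats
```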
Re: RDD usage
points.foreach(p => p.y = another_value) will return a new modified RDD.

2014-03-24 18:13 GMT+08:00 Chieh-Yen r01944...@csie.ntu.edu.tw:
Dear all,
I have a question about the usage of RDD. I implemented a class called AppDataPoint; it looks like:

case class AppDataPoint(input_y : Double, input_x : Array[Double]) extends Serializable {
  var y : Double = input_y
  var x : Array[Double] = input_x
  ...
}

Furthermore, I created the RDD with the following function:

def parsePoint(line: String): AppDataPoint = {
  /* Some related work for parsing */
  ...
}

Assume the RDD is called points:

val lines = sc.textFile(inputPath, numPartition)
var points = lines.map(parsePoint _).cache()

The question is that I tried to modify the values in this RDD with:

points.foreach(p => p.y = another_value)

The operation works: there is no warning or error message from the system and the results are right. I wonder whether modifying an RDD like this is a correct and in fact workable design. The documentation says that RDDs are immutable; is there any suggestion?
Thanks a lot.
Chieh-Yen Lin
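For reference, the idiomatic immutable alternative is to derive a new dataset with map instead of mutating cached objects inside foreach. A minimal sketch, with a plain Scala Seq standing in for the RDD and a simplified version of the class:

```scala
// Sketch of the immutable alternative: map + copy produces new objects
// instead of mutating the cached ones. (A Seq stands in for the RDD;
// AppDataPoint is simplified to immutable fields.)
case class AppDataPoint(y: Double, x: Array[Double])

val points = Seq(AppDataPoint(1.0, Array(0.5)), AppDataPoint(2.0, Array(1.5)))

// foreach returns Unit; mutating p.y inside it is a side effect that
// Spark's model gives no guarantees about (e.g. after recomputation).
// Deriving a new collection leaves the original data intact:
val updated = points.map(p => p.copy(y = 0.0))
```

In Spark the same `points.map(...)` call yields a new RDD, which is what the immutability guarantee is about.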
Re: Reply: RDD usage
First question:
If you materialize your modified RDD like this:

points.map(p => { p.y = another_value; p }).collect()

or

points.map(p => { p.y = another_value; p }).saveAsTextFile(...)

the modified RDD will be materialized, and this will not use the workers' memory. If you have more transformations after the map(), Spark will pipeline all the transformations and build a DAG. Very little memory is used in this stage, and it is freed soon. Only cache() will persist your RDD in memory for a long time.

Second question:
Once an RDD is created, it cannot be changed, due to its immutability. You can only create a new RDD from an existing RDD or from the file system.

2014-03-25 9:45 GMT+08:00 林武康 vboylin1...@gmail.com:
Hi hequn, a related question: does that mean the memory usage will be doubled? And furthermore, if the compute function in an RDD is not idempotent, the RDD will change while the job is running, is that right?

From: hequn cheng chenghe...@gmail.com
Sent: 2014/3/25 9:35
To: user@spark.apache.org
Subject: Re: RDD usage

points.foreach(p => p.y = another_value) will return a new modified RDD.

2014-03-24 18:13 GMT+08:00 Chieh-Yen r01944...@csie.ntu.edu.tw:
Dear all,
I have a question about the usage of RDD. I implemented a class called AppDataPoint; it looks like:

case class AppDataPoint(input_y : Double, input_x : Array[Double]) extends Serializable {
  var y : Double = input_y
  var x : Array[Double] = input_x
  ...
}

Furthermore, I created the RDD with the following function:

def parsePoint(line: String): AppDataPoint = {
  /* Some related work for parsing */
  ...
}

Assume the RDD is called points:

val lines = sc.textFile(inputPath, numPartition)
var points = lines.map(parsePoint _).cache()

The question is that I tried to modify the values in this RDD with:

points.foreach(p => p.y = another_value)

The operation works: there is no warning or error message from the system and the results are right. I wonder whether modifying an RDD like this is a correct and in fact workable design. The documentation says that RDDs are immutable; is there any suggestion?
Thanks a lot.
Chieh-Yen Lin
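The pipelining point in the reply above can be illustrated without Spark: a Scala Iterator chains transformations lazily the way a pipelined Spark stage does, pulling elements one at a time instead of materializing each intermediate step. A sketch, not Spark's actual machinery:

```scala
// Sketch: lazy pipelining. Chained transformations on an Iterator run
// element-by-element only when an "action" consumes them, so no
// intermediate collection of a million elements is ever materialized.
var evaluated = 0
val pipelined = Iterator.range(0, 1000000)
  .map { i => evaluated += 1; i * 2 }  // transformation: nothing runs yet
  .filter(_ % 3 == 0)                  // still nothing

val firstTen = pipelined.take(10).toList // the "action": pulls 10 results
// Only the 28 source elements needed to produce 10 results were touched.
```

This is why the reply says very little memory is used while a pipelined stage runs; only cache() keeps the whole dataset resident.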
Re: What's the lifecycle of an rdd? Can I control it?
persist and unpersist.
unpersist: mark the RDD as non-persistent, and remove all blocks for it from memory and disk.

2014-03-19 16:40 GMT+08:00 林武康 vboylin1...@gmail.com:
Hi, can anyone tell me about the lifecycle of an RDD? I searched through the official website and still can't figure it out. Can I use an RDD in some stages and then destroy it to release memory, because no later stage will use the RDD any more? Is that possible? Thanks!
Sincerely,
Lin wukang
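A hypothetical sketch of that lifecycle in plain Scala (an analogy, not Spark's implementation): persist() computes and holds a result, reads reuse it while cached, and unpersist() drops it so the memory can be reclaimed.

```scala
// Analogy for rdd.persist()/rdd.unpersist(), not Spark internals.
// CachedResult is a made-up name for illustration.
class CachedResult[T](compute: () => T) {
  private var cached: Option[T] = None
  def persist(): this.type = { cached = Some(compute()); this }
  def get: T = cached.getOrElse(compute()) // recompute when not cached
  def unpersist(): Unit = { cached = None } // free the blocks
}
```

In Spark itself the pattern is the same: call rdd.persist() (or cache()) before the stages that reuse the RDD, and rdd.unpersist() once no later stage needs it; an unpersisted RDD can still be used, it is just recomputed from its lineage.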
subscribe
hi
Re: SPARK_JAVA_OPTS not picked up by the application
Have you sent spark-env.sh to the slave nodes?

2014-03-11 6:47 GMT+08:00 Linlin linlin200...@gmail.com:
Hi, I have a Java option (-Xss) specified in SPARK_JAVA_OPTS in spark-env.sh. I noticed that after stopping/restarting the Spark cluster, the master/worker daemons have the setting applied, but the setting is not propagated to the executors, and my application continues to behave the same. I am not sure if there is a way to specify it through SparkConf, like SparkConf.set(), and what the correct way is to set this up for a particular Spark application. Thank you!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SPARK-JAVA-OPTS-not-picked-up-by-the-application-tp2483.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
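For later Spark versions (1.0+), the per-application way to pass executor JVM flags is the spark.executor.extraJavaOptions property rather than the deprecated SPARK_JAVA_OPTS. A hedged example; the class name, jar name, and -Xss value are placeholders:

```shell
# Pass a JVM option such as -Xss to the executors of one application.
# spark.executor.extraJavaOptions (Spark 1.0+) replaces the older
# SPARK_JAVA_OPTS mechanism for executor-side JVM flags.
./bin/spark-submit --class com.example.MyApp \
  --master yarn-cluster \
  --conf "spark.executor.extraJavaOptions=-Xss4m" \
  my-app.jar
```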