How to set executor num on spark on yarn

2014-09-15 Thread hequn cheng
hi~ I want to set the executor number to 16, but strangely the executor-cores
setting seems to affect the executor number on Spark on YARN. I don't know why,
or how to set the executor number reliably.
=
./bin/spark-submit --class com.hequn.spark.SparkJoins \
--master yarn-cluster \
--num-executors 16 \
--driver-memory 2g \
--executor-memory 10g \
*--executor-cores 4 \*
/home/sparkjoins-1.0-SNAPSHOT.jar

The UI shows there are *7 executors*
=
./bin/spark-submit --class com.hequn.spark.SparkJoins \
--master yarn-cluster \
--num-executors 16 \
--driver-memory 2g \
--executor-memory 10g \
*--executor-cores 2 \*
/home/sparkjoins-1.0-SNAPSHOT.jar

The UI shows there are *9 executors*
=
./bin/spark-submit --class com.hequn.spark.SparkJoins \
--master yarn-cluster \
--num-executors 16 \
--driver-memory 2g \
--executor-memory 10g \
*--executor-cores 1 \*
/home/sparkjoins-1.0-SNAPSHOT.jar

The UI shows there are *9 executors*
==
The cluster contains 16 nodes. Each node has 64G of RAM.
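
For reference, the flags above map onto Spark configuration properties; a rough
sketch in application code (normally these go into spark-defaults.conf or --conf,
and some of them must be set before the JVMs start, so this is illustrative only):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: the spark-submit flags above correspond to these properties.
// YARN grants only as many containers as per-node memory and vcore limits
// allow, so the UI can show fewer executors than requested.
val conf = new SparkConf()
  .setAppName("SparkJoins")
  .set("spark.executor.instances", "16")  // --num-executors 16
  .set("spark.executor.memory", "10g")    // --executor-memory 10g
  .set("spark.executor.cores", "4")       // --executor-cores 4
val sc = new SparkContext(conf)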


Re: Hadoop 2.3 Centralized Cache vs RDD

2014-05-16 Thread hequn cheng
I tried centralized cache step by step following the official Apache Hadoop
website, but it seems centralized cache doesn't work.
See:
http://stackoverflow.com/questions/22293358/centralized-cache-failed-in-hadoop-2-3
Has anyone succeeded?


2014-05-15 5:30 GMT+08:00 William Kang weliam.cl...@gmail.com:

 Hi,
 Any comments or thoughts on the implications of the newly released
 centralized cache feature in Hadoop 2.3? How different is it from RDD?

 Many thanks.


 Cao



Re: RDD usage

2014-03-24 Thread hequn cheng
points.foreach(p => p.y = another_value) will return a new modified RDD.
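
Since an RDD itself is immutable, the transformation-style way to get the same
effect is to build a new RDD with map(); a minimal sketch, assuming the
AppDataPoint class quoted below:

// Produce a new RDD whose elements carry the modified field, instead of
// mutating elements in place; another_value is whatever value was assigned above.
val updatedPoints = points.map(p => AppDataPoint(another_value, p.x))
updatedPoints.cache()  // cache it if the new RDD will be reused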


2014-03-24 18:13 GMT+08:00 Chieh-Yen r01944...@csie.ntu.edu.tw:

 Dear all,

 I have a question about the usage of RDD.
 I implemented a class called AppDataPoint; it looks like this:

 case class AppDataPoint(input_y : Double, input_x : Array[Double]) extends
 Serializable {
   var y : Double = input_y
   var x : Array[Double] = input_x
   ..
 }
 Furthermore, I created the RDD with the following function:

 def parsePoint(line: String): AppDataPoint = {
   /* Some related works for parsing */
   ..
 }

 Assume the RDD is called points:

 val lines = sc.textFile(inputPath, numPartition)
 var points = lines.map(parsePoint _).cache()

 The question is that I tried to modify the values in this RDD with the
 operation:

 points.foreach(p => p.y = another_value)

 The operation works.
 The system shows no warning or error message, and the results are right.
 I wonder whether modifying an RDD like this is a correct and in fact
 workable design.
 The usage guide says that an RDD is immutable; is there any suggestion?

 Thanks a lot.

 Chieh-Yen Lin



Re: Reply: RDD usage

2014-03-24 Thread hequn cheng
First question:
If you save your modified RDD like this:
points.map(p => { p.y = another_value; p }).collect() or
points.map(p => { p.y = another_value; p }).saveAsTextFile(...)
the modified RDD will be materialized, and this will not use any worker's
memory.
If you have more transformations after the map(), Spark will pipeline
all transformations and build a DAG. Very little memory will be used in
this stage and the memory will be freed soon.
Only cache() will persist your RDD in memory for a long time.
Second question:
Once an RDD is created, it cannot be changed, because RDDs are immutable. You
can only create a new RDD from an existing RDD or from the file system.
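
A minimal sketch of that behaviour, using the parsePoint/AppDataPoint
definitions quoted below (outputPath is hypothetical):

// Transformations are lazy and pipelined; data is only computed when an action
// runs, and only cache()/persist() keeps the result in executor memory afterwards.
val lines = sc.textFile(inputPath)
val points = lines.map(parsePoint _)                              // lazy
val shifted = points.map(p => AppDataPoint(another_value, p.x))   // lazy, pipelined with the map above
shifted.saveAsTextFile(outputPath)  // action: the whole pipeline runs, nothing is retained
shifted.cache()                     // marks the RDD to be kept in memory when the next action computes it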


2014-03-25 9:45 GMT+08:00 林武康 vboylin1...@gmail.com:

  Hi hequn, a related question: does that mean the memory usage will be
 doubled? And furthermore, if the compute function of an rdd is not
 idempotent, the rdd will change while the job is running, is that right?
  --
 From: hequn cheng chenghe...@gmail.com
 Sent: 2014/3/25 9:35
 To: user@spark.apache.org
 Subject: Re: RDD usage

 points.foreach(p => p.y = another_value) will return a new modified RDD.


 2014-03-24 18:13 GMT+08:00 Chieh-Yen r01944...@csie.ntu.edu.tw:

  Dear all,

 I have a question about the usage of RDD.
 I implemented a class called AppDataPoint; it looks like this:

 case class AppDataPoint(input_y : Double, input_x : Array[Double])
 extends Serializable {
   var y : Double = input_y
   var x : Array[Double] = input_x
   ..
 }
 Furthermore, I created the RDD with the following function:

 def parsePoint(line: String): AppDataPoint = {
   /* Some related works for parsing */
   ..
 }

 Assume the RDD is called points:

 val lines = sc.textFile(inputPath, numPartition)
 var points = lines.map(parsePoint _).cache()

 The question is that I tried to modify the values in this RDD with the
 operation:

 points.foreach(p => p.y = another_value)

 The operation works.
 The system shows no warning or error message, and the results are right.
 I wonder whether modifying an RDD like this is a correct and in fact
 workable design.
 The usage guide says that an RDD is immutable; is there any suggestion?

 Thanks a lot.

 Chieh-Yen Lin





Re: What's the lifecycle of an rdd? Can I control it?

2014-03-19 Thread hequn cheng
Use persist() and unpersist().
unpersist(): marks the RDD as non-persistent and removes all blocks for it from
memory and disk.
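
A minimal sketch of the pattern (the input path and the computation are
hypothetical):

// Keep an RDD in memory while several jobs need it, then release it explicitly.
val rdd = sc.textFile("hdfs:///some/input").map(_.length).persist()
rdd.count()        // first job: computes the RDD and stores its blocks
rdd.reduce(_ + _)  // second job: reuses the stored blocks
rdd.unpersist()    // no later stage needs it, so drop its blocks from memory and disk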


2014-03-19 16:40 GMT+08:00 林武康 vboylin1...@gmail.com:

  Hi, can anyone tell me about the lifecycle of an rdd? I searched through
 the official website and still can't figure it out. Can I use an rdd in
 some stages and then destroy it to release memory, because no later
 stages will use this rdd any more? Is that possible?

 Thanks!

 Sincerely
 Lin wukang



subscribe

2014-03-10 Thread hequn cheng
hi


Re: SPARK_JAVA_OPTS not picked up by the application

2014-03-10 Thread hequn cheng
Have you sent spark-env.sh to the slave nodes?
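
If you want a per-application setting, one route is the extraJavaOptions
properties (a sketch, assuming Spark 1.0+ where they replace SPARK_JAVA_OPTS
for executor JVM options; the -Xss value is just an example):

import org.apache.spark.{SparkConf, SparkContext}

// Pass the stack-size option to every executor of this application only.
// Driver-side JVM options generally have to be set before the driver starts,
// e.g. with spark-submit --driver-java-options, not from inside the application.
val conf = new SparkConf()
  .setAppName("MyApp")  // hypothetical application name
  .set("spark.executor.extraJavaOptions", "-Xss4m")
val sc = new SparkContext(conf)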


2014-03-11 6:47 GMT+08:00 Linlin linlin200...@gmail.com:


 Hi,

 I have a java option (-Xss) specified in SPARK_JAVA_OPTS in
 spark-env.sh. I noticed that after stopping/restarting the spark cluster the
 master/worker daemons have the setting applied, but it is not being
 propagated to the executors, and my application continues to behave the same.
 I am not sure if there is a way to specify it through SparkConf, like
 SparkConf.set(), and what the correct way is to set this up for a
 particular spark application.

 Thank you!




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/SPARK-JAVA-OPTS-not-picked-up-by-the-application-tp2483.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.