Re: Understanding Spark Memory distribution

2015-03-30 Thread giive chen
Hi Ankur

If you are using standalone mode, your config is wrong. You should use export
SPARK_DAEMON_MEMORY=xxx in conf/spark-env.sh. At least it works on my
Spark 1.3.0 standalone mode machine.
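For example (the value below is only an illustration, not a recommendation),
conf/spark-env.sh would contain a line like:

    export SPARK_DAEMON_MEMORY=4g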

BTW, SPARK_DRIVER_MEMORY is used in YARN mode, and it looks like standalone
mode does not use this config.
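In standalone mode the driver heap is more commonly sized at submit time or via
the spark.driver.memory property; a minimal sketch (the value is only an
example):

    ./bin/spark-submit --driver-memory 24g ...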

To debug this, run

    ps auxw | grep org.apache.spark.deploy.master.[M]aster

on the master machine. You will see the -Xmx and -Xms options in the output.

Wisely Chen

On Mon, Mar 30, 2015 at 3:55 AM, Ankur Srivastava 
ankur.srivast...@gmail.com wrote:

 Hi Wisely,

 I am running on Amazon EC2 instances, so I don't suspect the hardware.
 Moreover, my other pipelines run successfully; only this one, which involves
 broadcasting a large object, fails.

 My spark-env.sh settings are:

 SPARK_MASTER_IP=MASTER-IP

 SPARK_LOCAL_IP=LOCAL-IP

 SPARK_DRIVER_MEMORY=24g

 SPARK_WORKER_MEMORY=28g

 SPARK_EXECUTOR_MEMORY=26g

 SPARK_WORKER_CORES=8

 My spark-defaults.conf settings are:

 spark.eventLog.enabled   true

 spark.eventLog.dir   /srv/logs/

 spark.serializer org.apache.spark.serializer.KryoSerializer

 spark.kryo.registrator
 com.test.utils.KryoSerializationRegistrator

 spark.executor.extraJavaOptions  -verbose:gc -XX:+PrintGCDetails
 -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError
 -XX:HeapDumpPath=/srv/logs/ -XX:+UseG1GC

 spark.shuffle.consolidateFiles   true

 spark.shuffle.manager    sort

 spark.shuffle.compress   true

 spark.rdd.compress   true
 Thanks
 Ankur

 On Sat, Mar 28, 2015 at 7:57 AM, Wisely Chen wiselyc...@appier.com
 wrote:

 Hi Ankur

 If your hardware is OK, it looks like a config problem. Can you show me
 your spark-env.sh config or JVM config?

 Thanks

 Wisely Chen

 2015-03-28 15:39 GMT+08:00 Ankur Srivastava ankur.srivast...@gmail.com:

 Hi Wisely,
 I have 26 GB for the driver, and the master is running on m3.2xlarge machines.

 I see OOM errors on the workers even though they are running with 26 GB of
 memory.

 Thanks

 On Fri, Mar 27, 2015, 11:43 PM Wisely Chen wiselyc...@appier.com
 wrote:

 Hi

 In a broadcast, Spark will collect the whole 3 GB object on the master node
 and broadcast it to each slave. It is a very common situation that the master
 node doesn't have enough memory.
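 A rough sketch of why this bites (illustrative only, not from the original
 mail; sc, lines and bigLookup are hypothetical stand-ins, with bigLookup
 playing the role of the ~3 GB object):

     // Assumed imports: org.apache.spark.api.java.*,
     // org.apache.spark.api.java.function.Function,
     // org.apache.spark.broadcast.Broadcast, java.util.Map
     //
     // The driver must hold a full copy of the value passed to broadcast()
     // before it can be shipped to the executors, so its heap needs headroom
     // for it on top of everything else it already holds.
     final Broadcast<Map<String, String>> bigBroadcast = sc.broadcast(bigLookup);
     JavaRDD<String> resolved = lines.map(new Function<String, String>() {
         public String call(String key) {
             // Each executor reads from its local copy of the broadcast value.
             return bigBroadcast.value().get(key);
         }
     });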

 What are your master node settings?

 Wisely Chen

 Ankur Srivastava ankur.srivast...@gmail.com wrote on Saturday, March 28, 2015:

 I have increased spark.storage.memoryFraction to 0.4, but I still get OOM
 errors on the Spark executor nodes:


 15/03/27 23:19:51 INFO BlockManagerMaster: Updated info of block broadcast_5_piece10

 15/03/27 23:19:51 INFO TorrentBroadcast: Reading broadcast variable 5 took 2704 ms

 15/03/27 23:19:52 INFO MemoryStore: ensureFreeSpace(672530208) called with curMem=2484698683, maxMem=9631778734

 15/03/27 23:19:52 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 641.4 MB, free 6.0 GB)

 15/03/27 23:34:02 WARN AkkaUtils: Error sending message in 1 attempts

 java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
         at scala.concurrent.Await$.result(package.scala:107)
         at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:187)
         at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:407)

 15/03/27 23:34:02 ERROR Executor: Exception in task 7.0 in stage 2.0 (TID 4007)

 java.lang.OutOfMemoryError: GC overhead limit exceeded
         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1986)
         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)

 Thanks

 Ankur

 On Fri, Mar 27, 2015 at 2:52 PM, Ankur Srivastava 
 ankur.srivast...@gmail.com wrote:

 Hi All,

 I am running a Spark cluster on EC2 instances of type m3.2xlarge. I have given
 26 GB of memory with all 8 cores to my executors. I can see that in the logs
 too:

 15/03/27 21:31:06 INFO AppClient$ClientActor: Executor added: app-20150327213106-/0 on worker-20150327212934-10.x.y.z-40128 (10.x.y.z:40128) with 8 cores

 I am not caching any RDD, so I have set spark.storage.memoryFraction to 0.2.
 On the Spark UI, under the Executors tab, I can see Memory Used is 0.0 / 4.5 GB.

 I am now confused by these logs:

 15/03/27 21:31:08 INFO BlockManagerMasterActor: Registering block manager 10.77.100.196:58407 with 4.5 GB RAM, BlockManagerId(4, 10.x.y.z, 58407)
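 (A hedged aside, not from the original mail: under the Spark 1.x storage memory
 formula, the storage pool is roughly

     executor heap * spark.storage.memoryFraction * spark.storage.safetyFraction (0.9 by default)
     = 26 GB * 0.2 * 0.9 ≈ 4.7 GB

 which, allowing for the JVM reporting slightly less than -Xmx, lines up with
 the 4.5 GB registered above.)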

 I am broadcasting a large object of 3 GB, and after that, when I am creating
 an RDD, I see logs which show this 4.5 GB of memory getting full and then I

RE: Announcing Spark 1.0.0

2014-05-30 Thread giive chen
Great work!
On May 30, 2014 10:15 PM, Ian Ferreira ianferre...@hotmail.com wrote:

  Congrats

 Sent from my Windows Phone
  --
 From: Dean Wampler deanwamp...@gmail.com
 Sent: 5/30/2014 6:53 AM
 To: user@spark.apache.org
 Subject: Re: Announcing Spark 1.0.0

   Congratulations!!


 On Fri, May 30, 2014 at 5:12 AM, Patrick Wendell pwend...@gmail.com
 wrote:

 I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0
 is a milestone release as the first in the 1.0 line of releases,
 providing API stability for Spark's core interfaces.

 Spark 1.0.0 is Spark's largest release ever, with contributions from
 117 developers. I'd like to thank everyone involved in this release -
 it was truly a community effort with fixes, features, and
 optimizations contributed from dozens of organizations.

 This release expands Spark's standard libraries, introducing a new SQL
 package (SparkSQL) which lets users integrate SQL queries into
 existing Spark workflows. MLlib, Spark's machine learning library, is
 expanded with sparse vector support and several new algorithms. The
 GraphX and Streaming libraries also introduce new features and
 optimizations. Spark's core engine adds support for secured YARN
 clusters, a unified tool for submitting Spark applications, and
 several performance and stability improvements. Finally, Spark adds
 support for Java 8 lambda syntax and improves coverage of the Java and
 Python APIs.

 Those features only scratch the surface - check out the release notes here:
 http://spark.apache.org/releases/spark-release-1-0-0.html

 Note that since release artifacts were posted recently, certain
 mirrors may not have working downloads for a few hours.

 - Patrick




  --
 Dean Wampler, Ph.D.
 Typesafe
 @deanwampler
 http://typesafe.com
 http://polyglotprogramming.com



Re: Java RDD structure for Matrix predict?

2014-05-27 Thread giive chen
Hi Sandeep

I think you should use  testRatings.mapToPair instead of  testRatings.map.

So the code should be


JavaPairRDD<Integer, Integer> usersProducts = testRatings.mapToPair(
    new PairFunction<Rating, Integer, Integer>() {
        public Tuple2<Integer, Integer> call(Rating r) throws Exception {
            return new Tuple2<Integer, Integer>(r.user(), r.product());
        }
    }
);

It works on my side.
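
For completeness, a hedged sketch of the follow-on predict call (userProducts
is just a local name here; the sketch mirrors the Tuple2<Object, Object>
pattern used in the Spark 1.x MLlib Java examples, and model and testRatings
are the objects from the quoted mail):

    // Assumed imports: org.apache.spark.api.java.JavaRDD,
    // org.apache.spark.api.java.function.Function,
    // org.apache.spark.mllib.recommendation.Rating, scala.Tuple2
    JavaRDD<Tuple2<Object, Object>> userProducts = testRatings.map(
        new Function<Rating, Tuple2<Object, Object>>() {
            public Tuple2<Object, Object> call(Rating r) {
                return new Tuple2<Object, Object>(r.user(), r.product());
            }
        });

    // predict() takes the underlying Scala RDD; convert back to a JavaRDD after.
    JavaRDD<Rating> predictions =
        model.predict(JavaRDD.toRDD(userProducts)).toJavaRDD();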


Wisely Chen


On Wed, May 28, 2014 at 6:27 AM, Sandeep Parikh sand...@clusterbeep.org wrote:

 I've got a trained MatrixFactorizationModel via ALS.train(...) and now I'm
 trying to use it to predict some ratings like so:

 JavaRDD<Rating> predictions = model.predict(usersProducts.rdd());

 Where usersProducts is built from an existing Ratings dataset like so:

 JavaPairRDD<Integer, Integer> usersProducts = testRatings.map(
     new PairFunction<Rating, Integer, Integer>() {
         public Tuple2<Integer, Integer> call(Rating r) throws Exception {
             return new Tuple2<Integer, Integer>(r.user(), r.product());
         }
     }
 );

 The problem is that model.predict(...) doesn't like usersProducts, claiming
 that the method doesn't accept an RDD of type Tuple2; however, the docs show
 the method signature as follows:

 def predict(usersProducts: RDD[(Int, Int)]): RDD[Rating]

 Am I missing something? The JavaRDD is just a list of Tuple2 elements, which
 would match the method signature, but the compiler is complaining.

 Thanks!




Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-04-15 Thread giive chen
Hi Prasad

Sorry for missing your reply.
Here it is: https://gist.github.com/thegiive/10791823

Wisely Chen


On Fri, Apr 4, 2014 at 11:57 PM, Prasad ramachandran.pra...@gmail.com wrote:

 Hi Wisely,
 Could you please post your pom.xml here?

 Thanks






Re: Lost an executor error - Jobs fail

2014-04-14 Thread giive chen
Hi Praveen

What is your spark.local.dir config? Do all your workers have this directory,
and do they all have the right permissions on it?

I think this is the reason for your error.
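
As a sketch only (the paths below are examples, not a recommendation),
spark.local.dir is usually pointed at scratch space that exists and is writable
on every worker, e.g. in spark-defaults.conf:

    spark.local.dir    /mnt/spark,/mnt2/spark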

Wisely Chen


On Mon, Apr 14, 2014 at 9:29 PM, Praveen R prav...@sigmoidanalytics.com wrote:

 I had the below error while running Shark queries on a 30-node cluster and was
 not able to start the Shark server or run any jobs.

 14/04/11 19:06:52 ERROR scheduler.TaskSchedulerImpl: Lost an executor 4
 (already removed): Failed to create local directory (bad spark.local.dir?)

 Full log: https://gist.github.com/praveenr019/10647049

 After spending quite some time, I found it was due to disk read errors on one
 node, and the cluster worked again after removing that node.

 I wanted to know if there is any configuration (like akkaTimeout) which can
 handle this, or does Mesos help?

 Shouldn't the worker be marked dead in such a scenario, instead of making the
 cluster unusable, so that debugging can be done at leisure?

 Thanks,
 Praveen R