Re: Understanding Spark Memory distribution
Hi Ankur,

If you are using standalone mode, your config is wrong. You should set "export SPARK_DAEMON_MEMORY=xxx" in conf/spark-env.sh. At least it works on my Spark 1.3.0 standalone-mode machine. By the way, SPARK_DRIVER_MEMORY is used in YARN mode, and it looks like standalone mode doesn't use this setting. To debug this, run

  ps auxw | grep org.apache.spark.deploy.master.[M]aster

on the master machine. You can see the -Xmx and -Xms options.

Wisely Chen

On Mon, Mar 30, 2015 at 3:55 AM, Ankur Srivastava ankur.srivast...@gmail.com wrote:

Hi Wisely,

I am running on Amazon EC2 instances, so I cannot doubt the hardware. Moreover, my other pipelines run successfully except for this one, which involves broadcasting a large object. My spark-env.sh settings are:

  SPARK_MASTER_IP=MASTER-IP
  SPARK_LOCAL_IP=LOCAL-IP
  SPARK_DRIVER_MEMORY=24g
  SPARK_WORKER_MEMORY=28g
  SPARK_EXECUTOR_MEMORY=26g
  SPARK_WORKER_CORES=8

My spark-defaults.conf settings are:

  spark.eventLog.enabled true
  spark.eventLog.dir /srv/logs/
  spark.serializer org.apache.spark.serializer.KryoSerializer
  spark.kryo.registrator com.test.utils.KryoSerializationRegistrator
  spark.executor.extraJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/srv/logs/ -XX:+UseG1GC
  spark.shuffle.consolidateFiles true
  spark.shuffle.manager sort
  spark.shuffle.compress true
  spark.rdd.compress true

Thanks
Ankur

On Sat, Mar 28, 2015 at 7:57 AM, Wisely Chen wiselyc...@appier.com wrote:

Hi Ankur,

If your hardware is OK, it looks like a config problem. Can you show me your spark-env.sh or JVM config?

Thanks,
Wisely Chen

2015-03-28 15:39 GMT+08:00 Ankur Srivastava ankur.srivast...@gmail.com:

Hi Wisely,

I have 26 GB for the driver, and the master is running on m3.2xlarge machines. I see OOM errors on the workers even though they are running with 26 GB of memory.
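For reference, a minimal conf/spark-env.sh sketch for a standalone cluster along the lines Wisely describes; the 2g value is an assumption, pick a size that fits your master node:

```shell
# conf/spark-env.sh (standalone mode) - a sketch; values are assumptions
# Heap size for the standalone master and worker daemons themselves:
export SPARK_DAEMON_MEMORY=2g
# Total memory a worker may hand out to executors on its machine:
export SPARK_WORKER_MEMORY=28g

# To verify what the master JVM actually got, run on the master machine:
#   ps auxw | grep org.apache.spark.deploy.master.[M]aster
# and look for the -Xmx / -Xms options in the output.
```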
Thanks

On Fri, Mar 27, 2015, 11:43 PM Wisely Chen wiselyc...@appier.com wrote:

Hi,

In a broadcast, Spark will collect the whole 3 GB object on the master node and broadcast it to each slave. It is a very common situation that the master node doesn't have enough memory. What are your master node settings?

Wisely Chen

Ankur Srivastava ankur.srivast...@gmail.com wrote on Saturday, March 28, 2015:

I have increased spark.storage.memoryFraction to 0.4, but I still get OOM errors on the Spark executor nodes:

  15/03/27 23:19:51 INFO BlockManagerMaster: Updated info of block broadcast_5_piece10
  15/03/27 23:19:51 INFO TorrentBroadcast: Reading broadcast variable 5 took 2704 ms
  15/03/27 23:19:52 INFO MemoryStore: ensureFreeSpace(672530208) called with curMem=2484698683, maxMem=9631778734
  15/03/27 23:19:52 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 641.4 MB, free 6.0 GB)
  15/03/27 23:34:02 WARN AkkaUtils: Error sending message in 1 attempts
  java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
          at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
          at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
          at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
          at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
          at scala.concurrent.Await$.result(package.scala:107)
          at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:187)
          at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:407)
  15/03/27 23:34:02 ERROR Executor: Exception in task 7.0 in stage 2.0 (TID 4007)
  java.lang.OutOfMemoryError: GC overhead limit exceeded
          at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1986)
          at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
          at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)

Thanks
Ankur

On Fri, Mar 27, 2015 at 2:52 PM, Ankur Srivastava ankur.srivast...@gmail.com wrote:

Hi All,

I am running a
Spark cluster on EC2 instances of type m3.2xlarge. I have given 26 GB of memory, with all 8 cores, to my executors. I can see that in the logs too:

  15/03/27 21:31:06 INFO AppClient$ClientActor: Executor added: app-20150327213106-/0 on worker-20150327212934-10.x.y.z-40128 (10.x.y.z:40128) with 8 cores

I am not caching any RDD, so I have set spark.storage.memoryFraction to 0.2. On the SparkUI, under the executors tab, I can see "Memory used" is 0.0 / 4.5 GB. I am now confused by these logs:

  15/03/27 21:31:08 INFO BlockManagerMasterActor: Registering block manager 10.77.100.196:58407 with 4.5 GB RAM, BlockManagerId(4, 10.x.y.z, 58407)

I am broadcasting a large object of 3 GB, and after that, when I create an RDD, I see logs showing this 4.5 GB of memory getting full, and then I
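The 4.5 GB figure in those logs is roughly what the Spark 1.x storage-pool formula predicts: the BlockManager's storage space is the executor heap times spark.storage.memoryFraction times spark.storage.safetyFraction (the safety fraction defaults to 0.9). A quick sketch with the values from this thread; note the JVM reports slightly less usable heap than the -Xmx setting, which accounts for the remaining gap down to 4.5 GB:

```python
# Sketch of Spark 1.x storage-pool sizing, using values quoted in the thread.
executor_heap_gb = 26.0   # SPARK_EXECUTOR_MEMORY=26g
memory_fraction = 0.2     # spark.storage.memoryFraction, as set above
safety_fraction = 0.9     # spark.storage.safetyFraction default in Spark 1.x

storage_pool_gb = executor_heap_gb * memory_fraction * safety_fraction
print(round(storage_pool_gb, 2))  # 4.68 -- close to the 4.5 GB in the logs
```

So with memoryFraction at 0.2, only about 4.5 GB of the 26 GB heap is available to the MemoryStore, which is why a 3 GB broadcast plus deserialization overhead fills it.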
RE: Announcing Spark 1.0.0
Great work!

On May 30, 2014 10:15 PM, Ian Ferreira ianferre...@hotmail.com wrote:

Congrats

Sent from my Windows Phone

From: Dean Wampler deanwamp...@gmail.com
Sent: 5/30/2014 6:53 AM
To: user@spark.apache.org
Subject: Re: Announcing Spark 1.0.0

Congratulations!!

On Fri, May 30, 2014 at 5:12 AM, Patrick Wendell pwend...@gmail.com wrote:

I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 is a milestone release as the first in the 1.0 line of releases, providing API stability for Spark's core interfaces. Spark 1.0.0 is Spark's largest release ever, with contributions from 117 developers. I'd like to thank everyone involved in this release - it was truly a community effort, with fixes, features, and optimizations contributed from dozens of organizations.

This release expands Spark's standard libraries, introducing a new SQL package (Spark SQL) which lets users integrate SQL queries into existing Spark workflows. MLlib, Spark's machine learning library, is expanded with sparse vector support and several new algorithms. The GraphX and Streaming libraries also introduce new features and optimizations. Spark's core engine adds support for secured YARN clusters, a unified tool for submitting Spark applications, and several performance and stability improvements. Finally, Spark adds support for Java 8 lambda syntax and improves coverage of the Java and Python APIs.

Those features only scratch the surface - check out the release notes here:
http://spark.apache.org/releases/spark-release-1-0-0.html

Note that since the release artifacts were posted recently, certain mirrors may not have working downloads for a few hours.

- Patrick

--
Dean Wampler, Ph.D.
Typesafe
@deanwampler
http://typesafe.com
http://polyglotprogramming.com
Re: Java RDD structure for Matrix predict?
Hi Sandeep,

I think you should use testRatings.mapToPair instead of testRatings.map. The code should be:

  JavaPairRDD<Integer, Integer> usersProducts = testRatings.mapToPair(
      new PairFunction<Rating, Integer, Integer>() {
          public Tuple2<Integer, Integer> call(Rating r) throws Exception {
              return new Tuple2<Integer, Integer>(r.user(), r.product());
          }
      }
  );

It works on my side.

Wisely Chen

On Wed, May 28, 2014 at 6:27 AM, Sandeep Parikh sand...@clusterbeep.org wrote:

I've got a trained MatrixFactorizationModel via ALS.train(...) and now I'm trying to use it to predict some ratings, like so:

  JavaRDD<Rating> predictions = model.predict(usersProducts.rdd());

where usersProducts is built from an existing ratings dataset like so:

  JavaPairRDD<Integer, Integer> usersProducts = testRatings.map(
      new PairFunction<Rating, Integer, Integer>() {
          public Tuple2<Integer, Integer> call(Rating r) throws Exception {
              return new Tuple2<Integer, Integer>(r.user(), r.product());
          }
      }
  );

The problem is that model.predict(...) doesn't like usersProducts, claiming that the method doesn't accept an RDD of type Tuple2; however, the docs show the method signature as follows:

  def predict(usersProducts: RDD[(Int, Int)]): RDD[Rating]

Am I missing something? The JavaRDD is just a list of Tuple2 elements, which would match the method signature, but the compiler is complaining.

Thanks!
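For intuition, the transformation in question just projects each rating down to its (user, product) key pair, which is what predict() consumes; a plain-Python sketch of the same projection, with made-up sample data and no Spark dependency:

```python
# Plain-Python sketch of the (user, product) projection that
# MatrixFactorizationModel.predict expects. The sample ratings are invented.
ratings = [
    (1, 101, 4.5),   # (user, product, rating)
    (1, 102, 3.0),
    (2, 101, 5.0),
]

# Analogue of testRatings.mapToPair(r -> new Tuple2<>(r.user(), r.product()))
users_products = [(user, product) for user, product, _rating in ratings]
print(users_products)  # [(1, 101), (1, 102), (2, 101)]
```

The distinction in the Java API is that map always yields a JavaRDD, while mapToPair yields a JavaPairRDD, and only the latter's .rdd() is an RDD of Scala Tuple2 values matching the RDD[(Int, Int)] signature.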
Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1
Hi Prasad,

Sorry for missing your reply. Here it is:
https://gist.github.com/thegiive/10791823

Wisely Chen

On Fri, Apr 4, 2014 at 11:57 PM, Prasad ramachandran.pra...@gmail.com wrote:

Hi Wisely,

Could you please post your pom.xml here.

Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Error-reading-HDFS-file-using-spark-0-9-0-hadoop-2-2-0-incompatible-protobuf-2-5-and-2-4-1-tp2158p3770.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
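The usual shape of the fix for this class of error is to make sure a single protobuf-java version (2.5.0, which Hadoop 2.2.0 depends on) wins Maven's dependency resolution. A hedged pom.xml fragment along those lines; the layout and version pin are assumptions, not a copy of the gist above:

```xml
<!-- Force one protobuf version so Spark and Hadoop 2.2.0 agree.
     This is a sketch, not copied from the linked gist. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.protobuf</groupId>
      <artifactId>protobuf-java</artifactId>
      <version>2.5.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```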
Re: Lost an executor error - Jobs fail
Hi Praveen,

What is your setting for spark.local.dir? Does every worker have this directory, and does every worker have the right permissions on it? I think this is the reason for your error.

Wisely Chen

On Mon, Apr 14, 2014 at 9:29 PM, Praveen R prav...@sigmoidanalytics.com wrote:

Had the below error while running Shark queries on a 30-node cluster and was not able to start the Shark server or run any jobs:

  14/04/11 19:06:52 ERROR scheduler.TaskSchedulerImpl: Lost an executor 4 (already removed): Failed to create local directory (bad spark.local.dir?)

Full log: https://gist.github.com/praveenr019/10647049

After spending quite some time, I found it was due to disk read errors on one node, and had the cluster working after removing that node. I wanted to know if there is any configuration (like akkaTimeout) which can handle this, or does Mesos help? Shouldn't the worker be marked dead in such a scenario, instead of making the cluster unusable, so the debugging can be done at leisure?

Thanks,
Praveen R
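A quick way to rule this out before starting the cluster is to check each candidate directory on every worker; a minimal sketch, where the /tmp argument is only an example path:

```shell
# Check that a spark.local.dir candidate exists and is writable.
# Run on each worker; the path argument below is an example.
check_spark_local_dir() {
  dir="$1"
  if [ -d "$dir" ] && [ -w "$dir" ]; then
    echo "ok: $dir"
  else
    echo "bad spark.local.dir: $dir"
    return 1
  fi
}

check_spark_local_dir /tmp
```

Running this under a tool like pssh across all 30 nodes would have pointed at the bad disk immediately.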