Re: Feasibility Project - Text Processing and Category Classification
Load the text file as an RDD, something like this:

  val file = sc.textFile("/path/to/file")

After this you can filter the RDD down to the lines you want:

  val a1 = file.filter( line => line.contains("[ERROR]") )
  val a2 = file.filter( line => line.contains("[WARN]") )
  val a3 = file.filter( line => line.contains("[INFO]") )

You can view the lines using println like this:

  a1.foreach(println)

You can also count the number of such lines using count like this:

  val b1 = file.filter( line => line.contains("[ERROR]") ).count()

Regards,
Ritesh Kumar Singh
https://riteshtoday.wordpress.com/
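If the goal is to count all three categories, a single pass can replace the three separate filters; a minimal sketch, assuming the same bracketed log-level tags:

  val file = sc.textFile("/path/to/file")

  // Map each line to the first tag it contains, then count occurrences in one job.
  val counts = file
    .flatMap(line => Seq("[ERROR]", "[WARN]", "[INFO]").find(tag => line.contains(tag)))
    .countByValue()   // Map(tag -> number of lines), returned to the driver

  counts.foreach { case (tag, n) => println(s"$tag: $n") }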
Re: Unsupported major.minor version 51.0
Can you please mention the output of the following:
  java -version
  javac -version
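For background, class file major version 51 corresponds to Java 7, so this error generally means the classes were compiled with JDK 7 but run on an older JRE; comparing the two outputs above should show the mismatch.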
Local spark jars not being detected
Hi, I'm using the IntelliJ IDE for my Spark project. I've compiled Spark 1.3.0 for Scala 2.11.4, and here's one of the compiled jars installed in my m2 folder:

  ~/.m2/repository/org/apache/spark/spark-core_2.11/1.3.0/spark-core_2.11-1.3.0.jar

But when I add this dependency in the pom file for the project:

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_$(scala.version)</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>

I'm getting: Dependency org.apache.spark:spark-core_$(scala.version):1.3.0 not found. Why is this happening and what's the workaround?
Re: Local spark jars not being detected
Yes, finally solved. It was there in front of my eyes all the time. Thanks a lot, Pete.
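The thread doesn't spell out what Pete pointed to, but the error message itself hints at it: Maven property references use ${...} with braces, so spark-core_$(scala.version) written with parentheses is passed through literally instead of being interpolated. Presumably the dependency needed to read something like this, with the property set to the Scala binary version (e.g. 2.11):

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>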
Error using spark 1.3.0 with maven
Hi, I'm getting this error while running Spark as a Java project using Maven:

15/06/15 17:11:38 INFO SparkContext: Running Spark version 1.3.0
15/06/15 17:11:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/15 17:11:38 INFO SecurityManager: Changing view acls to: root
15/06/15 17:11:38 INFO SecurityManager: Changing modify acls to: root
15/06/15 17:11:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'
  at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
  at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:145)
  at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:151)
  at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
  at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
  at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
  at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:168)
  at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:504)
  at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
  at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
  at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:122)
  at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:55)
  at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
  at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1832)
  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
  at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1823)
  at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:57)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:223)
  at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:163)
  at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:270)
  at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
  at Test.main(Test.java:9)

==

My Test.java file contains the following:

import org.apache.spark.api.java.*;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;

public class Test {
  public static void main(String[] args) {
    String logFile = "/code/data.txt";
    SparkConf conf = new SparkConf().setMaster("local[4]").setAppName("Simple Application");
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaRDD<String> logData = sc.textFile(logFile);
    long numAs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("a"); }
    }).count();
    long numBs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("b"); }
    }).count();
    System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
  }
}

==

My pom file contains the following:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>dexample</groupId>
  <artifactId>sparktest</artifactId>
  <name>Testing spark with maven</name>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.3.0</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <version>2.6</version>
        <configuration>
          <finalName>sparktest</finalName>
          <archive>
            <manifest>
              <addClasspath>true</addClasspath>
              <mainClass>Test</mainClass>
              <classpathPrefix>dependency-jars/</classpathPrefix>
            </manifest>
          </archive>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.3</version>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>attached</goal>
            </goals>
            <phase>package</phase>
            <configuration>
              <finalName>sparktest</finalName>
              <descriptorRefs>
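For what it's worth, the "No configuration setting found for key 'akka.version'" error usually points at how the application jar is assembled rather than at user code: when a fat jar is built, akka's reference.conf files tend to overwrite each other instead of being concatenated, so the merged jar loses akka's default settings. A sketch of the commonly used remedy, a maven-shade-plugin configuration with an appending transformer for reference.conf (the plugin version and placement here are illustrative, not taken from this thread):

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.3</version>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <transformers>
            <!-- Concatenate all reference.conf files so akka.version survives the merge -->
            <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
              <resource>reference.conf</resource>
            </transformer>
          </transformers>
        </configuration>
      </execution>
    </executions>
  </plugin>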
akka configuration not found
Hi, Though my project has nothing to do with akka, I'm getting this error : Exception in thread main com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version' at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124) at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:145) at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:151) at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159) at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164) at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206) at akka.actor.ActorSystem$Settings.init(ActorSystem.scala:168) at akka.actor.ActorSystemImpl.init(ActorSystem.scala:504) at akka.actor.ActorSystem$.apply(ActorSystem.scala:141) at akka.actor.ActorSystem$.apply(ActorSystem.scala:118) at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:122) at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:55) at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54) at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1832) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1823) at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:57) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:223) at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:163) at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267) at org.apache.spark.SparkContext.init(SparkContext.scala:270) at org.apache.spark.api.java.JavaSparkContext.init(JavaSparkContext.scala:61) at Test.main(Test.java:9) There is no reference to akka anywhere in the code / pom file. Any fixes? Thanks, Ritesh
Re: Can't build Spark 1.3
It did hang for me too; RAM consumption during the build is high. I had to free a lot of RAM and add swap space just to get it to build on my third attempt. Everything else looks fine. You can download the prebuilt versions from the Spark homepage to save yourself all this trouble. Thanks, Ritesh
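For anyone hitting the same hang: the Spark 1.x build documentation suggests giving Maven more memory before compiling, which is usually what avoids the stall. A sketch of the documented settings (adjust the values to your machine):

  export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
  mvn -DskipTests clean package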
Re: Official Docker container for Spark
Use sequenceiq's Docker images. Here's a link to their GitHub repo: docker-spark https://github.com/sequenceiq/docker-spark They have repos for other big data tools too, which are again really nice, and they are being maintained properly by their devs.
Re: Overlapping classes warnings
Though the warnings can be ignored, they add up in the log files while compiling other projects too. And there are a lot of those warnings. Any workaround? How do we modify the pom.xml file to exclude these unnecessary dependencies? On Fri, Apr 10, 2015 at 2:29 AM, Sean Owen so...@cloudera.com wrote: Generally, you can ignore these things. They mean some artifacts packaged other artifacts, and so two copies show up when all the JAR contents are merged. But here you do show a small dependency convergence problem; beanutils 1.7 is present but beanutills-core 1.8 is too even though these should be harmonized. I imagine one could be excluded; I imagine we could harmonize the version manually. In practice, I also imagine it doesn't cause any problem but feel free to propose a fix along those lines. On Thu, Apr 9, 2015 at 4:54 PM, Ritesh Kumar Singh riteshoneinamill...@gmail.com wrote: Hi, During compilation I get a lot of these: [WARNING] kryo-2.21.jar, reflectasm-1.07-shaded.jar define 23 overlappping classes: [WARNING] commons-beanutils-1.7.0.jar, commons-beanutils-core-1.8.0.jar define 82 overlappping classes: [WARNING] commons-beanutils-1.7.0.jar, commons-collections-3.2.1.jar, commons-beanutils-core-1.8.0.jar define 10 overlappping classes: And a lot of others. How do I fix these?
Re: Overlapping classes warnings
I found this jira https://jira.codehaus.org/browse/MSHADE-128 when googling for fixes. Wonder if it can fix anything here. But anyways, thanks for the help :) On Fri, Apr 10, 2015 at 2:46 AM, Sean Owen so...@cloudera.com wrote: I agree, but as I say, most are out of the control of Spark. They aren't because of unnecessary dependencies. On Thu, Apr 9, 2015 at 5:14 PM, Ritesh Kumar Singh riteshoneinamill...@gmail.com wrote: Though the warnings can be ignored, they add up in the log files while compiling other projects too. And there are a lot of those warnings. Any workaround? How do we modify the pom.xml file to exclude these unnecessary dependencies? On Fri, Apr 10, 2015 at 2:29 AM, Sean Owen so...@cloudera.com wrote: Generally, you can ignore these things. They mean some artifacts packaged other artifacts, and so two copies show up when all the JAR contents are merged. But here you do show a small dependency convergence problem; beanutils 1.7 is present but beanutills-core 1.8 is too even though these should be harmonized. I imagine one could be excluded; I imagine we could harmonize the version manually. In practice, I also imagine it doesn't cause any problem but feel free to propose a fix along those lines. On Thu, Apr 9, 2015 at 4:54 PM, Ritesh Kumar Singh riteshoneinamill...@gmail.com wrote: Hi, During compilation I get a lot of these: [WARNING] kryo-2.21.jar, reflectasm-1.07-shaded.jar define 23 overlappping classes: [WARNING] commons-beanutils-1.7.0.jar, commons-beanutils-core-1.8.0.jar define 82 overlappping classes: [WARNING] commons-beanutils-1.7.0.jar, commons-collections-3.2.1.jar, commons-beanutils-core-1.8.0.jar define 10 overlappping classes: And a lot of others. How do I fix these?
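If one particular overlap really needs to go away in a downstream project, Maven's standard mechanism is an exclusion on whichever dependency drags in the extra artifact. A hedged sketch (which dependency actually pulls in commons-beanutils varies, so the coordinates below are illustrative):

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.3.0</version>
    <exclusions>
      <exclusion>
        <groupId>commons-beanutils</groupId>
        <artifactId>commons-beanutils-core</artifactId>
      </exclusion>
    </exclusions>
  </dependency>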
Migrating from Spark 0.8.0 to Spark 1.3.0
Hi, Are there any tutorials that explain all the changes between Spark 0.8.0 and Spark 1.3.0, and how should we approach this migration?
Re: Is there any Sparse Matrix implementation in Spark/MLib?
try using breeze (scala linear algebra library) On Fri, Feb 27, 2015 at 5:56 PM, shahab shahab.mok...@gmail.com wrote: Thanks a lot Vijay, let me see how it performs. Best Shahab On Friday, February 27, 2015, Vijay Saraswat vi...@saraswat.org wrote: Available in GML -- http://x10-lang.org/x10-community/applications/global-matrix-library.html We are exploring how to make it available within Spark. Any ideas would be much appreciated. On 2/27/15 7:01 AM, shahab wrote: Hi, I just wonder if there is any Sparse Matrix implementation available in Spark, so it can be used in spark application? best, /Shahab - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
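To make the Breeze suggestion concrete, a minimal sketch of its compressed sparse column matrix type (the dimensions and entries here are illustrative):

  import breeze.linalg.CSCMatrix

  // Build a 1000 x 1000 sparse matrix with a handful of non-zero entries.
  val builder = new CSCMatrix.Builder[Double](rows = 1000, cols = 1000)
  builder.add(0, 3, 1.5)
  builder.add(42, 7, -2.0)
  builder.add(999, 999, 3.14)
  val m: CSCMatrix[Double] = builder.result()

  println(m(42, 7))   // -2.0; untouched entries are 0.0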
Re: Mllib error
How did you build your Spark 1.1.1? On Wed, Dec 10, 2014 at 10:41 AM, amin mohebbi aminn_...@yahoo.com.invalid wrote: I'm trying to build a very simple Scala standalone app using MLlib, but I get the following error when trying to build the program: object mllib is not a member of package org.apache.spark. Please note I just migrated from 1.0.2 to 1.1.1. Best Regards ... Amin Mohebbi PhD candidate in Software Engineering at University of Malaysia Tel : +60 18 2040 017 E-Mail : tp025...@ex.apiit.edu.my amin_...@me.com
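This error usually just means the build only declares spark-core; MLlib lives in its own artifact. Assuming an sbt build, the dependency block would look something like this (use the Maven equivalent if that's what the project uses):

  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core"  % "1.1.1",
    "org.apache.spark" %% "spark-mllib" % "1.1.1"   // provides org.apache.spark.mllib._
  )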
Re: Install Apache Spark on a Cluster
As a rough outline:
Step 1: Install Hadoop 2.x on all the machines in the cluster.
Step 2: Check that the Hadoop cluster is working.
Step 3: Set up Apache Spark on the cluster as described on the documentation page, and check the status of the cluster on the master UI.
Since it is a data mining project, configure Hive too; you can use Spark SQL or AMPLab Shark for SQL access to the data.
On Mon, Dec 8, 2014 at 11:01 PM, riginos samarasrigi...@gmail.com wrote: My thesis is related to big data mining and I have a cluster in the laboratory of my university. My task is to install Apache Spark on it and use it for extraction purposes. Is there any understandable guidance on how to do this? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Install-Apache-Spark-on-a-Cluster-tp20580.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: How take top N of top M from RDD as RDD
For converting an Array or any List to an RDD, we can try using:

  sc.parallelize(groupedScore)  // or whatever the name of the list variable is

On Mon, Dec 1, 2014 at 8:14 PM, Xuefeng Wu ben...@gmail.com wrote: Hi, I have a problem; it is easy in Scala code, but I cannot take the top N from an RDD as an RDD. There are 1 StudentScore records; take the top 10 ages, and then the top 10 from each age, so the result is 100 records. The Scala code is here, but how can I do it with an RDD, for RDD.take returns an Array, not another RDD. Example Scala code:

  import scala.util.Random

  case class StudentScore(age: Int, num: Int, score: Int, name: Int)

  val scores = for { i <- 1 to 1 } yield {
    StudentScore(Random.nextInt(100), Random.nextInt(100), Random.nextInt(), Random.nextInt())
  }

  def takeTop(scores: Seq[StudentScore], byKey: StudentScore => Int): Seq[(Int, Seq[StudentScore])] = {
    val groupedScore = scores.groupBy(byKey)
      .map { case (_, _scores) => (_scores.foldLeft(0)((acc, v) => acc + v.score), _scores) }.toSeq
    groupedScore.sortBy(_._1).take(10)
  }

  val topScores = for {
    (_, ageScores) <- takeTop(scores, _.age)
    (_, numScores) <- takeTop(ageScores, _.num)
  } yield {
    numScores
  }

  topScores.size

-- ~Yours, Xuefeng Wu/吴雪峰
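To spell the suggestion out: actions such as RDD.take and RDD.top return a local Array on the driver, so getting back to an RDD is just a matter of re-parallelizing that array. A minimal sketch (scoresRdd and the Ordering are assumed here, following the StudentScore example):

  import org.apache.spark.rdd.RDD

  // scoresRdd: RDD[StudentScore] is assumed to exist already
  val top10: Array[StudentScore] =
    scoresRdd.top(10)(Ordering.by((s: StudentScore) => s.score))   // local Array on the driver

  val top10Rdd: RDD[StudentScore] = sc.parallelize(top10)          // back to an RDD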
Re: Setting network variables in spark-shell
Spark configuration settings can be found here http://spark.apache.org/docs/latest/configuration.html Hope it helps :) On Sun, Nov 30, 2014 at 9:55 PM, Brian Dolan buddha_...@yahoo.com.invalid wrote: Howdy Folks, What is the correct syntax in 1.0.0 to set networking variables in spark shell? Specifically, I'd like to set the spark.akka.frameSize I'm attempting this: spark-shell -Dspark.akka.frameSize=1 --executor-memory 4g Only to get this within the session: System.getProperty(spark.executor.memory) res0: String = 4g System.getProperty(spark.akka.frameSize) res1: String = null I don't believe I am violating protocol, but I have also posted this to SO: http://stackoverflow.com/questions/27215288/how-do-i-set-spark-akka-framesize-in-spark-shell ~~ May All Your Sequences Converge
Re: spark-shell giving me error of unread block data
As Marcelo mentioned, the issue occurs mostly when incompatible classes are used by executors or drivers. Try out if the output is coming on spark-shell. If yes, then most probably in your case, there might be some issue with your configuration files. It will be helpful if you can paste the contents of the config files you edited. On Thu, Nov 20, 2014 at 5:45 AM, Anson Abraham anson.abra...@gmail.com wrote: Sorry meant cdh 5.2 w/ spark 1.1. On Wed, Nov 19, 2014, 17:41 Anson Abraham anson.abra...@gmail.com wrote: yeah CDH distribution (1.1). On Wed Nov 19 2014 at 5:29:39 PM Marcelo Vanzin van...@cloudera.com wrote: On Wed, Nov 19, 2014 at 2:13 PM, Anson Abraham anson.abra...@gmail.com wrote: yeah but in this case i'm not building any files. just deployed out config files in CDH5.2 and initiated a spark-shell to just read and output a file. In that case it is a little bit weird. Just to be sure, you are using CDH's version of Spark, not trying to run an Apache Spark release on top of CDH, right? (If that's the case, then we could probably move this conversation to cdh-us...@cloudera.org, since it would be CDH-specific.) On Wed Nov 19 2014 at 4:52:51 PM Marcelo Vanzin van...@cloudera.com wrote: Hi Anson, We've seen this error when incompatible classes are used in the driver and executors (e.g., same class name, but the classes are different and thus the serialized data is different). This can happen for example if you're including some 3rd party libraries in your app's jar, or changing the driver/executor class paths to include these conflicting libraries. Can you clarify whether any of the above apply to your case? (For example, one easy way to trigger this is to add the spark-examples jar shipped with CDH5.2 in the classpath of your driver. That's one of the reasons I filed SPARK-4048, but I digress.) On Tue, Nov 18, 2014 at 1:59 PM, Anson Abraham anson.abra...@gmail.com wrote: I'm essentially loading a file and saving output to another location: val source = sc.textFile(/tmp/testfile.txt) source.saveAsTextFile(/tmp/testsparkoutput) when i do so, i'm hitting this error: 14/11/18 21:15:08 INFO DAGScheduler: Failed to run saveAsTextFile at console:15 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, cloudera-1.testdomain.net): java.lang.IllegalStateExceptio n: unread block data java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode( ObjectInputStream.java:2421) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382) java.io.ObjectInputStream.defaultReadFields(ObjectInputStrea m.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream. java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStre am.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java :370) org.apache.spark.serializer.JavaDeserializationStream.readOb ject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deseriali ze(JavaSerializer.scala:87) org.apache.spark.executor.Executor$TaskRunner.run(Executor. 
scala:162) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool Executor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo lExecutor.java:615) java.lang.Thread.run(Thread.java:744) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$sch eduler$DAGScheduler$$failJobAndIndependentStages(DAGSchedule r.scala:1185) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$ 1.apply(DAGScheduler.scala:1174) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$ 1.apply(DAGScheduler.scala:1173) at scala.collection.mutable.ResizableArray$class.foreach(Resiza bleArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer. scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGSchedu ler.scala:1173) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskS etFailed$1.apply(DAGScheduler.scala:688) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskS etFailed$1.apply(DAGScheduler.scala:688) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed( DAGScheduler.scala:688) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$ anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at
Re: Spark On Yarn Issue: Initial job has not accepted any resources
Not sure how to solve this, but spotted these lines in the logs: 14/11/18 14:28:23 INFO YarnAllocationHandler: Container marked as *failed*: container_1415961020140_0325_01_02 14/11/18 14:28:38 INFO YarnAllocationHandler: Container marked as *failed*: container_1415961020140_0325_01_03 And the lines following it says its trying to allocate some space of 1408B but its failing to do so. You might want to look into that On Tue, Nov 18, 2014 at 1:23 PM, LinCharlie lin_q...@outlook.com wrote: Hi All: I was submitting a spark_program.jar to `spark on yarn cluster` on a driver machine with yarn-client mode. Here is the spark-submit command I used: ./spark-submit --master yarn-client --class com.charlie.spark.grax.OldFollowersExample --queue dt_spark ~/script/spark-flume-test-0.1-SNAPSHOT-hadoop2.0.0-mr1-cdh4.2.1.jar The queue `dt_spark` was free, and the program was submitted succesfully and running on the cluster. But on console, it showed repeatedly that: 14/11/18 15:11:48 WARN YarnClientClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory Checked the cluster UI logs, I find no errors: SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/home/disk5/yarn/usercache/linqili/filecache/6957209742046754908/spark-assembly-1.0.2-hadoop2.0.0-cdh4.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.0.0-cdh4.2.1/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 14/11/18 14:28:16 INFO SecurityManager: Changing view acls to: hadoop,linqili 14/11/18 14:28:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop, linqili) 14/11/18 14:28:17 INFO Slf4jLogger: Slf4jLogger started 14/11/18 14:28:17 INFO Remoting: Starting remoting 14/11/18 14:28:17 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkyar...@longzhou-hdp3.lz.dscc:37187] 14/11/18 14:28:17 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkyar...@longzhou-hdp3.lz.dscc:37187] 14/11/18 14:28:17 INFO ExecutorLauncher: ApplicationAttemptId: appattempt_1415961020140_0325_01 14/11/18 14:28:17 INFO ExecutorLauncher: Connecting to ResourceManager at longzhou-hdpnn.lz.dscc/192.168.19.107:12032 14/11/18 14:28:17 INFO ExecutorLauncher: Registering the ApplicationMaster 14/11/18 14:28:18 INFO ExecutorLauncher: Waiting for spark driver to be reachable. 14/11/18 14:28:18 INFO ExecutorLauncher: Master now available: 192.168.59.90:36691 14/11/18 14:28:18 INFO ExecutorLauncher: Listen to driver: akka.tcp://spark@192.168.59.90:36691/user/CoarseGrainedScheduler 14/11/18 http://spark@192.168.59.90:36691/user/CoarseGrainedScheduler14/11/18 14:28:18 INFO ExecutorLauncher: Allocating 1 executors. 14/11/18 14:28:18 INFO YarnAllocationHandler: Allocating 1 executor containers with 1408 of memory each. 14/11/18 14:28:18 INFO YarnAllocationHandler: ResourceRequest (host : *, num containers: 1, priority = 1 , capability : memory: 1408) 14/11/18 14:28:18 INFO YarnAllocationHandler: Allocating 1 executor containers with 1408 of memory each. 
14/11/18 14:28:18 INFO YarnAllocationHandler: ResourceRequest (host : *, num containers: 1, priority = 1 , capability : memory: 1408) 14/11/18 14:28:18 INFO RackResolver: Resolved longzhou-hdp3.lz.dscc to /rack1 14/11/18 14:28:18 INFO YarnAllocationHandler: launching container on container_1415961020140_0325_01_02 host longzhou-hdp3.lz.dscc 14/11/18 14:28:18 INFO ExecutorRunnable: Starting Executor Container 14/11/18 14:28:18 INFO ExecutorRunnable: Connecting to ContainerManager at longzhou-hdp3.lz.dscc:12040 14/11/18 14:28:18 INFO ExecutorRunnable: Setting up ContainerLaunchContext 14/11/18 14:28:18 INFO ExecutorRunnable: Preparing Local resources 14/11/18 14:28:18 INFO ExecutorLauncher: All executors have launched. 14/11/18 14:28:18 INFO ExecutorLauncher: Started progress reporter thread - sleep time : 5000 14/11/18 14:28:18 INFO YarnAllocationHandler: ResourceRequest (host : *, num containers: 0, priority = 1 , capability : memory: 1408) 14/11/18 14:28:18 INFO ExecutorRunnable: Prepared Local resources Map(__spark__.jar - resource {, scheme: hdfs, host: longzhou-hdpnn.lz.dscc, port: 11000, file: /user/linqili/.sparkStaging/application_1415961020140_0325/spark-assembly-1.0.2-hadoop2.0.0-cdh4.2.1.jar, }, size: 134859131, timestamp: 1416292093988, type: FILE, visibility: PRIVATE, ) 14/11/18 14:28:18 INFO ExecutorRunnable: Setting up executor with commands: List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms1024m -Xmx1024m ,
Re: spark-shell giving me error of unread block data
It can be a serialization issue. Happens when there are different versions installed on the same system. What do you mean by the first time you installed and tested it out? On Wed, Nov 19, 2014 at 3:29 AM, Anson Abraham anson.abra...@gmail.com wrote: I'm essentially loading a file and saving output to another location: val source = sc.textFile(/tmp/testfile.txt) source.saveAsTextFile(/tmp/testsparkoutput) when i do so, i'm hitting this error: 14/11/18 21:15:08 INFO DAGScheduler: Failed to run saveAsTextFile at console:15 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, cloudera-1.testdomain.net): java.lang.IllegalStateException: unread block data java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:162) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Cant figure out what the issue is. I'm running in CDH5.2 w/ version of spark being 1.1. The file i'm loading is literally just 7 MB. 
I thought it was a jar file mismatch, but I did a compare and they're all identical. And seeing as they were all installed through CDH parcels, I'm not sure how there could be a version mismatch between the nodes and the master. Oh, and it's 1 master node with 2 worker nodes, running standalone, not through YARN. Just in case, I copied the jars from the master to the 2 worker nodes, and it's still the same issue. The weird thing is, the first time I installed and tested it out, it worked, but now it doesn't. Any help here would be greatly appreciated.
RandomGenerator class not found exception
My sbt file for the project includes this:

  libraryDependencies ++= Seq(
    "org.apache.spark"   %% "spark-core"    % "1.1.0",
    "org.apache.spark"   %% "spark-mllib"   % "1.1.0",
    "org.apache.commons" %  "commons-math3" % "3.3"
  )

Still I am getting this error:

  java.lang.NoClassDefFoundError: org/apache/commons/math3/random/RandomGenerator

The jar at ~/.m2/repository/org/apache/commons/commons-math3/3.3 contains the RandomGenerator class:

  $ jar tvf commons-math3-3.3.jar | grep RandomGenerator
  org/apache/commons/math3/random/RandomGenerator.class
  org/apache/commons/math3/random/UniformRandomGenerator.class
  org/apache/commons/math3/random/SynchronizedRandomGenerator.class
  org/apache/commons/math3/random/AbstractRandomGenerator.class
  org/apache/commons/math3/random/RandomGeneratorFactory$1.class
  org/apache/commons/math3/random/RandomGeneratorFactory.class
  org/apache/commons/math3/random/StableRandomGenerator.class
  org/apache/commons/math3/random/NormalizedRandomGenerator.class
  org/apache/commons/math3/random/JDKRandomGenerator.class
  org/apache/commons/math3/random/GaussianRandomGenerator.class

Please help.
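A NoClassDefFoundError at runtime, even though the class is on the compile-time classpath, usually means the jar never reaches the driver or executor classpath when the job runs. Two common remedies, offered here as suggestions rather than this thread's confirmed fix: ship the jar with the job, or bundle it into a fat jar with sbt-assembly. A sketch of the first option (paths and class name are illustrative):

  spark-submit --class MyApp \
    --jars /path/to/commons-math3-3.3.jar \
    target/scala-2.10/myapp_2.10-0.1.jar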
Re: Returning breeze.linalg.DenseMatrix from method
Yeah, it works. Although when I try to define a var of type DenseMatrix, like this: var mat1: DenseMatrix[Double] It gives an error saying we need to initialise the matrix mat1 at the time of declaration. Had to initialise it as : var mat1: DenseMatrix[Double] = DenseMatrix.zeros[Double](1,1) Anyways, it works now Thanks for helping :) On Mon, Nov 17, 2014 at 4:56 PM, tribhuvan...@gmail.com tribhuvan...@gmail.com wrote: This should fix it -- def func(str: String): DenseMatrix*[Double]* = { ... ... } So, why is this required? Think of it like this -- If you hadn't explicitly mentioned Double, it might have been that the calling function expected a DenseMatrix[SomeOtherType], and performed a SomeOtherType-specific operation which may have not been supported by the returned DenseMatrix[Double]. (I'm also assuming that SomeOtherType has no subtype relations with Double). On 17 November 2014 00:14, Ritesh Kumar Singh riteshoneinamill...@gmail.com wrote: Hi, I have a method that returns DenseMatrix: def func(str: String): DenseMatrix = { ... ... } But I keep getting this error: *class DenseMatrix takes type parameters* I tried this too: def func(str: String): DenseMatrix(Int, Int, Array[Double]) = { ... ... } But this gives me this error: *'=' expected but '(' found* Any possible fixes? -- *Tribhuvanesh Orekondy*
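To make the fix above copy-pasteable, here is a tiny self-contained sketch (assuming Breeze is on the classpath; the method body is just a placeholder):

  import breeze.linalg.DenseMatrix

  // The return type must carry its element type parameter.
  def func(str: String): DenseMatrix[Double] =
    DenseMatrix.zeros[Double](str.length max 1, 2)   // placeholder body

  // A var needs an initial value at declaration, as noted above.
  var mat1: DenseMatrix[Double] = DenseMatrix.zeros[Double](1, 1)
  mat1 = func("hello")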
Returning breeze.linalg.DenseMatrix from method
Hi, I have a method that returns DenseMatrix: def func(str: String): DenseMatrix = { ... ... } But I keep getting this error: *class DenseMatrix takes type parameters* I tried this too: def func(str: String): DenseMatrix(Int, Int, Array[Double]) = { ... ... } But this gives me this error: *'=' expected but '(' found* Any possible fixes?
Re: Fwd: Executor Lost Failure
Yes... found the output on web UI of the slave. Thanks :) On Tue, Nov 11, 2014 at 2:48 AM, Ankur Dave ankurd...@gmail.com wrote: At 2014-11-10 22:53:49 +0530, Ritesh Kumar Singh riteshoneinamill...@gmail.com wrote: Tasks are now getting submitted, but many tasks don't happen. Like, after opening the spark-shell, I load a text file from disk and try printing its contentsas: sc.textFile(/path/to/file).foreach(println) It does not give me any output. That's because foreach launches tasks on the slaves. When each task tries to print its lines, they go to the stdout file on the slave rather than to your console at the driver. You should see the file's contents in each of the slaves' stdout files in the web UI. This only happens when running on a cluster. In local mode, all the tasks are running locally and can output to the driver, so foreach(println) is more useful. Ankur
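A small follow-up to the explanation above: if you do want to see a few lines on the driver console, pull a sample back to the driver first, for example:

  // take(n) returns a local Array, so println runs on the driver
  sc.textFile("/path/to/file").take(20).foreach(println)

  // collect() brings the whole RDD back; fine for small files only
  sc.textFile("/path/to/file").collect().foreach(println)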
Re: disable log4j for spark-shell
Go to your Spark home, then into the conf/ directory, and edit the log4j.properties file, i.e.:

  gedit $SPARK_HOME/conf/log4j.properties

and set the root logger to:

  log4j.rootCategory=WARN, console

You don't need to rebuild Spark for the changes to take effect. Whenever you open spark-shell, it looks into the conf directory by default and loads all the properties. Thanks

On Tue, Nov 11, 2014 at 6:34 AM, lordjoe lordjoe2...@gmail.com wrote:

  public static void main(String[] args) throws Exception {
    System.out.println("Set Log to Warn");
    Logger rootLogger = Logger.getRootLogger();
    rootLogger.setLevel(Level.WARN);
    ...

works for me -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/disable-log4j-for-spark-shell-tp11278p18535.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
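One caveat, assuming a stock Spark download rather than anything stated above: fresh distributions often ship only conf/log4j.properties.template, in which case copy it first:

  cd $SPARK_HOME/conf
  cp log4j.properties.template log4j.properties
  # then set: log4j.rootCategory=WARN, console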
Re: save as file
We have RDD.saveAsTextFile and RDD.saveAsObjectFile for saving the output to any location specified. The param to provide is the path of the storage location; the number of output files follows the RDD's number of partitions. For an HDFS path we use a format like: /user/user-name/directory-to-store/ On Tue, Nov 11, 2014 at 6:28 PM, Naveen Kumar Pokala npok...@spcapitaliq.com wrote: Hi, I am on Spark 1.1.0. I need help with saving an RDD to a JSON file. How do I do that? And how do I mention the HDFS path in the program? -Naveen
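Since the original question was specifically about JSON, a minimal sketch of one way to do it on Spark 1.1 (the case class, field names and HDFS URI are illustrative, and a JSON library could replace the hand-rolled string):

  case class Person(name: String, age: Int)
  val people = sc.parallelize(Seq(Person("ann", 30), Person("bob", 25)))

  // Render each record as one JSON line, then write out; an hdfs:// URI works as the path.
  people
    .map(p => s"""{"name":"${p.name}","age":${p.age}}""")
    .saveAsTextFile("hdfs://namenode:8020/user/naveen/people-json")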
Re: How to kill a Spark job running in cluster mode ?
There is a property : spark.ui.killEnabled which needs to be set true for killing applications directly from the webUI. Check the link: Kill Enable spark job http://spark.apache.org/docs/latest/configuration.html#spark-ui Thanks On Tue, Nov 11, 2014 at 7:42 PM, Sonal Goyal sonalgoy...@gmail.com wrote: The web interface has a kill link. You can try using that. Best Regards, Sonal Founder, Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Tue, Nov 11, 2014 at 7:28 PM, Tao Xiao xiaotao.cs@gmail.com wrote: I'm using Spark 1.0.0 and I'd like to kill a job running in cluster mode, which means the driver is not running on local node. So how can I kill such a job? Is there a command like hadoop job -kill job-id which kills a running MapReduce job ? Thanks
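If the kill link is missing from the UI, the property mentioned above can be set explicitly in conf/spark-defaults.conf (it is normally on by default):

  spark.ui.killEnabled   true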
Re: Spark-submit and Windows / Linux mixed network
Never tried this form, but just guessing: what's the output when you submit this jar using spark-submit.cmd: \\shares\publish\Spark\app1\someJar.jar
Removing INFO logs
How can I remove all the INFO logs that appear on the console when I submit an application using spark-submit?
Re: Removing INFO logs
It works. Thanks On Mon, Nov 10, 2014 at 6:32 PM, YANG Fan idd...@gmail.com wrote: Hi, In conf/log4j.properties, change the following log4j.rootCategory=INFO, console to log4j.rootCategory=WARN, console This works for me. Best, Fan On Mon, Nov 10, 2014 at 8:21 PM, Ritesh Kumar Singh riteshoneinamill...@gmail.com wrote: How can I remove all the INFO logs that appear on the console when I submit an application using spark-submit?
Re: Executor Lost Failure
On Mon, Nov 10, 2014 at 10:52 PM, Ritesh Kumar Singh riteshoneinamill...@gmail.com wrote: Tasks are now getting submitted, but many tasks don't happen. Like, after opening the spark-shell, I load a text file from disk and try printing its contentsas: sc.textFile(/path/to/file).foreach(println) It does not give me any output. While running this: sc.textFile(/path/to/file).count gives me the right number of lines in the text file. Not sure what the error is. But here is the output on the console for print case: 14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(215230) called with curMem=709528, maxMem=463837593 14/11/10 22:48:02 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 210.2 KB, free 441.5 MB) 14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(17239) called with curMem=924758, maxMem=463837593 14/11/10 22:48:02 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 16.8 KB, free 441.5 MB) 14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on gonephishing.local:42648 (size: 16.8 KB, free: 442.3 MB) 14/11/10 22:48:02 INFO BlockManagerMaster: Updated info of block broadcast_6_piece0 14/11/10 22:48:02 INFO FileInputFormat: Total input paths to process : 1 14/11/10 22:48:02 INFO SparkContext: Starting job: foreach at console:13 14/11/10 22:48:02 INFO DAGScheduler: Got job 3 (foreach at console:13) with 2 output partitions (allowLocal=false) 14/11/10 22:48:02 INFO DAGScheduler: Final stage: Stage 3(foreach at console:13) 14/11/10 22:48:02 INFO DAGScheduler: Parents of final stage: List() 14/11/10 22:48:02 INFO DAGScheduler: Missing parents: List() 14/11/10 22:48:02 INFO DAGScheduler: Submitting Stage 3 (Desktop/mnd.txt MappedRDD[7] at textFile at console:13), which has no missing parents 14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(2504) called with curMem=941997, maxMem=463837593 14/11/10 22:48:02 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 2.4 KB, free 441.4 MB) 14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(1602) called with curMem=944501, maxMem=463837593 14/11/10 22:48:02 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 1602.0 B, free 441.4 MB) 14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on gonephishing.local:42648 (size: 1602.0 B, free: 442.3 MB) 14/11/10 22:48:02 INFO BlockManagerMaster: Updated info of block broadcast_7_piece0 14/11/10 22:48:02 INFO DAGScheduler: Submitting 2 missing tasks from Stage 3 (Desktop/mnd.txt MappedRDD[7] at textFile at console:13) 14/11/10 22:48:02 INFO TaskSchedulerImpl: Adding task set 3.0 with 2 tasks 14/11/10 22:48:02 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 6, gonephishing.local, PROCESS_LOCAL, 1216 bytes) 14/11/10 22:48:02 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 7, gonephishing.local, PROCESS_LOCAL, 1216 bytes) 14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on gonephishing.local:48857 (size: 1602.0 B, free: 442.3 MB) 14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on gonephishing.local:48857 (size: 16.8 KB, free: 442.3 MB) 14/11/10 22:48:02 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 6) in 308 ms on gonephishing.local (1/2) 14/11/10 22:48:02 INFO DAGScheduler: Stage 3 (foreach at console:13) finished in 0.321 s 14/11/10 22:48:02 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 7) in 315 ms on gonephishing.local (2/2) 14/11/10 22:48:02 INFO 
SparkContext: Job finished: foreach at console:13, took 0.376602079 s 14/11/10 22:48:02 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool === On Mon, Nov 10, 2014 at 8:01 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Try adding the following configurations also, might work. spark.rdd.compress true spark.storage.memoryFraction 1 spark.core.connection.ack.wait.timeout 600 spark.akka.frameSize 50 Thanks Best Regards On Mon, Nov 10, 2014 at 6:51 PM, Ritesh Kumar Singh riteshoneinamill...@gmail.com wrote: Hi, I am trying to submit my application using spark-submit, using following spark-default.conf params: spark.master spark://master-ip:7077 spark.eventLog.enabled true spark.serializer org.apache.spark.serializer.KryoSerializer spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers=one two three === But every time I am getting this error: 14/11/10 18:39:17 ERROR TaskSchedulerImpl: Lost executor 1 on aa.local: remote Akka client disassociated 14/11/10 18:39:17 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, aa.local): ExecutorLostFailure (executor lost) 14/11/10 18:39:17