Spark: Why can't Standalone mode set the executor number?
As far as I know, only YARN mode can set --num-executors, and it has been shown that running more executors can perform better than running only one or two executors with a lot of memory and many cores; see http://apache-spark-user-list.1001560.n3.nabble.com/executor-cores-vs-num-executors-td9878.html. Why does Standalone mode not provide a num-executors parameter, instead of generating executors with the spread-out strategy by default? Can anyone explain this in detail? Thanks :)
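For reference, a minimal sketch of the knobs Standalone mode does expose instead (the master URL and numbers below are made up for illustration): you cap the total cores an application may take and the memory per executor, and the master decides how to place executors across workers; spark.deploy.spreadOut, set on the master, toggles the spread-out strategy mentioned above.

    import org.apache.spark.{SparkConf, SparkContext}

    // Standalone mode has no --num-executors flag; bound the total cores and
    // per-executor memory and let the master place the executors.
    val conf = new SparkConf()
      .setMaster("spark://master-host:7077")   // hypothetical master URL
      .setAppName("standalone-resource-demo")
      .set("spark.cores.max", "8")             // total cores across the cluster
      .set("spark.executor.memory", "4g")      // memory per executor

    val sc = new SparkContext(conf)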
Re: Spark: why is a masterLock needed when sending a heartbeat to the master?
Thanks, I got it!
Spark: why is a masterLock needed when sending a heartbeat to the master?
I don't understand why the worker needs a master lock when sending a heartbeat. Is it because of master HA? Can anyone explain this in detail? Thanks.

Please refer to: http://stackoverflow.com/questions/25173219/why-does-the-spark-worker-actor-use-a-masterlock

    case SendHeartbeat =>
      masterLock.synchronized {
        if (connected) { master ! Heartbeat(workerId) }
      }
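Here is a minimal sketch (not the actual Worker source) of the kind of shared state such a lock typically protects: a re-registration path, for example after a master failover, can replace the master reference and flip the connected flag on one thread while the scheduled heartbeat reads them on another, so both sides synchronize on the same lock. The class and field names below are made up for illustration.

    import akka.actor.ActorRef

    // Hypothetical illustration of the pattern, not Spark's Worker class.
    class WorkerLike(workerId: String) {
      private val masterLock = new Object
      private var master: ActorRef = _   // set when (re-)registering with a master
      private var connected = false

      // Called on re-registration, possibly from a different thread.
      def changeMaster(newMaster: ActorRef): Unit = masterLock.synchronized {
        master = newMaster
        connected = true
      }

      // Called by the scheduled heartbeat task.
      def sendHeartbeat(heartbeat: Any): Unit = masterLock.synchronized {
        if (connected) { master ! heartbeat }   // same shape as the snippet above
      }
    }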
Re: spark1.0.1 spark sql error java.lang.NoClassDefFoundError: Could not initialize class $line11.$read$
Hi Yin Huai,

I tested again with your snippet and it works well in Spark 1.0.1. Here is my code:

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    case class Record(data_date: String, mobile: String, create_time: String)
    val mobile = Record("2014-07-20", "1234567", "2014-07-19")
    val lm = List(mobile)
    val mobileRDD = sc.makeRDD(lm)
    val mobileSchemaRDD = sqlContext.createSchemaRDD(mobileRDD)
    mobileSchemaRDD.registerAsTable("mobile")
    sqlContext.sql("select count(1) from mobile").collect()

The result is as below:

    14/07/22 15:49:53 INFO spark.SparkContext: Job finished: collect at SparkPlan.scala:52, took 0.296864832 s
    res9: Array[org.apache.spark.sql.Row] = Array([1])

But what is the main cause of this exception? And how did you find it by looking at unfamiliar names like $line11.$read$ and $line12.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$?

Thanks,
Victor
Re: spark1.0.1 spark sql error java.lang.NoClassDefFoundError: Could not initialize class $line11.$read$
Hi Kevin,

I tried it on Spark 1.0.0 and it works fine. So it's a bug in Spark 1.0.1...

Thanks,
Victor
Re: spark1.0.1 spark sql error java.lang.NoClassDefFoundError: Could not initialize class $line11.$read$
Hi Michael,

I only modified the default Hadoop version to 0.20.2-cdh3u5 and set DEFAULT_HIVE=true in SparkBuild.scala, then ran sbt/sbt assembly. I run in local standalone mode, started with sbin/start-all.sh; the Hadoop version is 0.20.2-cdh3u5. Then I use spark-shell to execute the Spark SQL. One machine is both master and slave, the OS is CentOS 5, and Mesos is not used.

Thanks,
Victor
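For anyone trying to reproduce this build, a sketch of the kind of edit described above in project/SparkBuild.scala (the constant names are recalled from the 1.0.x build file and may differ in your checkout; verify against your tree):

    // project/SparkBuild.scala -- hypothetical excerpt
    val DEFAULT_HADOOP_VERSION = "0.20.2-cdh3u5"  // replace the stock default Hadoop version
    val DEFAULT_HIVE = true                       // build Hive support into the assembly

followed by sbt/sbt assembly as above.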
Re: spark1.0.1 spark sql error java.lang.NoClassDefFoundError: Could not initialize class $line11.$read$
Hi Svend,

Your reply is very helpful to me. I'll keep an eye on that ticket. And also... Cheers :)

Best Regards,
Victor
spark1.0.1 spark sql error java.lang.NoClassDefFoundError: Could not initialize class $line11.$read$
When I run a query against a Hadoop file:

    mobile.registerAsTable("mobile")
    val count = sqlContext.sql("select count(1) from mobile")

    res5: org.apache.spark.sql.SchemaRDD =
    SchemaRDD[21] at RDD at SchemaRDD.scala:100
    == Query Plan ==
    ExistingRdd [data_date#0,mobile#1,create_time#2], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:176

and then call collect:

    count.collect()

it throws the exception below. Can anyone help me?

    Job aborted due to stage failure: Task 3.0:22 failed 4 times, most recent failure: Exception failure in TID 153 on host wh-8-210: java.lang.NoClassDefFoundError: Could not initialize class $line11.$read$
        $line12.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:19)
        $line12.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:19)
        scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        scala.collection.Iterator$$anon$1.next(Iterator.scala:853)
        scala.collection.Iterator$$anon$1.head(Iterator.scala:840)
        org.apache.spark.sql.execution.ExistingRdd$$anonfun$productToRowRdd$1.apply(basicOperators.scala:181)
        org.apache.spark.sql.execution.ExistingRdd$$anonfun$productToRowRdd$1.apply(basicOperators.scala:176)
        org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
        org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
        org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
        org.apache.spark.scheduler.Task.run(Task.scala:51)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        java.lang.Thread.run(Thread.java:722)

    Driver stacktrace:
    java.lang.ExceptionInInitializerError
        at $line11.$read$$iwC.<init>(<console>:6)
        at $line11.$read.<init>(<console>:26)
        at $line11.$read$.<init>(<console>:30)
        at $line11.$read$.<clinit>(<console>)
        at $line12.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:19)
        at $line12.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:19)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$1.next(Iterator.scala:853)
        at scala.collection.Iterator$$anon$1.head(Iterator.scala:840)
        at org.apache.spark.sql.execution.ExistingRdd$$anonfun$productToRowRdd$1.apply(basicOperators.scala:181)
        at org.apache.spark.sql.execution.ExistingRdd$$anonfun$productToRowRdd$1.apply(basicOperators.scala:176)
        at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
        at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

My classpath is:

    /app/hadoop/spark-1.0.1/assembly/target/scala-2.10/spark-assembly-1.0.1-hadoop0.20.2-cdh3u5.jar  (System Classpath)
    /app/hadoop/spark-1.0.1/conf  (System Classpath)
    /app/hadoop/spark-1.0.1/lib_managed/jars/JavaEWAH-0.3.2.jar  (System Classpath)
    /app/hadoop/spark-1.0.1/lib_managed/jars/JavaEWAH-0.6.6.jar  (System Classpath)
    /app/hadoop/spark-1.0.1/lib_managed/jars/ST4-4.0.4.jar  (System Classpath)
    /app/hadoop/spark-1.0.1/lib_managed/jars/activation-1.1.jar  (System Classpath)
    /app/hadoop/spark-1.0.1/lib_managed/jars/akka-actor_2.10-2.2.3-shaded-protobuf.jar  (System Classpath)
    /app/hadoop/spark-1.0.1/lib_managed/jars/algebird-core_2.10-0.1.11.jar  (System Classpath)
spark1.0.1 catalyst transform filter not push down
Hi,

I encountered a weird problem in Spark SQL. I use sbt/sbt hive/console to get into the shell and test filter push-down with Catalyst:

    scala> val queryPlan = sql("select value from (select key,value from src) a where a.key=86")
    scala> queryPlan.baseLogicalPlan
    res0: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
    Project ['value]
     Filter ('a.key = 86)
      Subquery a
       Project ['key,'value]
        UnresolvedRelation None, src, None

I want to achieve filter push-down, so I run:

    scala> var newQuery = queryPlan.baseLogicalPlan transform {
         |   case f @ Filter(_, p @ Project(_, grandChild))
         |     if (f.references subsetOf grandChild.output) =>
         |       p.copy(child = f.copy(child = grandChild))
         | }
    <console>:42: error: type mismatch;
     found   : Seq[org.apache.spark.sql.catalyst.expressions.Attribute]
     required: scala.collection.GenSet[org.apache.spark.sql.catalyst.expressions.Attribute]
               if (f.references subsetOf grandChild.output) =>
                                                    ^

It throws the type mismatch above and I don't know what's wrong. If I instead run:

    scala> var newQuery = queryPlan.baseLogicalPlan transform {
         |   case f @ Filter(_, p @ Project(_, grandChild))
         |     if true =>
         |       p.copy(child = f.copy(child = grandChild))
         | }
    newQuery: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
    Project ['value]
     Filter ('a.key = 86)
      Subquery a
       Project ['key,'value]
        UnresolvedRelation None, src, None

the Filter stays in the same position; the order is not switched. Can anyone guide me on this?
Re: spark1.0.1 catalyst transform filter not push down
I used queryPlan.queryExecution.analyzed to get the logical plan, and it works. What you explained to me is very useful. Thank you very much.
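For later readers, a sketch that combines the two points from this thread, run in the same sbt/sbt hive/console session as above: apply the rule to the analyzed plan rather than baseLogicalPlan, and fix the type mismatch by comparing against a Set (this assumes the Spark 1.0.x Catalyst API, where references is a set of Attributes and output is a Seq).

    import org.apache.spark.sql.catalyst.plans.logical.{Filter, Project}

    // Work on the analyzed plan so that attributes are resolved.
    val analyzed = queryPlan.queryExecution.analyzed

    val newQuery = analyzed transform {
      case f @ Filter(_, p @ Project(_, grandChild))
          // grandChild.output is a Seq, so convert it before subsetOf
          if f.references subsetOf grandChild.output.toSet =>
        // push the Filter below the Project
        p.copy(child = f.copy(child = grandChild))
    }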