Spark: Why can't Standalone mode set the executor number?

2014-08-22 Thread Victor Sheng
As far as I know, only YARN mode can set --num-executors. Someone showed that
setting more executors performs better than setting only 1 or 2 executors with
large memory and many cores; see
http://apache-spark-user-list.1001560.n3.nabble.com/executor-cores-vs-num-executors-td9878.html

Why doesn't Standalone mode provide a num-executors parameter, instead of
using the spread-out strategy by default to allocate executors?

Can anyone explain this in detail? Thanks :)
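For later readers: around this Spark version, Standalone mode sizes executors indirectly rather than by count (at most one executor per worker per application), so the knobs are total cores and per-executor memory rather than a --num-executors flag. A minimal sketch of those settings, mine rather than from the thread, with a hypothetical master URL:

 import org.apache.spark.{SparkConf, SparkContext}

 // Sketch only: the executor count falls out of how spark.cores.max worth of
 // cores is spread (or not) across workers, not from an explicit executor number.
 val conf = new SparkConf()
   .setAppName("standalone-executor-sizing")
   .setMaster("spark://master-host:7077")   // hypothetical master URL
   .set("spark.cores.max", "8")             // total cores this app may take
   .set("spark.executor.memory", "4g")      // memory per executor
   .set("spark.deploy.spreadOut", "true")   // default: spread cores across workers
 val sc = new SparkContext(conf)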



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Why-Standalone-mode-can-not-set-Executor-Number-tp12684.html



Re: Spark: why is a masterLock needed when sending a heartbeat to the master

2014-08-18 Thread Victor Sheng
Thanks, I got it!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-why-need-a-masterLock-when-sending-heartbeat-to-master-tp12256p12297.html



Spark: why is a masterLock needed when sending a heartbeat to the master

2014-08-17 Thread Victor Sheng
I don't understand why the worker needs a master lock when sending a heartbeat.

Is it caused by master HA? Can anyone explain this in detail? Thanks~

Please refer to:
http://stackoverflow.com/questions/25173219/why-does-the-spark-worker-actor-use-a-masterlock

 case SendHeartbeat =>
   masterLock.synchronized {
     if (connected) { master ! Heartbeat(workerId) }
   }
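My reading (hedged; see the linked Stack Overflow answer for the authoritative one) is that the worker's master reference and connected flag are mutable state that a master change during HA failover can rewrite while a heartbeat is being sent, so both code paths take the same lock. A minimal, self-contained sketch of that pattern with illustrative names, not the real Worker.scala:

 // Sketch only: `master` and `connected` are shared mutable state; a heartbeat
 // path reads them while a failover handler may rewrite them, so both
 // synchronize on the same lock.
 object HeartbeatSketch {
   private val masterLock = new Object
   private var connected = false
   private var master: String = _   // stands in for the master ActorRef

   def sendHeartbeat(workerId: String): Unit = masterLock.synchronized {
     if (connected) println(s"Heartbeat($workerId) -> $master")
   }

   def onMasterChanged(newMasterUrl: String): Unit = masterLock.synchronized {
     master = newMasterUrl          // re-point at the newly elected master
     connected = true
   }
 }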



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-why-need-a-masterLock-when-sending-heartbeat-to-master-tp12256.html



Re: spark1.0.1 spark sql error java.lang.NoClassDefFoundError: Could not initialize class $line11.$read$

2014-07-22 Thread Victor Sheng
Hi Yin Huai,
I tested again with your snippet code.
It works well in Spark 1.0.1.

Here is my code:
 
 val sqlContext = new org.apache.spark.sql.SQLContext(sc)
 case class Record(data_date: String, mobile: String, create_time: String)
 val mobile = Record("2014-07-20", "1234567", "2014-07-19")
 val lm = List(mobile)
 val mobileRDD = sc.makeRDD(lm)
 val mobileSchemaRDD = sqlContext.createSchemaRDD(mobileRDD)
 mobileSchemaRDD.registerAsTable("mobile")
 sqlContext.sql("select count(1) from mobile").collect()
 
The result is as follows:
14/07/22 15:49:53 INFO spark.SparkContext: Job finished: collect at
SparkPlan.scala:52, took 0.296864832 s
res9: Array[org.apache.spark.sql.Row] = Array([1])

   
But what is the main cause of this exception? And how did you work it out from
cryptic names like $line11.$read$ and
$line12.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$?

Thanks,
Victor




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark1-0-1-spark-sql-error-java-lang-NoClassDefFoundError-Could-not-initialize-class-line11-read-tp10135p10390.html


Re: spark1.0.1 spark sql error java.lang.NoClassDefFoundError: Could not initialize class $line11.$read$

2014-07-21 Thread Victor Sheng
Hi Kevin,
I tried it on Spark 1.0.0; it works fine there.
It's a bug in Spark 1.0.1 ...
Thanks,
Victor



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark1-0-1-spark-sql-error-java-lang-NoClassDefFoundError-Could-not-initialize-class-line11-read-tp10135p10288.html


Re: spark1.0.1 spark sql error java.lang.NoClassDefFoundError: Could not initialize class $line11.$read$

2014-07-20 Thread Victor Sheng
Hi Michael,
I only modified the default Hadoop version to 0.20.2-cdh3u5 and set
DEFAULT_HIVE=true in SparkBuild.scala, then ran sbt/sbt assembly.
I just run in local standalone mode, started with sbin/start-all.sh.
The Hadoop version is 0.20.2-cdh3u5.
Then I use spark-shell to execute the Spark SQL query.

One machine is both master and slave. The OS is CentOS 5.
Mesos is not used.
Thanks,
Victor
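For concreteness, a sketch of the kind of edit described above. The poster names DEFAULT_HIVE; DEFAULT_HADOOP_VERSION is my assumption for the matching version setting, so check the actual project/SparkBuild.scala:

 // Hypothetical sketch of the SparkBuild.scala changes described in this post;
 // verify the val names against the real 1.0.1 build file.
 val DEFAULT_HADOOP_VERSION = "0.20.2-cdh3u5"
 val DEFAULT_HIVE = true

followed by sbt/sbt assembly, as in the post.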



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark1-0-1-spark-sql-error-java-lang-NoClassDefFoundError-Could-not-initialize-class-line11-read-tp10135p10266.html


Re: spark1.0.1 spark sql error java.lang.NoClassDefFoundError: Could not initialize class $line11.$read$

2014-07-18 Thread Victor Sheng
Hi Svend,
Your reply is very helpful to me. I'll keep an eye on that ticket.
And also... cheers :)
Best Regards,
Victor



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark1-0-1-spark-sql-error-java-lang-NoClassDefFoundError-Could-not-initialize-class-line11-read-tp10135p10162.html


spark1.0.1 spark sql error java.lang.NoClassDefFoundError: Could not initialize class $line11.$read$

2014-07-17 Thread Victor Sheng
When I run a query against a Hadoop file:
mobile.registerAsTable("mobile")
val count = sqlContext.sql("select count(1) from mobile")
res5: org.apache.spark.sql.SchemaRDD = 
SchemaRDD[21] at RDD at SchemaRDD.scala:100
== Query Plan ==
ExistingRdd [data_date#0,mobile#1,create_time#2], MapPartitionsRDD[4] at
mapPartitions at basicOperators.scala:176

When I run collect:
count.collect()

it throws the exceptions below. Can anyone help me?

Job aborted due to stage failure: Task 3.0:22 failed 4 times, most recent
failure: Exception failure in TID 153 on host wh-8-210:
java.lang.NoClassDefFoundError: Could not initialize class $line11.$read$
$line12.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:19)
$line12.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:19)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
scala.collection.Iterator$$anon$1.next(Iterator.scala:853)
scala.collection.Iterator$$anon$1.head(Iterator.scala:840)
org.apache.spark.sql.execution.ExistingRdd$$anonfun$productToRowRdd$1.apply(basicOperators.scala:181)
org.apache.spark.sql.execution.ExistingRdd$$anonfun$productToRowRdd$1.apply(basicOperators.scala:176)
org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
org.apache.spark.scheduler.Task.run(Task.scala:51)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
java.lang.Thread.run(Thread.java:722) Driver stacktrace:



java.lang.ExceptionInInitializerError
at $line11.$read$$iwC.<init>(<console>:6)
at $line11.$read.<init>(<console>:26)
at $line11.$read$.<init>(<console>:30)
at $line11.$read$.<clinit>(<console>)
at $line12.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:19)
at $line12.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:19)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$1.next(Iterator.scala:853)
at scala.collection.Iterator$$anon$1.head(Iterator.scala:840)
at
org.apache.spark.sql.execution.ExistingRdd$$anonfun$productToRowRdd$1.apply(basicOperators.scala:181)
at
org.apache.spark.sql.execution.ExistingRdd$$anonfun$productToRowRdd$1.apply(basicOperators.scala:176)
at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
at org.apache.spark.scheduler.Task.run(Task.scala:51)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)


My classpath is:
/app/hadoop/spark-1.0.1/assembly/target/scala-2.10/spark-assembly-1.0.1-hadoop0.20.2-cdh3u5.jar   System Classpath
/app/hadoop/spark-1.0.1/conf   System Classpath
/app/hadoop/spark-1.0.1/lib_managed/jars/JavaEWAH-0.3.2.jar   System Classpath
/app/hadoop/spark-1.0.1/lib_managed/jars/JavaEWAH-0.6.6.jar   System Classpath
/app/hadoop/spark-1.0.1/lib_managed/jars/ST4-4.0.4.jar   System Classpath
/app/hadoop/spark-1.0.1/lib_managed/jars/activation-1.1.jar   System Classpath
/app/hadoop/spark-1.0.1/lib_managed/jars/akka-actor_2.10-2.2.3-shaded-protobuf.jar   System Classpath
/app/hadoop/spark-1.0.1/lib_managed/jars/algebird-core_2.10-0.1.11.jar   System Classpath

spark1.0.1 catalyst transform filter not push down

2014-07-14 Thread victor sheng
Hi, I encountered a weird problem in Spark SQL.
I use sbt/sbt hive/console to get into the shell.

I am testing filter push-down with Catalyst.

scala> val queryPlan = sql("select value from (select key,value from src) a where a.key=86")
scala> queryPlan.baseLogicalPlan
res0: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan = 
Project ['value]
 Filter ('a.key = 86)
  Subquery a
   Project ['key,'value]
UnresolvedRelation None, src, None

I want to achieve filter push-down.

So I run:
scala> var newQuery = queryPlan.baseLogicalPlan transform {
     | case f @ Filter(_, p @ Project(_,grandChild)) 
     | if (f.references subsetOf grandChild.output) => 
     | p.copy(child = f.copy(child = grandChild))
     | }
<console>:42: error: type mismatch;
 found   : Seq[org.apache.spark.sql.catalyst.expressions.Attribute]
 required: scala.collection.GenSet[org.apache.spark.sql.catalyst.expressions.Attribute]
       if (f.references subsetOf grandChild.output) => 
                                 ^
It throws the exception above. I don't know what's wrong.

If I run:

scala> var newQuery = queryPlan.baseLogicalPlan transform {
     | case f @ Filter(_, p @ Project(_,grandChild)) 
     | if true => 
     | p.copy(child = f.copy(child = grandChild))
     | }
newQuery: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan = 
Project ['value]
 Filter ('a.key = 86)
  Subquery a
   Project ['key,'value]
UnresolvedRelation None, src, None

It seems the Filter is still in the same position; the order has not switched.
Can anyone guide me on this?




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark1-0-1-catalyst-transform-filter-not-push-down-tp9599.html


Re: spark1.0.1 catalyst transform filter not push down

2014-07-14 Thread Victor Sheng
I use queryPlan.queryExecution.analyzed to get the logical plan.

It works.

And what you explained to me is very useful.

Thank you very much.
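For anyone landing here later, a minimal sketch of what the working transform might look like: my reconstruction, not verified against 1.0.1. It runs against the analyzed plan, as this reply describes, and converts grandChild.output to a Set so that subsetOf type-checks, which is what the reported mismatch points at. Filter and Project are the Catalyst logical operators already in scope in sbt/sbt hive/console:

 // Sketch only: use the resolved (analyzed) plan instead of baseLogicalPlan,
 // and compare an attribute Set against a Set, not against a Seq.
 val analyzed = queryPlan.queryExecution.analyzed
 val newQuery = analyzed transform {
   case f @ Filter(_, p @ Project(_, grandChild))
       if f.references subsetOf grandChild.output.toSet =>
     p.copy(child = f.copy(child = grandChild))
 }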



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark1-0-1-catalyst-transform-filter-not-push-down-tp9599p9689.html