Spark Job Doesn't End on Mesos

2016-08-09 Thread Todd Leo
Hi,

I’m running Spark jobs on Mesos. When a job finishes, the *SparkContext* is
closed manually by calling sc.stop(). The Mesos log then shows:

I0809 15:48:34.132014 11020 sched.cpp:1589] Asked to stop the driver
I0809 15:48:34.132181 11277 sched.cpp:831] Stopping framework
'20160808-170425-2365980426-5050-4372-0034'
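
For reference, the job has roughly this shape (a minimal sketch; the master
URL, app name and the actual work are placeholders, not the real job):

import org.apache.spark.{SparkConf, SparkContext}

object MesosJob {
  def main(args: Array[String]): Unit = {
    // hypothetical Mesos master URL and app name, not taken from the real job
    val conf = new SparkConf()
      .setAppName("mesos-job")
      .setMaster("mesos://zk://host1:2181,host2:2181/mesos")
    val sc = new SparkContext(conf)

    // placeholder work standing in for the real computation
    val total = sc.parallelize(1 to 1000).map(_ * 2).reduce(_ + _)
    println(s"total = $total")

    // the context is stopped explicitly here, as described above,
    // yet the driver JVM stays alive afterwards
    sc.stop()
  }
}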

However, the driver process never actually quits. This is critical, because I’d
like to use SparkLauncher to submit such jobs: if a job never ends, the launched
jobs will pile up and fill up the memory. Please help. :-|
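
For reference, this is roughly how I intend to drive such jobs with
SparkLauncher (a sketch; the jar path, class name and master URL are
placeholders):

import org.apache.spark.launcher.SparkLauncher

object Submitter {
  def main(args: Array[String]): Unit = {
    // placeholder jar path, main class and master URL
    val driver = new SparkLauncher()
      .setAppResource("/path/to/my-job.jar")
      .setMainClass("com.example.MesosJob")
      .setMaster("mesos://zk://host1:2181,host2:2181/mesos")
      .setConf("spark.executor.memory", "4g")
      .launch()

    // waitFor() blocks until the driver JVM exits; if sc.stop() does not
    // actually end that process, this never returns and jobs pile up
    val exitCode = driver.waitFor()
    println(s"driver exited with code $exitCode")
  }
}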

—
BR,
Todd Leo


OutOfMemoryError when using DataFrame created by Spark SQL

2015-03-25 Thread Todd Leo
Hi,

I am using *Spark SQL* to query my *Hive cluster*, following the Spark SQL
and DataFrame Guide
(https://spark.apache.org/docs/latest/sql-programming-guide.html) step by
step. However, my HiveQL query issued via sqlContext.sql() fails with
java.lang.OutOfMemoryError. The expected result of the query should be small,
since it carries a limit 1000 clause. My code is shown below:

scala> import sqlContext.implicits._
scala> val df = sqlContext.sql("select * from some_table where logdate='2015-03-24' limit 1000")
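
For context, the sqlContext above is assumed to be Hive-enabled, as in the
linked guide; in a spark-shell built with Hive support it already is,
otherwise it would be constructed roughly like this:

scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)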

and the error msg:

[ERROR] [03/25/2015 16:08:22.379] [sparkDriver-scheduler-27]
[ActorSystem(sparkDriver)] Uncaught fatal error from thread
[sparkDriver-scheduler-27] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded

the master heap memory is set with -Xms512m -Xmx512m, while the workers are set
with -Xms4096M -Xmx4096M, which I presume is sufficient for this trivial query.
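
For reference, the effective values can be double-checked from the shell
roughly like this (a sketch; the fallback strings passed as defaults are
placeholders, not the real configuration):

scala> // heap actually available to the spark-shell (driver) JVM, in MB
scala> java.lang.Runtime.getRuntime.maxMemory / (1024 * 1024)
scala> // values Spark was asked for; the defaults here are just placeholders
scala> sc.getConf.get("spark.driver.memory", "512m")
scala> sc.getConf.get("spark.executor.memory", "not set")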

Additionally, after restarting the spark-shell and re-running the query with
limit 5, the df object is returned and can be printed by df.show(), but other
APIs fail with OutOfMemoryError, namely df.count(),
df.select("some_field").show() and so forth.

I understand that the RDD can be collected to the driver so that further
transformations can be applied. Given that DataFrame offers “richer
optimizations under the hood” and a familiar convention for an R/Julia user, I
really hope this error can be tackled and that DataFrame is robust enough to
depend on.
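
Concretely, the kind of driver-side usage I am after looks roughly like this
(a sketch; the field name is the placeholder from the query above):

scala> // pull the already-limited result to the driver for further work
scala> val rows = df.collect()
scala> // or narrow further before collecting
scala> val fields = df.select("some_field").limit(100).collect()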

Thanks in advance!

REGARDS,
Todd



