I suspect this is not related to SparkR, but rather something wrong in Spark Core.

Could you try your application logic within spark-shell (you would have to use the Scala DataFrame API) instead of the SparkR shell, to see if this issue still happens?
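
For example, a rough Scala equivalent of the snippet from your mail could look like the sketch below. This is only a sketch: it assumes spark-shell was started with --packages com.databricks:spark-csv_2.10:1.2.0 and uses the sqlContext that spark-shell provides.

// Read flights.csv with spark-csv, repartition to 200 partitions as in the
// SparkR code below, register a temp table and collect the query result.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("/opt/Spark/flights.csv")
  .repartition(200)
df.registerTempTable("flights")
val deps = sqlContext.sql("SELECT dep FROM flights").collect()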

-----Original Message-----
From: rporcio [mailto:rpor...@gmail.com] 
Sent: Friday, October 30, 2015 11:09 PM
To: user@spark.apache.org
Subject: SparkR job with >200 tasks hangs when calling from web server

Hi,

I have a web server which can execute R codes using SparkR.
The R session is created with the Rscript init.R command, where the init.R file contains a SparkR initialization section:

library(SparkR, lib.loc = paste("/opt/Spark/spark-1.5.1-bin-hadoop2.6", "R", "lib", sep = "/"))
sc <<- sparkR.init(master = "local[4]", appName = "TestR",
                   sparkHome = "/opt/Spark/spark-1.5.1-bin-hadoop2.6",
                   sparkPackages = "com.databricks:spark-csv_2.10:1.2.0")
sqlContext <<- sparkRSQL.init(sc)

I have the below example R code that I want to execute (flights.csv comes from 
SparkR examples):

df <- read.df(sqlContext, "/opt/Spark/flights.csv", source = "com.databricks.spark.csv", header = "true")
registerTempTable(df, "flights")
depDF <- sql(sqlContext, "SELECT dep FROM flights")
deps <- collect(depDF)

If I run this code, it executes successfully. When I check the Spark UI, I see that the corresponding job has only 2 tasks.

But if I change the first row to

df <- repartition(read.df(sqlContext, "/opt/Spark/flights.csv", source = "com.databricks.spark.csv", header = "true"), 200)

and execute the R code again, the corresponding job has 202 tasks, of which it successfully finishes some (like 132/202) but then hangs forever.

If I check the stderr of the executor, I can see that the executor can't communicate with the driver:

15/10/30 15:34:24 WARN AkkaRpcEndpointRef: Error sending message [message = Heartbeat(0,[Lscala.Tuple2;@36834e15,BlockManagerId(0, 192.168.178.198, 7092))] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [30 seconds]. This timeout is controlled by spark.rpc.askTimeout

I tried changing the memory (e.g. 4g for the driver), Akka, and timeout settings, but with no luck.

Executing the same code (with the repartition part) directly from R, it finishes successfully, so I assume the problem is somehow related to the web server, but I can't figure it out.

I'm using CentOS.

Can someone give me some advice on what I should try?

Thanks







---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
