[ 
https://issues.apache.org/jira/browse/SPARK-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216475#comment-15216475
 ] 

Sayak Ghosh commented on SPARK-14240:
-------------------------------------

As I am new to this environment, I cannot identify the exact cause, but I 
will describe what I observed.
I am using 1 master node (2 cores and 4 GB RAM) and 2 slave nodes (4 cores 
and 7 GB RAM). At the beginning the application ran without problems, but 
then it slowed down and ultimately halted.

I have been constantly monitoring the task manager on the slave nodes. At 
times CPU usage reached 350% on both slave nodes, although memory usage was 
fine. I cannot find the reason for this.
When the application halted, CPU and memory usage dropped to 1-2%.

============
My spark-env.sh has the following configuration:

export SPARK_PUBLIC_DNS="azuremaster.westus.cloudapp.azure.com"
export SPARK_EXECUTOR_INSTANCES=1
export SPARK_EXECUTOR_CORES=2
export SPARK_EXECUTOR_MEMORY=3G
export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=1"
export SPARK_WORKER_PORT="8888"
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=python3
export SPARK_HIVE=true

One more thing I should mention: I have created a lot of temp tables to store 
the processed SQL data frames. Could this be causing memory issues?
Please give me some suggestions.
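
One way to test whether the temp tables matter is to drop them once a step no longer needs them and see whether the slowdown changes. Below is a minimal sketch, assuming the Spark 1.6 `HiveContext` used in the issue (`registerDataFrameAsTable` and `dropTempTable` are its methods). The `TempTableTracker` helper is hypothetical and not part of Spark.

```python
class TempTableTracker:
    """Hypothetical helper: remembers temp table names so they can be dropped."""

    def __init__(self, ctx):
        # ctx is assumed to be a SQLContext/HiveContext (Spark 1.6 API).
        self.ctx = ctx
        self.names = []

    def register(self, df, name):
        # Same call as in the issue's snippet, plus bookkeeping of the name.
        self.ctx.registerDataFrameAsTable(df, name)
        self.names.append(name)

    def drop_all(self):
        # Drop the most recently registered tables first; return what was dropped.
        dropped = []
        while self.names:
            name = self.names.pop()
            self.ctx.dropTempTable(name)
            dropped.append(name)
        return dropped


# Usage (assumes an existing HiveContext named hivectx, as in the issue):
#   tables = TempTableTracker(hivectx)
#   tables.register(aggregatedDataV1, "aggregatedDataV1")
#   ...run the queries for this step...
#   tables.drop_all()
```

This only removes the table registrations. Whether it changes the behaviour reported here is untested.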


> PySpark Standalone Application hangs without any Error message
> --------------------------------------------------------------
>
>                 Key: SPARK-14240
>                 URL: https://issues.apache.org/jira/browse/SPARK-14240
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, PySpark
>    Affects Versions: 1.6.0
>            Reporter: Sayak Ghosh
>
> I am relatively new to Spark and wrote a simple script using Python and Spark 
> SQL. It runs fine in the starting phase of the execution, but it gradually 
> slows down, and at the end of the last phase the whole application hangs.
> Here is my code snippet:
> hivectx.registerDataFrameAsTable(aggregatedDataV1, "aggregatedDataV1")
> q1 = ("SELECT *, (Total_Sale/Sale_Weeks) as Average_Sale_Per_SaleWeek, "
>       "(Total_Weeks/Sale_Weeks) as Velocity FROM aggregatedDataV1")
> aggregatedData = hivectx.sql(q1)
> aggregatedData.show(100)
> ========== Terminal Hanging with the following =========
> 16/03/29 09:05:50 INFO TaskSetManager: Finished task 96.0 in stage 416.0 (TID 
> 19992) in 41924 ms on 10.9.0.7 (104/200)
> 16/03/29 09:05:50 INFO TaskSetManager: Finished task 108.0 in stage 416.0 
> (TID 20004) in 24608 ms on 10.9.0.10 (105/200)
> 16/03/29 09:05:50 INFO TaskSetManager: Finished task 105.0 in stage 416.0 
> (TID 20001) in 24610 ms on 10.9.0.10 (106/200)
> 16/03/29 09:05:55 INFO TaskSetManager: Starting task 116.0 in stage 416.0 
> (TID 20012, 10.9.0.10, partition 116,NODE_LOCAL, 2240 bytes)
> 16/03/29 09:06:31 INFO TaskSetManager: Finished task 99.0 in stage 416.0 (TID 
> 19995) in 78435 ms on 10.9.0.7 (110/200)
> 16/03/29 09:06:40 INFO TaskSetManager: Starting task 119.0 in stage 416.0 
> (TID 20015, 10.9.0.10, partition 119,NODE_LOCAL, 2240 bytes)
> 16/03/29 09:07:12 INFO TaskSetManager: Starting task 122.0 in stage 416.0 
> (TID 20018, 10.9.0.7, partition 122,NODE_LOCAL, 2240 bytes) 
> 16/03/29 09:07:16 INFO TaskSetManager: Starting task 123.0 in stage 416.0 
> (TID 20019, 10.9.0.7, partition 123,NODE_LOCAL, 2240 bytes)
> 16/03/29 09:07:28 INFO TaskSetManager: Finished task 111.0 in stage 416.0 
> (TID 20007) in 110198 ms on 10.9.0.7 (114/200)
> 16/03/29 09:07:52 INFO TaskSetManager: Starting task 124.0 in stage 416.0 
> (TID 20020, 10.9.0.10, partition 124,NODE_LOCAL, 2240 bytes)
> 16/03/29 09:08:08 INFO TaskSetManager: Finished task 110.0 in stage 416.0 
> (TID 20006) in 150023 ms on 10.9.0.7 (115/200)
> 16/03/29 09:08:12 INFO TaskSetManager: Finished task 113.0 in stage 416.0 
> (TID 20009) in 154120 ms on 10.9.0.7 (116/200)
> 16/03/29 09:08:16 INFO TaskSetManager: Finished task 116.0 in stage 416.0 
> (TID 20012) in 145691 ms on 10.9.0.10 (117/200)
> There is no sign of error.
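
The slowdown is visible in the log excerpt above: the "Finished task" durations climb from roughly 24-42 s at 09:05:50 to roughly 145-154 s by 09:08. Below is a minimal sketch for pulling those numbers out of a driver log, assuming the log was pasted through a mail client that added "> " quote markers and wrapped long lines. The function name `finished_tasks` and the regular expression are illustrative, not part of Spark.

```python
import re

# Matches TaskSetManager lines of the form seen in the excerpt:
# "Finished task 96.0 in stage 416.0 (TID 19992) in 41924 ms on 10.9.0.7 (104/200)"
FINISHED = re.compile(
    r"Finished task (\d+)\.\d+ in stage (\d+)\.\d+ \(TID (\d+)\) "
    r"in (\d+) ms on (\S+) \((\d+)/(\d+)\)"
)


def finished_tasks(log_text):
    """Return (completed_count, host, duration_ms) for each finished task."""
    # Drop mail quote markers, then rejoin lines that the mailer wrapped.
    lines = (line.lstrip("> ").strip() for line in log_text.splitlines())
    flat = " ".join(part for part in lines if part)
    return [
        (int(m.group(6)), m.group(5), int(m.group(4)))
        for m in FINISHED.finditer(flat)
    ]


# Two entries copied from the excerpt, including the wrapped lines.
sample = """\
> 16/03/29 09:05:50 INFO TaskSetManager: Finished task 96.0 in stage 416.0 (TID
> 19992) in 41924 ms on 10.9.0.7 (104/200)
> 16/03/29 09:08:12 INFO TaskSetManager: Finished task 113.0 in stage 416.0
> (TID 20009) in 154120 ms on 10.9.0.7 (116/200)
"""

for done, host, ms in finished_tasks(sample):
    print(f"{done:>3} tasks done  {host:<10} {ms / 1000:.1f} s")
```

Sorting or grouping the returned tuples by host would show whether one worker is slowing down more than the other.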



