[ https://issues.apache.org/jira/browse/SPARK-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216475#comment-15216475 ]
Sayak Ghosh commented on SPARK-14240:
-------------------------------------

As I am new to this environment, I cannot identify the exact cause, but here is what I observed.

I am using 1 master node (2 cores & 4 GB RAM) and 2 slave nodes (4 cores & 7 GB RAM). At the beginning the application ran perfectly, but then it slowed down and ultimately halted. I have been constantly monitoring the task manager on the slave nodes. At times I observed CPU usage reaching 350% on both slave nodes, although memory usage was fine. I cannot find the reason for this. When the application halted, CPU and memory usage dropped to 1-2%.

============

My spark-env.sh has the following configuration:

export SPARK_PUBLIC_DNS="azuremaster.westus.cloudapp.azure.com"
export SPARK_EXECUTOR_INSTANCES=1
export SPARK_EXECUTOR_CORES=2
export SPARK_EXECUTOR_MEMORY=3G
export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=1"
export SPARK_WORKER_PORT="8888"
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=python3
export SPARK_HIVE=true

One more thing I should mention: I have created a lot of temp tables to store the processed SQL data frames. Could this be causing memory issues? Please give me some suggestions.

> PySpark Standalone Application hangs without any Error message
> --------------------------------------------------------------
>
>                 Key: SPARK-14240
>                 URL: https://issues.apache.org/jira/browse/SPARK-14240
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, PySpark
>    Affects Versions: 1.6.0
>            Reporter: Sayak Ghosh
>
> I am relatively new to Spark and wrote a simple script using Python and Spark SQL.
> My problem is that it runs fine in the starting phase of the execution, but it gradually slows down, and at the end of the last phase the whole application hangs.
> Here is my code snippet -
> hivectx.registerDataFrameAsTable(aggregatedDataV1,"aggregatedDataV1")
> q1 = "SELECT *, (Total_Sale/Sale_Weeks) as Average_Sale_Per_SaleWeek, (Total_Weeks/Sale_Weeks) as Velocity FROM aggregatedDataV1"
> aggregatedData = hivectx.sql(q1)
> aggregatedData.show(100)
> ========== Terminal Hanging with the following =========
> 16/03/29 09:05:50 INFO TaskSetManager: Finished task 96.0 in stage 416.0 (TID 19992) in 41924 ms on 10.9.0.7 (104/200)
> 16/03/29 09:05:50 INFO TaskSetManager: Finished task 108.0 in stage 416.0 (TID 20004) in 24608 ms on 10.9.0.10 (105/200)
> 16/03/29 09:05:50 INFO TaskSetManager: Finished task 105.0 in stage 416.0 (TID 20001) in 24610 ms on 10.9.0.10 (106/200)
> 16/03/29 09:05:55 INFO TaskSetManager: Starting task 116.0 in stage 416.0 (TID 20012, 10.9.0.10, partition 116,NODE_LOCAL, 2240 bytes)
> 16/03/29 09:06:31 INFO TaskSetManager: Finished task 99.0 in stage 416.0 (TID 19995) in 78435 ms on 10.9.0.7 (110/200)
> 16/03/29 09:06:40 INFO TaskSetManager: Starting task 119.0 in stage 416.0 (TID 20015, 10.9.0.10, partition 119,NODE_LOCAL, 2240 bytes)
> 16/03/29 09:07:12 INFO TaskSetManager: Starting task 122.0 in stage 416.0 (TID 20018, 10.9.0.7, partition 122,NODE_LOCAL, 2240 bytes)
> 16/03/29 09:07:16 INFO TaskSetManager: Starting task 123.0 in stage 416.0 (TID 20019, 10.9.0.7, partition 123,NODE_LOCAL, 2240 bytes)
> 16/03/29 09:07:28 INFO TaskSetManager: Finished task 111.0 in stage 416.0 (TID 20007) in 110198 ms on 10.9.0.7 (114/200)
> 16/03/29 09:07:52 INFO TaskSetManager: Starting task 124.0 in stage 416.0 (TID 20020, 10.9.0.10, partition 124,NODE_LOCAL, 2240 bytes)
> 16/03/29 09:08:08 INFO TaskSetManager: Finished task 110.0 in stage 416.0 (TID 20006) in 150023 ms on 10.9.0.7 (115/200)
> 16/03/29 09:08:12 INFO TaskSetManager: Finished task 113.0 in stage 416.0 (TID 20009) in 154120 ms on 10.9.0.7 (116/200)
> 16/03/29 09:08:16 INFO TaskSetManager: Finished task 116.0 in stage 416.0 (TID 20012) in 145691 ms on 10.9.0.10 (117/200)
> There is no sign of error.
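
The "Finished task ... in N ms on HOST" lines in the log above already carry per-task durations, so one way to quantify the slowdown might be to extract them and group them by worker. The following is only a minimal sketch in plain Python: the function name `durations_by_host` and the `SAMPLE` text (four lines copied from the log) are assumptions made for illustration, and the regular expression assumes each log entry sits unwrapped on a single line.

```python
import re
from collections import defaultdict

# Matches driver log lines of the form seen in this report, e.g.
# "... TaskSetManager: Finished task 96.0 in stage 416.0 (TID 19992) in 41924 ms on 10.9.0.7 (104/200)"
FINISHED = re.compile(
    r"Finished task (?P<task>[\d.]+) in stage (?P<stage>[\d.]+) "
    r"\(TID (?P<tid>\d+)\) in (?P<ms>\d+) ms on (?P<host>\S+) "
    r"\((?P<done>\d+)/(?P<total>\d+)\)"
)


def durations_by_host(log_text):
    """Return {host: [duration_ms, ...]}, keeping the order of the log."""
    result = defaultdict(list)
    for match in FINISHED.finditer(log_text):
        result[match.group("host")].append(int(match.group("ms")))
    return dict(result)


# Hypothetical sample: four "Finished task" lines taken from the log above.
SAMPLE = """\
16/03/29 09:05:50 INFO TaskSetManager: Finished task 96.0 in stage 416.0 (TID 19992) in 41924 ms on 10.9.0.7 (104/200)
16/03/29 09:05:50 INFO TaskSetManager: Finished task 108.0 in stage 416.0 (TID 20004) in 24608 ms on 10.9.0.10 (105/200)
16/03/29 09:08:12 INFO TaskSetManager: Finished task 113.0 in stage 416.0 (TID 20009) in 154120 ms on 10.9.0.7 (116/200)
16/03/29 09:08:16 INFO TaskSetManager: Finished task 116.0 in stage 416.0 (TID 20012) in 145691 ms on 10.9.0.10 (117/200)
"""

if __name__ == "__main__":
    for host, durations in durations_by_host(SAMPLE).items():
        print(host, durations)
```

Run against the full driver log, the per-host lists would show whether task times grow steadily on both workers (as the excerpt suggests, from roughly 25-40 s to around 150 s) or only on one of them.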