[jira] [Comment Edited] (SPARK-3563) Shuffle data not always be cleaned

2014-09-17 Thread shenhong (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137014#comment-14137014
 ] 

shenhong edited comment on SPARK-3563 at 9/17/14 9:55 AM:
--

Thanks, Sean Owen!
I haven't set spark.cleaner.ttl; maybe it will work. But my point is: why has 
the shuffle data of some stages been cleaned while that of others has not?


was (Author: shenhong):
Thanks Sean Owen!
I don‘t have set spark.cleaner.ttl, maybe it will work, but my point is why 
some shuffle stage data have been cleaned, but the other are not.

 Shuffle data not always be cleaned
 --

 Key: SPARK-3563
 URL: https://issues.apache.org/jira/browse/SPARK-3563
 Project: Spark
  Issue Type: Bug
  Components: core
Affects Versions: 1.0.2
Reporter: shenhong

 In our cluster, when we run a Spark Streaming job for many hours, not all of 
 the shuffle data seems to be cleaned. Here is the shuffle data:
 -rw-r- 1 tdwadmin users 23948 Sep 17 13:21 shuffle_132_34_0
 -rw-r- 1 tdwadmin users 18237 Sep 17 13:32 shuffle_143_22_1
 -rw-r- 1 tdwadmin users 22934 Sep 17 13:35 shuffle_146_15_0
 -rw-r- 1 tdwadmin users 27666 Sep 17 13:35 shuffle_146_36_1
 -rw-r- 1 tdwadmin users 12864 Sep 17 14:05 shuffle_176_12_0
 -rw-r- 1 tdwadmin users 22115 Sep 17 14:05 shuffle_176_33_1
 -rw-r- 1 tdwadmin users 15666 Sep 17 14:21 shuffle_192_0_1
 -rw-r- 1 tdwadmin users 13916 Sep 17 14:38 shuffle_209_53_0
 -rw-r- 1 tdwadmin users 20031 Sep 17 14:41 shuffle_212_26_0
 -rw-r- 1 tdwadmin users 15158 Sep 17 14:41 shuffle_212_47_1
 -rw-r- 1 tdwadmin users 42880 Sep 17 12:12 shuffle_63_1_1
 -rw-r- 1 tdwadmin users 32030 Sep 17 12:14 shuffle_65_40_0
 -rw-r- 1 tdwadmin users 34477 Sep 17 12:33 shuffle_84_2_1
 The shuffle data of stages 63, 65, 84, 132, ... is not cleaned.
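To make the listing above easier to compare with the cleaner's log output, the file names can be reduced to their leading numeric field. The following is a minimal sketch, assuming the Spark 1.0.x block naming shuffle_<shuffleId>_<mapId>_<reduceId>; under that assumption the first number is a shuffle ID rather than a stage ID. The class and method names are hypothetical and not part of Spark.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of Spark.
// Assumes names of the form shuffle_<shuffleId>_<mapId>_<reduceId>.
final class ShuffleFileNames {

    // Returns the first numeric field of a name such as "shuffle_132_34_0".
    static int shuffleIdOf(String fileName) {
        String[] parts = fileName.split("_");
        if (parts.length != 4 || !parts[0].equals("shuffle")) {
            throw new IllegalArgumentException("not a shuffle file name: " + fileName);
        }
        return Integer.parseInt(parts[1]);
    }

    // Collects the distinct shuffle ids of the given file names in ascending order.
    static SortedSet<Integer> distinctShuffleIds(List<String> fileNames) {
        SortedSet<Integer> ids = new TreeSet<>();
        for (String name : fileNames) {
            ids.add(shuffleIdOf(name));
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "shuffle_132_34_0", "shuffle_146_15_0", "shuffle_146_36_1", "shuffle_63_1_1");
        System.out.println(distinctShuffleIds(names)); // prints [63, 132, 146]
    }
}
```

Two files with the same leading number (for example shuffle_146_15_0 and shuffle_146_36_1) then count as one leftover shuffle.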
 ContextCleaner maintains a weak reference for each RDD, ShuffleDependency, 
 and Broadcast of interest, to be processed when the associated object goes 
 out of scope of the application. The actual cleanup is performed in a 
 separate daemon thread. 
 There must be some remaining reference to the ShuffleDependency, and it is 
 hard to find.
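The weak-reference mechanism described above can be illustrated with plain JDK classes. This is a minimal sketch, not Spark's ContextCleaner: the names WeakCleanupSketch, ShuffleRef, register and drainCleaned are hypothetical. It shows that an entry only becomes eligible for cleanup after the garbage collector has noticed that the tracked object is unreachable and has put its reference on the queue.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of queue-based weak-reference cleanup; not Spark code.
final class WeakCleanupSketch {

    // Weak reference that remembers which shuffle id it stands for.
    static final class ShuffleRef extends WeakReference<Object> {
        final int shuffleId;

        ShuffleRef(Object dependency, int shuffleId, ReferenceQueue<Object> queue) {
            super(dependency, queue);
            this.shuffleId = shuffleId;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps the reference objects themselves alive until they have been processed.
    private final List<ShuffleRef> tracked = new ArrayList<>();

    ShuffleRef register(Object dependency, int shuffleId) {
        ShuffleRef ref = new ShuffleRef(dependency, shuffleId, queue);
        tracked.add(ref);
        return ref;
    }

    // Returns the ids whose references have been enqueued since the last call.
    List<Integer> drainCleaned() {
        List<Integer> cleaned = new ArrayList<>();
        Reference<?> r;
        while ((r = queue.poll()) != null) {
            ShuffleRef ref = (ShuffleRef) r;
            tracked.remove(ref);
            cleaned.add(ref.shuffleId);
        }
        return cleaned;
    }

    public static void main(String[] args) throws InterruptedException {
        WeakCleanupSketch cleaner = new WeakCleanupSketch();
        Object dependency = new Object();
        cleaner.register(dependency, 63);
        System.out.println("while referenced: " + cleaner.drainCleaned());

        dependency = null;  // drop the only strong reference
        System.gc();        // only a hint; the JVM may ignore or delay it
        Thread.sleep(100);
        System.out.println("after GC hint: " + cleaner.drainCleaned());
    }
}
```

In this sketch nothing is reported as cleaned until a collection has run, even if the application dropped its last strong reference long before.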



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3563) Shuffle data not always be cleaned

2014-09-17 Thread shenhong (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137014#comment-14137014
 ] 

shenhong edited comment on SPARK-3563 at 9/17/14 9:59 AM:
--

Thanks, Sean Owen!
I haven't set spark.cleaner.ttl; maybe it will work. But my point is: why has 
the shuffle data of some stages been cleaned while that of others has not?


was (Author: shenhong):
Thanks Sean Owen!
I don‘t have set spark.cleaner.ttl, maybe it will work, but my point is why 
some shuffle stages data have been cleaned, but the others are not.







[jira] [Comment Edited] (SPARK-3563) Shuffle data not always be cleaned

2014-09-17 Thread shenhong (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138376#comment-14138376
 ] 

shenhong edited comment on SPARK-3563 at 9/18/14 1:47 AM:
--

I have just triggered a full GC on the driver, and all the shuffle data was 
cleaned. Here is the log:
...
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 101
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 100
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 96
...
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 68
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 63
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 62
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 56
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 55
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 628
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 627
...
The driver uses only 365M of memory, and I set --driver-memory 4g, so maybe the 
driver has never started a full GC. 
Is it reasonable for shuffle data cleanup to depend on the driver's full GC?
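
One possible workaround for a driver whose heap rarely fills up is to request a collection at a fixed interval. The following is a minimal sketch under that assumption; PeriodicGcSketch and its methods are hypothetical names, not a Spark API, and System.gc() is only a request that the JVM may ignore (for example when explicit GC is disabled).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: periodically asks the JVM for a collection so that
// weakly referenced objects are noticed even when the heap is mostly empty.
final class PeriodicGcSketch {

    private final AtomicInteger requests = new AtomicInteger();

    // Single daemon thread, so the scheduler never keeps the JVM alive.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread t = new Thread(runnable, "periodic-gc-sketch");
            t.setDaemon(true);
            return t;
        });

    void start(long interval, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            System.gc();                 // a request, not a guarantee
            requests.incrementAndGet();  // count how many requests were issued
        }, interval, interval, unit);
    }

    int requestCount() {
        return requests.get();
    }

    void stop() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        PeriodicGcSketch gc = new PeriodicGcSketch();
        gc.start(100, TimeUnit.MILLISECONDS);
        Thread.sleep(350);
        gc.stop();
        System.out.println("GC requests issued: " + gc.requestCount());
    }
}
```

In a long-running job the interval would be minutes rather than milliseconds; the short value in main only keeps the demonstration quick.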


was (Author: shenhong):
I have just trigger the driver a full GC, and the shuffle data all cleaned. 
here is the log:
...
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 101
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 100
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 96
...
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 68
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 63
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 62
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 56
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 55
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 628
14/09/18 09:28:31 INFO ContextCleaner: Cleaned shuffle 627
...
The driver only use 365M memory, and I set --driver-memory 4g, maybe the driver 
have never stert a full gc. 
Whether it is reasonable for cleaning shuffle data depends on driver's full gc.

 Shuffle data not always be cleaned
 --

 Key: SPARK-3563
 URL: https://issues.apache.org/jira/browse/SPARK-3563
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.0.2
Reporter: shenhong







[jira] [Comment Edited] (SPARK-3563) Shuffle data not always be cleaned

2014-09-17 Thread shenhong (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137348#comment-14137348
 ] 

shenhong edited comment on SPARK-3563 at 9/18/14 2:16 AM:
--

Thanks, Saisai. I think you are right: it depends on the JVM's GC. But my 
streaming job runs one batch per minute, and each batch has two stages, 
including a shuffle stage. In the first hour (60 batches) the shuffle data was 
cleaned, but after that it was not always cleaned. And a streaming job never 
stops.


was (Author: shenhong):
Thanks, Saisai. I thank you are right, it depend on JVM's GC. But in my 
streaming job, one minute a batch,  two stage a batch, and it contain a shuffle 
stage. In the first hour(60 batches), shuffle data had been cleaned, but after 
that, shuffle data not always be cleaned. And streaming job won't stop.



