[jira] [Resolved] (SPARK-6738) EstimateSize is difference with spill file size

2015-04-27 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-6738.
--
   Resolution: Fixed
Fix Version/s: 1.4.0
     Assignee: Hong Shen

Resolved by https://github.com/apache/spark/pull/5608

 EstimateSize  is difference with spill file size
 

             Key: SPARK-6738
             URL: https://issues.apache.org/jira/browse/SPARK-6738
         Project: Spark
      Issue Type: Bug
      Components: Spark Core
Affects Versions: 1.2.0
        Reporter: Hong Shen
        Assignee: Hong Shen
         Fix For: 1.4.0


 ExternalAppendOnlyMap reports spilling 2.2 GB of data to disk:
 {code}
 15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling in-memory map of 2.2 GB to disk (61 times so far)
 15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
 {code}
 But the spill file is only 2.2 MB:
 {code}
 ll -h /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
 total 2.2M
 -rw-r- 1 spark users 2.2M Apr  7 20:27 temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
 {code}
 The GC log shows that JVM heap usage stays below 1 GB:
 {code}
 2015-04-07T20:27:08.023+0800: [GC 981981K->55363K(3961344K), 0.0341720 secs]
 2015-04-07T20:27:14.483+0800: [GC 987523K->53737K(3961344K), 0.0252660 secs]
 2015-04-07T20:27:20.793+0800: [GC 985897K->56370K(3961344K), 0.0606460 secs]
 2015-04-07T20:27:27.553+0800: [GC 988530K->59089K(3961344K), 0.0651840 secs]
 2015-04-07T20:27:34.067+0800: [GC 991249K->62153K(3961344K), 0.0288460 secs]
 2015-04-07T20:27:40.180+0800: [GC 994313K->61344K(3961344K), 0.0388970 secs]
 2015-04-07T20:27:46.490+0800: [GC 993504K->59915K(3961344K), 0.0235150 secs]
 {code}
 The estimated size is hugely different from the spill file size; there is a bug in 
 SizeEstimator.visitArray.
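
 The figures quoted in the report can be cross-checked with a short script. The sketch below is illustrative only and is not part of Spark: it assumes the GC lines follow the usual HotSpot layout (heap occupancy before the collection, occupancy after it, and total heap in parentheses, all in K), and the helper name peak_heap_kib is hypothetical. It finds the largest pre-collection heap occupancy in the log and compares it with the 2.2 GB estimate and the 2.2M spill file.

```python
import re

# GC lines from the report. The separator between the two heap figures is
# written here as "->"; the pattern below also accepts a bare "-".
GC_LOG = """\
2015-04-07T20:27:08.023+0800: [GC 981981K->55363K(3961344K), 0.0341720 secs]
2015-04-07T20:27:14.483+0800: [GC 987523K->53737K(3961344K), 0.0252660 secs]
2015-04-07T20:27:20.793+0800: [GC 985897K->56370K(3961344K), 0.0606460 secs]
2015-04-07T20:27:27.553+0800: [GC 988530K->59089K(3961344K), 0.0651840 secs]
2015-04-07T20:27:34.067+0800: [GC 991249K->62153K(3961344K), 0.0288460 secs]
2015-04-07T20:27:40.180+0800: [GC 994313K->61344K(3961344K), 0.0388970 secs]
2015-04-07T20:27:46.490+0800: [GC 993504K->59915K(3961344K), 0.0235150 secs]
"""

# Assumed layout: [GC <before>K-><after>K(<total>K), <pause> secs]
GC_LINE = re.compile(r"\[GC (\d+)K->?(\d+)K\((\d+)K\)")

KIB_PER_GIB = 1024 * 1024


def peak_heap_kib(log: str) -> int:
    """Return the largest pre-collection heap occupancy (in K) found in the log."""
    return max(int(m.group(1)) for m in GC_LINE.finditer(log))


# Values taken from the log message and the directory listing in the report.
estimated_gib = 2.2   # "spilling in-memory map of 2.2 GB"
spill_file_mib = 2.2  # "2.2M" temp_local_... file

peak_gib = peak_heap_kib(GC_LOG) / KIB_PER_GIB
ratio = estimated_gib * 1024 / spill_file_mib

print(f"peak heap before GC: {peak_gib:.2f} GiB")
print(f"estimated size / spill file size: about {ratio:.0f}x")
```

 Under these assumptions the whole heap never holds as much as 1 GB before a collection, so a single in-memory map cannot occupy 2.2 GB. That is consistent with the reporter's conclusion that the estimate is too high, although an in-memory size and a serialized on-disk size are not expected to match exactly.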



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6738) EstimateSize is difference with spill file size

2015-04-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-6738.
--
Resolution: Not A Problem

We can reopen if there is more detail, but the problem report is focusing on 
the size of one spill file when there are lots of them. The in-memory size is 
also not necessarily the on-disk size. I haven't seen a report of a problem 
here either, like something that then fails.

 EstimateSize  is difference with spill file size
 

             Key: SPARK-6738
             URL: https://issues.apache.org/jira/browse/SPARK-6738
         Project: Spark
      Issue Type: Bug
      Components: Spark Core
Affects Versions: 1.2.0
        Reporter: Hong Shen



