[jira] [Resolved] (SPARK-6738) EstimateSize is difference with spill file size
[ https://issues.apache.org/jira/browse/SPARK-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-6738.
--
Resolution: Fixed
Fix Version/s: 1.4.0
Assignee: Hong Shen

Resolved by https://github.com/apache/spark/pull/5608

EstimateSize is difference with spill file size

Key: SPARK-6738
URL: https://issues.apache.org/jira/browse/SPARK-6738
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.2.0
Reporter: Hong Shen
Assignee: Hong Shen
Fix For: 1.4.0

ExternalAppendOnlyMap reports spilling 2.2 GB of data to disk:
{code}
15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling in-memory map of 2.2 GB to disk (61 times so far)
15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}
But the resulting spill file is only 2.2 MB:
{code}
ll -h /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
total 2.2M
-rw-r- 1 spark users 2.2M Apr 7 20:27 temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}
The GC log shows that JVM heap usage stays below 1 GB before each collection, out of a heap of roughly 3.8 GB:
{code}
2015-04-07T20:27:08.023+0800: [GC 981981K->55363K(3961344K), 0.0341720 secs]
2015-04-07T20:27:14.483+0800: [GC 987523K->53737K(3961344K), 0.0252660 secs]
2015-04-07T20:27:20.793+0800: [GC 985897K->56370K(3961344K), 0.0606460 secs]
2015-04-07T20:27:27.553+0800: [GC 988530K->59089K(3961344K), 0.0651840 secs]
2015-04-07T20:27:34.067+0800: [GC 991249K->62153K(3961344K), 0.0288460 secs]
2015-04-07T20:27:40.180+0800: [GC 994313K->61344K(3961344K), 0.0388970 secs]
2015-04-07T20:27:46.490+0800: [GC 993504K->59915K(3961344K), 0.0235150 secs]
{code}
The estimated size differs hugely from the spill file size; there is a bug in SizeEstimator.visitArray.
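The report attributes the gap to SizeEstimator.visitArray. One way a sampled array estimate can overshoot is extrapolation: if the sampled elements reference a large shared object, scaling the sample up to the full array length counts that object many times instead of once. The Java sketch below is a simplified, hypothetical model of that effect only. The Payload/Elem classes, the 16-byte shallow size, and sampling the first elements rather than random positions are all assumptions; it is not Spark's SizeEstimator code and not the change made in the linked pull request.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class SampledArraySize {
    // Hypothetical model: a large object that many array elements may share.
    static final class Payload {
        final long bytes;
        Payload(long bytes) { this.bytes = bytes; }
    }

    // Hypothetical model: an element with a fixed shallow size and a payload reference.
    static final class Elem {
        final long shallow;
        final Payload payload;
        Elem(long shallow, Payload payload) { this.shallow = shallow; this.payload = payload; }
    }

    // Size of one element; its payload is counted only the first time that exact object is seen.
    static long visit(Elem e, Map<Object, Boolean> seen) {
        long size = e.shallow;
        if (seen.put(e.payload, Boolean.TRUE) == null) {
            size += e.payload.bytes;
        }
        return size;
    }

    // Exact size: walk every element with one identity set shared across the whole array.
    static long exact(Elem[] arr) {
        Map<Object, Boolean> seen = new IdentityHashMap<>();
        long total = 0;
        for (Elem e : arr) total += visit(e, seen);
        return total;
    }

    // Naive sampling (assumption: first sampleSize elements), then linear scaling to the full length.
    // The shared payload lands inside the sample once and is then multiplied by length / sampleSize.
    static long naiveSampled(Elem[] arr, int sampleSize) {
        Map<Object, Boolean> seen = new IdentityHashMap<>();
        long sampled = 0;
        for (int i = 0; i < sampleSize; i++) sampled += visit(arr[i], seen);
        return sampled * arr.length / sampleSize;
    }

    public static void main(String[] args) {
        Payload shared = new Payload(1_000_000L);
        Elem[] arr = new Elem[10_000];
        for (int i = 0; i < arr.length; i++) arr[i] = new Elem(16L, shared);
        System.out.println("exact   = " + exact(arr));
        System.out.println("sampled = " + naiveSampled(arr, 100));
    }
}
```

With 10,000 elements sharing one 1,000,000-byte payload, `exact` returns 1,160,000 while `naiveSampled(arr, 100)` returns 100,160,000, an overestimate of roughly 86x under this model.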
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-6738) EstimateSize is difference with spill file size
[ https://issues.apache.org/jira/browse/SPARK-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-6738.
--
Resolution: Not A Problem

We can reopen if there is more detail, but the problem report focuses on the size of one spill file when there are lots of them. The in-memory size is also not necessarily the on-disk size. I haven't seen a report of a problem here either, like something that then fails.

EstimateSize is difference with spill file size

Key: SPARK-6738
URL: https://issues.apache.org/jira/browse/SPARK-6738
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.2.0
Reporter: Hong Shen

ExternalAppendOnlyMap reports spilling 2.2 GB of data to disk:
{code}
15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling in-memory map of 2.2 GB to disk (61 times so far)
15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}
But the resulting spill file is only 2.2 MB:
{code}
ll -h /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
total 2.2M
-rw-r- 1 spark users 2.2M Apr 7 20:27 temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}
The GC log shows that JVM heap usage stays below 1 GB before each collection, out of a heap of roughly 3.8 GB:
{code}
2015-04-07T20:27:08.023+0800: [GC 981981K->55363K(3961344K), 0.0341720 secs]
2015-04-07T20:27:14.483+0800: [GC 987523K->53737K(3961344K), 0.0252660 secs]
2015-04-07T20:27:20.793+0800: [GC 985897K->56370K(3961344K), 0.0606460 secs]
2015-04-07T20:27:27.553+0800: [GC 988530K->59089K(3961344K), 0.0651840 secs]
2015-04-07T20:27:34.067+0800: [GC 991249K->62153K(3961344K), 0.0288460 secs]
2015-04-07T20:27:40.180+0800: [GC 994313K->61344K(3961344K), 0.0388970 secs]
2015-04-07T20:27:46.490+0800: [GC 993504K->59915K(3961344K), 0.0235150 secs]
{code}
The estimated size differs hugely from the spill file size.
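The resolution comment notes that the in-memory size is not necessarily the on-disk size. One simple reason is compression: Spark can compress spill output (see the spark.shuffle.spill.compress setting), and repetitive records shrink dramatically when written. The Java sketch below only illustrates that effect using java.util.zip.GZIPOutputStream on a hypothetical 1 MB buffer; Spark's real spill path uses its own serializer and compression codec, and JVM object overhead in the in-memory form is not modeled here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class SpillSizeDemo {
    // Compress a buffer with gzip and return the number of bytes that would be written out.
    static int gzipSize(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } // closing the gzip stream writes the trailer into bos
        return bos.size();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical spill payload: 1,000,000 bytes repeating with a 16-byte period.
        byte[] records = new byte[1_000_000];
        for (int i = 0; i < records.length; i++) records[i] = (byte) (i % 16);
        int onDisk = gzipSize(records);
        System.out.printf("in memory: %d bytes, compressed: %d bytes%n", records.length, onDisk);
    }
}
```

The exact compressed size depends on the JDK's deflate implementation, so the program prints it rather than fixing a number; for this input it is a small fraction of the 1,000,000 input bytes.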