[ https://issues.apache.org/jira/browse/SPARK-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hong Shen updated SPARK-6738:
-----------------------------
    Description:
ExternalAppendOnlyMap spills 2.2 GB of data to disk:
{code}
15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling in-memory map of 2.2 GB to disk (61 times so far)
15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}
But the file size is only 2.2M.
{code}
ll -h /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
total 2.2M
-rw-r----- 1 spark users 2.2M Apr 7 20:27 temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}
The GC log shows that the JVM memory is less than 1 GB.
{code}
2015-04-07T20:27:08.023+0800: [GC 981981K->55363K(3961344K), 0.0341720 secs]
2015-04-07T20:27:14.483+0800: [GC 987523K->53737K(3961344K), 0.0252660 secs]
2015-04-07T20:27:20.793+0800: [GC 985897K->56370K(3961344K), 0.0606460 secs]
2015-04-07T20:27:27.553+0800: [GC 988530K->59089K(3961344K), 0.0651840 secs]
2015-04-07T20:27:34.067+0800: [GC 991249K->62153K(3961344K), 0.0288460 secs]
2015-04-07T20:27:40.180+0800: [GC 994313K->61344K(3961344K), 0.0388970 secs]
2015-04-07T20:27:46.490+0800: [GC 993504K->59915K(3961344K), 0.0235150 secs]
{code}
The estimateSize differs hugely from the spill file size, there is a bug in

  was:
ExternalAppendOnlyMap spills 2.2 GB of data to disk:
{code}
15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling in-memory map of 2.2 GB to disk (61 times so far)
15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}
But the file size is only 2.2M.
{code}
ll -h /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
total 2.2M
-rw-r----- 1 spark users 2.2M Apr 7 20:27 temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}
The GC log shows that the JVM memory is less than 1 GB.
{code}
2015-04-07T20:27:08.023+0800: [GC 981981K->55363K(3961344K), 0.0341720 secs]
2015-04-07T20:27:14.483+0800: [GC 987523K->53737K(3961344K), 0.0252660 secs]
2015-04-07T20:27:20.793+0800: [GC 985897K->56370K(3961344K), 0.0606460 secs]
2015-04-07T20:27:27.553+0800: [GC 988530K->59089K(3961344K), 0.0651840 secs]
2015-04-07T20:27:34.067+0800: [GC 991249K->62153K(3961344K), 0.0288460 secs]
2015-04-07T20:27:40.180+0800: [GC 994313K->61344K(3961344K), 0.0388970 secs]
2015-04-07T20:27:46.490+0800: [GC 993504K->59915K(3961344K), 0.0235150 secs]
{code}
The estimateSize differs hugely from the spill file size


> EstimateSize is difference with spill file size
> ------------------------------------------------
>
>                 Key: SPARK-6738
>                 URL: https://issues.apache.org/jira/browse/SPARK-6738
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Hong Shen
>
> ExternalAppendOnlyMap spills 2.2 GB of data to disk:
> {code}
> 15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling in-memory map of 2.2 GB to disk (61 times so far)
> 15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
> {code}
> But the file size is only 2.2M.
> {code}
> ll -h /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
> total 2.2M
> -rw-r----- 1 spark users 2.2M Apr 7 20:27 temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
> {code}
> The GC log shows that the JVM memory is less than 1 GB.
> {code}
> 2015-04-07T20:27:08.023+0800: [GC 981981K->55363K(3961344K), 0.0341720 secs]
> 2015-04-07T20:27:14.483+0800: [GC 987523K->53737K(3961344K), 0.0252660 secs]
> 2015-04-07T20:27:20.793+0800: [GC 985897K->56370K(3961344K), 0.0606460 secs]
> 2015-04-07T20:27:27.553+0800: [GC 988530K->59089K(3961344K), 0.0651840 secs]
> 2015-04-07T20:27:34.067+0800: [GC 991249K->62153K(3961344K), 0.0288460 secs]
> 2015-04-07T20:27:40.180+0800: [GC 994313K->61344K(3961344K), 0.0388970 secs]
> 2015-04-07T20:27:46.490+0800: [GC 993504K->59915K(3961344K), 0.0235150 secs]
> {code}
> The estimateSize differs hugely from the spill file size, there is a bug in
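
The GC figures quoted in the report can be checked mechanically. The following is a minimal Python sketch, not part of Spark or of the original report: it assumes the `[GC before->after(capacity), secs]` layout seen in the log lines, and `parse_gc`, `GC_LOG` and the other names are hypothetical helpers introduced only for illustration. It compares the peak heap occupancy before each collection with the 2.2 GB that the spill message reports (treated here as 2.2 * 1024 * 1024 KB, which is itself an assumption about the units).

```python
import re

# GC lines copied verbatim from the report.
GC_LOG = """\
2015-04-07T20:27:08.023+0800: [GC 981981K->55363K(3961344K), 0.0341720 secs]
2015-04-07T20:27:14.483+0800: [GC 987523K->53737K(3961344K), 0.0252660 secs]
2015-04-07T20:27:20.793+0800: [GC 985897K->56370K(3961344K), 0.0606460 secs]
2015-04-07T20:27:27.553+0800: [GC 988530K->59089K(3961344K), 0.0651840 secs]
2015-04-07T20:27:34.067+0800: [GC 991249K->62153K(3961344K), 0.0288460 secs]
2015-04-07T20:27:40.180+0800: [GC 994313K->61344K(3961344K), 0.0388970 secs]
2015-04-07T20:27:46.490+0800: [GC 993504K->59915K(3961344K), 0.0235150 secs]
"""

# Matches "[GC <before>K-><after>K(<capacity>K)"; layout assumed from the lines above.
GC_LINE = re.compile(r"\[GC (\d+)K->(\d+)K\((\d+)K\)")


def parse_gc(log):
    """Return a list of (before_kb, after_kb, capacity_kb) tuples, one per GC line."""
    return [tuple(int(g) for g in m.groups()) for m in GC_LINE.finditer(log)]


entries = parse_gc(GC_LOG)

# Largest heap occupancy observed just before a collection, in KB.
peak_before_kb = max(before for before, _, _ in entries)

# "2.2 GB" from the spill log line, converted to KB (binary units assumed).
reported_spill_kb = 2.2 * 1024 * 1024

print(f"peak heap before GC: {peak_before_kb / 1024 / 1024:.2f} GiB")
print(f"reported spill size / peak heap: {reported_spill_kb / peak_before_kb:.1f}x")
```

If the peak occupancy stays below 1 GiB while the map is reported as 2.2 GB, the in-memory size estimate cannot be right, which is the inconsistency the report points at.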