[ 
https://issues.apache.org/jira/browse/SPARK-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-6738:
-----------------------------
    Description: 
ExternalAppendOnlyMap reports spilling 2.2 GB of data to disk:

{code}
15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling in-memory map of 2.2 GB to disk (61 times so far)
15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}

But the spill file is only 2.2 MB:

{code}
ll -h /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
total 2.2M
-rw-r----- 1 spark users 2.2M Apr  7 20:27 temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}

The GC log shows that JVM heap usage stays below 1 GB:
{code}
2015-04-07T20:27:08.023+0800: [GC 981981K->55363K(3961344K), 0.0341720 secs]
2015-04-07T20:27:14.483+0800: [GC 987523K->53737K(3961344K), 0.0252660 secs]
2015-04-07T20:27:20.793+0800: [GC 985897K->56370K(3961344K), 0.0606460 secs]
2015-04-07T20:27:27.553+0800: [GC 988530K->59089K(3961344K), 0.0651840 secs]
2015-04-07T20:27:34.067+0800: [GC 991249K->62153K(3961344K), 0.0288460 secs]
2015-04-07T20:27:40.180+0800: [GC 994313K->61344K(3961344K), 0.0388970 secs]
2015-04-07T20:27:46.490+0800: [GC 993504K->59915K(3961344K), 0.0235150 secs]
{code}
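
As a rough cross-check of the GC log above, here is a minimal Python sketch (not part of Spark; the regex and the helper name `peak_heap_before_gc_kb` are assumptions made for illustration). It assumes each line follows the `[GC <before>K-><after>K(<capacity>K), ...]` layout shown above, with `<before>` being heap occupancy just before the collection, and compares the largest such value against the 2.2 GB reported in the spill message.

```python
import re

# GC lines copied verbatim from the log in this report.
GC_LOG = """\
2015-04-07T20:27:08.023+0800: [GC 981981K->55363K(3961344K), 0.0341720 secs]
2015-04-07T20:27:14.483+0800: [GC 987523K->53737K(3961344K), 0.0252660 secs]
2015-04-07T20:27:20.793+0800: [GC 985897K->56370K(3961344K), 0.0606460 secs]
2015-04-07T20:27:27.553+0800: [GC 988530K->59089K(3961344K), 0.0651840 secs]
2015-04-07T20:27:34.067+0800: [GC 991249K->62153K(3961344K), 0.0288460 secs]
2015-04-07T20:27:40.180+0800: [GC 994313K->61344K(3961344K), 0.0388970 secs]
2015-04-07T20:27:46.490+0800: [GC 993504K->59915K(3961344K), 0.0235150 secs]
"""

# Assumed layout: [GC <before>K-><after>K(<capacity>K)
GC_RE = re.compile(r"\[GC (\d+)K->(\d+)K\((\d+)K\)")


def peak_heap_before_gc_kb(log: str) -> int:
    """Return the largest pre-collection heap occupancy (in KB) found in the log."""
    return max(int(m.group(1)) for m in GC_RE.finditer(log))


peak_kb = peak_heap_before_gc_kb(GC_LOG)
peak_gb = peak_kb / (1024 * 1024)  # KB -> GB

# Value taken from the "spilling in-memory map of 2.2 GB" log line.
reported_map_gb = 2.2

print(f"peak heap before GC: {peak_gb:.2f} GB")
print(f"reported in-memory map size: {reported_map_gb} GB")
print(f"estimate exceeds observed heap usage: {reported_map_gb > peak_gb}")
```

Under that reading of the log, the whole heap never holds as much data as the map alone is estimated to hold, which is consistent with the size estimate being too high rather than the spill file being truncated.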

The estimateSize is hugely different from the spill file size; there is a bug in 

  was:
ExternalAppendOnlyMap reports spilling 2.2 GB of data to disk:

{code}
15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling in-memory map of 2.2 GB to disk (61 times so far)
15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}

But the spill file is only 2.2 MB:

{code}
ll -h /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
total 2.2M
-rw-r----- 1 spark users 2.2M Apr  7 20:27 temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}

The GC log shows that JVM heap usage stays below 1 GB:
{code}
2015-04-07T20:27:08.023+0800: [GC 981981K->55363K(3961344K), 0.0341720 secs]
2015-04-07T20:27:14.483+0800: [GC 987523K->53737K(3961344K), 0.0252660 secs]
2015-04-07T20:27:20.793+0800: [GC 985897K->56370K(3961344K), 0.0606460 secs]
2015-04-07T20:27:27.553+0800: [GC 988530K->59089K(3961344K), 0.0651840 secs]
2015-04-07T20:27:34.067+0800: [GC 991249K->62153K(3961344K), 0.0288460 secs]
2015-04-07T20:27:40.180+0800: [GC 994313K->61344K(3961344K), 0.0388970 secs]
2015-04-07T20:27:46.490+0800: [GC 993504K->59915K(3961344K), 0.0235150 secs]
{code}

The estimateSize is hugely different from the spill file size


> EstimateSize  is difference with spill file size
> ------------------------------------------------
>
>                 Key: SPARK-6738
>                 URL: https://issues.apache.org/jira/browse/SPARK-6738
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Hong Shen
>
> ExternalAppendOnlyMap reports spilling 2.2 GB of data to disk:
> {code}
> 15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling in-memory map of 2.2 GB to disk (61 times so far)
> 15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
> {code}
> But the spill file is only 2.2 MB:
> {code}
> ll -h /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
> total 2.2M
> -rw-r----- 1 spark users 2.2M Apr  7 20:27 temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
> {code}
> The GC log shows that JVM heap usage stays below 1 GB:
> {code}
> 2015-04-07T20:27:08.023+0800: [GC 981981K->55363K(3961344K), 0.0341720 secs]
> 2015-04-07T20:27:14.483+0800: [GC 987523K->53737K(3961344K), 0.0252660 secs]
> 2015-04-07T20:27:20.793+0800: [GC 985897K->56370K(3961344K), 0.0606460 secs]
> 2015-04-07T20:27:27.553+0800: [GC 988530K->59089K(3961344K), 0.0651840 secs]
> 2015-04-07T20:27:34.067+0800: [GC 991249K->62153K(3961344K), 0.0288460 secs]
> 2015-04-07T20:27:40.180+0800: [GC 994313K->61344K(3961344K), 0.0388970 secs]
> 2015-04-07T20:27:46.490+0800: [GC 993504K->59915K(3961344K), 0.0235150 secs]
> {code}
> The estimateSize is hugely different from the spill file size; there is a bug in 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

