[ https://issues.apache.org/jira/browse/SPARK-25034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-25034:
----------------------------------
    Affects Version/s:     (was: 3.0.0)
                           3.1.0

> possible triple memory consumption in fetchBlockSync()
> ------------------------------------------------------
>
>                 Key: SPARK-25034
>                 URL: https://issues.apache.org/jira/browse/SPARK-25034
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Vincent
>            Priority: Major
>
> Hello,
> in the code of _fetchBlockSync()_ in _BlockTransferService_, we have:
>
> {code:java}
> val ret = ByteBuffer.allocate(data.size.toInt)
> ret.put(data.nioByteBuffer())
> ret.flip()
> result.success(new NioManagedBuffer(ret))
> {code}
> In some cases, the _data_ variable is a _NettyManagedBuffer_ whose underlying Netty representation is a _CompositeByteBuf_.
> Going through the code above in this configuration, and assuming that _data_ holds N bytes:
> 1) we allocate a full buffer of N bytes for _ret_
> 2) calling _data.nioByteBuffer()_ on a _CompositeByteBuf_ triggers a full merge of all the component buffers, which allocates *again* a full buffer of N bytes
> 3) we copy the merged data into _ret_
> This means that at some point the N bytes of data exist in three places in memory.
> Is this really necessary? It is unclear why the data needs to be processed at all, given that we receive a _ManagedBuffer_ and we want to return a _ManagedBuffer_. Am I missing something? It seems this whole operation could be done with zero copies.
> The only upside of the current code is that the resulting buffer has consolidated all of the composite buffer's component arrays, but it is not clear whether this is intended. In any case, the same result could be achieved with a peak memory of 2N rather than 3N.
> Cheers!
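The three allocations described above can be sketched with plain `java.nio` buffers. This is a minimal stand-in, not Spark or Netty code: `mergeChunks` is a hypothetical helper that mimics what `CompositeByteBuf.nioBuffer()` does when it consolidates K components of total size N into one fresh N-byte buffer, and the extra `ret` copy mirrors the `ret.put(...)` in `fetchBlockSync()`:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FetchCopyDemo {
    // Hypothetical stand-in for CompositeByteBuf.nioBuffer(): consolidating
    // the component chunks allocates a fresh buffer of the full N bytes.
    static ByteBuffer mergeChunks(ByteBuffer[] chunks, int total) {
        ByteBuffer merged = ByteBuffer.allocate(total);       // copy #2 (N bytes)
        for (ByteBuffer c : chunks) merged.put(c.duplicate());
        merged.flip();
        return merged;
    }

    public static void main(String[] args) {
        byte[] payload = "0123456789".getBytes(StandardCharsets.US_ASCII);
        // Simulated composite buffer: two views over the same N received bytes.
        ByteBuffer[] chunks = {
            ByteBuffer.wrap(payload, 0, 4),
            ByteBuffer.wrap(payload, 4, 6)
        };

        // Current path: merge the components (N bytes), then bulk-copy the
        // merged buffer into `ret` (another N bytes) -- three live copies.
        ByteBuffer merged = mergeChunks(chunks, payload.length);
        ByteBuffer ret = ByteBuffer.allocate(merged.remaining()); // copy #3
        ret.put(merged.duplicate());
        ret.flip();

        // Cheaper path: the merged buffer is already contiguous and holds the
        // same bytes, so it could be handed to the caller directly, keeping
        // peak memory at 2N instead of 3N.
        ByteBuffer direct = mergeChunks(chunks, payload.length);

        System.out.println(ret.equals(direct));
    }
}
```

Both paths end with identical contents; the second simply skips the final N-byte allocation, which is the 2N-versus-3N point made above. The truly zero-copy variant (returning the received `ManagedBuffer` as-is) has no extra allocation at all, but it is not shown here since it depends on the callers tolerating a non-consolidated buffer.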
--
This message was sent by Atlassian Jira
(v8.3.4#803005)