[jira] [Updated] (SPARK-25034) possible triple memory consumption in fetchBlockSync()

2020-03-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25034:
--
Affects Version/s: (was: 3.0.0)
   3.1.0

> possible triple memory consumption in fetchBlockSync()
> --
>
> Key: SPARK-25034
> URL: https://issues.apache.org/jira/browse/SPARK-25034
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Vincent
>Priority: Major
>
> Hello
> In the code of _fetchBlockSync_() in _BlockTransferService_, we have:
>  
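> {code:scala}
> // data: the fetched ManagedBuffer; result: the Promise completed with the copy
> val ret = ByteBuffer.allocate(data.size.toInt) // new heap buffer of data.size bytes
> ret.put(data.nioByteBuffer())                  // nioByteBuffer() may itself allocate a merged copy
> ret.flip()                                     // prepare ret for reading
> result.success(new NioManagedBuffer(ret))
> {code}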
> In some cases, the _data_ variable is a _NettyManagedBuffer_ whose
> underlying Netty representation is a _CompositeByteBuf_.
> Going through the code above in this configuration, and assuming that _data_
> holds N bytes:
> 1) we allocate a full buffer of N bytes for _ret_;
> 2) calling _data.nioByteBuffer()_ on a _CompositeByteBuf_ with more than one
> component triggers a full merge of all the components, which allocates a
> *second* full buffer of N bytes;
> 3) we copy the merged data into _ret_.
> This means that, at peak, the N bytes of data are held in memory three times:
> in the original composite buffer, in the merged buffer, and in _ret_.
> Is this really necessary?
> It is unclear to me why we need to process the data at all, given that we
> receive a _ManagedBuffer_ and want to return a _ManagedBuffer_.
> Is there something I'm missing here? It seems this whole operation could be
> done with zero copies.
> The only upside is that the new buffer holds all of the composite buffer's
> components merged into a single array, but it is not clear whether this is
> intended. In any case, this could be done with a peak memory of 2N rather
> than 3N.
> Cheers!
>  
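The following Scala sketch makes the accounting in steps 1) to 3) concrete. It models the copy path with plain java.nio buffers rather than Spark's or Netty's classes. Several parts are hypothetical: the component array that stands in for the Netty composite buffer, and the names `TripleCopyModel`, `AllocCounter`, `received`, `mergeComponents` and `fetchCopy`. `mergeComponents` only approximates what a multi-component composite buffer does inside its NIO conversion.

```scala
import java.nio.ByteBuffer

object TripleCopyModel {
  // Counts every byte allocated through it. Nothing allocated here becomes unreachable
  // before ret.put(...) returns, so the total equals the peak number of bytes held at once.
  final class AllocCounter {
    var allocated: Long = 0L
    def allocate(n: Int): ByteBuffer = { allocated += n; ByteBuffer.allocate(n) }
  }

  // The "received" data: `parts` heap components of `partSize` bytes each,
  // standing in (hypothetically) for the components of a composite buffer.
  def received(parts: Int, partSize: Int, counter: AllocCounter): Array[ByteBuffer] =
    Array.tabulate(parts) { i =>
      val b = counter.allocate(partSize)
      while (b.hasRemaining) b.put(i.toByte) // fill component i with the byte value i
      b.flip()
      b
    }

  // Hypothetical stand-in for nioByteBuffer() on a multi-component buffer:
  // all components are merged into one freshly allocated N-byte buffer.
  def mergeComponents(components: Array[ByteBuffer], counter: AllocCounter): ByteBuffer = {
    val merged = counter.allocate(components.map(_.remaining).sum)
    components.foreach(c => merged.put(c.duplicate())) // duplicate() keeps c's position intact
    merged.flip()
    merged
  }

  // Mirrors the quoted snippet step by step.
  def fetchCopy(components: Array[ByteBuffer], counter: AllocCounter): ByteBuffer = {
    val ret = counter.allocate(components.map(_.remaining).sum) // 1) N bytes for ret
    ret.put(mergeComponents(components, counter))               // 2) N more for the merge, 3) copy into ret
    ret.flip()
    ret
  }
}
```

Take four components of 1024 bytes each, so N = 4096. The received data, `ret` and the merged buffer are then all alive at the same time, and the counter ends at 12288 bytes, which is 3N.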


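The 2N figure can be modelled the same way. If each component is copied directly into the destination buffer, no merged intermediate buffer is ever allocated. This is again a hypothetical sketch on plain java.nio buffers, and the names are illustrative rather than Spark's. Netty's `ByteBuf.getBytes(int, ByteBuffer)` does copy a composite buffer component by component. Whether it fits this particular code path is an assumption.

```scala
import java.nio.ByteBuffer

object PerComponentCopyModel {
  // Counts every byte allocated through it; nothing is released before the copy
  // finishes, so the total equals the peak number of bytes held at once.
  final class AllocCounter {
    var allocated: Long = 0L
    def allocate(n: Int): ByteBuffer = { allocated += n; ByteBuffer.allocate(n) }
  }

  // The "received" data: `parts` heap components of `partSize` bytes each (N = parts * partSize).
  def received(parts: Int, partSize: Int, counter: AllocCounter): Array[ByteBuffer] =
    Array.tabulate(parts) { i =>
      val b = counter.allocate(partSize)
      while (b.hasRemaining) b.put(i.toByte) // fill component i with the byte value i
      b.flip()
      b
    }

  // Allocates the destination once and copies each component into it directly:
  // there is no merged intermediate buffer.
  def fetchCopy(components: Array[ByteBuffer], counter: AllocCounter): ByteBuffer = {
    val ret = counter.allocate(components.map(_.remaining).sum) // the only extra N bytes
    components.foreach(c => ret.put(c.duplicate()))             // duplicate() keeps c's position intact
    ret.flip()
    ret
  }
}
```

For the same four 1024-byte components, the counter ends at 8192 bytes, which is 2N.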

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25034) possible triple memory consumption in fetchBlockSync()

2019-07-16 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25034:
--
Affects Version/s: (was: 2.4.0)
   (was: 2.2.2)
   (was: 2.3.0)
   3.0.0



