[jira] [Updated] (SPARK-25034) possible triple memory consumption in fetchBlockSync()
[ https://issues.apache.org/jira/browse/SPARK-25034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-25034:
----------------------------------
    Affects Version/s:     (was: 3.0.0)
                           3.1.0

> possible triple memory consumption in fetchBlockSync()
> ------------------------------------------------------
>
>                 Key: SPARK-25034
>                 URL: https://issues.apache.org/jira/browse/SPARK-25034
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Vincent
>            Priority: Major
>
> Hello,
> in the code of _fetchBlockSync()_ in _BlockTransferService_, we have:
>
> {code:java}
> val ret = ByteBuffer.allocate(data.size.toInt)
> ret.put(data.nioByteBuffer())
> ret.flip()
> result.success(new NioManagedBuffer(ret))
> {code}
> In some cases, the _data_ variable is a _NettyManagedBuffer_ whose underlying Netty representation is a _CompositeByteBuf_.
> Going through the code above in this configuration, assuming that the variable _data_ holds N bytes:
> 1) we allocate a full buffer of N bytes for _ret_
> 2) calling _data.nioByteBuffer()_ on a _CompositeByteBuf_ triggers a full merge of all the component buffers, which allocates *again* a full buffer of N bytes
> 3) we copy the merged data into _ret_
> This means that at some point the N bytes of data exist in three places in memory at once.
> Is this really necessary? It is unclear to me why we have to process the data at all, given that we receive a _ManagedBuffer_ and we want to return a _ManagedBuffer_. Is there something I'm missing here? It seems this whole operation could be done with zero copies.
> The only upside here is that the new buffer merges all the composite buffer's component arrays, but it is not clear whether this is intended. In any case, this could be done with a peak memory of 2N rather than 3N.
> Cheers!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
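The three-buffer sequence described in the report can be sketched in plain `java.nio`, without Netty. This is an illustrative model, not Spark's actual code path: the two byte arrays stand in for the components of a `CompositeByteBuf`, the first `allocate` + `put` models the merge performed by `nioByteBuffer()`, and the second models `fetchBlockSync()`'s own copy into `ret`.

```java
import java.nio.ByteBuffer;

public class TripleCopyDemo {
    public static void main(String[] args) {
        // Copy #1: the original data, held as two component chunks
        // (standing in for the pieces of a CompositeByteBuf).
        byte[] chunkA = "hello ".getBytes();
        byte[] chunkB = "world".getBytes();
        int n = chunkA.length + chunkB.length;

        // Copy #2: nioByteBuffer() on a composite buffer merges the
        // components into one freshly allocated N-byte buffer.
        ByteBuffer merged = ByteBuffer.allocate(n);
        merged.put(chunkA);
        merged.put(chunkB);
        merged.flip();

        // Copy #3: fetchBlockSync() allocates another N-byte buffer
        // and copies the merged data into it.
        ByteBuffer ret = ByteBuffer.allocate(n);
        ret.put(merged);
        ret.flip();

        // At this point, three N-byte representations of the same data
        // are live simultaneously: chunks, merged, and ret.
        byte[] out = new byte[ret.remaining()];
        ret.get(out);
        System.out.println(new String(out));
    }
}
```

The 2N alternative the report alludes to would skip the last `allocate` + `put` and hand the merged buffer onward directly, so only the original components and the merged buffer coexist.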