[ https://issues.apache.org/jira/browse/SPARK-25034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-25034:
----------------------------------
    Affects Version/s:     (was: 3.0.0)
                           3.1.0

> possible triple memory consumption in fetchBlockSync()
> ------------------------------------------------------
>
>                 Key: SPARK-25034
>                 URL: https://issues.apache.org/jira/browse/SPARK-25034
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Vincent
>            Priority: Major
>
> Hello,
> in the code of _fetchBlockSync()_ in _BlockTransferService_, we have:
>
> {code:java}
> val ret = ByteBuffer.allocate(data.size.toInt)
> ret.put(data.nioByteBuffer())
> ret.flip()
> result.success(new NioManagedBuffer(ret))
> {code}
> In some cases, the _data_ variable is a _NettyManagedBuffer_ whose underlying Netty representation is a _CompositeByteBuf_.
> Going through the code above in this configuration, and assuming that _data_ holds N bytes:
> 1) we allocate a full buffer of N bytes for _ret_
> 2) calling _data.nioByteBuffer()_ on a _CompositeByteBuf_ triggers a full merge of all the component buffers, which allocates *again* a full buffer of N bytes
> 3) we copy the merged data into _ret_
> This means that at some point the N bytes of data exist in three places in memory.
> Is this really necessary? It is unclear why the data needs to be processed at all, given that we receive a _ManagedBuffer_ and we want to return a _ManagedBuffer_. Am I missing something? It seems this whole operation could be done with zero copies.
> The only upside of the current code is that the resulting buffer has consolidated all of the composite buffer's component arrays, but it is not clear whether this is intended. In any case, the same result could be achieved with a peak memory of 2N rather than 3N.
> Cheers!
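The three allocations described above can be sketched with plain `java.nio` buffers. This is a minimal stand-in, not Spark or Netty code: `mergeChunks` is a hypothetical helper that mimics what `CompositeByteBuf.nioBuffer()` does when it consolidates K components of total size N into one fresh N-byte buffer, and the extra `ret` copy mirrors the `ret.put(...)` in `fetchBlockSync()`:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FetchCopyDemo {
    // Hypothetical stand-in for CompositeByteBuf.nioBuffer(): consolidating
    // the component chunks allocates a fresh buffer of the full N bytes.
    static ByteBuffer mergeChunks(ByteBuffer[] chunks, int total) {
        ByteBuffer merged = ByteBuffer.allocate(total);       // copy #2 (N bytes)
        for (ByteBuffer c : chunks) merged.put(c.duplicate());
        merged.flip();
        return merged;
    }

    public static void main(String[] args) {
        byte[] payload = "0123456789".getBytes(StandardCharsets.US_ASCII);
        // Simulated composite buffer: two views over the same N received bytes.
        ByteBuffer[] chunks = {
            ByteBuffer.wrap(payload, 0, 4),
            ByteBuffer.wrap(payload, 4, 6)
        };

        // Current path: merge the components (N bytes), then bulk-copy the
        // merged buffer into `ret` (another N bytes) -- three live copies.
        ByteBuffer merged = mergeChunks(chunks, payload.length);
        ByteBuffer ret = ByteBuffer.allocate(merged.remaining()); // copy #3
        ret.put(merged.duplicate());
        ret.flip();

        // Cheaper path: the merged buffer is already contiguous and holds the
        // same bytes, so it could be handed to the caller directly, keeping
        // peak memory at 2N instead of 3N.
        ByteBuffer direct = mergeChunks(chunks, payload.length);

        System.out.println(ret.equals(direct));
    }
}
```

Both paths end with identical contents; the second simply skips the final N-byte allocation, which is the 2N-versus-3N point made above. The truly zero-copy variant (returning the received `ManagedBuffer` as-is) has no extra allocation at all, but it is not shown here since it depends on the callers tolerating a non-consolidated buffer.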
--
This message was sent by Atlassian Jira
(v8.3.4#803005)