[jira] [Assigned] (SPARK-24917) Sending a partition over netty results in 2x memory usage
[ https://issues.apache.org/jira/browse/SPARK-24917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-24917:

    Assignee: (was: Apache Spark)

> Sending a partition over netty results in 2x memory usage
> ---------------------------------------------------------
>
>                 Key: SPARK-24917
>                 URL: https://issues.apache.org/jira/browse/SPARK-24917
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.2.2
>            Reporter: Vincent
>            Priority: Major
>
> Hello,
> While investigating some OOM errors in Spark 2.2 [(here's my call stack)|https://image.ibb.co/hHa2R8/sparkOOM.png], I found the following behavior, which seems odd:
> * a request is made to send a partition over the network
> * the partition is 1.9 GB and is persisted in memory
> * the partition is stored in a ByteBufferBlockData, which is backed by a ChunkedByteBuffer: a list of many 4 MB ByteBuffers
> * the call to toNetty() is supposed to only wrap all the arrays, not allocate any memory
> * yet the call stack shows that netty is allocating memory and trying to consolidate all the chunks into one big 1.9 GB array
> * this means that at this point the memory footprint is 2x the size of the actual partition (which is significant when the partition is 1.9 GB)
> Is this transient allocation expected?
> After digging, it turns out that the copy is caused by [this method|https://github.com/netty/netty/blob/4.0/buffer/src/main/java/io/netty/buffer/Unpooled.java#L260] in netty: if the initial buffer is made of more than DEFAULT_MAX_COMPONENTS (16) components, it triggers a reallocation of the whole buffer. This netty issue was fixed in this recent change: [https://github.com/netty/netty/commit/9b95b8ee628983e3e4434da93fffb893edff4aa2]
>
> As a result, is it possible to benefit from this change somehow in Spark 2.2 and above? I don't know how the netty dependencies are handled for Spark.
>
> NB: it seems this ticket: [https://jira.apache.org/jira/browse/SPARK-24307] changed the approach for Spark 2.4 by bypassing the netty buffer altogether. However, as written in that ticket, this approach *still* needs the *entire* block serialized in memory, so it would be a downgrade from fixing the netty issue when the buffer is < 2 GB.
>
> Thanks!

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
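The consolidation behavior described in the report can be sketched as follows. This is a simplified illustration of the pre-fix logic, not netty's actual implementation: when the number of chunks passed to `Unpooled.wrappedBuffer` stays at or below `maxNumComponents` (16 by default), the chunks are merely referenced; beyond that threshold, older netty versions copied everything into one freshly allocated buffer, which is the transient 2x footprint seen in the call stack. A 1.9 GB partition in 4 MB chunks has roughly 486 components, far above 16, so it always hits the copy path. The class and method names here are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Simplified sketch (NOT netty's real code) of the consolidation decision
// in old Unpooled.wrappedBuffer(ByteBuffer...).
public class ConsolidationSketch {

    static final int DEFAULT_MAX_COMPONENTS = 16; // netty's default threshold

    // Returns the buffers that would back the composite: either the original
    // chunks unchanged (zero-copy) or a single merged copy (transient 2x memory).
    static List<ByteBuffer> wrap(List<ByteBuffer> chunks, int maxComponents) {
        if (chunks.size() <= maxComponents) {
            return chunks; // zero-copy: just reference the existing arrays
        }
        // Consolidation path: allocate one big buffer and copy every chunk into it.
        int total = 0;
        for (ByteBuffer b : chunks) total += b.remaining();
        ByteBuffer merged = ByteBuffer.allocate(total); // the extra allocation
        for (ByteBuffer b : chunks) merged.put(b.duplicate());
        merged.flip();
        List<ByteBuffer> out = new ArrayList<>();
        out.add(merged);
        return out;
    }

    // Helper: build `count` chunks of `chunkBytes` each, mimicking a
    // ChunkedByteBuffer's list of fixed-size ByteBuffers.
    static List<ByteBuffer> chunksOf(int count, int chunkBytes) {
        List<ByteBuffer> chunks = new ArrayList<>();
        for (int i = 0; i < count; i++) chunks.add(ByteBuffer.allocate(chunkBytes));
        return chunks;
    }

    public static void main(String[] args) {
        // Scaled-down demo: 10 chunks stay zero-copy; 486 chunks (the rough
        // component count of a 1.9 GB partition in 4 MB pieces) get merged.
        List<ByteBuffer> few = wrap(chunksOf(10, 4), DEFAULT_MAX_COMPONENTS);
        List<ByteBuffer> many = wrap(chunksOf(486, 4), DEFAULT_MAX_COMPONENTS);
        System.out.println(few.size() + " backing buffers vs " + many.size());
    }
}
```

The netty fix referenced above changed this so that exceeding the component count no longer forces a full copy of the underlying data.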
[jira] [Assigned] (SPARK-24917) Sending a partition over netty results in 2x memory usage
[ https://issues.apache.org/jira/browse/SPARK-24917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-24917:

    Assignee: Apache Spark