[
https://issues.apache.org/jira/browse/SPARK-12007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15028012#comment-15028012
]
Apache Spark commented on SPARK-12007:
--------------------------------------
User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/9987
> Network library's RPC layer requires a lot of copying
> -----------------------------------------------------
>
> Key: SPARK-12007
> URL: https://issues.apache.org/jira/browse/SPARK-12007
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Marcelo Vanzin
>
> The network library's RPC layer has an external API based on byte arrays,
> instead of ByteBuffer; that requires a lot of copying since the internals of
> the library use ByteBuffers (or rather Netty's ByteBuf), and lots of external
> clients also use ByteBuffer.
> The extra copies could be avoided if the API used ByteBuffer instead.
> To show an extreme case, look at an RPC send via NettyRpcEnv:
> - message is encoded using JavaSerializer, resulting in a ByteBuffer
> - the ByteBuffer is copied into a byte array of the right size, since its
> internal array may be larger than the actual data it holds
> - the network library's encoder copies the byte array into a ByteBuf
> - finally the data is written to the socket
> The intermediate 2 copies could be avoided if the API allowed the original
> ByteBuffer to be sent instead.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]