[ https://issues.apache.org/jira/browse/SPARK-27367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822045#comment-16822045 ]
Imran Rashid commented on SPARK-27367:
--------------------------------------

Did you change the Spark code as well, to use the newly suggested API for serde? Or did you just upgrade the version of RoaringBitmap?

The size of the bitmap is related to how many partitions there are on the reduce side of a shuffle map task. The number of messages that go to the driver is then related to the number of map tasks and to how many tasks are running concurrently. Some users on large clusters run with > 10K tasks on each side.

> Faster RoaringBitmap Serialization with v0.8.0
> ----------------------------------------------
>
>                 Key: SPARK-27367
>                 URL: https://issues.apache.org/jira/browse/SPARK-27367
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Imran Rashid
>            Priority: Major
>
> RoaringBitmap 0.8.0 adds faster serde, but also requires us to change how we
> call the serde routines slightly to take advantage of it. This is probably a
> worthwhile optimization, as every shuffle map task with a large # of
> partitions generates these bitmaps, and the driver especially has to
> deserialize many of these messages.
> See
> * https://github.com/apache/spark/pull/24264#issuecomment-479675572
> * https://github.com/RoaringBitmap/RoaringBitmap/pull/325
> * https://github.com/RoaringBitmap/RoaringBitmap/issues/319
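
As a rough illustration of the scaling described in the comment, the sketch below estimates how many bitmaps the driver would deserialize and how many bytes that might involve. It rests on assumptions that are not from the issue: one bitmap per map task, and a plain uncompressed bitset (one bit per reduce partition) as a stand-in for the bitmap. RoaringBitmap's actual serialized size depends on its container layout and can differ in either direction. The function names are hypothetical.

```python
def bitset_bytes(num_reduce_partitions):
    """Bytes for one uncompressed bitset with one bit per reduce partition.

    Assumption: a plain bitset is only a stand-in for RoaringBitmap here.
    """
    return (num_reduce_partitions + 7) // 8


def driver_bitmap_load(num_map_tasks, num_reduce_partitions):
    """Return (bitmap count, total bytes) under the stand-in model.

    Assumption: each map task sends the driver exactly one bitmap.
    """
    per_bitmap = bitset_bytes(num_reduce_partitions)
    return num_map_tasks, num_map_tasks * per_bitmap


if __name__ == "__main__":
    # The "10K tasks on each side" case mentioned in the comment.
    count, total = driver_bitmap_load(10_000, 10_000)
    print(f"{count} bitmaps, about {total / 1e6:.1f} MB under the bitset stand-in")
```

Under this model the bitmap count grows with the number of map tasks and the per-bitmap size grows with the number of reduce partitions, so the total deserialization work on the driver grows with their product.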