[ 
https://issues.apache.org/jira/browse/SPARK-27367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822045#comment-16822045
 ] 

Imran Rashid commented on SPARK-27367:
--------------------------------------

Did you change the Spark code as well, to use the newly suggested API for
serde?  Or did you just upgrade the version of RoaringBitmap?

The size of the bitmap depends on how many partitions there are on the
reduce side of a shuffle map task.  The number of messages that go to the
driver, in turn, depends on the number of map tasks and on how many tasks are
running concurrently.  Some users on large clusters run with > 10K tasks on
each side.
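
To put rough numbers on that scaling, here is a back-of-the-envelope sketch.
It is only an illustration under stated assumptions: one status message per
map task, each carrying a single bitmap with one bit per reduce partition.
A plain bitset stands in for the bitmap, so the byte counts are not what
RoaringBitmap would actually produce (its size depends on the container type
it picks), and the function name is hypothetical.

```python
def driver_bitmap_load(num_map_tasks: int, num_reduce_partitions: int) -> dict:
    """Rough estimate of the bitmap traffic the driver handles for one shuffle.

    Assumptions (not taken from Spark's code): one message per map task, and
    one bit per reduce partition in a plain, uncompressed bitset.
    """
    # Round up to whole bytes: one bit per reduce partition.
    bytes_per_bitmap = (num_reduce_partitions + 7) // 8
    return {
        "messages": num_map_tasks,
        "bytes_per_bitmap": bytes_per_bitmap,
        "total_bytes": num_map_tasks * bytes_per_bitmap,
    }


# The "> 10K tasks on each side" case mentioned above.
estimate = driver_bitmap_load(num_map_tasks=10_000, num_reduce_partitions=10_000)
print(estimate)
```

The point is the shape of the growth rather than the exact bytes: the message
count follows the map side, the per-message size follows the reduce side, and
the driver's deserialization work scales with their product.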

> Faster RoaringBitmap Serialization with v0.8.0
> ----------------------------------------------
>
>                 Key: SPARK-27367
>                 URL: https://issues.apache.org/jira/browse/SPARK-27367
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Imran Rashid
>            Priority: Major
>
> RoaringBitmap 0.8.0 adds faster serde, but also requires us to change how we 
> call the serde routines slightly to take advantage of it.  This is probably a 
> worthwhile optimization, as every shuffle map task with a large number of 
> partitions generates these bitmaps, and the driver especially has to 
> deserialize many of these messages.
> See 
> * https://github.com/apache/spark/pull/24264#issuecomment-479675572
> * https://github.com/RoaringBitmap/RoaringBitmap/pull/325
> * https://github.com/RoaringBitmap/RoaringBitmap/issues/319



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

