[ 
https://issues.apache.org/jira/browse/FLINK-24959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452755#comment-17452755
 ] 

ZhuoYu Chen edited comment on FLINK-24959 at 12/3/21, 5:12 AM:
---------------------------------------------------------------

hi [~jark] clickhouse,starRocks,doris all support bitmap

Traditional Count distinct calculation, due to the need for multiple data 
shuffle (transferring data between different nodes and calculating 
de-duplication) during query execution, can lead to a linear decrease in 
performance as the amount of data increases.

Advantages of bitmap de-duplication

    Space advantage: The use of one bit of a bitmap to indicate the existence 
of the corresponding subscript has a great space advantage; for example, for 
int32 de-duplication, the storage space required by a normal bitmap is only 
1/32 of the traditional de-duplication. With the optimized implementation of 
Roaring Bitmap, the storage space is further reduced significantly for sparse 
bitmaps.
    Time advantage: The computation involved in bitmap de-duplication includes 
bit placement for a given subscript and counting the number of bits in a 
bitmap, which is an O(1) operation and an O(n) operation, respectively, and the 
latter can be efficiently computed using clz, ctz, and other instructions.


was (Author: monster#12):
hi [~jark] clickhouse,starRocks,doris all support bitmap

Traditional Count distinct calculation, due to the need for multiple data 
shuffle (transferring data between different nodes and calculating 
de-duplication) during query execution, can lead to a linear decrease in 
performance as the amount of data increases.

Advantages of bitmap de-duplication

    Space advantage: The use of one bit of a bitmap to indicate the existence 
of the corresponding subscript has a great space advantage; for example, for 
int32 de-duplication, the storage space required by a normal bitmap is only 
1/32 of the traditional de-duplication. With the optimized implementation of 
Roaring Bitmap, the storage space is further reduced significantly for sparse 
bitmaps.
    Time advantage: The computation involved in bitmap de-duplication includes 
bit placement for a given subscript and counting the number of bits in a 
bitmap, which is an O(1) operation and an O(n) operation, respectively, and the 
latter can be efficiently computed using clz, ctz, and other instructions. In 
addition, bitmap de-duplication can be accelerated in parallel in the MPP 
execution engine, where each computation node computes a local sub-bitmap and 
uses the bitor operation to merge these sub-bitmaps into the final bitmap. The 
bitor operation is more efficient than sort-based and hash-based 
de-duplication, has no conditional dependencies and data dependencies, and can 
be executed quantitatively.

> Add a BitMap function to FlinkSQL
> ---------------------------------
>
>                 Key: FLINK-24959
>                 URL: https://issues.apache.org/jira/browse/FLINK-24959
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table SQL / API
>    Affects Versions: 1.15.0
>            Reporter: ZhuoYu Chen
>            Priority: Minor
>
> bitmap_and :{color:#333333}Computes the intersection of two input bitmaps and 
> returns the new bitmap{color}
> {color:#30323e}bitmap_andnot:{color:#333333}Computes the set (difference set) 
> that is in A but not in B.{color}{color}
> {color:#30323e}{color:#333333}Bitmap functions related to join operations, 
> etc{color}{color}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to