zenoyang opened a new pull request, #19880:
URL: https://github.com/apache/doris/pull/19880
# Proposed changes
Issue Number: close #xxx
Support the CRoaring copy-on-write (COW) feature to reduce bitmap copying.
## Problem summary
Bitmap columns are currently copied in several scenarios: for example, when
multiple derived columns are computed from the same bitmap column, or when the
build/probe column is copied into the join_block during a join. Copying a
bitmap is relatively slow, and the CRoaring copy-on-write (COW) feature avoids
many of these copies, which improves query performance.
Note: COW in CRoaring-0.4.0 is not thread-safe, so CRoaring needs to be patched
to make it thread-safe.
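
To make the idea concrete, here is a minimal sketch of copy-on-write semantics. It is not the CRoaring or Doris implementation: `CowBitmap`, `detach`, and `shares_payload_with` are hypothetical names, and a `std::set` stands in for the real bitmap containers. Copying only shares a pointer, and the payload is duplicated the first time a shared copy is modified.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <set>

// Hypothetical stand-in for a bitmap value; NOT the CRoaring or Doris API.
class CowBitmap {
public:
    CowBitmap() : data_(std::make_shared<std::set<uint32_t>>()) {}

    // Copying shares the payload; only the reference count changes.
    CowBitmap(const CowBitmap&) = default;
    CowBitmap& operator=(const CowBitmap&) = default;

    // Mutation first makes sure this object owns its payload exclusively.
    void add(uint32_t value) {
        detach();
        data_->insert(value);
    }

    bool contains(uint32_t value) const { return data_->count(value) != 0; }
    std::size_t cardinality() const { return data_->size(); }

    // True while both objects still point at the same underlying payload.
    bool shares_payload_with(const CowBitmap& other) const {
        return data_ == other.data_;
    }

private:
    // use_count() > 1 means another CowBitmap still references the payload,
    // so take a private deep copy before writing.
    void detach() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::set<uint32_t>>(*data_);
        }
    }

    std::shared_ptr<std::set<uint32_t>> data_;
};
```

In this sketch `std::shared_ptr` updates its reference count atomically, so the sharing bookkeeping is safe across threads, although a single `CowBitmap` object must still not be mutated concurrently. Whether the CRoaring patch in this PR takes a similar approach is not stated in the description.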
The following is a simplified version of the SQL from our production scenario:
```sql
SELECT `t2`.`page`
, `t2`.`entrance`
, `t1`.`partition_date`
, `t1`.`entrance_code`
, BITMAP_UNION_COUNT(CASE WHEN t1.type = 3 THEN t1.device_id END) AS `dau`
FROM `doris_vec_stage`.`tbl1` `t1`
LEFT JOIN `doris_vec_stage`.`tbl2` `t2`
ON `t1`.`first_entrance` = `t2`.`second_join_key`
AND `t1`.`page_id` = `t2`.`first_join_key`
WHERE `t2`.`entrance` IN ('******', '******')
AND `t2`.`page` = '******'
AND `t1`.`partition_date` BETWEEN '2023-02-27' AND '2023-03-19'
GROUP BY 1, 2, 3, 4
```
(The following tests were run on Doris 1.1.5.)
Before the change, the query took 112756 ms:
```
ProbePhase:
- ProbeExprCallTime: 1.836ms
- ProbeFindNextTime: 5s934ms
- ProbeRows: 90.048K (90048)
- ProbeTime: 1m52s
- ProbeWhenBuildSideOutputTime: 62.457ms
- ProbeWhenProbeSideOutputTime: 54s431ms
- ProbeWhenSearchHashTableTime: 27.254ms
```
Column copy time: `ProbeWhenProbeSideOutputTime: 54s431ms`
Column clear time is not printed here; estimated from the remaining probe time,
it is approximately:
(112000-5934-54431) ms = 51635 ms
After enabling CRoaring COW, the query takes 53585 ms:
```
ProbePhase:
- ProbeExprCallTime: 1.331ms
- ProbeFindNextTime: 40s635ms
- ProbeRows: 37.348K (37348)
- ProbeTime: 53s96ms
- ProbeWhenBuildSideOutputTime: 17.399ms
- ProbeWhenProbeSideOutputTime: 6s247ms
- ProbeWhenSearchHashTableTime: 6.700ms
```
Column copy time: `ProbeWhenProbeSideOutputTime: 6s247ms`
Column clear time: (53096-40635-6247) ms = 6214 ms
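
The column clear estimates are the part of `ProbeTime` left after subtracting `ProbeFindNextTime` and `ProbeWhenProbeSideOutputTime`. A small sketch of that arithmetic follows, reading `1m52s` as 112000 ms; `residual_ms` and the constant names are illustrative only and do not appear in Doris.

```cpp
#include <cstdint>

// Probe time not accounted for by the two printed counters, in milliseconds.
constexpr int64_t residual_ms(int64_t probe_ms,
                              int64_t find_next_ms,
                              int64_t probe_side_output_ms) {
    return probe_ms - find_next_ms - probe_side_output_ms;
}

// Counter values taken from the two profiles in this description.
constexpr int64_t kBeforeClearMs = residual_ms(112000, 5934, 54431);
constexpr int64_t kAfterClearMs = residual_ms(53096, 40635, 6247);

static_assert(kBeforeClearMs == 51635, "before: estimated column clear time");
static_assert(kAfterClearMs == 6214, "after: estimated column clear time");
```

These residuals attribute all remaining probe time to clearing columns, so they are rough upper bounds rather than measured values.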
## Checklist(Required)
* [ ] Does it affect the original behavior
* [ ] Have unit tests been added
* [ ] Has documentation been added or modified
* [x] Does it need to update dependencies
* [x] Does this PR support rollback (If NO, please explain WHY)