airborne12 commented on PR #61200:
URL: https://github.com/apache/doris/pull/61200#issuecomment-4037869931
## Code Review — Final Report
**Reviewer**: 审查者 (automated)
**Verdict**: REQUEST_CHANGES (小修,方向正确)
---
### 总结
PR 正确识别了 `ExcludeScorer` 不处理 null bitmap 导致 `search('NOT field:value')` 错误包含
NULL 行的问题。`binding_keys` 管线透传是必要的(variant subcolumn 是高频使用场景)。
但当前方案用 `AndNotScorer`(eager bitmap materialization)替换 `ExcludeScorer`(lazy
iterator),在执行热路径上引入了性能退化。建议保留 `ExcludeScorer` 的 lazy 特性,仅增强其 null bitmap 处理能力。
---
### P1 — 性能:eager materialization 在热路径上的退化
**问题**:`AndNotScorer` 构造时一次性遍历所有 exclude docs 并物化为 bitmap:
```cpp
// AndNotScorer 构造函数
for (auto& scorer : excludes) {
while (scorer->doc() != TERMINATED) {
_exclude_true.add(scorer->doc()); // 遍历所有 exclude docs
scorer->advance();
}
}
```
**影响场景**:`search('msg:rare_term AND NOT msg:very_common_term')`
- `rare_term` 匹配 10 条,`very_common_term` 匹配 1000 万条
- 旧代码:ExcludeScorer 对 10 个 include doc 做 lazy seek,~10 次操作
- 新代码:AndNotScorer 先遍历 1000 万个 exclude docs 建 bitmap,再做 10 次 O(1) lookup
- **构造阶段浪费了 1000 万次迭代**
此路径位于查询执行核心(per-segment scorer 构建),性能影响会被 segment 数量放大。
**建议方案**:增强 `ExcludeScorer`,保持 lazy + 支持 null bitmap
```cpp
// 方案:ExcludeScorer 增强
template <typename TDocSet, typename TDocSetExclude>
class Exclude final : public Scorer {
public:
Exclude(TDocSet underlying, TDocSetExclude excluding,
roaring::Roaring exclude_null = {})
: _underlying(std::move(underlying)),
_excluding(std::move(excluding)),
_exclude_null(std::move(exclude_null)) {
// 原有初始化逻辑 + null 感知
_init_with_null_awareness();
}
uint32_t advance() override {
while (true) {
uint32_t candidate = _underlying->advance();
if (candidate == TERMINATED) return TERMINATED;
// O(1) null bitmap 检查(bitmap 从索引层预计算,cheap)
if (!_exclude_null.isEmpty() &&
_exclude_null.contains(candidate)) {
_null_bitmap.add(candidate);
continue;
}
// 原有 lazy seek 检查(不变)
if (!is_within(_excluding, candidate)) {
return candidate;
}
}
}
bool has_null_bitmap(const NullBitmapResolver* = nullptr) override {
return !_null_bitmap.isEmpty();
}
const roaring::Roaring* get_null_bitmap(const NullBitmapResolver* =
nullptr) override {
return _null_bitmap.isEmpty() ? nullptr : &_null_bitmap;
}
private:
roaring::Roaring _exclude_null; // 构造时从 scorer 的 null bitmap 收集
roaring::Roaring _null_bitmap; // 迭代时累积
};
```
`build_exclude_opt` 改造:
```cpp
ScorerPtr build_exclude_opt(std::vector<ScorerPtr> must_not_scorers,
const NullBitmapResolver* resolver) {
if (must_not_scorers.empty()) return nullptr;
// 在 union 之前收集 null bitmap(从索引层直接读,不需要遍历 scorer)
roaring::Roaring exclude_null;
for (auto& s : must_not_scorers) {
if (resolver && s->has_null_bitmap(resolver)) {
if (auto* nb = s->get_null_bitmap(resolver)) {
exclude_null |= *nb;
}
}
}
// 原有 union 逻辑不变
auto do_nothing = std::make_shared<DoNothingCombiner>();
auto specialized_scorer = scorer_union(std::move(must_not_scorers),
do_nothing);
auto boxed = into_box_scorer(std::move(specialized_scorer), do_nothing);
// 传入 null bitmap
return make_exclude(std::move(boxed), std::move(exclude_null));
}
```
**保留的部分**(当前 PR 中正确且必要的改动):
1. ✅ `binding_keys` 管线透传(`function_search.cpp` → `OccurBooleanQueryBuilder`
→ `OccurBooleanQuery` → `OccurBooleanWeight` → `per_occur_scorers`)— variant
subcolumn 需要
2. ✅ `per_occur_scorers` 中 `weight->scorer(context, binding_key)` 的改动
3. ✅ 所有新增的单元测试和回归测试(调整为验证增强后的 ExcludeScorer)
**需要修改的部分**:
1. ❌ 不删除 `build_exclude_opt`,改为增强版(接收 resolver,收集 null bitmap)
2. ❌ 不替换为 `AndNotScorer`,改为增强 `ExcludeScorer`(接收 exclude_null bitmap)
3. ❌ `complex_scorer` 中保留 `build_exclude_opt` + `make_exclude` 调用模式,传入
`context.null_resolver`
---
### P3 — 建议
1. `_binding_keys.resize(_sub_weights.size())` 建议加 `DCHECK_EQ` 校验
2. 单元测试补 variant subcolumn + MUST_NOT 场景
3. `ExcludeScorer` 的 `seek()` 方法也需要同步增加 null bitmap 检查
---
### 改动清单(供编码 agent 参考)
| 文件 | 操作 |
|------|------|
| `exclude_scorer.h` | 增加 `roaring::Roaring _exclude_null`, `_null_bitmap`
成员;重写 `has_null_bitmap`/`get_null_bitmap`;构造函数增加 `exclude_null` 参数 |
| `exclude_scorer.cpp` | `advance()`/`seek()` 中增加 null bitmap
检查逻辑;`make_exclude` 增加 `exclude_null` 参数 |
| `occur_boolean_weight.cpp` | `build_exclude_opt` 增加 `resolver` 参数,union
前收集 null bitmap;`complex_scorer` 中恢复调用 `build_exclude_opt` + `make_exclude`(不用
AndNotScorer) |
| `occur_boolean_weight.h` | `build_exclude_opt` 签名更新 |
| 其余文件 | `binding_keys` 管线改动保持不变 |
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]