(doris) branch master updated: [Enhancement](skills) clarify code review skill usage and refine guidelines (#61116)

zclll Tue, 10 Mar 2026 00:17:46 -0700

This is an automated email from the ASF dual-hosted git repository.

zclll pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git



The following commit(s) were added to refs/heads/master by this push:
     new 50bbf89c907 [Enhancement](skills) clarify code review skill usage and 
refine guidelines (#61116)
50bbf89c907 is described below

commit 50bbf89c907f5c6457719200b8804a4327b91d02
Author: zclllyybb <[email protected]>
AuthorDate: Tue Mar 10 15:17:14 2026 +0800

    [Enhancement](skills) clarify code review skill usage and refine guidelines 
(#61116)
    
    mainly based on the observation of AI review pipeline operation status
---
 .claude/skills/code-review/SKILL.md | 23 ++++++++++++++++++++---
 AGENTS.md                           |  8 ++++----
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/.claude/skills/code-review/SKILL.md 
b/.claude/skills/code-review/SKILL.md
index 84d3700dfaf..ac9a6cbcf0c 100644
--- a/.claude/skills/code-review/SKILL.md
+++ b/.claude/skills/code-review/SKILL.md
@@ -12,6 +12,10 @@ Complete code review work to ensure code quality. Due to the 
length of content,
 
 Use this when you need to review code, whether it's code you just completed or 
directly reviewing target code.
 
+## How to use me
+
+Since this article is long, you must read and respond to the content of part 
1. The other specific details can be referred to as needed when reviewing the 
code modules (not required).
+
 ---
 
 ## Part 1: General Principles
@@ -27,15 +31,15 @@ Always focus on the following core invariants during review:
 
 ### 1.2 Review Principles
 
-- **Defensive Programming Prohibited**: Do not use `if(valid)` style defensive 
checks to mask logic errors. If logically A=true should imply B=true, write 
`if(A) { DORIS_CHECK(B); ... }`, never `if(A && B)`
 - **Follow Context Conventions**: When adding code, strictly follow patterns 
in adjacent code—same error handling, same interface call order, same lock 
acquisition patterns, unless a clearly correct and more reasonable approach is 
identified.
 - **Prioritize Reuse**: Before adding new functionality, search for similar 
implementations that can be reused or extended; after adding, ensure good 
abstraction for shared code.
 - **Code First**: When this SKILL conflicts with actual code behavior, defer 
to your understanding of actual code behavior and explain
 - **Performance First**: All obviously redundant operations should be 
optimized away, all obvious performance optimizations must be applied, and 
obvious anti-patterns must be eliminated.
+- **Evidence Speaks**: All issues with code itself (not memory or environment) 
must be clearly identified as either having problems or not. For any erroneous 
situation, if it cannot be confirmed locally, you must provide the specific 
path or logic where the error occurs. That is, if you believe that if A then B, 
you must specify a clear scenario where A occurs.
 
 ### 1.3 Critical Checkpoints (Self-Review and Review Priority)
 
-The following checkpoints must be **individually confirmed with conclusions** 
during self-review and review:
+The following checkpoints must be **individually confirmed with conclusions** 
during self-review and review. If the code is too long or too complex, you 
should delve into the code again as needed when analyzing specific issues, 
especially the entire logic chain where there are doubts:
 
 - What is the goal of the current task? Does the current code accomplish this 
goal? Is there a test that proves it?
 - Is this modification as small, clear, and focused as possible?
@@ -43,8 +47,9 @@ The following checkpoints must be **individually confirmed 
with conclusions** du
   - Which threads introduce concurrency, what are the critical variables? What 
locks protect them?
   - Are operations within locks as lightweight as possible? Are all heavy 
operations outside locks while maintaining concurrency safety?
   - If multiple locks exist, is the locking order consistent? Is there 
deadlock risk?
-- Is there special or non-intuitive lifecycle management? If yes:
+- Is there special or non-intuitive lifecycle management including static 
initialization order? If yes:
   - Understand the complete lifecycle and relationships of related variables. 
Are there circular references? When is each released? Can all lifecycles end 
normally?
+  - For cross-TU static/global variables: does the initializer of one static 
variable depend on another static variable defined in a different translation 
unit? If yes, the initialization order is undefined by the C++ standard (static 
initialization order fiasco). Use constexpr, inline variables in the same 
header, or lazy initialization to fix.
 - Are configuration items added?
   - Should the configuration allow dynamic changes, and does it actually allow 
them? If yes:
     - Can affected processes detect the change promptly without restart?
@@ -207,6 +212,13 @@ Vectorized columns (`IColumn`) use custom intrusive 
reference-counted Copy-on-Wr
 - [ ] Do cache values inherit from `LRUCacheValueBase`? This base class 
automatically releases tracking bytes on destruction
 - [ ] Are new cache types registered with `CacheManager`? Unregistered caches 
don't participate in global GC
 
+#### 2.2.5 Static/Global Variable Initialization Order
+
+- [ ] Does the diff add or modify any `static` / global variable definition in 
a `.cpp` file? If the initializer references any static/global variable from 
another file (header or .cpp), this is a **SIOF** (static initialization order 
fiasco) — C++ does not define initialization order across translation units, so 
the dependency may be zero/garbage at the point of use.
+  - **Unsafe**: `const Foo Foo::X = Foo(Bar::Y.method());` in `foo.cpp`, where 
`Bar::Y` is `inline` in `bar.h` or defined in `bar.cpp` — different TU, order 
undefined
+  - **Safe**: `inline const Foo Foo::X = Foo(Bar::Y.method());` in `foo.h` 
where `bar.h` is included — same TU for every includer
+  - **Fix**: use `constexpr`, move to same header as `inline`, or use 
function-local static (Meyers' singleton)
+
 ### 2.3 Concurrency and Locks
 
 #### 2.3.1 Tablet Lock Hierarchy
@@ -753,6 +765,7 @@ MoW tables are one of the most complex data paths, require 
special attention dur
 | `op_not` const but modifies data | **MEDIUM** | 
`inverted_index_reader.h:170` | Modifies via shared_ptr bypassing const, 
callers assume immutable |
 | Schema change four-level lock order | **HIGH** | `schema_change.cpp:972-976` 
| base push→new push→base header→new header, out-of-order deadlock |
 | MOW schema change delete bitmap four steps | **CRITICAL** | 
`schema_change.cpp:1595-1657` | Step 3 must block publish, omission causes 
bitmap inconsistency |
+| Cross-TU static variable initialization dependency | **HIGH** | BE global | 
Static init order between TUs is undefined; initializer reading another TU's 
static gets zero/garbage |
 | `ExternalCatalog.isInitializing` dead code | **MEDIUM** | 
`ExternalCatalog.java:358` | Never set to true, reentrancy protection 
ineffective |
 | `lowerCaseToDatabaseName.clear()` race | **HIGH** | 
`ExternalCatalog.java:504` | Concurrent get returns null between clear and 
refill |
 | `resetToUninitialized()` sets fields to null | **HIGH** | 
`ExternalCatalog.java:581-590` | Concurrent queries encounter null fields 
midway causing NPE |
@@ -839,3 +852,7 @@ BE extensively uses bthread, following are easily 
overlooked bthread-specific co
 
 - [ ] **Lock-protected data**: Comment next to member variables `// protected 
by xxx_lock`
 - [ ] **TODO/FIXME**: Include author, date, brief description, e.g., `// 
TODO(zhaochangle, 2026-03-01): optimize this path`
+
+## Finally
+
+In general, the content mentioned in 1.3 Critical Checkpoints is what you must 
check and respond to item by item. Of course, your ultimate goal is still to 
discover bugs, so you can conduct any investigation on suspicious parts and 
provide conclusions.
diff --git a/AGENTS.md b/AGENTS.md
index 795ce6f8024..ecfff03dbed 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -6,11 +6,11 @@ This is the codebase for Apache Doris, an MPP OLAP database. 
It primarily consis
 
 To ensure smooth test execution without interference between worktrees, the 
first thing to do upon entering a worktree directory is to check if 
`.worktree_initialized` exists. If not, execute `hooks/setup_worktree.sh`, 
setting `$ROOT_WORKSPACE_PATH` to the base directory (typically 
`${DORIS_REPO}`) beforehand. After successful execution, verify that 
`.worktree_initialized` has been touched and that `thirdparty/installed` 
dependencies exist correctly. Also check if submodules have been pr [...]
 
-When working in worktree mode, all operations must be confined to the current 
worktree directory. Do not enter `${DORIS_REPO}` or use any resources there. 
Compilation and execution must be done within the current worktree directory. 
The compiled Doris cluster must use random ports not used by other worktrees 
(modify BE and FE conf before compilation, using a uniform offset of 707-807 
from default ports without conflicting with other worktrees' ports). Run from 
the `output` directory with [...]
+When working in worktree mode, all operations must be confined to the current 
worktree directory. Do not enter `${DORIS_REPO}` or use any resources there. 
Compilation and execution must be done within the current worktree directory. 
The compiled Doris cluster must use random ports not used by other worktrees 
(modify BE and FE conf before compilation, using a uniform offset of 
`${DORIS_PORT_OFFSET_RANGE}` from default ports without conflicting with other 
worktrees' ports). Run from the `o [...]
 
 ## Coding Standards
 
-Assert correctness only—never use defensive programming with `if` or similar 
constructs. Any `if` check for errors must have a clearly known inevitable 
failure path (not speculation). If no such scenario is found, strictly avoid 
using `if(valid)` checks. However, you may use the `DORIS_CHECK` macro for 
precondition assertions. For example, if logically A=true should always imply 
B=true, then strictly avoid `if(A&&B)` and instead use 
`if(A){DORIS_CHECK(B);...}`. In short, the principle is [...]
+Assert correctness only—never use defensive programming with `if` or similar 
constructs. Any `if` check for errors must have a clearly known inevitable 
failure path (not speculation). If no such scenario is found, strictly avoid 
using `if(valid)` checks. However, you may use the `DORIS_CHECK` macro for 
precondition assertions (If inside performance-sensitive areas like loops, it 
can only be `DCHECK`). For example, if logically A=true should always imply 
B=true, then strictly avoid `if(A& [...]
 
 When adding code, strictly follow existing similar code in similar contexts, 
including interface usage, error handling, etc., maintaining consistency. When 
adding any code, first try to reference existing functionality. Second, you 
must examine the relevant context paragraphs to fully understand the logic.
 
@@ -22,7 +22,7 @@ When conducting code review (including self-review and review 
tasks), it is nece
 
 ## Build and Run Standards
 
-Always use only the `build.sh` script with its correct parameters to build 
Doris BE and FE. When building, use at least `-j${DORIS_PARALLELISM}` 
parallelism. For example, the simplest BE+FE build command is `./build.sh --be 
--fe -j${DORIS_PARALLELISM}`.
+Always use only the `build.sh` script with its correct parameters to build 
Doris BE and FE. When building, use `-j${DORIS_PARALLELISM}` parallelism. For 
example, the simplest BE+FE build command is `./build.sh --be --fe 
-j${DORIS_PARALLELISM}`.
 Build type can be set via `BUILD_TYPE` in `custom_env.sh`, but only set it to 
`RELEASE` when explicitly required for performance testing; otherwise, keep it 
as `ASAN`.
 You may modify BE and FE ports and network settings in `conf/` before 
compilation to ensure correctness and avoid conflicts.
 Build artifacts are in the current directory's `output/`. If starting the 
service, ensure all process artifacts have their conf set with appropriate 
non-conflicting ports and `priority_networks = 10.16.10.3/24`. Use `--daemon` 
when starting. Cluster startup is slow; wait at least 30s for success. If still 
not ready after waiting, continue waiting. If not ready after a long time, 
check BE and FE logs to investigate.
@@ -34,7 +34,7 @@ All kernel features must have corresponding tests. Prioritize 
adding regression
 
 You must use the preset scripts in the codebase with their correct parameters 
to run tests (`run-regression-test.sh`, `run-be-ut.sh`, `run-fe-ut.sh`). 
Regression test result files must not be handwritten; they must be 
auto-generated via test scripts. When running regression tests, if using `-s` 
to specify a case, also try to use `-d` to specify the parent directory for 
faster execution. For example, for cases under `nereids_p0`, you can use `-d 
nereids_p0 -s xxx`, where `xxx` is the name [...]
 
-BE-UT compilation must not be below `${DORIS_PARALLELISM}` parallelism.
+BE-UT compilation must use at most `${DORIS_PARALLELISM}` parallelism also.
 
 Key utility functions in BE code, as well as the core logic (functions) of 
complete features, must have corresponding unit tests. If it's inconvenient to 
add unit tests, the module design and function decomposition should be reviewed 
again to ensure high cohesion and low coupling are properly achieved.
 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

(doris) branch master updated: [Enhancement](skills) clarify code review skill usage and refine guidelines (#61116)

Reply via email to