Copilot commented on code in PR #867:
URL: 
https://github.com/apache/skywalking-banyandb/pull/867#discussion_r2568115781


##########
api/proto/banyandb/common/v1/common.proto:
##########
@@ -89,8 +89,60 @@ message LifecycleStage {
   // A value of 0 means no replicas, while a value of 1 means one primary 
shard and one replica.
   // Higher values indicate more replicas.
   uint32 replicas = 7;
+  // filter_rules defines filtering rules for specific tag families, tags, or 
fields.
+  // These rules are applied when data is migrated TO this stage.
+  // The TTL duration used for expiration check comes from this stage's ttl 
field.
+  // See FilterRule message for detailed matching logic.
+  repeated FilterRule filter_rules = 8;
 }
 
+// TagList represents a list of tag names within a tag family.
+message TagList {
+  repeated string tags = 1;
+}
+
+// FilterRule defines filtering rules for specific tag families, tags, or 
fields.
+// This allows fine-grained control over data retention at a more granular 
level than stage TTL.
+// These rules are scoped to the LifecycleStage that contains them.
+// The TTL duration used for expiration check comes from the parent 
LifecycleStage's ttl field.
+//
+// Matching Logic:
+// For a given column (Tag Family TF, Tag T or Field F) in a Subject S when 
migrating TO a stage:
+// 1. Get the filter_rules from the target LifecycleStage.
+// 2. Iterate through these rules in order.
+// 3. For each rule:
+//    - Check Subject: S must match subject_include (empty list matches ALL 
subjects) 
+//      and NOT match subject_exclude. subject_exclude takes precedence.
+//    - Check Tag Family: TF must be present in tag_family_include keys.
+//      - If present, check Tag T: T must be in the TagList value (empty list 
or "*" matches ALL tags).
+//      - Check Exclude: TF must NOT be in tag_family_exclude keys 
+//        (or if present, T must NOT be in its TagList value).
+//    - Check Field: F must match field_include (empty list matches NO fields, 
"*" matches ALL)
+//      and NOT match field_exclude.
+// 4. The first matching rule determines that the column should be filtered 
when migrating to that stage.
+// 5. If no rule matches, the column retains the default behavior (no 
filtering, full migration).
+// 6. The TTL duration used for expiration check comes from the parent 
LifecycleStage.ttl.
+message FilterRule {
+  // Subject filters (Stream/Measure/Trace names).
+  // If subject_include is empty, it matches ALL subjects in the Group.
+  // subject_exclude takes precedence over subject_include.
+  repeated string subject_include = 1;
+  repeated string subject_exclude = 2;
+
+  // Tag Family filters.
+  // The key is the tag family name. The value is a list of tags in the tag 
family.
+  // If the value list is empty, it matches ALL tags in the tag family.
+  // If the value list contains "*", it matches ALL tags in the tag family.
+  map<string, TagList> tag_family_include = 3;
+  map<string, TagList> tag_family_exclude = 4;
+
+  // Field filters.
+  // If field_include is empty, NO fields are matched (unless "*" is used).

Review Comment:
   The documentation states "empty list matches NO fields" for field_include, 
which conflicts with the special case handling of "*" that matches ALL fields. 
This inconsistency makes the behavior unclear.
   
   Additionally, the comment on line 134 says "If the value list is empty, it 
matches ALL tags in the tag family" but line 120 says for fields "empty list 
matches NO fields". This inconsistency between tag family and field matching 
behavior should be explained more clearly or made consistent.
   
   Recommendation: Explicitly document why empty lists have different semantics 
for tag_family_include vs field_include, or consider making them consistent.
   ```suggestion
     // If the value list contains "*", it matches ALL tags in the tag family.
     // Note: For tag_family_include, an empty list means "include all tags" 
for that family.
     // This is to allow easy inclusion of entire tag families without 
specifying each tag.
     map<string, TagList> tag_family_include = 3;
     map<string, TagList> tag_family_exclude = 4;
   
     // Field filters.
     // If field_include is empty, NO fields are matched (unless "*" is used).
     // Note: For field_include, an empty list means "do not include any 
fields" by default.
     // This is to prevent accidental inclusion of all fields, which may be 
undesirable for fields.
     // To include all fields, explicitly use "*" in the list.
   ```



##########
api/proto/banyandb/common/v1/common.proto:
##########
@@ -89,8 +89,60 @@ message LifecycleStage {
   // A value of 0 means no replicas, while a value of 1 means one primary 
shard and one replica.
   // Higher values indicate more replicas.
   uint32 replicas = 7;
+  // filter_rules defines filtering rules for specific tag families, tags, or 
fields.
+  // These rules are applied when data is migrated TO this stage.
+  // The TTL duration used for expiration check comes from this stage's ttl 
field.
+  // See FilterRule message for detailed matching logic.
+  repeated FilterRule filter_rules = 8;
 }
 
+// TagList represents a list of tag names within a tag family.
+message TagList {
+  repeated string tags = 1;
+}
+
+// FilterRule defines filtering rules for specific tag families, tags, or 
fields.
+// This allows fine-grained control over data retention at a more granular 
level than stage TTL.
+// These rules are scoped to the LifecycleStage that contains them.
+// The TTL duration used for expiration check comes from the parent 
LifecycleStage's ttl field.
+//
+// Matching Logic:
+// For a given column (Tag Family TF, Tag T or Field F) in a Subject S when 
migrating TO a stage:
+// 1. Get the filter_rules from the target LifecycleStage.
+// 2. Iterate through these rules in order.
+// 3. For each rule:
+//    - Check Subject: S must match subject_include (empty list matches ALL 
subjects) 
+//      and NOT match subject_exclude. subject_exclude takes precedence.
+//    - Check Tag Family: TF must be present in tag_family_include keys.
+//      - If present, check Tag T: T must be in the TagList value (empty list 
or "*" matches ALL tags).
+//      - Check Exclude: TF must NOT be in tag_family_exclude keys 
+//        (or if present, T must NOT be in its TagList value).
+//    - Check Field: F must match field_include (empty list matches NO fields, 
"*" matches ALL)
+//      and NOT match field_exclude.

Review Comment:
   The matching logic documentation lacks clarity on how tag family and field 
checks interact. Line 110 states "For a given column (Tag Family TF, Tag T or 
Field F)" suggesting these are alternatives, but the step-by-step logic in 
lines 116-121 lists them as sequential checks without clarifying whether all 
must pass or if they're alternatives based on column type.
   
   Recommendation: Restructure the matching logic to clearly separate the cases:
   - "If the column is a Tag Family/Tag (TF, T): Check subject, then 
tag_family_include/exclude..."
   - "If the column is a Field (F): Check subject, then 
field_include/exclude..."
   
   This would make it clear that tag family and field checks are mutually 
exclusive based on column type, not cumulative requirements.
   ```suggestion
   // For a given column (Tag Family TF, Tag T, or Field F) in a Subject S when 
migrating TO a stage:
   // 1. Get the filter_rules from the target LifecycleStage.
   // 2. Iterate through these rules in order.
   // 3. For each rule:
   //    - Check Subject: S must match subject_include (empty list matches ALL 
subjects)
   //      and NOT match subject_exclude. subject_exclude takes precedence.
   //    - If the column is a Tag Family or Tag (TF, T):
   //        - Check Tag Family: TF must be present in tag_family_include keys.
   //          - If present, check Tag T: T must be in the TagList value (empty 
list or "*" matches ALL tags).
   //        - Check Exclude: TF must NOT be in tag_family_exclude keys
   //          (or if present, T must NOT be in its TagList value).
   //    - If the column is a Field (F):
   //        - Check Field: F must match field_include (empty list matches NO 
fields, "*" matches ALL)
   //          and NOT match field_exclude.
   ```



##########
api/proto/banyandb/common/v1/common.proto:
##########
@@ -89,8 +89,60 @@ message LifecycleStage {
   // A value of 0 means no replicas, while a value of 1 means one primary 
shard and one replica.
   // Higher values indicate more replicas.
   uint32 replicas = 7;
+  // filter_rules defines filtering rules for specific tag families, tags, or 
fields.
+  // These rules are applied when data is migrated TO this stage.
+  // The TTL duration used for expiration check comes from this stage's ttl 
field.
+  // See FilterRule message for detailed matching logic.
+  repeated FilterRule filter_rules = 8;
 }
 
+// TagList represents a list of tag names within a tag family.
+message TagList {
+  repeated string tags = 1;
+}
+
+// FilterRule defines filtering rules for specific tag families, tags, or 
fields.
+// This allows fine-grained control over data retention at a more granular 
level than stage TTL.
+// These rules are scoped to the LifecycleStage that contains them.
+// The TTL duration used for expiration check comes from the parent 
LifecycleStage's ttl field.
+//
+// Matching Logic:
+// For a given column (Tag Family TF, Tag T or Field F) in a Subject S when 
migrating TO a stage:
+// 1. Get the filter_rules from the target LifecycleStage.
+// 2. Iterate through these rules in order.
+// 3. For each rule:
+//    - Check Subject: S must match subject_include (empty list matches ALL 
subjects) 
+//      and NOT match subject_exclude. subject_exclude takes precedence.
+//    - Check Tag Family: TF must be present in tag_family_include keys.
+//      - If present, check Tag T: T must be in the TagList value (empty list 
or "*" matches ALL tags).
+//      - Check Exclude: TF must NOT be in tag_family_exclude keys 
+//        (or if present, T must NOT be in its TagList value).
+//    - Check Field: F must match field_include (empty list matches NO fields, 
"*" matches ALL)
+//      and NOT match field_exclude.
+// 4. The first matching rule determines that the column should be filtered 
when migrating to that stage.
+// 5. If no rule matches, the column retains the default behavior (no 
filtering, full migration).
+// 6. The TTL duration used for expiration check comes from the parent 
LifecycleStage.ttl.
+message FilterRule {
+  // Subject filters (Stream/Measure/Trace names).
+  // If subject_include is empty, it matches ALL subjects in the Group.
+  // subject_exclude takes precedence over subject_include.
+  repeated string subject_include = 1;
+  repeated string subject_exclude = 2;
+
+  // Tag Family filters.
+  // The key is the tag family name. The value is a list of tags in the tag 
family.
+  // If the value list is empty, it matches ALL tags in the tag family.
+  // If the value list contains "*", it matches ALL tags in the tag family.
+  map<string, TagList> tag_family_include = 3;
+  map<string, TagList> tag_family_exclude = 4;
+
+  // Field filters.
+  // If field_include is empty, NO fields are matched (unless "*" is used).

Review Comment:
   [nitpick] Inconsistent empty list semantics between tag families and fields. 
Line 134 states "If the value list is empty, it matches ALL tags in the tag 
family" for tag_family_include, but line 140 states "If field_include is empty, 
NO fields are matched (unless \"*\" is used)."
   
   This creates confusion for API users. Consider either:
   1. Making the behavior consistent (e.g., both match ALL when empty)
   2. Providing a clear rationale for the difference in a comment
   
   The current documentation explains both behaviors but doesn't justify why 
they differ, which could lead to user errors.
   ```suggestion
     // If the value list contains "*", it matches ALL tags in the tag family.
     // 
     // Rationale: For tag families, an empty list means "no restriction" and 
matches all tags,
     // because tag families are often grouped and a rule is expected to apply 
broadly unless specified.
     // For fields, an empty list means "no fields are matched" to avoid 
accidental inclusion of all fields,
     // which could be risky in migration/filtering scenarios. Use "*" to 
explicitly match all fields.
     map<string, TagList> tag_family_include = 3;
     map<string, TagList> tag_family_exclude = 4;
   
     // Field filters.
     // If field_include is empty, NO fields are matched (unless "*" is used).
     // See rationale above for why empty list semantics differ from tag 
families.
   ```



##########
api/proto/banyandb/common/v1/common.proto:
##########
@@ -89,8 +89,60 @@ message LifecycleStage {
   // A value of 0 means no replicas, while a value of 1 means one primary 
shard and one replica.
   // Higher values indicate more replicas.
   uint32 replicas = 7;
+  // filter_rules defines filtering rules for specific tag families, tags, or 
fields.
+  // These rules are applied when data is migrated TO this stage.
+  // The TTL duration used for expiration check comes from this stage's ttl 
field.
+  // See FilterRule message for detailed matching logic.
+  repeated FilterRule filter_rules = 8;
 }
 
+// TagList represents a list of tag names within a tag family.
+message TagList {
+  repeated string tags = 1;
+}
+
+// FilterRule defines filtering rules for specific tag families, tags, or 
fields.
+// This allows fine-grained control over data retention at a more granular 
level than stage TTL.
+// These rules are scoped to the LifecycleStage that contains them.
+// The TTL duration used for expiration check comes from the parent 
LifecycleStage's ttl field.
+//
+// Matching Logic:
+// For a given column (Tag Family TF, Tag T or Field F) in a Subject S when 
migrating TO a stage:
+// 1. Get the filter_rules from the target LifecycleStage.
+// 2. Iterate through these rules in order.
+// 3. For each rule:
+//    - Check Subject: S must match subject_include (empty list matches ALL 
subjects) 

Review Comment:
   [nitpick] Trailing whitespace at the end of the line after "ALL subjects)". 
This should be removed for code cleanliness.
   ```suggestion
   //    - Check Subject: S must match subject_include (empty list matches ALL 
subjects)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to