hanahmily commented on code in PR #1184:
URL: 
https://github.com/apache/skywalking-banyandb/pull/1184#discussion_r3456885907


##########
banyand/stream/block_writer.go:
##########
@@ -173,6 +174,7 @@ type blockWriter struct {
 
 func (bw *blockWriter) reset() {
        bw.writers.reset()
+       bw.tagType.reset()

Review Comment:
   `reset()` calls `bw.tagType.reset()` (i.e. `clear()`) directly, without 
guarding against a nil map. This is safe today only because 
`generateBlockWriter` always initializes `tagType`, but `copyFrom` would panic 
if a `blockWriter` ever reached use with a nil map. The merged trace 
implementation guards this — could we match it here for consistency and 
robustness?
   
   ```go
   if bw.tagType == nil {
       bw.tagType = make(tagType)
   } else {
       bw.tagType.reset()
   }
   ```



##########
banyand/stream/part.go:
##########
@@ -232,6 +239,7 @@ func (mp *memPart) mustFlush(fileSystem fs.FileSystem, path 
string) {
        }
 
        mp.partMetadata.mustWriteMetadata(fileSystem, path)
+       mp.tagType.mustWriteTagType(fileSystem, path)

Review Comment:
   `tag.type` is written *after* `metadata.json` here, which raises two points:
   
   - The comment just below ("…covering the dirent changes for all data files 
written above") is now slightly stale, since `metadata.json` is no longer the 
final write.
   - `metadata.json` is the part's completion marker, so a crash between the 
two writes yields a "complete" part with no `tag.type`. Harmless now (read path 
is out of scope), but the future query-path reader will need to tolerate a 
missing `tag.type`.
   
   Note also that `FinishSync` in `write_data.go` writes them in the opposite 
order (`tag.type` before `metadata.json`), so the ordering is inconsistent 
between the flush/merge paths and the sync path. Worth aligning, or at least 
documenting the intended invariant.



##########
banyand/stream/part_metadata.go:
##########
@@ -107,6 +112,96 @@ func (pm *partMetadata) mustWriteMetadata(fileSystem 
fs.FileSystem, partPath str
        }
 }
 
+type tagType map[string]map[string]pbv1.ValueType
+
+func (tt tagType) reset() {
+       clear(tt)
+}
+
+func (tt tagType) copyFrom(source tagType) {
+       for familyName, sourceTags := range source {
+               tags := tt[familyName]
+               if tags == nil {
+                       tags = make(map[string]pbv1.ValueType, len(sourceTags))
+                       tt[familyName] = tags
+               }
+               for tagName, valueType := range sourceTags {
+                       tags[tagName] = valueType
+               }
+       }
+}
+
+func (tt tagType) marshal() []byte {
+       var dst []byte
+       dst = encoding.VarUint64ToBytes(dst, uint64(len(tt)))
+       familyNames := make([]string, 0, len(tt))
+       for familyName := range tt {
+               familyNames = append(familyNames, familyName)
+       }
+       sort.Strings(familyNames)
+       for _, familyName := range familyNames {
+               dst = encoding.EncodeBytes(dst, 
convert.StringToBytes(familyName))
+               tags := tt[familyName]
+               dst = encoding.VarUint64ToBytes(dst, uint64(len(tags)))
+               tagNames := make([]string, 0, len(tags))
+               for tagName := range tags {
+                       tagNames = append(tagNames, tagName)
+               }
+               sort.Strings(tagNames)
+               for _, tagName := range tagNames {
+                       dst = encoding.EncodeBytes(dst, 
convert.StringToBytes(tagName))
+                       dst = append(dst, byte(tags[tagName]))
+               }
+       }
+       return dst
+}
+
+func (tt tagType) unmarshal(src []byte) error {

Review Comment:
   The PR description states this "returns errors for truncated names or 
missing type bytes," but the decode is only partially defensive: 
`encoding.BytesToVarUint64` and `DecodeBytes` return `0`/empty *without* an 
error on several truncation shapes. A corrupt `familyCount`/`tagCount` would 
therefore spin the loop (inserting empty entries) or panic in `make(..., 
tagCount)` rather than returning a clean error. Since `unmarshal` is test-only 
today this is latent, but before the read path lands I'd suggest 
bounds-checking the decoded counts against `len(remaining)` so corruption fails 
fast.
   
   (This is the same function Copilot flagged, but a distinct concern — its 
compile claim is incorrect, as `uint64` is a valid `make` size hint.)



##########
banyand/stream/part.go:
##########
@@ -422,6 +430,18 @@ func CreatePartFileReaderFromPath(partPath string, lfs 
fs.FileSystem) ([]queue.F
                if e.IsDir() {
                        continue
                }
+               if e.Name() == tagTypeFilename {

Review Comment:
   Minor: this `if e.Name() == tagTypeFilename` branch doesn't `continue` 
afterward, so control falls through to the `filepath.Ext` checks below. It's 
harmless (`.type` matches none of those extensions), but adding `continue` 
would match the structure of the sibling branches and avoid the redundant 
comparisons.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to