zclllyybb commented on issue #64767:
URL: https://github.com/apache/doris/issues/64767#issuecomment-4785729065

   Breakwater-GitHub-Analysis-Slot: slot_bc1fd4b9373a
   This content is generated by AI for reference only.
   
   Initial triage: this looks like a real consistency bug in current `master`.
   
   I refreshed `apache/doris` `upstream/master` to 
`e5f3badd0109e312167f242df5aa53adb86806d8` and checked the referenced code 
path. In 
`fe/fe-foundation/src/main/java/org/apache/doris/foundation/util/PathUtils.java`,
 `equalsIgnoreSchemeIfOneIsS3` currently has two different equality contracts:
   
   - Same-scheme URIs go through a full-string `equalsIgnoreCase` comparison, 
so object key case is ignored and trailing slashes remain significant.
   - Cross-scheme URIs where either side is `s3` compare the normalized 
authority and path with `Objects.equals`, so the comparison is case-sensitive 
and trailing slashes are stripped before comparing.
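
   As a rough illustration of the two contracts above: the following is a 
reconstruction from this description only, not the Doris source. The names 
`TwoContractsSketch`, `legacyEquals`, and `stripTrailingSlashes` are 
hypothetical, and the real method may differ in details such as null handling.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical reconstruction of the behavior described above; NOT the Doris source.
final class TwoContractsSketch {

    // Assumed normalization: drop every trailing '/' from the path.
    static String stripTrailingSlashes(String path) {
        if (path == null) {
            return "";
        }
        int end = path.length();
        while (end > 0 && path.charAt(end - 1) == '/') {
            end--;
        }
        return path.substring(0, end);
    }

    static boolean legacyEquals(String p1, String p2) {
        URI u1 = URI.create(p1);
        URI u2 = URI.create(p2);
        String s1 = u1.getScheme();
        String s2 = u2.getScheme();

        if (s1 != null && s1.equalsIgnoreCase(s2)) {
            // Contract A (same scheme): full-string, case-insensitive comparison.
            // Key case is ignored; a trailing slash makes the strings differ.
            return p1.equalsIgnoreCase(p2);
        }
        if ("s3".equalsIgnoreCase(s1) || "s3".equalsIgnoreCase(s2)) {
            // Contract B (cross scheme, one side is s3): case-sensitive comparison
            // of authority and slash-stripped path.
            return Objects.equals(u1.getAuthority(), u2.getAuthority())
                    && Objects.equals(stripTrailingSlashes(u1.getPath()),
                                      stripTrailingSlashes(u2.getPath()));
        }
        return false;
    }
}
```

   Under this sketch, `s3://bucket/A` equals `s3://bucket/a` while 
`s3://bucket/dir/` does not equal `s3://bucket/dir`; the cross-scheme pair 
`s3a://bucket/dir/` and `s3://bucket/dir` gives the opposite answers on both 
counts.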
   
   That matches the issue's reproduction. It also affects production 
behavior: the production caller I found is 
`HMSTransaction.prepareInsertExistingTable` in 
`fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HMSTransaction.java`,
 where the result directly drives `needRename = 
!PathUtils.equalsIgnoreSchemeIfOneIsS3(targetPath, writePath)`. Therefore:
   
   - Same-scheme trailing-slash-only differences can trigger an unnecessary 
rename even though the cross-scheme `s3` path treats the same location as equal.
   - Same-scheme case-only differences such as `s3://bucket/A` versus 
`s3://bucket/a` can be incorrectly treated as equal, which is unsafe for 
object-storage keys.
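
   To make the two consequences concrete, here is a minimal sketch of the 
rename decision with the equality rule passed in as a parameter. 
`RenameDecisionSketch` and `FULL_STRING_IGNORE_CASE` are hypothetical names, 
and the predicate only stands in for the same-scheme branch described above; 
it is not the actual `HMSTransaction` code.

```java
import java.util.function.BiPredicate;

// Hypothetical stand-in for the rename decision; NOT the HMSTransaction code.
final class RenameDecisionSketch {

    // Stand-in for the same-scheme branch: full-string, case-insensitive equality.
    static final BiPredicate<String, String> FULL_STRING_IGNORE_CASE =
            String::equalsIgnoreCase;

    // Mirrors the shape of: needRename = !equals(targetPath, writePath)
    static boolean needRename(String targetPath, String writePath,
                              BiPredicate<String, String> sameLocation) {
        return !sameLocation.test(targetPath, writePath);
    }

    public static void main(String[] args) {
        // Same location, trailing slash only: a rename is requested anyway.
        System.out.println(needRename("s3://bucket/t/", "s3://bucket/t",
                FULL_STRING_IGNORE_CASE));
        // Different object keys, case only: the rename is skipped.
        System.out.println(needRename("s3://bucket/A", "s3://bucket/a",
                FULL_STRING_IGNORE_CASE));
    }
}
```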
   
   I also found the linked public fix PR: 
https://github.com/apache/doris/pull/64768. It is currently open as a draft, 
based on the same `master` head, and its patch changes the utility to use one 
structural comparison rule for same-scheme and `s3` cross-scheme cases. The 
added `PathUtilsTest` cases cover the reported trailing-slash consistency 
issue, case-sensitive path/authority comparisons, malformed object-storage URI 
forms, encoded slashes, query/fragment distinctions, and `s3a`/`s3n` behavior.
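
   For orientation, one possible shape of a single structural rule is sketched 
below. This is not the patch in PR #64768: the class and method names are 
hypothetical, comparing raw (undecoded) authority and path is an assumption 
chosen so that an encoded slash stays distinct from a literal one, and query 
and fragment handling is left out entirely.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical sketch of one unified rule; NOT the patch from the linked PR.
final class UnifiedLocationEqualitySketch {

    // Assumed normalization: drop every trailing '/' from the raw path.
    static String stripTrailingSlashes(String path) {
        if (path == null) {
            return "";
        }
        int end = path.length();
        while (end > 0 && path.charAt(end - 1) == '/') {
            end--;
        }
        return path.substring(0, end);
    }

    static boolean sameLocation(String p1, String p2) {
        URI u1 = URI.create(p1);
        URI u2 = URI.create(p2);
        String s1 = u1.getScheme();
        String s2 = u2.getScheme();

        boolean sameScheme = s1 != null && s1.equalsIgnoreCase(s2);
        boolean oneIsS3 = "s3".equalsIgnoreCase(s1) || "s3".equalsIgnoreCase(s2);
        if (!sameScheme && !oneIsS3) {
            return false;
        }
        // One rule for both cases: case-sensitive authority and slash-stripped path.
        // Raw accessors keep "%2F" distinct from "/" (an assumption of this sketch).
        return Objects.equals(u1.getRawAuthority(), u2.getRawAuthority())
                && Objects.equals(stripTrailingSlashes(u1.getRawPath()),
                                  stripTrailingSlashes(u2.getRawPath()));
    }
}
```

   With a rule of this shape, `s3://bucket/dir/` and `s3://bucket/dir` compare 
equal and `s3://bucket/A` and `s3://bucket/a` compare unequal, regardless of 
whether the two schemes match.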
   
   Suggested next steps:
   
   1. Review PR #64768 as the likely direct fix for this issue.
   2. Keep the utility-level regression tests for both reported examples: 
same-scheme trailing slash and same-scheme case-only path differences.
   3. If maintainers want stronger caller coverage, add a narrow FE test around 
the Hive insert path or rename-decision input pair to ensure `targetPath` and 
`writePath` follow the unified location-equality rule.
   
   Missing information is not blocking the code-level conclusion here because 
the logic bug is visible from the current source and the issue includes direct 
method-level reproductions. A real-world incident assessment would still need 
the exact table location, write path, storage scheme pair, Doris build SHA, and 
FE commit/rename logs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
