liaoxin01 opened a new pull request, #64383:
URL: https://github.com/apache/doris/pull/64383

   ### What problem does this PR solve?
   
   When `enable_packed_file` is enabled (cloud mode), the first segment's 
inverted index file (`{rowset_id}_{seg}.idx`) is packed into a shared file 
instead of being written as a standalone object on remote storage.
   
   At read time, `Segment::_open_index_file_reader()` derived the `.idx` path 
prefix from `_file_reader->path()`. The remote file reader normalizes this to 
an **absolute** path (e.g. 
`s3://bucket/instance_prefix/data/{tablet}/{rowset}_{seg}.idx`). But 
`PackedFileSystem`'s index map is keyed by **relative** paths 
(`data/{tablet}/{rowset}_{seg}.idx`, exactly as recorded by `CloudRowsetWriter` 
at write time). The absolute lookup key therefore never matched the relative 
map key, so `PackedFileSystem::open_file_impl()` fell through to reading the 
`.idx` as a **standalone object**, which does not exist (the data lives inside 
the packed file). The read failed with:
   
   ```
   [E-6002]CLuceneError occur when init idx file s3://.../{rowset}_{seg}.idx, 
error msg: read past EOF
   ```
   
   (`read past EOF` is how the S3 `NOT_FOUND`/404 is surfaced by 
`FSIndexInput::readInternal`.)
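
The key mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the actual `PackedFileSystem` code: `PackedSlice`, `PackedIndex`, `find_packed`, and the concrete tablet/rowset values are hypothetical names chosen for illustration. It only assumes what the description states, namely that the index map is keyed by relative paths and that a miss falls through to a standalone-object read.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for one entry of a packed-file index: where the bytes
// of a small file live inside the shared packed file.
struct PackedSlice {
    std::string packed_file;  // hypothetical packed file name
    long offset;
    long size;
};

// Hypothetical index map, keyed by the RELATIVE path recorded at write time.
using PackedIndex = std::unordered_map<std::string, PackedSlice>;

// Returns the slice when the path is found in the index. std::nullopt models
// the fall-through case where the caller opens the path as a standalone object.
std::optional<PackedSlice> find_packed(const PackedIndex& index,
                                       const std::string& path) {
    auto it = index.find(path);
    if (it == index.end()) {
        return std::nullopt;
    }
    return it->second;
}
```

With an entry stored under `data/10001/rs_0.idx`, a lookup with that relative path finds the slice, while a lookup with `s3://bucket/instance_prefix/data/10001/rs_0.idx` returns `std::nullopt` because the map compares the strings exactly.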
   
   The failure was masked by the local file cache, whose key is filename-based: 
a warm-up read (which uses the relative/packed path) populates the cache, and a 
subsequent query hits it. So the bug only surfaces on a **file cache miss** 
(cold/evicted cache). The `.dat` segment file is unaffected because it is 
opened directly with the relative segment path.
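
To illustrate why a warm cache hides the problem, here is a toy model of a filename-keyed cache. It is a sketch under stated assumptions: `ToyFileCache` and `cache_key` are hypothetical, and using the last path component as the key is only one possible reading of "filename-based"; the real file cache may derive its key differently.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache key: only the final path component, so the relative and
// the absolute form of the same file map to the same key.
std::string cache_key(const std::string& path) {
    const auto pos = path.rfind('/');
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Toy cache that stores whole files by filename-based key.
struct ToyFileCache {
    std::unordered_map<std::string, std::string> blocks;

    void put(const std::string& path, std::string bytes) {
        blocks[cache_key(path)] = std::move(bytes);
    }
    bool contains(const std::string& path) const {
        return blocks.count(cache_key(path)) > 0;
    }
    void clear() { blocks.clear(); }
};
```

In this model, a warm-up `put` with the relative path makes a later `contains` with the absolute path succeed, so the remote read (and the failing lookup) is never reached. After `clear()`, the same absolute-path read misses and has to go to remote storage.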
   
   Note: `branch-3.1` does not have this bug because its 
`Segment::_open_inverted_index()` derives the index path from the relative 
`_seg_path` member. The regression was introduced when that derivation was 
switched to `_file_reader->path()`.
   
   ### Release note
   
   Fix `CLuceneError ... read past EOF` when querying an inverted index whose 
`.idx` file is stored in a packed file and is not present in the local file 
cache.
   
   ### Solution
   
   Store the path passed to `Segment::open()` in a new `_seg_path` member and 
use it (instead of `_file_reader->path()`) to derive the inverted index file 
path prefix, so the lookup key matches the relative keys recorded by 
`PackedFileSystem`. This restores the behavior `branch-3.1` already had.
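
The shape of the fix can be sketched as follows. This is not the code from the PR: `ToySegment` and `index_path_prefix` are hypothetical, and deriving the prefix by stripping a trailing `.dat` and appending `.idx` is an assumption made for illustration. The point is only that the path passed to `open()` is kept and used for the index path, instead of the path reported by the file reader.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: drop a trailing ".dat" to obtain the index path prefix.
std::string index_path_prefix(const std::string& segment_path) {
    const std::string suffix = ".dat";
    if (segment_path.size() >= suffix.size() &&
        segment_path.compare(segment_path.size() - suffix.size(),
                             suffix.size(), suffix) == 0) {
        return segment_path.substr(0, segment_path.size() - suffix.size());
    }
    return segment_path;
}

// Toy segment that remembers both the path it was opened with and the
// (possibly normalized, absolute) path reported by its file reader.
class ToySegment {
public:
    ToySegment(std::string seg_path, std::string reader_path)
            : _seg_path(std::move(seg_path)),
              _reader_path(std::move(reader_path)) {}

    // After the fix: derive the index path from the path given to open().
    std::string index_file_path() const {
        return index_path_prefix(_seg_path) + ".idx";
    }

    // Before the fix: derive it from the reader's normalized path.
    std::string index_file_path_from_reader() const {
        return index_path_prefix(_reader_path) + ".idx";
    }

private:
    std::string _seg_path;     // relative path, as passed by the caller
    std::string _reader_path;  // absolute path, as normalized by the reader
};
```

For a segment opened as `data/10001/rs_0.dat`, `index_file_path()` yields the relative `data/10001/rs_0.idx`, which matches the form of the keys recorded at write time, whereas the reader-derived variant yields the absolute `s3://...` form.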
   
   A regression test 
(`cloud_p0/packed_file/test_packed_file_inverted_index_query`) loads small data 
so the `.idx` is packed, clears the file cache to force a miss, then runs 
inverted-index-backed queries and asserts they succeed with correct results.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

