sollhui opened a new pull request, #64684:
URL: https://github.com/apache/doris/pull/64684

   ### What problem does this PR solve?
   
   Before this change, S3-compatible glob listing derived the object-store 
`ListObjects` prefix by truncating the path at the first glob metacharacter. 
For a path like:
   
   
`s3://bucket/asin_trend/sale/month/date=2025-{0[3-9],1[0-2]}-01/mp_id=8/0/0/436/*`
   
   the old behavior listed the broad prefix:
   
   `asin_trend/sale/month/date=2025-`
   
   and then filtered all returned object keys in the FE. If many unrelated 
objects existed under `date=2025-*`, for example other dates, other `mp_id` 
values, or deeper paths, S3 TVF planning could spend a long time listing and 
filtering files before query execution started.
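
The old prefix derivation described above can be sketched roughly as follows. This is a minimal Python illustration, not the Doris FE code; the function name `static_prefix` and the exact set of metacharacters are assumptions made for the example.

```python
# Hypothetical sketch: the characters treated as glob metacharacters
# are an assumption, not taken from the Doris source.
GLOB_META = set("*?[{\\")


def static_prefix(key_pattern: str) -> str:
    """Return the literal part of the pattern before the first metacharacter."""
    for i, ch in enumerate(key_pattern):
        if ch in GLOB_META:
            return key_pattern[:i]
    # No metacharacter at all: the whole pattern is a literal key.
    return key_pattern


pattern = "asin_trend/sale/month/date=2025-{0[3-9],1[0-2]}-01/mp_id=8/0/0/436/*"
print(static_prefix(pattern))  # asin_trend/sale/month/date=2025-
```

Every key under that single broad prefix would have to be listed and then matched against the glob, which is where the planning time went.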
   
   After this change, Doris expands safely enumerable glob fragments before 
issuing object-store list requests. The same path is now listed through 
narrower prefixes such as:
   
   `asin_trend/sale/month/date=2025-03-01/mp_id=8/0/0/436/`
   ...
   `asin_trend/sale/month/date=2025-12-01/mp_id=8/0/0/436/`
   
   Doris still applies the full glob regex after listing, so query results 
are unchanged. The optimization only reduces the remote listing scope. Expansion 
is limited to bounded brace alternations and positive (non-negated) character 
classes, with a hard cap to avoid generating too many prefixes. Existing 
pagination behavior through `startAfter` and `maxFile` is preserved.
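
One way to picture the bounded expansion is the sketch below. It is a simplified Python illustration under stated assumptions, not the implementation in this PR: the names `expand_prefixes` and `limit`, the default cap of 32, and the choice to stop expanding (keeping the prefixes built so far) when the cap would be exceeded are all hypothetical. It also ignores corner cases such as commas inside character classes.

```python
from typing import List, Tuple


def _class_chars(body: str) -> List[str]:
    """Enumerate the characters of a non-negated class body such as '3-9' or 'abc'."""
    chars, j = [], 0
    while j < len(body):
        if j + 2 < len(body) and body[j + 1] == "-":
            chars.extend(chr(c) for c in range(ord(body[j]), ord(body[j + 2]) + 1))
            j += 3
        else:
            chars.append(body[j])
            j += 1
    return list(dict.fromkeys(chars))  # drop duplicates, keep order


def _matching_brace(pattern: str, start: int) -> int:
    """Index of the '}' closing the '{' at start, or -1 if unbalanced."""
    depth = 0
    for k in range(start, len(pattern)):
        if pattern[k] == "{":
            depth += 1
        elif pattern[k] == "}":
            depth -= 1
            if depth == 0:
                return k
    return -1


def _split_alternatives(body: str) -> List[str]:
    """Split a brace body on commas that are not inside nested braces."""
    parts, depth, start = [], 0, 0
    for k, ch in enumerate(body):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(body[start:k])
            start = k + 1
    parts.append(body[start:])
    return parts


def expand_prefixes(pattern: str, limit: int = 32) -> Tuple[List[str], bool]:
    """Return (prefixes, complete); complete is True if no wildcard remained."""
    acc, i = [""], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch in "*?\\":
            return acc, False  # unbounded or escaped: stop expanding here
        if ch == "[":
            end = pattern.find("]", i + 1)
            body = pattern[i + 1:end] if end != -1 else ""
            if not body or body[0] in "!^":
                return acc, False  # malformed or negated class
            options, nxt = _class_chars(body), end + 1
        elif ch == "{":
            end = _matching_brace(pattern, i)
            if end == -1:
                return acc, False
            options = []
            for alt in _split_alternatives(pattern[i + 1:end]):
                sub, complete = expand_prefixes(alt, limit)
                if not complete:
                    return acc, False  # alternative is not fully enumerable
                options.extend(sub)
            nxt = end + 1
        else:
            options, nxt = [ch], i + 1
        if not options or len(acc) * len(options) > limit:
            return acc, False  # assumed cap policy: keep the prefixes so far
        acc = [p + o for p in acc for o in options]
        i = nxt
    return acc, True


pattern = "asin_trend/sale/month/date=2025-{0[3-9],1[0-2]}-01/mp_id=8/0/0/436/*"
prefixes, _ = expand_prefixes(pattern)
print(len(prefixes))  # 10
print(prefixes[0])    # asin_trend/sale/month/date=2025-03-01/mp_id=8/0/0/436/
```

In this sketch the caller would issue one list request per returned prefix and still match every returned key against the full glob, so an over-broad prefix (for instance after hitting the cap) costs listing time but does not change which files are selected.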
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

