github-actions[bot] commented on code in PR #64684:
URL: https://github.com/apache/doris/pull/64684#discussion_r3450914756
##########
fe/fe-filesystem/fe-filesystem-spi/src/main/java/org/apache/doris/filesystem/spi/S3CompatibleFileSystem.java:
##########
@@ -847,6 +849,247 @@ protected static String longestNonGlobPrefix(String globPattern) {
return globPattern.substring(0, earliest);
}
+ /**
+ * Returns object-store list prefixes that are safe to push down for a glob pattern.
+ *
+ * <p>Unlike {@link #longestNonGlobPrefix(String)}, this expands bounded glob constructs
+ * ({@code {...}} alternation and positive {@code [...]} character classes) before the first
+ * unbounded wildcard. That lets patterns such as
+ * {@code date=2025-{0[3-9],1[0-2]}-01/mp_id=8/*} list the concrete date/mp prefixes instead
+ * of scanning everything under {@code date=2025-}. If expansion would be too large or a glob
+ * construct is not safely enumerable, it falls back to the conservative longest static prefix.
+ */
+ protected static List<String> expandedGlobListPrefixes(String globPattern) {
+ List<String> prefixes = expandGlobListPrefixes(globPattern, true);
+ return prefixes == null ? List.of(longestNonGlobPrefix(globPattern)) : prefixes;
+ }
+
+ private static List<String> expandGlobListPrefixes(String globPattern, boolean allowPartialPrefix) {
+ List<String> prefixes = new ArrayList<>();
+ prefixes.add("");
+ int i = 0;
+ while (i < globPattern.length()) {
+ char c = globPattern.charAt(i);
+ if (c == '*' || c == '?') {
Review Comment:
This recursive expansion is only safe when every brace arm is fully
enumerable. Right now `expandGlobListPrefixes(alternative, false)` still
returns a partial prefix when it encounters `*` or `?` (lines 873-875), and
the caller then appends the suffix that follows the brace to that partial
prefix. For example, `data/{foo*,bar*}/part.parquet` produces list prefixes
such as `data/foo/part.parquet` and `data/bar/part.parquet`, so an object such
as `data/foobar/part.parquet` matches the glob regex but is never listed. If
an alternative hits an unbounded wildcard while `allowPartialPrefix` is false,
the brace expansion needs to fail and fall back to the conservative outer
prefix instead of appending the outer suffix to that partial arm.
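
A minimal sketch of the fail-fast behavior described above, not the PR's
actual code. It assumes a reduced glob grammar: un-nested `{a,b}` alternation,
no escapes, no expansion limit, and `[...]` treated as not enumerable. The
names `GlobPrefixSketch`, `expand`, `listPrefixes`, and `staticPrefix` are
hypothetical; `staticPrefix` is only a simplified stand-in for
`longestNonGlobPrefix`.

```java
import java.util.ArrayList;
import java.util.List;

class GlobPrefixSketch {
    /** Hypothetical stand-in for longestNonGlobPrefix: text before the first glob character. */
    static String staticPrefix(String pattern) {
        int i = 0;
        while (i < pattern.length() && "*?[{".indexOf(pattern.charAt(i)) < 0) {
            i++;
        }
        return pattern.substring(0, i);
    }

    /** Returns list prefixes, or null when the pattern cannot be expanded safely. */
    static List<String> expand(String pattern, boolean allowPartialPrefix) {
        List<String> prefixes = new ArrayList<>();
        prefixes.add("");
        int i = 0;
        while (i < pattern.length()) {
            char c = pattern.charAt(i);
            if (c == '*' || c == '?' || c == '[') {
                // Only the top-level call may stop early with a partial prefix.
                // Inside a brace arm this must fail, because the caller would
                // otherwise append the outer suffix to an incomplete arm.
                return allowPartialPrefix ? prefixes : null;
            }
            if (c == '{') {
                int close = pattern.indexOf('}', i); // assumption: no nested braces
                if (close < 0) {
                    return null;
                }
                List<String> next = new ArrayList<>();
                for (String arm : pattern.substring(i + 1, close).split(",", -1)) {
                    List<String> armPrefixes = expand(arm, false);
                    if (armPrefixes == null) {
                        return null; // one non-enumerable arm fails the whole expansion
                    }
                    for (String p : prefixes) {
                        for (String a : armPrefixes) {
                            next.add(p + a);
                        }
                    }
                }
                prefixes = next;
                i = close + 1;
                continue;
            }
            prefixes.replaceAll(p -> p + c); // literal character extends every prefix
            i++;
        }
        return prefixes;
    }

    /** Falls back to the conservative static prefix when expansion fails. */
    static List<String> listPrefixes(String pattern) {
        List<String> expanded = expand(pattern, true);
        return expanded == null ? List.of(staticPrefix(pattern)) : expanded;
    }
}
```

Under these assumptions, `listPrefixes("data/{foo*,bar*}/part.parquet")` falls
back to the single prefix `data/`, so `data/foobar/part.parquet` is still
listed, while `data/{foo,bar}/part.parquet` keeps its two fully expanded
prefixes.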
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]