Copilot commented on code in PR #60512:
URL: https://github.com/apache/doris/pull/60512#discussion_r2765171688
##########
fe/fe-core/src/main/java/org/apache/doris/fs/obj/AzureObjStorage.java:
##########
@@ -436,6 +451,85 @@ public Status globList(String remotePath, List<RemoteFile>
result, boolean fileN
return st;
}
+ /**
+ * Get file metadata using getProperties requests for deterministic paths.
+ * This avoids requiring list permission when only read permission is
granted.
+ *
+ * @param bucket Azure container name
+ * @param keyPattern The key pattern (may contain {..} brace patterns
but no wildcards)
+ * @param result List to store matching RemoteFile objects
+ * @param fileNameOnly If true, only store file names; otherwise store
full paths
+ * @param startTime Start time for logging duration
+ * @return Status if successful, null if should fall back to listing
+ */
+ private Status globListByGetProperties(String bucket, String keyPattern,
+ List<RemoteFile> result, boolean fileNameOnly, long startTime) {
+ try {
+ // First expand any {..} patterns, then use getProperties requests
+ String expandedPattern = S3Util.extendGlobs(keyPattern);
+ List<String> expandedPaths =
S3Util.expandBracePatterns(expandedPattern);
+
+ // Fall back to listing if too many paths to avoid overwhelming
Azure with requests
+ // Controlled by config: s3_head_request_max_paths
+ if (expandedPaths.size() > Config.s3_head_request_max_paths) {
+ LOG.info("Expanded path count {} exceeds limit {}, falling
back to LIST",
+ expandedPaths.size(),
Config.s3_head_request_max_paths);
+ return null;
+ }
+
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("Using getProperties requests for deterministic path
pattern, expanded to {} paths",
+ expandedPaths.size());
+ }
+
+ BlobContainerClient containerClient =
getClient().getBlobContainerClient(bucket);
+ long matchCnt = 0;
+ for (String key : expandedPaths) {
+ String fullPath = constructS3Path(key, bucket);
+ try {
+ BlobClient blobClient = containerClient.getBlobClient(key);
+ BlobProperties props = blobClient.getProperties();
+
+ matchCnt++;
+ RemoteFile remoteFile = new RemoteFile(
+ fileNameOnly ?
Paths.get(key).getFileName().toString() : fullPath,
+ true, // isFile
+ props.getBlobSize(),
+ props.getBlobSize(),
+ props.getLastModified() != null
+ ? props.getLastModified().toEpochSecond()
: 0
Review Comment:
Inconsistent timestamp conversion method. The existing code in this file
uses `getSecond()` (lines 426, 561) for Azure blob timestamps, but this new
code uses `toEpochSecond()`. This inconsistency could lead to different
timestamp values being returned for the same type of data depending on which
code path is used. Use `getSecond()` instead to match the existing convention.
```suggestion
? props.getLastModified().getSecond() : 0
```
##########
fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java:
##########
@@ -4652,6 +4662,35 @@ public boolean isEnablePushDownStringMinMax() {
return enablePushDownStringMinMax;
}
+ public String getEnableMorValuePredicatePushdownTables() {
+ return enableMorValuePredicatePushdownTables;
+ }
+
+ /**
+ * Check if a table is enabled for MOR value predicate pushdown.
+ * @param dbName database name
+ * @param tableName table name
+ * @return true if the table is in the enabled list or if '*' is set
+ */
+ public boolean isMorValuePredicatePushdownEnabled(String dbName, String
tableName) {
+ if (enableMorValuePredicatePushdownTables == null
+ || enableMorValuePredicatePushdownTables.isEmpty()) {
+ return false;
+ }
+ String trimmed = enableMorValuePredicatePushdownTables.trim();
+ if ("*".equals(trimmed)) {
+ return true;
+ }
+ String fullName = dbName + "." + tableName;
+ for (String table : trimmed.split(",")) {
+ if (table.trim().equalsIgnoreCase(fullName)
+ || table.trim().equalsIgnoreCase(tableName)) {
Review Comment:
Performance concern with repeated string parsing. The
isMorValuePredicatePushdownEnabled method splits and trims the comma-separated
table list on every invocation. For queries that scan multiple MOR tables, this
parsing happens repeatedly. Consider caching the parsed table set when the
session variable is set, or at minimum move the split operation outside the
loop to avoid repeated splitting.
```suggestion
String trimmedTable = table.trim();
if (trimmedTable.equalsIgnoreCase(fullName)
|| trimmedTable.equalsIgnoreCase(tableName)) {
```
##########
fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java:
##########
@@ -4652,6 +4662,35 @@ public boolean isEnablePushDownStringMinMax() {
return enablePushDownStringMinMax;
}
+ public String getEnableMorValuePredicatePushdownTables() {
+ return enableMorValuePredicatePushdownTables;
+ }
+
+ /**
+ * Check if a table is enabled for MOR value predicate pushdown.
+ * @param dbName database name
+ * @param tableName table name
+ * @return true if the table is in the enabled list or if '*' is set
+ */
+ public boolean isMorValuePredicatePushdownEnabled(String dbName, String
tableName) {
+ if (enableMorValuePredicatePushdownTables == null
+ || enableMorValuePredicatePushdownTables.isEmpty()) {
+ return false;
+ }
+ String trimmed = enableMorValuePredicatePushdownTables.trim();
+ if ("*".equals(trimmed)) {
+ return true;
+ }
Review Comment:
Missing null check for dbName parameter. If olapTable.getQualifiedDbName()
returns null, the string concatenation at line 4684 will produce
"null.tableName" instead of failing gracefully or handling this case
appropriately. Add a null check for dbName before using it in the string
concatenation.
```suggestion
}
if (dbName == null || tableName == null) {
return false;
}
```
##########
fe/fe-core/src/main/java/org/apache/doris/fs/obj/S3ObjStorage.java:
##########
@@ -578,6 +579,23 @@ private GlobListResult globListInternal(String remotePath,
List<RemoteFile> resu
}
bucket = uri.getBucket();
+
+ // Optimization: For deterministic paths (no wildcards like *, ?,
[...]),
+ // use HEAD requests instead of listing to avoid requiring
ListBucket permission.
+ // This is useful when only GetObject permission is granted.
+ // Controlled by config: s3_skip_list_for_deterministic_path
+ String keyPattern = uri.getKey();
+ if (Config.s3_skip_list_for_deterministic_path
+ && S3Util.isDeterministicPattern(keyPattern)
+ && !hasLimits && startFile == null) {
+ GlobListResult headResult = globListByHeadRequests(
+ bucket, keyPattern, result, fileNameOnly, startTime);
+ if (headResult != null) {
+ return headResult;
+ }
+ // If headResult is null, fall through to use listing
+ }
Review Comment:
The PR description does not mention the S3/Azure optimization features that
add HEAD request support for deterministic paths. This PR appears to include
two unrelated features: (1) MOR value predicate pushdown control, and (2)
S3/Azure deterministic path optimization. These should ideally be in separate
PRs for better code review and maintainability. At minimum, the PR description
should be updated to reflect all changes being made.
##########
fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java:
##########
@@ -4652,6 +4662,35 @@ public boolean isEnablePushDownStringMinMax() {
return enablePushDownStringMinMax;
}
+ public String getEnableMorValuePredicatePushdownTables() {
+ return enableMorValuePredicatePushdownTables;
+ }
+
+ /**
+ * Check if a table is enabled for MOR value predicate pushdown.
+ * @param dbName database name
+ * @param tableName table name
+ * @return true if the table is in the enabled list or if '*' is set
+ */
+ public boolean isMorValuePredicatePushdownEnabled(String dbName, String
tableName) {
+ if (enableMorValuePredicatePushdownTables == null
+ || enableMorValuePredicatePushdownTables.isEmpty()) {
+ return false;
+ }
+ String trimmed = enableMorValuePredicatePushdownTables.trim();
+ if ("*".equals(trimmed)) {
+ return true;
+ }
+ String fullName = dbName + "." + tableName;
+ for (String table : trimmed.split(",")) {
+ if (table.trim().equalsIgnoreCase(fullName)
+ || table.trim().equalsIgnoreCase(tableName)) {
+ return true;
+ }
+ }
+ return false;
+ }
Review Comment:
Missing test coverage for the MOR value predicate pushdown feature. The new
session variable isMorValuePredicatePushdownEnabled() method includes parsing
logic for comma-separated table lists, wildcard handling, and case-insensitive
matching. This logic should have unit tests to verify correct behavior for
various input formats (empty string, single table, multiple tables, wildcard,
table name only vs qualified name, etc.).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]