morningman commented on PR #59065:
URL: https://github.com/apache/doris/pull/59065#issuecomment-4017265228
## Comment 1 — `FileCacheAdmissionManager.java` L579 (readLock/writeLock not
in finally)
[Critical] The `readLock.unlock()` is not inside a `finally` block. If
`ruleManager.isAdmittedAtTableLevel(...)` throws an unexpected exception, the
read lock will never be released, causing all subsequent admission checks (and
writes) to be blocked indefinitely.
The same issue exists for `writeLock` in `loadRules()` (L599-L601) and
`loadRules(String)` (L656-L658).
Please use `try-finally` to ensure lock release:
```java
public boolean isAdmittedAtTableLevel(...) {
readLock.lock();
try {
return ruleManager.isAdmittedAtTableLevel(...);
} finally {
readLock.unlock();
}
}
```
And similarly for the write lock:
```java
writeLock.lock();
try {
ruleManager = newRuleManager;
} finally {
writeLock.unlock();
}
```
---
## Comment 2 — `FileCacheAdmissionManager.java` L486 (getIndex potential OOB)
[Critical] `getIndex(char firstChar)` returns `firstChar - 'A'`, and the
backing list has `PARTITION_COUNT = 58` slots. However, the caller checks
`Character.isAlphabetic(firstChar)` before calling `getIndex`, and
`Character.isAlphabetic()` returns `true` for many Unicode characters beyond
ASCII (e.g., Chinese characters, accented letters, etc.). For such characters,
`firstChar - 'A'` will produce a value far exceeding 57, causing an
`IndexOutOfBoundsException`.
Suggestion: Replace the `List<Map<String, RuleCollection>>` with a simple
`Map<String, RuleCollection>` (or `HashMap<String, RuleCollection>`) keyed by
user identity. This eliminates the fragile character-based indexing entirely
and is simpler to understand:
```java
// Before
private static final int PARTITION_COUNT = 58;
private final List<Map<String, RuleCollection>> maps;
// After
private final Map<String, RuleCollection> userRuleMap = new HashMap<>();
```
If the intent is to reduce lock contention via partitioning, consider using
a `ConcurrentHashMap` instead.
---
## Comment 3 — `FileCacheAdmissionManager.java` L636 (jsonFiles null check)
`File.listFiles()` can return `null` if an I/O error occurs or if the path
is not a valid directory (even after the `isDirectory()` check, race conditions
are possible). Accessing `jsonFiles.length` on a `null` reference will throw a
`NullPointerException`.
Please add a null check:
```java
File[] jsonFiles = ruleDir.listFiles((dir, name) ->
name.toLowerCase().endsWith(".json"));
if (jsonFiles == null) {
LOG.warn("Failed to list files in admission rule directory: {}",
Config.file_cache_admission_control_json_dir);
return;
}
```
---
## Comment 4 — `FileQueryScanNode.java` L326 (Boolean vs boolean)
`Boolean` (boxed type) is used here and in
`SplitAssignment.fileCacheAdmission` / `SplitToScanRange.getScanRange(...)`,
but there is no need for `null` semantics. Using the primitive `boolean` avoids
unnecessary auto-boxing overhead and the risk of `NullPointerException`.
```java
// Before
Boolean admissionResult = true;
// After
boolean admissionResult = true;
```
Same applies to:
- `SplitAssignment.java`: `private final Boolean fileCacheAdmission` →
`private final boolean fileCacheAdmission`
- `SplitToScanRange.java`: parameter `Boolean fileCacheAdmission` → `boolean
fileCacheAdmission`
- `FileScanNode.addFileCacheAdmissionLog`: parameter `Boolean admitted` →
`boolean admitted`
- `RuleCollection`: `private Boolean excludeGlobal` / `private Boolean
includeGlobal` → `private boolean`
---
## Comment 5 — `csv_reader.cpp` (and json/orc/parquet readers) Thrift
`__isset` check
The `file_cache_admission` field is defined as `optional` in Thrift
(`PlanNodes.thrift`). When a Thrift optional field is not explicitly set by the
sender (e.g., an older FE version), accessing it directly may yield an
uninitialized/default value depending on the language binding behavior.
It would be safer to check `__isset` before reading:
```cpp
_file_description.file_cache_admission = _range.__isset.file_cache_admission
? _range.file_cache_admission : true;
```
This applies to all four readers: `CsvReader`, `NewJsonReader`, `OrcReader`,
and `ParquetReader`.
---
## Comment 6 — `FileCacheAdmissionRuleRefresherTest.java` (Thread.sleep
flakiness)
Using `Thread.sleep(1000)` to wait for file-system watch events is
inherently flaky — on a heavily loaded CI machine, the `WatchService` event
delivery could be delayed beyond 1 second, causing intermittent test failures.
Consider using a polling-based assertion with a timeout, e.g., using the
[Awaitility](https://github.com/awaitility/awaitility) library:
```java
import static org.awaitility.Awaitility.await;
import java.util.concurrent.TimeUnit;
// Instead of Thread.sleep(1000):
await().atMost(10, TimeUnit.SECONDS).untilAsserted(() -> {
AtomicReference<String> reason = new AtomicReference<>();
boolean result = manager.isAdmittedAtTableLevel("user_1", "catalog_1",
"database_1", "table_1", reason);
Assert.assertTrue(result);
Assert.assertEquals("user table-level whitelist rule", reason.get());
});
```
Alternatively, if adding a dependency is undesirable, a simple retry loop
with `Thread.sleep(100)` and a max retry count would also be more robust than a
single 1-second sleep.
---
## Comment 7 — `FileCacheAdmissionManager.java` L539 (minor code style)
Nit: In `logDefaultAdmission`, the local variable `admitted` is always
`false`, making the ternary expression unnecessary:
```java
// Before
boolean admitted = false;
String decision = admitted ? "admitted" : "denied";
// After
String decision = "denied";
```
Also, the class name `ConcurrentRuleManager` is somewhat misleading — it is
**not** thread-safe by itself. The actual thread safety is provided by the
`ReentrantReadWriteLock` in the enclosing `FileCacheAdmissionManager`. Consider
renaming it to something like `RuleManager` or `RuleIndex` to avoid confusion.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]