morningman commented on PR #59065:
URL: https://github.com/apache/doris/pull/59065#issuecomment-4017265228

   ## Comment 1 — `FileCacheAdmissionManager.java` L579 (readLock/writeLock not 
in finally)
   
   [Critical] The `readLock.unlock()` is not inside a `finally` block. If 
`ruleManager.isAdmittedAtTableLevel(...)` throws an unexpected exception, the 
read lock will never be released, causing all subsequent admission checks (and 
writes) to be blocked indefinitely.
   
   The same issue exists for `writeLock` in `loadRules()` (L599-L601) and 
`loadRules(String)` (L656-L658).
   
   Please use `try-finally` to ensure lock release:
   
   ```java
   public boolean isAdmittedAtTableLevel(...) {
       readLock.lock();
       try {
           return ruleManager.isAdmittedAtTableLevel(...);
       } finally {
           readLock.unlock();
       }
   }
   ```
   
   And similarly for the write lock:
   
   ```java
   writeLock.lock();
   try {
       ruleManager = newRuleManager;
   } finally {
       writeLock.unlock();
   }
   ```
   
   ---
   
   ## Comment 2 — `FileCacheAdmissionManager.java` L486 (getIndex potential OOB)
   
   [Critical] `getIndex(char firstChar)` returns `firstChar - 'A'`, and the 
backing list has `PARTITION_COUNT = 58` slots. However, the caller checks 
`Character.isAlphabetic(firstChar)` before calling `getIndex`, and 
`Character.isAlphabetic()` returns `true` for many Unicode characters beyond 
ASCII (e.g., Chinese characters, accented letters, etc.). For such characters, 
`firstChar - 'A'` will produce a value far exceeding 57, causing an 
`IndexOutOfBoundsException`.
   
   Suggestion: Replace the `List<Map<String, RuleCollection>>` with a simple 
`Map<String, RuleCollection>` (or `HashMap<String, RuleCollection>`) keyed by 
user identity. This eliminates the fragile character-based indexing entirely 
and is simpler to understand:
   
   ```java
   // Before
   private static final int PARTITION_COUNT = 58;
   private final List<Map<String, RuleCollection>> maps;
   
   // After
   private final Map<String, RuleCollection> userRuleMap = new HashMap<>();
   ```
   
   If the intent is to reduce lock contention via partitioning, consider using 
a `ConcurrentHashMap` instead.
   
   ---
   
   ## Comment 3 — `FileCacheAdmissionManager.java` L636 (jsonFiles null check)
   
   `File.listFiles()` can return `null` if an I/O error occurs or if the path 
is not a valid directory (even after the `isDirectory()` check, race conditions 
are possible). Accessing `jsonFiles.length` on a `null` reference will throw a 
`NullPointerException`.
   
   Please add a null check:
   
   ```java
   File[] jsonFiles = ruleDir.listFiles((dir, name) -> 
name.toLowerCase().endsWith(".json"));
   if (jsonFiles == null) {
       LOG.warn("Failed to list files in admission rule directory: {}",
               Config.file_cache_admission_control_json_dir);
       return;
   }
   ```
   
   ---
   
   ## Comment 4 — `FileQueryScanNode.java` L326 (Boolean vs boolean)
   
   `Boolean` (boxed type) is used here and in 
`SplitAssignment.fileCacheAdmission` / `SplitToScanRange.getScanRange(...)`, 
but there is no need for `null` semantics. Using the primitive `boolean` avoids 
unnecessary auto-boxing overhead and the risk of `NullPointerException`.
   
   ```java
   // Before
   Boolean admissionResult = true;
   
   // After
   boolean admissionResult = true;
   ```
   
   Same applies to:
   - `SplitAssignment.java`: `private final Boolean fileCacheAdmission` → 
`private final boolean fileCacheAdmission`
   - `SplitToScanRange.java`: parameter `Boolean fileCacheAdmission` → `boolean 
fileCacheAdmission`
   - `FileScanNode.addFileCacheAdmissionLog`: parameter `Boolean admitted` → 
`boolean admitted`
   - `RuleCollection`: `private Boolean excludeGlobal` / `private Boolean 
includeGlobal` → `private boolean`
   
   ---
   
   ## Comment 5 — `csv_reader.cpp` (and json/orc/parquet readers) Thrift 
`__isset` check
   
   The `file_cache_admission` field is defined as `optional` in Thrift 
(`PlanNodes.thrift`). When a Thrift optional field is not explicitly set by the 
sender (e.g., an older FE version), accessing it directly may yield an 
uninitialized/default value depending on the language binding behavior.
   
   It would be safer to check `__isset` before reading:
   
   ```cpp
   _file_description.file_cache_admission = _range.__isset.file_cache_admission
       ? _range.file_cache_admission : true;
   ```
   
   This applies to all four readers: `CsvReader`, `NewJsonReader`, `OrcReader`, 
and `ParquetReader`.
   
   ---
   
   ## Comment 6 — `FileCacheAdmissionRuleRefresherTest.java` (Thread.sleep 
flakiness)
   
   Using `Thread.sleep(1000)` to wait for file-system watch events is 
inherently flaky — on a heavily loaded CI machine, the `WatchService` event 
delivery could be delayed beyond 1 second, causing intermittent test failures.
   
   Consider using a polling-based assertion with a timeout, e.g., using the 
[Awaitility](https://github.com/awaitility/awaitility) library:
   
   ```java
   import static org.awaitility.Awaitility.await;
   import java.util.concurrent.TimeUnit;
   
   // Instead of Thread.sleep(1000):
   await().atMost(10, TimeUnit.SECONDS).untilAsserted(() -> {
       AtomicReference<String> reason = new AtomicReference<>();
       boolean result = manager.isAdmittedAtTableLevel("user_1", "catalog_1", 
"database_1", "table_1", reason);
       Assert.assertTrue(result);
       Assert.assertEquals("user table-level whitelist rule", reason.get());
   });
   ```
   
   Alternatively, if adding a dependency is undesirable, a simple retry loop 
with `Thread.sleep(100)` and a max retry count would also be more robust than a 
single 1-second sleep.
   
   ---
   
   ## Comment 7 — `FileCacheAdmissionManager.java` L539 (minor code style)
   
   Nit: In `logDefaultAdmission`, the local variable `admitted` is always 
`false`, making the ternary expression unnecessary:
   
   ```java
   // Before
   boolean admitted = false;
   String decision = admitted ? "admitted" : "denied";
   
   // After
   String decision = "denied";
   ```
   
   Also, the class name `ConcurrentRuleManager` is somewhat misleading — it is 
**not** thread-safe by itself. The actual thread safety is provided by the 
`ReentrantReadWriteLock` in the enclosing `FileCacheAdmissionManager`. Consider 
renaming it to something like `RuleManager` or `RuleIndex` to avoid confusion.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to