hudi-agent commented on code in PR #18790:
URL: https://github.com/apache/hudi/pull/18790#discussion_r3296286039


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/FlinkBucketAssignMetrics.java:
##########
@@ -38,6 +40,13 @@ public class FlinkBucketAssignMetrics extends HoodieFlinkMetrics {
    */
   private final Histogram recordBufferingTime;
 
+  /**
+   * Number of RLI file group shards assigned to this bucket assign task.
+   * Set once during open() when global RLI is active; remains -1 otherwise.
+   * Compare across task subtasks to detect skew in shard distribution.
+   */

Review Comment:
   🤖 nit: since this value is written exactly once (in `open()`) and only read 
by the gauge, an `AtomicInteger` overstates the concurrency requirement. A 
`volatile int` with a plain setter would convey intent more clearly.
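
A minimal sketch of the suggested shape, assuming a single write from `open()` and reads from a metrics reporter thread. `ShardCountHolder`, `setAssignedShards`, and `asGauge` are hypothetical names for illustration and are not taken from the PR; `IntSupplier` stands in for the actual gauge type.

```java
import java.util.function.IntSupplier;

/** Hypothetical stand-in for the metrics class; not the PR's actual code. */
final class ShardCountHolder {
  // -1 means "global RLI not active", mirroring the Javadoc in the diff.
  // volatile is enough here: one writer, no read-modify-write sequence.
  private volatile int assignedShards = -1;

  /** Intended to be called once, from the task's open(). */
  void setAssignedShards(int count) {
    this.assignedShards = count;
  }

  /** What a gauge would poll, possibly from another thread. */
  IntSupplier asGauge() {
    return () -> assignedShards;
  }

  public static void main(String[] args) {
    ShardCountHolder holder = new ShardCountHolder();
    IntSupplier gauge = holder.asGauge();
    System.out.println(gauge.getAsInt()); // prints -1
    holder.setAssignedShards(4);
    System.out.println(gauge.getAsInt()); // prints 4
  }
}
```

An `AtomicInteger` would only earn its keep if the value were incremented or compare-and-set concurrently, which the Javadoc above does not describe.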
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/GlobalRecordIndexPartitioner.java:
##########
@@ -39,6 +41,13 @@
  * group and reduces small files in the metadata table.
  */
 public class GlobalRecordIndexPartitioner implements Partitioner<HoodieKey> {
+  /**
+   * JVM-level cache to avoid repeatedly fetching the file group count from the metadata table.
+   * Keyed by table path; shared across all instances (partitioner + BucketAssignFunction) in the
+   * same task manager JVM so the expensive metadata lookup only happens once per JVM.
+   */
+  private static final ConcurrentHashMap<String, Integer> NUM_FILE_GROUPS_CACHE = new ConcurrentHashMap<>();
+
   private final Configuration conf;

Review Comment:
   🤖 nit: the cache is keyed only by table path, but the lambda captures the 
first `conf` it sees. Worth calling that out in the Javadoc so a future caller 
doesn't expect per-conf semantics when reusing the same path.
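
A small sketch of the behavior in question, assuming the cache is populated through `ConcurrentHashMap.computeIfAbsent`. `PathKeyedCacheSketch`, `numFileGroups`, and the `file.groups` key are hypothetical, and a plain `Map` stands in for the Flink `Configuration`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration of a path-keyed, JVM-wide cache. */
final class PathKeyedCacheSketch {
  private static final ConcurrentHashMap<String, Integer> CACHE = new ConcurrentHashMap<>();

  /**
   * Returns the cached value for tablePath. The lambda reads conf only when
   * the path is absent, so a later call with a different conf for the same
   * path still gets the value computed from the first conf.
   */
  static int numFileGroups(String tablePath, Map<String, String> conf) {
    return CACHE.computeIfAbsent(tablePath, p -> Integer.parseInt(conf.get("file.groups")));
  }

  public static void main(String[] args) {
    Map<String, String> first = Map.of("file.groups", "8");
    Map<String, String> second = Map.of("file.groups", "16");
    System.out.println(numFileGroups("/tmp/table_a", first));  // prints 8
    System.out.println(numFileGroups("/tmp/table_a", second)); // prints 8, second conf ignored
    System.out.println(numFileGroups("/tmp/table_b", second)); // prints 16
  }
}
```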
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.