hudi-agent commented on code in PR #18790:
URL: https://github.com/apache/hudi/pull/18790#discussion_r3296286039
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/FlinkBucketAssignMetrics.java:
##########
@@ -38,6 +40,13 @@ public class FlinkBucketAssignMetrics extends HoodieFlinkMetrics {
*/
private final Histogram recordBufferingTime;
+ /**
+ * Number of RLI file group shards assigned to this bucket assign task.
+ * Set once during open() when global RLI is active; remains -1 otherwise.
+ * Compare across task subtasks to detect skew in shard distribution.
+ */
Review Comment:
🤖 nit: since this value is written exactly once (in `open()`) and only read
by the gauge, an `AtomicInteger` overstates the concurrency requirement. A
`volatile int` with a plain setter would convey intent more clearly.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/GlobalRecordIndexPartitioner.java:
##########
@@ -39,6 +41,13 @@
* group and reduces small files in the metadata table.
*/
public class GlobalRecordIndexPartitioner implements Partitioner<HoodieKey> {
+ /**
+ * JVM-level cache to avoid repeatedly fetching the file group count from the metadata table.
+ * Keyed by table path; shared across all instances (partitioner + BucketAssignFunction) in the
+ * same task manager JVM so the expensive metadata lookup only happens once per JVM.
+ */
+ private static final ConcurrentHashMap<String, Integer> NUM_FILE_GROUPS_CACHE = new ConcurrentHashMap<>();
+
private final Configuration conf;
Review Comment:
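🤖 nit: the cache is keyed only by table path, but the lambda captures the
first `conf` it sees, so a later caller that passes a different `conf` for the
same path still gets the value computed from the first one. Worth calling that
out in the Javadoc so a future caller doesn't expect per-conf semantics when
reusing the same path.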
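
The behaviour in question can be reproduced with a small sketch. It assumes the cache is filled through `ConcurrentHashMap.computeIfAbsent`, and it models `conf` as a plain `Map` with a hypothetical key `num.file.groups`; the real code reads the count from the metadata table, which is not shown in this diff:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: class, method, and config key names are hypothetical.
final class PathKeyedCacheDemo {
  private static final ConcurrentHashMap<String, Integer> CACHE = new ConcurrentHashMap<>();

  // The loader lambda captures whichever conf is passed by the caller that
  // populates the entry; later calls for the same path never run the loader.
  static int getNumFileGroups(String tablePath, Map<String, Integer> conf) {
    return CACHE.computeIfAbsent(tablePath, path -> conf.getOrDefault("num.file.groups", 1));
  }
}
```

Calling `getNumFileGroups` twice with the same path but different `conf` values returns the first caller's result both times, which is the semantics the Javadoc would need to spell out.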
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>