hudi-agent commented on code in PR #17591:
URL: https://github.com/apache/hudi/pull/17591#discussion_r3461925297


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/common/HoodieSparkEngineContext.java:
##########
@@ -135,6 +135,16 @@ public <T> HoodieData<T> parallelize(List<T> data, int 
parallelism) {
     return HoodieJavaRDD.of(javaSparkContext.parallelize(data, parallelism));
   }
 
+  @Override
+  public <T> HoodieData<T> union(List<HoodieData<T>> dataList) {
+    List<JavaRDD<T>> javaRDDList = 
dataList.stream().map(HoodieJavaRDD::getJavaRDD).collect(Collectors.toList());
+    if (javaRDDList.isEmpty()) {
+      return HoodieJavaRDD.of(javaSparkContext.emptyRDD());
+    } else {
+      return HoodieJavaRDD.of(javaSparkContext.union(javaRDDList.toArray(new 
JavaRDD[0])));
+    }

Review Comment:
   🤖 nit: `new JavaRDD[0]` is a raw type and will produce an unchecked warning 
at compile time — have you considered passing `javaRDDList` directly if 
`JavaSparkContext.union(List<JavaRDD<T>>)` is available, or casting to 
`(JavaRDD<T>[]) new JavaRDD[0]` with a `@SuppressWarnings("unchecked")` to make 
the intent explicit?
   
   <sub><i>⚠️ AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieEngineContext.java:
##########
@@ -80,6 +80,8 @@ public <T> HoodieData<T> parallelize(List<T> data) {
     }
   }
 
+  public abstract <T> HoodieData<T> union(List<HoodieData<T>> dataList);

Review Comment:
   🤖 nit: since the list-based `union` body is identical in all four non-Spark 
contexts (Flink, Java, Local, ExecutorServiceBased), could you make this a 
concrete default method here instead of abstract? `HoodieEngineContext` already 
has a concrete `parallelize(List<T> data)` that delegates down — the same 
pattern would eliminate the four copy-pasted overrides.
   
   <sub><i>⚠️ AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieEngineContext.java:
##########
@@ -80,6 +80,8 @@ public <T> HoodieData<T> parallelize(List<T> data) {
     }
   }
 
+  public abstract <T> HoodieData<T> union(List<HoodieData<T>> dataList);

Review Comment:
   🤖 One thing worth double-checking on the underlying concern: I couldn't find 
an `ExecutorServiceBasedEngineContext` class anywhere in the repo. The only 
subclasses of `HoodieEngineContext` are `HoodieLocalEngineContext`, 
`HoodieSparkEngineContext`, `HoodieJavaEngineContext`, and 
`HoodieFlinkEngineContext` — all four of which the PR implements `union` for. 
So the new abstract method shouldn't cause a missing-override compile failure 
unless I'm missing a subclass somewhere.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to