garyli1019 commented on a change in pull request #2296:
URL: https://github.com/apache/hudi/pull/2296#discussion_r539828241



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -178,11 +178,6 @@ private[hudi] object HoodieSparkSqlWriter {
             } else {
               hoodieAllIncomingRecords
             }
-
-          if (hoodieRecords.isEmpty()) {

Review comment:
       Hi @pengzhiwei2018, I suspect this `isEmpty()` call is what triggers the 
complex RDD transformations: the time-consuming part is the transformations 
upstream of this check, not the check itself. We can verify this by putting 
something like `show(1)` before the `isEmpty` check; the time should then be 
spent in `show(1)`, and the `isEmpty` check should finish quickly.
   IIRC, an empty RDD will trigger an error later if we don't have this check 
here. If that's not the case, could you write a unit test to verify?
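
   A rough way to run that experiment outside Hudi is sketched below. It is 
only an illustration under assumptions: the `IsEmptyTiming` object, the `timed` 
helper, the `local[2]` master and the synthetic `reduceByKey` workload are all 
hypothetical stand-ins for the real write path, and it uses `take(1)` as the 
first action because `show(1)` is a DataFrame method rather than an RDD one.

```scala
import org.apache.spark.sql.SparkSession

object IsEmptyTiming {
  // Hypothetical helper: runs `body`, prints the elapsed wall-clock time, returns the result.
  def timed[A](label: String)(body: => A): A = {
    val start = System.nanoTime()
    val result = body
    println(f"$label took ${(System.nanoTime() - start) / 1e6}%.1f ms")
    result
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")            // assumption: local run, not a real cluster
      .appName("isEmpty-timing")
      .getOrCreate()
    val sc = spark.sparkContext

    // Synthetic stand-in for the expensive upstream transformations.
    // Nothing is computed here: transformations are lazy until an action runs.
    val records = sc.parallelize(1 to 1000000, 8)
      .map(i => (i % 1000, i.toLong))
      .reduceByKey(_ + _)            // forces a shuffle
      .cache()                       // keep computed partitions for the second action

    // The first action pays for the upstream lineage.
    timed("take(1)")(records.take(1))
    // The second action can reuse the work already done by the first one.
    timed("isEmpty")(records.isEmpty())

    spark.stop()
  }
}
```

   If the timings follow the reasoning above, most of the time should be 
reported against `take(1)`. Swapping the order of the two actions should move 
the cost onto `isEmpty`, which would point at the transformations rather than 
the check.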




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

