garyli1019 commented on a change in pull request #2296: URL: https://github.com/apache/hudi/pull/2296#discussion_r539828241
##########
File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -178,11 +178,6 @@ private[hudi] object HoodieSparkSqlWriter {
       } else {
         hoodieAllIncomingRecords
       }
-
-      if (hoodieRecords.isEmpty()) {

Review comment:
Hi @pengzhiwei2018, I guess this `isEmpty()` triggered the complex RDD transformations; the actual time-consuming part is the transformations before this check, not the check itself. We can double-check by putting something like `show(1)` before the `isEmpty` check: the time-consuming part should then become the `show(1)`, and the `isEmpty` check should finish fast. IIRC, an empty RDD will trigger an error later if we don't have this check here. If that's not the case, would you write a unit test to verify?
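
The laziness argument above can be illustrated with a minimal sketch, separate from the Hudi code path. It assumes a local Spark session; the object name, the accumulator name, and the toy `map` are hypothetical stand-ins for the real transformations in `HoodieSparkSqlWriter`. The accumulator stays at zero until `isEmpty()` runs, which suggests the upstream work is only executed when the check is evaluated.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; not part of Hudi.
object IsEmptyTriggersLineage {

  /** Returns the number of map calls observed before and after isEmpty(). */
  def countsAroundIsEmpty(): (Long, Long) = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("isEmpty-laziness-sketch")
      .getOrCreate()
    try {
      val sc = spark.sparkContext
      val mapCalls = sc.longAccumulator("mapCalls")

      // Stand-in for the expensive transformations; nothing runs yet.
      val records = sc.parallelize(1 to 1000, numSlices = 4).map { i =>
        mapCalls.add(1)
        i * 2
      }

      val before = mapCalls.value.longValue() // no action has run so far
      records.isEmpty()                       // action: evaluates upstream lineage
      val after = mapCalls.value.longValue()  // the map function has now been invoked
      (before, after)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (before, after) = countsAroundIsEmpty()
    println(s"map calls before isEmpty: $before, after isEmpty: $after")
  }
}
```

Note that without caching or persisting the RDD, a later action generally recomputes the lineage, so where the time is reported depends on which action runs first and on whether intermediate results are reused.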