Copilot commented on code in PR #3653:
URL: https://github.com/apache/celeborn/pull/3653#discussion_r3057887472


##########
worker/src/main/scala/org/apache/celeborn/service/deploy/worker/PushDataHandler.scala:
##########
@@ -1425,7 +1425,7 @@ class PushDataHandler(val workerSource: WorkerSource) 
extends BaseMessageHandler
     (false, fileWriter)
   }
 
-  private def checkDiskFull(fileWriter: PartitionDataWriter): Boolean = {
+  private[worker] def checkDiskFull(fileWriter: PartitionDataWriter): Boolean 
= {

Review Comment:
   Both `checkDiskFull` and `checkDiskFullAndSplit` are widened to 
`private[worker]`. If the goal is only to unit test the split decision, 
consider limiting the visibility change to just `checkDiskFullAndSplit` (or 
extracting the decision logic into a small package-private helper) to reduce 
the surface area exposed to other production code in the `worker` package.
   ```suggestion
     private def checkDiskFull(fileWriter: PartitionDataWriter): Boolean = {
   ```



##########
worker/src/main/scala/org/apache/celeborn/service/deploy/worker/PushDataHandler.scala:
##########
@@ -1455,9 +1455,11 @@ class PushDataHandler(val workerSource: WorkerSource) 
extends BaseMessageHandler
          |fileName:${fileWriter.getCurrentFileInfo.getFilePath}
          |""".stripMargin)
     val diskFileInfo = fileWriter.getDiskFileInfo
-    if (diskFileInfo != null) {
-      if (workerPartitionSplitEnabled && ((diskFull && 
diskFileInfo.getFileLength > partitionSplitMinimumSize) ||
-          (isPrimary && diskFileInfo.getFileLength > 
fileWriter.getSplitThreshold))) {
+    if (workerPartitionSplitEnabled && diskFileInfo != null) {
+      if (diskFull && (diskFileInfo.getFileLength > 
partitionSplitMinimumSize)) {
+        return StatusCode.HARD_SPLIT
+      }
+      if (isPrimary && diskFileInfo.getFileLength > 
fileWriter.getSplitThreshold) {
         if (fileWriter.getSplitMode == PartitionSplitMode.SOFT &&
           (fileWriter.getDiskFileInfo.getFileLength < 
partitionSplitMaximumSize)) {

Review Comment:
   Inside the `diskFileInfo != null` branch, the code re-fetches 
`fileWriter.getDiskFileInfo` at line 1464 even though `diskFileInfo` is already 
available and non-null. Use the local `diskFileInfo` consistently to avoid 
redundant calls and to ensure the decision is based on a single, consistent 
snapshot of file length.
   ```suggestion
             (diskFileInfo.getFileLength < partitionSplitMaximumSize)) {
   ```



##########
worker/src/main/scala/org/apache/celeborn/service/deploy/worker/PushDataHandler.scala:
##########
@@ -1425,7 +1425,7 @@ class PushDataHandler(val workerSource: WorkerSource) 
extends BaseMessageHandler
     (false, fileWriter)
   }
 
-  private def checkDiskFull(fileWriter: PartitionDataWriter): Boolean = {
+  private[worker] def checkDiskFull(fileWriter: PartitionDataWriter): Boolean 
= {
     val flusher = fileWriter.getFlusher;

Review Comment:
   Scala style convention generally avoids trailing semicolons. Consider 
removing the semicolon to match typical Scala code style (and reduce diff noise 
in future edits).
   ```suggestion
       val flusher = fileWriter.getFlusher
   ```



##########
worker/src/main/scala/org/apache/celeborn/service/deploy/worker/PushDataHandler.scala:
##########
@@ -1436,7 +1436,7 @@ class PushDataHandler(val workerSource: WorkerSource) 
extends BaseMessageHandler
     }
   }
 
-  private def checkDiskFullAndSplit(
+  private[worker] def checkDiskFullAndSplit(

Review Comment:
   Both `checkDiskFull` and `checkDiskFullAndSplit` are widened to 
`private[worker]`. If the goal is only to unit test the split decision, 
consider limiting the visibility change to just `checkDiskFullAndSplit` (or 
extracting the decision logic into a small package-private helper) to reduce 
the surface area exposed to other production code in the `worker` package.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to