Copilot commented on code in PR #3653:
URL: https://github.com/apache/celeborn/pull/3653#discussion_r3057887472
##########
worker/src/main/scala/org/apache/celeborn/service/deploy/worker/PushDataHandler.scala:
##########
@@ -1425,7 +1425,7 @@ class PushDataHandler(val workerSource: WorkerSource)
extends BaseMessageHandler
(false, fileWriter)
}
- private def checkDiskFull(fileWriter: PartitionDataWriter): Boolean = {
+ private[worker] def checkDiskFull(fileWriter: PartitionDataWriter): Boolean
= {
Review Comment:
Both `checkDiskFull` and `checkDiskFullAndSplit` are widened to
`private[worker]`. If the goal is only to unit test the split decision,
consider limiting the visibility change to just `checkDiskFullAndSplit` (or
extracting the decision logic into a small package-private helper) to reduce
the surface area exposed to other production code in the `worker` package.
```suggestion
private def checkDiskFull(fileWriter: PartitionDataWriter): Boolean = {
```
##########
worker/src/main/scala/org/apache/celeborn/service/deploy/worker/PushDataHandler.scala:
##########
@@ -1455,9 +1455,11 @@ class PushDataHandler(val workerSource: WorkerSource)
extends BaseMessageHandler
|fileName:${fileWriter.getCurrentFileInfo.getFilePath}
|""".stripMargin)
val diskFileInfo = fileWriter.getDiskFileInfo
- if (diskFileInfo != null) {
- if (workerPartitionSplitEnabled && ((diskFull &&
diskFileInfo.getFileLength > partitionSplitMinimumSize) ||
- (isPrimary && diskFileInfo.getFileLength >
fileWriter.getSplitThreshold))) {
+ if (workerPartitionSplitEnabled && diskFileInfo != null) {
+ if (diskFull && (diskFileInfo.getFileLength >
partitionSplitMinimumSize)) {
+ return StatusCode.HARD_SPLIT
+ }
+ if (isPrimary && diskFileInfo.getFileLength >
fileWriter.getSplitThreshold) {
if (fileWriter.getSplitMode == PartitionSplitMode.SOFT &&
(fileWriter.getDiskFileInfo.getFileLength <
partitionSplitMaximumSize)) {
Review Comment:
Inside the `diskFileInfo != null` branch, the code re-fetches
`fileWriter.getDiskFileInfo` at line 1464 even though `diskFileInfo` is already
available and non-null. Use the local `diskFileInfo` consistently to avoid
redundant calls and to ensure the decision is based on a single, consistent
snapshot of file length.
```suggestion
(diskFileInfo.getFileLength < partitionSplitMaximumSize)) {
```
##########
worker/src/main/scala/org/apache/celeborn/service/deploy/worker/PushDataHandler.scala:
##########
@@ -1425,7 +1425,7 @@ class PushDataHandler(val workerSource: WorkerSource)
extends BaseMessageHandler
(false, fileWriter)
}
- private def checkDiskFull(fileWriter: PartitionDataWriter): Boolean = {
+ private[worker] def checkDiskFull(fileWriter: PartitionDataWriter): Boolean
= {
val flusher = fileWriter.getFlusher;
Review Comment:
Scala style convention generally avoids trailing semicolons. Consider
removing the semicolon to match typical Scala code style (and reduce diff noise
in future edits).
```suggestion
val flusher = fileWriter.getFlusher
```
##########
worker/src/main/scala/org/apache/celeborn/service/deploy/worker/PushDataHandler.scala:
##########
@@ -1436,7 +1436,7 @@ class PushDataHandler(val workerSource: WorkerSource)
extends BaseMessageHandler
}
}
- private def checkDiskFullAndSplit(
+ private[worker] def checkDiskFullAndSplit(
Review Comment:
Both `checkDiskFull` and `checkDiskFullAndSplit` are widened to
`private[worker]`. If the goal is only to unit test the split decision,
consider limiting the visibility change to just `checkDiskFullAndSplit` (or
extracting the decision logic into a small package-private helper) to reduce
the surface area exposed to other production code in the `worker` package.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]