yuanlihan commented on code in PR #14984:
URL: https://github.com/apache/druid/pull/14984#discussion_r1326796012
##########
extensions-core/hdfs-storage/src/main/java/org/apache/druid/storage/hdfs/HdfsDataSegmentPusher.java:
##########
@@ -124,8 +124,18 @@ public DataSegment pushToPath(File inDir, DataSegment
segment, String storageDir
fullyQualifiedStorageDirectory.get(),
storageDirSuffix
);
-
- final String storageDir = StringUtils.format("%s/%s",
fullyQualifiedStorageDirectory.get(), storageDirSuffix);
+
+ final String outIndexFilePath;
+ if (storageDirSuffix.endsWith("index.zip")) {
Review Comment:
@kfaraz Thanks for looking into this!
My first intuition was also trying to fix
`DeepStorageIntermediaryDataManager` by passing a file path contains
`index.zip` suffix here
https://github.com/apache/druid/blob/3ae5e978012e7a553e6ad3007eb210ab8da5b6fc/indexing-service/src/main/java/org/apache/druid/indexing/worker/shuffle/DeepStorageIntermediaryDataManager.java#L78
But after checking other implementations of `DataSegmentPusher`, I found
they all call method
[pushToPath](https://github.com/apache/druid/blob/3ae5e978012e7a553e6ad3007eb210ab8da5b6fc/processing/src/main/java/org/apache/druid/segment/loading/DataSegmentPusher.java#L71)
in their
[push](https://github.com/apache/druid/blob/3ae5e978012e7a553e6ad3007eb210ab8da5b6fc/processing/src/main/java/org/apache/druid/segment/loading/DataSegmentPusher.java#L69)
method. Some of them format segment file path contains `index.zip` suffix and
pass it by parameter `storageDirSuffix` to method `pushToPath`, see below
1. `AzureDataSegmentPusher`
https://github.com/apache/druid/blob/3ae5e978012e7a553e6ad3007eb210ab8da5b6fc/extensions-core/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureDataSegmentPusher.java#L124-L125
2. `HdfsDataSegmentPusher`
https://github.com/apache/druid/blob/279b3818f0e02f6a2ff389532f1e6118598a7c24/extensions-core/hdfs-storage/src/main/java/org/apache/druid/storage/hdfs/HdfsDataSegmentPusher.java#L108-L115
Other implementations format `storageDirSuffix` by calling
[getStorageDir](https://github.com/apache/druid/blob/279b3818f0e02f6a2ff389532f1e6118598a7c24/processing/src/main/java/org/apache/druid/segment/loading/DataSegmentPusher.java#L89)
in their `push` methods.
To fix the issue, I'd prefer to fix the inconsistent usages of `pushToPath`
in method `AzureDataSegmentPusher#push` and `HdfsDataSegmentPusher#push`. We
should always pass a dir to parameter `storageDirSuffix` of method `pushToPath`
and format segment file path with suffix `index.zip` in method `pushToPath`.
So, we don't need to change `DeepStorageIntermediaryDataManager` as it passes a
dir to method `pushToPath`. Does it make sense to you?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]