phet commented on code in PR #4082:
URL: https://github.com/apache/gobblin/pull/4082#discussion_r1878688674
##########
gobblin-utility/src/main/java/org/apache/gobblin/util/JobLauncherUtils.java:
##########
@@ -66,13 +66,25 @@ public class JobLauncherUtils {
public static class WorkUnitPathCalculator {
private final AtomicInteger nextMultiWorkUnitTaskId = new AtomicInteger(0);
- // Serialize each work unit into a file named after the task ID
+ /** @return `Path` beneath `basePath` to serialize `workUnit`, with file named after the task ID (itself named after the job ID) */
public Path calcNextPath(WorkUnit workUnit, String jobId, Path basePath) {
String workUnitFileName = workUnit.isMultiWorkUnit()
? JobLauncherUtils.newMultiTaskId(jobId, nextMultiWorkUnitTaskId.getAndIncrement()) + JobLauncherUtils.MULTI_WORK_UNIT_FILE_EXTENSION
: workUnit.getProp(ConfigurationKeys.TASK_ID_KEY) + JobLauncherUtils.WORK_UNIT_FILE_EXTENSION;
return new Path(basePath, workUnitFileName);
}
+
+ /**
+ * Calc where to serialize {@link WorkUnit}, using a filename that tunnels {@link WorkUnitSizeInfo}, vs. repeating the task/job ID, as was legacy practice
+ * @return `Path` beneath `basePath` to serialize `workUnit`
+ */
+ public Path calcNextPathWithTunneledSizeInfo(WorkUnit workUnit, String jobId, Path basePath) {
+ String encodedSizeInfo = WorkUnitSizeInfo.forWorkUnit(workUnit).encode();
+ String workUnitFileName = workUnit.isMultiWorkUnit()
+ ? Id.MultiTask.create(encodedSizeInfo, nextMultiWorkUnitTaskId.getAndIncrement()) + JobLauncherUtils.MULTI_WORK_UNIT_FILE_EXTENSION
+ : Id.Task.create(encodedSizeInfo, workUnit.getPropAsInt(ConfigurationKeys.TASK_KEY_KEY)) + JobLauncherUtils.WORK_UNIT_FILE_EXTENSION;
+ return new Path(basePath, workUnitFileName);
+ }
Review Comment:
Cool. Yeah, practically speaking, the filename was repeating the same "job
ID" info that is also used in its parent dir's name. Eliminating that redundancy
offered a direct and simple way to tunnel limited, but crucial, metadata that
incurs no additional access cost (nor adds load on the FS).
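
To illustrate the idea, here is a minimal, self-contained sketch of tunneling size metadata through a file name. It is not the PR's implementation: the `SizeInfo` class, its two fields, the `n=…-total=…` encoding, the `task_` prefix, and the `.wu` extension are all hypothetical stand-ins, and the sketch assumes non-negative sizes so that `-` is safe as a separator.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class TunneledSizeInfoSketch {

    /** Hypothetical stand-in for size metadata; fields and encoding are assumptions. */
    static final class SizeInfo {
        final int numUnits;
        final long totalBytes;

        SizeInfo(int numUnits, long totalBytes) {
            this.numUnits = numUnits;
            this.totalBytes = totalBytes;
        }

        /** Encode into a short, filename-safe string (assumes non-negative values). */
        String encode() {
            return "n=" + numUnits + "-total=" + totalBytes;
        }

        /** Inverse of encode(). */
        static SizeInfo decode(String encoded) {
            String[] parts = encoded.split("-");
            int n = Integer.parseInt(parts[0].substring("n=".length()));
            long total = Long.parseLong(parts[1].substring("total=".length()));
            return new SizeInfo(n, total);
        }
    }

    /** Build a path whose file name carries the encoded size info plus a sequence number. */
    static Path calcPath(Path basePath, SizeInfo info, int sequence) {
        return basePath.resolve("task_" + info.encode() + "_" + sequence + ".wu");
    }

    /** Recover the size info from the file name alone, without opening the file. */
    static SizeInfo sizeInfoFromPath(Path path) {
        String name = path.getFileName().toString();
        int start = name.indexOf('_') + 1;   // just past the "task_" prefix
        int end = name.lastIndexOf('_');     // just before the sequence number
        return SizeInfo.decode(name.substring(start, end));
    }

    public static void main(String[] args) {
        Path p = calcPath(Paths.get("/tmp/job_123/input"), new SizeInfo(3, 4096L), 0);
        SizeInfo recovered = sizeInfoFromPath(p);
        System.out.println(p.getFileName() + " -> "
            + recovered.numUnits + " units, " + recovered.totalBytes + " bytes");
    }
}
```

Because the parent directory in this sketch already identifies the job, the file name is free to carry the size info instead, and a reader that only lists the directory can recover it without a per-file open or read.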