abhishekmjain commented on code in PR #4092:
URL: https://github.com/apache/gobblin/pull/4092#discussion_r1940501751
##########
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/yarn/DynamicScalingYarnService.java:
##########
@@ -232,7 +237,12 @@ private synchronized void
handleContainerExitedWithOOM(ContainerId completedCont
// Request a replacement container
int currContainerMemoryMbs =
workerProfile.getConfig().getInt(GobblinYarnConfigurationKeys.CONTAINER_MEMORY_MBS_KEY);
- int newContainerMemoryMbs = currContainerMemoryMbs * 2; //TODO: make it
configurable or auto-tunable
+ int newContainerMemoryMbs = currContainerMemoryMbs *
DEFAULT_REPLACEMENT_CONTAINER_MEMORY_MULTIPLIER;
+ if (newContainerMemoryMbs > MAX_REPLACEMENT_CONTAINER_MEMORY_MBS) {
+ log.warn("Expected replacement container memory exceeds the maximum
allowed memory {}. Not requesting a replacement container.",
+ MAX_REPLACEMENT_CONTAINER_MEMORY_MBS);
+ return;
Review Comment:
We have two options here:
1. to create a replacement container with same memory as the old container
2. return without creating any container.
I think 2 here would help us in failing the flow fast, whereas 1 would help
resolve any intermittent OOM issues (which is unlikely). We have opted for 2
here, but may need to update this in future based on the observations.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]