lukecwik commented on a change in pull request #12984:
URL: https://github.com/apache/beam/pull/12984#discussion_r504119011
##########
File path:
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/HotKeyLogger.java
##########
@@ -47,10 +46,31 @@
/** Logs a detection of the hot key every 5 minutes. */
public void logHotKeyDetection(String userStepName, Duration hotKeyAge) {
+ logHotKeyDetection(userStepName, hotKeyAge, null);
Review comment:
Note that `null` is a valid key type and using null to differentiate
between log without key and log with key will miss this usecase which is why I
proposed the two suggestions below:
```suggestion
if (isThrottled()) {
return;
}
LOG.warn(
"A hot key was detected in step '{}' with age of '{}'. This is "
+ "a symptom of key distribution being skewed. To fix, please
inspect your data and "
+ "pipeline to ensure that elements are evenly distributed
across your key space.",
userStepName,
TimeUtil.toCloudDuration(hotKeyAge));
```
##########
File path:
runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/HotKeyLoggerTest.java
##########
@@ -50,18 +45,29 @@ public void SetUp() {
}
@Test
- public void correctHotKeyMessage() {
+ public void correctHotKeyMessageWithoutKey() {
HotKeyLogger hotKeyLogger = new HotKeyLogger(clock);
- assertFalse(hotKeyLogger.isThrottled());
- String m = hotKeyLogger.getHotKeyMessage("TEST_STEP_ID", "1s");
+ hotKeyLogger.logHotKeyDetection("TEST_STEP_ID",
Duration.standardSeconds(1));
assertTrue(hotKeyLogger.isThrottled());
- assertEquals(
+ expectedLogs.verifyWarn(
"A hot key was detected in step 'TEST_STEP_ID' with age of '1s'. This
is a "
+ "symptom of key distribution being skewed. To fix, please
inspect your data and "
- + "pipeline to ensure that elements are evenly distributed across
your key space.",
- m);
+ + "pipeline to ensure that elements are evenly distributed across
your key space.");
+ }
+
+ @Test
+ public void correctHotKeyMessageWithKey() {
Review comment:
might as well cover the `null` as an object case to ensure we don't
regress.
##########
File path:
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/HotKeyLogger.java
##########
@@ -47,10 +46,31 @@
/** Logs a detection of the hot key every 5 minutes. */
public void logHotKeyDetection(String userStepName, Duration hotKeyAge) {
+ logHotKeyDetection(userStepName, hotKeyAge, null);
+ }
+
+ /** Logs a detection of the hot key every 5 minutes with the given key. */
+ public void logHotKeyDetection(String userStepName, Duration hotKeyAge,
Object hotkey) {
if (isThrottled()) {
return;
}
- LOG.warn(getHotKeyMessage(userStepName,
TimeUtil.toCloudDuration(hotKeyAge)));
+
+ if (hotkey == null) {
+ LOG.warn(
+ "A hot key was detected in step '{}' with age of '{}'. This is "
+ + "a symptom of key distribution being skewed. To fix, please
inspect your data and "
+ + "pipeline to ensure that elements are evenly distributed
across your key space.",
+ userStepName,
+ TimeUtil.toCloudDuration(hotKeyAge));
+ } else {
+ LOG.warn(
+ "A hot key '{}' was detected in step '{}' with age of '{}'. This is "
+ + "a symptom of key distribution being skewed. To fix, please
inspect your data and "
+ + "pipeline to ensure that elements are evenly distributed
across your key space.",
+ hotkey,
+ userStepName,
+ TimeUtil.toCloudDuration(hotKeyAge));
+ }
Review comment:
```suggestion
LOG.warn(
"A hot key '{}' was detected in step '{}' with age of '{}'. This is "
+ "a symptom of key distribution being skewed. To fix, please
inspect your data and "
+ "pipeline to ensure that elements are evenly distributed
across your key space.",
hotkey,
userStepName,
TimeUtil.toCloudDuration(hotKeyAge));
}
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]