tillrohrmann commented on a change in pull request #13111:
URL: https://github.com/apache/flink/pull/13111#discussion_r469758699
##########
File path: flink-core/src/main/java/org/apache/flink/util/ExceptionUtils.java
##########
@@ -110,51 +112,76 @@ public static boolean
isJvmFatalOrOutOfMemoryError(Throwable t) {
}
/**
- * Tries to enrich the passed exception with additional information.
- *
- * <p>This method improves error message for direct and metaspace
{@link OutOfMemoryError}.
- * It adds description of possible causes and ways of resolution.
- *
- * @param exception exception to enrich if not {@code null}
- * @return the enriched exception or the original if no additional
information could be added;
- * {@code null} if the argument was {@code null}
+ * Tries to enrich OutOfMemoryErrors being part of the passed root
Throwable's cause tree.
+ *
+ * <p>This method improves error messages for direct and metaspace
{@link OutOfMemoryError}.
+ * It adds description about the possible causes and ways of resolution.
+ *
+ * @param root The Throwable of which the cause tree shall be traversed.
+ * @param jvmMetaspaceOomNewErrorMessage The message being used for JVM
metaspace-related OutOfMemoryErrors. Passing
+ * <code>null</code> will disable
handling this class of error.
+ * @param jvmDirectOomNewErrorMessage The message being used for direct
memory-related OutOfMemoryErrors. Passing
+ * <code>null</code> will disable
handling this class of error.
+ * @param jvmHeapSpaceOomNewErrorMessage The message being used for
Heap space-related OutOfMemoryErrors. Passing
+ * <code>null</code> will disable
handling this class of error.
*/
- @Nullable
- public static Throwable tryEnrichOutOfMemoryError(
- @Nullable Throwable exception,
- String jvmMetaspaceOomNewErrorMessage,
- String jvmDirectOomNewErrorMessage) {
- boolean isOom = exception instanceof OutOfMemoryError;
- if (!isOom) {
- return exception;
- }
-
- OutOfMemoryError oom = (OutOfMemoryError) exception;
- if (isMetaspaceOutOfMemoryError(oom)) {
- return changeOutOfMemoryErrorMessage(oom,
jvmMetaspaceOomNewErrorMessage);
- } else if (isDirectOutOfMemoryError(oom)) {
- return changeOutOfMemoryErrorMessage(oom,
jvmDirectOomNewErrorMessage);
- }
+ public static void tryEnrichOutOfMemoryError(
+ @Nullable Throwable root,
+ @Nullable String jvmMetaspaceOomNewErrorMessage,
+ @Nullable String jvmDirectOomNewErrorMessage,
+ @Nullable String jvmHeapSpaceOomNewErrorMessage) {
+ updateDetailMessage(root, t -> {
+ if (isMetaspaceOutOfMemoryError(t)) {
+ return jvmMetaspaceOomNewErrorMessage;
+ } else if (isDirectOutOfMemoryError(t)) {
+ return jvmDirectOomNewErrorMessage;
+ } else if (isHeapSpaceOutOfMemoryError(t)) {
+ return jvmHeapSpaceOomNewErrorMessage;
+ }
- return oom;
+ return null;
+ });
}
/**
- * Rewrites the error message of a {@link OutOfMemoryError}.
+ * Updates the error message of the first Throwable appearing in the
cause tree of the passed root Throwable and
+ * matching the passed Predicate by traversing the cause tree
top-to-bottom.
*
- * @param oom original {@link OutOfMemoryError}
- * @param newMessage new error message
- * @return the origianl {@link OutOfMemoryError} if it already has the
new error message or
- * a new {@link OutOfMemoryError} with the new error message
+ * @param root The Throwable whose cause tree shall be traversed.
+ * @param throwableToMessage The Function based on which the new
messages are generated. The function implementation
+ * should return the new message. Returning
<code>null</code>, in contrast, will result in
+ * not updating the message for the
corresponding Throwable.
*/
- private static OutOfMemoryError
changeOutOfMemoryErrorMessage(OutOfMemoryError oom, String newMessage) {
- if (oom.getMessage().equals(newMessage)) {
- return oom;
+ public static void updateDetailMessage(Throwable root,
Function<Throwable, String> throwableToMessage) {
Review comment:
if `root` can be `null`, then let's add `@Nullable`. The same for
`throwableToMessage`. However, it might not be necessary to support `null`
values here.
##########
File path: flink-core/src/main/java/org/apache/flink/util/ExceptionUtils.java
##########
@@ -110,51 +112,76 @@ public static boolean
isJvmFatalOrOutOfMemoryError(Throwable t) {
}
/**
- * Tries to enrich the passed exception with additional information.
- *
- * <p>This method improves error message for direct and metaspace
{@link OutOfMemoryError}.
- * It adds description of possible causes and ways of resolution.
- *
- * @param exception exception to enrich if not {@code null}
- * @return the enriched exception or the original if no additional
information could be added;
- * {@code null} if the argument was {@code null}
+ * Tries to enrich OutOfMemoryErrors being part of the passed root
Throwable's cause tree.
+ *
+ * <p>This method improves error messages for direct and metaspace
{@link OutOfMemoryError}.
+ * It adds description about the possible causes and ways of resolution.
+ *
+ * @param root The Throwable of which the cause tree shall be traversed.
+ * @param jvmMetaspaceOomNewErrorMessage The message being used for JVM
metaspace-related OutOfMemoryErrors. Passing
+ * <code>null</code> will disable
handling this class of error.
+ * @param jvmDirectOomNewErrorMessage The message being used for direct
memory-related OutOfMemoryErrors. Passing
+ * <code>null</code> will disable
handling this class of error.
+ * @param jvmHeapSpaceOomNewErrorMessage The message being used for
Heap space-related OutOfMemoryErrors. Passing
+ * <code>null</code> will disable
handling this class of error.
*/
- @Nullable
- public static Throwable tryEnrichOutOfMemoryError(
- @Nullable Throwable exception,
- String jvmMetaspaceOomNewErrorMessage,
- String jvmDirectOomNewErrorMessage) {
- boolean isOom = exception instanceof OutOfMemoryError;
- if (!isOom) {
- return exception;
- }
-
- OutOfMemoryError oom = (OutOfMemoryError) exception;
- if (isMetaspaceOutOfMemoryError(oom)) {
- return changeOutOfMemoryErrorMessage(oom,
jvmMetaspaceOomNewErrorMessage);
- } else if (isDirectOutOfMemoryError(oom)) {
- return changeOutOfMemoryErrorMessage(oom,
jvmDirectOomNewErrorMessage);
- }
+ public static void tryEnrichOutOfMemoryError(
+ @Nullable Throwable root,
+ @Nullable String jvmMetaspaceOomNewErrorMessage,
+ @Nullable String jvmDirectOomNewErrorMessage,
+ @Nullable String jvmHeapSpaceOomNewErrorMessage) {
+ updateDetailMessage(root, t -> {
+ if (isMetaspaceOutOfMemoryError(t)) {
+ return jvmMetaspaceOomNewErrorMessage;
+ } else if (isDirectOutOfMemoryError(t)) {
+ return jvmDirectOomNewErrorMessage;
+ } else if (isHeapSpaceOutOfMemoryError(t)) {
+ return jvmHeapSpaceOomNewErrorMessage;
+ }
- return oom;
+ return null;
+ });
}
/**
- * Rewrites the error message of a {@link OutOfMemoryError}.
+ * Updates the error message of the first Throwable appearing in the
cause tree of the passed root Throwable and
+ * matching the passed Predicate by traversing the cause tree
top-to-bottom.
*
- * @param oom original {@link OutOfMemoryError}
- * @param newMessage new error message
- * @return the origianl {@link OutOfMemoryError} if it already has the
new error message or
- * a new {@link OutOfMemoryError} with the new error message
+ * @param root The Throwable whose cause tree shall be traversed.
+ * @param throwableToMessage The Function based on which the new
messages are generated. The function implementation
+ * should return the new message. Returning
<code>null</code>, in contrast, will result in
+ * not updating the message for the
corresponding Throwable.
*/
- private static OutOfMemoryError
changeOutOfMemoryErrorMessage(OutOfMemoryError oom, String newMessage) {
- if (oom.getMessage().equals(newMessage)) {
- return oom;
+ public static void updateDetailMessage(Throwable root,
Function<Throwable, String> throwableToMessage) {
+ if (throwableToMessage == null) {
+ return;
+ }
+
+ Throwable it = root;
+ while (it != null) {
+ String newMessage = throwableToMessage.apply(it);
+ if (newMessage != null) {
+ updateDetailMessageOfThrowable(it, newMessage);
+ }
+
+ it = it.getCause();
+ }
+ }
+
+ private static void updateDetailMessageOfThrowable(Throwable throwable,
String newDetailMessage) {
+ Field field;
+ try {
+ field =
Throwable.class.getDeclaredField("detailMessage");
+ } catch (NoSuchFieldException e) {
+ throw new IllegalStateException("The JDK Throwable does
provide a detailMessage.", e);
Review comment:
```suggestion
throw new IllegalStateException("The JDK Throwable does
not provide a detailMessage.", e);
```
##########
File path: flink-core/src/main/java/org/apache/flink/util/ExceptionUtils.java
##########
@@ -110,51 +112,76 @@ public static boolean
isJvmFatalOrOutOfMemoryError(Throwable t) {
}
/**
- * Tries to enrich the passed exception with additional information.
- *
- * <p>This method improves error message for direct and metaspace
{@link OutOfMemoryError}.
- * It adds description of possible causes and ways of resolution.
- *
- * @param exception exception to enrich if not {@code null}
- * @return the enriched exception or the original if no additional
information could be added;
- * {@code null} if the argument was {@code null}
+ * Tries to enrich OutOfMemoryErrors being part of the passed root
Throwable's cause tree.
+ *
+ * <p>This method improves error messages for direct and metaspace
{@link OutOfMemoryError}.
+ * It adds description about the possible causes and ways of resolution.
+ *
+ * @param root The Throwable of which the cause tree shall be traversed.
+ * @param jvmMetaspaceOomNewErrorMessage The message being used for JVM
metaspace-related OutOfMemoryErrors. Passing
+ * <code>null</code> will disable
handling this class of error.
+ * @param jvmDirectOomNewErrorMessage The message being used for direct
memory-related OutOfMemoryErrors. Passing
+ * <code>null</code> will disable
handling this class of error.
+ * @param jvmHeapSpaceOomNewErrorMessage The message being used for
Heap space-related OutOfMemoryErrors. Passing
+ * <code>null</code> will disable
handling this class of error.
*/
- @Nullable
- public static Throwable tryEnrichOutOfMemoryError(
- @Nullable Throwable exception,
- String jvmMetaspaceOomNewErrorMessage,
- String jvmDirectOomNewErrorMessage) {
- boolean isOom = exception instanceof OutOfMemoryError;
- if (!isOom) {
- return exception;
- }
-
- OutOfMemoryError oom = (OutOfMemoryError) exception;
- if (isMetaspaceOutOfMemoryError(oom)) {
- return changeOutOfMemoryErrorMessage(oom,
jvmMetaspaceOomNewErrorMessage);
- } else if (isDirectOutOfMemoryError(oom)) {
- return changeOutOfMemoryErrorMessage(oom,
jvmDirectOomNewErrorMessage);
- }
+ public static void tryEnrichOutOfMemoryError(
+ @Nullable Throwable root,
+ @Nullable String jvmMetaspaceOomNewErrorMessage,
+ @Nullable String jvmDirectOomNewErrorMessage,
+ @Nullable String jvmHeapSpaceOomNewErrorMessage) {
+ updateDetailMessage(root, t -> {
+ if (isMetaspaceOutOfMemoryError(t)) {
+ return jvmMetaspaceOomNewErrorMessage;
+ } else if (isDirectOutOfMemoryError(t)) {
+ return jvmDirectOomNewErrorMessage;
+ } else if (isHeapSpaceOutOfMemoryError(t)) {
+ return jvmHeapSpaceOomNewErrorMessage;
+ }
- return oom;
+ return null;
+ });
}
/**
- * Rewrites the error message of a {@link OutOfMemoryError}.
+ * Updates the error message of the first Throwable appearing in the
cause tree of the passed root Throwable and
+ * matching the passed Predicate by traversing the cause tree
top-to-bottom.
*
- * @param oom original {@link OutOfMemoryError}
- * @param newMessage new error message
- * @return the origianl {@link OutOfMemoryError} if it already has the
new error message or
- * a new {@link OutOfMemoryError} with the new error message
+ * @param root The Throwable whose cause tree shall be traversed.
+ * @param throwableToMessage The Function based on which the new
messages are generated. The function implementation
+ * should return the new message. Returning
<code>null</code>, in contrast, will result in
+ * not updating the message for the
corresponding Throwable.
*/
- private static OutOfMemoryError
changeOutOfMemoryErrorMessage(OutOfMemoryError oom, String newMessage) {
- if (oom.getMessage().equals(newMessage)) {
- return oom;
+ public static void updateDetailMessage(Throwable root,
Function<Throwable, String> throwableToMessage) {
+ if (throwableToMessage == null) {
+ return;
+ }
+
+ Throwable it = root;
+ while (it != null) {
+ String newMessage = throwableToMessage.apply(it);
+ if (newMessage != null) {
+ updateDetailMessageOfThrowable(it, newMessage);
+ }
+
+ it = it.getCause();
+ }
Review comment:
The JavaDocs and this loop are not aligned. It looks as if we continue
with updating the messages even after we have found the first match.
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobManagerRunnerImpl.java
##########
@@ -137,7 +138,12 @@ public JobManagerRunnerImpl(
this.leaderGatewayFuture = new CompletableFuture<>();
// now start the JobManager
- this.jobMasterService =
jobMasterFactory.createJobMasterService(jobGraph, this, userCodeLoader);
+ try {
+ this.jobMasterService =
jobMasterFactory.createJobMasterService(jobGraph, this, userCodeLoader);
+ } catch (Throwable t) {
+
ClusterEntryPointExceptionUtils.tryEnrichClusterEntryPointError(t);
+ throw t;
+ }
Review comment:
I would move this specific exception handling out because now we would
only enrich OOMs which occur here. But what if an OOM happens in a line above
(e.g. `classLoaderLease.getOrResolveClassLoader(...)`)? I think it would be
better to put it into `Dispatcher.internalSubmitJob`.
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/ClusterEntryPointExceptionUtils.java
##########
@@ -47,22 +47,28 @@
"which has to be investigated and fixed. The Flink
Master has to be shutdown...",
JobManagerOptions.JVM_METASPACE.key());
+ private static final String JM_HEAP_SPACE_OOM_ERROR_MESSAGE =
String.format(
+ "Java heap space. A heap space-related out-of-memory error has
occurred. This can mean two things: either Flink Master " +
+ "requires a larger size of JVM heap space or there is a
memory leak. In the first case, '%s' can be used to increase " +
+ "the amount of available heap memory. If the problem is
not resolved by increasing the heap size, it indicate a " +
Review comment:
```suggestion
"the amount of available heap memory. If the problem is
not resolved by increasing the heap size, it indicates a " +
```
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java
##########
@@ -1046,6 +1047,8 @@ void failGlobalIfExecutionIsStillRunning(Throwable cause,
ExecutionAttemptID fai
* @param t The exception that caused the failure.
*/
public void failGlobal(Throwable t) {
+
ClusterEntryPointExceptionUtils.tryEnrichClusterEntryPointError(t);
Review comment:
I am not sure whether we can enrich `t` here because `t` can also come
from the `TaskExecutor` if a `Task` has failed. In this case, the
ClusterEntryPoint specific messages don't make sense.
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java
##########
@@ -134,120 +134,190 @@
*/
public class Task implements Runnable, TaskSlotPayload, TaskActions,
PartitionProducerStateProvider, CheckpointListener, BackPressureSampleableTask {
- /** The class logger. */
+ /**
+ * The class logger.
+ */
Review comment:
All changes in this file seem to be quite unrelated. Please revert.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]