Re: Flink Memory analyze on AWS EMR
Hi Xintong,

Thanks for pointing it out. After I set the log path, it is working now.

To conclude: on EMR, when setting up JITWatch in flink-conf.yaml, we should not wrap the value in quotes, and we should give an explicit path for the JIT log file. This is different from setting it up on a standalone cluster.

Example:

    env.java.opts: -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=/tmp/flinkmemdump.jit -XX:+PrintAssembly

Thanks to everyone involved in this discussion!

Jacky

Xintong Song wrote on Tue, May 12, 2020 at 10:41 PM:

> Hi Jacky,
>
> I don't think ${FLINK_LOG_PREFIX} is available for Flink Yarn deployment.
> This is just my guess, that the actual file name becomes ".jit". You can
> try to verify that by looking for the hidden file.
>
> If it is indeed this problem, you can try to replace "${FLINK_LOG_PREFIX}"
> with "/your-file-name.jit". The token "" should be
> replaced with proper log directory path by Yarn automatically.
>
> I noticed that the usage of ${FLINK_LOG_PREFIX} is recommended by Flink's
> documentation [1]. This is IMO a bit misleading. I'll try to file an issue
> to improve the docs.
>
> Thank you~
>
> Xintong Song
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/application_profiling.html#profiling-with-jitwatch
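The two rules from the conclusion (no surrounding quotes, an explicit log file path) can be sketched as a small check. This is only an illustrative helper, not part of Flink: the function name `check_java_opts` and its warning texts are hypothetical, and it assumes the option sits on a single `key: value` line.

```python
import re
from typing import List


def check_java_opts(line: str) -> List[str]:
    """Return warnings for one `env.java.opts: ...` line of flink-conf.yaml.

    Hypothetical helper: it only encodes the findings of this thread.
    """
    # Split at the first colon only; the JVM flags contain colons themselves.
    _key, _sep, value = line.partition(":")
    value = value.strip()
    warnings = []
    # Rule 1: the whole value should not be wrapped in quotes on YARN/EMR.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        warnings.append("value is wrapped in quotes")
    # Rule 2: ${...} placeholders may not be set in a YARN deployment.
    if re.search(r"\$\{\w+\}", value):
        warnings.append("value contains a ${...} placeholder")
    # Rule 3: the JIT log file should be given as an explicit absolute path.
    log_file = re.search(r"-XX:LogFile=(\S+)", value)
    if log_file and not log_file.group(1).startswith("/"):
        warnings.append("LogFile is not an absolute path")
    return warnings


GOOD = ("env.java.opts: -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading "
        "-XX:+LogCompilation -XX:LogFile=/tmp/flinkmemdump.jit -XX:+PrintAssembly")
BAD = 'env.java.opts: "-XX:+LogCompilation -XX:LogFile=${FLINK_LOG_PREFIX}.jit"'

print(check_java_opts(GOOD))
print(check_java_opts(BAD))
```

The working EMR line from the message above produces no warnings, while the quoted variant with `${FLINK_LOG_PREFIX}` trips all three checks.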
Re: Flink Memory analyze on AWS EMR
Hi Arvid,

Thanks for the advice. I removed the quotes and it did create a YARN session on EMR, but I didn't find any JIT log file generated.

The config with quotes works on a standalone cluster. I also tried to pass the property dynamically in the yarn session command:

    flink-yarn-session -n 1 -d -nm testSession -yD env.java.opts="-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=${FLINK_LOG_PREFIX}.jit -XX:+PrintAssembly"

but I got the same result: the session was created, but I cannot find any JIT log file under the container logs.

Thanks
Jacky

Arvid Heise wrote on Tue, May 12, 2020 at 12:57 PM:

> Hi Jacky,
>
> I suspect that the quotes are the actual issue. Could you try to remove
> them? See also [1].
>
> [1]
> http://blogs.perl.org/users/tinita/2018/03/strings-in-yaml---to-quote-or-not-to-quote.html
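Xintong's guess elsewhere in this thread is that `${FLINK_LOG_PREFIX}` is not set in a YARN deployment, so the log file name collapses to ".jit". The sketch below models that with a tiny substitution function. It is not how Flink or YARN resolve variables; `expand` is a hypothetical helper that mimics the POSIX shell rule that an unset variable expands to an empty string, and the prefix path in the second call is a made-up example.

```python
import re

_VAR = re.compile(r"\$\{(\w+)\}")


def expand(text: str, env: dict) -> str:
    """Replace each ${NAME} with env[NAME], or with "" when NAME is unset."""
    return _VAR.sub(lambda m: env.get(m.group(1), ""), text)


OPT = "-XX:LogFile=${FLINK_LOG_PREFIX}.jit"

# Variable unset: the file name is just ".jit", which is a hidden file.
print(expand(OPT, {}))

# Variable set (hypothetical path): the file gets a visible name.
print(expand(OPT, {"FLINK_LOG_PREFIX": "/var/log/flink/flink-jobmanager"}))
```

If this is what happened, the file would only show up in a listing that includes hidden files (for example `ls -a` in the process's working directory).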
Re: Flink Memory analyze on AWS EMR
Hi Xintong,

Thanks for the reply. I have attached the lines around the Application Master start command below:

2020-05-11 21:16:16,635 DEBUG org.apache.hadoop.util.PerformanceAdvisory - Crypto codec org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec is not available.
2020-05-11 21:16:16,635 DEBUG org.apache.hadoop.util.PerformanceAdvisory - Using crypto codec org.apache.hadoop.crypto.JceAesCtrCryptoCodec.
2020-05-11 21:16:16,636 DEBUG org.apache.hadoop.hdfs.DataStreamer - DataStreamer block BP-1519523618-98.94.65.144-1581106168138:blk_1073745139_4315 sending packet packet seqno: 0 offsetInBlock: 0 lastPacketInBlock: false lastByteOffsetInBlock: 1697
2020-05-11 21:16:16,637 DEBUG org.apache.hadoop.hdfs.DataStreamer - DFSClient seqno: 0 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
2020-05-11 21:16:16,637 DEBUG org.apache.hadoop.hdfs.DataStreamer - DataStreamer block BP-1519523618-98.94.65.144-1581106168138:blk_1073745139_4315 sending packet packet seqno: 1 offsetInBlock: 1697 lastPacketInBlock: true lastByteOffsetInBlock: 1697
2020-05-11 21:16:16,638 DEBUG org.apache.hadoop.hdfs.DataStreamer - DFSClient seqno: 1 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
2020-05-11 21:16:16,638 DEBUG org.apache.hadoop.hdfs.DataStreamer - Closing old block BP-1519523618-98.94.65.144-1581106168138:blk_1073745139_4315
2020-05-11 21:16:16,641 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop sending #70 org.apache.hadoop.hdfs.protocol.ClientProtocol.complete
2020-05-11 21:16:16,643 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop got value #70
2020-05-11 21:16:16,643 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: complete took 2ms
2020-05-11 21:16:16,643 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop sending #71 org.apache.hadoop.hdfs.protocol.ClientProtocol.setTimes
2020-05-11 21:16:16,645 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop got value #71
2020-05-11 21:16:16,645 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: setTimes took 2ms
2020-05-11 21:16:16,647 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop sending #72 org.apache.hadoop.hdfs.protocol.ClientProtocol.setPermission
2020-05-11 21:16:16,648 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop got value #72
2020-05-11 21:16:16,648 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: setPermission took 2ms
2020-05-11 21:16:16,654 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor - Application Master start command: $JAVA_HOME/bin/java -Xmx424m "-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=${FLINK_LOG_PREFIX}.jit -XX:+PrintAssembly" -Dlog.file="/jobmanager.log" -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint 1> /jobmanager.out 2> /jobmanager.err
2020-05-11 21:16:16,654 DEBUG org.apache.hadoop.ipc.Client - stopping client from cache: org.apache.hadoop.ipc.Client@28194a50
2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setApplicationTags.
2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setAttemptFailuresValidityInterval.
2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setKeepContainersAcrossApplicationAttempts.
2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setNodeLabelExpression.

Xintong Song wrote on Mon, May 11, 2020 at 10:11 PM:

> Hi Jacky,
>
> Could you search for "Application Master start command:" in the debug log
> and post the result and a few lines before & after that? This is not
> included in the clip of attached log file.
>
> Thank you~
>
> Xintong Song
>
> On Tue, May 12, 2020
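In the Application Master start command logged above, the JVM flags sit inside one pair of double quotes. A shell-style tokenizer treats such a quoted run as a single argument, which fits the suspicion in this thread that the quotes are the problem. The sketch below uses Python's `shlex.split` as a stand-in for the shell's word splitting; the two command strings are shortened, illustrative versions of the logged command, not the real one.

```python
import shlex

# Shortened stand-ins for the logged start command (illustrative only).
QUOTED = ('java -Xmx424m "-XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation" '
          '-Dlog.file=jobmanager.log')
UNQUOTED = ('java -Xmx424m -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation '
            '-Dlog.file=jobmanager.log')

# With quotes, both -XX flags arrive as ONE argument containing a space.
for token in shlex.split(QUOTED):
    print(repr(token))

# Without quotes, each -XX flag is its own argument.
for token in shlex.split(UNQUOTED):
    print(repr(token))
```

The quoted form yields four tokens and the unquoted form five, so in the quoted case the JVM would receive a single option string with an embedded space rather than two separate flags.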
Re: Flink Memory analyze on AWS EMR
Hi Robert,

Thanks so much for the quick reply. I changed the log level to DEBUG and attached the log file.

Thanks
Jacky

Robert Metzger wrote on Mon, May 11, 2020 at 4:14 PM:

> Thanks a lot for posting the full output.
>
> It seems that Flink is passing an invalid list of arguments to the JVM.
> Can you
> - set the root log level in conf/log4j-yarn-session.properties to DEBUG
> - then launch the YARN session
> - share the log file of the yarn session on the mailing list?
>
> I'm particularly interested in the line printed here, as it shows the JVM
> invocation.
>
> https://github.com/apache/flink/blob/release-1.6/flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java#L1630
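Once the session has been launched with DEBUG logging as Robert describes, the JVM invocation can be pulled out of the log together with a few neighbouring lines. The following is a minimal sketch of that search: `find_with_context` is a hypothetical helper, the marker string is the one quoted in this thread, and the sample lines are shortened placeholders rather than real log output.

```python
from typing import List

MARKER = "Application Master start command:"


def find_with_context(lines: List[str], marker: str = MARKER,
                      context: int = 2) -> List[str]:
    """Return the first line containing `marker` plus `context` lines around it."""
    for i, line in enumerate(lines):
        if marker in line:
            start = max(0, i - context)
            return lines[start:i + context + 1]
    return []  # marker not found, e.g. the log level was not DEBUG


# Placeholder lines standing in for a real yarn session log.
SAMPLE = [
    "DEBUG Client - got value #71",
    "DEBUG ProtobufRpcEngine - Call: setTimes took 2ms",
    "DEBUG Client - got value #72",
    "DEBUG AbstractYarnClusterDescriptor - Application Master start command: java ...",
    "DEBUG Client - stopping client from cache",
]

for line in find_with_context(SAMPLE, context=1):
    print(line)
```

For a real file, the lines could be read with `open(path).read().splitlines()` before calling the function; the path depends on where the yarn session client writes its log.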
Re: Flink Memory analyze on AWS EMR
Hi Robert,

Yes, I tried to retrieve more log info from the YARN UI. The full logs are shown below. This happens when I try to create a Flink YARN session on EMR with the JITWatch configuration set up.

2020-05-11 19:06:09,552 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli - Error while running the Flink Yarn session.
java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1862)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:813)
Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:429)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:610)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:813)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    ... 2 more
Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1584459865196_0165 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1584459865196_0165_01 exited with exitCode: 1
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1584459865196_0165_01_01
Exit code: 1
Exception message: Usage: java [-options] class [args...]
           (to execute a class)
   or  java [-options] -jar jarfile [args...]
           (to execute a jar file)
where options include:
    -d32          use a 32-bit data model if available
    -d64          use a 64-bit data model if available
    -server       to select the "server" VM
                  The default VM is server,
                  because you are running on a server-class machine.
    -cp
    -classpath
                  A : separated list of directories, JAR archives,
                  and ZIP archives to search for class files.
    -D=
                  set a system property
    -verbose:[class|gc|jni]
                  enable verbose output
    -version      print product version and exit
    -version:
                  Warning: this feature is deprecated and will be removed
                  in a future release.
                  require the specified version to run
    -showversion  print product version and continue
    -jre-restrict-search | -no-jre-restrict-search
                  Warning: this feature is deprecated and will be removed
                  in a future release.
                  include/exclude user private JREs in the version search
    -? -help      print this help message
    -X            print help on non-standard options
    -ea[:...|:]
    -enableassertions[:...|:]
                  enable assertions with specified granularity
    -da[:...|:]
    -disableassertions[:...|:]
                  disable assertions with specified granularity
    -esa | -enablesystemassertions
                  enable system assertions
    -dsa | -disablesystemassertions
                  disable system assertions
    -agentlib:[=]
                  load native agent library , e.g. -agentlib:hprof
                  see also, -agentlib:jdwp=help and -agentlib:hprof=help
    -agentpath:[=]
                  load native agent library by full pathname
    -javaagent:[=]
                  load Java programming language agent, see java.lang.instrument
    -splash:
                  show splash screen with specified image
See http://www.oracle.com/technetwork/java/javase/documentation/index.html for more details.

Thanks
Jacky

Robert Metzger wrote on Mon, May 11, 2020 at 3:42 PM:

> Hey Jacky,
>
> The error says "The YARN application unexpectedly switched to state FAILED
> during deployment.".
> Have you tried retrieving the YARN application logs?
> Does the YARN UI / resource manager logs reveal anything on the reason for
> the deployment to fail?
>
> Best,
> Robert
Fwd: Flink Memory analyze on AWS EMR
---------- Forwarded message ---------
From: Jacky D
Date: Mon, May 11, 2020 at 3:12 PM
Subject: Re: Flink Memory analyze on AWS EMR
To: Khachatryan Roman

Hi Roman,

Thanks for the quick response. I tried without the LogFile option, but it failed with the same error. I'm currently using Flink 1.6 (https://ci.apache.org/projects/flink/flink-docs-release-1.6/monitoring/application_profiling.html), so I can only use JITWatch or JMC. I guess those tools are only available on a standalone cluster? The documentation says: "Each standalone JobManager, TaskManager, HistoryServer, and ZooKeeper daemon redirects stdout and stderr to a file with a .out filename suffix and writes internal logging to a file with a .log suffix. Java options configured by the user in env.java.opts"

Thanks
Jacky
Flink Memory analyze on AWS EMR
Hi all,

I'm encountering a memory issue with my Flink job on AWS EMR (current Flink version 1.6.2). I would like to find the root cause, so I'm trying JITWatch on my local standalone cluster, but I cannot use it on EMR.

After I add the following config to flink-conf.yaml:

    env.java.opts: "-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=${FLINK_LOG_PREFIX}.jit -XX:+PrintAssembly"

I get this error:

2020-05-07 16:24:53,368 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli - Error while running the Flink Yarn session.
java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1862)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:813)
Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:429)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:610)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:813)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    ... 2 more
Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.

How can I fix this issue to enable JITWatch, or which tool would be a proper way to analyze a Flink memory dump on EMR?

Thanks
Jacky Du