Re: Flink Memory analyze on AWS EMR

2020-05-13 Thread Jacky D
Hi Xintong,

Thanks for pointing that out. After I set an explicit log path, it works now.
So, to summarize:

On EMR, to set up JITWatch in flink-conf.yaml, we should not wrap the value
in quotes, and we should give an explicit path for the JIT log file. This is
different from setting it up on a standalone cluster.
Example:
env.java.opts: -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=/tmp/flinkmemdump.jit -XX:+PrintAssembly
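
As a quick sanity check before starting a session, a small shell helper along
these lines could flag the two pitfalls from this thread in flink-conf.yaml.
It is only a sketch: the function name check_java_opts and its messages are
made up for illustration, it is not part of Flink, and it only looks at a
single-line env.java.opts entry.

```shell
# Hypothetical helper (not part of Flink): warn about the two pitfalls
# discussed in this thread for a single-line env.java.opts entry.
check_java_opts() {
  conf_file=$1
  # Take the first env.java.opts line, if there is one.
  line=$(grep -m 1 '^env\.java\.opts:' "$conf_file") || {
    echo "env.java.opts is not set"
    return 0
  }
  # Strip the key and any leading spaces to get the raw value.
  value=${line#env.java.opts:}
  value=${value#"${value%%[! ]*}"}
  # Pitfall 1: the whole value is wrapped in quotes.
  case $value in
    \"*|\'*) echo "warning: value is wrapped in quotes" ;;
  esac
  # Pitfall 2: the log file path relies on FLINK_LOG_PREFIX.
  case $value in
    *'${FLINK_LOG_PREFIX}'*) echo "warning: value uses \${FLINK_LOG_PREFIX}" ;;
  esac
}
```

Calling check_java_opts with the path to flink-conf.yaml prints nothing for
the example above and one warning per pitfall otherwise.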

Thanks to everyone involved in this discussion!

Jacky

On Tue, May 12, 2020 at 10:41 PM, Xintong Song wrote:

> Hi Jacky,
>
> I don't think ${FLINK_LOG_PREFIX} is available for Flink Yarn deployment.
> This is just my guess, that the actual file name becomes ".jit". You can
> try to verify that by looking for the hidden file.
>
> If it is indeed this problem, you can try to replace "${FLINK_LOG_PREFIX}"
> with "<LOG_DIR>/your-file-name.jit". The token "<LOG_DIR>" should be
> replaced with proper log directory path by Yarn automatically.
>
> I noticed that the usage of ${FLINK_LOG_PREFIX} is recommended by Flink's
> documentation [1]. This is IMO a bit misleading. I'll try to file an issue
> to improve the docs.
>
> Thank you~
>
> Xintong Song
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/application_profiling.html#profiling-with-jitwatch
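
Xintong's guess can be illustrated in a plain shell, without Flink or YARN
involved: when a variable is unset, the shell substitutes an empty string, so
the log file name collapses to ".jit", which is a hidden file on Linux. This
is only a sketch of the shell behaviour, under the assumption that
FLINK_LOG_PREFIX is unset in the container; it does not show how YARN
actually launches containers.

```shell
# Make sure the variable is unset, as it is assumed to be in the YARN container.
unset FLINK_LOG_PREFIX

# The shell replaces the unset variable with an empty string.
jit_opt="-XX:LogFile=${FLINK_LOG_PREFIX}.jit"
echo "$jit_opt"   # prints: -XX:LogFile=.jit

# A file named ".jit" is hidden: plain "ls" skips it, "ls -a" lists it.
work_dir=$(mktemp -d)
touch "$work_dir/.jit"
ls "$work_dir"
ls -a "$work_dir"
```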

Re: Flink Memory analyze on AWS EMR

2020-05-12 Thread Jacky D
Hi Arvid,

Thanks for the advice. I removed the quotes and it did create a YARN
session on EMR, but I could not find any JIT log file.

The config with quotes works on a standalone cluster. I also tried to
pass the property dynamically in the yarn session command:

flink-yarn-session -n 1 -d -nm testSession -yD env.java.opts="-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=${FLINK_LOG_PREFIX}.jit -XX:+PrintAssembly"


but I got the same result: the session is created, but there is no JIT log
file under the container logs.


Thanks

Jacky

On Tue, May 12, 2020 at 12:57 PM, Arvid Heise wrote:

> Hi Jacky,
>
> I suspect that the quotes are the actual issue. Could you try to remove
> them? See also [1].
>
> [1]
> http://blogs.perl.org/users/tinita/2018/03/strings-in-yaml---to-quote-or-not-to-quote.html

Re: Flink Memory analyze on AWS EMR

2020-05-12 Thread Jacky D
Hi Xintong,

Thanks for the reply. Here are the lines around the Application Master
start command:


2020-05-11 21:16:16,635 DEBUG org.apache.hadoop.util.PerformanceAdvisory - Crypto codec org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec is not available.
2020-05-11 21:16:16,635 DEBUG org.apache.hadoop.util.PerformanceAdvisory - Using crypto codec org.apache.hadoop.crypto.JceAesCtrCryptoCodec.
2020-05-11 21:16:16,636 DEBUG org.apache.hadoop.hdfs.DataStreamer - DataStreamer block BP-1519523618-98.94.65.144-1581106168138:blk_1073745139_4315 sending packet packet seqno: 0 offsetInBlock: 0 lastPacketInBlock: false lastByteOffsetInBlock: 1697
2020-05-11 21:16:16,637 DEBUG org.apache.hadoop.hdfs.DataStreamer - DFSClient seqno: 0 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
2020-05-11 21:16:16,637 DEBUG org.apache.hadoop.hdfs.DataStreamer - DataStreamer block BP-1519523618-98.94.65.144-1581106168138:blk_1073745139_4315 sending packet packet seqno: 1 offsetInBlock: 1697 lastPacketInBlock: true lastByteOffsetInBlock: 1697
2020-05-11 21:16:16,638 DEBUG org.apache.hadoop.hdfs.DataStreamer - DFSClient seqno: 1 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
2020-05-11 21:16:16,638 DEBUG org.apache.hadoop.hdfs.DataStreamer - Closing old block BP-1519523618-98.94.65.144-1581106168138:blk_1073745139_4315
2020-05-11 21:16:16,641 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop sending #70 org.apache.hadoop.hdfs.protocol.ClientProtocol.complete
2020-05-11 21:16:16,643 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop got value #70
2020-05-11 21:16:16,643 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: complete took 2ms
2020-05-11 21:16:16,643 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop sending #71 org.apache.hadoop.hdfs.protocol.ClientProtocol.setTimes
2020-05-11 21:16:16,645 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop got value #71
2020-05-11 21:16:16,645 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: setTimes took 2ms
2020-05-11 21:16:16,647 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop sending #72 org.apache.hadoop.hdfs.protocol.ClientProtocol.setPermission
2020-05-11 21:16:16,648 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop got value #72
2020-05-11 21:16:16,648 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: setPermission took 2ms
2020-05-11 21:16:16,654 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor - Application Master start command: $JAVA_HOME/bin/java -Xmx424m "-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=${FLINK_LOG_PREFIX}.jit -XX:+PrintAssembly" -Dlog.file="<LOG_DIR>/jobmanager.log" -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint  1> <LOG_DIR>/jobmanager.out 2> <LOG_DIR>/jobmanager.err
2020-05-11 21:16:16,654 DEBUG org.apache.hadoop.ipc.Client - stopping client from cache: org.apache.hadoop.ipc.Client@28194a50
2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setApplicationTags.
2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setAttemptFailuresValidityInterval.
2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setKeepContainersAcrossApplicationAttempts.
2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setNodeLabelExpression.
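
The start command above passes all the -XX options inside one pair of double
quotes. In a shell, a quoted string is a single argument, so the JVM would
receive one long option it cannot parse, which would be consistent with the
java usage message reported elsewhere in this thread. A minimal sketch of the
difference, using a hypothetical count_args helper instead of a real java
invocation:

```shell
# Hypothetical helper: print how many arguments it received.
count_args() {
  echo "$#"
}

# Quoted, as in the logged start command: the shell passes ONE argument.
count_args "-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation"

# Unquoted: the shell splits on spaces and passes THREE arguments.
count_args -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation
```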

On Mon, May 11, 2020 at 10:11 PM, Xintong Song wrote:

> Hi Jacky,
>
> Could you search for "Application Master start command:" in the debug log
> and post the result and a few lines before & after that? This is not
> included in the clip of attached log file.
>
> Thank you~
>
> Xintong Song
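
One way to pull out that line with some context is grep with the -B and -A
options. The helper name show_start_command is made up for illustration, and
the log file path is left to the caller, since the thread does not say where
the session log is written:

```shell
# Hypothetical helper: print the start command line from the given log file,
# together with the 3 lines before and the 3 lines after it.
show_start_command() {
  grep -B 3 -A 3 "Application Master start command:" "$1"
}
```

It would be called as show_start_command followed by the path of the yarn
session log file.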

Re: Flink Memory analyze on AWS EMR

2020-05-11 Thread Jacky D
Hi Robert,

Thanks so much for the quick reply. I changed the log level to DEBUG and
attached the log file.

Thanks
Jacky

On Mon, May 11, 2020 at 4:14 PM, Robert Metzger wrote:

> Thanks a lot for posting the full output.
>
> It seems that Flink is passing an invalid list of arguments to the JVM.
> Can you
> - set the root log level in conf/log4j-yarn-session.properties to DEBUG
> - then launch the YARN session
> - share the log file of the yarn session on the mailing list?
>
> I'm particularly interested in the line printed here, as it shows the JVM
> invocation.
>
> https://github.com/apache/flink/blob/release-1.6/flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java#L1630

Re: Flink Memory analyze on AWS EMR

2020-05-11 Thread Jacky D
Hi Robert,

Yes, I retrieved more log output from the YARN UI; the full logs are
shown below. This happens when I try to create a Flink YARN session on
EMR with the JITWatch configuration set up.

2020-05-11 19:06:09,552 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli - Error while running the Flink Yarn session.
java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1862)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:813)
Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:429)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:610)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:813)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    ... 2 more
Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1584459865196_0165 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1584459865196_0165_01 exited with  exitCode: 1
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1584459865196_0165_01_01
Exit code: 1
Exception message: Usage: java [-options] class [args...]
           (to execute a class)
   or  java [-options] -jar jarfile [args...]
           (to execute a jar file)
where options include:
    -d32          use a 32-bit data model if available
    -d64          use a 64-bit data model if available
    -server       to select the "server" VM
                  The default VM is server,
                  because you are running on a server-class machine.


    -cp <class search path of directories and zip/jar files>
    -classpath <class search path of directories and zip/jar files>
                  A : separated list of directories, JAR archives,
                  and ZIP archives to search for class files.
    -D<name>=<value>
                  set a system property
    -verbose:[class|gc|jni]
                  enable verbose output
    -version      print product version and exit
    -version:<value>
                  Warning: this feature is deprecated and will be removed
                  in a future release.
                  require the specified version to run
    -showversion  print product version and continue
    -jre-restrict-search | -no-jre-restrict-search
                  Warning: this feature is deprecated and will be removed
                  in a future release.
                  include/exclude user private JREs in the version search
    -? -help      print this help message
    -X            print help on non-standard options
    -ea[:<packagename>...|:<classname>]
    -enableassertions[:<packagename>...|:<classname>]
                  enable assertions with specified granularity
    -da[:<packagename>...|:<classname>]
    -disableassertions[:<packagename>...|:<classname>]
                  disable assertions with specified granularity
    -esa | -enablesystemassertions
                  enable system assertions
    -dsa | -disablesystemassertions
                  disable system assertions
    -agentlib:<libname>[=<options>]
                  load native agent library <libname>, e.g. -agentlib:hprof
                  see also, -agentlib:jdwp=help and -agentlib:hprof=help
    -agentpath:<pathname>[=<options>]
                  load native agent library by full pathname
    -javaagent:<jarpath>[=<options>]
                  load Java programming language agent, see java.lang.instrument
    -splash:<imagepath>
                  show splash screen with specified image
See http://www.oracle.com/technetwork/java/javase/documentation/index.html for more details.

Thanks
Jacky

On Mon, May 11, 2020 at 3:42 PM, Robert Metzger wrote:

> Hey Jacky,
>
> The error says "The YARN application unexpectedly switched to state FAILED
> during deployment.".
> Have you tried retrieving the YARN application logs?
> Does the YARN UI / resource manager logs reveal anything on the reason for
> the deployment to fail?
>
> Best,
> Robert

Fwd: Flink Memory analyze on AWS EMR

2020-05-11 Thread Jacky D
---------- Forwarded message ---------
From: Jacky D
Date: Mon, May 11, 2020 at 3:12 PM
Subject: Re: Flink Memory analyze on AWS EMR
To: Khachatryan Roman


Hi Roman,

Thanks for the quick response. I tried without the LogFile option, but it
failed with the same error. I'm currently using Flink 1.6
https://ci.apache.org/projects/flink/flink-docs-release-1.6/monitoring/application_profiling.html,
so I can only use JITWatch or JMC. I guess those tools are only available on
a standalone cluster, since the documentation says: "Each standalone JobManager,
TaskManager, HistoryServer, and ZooKeeper daemon redirects stdout and stderr to
a file with a .out filename suffix and writes internal logging to a file
with a .log suffix. Java options configured by the user in env.java.opts"?

Thanks
Jacky


Flink Memory analyze on AWS EMR

2020-05-11 Thread Jacky D
Hi all,

I'm encountering a memory issue with my Flink job on AWS EMR (current Flink
version 1.6.2). I would like to find the root cause, so I'm trying JITWatch.
It works on my local standalone cluster, but I cannot use it on EMR. After I
add the following config to flink-conf.yaml:

env.java.opts: "-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=${FLINK_LOG_PREFIX}.jit -XX:+PrintAssembly"

I get this error:

2020-05-07 16:24:53,368 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli - Error while running the Flink Yarn session.
java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1862)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:813)
Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:429)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:610)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:813)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    ... 2 more
Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.

How can I fix this issue to enable JITWatch, or which tool would be a proper
way to analyze a Flink memory dump on EMR?

Thanks
Jacky Du