[ 
https://issues.apache.org/jira/browse/APEXCORE-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684250#comment-15684250
 ] 

Tushar Gosavi commented on APEXCORE-575:
----------------------------------------

In my tests, relaunching an application with 2 GB of checkpointed state (130 
operators) took around 2 minutes. Most of that time (106 seconds) was spent 
copying this state.

{code}
241.3 K  723.8 K  datatorrent/apps/application_1478656869152_0327/bval-jsr303-0.5.jar
2.0 G    5.9 G    datatorrent/apps/application_1478656869152_0327/checkpoints
228.4 K  685.1 K  datatorrent/apps/application_1478656869152_0327/commons-beanutils-1.9.2.jar
{code}

{code}
16/11/21 01:26:34 INFO stram.StramClient: Restart from hdfs://node18.morado.com:8020/user/tushar/datatorrent/apps/application_1478656869152_0327
16/11/21 01:26:35 INFO stram.FSRecoveryHandler: Creating hdfs://node18.morado.com:8020/user/tushar/datatorrent/apps/application_1478656869152_0330/recovery/log
16/11/21 01:28:20 INFO stram.StramClient: copy of old state took 106398 ms << Time taken
16/11/21 01:28:21 INFO stram.StramClient: Set the environment for the application master
{code}

In some cases a downstream operator keeps crashing while the upstream operator 
keeps taking checkpoints, which are not purged because of the downstream 
failures, so the checkpointed state keeps growing. A relaunch is sometimes used 
to recover from this situation, and copying the old application files could 
delay it.

For a file-based storage agent, we could avoid the copy by keeping a reference 
to the old checkpoint directory and using it for reading only, while writing new 
checkpoints to the new application directory.
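
The idea can be sketched roughly as follows. This is only an illustration 
and not the Apex StorageAgent API: the class name FallbackCheckpointStore, the 
"operatorId_hexWindowId" file naming, and the use of java.nio.file in place of 
the Hadoop FileSystem are all assumptions made to keep the example 
self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/**
 * Hypothetical sketch (not the Apex StorageAgent API): reads fall back to the
 * previous application's checkpoint directory, writes go only to the new one.
 */
class FallbackCheckpointStore {
    private final Path newDir; // checkpoints written by the relaunched application
    private final Path oldDir; // previous application's checkpoints, read-only

    FallbackCheckpointStore(Path newDir, Path oldDir) {
        this.newDir = newDir;
        this.oldDir = oldDir;
    }

    // Assumed file layout: one file per (operator, window) pair.
    private static String fileName(int operatorId, long windowId) {
        return operatorId + "_" + Long.toHexString(windowId);
    }

    /** Saves a checkpoint; always writes under the new application directory. */
    void save(int operatorId, long windowId, byte[] state) {
        try {
            Files.createDirectories(newDir);
            Files.write(newDir.resolve(fileName(operatorId, windowId)), state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads a checkpoint, preferring the new directory over the old one. */
    byte[] load(int operatorId, long windowId) {
        String name = fileName(operatorId, windowId);
        Path current = newDir.resolve(name);
        Path source = Files.exists(current) ? current : oldDir.resolve(name);
        try {
            return Files.readAllBytes(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demonstration with temporary directories standing in for the app directories. */
    static String demo() {
        try {
            Path root = Files.createTempDirectory("relaunch-demo");
            Path oldDir = Files.createDirectories(root.resolve("app_old/checkpoints"));
            Path newDir = root.resolve("app_new/checkpoints");
            // Checkpoint left behind by the old application (operator 1, window 0xa).
            Files.write(oldDir.resolve("1_a"), "old-state".getBytes(StandardCharsets.UTF_8));

            FallbackCheckpointStore store = new FallbackCheckpointStore(newDir, oldDir);
            // Recovery reads the old checkpoint in place; nothing is copied.
            String restored = new String(store.load(1, 10L), StandardCharsets.UTF_8);
            // New checkpoints land in the new application directory only.
            store.save(1, 11L, "new-state".getBytes(StandardCharsets.UTF_8));
            String fresh = new String(store.load(1, 11L), StandardCharsets.UTF_8);

            long oldFiles;
            try (Stream<Path> files = Files.list(oldDir)) {
                oldFiles = files.count(); // old directory must remain untouched
            }
            return restored + " " + fresh + " " + oldFiles;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With such a scheme the relaunch cost no longer depends on the size of the old 
checkpoints, but the old application directory has to be kept until no 
operator still needs to recover from it.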


> Improve application relaunch time.
> ----------------------------------
>
>                 Key: APEXCORE-575
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-575
>             Project: Apache Apex Core
>          Issue Type: Improvement
>            Reporter: Tushar Gosavi
>            Assignee: Tushar Gosavi
>
> Improve application relaunch time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
