[jira] [Closed] (AIRAVATA-2738) Experiments are not actually LAUNCHED from orchestrator and not in zookeeper queue
[ https://issues.apache.org/jira/browse/AIRAVATA-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dimuthu Upeksha closed AIRAVATA-2738.
-------------------------------------
    Resolution: Fixed

> Experiments are not actually LAUNCHED from orchestrator and not in zookeeper queue
> ----------------------------------------------------------------------------------
>
> Key: AIRAVATA-2738
> URL: https://issues.apache.org/jira/browse/AIRAVATA-2738
> Project: Airavata
> Issue Type: Bug
> Components: helix implementation
> Affects Versions: 0.18
> Reporter: Eroma
> Assignee: Dimuthu Upeksha
> Priority: Major
> Fix For: 0.18
>
> When experiments are launched, the experiment status is changed to LAUNCHED.
> But the PROCESS ID of the experiment is not actually added to the ZooKeeper
> queue, and hence it is not processed further by Helix. The orchestrator was
> unable to connect to ZooKeeper and could not add the ID to the queue, and
> there were no errors in the orchestrator log either.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRAVATA-2738) Experiments are not actually LAUNCHED from orchestrator and not in zookeeper queue
[ https://issues.apache.org/jira/browse/AIRAVATA-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831962#comment-16831962 ]

Dimuthu Upeksha commented on AIRAVATA-2738:
-------------------------------------------

Fixed by moving zk level metadata storage to database
[jira] [Commented] (AIRAVATA-2815) First experiment fails after API server restart
[ https://issues.apache.org/jira/browse/AIRAVATA-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831959#comment-16831959 ]

Dimuthu Upeksha commented on AIRAVATA-2815:
-------------------------------------------

Fixed in
[https://github.com/apache/airavata/commit/26d3f1a668adf9d069a46f434d461cea4eb23490]
[https://github.com/apache/airavata/commit/feea5203dfb4fdd70caa994794b9bbb15b2ccd8d]
[https://github.com/apache/airavata/commit/55b3dd6b9f958871288be7db482a885b41c09503]
[https://github.com/apache/airavata/commit/82c57c7d637be78a16ab4f954a54ce70d56e2f12]

> First experiment fails after API server restart
> -----------------------------------------------
>
> Key: AIRAVATA-2815
> URL: https://issues.apache.org/jira/browse/AIRAVATA-2815
> Project: Airavata
> Issue Type: Bug
> Components: Airavata API, helix implementation, Registry API
> Affects Versions: 0.18
> Environment: https://staging.ultrascan.scigap.org/home
> Reporter: Eroma
> Assignee: Dimuthu Upeksha
> Priority: Major
> Fix For: 0.18
>
> After an API server restart, the connections to the Helix servers drop. As a
> result, the very first experiment doesn't move beyond LAUNCHED and it fails in
> the back end.
> Helix should have a way of establishing the link and having the experiment
> reprocessed rather than failing.
[jira] [Commented] (AIRAVATA-2884) Unusual delay in helix job submission
[ https://issues.apache.org/jira/browse/AIRAVATA-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831956#comment-16831956 ]

Dimuthu Upeksha commented on AIRAVATA-2884:
-------------------------------------------

Fixed in new stack.
Fixed in [https://github.com/apache/airavata/commit/27eb5129e76dd8d0be7992a8c6b099314d1f5b7e]
This occurred due to a bug in the task creation logic, where different tasks
were getting the same ID and eventually got stacked up in the Helix queues.

> Unusual delay in helix job submission
> -------------------------------------
>
> Key: AIRAVATA-2884
> URL: https://issues.apache.org/jira/browse/AIRAVATA-2884
> Project: Airavata
> Issue Type: Bug
> Components: helix implementation
> Affects Versions: 0.18
> Environment: https://staging.seagrid.org
> Reporter: Eroma
> Assignee: Dimuthu Upeksha
> Priority: Major
> Fix For: 0.18
>
> Attachments: Screen Shot 2018-09-25 at 2.03.59 PM.png
>
> # Unusual delay in job submission from Helix.
> # The job submission happened 8, 10, etc. minutes after the experiment creation.
> # Some of the IDs to check:
> ## SLM001-Gaussian-Carbonate:9_3d5f55c2-c3bf-47f2-939a-5c35585f12bb
> ## SLM001-NEK5000-BR2:9_51ba9624-0db9-4e07-8efc-8d224a71081e
> # There are some which were created a long time ago and whose job is still not submitted:
> ## SLM001-NEK5000-BR2:9_08ccf14f-34d9-468c-ac9f-6ffe434cbef7
> ## SLM001-NEK5000-BR2:8_cff5e293-4a84-4835-bbce-fce10acaa254
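The comment above attributes the delay to different tasks receiving the same ID. The following is only a minimal sketch of that failure mode and one common way to avoid it. It is not the code from the linked commit; the class `TaskIds`, its methods, and the UUID-suffix scheme are hypothetical names and choices made for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

/** Hypothetical helper for illustration; not Airavata's actual task-creation code. */
final class TaskIds {
    private TaskIds() {}

    /** Builds an ID that is unique per task by appending a random UUID to the task type. */
    static String newTaskId(String taskType) {
        return taskType + "-" + UUID.randomUUID();
    }

    /** Returns the IDs that occur more than once in the given list, i.e. the collisions. */
    static Set<String> duplicates(List<String> ids) {
        Set<String> seen = new HashSet<>();
        Set<String> dupes = new HashSet<>();
        for (String id : ids) {
            // Set.add returns false when the ID was already present.
            if (!seen.add(id)) {
                dupes.add(id);
            }
        }
        return dupes;
    }
}
```

A check like `TaskIds.duplicates(...)` run over the task IDs of a workflow before submission would surface a collision early instead of letting the tasks pile up in a queue.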
[jira] [Closed] (AIRAVATA-2884) Unusual delay in helix job submission
[ https://issues.apache.org/jira/browse/AIRAVATA-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dimuthu Upeksha closed AIRAVATA-2884.
-------------------------------------
    Resolution: Fixed
[jira] [Closed] (AIRAVATA-2815) First experiment fails after API server restart
[ https://issues.apache.org/jira/browse/AIRAVATA-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dimuthu Upeksha closed AIRAVATA-2815.
-------------------------------------
    Resolution: Fixed
[jira] [Closed] (AIRAVATA-2205) Conflicting Loggers
[ https://issues.apache.org/jira/browse/AIRAVATA-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dimuthu Upeksha closed AIRAVATA-2205.
-------------------------------------
    Resolution: Fixed

> Conflicting Loggers
> -------------------
>
> Key: AIRAVATA-2205
> URL: https://issues.apache.org/jira/browse/AIRAVATA-2205
> Project: Airavata
> Issue Type: Bug
> Reporter: Ajinkya
> Priority: Critical
>
> slf4j-log4j12-1.7.10.jar and log4j-1.2.17.jar need to be removed from the lib
> directory.
> These jars conflict with the new logback integration.
> These files were removed during the new logging implementation but were
> included in later commits.
> Basically, the server won't start with these two jars in the lib directory.
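As a rough illustration of the check described in the issue above, the sketch below scans a directory for jars whose names match the two conflicting artifacts. The class `ConflictingLoggerJars` and the file-name patterns are assumptions made for this example; nothing like it is claimed to exist in the Airavata code base, and the directory to scan is whatever lib directory the deployment uses.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for illustration; not part of Airavata. */
final class ConflictingLoggerJars {
    private ConflictingLoggerJars() {}

    /** True for file names such as slf4j-log4j12-1.7.10.jar or log4j-1.2.17.jar. */
    static boolean isConflicting(String fileName) {
        return fileName.matches("slf4j-log4j12-.*\\.jar")
                || fileName.matches("log4j-1\\..*\\.jar");
    }

    /** Returns the sorted names of conflicting jars found directly in libDir. */
    static List<String> find(Path libDir) throws IOException {
        List<String> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                String name = jar.getFileName().toString();
                if (isConflicting(name)) {
                    hits.add(name);
                }
            }
        }
        Collections.sort(hits);
        return hits;
    }
}
```

Running `ConflictingLoggerJars.find(...)` against the distribution's lib directory as part of a build or startup check would flag the jars if a later commit brought them back.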
[jira] [Commented] (AIRAVATA-2205) Conflicting Loggers
[ https://issues.apache.org/jira/browse/AIRAVATA-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831879#comment-16831879 ]

Dimuthu Upeksha commented on AIRAVATA-2205:
-------------------------------------------

Fixed in latest distributions. So closing
[jira] [Commented] (AIRAVATA-1904) ARCHIVE did not happen in recovery
[ https://issues.apache.org/jira/browse/AIRAVATA-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831876#comment-16831876 ]

Dimuthu Upeksha commented on AIRAVATA-1904:
-------------------------------------------

[~eroma_a] Do we still need this ticket? Can you verify the behavior with the new stack?

> ARCHIVE did not happen in recovery
> ----------------------------------
>
> Key: AIRAVATA-1904
> URL: https://issues.apache.org/jira/browse/AIRAVATA-1904
> Project: Airavata
> Issue Type: Bug
> Components: GFac, PGA PHP Web Gateway
> Affects Versions: 0.16
> Environment: dev.seagrid.org
> Reporter: Eroma
> Assignee: Suresh Marru
> Priority: Critical
> Fix For: 0.18
>
> 1. I submitted an Amber job to Comet, and while it was active on Comet I
> stopped GFac.
> 2. I started GFac again while the job was in the running state. Now the
> experiment is completed but ARCHIVE did not happen.
> 3. ARCHIVE does not exist in the storage location either:
> gateway-user-data/dev-seagrid/Eroma2016/March_14th_2016/SLM3_AmberSander_Comet1457983469/PROCESS_b4b8f7ce-801d-403b-a286-d7755429eb84
> 4. This is the amber_sander application and the experiment ID is
> SLM3-AmberSander-Comet_59a5c095-73ba-4dd9-8d13-19709f6fa474