[jira] [Closed] (AIRAVATA-2738) Experiments are not actually LAUNCHED from orchestrator and not in zookeeper queue

2019-05-02 Thread Dimuthu Upeksha (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRAVATA-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dimuthu Upeksha closed AIRAVATA-2738.
-
Resolution: Fixed

> Experiments are not actually LAUNCHED from orchestrator and not in zookeeper 
> queue
> --
>
> Key: AIRAVATA-2738
> URL: https://issues.apache.org/jira/browse/AIRAVATA-2738
> Project: Airavata
> Issue Type: Bug
> Components: helix implementation
> Affects Versions: 0.18
> Reporter: Eroma
> Assignee: Dimuthu Upeksha
> Priority: Major
> Fix For: 0.18
>
>
> When an experiment is launched, its status changes to LAUNCHED, but the 
> process ID of the experiment is never added to the ZooKeeper queue, so 
> Helix does not process it any further. The orchestrator was unable to 
> connect to ZooKeeper and could not add the ID to the queue, yet no errors 
> appeared in the orchestrator log.
>  
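
One way to avoid the silent failure described above is to set the status only after the enqueue call has returned, and to log and re-raise when it has not. The sketch below is Python for brevity. It is not Airavata's code and not the fix applied to this issue; `launch`, `enqueue`, `set_status`, `QueueUnavailable`, and the FAILED status are hypothetical names used for illustration.

```python
import logging

logger = logging.getLogger("orchestrator")


class QueueUnavailable(Exception):
    """Raised when a process ID could not be added to the queue (hypothetical)."""


def launch(process_id, enqueue, set_status):
    """Enqueue first; mark LAUNCHED only if the enqueue call succeeded.

    `enqueue` and `set_status` are caller-supplied callables standing in for
    the real queue client and the status store.
    """
    try:
        enqueue(process_id)
    except Exception as exc:
        # Make the failure visible instead of leaving the log empty.
        logger.error("could not enqueue %s: %s", process_id, exc)
        set_status(process_id, "FAILED")
        raise QueueUnavailable(process_id) from exc
    set_status(process_id, "LAUNCHED")
```

With this ordering an experiment can never show LAUNCHED while its process ID is missing from the queue, and a connection failure always leaves a log entry.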



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRAVATA-2738) Experiments are not actually LAUNCHED from orchestrator and not in zookeeper queue

2019-05-02 Thread Dimuthu Upeksha (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRAVATA-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831962#comment-16831962
 ] 

Dimuthu Upeksha commented on AIRAVATA-2738:
---

Fixed by moving the ZooKeeper-level metadata storage to the database.

> Experiments are not actually LAUNCHED from orchestrator and not in zookeeper 
> queue
> --
>
> Key: AIRAVATA-2738
> URL: https://issues.apache.org/jira/browse/AIRAVATA-2738
> Project: Airavata
> Issue Type: Bug
> Components: helix implementation
> Affects Versions: 0.18
> Reporter: Eroma
> Assignee: Dimuthu Upeksha
> Priority: Major
> Fix For: 0.18
>
>
> When an experiment is launched, its status changes to LAUNCHED, but the 
> process ID of the experiment is never added to the ZooKeeper queue, so 
> Helix does not process it any further. The orchestrator was unable to 
> connect to ZooKeeper and could not add the ID to the queue, yet no errors 
> appeared in the orchestrator log.
>  





[jira] [Commented] (AIRAVATA-2815) First experiment fails after API server restart

2019-05-02 Thread Dimuthu Upeksha (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRAVATA-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831959#comment-16831959
 ] 

Dimuthu Upeksha commented on AIRAVATA-2815:
---

Fixed in

[https://github.com/apache/airavata/commit/26d3f1a668adf9d069a46f434d461cea4eb23490]

[https://github.com/apache/airavata/commit/feea5203dfb4fdd70caa994794b9bbb15b2ccd8d]

[https://github.com/apache/airavata/commit/55b3dd6b9f958871288be7db482a885b41c09503]

[https://github.com/apache/airavata/commit/82c57c7d637be78a16ab4f954a54ce70d56e2f12]

> First experiment fails after API server restart
> ---
>
> Key: AIRAVATA-2815
> URL: https://issues.apache.org/jira/browse/AIRAVATA-2815
> Project: Airavata
> Issue Type: Bug
> Components: Airavata API, helix implementation, Registry API
> Affects Versions: 0.18
> Environment: https://staging.ultrascan.scigap.org/home
> Reporter: Eroma
> Assignee: Dimuthu Upeksha
> Priority: Major
> Fix For: 0.18
>
>
> After an API server restart, the connections to the Helix servers drop. As a 
> result, the very first experiment doesn't move beyond LAUNCHED and fails in 
> the back end.
> Helix should have a way of re-establishing the link and reprocessing the 
> experiment rather than failing it. 
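
The behavior requested above (re-establish the link, then reprocess instead of failing) can be sketched as a retry loop. This is a minimal Python illustration, not the change made in the commits linked in this thread; `submit_with_reconnect`, `submit`, `reconnect`, and the retry parameters are hypothetical.

```python
import time


def submit_with_reconnect(submit, reconnect, attempts=3, delay_seconds=0.0):
    """Call `submit`; on a connection error, reconnect and try again.

    `submit` and `reconnect` are caller-supplied callables. The last
    connection error is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return submit()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts:
                # Re-establish the dropped link before the next attempt.
                reconnect()
                time.sleep(delay_seconds)
    raise last_error
```

Only `ConnectionError` is retried here, so failures unrelated to the dropped link still surface immediately.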





[jira] [Commented] (AIRAVATA-2884) Unusual delay in helix job submission

2019-05-02 Thread Dimuthu Upeksha (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRAVATA-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831956#comment-16831956
 ] 

Dimuthu Upeksha commented on AIRAVATA-2884:
---

Fixed in the new stack, in 
[https://github.com/apache/airavata/commit/27eb5129e76dd8d0be7992a8c6b099314d1f5b7e]

This occurred due to a bug in the task creation logic: different tasks were 
getting the same ID and eventually stacked up in the Helix queues.
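
To illustrate the class of bug described above, here is a minimal Python sketch of per-task ID generation plus a duplicate check. It is not the code from the linked commit; `make_task_id`, `find_duplicate_ids`, and the ID layout are assumptions made for the example.

```python
import uuid
from collections import Counter


def make_task_id(process_id, task_type):
    """Build an ID that is unique per task, not just per process or type."""
    return f"{process_id}-{task_type}-{uuid.uuid4()}"


def find_duplicate_ids(task_ids):
    """Return the IDs that occur more than once, sorted."""
    counts = Counter(task_ids)
    return sorted(task_id for task_id, n in counts.items() if n > 1)
```

Running `find_duplicate_ids` over the IDs of a workflow before it is queued is a cheap guard against two tasks sharing an ID.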

> Unusual delay in helix job submission
> -
>
> Key: AIRAVATA-2884
> URL: https://issues.apache.org/jira/browse/AIRAVATA-2884
> Project: Airavata
> Issue Type: Bug
> Components: helix implementation
> Affects Versions: 0.18
> Environment: https://staging.seagrid.org
> Reporter: Eroma
> Assignee: Dimuthu Upeksha
> Priority: Major
> Fix For: 0.18
>
> Attachments: Screen Shot 2018-09-25 at 2.03.59 PM.png
>
>
> # Unusual delay in job submission from Helix.
> # The job submission happened 8, 10, etc. minutes after the experiment 
> creation.
> # Some of the IDs to check:
> ## SLM001-Gaussian-Carbonate:9_3d5f55c2-c3bf-47f2-939a-5c35585f12bb
> ## SLM001-NEK5000-BR2:9_51ba9624-0db9-4e07-8efc-8d224a71081e
> # Some were created a long time ago and the job is still not submitted:
> ## SLM001-NEK5000-BR2:9_08ccf14f-34d9-468c-ac9f-6ffe434cbef7
> ## SLM001-NEK5000-BR2:8_cff5e293-4a84-4835-bbce-fce10acaa254





[jira] [Closed] (AIRAVATA-2884) Unusual delay in helix job submission

2019-05-02 Thread Dimuthu Upeksha (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRAVATA-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dimuthu Upeksha closed AIRAVATA-2884.
-
Resolution: Fixed

> Unusual delay in helix job submission
> -
>
> Key: AIRAVATA-2884
> URL: https://issues.apache.org/jira/browse/AIRAVATA-2884
> Project: Airavata
> Issue Type: Bug
> Components: helix implementation
> Affects Versions: 0.18
> Environment: https://staging.seagrid.org
> Reporter: Eroma
> Assignee: Dimuthu Upeksha
> Priority: Major
> Fix For: 0.18
>
> Attachments: Screen Shot 2018-09-25 at 2.03.59 PM.png
>
>
> # Unusual delay in job submission from Helix.
> # The job submission happened 8, 10, etc. minutes after the experiment 
> creation.
> # Some of the IDs to check:
> ## SLM001-Gaussian-Carbonate:9_3d5f55c2-c3bf-47f2-939a-5c35585f12bb
> ## SLM001-NEK5000-BR2:9_51ba9624-0db9-4e07-8efc-8d224a71081e
> # Some were created a long time ago and the job is still not submitted:
> ## SLM001-NEK5000-BR2:9_08ccf14f-34d9-468c-ac9f-6ffe434cbef7
> ## SLM001-NEK5000-BR2:8_cff5e293-4a84-4835-bbce-fce10acaa254





[jira] [Closed] (AIRAVATA-2815) First experiment fails after API server restart

2019-05-02 Thread Dimuthu Upeksha (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRAVATA-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dimuthu Upeksha closed AIRAVATA-2815.
-
Resolution: Fixed

> First experiment fails after API server restart
> ---
>
> Key: AIRAVATA-2815
> URL: https://issues.apache.org/jira/browse/AIRAVATA-2815
> Project: Airavata
> Issue Type: Bug
> Components: Airavata API, helix implementation, Registry API
> Affects Versions: 0.18
> Environment: https://staging.ultrascan.scigap.org/home
> Reporter: Eroma
> Assignee: Dimuthu Upeksha
> Priority: Major
> Fix For: 0.18
>
>
> After an API server restart, the connections to the Helix servers drop. As a 
> result, the very first experiment doesn't move beyond LAUNCHED and fails in 
> the back end.
> Helix should have a way of re-establishing the link and reprocessing the 
> experiment rather than failing it. 





[jira] [Closed] (AIRAVATA-2205) Conflicting Loggers

2019-05-02 Thread Dimuthu Upeksha (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRAVATA-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dimuthu Upeksha closed AIRAVATA-2205.
-
Resolution: Fixed

> Conflicting Loggers 
> 
>
> Key: AIRAVATA-2205
> URL: https://issues.apache.org/jira/browse/AIRAVATA-2205
> Project: Airavata
> Issue Type: Bug
> Reporter: Ajinkya
> Priority: Critical
>
> slf4j-log4j12-1.7.10.jar and log4j-1.2.17.jar need to be removed from the lib 
> directory.
> These jars conflict with the new logback integration.
> These files were removed during the new logging implementation but were 
> included again in later commits. 
> Basically, the server won't start with these two jars in the lib directory.  
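
A packaging check for the two jars named above could look like the following Python sketch. It is an illustration only, not part of the Airavata build; `conflicting_jars`, `scan_lib_dir`, and the filename patterns are assumptions (the patterns match any version of the two artifacts, not only the versions listed in the issue).

```python
import re
from pathlib import Path

# Jar name patterns for the two artifacts reported as conflicting with
# logback. The version-agnostic matching is an assumption of this sketch.
CONFLICTING_PATTERNS = (
    re.compile(r"^slf4j-log4j12-.*\.jar$"),
    re.compile(r"^log4j-\d.*\.jar$"),
)


def conflicting_jars(names):
    """Return the file names that match a conflicting pattern, sorted."""
    return sorted(
        name
        for name in names
        if any(pattern.match(name) for pattern in CONFLICTING_PATTERNS)
    )


def scan_lib_dir(lib_dir):
    """Return the conflicting jar names found directly inside `lib_dir`."""
    return conflicting_jars(
        path.name for path in Path(lib_dir).iterdir() if path.is_file()
    )
```

A build or startup script could call `scan_lib_dir` on the distribution's lib directory and abort when the result is non-empty, which would catch the jars being reintroduced by a later commit.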





[jira] [Commented] (AIRAVATA-2205) Conflicting Loggers

2019-05-02 Thread Dimuthu Upeksha (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRAVATA-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831879#comment-16831879
 ] 

Dimuthu Upeksha commented on AIRAVATA-2205:
---

Fixed in the latest distributions, so closing.

> Conflicting Loggers 
> 
>
> Key: AIRAVATA-2205
> URL: https://issues.apache.org/jira/browse/AIRAVATA-2205
> Project: Airavata
> Issue Type: Bug
> Reporter: Ajinkya
> Priority: Critical
>
> slf4j-log4j12-1.7.10.jar and log4j-1.2.17.jar need to be removed from the lib 
> directory.
> These jars conflict with the new logback integration.
> These files were removed during the new logging implementation but were 
> included again in later commits. 
> Basically, the server won't start with these two jars in the lib directory.  





[jira] [Commented] (AIRAVATA-1904) ARCHIVE did not happen in recovery

2019-05-02 Thread Dimuthu Upeksha (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRAVATA-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831876#comment-16831876
 ] 

Dimuthu Upeksha commented on AIRAVATA-1904:
---

[~eroma_a] Do we still need this ticket? Can you verify the behavior with the 
new stack?

> ARCHIVE did not happen in recovery
> --
>
> Key: AIRAVATA-1904
> URL: https://issues.apache.org/jira/browse/AIRAVATA-1904
> Project: Airavata
> Issue Type: Bug
> Components: GFac, PGA PHP Web Gateway
> Affects Versions: 0.16
> Environment: dev.seagrid.org
> Reporter: Eroma
> Assignee: Suresh Marru
> Priority: Critical
> Fix For: 0.18
>
>
> 1. I submitted an Amber job to Comet, and while it was active on Comet I 
> stopped GFac. 
> 2. I started GFac again while the job was in the running state. Now the 
> experiment is completed, but ARCHIVE did not happen. 
> 3. ARCHIVE does not exist in the storage location either: 
> gateway-user-data/dev-seagrid/Eroma2016/March_14th_2016/SLM3_AmberSander_Comet1457983469/PROCESS_b4b8f7ce-801d-403b-a286-d7755429eb84
> 4. This is the amber_sander application, and the experiment ID is 
> SLM3-AmberSander-Comet_59a5c095-73ba-4dd9-8d13-19709f6fa474


