[jira] [Created] (AIRAVATA-2941) Experiments fail to submit jobs to HPC cluster queues due to queue reaching the max job limit per user.
Eroma created AIRAVATA-2941:
-------------------------------

             Summary: Experiments fail to submit jobs to HPC cluster queues due to queue reaching the max job limit per user.
                 Key: AIRAVATA-2941
                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2941
             Project: Airavata
          Issue Type: Bug
          Components: GFac, helix implementation
    Affects Versions: 0.18
         Environment: https://staging.ultrascan.scigap.org & https://ultrascan.scigap.org/
            Reporter: Eroma
            Assignee: Dimuthu Upeksha
             Fix For: 0.18


Currently, experiments fail as follows:
# The HPC queue reaches the maximum number of jobs allowed for the queue.
# When the job submission fails and the HPC cluster returns a job submission response [1], Airavata tags the experiment as FAILED.
# The only option for the gateway user is to submit the experiment again.

The required fix is for Airavata to hold such experiments, in internal queues or by some other mechanism, until the HPC queue can accept jobs again, instead of marking the experiment as FAILED.

[1] This example is from Stampede2:
- Welcome to the Stampede2 Supercomputer -
No reservation for this job
--> Verifying valid submit host (login3)...OK
--> Verifying valid jobname...OK
--> Enforcing max jobs per user...FAILED
[*] Too many simultaneous jobs in queue.
--> Max job limits for us3 = 50 jobs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
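The behaviour requested in AIRAVATA-2941 could be sketched roughly as below. This is a minimal illustration, not Airavata's implementation: the class `HeldSubmissionQueue`, the `submitter` hook, and the `Outcome` enum are hypothetical names. The rejection text is taken from the Stampede2 response quoted in the issue, and the success text assumes a Slurm scheduler whose `sbatch` prints "Submitted batch job <id>".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Function;

/** Illustrative only: holds experiments rejected by a per-user queue limit instead of failing them. */
final class HeldSubmissionQueue {
    enum Outcome { SUBMITTED, HELD, FAILED }

    // Text taken from the Stampede2 response quoted in the issue.
    static final String LIMIT_MARKER = "Too many simultaneous jobs in queue";
    // Assumption: the scheduler is Slurm, whose sbatch reports success this way.
    static final String ACCEPTED_MARKER = "Submitted batch job";

    private final Deque<String> held = new ArrayDeque<>();
    // Hypothetical hook: maps an experiment id to the raw scheduler response.
    private final Function<String, String> submitter;

    HeldSubmissionQueue(Function<String, String> submitter) {
        this.submitter = submitter;
    }

    static boolean isQueueLimitRejection(String response) {
        return response != null && response.contains(LIMIT_MARKER);
    }

    /** Submits once; a queue-limit rejection parks the experiment instead of failing it. */
    Outcome submit(String experimentId) {
        String response = submitter.apply(experimentId);
        if (isQueueLimitRejection(response)) {
            held.addLast(experimentId);
            return Outcome.HELD;
        }
        boolean accepted = response != null && response.contains(ACCEPTED_MARKER);
        return accepted ? Outcome.SUBMITTED : Outcome.FAILED;
    }

    /** Retries every held experiment once; those still rejected by the limit stay held. */
    int retryHeld() {
        int submitted = 0;
        int pending = held.size();
        for (int i = 0; i < pending; i++) {
            if (submit(held.pollFirst()) == Outcome.SUBMITTED) {
                submitted++;
            }
        }
        return submitted;
    }

    int heldCount() {
        return held.size();
    }

    public static void main(String[] args) {
        boolean[] queueFull = { true };
        HeldSubmissionQueue queue = new HeldSubmissionQueue(id -> queueFull[0]
                ? "[*] Too many simultaneous jobs in queue."
                : "Submitted batch job 1");
        System.out.println(queue.submit("exp-1"));
        queueFull[0] = false; // the HPC queue has drained
        System.out.println(queue.retryHeld() + " resubmitted");
    }
}
```

In a real deployment `retryHeld` would presumably run on a timer or be triggered by job-completion events, and the held list would need to be persisted. Neither is shown here.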
[jira] [Commented] (AIRAVATA-2940) Sporadic JPA errors when invoking Registry Server APIs
    [ https://issues.apache.org/jira/browse/AIRAVATA-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684215#comment-16684215 ]

Dimuthu Upeksha commented on AIRAVATA-2940:
-------------------------------------------

I still could not identify the cause of the issue, but retrying the API call returns the result. So this is fixed on the Helix side by retrying when an API call fails:
https://github.com/apache/airavata/commit/274c73ffcc226daabfbe213a27b8f10ad53dac0b

> Sporadic JPA errors when invoking Registry Server APIs
> ------------------------------------------------------
>
>                 Key: AIRAVATA-2940
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2940
>             Project: Airavata
>          Issue Type: Bug
>          Components: Registry API
>    Affects Versions: 0.17
>         Environment: staging
>            Reporter: Dimuthu Upeksha
>            Assignee: Dimuthu Upeksha
>            Priority: Major
>
> This issue occurs randomly at different registry components. It seems like a
> general JPA bug or a misuse of JPA APIs in registry code.
> 2018-11-10 18:29:28,003 [pool-10-thread-208241] ERROR o.a.a.r.c.a.c.i.ApplicationDeploymentImpl - Error while retrieving application deployment...
> org.apache.airavata.registry.cpi.AppCatalogException: org.apache.openjpa.persistence.InvalidStateException: The context has been closed. The stack trace at which the context was closed is available if Runtime=TRACE logging is enabled.
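The retry-on-failure workaround described in the comment could look roughly like the sketch below. It is not the code from the linked commit: `ApiRetry`, `callWithRetry`, and the attempt and delay parameters are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Callable;

/** Illustrative only: retries a failing call a bounded number of times. */
final class ApiRetry {
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", ie);
                }
            }
        }
        throw new IllegalStateException("Call failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = { 0 };
        // Stand-in for a registry API call that fails sporadically.
        String value = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("The context has been closed.");
            }
            return "deployment";
        }, 5, 10L);
        System.out.println(value + " after " + calls[0] + " calls");
    }
}
```

Retrying hides the sporadic failure from callers but does not address its cause, which the comment says is still unidentified.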
[jira] [Created] (AIRAVATA-2940) Sporadic JPA errors when invoking Registry Server APIs
Dimuthu Upeksha created AIRAVATA-2940:
-----------------------------------------

             Summary: Sporadic JPA errors when invoking Registry Server APIs
                 Key: AIRAVATA-2940
                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2940
             Project: Airavata
          Issue Type: Bug
          Components: Registry API
    Affects Versions: 0.17
         Environment: staging
            Reporter: Dimuthu Upeksha
            Assignee: Dimuthu Upeksha


This issue occurs randomly at different registry components. It seems like a general JPA bug or a misuse of JPA APIs in registry code.

2018-11-10 18:29:28,003 [pool-10-thread-208241] ERROR o.a.a.r.c.a.c.i.ApplicationDeploymentImpl - Error while retrieving application deployment...
org.apache.airavata.registry.cpi.AppCatalogException: org.apache.openjpa.persistence.InvalidStateException: The context has been closed. The stack trace at which the context was closed is available if Runtime=TRACE logging is enabled.
        at org.apache.airavata.registry.core.app.catalog.resources.LibraryApendPathResource.get(LibraryApendPathResource.java:214)
        at org.apache.airavata.registry.core.app.catalog.util.AppCatalogThriftConversion.getApplicationDeploymentDescription(AppCatalogThriftConversion.java:758)
        at org.apache.airavata.registry.core.app.catalog.impl.ApplicationDeploymentImpl.getApplicationDeployement(ApplicationDeploymentImpl.java:326)
        at org.apache.airavata.registry.api.service.handler.RegistryServerHandler.getApplicationDeployment(RegistryServerHandler.java:1211)
        at org.apache.airavata.registry.api.RegistryService$Processor$getApplicationDeployment.getResult(RegistryService.java:14835)
        at org.apache.airavata.registry.api.RegistryService$Processor$getApplicationDeployment.getResult(RegistryService.java:14819)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.openjpa.persistence.InvalidStateException: The context has been closed. The stack trace at which the context was closed is available if Runtime=TRACE logging is enabled.
        at org.apache.openjpa.kernel.BrokerImpl.assertOpen(BrokerImpl.java:4676)
        at org.apache.openjpa.kernel.BrokerImpl.beginOperation(BrokerImpl.java:1930)
        at org.apache.openjpa.kernel.BrokerImpl.commit(BrokerImpl.java:1503)
        at org.apache.openjpa.kernel.DelegatingBroker.commit(DelegatingBroker.java:933)
        at org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:570)
        at org.apache.airavata.registry.core.app.catalog.resources.LibraryApendPathResource.get(LibraryApendPathResource.java:205)
        ... 11 common frames omitted
2018-11-10 18:29:28,003 [pool-10-thread-208241] ERROR o.a.a.r.a.s.h.RegistryServerHandler - comet.sdsc.edu_Ultrascan_0091a13a-1fe5-41cf-8708-79a987e3021a
org.apache.airavata.registry.cpi.AppCatalogException: org.apache.airavata.registry.cpi.AppCatalogException: org.apache.openjpa.persistence.InvalidStateException: The context has been closed. The stack trace at which the context was closed is available if Runtime=TRACE logging is enabled.
        at org.apache.airavata.registry.core.app.catalog.impl.ApplicationDeploymentImpl.getApplicationDeployement(ApplicationDeploymentImpl.java:329)
        at org.apache.airavata.registry.api.service.handler.RegistryServerHandler.getApplicationDeployment(RegistryServerHandler.java:1211)
        at org.apache.airavata.registry.api.RegistryService$Processor$getApplicationDeployment.getResult(RegistryService.java:14835)
        at org.apache.airavata.registry.api.RegistryService$Processor$getApplicationDeployment.getResult(RegistryService.java:14819)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.airavata.registry.cpi.AppCatalogException: org.apache.openjpa.persistence.InvalidStateException: The context has been closed. The stack trace at which the context was closed is available if Runtime=TRACE logging is enabled.
        at org.apache.airavata.registry.core.app.catalog.resources.LibraryApendPathResource.get(LibraryApendPathResource.java:214)
        at org.apache.airavata.registry.core.app.catalog.util.AppCatalogThriftConversion.getApplicationDeploymentDescription(AppCatalogThriftConversion.java:758)
        at
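The exception message says the closing stack trace is available when Runtime=TRACE logging is enabled. One way to request that, assuming OpenJPA's built-in logging is in use, is the `openjpa.Log` configuration property. The sketch below only builds the property map. The helper `OpenJpaTraceLogging` is a hypothetical name, and how Airavata's registry actually creates its entity manager factory is not shown in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: adds OpenJPA Runtime=TRACE logging to a set of persistence properties. */
final class OpenJpaTraceLogging {
    static Map<String, String> withRuntimeTrace(Map<String, String> base) {
        Map<String, String> props = new HashMap<>(base);
        // openjpa.Log takes a comma-separated list of channel=level pairs.
        props.put("openjpa.Log", "DefaultLevel=WARN, Runtime=TRACE");
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props = withRuntimeTrace(new HashMap<>());
        // These properties would be passed as the second argument to
        // javax.persistence.Persistence.createEntityManagerFactory(unitName, props),
        // where unitName is the deployment's persistence unit (not known from this thread).
        System.out.println(props);
    }
}
```

If the deployment routes OpenJPA logging through another framework, the level for the runtime channel would have to be raised in that framework's configuration instead.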
[jira] [Commented] (AIRAVATA-2939) gateway_id set to null when application interface updated
    [ https://issues.apache.org/jira/browse/AIRAVATA-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683871#comment-16683871 ]

ASF subversion and git services commented on AIRAVATA-2939:
-----------------------------------------------------------

Commit 079bbbca457b7d9c49e54a75235943d5d1d0459d in airavata's branch refs/heads/develop from [~marcuschristie]
[ https://gitbox.apache.org/repos/asf?p=airavata.git;h=079bbbc ]

AIRAVATA-2939 Don't merge referenced entities on app module mapping

> gateway_id set to null when application interface updated
> ---------------------------------------------------------
>
>                 Key: AIRAVATA-2939
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2939
>             Project: Airavata
>          Issue Type: Bug
>          Components: Django Portal
>    Affects Versions: 0.18
>         Environment: https://beta-sciencegateway.brylinski.org
>            Reporter: Eroma
>            Assignee: Marcus Christie
>            Priority: Major
>             Fix For: 0.18
>
> Added a new application and an interface input. Then retrieved the
> application again, added metadata to the existing input, added another
> input, and updated the application. Both the existing input and the new
> input are gone!
>
> The inputs are in the database table, but the METADATA is not saved. It is not
> displayed or visible in the Django portal.
[jira] [Resolved] (AIRAVATA-2939) gateway_id set to null when application interface updated
    [ https://issues.apache.org/jira/browse/AIRAVATA-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Christie resolved AIRAVATA-2939.
---------------------------------------
    Resolution: Fixed
[jira] [Commented] (AIRAVATA-2939) gateway_id set to null when application interface updated
    [ https://issues.apache.org/jira/browse/AIRAVATA-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683821#comment-16683821 ]

ASF subversion and git services commented on AIRAVATA-2939:
-----------------------------------------------------------

Commit 2dd7049e0a73478918b72bd9fe3a279bf46c25d8 in airavata's branch refs/heads/develop from [~marcuschristie]
[ https://gitbox.apache.org/repos/asf?p=airavata.git;h=2dd7049 ]

AIRAVATA-2939 Prevent setting gateway_id to null