[ https://issues.apache.org/jira/browse/AIRAVATA-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dimuthu Upeksha resolved AIRAVATA-2831.
---------------------------------------
    Resolution: Fixed

This should be fixed after the data staging retry implementation.

> Experiment FAILED with an error on output file staging! But the file
> referring in the error is actually downloaded and available in storage.
> -----------------------------------------------------------------------
>
>                 Key: AIRAVATA-2831
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2831
>             Project: Airavata
>          Issue Type: Bug
>          Components: helix implementation
>    Affects Versions: 0.18
>         Environment: https://staging.seagrid.org/
>            Reporter: Eroma
>            Assignee: Dimuthu Upeksha
>            Priority: Major
>             Fix For: 0.18
>
>
> # When experiments were launched and jobs were submitted, both realtime
> monitoring and email monitoring were stopped.
> # Started realtime monitoring, and the job statuses were then updated
> correctly.
> # Then stopped realtime monitoring and started email monitoring.
> # Job statuses were updated correctly, but the experiment status of some is
> FAILED with error [1].
> # But the file has already been transferred.
> # Exp ID: SLM005-QEspresso-JS:2_1fec2375-945b-4b21-8157-5e91b1391312 and job
> ID: 237.torque-server
>
> [1]
> |org.apache.airavata.helix.impl.task.TaskOnFailException: Error Code : 01ee4646-2139-40b8-840e-348e37b1823f, Task TASK_f5726ea4-638f-4c41-9904-0b3c766fcaee failed due to Error while checking the file /N/SEAGrid_scratch//PROCESS_f0192239-787a-4f8f-b63e-7cb45a837f4a/Quantum_Espresso.stdout existence, net.schmizz.sshj.connection.ConnectionException: [CONNECTION_LOST] Did not receive any keep-alive response for 25 seconds
>     at org.apache.airavata.helix.impl.task.AiravataTask.onFail(AiravataTask.java:102)
>     at org.apache.airavata.helix.impl.task.staging.OutputDataStagingTask.onRun(OutputDataStagingTask.java:187)
>     at org.apache.airavata.helix.impl.task.AiravataTask.onRun(AiravataTask.java:311)
>     at org.apache.airavata.helix.core.AbstractTask.run(AbstractTask.java:90)
>     at org.apache.helix.task.TaskRunner.run(TaskRunner.java:71)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.airavata.agents.api.AgentException: net.schmizz.sshj.connection.ConnectionException: [CONNECTION_LOST] Did not receive any keep-alive response for 25 seconds
>     at org.apache.airavata.helix.adaptor.SSHJAgentAdaptor.doesFileExist(SSHJAgentAdaptor.java:183)
>     at org.apache.airavata.helix.impl.task.staging.DataStagingTask.transferFileToStorage(DataStagingTask.java:141)
>     at org.apache.airavata.helix.impl.task.staging.OutputDataStagingTask.onRun(OutputDataStagingTask.java:172)
>     ... 10 more
> Caused by: net.schmizz.sshj.connection.ConnectionException: [CONNECTION_LOST] Did not receive any keep-alive response for 25 seconds
>     at net.schmizz.keepalive.KeepAliveRunner.checkMaxReached(KeepAliveRunner.java:64)
>     at net.schmizz.keepalive.KeepAliveRunner.doKeepAlive(KeepAliveRunner.java:56)
>     at net.schmizz.keepalive.KeepAlive.run(KeepAlive.java:63)|
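
For readers following the resolution note: the trace in [1] shows the file-existence check failing on a lost SSH connection, which is the kind of transient error a retry can absorb. Below is a minimal sketch of that idea, not Airavata's actual data staging retry code. The names StagingRetry and withRetry are hypothetical, IOException is assumed here as a stand-in for the connection failure, and the attempt count and delays are illustrative only.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of Airavata): retries an operation that may fail transiently. */
final class StagingRetry {

    private StagingRetry() {}

    /**
     * Runs the operation up to maxAttempts times, doubling the wait after each
     * IOException. Any other exception propagates immediately without a retry.
     */
    static <T> T withRetry(Callable<T> operation, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (IOException e) {
                last = e; // remember the failure in case every attempt fails
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                    delayMs *= 2; // simple exponential backoff
                }
            }
        }
        throw last; // all attempts failed; surface the most recent error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated existence check: fails twice with a lost connection, then succeeds.
        Callable<Boolean> check = () -> {
            if (++calls[0] < 3) {
                throw new IOException("[CONNECTION_LOST] simulated");
            }
            return true;
        };
        Boolean exists = withRetry(check, 5, 10L);
        System.out.println("exists=" + exists + " after " + calls[0] + " attempts");
        // prints "exists=true after 3 attempts"
    }
}
```

With a wrapper like this, a single dropped keep-alive during the existence check would lead to another attempt instead of marking the experiment FAILED when the file has in fact been transferred.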