[ 
https://issues.apache.org/jira/browse/BEAM-14080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janek Bevendorff updated BEAM-14080:
------------------------------------
    Description: 
I submit Python Beam jobs to our Flink cluster with the PortableRunner through 
a remote job server. If a job finishes within a few seconds or minutes, the 
return status (including a dump of any Python exceptions in case there was an 
error) is returned to the client upon completion.

If the job runs for longer (say, hours), however, the client and job 
server seem to lose their connection. The client then hangs forever, even 
long after the actual job has completed, until I press Ctrl+C to terminate 
it (which has no effect whatsoever on the actual job).

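Example pseudo job:
{code:python}
import apache_beam as beam

print('Job started')
# DoSomething() is a placeholder for the actual long-running transform.
with beam.Pipeline() as pipeline:
    pipeline | DoSomething()
print('Job finished'){code}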
If the pipeline finishes quickly, it looks like this from the client's 
perspective:
{noformat}
$ python3 myjob.py
Job started
Job finished
{noformat}


If the job runs for longer, then the {{with}} statement never finishes and I 
have to abort the Python script with Ctrl+C:
{noformat}
$ python3 myjob.py
Job started
^C
$
{noformat}
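
A possible client-side workaround, sketched under assumptions and not verified against this issue: instead of relying on the {{with}} block (whose exit runs the pipeline and blocks until it finishes), call {{pipeline.run()}} and poll the job state with an upper time bound, so the client cannot hang indefinitely. The helper name {{wait_for_terminal_state}}, its parameters, and the set of terminal state names are hypothetical and chosen for illustration. The Beam wiring appears only as a comment because it has not been tested here.

```python
import time

# Assumption: these names mirror the terminal states of Beam's PipelineState.
TERMINAL_STATES = frozenset({"DONE", "FAILED", "CANCELLED", "UPDATED", "DRAINED"})


def wait_for_terminal_state(get_state, timeout_s, poll_interval_s=30.0,
                            sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns a terminal state or timeout_s elapses.

    get_state is any zero-argument callable returning the current state name.
    sleep and clock are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(
                f"job still in state {state!r} after {timeout_s} seconds")
        # Never sleep past the deadline.
        sleep(min(poll_interval_s, remaining))


# Hypothetical wiring (not executed here, requires apache_beam):
#   pipeline = beam.Pipeline(options=options)
#   pipeline | DoSomething()
#   result = pipeline.run()
#   final_state = wait_for_terminal_state(lambda: result.state,
#                                         timeout_s=6 * 3600)

if __name__ == "__main__":
    # Stand-in for a real job: reports RUNNING twice, then DONE.
    fake_states = iter(["RUNNING", "RUNNING", "DONE"])
    print(wait_for_terminal_state(lambda: next(fake_states), timeout_s=60,
                                  poll_interval_s=1, sleep=lambda _: None))
```

Whether polling actually avoids the hang depends on how the job server connection fails, which this report does not establish. The sketch only guarantees that the client gives up after {{timeout_s}} seconds.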

  was:
I submit Python Beam jobs to our Flink cluster with the PortableRunner through 
a remote job server. If a job finishes within a few seconds or minutes, the 
return status (including a dump of any Python exceptions in case there was an 
error) is returned to the client upon completion.

If the job, however, runs for longer (say) hours, then the client and job 
server seem to lose connection. This results in the client hanging forever 
until I press Ctrl+C to terminate it, even long after the actual job has 
completed (which has no effect whatsoever on the actual job).

Example pseudo job:
{code:java}
print('Job started')
with beam.Pipeline() as pipeline:
    pipeline | DoSomething()
print('Job finished'){code}
If the pipeline finishes quickly, it looks like this from the client's 
perspective:
{noformat}
$ python3 myjob.py
Job started
Job finished
\${noformat}
If the job runs for longer, then the {{with}} statement never finishes and I 
have to abort the Python script with Ctrl+C:
{noformat}
$ python3 myjob.py
Job started
^C
$
{noformat}


> Portable runner does not return job exit status to client after long-running 
> job
> --------------------------------------------------------------------------------
>
>                 Key: BEAM-14080
>                 URL: https://issues.apache.org/jira/browse/BEAM-14080
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-harness
>    Affects Versions: 2.36.0
>            Reporter: Janek Bevendorff
>            Priority: P2
>
> I submit Python Beam jobs to our Flink cluster with the PortableRunner 
> through a remote job server. If a job finishes within a few seconds or 
> minutes, the return status (including a dump of any Python exceptions in case 
> there was an error) is returned to the client upon completion.
> If the job runs for longer (say, hours), however, the client and job 
> server seem to lose their connection. The client then hangs forever, even 
> long after the actual job has completed, until I press Ctrl+C to terminate 
> it (which has no effect whatsoever on the actual job).
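> Example pseudo job:
> {code:python}
> import apache_beam as beam
>
> print('Job started')
> # DoSomething() is a placeholder for the actual long-running transform.
> with beam.Pipeline() as pipeline:
>     pipeline | DoSomething()
> print('Job finished'){code}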
> If the pipeline finishes quickly, it looks like this from the client's 
> perspective:
> {noformat}
> $ python3 myjob.py
> Job started
> Job finished
> {noformat}
> If the job runs for longer, then the {{with}} statement never finishes and I 
> have to abort the Python script with Ctrl+C:
> {noformat}
> $ python3 myjob.py
> Job started
> ^C
> $
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
