[jira] [Updated] (BEAM-3189) Python Fnapi - Worker speedup

2017-11-14 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-3189:
---
Affects Version/s: (was: 2.2.0)
   2.3.0

> Python Fnapi - Worker speedup
> -
>
> Key: BEAM-3189
> URL: https://issues.apache.org/jira/browse/BEAM-3189
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Affects Versions: 2.3.0
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Minor
>  Labels: portability
>
> Python post-commit tests are failing because the runner harness is not 
> compatible with the SDK harness.
> We need a new runner harness compatible with: 
> https://github.com/apache/beam/commit/80c6f4ec0c2a3cc3a441289a9cc8ff53cb70f863



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-3189) Python Fnapi - Worker speedup

2017-11-14 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-3189:
---
Labels: performance portability  (was: portability)

> Python Fnapi - Worker speedup
> -
>
> Key: BEAM-3189
> URL: https://issues.apache.org/jira/browse/BEAM-3189
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Affects Versions: 2.3.0
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Minor
>  Labels: performance, portability
>
> Python post-commit tests are failing because the runner harness is not 
> compatible with the SDK harness.
> We need a new runner harness compatible with: 
> https://github.com/apache/beam/commit/80c6f4ec0c2a3cc3a441289a9cc8ff53cb70f863





[jira] [Created] (BEAM-3189) Python Fnapi - Worker speedup

2017-11-14 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-3189:
--

 Summary: Python Fnapi - Worker speedup
 Key: BEAM-3189
 URL: https://issues.apache.org/jira/browse/BEAM-3189
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-harness
Affects Versions: 2.2.0
Reporter: Ankur Goenka
Assignee: Valentyn Tymofieiev


Python post-commit tests are failing because the runner harness is not 
compatible with the SDK harness.

We need a new runner harness compatible with: 
https://github.com/apache/beam/commit/80c6f4ec0c2a3cc3a441289a9cc8ff53cb70f863





[jira] [Updated] (BEAM-3189) Python Fnapi - Worker speedup

2017-11-14 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-3189:
---
Component/s: (was: sdk-java-harness)
 sdk-py-harness

> Python Fnapi - Worker speedup
> -
>
> Key: BEAM-3189
> URL: https://issues.apache.org/jira/browse/BEAM-3189
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Affects Versions: 2.3.0
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Minor
>  Labels: performance, portability
>
> Python post-commit tests are failing because the runner harness is not 
> compatible with the SDK harness.
> We need a new runner harness compatible with: 
> https://github.com/apache/beam/commit/80c6f4ec0c2a3cc3a441289a9cc8ff53cb70f863





[jira] [Updated] (BEAM-3189) Python Fnapi - Worker speedup

2017-11-14 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-3189:
---
Description: 
The Beam Python SDK is a couple of orders of magnitude slower than the Java SDK 
when it comes to stream processing.
There are two related issues:
# Given a single core, we are currently not fully utilizing it because the 
single worker thread spends a lot of time blocked on IO. This is a limitation 
of our implementation rather than a limitation of Python.
# Given a machine with multiple cores, a single Python process can only utilize 
one core.

In this task we will only address 1; 2 is left as a future optimization.
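Issue 1 is the classic IO-overlap problem. A minimal, self-contained sketch of the idea (not Beam's actual worker code; `fetch`, the record counts, and the timings are hypothetical stand-ins for the harness's IO):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(record):
    # Hypothetical IO-bound step (a stand-in for a network or disk read).
    # The GIL is released while sleeping, just as during a real socket read.
    time.sleep(0.01)
    return record * 2

records = list(range(50))

# Serial: the single thread sits idle during every IO wait.
start = time.time()
serial = [fetch(r) for r in records]
serial_secs = time.time() - start

# Overlapped: up to 10 IO waits proceed concurrently, so far less
# wall-clock time is spent blocked and the core stays busier.
start = time.time()
with ThreadPoolExecutor(max_workers=10) as pool:
    overlapped = list(pool.map(fetch, records))
overlapped_secs = time.time() - start
```

Because the bottleneck is IO rather than computation, threads help here despite the GIL; issue 2 (using more than one core for CPU-bound work) would instead need multiple processes.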


  was:
Python post commits are failing because the runner harness is not compatible 
with the sdk harness.

We need a new runner harness compatible with: 
https://github.com/apache/beam/commit/80c6f4ec0c2a3cc3a441289a9cc8ff53cb70f863


> Python Fnapi - Worker speedup
> -
>
> Key: BEAM-3189
> URL: https://issues.apache.org/jira/browse/BEAM-3189
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Affects Versions: 2.3.0
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Minor
>  Labels: performance, portability
>
> The Beam Python SDK is a couple of orders of magnitude slower than the Java 
> SDK when it comes to stream processing.
> There are two related issues:
> # Given a single core, we are currently not fully utilizing it because the 
> single worker thread spends a lot of time blocked on IO. This is a limitation 
> of our implementation rather than a limitation of Python.
> # Given a machine with multiple cores, a single Python process can only 
> utilize one core.
> In this task we will only address 1; 2 is left as a future optimization.





[jira] [Assigned] (BEAM-3189) Python Fnapi - Worker speedup

2017-11-14 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-3189:
--

Assignee: Ankur Goenka  (was: Valentyn Tymofieiev)

> Python Fnapi - Worker speedup
> -
>
> Key: BEAM-3189
> URL: https://issues.apache.org/jira/browse/BEAM-3189
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Affects Versions: 2.3.0
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>  Labels: portability
>
> Python post-commit tests are failing because the runner harness is not 
> compatible with the SDK harness.
> We need a new runner harness compatible with: 
> https://github.com/apache/beam/commit/80c6f4ec0c2a3cc3a441289a9cc8ff53cb70f863





[jira] [Updated] (BEAM-3189) Python Fnapi - Worker speedup

2017-11-14 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-3189:
---
Issue Type: Improvement  (was: Bug)

> Python Fnapi - Worker speedup
> -
>
> Key: BEAM-3189
> URL: https://issues.apache.org/jira/browse/BEAM-3189
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Affects Versions: 2.3.0
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>  Labels: portability
>
> Python post-commit tests are failing because the runner harness is not 
> compatible with the SDK harness.
> We need a new runner harness compatible with: 
> https://github.com/apache/beam/commit/80c6f4ec0c2a3cc3a441289a9cc8ff53cb70f863





[jira] [Created] (BEAM-3260) Honor Python SDK log level while logging

2017-11-27 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-3260:
--

 Summary: Honor Python SDK log level while logging
 Key: BEAM-3260
 URL: https://issues.apache.org/jira/browse/BEAM-3260
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-harness
Reporter: Ankur Goenka
Assignee: Ankur Goenka
Priority: Minor


All messages from the Python SDK are shown as INFO in Stackdriver even if they 
are WARN or ERROR.
The root cause might be that the SDK harness is not setting the right severity 
while reporting them to Stackdriver.
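A likely shape for the fix is an explicit translation from Python logging levels to the log service's severity before reporting. The severity names below are illustrative assumptions, not the actual FnAPI proto values:

```python
import logging

# Hypothetical severity names standing in for whatever enum the log
# service expects; the real FnAPI proto values may differ.
LOG_LEVEL_TO_SEVERITY = {
    logging.CRITICAL: 'CRITICAL',
    logging.ERROR: 'ERROR',
    logging.WARNING: 'WARN',
    logging.INFO: 'INFO',
    logging.DEBUG: 'DEBUG',
}

def to_severity(python_level):
    # Defaulting to INFO for an unmapped level reproduces exactly the
    # symptom described above when the mapping is skipped entirely.
    return LOG_LEVEL_TO_SEVERITY.get(python_level, 'INFO')
```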





[jira] [Created] (BEAM-3239) Enable debug server for python sdk workers

2017-11-22 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-3239:
--

 Summary: Enable debug server for python sdk workers
 Key: BEAM-3239
 URL: https://issues.apache.org/jira/browse/BEAM-3239
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Ankur Goenka
Assignee: Ankur Goenka
Priority: Minor


Enable a status server that dumps the stacks of all threads when an HTTP GET 
call is made to the server.
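A minimal sketch of such a debug server using only the standard library (the names and behavior here are illustrative, not the implementation Beam adopted): any HTTP GET returns a stack trace for every live thread.

```python
import sys
import threading
import traceback
from http.server import BaseHTTPRequestHandler, HTTPServer

class ThreadDumpHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Render a stack trace for every live thread.
        lines = []
        frames = sys._current_frames()
        for thread in threading.enumerate():
            lines.append('--- Thread: %s ---' % thread.name)
            frame = frames.get(thread.ident)
            if frame is not None:
                lines.extend(traceback.format_stack(frame))
        body = '\n'.join(lines).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the worker's own logs clean

def start_status_server(port=0):
    # port=0 lets the OS pick a free port; it is available afterwards
    # as server.server_port.
    server = HTTPServer(('localhost', port), ThreadDumpHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```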





[jira] [Commented] (BEAM-3230) beam_PerformanceTests_Python is failing

2017-11-21 Thread Ankur Goenka (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261211#comment-16261211
 ] 

Ankur Goenka commented on BEAM-3230:


Initial skimming of the logs reveals an unknown distribution option, 
'python_requires':

STDERR: /usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown 
distribution option: 'python_requires'
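That warning typically means `setup()` ran under plain distutils or a setuptools older than 24.2.0, neither of which knows `python_requires`. One hedged way a setup.py can guard against it is to pass the option only when the installed setuptools is new enough; the package metadata here is hypothetical:

```python
# Sketch: only pass python_requires to setup() when the installed
# setuptools understands it (support arrived in setuptools 24.2.0).
def supports_python_requires(setuptools_version):
    major, minor = (int(p) for p in setuptools_version.split('.')[:2])
    return (major, minor) >= (24, 2)

def setup_kwargs(setuptools_version):
    # Hypothetical package metadata for illustration only.
    kwargs = {'name': 'example-pkg', 'version': '0.1'}
    if supports_python_requires(setuptools_version):
        kwargs['python_requires'] = '>=2.7'
    return kwargs

# In a real setup.py:
# import setuptools
# setuptools.setup(**setup_kwargs(setuptools.__version__))
```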


> beam_PerformanceTests_Python is failing
> ---
>
> Key: BEAM-3230
> URL: https://issues.apache.org/jira/browse/BEAM-3230
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ahmet Altay
>
> The Jenkins beam_PerformanceTests_Python stage is failing for Python builds.
> Here is a link to the failing console output: 
> https://builds.apache.org/job/beam_PerformanceTests_Python/582/console





[jira] [Created] (BEAM-3230) beam_PerformanceTests_Python is failing

2017-11-21 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-3230:
--

 Summary: beam_PerformanceTests_Python is failing
 Key: BEAM-3230
 URL: https://issues.apache.org/jira/browse/BEAM-3230
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Ankur Goenka
Assignee: Ahmet Altay


The Jenkins beam_PerformanceTests_Python stage is failing for Python builds.
Here is a link to the failing console output: 
https://builds.apache.org/job/beam_PerformanceTests_Python/582/console






[jira] [Commented] (BEAM-3230) beam_PerformanceTests_Python is failing

2017-11-21 Thread Ankur Goenka (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261688#comment-16261688
 ] 

Ankur Goenka commented on BEAM-3230:


The default worker container image for the FnAPI seems to be incompatible with 
the latest code. This causes the worker to fail and get into a restart loop.

Ran the test case with the FnAPI image tag beam-2.3.0-20171121 and it passed.
Trying to run it again on the Jenkins server.

> beam_PerformanceTests_Python is failing
> ---
>
> Key: BEAM-3230
> URL: https://issues.apache.org/jira/browse/BEAM-3230
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ahmet Altay
>
> The Jenkins beam_PerformanceTests_Python stage is failing for Python builds.
> Here is a link to the failing console output: 
> https://builds.apache.org/job/beam_PerformanceTests_Python/582/console





[jira] [Assigned] (BEAM-3230) beam_PerformanceTests_Python is failing

2017-11-21 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-3230:
--

Assignee: Ankur Goenka  (was: Ahmet Altay)

> beam_PerformanceTests_Python is failing
> ---
>
> Key: BEAM-3230
> URL: https://issues.apache.org/jira/browse/BEAM-3230
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>
> The Jenkins beam_PerformanceTests_Python stage is failing for Python builds.
> Here is a link to the failing console output: 
> https://builds.apache.org/job/beam_PerformanceTests_Python/582/console





[jira] [Commented] (BEAM-3230) beam_PerformanceTests_Python is failing

2017-11-21 Thread Ankur Goenka (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261776#comment-16261776
 ] 

Ankur Goenka commented on BEAM-3230:


The Jenkins build timed out. Doubling the timeout to 20 minutes by passing 
beam_it_timeout = 1200.

This bug also depends upon PR 4158

> beam_PerformanceTests_Python is failing
> ---
>
> Key: BEAM-3230
> URL: https://issues.apache.org/jira/browse/BEAM-3230
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ahmet Altay
>
> The Jenkins beam_PerformanceTests_Python stage is failing for Python builds.
> Here is a link to the failing console output: 
> https://builds.apache.org/job/beam_PerformanceTests_Python/582/console





[jira] [Comment Edited] (BEAM-3230) beam_PerformanceTests_Python is failing

2017-11-21 Thread Ankur Goenka (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261776#comment-16261776
 ] 

Ankur Goenka edited comment on BEAM-3230 at 11/22/17 12:51 AM:
---

The Jenkins build timed out. Doubling the timeout to 20 minutes by passing 
beam_it_timeout = 1200.
[Reference | 
https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/903eb82d4e7ee5ed2ac2953cf6ce1b80459497ed/perfkitbenchmarker/beam_benchmark_helper.py#L54]
 

This Bug also depends upon [PR4158 | https://github.com/apache/beam/pull/4158]


was (Author: angoenka):
The jenkins build timeout. Doubling the timeout to 20min by passing 
beam_it_timeout = 1200 .
[Reference | 
https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/903eb82d4e7ee5ed2ac2953cf6ce1b80459497ed/perfkitbenchmarker/beam_benchmark_helper.py#L54]
 

This Bug also depends upon PR 4158

> beam_PerformanceTests_Python is failing
> ---
>
> Key: BEAM-3230
> URL: https://issues.apache.org/jira/browse/BEAM-3230
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>
> The Jenkins beam_PerformanceTests_Python stage is failing for Python builds.
> Here is a link to the failing console output: 
> https://builds.apache.org/job/beam_PerformanceTests_Python/582/console





[jira] [Comment Edited] (BEAM-3230) beam_PerformanceTests_Python is failing

2017-11-21 Thread Ankur Goenka (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261776#comment-16261776
 ] 

Ankur Goenka edited comment on BEAM-3230 at 11/22/17 12:50 AM:
---

The Jenkins build timed out. Doubling the timeout to 20 minutes by passing 
beam_it_timeout = 1200.
[Reference | 
https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/903eb82d4e7ee5ed2ac2953cf6ce1b80459497ed/perfkitbenchmarker/beam_benchmark_helper.py#L54]
 

This Bug also depends upon PR 4158


was (Author: angoenka):
The jenkins build timeout. Doubling the timeout to 20min by passing 
beam_it_timeout = 1200 .
[Reference  
https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/903eb82d4e7ee5ed2ac2953cf6ce1b80459497ed/perfkitbenchmarker/beam_benchmark_helper.py#L54]
 

This Bug also depends upon PR 4158

> beam_PerformanceTests_Python is failing
> ---
>
> Key: BEAM-3230
> URL: https://issues.apache.org/jira/browse/BEAM-3230
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>
> The Jenkins beam_PerformanceTests_Python stage is failing for Python builds.
> Here is a link to the failing console output: 
> https://builds.apache.org/job/beam_PerformanceTests_Python/582/console





[jira] [Comment Edited] (BEAM-3230) beam_PerformanceTests_Python is failing

2017-11-21 Thread Ankur Goenka (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261776#comment-16261776
 ] 

Ankur Goenka edited comment on BEAM-3230 at 11/22/17 12:50 AM:
---

The Jenkins build timed out. Doubling the timeout to 20 minutes by passing 
beam_it_timeout = 1200.
[Reference: 
https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/903eb82d4e7ee5ed2ac2953cf6ce1b80459497ed/perfkitbenchmarker/beam_benchmark_helper.py#L54]
 

This Bug also depends upon PR 4158


was (Author: angoenka):
The jenkins build timeout. Doubling the timeout to 20min by passing 
beam_it_timeout = 1200

This Bug also depends upon PR 4158

> beam_PerformanceTests_Python is failing
> ---
>
> Key: BEAM-3230
> URL: https://issues.apache.org/jira/browse/BEAM-3230
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>
> The Jenkins beam_PerformanceTests_Python stage is failing for Python builds.
> Here is a link to the failing console output: 
> https://builds.apache.org/job/beam_PerformanceTests_Python/582/console





[jira] [Comment Edited] (BEAM-3230) beam_PerformanceTests_Python is failing

2017-11-21 Thread Ankur Goenka (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261776#comment-16261776
 ] 

Ankur Goenka edited comment on BEAM-3230 at 11/22/17 12:50 AM:
---

The Jenkins build timed out. Doubling the timeout to 20 minutes by passing 
beam_it_timeout = 1200.
[Reference  
https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/903eb82d4e7ee5ed2ac2953cf6ce1b80459497ed/perfkitbenchmarker/beam_benchmark_helper.py#L54]
 

This Bug also depends upon PR 4158


was (Author: angoenka):
The jenkins build timeout. Doubling the timeout to 20min by passing 
beam_it_timeout = 1200 .
[Reference: 
https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/903eb82d4e7ee5ed2ac2953cf6ce1b80459497ed/perfkitbenchmarker/beam_benchmark_helper.py#L54]
 

This Bug also depends upon PR 4158

> beam_PerformanceTests_Python is failing
> ---
>
> Key: BEAM-3230
> URL: https://issues.apache.org/jira/browse/BEAM-3230
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>
> The Jenkins beam_PerformanceTests_Python stage is failing for Python builds.
> Here is a link to the failing console output: 
> https://builds.apache.org/job/beam_PerformanceTests_Python/582/console





[jira] [Created] (BEAM-3135) Adding futures dependency to beam python SDK

2017-11-01 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-3135:
--

 Summary: Adding futures dependency to beam python SDK
 Key: BEAM-3135
 URL: https://issues.apache.org/jira/browse/BEAM-3135
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Ankur Goenka
Assignee: Ankur Goenka


[sdk_worker|https://github.com/apache/beam/blob/a33c717dbb9b25485218b135a78b5cdb3f8239c9/sdks/python/apache_beam/runners/worker/sdk_worker.py#L30]
 is importing the [futures|https://pypi.python.org/pypi/futures] module without 
declaring it in setup.py.
Add the futures dependency to fix this.
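The usual shape of such a fix is a Python-2-only environment marker in `install_requires`, since `concurrent.futures` ships in the Python 3 standard library. The surrounding entries and the exact version bounds below are illustrative assumptions, not necessarily what the Beam change used:

```python
# Illustrative install_requires entries for a setup.py.
REQUIRED_PACKAGES = [
    'grpcio>=1.0',
    # The concurrent.futures backport is only needed on Python 2;
    # on Python 3 the module is part of the standard library.
    'futures>=3.1.1,<4.0.0; python_version < "3.0"',
]

# In a real setup.py:
# setuptools.setup(..., install_requires=REQUIRED_PACKAGES)
```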





[jira] [Updated] (BEAM-3135) Adding futures dependency to beam python SDK

2017-11-01 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-3135:
---
Description: 
[sdk_worker|https://github.com/apache/beam/blob/a33c717dbb9b25485218b135a78b5cdb3f8239c9/sdks/python/apache_beam/runners/worker/sdk_worker.py#L30]
 is importing the [futures|https://pypi.python.org/pypi/futures] module without 
declaring it in 
[setup.py|https://github.com/apache/beam/blob/a33c717dbb9b25485218b135a78b5cdb3f8239c9/sdks/python/setup.py#L97].
Add the futures dependency to fix this.

  was:
[sdk_worker|https://github.com/apache/beam/blob/a33c717dbb9b25485218b135a78b5cdb3f8239c9/sdks/python/apache_beam/runners/worker/sdk_worker.py#L30]
 is importing [futures|https://pypi.python.org/pypi/futures] module without 
declaring it in setup.py
Adding the futures dependency to fix the dependencies.


> Adding futures dependency to beam python SDK
> 
>
> Key: BEAM-3135
> URL: https://issues.apache.org/jira/browse/BEAM-3135
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>  Labels: newbie
>
> [sdk_worker|https://github.com/apache/beam/blob/a33c717dbb9b25485218b135a78b5cdb3f8239c9/sdks/python/apache_beam/runners/worker/sdk_worker.py#L30]
>  is importing the [futures|https://pypi.python.org/pypi/futures] module without 
> declaring it in 
> [setup.py|https://github.com/apache/beam/blob/a33c717dbb9b25485218b135a78b5cdb3f8239c9/sdks/python/setup.py#L97].
> Add the futures dependency to fix this.





[jira] [Created] (BEAM-3155) Python SDKHarness does not use separate thread for progress reporting

2017-11-07 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-3155:
--

 Summary: Python SDKHarness does not use separate thread for 
progress reporting
 Key: BEAM-3155
 URL: https://issues.apache.org/jira/browse/BEAM-3155
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core, sdk-py-harness
Reporter: Ankur Goenka
Assignee: Ahmet Altay
Priority: Minor


The Python SDKHarness uses the worker thread for progress reporting, so 
progress reporting will be delayed by long-running process bundle requests.
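A minimal sketch of the general fix shape, assuming nothing about Beam's actual SDKHarness internals (all names here are hypothetical): a dedicated daemon thread samples shared progress state while the worker thread stays busy inside a long-running bundle.

```python
import threading
import time

def process_bundle(duration, progress):
    # Simulated long-running ProcessBundle: the worker thread is busy
    # here the whole time, only updating a shared progress counter.
    steps = 10
    for i in range(steps):
        time.sleep(duration / steps)
        progress['fraction'] = (i + 1) / steps

def run_with_progress_reporter(duration=0.5):
    progress = {'fraction': 0.0}
    reports = []
    done = threading.Event()

    def report_loop():
        # Dedicated reporting thread: keeps sampling progress even
        # while the worker thread never returns from process_bundle.
        while not done.is_set():
            reports.append(progress['fraction'])
            done.wait(0.05)

    reporter = threading.Thread(target=report_loop, daemon=True)
    reporter.start()
    process_bundle(duration, progress)
    done.set()
    reporter.join()
    return reports
```

With reporting on the worker thread itself, no report could be sent until the bundle finished; with the separate thread, intermediate fractions are observed throughout.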





[jira] [Updated] (BEAM-3189) Python Fnapi - Worker speedup

2017-12-05 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-3189:
---
External issue URL: 
https://docs.google.com/document/d/1mHFaNgHA71RVGLVNrGrHIlWHgJb4tKCJ2qQzS13REY8

> Python Fnapi - Worker speedup
> -
>
> Key: BEAM-3189
> URL: https://issues.apache.org/jira/browse/BEAM-3189
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Affects Versions: 2.3.0
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Minor
>  Labels: performance, portability
>
> The Beam Python SDK is a couple of orders of magnitude slower than the Java 
> SDK when it comes to stream processing.
> There are two related issues:
> # Given a single core, we are currently not fully utilizing it because the 
> single worker thread spends a lot of time blocked on IO. This is a limitation 
> of our implementation rather than a limitation of Python.
> # Given a machine with multiple cores, a single Python process can only 
> utilize one core.
> In this task we will only address 1; 2 is left as a future optimization.





[jira] [Commented] (BEAM-3075) Document getting started for python sdk developers

2017-10-31 Thread Ankur Goenka (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16231655#comment-16231655
 ] 

Ankur Goenka commented on BEAM-3075:


Hi [~altay], I can take a shot at it.

> Document getting started for python sdk developers
> --
>
> Key: BEAM-3075
> URL: https://issues.apache.org/jira/browse/BEAM-3075
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, website
>Reporter: Ahmet Altay
>Priority: Major
>
> Ideas:
> - Instructions for running a modified wordcount, running tests and linter 
> locally.





[jira] [Commented] (BEAM-2594) Python shim for submitting to the ULR

2018-05-08 Thread Ankur Goenka (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467864#comment-16467864
 ] 

Ankur Goenka commented on BEAM-2594:


[~jkff] [~bsidhom]

Based on the current state of the Python SDK, does this task require 
registering a new runner that extends the Python ULR for Flink (the way 
flinkRunner and sparkRunner extend the ULR) and introducing runner-specific 
pipeline options, or is there more to it than that?

> Python shim for submitting to the ULR
> -
>
> Key: BEAM-2594
> URL: https://issues.apache.org/jira/browse/BEAM-2594
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Kenneth Knowles
>Priority: Minor
>  Labels: portability
>
> Python SDK should support submission of portable pipelines to the ULR, as per 
> https://s.apache.org/beam-job-api.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4289) How to express "Job Update" in portable Beam

2018-05-14 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-4289:
--

 Summary: How to express "Job Update" in portable Beam
 Key: BEAM-4289
 URL: https://issues.apache.org/jira/browse/BEAM-4289
 Project: Beam
  Issue Type: Improvement
  Components: beam-model
Reporter: Ankur Goenka
Assignee: Ankur Goenka


Based on the discussion here:
[https://docs.google.com/a/google.com/document/d/1xOaEEJrMmiSHprd-WiYABegfT129qqF-idUBINjxz8s/edit?disco=B9MqC-A]

For portable Beam, how can a user express the intent to update a running job?





[jira] [Resolved] (BEAM-3883) Python SDK stages artifacts when talking to job server

2018-05-24 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka resolved BEAM-3883.

   Resolution: Fixed
Fix Version/s: 2.5.0

> Python SDK stages artifacts when talking to job server
> --
>
> Key: BEAM-3883
> URL: https://issues.apache.org/jira/browse/BEAM-3883
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Ben Sidhom
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 19h 10m
>  Remaining Estimate: 0h
>
> The Python SDK does not currently stage its user-defined functions or 
> dependencies when talking to the job API. Artifacts that need to be staged 
> include the user code itself, any SDK components not included in the 
> container image, and the list of Python packages that must be installed at 
> runtime.
>  
> Artifacts that are currently expected can be found in the harness boot code: 
> [https://github.com/apache/beam/blob/58e3b06bee7378d2d8db1c8dd534b415864f63e1/sdks/python/container/boot.go#L52.]





[jira] [Commented] (BEAM-4448) --sdk_location seems to be a required parameter now in the Apache Beam Python SDK when using the DataflowRunner

2018-06-07 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505250#comment-16505250
 ] 

Ankur Goenka commented on BEAM-4448:


This was fixed internally.

[~lcwik] Do we need to keep this bug open?

> --sdk_location seems to be a required parameter now in the Apache Beam Python 
> SDK when using the DataflowRunner
> ---
>
> Key: BEAM-4448
> URL: https://issues.apache.org/jira/browse/BEAM-4448
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Luke Cwik
>Assignee: Ankur Goenka
>Priority: Major
>
> During the import of Apache Beam code into Google, it was discovered that 
> --sdk_location is a required parameter. Tests would fail with:
> {code:java}
> apache_beam/runners/portability/stager.py", line 513, in 
> _download_pypi_sdk_package.format(package_name))
> RuntimeError: Please set --sdk_location command-line option or install a 
> valid apache-beam distribution. 
> {code}
>  





[jira] [Resolved] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem

2018-06-13 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka resolved BEAM-4290.

   Resolution: Fixed
Fix Version/s: 2.6.0

> ArtifactStagingService that stages to a distributed filesystem
> --
>
> Key: BEAM-4290
> URL: https://issues.apache.org/jira/browse/BEAM-4290
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.6.0
>
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>
> Using the job's staging directory from PipelineOptions.
> Physical layout on the distributed filesystem is TBD but it should allow for 
> arbitrary filenames and ideally for eventually avoiding uploading artifacts 
> that are already there.
> Handling credentials is TBD.





[jira] [Assigned] (BEAM-4067) Java: FlinkPortableTestRunner: runs portably via self-started local Flink

2018-06-14 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-4067:
--

Assignee: Ankur Goenka

> Java: FlinkPortableTestRunner: runs portably via self-started local Flink
> -
>
> Key: BEAM-4067
> URL: https://issues.apache.org/jira/browse/BEAM-4067
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Ankur Goenka
>Priority: Minor
>
> The portable Flink runner cannot be tested through the normal mechanisms used 
> for ValidatesRunner tests because it requires a job server to be constructed 
> out of band and for pipelines to be run through it. We should implement a 
> shim that acts as a standard Java SDK Runner that spins up the necessary 
> server (possibly in-process) and runs against it.





[jira] [Commented] (BEAM-4219) Make Portable ULR use workerId in GRPC channel

2018-05-01 Thread Ankur Goenka (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460003#comment-16460003
 ] 

Ankur Goenka commented on BEAM-4219:


cc: [~lcwik]

> Make Portable ULR use workerId in GRPC channel
> --
>
> Key: BEAM-4219
> URL: https://issues.apache.org/jira/browse/BEAM-4219
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> The internal implementation of the Data service uses workerId; make sure that 
> [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/data/GrpcDataService.java]
> also uses workerId.





[jira] [Created] (BEAM-4219) Make Portable ULR use workerId in GRPC channel

2018-05-01 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-4219:
--

 Summary: Make Portable ULR use workerId in GRPC channel
 Key: BEAM-4219
 URL: https://issues.apache.org/jira/browse/BEAM-4219
 Project: Beam
  Issue Type: Bug
  Components: runner-core
Reporter: Ankur Goenka
Assignee: Ankur Goenka


The internal implementation of the Data service uses workerId; make sure that 

[https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/data/GrpcDataService.java]

also uses workerId.





[jira] [Commented] (BEAM-4149) Java SDK Harness should populate worker id in control plane headers

2018-04-30 Thread Ankur Goenka (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459046#comment-16459046
 ] 

Ankur Goenka commented on BEAM-4149:


Is the workaround good enough or should we fix it now?

> Java SDK Harness should populate worker id in control plane headers
> ---
>
> Key: BEAM-4149
> URL: https://issues.apache.org/jira/browse/BEAM-4149
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Minor
>
> The Java SDK harness currently does nothing to populate control plane headers 
> with the harness worker id. This id is necessary in order to identify 
> harnesses when multiple are run from the same runner control server.
> Note that this affects the _Java_ harness specifically (e.g., when running a 
> local process or in-memory harness). When the harness is launched within the 
> Docker container, the Go boot code takes care of setting this: 
> https://github.com/apache/beam/blob/dffe50924f34d3cc994008703f01e802c99913d2/sdks/java/container/boot.go#L70



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4149) Java SDK Harness should populate worker id in control plane headers

2018-04-30 Thread Ankur Goenka (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459279#comment-16459279
 ] 

Ankur Goenka commented on BEAM-4149:


In reference to 
[https://github.com/apache/beam/blob/dffe50924f34d3cc994008703f01e802c99913d2/sdks/java/container/boot.go#L70]

github.com/apache/beam/sdks/go/pkg/beam/util/grpcx/metadata.go/WriteWorkerID 
only adds the worker id in the context of the Go SDK. The Java SDK's boot.go will 
not add worker_id to gRPC channels created in the Java SDK.

> Java SDK Harness should populate worker id in control plane headers
> ---
>
> Key: BEAM-4149
> URL: https://issues.apache.org/jira/browse/BEAM-4149
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Minor
>
> The Java SDK harness currently does nothing to populate control plane headers 
> with the harness worker id. This id is necessary in order to identify 
> harnesses when multiple are run from the same runner control server.
> Note that this affects the _Java_ harness specifically (e.g., when running a 
> local process or in-memory harness). When the harness is launched within the 
> Docker container, the Go boot code takes care of setting this: 
> https://github.com/apache/beam/blob/dffe50924f34d3cc994008703f01e802c99913d2/sdks/java/container/boot.go#L70



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3418) Python Fnapi - Multiprocess worker

2018-01-05 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-3418:
---
Issue Type: Improvement  (was: Bug)

> Python Fnapi - Multiprocess worker
> --
>
> Key: BEAM-3418
> URL: https://issues.apache.org/jira/browse/BEAM-3418
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>
> Support multiple Python SDK processes on a VM to fully utilize the machine.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-3418) Python Fnapi - Multiprocess worker

2018-01-05 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-3418:
--

 Summary: Python Fnapi - Multiprocess worker
 Key: BEAM-3418
 URL: https://issues.apache.org/jira/browse/BEAM-3418
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-harness
Reporter: Ankur Goenka
Assignee: Ankur Goenka


Support multiple Python SDK processes on a VM to fully utilize the machine.
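To fully utilize a machine, the harness would launch one SDK worker process per core unless the user overrides it. A minimal sketch of choosing the worker count; the options dict and the 'worker_count' key are assumptions for illustration, not Beam's actual pipeline options:

```python
import multiprocessing


def get_worker_count(pipeline_options):
    # Honor an explicit 'worker_count' option (assumed key, for
    # illustration); otherwise default to one worker per CPU core.
    explicit = pipeline_options.get('worker_count')
    return int(explicit) if explicit else multiprocessing.cpu_count()
```

For example, `get_worker_count({'worker_count': 4})` returns 4, while an empty options dict falls back to the machine's core count.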



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-3411) Test apache_beam.examples.wordcount_it_test.WordCountIT times out

2018-01-05 Thread Ankur Goenka (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16314279#comment-16314279
 ] 

Ankur Goenka commented on BEAM-3411:


We missed catching a case where a progress report for an unregistered work item 
is requested from the worker, which resulted in an uncaught exception:
Underlying error stack trace:
{code:java}
Python sdk harness failed: 
Traceback (most recent call last):
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker_main.py",
 line 134, in main
worker_count=_get_worker_count(sdk_pipeline_options)).run()
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
 line 97, in run
work_request)
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
 line 162, in _request_process_bundle_progress
worker = self._instruction_id_vs_worker[request.instruction_id]
KeyError: u'-39'
{code}
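A defensive lookup avoids the uncaught KeyError when a progress request arrives for an unregistered instruction id. A minimal sketch; the dict and handler below stand in for the real sdk_worker state and are illustrative only:

```python
instruction_id_vs_worker = {}  # instruction_id -> worker, as in sdk_worker.py


def handle_progress_request(instruction_id):
    # dict.get instead of [] so an unregistered id does not raise KeyError.
    worker = instruction_id_vs_worker.get(instruction_id)
    if worker is None:
        # Unknown or already-finished instruction: report empty progress
        # instead of crashing the harness.
        return {}
    return worker.progress()
```

With this shape, the request for instruction id `-39` from the trace above would yield an empty progress response rather than killing the harness.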


> Test apache_beam.examples.wordcount_it_test.WordCountIT times out
> -
>
> Key: BEAM-3411
> URL: https://issues.apache.org/jira/browse/BEAM-3411
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Ankur Goenka
>
> Failed run: 
> https://builds.apache.org/job/beam_PostCommit_Python_Verify/3876/console
> Log snippet:
> test_wordcount_fnapi_it (apache_beam.examples.wordcount_it_test.WordCountIT) 
> ... ERROR
> ==
> ERROR: test_wordcount_fnapi_it 
> (apache_beam.examples.wordcount_it_test.WordCountIT)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/plugins/multiprocess.py",
>  line 812, in run
> test(orig)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py",
>  line 45, in __call__
> return self.run(*arg, **kwarg)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py",
>  line 133, in run
> self.runTest(result)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py",
>  line 151, in runTest
> test(result)
>   File "/usr/lib/python2.7/unittest/case.py", line 395, in __call__
> return self.run(*args, **kwds)
>   File "/usr/lib/python2.7/unittest/case.py", line 331, in run
> testMethod()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
>  line 77, in test_wordcount_fnapi_it
> on_success_matcher=PipelineStateMatcher()))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_fnapi.py",
>  line 130, in run
> result.wait_until_finish()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 956, in wait_until_finish
> time.sleep(5.0)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/plugins/multiprocess.py",
>  line 276, in signalhandler
> raise TimedOutException()
> TimedOutException: 'test_wordcount_fnapi_it 
> (apache_beam.examples.wordcount_it_test.WordCountIT)'
> --
> Ran 3 tests in 901.290s
> FAILED (errors=1)
> Build step 'Execute shell' marked build as failure



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (BEAM-3411) Test apache_beam.examples.wordcount_it_test.WordCountIT times out

2018-01-05 Thread Ankur Goenka (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16314279#comment-16314279
 ] 

Ankur Goenka edited comment on BEAM-3411 at 1/6/18 1:48 AM:


We missed catching a case where a progress report for an unregistered work item 
is requested from the worker, which resulted in an uncaught exception.
Erroneous Code block: 

{code:java}
worker = self._instruction_id_vs_worker[request.instruction_id]
{code}

Underlying error stack trace:
{code:java}
Python sdk harness failed: 
Traceback (most recent call last):
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker_main.py",
 line 134, in main
worker_count=_get_worker_count(sdk_pipeline_options)).run()
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
 line 97, in run
work_request)
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
 line 162, in _request_process_bundle_progress
worker = self._instruction_id_vs_worker[request.instruction_id]
KeyError: u'-39'
{code}



was (Author: angoenka):
We missed catching a case where a progress report for an unregistered work item 
is requested from the worker, which resulted in an uncaught exception.
Erroneous Code block: 

{code:python}
worker = self._instruction_id_vs_worker[request.instruction_id]
{code}

Underlying error stack trace:
{code:python}
Python sdk harness failed: 
Traceback (most recent call last):
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker_main.py",
 line 134, in main
worker_count=_get_worker_count(sdk_pipeline_options)).run()
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
 line 97, in run
work_request)
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
 line 162, in _request_process_bundle_progress
worker = self._instruction_id_vs_worker[request.instruction_id]
KeyError: u'-39'
{code}


> Test apache_beam.examples.wordcount_it_test.WordCountIT times out
> -
>
> Key: BEAM-3411
> URL: https://issues.apache.org/jira/browse/BEAM-3411
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Ankur Goenka
>
> Failed run: 
> https://builds.apache.org/job/beam_PostCommit_Python_Verify/3876/console
> Log snippet:
> test_wordcount_fnapi_it (apache_beam.examples.wordcount_it_test.WordCountIT) 
> ... ERROR
> ==
> ERROR: test_wordcount_fnapi_it 
> (apache_beam.examples.wordcount_it_test.WordCountIT)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/plugins/multiprocess.py",
>  line 812, in run
> test(orig)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py",
>  line 45, in __call__
> return self.run(*arg, **kwarg)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py",
>  line 133, in run
> self.runTest(result)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py",
>  line 151, in runTest
> test(result)
>   File "/usr/lib/python2.7/unittest/case.py", line 395, in __call__
> return self.run(*args, **kwds)
>   File "/usr/lib/python2.7/unittest/case.py", line 331, in run
> testMethod()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
>  line 77, in test_wordcount_fnapi_it
> on_success_matcher=PipelineStateMatcher()))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_fnapi.py",
>  line 130, in run
> result.wait_until_finish()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 956, in wait_until_finish
> time.sleep(5.0)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/plugins/multiprocess.py",
>  line 276, in signalhandler
> raise TimedOutException()
> TimedOutException: 'test_wordcount_fnapi_it 
> (apache_beam.examples.wordcount_it_test.WordCountIT)'
> --
> Ran 3 tests in 901.290s
> FAILED (errors=1)

[jira] [Comment Edited] (BEAM-3411) Test apache_beam.examples.wordcount_it_test.WordCountIT times out

2018-01-05 Thread Ankur Goenka (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16314279#comment-16314279
 ] 

Ankur Goenka edited comment on BEAM-3411 at 1/6/18 1:47 AM:


We missed catching a case where a progress report for an unregistered work item 
is requested from the worker, which resulted in an uncaught exception.
Erroneous Code block: 

{code:python}
worker = self._instruction_id_vs_worker[request.instruction_id]
{code}

Underlying error stack trace:
{code:python}
Python sdk harness failed: 
Traceback (most recent call last):
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker_main.py",
 line 134, in main
worker_count=_get_worker_count(sdk_pipeline_options)).run()
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
 line 97, in run
work_request)
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
 line 162, in _request_process_bundle_progress
worker = self._instruction_id_vs_worker[request.instruction_id]
KeyError: u'-39'
{code}



was (Author: angoenka):
We missed catching a case where a progress report for an unregistered work item 
is requested from the worker, which resulted in an uncaught exception:
Underlying error stack trace:
{code:java}
Python sdk harness failed: 
Traceback (most recent call last):
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker_main.py",
 line 134, in main
worker_count=_get_worker_count(sdk_pipeline_options)).run()
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
 line 97, in run
work_request)
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
 line 162, in _request_process_bundle_progress
worker = self._instruction_id_vs_worker[request.instruction_id]
KeyError: u'-39'
{code}


> Test apache_beam.examples.wordcount_it_test.WordCountIT times out
> -
>
> Key: BEAM-3411
> URL: https://issues.apache.org/jira/browse/BEAM-3411
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Ankur Goenka
>
> Failed run: 
> https://builds.apache.org/job/beam_PostCommit_Python_Verify/3876/console
> Log snippet:
> test_wordcount_fnapi_it (apache_beam.examples.wordcount_it_test.WordCountIT) 
> ... ERROR
> ==
> ERROR: test_wordcount_fnapi_it 
> (apache_beam.examples.wordcount_it_test.WordCountIT)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/plugins/multiprocess.py",
>  line 812, in run
> test(orig)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py",
>  line 45, in __call__
> return self.run(*arg, **kwarg)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py",
>  line 133, in run
> self.runTest(result)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py",
>  line 151, in runTest
> test(result)
>   File "/usr/lib/python2.7/unittest/case.py", line 395, in __call__
> return self.run(*args, **kwds)
>   File "/usr/lib/python2.7/unittest/case.py", line 331, in run
> testMethod()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
>  line 77, in test_wordcount_fnapi_it
> on_success_matcher=PipelineStateMatcher()))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_fnapi.py",
>  line 130, in run
> result.wait_until_finish()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 956, in wait_until_finish
> time.sleep(5.0)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/plugins/multiprocess.py",
>  line 276, in signalhandler
> raise TimedOutException()
> TimedOutException: 'test_wordcount_fnapi_it 
> (apache_beam.examples.wordcount_it_test.WordCountIT)'
> --
> Ran 3 tests in 901.290s
> FAILED (errors=1)
> Build step 'Execute shell' marked build as failure



--
This message was sent by Atlassian JIRA

[jira] [Created] (BEAM-3389) Clean receive queue in dataplane

2017-12-21 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-3389:
--

 Summary: Clean receive queue in dataplane
 Key: BEAM-3389
 URL: https://issues.apache.org/jira/browse/BEAM-3389
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Ankur Goenka
Assignee: Ahmet Altay


Remove the receiving queue for an instruction_id after the instruction_id is 
processed.
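A sketch of the cleanup: pop (rather than merely drain) the per-instruction queue once the instruction finishes, so the mapping does not grow without bound. Names here are illustrative, not the actual dataplane code:

```python
import queue

receive_queues = {}  # instruction_id -> Queue of inbound data (illustrative)


def register_instruction(instruction_id):
    receive_queues[instruction_id] = queue.Queue()
    return receive_queues[instruction_id]


def finish_instruction(instruction_id):
    # Remove the queue entirely; pop with a default makes repeated or
    # late cleanup calls a safe no-op.
    receive_queues.pop(instruction_id, None)
```

Using `pop(..., None)` keeps the cleanup idempotent even if a finish notification races with queue removal.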



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-3389) Clean receive queue in dataplane

2017-12-21 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-3389:
--

Assignee: Ankur Goenka  (was: Ahmet Altay)

> Clean receive queue in dataplane
> 
>
> Key: BEAM-3389
> URL: https://issues.apache.org/jira/browse/BEAM-3389
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>
> Remove the receiving queue for an instruction_id after the instruction_id is 
> processed



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (BEAM-3389) Clean receive queue in dataplane

2018-01-08 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka closed BEAM-3389.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

> Clean receive queue in dataplane
> 
>
> Key: BEAM-3389
> URL: https://issues.apache.org/jira/browse/BEAM-3389
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
> Fix For: 2.3.0
>
>
> Remove the receiving queue for an instruction_id after the instruction_id is 
> processed



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-3418) Python Fnapi - Multiprocess worker

2018-01-08 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-3418:
---
Labels: performance portability  (was: )

> Python Fnapi - Multiprocess worker
> --
>
> Key: BEAM-3418
> URL: https://issues.apache.org/jira/browse/BEAM-3418
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>  Labels: performance, portability
>
> Support multiple Python SDK processes on a VM to fully utilize the machine.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (BEAM-3239) Enable debug server for python sdk workers

2018-01-08 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka closed BEAM-3239.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

> Enable debug server for python sdk workers
> --
>
> Key: BEAM-3239
> URL: https://issues.apache.org/jira/browse/BEAM-3239
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Minor
> Fix For: 2.3.0
>
>
> Enable the status server to dump threads when an HTTP GET call is made to the server.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (BEAM-3155) Python SDKHarness does not use separate thread for progress reporting

2018-01-08 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka closed BEAM-3155.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

> Python SDKHarness does not use separate thread for progress reporting
> -
>
> Key: BEAM-3155
> URL: https://issues.apache.org/jira/browse/BEAM-3155
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Minor
> Fix For: 2.3.0
>
>
> The Python SDKHarness uses a worker thread for progress reporting, so progress 
> reporting will be delayed by long-running process-bundle requests.
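Moving the reporting loop onto its own daemon thread decouples it from bundle processing. A minimal sketch under the assumption that `report_fn` sends one progress update per call; the function names are illustrative, not the harness's real API:

```python
import threading


def start_progress_reporter(report_fn, interval=1.0):
    # Run reporting on a dedicated daemon thread so long-running bundle
    # work on other threads cannot delay periodic progress reports.
    stop = threading.Event()

    def loop():
        while not stop.is_set():
            report_fn()
            stop.wait(interval)  # interruptible sleep between reports

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return stop  # caller sets this Event to end reporting
```

Using `Event.wait` instead of `time.sleep` lets the thread exit promptly when the caller signals shutdown.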



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (BEAM-3135) Adding futures dependency to beam python SDK

2018-01-08 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka closed BEAM-3135.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

> Adding futures dependency to beam python SDK
> 
>
> Key: BEAM-3135
> URL: https://issues.apache.org/jira/browse/BEAM-3135
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Trivial
>  Labels: newbie
> Fix For: 2.3.0
>
>
> [sdk_worker|https://github.com/apache/beam/blob/a33c717dbb9b25485218b135a78b5cdb3f8239c9/sdks/python/apache_beam/runners/worker/sdk_worker.py#L30]
>  is importing [futures|https://pypi.python.org/pypi/futures] module without 
> declaring it in 
> [setup.py|https://github.com/apache/beam/blob/a33c717dbb9b25485218b135a78b5cdb3f8239c9/sdks/python/setup.py#L97]
> Add the futures dependency to setup.py to fix this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (BEAM-3486) Progress reporting for python sdk fix

2018-01-24 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka resolved BEAM-3486.

Resolution: Fixed

> Progress reporting for python sdk fix
> -
>
> Key: BEAM-3486
> URL: https://issues.apache.org/jira/browse/BEAM-3486
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Trivial
> Fix For: 2.3.0
>
>
> Python sdk was not reporting progress correctly



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-3411) Test apache_beam.examples.wordcount_it_test.WordCountIT times out

2018-01-24 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka resolved BEAM-3411.

   Resolution: Fixed
Fix Version/s: 2.3.0

> Test apache_beam.examples.wordcount_it_test.WordCountIT times out
> -
>
> Key: BEAM-3411
> URL: https://issues.apache.org/jira/browse/BEAM-3411
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.3.0
>
>
> Failed run: 
> https://builds.apache.org/job/beam_PostCommit_Python_Verify/3876/console
> Log snippet:
> test_wordcount_fnapi_it (apache_beam.examples.wordcount_it_test.WordCountIT) 
> ... ERROR
> ==
> ERROR: test_wordcount_fnapi_it 
> (apache_beam.examples.wordcount_it_test.WordCountIT)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/plugins/multiprocess.py",
>  line 812, in run
> test(orig)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py",
>  line 45, in __call__
> return self.run(*arg, **kwarg)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py",
>  line 133, in run
> self.runTest(result)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py",
>  line 151, in runTest
> test(result)
>   File "/usr/lib/python2.7/unittest/case.py", line 395, in __call__
> return self.run(*args, **kwds)
>   File "/usr/lib/python2.7/unittest/case.py", line 331, in run
> testMethod()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
>  line 77, in test_wordcount_fnapi_it
> on_success_matcher=PipelineStateMatcher()))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_fnapi.py",
>  line 130, in run
> result.wait_until_finish()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 956, in wait_until_finish
> time.sleep(5.0)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/plugins/multiprocess.py",
>  line 276, in signalhandler
> raise TimedOutException()
> TimedOutException: 'test_wordcount_fnapi_it 
> (apache_beam.examples.wordcount_it_test.WordCountIT)'
> --
> Ran 3 tests in 901.290s
> FAILED (errors=1)
> Build step 'Execute shell' marked build as failure



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3411) Test apache_beam.examples.wordcount_it_test.WordCountIT times out

2018-01-24 Thread Ankur Goenka (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338151#comment-16338151
 ] 

Ankur Goenka commented on BEAM-3411:


[https://builds.apache.org/job/beam_PostCommit_Python_Verify/]  is passing now.

> Test apache_beam.examples.wordcount_it_test.WordCountIT times out
> -
>
> Key: BEAM-3411
> URL: https://issues.apache.org/jira/browse/BEAM-3411
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.3.0
>
>
> Failed run: 
> https://builds.apache.org/job/beam_PostCommit_Python_Verify/3876/console
> Log snippet:
> test_wordcount_fnapi_it (apache_beam.examples.wordcount_it_test.WordCountIT) 
> ... ERROR
> ==
> ERROR: test_wordcount_fnapi_it 
> (apache_beam.examples.wordcount_it_test.WordCountIT)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/plugins/multiprocess.py",
>  line 812, in run
> test(orig)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py",
>  line 45, in __call__
> return self.run(*arg, **kwarg)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py",
>  line 133, in run
> self.runTest(result)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/case.py",
>  line 151, in runTest
> test(result)
>   File "/usr/lib/python2.7/unittest/case.py", line 395, in __call__
> return self.run(*args, **kwds)
>   File "/usr/lib/python2.7/unittest/case.py", line 331, in run
> testMethod()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
>  line 77, in test_wordcount_fnapi_it
> on_success_matcher=PipelineStateMatcher()))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_fnapi.py",
>  line 130, in run
> result.wait_until_finish()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 956, in wait_until_finish
> time.sleep(5.0)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/.eggs/nose-1.3.7-py2.7.egg/nose/plugins/multiprocess.py",
>  line 276, in signalhandler
> raise TimedOutException()
> TimedOutException: 'test_wordcount_fnapi_it 
> (apache_beam.examples.wordcount_it_test.WordCountIT)'
> --
> Ran 3 tests in 901.290s
> FAILED (errors=1)
> Build step 'Execute shell' marked build as failure



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3486) Progress reporting for python sdk fix

2018-01-16 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-3486:
---
Issue Type: Bug  (was: Improvement)

> Progress reporting for python sdk fix
> -
>
> Key: BEAM-3486
> URL: https://issues.apache.org/jira/browse/BEAM-3486
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Trivial
>  Labels: newbie
> Fix For: 2.3.0
>
>
> [sdk_worker|https://github.com/apache/beam/blob/a33c717dbb9b25485218b135a78b5cdb3f8239c9/sdks/python/apache_beam/runners/worker/sdk_worker.py#L30]
>  is importing [futures|https://pypi.python.org/pypi/futures] module without 
> declaring it in 
> [setup.py|https://github.com/apache/beam/blob/a33c717dbb9b25485218b135a78b5cdb3f8239c9/sdks/python/setup.py#L97]
> Add the futures dependency to setup.py to fix this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3486) Progress reporting for python sdk fix

2018-01-16 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-3486:
---
Description: Python sdk was not reporting progress correctly  (was: 
[sdk_worker|https://github.com/apache/beam/blob/a33c717dbb9b25485218b135a78b5cdb3f8239c9/sdks/python/apache_beam/runners/worker/sdk_worker.py#L30]
 is importing [futures|https://pypi.python.org/pypi/futures] module without 
declaring it in 
[setup.py|https://github.com/apache/beam/blob/a33c717dbb9b25485218b135a78b5cdb3f8239c9/sdks/python/setup.py#L97]
Add the futures dependency to setup.py to fix this.)

> Progress reporting for python sdk fix
> -
>
> Key: BEAM-3486
> URL: https://issues.apache.org/jira/browse/BEAM-3486
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Trivial
> Fix For: 2.3.0
>
>
> Python sdk was not reporting progress correctly



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3486) Progress reporting for python sdk fix

2018-01-16 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-3486:
---
Labels:   (was: newbie)

> Progress reporting for python sdk fix
> -
>
> Key: BEAM-3486
> URL: https://issues.apache.org/jira/browse/BEAM-3486
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Trivial
> Fix For: 2.3.0
>
>
> Python sdk was not reporting progress correctly



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-4748) Flaky post-commit test org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactServicesTest.putArtifactsMultipleFilesConcurrentlyTest

2018-07-30 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka resolved BEAM-4748.

   Resolution: Duplicate
Fix Version/s: 2.7.0

> Flaky post-commit test 
> org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactServicesTest.putArtifactsMultipleFilesConcurrentlyTest
> 
>
> Key: BEAM-4748
> URL: https://issues.apache.org/jira/browse/BEAM-4748
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mikhail Gryzykhin
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.7.0
>
>
> Test flaked on following job:
> https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_GradleBuild/1040/testReport/junit/org.apache.beam.runners.fnexecution.artifact/BeamFileSystemArtifactServicesTest/putArtifactsMultipleFilesConcurrentlyTest/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4748) Flaky post-commit test org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactServicesTest.putArtifactsMultipleFilesConcurrentlyTest

2018-07-30 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562415#comment-16562415
 ] 

Ankur Goenka commented on BEAM-4748:


This flaky test was fixed in [https://github.com/apache/beam/pull/5975] 

Please try with the latest code base.

> Flaky post-commit test 
> org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactServicesTest.putArtifactsMultipleFilesConcurrentlyTest
> 
>
> Key: BEAM-4748
> URL: https://issues.apache.org/jira/browse/BEAM-4748
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mikhail Gryzykhin
>Assignee: Ankur Goenka
>Priority: Major
>
> Test flaked on following job:
> https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_GradleBuild/1040/testReport/junit/org.apache.beam.runners.fnexecution.artifact/BeamFileSystemArtifactServicesTest/putArtifactsMultipleFilesConcurrentlyTest/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5041) Java Fn SDK Harness skips unprocessed pCollections

2018-07-29 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-5041:
---
Description: 
Java Sdk Harness used pCollections to keep track of computed consumers 
[here|[https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java#L158]]
 . This is incorrect as consumers are based on pTransforms so pTransforms 
should be used to keep track of computed processors.

In case of Flatten, this creates an issue where pTransforms having same input 
as that to flatten are not executed. This causes 

[https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L316]
 to fail.

  was:
Java Sdk Harness used pCollections to keep track of computed consumers 
[here|[https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java#L158]

] . This is incorrect as consumers are based on pTransforms so pTransforms 
should be used to keep track of computed processors.

In case of Flatten, this creates an issue where pTransforms having same input 
as that to flatten are not executed. This causes 

[https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L316]
 to fail.


> Java Fn SDK Harness skips unprocessed pCollections
> --
>
> Key: BEAM-5041
> URL: https://issues.apache.org/jira/browse/BEAM-5041
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> Java Sdk Harness used pCollections to keep track of computed consumers 
> [here|[https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java#L158]]
>  . This is incorrect as consumers are based on pTransforms so pTransforms 
> should be used to keep track of computed processors.
> In case of Flatten, this creates an issue where pTransforms having same input 
> as that to flatten are not executed. This causes 
> [https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L316]
>  to fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5041) Java Fn SDK Harness skips unprocessed pCollections

2018-07-29 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-5041:
---
Description: 
Java Sdk Harness used pCollections to keep track of computed consumers 
[here|https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java#L158].
 This is incorrect as consumers are based on pTransforms so pTransforms should 
be used to keep track of computed processors.

In case of Flatten, this creates an issue where pTransforms having same input 
as that to flatten are not executed. This causes 

[https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L316]
 to fail.

  was:
Java Sdk Harness used pCollections to keep track of computed consumers 
[here|[https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java#L158]]
 . This is incorrect as consumers are based on pTransforms so pTransforms 
should be used to keep track of computed processors.

In case of Flatten, this creates an issue where pTransforms having same input 
as that to flatten are not executed. This causes 

[https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L316]
 to fail.


> Java Fn SDK Harness skips unprocessed pCollections
> --
>
> Key: BEAM-5041
> URL: https://issues.apache.org/jira/browse/BEAM-5041
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> Java Sdk Harness used pCollections to keep track of computed consumers 
> [here|https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java#L158].
>  This is incorrect as consumers are based on pTransforms so pTransforms 
> should be used to keep track of computed processors.
> In case of Flatten, this creates an issue where pTransforms having same input 
> as that to flatten are not executed. This causes 
> [https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L316]
>  to fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5041) Java Fn SDK Harness skips unprocessed pCollections

2018-07-29 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-5041:
---
Description: 
Java Sdk Harness used pCollections to keep track of computed consumers 
[here|[https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java#L158]

] . This is incorrect as consumers are based on pTransforms so pTransforms 
should be used to keep track of computed processors.

In case of Flatten, this creates an issue where pTransforms having same input 
as that to flatten are not executed. This causes 

[https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L316]
 to fail.

  was:
Java Sdk Harness used pCollections to keep track of computed consumers [here|

https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java#L158

] . This is incorrect as consumers are based on pTransforms so pTransforms 
should be used to keep track of computed processors.

In case of Flatten, this creates an issue where pTransforms having same input 
as that to flatten are not executed. This causes 

[https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L316]
 to fail.


> Java Fn SDK Harness skips unprocessed pCollections
> --
>
> Key: BEAM-5041
> URL: https://issues.apache.org/jira/browse/BEAM-5041
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> Java Sdk Harness used pCollections to keep track of computed consumers 
> [here|[https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java#L158]
> ] . This is incorrect as consumers are based on pTransforms so pTransforms 
> should be used to keep track of computed processors.
> In case of Flatten, this creates an issue where pTransforms having same input 
> as that to flatten are not executed. This causes 
> [https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L316]
>  to fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5041) Java Fn SDK Harness skips unprocessed pCollections

2018-07-29 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-5041:
--

 Summary: Java Fn SDK Harness skips unprocessed pCollections
 Key: BEAM-5041
 URL: https://issues.apache.org/jira/browse/BEAM-5041
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-harness
Reporter: Ankur Goenka
Assignee: Ankur Goenka


Java Sdk Harness used pCollections to keep track of computed consumers [here|

https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java#L158

] . This is incorrect as consumers are based on pTransforms so pTransforms 
should be used to keep track of computed processors.

In case of Flatten, this creates an issue where pTransforms having same input 
as that to flatten are not executed. This causes 

[https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L316]
 to fail.
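The bug can be illustrated with a toy graph (a hypothetical mini-model, not Beam's real ProcessBundleHandler): marking work "done" by input PCollection skips a second transform that reads the same PCollection, while marking by PTransform id runs both.

```python
# Sketch: deduplicate executed work by PCollection (the buggy behavior)
# versus by PTransform id (the fix). Transform and PCollection names
# here are made up for illustration.
def executed_transforms(transforms, key_by):
    """transforms: ordered dict of transform name -> input PCollection id."""
    done, ran = set(), []
    for name, input_pc in transforms.items():
        key = input_pc if key_by == 'pcollection' else name
        if key in done:
            continue  # the buggy dedupe drops a consumer of a shared input
        done.add(key)
        ran.append(name)
    return ran

# Two transforms consume the same PCollection, as in the Flatten case.
graph = {'Flatten/ReadPc1': 'pc1', 'ParDo/ReadPc1': 'pc1'}
print(executed_transforms(graph, 'pcollection'))  # ParDo/ReadPc1 is skipped
print(executed_transforms(graph, 'ptransform'))   # both transforms run
```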



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5041) Java Fn SDK Harness skips unprocessed pCollections

2018-07-29 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-5041:
---
Description: 
Java Sdk Harness used pCollections to keep track of computed consumers 
[here|https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java#L158].
 This is incorrect as consumers are based on pTransforms so pTransforms should 
be used to keep track of computed consumers.

In case of Flatten, this creates an issue where pTransforms having same input 
as that to flatten are not executed. This causes 

[https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L316]
 to fail.

  was:
Java Sdk Harness used pCollections to keep track of computed consumers 
[here|https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java#L158].
 This is incorrect as consumers are based on pTransforms so pTransforms should 
be used to keep track of computed processors.

In case of Flatten, this creates an issue where pTransforms having same input 
as that to flatten are not executed. This causes 

[https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L316]
 to fail.


> Java Fn SDK Harness skips unprocessed pCollections
> --
>
> Key: BEAM-5041
> URL: https://issues.apache.org/jira/browse/BEAM-5041
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> Java Sdk Harness used pCollections to keep track of computed consumers 
> [here|https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java#L158].
>  This is incorrect as consumers are based on pTransforms so pTransforms 
> should be used to keep track of computed consumers.
> In case of Flatten, this creates an issue where pTransforms having same input 
> as that to flatten are not executed. This causes 
> [https://github.com/apache/beam/blob/ff95a82e461bd8319d9733be60e75992ba90cd7c/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L316]
>  to fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5022) Move beam-sdks-java-fn-execution#createPortableValidatesRunnerTask to BeamGradlePlugin

2018-07-25 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-5022:
--

 Summary: Move 
beam-sdks-java-fn-execution#createPortableValidatesRunnerTask to 
BeamGradlePlugin
 Key: BEAM-5022
 URL: https://issues.apache.org/jira/browse/BEAM-5022
 Project: Beam
  Issue Type: Improvement
  Components: build-system, runner-flink
Reporter: Ankur Goenka
Assignee: Ankur Goenka


Move beam-sdks-java-fn-execution#createPortableValidatesRunnerTask to 
BeamGradlePlugin So that it can be used by other portable runners tests.

 

Also, create an interface TestJobserverDriver and make the drivers extend it 
instead of using reflection to start the Jobserver.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5023) BeamFnDataGrpcClient should pass the worker_id when connecting to the RunnerHarness

2018-07-25 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-5023:
--

 Summary: BeamFnDataGrpcClient should pass the worker_id when 
connecting to the RunnerHarness
 Key: BEAM-5023
 URL: https://issues.apache.org/jira/browse/BEAM-5023
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-harness
Reporter: Ankur Goenka
Assignee: Ankur Goenka






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4826) Flink runner sends bad flatten to SDK

2018-08-01 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565838#comment-16565838
 ] 

Ankur Goenka commented on BEAM-4826:


I see. 

The split of the flatten is OK. I will see how we can structure the code to 
remove unused PCollections from the inputs of flatten or other transforms.

Unused outputs will still be present, though, as ParDo will generate them in the 
user code.

 

> Flink runner sends bad flatten to SDK
> -
>
> Key: BEAM-4826
> URL: https://issues.apache.org/jira/browse/BEAM-4826
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Henning Rohde
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: portability
>
> For a Go flatten test w/ 3 input, the Flink runner splits this into 3 bundle 
> descriptors. But it sends the original 3-input flatten but w/ 1 actual input 
> present in each bundle descriptor. This is inconsistent and the SDK shouldn't 
> expect dangling PCollections. In contrast, Dataflow removes the flatten when 
> it does the same split.
> Snippet:
> register: <
>   process_bundle_descriptor: <
> id: "3"
> transforms: <
>   key: "e4"
>   value: <
> unique_name: "github.com/apache/beam/sdks/go/pkg/beam.createFn'1"
> spec: <
>   urn: "urn:beam:transform:pardo:v1"
>   payload: [...]
> >
> inputs: <
>   key: "i0"
>   value: "n3"
> >
> outputs: <
>   key: "i0"
>   value: "n4"
> >
>   >
> >
> transforms: <
>   key: "e7"
>   value: <
> unique_name: "Flatten"
> spec: <
>   urn: "beam:transform:flatten:v1"
> >
> inputs: <
>   key: "i0"
>   value: "n2"
> >
> inputs: <
>   key: "i1"
>   value: "n4" . // <--- only one present.
> >
> inputs: <
>   key: "i2"
>   value: "n6"
> >
> outputs: <
>   key: "i0"
>   value: "n7"
> >
>   >
> >
> [...]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-4067) Java: FlinkPortableTestRunner: runs portably via self-started local Flink

2018-08-01 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka resolved BEAM-4067.

   Resolution: Fixed
Fix Version/s: 2.7.0

> Java: FlinkPortableTestRunner: runs portably via self-started local Flink
> -
>
> Key: BEAM-4067
> URL: https://issues.apache.org/jira/browse/BEAM-4067
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Ankur Goenka
>Priority: Minor
> Fix For: 2.7.0
>
>
> The portable Flink runner cannot be tested through the normal mechanisms used 
> for ValidatesRunner tests because it requires a job server to be constructed 
> out of band and for pipelines to be run through it. We should implement a 
> shim that acts as a standard Java SDK Runner that spins up the necessary 
> server (possibly in-process) and runs against it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4067) Java: FlinkPortableTestRunner: runs portably via self-started local Flink

2018-08-01 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566051#comment-16566051
 ] 

Ankur Goenka commented on BEAM-4067:


This feature is implemented in https://github.com/apache/beam/pull/5935

> Java: FlinkPortableTestRunner: runs portably via self-started local Flink
> -
>
> Key: BEAM-4067
> URL: https://issues.apache.org/jira/browse/BEAM-4067
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Ankur Goenka
>Priority: Minor
>
> The portable Flink runner cannot be tested through the normal mechanisms used 
> for ValidatesRunner tests because it requires a job server to be constructed 
> out of band and for pipelines to be run through it. We should implement a 
> shim that acts as a standard Java SDK Runner that spins up the necessary 
> server (possibly in-process) and runs against it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-4147) Abstractions for artifact delivery via arbitrary storage backends

2018-08-01 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka resolved BEAM-4147.

   Resolution: Resolved
Fix Version/s: 2.6.0

We have working ArtifactStaging and ArtifactRetrieval services based on the Beam 
file system, which covers the majority of distributed file systems.

> Abstractions for artifact delivery via arbitrary storage backends
> -
>
> Key: BEAM-4147
> URL: https://issues.apache.org/jira/browse/BEAM-4147
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Axel Magnuson
>Priority: Major
> Fix For: 2.6.0
>
>
> We need a way to wire in arbitrary runner artifact storage backends into the 
> job server and through to artifact staging on workers. This requires some new 
> abstractions in front of the job service.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3230) beam_PerformanceTests_Python is failing

2018-08-01 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566065#comment-16566065
 ] 

Ankur Goenka commented on BEAM-3230:


[~tvalentyn] The latest failure is 
[https://builds.apache.org/job/beam_PerformanceTests_Python/1263/console], where 
the version.py file is missing.

Is this test still valid after changes in the past few months?

> beam_PerformanceTests_Python is failing
> ---
>
> Key: BEAM-3230
> URL: https://issues.apache.org/jira/browse/BEAM-3230
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Critical
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Jenkings beam_PerformanceTests_Python stage is failing for python builds.
> Here is the link to a failure console output 
> https://builds.apache.org/job/beam_PerformanceTests_Python/582/console



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5022) Move beam-sdks-java-fn-execution#createPortableValidatesRunnerTask to BeamModulePlugin

2018-08-01 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566055#comment-16566055
 ] 

Ankur Goenka commented on BEAM-5022:


Addressed as part of [https://github.com/apache/beam/pull/6073]

> Move beam-sdks-java-fn-execution#createPortableValidatesRunnerTask to 
> BeamModulePlugin
> --
>
> Key: BEAM-5022
> URL: https://issues.apache.org/jira/browse/BEAM-5022
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, runner-flink
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> Move beam-sdks-java-fn-execution#createPortableValidatesRunnerTask to 
> BeamModulePlugin So that it can be used by other portable runners tests.
>  
> Also, create an interface TestJobserverDriver and make the drivers extend it 
> instead of using reflection to start the Jobserver.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4826) Flink runner sends bad flatten to SDK

2018-08-01 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566224#comment-16566224
 ] 

Ankur Goenka commented on BEAM-4826:


[~herohde] Quick question.

What do you mean by "Dataflow removes the flatten when it does the same split".

Does Dataflow drop the flatten altogether, or does it drop the redundant inputs 
from the flatten transform when sending the process bundle descriptor to the 
SDKHarness?

 

The challenge with Flink, and with the portability implementation in general, is 
that it can potentially create a different ProcessBundleDescriptor for a flatten 
for each input, based on how those inputs are created.

There are 2 potential fixes for this.
 # Remove the redundant input at execution time in the runner.
 # Create multiple flatten transforms for each stage created after fusion.

I think fix 1 is better because it gives the runner more information about how 
to fuse things, which is hard to attain with fix 2.

> Flink runner sends bad flatten to SDK
> -
>
> Key: BEAM-4826
> URL: https://issues.apache.org/jira/browse/BEAM-4826
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Henning Rohde
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: portability
>
> For a Go flatten test w/ 3 input, the Flink runner splits this into 3 bundle 
> descriptors. But it sends the original 3-input flatten but w/ 1 actual input 
> present in each bundle descriptor. This is inconsistent and the SDK shouldn't 
> expect dangling PCollections. In contrast, Dataflow removes the flatten when 
> it does the same split.
> Snippet:
> register: <
>   process_bundle_descriptor: <
> id: "3"
> transforms: <
>   key: "e4"
>   value: <
> unique_name: "github.com/apache/beam/sdks/go/pkg/beam.createFn'1"
> spec: <
>   urn: "urn:beam:transform:pardo:v1"
>   payload: [...]
> >
> inputs: <
>   key: "i0"
>   value: "n3"
> >
> outputs: <
>   key: "i0"
>   value: "n4"
> >
>   >
> >
> transforms: <
>   key: "e7"
>   value: <
> unique_name: "Flatten"
> spec: <
>   urn: "beam:transform:flatten:v1"
> >
> inputs: <
>   key: "i0"
>   value: "n2"
> >
> inputs: <
>   key: "i1"
>   value: "n4" . // <--- only one present.
> >
> inputs: <
>   key: "i2"
>   value: "n6"
> >
> outputs: <
>   key: "i0"
>   value: "n7"
> >
>   >
> >
> [...]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5090) Use topological sort during ProcessBundle in Java SDKHarness

2018-08-06 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-5090:
--

 Summary: Use topological sort during ProcessBundle in Java 
SDKHarness
 Key: BEAM-5090
 URL: https://issues.apache.org/jira/browse/BEAM-5090
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-harness
Reporter: Ankur Goenka
Assignee: Ankur Goenka


In reference to comment 
[https://github.com/apache/beam/pull/6093#issuecomment-410831830]
 * Use QueryablePipeline#getTopologicallyOrderedTransforms when executing 
processBundle requests.
 * Explore: is it worth caching the sorted structure when the registerRequest is 
received.
 * Also, explore how we can handle cycles in the execution stage of a process 
bundle request, if any.
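The ordering the issue asks for can be sketched with Kahn's algorithm over a toy transform graph (QueryablePipeline#getTopologicallyOrderedTransforms is the real API; the graph shape and names below are illustrative). The empty-queue-before-completion check also covers the cycle case mentioned in the last bullet:

```python
# Sketch: topologically order transforms so each one runs only after
# the transforms it consumes from, raising on a cyclic bundle.
from collections import deque

def topological_order(deps):
    """deps: transform name -> set of upstream transforms it consumes from."""
    indegree = {t: len(d) for t, d in deps.items()}
    downstream = {t: [] for t in deps}
    for t, d in deps.items():
        for u in d:
            downstream[u].append(t)
    ready = deque(sorted(t for t, n in indegree.items() if n == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for v in downstream[t]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(deps):
        # leftover nodes all have positive indegree: a cycle
        raise ValueError('cycle in process bundle request')
    return order

deps = {'read': set(), 'pardo': {'read'}, 'flatten': {'read', 'pardo'}}
print(topological_order(deps))  # ['read', 'pardo', 'flatten']
```

Caching this order when the register request arrives, as the second bullet suggests, would amortize the sort across repeated processBundle requests for the same descriptor.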



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-5110) Reconile Flink JVM singleton management with deployment

2018-08-13 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578840#comment-16578840
 ] 

Ankur Goenka edited comment on BEAM-5110 at 8/13/18 7:56 PM:
-

I agree. We should provide control over SDKHarness container instances, 
especially since the Python container can only utilize a single core at a time.

We can break down the whole container management in 2 parts based on the 
discussion.
 # How to start and manage the SDKHarness (using Kubernetes, process-based, 
singleton factory, etc.)
 # Configuration for managing the SDKHarness (number of containers, type of 
containers, etc.)

This bug tries to address the 1st part using a singleton factory, which 
apparently is not behaving as expected in the current code base. We want to 
explore a more robust way of managing containers, such as Kubernetes, but that 
will take some time to add and would also impose more infrastructure 
requirements, so a singleton container seems useful.

Shall we track the 2nd part as "Multiple SDKHarness with singleton container 
manager" separately?


was (Author: angoenka):
I agree. We should provide control SDKHarness container instances especially 
when python container can only utilize a single core at a time.

We can break down the whole container management in 2 parts based on the 
discussion.
 # How to start and manage SDKHarness. (using kubernetes, process based, 
singleton factory etc)
 # Configuration for managing SDKHarenss. ( Number of containers. type of 
containers etc)

This bug tries to address the 1st part using a singleton factory which 
apparently is not behaving as expected in the current code base. We want to 
explore a more robust way of managing containers like kubernetes but that will 
take some time to add and also impose more infrastructure requirements so 
singleton container seems useful.

Shall we track the 2nd part as "Multiple SDKHarness with singleton container 
manager" separately?

> Reconile Flink JVM singleton management with deployment
> ---
>
> Key: BEAM-5110
> URL: https://issues.apache.org/jira/browse/BEAM-5110
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Ben Sidhom
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> [~angoenka] noticed through debugging that multiple instances of 
> BatchFlinkExecutableStageContext.BatchFactory are loaded for a given job when 
> executing in standalone cluster mode. This context factory is responsible for 
> maintaining singleton state across a TaskManager (JVM) in order to share SDK 
> Environments across workers in a given job. The multiple-loading breaks 
> singleton semantics and results in an indeterminate number of Environments 
> being created.
> It turns out that the [Flink classloading 
> mechanism|https://ci.apache.org/projects/flink/flink-docs-release-1.5/monitoring/debugging_classloading.html]
>  is determined by deployment mode. Note that "user code" as referenced by 
> this link is actually the Flink job server jar. Actual end-user code lives 
> inside of the SDK Environment and uploaded artifacts.
> In order to maintain singletons without resorting to IPC (for example, using 
> file locks and/or additional gRPC servers), we need to force non-dynamic 
> classloading. For example, this happens when jobs are submitted to YARN for 
> one-off deployments via `flink run`. However, connecting to an existing 
> (Flink standalone) deployment results in dynamic classloading.
> We should investigate this behavior and either document (and attempt to 
> enforce) deployment modes that are consistent with our requirements, or (if 
> possible) create a custom classloader that enforces singleton loading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5110) Reconile Flink JVM singleton management with deployment

2018-08-13 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578840#comment-16578840
 ] 

Ankur Goenka commented on BEAM-5110:


I agree. We should provide control over SDKHarness container instances, 
especially since the Python container can only utilize a single core at a time.

We can break down the whole container management in 2 parts based on the 
discussion.
 # How to start and manage the SDKHarness (using Kubernetes, process-based, 
singleton factory, etc.)
 # Configuration for managing the SDKHarness (number of containers, type of 
containers, etc.)

This bug tries to address the 1st part using a singleton factory, which 
apparently is not behaving as expected in the current code base. We want to 
explore a more robust way of managing containers, such as Kubernetes, but that 
will take some time to add and would also impose more infrastructure 
requirements, so a singleton container seems useful.

Shall we track the 2nd part as "Multiple SDKHarness with singleton container 
manager" separately?

> Reconile Flink JVM singleton management with deployment
> ---
>
> Key: BEAM-5110
> URL: https://issues.apache.org/jira/browse/BEAM-5110
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Ben Sidhom
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> [~angoenka] noticed through debugging that multiple instances of 
> BatchFlinkExecutableStageContext.BatchFactory are loaded for a given job when 
> executing in standalone cluster mode. This context factory is responsible for 
> maintaining singleton state across a TaskManager (JVM) in order to share SDK 
> Environments across workers in a given job. The multiple-loading breaks 
> singleton semantics and results in an indeterminate number of Environments 
> being created.
> It turns out that the [Flink classloading 
> mechanism|https://ci.apache.org/projects/flink/flink-docs-release-1.5/monitoring/debugging_classloading.html]
>  is determined by deployment mode. Note that "user code" as referenced by 
> this link is actually the Flink job server jar. Actual end-user code lives 
> inside of the SDK Environment and uploaded artifacts.
> In order to maintain singletons without resorting to IPC (for example, using 
> file locks and/or additional gRPC servers), we need to force non-dynamic 
> classloading. For example, this happens when jobs are submitted to YARN for 
> one-off deployments via `flink run`. However, connecting to an existing 
> (Flink standalone) deployment results in dynamic classloading.
> We should investigate this behavior and either document (and attempt to 
> enforce) deployment modes that are consistent with our requirements, or (if 
> possible) create a custom classloader that enforces singleton loading.
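The multiple-loading failure described above can be reproduced in miniature: loading the same module source through two independent loaders (a Python analogue of two JVM classloaders loading the same class bytes) duplicates the module-level "singleton" state:

```python
import importlib.util
import os
import tempfile

# Write a tiny "factory" module with module-level (static-like) state.
src = (
    "INSTANCES = []\n"
    "def create():\n"
    "    INSTANCES.append(object())\n"
    "    return INSTANCES[-1]\n"
)
path = os.path.join(tempfile.mkdtemp(), "factory_mod.py")
with open(path, "w") as f:
    f.write(src)

def load_fresh(name):
    # Each call builds an independent module object from the same source --
    # analogous to a separate classloader loading the factory class.
    spec = importlib.util.spec_from_file_location(name, path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

a = load_fresh("factory_a")
b = load_fresh("factory_b")
a.create(); a.create(); b.create()
# "Singleton" state is duplicated: each copy tracks only its own instances.
print(len(a.INSTANCES), len(b.INSTANCES))  # -> 2 1
```

Like the duplicated BatchFactory, neither copy knows about the other's instances, so an "indeterminate number of Environments" can be created.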



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5155) Custom sdk_location parameter not working with fn_api

2018-08-15 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-5155:
--

 Summary: Custom sdk_location parameter not working with fn_api
 Key: BEAM-5155
 URL: https://issues.apache.org/jira/browse/BEAM-5155
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-harness
Reporter: Ankur Goenka
Assignee: Ankur Goenka


The custom sdk_location is not taking effect in the portability framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4130) Portable Flink runner JobService entry point in a Docker container

2018-08-14 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-4130:
--

Assignee: Maximilian Michels  (was: Ankur Goenka)

> Portable Flink runner JobService entry point in a Docker container
> --
>
> Key: BEAM-4130
> URL: https://issues.apache.org/jira/browse/BEAM-4130
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The portable Flink runner exists as a Job Service that runs somewhere. We 
> need a main entry point that itself spins up the job service (and artifact 
> staging service). The main program itself should be packaged into an uberjar 
> such that it can be run locally or submitted to a Flink deployment via `flink 
> run`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5110) Reconile Flink JVM singleton management with deployment

2018-08-14 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580554#comment-16580554
 ] 

Ankur Goenka commented on BEAM-5110:


With the worker_id introduction, fn_api is capable of supporting and managing 
multiple SDKHarness instances on a single machine. 

The challenge with Python is that a single Python process can only use one CPU 
core.

Going forward there can be other cases where we might want to have more than 1 
instance of the same SDKHarness environment.

But this seems to be a separate problem and should be handled separately from 
the current issue.
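The one-core-per-process limitation comes from CPython's GIL: threads in a single interpreter serialize CPU-bound work, so using more cores means running more SDK harness processes rather than more threads. A minimal standard-library illustration:

```python
import multiprocessing as mp

def busy(n):
    # CPU-bound loop; in one CPython process, threads running this would
    # serialize on the GIL, pinning the whole process to one core.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    # Separate processes (roughly one per core) sidestep the GIL -- the
    # analogue of running multiple SDK harness workers on one machine.
    with mp.Pool(processes=4) as pool:
        results = pool.map(busy, [100_000] * 4)
        print(len(results))  # -> 4
```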

> Reconile Flink JVM singleton management with deployment
> ---
>
> Key: BEAM-5110
> URL: https://issues.apache.org/jira/browse/BEAM-5110
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Ben Sidhom
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> [~angoenka] noticed through debugging that multiple instances of 
> BatchFlinkExecutableStageContext.BatchFactory are loaded for a given job when 
> executing in standalone cluster mode. This context factory is responsible for 
> maintaining singleton state across a TaskManager (JVM) in order to share SDK 
> Environments across workers in a given job. The multiple-loading breaks 
> singleton semantics and results in an indeterminate number of Environments 
> being created.
> It turns out that the [Flink classloading 
> mechanism|https://ci.apache.org/projects/flink/flink-docs-release-1.5/monitoring/debugging_classloading.html]
>  is determined by deployment mode. Note that "user code" as referenced by 
> this link is actually the Flink job server jar. Actual end-user code lives 
> inside of the SDK Environment and uploaded artifacts.
> In order to maintain singletons without resorting to IPC (for example, using 
> file locks and/or additional gRPC servers), we need to force non-dynamic 
> classloading. For example, this happens when jobs are submitted to YARN for 
> one-off deployments via `flink run`. However, connecting to an existing 
> (Flink standalone) deployment results in dynamic classloading.
> We should investigate this behavior and either document (and attempt to 
> enforce) deployment modes that are consistent with our requirements, or (if 
> possible) create a custom classloader that enforces singleton loading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-5110) Reconile Flink JVM singleton management with deployment

2018-08-14 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580554#comment-16580554
 ] 

Ankur Goenka edited comment on BEAM-5110 at 8/14/18 11:47 PM:
--

With the worker_id introduction, fn_api is capable of supporting and managing 
multiple SDKHarness on a single machine. 

The challenge with python is that a single python process can only use 1 cpu 
core.

Going forward there can be other cases where we might want to have more than 1 
instance of the same SDKHarness environment.

But this seems to be a separate problem and should be handled separately from 
the current issue.


was (Author: angoenka):
With the worker_id introduction, fn_api is capable of supporting and managing 
multiple SDKHarness on a single machine. 

The challenge with python is that a single python process and only use 1 cpu 
core.

Going forward there can be other cases where we might want to have more than 1 
instance of the same SDKHarness environment.

But this seems to be a separate problem and should be handled separately from 
the current issue.

> Reconile Flink JVM singleton management with deployment
> ---
>
> Key: BEAM-5110
> URL: https://issues.apache.org/jira/browse/BEAM-5110
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Ben Sidhom
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> [~angoenka] noticed through debugging that multiple instances of 
> BatchFlinkExecutableStageContext.BatchFactory are loaded for a given job when 
> executing in standalone cluster mode. This context factory is responsible for 
> maintaining singleton state across a TaskManager (JVM) in order to share SDK 
> Environments across workers in a given job. The multiple-loading breaks 
> singleton semantics and results in an indeterminate number of Environments 
> being created.
> It turns out that the [Flink classloading 
> mechanism|https://ci.apache.org/projects/flink/flink-docs-release-1.5/monitoring/debugging_classloading.html]
>  is determined by deployment mode. Note that "user code" as referenced by 
> this link is actually the Flink job server jar. Actual end-user code lives 
> inside of the SDK Environment and uploaded artifacts.
> In order to maintain singletons without resorting to IPC (for example, using 
> file locks and/or additional gRPC servers), we need to force non-dynamic 
> classloading. For example, this happens when jobs are submitted to YARN for 
> one-off deployments via `flink run`. However, connecting to an existing 
> (Flink standalone) deployment results in dynamic classloading.
> We should investigate this behavior and either document (and attempt to 
> enforce) deployment modes that are consistent with our requirements, or (if 
> possible) create a custom classloader that enforces singleton loading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5110) Reconile Flink JVM singleton management with deployment

2018-08-08 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573688#comment-16573688
 ] 

Ankur Goenka commented on BEAM-5110:


We can invert the class loading order using 
https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#classloader-resolve-order
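For reference, the option goes in flink-conf.yaml. Flink's default is child-first; parent-first loads job classes through the TaskManager's own classloader, which preserves JVM-wide singletons:

```yaml
# flink-conf.yaml -- invert Flink's dynamic classloading for standalone clusters
classloader.resolve-order: parent-first
```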

> Reconile Flink JVM singleton management with deployment
> ---
>
> Key: BEAM-5110
> URL: https://issues.apache.org/jira/browse/BEAM-5110
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Ben Sidhom
>Priority: Major
>
> [~angoenka] noticed through debugging that multiple instances of 
> BatchFlinkExecutableStageContext.BatchFactory are loaded for a given job when 
> executing in standalone cluster mode. This context factory is responsible for 
> maintaining singleton state across a TaskManager (JVM) in order to share SDK 
> Environments across workers in a given job. The multiple-loading breaks 
> singleton semantics and results in an indeterminate number of Environments 
> being created.
> It turns out that the [Flink classloading 
> mechanism|https://ci.apache.org/projects/flink/flink-docs-release-1.5/monitoring/debugging_classloading.html]
>  is determined by deployment mode. Note that "user code" as referenced by 
> this link is actually the Flink job server jar. Actual end-user code lives 
> inside of the SDK Environment and uploaded artifacts.
> In order to maintain singletons without resorting to IPC (for example, using 
> file locks and/or additional gRPC servers), we need to force non-dynamic 
> classloading. For example, this happens when jobs are submitted to YARN for 
> one-off deployments via `flink run`. However, connecting to an existing 
> (Flink standalone) deployment results in dynamic classloading.
> We should investigate this behavior and either document (and attempt to 
> enforce) deployment modes that are consistent with our requirements, or (if 
> possible) create a custom classloader that enforces singleton loading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-5110) Reconile Flink JVM singleton management with deployment

2018-08-08 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573688#comment-16573688
 ] 

Ankur Goenka edited comment on BEAM-5110 at 8/8/18 6:43 PM:


We can invert the class loading order using 
[https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#classloader-resolve-order]

 

There are additional configs as well that can be used to load certain classes 
from the parent class loader first, before falling back to the dynamic class 
loader.


was (Author: angoenka):
We can invert the class loading order using 
https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#classloader-resolve-order

> Reconile Flink JVM singleton management with deployment
> ---
>
> Key: BEAM-5110
> URL: https://issues.apache.org/jira/browse/BEAM-5110
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ben Sidhom
>Assignee: Ben Sidhom
>Priority: Major
>
> [~angoenka] noticed through debugging that multiple instances of 
> BatchFlinkExecutableStageContext.BatchFactory are loaded for a given job when 
> executing in standalone cluster mode. This context factory is responsible for 
> maintaining singleton state across a TaskManager (JVM) in order to share SDK 
> Environments across workers in a given job. The multiple-loading breaks 
> singleton semantics and results in an indeterminate number of Environments 
> being created.
> It turns out that the [Flink classloading 
> mechanism|https://ci.apache.org/projects/flink/flink-docs-release-1.5/monitoring/debugging_classloading.html]
>  is determined by deployment mode. Note that "user code" as referenced by 
> this link is actually the Flink job server jar. Actual end-user code lives 
> inside of the SDK Environment and uploaded artifacts.
> In order to maintain singletons without resorting to IPC (for example, using 
> file locks and/or additional gRPC servers), we need to force non-dynamic 
> classloading. For example, this happens when jobs are submitted to YARN for 
> one-off deployments via `flink run`. However, connecting to an existing 
> (Flink standalone) deployment results in dynamic classloading.
> We should investigate this behavior and either document (and attempt to 
> enforce) deployment modes that are consistent with our requirements, or (if 
> possible) create a custom classloader that enforces singleton loading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5090) Use topological sort during ProcessBundle in Java SDKHarness

2018-08-08 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-5090:
---
Description: 
In reference to comment 
[https://github.com/apache/beam/pull/6093#issuecomment-410831830]
 * Use QueryablePipeline#getTopologicallyOrderedTransforms and execute 
processBundle requests.
 * Explore: is it worth caching the sorted structure when registReuest is 
received.
 * Also, explore how we can handle cycles in the execution stage if process 
bundle request if any.

  was:
In reference to comment 
[https://github.com/apache/beam/pull/6093#issuecomment-410831830]
 * Use QueryablePipeline#getTopologicallyOrderedTransforms and execute 
processBundle requests.
 * Explore: is it worth caching the sorted structure at when registReuest is 
received.
 * Also, explore the how we can handle cycles in the execution stage if process 
bundle request if any.


> Use topological sort during ProcessBundle in Java SDKHarness
> 
>
> Key: BEAM-5090
> URL: https://issues.apache.org/jira/browse/BEAM-5090
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> In reference to comment 
> [https://github.com/apache/beam/pull/6093#issuecomment-410831830]
>  * Use QueryablePipeline#getTopologicallyOrderedTransforms and execute 
> processBundle requests.
>  * Explore: is it worth caching the sorted structure when registReuest is 
> received.
>  * Also, explore how we can handle cycles in the execution stage if process 
> bundle request if any.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5090) Use topological sort during ProcessBundle in Java SDKHarness

2018-08-08 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-5090:
---
Description: 
In reference to comment 
[https://github.com/apache/beam/pull/6093#issuecomment-410831830]
 * Use QueryablePipeline#getTopologicallyOrderedTransforms and execute 
processBundle requests.
 * Explore: is it worth caching the sorted structure when a RegisterRequest is 
received.
 * Also, explore how we can handle cycles in the execution stage in a process 
bundle request, if any.

  was:
In reference to comment 
[https://github.com/apache/beam/pull/6093#issuecomment-410831830]
 * Use QueryablePipeline#getTopologicallyOrderedTransforms and execute 
processBundle requests.
 * Explore: is it worth caching the sorted structure when registReuest is 
received.
 * Also, explore how we can handle cycles in the execution stage if process 
bundle request if any.


> Use topological sort during ProcessBundle in Java SDKHarness
> 
>
> Key: BEAM-5090
> URL: https://issues.apache.org/jira/browse/BEAM-5090
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> In reference to comment 
> [https://github.com/apache/beam/pull/6093#issuecomment-410831830]
>  * Use QueryablePipeline#getTopologicallyOrderedTransforms and execute 
> processBundle requests.
>  * Explore: is it worth caching the sorted structure when a RegisterRequest is 
> received.
>  * Also, explore how we can handle cycles in the execution stage in a process 
> bundle request, if any.
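The topological ordering and the cycle question from the description can be sketched together with Kahn's algorithm. Illustrative Python only; the function name is hypothetical, standing in for what QueryablePipeline#getTopologicallyOrderedTransforms provides on the Java side:

```python
from collections import deque

def topologically_ordered(transforms, inputs):
    """Order transform ids so each appears after all of its inputs.

    `transforms` is an iterable of ids; `inputs` maps id -> list of input ids.
    Raises ValueError on a cycle -- the execution-stage case the last
    bullet asks about.
    """
    indegree = {t: 0 for t in transforms}
    consumers = {t: [] for t in transforms}
    for t, ins in inputs.items():
        for i in ins:
            indegree[t] += 1
            consumers[i].append(t)
    # Kahn's algorithm: repeatedly emit transforms whose inputs are all done.
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in consumers[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(indegree):
        raise ValueError("cycle detected in the execution stage")
    return order
```

Caching this order when a register request arrives is cheap, since the sort runs once per pipeline rather than once per bundle.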



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-4699) BeamFileSystemArtifactServicesTest.putArtifactsSingleSmallFileTest flake

2018-08-09 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka resolved BEAM-4699.

   Resolution: Duplicate
Fix Version/s: 2.7.0

The issue is a duplicate of BEAM-4810.
Please reopen if you see the issue again.

 

> BeamFileSystemArtifactServicesTest.putArtifactsSingleSmallFileTest flake
> 
>
> Key: BEAM-4699
> URL: https://issues.apache.org/jira/browse/BEAM-4699
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: portability
> Fix For: 2.7.0
>
>
> I've seen a few transient failures from 
> {{BeamFileSystemArtifactServicesTest}}. I don't recall if they are all 
> {{putArtifactsSingleSmallFileTest}} or how often they occur.
> https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/
> {code}
> java.io.FileNotFoundException: 
> /tmp/junit8499382858780569091/staging/123/artifacts/artifact_c147efcfc2d7ea666a9e4f5187b115c90903f0fc896a56df9a6ef5d8f3fc9f31
>  (No such file or directory)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5190) Python pipeline options are not picked correctly by PortableRunner

2018-08-22 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588483#comment-16588483
 ] 

Ankur Goenka commented on BEAM-5190:


cc: [~thw] This blocks the worker_thread option on the Python SDKHarness.

> Python pipeline options are not picked correctly by PortableRunner
> --
>
> Key: BEAM-5190
> URL: https://issues.apache.org/jira/browse/BEAM-5190
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> Python SDK worker is deserializing the pipeline options to dictionary instead 
> of PipelineOptions
> Sample log
> [grpc-default-executor-2] INFO sdk_worker_main.main - Python sdk harness 
> started with pipeline_options: \{u'beam:option:flink_master:v1': u'[auto]', 
> u'beam:option:streaming:v1': False, u'beam:option:experiments:v1': 
> [u'beam_fn_api', u'worker_threads=50'], u'beam:option:dry_run:v1': False, 
> u'beam:option:runner:v1': None, u'beam:option:profile_memory:v1': False, 
> u'beam:option:runtime_type_check:v1': False, u'beam:option:region:v1': 
> u'us-central1', u'beam:option:options_id:v1': 1, u'beam:option:no_auth:v1': 
> False, u'beam:option:dataflow_endpoint:v1': 
> u'https://dataflow.googleapis.com', u'beam:option:sdk_location:v1': 
> u'/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/dist/apache-beam-2.7.0.dev0.tar.gz',
>  u'beam:option:direct_runner_use_stacked_bundle:v1': True, 
> u'beam:option:save_main_session:v1': True, 
> u'beam:option:type_check_strictness:v1': u'DEFAULT_TO_ANY', 
> u'beam:option:profile_cpu:v1': False, u'beam:option:job_endpoint:v1': 
> u'localhost:8099', u'beam:option:job_name:v1': 
> u'BeamApp-goenka-0822071645-48ae1008', u'beam:option:temp_location:v1': 
> u'gs://clouddfe-goenka/tmp/', u'beam:option:app_name:v1': None, 
> u'beam:option:project:v1': u'google.com:clouddfe', 
> u'beam:option:pipeline_type_check:v1': True, 
> u'beam:option:staging_location:v1': u'gs://clouddfe-goenka/tmp/staging'} 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5190) Python pipeline options are not picked correctly by PortableRunner

2018-08-22 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-5190:
--

 Summary: Python pipeline options are not picked correctly by 
PortableRunner
 Key: BEAM-5190
 URL: https://issues.apache.org/jira/browse/BEAM-5190
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-harness
Reporter: Ankur Goenka
Assignee: Ankur Goenka


Python SDK worker is deserializing the pipeline options to a dictionary instead 
of PipelineOptions

Sample log

[grpc-default-executor-2] INFO sdk_worker_main.main - Python sdk harness 
started with pipeline_options: \{u'beam:option:flink_master:v1': u'[auto]', 
u'beam:option:streaming:v1': False, u'beam:option:experiments:v1': 
[u'beam_fn_api', u'worker_threads=50'], u'beam:option:dry_run:v1': False, 
u'beam:option:runner:v1': None, u'beam:option:profile_memory:v1': False, 
u'beam:option:runtime_type_check:v1': False, u'beam:option:region:v1': 
u'us-central1', u'beam:option:options_id:v1': 1, u'beam:option:no_auth:v1': 
False, u'beam:option:dataflow_endpoint:v1': u'https://dataflow.googleapis.com', 
u'beam:option:sdk_location:v1': 
u'/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/dist/apache-beam-2.7.0.dev0.tar.gz',
 u'beam:option:direct_runner_use_stacked_bundle:v1': True, 
u'beam:option:save_main_session:v1': True, 
u'beam:option:type_check_strictness:v1': u'DEFAULT_TO_ANY', 
u'beam:option:profile_cpu:v1': False, u'beam:option:job_endpoint:v1': 
u'localhost:8099', u'beam:option:job_name:v1': 
u'BeamApp-goenka-0822071645-48ae1008', u'beam:option:temp_location:v1': 
u'gs://clouddfe-goenka/tmp/', u'beam:option:app_name:v1': None, 
u'beam:option:project:v1': u'google.com:clouddfe', 
u'beam:option:pipeline_type_check:v1': True, 
u'beam:option:staging_location:v1': u'gs://clouddfe-goenka/tmp/staging'} 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5190) Python pipeline options are not picked correctly by PortableRunner

2018-08-22 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588549#comment-16588549
 ] 

Ankur Goenka commented on BEAM-5190:


Options are rewritten to URNs, causing the issue.

https://github.com/apache/beam/blob/7c41e0a915083bd3b1fe52c2a417fa38a00e6463/sdks/python/apache_beam/runners/portability/portable_runner.py#L107
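The rewrite at that line wraps each option name into a `beam:option:<name>:v1` key, which is what the harness then receives instead of plain PipelineOptions (see the sample log above). A sketch of the reverse mapping; the helper name is hypothetical and the real fix belongs in the SDK worker / portable runner:

```python
def urn_options_to_dict(urn_options):
    """Undo the 'beam:option:<name>:v1' key rewriting.

    Keys that do not carry the URN wrapping are passed through unchanged.
    """
    prefix, suffix = "beam:option:", ":v1"
    plain = {}
    for key, value in urn_options.items():
        if key.startswith(prefix) and key.endswith(suffix):
            plain[key[len(prefix):-len(suffix)]] = value
        else:
            plain[key] = value
    return plain
```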

> Python pipeline options are not picked correctly by PortableRunner
> --
>
> Key: BEAM-5190
> URL: https://issues.apache.org/jira/browse/BEAM-5190
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> Python SDK worker is deserializing the pipeline options to dictionary instead 
> of PipelineOptions
> Sample log
> [grpc-default-executor-2] INFO sdk_worker_main.main - Python sdk harness 
> started with pipeline_options: \{u'beam:option:flink_master:v1': u'[auto]', 
> u'beam:option:streaming:v1': False, u'beam:option:experiments:v1': 
> [u'beam_fn_api', u'worker_threads=50'], u'beam:option:dry_run:v1': False, 
> u'beam:option:runner:v1': None, u'beam:option:profile_memory:v1': False, 
> u'beam:option:runtime_type_check:v1': False, u'beam:option:region:v1': 
> u'us-central1', u'beam:option:options_id:v1': 1, u'beam:option:no_auth:v1': 
> False, u'beam:option:dataflow_endpoint:v1': 
> u'https://dataflow.googleapis.com', u'beam:option:sdk_location:v1': 
> u'/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/dist/apache-beam-2.7.0.dev0.tar.gz',
>  u'beam:option:direct_runner_use_stacked_bundle:v1': True, 
> u'beam:option:save_main_session:v1': True, 
> u'beam:option:type_check_strictness:v1': u'DEFAULT_TO_ANY', 
> u'beam:option:profile_cpu:v1': False, u'beam:option:job_endpoint:v1': 
> u'localhost:8099', u'beam:option:job_name:v1': 
> u'BeamApp-goenka-0822071645-48ae1008', u'beam:option:temp_location:v1': 
> u'gs://clouddfe-goenka/tmp/', u'beam:option:app_name:v1': None, 
> u'beam:option:project:v1': u'google.com:clouddfe', 
> u'beam:option:pipeline_type_check:v1': True, 
> u'beam:option:staging_location:v1': u'gs://clouddfe-goenka/tmp/staging'} 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-24 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591973#comment-16591973
 ] 

Ankur Goenka commented on BEAM-5180:


From the java.net.URI docs, a hierarchical URI is subject to further parsing 
according to the syntax
{quote}[_scheme_:][//_authority_][_path_][?_query_][#_fragment_]{quote}
which includes // only as part of the optional authority component.

But to support HDFS and unblock ourselves we should go with the rollback.
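For illustration, Python's urllib.parse (rather than java.net.URI) shows the same behavior: an authority-less hierarchical URI like the HDFS one is perfectly legal, so round-tripping a serialized ResourceId must not demand a `//` after the scheme:

```python
from urllib.parse import urlparse

# 'hdfs:/some/path' is a legal hierarchical URI with no authority component,
# which is exactly the form Hadoop's URI.toString() produces.
u = urlparse("hdfs:/some/path")
print(u.scheme, repr(u.netloc), u.path)  # -> hdfs '' /some/path
```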

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Kenneth Knowles
>Priority: Blocker
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Recently this commit
> [https://github.com/apache/beam/commit/3fff58c21f94415f3397e185377e36d3df662384]
> introduced more strict schema parsing which is breaking the contract between 
> _FileResultCoder_ and _FileSystems.matchNewResource()_.
> Coder takes _ResourceId_ and serialize it via `_toString_` methods and then 
> relies on filesystem being able to parse it back again. Having strict 
> _scheme://_ breaks this at least for Hadoop filesystem which use _URI_ for 
> _ResourceId_ and produce _toString()_ in form of `_hdfs:/some/path_`
> I guess the _ResourceIdCoder_ is suffering the same problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-25 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591973#comment-16591973
 ] 

Ankur Goenka edited comment on BEAM-5180 at 8/25/18 10:06 PM:
--

From the java.net.URI docs, a hierarchical URI is subject to further parsing 
according to the syntax
{quote}[_scheme_:][//_authority_][_path_][?_query_][#_fragment_]
{quote}
which includes // only as part of the optional authority component.

But to support HDFS and unblock ourselves we should go with the rollback.


was (Author: angoenka):
From the java.net.URI docs,

A hierarchical URI is subject to further parsing according to the syntax
{quote}[_scheme_{{*:*}}][{{*//*}}_authority_][_path_][{{*?*}}_query_][{{*#*}}_fragment_]{quote}
Which enforces  //

But to support HDFS and unblock our selves we should go with the rollback.

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Ankur Goenka
>Priority: Blocker
> Fix For: 2.7.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Recently this commit
> [https://github.com/apache/beam/commit/3fff58c21f94415f3397e185377e36d3df662384]
> introduced more strict schema parsing which is breaking the contract between 
> _FileResultCoder_ and _FileSystems.matchNewResource()_.
> Coder takes _ResourceId_ and serialize it via `_toString_` methods and then 
> relies on filesystem being able to parse it back again. Having strict 
> _scheme://_ breaks this at least for Hadoop filesystem which use _URI_ for 
> _ResourceId_ and produce _toString()_ in form of `_hdfs:/some/path_`
> I guess the _ResourceIdCoder_ is suffering the same problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-25 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592727#comment-16592727
 ] 

Ankur Goenka commented on BEAM-5180:


That's correct. HDFS file URIs in this case do not have an authority, so // is 
not required.

I agree with the resolution.

From the documentation, scheme identification should be done based only on the 
colon, but Windows paths break this format.

We can have a separate way to identify the Windows file system by using
{code:java}
[a-zA-Z]:/{code}
given that a Windows drive is only a single letter. We should also relax the 
scheme regex back to just
{code:java}
(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*):.*
{code}
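The two-step check proposed above can be sketched as follows. This is a hypothetical helper in Python regex syntax (the Java named group would be written `(?<scheme>...)`), not the actual Beam implementation:

```python
import re

# Relaxed scheme pattern: letter followed by letters/digits/+-., then ':'.
SCHEME_RE = re.compile(r"(?P<scheme>[a-zA-Z][-a-zA-Z0-9+.]*):.*")
# A drive letter followed by ':' and a slash marks a Windows path.
WINDOWS_DRIVE_RE = re.compile(r"[a-zA-Z]:[/\\]")

def parse_scheme(spec):
    """Return the URI scheme of `spec`, or None for local/Windows paths."""
    head = spec.split(":", 1)[0]
    if len(head) == 1 and WINDOWS_DRIVE_RE.match(spec):
        return None  # e.g. C:/Users/... -- a drive letter, not a scheme
    m = SCHEME_RE.match(spec)
    return m.group("scheme") if m else None
```

Note `hdfs:/some/path` still yields a scheme, while single-letter drive specs fall through to local-path handling.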

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Ankur Goenka
>Priority: Blocker
> Fix For: 2.7.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Recently this commit
> [https://github.com/apache/beam/commit/3fff58c21f94415f3397e185377e36d3df662384]
> introduced more strict schema parsing which is breaking the contract between 
> _FileResultCoder_ and _FileSystems.matchNewResource()_.
> Coder takes _ResourceId_ and serialize it via `_toString_` methods and then 
> relies on filesystem being able to parse it back again. Having strict 
> _scheme://_ breaks this at least for Hadoop filesystem which use _URI_ for 
> _ResourceId_ and produce _toString()_ in form of `_hdfs:/some/path_`
> I guess the _ResourceIdCoder_ is suffering the same problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5218) Add the portable test name to the pipeline name for easy debugging

2018-08-25 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-5218:
--

 Summary: Add the portable test name to the pipeline name for easy 
debugging 
 Key: BEAM-5218
 URL: https://issues.apache.org/jira/browse/BEAM-5218
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Ankur Goenka
Assignee: Ankur Goenka


[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/portable_runner_test.py]
 launches job for each test.

The job names are generic and hard to match to a test case.

Adding the test case name to the job name will make it easy to identify the 
corresponding test case and debug.
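One possible naming scheme, sketched in Python. The format and helper name are hypothetical; the real job-name format is chosen by the runner:

```python
import re
import time

def job_name_for_test(test_id, user="beamapp"):
    """Build a job name that embeds the test case name.

    Lowercases the test id and replaces runs of non-alphanumerics with '-'
    so the name stays URL/cluster friendly, then appends a timestamp.
    """
    safe = re.sub(r"[^a-z0-9]+", "-", test_id.lower()).strip("-")
    return "%s-%s-%s" % (user, safe, time.strftime("%m%d%H%M%S"))
```

For example, `job_name_for_test("PortableRunnerTest.test_pardo")` produces a name beginning with `beamapp-portablerunnertest-test-pardo-`, which maps straight back to the test.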



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5155) Custom sdk_location parameter not working with fn_api

2018-08-17 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka resolved BEAM-5155.

   Resolution: Fixed
Fix Version/s: 2.7.0

> Custom sdk_location parameter not working with fn_api
> -
>
> Key: BEAM-5155
> URL: https://issues.apache.org/jira/browse/BEAM-5155
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.7.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The custom sdk_location is not taking effect in the portability framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5166) Add support for upper bound and optimum concurrency in SDK

2018-08-18 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-5166:
--

 Summary: Add support for upper bound and optimum concurrency in SDK
 Key: BEAM-5166
 URL: https://issues.apache.org/jira/browse/BEAM-5166
 Project: Beam
  Issue Type: New Feature
  Components: sdk-go, sdk-java-harness, sdk-py-harness
Reporter: Ankur Goenka
Assignee: Ankur Goenka


Based on the discussion 

[https://lists.apache.org/thread.html/0cbf73d696e0d3a5bb8e93618ac9d6bb81daecf2c9c8e11ee220c8ae@%3Cdev.beam.apache.org%3E]
 

Send concurrency information from the SDKHarness to the Runner as a header on 
the control channel.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5167) Use concurrency information from SDK Harness in Flink Portable Runner

2018-08-18 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-5167:
--

 Summary: Use concurrency information from SDK Harness in Flink 
Portable Runner
 Key: BEAM-5167
 URL: https://issues.apache.org/jira/browse/BEAM-5167
 Project: Beam
  Issue Type: New Feature
  Components: runner-flink
Reporter: Ankur Goenka
Assignee: Ankur Goenka


Based on the discussion 
[https://lists.apache.org/thread.html/0cbf73d696e0d3a5bb8e93618ac9d6bb81daecf2c9c8e11ee220c8ae@%3Cdev.beam.apache.org%3E]

Use SDK Harness concurrency information in Flink runner to schedule bundles.





[jira] [Commented] (BEAM-5190) Python pipeline options are not picked correctly by PortableRunner

2018-08-23 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589865#comment-16589865
 ] 

Ankur Goenka commented on BEAM-5190:


Yes, the more interesting stack is on the Python SDK side, where we will see 
that only 12 bundles are processed at a time.

 

To check the Python stack trace, search for "HTTP" in the worker logs and use 
the host and port.

> Python pipeline options are not picked correctly by PortableRunner
> --
>
> Key: BEAM-5190
> URL: https://issues.apache.org/jira/browse/BEAM-5190
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Python SDK worker is deserializing the pipeline options to a dictionary 
> instead of PipelineOptions.
> Sample log
> [grpc-default-executor-2] INFO sdk_worker_main.main - Python sdk harness 
> started with pipeline_options: \{u'beam:option:flink_master:v1': u'[auto]', 
> u'beam:option:streaming:v1': False, u'beam:option:experiments:v1': 
> [u'beam_fn_api', u'worker_threads=50'], u'beam:option:dry_run:v1': False, 
> u'beam:option:runner:v1': None, u'beam:option:profile_memory:v1': False, 
> u'beam:option:runtime_type_check:v1': False, u'beam:option:region:v1': 
> u'us-central1', u'beam:option:options_id:v1': 1, u'beam:option:no_auth:v1': 
> False, u'beam:option:dataflow_endpoint:v1': 
> u'https://dataflow.googleapis.com', u'beam:option:sdk_location:v1': 
> u'/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/dist/apache-beam-2.7.0.dev0.tar.gz',
>  u'beam:option:direct_runner_use_stacked_bundle:v1': True, 
> u'beam:option:save_main_session:v1': True, 
> u'beam:option:type_check_strictness:v1': u'DEFAULT_TO_ANY', 
> u'beam:option:profile_cpu:v1': False, u'beam:option:job_endpoint:v1': 
> u'localhost:8099', u'beam:option:job_name:v1': 
> u'BeamApp-goenka-0822071645-48ae1008', u'beam:option:temp_location:v1': 
> u'gs://clouddfe-goenka/tmp/', u'beam:option:app_name:v1': None, 
> u'beam:option:project:v1': u'google.com:clouddfe', 
> u'beam:option:pipeline_type_check:v1': True, 
> u'beam:option:staging_location:v1': u'gs://clouddfe-goenka/tmp/staging'} 
>  





[jira] [Created] (BEAM-5194) Pipeline options with multi value are not deserialized correctly from map

2018-08-22 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-5194:
--

 Summary: Pipeline options with multi value are not deserialized 
correctly from map
 Key: BEAM-5194
 URL: https://issues.apache.org/jira/browse/BEAM-5194
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Ankur Goenka
Assignee: Ahmet Altay


[https://github.com/apache/beam/blob/7c41e0a915083bd3b1fe52c2a417fa38a00e6463/sdks/python/apache_beam/options/pipeline_options.py#L171]

 

Multi-value options are converted to strings and added to flags, which causes 
wrong deserialization.
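The failure mode can be illustrated with a small sketch (the function names are 
hypothetical, not Beam's actual code):

```python
def options_to_flags_naive(options_map):
    # Every value, including lists, is passed through str(), so a
    # multi-value option becomes one flag holding the repr of the list.
    return ['--%s=%s' % (k, v) for k, v in sorted(options_map.items())]

def options_to_flags_fixed(options_map):
    # Emit one flag per element of a list-valued option instead.
    flags = []
    for k, v in sorted(options_map.items()):
        if isinstance(v, list):
            flags.extend('--%s=%s' % (k, item) for item in v)
        else:
            flags.append('--%s=%s' % (k, v))
    return flags

opts = {'experiments': ['beam_fn_api', 'worker_threads=50']}
print(options_to_flags_naive(opts))  # ["--experiments=['beam_fn_api', 'worker_threads=50']"]
print(options_to_flags_fixed(opts))  # ['--experiments=beam_fn_api', '--experiments=worker_threads=50']
```

The naive flag cannot be parsed back into a list by an argparse-style consumer, 
which is the "wrong deserialization" described above.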





[jira] [Assigned] (BEAM-5194) Pipeline options with multi value are not deserialized correctly from map

2018-08-22 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-5194:
--

Assignee: Ankur Goenka  (was: Ahmet Altay)

> Pipeline options with multi value are not deserialized correctly from map
> -
>
> Key: BEAM-5194
> URL: https://issues.apache.org/jira/browse/BEAM-5194
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> [https://github.com/apache/beam/blob/7c41e0a915083bd3b1fe52c2a417fa38a00e6463/sdks/python/apache_beam/options/pipeline_options.py#L171]
>  
> Multi-value options are converted to strings and added to flags, which 
> causes wrong deserialization.





[jira] [Comment Edited] (BEAM-5156) Apache Beam on dataflow runner can't find Tensorflow for workers

2018-08-22 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589137#comment-16589137
 ] 

Ankur Goenka edited comment on BEAM-5156 at 8/22/18 5:17 PM:
-

Check the thread 
[https://lists.apache.org/thread.html/0cbf73d696e0d3a5bb8e93618ac9d6bb81daecf2c9c8e11ee220c8ae@%3Cdev.beam.apache.org%3E]

I suspect the same issue here.

Please try using --experiments worker_threads=100 after fixing the setup.py, 
please use 


was (Author: angoenka):
Check the thread 
[https://lists.apache.org/thread.html/0cbf73d696e0d3a5bb8e93618ac9d6bb81daecf2c9c8e11ee220c8ae@%3Cdev.beam.apache.org%3E]

 

Please try using --experiments worker_threads=100 after fixing the setup.py, 
please use 

> Apache Beam on dataflow runner can't find Tensorflow for workers
> 
>
> Key: BEAM-5156
> URL: https://issues.apache.org/jira/browse/BEAM-5156
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
> Environment: google cloud compute instance running linux
>Reporter: Thomas Johns
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.5.0, 2.6.0
>
>
> Adding a serialized TensorFlow model to an Apache Beam pipeline with the 
> Python SDK, but it cannot find any version of TensorFlow when run on the 
> Dataflow runner, although it is not a problem locally. Tried various versions 
> of TensorFlow from 1.6 to 1.10. I thought it might be a conflicting package 
> somewhere, so I removed all other packages and tried to install just 
> TensorFlow, with the same problem.
> Could not find a version that satisfies the requirement tensorflow==1.6.0 
> (from -r reqtest.txt (line 59)) (from versions: )No matching distribution 
> found for tensorflow==1.6.0 (from -r reqtest.txt (line 59))





[jira] [Commented] (BEAM-5156) Apache Beam on dataflow runner can't find Tensorflow for workers

2018-08-22 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589137#comment-16589137
 ] 

Ankur Goenka commented on BEAM-5156:


Check the thread 
[https://lists.apache.org/thread.html/0cbf73d696e0d3a5bb8e93618ac9d6bb81daecf2c9c8e11ee220c8ae@%3Cdev.beam.apache.org%3E]

 

Please try using --experiments worker_threads=100 after fixing the setup.py, 
please use 

> Apache Beam on dataflow runner can't find Tensorflow for workers
> 
>
> Key: BEAM-5156
> URL: https://issues.apache.org/jira/browse/BEAM-5156
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
> Environment: google cloud compute instance running linux
>Reporter: Thomas Johns
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.5.0, 2.6.0
>
>
> Adding a serialized TensorFlow model to an Apache Beam pipeline with the 
> Python SDK, but it cannot find any version of TensorFlow when run on the 
> Dataflow runner, although it is not a problem locally. Tried various versions 
> of TensorFlow from 1.6 to 1.10. I thought it might be a conflicting package 
> somewhere, so I removed all other packages and tried to install just 
> TensorFlow, with the same problem.
> Could not find a version that satisfies the requirement tensorflow==1.6.0 
> (from -r reqtest.txt (line 59)) (from versions: )No matching distribution 
> found for tensorflow==1.6.0 (from -r reqtest.txt (line 59))





[jira] [Created] (BEAM-5219) Expose OutboundMessage in PubSub client

2018-08-26 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-5219:
--

 Summary: Expose OutboundMessage in PubSub client
 Key: BEAM-5219
 URL: https://issues.apache.org/jira/browse/BEAM-5219
 Project: Beam
  Issue Type: Improvement
  Components: io-java-gcp
Reporter: Ankur Goenka
Assignee: Chamikara Jayalath


The publish method in org/apache/beam/sdk/io/gcp/pubsub/PubsubClient.java is 
public, but the argument type OutboundMessage is not public, which makes the 
API unusable.





[jira] [Assigned] (BEAM-5219) Expose OutboundMessage in PubSub client

2018-08-26 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-5219:
--

Assignee: Ankur Goenka  (was: Chamikara Jayalath)

> Expose OutboundMessage in PubSub client
> ---
>
> Key: BEAM-5219
> URL: https://issues.apache.org/jira/browse/BEAM-5219
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Minor
>
> The publish method in org/apache/beam/sdk/io/gcp/pubsub/PubsubClient.java is 
> public, but the argument type OutboundMessage is not public, which makes the 
> API unusable.





[jira] [Assigned] (BEAM-4834) Validate cycles in org.apache.beam.runners.core.construction.graph.Network before doing topological sort

2018-07-19 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-4834:
--

Assignee: Ankur Goenka  (was: Kenneth Knowles)

> Validate cycles in org.apache.beam.runners.core.construction.graph.Network 
> before doing topological sort
> 
>
> Key: BEAM-4834
> URL: https://issues.apache.org/jira/browse/BEAM-4834
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A topological sort will never finish on a cyclic graph, so we should check 
> for cycles before sorting.





[jira] [Created] (BEAM-4834) Validate cycles in org.apache.beam.runners.core.construction.graph.Network before doing topological sort

2018-07-19 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-4834:
--

 Summary: Validate cycles in 
org.apache.beam.runners.core.construction.graph.Network before doing 
topological sort
 Key: BEAM-4834
 URL: https://issues.apache.org/jira/browse/BEAM-4834
 Project: Beam
  Issue Type: Bug
  Components: runner-core
Reporter: Ankur Goenka
Assignee: Kenneth Knowles


A topological sort will never finish on a cyclic graph, so we should check for 
cycles before sorting.
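A cycle check of this kind can be sketched with a colored depth-first search (a 
generic illustration, not the Guava-based Network code Beam actually uses):

```python
def has_cycle(graph):
    """Detect a cycle in a directed graph given as {node: [successors]}."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current DFS path / finished
    color = {}

    def visit(node):
        color[node] = GRAY
        for succ in graph.get(node, []):
            state = color.get(succ, WHITE)
            if state == GRAY:                 # back edge: succ is on the path
                return True
            if state == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return any(color.get(node, WHITE) == WHITE and visit(node) for node in graph)
```

A runner can then fail fast with a clear error when has_cycle returns True, 
instead of looping forever inside the topological sort.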





[jira] [Created] (BEAM-4810) Flaky test BeamFileSystemArtifactServicesTest

2018-07-17 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-4810:
--

 Summary: Flaky test BeamFileSystemArtifactServicesTest
 Key: BEAM-4810
 URL: https://issues.apache.org/jira/browse/BEAM-4810
 Project: Beam
  Issue Type: Bug
  Components: runner-core
Reporter: Ankur Goenka
Assignee: Ankur Goenka


The test is flaky because we do not wait for putArtifact completion.

Here is a failing build 
https://builds.apache.org/job/beam_PreCommit_Java_Commit/340/testReport/org.apache.beam.runners.fnexecution.artifact/BeamFileSystemArtifactServicesTest/putArtifactsSingleSmallFileTest/
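The fix pattern, waiting on the upload future before asserting, can be sketched 
in Python (the Beam test itself is Java; the names here are illustrative):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def put_artifact(store, name, payload):
    """Simulated asynchronous artifact upload with some latency."""
    time.sleep(0.05)
    store[name] = payload

store = {}
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(put_artifact, store, 'artifact-1', b'payload')
    # Flaky version: asserting right here races with the background upload.
    future.result()  # fix: block until putArtifact has completed
assert store['artifact-1'] == b'payload'
```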





[jira] [Created] (BEAM-5273) Local file system does not work as expected on Portability Framework with Docker

2018-08-30 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-5273:
--

 Summary: Local file system does not work as expected on 
Portability Framework with Docker
 Key: BEAM-5273
 URL: https://issues.apache.org/jira/browse/BEAM-5273
 Project: Beam
  Issue Type: Bug
  Components: sdk-go, sdk-java-harness, sdk-py-harness
Reporter: Ankur Goenka
Assignee: Ankur Goenka


With the portability framework, local file system reads and writes go to the 
Docker container's file system.

This makes using local files impossible with the portability framework.




