[jira] [Updated] (BEAM-11075) Load Tests for Go SDK

2020-12-16 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-11075:

Fix Version/s: Not applicable
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Load Tests for Go SDK
> -
>
> Key: BEAM-11075
> URL: https://issues.apache.org/jira/browse/BEAM-11075
> Project: Beam
>  Issue Type: Test
>  Components: sdk-go, testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: P3
> Fix For: Not applicable
>
>  Time Spent: 19h
>  Remaining Estimate: 0h
>
> We have load tests for the Python and Java SDKs [1], but none yet for the 
> Go SDK.
> Tests to be done:
>  * ParDo
>  * Combine
>  * CoGBK
>  * GBK
>  * Side Input
> The tests should run on Dataflow and Flink. They should use a synthetic 
> source and run in batch mode.
> [1] 
> [http://metrics.beam.apache.org/dashboards/f/OtXje1iGz/performance-tests-metrics]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-11425) [Go SDK] Support metrics querying (Dataflow)

2020-12-15 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-11425:

Resolution: Fixed
Status: Resolved  (was: Open)

> [Go SDK] Support metrics querying (Dataflow)
> 
>
> Key: BEAM-11425
> URL: https://issues.apache.org/jira/browse/BEAM-11425
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: P2
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The approach to querying metrics described in the parent ticket doesn't apply 
> to the Dataflow runner. Instead, we can get metrics from the Monitoring API 
> (this is how it works in the Python SDK: 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py)





[jira] [Assigned] (BEAM-11212) Beam metrics should be displayed in Flink UI "Metrics" tab

2020-12-15 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski reassigned BEAM-11212:
---

Assignee: (was: Kamil Wasilewski)

> Beam metrics should be displayed in Flink UI "Metrics" tab
> --
>
> Key: BEAM-11212
> URL: https://issues.apache.org/jira/browse/BEAM-11212
> Project: Beam
>  Issue Type: Wish
>  Components: runner-flink
>Reporter: Kyle Weaver
>Priority: P2
>  Labels: portability-flink
>
> All Beam metrics are visible in the Flink UI in a single accumulator value 
> (in the "Accumulators" tab), which is a large, hard-to-read blob. Originally, 
> this blob was rendered in a bespoke format 
> (https://github.com/apache/beam/blob/ead80b469ffeeddcd8e9e5c8dc462eec0b0ffc6b/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricQueryResults.java#L63-L72).
>  I changed the format to JSON so it could be easily deserialized (BEAM-9600). 
> But then an issue was filed (BEAM-10294) reporting that the new JSON format 
> was harder to read than the original bespoke format. The temporary fix was to 
> revert to the bespoke format in Spark, while allowing Flink to continue to 
> use JSON. However, if Beam metrics in Flink are only visible as an 
> accumulator, then they are also unreadable because the payloads are in binary 
> form (BEAM-10719).
> Having metrics visible in Flink's "Metrics" tab would A) make metrics easier 
> to read (even compared to the bespoke accumulator string format), and closer 
> to what users of Beamless Flink expect, and B) free us to use the accumulator 
> however we wish for Beam internal purposes, without worrying about 
> readability.
> One question I'm not sure about is, why can't we see Beam metrics in the 
> Flink UI already? I thought we were already translating Beam metrics into 
> Flink native metrics 
> (https://github.com/apache/beam/blob/ea2a3f6896b66a2852a2ff3d82f4e1b010013d13/runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainer.java#L108-L109).
>  Is there something else we need to do to display them in the UI?





[jira] [Updated] (BEAM-11230) ReadFromBigQuery fails when the table has repeated records

2020-12-15 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-11230:

Fix Version/s: 2.27.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> ReadFromBigQuery fails when the table has repeated records
> --
>
> Key: BEAM-11230
> URL: https://issues.apache.org/jira/browse/BEAM-11230
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.25.0
>Reporter: Alvaro
>Assignee: Kamil Wasilewski
>Priority: P2
> Fix For: 2.27.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is very similar to the issue reported here: 
> https://issues.apache.org/jira/browse/BEAM-10524
> I upgraded the Python SDK from 2.24 to 2.25, and ReadFromBigQuery started 
> failing with this stack trace:
>  
> {code:java}
> 
> "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
>     work_executor.execute()
>   File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 179, in execute
>     op.start()
>   File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start
>   File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start
>   File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start
>   File "dataflow_worker/native_operations.py", line 48, in dataflow_worker.native_operations.NativeReadOperation.start
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/concat_source.py", line 89, in read
>     range_tracker.sub_range_tracker(source_ix)):
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/textio.py", line 210, in read_records
>     yield self._coder.decode(record)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 633, in decode
>     return self._decode_with_schema(value, self.fields)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 656, in _decode_with_schema
>     value[field.name] = converter(value[field.name])
> TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'{code}
> According to the aforementioned issue, this should have been fixed in 2.25, 
> but in my case 2.25 is the version where the failure appears. 
> Code: 
> https://github.com/apache/beam/blob/release-2.25.0/sdks/python/apache_beam/io/gcp/bigquery.py#L656
>  





[jira] [Updated] (BEAM-11272) Combiner Label Constructor Arg Not Passed Through To pTransform

2020-12-14 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-11272:

Status: Open  (was: Triage Needed)

> Combiner Label Constructor Arg Not Passed Through To pTransform
> ---
>
> Key: BEAM-11272
> URL: https://issues.apache.org/jira/browse/BEAM-11272
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robbie Gruener
>Priority: P3
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The combiners ToSet, ToDict, and ToList have a label argument in their 
> constructor:
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/combiners.py#L877]
>  
> However, the label passed through to the parent class is not actually used 
> (there is a mismatch in the super constructor call): 
> https://github.com/apache/beam/blob/7ac82a875462f2c2570a1157b782ebadddebbe70/sdks/python/apache_beam/transforms/combiners.py#L68
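
To illustrate how a mismatch of this kind can swallow a label, here is a minimal, self-contained sketch. It does not reproduce the Beam source: `Transform`, `CombinerBase`, `BrokenToList`, and `FixedToList` are hypothetical stand-ins, and the assumption is that the intermediate class takes something other than the label as its first positional argument.

```python
class Transform:
    """Hypothetical stand-in for a PTransform-like base class."""

    def __init__(self, label=None):
        # Fall back to the class name when no label is given.
        self.label = label or type(self).__name__


class CombinerBase(Transform):
    """Hypothetical intermediate class whose first argument is not the label."""

    def __init__(self, has_defaults=True):
        super().__init__()  # the label is never forwarded
        self.has_defaults = has_defaults


class BrokenToList(CombinerBase):
    def __init__(self, label="ToList"):
        # Bug: the label lands in the parent's has_defaults slot.
        super().__init__(label)


class FixedToList(CombinerBase):
    def __init__(self, label="ToList"):
        super().__init__()
        self.label = label  # set the label explicitly


broken = BrokenToList("MyLabel")
fixed = FixedToList("MyLabel")
print(broken.label, broken.has_defaults)
print(fixed.label, fixed.has_defaults)
```

In the broken variant the user-supplied label silently becomes the value of an unrelated argument, so the transform keeps its default name and no error is raised.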





[jira] [Commented] (BEAM-11230) ReadFromBigQuery fails when the table has repeated records

2020-12-11 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-11230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247891#comment-17247891
 ] 

Kamil Wasilewski commented on BEAM-11230:
-

Thanks, I'll take care of that bug. In the meantime, you can pass the 
`use_json_exports=False` parameter to your ReadFromBigQuery transform. With 
this parameter set to False, the transform exports the BigQuery table to Avro 
files instead of JSON files, which should avoid the failure.
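
For context, the TypeError reported in this issue is what Python raises when a scalar converter such as `int` receives a list, which is how a repeated field's value arrives. The sketch below reproduces that failure and shows one way a schema-driven decoder could handle repeated fields. It is an illustration only: `Field`, `CONVERTERS`, and `decode_row` are hypothetical names, not the code in `apache_beam/io/gcp/bigquery.py`.

```python
from collections import namedtuple

# Hypothetical stand-in for a BigQuery schema field (not Beam's own type).
Field = namedtuple("Field", ["name", "type", "mode"])

# Scalar converters keyed by type name (illustrative subset).
CONVERTERS = {"INTEGER": int, "FLOAT": float, "STRING": str}


def decode_row(row, fields):
    """Convert exported values, mapping the converter over REPEATED fields."""
    decoded = dict(row)
    for field in fields:
        value = decoded.get(field.name)
        if value is None:
            continue
        converter = CONVERTERS[field.type]
        if field.mode == "REPEATED":
            # A repeated field is a list, so convert each element.
            decoded[field.name] = [converter(item) for item in value]
        else:
            decoded[field.name] = converter(value)
    return decoded


schema = [
    Field("id", "INTEGER", "NULLABLE"),
    Field("scores", "INTEGER", "REPEATED"),
]
row = {"id": "7", "scores": ["1", "2", "3"]}

try:
    int(row["scores"])  # what a scalar-only decoder effectively does
except TypeError as err:
    print("scalar conversion fails:", err)

print(decode_row(row, schema))
```

Checking the field mode before converting is the key step; a decoder that applies the scalar converter unconditionally fails on the first repeated column.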

> ReadFromBigQuery fails when the table has repeated records
> --
>
> Key: BEAM-11230
> URL: https://issues.apache.org/jira/browse/BEAM-11230
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.25.0
>Reporter: Alvaro
>Assignee: Kamil Wasilewski
>Priority: P2
>





[jira] [Assigned] (BEAM-11230) ReadFromBigQuery fails when the table has repeated records

2020-12-11 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski reassigned BEAM-11230:
---

Assignee: Kamil Wasilewski

> ReadFromBigQuery fails when the table has repeated records
> --
>
> Key: BEAM-11230
> URL: https://issues.apache.org/jira/browse/BEAM-11230
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.25.0
>Reporter: Alvaro
>Assignee: Kamil Wasilewski
>Priority: P2
>





[jira] [Updated] (BEAM-11230) ReadFromBigQuery fails when the table has repeated records

2020-12-11 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-11230:

Status: Open  (was: Triage Needed)

> ReadFromBigQuery fails when the table has repeated records
> --
>
> Key: BEAM-11230
> URL: https://issues.apache.org/jira/browse/BEAM-11230
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.25.0
>Reporter: Alvaro
>Priority: P2
>





[jira] [Assigned] (BEAM-11212) Beam metrics should be displayed in Flink UI "Metrics" tab

2020-12-09 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski reassigned BEAM-11212:
---

Assignee: Kamil Wasilewski

> Beam metrics should be displayed in Flink UI "Metrics" tab
> --
>
> Key: BEAM-11212
> URL: https://issues.apache.org/jira/browse/BEAM-11212
> Project: Beam
>  Issue Type: Wish
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kamil Wasilewski
>Priority: P2
>  Labels: portability-flink
>





[jira] [Updated] (BEAM-11212) Beam metrics should be displayed in Flink UI "Metrics" tab

2020-12-09 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-11212:

Status: Open  (was: Triage Needed)

> Beam metrics should be displayed in Flink UI "Metrics" tab
> --
>
> Key: BEAM-11212
> URL: https://issues.apache.org/jira/browse/BEAM-11212
> Project: Beam
>  Issue Type: Wish
>  Components: runner-flink
>Reporter: Kyle Weaver
>Priority: P2
>  Labels: portability-flink
>





[jira] [Created] (BEAM-11425) [Go SDK] Support metrics querying (Dataflow)

2020-12-08 Thread Kamil Wasilewski (Jira)
Kamil Wasilewski created BEAM-11425:
---

 Summary: [Go SDK] Support metrics querying (Dataflow)
 Key: BEAM-11425
 URL: https://issues.apache.org/jira/browse/BEAM-11425
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-go
Reporter: Kamil Wasilewski
Assignee: Kamil Wasilewski


The approach to querying metrics described in the parent ticket doesn't apply 
to the Dataflow runner. Instead, we can get metrics from the Monitoring API 
(this is how it works in the Python SDK: 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py)





[jira] [Updated] (BEAM-11379) coktogel

2020-12-08 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-11379:

Resolution: Won't Fix
Status: Resolved  (was: Triage Needed)

> coktogel
> 
>
> Key: BEAM-11379
> URL: https://issues.apache.org/jira/browse/BEAM-11379
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: viarmerdeka17
>Priority: P2
>  Labels: spam
>
> [ https://cintatogel88.com/|https://cintatogel88.com/] is an online gambling 
> site that is already trusted among bettors for playing online gambling. You 
> will get many benefits if you play here.





[jira] [Updated] (BEAM-11379) coktogel

2020-12-08 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-11379:

Labels: spam  (was: )

> coktogel
> 
>
> Key: BEAM-11379
> URL: https://issues.apache.org/jira/browse/BEAM-11379
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: viarmerdeka17
>Priority: P2
>  Labels: spam
>





[jira] [Created] (BEAM-11398) #1 CoGBK Go load test is too slow

2020-12-04 Thread Kamil Wasilewski (Jira)
Kamil Wasilewski created BEAM-11398:
---

 Summary: #1 CoGBK Go load test is too slow
 Key: BEAM-11398
 URL: https://issues.apache.org/jira/browse/BEAM-11398
 Project: Beam
  Issue Type: Sub-task
  Components: testing
Reporter: Kamil Wasilewski


One of the CoGBK load test cases, which involves a PCollection of 2GB of data 
with a single key, takes much longer to process than the others. This 
happens with the Go SDK and Flink.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-3736) Add SetUp() and TearDown() for CombineFns

2020-12-04 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski reassigned BEAM-3736:
--

Assignee: (was: Kamil Wasilewski)

> Add SetUp() and TearDown() for CombineFns
> -
>
> Key: BEAM-3736
> URL: https://issues.apache.org/jira/browse/BEAM-3736
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-py-core
>Reporter: Chuan Yu Foo
>Priority: P2
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> I have a CombineFn that has a large amount of state that needs to be loaded 
> once before it can add_input or merge_combiners (for example, the CombineFn 
> might load up a large lookup table used for combining). 
> Right now, to initialise this state, for each of the methods, I check if the 
> state has already been initialised, and if not, I initialise it. It would be 
> nice if CombineFn provided a SetUp() method that is called once to initialise 
> this state (and a corresponding TearDown() method to clean up this state if 
> necessary).
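
The workaround described above can be sketched as follows. This is a plain-Python illustration, not a Beam `CombineFn` subclass: `LookupMeanFn`, `_ensure_loaded`, and the tiny lookup table are hypothetical, and the method names only mirror the usual combiner shape (`merge_accumulators` plays the role of the merge step mentioned in the description).

```python
class LookupMeanFn:
    """Hypothetical combiner that needs a lookup table before combining."""

    def __init__(self):
        self._table = None
        self.loads = 0  # counts how often the table was actually loaded

    def _ensure_loaded(self):
        # The repeated guard the issue describes: every entry point
        # has to check whether the state is already initialised.
        if self._table is None:
            self._table = {"a": 1.0, "b": 2.0}  # stands in for an expensive load
            self.loads += 1

    def create_accumulator(self):
        return (0.0, 0)

    def add_input(self, accumulator, element):
        self._ensure_loaded()
        total, count = accumulator
        return (total + self._table.get(element, 0.0), count + 1)

    def merge_accumulators(self, accumulators):
        self._ensure_loaded()
        totals, counts = zip(*accumulators)
        return (sum(totals), sum(counts))

    def extract_output(self, accumulator):
        total, count = accumulator
        return total / count if count else float("nan")


fn = LookupMeanFn()
acc1 = fn.add_input(fn.create_accumulator(), "a")
acc2 = fn.add_input(fn.create_accumulator(), "b")
merged = fn.merge_accumulators([acc1, acc2])
print(fn.extract_output(merged), fn.loads)
```

A dedicated setup hook would let the table load move out of `_ensure_loaded`, so the individual methods would no longer need the guard.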



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-11075) Load Tests for Go SDK

2020-11-25 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-11075:

Description: 
We have load tests for the Python and Java SDKs [1], but none yet for the 
Go SDK.

Tests to be done:
 * ParDo
 * Combine
 * CoGBK
 * GBK
 * Side Input

The tests should run on Dataflow and Flink. They should use a synthetic 
source and run in batch mode.

[1] 
[http://metrics.beam.apache.org/dashboards/f/OtXje1iGz/performance-tests-metrics]

  was:
We have load tests for the Python and Java SDKs [1], but none yet for the 
Go SDK.

Tests to be done:
 * ParDo
 * Combine
 * CoGBK
 * GBK
 * Side Input

The tests should run on Dataflow and Flink. They should use a synthetic 
source and run in batch mode.

[1]http://metrics.beam.apache.org/dashboards/f/OtXje1iGz/performance-tests-metrics


> Load Tests for Go SDK
> -
>
> Key: BEAM-11075
> URL: https://issues.apache.org/jira/browse/BEAM-11075
> Project: Beam
>  Issue Type: Test
>  Components: sdk-go, testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: P3
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>





[jira] [Updated] (BEAM-9421) AI Platform pipeline patterns

2020-11-25 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-9421:
---
Status: Resolved  (was: Open)

> AI Platform pipeline patterns
> -
>
> Key: BEAM-9421
> URL: https://issues.apache.org/jira/browse/BEAM-9421
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Kamil Wasilewski
>Priority: P3
>  Labels: pipeline-patterns
>  Time Spent: 17h
>  Remaining Estimate: 0h
>
> New pipeline patterns should be contributed to the Beam website to 
> demonstrate how the newly implemented Google Cloud AI PTransforms can be used 
> in pipelines.





[jira] [Updated] (BEAM-9421) AI Platform pipeline patterns

2020-11-25 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-9421:
---
Fix Version/s: Not applicable

> AI Platform pipeline patterns
> -
>
> Key: BEAM-9421
> URL: https://issues.apache.org/jira/browse/BEAM-9421
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Kamil Wasilewski
>Priority: P3
>  Labels: pipeline-patterns
> Fix For: Not applicable
>
>  Time Spent: 17h
>  Remaining Estimate: 0h
>





[jira] [Updated] (BEAM-10979) Python CoGBK should propagate typehints

2020-11-23 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10979:

Status: Resolved  (was: Open)

> Python CoGBK should propagate typehints
> ---
>
> Key: BEAM-10979
> URL: https://issues.apache.org/jira/browse/BEAM-10979
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Priority: P2
>  Labels: types
>
> Currently CoGBK erases type hints, but they could be propagated





[jira] [Reopened] (BEAM-10979) Python CoGBK should propagate typehints

2020-11-23 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski reopened BEAM-10979:
-

> Python CoGBK should propagate typehints
> ---
>
> Key: BEAM-10979
> URL: https://issues.apache.org/jira/browse/BEAM-10979
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Priority: P2
>  Labels: types
>





[jira] [Updated] (BEAM-8135) Remove obsolete BigQuery publishers

2020-11-23 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-8135:
---
Description: After creating an InfluxDB metrics publisher, BQ publishers 
became obsolete and should be removed.  (was: After creating Prometheus metrics 
publisher, BQ publishers would become obsolete and should be removed.)

> Remove obsolete BigQuery publishers
> ---
>
> Key: BEAM-8135
> URL: https://issues.apache.org/jira/browse/BEAM-8135
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: P3
>
> After creating an InfluxDB metrics publisher, BQ publishers became obsolete 
> and should be removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-11207) [Go SDK] Support metrics querying

2020-11-17 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-11207:

Status: Resolved  (was: Open)

> [Go SDK] Support metrics querying
> -
>
> Key: BEAM-11207
> URL: https://issues.apache.org/jira/browse/BEAM-11207
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: P2
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Design doc: [https://s.apache.org/get-metrics-api]
> The Go SDK does not offer a way to query a pipeline's metrics. It needs code 
> that calls the GetJobMetrics RPC and lets users query the result through an 
> API similar to the existing ones in the Python and Java SDKs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-11218) ptest allows to obtain a pipeline result

2020-11-10 Thread Kamil Wasilewski (Jira)
Kamil Wasilewski created BEAM-11218:
---

 Summary: ptest allows to obtain a pipeline result
 Key: BEAM-11218
 URL: https://issues.apache.org/jira/browse/BEAM-11218
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-go
Reporter: Kamil Wasilewski


ptest should allow obtaining a pipeline result, which would enable the use of 
metrics in testing scenarios. This can be done either by adding a new entry 
point (e.g. `ptest.RunWithMetrics`) or by replacing the existing `ptest.Run` 
with a new version.





[jira] [Created] (BEAM-11217) Implement metrics filtering

2020-11-10 Thread Kamil Wasilewski (Jira)
Kamil Wasilewski created BEAM-11217:
---

 Summary: Implement metrics filtering
 Key: BEAM-11217
 URL: https://issues.apache.org/jira/browse/BEAM-11217
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-go
Reporter: Kamil Wasilewski


`metrics.Results` lacks a method for querying metrics with a provided filter. 
The method should take a filter object as an argument and return a 
`metrics.QueryResults` object containing the metrics that match the filter.
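
The filtering idea can be sketched roughly as follows. This is an 
illustration in Python, not the Go SDK's API: `MetricResult` and `query` are 
hypothetical names, and the filter is modelled as a plain predicate function.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class MetricResult:
    # Hypothetical stand-in for a single metric entry; not a Beam type.
    namespace: str
    name: str
    step: str
    value: int


def query(results: Iterable[MetricResult],
          matches: Callable[[MetricResult], bool]) -> List[MetricResult]:
    """Return only the metrics accepted by the filter predicate."""
    return [r for r in results if matches(r)]


all_metrics = [
    MetricResult("load_test", "runtime", "ParDo", 120),
    MetricResult("load_test", "total_bytes", "Read", 2000),
    MetricResult("other", "runtime", "GBK", 45),
]

# Filter: keep metrics named "runtime" in the "load_test" namespace.
runtime_only = query(
    all_metrics,
    lambda r: r.namespace == "load_test" and r.name == "runtime",
)
```

Modelling the filter as a predicate keeps the query method itself trivial; a 
real implementation might prefer a dedicated filter object, as the ticket 
suggests.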





[jira] [Created] (BEAM-11207) [Go SDK] Support metrics querying

2020-11-09 Thread Kamil Wasilewski (Jira)
Kamil Wasilewski created BEAM-11207:
---

 Summary: [Go SDK] Support metrics querying
 Key: BEAM-11207
 URL: https://issues.apache.org/jira/browse/BEAM-11207
 Project: Beam
  Issue Type: New Feature
  Components: sdk-go
Reporter: Kamil Wasilewski
Assignee: Kamil Wasilewski


Design doc: [https://s.apache.org/get-metrics-api]

The Go SDK does not offer a way to query a pipeline's metrics. It needs code 
that calls the GetJobMetrics RPC and lets users query the result through an 
API similar to the existing ones in the Python and Java SDKs.





[jira] [Updated] (BEAM-5465) Have the sdks/go gradle tasks clean up the vendor directories on clean

2020-10-30 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-5465:
---
Status: Resolved  (was: Open)

> Have the sdks/go gradle tasks clean up the vendor directories on clean
> --
>
> Key: BEAM-5465
> URL: https://issues.apache.org/jira/browse/BEAM-5465
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, sdk-go
>Reporter: Robert Burke
>Assignee: Kamil Wasilewski
>Priority: P3
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The sdks/go/{test,examples,container} directories depend on the main beam 
> package, which causes gogradle to vendor the beam package in those directories.
>  
> The gogradle plugin doesn't clean up the [vendor 
> directories|https://golang.org/cmd/go/#hdr-Vendor_Directories] that it sets 
> up on builds when clean is invoked.
>  
> This leaves stale copies of the Go SDK's beam package vendored in local 
> directories, which can cause build failures in other tasks that invoke the 
> tests or similar, when the code in those directories uses a more recent 
> version of beam than the cached one.
>  
> This doesn't happen for users who use the go tool directly, with their git 
> repo nested under GOPATH, since the go tool will correctly use the local repo 
> copy of beam.
>  
> A workaround on a Unix machine or similar, invoked from the beam repo root, 
> is to delete the vendor and .gogradle directories and retry the task:
>  
> rm -rf sdks/go/{vendor,.gogradle} sdks/go/{test,examples,container}/{vendor,.gogradle}
>  
> This causes gogradle to fetch a more recent copy of beam for vendoring.
>  
> Ideally we fix the clean tasks for the go directories to delete the vendor 
> directories as well, which will resolve the issue more reliably for those 
> using gradle to test their changes against the Go SDK.
> Related: BEAM-5379 is for avoiding the vendoring and cleaning cycle 
> altogether and migrating to Go modules.





[jira] [Updated] (BEAM-5465) Have the sdks/go gradle tasks clean up the vendor directories on clean

2020-10-30 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-5465:
---
Fix Version/s: Not applicable

> Have the sdks/go gradle tasks clean up the vendor directories on clean
> --
>
> Key: BEAM-5465
> URL: https://issues.apache.org/jira/browse/BEAM-5465
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, sdk-go
>Reporter: Robert Burke
>Assignee: Kamil Wasilewski
>Priority: P3
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The sdks/go/{test,examples,container} directories depend on the main beam 
> package, which causes gogradle to vendor the beam package in those directories.
>  
> The gogradle plugin doesn't clean up the [vendor 
> directories|https://golang.org/cmd/go/#hdr-Vendor_Directories] that it sets 
> up on builds when clean is invoked.
>  
> This leaves stale copies of the Go SDK's beam package vendored in local 
> directories, which can cause build failures in other tasks that invoke the 
> tests or similar, when the code in those directories uses a more recent 
> version of beam than the cached one.
>  
> This doesn't happen for users who use the go tool directly, with their git 
> repo nested under GOPATH, since the go tool will correctly use the local repo 
> copy of beam.
>  
> A workaround on a Unix machine or similar, invoked from the beam repo root, 
> is to delete the vendor and .gogradle directories and retry the task:
>  
> rm -rf sdks/go/{vendor,.gogradle} sdks/go/{test,examples,container}/{vendor,.gogradle}
>  
> This causes gogradle to fetch a more recent copy of beam for vendoring.
>  
> Ideally we fix the clean tasks for the go directories to delete the vendor 
> directories as well, which will resolve the issue more reliably for those 
> using gradle to test their changes against the Go SDK.
> Related: BEAM-5379 is for avoiding the vendoring and cleaning cycle 
> altogether and migrating to Go modules.





[jira] [Assigned] (BEAM-5465) Have the sdks/go gradle tasks clean up the vendor directories on clean

2020-10-29 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski reassigned BEAM-5465:
--

Assignee: Kamil Wasilewski

> Have the sdks/go gradle tasks clean up the vendor directories on clean
> --
>
> Key: BEAM-5465
> URL: https://issues.apache.org/jira/browse/BEAM-5465
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, sdk-go
>Reporter: Robert Burke
>Assignee: Kamil Wasilewski
>Priority: P3
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The sdks/go/{test,examples,container} directories depend on the main beam 
> package, which causes gogradle to vendor the beam package in those directories.
>  
> The gogradle plugin doesn't clean up the [vendor 
> directories|https://golang.org/cmd/go/#hdr-Vendor_Directories] that it sets 
> up on builds when clean is invoked.
>  
> This leaves stale copies of the Go SDK's beam package vendored in local 
> directories, which can cause build failures in other tasks that invoke the 
> tests or similar, when the code in those directories uses a more recent 
> version of beam than the cached one.
>  
> This doesn't happen for users who use the go tool directly, with their git 
> repo nested under GOPATH, since the go tool will correctly use the local repo 
> copy of beam.
>  
> A workaround on a Unix machine or similar, invoked from the beam repo root, 
> is to delete the vendor and .gogradle directories and retry the task:
>  
> rm -rf sdks/go/{vendor,.gogradle} sdks/go/{test,examples,container}/{vendor,.gogradle}
>  
> This causes gogradle to fetch a more recent copy of beam for vendoring.
>  
> Ideally we fix the clean tasks for the go directories to delete the vendor 
> directories as well, which will resolve the issue more reliably for those 
> using gradle to test their changes against the Go SDK.
> Related: BEAM-5379 is for avoiding the vendoring and cleaning cycle 
> altogether and migrating to Go modules.





[jira] [Created] (BEAM-11075) Load Tests for Go SDK

2020-10-20 Thread Kamil Wasilewski (Jira)
Kamil Wasilewski created BEAM-11075:
---

 Summary: Load Tests for Go SDK
 Key: BEAM-11075
 URL: https://issues.apache.org/jira/browse/BEAM-11075
 Project: Beam
  Issue Type: Test
  Components: sdk-go, testing
Reporter: Kamil Wasilewski
Assignee: Kamil Wasilewski


We have Load Tests for Python and Java SDKs[1], but we are missing the ones for 
Go SDK.

Tests to be done:
 * ParDo
 * Combine
 * coGBK
 * GBK
 * Side Input

The tests should run on Dataflow and Flink. The tests should be using synthetic 
source and be running in batch mode.

[1]http://metrics.beam.apache.org/dashboards/f/OtXje1iGz/performance-tests-metrics





[jira] [Updated] (BEAM-10774) GBK Python streaming load tests are too slow

2020-10-20 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10774:

Labels:   (was: stale-P2)

> GBK Python streaming load tests are too slow
> 
>
> Key: BEAM-10774
> URL: https://issues.apache.org/jira/browse/BEAM-10774
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: P3
>
> The following GBK streaming test cases take too long on Dataflow:
>  
> 1) 2 GB of 10-byte records
> 2) 2 GB of 100-byte records
> 4) fanout 4 times with 2 GB of 10-byte records in total
> 5) fanout 8 times with 2 GB of 10-byte records in total
>  
> Each of them needs at least 1 hour to execute, which is way too long for one 
> Jenkins job. 
> Job's definition: 
> [https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_LoadTests_GBK_Python.groovy]
> Test pipeline: 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/load_tests/group_by_key_test.py]
> It is probable that those cases are too extreme. The first two cases involve 
> grouping 20M unique keys, which is a stressful operation. A solution might be 
> to overhaul the cases so that they would be less complex.
> Both the current production Dataflow runner and the new Dataflow Runner V2 
> were tested.





[jira] [Updated] (BEAM-10774) GBK Python streaming load tests are too slow

2020-10-20 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10774:

Priority: P3  (was: P2)

> GBK Python streaming load tests are too slow
> 
>
> Key: BEAM-10774
> URL: https://issues.apache.org/jira/browse/BEAM-10774
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: P3
>  Labels: stale-P2
>
> The following GBK streaming test cases take too long on Dataflow:
>  
> 1) 2 GB of 10-byte records
> 2) 2 GB of 100-byte records
> 4) fanout 4 times with 2 GB of 10-byte records in total
> 5) fanout 8 times with 2 GB of 10-byte records in total
>  
> Each of them needs at least 1 hour to execute, which is way too long for one 
> Jenkins job. 
> Job's definition: 
> [https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_LoadTests_GBK_Python.groovy]
> Test pipeline: 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/load_tests/group_by_key_test.py]
> It is probable that those cases are too extreme. The first two cases involve 
> grouping 20M unique keys, which is a stressful operation. A solution might be 
> to overhaul the cases so that they would be less complex.
> Both the current production Dataflow runner and the new Dataflow Runner V2 
> were tested.





[jira] [Updated] (BEAM-10270) Portable batch performance regression observed via beam_LoadTests_Python_ParDo_Flink_Batch timing out

2020-10-20 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10270:

Status: Resolved  (was: Open)

> Portable batch performance regression observed via 
> beam_LoadTests_Python_ParDo_Flink_Batch timing out
> -
>
> Key: BEAM-10270
> URL: https://issues.apache.org/jira/browse/BEAM-10270
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, sdk-py-core, test-failures
>Reporter: Udi Meiri
>Priority: P0
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Seems to be consistently timing out. Last successful on May 7 2020 
> (https://builds.apache.org/job/beam_LoadTests_Python_ParDo_Flink_Batch/225/) 
> where it ran for 3 hr 18 min.
> {code}
> 07:26:43 > Task :sdks:python:apache_beam:testing:load_tests:run
> 07:26:43 WARNING:root:Make sure that locally built Python SDK docker image 
> has Python 3.7 interpreter.
> 07:26:43 INFO:root:Using Python SDK docker image: 
> apache/beam_python3.7_sdk:2.23.0.dev. If the image is not available at local, 
> we will try to pull from hub.docker.com
> 07:26:45 
> INFO:apache_beam.runners.portability.fn_api_runner.translations:
>   
> 07:26:45 WARNING:apache_beam.options.pipeline_options:Discarding unparseable 
> args: ['--iterations=200', '--number_of_counter_operations=0', 
> '--number_of_counters=0']
> 07:26:45 INFO:apache_beam.runners.portability.portable_runner:Job state 
> changed to STOPPED
> 07:26:45 INFO:apache_beam.runners.portability.portable_runner:Job state 
> changed to STARTING
> 07:26:45 INFO:apache_beam.runners.portability.portable_runner:Job state 
> changed to RUNNING
> 10:48:27 Build timed out (after 240 minutes). Marking the build as aborted.
> {code}
> https://builds.apache.org/job/beam_LoadTests_Python_ParDo_Flink_Batch/264/console



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10270) Portable batch performance regression observed via beam_LoadTests_Python_ParDo_Flink_Batch timing out

2020-10-20 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10270:

Fix Version/s: Not applicable

> Portable batch performance regression observed via 
> beam_LoadTests_Python_ParDo_Flink_Batch timing out
> -
>
> Key: BEAM-10270
> URL: https://issues.apache.org/jira/browse/BEAM-10270
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, sdk-py-core, test-failures
>Reporter: Udi Meiri
>Priority: P0
> Fix For: Not applicable
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Seems to be consistently timing out. Last successful on May 7 2020 
> (https://builds.apache.org/job/beam_LoadTests_Python_ParDo_Flink_Batch/225/) 
> where it ran for 3 hr 18 min.
> {code}
> 07:26:43 > Task :sdks:python:apache_beam:testing:load_tests:run
> 07:26:43 WARNING:root:Make sure that locally built Python SDK docker image 
> has Python 3.7 interpreter.
> 07:26:43 INFO:root:Using Python SDK docker image: 
> apache/beam_python3.7_sdk:2.23.0.dev. If the image is not available at local, 
> we will try to pull from hub.docker.com
> 07:26:45 
> INFO:apache_beam.runners.portability.fn_api_runner.translations:
>   
> 07:26:45 WARNING:apache_beam.options.pipeline_options:Discarding unparseable 
> args: ['--iterations=200', '--number_of_counter_operations=0', 
> '--number_of_counters=0']
> 07:26:45 INFO:apache_beam.runners.portability.portable_runner:Job state 
> changed to STOPPED
> 07:26:45 INFO:apache_beam.runners.portability.portable_runner:Job state 
> changed to STARTING
> 07:26:45 INFO:apache_beam.runners.portability.portable_runner:Job state 
> changed to RUNNING
> 10:48:27 Build timed out (after 240 minutes). Marking the build as aborted.
> {code}
> https://builds.apache.org/job/beam_LoadTests_Python_ParDo_Flink_Batch/264/console





[jira] [Commented] (BEAM-10270) Portable batch performance regression observed via beam_LoadTests_Python_ParDo_Flink_Batch timing out

2020-10-20 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217452#comment-17217452
 ] 

Kamil Wasilewski commented on BEAM-10270:
-

Yes, it has been resolved by [https://github.com/apache/beam/pull/12435]. 
The regression was caused by some previous changes to the test. Fortunately, 
there was no performance degradation in Beam.

I think we can close the ticket.

> Portable batch performance regression observed via 
> beam_LoadTests_Python_ParDo_Flink_Batch timing out
> -
>
> Key: BEAM-10270
> URL: https://issues.apache.org/jira/browse/BEAM-10270
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, sdk-py-core, test-failures
>Reporter: Udi Meiri
>Priority: P0
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Seems to be consistently timing out. Last successful on May 7 2020 
> (https://builds.apache.org/job/beam_LoadTests_Python_ParDo_Flink_Batch/225/) 
> where it ran for 3 hr 18 min.
> {code}
> 07:26:43 > Task :sdks:python:apache_beam:testing:load_tests:run
> 07:26:43 WARNING:root:Make sure that locally built Python SDK docker image 
> has Python 3.7 interpreter.
> 07:26:43 INFO:root:Using Python SDK docker image: 
> apache/beam_python3.7_sdk:2.23.0.dev. If the image is not available at local, 
> we will try to pull from hub.docker.com
> 07:26:45 
> INFO:apache_beam.runners.portability.fn_api_runner.translations:
>   
> 07:26:45 WARNING:apache_beam.options.pipeline_options:Discarding unparseable 
> args: ['--iterations=200', '--number_of_counter_operations=0', 
> '--number_of_counters=0']
> 07:26:45 INFO:apache_beam.runners.portability.portable_runner:Job state 
> changed to STOPPED
> 07:26:45 INFO:apache_beam.runners.portability.portable_runner:Job state 
> changed to STARTING
> 07:26:45 INFO:apache_beam.runners.portability.portable_runner:Job state 
> changed to RUNNING
> 10:48:27 Build timed out (after 240 minutes). Marking the build as aborted.
> {code}
> https://builds.apache.org/job/beam_LoadTests_Python_ParDo_Flink_Batch/264/console





[jira] [Assigned] (BEAM-9154) Move Chicago Taxi Example to Python 3

2020-10-19 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski reassigned BEAM-9154:
--

Assignee: (was: Kamil Wasilewski)

> Move Chicago Taxi Example to Python 3
> -
>
> Key: BEAM-9154
> URL: https://issues.apache.org/jira/browse/BEAM-9154
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: P1
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The Chicago Taxi Example[1] should be moved to the latest version of Python 
> supported by Beam (currently it's Python 3.7).
> At the moment, the following error occurs when running the benchmark on 
> Python 3.7 (requires further investigation):
> {code:java}
> Traceback (most recent call last):
>   File "preprocess.py", line 259, in <module>
> main()
>   File "preprocess.py", line 254, in main
> project=known_args.metric_reporting_project
>   File "preprocess.py", line 155, in transform_data
> ('Analyze' >> tft_beam.AnalyzeDataset(preprocessing_fn)))
>   File 
> "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 987, in __ror__
> return self.transform.__ror__(pvalueish, self.label)
>   File 
> "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 547, in __ror__
> result = p.apply(self, pvalueish, label)
>   File 
> "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 
> 532, in apply
> return self.apply(transform, pvalueish)
>   File 
> "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 
> 573, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
>   File 
> "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
>   File 
> "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", 
> line 223, in apply_PTransform
> return transform.expand(input)
>   File 
> "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py",
>  line 825, in expand
> input_metadata))
>   File 
> "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py",
>  line 716, in expand
> output_signature = self._preprocessing_fn(copied_inputs)
>   File "preprocess.py", line 102, in preprocessing_fn
> _fill_in_missing(inputs[key]),
> KeyError: 'company'
> {code}
> [1] sdks/python/apache_beam/testing/benchmarks/chicago_taxi





[jira] [Commented] (BEAM-9154) Move Chicago Taxi Example to Python 3

2020-10-19 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216780#comment-17216780
 ] 

Kamil Wasilewski commented on BEAM-9154:


Thanks, it looks like we have a course of action. Unfortunately, I can't work 
on this at the moment due to other stuff I'm currently involved in. I'll leave 
the ticket unassigned so that someone can pick it up.

> Move Chicago Taxi Example to Python 3
> -
>
> Key: BEAM-9154
> URL: https://issues.apache.org/jira/browse/BEAM-9154
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: P1
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The Chicago Taxi Example[1] should be moved to the latest version of Python 
> supported by Beam (currently it's Python 3.7).
> At the moment, the following error occurs when running the benchmark on 
> Python 3.7 (requires further investigation):
> {code:java}
> Traceback (most recent call last):
>   File "preprocess.py", line 259, in <module>
> main()
>   File "preprocess.py", line 254, in main
> project=known_args.metric_reporting_project
>   File "preprocess.py", line 155, in transform_data
> ('Analyze' >> tft_beam.AnalyzeDataset(preprocessing_fn)))
>   File 
> "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 987, in __ror__
> return self.transform.__ror__(pvalueish, self.label)
>   File 
> "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 547, in __ror__
> result = p.apply(self, pvalueish, label)
>   File 
> "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 
> 532, in apply
> return self.apply(transform, pvalueish)
>   File 
> "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 
> 573, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
>   File 
> "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
>   File 
> "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", 
> line 223, in apply_PTransform
> return transform.expand(input)
>   File 
> "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py",
>  line 825, in expand
> input_metadata))
>   File 
> "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py",
>  line 716, in expand
> output_signature = self._preprocessing_fn(copied_inputs)
>   File "preprocess.py", line 102, in preprocessing_fn
> _fill_in_missing(inputs[key]),
> KeyError: 'company'
> {code}
> [1] sdks/python/apache_beam/testing/benchmarks/chicago_taxi





[jira] [Assigned] (BEAM-3736) Add SetUp() and TearDown() for CombineFns

2020-10-05 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski reassigned BEAM-3736:
--

Assignee: Kamil Wasilewski

> Add SetUp() and TearDown() for CombineFns
> -
>
> Key: BEAM-3736
> URL: https://issues.apache.org/jira/browse/BEAM-3736
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-py-core
>Reporter: Chuan Yu Foo
>Assignee: Kamil Wasilewski
>Priority: P3
>
> I have a CombineFn that has a large amount of state that needs to be loaded 
> once before it can add_input or merge_combiners (for example, the CombineFn 
> might load up a large lookup table used for combining). 
> Right now, to initialise this state, for each of the methods, I check if the 
> state has already been initialised, and if not, I initialise it. It would be 
> nice if CombineFn provided a SetUp() method that is called once to initialise 
> this state (and a corresponding TearDown() method to clean up this state if 
> necessary).
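
The workaround described above (checking in every method whether the state 
has been loaded) looks roughly like this. It is a sketch using a plain Python 
class shaped like a CombineFn so that it runs without Beam installed; 
`LookupSumFn`, `_ensure_loaded` and the tiny lookup table are hypothetical, 
and the merge step is named `merge_accumulators` here.

```python
class LookupSumFn:
    """Plain-Python stand-in shaped like a CombineFn (no Beam import)."""

    def __init__(self):
        self._table = None    # expensive state, loaded lazily
        self.load_count = 0   # for illustration: how often we loaded

    def _ensure_loaded(self):
        # The check that every method currently has to repeat.
        if self._table is None:
            self._table = {"a": 1, "b": 10}  # placeholder for a large table
            self.load_count += 1

    def create_accumulator(self):
        return 0

    def add_input(self, accumulator, element):
        self._ensure_loaded()
        return accumulator + self._table.get(element, 0)

    def merge_accumulators(self, accumulators):
        self._ensure_loaded()
        return sum(accumulators)

    def extract_output(self, accumulator):
        return accumulator


fn = LookupSumFn()
acc1 = fn.add_input(fn.create_accumulator(), "a")
acc2 = fn.add_input(fn.create_accumulator(), "b")
total = fn.extract_output(fn.merge_accumulators([acc1, acc2]))
```

A dedicated setup hook would let the `_ensure_loaded` calls disappear from 
the individual methods, which is the improvement the ticket asks for.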





[jira] [Updated] (BEAM-7505) Create SideInput Python Load Test Jenkins Job

2020-10-03 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-7505:
---
Fix Version/s: Not applicable

> Create SideInput Python Load Test Jenkins Job
> -
>
> Key: BEAM-7505
> URL: https://issues.apache.org/jira/browse/BEAM-7505
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Kamil Wasilewski
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 18h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-7505) Create SideInput Python Load Test Jenkins Job

2020-10-03 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-7505:
---
Status: Resolved  (was: Open)

> Create SideInput Python Load Test Jenkins Job
> -
>
> Key: BEAM-7505
> URL: https://issues.apache.org/jira/browse/BEAM-7505
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Kamil Wasilewski
>Priority: P2
>  Time Spent: 18h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-9506) _CustomBigQuerySource value provider parameter gcs_location is wrongly evaluated

2020-10-03 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-9506:
---
Fix Version/s: 2.26.0

> _CustomBigQuerySource value provider parameter gcs_location is wrongly 
> evaluated
> 
>
> Key: BEAM-9506
> URL: https://issues.apache.org/jira/browse/BEAM-9506
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.19.0
>Reporter: Elias Djurfeldt
>Assignee: Kamil Wasilewski
>Priority: P3
> Fix For: 2.26.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The `gcs_location` parameter is wrongly evaluated at pipeline construction 
> time, resulting in calling value_provider.get() from a non-runtime context 
> when using a value provider for the `gcs_location`. 
> See discussion at 
> [https://github.com/apache/beam/pull/11040#issuecomment-597872563]
> The code in question is at: 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1575]
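
The difference between resolving a value provider at construction time and 
at run time can be illustrated as follows. This is a self-contained sketch, 
not Beam's code: `RuntimeValue`, `EagerSource` and `DeferredSource` are 
hypothetical classes, and the bucket path is a placeholder.

```python
class RuntimeValue:
    """Hypothetical stand-in for a runtime value provider; not Beam's class."""

    def __init__(self):
        self._value = None
        self._ready = False

    def set_at_runtime(self, value):
        self._value = value
        self._ready = True

    def get(self):
        if not self._ready:
            raise RuntimeError("get() called from a non-runtime context")
        return self._value


class EagerSource:
    def __init__(self, gcs_location):
        # Bug pattern: resolves the provider while the pipeline is being built.
        self.gcs_location = gcs_location.get()


class DeferredSource:
    def __init__(self, gcs_location):
        # Keep the provider itself; do not resolve it yet.
        self._gcs_location = gcs_location

    def read(self):
        # Resolved only when the source actually runs.
        return self._gcs_location.get()


location = RuntimeValue()

try:
    EagerSource(location)
    eager_failed = False
except RuntimeError:
    eager_failed = True

source = DeferredSource(location)  # construction succeeds
location.set_at_runtime("gs://example-bucket/tmp")  # placeholder path
resolved = source.read()
```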





[jira] [Updated] (BEAM-9506) _CustomBigQuerySource value provider parameter gcs_location is wrongly evaluated

2020-10-03 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-9506:
---
Status: Resolved  (was: Open)

> _CustomBigQuerySource value provider parameter gcs_location is wrongly 
> evaluated
> 
>
> Key: BEAM-9506
> URL: https://issues.apache.org/jira/browse/BEAM-9506
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.19.0
>Reporter: Elias Djurfeldt
>Assignee: Kamil Wasilewski
>Priority: P3
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The `gcs_location` parameter is wrongly evaluated at pipeline construction 
> time, resulting in calling value_provider.get() from a non-runtime context 
> when using a value provider for the `gcs_location`. 
> See discussion at 
> [https://github.com/apache/beam/pull/11040#issuecomment-597872563]
> The code in question is at: 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1575]





[jira] [Updated] (BEAM-9017) Beam Dependency Update Request: cachetools

2020-10-02 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-9017:
---
Status: Resolved  (was: Open)

> Beam Dependency Update Request: cachetools
> --
>
> Key: BEAM-9017
> URL: https://issues.apache.org/jira/browse/BEAM-9017
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: P3
>
>  - 2019-12-23 12:04:17.483984 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-30 14:00:14.523312 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-01-06 12:03:38.328424 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-01-13 12:04:04.593478 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-01-20 12:03:36.859072 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-01-27 12:04:22.622190 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-02-03 12:06:51.686822 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-02-10 12:03:51.875766 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-02-17 12:04:37.088086 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-02-24 12:04:37.014614 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-07-08 10:29:37.440173 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.1.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-07-13 12:05:33.262799 
> -
> Please consider 

[jira] [Updated] (BEAM-6089) Beam Dependency Update Request: oauth2client

2020-10-02 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-6089:
---
Fix Version/s: 2.26.0

> Beam Dependency Update Request: oauth2client
> 
>
> Key: BEAM-6089
> URL: https://issues.apache.org/jira/browse/BEAM-6089
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: P3
> Fix For: 2.26.0
>
>
>  - 2018-11-19 12:11:53.801885 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-26 12:10:31.359164 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-03 12:11:18.194090 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-10 12:13:40.021791 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-17 12:12:09.88 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-31 15:20:14.935936 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-01-07 12:23:14.664558 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-01-14 12:12:14.081917 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-01-21 12:18:38.928775 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-01-28 12:10:22.371989 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-02-04 12:11:00.277439 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-02-11 12:11:25.422782 
> --

[jira] [Updated] (BEAM-9017) Beam Dependency Update Request: cachetools

2020-10-02 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-9017:
---
Fix Version/s: 2.26.0

> Beam Dependency Update Request: cachetools
> --
>
> Key: BEAM-9017
> URL: https://issues.apache.org/jira/browse/BEAM-9017
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: P3
> Fix For: 2.26.0
>
>
>  - 2019-12-23 12:04:17.483984 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-30 14:00:14.523312 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-01-06 12:03:38.328424 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-01-13 12:04:04.593478 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-01-20 12:03:36.859072 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-01-27 12:04:22.622190 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-02-03 12:06:51.686822 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-02-10 12:03:51.875766 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-02-17 12:04:37.088086 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-02-24 12:04:37.014614 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.0.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-07-08 10:29:37.440173 
> -
> Please consider upgrading the dependency cachetools. 
> The current version is 3.1.1. The latest version is 4.1.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-07-13 12:05:33.262799 
> -
> 

[jira] [Updated] (BEAM-6089) Beam Dependency Update Request: oauth2client

2020-10-02 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-6089:
---
Status: Resolved  (was: Open)

> Beam Dependency Update Request: oauth2client
> 
>
> Key: BEAM-6089
> URL: https://issues.apache.org/jira/browse/BEAM-6089
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: P3
>
>  - 2018-11-19 12:11:53.801885 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-26 12:10:31.359164 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-03 12:11:18.194090 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-10 12:13:40.021791 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-17 12:12:09.88 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-31 15:20:14.935936 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-01-07 12:23:14.664558 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-01-14 12:12:14.081917 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-01-21 12:18:38.928775 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-01-28 12:10:22.371989 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-02-04 12:11:00.277439 
> -
> Please consider upgrading the dependency oauth2client. 
> The current version is 3.0.0. The latest version is 4.1.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-02-11 12:11:25.422782 
> -

[jira] [Updated] (BEAM-10798) Beam Dependency Update Request: fastavro

2020-10-02 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10798:

Fix Version/s: 2.26.0

> Beam Dependency Update Request: fastavro
> 
>
> Key: BEAM-10798
> URL: https://issues.apache.org/jira/browse/BEAM-10798
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Kamil Wasilewski
>Priority: P2
> Fix For: 2.26.0
>
>
>  - 2020-08-24 12:07:04.164034 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-08-28 09:10:01.691022 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-08-28 15:47:13.166544 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-08-28 16:18:37.722339 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-08-31 12:07:48.339349 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-09-07 12:07:08.876003 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-09-14 12:20:38.465463 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-09-21 12:16:39.857257 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10639) Create an integration test that exercises --setup_file flag.

2020-10-02 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10639:

Status: Resolved  (was: Open)

> Create an integration test that exercises --setup_file flag.
> 
>
> Key: BEAM-10639
> URL: https://issues.apache.org/jira/browse/BEAM-10639
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: Valentyn Tymofieiev
>Assignee: Kamil Wasilewski
>Priority: P1
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> We have an example [1] and user-facing instruction [2], but I couldn't find 
> an integration test that exercises this functionality continuously on any of 
> the runners.
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset.py
> [2]  
> https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#nonpython





[jira] [Updated] (BEAM-10639) Create an integration test that exercises --setup_file flag.

2020-10-02 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10639:

Fix Version/s: Not applicable

> Create an integration test that exercises --setup_file flag.
> 
>
> Key: BEAM-10639
> URL: https://issues.apache.org/jira/browse/BEAM-10639
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: Valentyn Tymofieiev
>Assignee: Kamil Wasilewski
>Priority: P1
> Fix For: Not applicable
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> We have an example [1] and user-facing instruction [2], but I couldn't find 
> an integration test that exercises this functionality continuously on any of 
> the runners.
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset.py
> [2]  
> https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#nonpython





[jira] [Updated] (BEAM-10798) Beam Dependency Update Request: fastavro

2020-10-02 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10798:

Status: Resolved  (was: Triage Needed)

> Beam Dependency Update Request: fastavro
> 
>
> Key: BEAM-10798
> URL: https://issues.apache.org/jira/browse/BEAM-10798
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Kamil Wasilewski
>Priority: P2
>
>  - 2020-08-24 12:07:04.164034 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-08-28 09:10:01.691022 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-08-28 15:47:13.166544 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-08-28 16:18:37.722339 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-08-31 12:07:48.339349 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-09-07 12:07:08.876003 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-09-14 12:20:38.465463 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-09-21 12:16:39.857257 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 





[jira] [Assigned] (BEAM-9506) _CustomBigQuerySource value provider parameter gcs_location is wrongly evaluated

2020-09-25 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski reassigned BEAM-9506:
--

Assignee: Kamil Wasilewski

> _CustomBigQuerySource value provider parameter gcs_location is wrongly 
> evaluated
> 
>
> Key: BEAM-9506
> URL: https://issues.apache.org/jira/browse/BEAM-9506
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.19.0
>Reporter: Elias Djurfeldt
>Assignee: Kamil Wasilewski
>Priority: P3
>
> The `gcs_location` parameter is wrongly evaluated at pipeline construction 
> time, resulting in calling value_provider.get() from a non-runtime context 
> when using a value provider for the `gcs_location`. 
> See discussion at 
> [https://github.com/apache/beam/pull/11040#issuecomment-597872563]
> The code in question is at: 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1575]
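The difference between construction-time and runtime evaluation described above can be illustrated with a minimal, self-contained sketch. It does not use Beam's actual classes: `DeferredValue`, `EagerSource`, `LazySource` and the bucket path are hypothetical names invented for this illustration, and the sketch only assumes that a value provider refuses `get()` until the pipeline is running.

```python
class DeferredValue:
    """Hypothetical stand-in for a value provider resolved only at runtime."""

    def __init__(self):
        self._value = None
        self._at_runtime = False

    def set_runtime_value(self, value):
        # Simulates the runner supplying the value once the job has started.
        self._value = value
        self._at_runtime = True

    def get(self):
        if not self._at_runtime:
            raise RuntimeError("get() called from a non-runtime context")
        return self._value


class EagerSource:
    """Resolves the location in __init__, i.e. at pipeline construction time."""

    def __init__(self, gcs_location):
        self.gcs_location = gcs_location.get()


class LazySource:
    """Keeps the provider and resolves it only when reading starts."""

    def __init__(self, gcs_location):
        self._gcs_location = gcs_location

    def read(self):
        return "reading via " + self._gcs_location.get()


location = DeferredValue()

# Construction-time evaluation fails because no runtime value exists yet.
try:
    EagerSource(location)
    eager_failed = False
except RuntimeError:
    eager_failed = True

# Deferred evaluation: construction succeeds, get() happens at read time.
source = LazySource(location)
location.set_runtime_value("gs://example-bucket/tmp")  # hypothetical path
result = source.read()
```

In this sketch the fix amounts to storing the provider object and calling `get()` only inside the code path that executes at runtime; whether the actual fix in `bigquery.py` takes this shape is not stated in the ticket.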





[jira] [Commented] (BEAM-3237) Logs from python custom commands are not visible at installation time

2020-09-22 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200088#comment-17200088
 ] 

Kamil Wasilewski commented on BEAM-3237:


It looks like the output of custom commands is no longer logged, even in the 
worker-startup log. This is all I got after running a job with a custom 
setup.py:


Building wheels for collected packages: example 
Building wheel for example (setup.py): started 
Building wheel for example (setup.py): finished with status 'done' 
Created wheel for example: filename=example-0.0.1-py3-none-any.whl size=1039 sha256=9a12e64c4690c2f1183599ccaaf0f6094fba1180733ce03f133258f9bcbe7518 
Stored in directory: /tmp/pip-ephem-wheel-cache-j_0qi3fg/wheels/3b/74/63/a19ebe216edb94a0eb77b70a7ffebfc04517b580414a51692b
Successfully built example 
Installing collected packages: example 
Successfully installed example-0.0.1

 

Judging from how the logs look, Dataflow runs "pip install .", which is not 
enough. At least "pip install . -v" must be used in order to include the output 
of custom commands.
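
As a rough sketch of the difference, the helper below builds the two pip invocations. The function name `pip_install_command` is hypothetical and this is not how Dataflow actually invokes pip; the only pip features assumed are the `install` subcommand and the `-v` (verbose) flag.

```python
import sys


def pip_install_command(package_dir, verbose=True):
    """Builds a pip command line; a hypothetical helper for illustration."""
    cmd = [sys.executable, "-m", "pip", "install", package_dir]
    if verbose:
        # -v makes pip print the output of the underlying setup.py run,
        # which is where the output of custom commands appears.
        cmd.append("-v")
    return cmd


quiet_cmd = pip_install_command(".", verbose=False)
verbose_cmd = pip_install_command(".")
```

Either list could be passed to `subprocess.run` to compare what ends up in the captured output.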

 

 

> Logs from python custom commands are not visible at installation time
> -
>
> Key: BEAM-3237
> URL: https://issues.apache.org/jira/browse/BEAM-3237
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: P3
>
> Outputs from the custom commands in the julia set example do not show in 
> the DataflowRunner logs.





[jira] [Updated] (BEAM-10639) Create an integration test that exercises --setup_file flag.

2020-09-22 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10639:

Status: Open  (was: Triage Needed)

> Create an integration test that exercises --setup_file flag.
> 
>
> Key: BEAM-10639
> URL: https://issues.apache.org/jira/browse/BEAM-10639
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: Valentyn Tymofieiev
>Assignee: Kamil Wasilewski
>Priority: P1
>
> We have an example [1] and user-facing instruction [2], but I couldn't find 
> an integration test that exercises this functionality continuously on any of 
> the runners.
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset.py
> [2]  
> https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#nonpython





[jira] [Assigned] (BEAM-10639) Create an integration test that exercises --setup_file flag.

2020-09-22 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski reassigned BEAM-10639:
---

Assignee: Kamil Wasilewski

> Create an integration test that exercises --setup_file flag.
> 
>
> Key: BEAM-10639
> URL: https://issues.apache.org/jira/browse/BEAM-10639
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: Valentyn Tymofieiev
>Assignee: Kamil Wasilewski
>Priority: P1
>
> We have an example [1] and user-facing instruction [2], but I couldn't find 
> an integration test that exercises this functionality continuously on any of 
> the runners.
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset.py
> [2]  
> https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#nonpython





[jira] [Commented] (BEAM-9154) Move Chicago Taxi Example to Python 3

2020-09-22 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199975#comment-17199975
 ] 

Kamil Wasilewski commented on BEAM-9154:


> It sounds this test is misconfigured and may not add much value.

I agree. It would be nice to hear what others think.

> Move Chicago Taxi Example to Python 3
> -
>
> Key: BEAM-9154
> URL: https://issues.apache.org/jira/browse/BEAM-9154
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: P1
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Chicago Taxi Example[1] should be moved to the latest version of Python 
> supported by Beam (currently it's Python 3.7).
> At the moment, the following error occurs when running the benchmark on 
> Python 3.7 (requires further investigation):
> {code:java}
> Traceback (most recent call last):
>   File "preprocess.py", line 259, in <module>
>     main()
>   File "preprocess.py", line 254, in main
>     project=known_args.metric_reporting_project
>   File "preprocess.py", line 155, in transform_data
>     ('Analyze' >> tft_beam.AnalyzeDataset(preprocessing_fn)))
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 987, in __ror__
>     return self.transform.__ror__(pvalueish, self.label)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 547, in __ror__
>     result = p.apply(self, pvalueish, label)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 532, in apply
>     return self.apply(transform, pvalueish)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 573, in apply
>     pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
>     return m(transform, input, options)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 223, in apply_PTransform
>     return transform.expand(input)
>   File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 825, in expand
>     input_metadata))
>   File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 716, in expand
>     output_signature = self._preprocessing_fn(copied_inputs)
>   File "preprocess.py", line 102, in preprocessing_fn
>     _fill_in_missing(inputs[key]),
> KeyError: 'company'
> {code}
> [1] sdks/python/apache_beam/testing/benchmarks/chicago_taxi
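
The last frames of the traceback show a dictionary lookup, `inputs[key]`, failing for the key 'company'. The sketch below reproduces only that failure mode in isolation and shows one way to list absent keys before indexing. It is not the benchmark's code: `EXPECTED_KEYS`, `fill_in_missing`, `preprocess`, `missing_keys` and the sample input are hypothetical, and the sketch makes no claim about why the key is absent under Python 3.7.

```python
EXPECTED_KEYS = ["company", "fare"]  # hypothetical feature names


def fill_in_missing(value, default=""):
    """Replaces a missing (None) value with a default."""
    return default if value is None else value


def preprocess(inputs):
    # Same indexing pattern as in the traceback: inputs[key] raises
    # KeyError as soon as an expected key is absent from the dict.
    return {key: fill_in_missing(inputs[key]) for key in EXPECTED_KEYS}


def missing_keys(inputs):
    """Lists expected keys that the input dict does not contain."""
    return [key for key in EXPECTED_KEYS if key not in inputs]


inputs = {"fare": 12.5}  # 'company' is absent, as in the reported error

absent = missing_keys(inputs)

try:
    preprocess(inputs)
    error = None
except KeyError as exc:
    error = exc.args[0]
```

Checking `missing_keys` on the real inputs before the transform would show whether 'company' is the only absent feature or one of several.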





[jira] [Commented] (BEAM-9154) Move Chicago Taxi Example to Python 3

2020-09-21 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199384#comment-17199384
 ] 

Kamil Wasilewski commented on BEAM-9154:


https://github.com/apache/beam/pull/12886

> Move Chicago Taxi Example to Python 3
> -
>
> Key: BEAM-9154
> URL: https://issues.apache.org/jira/browse/BEAM-9154
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: P1
>
> The Chicago Taxi Example[1] should be moved to the latest version of Python 
> supported by Beam (currently it's Python 3.7).
> At the moment, the following error occurs when running the benchmark on 
> Python 3.7 (requires further investigation):
> {code:java}
> Traceback (most recent call last):
>   File "preprocess.py", line 259, in <module>
>     main()
>   File "preprocess.py", line 254, in main
>     project=known_args.metric_reporting_project
>   File "preprocess.py", line 155, in transform_data
>     ('Analyze' >> tft_beam.AnalyzeDataset(preprocessing_fn)))
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 987, in __ror__
>     return self.transform.__ror__(pvalueish, self.label)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 547, in __ror__
>     result = p.apply(self, pvalueish, label)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 532, in apply
>     return self.apply(transform, pvalueish)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 573, in apply
>     pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
>     return m(transform, input, options)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 223, in apply_PTransform
>     return transform.expand(input)
>   File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 825, in expand
>     input_metadata))
>   File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 716, in expand
>     output_signature = self._preprocessing_fn(copied_inputs)
>   File "preprocess.py", line 102, in preprocessing_fn
>     _fill_in_missing(inputs[key]),
> KeyError: 'company'
> {code}
> [1] sdks/python/apache_beam/testing/benchmarks/chicago_taxi
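
A small diagnostic sketch for narrowing down the KeyError quoted above. It assumes that `inputs` behaves like a dict keyed by feature name; the helper name and the sample feature names other than 'company' are hypothetical and are not taken from preprocess.py.

```python
def missing_feature_keys(inputs, expected_keys):
    """Return the expected feature names that are absent from inputs, in order."""
    return [key for key in expected_keys if key not in inputs]


# Hypothetical data, for illustration only.
sample_inputs = {"fare": [12.5], "trip_miles": [3.1]}
sample_expected = ["fare", "company", "trip_miles"]

# Report which features would trigger a KeyError on inputs[key].
print(missing_feature_keys(sample_inputs, sample_expected))
```

Running such a check before the loop that calls inputs[key] would show whether 'company' is the only missing feature or just the first one encountered.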





[jira] [Commented] (BEAM-9154) Move Chicago Taxi Example to Python 3

2020-09-21 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199378#comment-17199378
 ] 

Kamil Wasilewski commented on BEAM-9154:


> Can we migrate the code to TF2? 

I don't know. IIRC, the example was copy-pasted from the tfx repo without any 
major changes (the example doesn't exist now, see: 
https://github.com/tensorflow/tfx/pull/741). The tfx team may know the answer.

Let's disable these tests until we know what to do next. I can prepare a pull 
request. 

> Move Chicago Taxi Example to Python 3
> -
>
> Key: BEAM-9154
> URL: https://issues.apache.org/jira/browse/BEAM-9154
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: P1
>
> The Chicago Taxi Example[1] should be moved to the latest version of Python 
> supported by Beam (currently it's Python 3.7).
> At the moment, the following error occurs when running the benchmark on 
> Python 3.7 (requires further investigation):
> {code:java}
> Traceback (most recent call last):
>   File "preprocess.py", line 259, in <module>
>     main()
>   File "preprocess.py", line 254, in main
>     project=known_args.metric_reporting_project
>   File "preprocess.py", line 155, in transform_data
>     ('Analyze' >> tft_beam.AnalyzeDataset(preprocessing_fn)))
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 987, in __ror__
>     return self.transform.__ror__(pvalueish, self.label)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 547, in __ror__
>     result = p.apply(self, pvalueish, label)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 532, in apply
>     return self.apply(transform, pvalueish)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 573, in apply
>     pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
>     return m(transform, input, options)
>   File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 223, in apply_PTransform
>     return transform.expand(input)
>   File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 825, in expand
>     input_metadata))
>   File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 716, in expand
>     output_signature = self._preprocessing_fn(copied_inputs)
>   File "preprocess.py", line 102, in preprocessing_fn
>     _fill_in_missing(inputs[key]),
> KeyError: 'company'
> {code}
> [1] sdks/python/apache_beam/testing/benchmarks/chicago_taxi





[jira] [Updated] (BEAM-8200) Add performance benchmarks for Python streaming

2020-09-21 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-8200:
---
Status: Resolved  (was: Open)

> Add performance benchmarks for Python streaming 
> 
>
> Key: BEAM-8200
> URL: https://issues.apache.org/jira/browse/BEAM-8200
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: P3
>
> Per discussion [1], we have a gap in Python streaming benchmark coverage. We 
> should look into adding benchmarks and alerting (see: BEAM-8199). 
> cc: [~altay] [~angoenka] [~kamilwu] [~kasiak] [~thw]





[jira] [Updated] (BEAM-8200) Add performance benchmarks for Python streaming

2020-09-21 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-8200:
---
Fix Version/s: Not applicable

> Add performance benchmarks for Python streaming 
> 
>
> Key: BEAM-8200
> URL: https://issues.apache.org/jira/browse/BEAM-8200
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: P3
> Fix For: Not applicable
>
>
> Per discussion [1], we have a gap in Python streaming benchmark coverage. We 
> should look into adding benchmarks and alerting (see: BEAM-8199). 
> cc: [~altay] [~angoenka] [~kamilwu] [~kasiak] [~thw]





[jira] [Updated] (BEAM-8789) Document InfluxDB/Kapacitor deployment on Kubernetes in project's wiki

2020-09-21 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-8789:
---
Fix Version/s: Not applicable

> Document InfluxDB/Kapacitor deployment on Kubernetes in project's wiki
> --
>
> Key: BEAM-8789
> URL: https://issues.apache.org/jira/browse/BEAM-8789
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: P3
> Fix For: Not applicable
>
>






[jira] [Updated] (BEAM-8136) Configure alert notification channels

2020-09-21 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-8136:
---
Status: Resolved  (was: Open)

> Configure alert notification channels
> -
>
> Key: BEAM-8136
> URL: https://issues.apache.org/jira/browse/BEAM-8136
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: P3
>
> Alert notifications should be sent:
>  * by email (possibly to comm...@beam.apache.org)
>  * to Slack (channel TBD)





[jira] [Updated] (BEAM-8136) Configure alert notification channels

2020-09-21 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-8136:
---
Fix Version/s: Not applicable

> Configure alert notification channels
> -
>
> Key: BEAM-8136
> URL: https://issues.apache.org/jira/browse/BEAM-8136
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: P3
> Fix For: Not applicable
>
>
> Alert notifications should be sent:
>  * by email (possibly to comm...@beam.apache.org)
>  * to Slack (channel TBD)





[jira] [Commented] (BEAM-8308) Provide an anomaly detection algorithm

2020-09-21 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199297#comment-17199297
 ] 

Kamil Wasilewski commented on BEAM-8308:


Fixed by https://issues.apache.org/jira/browse/BEAM-10807

> Provide an anomaly detection algorithm
> --
>
> Key: BEAM-8308
> URL: https://issues.apache.org/jira/browse/BEAM-8308
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: P3
> Fix For: Not applicable
>
>
> The goal is to verify the effectiveness of the existing algorithm, which checks 
> whether the average runtime over 24 hours is 20% greater than the average from 
> the six previous days.
> Its definition can be found here: 
> `[.test-infra/metrics/prometheus/prometheus/config/rules.yml|https://github.com/apache/beam/pull/9482/files/66d24f4bfe7056edc580b540f8a9a25bb6adb321#diff-00a7ea7bd616a0f1d09e027e51646428]`
> Provide an alternative algorithm if the existing one proves insufficient.
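
The check described in the issue can be sketched roughly as follows. This is an illustration in plain Python, not the Prometheus rule from rules.yml; the function name, the input shape (lists of runtimes in seconds) and the strict "greater than" comparison are assumptions.

```python
from statistics import mean


def is_runtime_anomalous(last_24h, previous_six_days, threshold=0.20):
    """Return True when the 24h average exceeds the six-day baseline by more than threshold.

    last_24h: runtimes (seconds) measured during the most recent 24 hours.
    previous_six_days: runtimes (seconds) measured during the six days before that.
    """
    if not last_24h or not previous_six_days:
        return False  # Not enough data to decide either way.
    baseline = mean(previous_six_days)
    current = mean(last_24h)
    return current > baseline * (1 + threshold)


# Hypothetical runtimes, for illustration only.
print(is_runtime_anomalous([130.0, 128.0], [100.0, 102.0, 98.0, 101.0, 99.0, 100.0]))
```

One known weakness of a fixed-percentage rule like this is that a slow drift in the baseline never triggers it, which is one reason to evaluate alternatives.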





[jira] [Updated] (BEAM-8308) Provide an anomaly detection algorithm

2020-09-21 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-8308:
---
Fix Version/s: Not applicable

> Provide an anomaly detection algorithm
> --
>
> Key: BEAM-8308
> URL: https://issues.apache.org/jira/browse/BEAM-8308
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: P3
> Fix For: Not applicable
>
>
> The goal is to verify the effectiveness of the existing algorithm, which checks 
> whether the average runtime over 24 hours is 20% greater than the average from 
> the six previous days.
> Its definition can be found here: 
> `[.test-infra/metrics/prometheus/prometheus/config/rules.yml|https://github.com/apache/beam/pull/9482/files/66d24f4bfe7056edc580b540f8a9a25bb6adb321#diff-00a7ea7bd616a0f1d09e027e51646428]`
> Provide an alternative algorithm if the existing one proves insufficient.





[jira] [Updated] (BEAM-8308) Provide an anomaly detection algorithm

2020-09-21 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-8308:
---
Status: Resolved  (was: Open)

> Provide an anomaly detection algorithm
> --
>
> Key: BEAM-8308
> URL: https://issues.apache.org/jira/browse/BEAM-8308
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: P3
>
> The goal is to verify the effectiveness of the existing algorithm, which checks 
> whether the average runtime over 24 hours is 20% greater than the average from 
> the six previous days.
> Its definition can be found here: 
> `[.test-infra/metrics/prometheus/prometheus/config/rules.yml|https://github.com/apache/beam/pull/9482/files/66d24f4bfe7056edc580b540f8a9a25bb6adb321#diff-00a7ea7bd616a0f1d09e027e51646428]`
> Provide an alternative algorithm if the existing one proves insufficient.





[jira] [Updated] (BEAM-8789) Document InfluxDB/Kapacitor deployment on Kubernetes in project's wiki

2020-09-21 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-8789:
---
Status: Resolved  (was: Open)

> Document InfluxDB/Kapacitor deployment on Kubernetes in project's wiki
> --
>
> Key: BEAM-8789
> URL: https://issues.apache.org/jira/browse/BEAM-8789
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: P3
>






[jira] [Updated] (BEAM-8790) Provide Kubernetes deployment for InfluxDB/Kapacitor

2020-09-21 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-8790:
---
Fix Version/s: Not applicable

> Provide Kubernetes deployment for InfluxDB/Kapacitor
> 
>
> Key: BEAM-8790
> URL: https://issues.apache.org/jira/browse/BEAM-8790
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: P2
>  Labels: stale-P2
> Fix For: Not applicable
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-8790) Provide Kubernetes deployment for InfluxDB/Kapacitor

2020-09-21 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-8790:
---
Status: Resolved  (was: Open)

> Provide Kubernetes deployment for InfluxDB/Kapacitor
> 
>
> Key: BEAM-8790
> URL: https://issues.apache.org/jira/browse/BEAM-8790
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: P2
>  Labels: stale-P2
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-10643) Grafana dashboard missing per runner nexmark pages

2020-09-18 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10643:

Fix Version/s: Not applicable

> Grafana dashboard missing per runner nexmark pages
> --
>
> Key: BEAM-10643
> URL: https://issues.apache.org/jira/browse/BEAM-10643
> Project: Beam
>  Issue Type: Bug
>  Components: testing-nexmark
>Reporter: Andrew Pilloud
>Assignee: Kamil Wasilewski
>Priority: P2
> Fix For: Not applicable
>
> Attachments: Zrzut ekranu 2020-09-16 o 16.12.08.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The perfkit dashboard was just turned down. It had a page comparing different 
> implementations of the same query for each runner (Dataflow, Direct, etc.). 
> For example, this would be "Java Batch vs Java Streaming vs SQL vs ZetaSQL on 
> Dataflow".
> The new Grafana dashboard breaks down nexmark per implementation (Java, SQL, 
> missing ZetaSQL), and hides streaming which isn't a useful comparison as 
> different runners have different sample sizes and measurement techniques.





[jira] [Updated] (BEAM-10643) Grafana dashboard missing per runner nexmark pages

2020-09-18 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10643:

Status: Resolved  (was: Open)

> Grafana dashboard missing per runner nexmark pages
> --
>
> Key: BEAM-10643
> URL: https://issues.apache.org/jira/browse/BEAM-10643
> Project: Beam
>  Issue Type: Bug
>  Components: testing-nexmark
>Reporter: Andrew Pilloud
>Assignee: Kamil Wasilewski
>Priority: P2
> Attachments: Zrzut ekranu 2020-09-16 o 16.12.08.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The perfkit dashboard was just turned down. It had a page comparing different 
> implementations of the same query for each runner (Dataflow, Direct, etc.). 
> For example, this would be "Java Batch vs Java Streaming vs SQL vs ZetaSQL on 
> Dataflow".
> The new Grafana dashboard breaks down nexmark per implementation (Java, SQL, 
> missing ZetaSQL), and hides streaming which isn't a useful comparison as 
> different runners have different sample sizes and measurement techniques.





[jira] [Commented] (BEAM-8200) Add performance benchmarks for Python streaming

2020-09-18 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198238#comment-17198238
 ] 

Kamil Wasilewski commented on BEAM-8200:


New benchmarks for Python streaming have been added recently: 
https://issues.apache.org/jira/browse/BEAM-10616, 
https://issues.apache.org/jira/browse/BEAM-10674, 
https://issues.apache.org/jira/browse/BEAM-10675 and 
https://issues.apache.org/jira/browse/BEAM-10672. I think we can close this 
ticket.

> Add performance benchmarks for Python streaming 
> 
>
> Key: BEAM-8200
> URL: https://issues.apache.org/jira/browse/BEAM-8200
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: P3
>
> Per discussion [1], we have a gap in Python streaming benchmark coverage. We 
> should look into adding benchmarks and alerting (see: BEAM-8199). 
> cc: [~altay] [~angoenka] [~kamilwu] [~kasiak] [~thw]





[jira] [Assigned] (BEAM-10798) Beam Dependency Update Request: fastavro

2020-09-17 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski reassigned BEAM-10798:
---

Assignee: Kamil Wasilewski

> Beam Dependency Update Request: fastavro
> 
>
> Key: BEAM-10798
> URL: https://issues.apache.org/jira/browse/BEAM-10798
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Kamil Wasilewski
>Priority: P2
>
>  - 2020-08-24 12:07:04.164034 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-08-28 09:10:01.691022 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-08-28 15:47:13.166544 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-08-28 16:18:37.722339 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-08-31 12:07:48.339349 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-09-07 12:07:08.876003 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-09-14 12:20:38.465463 
> -
> Please consider upgrading the dependency fastavro. 
> The current version is 0.23.6. The latest version is 1.0.0.post1 
> cc: [~rdub], [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 





[jira] [Updated] (BEAM-10616) Create Streaming ParDo Python Load Test Jenkins Job

2020-09-17 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10616:

Status: Resolved  (was: In Progress)

> Create Streaming ParDo Python Load Test Jenkins Job
> ---
>
> Key: BEAM-10616
> URL: https://issues.apache.org/jira/browse/BEAM-10616
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Kasia Kucharczyk
>Priority: P2
>  Time Spent: 17.5h
>  Remaining Estimate: 0h
>
> ParDo core operation load tests for streaming, with 4 test cases that load 
> data from SyntheticSources and run on Dataflow and Flink.





[jira] [Updated] (BEAM-10616) Create Streaming ParDo Python Load Test Jenkins Job

2020-09-17 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10616:

Fix Version/s: Not applicable

> Create Streaming ParDo Python Load Test Jenkins Job
> ---
>
> Key: BEAM-10616
> URL: https://issues.apache.org/jira/browse/BEAM-10616
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Kasia Kucharczyk
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 17.5h
>  Remaining Estimate: 0h
>
> ParDo core operation load tests for streaming, with 4 test cases that load 
> data from SyntheticSources and run on Dataflow and Flink.





[jira] [Commented] (BEAM-10643) Grafana dashboard missing per runner nexmark pages

2020-09-16 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17197030#comment-17197030
 ] 

Kamil Wasilewski commented on BEAM-10643:
-

The screenshot is big, but if you click on the link in "Attachments", its 
size will be adjusted to your screen.

> Grafana dashboard missing per runner nexmark pages
> --
>
> Key: BEAM-10643
> URL: https://issues.apache.org/jira/browse/BEAM-10643
> Project: Beam
>  Issue Type: Bug
>  Components: testing-nexmark
>Reporter: Andrew Pilloud
>Assignee: Kamil Wasilewski
>Priority: P2
> Attachments: Zrzut ekranu 2020-09-16 o 16.12.08.png
>
>
> The perfkit dashboard was just turned down. It had a page comparing different 
> implementations of the same query for each runner (Dataflow, Direct, etc.). 
> For example, this would be "Java Batch vs Java Streaming vs SQL vs ZetaSQL on 
> Dataflow".
> The new Grafana dashboard breaks down nexmark per implementation (Java, SQL, 
> missing ZetaSQL), and hides streaming which isn't a useful comparison as 
> different runners have different sample sizes and measurement techniques.





[jira] [Updated] (BEAM-10643) Grafana dashboard missing per runner nexmark pages

2020-09-16 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10643:

Summary: Grafana dashboard missing per runner nexmark pages  (was: Grafina 
dashboard missing per runner nexmark pages)

> Grafana dashboard missing per runner nexmark pages
> --
>
> Key: BEAM-10643
> URL: https://issues.apache.org/jira/browse/BEAM-10643
> Project: Beam
>  Issue Type: Bug
>  Components: testing-nexmark
>Reporter: Andrew Pilloud
>Assignee: Kamil Wasilewski
>Priority: P2
> Attachments: Zrzut ekranu 2020-09-16 o 16.12.08.png
>
>
> The perfkit dashboard was just turned down. It had a page comparing different 
> implementations of the same query for each runner (Dataflow, Direct, etc.). 
> For example, this would be "Java Batch vs Java Streaming vs SQL vs ZetaSQL on 
> Dataflow".
> The new Grafana dashboard breaks down nexmark per implementation (Java, SQL, 
> missing ZetaSQL), and hides streaming which isn't a useful comparison as 
> different runners have different sample sizes and measurement techniques.





[jira] [Commented] (BEAM-10643) Grafina dashboard missing per runner nexmark pages

2020-09-16 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17197024#comment-17197024
 ] 

Kamil Wasilewski commented on BEAM-10643:
-

[~apilloud], how about something like this? As you can see in the screenshot 
below, there are up to three data series for each chart: standard, SQL and 
ZetaSQL. If there is no data for SQL and ZetaSQL variants in the database, only 
one data series is displayed. Conversely, when adding SQL/ZetaSQL variants of 
tests, the results should be automatically displayed without any modifications 
to the dashboard.

Also, it would no longer be possible to display results for multiple runners. 
The drop-down list of runners would still be there, but only one runner could 
be selected at a time.

I'm looking forward to your opinion.

 !Zrzut ekranu 2020-09-16 o 16.12.08.png! 
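
The series selection proposed in this comment could look roughly like the sketch below. The variant labels, the row layout and the function name are hypothetical; the actual dashboard is configured in Grafana, not in Python, so this only illustrates the intended behaviour (series without data are simply not shown).

```python
from collections import defaultdict

# Hypothetical labels for the three possible data series per chart.
VARIANTS = ("standard", "sql", "zetasql")


def chart_series(rows):
    """Group (variant, timestamp, value) rows into per-variant series.

    Variants with no rows are omitted, so a chart shows between one and
    three series depending on what is present in the database.
    """
    grouped = defaultdict(list)
    for variant, timestamp, value in rows:
        if variant in VARIANTS:
            grouped[variant].append((timestamp, value))
    return {v: sorted(grouped[v]) for v in VARIANTS if grouped[v]}


# Hypothetical rows for a single query on a single runner.
rows = [("standard", 2, 11.0), ("standard", 1, 10.0), ("sql", 1, 14.0)]
print(list(chart_series(rows)))
```

Because the grouping is driven by the data, adding ZetaSQL results later would add a third series without changing the chart definition.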

> Grafina dashboard missing per runner nexmark pages
> --
>
> Key: BEAM-10643
> URL: https://issues.apache.org/jira/browse/BEAM-10643
> Project: Beam
>  Issue Type: Bug
>  Components: testing-nexmark
>Reporter: Andrew Pilloud
>Assignee: Kamil Wasilewski
>Priority: P2
> Attachments: Zrzut ekranu 2020-09-16 o 16.12.08.png
>
>
> The perfkit dashboard was just turned down. It had a page comparing different 
> implementations of the same query for each runner (Dataflow, Direct, etc.). 
> For example, this would be "Java Batch vs Java Streaming vs SQL vs ZetaSQL on 
> Dataflow".
> The new Grafana dashboard breaks down nexmark per implementation (Java, SQL, 
> missing ZetaSQL), and hides streaming which isn't a useful comparison as 
> different runners have different sample sizes and measurement techniques.





[jira] [Updated] (BEAM-10643) Grafina dashboard missing per runner nexmark pages

2020-09-16 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10643:

Attachment: Zrzut ekranu 2020-09-16 o 16.12.08.png

> Grafina dashboard missing per runner nexmark pages
> --
>
> Key: BEAM-10643
> URL: https://issues.apache.org/jira/browse/BEAM-10643
> Project: Beam
>  Issue Type: Bug
>  Components: testing-nexmark
>Reporter: Andrew Pilloud
>Assignee: Kamil Wasilewski
>Priority: P2
> Attachments: Zrzut ekranu 2020-09-16 o 16.12.08.png
>
>
> The perfkit dashboard was just turned down. It had a page comparing different 
> implementations of the same query for each runner (Dataflow, Direct, etc.). 
> For example, this would be "Java Batch vs Java Streaming vs SQL vs ZetaSQL on 
> Dataflow".
> The new Grafana dashboard breaks down nexmark per implementation (Java, SQL, 
> missing ZetaSQL), and hides streaming which isn't a useful comparison as 
> different runners have different sample sizes and measurement techniques.





[jira] [Updated] (BEAM-10659) ParDo Python streaming load tests timeouts on 200-iterations case

2020-09-15 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10659:

Fix Version/s: Not applicable

> ParDo Python streaming load tests timeouts on 200-iterations case
> -
>
> Key: BEAM-10659
> URL: https://issues.apache.org/jira/browse/BEAM-10659
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kasia Kucharczyk
>Priority: P2
> Fix For: Not applicable
>
>
> The Python Dataflow load test in streaming mode times out on Jenkins in 
> case 2:
> {code:java}
> 2GB 100 byte records 200 times
> {code}
>  It 
> [iterates|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/load_tests/pardo_test.py#L147]
>  the same ParDo step sequentially. 
> Jenkins jobs have a 2h timeout. The second case is usually 
> [cancelled|https://console.cloud.google.com/dataflow/jobs/us-central1/2020-08-04_05_00_47-15183151853043328210;mainTab=JOB_METRICS?project=apache-beam-testing]
>  after 1h 47 min. The most suspicious metric here is throughput, which, in 
> comparison to other jobs, doesn't look steady. Sometimes there is a spike after 
> 1 hour of inactivity, or there are several spikes (up to 30 000 elements/sec).
> The [Python batch 
> case|https://console.cloud.google.com/dataflow/jobs/us-central1/2020-08-04_06_32_29-2466435392086580014;step=s1;mainTab=JOB_METRICS?project=apache-beam-testing]
>  takes ~56 minutes, with a steady throughput of ~7000 elements/sec for 
> almost the whole job run.
> In comparison, the [same Java test 
> case|https://console.cloud.google.com/dataflow/jobs/us-central1/2020-08-03_05_13_48-16554947290254286391;mainTab=JOB_GRAPH?project=apache-beam-testing]
>  takes ~6 minutes. Here throughput goes up to ~100 000 elements/sec and then 
> decreases once all elements have been processed.
>  
>  
>  
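
As a rough sanity check of the figures quoted above, the arithmetic can be sketched as follows. It assumes that 2GB means 2 * 10^9 bytes and that the reported throughput counts input records passing through the pipeline; both are assumptions, and the helper names are made up for this illustration.

```python
def record_count(total_bytes, record_bytes):
    """Number of fixed-size records that fit in total_bytes."""
    return total_bytes // record_bytes


def minutes_at(elements, elements_per_sec):
    """Minutes needed to process elements at a steady rate."""
    return elements / elements_per_sec / 60


records = record_count(2 * 10**9, 100)         # 20,000,000 records
batch_minutes = minutes_at(records, 7_000)     # about 47.6 minutes
java_minutes = minutes_at(records, 100_000)    # about 3.3 minutes
min_rate = records / (2 * 60 * 60)             # about 2,778 elements/sec

print(records, round(batch_minutes, 1), round(java_minutes, 1), round(min_rate))
```

Under these assumptions the batch and Java numbers are in line with the reported ~56 and ~6 minutes once startup time is added, and the streaming job would need to sustain roughly 2,800 elements/sec to finish within the 2h Jenkins timeout.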





[jira] [Updated] (BEAM-10659) ParDo Python streaming load tests timeouts on 200-iterations case

2020-09-15 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10659:

Status: Resolved  (was: Open)

> ParDo Python streaming load tests timeouts on 200-iterations case
> -
>
> Key: BEAM-10659
> URL: https://issues.apache.org/jira/browse/BEAM-10659
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kasia Kucharczyk
>Priority: P2
>
> The Python Dataflow load test in streaming mode times out on Jenkins in 
> case 2:
> {code:java}
> 2GB 100 byte records 200 times
> {code}
>  It 
> [iterates|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/load_tests/pardo_test.py#L147]
>  the same ParDo step sequentially. 
> Jenkins jobs have a 2h timeout. The second case is usually 
> [cancelled|https://console.cloud.google.com/dataflow/jobs/us-central1/2020-08-04_05_00_47-15183151853043328210;mainTab=JOB_METRICS?project=apache-beam-testing]
>  after 1h 47 min. The most suspicious metric here is throughput, which, in 
> comparison to other jobs, doesn't look steady. Sometimes there is a spike after 
> 1 hour of inactivity, or there are several spikes (up to 30 000 elements/sec).
> The [Python batch 
> case|https://console.cloud.google.com/dataflow/jobs/us-central1/2020-08-04_06_32_29-2466435392086580014;step=s1;mainTab=JOB_METRICS?project=apache-beam-testing]
>  takes ~56 minutes, with a steady throughput of ~7000 elements/sec for 
> almost the whole job run.
> In comparison, the [same Java test 
> case|https://console.cloud.google.com/dataflow/jobs/us-central1/2020-08-03_05_13_48-16554947290254286391;mainTab=JOB_GRAPH?project=apache-beam-testing]
>  takes ~6 minutes. Here throughput goes up to ~100 000 elements/sec and then 
> decreases once all elements have been processed.
>  
>  
>  
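As a rough sanity check on the durations quoted above, the sketch below converts the dataset size into a record count and divides it by the reported throughputs. It assumes that "2GB" means 2 * 10**9 bytes and that the quoted throughput holds for the whole run; the function names are illustrative and not part of the load test code.

```python
def record_count(total_bytes: int, record_size_bytes: int) -> int:
    """Number of fixed-size records in a dataset of the given size."""
    return total_bytes // record_size_bytes


def minutes_at_throughput(records: int, elements_per_sec: float) -> float:
    """Minutes needed to push `records` elements through a step at a steady rate."""
    return records / elements_per_sec / 60


# Assumption: "2GB" is read as 2 * 10**9 bytes.
records = record_count(2 * 10**9, 100)
print(records)                                            # 20000000
print(round(minutes_at_throughput(records, 7_000), 1))    # Python batch: 47.6
print(round(minutes_at_throughput(records, 100_000), 1))  # Java at peak: 3.3
```

Both estimates land somewhat below the observed ~56 and ~6 minutes, which would be consistent with job startup overhead and with throughput not staying at its peak for the entire run.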





[jira] [Created] (BEAM-10852) Combine Python streaming load test is too slow on Flink

2020-09-03 Thread Kamil Wasilewski (Jira)
Kamil Wasilewski created BEAM-10852:
---

 Summary: Combine Python streaming load test is too slow on Flink
 Key: BEAM-10852
 URL: https://issues.apache.org/jira/browse/BEAM-10852
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Kamil Wasilewski


One of the Combine load test cases, which involves a global combiner and a data 
stream of 200M elements, takes too long on Flink. Flink is able to process only 
half of that data stream within 1 hour, which is too long for a Jenkins job.

Job definition: 
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_LoadTests_Combine_Flink_Python.groovy#L36

Test pipeline: 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/load_tests/combine_test.py
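
The figures above imply both a throughput and a projected duration for the full stream, which the following sketch works out. It assumes the processing rate stays constant over the whole stream; the helper name is made up for illustration.

```python
def projected_runtime_hours(total_elements: int, processed: int, elapsed_hours: float) -> float:
    """Extrapolate the total runtime, assuming the observed rate holds throughout."""
    rate_per_hour = processed / elapsed_hours
    return total_elements / rate_per_hour


total = 200_000_000
processed = total // 2   # half of the stream was processed...
elapsed = 1.0            # ...within one hour

print(round(processed / (elapsed * 3600)))                  # 27778 elements/sec
print(projected_runtime_hours(total, processed, elapsed))   # 2.0 hours
```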






[jira] [Updated] (BEAM-10616) Create Streaming ParDo Python Load Test Jenkins Job

2020-09-03 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10616:

Description: ParDo core operation load tests for streaming with 4 test 
cases that load data from SyntheticSources and run on Dataflow and Flink.  
(was: ParDo core operation load tests for streaming with 4 test cases that 
load data from SyntheticSources and run on Dataflow.)

> Create Streaming ParDo Python Load Test Jenkins Job
> ---
>
> Key: BEAM-10616
> URL: https://issues.apache.org/jira/browse/BEAM-10616
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Kasia Kucharczyk
>Priority: P2
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> ParDo core operation load tests for streaming with 4 test cases that load 
> data from SyntheticSources and run on Dataflow and Flink.





[jira] [Updated] (BEAM-10835) Improve Github Actions cancelling duplicated runs

2020-08-31 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10835:

Status: Resolved  (was: Open)

> Improve Github Actions cancelling duplicated runs
> -
>
> Key: BEAM-10835
> URL: https://issues.apache.org/jira/browse/BEAM-10835
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, testing
>Reporter: Tobiasz Kedzierski
>Assignee: Tobiasz Kedzierski
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-10835) Improve Github Actions cancelling duplicated runs

2020-08-31 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10835:

Fix Version/s: Not applicable

> Improve Github Actions cancelling duplicated runs
> -
>
> Key: BEAM-10835
> URL: https://issues.apache.org/jira/browse/BEAM-10835
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, testing
>Reporter: Tobiasz Kedzierski
>Assignee: Tobiasz Kedzierski
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-10837) Remove unused beam_PerformanceTests_Analysis Jenkins Job

2020-08-31 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10837:

Status: Resolved  (was: Open)

> Remove unused beam_PerformanceTests_Analysis Jenkins Job
> 
>
> Key: BEAM-10837
> URL: https://issues.apache.org/jira/browse/BEAM-10837
> Project: Beam
>  Issue Type: Improvement
>  Components: community-metrics
>Reporter: Tobiasz Kedzierski
>Assignee: Tobiasz Kedzierski
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-10837) Remove unused beam_PerformanceTests_Analysis Jenkins Job

2020-08-31 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10837:

Fix Version/s: Not applicable

> Remove unused beam_PerformanceTests_Analysis Jenkins Job
> 
>
> Key: BEAM-10837
> URL: https://issues.apache.org/jira/browse/BEAM-10837
> Project: Beam
>  Issue Type: Improvement
>  Components: community-metrics
>Reporter: Tobiasz Kedzierski
>Assignee: Tobiasz Kedzierski
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-10672) Create Streaming Combine Python Load Test Jenkins Job

2020-08-27 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10672:

Description: Adding a Combine load test case with the streaming option for the 
Python SDK.  (was: Adding a Combine load test case with the streaming option 
for Dataflow and the Python SDK.)

> Create Streaming Combine Python Load Test Jenkins Job
> -
>
> Key: BEAM-10672
> URL: https://issues.apache.org/jira/browse/BEAM-10672
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Kasia Kucharczyk
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Adding a Combine load test case with the streaming option for the Python SDK.





[jira] [Updated] (BEAM-10643) Grafina dashboard missing per runner nexmark pages

2020-08-26 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10643:

Status: Open  (was: Triage Needed)

> Grafina dashboard missing per runner nexmark pages
> --
>
> Key: BEAM-10643
> URL: https://issues.apache.org/jira/browse/BEAM-10643
> Project: Beam
>  Issue Type: Bug
>  Components: testing-nexmark
>Reporter: Andrew Pilloud
>Assignee: Kamil Wasilewski
>Priority: P2
>
> The perfkit dashboard was just turned down. It had a page comparing different 
> implementations of the same query for each runner (Dataflow, Direct, etc.). 
> For example, this would be "Java Batch vs Java Streaming vs SQL vs ZetaSQL on 
> Dataflow".
> The new Grafana dashboard breaks down Nexmark per implementation (Java, SQL; 
> ZetaSQL is missing) and hides streaming. That is not a useful comparison, as 
> different runners have different sample sizes and measurement techniques.





[jira] [Updated] (BEAM-10675) Create Streaming GBK Python Load Test Jenkins Job

2020-08-26 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10675:

Fix Version/s: Not applicable

> Create Streaming GBK Python Load Test Jenkins Job
> -
>
> Key: BEAM-10675
> URL: https://issues.apache.org/jira/browse/BEAM-10675
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-10675) Create Streaming GBK Python Load Test Jenkins Job

2020-08-26 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10675:

Status: Resolved  (was: Open)

> Create Streaming GBK Python Load Test Jenkins Job
> -
>
> Key: BEAM-10675
> URL: https://issues.apache.org/jira/browse/BEAM-10675
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: P2
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-8961) GBK Python Load test on Flink fails in Jenkins jobs

2020-08-26 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-8961:
---
Status: Resolved  (was: Resolved)

> GBK Python Load test on Flink fails in Jenkins jobs
> ---
>
> Key: BEAM-8961
> URL: https://issues.apache.org/jira/browse/BEAM-8961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michał Walenia
>Priority: P3
> Fix For: Not applicable
>
>
> [https://builds.apache.org/job/beam_LoadTests_Python_GBK_Flink_Batch_PR/53/console]
>  





[jira] [Updated] (BEAM-8961) GBK Python Load test on Flink fails in Jenkins jobs

2020-08-26 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-8961:
---
Fix Version/s: Not applicable

> GBK Python Load test on Flink fails in Jenkins jobs
> ---
>
> Key: BEAM-8961
> URL: https://issues.apache.org/jira/browse/BEAM-8961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michał Walenia
>Priority: P3
> Fix For: Not applicable
>
>
> [https://builds.apache.org/job/beam_LoadTests_Python_GBK_Flink_Batch_PR/53/console]
>  





[jira] [Updated] (BEAM-8961) GBK Python Load test on Flink fails in Jenkins jobs

2020-08-26 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-8961:
---
Status: Resolved  (was: Open)

> GBK Python Load test on Flink fails in Jenkins jobs
> ---
>
> Key: BEAM-8961
> URL: https://issues.apache.org/jira/browse/BEAM-8961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michał Walenia
>Priority: P3
>
> [https://builds.apache.org/job/beam_LoadTests_Python_GBK_Flink_Batch_PR/53/console]
>  





[jira] [Commented] (BEAM-8961) GBK Python Load test on Flink fails in Jenkins jobs

2020-08-26 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185063#comment-17185063
 ] 

Kamil Wasilewski commented on BEAM-8961:


Cannot reproduce due to https://issues.apache.org/jira/browse/BEAM-9761

> GBK Python Load test on Flink fails in Jenkins jobs
> ---
>
> Key: BEAM-8961
> URL: https://issues.apache.org/jira/browse/BEAM-8961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michał Walenia
>Priority: P3
>
> [https://builds.apache.org/job/beam_LoadTests_Python_GBK_Flink_Batch_PR/53/console]
>  





[jira] [Updated] (BEAM-10524) Default decoder for ReadFromBigQuery does not support repeatable fields

2020-08-25 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10524:

Fix Version/s: 2.25.0

> Default decoder for ReadFromBigQuery does not support repeatable fields
> ---
>
> Key: BEAM-10524
> URL: https://issues.apache.org/jira/browse/BEAM-10524
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.23.0
>Reporter: Roman Frigg
>Assignee: Kamil Wasilewski
>Priority: P2
> Fix For: 2.25.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The code in 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L570]
> handles decoding fields with mode "REPEATED" incorrectly. Running a query that 
> returns results with repeated fields represented as JSON arrays raises a 
> TypeError. The stack trace looks as follows:
> {noformat}
> ...
>   File "apache_beam/runners/common.py", line 1095, in apache_beam.runners.common._OutputProcessor.process_outputs
>   File "/Users/roman/src/ml/data-pipelines/etl/venv/lib/python3.6/site-packages/apache_beam/io/concat_source.py", line 89, in read
>     range_tracker.sub_range_tracker(source_ix)):
>   File "/Users/roman/src/ml/data-pipelines/etl/venv/lib/python3.6/site-packages/apache_beam/io/textio.py", line 210, in read_records
>     yield self._coder.decode(record)
>   File "/Users/roman/src/ml/data-pipelines/etl/venv/lib/python3.6/site-packages/apache_beam/io/gcp/bigquery.py", line 566, in decode
>     return self._decode_with_schema(value, self.fields)
>   File "/Users/roman/src/ml/data-pipelines/etl/venv/lib/python3.6/site-packages/apache_beam/io/gcp/bigquery.py", line 580, in _decode_with_schema
>     value[field.name], field.fields)
>   File "/Users/roman/src/ml/data-pipelines/etl/venv/lib/python3.6/site-packages/apache_beam/io/gcp/bigquery.py", line 575, in _decode_with_schema
>     value[field.name] = None
> TypeError: list indices must be integers or slices, not str
> {noformat}
>  
> The fix could look something like this (untested):
> {code:python}
> def _decode_with_schema(self, value, schema_fields):
>     for field in schema_fields:
>         if field.name not in value:
>             # The field exists in the schema, but it doesn't exist in this row.
>             # It probably means its value was null, as the extract to JSON job
>             # doesn't preserve null fields
>             value[field.name] = None
>             continue
>         if field.type == 'RECORD':
>             if field.mode == 'REPEATED':
>                 # Decode every nested record in the repeated field
>                 value[field.name] = [
>                     self._decode_with_schema(val, field.fields)
>                     for val in value[field.name]
>                 ]
>             else:
>                 value[field.name] = self._decode_with_schema(
>                     value[field.name], field.fields)
>         else:
>             try:
>                 converter = self._converters[field.type]
>             except KeyError:
>                 # No need to do any conversion
>                 continue
>             if field.mode == 'REPEATED':
>                 # list() so the result is a list, not a lazy map object (Python 3)
>                 value[field.name] = list(map(converter, value[field.name]))
>             else:
>                 value[field.name] = converter(value[field.name])
>     return value
> {code}
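
The decoding logic proposed above can be exercised outside Beam with a small standalone sketch. The `Field` tuple, the `CONVERTERS` table, and the sample row below are hypothetical stand-ins chosen for illustration; they are not Beam or BigQuery APIs, and the real decoder may differ.

```python
from collections import namedtuple

# Hypothetical stand-in for a BigQuery table schema field.
Field = namedtuple("Field", ["name", "type", "mode", "fields"])

# Hypothetical converter table; types not listed here are left unchanged.
CONVERTERS = {"INTEGER": int, "FLOAT": float}


def decode_with_schema(value, schema_fields):
    """Convert JSON-decoded row values in place according to the schema."""
    for field in schema_fields:
        if field.name not in value:
            # Missing key: treat it as a null value.
            value[field.name] = None
            continue
        if field.type == "RECORD":
            if field.mode == "REPEATED":
                # A repeated record is a list of dicts; decode each one.
                value[field.name] = [
                    decode_with_schema(item, field.fields)
                    for item in value[field.name]
                ]
            else:
                value[field.name] = decode_with_schema(value[field.name], field.fields)
        else:
            converter = CONVERTERS.get(field.type)
            if converter is None:
                continue
            if field.mode == "REPEATED":
                # A repeated scalar is a list; convert element by element.
                value[field.name] = [converter(item) for item in value[field.name]]
            else:
                value[field.name] = converter(value[field.name])
    return value


schema = [
    Field("id", "INTEGER", "NULLABLE", ()),
    Field("scores", "FLOAT", "REPEATED", ()),
    Field("tags", "RECORD", "REPEATED", (Field("count", "INTEGER", "NULLABLE", ()),)),
]
row = {"id": "7", "scores": ["1.5", "2"], "tags": [{"count": "3"}, {}]}
decoded = decode_with_schema(row, schema)
print(decoded)
```

The key point is that a repeated field arrives as a list, so it has to be iterated rather than indexed by field name, which is what the `TypeError` in the stack trace points at.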





[jira] [Updated] (BEAM-10524) Default decoder for ReadFromBigQuery does not support repeatable fields

2020-08-25 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10524:

Status: Resolved  (was: Open)

> Default decoder for ReadFromBigQuery does not support repeatable fields
> ---
>
> Key: BEAM-10524
> URL: https://issues.apache.org/jira/browse/BEAM-10524
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.23.0
>Reporter: Roman Frigg
>Assignee: Kamil Wasilewski
>Priority: P2
>  Time Spent: 2h
>  Remaining Estimate: 0h
>





[jira] [Commented] (BEAM-10524) Default decoder for ReadFromBigQuery does not support repeatable fields

2020-08-25 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184105#comment-17184105
 ] 

Kamil Wasilewski commented on BEAM-10524:
-

> Is there any particular reason you decided not to support repeated fields?

I don't remember clearly. Most probably it was a mistake. 
Nevertheless, the bugfix has been merged to master and will be available 
starting with the 2.25 release.
Thanks for reporting the issue!



> Default decoder for ReadFromBigQuery does not support repeatable fields
> ---
>
> Key: BEAM-10524
> URL: https://issues.apache.org/jira/browse/BEAM-10524
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.23.0
>Reporter: Roman Frigg
>Assignee: Kamil Wasilewski
>Priority: P2
>  Time Spent: 2h
>  Remaining Estimate: 0h
>





[jira] [Commented] (BEAM-8742) Add stateful processing to ParDo load test

2020-08-24 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183248#comment-17183248
 ] 

Kamil Wasilewski commented on BEAM-8742:


> have support for streaming which we can't have until SDF is implemented in 
> streaming mode

[~mxm] Does it apply only to Flink? We've successfully run streaming pipelines 
with `SyntheticSource` on Dataflow, which would mean SDF does work in streaming 
mode (SyntheticSource executes as SDF because a wrapper is used). Also, a JIRA 
ticket related to streaming SDF in Python has been closed: 
https://issues.apache.org/jira/browse/BEAM-3742

> Add stateful processing to ParDo load test
> --
>
> Key: BEAM-8742
> URL: https://issues.apache.org/jira/browse/BEAM-8742
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 16h 10m
>  Remaining Estimate: 0h
>
> So far, the ParDo load test is not stateful. We should add a basic counter to 
> test the stateful processing.
> The test should work in streaming mode and with checkpointing.





[jira] [Updated] (BEAM-10524) Default decoder for ReadFromBigQuery does not support repeatable fields

2020-08-24 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10524:

Affects Version/s: 2.23.0

> Default decoder for ReadFromBigQuery does not support repeatable fields
> ---
>
> Key: BEAM-10524
> URL: https://issues.apache.org/jira/browse/BEAM-10524
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.23.0
>Reporter: Roman Frigg
>Assignee: Kamil Wasilewski
>Priority: P2
>





[jira] [Updated] (BEAM-10674) Create Streaming coGBK Python Load Test Jenkins Job

2020-08-20 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10674:

Fix Version/s: Not applicable

> Create Streaming coGBK Python Load Test Jenkins Job
> ---
>
> Key: BEAM-10674
> URL: https://issues.apache.org/jira/browse/BEAM-10674
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-10674) Create Streaming coGBK Python Load Test Jenkins Job

2020-08-20 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski updated BEAM-10674:

Status: Resolved  (was: In Progress)

> Create Streaming coGBK Python Load Test Jenkins Job
> ---
>
> Key: BEAM-10674
> URL: https://issues.apache.org/jira/browse/BEAM-10674
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: P2
>  Time Spent: 4h
>  Remaining Estimate: 0h
>





