[jira] [Created] (BEAM-14524) Add return_row option to RunInference transform

2022-05-26 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14524:
--

 Summary: Add return_row option to RunInference transform
 Key: BEAM-14524
 URL: https://issues.apache.org/jira/browse/BEAM-14524
 Project: Beam
  Issue Type: Improvement
  Components: cross-language
Reporter: Heejong Lee
Assignee: Heejong Lee


Add return_row option to RunInference transform.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (BEAM-14517) Add files_to_stage option to Python SDK

2022-05-25 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14517:
--

 Summary: Add files_to_stage option to Python SDK
 Key: BEAM-14517
 URL: https://issues.apache.org/jira/browse/BEAM-14517
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Heejong Lee
Assignee: Heejong Lee


Add files_to_stage option to Python SDK





[jira] [Created] (BEAM-14516) Add context support for Python callables

2022-05-25 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14516:
--

 Summary: Add context support for Python callables
 Key: BEAM-14516
 URL: https://issues.apache.org/jira/browse/BEAM-14516
 Project: Beam
  Issue Type: Improvement
  Components: cross-language
Reporter: Heejong Lee
Assignee: Heejong Lee


Add context support for Python callables





[jira] [Created] (BEAM-14506) Adding testcases and examples for xlang Python RunInference

2022-05-24 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14506:
--

 Summary: Adding testcases and examples for xlang Python 
RunInference
 Key: BEAM-14506
 URL: https://issues.apache.org/jira/browse/BEAM-14506
 Project: Beam
  Issue Type: Improvement
  Components: cross-language, testing
Reporter: Heejong Lee
Assignee: Heejong Lee


Adding testcases and examples for xlang Python RunInference





[jira] [Updated] (BEAM-14343) Allow expansion service override in ExternalPythonTransform

2022-05-24 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14343:
---
Fix Version/s: 2.39.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Allow expansion service override in ExternalPythonTransform
> ---
>
> Key: BEAM-14343
> URL: https://issues.apache.org/jira/browse/BEAM-14343
> Project: Beam
>  Issue Type: Bug
>  Components: cross-language, sdk-java-core
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.39.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Allow expansion service override in ExternalPythonTransform





[jira] [Updated] (BEAM-14369) Fix "target/options: no such file or directory" error while building Java container without licenses

2022-05-24 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14369:
---
Fix Version/s: 2.39.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Fix "target/options: no such file or directory" error while building Java 
> container without licenses
> 
>
> Key: BEAM-14369
> URL: https://issues.apache.org/jira/browse/BEAM-14369
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.39.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Fix "target/options: no such file or directory" error while building Java 
> container without licenses





[jira] [Updated] (BEAM-14374) Fix module import error in FullyQualifiedNamedTransform

2022-05-24 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14374:
---
Fix Version/s: 2.40.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Fix module import error in FullyQualifiedNamedTransform
> ---
>
> Key: BEAM-14374
> URL: https://issues.apache.org/jira/browse/BEAM-14374
> Project: Beam
>  Issue Type: Bug
>  Components: cross-language, sdk-py-core
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Fix module import error in FullyQualifiedNamedTransform





[jira] [Updated] (BEAM-14430) Adding a logical type support for Python callables to Row schema

2022-05-24 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14430:
---
Fix Version/s: 2.40.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Adding a logical type support for Python callables to Row schema
> 
>
> Key: BEAM-14430
> URL: https://issues.apache.org/jira/browse/BEAM-14430
> Project: Beam
>  Issue Type: New Feature
>  Components: cross-language
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Adding a logical type support for Python callables to Row schema





[jira] [Updated] (BEAM-14455) Add UUID to sub-schemas for PythonExternalTransform

2022-05-24 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14455:
---
Fix Version/s: 2.40.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Add UUID to sub-schemas for PythonExternalTransform
> ---
>
> Key: BEAM-14455
> URL: https://issues.apache.org/jira/browse/BEAM-14455
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add UUID to sub-schemas for PythonExternalTransform





[jira] [Updated] (BEAM-14471) Adding testcases and examples for xlang Python DataframeTransform

2022-05-24 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14471:
---
Fix Version/s: 2.40.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Adding testcases and examples for xlang Python DataframeTransform 
> --
>
> Key: BEAM-14471
> URL: https://issues.apache.org/jira/browse/BEAM-14471
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language, testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Adding testcases and examples for xlang Python DataframeTransform 





[jira] [Created] (BEAM-14478) Fix 'RuntimeValueProvider' object has no attribute 'projectId' error in _CustomBigQuerySource.split

2022-05-16 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14478:
--

 Summary: Fix 'RuntimeValueProvider' object has no attribute 
'projectId' error in _CustomBigQuerySource.split
 Key: BEAM-14478
 URL: https://issues.apache.org/jira/browse/BEAM-14478
 Project: Beam
  Issue Type: Bug
  Components: io-py-gcp
Reporter: Heejong Lee
Assignee: Heejong Lee


Fix 'RuntimeValueProvider' object has no attribute 'projectId' error in 
_CustomBigQuerySource.split


{noformat}
Error message from worker: Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 644, in do_work
    work_executor.execute()
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 255, in execute
    self._split_task)
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 263, in _perform_source_split_considering_api_limits
    desired_bundle_size)
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 300, in _perform_source_split
    for split in source.split(desired_bundle_size):
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 813, in split
    if not self.table_reference.projectId:
AttributeError: 'RuntimeValueProvider' object has no attribute 'projectId'
{noformat}
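
The traceback shows split reading projectId from a table reference that is still a deferred value rather than a resolved reference. A minimal sketch of one way to guard that access follows. It is not the actual Beam change: TableRef, FakeRuntimeValueProvider, and resolve_table_reference are hypothetical stand-ins written for illustration, and the only assumption carried over is that the deferred object exposes a get() method.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TableRef:
    """Hypothetical stand-in for a parsed BigQuery table reference."""
    projectId: str
    datasetId: str
    tableId: str


class FakeRuntimeValueProvider:
    """Hypothetical stand-in for a deferred value; it exposes only get()."""

    def __init__(self, value: Any):
        self._value = value

    def get(self) -> Any:
        return self._value


def resolve_table_reference(table_reference: Any) -> Any:
    """Unwrap a deferred value before any of its attributes are read."""
    is_deferred = hasattr(table_reference, "get") and not hasattr(
        table_reference, "projectId")
    return table_reference.get() if is_deferred else table_reference


deferred = FakeRuntimeValueProvider(
    TableRef("my-project", "my_dataset", "my_table"))
resolved = resolve_table_reference(deferred)
print(resolved.projectId)
```

Passing an already resolved reference through the helper returns it unchanged, so the guard can sit in front of every attribute access.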






[jira] [Updated] (BEAM-14232) Only resolve artifacts in expanded environments for Java External transform

2022-05-13 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14232:
---
Fix Version/s: 2.39.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Only resolve artifacts in expanded environments for Java External transform
> ---
>
> Key: BEAM-14232
> URL: https://issues.apache.org/jira/browse/BEAM-14232
> Project: Beam
>  Issue Type: Bug
>  Components: cross-language
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Labels: stale-assigned
> Fix For: 2.39.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Only resolve artifacts in expanded environments for Java External transform.
>  
> We can't assume that every expansion service can resolve every artifact. 
> For example, Java artifacts returned from a Java expansion service cannot be 
> resolved (and downloaded) with a Python expansion service. Also, one Python 
> expansion service may return Python artifacts that are unknown to another 
> Python expansion service. We need to skip pre-existing artifacts and resolve 
> only the artifacts in new environments returned by the expansion service 
> currently in use.
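
The selection rule described above can be sketched as a simple filter. This is only an illustration under stated assumptions: environments are modeled as a plain dict from environment id to artifact names, and environments_to_resolve is a hypothetical helper, not part of the Java or Python SDK.

```python
from typing import Dict, List, Set


def environments_to_resolve(
    known_env_ids: Set[str],
    expanded_envs: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Keep only the environments that the expansion response introduced."""
    return {
        env_id: artifacts
        for env_id, artifacts in expanded_envs.items()
        if env_id not in known_env_ids
    }


# Hypothetical ids and artifact names, chosen only for the example.
known = {"java_env"}
expanded = {
    "java_env": ["beam-sdks-java.jar"],
    "python_env": ["requirements.txt"],
}
new_envs = environments_to_resolve(known, expanded)
print(new_envs)
```

Only the artifacts in new_envs would then be sent to the expansion service for resolution; the pre-existing environment is left untouched.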





[jira] [Created] (BEAM-14471) Adding testcases and examples for xlang Python DataframeTransform

2022-05-13 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14471:
--

 Summary: Adding testcases and examples for xlang Python 
DataframeTransform 
 Key: BEAM-14471
 URL: https://issues.apache.org/jira/browse/BEAM-14471
 Project: Beam
  Issue Type: Improvement
  Components: cross-language, testing
Reporter: Heejong Lee
Assignee: Heejong Lee


Adding testcases and examples for xlang Python DataframeTransform 





[jira] [Updated] (BEAM-14468) Move all universally understood logical type urns to schema.proto

2022-05-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14468:
---
Status: Open  (was: Triage Needed)

> Move all universally understood logical type urns to schema.proto
> -
>
> Key: BEAM-14468
> URL: https://issues.apache.org/jira/browse/BEAM-14468
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Heejong Lee
>Priority: P2
>
> Move all universally understood logical type urns to schema.proto and update 
> the use sites to point to urn definitions in the proto.





[jira] [Created] (BEAM-14468) Move all universally understood logical type urns to schema.proto

2022-05-12 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14468:
--

 Summary: Move all universally understood logical type urns to 
schema.proto
 Key: BEAM-14468
 URL: https://issues.apache.org/jira/browse/BEAM-14468
 Project: Beam
  Issue Type: Improvement
  Components: beam-model
Reporter: Heejong Lee


Move all universally understood logical type urns to schema.proto and update 
the use sites to point to urn definitions in the proto.





[jira] [Updated] (BEAM-14233) Merge requirements from expanded response for Java External transform

2022-05-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14233:
---
Fix Version/s: 2.39.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Merge requirements from expanded response for Java External transform
> -
>
> Key: BEAM-14233
> URL: https://issues.apache.org/jira/browse/BEAM-14233
> Project: Beam
>  Issue Type: Bug
>  Components: cross-language
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Labels: stale-assigned
> Fix For: 2.39.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Merge requirements from expanded response for Java External transform





[jira] [Updated] (BEAM-14457) Check syntactic errors in Python source string for Java PythonCallableSource object

2022-05-10 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14457:
---
Status: Open  (was: Triage Needed)

> Check syntactic errors in Python source string for Java PythonCallableSource 
> object
> ---
>
> Key: BEAM-14457
> URL: https://issues.apache.org/jira/browse/BEAM-14457
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language, sdk-java-core
>Reporter: Heejong Lee
>Priority: P2
>
> We would like to have correctness checks for Python source strings when we 
> construct a PythonCallableSource object in the Java SDK.
> We might launch a Python subprocess in the PythonCallableSource.of method and 
> call the eval function to check syntactic correctness easily. The Python 
> parser module from the Jython library could also be helpful.





[jira] [Updated] (BEAM-14458) Support type inference for logical types in StaticSchemaInference

2022-05-10 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14458:
---
Status: Open  (was: Triage Needed)

> Support type inference for logical types in StaticSchemaInference
> -
>
> Key: BEAM-14458
> URL: https://issues.apache.org/jira/browse/BEAM-14458
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Heejong Lee
>Priority: P2
>
> Support type inference for logical types in StaticSchemaInference





[jira] [Created] (BEAM-14458) Support type inference for logical types in StaticSchemaInference

2022-05-10 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14458:
--

 Summary: Support type inference for logical types in 
StaticSchemaInference
 Key: BEAM-14458
 URL: https://issues.apache.org/jira/browse/BEAM-14458
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Heejong Lee


Support type inference for logical types in StaticSchemaInference





[jira] [Updated] (BEAM-14457) Check syntactic errors in Python source string for Java PythonCallableSource object

2022-05-10 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14457:
---
Summary: Check syntactic errors in Python source string for Java 
PythonCallableSource object  (was: Check syntactic errors of Python source 
string in PythonCallableSource.of)

> Check syntactic errors in Python source string for Java PythonCallableSource 
> object
> ---
>
> Key: BEAM-14457
> URL: https://issues.apache.org/jira/browse/BEAM-14457
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language, sdk-java-core
>Reporter: Heejong Lee
>Priority: P2
>
> We would like to have correctness checks for Python source strings when we 
> construct a PythonCallableSource object in the Java SDK.
> We might launch a Python subprocess in the PythonCallableSource.of method and 
> call the eval function to check syntactic correctness easily. The Python 
> parser module from the Jython library could also be helpful.





[jira] [Created] (BEAM-14457) Check syntactic errors of Python source string in PythonCallableSource.of

2022-05-10 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14457:
--

 Summary: Check syntactic errors of Python source string in 
PythonCallableSource.of
 Key: BEAM-14457
 URL: https://issues.apache.org/jira/browse/BEAM-14457
 Project: Beam
  Issue Type: Improvement
  Components: cross-language, sdk-java-core
Reporter: Heejong Lee


We would like to have correctness checks for Python source strings when we 
construct a PythonCallableSource object in the Java SDK.

We might launch a Python subprocess in the PythonCallableSource.of method and 
call the eval function to check syntactic correctness easily. The Python parser 
module from the Jython library could also be helpful.
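
A minimal sketch of the Python side of such a check follows. It swaps in the standard library's ast.parse for eval so that the source is only parsed and never executed; syntax_error_in is a hypothetical helper name, and how the Java SDK would invoke it (for example through a subprocess) is left out.

```python
import ast
from typing import Optional


def syntax_error_in(source: str) -> Optional[str]:
    """Return a short error description, or None if the source parses."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


# A well-formed callable parses cleanly; a truncated one is reported.
print(syntax_error_in("lambda x: x * 2"))
print(syntax_error_in("lambda x: x *"))
```

Parsing catches syntactic errors only. A source string that parses can still fail at runtime, for example by referencing an undefined name.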





[jira] [Created] (BEAM-14455) Add UUID to sub-schemas for PythonExternalTransform

2022-05-10 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14455:
--

 Summary: Add UUID to sub-schemas for PythonExternalTransform
 Key: BEAM-14455
 URL: https://issues.apache.org/jira/browse/BEAM-14455
 Project: Beam
  Issue Type: Improvement
  Components: cross-language
Reporter: Heejong Lee
Assignee: Heejong Lee


Add UUID to sub-schemas for PythonExternalTransform





[jira] [Commented] (BEAM-14146) Python Streaming job failing to drain with BigQueryIO write errors

2022-05-06 Thread Heejong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533071#comment-17533071
 ] 

Heejong Lee commented on BEAM-14146:


I assume that the issue is fixed. We can close this now and reopen later if the 
issue still persists.

> Python Streaming job failing to drain with BigQueryIO write errors
> --
>
> Key: BEAM-14146
> URL: https://issues.apache.org/jira/browse/BEAM-14146
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, sdk-py-core
>Affects Versions: 2.37.0
>Reporter: Rahul Iyer
>Assignee: Heejong Lee
>Priority: P1
> Fix For: 2.39.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We have a Python Streaming Dataflow job that writes to BigQuery using the 
> {{FILE_LOADS}} method with {{auto_sharding}} enabled. When we try to drain the 
> job, it fails with the following error:
> {code:python}
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 1000, in perform_load_job
> ValueError: Either a non-empty list of fully-qualified source URIs must be provided via the source_uris parameter or an open file object must be provided via the source_stream parameter.
> {code}
> Our {{WriteToBigQuery}} configuration:
> {code:python}
> beam.io.WriteToBigQuery(
>   table=options.output_table,
>   schema=bq_schema,
>   create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
>   write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
>   insert_retry_strategy=RetryStrategy.RETRY_ON_TRANSIENT_ERROR,
>   method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
>   additional_bq_parameters={
>     "timePartitioning": {
>       "type": "HOUR",
>       "field": "bq_insert_timestamp",
>     },
>     "schemaUpdateOptions": ["ALLOW_FIELD_ADDITION", "ALLOW_FIELD_RELAXATION"],
>   },
>   triggering_frequency=120,
>   with_auto_sharding=True,
> )
> {code}
> We have also noticed that the job fails to drain only when there are actual 
> schema updates. If there are no schema updates, the job drains without the 
> above error.





[jira] [Updated] (BEAM-14146) Python Streaming job failing to drain with BigQueryIO write errors

2022-05-06 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14146:
---
Resolution: Fixed
Status: Resolved  (was: Open)

> Python Streaming job failing to drain with BigQueryIO write errors
> --
>
> Key: BEAM-14146
> URL: https://issues.apache.org/jira/browse/BEAM-14146
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, sdk-py-core
>Affects Versions: 2.37.0
>Reporter: Rahul Iyer
>Assignee: Heejong Lee
>Priority: P1
> Fix For: 2.39.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>





[jira] [Created] (BEAM-14430) Adding a logical type support for Python callables to Row schema

2022-05-05 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14430:
--

 Summary: Adding a logical type support for Python callables to Row 
schema
 Key: BEAM-14430
 URL: https://issues.apache.org/jira/browse/BEAM-14430
 Project: Beam
  Issue Type: New Feature
  Components: cross-language
Reporter: Heejong Lee
Assignee: Heejong Lee


Adding a logical type support for Python callables to Row schema





[jira] [Updated] (BEAM-9245) Unable to pull datatore Entity which contains dict properties

2022-05-04 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-9245:
--
Fix Version/s: 2.39.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Unable to pull datatore Entity which contains dict properties
> -
>
> Key: BEAM-9245
> URL: https://issues.apache.org/jira/browse/BEAM-9245
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.18.0
>Reporter: Colin Le Nost
>Priority: P3
> Fix For: 2.39.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Hello, we are facing a small bug while reading Datastore entities using the 
> ReadFromDatastore transform (Python SDK, 2.17 and 2.18).
> We are unable to retrieve entities that contain a dictionary. We think these 
> properties are implicitly cast to Datastore entities, but when the client 
> tries to retrieve such an entity using its key, it breaks (because this 
> entity has no key).
> h2.  Stacktrace
> {code:python}
>   File ".../venv/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py", line 269, in process
>     yield types.Entity.from_client_entity(client_entity)
>   File ".../venv/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/types.py", line 225, in from_client_entity
>     value = Entity.from_client_entity(value)
>   File ".../venv/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/types.py", line 219, in from_client_entity
>     Key.from_client_key(client_entity.key),
>   File ".../venv/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/types.py", line 156, in from_client_key
>     return Key(client_key.flat_path, project=client_key.project,
> AttributeError: 'NoneType' object has no attribute 'flat_path' [while running 'Read from datastore/Read']
> {code}
>  
> h2.  Here is some code to reproduce:
>  # Insert a datastore entity using the given function
>  # Run the dataflow pipeline using DirectRunner
>  
> {code:python}
> import apache_beam as beam
> from google.cloud import datastore
> from apache_beam.io.gcp.datastore.v1new.types import Query
> from apache_beam.io.gcp.datastore.v1new.datastoreio import ReadFromDatastore
> from apache_beam.options.pipeline_options import StandardOptions, PipelineOptions
>
> DATASTORE_KIND = "my_entity_kind"
> PROJECT_ID = "my_project_id"
>
>
> def create_datastore_entity():
>     # Insert one entity whose "nested_field" property is a dict.
>     client = datastore.Client(PROJECT_ID)
>     key = client.key(DATASTORE_KIND, "my_task")
>     entity = client.get(key=key)
>     if entity is not None:
>         raise Exception("Existing entity")
>     else:
>         entity_dict = {"regular_field": "test", "nested_field": {"field1": "my_field1"}}
>         entity = datastore.Entity(key=key)
>         entity_dict = {k: v for k, v in entity_dict.items()}
>         entity.update(entity_dict)
>         client.put(entity)
>
>
> def my_func(element):
>     print(element)
>     return element
>
>
> def run():
>     # Read the entity back with the DirectRunner and print it.
>     pipeline_options = PipelineOptions()
>     pipeline_options.view_as(StandardOptions).runner = "DirectRunner"
>     p = beam.Pipeline(options=pipeline_options)
>     my_ds_query = Query(kind=DATASTORE_KIND, project=PROJECT_ID)
>     p | "Read from datastore" >> ReadFromDatastore(
>         query=my_ds_query
>     ) | "Print entity" >> beam.Map(my_func)
>     p.run().wait_until_finish()
>
>
> if __name__ == "__main__":
>     create_datastore_entity()
>     run()
> {code}
> h2. Workaround
> Currently, we work around this by patching the library with the following 
> code (modifying the Entity class in 
> `sdks/python/apache_beam/io/gcp/datastore/v1new/types.py`, at this 
> [line|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/datastore/v1new/types.py#L231]).
> {code:python}
>   @staticmethod
>   def from_client_entity(client_entity):
>     res = Entity(
>         Key.from_client_key(client_entity.key),
>         exclude_from_indexes=set(client_entity.exclude_from_indexes))
>     for name, value in client_entity.items():
>       if isinstance(value, key.Key):
>         value = Key.from_client_key(value)
>       if isinstance(value, entity.Entity):
>         if value.key:
>           value = Entity.from_client_entity(value)
>         else:
>           # A nested entity without a key comes from a dict property.
>           value = {k: v for k, v in value.items()}
>       res.properties[name] = value
>     return res
> {code}
>  If the workaround works for you, I can open a PR.
>  
> Thanks, Colin
>  





[jira] [Commented] (BEAM-14146) Python Streaming job failing to drain with BigQueryIO write errors

2022-05-02 Thread Heejong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530996#comment-17530996
 ] 

Heejong Lee commented on BEAM-14146:


[~chamikara] [~pabloem] Do we really need 
[this|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1005]
 check? It looks like {{perform_load_job}} can be called with an empty list of 
files during pipeline draining. Could we just make it a no-op that logs a 
warning when it is called with an empty list of files?
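
A minimal sketch of the no-op-with-warning idea, for discussion only. It does not reproduce the real {{perform_load_job}} signature: load_if_nonempty and fake_submit are hypothetical names, and the job submission is passed in as a callable so that the example stays self-contained.

```python
import logging
from typing import Callable, List, Optional

_LOGGER = logging.getLogger(__name__)


def load_if_nonempty(
    source_uris: List[str],
    submit_load_job: Callable[[List[str]], str],
) -> Optional[str]:
    """Submit a load job only when there are files to load."""
    if not source_uris:
        # Nothing to load, e.g. an empty pane while draining: warn and skip.
        _LOGGER.warning("No source URIs given; skipping the load job.")
        return None
    return submit_load_job(source_uris)


def fake_submit(uris: List[str]) -> str:
    # Hypothetical stand-in for the real job submission.
    return f"job-for-{len(uris)}-files"


print(load_if_nonempty([], fake_submit))
print(load_if_nonempty(["gs://bucket/a.json"], fake_submit))
```

Callers would need to handle the None result, since no job reference exists when the load is skipped.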

> Python Streaming job failing to drain with BigQueryIO write errors
> --
>
> Key: BEAM-14146
> URL: https://issues.apache.org/jira/browse/BEAM-14146
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, sdk-py-core
>Affects Versions: 2.37.0
>Reporter: Rahul Iyer
>Assignee: Heejong Lee
>Priority: P1
> Fix For: 2.39.0
>
>





[jira] [Created] (BEAM-14374) Fix module import error in FullyQualifiedNamedTransform

2022-04-27 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14374:
--

 Summary: Fix module import error in FullyQualifiedNamedTransform
 Key: BEAM-14374
 URL: https://issues.apache.org/jira/browse/BEAM-14374
 Project: Beam
  Issue Type: Bug
  Components: cross-language, sdk-py-core
Reporter: Heejong Lee
Assignee: Heejong Lee


Fix module import error in FullyQualifiedNamedTransform





[jira] [Updated] (BEAM-14369) Fix "target/options: no such file or directory" error while building Java container without licenses

2022-04-26 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14369:
---
Summary: Fix "target/options: no such file or directory" error while 
building Java container without licenses  (was: Fix "target/options: no such 
file or directory" error while building Java container)

> Fix "target/options: no such file or directory" error while building Java 
> container without licenses
> 
>
> Key: BEAM-14369
> URL: https://issues.apache.org/jira/browse/BEAM-14369
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Fix "target/options: no such file or directory" error while building Java 
> container





[jira] [Updated] (BEAM-14369) Fix "target/options: no such file or directory" error while building Java container without licenses

2022-04-26 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14369:
---
Description: Fix "target/options: no such file or directory" error while 
building Java container without licenses  (was: Fix "target/options: no such 
file or directory" error while building Java container)

> Fix "target/options: no such file or directory" error while building Java 
> container without licenses
> 
>
> Key: BEAM-14369
> URL: https://issues.apache.org/jira/browse/BEAM-14369
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Fix "target/options: no such file or directory" error while building Java 
> container without licenses



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (BEAM-14369) Fix "target/options: no such file or directory" error while building Java container

2022-04-26 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14369:
--

 Summary: Fix "target/options: no such file or directory" error 
while building Java container
 Key: BEAM-14369
 URL: https://issues.apache.org/jira/browse/BEAM-14369
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-harness
Reporter: Heejong Lee
Assignee: Heejong Lee


Fix "target/options: no such file or directory" error while building Java 
container



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (BEAM-14343) Allow expansion service override in ExternalPythonTransform

2022-04-20 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14343:
--

 Summary: Allow expansion service override in 
ExternalPythonTransform
 Key: BEAM-14343
 URL: https://issues.apache.org/jira/browse/BEAM-14343
 Project: Beam
  Issue Type: Bug
  Components: cross-language, sdk-java-core
Reporter: Heejong Lee
Assignee: Heejong Lee


Allow expansion service override in ExternalPythonTransform



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-14300) Fix Java precommit failure

2022-04-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14300:
---
Fix Version/s: Not applicable
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Fix Java precommit failure
> --
>
> Key: BEAM-14300
> URL: https://issues.apache.org/jira/browse/BEAM-14300
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-14300) Fix Java precommit failure

2022-04-12 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14300:
--

 Summary: Fix Java precommit failure
 Key: BEAM-14300
 URL: https://issues.apache.org/jira/browse/BEAM-14300
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Heejong Lee
Assignee: Heejong Lee






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (BEAM-14251) add output_coder_override to ExpansionRequest

2022-04-05 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee reassigned BEAM-14251:
--

Assignee: Heejong Lee

> add output_coder_override to ExpansionRequest
> -
>
> Key: BEAM-14251
> URL: https://issues.apache.org/jira/browse/BEAM-14251
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> add output_coder_override to ExpansionRequest



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14251) add output_coder_override to ExpansionRequest

2022-04-05 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14251:
---
Description: add output_coder_override to ExpansionRequest  (was: add 
output_coder_override to ExternalConfigurationPayload)

> add output_coder_override to ExpansionRequest
> -
>
> Key: BEAM-14251
> URL: https://issues.apache.org/jira/browse/BEAM-14251
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language
>Reporter: Heejong Lee
>Priority: P2
>
> add output_coder_override to ExpansionRequest



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14251) add output_coder_override to ExpansionRequest

2022-04-05 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14251:
---
Summary: add output_coder_override to ExpansionRequest  (was: add 
output_coder_override to ExternalConfigurationPayload)

> add output_coder_override to ExpansionRequest
> -
>
> Key: BEAM-14251
> URL: https://issues.apache.org/jira/browse/BEAM-14251
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language
>Reporter: Heejong Lee
>Priority: P2
>
> add output_coder_override to ExternalConfigurationPayload



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-14251) add output_coder_override to ExternalConfigurationPayload

2022-04-04 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14251:
--

 Summary: add output_coder_override to ExternalConfigurationPayload
 Key: BEAM-14251
 URL: https://issues.apache.org/jira/browse/BEAM-14251
 Project: Beam
  Issue Type: Improvement
  Components: cross-language
Reporter: Heejong Lee


add output_coder_override to ExternalConfigurationPayload



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (BEAM-14236) [Python] Write to Parquet support for list to conform with Apache Parquet specification

2022-04-04 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee reassigned BEAM-14236:
--

Assignee: Shivraj Devidas Wabale

> [Python] Write to Parquet support for list to conform with Apache Parquet 
> specification
> ---
>
> Key: BEAM-14236
> URL: https://issues.apache.org/jira/browse/BEAM-14236
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-parquet
>Reporter: Shivraj Devidas Wabale
>Assignee: Shivraj Devidas Wabale
>Priority: P2
>
> ARROW-11497: The pyarrow Parquet writer now supports the compliant 3-level 
> list type, where the middle level, named {{list}}, must be a repeated group 
> with a single field named {{element}}. 
> [https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists]
> I think we can simply expose this in 
> [WriteToParquet|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/parquetio.py#L358] 
> by adding an additional flag (use_compliant_nested_type) to conform with the 
> Apache Parquet specification.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14133) Fix potential NPE in BigQueryServicesImpl.getErrorInfo

2022-04-01 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14133:
---
Fix Version/s: 2.39.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Fix potential NPE in BigQueryServicesImpl.getErrorInfo
> --
>
> Key: BEAM-14133
> URL: https://issues.apache.org/jira/browse/BEAM-14133
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.39.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Fix potential NPE in BigQueryServicesImpl.getErrorInfo



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14232) Only resolve artifacts in expanded environments for Java External transform

2022-04-01 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14232:
---
Description: 
Only resolve artifacts in expanded environments for Java External transform.

 

We can't assume that an arbitrary expansion service can resolve arbitrary 
artifact information. For example, Java artifacts returned from a Java 
expansion service cannot be resolved (and downloaded) with a Python expansion 
service. Also, one Python expansion service may return Python artifacts that 
are unknown to another Python expansion service. For now, we need to skip 
pre-existing artifacts and only resolve artifacts in new environments returned 
by the expansion service.

  was:Only resolve artifacts in expanded environments for Java External 
transform


> Only resolve artifacts in expanded environments for Java External transform
> ---
>
> Key: BEAM-14232
> URL: https://issues.apache.org/jira/browse/BEAM-14232
> Project: Beam
>  Issue Type: Bug
>  Components: cross-language
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Only resolve artifacts in expanded environments for Java External transform.
>  
> We can't assume that an arbitrary expansion service can resolve arbitrary 
> artifact information. For example, Java artifacts returned from a Java 
> expansion service cannot be resolved (and downloaded) with a Python expansion 
> service. Also, one Python expansion service may return Python artifacts that 
> are unknown to another Python expansion service. For now, we need to skip 
> pre-existing artifacts and only resolve artifacts in new environments 
> returned by the expansion service.
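
The rule described above (skip pre-existing artifacts, resolve only those in 
environments newly returned by the expansion service) can be sketched as a 
simple filter. This is only an illustration under stated assumptions: the 
class name NewEnvironmentFilter, the method newEnvironments, and the use of 
plain string ids are hypothetical and are not taken from the Beam code base.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Beam: keeps only the environments that
// did not exist before the expansion request was sent.
final class NewEnvironmentFilter {
  static <E> Map<String, E> newEnvironments(
      Set<String> existingIds, Map<String, E> expanded) {
    Map<String, E> result = new LinkedHashMap<>();
    for (Map.Entry<String, E> entry : expanded.entrySet()) {
      // Environments already known to the pipeline are skipped, so their
      // artifacts are never sent to an expansion service that cannot
      // resolve them.
      if (!existingIds.contains(entry.getKey())) {
        result.put(entry.getKey(), entry.getValue());
      }
    }
    return result;
  }
}
```

Only the artifacts of the environments returned by this filter would then be 
passed to the artifact resolution step.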



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-14233) Merge requirements from expanded response for Java External transform

2022-04-01 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14233:
--

 Summary: Merge requirements from expanded response for Java 
External transform
 Key: BEAM-14233
 URL: https://issues.apache.org/jira/browse/BEAM-14233
 Project: Beam
  Issue Type: Bug
  Components: cross-language
Reporter: Heejong Lee
Assignee: Heejong Lee


Merge requirements from expanded response for Java External transform



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-14232) Only resolve artifacts in expanded environments for Java External transform

2022-04-01 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14232:
--

 Summary: Only resolve artifacts in expanded environments for Java 
External transform
 Key: BEAM-14232
 URL: https://issues.apache.org/jira/browse/BEAM-14232
 Project: Beam
  Issue Type: Bug
  Components: cross-language
Reporter: Heejong Lee
Assignee: Heejong Lee


Only resolve artifacts in expanded environments for Java External transform



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-10790) tar.gz artifacts written into fat jar is no longer gzip file

2022-03-21 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-10790:
---
Fix Version/s: 2.25.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

Please feel free to reopen if the problem still exists in 2.25.0+

> tar.gz artifacts written into fat jar is no longer gzip file
> 
>
> Key: BEAM-10790
> URL: https://issues.apache.org/jira/browse/BEAM-10790
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
> Environment: Flink 1.10.1
> Beam worker pool: apache/beam_python3.6_sdk:2.22.0
> SDK: apache beam 2.22.0 
> Python 3.6
>Reporter: Jiaxin Shan
>Priority: P2
>  Labels: stale-P2
> Fix For: 2.25.0
>
>
> I am using the Flink runner on Kubernetes. The problem is that the tar.gz 
> file is corrupted after going through the artifact server.  
>  # The application persists a dependency dist file, 
> tfx_ephemeral-0.22.0.tar.gz, and passes 
> `--extra_package=/tmp/tmpsm0_ll8e/build/tfx/dist/tfx_ephemeral-0.22.0.tar.gz` 
> to the Beam args.
>  # The artifact service retrieves it from the artifact server and appends it 
> to the uber jar. 
>  FROM: 'ref_Environment_default_environment_1', dependencies '[type_urn: 
> "beam:artifact:type:file:v1"
>  type_payload: 
> "\n;/tmp/tmpsm0_ll8e/build/tfx/dist/tfx_ephemeral-0.22.0.tar.gz" 
>  TO: 
> `/BEAM-PIPLINE/pipeline/artifacts/job-${uuid}/${hash}-tfx_ephemeral-0.22.0.tar.gz`
>  # The Beam Python worker pool gets the dependency from the uber jar and puts 
> it under /tmp/staged/ 
> I originally found this problem in step 3. I traced it and noticed that it 
> originates in step 2. The code snippet is here: 
> [https://github.com/apache/beam/blob/3d0f7dc011bb7bbee4eea9525b82f53855de85f1/sdks/python/apache_beam/runners/portability/artifact_service.py#L306-L310]
>  
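
One way to narrow down which of the three steps corrupts the archive is to 
check whether the file still begins with the gzip magic bytes (0x1f 0x8b) 
after each step. The following is a minimal sketch of such a check; the class 
name GzipHeaderCheck and its methods are hypothetical helpers, not part of 
Beam.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical diagnostic helper, not part of Beam.
final class GzipHeaderCheck {
  // A gzip stream always starts with the two bytes 0x1f 0x8b.
  static boolean hasGzipMagic(byte[] header) {
    return header.length >= 2
        && (header[0] & 0xff) == 0x1f
        && (header[1] & 0xff) == 0x8b;
  }

  // Reads the first two bytes of a file and checks them against the magic.
  static boolean isGzipFile(Path file) throws IOException {
    try (InputStream in = Files.newInputStream(file)) {
      int first = in.read();
      int second = in.read();
      return first == 0x1f && second == 0x8b;
    }
  }
}
```

Running isGzipFile on the original dist file, on the copy extracted from the 
uber jar, and on the copy under /tmp/staged/ would show at which point the 
header is lost.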
> Logs from Application
> {code:java}
> INFO:root:Using Python SDK docker image: apache/beam_python3.6_sdk:2.22.0. If 
> the image is not available at local, we will try to pull from hub.docker.com
>  
> INFO:apache_beam.runners.portability.fn_api_runner.translations:
>   
>  INFO:apache_beam.runners.portability.flink_runner:Adding HTTP protocol 
> scheme to flink_master parameter: http://beam-flink-cluster-jobmanager:8081
>  INFO:apache_beam.runners.portability.abstract_job_service:Got Prepare 
> request.
>  INFO:apache_beam.utils.subprocess_server:Downloading job server jar from 
> https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.10-job-server/2.22.0/beam-runners-flink-1.10-job-server-2.22.0.jar
>  INFO:apache_beam.runners.portability.abstract_job_service:Artifact server 
> started on port 44337
>  INFO:apache_beam.runners.portability.abstract_job_service:Prepared job 
> 'job' as 'job-880904cc-ec5e-490a-ba93-8e18d87fbd89'
>  INFO:apache_beam.runners.portability.artifact_service:staging token: 
> 'job-880904cc-ec5e-490a-ba93-8e18d87fbd89'
>  INFO:apache_beam.runners.portability.artifact_service:artifact service 
> key: 'ref_Environment_default_environment_1', dependencies '[type_urn: 
> "beam:artifact:type:file:v1"
>  type_payload: 
> "\n;/tmp/tmpsm0_ll8e/build/tfx/dist/tfx_ephemeral-0.22.0.tar.gz"
>   
>  INFO:apache_beam.runners.portability.abstract_job_service:Running job 
> 'job-880904cc-ec5e-490a-ba93-8e18d87fbd89'
>  INFO:apache_beam.runners.portability.flink_uber_jar_job_server:Started 
> Flink job as bedeef0503bf0a17e3461169e2c9b5bc
>  INFO:apache_beam.runners.portability.portable_runner:Job state changed to 
> STOPPED
>  INFO:apache_beam.runners.portability.portable_runner:Job state changed to 
> RUNNING
> {code}
> The job never finishes because of an exception in the Beam worker pool.
>  
> Logs from beam worker pool. 
>  
> {code:java}
> 2020/08/21 18:29:46 Installing extra package: tfx_ephemeral-0.22.0.tar.gz
>  ERROR: Exception:
>  Traceback (most recent call last):
>  File "/usr/local/lib/python3.6/tarfile.py", line 1643, in gzopen
>  t = cls.taropen(name, mode, fileobj, **kwargs)
>  File "/usr/local/lib/python3.6/tarfile.py", line 1619, in taropen
>  return cls(name, mode, fileobj, **kwargs)
>  File "/usr/local/lib/python3.6/tarfile.py", line 1482, in __init__
>  self.firstmember = self.next()
>  File "/usr/local/lib/python3.6/tarfile.py", line 2297, in next
>  tarinfo = self.tarinfo.fromtarfile(self)
>  File "/usr/local/lib/python3.6/tarfile.py", line 1092, in fromtarfile
>  buf = tarfile.fileobj.read(BLOCKSIZE)
>  File "/usr/local/lib/python3.6/gzip.py", line 276, in read
>  return self._buffer.read(size)
>  File "/usr/local/lib/python3.6/_compression.py", line 68, in readinto
>  data = self.read(len(byte_view))
>  File "/usr/local/lib/python3.6/gzip.py", line 463, in read
>  if not self._read_gzip_header():
>  File "/us

[jira] [Created] (BEAM-14133) Fix potential NPE in BigQueryServicesImpl.getErrorInfo

2022-03-18 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14133:
--

 Summary: Fix potential NPE in BigQueryServicesImpl.getErrorInfo
 Key: BEAM-14133
 URL: https://issues.apache.org/jira/browse/BEAM-14133
 Project: Beam
  Issue Type: Improvement
  Components: io-java-gcp
Reporter: Heejong Lee
Assignee: Heejong Lee


Fix potential NPE in BigQueryServicesImpl.getErrorInfo



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14114) Parsing datetime string with 0 to 6 decimal points for BigQuery

2022-03-15 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-14114:
---
Description: 
DateTimeFormatter for BigQuery only supports 0, 3, or 6 decimal places:

[https://github.com/apache/beam/blob/v2.37.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java#L165]

We might want to support an arbitrary number of decimal places between 0 and 6.

 
{code:java}
Pipeline pipeline = Pipeline.create(options);

// Inner double quotes are escaped and the query fragments are joined with +.
pipeline.apply(BigQueryIO.readTableRows()
    .fromQuery("select cast(\"2022-02-18 11:09:12.3456\" as datetime) "
        + "UNION ALL "
        + "select cast(\"2022-02-18 11:09:12.345678\" as datetime) ")
    .usingStandardSql()
).apply(ParDo.of(new DoFn<TableRow, Void>() {
  @ProcessElement
  public void processElement(@Element TableRow tableRow) {
    // Printing the row is enough to trigger the datetime parsing error.
    System.out.println(tableRow);
  }
}));
{code}
 

Error stack:
{noformat}
Caused by: java.time.format.DateTimeParseException: Text 
'2022-02-18T11:09:12.3456' could not be parsed, unparsed text found at index 19

java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1952)

java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1851)

java.time.LocalDateTime.parse(LocalDateTime.java:492)

org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.toBeamValue(BigQueryUtils.java:673)

org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.toBeamRowFieldValue(BigQueryUtils.java:636)
{noformat}

  was:
{code:java}
Pipeline pipeline = Pipeline.create(options);

// Inner double quotes are escaped and the query fragments are joined with +.
pipeline.apply(BigQueryIO.readTableRows()
    .fromQuery("select cast(\"2022-02-18 11:09:12.3456\" as datetime) "
        + "UNION ALL "
        + "select cast(\"2022-02-18 11:09:12.345678\" as datetime) ")
    .usingStandardSql()
).apply(ParDo.of(new DoFn<TableRow, Void>() {
  @ProcessElement
  public void processElement(@Element TableRow tableRow) {
    // Printing the row is enough to trigger the datetime parsing error.
    System.out.println(tableRow);
  }
}));
{code}
 

Error stack:
{noformat}
Caused by: java.time.format.DateTimeParseException: Text 
'2022-02-18T11:09:12.3456' could not be parsed, unparsed text found at index 19

java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1952)

java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1851)

java.time.LocalDateTime.parse(LocalDateTime.java:492)

org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.toBeamValue(BigQueryUtils.java:673)

org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.toBeamRowFieldValue(BigQueryUtils.java:636)
{noformat}


> Parsing datetime string with 0 to 6 decimal points for BigQuery
> ---
>
> Key: BEAM-14114
> URL: https://issues.apache.org/jira/browse/BEAM-14114
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Heejong Lee
>Priority: P2
>
> DateTimeFormatter for BigQuery only supports 0, 3, or 6 decimal places:
> [https://github.com/apache/beam/blob/v2.37.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java#L165]
> We might want to support an arbitrary number of decimal places between 0 and 6.
>  
> {code:java}
> Pipeline pipeline = Pipeline.create(options);
> // Inner double quotes are escaped and the query fragments are joined with +.
> pipeline.apply(BigQueryIO.readTableRows()
>     .fromQuery("select cast(\"2022-02-18 11:09:12.3456\" as datetime) "
>         + "UNION ALL "
>         + "select cast(\"2022-02-18 11:09:12.345678\" as datetime) ")
>     .usingStandardSql()
> ).apply(ParDo.of(new DoFn<TableRow, Void>() {
>   @ProcessElement
>   public void processElement(@Element TableRow tableRow) {
>     // Printing the row is enough to trigger the datetime parsing error.
>     System.out.println(tableRow);
>   }
> }));
> {code}
>  
> Error stack:
> {noformat}
> Caused by: java.time.format.DateTimeParseException: Text 
> '2022-02-18T11:09:12.3456' could not be parsed, unparsed text found at index 
> 19
> java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1952)
> java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1851)
> java.time.LocalDateTime.parse(LocalDateTime.java:492)
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.toBeamValue(BigQueryUtils.java:673)
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.toBeamRowFieldValue(BigQueryUtils.java:636)
> {noformat}
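
A formatter that accepts any number of fractional digits from 0 to 6 can be 
built with java.time's DateTimeFormatterBuilder.appendFraction. The following 
is a minimal sketch of that idea, not the change made in Beam; the class name 
FlexibleDateTimeParser is hypothetical, and the 'T' separator is assumed from 
the text shown in the error stack above.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

// Hypothetical helper, not part of Beam.
final class FlexibleDateTimeParser {
  // Date, a literal 'T', the time to seconds, then an optional fraction of
  // 0 to 6 digits. With a minimum width of 0, the decimal point itself is
  // optional when no fraction is present.
  static final DateTimeFormatter FORMATTER =
      new DateTimeFormatterBuilder()
          .append(DateTimeFormatter.ISO_LOCAL_DATE)
          .appendLiteral('T')
          .appendPattern("HH:mm:ss")
          .appendFraction(ChronoField.NANO_OF_SECOND, 0, 6, true)
          .toFormatter();

  static LocalDateTime parse(String text) {
    return LocalDateTime.parse(text, FORMATTER);
  }
}
```

With this formatter, FlexibleDateTimeParser.parse("2022-02-18T11:09:12.3456") 
parses the four-digit fraction that fails in the stack trace above, and values 
with no fraction or with six digits parse as well.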



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-14114) Parsing datetime string with 0 to 6 decimal points for BigQuery

2022-03-15 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-14114:
--

 Summary: Parsing datetime string with 0 to 6 decimal points for 
BigQuery
 Key: BEAM-14114
 URL: https://issues.apache.org/jira/browse/BEAM-14114
 Project: Beam
  Issue Type: Improvement
  Components: io-java-gcp
Reporter: Heejong Lee


{code:java}
Pipeline pipeline = Pipeline.create(options);

// Inner double quotes are escaped and the query fragments are joined with +.
pipeline.apply(BigQueryIO.readTableRows()
    .fromQuery("select cast(\"2022-02-18 11:09:12.3456\" as datetime) "
        + "UNION ALL "
        + "select cast(\"2022-02-18 11:09:12.345678\" as datetime) ")
    .usingStandardSql()
).apply(ParDo.of(new DoFn<TableRow, Void>() {
  @ProcessElement
  public void processElement(@Element TableRow tableRow) {
    // Printing the row is enough to trigger the datetime parsing error.
    System.out.println(tableRow);
  }
}));
{code}
 

Error stack:
{noformat}
Caused by: java.time.format.DateTimeParseException: Text 
'2022-02-18T11:09:12.3456' could not be parsed, unparsed text found at index 19

java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1952)

java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1851)

java.time.LocalDateTime.parse(LocalDateTime.java:492)

org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.toBeamValue(BigQueryUtils.java:673)

org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.toBeamRowFieldValue(BigQueryUtils.java:636)
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13813) Add support for URL artifact to extractStagingToPath

2022-03-07 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13813:
---
Fix Version/s: 2.37.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Add support for URL artifact to extractStagingToPath
> 
>
> Key: BEAM-13813
> URL: https://issues.apache.org/jira/browse/BEAM-13813
> Project: Beam
>  Issue Type: New Feature
>  Components: cross-language
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Labels: stale-assigned
> Fix For: 2.37.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Add support for URL artifact to extractStagingToPath



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13629) Update URL artifact type for Dataflow Go

2022-03-07 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13629:
---
Fix Version/s: 2.37.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Update URL artifact type for Dataflow Go
> 
>
> Key: BEAM-13629
> URL: https://issues.apache.org/jira/browse/BEAM-13629
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Labels: stale-assigned
> Fix For: 2.37.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Update URL artifact type for Dataflow Go



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-13920) Beam x-lang Dataflow tests failing due to _InactiveRpcError

2022-02-14 Thread Heejong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492278#comment-17492278
 ] 

Heejong Lee commented on BEAM-13920:


Looking.

It looks like a Gradle task dependency issue. In the failed tests, no expansion 
service was launched at all. 

> Beam x-lang Dataflow tests failing due to _InactiveRpcError
> ---
>
> Key: BEAM-13920
> URL: https://issues.apache.org/jira/browse/BEAM-13920
> Project: Beam
>  Issue Type: Bug
>  Components: cross-language, test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Heejong Lee
>Priority: P1
>
> For example,
> https://ci-beam.apache.org/job/beam_PostCommit_XVR_PythonUsingJava_Dataflow/63/testReport/junit/apache_beam.transforms.validate_runner_xlang_test/ValidateRunnerXlangTest/test_group_by_key/
> Seems like we couldn't start up the expansion service or couldn't connect to 
> it. [~heejong] can you check?
> state = 
> call = 
> with_call = False, deadline = None
> def _end_unary_response_blocking(state, call, with_call, deadline):
> if state.code is grpc.StatusCode.OK:
> if with_call:
> rendezvous = _MultiThreadedRendezvous(state, call, None, 
> deadline)
> return state.response, rendezvous
> else:
> return state.response
> else:
> >   raise _InactiveRpcError(state)
> E   grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that 
> terminated with:
> E status = StatusCode.UNAVAILABLE
> E details = "failed to connect to all addresses"
> E debug_error_string = 
> "{"created":"@1644495578.531912291","description":"Failed to pick 
> subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3134,"referenced_errors":[{"created":"@1644495578.531910958","description":"failed
>  to connect to all 
> addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}"
> E   >



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13615) Bumping up FnApi environment version to 9 in Java, Python SDK

2022-02-11 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13615:
---
Description: Bumping up FnApi environment version to 9 in Java, Python SDK. 
 (was: Bumping up FnApi environment version to 9 in Java, Python SDK. This is a 
blocker for Python multi-language support GA. )

> Bumping up FnApi environment version to 9 in Java, Python SDK
> -
>
> Key: BEAM-13615
> URL: https://issues.apache.org/jira/browse/BEAM-13615
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py-core
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Bumping up FnApi environment version to 9 in Java, Python SDK.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-13615) Bumping up FnApi environment version to 9 in Java, Python SDK

2022-02-11 Thread Heejong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17491217#comment-17491217
 ] 

Heejong Lee commented on BEAM-13615:


We need to update the internal system first to allow version 9. Removed the Fix 
Version.

> Bumping up FnApi environment version to 9 in Java, Python SDK
> -
>
> Key: BEAM-13615
> URL: https://issues.apache.org/jira/browse/BEAM-13615
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py-core
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Bumping up FnApi environment version to 9 in Java, Python SDK. This is a 
> blocker for Python multi-language support GA. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13615) Bumping up FnApi environment version to 9 in Java, Python SDK

2022-02-11 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13615:
---
Priority: P2  (was: P1)

> Bumping up FnApi environment version to 9 in Java, Python SDK
> -
>
> Key: BEAM-13615
> URL: https://issues.apache.org/jira/browse/BEAM-13615
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py-core
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Bumping up FnApi environment version to 9 in Java, Python SDK. This is a 
> blocker for Python multi-language support GA. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13615) Bumping up FnApi environment version to 9 in Java, Python SDK

2022-02-11 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13615:
---
Fix Version/s: (was: 2.37.0)

> Bumping up FnApi environment version to 9 in Java, Python SDK
> -
>
> Key: BEAM-13615
> URL: https://issues.apache.org/jira/browse/BEAM-13615
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py-core
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P1
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Bumping up FnApi environment version to 9 in Java, Python SDK. This is a 
> blocker for Python multi-language support GA. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-13813) Add support for URL artifact to extractStagingToPath

2022-02-02 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-13813:
--

 Summary: Add support for URL artifact to extractStagingToPath
 Key: BEAM-13813
 URL: https://issues.apache.org/jira/browse/BEAM-13813
 Project: Beam
  Issue Type: New Feature
  Components: cross-language
Reporter: Heejong Lee
Assignee: Heejong Lee


Add support for URL artifact to extractStagingToPath



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-13793) Adding external transform registry for schema-based payload to Python expansion service

2022-02-01 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-13793:
--

 Summary: Adding external transform registry for schema-based 
payload to Python expansion service
 Key: BEAM-13793
 URL: https://issues.apache.org/jira/browse/BEAM-13793
 Project: Beam
  Issue Type: New Feature
  Components: cross-language, sdk-py-core
Reporter: Heejong Lee
Assignee: Heejong Lee


Adding external transform registry for schema-based payload to Python expansion 
service



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-13764) Adding schema-based payload builder to Java external transform

2022-01-27 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-13764:
--

 Summary: Adding schema-based payload builder to Java external 
transform
 Key: BEAM-13764
 URL: https://issues.apache.org/jira/browse/BEAM-13764
 Project: Beam
  Issue Type: New Feature
  Components: cross-language, sdk-java-core
Reporter: Heejong Lee
Assignee: Heejong Lee


Adding schema-based payload builder to Java external transform



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13615) Bumping up FnApi environment version to 9 in Java, Python SDK

2022-01-26 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13615:
---
Fix Version/s: 2.37.0
   (was: 2.36.0)

> Bumping up FnApi environment version to 9 in Java, Python SDK
> -
>
> Key: BEAM-13615
> URL: https://issues.apache.org/jira/browse/BEAM-13615
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py-core
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P1
> Fix For: 2.37.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Bumping up FnApi environment version to 9 in Java, Python SDK. This is a 
> blocker for Python multi-language support GA. I hope we can cherry-pick this 
> into the 2.36.0 release.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13716) Clear before creating a new virtual environment in setupVirtualenv

2022-01-26 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13716:
---
Fix Version/s: 2.36.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Clear before creating a new virtual environment in setupVirtualenv
> --
>
> Key: BEAM-13716
> URL: https://issues.apache.org/jira/browse/BEAM-13716
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P1
> Fix For: 2.36.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> h2. *Summary*
> An existing virtualenv directory should be cleared before creating a new one.
> h2. *Problem Description*
> The virtualenv directory name for Python tasks is generated from a hash of 
> the project path, so all tasks with the same project path share the same 
> virtualenv directory. The problem is that when the {{setupVirtualenv}} task 
> initializes a new virtualenv directory, it doesn't overwrite existing data. 
> This can cause subtle bugs that are very hard to debug. See the following 
> example:
> {noformat}
> ❯ ./gradlew :sdks:python:wordCount -PpythonVersion=3.8
> Configuration on demand is an incubating feature.
> > Task :sdks:python:setupVirtualenv
> > Task :sdks:python:sdist
> > Task :sdks:python:installGcpTest
> Successfully installed apache-beam-2.37.0.dev0 atomicwrites-1.4.0 
> attrs-21.4.0 azure-core-1.21.1 azure-storage-blob-12.9.0 boto3-1.20.41 
> botocore-1.23.41 cachetools-4.2.4 certifi-2021.10.8 cffi-1.15.0 
> charset-normalizer-2.0.10 cloudpickle-2.0.0 crcmod-1.7 cryptography-36.0.1 
> deprecation-2.1.0 dill-0.3.1.1 docker-5.0.3 docopt-0.6.2 execnet-1.9.0 
> fastavro-1.4.9 fasteners-0.17.2 freezegun-1.1.0 google-api-core-1.31.5 
> google-apitools-0.5.31 google-auth-1.35.0 google-cloud-bigquery-2.32.0 
> google-cloud-bigquery-storage-2.11.0 google-cloud-bigtable-1.7.0 
> google-cloud-core-1.7.2 google-cloud-datastore-1.15.3 google-cloud-dlp-3.5.0 
> google-cloud-language-1.3.0 google-cloud-pubsub-2.9.0 
> google-cloud-pubsublite-1.3.0 google-cloud-recommendations-ai-0.2.0 
> google-cloud-spanner-1.19.1 google-cloud-videointelligence-1.16.1 
> google-cloud-vision-1.0.0 google-crc32c-1.3.0 google-resumable-media-2.1.0 
> googleapis-common-protos-1.54.0 greenlet-1.1.2 grpc-google-iam-v1-0.12.3 
> grpcio-gcp-0.2.2 grpcio-status-1.43.0 hdfs-2.6.0 httplib2-0.19.1 idna-3.3 
> isodate-0.6.1 jmespath-0.10.0 libcst-0.4.0 mock-2.0.0 more-itertools-8.12.0 
> msrest-0.6.21 mypy-extensions-0.4.3 numpy-1.21.5 oauth2client-4.1.3 
> oauthlib-3.1.1 orjson-3.6.5 overrides-6.1.0 pandas-1.3.5 parameterized-0.7.5 
> pbr-5.8.0 pluggy-0.13.1 proto-plus-1.19.8 psycopg2-binary-2.9.3 pyarrow-6.0.1 
> pyasn1-0.4.8 pyasn1-modules-0.2.8 pycparser-2.21 pydot-1.4.2 
> pyhamcrest-1.10.1 pymongo-3.12.3 pyparsing-2.4.7 pytest-4.6.11 
> pytest-forked-1.4.0 pytest-timeout-1.4.2 pytest-xdist-1.34.0 
> python-dateutil-2.8.2 pytz-2021.3 pyyaml-6.0 requests-2.27.1 
> requests-mock-1.9.3 requests-oauthlib-1.3.0 rsa-4.8 s3transfer-0.5.0 
> sqlalchemy-1.4.31 tenacity-5.1.5 testcontainers-3.4.2 
> typing-extensions-3.10.0.2 typing-inspect-0.7.1 typing-utils-0.1.0 
> urllib3-1.26.8 wcwidth-0.2.5 websocket-client-1.2.3 wrapt-1.13.3
> > Task :sdks:python:wordCount
> INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 
> seconds.
> INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> WARNING:root:Make sure that locally built Python SDK docker image has Python 
> 3.8 interpreter.
> INFO:root:Default Python SDK image for environment is 
> apache/beam_python3.8_sdk:2.37.0.dev
> INFO:apache_beam.runners.portability.fn_api_runner.translations:
>   
> 
> INFO:apache_beam.runners.portability.fn_api_runner.translations:
>   
> INFO:apache_beam.runners.portability.fn_api_runner.translations:
>   
> INFO:apache_beam.runners.portability.fn_api_runner.translations:
>   
> INFO:apache_beam.runners.portability.fn_api_runner.translations:
>   
> INFO:apache_beam.runners.portability.fn_api_runner.translations:
>   
> INFO:apache_beam.runners.portability.fn_api_runner.translations:
>   
> INFO:apache_beam.runners.portability.fn_api_runner.translations:
>   
> INFO:apache_beam.runners.portability.fn_api_runner.translations:
>  
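The behavior requested in BEAM-13716 can be illustrated with a minimal Python sketch. It is not Beam's Gradle implementation: the function names, the `env-` prefix, and the use of SHA-1 are assumptions made for illustration, since the issue only says the directory name comes from a hash of the project path. The sketch relies on the standard-library `venv.create` with `clear=True`, which deletes the contents of an existing environment directory before creating the new environment.

```python
import hashlib
import tempfile
import venv
from pathlib import Path


def virtualenv_dir(project_path: str, base_dir: Path) -> Path:
    """Derive the environment directory from a hash of the project path.

    Hypothetical naming scheme: the same project path always maps to the
    same directory, which is why stale contents can leak between runs.
    """
    digest = hashlib.sha1(project_path.encode("utf-8")).hexdigest()[:12]
    return base_dir / f"env-{digest}"


def setup_virtualenv(project_path: str, base_dir: Path) -> Path:
    """Create the environment, wiping anything already in its directory."""
    env_dir = virtualenv_dir(project_path, base_dir)
    # clear=True removes the existing contents of env_dir before creation.
    venv.create(env_dir, clear=True, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        env = setup_virtualenv(":sdks:python", base)
        # Simulate leftover state from an earlier run.
        stale = env / "stale-marker"
        stale.write_text("left over from an earlier run")
        setup_virtualenv(":sdks:python", base)
        print("stale file still present:", stale.exists())
```

Without `clear=True`, the second call would reuse the directory and the leftover file would survive, which is the kind of hidden state the issue describes.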

[jira] [Updated] (BEAM-13716) Clear before creating a new virtual environment in setupVirtualenv

2022-01-24 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13716:
---
Description: 
h2. *Summary*

An existing virtualenv directory should be cleared before creating a new one.
h2. *Problem Description*

The virtualenv directory name for Python tasks is generated from a hash of the 
project path, so all tasks with the same project path share the same 
virtualenv directory. The problem is that when the {{setupVirtualenv}} task 
initializes a new virtualenv directory, it doesn't overwrite existing data. 
This can cause subtle bugs that are very hard to debug. See the following 
example:
{noformat}
❯ ./gradlew :sdks:python:wordCount -PpythonVersion=3.8
Configuration on demand is an incubating feature.

> Task :sdks:python:setupVirtualenv

> Task :sdks:python:sdist

> Task :sdks:python:installGcpTest
Successfully installed apache-beam-2.37.0.dev0 atomicwrites-1.4.0 attrs-21.4.0 
azure-core-1.21.1 azure-storage-blob-12.9.0 boto3-1.20.41 botocore-1.23.41 
cachetools-4.2.4 certifi-2021.10.8 cffi-1.15.0 charset-normalizer-2.0.10 
cloudpickle-2.0.0 crcmod-1.7 cryptography-36.0.1 deprecation-2.1.0 dill-0.3.1.1 
docker-5.0.3 docopt-0.6.2 execnet-1.9.0 fastavro-1.4.9 fasteners-0.17.2 
freezegun-1.1.0 google-api-core-1.31.5 google-apitools-0.5.31 
google-auth-1.35.0 google-cloud-bigquery-2.32.0 
google-cloud-bigquery-storage-2.11.0 google-cloud-bigtable-1.7.0 
google-cloud-core-1.7.2 google-cloud-datastore-1.15.3 google-cloud-dlp-3.5.0 
google-cloud-language-1.3.0 google-cloud-pubsub-2.9.0 
google-cloud-pubsublite-1.3.0 google-cloud-recommendations-ai-0.2.0 
google-cloud-spanner-1.19.1 google-cloud-videointelligence-1.16.1 
google-cloud-vision-1.0.0 google-crc32c-1.3.0 google-resumable-media-2.1.0 
googleapis-common-protos-1.54.0 greenlet-1.1.2 grpc-google-iam-v1-0.12.3 
grpcio-gcp-0.2.2 grpcio-status-1.43.0 hdfs-2.6.0 httplib2-0.19.1 idna-3.3 
isodate-0.6.1 jmespath-0.10.0 libcst-0.4.0 mock-2.0.0 more-itertools-8.12.0 
msrest-0.6.21 mypy-extensions-0.4.3 numpy-1.21.5 oauth2client-4.1.3 
oauthlib-3.1.1 orjson-3.6.5 overrides-6.1.0 pandas-1.3.5 parameterized-0.7.5 
pbr-5.8.0 pluggy-0.13.1 proto-plus-1.19.8 psycopg2-binary-2.9.3 pyarrow-6.0.1 
pyasn1-0.4.8 pyasn1-modules-0.2.8 pycparser-2.21 pydot-1.4.2 pyhamcrest-1.10.1 
pymongo-3.12.3 pyparsing-2.4.7 pytest-4.6.11 pytest-forked-1.4.0 
pytest-timeout-1.4.2 pytest-xdist-1.34.0 python-dateutil-2.8.2 pytz-2021.3 
pyyaml-6.0 requests-2.27.1 requests-mock-1.9.3 requests-oauthlib-1.3.0 rsa-4.8 
s3transfer-0.5.0 sqlalchemy-1.4.31 tenacity-5.1.5 testcontainers-3.4.2 
typing-extensions-3.10.0.2 typing-inspect-0.7.1 typing-utils-0.1.0 
urllib3-1.26.8 wcwidth-0.2.5 websocket-client-1.2.3 wrapt-1.13.3

> Task :sdks:python:wordCount
INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
WARNING:root:Make sure that locally built Python SDK docker image has Python 
3.8 interpreter.
INFO:root:Default Python SDK image for environment is 
apache/beam_python3.8_sdk:2.37.0.dev
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.worker.statecache:Creating state cache with size 100
INFO:apache_beam.runners.portability.fn_api_runner.worker_handlers:Created 
Worker handler 
 for environment ref_Environment_default_environment_1 
(beam:env:embedded_python:v1, b'')
INFO:apache_beam.runners.port

[jira] [Updated] (BEAM-13716) Clear before creating a new virtual environment in setupVirtualenv

2022-01-21 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13716:
---
Description: 
h2. *TL;DR*

An existing virtualenv directory should be cleared before creating a new one.
h2. *Problem Description*

The virtualenv directory name for Python tasks is generated from a hash of the 
project path, so all tasks with the same project path share the same 
virtualenv directory. The problem is that when the {{setupVirtualenv}} task 
initializes a new virtualenv directory, it doesn't overwrite existing data. 
This can cause subtle bugs that are very hard to debug. See the following 
example:
{noformat}
❯ ./gradlew :sdks:python:wordCount -PpythonVersion=3.8
Configuration on demand is an incubating feature.

> Task :sdks:python:setupVirtualenv

> Task :sdks:python:sdist

> Task :sdks:python:installGcpTest
Successfully installed apache-beam-2.37.0.dev0 atomicwrites-1.4.0 attrs-21.4.0 
azure-core-1.21.1 azure-storage-blob-12.9.0 boto3-1.20.41 botocore-1.23.41 
cachetools-4.2.4 certifi-2021.10.8 cffi-1.15.0 charset-normalizer-2.0.10 
cloudpickle-2.0.0 crcmod-1.7 cryptography-36.0.1 deprecation-2.1.0 dill-0.3.1.1 
docker-5.0.3 docopt-0.6.2 execnet-1.9.0 fastavro-1.4.9 fasteners-0.17.2 
freezegun-1.1.0 google-api-core-1.31.5 google-apitools-0.5.31 
google-auth-1.35.0 google-cloud-bigquery-2.32.0 
google-cloud-bigquery-storage-2.11.0 google-cloud-bigtable-1.7.0 
google-cloud-core-1.7.2 google-cloud-datastore-1.15.3 google-cloud-dlp-3.5.0 
google-cloud-language-1.3.0 google-cloud-pubsub-2.9.0 
google-cloud-pubsublite-1.3.0 google-cloud-recommendations-ai-0.2.0 
google-cloud-spanner-1.19.1 google-cloud-videointelligence-1.16.1 
google-cloud-vision-1.0.0 google-crc32c-1.3.0 google-resumable-media-2.1.0 
googleapis-common-protos-1.54.0 greenlet-1.1.2 grpc-google-iam-v1-0.12.3 
grpcio-gcp-0.2.2 grpcio-status-1.43.0 hdfs-2.6.0 httplib2-0.19.1 idna-3.3 
isodate-0.6.1 jmespath-0.10.0 libcst-0.4.0 mock-2.0.0 more-itertools-8.12.0 
msrest-0.6.21 mypy-extensions-0.4.3 numpy-1.21.5 oauth2client-4.1.3 
oauthlib-3.1.1 orjson-3.6.5 overrides-6.1.0 pandas-1.3.5 parameterized-0.7.5 
pbr-5.8.0 pluggy-0.13.1 proto-plus-1.19.8 psycopg2-binary-2.9.3 pyarrow-6.0.1 
pyasn1-0.4.8 pyasn1-modules-0.2.8 pycparser-2.21 pydot-1.4.2 pyhamcrest-1.10.1 
pymongo-3.12.3 pyparsing-2.4.7 pytest-4.6.11 pytest-forked-1.4.0 
pytest-timeout-1.4.2 pytest-xdist-1.34.0 python-dateutil-2.8.2 pytz-2021.3 
pyyaml-6.0 requests-2.27.1 requests-mock-1.9.3 requests-oauthlib-1.3.0 rsa-4.8 
s3transfer-0.5.0 sqlalchemy-1.4.31 tenacity-5.1.5 testcontainers-3.4.2 
typing-extensions-3.10.0.2 typing-inspect-0.7.1 typing-utils-0.1.0 
urllib3-1.26.8 wcwidth-0.2.5 websocket-client-1.2.3 wrapt-1.13.3

> Task :sdks:python:wordCount
INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
WARNING:root:Make sure that locally built Python SDK docker image has Python 
3.8 interpreter.
INFO:root:Default Python SDK image for environment is 
apache/beam_python3.8_sdk:2.37.0.dev
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.worker.statecache:Creating state cache with size 100
INFO:apache_beam.runners.portability.fn_api_runner.worker_handlers:Created 
Worker handler 
 for environment ref_Environment_default_environment_1 
(beam:env:embedded_python:v1, b'')
INFO:apache_beam.runners.portab

[jira] [Updated] (BEAM-13716) Clear before creating a new virtual environment in setupVirtualenv

2022-01-21 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13716:
---
Description: 
h2. *TL;DR*

An existing virtualenv directory should be cleared before creating a new one.
h2. *Problem Description*

The virtualenv directory name for Python tasks is generated from a hash of the 
project path, so all tasks with the same project path share the same 
virtualenv directory. The problem is that when the {{setupVirtualenv}} task 
initializes a virtualenv directory, it doesn't overwrite an existing 
configuration. This can cause subtle bugs that are very hard to debug. See the 
following example:
{noformat}
❯ ./gradlew :sdks:python:wordCount -PpythonVersion=3.8
Configuration on demand is an incubating feature.

> Task :sdks:python:setupVirtualenv

> Task :sdks:python:sdist

> Task :sdks:python:installGcpTest
Successfully installed apache-beam-2.37.0.dev0 atomicwrites-1.4.0 attrs-21.4.0 
azure-core-1.21.1 azure-storage-blob-12.9.0 boto3-1.20.41 botocore-1.23.41 
cachetools-4.2.4 certifi-2021.10.8 cffi-1.15.0 charset-normalizer-2.0.10 
cloudpickle-2.0.0 crcmod-1.7 cryptography-36.0.1 deprecation-2.1.0 dill-0.3.1.1 
docker-5.0.3 docopt-0.6.2 execnet-1.9.0 fastavro-1.4.9 fasteners-0.17.2 
freezegun-1.1.0 google-api-core-1.31.5 google-apitools-0.5.31 
google-auth-1.35.0 google-cloud-bigquery-2.32.0 
google-cloud-bigquery-storage-2.11.0 google-cloud-bigtable-1.7.0 
google-cloud-core-1.7.2 google-cloud-datastore-1.15.3 google-cloud-dlp-3.5.0 
google-cloud-language-1.3.0 google-cloud-pubsub-2.9.0 
google-cloud-pubsublite-1.3.0 google-cloud-recommendations-ai-0.2.0 
google-cloud-spanner-1.19.1 google-cloud-videointelligence-1.16.1 
google-cloud-vision-1.0.0 google-crc32c-1.3.0 google-resumable-media-2.1.0 
googleapis-common-protos-1.54.0 greenlet-1.1.2 grpc-google-iam-v1-0.12.3 
grpcio-gcp-0.2.2 grpcio-status-1.43.0 hdfs-2.6.0 httplib2-0.19.1 idna-3.3 
isodate-0.6.1 jmespath-0.10.0 libcst-0.4.0 mock-2.0.0 more-itertools-8.12.0 
msrest-0.6.21 mypy-extensions-0.4.3 numpy-1.21.5 oauth2client-4.1.3 
oauthlib-3.1.1 orjson-3.6.5 overrides-6.1.0 pandas-1.3.5 parameterized-0.7.5 
pbr-5.8.0 pluggy-0.13.1 proto-plus-1.19.8 psycopg2-binary-2.9.3 pyarrow-6.0.1 
pyasn1-0.4.8 pyasn1-modules-0.2.8 pycparser-2.21 pydot-1.4.2 pyhamcrest-1.10.1 
pymongo-3.12.3 pyparsing-2.4.7 pytest-4.6.11 pytest-forked-1.4.0 
pytest-timeout-1.4.2 pytest-xdist-1.34.0 python-dateutil-2.8.2 pytz-2021.3 
pyyaml-6.0 requests-2.27.1 requests-mock-1.9.3 requests-oauthlib-1.3.0 rsa-4.8 
s3transfer-0.5.0 sqlalchemy-1.4.31 tenacity-5.1.5 testcontainers-3.4.2 
typing-extensions-3.10.0.2 typing-inspect-0.7.1 typing-utils-0.1.0 
urllib3-1.26.8 wcwidth-0.2.5 websocket-client-1.2.3 wrapt-1.13.3

> Task :sdks:python:wordCount
INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
WARNING:root:Make sure that locally built Python SDK docker image has Python 
3.8 interpreter.
INFO:root:Default Python SDK image for environment is 
apache/beam_python3.8_sdk:2.37.0.dev
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.worker.statecache:Creating state cache with size 100
INFO:apache_beam.runners.portability.fn_api_runner.worker_handlers:Created 
Worker handler 
 for environment ref_Environment_default_environment_1 
(beam:env:embedded_python:v1, b'')
INFO:apache_beam.runners.p

[jira] [Updated] (BEAM-13716) Clear before creating a new virtual environment in setupVirtualenv

2022-01-21 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13716:
---
Description: 
h2. *TL;DR*

An existing virtualenv directory should be cleared before creating a new one.
h2. *Problem Description*

A virtualenv directory for Python tasks is generated from the project path, so 
all tasks that have the same project path share the same virtualenv directory. 
The problem is that when the {{setupVirtualenv}} task initializes a virtualenv 
directory, it does not overwrite an existing configuration. This can cause a 
subtle bug that is very hard to debug. See the following example:
{noformat}
❯ ./gradlew :sdks:python:wordCount -PpythonVersion=3.8
Configuration on demand is an incubating feature.

> Task :sdks:python:setupVirtualenv

> Task :sdks:python:sdist

> Task :sdks:python:installGcpTest
Successfully installed apache-beam-2.37.0.dev0 atomicwrites-1.4.0 attrs-21.4.0 
azure-core-1.21.1 azure-storage-blob-12.9.0 boto3-1.20.41 botocore-1.23.41 
cachetools-4.2.4 certifi-2021.10.8 cffi-1.15.0 charset-normalizer-2.0.10 
cloudpickle-2.0.0 crcmod-1.7 cryptography-36.0.1 deprecation-2.1.0 dill-0.3.1.1 
docker-5.0.3 docopt-0.6.2 execnet-1.9.0 fastavro-1.4.9 fasteners-0.17.2 
freezegun-1.1.0 google-api-core-1.31.5 google-apitools-0.5.31 
google-auth-1.35.0 google-cloud-bigquery-2.32.0 
google-cloud-bigquery-storage-2.11.0 google-cloud-bigtable-1.7.0 
google-cloud-core-1.7.2 google-cloud-datastore-1.15.3 google-cloud-dlp-3.5.0 
google-cloud-language-1.3.0 google-cloud-pubsub-2.9.0 
google-cloud-pubsublite-1.3.0 google-cloud-recommendations-ai-0.2.0 
google-cloud-spanner-1.19.1 google-cloud-videointelligence-1.16.1 
google-cloud-vision-1.0.0 google-crc32c-1.3.0 google-resumable-media-2.1.0 
googleapis-common-protos-1.54.0 greenlet-1.1.2 grpc-google-iam-v1-0.12.3 
grpcio-gcp-0.2.2 grpcio-status-1.43.0 hdfs-2.6.0 httplib2-0.19.1 idna-3.3 
isodate-0.6.1 jmespath-0.10.0 libcst-0.4.0 mock-2.0.0 more-itertools-8.12.0 
msrest-0.6.21 mypy-extensions-0.4.3 numpy-1.21.5 oauth2client-4.1.3 
oauthlib-3.1.1 orjson-3.6.5 overrides-6.1.0 pandas-1.3.5 parameterized-0.7.5 
pbr-5.8.0 pluggy-0.13.1 proto-plus-1.19.8 psycopg2-binary-2.9.3 pyarrow-6.0.1 
pyasn1-0.4.8 pyasn1-modules-0.2.8 pycparser-2.21 pydot-1.4.2 pyhamcrest-1.10.1 
pymongo-3.12.3 pyparsing-2.4.7 pytest-4.6.11 pytest-forked-1.4.0 
pytest-timeout-1.4.2 pytest-xdist-1.34.0 python-dateutil-2.8.2 pytz-2021.3 
pyyaml-6.0 requests-2.27.1 requests-mock-1.9.3 requests-oauthlib-1.3.0 rsa-4.8 
s3transfer-0.5.0 sqlalchemy-1.4.31 tenacity-5.1.5 testcontainers-3.4.2 
typing-extensions-3.10.0.2 typing-inspect-0.7.1 typing-utils-0.1.0 
urllib3-1.26.8 wcwidth-0.2.5 websocket-client-1.2.3 wrapt-1.13.3

> Task :sdks:python:wordCount
INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
WARNING:root:Make sure that locally built Python SDK docker image has Python 
3.8 interpreter.
INFO:root:Default Python SDK image for environment is 
apache/beam_python3.8_sdk:2.37.0.dev
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.worker.statecache:Creating state cache with size 100
INFO:apache_beam.runners.portability.fn_api_runner.worker_handlers:Created 
Worker handler 
 for environment ref_Environment_default_environment_1 
(beam:env:embedded_python:v1, b'')
INFO:apache_beam.runners.portability.fn_api_

[jira] [Updated] (BEAM-13716) Clear before creating a new virtual environment in setupVirtualenv

2022-01-21 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13716:
---
Description: 
h2. *TL;DR*

An existing virtualenv directory should be cleared before creating a new one.
h2. *Problem Description*

A virtualenv directory for Python tasks is generated from the project path, so 
all tasks that have the same project path share the same virtualenv directory. 
The problem is that when the setupVirtualenv task initializes a virtualenv 
directory, it does not overwrite an existing configuration. This can cause a 
subtle bug that is very hard to debug. See the following example:
{noformat}
❯ ./gradlew :sdks:python:wordCount -PpythonVersion=3.8
Configuration on demand is an incubating feature.

> Task :sdks:python:setupVirtualenv

> Task :sdks:python:sdist

> Task :sdks:python:installGcpTest
Successfully installed apache-beam-2.37.0.dev0 atomicwrites-1.4.0 attrs-21.4.0 
azure-core-1.21.1 azure-storage-blob-12.9.0 boto3-1.20.41 botocore-1.23.41 
cachetools-4.2.4 certifi-2021.10.8 cffi-1.15.0 charset-normalizer-2.0.10 
cloudpickle-2.0.0 crcmod-1.7 cryptography-36.0.1 deprecation-2.1.0 dill-0.3.1.1 
docker-5.0.3 docopt-0.6.2 execnet-1.9.0 fastavro-1.4.9 fasteners-0.17.2 
freezegun-1.1.0 google-api-core-1.31.5 google-apitools-0.5.31 
google-auth-1.35.0 google-cloud-bigquery-2.32.0 
google-cloud-bigquery-storage-2.11.0 google-cloud-bigtable-1.7.0 
google-cloud-core-1.7.2 google-cloud-datastore-1.15.3 google-cloud-dlp-3.5.0 
google-cloud-language-1.3.0 google-cloud-pubsub-2.9.0 
google-cloud-pubsublite-1.3.0 google-cloud-recommendations-ai-0.2.0 
google-cloud-spanner-1.19.1 google-cloud-videointelligence-1.16.1 
google-cloud-vision-1.0.0 google-crc32c-1.3.0 google-resumable-media-2.1.0 
googleapis-common-protos-1.54.0 greenlet-1.1.2 grpc-google-iam-v1-0.12.3 
grpcio-gcp-0.2.2 grpcio-status-1.43.0 hdfs-2.6.0 httplib2-0.19.1 idna-3.3 
isodate-0.6.1 jmespath-0.10.0 libcst-0.4.0 mock-2.0.0 more-itertools-8.12.0 
msrest-0.6.21 mypy-extensions-0.4.3 numpy-1.21.5 oauth2client-4.1.3 
oauthlib-3.1.1 orjson-3.6.5 overrides-6.1.0 pandas-1.3.5 parameterized-0.7.5 
pbr-5.8.0 pluggy-0.13.1 proto-plus-1.19.8 psycopg2-binary-2.9.3 pyarrow-6.0.1 
pyasn1-0.4.8 pyasn1-modules-0.2.8 pycparser-2.21 pydot-1.4.2 pyhamcrest-1.10.1 
pymongo-3.12.3 pyparsing-2.4.7 pytest-4.6.11 pytest-forked-1.4.0 
pytest-timeout-1.4.2 pytest-xdist-1.34.0 python-dateutil-2.8.2 pytz-2021.3 
pyyaml-6.0 requests-2.27.1 requests-mock-1.9.3 requests-oauthlib-1.3.0 rsa-4.8 
s3transfer-0.5.0 sqlalchemy-1.4.31 tenacity-5.1.5 testcontainers-3.4.2 
typing-extensions-3.10.0.2 typing-inspect-0.7.1 typing-utils-0.1.0 
urllib3-1.26.8 wcwidth-0.2.5 websocket-client-1.2.3 wrapt-1.13.3

> Task :sdks:python:wordCount
INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
WARNING:root:Make sure that locally built Python SDK docker image has Python 
3.8 interpreter.
INFO:root:Default Python SDK image for environment is 
apache/beam_python3.8_sdk:2.37.0.dev
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.portability.fn_api_runner.translations:
  
INFO:apache_beam.runners.worker.statecache:Creating state cache with size 100
INFO:apache_beam.runners.portability.fn_api_runner.worker_handlers:Created 
Worker handler 
 for environment ref_Environment_default_environment_1 
(beam:env:embedded_python:v1, b'')
INFO:apache_beam.runners.portability.fn_api_runner.f

[jira] [Created] (BEAM-13716) Clear before creating a new virtual environment in setupVirtualenv

2022-01-21 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-13716:
--

 Summary: Clear before creating a new virtual environment in 
setupVirtualenv
 Key: BEAM-13716
 URL: https://issues.apache.org/jira/browse/BEAM-13716
 Project: Beam
  Issue Type: Bug
  Components: build-system, testing
Reporter: Heejong Lee
Assignee: Heejong Lee


An existing virtualenv directory should be cleared before creating a new one.
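
As a rough illustration of the idea (not Beam's actual Gradle task, whose implementation is not shown in this message), the sketch below uses Python's standard `venv` module. The helper names `recreate_virtualenv` and `stale_file_survives_recreation` are hypothetical; the `clear=True` argument of `venv.create` is the standard-library option that empties the target directory before the environment is created.

```python
import tempfile
import venv
from pathlib import Path


def recreate_virtualenv(env_dir: Path) -> None:
    """Create a virtual environment at env_dir, wiping existing contents first."""
    # clear=True deletes whatever is already in env_dir, so configuration left
    # behind by an earlier run cannot leak into the new environment.
    venv.create(env_dir, clear=True, with_pip=False)


def stale_file_survives_recreation() -> bool:
    """Return True if a leftover file is still present after recreation."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "env"
        venv.create(env_dir, with_pip=False)   # first setup
        stale = env_dir / "stale-marker.txt"   # stands in for old configuration
        stale.write_text("left over from a previous run")
        recreate_virtualenv(env_dir)           # second setup, with clearing
        return stale.exists()


if __name__ == "__main__":
    print("stale file survived:", stale_file_survives_recreation())
```

Without `clear=True`, the second `venv.create` call would reuse the directory and leave the marker file in place, which is the kind of leftover state the issue describes.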



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-12582) Adding jar packages only to Java environments

2022-01-18 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-12582:
---
Resolution: Won't Fix
Status: Resolved  (was: Open)

> Adding jar packages only to Java environments
> -
>
> Key: BEAM-12582
> URL: https://issues.apache.org/jira/browse/BEAM-12582
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P3
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Adding jar packages only to Java environments





[jira] [Updated] (BEAM-13455) Remove duplicated artifacts when using multiple environments with Dataflow Java

2022-01-18 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13455:
---
Resolution: Fixed
Status: Resolved  (was: Open)

> Remove duplicated artifacts when using multiple environments with Dataflow 
> Java
> ---
>
> Key: BEAM-13455
> URL: https://issues.apache.org/jira/browse/BEAM-13455
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language, runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Remove duplicated artifacts when using multiple environments with Dataflow





[jira] [Commented] (BEAM-10790) tar.gz artifacts written into fat jar is no longer gzip file

2022-01-18 Thread Heejong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478147#comment-17478147
 ] 

Heejong Lee commented on BEAM-10790:


This might already be fixed in the 2.25.0 release: 
https://github.com/apache/beam/commit/b37f1f63cc056da8a89d7d7752728b9fe04d045b#diff-2d9ca0bc6bca8dbcf45b4dbb8a951d660014c5d7edc754030366c4b8db1cfeb8

> tar.gz artifacts written into fat jar is no longer gzip file
> 
>
> Key: BEAM-10790
> URL: https://issues.apache.org/jira/browse/BEAM-10790
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
> Environment: Flink 1.10.1
> Beam worker pool: apache/beam_python3.6_sdk:2.22.0
> SDK: apache beam 2.22.0 
> Python 3.6
>Reporter: Jiaxin Shan
>Priority: P2
>
> I am using the Flink Runner on Kubernetes. The problem is that the tar.gz file 
> is no longer a valid gzip file after going through the artifact server.  
>  # The application persists a dependency dist file, tfx_ephemeral-0.22.0.tar.gz, 
> and passes 
> `--extra_package=/tmp/tmpsm0_ll8e/build/tfx/dist/tfx_ephemeral-0.22.0.tar.gz` 
> to the Beam args.
>  # The artifact service retrieves it from the artifact server and appends it to 
> the uber jar. 
>  FROM: 'ref_Environment_default_environment_1', dependencies '[type_urn: 
> "beam:artifact:type:file:v1"
>  type_payload: 
> "\n;/tmp/tmpsm0_ll8e/build/tfx/dist/tfx_ephemeral-0.22.0.tar.gz" 
>  TO: 
> `/BEAM-PIPLINE/pipeline/artifacts/job-${uuid}/${hash}-tfx_ephemeral-0.22.0.tar.gz`
>  # The Beam Python worker pool gets the dependency from the uber jar and puts it 
> under /tmp/staged/ 
> I originally found this problem in step 3. I traced it and noticed that it 
> comes from step 2. The code snippet is here: 
> [https://github.com/apache/beam/blob/3d0f7dc011bb7bbee4eea9525b82f53855de85f1/sdks/python/apache_beam/runners/portability/artifact_service.py#L306-L310]
>  
> Logs from Application
> {code:java}
> INFO:root:Using Python SDK docker image: apache/beam_python3.6_sdk:2.22.0. If 
> the image is not available at local, we will try to pull from hub.docker.com
>  
> INFO:apache_beam.runners.portability.fn_api_runner.translations:
>   
> INFO:apache_beam.runners.portability.flink_runner:Adding HTTP protocol 
> scheme to flink_master parameter: http://beam-flink-cluster-jobmanager:8081
> INFO:apache_beam.runners.portability.abstract_job_service:Got Prepare 
> request.
> INFO:apache_beam.utils.subprocess_server:Downloading job server jar from 
> https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.10-job-server/2.22.0/beam-runners-flink-1.10-job-server-2.22.0.jar
> INFO:apache_beam.runners.portability.abstract_job_service:Artifact server 
> started on port 44337
> INFO:apache_beam.runners.portability.abstract_job_service:Prepared job 
> 'job' as 'job-880904cc-ec5e-490a-ba93-8e18d87fbd89'
> INFO:apache_beam.runners.portability.artifact_service:staging token: 
> 'job-880904cc-ec5e-490a-ba93-8e18d87fbd89'
> INFO:apache_beam.runners.portability.artifact_service:artifact service 
> key: 'ref_Environment_default_environment_1', dependencies '[type_urn: 
> "beam:artifact:type:file:v1"
> type_payload: 
> "\n;/tmp/tmpsm0_ll8e/build/tfx/dist/tfx_ephemeral-0.22.0.tar.gz"
>   
> INFO:apache_beam.runners.portability.abstract_job_service:Running job 
> 'job-880904cc-ec5e-490a-ba93-8e18d87fbd89'
> INFO:apache_beam.runners.portability.flink_uber_jar_job_server:Started 
> Flink job as bedeef0503bf0a17e3461169e2c9b5bc
> INFO:apache_beam.runners.portability.portable_runner:Job state changed to 
> STOPPED
> INFO:apache_beam.runners.portability.portable_runner:Job state changed to 
> RUNNING
> {code}
> The job never finishes because of an exception in the Beam worker pool.
>  
> Logs from beam worker pool. 
>  
> {code:java}
> 2020/08/21 18:29:46 Installing extra package: tfx_ephemeral-0.22.0.tar.gz
>  ERROR: Exception:
>  Traceback (most recent call last):
>  File "/usr/local/lib/python3.6/tarfile.py", line 1643, in gzopen
>  t = cls.taropen(name, mode, fileobj, **kwargs)
>  File "/usr/local/lib/python3.6/tarfile.py", line 1619, in taropen
>  return cls(name, mode, fileobj, **kwargs)
>  File "/usr/local/lib/python3.6/tarfile.py", line 1482, in __init__
>  self.firstmember = self.next()
>  File "/usr/local/lib/python3.6/tarfile.py", line 2297, in next
>  tarinfo = self.tarinfo.fromtarfile(self)
>  File "/usr/local/lib/python3.6/tarfile.py", line 1092, in fromtarfile
>  buf = tarfile.fileobj.read(BLOCKSIZE)
>  File "/usr/local/lib/python3.6/gzip.py", line 276, in read
>  return self._buffer.read(size)
>  File "/usr/local/lib/python3.6/_compression.py", line 68, in readinto
>  data = self.read(len(byte_view))
>  File "/usr/local/lib/python3.6/gzip.py", line 463, in read
>  if not self._

[jira] [Created] (BEAM-13647) Go SDK FnApi environment version 9 should be compatible with Runner v2 artifact service

2022-01-12 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-13647:
--

 Summary: Go SDK FnApi environment version 9 should be compatible 
with Runner v2 artifact service
 Key: BEAM-13647
 URL: https://issues.apache.org/jira/browse/BEAM-13647
 Project: Beam
  Issue Type: Improvement
  Components: runner-dataflow, sdk-go
Reporter: Heejong Lee


Go SDK FnApi environment version 9 should be compatible with the Runner v2 
artifact service. We need to test and verify that the Go SDK works well with the 
Runner v2 artifact service before increasing the version number.





[jira] [Updated] (BEAM-13629) Update URL artifact type for Dataflow Go

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13629:
---
Description: Update URL artifact type for Dataflow Go  (was: Update GCS 
artifact type for Dataflow Go)

> Update URL artifact type for Dataflow Go
> 
>
> Key: BEAM-13629
> URL: https://issues.apache.org/jira/browse/BEAM-13629
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>
> Update URL artifact type for Dataflow Go





[jira] [Updated] (BEAM-13629) Update URL artifact type for Dataflow Go

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13629:
---
Summary: Update URL artifact type for Dataflow Go  (was: Update GCS 
artifact type for Dataflow Go)

> Update URL artifact type for Dataflow Go
> 
>
> Key: BEAM-13629
> URL: https://issues.apache.org/jira/browse/BEAM-13629
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>
> Update GCS artifact type for Dataflow Go





[jira] [Updated] (BEAM-13090) Adding SDK harness container overrides option to Java SDK

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13090:
---
Resolution: Fixed
Status: Resolved  (was: Open)

> Adding SDK harness container overrides option to Java SDK
> -
>
> Key: BEAM-13090
> URL: https://issues.apache.org/jira/browse/BEAM-13090
> Project: Beam
>  Issue Type: Sub-task
>  Components: cross-language
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Adding SDK harness container overrides option to Java SDK





[jira] [Updated] (BEAM-13091) Generate missing staged names from hash for Dataflow runner

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13091:
---
Resolution: Fixed
Status: Resolved  (was: Open)

> Generate missing staged names from hash for Dataflow runner
> ---
>
> Key: BEAM-13091
> URL: https://issues.apache.org/jira/browse/BEAM-13091
> Project: Beam
>  Issue Type: Sub-task
>  Components: cross-language, runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Generate missing staged names from hash for Dataflow runner





[jira] [Updated] (BEAM-13021) Deduplicate Python artifact not only by hash but also by source path

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13021:
---
Resolution: Fixed
Status: Resolved  (was: Open)

> Deduplicate Python artifact not only by hash but also by source path
> 
>
> Key: BEAM-13021
> URL: https://issues.apache.org/jira/browse/BEAM-13021
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Deduplicate Python artifact not only by hash but also by source path





[jira] [Updated] (BEAM-13092) Adding dummy external transform translators for Dataflow runner

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13092:
---
Resolution: Fixed
Status: Resolved  (was: In Progress)

> Adding dummy external transform translators for Dataflow runner
> ---
>
> Key: BEAM-13092
> URL: https://issues.apache.org/jira/browse/BEAM-13092
> Project: Beam
>  Issue Type: Sub-task
>  Components: cross-language, runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Adding dummy external transform translators for Dataflow runner





[jira] [Work started] (BEAM-13092) Adding dummy external transform translators for Dataflow runner

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-13092 started by Heejong Lee.
--
> Adding dummy external transform translators for Dataflow runner
> ---
>
> Key: BEAM-13092
> URL: https://issues.apache.org/jira/browse/BEAM-13092
> Project: Beam
>  Issue Type: Sub-task
>  Components: cross-language, runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Adding dummy external transform translators for Dataflow runner





[jira] [Updated] (BEAM-12855) Infer result type when the composite type has Any

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-12855:
---
Resolution: Won't Do
Status: Resolved  (was: Open)

> Infer result type when the composite type has Any
> -
>
> Key: BEAM-12855
> URL: https://issues.apache.org/jira/browse/BEAM-12855
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Heejong Lee
>Priority: P2
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Infer the result type when the composite type has Any. This change makes the 
> `with_output_types` annotation work for composite transforms like 
> `CombinePerKey`.





[jira] [Updated] (BEAM-10880) Log error counts to debug BigQuery streaming insert requests for Python SDK

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-10880:
---
Resolution: Won't Do
Status: Resolved  (was: Open)

> Log error counts to debug BigQuery streaming insert requests for Python SDK
> ---
>
> Key: BEAM-10880
> URL: https://issues.apache.org/jira/browse/BEAM-10880
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Heejong Lee
>Priority: P3
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Log error counts to debug BigQuery streaming insert requests for Python SDK





[jira] [Updated] (BEAM-10890) Log error counts to debug BigQuery streaming insert requests for Java SDK

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-10890:
---
Resolution: Fixed
Status: Resolved  (was: Open)

> Log error counts to debug BigQuery streaming insert requests for Java SDK
> -
>
> Key: BEAM-10890
> URL: https://issues.apache.org/jira/browse/BEAM-10890
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Heejong Lee
>Priority: P3
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Log error counts to debug BigQuery streaming insert requests for Java SDK





[jira] [Updated] (BEAM-10699) Identify and log additional information needed to debug streaming insert requests for Java SDK

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-10699:
---
Resolution: Fixed
Status: Resolved  (was: Open)

> Identify and log additional information needed to debug streaming insert 
> requests for Java SDK
> --
>
> Key: BEAM-10699
> URL: https://issues.apache.org/jira/browse/BEAM-10699
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Heejong Lee
>Priority: P3
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Implement logging for per-worker statistics:
> - Request count for that window.
> - Error codes and their number of occurrences for that window (or perhaps 
> just log each error with as much detail as possible).
> - Tail latencies of requests (50th, 90th, and 99th percentiles).





[jira] [Updated] (BEAM-10791) Identify and log additional information needed to debug streaming insert requests for Python SDK

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-10791:
---
Resolution: Fixed
Status: Resolved  (was: Open)

> Identify and log additional information needed to debug streaming insert 
> requests for Python SDK
> 
>
> Key: BEAM-10791
> URL: https://issues.apache.org/jira/browse/BEAM-10791
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Heejong Lee
>Priority: P3
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Implement logging for per-worker statistics:
> - Request count for that window.
> - Error codes and their number of occurrences for that window (or perhaps 
> just log each error with as much detail as possible).
> - Tail latencies of requests (50th, 90th, and 99th percentiles).
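
The statistics listed above could be accumulated along the following lines. This is only a sketch under stated assumptions, not the code that was merged for this issue: the class `RequestWindowStats`, the helper `nearest_rank`, and the error-code strings are hypothetical, and percentiles are computed with the simple nearest-rank method.

```python
from collections import Counter


def nearest_rank(sorted_values, p):
    """Return the p-th percentile (integer p in 1..100) by the nearest-rank method."""
    if not sorted_values:
        return None
    # Integer ceiling of p * n / 100, which avoids floating-point rounding.
    rank = max(1, -(-p * len(sorted_values) // 100))
    return sorted_values[rank - 1]


class RequestWindowStats:
    """Hypothetical per-worker accumulator for a single reporting window."""

    def __init__(self):
        self.latencies_ms = []
        self.error_codes = Counter()

    def record(self, latency_ms, error_code=None):
        """Record one request; error_code is None for a successful request."""
        self.latencies_ms.append(latency_ms)
        if error_code is not None:
            self.error_codes[error_code] += 1

    def summary(self):
        """Return request count, error counts, and tail latencies for the window."""
        ordered = sorted(self.latencies_ms)
        return {
            "request_count": len(ordered),
            "errors": dict(self.error_codes),
            "p50": nearest_rank(ordered, 50),
            "p90": nearest_rank(ordered, 90),
            "p99": nearest_rank(ordered, 99),
        }


if __name__ == "__main__":
    stats = RequestWindowStats()
    for i in range(1, 101):
        stats.record(float(i), "timeout" if i % 50 == 0 else None)
    print(stats.summary())
```

A worker would typically log `summary()` at the end of each window and then start a fresh accumulator, so counts and percentiles never mix data from different windows.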





[jira] [Updated] (BEAM-10322) allow only single assignment to producing stages by pcollection map

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-10322:
---
Resolution: Fixed
Status: Resolved  (was: Open)

> allow only single assignment to producing stages by pcollection map
> ---
>
> Key: BEAM-10322
> URL: https://issues.apache.org/jira/browse/BEAM-10322
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Heejong Lee
>Priority: P3
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> allow only single assignment to producing stages by pcollection map





[jira] [Updated] (BEAM-10064) Fix google3 import error for BEAM-9383

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-10064:
---
Resolution: Fixed
Status: Resolved  (was: Open)

> Fix google3 import error for BEAM-9383
> --
>
> Key: BEAM-10064
> URL: https://issues.apache.org/jira/browse/BEAM-10064
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Heejong Lee
>Priority: P3
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Fix google3 import error for BEAM-9383





[jira] [Updated] (BEAM-9415) fix postcommit xlang validate runner

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-9415:
--
Resolution: Fixed
Status: Resolved  (was: Open)

> fix postcommit xlang validate runner
> 
>
> Key: BEAM-9415
> URL: https://issues.apache.org/jira/browse/BEAM-9415
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Priority: P3
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Broken since https://builds.apache.org/job/beam_PostCommit_XVR_Flink/1838/
> The proposed PR checks whether the coder is compatible with the Java SDK 
> before rehydrating it from expanded components. The coder id renaming is only 
> needed for Java SDK-compatible coders, so the change is safe.





[jira] [Updated] (BEAM-10318) fix uninitialized grpc_server in FnApiRunner

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-10318:
---
Resolution: Fixed
Status: Resolved  (was: Open)

> fix uninitialized grpc_server in FnApiRunner
> 
>
> Key: BEAM-10318
> URL: https://issues.apache.org/jira/browse/BEAM-10318
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Heejong Lee
>Priority: P3
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> fix uninitialized grpc_server in FnApiRunner





[jira] [Updated] (BEAM-9335) update hard-coded coder id when translating Java external transforms

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-9335:
--
Resolution: Fixed
Status: Resolved  (was: Open)

> update hard-coded coder id when translating Java external transforms
> 
>
> Key: BEAM-9335
> URL: https://issues.apache.org/jira/browse/BEAM-9335
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Priority: P3
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The hard-coded coder id needs to be updated when translating Java external 
> transforms. Otherwise, the pipeline will fail if the coder id is reused.
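
The failure mode described here, an id being reused when two sets of components are combined, can be illustrated with a small sketch. It is not Beam's translation code: merge_coders, the plain dictionaries standing in for the coder maps, and the numeric suffix scheme are hypothetical, and a real fix would also have to rewrite every reference to a renamed id.

```python
def merge_coders(existing, incoming):
    """Merge incoming coder entries into existing ones, renaming on collision.

    Both arguments map a coder id to an opaque coder spec. Returns a tuple
    (merged, renames), where renames maps each colliding incoming id to the
    fresh id it was given.
    """
    merged = dict(existing)
    renames = {}
    for coder_id, spec in incoming.items():
        new_id = coder_id
        suffix = 0
        # Keep appending a counter until the id is unused in the merged map.
        while new_id in merged:
            suffix += 1
            new_id = f"{coder_id}_{suffix}"
        if new_id != coder_id:
            renames[coder_id] = new_id
        merged[new_id] = spec
    return merged, renames
```

Without the renaming step, the second "coder_1" would silently overwrite the first, which is the kind of reuse the issue warns about.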





[jira] [Updated] (BEAM-7534) add --mountTempDir option for easier data sharing with Docker container

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-7534:
--
Resolution: Won't Do
Status: Resolved  (was: Open)

> add --mountTempDir option for easier data sharing with Docker container
> ---
>
> Key: BEAM-7534
> URL: https://issues.apache.org/jira/browse/BEAM-7534
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Priority: P3
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> add --mountTempDir option for easier data sharing with Docker container.





[jira] [Updated] (BEAM-7124) adding kafkaio test in Python validateCrossLanguageRunner

2022-01-12 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-7124:
--
Resolution: Won't Do
Status: Resolved  (was: Open)

> adding kafkaio test in Python validateCrossLanguageRunner
> -
>
> Key: BEAM-7124
> URL: https://issues.apache.org/jira/browse/BEAM-7124
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Heejong Lee
>Priority: P3
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> adding low-level kafkaio test in Python validateCrossLanguageRunner





[jira] [Updated] (BEAM-13629) Update GCS artifact type for Dataflow Go

2022-01-10 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13629:
---
Description: Update GCS artifact type for Dataflow Go  (was: Update 
artifact type for Dataflow Go)

> Update GCS artifact type for Dataflow Go
> 
>
> Key: BEAM-13629
> URL: https://issues.apache.org/jira/browse/BEAM-13629
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>
> Update GCS artifact type for Dataflow Go





[jira] [Updated] (BEAM-13629) Update GCS artifact type for Dataflow Go

2022-01-10 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13629:
---
Summary: Update GCS artifact type for Dataflow Go  (was: Update artifact 
type for Dataflow Go)

> Update GCS artifact type for Dataflow Go
> 
>
> Key: BEAM-13629
> URL: https://issues.apache.org/jira/browse/BEAM-13629
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>
> Update artifact type for Dataflow Go





[jira] [Created] (BEAM-13629) Update artifact type for Dataflow Go

2022-01-10 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-13629:
--

 Summary: Update artifact type for Dataflow Go
 Key: BEAM-13629
 URL: https://issues.apache.org/jira/browse/BEAM-13629
 Project: Beam
  Issue Type: Improvement
  Components: runner-dataflow, sdk-go
Reporter: Heejong Lee
Assignee: Heejong Lee


Update artifact type for Dataflow Go





[jira] [Updated] (BEAM-13615) Bumping up FnApi environment version to 9 in Java, Python SDK

2022-01-07 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13615:
---
Description: Bumping up FnApi environment version to 9 in Java, Python SDK. 
This is a blocker for Python multi-language support GA. Hope we could 
cherry-pick this into 2.36.0 release.  (was: Bumping up FnApi environment 
version to 9. This is a blocker for Python multi-language support GA. Hope we 
could cherry-pick this into 2.36.0 release.)

> Bumping up FnApi environment version to 9 in Java, Python SDK
> -
>
> Key: BEAM-13615
> URL: https://issues.apache.org/jira/browse/BEAM-13615
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py-core
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P1
> Fix For: 2.36.0
>
>
> Bumping up FnApi environment version to 9 in Java, Python SDK. This is a 
> blocker for Python multi-language support GA. Hope we could cherry-pick this 
> into 2.36.0 release.





[jira] [Updated] (BEAM-13615) Bumping up FnApi environment version to 9

2022-01-07 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13615:
---
Component/s: (was: sdk-go)

> Bumping up FnApi environment version to 9
> -
>
> Key: BEAM-13615
> URL: https://issues.apache.org/jira/browse/BEAM-13615
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py-core
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P1
> Fix For: 2.36.0
>
>
> Bumping up FnApi environment version to 9. This is a blocker for Python 
> multi-language support GA. Hope we could cherry-pick this into 2.36.0 release.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13615) Bumping up FnApi environment version to 9 in Java, Python SDK

2022-01-07 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13615:
---
Summary: Bumping up FnApi environment version to 9 in Java, Python SDK  
(was: Bumping up FnApi environment version to 9)

> Bumping up FnApi environment version to 9 in Java, Python SDK
> -
>
> Key: BEAM-13615
> URL: https://issues.apache.org/jira/browse/BEAM-13615
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py-core
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P1
> Fix For: 2.36.0
>
>
> Bumping up FnApi environment version to 9. This is a blocker for Python 
> multi-language support GA. Hope we could cherry-pick this into 2.36.0 release.





[jira] [Updated] (BEAM-13615) Bumping up FnApi environment version to 9

2022-01-07 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13615:
---
Priority: P1  (was: P2)

> Bumping up FnApi environment version to 9
> -
>
> Key: BEAM-13615
> URL: https://issues.apache.org/jira/browse/BEAM-13615
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go, sdk-java-core, sdk-py-core
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P1
> Fix For: 2.36.0
>
>
> Bumping up FnApi environment version to 9. This is a blocker for Python 
> multi-language support GA. Hope we could cherry-pick this into 2.36.0 release.





[jira] [Updated] (BEAM-13615) Bumping up FnApi environment version to 9

2022-01-07 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-13615:
---
Status: Open  (was: Triage Needed)

> Bumping up FnApi environment version to 9
> -
>
> Key: BEAM-13615
> URL: https://issues.apache.org/jira/browse/BEAM-13615
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go, sdk-java-core, sdk-py-core
>Reporter: Heejong Lee
>Priority: P2
>
> Bumping up FnApi environment version to 9. This is a blocker for Python 
> multi-language support GA. Hope we could cherry-pick this into 2.36.0 release.




