[jira] [Work logged] (BEAM-6909) Add location support for BigQueryWrapper._get_query_results()

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6909?focusedWorklogId=218458=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218458
 ]

ASF GitHub Bot logged work on BEAM-6909:


Author: ASF GitHub Bot
Created on: 26/Mar/19 04:50
Start Date: 26/Mar/19 04:50
Worklog Time Spent: 10m 
  Work Description: ryanyuan commented on issue #8139: [BEAM-6909] Add 
location support for BigQueryWrapper._get_query_resul…
URL: https://github.com/apache/beam/pull/8139#issuecomment-476473547
 
 
   R: @pabloem 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218458)
Time Spent: 0.5h  (was: 20m)

> Add location support for BigQueryWrapper._get_query_results()
> -
>
> Key: BEAM-6909
> URL: https://issues.apache.org/jira/browse/BEAM-6909
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ryan Yuan
>Priority: Critical
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a BQ Job is created outside US or EU, the _get_query_results() in 
> bigquery_tools.py always return 404.
>  
> This is to patch the _get_query_results() so that it can support location as 
> parameter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6909) Add location support for BigQueryWrapper._get_query_results()

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6909?focusedWorklogId=218456=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218456
 ]

ASF GitHub Bot logged work on BEAM-6909:


Author: ASF GitHub Bot
Created on: 26/Mar/19 04:43
Start Date: 26/Mar/19 04:43
Worklog Time Spent: 10m 
  Work Description: ryanyuan commented on issue #8139: [BEAM-6909] Add 
location support for BigQueryWrapper._get_query_resul…
URL: https://github.com/apache/beam/pull/8139#issuecomment-476473547
 
 
   @pabloem 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218456)
Time Spent: 20m  (was: 10m)

> Add location support for BigQueryWrapper._get_query_results()
> -
>
> Key: BEAM-6909
> URL: https://issues.apache.org/jira/browse/BEAM-6909
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ryan Yuan
>Priority: Critical
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a BQ Job is created outside US or EU, the _get_query_results() in 
> bigquery_tools.py always return 404.
>  
> This is to patch the _get_query_results() so that it can support location as 
> parameter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6910) Beam does not consider BigQuery's processing location when getting query results

2019-03-25 Thread Graham Polley (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Graham Polley updated BEAM-6910:

Description: 
When using the BigQuery source with a SQL query in a pipeline, the "processing 
location" is not taken into consideration and the pipeline fails.

For example, consider the following which uses {{BigQuerySource}} to read from 
BigQuery using some SQL. The BigQuery dataset and tables are located in 
{{australia-southeast1}}. The query is submitted successfully ([Beam works out 
the processing location by examining the first table referenced in the query 
and sets it 
accordingly|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L221]),
 but when Beam attempts to poll for the job status after it has been submitted, 
it fails because it doesn't set the {{location}} to be 
{{australia-southeast1}}, which is required by BigQuery:

 
{code:java}
p | 'read' >> beam.io.Read(beam.io.BigQuerySource(use_standard_sql=True, 
query='SELECT * from 
`a_project_id.dataset_in_australia.table_in_australia`'){code}
 
{code:java}
HttpNotFoundError: HttpError accessing 
:
 response: <{'status': '404', 'content-length': '328', 'x-xss-protection': '1; 
mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 
'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', 
'-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Tue, 26 Mar 
2019 03:11:32 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc': 'quic=":443"; 
ma=2592000; v="46,44,43,39"', 'content-type': 'application/json; 
charset=UTF-8'}>, content <{
  "error": {
    "code": 404,
    "message": "Not found: Job a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "errors": [
      {
    "message": "Not found: Job 
a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "domain": "global",
    "reason": "notFound"
  }
    ],
    "status": "NOT_FOUND"
  }
}
{code}
 

The problem can be seen/found here:

[https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L571]

[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L357]

The location of the job (in this case {{australia-southeast1}}) needs to 
set/inferred (or exposed via the API), otherwise its fails.

 For reference, Airflow had the same bug/problem: 
[https://github.com/apache/airflow/pull/4695]

 

 

  was:
When using the BigQuery source with a query in a pipeline, the "processing 
location" is not taken into consideration and the pipeline fails.

For example, consider the following which uses `BigQuerySource` to read from 
BigQuery using some SQL. The BigQuery dataset and tables are located in 
"australia-southeast1". The query is submitted successfully ([Beam works out 
the processing location by examining the first table referenced in the query 
and sets it 
accordingly|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L221]),
 but when Beam attempts to poll for the job status after it has been submitted, 
it fails because it doesn't set the `location` to be "australia-southeast1", 
which is required by BigQuery:

 
{code:java}
p | 'read' >> beam.io.Read(beam.io.BigQuerySource(use_standard_sql=True, 
query='SELECT * from 
`a_project_id.dataset_in_australia.table_in_australia`'){code}
 
{code:java}
HttpNotFoundError: HttpError accessing 
:
 response: <{'status': '404', 'content-length': '328', 'x-xss-protection': '1; 
mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 
'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', 
'-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Tue, 26 Mar 
2019 03:11:32 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc': 'quic=":443"; 
ma=2592000; v="46,44,43,39"', 'content-type': 'application/json; 
charset=UTF-8'}>, content <{
  "error": {
    "code": 404,
    "message": "Not found: Job a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "errors": [
      {
    "message": "Not found: Job 
a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "domain": "global",
    "reason": "notFound"
  }
    ],
    "status": "NOT_FOUND"
  }
}
{code}
 

The problem can be seen/found here:

[https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L571]

[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L357]

The location of the job (in this case "australia-southeast1") needs to 
set/inferred (or exposed via the API), otherwise its fails.

 For reference, Airflow had the same bug/problem: 

[jira] [Work logged] (BEAM-6909) Add location support for BigQueryWrapper._get_query_results()

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6909?focusedWorklogId=218453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218453
 ]

ASF GitHub Bot logged work on BEAM-6909:


Author: ASF GitHub Bot
Created on: 26/Mar/19 04:22
Start Date: 26/Mar/19 04:22
Worklog Time Spent: 10m 
  Work Description: ryanyuan commented on pull request #8139: [BEAM-6909] 
Add location support for BigQueryWrapper._get_query_resul…
URL: https://github.com/apache/beam/pull/8139
 
 
   When a BQ Job is created outside US or EU, the _get_query_results() in 
bigquery_tools.py always return 404.
   
   This is to patch the _get_query_results() so that it can support location as 
parameter.
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)
 | --- | --- | ---
   
   Pre-Commit Tests 

[jira] [Updated] (BEAM-6910) Beam does not consider BigQuery's processing location when getting query results

2019-03-25 Thread Graham Polley (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Graham Polley updated BEAM-6910:

Description: 
When using the BigQuery source with a query in a pipeline, the "processing 
location" is not taken into consideration and the pipeline fails.

For example, consider the following which uses `BigQuerySource` to read from 
BigQuery using some SQL. The BigQuery dataset and tables are located in 
"australia-southeast1". The query is submitted successfully ([Beam works out 
the processing location by examining the first table referenced in the query 
and sets it 
accordingly|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L221]),
 but when Beam attempts to poll for the job status after it has been submitted, 
it fails because it doesn't set the `location` to be "australia-southeast1", 
which is required by BigQuery:

 
{code:java}
p | 'read' >> beam.io.Read(beam.io.BigQuerySource(use_standard_sql=True, 
query='SELECT * from 
`a_project_id.dataset_in_australia.table_in_australia`'){code}
 
{code:java}
HttpNotFoundError: HttpError accessing 
:
 response: <{'status': '404', 'content-length': '328', 'x-xss-protection': '1; 
mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 
'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', 
'-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Tue, 26 Mar 
2019 03:11:32 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc': 'quic=":443"; 
ma=2592000; v="46,44,43,39"', 'content-type': 'application/json; 
charset=UTF-8'}>, content <{
  "error": {
    "code": 404,
    "message": "Not found: Job a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "errors": [
      {
    "message": "Not found: Job 
a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "domain": "global",
    "reason": "notFound"
  }
    ],
    "status": "NOT_FOUND"
  }
}
{code}
 

The problem can be seen/found here:

[https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L571]

[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L357]

The location of the job (in this case "australia-southeast1") needs to 
set/inferred (or exposed via the API), otherwise its fails.

 For reference, Airflow had the same bug/problem: 
[https://github.com/apache/airflow/pull/4695]

 

 

  was:
When using the BigQuery source with a query in a pipeline, the "processing 
location" is not taken into consideration and the pipeline fails.

For example, consider the following which uses `BigQuerySource` to read from 
BigQuery using some SQL. The BigQuery dataset and tables are located in 
"australia-southeast1". The query is submitted successfully ([Beam works out 
the processing location by examining the first table referenced in the query 
and sets it 
accordingly|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L221]),
 but when Beam attempts to poll for the job status after it has been submitted, 
it fails because it doesn't set the `location` to be "australia-southeast1", 
which is required by BigQuery:

 

 
{code:java}
p | 'read' >> beam.io.Read(beam.io.BigQuerySource(use_standard_sql=True, 
query='SELECT * from 
`a_project_id.dataset_in_australia.table_in_australia`'){code}
 

 

 
{code:java}
HttpNotFoundError: HttpError accessing 
:
 response: <{'status': '404', 'content-length': '328', 'x-xss-protection': '1; 
mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 
'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', 
'-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Tue, 26 Mar 
2019 03:11:32 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc': 'quic=":443"; 
ma=2592000; v="46,44,43,39"', 'content-type': 'application/json; 
charset=UTF-8'}>, content <{
  "error": {
    "code": 404,
    "message": "Not found: Job a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "errors": [
      {
    "message": "Not found: Job 
a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "domain": "global",
    "reason": "notFound"
  }
    ],
    "status": "NOT_FOUND"
  }
}
{code}
 

The problem can be seen here:

[https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L571]

[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L357]

The location of the job (in this case "australia-southeast1") needs to 
set/inferred (or exposed via the API), otherwise its fails.

 

For reference, Airflow had the same bug/problem: 
https://github.com/apache/airflow/pull/4695

 


[jira] [Created] (BEAM-6910) Beam does not consider BigQuery's processing location when getting query results

2019-03-25 Thread Graham Polley (JIRA)
Graham Polley created BEAM-6910:
---

 Summary: Beam does not consider BigQuery's processing location 
when getting query results
 Key: BEAM-6910
 URL: https://issues.apache.org/jira/browse/BEAM-6910
 Project: Beam
  Issue Type: Bug
  Components: dependencies, runner-dataflow, sdk-py-core
Affects Versions: 2.11.0
 Environment: Python
Reporter: Graham Polley


When using the BigQuery source with a query in a pipeline, the "processing 
location" is not taken into consideration and the pipeline fails.

For example, consider the following which uses `BigQuerySource` to read from 
BigQuery using some SQL. The BigQuery dataset and tables are located in 
"australia-southeast1". The query is submitted successfully ([Beam works out 
the processing location by examining the first table referenced in the query 
and sets it 
accordingly|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L221]),
 but when Beam attempts to poll for the job status after it has been submitted, 
it fails because it doesn't set the `location` to be "australia-southeast1", 
which is required by BigQuery:

 

 
{code:java}
p | 'read' >> beam.io.Read(beam.io.BigQuerySource(use_standard_sql=True, 
query='SELECT * from 
`a_project_id.dataset_in_australia.table_in_australia`'){code}
 

 

 
{code:java}
HttpNotFoundError: HttpError accessing 
:
 response: <{'status': '404', 'content-length': '328', 'x-xss-protection': '1; 
mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 
'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', 
'-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Tue, 26 Mar 
2019 03:11:32 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc': 'quic=":443"; 
ma=2592000; v="46,44,43,39"', 'content-type': 'application/json; 
charset=UTF-8'}>, content <{
  "error": {
    "code": 404,
    "message": "Not found: Job a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "errors": [
      {
    "message": "Not found: Job 
a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "domain": "global",
    "reason": "notFound"
  }
    ],
    "status": "NOT_FOUND"
  }
}
{code}
 

The problem can be seen here:

[https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L571]

[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L357]

The location of the job (in this case "australia-southeast1") needs to 
set/inferred (or exposed via the API), otherwise its fails.

 

For reference, Airflow had the same bug/problem: 
https://github.com/apache/airflow/pull/4695

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6909) Add location support for _get_query_results()

2019-03-25 Thread Ryan Yuan (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Yuan updated BEAM-6909:

Summary: Add location support for _get_query_results()  (was: Add location 
support for BigqueryJobsGetQueryResultsRequest)

> Add location support for _get_query_results()
> -
>
> Key: BEAM-6909
> URL: https://issues.apache.org/jira/browse/BEAM-6909
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ryan Yuan
>Priority: Critical
>
> When a BQ Job is created outside US or EU, the _get_query_results() in 
> bigquery_tools.py always return 404.
>  
> This is to patch the _get_query_results() so that it can support location as 
> parameter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6909) Add location support for BigQueryWrapper._get_query_results()

2019-03-25 Thread Ryan Yuan (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Yuan updated BEAM-6909:

Summary: Add location support for BigQueryWrapper._get_query_results()  
(was: Add location support for _get_query_results())

> Add location support for BigQueryWrapper._get_query_results()
> -
>
> Key: BEAM-6909
> URL: https://issues.apache.org/jira/browse/BEAM-6909
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ryan Yuan
>Priority: Critical
>
> When a BQ Job is created outside US or EU, the _get_query_results() in 
> bigquery_tools.py always return 404.
>  
> This is to patch the _get_query_results() so that it can support location as 
> parameter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6909) Add location support for BigqueryJobsGetQueryResultsRequest

2019-03-25 Thread Ryan Yuan (JIRA)
Ryan Yuan created BEAM-6909:
---

 Summary: Add location support for 
BigqueryJobsGetQueryResultsRequest
 Key: BEAM-6909
 URL: https://issues.apache.org/jira/browse/BEAM-6909
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Ryan Yuan


When a BQ Job is created outside US or EU, the _get_query_results() in 
bigquery_tools.py always return 404.

 

This is to patch the _get_query_results() so that it can support location as 
parameter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5959) Add Cloud KMS support to GCS copies

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5959?focusedWorklogId=218416=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218416
 ]

ASF GitHub Bot logged work on BEAM-5959:


Author: ASF GitHub Bot
Created on: 26/Mar/19 01:35
Start Date: 26/Mar/19 01:35
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #7744: [BEAM-5959] KMS 
support for BigQuery
URL: https://github.com/apache/beam/pull/7744#discussion_r268915313
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
 ##
 @@ -1534,6 +1550,10 @@ static String getExtractDestinationUri(String 
extractDestinationDir) {
   return toBuilder().setIgnoreUnknownValues(true).build();
 }
 
+Write withKmsKey(String kmsKey) {
 
 Review comment:
   Yes, it should be.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218416)
Time Spent: 31h  (was: 30h 50m)

> Add Cloud KMS support to GCS copies
> ---
>
> Key: BEAM-5959
> URL: https://issues.apache.org/jira/browse/BEAM-5959
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Labels: triaged
> Fix For: 2.11.0
>
>  Time Spent: 31h
>  Remaining Estimate: 0h
>
> Beam SDK currently uses the CopyTo GCS API call, which doesn't support 
> copying objects that Customer Managed Encryption Keys (CMEK).
> CMEKs are managed in Cloud KMS.
> Items (for Java and Python SDKs):
> - Update clients to versions that support KMS keys.
> - Change copyTo API calls to use rewriteTo (Python - directly, Java - 
> possibly convert copyTo API call to use client library)
> - Add unit tests.
> - Add basic tests (DirectRunner and GCS buckets with CMEK).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6257) Can we deprecate the side input paths through PAssert?

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6257?focusedWorklogId=218414=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218414
 ]

ASF GitHub Bot logged work on BEAM-6257:


Author: ASF GitHub Bot
Created on: 26/Mar/19 01:33
Start Date: 26/Mar/19 01:33
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #8094: 
[BEAM-6257] PAssert.thatSingleton: use GBK instead of side inputs
URL: https://github.com/apache/beam/pull/8094#discussion_r268915056
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java
 ##
 @@ -23,6 +23,7 @@
 import static org.hamcrest.Matchers.not;
 import static org.junit.Assert.assertThat;
 
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
 
 Review comment:
   Ah, OK. Of course.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218414)
Time Spent: 2.5h  (was: 2h 20m)

> Can we deprecate the side input paths through PAssert?
> --
>
> Key: BEAM-6257
> URL: https://issues.apache.org/jira/browse/BEAM-6257
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: starter, triaged
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> PAssert has two distinct paths - one uses GBK with a single-firing trigger, 
> and one uses side inputs. Side inputs are usually a later addition to a 
> runner, while GBK is one of the first primitives (with a single firing it is 
> even simple). Filing this against myself to figure out why the side input 
> version is not deprecated, and if it can be deprecated.
> Marking this as a "starter" task because finding and eliminating side input 
> version of PAssert should be fairly easy. You might need help but can ask on 
> dev@



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5959) Add Cloud KMS support to GCS copies

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5959?focusedWorklogId=218408=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218408
 ]

ASF GitHub Bot logged work on BEAM-5959:


Author: ASF GitHub Bot
Created on: 26/Mar/19 01:27
Start Date: 26/Mar/19 01:27
Worklog Time Spent: 10m 
  Work Description: mayansalama commented on pull request #7744: 
[BEAM-5959] KMS support for BigQuery
URL: https://github.com/apache/beam/pull/7744#discussion_r268914070
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
 ##
 @@ -1534,6 +1550,10 @@ static String getExtractDestinationUri(String 
extractDestinationDir) {
   return toBuilder().setIgnoreUnknownValues(true).build();
 }
 
+Write withKmsKey(String kmsKey) {
 
 Review comment:
   Shouldn't this method be public? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218408)
Time Spent: 30h 50m  (was: 30h 40m)

> Add Cloud KMS support to GCS copies
> ---
>
> Key: BEAM-5959
> URL: https://issues.apache.org/jira/browse/BEAM-5959
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Labels: triaged
> Fix For: 2.11.0
>
>  Time Spent: 30h 50m
>  Remaining Estimate: 0h
>
> Beam SDK currently uses the CopyTo GCS API call, which doesn't support 
> copying objects that Customer Managed Encryption Keys (CMEK).
> CMEKs are managed in Cloud KMS.
> Items (for Java and Python SDKs):
> - Update clients to versions that support KMS keys.
> - Change copyTo API calls to use rewriteTo (Python - directly, Java - 
> possibly convert copyTo API call to use client library)
> - Add unit tests.
> - Add basic tests (DirectRunner and GCS buckets with CMEK).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6747) Adding ExternalTransform in JavaSDK

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6747?focusedWorklogId=218378=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218378
 ]

ASF GitHub Bot logged work on BEAM-6747:


Author: ASF GitHub Bot
Created on: 26/Mar/19 00:20
Start Date: 26/Mar/19 00:20
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #7954: [BEAM-6747] 
Adding ExternalTransform in JavaSDK
URL: https://github.com/apache/beam/pull/7954#discussion_r268903658
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java
 ##
 @@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import static 
org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicInteger;
+import javax.annotation.Nullable;
+import org.apache.beam.model.expansion.v1.ExpansionApi;
+import org.apache.beam.model.pipeline.v1.Endpoints;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.runners.AppliedPTransform;
+import org.apache.beam.sdk.transforms.Impulse;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionTuple;
+import org.apache.beam.sdk.values.PInput;
+import org.apache.beam.sdk.values.POutput;
+import org.apache.beam.sdk.values.PValue;
+import org.apache.beam.sdk.values.TupleTag;
+import org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.ByteString;
+import org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ManagedChannelBuilder;
+import 
org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableMap;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Iterables;
+
+/** Cross-language external transform. */
+public class External {
+  private static final String EXPANDED_TRANSFORM_BASE_NAME = "external";
+  private static final String IMPULSE_PREFIX = "IMPULSE";
+  private static AtomicInteger namespaceCounter = new AtomicInteger(0);
+
+  private static final ExpansionServiceClientFactory DEFAULT =
+  new DefaultExpansionServiceClientFactory(
+  endPoint -> 
ManagedChannelBuilder.forTarget(endPoint.getUrl()).usePlaintext().build());
+
+  private static int getFreshNamespaceIndex() {
+return namespaceCounter.getAndIncrement();
+  }
+
+  public static  SingleOutputExpandableTransform of(
+  String urn, byte[] payload, String endpoint) {
 
 Review comment:
   I didn't modify the payload type since its Python counterpart has the same 
type signature: 
https://github.com/apache/beam/blob/0b71f541e93f3bd69af87ad8a6db46ccb4a01ddc/sdks/python/apache_beam/transforms/external.py#L53
   
   I think we can still provide a nicer way for constructing the payload with 
high-level wrappers.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218378)
Time Spent: 2h 40m  (was: 2.5h)

> Adding ExternalTransform in JavaSDK
> ---
>
> Key: BEAM-6747
> URL: https://issues.apache.org/jira/browse/BEAM-6747
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Adding Java counterpart of Python ExternalTransform for testing Python 
> transforms from pipelines in Java SDK.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-6048) pylint-27 failed but not caught in Pre/PostCommit

2019-03-25 Thread Yueyang Qiu (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yueyang Qiu resolved BEAM-6048.
---
   Resolution: Fixed
Fix Version/s: 2.9.0

> pylint-27 failed but not caught in Pre/PostCommit
> -
>
> Key: BEAM-6048
> URL: https://issues.apache.org/jira/browse/BEAM-6048
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Mark Liu
>Assignee: Yueyang Qiu
>Priority: Major
>  Labels: triaged
> Fix For: 2.9.0
>
>
> tox py27-lint is failed but Gradle task :beam-sdks-python:lintPy27 ignore the 
> failure and return as passed. 
> console log:
> {code}
> 14:54:25 Running isort for module apache_beam  gen_protos.py  setup.py  
> test_config.py:
> 14:54:31 ERROR: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/portability/stager.py
>  Imports are incorrectly sorted.
> 14:54:31 --- 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/portability/stager.py:before
>   2018-11-08 21:49:32.444622
> 14:54:31 +++ 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/portability/stager.py:after
>2018-11-08 22:54:29.293067
> 14:54:31 @@ -60,9 +60,9 @@
> 14:54:31  from apache_beam.io.filesystems import FileSystems
> 14:54:31  from apache_beam.options.pipeline_options import SetupOptions
> 14:54:31  from apache_beam.options.pipeline_options import WorkerOptions
> 14:54:31 -from apache_beam.runners.internal import names
> 14:54:31  # TODO(angoenka): Remove reference to dataflow internal names
> 14:54:31  from apache_beam.runners.dataflow.internal.names import 
> DATAFLOW_SDK_TARBALL_FILE
> 14:54:31 +from apache_beam.runners.internal import names
> 14:54:31  from apache_beam.utils import processes
> 14:54:31  
> 14:54:31  # All constants are for internal use only; no 
> backwards-compatibility
> 14:54:31 Command exited with non-zero status 1
> 14:54:31 418.43user 9.73system 1:03.13elapsed 678%CPU (0avgtext+0avgdata 
> 326144maxresident)k
> 14:54:31 0inputs+208outputs (0major+862242minor)pagefaults 0swaps
> 14:54:31 ERROR: InvocationError for command '/usr/bin/time 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/scripts/run_pylint.sh'
>  (exited with code 1)
> 14:54:31 ___ summary 
> 
> 14:54:31 ERROR:   py27-lint: commands failed
> 14:54:31 # Retry once for the specific exit code -11.
> 14:54:31 if [[ $? == -11 ]]; then
> 14:54:31   tox -c tox.ini --recreate -e $1
> 14:54:31 fi
> 14:54:31 :beam-sdks-python:lintPy27 (Thread[Daemon worker,5,main]) completed. 
> Took 1 mins 21.549 secs.
> {code}
> from Jenkins run: 
> https://builds.apache.org/job/beam_PostCommit_Python_Verify_PR/211
> This also happened in PreCommit: 
> https://builds.apache.org/job/beam_PreCommit_Python_Commit/2366/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6619) Add PostCommit suite for integration tests on DataflowRunner

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6619?focusedWorklogId=218377=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218377
 ]

ASF GitHub Bot logged work on BEAM-6619:


Author: ASF GitHub Bot
Created on: 26/Mar/19 00:16
Start Date: 26/Mar/19 00:16
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #8098: [BEAM-6619] 
[BEAM-6593] update gradle to include all py3 it tests
URL: https://github.com/apache/beam/pull/8098#discussion_r268902994
 
 

 ##
 File path: sdks/python/test-suites/direct/py3/build.gradle
 ##
 @@ -29,19 +29,9 @@ task postCommitIT(dependsOn: 'installGcpTest') {
   doLast {
 def batchTests = [
 "apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it",
-
"apache_beam.examples.cookbook.bigquery_tornadoes_it_test:BigqueryTornadoesIT.test_bigquery_tornadoes_it",
 
 Review comment:
   Good point. I think unless there are specific reasons not to run some ITs on 
direct runner, we should run them all in postcommit suite.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218377)
Time Spent: 23h  (was: 22h 50m)

> Add PostCommit suite for integration tests on DataflowRunner
> 
>
> Key: BEAM-6619
> URL: https://issues.apache.org/jira/browse/BEAM-6619
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Labels: triaged
> Fix For: Not applicable
>
>  Time Spent: 23h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6892) Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS if not specified by user.

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6892?focusedWorklogId=218367=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218367
 ]

ASF GitHub Bot logged work on BEAM-6892:


Author: ASF GitHub Bot
Created on: 26/Mar/19 00:16
Start Date: 26/Mar/19 00:16
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8135: [BEAM-6892] 
Adding support for auto-creating buckets for BigQuery file loads
URL: https://github.com/apache/beam/pull/8135#discussion_r268890089
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
 ##
 @@ -490,9 +498,93 @@ def __init__(
 # job to run - and thus we avoid using temporary tables
 self.temp_tables = True if callable(destination) else False
 
+self.kms_key = kms_key
+
+self._validate = validate
+
+  def verify(self, options):
+
+self._custom_gcs_temp_location = (
+self._custom_gcs_temp_location
+or options.view_as(GoogleCloudOptions).temp_location)
+
+if (not self._custom_gcs_temp_location or
+not self._custom_gcs_temp_location.startswith('gs://')):
+
+  logging.info('No appropriate location was provided to perform file loads'
+   'to GCS.')
+  bucket = self.try_to_create_default_gcs_bucket(options)
+
+  if bucket:
+self._custom_gcs_temp_location = 'gs://%s/temp/' % bucket.name
+return
+
+  raise ValueError('Invalid GCS location.\n'
+   'Writing to BigQuery with FILE_LOADS method requires a '
+   'GCS location to be provided to write files to be '
+   'loaded into BigQuery. Please provide a GCS bucket, or '
+   'pass method="STREAMING_INSERTS" to WriteToBigQuery.')
+
+  def try_to_create_default_gcs_bucket(self, options):
+DEFAULT_BUCKET_NAME = "dataflow-staging-%s-%s"
 
 Review comment:
   I think this feature is better placed in something like dataflow_runner.py, 
because it's not BQ-specific. For example we could use this bucket when the 
--staging_location option is not given (like the Java SDK does).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218367)
Time Spent: 20m  (was: 10m)

> Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS 
> if not specified by user.
> ---
>
> Key: BEAM-6892
> URL: https://issues.apache.org/jira/browse/BEAM-6892
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6892) Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS if not specified by user.

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6892?focusedWorklogId=218376=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218376
 ]

ASF GitHub Bot logged work on BEAM-6892:


Author: ASF GitHub Bot
Created on: 26/Mar/19 00:16
Start Date: 26/Mar/19 00:16
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8135: [BEAM-6892] 
Adding support for auto-creating buckets for BigQuery file loads
URL: https://github.com/apache/beam/pull/8135#discussion_r268901248
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -152,6 +152,37 @@ def _set_rewrite_response_callback(self, callback):
 """
 self._rewrite_cb = callback
 
+  def get_bucket(self, bucket_name):
+"""Returns an object bucket from its name, or None if it does not exist."""
+try:
+  request = storage.StorageBucketsGetRequest(bucket=bucket_name)
+  return self.client.buckets.Get(request)
+except HttpError:
+  return None
+
+  def insert_bucket(self,
+bucket_name,
+project,
+kms_key=None,
+location=None):
+"""Create and return a GCS bucket in a specific project."""
 
 Review comment:
   Please document return types and what None means. (or better yet don't 
return None on failure; just pass the exception)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218376)
Time Spent: 1.5h  (was: 1h 20m)

> Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS 
> if not specified by user.
> ---
>
> Key: BEAM-6892
> URL: https://issues.apache.org/jira/browse/BEAM-6892
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6892) Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS if not specified by user.

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6892?focusedWorklogId=218372=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218372
 ]

ASF GitHub Bot logged work on BEAM-6892:


Author: ASF GitHub Bot
Created on: 26/Mar/19 00:16
Start Date: 26/Mar/19 00:16
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8135: [BEAM-6892] 
Adding support for auto-creating buckets for BigQuery file loads
URL: https://github.com/apache/beam/pull/8135#discussion_r268898602
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
 ##
 @@ -472,7 +478,9 @@ def __init__(
   coder=None,
   max_file_size=None,
   max_files_per_bundle=None,
-  test_client=None):
+  test_client=None,
+  kms_key=None,
+  validate=True):
 
 Review comment:
   Also, this name is not very descriptive. Maybe: `validate_pipeline_options` 
or `validate_temp_location`?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218372)
Time Spent: 1h  (was: 50m)

> Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS 
> if not specified by user.
> ---
>
> Key: BEAM-6892
> URL: https://issues.apache.org/jira/browse/BEAM-6892
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6892) Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS if not specified by user.

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6892?focusedWorklogId=218375=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218375
 ]

ASF GitHub Bot logged work on BEAM-6892:


Author: ASF GitHub Bot
Created on: 26/Mar/19 00:16
Start Date: 26/Mar/19 00:16
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8135: [BEAM-6892] 
Adding support for auto-creating buckets for BigQuery file loads
URL: https://github.com/apache/beam/pull/8135#discussion_r268901800
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -152,6 +152,37 @@ def _set_rewrite_response_callback(self, callback):
 """
 self._rewrite_cb = callback
 
+  def get_bucket(self, bucket_name):
 
 Review comment:
   Could you update this method to look more like `def exists(...)` below? (add 
retries, return None on 404 but otherwise raise)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218375)
Time Spent: 1h 20m  (was: 1h 10m)

> Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS 
> if not specified by user.
> ---
>
> Key: BEAM-6892
> URL: https://issues.apache.org/jira/browse/BEAM-6892
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6892) Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS if not specified by user.

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6892?focusedWorklogId=218368=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218368
 ]

ASF GitHub Bot logged work on BEAM-6892:


Author: ASF GitHub Bot
Created on: 26/Mar/19 00:16
Start Date: 26/Mar/19 00:16
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8135: [BEAM-6892] 
Adding support for auto-creating buckets for BigQuery file loads
URL: https://github.com/apache/beam/pull/8135#discussion_r268895938
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
 ##
 @@ -490,9 +498,93 @@ def __init__(
 # job to run - and thus we avoid using temporary tables
 self.temp_tables = True if callable(destination) else False
 
+self.kms_key = kms_key
+
+self._validate = validate
+
+  def verify(self, options):
+
+self._custom_gcs_temp_location = (
+self._custom_gcs_temp_location
+or options.view_as(GoogleCloudOptions).temp_location)
+
+if (not self._custom_gcs_temp_location or
+not self._custom_gcs_temp_location.startswith('gs://')):
+
+  logging.info('No appropriate location was provided to perform file loads'
+   'to GCS.')
+  bucket = self.try_to_create_default_gcs_bucket(options)
+
+  if bucket:
+self._custom_gcs_temp_location = 'gs://%s/temp/' % bucket.name
+return
+
+  raise ValueError('Invalid GCS location.\n'
+   'Writing to BigQuery with FILE_LOADS method requires a '
+   'GCS location to be provided to write files to be '
+   'loaded into BigQuery. Please provide a GCS bucket, or '
+   'pass method="STREAMING_INSERTS" to WriteToBigQuery.')
+
+  def try_to_create_default_gcs_bucket(self, options):
+DEFAULT_BUCKET_NAME = "dataflow-staging-%s-%s"
+DEFAULT_REGION = "US"
+logging.info('Attempting to get or create a default GCS bucket.')
+
+project_name = options.view_as(GoogleCloudOptions).project
+
+if not project_name and isinstance(self.destination,
+   bigquery_api.TableReference):
+  project_name = self.destination.projectId
+
+region = options.view_as(GoogleCloudOptions).region
+
+if not project_name:
+  raise ValueError('--project is a required option.'
+   ' To create a default bucket, Beam needs a project '
+   'parameter passed to your pipeline.')
+
+# Retrieve the project number for the default bucket
+from google.cloud import resource_manager
+client = resource_manager.Client()
+project_number = client.fetch_project(project_name).number
+
+# We get the region, and cut off the zone id if there is one.
+region = (region or DEFAULT_REGION).split('-')[0].lower()
+
+bucket_name = DEFAULT_BUCKET_NAME % (region, project_number)
+
+gcs = gcsio.GcsIO()
+bucket = gcs.get_bucket(bucket_name)
+
+if not bucket:
+  logging.warn(
 
 Review comment:
   I think this should be an info-level message, if it's part of normal flow.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218368)
Time Spent: 0.5h  (was: 20m)

> Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS 
> if not specified by user.
> ---
>
> Key: BEAM-6892
> URL: https://issues.apache.org/jira/browse/BEAM-6892
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6892) Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS if not specified by user.

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6892?focusedWorklogId=218373=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218373
 ]

ASF GitHub Bot logged work on BEAM-6892:


Author: ASF GitHub Bot
Created on: 26/Mar/19 00:16
Start Date: 26/Mar/19 00:16
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8135: [BEAM-6892] 
Adding support for auto-creating buckets for BigQuery file loads
URL: https://github.com/apache/beam/pull/8135#discussion_r268902480
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -152,6 +152,37 @@ def _set_rewrite_response_callback(self, callback):
 """
 self._rewrite_cb = callback
 
+  def get_bucket(self, bucket_name):
+"""Returns an object bucket from its name, or None if it does not exist."""
+try:
+  request = storage.StorageBucketsGetRequest(bucket=bucket_name)
+  return self.client.buckets.Get(request)
+except HttpError:
+  return None
+
+  def insert_bucket(self,
 
 Review comment:
   `create_bucket` is more user friendly. :)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218373)

> Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS 
> if not specified by user.
> ---
>
> Key: BEAM-6892
> URL: https://issues.apache.org/jira/browse/BEAM-6892
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6892) Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS if not specified by user.

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6892?focusedWorklogId=218374=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218374
 ]

ASF GitHub Bot logged work on BEAM-6892:


Author: ASF GitHub Bot
Created on: 26/Mar/19 00:16
Start Date: 26/Mar/19 00:16
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8135: [BEAM-6892] 
Adding support for auto-creating buckets for BigQuery file loads
URL: https://github.com/apache/beam/pull/8135#discussion_r268899333
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
 ##
 @@ -490,9 +498,93 @@ def __init__(
 # job to run - and thus we avoid using temporary tables
 self.temp_tables = True if callable(destination) else False
 
+self.kms_key = kms_key
+
+self._validate = validate
+
+  def verify(self, options):
+
+self._custom_gcs_temp_location = (
+self._custom_gcs_temp_location
 
 Review comment:
   I believe this would raise an exception since 
`self._custom_gcs_temp_location` is not defined yet.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218374)
Time Spent: 1h 10m  (was: 1h)

> Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS 
> if not specified by user.
> ---
>
> Key: BEAM-6892
> URL: https://issues.apache.org/jira/browse/BEAM-6892
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6892) Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS if not specified by user.

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6892?focusedWorklogId=218369=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218369
 ]

ASF GitHub Bot logged work on BEAM-6892:


Author: ASF GitHub Bot
Created on: 26/Mar/19 00:16
Start Date: 26/Mar/19 00:16
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8135: [BEAM-6892] 
Adding support for auto-creating buckets for BigQuery file loads
URL: https://github.com/apache/beam/pull/8135#discussion_r268898027
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
 ##
 @@ -472,7 +478,9 @@ def __init__(
   coder=None,
   max_file_size=None,
   max_files_per_bundle=None,
-  test_client=None):
+  test_client=None,
+  kms_key=None,
+  validate=True):
 
 Review comment:
   When would you want this to be false?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218369)
Time Spent: 40m  (was: 0.5h)

> Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS 
> if not specified by user.
> ---
>
> Key: BEAM-6892
> URL: https://issues.apache.org/jira/browse/BEAM-6892
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6892) Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS if not specified by user.

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6892?focusedWorklogId=218371=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218371
 ]

ASF GitHub Bot logged work on BEAM-6892:


Author: ASF GitHub Bot
Created on: 26/Mar/19 00:16
Start Date: 26/Mar/19 00:16
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8135: [BEAM-6892] 
Adding support for auto-creating buckets for BigQuery file loads
URL: https://github.com/apache/beam/pull/8135#discussion_r268901190
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -152,6 +152,37 @@ def _set_rewrite_response_callback(self, callback):
 """
 self._rewrite_cb = callback
 
+  def get_bucket(self, bucket_name):
+"""Returns an object bucket from its name, or None if it does not exist."""
+try:
+  request = storage.StorageBucketsGetRequest(bucket=bucket_name)
+  return self.client.buckets.Get(request)
+except HttpError:
+  return None
+
+  def insert_bucket(self,
+bucket_name,
+project,
+kms_key=None,
+location=None):
+"""Create and return a GCS bucket in a specific project."""
+encryption = None
+if kms_key:
+  encryption = storage.Bucket.EncryptionValue(kms_key)
+
+request = storage.StorageBucketsInsertRequest(
+bucket=storage.Bucket(
+name=bucket_name,
+location=location,
+encryption=encryption
+),
+project=project,
+)
+try:
+  return self.client.buckets.Insert(request)
+except HttpError:
+  return None
 
 Review comment:
   I think that errors should be surfaced back to the user. Note that any 
methods added to GcsIO may be used by Beam users.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218371)
Time Spent: 1h  (was: 50m)

> Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS 
> if not specified by user.
> ---
>
> Key: BEAM-6892
> URL: https://issues.apache.org/jira/browse/BEAM-6892
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6892) Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS if not specified by user.

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6892?focusedWorklogId=218370=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218370
 ]

ASF GitHub Bot logged work on BEAM-6892:


Author: ASF GitHub Bot
Created on: 26/Mar/19 00:16
Start Date: 26/Mar/19 00:16
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8135: [BEAM-6892] 
Adding support for auto-creating buckets for BigQuery file loads
URL: https://github.com/apache/beam/pull/8135#discussion_r268899688
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
 ##
 @@ -490,9 +498,93 @@ def __init__(
 # job to run - and thus we avoid using temporary tables
 self.temp_tables = True if callable(destination) else False
 
+self.kms_key = kms_key
+
+self._validate = validate
+
+  def verify(self, options):
+
+self._custom_gcs_temp_location = (
+self._custom_gcs_temp_location
 
 Review comment:
   Also, `self._custom_gcs_temp_location` is not used elsewhere.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218370)
Time Spent: 50m  (was: 40m)

> Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS 
> if not specified by user.
> ---
>
> Key: BEAM-6892
> URL: https://issues.apache.org/jira/browse/BEAM-6892
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5911) Python Dataflow IT test fails even though job succeeded

2019-03-25 Thread Yueyang Qiu (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yueyang Qiu resolved BEAM-5911.
---
   Resolution: Fixed
Fix Version/s: 2.10.0

> Python Dataflow IT test fails even though job succeeded
> ---
>
> Key: BEAM-5911
> URL: https://issues.apache.org/jira/browse/BEAM-5911
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Henning Rohde
>Assignee: Yueyang Qiu
>Priority: Major
>  Labels: triaged
> Fix For: 2.10.0
>
>
> https://scans.gradle.com/s/rult2khtcwhvy/console-log?task=:beam-sdks-python:postCommitITTests#L185
> ==
> ERROR: test_wordcount_it (apache_beam.examples.wordcount_it_test.WordCountIT)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
>  line 47, in test_wordcount_it
> self._run_wordcount_it()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
>  line 75, in _run_wordcount_it
> wordcount.run(test_pipeline.get_full_options_as_args(**extra_opts))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount.py",
>  line 115, in run
> result = p.run()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/pipeline.py",
>  line 405, in run
> self._options).run(False)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/pipeline.py",
>  line 418, in run
> return self.runner.run_pipeline(self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 67, in run_pipeline
> hc_assert_that(self.result, pickler.loads(on_success_matcher))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/assert_that.py",
>  line 43, in assert_that
> _assert_match(actual=arg1, matcher=arg2, reason=arg3)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/assert_that.py",
>  line 49, in _assert_match
> if not matcher.matches(actual):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/build/gradleenv/local/lib/pytho
> n2.7/site-packages/hamcrest/core/core/allof.py", line 16, in matches
> if not matcher.matches(item):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/base_matcher.py",
>  line 28, in matches
> match_result = self._matches(item)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/testing/pipeline_verifiers.py",
>  line 139, in _matches
> read_lines = self._read_with_retry()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/utils/retry.py",
>  line 197, in wrapper
> raise_with_traceback(exn, exn_traceback)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/utils/retry.py",
>  line 184, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/testing/pipeline_verifiers.py",
>  line 122, in _read_with_retry
> raise IOError('No such file or directory: %s' % self.file_path)
> IOError: No such file or directory: 
> gs://temp-storage-for-end-to-end-tests/py-it-cloud/output/1540778275/results*-of-*
>  >> begin captured stdout << -
> Found: 
> https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-10-28_18_58_04-427174494982975515?project=apache-beam-testing.
> - >> end captured stdout << --
> --
> XML: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/nosetests.xml
> --
> Ran 18 tests in 3196.711s
> The job succeeded. Race condition or are we checking the wrong location?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Work started] (BEAM-5911) Python Dataflow IT test fails even though job succeeded

2019-03-25 Thread Yueyang Qiu (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-5911 started by Yueyang Qiu.
-
> Python Dataflow IT test fails even though job succeeded
> ---
>
> Key: BEAM-5911
> URL: https://issues.apache.org/jira/browse/BEAM-5911
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Henning Rohde
>Assignee: Yueyang Qiu
>Priority: Major
>  Labels: triaged
>
> https://scans.gradle.com/s/rult2khtcwhvy/console-log?task=:beam-sdks-python:postCommitITTests#L185
> ==
> ERROR: test_wordcount_it (apache_beam.examples.wordcount_it_test.WordCountIT)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
>  line 47, in test_wordcount_it
> self._run_wordcount_it()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
>  line 75, in _run_wordcount_it
> wordcount.run(test_pipeline.get_full_options_as_args(**extra_opts))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount.py",
>  line 115, in run
> result = p.run()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/pipeline.py",
>  line 405, in run
> self._options).run(False)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/pipeline.py",
>  line 418, in run
> return self.runner.run_pipeline(self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 67, in run_pipeline
> hc_assert_that(self.result, pickler.loads(on_success_matcher))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/assert_that.py",
>  line 43, in assert_that
> _assert_match(actual=arg1, matcher=arg2, reason=arg3)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/assert_that.py",
>  line 49, in _assert_match
> if not matcher.matches(actual):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/build/gradleenv/local/lib/pytho
> n2.7/site-packages/hamcrest/core/core/allof.py", line 16, in matches
> if not matcher.matches(item):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/base_matcher.py",
>  line 28, in matches
> match_result = self._matches(item)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/testing/pipeline_verifiers.py",
>  line 139, in _matches
> read_lines = self._read_with_retry()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/utils/retry.py",
>  line 197, in wrapper
> raise_with_traceback(exn, exn_traceback)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/utils/retry.py",
>  line 184, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/testing/pipeline_verifiers.py",
>  line 122, in _read_with_retry
> raise IOError('No such file or directory: %s' % self.file_path)
> IOError: No such file or directory: 
> gs://temp-storage-for-end-to-end-tests/py-it-cloud/output/1540778275/results*-of-*
>  >> begin captured stdout << -
> Found: 
> https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-10-28_18_58_04-427174494982975515?project=apache-beam-testing.
> - >> end captured stdout << --
> --
> XML: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/nosetests.xml
> --
> Ran 18 tests in 3196.711s
> The job succeeded. Race condition or are we checking the wrong location?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5911) Python Dataflow IT test fails even though job succeeded

2019-03-25 Thread Yueyang Qiu (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yueyang Qiu updated BEAM-5911:
--
External issue URL:   (was: https://github.com/apache/beam/pull/7424)

> Python Dataflow IT test fails even though job succeeded
> ---
>
> Key: BEAM-5911
> URL: https://issues.apache.org/jira/browse/BEAM-5911
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Henning Rohde
>Assignee: Yueyang Qiu
>Priority: Major
>  Labels: triaged
>
> https://scans.gradle.com/s/rult2khtcwhvy/console-log?task=:beam-sdks-python:postCommitITTests#L185
> ==
> ERROR: test_wordcount_it (apache_beam.examples.wordcount_it_test.WordCountIT)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
>  line 47, in test_wordcount_it
> self._run_wordcount_it()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
>  line 75, in _run_wordcount_it
> wordcount.run(test_pipeline.get_full_options_as_args(**extra_opts))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount.py",
>  line 115, in run
> result = p.run()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/pipeline.py",
>  line 405, in run
> self._options).run(False)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/pipeline.py",
>  line 418, in run
> return self.runner.run_pipeline(self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 67, in run_pipeline
> hc_assert_that(self.result, pickler.loads(on_success_matcher))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/assert_that.py",
>  line 43, in assert_that
> _assert_match(actual=arg1, matcher=arg2, reason=arg3)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/assert_that.py",
>  line 49, in _assert_match
> if not matcher.matches(actual):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/build/gradleenv/local/lib/pytho
> n2.7/site-packages/hamcrest/core/core/allof.py", line 16, in matches
> if not matcher.matches(item):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/base_matcher.py",
>  line 28, in matches
> match_result = self._matches(item)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/testing/pipeline_verifiers.py",
>  line 139, in _matches
> read_lines = self._read_with_retry()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/utils/retry.py",
>  line 197, in wrapper
> raise_with_traceback(exn, exn_traceback)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/utils/retry.py",
>  line 184, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/testing/pipeline_verifiers.py",
>  line 122, in _read_with_retry
> raise IOError('No such file or directory: %s' % self.file_path)
> IOError: No such file or directory: 
> gs://temp-storage-for-end-to-end-tests/py-it-cloud/output/1540778275/results*-of-*
>  >> begin captured stdout << -
> Found: 
> https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-10-28_18_58_04-427174494982975515?project=apache-beam-testing.
> - >> end captured stdout << --
> --
> XML: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/nosetests.xml
> --
> Ran 18 tests in 3196.711s
> The job succeeded. Race condition or are we checking the wrong location?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5911) Python Dataflow IT test fails even though job succeeded

2019-03-25 Thread Yueyang Qiu (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yueyang Qiu updated BEAM-5911:
--
External issue URL: https://github.com/apache/beam/pull/7424

> Python Dataflow IT test fails even though job succeeded
> ---
>
> Key: BEAM-5911
> URL: https://issues.apache.org/jira/browse/BEAM-5911
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Henning Rohde
>Assignee: Yueyang Qiu
>Priority: Major
>  Labels: triaged
>
> https://scans.gradle.com/s/rult2khtcwhvy/console-log?task=:beam-sdks-python:postCommitITTests#L185
> ==
> ERROR: test_wordcount_it (apache_beam.examples.wordcount_it_test.WordCountIT)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
>  line 47, in test_wordcount_it
> self._run_wordcount_it()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
>  line 75, in _run_wordcount_it
> wordcount.run(test_pipeline.get_full_options_as_args(**extra_opts))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount.py",
>  line 115, in run
> result = p.run()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/pipeline.py",
>  line 405, in run
> self._options).run(False)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/pipeline.py",
>  line 418, in run
> return self.runner.run_pipeline(self)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 67, in run_pipeline
> hc_assert_that(self.result, pickler.loads(on_success_matcher))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/assert_that.py",
>  line 43, in assert_that
> _assert_match(actual=arg1, matcher=arg2, reason=arg3)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/assert_that.py",
>  line 49, in _assert_match
> if not matcher.matches(actual):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/build/gradleenv/local/lib/pytho
> n2.7/site-packages/hamcrest/core/core/allof.py", line 16, in matches
> if not matcher.matches(item):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/build/gradleenv/local/lib/python2.7/site-packages/hamcrest/core/base_matcher.py",
>  line 28, in matches
> match_result = self._matches(item)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/testing/pipeline_verifiers.py",
>  line 139, in _matches
> read_lines = self._read_with_retry()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/utils/retry.py",
>  line 197, in wrapper
> raise_with_traceback(exn, exn_traceback)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/utils/retry.py",
>  line 184, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/testing/pipeline_verifiers.py",
>  line 122, in _read_with_retry
> raise IOError('No such file or directory: %s' % self.file_path)
> IOError: No such file or directory: 
> gs://temp-storage-for-end-to-end-tests/py-it-cloud/output/1540778275/results*-of-*
>  >> begin captured stdout << -
> Found: 
> https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-10-28_18_58_04-427174494982975515?project=apache-beam-testing.
> - >> end captured stdout << --
> --
> XML: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/nosetests.xml
> --
> Ran 18 tests in 3196.711s
> The job succeeded. Race condition or are we checking the wrong location?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6257) Can we deprecate the side input paths through PAssert?

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6257?focusedWorklogId=218349=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218349
 ]

ASF GitHub Bot logged work on BEAM-6257:


Author: ASF GitHub Bot
Created on: 25/Mar/19 23:32
Start Date: 25/Mar/19 23:32
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #8094: [BEAM-6257] 
PAssert.thatSingleton: use GBK instead of side inputs
URL: https://github.com/apache/beam/pull/8094#discussion_r268894613
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java
 ##
 @@ -23,6 +23,7 @@
 import static org.hamcrest.Matchers.not;
 import static org.junit.Assert.assertThat;
 
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
 
 Review comment:
   As far as I can tell, the spotbugs directory structure and class names look 
exactly the same as findbugs: 
https://github.com/spotbugs/spotbugs/blob/master/spotbugs-annotations/src/main/java/edu/umd/cs/findbugs/annotations/SuppressFBWarnings.java
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218349)
Time Spent: 2h 20m  (was: 2h 10m)

> Can we deprecate the side input paths through PAssert?
> --
>
> Key: BEAM-6257
> URL: https://issues.apache.org/jira/browse/BEAM-6257
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: starter, triaged
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> PAssert has two distinct paths - one uses GBK with a single-firing trigger, 
> and one uses side inputs. Side inputs are usually a later addition to a 
> runner, while GBK is one of the first primitives (with a single firing it is 
> even simple). Filing this against myself to figure out why the side input 
> version is not deprecated, and if it can be deprecated.
> Marking this as a "starter" task because finding and eliminating side input 
> version of PAssert should be fairly easy. You might need help but can ask on 
> dev@



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6903) Go IT fails on quota issues frequently

2019-03-25 Thread Robert Burke (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801223#comment-16801223
 ] 

Robert Burke commented on BEAM-6903:


The error:

2019/03/25 06:04:12 Test flatten:flatten failed: googleapi: Error 429: Quota 
exceeded for quota metric 'dataflow.googleapis.com/create_requests' and limit 
'CreateRequestsPerMinutePerUser' of service 'dataflow.googleapis.com' for 
consumer 'project_number:'., rateLimitExceeded


This means that we're hitting our quota with the dataflow service. Anything 
also running against dataflow would be running into this issue. 

The Go SDK doesn't run that many jobs (granted it's ~7 simultaneously), and 
runs on the periodic post-commit unless manually. The only options are to 
somehow get our quota with dataflow increased, or changing the Go SDK 
integration test driver code to add a pause between sending job requests, to 
reduce the rate. This would naturally increase test run time.

Assigning to Jason for triage, and possibly increasing our simultaneous job 
quota. Let me know if adding the wait is desired.

> Go IT fails on quota issues frequently
> --
>
> Key: BEAM-6903
> URL: https://issues.apache.org/jira/browse/BEAM-6903
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Jason Kuster
>Priority: Major
>
> https://builds.apache.org/job/beam_PostCommit_Go/3002/
> https://builds.apache.org/job/beam_PostCommit_Go/3000/
> https://builds.apache.org/job/beam_PostCommit_Go/2997/
> https://builds.apache.org/job/beam_PostCommit_Go/2993/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5576) Beam Dependency Update Request: io.dropwizard.metrics:metrics-core

2019-03-25 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou resolved BEAM-5576.
-
   Resolution: Won't Fix
Fix Version/s: Not applicable

Not going to upgrade this to a RC version. Close it for now.

> Beam Dependency Update Request: io.dropwizard.metrics:metrics-core
> --
>
> Key: BEAM-5576
> URL: https://issues.apache.org/jira/browse/BEAM-5576
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: Major
> Fix For: Not applicable
>
>
>  - 2018-10-01 19:31:51.178224 
> -
> Please consider upgrading the dependency 
> io.dropwizard.metrics:metrics-core. 
> The current version is 3.1.2. The latest version is 4.1.0-rc2 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-08 12:19:45.297255 
> -
> Please consider upgrading the dependency 
> io.dropwizard.metrics:metrics-core. 
> The current version is 3.1.2. The latest version is 4.1.0-rc2 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-15 12:13:37.889638 
> -
> Please consider upgrading the dependency 
> io.dropwizard.metrics:metrics-core. 
> The current version is 3.1.2. The latest version is 4.1.0-rc2 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-22 12:13:57.542553 
> -
> Please consider upgrading the dependency 
> io.dropwizard.metrics:metrics-core. 
> The current version is 3.1.2. The latest version is 4.1.0-rc2 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-29 12:18:45.797770 
> -
> Please consider upgrading the dependency 
> io.dropwizard.metrics:metrics-core. 
> The current version is 3.1.2. The latest version is 4.1.0-rc2 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-05 12:15:32.624922 
> -
> Please consider upgrading the dependency 
> io.dropwizard.metrics:metrics-core. 
> The current version is 3.1.2. The latest version is 4.1.0-rc2 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-12 12:15:32.563497 
> -
> Please consider upgrading the dependency 
> io.dropwizard.metrics:metrics-core. 
> The current version is 3.1.2. The latest version is 4.1.0-rc2 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-19 12:16:14.103011 
> -
> Please consider upgrading the dependency 
> io.dropwizard.metrics:metrics-core. 
> The current version is 3.1.2. The latest version is 4.1.0-rc2 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-26 12:15:14.707561 
> -
> Please consider upgrading the dependency 
> io.dropwizard.metrics:metrics-core. 
> The current version is 3.1.2. The latest version is 4.1.0-rc2 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-03 12:15:42.813697 
> -
> Please consider upgrading the dependency 
> io.dropwizard.metrics:metrics-core. 
> The current version is 3.1.2. The latest version is 4.1.0-rc2 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> 

[jira] [Resolved] (BEAM-6896) Beam Dependency Update Request: PyYAML

2019-03-25 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou resolved BEAM-6896.
-
   Resolution: Won't Fix
Fix Version/s: Not applicable

This is not used by the SDK, but only for the dependency check itself. 

> Beam Dependency Update Request: PyYAML
> --
>
> Key: BEAM-6896
> URL: https://issues.apache.org/jira/browse/BEAM-6896
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: Major
> Fix For: Not applicable
>
>
>  - 2019-03-25 04:17:47.501359 
> -
> Please consider upgrading the dependency PyYAML. 
> The current version is 3.13. The latest version is 5.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-6903) Go IT fails on quota issues frequently

2019-03-25 Thread Robert Burke (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-6903:
--

Assignee: Jason Kuster  (was: Robert Burke)

> Go IT fails on quota issues frequently
> --
>
> Key: BEAM-6903
> URL: https://issues.apache.org/jira/browse/BEAM-6903
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Jason Kuster
>Priority: Major
>
> https://builds.apache.org/job/beam_PostCommit_Go/3002/
> https://builds.apache.org/job/beam_PostCommit_Go/3000/
> https://builds.apache.org/job/beam_PostCommit_Go/2997/
> https://builds.apache.org/job/beam_PostCommit_Go/2993/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-6039) Beam Dependency Update Request: org.apache.rat:apache-rat-tasks

2019-03-25 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou resolved BEAM-6039.
-
   Resolution: Won't Fix
Fix Version/s: Not applicable

There is no plan to upgrade the rat to 0.13. Close this ticket to stop updating 
on it and improve the readability of the dependency report.

> Beam Dependency Update Request: org.apache.rat:apache-rat-tasks
> ---
>
> Key: BEAM-6039
> URL: https://issues.apache.org/jira/browse/BEAM-6039
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: Major
> Fix For: Not applicable
>
>
>  - 2018-11-12 12:11:36.170970 
> -
> Please consider upgrading the dependency 
> org.apache.rat:apache-rat-tasks. 
> The current version is 0.12. The latest version is 0.13 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-19 12:12:22.129873 
> -
> Please consider upgrading the dependency 
> org.apache.rat:apache-rat-tasks. 
> The current version is 0.12. The latest version is 0.13 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-26 12:11:02.426441 
> -
> Please consider upgrading the dependency 
> org.apache.rat:apache-rat-tasks. 
> The current version is 0.12. The latest version is 0.13 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-03 12:11:50.802389 
> -
> Please consider upgrading the dependency 
> org.apache.rat:apache-rat-tasks. 
> The current version is 0.12. The latest version is 0.13 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-10 12:14:09.855564 
> -
> Please consider upgrading the dependency 
> org.apache.rat:apache-rat-tasks. 
> The current version is 0.12. The latest version is 0.13 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-17 12:14:43.026726 
> -
> Please consider upgrading the dependency 
> org.apache.rat:apache-rat-tasks. 
> The current version is 0.12. The latest version is 0.13 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-31 15:20:55.703600 
> -
> Please consider upgrading the dependency 
> org.apache.rat:apache-rat-tasks. 
> The current version is 0.12. The latest version is 0.13 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-01-07 12:24:00.066240 
> -
> Please consider upgrading the dependency 
> org.apache.rat:apache-rat-tasks. 
> The current version is 0.12. The latest version is 0.13 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-01-14 12:12:37.445993 
> -
> Please consider upgrading the dependency 
> org.apache.rat:apache-rat-tasks. 
> The current version is 0.12. The latest version is 0.13 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-01-21 12:19:02.563910 
> -
> Please consider upgrading the dependency 
> org.apache.rat:apache-rat-tasks. 
> The current version is 0.12. The latest version is 0.13 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> 

[jira] [Created] (BEAM-6908) Supports Python3 benchmark

2019-03-25 Thread Mark Liu (JIRA)
Mark Liu created BEAM-6908:
--

 Summary: Supports Python3 benchmark
 Key: BEAM-6908
 URL: https://issues.apache.org/jira/browse/BEAM-6908
 Project: Beam
  Issue Type: Sub-task
  Components: testing
Reporter: Mark Liu
Assignee: Mark Liu


Similar to 
[beam_PerformanceTests_Python|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_Python/],
 we want to have a Python3 benchmark running on Jenkins to detect performance 
regression during code adoption.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=218342=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218342
 ]

ASF GitHub Bot logged work on BEAM-4374:


Author: ASF GitHub Bot
Created on: 25/Mar/19 23:03
Start Date: 25/Mar/19 23:03
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8136: 
[DoNotMerge][DoNotReview][BEAM-4374] Emit MeanByteCount distribution tuple 
system metric from …
URL: https://github.com/apache/beam/pull/8136#issuecomment-476410476
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218342)
Time Spent: 9h 50m  (was: 9h 40m)

> Update existing metrics in the FN API to use new Metric Schema
> --
>
> Key: BEAM-4374
> URL: https://issues.apache.org/jira/browse/BEAM-4374
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Alex Amato
>Priority: Major
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> Update existing metrics to use the new proto and cataloging schema defined in:
> [_https://s.apache.org/beam-fn-api-metrics_]
>  * Check in new protos
>  * Define catalog file for metrics
>  * Port existing metrics to use this new format, based on catalog 
> names+metadata



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=218344=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218344
 ]

ASF GitHub Bot logged work on BEAM-4374:


Author: ASF GitHub Bot
Created on: 25/Mar/19 23:03
Start Date: 25/Mar/19 23:03
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8136: 
[DoNotMerge][DoNotReview][BEAM-4374] Emit MeanByteCount distribution tuple 
system metric from …
URL: https://github.com/apache/beam/pull/8136#issuecomment-476410476
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218344)
Time Spent: 10h  (was: 9h 50m)

> Update existing metrics in the FN API to use new Metric Schema
> --
>
> Key: BEAM-4374
> URL: https://issues.apache.org/jira/browse/BEAM-4374
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Alex Amato
>Priority: Major
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> Update existing metrics to use the new proto and cataloging schema defined in:
> [_https://s.apache.org/beam-fn-api-metrics_]
>  * Check in new protos
>  * Define catalog file for metrics
>  * Port existing metrics to use this new format, based on catalog 
> names+metadata



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6905) Add support for User Distribution metrics for Python Streaming

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6905?focusedWorklogId=218339=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218339
 ]

ASF GitHub Bot logged work on BEAM-6905:


Author: ASF GitHub Bot
Created on: 25/Mar/19 23:01
Start Date: 25/Mar/19 23:01
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8131: [BEAM-6905] Add 
support for user distribution metrics.
URL: https://github.com/apache/beam/pull/8131#issuecomment-476410033
 
 
   java: io.hadoop test failed
   python: setupEnv for py3 failed.
   Both seem flakes. Trying to rerun.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218339)
Time Spent: 1h 50m  (was: 1h 40m)

> Add support for User Distribution metrics for Python Streaming
> --
>
> Key: BEAM-6905
> URL: https://issues.apache.org/jira/browse/BEAM-6905
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-harness
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> # Add new UserDistributionCounter urn to metrics.proto
>  # Report UserDistribution metric in python SDK
>  # Plumb User Distribution metric in Dataflow runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6905) Add support for User Distribution metrics for Python Streaming

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6905?focusedWorklogId=218336=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218336
 ]

ASF GitHub Bot logged work on BEAM-6905:


Author: ASF GitHub Bot
Created on: 25/Mar/19 23:01
Start Date: 25/Mar/19 23:01
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8131: [BEAM-6905] Add 
support for user distribution metrics.
URL: https://github.com/apache/beam/pull/8131#issuecomment-476409917
 
 
   run python precommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218336)
Time Spent: 1h 40m  (was: 1.5h)

> Add support for User Distribution metrics for Python Streaming
> --
>
> Key: BEAM-6905
> URL: https://issues.apache.org/jira/browse/BEAM-6905
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-harness
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> # Add new UserDistributionCounter urn to metrics.proto
>  # Report UserDistribution metric in python SDK
>  # Plumb User Distribution metric in Dataflow runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6905) Add support for User Distribution metrics for Python Streaming

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6905?focusedWorklogId=218340=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218340
 ]

ASF GitHub Bot logged work on BEAM-6905:


Author: ASF GitHub Bot
Created on: 25/Mar/19 23:01
Start Date: 25/Mar/19 23:01
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8131: [BEAM-6905] Add 
support for user distribution metrics.
URL: https://github.com/apache/beam/pull/8131#issuecomment-476409917
 
 
   run python precommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218340)
Time Spent: 2h  (was: 1h 50m)

> Add support for User Distribution metrics for Python Streaming
> --
>
> Key: BEAM-6905
> URL: https://issues.apache.org/jira/browse/BEAM-6905
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-harness
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> # Add new UserDistributionCounter urn to metrics.proto
>  # Report UserDistribution metric in python SDK
>  # Plumb User Distribution metric in Dataflow runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6905) Add support for User Distribution metrics for Python Streaming

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6905?focusedWorklogId=218341=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218341
 ]

ASF GitHub Bot logged work on BEAM-6905:


Author: ASF GitHub Bot
Created on: 25/Mar/19 23:01
Start Date: 25/Mar/19 23:01
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8131: [BEAM-6905] Add 
support for user distribution metrics.
URL: https://github.com/apache/beam/pull/8131#issuecomment-476409679
 
 
   run java precommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218341)
Time Spent: 2h 10m  (was: 2h)

> Add support for User Distribution metrics for Python Streaming
> --
>
> Key: BEAM-6905
> URL: https://issues.apache.org/jira/browse/BEAM-6905
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-harness
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> # Add new UserDistributionCounter urn to metrics.proto
>  # Report UserDistribution metric in python SDK
>  # Plumb User Distribution metric in Dataflow runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6905) Add support for User Distribution metrics for Python Streaming

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6905?focusedWorklogId=218335=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218335
 ]

ASF GitHub Bot logged work on BEAM-6905:


Author: ASF GitHub Bot
Created on: 25/Mar/19 23:00
Start Date: 25/Mar/19 23:00
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8131: [BEAM-6905] Add 
support for user distribution metrics.
URL: https://github.com/apache/beam/pull/8131#issuecomment-476409679
 
 
   run java precommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218335)
Time Spent: 1.5h  (was: 1h 20m)

> Add support for User Distribution metrics for Python Streaming
> --
>
> Key: BEAM-6905
> URL: https://issues.apache.org/jira/browse/BEAM-6905
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-harness
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> # Add new UserDistributionCounter urn to metrics.proto
>  # Report UserDistribution metric in python SDK
>  # Plumb User Distribution metric in Dataflow runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6907) Standardize Gradle projects/tasks structure for Python SDK

2019-03-25 Thread Mark Liu (JIRA)
Mark Liu created BEAM-6907:
--

 Summary: Standardize Gradle projects/tasks structure for Python SDK
 Key: BEAM-6907
 URL: https://issues.apache.org/jira/browse/BEAM-6907
 Project: Beam
  Issue Type: Task
  Components: build-system
Reporter: Mark Liu
Assignee: Mark Liu


As Gradle parallelism applied to Python tests and more python versions added to 
tests, the way Gradle manages projects/tasks changed a lot. It will be better 
to standardize how we use Gradle to manage Python tests/builds/tasks across 
different versions and runners.

In general, we may want to:
- Apply parallel execution per version per runner.
- Group basic tasks that are used in most projects in BeamModulePlugins.
- Avoid deeply nested directory structure for build.gradle.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6892) Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS if not specified by user.

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6892?focusedWorklogId=218327=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218327
 ]

ASF GitHub Bot logged work on BEAM-6892:


Author: ASF GitHub Bot
Created on: 25/Mar/19 22:43
Start Date: 25/Mar/19 22:43
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8135: [BEAM-6892] Adding 
support for auto-creating buckets for BigQuery file loads
URL: https://github.com/apache/beam/pull/8135#issuecomment-476405845
 
 
   Run Python PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218327)
Time Spent: 10m
Remaining Estimate: 0h

> Use temp_location for BQ FILE_LOADS on DirectRunner, and autocreate it in GCS 
> if not specified by user.
> ---
>
> Key: BEAM-6892
> URL: https://issues.apache.org/jira/browse/BEAM-6892
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=218320=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218320
 ]

ASF GitHub Bot logged work on BEAM-4374:


Author: ASF GitHub Bot
Created on: 25/Mar/19 22:35
Start Date: 25/Mar/19 22:35
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on pull request #8136: 
[DoNotMerge][DoNotReview][BEAM-4374] Emit MeanByteCount distribution tuple 
system metric from …
URL: https://github.com/apache/beam/pull/8136
 
 
   …Python SDK
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)
 | --- | --- | ---
   
   Pre-Commit Tests Status (on master branch)
   

   
   --- |Java | Python | Go | Website
   

[jira] [Work logged] (BEAM-6905) Add support for User Distribution metrics for Python Streaming

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6905?focusedWorklogId=218316=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218316
 ]

ASF GitHub Bot logged work on BEAM-6905:


Author: ASF GitHub Bot
Created on: 25/Mar/19 22:33
Start Date: 25/Mar/19 22:33
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8131: [BEAM-6905] Add 
support for user distribution metrics.
URL: https://github.com/apache/beam/pull/8131#issuecomment-476394890
 
 
   Run RAT PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218316)
Time Spent: 1h 20m  (was: 1h 10m)

> Add support for User Distribution metrics for Python Streaming
> --
>
> Key: BEAM-6905
> URL: https://issues.apache.org/jira/browse/BEAM-6905
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-harness
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> # Add new UserDistributionCounter urn to metrics.proto
>  # Report UserDistribution metric in python SDK
>  # Plumb User Distribution metric in Dataflow runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6905) Add support for User Distribution metrics for Python Streaming

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6905?focusedWorklogId=218300=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218300
 ]

ASF GitHub Bot logged work on BEAM-6905:


Author: ASF GitHub Bot
Created on: 25/Mar/19 22:02
Start Date: 25/Mar/19 22:02
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8131: [BEAM-6905] Add 
support for user distribution metrics.
URL: https://github.com/apache/beam/pull/8131#issuecomment-476394373
 
 
   Run Spotless PreCommit
   Run RAT PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218300)
Time Spent: 1h  (was: 50m)

> Add support for User Distribution metrics for Python Streaming
> --
>
> Key: BEAM-6905
> URL: https://issues.apache.org/jira/browse/BEAM-6905
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-harness
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> # Add new UserDistributionCounter urn to metrics.proto
>  # Report UserDistribution metric in python SDK
>  # Plumb User Distribution metric in Dataflow runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6905) Add support for User Distribution metrics for Python Streaming

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6905?focusedWorklogId=218301=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218301
 ]

ASF GitHub Bot logged work on BEAM-6905:


Author: ASF GitHub Bot
Created on: 25/Mar/19 22:02
Start Date: 25/Mar/19 22:02
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8131: [BEAM-6905] Add 
support for user distribution metrics.
URL: https://github.com/apache/beam/pull/8131#issuecomment-476394890
 
 
   Run RAT PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218301)
Time Spent: 1h 10m  (was: 1h)

> Add support for User Distribution metrics for Python Streaming
> --
>
> Key: BEAM-6905
> URL: https://issues.apache.org/jira/browse/BEAM-6905
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-harness
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> # Add new UserDistributionCounter urn to metrics.proto
>  # Report UserDistribution metric in python SDK
>  # Plumb User Distribution metric in Dataflow runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6905) Add support for User Distribution metrics for Python Streaming

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6905?focusedWorklogId=218298=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218298
 ]

ASF GitHub Bot logged work on BEAM-6905:


Author: ASF GitHub Bot
Created on: 25/Mar/19 22:00
Start Date: 25/Mar/19 22:00
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8131: [BEAM-6905] Add 
support for user distribution metrics.
URL: https://github.com/apache/beam/pull/8131#issuecomment-476394373
 
 
   Run Spotless PreCommit
   Run RAT PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218298)
Time Spent: 50m  (was: 40m)

> Add support for User Distribution metrics for Python Streaming
> --
>
> Key: BEAM-6905
> URL: https://issues.apache.org/jira/browse/BEAM-6905
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-harness
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> # Add new UserDistributionCounter urn to metrics.proto
>  # Report UserDistribution metric in python SDK
>  # Plumb User Distribution metric in Dataflow runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6905) Add support for User Distribution metrics for Python Streaming

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6905?focusedWorklogId=218297=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218297
 ]

ASF GitHub Bot logged work on BEAM-6905:


Author: ASF GitHub Bot
Created on: 25/Mar/19 21:58
Start Date: 25/Mar/19 21:58
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8131: [BEAM-6905] Add 
support for user distribution metrics.
URL: https://github.com/apache/beam/pull/8131#issuecomment-476392812
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218297)
Time Spent: 40m  (was: 0.5h)

> Add support for User Distribution metrics for Python Streaming
> --
>
> Key: BEAM-6905
> URL: https://issues.apache.org/jira/browse/BEAM-6905
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-harness
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> # Add new UserDistributionCounter urn to metrics.proto
>  # Report UserDistribution metric in python SDK
>  # Plumb User Distribution metric in Dataflow runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6905) Add support for User Distribution metrics for Python Streaming

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6905?focusedWorklogId=218294=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218294
 ]

ASF GitHub Bot logged work on BEAM-6905:


Author: ASF GitHub Bot
Created on: 25/Mar/19 21:55
Start Date: 25/Mar/19 21:55
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8131: [BEAM-6905] Add 
support for user distribution metrics.
URL: https://github.com/apache/beam/pull/8131#issuecomment-476392812
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218294)
Time Spent: 0.5h  (was: 20m)

> Add support for User Distribution metrics for Python Streaming
> --
>
> Key: BEAM-6905
> URL: https://issues.apache.org/jira/browse/BEAM-6905
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-harness
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> # Add new UserDistributionCounter urn to metrics.proto
>  # Report UserDistribution metric in python SDK
>  # Plumb User Distribution metric in Dataflow runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6284) [FLAKE][beam_PostCommit_Java_ValidatesRunner_Dataflow] TestRunner fails with result UNKNOWN on succeeded job and checks passed

2019-03-25 Thread Daniel Oliveira (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801156#comment-16801156
 ] 

Daniel Oliveira commented on BEAM-6284:
---

Sorry for not updating here. It looks like the internal discussion ended up 
going nowhere. This bug still seems to be happening though (BEAM-6882) so I'll 
try to find someone with Dataflow experience to assign this to.

> [FLAKE][beam_PostCommit_Java_ValidatesRunner_Dataflow] TestRunner fails with 
> result UNKNOWN on succeeded job and checks passed
> --
>
> Key: BEAM-6284
> URL: https://issues.apache.org/jira/browse/BEAM-6284
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures, testing
>Reporter: Mikhail Gryzykhin
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: currently-failing, triaged
>
> _Use this form to file an issue for test failure:_
>  * 
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/testReport/junit/org.apache.beam.sdk.transforms/ViewTest/testWindowedSideInputFixedToGlobal/
> Initial investigation:
> According to logs all test-relevant checks have passed and it seem to be 
> testing framework failure.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6906) Mutating accumulators in fused stages is generally unsafe - need to provide a single mutable accumulator

2019-03-25 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-6906:
--
Priority: Critical  (was: Major)

> Mutating accumulators in fused stages is generally unsafe - need to provide a 
> single mutable accumulator
> 
>
> Key: BEAM-6906
> URL: https://issues.apache.org/jira/browse/BEAM-6906
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Kenneth Knowles
>Assignee: Yueyang Qiu
>Priority: Critical
>
> Our current docs encourage a CombineFn author to mutate accumulators for 
> efficiency. This is important, but cannot be done generally without losing 
> efficiency - it is not safe to share accumulators within a stage or across 
> sliding windows. The ownership story needs to be clear. Any accumulator that 
> is mutable is from that point on owned by the CombineFn, not the runner and 
> cannot be given to other steps.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6906) Mutating accumulators in fused stages is generally unsafe - need to provide a single mutable accumulator

2019-03-25 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-6906:
--
Issue Type: Bug  (was: Test)

> Mutating accumulators in fused stages is generally unsafe - need to provide a 
> single mutable accumulator
> 
>
> Key: BEAM-6906
> URL: https://issues.apache.org/jira/browse/BEAM-6906
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Kenneth Knowles
>Assignee: Yueyang Qiu
>Priority: Major
>
> Our current docs encourage a CombineFn author to mutate accumulators for 
> efficiency. This is important, but cannot be done generally without losing 
> efficiency - it is not safe to share accumulators within a stage or across 
> sliding windows. The ownership story needs to be clear. Any accumulator that 
> is mutable is from that point on owned by the CombineFn, not the runner and 
> cannot be given to other steps.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6906) Mutating accumulators in fused stages is generally unsafe - need to provide a single mutable accumulator

2019-03-25 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-6906:
-

 Summary: Mutating accumulators in fused stages is generally unsafe 
- need to provide a single mutable accumulator
 Key: BEAM-6906
 URL: https://issues.apache.org/jira/browse/BEAM-6906
 Project: Beam
  Issue Type: Test
  Components: beam-model
Reporter: Kenneth Knowles
Assignee: Yueyang Qiu


Our current docs encourage a CombineFn author to mutate accumulators for 
efficiency. This is important, but cannot be done generally without losing 
efficiency - it is not safe to share accumulators within a stage or across 
sliding windows. The ownership story needs to be clear. Any accumulator that is 
mutable is from that point on owned by the CombineFn, not the runner and cannot 
be given to other steps.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5519?focusedWorklogId=218290=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218290
 ]

ASF GitHub Bot logged work on BEAM-5519:


Author: ASF GitHub Bot
Created on: 25/Mar/19 21:42
Start Date: 25/Mar/19 21:42
Worklog Time Spent: 10m 
  Work Description: dmvk commented on issue #6511: [BEAM-5519] Remove call 
to groupByKey in Spark Streaming.
URL: https://github.com/apache/beam/pull/6511#issuecomment-476389018
 
 
   Run Spark ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218290)
Time Spent: 3h 40m  (was: 3.5h)

> Spark Streaming Duplicated Encoding/Decoding Effort
> ---
>
> Key: BEAM-5519
> URL: https://issues.apache.org/jira/browse/BEAM-5519
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
>  Labels: spark, spark-streaming, triaged
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> When using the SparkRunner in streaming mode. There is a call to groupByKey 
> followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this 
> used to cause 2 shuffles but it still causes 2 encode/decode cycles.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5519?focusedWorklogId=218289=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218289
 ]

ASF GitHub Bot logged work on BEAM-5519:


Author: ASF GitHub Bot
Created on: 25/Mar/19 21:42
Start Date: 25/Mar/19 21:42
Worklog Time Spent: 10m 
  Work Description: dmvk commented on issue #6511: [BEAM-5519] Remove call 
to groupByKey in Spark Streaming.
URL: https://github.com/apache/beam/pull/6511#issuecomment-476388952
 
 
   Run Spark Run Spark ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218289)
Time Spent: 3.5h  (was: 3h 20m)

> Spark Streaming Duplicated Encoding/Decoding Effort
> ---
>
> Key: BEAM-5519
> URL: https://issues.apache.org/jira/browse/BEAM-5519
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
>  Labels: spark, spark-streaming, triaged
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> When using the SparkRunner in streaming mode. There is a call to groupByKey 
> followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this 
> used to cause 2 shuffles but it still causes 2 encode/decode cycles.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5519?focusedWorklogId=218288=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218288
 ]

ASF GitHub Bot logged work on BEAM-5519:


Author: ASF GitHub Bot
Created on: 25/Mar/19 21:42
Start Date: 25/Mar/19 21:42
Worklog Time Spent: 10m 
  Work Description: dmvk commented on issue #6511: [BEAM-5519] Remove call 
to groupByKey in Spark Streaming.
URL: https://github.com/apache/beam/pull/6511#issuecomment-476388952
 
 
   Run Spark Run Spark ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218288)
Time Spent: 3h 20m  (was: 3h 10m)

> Spark Streaming Duplicated Encoding/Decoding Effort
> ---
>
> Key: BEAM-5519
> URL: https://issues.apache.org/jira/browse/BEAM-5519
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
>  Labels: spark, spark-streaming, triaged
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> When using the SparkRunner in streaming mode. There is a call to groupByKey 
> followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this 
> used to cause 2 shuffles but it still causes 2 encode/decode cycles.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6861) Select fields and computed fields in CassandraIO.Read

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6861?focusedWorklogId=218292=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218292
 ]

ASF GitHub Bot logged work on BEAM-6861:


Author: ASF GitHub Bot
Created on: 25/Mar/19 21:42
Start Date: 25/Mar/19 21:42
Worklog Time Spent: 10m 
  Work Description: stankiewicz commented on issue #8090: [BEAM-6861] 
Select fields and computed fields in CassandraIO.Read
URL: https://github.com/apache/beam/pull/8090#issuecomment-476389202
 
 
   @iemejia , no problem :) Thanks for all the effort!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218292)
Time Spent: 4.5h  (was: 4h 20m)

> Select fields and computed fields in CassandraIO.Read 
> --
>
> Key: BEAM-6861
> URL: https://issues.apache.org/jira/browse/BEAM-6861
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Reporter: Radosław Stankiewicz
>Assignee: Radosław Stankiewicz
>Priority: Minor
>  Labels: features
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> CassandraIO.Read currently selects all fields and maps them to POJO.
> To make this component more flexible, it should be possible to select only 
> subset of fields or computed fields to allow reading things like write 
> Timestamp or using other functions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6257) Can we deprecate the side input paths through PAssert?

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6257?focusedWorklogId=218284=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218284
 ]

ASF GitHub Bot logged work on BEAM-6257:


Author: ASF GitHub Bot
Created on: 25/Mar/19 21:37
Start Date: 25/Mar/19 21:37
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #8094: 
[BEAM-6257] PAssert.thatSingleton: use GBK instead of side inputs
URL: https://github.com/apache/beam/pull/8094#discussion_r26886
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java
 ##
 @@ -23,6 +23,7 @@
 import static org.hamcrest.Matchers.not;
 import static org.junit.Assert.assertThat;
 
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
 
 Review comment:
   spotbugs annotations now
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218284)
Time Spent: 2h 10m  (was: 2h)

> Can we deprecate the side input paths through PAssert?
> --
>
> Key: BEAM-6257
> URL: https://issues.apache.org/jira/browse/BEAM-6257
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: starter, triaged
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> PAssert has two distinct paths - one uses GBK with a single-firing trigger, 
> and one uses side inputs. Side inputs are usually a later addition to a 
> runner, while GBK is one of the first primitives (with a single firing it is 
> even simple). Filing this against myself to figure out why the side input 
> version is not deprecated, and if it can be deprecated.
> Marking this as a "starter" task because finding and eliminating side input 
> version of PAssert should be fairly easy. You might need help but can ask on 
> dev@



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6257) Can we deprecate the side input paths through PAssert?

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6257?focusedWorklogId=218285=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218285
 ]

ASF GitHub Bot logged work on BEAM-6257:


Author: ASF GitHub Bot
Created on: 25/Mar/19 21:37
Start Date: 25/Mar/19 21:37
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #8094: 
[BEAM-6257] PAssert.thatSingleton: use GBK instead of side inputs
URL: https://github.com/apache/beam/pull/8094#discussion_r268859683
 
 

 ##
 File path: 
runners/apex/src/test/java/org/apache/beam/runners/apex/ApexRunnerTest.java
 ##
 @@ -82,7 +82,7 @@ public void testParDoChaining() throws Exception {
 ApexPipelineOptions options = 
PipelineOptionsFactory.as(ApexPipelineOptions.class);
 DAG dag = TestApexRunner.translate(p, options);
 
-String[] expectedThreadLocal = 
{"/CreateActual/FilterActuals/Window.Assign"};
+String[] expectedThreadLocal = 
{"/GroupGlobally/RewindowActuals/Window.Assign"};
 
 Review comment:
   From the name of the test, I think this is a hacked way to ensure that the 
transforms are actually fused into a single Apex operator. Your adjustment LGTM.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218285)

> Can we deprecate the side input paths through PAssert?
> --
>
> Key: BEAM-6257
> URL: https://issues.apache.org/jira/browse/BEAM-6257
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: starter, triaged
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> PAssert has two distinct paths - one uses GBK with a single-firing trigger, 
> and one uses side inputs. Side inputs are usually a later addition to a 
> runner, while GBK is one of the first primitives (with a single firing it is 
> even simple). Filing this against myself to figure out why the side input 
> version is not deprecated, and if it can be deprecated.
> Marking this as a "starter" task because finding and eliminating side input 
> version of PAssert should be fairly easy. You might need help but can ask on 
> dev@



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6876) User state cleanup in portable Flink runner

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6876?focusedWorklogId=218283=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218283
 ]

ASF GitHub Bot logged work on BEAM-6876:


Author: ASF GitHub Bot
Created on: 25/Mar/19 21:37
Start Date: 25/Mar/19 21:37
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #8118: [BEAM-6876] 
Cleanup user state in portable Flink Runner
URL: https://github.com/apache/beam/pull/8118#discussion_r268862438
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
 ##
 @@ -713,6 +705,86 @@ private void emitResults() {
 }
   }
 
+  private DoFnRunner ensureStateCleanup(
+  SdkHarnessDoFnRunner sdkHarnessRunner) {
+if (keyCoder == null) {
+  // There won't be any state to clean up
+  // (stateful functions have to be keyed)
+  return sdkHarnessRunner;
+}
+// Takes care of state cleanup via StatefulDoFnRunner
+Coder windowCoder = windowingStrategy.getWindowFn().windowCoder();
+StatefulDoFnRunner.CleanupTimer cleanupTimer =
+new StatefulDoFnRunner.CleanupTimer() {
+
+  private static final String GC_TIMER_ID = "__user-state-cleanup__";
+
+  @Override
+  public Instant currentInputWatermarkTime() {
+return timerInternals.currentInputWatermarkTime();
+  }
+
+  @Override
+  public void setForWindow(InputT input, BoundedWindow window) {
+Preconditions.checkNotNull(input, "Null input passed to 
CleanupTimer");
+// make sure this fires after any window.maxTimestamp() timers
+Instant gcTime = LateDataUtils.garbageCollectionTime(window, 
windowingStrategy).plus(1);
+ByteBuffer key;
+try {
+  key =
+  ByteBuffer.wrap(
+  CoderUtils.encodeToByteArray((Coder) keyCoder, ((KV) 
input).getKey()));
+} catch (CoderException e) {
+  throw new RuntimeException("Failed to encode key for Flink state 
backend", e);
+}
+// Ensure the state backend is not concurrently accessed by the 
state requests
+try {
+  stateBackendLock.lock();
+  // Set these two to ensure correct timer registration
+  // 1) For the timer setting
+  sdkHarnessRunner.setCurrentTimerKey(key);
+  // 2) For the timer deduplication
+  getKeyedStateBackend().setCurrentKey(key);
+  timerInternals.setTimer(
+  StateNamespaces.window(windowCoder, window),
+  GC_TIMER_ID,
+  gcTime,
 
 Review comment:
   The state namespace is assumed to be unique for each window, if not then 
they are the same window and the GC time should be the same. Internally, 
`StateNamespaces$WindowNamespace#stringKey()` serializes the Window to a byte 
array and base64 encodes it to a string. That's a bit crazy but that's how Beam 
state works for windows. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218283)
Time Spent: 1h  (was: 50m)

> User state cleanup in portable Flink runner
> ---
>
> Key: BEAM-6876
> URL: https://issues.apache.org/jira/browse/BEAM-6876
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.11.0
>Reporter: Thomas Weise
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> State is currently not being cleaned up by the runner.
> [https://lists.apache.org/thread.html/86f0809fbfa3da873051287b9ff249d6dd5a896b45409db1e484cf38@%3Cdev.beam.apache.org%3E]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-6497) ContainerLaunchException in java precommit

2019-03-25 Thread Gleb Kanterov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gleb Kanterov closed BEAM-6497.
---

> ContainerLaunchException in java precommit
> --
>
> Key: BEAM-6497
> URL: https://issues.apache.org/jira/browse/BEAM-6497
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Gleb Kanterov
>Priority: Critical
> Fix For: 2.11.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> h1. I have seen this fail the following tests:
> org.apache.beam.sdk.io.clickhouse.AtomicInsertTest.classMethod
> org.apache.beam.sdk.io.clickhouse.ClickHouseIOTest.classMethod
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/3722/]
>  
> h1. Failed
> org.apache.beam.sdk.io.clickhouse.AtomicInsertTest.classMethod
> Failing for the past 1 build (Since 
> [!https://builds.apache.org/static/3b09176f/images/16x16/red.png! 
> #3722|https://builds.apache.org/job/beam_PreCommit_Java_Commit/3722/] )
> [Took 16 
> ms.|https://builds.apache.org/job/beam_PreCommit_Java_Commit/3722/testReport/junit/org.apache.beam.sdk.io.clickhouse/AtomicInsertTest/classMethod/history]
>  
> h3. Error Message
> org.testcontainers.containers.ContainerLaunchException: Container startup 
> failed
> h3. Stacktrace
> org.testcontainers.containers.ContainerLaunchException: Container startup 
> failed at 
> org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:250)
>  at 
> org.testcontainers.containers.GenericContainer.start(GenericContainer.java:228)
>  at 
> org.apache.beam.sdk.io.clickhouse.BaseClickHouseTest.setup(BaseClickHouseTest.java:66)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner.run(ParentRunner.java:396) at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>  at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
>  at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>  at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>  at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>  at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>  at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:117)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>  at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>  at 
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:155)
>  at 
> 

[jira] [Closed] (BEAM-6639) ClickHouseIOTest flakey failure failing in precomiits

2019-03-25 Thread Gleb Kanterov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gleb Kanterov closed BEAM-6639.
---
   Resolution: Fixed
Fix Version/s: 2.12.0

> ClickHouseIOTest flakey failure failing in precomiits
> -
>
> Key: BEAM-6639
> URL: https://issues.apache.org/jira/browse/BEAM-6639
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-clickhouse, test-failures
>Reporter: Alex Amato
>Assignee: Gleb Kanterov
>Priority: Critical
>  Labels: flake
> Fix For: 2.12.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/4166/testReport/junit/org.apache.beam.sdk.io.clickhouse/ClickHouseIOTest/classMethod/]
>  
> h3. Error Message
> org.testcontainers.containers.ContainerLaunchException: Container startup 
> failed
> h3. Stacktrace
> org.testcontainers.containers.ContainerLaunchException: Container startup 
> failed at 
> org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:221)
>  at 
> org.testcontainers.containers.GenericContainer.start(GenericContainer.java:203)
>  at 
> org.apache.beam.sdk.io.clickhouse.BaseClickHouseTest.setup(BaseClickHouseTest.java:68)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner.run(ParentRunner.java:396) at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>  at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
>  at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>  at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>  at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>  at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>  at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:117)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>  at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>  at 
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:155)
>  at 
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:137)
>  at 
> org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
>  at 
> org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
>  at 
> org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
>  at 
> 

[jira] [Assigned] (BEAM-6639) ClickHouseIOTest flakey failure failing in precomiits

2019-03-25 Thread Gleb Kanterov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gleb Kanterov reassigned BEAM-6639:
---

Assignee: Gleb Kanterov

> ClickHouseIOTest flakey failure failing in precomiits
> -
>
> Key: BEAM-6639
> URL: https://issues.apache.org/jira/browse/BEAM-6639
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-clickhouse, test-failures
>Reporter: Alex Amato
>Assignee: Gleb Kanterov
>Priority: Critical
>  Labels: flake
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/4166/testReport/junit/org.apache.beam.sdk.io.clickhouse/ClickHouseIOTest/classMethod/]
>  
> h3. Error Message
> org.testcontainers.containers.ContainerLaunchException: Container startup 
> failed
> h3. Stacktrace
> org.testcontainers.containers.ContainerLaunchException: Container startup 
> failed at 
> org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:221)
>  at 
> org.testcontainers.containers.GenericContainer.start(GenericContainer.java:203)
>  at 
> org.apache.beam.sdk.io.clickhouse.BaseClickHouseTest.setup(BaseClickHouseTest.java:68)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner.run(ParentRunner.java:396) at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>  at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
>  at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>  at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>  at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>  at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>  at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:117)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>  at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>  at 
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:155)
>  at 
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:137)
>  at 
> org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
>  at 
> org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
>  at 
> org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> 

[jira] [Work logged] (BEAM-6905) Add support for User Distribution metrics for Python Streaming

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6905?focusedWorklogId=218277=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218277
 ]

ASF GitHub Bot logged work on BEAM-6905:


Author: ASF GitHub Bot
Created on: 25/Mar/19 21:29
Start Date: 25/Mar/19 21:29
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8131: [BEAM-6905] Add 
support for user distribution metrics.
URL: https://github.com/apache/beam/pull/8131#issuecomment-476384944
 
 
   @ajamato 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218277)
Time Spent: 20m  (was: 10m)

> Add support for User Distribution metrics for Python Streaming
> --
>
> Key: BEAM-6905
> URL: https://issues.apache.org/jira/browse/BEAM-6905
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-harness
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> # Add new UserDistributionCounter urn to metrics.proto
>  # Report UserDistribution metric in python SDK
>  # Plumb User Distribution metric in Dataflow runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6639) ClickHouseIOTest flakey failure failing in precomiits

2019-03-25 Thread Gleb Kanterov (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801145#comment-16801145
 ] 

Gleb Kanterov commented on BEAM-6639:
-

[~kenn] I think so, in any case, the issue is supposed to be resolved, and 
should be re-opened if it isn't.

> ClickHouseIOTest flakey failure failing in precomiits
> -
>
> Key: BEAM-6639
> URL: https://issues.apache.org/jira/browse/BEAM-6639
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-clickhouse, test-failures
>Reporter: Alex Amato
>Priority: Critical
>  Labels: flake
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/4166/testReport/junit/org.apache.beam.sdk.io.clickhouse/ClickHouseIOTest/classMethod/]
>  
> h3. Error Message
> org.testcontainers.containers.ContainerLaunchException: Container startup 
> failed
> h3. Stacktrace
> org.testcontainers.containers.ContainerLaunchException: Container startup 
> failed at 
> org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:221)
>  at 
> org.testcontainers.containers.GenericContainer.start(GenericContainer.java:203)
>  at 
> org.apache.beam.sdk.io.clickhouse.BaseClickHouseTest.setup(BaseClickHouseTest.java:68)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner.run(ParentRunner.java:396) at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>  at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
>  at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>  at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>  at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>  at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>  at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:117)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>  at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>  at 
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:155)
>  at 
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:137)
>  at 
> org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
>  at 
> org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
>  at 
> org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
>  at 
> 

[jira] [Work logged] (BEAM-6905) Add support for User Distribution metrics for Python Streaming

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6905?focusedWorklogId=218276=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218276
 ]

ASF GitHub Bot logged work on BEAM-6905:


Author: ASF GitHub Bot
Created on: 25/Mar/19 21:28
Start Date: 25/Mar/19 21:28
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8131: [BEAM-6905] Add 
support for user distribution metrics.
URL: https://github.com/apache/beam/pull/8131#issuecomment-476384712
 
 
   Beam8 OOMs and some other precommits are failing on master.
   PR is still good for review, I'll followup on failures once mitigations are 
in place.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218276)
Time Spent: 10m
Remaining Estimate: 0h

> Add support for User Distribution metrics for Python Streaming
> --
>
> Key: BEAM-6905
> URL: https://issues.apache.org/jira/browse/BEAM-6905
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-harness
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> # Add new UserDistributionCounter urn to metrics.proto
>  # Report UserDistribution metric in python SDK
>  # Plumb User Distribution metric in Dataflow runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6905) Add support for User Distribution metrics for Python Streaming

2019-03-25 Thread Mikhail Gryzykhin (JIRA)
Mikhail Gryzykhin created BEAM-6905:
---

 Summary: Add support for User Distribution metrics for Python 
Streaming
 Key: BEAM-6905
 URL: https://issues.apache.org/jira/browse/BEAM-6905
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow, sdk-py-harness
Reporter: Mikhail Gryzykhin
Assignee: Mikhail Gryzykhin


# Add new UserDistributionCounter urn to metrics.proto
 # Report UserDistribution metric in python SDK
 # Plumb User Distribution metric in Dataflow runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6257) Can we deprecate the side input paths through PAssert?

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6257?focusedWorklogId=218273=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218273
 ]

ASF GitHub Bot logged work on BEAM-6257:


Author: ASF GitHub Bot
Created on: 25/Mar/19 21:24
Start Date: 25/Mar/19 21:24
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #8094: [BEAM-6257] 
PAssert.thatSingleton: use GBK instead of side inputs
URL: https://github.com/apache/beam/pull/8094#issuecomment-476383431
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218273)
Time Spent: 2h  (was: 1h 50m)

> Can we deprecate the side input paths through PAssert?
> --
>
> Key: BEAM-6257
> URL: https://issues.apache.org/jira/browse/BEAM-6257
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: starter, triaged
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> PAssert has two distinct paths - one uses GBK with a single-firing trigger, 
> and one uses side inputs. Side inputs are usually a later addition to a 
> runner, while GBK is one of the first primitives (with a single firing it is 
> even simple). Filing this against myself to figure out why the side input 
> version is not deprecated, and if it can be deprecated.
> Marking this as a "starter" task because finding and eliminating side input 
> version of PAssert should be fairly easy. You might need help but can ask on 
> dev@



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6619) Add PostCommit suite for integration tests on DataflowRunner

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6619?focusedWorklogId=218271=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218271
 ]

ASF GitHub Bot logged work on BEAM-6619:


Author: ASF GitHub Bot
Created on: 25/Mar/19 21:21
Start Date: 25/Mar/19 21:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #8098: [BEAM-6619] 
[BEAM-6593] update gradle to include all py3 it tests
URL: https://github.com/apache/beam/pull/8098#discussion_r268856630
 
 

 ##
 File path: sdks/python/test-suites/direct/py3/build.gradle
 ##
 @@ -29,19 +29,9 @@ task postCommitIT(dependsOn: 'installGcpTest') {
   doLast {
 def batchTests = [
 "apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it",
-
"apache_beam.examples.cookbook.bigquery_tornadoes_it_test:BigqueryTornadoesIT.test_bigquery_tornadoes_it",
 
 Review comment:
   And furthermore, I did not realize these tests were broken because they 
werent running on DirectRunner, so I see the value of having at least one or 
two BQIO tests on Directrunner.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218271)
Time Spent: 22h 50m  (was: 22h 40m)

> Add PostCommit suite for integration tests on DataflowRunner
> 
>
> Key: BEAM-6619
> URL: https://issues.apache.org/jira/browse/BEAM-6619
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Labels: triaged
> Fix For: Not applicable
>
>  Time Spent: 22h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6876) User state cleanup in portable Flink runner

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6876?focusedWorklogId=218270=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218270
 ]

ASF GitHub Bot logged work on BEAM-6876:


Author: ASF GitHub Bot
Created on: 25/Mar/19 21:21
Start Date: 25/Mar/19 21:21
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #8118: [BEAM-6876] 
Cleanup user state in portable Flink Runner
URL: https://github.com/apache/beam/pull/8118#discussion_r268856542
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
 ##
 @@ -713,6 +705,86 @@ private void emitResults() {
 }
   }
 
+  private DoFnRunner ensureStateCleanup(
+  SdkHarnessDoFnRunner sdkHarnessRunner) {
+if (keyCoder == null) {
+  // There won't be any state to clean up
+  // (stateful functions have to be keyed)
+  return sdkHarnessRunner;
+}
+// Takes care of state cleanup via StatefulDoFnRunner
+Coder windowCoder = windowingStrategy.getWindowFn().windowCoder();
+StatefulDoFnRunner.CleanupTimer cleanupTimer =
+new StatefulDoFnRunner.CleanupTimer() {
+
+  private static final String GC_TIMER_ID = "__user-state-cleanup__";
+
+  @Override
+  public Instant currentInputWatermarkTime() {
+return timerInternals.currentInputWatermarkTime();
+  }
+
+  @Override
+  public void setForWindow(InputT input, BoundedWindow window) {
+Preconditions.checkNotNull(input, "Null input passed to 
CleanupTimer");
+// make sure this fires after any window.maxTimestamp() timers
+Instant gcTime = LateDataUtils.garbageCollectionTime(window, 
windowingStrategy).plus(1);
+ByteBuffer key;
+try {
+  key =
+  ByteBuffer.wrap(
+  CoderUtils.encodeToByteArray((Coder) keyCoder, ((KV) 
input).getKey()));
 
 Review comment:
   There is a timer for each window and key. The key has to be set here because 
Flink's state backend partitions state by key. Does that answer your question?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218270)
Time Spent: 50m  (was: 40m)

> User state cleanup in portable Flink runner
> ---
>
> Key: BEAM-6876
> URL: https://issues.apache.org/jira/browse/BEAM-6876
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.11.0
>Reporter: Thomas Weise
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> State is currently not being cleaned up by the runner.
> [https://lists.apache.org/thread.html/86f0809fbfa3da873051287b9ff249d6dd5a896b45409db1e484cf38@%3Cdev.beam.apache.org%3E]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6284) [FLAKE][beam_PostCommit_Java_ValidatesRunner_Dataflow] TestRunner fails with result UNKNOWN on succeeded job and checks passed

2019-03-25 Thread Mikhail Gryzykhin (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801081#comment-16801081
 ] 

Mikhail Gryzykhin commented on BEAM-6284:
-

Hi Daniel,

Can you please update on this ticket?

> [FLAKE][beam_PostCommit_Java_ValidatesRunner_Dataflow] TestRunner fails with 
> result UNKNOWN on succeeded job and checks passed
> --
>
> Key: BEAM-6284
> URL: https://issues.apache.org/jira/browse/BEAM-6284
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures, testing
>Reporter: Mikhail Gryzykhin
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: currently-failing, triaged
>
> _Use this form to file an issue for test failure:_
>  * 
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/testReport/junit/org.apache.beam.sdk.transforms/ViewTest/testWindowedSideInputFixedToGlobal/
> Initial investigation:
> According to logs all test-relevant checks have passed and it seem to be 
> testing framework failure.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-6882) [Flake][DF IT tests] job ##### terminated in state UNKNOWN but did not return a failure reason.

2019-03-25 Thread Daniel Oliveira (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Oliveira closed BEAM-6882.
-
   Resolution: Duplicate
Fix Version/s: Not applicable

> [Flake][DF IT tests] job # terminated in state UNKNOWN but did not return 
> a failure reason.
> ---
>
> Key: BEAM-6882
> URL: https://issues.apache.org/jira/browse/BEAM-6882
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures, testing
>Reporter: Mikhail Gryzykhin
>Assignee: Daniel Oliveira
>Priority: Critical
>  Labels: currently-failing
> Fix For: Not applicable
>
>
>  
> Dataflow integration tests fail regularly failing to properly detect job 
> status. Usually error looks like:
> java.lang.RuntimeException: Dataflow job 
> 2019-03-21_05_08_46-17069693240682025803 terminated in state UNKNOWN but did 
> not return a failure reason. at 
> org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:134)
> We have several flakes like this a week. (More statistics might be good to 
> add)
> Sample log.
> https://builds.apache.org/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/17/testReport/junit/org.apache.beam.sdk.transforms.windowing/WindowTest/testTimestampCombinerDefault/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6882) [Flake][DF IT tests] job ##### terminated in state UNKNOWN but did not return a failure reason.

2019-03-25 Thread Daniel Oliveira (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801067#comment-16801067
 ] 

Daniel Oliveira commented on BEAM-6882:
---

Yeah, looks like it.

> [Flake][DF IT tests] job # terminated in state UNKNOWN but did not return 
> a failure reason.
> ---
>
> Key: BEAM-6882
> URL: https://issues.apache.org/jira/browse/BEAM-6882
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures, testing
>Reporter: Mikhail Gryzykhin
>Assignee: Daniel Oliveira
>Priority: Critical
>  Labels: currently-failing
>
>  
> Dataflow integration tests fail regularly failing to properly detect job 
> status. Usually error looks like:
> java.lang.RuntimeException: Dataflow job 
> 2019-03-21_05_08_46-17069693240682025803 terminated in state UNKNOWN but did 
> not return a failure reason. at 
> org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:134)
> We have several flakes like this a week. (More statistics might be good to 
> add)
> Sample log.
> https://builds.apache.org/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/17/testReport/junit/org.apache.beam.sdk.transforms.windowing/WindowTest/testTimestampCombinerDefault/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6824) Plumb FnApi ElementCount metrics in Dataflow Runner.

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6824?focusedWorklogId=218264=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218264
 ]

ASF GitHub Bot logged work on BEAM-6824:


Author: ASF GitHub Bot
Created on: 25/Mar/19 21:03
Start Date: 25/Mar/19 21:03
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #8095: [BEAM-6824] 
Add Element count transformer for Java worker FnApi
URL: https://github.com/apache/beam/pull/8095
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218264)
Time Spent: 3h 10m  (was: 3h)

> Plumb FnApi ElementCount metrics in Dataflow Runner.
> 
>
> Key: BEAM-6824
> URL: https://issues.apache.org/jira/browse/BEAM-6824
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> We want to provide FnApi metrics to Dataflow Users.
> This bug covers ElementCount metric.
> Current approach utilizes mapping of PCollectionID based on WorkItem and 
> ProcessBundle graphs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3279) Deprecate and remove Coder consistentWithEquals in favor of overriding structuredValue

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3279?focusedWorklogId=218263=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218263
 ]

ASF GitHub Bot logged work on BEAM-3279:


Author: ASF GitHub Bot
Created on: 25/Mar/19 20:56
Start Date: 25/Mar/19 20:56
Worklog Time Spent: 10m 
  Work Description: AlexKbit commented on issue #8071: [BEAM-3279] 
Deprecate and remove Coder.consistentWithEquals
URL: https://github.com/apache/beam/pull/8071#issuecomment-476374108
 
 
   Thank you!  
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218263)
Time Spent: 4h 50m  (was: 4h 40m)

> Deprecate and remove Coder consistentWithEquals in favor of overriding 
> structuredValue
> --
>
> Key: BEAM-3279
> URL: https://issues.apache.org/jira/browse/BEAM-3279
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Alexander Savchenko
>Priority: Minor
>  Labels: starter
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Summary of discussion linked below:
> consistentWithEquals() is redundant w.r.t. structuralValue(), and should be 
> deprecated. I think our mutation detectors are already using 
> structuralValue(), so the work here would be to simply mark the method 
> deprecated, remove all remaining overrides in the SDK, and document that 
> overriding the method is a no-op.
> https://lists.apache.org/thread.html/8b2dcf09ba8e46b3c008293d99e4028d10463148b68326687dc29a4d@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-6864) PortablePipelineRunner should take JobInfo as an argument

2019-03-25 Thread Kyle Weaver (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver resolved BEAM-6864.
---
   Resolution: Fixed
Fix Version/s: 2.12.0

> PortablePipelineRunner should take JobInfo as an argument
> -
>
> Key: BEAM-6864
> URL: https://issues.apache.org/jira/browse/BEAM-6864
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
> Fix For: 2.12.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The executable stage function for every portable runner will require an 
> instance of JobInfo; we should make that more explicit by passing JobInfo 
> when the pipeline is run, rather than requiring each runner to [add redundant 
> fields|https://github.com/apache/beam/blob/43bee0c2832f2685260a23182de4a8bfc55c/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java#L43-L44].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3279) Deprecate and remove Coder consistentWithEquals in favor of overriding structuredValue

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3279?focusedWorklogId=218260=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218260
 ]

ASF GitHub Bot logged work on BEAM-3279:


Author: ASF GitHub Bot
Created on: 25/Mar/19 20:54
Start Date: 25/Mar/19 20:54
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #8071: [BEAM-3279] 
Deprecate and remove Coder.consistentWithEquals
URL: https://github.com/apache/beam/pull/8071#issuecomment-476373395
 
 
   https://issues.apache.org/jira/browse/BEAM-6904
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218260)
Time Spent: 4h 40m  (was: 4.5h)

> Deprecate and remove Coder consistentWithEquals in favor of overriding 
> structuredValue
> --
>
> Key: BEAM-3279
> URL: https://issues.apache.org/jira/browse/BEAM-3279
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Alexander Savchenko
>Priority: Minor
>  Labels: starter
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Summary of discussion linked below:
> consistentWithEquals() is redundant w.r.t. structuralValue(), and should be 
> deprecated. I think our mutation detectors are already using 
> structuralValue(), so the work here would be to simply mark the method 
> deprecated, remove all remaining overrides in the SDK, and document that 
> overriding the method is a no-op.
> https://lists.apache.org/thread.html/8b2dcf09ba8e46b3c008293d99e4028d10463148b68326687dc29a4d@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6904) Test all Coder structuralValue implementations

2019-03-25 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-6904:
-

 Summary: Test all Coder structuralValue implementations
 Key: BEAM-6904
 URL: https://issues.apache.org/jira/browse/BEAM-6904
 Project: Beam
  Issue Type: Test
  Components: sdk-java-core
Reporter: Kenneth Knowles
Assignee: Alexander Savchenko


Here is a test helper that check that structuralValue is consistent with 
equals: 
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/CoderProperties.java#L200

And here is one that tests it another way: 
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/CoderProperties.java#L226

With the deprecation of consistentWithEquals and implementing all the 
structualValue methods, we should add these tests to every coder.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6824) Plumb FnApi ElementCount metrics in Dataflow Runner.

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6824?focusedWorklogId=218256=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218256
 ]

ASF GitHub Bot logged work on BEAM-6824:


Author: ASF GitHub Bot
Created on: 25/Mar/19 20:50
Start Date: 25/Mar/19 20:50
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8095: [BEAM-6824] Add 
Element count transformer for Java worker FnApi
URL: https://github.com/apache/beam/pull/8095#issuecomment-476371991
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218256)
Time Spent: 3h  (was: 2h 50m)

> Plumb FnApi ElementCount metrics in Dataflow Runner.
> 
>
> Key: BEAM-6824
> URL: https://issues.apache.org/jira/browse/BEAM-6824
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> We want to provide FnApi metrics to Dataflow Users.
> This bug covers ElementCount metric.
> Current approach utilizes mapping of PCollectionID based on WorkItem and 
> ProcessBundle graphs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6557) Add IPython notebooks for quickstarts and custom I/O

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6557?focusedWorklogId=218255=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218255
 ]

ASF GitHub Bot logged work on BEAM-6557:


Author: ASF GitHub Bot
Created on: 25/Mar/19 20:49
Start Date: 25/Mar/19 20:49
Worklog Time Spent: 10m 
  Work Description: melap commented on pull request #7679: [BEAM-6557] Adds 
notebooks for quickstarts and custom I/O
URL: https://github.com/apache/beam/pull/7679#discussion_r266703246
 
 

 ##
 File path: website/src/documentation/io/developing-io-python.md
 ##
 @@ -85,6 +85,37 @@ Supply the logic for your new source by creating the 
following classes:
 You can find these classes in the
 [apache_beam.io.iobase module](https://beam.apache.org/releases/pydoc/{{ 
site.release_latest }}/apache_beam.io.iobase.html).
 
+### Using ParDo
 
 Review comment:
   I think this one section/example belongs on the overview page where we talk 
about using ParDo. as this page only covers using 
BoundedSource/UnboundedSource. perhaps add a new "Interactive Python examples" 
section at the bottom of the overview page, with subsections such as "Custom 
inputs using ParDo", "Custom outputs using ParDo", etc. , each with an intro 
sentence like "This example shows you how to create a custom input using 
ParDo.", then the source code snippet/buttons. All the ParDo related examples 
can go there.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218255)
Time Spent: 3h  (was: 2h 50m)

> Add IPython notebooks for quickstarts and custom I/O
> 
>
> Key: BEAM-6557
> URL: https://issues.apache.org/jira/browse/BEAM-6557
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: David Cavazos
>Assignee: David Cavazos
>Priority: Minor
>  Labels: triaged
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5519?focusedWorklogId=218254=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218254
 ]

ASF GitHub Bot logged work on BEAM-5519:


Author: ASF GitHub Bot
Created on: 25/Mar/19 20:48
Start Date: 25/Mar/19 20:48
Worklog Time Spent: 10m 
  Work Description: kyle-winkelman commented on issue #6511: [BEAM-5519] 
Remove call to groupByKey in Spark Streaming.
URL: https://github.com/apache/beam/pull/6511#issuecomment-476371088
 
 
   @iemejia Rebased. I cannot think of anything left to do. If its possible to 
run performance tests before this is merged we could do that.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218254)
Time Spent: 3h 10m  (was: 3h)

> Spark Streaming Duplicated Encoding/Decoding Effort
> ---
>
> Key: BEAM-5519
> URL: https://issues.apache.org/jira/browse/BEAM-5519
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
>  Labels: spark, spark-streaming, triaged
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> When using the SparkRunner in streaming mode. There is a call to groupByKey 
> followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this 
> used to cause 2 shuffles but it still causes 2 encode/decode cycles.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3279) Deprecate and remove Coder consistentWithEquals in favor of overriding structuredValue

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3279?focusedWorklogId=218250=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218250
 ]

ASF GitHub Bot logged work on BEAM-3279:


Author: ASF GitHub Bot
Created on: 25/Mar/19 20:46
Start Date: 25/Mar/19 20:46
Worklog Time Spent: 10m 
  Work Description: AlexKbit commented on issue #8071: [BEAM-3279] 
Deprecate and remove Coder.consistentWithEquals
URL: https://github.com/apache/beam/pull/8071#issuecomment-476370351
 
 
   Thank you! Can you create another ticket for this task(assign to me)?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218250)
Time Spent: 4.5h  (was: 4h 20m)

> Deprecate and remove Coder consistentWithEquals in favor of overriding 
> structuredValue
> --
>
> Key: BEAM-3279
> URL: https://issues.apache.org/jira/browse/BEAM-3279
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Alexander Savchenko
>Priority: Minor
>  Labels: starter
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Summary of discussion linked below:
> consistentWithEquals() is redundant w.r.t. structuralValue(), and should be 
> deprecated. I think our mutation detectors are already using 
> structuralValue(), so the work here would be to simply mark the method 
> deprecated, remove all remaining overrides in the SDK, and document that 
> overriding the method is a no-op.
> https://lists.apache.org/thread.html/8b2dcf09ba8e46b3c008293d99e4028d10463148b68326687dc29a4d@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6824) Plumb FnApi ElementCount metrics in Dataflow Runner.

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6824?focusedWorklogId=218249=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218249
 ]

ASF GitHub Bot logged work on BEAM-6824:


Author: ASF GitHub Bot
Created on: 25/Mar/19 20:44
Start Date: 25/Mar/19 20:44
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8095: [BEAM-6824] Add 
Element count transformer for Java worker FnApi
URL: https://github.com/apache/beam/pull/8095#issuecomment-476369909
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218249)
Time Spent: 2h 50m  (was: 2h 40m)

> Plumb FnApi ElementCount metrics in Dataflow Runner.
> 
>
> Key: BEAM-6824
> URL: https://issues.apache.org/jira/browse/BEAM-6824
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> We want to provide FnApi metrics to Dataflow Users.
> This bug covers ElementCount metric.
> Current approach utilizes mapping of PCollectionID based on WorkItem and 
> ProcessBundle graphs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3279) Deprecate and remove Coder consistentWithEquals in favor of overriding structuredValue

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3279?focusedWorklogId=218246=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218246
 ]

ASF GitHub Bot logged work on BEAM-3279:


Author: ASF GitHub Bot
Created on: 25/Mar/19 20:39
Start Date: 25/Mar/19 20:39
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #8071: 
[BEAM-3279] Deprecate and remove Coder.consistentWithEquals
URL: https://github.com/apache/beam/pull/8071
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218246)
Time Spent: 4h 10m  (was: 4h)

> Deprecate and remove Coder consistentWithEquals in favor of overriding 
> structuredValue
> --
>
> Key: BEAM-3279
> URL: https://issues.apache.org/jira/browse/BEAM-3279
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Alexander Savchenko
>Priority: Minor
>  Labels: starter
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Summary of discussion linked below:
> consistentWithEquals() is redundant w.r.t. structuralValue(), and should be 
> deprecated. I think our mutation detectors are already using 
> structuralValue(), so the work here would be to simply mark the method 
> deprecated, remove all remaining overrides in the SDK, and document that 
> overriding the method is a no-op.
> https://lists.apache.org/thread.html/8b2dcf09ba8e46b3c008293d99e4028d10463148b68326687dc29a4d@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3279) Deprecate and remove Coder consistentWithEquals in favor of overriding structuredValue

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3279?focusedWorklogId=218247=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218247
 ]

ASF GitHub Bot logged work on BEAM-3279:


Author: ASF GitHub Bot
Created on: 25/Mar/19 20:39
Start Date: 25/Mar/19 20:39
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #8071: [BEAM-3279] 
Deprecate and remove Coder.consistentWithEquals
URL: https://github.com/apache/beam/pull/8071#issuecomment-476368129
 
 
   Thanks! If you want to follow-up there is good test coverage missing here: 
each coder could have a test, shared in `CoderProperties`, that the equality on 
encoded bytes matches the equality on structuralValue.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218247)
Time Spent: 4h 20m  (was: 4h 10m)

> Deprecate and remove Coder consistentWithEquals in favor of overriding 
> structuredValue
> --
>
> Key: BEAM-3279
> URL: https://issues.apache.org/jira/browse/BEAM-3279
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Alexander Savchenko
>Priority: Minor
>  Labels: starter
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Summary of discussion linked below:
> consistentWithEquals() is redundant w.r.t. structuralValue(), and should be 
> deprecated. I think our mutation detectors are already using 
> structuralValue(), so the work here would be to simply mark the method 
> deprecated, remove all remaining overrides in the SDK, and document that 
> overriding the method is a no-op.
> https://lists.apache.org/thread.html/8b2dcf09ba8e46b3c008293d99e4028d10463148b68326687dc29a4d@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3279) Deprecate and remove Coder consistentWithEquals in favor of overriding structuredValue

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3279?focusedWorklogId=218245=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218245
 ]

ASF GitHub Bot logged work on BEAM-3279:


Author: ASF GitHub Bot
Created on: 25/Mar/19 20:37
Start Date: 25/Mar/19 20:37
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #8071: [BEAM-3279] 
Deprecate and remove Coder.consistentWithEquals
URL: https://github.com/apache/beam/pull/8071#issuecomment-476367387
 
 
   I can squash them for you. Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218245)
Time Spent: 4h  (was: 3h 50m)

> Deprecate and remove Coder consistentWithEquals in favor of overriding 
> structuredValue
> --
>
> Key: BEAM-3279
> URL: https://issues.apache.org/jira/browse/BEAM-3279
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Alexander Savchenko
>Priority: Minor
>  Labels: starter
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Summary of discussion linked below:
> consistentWithEquals() is redundant w.r.t. structuralValue(), and should be 
> deprecated. I think our mutation detectors are already using 
> structuralValue(), so the work here would be to simply mark the method 
> deprecated, remove all remaining overrides in the SDK, and document that 
> overriding the method is a no-op.
> https://lists.apache.org/thread.html/8b2dcf09ba8e46b3c008293d99e4028d10463148b68326687dc29a4d@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6493) examples in Kotlin

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6493?focusedWorklogId=218243=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218243
 ]

ASF GitHub Bot logged work on BEAM-6493:


Author: ASF GitHub Bot
Created on: 25/Mar/19 20:31
Start Date: 25/Mar/19 20:31
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8034: [BEAM-6493] Convert 
the WordCount samples to Kotlin
URL: https://github.com/apache/beam/pull/8034#issuecomment-476365506
 
 
   No need to apologize : ) I just didn't want you to forget about thisw, 
because we're grateful for the contribution.
   Sorry about the spotless trouble. You can run `./gradlew  
:beam-examples-kotlin:spotlessApply` to do the autoformatting.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218243)
Time Spent: 3.5h  (was: 3h 20m)
Remaining Estimate: 501.5h  (was: 501h 40m)

> examples in Kotlin
> --
>
> Key: BEAM-6493
> URL: https://issues.apache.org/jira/browse/BEAM-6493
> Project: Beam
>  Issue Type: Task
>  Components: examples-java
>Affects Versions: Not applicable
>Reporter: Harshit Dwivedi
>Assignee: Harshit Dwivedi
>Priority: Minor
>  Labels: documentation
> Fix For: Not applicable
>
>   Original Estimate: 504h
>  Time Spent: 3.5h
>  Remaining Estimate: 501.5h
>
> I have been using Apache Beam for few of my projects in production since the 
> past 6 months and apart from Java, [Kotlin|https://kotlinlang.org/] also 
> seems to work as well with no issues whatsoever.
> But currently, the Github Repository of Apache Beam contains examples only in 
> Java which might be an issue for other developers who want to use Apache Beam 
> SDK with kotlin as there are no sample resources available.
> That said, I would love to go ahead and add kotlin examples alongside the 
> current java examples in the [Beam 
> repository|https://github.com/apache/beam/tree/master/examples/java].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6882) [Flake][DF IT tests] job ##### terminated in state UNKNOWN but did not return a failure reason.

2019-03-25 Thread Mikhail Gryzykhin (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801029#comment-16801029
 ] 

Mikhail Gryzykhin commented on BEAM-6882:
-

Hi Dan,
Can you verify whether this is a duplicate with 
https://issues.apache.org/jira/browse/BEAM-6284

> [Flake][DF IT tests] job # terminated in state UNKNOWN but did not return 
> a failure reason.
> ---
>
> Key: BEAM-6882
> URL: https://issues.apache.org/jira/browse/BEAM-6882
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures, testing
>Reporter: Mikhail Gryzykhin
>Assignee: Daniel Oliveira
>Priority: Critical
>  Labels: currently-failing
>
>  
> Dataflow integration tests fail regularly failing to properly detect job 
> status. Usually error looks like:
> java.lang.RuntimeException: Dataflow job 
> 2019-03-21_05_08_46-17069693240682025803 terminated in state UNKNOWN but did 
> not return a failure reason. at 
> org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:134)
> We have several flakes like this a week. (More statistics might be good to 
> add)
> Sample log.
> https://builds.apache.org/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/17/testReport/junit/org.apache.beam.sdk.transforms.windowing/WindowTest/testTimestampCombinerDefault/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-6882) [Flake][DF IT tests] job ##### terminated in state UNKNOWN but did not return a failure reason.

2019-03-25 Thread Mikhail Gryzykhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Gryzykhin reassigned BEAM-6882:
---

Assignee: Daniel Oliveira

> [Flake][DF IT tests] job # terminated in state UNKNOWN but did not return 
> a failure reason.
> ---
>
> Key: BEAM-6882
> URL: https://issues.apache.org/jira/browse/BEAM-6882
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures, testing
>Reporter: Mikhail Gryzykhin
>Assignee: Daniel Oliveira
>Priority: Critical
>  Labels: currently-failing
>
>  
> Dataflow integration tests fail regularly failing to properly detect job 
> status. Usually error looks like:
> java.lang.RuntimeException: Dataflow job 
> 2019-03-21_05_08_46-17069693240682025803 terminated in state UNKNOWN but did 
> not return a failure reason. at 
> org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:134)
> We have several flakes like this a week. (More statistics might be good to 
> add)
> Sample log.
> https://builds.apache.org/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/17/testReport/junit/org.apache.beam.sdk.transforms.windowing/WindowTest/testTimestampCombinerDefault/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6824) Plumb FnApi ElementCount metrics in Dataflow Runner.

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6824?focusedWorklogId=218228=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218228
 ]

ASF GitHub Bot logged work on BEAM-6824:


Author: ASF GitHub Bot
Created on: 25/Mar/19 19:49
Start Date: 25/Mar/19 19:49
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8095: [BEAM-6824] Add 
Element count transformer for Java worker FnApi
URL: https://github.com/apache/beam/pull/8095#issuecomment-476350891
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218228)
Time Spent: 2h 40m  (was: 2.5h)

> Plumb FnApi ElementCount metrics in Dataflow Runner.
> 
>
> Key: BEAM-6824
> URL: https://issues.apache.org/jira/browse/BEAM-6824
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> We want to provide FnApi metrics to Dataflow Users.
> This bug covers ElementCount metric.
> Current approach utilizes mapping of PCollectionID based on WorkItem and 
> ProcessBundle graphs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6876) User state cleanup in portable Flink runner

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6876?focusedWorklogId=218227=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218227
 ]

ASF GitHub Bot logged work on BEAM-6876:


Author: ASF GitHub Bot
Created on: 25/Mar/19 19:47
Start Date: 25/Mar/19 19:47
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #8118: [BEAM-6876] 
Cleanup user state in portable Flink Runner
URL: https://github.com/apache/beam/pull/8118#discussion_r268817794
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
 ##
 @@ -713,6 +705,86 @@ private void emitResults() {
 }
   }
 
+  private DoFnRunner ensureStateCleanup(
+  SdkHarnessDoFnRunner sdkHarnessRunner) {
+if (keyCoder == null) {
+  // There won't be any state to clean up
+  // (stateful functions have to be keyed)
+  return sdkHarnessRunner;
+}
+// Takes care of state cleanup via StatefulDoFnRunner
+Coder windowCoder = windowingStrategy.getWindowFn().windowCoder();
+StatefulDoFnRunner.CleanupTimer cleanupTimer =
+new StatefulDoFnRunner.CleanupTimer() {
+
+  private static final String GC_TIMER_ID = "__user-state-cleanup__";
+
+  @Override
+  public Instant currentInputWatermarkTime() {
+return timerInternals.currentInputWatermarkTime();
+  }
+
+  @Override
+  public void setForWindow(InputT input, BoundedWindow window) {
+Preconditions.checkNotNull(input, "Null input passed to 
CleanupTimer");
+// make sure this fires after any window.maxTimestamp() timers
+Instant gcTime = LateDataUtils.garbageCollectionTime(window, 
windowingStrategy).plus(1);
+ByteBuffer key;
+try {
+  key =
+  ByteBuffer.wrap(
+  CoderUtils.encodeToByteArray((Coder) keyCoder, ((KV) 
input).getKey()));
 
 Review comment:
   Do we need to handle key collision in different windows?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218227)
Time Spent: 40m  (was: 0.5h)

> User state cleanup in portable Flink runner
> ---
>
> Key: BEAM-6876
> URL: https://issues.apache.org/jira/browse/BEAM-6876
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.11.0
>Reporter: Thomas Weise
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> State is currently not being cleaned up by the runner.
> [https://lists.apache.org/thread.html/86f0809fbfa3da873051287b9ff249d6dd5a896b45409db1e484cf38@%3Cdev.beam.apache.org%3E]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6876) User state cleanup in portable Flink runner

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6876?focusedWorklogId=218226=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218226
 ]

ASF GitHub Bot logged work on BEAM-6876:


Author: ASF GitHub Bot
Created on: 25/Mar/19 19:47
Start Date: 25/Mar/19 19:47
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #8118: [BEAM-6876] 
Cleanup user state in portable Flink Runner
URL: https://github.com/apache/beam/pull/8118#discussion_r268820620
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
 ##
 @@ -713,6 +705,86 @@ private void emitResults() {
 }
   }
 
+  private DoFnRunner ensureStateCleanup(
+  SdkHarnessDoFnRunner sdkHarnessRunner) {
+if (keyCoder == null) {
+  // There won't be any state to clean up
+  // (stateful functions have to be keyed)
+  return sdkHarnessRunner;
+}
+// Takes care of state cleanup via StatefulDoFnRunner
+Coder windowCoder = windowingStrategy.getWindowFn().windowCoder();
+StatefulDoFnRunner.CleanupTimer cleanupTimer =
+new StatefulDoFnRunner.CleanupTimer() {
+
+  private static final String GC_TIMER_ID = "__user-state-cleanup__";
+
+  @Override
+  public Instant currentInputWatermarkTime() {
+return timerInternals.currentInputWatermarkTime();
+  }
+
+  @Override
+  public void setForWindow(InputT input, BoundedWindow window) {
+Preconditions.checkNotNull(input, "Null input passed to 
CleanupTimer");
+// make sure this fires after any window.maxTimestamp() timers
+Instant gcTime = LateDataUtils.garbageCollectionTime(window, 
windowingStrategy).plus(1);
+ByteBuffer key;
+try {
+  key =
+  ByteBuffer.wrap(
+  CoderUtils.encodeToByteArray((Coder) keyCoder, ((KV) 
input).getKey()));
+} catch (CoderException e) {
+  throw new RuntimeException("Failed to encode key for Flink state 
backend", e);
+}
+// Ensure the state backend is not concurrently accessed by the 
state requests
+try {
+  stateBackendLock.lock();
+  // Set these two to ensure correct timer registration
+  // 1) For the timer setting
+  sdkHarnessRunner.setCurrentTimerKey(key);
+  // 2) For the timer deduplication
+  getKeyedStateBackend().setCurrentKey(key);
+  timerInternals.setTimer(
+  StateNamespaces.window(windowCoder, window),
+  GC_TIMER_ID,
+  gcTime,
 
 Review comment:
   Should we compare gcTime in case of pre-existing timer for a key?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218226)
Time Spent: 40m  (was: 0.5h)

> User state cleanup in portable Flink runner
> ---
>
> Key: BEAM-6876
> URL: https://issues.apache.org/jira/browse/BEAM-6876
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.11.0
>Reporter: Thomas Weise
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> State is currently not being cleaned up by the runner.
> [https://lists.apache.org/thread.html/86f0809fbfa3da873051287b9ff249d6dd5a896b45409db1e484cf38@%3Cdev.beam.apache.org%3E]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3279) Deprecate and remove Coder consistentWithEquals in favor of overriding structuredValue

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3279?focusedWorklogId=218219=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218219
 ]

ASF GitHub Bot logged work on BEAM-3279:


Author: ASF GitHub Bot
Created on: 25/Mar/19 19:08
Start Date: 25/Mar/19 19:08
Worklog Time Spent: 10m 
  Work Description: AlexKbit commented on issue #8071: [BEAM-3279] 
Deprecate and remove Coder.consistentWithEquals
URL: https://github.com/apache/beam/pull/8071#issuecomment-476337232
 
 
   @kennknowles should i do squash with this changes again?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218219)
Time Spent: 3h 50m  (was: 3h 40m)

> Deprecate and remove Coder consistentWithEquals in favor of overriding 
> structuredValue
> --
>
> Key: BEAM-3279
> URL: https://issues.apache.org/jira/browse/BEAM-3279
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Alexander Savchenko
>Priority: Minor
>  Labels: starter
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Summary of discussion linked below:
> consistentWithEquals() is redundant w.r.t. structuralValue(), and should be 
> deprecated. I think our mutation detectors are already using 
> structuralValue(), so the work here would be to simply mark the method 
> deprecated, remove all remaining overrides in the SDK, and document that 
> overriding the method is a no-op.
> https://lists.apache.org/thread.html/8b2dcf09ba8e46b3c008293d99e4028d10463148b68326687dc29a4d@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6867) Data race in Side Input Metrics code

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6867?focusedWorklogId=218215=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218215
 ]

ASF GitHub Bot logged work on BEAM-6867:


Author: ASF GitHub Bot
Created on: 25/Mar/19 18:53
Start Date: 25/Mar/19 18:53
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8080: [BEAM-6867] Fixing 
data race in DataflowSideInputReadCounter
URL: https://github.com/apache/beam/pull/8080#issuecomment-476331716
 
 
   I am holding this back a bit to try and repro the issue / test the fix. I 
will merge this by Wednesday.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218215)
Time Spent: 20m  (was: 10m)

> Data race in Side Input Metrics code
> 
>
> Key: BEAM-6867
> URL: https://issues.apache.org/jira/browse/BEAM-6867
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
> Fix For: 2.12.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6257) Can we deprecate the side input paths through PAssert?

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6257?focusedWorklogId=218204=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218204
 ]

ASF GitHub Bot logged work on BEAM-6257:


Author: ASF GitHub Bot
Created on: 25/Mar/19 18:29
Start Date: 25/Mar/19 18:29
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #8094: [BEAM-6257] 
PAssert.thatSingleton: use GBK instead of side inputs
URL: https://github.com/apache/beam/pull/8094#discussion_r268791096
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java
 ##
 @@ -782,6 +782,121 @@ public PCollectionSingletonIterableAssert(
 }
   }
 
+  /**
+   * A {@link SingletonAssert} about the contents of a {@link PCollection} 
when it contains a single
+   * value of type {@code T}. This does not require the runner to support side 
inputs.
+   */
+  private static class PCollectionSingletonAssert implements 
SingletonAssert {
+private final PCollection actual;
+private final Coder coder;
+private final AssertionWindows rewindowingStrategy;
+private final SimpleFunction>, 
Iterable> paneExtractor;
+
+private final PAssertionSite site;
+
+PCollectionSingletonAssert(PCollection actual, PAssertionSite site) {
+  this(actual, IntoGlobalWindow.of(), PaneExtractors.allPanes(), site);
+}
+
+PCollectionSingletonAssert(
+PCollection actual,
+AssertionWindows rewindowingStrategy,
+SimpleFunction>, Iterable> 
paneExtractor,
+PAssertionSite site) {
+  this.actual = actual;
+  this.coder = actual.getCoder();
+  this.rewindowingStrategy = rewindowingStrategy;
+  this.paneExtractor = paneExtractor;
+  this.site = site;
+}
+
+@Override
+public PCollectionSingletonAssert inFinalPane(BoundedWindow window) {
+  return withPanes(window, PaneExtractors.finalPane());
+}
+
+@Override
+public PCollectionSingletonAssert inOnTimePane(BoundedWindow window) {
+  return withPanes(window, PaneExtractors.onTimePane());
+}
+
+@Override
+public PCollectionSingletonAssert inEarlyPane(BoundedWindow window) {
+  return withPanes(window, PaneExtractors.earlyPanes());
+}
+
+@Override
+public SingletonAssert isEqualTo(T expected) {
+  return satisfies(new AssertIsEqualToRelation<>(), expected);
+}
+
+@Override
+public SingletonAssert notEqualTo(T notExpected) {
+  return satisfies(new AssertNotEqualToRelation<>(), notExpected);
+}
+
+@Override
+public PCollectionSingletonAssert inOnlyPane(BoundedWindow window) {
+  return withPanes(window, PaneExtractors.onlyPane(site));
+}
+
+private PCollectionSingletonAssert withPanes(
+BoundedWindow window,
+SimpleFunction>, Iterable> 
paneExtractor) {
+  @SuppressWarnings({"unchecked", "rawtypes"})
+  Coder windowCoder =
+  (Coder) actual.getWindowingStrategy().getWindowFn().windowCoder();
+  return new PCollectionSingletonAssert<>(
+  actual, IntoStaticWindows.of(windowCoder, window), paneExtractor, 
site);
+}
+
+@Override
+public PCollectionSingletonAssert satisfies(SerializableFunction checkerFn) {
+  actual.apply(
+  "PAssert$" + (assertCount++),
+  new GroupThenAssertForSingleton<>(checkerFn, rewindowingStrategy, 
paneExtractor, site));
+  return this;
+}
+
+/**
+ * Applies an {@link AssertRelation} to check the provided relation 
against the value of this
+ * assert and the provided expected value.
+ *
+ * Returns this {@code SingletonAssert}.
+ */
+private PCollectionSingletonAssert satisfies(
+AssertRelation relation, final T expected) {
+  return satisfies(new CheckRelationAgainstExpected<>(relation, expected, 
coder));
+}
+
+/**
+ * @throws UnsupportedOperationException always
+ * @deprecated {@link Object#equals(Object)} is not supported on PAssert 
objects. If you meant
+ * to test object equality, use {@link #isEqualTo} instead.
+ */
+@SuppressFBWarnings
 
 Review comment:
   fixed
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218204)
Time Spent: 1h 50m  (was: 1h 40m)

> Can we deprecate the side input paths through PAssert?
> --
>
> Key: BEAM-6257
> URL: https://issues.apache.org/jira/browse/BEAM-6257
> Project: Beam
>  Issue Type: Improvement
>  Components: 

[jira] [Resolved] (BEAM-6838) Improve error messages for unregistered types

2019-03-25 Thread Daniel Oliveira (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Oliveira resolved BEAM-6838.
---
   Resolution: Fixed
Fix Version/s: 2.13.0

> Improve error messages for unregistered types
> -
>
> Key: BEAM-6838
> URL: https://issues.apache.org/jira/browse/BEAM-6838
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Minor
> Fix For: 2.13.0
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> When users have a type that can't be serialized (it has a function Field or 
> similar) or such as more complex struct types with unexported fields (like 
> protocol buffers), the error should be improved to display the type being 
> registered, and better yet, the beam.RegisterType(...) code they can use to 
> enable use of their type.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6257) Can we deprecate the side input paths through PAssert?

2019-03-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6257?focusedWorklogId=218203=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218203
 ]

ASF GitHub Bot logged work on BEAM-6257:


Author: ASF GitHub Bot
Created on: 25/Mar/19 18:29
Start Date: 25/Mar/19 18:29
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #8094: [BEAM-6257] 
PAssert.thatSingleton: use GBK instead of side inputs
URL: https://github.com/apache/beam/pull/8094#issuecomment-476322919
 
 
   I guess we have to manually re-run tests that were in progress when a change 
was made to the code? :thinking: 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218203)
Time Spent: 1h 40m  (was: 1.5h)

> Can we deprecate the side input paths through PAssert?
> --
>
> Key: BEAM-6257
> URL: https://issues.apache.org/jira/browse/BEAM-6257
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: starter, triaged
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> PAssert has two distinct paths - one uses GBK with a single-firing trigger, 
> and one uses side inputs. Side inputs are usually a later addition to a 
> runner, while GBK is one of the first primitives (with a single firing it is 
> even simple). Filing this against myself to figure out why the side input 
> version is not deprecated, and if it can be deprecated.
> Marking this as a "starter" task because finding and eliminating side input 
> version of PAssert should be fairly easy. You might need help but can ask on 
> dev@



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


  1   2   >