[jira] [Work logged] (BEAM-7028) Create ParDo and Combine streaming load test jobs for Java SDK

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7028?focusedWorklogId=240079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240079
 ]

ASF GitHub Bot logged work on BEAM-7028:


Author: ASF GitHub Bot
Created on: 10/May/19 07:13
Start Date: 10/May/19 07:13
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8539: [BEAM-7028] Change 
ParDo load test according to the proposal
URL: https://github.com/apache/beam/pull/8539#issuecomment-491184101
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 240079)
Time Spent: 16h 20m  (was: 16h 10m)

> Create ParDo and Combine streaming load test jobs for Java SDK
> --
>
> Key: BEAM-7028
> URL: https://issues.apache.org/jira/browse/BEAM-7028
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 16h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-7263) Deprecate AutoValue set/getClientConfiguration from JdbcIO

2019-05-10 Thread JIRA
Ismaël Mejía created BEAM-7263:
--

 Summary: Deprecate AutoValue set/getClientConfiguration from JdbcIO
 Key: BEAM-7263
 URL: https://issues.apache.org/jira/browse/BEAM-7263
 Project: Beam
  Issue Type: Improvement
  Components: io-java-jdbc
Reporter: Ismaël Mejía
Assignee: Ismaël Mejía


JdbcIO now relies on `withDataSourceProviderFn` as the primary way to build a 
DataSource; the main configuration method, `withDataSourceConfiguration`, now 
relies on it, so the internal method to get the object should not be exposed 
anymore.
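As a minimal sketch of the pattern described above (all names hypothetical, not Beam's real API): a configuration object can be wrapped into a provider function, so the connector only needs the function and no internal getter for the configuration has to stay public.

```java
import java.util.function.Supplier;

// Hypothetical sketch: wrap a configuration object into a provider
// function so no internal getter for it needs to be exposed.
public class ProviderFnSketch {

  // Stand-in for a DataSourceConfiguration-style value object.
  static final class DataSourceConfiguration {
    final String jdbcUrl;

    DataSourceConfiguration(String jdbcUrl) {
      this.jdbcUrl = jdbcUrl;
    }
  }

  // Stand-in for building a DataSource; returns a description string here.
  static String buildDataSource(DataSourceConfiguration config) {
    return "DataSource(" + config.jdbcUrl + ")";
  }

  // The configuration-based entry point is expressed in terms of the
  // provider-function primitive, mirroring the relationship described above.
  static Supplier<String> asProviderFn(DataSourceConfiguration config) {
    return () -> buildDataSource(config);
  }

  public static void main(String[] args) {
    Supplier<String> providerFn =
        asProviderFn(new DataSourceConfiguration("jdbc:h2:mem:test"));
    System.out.println(providerFn.get()); // DataSource(jdbc:h2:mem:test)
  }
}
```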





[jira] [Updated] (BEAM-7263) Deprecate AutoValue set/getClientConfiguration from JdbcIO

2019-05-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7263:
---
Fix Version/s: 2.14.0

> Deprecate AutoValue set/getClientConfiguration from JdbcIO
> --
>
> Key: BEAM-7263
> URL: https://issues.apache.org/jira/browse/BEAM-7263
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.14.0
>
>
> JdbcIO now relies on `withDataSourceProviderFn` as the primary way to build a 
> DataSource; the main configuration method, `withDataSourceConfiguration`, now 
> relies on it, so the internal method to get the object should not be exposed 
> anymore.





[jira] [Updated] (BEAM-7263) Deprecate AutoValue set/getClientConfiguration from JdbcIO

2019-05-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7263:
---
Status: Open  (was: Triage Needed)

> Deprecate AutoValue set/getClientConfiguration from JdbcIO
> --
>
> Key: BEAM-7263
> URL: https://issues.apache.org/jira/browse/BEAM-7263
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.14.0
>
>
> JdbcIO now relies on `withDataSourceProviderFn` as the primary way to build a 
> DataSource; the main configuration method, `withDataSourceConfiguration`, now 
> relies on it, so the internal method to get the object should not be exposed 
> anymore.





[jira] [Work started] (BEAM-7263) Deprecate AutoValue set/getClientConfiguration from JdbcIO

2019-05-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-7263 started by Ismaël Mejía.
--
> Deprecate AutoValue set/getClientConfiguration from JdbcIO
> --
>
> Key: BEAM-7263
> URL: https://issues.apache.org/jira/browse/BEAM-7263
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.14.0
>
>
> JdbcIO now relies on `withDataSourceProviderFn` as the primary way to build a 
> DataSource; the main configuration method, `withDataSourceConfiguration`, now 
> relies on it, so the internal method to get the object should not be exposed 
> anymore.





[jira] [Updated] (BEAM-7263) Deprecate set/getClientConfiguration from JdbcIO

2019-05-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7263:
---
Summary: Deprecate set/getClientConfiguration from JdbcIO  (was: Deprecate 
AutoValue set/getClientConfiguration from JdbcIO)

> Deprecate set/getClientConfiguration from JdbcIO
> 
>
> Key: BEAM-7263
> URL: https://issues.apache.org/jira/browse/BEAM-7263
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.14.0
>
>
> JdbcIO now relies on `withDataSourceProviderFn` as the primary way to build a 
> DataSource; the main configuration method, `withDataSourceConfiguration`, now 
> relies on it, so the internal method to get the object should not be exposed 
> anymore.





[jira] [Updated] (BEAM-7263) Deprecate set/getClientConfiguration in JdbcIO

2019-05-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7263:
---
Summary: Deprecate set/getClientConfiguration in JdbcIO  (was: Deprecate 
set/getClientConfiguration from JdbcIO)

> Deprecate set/getClientConfiguration in JdbcIO
> --
>
> Key: BEAM-7263
> URL: https://issues.apache.org/jira/browse/BEAM-7263
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.14.0
>
>
> JdbcIO now relies on `withDataSourceProviderFn` as the primary way to build a 
> DataSource; the main configuration method, `withDataSourceConfiguration`, now 
> relies on it, so the internal method to get the object should not be exposed 
> anymore.





[jira] [Work logged] (BEAM-7263) Deprecate set/getClientConfiguration in JdbcIO

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7263?focusedWorklogId=240095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240095
 ]

ASF GitHub Bot logged work on BEAM-7263:


Author: ASF GitHub Bot
Created on: 10/May/19 08:07
Start Date: 10/May/19 08:07
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #8547: [BEAM-7263] 
Deprecate set/getClientConfiguration in JdbcIO
URL: https://github.com/apache/beam/pull/8547
 
 
   This is a minor clean-up to deprecate the two methods because they are not 
used internally anymore; the ClientConfiguration is just wrapped and passed as 
a function to `withDataSourceProviderFn`. This has almost zero impact on end 
users because the set functionality is already covered by the `with` method 
and the `get` method is not mandatory because the user already has the 
reference.
   
   R: @jbonofre 
   
 



Issue Time Tracking
---

Worklog Id: (was: 240095)
Time Spent: 10m
Remaining Estimate: 0h

> Deprecate set/getClientConfiguration in JdbcIO
> --
>
> Key: BEAM-7263
> URL: https://issues.apache.org/jira/browse/BEAM-7263
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> JdbcIO now relies on `withDataSourceProviderFn` as the primary way to build a 
> DataSource; the main configuration method, `withDataSourceConfiguration`, now 
> relies on it, so the internal method to get the object should not be exposed 
> anymore.





[jira] [Work logged] (BEAM-7143) adding withConsumerConfigUpdates

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7143?focusedWorklogId=240097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240097
 ]

ASF GitHub Bot logged work on BEAM-7143:


Author: ASF GitHub Bot
Created on: 10/May/19 08:11
Start Date: 10/May/19 08:11
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #8398: [BEAM-7143] 
adding withConsumerConfigUpdates
URL: https://github.com/apache/beam/pull/8398#issuecomment-491200517
 
 
   @ihji Thanks!
 



Issue Time Tracking
---

Worklog Id: (was: 240097)
Time Spent: 1h 10m  (was: 1h)

> adding withConsumerConfigUpdates
> 
>
> Key: BEAM-7143
> URL: https://issues.apache.org/jira/browse/BEAM-7143
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Minor
> Fix For: 2.13.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> To modify `ConsumerConfig` for the main consumer, we use 
> `updateConsumerProperties`. However, to modify `ConsumerConfig` for the offset 
> consumer, the right method is `withOffsetConsumerConfigOverrides`. It would 
> be good to match both names to improve usability.
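Conceptually both methods perform the same merge: user-supplied entries override the connector's default consumer properties. A minimal sketch of that semantics in plain Java (hypothetical names, not KafkaIO's actual implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "config updates" semantics: user-supplied
// entries override the connector's default consumer properties.
public class ConsumerConfigSketch {

  static Map<String, Object> withConfigUpdates(
      Map<String, Object> defaults, Map<String, Object> updates) {
    Map<String, Object> merged = new HashMap<>(defaults);
    merged.putAll(updates); // updates win over defaults
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Object> defaults = new HashMap<>();
    defaults.put("enable.auto.commit", false);
    defaults.put("receive.buffer.bytes", 524288);

    Map<String, Object> updates = new HashMap<>();
    updates.put("receive.buffer.bytes", 1048576);

    Map<String, Object> merged = withConfigUpdates(defaults, updates);
    System.out.println(merged.get("receive.buffer.bytes")); // 1048576
    System.out.println(merged.get("enable.auto.commit"));   // false
  }
}
```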





[jira] [Created] (BEAM-7265) Update Spark runner to use spark version 2.4.3

2019-05-10 Thread JIRA
Ismaël Mejía created BEAM-7265:
--

 Summary: Update Spark runner to use spark version 2.4.3
 Key: BEAM-7265
 URL: https://issues.apache.org/jira/browse/BEAM-7265
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark
Reporter: Ismaël Mejía
Assignee: Ismaël Mejía


To keep up with upstream fixes.





[jira] [Work started] (BEAM-7265) Update Spark runner to use spark version 2.4.3

2019-05-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-7265 started by Ismaël Mejía.
--
> Update Spark runner to use spark version 2.4.3
> --
>
> Key: BEAM-7265
> URL: https://issues.apache.org/jira/browse/BEAM-7265
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.14.0
>
>
> To keep up with upstream fixes.





[jira] [Updated] (BEAM-7265) Update Spark runner to use spark version 2.4.3

2019-05-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7265:
---
Fix Version/s: 2.14.0

> Update Spark runner to use spark version 2.4.3
> --
>
> Key: BEAM-7265
> URL: https://issues.apache.org/jira/browse/BEAM-7265
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.14.0
>
>
> To keep up with upstream fixes.





[jira] [Updated] (BEAM-7265) Update Spark runner to use spark version 2.4.3

2019-05-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7265:
---
Status: Open  (was: Triage Needed)

> Update Spark runner to use spark version 2.4.3
> --
>
> Key: BEAM-7265
> URL: https://issues.apache.org/jira/browse/BEAM-7265
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.14.0
>
>
> To keep up with upstream fixes.





[jira] [Work logged] (BEAM-7265) Update Spark runner to use spark version 2.4.3

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7265?focusedWorklogId=240104&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240104
 ]

ASF GitHub Bot logged work on BEAM-7265:


Author: ASF GitHub Bot
Created on: 10/May/19 08:20
Start Date: 10/May/19 08:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #8548: [BEAM-7265] 
Update Spark runner to use spark version 2.4.3
URL: https://github.com/apache/beam/pull/8548
 
 
   R: @aromanenko-dev 
   CC: @jbonofre 
 



Issue Time Tracking
---

Worklog Id: (was: 240104)
Time Spent: 10m
Remaining Estimate: 0h

> Update Spark runner to use spark version 2.4.3
> --
>
> Key: BEAM-7265
> URL: https://issues.apache.org/jira/browse/BEAM-7265
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> To keep up with upstream fixes.





[jira] [Updated] (BEAM-7264) Remove set/getClientConfiguration in JdbcIO

2019-05-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7264:
---
Fix Version/s: 2.16.0

> Remove set/getClientConfiguration in JdbcIO
> ---
>
> Key: BEAM-7264
> URL: https://issues.apache.org/jira/browse/BEAM-7264
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.16.0
>
>
> BEAM-7263 deprecated these methods and now they can be removed.
> This is a minor clean up to remove the two methods because they are not used 
> internally anymore, the `ClientConfiguration` object is now wrapped and 
> passed as a function to `withDataSourceProviderFn`. This has almost zero 
> impact on end users because the set functionality is already covered by the 
> `with` method and the `get` method is not mandatory because the user already 
> has the reference.





[jira] [Created] (BEAM-7264) Remove set/getClientConfiguration in JdbcIO

2019-05-10 Thread JIRA
Ismaël Mejía created BEAM-7264:
--

 Summary: Remove set/getClientConfiguration in JdbcIO
 Key: BEAM-7264
 URL: https://issues.apache.org/jira/browse/BEAM-7264
 Project: Beam
  Issue Type: Improvement
  Components: io-java-jdbc
Reporter: Ismaël Mejía
Assignee: Ismaël Mejía


BEAM-7263 deprecated these methods and now they can be removed.

This is a minor clean up to remove the two methods because they are not used 
internally anymore, the `ClientConfiguration` object is now wrapped and passed 
as a function to `withDataSourceProviderFn`. This has almost zero impact on end 
users because the set functionality is already covered by the `with` method and 
the `get` method is not mandatory because the user already has the reference.





[jira] [Work logged] (BEAM-7265) Update Spark runner to use spark version 2.4.3

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7265?focusedWorklogId=240105&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240105
 ]

ASF GitHub Bot logged work on BEAM-7265:


Author: ASF GitHub Bot
Created on: 10/May/19 08:20
Start Date: 10/May/19 08:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #8548: [BEAM-7265] Update 
Spark runner to use spark version 2.4.3
URL: https://github.com/apache/beam/pull/8548#issuecomment-491203204
 
 
   Run Spark ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 240105)
Time Spent: 20m  (was: 10m)

> Update Spark runner to use spark version 2.4.3
> --
>
> Key: BEAM-7265
> URL: https://issues.apache.org/jira/browse/BEAM-7265
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> To keep up with upstream fixes.





[jira] [Updated] (BEAM-7264) Remove set/getClientConfiguration in JdbcIO

2019-05-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7264:
---
Status: Open  (was: Triage Needed)

> Remove set/getClientConfiguration in JdbcIO
> ---
>
> Key: BEAM-7264
> URL: https://issues.apache.org/jira/browse/BEAM-7264
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>
> BEAM-7263 deprecated these methods and now they can be removed.
> This is a minor clean up to remove the two methods because they are not used 
> internally anymore, the `ClientConfiguration` object is now wrapped and 
> passed as a function to `withDataSourceProviderFn`. This has almost zero 
> impact on end users because the set functionality is already covered by the 
> `with` method and the `get` method is not mandatory because the user already 
> has the reference.





[jira] [Work logged] (BEAM-6627) Use Metrics API in IO performance tests

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6627?focusedWorklogId=240109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240109
 ]

ASF GitHub Bot logged work on BEAM-6627:


Author: ASF GitHub Bot
Created on: 10/May/19 08:27
Start Date: 10/May/19 08:27
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8400: [BEAM-6627] Added 
byte and item counting metrics to integration tests
URL: https://github.com/apache/beam/pull/8400#issuecomment-490767376
 
 
   @lgajowy I reverted the enum creation. What should I do with the commit 
history?
 



Issue Time Tracking
---

Worklog Id: (was: 240109)
Time Spent: 13h 40m  (was: 13.5h)

> Use Metrics API in IO performance tests
> ---
>
> Key: BEAM-6627
> URL: https://issues.apache.org/jira/browse/BEAM-6627
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-6627) Use Metrics API in IO performance tests

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6627?focusedWorklogId=240108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240108
 ]

ASF GitHub Bot logged work on BEAM-6627:


Author: ASF GitHub Bot
Created on: 10/May/19 08:27
Start Date: 10/May/19 08:27
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8400: [BEAM-6627] Added 
byte and item counting metrics to integration tests
URL: https://github.com/apache/beam/pull/8400#issuecomment-491205377
 
 
   @lgajowy I squashed the commits to be well organized, I think it's ready to 
merge. WDYT?
 



Issue Time Tracking
---

Worklog Id: (was: 240108)
Time Spent: 13.5h  (was: 13h 20m)

> Use Metrics API in IO performance tests
> ---
>
> Key: BEAM-6627
> URL: https://issues.apache.org/jira/browse/BEAM-6627
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-6627) Use Metrics API in IO performance tests

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6627?focusedWorklogId=240110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240110
 ]

ASF GitHub Bot logged work on BEAM-6627:


Author: ASF GitHub Bot
Created on: 10/May/19 08:28
Start Date: 10/May/19 08:28
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8400: [BEAM-6627] Added 
byte and item counting metrics to integration tests
URL: https://github.com/apache/beam/pull/8400#issuecomment-491205377
 
 
   @lgajowy I squashed the commits to be well organized, I think it's ready to 
merge. WDYT?
 



Issue Time Tracking
---

Worklog Id: (was: 240110)
Time Spent: 13h 50m  (was: 13h 40m)

> Use Metrics API in IO performance tests
> ---
>
> Key: BEAM-6627
> URL: https://issues.apache.org/jira/browse/BEAM-6627
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7028) Create ParDo and Combine streaming load test jobs for Java SDK

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7028?focusedWorklogId=240111&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240111
 ]

ASF GitHub Bot logged work on BEAM-7028:


Author: ASF GitHub Bot
Created on: 10/May/19 08:30
Start Date: 10/May/19 08:30
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8539: [BEAM-7028] Change 
ParDo load test according to the proposal
URL: https://github.com/apache/beam/pull/8539#issuecomment-491206358
 
 
   Run Load Tests Java ParDo Dataflow Streaming
 



Issue Time Tracking
---

Worklog Id: (was: 240111)
Time Spent: 16.5h  (was: 16h 20m)

> Create ParDo and Combine streaming load test jobs for Java SDK
> --
>
> Key: BEAM-7028
> URL: https://issues.apache.org/jira/browse/BEAM-7028
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7028) Create ParDo and Combine streaming load test jobs for Java SDK

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7028?focusedWorklogId=240113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240113
 ]

ASF GitHub Bot logged work on BEAM-7028:


Author: ASF GitHub Bot
Created on: 10/May/19 08:30
Start Date: 10/May/19 08:30
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8539: [BEAM-7028] Change 
ParDo load test according to the proposal
URL: https://github.com/apache/beam/pull/8539#issuecomment-491206432
 
 
   Run Seed Job
 



Issue Time Tracking
---

Worklog Id: (was: 240113)
Time Spent: 16h 50m  (was: 16h 40m)

> Create ParDo and Combine streaming load test jobs for Java SDK
> --
>
> Key: BEAM-7028
> URL: https://issues.apache.org/jira/browse/BEAM-7028
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7028) Create ParDo and Combine streaming load test jobs for Java SDK

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7028?focusedWorklogId=240112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240112
 ]

ASF GitHub Bot logged work on BEAM-7028:


Author: ASF GitHub Bot
Created on: 10/May/19 08:30
Start Date: 10/May/19 08:30
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8539: [BEAM-7028] Change 
ParDo load test according to the proposal
URL: https://github.com/apache/beam/pull/8539#issuecomment-491206358
 
 
   Run Load Tests Java ParDo Dataflow Streaming
 



Issue Time Tracking
---

Worklog Id: (was: 240112)
Time Spent: 16h 40m  (was: 16.5h)

> Create ParDo and Combine streaming load test jobs for Java SDK
> --
>
> Key: BEAM-7028
> URL: https://issues.apache.org/jira/browse/BEAM-7028
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-6627) Use Metrics API in IO performance tests

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6627?focusedWorklogId=240114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240114
 ]

ASF GitHub Bot logged work on BEAM-6627:


Author: ASF GitHub Bot
Created on: 10/May/19 08:33
Start Date: 10/May/19 08:33
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8400: [BEAM-6627] Added 
byte and item counting metrics to integration tests
URL: https://github.com/apache/beam/pull/8400#issuecomment-491207464
 
 
   @lgajowy I squashed the commits to make history clearer and ran the tests 
locally to check if reporting works - everything looks ready to merge. WDYT?
 



Issue Time Tracking
---

Worklog Id: (was: 240114)
Time Spent: 14h  (was: 13h 50m)

> Use Metrics API in IO performance tests
> ---
>
> Key: BEAM-6627
> URL: https://issues.apache.org/jira/browse/BEAM-6627
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 14h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-6627) Use Metrics API in IO performance tests

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6627?focusedWorklogId=240120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240120
 ]

ASF GitHub Bot logged work on BEAM-6627:


Author: ASF GitHub Bot
Created on: 10/May/19 08:47
Start Date: 10/May/19 08:47
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #8400: [BEAM-6627] 
Added byte and item counting metrics to integration tests
URL: https://github.com/apache/beam/pull/8400#discussion_r282798257
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOIT.java
 ##
 @@ -148,6 +150,16 @@ private void gatherAndPublishMetrics(PipelineResult 
writeResult, PipelineResult
   return NamedTestResult.create(
   uuid, timestamp, "write_time", (writeEnd - writeStart) / 1e3);
 });
+suppliers.add(
+reader -> {
+  double totalBytes = reader.getCounterMetric("write_bytes");
+  return NamedTestResult.create(uuid, timestamp, "write_bytes", totalBytes);
 
 Review comment:
   There's no need to collect bytes/items separately for write & read; if they 
differ, the PAssert will fail. 
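The quoted diff uses a supplier pattern: each supplier maps a metric reader to one named result row, and all suppliers are applied to the same reader. A self-contained sketch of that pattern with hypothetical stand-in types (not Beam's actual `NamedTestResult` or reader classes):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the metric-supplier pattern: each supplier
// maps a reader to one named result; all are applied to the same reader.
public class MetricSupplierSketch {

  // Stand-in for Beam's NamedTestResult.
  static final class NamedTestResult {
    final String id;
    final String metric;
    final double value;

    NamedTestResult(String id, String metric, double value) {
      this.id = id;
      this.metric = metric;
      this.value = value;
    }
  }

  // Stand-in for the object that exposes counter metrics of a pipeline run.
  interface MetricReader {
    double getCounterMetric(String name);
  }

  static List<NamedTestResult> gather(String uuid, MetricReader reader) {
    List<Function<MetricReader, NamedTestResult>> suppliers = new ArrayList<>();
    suppliers.add(r ->
        new NamedTestResult(uuid, "write_bytes", r.getCounterMetric("write_bytes")));
    suppliers.add(r ->
        new NamedTestResult(uuid, "write_elements", r.getCounterMetric("write_elements")));

    List<NamedTestResult> results = new ArrayList<>();
    for (Function<MetricReader, NamedTestResult> supplier : suppliers) {
      results.add(supplier.apply(reader));
    }
    return results;
  }

  public static void main(String[] args) {
    MetricReader fake = name -> name.equals("write_bytes") ? 2048.0 : 100.0;
    List<NamedTestResult> results = gather("run-1", fake);
    System.out.println(results.size()); // 2
  }
}
```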
 



Issue Time Tracking
---

Worklog Id: (was: 240120)
Time Spent: 14h 10m  (was: 14h)

> Use Metrics API in IO performance tests
> ---
>
> Key: BEAM-6627
> URL: https://issues.apache.org/jira/browse/BEAM-6627
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 14h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7265) Update Spark runner to use spark version 2.4.3

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7265?focusedWorklogId=240123&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240123
 ]

ASF GitHub Bot logged work on BEAM-7265:


Author: ASF GitHub Bot
Created on: 10/May/19 08:52
Start Date: 10/May/19 08:52
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #8548: [BEAM-7265] 
Update Spark runner to use spark version 2.4.3
URL: https://github.com/apache/beam/pull/8548#issuecomment-491213441
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 240123)
Time Spent: 0.5h  (was: 20m)

> Update Spark runner to use spark version 2.4.3
> --
>
> Key: BEAM-7265
> URL: https://issues.apache.org/jira/browse/BEAM-7265
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> To keep up with upstream fixes.





[jira] [Work logged] (BEAM-6627) Use Metrics API in IO performance tests

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6627?focusedWorklogId=240129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240129
 ]

ASF GitHub Bot logged work on BEAM-6627:


Author: ASF GitHub Bot
Created on: 10/May/19 08:59
Start Date: 10/May/19 08:59
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #8400: [BEAM-6627] 
Added byte and item counting metrics to integration tests
URL: https://github.com/apache/beam/pull/8400#discussion_r282802429
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOIT.java
 ##
 @@ -148,6 +150,16 @@ private void gatherAndPublishMetrics(PipelineResult 
writeResult, PipelineResult
   return NamedTestResult.create(
   uuid, timestamp, "write_time", (writeEnd - writeStart) / 1e3);
 });
+suppliers.add(
+reader -> {
+  double totalBytes = reader.getCounterMetric("write_bytes");
+  return NamedTestResult.create(uuid, timestamp, "write_bytes", 
totalBytes);
 
 Review comment:
   You're right, I corrected it
 



Issue Time Tracking
---

Worklog Id: (was: 240129)
Time Spent: 14h 20m  (was: 14h 10m)

> Use Metrics API in IO performance tests
> ---
>
> Key: BEAM-6627
> URL: https://issues.apache.org/jira/browse/BEAM-6627
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 14h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-6627) Use Metrics API in IO performance tests

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6627?focusedWorklogId=240131&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240131
 ]

ASF GitHub Bot logged work on BEAM-6627:


Author: ASF GitHub Bot
Created on: 10/May/19 09:00
Start Date: 10/May/19 09:00
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8400: [BEAM-6627] Added 
byte and item counting metrics to integration tests
URL: https://github.com/apache/beam/pull/8400#issuecomment-491215815
 
 
   I added the metrics to JDBC and Mongo IOITs, I'll edit the HadoopFormatIOIT 
in another PR, as it changed in master already and I don't want to create 
conflicts.
 



Issue Time Tracking
---

Worklog Id: (was: 240131)
Time Spent: 14.5h  (was: 14h 20m)

> Use Metrics API in IO performance tests
> ---
>
> Key: BEAM-6627
> URL: https://issues.apache.org/jira/browse/BEAM-6627
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7145) Make Flink Runner compatible with Flink 1.8

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7145?focusedWorklogId=240139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240139
 ]

ASF GitHub Bot logged work on BEAM-7145:


Author: ASF GitHub Bot
Created on: 10/May/19 09:22
Start Date: 10/May/19 09:22
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #8540: [BEAM-7145] Make 
FlinkRunner compatible with Flink 1.8
URL: https://github.com/apache/beam/pull/8540#issuecomment-491222421
 
 
   Run RAT PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 240139)
Time Spent: 20m  (was: 10m)

> Make Flink Runner compatible with Flink 1.8
> ---
>
> Key: BEAM-7145
> URL: https://issues.apache.org/jira/browse/BEAM-7145
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Users have asked about Flink 1.8 support. From a quick look,
>  * There are changes related to how TypeSerializers are snapshotted. We might 
> have to copy {{CoderTypeSerializer}} to ensure compatibility across the 
> different Flink Runner build targets.
>  * StandaloneClusterClient has been removed. We might drop support for legacy 
> deployment and only allow REST.





[jira] [Resolved] (BEAM-7263) Deprecate set/getClientConfiguration in JdbcIO

2019-05-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré resolved BEAM-7263.

Resolution: Fixed

> Deprecate set/getClientConfiguration in JdbcIO
> --
>
> Key: BEAM-7263
> URL: https://issues.apache.org/jira/browse/BEAM-7263
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> JdbcIO now relies on `withDataSourceProviderFn` as the primary way to build a 
> DataSource; the main method used to do this, `withDatasourceConfiguration`, 
> now relies on it, so the internal method to get the object should not be 
> exposed anymore.





[jira] [Work logged] (BEAM-7263) Deprecate set/getClientConfiguration in JdbcIO

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7263?focusedWorklogId=240136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240136
 ]

ASF GitHub Bot logged work on BEAM-7263:


Author: ASF GitHub Bot
Created on: 10/May/19 09:17
Start Date: 10/May/19 09:17
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on pull request #8547: [BEAM-7263] 
Deprecate set/getClientConfiguration in JdbcIO
URL: https://github.com/apache/beam/pull/8547
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 240136)
Time Spent: 20m  (was: 10m)

> Deprecate set/getClientConfiguration in JdbcIO
> --
>
> Key: BEAM-7263
> URL: https://issues.apache.org/jira/browse/BEAM-7263
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> JdbcIO now relies on `withDataSourceProviderFn` as the primary way to build a 
> DataSource; the main method used to do this, `withDatasourceConfiguration`, 
> now relies on it, so the internal method to get the object should not be 
> exposed anymore.





[jira] [Commented] (BEAM-6813) Issues with state + timers in java Direct Runner

2019-05-10 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-6813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837089#comment-16837089
 ] 

Jan Lukavský commented on BEAM-6813:


I suppose the problem is caused by either {{keyCoder}} or {{accumulatorCoder}} 
missing {{hashCode}} and {{equals}}. Adding these methods should fix the 
problem. A discussion on how to prevent this type of bug is on the dev mailing list.
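
The symptom matches what happens with any hash-based table whose keys lack 
{{equals}}/{{hashCode}}: a write lands under one instance, and a logically equal 
but distinct instance reads back null. A minimal, self-contained Java sketch of 
that failure mode (the key classes here are hypothetical illustrations, not 
Beam's actual Coder API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a coder/key type without equals()/hashCode():
// hash-map lookups fall back to reference identity.
final class KeyWithoutEquals {
    final String id;
    KeyWithoutEquals(String id) { this.id = id; }
}

// The same type with value-based equals()/hashCode(), as suggested above.
final class KeyWithEquals {
    final String id;
    KeyWithEquals(String id) { this.id = id; }
    @Override public boolean equals(Object o) {
        return o instanceof KeyWithEquals && ((KeyWithEquals) o).id.equals(id);
    }
    @Override public int hashCode() { return id.hashCode(); }
}

public class StateTableLookupDemo {
    public static void main(String[] args) {
        Map<KeyWithoutEquals, String> broken = new HashMap<>();
        broken.put(new KeyWithoutEquals("k"), "state");
        // A logically equal but distinct instance misses the entry,
        // analogous to the state value coming back null in the timer callback.
        System.out.println(broken.get(new KeyWithoutEquals("k"))); // null

        Map<KeyWithEquals, String> fixed = new HashMap<>();
        fixed.put(new KeyWithEquals("k"), "state");
        System.out.println(fixed.get(new KeyWithEquals("k"))); // state
    }
}
```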

> Issues with state + timers in java Direct Runner 
> -
>
> Key: BEAM-6813
> URL: https://issues.apache.org/jira/browse/BEAM-6813
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.11.0
>Reporter: Steve Niemitz
>Priority: Major
>
> I was experimenting with a stateful DoFn with timers, and ran into a weird 
> bug where a state cell I was writing to would come back as null when I read 
> it inside a timer callback.
> I've attached the code below [1] (please excuse the scala ;) ).
> After I dug into this a little bit, I found that the state's value was 
> present in the `underlying` table in CopyOnAccessMemoryStateTable [2], but 
> not set in the `stateTable` itself on the instance. [3]   Based on my very 
> rudimentary understanding of how this works in the direct runner, it seems 
> like commit() is not being called on the state table before the timer is 
> firing?
>   
>  [1]
> {code:java}
> private final class AggregatorDoFn[K, V, Acc, Out](
>   combiner: CombineFn[V, Acc, Out],
>   keyCoder: Coder[K],
>   accumulatorCoder: Coder[Acc]
> ) extends DoFn[KV[K, V], KV[K, Out]] {
>   @StateId(KeyId)
>   private final val keySpec = StateSpecs.value(keyCoder)
>   @StateId(AggregationId)
>   private final val stateSpec = StateSpecs.combining(accumulatorCoder, 
> combiner)
>   @StateId("numElements")
>   private final val numElementsSpec = StateSpecs.combining(Sum.ofLongs())
>   @TimerId(FlushTimerId)
>   private final val flushTimerSpec = 
> TimerSpecs.timer(TimeDomain.PROCESSING_TIME)
>   @ProcessElement
>   def processElement(
> @StateId(KeyId) key: ValueState[K],
> @StateId(AggregationId) state: CombiningState[V, Acc, Out],
> @StateId("numElements") numElements: CombiningState[JLong, _, JLong],
> @TimerId(FlushTimerId) flushTimer: Timer,
> @Element element: KV[K, V],
> window: BoundedWindow
>   ): Unit = {
> key.write(element.getKey)
> state.add(element.getValue)
> numElements.add(1L)
> if (numElements.read() == 1) {
>   flushTimer
> .offset(Duration.standardSeconds(10))
> .setRelative()
> }
>   }
>   @OnTimer(FlushTimerId)
>   def onFlushTimer(
> @StateId(KeyId) key: ValueState[K],
> @StateId(AggregationId) state: CombiningState[V, _, Out],
> @StateId("numElements") numElements: CombiningState[JLong, _, JLong],
> output: OutputReceiver[KV[K, Out]]
>   ): Unit = {
> if (numElements.read() > 0) {
>   val k = key.read()
>   output.output(
> KV.of(k, state.read())
>   )
> }
> numElements.clear()
>   }
> }{code}
> [2]
>  [https://imgur.com/a/xvPR5nd]
> [3]
>  [https://imgur.com/a/jznMdaQ]
>   





[jira] [Work logged] (BEAM-7028) Create ParDo and Combine streaming load test jobs for Java SDK

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7028?focusedWorklogId=240155&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240155
 ]

ASF GitHub Bot logged work on BEAM-7028:


Author: ASF GitHub Bot
Created on: 10/May/19 09:41
Start Date: 10/May/19 09:41
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8539: [BEAM-7028] Change 
ParDo load test according to the proposal
URL: https://github.com/apache/beam/pull/8539#issuecomment-491228040
 
 
   Run Load Tests Java ParDo Dataflow Streaming
 



Issue Time Tracking
---

Worklog Id: (was: 240155)
Time Spent: 17h  (was: 16h 50m)

> Create ParDo and Combine streaming load test jobs for Java SDK
> --
>
> Key: BEAM-7028
> URL: https://issues.apache.org/jira/browse/BEAM-7028
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 17h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7265) Update Spark runner to use spark version 2.4.3

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7265?focusedWorklogId=240153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240153
 ]

ASF GitHub Bot logged work on BEAM-7265:


Author: ASF GitHub Bot
Created on: 10/May/19 09:40
Start Date: 10/May/19 09:40
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #8548: 
[BEAM-7265] Update Spark runner to use spark version 2.4.3
URL: https://github.com/apache/beam/pull/8548
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 240153)
Time Spent: 40m  (was: 0.5h)

> Update Spark runner to use spark version 2.4.3
> --
>
> Key: BEAM-7265
> URL: https://issues.apache.org/jira/browse/BEAM-7265
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> To keep up with upstream fixes.





[jira] [Commented] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-05-10 Thread Maximilian Michels (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837105#comment-16837105
 ] 

Maximilian Michels commented on BEAM-6923:
--

It's not surprising you see this with the Spark Runner as well because the 
artifact staging code is shared between the portable Flink and Spark Runners.

[~ŁukaszG] Is this consistently happening with just a single pipeline execution 
after a fresh start of the job server?

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png
>
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following, portability related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io//java:latest'{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: 
> Java heap space
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at 
> com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> This does not happen when I'm using a local filesystem for the artifact 
> staging location. 
>  





[jira] [Resolved] (BEAM-7265) Update Spark runner to use spark version 2.4.3

2019-05-10 Thread Alexey Romanenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko resolved BEAM-7265.

Resolution: Fixed

> Update Spark runner to use spark version 2.4.3
> --
>
> Key: BEAM-7265
> URL: https://issues.apache.org/jira/browse/BEAM-7265
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> To keep up with upstream fixes.





[jira] [Comment Edited] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-05-10 Thread Lukasz Gajowy (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837135#comment-16837135
 ] 

Lukasz Gajowy edited comment on BEAM-6923 at 5/10/19 10:09 AM:
---

[~mxm] As far as I remember, there were one or two times the job actually 
succeeded (luck? more free memory at that moment?) but definitely most of the 
time it failed, regardless of whether I had just started the job server or used 
an already running one. I only executed a single pipeline at once.


was (Author: łukaszg):
[~mxm] yes. As far as I remember there was one or two times the job actually 
succeeded (luck? more free memory at that moment?) but definitely most of the 
time it failed regardles of that I just started the job Server or used an 
already running one. I only executed single pipeline at once.






[jira] [Commented] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-05-10 Thread Lukasz Gajowy (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837135#comment-16837135
 ] 

Lukasz Gajowy commented on BEAM-6923:
-

[~mxm] yes. As far as I remember, there were one or two times the job actually 
succeeded (luck? more free memory at that moment?) but definitely most of the 
time it failed, regardless of whether I had just started the job server or used 
an already running one. I only executed a single pipeline at once.






[jira] [Comment Edited] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-05-10 Thread Lukasz Gajowy (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837135#comment-16837135
 ] 

Lukasz Gajowy edited comment on BEAM-6923 at 5/10/19 10:09 AM:
---

[~mxm] yes. As far as I remember, there were one or two times the job actually 
succeeded (luck? more free memory at that moment?) but definitely most of the 
time it failed, regardless of whether I had just started the job server or used 
an already running one. I only executed a single pipeline at once.


was (Author: łukaszg):
[~mxm] yes. As far as I remember there was one or two times the job actually 
succeeded (luck? more free memory at that moment?) but definitely most of the 
time it failed regardles of that I just started the job Server or used an 
already running one. I only executed single pipeline at once.






[jira] [Comment Edited] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-05-10 Thread Lukasz Gajowy (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837135#comment-16837135
 ] 

Lukasz Gajowy edited comment on BEAM-6923 at 5/10/19 10:09 AM:
---

[~mxm] yes. As far as I remember, there were one or two times the job actually 
succeeded (luck? more free memory at that moment?) but definitely most of the 
time it failed, regardless of whether I had just started the job server or used 
an already running one. I only executed a single pipeline at once.


was (Author: łukaszg):
[~mxm] As far as I remember there was one or two times the job actually 
succeeded (luck? more free memory at that moment?) but definitely most of the 
time it failed regardles of that I just started the job Server or used an 
already running one. I only executed single pipeline at once.






[jira] [Work logged] (BEAM-7028) Create ParDo and Combine streaming load test jobs for Java SDK

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7028?focusedWorklogId=240166&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240166
 ]

ASF GitHub Bot logged work on BEAM-7028:


Author: ASF GitHub Bot
Created on: 10/May/19 10:12
Start Date: 10/May/19 10:12
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #8539: [BEAM-7028] Change 
ParDo load test according to the proposal
URL: https://github.com/apache/beam/pull/8539#issuecomment-491237211
 
 
   R: @pabloem can you take a look?
 



Issue Time Tracking
---

Worklog Id: (was: 240166)
Time Spent: 17h 10m  (was: 17h)

> Create ParDo and Combine streaming load test jobs for Java SDK
> --
>
> Key: BEAM-7028
> URL: https://issues.apache.org/jira/browse/BEAM-7028
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (BEAM-7266) Pipeline run does not terminate because of Dataflow runner can close file system writer

2019-05-10 Thread Fabian (JIRA)
Fabian created BEAM-7266:


 Summary: Pipeline run does not terminate because Dataflow 
runner can not close file system writer
 Key: BEAM-7266
 URL: https://issues.apache.org/jira/browse/BEAM-7266
 Project: Beam
  Issue Type: Bug
  Components: io-python-gcp, runner-dataflow
Affects Versions: 2.11.0
Reporter: Fabian


We are using Apache Beam in version 2.11.0 (Python SDK) with the Dataflow 
runner running on the Google Cloud Platform. Two pipeline runs did not 
terminate, i.e. after multiple days (instead of some minutes) they were still 
running. The only error that was logged is a failure to close a writer:

{code:java}
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", 
line 649, in do_work
work_executor.execute()
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", 
line 178, in execute
op.finish()
  File "dataflow_worker/native_operations.py", line 93, in 
dataflow_worker.native_operations.NativeWriteOperation.finish
def finish(self):
  File "dataflow_worker/native_operations.py", line 94, in 
dataflow_worker.native_operations.NativeWriteOperation.finish
with self.scoped_finish_state:
  File "dataflow_worker/native_operations.py", line 95, in 
dataflow_worker.native_operations.NativeWriteOperation.finish
self.writer.__exit__(None, None, None)
  File 
"/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativeavroio.py", line 
277, in __exit__
self._data_file_writer.close()
  File "/usr/local/lib/python2.7/dist-packages/avro/datafile.py", line 220, in 
close
self.writer.close()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filesystemio.py", 
line 202, in close
self._uploader.finish()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/gcsio.py", 
line 606, in finish
raise self._upload_thread.last_error  # pylint: disable=raising-bad-type
NotImplementedError{code}
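The traceback bottoms out in gcsio's finish() re-raising self._upload_thread.last_error: the upload runs on a background thread that records its exception, and the writer only surfaces it when it is closed. A minimal sketch of that pattern (hypothetical names, not the actual apache_beam gcsio code):

```python
import threading

class BackgroundUploader:
    """Sketch of the error-propagation pattern visible in the traceback:
    a background upload thread records its failure in last_error, and
    finish() re-raises it on the calling thread. Illustrative only."""

    def __init__(self, upload_fn):
        self.last_error = None
        self._thread = threading.Thread(target=self._run, args=(upload_fn,))
        self._thread.start()

    def _run(self, upload_fn):
        try:
            upload_fn()
        except Exception as e:  # record it; don't crash the worker thread
            self.last_error = e

    def finish(self):
        self._thread.join()
        if self.last_error is not None:
            # Mirrors gcsio.py: raise self._upload_thread.last_error
            raise self.last_error

def failing_upload():
    raise NotImplementedError("simulated transport failure")

uploader = BackgroundUploader(failing_upload)
try:
    uploader.finish()
    outcome = "ok"
except NotImplementedError:
    outcome = "NotImplementedError re-raised in finish()"
```

If the error is swallowed or the join never completes, the pipeline appears to hang rather than fail, which matches the multi-day runs described above.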





[jira] [Updated] (BEAM-7266) Pipeline run does not terminate because of Dataflow runner can not close file system writer

2019-05-10 Thread Fabian (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian updated BEAM-7266:
-
Summary: Pipeline run does not terminate because of Dataflow runner can not 
close file system writer  (was: Pipeline run does not terminate because of 
Dataflow runner can close file system writer)

> Pipeline run does not terminate because of Dataflow runner can not close file 
> system writer
> ---
>
> Key: BEAM-7266
> URL: https://issues.apache.org/jira/browse/BEAM-7266
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp, runner-dataflow
>Affects Versions: 2.11.0
>Reporter: Fabian
>Priority: Major
>
> We are using Apache Beam version 2.11.0 (Python SDK) with the Dataflow 
> runner running on the Google Cloud Platform. Two pipeline runs did not 
> terminate, i.e. after multiple days (instead of some minutes) they were 
> still running. The only error that was logged is:
> It fails to close a writer:
> {code:java}
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 
> 649, in do_work
> work_executor.execute()
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", 
> line 178, in execute
> op.finish()
>   File "dataflow_worker/native_operations.py", line 93, in 
> dataflow_worker.native_operations.NativeWriteOperation.finish
> def finish(self):
>   File "dataflow_worker/native_operations.py", line 94, in 
> dataflow_worker.native_operations.NativeWriteOperation.finish
> with self.scoped_finish_state:
>   File "dataflow_worker/native_operations.py", line 95, in 
> dataflow_worker.native_operations.NativeWriteOperation.finish
> self.writer.__exit__(None, None, None)
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativeavroio.py", 
> line 277, in __exit__
> self._data_file_writer.close()
>   File "/usr/local/lib/python2.7/dist-packages/avro/datafile.py", line 220, 
> in close
> self.writer.close()
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filesystemio.py", line 
> 202, in close
> self._uploader.finish()
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/gcsio.py", 
> line 606, in finish
> raise self._upload_thread.last_error  # pylint: disable=raising-bad-type
> NotImplementedError{code}





[jira] [Closed] (BEAM-7265) Update Spark runner to use spark version 2.4.3

2019-05-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía closed BEAM-7265.
--

> Update Spark runner to use spark version 2.4.3
> --
>
> Key: BEAM-7265
> URL: https://issues.apache.org/jira/browse/BEAM-7265
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> To keep up with upstream fixes.





[jira] [Closed] (BEAM-7263) Deprecate set/getClientConfiguration in JdbcIO

2019-05-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía closed BEAM-7263.
--

> Deprecate set/getClientConfiguration in JdbcIO
> --
>
> Key: BEAM-7263
> URL: https://issues.apache.org/jira/browse/BEAM-7263
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> JdbcIO now relies on `withDataSourceProviderFn` as the primary way to build a 
> DataSource; `withDataSourceConfiguration`, the main method used to do this, 
> now relies on it, so the internal method to get the object should not be 
> exposed anymore.





[jira] [Work logged] (BEAM-7126) Double encoding of state keys in portable Flink runner

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7126?focusedWorklogId=240230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240230
 ]

ASF GitHub Bot logged work on BEAM-7126:


Author: ASF GitHub Bot
Created on: 10/May/19 13:22
Start Date: 10/May/19 13:22
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #8472: [BEAM-7126] Remove 
duplicate key encoding for state backend
URL: https://github.com/apache/beam/pull/8472#issuecomment-491286843
 
 
   @tweise Any news?
 



Issue Time Tracking
---

Worklog Id: (was: 240230)
Time Spent: 0.5h  (was: 20m)

> Double encoding of state keys in portable Flink runner
> --
>
> Key: BEAM-7126
> URL: https://issues.apache.org/jira/browse/BEAM-7126
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.11.0
>Reporter: Thomas Weise
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> State keys currently need to be encoded as NESTED. My attempt to use the 
> ByteString directly in BEAM-7112 caused checkpointing to fail. We should look 
> into eliminating the redundant key encoding and adjusting 
> StateRequestHandlers.
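To see why a redundant encoding pass breaks key lookups, consider a toy length-prefixed encoder (a fixed 4-byte prefix standing in for Beam's actual NESTED-context varint prefix; illustrative only):

```python
import struct

def nested_encode(payload: bytes) -> bytes:
    """Toy "nested" encoding: prefix the payload with its length so it can
    be embedded in a larger stream. This is a simplification of Beam's
    NESTED coder context, which uses a varint length prefix."""
    return struct.pack(">I", len(payload)) + payload

key = b"user-42"
once = nested_encode(key)    # the form the state backend expects
twice = nested_encode(once)  # a redundant second encoding pass

# The doubly encoded key is a different byte string, so state lookups and
# checkpoint restores keyed on the singly encoded form will not match it.
```

This is why simply passing the raw ByteString through (as attempted in BEAM-7112) fails against state that was written with the extra encoding layer: the stored key bytes no longer line up.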





[jira] [Commented] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-05-10 Thread Marcelo Pio de Castro (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837294#comment-16837294
 ] 

Marcelo Pio de Castro commented on BEAM-6923:
-

I saw that the release notes for google-api-java-client mentioned resumable 
uploads 
([https://github.com/googleapis/google-api-java-client/releases/tag/v1.27.0]), 
so I tried to downgrade the Apache Beam version to 2.8.0, the first one that 
does not use google-api-java-client 1.27.0.

Then the job fails almost instantly with an OOM, just in a different part:
{code:java}
Caused by: java.lang.OutOfMemoryError: Java heap space
at 
shaded.google.api.client.googleapis.media.MediaHttpUploader.setContentAndHeadersOnCurrentRequest(MediaHttpUploader.java:603)
at 
shaded.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:409)
at 
shaded.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
at 
shaded.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
at 
shaded.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at 
shaded.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at 
shaded.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:358)
at java.util.concurrent.FutureTask.run(Unknown Source)
{code}

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png
>
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following, portability related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io//java:latest'{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: 
> Java heap space
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at 
> com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> This does not happen when I'm using a local filesystem for the artifact 
> staging location. 
>  
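The OOM strikes while MediaHttpUploader builds a content chunk, which it buffers fully in heap for each in-flight resumable upload. A back-of-envelope sketch of why many concurrent artifact uploads can exhaust a modest job-server heap (the 10 MB chunk size is an assumed client default; verify it against your google-api-client version):

```python
def required_heap_mb(concurrent_uploads: int, chunk_size_mb: int = 10) -> int:
    """Rough heap needed just for upload buffers: each resumable upload
    holds one content chunk in memory before sending it. The default
    chunk size is an assumption here, not a measured value."""
    return concurrent_uploads * chunk_size_mb

# Staging many artifacts to GCS at once multiplies the buffer footprint:
estimate = required_heap_mb(concurrent_uploads=50)  # 500 MB of buffers alone
```

This is consistent with the local filesystem case not failing: local staging never allocates per-upload HTTP chunk buffers.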





[jira] [Comment Edited] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-05-10 Thread Marcelo Pio de Castro (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837294#comment-16837294
 ] 

Marcelo Pio de Castro edited comment on BEAM-6923 at 5/10/19 1:39 PM:
--

I saw that the release notes for google-api-java-client mentioned resumable 
uploads 
([https://github.com/googleapis/google-api-java-client/releases/tag/v1.27.0]), 
so I tried to downgrade the Apache Beam version to 2.8.0, the first one that 
does not use google-api-java-client 1.27.0.

Then the job fails almost instantly with an OOM, just in a different part:

Edit: I had to shade the Google APIs because Spark and Apache Beam depend on 
different versions of them
{code:java}
Caused by: java.lang.OutOfMemoryError: Java heap space
at 
shaded.google.api.client.googleapis.media.MediaHttpUploader.setContentAndHeadersOnCurrentRequest(MediaHttpUploader.java:603)
at 
shaded.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:409)
at 
shaded.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
at 
shaded.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
at 
shaded.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at 
shaded.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at 
shaded.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:358)
at java.util.concurrent.FutureTask.run(Unknown Source)
{code}


was (Author: marcelo.castro):
I saw that the release notes for google-api-java-client mentioned resumable 
uploads 
([https://github.com/googleapis/google-api-java-client/releases/tag/v1.27.0]), 
so I tried to downgrade the Apache Beam version to 2.8.0, the first one that 
does not use google-api-java-client 1.27.0.

Then the job fails almost instantly with an OOM, just in a different part:
{code:java}
Caused by: java.lang.OutOfMemoryError: Java heap space
at 
shaded.google.api.client.googleapis.media.MediaHttpUploader.setContentAndHeadersOnCurrentRequest(MediaHttpUploader.java:603)
at 
shaded.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:409)
at 
shaded.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
at 
shaded.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
at 
shaded.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at 
shaded.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at 
shaded.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:358)
at java.util.concurrent.FutureTask.run(Unknown Source)
{code}

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png
>
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following, portability related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io//java:latest'{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: 
> Java heap space
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at 
> com.google.cloud.hadoop.util.AbstractGoogleA

[jira] [Work logged] (BEAM-7145) Make Flink Runner compatible with Flink 1.8

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7145?focusedWorklogId=240258&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240258
 ]

ASF GitHub Bot logged work on BEAM-7145:


Author: ASF GitHub Bot
Created on: 10/May/19 14:16
Start Date: 10/May/19 14:16
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #8540: [BEAM-7145] 
Make FlinkRunner compatible with Flink 1.8
URL: https://github.com/apache/beam/pull/8540
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 240258)
Time Spent: 0.5h  (was: 20m)

> Make Flink Runner compatible with Flink 1.8
> ---
>
> Key: BEAM-7145
> URL: https://issues.apache.org/jira/browse/BEAM-7145
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Users have asked about Flink 1.8 support. From a quick look,
>  * There are changes related to how TypeSerializers are snapshotted. We might 
> have to copy {{CoderTypeSerializer}} to ensure compatibility across the 
> different Flink Runner build targets.
>  * StandaloneClusterClient has been removed. We might drop support for legacy 
> deployment and only allow REST.





[jira] [Work logged] (BEAM-7145) Make Flink Runner compatible with Flink 1.8

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7145?focusedWorklogId=240259&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240259
 ]

ASF GitHub Bot logged work on BEAM-7145:


Author: ASF GitHub Bot
Created on: 10/May/19 14:20
Start Date: 10/May/19 14:20
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #8549: 
[release-2.13.0][BEAM-7145] Make FlinkRunner compatible with Flink 1.8
URL: https://github.com/apache/beam/pull/8549
 
 
   Backported from #8540.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)
 | --- | --- | ---
   
   Pre-Commit Tests Status (on master branch)
   

   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Website_Cron/

[jira] [Work logged] (BEAM-7145) Make Flink Runner compatible with Flink 1.8

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7145?focusedWorklogId=240260&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240260
 ]

ASF GitHub Bot logged work on BEAM-7145:


Author: ASF GitHub Bot
Created on: 10/May/19 14:21
Start Date: 10/May/19 14:21
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #8549: 
[release-2.13.0][BEAM-7145] Make FlinkRunner compatible with Flink 1.8
URL: https://github.com/apache/beam/pull/8549#issuecomment-491306460
 
 
   CC @iemejia @angoenka 
 



Issue Time Tracking
---

Worklog Id: (was: 240260)
Time Spent: 50m  (was: 40m)

> Make Flink Runner compatible with Flink 1.8
> ---
>
> Key: BEAM-7145
> URL: https://issues.apache.org/jira/browse/BEAM-7145
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Users have asked about Flink 1.8 support. From a quick look,
>  * There are changes related to how TypeSerializers are snapshotted. We might 
> have to copy {{CoderTypeSerializer}} to ensure compatibility across the 
> different Flink Runner build targets.
>  * StandaloneClusterClient has been removed. We might drop support for legacy 
> deployment and only allow REST.





[jira] [Work logged] (BEAM-7181) Python 3.6 IT tests: PubSub Expected 2 messages. Got 0 messages.

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7181?focusedWorklogId=240261&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240261
 ]

ASF GitHub Bot logged work on BEAM-7181:


Author: ASF GitHub Bot
Created on: 10/May/19 14:29
Start Date: 10/May/19 14:29
Worklog Time Spent: 10m 
  Work Description: Juta commented on pull request #8550: 
[BEAM-7181][BEAM-7243] add streaming it tests python 3.6
URL: https://github.com/apache/beam/pull/8550
 
 
   

[jira] [Work logged] (BEAM-7181) Python 3.6 IT tests: PubSub Expected 2 messages. Got 0 messages.

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7181?focusedWorklogId=240262&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240262
 ]

ASF GitHub Bot logged work on BEAM-7181:


Author: ASF GitHub Bot
Created on: 10/May/19 14:30
Start Date: 10/May/19 14:30
Worklog Time Spent: 10m 
  Work Description: Juta commented on issue #8550: [BEAM-7181][BEAM-7243] 
add streaming it tests python 3.6
URL: https://github.com/apache/beam/pull/8550#issuecomment-491309457
 
 
   Run Python PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 240262)
Time Spent: 20m  (was: 10m)

> Python 3.6 IT tests: PubSub Expected 2 messages. Got 0 messages.
> 
>
> Key: BEAM-7181
> URL: https://issues.apache.org/jira/browse/BEAM-7181
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Juta Staes
>Assignee: Valentyn Tymofieiev
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Several test fail in the 
> beam-sdks-python-test-suites-dataflow-py36:postCommitIT with the following 
> error
> {code:java}
> 19:13:05 
> ==
>  19:13:05 FAIL: test_streaming_data_only 
> (apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest)
>  19:13:05 
> --
>  19:13:05 Traceback (most recent call last):
>  19:13:05 File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py",
>  line 175, in test_streaming_data_only
>  19:13:05 self._test_streaming(with_attributes=False)
>  19:13:05 File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py",
>  line 171, in _test_streaming
>  19:13:05 timestamp_attribute=self.TIMESTAMP_ATTRIBUTE)
>  19:13:05 File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_it_pipeline.py",
>  line 91, in run_pipeline
>  19:13:05 result = p.run()
>  19:13:05 File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
>  19:13:05 return self.runner.run_pipeline(self, self._options)
>  19:13:05 File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 68, in run_pipeline
>  19:13:05 hc_assert_that(self.result, pickler.loads(on_success_matcher))
>  19:13:05 AssertionError: 
>  19:13:05 Expected: (Test pipeline expected terminated in state: RUNNING and 
> Expected 2 messages.)
>  19:13:05 but: Expected 2 messages. Got 0 messages. Diffs (item, count):
>  19:13:05 Expected but not in actual: dict_items([('data001-seen', 1), 
> ('data002-seen', 1)])
>  19:13:05 Unexpected: dict_items([]){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7144) Job re-scale fails on Flink 1.7

2019-05-10 Thread Maximilian Michels (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837341#comment-16837341
 ] 

Maximilian Michels commented on BEAM-7144:
--

I've done some testing and found that rescaling works on Flink 1.6/1.7, at least 
for a simple pipeline. I think I need a bit more information about what's going 
on in your pipeline.

> Job re-scale fails on Flink 1.7
> ---
>
> Key: BEAM-7144
> URL: https://issues.apache.org/jira/browse/BEAM-7144
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.11.0
>Reporter: Jozef Vilcek
>Priority: Major
> Fix For: 2.13.0
>
>
> I am unable to rescale the job after moving it to Flink runner 1.7. What I am 
> doing is:
>  # Recompile the job code with only the Flink runner version swapped, 1.5 -> 1.7
>  # Run the streaming job with parallelism 112 and maxParallelism 448
>  # Wait until a checkpoint is taken
>  # Stop the job
>  # Run the job again with parallelism 224 and the checkpoint path to restore from
>  # Job fails
> The same happens if I try to increase parallelism. This procedure works for 
> the same job compiled with Flink runner 1.5 and run on Flink 1.5.0. It fails 
> with runner 1.7 on Flink 1.7.2.
> Exception is:
> {noformat}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
> at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for 
> WindowDoFnOperator_2b6af61dc418f10e82551367a7e7f78e_(83/224) from any of the 
> 1 provided restore options.
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:137)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:284)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
> ... 5 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 101, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
> at java.util.ArrayList.get(ArrayList.java:429)
> at 
> com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
> at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:805)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:759)
> at 
> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:315)
> at 
> org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:73)
> at 
> org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.readKeyGroupStateData(HeapKeyedStateBackend.java:492)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.readStateHandleStateData(HeapKeyedStateBackend.java:453)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:410)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:358)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:104)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123)
> ... 7 more{noformat}
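For background on why the maxParallelism setting matters in the procedure above: Flink hashes each key into one of maxParallelism key groups, and on rescale every subtask restores a contiguous range of key groups, so going from parallelism 112 to 224 under maxParallelism 448 should only redistribute whole key groups. A minimal sketch of that assignment logic (a simplification for illustration, using plain hashCode instead of Flink's murmur hash; the failure above occurs while deserializing key-group data, not in this math):

```java
// Sketch of Flink-style key-group assignment (the idea behind
// KeyGroupRangeAssignment; simplified, not Flink's actual implementation).
public class KeyGroupSketch {

    // A key is first mapped to a key group in [0, maxParallelism).
    public static int keyGroupFor(Object key, int maxParallelism) {
        return Math.abs(key.hashCode() % maxParallelism);
    }

    // Each subtask owns a contiguous range of key groups, so changing
    // parallelism (within maxParallelism) only moves whole key groups.
    public static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 448;
        int keyGroup = keyGroupFor("some-key", maxParallelism);
        // The same key group maps deterministically to a subtask for each parallelism.
        System.out.println(keyGroup
            + " -> subtask " + operatorIndexFor(keyGroup, maxParallelism, 112) + " at p=112,"
            + " subtask " + operatorIndexFor(keyGroup, maxParallelism, 224) + " at p=224");
    }
}
```

Because only whole key groups move, a clean restore at a new parallelism should never index past the restored state, which makes the IndexOutOfBoundsException above look like a serialization rather than an assignment problem.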





[jira] [Assigned] (BEAM-7144) Job re-scale fails on Flink 1.7

2019-05-10 Thread Maximilian Michels (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reassigned BEAM-7144:


Assignee: Maximilian Michels

> Job re-scale fails on Flink 1.7
> ---
>
> Key: BEAM-7144
> URL: https://issues.apache.org/jira/browse/BEAM-7144
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.11.0
>Reporter: Jozef Vilcek
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.13.0
>
>
> I am unable to rescale the job after moving it to Flink runner 1.7. What I am 
> doing is:
>  # Recompile the job code with only the Flink runner version swapped, 1.5 -> 1.7
>  # Run the streaming job with parallelism 112 and maxParallelism 448
>  # Wait until a checkpoint is taken
>  # Stop the job
>  # Run the job again with parallelism 224 and the checkpoint path to restore from
>  # Job fails
> The same happens if I try to increase parallelism. This procedure works for 
> the same job compiled with Flink runner 1.5 and run on Flink 1.5.0. It fails 
> with runner 1.7 on Flink 1.7.2.
> Exception is:
> {noformat}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
> at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for 
> WindowDoFnOperator_2b6af61dc418f10e82551367a7e7f78e_(83/224) from any of the 
> 1 provided restore options.
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:137)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:284)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
> ... 5 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 101, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
> at java.util.ArrayList.get(ArrayList.java:429)
> at 
> com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
> at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:805)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:759)
> at 
> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:315)
> at 
> org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:73)
> at 
> org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.readKeyGroupStateData(HeapKeyedStateBackend.java:492)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.readStateHandleStateData(HeapKeyedStateBackend.java:453)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:410)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:358)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:104)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123)
> ... 7 more{noformat}





[jira] [Work logged] (BEAM-7145) Make Flink Runner compatible with Flink 1.8

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7145?focusedWorklogId=240289&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240289
 ]

ASF GitHub Bot logged work on BEAM-7145:


Author: ASF GitHub Bot
Created on: 10/May/19 15:15
Start Date: 10/May/19 15:15
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #8549: 
[release-2.13.0][BEAM-7145] Make FlinkRunner compatible with Flink 1.8
URL: https://github.com/apache/beam/pull/8549#issuecomment-491325356
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 240289)
Time Spent: 1h  (was: 50m)

> Make Flink Runner compatible with Flink 1.8
> ---
>
> Key: BEAM-7145
> URL: https://issues.apache.org/jira/browse/BEAM-7145
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Users have asked about Flink 1.8 support. From a quick look,
>  * There are changes related to how TypeSerializers are snapshotted. We might 
> have to copy {{CoderTypeSerializer}} to ensure compatibility across the 
> different Flink Runner build targets.
>  * StandaloneClusterClient has been removed. We might drop support for legacy 
> deployment and only allow REST.
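The serializer snapshotting change mentioned above revolves around a snapshot object written with the checkpoint that can later judge whether the restoring serializer is compatible with the old state. A framework-free sketch of that contract (illustrative only; SerializerSnapshot and its fields are hypothetical stand-ins, not Flink's TypeSerializerSnapshot interface):

```java
// Minimal model of a serializer snapshot: enough configuration is recorded
// at checkpoint time to decide compatibility at restore time.
public class SerializerSnapshot {

    public enum Compatibility { COMPATIBLE_AS_IS, REQUIRES_MIGRATION, INCOMPATIBLE }

    private final int writtenVersion;  // format version recorded in the checkpoint
    private final String coderId;      // identifies the serializer configuration

    public SerializerSnapshot(int writtenVersion, String coderId) {
        this.writtenVersion = writtenVersion;
        this.coderId = coderId;
    }

    // On restore, the snapshot read back from the checkpoint is compared
    // with the configuration of the serializer in the restoring job.
    public Compatibility resolveAgainst(int currentVersion, String currentCoderId) {
        if (!coderId.equals(currentCoderId)) {
            return Compatibility.INCOMPATIBLE;
        }
        return writtenVersion == currentVersion
            ? Compatibility.COMPATIBLE_AS_IS
            : Compatibility.REQUIRES_MIGRATION;
    }
}
```

Copying {{CoderTypeSerializer}} per build target would let each Flink version pair it with that version's snapshot API without breaking the others.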





[jira] [Work logged] (BEAM-7145) Make Flink Runner compatible with Flink 1.8

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7145?focusedWorklogId=240290&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240290
 ]

ASF GitHub Bot logged work on BEAM-7145:


Author: ASF GitHub Bot
Created on: 10/May/19 15:15
Start Date: 10/May/19 15:15
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #8549: 
[release-2.13.0][BEAM-7145] Make FlinkRunner compatible with Flink 1.8
URL: https://github.com/apache/beam/pull/8549#issuecomment-491325356
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 240290)
Time Spent: 1h 10m  (was: 1h)

> Make Flink Runner compatible with Flink 1.8
> ---
>
> Key: BEAM-7145
> URL: https://issues.apache.org/jira/browse/BEAM-7145
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Users have asked about Flink 1.8 support. From a quick look,
>  * There are changes related to how TypeSerializers are snapshotted. We might 
> have to copy {{CoderTypeSerializer}} to ensure compatibility across the 
> different Flink Runner build targets.
>  * StandaloneClusterClient has been removed. We might drop support for legacy 
> deployment and only allow REST.





[jira] [Commented] (BEAM-7144) Job re-scale fails on Flink 1.7

2019-05-10 Thread Jozef Vilcek (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837382#comment-16837382
 ] 

Jozef Vilcek commented on BEAM-7144:


Hm, I was afraid of this. The pipeline I was testing is like:

   KafkaRead -> Filter -> WriteFiles -> Stateful map of written files (until 
the required size accumulates) -> Map file name list per key

The list of files is taken from WriteFilesResult.getPerDestinationOutputFilenames(). 
The pipeline is written in Scio.

> Job re-scale fails on Flink 1.7
> ---
>
> Key: BEAM-7144
> URL: https://issues.apache.org/jira/browse/BEAM-7144
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.11.0
>Reporter: Jozef Vilcek
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.13.0
>
>
> I am unable to rescale the job after moving it to Flink runner 1.7. What I am 
> doing is:
>  # Recompile the job code with only the Flink runner version swapped, 1.5 -> 1.7
>  # Run the streaming job with parallelism 112 and maxParallelism 448
>  # Wait until a checkpoint is taken
>  # Stop the job
>  # Run the job again with parallelism 224 and the checkpoint path to restore from
>  # Job fails
> The same happens if I try to increase parallelism. This procedure works for 
> the same job compiled with Flink runner 1.5 and run on Flink 1.5.0. It fails 
> with runner 1.7 on Flink 1.7.2.
> Exception is:
> {noformat}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
> at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for 
> WindowDoFnOperator_2b6af61dc418f10e82551367a7e7f78e_(83/224) from any of the 
> 1 provided restore options.
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:137)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:284)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
> ... 5 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 101, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
> at java.util.ArrayList.get(ArrayList.java:429)
> at 
> com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
> at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:805)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:759)
> at 
> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:315)
> at 
> org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:73)
> at 
> org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.readKeyGroupStateData(HeapKeyedStateBackend.java:492)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.readStateHandleStateData(HeapKeyedStateBackend.java:453)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:410)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:358)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:104)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123)
> ... 7 more{noformat}





[jira] [Work logged] (BEAM-7240) Kinesis IO Watermark Computation Improvements

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7240?focusedWorklogId=240304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240304
 ]

ASF GitHub Bot logged work on BEAM-7240:


Author: ASF GitHub Bot
Created on: 10/May/19 15:41
Start Date: 10/May/19 15:41
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #8513: 
[BEAM-7240] Kinesis IO Watermark Computation Improvements
URL: https://github.com/apache/beam/pull/8513#discussion_r282879046
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 ##
 @@ -190,6 +190,7 @@ public static Read read() {
 return new AutoValue_KinesisIO_Read.Builder()
 .setMaxNumRecords(Long.MAX_VALUE)
 .setUpToDateThreshold(Duration.ZERO)
+
.setKinesisWatermarkPolicyFactory(KinesisWatermarkPolicyFactory.withArrivalTimePolicy())
 
 Review comment:
   Here and everywhere below, I think we can omit using `Kinesis` in 
method/class names since it's assumed by default. So, we can make it a little 
bit less verbose, for example:
   `.setWatermarkPolicyFactory(WatermarkPolicyFactory.withArrivalTimePolicy())`
 



Issue Time Tracking
---

Worklog Id: (was: 240304)
Time Spent: 20m  (was: 10m)

> Kinesis IO Watermark Computation Improvements
> -
>
> Key: BEAM-7240
> URL: https://issues.apache.org/jira/browse/BEAM-7240
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kinesis
>Reporter: Ajo Thomas
>Assignee: Ajo Thomas
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, watermarks in Kinesis IO are computed from the record arrival time 
> in a {{KinesisRecord}}. The arrival time might not always be the right 
> representation of the event time, so the user of the IO should be able to 
> specify how to extract the event time from the KinesisRecord.
> As per the current logic, the end user of the IO cannot control watermark 
> computation in any way. A user should be able to control watermark computation 
> through custom heuristics or configurable params, such as a time duration 
> after which to advance the watermark if no data was received (which could be 
> due to a stalled shard; the watermark should advance and not be stalled in 
> that case).
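One reading of the requested behavior: track the maximum extracted event time, and when no records arrive for a configured idle duration, let the watermark advance with processing time so a stalled shard cannot hold it back. A self-contained sketch of that heuristic (hypothetical names; this is illustrative only, not the KinesisIO API being proposed in the PR):

```java
import java.time.Duration;
import java.time.Instant;

public class IdleAdvancingWatermark {
    private final Duration idleThreshold;
    private Instant maxEventTime = Instant.EPOCH;  // highest event time seen so far
    private Instant lastRecordAt = Instant.EPOCH;  // processing time of the last record

    public IdleAdvancingWatermark(Duration idleThreshold) {
        this.idleThreshold = idleThreshold;
    }

    // Called per record with the user-extracted event time.
    public void update(Instant eventTime, Instant now) {
        if (eventTime.isAfter(maxEventTime)) {
            maxEventTime = eventTime;
        }
        lastRecordAt = now;
    }

    // Follows event time while records flow; once idle longer than the
    // threshold, advances with processing time instead of stalling.
    public Instant currentWatermark(Instant now) {
        if (now.isAfter(lastRecordAt.plus(idleThreshold))) {
            Instant idleWatermark = now.minus(idleThreshold);
            return idleWatermark.isAfter(maxEventTime) ? idleWatermark : maxEventTime;
        }
        return maxEventTime;
    }
}
```

Keeping the idle threshold configurable is what lets users trade watermark latency against the risk of late data from a briefly stalled shard.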





[jira] [Work logged] (BEAM-7240) Kinesis IO Watermark Computation Improvements

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7240?focusedWorklogId=240305&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240305
 ]

ASF GitHub Bot logged work on BEAM-7240:


Author: ASF GitHub Bot
Created on: 10/May/19 15:44
Start Date: 10/May/19 15:44
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #8513: [BEAM-7240] 
Kinesis IO Watermark Computation Improvements
URL: https://github.com/apache/beam/pull/8513#issuecomment-491335324
 
 
   R: @krzysztof-tr 
   As the original author of the last changes in `KinesisWatermark` and related 
code, could you take a look at this PR too, please?
 



Issue Time Tracking
---

Worklog Id: (was: 240305)
Time Spent: 0.5h  (was: 20m)

> Kinesis IO Watermark Computation Improvements
> -
>
> Key: BEAM-7240
> URL: https://issues.apache.org/jira/browse/BEAM-7240
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kinesis
>Reporter: Ajo Thomas
>Assignee: Ajo Thomas
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, watermarks in Kinesis IO are computed from the record arrival time 
> in a {{KinesisRecord}}. The arrival time might not always be the right 
> representation of the event time, so the user of the IO should be able to 
> specify how to extract the event time from the KinesisRecord.
> As per the current logic, the end user of the IO cannot control watermark 
> computation in any way. A user should be able to control watermark computation 
> through custom heuristics or configurable params, such as a time duration 
> after which to advance the watermark if no data was received (which could be 
> due to a stalled shard; the watermark should advance and not be stalled in 
> that case).





[jira] [Resolved] (BEAM-7216) beam-sdks-java-io-kafka error with kafka brokers < 0.11

2019-05-10 Thread Alexey Romanenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko resolved BEAM-7216.

   Resolution: Fixed
Fix Version/s: 2.13.0

> beam-sdks-java-io-kafka error with kafka brokers < 0.11
> ---
>
> Key: BEAM-7216
> URL: https://issues.apache.org/jira/browse/BEAM-7216
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kafka
>Affects Versions: 2.10.0, 2.11.0, 2.12.0
>Reporter: Richard Moorhead
>Priority: Minor
> Fix For: 2.13.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In Beam 2.9.0, KafkaRecordCoder was used for both producer and consumer 
> records in KafkaIO. In version 2.10.0, ProducerRecordCoder was introduced, but 
> it appears that the following code does not check for Kafka client 
> compatibility:
> [https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ProducerRecordCoder.java#L137]
> Specifically, the method call to `headers` will fail for Kafka clients < 0.11. 
> Elsewhere in this class there are checks on ConsumerSpEL, and it is proposed 
> that they be reused in the line referenced.
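ConsumerSpEL-style compatibility checks typically probe once for the newer API and fall back when it is absent. A minimal reflection-based sketch of that pattern (illustrative; the probed classes and helper name are placeholders, not the actual KafkaIO code):

```java
import java.lang.reflect.Method;

public class HeaderSupportCheck {

    // True if the record class exposes a no-arg headers() accessor, which is
    // the shape of the Kafka >= 0.11 ConsumerRecord/ProducerRecord API.
    public static boolean supportsHeaders(Class<?> recordClass) {
        for (Method m : recordClass.getMethods()) {
            if (m.getName().equals("headers") && m.getParameterCount() == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A real check would probe org.apache.kafka.clients.producer.ProducerRecord;
        // String is used here only so the sketch runs without Kafka on the classpath.
        System.out.println(supportsHeaders(String.class));
    }
}
```

Gating the `headers` access on such a probe, as the ConsumerSpEL checks already do for the consumer path, would keep the coder usable against pre-0.11 brokers and clients.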





[jira] [Work logged] (BEAM-6916) Reorganize Beam SQL docs

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6916?focusedWorklogId=240322&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240322
 ]

ASF GitHub Bot logged work on BEAM-6916:


Author: ASF GitHub Bot
Created on: 10/May/19 16:47
Start Date: 10/May/19 16:47
Worklog Time Spent: 10m 
  Work Description: melap commented on pull request #8455: [BEAM-6916] 
Reorg Beam SQL docs and add Calcite section
URL: https://github.com/apache/beam/pull/8455#discussion_r282677350
 
 

 ##
 File path: website/src/documentation/dsls/sql/calcite/overview.md
 ##
 @@ -0,0 +1,85 @@
+---
+layout: section
+title: "Beam SQL in Calcite: Overview"
+section_menu: section-menu/sdks.html
+permalink: /documentation/dsls/sql/calcite/overview/
+---
+
+# Beam SQL in Calcite: Overview
+
+[Apache Calcite](http://calcite.apache.org) is a widespread SQL dialect used in
+big data processing with some streaming enhancements. Calcite provides the
+basic dialect underlying Beam SQL. 
+
+The following table summarizes Apache Calcite operators and functions 
supported by Beam SQL.
+
+<table>
+  <tr><th>Operators and functions</th><th>Beam SQL support status</th></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#operator-precedence">Operator precedence</a></td><td>Yes</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#comparison-operators">Comparison operators</a></td><td>Yes</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#logical-operators">Logical operators</a></td><td>Yes</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#arithmetic-operators-and-functions">Arithmetic operators and functions</a></td><td>Yes</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#character-string-operators-and-functions">Character string operators and functions</a></td><td>Yes</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#binary-string-operators-and-functions">Binary string operators and functions</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#datetime-functions">Date/time functions</a></td><td>Yes</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#system-functions">System functions</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#conditional-functions-and-operators">Conditional functions and operators</a></td><td>Yes</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#type-conversion">Type conversion</a></td><td>Yes</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#value-constructors">Value constructors</a></td><td>No, except array</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#collection-functions">Collection functions</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#period-predicates">Period predicates</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#jdbc-function-escape">JDBC function escape</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#aggregate-functions">Aggregate functions</a></td><td>Use Beam SQL <a href="https://beam.apache.org/documentation/dsls/sql/calcite/aggregate-functions/">aggregate functions</a></td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#window-functions">Window functions</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#grouping-functions">Grouping functions</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#grouped-window-functions">Grouped window functions</a></td><td>Use Beam SQL <a href="https://beam.apache.org/documentation/dsls/sql/windowing-and-triggering/">windowing and triggering</a></td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#grouped-auxiliary-functions">Grouped auxiliary functions</a></td><td>Yes, except SESSION_END</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#spatial-functions">Spatial functions</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#geometry-creation-functions-3d">Geometry creation functions (3D)</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#geometry-predicates">Geometry predicates</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#json-functions">JSON functions</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#user-defined-functions">User-defined functions</a></td><td>Use Beam SQL <a href="https://beam.apache.org/documentation/dsls/sql/user-defined-functions/">user-defined functions</a>. You cannot call functions with <a href="http://calcite.apache.org/docs/reference.html#calling-functions-with-named-and-optional-parameters">named and optional parameters</a>.</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#match_recognize">MATCH_RECOGNIZE</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#ddl-extensions">DDL Extensions</a></td><td>Use Beam SQL <a href="https://beam.apache.org/documentation/dsls/sql/create-external-table/">CREATE EXTERNAL TABLE</a></td></tr>
+</table>
+
+We have added additional extensions to
+make it easy to leverage Beam's unified batch/streaming model and support
+for complex data types.
+
+## Query syntax
+Query statements scan one or more tables or expressions and return the 
computed result rows.
+The [Query syntax]({{ site.baseurl
+}}/documentation/dsls/sql/calcite/query-syntax) page describes Beam SQL's 
syntax for queries when using Apache Calcite.
+
+## Data types
+Beam SQL supports standard SQL scalar data types as well as extensions 
including arrays, maps, and nested rows.
+Read about supported [data types](

[jira] [Work logged] (BEAM-6916) Reorganize Beam SQL docs

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6916?focusedWorklogId=240323&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240323
 ]

ASF GitHub Bot logged work on BEAM-6916:


Author: ASF GitHub Bot
Created on: 10/May/19 16:47
Start Date: 10/May/19 16:47
Worklog Time Spent: 10m 
  Work Description: melap commented on pull request #8455: [BEAM-6916] 
Reorg Beam SQL docs and add Calcite section
URL: https://github.com/apache/beam/pull/8455#discussion_r282680032
 
 

 ##
 File path: website/src/documentation/dsls/sql/calcite/overview.md
 ##
 @@ -0,0 +1,85 @@
+---
+layout: section
+title: "Beam SQL in Calcite: Overview"
+section_menu: section-menu/sdks.html
+permalink: /documentation/dsls/sql/calcite/overview/
+---
+
+# Beam SQL in Calcite: Overview
+
+[Apache Calcite](http://calcite.apache.org) is a widespread SQL dialect used in
+big data processing with some streaming enhancements. Calcite provides the
+basic dialect underlying Beam SQL. 
+
+The following table summarizes Apache Calcite operators and functions 
supported by Beam SQL.
+
+<table>
+  <tr><th>Operators and functions</th><th>Beam SQL support status</th></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#operator-precedence">Operator precedence</a></td><td>Yes</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#comparison-operators">Comparison operators</a></td><td>Yes</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#logical-operators">Logical operators</a></td><td>Yes</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#arithmetic-operators-and-functions">Arithmetic operators and functions</a></td><td>Yes</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#character-string-operators-and-functions">Character string operators and functions</a></td><td>Yes</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#binary-string-operators-and-functions">Binary string operators and functions</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#datetime-functions">Date/time functions</a></td><td>Yes</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#system-functions">System functions</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#conditional-functions-and-operators">Conditional functions and operators</a></td><td>Yes</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#type-conversion">Type conversion</a></td><td>Yes</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#value-constructors">Value constructors</a></td><td>No, except array</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#collection-functions">Collection functions</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#period-predicates">Period predicates</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#jdbc-function-escape">JDBC function escape</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#aggregate-functions">Aggregate functions</a></td><td>Use Beam SQL <a href="https://beam.apache.org/documentation/dsls/sql/calcite/aggregate-functions/">aggregate functions</a></td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#window-functions">Window functions</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#grouping-functions">Grouping functions</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#grouped-window-functions">Grouped window functions</a></td><td>Use Beam SQL <a href="https://beam.apache.org/documentation/dsls/sql/windowing-and-triggering/">windowing and triggering</a></td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#grouped-auxiliary-functions">Grouped auxiliary functions</a></td><td>Yes, except SESSION_END</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#spatial-functions">Spatial functions</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#geometry-creation-functions-3d">Geometry creation functions (3D)</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#geometry-predicates">Geometry predicates</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#json-functions">JSON functions</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#user-defined-functions">User-defined functions</a></td><td>Use Beam SQL <a href="https://beam.apache.org/documentation/dsls/sql/user-defined-functions/">user-defined functions</a>. You cannot call functions with <a href="http://calcite.apache.org/docs/reference.html#calling-functions-with-named-and-optional-parameters">named and optional parameters</a>.</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#match_recognize">MATCH_RECOGNIZE</a></td><td>No</td></tr>
+  <tr><td><a href="http://calcite.apache.org/docs/reference.html#ddl-extensions">DDL Extensions</a></td><td>Use Beam SQL <a href="https://beam.apache.org/documentation/dsls/sql/create-external-table/">CREATE EXTERNAL TABLE</a></td></tr>
+</table>
+
+We have added additional extensions to
+make it easy to leverage Beam's unified batch/streaming model and support
+for complex data types.
+
+## Query syntax
+Query statements scan one or more tables or expressions and return the 
computed result rows.
+The [Query syntax]({{ site.baseurl
+}}/documentation/dsls/sql/calcite/query-syntax) page describes Beam SQL's 
syntax for queries when using Apache Calcite.
+
+## Data types
+Beam SQL supports standard SQL scalar data types as well as extensions 
including arrays, maps, and nested rows.
+Read about supported [data types](

[jira] [Work logged] (BEAM-6916) Reorganize Beam SQL docs

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6916?focusedWorklogId=240324&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240324
 ]

ASF GitHub Bot logged work on BEAM-6916:


Author: ASF GitHub Bot
Created on: 10/May/19 16:47
Start Date: 10/May/19 16:47
Worklog Time Spent: 10m 
  Work Description: melap commented on pull request #8455: [BEAM-6916] 
Reorg Beam SQL docs and add Calcite section
URL: https://github.com/apache/beam/pull/8455#discussion_r282676870
 
 

 ##
 File path: website/src/documentation/dsls/sql/calcite/overview.md
 ##
 @@ -0,0 +1,85 @@
+---
+layout: section
+title: "Beam SQL in Calcite: Overview"
+section_menu: section-menu/sdks.html
+permalink: /documentation/dsls/sql/calcite/overview/
+---
+
+# Beam SQL in Calcite: Overview
+
+[Apache Calcite](http://calcite.apache.org) is a widespread SQL dialect used in
+big data processing with some streaming enhancements. Calcite provides the
+basic dialect underlying Beam SQL. 
+
+The following table summarizes Apache Calcite operators and functions 
supported by Beam SQL.
+
+<table>
+  <tr><th>Operators and functions</th><th>Beam SQL support status</th></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#operator-precedence">Operator precedence</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#comparison-operators">Comparison operators</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#logical-operators">Logical operators</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#arithmetic-operators-and-functions">Arithmetic operators and functions</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#character-string-operators-and-functions">Character string operators and functions</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#binary-string-operators-and-functions">Binary string operators and functions</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#datetime-functions">Date/time functions</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#system-functions">System functions</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#conditional-functions-and-operators">Conditional functions and operators</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#type-conversion">Type conversion</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#value-constructors">Value constructors</a></td><td>No, except array</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#collection-functions">Collection functions</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#period-predicates">Period predicates</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#jdbc-function-escape">JDBC function escape</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#aggregate-functions">Aggregate functions</a></td>
+<td>Use Beam SQL <a href="https://beam.apache.org/documentation/dsls/sql/calcite/aggregate-functions/">aggregate functions</a></td></tr>
 
 Review comment:
   for all of these, perhaps "See" instead of "Use"?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 240324)
Time Spent: 2.5h  (was: 2h 20m)
Remaining Estimate: 165.5h  (was: 165h 40m)

> Reorganize Beam SQL docs
> 
>
> Key: BEAM-6916
> URL: https://issues.apache.org/jira/browse/BEAM-6916
> Project: Beam
>  Issue Type: New Feature
>  Components: website
>Reporter: Rose Nguyen
>Assignee: Rose Nguyen
>Priority: Minor
>   Original Estimate: 168h
>  Time Spent: 2.5h
>  Remaining Estimate: 165.5h
>
> This page describes the Calcite SQL dialect supported by Beam SQL.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6916) Reorganize Beam SQL docs

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6916?focusedWorklogId=240325&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240325
 ]

ASF GitHub Bot logged work on BEAM-6916:


Author: ASF GitHub Bot
Created on: 10/May/19 16:47
Start Date: 10/May/19 16:47
Worklog Time Spent: 10m 
  Work Description: melap commented on pull request #8455: [BEAM-6916] 
Reorg Beam SQL docs and add Calcite section
URL: https://github.com/apache/beam/pull/8455#discussion_r282679012
 
 

 ##
 File path: website/src/documentation/dsls/sql/calcite/overview.md
 ##
 @@ -0,0 +1,85 @@
+---
+layout: section
+title: "Beam SQL in Calcite: Overview"
+section_menu: section-menu/sdks.html
+permalink: /documentation/dsls/sql/calcite/overview/
+---
+
+# Beam SQL in Calcite: Overview
+
+[Apache Calcite](http://calcite.apache.org) is a widespread SQL dialect used in
+big data processing with some streaming enhancements. Calcite provides the
+basic dialect underlying Beam SQL. 
+
+The following table summarizes Apache Calcite operators and functions 
supported by Beam SQL.
+
+<table>
+  <tr><th>Operators and functions</th><th>Beam SQL support status</th></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#operator-precedence">Operator precedence</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#comparison-operators">Comparison operators</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#logical-operators">Logical operators</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#arithmetic-operators-and-functions">Arithmetic operators and functions</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#character-string-operators-and-functions">Character string operators and functions</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#binary-string-operators-and-functions">Binary string operators and functions</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#datetime-functions">Date/time functions</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#system-functions">System functions</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#conditional-functions-and-operators">Conditional functions and operators</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#type-conversion">Type conversion</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#value-constructors">Value constructors</a></td><td>No, except array</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#collection-functions">Collection functions</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#period-predicates">Period predicates</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#jdbc-function-escape">JDBC function escape</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#aggregate-functions">Aggregate functions</a></td>
+<td>Use Beam SQL <a href="https://beam.apache.org/documentation/dsls/sql/calcite/aggregate-functions/">aggregate functions</a></td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#window-functions">Window functions</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#grouping-functions">Grouping functions</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#grouped-window-functions">Grouped window functions</a></td><td>Use Beam SQL <a href="https://beam.apache.org/documentation/dsls/sql/windowing-and-triggering/">windowing and triggering</a></td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#grouped-auxiliary-functions">Grouped auxiliary functions</a></td><td>Yes, except SESSION_END</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#spatial-functions">Spatial functions</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#geometry-creation-functions-3d">Geometry creation functions (3D)</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#geometry-predicates">Geometry predicates</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#json-functions">JSON functions</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#user-defined-functions">User-defined functions</a></td>
+<td>Use Beam SQL <a href="https://beam.apache.org/documentation/dsls/sql/user-defined-functions/">user-defined functions</a>. You cannot call functions with <a href="http://calcite.apache.org/docs/reference.html#calling-functions-with-named-and-optional-parameters">named and optional parameters</a>.</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#match_recognize">MATCH_RECOGNIZE</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#ddl-extensions">DDL Extensions</a></td><td>Use Beam SQL <a href="https://beam.apache.org/documentation/dsls/sql/create-external-table/">CREATE EXTERNAL TABLE</a></td></tr>
+</table>
+
+We have added additional extensions to
+make it easy to leverage Beam's unified batch/streaming model and support
+for complex data types.
+
+## Query syntax
 
 Review comment:
   wondering if might be good to order these in the page like this, similar to 
Calcite docs:
   - Query syntax
   - Lexical Structure
   - Data types
   - Functions and operators (new section title before "The following table 
summarizes...")
   
   (and remove scalar/aggregate sections as they'll be listed in the functions 
and operators table - more on that below)
 

[jira] [Work logged] (BEAM-6916) Reorganize Beam SQL docs

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6916?focusedWorklogId=240321&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240321
 ]

ASF GitHub Bot logged work on BEAM-6916:


Author: ASF GitHub Bot
Created on: 10/May/19 16:47
Start Date: 10/May/19 16:47
Worklog Time Spent: 10m 
  Work Description: melap commented on pull request #8455: [BEAM-6916] 
Reorg Beam SQL docs and add Calcite section
URL: https://github.com/apache/beam/pull/8455#discussion_r282680764
 
 

 ##
 File path: website/src/documentation/dsls/sql/overview.md
 ##
 @@ -25,19 +25,32 @@ bounded and unbounded `PCollections` with SQL statements. 
Your SQL query
 is translated to a `PTransform`, an encapsulated segment of a Beam pipeline.
 You can freely mix SQL `PTransforms` and other `PTransforms` in your pipeline.
 
-There are three main things you will need to know to use SQL in your pipeline:
-
- - [Apache Calcite](http://calcite.apache.org): a widespread SQL dialect used 
in
-   big data processing with some streaming enhancements. Calcite provides the
-   basic dialect underlying Beam SQL. We have added additional extensions to
-   make it easy to leverage Beam's unified batch/streaming model and support
-   for complex data types.
- - [SqlTransform](https://beam.apache.org/releases/javadoc/{{ 
site.release_latest 
}}/index.html?org/apache/beam/sdk/extensions/sql/SqlTransform.html): 
-   the interface for creating `PTransforms` from SQL queries.
+[Apache Calcite](http://calcite.apache.org) a widespread SQL dialect used in
 
 Review comment:
   missing "is"
 



Issue Time Tracking
---

Worklog Id: (was: 240321)
Time Spent: 2h  (was: 1h 50m)
Remaining Estimate: 166h  (was: 166h 10m)

> Reorganize Beam SQL docs
> 
>
> Key: BEAM-6916
> URL: https://issues.apache.org/jira/browse/BEAM-6916
> Project: Beam
>  Issue Type: New Feature
>  Components: website
>Reporter: Rose Nguyen
>Assignee: Rose Nguyen
>Priority: Minor
>   Original Estimate: 168h
>  Time Spent: 2h
>  Remaining Estimate: 166h
>
> This page describes the Calcite SQL dialect supported by Beam SQL.





[jira] [Work logged] (BEAM-7260) UTF8 coder is breaking dataflow tests

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7260?focusedWorklogId=240333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240333
 ]

ASF GitHub Bot logged work on BEAM-7260:


Author: ASF GitHub Bot
Created on: 10/May/19 17:19
Start Date: 10/May/19 17:19
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #8545: [BEAM-7260] UTF8 
coder is breaking dataflow tests
URL: https://github.com/apache/beam/pull/8545#issuecomment-491365158
 
 
  I am optimistically going to merge this change to unbreak master. 
   @robertwb Please review it and if needed we can revert this PR.
 



Issue Time Tracking
---

Worklog Id: (was: 240333)
Time Spent: 0.5h  (was: 20m)

> UTF8 coder is breaking dataflow tests
> -
>
> Key: BEAM-7260
> URL: https://issues.apache.org/jira/browse/BEAM-7260
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Caused by: java.lang.RuntimeException: Error received from SDK harness for 
> instruction -910: org.apache.beam.sdk.util.UserCodeException: 
> java.lang.IllegalStateException: java.lang.ClassCastException: [B cannot be 
> cast to java.lang.String
>         at 
> org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:37)
>         at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.encodeAndConcat(RegisterAndProcessBundleOperation.java:621)
>         at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.handleMultimapSideInput(RegisterAndProcessBundleOperation.java:488)
>         at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.delegateByStateKeyType(RegisterAndProcessBundleOperation.java:394)
>         at 
> org.apache.beam.runners.fnexecution.state.GrpcStateService$Inbound.onNext(GrpcStateService.java:130)
>         at 
> org.apache.beam.runners.fnexecution.state.GrpcStateService$Inbound.onNext(GrpcStateService.java:118)
>         at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:248)
>         at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
>         at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Contexts$ContextualizedServerCallListener.onMessage(Contexts.java:76)
>         at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:263)
>         at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:683)
>         at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>         at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>         at 
> org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:34)
>         at 
> org.apache.beam.sdk.transforms.Combine$GroupedValues$1$DoFnInvoker.invokeProcessElement(Unknown
>  Source)
>         at 
> org.apache.beam.fn.harness.FnApiDoFnRunner.processElement(FnApiDoFnRunner.java:189)
>         at 
> org.apache.beam.fn.harness.data.PCollectionConsumerRegistry.lambda$register$0(PCollectionConsumerRegistry.java:105)
>         at 
> org.apache.beam.fn.harness.data.ElementCountFnDataReceiver.accept(ElementCountFnDataReceiver.java:66)
>         at 
> org.apache.beam.fn.harness.data.ElementCountFnDataReceiver.accept(ElementCountFnDataReceiver.java:39)
>         at 
> org.apache.beam.fn.harness.data.QueueingBeamFnDataClient.drainAndBlock(QueueingBeamFnDataClient.java:108)
>         at 
> org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:312)
>         at 
> org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160)
>         at 
> org.apache.beam.fn.harness.
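The ClassCastException above comes from `StringUtf8Coder.encode` receiving a `byte[]` where it expects a `String`. As a minimal sketch of the same contract (a hypothetical `encode_utf8` helper, not Beam code), a UTF-8 string coder must reject raw bytes rather than silently coerce them:

```python
def encode_utf8(value):
    # A UTF-8 string coder is defined over text; raw bytes must be
    # rejected, mirroring the "[B cannot be cast to java.lang.String"
    # failure in the stack trace above.
    if not isinstance(value, str):
        raise TypeError("expected str, got %s" % type(value).__name__)
    return value.encode("utf-8")

print(encode_utf8("beam"))  # b'beam'
```

The Java failure is the same mismatch: the pipeline handed the coder already-encoded bytes, so the cast to `String` inside `encode` blew up.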

[jira] [Work logged] (BEAM-7260) UTF8 coder is breaking dataflow tests

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7260?focusedWorklogId=240334&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240334
 ]

ASF GitHub Bot logged work on BEAM-7260:


Author: ASF GitHub Bot
Created on: 10/May/19 17:19
Start Date: 10/May/19 17:19
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #8545: [BEAM-7260] 
UTF8 coder is breaking dataflow tests
URL: https://github.com/apache/beam/pull/8545
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 240334)
Time Spent: 40m  (was: 0.5h)

> UTF8 coder is breaking dataflow tests
> -
>
> Key: BEAM-7260
> URL: https://issues.apache.org/jira/browse/BEAM-7260
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Caused by: java.lang.RuntimeException: Error received from SDK harness for 
> instruction -910: org.apache.beam.sdk.util.UserCodeException: 
> java.lang.IllegalStateException: java.lang.ClassCastException: [B cannot be 
> cast to java.lang.String
>         at 
> org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:37)
>         at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.encodeAndConcat(RegisterAndProcessBundleOperation.java:621)
>         at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.handleMultimapSideInput(RegisterAndProcessBundleOperation.java:488)
>         at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.delegateByStateKeyType(RegisterAndProcessBundleOperation.java:394)
>         at 
> org.apache.beam.runners.fnexecution.state.GrpcStateService$Inbound.onNext(GrpcStateService.java:130)
>         at 
> org.apache.beam.runners.fnexecution.state.GrpcStateService$Inbound.onNext(GrpcStateService.java:118)
>         at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:248)
>         at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
>         at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Contexts$ContextualizedServerCallListener.onMessage(Contexts.java:76)
>         at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:263)
>         at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:683)
>         at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>         at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>         at 
> org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:34)
>         at 
> org.apache.beam.sdk.transforms.Combine$GroupedValues$1$DoFnInvoker.invokeProcessElement(Unknown
>  Source)
>         at 
> org.apache.beam.fn.harness.FnApiDoFnRunner.processElement(FnApiDoFnRunner.java:189)
>         at 
> org.apache.beam.fn.harness.data.PCollectionConsumerRegistry.lambda$register$0(PCollectionConsumerRegistry.java:105)
>         at 
> org.apache.beam.fn.harness.data.ElementCountFnDataReceiver.accept(ElementCountFnDataReceiver.java:66)
>         at 
> org.apache.beam.fn.harness.data.ElementCountFnDataReceiver.accept(ElementCountFnDataReceiver.java:39)
>         at 
> org.apache.beam.fn.harness.data.QueueingBeamFnDataClient.drainAndBlock(QueueingBeamFnDataClient.java:108)
>         at 
> org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:312)
>         at 
> org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160)
>         at 
> org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:144)
>         at 
> java.util.concurrent.ThreadPoolExecutor.r

[jira] [Work logged] (BEAM-6880) Deprecate Java Portable Reference Runner

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6880?focusedWorklogId=240344&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240344
 ]

ASF GitHub Bot logged work on BEAM-6880:


Author: ASF GitHub Bot
Created on: 10/May/19 17:42
Start Date: 10/May/19 17:42
Worklog Time Spent: 10m 
  Work Description: youngoli commented on issue #8380: [BEAM-6880] Remove 
deprecated Reference Runner code.
URL: https://github.com/apache/beam/pull/8380#issuecomment-491372537
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 240344)
Time Spent: 4h 50m  (was: 4h 40m)

> Deprecate Java Portable Reference Runner
> 
>
> Key: BEAM-6880
> URL: https://issues.apache.org/jira/browse/BEAM-6880
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-direct, test-failures, testing
>Reporter: Mikhail Gryzykhin
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> This ticket is about deprecating Java Portable Reference runner.
>  
> Discussion is happening in [this 
> thread|https://lists.apache.org/thread.html/0b68efce9b7f2c5297b32d09e5d903e9b354199fe2ce446fbcd240bc@%3Cdev.beam.apache.org%3E]
>  
>  
> Current summary is: disable beam_PostCommit_Java_PVR_Reference job.
> Keeping or removing reference runner code is still under discussion. It is 
> suggested to create PR that removes relevant code and start voting there.
>  
>  





[jira] [Work logged] (BEAM-6985) TypeHints Py3 Error: Native type compatibility tests fail on Python 3.7+

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6985?focusedWorklogId=240348&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240348
 ]

ASF GitHub Bot logged work on BEAM-6985:


Author: ASF GitHub Bot
Created on: 10/May/19 17:46
Start Date: 10/May/19 17:46
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8453: [BEAM-6985] 
TypeHints Py3 Error: Native type compatibility tests fail on Python 3.7+ Updates
URL: https://github.com/apache/beam/pull/8453#discussion_r282979580
 
 

 ##
 File path: sdks/python/apache_beam/typehints/native_type_compatibility.py
 ##
 @@ -95,8 +101,15 @@ def _match_issubclass(match_against):
 
 def _match_same_type(match_against):
   # For Union types. They can't be compared with isinstance either, so we
-  # have to compare their types directly.
-  return lambda user_type: type(user_type) == type(match_against)
+  # Have to compare their types directly.
+
+  def matcher(derived, parent):
+try:
+  return derived.__origin__ is parent
 
 Review comment:
   Is there a unit test for this? I'm wondering why this code is not:
   ```suggestion
 return derived.__origin__ is parent.__origin__
   ```
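   For background on the review question: on Python 3.7+ a parameterized `typing.Union` keeps `typing.Union` itself as its `__origin__`, so comparing `derived.__origin__` against the unsubscripted `Union` special form works directly. A small illustrative check (plain `typing`, not Beam code):

```python
import typing

u = typing.Union[int, str]
# Python 3.7+: the origin of a parameterized Union is typing.Union
# itself, so identity comparison against the bare special form works.
assert u.__origin__ is typing.Union
```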
 



Issue Time Tracking
---

Worklog Id: (was: 240348)
Time Spent: 5h 10m  (was: 5h)

> TypeHints Py3 Error: Native type compatibility tests fail on Python 3.7+
> 
>
> Key: BEAM-6985
> URL: https://issues.apache.org/jira/browse/BEAM-6985
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: niklas Hansson
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> The following tests are failing:
> * test_convert_nested_to_beam_type 
> (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest)
>  
> * test_convert_to_beam_type 
> (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest)
>  
> * test_convert_to_beam_types 
> (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest)
> With similar errors, where the `typing` repr != the Beam type repr, e.g.:
> {noformat}
>  FAIL: test_convert_to_beam_type 
> (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest)
>  --
>  Traceback (most recent call last):
>  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/native_type_compatibility_test.py",
>  line 79, in test_convert_to_beam_type
>  beam_type, description)
>  AssertionError: typing.Dict[bytes, int] != Dict[bytes, int] : simple dict
> {noformat}
>  
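The mismatch in the assertion message is a repr difference: on Python 3.7+ a parameterized `typing` alias renders with the `typing.` prefix, which is what appears on the left-hand side of the failing comparison. A quick illustrative check:

```python
import typing

t = typing.Dict[bytes, int]
# The typing alias reprs with a "typing." prefix, matching the
# left-hand side of the failing assertion quoted above.
assert repr(t) == "typing.Dict[bytes, int]"
# On 3.7+, __origin__ of a parameterized alias is the runtime class.
assert t.__origin__ is dict
```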





[jira] [Created] (BEAM-7267) install python3 components in release_verify_script.sh

2019-05-10 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-7267:
--

 Summary: install python3 components in release_verify_script.sh
 Key: BEAM-7267
 URL: https://issues.apache.org/jira/browse/BEAM-7267
 Project: Beam
  Issue Type: Improvement
  Components: testing
Reporter: Ankur Goenka


Py3 cython tests need python-dev for specific versions:

 python3-dev
 python3.5-dev
 python3.6-dev
 python3.7-dev
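As a sanity check of why the dev packages matter: building Cython extensions compiles against the CPython headers (`Python.h`), which live in the interpreter's include directory and are supplied by the matching `pythonX.Y-dev` package. An illustrative probe (assumes nothing beyond the standard library):

```python
import sysconfig

# Cython/C extension builds compile against headers in this directory;
# if the matching pythonX.Y-dev package is absent, Python.h is missing
# and the build fails.
include_dir = sysconfig.get_paths()["include"]
print(include_dir)
```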





[jira] [Updated] (BEAM-7267) install python3 components in release_verify_script.sh

2019-05-10 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-7267:
---
Issue Type: Bug  (was: Improvement)

> install python3 components in release_verify_script.sh
> --
>
> Key: BEAM-7267
> URL: https://issues.apache.org/jira/browse/BEAM-7267
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Ankur Goenka
>Priority: Major
>
> Py3 cython tests need python-dev for specific versions:
>  python3-dev
>  python3.5-dev
>  python3.6-dev
>  python3.7-dev





[jira] [Work logged] (BEAM-6916) Reorganize Beam SQL docs

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6916?focusedWorklogId=240352&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240352
 ]

ASF GitHub Bot logged work on BEAM-6916:


Author: ASF GitHub Bot
Created on: 10/May/19 17:53
Start Date: 10/May/19 17:53
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #8455: [BEAM-6916] 
Reorg Beam SQL docs and add Calcite section
URL: https://github.com/apache/beam/pull/8455#discussion_r282982472
 
 

 ##
 File path: website/src/documentation/dsls/sql/calcite/overview.md
 ##
 @@ -0,0 +1,85 @@
+---
+layout: section
+title: "Beam SQL in Calcite: Overview"
+section_menu: section-menu/sdks.html
+permalink: /documentation/dsls/sql/calcite/overview/
+---
+
+# Beam SQL in Calcite: Overview
+
+[Apache Calcite](http://calcite.apache.org) is a widespread SQL dialect used in
+big data processing with some streaming enhancements. Calcite provides the
+basic dialect underlying Beam SQL. 
+
+The following table summarizes Apache Calcite operators and functions 
supported by Beam SQL.
+
+<table>
+  <tr><th>Operators and functions</th><th>Beam SQL support status</th></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#operator-precedence">Operator precedence</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#comparison-operators">Comparison operators</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#logical-operators">Logical operators</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#arithmetic-operators-and-functions">Arithmetic operators and functions</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#character-string-operators-and-functions">Character string operators and functions</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#binary-string-operators-and-functions">Binary string operators and functions</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#datetime-functions">Date/time functions</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#system-functions">System functions</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#conditional-functions-and-operators">Conditional functions and operators</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#type-conversion">Type conversion</a></td><td>Yes</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#value-constructors">Value constructors</a></td><td>No, except array</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#collection-functions">Collection functions</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#period-predicates">Period predicates</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#jdbc-function-escape">JDBC function escape</a></td><td>No</td></tr>
+<tr><td><a href="http://calcite.apache.org/docs/reference.html#aggregate-functions">Aggregate functions</a></td>
+<td>Use Beam SQL <a href="https://beam.apache.org/documentation/dsls/sql/calcite/aggregate-functions/">aggregate functions</a></td></tr>
 
 Review comment:
   Done
 



Issue Time Tracking
---

Worklog Id: (was: 240352)
Time Spent: 2h 50m  (was: 2h 40m)
Remaining Estimate: 165h 10m  (was: 165h 20m)

> Reorganize Beam SQL docs
> 
>
> Key: BEAM-6916
> URL: https://issues.apache.org/jira/browse/BEAM-6916
> Project: Beam
>  Issue Type: New Feature
>  Components: website
>Reporter: Rose Nguyen
>Assignee: Rose Nguyen
>Priority: Minor
>   Original Estimate: 168h
>  Time Spent: 2h 50m
>  Remaining Estimate: 165h 10m
>
> This page describes the Calcite SQL dialect supported by Beam SQL.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7267) install python3 components in release_verify_script.sh

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7267?focusedWorklogId=240351&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240351
 ]

ASF GitHub Bot logged work on BEAM-7267:


Author: ASF GitHub Bot
Created on: 10/May/19 17:53
Start Date: 10/May/19 17:53
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #8551: [BEAM-7267] Add 
python3*-dev to verify script
URL: https://github.com/apache/beam/pull/8551#issuecomment-491376720
 
 
   R: @tvalentyn @aaltay 
 



Issue Time Tracking
---

Worklog Id: (was: 240351)
Time Spent: 20m  (was: 10m)

> install python3 components in release_verify_script.sh
> --
>
> Key: BEAM-7267
> URL: https://issues.apache.org/jira/browse/BEAM-7267
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Ankur Goenka
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Py3 Cython tests need the python-dev packages for specific versions:
>  python3-dev
>  python3.5-dev
>  python3.6-dev
>  python3.7-dev



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7267) install python3 components in release_verify_script.sh

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7267?focusedWorklogId=240350&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240350
 ]

ASF GitHub Bot logged work on BEAM-7267:


Author: ASF GitHub Bot
Created on: 10/May/19 17:53
Start Date: 10/May/19 17:53
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #8551: [BEAM-7267] 
Add python3*-dev to verify script
URL: https://github.com/apache/beam/pull/8551
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)
 | --- | --- | ---
   
   Pre-Commit Tests Status (on master branch)
   

   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build 
Status](

[jira] [Work logged] (BEAM-6916) Reorganize Beam SQL docs

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6916?focusedWorklogId=240355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240355
 ]

ASF GitHub Bot logged work on BEAM-6916:


Author: ASF GitHub Bot
Created on: 10/May/19 17:55
Start Date: 10/May/19 17:55
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #8455: [BEAM-6916] 
Reorg Beam SQL docs and add Calcite section
URL: https://github.com/apache/beam/pull/8455#discussion_r282983109
 
 

 ##
 File path: website/src/documentation/dsls/sql/calcite/overview.md
 ##
 @@ -0,0 +1,85 @@
+---
+layout: section
+title: "Beam SQL in Calcite: Overview"
+section_menu: section-menu/sdks.html
+permalink: /documentation/dsls/sql/calcite/overview/
+---
+
+# Beam SQL in Calcite: Overview
+
+[Apache Calcite](http://calcite.apache.org) is a widespread SQL dialect used in
+big data processing with some streaming enhancements. Calcite provides the
+basic dialect underlying Beam SQL. 
+
+The following table summarizes Apache Calcite operators and functions 
supported by Beam SQL.
+
+
+  Operators and functionsBeam SQL support status
+http://calcite.apache.org/docs/reference.html#operator-precedence";>Operator
 precedenceYes
+http://calcite.apache.org/docs/reference.html#comparison-operators";>Comparison
 operatorsYes
+http://calcite.apache.org/docs/reference.html#logical-operators";>Logical 
operatorsYes
+http://calcite.apache.org/docs/reference.html#arithmetic-operators-and-functions";>Arithmetic
 operators and functionsYes
+http://calcite.apache.org/docs/reference.html#character-string-operators-and-functions";>Character
 string operators and functionsYes
+http://calcite.apache.org/docs/reference.html#binary-string-operators-and-functions";>Binary
 string operators and functionsNo
+http://calcite.apache.org/docs/reference.html#datetime-functions";>Date/time
 functionsYes
+http://calcite.apache.org/docs/reference.html#system-functions";>System 
functionsNo
+http://calcite.apache.org/docs/reference.html#conditional-functions-and-operators";>Conditional
 functions and operatorsYes
+http://calcite.apache.org/docs/reference.html#type-conversion";>Type 
conversionYes
+http://calcite.apache.org/docs/reference.html#value-constructors";>Value 
constructorsNo, except array
+http://calcite.apache.org/docs/reference.html#collection-functions";>Collection
 functionsNo
+http://calcite.apache.org/docs/reference.html#period-predicates";>Period 
predicatesNo
+http://calcite.apache.org/docs/reference.html#jdbc-function-escape";>JDBC 
function escapeNo
+http://calcite.apache.org/docs/reference.html#aggregate-functions";>Aggregate
 functions
+Use Beam SQL https://beam.apache.org/documentation/dsls/sql/calcite/aggregate-functions/";>aggregate
 functions
+http://calcite.apache.org/docs/reference.html#window-functions";>Window 
functionsNo
+http://calcite.apache.org/docs/reference.html#grouping-functions";>Grouping
 functionsNo
+http://calcite.apache.org/docs/reference.html#grouped-window-functions";>Grouped
 window functionsUse Beam SQL https://beam.apache.org/documentation/dsls/sql/windowing-and-triggering/";>windowing
 and triggering
+http://calcite.apache.org/docs/reference.html#grouped-auxiliary-functions";>Grouped
 auxiliary functionsYes, except SESSION_END
+http://calcite.apache.org/docs/reference.html#spatial-functions";>Spatial 
functionsNo
+http://calcite.apache.org/docs/reference.html#geometry-creation-functions-3d";>Geometry
 creation functions (3D)No
+http://calcite.apache.org/docs/reference.html#geometry-predicates";>Geometry
 predicatesNo
+http://calcite.apache.org/docs/reference.html#json-functions";>JSON 
functionsNo
+http://calcite.apache.org/docs/reference.html#user-defined-functions";>User-defined
 functions
+Use Beam SQL https://beam.apache.org/documentation/dsls/sql/user-defined-functions/";>user-defined
 functions. You cannot call functions with http://calcite.apache.org/docs/reference.html#calling-functions-with-named-and-optional-parameters";>named
 and optional parameters.
+http://calcite.apache.org/docs/reference.html#match_recognize";>MATCH_RECOGNIZENo
+http://calcite.apache.org/docs/reference.html#ddl-extensions";>DDL 
ExtensionsUse Beam SQL https://beam.apache.org/documentation/dsls/sql/create-external-table/";>CREATE
 EXTERNAL TABLE
+
+
+We have added additional extensions to
+make it easy to leverage Beam's unified batch/streaming model and support
+for complex data types.
+
+## Query syntax
 
 Review comment:
   Done
 



Issue Time Tracking
---

Worklog Id: 

[jira] [Work logged] (BEAM-6916) Reorganize Beam SQL docs

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6916?focusedWorklogId=240354&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240354
 ]

ASF GitHub Bot logged work on BEAM-6916:


Author: ASF GitHub Bot
Created on: 10/May/19 17:54
Start Date: 10/May/19 17:54
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #8455: [BEAM-6916] 
Reorg Beam SQL docs and add Calcite section
URL: https://github.com/apache/beam/pull/8455#discussion_r282982797
 
 

 ##
 File path: website/src/documentation/dsls/sql/calcite/overview.md
 ##
 @@ -0,0 +1,85 @@
+---
+layout: section
+title: "Beam SQL in Calcite: Overview"
+section_menu: section-menu/sdks.html
+permalink: /documentation/dsls/sql/calcite/overview/
+---
+
+# Beam SQL in Calcite: Overview
+
+[Apache Calcite](http://calcite.apache.org) is a widespread SQL dialect used in
+big data processing with some streaming enhancements. Calcite provides the
+basic dialect underlying Beam SQL. 
+
+The following table summarizes Apache Calcite operators and functions 
supported by Beam SQL.
+
+
+  Operators and functionsBeam SQL support status
+http://calcite.apache.org/docs/reference.html#operator-precedence";>Operator
 precedenceYes
+http://calcite.apache.org/docs/reference.html#comparison-operators";>Comparison
 operatorsYes
+http://calcite.apache.org/docs/reference.html#logical-operators";>Logical 
operatorsYes
+http://calcite.apache.org/docs/reference.html#arithmetic-operators-and-functions";>Arithmetic
 operators and functionsYes
+http://calcite.apache.org/docs/reference.html#character-string-operators-and-functions";>Character
 string operators and functionsYes
+http://calcite.apache.org/docs/reference.html#binary-string-operators-and-functions";>Binary
 string operators and functionsNo
+http://calcite.apache.org/docs/reference.html#datetime-functions";>Date/time
 functionsYes
+http://calcite.apache.org/docs/reference.html#system-functions";>System 
functionsNo
+http://calcite.apache.org/docs/reference.html#conditional-functions-and-operators";>Conditional
 functions and operatorsYes
+http://calcite.apache.org/docs/reference.html#type-conversion";>Type 
conversionYes
+http://calcite.apache.org/docs/reference.html#value-constructors";>Value 
constructorsNo, except array
+http://calcite.apache.org/docs/reference.html#collection-functions";>Collection
 functionsNo
+http://calcite.apache.org/docs/reference.html#period-predicates";>Period 
predicatesNo
+http://calcite.apache.org/docs/reference.html#jdbc-function-escape";>JDBC 
function escapeNo
+http://calcite.apache.org/docs/reference.html#aggregate-functions";>Aggregate
 functions
+Use Beam SQL https://beam.apache.org/documentation/dsls/sql/calcite/aggregate-functions/";>aggregate
 functions
+http://calcite.apache.org/docs/reference.html#window-functions";>Window 
functionsNo
+http://calcite.apache.org/docs/reference.html#grouping-functions";>Grouping
 functionsNo
+http://calcite.apache.org/docs/reference.html#grouped-window-functions";>Grouped
 window functionsUse Beam SQL https://beam.apache.org/documentation/dsls/sql/windowing-and-triggering/";>windowing
 and triggering
+http://calcite.apache.org/docs/reference.html#grouped-auxiliary-functions";>Grouped
 auxiliary functionsYes, except SESSION_END
+http://calcite.apache.org/docs/reference.html#spatial-functions";>Spatial 
functionsNo
+http://calcite.apache.org/docs/reference.html#geometry-creation-functions-3d";>Geometry
 creation functions (3D)No
+http://calcite.apache.org/docs/reference.html#geometry-predicates";>Geometry
 predicatesNo
+http://calcite.apache.org/docs/reference.html#json-functions";>JSON 
functionsNo
+http://calcite.apache.org/docs/reference.html#user-defined-functions";>User-defined
 functions
+Use Beam SQL https://beam.apache.org/documentation/dsls/sql/user-defined-functions/";>user-defined
 functions. You cannot call functions with http://calcite.apache.org/docs/reference.html#calling-functions-with-named-and-optional-parameters";>named
 and optional parameters.
+http://calcite.apache.org/docs/reference.html#match_recognize";>MATCH_RECOGNIZENo
+http://calcite.apache.org/docs/reference.html#ddl-extensions";>DDL 
ExtensionsUse Beam SQL https://beam.apache.org/documentation/dsls/sql/create-external-table/";>CREATE
 EXTERNAL TABLE
+
+
+We have added additional extensions to
+make it easy to leverage Beam's unified batch/streaming model and support
+for complex data types.
+
+## Query syntax
+Query statements scan one or more tables or expressions and return the 
computed result rows.
+The [Query syntax]({{ site.baseurl
+}}/documentation/dsls/sql/calcite/query-syntax) page describes Beam SQL's 
syntax for queries when using Apache Calcite.
+
+## Data types
+Beam SQL supports standard SQL scalar data types as well as extensions 
including arrays, maps, and nested rows.
+Read about supported [data types]

[jira] [Created] (BEAM-7268) Make external sorter Hadoop free

2019-05-10 Thread Neville Li (JIRA)
Neville Li created BEAM-7268:


 Summary: Make external sorter Hadoop free
 Key: BEAM-7268
 URL: https://issues.apache.org/jira/browse/BEAM-7268
 Project: Beam
  Issue Type: Improvement
  Components: sdk-ideas
Affects Versions: 2.12.0
Reporter: Neville Li
Assignee: Neville Li


Right now the Java sorter extension depends on Hadoop SequenceFile for external 
sort. It'll be nice to re-implement it without the dependency to avoid 
conflicts.
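As context for what a Hadoop-free replacement might look like, below is a minimal sketch of an external merge sort: spill sorted runs to temporary files, then merge them lazily with `heapq.merge`. This is purely illustrative (assumptions: plain newline-delimited temp files stand in for SequenceFile, and `external_sort`/`_spill` are hypothetical names, not Beam's sorter API).

```python
import heapq
import tempfile

def _spill(sorted_chunk):
    """Write one sorted run to a temp file and rewind it for reading."""
    f = tempfile.TemporaryFile(mode="w+")
    f.writelines("%d\n" % v for v in sorted_chunk)
    f.seek(0)
    return f

def external_sort(values, chunk_size=1000):
    """Sort an iterable of ints using bounded memory per run."""
    runs, chunk = [], []
    for v in values:
        chunk.append(v)
        if len(chunk) >= chunk_size:
            runs.append(_spill(sorted(chunk)))
            chunk = []
    if chunk:
        runs.append(_spill(sorted(chunk)))
    # heapq.merge streams from each run without loading any run fully.
    return [int(line) for line in heapq.merge(*runs, key=int)]
```

A real implementation would also need serialization for arbitrary key/value bytes and cleanup of the spill files, but the merge structure is the same.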



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7243) Release python{36,37}-fnapi containers images required by Dataflow runner.

2019-05-10 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837505#comment-16837505
 ] 

Ankur Goenka commented on BEAM-7243:


Container update PR: [https://github.com/apache/beam/pull/8546]

> Release python{36,37}-fnapi containers images required by Dataflow runner. 
> ---
>
> Key: BEAM-7243
> URL: https://issues.apache.org/jira/browse/BEAM-7243
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Valentyn Tymofieiev
>Assignee: Ankur Goenka
>Priority: Major
>
> We have not yet released all container images for currently used dev fnapi 
> containers [1]. 
> For example:
> :~$ docker pull 
> gcr.io/cloud-dataflow/v1beta3/python36-fnapi:beam-master-20190213
> Error response from daemon: manifest for 
> gcr.io/cloud-dataflow/v1beta3/python36-fnapi:beam-master-20190213 not found
> This causes failures in Python streaming postcommit tests on Dataflow runner 
> for Python 3.6 and higher versions. 
> We need to release the containers and update names.py.
> [~angoenka], I think you were planning an update of dev containers that will 
> also take care of this. If you don't plan to do that or need help, please 
> reassign the issue back to me. Thanks!
> [1] 
> https://github.com/apache/beam/blob/79a463784fce36c12292b4e642238ef124c184e0/sdks/python/apache_beam/runners/dataflow/internal/names.py#L44



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-7267) install python3 components in release_verify_script.sh

2019-05-10 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-7267:
--

Assignee: Ankur Goenka

> install python3 components in release_verify_script.sh
> --
>
> Key: BEAM-7267
> URL: https://issues.apache.org/jira/browse/BEAM-7267
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Py3 Cython tests need the python-dev packages for specific versions:
>  python3-dev
>  python3.5-dev
>  python3.6-dev
>  python3.7-dev



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6908) Add Python3 performance benchmarks

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6908?focusedWorklogId=240360&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240360
 ]

ASF GitHub Bot logged work on BEAM-6908:


Author: ASF GitHub Bot
Created on: 10/May/19 18:36
Start Date: 10/May/19 18:36
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #8518: [BEAM-6908] 
Refactor Python performance test groovy file for easy configuration
URL: https://github.com/apache/beam/pull/8518#discussion_r282983808
 
 

 ##
 File path: .test-infra/jenkins/job_PerformanceTests_Python.groovy
 ##
 @@ -18,46 +18,130 @@
 
 import CommonJobProperties as commonJobProperties
 
-// This job runs the Beam Python performance tests on PerfKit Benchmarker.
-job('beam_PerformanceTests_Python'){
-  // Set default Beam job properties.
-  commonJobProperties.setTopLevelMainJobProperties(delegate)
-
-  // Run job in postcommit every 6 hours, don't trigger every push.
-  commonJobProperties.setAutoJob(
-  delegate,
-  'H */6 * * *')
-
-  // Allows triggering this build against pull requests.
-  commonJobProperties.enablePhraseTriggeringFromPullRequest(
-  delegate,
-  'Python SDK Performance Test',
-  'Run Python Performance Test')
-
-  def pipelineArgs = [
-  project: 'apache-beam-testing',
-  staging_location: 'gs://temp-storage-for-end-to-end-tests/staging-it',
-  temp_location: 'gs://temp-storage-for-end-to-end-tests/temp-it',
-  output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output'
-  ]
+
+class PerformanceTestConfigurations {
+  // Name of the Jenkins job
+  String jobName
+  // Description of the Jenkins job
+  String jobDescription
+  // Phrase to trigger this Jenkins job
+  String jobTriggerPhrase
+  // Frequency of the job build, default to every 6 hours
+  String buildSchedule = 'H */6 * * *'
+  // A benchmark flag, will pass to "--benchmarkName"
+  String benchmarkName = 'beam_integration_benchmark'
+  // A benchmark flag, will pass to "--beam_sdk"
+  String sdk = 'python'
+  // A benchmark flag, will pass to "--bigqueryTable"
+  String resultTable
+  // A benchmark flag, will pass to "--beam_it_class"
+  String itClass
+  // A benchmark flag, will pass to "--beam_it_module"
+  String itModule
+  // A benchmark flag, will pass to "--beam_prebuilt"
+  Boolean prebuilt = false
+  // A benchmark flag, will pass to "--beam_python_sdk_location"
 
 Review comment:
   Please add a comment about default behavior if not specified. 
 



Issue Time Tracking
---

Worklog Id: (was: 240360)
Time Spent: 11h  (was: 10h 50m)

> Add Python3 performance benchmarks
> --
>
> Key: BEAM-6908
> URL: https://issues.apache.org/jira/browse/BEAM-6908
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> Similar to 
> [beam_PerformanceTests_Python|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_Python/],
>  we want to have a Python3 benchmark running on Jenkins to detect performance 
> regressions during code adoption.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6908) Add Python3 performance benchmarks

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6908?focusedWorklogId=240356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240356
 ]

ASF GitHub Bot logged work on BEAM-6908:


Author: ASF GitHub Bot
Created on: 10/May/19 18:36
Start Date: 10/May/19 18:36
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #8518: [BEAM-6908] 
Refactor Python performance test groovy file for easy configuration
URL: https://github.com/apache/beam/pull/8518#discussion_r282981925
 
 

 ##
 File path: .test-infra/jenkins/job_PerformanceTests_Python.groovy
 ##
 @@ -18,46 +18,130 @@
 
 import CommonJobProperties as commonJobProperties
 
-// This job runs the Beam Python performance tests on PerfKit Benchmarker.
-job('beam_PerformanceTests_Python'){
-  // Set default Beam job properties.
-  commonJobProperties.setTopLevelMainJobProperties(delegate)
-
-  // Run job in postcommit every 6 hours, don't trigger every push.
-  commonJobProperties.setAutoJob(
-  delegate,
-  'H */6 * * *')
-
-  // Allows triggering this build against pull requests.
-  commonJobProperties.enablePhraseTriggeringFromPullRequest(
-  delegate,
-  'Python SDK Performance Test',
-  'Run Python Performance Test')
-
-  def pipelineArgs = [
-  project: 'apache-beam-testing',
-  staging_location: 'gs://temp-storage-for-end-to-end-tests/staging-it',
-  temp_location: 'gs://temp-storage-for-end-to-end-tests/temp-it',
-  output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output'
-  ]
+
+class PerformanceTestConfigurations {
+  // Name of the Jenkins job
+  String jobName
+  // Description of the Jenkins job
+  String jobDescription
+  // Phrase to trigger this Jenkins job
+  String jobTriggerPhrase
+  // Frequency of the job build, default to every 6 hours
+  String buildSchedule = 'H */6 * * *'
+  // A benchmark flag, will pass to "--benchmarkName"
+  String benchmarkName = 'beam_integration_benchmark'
+  // A benchmark flag, will pass to "--beam_sdk"
+  String sdk = 'python'
+  // A benchmark flag, will pass to "--bigqueryTable"
+  String resultTable
+  // A benchmark flag, will pass to "--beam_it_class"
+  String itClass
+  // A benchmark flag, will pass to "--beam_it_module"
+  String itModule
+  // A benchmark flag, will pass to "--beam_prebuilt"
+  Boolean prebuilt = false
 
 Review comment:
   How do we know how to set this flag correctly and what it affects? Is there 
a default value we can set so that the user doesn't need to worry about this 
knob as well?
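The reviewer's suggestion, giving a field a sensible default so most job definitions never mention it, can be sketched with a config object. The actual job uses a Groovy class; this Python dataclass is only an illustration, and beyond the field names visible in the diff everything here is an assumption:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class PerfTestConfig:
    # Per-job fields with no sensible default stay required.
    job_name: str
    result_table: str
    # Knobs most jobs never touch get defaults, mirroring the Groovy class.
    build_schedule: str = 'H */6 * * *'   # every 6 hours
    benchmark_name: str = 'beam_integration_benchmark'
    sdk: str = 'python'
    prebuilt: bool = False
    extra_pipeline_args: Dict[str, str] = field(default_factory=dict)

# A job definition then only states what differs from the defaults.
cfg = PerfTestConfig(
    job_name='beam_PerformanceTests_Python',
    result_table='beam_performance.wordcount_py_pkb_results')
```

With defaults in place, adding a new benchmark variant means overriding one or two fields instead of re-specifying the whole configuration.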
 



Issue Time Tracking
---

Worklog Id: (was: 240356)
Time Spent: 10.5h  (was: 10h 20m)

> Add Python3 performance benchmarks
> --
>
> Key: BEAM-6908
> URL: https://issues.apache.org/jira/browse/BEAM-6908
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> Similar to 
> [beam_PerformanceTests_Python|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_Python/],
>  we want to have a Python3 benchmark running on Jenkins to detect performance 
> regressions during code adoption.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6908) Add Python3 performance benchmarks

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6908?focusedWorklogId=240359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240359
 ]

ASF GitHub Bot logged work on BEAM-6908:


Author: ASF GitHub Bot
Created on: 10/May/19 18:36
Start Date: 10/May/19 18:36
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #8518: [BEAM-6908] 
Refactor Python performance test groovy file for easy configuration
URL: https://github.com/apache/beam/pull/8518#discussion_r282997900
 
 

 ##
 File path: .test-infra/jenkins/job_PerformanceTests_Python.groovy
 ##
 @@ -18,46 +18,107 @@
 
 import CommonJobProperties as commonJobProperties
 
-// This job runs the Beam Python performance tests on PerfKit Benchmarker.
-job('beam_PerformanceTests_Python'){
-  // Set default Beam job properties.
-  commonJobProperties.setTopLevelMainJobProperties(delegate)
-
-  // Run job in postcommit every 6 hours, don't trigger every push.
-  commonJobProperties.setAutoJob(
-  delegate,
-  'H */6 * * *')
-
-  // Allows triggering this build against pull requests.
-  commonJobProperties.enablePhraseTriggeringFromPullRequest(
-  delegate,
-  'Python SDK Performance Test',
-  'Run Python Performance Test')
-
-  def pipelineArgs = [
-  project: 'apache-beam-testing',
-  staging_location: 'gs://temp-storage-for-end-to-end-tests/staging-it',
-  temp_location: 'gs://temp-storage-for-end-to-end-tests/temp-it',
-  output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output'
-  ]
-  def pipelineArgList = []
-  pipelineArgs.each({
-key, value -> pipelineArgList.add("--$key=$value")
-  })
-  def pipelineArgsJoined = pipelineArgList.join(',')
-
-  def argMap = [
-  beam_sdk : 'python',
-  benchmarks   : 'beam_integration_benchmark',
-  bigquery_table   : 'beam_performance.wordcount_py_pkb_results',
-  beam_it_class: 
'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-  beam_it_module   : 'sdks/python',
-  beam_prebuilt: 'true',  // skip beam prebuild
-  beam_python_sdk_location : 'build/apache-beam.tar.gz',
-  beam_runner  : 'TestDataflowRunner',
-  beam_it_timeout  : '1200',
-  beam_it_args : pipelineArgsJoined,
-  ]
-
-  commonJobProperties.buildPerformanceTest(delegate, argMap)
+
+class PerformanceTestConfigurations {
+  String jobName
+  String jobDescription
+  String jobTriggerPhrase
+  String buildSchedule = 'H */6 * * *'  // every 6 hours
+  String benchmarkName = 'beam_integration_benchmark'
+  String sdk = 'python'
+  String bigqueryTable
+  String itClass
+  String itModule
+  Boolean skipPrebuild = false
+  String pythonSdkLocation
+  String runner = 'TestDataflowRunner'
+  Integer itTimeout = 1200
+  Map extraPipelineArgs
+}
+
+// Common pipeline args for Dataflow job.
+def dataflowPipelineArgs = [
+project : 'apache-beam-testing',
+staging_location: 'gs://temp-storage-for-end-to-end-tests/staging-it',
+temp_location   : 'gs://temp-storage-for-end-to-end-tests/temp-it',
+]
+
+
+// Configurations of each Jenkins job.
+def testConfigurations = [
+new PerformanceTestConfigurations(
+jobName   : 'beam_PerformanceTests_Python',
+jobDescription: 'Python SDK Performance Test',
+jobTriggerPhrase  : 'Run Python Performance Test',
+bigqueryTable : 'beam_performance.wordcount_py_pkb_results',
+skipPrebuild  : true,
+pythonSdkLocation : 'build/apache-beam.tar.gz',
+itClass   : 
'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
+itModule  : 'sdks/python',
+extraPipelineArgs : dataflowPipelineArgs + [
+output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output'
+],
+),
+new PerformanceTestConfigurations(
+jobName   : 'beam_PerformanceTests_Python35',
+jobDescription: 'Python35 SDK Performance Test',
+jobTriggerPhrase  : 'Run Python35 Performance Test',
+bigqueryTable : 'beam_performance.wordcount_py35_pkb_results',
+skipPrebuild  : true,
+pythonSdkLocation : 
'test-suites/dataflow/py35/build/apache-beam.tar.gz',
+itClass   : 
'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
 
 Review comment:
   Ok, but we will not get one test reading per each test, instead we will run 
both tests, and get a total runtime, right?
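For reference, the transformation the quoted Groovy performs with `pipelineArgs.each { ... }` followed by `join(',')` — rendering a map of pipeline options as the single comma-separated `--beam_it_args` value — can be sketched as follows (Python for illustration; the helper name is an assumption):

```python
def join_pipeline_args(args):
    """Render {key: value} pairs as a comma-separated '--key=value' string."""
    return ','.join('--%s=%s' % (k, v) for k, v in args.items())

# Common Dataflow options, as listed in the diff above.
dataflow_args = {
    'project': 'apache-beam-testing',
    'staging_location': 'gs://temp-storage-for-end-to-end-tests/staging-it',
    'temp_location': 'gs://temp-storage-for-end-to-end-tests/temp-it',
}
print(join_pipeline_args(dataflow_args))
```

PerfKit Benchmarker then splits that single string back into individual pipeline flags when launching the test pipeline.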
 


[jira] [Work logged] (BEAM-6908) Add Python3 performance benchmarks

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6908?focusedWorklogId=240357&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240357
 ]

ASF GitHub Bot logged work on BEAM-6908:


Author: ASF GitHub Bot
Created on: 10/May/19 18:36
Start Date: 10/May/19 18:36
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #8518: [BEAM-6908] 
Refactor Python performance test groovy file for easy configuration
URL: https://github.com/apache/beam/pull/8518#discussion_r282983336
 
 

 ##
 File path: .test-infra/jenkins/job_PerformanceTests_Python.groovy
 ##
 @@ -18,46 +18,107 @@
 
 import CommonJobProperties as commonJobProperties
 
-// This job runs the Beam Python performance tests on PerfKit Benchmarker.
-job('beam_PerformanceTests_Python'){
-  // Set default Beam job properties.
-  commonJobProperties.setTopLevelMainJobProperties(delegate)
-
-  // Run job in postcommit every 6 hours, don't trigger every push.
-  commonJobProperties.setAutoJob(
-  delegate,
-  'H */6 * * *')
-
-  // Allows triggering this build against pull requests.
-  commonJobProperties.enablePhraseTriggeringFromPullRequest(
-  delegate,
-  'Python SDK Performance Test',
-  'Run Python Performance Test')
-
-  def pipelineArgs = [
-  project: 'apache-beam-testing',
-  staging_location: 'gs://temp-storage-for-end-to-end-tests/staging-it',
-  temp_location: 'gs://temp-storage-for-end-to-end-tests/temp-it',
-  output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output'
-  ]
-  def pipelineArgList = []
-  pipelineArgs.each({
-key, value -> pipelineArgList.add("--$key=$value")
-  })
-  def pipelineArgsJoined = pipelineArgList.join(',')
-
-  def argMap = [
-  beam_sdk : 'python',
-  benchmarks   : 'beam_integration_benchmark',
-  bigquery_table   : 'beam_performance.wordcount_py_pkb_results',
-  beam_it_class: 
'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-  beam_it_module   : 'sdks/python',
-  beam_prebuilt: 'true',  // skip beam prebuild
-  beam_python_sdk_location : 'build/apache-beam.tar.gz',
-  beam_runner  : 'TestDataflowRunner',
-  beam_it_timeout  : '1200',
-  beam_it_args : pipelineArgsJoined,
-  ]
-
-  commonJobProperties.buildPerformanceTest(delegate, argMap)
+
+class PerformanceTestConfigurations {
+  String jobName
+  String jobDescription
+  String jobTriggerPhrase
+  String buildSchedule = 'H */6 * * *'  // every 6 hours
+  String benchmarkName = 'beam_integration_benchmark'
+  String sdk = 'python'
 
 Review comment:
   Let's try to reduce the knobs if we can. I think we can remove this one from 
PerformanceTestConfigurations and just pass `"python"` in 
`createPythonPerformanceTestJob`.
 



Issue Time Tracking
---

Worklog Id: (was: 240357)
Time Spent: 10.5h  (was: 10h 20m)

> Add Python3 performance benchmarks
> --
>
> Key: BEAM-6908
> URL: https://issues.apache.org/jira/browse/BEAM-6908
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> Similar to 
> [beam_PerformanceTests_Python|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_Python/],
>  we want to have a Python3 benchmark running on Jenkins to detect performance 
> regression during code adoption.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6908) Add Python3 performance benchmarks

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6908?focusedWorklogId=240358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240358
 ]

ASF GitHub Bot logged work on BEAM-6908:


Author: ASF GitHub Bot
Created on: 10/May/19 18:36
Start Date: 10/May/19 18:36
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #8518: [BEAM-6908] 
Refactor Python performance test groovy file for easy configuration
URL: https://github.com/apache/beam/pull/8518#discussion_r282980975
 
 

 ##
 File path: .test-infra/jenkins/job_PerformanceTests_Python.groovy
 ##
 @@ -18,46 +18,107 @@
 
 import CommonJobProperties as commonJobProperties
 
-// This job runs the Beam Python performance tests on PerfKit Benchmarker.
-job('beam_PerformanceTests_Python'){
-  // Set default Beam job properties.
-  commonJobProperties.setTopLevelMainJobProperties(delegate)
-
-  // Run job in postcommit every 6 hours, don't trigger every push.
-  commonJobProperties.setAutoJob(
-  delegate,
-  'H */6 * * *')
-
-  // Allows triggering this build against pull requests.
-  commonJobProperties.enablePhraseTriggeringFromPullRequest(
-  delegate,
-  'Python SDK Performance Test',
-  'Run Python Performance Test')
-
-  def pipelineArgs = [
-  project: 'apache-beam-testing',
-  staging_location: 'gs://temp-storage-for-end-to-end-tests/staging-it',
-  temp_location: 'gs://temp-storage-for-end-to-end-tests/temp-it',
-  output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output'
-  ]
-  def pipelineArgList = []
-  pipelineArgs.each({
-key, value -> pipelineArgList.add("--$key=$value")
-  })
-  def pipelineArgsJoined = pipelineArgList.join(',')
-
-  def argMap = [
-  beam_sdk : 'python',
-  benchmarks   : 'beam_integration_benchmark',
-  bigquery_table   : 'beam_performance.wordcount_py_pkb_results',
-  beam_it_class: 
'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-  beam_it_module   : 'sdks/python',
-  beam_prebuilt: 'true',  // skip beam prebuild
-  beam_python_sdk_location : 'build/apache-beam.tar.gz',
-  beam_runner  : 'TestDataflowRunner',
-  beam_it_timeout  : '1200',
-  beam_it_args : pipelineArgsJoined,
-  ]
-
-  commonJobProperties.buildPerformanceTest(delegate, argMap)
+
+class PerformanceTestConfigurations {
+  String jobName
+  String jobDescription
+  String jobTriggerPhrase
+  String buildSchedule = 'H */6 * * *'  // every 6 hours
+  String benchmarkName = 'beam_integration_benchmark'
+  String sdk = 'python'
+  String bigqueryTable
+  String itClass
+  String itModule
 
 Review comment:
   How do we know how to set this correctly? It doesn't seem intuitive...
 



Issue Time Tracking
---

Worklog Id: (was: 240358)
Time Spent: 10h 40m  (was: 10.5h)

> Add Python3 performance benchmarks
> --
>
> Key: BEAM-6908
> URL: https://issues.apache.org/jira/browse/BEAM-6908
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> Similar to 
> [beam_PerformanceTests_Python|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_Python/],
>  we want to have a Python3 benchmark running on Jenkins to detect performance 
> regression during code adoption.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7243) Release python{36,37}-fnapi containers images required by Dataflow runner.

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7243?focusedWorklogId=240365&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240365
 ]

ASF GitHub Bot logged work on BEAM-7243:


Author: ASF GitHub Bot
Created on: 10/May/19 19:05
Start Date: 10/May/19 19:05
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #8546: [BEAM-7243] 
Updating python containers to beam-master-20190509
URL: https://github.com/apache/beam/pull/8546#issuecomment-491398413
 
 
   Run Python PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 240365)
Time Spent: 10m
Remaining Estimate: 0h

> Release python{36,37}-fnapi containers images required by Dataflow runner. 
> ---
>
> Key: BEAM-7243
> URL: https://issues.apache.org/jira/browse/BEAM-7243
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Valentyn Tymofieiev
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have not yet released all container images for currently used dev fnapi 
> containers [1]. 
> For example:
> :~$ docker pull 
> gcr.io/cloud-dataflow/v1beta3/python36-fnapi:beam-master-20190213
> Error response from daemon: manifest for 
> gcr.io/cloud-dataflow/v1beta3/python36-fnapi:beam-master-20190213 not found
> This causes failures in Python streaming postcommit tests on Dataflow runner 
> for Python 3.6 and higher versions. 
> We need to release the containers and update names.py.
> [~angoenka], I think you were planning an update of dev containers that will 
> also take care of this. If you don't plan to do that or need help, please 
> reassign the issue back to me. Thanks!
> [1] 
> https://github.com/apache/beam/blob/79a463784fce36c12292b4e642238ef124c184e0/sdks/python/apache_beam/runners/dataflow/internal/names.py#L44



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7243) Release python{36,37}-fnapi containers images required by Dataflow runner.

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7243?focusedWorklogId=240366&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240366
 ]

ASF GitHub Bot logged work on BEAM-7243:


Author: ASF GitHub Bot
Created on: 10/May/19 19:05
Start Date: 10/May/19 19:05
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #8546: [BEAM-7243] 
Updating python containers to beam-master-20190509
URL: https://github.com/apache/beam/pull/8546#issuecomment-491398458
 
 
   Run Python Dataflow ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 240366)
Time Spent: 20m  (was: 10m)

> Release python{36,37}-fnapi containers images required by Dataflow runner. 
> ---
>
> Key: BEAM-7243
> URL: https://issues.apache.org/jira/browse/BEAM-7243
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Valentyn Tymofieiev
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have not yet released all container images for currently used dev fnapi 
> containers [1]. 
> For example:
> :~$ docker pull 
> gcr.io/cloud-dataflow/v1beta3/python36-fnapi:beam-master-20190213
> Error response from daemon: manifest for 
> gcr.io/cloud-dataflow/v1beta3/python36-fnapi:beam-master-20190213 not found
> This causes failures in Python streaming postcommit tests on Dataflow runner 
> for Python 3.6 and higher versions. 
> We need to release the containers and update names.py.
> [~angoenka], I think you were planning an update of dev containers that will 
> also take care of this. If you don't plan to do that or need help, please 
> reassign the issue back to me. Thanks!
> [1] 
> https://github.com/apache/beam/blob/79a463784fce36c12292b4e642238ef124c184e0/sdks/python/apache_beam/runners/dataflow/internal/names.py#L44



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7267) install python3 components in release_verify_script.sh

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7267?focusedWorklogId=240380&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240380
 ]

ASF GitHub Bot logged work on BEAM-7267:


Author: ASF GitHub Bot
Created on: 10/May/19 19:50
Start Date: 10/May/19 19:50
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #8551: [BEAM-7267] 
Add python3*-dev to verify script
URL: https://github.com/apache/beam/pull/8551
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 240380)
Time Spent: 0.5h  (was: 20m)

> install python3 components in release_verify_script.sh
> --
>
> Key: BEAM-7267
> URL: https://issues.apache.org/jira/browse/BEAM-7267
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Py3 cython test need python-dev for specific versions
>  python3-dev
>  python3.5-dev
>  python3.6-dev
>  python3.7-dev



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7145) Make Flink Runner compatible with Flink 1.8

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7145?focusedWorklogId=240384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240384
 ]

ASF GitHub Bot logged work on BEAM-7145:


Author: ASF GitHub Bot
Created on: 10/May/19 19:55
Start Date: 10/May/19 19:55
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #8549: 
[release-2.13.0][BEAM-7145] Make FlinkRunner compatible with Flink 1.8
URL: https://github.com/apache/beam/pull/8549
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 240384)
Time Spent: 1h 20m  (was: 1h 10m)

> Make Flink Runner compatible with Flink 1.8
> ---
>
> Key: BEAM-7145
> URL: https://issues.apache.org/jira/browse/BEAM-7145
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Users have asked about Flink 1.8 support. From a quick look,
>  * There are changes related to how TypeSerializers are snapshotted. We might 
> have to copy {{CoderTypeSerializer}} to ensure compatibility across the 
> different Flink Runner build targets.
>  * StandaloneClusterClient has been removed. We might drop support for legacy 
> deployment and only allow REST.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-7145) Make Flink Runner compatible with Flink 1.8

2019-05-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-7145.

Resolution: Fixed

> Make Flink Runner compatible with Flink 1.8
> ---
>
> Key: BEAM-7145
> URL: https://issues.apache.org/jira/browse/BEAM-7145
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Users have asked about Flink 1.8 support. From a quick look,
>  * There are changes related to how TypeSerializers are snapshotted. We might 
> have to copy {{CoderTypeSerializer}} to ensure compatibility across the 
> different Flink Runner build targets.
>  * StandaloneClusterClient has been removed. We might drop support for legacy 
> deployment and only allow REST.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-562) DoFn Reuse: Add new methods to DoFn

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-562?focusedWorklogId=240389&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240389
 ]

ASF GitHub Bot logged work on BEAM-562:
---

Author: ASF GitHub Bot
Created on: 10/May/19 20:12
Start Date: 10/May/19 20:12
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #7994: [BEAM-562] Add 
DoFn.setup and DoFn.teardown to Python SDK
URL: https://github.com/apache/beam/pull/7994#issuecomment-491416153
 
 
   Run Python PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 240389)
Time Spent: 9h 20m  (was: 9h 10m)

> DoFn Reuse: Add new methods to DoFn
> ---
>
> Key: BEAM-562
> URL: https://issues.apache.org/jira/browse/BEAM-562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Yifan Mai
>Priority: Major
>  Labels: sdk-consistency
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> Java SDK added setup and teardown methods to the DoFns. This makes DoFns 
> reusable and provides performance improvements. Python SDK should add support 
> for these new DoFn methods:
> Proposal doc: 
> https://docs.google.com/document/d/1LLQqggSePURt3XavKBGV7SZJYQ4NW8yCu63lBchzMRk/edit?ts=5771458f#
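
The setup/teardown lifecycle described above can be sketched in plain Python. `ReusableFn` and the `run_bundles` driver below are illustrative stand-ins, not Beam's actual `DoFn` API: a single setup/teardown pair brackets the processing of many bundles, which is what makes instances reusable.

```python
# Minimal sketch of the proposed setup/teardown lifecycle for Python DoFns.
# ReusableFn and run_bundles are hypothetical illustrations, not Beam's API.

class ReusableFn:
    def setup(self):
        # Called once per DoFn instance, e.g. to open an expensive connection.
        self.connection = "open"
        self.processed = []

    def process(self, element):
        # Called per element; reuses the resource created in setup().
        self.processed.append(element * 2)

    def teardown(self):
        # Called once when the instance is finally discarded.
        self.connection = "closed"

def run_bundles(fn, bundles):
    # A toy runner loop: one setup/teardown pair around many bundles,
    # which is the performance win the issue describes.
    fn.setup()
    for bundle in bundles:
        for element in bundle:
            fn.process(element)
    fn.teardown()
    return fn

fn = run_bundles(ReusableFn(), [[1, 2], [3]])
print(fn.processed, fn.connection)  # [2, 4, 6] closed
```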



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-562) DoFn Reuse: Add new methods to DoFn

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-562?focusedWorklogId=240388&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240388
 ]

ASF GitHub Bot logged work on BEAM-562:
---

Author: ASF GitHub Bot
Created on: 10/May/19 20:12
Start Date: 10/May/19 20:12
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #7994: [BEAM-562] Add 
DoFn.setup and DoFn.teardown to Python SDK
URL: https://github.com/apache/beam/pull/7994#issuecomment-491416096
 
 
   Fails with a py3 error, not sure how it is related:
   22:36:06   File 
"/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 613, in _blocking_request
   22:36:06 raise RuntimeError(response.error)
   22:36:06 RuntimeError: java.lang.ClassCastException: java.lang.String cannot 
be cast to [B
 



Issue Time Tracking
---

Worklog Id: (was: 240388)
Time Spent: 9h 10m  (was: 9h)

> DoFn Reuse: Add new methods to DoFn
> ---
>
> Key: BEAM-562
> URL: https://issues.apache.org/jira/browse/BEAM-562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Yifan Mai
>Priority: Major
>  Labels: sdk-consistency
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> Java SDK added setup and teardown methods to the DoFns. This makes DoFns 
> reusable and provides performance improvements. Python SDK should add support 
> for these new DoFn methods:
> Proposal doc: 
> https://docs.google.com/document/d/1LLQqggSePURt3XavKBGV7SZJYQ4NW8yCu63lBchzMRk/edit?ts=5771458f#



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4150) Standardize use of PCollection coder proto attribute

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4150?focusedWorklogId=240390&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240390
 ]

ASF GitHub Bot logged work on BEAM-4150:


Author: ASF GitHub Bot
Created on: 10/May/19 20:14
Start Date: 10/May/19 20:14
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #8533: [BEAM-4150] 
Downgrade missing coder error logs to info logs.
URL: https://github.com/apache/beam/pull/8533
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 240390)
Time Spent: 2h 50m  (was: 2h 40m)

> Standardize use of PCollection coder proto attribute
> 
>
> Key: BEAM-4150
> URL: https://issues.apache.org/jira/browse/BEAM-4150
> Project: Beam
>  Issue Type: Task
>  Components: beam-model
>Reporter: Robert Bradshaw
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> In some places it's expected to be a WindowedCoder, in others the raw 
> ElementCoder. We should use the same convention (decided in discussion to be 
> the raw ElementCoder) everywhere. The WindowCoder can be pulled out of the 
> attached windowing strategy, and the input/output ports should specify the 
> encoding directly rather than read the adjacent PCollection coder fields. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7253) test_with_jar_packages_invalid_file_name test fails on windows

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7253?focusedWorklogId=240391&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240391
 ]

ASF GitHub Bot logged work on BEAM-7253:


Author: ASF GitHub Bot
Created on: 10/May/19 20:15
Start Date: 10/May/19 20:15
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #8537: [BEAM-7253] 
test_with_jar_packages_invalid_file_name test fails on Windows
URL: https://github.com/apache/beam/pull/8537#issuecomment-491416979
 
 
   Let's try to merge this to 2.13 release branch after it is merged here.
 



Issue Time Tracking
---

Worklog Id: (was: 240391)
Time Spent: 1h 10m  (was: 1h)

> test_with_jar_packages_invalid_file_name test fails on windows
> --
>
> Key: BEAM-7253
> URL: https://issues.apache.org/jira/browse/BEAM-7253
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Heejong Lee
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> test_with_jar_packages_invalid_file_name test fails on Windows, possibly 
> due to a different classpath separator on Windows (";") as compared to Linux (":").



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-7269) Remove StateSpec from hashCode of SimpleStateTag

2019-05-10 Thread JIRA
Jan Lukavský created BEAM-7269:
--

 Summary: Remove StateSpec from hashCode of SimpleStateTag
 Key: BEAM-7269
 URL: https://issues.apache.org/jira/browse/BEAM-7269
 Project: Beam
  Issue Type: Bug
  Components: runner-core
Affects Versions: 2.12.0
Reporter: Jan Lukavský


SimpleStateTag is used as a key in the hash-based StateTable; it currently hashes 
and compares the StateSpec and StructuredId that are inside the SimpleStateTag. 
StateSpec hashes the Coder into the resulting hashCode, so when the Coder lacks 
proper hashCode and equals implementations, this results in wrong behavior 
(apparently missing states).
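
The same pitfall can be shown in Python terms (the Java Coder case is analogous): a key object without value-based equality and hashing makes logically equal lookups miss, which is exactly how state can appear to vanish. `BrokenTag` and `FixedTag` below are illustrative names, not Beam classes.

```python
# Illustration of the pitfall above: a key object (standing in for a
# StateTag whose StateSpec hashes its Coder) that lacks value-based
# __eq__/__hash__ makes logically equal lookups miss in a hash table.

class BrokenTag:
    def __init__(self, state_id):
        self.state_id = state_id
    # No __eq__/__hash__: identity semantics, like a Coder that is
    # missing proper equals/hashCode implementations.

class FixedTag:
    def __init__(self, state_id):
        self.state_id = state_id

    def __eq__(self, other):
        return isinstance(other, FixedTag) and self.state_id == other.state_id

    def __hash__(self):
        return hash(self.state_id)

table = {BrokenTag("counter"): 41}
missing = table.get(BrokenTag("counter"))  # None: a fresh tag never matches

table2 = {FixedTag("counter"): 41}
found = table2.get(FixedTag("counter"))    # 41: value-based identity matches
print(missing, found)
```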



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7253) test_with_jar_packages_invalid_file_name test fails on windows

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7253?focusedWorklogId=240396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240396
 ]

ASF GitHub Bot logged work on BEAM-7253:


Author: ASF GitHub Bot
Created on: 10/May/19 20:29
Start Date: 10/May/19 20:29
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8537: 
[BEAM-7253] test_with_jar_packages_invalid_file_name test fails on Windows
URL: https://github.com/apache/beam/pull/8537#discussion_r283032601
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/stager.py
 ##
 @@ -200,10 +201,11 @@ def stage_job_resources(self,
 # Handle jar packages that should be staged for Java SDK Harness.
 jar_packages = options.view_as(
 DebugOptions).lookup_experiment('jar_packages')
+classpath_separator = ':' if platform.system() != 'Windows' else ';'
 
 Review comment:
   Yeah, let's use a single separator (prefer colon) to make things simpler and 
offer a platform-independent API.
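
A minimal sketch of this suggestion, assuming the experiment value is a separator-joined list as in the quoted diff; `split_jar_packages` is a hypothetical helper, not Beam's actual implementation.

```python
# Sketch of the reviewer's suggestion: accept one platform-independent
# separator (':') in the 'jar_packages' experiment value instead of
# branching on the operating system. Illustrative helper only.

def split_jar_packages(value, separator=':'):
    """Split a separator-joined 'jar_packages' value into entries."""
    return [entry for entry in value.split(separator) if entry]

entries = split_jar_packages('a.jar:b.jar:c.jar')
print(entries)  # ['a.jar', 'b.jar', 'c.jar']
```

Note that a bare colon collides with Windows drive letters (`C:\...`), which is presumably why the `';'` branch existed; a real implementation would need to account for that.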
 



Issue Time Tracking
---

Worklog Id: (was: 240396)
Time Spent: 1h 20m  (was: 1h 10m)

> test_with_jar_packages_invalid_file_name test fails on windows
> --
>
> Key: BEAM-7253
> URL: https://issues.apache.org/jira/browse/BEAM-7253
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Heejong Lee
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> test_with_jar_packages_invalid_file_name test fails on Windows, possibly 
> due to a different classpath separator on Windows (";") as compared to Linux (":").



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6813) Issues with state + timers in java Direct Runner

2019-05-10 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-6813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837591#comment-16837591
 ] 

Jan Lukavský commented on BEAM-6813:


Probably related to https://issues.apache.org/jira/browse/BEAM-7269. 
[~SteveNiemitz] can you please confirm that either of the coders is missing 
hashCode and/or equals?

> Issues with state + timers in java Direct Runner 
> -
>
> Key: BEAM-6813
> URL: https://issues.apache.org/jira/browse/BEAM-6813
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.11.0
>Reporter: Steve Niemitz
>Priority: Major
>
> I was experimenting with a stateful DoFn with timers, and ran into a weird 
> bug where a state cell I was writing to would come back as null when I read 
> it inside a timer callback.
> I've attached the code below [1] (please excuse the scala ;) ).
> After I dug into this a little bit, I found that the state's value was 
> present in the `underlying` table in CopyOnAccessMemoryStateTable [2], but 
> not set in the `stateTable` itself on the instance. [3]   Based on my very 
> rudimentary understanding of how this works in the direct runner, it seems 
> like commit() is not being called on the state table before the timer is 
> firing?
>   
>  [1]
> {code:java}
> private final class AggregatorDoFn[K, V, Acc, Out](
>   combiner: CombineFn[V, Acc, Out],
>   keyCoder: Coder[K],
>   accumulatorCoder: Coder[Acc]
> ) extends DoFn[KV[K, V], KV[K, Out]] {
>   @StateId(KeyId)
>   private final val keySpec = StateSpecs.value(keyCoder)
>   @StateId(AggregationId)
>   private final val stateSpec = StateSpecs.combining(accumulatorCoder, 
> combiner)
>   @StateId("numElements")
>   private final val numElementsSpec = StateSpecs.combining(Sum.ofLongs())
>   @TimerId(FlushTimerId)
>   private final val flushTimerSpec = 
> TimerSpecs.timer(TimeDomain.PROCESSING_TIME)
>   @ProcessElement
>   def processElement(
> @StateId(KeyId) key: ValueState[K],
> @StateId(AggregationId) state: CombiningState[V, Acc, Out],
> @StateId("numElements") numElements: CombiningState[JLong, _, JLong],
> @TimerId(FlushTimerId) flushTimer: Timer,
> @Element element: KV[K, V],
> window: BoundedWindow
>   ): Unit = {
> key.write(element.getKey)
> state.add(element.getValue)
> numElements.add(1L)
> if (numElements.read() == 1) {
>   flushTimer
> .offset(Duration.standardSeconds(10))
> .setRelative()
> }
>   }
>   @OnTimer(FlushTimerId)
>   def onFlushTimer(
> @StateId(KeyId) key: ValueState[K],
> @StateId(AggregationId) state: CombiningState[V, _, Out],
> @StateId("numElements") numElements: CombiningState[JLong, _, JLong],
> output: OutputReceiver[KV[K, Out]]
>   ): Unit = {
> if (numElements.read() > 0) {
>   val k = key.read()
>   output.output(
> KV.of(k, state.read())
>   )
> }
> numElements.clear()
>   }
> }{code}
> [2]
>  [https://imgur.com/a/xvPR5nd]
> [3]
>  [https://imgur.com/a/jznMdaQ]
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7008) adding UTF8 String coder to Java SDK ModelCoders

2019-05-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7008?focusedWorklogId=240402&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240402
 ]

ASF GitHub Bot logged work on BEAM-7008:


Author: ASF GitHub Bot
Created on: 10/May/19 20:40
Start Date: 10/May/19 20:40
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #8544: Revert "Merge pull 
request #8228: [BEAM-7008] adding UTF8 String code…
URL: https://github.com/apache/beam/pull/8544#issuecomment-491423651
 
 
   Run Python PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 240402)
Time Spent: 6h 10m  (was: 6h)

> adding UTF8 String coder to Java SDK ModelCoders
> 
>
> Key: BEAM-7008
> URL: https://issues.apache.org/jira/browse/BEAM-7008
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> It looks like UTF-8 String Coder in Java and Python SDKs uses different 
> encoding schemes. StringUtf8Coder in Java SDK puts the varint length of the 
> input string before actual data bytes however StrUtf8Coder in Python SDK 
> directly encodes the input string to bytes value. We should unify the 
> encoding schemes of UTF8 strings across the different SDKs and make it a 
> standard coder.
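
The mismatch described above can be demonstrated directly. Assuming the issue's description is accurate (Java's StringUtf8Coder prefixes a varint length, Python's StrUtf8Coder emits raw UTF-8 bytes), here is a plain-Python sketch of the two schemes; the varint helper is the standard base-128 encoding, written out for illustration, and neither function is Beam's actual coder code.

```python
# Demonstration of the two encodings described in the issue: one scheme
# prefixes a base-128 varint length, the other emits raw UTF-8 bytes.

def encode_varint(n):
    # Standard base-128 varint: 7 data bits per byte, high bit = "more".
    out = bytearray()
    while True:
        bits = n & 0x7F
        n >>= 7
        if n:
            out.append(bits | 0x80)
        else:
            out.append(bits)
            return bytes(out)

def java_style(s):
    # Length-prefixed, as StringUtf8Coder is described.
    data = s.encode('utf-8')
    return encode_varint(len(data)) + data

def python_style(s):
    # Raw UTF-8 bytes, as StrUtf8Coder is described.
    return s.encode('utf-8')

print(java_style('beam'))    # b'\x04beam' : length prefix, then bytes
print(python_style('beam'))  # b'beam'     : raw bytes only
```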



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO

2019-05-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7257:
---
Summary: Add withProducerConfigUpdates to KafkaIO  (was: adding 
withProducerConfigUpdates)

> Add withProducerConfigUpdates to KafkaIO
> 
>
> Key: BEAM-7257
> URL: https://issues.apache.org/jira/browse/BEAM-7257
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> adding withProducerConfigUpdates and deprecating updateProducerProperties



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

