[jira] [Work logged] (BEAM-8935) Fail fast if sdk harness startup failed

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8935?focusedWorklogId=364909&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364909
 ]

ASF GitHub Bot logged work on BEAM-8935:


Author: ASF GitHub Bot
Created on: 31/Dec/19 08:00
Start Date: 31/Dec/19 08:00
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #10338: [BEAM-8935] 
Fail fast if sdk harness startup failed.
URL: https://github.com/apache/beam/pull/10338#issuecomment-569884513
 
 
   Good catch! I have not found direct documentation for this question. 
However, I did find some useful information which may be helpful.
   
   According to the 
[doc](https://docs.docker.com/engine/api/v1.40/#operation/ContainerList), the 
status of a Docker container can be one of `created, restarting, running, 
removing, paused, exited, dead`. So I think we only need to consider whether 
the status of a container could be `created` in race conditions after executing 
`docker run` (for the other statuses, it is obvious that something is wrong). 
According to 
[Stack Overflow](https://stackoverflow.com/questions/37744961/docker-run-vs-create), 
`docker run = docker create + docker start`. I guess that `docker create` 
changes the state of the container to `created` and `docker start` changes it 
to `running`. Another [Stack Overflow 
answer](https://stackoverflow.com/questions/43734412/what-does-created-container-mean-in-docker) 
explains in which cases the status of a Docker container can be `created`: 
after `docker create`, or after a `docker run` where, as it puts it, the 
`Docker container has been created using docker run but it hasn't been able to 
start successfully`. So we can infer that if the container status is still 
`created` after `docker run`, the container failed to start.
   So, I think the current logic would be fine :) What do you think?
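The inference above can be sketched as a fail-fast check (a hypothetical Python illustration only; Beam's actual `DockerCommand` logic is Java and may differ): after `docker run -d`, inspect the container's status and treat anything other than `running` as a failed startup.

```python
import subprocess

# Statuses from the Docker Engine API (ContainerList). A container that is
# still "created" after `docker run` never managed to start; "exited" and
# "dead" are also clear failures right after startup.
FAILED_STATUSES = {"created", "exited", "dead"}


def started_successfully(status: str) -> bool:
    """Return True if the reported status indicates a running container."""
    return status == "running"


def check_container(container_id: str) -> bool:
    """Inspect a container and fail fast if it did not start.

    Illustrative helper, not Beam's actual code.
    """
    status = subprocess.check_output(
        ["docker", "inspect", "-f", "{{.State.Status}}", container_id],
        text=True,
    ).strip()
    if status in FAILED_STATUSES:
        raise RuntimeError(
            f"container {container_id} failed to start (status: {status})"
        )
    return started_successfully(status)
```

The pure status check is separated out so the decision logic can be tested without a Docker daemon.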
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 364909)
Time Spent: 50m  (was: 40m)

> Fail fast if sdk harness startup failed
> ---
>
> Key: BEAM-8935
> URL: https://issues.apache.org/jira/browse/BEAM-8935
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently the runner blocks waiting for the SDK harness to start up until 
> the harness is available or a timeout occurs. The timeout is 1 or 2 minutes, 
> so if the SDK harness fails to start for some reason, the runner may only 
> become aware of it after 1 or 2 minutes. This is too long.
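The fail-fast behavior the issue asks for can be sketched roughly as follows (hypothetical names, Python for illustration; the actual runner code is Java): instead of blocking on readiness alone for the full timeout, also watch whether the harness process is still alive and abort immediately when it dies.

```python
import time


def wait_for_harness(is_ready, is_alive, timeout_s=120.0, poll_s=0.1):
    """Block until the harness is ready, but fail fast if it dies.

    is_ready and is_alive are zero-argument callables; all names here are
    hypothetical and exist only to illustrate the idea.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_ready():
            return
        if not is_alive():
            # Fail immediately instead of waiting out the full timeout.
            raise RuntimeError("SDK harness died during startup")
        time.sleep(poll_s)
    raise TimeoutError(f"SDK harness not available after {timeout_s:.0f}s")
```

With this shape, a crashed harness surfaces within one poll interval rather than after the 1-2 minute timeout.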



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8935) Fail fast if sdk harness startup failed

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8935?focusedWorklogId=364910&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364910
 ]

ASF GitHub Bot logged work on BEAM-8935:


Author: ASF GitHub Bot
Created on: 31/Dec/19 08:00
Start Date: 31/Dec/19 08:00
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #10338: [BEAM-8935] 
Fail fast if sdk harness startup failed.
URL: https://github.com/apache/beam/pull/10338#issuecomment-569884513
 
 
   Good catch! I have not found direct documentation for this question. 
However, I did find some useful information which may be helpful.
   
   According to the 
[doc](https://docs.docker.com/engine/api/v1.40/#operation/ContainerList), the 
status of a Docker container can be one of `created, restarting, running, 
removing, paused, exited, dead`. So I think we only need to consider whether 
the status of a container could be `created` in race conditions after executing 
`docker run` (for the other statuses, it is obvious that something is wrong). 
According to 
[Stack Overflow](https://stackoverflow.com/questions/37744961/docker-run-vs-create), 
`docker run = docker create + docker start`. I guess that `docker create` 
changes the state of the container to `created` and `docker start` changes it 
to `running`. Another [Stack Overflow 
answer](https://stackoverflow.com/questions/43734412/what-does-created-container-mean-in-docker) 
explains in which cases the status of a Docker container can be `created`: 
after `docker create`, or after a `docker run` where, as it puts it, the 
`Docker container has been created using docker run but it hasn't been able to 
start successfully`. So we can infer that if the container status is still 
`created` after `docker run`, the container failed to start.
   
   So, I think the check logic of the current PR would be fine :) What do you think?
 



Issue Time Tracking
---

Worklog Id: (was: 364910)
Time Spent: 1h  (was: 50m)

> Fail fast if sdk harness startup failed
> ---
>
> Key: BEAM-8935
> URL: https://issues.apache.org/jira/browse/BEAM-8935
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently the runner blocks waiting for the SDK harness to start up until 
> the harness is available or a timeout occurs. The timeout is 1 or 2 minutes, 
> so if the SDK harness fails to start for some reason, the runner may only 
> become aware of it after 1 or 2 minutes. This is too long.





[jira] [Work logged] (BEAM-8935) Fail fast if sdk harness startup failed

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8935?focusedWorklogId=364911&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364911
 ]

ASF GitHub Bot logged work on BEAM-8935:


Author: ASF GitHub Bot
Created on: 31/Dec/19 08:03
Start Date: 31/Dec/19 08:03
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #10338: [BEAM-8935] 
Fail fast if sdk harness startup failed.
URL: https://github.com/apache/beam/pull/10338#issuecomment-569884513
 
 
   Good catch! I have not found direct documentation for this question. 
However, I did find some useful information which may be helpful.
   
   According to the 
[doc](https://docs.docker.com/engine/api/v1.40/#operation/ContainerList), the 
status of a Docker container can be one of `created, restarting, running, 
removing, paused, exited, dead`. So I think we only need to consider whether 
the status of a container could be `created` in race conditions after executing 
`docker run` (for the other statuses, it is obvious that something is wrong). 
According to 
[Stack Overflow](https://stackoverflow.com/questions/37744961/docker-run-vs-create), 
`docker run = docker create + docker start`. I guess that `docker create` 
changes the state of the container to `created` and `docker start` changes it 
to `running`. Another [Stack Overflow 
answer](https://stackoverflow.com/questions/43734412/what-does-created-container-mean-in-docker) 
explains in which cases the status of a Docker container can be `created`: 
after `docker create`, or after a `docker run` where, as it puts it, the 
`Docker container has been created using docker run but it hasn't been able to 
start successfully`. So we can infer that if the container status is still 
`created` after `docker run`, the container failed to start.
   
   Besides, there is a unit test in 
[DockerCommandTest](https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/environment/DockerCommandTest.java#L60) 
which checks that the container becomes `running` after `docker run`. If there 
were a race condition, I guess this test would fail from time to time.
   
   So, I think the check logic of the current PR would be fine :) What do you think?
 



Issue Time Tracking
---

Worklog Id: (was: 364911)
Time Spent: 1h 10m  (was: 1h)

> Fail fast if sdk harness startup failed
> ---
>
> Key: BEAM-8935
> URL: https://issues.apache.org/jira/browse/BEAM-8935
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently the runner blocks waiting for the SDK harness to start up until 
> the harness is available or a timeout occurs. The timeout is 1 or 2 minutes, 
> so if the SDK harness fails to start for some reason, the runner may only 
> become aware of it after 1 or 2 minutes. This is too long.





[jira] [Updated] (BEAM-9040) Add Spark Structured Streaming to Nexmark PostCommit run

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9040:
---
Status: Open  (was: Triage Needed)

> Add Spark Structured Streaming to Nexmark PostCommit run
> 
>
> Key: BEAM-9040
> URL: https://issues.apache.org/jira/browse/BEAM-9040
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, testing-nexmark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>
> The new Spark Structured Streaming runner is not part of our regular 
> PostCommit runs; adding it will help us track regressions as well as compare 
> its performance against the classic Spark runner.





[jira] [Created] (BEAM-9040) Add Spark Structured Streaming to Nexmark PostCommit run

2019-12-31 Thread Jira
Ismaël Mejía created BEAM-9040:
--

 Summary: Add Spark Structured Streaming to Nexmark PostCommit run
 Key: BEAM-9040
 URL: https://issues.apache.org/jira/browse/BEAM-9040
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark, testing-nexmark
Reporter: Ismaël Mejía
Assignee: Ismaël Mejía


The new Spark Structured Streaming runner is not part of our regular PostCommit 
runs; adding it will help us track regressions as well as compare its 
performance against the classic Spark runner.





[jira] [Work logged] (BEAM-9040) Add Spark Structured Streaming to Nexmark PostCommit run

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9040?focusedWorklogId=364913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364913
 ]

ASF GitHub Bot logged work on BEAM-9040:


Author: ASF GitHub Bot
Created on: 31/Dec/19 08:18
Start Date: 31/Dec/19 08:18
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10485: [BEAM-9040] 
Add Spark Structured Streaming to Nexmark PostCommit run
URL: https://github.com/apache/beam/pull/10485
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 364913)
Remaining Estimate: 0h
Time Spent: 10m

> Add Spark Structured Streaming to Nexmark PostCommit run
> 
>
> Key: BEAM-9040
> URL: https://issues.apache.org/jira/browse/BEAM-9040
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, testing-nexmark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The new Spark Structured Streaming runner is not part of our regular 
> PostCommit runs; adding it will help us track regressions as well as compare 
> its performance against the classic Spark runner.





[jira] [Updated] (BEAM-9041) SchemaCoder equals should not rely on fromRow/toRow equality

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9041:
---
Status: Open  (was: Triage Needed)

> SchemaCoder equals should not rely on fromRow/toRow equality
> 
>
> Key: BEAM-9041
> URL: https://issues.apache.org/jira/browse/BEAM-9041
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>
> SchemaCoder's equals implementation relies on SerializableFunction's equals 
> method. This is error-prone because users rarely implement the equals method 
> for a SerializableFunction. One alternative would be to rely on equality of 
> the serialized bytes instead.





[jira] [Created] (BEAM-9041) SchemaCoder equals should not rely on fromRow/toRow equality

2019-12-31 Thread Jira
Ismaël Mejía created BEAM-9041:
--

 Summary: SchemaCoder equals should not rely on fromRow/toRow 
equality
 Key: BEAM-9041
 URL: https://issues.apache.org/jira/browse/BEAM-9041
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Ismaël Mejía
Assignee: Ismaël Mejía


SchemaCoder's equals implementation relies on SerializableFunction's equals 
method. This is error-prone because users rarely implement the equals method 
for a SerializableFunction. One alternative would be to rely on equality of the 
serialized bytes instead.
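The proposed alternative can be illustrated with a Python analogue (a conceptual sketch only; SchemaCoder is Java and would compare Java-serialized bytes): a stateful callable without a custom equality method can still be compared via its serialized form.

```python
import pickle


class Multiplier:
    """A stateful serializable 'function', standing in for a Java
    SerializableFunction that does not override equals. Default equality
    is identity, just like Java's Object.equals."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        return self.factor * x


def serialized_equal(f, g) -> bool:
    # Compare the serialized forms instead of relying on a user-provided
    # equality method: same captured state => same bytes.
    return pickle.dumps(f) == pickle.dumps(g)
```

Two `Multiplier(2)` instances are not equal by default, yet their serialized bytes agree, which is exactly the signal the ticket suggests using.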





[jira] [Updated] (BEAM-9041) SchemaCoder equals should not rely on fromRow/toRow equality

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9041:
---
Fix Version/s: 2.18.0

> SchemaCoder equals should not rely on fromRow/toRow equality
> 
>
> Key: BEAM-9041
> URL: https://issues.apache.org/jira/browse/BEAM-9041
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.18.0
>
>
> SchemaCoder's equals implementation relies on SerializableFunction's equals 
> method. This is error-prone because users rarely implement the equals method 
> for a SerializableFunction. One alternative would be to rely on equality of 
> the serialized bytes instead.





[jira] [Updated] (BEAM-8882) Allow Dataflow to automatically choose portability or not.

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8882:
---
Status: Open  (was: Triage Needed)

> Allow Dataflow to automatically choose portability or not.
> --
>
> Key: BEAM-8882
> URL: https://issues.apache.org/jira/browse/BEAM-8882
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Critical
> Fix For: 2.18.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> We would like the Dataflow service to be able to automatically choose whether 
> to run pipelines in a portable way. In order to do this, we need to provide 
> more information even if portability is not explicitly requested. 





[jira] [Updated] (BEAM-9030) Bump grpc to 1.26.0

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9030:
---
Summary: Bump grpc to 1.26.0  (was: Bump the version of GRPC to 1.22.0+(May 
be latest 1.26.0, currently 1.21.0))

> Bump grpc to 1.26.0
> ---
>
> Key: BEAM-9030
> URL: https://issues.apache.org/jira/browse/BEAM-9030
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution, runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> When submitting a Python word count job to a Flink session/standalone cluster 
> repeatedly, the metaspace usage of the task manager of the Flink cluster 
> continuously increases (about 40 MB each time). The reason is that the Beam 
> classes are loaded with the user class loader in Flink, and there are 
> problems with the implementations of `ProcessManager` (from Beam) and 
> `ThreadPoolCache` (from Netty) which may prevent the user class loader from 
> being garbage collected even after the job finishes, eventually causing the 
> metaspace memory leak. You can refer to FLINK-15338 [1] for more information.
> Regarding `ProcessManager`, I have created BEAM-9006 [2] to track it. 
> Regarding `ThreadPoolCache`, it is a Netty problem that has been fixed in 
> NETTY#8955 [3]. Netty 4.1.35.Final already includes this fix and gRPC 1.22.0 
> already depends on Netty 4.1.35.Final, so we need to bump the version of gRPC 
> to 1.22.0+ (currently 1.21.0).
> What do you think?
> [1] https://issues.apache.org/jira/browse/FLINK-15338
> [2] https://issues.apache.org/jira/browse/BEAM-9006
> [3] https://github.com/netty/netty/pull/8955





[jira] [Updated] (BEAM-9030) Bump the version of GRPC to 1.22.0+(May be latest 1.26.0, currently 1.21.0)

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9030:
---
Status: Open  (was: Triage Needed)

> Bump the version of GRPC to 1.22.0+(May be latest 1.26.0, currently 1.21.0)
> ---
>
> Key: BEAM-9030
> URL: https://issues.apache.org/jira/browse/BEAM-9030
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution, runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> When submitting a Python word count job to a Flink session/standalone cluster 
> repeatedly, the metaspace usage of the task manager of the Flink cluster 
> continuously increases (about 40 MB each time). The reason is that the Beam 
> classes are loaded with the user class loader in Flink, and there are 
> problems with the implementations of `ProcessManager` (from Beam) and 
> `ThreadPoolCache` (from Netty) which may prevent the user class loader from 
> being garbage collected even after the job finishes, eventually causing the 
> metaspace memory leak. You can refer to FLINK-15338 [1] for more information.
> Regarding `ProcessManager`, I have created BEAM-9006 [2] to track it. 
> Regarding `ThreadPoolCache`, it is a Netty problem that has been fixed in 
> NETTY#8955 [3]. Netty 4.1.35.Final already includes this fix and gRPC 1.22.0 
> already depends on Netty 4.1.35.Final, so we need to bump the version of gRPC 
> to 1.22.0+ (currently 1.21.0).
> What do you think?
> [1] https://issues.apache.org/jira/browse/FLINK-15338
> [2] https://issues.apache.org/jira/browse/BEAM-9006
> [3] https://github.com/netty/netty/pull/8955





[jira] [Updated] (BEAM-9006) Meta space memory leak caused by the shutdown hook of ProcessManager

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9006:
---
Status: Open  (was: Triage Needed)

> Meta space memory leak caused by the shutdown hook of ProcessManager 
> -
>
> Key: BEAM-9006
> URL: https://issues.apache.org/jira/browse/BEAM-9006
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the class `ProcessManager` adds a shutdown hook that stops all 
> living processes before the JVM exits. The shutdown hook is never removed. 
> If this class is loaded by the user class loader, the user class loader 
> cannot be garbage collected, which eventually causes a metaspace memory 
> leak.
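The leak pattern can be illustrated with a Python analogue (a sketch only; the real issue concerns JVM shutdown hooks pinning a class loader): a registered exit hook keeps its target reachable until the hook is explicitly removed.

```python
import atexit
import gc
import weakref


class ProcessManagerLike:
    """Sketch of the leak: a shutdown hook registered in the constructor
    and never removed keeps this object (and, in the JVM case, the whole
    user class loader behind it) reachable forever."""

    def __init__(self):
        self._hook = self._stop_all
        atexit.register(self._hook)

    def _stop_all(self):
        pass  # would stop child processes here

    def close(self):
        # The fix: remove the hook once the manager is done, so the
        # object graph it anchors can be garbage collected.
        atexit.unregister(self._hook)


manager = ProcessManagerLike()
ref = weakref.ref(manager)
manager.close()  # without this line, atexit would keep `manager` alive
del manager
gc.collect()
```

After `close()` and collection, the weak reference is cleared, which is the observable difference the fix makes.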





[jira] [Updated] (BEAM-9037) Promote proto logical type and duration to the core logical types

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9037:
---
Status: Open  (was: Triage Needed)

> Promote proto logical type and duration to the core logical types
> -
>
> Key: BEAM-9037
> URL: https://issues.apache.org/jira/browse/BEAM-9037
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
>
> The proto schema includes Timestamp and Duration with nano precision. These 
> logical types should be promoted to the core logical types, so they can be 
> handled in various IOs as standard mandatory conversions.
> This means that the logical type should use not the proto-specific Timestamp 
> and Duration but the Java 8 Instant and Duration.
> See discussion in the design document:
> [https://docs.google.com/document/d/1uu9pJktzT_O3DxGd1-Q2op4nRk4HekIZbzi-0oTAips/edit#heading=h.9uhml95iygqr]





[jira] [Assigned] (BEAM-9027) [SQL] ZetaSQL unparsing should produce valid result

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-9027:
--

Assignee: Kirill Kozlov

> [SQL] ZetaSQL unparsing should produce valid result
> ---
>
> Key: BEAM-9027
> URL: https://issues.apache.org/jira/browse/BEAM-9027
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> * ZetaSQL does not recognize keyword INTERVAL
>  * Calcite cannot unparse RexNode back to bytes literal
>  * Calcite cannot unparse some floating point literals correctly
>  * Calcite cannot unparse some string literals correctly
>  * Calcite cannot unparse types correctly for CAST function





[jira] [Updated] (BEAM-9027) [SQL] ZetaSQL unparsing should produce valid result

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9027:
---
Status: Open  (was: Triage Needed)

> [SQL] ZetaSQL unparsing should produce valid result
> ---
>
> Key: BEAM-9027
> URL: https://issues.apache.org/jira/browse/BEAM-9027
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> * ZetaSQL does not recognize keyword INTERVAL
>  * Calcite cannot unparse RexNode back to bytes literal
>  * Calcite cannot unparse some floating point literals correctly
>  * Calcite cannot unparse some string literals correctly
>  * Calcite cannot unparse types correctly for CAST function





[jira] [Updated] (BEAM-9032) Replace broadcast variables based side inputs with temp views

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9032:
---
Status: Open  (was: Triage Needed)

> Replace broadcast variables based side inputs with temp views
> -
>
> Key: BEAM-9032
> URL: https://issues.apache.org/jira/browse/BEAM-9032
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>






[jira] [Updated] (BEAM-9040) Add Spark Structured Streaming to Nexmark PostCommit run

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9040:
---
Labels: structured-streaming  (was: )

> Add Spark Structured Streaming to Nexmark PostCommit run
> 
>
> Key: BEAM-9040
> URL: https://issues.apache.org/jira/browse/BEAM-9040
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, testing-nexmark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Labels: structured-streaming
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The new Spark Structured Streaming runner is not part of our regular 
> PostCommit runs; adding it will help us track regressions as well as compare 
> its performance against the classic Spark runner.





[jira] [Assigned] (BEAM-9031) Wrong Python example in Flink runner documentation

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-9031:
--

Assignee: Berkay Öztürk

> Wrong Python example in Flink runner documentation
> --
>
> Key: BEAM-9031
> URL: https://issues.apache.org/jira/browse/BEAM-9031
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Affects Versions: Not applicable
>Reporter: Berkay Öztürk
>Assignee: Berkay Öztürk
>Priority: Trivial
>  Labels: documentation, easyfix, newbie
> Fix For: Not applicable
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Python example under the [Executing a Beam pipeline on a Flink 
> Cluster|https://beam.apache.org/documentation/runners/flink/#executing-a-beam-pipeline-on-a-flink-cluster]
>  header throws this error:
> {code}
> TypeError: Runner PipelineOptions() is not a PipelineRunner object or the 
> name of a registered runner
> {code}
> Fix:
> {code:python}
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
> 
> options = PipelineOptions([
>     "--runner=FlinkRunner",
>     "--flink_version=1.8",
>     "--flink_master=localhost:8081",
>     "--environment_type=LOOPBACK"
> ])
> 
> with beam.Pipeline(options=options) as p:
>     ...
> {code}





[jira] [Work logged] (BEAM-9031) Wrong Python example in Flink runner documentation

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9031?focusedWorklogId=364925&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364925
 ]

ASF GitHub Bot logged work on BEAM-9031:


Author: ASF GitHub Bot
Created on: 31/Dec/19 09:05
Start Date: 31/Dec/19 09:05
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10465: [BEAM-9031] 
Fixes wrong Python example in Flink runner documentation
URL: https://github.com/apache/beam/pull/10465
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 364925)
Time Spent: 20m  (was: 10m)

> Wrong Python example in Flink runner documentation
> --
>
> Key: BEAM-9031
> URL: https://issues.apache.org/jira/browse/BEAM-9031
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Affects Versions: Not applicable
>Reporter: Berkay Öztürk
>Assignee: Berkay Öztürk
>Priority: Trivial
>  Labels: documentation, easyfix, newbie
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Python example under the [Executing a Beam pipeline on a Flink 
> Cluster|https://beam.apache.org/documentation/runners/flink/#executing-a-beam-pipeline-on-a-flink-cluster]
>  header throws this error:
> {code}
> TypeError: Runner PipelineOptions() is not a PipelineRunner object or the 
> name of a registered runner
> {code}
> Fix:
> {code:python}
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
> 
> options = PipelineOptions([
>     "--runner=FlinkRunner",
>     "--flink_version=1.8",
>     "--flink_master=localhost:8081",
>     "--environment_type=LOOPBACK"
> ])
> 
> with beam.Pipeline(options=options) as p:
>     ...
> {code}





[jira] [Updated] (BEAM-9031) Wrong Python example in Flink runner documentation

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9031:
---
Status: Open  (was: Triage Needed)

> Wrong Python example in Flink runner documentation
> --
>
> Key: BEAM-9031
> URL: https://issues.apache.org/jira/browse/BEAM-9031
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Affects Versions: Not applicable
>Reporter: Berkay Öztürk
>Assignee: Berkay Öztürk
>Priority: Trivial
>  Labels: documentation, easyfix, newbie
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Python example under the [Executing a Beam pipeline on a Flink 
> Cluster|https://beam.apache.org/documentation/runners/flink/#executing-a-beam-pipeline-on-a-flink-cluster]
>  header throws this error:
> {code}
> TypeError: Runner PipelineOptions() is not a PipelineRunner object or the 
> name of a registered runner
> {code}
> Fix:
> {code:python}
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
> 
> options = PipelineOptions([
>     "--runner=FlinkRunner",
>     "--flink_version=1.8",
>     "--flink_master=localhost:8081",
>     "--environment_type=LOOPBACK"
> ])
> 
> with beam.Pipeline(options=options) as p:
>     ...
> {code}





[jira] [Resolved] (BEAM-9031) Wrong Python example in Flink runner documentation

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-9031.

Resolution: Fixed

> Wrong Python example in Flink runner documentation
> --
>
> Key: BEAM-9031
> URL: https://issues.apache.org/jira/browse/BEAM-9031
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Affects Versions: Not applicable
>Reporter: Berkay Öztürk
>Assignee: Berkay Öztürk
>Priority: Trivial
>  Labels: documentation, easyfix, newbie
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Python example under the [Executing a Beam pipeline on a Flink 
> Cluster|https://beam.apache.org/documentation/runners/flink/#executing-a-beam-pipeline-on-a-flink-cluster]
>  header throws this error:
> {code}
> TypeError: Runner PipelineOptions() is not a PipelineRunner object or the 
> name of a registered runner
> {code}
> Fix:
> {code:python}
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
> 
> options = PipelineOptions([
>     "--runner=FlinkRunner",
>     "--flink_version=1.8",
>     "--flink_master=localhost:8081",
>     "--environment_type=LOOPBACK"
> ])
> 
> with beam.Pipeline(options=options) as p:
>     ...
> {code}





[jira] [Updated] (BEAM-8960) Add an option for user to be able to opt out of using insert id for BigQuery streaming insert.

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8960:
---
Status: Open  (was: Triage Needed)

> Add an option for user to be able to opt out of using insert id for BigQuery 
> streaming insert.
> --
>
> Key: BEAM-8960
> URL: https://issues.apache.org/jira/browse/BEAM-8960
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Yiru Tang
>Assignee: Yiru Tang
>Priority: Minor
>   Original Estimate: 24h
>  Time Spent: 2h 10m
>  Remaining Estimate: 21h 50m
>
> BigQuery streaming insert ids offer best-effort insert deduplication. If users 
> choose to opt out of using insert ids, they could potentially be opted into 
> our new streaming backend, which gives higher throughput and more 
> quota. Insert id deduplication is best effort and does not provide exactly-once 
> guarantees.
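For background, deduplication happens because each row of a BigQuery streaming `tabledata.insertAll` request can carry an `insertId`. A minimal sketch of what opting out means at the request level; the helper and its `use_insert_ids` flag are hypothetical illustrations, not the proposed Beam API:

```python
import uuid

def build_insert_all_payload(rows, use_insert_ids=True):
    """Build a BigQuery tabledata.insertAll request body.

    The use_insert_ids flag illustrates the proposed opt-out; it is not
    the actual Beam option name.
    """
    payload_rows = []
    for row in rows:
        entry = {"json": row}
        if use_insert_ids:
            # A unique id per row lets BigQuery drop retried duplicates on a
            # best-effort basis; omitting it disables that deduplication.
            entry["insertId"] = str(uuid.uuid4())
        payload_rows.append(entry)
    return {"kind": "bigquery#tableDataInsertAllRequest", "rows": payload_rows}

# Opting out: rows carry no insertId, so retries may produce duplicates.
payload = build_insert_all_payload([{"word": "a", "count": 1}], use_insert_ids=False)
print(payload["rows"][0])
```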





[jira] [Assigned] (BEAM-8960) Add an option for user to be able to opt out of using insert id for BigQuery streaming insert.

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-8960:
--

Assignee: Yiru Tang

> Add an option for user to be able to opt out of using insert id for BigQuery 
> streaming insert.
> --
>
> Key: BEAM-8960
> URL: https://issues.apache.org/jira/browse/BEAM-8960
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Yiru Tang
>Assignee: Yiru Tang
>Priority: Minor
>   Original Estimate: 24h
>  Time Spent: 2h 10m
>  Remaining Estimate: 21h 50m
>





[jira] [Updated] (BEAM-9029) Two bugs in Python SDK S3 filesystem support

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9029:
---
Status: Open  (was: Triage Needed)

> Two bugs in Python SDK S3 filesystem support
> 
>
> Key: BEAM-9029
> URL: https://issues.apache.org/jira/browse/BEAM-9029
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Wenhai Pan
>Assignee: Wenhai Pan
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 24h
>  Time Spent: 20m
>  Remaining Estimate: 23h 40m
>
> Hi :)
> There seem to be 2 bugs in the S3 filesystem support.
> I tried to use S3 storage for a simple wordcount demo with DirectRunner.
> The demo script:
> {code:python}
> def main():
>     options = PipelineOptions().view_as(StandardOptions)
>     options.runner = 'DirectRunner'
>     pipeline = beam.Pipeline(options=options)
>     (
>         pipeline
>         | ReadFromText("s3://mx-machine-learning/panwenhai/beam_test/test_data")
>         | "extract_words" >> beam.FlatMap(lambda x: re.findall(r" [A-Za-z\']+", x))
>         | beam.combiners.Count.PerElement()
>         | beam.MapTuple(lambda word, count: "%s: %s" % (word, count))
>         | WriteToText("s3://mx-machine-learning/panwenhai/beam_test/output")
>     )
>     result = pipeline.run()
>     result.wait_until_finish()
>     return
> {code}
>  
> Error message 1:
> {noformat}
> apache_beam.io.filesystem.BeamIOError: Match operation failed with exceptions 
> {'s3://mx-machine-learning/panwenhai/beam_test/output-*-of-1': 
> BeamIOError("List operation failed with exceptions 
> {'s3://mx-machine-learning/panwenhai/beam_test/output-': S3ClientError('Tried 
> to list nonexistent S3 path: 
> s3://mx-machine-learning/panwenhai/beam_test/output-', 404)}")} [while 
> running 'WriteToText/Write/WriteImpl/PreFinalize'] with exceptions 
> None{noformat}
>  
> After digging into the code, it seems the Boto3 client's list function 
> raises an exception when trying to list a nonexistent S3 path 
> (beam/sdks/python/apache_beam/io/aws/clients/s3/boto3_client.py line 111), and 
> the S3IO class does not handle this exception in the list_prefix function 
> (beam/sdks/python/apache_beam/io/aws/s3io.py line 121).
> Before writing, the runner tries to list and delete any existing output 
> files; if there are none, it lists a nonexistent S3 path and 
> triggers the exception.
> This should not be an error here: I think we can safely ignore this exception 
> in the S3IO list_prefix function.
> Error Message 2:
> {noformat}
> File 
> "/Users/wenhai.pan/venvs/tfx/lib/python3.7/site-packages/apache_beam-2.19.0.dev0-py3.7.egg/apache_beam/io/aws/s3filesystem.py",
>  line 272, in delete
> exceptions = {path: error for (path, error) in results
> File 
> "/Users/wenhai.pan/venvs/tfx/lib/python3.7/site-packages/apache_beam-2.19.0.dev0-py3.7.egg/apache_beam/io/aws/s3filesystem.py",
>  line 272, in <dictcomp>
> exceptions = {path: error for (path, error) in results
> ValueError: too many values to unpack (expected 2) [while running 
> 'WriteToText/Write/WriteImpl/FinalizeWrite']{noformat}
>  
> When the runner tries to delete the temporary output directory, it 
> triggers this exception. The exception is caused by unpacking (path, error) 
> directly from "results", which is a dict 
> (beam/sdks/python/apache_beam/io/aws/s3filesystem.py line 272). I think we 
> should use results.items() here.
> I have submitted a patch for these 2 bugs: 
> https://github.com/apache/beam/pull/10459
>  
> Thank you.
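The second bug reduces to a small Python pitfall: iterating over a dict yields only its keys, so two-tuple unpacking fails. A minimal reproduction of the reported ValueError and the `.items()` fix from the patch (the sample paths are illustrative):

```python
# Simulate filesystem delete() results: a dict mapping path -> error (or None).
results = {
    "s3://bucket/tmp/file-0": None,
    "s3://bucket/tmp/file-1": ValueError("delete failed"),
}

# Buggy pattern: iterating a dict yields only its keys (strings here), so
# unpacking each key into (path, error) raises the reported ValueError.
try:
    {path: error for (path, error) in results}
except ValueError as exc:
    print("buggy comprehension:", exc)

# Fix from the patch: .items() yields (key, value) pairs.
exceptions = {path: error for (path, error) in results.items()
              if error is not None}
print(list(exceptions))
```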





[jira] [Assigned] (BEAM-9029) Two bugs in Python SDK S3 filesystem support

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-9029:
--

Assignee: Wenhai Pan

> Two bugs in Python SDK S3 filesystem support
> 
>
> Key: BEAM-9029
> URL: https://issues.apache.org/jira/browse/BEAM-9029
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Wenhai Pan
>Assignee: Wenhai Pan
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 24h
>  Time Spent: 20m
>  Remaining Estimate: 23h 40m
>





[jira] [Updated] (BEAM-9013) Multi-output TestStream breaks the DataflowRunner

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9013:
---
Status: Open  (was: Triage Needed)

> Multi-output TestStream breaks the DataflowRunner
> -
>
> Key: BEAM-9013
> URL: https://issues.apache.org/jira/browse/BEAM-9013
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.17.0
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (BEAM-8999) PGBKCVOperation does not respect timestamp combiners

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-8999.

Fix Version/s: 2.19.0
   Resolution: Fixed

> PGBKCVOperation does not respect timestamp combiners
> 
>
> Key: BEAM-8999
> URL: https://issues.apache.org/jira/browse/BEAM-8999
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We prevent lifting in the FnAPI runner in this case, but other optimizers 
> (e.g. the Greedy Fuser and Dataflow) do not, resulting in incorrect 
> timestamps. 





[jira] [Updated] (BEAM-8999) PGBKCVOperation does not respect timestamp combiners

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8999:
---
Status: Open  (was: Triage Needed)

> PGBKCVOperation does not respect timestamp combiners
> 
>
> Key: BEAM-8999
> URL: https://issues.apache.org/jira/browse/BEAM-8999
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>





[jira] [Assigned] (BEAM-8999) PGBKCVOperation does not respect timestamp combiners

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-8999:
--

Assignee: Robert Bradshaw

> PGBKCVOperation does not respect timestamp combiners
> 
>
> Key: BEAM-8999
> URL: https://issues.apache.org/jira/browse/BEAM-8999
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>





[jira] [Updated] (BEAM-8234) Java dependencies page outdated

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8234:
---
Component/s: (was: beam-community)
 website

> Java dependencies page outdated
> ---
>
> Key: BEAM-8234
> URL: https://issues.apache.org/jira/browse/BEAM-8234
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Neville Li
>Assignee: Melissa Pashniak
>Priority: Trivial
>
> https://github.com/apache/beam/blob/master/website/src/documentation/sdks/java-dependencies.md
> The latest release as of today is 2.15.0, while the latest release documented 
> on that page is 2.9.0.





[jira] [Assigned] (BEAM-8647) Remove .mailmap from the sources

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-8647:
--

Assignee: (was: Aizhamal Nurmamat kyzy)

> Remove .mailmap from the sources
> 
>
> Key: BEAM-8647
> URL: https://issues.apache.org/jira/browse/BEAM-8647
> Project: Beam
>  Issue Type: Task
>  Components: beam-community
>Reporter: Romain Manni-Bucau
>Priority: Major
>
> Hi,
>  
> .mailmap contains individuals' data that is considered "personal" (name, 
> email, etc.).
> AFAIK Apache/Beam is not allowed to process it directly, in particular for EU 
> citizens (_GDPR_).
> Can the file be removed, since it is not used by the Beam project (at least 
> the apache/beam repo)?
>  





[jira] [Updated] (BEAM-8647) Remove .mailmap from the sources

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8647:
---
Component/s: (was: beam-community)
 build-system

> Remove .mailmap from the sources
> 
>
> Key: BEAM-8647
> URL: https://issues.apache.org/jira/browse/BEAM-8647
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Romain Manni-Bucau
>Priority: Major
>





[jira] [Updated] (BEAM-9007) beam.DoFn setup() will call several times when using python subprocess

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9007:
---
Component/s: (was: examples-python)
 (was: beam-community)
 sdk-py-core

> beam.DoFn setup() will call several times when using python subprocess
> --
>
> Key: BEAM-9007
> URL: https://issues.apache.org/jira/browse/BEAM-9007
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.16.0
> Environment: python 3.5
> apache-beam[gcp] == 2.16.*
> google-cloud-storage == 1.23.*
> google-resumable-media == 0.5.*
> googleapis-common-protos == 1.6.*
> grpc-google-logging-v2 == 0.11.*
>Reporter: Hokuto Tateyama
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Minor
>
> Hello. 
>  I'm trying to run a make command on Dataflow to build OpenCV from C++ 
> source.
> I assumed the *setup()* function of *beam.DoFn* would run only once, 
> before processing starts.
>  So I ran the build commands in the setup() function, and they completed 
> successfully.
> h1. Problem
> Once processing is running, the setup() function runs again and retries the 
> build commands several times. I checked this in my Stackdriver logs.
> h1. Codes
> This is my Dataflow code. I defined the command_list in a class 
> that inherits from beam.DoFn and call run_cmd() from setup().
> * Running the command lines:
> {code:python}
> def run_cmd(command_list: List[List[str]], shell: bool = False) -> List[Dict[str, Any]]:
>     outputs = []
>     try:
>         for cmd in command_list:
>             logging.info(cmd)
>             proc = subprocess.check_output(
>                 cmd, shell=shell, stderr=subprocess.STDOUT,
>                 universal_newlines=True)
>             outputs.append({"Input: ": cmd, "Output: ": proc})
>     except subprocess.CalledProcessError as e:
>         logging.warning("Return code:{}, Output:{}".format(e.returncode, e.output))
>     return outputs
> {code}
> * Command list passed to the run_cmd() function:
> {code:python}
> command_list = [
>     ["cat /etc/issue"],
>     ["apt-get --assume-yes update"],
>     ["apt-get --assume-yes install --no-install-recommends ffmpeg git software-properties-common"],
>     ["apt-get install -y software-properties-common"],
>     ['add-apt-repository -s "deb http://security.ubuntu.com/ubuntu bionic-security main"'],
>     ["apt-get install -y build-essential checkinstall cmake unzip pkg-config yasm unzip"],
>     ["apt-get -y install git gfortran python3-dev"],
>     ["apt-get -y install libjpeg62-turbo-dev libpng-dev libpng16-16 libavcodec-dev libavformat-dev libswscale-dev libdc1394-22-dev libxine2-dev libv4l-dev"],
>     ["apt-get -y install libjpeg-dev libpng-dev libtiff-dev libtbb-dev"],
>     ["apt-get -y install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev libatlas-base-dev libxvidcore-dev libx264-dev libgtk-3-dev"],
>     ["apt-get clean"],
>     ["rm -rf /var/lib/apt/lists/*"],
>     ["git clone https://github.com/opencv/opencv.git"],
>     ["git clone https://github.com/opencv/opencv_contrib.git"],
>     ["cd opencv_contrib"],
>     ["git checkout -b 3.4.3 refs/tags/3.4.3"],
>     ["cd ../opencv/"],
>     ["git checkout -b 3.4.3 refs/tags/3.4.3"],
>     ["mkdir build"],
>     ["cd build"],
>     ["cmake -D CMAKE_BUILD_TYPE=Release \
>         -D CMAKE_INSTALL_PREFIX=/usr/local \
>         -D WITH_TBB=ON \
>         -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules .."],
>     ["make -j8"],
>     ["make install"],
>     ["echo /usr/local/lib > /etc/ld.so.conf.d/opencv.conf"],
>     ["ldconfig -v"]
> ]
> {code}
> h1. Question
> To summarize, I'm wondering whether this is a bug in Apache Beam.
>  # Why is setup() called several times?
>  # Is there any way to run these commands only once for the whole 
> job? These are the approaches I tried:
>  ## Using os.system() instead of subprocess. I think subprocess creates 
> another process in setup(), so it cannot tell whether the process finished 
> successfully.
>  ## Writing the commands in setup.py and using them as a CustomCommand:
>  [https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/]
>  
> Regards, Collonville





[jira] [Assigned] (BEAM-9016) Select PTransform result order is not predictable

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-9016:
--

Assignee: (was: Aizhamal Nurmamat kyzy)

> Select PTransform result order is not predictable
> -
>
> Key: BEAM-9016
> URL: https://issues.apache.org/jira/browse/BEAM-9016
> Project: Beam
>  Issue Type: Bug
>  Components: beam-community
>Reporter: Yang Zhang
>Priority: Major
>
> pipeline.apply(Select.fieldNames("x", "y"))
> pipeline.apply(Select.fieldNames("a", "b"))
> The order of the returned fields is not predictable. In the two examples above, 
> field `x` may be returned first, while field `a` (also requested first) 
> may be returned second.
>  
> Shall we add `withOrderByFieldInsertionOrder` to the fieldAccessDescriptor in 
> the Select PTransform, so that the return order is predictable?





[jira] [Assigned] (BEAM-9007) beam.DoFn setup() will call several times when using python subprocess

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-9007:
--

Assignee: (was: Aizhamal Nurmamat kyzy)

> beam.DoFn setup() will call several times when using python subprocess
> --
>
> Key: BEAM-9007
> URL: https://issues.apache.org/jira/browse/BEAM-9007
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.16.0
> Environment: python 3.5
> apache-beam[gcp] == 2.16.*
> google-cloud-storage == 1.23.*
> google-resumable-media == 0.5.*
> googleapis-common-protos == 1.6.*
> grpc-google-logging-v2 == 0.11.*
>Reporter: Hokuto Tateyama
>Priority: Minor
>





[jira] [Updated] (BEAM-9016) Select transform (Schema-based) result order is not predictable

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9016:
---
Summary: Select transform (Schema-based) result order is not predictable  
(was: Select PTransform result order is not predictable)

> Select transform (Schema-based) result order is not predictable
> ---
>
> Key: BEAM-9016
> URL: https://issues.apache.org/jira/browse/BEAM-9016
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Yang Zhang
>Priority: Major
>





[jira] [Updated] (BEAM-9016) Select PTransform result order is not predictable

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9016:
---
Component/s: (was: beam-community)
 sdk-java-core

> Select PTransform result order is not predictable
> -
>
> Key: BEAM-9016
> URL: https://issues.apache.org/jira/browse/BEAM-9016
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Yang Zhang
>Priority: Major
>





[jira] [Assigned] (BEAM-9020) LengthPrefixUnknownCodersTest to avoid relying on AbstractMap's equality

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-9020:
--

Assignee: Luke Cwik

> LengthPrefixUnknownCodersTest to avoid relying on AbstractMap's equality
> 
>
> Key: BEAM-9020
> URL: https://issues.apache.org/jira/browse/BEAM-9020
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Tomo Suzuki
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> While working on BEAM-8695, LengthPrefixUnknownCodersTest failed when trying 
> to upgrade google-http-client to v1.34.0, because LengthPrefixUnknownCodersTest 
> relies on equality between a {{CloudObject}} and a Map.
> Class hierarchy:
> {noformat}
> CloudObject < GenericJson < GenericData < AbstractMap
> {noformat}
> This worked as long as CloudObject inherited AbstractMap's equality: 
> {{GenericData}} did not override the equals method in google-http-client 
> v1.28.0 and earlier, so the comparison checked only the keys and values of 
> the Map.
> {code:java}
> assertEquals(
>     CloudObjects.asCloudObject(prefixedWindowedValueCoder, null), // a CloudObject
>     lengthPrefixedCoderCloudObject);                              // a Map
> {code}
> However, with google-http-client v1.29.0 or higher, GenericData has its own 
> {{equals}} method 
> ([PR#589|https://github.com/googleapis/google-http-java-client/pull/589])  
> that checks {{classInfo}} and thus the comparisons between a CloudObject and 
> a Map always fail.
> Test failures when I tried to upgrade google-http-client 1.34.0 
> ([Jenkins|https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink]):
> {noformat}
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixUnknownCoders
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixForInstructionOutputNodeWithGrpcNodeSuccessor
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixForLengthPrefixCoder
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixForSideInputInfos
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixParDoInstructionCoder
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixInstructionOutputCoder
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixWriteInstructionCoder
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixAndReplaceUnknownCoder
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixAndReplaceForRunnerNetwork
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixForInstructionOutputNodeWithGrpcNodePredecessor
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixReadInstructionCoder
> {noformat}





[jira] [Resolved] (BEAM-9020) LengthPrefixUnknownCodersTest to avoid relying on AbstractMap's equality

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-9020.

Fix Version/s: 2.19.0
   Resolution: Fixed

> LengthPrefixUnknownCodersTest to avoid relying on AbstractMap's equality
> 
>
> Key: BEAM-9020
> URL: https://issues.apache.org/jira/browse/BEAM-9020
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Tomo Suzuki
>Assignee: Luke Cwik
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In an attempt for BEAM-8695 LengthPrefixUnknownCodersTest failed when trying 
> to upgrade google-http-client v1.34.0, because LengthPrefixUnknownCodersTest 
> relies on the equality of {{CloudObject}} with Map.
> Class hierarchy:
> {noformat}
> CloudObject < GenericJson < GenericData < AbstractMap
> {noformat}
> It was working fine as long as CloudObject's equality inherits 
> AbstractMap.equality. {{GenericData}} did not override equals method in 
> google-http-client v1.28.0 and earlier. The comparison was checking only key 
> and value of a Map.
> {code:java}
> assertEquals(
> CloudObjects.asCloudObject(prefixedWindowedValueCoder, null),   // This 
> is a CloudObject
> lengthPrefixedCoderCloudObject); // This is a 
> Map
> {code}
> However, with google-http-client v1.29.0 or higher, GenericData has its own 
> {{equals}} method 
> ([PR#589|https://github.com/googleapis/google-http-java-client/pull/589])  
> that checks {{classInfo}} and thus the comparisons between a CloudObject and 
> a Map always fail.
> Test failures when I tried to upgrade google-http-client 1.34.0 
> ([Jenkins|https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink]):
> {noformat}
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixUnknownCoders
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixForInstructionOutputNodeWithGrpcNodeSuccessor
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixForLengthPrefixCoder
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixForSideInputInfos
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixParDoInstructionCoder
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixInstructionOutputCoder
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixWriteInstructionCoder
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixAndReplaceUnknownCoder
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixAndReplaceForRunnerNetwork
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixForInstructionOutputNodeWithGrpcNodePredecessor
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixReadInstructionCoder
> {noformat}
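The equality break described above can be reproduced in isolation. The sketch below is hypothetical (a stand-in subclass, not the real GenericData or CloudObject), but it shows why a Map subclass that tightens equals() stops matching a plain HashMap, and the copy-to-HashMap workaround a test can use:

```java
import java.util.HashMap;
import java.util.Map;

public class MapEqualityDemo {
  // Hypothetical stand-in for GenericData >= 1.29.0: a Map subclass whose
  // equals() also requires the other side to be the same subclass, similar
  // in spirit to the classInfo check added in PR#589.
  static class ClassCheckingMap extends HashMap<String, Object> {
    @Override
    public boolean equals(Object o) {
      return o instanceof ClassCheckingMap && super.equals(o);
    }
  }

  public static void main(String[] args) {
    Map<String, Object> plain = new HashMap<>();
    plain.put("@type", "kind:length_prefix");

    ClassCheckingMap cloudObjectLike = new ClassCheckingMap();
    cloudObjectLike.put("@type", "kind:length_prefix");

    // Under AbstractMap.equals (the pre-1.29.0 behavior) these would compare
    // equal; with the subclass override the comparison fails.
    System.out.println(cloudObjectLike.equals(plain)); // false

    // Workaround for tests: compare entries via plain HashMap copies.
    System.out.println(new HashMap<>(cloudObjectLike).equals(plain)); // true
  }
}
```

This also illustrates why such tests were fragile to begin with: the comparison was only ever symmetric by accident of AbstractMap's entry-based equals.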





[jira] [Updated] (BEAM-9020) LengthPrefixUnknownCodersTest to avoid relying on AbstractMap's equality

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9020:
---
Status: Open  (was: Triage Needed)

> LengthPrefixUnknownCodersTest to avoid relying on AbstractMap's equality
> 
>
> Key: BEAM-9020
> URL: https://issues.apache.org/jira/browse/BEAM-9020
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Tomo Suzuki
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> While working on BEAM-8695, LengthPrefixUnknownCodersTest failed when trying 
> to upgrade to google-http-client v1.34.0, because the test 
> relies on equality between {{CloudObject}} and a plain Map.
> Class hierarchy:
> {noformat}
> CloudObject < GenericJson < GenericData < AbstractMap
> {noformat}
> It worked as long as CloudObject's equality was inherited from 
> AbstractMap: {{GenericData}} did not override the equals method in 
> google-http-client v1.28.0 and earlier, so the comparison checked only the 
> keys and values of a Map.
> {code:java}
> assertEquals(
> CloudObjects.asCloudObject(prefixedWindowedValueCoder, null),   // This 
> is a CloudObject
> lengthPrefixedCoderCloudObject); // This is a 
> Map
> {code}
> However, with google-http-client v1.29.0 or higher, GenericData has its own 
> {{equals}} method 
> ([PR#589|https://github.com/googleapis/google-http-java-client/pull/589])  
> that checks {{classInfo}} and thus the comparisons between a CloudObject and 
> a Map always fail.
> Test failures when I tried to upgrade google-http-client 1.34.0 
> ([Jenkins|https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink]):
> {noformat}
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixUnknownCoders
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixForInstructionOutputNodeWithGrpcNodeSuccessor
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixForLengthPrefixCoder
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixForSideInputInfos
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixParDoInstructionCoder
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixInstructionOutputCoder
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixWriteInstructionCoder
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixAndReplaceUnknownCoder
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixAndReplaceForRunnerNetwork
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixForInstructionOutputNodeWithGrpcNodePredecessor
> org.apache.beam.runners.dataflow.worker.graph.LengthPrefixUnknownCodersTest.testLengthPrefixReadInstructionCoder
> {noformat}





[jira] [Updated] (BEAM-9004) Update Mockito Matchers usage to ArgumentMatchers since Matchers is deprecated in Mockito 2

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9004:
---
Status: Open  (was: Triage Needed)

> Update Mockito Matchers usage to ArgumentMatchers since Matchers is 
> deprecated in Mockito 2
> ---
>
> Key: BEAM-9004
> URL: https://issues.apache.org/jira/browse/BEAM-9004
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, testing
>Reporter: Luke Cwik
>Priority: Trivial
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-9003) test_reshuffle_preserves_timestamps (apache_beam.transforms.util_test.ReshuffleTest) does not work in Streaming VR suite on Dataflow

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9003:
---
Status: Open  (was: Triage Needed)

> test_reshuffle_preserves_timestamps 
> (apache_beam.transforms.util_test.ReshuffleTest) does not work in Streaming 
> VR suite on Dataflow
> 
>
> Key: BEAM-9003
> URL: https://issues.apache.org/jira/browse/BEAM-9003
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: wendy liu
>Priority: Major
>
> Per investigation in https://issues.apache.org/jira/browse/BEAM-8877, the 
> test times out and was recently added to VR test suite.
> [~liumomo315], I will sickbay this test for streaming, could you please help 
> triage the failure?
> Thank you!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9004) Update Mockito Matchers usage to ArgumentMatchers since Matchers is deprecated in Mockito 2

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-9004.

Fix Version/s: 2.19.0
   Resolution: Fixed

> Update Mockito Matchers usage to ArgumentMatchers since Matchers is 
> deprecated in Mockito 2
> ---
>
> Key: BEAM-9004
> URL: https://issues.apache.org/jira/browse/BEAM-9004
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, testing
>Reporter: Luke Cwik
>Priority: Trivial
> Fix For: 2.19.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-9002) test_flatten_same_pcollections (apache_beam.transforms.ptransform_test.PTransformTest) does not work in Streaming VR suite on Dataflow

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9002:
---
Status: Open  (was: Triage Needed)

> test_flatten_same_pcollections 
> (apache_beam.transforms.ptransform_test.PTransformTest) does not work in 
> Streaming VR suite on Dataflow
> --
>
> Key: BEAM-9002
> URL: https://issues.apache.org/jira/browse/BEAM-9002
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: wendy liu
>Priority: Major
>
> Per investigation in https://issues.apache.org/jira/browse/BEAM-8877, the 
> test times out and was recently added to VR test suite.
> [~liumomo315], I will sickbay this test for streaming, could you please help 
> triage the failure?
> Thank you!





[jira] [Updated] (BEAM-8941) Create a common place for Load Tests configuration

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8941:
---
Status: Open  (was: Triage Needed)

> Create a common place for Load Tests configuration
> --
>
> Key: BEAM-8941
> URL: https://issues.apache.org/jira/browse/BEAM-8941
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Pawel Pasterz
>Priority: Minor
> Fix For: Not applicable
>
>
> The Apache Beam community maintains different versions of each Load Test. For 
> example, right now, there are two versions of all Python Load Tests: the 
> first one runs on Dataflow runner, and the second one runs on Flink. With the 
> lack of a common place where configuration for the tests can be stored, the 
> configuration is duplicated many times with minimal differences.
> The goal is to create a common place for the configuration, so that it could 
> be passed to different files with tests (.test-infra/jenkins/*.groovy) and 
> filtered according to needs.





[jira] [Updated] (BEAM-8973) Python PreCommit occasionally timeouts

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8973:
---
Summary: Python PreCommit occasionally timeouts  (was: Python PreCommit 
occasionally timesout )

> Python PreCommit occasionally timeouts
> --
>
> Key: BEAM-8973
> URL: https://issues.apache.org/jira/browse/BEAM-8973
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> Sample time outs in Cron jobs (~1 out of 10 jobs):
> [https://builds.apache.org/job/beam_PreCommit_Python_Cron/2157/]
> [https://builds.apache.org/job/beam_PreCommit_Python_Cron/2146/]
> In jobs triggered on PRs the error also happened more frequently, example: 
> [https://builds.apache.org/job/beam_PreCommit_Python_Commit/10373/]
>  
>  





[jira] [Updated] (BEAM-7405) Task :sdks:python:hdfsIntegrationTest is failing in Python PostCommits - docker-credential-gcloud not installed

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7405:
---
Status: Open  (was: Triage Needed)

> Task :sdks:python:hdfsIntegrationTest is failing in Python PostCommits - 
> docker-credential-gcloud not installed
> ---
>
> Key: BEAM-7405
> URL: https://issues.apache.org/jira/browse/BEAM-7405
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Yifan Zou
>Priority: Major
> Fix For: 2.14.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> This failure happened on apache-beam-jenkins-14.
> {noformat}
> 18:47:03 > Task :sdks:python:hdfsIntegrationTest
> 18:47:03 ++ dirname 
> ./apache_beam/io/hdfs_integration_test/hdfs_integration_test.sh
> 18:47:03 + TEST_DIR=./apache_beam/io/hdfs_integration_test
> 18:47:03 + ROOT_DIR=./apache_beam/io/hdfs_integration_test/../../../../..
> 18:47:03 + 
> CONTEXT_DIR=./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration
> 18:47:03 + rm -r 
> ./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration
> 18:47:03 rm: cannot remove 
> './apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration':
>  No such file or directory
> 18:47:03 + true
> 18:47:03 + mkdir -p 
> ./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration/sdks
> 18:47:03 + cp ./apache_beam/io/hdfs_integration_test/docker-compose.yml 
> ./apache_beam/io/hdfs_integration_test/Dockerfile 
> ./apache_beam/io/hdfs_integration_test/hdfscli.cfg 
> ./apache_beam/io/hdfs_integration_test/hdfs_integration_test.sh 
> ./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration/
> 18:47:03 + cp -r 
> ./apache_beam/io/hdfs_integration_test/../../../../../sdks/python 
> ./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration/sdks/
> 18:47:03 + cp -r ./apache_beam/io/hdfs_integration_test/../../../../../model 
> ./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration/
> 18:47:03 ++ echo hdfs_IT-jenkins-beam_PostCommit_Python_Verify_PR-714
> 18:47:03 + PROJECT_NAME=hdfs_IT-jenkins-beam_PostCommit_Python_Verify_PR-714
> 18:47:03 + '[' -z jenkins-beam_PostCommit_Python_Verify_PR-714 ']'
> 18:47:03 + COLOR_OPT=--no-ansi
> 18:47:03 + COMPOSE_OPT='-p 
> hdfs_IT-jenkins-beam_PostCommit_Python_Verify_PR-714 --no-ansi'
> 18:47:03 + cd 
> ./apache_beam/io/hdfs_integration_test/../../../../../build/hdfs_integration
> 18:47:03 + docker network prune --force
> 18:47:03 + trap finally EXIT
> 18:47:03 + docker-compose -p 
> hdfs_IT-jenkins-beam_PostCommit_Python_Verify_PR-714 --no-ansi build
> 18:47:03 namenode uses an image, skipping
> 18:47:03 datanode uses an image, skipping
> 18:47:03 Building test
> 18:47:03 [29234] Failed to execute script docker-compose
> 18:47:03 Traceback (most recent call last):
> 18:47:03   File "bin/docker-compose", line 6, in <module>
> 18:47:03   File "compose/cli/main.py", line 71, in main
> 18:47:03   File "compose/cli/main.py", line 127, in perform_command
> 18:47:03   File "compose/cli/main.py", line 287, in build
> 18:47:03   File "compose/project.py", line 386, in build
> 18:47:03   File "compose/project.py", line 368, in build_service
> 18:47:03   File "compose/service.py", line 1084, in build
> 18:47:03   File "site-packages/docker/api/build.py", line 260, in build
> 18:47:03   File "site-packages/docker/api/build.py", line 307, in 
> _set_auth_headers
> 18:47:03   File "site-packages/docker/auth.py", line 310, in 
> get_all_credentials
> 18:47:03   File "site-packages/docker/auth.py", line 262, in 
> _resolve_authconfig_credstore
> 18:47:03   File "site-packages/docker/auth.py", line 287, in 
> _get_store_instance
> 18:47:03   File "site-packages/dockerpycreds/store.py", line 25, in __init__
> 18:47:03 dockerpycreds.errors.InitializationError: docker-credential-gcloud 
> not installed or not available in PATH
> {noformat}





[jira] [Updated] (BEAM-8982) Testing board for Beam (ignore issue)

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8982:
---
Status: Open  (was: Triage Needed)

> Testing board for Beam (ignore issue)
> -
>
> Key: BEAM-8982
> URL: https://issues.apache.org/jira/browse/BEAM-8982
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Pablo Estrada
>Priority: Major
>






[jira] [Updated] (BEAM-8580) Request Python API to support windows ClosingBehavior

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8580:
---
Status: Open  (was: Triage Needed)

> Request Python API to support windows ClosingBehavior
> -
>
> Key: BEAM-8580
> URL: https://issues.apache.org/jira/browse/BEAM-8580
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: wendy liu
>Assignee: Yichi Zhang
>Priority: Major
>
> Beam Python should have an API to support windows ClosingBehavior.





[jira] [Updated] (BEAM-8916) external_test_it.py is not collected by pytest

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8916:
---
Status: Open  (was: Triage Needed)

> external_test_it.py is not collected by pytest
> --
>
> Key: BEAM-8916
> URL: https://issues.apache.org/jira/browse/BEAM-8916
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: Udi Meiri
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Critical
>
> pytest only collects tests matching these patterns:
> https://github.com/apache/beam/blob/8066d78f0fd2237b718859d4a776511203880df0/sdks/python/pytest.ini#L27
> Please rename the file. (ex: external_integration_test.py)





[jira] [Updated] (BEAM-9041) SchemaCoder equals should not rely on fromRow/toRow equality

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9041:
---
Issue Type: Bug  (was: Improvement)

> SchemaCoder equals should not rely on fromRow/toRow equality
> 
>
> Key: BEAM-9041
> URL: https://issues.apache.org/jira/browse/BEAM-9041
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.18.0
>
>
> SchemaCoder equals implementation relies on SerializableFunction equals 
> method, this is error-prone because users rarely implement the equals method 
> for a SerializableFunction. One alternative would be to rely on bytes 
> equality for this.
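The "bytes equality" alternative mentioned above can be sketched as follows. The helper names are hypothetical, not Beam's SchemaCoder code; note that byte equality is a conservative check (equal bytes imply equal functions, but two equivalent functions may still serialize to different bytes):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class BytesEqualityDemo {
  // Serialize a value to its Java-serialization byte representation.
  static byte[] toBytes(Serializable s) {
    try (ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
      try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
        oos.writeObject(s);
      }
      return bos.toByteArray();
    } catch (IOException e) {
      throw new IllegalStateException("value is not serializable", e);
    }
  }

  // Treat two Serializable objects (e.g. SerializableFunction instances)
  // as equal when their serialized forms match, instead of relying on a
  // user-provided equals() that is rarely implemented.
  static boolean bytesEqual(Serializable a, Serializable b) {
    return Arrays.equals(toBytes(a), toBytes(b));
  }
}
```

This trades occasional false negatives for never reporting a false positive, which is usually the safer direction for coder equality.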





[jira] [Created] (BEAM-9042) AvroUtils.schemaCoder(schema) produces a not serializable SchemaCoder

2019-12-31 Thread Jira
Ismaël Mejía created BEAM-9042:
--

 Summary: AvroUtils.schemaCoder(schema) produces a not serializable 
SchemaCoder
 Key: BEAM-9042
 URL: https://issues.apache.org/jira/browse/BEAM-9042
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Affects Versions: 2.18.0
Reporter: Ismaël Mejía
Assignee: Ismaël Mejía


After a recent change to the implementation of AvroUtils.schemaCoder(schema), 
the produced SchemaCoder is not serializable.

You can reproduce this as follows:
{code:java}
final SchemaCoder avroSchemaCoder = 
AvroUtils.schemaCoder(schema);
 CoderProperties.coderSerializable(avroSchemaCoder);{code}
It produces this exception:
{code:java}
unable to serialize SchemaCoder

[jira] [Updated] (BEAM-9042) AvroUtils.schemaCoder(schema) produces a not serializable SchemaCoder

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9042:
---
Status: Open  (was: Triage Needed)

> AvroUtils.schemaCoder(schema) produces a not serializable SchemaCoder
> -
>
> Key: BEAM-9042
> URL: https://issues.apache.org/jira/browse/BEAM-9042
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.18.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Major
>
> After a recent change to the implementation of 
> AvroUtils.schemaCoder(schema), the produced SchemaCoder is not serializable.
> You can reproduce this as follows:
> {code:java}
> final SchemaCoder avroSchemaCoder = 
> AvroUtils.schemaCoder(schema);
>  CoderProperties.coderSerializable(avroSchemaCoder);{code}
> it produces this exception
> {code:java}
> unable to serialize SchemaCoder Field{name=bool, description=, type=FieldType{typeName=BOOLEAN, 
> nullable=false, logicalType=null, collectionElementType=null, 
> mapKeyType=null, mapValueType=null, rowSchema=null, metadata={}}}
> Field{name=int, description=, type=FieldType{typeName=INT32, nullable=false, 
> logicalType=null, collectionElementType=null, mapKeyType=null, 
> mapValueType=null, rowSchema=null, metadata={}}}
>  UUID: 6a1ff5b7-e3be-42c3-9b36-f8b53d487fcd delegateCoder: 
> org.apache.beam.sdk.coders.Coder$ByteBuddy$LzAYzILR@5f8ca63c
> java.lang.IllegalArgumentException: unable to serialize SchemaCoder Fields:
> Field{name=bool, description=, type=FieldType{typeName=BOOLEAN, 
> nullable=false, logicalType=null, collectionElementType=null, 
> mapKeyType=null, mapValueType=null, rowSchema=null, metadata={}}}
> Field{name=int, description=, type=FieldType{typeName=INT32, nullable=false, 
> logicalType=null, collectionElementType=null, mapKeyType=null, 
> mapValueType=null, rowSchema=null, metadata={}}}
>  UUID: 6a1ff5b7-e3be-42c3-9b36-f8b53d487fcd delegateCoder: 
> org.apache.beam.sdk.coders.Coder$ByteBuddy$LzAYzILR@5f8ca63c
>  at 
> org.apache.beam.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:55)
>  at 
> org.apache.beam.sdk.util.SerializableUtils.clone(SerializableUtils.java:113)
>  at 
> org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:92)
>  at 
> org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:131)
>  at 
> org.apache.beam.sdk.testing.CoderProperties.coderSerializable(CoderProperties.java:181)
>  at 
> org.apache.beam.sdk.schemas.utils.AvroUtilsTest.testAvroSchemaCoders(AvroUtilsTest.java:543)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
>  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
>  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
>  at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
>  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
>  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>  at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>  at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>  at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>  a
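For reference, the serializability check performed by CoderProperties.coderSerializable in the reproduction above boils down to a Java-serialization round trip. A minimal sketch (not Beam's actual SerializableUtils, whose error message the quoted stack trace shows):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class RoundTripDemo {
  // Serialize the value to bytes and deserialize it back. Any
  // non-serializable field (such as the ByteBuddy-generated delegate
  // coder in the SchemaCoder above) makes writeObject throw here.
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
        oos.writeObject(value);
      }
      try (ObjectInputStream ois =
          new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
        return (T) ois.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalArgumentException("unable to serialize " + value, e);
    }
  }
}
```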

[jira] [Updated] (BEAM-9042) AvroUtils.schemaCoder(schema) produces a not serializable SchemaCoder

2019-12-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9042:
---
Fix Version/s: 2.18.0

> AvroUtils.schemaCoder(schema) produces a not serializable SchemaCoder
> -
>
> Key: BEAM-9042
> URL: https://issues.apache.org/jira/browse/BEAM-9042
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.18.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Major
> Fix For: 2.18.0
>
>
> After a recent change to the implementation of 
> AvroUtils.schemaCoder(schema), the produced SchemaCoder is not serializable.
> You can reproduce this as follows:
> {code:java}
> final SchemaCoder avroSchemaCoder = 
> AvroUtils.schemaCoder(schema);
>  CoderProperties.coderSerializable(avroSchemaCoder);{code}
> it produces this exception
> {code:java}
> unable to serialize SchemaCoder Field{name=bool, description=, type=FieldType{typeName=BOOLEAN, 
> nullable=false, logicalType=null, collectionElementType=null, 
> mapKeyType=null, mapValueType=null, rowSchema=null, metadata={}}}
> Field{name=int, description=, type=FieldType{typeName=INT32, nullable=false, 
> logicalType=null, collectionElementType=null, mapKeyType=null, 
> mapValueType=null, rowSchema=null, metadata={}}}
>  UUID: 6a1ff5b7-e3be-42c3-9b36-f8b53d487fcd delegateCoder: 
> org.apache.beam.sdk.coders.Coder$ByteBuddy$LzAYzILR@5f8ca63c
> java.lang.IllegalArgumentException: unable to serialize SchemaCoder Fields:
> Field{name=bool, description=, type=FieldType{typeName=BOOLEAN, 
> nullable=false, logicalType=null, collectionElementType=null, 
> mapKeyType=null, mapValueType=null, rowSchema=null, metadata={}}}
> Field{name=int, description=, type=FieldType{typeName=INT32, nullable=false, 
> logicalType=null, collectionElementType=null, mapKeyType=null, 
> mapValueType=null, rowSchema=null, metadata={}}}
>  UUID: 6a1ff5b7-e3be-42c3-9b36-f8b53d487fcd delegateCoder: 
> org.apache.beam.sdk.coders.Coder$ByteBuddy$LzAYzILR@5f8ca63c
>  at 
> org.apache.beam.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:55)
>  at 
> org.apache.beam.sdk.util.SerializableUtils.clone(SerializableUtils.java:113)
>  at 
> org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:92)
>  at 
> org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:131)
>  at 
> org.apache.beam.sdk.testing.CoderProperties.coderSerializable(CoderProperties.java:181)
>  at 
> org.apache.beam.sdk.schemas.utils.AvroUtilsTest.testAvroSchemaCoders(AvroUtilsTest.java:543)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
>  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
>  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
>  at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
>  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
>  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>  at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>  at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>  at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProc

[jira] [Work logged] (BEAM-9040) Add Spark Structured Streaming to Nexmark PostCommit run

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9040?focusedWorklogId=364939&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364939
 ]

ASF GitHub Bot logged work on BEAM-9040:


Author: ASF GitHub Bot
Created on: 31/Dec/19 09:45
Start Date: 31/Dec/19 09:45
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10485: [BEAM-9040] Add 
Spark Structured Streaming to Nexmark PostCommit run
URL: https://github.com/apache/beam/pull/10485#issuecomment-569898778
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 364939)
Time Spent: 0.5h  (was: 20m)

> Add Spark Structured Streaming to Nexmark PostCommit run
> 
>
> Key: BEAM-9040
> URL: https://issues.apache.org/jira/browse/BEAM-9040
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, testing-nexmark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Labels: structured-streaming
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The new Spark Structured Streaming runner is not part of our regular 
> PostCommit runs, adding it will help us track regressions as well as compare 
> its performance against the classic Spark runner.





[jira] [Work logged] (BEAM-9040) Add Spark Structured Streaming to Nexmark PostCommit run

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9040?focusedWorklogId=364938&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364938
 ]

ASF GitHub Bot logged work on BEAM-9040:


Author: ASF GitHub Bot
Created on: 31/Dec/19 09:45
Start Date: 31/Dec/19 09:45
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10485: [BEAM-9040] Add 
Spark Structured Streaming to Nexmark PostCommit run
URL: https://github.com/apache/beam/pull/10485#issuecomment-569898778
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 364938)
Time Spent: 20m  (was: 10m)

> Add Spark Structured Streaming to Nexmark PostCommit run
> 
>
> Key: BEAM-9040
> URL: https://issues.apache.org/jira/browse/BEAM-9040
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, testing-nexmark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Labels: structured-streaming
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The new Spark Structured Streaming runner is not part of our regular 
> PostCommit runs, adding it will help us track regressions as well as compare 
> its performance against the classic Spark runner.





[jira] [Work logged] (BEAM-7949) Add time-based cache threshold support in the data service of the Python SDK harness

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7949?focusedWorklogId=364948&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364948
 ]

ASF GitHub Bot logged work on BEAM-7949:


Author: ASF GitHub Bot
Created on: 31/Dec/19 10:14
Start Date: 31/Dec/19 10:14
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on pull request #10246: 
[BEAM-7949] Add time-based cache threshold support in the data service of the 
Python SDK harness
URL: https://github.com/apache/beam/pull/10246#discussion_r362187710
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/sdk_worker_main.py
 ##
 @@ -200,6 +203,29 @@ def _get_state_cache_size(pipeline_options):
   return 0
 
 
+def _get_data_buffer_time_limit_ms(pipeline_options):
+  """Defines the time limit of the outbound data buffering.
+
+  Note: data_buffer_time_limit_ms is an experimental flag and might
+  not be available in future releases.
+
+  Returns:
+    an int indicating the time limit in milliseconds of the outbound
+    data buffering. Default is 0 (disabled)
+  """
+  experiments = pipeline_options.view_as(DebugOptions).experiments
+  experiments = experiments if experiments else []
+
+  for experiment in experiments:
+    # There should only be 1 match so returning from the loop
+    if re.match(r'data_buffer_time_limit_ms=', experiment):
+      return int(
+          re.match(
+              r'data_buffer_time_limit_ms=(?P<data_buffer_time_limit_ms>.*)',
 
 Review comment:
   I also thought about this question when preparing this PR. The reason I 
have not done that in this PR is that I found most config keys in the 
Java SDK harness start with "beam_fn_api_", and that's not the case for the 
config keys in the Python SDK harness.
I'm fine with unifying the config key if you don't think that's a problem. 
   What's your thought? @lukecwik @mxm 
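The pattern under review—scanning the pipeline's experiment strings for a `flag=value` entry—can be sketched generically. This is an illustrative helper under stated assumptions: the function name `get_experiment_int` and the sample flag list are hypothetical, not Beam code.

```python
import re

def get_experiment_int(experiments, flag, default=0):
    """Parse 'flag=value' from a list of experiment strings.

    Returns the integer value of the first matching entry, or `default`
    when the flag is absent.
    """
    for experiment in experiments:
        # re.match anchors at the start, so 'other_flag=...' entries are skipped.
        match = re.match(r'%s=(?P<value>.*)' % re.escape(flag), experiment)
        if match:
            return int(match.group('value'))
    return default

# Hypothetical experiments list, as passed via --experiments=...
experiments = ['state_cache_size=100', 'data_buffer_time_limit_ms=500']
value = get_experiment_int(experiments, 'data_buffer_time_limit_ms')   # 500
missing = get_experiment_int(experiments, 'no_such_flag')              # 0
```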
 



Issue Time Tracking
---

Worklog Id: (was: 364948)
Time Spent: 4h  (was: 3h 50m)

> Add time-based cache threshold support in the data service of the Python SDK 
> harness
> 
>
> Key: BEAM-7949
> URL: https://issues.apache.org/jira/browse/BEAM-7949
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently only a size-based cache threshold is supported in the data service 
> of the Python SDK harness. It should also support a time-based cache 
> threshold. This is very important, especially for streaming jobs that are 
> sensitive to latency. 
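The requested behavior can be sketched as a buffer that flushes when either a byte-size limit or a time limit is exceeded. This is a toy model: `OutboundBuffer` and its parameter names are illustrative, not the harness's actual classes.

```python
import time

class OutboundBuffer:
    """Toy buffer that flushes on a size threshold or a time limit."""

    def __init__(self, flush, size_limit_bytes=100, time_limit_ms=0):
        self._flush = flush                  # callback receiving buffered items
        self._size_limit = size_limit_bytes
        self._time_limit_s = time_limit_ms / 1000.0
        self._items = []
        self._bytes = 0
        self._last_flush = time.monotonic()

    def append(self, data):
        self._items.append(data)
        self._bytes += len(data)
        size_exceeded = self._bytes >= self._size_limit
        # A time limit of 0 disables the time-based threshold.
        time_exceeded = (self._time_limit_s > 0 and
                         time.monotonic() - self._last_flush >= self._time_limit_s)
        if size_exceeded or time_exceeded:
            self.flush()

    def flush(self):
        if self._items:
            self._flush(list(self._items))
        self._items, self._bytes = [], 0
        self._last_flush = time.monotonic()

flushed = []
buf = OutboundBuffer(flushed.append, size_limit_bytes=10, time_limit_ms=500)
buf.append(b'hello')     # 5 bytes: below both thresholds, stays buffered
buf.append(b'world!')    # 11 bytes total: size threshold crossed, flushes both
time.sleep(0.6)
buf.append(b'x')         # over 500 ms since last flush: time threshold flushes it
```

For a latency-sensitive streaming job, the time-based path bounds how long a small trickle of elements can sit unsent, which the size-based threshold alone cannot guarantee.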





[jira] [Work logged] (BEAM-6857) Support dynamic timers

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6857?focusedWorklogId=364961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364961
 ]

ASF GitHub Bot logged work on BEAM-6857:


Author: ASF GitHub Bot
Created on: 31/Dec/19 11:10
Start Date: 31/Dec/19 11:10
Worklog Time Spent: 10m 
  Work Description: rehmanmuradali commented on issue #10316: [BEAM-6857] 
Support Dynamic Timers
URL: https://github.com/apache/beam/pull/10316#issuecomment-569911344
 
 
   @reuvenlax , rebasing complete
 



Issue Time Tracking
---

Worklog Id: (was: 364961)
Time Spent: 3.5h  (was: 3h 20m)

> Support dynamic timers
> --
>
> Key: BEAM-6857
> URL: https://issues.apache.org/jira/browse/BEAM-6857
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The Beam timers API currently requires each timer to be statically specified 
> in the DoFn. The user must provide a separate callback method per timer. For 
> example:
>  
> {code:java}
> DoFn()
> {   
>   @TimerId("timer1") 
>   private final TimerSpec timer1 = TimerSpecs.timer(...);   
>   @TimerId("timer2") 
>   private final TimerSpec timer2 = TimerSpecs.timer(...);                 
>   // ... set timers in processElement ...
>   @OnTimer("timer1") 
>   public void onTimer1() { ... }
>   @OnTimer("timer2") 
>   public void onTimer2() { ... }
> }
> {code}
>  
> However there are many cases where the user does not know the set of timers 
> statically when writing their code. This happens when the timer tag should be 
> based on the data. It also happens when writing a DSL on top of Beam, where 
> the DSL author has to create DoFns but does not know statically which timers 
> their users will want to set (e.g. Scio).
>  
> The goal is to support dynamic timers. Something like the following:
>  
> {code:java}
> DoFn() 
> {
>   @TimerId("timer") 
>   private final TimerSpec timer1 = TimerSpecs.dynamicTimer(...);
>   @ProcessElement process(@TimerId("timer") DynamicTimer timer)
>   {
>     timer.set("tag1", ts);
>     timer.set("tag2", ts);
>   }
>   @OnTimer("timer") 
>   public void onTimer1(@TimerTag String tag) { ... }
> }
> {code}
>  
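The proposed API above—one timer spec serving many data-driven tags—can be modeled with a plain map from tag to firing time. This is a toy sketch of the dispatch idea, not Beam's actual API:

```python
class DynamicTimer:
    """Toy model: one timer family whose instances are keyed by a string tag."""

    def __init__(self):
        self._deadlines = {}          # tag -> firing timestamp

    def set(self, tag, timestamp):
        # Setting an existing tag overwrites its deadline, like resetting a timer.
        self._deadlines[tag] = timestamp

    def fire_due(self, watermark, on_timer):
        """Invoke on_timer(tag) for every tag whose deadline has passed."""
        due = [t for t, ts in self._deadlines.items() if ts <= watermark]
        for tag in sorted(due):
            del self._deadlines[tag]
            on_timer(tag)

fired = []
timer = DynamicTimer()
timer.set('tag1', 10)
timer.set('tag2', 25)
timer.fire_due(watermark=15, on_timer=fired.append)   # only tag1 is due
```

The single callback receiving the tag mirrors the proposed `@OnTimer("timer")` method taking a `@TimerTag String tag` parameter.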





[jira] [Work logged] (BEAM-9006) Meta space memory leak caused by the shutdown hook of ProcessManager

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9006?focusedWorklogId=364986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364986
 ]

ASF GitHub Bot logged work on BEAM-9006:


Author: ASF GitHub Bot
Created on: 31/Dec/19 13:28
Start Date: 31/Dec/19 13:28
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #10462: [BEAM-9006] Improve 
ProcessManager for shutdown hook handling.
URL: https://github.com/apache/beam/pull/10462#issuecomment-569928768
 
 
   Great find! Looks good to me. I'm on the road at the moment but I'll have 
another look later. 
 



Issue Time Tracking
---

Worklog Id: (was: 364986)
Time Spent: 40m  (was: 0.5h)

> Meta space memory leak caused by the shutdown hook of ProcessManager 
> -
>
> Key: BEAM-9006
> URL: https://issues.apache.org/jira/browse/BEAM-9006
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently the class `ProcessManager` adds a shutdown hook that stops all the 
> living processes before the JVM exits. The shutdown hook is never removed. 
> If this class is loaded by the user class loader, the hook prevents the user 
> class loader from being garbage collected, which eventually causes a meta 
> space memory leak.
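The leak chain and its remedy can be modeled with a simple registry standing in for the JVM's shutdown-hook list. Both `jvm_shutdown_hooks` and this `ProcessManager` are illustrative stand-ins, not the actual Beam implementation:

```python
jvm_shutdown_hooks = []        # stands in for the JVM's registered shutdown hooks

class ProcessManager:
    """Toy model: registers a shutdown hook on creation, removes it on close."""

    def __init__(self):
        self._processes = []
        self._hook = self._stop_all      # keep a reference so it can be removed
        jvm_shutdown_hooks.append(self._hook)

    def _stop_all(self):
        self._processes.clear()

    def close(self):
        # Removing the hook releases the reference chain
        # hook -> ProcessManager -> defining class loader,
        # so a user class loader holding this class can be garbage collected.
        jvm_shutdown_hooks.remove(self._hook)

pm = ProcessManager()
pm.close()
```

Without the `close()` step, the hook list retains the manager (and transitively its class loader) for the lifetime of the JVM, which is the leak described above.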





[jira] [Work logged] (BEAM-9030) Bump grpc to 1.26.0

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9030?focusedWorklogId=364988&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364988
 ]

ASF GitHub Bot logged work on BEAM-9030:


Author: ASF GitHub Bot
Created on: 31/Dec/19 13:44
Start Date: 31/Dec/19 13:44
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #10463: [BEAM-9030] 
Bump grpc to 1.26.0
URL: https://github.com/apache/beam/pull/10463#issuecomment-569930822
 
 
   Hi @lukecwik, regarding the linkage check, could you share the command 
you used? I cannot reproduce the result with the command `./gradlew 
-Ppublishing 
-PjavaLinkageArtifactIds=beam-sdks-java-core,beam-sdks-java-io-jdbc 
:checkJavaLinkage`.
 



Issue Time Tracking
---

Worklog Id: (was: 364988)
Time Spent: 3h 40m  (was: 3.5h)

> Bump grpc to 1.26.0
> ---
>
> Key: BEAM-9030
> URL: https://issues.apache.org/jira/browse/BEAM-9030
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution, runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> When submitting a Python word count job to a Flink session/standalone cluster 
> repeatedly, the meta space usage of the task manager of the Flink cluster 
> continuously increases (about 40MB each time). The reason is that the 
> Beam classes are loaded with the user class loader in Flink, and there are 
> problems with the implementations of `ProcessManager` (from Beam) and 
> `ThreadPoolCache` (from Netty) which may prevent the user class loader from 
> being garbage collected even after the job finishes, eventually causing the 
> meta space memory leak. You can refer to FLINK-15338[1] for more information.
> Regarding `ProcessManager`, I have created JIRA BEAM-9006[2] to track 
> it. Regarding `ThreadPoolCache`, it is a Netty problem and has been fixed 
> in NETTY#8955[3]. Netty 4.1.35.Final already includes this fix and gRPC 
> 1.22.0 already depends on Netty 4.1.35.Final. So we need to bump the 
> version of gRPC to 1.22.0+ (currently 1.21.0).
>  
> What do you think?
> [1] https://issues.apache.org/jira/browse/FLINK-15338
> [2] https://issues.apache.org/jira/browse/BEAM-9006
> [3] [https://github.com/netty/netty/pull/8955]
>  





[jira] [Updated] (BEAM-9037) Instant and duration as logical type

2019-12-31 Thread Alex Van Boxel (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Van Boxel updated BEAM-9037:
-
Summary: Instant and duration as logical type   (was: Promote proto logical 
type and duration to the core logical types)

> Instant and duration as logical type 
> -
>
> Key: BEAM-9037
> URL: https://issues.apache.org/jira/browse/BEAM-9037
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
>
> The proto schema includes Timestamp and Duration with nano precision. The 
> logical types should be promoted to the core logical types, so they can be 
> handled on various IO's as standard mandatory conversions.
> This means that the logical type should not use the proto-specific Timestamp 
> and Duration but rather the Java 8 Instant and Duration.
> See discussion in the design document:
> [https://docs.google.com/document/d/1uu9pJktzT_O3DxGd1-Q2op4nRk4HekIZbzi-0oTAips/edit#heading=h.9uhml95iygqr]





[jira] [Work logged] (BEAM-9037) Instant and duration as logical type

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9037?focusedWorklogId=365001&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-365001
 ]

ASF GitHub Bot logged work on BEAM-9037:


Author: ASF GitHub Bot
Created on: 31/Dec/19 14:50
Start Date: 31/Dec/19 14:50
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on pull request #10486: 
[BEAM-9037] Instant and duration as logical type
URL: https://github.com/apache/beam/pull/10486
 
 
   The proto schema includes Timestamp and Duration with nano precision.
   The logical types should be promoted to the core logical types, so they
   can be handled on various IO's as standard mandatory conversions.
   
   This means that the logical type should not use the proto-specific
   Timestamp and Duration but rather the Java 8 Instant and Duration.
   
   

   

[jira] [Resolved] (BEAM-9037) Instant and duration as logical type

2019-12-31 Thread Alex Van Boxel (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Van Boxel resolved BEAM-9037.
--
Fix Version/s: 2.19.0
   Resolution: Fixed

> Instant and duration as logical type 
> -
>
> Key: BEAM-9037
> URL: https://issues.apache.org/jira/browse/BEAM-9037
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The proto schema includes Timestamp and Duration with nano precision. The 
> logical types should be promoted to the core logical types, so they can be 
> handled on various IO's as standard mandatory conversions.
> This means that the logical type should not use the proto-specific Timestamp 
> and Duration but rather the Java 8 Instant and Duration.
> See discussion in the design document:
> [https://docs.google.com/document/d/1uu9pJktzT_O3DxGd1-Q2op4nRk4HekIZbzi-0oTAips/edit#heading=h.9uhml95iygqr]





[jira] [Work started] (BEAM-9037) Instant and duration as logical type

2019-12-31 Thread Alex Van Boxel (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9037 started by Alex Van Boxel.

> Instant and duration as logical type 
> -
>
> Key: BEAM-9037
> URL: https://issues.apache.org/jira/browse/BEAM-9037
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The proto schema includes Timestamp and Duration with nano precision. The 
> logical types should be promoted to the core logical types, so they can be 
> handled on various IO's as standard mandatory conversions.
> This means that the logical type should not use the proto-specific Timestamp 
> and Duration but rather the Java 8 Instant and Duration.
> See discussion in the design document:
> [https://docs.google.com/document/d/1uu9pJktzT_O3DxGd1-Q2op4nRk4HekIZbzi-0oTAips/edit#heading=h.9uhml95iygqr]





[jira] [Work logged] (BEAM-9040) Add Spark Structured Streaming to Nexmark PostCommit run

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9040?focusedWorklogId=365008&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-365008
 ]

ASF GitHub Bot logged work on BEAM-9040:


Author: ASF GitHub Bot
Created on: 31/Dec/19 15:26
Start Date: 31/Dec/19 15:26
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10485: [BEAM-9040] Add 
Spark Structured Streaming to Nexmark PostCommit run
URL: https://github.com/apache/beam/pull/10485#issuecomment-569945475
 
 
   Run Spark Runner Nexmark Tests
 



Issue Time Tracking
---

Worklog Id: (was: 365008)
Time Spent: 40m  (was: 0.5h)

> Add Spark Structured Streaming to Nexmark PostCommit run
> 
>
> Key: BEAM-9040
> URL: https://issues.apache.org/jira/browse/BEAM-9040
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, testing-nexmark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Labels: structured-streaming
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The new Spark Structured Streaming runner is not part of our regular 
> PostCommit runs; adding it will help us track regressions as well as compare 
> its performance against the classic Spark runner.





[jira] [Work logged] (BEAM-9040) Add Spark Structured Streaming to Nexmark PostCommit run

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9040?focusedWorklogId=365011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-365011
 ]

ASF GitHub Bot logged work on BEAM-9040:


Author: ASF GitHub Bot
Created on: 31/Dec/19 15:35
Start Date: 31/Dec/19 15:35
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10485: [BEAM-9040] Add 
Spark Structured Streaming to Nexmark PostCommit run
URL: https://github.com/apache/beam/pull/10485#issuecomment-569946626
 
 
   Run Seed Job
 



Issue Time Tracking
---

Worklog Id: (was: 365011)
Time Spent: 50m  (was: 40m)

> Add Spark Structured Streaming to Nexmark PostCommit run
> 
>
> Key: BEAM-9040
> URL: https://issues.apache.org/jira/browse/BEAM-9040
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, testing-nexmark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Labels: structured-streaming
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The new Spark Structured Streaming runner is not part of our regular 
> PostCommit runs; adding it will help us track regressions as well as compare 
> its performance against the classic Spark runner.





[jira] [Work logged] (BEAM-9040) Add Spark Structured Streaming to Nexmark PostCommit run

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9040?focusedWorklogId=365012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-365012
 ]

ASF GitHub Bot logged work on BEAM-9040:


Author: ASF GitHub Bot
Created on: 31/Dec/19 15:35
Start Date: 31/Dec/19 15:35
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10485: [BEAM-9040] Add 
Spark Structured Streaming to Nexmark PostCommit run
URL: https://github.com/apache/beam/pull/10485#issuecomment-569945475
 
 
   Run Spark Runner Nexmark Tests
 



Issue Time Tracking
---

Worklog Id: (was: 365012)
Time Spent: 1h  (was: 50m)

> Add Spark Structured Streaming to Nexmark PostCommit run
> 
>
> Key: BEAM-9040
> URL: https://issues.apache.org/jira/browse/BEAM-9040
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, testing-nexmark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Labels: structured-streaming
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The new Spark Structured Streaming runner is not part of our regular 
> PostCommit runs; adding it will help us track regressions as well as compare 
> its performance against the classic Spark runner.





[jira] [Work logged] (BEAM-9040) Add Spark Structured Streaming to Nexmark PostCommit run

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9040?focusedWorklogId=365019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-365019
 ]

ASF GitHub Bot logged work on BEAM-9040:


Author: ASF GitHub Bot
Created on: 31/Dec/19 15:49
Start Date: 31/Dec/19 15:49
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10485: [BEAM-9040] Add 
Spark Structured Streaming to Nexmark PostCommit run
URL: https://github.com/apache/beam/pull/10485#issuecomment-569948709
 
 
   Run Spark Runner Nexmark Tests
 



Issue Time Tracking
---

Worklog Id: (was: 365019)
Time Spent: 1h 10m  (was: 1h)

> Add Spark Structured Streaming to Nexmark PostCommit run
> 
>
> Key: BEAM-9040
> URL: https://issues.apache.org/jira/browse/BEAM-9040
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, testing-nexmark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Labels: structured-streaming
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The new Spark Structured Streaming runner is not part of our regular 
> PostCommit runs; adding it will help us track regressions as well as compare 
> its performance against the classic Spark runner.





[jira] [Work logged] (BEAM-9037) Instant and duration as logical type

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9037?focusedWorklogId=365024&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-365024
 ]

ASF GitHub Bot logged work on BEAM-9037:


Author: ASF GitHub Bot
Created on: 31/Dec/19 16:23
Start Date: 31/Dec/19 16:23
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on issue #10486: [BEAM-9037] 
Instant and duration as logical type
URL: https://github.com/apache/beam/pull/10486#issuecomment-569953631
 
 
   @TheNeuralBit @reuvenlax this PR promotes the Timestamp+Duration to 
NanosInstant and NanosDuration. Please see the PR for more information and 
reference to the design document.
 



Issue Time Tracking
---

Worklog Id: (was: 365024)
Time Spent: 20m  (was: 10m)

> Instant and duration as logical type 
> -
>
> Key: BEAM-9037
> URL: https://issues.apache.org/jira/browse/BEAM-9037
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The proto schema includes Timestamp and Duration with nano precision. The 
> logical types should be promoted to the core logical types, so they can be 
> handled on various IO's as standard mandatory conversions.
> This means that the logical type should not use the proto-specific Timestamp 
> and Duration but rather the Java 8 Instant and Duration.
> See discussion in the design document:
> [https://docs.google.com/document/d/1uu9pJktzT_O3DxGd1-Q2op4nRk4HekIZbzi-0oTAips/edit#heading=h.9uhml95iygqr]





[jira] [Work logged] (BEAM-6766) Sort Merge Bucket Join support in Beam

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6766?focusedWorklogId=365027&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-365027
 ]

ASF GitHub Bot logged work on BEAM-6766:


Author: ASF GitHub Bot
Created on: 31/Dec/19 16:30
Start Date: 31/Dec/19 16:30
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #8823: [BEAM-6766] 
Metadata file implementation for Sort Merge Bucket source/sink
URL: https://github.com/apache/beam/pull/8823#issuecomment-569954835
 
 
   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   
 



Issue Time Tracking
---

Worklog Id: (was: 365027)
Time Spent: 8h 20m  (was: 8h 10m)

> Sort Merge Bucket Join support in Beam
> --
>
> Key: BEAM-6766
> URL: https://issues.apache.org/jira/browse/BEAM-6766
> Project: Beam
>  Issue Type: Improvement
>  Components: extensions-java-join-library, io-ideas
>Reporter: Claire McGinty
>Assignee: Claire McGinty
>Priority: Major
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Design doc: 
> https://docs.google.com/document/d/1AQlonN8t4YJrARcWzepyP7mWHTxHAd6WIECwk1s3LQQ/edit#
> Hi! Spotify has been internally prototyping and testing an implementation of 
> the sort merge join using Beam primitives and we're interested in 
> contributing it open-source – probably to Beam's extensions package in its 
> own `smb` module or as part of the joins module?
> We've tested this with Avro files using Avro's GenericDatumWriter/Reader 
> directly (although this could theoretically be expanded to other 
> serialization formats). We'd add two transforms*, an SMB write and an SMB 
> join. 
> SMB write would take in one PCollection and a # of buckets and:
> 1) Apply a partitioning function to the input to assign each record to one 
> bucket. (the user code would have to statically specify a # of buckets... 
> hard to see a way to do this dynamically.)
> 2) Group by that bucket ID and within each bucket perform an in-memory sort 
> on join key. If the grouped records are too large to fit in memory, fall back 
> to an external sort (although if this happens, user should probably increase 
> bucket size so every group fits in memory).
> 3) Directly write the contents of bucket to a sequentially named file.
> 4) Write a metadata file to the same output path with info about hash 
> algorithm/# buckets.
> SMB join would take in the input paths for 2 or more Sources, all of which 
> are written in a bucketed and partitioned way, and :
> 1) Verify that the metadata files have compatible bucket # and hash algorithm.
> 2) Expand the input paths to enumerate the `ResourceIds` of every file in the 
> paths. Group all inputs with the same bucket ID.
> 3) Within each group, open a file reader on all `ResourceIds`. Sequentially 
> read files one record at a time, outputting tuples of all record pairs with 
> matching join key.
>  * These could be implemented either directly as `PTransforms` with the 
> writer being a `DoFn` but I semantically do like the idea of extending 
> `FileBasedSource`/`Sink` with abstract classes like 
> `SortedBucketSink`/`SortedBucketSource`... if we represent the elements in a 
> sink as KV pairs of <bucket id, sorted records>, so that the # 
> of elements in the PCollection == # of buckets == # of output files, we could 
> just implement something like `SortedBucketSink` extending `FileBasedSink` 
> with a dynamic file naming function. I'd like to be able to take advantage of 
> the existing write/read implementation logic in the `io` package as much as 
> possible although I guess some of those are package private. 
> –
> From our internal testing, we've seen some substantial performance 
> improvements using the right bucket size--not only by avoiding a shuffle 
> during the join step, but also in storage costs, since we're getting better 
> compression in Avro by storing sorted records.
> Please let us know what you think/any concerns we can address! Our 
> implementation isn't quite production-ready yet, but we'd like to start a 
> discussion about it early.
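The bucketed write and merge-join read described in the proposal can be sketched roughly as follows. This is an illustrative sketch only, not the Beam API: the class `SmbSketch` and its methods are hypothetical, each record is treated as its own join key for simplicity, and the merge-join assumes unique keys per side.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch of the sort-merge-bucket scheme; not the Beam API.
// For simplicity, each record string is treated as its own join key.
public class SmbSketch {

    // Write step 1: route a record to a bucket by hashing its join key.
    static int bucketFor(String joinKey, int numBuckets) {
        // floorMod keeps the bucket id non-negative for negative hash codes.
        return Math.floorMod(joinKey.hashCode(), numBuckets);
    }

    // Write steps 1-2: group records into buckets, then sort each bucket by
    // join key (a real implementation would spill to an external sort when a
    // bucket does not fit in memory).
    static List<List<String>> writeBuckets(List<String> records, int numBuckets) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String record : records) {
            buckets.get(bucketFor(record, numBuckets)).add(record);
        }
        for (List<String> bucket : buckets) {
            Collections.sort(bucket);
        }
        return buckets;
    }

    // Read step 3: merge-join two sorted buckets that share a bucket id,
    // emitting pairs with matching join keys (assumes unique keys per side).
    static List<String[]> mergeJoin(List<String> left, List<String> right) {
        List<String[]> joined = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp < 0) {
                i++;
            } else if (cmp > 0) {
                j++;
            } else {
                joined.add(new String[] {left.get(i), right.get(j)});
                i++;
                j++;
            }
        }
        return joined;
    }
}
```

Because both sides agree on the hash function and bucket count (verified via the metadata file), matching keys always land in buckets with the same id, so the join step needs no shuffle.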



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (BEAM-9040) Add Spark Structured Streaming to Nexmark PostCommit run

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9040?focusedWorklogId=365029&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-365029
 ]

ASF GitHub Bot logged work on BEAM-9040:


Author: ASF GitHub Bot
Created on: 31/Dec/19 16:42
Start Date: 31/Dec/19 16:42
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10485: [BEAM-9040] Add 
Spark Structured Streaming to Nexmark PostCommit run
URL: https://github.com/apache/beam/pull/10485#issuecomment-569946626
 
 
   Run Seed Job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 365029)
Time Spent: 1.5h  (was: 1h 20m)

> Add Spark Structured Streaming to Nexmark PostCommit run
> 
>
> Key: BEAM-9040
> URL: https://issues.apache.org/jira/browse/BEAM-9040
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, testing-nexmark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Labels: structured-streaming
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The new Spark Structured Streaming runner is not part of our regular 
> PostCommit runs, adding it will help us track regressions as well as compare 
> its performance against the classic Spark runner.





[jira] [Work logged] (BEAM-9040) Add Spark Structured Streaming to Nexmark PostCommit run

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9040?focusedWorklogId=365028&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-365028
 ]

ASF GitHub Bot logged work on BEAM-9040:


Author: ASF GitHub Bot
Created on: 31/Dec/19 16:42
Start Date: 31/Dec/19 16:42
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10485: [BEAM-9040] Add 
Spark Structured Streaming to Nexmark PostCommit run
URL: https://github.com/apache/beam/pull/10485#issuecomment-569956410
 
 
   Run Seed Job
 



Issue Time Tracking
---

Worklog Id: (was: 365028)
Time Spent: 1h 20m  (was: 1h 10m)

> Add Spark Structured Streaming to Nexmark PostCommit run
> 
>
> Key: BEAM-9040
> URL: https://issues.apache.org/jira/browse/BEAM-9040
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, testing-nexmark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Labels: structured-streaming
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The new Spark Structured Streaming runner is not part of our regular 
> PostCommit runs, adding it will help us track regressions as well as compare 
> its performance against the classic Spark runner.





[jira] [Work logged] (BEAM-9006) Meta space memory leak caused by the shutdown hook of ProcessManager

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9006?focusedWorklogId=365034&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-365034
 ]

ASF GitHub Bot logged work on BEAM-9006:


Author: ASF GitHub Bot
Created on: 31/Dec/19 17:12
Start Date: 31/Dec/19 17:12
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #10462: [BEAM-9006] Improve 
ProcessManager for shutdown hook handling.
URL: https://github.com/apache/beam/pull/10462#issuecomment-569960758
 
 
   CC @tweise 
 



Issue Time Tracking
---

Worklog Id: (was: 365034)
Time Spent: 50m  (was: 40m)

> Meta space memory leak caused by the shutdown hook of ProcessManager 
> -
>
> Key: BEAM-9006
> URL: https://issues.apache.org/jira/browse/BEAM-9006
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently the class `ProcessManager` adds a shutdown hook to stop all the 
> living processes before the JVM exits. The shutdown hook is never removed. 
> If this class is loaded by the user class loader, it prevents the user class 
> loader from being garbage collected, which eventually causes a metaspace 
> memory leak.
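One way to avoid pinning the class loader is to register the hook lazily and remove it once the last managed process stops, so the JVM drops its strong reference to the hook thread. The sketch below is illustrative only; `HookedManager` and its methods are hypothetical, not the actual `ProcessManager` code.

```java
// Illustrative pattern only: register the shutdown hook when the first
// process starts and remove it when the last one stops, so the hook (and
// the class loader that loaded it) can be garbage collected.
public class HookedManager {
    private Thread shutdownHook;
    private int liveProcesses = 0;

    synchronized void onProcessStarted() {
        if (liveProcesses++ == 0) {
            shutdownHook = new Thread(this::stopAll);
            Runtime.getRuntime().addShutdownHook(shutdownHook);
        }
    }

    synchronized void onProcessStopped() {
        if (--liveProcesses == 0 && shutdownHook != null) {
            // Removing the hook drops the JVM's strong reference to it.
            Runtime.getRuntime().removeShutdownHook(shutdownHook);
            shutdownHook = null;
        }
    }

    private void stopAll() {
        // Stop any remaining child processes here.
    }

    synchronized boolean hookRegistered() {
        return shutdownHook != null;
    }
}
```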





[jira] [Work logged] (BEAM-9006) Meta space memory leak caused by the shutdown hook of ProcessManager

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9006?focusedWorklogId=365035&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-365035
 ]

ASF GitHub Bot logged work on BEAM-9006:


Author: ASF GitHub Bot
Created on: 31/Dec/19 17:13
Start Date: 31/Dec/19 17:13
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #10462: [BEAM-9006] Improve 
ProcessManager for shutdown hook handling.
URL: https://github.com/apache/beam/pull/10462#issuecomment-569960802
 
 
   Could you squash the commits? 
 



Issue Time Tracking
---

Worklog Id: (was: 365035)
Time Spent: 1h  (was: 50m)

> Meta space memory leak caused by the shutdown hook of ProcessManager 
> -
>
> Key: BEAM-9006
> URL: https://issues.apache.org/jira/browse/BEAM-9006
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently the class `ProcessManager` adds a shutdown hook to stop all the 
> living processes before the JVM exits. The shutdown hook is never removed. 
> If this class is loaded by the user class loader, it prevents the user class 
> loader from being garbage collected, which eventually causes a metaspace 
> memory leak.





[jira] [Work logged] (BEAM-9040) Add Spark Structured Streaming to Nexmark PostCommit run

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9040?focusedWorklogId=365039&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-365039
 ]

ASF GitHub Bot logged work on BEAM-9040:


Author: ASF GitHub Bot
Created on: 31/Dec/19 17:23
Start Date: 31/Dec/19 17:23
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10485: [BEAM-9040] Add 
Spark Structured Streaming to Nexmark PostCommit run
URL: https://github.com/apache/beam/pull/10485#issuecomment-569962243
 
 
   Run Spark Runner Nexmark Tests
 



Issue Time Tracking
---

Worklog Id: (was: 365039)
Time Spent: 1h 40m  (was: 1.5h)

> Add Spark Structured Streaming to Nexmark PostCommit run
> 
>
> Key: BEAM-9040
> URL: https://issues.apache.org/jira/browse/BEAM-9040
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, testing-nexmark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Labels: structured-streaming
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The new Spark Structured Streaming runner is not part of our regular 
> PostCommit runs, adding it will help us track regressions as well as compare 
> its performance against the classic Spark runner.





[jira] [Work logged] (BEAM-9040) Add Spark Structured Streaming to Nexmark PostCommit run

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9040?focusedWorklogId=365040&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-365040
 ]

ASF GitHub Bot logged work on BEAM-9040:


Author: ASF GitHub Bot
Created on: 31/Dec/19 17:23
Start Date: 31/Dec/19 17:23
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10485: [BEAM-9040] Add 
Spark Structured Streaming to Nexmark PostCommit run
URL: https://github.com/apache/beam/pull/10485#issuecomment-569948709
 
 
   Run Spark Runner Nexmark Tests
 



Issue Time Tracking
---

Worklog Id: (was: 365040)
Time Spent: 1h 50m  (was: 1h 40m)

> Add Spark Structured Streaming to Nexmark PostCommit run
> 
>
> Key: BEAM-9040
> URL: https://issues.apache.org/jira/browse/BEAM-9040
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, testing-nexmark
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Labels: structured-streaming
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The new Spark Structured Streaming runner is not part of our regular 
> PostCommit runs, adding it will help us track regressions as well as compare 
> its performance against the classic Spark runner.





[jira] [Created] (BEAM-9043) BigQueryIO fails cryptically if gcpTempLocation is set and tempLocation is not

2019-12-31 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-9043:
---

 Summary: BigQueryIO fails cryptically if gcpTempLocation is set 
and tempLocation is not
 Key: BEAM-9043
 URL: https://issues.apache.org/jira/browse/BEAM-9043
 Project: Beam
  Issue Type: Bug
  Components: io-java-gcp
Reporter: Brian Hulette


The following error arises when running a pipeline that uses BigQueryIO with 
gcpTempLocation set and tempLocation not set. We should either handle this case 
gracefully, or throw a more helpful error like "please specify tempLocation".
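The "more helpful error" option could look like the following sketch. It is illustrative only: `TempLocationPrecondition` and `checkTempLocation` are hypothetical names, not the actual BigQueryIO code, and the idea is simply to validate the option before any path parsing runs.

```java
// Illustrative fail-fast check; names are hypothetical, not the actual
// BigQueryIO code.
public class TempLocationPrecondition {

    // Validate the temp location before any file-system path parsing runs,
    // so the user sees an actionable message instead of a
    // NullPointerException deep inside FileSystems.parseScheme.
    static String checkTempLocation(String tempLocation) {
        if (tempLocation == null || tempLocation.isEmpty()) {
            throw new IllegalArgumentException(
                "BigQueryIO needs a location for temporary export files; "
                    + "please specify --tempLocation (e.g. a gs:// path).");
        }
        return tempLocation;
    }
}
```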

{code:java}
2019-12-24 13:06:18 WARN  UnboundedReadFromBoundedSource:152 - Exception while 
splitting org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource@5d21202d, 
skips the initial splits.
java.lang.NullPointerException
at java.util.regex.Matcher.getTextLength(Matcher.java:1283)
at java.util.regex.Matcher.reset(Matcher.java:309)
at java.util.regex.Matcher.<init>(Matcher.java:229)
at java.util.regex.Pattern.matcher(Pattern.java:1093)
at org.apache.beam.sdk.io.FileSystems.parseScheme(FileSystems.java:447)
at 
org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:533)
at 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.resolveTempLocation(BigQueryHelpers.java:706)
at 
org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.extractFiles(BigQuerySourceBase.java:125)
at 
org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:148)
at 
org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter.split(UnboundedReadFromBoundedSource.java:144)
at 
org.apache.beam.runners.dataflow.internal.CustomSources.serializeToCloudSource(CustomSources.java:87)
at 
org.apache.beam.runners.dataflow.ReadTranslator.translateReadHelper(ReadTranslator.java:51)
at 
org.apache.beam.runners.dataflow.DataflowRunner$StreamingUnboundedRead$ReadWithIdsTranslator.translate(DataflowRunner.java:1590)
at 
org.apache.beam.runners.dataflow.DataflowRunner$StreamingUnboundedRead$ReadWithIdsTranslator.translate(DataflowRunner.java:1587)
at 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:475)
at 
org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:665)
at 
org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
at 
org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
at 
org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
at 
org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
at 
org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317)
at 
org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251)
at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:460)
at 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:414)
at 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:173)
at 
org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:763)
at 
org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:186)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:315)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:301)
{code}





[jira] [Work logged] (BEAM-9006) Meta space memory leak caused by the shutdown hook of ProcessManager

2019-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9006?focusedWorklogId=365110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-365110
 ]

ASF GitHub Bot logged work on BEAM-9006:


Author: ASF GitHub Bot
Created on: 31/Dec/19 23:50
Start Date: 31/Dec/19 23:50
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #10462: [BEAM-9006] 
Improve ProcessManager for shutdown hook handling.
URL: https://github.com/apache/beam/pull/10462#issuecomment-570006517
 
 
   Thanks for the review 👍, I have squashed the commits :)  
   Although in general we only communicate about work in the PR, today is 
special, so I would like to say "Happy new year" @mxm :)
 



Issue Time Tracking
---

Worklog Id: (was: 365110)
Time Spent: 1h 10m  (was: 1h)

> Meta space memory leak caused by the shutdown hook of ProcessManager 
> -
>
> Key: BEAM-9006
> URL: https://issues.apache.org/jira/browse/BEAM-9006
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently the class `ProcessManager` adds a shutdown hook to stop all the 
> living processes before the JVM exits. The shutdown hook is never removed. 
> If this class is loaded by the user class loader, it prevents the user class 
> loader from being garbage collected, which eventually causes a metaspace 
> memory leak.


