[jira] [Commented] (BEAM-6324) CassandraIO.Read - Add the ability to provide a filter to the query

2019-01-24 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751955#comment-16751955
 ] 

Kenneth Knowles commented on BEAM-6324:
---

Quite interesting. Let's keep this open and see if a productive resolution can 
be reached.

> CassandraIO.Read - Add the ability to provide a filter to the query
> ---
>
> Key: BEAM-6324
> URL: https://issues.apache.org/jira/browse/BEAM-6324
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Affects Versions: 2.9.0
>Reporter: Shahar Frank
>Assignee: Shahar Frank
>Priority: Major
>  Labels: performance, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CassandraIO.Read doesn't support using WHERE to filter the input at the 
> source (In Cassandra) which might provide great performance boost.
> Already implemented by:
> https://github.com/apache/beam/pull/7340



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6324) CassandraIO.Read - Add the ability to provide a filter to the query

2019-01-24 Thread Shahar Frank (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751956#comment-16751956
 ] 

Shahar Frank commented on BEAM-6324:


Sounds like a plan :)

> CassandraIO.Read - Add the ability to provide a filter to the query
> ---
>
> Key: BEAM-6324
> URL: https://issues.apache.org/jira/browse/BEAM-6324
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Affects Versions: 2.9.0
>Reporter: Shahar Frank
>Assignee: Shahar Frank
>Priority: Major
>  Labels: performance, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CassandraIO.Read doesn't support using WHERE to filter the input at the 
> source (In Cassandra) which might provide great performance boost.
> Already implemented by:
> https://github.com/apache/beam/pull/7340



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6324) CassandraIO.Read - Add the ability to provide a filter to the query

2019-01-24 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751935#comment-16751935
 ] 

Kenneth Knowles commented on BEAM-6324:
---

Yikes! I pinged a couple possible reviewers and raised this on the dev list. I 
think you may have been hit by the holidays, in part.

> CassandraIO.Read - Add the ability to provide a filter to the query
> ---
>
> Key: BEAM-6324
> URL: https://issues.apache.org/jira/browse/BEAM-6324
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Affects Versions: 2.9.0
>Reporter: Shahar Frank
>Assignee: Shahar Frank
>Priority: Major
>  Labels: performance, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CassandraIO.Read doesn't support using WHERE to filter the input at the 
> source (In Cassandra) which might provide great performance boost.
> Already implemented by:
> https://github.com/apache/beam/pull/7340



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6324) CassandraIO.Read - Add the ability to provide a filter to the query

2019-01-24 Thread Shahar Frank (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751950#comment-16751950
 ] 

Shahar Frank commented on BEAM-6324:


Thanks [~kenn]. Maybe this helps. Regardless - this PR only is half way what 
I'd like to have - it's a sort of workaround. The proper solution will require 
using an updated Casandra Java Driver - [for which the PR was not 
accepted|https://github.com/datastax/java-driver/pull/1163]. I wouldn't want 
this PR to be the long-term solution so I can see no way around maintaining my 
own repos of both projects. Besides I have some more issues with latest - e.g. 
[Cassandra IO Reader not stack when doing a 
Join|https://issues.apache.org/jira/browse/BEAM-6289] etc... which might prove 
hard to merge as well.

> CassandraIO.Read - Add the ability to provide a filter to the query
> ---
>
> Key: BEAM-6324
> URL: https://issues.apache.org/jira/browse/BEAM-6324
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Affects Versions: 2.9.0
>Reporter: Shahar Frank
>Assignee: Shahar Frank
>Priority: Major
>  Labels: performance, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CassandraIO.Read doesn't support using WHERE to filter the input at the 
> source (In Cassandra) which might provide great performance boost.
> Already implemented by:
> https://github.com/apache/beam/pull/7340



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6024) Gradle setupVirtualenv supports Python 3

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6024?focusedWorklogId=189863=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189863
 ]

ASF GitHub Bot logged work on BEAM-6024:


Author: ASF GitHub Bot
Created on: 25/Jan/19 06:46
Start Date: 25/Jan/19 06:46
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #7423: [BEAM-6024] Build 
Python 3 container image with Gradle
URL: https://github.com/apache/beam/pull/7423#issuecomment-457473035
 
 
   Run Python Dataflow ValidatesContainer
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189863)
Time Spent: 2.5h  (was: 2h 20m)

> Gradle setupVirtualenv supports Python 3
> 
>
> Key: BEAM-6024
> URL: https://issues.apache.org/jira/browse/BEAM-6024
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Need to depend on Python 3 virtualenv in few places:
> - Build Dataflow worker container in Python 3
> - Run ValidatesRunner and integration tests on Jenkins



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3386) Dependency conflict when Calcite is included in a project.

2019-01-24 Thread Kai Jiang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751942#comment-16751942
 ] 

Kai Jiang commented on BEAM-3386:
-

[~kenn] this (without relocation) sounds pretty useful. let me checkout the 
thread quickly.

> Dependency conflict when Calcite is included in a project.
> --
>
> Key: BEAM-3386
> URL: https://issues.apache.org/jira/browse/BEAM-3386
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0
>Reporter: Austin Haas
>Assignee: Kai Jiang
>Priority: Critical
>
> When Calcite (v. 1.13.0) is included in a project that also includes Beam and 
> the Beam SQL extension, then the following error is thrown when trying to run 
> Beam code.
> ClassCastException 
> org.apache.beam.sdk.extensions.sql.impl.planner.BeamRelDataTypeSystem cannot 
> be cast to org.apache.calcite.rel.type.RelDataTypeSystem
> org.apache.calcite.jdbc.CalciteConnectionImpl. 
> (CalciteConnectionImpl.java:120)
> 
> org.apache.calcite.jdbc.CalciteJdbc41Factory$CalciteJdbc41Connection. 
> (CalciteJdbc41Factory.java:114)
> org.apache.calcite.jdbc.CalciteJdbc41Factory.newConnection 
> (CalciteJdbc41Factory.java:59)
> org.apache.calcite.jdbc.CalciteJdbc41Factory.newConnection 
> (CalciteJdbc41Factory.java:44)
> org.apache.calcite.jdbc.CalciteFactory.newConnection 
> (CalciteFactory.java:53)
> org.apache.calcite.avatica.UnregisteredDriver.connect 
> (UnregisteredDriver.java:138)
> java.sql.DriverManager.getConnection (DriverManager.java:664)
> java.sql.DriverManager.getConnection (DriverManager.java:208)
> 
> org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.tools.Frameworks.withPrepare
>  (Frameworks.java:145)
> 
> org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.tools.Frameworks.withPlanner
>  (Frameworks.java:106)
> 
> org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.prepare.PlannerImpl.ready
>  (PlannerImpl.java:140)
> 
> org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.prepare.PlannerImpl.parse
>  (PlannerImpl.java:170)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3386) Dependency conflict when Calcite is included in a project.

2019-01-24 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751939#comment-16751939
 ] 

Kenneth Knowles commented on BEAM-3386:
---

[~vectorijk] did you see my thread on the dev list? I tried just a little. I 
think it will be helpful to first separate the generated code from the main 
Beam SQL module. That way, the generated code can link to vendored Calcite with 
a relocation. The main Beam SQL module will depend on vendored Calcite directly 
without needing relocation.

> Dependency conflict when Calcite is included in a project.
> --
>
> Key: BEAM-3386
> URL: https://issues.apache.org/jira/browse/BEAM-3386
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0
>Reporter: Austin Haas
>Assignee: Kai Jiang
>Priority: Critical
>
> When Calcite (v. 1.13.0) is included in a project that also includes Beam and 
> the Beam SQL extension, then the following error is thrown when trying to run 
> Beam code.
> ClassCastException 
> org.apache.beam.sdk.extensions.sql.impl.planner.BeamRelDataTypeSystem cannot 
> be cast to org.apache.calcite.rel.type.RelDataTypeSystem
> org.apache.calcite.jdbc.CalciteConnectionImpl. 
> (CalciteConnectionImpl.java:120)
> 
> org.apache.calcite.jdbc.CalciteJdbc41Factory$CalciteJdbc41Connection. 
> (CalciteJdbc41Factory.java:114)
> org.apache.calcite.jdbc.CalciteJdbc41Factory.newConnection 
> (CalciteJdbc41Factory.java:59)
> org.apache.calcite.jdbc.CalciteJdbc41Factory.newConnection 
> (CalciteJdbc41Factory.java:44)
> org.apache.calcite.jdbc.CalciteFactory.newConnection 
> (CalciteFactory.java:53)
> org.apache.calcite.avatica.UnregisteredDriver.connect 
> (UnregisteredDriver.java:138)
> java.sql.DriverManager.getConnection (DriverManager.java:664)
> java.sql.DriverManager.getConnection (DriverManager.java:208)
> 
> org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.tools.Frameworks.withPrepare
>  (Frameworks.java:145)
> 
> org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.tools.Frameworks.withPlanner
>  (Frameworks.java:106)
> 
> org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.prepare.PlannerImpl.ready
>  (PlannerImpl.java:140)
> 
> org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.prepare.PlannerImpl.parse
>  (PlannerImpl.java:170)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3386) Dependency conflict when Calcite is included in a project.

2019-01-24 Thread Kai Jiang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751928#comment-16751928
 ] 

Kai Jiang commented on BEAM-3386:
-

[~iemejia] Thanks! I will take some time trying to work on vendor calcite.

> Dependency conflict when Calcite is included in a project.
> --
>
> Key: BEAM-3386
> URL: https://issues.apache.org/jira/browse/BEAM-3386
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0
>Reporter: Austin Haas
>Assignee: Kai Jiang
>Priority: Critical
>
> When Calcite (v. 1.13.0) is included in a project that also includes Beam and 
> the Beam SQL extension, then the following error is thrown when trying to run 
> Beam code.
> ClassCastException 
> org.apache.beam.sdk.extensions.sql.impl.planner.BeamRelDataTypeSystem cannot 
> be cast to org.apache.calcite.rel.type.RelDataTypeSystem
> org.apache.calcite.jdbc.CalciteConnectionImpl. 
> (CalciteConnectionImpl.java:120)
> 
> org.apache.calcite.jdbc.CalciteJdbc41Factory$CalciteJdbc41Connection. 
> (CalciteJdbc41Factory.java:114)
> org.apache.calcite.jdbc.CalciteJdbc41Factory.newConnection 
> (CalciteJdbc41Factory.java:59)
> org.apache.calcite.jdbc.CalciteJdbc41Factory.newConnection 
> (CalciteJdbc41Factory.java:44)
> org.apache.calcite.jdbc.CalciteFactory.newConnection 
> (CalciteFactory.java:53)
> org.apache.calcite.avatica.UnregisteredDriver.connect 
> (UnregisteredDriver.java:138)
> java.sql.DriverManager.getConnection (DriverManager.java:664)
> java.sql.DriverManager.getConnection (DriverManager.java:208)
> 
> org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.tools.Frameworks.withPrepare
>  (Frameworks.java:145)
> 
> org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.tools.Frameworks.withPlanner
>  (Frameworks.java:106)
> 
> org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.prepare.PlannerImpl.ready
>  (PlannerImpl.java:140)
> 
> org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.prepare.PlannerImpl.parse
>  (PlannerImpl.java:170)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6024) Gradle setupVirtualenv supports Python 3

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6024?focusedWorklogId=189857=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189857
 ]

ASF GitHub Bot logged work on BEAM-6024:


Author: ASF GitHub Bot
Created on: 25/Jan/19 06:05
Start Date: 25/Jan/19 06:05
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #7423: [BEAM-6024] Build 
Python 3 container image with Gradle
URL: https://github.com/apache/beam/pull/7423#issuecomment-457466225
 
 
   Run Seed Job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189857)
Time Spent: 2h 20m  (was: 2h 10m)

> Gradle setupVirtualenv supports Python 3
> 
>
> Key: BEAM-6024
> URL: https://issues.apache.org/jira/browse/BEAM-6024
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Need to depend on Python 3 virtualenv in few places:
> - Build Dataflow worker container in Python 3
> - Run ValidatesRunner and integration tests on Jenkins



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5953) Support DataflowRunner on Python 3

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5953?focusedWorklogId=189856=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189856
 ]

ASF GitHub Bot logged work on BEAM-5953:


Author: ASF GitHub Bot
Created on: 25/Jan/19 06:04
Start Date: 25/Jan/19 06:04
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #7521: [BEAM-5953] Fix 
py3 type error in bundle_processor
URL: https://github.com/apache/beam/pull/7521#issuecomment-457466049
 
 
   PTAL @tvalentyn @robertwb 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189856)
Time Spent: 4h 20m  (was: 4h 10m)

> Support DataflowRunner on Python 3
> --
>
> Key: BEAM-5953
> URL: https://issues.apache.org/jira/browse/BEAM-5953
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5953) Support DataflowRunner on Python 3

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5953?focusedWorklogId=189855=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189855
 ]

ASF GitHub Bot logged work on BEAM-5953:


Author: ASF GitHub Bot
Created on: 25/Jan/19 06:03
Start Date: 25/Jan/19 06:03
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #7521: [BEAM-5953] Fix 
py3 type error in bundle_processor
URL: https://github.com/apache/beam/pull/7521#issuecomment-457466049
 
 
   PTAL
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189855)
Time Spent: 4h 10m  (was: 4h)

> Support DataflowRunner on Python 3
> --
>
> Key: BEAM-5953
> URL: https://issues.apache.org/jira/browse/BEAM-5953
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6324) CassandraIO.Read - Add the ability to provide a filter to the query

2019-01-24 Thread Shahar Frank (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751917#comment-16751917
 ] 

Shahar Frank commented on BEAM-6324:


[~kenn] - I've pretty much given up on the PR being merged. I use [my own 
fork|https://github.com/srfrnk/beam] for my projects. I've included the 
compiled JAR in the repo and using it [directly from Github like 
so|[https://github.com/srfrnk/beam-playground].] It also uses [my own version 
for the Datastax Cassandra Driver for 
Java|https://github.com/srfrnk/java-driver] with the changes I needed to 
accomplish this. Hope that helps.

> CassandraIO.Read - Add the ability to provide a filter to the query
> ---
>
> Key: BEAM-6324
> URL: https://issues.apache.org/jira/browse/BEAM-6324
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Affects Versions: 2.9.0
>Reporter: Shahar Frank
>Assignee: Shahar Frank
>Priority: Major
>  Labels: performance, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CassandraIO.Read doesn't support using WHERE to filter the input at the 
> source (In Cassandra) which might provide great performance boost.
> Already implemented by:
> https://github.com/apache/beam/pull/7340



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6361) Fix user-metric prefix detection for portable Flink metrics

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6361?focusedWorklogId=189849=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189849
 ]

ASF GitHub Bot logged work on BEAM-6361:


Author: ASF GitHub Bot
Created on: 25/Jan/19 05:03
Start Date: 25/Jan/19 05:03
Worklog Time Spent: 10m 
  Work Description: ryan-williams commented on issue #7408: 
[BEAM-6361][BEAM-6364] fix user-metric-prefix checking in Flink portable 
metrics update
URL: https://github.com/apache/beam/pull/7408#issuecomment-457452816
 
 
   Looks like everything passed the second time around, so this should be good 
to go
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189849)
Time Spent: 3h 10m  (was: 3h)

> Fix user-metric prefix detection for portable Flink metrics
> ---
>
> Key: BEAM-6361
> URL: https://issues.apache.org/jira/browse/BEAM-6361
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.9.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> [~ajam...@google.com] [noticed a 
> bug|https://github.com/apache/beam/pull/7183/files#r243402155] in classifying 
> portable Flink metrics as user-namespaced vs not.
> I think it's only currently affecting how metrics get displayed downstream in 
> Flink, but it should be fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6361) Fix user-metric prefix detection for portable Flink metrics

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6361?focusedWorklogId=189845=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189845
 ]

ASF GitHub Bot logged work on BEAM-6361:


Author: ASF GitHub Bot
Created on: 25/Jan/19 04:34
Start Date: 25/Jan/19 04:34
Worklog Time Spent: 10m 
  Work Description: ryan-williams commented on issue #7408: 
[BEAM-6361][BEAM-6364] fix user-metric-prefix checking in Flink portable 
metrics update
URL: https://github.com/apache/beam/pull/7408#issuecomment-457452816
 
 
   Looks like everything passed the second time around, so this should be good 
to go
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189845)
Time Spent: 2h 50m  (was: 2h 40m)

> Fix user-metric prefix detection for portable Flink metrics
> ---
>
> Key: BEAM-6361
> URL: https://issues.apache.org/jira/browse/BEAM-6361
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.9.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> [~ajam...@google.com] [noticed a 
> bug|https://github.com/apache/beam/pull/7183/files#r243402155] in classifying 
> portable Flink metrics as user-namespaced vs not.
> I think it's only currently affecting how metrics get displayed downstream in 
> Flink, but it should be fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6361) Fix user-metric prefix detection for portable Flink metrics

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6361?focusedWorklogId=189846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189846
 ]

ASF GitHub Bot logged work on BEAM-6361:


Author: ASF GitHub Bot
Created on: 25/Jan/19 04:35
Start Date: 25/Jan/19 04:35
Worklog Time Spent: 10m 
  Work Description: ryan-williams commented on issue #7408: 
[BEAM-6361][BEAM-6364] fix user-metric-prefix checking in Flink portable 
metrics update
URL: https://github.com/apache/beam/pull/7408#issuecomment-457452816
 
 
   Looks like everything passed the second time around, so this should be good 
to go
   
   edit: Python_PVR_Flink is apparently still running, though github was 
showing me all checks as passed a moment ago
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189846)
Time Spent: 3h  (was: 2h 50m)

> Fix user-metric prefix detection for portable Flink metrics
> ---
>
> Key: BEAM-6361
> URL: https://issues.apache.org/jira/browse/BEAM-6361
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.9.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> [~ajam...@google.com] [noticed a 
> bug|https://github.com/apache/beam/pull/7183/files#r243402155] in classifying 
> portable Flink metrics as user-namespaced vs not.
> I think it's only currently affecting how metrics get displayed downstream in 
> Flink, but it should be fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6487) BigQuery api is out of date

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6487?focusedWorklogId=189837=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189837
 ]

ASF GitHub Bot logged work on BEAM-6487:


Author: ASF GitHub Bot
Created on: 25/Jan/19 03:06
Start Date: 25/Jan/19 03:06
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #7590: [BEAM-6487] 
Updating BQ API
URL: https://github.com/apache/beam/pull/7590
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189837)
Time Spent: 1h 10m  (was: 1h)

> BigQuery api is out of date
> ---
>
> Key: BEAM-6487
> URL: https://issues.apache.org/jira/browse/BEAM-6487
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=189835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189835
 ]

ASF GitHub Bot logged work on BEAM-5442:


Author: ASF GitHub Bot
Created on: 25/Jan/19 03:04
Start Date: 25/Jan/19 03:04
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #7597: [WIP] [BEAM-5442] 
Retrieve valid runner options from JobService
URL: https://github.com/apache/beam/pull/7597#issuecomment-457439306
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189835)
Time Spent: 17h 50m  (was: 17h 40m)

> PortableRunner swallows custom options for Runner
> -
>
> Key: BEAM-5442
> URL: https://issues.apache.org/jira/browse/BEAM-5442
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability, portability-flink
>  Time Spent: 17h 50m
>  Remaining Estimate: 0h
>
> The PortableRunner doesn't pass custom PipelineOptions to the executing 
> Runner.
> Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner.
> (The option is just removed during proto translation without any warning)
> We should allow some form of customization through the options, even for the 
> PortableRunner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6431) Add ExecutionTime metrics to the Beam Java SDK

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6431?focusedWorklogId=189829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189829
 ]

ASF GitHub Bot logged work on BEAM-6431:


Author: ASF GitHub Bot
Created on: 25/Jan/19 02:22
Start Date: 25/Jan/19 02:22
Worklog Time Spent: 10m 
  Work Description: swegner commented on pull request #7507: [BEAM-6431] 
Refactor, Remove references to Dataflow classes in base State Sampling
URL: https://github.com/apache/beam/pull/7507
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189829)
Time Spent: 1h 10m  (was: 1h)

> Add ExecutionTime metrics to the Beam Java SDK
> --
>
> Key: BEAM-6431
> URL: https://issues.apache.org/jira/browse/BEAM-6431
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This will be done by using the Dataflow worker's StateSampler code. I have 
> put together a refactoring plan
> [here|https://docs.google.com/document/d/1OlAJf4T_CTL9WRH8lP8uQOfLjWYfm8IpRXSe38g34k4/edit#]
> This will include estimating the processing time for the start, process and 
> finish bundle. The python SDK already has an implementation of this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6431) Add ExecutionTime metrics to the Beam Java SDK

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6431?focusedWorklogId=189822=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189822
 ]

ASF GitHub Bot logged work on BEAM-6431:


Author: ASF GitHub Bot
Created on: 25/Jan/19 02:03
Start Date: 25/Jan/19 02:03
Worklog Time Spent: 10m 
  Work Description: ajamato commented on issue #7507: [BEAM-6431] Refactor, 
Remove references to Dataflow classes in base State Sampling
URL: https://github.com/apache/beam/pull/7507#issuecomment-457428834
 
 
   I have finished all the extra testing which I wanted to perform.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189822)
Time Spent: 1h  (was: 50m)

> Add ExecutionTime metrics to the Beam Java SDK
> --
>
> Key: BEAM-6431
> URL: https://issues.apache.org/jira/browse/BEAM-6431
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This will be done by using the Dataflow worker's StateSampler code. I have 
> put together a refactoring plan
> [here|https://docs.google.com/document/d/1OlAJf4T_CTL9WRH8lP8uQOfLjWYfm8IpRXSe38g34k4/edit#]
> This will include estimating the processing time for the start, process and 
> finish bundle. The python SDK already has an implementation of this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6233) Make bundle execution with ExecutableStage support timer/states

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6233?focusedWorklogId=189808=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189808
 ]

ASF GitHub Bot logged work on BEAM-6233:


Author: ASF GitHub Bot
Created on: 25/Jan/19 00:53
Start Date: 25/Jan/19 00:53
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #7330: [BEAM-6233]: Add 
initial user timer support in Dataflow for batch pipelines
URL: https://github.com/apache/beam/pull/7330#issuecomment-457415340
 
 
   R: @kennknowles, hey Kenn are you able to review this please? I want to get 
a good confirmation that the general logic is correct in this PR. In a 
subsequent PR I'll pull the timer logic into a different class.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189808)
Time Spent: 3h 20m  (was: 3h 10m)

> Make bundle execution with ExecutableStage support timer/states
> ---
>
> Key: BEAM-6233
> URL: https://issues.apache.org/jira/browse/BEAM-6233
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6506) Dataflow runner harness should always use proper coder URNs

2019-01-24 Thread Mark Liu (JIRA)
Mark Liu created BEAM-6506:
--

 Summary: Dataflow runner harness should always use proper coder 
URNs
 Key: BEAM-6506
 URL: https://issues.apache.org/jira/browse/BEAM-6506
 Project: Beam
  Issue Type: Wish
  Components: runner-core
Reporter: Mark Liu
Assignee: Robert Bradshaw


This comes from a Python 3 incompatible bug and mentioned in 
https://github.com/apache/beam/pull/7521/files#r248993766. The fix decode 
payload unconditionally in Python 2 or 3 but suggested by [~robertwb]: 

"the Dataflow runner harness was fixed to always use proper coder URNs rather 
than stuffing json-ized dataflow v1b3 cloud proto representations into the 
payload".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6506) Dataflow runner harness should always use proper coder URNs

2019-01-24 Thread Mark Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu updated BEAM-6506:
---
Labels: portability  (was: )

> Dataflow runner harness should always use proper coder URNs
> ---
>
> Key: BEAM-6506
> URL: https://issues.apache.org/jira/browse/BEAM-6506
> Project: Beam
>  Issue Type: Wish
>  Components: runner-core
>Reporter: Mark Liu
>Assignee: Robert Bradshaw
>Priority: Major
>  Labels: portability
>
> This comes from a Python 3 incompatible bug and mentioned in 
> https://github.com/apache/beam/pull/7521/files#r248993766. The fix decode 
> payload unconditionally in Python 2 or 3 but suggested by [~robertwb]: 
> "the Dataflow runner harness was fixed to always use proper coder URNs rather 
> than stuffing json-ized dataflow v1b3 cloud proto representations into the 
> payload".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5953) Support DataflowRunner on Python 3

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5953?focusedWorklogId=189807=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189807
 ]

ASF GitHub Bot logged work on BEAM-5953:


Author: ASF GitHub Bot
Created on: 25/Jan/19 00:53
Start Date: 25/Jan/19 00:53
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #7521: 
[BEAM-5953] Fix py3 type error in bundle_processor
URL: https://github.com/apache/beam/pull/7521#discussion_r250833446
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -591,8 +592,10 @@ def get_coder(self, coder_id):
   return self.context.coders.get_by_id(coder_id)
 else:
   # No URN, assume cloud object encoding json bytes.
-  return operation_specs.get_coder_from_spec(
-  json.loads(coder_proto.spec.spec.payload))
+  payload = coder_proto.spec.spec.payload
+  if isinstance(payload, bytes) and sys.version_info[0] == 3:
 
 Review comment:
   sg. Also filed https://issues.apache.org/jira/browse/BEAM-6506
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189807)
Time Spent: 4h  (was: 3h 50m)

> Support DataflowRunner on Python 3
> --
>
> Key: BEAM-5953
> URL: https://issues.apache.org/jira/browse/BEAM-5953
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=189805=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189805
 ]

ASF GitHub Bot logged work on BEAM-5442:


Author: ASF GitHub Bot
Created on: 25/Jan/19 00:48
Start Date: 25/Jan/19 00:48
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #7597: [WIP] 
[BEAM-5442] Retrieve valid runner options from JobService
URL: https://github.com/apache/beam/pull/7597#discussion_r250832615
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java
 ##
 @@ -630,6 +634,55 @@ public static void printHelp(PrintStream out, Class i
 }
   }
 
+  public static Map describe(Set> ifaces) {
+checkNotNull(ifaces);
+Map result = new HashMap<>();
+
+for (Class iface : ifaces) {
+  CACHE.get().validateWellFormed(iface);
+
+  Set properties = 
PipelineOptionsReflector.getOptionSpecs(iface);
+
+  RowSortedTable, String, Method> ifacePropGetterTable =
+  TreeBasedTable.create(ClassNameComparator.INSTANCE, 
Ordering.natural());
+  for (PipelineOptionSpec prop : properties) {
+ifacePropGetterTable.put(prop.getDefiningInterface(), prop.getName(), 
prop.getGetterMethod());
+  }
+
+  for (Map.Entry, Map> ifaceToPropertyMap :
+  ifacePropGetterTable.rowMap().entrySet()) {
+Class currentIface = ifaceToPropertyMap.getKey();
+Map propertyNamesToGetters = 
ifaceToPropertyMap.getValue();
+
+List lists = 
Lists.newArrayList(propertyNamesToGetters.keySet());
+lists.sort(String.CASE_INSENSITIVE_ORDER);
+for (String propertyName : lists) {
+  Method method = propertyNamesToGetters.get(propertyName);
+  // TODO: type representation
+  String printableType = method.getReturnType().getSimpleName();
 
 Review comment:
   Perhaps we can alternatively just define the accumulation for the option, 
i.e. the equivalent of `store`, `append`, `store_true`? That's all that seems 
to be required at this point.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189805)
Time Spent: 17h 40m  (was: 17.5h)

> PortableRunner swallows custom options for Runner
> -
>
> Key: BEAM-5442
> URL: https://issues.apache.org/jira/browse/BEAM-5442
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability, portability-flink
>  Time Spent: 17h 40m
>  Remaining Estimate: 0h
>
> The PortableRunner doesn't pass custom PipelineOptions to the executing 
> Runner.
> Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner.
> (The option is just removed during proto translation without any warning)
> We should allow some form of customization through the options, even for the 
> PortableRunner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6505) Java SDK - Allow System Counters (which don't need MetricsContainer context)

2019-01-24 Thread Alex Amato (JIRA)
Alex Amato created BEAM-6505:


 Summary: Java SDK - Allow System Counters (which don't need 
MetricsContainer context)
 Key: BEAM-6505
 URL: https://issues.apache.org/jira/browse/BEAM-6505
 Project: Beam
  Issue Type: New Feature
  Components: java-fn-execution
Reporter: Alex Amato
Assignee: Alex Amato


See the comment added for this issue in ElementCountFnDataReceiver.java

The method used to create these metrics relies on the currently in scope 
metrics container, though we should use the same metrics container every time 
this code is invoked instead. There is no need to use the current scoped metric 
container, which only offers the main benefit to user counters, by attaching 
the PTransform name to the metrics. In this case the metric does not need the 
currently scoped PTransform name, since the code is labelling the metrics with 
the pcollection, and similar cases can manually attach the ptransform name 
(i.e. for execution time metrics).

We can make the static method LabelledMetrics.counter(metricName) obtain a 
consistent metric container instead of looking for the currently scoped metric 
container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6361) Fix user-metric prefix detection for portable Flink metrics

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6361?focusedWorklogId=189804=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189804
 ]

ASF GitHub Bot logged work on BEAM-6361:


Author: ASF GitHub Bot
Created on: 25/Jan/19 00:44
Start Date: 25/Jan/19 00:44
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #7408: [BEAM-6361][BEAM-6364] 
fix user-metric-prefix checking in Flink portable metrics update
URL: https://github.com/apache/beam/pull/7408#issuecomment-457413639
 
 
   If it's only that test failing, it's fine to merge. All other tests still 
run.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189804)
Time Spent: 2h 40m  (was: 2.5h)

> Fix user-metric prefix detection for portable Flink metrics
> ---
>
> Key: BEAM-6361
> URL: https://issues.apache.org/jira/browse/BEAM-6361
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.9.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> [~ajam...@google.com] [noticed a 
> bug|https://github.com/apache/beam/pull/7183/files#r243402155] in classifying 
> portable Flink metrics as user-namespaced vs not.
> I think it's only currently affecting how metrics get displayed downstream in 
> Flink, but it should be fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6487) BigQuery api is out of date

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6487?focusedWorklogId=189787=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189787
 ]

ASF GitHub Bot logged work on BEAM-6487:


Author: ASF GitHub Bot
Created on: 25/Jan/19 00:32
Start Date: 25/Jan/19 00:32
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #7590: [BEAM-6487] Updating 
BQ API
URL: https://github.com/apache/beam/pull/7590#issuecomment-457410870
 
 
   added lint exclusion for generated internal proto sources cc:@tvalentyn 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189787)
Time Spent: 1h  (was: 50m)

> BigQuery api is out of date
> ---
>
> Key: BEAM-6487
> URL: https://issues.apache.org/jira/browse/BEAM-6487
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6468) Cannot create empty TestBoundedTable

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6468?focusedWorklogId=189801=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189801
 ]

ASF GitHub Bot logged work on BEAM-6468:


Author: ASF GitHub Bot
Created on: 25/Jan/19 00:33
Start Date: 25/Jan/19 00:33
Worklog Time Spent: 10m 
  Work Description: akedin commented on pull request #7568: [BEAM-6468] 
Allow creating empty TestBoundedTable
URL: https://github.com/apache/beam/pull/7568
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189801)
Time Spent: 3.5h  (was: 3h 20m)

> Cannot create empty TestBoundedTable
> 
>
> Key: BEAM-6468
> URL: https://issues.apache.org/jira/browse/BEAM-6468
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6361) Fix user-metric prefix detection for portable Flink metrics

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6361?focusedWorklogId=189781=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189781
 ]

ASF GitHub Bot logged work on BEAM-6361:


Author: ASF GitHub Bot
Created on: 25/Jan/19 00:28
Start Date: 25/Jan/19 00:28
Worklog Time Spent: 10m 
  Work Description: ryan-williams commented on issue #7408: 
[BEAM-6361][BEAM-6364] fix user-metric-prefix checking in Flink portable 
metrics update
URL: https://github.com/apache/beam/pull/7408#issuecomment-457410082
 
 
   ```
   org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest > 
testWriteRetryValidRequest FAILED
   java.lang.AssertionError at ElasticsearchIOTest.java:219
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189781)
Time Spent: 2.5h  (was: 2h 20m)

> Fix user-metric prefix detection for portable Flink metrics
> ---
>
> Key: BEAM-6361
> URL: https://issues.apache.org/jira/browse/BEAM-6361
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.9.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> [~ajam...@google.com] [noticed a 
> bug|https://github.com/apache/beam/pull/7183/files#r243402155] in classifying 
> portable Flink metrics as user-namespaced vs not.
> I think it's only currently affecting how metrics get displayed downstream in 
> Flink, but it should be fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6361) Fix user-metric prefix detection for portable Flink metrics

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6361?focusedWorklogId=189780=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189780
 ]

ASF GitHub Bot logged work on BEAM-6361:


Author: ASF GitHub Bot
Created on: 25/Jan/19 00:27
Start Date: 25/Jan/19 00:27
Worklog Time Spent: 10m 
  Work Description: ryan-williams commented on issue #7408: 
[BEAM-6361][BEAM-6364] fix user-metric-prefix checking in Flink portable 
metrics update
URL: https://github.com/apache/beam/pull/7408#issuecomment-457409945
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189780)
Time Spent: 2h 20m  (was: 2h 10m)

> Fix user-metric prefix detection for portable Flink metrics
> ---
>
> Key: BEAM-6361
> URL: https://issues.apache.org/jira/browse/BEAM-6361
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.9.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> [~ajam...@google.com] [noticed a 
> bug|https://github.com/apache/beam/pull/7183/files#r243402155] in classifying 
> portable Flink metrics as user-namespaced vs not.
> I think it's only currently affecting how metrics get displayed downstream in 
> Flink, but it should be fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6468) Cannot create empty TestBoundedTable

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6468?focusedWorklogId=189777=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189777
 ]

ASF GitHub Bot logged work on BEAM-6468:


Author: ASF GitHub Bot
Created on: 25/Jan/19 00:22
Start Date: 25/Jan/19 00:22
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #7568: [BEAM-6468] 
Allow creating empty TestBoundedTable
URL: https://github.com/apache/beam/pull/7568#discussion_r250828057
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestBoundedTable.java
 ##
 @@ -92,8 +95,19 @@ public TestBoundedTable addRows(Object... args) {
 
   @Override
   public PCollection buildIOReader(PBegin begin) {
+PTransform> valueTransform;
+
+if (rows.isEmpty()) {
+  valueTransform =
+  Create.empty(
+  SchemaCoder.of(
+  schema, SerializableFunctions.identity(), 
SerializableFunctions.identity()));
+} else {
+  valueTransform = Create.of(rows);
 
 Review comment:
   Thanks. Didn't notice this API to set SchemaCoder. Updated.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189777)
Time Spent: 3h 20m  (was: 3h 10m)

> Cannot create empty TestBoundedTable
> 
>
> Key: BEAM-6468
> URL: https://issues.apache.org/jira/browse/BEAM-6468
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6161) Add ElementCount MonitoringInfos for the Java SDK

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6161?focusedWorklogId=189776=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189776
 ]

ASF GitHub Bot logged work on BEAM-6161:


Author: ASF GitHub Bot
Created on: 25/Jan/19 00:21
Start Date: 25/Jan/19 00:21
Worklog Time Spent: 10m 
  Work Description: ryan-williams commented on pull request #7272: 
[BEAM-6161] Introduce PCollectionConsumerRegistry and add ElementCoun…
URL: https://github.com/apache/beam/pull/7272#discussion_r250827912
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SpecMonitoringInfoValidator.java
 ##
 @@ -68,6 +68,7 @@ public SpecMonitoringInfoValidator() {
   monitoringInfo.getUrn(), spec.getTypeUrn(), 
monitoringInfo.getType()));
 }
 
+// TODO(ajamato): Tighten this restriction to use set equality, to catch 
unused
 
 Review comment:
   Just noting: I recently had a local change to keep this a bit relaxed, 
because I was testing with user metrics that were getting a `PTRANSFORM` label. 
I'd weakened the condition below to: 
   
   ```java
   !requiredLabels.ieEmpty && 
!monitoringInfo.getLabelsMap().keySet().equals(requiredLabels)
   ```
   
   where previously it required strict equality in all cases. 
   
   Seemingly this was relaxed to the `containsAll` below before I got anywhere 
with my change, and it's possible I was in some other invalid state to have had 
user metrics with a `PTRANSFORM` label that were failing the strict test (where 
`requiredLabels` was empty).
   
   Just wanted to leave a breadcrumb here about that since we'll probably come 
back to it soon..
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189776)
Time Spent: 11.5h  (was: 11h 20m)

> Add ElementCount MonitoringInfos for the Java SDK
> -
>
> Key: BEAM-6161
> URL: https://issues.apache.org/jira/browse/BEAM-6161
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution, sdk-java-harness
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-4620) UnboundedReadFromBoundedSource.split() should always call split()

2019-01-24 Thread Chamikara Jayalath (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Jayalath resolved BEAM-4620.
--
   Resolution: Fixed
Fix Version/s: 2.11.0

> UnboundedReadFromBoundedSource.split() should always call split()
> -
>
> Key: BEAM-4620
> URL: https://issues.apache.org/jira/browse/BEAM-4620
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Josh M
>Assignee: Chamikara Jayalath
>Priority: Minor
> Fix For: 2.11.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> If the source contains too little data (rounds the result down to 0), or if 
> size estimation fails, then we don't call .split().  We must always call 
> split(); instead of returning original source, we should have computed some 
> fallback value for desiredBundleSize.
> This bug has existed since the code was first introduced, so it is not a 
> regression - but it is certainly a bug.
>  
> source: 
> https://github.com/apache/beam/blob/697a1d17e473cd5b097aaaeee24c08f43cc77f58/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/UnboundedReadFromBoundedSource.java#L137



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4620) UnboundedReadFromBoundedSource.split() should always call split()

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4620?focusedWorklogId=189762=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189762
 ]

ASF GitHub Bot logged work on BEAM-4620:


Author: ASF GitHub Bot
Created on: 24/Jan/19 23:59
Start Date: 24/Jan/19 23:59
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #7555: [BEAM-4620] 
UnboundedReadFromBoundedSource invokes split for small bounded sources
URL: https://github.com/apache/beam/pull/7555#issuecomment-457404131
 
 
   Thanks.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189762)
Time Spent: 1h 10m  (was: 1h)

> UnboundedReadFromBoundedSource.split() should always call split()
> -
>
> Key: BEAM-4620
> URL: https://issues.apache.org/jira/browse/BEAM-4620
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Josh M
>Assignee: Chamikara Jayalath
>Priority: Minor
> Fix For: 2.11.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> If the source contains too little data (rounds the result down to 0), or if 
> size estimation fails, then we don't call .split().  We must always call 
> split(); instead of returning original source, we should have computed some 
> fallback value for desiredBundleSize.
> This bug has existed since the code was first introduced, so it is not a 
> regression - but it is certainly a bug.
>  
> source: 
> https://github.com/apache/beam/blob/697a1d17e473cd5b097aaaeee24c08f43cc77f58/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/UnboundedReadFromBoundedSource.java#L137



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4620) UnboundedReadFromBoundedSource.split() should always call split()

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4620?focusedWorklogId=189759=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189759
 ]

ASF GitHub Bot logged work on BEAM-4620:


Author: ASF GitHub Bot
Created on: 24/Jan/19 23:50
Start Date: 24/Jan/19 23:50
Worklog Time Spent: 10m 
  Work Description: swegner commented on pull request #7555: [BEAM-4620] 
UnboundedReadFromBoundedSource invokes split for small bounded sources
URL: https://github.com/apache/beam/pull/7555
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189759)
Time Spent: 1h  (was: 50m)

> UnboundedReadFromBoundedSource.split() should always call split()
> -
>
> Key: BEAM-4620
> URL: https://issues.apache.org/jira/browse/BEAM-4620
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Josh M
>Assignee: Chamikara Jayalath
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> If the source contains too little data (rounds the result down to 0), or if 
> size estimation fails, then we don't call .split().  We must always call 
> split(); instead of returning original source, we should have computed some 
> fallback value for desiredBundleSize.
> This bug has existed since the code was first introduced, so it is not a 
> regression - but it is certainly a bug.
>  
> source: 
> https://github.com/apache/beam/blob/697a1d17e473cd5b097aaaeee24c08f43cc77f58/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/UnboundedReadFromBoundedSource.java#L137



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5977) Benchmark buffer use in the Go SDK Harness

2019-01-24 Thread Robert Burke (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke reassigned BEAM-5977:
--

Assignee: Robert Burke

> Benchmark buffer use in the Go SDK Harness
> --
>
> Key: BEAM-5977
> URL: https://issues.apache.org/jira/browse/BEAM-5977
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Major
> Fix For: Not applicable
>
>
> There's opportunity to reduce CPU & RAM usage with better buffer re-use in 
> [datamgr.go|https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/harness/datamgr.go]
>  WRT both the small element batch buffer and with large elements.
> There should be some benchmarking around both large elements, and smaller 
> elements of varying sizes, (and ideally mixing), and using that we can 
> measure subsequent improvements.
> As a performance improvement, offhand, maintaining a pair of a couple of 
> `chunkSize` buffers could be handy and avoid associated GRPC costs, to handle 
> the smaller elements, as well as flushing large elements immediately and 
> without copying.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6468) Cannot create empty TestBoundedTable

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6468?focusedWorklogId=189753=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189753
 ]

ASF GitHub Bot logged work on BEAM-6468:


Author: ASF GitHub Bot
Created on: 24/Jan/19 23:35
Start Date: 24/Jan/19 23:35
Worklog Time Spent: 10m 
  Work Description: akedin commented on pull request #7568: [BEAM-6468] 
Allow creating empty TestBoundedTable
URL: https://github.com/apache/beam/pull/7568#discussion_r250818620
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestBoundedTable.java
 ##
 @@ -92,8 +95,19 @@ public TestBoundedTable addRows(Object... args) {
 
   @Override
   public PCollection buildIOReader(PBegin begin) {
+PTransform> valueTransform;
+
+if (rows.isEmpty()) {
+  valueTransform =
+  Create.empty(
+  SchemaCoder.of(
+  schema, SerializableFunctions.identity(), 
SerializableFunctions.identity()));
+} else {
+  valueTransform = Create.of(rows);
 
 Review comment:
   why not just always set the schema, it will then handle both cases:
   
   `Create.of(rows).withSchema(SchemaCoder.of(...))`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189753)
Time Spent: 3h 10m  (was: 3h)

> Cannot create empty TestBoundedTable
> 
>
> Key: BEAM-6468
> URL: https://issues.apache.org/jira/browse/BEAM-6468
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189748=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189748
 ]

ASF GitHub Bot logged work on BEAM-6154:


Author: ASF GitHub Bot
Created on: 24/Jan/19 23:28
Start Date: 24/Jan/19 23:28
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #7617: [BEAM-6154] Update 
google-apitools to 0.5.26 and fix gcsio in python 3
URL: https://github.com/apache/beam/pull/7617#issuecomment-457397478
 
 
   R: @aaltay 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189748)
Time Spent: 1h 50m  (was: 1h 40m)

> Gcsio batch delete broken in Python 3
> -
>
> Key: BEAM-6154
> URL: https://issues.apache.org/jira/browse/BEAM-6154
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> I'm running Python SDK agianst GCP in Python 3.5 and got following gcsio 
> error while deleting files:
> {code}
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", 
> line 1077, in 
> window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 315, in finalize_write
> num_threads)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", 
> line 145, in run_using_threadpool
> return pool.map(fn_to_execute, inputs)
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map
> return self._map_async(func, iterable, mapstar, chunksize).get()
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get
> raise self._value
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker
> result = (True, func(*args, **kwds))
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
> return list(map(*args))
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 299, in _rename_batch
> FileSystems.rename(source_files, destination_files)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line 
> 252, in rename
> return filesystem.rename(source_file_names, destination_file_names)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", 
> line 229, in rename
> copy_statuses = gcsio.GcsIO().copy_batch(batch)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", 
> line 322, in copy_batch
> api_calls = batch_request.Execute(self.client._http)  # pylint: 
> disable=protected-access
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 222, in Execute
> batch_http_request.Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 480, in Execute
> self._Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 450, in _Execute
> mime_response = parser.parsestr(header + response.content)
> TypeError: Can't convert 'bytes' object to str implicitly
> {code} 
> After looking into related code in apitools library, I found response.content 
> that's returned via http request to gcs is bytes and apitools didn't handle 
> this scenario. This can be a blocker to any pipeline depending on gcsio and 
> apparently blocks all Dataflow job in Python 3.
> This could be another case that moving off apitools dependency in 
> [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=189755=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189755
 ]

ASF GitHub Bot logged work on BEAM-4776:


Author: ASF GitHub Bot
Created on: 24/Jan/19 23:37
Start Date: 24/Jan/19 23:37
Worklog Time Spent: 10m 
  Work Description: ryan-williams commented on pull request #7621: 
[BEAM-4776] make AutoValues for MetricResults, MetricQueryResults
URL: https://github.com/apache/beam/pull/7621
 
 
   … and remove duplicated implementations from `direct` and `direct/portable` 
runners
   
   This is an early bit of cleanup that came out of work I'm doing to plumb 
metrics over the job API (cf. 
[BEAM-4776](https://issues.apache.org/jira/browse/BEAM-4776)). In later work I 
re-use these same concrete `@AutoValue` classes a third time.
   
   R: @mxm, @ajamato 
   CC: @youngoli 
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | --- | --- | --- | ---
   
   
   
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189755)
Time Spent: 10m
Remaining Estimate: 0h

> Java PortableRunner should support metrics
> --
>
> Key: BEAM-4776
> URL: https://issues.apache.org/jira/browse/BEAM-4776
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Ryan Williams
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> BEAM-4775 concerns adding metrics to the JobService API; the 

[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189750=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189750
 ]

ASF GitHub Bot logged work on BEAM-6154:


Author: ASF GitHub Bot
Created on: 24/Jan/19 23:32
Start Date: 24/Jan/19 23:32
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #7617: [BEAM-6154] 
Update google-apitools to 0.5.26 and fix gcsio in python 3
URL: https://github.com/apache/beam/pull/7617
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189750)
Time Spent: 2h  (was: 1h 50m)

> Gcsio batch delete broken in Python 3
> -
>
> Key: BEAM-6154
> URL: https://issues.apache.org/jira/browse/BEAM-6154
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> I'm running Python SDK agianst GCP in Python 3.5 and got following gcsio 
> error while deleting files:
> {code}
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", 
> line 1077, in 
> window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 315, in finalize_write
> num_threads)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", 
> line 145, in run_using_threadpool
> return pool.map(fn_to_execute, inputs)
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map
> return self._map_async(func, iterable, mapstar, chunksize).get()
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get
> raise self._value
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker
> result = (True, func(*args, **kwds))
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
> return list(map(*args))
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 299, in _rename_batch
> FileSystems.rename(source_files, destination_files)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line 
> 252, in rename
> return filesystem.rename(source_file_names, destination_file_names)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", 
> line 229, in rename
> copy_statuses = gcsio.GcsIO().copy_batch(batch)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", 
> line 322, in copy_batch
> api_calls = batch_request.Execute(self.client._http)  # pylint: 
> disable=protected-access
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 222, in Execute
> batch_http_request.Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 480, in Execute
> self._Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 450, in _Execute
> mime_response = parser.parsestr(header + response.content)
> TypeError: Can't convert 'bytes' object to str implicitly
> {code} 
> After looking into related code in apitools library, I found response.content 
> that's returned via http request to gcs is bytes and apitools didn't handle 
> this scenario. This can be a blocker to any pipeline depending on gcsio and 
> apparently blocks all Dataflow job in Python 3.
> This could be another case that moving off apitools dependency in 
> [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6361) Fix user-metric prefix detection for portable Flink metrics

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6361?focusedWorklogId=189742=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189742
 ]

ASF GitHub Bot logged work on BEAM-6361:


Author: ASF GitHub Bot
Created on: 24/Jan/19 23:21
Start Date: 24/Jan/19 23:21
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #7408: 
[BEAM-6361][BEAM-6364] fix user-metric-prefix checking in Flink portable 
metrics update
URL: https://github.com/apache/beam/pull/7408#issuecomment-457395853
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189742)
Time Spent: 2h 10m  (was: 2h)

> Fix user-metric prefix detection for portable Flink metrics
> ---
>
> Key: BEAM-6361
> URL: https://issues.apache.org/jira/browse/BEAM-6361
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.9.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> [~ajam...@google.com] [noticed a 
> bug|https://github.com/apache/beam/pull/7183/files#r243402155] in classifying 
> portable Flink metrics as user-namespaced vs not.
> I think it's only currently affecting how metrics get displayed downstream in 
> Flink, but it should be fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6361) Fix user-metric prefix detection for portable Flink metrics

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6361?focusedWorklogId=189726=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189726
 ]

ASF GitHub Bot logged work on BEAM-6361:


Author: ASF GitHub Bot
Created on: 24/Jan/19 22:59
Start Date: 24/Jan/19 22:59
Worklog Time Spent: 10m 
  Work Description: ryan-williams commented on pull request #7408: 
[BEAM-6361][BEAM-6364] fix user-metric-prefix checking in Flink portable 
metrics update
URL: https://github.com/apache/beam/pull/7408#discussion_r250810281
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleMonitoringInfoBuilder.java
 ##
 @@ -112,7 +112,7 @@ private boolean validate() {
   if (split.size() != 5) {
 LOG.warn(
 "Dropping MonitoringInfo for URN {}, UserMetric namespaces and "
-+ "name cannot contain ':' characters.",
++ "names cannot contain ':' characters.",
 
 Review comment:
   It seems like you're agreeing that my fix is better? I'd changed it from 
"namespaces and name" to "namespaces and names".
   
   I also agree about the `s/,/./` on the comma.
   
   However it seems like this whole warning was dropped in the meantime, so 
this is all moot!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189726)
Time Spent: 2h  (was: 1h 50m)

> Fix user-metric prefix detection for portable Flink metrics
> ---
>
> Key: BEAM-6361
> URL: https://issues.apache.org/jira/browse/BEAM-6361
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.9.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> [~ajam...@google.com] [noticed a 
> bug|https://github.com/apache/beam/pull/7183/files#r243402155] in classifying 
> portable Flink metrics as user-namespaced vs not.
> I think it's only currently affecting how metrics get displayed downstream in 
> Flink, but it should be fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6024) Gradle setupVirtualenv supports Python 3

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6024?focusedWorklogId=189723=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189723
 ]

ASF GitHub Bot logged work on BEAM-6024:


Author: ASF GitHub Bot
Created on: 24/Jan/19 22:51
Start Date: 24/Jan/19 22:51
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #7423: [BEAM-6024] Build 
Python 3 container image with Gradle
URL: https://github.com/apache/beam/pull/7423#issuecomment-457388800
 
 
   Run Seed Job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189723)
Time Spent: 2h 10m  (was: 2h)

> Gradle setupVirtualenv supports Python 3
> 
>
> Key: BEAM-6024
> URL: https://issues.apache.org/jira/browse/BEAM-6024
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Need to depend on Python 3 virtualenv in few places:
> - Build Dataflow worker container in Python 3
> - Run ValidatesRunner and integration tests on Jenkins



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6468) Cannot create empty TestBoundedTable

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6468?focusedWorklogId=189721=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189721
 ]

ASF GitHub Bot logged work on BEAM-6468:


Author: ASF GitHub Bot
Created on: 24/Jan/19 22:32
Start Date: 24/Jan/19 22:32
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #7568: [BEAM-6468] 
Allow creating empty TestBoundedTable
URL: https://github.com/apache/beam/pull/7568#discussion_r250803387
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestBoundedTable.java
 ##
 @@ -92,8 +95,19 @@ public TestBoundedTable addRows(Object... args) {
 
   @Override
   public PCollection buildIOReader(PBegin begin) {
+PTransform> valueTransform;
+
+if (rows.isEmpty()) {
+  valueTransform =
+  Create.empty(
+  SchemaCoder.of(
+  schema, SerializableFunctions.identity(), 
SerializableFunctions.identity()));
+} else {
+  valueTransform = Create.of(rows);
 
 Review comment:
   if rows is empty, Create.of() will throw exception and ask for SchemaCoder
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189721)
Time Spent: 3h  (was: 2h 50m)

> Cannot create empty TestBoundedTable
> 
>
> Key: BEAM-6468
> URL: https://issues.apache.org/jira/browse/BEAM-6468
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6161) Add ElementCount MonitoringInfos for the Java SDK

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6161?focusedWorklogId=189720=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189720
 ]

ASF GitHub Bot logged work on BEAM-6161:


Author: ASF GitHub Bot
Created on: 24/Jan/19 22:25
Start Date: 24/Jan/19 22:25
Worklog Time Spent: 10m 
  Work Description: swegner commented on pull request #7272: [BEAM-6161] 
Introduce PCollectionConsumerRegistry and add ElementCoun…
URL: https://github.com/apache/beam/pull/7272
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189720)
Time Spent: 11h 20m  (was: 11h 10m)

> Add ElementCount MonitoringInfos for the Java SDK
> -
>
> Key: BEAM-6161
> URL: https://issues.apache.org/jira/browse/BEAM-6161
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution, sdk-java-harness
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6468) Cannot create empty TestBoundedTable

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6468?focusedWorklogId=189719=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189719
 ]

ASF GitHub Bot logged work on BEAM-6468:


Author: ASF GitHub Bot
Created on: 24/Jan/19 22:24
Start Date: 24/Jan/19 22:24
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #7568: [BEAM-6468] Allow 
creating empty TestBoundedTable
URL: https://github.com/apache/beam/pull/7568#issuecomment-457381427
 
 
   @akedin @apilloud 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189719)
Time Spent: 2h 50m  (was: 2h 40m)

> Cannot create empty TestBoundedTable
> 
>
> Key: BEAM-6468
> URL: https://issues.apache.org/jira/browse/BEAM-6468
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6504) Integration of Portabability sideInput into Dataflow

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6504?focusedWorklogId=189717=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189717
 ]

ASF GitHub Bot logged work on BEAM-6504:


Author: ASF GitHub Bot
Created on: 24/Jan/19 22:23
Start Date: 24/Jan/19 22:23
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #7619: [BEAM-6504] Create 
Portable sideInput handler for Dataflow (Part One)
URL: https://github.com/apache/beam/pull/7619#issuecomment-457380947
 
 
   R: @boyuanzz  @swegner 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189717)
Time Spent: 20m  (was: 10m)

> Integration of Portabability sideInput into Dataflow
> 
>
> Key: BEAM-6504
> URL: https://issues.apache.org/jira/browse/BEAM-6504
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Underlying fn api support is done in BEAM-2929, this Jira integrates 
> everything into dataflow. 
>  
> 1) introduce a sideInputHandler for dataflow. 
> 2) wire the handler to dataflow runner (i.e.  ProcessRemoteBundleOperation)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=189709=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189709
 ]

ASF GitHub Bot logged work on BEAM-5442:


Author: ASF GitHub Bot
Created on: 24/Jan/19 21:43
Start Date: 24/Jan/19 21:43
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #7597: [WIP] [BEAM-5442] 
Retrieve valid runner options from JobService
URL: https://github.com/apache/beam/pull/7597#discussion_r250788212
 
 

 ##
 File path: model/job-management/src/main/proto/beam_job_api.proto
 ##
 @@ -176,3 +179,34 @@ message JobState {
 CANCELLING = 10;
   }
 }
+
+
+// DescribePipelineOptions provides metadata about the options supported by a 
runner.
+// It will be used by the SDK client to validate the options specified by or
+// list available options to the user. // Throws error GRPC_STATUS_UNAVAILABLE 
if server is down
+message DescribePipelineOptionsRequest {
+  // TODO: base options to support proxy functionality
 
 Review comment:
   I see, thanks for clarifying.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189709)
Time Spent: 17.5h  (was: 17h 20m)

> PortableRunner swallows custom options for Runner
> -
>
> Key: BEAM-5442
> URL: https://issues.apache.org/jira/browse/BEAM-5442
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability, portability-flink
>  Time Spent: 17.5h
>  Remaining Estimate: 0h
>
> The PortableRunner doesn't pass custom PipelineOptions to the executing 
> Runner.
> Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner.
> (The option is just removed during proto translation without any warning)
> We should allow some form of customization through the options, even for the 
> PortableRunner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6443) decrease the number of threads for BigQuery streaming insertAll

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6443?focusedWorklogId=189711=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189711
 ]

ASF GitHub Bot logged work on BEAM-6443:


Author: ASF GitHub Bot
Created on: 24/Jan/19 21:54
Start Date: 24/Jan/19 21:54
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #7547: [BEAM-6443] 
decrease the number of thread for BigQuery streaming inse…
URL: https://github.com/apache/beam/pull/7547#issuecomment-457372034
 
 
   Actually, seems like sharding is per step not per bundle: 
https://github.com/apache/beam/blob/89e322198828702db239998d6dd29a574fe04898/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L139
   
   So the proposal here is to reduce the parallelism of writing to BQ for each 
write step from 50 unlimited sized thread pools to 50 threads.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189711)
Time Spent: 1h 50m  (was: 1h 40m)

> decrease the number of threads for BigQuery streaming insertAll
> ---
>
> Key: BEAM-6443
> URL: https://issues.apache.org/jira/browse/BEAM-6443
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When inserting (a large number of ) very small elements into BigQuery via 
> streaming insertAll, BigQueryIO causes lots of quota exceeded errors. This 
> implies that 1) BigQueryIO puts unnecessary overheads on BigQuery API layer 
> by sending requests too fast 2) log file becomes very big because of repeated 
> same error messages. Currently we use 50 shards for writing data into 
> BigQuery and in each bundle 20-30 futures are executed simultaneously with 
> unlimited thread pool. It would be worth investigating whether just single 
> thread pool is sufficient for running concurrent insertAll.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6487) BigQuery api is out of date

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6487?focusedWorklogId=189710=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189710
 ]

ASF GitHub Bot logged work on BEAM-6487:


Author: ASF GitHub Bot
Created on: 24/Jan/19 21:51
Start Date: 24/Jan/19 21:51
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #7590: [BEAM-6487] Updating 
BQ API
URL: https://github.com/apache/beam/pull/7590#issuecomment-457371406
 
 
   rebased. Hoping this'll fix the lint issue.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189710)
Time Spent: 50m  (was: 40m)

> BigQuery api is out of date
> ---
>
> Key: BEAM-6487
> URL: https://issues.apache.org/jira/browse/BEAM-6487
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=189708=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189708
 ]

ASF GitHub Bot logged work on BEAM-5442:


Author: ASF GitHub Bot
Created on: 24/Jan/19 21:42
Start Date: 24/Jan/19 21:42
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #7597: [WIP] [BEAM-5442] 
Retrieve valid runner options from JobService
URL: https://github.com/apache/beam/pull/7597#discussion_r250788102
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java
 ##
 @@ -630,6 +634,55 @@ public static void printHelp(PrintStream out, Class i
 }
   }
 
+  public static Map describe(Set> ifaces) {
+checkNotNull(ifaces);
+Map result = new HashMap<>();
+
+for (Class iface : ifaces) {
+  CACHE.get().validateWellFormed(iface);
+
+  Set properties = 
PipelineOptionsReflector.getOptionSpecs(iface);
+
+  RowSortedTable, String, Method> ifacePropGetterTable =
+  TreeBasedTable.create(ClassNameComparator.INSTANCE, 
Ordering.natural());
+  for (PipelineOptionSpec prop : properties) {
+ifacePropGetterTable.put(prop.getDefiningInterface(), prop.getName(), 
prop.getGetterMethod());
+  }
+
+  for (Map.Entry, Map> ifaceToPropertyMap :
+  ifacePropGetterTable.rowMap().entrySet()) {
+Class currentIface = ifaceToPropertyMap.getKey();
+Map propertyNamesToGetters = 
ifaceToPropertyMap.getValue();
+
+List lists = 
Lists.newArrayList(propertyNamesToGetters.keySet());
+lists.sort(String.CASE_INSENSITIVE_ORDER);
+for (String propertyName : lists) {
+  Method method = propertyNamesToGetters.get(propertyName);
+  // TODO: type representation
+  String printableType = method.getReturnType().getSimpleName();
 
 Review comment:
   Would it work to have a list of pre-defined types alongside with a mapping 
from Java types to those types?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189708)
Time Spent: 17h 20m  (was: 17h 10m)

> PortableRunner swallows custom options for Runner
> -
>
> Key: BEAM-5442
> URL: https://issues.apache.org/jira/browse/BEAM-5442
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability, portability-flink
>  Time Spent: 17h 20m
>  Remaining Estimate: 0h
>
> The PortableRunner doesn't pass custom PipelineOptions to the executing 
> Runner.
> Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner.
> (The option is just removed during proto translation without any warning)
> We should allow some form of customization through the options, even for the 
> PortableRunner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-6476) Kubectl command missing on workers beam8 and beam11

2019-01-24 Thread Lukasz Gajowy (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy closed BEAM-6476.
---
   Resolution: Fixed
Fix Version/s: Not applicable

> Kubectl command missing on workers beam8 and beam11
> ---
>
> Key: BEAM-6476
> URL: https://issues.apache.org/jira/browse/BEAM-6476
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Jason Kuster
>Priority: Major
> Fix For: Not applicable
>
>
> Kubernetes seems to have gone missing on beam8 and beam11 leading to 
> Performance Tests failures: 
> [https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests]
> {code:java}
> ...
> [EnvInject] - Variables injected successfully.
> [beam_PerformanceTests_JDBC] $ /bin/bash -xe 
> /tmp/jenkins5453785317279275290.sh
> + gcloud container clusters get-credentials io-datastores 
> --zone=us-central1-a --verbosity=debug
> DEBUG: Running [gcloud.container.clusters.get-credentials] with arguments: 
> [--verbosity: "debug", --zone: "us-central1-a", NAME: "io-datastores"]
> WARNING: Accessing a Kubernetes Engine cluster requires the kubernetes 
> commandline
> client [kubectl]. To install, run
>   $ gcloud components install kubectl
> Fetching cluster endpoint and auth data.
> DEBUG: Saved kubeconfig to /home/jenkins/.kube/config
> kubeconfig entry generated for io-datastores.
> INFO: Display format: "default"
> DEBUG: SDK update checks are disabled.
> [beam_PerformanceTests_JDBC] $ /bin/bash -xe 
> /tmp/jenkins5922335513066151017.sh
> + cp /home/jenkins/.kube/config 
> /home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_JDBC/config-beam-performancetests-jdbc-1603
> [beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins693655582377630528.sh
> + kubectl 
> --kubeconfig=/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_JDBC/config-beam-performancetests-jdbc-1603
>  create namespace beam-performancetests-jdbc-1603
> /tmp/jenkins693655582377630528.sh: line 2: kubectl: command not found
> Build step 'Execute shell' marked build as failure
> Sending e-mails to: bui...@beam.apache.org
> Finished: FAILURE
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6476) Kubectl command missing on workers beam8 and beam11

2019-01-24 Thread Lukasz Gajowy (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751609#comment-16751609
 ] 

Lukasz Gajowy commented on BEAM-6476:
-

This has been fixed by infra.

> Kubectl command missing on workers beam8 and beam11
> ---
>
> Key: BEAM-6476
> URL: https://issues.apache.org/jira/browse/BEAM-6476
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Jason Kuster
>Priority: Major
>
> Kubernetes seems to have gone missing on beam8 and beam11 leading to 
> Performance Tests failures: 
> [https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests]
> {code:java}
> ...
> [EnvInject] - Variables injected successfully.
> [beam_PerformanceTests_JDBC] $ /bin/bash -xe 
> /tmp/jenkins5453785317279275290.sh
> + gcloud container clusters get-credentials io-datastores 
> --zone=us-central1-a --verbosity=debug
> DEBUG: Running [gcloud.container.clusters.get-credentials] with arguments: 
> [--verbosity: "debug", --zone: "us-central1-a", NAME: "io-datastores"]
> WARNING: Accessing a Kubernetes Engine cluster requires the kubernetes 
> commandline
> client [kubectl]. To install, run
>   $ gcloud components install kubectl
> Fetching cluster endpoint and auth data.
> DEBUG: Saved kubeconfig to /home/jenkins/.kube/config
> kubeconfig entry generated for io-datastores.
> INFO: Display format: "default"
> DEBUG: SDK update checks are disabled.
> [beam_PerformanceTests_JDBC] $ /bin/bash -xe 
> /tmp/jenkins5922335513066151017.sh
> + cp /home/jenkins/.kube/config 
> /home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_JDBC/config-beam-performancetests-jdbc-1603
> [beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins693655582377630528.sh
> + kubectl 
> --kubeconfig=/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_JDBC/config-beam-performancetests-jdbc-1603
>  create namespace beam-performancetests-jdbc-1603
> /tmp/jenkins693655582377630528.sh: line 2: kubectl: command not found
> Build step 'Execute shell' marked build as failure
> Sending e-mails to: bui...@beam.apache.org
> Finished: FAILURE
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6184) PortableRunner dependency missed in wordcount example maven artifact

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6184?focusedWorklogId=189692=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189692
 ]

ASF GitHub Bot logged work on BEAM-6184:


Author: ASF GitHub Bot
Created on: 24/Jan/19 21:07
Start Date: 24/Jan/19 21:07
Worklog Time Spent: 10m 
  Work Description: swegner commented on pull request #7532: 
[BEAM-6184]Make checkstyle report error on missing javaDocMethod
URL: https://github.com/apache/beam/pull/7532
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189692)
Time Spent: 10h 20m  (was: 10h 10m)

> PortableRunner dependency missed in wordcount example maven artifact
> 
>
> Key: BEAM-6184
> URL: https://issues.apache.org/jira/browse/BEAM-6184
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
>  
>  
> more context: 
> https://lists.apache.org/thread.html/8dd60395424425f7502d62888c49014430d1d3b06c026606f3db28ab@%3Cuser.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6468) Cannot create empty TestBoundedTable

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6468?focusedWorklogId=189698=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189698
 ]

ASF GitHub Bot logged work on BEAM-6468:


Author: ASF GitHub Bot
Created on: 24/Jan/19 21:11
Start Date: 24/Jan/19 21:11
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #7568: [BEAM-6468] Allow 
creating empty TestBoundedTable
URL: https://github.com/apache/beam/pull/7568#issuecomment-457358175
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189698)
Time Spent: 2h 40m  (was: 2.5h)

> Cannot create empty TestBoundedTable
> 
>
> Key: BEAM-6468
> URL: https://issues.apache.org/jira/browse/BEAM-6468
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189697=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189697
 ]

ASF GitHub Bot logged work on BEAM-6154:


Author: ASF GitHub Bot
Created on: 24/Jan/19 21:10
Start Date: 24/Jan/19 21:10
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #7617: [BEAM-6154] 
Update google-apitools to 0.5.26 and fix gcsio in python 3
URL: https://github.com/apache/beam/pull/7617#issuecomment-457357755
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189697)
Time Spent: 1h 40m  (was: 1.5h)

> Gcsio batch delete broken in Python 3
> -
>
> Key: BEAM-6154
> URL: https://issues.apache.org/jira/browse/BEAM-6154
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> I'm running Python SDK agianst GCP in Python 3.5 and got following gcsio 
> error while deleting files:
> {code}
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", 
> line 1077, in 
> window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 315, in finalize_write
> num_threads)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", 
> line 145, in run_using_threadpool
> return pool.map(fn_to_execute, inputs)
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map
> return self._map_async(func, iterable, mapstar, chunksize).get()
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get
> raise self._value
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker
> result = (True, func(*args, **kwds))
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
> return list(map(*args))
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 299, in _rename_batch
> FileSystems.rename(source_files, destination_files)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line 
> 252, in rename
> return filesystem.rename(source_file_names, destination_file_names)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", 
> line 229, in rename
> copy_statuses = gcsio.GcsIO().copy_batch(batch)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", 
> line 322, in copy_batch
> api_calls = batch_request.Execute(self.client._http)  # pylint: 
> disable=protected-access
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 222, in Execute
> batch_http_request.Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 480, in Execute
> self._Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 450, in _Execute
> mime_response = parser.parsestr(header + response.content)
> TypeError: Can't convert 'bytes' object to str implicitly
> {code} 
> After looking into related code in apitools library, I found response.content 
> that's returned via http request to gcs is bytes and apitools didn't handle 
> this scenario. This can be a blocker to any pipeline depending on gcsio and 
> apparently blocks all Dataflow job in Python 3.
> This could be another case that moving off apitools dependency in 
> [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6431) Add ExecutionTime metrics to the Beam Java SDK

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6431?focusedWorklogId=189686=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189686
 ]

ASF GitHub Bot logged work on BEAM-6431:


Author: ASF GitHub Bot
Created on: 24/Jan/19 21:03
Start Date: 24/Jan/19 21:03
Worklog Time Spent: 10m 
  Work Description: ajamato commented on issue #7507: [BEAM-6431] [Do not 
merge] Refactor, Remove references to Dataflow classes in base State Sampling
URL: https://github.com/apache/beam/pull/7507#issuecomment-457355342
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189686)
Time Spent: 50m  (was: 40m)

> Add ExecutionTime metrics to the Beam Java SDK
> --
>
> Key: BEAM-6431
> URL: https://issues.apache.org/jira/browse/BEAM-6431
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This will be done by using the Dataflow worker's StateSampler code. I have 
> put together a refactoring plan
> [here|https://docs.google.com/document/d/1OlAJf4T_CTL9WRH8lP8uQOfLjWYfm8IpRXSe38g34k4/edit#]
> This will include estimating the processing time for the start, process and 
> finish bundle. The python SDK already has an implementation of this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable Python connector

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=189684=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189684
 ]

ASF GitHub Bot logged work on BEAM-3342:


Author: ASF GitHub Bot
Created on: 24/Jan/19 21:02
Start Date: 24/Jan/19 21:02
Worklog Time Spent: 10m 
  Work Description: juan-rael commented on pull request #7367: [BEAM-3342] 
Create a Cloud Bigtable Python connector Write
URL: https://github.com/apache/beam/pull/7367#discussion_r250775235
 
 

 ##
 File path: sdks/python/apache_beam/examples/cookbook/bigtableio_it_test.py
 ##
 @@ -0,0 +1,197 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Unittest for GCP Bigtable testing."""
+from __future__ import absolute_import
+
+import datetime
+import logging
+import random
+import string
+import unittest
+import uuid
+
+import pytz
+
+import apache_beam as beam
+from apache_beam.io.gcp.bigtableio import _BigTableWriteFn
+from apache_beam.metrics.metric import MetricsFilter
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.runners.runner import PipelineState
+from apache_beam.testing.test_pipeline import TestPipeline
+
+# Protect against environments where bigtable library is not available.
+# pylint: disable=wrong-import-order, wrong-import-position
+try:
+  from google.cloud._helpers import _datetime_from_microseconds
+  from google.cloud._helpers import _microseconds_from_datetime
+  from google.cloud._helpers import UTC
+  from google.cloud.bigtable import row, column_family, Client
+except ImportError:
+  Client = None
+  UTC = pytz.utc
+  _microseconds_from_datetime = lambda label_stamp: label_stamp
+  _datetime_from_microseconds = lambda micro: micro
+
+
+EXISTING_INSTANCES = []
+LABEL_KEY = u'python-bigtable-beam'
+label_stamp = datetime.datetime.utcnow().replace(tzinfo=UTC)
+label_stamp_micros = _microseconds_from_datetime(label_stamp)
+LABELS = {LABEL_KEY: str(label_stamp_micros)}
+
+
+def _retry_on_unavailable(exc):
+  """Retry only errors whose status code is 'UNAVAILABLE'."""
+  from grpc import StatusCode
+  return exc.code() == StatusCode.UNAVAILABLE
+
+
+class WriteToBigTable(beam.PTransform):
 
 Review comment:
   I correct all the problems...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189684)
Time Spent: 13h 40m  (was: 13.5h)

> Create a Cloud Bigtable Python connector
> 
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: Major
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189611=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189611
 ]

ASF GitHub Bot logged work on BEAM-6154:


Author: ASF GitHub Bot
Created on: 24/Jan/19 18:36
Start Date: 24/Jan/19 18:36
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #7617: [BEAM-6154] Update 
google-apitools to 0.5.26 and fix gcsio in python 3
URL: https://github.com/apache/beam/pull/7617#issuecomment-457307838
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189611)
Time Spent: 1h 10m  (was: 1h)

> Gcsio batch delete broken in Python 3
> -
>
> Key: BEAM-6154
> URL: https://issues.apache.org/jira/browse/BEAM-6154
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I'm running Python SDK agianst GCP in Python 3.5 and got following gcsio 
> error while deleting files:
> {code}
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", 
> line 1077, in 
> window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 315, in finalize_write
> num_threads)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", 
> line 145, in run_using_threadpool
> return pool.map(fn_to_execute, inputs)
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map
> return self._map_async(func, iterable, mapstar, chunksize).get()
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get
> raise self._value
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker
> result = (True, func(*args, **kwds))
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
> return list(map(*args))
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 299, in _rename_batch
> FileSystems.rename(source_files, destination_files)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line 
> 252, in rename
> return filesystem.rename(source_file_names, destination_file_names)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", 
> line 229, in rename
> copy_statuses = gcsio.GcsIO().copy_batch(batch)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", 
> line 322, in copy_batch
> api_calls = batch_request.Execute(self.client._http)  # pylint: 
> disable=protected-access
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 222, in Execute
> batch_http_request.Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 480, in Execute
> self._Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 450, in _Execute
> mime_response = parser.parsestr(header + response.content)
> TypeError: Can't convert 'bytes' object to str implicitly
> {code} 
> After looking into related code in apitools library, I found response.content 
> that's returned via http request to gcs is bytes and apitools didn't handle 
> this scenario. This can be a blocker to any pipeline depending on gcsio and 
> apparently blocks all Dataflow job in Python 3.
> This could be another case that moving off apitools dependency in 
> [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6161) Add ElementCount MonitoringInfos for the Java SDK

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6161?focusedWorklogId=189687=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189687
 ]

ASF GitHub Bot logged work on BEAM-6161:


Author: ASF GitHub Bot
Created on: 24/Jan/19 21:03
Start Date: 24/Jan/19 21:03
Worklog Time Spent: 10m 
  Work Description: ajamato commented on issue #7272: [BEAM-6161] Introduce 
PCollectionConsumerRegistry and add ElementCoun…
URL: https://github.com/apache/beam/pull/7272#issuecomment-457355367
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189687)
Time Spent: 11h 10m  (was: 11h)

> Add ElementCount MonitoringInfos for the Java SDK
> -
>
> Key: BEAM-6161
> URL: https://issues.apache.org/jira/browse/BEAM-6161
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution, sdk-java-harness
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5933) PCollectionViews$SimplePCollectionView.hashCode allocates memory

2019-01-24 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751579#comment-16751579
 ] 

Ismaël Mejía commented on BEAM-5933:


clone? close? I did not understand what is the reality? isn't allocating extra 
memory an issue?

> PCollectionViews$SimplePCollectionView.hashCode allocates memory
> 
>
> Key: BEAM-5933
> URL: https://issues.apache.org/jira/browse/BEAM-5933
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.8.0
>Reporter: Vojtech Janota
>Assignee: Vojtech Janota
>Priority: Trivial
> Fix For: 2.9.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> I'm currently profiling memory consumption of our Beam pipeline and have 
> noticed that
>     
> org.apache.beam.sdk.values.PCollectionViews$SimplePCollectionView.hashCode()
> makes noticeable heap allocations. The implementation is:
>     return Objects.hash(tag);
> That itself translates to:
>     return Arrays.hashCode(values);
> Which performs implicit array creation in order to call:
>     public static int Arrays.hashCode(Object a[]);
> Instead of the helper call, doing simple:
>     tag.hashCode();
> Seems more appropriate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-6354) Hanging SplittableDoFnTest#testLateData

2019-01-24 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-6354.
---
Resolution: Fixed

> Hanging SplittableDoFnTest#testLateData
> ---
>
> Key: BEAM-6354
> URL: https://issues.apache.org/jira/browse/BEAM-6354
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Gleb Kanterov
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.10.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It seems that they have a similar root cause because both of them use 
> unbounded streams.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6504) Integration of Portabability sideInput into Dataflow

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6504?focusedWorklogId=189666=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189666
 ]

ASF GitHub Bot logged work on BEAM-6504:


Author: ASF GitHub Bot
Created on: 24/Jan/19 20:14
Start Date: 24/Jan/19 20:14
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on pull request #7619: [BEAM-6504] 
Create sideInput handler for Dataflow (Part One)
URL: https://github.com/apache/beam/pull/7619
 
 
   Create a SideInput handler inside dataflow fnApi runner. 
   
   There will be another PR to actually wire this independent module. 
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | --- | --- | --- | ---
   
   
   
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189666)
Time Spent: 10m
Remaining Estimate: 0h

> Integration of Portabability sideInput into Dataflow
> 
>
> Key: BEAM-6504
>  

[jira] [Work logged] (BEAM-6184) PortableRunner dependency missed in wordcount example maven artifact

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6184?focusedWorklogId=189654=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189654
 ]

ASF GitHub Bot logged work on BEAM-6184:


Author: ASF GitHub Bot
Created on: 24/Jan/19 19:55
Start Date: 24/Jan/19 19:55
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #7532: [BEAM-6184]Make 
checkstyle report error on missing javadocmethod
URL: https://github.com/apache/beam/pull/7532#issuecomment-457334156
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189654)
Time Spent: 10h  (was: 9h 50m)

> PortableRunner dependency missed in wordcount example maven artifact
> 
>
> Key: BEAM-6184
> URL: https://issues.apache.org/jira/browse/BEAM-6184
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
>  
>  
> more context: 
> https://lists.apache.org/thread.html/8dd60395424425f7502d62888c49014430d1d3b06c026606f3db28ab@%3Cuser.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6504) Integration of Portabability sideInput into Dataflow

2019-01-24 Thread Ruoyun Huang (JIRA)
Ruoyun Huang created BEAM-6504:
--

 Summary: Integration of Portabability sideInput into Dataflow
 Key: BEAM-6504
 URL: https://issues.apache.org/jira/browse/BEAM-6504
 Project: Beam
  Issue Type: New Feature
  Components: runner-dataflow
Reporter: Ruoyun Huang
Assignee: Ruoyun Huang


Underlying fn api support is done in BEAM-2929, this Jira integrates everything 
into dataflow. 

 

1) introduce a sideInputHandler for dataflow. 

2) wire the handler to dataflow runner (i.e.  ProcessRemoteBundleOperation)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189665=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189665
 ]

ASF GitHub Bot logged work on BEAM-6154:


Author: ASF GitHub Bot
Created on: 24/Jan/19 20:09
Start Date: 24/Jan/19 20:09
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #7617: [BEAM-6154] 
Update google-apitools to 0.5.26 and fix gcsio in python 3
URL: https://github.com/apache/beam/pull/7617#issuecomment-457338679
 
 
   Run Python Dataflow ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189665)
Time Spent: 1.5h  (was: 1h 20m)

> Gcsio batch delete broken in Python 3
> -
>
> Key: BEAM-6154
> URL: https://issues.apache.org/jira/browse/BEAM-6154
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I'm running Python SDK agianst GCP in Python 3.5 and got following gcsio 
> error while deleting files:
> {code}
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", 
> line 1077, in 
> window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 315, in finalize_write
> num_threads)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", 
> line 145, in run_using_threadpool
> return pool.map(fn_to_execute, inputs)
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map
> return self._map_async(func, iterable, mapstar, chunksize).get()
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get
> raise self._value
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker
> result = (True, func(*args, **kwds))
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
> return list(map(*args))
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 299, in _rename_batch
> FileSystems.rename(source_files, destination_files)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line 
> 252, in rename
> return filesystem.rename(source_file_names, destination_file_names)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", 
> line 229, in rename
> copy_statuses = gcsio.GcsIO().copy_batch(batch)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", 
> line 322, in copy_batch
> api_calls = batch_request.Execute(self.client._http)  # pylint: 
> disable=protected-access
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 222, in Execute
> batch_http_request.Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 480, in Execute
> self._Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 450, in _Execute
> mime_response = parser.parsestr(header + response.content)
> TypeError: Can't convert 'bytes' object to str implicitly
> {code} 
> After looking into related code in apitools library, I found response.content 
> that's returned via http request to gcs is bytes and apitools didn't handle 
> this scenario. This can be a blocker to any pipeline depending on gcsio and 
> apparently blocks all Dataflow job in Python 3.
> This could be another case that moving off apitools dependency in 
> [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6184) PortableRunner dependency missed in wordcount example maven artifact

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6184?focusedWorklogId=189652=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189652
 ]

ASF GitHub Bot logged work on BEAM-6184:


Author: ASF GitHub Bot
Created on: 24/Jan/19 19:54
Start Date: 24/Jan/19 19:54
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #7532: [BEAM-6184]Make 
checkstyle report error on missing javadocmethod
URL: https://github.com/apache/beam/pull/7532#issuecomment-457333792
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189652)
Time Spent: 9h 40m  (was: 9.5h)

> PortableRunner dependency missed in wordcount example maven artifact
> 
>
> Key: BEAM-6184
> URL: https://issues.apache.org/jira/browse/BEAM-6184
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
>  
>  
> more context: 
> https://lists.apache.org/thread.html/8dd60395424425f7502d62888c49014430d1d3b06c026606f3db28ab@%3Cuser.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6184) PortableRunner dependency missed in wordcount example maven artifact

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6184?focusedWorklogId=189656=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189656
 ]

ASF GitHub Bot logged work on BEAM-6184:


Author: ASF GitHub Bot
Created on: 24/Jan/19 20:02
Start Date: 24/Jan/19 20:02
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #7532: [BEAM-6184]Make 
checkstyle report error on missing javadocmethod
URL: https://github.com/apache/beam/pull/7532#issuecomment-457336629
 
 
   synced to HEAD and now shows all green 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189656)
Time Spent: 10h 10m  (was: 10h)

> PortableRunner dependency missed in wordcount example maven artifact
> 
>
> Key: BEAM-6184
> URL: https://issues.apache.org/jira/browse/BEAM-6184
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
>  
>  
> more context: 
> https://lists.apache.org/thread.html/8dd60395424425f7502d62888c49014430d1d3b06c026606f3db28ab@%3Cuser.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6184) PortableRunner dependency missed in wordcount example maven artifact

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6184?focusedWorklogId=189653=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189653
 ]

ASF GitHub Bot logged work on BEAM-6184:


Author: ASF GitHub Bot
Created on: 24/Jan/19 19:55
Start Date: 24/Jan/19 19:55
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #7532: [BEAM-6184]Make 
checkstyle report error on missing javadocmethod
URL: https://github.com/apache/beam/pull/7532#issuecomment-457334112
 
 
   Run Seed Job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189653)
Time Spent: 9h 50m  (was: 9h 40m)

> PortableRunner dependency missed in wordcount example maven artifact
> 
>
> Key: BEAM-6184
> URL: https://issues.apache.org/jira/browse/BEAM-6184
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
>  
>  
> more context: 
> https://lists.apache.org/thread.html/8dd60395424425f7502d62888c49014430d1d3b06c026606f3db28ab@%3Cuser.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=189645=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189645
 ]

ASF GitHub Bot logged work on BEAM-5442:


Author: ASF GitHub Bot
Created on: 24/Jan/19 19:39
Start Date: 24/Jan/19 19:39
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #7564: [release] Revert 
"[BEAM-5442] Pass SDK-unknown PipelineOptions to Runner"
URL: https://github.com/apache/beam/pull/7564#issuecomment-457329021
 
 
   Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189645)
Time Spent: 17h  (was: 16h 50m)

> PortableRunner swallows custom options for Runner
> -
>
> Key: BEAM-5442
> URL: https://issues.apache.org/jira/browse/BEAM-5442
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability, portability-flink
>  Time Spent: 17h
>  Remaining Estimate: 0h
>
> The PortableRunner doesn't pass custom PipelineOptions to the executing 
> Runner.
> Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner.
> (The option is just removed during proto translation without any warning)
> We should allow some form of customization through the options, even for the 
> PortableRunner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=189649=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189649
 ]

ASF GitHub Bot logged work on BEAM-5442:


Author: ASF GitHub Bot
Created on: 24/Jan/19 19:46
Start Date: 24/Jan/19 19:46
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #7597: [WIP] 
[BEAM-5442] Retrieve valid runner options from JobService
URL: https://github.com/apache/beam/pull/7597#discussion_r250697265
 
 

 ##
 File path: model/job-management/src/main/proto/beam_job_api.proto
 ##
 @@ -176,3 +179,37 @@ message JobState {
 CANCELLING = 10;
   }
 }
+
+
+// DescribePipelineOptions provides metadata about the options supported by a 
runner.
+// It will be used by the SDK client to validate the options specified by or
+// list available options to the user.
+// Throws error GRPC_STATUS_UNAVAILABLE if server is down
+message DescribePipelineOptionsRequest {
+  // TODO: base options to support proxy functionality
+}
+
+// Metadata for a pipeline option.
+message PipelineOptionDescriptor {
+  // (Required) The option key.
+  string key = 1;
+
+  // (Required) Type of option.
+  // TODO: enumerate
+  string type = 2;
+
+  // (Optional) Description suitable for display / help text.
+  string description = 3;
+
+  // (Optional) Default value.
+  string default_value = 4;
+
+  // (Required) The group this option belongs to.
+  string group = 5;
+}
+
+message DescribePipelineOptionsResponse {
+  // Map of pipeline option key to descriptor.
+  // Note that keys are expected to be unique accross groups.
+  map options = 1;
 
 Review comment:
   @mxm this map/key may complicate things if we later introduce scoping. 
Perhaps just return all options as list?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189649)
Time Spent: 17h 10m  (was: 17h)

> PortableRunner swallows custom options for Runner
> -
>
> Key: BEAM-5442
> URL: https://issues.apache.org/jira/browse/BEAM-5442
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability, portability-flink
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> The PortableRunner doesn't pass custom PipelineOptions to the executing 
> Runner.
> Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner.
> (The option is just removed during proto translation without any warning)
> We should allow some form of customization through the options, even for the 
> PortableRunner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4620) UnboundedReadFromBoundedSource.split() should always call split()

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4620?focusedWorklogId=189648=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189648
 ]

ASF GitHub Bot logged work on BEAM-4620:


Author: ASF GitHub Bot
Created on: 24/Jan/19 19:45
Start Date: 24/Jan/19 19:45
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #7555: [BEAM-4620] 
UnboundedReadFromBoundedSource invokes split for small bounded sources
URL: https://github.com/apache/beam/pull/7555#issuecomment-457331131
 
 
   PTAL.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189648)
Time Spent: 50m  (was: 40m)

> UnboundedReadFromBoundedSource.split() should always call split()
> -
>
> Key: BEAM-4620
> URL: https://issues.apache.org/jira/browse/BEAM-4620
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Josh M
>Assignee: Chamikara Jayalath
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> If the source contains too little data (rounds the result down to 0), or if 
> size estimation fails, then we don't call .split().  We must always call 
> split(); instead of returning original source, we should have computed some 
> fallback value for desiredBundleSize.
> This bug has existed since the code was first introduced, so it is not a 
> regression - but it is certainly a bug.
>  
> source: 
> https://github.com/apache/beam/blob/697a1d17e473cd5b097aaaeee24c08f43cc77f58/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/UnboundedReadFromBoundedSource.java#L137



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4620) UnboundedReadFromBoundedSource.split() should always call split()

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4620?focusedWorklogId=189647=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189647
 ]

ASF GitHub Bot logged work on BEAM-4620:


Author: ASF GitHub Bot
Created on: 24/Jan/19 19:45
Start Date: 24/Jan/19 19:45
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #7555: 
[BEAM-4620] UnboundedReadFromBoundedSource invokes split for small bounded 
sources
URL: https://github.com/apache/beam/pull/7555#discussion_r250752064
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/UnboundedReadFromBoundedSource.java
 ##
 @@ -127,9 +131,11 @@ public void validate() {
 long desiredBundleSize = boundedSource.getEstimatedSizeBytes(options) 
/ desiredNumSplits;
 if (desiredBundleSize <= 0) {
   LOG.warn(
-  "BoundedSource {} cannot estimate its size, skips the initial 
splits.",
 
 Review comment:
   Your comment makes sense. To make matters more complicated our API instructs 
source authors to return 0L if size of source cannot be determined. 
   
   So I updated code to pick a default size of 64MB in case 0 is returned as 
the estimated source. Otherwise we'll try to maximize splitting by setting 
desiredBindleSize to 1 when a valid value cannot be determined.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189647)
Time Spent: 40m  (was: 0.5h)

> UnboundedReadFromBoundedSource.split() should always call split()
> -
>
> Key: BEAM-4620
> URL: https://issues.apache.org/jira/browse/BEAM-4620
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Josh M
>Assignee: Chamikara Jayalath
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> If the source contains too little data (rounds the result down to 0), or if 
> size estimation fails, then we don't call .split().  We must always call 
> split(); instead of returning original source, we should have computed some 
> fallback value for desiredBundleSize.
> This bug has existed since the code was first introduced, so it is not a 
> regression - but it is certainly a bug.
>  
> source: 
> https://github.com/apache/beam/blob/697a1d17e473cd5b097aaaeee24c08f43cc77f58/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/UnboundedReadFromBoundedSource.java#L137



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6501) [beam_PostCommit_Python_VR_Flink] [Flake] Java runtime runs out of memory.

2019-01-24 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751464#comment-16751464
 ] 

Ankur Goenka commented on BEAM-6501:


It looks like we are requesting a big chunk of memory from jvm which is causing 
this issues.
*05:06:15* Java HotSpot(TM) 64-Bit Server VM warning: INFO: 
os::commit_memory(0x00034360, 3581411328, 0) failed; error='Cannot 
allocate memory' (errno=12)*05:06:15* #*05:06:15* # There is insufficient 
memory for the Java Runtime Environment to continue.*05:06:15* # Native memory 
allocation (mmap) failed to map 3581411328 bytes for committing reserved 
memory.*05:06:15* # An error report file with more information is saved 
as:*05:06:15* # 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_VR_Flink/src/sdks/python/hs_err_pid10167.log*05:06:16*
 Exception in thread wait_until_finish_read:*05:06:16* Traceback (most recent 
call last):*05:06:16*   File "/usr/lib/python2.7/threading.py", line 801, in 
__bootstrap_inner*05:06:16* self.run()*05:06:16*   File 
"/usr/lib/python2.7/threading.py", line 754, in run*05:06:16* 
self.__target(*self.__args, **self.__kwargs)*05:06:16*   File 
"apache_beam/runners/portability/portable_runner.py", line 331, in 
read_messages*05:06:16* for message in self._message_stream:*05:06:16*   
File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_VR_Flink/src/build/gradleenv/1327086738/local/lib/python2.7/site-packages/grpc/_channel.py",
 line 367, in next*05:06:16* return self._next()*05:06:16*   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_VR_Flink/src/build/gradleenv/1327086738/local/lib/python2.7/site-packages/grpc/_channel.py",
 line 358, in _next*05:06:16* raise self*05:06:16* _Rendezvous: 
<_Rendezvous of RPC that terminated with:*05:06:16*   status = 
StatusCode.UNAVAILABLE*05:06:16*   details = "Socket closed"*05:06:16* 
debug_error_string = "\{"created":"@1548335176.725835112","description":"Error 
received from 
peer","file":"src/core/lib/surface/call.cc","file_line":1036,"grpc_message":"Socket
 closed","grpc_status":14}"*05:06:16* >*05:06:16* *06:43:22* Build timed out 
(after 100 minutes). Marking the build as aborted.*06:43:22* EssBuild was 
aborted*06:43:23* No emails were triggered.*06:43:23* Finished: ABORTED

> [beam_PostCommit_Python_VR_Flink] [Flake] Java runtime runs out of memory.
> --
>
> Key: BEAM-6501
> URL: https://issues.apache.org/jira/browse/BEAM-6501
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, test-failures
>Reporter: Daniel Oliveira
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: flake
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/86/]
> Initial investigation:
> Test seems to have timed out after failing due to hitting a memory limit. 
> From the bottom of the logs:
> {noformat}
> 05:06:15 # There is insufficient memory for the Java Runtime Environment to 
> continue.
> 05:06:15 # Native memory allocation (mmap) failed to map 3581411328 bytes for 
> committing reserved memory.
> {noformat}
> May be a memory leak or a quota issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=189636=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189636
 ]

ASF GitHub Bot logged work on BEAM-5442:


Author: ASF GitHub Bot
Created on: 24/Jan/19 19:21
Start Date: 24/Jan/19 19:21
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #7564: [release] Revert 
"[BEAM-5442] Pass SDK-unknown PipelineOptions to Runner"
URL: https://github.com/apache/beam/pull/7564#issuecomment-457322591
 
 
   I am fixing the image now, but I am happy to merge this, too. Final 
verification suffices.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189636)
Time Spent: 16h 40m  (was: 16.5h)

> PortableRunner swallows custom options for Runner
> -
>
> Key: BEAM-5442
> URL: https://issues.apache.org/jira/browse/BEAM-5442
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability, portability-flink
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>
> The PortableRunner doesn't pass custom PipelineOptions to the executing 
> Runner.
> Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner.
> (The option is just removed during proto translation without any warning)
> We should allow some form of customization through the options, even for the 
> PortableRunner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6468) Cannot create empty TestBoundedTable

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6468?focusedWorklogId=189635=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189635
 ]

ASF GitHub Bot logged work on BEAM-6468:


Author: ASF GitHub Bot
Created on: 24/Jan/19 19:19
Start Date: 24/Jan/19 19:19
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #7568: [BEAM-6468] Allow 
creating empty TestBoundedTable
URL: https://github.com/apache/beam/pull/7568#issuecomment-457322129
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189635)
Time Spent: 2.5h  (was: 2h 20m)

> Cannot create empty TestBoundedTable
> 
>
> Key: BEAM-6468
> URL: https://issues.apache.org/jira/browse/BEAM-6468
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6489) Python precommits are failiing due to a pip error: "no such option: --process-dependency-links"

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6489?focusedWorklogId=189639=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189639
 ]

ASF GitHub Bot logged work on BEAM-6489:


Author: ASF GitHub Bot
Created on: 24/Jan/19 19:22
Start Date: 24/Jan/19 19:22
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #7611: 
[BEAM-6489] Cherry-pick 10f4b1a5ef to 2.10.0 release branch
URL: https://github.com/apache/beam/pull/7611
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189639)
Time Spent: 4.5h  (was: 4h 20m)

> Python precommits are failiing due to a pip error: "no such option: 
> --process-dependency-links" 
> 
>
> Key: BEAM-6489
> URL: https://issues.apache.org/jira/browse/BEAM-6489
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
>  Labels: currently-failing
> Fix For: 2.11.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> {noformat}
> 13:35:37 GLOB sdist-make: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/setup.py
> 13:35:37 docs create: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/target/.tox/docs
> 13:35:39 docs installdeps: Sphinx==1.6.5, sphinx_rtd_theme==0.2.4
> 13:35:39 ERROR: invocation failed (exit code 2), logfile: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/target/.tox/docs/log/docs-1.log
> 13:35:39 ERROR: actionid: docs
> 13:35:39 msg: getenv
> 13:35:39 cmdargs: 
> ['/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/target/.tox/docs/bin/python',
>  
> '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/target/.tox/docs/bin/pip',
>  'install', '--retries', '10', '--process-dependency-links', 'Sphinx==1.6.5', 
> 'sphinx_rtd_theme==0.2.4']
> 13:35:39 
> 13:35:39 
> 13:35:39 Usage:   
> 13:35:39   pip install [options]  
> [package-index-options] ...
> 13:35:39   pip install [options] -r  
> [package-index-options] ...
> 13:35:39   pip install [options] [-e]  ...
> 13:35:39   pip install [options] [-e]  ...
> 13:35:39   pip install [options]  ...
> 13:35:39 
> 13:35:39 no such option: --process-dependency-links
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=189638=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189638
 ]

ASF GitHub Bot logged work on BEAM-5442:


Author: ASF GitHub Bot
Created on: 24/Jan/19 19:21
Start Date: 24/Jan/19 19:21
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #7564: [release] 
Revert "[BEAM-5442] Pass SDK-unknown PipelineOptions to Runner"
URL: https://github.com/apache/beam/pull/7564
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189638)
Time Spent: 16h 50m  (was: 16h 40m)

> PortableRunner swallows custom options for Runner
> -
>
> Key: BEAM-5442
> URL: https://issues.apache.org/jira/browse/BEAM-5442
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability, portability-flink
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> The PortableRunner doesn't pass custom PipelineOptions to the executing 
> Runner.
> Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner.
> (The option is just removed during proto translation without any warning)
> We should allow some form of customization through the options, even for the 
> PortableRunner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6503) PCollectionViews$SimplePCollectionView.hashCode once again allocates memory (fix reverted, then fixed again)

2019-01-24 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-6503:
-

 Summary: PCollectionViews$SimplePCollectionView.hashCode once 
again allocates memory (fix reverted, then fixed again)
 Key: BEAM-6503
 URL: https://issues.apache.org/jira/browse/BEAM-6503
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Affects Versions: 2.8.0
Reporter: Vojtech Janota
Assignee: Vojtech Janota
 Fix For: 2.9.0


I'm currently profiling memory consumption of our Beam pipeline and have 
noticed that

    org.apache.beam.sdk.values.PCollectionViews$SimplePCollectionView.hashCode()

makes noticeable heap allocations. The implementation is:

    return Objects.hash(tag);

That itself translates to:

    return Arrays.hashCode(values);

Which performs implicit array creation in order to call:

    public static int Arrays.hashCode(Object a[]);

Instead of the helper call, doing simple:

    tag.hashCode();

Seems more appropriate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6324) CassandraIO.Read - Add the ability to provide a filter to the query

2019-01-24 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751440#comment-16751440
 ] 

Kenneth Knowles commented on BEAM-6324:
---

The PR is still open and I don't see that this is actually in 2.9.0. Just 
updating the Fix Version field so Jira reflects this.

> CassandraIO.Read - Add the ability to provide a filter to the query
> ---
>
> Key: BEAM-6324
> URL: https://issues.apache.org/jira/browse/BEAM-6324
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Affects Versions: 2.9.0
>Reporter: Shahar Frank
>Assignee: Shahar Frank
>Priority: Major
>  Labels: performance, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CassandraIO.Read doesn't support using WHERE to filter the input at the 
> source (In Cassandra) which might provide great performance boost.
> Already implemented by:
> https://github.com/apache/beam/pull/7340



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6500) Precomit broken due to style violation

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6500?focusedWorklogId=189626=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189626
 ]

ASF GitHub Bot logged work on BEAM-6500:


Author: ASF GitHub Bot
Created on: 24/Jan/19 19:08
Start Date: 24/Jan/19 19:08
Worklog Time Spent: 10m 
  Work Description: swegner commented on issue #7618: [BEAM-6500] Add 
missing javadocs for new public methods
URL: https://github.com/apache/beam/pull/7618#issuecomment-457318412
 
 
   Java pre-commit is passing and this comment-only change doesn't effect 
Python. I believe this is ready to merge.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189626)
Time Spent: 0.5h  (was: 20m)

> Precomit broken due to style violation
> --
>
> Key: BEAM-6500
> URL: https://issues.apache.org/jira/browse/BEAM-6500
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Scott Wegner
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This seems to be broken for all PRs now
>  
> :beam-runners-direct-java:checkstyleMain FAILED [ant:checkstyle] [WARN] 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_Phrase/src/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/ReferenceRunner.java:164:3:
>  Missing a Javadoc comment. [JavadocMethod] [ant:checkstyle] [ERROR] 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_Phrase/src/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/job/ReferenceRunnerJobServer.java:147:
>  Missing a Javadoc comment. [JavadocType]
>  
> https://scans.gradle.com/s/nwgb7xegklwqo/console-log?task=:beam-runners-direct-java:checkstyleMain



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6503) PCollectionViews$SimplePCollectionView.hashCode once again allocates memory (fix reverted, then fixed again)

2019-01-24 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-6503:
--
Affects Version/s: (was: 2.8.0)
   2.10.0

> PCollectionViews$SimplePCollectionView.hashCode once again allocates memory 
> (fix reverted, then fixed again)
> 
>
> Key: BEAM-6503
> URL: https://issues.apache.org/jira/browse/BEAM-6503
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.10.0
>Reporter: Vojtech Janota
>Assignee: Vojtech Janota
>Priority: Trivial
>
> I'm currently profiling memory consumption of our Beam pipeline and have 
> noticed that
>     
> org.apache.beam.sdk.values.PCollectionViews$SimplePCollectionView.hashCode()
> makes noticeable heap allocations. The implementation is:
>     return Objects.hash(tag);
> That itself translates to:
>     return Arrays.hashCode(values);
> Which performs implicit array creation in order to call:
>     public static int Arrays.hashCode(Object a[]);
> Instead of the helper call, doing simple:
>     tag.hashCode();
> Seems more appropriate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6503) PCollectionViews$SimplePCollectionView.hashCode once again allocates memory (fix reverted, then fixed again)

2019-01-24 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-6503:
--
Fix Version/s: (was: 2.9.0)

> PCollectionViews$SimplePCollectionView.hashCode once again allocates memory 
> (fix reverted, then fixed again)
> 
>
> Key: BEAM-6503
> URL: https://issues.apache.org/jira/browse/BEAM-6503
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.8.0
>Reporter: Vojtech Janota
>Assignee: Vojtech Janota
>Priority: Trivial
>
> I'm currently profiling memory consumption of our Beam pipeline and have 
> noticed that
>     
> org.apache.beam.sdk.values.PCollectionViews$SimplePCollectionView.hashCode()
> makes noticeable heap allocations. The implementation is:
>     return Objects.hash(tag);
> That itself translates to:
>     return Arrays.hashCode(values);
> Which performs implicit array creation in order to call:
>     public static int Arrays.hashCode(Object a[]);
> Instead of the helper call, doing simple:
>     tag.hashCode();
> Seems more appropriate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5933) PCollectionViews$SimplePCollectionView.hashCode allocates memory

2019-01-24 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751443#comment-16751443
 ] 

Kenneth Knowles commented on BEAM-5933:
---

I was just going through the Release tab of Jira and I think it might be good 
to represent the reality there, so I will clone.

> PCollectionViews$SimplePCollectionView.hashCode allocates memory
> 
>
> Key: BEAM-5933
> URL: https://issues.apache.org/jira/browse/BEAM-5933
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.8.0
>Reporter: Vojtech Janota
>Assignee: Vojtech Janota
>Priority: Trivial
> Fix For: 2.9.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> I'm currently profiling memory consumption of our Beam pipeline and have 
> noticed that
>     
> org.apache.beam.sdk.values.PCollectionViews$SimplePCollectionView.hashCode()
> makes noticeable heap allocations. The implementation is:
>     return Objects.hash(tag);
> That itself translates to:
>     return Arrays.hashCode(values);
> Which performs implicit array creation in order to call:
>     public static int Arrays.hashCode(Object a[]);
> Instead of the helper call, doing simple:
>     tag.hashCode();
> Seems more appropriate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5933) PCollectionViews$SimplePCollectionView.hashCode allocates memory

2019-01-24 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-5933.
---
Resolution: Fixed

> PCollectionViews$SimplePCollectionView.hashCode allocates memory
> 
>
> Key: BEAM-5933
> URL: https://issues.apache.org/jira/browse/BEAM-5933
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.8.0
>Reporter: Vojtech Janota
>Assignee: Vojtech Janota
>Priority: Trivial
> Fix For: 2.9.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> I'm currently profiling memory consumption of our Beam pipeline and have 
> noticed that
>     
> org.apache.beam.sdk.values.PCollectionViews$SimplePCollectionView.hashCode()
> makes noticeable heap allocations. The implementation is:
>     return Objects.hash(tag);
> That itself translates to:
>     return Arrays.hashCode(values);
> Which performs implicit array creation in order to call:
>     public static int Arrays.hashCode(Object a[]);
> Instead of the helper call, doing simple:
>     tag.hashCode();
> Seems more appropriate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5446) SplittableDoFn: Remove runner time execution information from public API surface

2019-01-24 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751441#comment-16751441
 ] 

Kenneth Knowles commented on BEAM-5446:
---

Since this  _was_ included in 2.9.0 but will not be in 2.10.0, I am going to 
re-close and clone.

> SplittableDoFn: Remove runner time execution information from public API 
> surface
> 
>
> Key: BEAM-5446
> URL: https://issues.apache.org/jira/browse/BEAM-5446
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Scott Wegner
>Priority: Minor
> Fix For: 2.9.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Move the setting of "claim observers" within RestrictionTracker to another 
> location to clean up the RestrictionTracker interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-6502) SplittableDoFn: Re-Remove runner time execution information from public API surface

2019-01-24 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-6502:
-

Assignee: Scott Wegner  (was: Luke Cwik)

> SplittableDoFn: Re-Remove runner time execution information from public API 
> surface
> ---
>
> Key: BEAM-6502
> URL: https://issues.apache.org/jira/browse/BEAM-6502
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Scott Wegner
>Priority: Minor
> Fix For: 2.11.0
>
>
> Move the setting of "claim observers" within RestrictionTracker to another 
> location to clean up the RestrictionTracker interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6500) Precomit broken due to style violation

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6500?focusedWorklogId=189627=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189627
 ]

ASF GitHub Bot logged work on BEAM-6500:


Author: ASF GitHub Bot
Created on: 24/Jan/19 19:08
Start Date: 24/Jan/19 19:08
Worklog Time Spent: 10m 
  Work Description: swegner commented on pull request #7618: [BEAM-6500] 
Add missing javadocs for new public methods
URL: https://github.com/apache/beam/pull/7618
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189627)
Time Spent: 40m  (was: 0.5h)

> Precomit broken due to style violation
> --
>
> Key: BEAM-6500
> URL: https://issues.apache.org/jira/browse/BEAM-6500
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Scott Wegner
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This seems to be broken for all PRs now
>  
> :beam-runners-direct-java:checkstyleMain FAILED [ant:checkstyle] [WARN] 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_Phrase/src/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/ReferenceRunner.java:164:3:
>  Missing a Javadoc comment. [JavadocMethod] [ant:checkstyle] [ERROR] 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_Phrase/src/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/job/ReferenceRunnerJobServer.java:147:
>  Missing a Javadoc comment. [JavadocType]
>  
> https://scans.gradle.com/s/nwgb7xegklwqo/console-log?task=:beam-runners-direct-java:checkstyleMain



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-5446) SplittableDoFn: Remove runner time execution information from public API surface

2019-01-24 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-5446.
---
Resolution: Fixed
  Assignee: Luke Cwik  (was: Scott Wegner)

> SplittableDoFn: Remove runner time execution information from public API 
> surface
> 
>
> Key: BEAM-5446
> URL: https://issues.apache.org/jira/browse/BEAM-5446
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
> Fix For: 2.9.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Move the setting of "claim observers" within RestrictionTracker to another 
> location to clean up the RestrictionTracker interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6233) Make bundle execution with ExecutableStage support timer/states

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6233?focusedWorklogId=189628=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189628
 ]

ASF GitHub Bot logged work on BEAM-6233:


Author: ASF GitHub Bot
Created on: 24/Jan/19 19:08
Start Date: 24/Jan/19 19:08
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #7330: 
[BEAM-6233]: Add initial user timer support in Dataflow for batch pipelines
URL: https://github.com/apache/beam/pull/7330#discussion_r250738966
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/ProcessRemoteBundleOperation.java
 ##
 @@ -48,25 +61,98 @@
   new OutputReceiverFactory() {
 @Override
 public FnDataReceiver create(String pCollectionId) {
-  return receivedElement -> {
-LOG.debug("Consume element {}", receivedElement);
-outputReceiverMap.get(pCollectionId).process((WindowedValue) 
receivedElement);
-  };
+  return receivedElement -> receive(pCollectionId, receivedElement);
 }
   };
   private final StateRequestHandler stateRequestHandler;
   private final BundleProgressHandler progressHandler;
   private RemoteBundle remoteBundle;
+  private final DataflowExecutionContext executionContext;
+  private final Map 
timerOutputIdToSpecMap;
+  private final Map> timerWindowCodersMap;
+  private final Map 
timerIdToTimerSpecMap;
+  private final Map timerIdToKey;
+  private ExecutableStage executableStage;
 
   public ProcessRemoteBundleOperation(
-  OperationContext context,
+  ExecutableStage executableStage,
+  DataflowExecutionContext executionContext,
+  DataflowOperationContext operationContext,
   StageBundleFactory stageBundleFactory,
   Map outputReceiverMap) {
-super(EMPTY_RECEIVER_ARRAY, context);
+super(EMPTY_RECEIVER_ARRAY, operationContext);
+
 this.stageBundleFactory = stageBundleFactory;
-this.outputReceiverMap = outputReceiverMap;
 this.stateRequestHandler = StateRequestHandler.unsupported();
 this.progressHandler = BundleProgressHandler.ignored();
+this.executionContext = executionContext;
+this.timerOutputIdToSpecMap = new HashMap<>();
+this.timerWindowCodersMap = new HashMap<>();
+this.executableStage = executableStage;
+this.timerIdToKey = new HashMap<>();
+this.outputReceiverMap = outputReceiverMap;
+timerIdToTimerSpecMap = new HashMap<>();
+
+for (Map transformTimerMap :
+
stageBundleFactory.getProcessBundleDescriptor().getTimerSpecs().values()) {
+  for (ProcessBundleDescriptors.TimerSpec timerSpec : 
transformTimerMap.values()) {
+timerIdToTimerSpecMap.put(timerSpec.timerId(), timerSpec);
+  }
+}
+
+for (Map transformTimerMap :
+
stageBundleFactory.getProcessBundleDescriptor().getTimerSpecs().values()) {
+  for (ProcessBundleDescriptors.TimerSpec timerSpec : 
transformTimerMap.values()) {
+timerOutputIdToSpecMap.put(timerSpec.outputCollectionId(), timerSpec);
+  }
+}
+
+for (RunnerApi.PTransform pTransform :
 
 Review comment:
   The inner loop and the lookup logic complicates this unfortunately to a 
point where the stream method obfuscates too much.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189628)
Time Spent: 3h 10m  (was: 3h)

> Make bundle execution with ExecutableStage support timer/states
> ---
>
> Key: BEAM-6233
> URL: https://issues.apache.org/jira/browse/BEAM-6233
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6324) CassandraIO.Read - Add the ability to provide a filter to the query

2019-01-24 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-6324:
--
Fix Version/s: (was: 2.9.0)

> CassandraIO.Read - Add the ability to provide a filter to the query
> ---
>
> Key: BEAM-6324
> URL: https://issues.apache.org/jira/browse/BEAM-6324
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Affects Versions: 2.9.0
>Reporter: Shahar Frank
>Assignee: Shahar Frank
>Priority: Major
>  Labels: performance, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CassandraIO.Read doesn't support using WHERE to filter the input at the 
> source (In Cassandra) which might provide great performance boost.
> Already implemented by:
> https://github.com/apache/beam/pull/7340



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6233) Make bundle execution with ExecutableStage support timer/states

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6233?focusedWorklogId=189625=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189625
 ]

ASF GitHub Bot logged work on BEAM-6233:


Author: ASF GitHub Bot
Created on: 24/Jan/19 19:07
Start Date: 24/Jan/19 19:07
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #7330: 
[BEAM-6233]: Add initial user timer support in Dataflow for batch pipelines
URL: https://github.com/apache/beam/pull/7330#discussion_r250738679
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/ProcessRemoteBundleOperation.java
 ##
 @@ -100,6 +197,75 @@ public void finish() throws Exception {
   } catch (Exception e) {
 throw new RuntimeException("Failed to finish remote bundle", e);
   }
+
+  // TODO(BEAM-6274): do we have to put this in the "start" method as well?
+  try (RemoteBundle bundle =
+  stageBundleFactory.getBundle(receiverFactory, stateRequestHandler, 
progressHandler)) {
+
+// TODO(BEAM-6274): Why do we need to namespace this to "user"?
+DataflowExecutionContext.DataflowStepContext stepContext =
+executionContext
+.getStepContext((DataflowOperationContext) this.context)
+.namespacedToUser();
+
+// TODO(BEAM-6274): investigate if this is the correct window
+TimerInternals.TimerData timerData =
+stepContext.getNextFiredTimer(GlobalWindow.Coder.INSTANCE);
+while (timerData != null) {
+  LOG.debug("Found fired timer in start {}", timerData);
+
+  // TODO(BEAM-6274): get the correct payload and payload coder
+  StateNamespaces.WindowNamespace windowNamespace =
+  (StateNamespaces.WindowNamespace) timerData.getNamespace();
+  BoundedWindow window = windowNamespace.getWindow();
+
+  WindowedValue> timerValue =
+  WindowedValue.of(
+  KV.of(
+  timerIdToKey.get(timerData.getTimerId()),
+  Timer.of(timerData.getTimestamp(), new byte[0])),
+  timerData.getTimestamp(),
+  Collections.singleton(window),
+  PaneInfo.NO_FIRING);
+
+  String mainInputId =
+  
timerIdToTimerSpecMap.get(timerData.getTimerId()).inputCollectionId();
+
+  bundle.getInputReceivers().get(mainInputId).accept(timerValue);
+
+  // TODO(BEAM-6274): investigate if this is the correct window
+  timerData = 
stepContext.getNextFiredTimer(GlobalWindow.Coder.INSTANCE);
+}
+  }
+}
+  }
+
+  private void receive(String pCollectionId, Object receivedElement) throws 
Exception {
+LOG.debug("Received element {} for pcollection {}", receivedElement, 
pCollectionId);
+// TODO(BEAM-6274): move this out into its own receiver class
 
 Review comment:
   I plan on a follow up PR cleaning this up and moving into its own class.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189625)
Time Spent: 3h  (was: 2h 50m)

> Make bundle execution with ExecutableStage support timer/states
> ---
>
> Key: BEAM-6233
> URL: https://issues.apache.org/jira/browse/BEAM-6233
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6468) Cannot create empty TestBoundedTable

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6468?focusedWorklogId=189618=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189618
 ]

ASF GitHub Bot logged work on BEAM-6468:


Author: ASF GitHub Bot
Created on: 24/Jan/19 18:51
Start Date: 24/Jan/19 18:51
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #7568: [BEAM-6468] Allow 
creating empty TestBoundedTable
URL: https://github.com/apache/beam/pull/7568#issuecomment-457312809
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189618)
Time Spent: 2h 20m  (was: 2h 10m)

> Cannot create empty TestBoundedTable
> 
>
> Key: BEAM-6468
> URL: https://issues.apache.org/jira/browse/BEAM-6468
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189609=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189609
 ]

ASF GitHub Bot logged work on BEAM-6154:


Author: ASF GitHub Bot
Created on: 24/Jan/19 18:36
Start Date: 24/Jan/19 18:36
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #7617: [BEAM-6154] Update 
google-apitools to 0.5.26 and fix gcsio in python 3
URL: https://github.com/apache/beam/pull/7617#issuecomment-457307785
 
 
   Run Python Dataflow ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189609)
Time Spent: 50m  (was: 40m)

> Gcsio batch delete broken in Python 3
> -
>
> Key: BEAM-6154
> URL: https://issues.apache.org/jira/browse/BEAM-6154
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I'm running Python SDK agianst GCP in Python 3.5 and got following gcsio 
> error while deleting files:
> {code}
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", 
> line 1077, in 
> window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 315, in finalize_write
> num_threads)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", 
> line 145, in run_using_threadpool
> return pool.map(fn_to_execute, inputs)
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map
> return self._map_async(func, iterable, mapstar, chunksize).get()
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get
> raise self._value
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker
> result = (True, func(*args, **kwds))
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
> return list(map(*args))
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 299, in _rename_batch
> FileSystems.rename(source_files, destination_files)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line 
> 252, in rename
> return filesystem.rename(source_file_names, destination_file_names)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", 
> line 229, in rename
> copy_statuses = gcsio.GcsIO().copy_batch(batch)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", 
> line 322, in copy_batch
> api_calls = batch_request.Execute(self.client._http)  # pylint: 
> disable=protected-access
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 222, in Execute
> batch_http_request.Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 480, in Execute
> self._Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 450, in _Execute
> mime_response = parser.parsestr(header + response.content)
> TypeError: Can't convert 'bytes' object to str implicitly
> {code} 
> After looking into related code in apitools library, I found response.content 
> that's returned via http request to gcs is bytes and apitools didn't handle 
> this scenario. This can be a blocker to any pipeline depending on gcsio and 
> apparently blocks all Dataflow job in Python 3.
> This could be another case that moving off apitools dependency in 
> [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189612=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189612
 ]

ASF GitHub Bot logged work on BEAM-6154:


Author: ASF GitHub Bot
Created on: 24/Jan/19 18:36
Start Date: 24/Jan/19 18:36
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #7617: [BEAM-6154] Update 
google-apitools to 0.5.26 and fix gcsio in python 3
URL: https://github.com/apache/beam/pull/7617#issuecomment-457307867
 
 
   Run Python Dataflow ValidatesRunner
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189612)
Time Spent: 1h 20m  (was: 1h 10m)

> Gcsio batch delete broken in Python 3
> -
>
> Key: BEAM-6154
> URL: https://issues.apache.org/jira/browse/BEAM-6154
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I'm running Python SDK agianst GCP in Python 3.5 and got following gcsio 
> error while deleting files:
> {code}
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", 
> line 1077, in 
> window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 315, in finalize_write
> num_threads)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", 
> line 145, in run_using_threadpool
> return pool.map(fn_to_execute, inputs)
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map
> return self._map_async(func, iterable, mapstar, chunksize).get()
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get
> raise self._value
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker
> result = (True, func(*args, **kwds))
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
> return list(map(*args))
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 299, in _rename_batch
> FileSystems.rename(source_files, destination_files)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line 
> 252, in rename
> return filesystem.rename(source_file_names, destination_file_names)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", 
> line 229, in rename
> copy_statuses = gcsio.GcsIO().copy_batch(batch)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", 
> line 322, in copy_batch
> api_calls = batch_request.Execute(self.client._http)  # pylint: 
> disable=protected-access
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 222, in Execute
> batch_http_request.Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 480, in Execute
> self._Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 450, in _Execute
> mime_response = parser.parsestr(header + response.content)
> TypeError: Can't convert 'bytes' object to str implicitly
> {code} 
> After looking into related code in apitools library, I found response.content 
> that's returned via http request to gcs is bytes and apitools didn't handle 
> this scenario. This can be a blocker to any pipeline depending on gcsio and 
> apparently blocks all Dataflow job in Python 3.
> This could be another case that moving off apitools dependency in 
> [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189608=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189608
 ]

ASF GitHub Bot logged work on BEAM-6154:


Author: ASF GitHub Bot
Created on: 24/Jan/19 18:35
Start Date: 24/Jan/19 18:35
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #7617: [BEAM-6154] Update 
google-apitools to 0.5.26 and fix gcsio in python 3
URL: https://github.com/apache/beam/pull/7617#issuecomment-457307339
 
 
   Run Python Dataflow ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189608)
Time Spent: 40m  (was: 0.5h)

> Gcsio batch delete broken in Python 3
> -
>
> Key: BEAM-6154
> URL: https://issues.apache.org/jira/browse/BEAM-6154
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I'm running Python SDK agianst GCP in Python 3.5 and got following gcsio 
> error while deleting files:
> {code}
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", 
> line 1077, in 
> window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 315, in finalize_write
> num_threads)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", 
> line 145, in run_using_threadpool
> return pool.map(fn_to_execute, inputs)
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map
> return self._map_async(func, iterable, mapstar, chunksize).get()
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get
> raise self._value
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker
> result = (True, func(*args, **kwds))
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
> return list(map(*args))
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 299, in _rename_batch
> FileSystems.rename(source_files, destination_files)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line 
> 252, in rename
> return filesystem.rename(source_file_names, destination_file_names)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", 
> line 229, in rename
> copy_statuses = gcsio.GcsIO().copy_batch(batch)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", 
> line 322, in copy_batch
> api_calls = batch_request.Execute(self.client._http)  # pylint: 
> disable=protected-access
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 222, in Execute
> batch_http_request.Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 480, in Execute
> self._Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 450, in _Execute
> mime_response = parser.parsestr(header + response.content)
> TypeError: Can't convert 'bytes' object to str implicitly
> {code} 
> After looking into related code in apitools library, I found response.content 
> that's returned via http request to gcs is bytes and apitools didn't handle 
> this scenario. This can be a blocker to any pipeline depending on gcsio and 
> apparently blocks all Dataflow job in Python 3.
> This could be another case that moving off apitools dependency in 
> [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189607=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189607
 ]

ASF GitHub Bot logged work on BEAM-6154:


Author: ASF GitHub Bot
Created on: 24/Jan/19 18:34
Start Date: 24/Jan/19 18:34
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #7617: [BEAM-6154] Update 
google-apitools to 0.5.26 and fix gcsio in python 3
URL: https://github.com/apache/beam/pull/7617#issuecomment-457307212
 
 
   Run Dataflow Python ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189607)
Time Spent: 0.5h  (was: 20m)

> Gcsio batch delete broken in Python 3
> -
>
> Key: BEAM-6154
> URL: https://issues.apache.org/jira/browse/BEAM-6154
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I'm running Python SDK agianst GCP in Python 3.5 and got following gcsio 
> error while deleting files:
> {code}
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", 
> line 1077, in 
> window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 315, in finalize_write
> num_threads)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", 
> line 145, in run_using_threadpool
> return pool.map(fn_to_execute, inputs)
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map
> return self._map_async(func, iterable, mapstar, chunksize).get()
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get
> raise self._value
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker
> result = (True, func(*args, **kwds))
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
> return list(map(*args))
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 299, in _rename_batch
> FileSystems.rename(source_files, destination_files)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line 
> 252, in rename
> return filesystem.rename(source_file_names, destination_file_names)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", 
> line 229, in rename
> copy_statuses = gcsio.GcsIO().copy_batch(batch)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", 
> line 322, in copy_batch
> api_calls = batch_request.Execute(self.client._http)  # pylint: 
> disable=protected-access
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 222, in Execute
> batch_http_request.Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 480, in Execute
> self._Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 450, in _Execute
> mime_response = parser.parsestr(header + response.content)
> TypeError: Can't convert 'bytes' object to str implicitly
> {code} 
> After looking into related code in apitools library, I found response.content 
> that's returned via http request to gcs is bytes and apitools didn't handle 
> this scenario. This can be a blocker to any pipeline depending on gcsio and 
> apparently blocks all Dataflow job in Python 3.
> This could be another case that moving off apitools dependency in 
> [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=189604=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189604
 ]

ASF GitHub Bot logged work on BEAM-5442:


Author: ASF GitHub Bot
Created on: 24/Jan/19 18:32
Start Date: 24/Jan/19 18:32
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #7564: [release] Revert 
"[BEAM-5442] Pass SDK-unknown PipelineOptions to Runner"
URL: https://github.com/apache/beam/pull/7564#issuecomment-457306533
 
 
   Do we want to fix the missing image or assume this is working working 
correctly since all other tests pass?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189604)
Time Spent: 16.5h  (was: 16h 20m)

> PortableRunner swallows custom options for Runner
> -
>
> Key: BEAM-5442
> URL: https://issues.apache.org/jira/browse/BEAM-5442
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability, portability-flink
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> The PortableRunner doesn't pass custom PipelineOptions to the executing 
> Runner.
> Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner.
> (The option is just removed during proto translation without any warning)
> We should allow some form of customization through the options, even for the 
> PortableRunner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6500) Precomit broken due to style violation

2019-01-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6500?focusedWorklogId=189603=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189603
 ]

ASF GitHub Bot logged work on BEAM-6500:


Author: ASF GitHub Bot
Created on: 24/Jan/19 18:32
Start Date: 24/Jan/19 18:32
Worklog Time Spent: 10m 
  Work Description: swegner commented on pull request #7618: [BEAM-6500] 
Add missing javadocs for new public methods
URL: https://github.com/apache/beam/pull/7618
 
 
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | --- | --- | --- | ---
   
   
   
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 189603)
Time Spent: 10m
Remaining Estimate: 0h

> Precomit broken due to style violation
> --
>
> Key: BEAM-6500
> URL: https://issues.apache.org/jira/browse/BEAM-6500
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
> 

[jira] [Assigned] (BEAM-6500) Precomit broken due to style violation

2019-01-24 Thread Scott Wegner (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner reassigned BEAM-6500:
--

Assignee: Scott Wegner

> Precomit broken due to style violation
> --
>
> Key: BEAM-6500
> URL: https://issues.apache.org/jira/browse/BEAM-6500
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Scott Wegner
>Priority: Major
>
> This seems to be broken for all PRs now
>  
> :beam-runners-direct-java:checkstyleMain FAILED [ant:checkstyle] [WARN] 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_Phrase/src/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/ReferenceRunner.java:164:3:
>  Missing a Javadoc comment. [JavadocMethod] [ant:checkstyle] [ERROR] 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_Phrase/src/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/job/ReferenceRunnerJobServer.java:147:
>  Missing a Javadoc comment. [JavadocType]
>  
> https://scans.gradle.com/s/nwgb7xegklwqo/console-log?task=:beam-runners-direct-java:checkstyleMain



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


  1   2   >