[jira] [Commented] (BEAM-6324) CassandraIO.Read - Add the ability to provide a filter to the query
[ https://issues.apache.org/jira/browse/BEAM-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751955#comment-16751955 ] Kenneth Knowles commented on BEAM-6324: --- Quite interesting. Let's keep this open and see if a productive resolution can be reached. > CassandraIO.Read - Add the ability to provide a filter to the query > --- > > Key: BEAM-6324 > URL: https://issues.apache.org/jira/browse/BEAM-6324 > Project: Beam > Issue Type: Improvement > Components: io-java-cassandra >Affects Versions: 2.9.0 >Reporter: Shahar Frank >Assignee: Shahar Frank >Priority: Major > Labels: performance, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > CassandraIO.Read doesn't support using WHERE to filter the input at the > source (In Cassandra) which might provide great performance boost. > Already implemented by: > https://github.com/apache/beam/pull/7340 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6324) CassandraIO.Read - Add the ability to provide a filter to the query
[ https://issues.apache.org/jira/browse/BEAM-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751956#comment-16751956 ] Shahar Frank commented on BEAM-6324: Sounds like a plan :) > CassandraIO.Read - Add the ability to provide a filter to the query > --- > > Key: BEAM-6324 > URL: https://issues.apache.org/jira/browse/BEAM-6324 > Project: Beam > Issue Type: Improvement > Components: io-java-cassandra >Affects Versions: 2.9.0 >Reporter: Shahar Frank >Assignee: Shahar Frank >Priority: Major > Labels: performance, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > CassandraIO.Read doesn't support using WHERE to filter the input at the > source (In Cassandra) which might provide great performance boost. > Already implemented by: > https://github.com/apache/beam/pull/7340 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6324) CassandraIO.Read - Add the ability to provide a filter to the query
[ https://issues.apache.org/jira/browse/BEAM-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751935#comment-16751935 ] Kenneth Knowles commented on BEAM-6324: --- Yikes! I pinged a couple possible reviewers and raised this on the dev list. I think you may have been hit by the holidays, in part. > CassandraIO.Read - Add the ability to provide a filter to the query > --- > > Key: BEAM-6324 > URL: https://issues.apache.org/jira/browse/BEAM-6324 > Project: Beam > Issue Type: Improvement > Components: io-java-cassandra >Affects Versions: 2.9.0 >Reporter: Shahar Frank >Assignee: Shahar Frank >Priority: Major > Labels: performance, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > CassandraIO.Read doesn't support using WHERE to filter the input at the > source (In Cassandra) which might provide great performance boost. > Already implemented by: > https://github.com/apache/beam/pull/7340 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6324) CassandraIO.Read - Add the ability to provide a filter to the query
[ https://issues.apache.org/jira/browse/BEAM-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751950#comment-16751950 ] Shahar Frank commented on BEAM-6324: Thanks [~kenn]. Maybe this helps. Regardless - this PR only is half way what I'd like to have - it's a sort of workaround. The proper solution will require using an updated Casandra Java Driver - [for which the PR was not accepted|https://github.com/datastax/java-driver/pull/1163]. I wouldn't want this PR to be the long-term solution so I can see no way around maintaining my own repos of both projects. Besides I have some more issues with latest - e.g. [Cassandra IO Reader not stack when doing a Join|https://issues.apache.org/jira/browse/BEAM-6289] etc... which might prove hard to merge as well. > CassandraIO.Read - Add the ability to provide a filter to the query > --- > > Key: BEAM-6324 > URL: https://issues.apache.org/jira/browse/BEAM-6324 > Project: Beam > Issue Type: Improvement > Components: io-java-cassandra >Affects Versions: 2.9.0 >Reporter: Shahar Frank >Assignee: Shahar Frank >Priority: Major > Labels: performance, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > CassandraIO.Read doesn't support using WHERE to filter the input at the > source (In Cassandra) which might provide great performance boost. > Already implemented by: > https://github.com/apache/beam/pull/7340 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6024) Gradle setupVirtualenv supports Python 3
[ https://issues.apache.org/jira/browse/BEAM-6024?focusedWorklogId=189863=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189863 ] ASF GitHub Bot logged work on BEAM-6024: Author: ASF GitHub Bot Created on: 25/Jan/19 06:46 Start Date: 25/Jan/19 06:46 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #7423: [BEAM-6024] Build Python 3 container image with Gradle URL: https://github.com/apache/beam/pull/7423#issuecomment-457473035 Run Python Dataflow ValidatesContainer This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189863) Time Spent: 2.5h (was: 2h 20m) > Gradle setupVirtualenv supports Python 3 > > > Key: BEAM-6024 > URL: https://issues.apache.org/jira/browse/BEAM-6024 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > Need to depend on Python 3 virtualenv in few places: > - Build Dataflow worker container in Python 3 > - Run ValidatesRunner and integration tests on Jenkins -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3386) Dependency conflict when Calcite is included in a project.
[ https://issues.apache.org/jira/browse/BEAM-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751942#comment-16751942 ] Kai Jiang commented on BEAM-3386: - [~kenn] this (without relocation) sounds pretty useful. let me checkout the thread quickly. > Dependency conflict when Calcite is included in a project. > -- > > Key: BEAM-3386 > URL: https://issues.apache.org/jira/browse/BEAM-3386 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0 >Reporter: Austin Haas >Assignee: Kai Jiang >Priority: Critical > > When Calcite (v. 1.13.0) is included in a project that also includes Beam and > the Beam SQL extension, then the following error is thrown when trying to run > Beam code. > ClassCastException > org.apache.beam.sdk.extensions.sql.impl.planner.BeamRelDataTypeSystem cannot > be cast to org.apache.calcite.rel.type.RelDataTypeSystem > org.apache.calcite.jdbc.CalciteConnectionImpl. > (CalciteConnectionImpl.java:120) > > org.apache.calcite.jdbc.CalciteJdbc41Factory$CalciteJdbc41Connection. > (CalciteJdbc41Factory.java:114) > org.apache.calcite.jdbc.CalciteJdbc41Factory.newConnection > (CalciteJdbc41Factory.java:59) > org.apache.calcite.jdbc.CalciteJdbc41Factory.newConnection > (CalciteJdbc41Factory.java:44) > org.apache.calcite.jdbc.CalciteFactory.newConnection > (CalciteFactory.java:53) > org.apache.calcite.avatica.UnregisteredDriver.connect > (UnregisteredDriver.java:138) > java.sql.DriverManager.getConnection (DriverManager.java:664) > java.sql.DriverManager.getConnection (DriverManager.java:208) > > org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.tools.Frameworks.withPrepare > (Frameworks.java:145) > > org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.tools.Frameworks.withPlanner > (Frameworks.java:106) > > org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.prepare.PlannerImpl.ready > (PlannerImpl.java:140) > > org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.prepare.PlannerImpl.parse > (PlannerImpl.java:170) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3386) Dependency conflict when Calcite is included in a project.
[ https://issues.apache.org/jira/browse/BEAM-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751939#comment-16751939 ] Kenneth Knowles commented on BEAM-3386: --- [~vectorijk] did you see my thread on the dev list? I tried just a little. I think it will be helpful to first separate the generated code from the main Beam SQL module. That way, the generated code can link to vendored Calcite with a relocation. The main Beam SQL module will depend on vendored Calcite directly without needing relocation. > Dependency conflict when Calcite is included in a project. > -- > > Key: BEAM-3386 > URL: https://issues.apache.org/jira/browse/BEAM-3386 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0 >Reporter: Austin Haas >Assignee: Kai Jiang >Priority: Critical > > When Calcite (v. 1.13.0) is included in a project that also includes Beam and > the Beam SQL extension, then the following error is thrown when trying to run > Beam code. > ClassCastException > org.apache.beam.sdk.extensions.sql.impl.planner.BeamRelDataTypeSystem cannot > be cast to org.apache.calcite.rel.type.RelDataTypeSystem > org.apache.calcite.jdbc.CalciteConnectionImpl. > (CalciteConnectionImpl.java:120) > > org.apache.calcite.jdbc.CalciteJdbc41Factory$CalciteJdbc41Connection. > (CalciteJdbc41Factory.java:114) > org.apache.calcite.jdbc.CalciteJdbc41Factory.newConnection > (CalciteJdbc41Factory.java:59) > org.apache.calcite.jdbc.CalciteJdbc41Factory.newConnection > (CalciteJdbc41Factory.java:44) > org.apache.calcite.jdbc.CalciteFactory.newConnection > (CalciteFactory.java:53) > org.apache.calcite.avatica.UnregisteredDriver.connect > (UnregisteredDriver.java:138) > java.sql.DriverManager.getConnection (DriverManager.java:664) > java.sql.DriverManager.getConnection (DriverManager.java:208) > > org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.tools.Frameworks.withPrepare > (Frameworks.java:145) > > org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.tools.Frameworks.withPlanner > (Frameworks.java:106) > > org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.prepare.PlannerImpl.ready > (PlannerImpl.java:140) > > org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.prepare.PlannerImpl.parse > (PlannerImpl.java:170) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3386) Dependency conflict when Calcite is included in a project.
[ https://issues.apache.org/jira/browse/BEAM-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751928#comment-16751928 ] Kai Jiang commented on BEAM-3386: - [~iemejia] Thanks! I will take some time trying to work on vendor calcite. > Dependency conflict when Calcite is included in a project. > -- > > Key: BEAM-3386 > URL: https://issues.apache.org/jira/browse/BEAM-3386 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0 >Reporter: Austin Haas >Assignee: Kai Jiang >Priority: Critical > > When Calcite (v. 1.13.0) is included in a project that also includes Beam and > the Beam SQL extension, then the following error is thrown when trying to run > Beam code. > ClassCastException > org.apache.beam.sdk.extensions.sql.impl.planner.BeamRelDataTypeSystem cannot > be cast to org.apache.calcite.rel.type.RelDataTypeSystem > org.apache.calcite.jdbc.CalciteConnectionImpl. > (CalciteConnectionImpl.java:120) > > org.apache.calcite.jdbc.CalciteJdbc41Factory$CalciteJdbc41Connection. > (CalciteJdbc41Factory.java:114) > org.apache.calcite.jdbc.CalciteJdbc41Factory.newConnection > (CalciteJdbc41Factory.java:59) > org.apache.calcite.jdbc.CalciteJdbc41Factory.newConnection > (CalciteJdbc41Factory.java:44) > org.apache.calcite.jdbc.CalciteFactory.newConnection > (CalciteFactory.java:53) > org.apache.calcite.avatica.UnregisteredDriver.connect > (UnregisteredDriver.java:138) > java.sql.DriverManager.getConnection (DriverManager.java:664) > java.sql.DriverManager.getConnection (DriverManager.java:208) > > org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.tools.Frameworks.withPrepare > (Frameworks.java:145) > > org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.tools.Frameworks.withPlanner > (Frameworks.java:106) > > org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.prepare.PlannerImpl.ready > (PlannerImpl.java:140) > > org.apache.beam.sdks.java.extensions.sql.repackaged.org.apache.calcite.prepare.PlannerImpl.parse > (PlannerImpl.java:170) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6024) Gradle setupVirtualenv supports Python 3
[ https://issues.apache.org/jira/browse/BEAM-6024?focusedWorklogId=189857=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189857 ] ASF GitHub Bot logged work on BEAM-6024: Author: ASF GitHub Bot Created on: 25/Jan/19 06:05 Start Date: 25/Jan/19 06:05 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #7423: [BEAM-6024] Build Python 3 container image with Gradle URL: https://github.com/apache/beam/pull/7423#issuecomment-457466225 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189857) Time Spent: 2h 20m (was: 2h 10m) > Gradle setupVirtualenv supports Python 3 > > > Key: BEAM-6024 > URL: https://issues.apache.org/jira/browse/BEAM-6024 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > Need to depend on Python 3 virtualenv in few places: > - Build Dataflow worker container in Python 3 > - Run ValidatesRunner and integration tests on Jenkins -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5953) Support DataflowRunner on Python 3
[ https://issues.apache.org/jira/browse/BEAM-5953?focusedWorklogId=189856=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189856 ] ASF GitHub Bot logged work on BEAM-5953: Author: ASF GitHub Bot Created on: 25/Jan/19 06:04 Start Date: 25/Jan/19 06:04 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #7521: [BEAM-5953] Fix py3 type error in bundle_processor URL: https://github.com/apache/beam/pull/7521#issuecomment-457466049 PTAL @tvalentyn @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189856) Time Spent: 4h 20m (was: 4h 10m) > Support DataflowRunner on Python 3 > -- > > Key: BEAM-5953 > URL: https://issues.apache.org/jira/browse/BEAM-5953 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5953) Support DataflowRunner on Python 3
[ https://issues.apache.org/jira/browse/BEAM-5953?focusedWorklogId=189855=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189855 ] ASF GitHub Bot logged work on BEAM-5953: Author: ASF GitHub Bot Created on: 25/Jan/19 06:03 Start Date: 25/Jan/19 06:03 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #7521: [BEAM-5953] Fix py3 type error in bundle_processor URL: https://github.com/apache/beam/pull/7521#issuecomment-457466049 PTAL This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189855) Time Spent: 4h 10m (was: 4h) > Support DataflowRunner on Python 3 > -- > > Key: BEAM-5953 > URL: https://issues.apache.org/jira/browse/BEAM-5953 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6324) CassandraIO.Read - Add the ability to provide a filter to the query
[ https://issues.apache.org/jira/browse/BEAM-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751917#comment-16751917 ] Shahar Frank commented on BEAM-6324: [~kenn] - I've pretty much given up on the PR being merged. I use [my own fork|https://github.com/srfrnk/beam] for my projects. I've included the compiled JAR in the repo and using it [directly from Github like so|[https://github.com/srfrnk/beam-playground].] It also uses [my own version for the Datastax Cassandra Driver for Java|https://github.com/srfrnk/java-driver] with the changes I needed to accomplish this. Hope that helps. > CassandraIO.Read - Add the ability to provide a filter to the query > --- > > Key: BEAM-6324 > URL: https://issues.apache.org/jira/browse/BEAM-6324 > Project: Beam > Issue Type: Improvement > Components: io-java-cassandra >Affects Versions: 2.9.0 >Reporter: Shahar Frank >Assignee: Shahar Frank >Priority: Major > Labels: performance, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > CassandraIO.Read doesn't support using WHERE to filter the input at the > source (In Cassandra) which might provide great performance boost. > Already implemented by: > https://github.com/apache/beam/pull/7340 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6361) Fix user-metric prefix detection for portable Flink metrics
[ https://issues.apache.org/jira/browse/BEAM-6361?focusedWorklogId=189849=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189849 ] ASF GitHub Bot logged work on BEAM-6361: Author: ASF GitHub Bot Created on: 25/Jan/19 05:03 Start Date: 25/Jan/19 05:03 Worklog Time Spent: 10m Work Description: ryan-williams commented on issue #7408: [BEAM-6361][BEAM-6364] fix user-metric-prefix checking in Flink portable metrics update URL: https://github.com/apache/beam/pull/7408#issuecomment-457452816 Looks like everything passed the second time around, so this should be good to go This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189849) Time Spent: 3h 10m (was: 3h) > Fix user-metric prefix detection for portable Flink metrics > --- > > Key: BEAM-6361 > URL: https://issues.apache.org/jira/browse/BEAM-6361 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.9.0 >Reporter: Ryan Williams >Assignee: Ryan Williams >Priority: Minor > Time Spent: 3h 10m > Remaining Estimate: 0h > > [~ajam...@google.com] [noticed a > bug|https://github.com/apache/beam/pull/7183/files#r243402155] in classifying > portable Flink metrics as user-namespaced vs not. > I think it's only currently affecting how metrics get displayed downstream in > Flink, but it should be fixed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6361) Fix user-metric prefix detection for portable Flink metrics
[ https://issues.apache.org/jira/browse/BEAM-6361?focusedWorklogId=189845=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189845 ] ASF GitHub Bot logged work on BEAM-6361: Author: ASF GitHub Bot Created on: 25/Jan/19 04:34 Start Date: 25/Jan/19 04:34 Worklog Time Spent: 10m Work Description: ryan-williams commented on issue #7408: [BEAM-6361][BEAM-6364] fix user-metric-prefix checking in Flink portable metrics update URL: https://github.com/apache/beam/pull/7408#issuecomment-457452816 Looks like everything passed the second time around, so this should be good to go This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189845) Time Spent: 2h 50m (was: 2h 40m) > Fix user-metric prefix detection for portable Flink metrics > --- > > Key: BEAM-6361 > URL: https://issues.apache.org/jira/browse/BEAM-6361 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.9.0 >Reporter: Ryan Williams >Assignee: Ryan Williams >Priority: Minor > Time Spent: 2h 50m > Remaining Estimate: 0h > > [~ajam...@google.com] [noticed a > bug|https://github.com/apache/beam/pull/7183/files#r243402155] in classifying > portable Flink metrics as user-namespaced vs not. > I think it's only currently affecting how metrics get displayed downstream in > Flink, but it should be fixed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6361) Fix user-metric prefix detection for portable Flink metrics
[ https://issues.apache.org/jira/browse/BEAM-6361?focusedWorklogId=189846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189846 ] ASF GitHub Bot logged work on BEAM-6361: Author: ASF GitHub Bot Created on: 25/Jan/19 04:35 Start Date: 25/Jan/19 04:35 Worklog Time Spent: 10m Work Description: ryan-williams commented on issue #7408: [BEAM-6361][BEAM-6364] fix user-metric-prefix checking in Flink portable metrics update URL: https://github.com/apache/beam/pull/7408#issuecomment-457452816 Looks like everything passed the second time around, so this should be good to go edit: Python_PVR_Flink is apparently still running, though github was showing me all checks as passed a moment ago This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189846) Time Spent: 3h (was: 2h 50m) > Fix user-metric prefix detection for portable Flink metrics > --- > > Key: BEAM-6361 > URL: https://issues.apache.org/jira/browse/BEAM-6361 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.9.0 >Reporter: Ryan Williams >Assignee: Ryan Williams >Priority: Minor > Time Spent: 3h > Remaining Estimate: 0h > > [~ajam...@google.com] [noticed a > bug|https://github.com/apache/beam/pull/7183/files#r243402155] in classifying > portable Flink metrics as user-namespaced vs not. > I think it's only currently affecting how metrics get displayed downstream in > Flink, but it should be fixed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6487) BigQuery api is out of date
[ https://issues.apache.org/jira/browse/BEAM-6487?focusedWorklogId=189837=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189837 ] ASF GitHub Bot logged work on BEAM-6487: Author: ASF GitHub Bot Created on: 25/Jan/19 03:06 Start Date: 25/Jan/19 03:06 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #7590: [BEAM-6487] Updating BQ API URL: https://github.com/apache/beam/pull/7590 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189837) Time Spent: 1h 10m (was: 1h) > BigQuery api is out of date > --- > > Key: BEAM-6487 > URL: https://issues.apache.org/jira/browse/BEAM-6487 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=189835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189835 ] ASF GitHub Bot logged work on BEAM-5442: Author: ASF GitHub Bot Created on: 25/Jan/19 03:04 Start Date: 25/Jan/19 03:04 Worklog Time Spent: 10m Work Description: tweise commented on issue #7597: [WIP] [BEAM-5442] Retrieve valid runner options from JobService URL: https://github.com/apache/beam/pull/7597#issuecomment-457439306 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189835) Time Spent: 17h 50m (was: 17h 40m) > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Thomas Weise >Priority: Major > Labels: portability, portability-flink > Time Spent: 17h 50m > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. > (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6431) Add ExecutionTime metrics to the Beam Java SDK
[ https://issues.apache.org/jira/browse/BEAM-6431?focusedWorklogId=189829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189829 ] ASF GitHub Bot logged work on BEAM-6431: Author: ASF GitHub Bot Created on: 25/Jan/19 02:22 Start Date: 25/Jan/19 02:22 Worklog Time Spent: 10m Work Description: swegner commented on pull request #7507: [BEAM-6431] Refactor, Remove references to Dataflow classes in base State Sampling URL: https://github.com/apache/beam/pull/7507 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189829) Time Spent: 1h 10m (was: 1h) > Add ExecutionTime metrics to the Beam Java SDK > -- > > Key: BEAM-6431 > URL: https://issues.apache.org/jira/browse/BEAM-6431 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > This will be done by using the Dataflow worker's StateSampler code. I have > put together a refactoring plan > [here|https://docs.google.com/document/d/1OlAJf4T_CTL9WRH8lP8uQOfLjWYfm8IpRXSe38g34k4/edit#] > This will include estimating the processing time for the start, process and > finish bundle. The python SDK already has an implementation of this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6431) Add ExecutionTime metrics to the Beam Java SDK
[ https://issues.apache.org/jira/browse/BEAM-6431?focusedWorklogId=189822=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189822 ] ASF GitHub Bot logged work on BEAM-6431: Author: ASF GitHub Bot Created on: 25/Jan/19 02:03 Start Date: 25/Jan/19 02:03 Worklog Time Spent: 10m Work Description: ajamato commented on issue #7507: [BEAM-6431] Refactor, Remove references to Dataflow classes in base State Sampling URL: https://github.com/apache/beam/pull/7507#issuecomment-457428834 I have finished all the extra testing which I wanted to perform. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189822) Time Spent: 1h (was: 50m) > Add ExecutionTime metrics to the Beam Java SDK > -- > > Key: BEAM-6431 > URL: https://issues.apache.org/jira/browse/BEAM-6431 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > This will be done by using the Dataflow worker's StateSampler code. I have > put together a refactoring plan > [here|https://docs.google.com/document/d/1OlAJf4T_CTL9WRH8lP8uQOfLjWYfm8IpRXSe38g34k4/edit#] > This will include estimating the processing time for the start, process and > finish bundle. The python SDK already has an implementation of this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6233) Make bundle execution with ExecutableStage support timer/states
[ https://issues.apache.org/jira/browse/BEAM-6233?focusedWorklogId=189808=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189808 ] ASF GitHub Bot logged work on BEAM-6233: Author: ASF GitHub Bot Created on: 25/Jan/19 00:53 Start Date: 25/Jan/19 00:53 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #7330: [BEAM-6233]: Add initial user timer support in Dataflow for batch pipelines URL: https://github.com/apache/beam/pull/7330#issuecomment-457415340 R: @kennknowles, hey Kenn are you able to review this please? I want to get a good confirmation that the general logic is correct in this PR. In a subsequent PR I'll pull the timer logic into a different class. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189808) Time Spent: 3h 20m (was: 3h 10m) > Make bundle execution with ExecutableStage support timer/states > --- > > Key: BEAM-6233 > URL: https://issues.apache.org/jira/browse/BEAM-6233 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Sam Rohde >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6506) Dataflow runner harness should always use proper coder URNs
Mark Liu created BEAM-6506: -- Summary: Dataflow runner harness should always use proper coder URNs Key: BEAM-6506 URL: https://issues.apache.org/jira/browse/BEAM-6506 Project: Beam Issue Type: Wish Components: runner-core Reporter: Mark Liu Assignee: Robert Bradshaw This comes from a Python 3 incompatible bug and mentioned in https://github.com/apache/beam/pull/7521/files#r248993766. The fix decode payload unconditionally in Python 2 or 3 but suggested by [~robertwb]: "the Dataflow runner harness was fixed to always use proper coder URNs rather than stuffing json-ized dataflow v1b3 cloud proto representations into the payload". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6506) Dataflow runner harness should always use proper coder URNs
[ https://issues.apache.org/jira/browse/BEAM-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu updated BEAM-6506: --- Labels: portability (was: ) > Dataflow runner harness should always use proper coder URNs > --- > > Key: BEAM-6506 > URL: https://issues.apache.org/jira/browse/BEAM-6506 > Project: Beam > Issue Type: Wish > Components: runner-core >Reporter: Mark Liu >Assignee: Robert Bradshaw >Priority: Major > Labels: portability > > This comes from a Python 3 incompatible bug and mentioned in > https://github.com/apache/beam/pull/7521/files#r248993766. The fix decode > payload unconditionally in Python 2 or 3 but suggested by [~robertwb]: > "the Dataflow runner harness was fixed to always use proper coder URNs rather > than stuffing json-ized dataflow v1b3 cloud proto representations into the > payload". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5953) Support DataflowRunner on Python 3
[ https://issues.apache.org/jira/browse/BEAM-5953?focusedWorklogId=189807=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189807 ] ASF GitHub Bot logged work on BEAM-5953: Author: ASF GitHub Bot Created on: 25/Jan/19 00:53 Start Date: 25/Jan/19 00:53 Worklog Time Spent: 10m Work Description: markflyhigh commented on pull request #7521: [BEAM-5953] Fix py3 type error in bundle_processor URL: https://github.com/apache/beam/pull/7521#discussion_r250833446 ## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ## @@ -591,8 +592,10 @@ def get_coder(self, coder_id): return self.context.coders.get_by_id(coder_id) else: # No URN, assume cloud object encoding json bytes. - return operation_specs.get_coder_from_spec( - json.loads(coder_proto.spec.spec.payload)) + payload = coder_proto.spec.spec.payload + if isinstance(payload, bytes) and sys.version_info[0] == 3: Review comment: sg. Also filed https://issues.apache.org/jira/browse/BEAM-6506 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189807) Time Spent: 4h (was: 3h 50m) > Support DataflowRunner on Python 3 > -- > > Key: BEAM-5953 > URL: https://issues.apache.org/jira/browse/BEAM-5953 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=189805=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189805 ] ASF GitHub Bot logged work on BEAM-5442: Author: ASF GitHub Bot Created on: 25/Jan/19 00:48 Start Date: 25/Jan/19 00:48 Worklog Time Spent: 10m Work Description: tweise commented on pull request #7597: [WIP] [BEAM-5442] Retrieve valid runner options from JobService URL: https://github.com/apache/beam/pull/7597#discussion_r250832615 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java ## @@ -630,6 +634,55 @@ public static void printHelp(PrintStream out, Class i } } + public static Map describe(Set> ifaces) { +checkNotNull(ifaces); +Map result = new HashMap<>(); + +for (Class iface : ifaces) { + CACHE.get().validateWellFormed(iface); + + Set properties = PipelineOptionsReflector.getOptionSpecs(iface); + + RowSortedTable, String, Method> ifacePropGetterTable = + TreeBasedTable.create(ClassNameComparator.INSTANCE, Ordering.natural()); + for (PipelineOptionSpec prop : properties) { +ifacePropGetterTable.put(prop.getDefiningInterface(), prop.getName(), prop.getGetterMethod()); + } + + for (Map.Entry, Map> ifaceToPropertyMap : + ifacePropGetterTable.rowMap().entrySet()) { +Class currentIface = ifaceToPropertyMap.getKey(); +Map propertyNamesToGetters = ifaceToPropertyMap.getValue(); + +List lists = Lists.newArrayList(propertyNamesToGetters.keySet()); +lists.sort(String.CASE_INSENSITIVE_ORDER); +for (String propertyName : lists) { + Method method = propertyNamesToGetters.get(propertyName); + // TODO: type representation + String printableType = method.getReturnType().getSimpleName(); Review comment: Perhaps we can alternatively just define the accumulation for the option, i.e. the equivalent of `store`, `append`, `store_true`? That's all that seems to be required at this point. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189805) Time Spent: 17h 40m (was: 17.5h) > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Thomas Weise >Priority: Major > Labels: portability, portability-flink > Time Spent: 17h 40m > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. > (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6505) Java SDK - Allow System Counters (which don't need MetricsContainer context)
Alex Amato created BEAM-6505: Summary: Java SDK - Allow System Counters (which don't need MetricsContainer context) Key: BEAM-6505 URL: https://issues.apache.org/jira/browse/BEAM-6505 Project: Beam Issue Type: New Feature Components: java-fn-execution Reporter: Alex Amato Assignee: Alex Amato See the comment added for this issue in ElementCountFnDataReceiver.java The method used to create these metrics relies on the currently in scope metrics container, though we should use the same metrics container every time this code is invoked instead. There is no need to use the current scoped metric container, which only offers the main benefit to user counters, by attaching the PTransform name to the metrics. In this case the metric does not need the currently scoped PTransform name, since the code is labelling the metrics with the pcollection, and similar cases can manually attach the ptransform name (i.e. for execution time metrics). We can make the static method LabelledMetrics.counter(metricName) obtain a consistent metric container instead of looking for the currently scoped metric container. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6361) Fix user-metric prefix detection for portable Flink metrics
[ https://issues.apache.org/jira/browse/BEAM-6361?focusedWorklogId=189804=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189804 ] ASF GitHub Bot logged work on BEAM-6361: Author: ASF GitHub Bot Created on: 25/Jan/19 00:44 Start Date: 25/Jan/19 00:44 Worklog Time Spent: 10m Work Description: mxm commented on issue #7408: [BEAM-6361][BEAM-6364] fix user-metric-prefix checking in Flink portable metrics update URL: https://github.com/apache/beam/pull/7408#issuecomment-457413639 If it's only that test failing, it's fine to merge. All other tests still run. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189804) Time Spent: 2h 40m (was: 2.5h) > Fix user-metric prefix detection for portable Flink metrics > --- > > Key: BEAM-6361 > URL: https://issues.apache.org/jira/browse/BEAM-6361 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.9.0 >Reporter: Ryan Williams >Assignee: Ryan Williams >Priority: Minor > Time Spent: 2h 40m > Remaining Estimate: 0h > > [~ajam...@google.com] [noticed a > bug|https://github.com/apache/beam/pull/7183/files#r243402155] in classifying > portable Flink metrics as user-namespaced vs not. > I think it's only currently affecting how metrics get displayed downstream in > Flink, but it should be fixed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6487) BigQuery api is out of date
[ https://issues.apache.org/jira/browse/BEAM-6487?focusedWorklogId=189787=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189787 ] ASF GitHub Bot logged work on BEAM-6487: Author: ASF GitHub Bot Created on: 25/Jan/19 00:32 Start Date: 25/Jan/19 00:32 Worklog Time Spent: 10m Work Description: pabloem commented on issue #7590: [BEAM-6487] Updating BQ API URL: https://github.com/apache/beam/pull/7590#issuecomment-457410870 added lint exclusion for generated internal proto sources cc:@tvalentyn This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189787) Time Spent: 1h (was: 50m) > BigQuery api is out of date > --- > > Key: BEAM-6487 > URL: https://issues.apache.org/jira/browse/BEAM-6487 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6468) Cannot create empty TestBoundedTable
[ https://issues.apache.org/jira/browse/BEAM-6468?focusedWorklogId=189801=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189801 ] ASF GitHub Bot logged work on BEAM-6468: Author: ASF GitHub Bot Created on: 25/Jan/19 00:33 Start Date: 25/Jan/19 00:33 Worklog Time Spent: 10m Work Description: akedin commented on pull request #7568: [BEAM-6468] Allow creating empty TestBoundedTable URL: https://github.com/apache/beam/pull/7568 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189801) Time Spent: 3.5h (was: 3h 20m) > Cannot create empty TestBoundedTable > > > Key: BEAM-6468 > URL: https://issues.apache.org/jira/browse/BEAM-6468 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6361) Fix user-metric prefix detection for portable Flink metrics
[ https://issues.apache.org/jira/browse/BEAM-6361?focusedWorklogId=189781=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189781 ] ASF GitHub Bot logged work on BEAM-6361: Author: ASF GitHub Bot Created on: 25/Jan/19 00:28 Start Date: 25/Jan/19 00:28 Worklog Time Spent: 10m Work Description: ryan-williams commented on issue #7408: [BEAM-6361][BEAM-6364] fix user-metric-prefix checking in Flink portable metrics update URL: https://github.com/apache/beam/pull/7408#issuecomment-457410082 ``` org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest > testWriteRetryValidRequest FAILED java.lang.AssertionError at ElasticsearchIOTest.java:219 ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189781) Time Spent: 2.5h (was: 2h 20m) > Fix user-metric prefix detection for portable Flink metrics > --- > > Key: BEAM-6361 > URL: https://issues.apache.org/jira/browse/BEAM-6361 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.9.0 >Reporter: Ryan Williams >Assignee: Ryan Williams >Priority: Minor > Time Spent: 2.5h > Remaining Estimate: 0h > > [~ajam...@google.com] [noticed a > bug|https://github.com/apache/beam/pull/7183/files#r243402155] in classifying > portable Flink metrics as user-namespaced vs not. > I think it's only currently affecting how metrics get displayed downstream in > Flink, but it should be fixed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6361) Fix user-metric prefix detection for portable Flink metrics
[ https://issues.apache.org/jira/browse/BEAM-6361?focusedWorklogId=189780=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189780 ] ASF GitHub Bot logged work on BEAM-6361: Author: ASF GitHub Bot Created on: 25/Jan/19 00:27 Start Date: 25/Jan/19 00:27 Worklog Time Spent: 10m Work Description: ryan-williams commented on issue #7408: [BEAM-6361][BEAM-6364] fix user-metric-prefix checking in Flink portable metrics update URL: https://github.com/apache/beam/pull/7408#issuecomment-457409945 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189780) Time Spent: 2h 20m (was: 2h 10m) > Fix user-metric prefix detection for portable Flink metrics > --- > > Key: BEAM-6361 > URL: https://issues.apache.org/jira/browse/BEAM-6361 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.9.0 >Reporter: Ryan Williams >Assignee: Ryan Williams >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > [~ajam...@google.com] [noticed a > bug|https://github.com/apache/beam/pull/7183/files#r243402155] in classifying > portable Flink metrics as user-namespaced vs not. > I think it's only currently affecting how metrics get displayed downstream in > Flink, but it should be fixed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6468) Cannot create empty TestBoundedTable
[ https://issues.apache.org/jira/browse/BEAM-6468?focusedWorklogId=189777=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189777 ] ASF GitHub Bot logged work on BEAM-6468: Author: ASF GitHub Bot Created on: 25/Jan/19 00:22 Start Date: 25/Jan/19 00:22 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #7568: [BEAM-6468] Allow creating empty TestBoundedTable URL: https://github.com/apache/beam/pull/7568#discussion_r250828057 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestBoundedTable.java ## @@ -92,8 +95,19 @@ public TestBoundedTable addRows(Object... args) { @Override public PCollection buildIOReader(PBegin begin) { +PTransform> valueTransform; + +if (rows.isEmpty()) { + valueTransform = + Create.empty( + SchemaCoder.of( + schema, SerializableFunctions.identity(), SerializableFunctions.identity())); +} else { + valueTransform = Create.of(rows); Review comment: Thanks. Didn't notice this API to set SchemaCoder. Updated. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189777) Time Spent: 3h 20m (was: 3h 10m) > Cannot create empty TestBoundedTable > > > Key: BEAM-6468 > URL: https://issues.apache.org/jira/browse/BEAM-6468 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6161) Add ElementCount MonitoringInfos for the Java SDK
[ https://issues.apache.org/jira/browse/BEAM-6161?focusedWorklogId=189776=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189776 ] ASF GitHub Bot logged work on BEAM-6161: Author: ASF GitHub Bot Created on: 25/Jan/19 00:21 Start Date: 25/Jan/19 00:21 Worklog Time Spent: 10m Work Description: ryan-williams commented on pull request #7272: [BEAM-6161] Introduce PCollectionConsumerRegistry and add ElementCoun… URL: https://github.com/apache/beam/pull/7272#discussion_r250827912 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SpecMonitoringInfoValidator.java ## @@ -68,6 +68,7 @@ public SpecMonitoringInfoValidator() { monitoringInfo.getUrn(), spec.getTypeUrn(), monitoringInfo.getType())); } +// TODO(ajamato): Tighten this restriction to use set equality, to catch unused Review comment: Just noting: I recently had a local change to keep this a bit relaxed, because I was testing with user metrics that were getting a `PTRANSFORM` label. I'd weakened the condition below to: ```java !requiredLabels.ieEmpty && !monitoringInfo.getLabelsMap().keySet().equals(requiredLabels) ``` where previously it required strict equality in all cases. Seemingly this was relaxed to the `containsAll` below before I got anywhere with my change, and it's possible I was in some other invalid state to have had user metrics with a `PTRANSFORM` label that were failing the strict test (where `requiredLabels` was empty). Just wanted to leave a breadcrumb here about that since we'll probably come back to it soon.. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189776) Time Spent: 11.5h (was: 11h 20m) > Add ElementCount MonitoringInfos for the Java SDK > - > > Key: BEAM-6161 > URL: https://issues.apache.org/jira/browse/BEAM-6161 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution, sdk-java-harness >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Time Spent: 11.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-4620) UnboundedReadFromBoundedSource.split() should always call split()
[ https://issues.apache.org/jira/browse/BEAM-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chamikara Jayalath resolved BEAM-4620. -- Resolution: Fixed Fix Version/s: 2.11.0 > UnboundedReadFromBoundedSource.split() should always call split() > - > > Key: BEAM-4620 > URL: https://issues.apache.org/jira/browse/BEAM-4620 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Josh M >Assignee: Chamikara Jayalath >Priority: Minor > Fix For: 2.11.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > If the source contains too little data (rounds the result down to 0), or if > size estimation fails, then we don't call .split(). We must always call > split(); instead of returning original source, we should have computed some > fallback value for desiredBundleSize. > This bug has existed since the code was first introduced, so it is not a > regression - but it is certainly a bug. > > source: > https://github.com/apache/beam/blob/697a1d17e473cd5b097aaaeee24c08f43cc77f58/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/UnboundedReadFromBoundedSource.java#L137 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4620) UnboundedReadFromBoundedSource.split() should always call split()
[ https://issues.apache.org/jira/browse/BEAM-4620?focusedWorklogId=189762=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189762 ] ASF GitHub Bot logged work on BEAM-4620: Author: ASF GitHub Bot Created on: 24/Jan/19 23:59 Start Date: 24/Jan/19 23:59 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #7555: [BEAM-4620] UnboundedReadFromBoundedSource invokes split for small bounded sources URL: https://github.com/apache/beam/pull/7555#issuecomment-457404131 Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189762) Time Spent: 1h 10m (was: 1h) > UnboundedReadFromBoundedSource.split() should always call split() > - > > Key: BEAM-4620 > URL: https://issues.apache.org/jira/browse/BEAM-4620 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Josh M >Assignee: Chamikara Jayalath >Priority: Minor > Fix For: 2.11.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > If the source contains too little data (rounds the result down to 0), or if > size estimation fails, then we don't call .split(). We must always call > split(); instead of returning original source, we should have computed some > fallback value for desiredBundleSize. > This bug has existed since the code was first introduced, so it is not a > regression - but it is certainly a bug. > > source: > https://github.com/apache/beam/blob/697a1d17e473cd5b097aaaeee24c08f43cc77f58/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/UnboundedReadFromBoundedSource.java#L137 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4620) UnboundedReadFromBoundedSource.split() should always call split()
[ https://issues.apache.org/jira/browse/BEAM-4620?focusedWorklogId=189759=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189759 ] ASF GitHub Bot logged work on BEAM-4620: Author: ASF GitHub Bot Created on: 24/Jan/19 23:50 Start Date: 24/Jan/19 23:50 Worklog Time Spent: 10m Work Description: swegner commented on pull request #7555: [BEAM-4620] UnboundedReadFromBoundedSource invokes split for small bounded sources URL: https://github.com/apache/beam/pull/7555 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189759) Time Spent: 1h (was: 50m) > UnboundedReadFromBoundedSource.split() should always call split() > - > > Key: BEAM-4620 > URL: https://issues.apache.org/jira/browse/BEAM-4620 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Josh M >Assignee: Chamikara Jayalath >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > If the source contains too little data (rounds the result down to 0), or if > size estimation fails, then we don't call .split(). We must always call > split(); instead of returning original source, we should have computed some > fallback value for desiredBundleSize. > This bug has existed since the code was first introduced, so it is not a > regression - but it is certainly a bug. > > source: > https://github.com/apache/beam/blob/697a1d17e473cd5b097aaaeee24c08f43cc77f58/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/UnboundedReadFromBoundedSource.java#L137 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5977) Benchmark buffer use in the Go SDK Harness
[ https://issues.apache.org/jira/browse/BEAM-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-5977: -- Assignee: Robert Burke > Benchmark buffer use in the Go SDK Harness > -- > > Key: BEAM-5977 > URL: https://issues.apache.org/jira/browse/BEAM-5977 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > Fix For: Not applicable > > > There's opportunity to reduce CPU & RAM usage with better buffer re-use in > [datamgr.go|https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/harness/datamgr.go] > WRT both the small element batch buffer and with large elements. > There should be some benchmarking around both large elements, and smaller > elements of varying sizes, (and ideally mixing), and using that we can > measure subsequent improvements. > As a performance improvement, offhand, maintaining a pair of a couple of > `chunkSize` buffers could be handy and avoid associated GRPC costs, to handle > the smaller elements, as well as flushing large elements immediately and > without copying. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6468) Cannot create empty TestBoundedTable
[ https://issues.apache.org/jira/browse/BEAM-6468?focusedWorklogId=189753=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189753 ] ASF GitHub Bot logged work on BEAM-6468: Author: ASF GitHub Bot Created on: 24/Jan/19 23:35 Start Date: 24/Jan/19 23:35 Worklog Time Spent: 10m Work Description: akedin commented on pull request #7568: [BEAM-6468] Allow creating empty TestBoundedTable URL: https://github.com/apache/beam/pull/7568#discussion_r250818620 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestBoundedTable.java ## @@ -92,8 +95,19 @@ public TestBoundedTable addRows(Object... args) { @Override public PCollection buildIOReader(PBegin begin) { +PTransform> valueTransform; + +if (rows.isEmpty()) { + valueTransform = + Create.empty( + SchemaCoder.of( + schema, SerializableFunctions.identity(), SerializableFunctions.identity())); +} else { + valueTransform = Create.of(rows); Review comment: why not just always set the schema, it will then handle both cases: `Create.of(rows).withSchema(SchemaCoder.of(...))` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189753) Time Spent: 3h 10m (was: 3h) > Cannot create empty TestBoundedTable > > > Key: BEAM-6468 > URL: https://issues.apache.org/jira/browse/BEAM-6468 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3
[ https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189748=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189748 ] ASF GitHub Bot logged work on BEAM-6154: Author: ASF GitHub Bot Created on: 24/Jan/19 23:28 Start Date: 24/Jan/19 23:28 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #7617: [BEAM-6154] Update google-apitools to 0.5.26 and fix gcsio in python 3 URL: https://github.com/apache/beam/pull/7617#issuecomment-457397478 R: @aaltay This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189748) Time Spent: 1h 50m (was: 1h 40m) > Gcsio batch delete broken in Python 3 > - > > Key: BEAM-6154 > URL: https://issues.apache.org/jira/browse/BEAM-6154 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > I'm running Python SDK agianst GCP in Python 3.5 and got following gcsio > error while deleting files: > {code} > File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", > line 1077, in > window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 315, in finalize_write > num_threads) > File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", > line 145, in run_using_threadpool > return pool.map(fn_to_execute, inputs) > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map > return self._map_async(func, iterable, mapstar, chunksize).get() > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get > raise self._value > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker > result = (True, func(*args, **kwds)) > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar > return list(map(*args)) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 299, in _rename_batch > FileSystems.rename(source_files, destination_files) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line > 252, in rename > return filesystem.rename(source_file_names, destination_file_names) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", > line 229, in rename > copy_statuses = gcsio.GcsIO().copy_batch(batch) > File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", > line 322, in copy_batch > api_calls = batch_request.Execute(self.client._http) # pylint: > disable=protected-access > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 222, in Execute > batch_http_request.Execute(http) > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 480, in Execute > self._Execute(http) > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 450, in _Execute > mime_response = parser.parsestr(header + response.content) > TypeError: Can't convert 'bytes' object to str implicitly > {code} > After looking into related code in apitools library, I found response.content > that's returned via http request to gcs is bytes and apitools didn't handle > this scenario. This can be a blocker to any pipeline depending on gcsio and > apparently blocks all Dataflow job in Python 3. > This could be another case that moving off apitools dependency in > [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics
[ https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=189755=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189755 ] ASF GitHub Bot logged work on BEAM-4776: Author: ASF GitHub Bot Created on: 24/Jan/19 23:37 Start Date: 24/Jan/19 23:37 Worklog Time Spent: 10m Work Description: ryan-williams commented on pull request #7621: [BEAM-4776] make AutoValues for MetricResults, MetricQueryResults URL: https://github.com/apache/beam/pull/7621 … and remove duplicated implementations from `direct` and `direct/portable` runners This is an early bit of cleanup that came out of work I'm doing to plumb metrics over the job API (cf. [BEAM-4776](https://issues.apache.org/jira/browse/BEAM-4776)). In later work I re-use these same concrete `@AutoValue` classes a third time. R: @mxm, @ajamato CC: @youngoli Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | --- | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189755) Time Spent: 10m Remaining Estimate: 0h > Java PortableRunner should support metrics > -- > > Key: BEAM-4776 > URL: https://issues.apache.org/jira/browse/BEAM-4776 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Ryan Williams >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > BEAM-4775 concerns adding metrics to the JobService API; the
[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3
[ https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189750=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189750 ] ASF GitHub Bot logged work on BEAM-6154: Author: ASF GitHub Bot Created on: 24/Jan/19 23:32 Start Date: 24/Jan/19 23:32 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #7617: [BEAM-6154] Update google-apitools to 0.5.26 and fix gcsio in python 3 URL: https://github.com/apache/beam/pull/7617 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189750) Time Spent: 2h (was: 1h 50m) > Gcsio batch delete broken in Python 3 > - > > Key: BEAM-6154 > URL: https://issues.apache.org/jira/browse/BEAM-6154 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > I'm running Python SDK agianst GCP in Python 3.5 and got following gcsio > error while deleting files: > {code} > File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", > line 1077, in > window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 315, in finalize_write > num_threads) > File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", > line 145, in run_using_threadpool > return pool.map(fn_to_execute, inputs) > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map > return self._map_async(func, iterable, mapstar, chunksize).get() > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get > raise self._value > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker > result = (True, func(*args, **kwds)) > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar > return list(map(*args)) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 299, in _rename_batch > FileSystems.rename(source_files, destination_files) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line > 252, in rename > return filesystem.rename(source_file_names, destination_file_names) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", > line 229, in rename > copy_statuses = gcsio.GcsIO().copy_batch(batch) > File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", > line 322, in copy_batch > api_calls = batch_request.Execute(self.client._http) # pylint: > disable=protected-access > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 222, in Execute > batch_http_request.Execute(http) > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 480, in Execute > self._Execute(http) > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 450, in _Execute > mime_response = parser.parsestr(header + response.content) > TypeError: Can't convert 'bytes' object to str implicitly > {code} > After looking into related code in apitools library, I found response.content > that's returned via http request to gcs is bytes and apitools didn't handle > this scenario. This can be a blocker to any pipeline depending on gcsio and > apparently blocks all Dataflow job in Python 3. > This could be another case that moving off apitools dependency in > [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6361) Fix user-metric prefix detection for portable Flink metrics
[ https://issues.apache.org/jira/browse/BEAM-6361?focusedWorklogId=189742=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189742 ] ASF GitHub Bot logged work on BEAM-6361: Author: ASF GitHub Bot Created on: 24/Jan/19 23:21 Start Date: 24/Jan/19 23:21 Worklog Time Spent: 10m Work Description: angoenka commented on issue #7408: [BEAM-6361][BEAM-6364] fix user-metric-prefix checking in Flink portable metrics update URL: https://github.com/apache/beam/pull/7408#issuecomment-457395853 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189742) Time Spent: 2h 10m (was: 2h) > Fix user-metric prefix detection for portable Flink metrics > --- > > Key: BEAM-6361 > URL: https://issues.apache.org/jira/browse/BEAM-6361 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.9.0 >Reporter: Ryan Williams >Assignee: Ryan Williams >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > [~ajam...@google.com] [noticed a > bug|https://github.com/apache/beam/pull/7183/files#r243402155] in classifying > portable Flink metrics as user-namespaced vs not. > I think it's only currently affecting how metrics get displayed downstream in > Flink, but it should be fixed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6361) Fix user-metric prefix detection for portable Flink metrics
[ https://issues.apache.org/jira/browse/BEAM-6361?focusedWorklogId=189726=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189726 ] ASF GitHub Bot logged work on BEAM-6361: Author: ASF GitHub Bot Created on: 24/Jan/19 22:59 Start Date: 24/Jan/19 22:59 Worklog Time Spent: 10m Work Description: ryan-williams commented on pull request #7408: [BEAM-6361][BEAM-6364] fix user-metric-prefix checking in Flink portable metrics update URL: https://github.com/apache/beam/pull/7408#discussion_r250810281 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleMonitoringInfoBuilder.java ## @@ -112,7 +112,7 @@ private boolean validate() { if (split.size() != 5) { LOG.warn( "Dropping MonitoringInfo for URN {}, UserMetric namespaces and " -+ "name cannot contain ':' characters.", ++ "names cannot contain ':' characters.", Review comment: It seems like you're agreeing that my fix is better? I'd changed it from "namespaces and name" to "namespaces and names". I also agree about the `s/,/./` on the comma. However it seems like this whole warning was dropped in the meantime, so this is all moot! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189726) Time Spent: 2h (was: 1h 50m) > Fix user-metric prefix detection for portable Flink metrics > --- > > Key: BEAM-6361 > URL: https://issues.apache.org/jira/browse/BEAM-6361 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.9.0 >Reporter: Ryan Williams >Assignee: Ryan Williams >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > [~ajam...@google.com] [noticed a > bug|https://github.com/apache/beam/pull/7183/files#r243402155] in classifying > portable Flink metrics as user-namespaced vs not. > I think it's only currently affecting how metrics get displayed downstream in > Flink, but it should be fixed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6024) Gradle setupVirtualenv supports Python 3
[ https://issues.apache.org/jira/browse/BEAM-6024?focusedWorklogId=189723=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189723 ] ASF GitHub Bot logged work on BEAM-6024: Author: ASF GitHub Bot Created on: 24/Jan/19 22:51 Start Date: 24/Jan/19 22:51 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #7423: [BEAM-6024] Build Python 3 container image with Gradle URL: https://github.com/apache/beam/pull/7423#issuecomment-457388800 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189723) Time Spent: 2h 10m (was: 2h) > Gradle setupVirtualenv supports Python 3 > > > Key: BEAM-6024 > URL: https://issues.apache.org/jira/browse/BEAM-6024 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > Need to depend on Python 3 virtualenv in few places: > - Build Dataflow worker container in Python 3 > - Run ValidatesRunner and integration tests on Jenkins -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6468) Cannot create empty TestBoundedTable
[ https://issues.apache.org/jira/browse/BEAM-6468?focusedWorklogId=189721=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189721 ] ASF GitHub Bot logged work on BEAM-6468: Author: ASF GitHub Bot Created on: 24/Jan/19 22:32 Start Date: 24/Jan/19 22:32 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #7568: [BEAM-6468] Allow creating empty TestBoundedTable URL: https://github.com/apache/beam/pull/7568#discussion_r250803387 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestBoundedTable.java ## @@ -92,8 +95,19 @@ public TestBoundedTable addRows(Object... args) { @Override public PCollection buildIOReader(PBegin begin) { +PTransform> valueTransform; + +if (rows.isEmpty()) { + valueTransform = + Create.empty( + SchemaCoder.of( + schema, SerializableFunctions.identity(), SerializableFunctions.identity())); +} else { + valueTransform = Create.of(rows); Review comment: if rows is empty, Create.of() will throw exception and ask for SchemaCoder This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189721) Time Spent: 3h (was: 2h 50m) > Cannot create empty TestBoundedTable > > > Key: BEAM-6468 > URL: https://issues.apache.org/jira/browse/BEAM-6468 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6161) Add ElementCount MonitoringInfos for the Java SDK
[ https://issues.apache.org/jira/browse/BEAM-6161?focusedWorklogId=189720=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189720 ] ASF GitHub Bot logged work on BEAM-6161: Author: ASF GitHub Bot Created on: 24/Jan/19 22:25 Start Date: 24/Jan/19 22:25 Worklog Time Spent: 10m Work Description: swegner commented on pull request #7272: [BEAM-6161] Introduce PCollectionConsumerRegistry and add ElementCoun… URL: https://github.com/apache/beam/pull/7272 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189720) Time Spent: 11h 20m (was: 11h 10m) > Add ElementCount MonitoringInfos for the Java SDK > - > > Key: BEAM-6161 > URL: https://issues.apache.org/jira/browse/BEAM-6161 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution, sdk-java-harness >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Time Spent: 11h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6468) Cannot create empty TestBoundedTable
[ https://issues.apache.org/jira/browse/BEAM-6468?focusedWorklogId=189719=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189719 ] ASF GitHub Bot logged work on BEAM-6468: Author: ASF GitHub Bot Created on: 24/Jan/19 22:24 Start Date: 24/Jan/19 22:24 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #7568: [BEAM-6468] Allow creating empty TestBoundedTable URL: https://github.com/apache/beam/pull/7568#issuecomment-457381427 @akedin @apilloud This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189719) Time Spent: 2h 50m (was: 2h 40m) > Cannot create empty TestBoundedTable > > > Key: BEAM-6468 > URL: https://issues.apache.org/jira/browse/BEAM-6468 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6504) Integration of Portabability sideInput into Dataflow
[ https://issues.apache.org/jira/browse/BEAM-6504?focusedWorklogId=189717=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189717 ] ASF GitHub Bot logged work on BEAM-6504: Author: ASF GitHub Bot Created on: 24/Jan/19 22:23 Start Date: 24/Jan/19 22:23 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #7619: [BEAM-6504] Create Portable sideInput handler for Dataflow (Part One) URL: https://github.com/apache/beam/pull/7619#issuecomment-457380947 R: @boyuanzz @swegner This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189717) Time Spent: 20m (was: 10m) > Integration of Portabability sideInput into Dataflow > > > Key: BEAM-6504 > URL: https://issues.apache.org/jira/browse/BEAM-6504 > Project: Beam > Issue Type: New Feature > Components: runner-dataflow >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Underlying fn api support is done in BEAM-2929, this Jira integrates > everything into dataflow. > > 1) introduce a sideInputHandler for dataflow. > 2) wire the handler to dataflow runner (i.e. ProcessRemoteBundleOperation) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=189709=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189709 ] ASF GitHub Bot logged work on BEAM-5442: Author: ASF GitHub Bot Created on: 24/Jan/19 21:43 Start Date: 24/Jan/19 21:43 Worklog Time Spent: 10m Work Description: mxm commented on pull request #7597: [WIP] [BEAM-5442] Retrieve valid runner options from JobService URL: https://github.com/apache/beam/pull/7597#discussion_r250788212 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -176,3 +179,34 @@ message JobState { CANCELLING = 10; } } + + +// DescribePipelineOptions provides metadata about the options supported by a runner. +// It will be used by the SDK client to validate the options specified by or +// list available options to the user. // Throws error GRPC_STATUS_UNAVAILABLE if server is down +message DescribePipelineOptionsRequest { + // TODO: base options to support proxy functionality Review comment: I see, thanks for clarifying. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189709) Time Spent: 17.5h (was: 17h 20m) > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Thomas Weise >Priority: Major > Labels: portability, portability-flink > Time Spent: 17.5h > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. > (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6443) decrease the number of threads for BigQuery streaming insertAll
[ https://issues.apache.org/jira/browse/BEAM-6443?focusedWorklogId=189711=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189711 ] ASF GitHub Bot logged work on BEAM-6443: Author: ASF GitHub Bot Created on: 24/Jan/19 21:54 Start Date: 24/Jan/19 21:54 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #7547: [BEAM-6443] decrease the number of thread for BigQuery streaming inse… URL: https://github.com/apache/beam/pull/7547#issuecomment-457372034 Actually, seems like sharding is per step not per bundle: https://github.com/apache/beam/blob/89e322198828702db239998d6dd29a574fe04898/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L139 So the proposal here is to reduce the parallelism of writing to BQ for each write step from 50 unlimited sized thread pools to 50 threads. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189711) Time Spent: 1h 50m (was: 1h 40m) > decrease the number of threads for BigQuery streaming insertAll > --- > > Key: BEAM-6443 > URL: https://issues.apache.org/jira/browse/BEAM-6443 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > When inserting (a large number of ) very small elements into BigQuery via > streaming insertAll, BigQueryIO causes lots of quota exceeded errors. This > implies that 1) BigQueryIO puts unnecessary overheads on BigQuery API layer > by sending requests too fast 2) log file becomes very big because of repeated > same error messages. Currently we use 50 shards for writing data into > BigQuery and in each bundle 20-30 futures are executed simultaneously with > unlimited thread pool. It would be worth investigating whether just single > thread pool is sufficient for running concurrent insertAll. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6487) BigQuery api is out of date
[ https://issues.apache.org/jira/browse/BEAM-6487?focusedWorklogId=189710=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189710 ] ASF GitHub Bot logged work on BEAM-6487: Author: ASF GitHub Bot Created on: 24/Jan/19 21:51 Start Date: 24/Jan/19 21:51 Worklog Time Spent: 10m Work Description: pabloem commented on issue #7590: [BEAM-6487] Updating BQ API URL: https://github.com/apache/beam/pull/7590#issuecomment-457371406 rebased. Hoping this'll fix the lint issue. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189710) Time Spent: 50m (was: 40m) > BigQuery api is out of date > --- > > Key: BEAM-6487 > URL: https://issues.apache.org/jira/browse/BEAM-6487 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=189708=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189708 ] ASF GitHub Bot logged work on BEAM-5442: Author: ASF GitHub Bot Created on: 24/Jan/19 21:42 Start Date: 24/Jan/19 21:42 Worklog Time Spent: 10m Work Description: mxm commented on pull request #7597: [WIP] [BEAM-5442] Retrieve valid runner options from JobService URL: https://github.com/apache/beam/pull/7597#discussion_r250788102 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java ## @@ -630,6 +634,55 @@ public static void printHelp(PrintStream out, Class i } } + public static Map describe(Set> ifaces) { +checkNotNull(ifaces); +Map result = new HashMap<>(); + +for (Class iface : ifaces) { + CACHE.get().validateWellFormed(iface); + + Set properties = PipelineOptionsReflector.getOptionSpecs(iface); + + RowSortedTable, String, Method> ifacePropGetterTable = + TreeBasedTable.create(ClassNameComparator.INSTANCE, Ordering.natural()); + for (PipelineOptionSpec prop : properties) { +ifacePropGetterTable.put(prop.getDefiningInterface(), prop.getName(), prop.getGetterMethod()); + } + + for (Map.Entry, Map> ifaceToPropertyMap : + ifacePropGetterTable.rowMap().entrySet()) { +Class currentIface = ifaceToPropertyMap.getKey(); +Map propertyNamesToGetters = ifaceToPropertyMap.getValue(); + +List lists = Lists.newArrayList(propertyNamesToGetters.keySet()); +lists.sort(String.CASE_INSENSITIVE_ORDER); +for (String propertyName : lists) { + Method method = propertyNamesToGetters.get(propertyName); + // TODO: type representation + String printableType = method.getReturnType().getSimpleName(); Review comment: Would it work to have a list of pre-defined types alongside with a mapping from Java types to those types? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189708) Time Spent: 17h 20m (was: 17h 10m) > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Thomas Weise >Priority: Major > Labels: portability, portability-flink > Time Spent: 17h 20m > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. > (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-6476) Kubectl command missing on workers beam8 and beam11
[ https://issues.apache.org/jira/browse/BEAM-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukasz Gajowy closed BEAM-6476. --- Resolution: Fixed Fix Version/s: Not applicable > Kubectl command missing on workers beam8 and beam11 > --- > > Key: BEAM-6476 > URL: https://issues.apache.org/jira/browse/BEAM-6476 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Lukasz Gajowy >Assignee: Jason Kuster >Priority: Major > Fix For: Not applicable > > > Kubernetes seems to have gone missing on beam8 and beam11 leading to > Performance Tests failures: > [https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests] > {code:java} > ... > [EnvInject] - Variables injected successfully. > [beam_PerformanceTests_JDBC] $ /bin/bash -xe > /tmp/jenkins5453785317279275290.sh > + gcloud container clusters get-credentials io-datastores > --zone=us-central1-a --verbosity=debug > DEBUG: Running [gcloud.container.clusters.get-credentials] with arguments: > [--verbosity: "debug", --zone: "us-central1-a", NAME: "io-datastores"] > WARNING: Accessing a Kubernetes Engine cluster requires the kubernetes > commandline > client [kubectl]. To install, run > $ gcloud components install kubectl > Fetching cluster endpoint and auth data. > DEBUG: Saved kubeconfig to /home/jenkins/.kube/config > kubeconfig entry generated for io-datastores. > INFO: Display format: "default" > DEBUG: SDK update checks are disabled. > [beam_PerformanceTests_JDBC] $ /bin/bash -xe > /tmp/jenkins5922335513066151017.sh > + cp /home/jenkins/.kube/config > /home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_JDBC/config-beam-performancetests-jdbc-1603 > [beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins693655582377630528.sh > + kubectl > --kubeconfig=/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_JDBC/config-beam-performancetests-jdbc-1603 > create namespace beam-performancetests-jdbc-1603 > /tmp/jenkins693655582377630528.sh: line 2: kubectl: command not found > Build step 'Execute shell' marked build as failure > Sending e-mails to: bui...@beam.apache.org > Finished: FAILURE > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6476) Kubectl command missing on workers beam8 and beam11
[ https://issues.apache.org/jira/browse/BEAM-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751609#comment-16751609 ] Lukasz Gajowy commented on BEAM-6476: - This has been fixed by infra. > Kubectl command missing on workers beam8 and beam11 > --- > > Key: BEAM-6476 > URL: https://issues.apache.org/jira/browse/BEAM-6476 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Lukasz Gajowy >Assignee: Jason Kuster >Priority: Major > > Kubernetes seems to have gone missing on beam8 and beam11 leading to > Performance Tests failures: > [https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests] > {code:java} > ... > [EnvInject] - Variables injected successfully. > [beam_PerformanceTests_JDBC] $ /bin/bash -xe > /tmp/jenkins5453785317279275290.sh > + gcloud container clusters get-credentials io-datastores > --zone=us-central1-a --verbosity=debug > DEBUG: Running [gcloud.container.clusters.get-credentials] with arguments: > [--verbosity: "debug", --zone: "us-central1-a", NAME: "io-datastores"] > WARNING: Accessing a Kubernetes Engine cluster requires the kubernetes > commandline > client [kubectl]. To install, run > $ gcloud components install kubectl > Fetching cluster endpoint and auth data. > DEBUG: Saved kubeconfig to /home/jenkins/.kube/config > kubeconfig entry generated for io-datastores. > INFO: Display format: "default" > DEBUG: SDK update checks are disabled. > [beam_PerformanceTests_JDBC] $ /bin/bash -xe > /tmp/jenkins5922335513066151017.sh > + cp /home/jenkins/.kube/config > /home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_JDBC/config-beam-performancetests-jdbc-1603 > [beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins693655582377630528.sh > + kubectl > --kubeconfig=/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_JDBC/config-beam-performancetests-jdbc-1603 > create namespace beam-performancetests-jdbc-1603 > /tmp/jenkins693655582377630528.sh: line 2: kubectl: command not found > Build step 'Execute shell' marked build as failure > Sending e-mails to: bui...@beam.apache.org > Finished: FAILURE > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6184) PortableRunner dependency missed in wordcount example maven artifact
[ https://issues.apache.org/jira/browse/BEAM-6184?focusedWorklogId=189692=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189692 ] ASF GitHub Bot logged work on BEAM-6184: Author: ASF GitHub Bot Created on: 24/Jan/19 21:07 Start Date: 24/Jan/19 21:07 Worklog Time Spent: 10m Work Description: swegner commented on pull request #7532: [BEAM-6184]Make checkstyle report error on missing javaDocMethod URL: https://github.com/apache/beam/pull/7532 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189692) Time Spent: 10h 20m (was: 10h 10m) > PortableRunner dependency missed in wordcount example maven artifact > > > Key: BEAM-6184 > URL: https://issues.apache.org/jira/browse/BEAM-6184 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Fix For: Not applicable > > Time Spent: 10h 20m > Remaining Estimate: 0h > > > > more context: > https://lists.apache.org/thread.html/8dd60395424425f7502d62888c49014430d1d3b06c026606f3db28ab@%3Cuser.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6468) Cannot create empty TestBoundedTable
[ https://issues.apache.org/jira/browse/BEAM-6468?focusedWorklogId=189698=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189698 ] ASF GitHub Bot logged work on BEAM-6468: Author: ASF GitHub Bot Created on: 24/Jan/19 21:11 Start Date: 24/Jan/19 21:11 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #7568: [BEAM-6468] Allow creating empty TestBoundedTable URL: https://github.com/apache/beam/pull/7568#issuecomment-457358175 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189698) Time Spent: 2h 40m (was: 2.5h) > Cannot create empty TestBoundedTable > > > Key: BEAM-6468 > URL: https://issues.apache.org/jira/browse/BEAM-6468 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3
[ https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189697=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189697 ] ASF GitHub Bot logged work on BEAM-6154: Author: ASF GitHub Bot Created on: 24/Jan/19 21:10 Start Date: 24/Jan/19 21:10 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #7617: [BEAM-6154] Update google-apitools to 0.5.26 and fix gcsio in python 3 URL: https://github.com/apache/beam/pull/7617#issuecomment-457357755 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189697) Time Spent: 1h 40m (was: 1.5h) > Gcsio batch delete broken in Python 3 > - > > Key: BEAM-6154 > URL: https://issues.apache.org/jira/browse/BEAM-6154 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > I'm running Python SDK agianst GCP in Python 3.5 and got following gcsio > error while deleting files: > {code} > File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", > line 1077, in > window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 315, in finalize_write > num_threads) > File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", > line 145, in run_using_threadpool > return pool.map(fn_to_execute, inputs) > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map > return self._map_async(func, iterable, mapstar, chunksize).get() > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get > raise self._value > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker > result = (True, func(*args, **kwds)) > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar > return list(map(*args)) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 299, in _rename_batch > FileSystems.rename(source_files, destination_files) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line > 252, in rename > return filesystem.rename(source_file_names, destination_file_names) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", > line 229, in rename > copy_statuses = gcsio.GcsIO().copy_batch(batch) > File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", > line 322, in copy_batch > api_calls = batch_request.Execute(self.client._http) # pylint: > disable=protected-access > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 222, in Execute > batch_http_request.Execute(http) > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 480, in Execute > self._Execute(http) > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 450, in _Execute > mime_response = parser.parsestr(header + response.content) > TypeError: Can't convert 'bytes' object to str implicitly > {code} > After looking into related code in apitools library, I found response.content > that's returned via http request to gcs is bytes and apitools didn't handle > this scenario. This can be a blocker to any pipeline depending on gcsio and > apparently blocks all Dataflow job in Python 3. > This could be another case that moving off apitools dependency in > [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6431) Add ExecutionTime metrics to the Beam Java SDK
[ https://issues.apache.org/jira/browse/BEAM-6431?focusedWorklogId=189686=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189686 ] ASF GitHub Bot logged work on BEAM-6431: Author: ASF GitHub Bot Created on: 24/Jan/19 21:03 Start Date: 24/Jan/19 21:03 Worklog Time Spent: 10m Work Description: ajamato commented on issue #7507: [BEAM-6431] [Do not merge] Refactor, Remove references to Dataflow classes in base State Sampling URL: https://github.com/apache/beam/pull/7507#issuecomment-457355342 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189686) Time Spent: 50m (was: 40m) > Add ExecutionTime metrics to the Beam Java SDK > -- > > Key: BEAM-6431 > URL: https://issues.apache.org/jira/browse/BEAM-6431 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > This will be done by using the Dataflow worker's StateSampler code. I have > put together a refactoring plan > [here|https://docs.google.com/document/d/1OlAJf4T_CTL9WRH8lP8uQOfLjWYfm8IpRXSe38g34k4/edit#] > This will include estimating the processing time for the start, process and > finish bundle. The python SDK already has an implementation of this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable Python connector
[ https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=189684=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189684 ] ASF GitHub Bot logged work on BEAM-3342: Author: ASF GitHub Bot Created on: 24/Jan/19 21:02 Start Date: 24/Jan/19 21:02 Worklog Time Spent: 10m Work Description: juan-rael commented on pull request #7367: [BEAM-3342] Create a Cloud Bigtable Python connector Write URL: https://github.com/apache/beam/pull/7367#discussion_r250775235 ## File path: sdks/python/apache_beam/examples/cookbook/bigtableio_it_test.py ## @@ -0,0 +1,197 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Unittest for GCP Bigtable testing.""" +from __future__ import absolute_import + +import datetime +import logging +import random +import string +import unittest +import uuid + +import pytz + +import apache_beam as beam +from apache_beam.io.gcp.bigtableio import _BigTableWriteFn +from apache_beam.metrics.metric import MetricsFilter +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.runners.runner import PipelineState +from apache_beam.testing.test_pipeline import TestPipeline + +# Protect against environments where bigtable library is not available. +# pylint: disable=wrong-import-order, wrong-import-position +try: + from google.cloud._helpers import _datetime_from_microseconds + from google.cloud._helpers import _microseconds_from_datetime + from google.cloud._helpers import UTC + from google.cloud.bigtable import row, column_family, Client +except ImportError: + Client = None + UTC = pytz.utc + _microseconds_from_datetime = lambda label_stamp: label_stamp + _datetime_from_microseconds = lambda micro: micro + + +EXISTING_INSTANCES = [] +LABEL_KEY = u'python-bigtable-beam' +label_stamp = datetime.datetime.utcnow().replace(tzinfo=UTC) +label_stamp_micros = _microseconds_from_datetime(label_stamp) +LABELS = {LABEL_KEY: str(label_stamp_micros)} + + +def _retry_on_unavailable(exc): + """Retry only errors whose status code is 'UNAVAILABLE'.""" + from grpc import StatusCode + return exc.code() == StatusCode.UNAVAILABLE + + +class WriteToBigTable(beam.PTransform): Review comment: I correct all the problems... This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189684) Time Spent: 13h 40m (was: 13.5h) > Create a Cloud Bigtable Python connector > > > Key: BEAM-3342 > URL: https://issues.apache.org/jira/browse/BEAM-3342 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Solomon Duskis >Assignee: Solomon Duskis >Priority: Major > Time Spent: 13h 40m > Remaining Estimate: 0h > > I would like to create a Cloud Bigtable python connector. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3
[ https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189611=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189611 ] ASF GitHub Bot logged work on BEAM-6154: Author: ASF GitHub Bot Created on: 24/Jan/19 18:36 Start Date: 24/Jan/19 18:36 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #7617: [BEAM-6154] Update google-apitools to 0.5.26 and fix gcsio in python 3 URL: https://github.com/apache/beam/pull/7617#issuecomment-457307838 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189611) Time Spent: 1h 10m (was: 1h) > Gcsio batch delete broken in Python 3 > - > > Key: BEAM-6154 > URL: https://issues.apache.org/jira/browse/BEAM-6154 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > I'm running Python SDK agianst GCP in Python 3.5 and got following gcsio > error while deleting files: > {code} > File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", > line 1077, in > window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 315, in finalize_write > num_threads) > File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", > line 145, in run_using_threadpool > return pool.map(fn_to_execute, inputs) > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map > return self._map_async(func, iterable, mapstar, chunksize).get() > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get > raise self._value > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker > result = (True, func(*args, **kwds)) > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar > return list(map(*args)) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 299, in _rename_batch > FileSystems.rename(source_files, destination_files) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line > 252, in rename > return filesystem.rename(source_file_names, destination_file_names) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", > line 229, in rename > copy_statuses = gcsio.GcsIO().copy_batch(batch) > File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", > line 322, in copy_batch > api_calls = batch_request.Execute(self.client._http) # pylint: > disable=protected-access > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 222, in Execute > batch_http_request.Execute(http) > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 480, in Execute > self._Execute(http) > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 450, in _Execute > mime_response = parser.parsestr(header + response.content) > TypeError: Can't convert 'bytes' object to str implicitly > {code} > After looking into related code in apitools library, I found response.content > that's returned via http request to gcs is bytes and apitools didn't handle > this scenario. This can be a blocker to any pipeline depending on gcsio and > apparently blocks all Dataflow job in Python 3. > This could be another case that moving off apitools dependency in > [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6161) Add ElementCount MonitoringInfos for the Java SDK
[ https://issues.apache.org/jira/browse/BEAM-6161?focusedWorklogId=189687=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189687 ] ASF GitHub Bot logged work on BEAM-6161: Author: ASF GitHub Bot Created on: 24/Jan/19 21:03 Start Date: 24/Jan/19 21:03 Worklog Time Spent: 10m Work Description: ajamato commented on issue #7272: [BEAM-6161] Introduce PCollectionConsumerRegistry and add ElementCoun… URL: https://github.com/apache/beam/pull/7272#issuecomment-457355367 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189687) Time Spent: 11h 10m (was: 11h) > Add ElementCount MonitoringInfos for the Java SDK > - > > Key: BEAM-6161 > URL: https://issues.apache.org/jira/browse/BEAM-6161 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution, sdk-java-harness >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Time Spent: 11h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5933) PCollectionViews$SimplePCollectionView.hashCode allocates memory
[ https://issues.apache.org/jira/browse/BEAM-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751579#comment-16751579 ] Ismaël Mejía commented on BEAM-5933: clone? close? I did not understand what is the reality? isn't allocating extra memory an issue? > PCollectionViews$SimplePCollectionView.hashCode allocates memory > > > Key: BEAM-5933 > URL: https://issues.apache.org/jira/browse/BEAM-5933 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Affects Versions: 2.8.0 >Reporter: Vojtech Janota >Assignee: Vojtech Janota >Priority: Trivial > Fix For: 2.9.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > I'm currently profiling memory consumption of our Beam pipeline and have > noticed that > > org.apache.beam.sdk.values.PCollectionViews$SimplePCollectionView.hashCode() > makes noticeable heap allocations. The implementation is: > return Objects.hash(tag); > That itself translates to: > return Arrays.hashCode(values); > Which performs implicit array creation in order to call: > public static int Arrays.hashCode(Object a[]); > Instead of the helper call, doing simple: > tag.hashCode(); > Seems more appropriate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6354) Hanging SplittableDoFnTest#testLateData
[ https://issues.apache.org/jira/browse/BEAM-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles resolved BEAM-6354. --- Resolution: Fixed > Hanging SplittableDoFnTest#testLateData > --- > > Key: BEAM-6354 > URL: https://issues.apache.org/jira/browse/BEAM-6354 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Gleb Kanterov >Assignee: Kenneth Knowles >Priority: Major > Fix For: 2.10.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > It seems that they have a similar root cause because both of them use > unbounded streams. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6504) Integration of Portabability sideInput into Dataflow
[ https://issues.apache.org/jira/browse/BEAM-6504?focusedWorklogId=189666=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189666 ] ASF GitHub Bot logged work on BEAM-6504: Author: ASF GitHub Bot Created on: 24/Jan/19 20:14 Start Date: 24/Jan/19 20:14 Worklog Time Spent: 10m Work Description: HuangLED commented on pull request #7619: [BEAM-6504] Create sideInput handler for Dataflow (Part One) URL: https://github.com/apache/beam/pull/7619 Create a SideInput handler inside dataflow fnApi runner. There will be another PR to actually wire this independent module. Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | --- | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189666) Time Spent: 10m Remaining Estimate: 0h > Integration of Portabability sideInput into Dataflow > > > Key: BEAM-6504 >
[jira] [Work logged] (BEAM-6184) PortableRunner dependency missed in wordcount example maven artifact
[ https://issues.apache.org/jira/browse/BEAM-6184?focusedWorklogId=189654=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189654 ] ASF GitHub Bot logged work on BEAM-6184: Author: ASF GitHub Bot Created on: 24/Jan/19 19:55 Start Date: 24/Jan/19 19:55 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #7532: [BEAM-6184]Make checkstyle report error on missing javadocmethod URL: https://github.com/apache/beam/pull/7532#issuecomment-457334156 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189654) Time Spent: 10h (was: 9h 50m) > PortableRunner dependency missed in wordcount example maven artifact > > > Key: BEAM-6184 > URL: https://issues.apache.org/jira/browse/BEAM-6184 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Fix For: Not applicable > > Time Spent: 10h > Remaining Estimate: 0h > > > > more context: > https://lists.apache.org/thread.html/8dd60395424425f7502d62888c49014430d1d3b06c026606f3db28ab@%3Cuser.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6504) Integration of Portabability sideInput into Dataflow
Ruoyun Huang created BEAM-6504: -- Summary: Integration of Portabability sideInput into Dataflow Key: BEAM-6504 URL: https://issues.apache.org/jira/browse/BEAM-6504 Project: Beam Issue Type: New Feature Components: runner-dataflow Reporter: Ruoyun Huang Assignee: Ruoyun Huang Underlying fn api support is done in BEAM-2929, this Jira integrates everything into dataflow. 1) introduce a sideInputHandler for dataflow. 2) wire the handler to dataflow runner (i.e. ProcessRemoteBundleOperation) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3
[ https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189665=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189665 ] ASF GitHub Bot logged work on BEAM-6154: Author: ASF GitHub Bot Created on: 24/Jan/19 20:09 Start Date: 24/Jan/19 20:09 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #7617: [BEAM-6154] Update google-apitools to 0.5.26 and fix gcsio in python 3 URL: https://github.com/apache/beam/pull/7617#issuecomment-457338679 Run Python Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189665) Time Spent: 1.5h (was: 1h 20m) > Gcsio batch delete broken in Python 3 > - > > Key: BEAM-6154 > URL: https://issues.apache.org/jira/browse/BEAM-6154 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > I'm running Python SDK agianst GCP in Python 3.5 and got following gcsio > error while deleting files: > {code} > File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", > line 1077, in > window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 315, in finalize_write > num_threads) > File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", > line 145, in run_using_threadpool > return pool.map(fn_to_execute, inputs) > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map > return self._map_async(func, iterable, mapstar, chunksize).get() > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get > raise self._value > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker > result = (True, func(*args, **kwds)) > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar > return list(map(*args)) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 299, in _rename_batch > FileSystems.rename(source_files, destination_files) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line > 252, in rename > return filesystem.rename(source_file_names, destination_file_names) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", > line 229, in rename > copy_statuses = gcsio.GcsIO().copy_batch(batch) > File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", > line 322, in copy_batch > api_calls = batch_request.Execute(self.client._http) # pylint: > disable=protected-access > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 222, in Execute > batch_http_request.Execute(http) > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 480, in Execute > self._Execute(http) > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 450, in _Execute > mime_response = parser.parsestr(header + response.content) > TypeError: Can't convert 'bytes' object to str implicitly > {code} > After looking into related code in apitools library, I found response.content > that's returned via http request to gcs is bytes and apitools didn't handle > this scenario. This can be a blocker to any pipeline depending on gcsio and > apparently blocks all Dataflow job in Python 3. > This could be another case that moving off apitools dependency in > [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6184) PortableRunner dependency missed in wordcount example maven artifact
[ https://issues.apache.org/jira/browse/BEAM-6184?focusedWorklogId=189652=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189652 ] ASF GitHub Bot logged work on BEAM-6184: Author: ASF GitHub Bot Created on: 24/Jan/19 19:54 Start Date: 24/Jan/19 19:54 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #7532: [BEAM-6184]Make checkstyle report error on missing javadocmethod URL: https://github.com/apache/beam/pull/7532#issuecomment-457333792 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189652) Time Spent: 9h 40m (was: 9.5h) > PortableRunner dependency missed in wordcount example maven artifact > > > Key: BEAM-6184 > URL: https://issues.apache.org/jira/browse/BEAM-6184 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Fix For: Not applicable > > Time Spent: 9h 40m > Remaining Estimate: 0h > > > > more context: > https://lists.apache.org/thread.html/8dd60395424425f7502d62888c49014430d1d3b06c026606f3db28ab@%3Cuser.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6184) PortableRunner dependency missed in wordcount example maven artifact
[ https://issues.apache.org/jira/browse/BEAM-6184?focusedWorklogId=189656=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189656 ] ASF GitHub Bot logged work on BEAM-6184: Author: ASF GitHub Bot Created on: 24/Jan/19 20:02 Start Date: 24/Jan/19 20:02 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #7532: [BEAM-6184]Make checkstyle report error on missing javadocmethod URL: https://github.com/apache/beam/pull/7532#issuecomment-457336629 synced to HEAD and now shows all green This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189656) Time Spent: 10h 10m (was: 10h) > PortableRunner dependency missed in wordcount example maven artifact > > > Key: BEAM-6184 > URL: https://issues.apache.org/jira/browse/BEAM-6184 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Fix For: Not applicable > > Time Spent: 10h 10m > Remaining Estimate: 0h > > > > more context: > https://lists.apache.org/thread.html/8dd60395424425f7502d62888c49014430d1d3b06c026606f3db28ab@%3Cuser.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6184) PortableRunner dependency missed in wordcount example maven artifact
[ https://issues.apache.org/jira/browse/BEAM-6184?focusedWorklogId=189653=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189653 ] ASF GitHub Bot logged work on BEAM-6184: Author: ASF GitHub Bot Created on: 24/Jan/19 19:55 Start Date: 24/Jan/19 19:55 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #7532: [BEAM-6184]Make checkstyle report error on missing javadocmethod URL: https://github.com/apache/beam/pull/7532#issuecomment-457334112 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189653) Time Spent: 9h 50m (was: 9h 40m) > PortableRunner dependency missed in wordcount example maven artifact > > > Key: BEAM-6184 > URL: https://issues.apache.org/jira/browse/BEAM-6184 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Fix For: Not applicable > > Time Spent: 9h 50m > Remaining Estimate: 0h > > > > more context: > https://lists.apache.org/thread.html/8dd60395424425f7502d62888c49014430d1d3b06c026606f3db28ab@%3Cuser.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=189645=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189645 ] ASF GitHub Bot logged work on BEAM-5442: Author: ASF GitHub Bot Created on: 24/Jan/19 19:39 Start Date: 24/Jan/19 19:39 Worklog Time Spent: 10m Work Description: mxm commented on issue #7564: [release] Revert "[BEAM-5442] Pass SDK-unknown PipelineOptions to Runner" URL: https://github.com/apache/beam/pull/7564#issuecomment-457329021 Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189645) Time Spent: 17h (was: 16h 50m) > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Thomas Weise >Priority: Major > Labels: portability, portability-flink > Time Spent: 17h > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. > (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=189649=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189649 ] ASF GitHub Bot logged work on BEAM-5442: Author: ASF GitHub Bot Created on: 24/Jan/19 19:46 Start Date: 24/Jan/19 19:46 Worklog Time Spent: 10m Work Description: tweise commented on pull request #7597: [WIP] [BEAM-5442] Retrieve valid runner options from JobService URL: https://github.com/apache/beam/pull/7597#discussion_r250697265 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -176,3 +179,37 @@ message JobState { CANCELLING = 10; } } + + +// DescribePipelineOptions provides metadata about the options supported by a runner. +// It will be used by the SDK client to validate the options specified by or +// list available options to the user. +// Throws error GRPC_STATUS_UNAVAILABLE if server is down +message DescribePipelineOptionsRequest { + // TODO: base options to support proxy functionality +} + +// Metadata for a pipeline option. +message PipelineOptionDescriptor { + // (Required) The option key. + string key = 1; + + // (Required) Type of option. + // TODO: enumerate + string type = 2; + + // (Optional) Description suitable for display / help text. + string description = 3; + + // (Optional) Default value. + string default_value = 4; + + // (Required) The group this option belongs to. + string group = 5; +} + +message DescribePipelineOptionsResponse { + // Map of pipeline option key to descriptor. + // Note that keys are expected to be unique accross groups. + map options = 1; Review comment: @mxm this map/key may complicate things if we later introduce scoping. Perhaps just return all options as list? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189649) Time Spent: 17h 10m (was: 17h) > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Thomas Weise >Priority: Major > Labels: portability, portability-flink > Time Spent: 17h 10m > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. > (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4620) UnboundedReadFromBoundedSource.split() should always call split()
[ https://issues.apache.org/jira/browse/BEAM-4620?focusedWorklogId=189648=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189648 ] ASF GitHub Bot logged work on BEAM-4620: Author: ASF GitHub Bot Created on: 24/Jan/19 19:45 Start Date: 24/Jan/19 19:45 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #7555: [BEAM-4620] UnboundedReadFromBoundedSource invokes split for small bounded sources URL: https://github.com/apache/beam/pull/7555#issuecomment-457331131 PTAL. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189648) Time Spent: 50m (was: 40m) > UnboundedReadFromBoundedSource.split() should always call split() > - > > Key: BEAM-4620 > URL: https://issues.apache.org/jira/browse/BEAM-4620 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Josh M >Assignee: Chamikara Jayalath >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > If the source contains too little data (rounds the result down to 0), or if > size estimation fails, then we don't call .split(). We must always call > split(); instead of returning original source, we should have computed some > fallback value for desiredBundleSize. > This bug has existed since the code was first introduced, so it is not a > regression - but it is certainly a bug. > > source: > https://github.com/apache/beam/blob/697a1d17e473cd5b097aaaeee24c08f43cc77f58/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/UnboundedReadFromBoundedSource.java#L137 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4620) UnboundedReadFromBoundedSource.split() should always call split()
[ https://issues.apache.org/jira/browse/BEAM-4620?focusedWorklogId=189647=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189647 ] ASF GitHub Bot logged work on BEAM-4620: Author: ASF GitHub Bot Created on: 24/Jan/19 19:45 Start Date: 24/Jan/19 19:45 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #7555: [BEAM-4620] UnboundedReadFromBoundedSource invokes split for small bounded sources URL: https://github.com/apache/beam/pull/7555#discussion_r250752064 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/UnboundedReadFromBoundedSource.java ## @@ -127,9 +131,11 @@ public void validate() { long desiredBundleSize = boundedSource.getEstimatedSizeBytes(options) / desiredNumSplits; if (desiredBundleSize <= 0) { LOG.warn( - "BoundedSource {} cannot estimate its size, skips the initial splits.", Review comment: Your comment makes sense. To make matters more complicated our API instructs source authors to return 0L if size of source cannot be determined. So I updated code to pick a default size of 64MB in case 0 is returned as the estimated source. Otherwise we'll try to maximize splitting by setting desiredBindleSize to 1 when a valid value cannot be determined. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189647) Time Spent: 40m (was: 0.5h) > UnboundedReadFromBoundedSource.split() should always call split() > - > > Key: BEAM-4620 > URL: https://issues.apache.org/jira/browse/BEAM-4620 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Josh M >Assignee: Chamikara Jayalath >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > If the source contains too little data (rounds the result down to 0), or if > size estimation fails, then we don't call .split(). We must always call > split(); instead of returning original source, we should have computed some > fallback value for desiredBundleSize. > This bug has existed since the code was first introduced, so it is not a > regression - but it is certainly a bug. > > source: > https://github.com/apache/beam/blob/697a1d17e473cd5b097aaaeee24c08f43cc77f58/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/UnboundedReadFromBoundedSource.java#L137 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6501) [beam_PostCommit_Python_VR_Flink] [Flake] Java runtime runs out of memory.
[ https://issues.apache.org/jira/browse/BEAM-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751464#comment-16751464 ] Ankur Goenka commented on BEAM-6501: It looks like we are requesting a big chunk of memory from jvm which is causing this issues. *05:06:15* Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00034360, 3581411328, 0) failed; error='Cannot allocate memory' (errno=12)*05:06:15* #*05:06:15* # There is insufficient memory for the Java Runtime Environment to continue.*05:06:15* # Native memory allocation (mmap) failed to map 3581411328 bytes for committing reserved memory.*05:06:15* # An error report file with more information is saved as:*05:06:15* # /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_VR_Flink/src/sdks/python/hs_err_pid10167.log*05:06:16* Exception in thread wait_until_finish_read:*05:06:16* Traceback (most recent call last):*05:06:16* File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner*05:06:16* self.run()*05:06:16* File "/usr/lib/python2.7/threading.py", line 754, in run*05:06:16* self.__target(*self.__args, **self.__kwargs)*05:06:16* File "apache_beam/runners/portability/portable_runner.py", line 331, in read_messages*05:06:16* for message in self._message_stream:*05:06:16* File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_VR_Flink/src/build/gradleenv/1327086738/local/lib/python2.7/site-packages/grpc/_channel.py", line 367, in next*05:06:16* return self._next()*05:06:16* File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_VR_Flink/src/build/gradleenv/1327086738/local/lib/python2.7/site-packages/grpc/_channel.py", line 358, in _next*05:06:16* raise self*05:06:16* _Rendezvous: <_Rendezvous of RPC that terminated with:*05:06:16* status = StatusCode.UNAVAILABLE*05:06:16* details = "Socket closed"*05:06:16* debug_error_string = "\{"created":"@1548335176.725835112","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1036,"grpc_message":"Socket closed","grpc_status":14}"*05:06:16* >*05:06:16* *06:43:22* Build timed out (after 100 minutes). Marking the build as aborted.*06:43:22* EssBuild was aborted*06:43:23* No emails were triggered.*06:43:23* Finished: ABORTED > [beam_PostCommit_Python_VR_Flink] [Flake] Java runtime runs out of memory. > -- > > Key: BEAM-6501 > URL: https://issues.apache.org/jira/browse/BEAM-6501 > Project: Beam > Issue Type: Bug > Components: runner-flink, test-failures >Reporter: Daniel Oliveira >Assignee: Ankur Goenka >Priority: Major > Labels: flake > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/86/] > Initial investigation: > Test seems to have timed out after failing due to hitting a memory limit. > From the bottom of the logs: > {noformat} > 05:06:15 # There is insufficient memory for the Java Runtime Environment to > continue. > 05:06:15 # Native memory allocation (mmap) failed to map 3581411328 bytes for > committing reserved memory. > {noformat} > May be a memory leak or a quota issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=189636=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189636 ] ASF GitHub Bot logged work on BEAM-5442: Author: ASF GitHub Bot Created on: 24/Jan/19 19:21 Start Date: 24/Jan/19 19:21 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #7564: [release] Revert "[BEAM-5442] Pass SDK-unknown PipelineOptions to Runner" URL: https://github.com/apache/beam/pull/7564#issuecomment-457322591 I am fixing the image now, but I am happy to merge this, too. Final verification suffices. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189636) Time Spent: 16h 40m (was: 16.5h) > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Thomas Weise >Priority: Major > Labels: portability, portability-flink > Time Spent: 16h 40m > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. > (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6468) Cannot create empty TestBoundedTable
[ https://issues.apache.org/jira/browse/BEAM-6468?focusedWorklogId=189635=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189635 ] ASF GitHub Bot logged work on BEAM-6468: Author: ASF GitHub Bot Created on: 24/Jan/19 19:19 Start Date: 24/Jan/19 19:19 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #7568: [BEAM-6468] Allow creating empty TestBoundedTable URL: https://github.com/apache/beam/pull/7568#issuecomment-457322129 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189635) Time Spent: 2.5h (was: 2h 20m) > Cannot create empty TestBoundedTable > > > Key: BEAM-6468 > URL: https://issues.apache.org/jira/browse/BEAM-6468 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6489) Python precommits are failiing due to a pip error: "no such option: --process-dependency-links"
[ https://issues.apache.org/jira/browse/BEAM-6489?focusedWorklogId=189639=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189639 ] ASF GitHub Bot logged work on BEAM-6489: Author: ASF GitHub Bot Created on: 24/Jan/19 19:22 Start Date: 24/Jan/19 19:22 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #7611: [BEAM-6489] Cherry-pick 10f4b1a5ef to 2.10.0 release branch URL: https://github.com/apache/beam/pull/7611 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189639) Time Spent: 4.5h (was: 4h 20m) > Python precommits are failiing due to a pip error: "no such option: > --process-dependency-links" > > > Key: BEAM-6489 > URL: https://issues.apache.org/jira/browse/BEAM-6489 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Blocker > Labels: currently-failing > Fix For: 2.11.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > > {noformat} > 13:35:37 GLOB sdist-make: > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/setup.py > 13:35:37 docs create: > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/target/.tox/docs > 13:35:39 docs installdeps: Sphinx==1.6.5, sphinx_rtd_theme==0.2.4 > 13:35:39 ERROR: invocation failed (exit code 2), logfile: > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/target/.tox/docs/log/docs-1.log > 13:35:39 ERROR: actionid: docs > 13:35:39 msg: getenv > 13:35:39 cmdargs: > ['/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/target/.tox/docs/bin/python', > > '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/target/.tox/docs/bin/pip', > 'install', '--retries', '10', '--process-dependency-links', 'Sphinx==1.6.5', > 'sphinx_rtd_theme==0.2.4'] > 13:35:39 > 13:35:39 > 13:35:39 Usage: > 13:35:39 pip install [options] > [package-index-options] ... > 13:35:39 pip install [options] -r > [package-index-options] ... > 13:35:39 pip install [options] [-e] ... > 13:35:39 pip install [options] [-e] ... > 13:35:39 pip install [options] ... > 13:35:39 > 13:35:39 no such option: --process-dependency-links > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=189638=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189638 ] ASF GitHub Bot logged work on BEAM-5442: Author: ASF GitHub Bot Created on: 24/Jan/19 19:21 Start Date: 24/Jan/19 19:21 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #7564: [release] Revert "[BEAM-5442] Pass SDK-unknown PipelineOptions to Runner" URL: https://github.com/apache/beam/pull/7564 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189638) Time Spent: 16h 50m (was: 16h 40m) > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Thomas Weise >Priority: Major > Labels: portability, portability-flink > Time Spent: 16h 50m > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. > (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6503) PCollectionViews$SimplePCollectionView.hashCode once again allocates memory (fix reverted, then fixed again)
Kenneth Knowles created BEAM-6503: - Summary: PCollectionViews$SimplePCollectionView.hashCode once again allocates memory (fix reverted, then fixed again) Key: BEAM-6503 URL: https://issues.apache.org/jira/browse/BEAM-6503 Project: Beam Issue Type: Improvement Components: sdk-java-core Affects Versions: 2.8.0 Reporter: Vojtech Janota Assignee: Vojtech Janota Fix For: 2.9.0 I'm currently profiling memory consumption of our Beam pipeline and have noticed that org.apache.beam.sdk.values.PCollectionViews$SimplePCollectionView.hashCode() makes noticeable heap allocations. The implementation is: return Objects.hash(tag); That itself translates to: return Arrays.hashCode(values); Which performs implicit array creation in order to call: public static int Arrays.hashCode(Object a[]); Instead of the helper call, doing simple: tag.hashCode(); Seems more appropriate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6324) CassandraIO.Read - Add the ability to provide a filter to the query
[ https://issues.apache.org/jira/browse/BEAM-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751440#comment-16751440 ] Kenneth Knowles commented on BEAM-6324: --- The PR is still open and I don't see that this is actually in 2.9.0. Just updating the Fix Version field so Jira reflects this. > CassandraIO.Read - Add the ability to provide a filter to the query > --- > > Key: BEAM-6324 > URL: https://issues.apache.org/jira/browse/BEAM-6324 > Project: Beam > Issue Type: Improvement > Components: io-java-cassandra >Affects Versions: 2.9.0 >Reporter: Shahar Frank >Assignee: Shahar Frank >Priority: Major > Labels: performance, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > CassandraIO.Read doesn't support using WHERE to filter the input at the > source (In Cassandra) which might provide great performance boost. > Already implemented by: > https://github.com/apache/beam/pull/7340 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6500) Precomit broken due to style violation
[ https://issues.apache.org/jira/browse/BEAM-6500?focusedWorklogId=189626=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189626 ] ASF GitHub Bot logged work on BEAM-6500: Author: ASF GitHub Bot Created on: 24/Jan/19 19:08 Start Date: 24/Jan/19 19:08 Worklog Time Spent: 10m Work Description: swegner commented on issue #7618: [BEAM-6500] Add missing javadocs for new public methods URL: https://github.com/apache/beam/pull/7618#issuecomment-457318412 Java pre-commit is passing and this comment-only change doesn't effect Python. I believe this is ready to merge. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189626) Time Spent: 0.5h (was: 20m) > Precomit broken due to style violation > -- > > Key: BEAM-6500 > URL: https://issues.apache.org/jira/browse/BEAM-6500 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Scott Wegner >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > This seems to be broken for all PRs now > > :beam-runners-direct-java:checkstyleMain FAILED [ant:checkstyle] [WARN] > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_Phrase/src/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/ReferenceRunner.java:164:3: > Missing a Javadoc comment. [JavadocMethod] [ant:checkstyle] [ERROR] > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_Phrase/src/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/job/ReferenceRunnerJobServer.java:147: > Missing a Javadoc comment. [JavadocType] > > https://scans.gradle.com/s/nwgb7xegklwqo/console-log?task=:beam-runners-direct-java:checkstyleMain -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6503) PCollectionViews$SimplePCollectionView.hashCode once again allocates memory (fix reverted, then fixed again)
[ https://issues.apache.org/jira/browse/BEAM-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-6503: -- Affects Version/s: (was: 2.8.0) 2.10.0 > PCollectionViews$SimplePCollectionView.hashCode once again allocates memory > (fix reverted, then fixed again) > > > Key: BEAM-6503 > URL: https://issues.apache.org/jira/browse/BEAM-6503 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Affects Versions: 2.10.0 >Reporter: Vojtech Janota >Assignee: Vojtech Janota >Priority: Trivial > > I'm currently profiling memory consumption of our Beam pipeline and have > noticed that > > org.apache.beam.sdk.values.PCollectionViews$SimplePCollectionView.hashCode() > makes noticeable heap allocations. The implementation is: > return Objects.hash(tag); > That itself translates to: > return Arrays.hashCode(values); > Which performs implicit array creation in order to call: > public static int Arrays.hashCode(Object a[]); > Instead of the helper call, doing simple: > tag.hashCode(); > Seems more appropriate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6503) PCollectionViews$SimplePCollectionView.hashCode once again allocates memory (fix reverted, then fixed again)
[ https://issues.apache.org/jira/browse/BEAM-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-6503: -- Fix Version/s: (was: 2.9.0) > PCollectionViews$SimplePCollectionView.hashCode once again allocates memory > (fix reverted, then fixed again) > > > Key: BEAM-6503 > URL: https://issues.apache.org/jira/browse/BEAM-6503 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Affects Versions: 2.8.0 >Reporter: Vojtech Janota >Assignee: Vojtech Janota >Priority: Trivial > > I'm currently profiling memory consumption of our Beam pipeline and have > noticed that > > org.apache.beam.sdk.values.PCollectionViews$SimplePCollectionView.hashCode() > makes noticeable heap allocations. The implementation is: > return Objects.hash(tag); > That itself translates to: > return Arrays.hashCode(values); > Which performs implicit array creation in order to call: > public static int Arrays.hashCode(Object a[]); > Instead of the helper call, doing simple: > tag.hashCode(); > Seems more appropriate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5933) PCollectionViews$SimplePCollectionView.hashCode allocates memory
[ https://issues.apache.org/jira/browse/BEAM-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751443#comment-16751443 ] Kenneth Knowles commented on BEAM-5933: --- I was just going through the Release tab of Jira and I think it might be good to represent the reality there, so I will clone. > PCollectionViews$SimplePCollectionView.hashCode allocates memory > > > Key: BEAM-5933 > URL: https://issues.apache.org/jira/browse/BEAM-5933 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Affects Versions: 2.8.0 >Reporter: Vojtech Janota >Assignee: Vojtech Janota >Priority: Trivial > Fix For: 2.9.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > I'm currently profiling memory consumption of our Beam pipeline and have > noticed that > > org.apache.beam.sdk.values.PCollectionViews$SimplePCollectionView.hashCode() > makes noticeable heap allocations. The implementation is: > return Objects.hash(tag); > That itself translates to: > return Arrays.hashCode(values); > Which performs implicit array creation in order to call: > public static int Arrays.hashCode(Object a[]); > Instead of the helper call, doing simple: > tag.hashCode(); > Seems more appropriate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-5933) PCollectionViews$SimplePCollectionView.hashCode allocates memory
[ https://issues.apache.org/jira/browse/BEAM-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles resolved BEAM-5933. --- Resolution: Fixed > PCollectionViews$SimplePCollectionView.hashCode allocates memory > > > Key: BEAM-5933 > URL: https://issues.apache.org/jira/browse/BEAM-5933 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Affects Versions: 2.8.0 >Reporter: Vojtech Janota >Assignee: Vojtech Janota >Priority: Trivial > Fix For: 2.9.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > I'm currently profiling memory consumption of our Beam pipeline and have > noticed that > > org.apache.beam.sdk.values.PCollectionViews$SimplePCollectionView.hashCode() > makes noticeable heap allocations. The implementation is: > return Objects.hash(tag); > That itself translates to: > return Arrays.hashCode(values); > Which performs implicit array creation in order to call: > public static int Arrays.hashCode(Object a[]); > Instead of the helper call, doing simple: > tag.hashCode(); > Seems more appropriate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5446) SplittableDoFn: Remove runner time execution information from public API surface
[ https://issues.apache.org/jira/browse/BEAM-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751441#comment-16751441 ] Kenneth Knowles commented on BEAM-5446: --- Since this _was_ included in 2.9.0 but will not be in 2.10.0, I am going to re-close and clone. > SplittableDoFn: Remove runner time execution information from public API > surface > > > Key: BEAM-5446 > URL: https://issues.apache.org/jira/browse/BEAM-5446 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Scott Wegner >Priority: Minor > Fix For: 2.9.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Move the setting of "claim observers" within RestrictionTracker to another > location to clean up the RestrictionTracker interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6502) SplittableDoFn: Re-Remove runner time execution information from public API surface
[ https://issues.apache.org/jira/browse/BEAM-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-6502: - Assignee: Scott Wegner (was: Luke Cwik) > SplittableDoFn: Re-Remove runner time execution information from public API > surface > --- > > Key: BEAM-6502 > URL: https://issues.apache.org/jira/browse/BEAM-6502 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Scott Wegner >Priority: Minor > Fix For: 2.11.0 > > > Move the setting of "claim observers" within RestrictionTracker to another > location to clean up the RestrictionTracker interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6500) Precomit broken due to style violation
[ https://issues.apache.org/jira/browse/BEAM-6500?focusedWorklogId=189627=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189627 ] ASF GitHub Bot logged work on BEAM-6500: Author: ASF GitHub Bot Created on: 24/Jan/19 19:08 Start Date: 24/Jan/19 19:08 Worklog Time Spent: 10m Work Description: swegner commented on pull request #7618: [BEAM-6500] Add missing javadocs for new public methods URL: https://github.com/apache/beam/pull/7618 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189627) Time Spent: 40m (was: 0.5h) > Precomit broken due to style violation > -- > > Key: BEAM-6500 > URL: https://issues.apache.org/jira/browse/BEAM-6500 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Scott Wegner >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > This seems to be broken for all PRs now > > :beam-runners-direct-java:checkstyleMain FAILED [ant:checkstyle] [WARN] > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_Phrase/src/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/ReferenceRunner.java:164:3: > Missing a Javadoc comment. [JavadocMethod] [ant:checkstyle] [ERROR] > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_Phrase/src/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/job/ReferenceRunnerJobServer.java:147: > Missing a Javadoc comment. [JavadocType] > > https://scans.gradle.com/s/nwgb7xegklwqo/console-log?task=:beam-runners-direct-java:checkstyleMain -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-5446) SplittableDoFn: Remove runner time execution information from public API surface
[ https://issues.apache.org/jira/browse/BEAM-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles resolved BEAM-5446. --- Resolution: Fixed Assignee: Luke Cwik (was: Scott Wegner) > SplittableDoFn: Remove runner time execution information from public API > surface > > > Key: BEAM-5446 > URL: https://issues.apache.org/jira/browse/BEAM-5446 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Minor > Fix For: 2.9.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Move the setting of "claim observers" within RestrictionTracker to another > location to clean up the RestrictionTracker interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6233) Make bundle execution with ExecutableStage support timer/states
[ https://issues.apache.org/jira/browse/BEAM-6233?focusedWorklogId=189628=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189628 ] ASF GitHub Bot logged work on BEAM-6233: Author: ASF GitHub Bot Created on: 24/Jan/19 19:08 Start Date: 24/Jan/19 19:08 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #7330: [BEAM-6233]: Add initial user timer support in Dataflow for batch pipelines URL: https://github.com/apache/beam/pull/7330#discussion_r250738966 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/ProcessRemoteBundleOperation.java ## @@ -48,25 +61,98 @@ new OutputReceiverFactory() { @Override public FnDataReceiver create(String pCollectionId) { - return receivedElement -> { -LOG.debug("Consume element {}", receivedElement); -outputReceiverMap.get(pCollectionId).process((WindowedValue) receivedElement); - }; + return receivedElement -> receive(pCollectionId, receivedElement); } }; private final StateRequestHandler stateRequestHandler; private final BundleProgressHandler progressHandler; private RemoteBundle remoteBundle; + private final DataflowExecutionContext executionContext; + private final Map timerOutputIdToSpecMap; + private final Map> timerWindowCodersMap; + private final Map timerIdToTimerSpecMap; + private final Map timerIdToKey; + private ExecutableStage executableStage; public ProcessRemoteBundleOperation( - OperationContext context, + ExecutableStage executableStage, + DataflowExecutionContext executionContext, + DataflowOperationContext operationContext, StageBundleFactory stageBundleFactory, Map outputReceiverMap) { -super(EMPTY_RECEIVER_ARRAY, context); +super(EMPTY_RECEIVER_ARRAY, operationContext); + this.stageBundleFactory = stageBundleFactory; -this.outputReceiverMap = outputReceiverMap; this.stateRequestHandler = StateRequestHandler.unsupported(); this.progressHandler = BundleProgressHandler.ignored(); +this.executionContext = executionContext; +this.timerOutputIdToSpecMap = new HashMap<>(); +this.timerWindowCodersMap = new HashMap<>(); +this.executableStage = executableStage; +this.timerIdToKey = new HashMap<>(); +this.outputReceiverMap = outputReceiverMap; +timerIdToTimerSpecMap = new HashMap<>(); + +for (Map transformTimerMap : + stageBundleFactory.getProcessBundleDescriptor().getTimerSpecs().values()) { + for (ProcessBundleDescriptors.TimerSpec timerSpec : transformTimerMap.values()) { +timerIdToTimerSpecMap.put(timerSpec.timerId(), timerSpec); + } +} + +for (Map transformTimerMap : + stageBundleFactory.getProcessBundleDescriptor().getTimerSpecs().values()) { + for (ProcessBundleDescriptors.TimerSpec timerSpec : transformTimerMap.values()) { +timerOutputIdToSpecMap.put(timerSpec.outputCollectionId(), timerSpec); + } +} + +for (RunnerApi.PTransform pTransform : Review comment: The inner loop and the lookup logic complicates this unfortunately to a point where the stream method obfuscates too much. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189628) Time Spent: 3h 10m (was: 3h) > Make bundle execution with ExecutableStage support timer/states > --- > > Key: BEAM-6233 > URL: https://issues.apache.org/jira/browse/BEAM-6233 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Sam Rohde >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6324) CassandraIO.Read - Add the ability to provide a filter to the query
[ https://issues.apache.org/jira/browse/BEAM-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-6324: -- Fix Version/s: (was: 2.9.0) > CassandraIO.Read - Add the ability to provide a filter to the query > --- > > Key: BEAM-6324 > URL: https://issues.apache.org/jira/browse/BEAM-6324 > Project: Beam > Issue Type: Improvement > Components: io-java-cassandra >Affects Versions: 2.9.0 >Reporter: Shahar Frank >Assignee: Shahar Frank >Priority: Major > Labels: performance, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > CassandraIO.Read doesn't support using WHERE to filter the input at the > source (In Cassandra) which might provide great performance boost. > Already implemented by: > https://github.com/apache/beam/pull/7340 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6233) Make bundle execution with ExecutableStage support timer/states
[ https://issues.apache.org/jira/browse/BEAM-6233?focusedWorklogId=189625=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189625 ] ASF GitHub Bot logged work on BEAM-6233: Author: ASF GitHub Bot Created on: 24/Jan/19 19:07 Start Date: 24/Jan/19 19:07 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #7330: [BEAM-6233]: Add initial user timer support in Dataflow for batch pipelines URL: https://github.com/apache/beam/pull/7330#discussion_r250738679 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/ProcessRemoteBundleOperation.java ## @@ -100,6 +197,75 @@ public void finish() throws Exception { } catch (Exception e) { throw new RuntimeException("Failed to finish remote bundle", e); } + + // TODO(BEAM-6274): do we have to put this in the "start" method as well? + try (RemoteBundle bundle = + stageBundleFactory.getBundle(receiverFactory, stateRequestHandler, progressHandler)) { + +// TODO(BEAM-6274): Why do we need to namespace this to "user"? +DataflowExecutionContext.DataflowStepContext stepContext = +executionContext +.getStepContext((DataflowOperationContext) this.context) +.namespacedToUser(); + +// TODO(BEAM-6274): investigate if this is the correct window +TimerInternals.TimerData timerData = +stepContext.getNextFiredTimer(GlobalWindow.Coder.INSTANCE); +while (timerData != null) { + LOG.debug("Found fired timer in start {}", timerData); + + // TODO(BEAM-6274): get the correct payload and payload coder + StateNamespaces.WindowNamespace windowNamespace = + (StateNamespaces.WindowNamespace) timerData.getNamespace(); + BoundedWindow window = windowNamespace.getWindow(); + + WindowedValue> timerValue = + WindowedValue.of( + KV.of( + timerIdToKey.get(timerData.getTimerId()), + Timer.of(timerData.getTimestamp(), new byte[0])), + timerData.getTimestamp(), + Collections.singleton(window), + PaneInfo.NO_FIRING); + + String mainInputId = + timerIdToTimerSpecMap.get(timerData.getTimerId()).inputCollectionId(); + + bundle.getInputReceivers().get(mainInputId).accept(timerValue); + + // TODO(BEAM-6274): investigate if this is the correct window + timerData = stepContext.getNextFiredTimer(GlobalWindow.Coder.INSTANCE); +} + } +} + } + + private void receive(String pCollectionId, Object receivedElement) throws Exception { +LOG.debug("Received element {} for pcollection {}", receivedElement, pCollectionId); +// TODO(BEAM-6274): move this out into its own receiver class Review comment: I plan on a follow up PR cleaning this up and moving into its own class. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189625) Time Spent: 3h (was: 2h 50m) > Make bundle execution with ExecutableStage support timer/states > --- > > Key: BEAM-6233 > URL: https://issues.apache.org/jira/browse/BEAM-6233 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Sam Rohde >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6468) Cannot create empty TestBoundedTable
[ https://issues.apache.org/jira/browse/BEAM-6468?focusedWorklogId=189618=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189618 ] ASF GitHub Bot logged work on BEAM-6468: Author: ASF GitHub Bot Created on: 24/Jan/19 18:51 Start Date: 24/Jan/19 18:51 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #7568: [BEAM-6468] Allow creating empty TestBoundedTable URL: https://github.com/apache/beam/pull/7568#issuecomment-457312809 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189618) Time Spent: 2h 20m (was: 2h 10m) > Cannot create empty TestBoundedTable > > > Key: BEAM-6468 > URL: https://issues.apache.org/jira/browse/BEAM-6468 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3
[ https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189609=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189609 ] ASF GitHub Bot logged work on BEAM-6154: Author: ASF GitHub Bot Created on: 24/Jan/19 18:36 Start Date: 24/Jan/19 18:36 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #7617: [BEAM-6154] Update google-apitools to 0.5.26 and fix gcsio in python 3 URL: https://github.com/apache/beam/pull/7617#issuecomment-457307785 Run Python Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189609) Time Spent: 50m (was: 40m) > Gcsio batch delete broken in Python 3 > - > > Key: BEAM-6154 > URL: https://issues.apache.org/jira/browse/BEAM-6154 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > I'm running Python SDK agianst GCP in Python 3.5 and got following gcsio > error while deleting files: > {code} > File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", > line 1077, in > window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 315, in finalize_write > num_threads) > File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", > line 145, in run_using_threadpool > return pool.map(fn_to_execute, inputs) > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map > return self._map_async(func, iterable, mapstar, chunksize).get() > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get > raise self._value > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker > result = (True, func(*args, **kwds)) > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar > return list(map(*args)) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 299, in _rename_batch > FileSystems.rename(source_files, destination_files) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line > 252, in rename > return filesystem.rename(source_file_names, destination_file_names) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", > line 229, in rename > copy_statuses = gcsio.GcsIO().copy_batch(batch) > File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", > line 322, in copy_batch > api_calls = batch_request.Execute(self.client._http) # pylint: > disable=protected-access > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 222, in Execute > batch_http_request.Execute(http) > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 480, in Execute > self._Execute(http) > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 450, in _Execute > mime_response = parser.parsestr(header + response.content) > TypeError: Can't convert 'bytes' object to str implicitly > {code} > After looking into related code in apitools library, I found response.content > that's returned via http request to gcs is bytes and apitools didn't handle > this scenario. This can be a blocker to any pipeline depending on gcsio and > apparently blocks all Dataflow job in Python 3. > This could be another case that moving off apitools dependency in > [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3
[ https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189612=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189612 ] ASF GitHub Bot logged work on BEAM-6154: Author: ASF GitHub Bot Created on: 24/Jan/19 18:36 Start Date: 24/Jan/19 18:36 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #7617: [BEAM-6154] Update google-apitools to 0.5.26 and fix gcsio in python 3 URL: https://github.com/apache/beam/pull/7617#issuecomment-457307867 Run Python Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189612) Time Spent: 1h 20m (was: 1h 10m) > Gcsio batch delete broken in Python 3 > - > > Key: BEAM-6154 > URL: https://issues.apache.org/jira/browse/BEAM-6154 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > I'm running Python SDK agianst GCP in Python 3.5 and got following gcsio > error while deleting files: > {code} > File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", > line 1077, in > window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 315, in finalize_write > num_threads) > File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", > line 145, in run_using_threadpool > return pool.map(fn_to_execute, inputs) > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map > return self._map_async(func, iterable, mapstar, chunksize).get() > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get > raise self._value > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker > result = (True, func(*args, **kwds)) > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar > return list(map(*args)) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 299, in _rename_batch > FileSystems.rename(source_files, destination_files) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line > 252, in rename > return filesystem.rename(source_file_names, destination_file_names) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", > line 229, in rename > copy_statuses = gcsio.GcsIO().copy_batch(batch) > File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", > line 322, in copy_batch > api_calls = batch_request.Execute(self.client._http) # pylint: > disable=protected-access > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 222, in Execute > batch_http_request.Execute(http) > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 480, in Execute > self._Execute(http) > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 450, in _Execute > mime_response = parser.parsestr(header + response.content) > TypeError: Can't convert 'bytes' object to str implicitly > {code} > After looking into related code in apitools library, I found response.content > that's returned via http request to gcs is bytes and apitools didn't handle > this scenario. This can be a blocker to any pipeline depending on gcsio and > apparently blocks all Dataflow job in Python 3. > This could be another case that moving off apitools dependency in > [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3
[ https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189608=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189608 ] ASF GitHub Bot logged work on BEAM-6154: Author: ASF GitHub Bot Created on: 24/Jan/19 18:35 Start Date: 24/Jan/19 18:35 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #7617: [BEAM-6154] Update google-apitools to 0.5.26 and fix gcsio in python 3 URL: https://github.com/apache/beam/pull/7617#issuecomment-457307339 Run Python Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189608) Time Spent: 40m (was: 0.5h) > Gcsio batch delete broken in Python 3 > - > > Key: BEAM-6154 > URL: https://issues.apache.org/jira/browse/BEAM-6154 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > I'm running Python SDK agianst GCP in Python 3.5 and got following gcsio > error while deleting files: > {code} > File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", > line 1077, in > window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 315, in finalize_write > num_threads) > File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", > line 145, in run_using_threadpool > return pool.map(fn_to_execute, inputs) > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map > return self._map_async(func, iterable, mapstar, chunksize).get() > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get > raise self._value > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker > result = (True, func(*args, **kwds)) > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar > return list(map(*args)) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 299, in _rename_batch > FileSystems.rename(source_files, destination_files) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line > 252, in rename > return filesystem.rename(source_file_names, destination_file_names) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", > line 229, in rename > copy_statuses = gcsio.GcsIO().copy_batch(batch) > File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", > line 322, in copy_batch > api_calls = batch_request.Execute(self.client._http) # pylint: > disable=protected-access > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 222, in Execute > batch_http_request.Execute(http) > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 480, in Execute > self._Execute(http) > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 450, in _Execute > mime_response = parser.parsestr(header + response.content) > TypeError: Can't convert 'bytes' object to str implicitly > {code} > After looking into related code in apitools library, I found response.content > that's returned via http request to gcs is bytes and apitools didn't handle > this scenario. This can be a blocker to any pipeline depending on gcsio and > apparently blocks all Dataflow job in Python 3. > This could be another case that moving off apitools dependency in > [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3
[ https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189607=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189607 ] ASF GitHub Bot logged work on BEAM-6154: Author: ASF GitHub Bot Created on: 24/Jan/19 18:34 Start Date: 24/Jan/19 18:34 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #7617: [BEAM-6154] Update google-apitools to 0.5.26 and fix gcsio in python 3 URL: https://github.com/apache/beam/pull/7617#issuecomment-457307212 Run Dataflow Python ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189607) Time Spent: 0.5h (was: 20m) > Gcsio batch delete broken in Python 3 > - > > Key: BEAM-6154 > URL: https://issues.apache.org/jira/browse/BEAM-6154 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > I'm running Python SDK agianst GCP in Python 3.5 and got following gcsio > error while deleting files: > {code} > File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", > line 1077, in > window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 315, in finalize_write > num_threads) > File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", > line 145, in run_using_threadpool > return pool.map(fn_to_execute, inputs) > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map > return self._map_async(func, iterable, mapstar, chunksize).get() > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get > raise self._value > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker > result = (True, func(*args, **kwds)) > File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar > return list(map(*args)) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 299, in _rename_batch > FileSystems.rename(source_files, destination_files) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line > 252, in rename > return filesystem.rename(source_file_names, destination_file_names) > File > "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", > line 229, in rename > copy_statuses = gcsio.GcsIO().copy_batch(batch) > File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", > line 322, in copy_batch > api_calls = batch_request.Execute(self.client._http) # pylint: > disable=protected-access > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 222, in Execute > batch_http_request.Execute(http) > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 480, in Execute > self._Execute(http) > File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", > line 450, in _Execute > mime_response = parser.parsestr(header + response.content) > TypeError: Can't convert 'bytes' object to str implicitly > {code} > After looking into related code in apitools library, I found response.content > that's returned via http request to gcs is bytes and apitools didn't handle > this scenario. This can be a blocker to any pipeline depending on gcsio and > apparently blocks all Dataflow job in Python 3. > This could be another case that moving off apitools dependency in > [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=189604=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189604 ] ASF GitHub Bot logged work on BEAM-5442: Author: ASF GitHub Bot Created on: 24/Jan/19 18:32 Start Date: 24/Jan/19 18:32 Worklog Time Spent: 10m Work Description: mxm commented on issue #7564: [release] Revert "[BEAM-5442] Pass SDK-unknown PipelineOptions to Runner" URL: https://github.com/apache/beam/pull/7564#issuecomment-457306533 Do we want to fix the missing image or assume this is working working correctly since all other tests pass? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189604) Time Spent: 16.5h (was: 16h 20m) > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Thomas Weise >Priority: Major > Labels: portability, portability-flink > Time Spent: 16.5h > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. > (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6500) Precomit broken due to style violation
[ https://issues.apache.org/jira/browse/BEAM-6500?focusedWorklogId=189603=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189603 ] ASF GitHub Bot logged work on BEAM-6500: Author: ASF GitHub Bot Created on: 24/Jan/19 18:32 Start Date: 24/Jan/19 18:32 Worklog Time Spent: 10m Work Description: swegner commented on pull request #7618: [BEAM-6500] Add missing javadocs for new public methods URL: https://github.com/apache/beam/pull/7618 Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | --- | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 189603) Time Spent: 10m Remaining Estimate: 0h > Precomit broken due to style violation > -- > > Key: BEAM-6500 > URL: https://issues.apache.org/jira/browse/BEAM-6500 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >
[jira] [Assigned] (BEAM-6500) Precomit broken due to style violation
[ https://issues.apache.org/jira/browse/BEAM-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner reassigned BEAM-6500: -- Assignee: Scott Wegner > Precomit broken due to style violation > -- > > Key: BEAM-6500 > URL: https://issues.apache.org/jira/browse/BEAM-6500 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Scott Wegner >Priority: Major > > This seems to be broken for all PRs now > > :beam-runners-direct-java:checkstyleMain FAILED [ant:checkstyle] [WARN] > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_Phrase/src/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/ReferenceRunner.java:164:3: > Missing a Javadoc comment. [JavadocMethod] [ant:checkstyle] [ERROR] > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_Phrase/src/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/job/ReferenceRunnerJobServer.java:147: > Missing a Javadoc comment. [JavadocType] > > https://scans.gradle.com/s/nwgb7xegklwqo/console-log?task=:beam-runners-direct-java:checkstyleMain -- This message was sent by Atlassian JIRA (v7.6.3#76005)