[jira] [Work logged] (BEAM-5995) Create Jenkins jobs to run the load tests
[ https://issues.apache.org/jira/browse/BEAM-5995?focusedWorklogId=233293&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233293 ] ASF GitHub Bot logged work on BEAM-5995: Author: ASF GitHub Bot Created on: 26/Apr/19 06:47 Start Date: 26/Apr/19 06:47 Worklog Time Spent: 10m Work Description: kkucharc commented on issue #8151: [BEAM-5995] add Jenkins job with GBK Python load tests URL: https://github.com/apache/beam/pull/8151#issuecomment-486946186 Run Python Load Tests Smoke This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233293) Time Spent: 31h 50m (was: 31h 40m) > Create Jenkins jobs to run the load tests > - > > Key: BEAM-5995 > URL: https://issues.apache.org/jira/browse/BEAM-5995 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Kasia Kucharczyk >Assignee: Kasia Kucharczyk >Priority: Major > Labels: triaged > Time Spent: 31h 50m > Remaining Estimate: 0h > > (/) Add SMOKE test > Add GBK load tests. > Add CoGBK load tests. > Add Pardo load tests. > Add SideInput tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5995) Create Jenkins jobs to run the load tests
[ https://issues.apache.org/jira/browse/BEAM-5995?focusedWorklogId=233291&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233291 ] ASF GitHub Bot logged work on BEAM-5995: Author: ASF GitHub Bot Created on: 26/Apr/19 06:41 Start Date: 26/Apr/19 06:41 Worklog Time Spent: 10m Work Description: kkucharc commented on issue #8151: [BEAM-5995] add Jenkins job with GBK Python load tests URL: https://github.com/apache/beam/pull/8151#issuecomment-486944634 run seed job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233291) Time Spent: 31h 40m (was: 31.5h) > Create Jenkins jobs to run the load tests > - > > Key: BEAM-5995 > URL: https://issues.apache.org/jira/browse/BEAM-5995 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Kasia Kucharczyk >Assignee: Kasia Kucharczyk >Priority: Major > Labels: triaged > Time Spent: 31h 40m > Remaining Estimate: 0h > > (/) Add SMOKE test > Add GBK load tests. > Add CoGBK load tests. > Add Pardo load tests. > Add SideInput tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6627) Use Metrics API in IO performance tests
[ https://issues.apache.org/jira/browse/BEAM-6627?focusedWorklogId=233285&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233285 ] ASF GitHub Bot logged work on BEAM-6627: Author: ASF GitHub Bot Created on: 26/Apr/19 06:22 Start Date: 26/Apr/19 06:22 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #8400: [BEAM-6627] Added byte and item counting metrics to integration tests URL: https://github.com/apache/beam/pull/8400#issuecomment-486940668 I fixed the commit history to be cleaner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233285) Time Spent: 10h 40m (was: 10.5h) > Use Metrics API in IO performance tests > --- > > Key: BEAM-6627 > URL: https://issues.apache.org/jira/browse/BEAM-6627 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Michal Walenia >Assignee: Michal Walenia >Priority: Minor > Time Spent: 10h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7141) Expose kv and window parameters for on_timer
[ https://issues.apache.org/jira/browse/BEAM-7141?focusedWorklogId=233277&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233277 ] ASF GitHub Bot logged work on BEAM-7141: Author: ASF GitHub Bot Created on: 26/Apr/19 05:57 Start Date: 26/Apr/19 05:57 Worklog Time Spent: 10m Work Description: rakeshcusat commented on issue #8408: [BEAM-7141] Add window & timestamp in timer callback method argument URL: https://github.com/apache/beam/pull/8408#issuecomment-486935666 This is still work in progress. I am mostly done with implementation but still working on my test case. I thought to open the PR so you can provide me early feedback. R: @tweise @robertwb @pabloem This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233277) Time Spent: 20m (was: 10m) > Expose kv and window parameters for on_timer > > > Key: BEAM-7141 > URL: https://issues.apache.org/jira/browse/BEAM-7141 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Affects Versions: 2.12.0 >Reporter: Thomas Weise >Assignee: Rakesh Kumar >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > We would like to have access to key and window inside the timer callback. > Without, it is also difficult to debug. We run into this while working on > BEAM-7112 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6695) Latest transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-6695?focusedWorklogId=233274&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233274 ] ASF GitHub Bot logged work on BEAM-6695: Author: ASF GitHub Bot Created on: 26/Apr/19 05:56 Start Date: 26/Apr/19 05:56 Worklog Time Spent: 10m Work Description: ttanay commented on issue #8206: [BEAM-6695] Latest PTransform for Python SDK URL: https://github.com/apache/beam/pull/8206#issuecomment-486935471 Thank you @robinyqiu! :smile: This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233274) Time Spent: 6h 20m (was: 6h 10m) > Latest transform for Python SDK > --- > > Key: BEAM-6695 > URL: https://issues.apache.org/jira/browse/BEAM-6695 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Tanay Tummalapalli >Priority: Minor > Time Spent: 6h 20m > Remaining Estimate: 0h > > Add a PTransform} and Combine.CombineFn for computing the latest element in a > PCollection. > It should offer the same API as its Java counterpart: > https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Latest.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7141) Expose kv and window parameters for on_timer
[ https://issues.apache.org/jira/browse/BEAM-7141?focusedWorklogId=233269&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233269 ] ASF GitHub Bot logged work on BEAM-7141: Author: ASF GitHub Bot Created on: 26/Apr/19 05:54 Start Date: 26/Apr/19 05:54 Worklog Time Spent: 10m Work Description: rakeshcusat commented on pull request #8408: [BEAM-7141] Add window & timestamp in timer callback method argument URL: https://github.com/apache/beam/pull/8408 Timer callback argument didn't have timestamp and window parameter. This change will allow callback method to specify window and timestamp parameters and system will be able to pass these values. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/) | --- | --- | --- Pre-Commit Tests Status (on master branch) --
[jira] [Work logged] (BEAM-7139) Blog post announcing Kotlin Sample addition to Beam
[ https://issues.apache.org/jira/browse/BEAM-7139?focusedWorklogId=233240&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233240 ] ASF GitHub Bot logged work on BEAM-7139: Author: ASF GitHub Bot Created on: 26/Apr/19 02:49 Start Date: 26/Apr/19 02:49 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8391: [BEAM-7139] Blogpost for Kotlin Samples URL: https://github.com/apache/beam/pull/8391 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233240) Time Spent: 1h 20m (was: 1h 10m) > Blog post announcing Kotlin Sample addition to Beam > --- > > Key: BEAM-7139 > URL: https://issues.apache.org/jira/browse/BEAM-7139 > Project: Beam > Issue Type: Task > Components: website >Reporter: Harshit Dwivedi >Assignee: Harshit Dwivedi >Priority: Trivial > Time Spent: 1h 20m > Remaining Estimate: 0h > > Publishing a quick blog post that lets the users know that the Beam samples > now have kotlin support. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7155) Provide access to within DoFn.OnTimerContext
Reza ardeshir rokni created BEAM-7155: - Summary: Provide access to within DoFn.OnTimerContext Key: BEAM-7155 URL: https://issues.apache.org/jira/browse/BEAM-7155 Project: Beam Issue Type: Improvement Components: sdk-java-core Affects Versions: 2.11.0 Reporter: Reza ardeshir rokni org.apache.beam.sdk.transforms.DoFn.OnTimerContext Does not support access to Key. Current workaround: @ProcessElement @StateId("dataKey") ValueState key { key.write(key)} ... @OnTimer("processTimer") public void OnTimer(OnTimerContext otc, @StateId("dataKey") ValueState key, {key.read()} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7087) Create an errors package for Go SDK
[ https://issues.apache.org/jira/browse/BEAM-7087?focusedWorklogId=233220&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233220 ] ASF GitHub Bot logged work on BEAM-7087: Author: ASF GitHub Bot Created on: 26/Apr/19 01:05 Start Date: 26/Apr/19 01:05 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #8406: [BEAM-7087] Updating Go SDK errors in base beam directory URL: https://github.com/apache/beam/pull/8406#discussion_r278779819 ## File path: sdks/go/pkg/beam/combine.go ## @@ -54,25 +57,26 @@ func TryCombinePerKey(s Scope, combinefn interface{}, col PCollection) (PCollect ValidateKVType(col) col, err := TryGroupByKey(s, col) if err != nil { - return PCollection{}, fmt.Errorf("failed to group by key: %v", err) + return PCollection{}, addCombinePerKeyCtx(err, s) Review comment: Just checking: Does the root error here from TryGroupByKey reveal that it's failing at adding a GroupByKey to the graph? Otherwise it feels like we're losing some context. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233220) Time Spent: 1h 20m (was: 1h 10m) > Create an errors package for Go SDK > --- > > Key: BEAM-7087 > URL: https://issues.apache.org/jira/browse/BEAM-7087 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Fix For: Not applicable > > Time Spent: 1h 20m > Remaining Estimate: 0h > > We can't import an established errors package such as github.com/pkg/errors > due to how the Go SDK is currently implemented, so instead we should have our > own package. This package should provide the following to be worth replacing > the basic errors we currently use. > # A way to wrap other errors without losing the original error, as in > [https://github.com/pkg/errors] > # Making printed errors more legible than currently. (For ex. having > different wrapped errors on different lines, differentiating between errors > and context, etc.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7087) Create an errors package for Go SDK
[ https://issues.apache.org/jira/browse/BEAM-7087?focusedWorklogId=233219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233219 ] ASF GitHub Bot logged work on BEAM-7087: Author: ASF GitHub Bot Created on: 26/Apr/19 01:05 Start Date: 26/Apr/19 01:05 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #8406: [BEAM-7087] Updating Go SDK errors in base beam directory URL: https://github.com/apache/beam/pull/8406#discussion_r278780082 ## File path: sdks/go/pkg/beam/coder.go ## @@ -26,6 +26,7 @@ import ( "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" "github.com/apache/beam/sdks/go/pkg/beam/core/typex" "github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx" + "github.com/apache/beam/sdks/go/pkg/beam/internal" Review comment: Shouldn't this be ".../beam/internal/errors" ? (same in other files.) I missed this in the previous PR. The errors package files should be under a subdirectory in the internal folder to have the package properly be "errors" rather than "internal". Note how code in the "beam" package is all in a folder called "beam" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233219) Time Spent: 1h 10m (was: 1h) > Create an errors package for Go SDK > --- > > Key: BEAM-7087 > URL: https://issues.apache.org/jira/browse/BEAM-7087 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Fix For: Not applicable > > Time Spent: 1h 10m > Remaining Estimate: 0h > > We can't import an established errors package such as github.com/pkg/errors > due to how the Go SDK is currently implemented, so instead we should have our > own package. This package should provide the following to be worth replacing > the basic errors we currently use. > # A way to wrap other errors without losing the original error, as in > [https://github.com/pkg/errors] > # Making printed errors more legible than currently. (For ex. having > different wrapped errors on different lines, differentiating between errors > and context, etc.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6138) Add User Metric Support to Java SDK
[ https://issues.apache.org/jira/browse/BEAM-6138?focusedWorklogId=233206&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233206 ] ASF GitHub Bot logged work on BEAM-6138: Author: ASF GitHub Bot Created on: 26/Apr/19 00:42 Start Date: 26/Apr/19 00:42 Worklog Time Spent: 10m Work Description: ajamato commented on pull request #8280: [BEAM-6138] Update java SDK to report user distribution tuple metrics over the FN API URL: https://github.com/apache/beam/pull/8280#discussion_r275425153 ## File path: runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsContainerImplTest.java ## @@ -178,6 +178,34 @@ public void testMonitoringInfosArePopulatedForUserCounters() { assertThat(actualMonitoringInfos, containsInAnyOrder(builder1.build(), builder2.build())); } + @Test + public void testMonitoringInfosArePopulatedForUserDistributions() { +MetricsContainerImpl testObject = new MetricsContainerImpl("step1"); +DistributionCell c1 = testObject.getDistribution(MetricName.named("ns", "name1")); +DistributionCell c2 = testObject.getDistribution(MetricName.named("ns", "name2")); +c1.update(5L); +c2.update(4L); + +SimpleMonitoringInfoBuilder builder1 = new SimpleMonitoringInfoBuilder(); +builder1.setUrnForUserDistribution("ns", "name1"); +builder1.setInt64DistributionValue(DistributionData.create(5, 1, 5, 5)); +builder1.setPTransformLabel("step1"); +builder1.build(); Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233206) Time Spent: 9h 20m (was: 9h 10m) > Add User Metric Support to Java SDK > --- > > Key: BEAM-6138 > URL: https://issues.apache.org/jira/browse/BEAM-6138 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Labels: triaged > Time Spent: 9h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6138) Add User Metric Support to Java SDK
[ https://issues.apache.org/jira/browse/BEAM-6138?focusedWorklogId=233205&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233205 ] ASF GitHub Bot logged work on BEAM-6138: Author: ASF GitHub Bot Created on: 26/Apr/19 00:42 Start Date: 26/Apr/19 00:42 Worklog Time Spent: 10m Work Description: ajamato commented on pull request #8280: [BEAM-6138] Update java SDK to report user distribution tuple metrics over the FN API URL: https://github.com/apache/beam/pull/8280#discussion_r275424257 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstants.java ## @@ -41,7 +41,7 @@ public static final String TOTAL_MSECS = extractUrn(MonitoringInfoSpecs.Enum.TOTAL_MSECS); public static final String USER_COUNTER_PREFIX = extractUrn(MonitoringInfoSpecs.Enum.USER_COUNTER); -public static final String USER_DISTRIBUTION_COUNTER_PREFIX = +public static final String USER_DISTRIBUTION_PREFIX = Review comment: I removed references to calling it a "counter" whereever possible, because it is a distribution, not a counter. This approach is still consistent. USER_X_PREFIX. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233205) Time Spent: 9h 10m (was: 9h) > Add User Metric Support to Java SDK > --- > > Key: BEAM-6138 > URL: https://issues.apache.org/jira/browse/BEAM-6138 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Labels: triaged > Time Spent: 9h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6138) Add User Metric Support to Java SDK
[ https://issues.apache.org/jira/browse/BEAM-6138?focusedWorklogId=233207&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233207 ] ASF GitHub Bot logged work on BEAM-6138: Author: ASF GitHub Bot Created on: 26/Apr/19 00:42 Start Date: 26/Apr/19 00:42 Worklog Time Spent: 10m Work Description: ajamato commented on pull request #8280: [BEAM-6138] Update java SDK to report user distribution tuple metrics over the FN API URL: https://github.com/apache/beam/pull/8280#discussion_r275425479 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricUrns.java ## @@ -31,6 +31,9 @@ public static MetricName parseUrn(String urn) { if (urn.startsWith(MonitoringInfoConstants.Urns.USER_COUNTER_PREFIX)) { urn = urn.substring(MonitoringInfoConstants.Urns.USER_COUNTER_PREFIX.length()); } +if (urn.startsWith(MonitoringInfoConstants.Urns.USER_DISTRIBUTION_PREFIX)) { Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233207) Time Spent: 9.5h (was: 9h 20m) > Add User Metric Support to Java SDK > --- > > Key: BEAM-6138 > URL: https://issues.apache.org/jira/browse/BEAM-6138 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Labels: triaged > Time Spent: 9.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7087) Create an errors package for Go SDK
[ https://issues.apache.org/jira/browse/BEAM-7087?focusedWorklogId=233194&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233194 ] ASF GitHub Bot logged work on BEAM-7087: Author: ASF GitHub Bot Created on: 26/Apr/19 00:14 Start Date: 26/Apr/19 00:14 Worklog Time Spent: 10m Work Description: youngoli commented on issue #8406: [BEAM-7087] Updating Go SDK errors in base beam directory URL: https://github.com/apache/beam/pull/8406#issuecomment-486881210 R: @lostluck CC: @aaltay Ahmet I may be sending some of the later PRs to you for review, so wanted to keep you in the loop. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233194) Time Spent: 1h (was: 50m) > Create an errors package for Go SDK > --- > > Key: BEAM-7087 > URL: https://issues.apache.org/jira/browse/BEAM-7087 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Fix For: Not applicable > > Time Spent: 1h > Remaining Estimate: 0h > > We can't import an established errors package such as github.com/pkg/errors > due to how the Go SDK is currently implemented, so instead we should have our > own package. This package should provide the following to be worth replacing > the basic errors we currently use. > # A way to wrap other errors without losing the original error, as in > [https://github.com/pkg/errors] > # Making printed errors more legible than currently. (For ex. having > different wrapped errors on different lines, differentiating between errors > and context, etc.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5709) Tests in BeamFnControlServiceTest are flaky.
[ https://issues.apache.org/jira/browse/BEAM-5709?focusedWorklogId=233193&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233193 ] ASF GitHub Bot logged work on BEAM-5709: Author: ASF GitHub Bot Created on: 26/Apr/19 00:13 Start Date: 26/Apr/19 00:13 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #8395: [BEAM-5709] Changing sleeps to CountdownLatch in test URL: https://github.com/apache/beam/pull/8395 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233193) Time Spent: 2h (was: 1h 50m) > Tests in BeamFnControlServiceTest are flaky. > > > Key: BEAM-5709 > URL: https://issues.apache.org/jira/browse/BEAM-5709 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Minor > Labels: triaged > Fix For: Not applicable > > Time Spent: 2h > Remaining Estimate: 0h > > https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java > Tests for BeamFnControlService are currently flaky. The test attempts to > verify that onCompleted was called on the mock streams, but that function > gets called on a separate thread, so occasionally the function will not have > been called yet, despite the server being shut down. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7087) Create an errors package for Go SDK
[ https://issues.apache.org/jira/browse/BEAM-7087?focusedWorklogId=233192&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233192 ] ASF GitHub Bot logged work on BEAM-7087: Author: ASF GitHub Bot Created on: 26/Apr/19 00:12 Start Date: 26/Apr/19 00:12 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #8406: [BEAM-7087] Updating Go SDK errors in base beam directory URL: https://github.com/apache/beam/pull/8406 This change updates code in sdks/go/pkg/beam to use the new errors package, and (mostly) does not change code in any sub-directories. It should be the first of several PRs to update all our Go SDK code to use the new errors where possible. That said, this change isn't as "clean" as I'd like it to be. During the course of changing the calls I decided to change some of the messages slightly to more accurately match the intended style of errors that this package promotes (i.e. context should say "doing this" instead of "something broke"). You can see this in the spots where I completely changed error messages. But in order to retain all the useful information, I sometimes had to go and edit the calls that generated the error and so forth. So this PR is a mix of actual error improvements and just changing calls. For future PRs in this effort I'm definitely going to keep it more strict and not change any error implementations (I can note those down and put them in their own PR). Unfortunately I was too far into this change to separate it out easily, so whoever reviews this gets the short end of the stick. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_
[jira] [Created] (BEAM-7154) Switch internal Go SDK error code to use new package
Daniel Oliveira created BEAM-7154: - Summary: Switch internal Go SDK error code to use new package Key: BEAM-7154 URL: https://issues.apache.org/jira/browse/BEAM-7154 Project: Beam Issue Type: Improvement Components: sdk-go Reporter: Daniel Oliveira Assignee: Daniel Oliveira I added a new package for errors in the Go SDK: [https://github.com/apache/beam/pull/8369] This issue tracks progress on modifying existing error code, which mostly uses fmt.Errorf, to use this new package. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-7087) Create an errors package for Go SDK
[ https://issues.apache.org/jira/browse/BEAM-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira resolved BEAM-7087. --- Resolution: Fixed Fix Version/s: Not applicable > Create an errors package for Go SDK > --- > > Key: BEAM-7087 > URL: https://issues.apache.org/jira/browse/BEAM-7087 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Fix For: Not applicable > > Time Spent: 40m > Remaining Estimate: 0h > > We can't import an established errors package such as github.com/pkg/errors > due to how the Go SDK is currently implemented, so instead we should have our > own package. This package should provide the following to be worth replacing > the basic errors we currently use. > # A way to wrap other errors without losing the original error, as in > [https://github.com/pkg/errors] > # Making printed errors more legible than currently. (For ex. having > different wrapped errors on different lines, differentiating between errors > and context, etc.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7153) pydoc broken for @deprecated annotations
Udi Meiri created BEAM-7153: --- Summary: pydoc broken for @deprecated annotations Key: BEAM-7153 URL: https://issues.apache.org/jira/browse/BEAM-7153 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Udi Meiri It seems this is a similar issue to https://issues.apache.org/jira/browse/BEAM-7062 but I haven't investigated. Example: https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.io.gcp.bigquery.html#apache_beam.io.gcp.bigquery.BigQuerySink -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7043) Add DynamoDBIO
[ https://issues.apache.org/jira/browse/BEAM-7043?focusedWorklogId=233178&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233178 ] ASF GitHub Bot logged work on BEAM-7043: Author: ASF GitHub Bot Created on: 25/Apr/19 23:34 Start Date: 25/Apr/19 23:34 Worklog Time Spent: 10m Work Description: cmachgodaddy commented on issue #8390: [BEAM-7043] Add DynamoDBIO URL: https://github.com/apache/beam/pull/8390#issuecomment-486874220 @chamikaramj , @lukecwik , @mxm , can you guys review this PR? Thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233178) Time Spent: 3h 20m (was: 3h 10m) > Add DynamoDBIO > -- > > Key: BEAM-7043 > URL: https://issues.apache.org/jira/browse/BEAM-7043 > Project: Beam > Issue Type: New Feature > Components: io-java-aws >Reporter: Cam Mach >Assignee: Cam Mach >Priority: Minor > Time Spent: 3h 20m > Remaining Estimate: 0h > > Currently we don't have any feature to write data to AWS DynamoDB. This > feature will enable us to send data to DynamoDB -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7043) Add DynamoDBIO
[ https://issues.apache.org/jira/browse/BEAM-7043?focusedWorklogId=233175&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233175 ] ASF GitHub Bot logged work on BEAM-7043: Author: ASF GitHub Bot Created on: 25/Apr/19 23:29 Start Date: 25/Apr/19 23:29 Worklog Time Spent: 10m Work Description: cmachgodaddy commented on pull request #8390: [BEAM-7043] Add DynamoDBIO URL: https://github.com/apache/beam/pull/8390#discussion_r278766976 ## File path: sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/dynamodb/AwsClientProviderMock.java ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.aws.dynamodb; + +import com.amazonaws.services.cloudwatch.AmazonCloudWatch; +import com.amazonaws.services.dynamodbv2.AmazonDynamoDB; +import org.mockito.Mockito; + +/** + * Mocking AwsClientProvider instance so it can take i different {@link AmazonDynamoDB} object type. + */ +public class AwsClientProviderMock implements AwsClientsProvider { Review comment: @iemejia , thanks for your comments. Working to address them This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233175) Time Spent: 3h 10m (was: 3h) > Add DynamoDBIO > -- > > Key: BEAM-7043 > URL: https://issues.apache.org/jira/browse/BEAM-7043 > Project: Beam > Issue Type: New Feature > Components: io-java-aws >Reporter: Cam Mach >Assignee: Cam Mach >Priority: Minor > Time Spent: 3h 10m > Remaining Estimate: 0h > > Currently we don't have any feature to write data to AWS DynamoDB. This > feature will enable us to send data to DynamoDB -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7152) public class TestPipeline fails with java.lang.IllegalStateException
Hil created BEAM-7152: - Summary: public class TestPipeline fails with java.lang.IllegalStateException Key: BEAM-7152 URL: https://issues.apache.org/jira/browse/BEAM-7152 Project: Beam Issue Type: Bug Components: testing Affects Versions: 2.11.0 Environment: hil@xeon:~/mavenwave/mw-finack/deploy$ java -version openjdk version "1.8.0_191" OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-2ubuntu0.18.04.1-b12) OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode) hil@xeon:~/mavenwave/mw-finack/deploy$ uname -a Linux xeon 4.15.0-42-generic #45-Ubuntu SMP Thu Nov 15 19:32:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux hil@xeon:~/mavenwave/mw-finack/deploy$ Reporter: Hil I tried version 2.12 and 2.11 for [creating unit tests|[https://cwiki.apache.org/confluence/display/BEAM/Contribution+Testing+Guide#ContributionTestingGuide-EffectiveuseoftheTestPipelineJUnitrule]]: {code:java} public class FinackPipelineTest { @Rule public final TestPipeline pipeline = TestPipeline.create(); @Test @Category(NeedsRunner.class) public void myPipelineTest() throws Exception { String WORDS = "words"; String WHATEVER = "whatever"; final PCollection pCollection = pipeline .apply("Create", Create.of(WORDS).withCoder(StringUtf8Coder.of())) .apply( "Map1", MapElements.via( new SimpleFunction() { @Override public String apply(final String input) { return WHATEVER; } })); PAssert.that(pCollection).containsInAnyOrder(WHATEVER); pipeline.run(); } }{code} When I run the unit tests in Intellij, I got the following error: {color:#FF}java.lang.IllegalStateException: Is your TestPipeline declaration missing a @Rule annotation? Usage: @Rule public final transient TestPipeline pipeline = TestPipeline.create();{color} {color:#FF}at org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkState(Preconditions.java:444){color} {color:#FF} at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:337){color} {color:#FF} at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331){color} {color:#FF} at com.finack.app.FinackPipelineTest.myPipelineTest(FinackPipelineTest.java:86){color} {color:#FF} at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method){color} {color:#FF} at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62){color} {color:#FF} at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){color} {color:#FF} at java.lang.reflect.Method.invoke(Method.java:498){color} {color:#FF} at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:628){color} {color:#FF} at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:117){color} {color:#FF} at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:184){color} {color:#FF} at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73){color} {color:#FF} at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:180){color} {color:#FF} at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:127){color} {color:#FF} at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:68){color} {color:#FF} at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:135){color} {color:#FF} at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73){color} {color:#FF} at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:125){color} {color:#FF} at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:135){color} {color:#FF} at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:123){color} {color:#FF} at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73){color} {color:#FF} at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:122){color} {color:#FF} at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:80){color} {color:#FF} at java.util.ArrayLi
[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=233172&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233172 ] ASF GitHub Bot logged work on BEAM-4046: Author: ASF GitHub Bot Created on: 25/Apr/19 23:20 Start Date: 25/Apr/19 23:20 Worklog Time Spent: 10m Work Description: adude3141 commented on issue #8194: [DO NOT MERGE] [BEAM-4046] decouple gradle project names and maven artifact ids URL: https://github.com/apache/beam/pull/8194#issuecomment-486871641 Run Java Flink PortableValidatesRunner Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233172) Time Spent: 11.5h (was: 11h 20m) > Decouple gradle project names and maven artifact ids > > > Key: BEAM-4046 > URL: https://issues.apache.org/jira/browse/BEAM-4046 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Kenneth Knowles >Assignee: Michael Luckey >Priority: Major > Time Spent: 11.5h > Remaining Estimate: 0h > > In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. > It is clumsy and requires a hacky settings.gradle that is not idiomatic. > In our second draft, we changed them to names that work well with Gradle, > like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky. > In our third draft, we regressed to the first draft to get the Maven artifact > ids right. > These should be able to be decoupled. It seems there are many StackOverflow > questions on the subject. > Since it is unidiomatic and a poor user experience, if it does turn out to be > mandatory then it needs to be documented inline everywhere - the > settings.gradle should say why it is so bizarre, and each build.gradle should > indicate what its project id is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=233171&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233171 ] ASF GitHub Bot logged work on BEAM-4046: Author: ASF GitHub Bot Created on: 25/Apr/19 23:19 Start Date: 25/Apr/19 23:19 Worklog Time Spent: 10m Work Description: adude3141 commented on issue #8194: [DO NOT MERGE] [BEAM-4046] decouple gradle project names and maven artifact ids URL: https://github.com/apache/beam/pull/8194#issuecomment-486871489 Run Dataflow PortabilityApi ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233171) Time Spent: 11h 20m (was: 11h 10m) > Decouple gradle project names and maven artifact ids > > > Key: BEAM-4046 > URL: https://issues.apache.org/jira/browse/BEAM-4046 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Kenneth Knowles >Assignee: Michael Luckey >Priority: Major > Time Spent: 11h 20m > Remaining Estimate: 0h > > In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. > It is clumsy and requires a hacky settings.gradle that is not idiomatic. > In our second draft, we changed them to names that work well with Gradle, > like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky. > In our third draft, we regressed to the first draft to get the Maven artifact > ids right. > These should be able to be decoupled. It seems there are many StackOverflow > questions on the subject. > Since it is unidiomatic and a poor user experience, if it does turn out to be > mandatory then it needs to be documented inline everywhere - the > settings.gradle should say why it is so bizarre, and each build.gradle should > indicate what its project id is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=233170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233170 ] ASF GitHub Bot logged work on BEAM-4046: Author: ASF GitHub Bot Created on: 25/Apr/19 23:19 Start Date: 25/Apr/19 23:19 Worklog Time Spent: 10m Work Description: adude3141 commented on issue #8194: [DO NOT MERGE] [BEAM-4046] decouple gradle project names and maven artifact ids URL: https://github.com/apache/beam/pull/8194#issuecomment-486871424 Run Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233170) Time Spent: 11h 10m (was: 11h) > Decouple gradle project names and maven artifact ids > > > Key: BEAM-4046 > URL: https://issues.apache.org/jira/browse/BEAM-4046 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Kenneth Knowles >Assignee: Michael Luckey >Priority: Major > Time Spent: 11h 10m > Remaining Estimate: 0h > > In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. > It is clumsy and requires a hacky settings.gradle that is not idiomatic. > In our second draft, we changed them to names that work well with Gradle, > like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky. > In our third draft, we regressed to the first draft to get the Maven artifact > ids right. > These should be able to be decoupled. It seems there are many StackOverflow > questions on the subject. > Since it is unidiomatic and a poor user experience, if it does turn out to be > mandatory then it needs to be documented inline everywhere - the > settings.gradle should say why it is so bizarre, and each build.gradle should > indicate what its project id is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7043) Add DynamoDBIO
[ https://issues.apache.org/jira/browse/BEAM-7043?focusedWorklogId=233168&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233168 ] ASF GitHub Bot logged work on BEAM-7043: Author: ASF GitHub Bot Created on: 25/Apr/19 23:16 Start Date: 25/Apr/19 23:16 Worklog Time Spent: 10m Work Description: cmachgodaddy commented on issue #8390: [BEAM-7043] Add DynamoDBIO URL: https://github.com/apache/beam/pull/8390#issuecomment-486870805 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233168) Time Spent: 3h (was: 2h 50m) > Add DynamoDBIO > -- > > Key: BEAM-7043 > URL: https://issues.apache.org/jira/browse/BEAM-7043 > Project: Beam > Issue Type: New Feature > Components: io-java-aws >Reporter: Cam Mach >Assignee: Cam Mach >Priority: Minor > Time Spent: 3h > Remaining Estimate: 0h > > Currently we don't have any feature to write data to AWS DynamoDB. This > feature will enable us to send data to DynamoDB -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6880) Deprecate Java Portable Reference Runner
[ https://issues.apache.org/jira/browse/BEAM-6880?focusedWorklogId=233155&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233155 ] ASF GitHub Bot logged work on BEAM-6880: Author: ASF GitHub Bot Created on: 25/Apr/19 23:00 Start Date: 25/Apr/19 23:00 Worklog Time Spent: 10m Work Description: youngoli commented on issue #8380: [BEAM-6880] Remove deprecated Reference Runner code. URL: https://github.com/apache/beam/pull/8380#issuecomment-486867567 Run Java Flink PortableValidatesRunner Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233155) Time Spent: 4h (was: 3h 50m) > Deprecate Java Portable Reference Runner > > > Key: BEAM-6880 > URL: https://issues.apache.org/jira/browse/BEAM-6880 > Project: Beam > Issue Type: New Feature > Components: runner-direct, test-failures, testing >Reporter: Mikhail Gryzykhin >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > This ticket is about deprecating Java Portable Reference runner. > > Discussion is happening in [this > thread|[https://lists.apache.org/thread.html/0b68efce9b7f2c5297b32d09e5d903e9b354199fe2ce446fbcd240bc@%3Cdev.beam.apache.org%3E]] > > > Current summary is: disable beam_PostCommit_Java_PVR_Reference job. > Keeping or removing reference runner code is still under discussion. It is > suggested to create PR that removes relevant code and start voting there. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7138) keep Java serialized coder in length-prefixed wire coder construction
[ https://issues.apache.org/jira/browse/BEAM-7138?focusedWorklogId=233137&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233137 ] ASF GitHub Bot logged work on BEAM-7138: Author: ASF GitHub Bot Created on: 25/Apr/19 22:38 Start Date: 25/Apr/19 22:38 Worklog Time Spent: 10m Work Description: ihji commented on issue #8396: [BEAM-7138] keep Java serialized coder in wire coder construction URL: https://github.com/apache/beam/pull/8396#issuecomment-486862571 I'll look into what's wrong in the test. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233137) Time Spent: 50m (was: 40m) > keep Java serialized coder in length-prefixed wire coder construction > - > > Key: BEAM-7138 > URL: https://issues.apache.org/jira/browse/BEAM-7138 > Project: Beam > Issue Type: Improvement > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > don't replace Java serialized coder with byte array coder in length-prefixed > wire coder construction. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7138) keep Java serialized coder in length-prefixed wire coder construction
[ https://issues.apache.org/jira/browse/BEAM-7138?focusedWorklogId=233136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233136 ] ASF GitHub Bot logged work on BEAM-7138: Author: ASF GitHub Bot Created on: 25/Apr/19 22:37 Start Date: 25/Apr/19 22:37 Worklog Time Spent: 10m Work Description: ihji commented on issue #8396: [BEAM-7138] keep Java serialized coder in wire coder construction URL: https://github.com/apache/beam/pull/8396#issuecomment-486862457 @mxm I encountered the `ClassCastException` when I tried to use `.withMaxNumRecords` in `KafkaIO`. It generates a `PCollection` of `AutoValue_BoundedReadFromUnboundedSource_Shard` and current implementation of WireCoder doesn't support Java serialized coder which this `PCollection` needs to use. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233136) Time Spent: 40m (was: 0.5h) > keep Java serialized coder in length-prefixed wire coder construction > - > > Key: BEAM-7138 > URL: https://issues.apache.org/jira/browse/BEAM-7138 > Project: Beam > Issue Type: Improvement > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > don't replace Java serialized coder with byte array coder in length-prefixed > wire coder construction. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7139) Blog post announcing Kotlin Sample addition to Beam
[ https://issues.apache.org/jira/browse/BEAM-7139?focusedWorklogId=233133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233133 ] ASF GitHub Bot logged work on BEAM-7139: Author: ASF GitHub Bot Created on: 25/Apr/19 22:17 Start Date: 25/Apr/19 22:17 Worklog Time Spent: 10m Work Description: harshithdwivedi commented on issue #8391: [BEAM-7139] Blogpost for Kotlin Samples URL: https://github.com/apache/beam/pull/8391#issuecomment-486857625 No worries, done with the changes! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233133) Time Spent: 1h 10m (was: 1h) > Blog post announcing Kotlin Sample addition to Beam > --- > > Key: BEAM-7139 > URL: https://issues.apache.org/jira/browse/BEAM-7139 > Project: Beam > Issue Type: Task > Components: website >Reporter: Harshit Dwivedi >Assignee: Harshit Dwivedi >Priority: Trivial > Time Spent: 1h 10m > Remaining Estimate: 0h > > Publishing a quick blog post that lets the users know that the Beam samples > now have kotlin support. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-7141) Expose kv and window parameters for on_timer
[ https://issues.apache.org/jira/browse/BEAM-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh Kumar reassigned BEAM-7141: -- Assignee: Rakesh Kumar > Expose kv and window parameters for on_timer > > > Key: BEAM-7141 > URL: https://issues.apache.org/jira/browse/BEAM-7141 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Affects Versions: 2.12.0 >Reporter: Thomas Weise >Assignee: Rakesh Kumar >Priority: Major > > We would like to have access to key and window inside the timer callback. > Without, it is also difficult to debug. We run into this while working on > BEAM-7112 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work started] (BEAM-7141) Expose kv and window parameters for on_timer
[ https://issues.apache.org/jira/browse/BEAM-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-7141 started by Rakesh Kumar. -- > Expose kv and window parameters for on_timer > > > Key: BEAM-7141 > URL: https://issues.apache.org/jira/browse/BEAM-7141 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Affects Versions: 2.12.0 >Reporter: Thomas Weise >Assignee: Rakesh Kumar >Priority: Major > > We would like to have access to key and window inside the timer callback. > Without, it is also difficult to debug. We run into this while working on > BEAM-7112 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6880) Deprecate Java Portable Reference Runner
[ https://issues.apache.org/jira/browse/BEAM-6880?focusedWorklogId=233125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233125 ] ASF GitHub Bot logged work on BEAM-6880: Author: ASF GitHub Bot Created on: 25/Apr/19 21:56 Start Date: 25/Apr/19 21:56 Worklog Time Spent: 10m Work Description: youngoli commented on issue #8380: [BEAM-6880] Remove deprecated Reference Runner code. URL: https://github.com/apache/beam/pull/8380#issuecomment-486852153 Run Java Flink PortableValidatesRunner Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233125) Time Spent: 3h 50m (was: 3h 40m) > Deprecate Java Portable Reference Runner > > > Key: BEAM-6880 > URL: https://issues.apache.org/jira/browse/BEAM-6880 > Project: Beam > Issue Type: New Feature > Components: runner-direct, test-failures, testing >Reporter: Mikhail Gryzykhin >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > This ticket is about deprecating Java Portable Reference runner. > > Discussion is happening in [this > thread|[https://lists.apache.org/thread.html/0b68efce9b7f2c5297b32d09e5d903e9b354199fe2ce446fbcd240bc@%3Cdev.beam.apache.org%3E]] > > > Current summary is: disable beam_PostCommit_Java_PVR_Reference job. > Keeping or removing reference runner code is still under discussion. It is > suggested to create PR that removes relevant code and start voting there. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7121) Provide deterministic version of Python's ProtoCoder
[ https://issues.apache.org/jira/browse/BEAM-7121?focusedWorklogId=233126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233126 ] ASF GitHub Bot logged work on BEAM-7121: Author: ASF GitHub Bot Created on: 25/Apr/19 21:56 Start Date: 25/Apr/19 21:56 Worklog Time Spent: 10m Work Description: charlesccychen commented on issue #8377: [BEAM-7121] Add deterministic proto coder URL: https://github.com/apache/beam/pull/8377#issuecomment-486852299 Also note that precommits fail because of lint: ``` 11:31:44 > Task :beam-sdks-python:lintPy27 FAILED 11:31:44 * Module apache_beam.coders.coders_test 11:31:44 C: 95, 0: Line too long (102/80) (line-too-long) 11:31:44 11:31:44 11:31:44 Your code has been rated at 10.00/10 (previous run: 10.00/10, -0.00) 11:31:44 11:31:44 Command exited with non-zero status 16 11:31:44 601.37user 16.20system 2:47.52elapsed 368%CPU (0avgtext+0avgdata 422480maxresident)k 11:31:44 0inputs+224outputs (0major+828313minor)pagefaults 0swaps 11:31:44 ERROR: InvocationError for command '/usr/bin/time /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/build/srcs/sdks/python/scripts/run_pylint.sh' (exited with code 16) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233126) Time Spent: 1h 10m (was: 1h) > Provide deterministic version of Python's ProtoCoder > > > Key: BEAM-7121 > URL: https://issues.apache.org/jira/browse/BEAM-7121 > Project: Beam > Issue Type: Improvement > Components: runner-core >Reporter: Yifan Mai >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > Passing deterministic=true to proto's > [SerializeToString|https://github.com/protocolbuffers/protobuf/blob/60b66a119d17f0a2a595a231bea87cd4f4cf2689/python/google/protobuf/message.py#L189-L204] > will result in deterministic encoding of maps in protos. This can be used to > provide a deterministic version of ProtoCoder. > This would allow protos to be used as a key for grouping by key. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6880) Deprecate Java Portable Reference Runner
[ https://issues.apache.org/jira/browse/BEAM-6880?focusedWorklogId=233123&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233123 ] ASF GitHub Bot logged work on BEAM-6880: Author: ASF GitHub Bot Created on: 25/Apr/19 21:55 Start Date: 25/Apr/19 21:55 Worklog Time Spent: 10m Work Description: youngoli commented on issue #8380: [BEAM-6880] Remove deprecated Reference Runner code. URL: https://github.com/apache/beam/pull/8380#issuecomment-486852057 Run Java Flink PortableValidatesRunner Batch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233123) Time Spent: 3h 40m (was: 3.5h) > Deprecate Java Portable Reference Runner > > > Key: BEAM-6880 > URL: https://issues.apache.org/jira/browse/BEAM-6880 > Project: Beam > Issue Type: New Feature > Components: runner-direct, test-failures, testing >Reporter: Mikhail Gryzykhin >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > This ticket is about deprecating Java Portable Reference runner. > > Discussion is happening in [this > thread|[https://lists.apache.org/thread.html/0b68efce9b7f2c5297b32d09e5d903e9b354199fe2ce446fbcd240bc@%3Cdev.beam.apache.org%3E]] > > > Current summary is: disable beam_PostCommit_Java_PVR_Reference job. > Keeping or removing reference runner code is still under discussion. It is > suggested to create PR that removes relevant code and start voting there. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7121) Provide deterministic version of Python's ProtoCoder
[ https://issues.apache.org/jira/browse/BEAM-7121?focusedWorklogId=233124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233124 ] ASF GitHub Bot logged work on BEAM-7121: Author: ASF GitHub Bot Created on: 25/Apr/19 21:55 Start Date: 25/Apr/19 21:55 Worklog Time Spent: 10m Work Description: charlesccychen commented on issue #8377: [BEAM-7121] Add deterministic proto coder URL: https://github.com/apache/beam/pull/8377#issuecomment-486852058 Thanks. @yifanmai @aaltay @robertwb What is the rationale for keeping both ProtoCoder and DeterministicProtoCoder going forward? Is there so much of a performance difference? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233124) Time Spent: 1h (was: 50m) > Provide deterministic version of Python's ProtoCoder > > > Key: BEAM-7121 > URL: https://issues.apache.org/jira/browse/BEAM-7121 > Project: Beam > Issue Type: Improvement > Components: runner-core >Reporter: Yifan Mai >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > Passing deterministic=true to proto's > [SerializeToString|https://github.com/protocolbuffers/protobuf/blob/60b66a119d17f0a2a595a231bea87cd4f4cf2689/python/google/protobuf/message.py#L189-L204] > will result in deterministic encoding of maps in protos. This can be used to > provide a deterministic version of ProtoCoder. > This would allow protos to be used as a key for grouping by key. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6880) Deprecate Java Portable Reference Runner
[ https://issues.apache.org/jira/browse/BEAM-6880?focusedWorklogId=233122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233122 ] ASF GitHub Bot logged work on BEAM-6880: Author: ASF GitHub Bot Created on: 25/Apr/19 21:55 Start Date: 25/Apr/19 21:55 Worklog Time Spent: 10m Work Description: youngoli commented on issue #8380: [BEAM-6880] Remove deprecated Reference Runner code. URL: https://github.com/apache/beam/pull/8380#issuecomment-486851997 Run Java PortabilityApi PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233122) Time Spent: 3.5h (was: 3h 20m) > Deprecate Java Portable Reference Runner > > > Key: BEAM-6880 > URL: https://issues.apache.org/jira/browse/BEAM-6880 > Project: Beam > Issue Type: New Feature > Components: runner-direct, test-failures, testing >Reporter: Mikhail Gryzykhin >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > This ticket is about deprecating Java Portable Reference runner. > > Discussion is happening in [this > thread|[https://lists.apache.org/thread.html/0b68efce9b7f2c5297b32d09e5d903e9b354199fe2ce446fbcd240bc@%3Cdev.beam.apache.org%3E]] > > > Current summary is: disable beam_PostCommit_Java_PVR_Reference job. > Keeping or removing reference runner code is still under discussion. It is > suggested to create PR that removes relevant code and start voting there. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7012) Support TestStream in FlinkRunner
[ https://issues.apache.org/jira/browse/BEAM-7012?focusedWorklogId=233116&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233116 ] ASF GitHub Bot logged work on BEAM-7012: Author: ASF GitHub Bot Created on: 25/Apr/19 21:30 Start Date: 25/Apr/19 21:30 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #8383: [BEAM-7012] Support TestStream in streaming Flink Runner URL: https://github.com/apache/beam/pull/8383#issuecomment-486845015 Nice! It is great to have an example of this for a runner other than the direct runner. I will read this later for sure. Sorry I didn't have time to review it promptly. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233116) Time Spent: 2h 20m (was: 2h 10m) > Support TestStream in FlinkRunner > - > > Key: BEAM-7012 > URL: https://issues.apache.org/jira/browse/BEAM-7012 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.13.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > TestStream is a primitive transform which is only supported by the > DirectRunner. It might be useful to also implement it in the Flink Runner to > run similar kind of tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6621) Exercise Dataflow runner integration tests in a postcommit suite for Python 3.5 and 3.6
[ https://issues.apache.org/jira/browse/BEAM-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826479#comment-16826479 ] Valentyn Tymofieiev commented on BEAM-6621: --- It's partially completed, I removed the tag. > Exercise Dataflow runner integration tests in a postcommit suite for Python > 3.5 and 3.6 > --- > > Key: BEAM-6621 > URL: https://issues.apache.org/jira/browse/BEAM-6621 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Critical > Labels: triaged > Time Spent: 1h 20m > Remaining Estimate: 0h > > [https://github.com/apache/beam/pull/7756] adds a test suite with a single > test. Once merged, we need to extend the suite to all available integration > tests. If some tests are not passing, we can track any remaining > parallelizable work in separate JIRAs. > cc: [~Juta] > cc: [~markflyhigh] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6621) Exercise Dataflow runner integration tests in a postcommit suite for Python 3.5 and 3.6
[ https://issues.apache.org/jira/browse/BEAM-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-6621: -- Fix Version/s: (was: 2.12.0) > Exercise Dataflow runner integration tests in a postcommit suite for Python > 3.5 and 3.6 > --- > > Key: BEAM-6621 > URL: https://issues.apache.org/jira/browse/BEAM-6621 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Critical > Labels: triaged > Time Spent: 1h 20m > Remaining Estimate: 0h > > [https://github.com/apache/beam/pull/7756] adds a test suite with a single > test. Once merged, we need to extend the suite to all available integration > tests. If some tests are not passing, we can track any remaining > parallelizable work in separate JIRAs. > cc: [~Juta] > cc: [~markflyhigh] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4543) Remove dependency on googledatastore in favor of google-cloud-datastore.
[ https://issues.apache.org/jira/browse/BEAM-4543?focusedWorklogId=233099&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233099 ] ASF GitHub Bot logged work on BEAM-4543: Author: ASF GitHub Bot Created on: 25/Apr/19 20:41 Start Date: 25/Apr/19 20:41 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #8262: [BEAM-4543] Python Datastore IO using google-cloud-datastore URL: https://github.com/apache/beam/pull/8262#discussion_r278722641 ## File path: sdks/python/setup.py ## @@ -137,11 +137,14 @@ def get_version(): ] GCP_REQUIREMENTS = [ +'cachetools>=3.1.0,<4', # google-apitools 0.5.23 and above has important Python 3 supports. 'google-apitools>=0.5.26,<0.5.27', -'proto-google-cloud-datastore-v1>=0.90.0,<=0.90.4', Review comment: Don't we need proto-google-cloud-datastore-v1 for v1/datastoreio ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233099) Time Spent: 4h 20m (was: 4h 10m) > Remove dependency on googledatastore in favor of google-cloud-datastore. > > > Key: BEAM-4543 > URL: https://issues.apache.org/jira/browse/BEAM-4543 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Minor > Labels: triaged > Time Spent: 4h 20m > Remaining Estimate: 0h > > apache-beam[gcp] package depends [1] on googledatastore package [2]. We > should replace this dependency with google-cloud-datastore [3] which is > officially supported, has better release cadence and also has Python 3 > support. > [1] > https://github.com/apache/beam/blob/fad655462f8fadfdfaab0b7a09cab538f076f94e/sdks/python/setup.py#L126 > [2] [https://pypi.org/project/googledatastore/] > [3] [https://pypi.org/project/google-cloud-datastore/] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4543) Remove dependency on googledatastore in favor of google-cloud-datastore.
[ https://issues.apache.org/jira/browse/BEAM-4543?focusedWorklogId=233098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233098 ] ASF GitHub Bot logged work on BEAM-4543: Author: ASF GitHub Bot Created on: 25/Apr/19 20:41 Start Date: 25/Apr/19 20:41 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #8262: [BEAM-4543] Python Datastore IO using google-cloud-datastore URL: https://github.com/apache/beam/pull/8262#discussion_r278721359 ## File path: sdks/python/apache_beam/io/gcp/datastore/v1new/datastore_write_it_test.py ## @@ -0,0 +1,75 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""An integration test for datastore_write_it_pipeline + +This test creates entities and writes them to Cloud Datastore. Subsequently, +these entities are read from Cloud Datastore, compared to the expected value +for the entity, and deleted. + +There is no output; instead, we use `assert_that` transform to verify the +results in the pipeline. +""" + +from __future__ import absolute_import + +import logging +import random +import unittest +from datetime import datetime + +from hamcrest.core.core.allof import all_of +from nose.plugins.attrib import attr + +from apache_beam.testing.pipeline_verifiers import PipelineStateMatcher +from apache_beam.testing.test_pipeline import TestPipeline + +try: + from apache_beam.io.gcp.datastore.v1new import datastore_write_it_pipeline +# TODO(BEAM-4543): Remove TypeError once googledatastore dependency is removed. +except (ImportError, TypeError): + datastore_write_it_pipeline = None + + +class DatastoreWriteIT(unittest.TestCase): + + NUM_ENTITIES = 1001 + LIMIT = 500 + + def run_datastore_write(self, limit=None): +test_pipeline = TestPipeline(is_integration_test=True) +current_time = datetime.now().strftime("%m%d%H%M%S") +seed = random.randint(0, 10) +kind = 'testkind%s%d' % (current_time, seed) +pipeline_verifiers = [PipelineStateMatcher()] +extra_opts = {'kind': kind, + 'num_entities': self.NUM_ENTITIES, + 'on_success_matcher': all_of(*pipeline_verifiers)} +if limit is not None: + extra_opts['limit'] = limit + +datastore_write_it_pipeline.run(test_pipeline.get_full_options_as_args( +**extra_opts)) + + @attr('IT') + def test_datastore_write_limit(self): Review comment: Can you add a IT for read as well ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233098) Time Spent: 4h 10m (was: 4h) > Remove dependency on googledatastore in favor of google-cloud-datastore. > > > Key: BEAM-4543 > URL: https://issues.apache.org/jira/browse/BEAM-4543 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Minor > Labels: triaged > Time Spent: 4h 10m > Remaining Estimate: 0h > > apache-beam[gcp] package depends [1] on googledatastore package [2]. We > should replace this dependency with google-cloud-datastore [3] which is > officially supported, has better release cadence and also has Python 3 > support. > [1] > https://github.com/apache/beam/blob/fad655462f8fadfdfaab0b7a09cab538f076f94e/sdks/python/setup.py#L126 > [2] [https://pypi.org/project/googledatastore/] > [3] [https://pypi.org/project/google-cloud-datastore/] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4543) Remove dependency on googledatastore in favor of google-cloud-datastore.
[ https://issues.apache.org/jira/browse/BEAM-4543?focusedWorklogId=233101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233101 ] ASF GitHub Bot logged work on BEAM-4543: Author: ASF GitHub Bot Created on: 25/Apr/19 20:41 Start Date: 25/Apr/19 20:41 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #8262: [BEAM-4543] Python Datastore IO using google-cloud-datastore URL: https://github.com/apache/beam/pull/8262#discussion_r278692493 ## File path: sdks/python/apache_beam/io/gcp/datastore/v1new/query_splitter_test.py ## @@ -173,5 +163,9 @@ def test_client_key_sort_key(self): self.assertEqual(expected_sort, keys) +# Hide base class from collection by nose. +del QuerySplitterTestBase Review comment: But the actual tests are in base class, right ? Do they still get collected ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233101) Time Spent: 4.5h (was: 4h 20m) > Remove dependency on googledatastore in favor of google-cloud-datastore. > > > Key: BEAM-4543 > URL: https://issues.apache.org/jira/browse/BEAM-4543 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Minor > Labels: triaged > Time Spent: 4.5h > Remaining Estimate: 0h > > apache-beam[gcp] package depends [1] on googledatastore package [2]. We > should replace this dependency with google-cloud-datastore [3] which is > officially supported, has better release cadence and also has Python 3 > support. > [1] > https://github.com/apache/beam/blob/fad655462f8fadfdfaab0b7a09cab538f076f94e/sdks/python/setup.py#L126 > [2] [https://pypi.org/project/googledatastore/] > [3] [https://pypi.org/project/google-cloud-datastore/] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4543) Remove dependency on googledatastore in favor of google-cloud-datastore.
[ https://issues.apache.org/jira/browse/BEAM-4543?focusedWorklogId=233097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233097 ] ASF GitHub Bot logged work on BEAM-4543: Author: ASF GitHub Bot Created on: 25/Apr/19 20:41 Start Date: 25/Apr/19 20:41 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #8262: [BEAM-4543] Python Datastore IO using google-cloud-datastore URL: https://github.com/apache/beam/pull/8262#discussion_r278693190 ## File path: sdks/python/apache_beam/io/gcp/datastore/v1/datastoreio.py ## @@ -45,6 +50,10 @@ try: from google.cloud.proto.datastore.v1 import datastore_pb2 from googledatastore import helper as datastore_helper + logging.warning( + 'Using deprecated Datastore client.\n' + 'This client will be removed in the next Beam major release.\n' Review comment: "removed in Beam 3.0 (next Beam major release)" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233097) Time Spent: 4h (was: 3h 50m) > Remove dependency on googledatastore in favor of google-cloud-datastore. > > > Key: BEAM-4543 > URL: https://issues.apache.org/jira/browse/BEAM-4543 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Minor > Labels: triaged > Time Spent: 4h > Remaining Estimate: 0h > > apache-beam[gcp] package depends [1] on googledatastore package [2]. We > should replace this dependency with google-cloud-datastore [3] which is > officially supported, has better release cadence and also has Python 3 > support. > [1] > https://github.com/apache/beam/blob/fad655462f8fadfdfaab0b7a09cab538f076f94e/sdks/python/setup.py#L126 > [2] [https://pypi.org/project/googledatastore/] > [3] [https://pypi.org/project/google-cloud-datastore/] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4543) Remove dependency on googledatastore in favor of google-cloud-datastore.
[ https://issues.apache.org/jira/browse/BEAM-4543?focusedWorklogId=233100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233100 ] ASF GitHub Bot logged work on BEAM-4543: Author: ASF GitHub Bot Created on: 25/Apr/19 20:41 Start Date: 25/Apr/19 20:41 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #8262: [BEAM-4543] Python Datastore IO using google-cloud-datastore URL: https://github.com/apache/beam/pull/8262#discussion_r278722835 ## File path: sdks/python/setup.py ## @@ -137,11 +137,14 @@ def get_version(): ] GCP_REQUIREMENTS = [ +'cachetools>=3.1.0,<4', # google-apitools 0.5.23 and above has important Python 3 supports. 'google-apitools>=0.5.26,<0.5.27', -'proto-google-cloud-datastore-v1>=0.90.0,<=0.90.4', -# [BEAM-4543] Datastore IO is not supported in Python 3. +# [BEAM-4543] googledatastore is not supported in Python 3. Review comment: Can both google-cloud-datastore and googledatastore installed in the same env without conflicts ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233100) > Remove dependency on googledatastore in favor of google-cloud-datastore. > > > Key: BEAM-4543 > URL: https://issues.apache.org/jira/browse/BEAM-4543 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Minor > Labels: triaged > Time Spent: 4h 20m > Remaining Estimate: 0h > > apache-beam[gcp] package depends [1] on googledatastore package [2]. We > should replace this dependency with google-cloud-datastore [3] which is > officially supported, has better release cadence and also has Python 3 > support. > [1] > https://github.com/apache/beam/blob/fad655462f8fadfdfaab0b7a09cab538f076f94e/sdks/python/setup.py#L126 > [2] [https://pypi.org/project/googledatastore/] > [3] [https://pypi.org/project/google-cloud-datastore/] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7112) State cleanup interferes with user timer callback
[ https://issues.apache.org/jira/browse/BEAM-7112?focusedWorklogId=233095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233095 ] ASF GitHub Bot logged work on BEAM-7112: Author: ASF GitHub Bot Created on: 25/Apr/19 20:24 Start Date: 25/Apr/19 20:24 Worklog Time Spent: 10m Work Description: tweise commented on issue #8399: [BEAM-7112] Timer race with state cleanup - take two URL: https://github.com/apache/beam/pull/8399#issuecomment-486824637 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233095) Time Spent: 5h 50m (was: 5h 40m) > State cleanup interferes with user timer callback > - > > Key: BEAM-7112 > URL: https://issues.apache.org/jira/browse/BEAM-7112 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.12.0 >Reporter: Thomas Weise >Assignee: Thomas Weise >Priority: Major > Labels: portability-flink > Fix For: 2.13.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > Cleanup timers and user timers are fired at the watermark. Processing of > timers in the SDK worker is asynchronous, so it is possible that the state is > already removed when the user timer callback executes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6621) Exercise Dataflow runner integration tests in a postcommit suite for Python 3.5 and 3.6
[ https://issues.apache.org/jira/browse/BEAM-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826440#comment-16826440 ] Kenneth Knowles commented on BEAM-6621: --- This is listed as part of 2.12.0. Voting has just finished. Is it done? > Exercise Dataflow runner integration tests in a postcommit suite for Python > 3.5 and 3.6 > --- > > Key: BEAM-6621 > URL: https://issues.apache.org/jira/browse/BEAM-6621 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Critical > Labels: triaged > Fix For: 2.12.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > [https://github.com/apache/beam/pull/7756] adds a test suite with a single > test. Once merged, we need to extend the suite to all available integration > tests. If some tests are not passing, we can track any remaining > parallelizable work in separate JIRAs. > cc: [~Juta] > cc: [~markflyhigh] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7151) Support conjunction clause when it's only equi-join
Rui Wang created BEAM-7151: -- Summary: Support conjunction clause when it's only equi-join Key: BEAM-7151 URL: https://issues.apache.org/jira/browse/BEAM-7151 Project: Beam Issue Type: Sub-task Components: dsl-sql Reporter: Rui Wang conjunction_clause: function_call(function_parameter, ...) | field_access | column function_parameter: function_call | field_access In Beam, equi-join is implemented by CoGBK, which requires both join inputs (assume binary join) to build PCollection of KV, where the key is join key. For equi-join, conjunction clause is essentially an equation. In order to build KV, it requires that columns from different sides of equation should come from different join input. For example, a + b = 2 cannot be used to build join key but a = 2 - b can. So rewriting is required for clauses when it does not satisfy this property. It also implies that not every clause is rewritable. Say the clause is f(a, b) = 3, in which a is from left input and b is from right input. If this function f is not splittable, such that we cannot move a or b to right side of equation, then we cannot support this clause in BeamSQL's join. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7138) keep Java serialized coder in length-prefixed wire coder construction
[ https://issues.apache.org/jira/browse/BEAM-7138?focusedWorklogId=233092&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233092 ] ASF GitHub Bot logged work on BEAM-7138: Author: ASF GitHub Bot Created on: 25/Apr/19 20:03 Start Date: 25/Apr/19 20:03 Worklog Time Spent: 10m Work Description: mxm commented on issue #8396: [BEAM-7138] keep Java serialized coder in wire coder construction URL: https://github.com/apache/beam/pull/8396#issuecomment-486818011 Actually, just came across this as well when implementing TestStream for PVR. We need to look into why the Flink tests fail here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233092) Time Spent: 0.5h (was: 20m) > keep Java serialized coder in length-prefixed wire coder construction > - > > Key: BEAM-7138 > URL: https://issues.apache.org/jira/browse/BEAM-7138 > Project: Beam > Issue Type: Improvement > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > don't replace Java serialized coder with byte array coder in length-prefixed > wire coder construction. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7139) Blog post announcing Kotlin Sample addition to Beam
[ https://issues.apache.org/jira/browse/BEAM-7139?focusedWorklogId=233090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233090 ] ASF GitHub Bot logged work on BEAM-7139: Author: ASF GitHub Bot Created on: 25/Apr/19 20:02 Start Date: 25/Apr/19 20:02 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8391: [BEAM-7139] Blogpost for Kotlin Samples URL: https://github.com/apache/beam/pull/8391#issuecomment-486817686 btw, to see the staged version of the website, check the jenkins result for the Website_Stage_GCS test task. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233090) Time Spent: 1h (was: 50m) > Blog post announcing Kotlin Sample addition to Beam > --- > > Key: BEAM-7139 > URL: https://issues.apache.org/jira/browse/BEAM-7139 > Project: Beam > Issue Type: Task > Components: website >Reporter: Harshit Dwivedi >Assignee: Harshit Dwivedi >Priority: Trivial > Time Spent: 1h > Remaining Estimate: 0h > > Publishing a quick blog post that lets the users know that the Beam samples > now have kotlin support. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7139) Blog post announcing Kotlin Sample addition to Beam
[ https://issues.apache.org/jira/browse/BEAM-7139?focusedWorklogId=233089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233089 ] ASF GitHub Bot logged work on BEAM-7139: Author: ASF GitHub Bot Created on: 25/Apr/19 20:01 Start Date: 25/Apr/19 20:01 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8391: [BEAM-7139] Blogpost for Kotlin Samples URL: https://github.com/apache/beam/pull/8391#issuecomment-486817554 Sorry about the trouble. It seems that this wont work for kotlin: http://apache-beam-website-pull-requests.storage.googleapis.com/8391/blog/2019/04/25/beam-kotlin.html You may be able to just let it highlight as Java. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233089) Time Spent: 50m (was: 40m) > Blog post announcing Kotlin Sample addition to Beam > --- > > Key: BEAM-7139 > URL: https://issues.apache.org/jira/browse/BEAM-7139 > Project: Beam > Issue Type: Task > Components: website >Reporter: Harshit Dwivedi >Assignee: Harshit Dwivedi >Priority: Trivial > Time Spent: 50m > Remaining Estimate: 0h > > Publishing a quick blog post that lets the users know that the Beam samples > now have kotlin support. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7150) Support nested AND
[ https://issues.apache.org/jira/browse/BEAM-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-7150: --- Description: There might be nested AND cases: a = b AND (c = d AND (e = f)). Such cases should be flattened as a = b AND c = d AND e = f. This is useful because after done so, we can easily build keys for join inputs: each conjunction clause contributes a field of key (key is Row in SQL equi-join) for join inputs. was: There might be nested AND cases: a = b AND (c = d AND (e = f)). Such cases should be flattened as a = b AND c = d AND e = f. > Support nested AND > -- > > Key: BEAM-7150 > URL: https://issues.apache.org/jira/browse/BEAM-7150 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Rui Wang >Priority: Major > > There might be nested AND cases: > a = b AND (c = d AND (e = f)). > Such cases should be flattened as a = b AND c = d AND e = f. > This is useful because after done so, we can easily build keys for join > inputs: each conjunction clause contributes a field of key (key is Row in SQL > equi-join) for join inputs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6815) Add last year speakers to the Beam summit site
[ https://issues.apache.org/jira/browse/BEAM-6815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada resolved BEAM-6815. - Resolution: Fixed Assignee: Aizhamal Nurmamat kyzy (was: Pablo Estrada) Fix Version/s: Not applicable > Add last year speakers to the Beam summit site > -- > > Key: BEAM-6815 > URL: https://issues.apache.org/jira/browse/BEAM-6815 > Project: Beam > Issue Type: Task > Components: beam-events >Reporter: Aizhamal Nurmamat kyzy >Assignee: Aizhamal Nurmamat kyzy >Priority: Minor > Fix For: Not applicable > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6830) Direct runner throws error "http 404 not found" when reading Bigquery tables from any other region than "US"
[ https://issues.apache.org/jira/browse/BEAM-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada updated BEAM-6830: Component/s: (was: beam-events) io-java-gcp > Direct runner throws error "http 404 not found" when reading Bigquery tables > from any other region than "US" > > > Key: BEAM-6830 > URL: https://issues.apache.org/jira/browse/BEAM-6830 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Suraj >Assignee: Pablo Estrada >Priority: Major > > When trying to read bigquery table located in region "asia-southeast-1" using > DirectRunner ,it throws error "http 404 not found" but same code works when > run using DataflowRunner -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7136) Beam documentation inconsistent
[ https://issues.apache.org/jira/browse/BEAM-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada updated BEAM-7136: Component/s: (was: beam-events) website > Beam documentation inconsistent > --- > > Key: BEAM-7136 > URL: https://issues.apache.org/jira/browse/BEAM-7136 > Project: Beam > Issue Type: Bug > Components: beam-model, website >Reporter: Daniel Collins >Assignee: Aizhamal Nurmamat kyzy >Priority: Major > > Apache beam documentation is inconsistent about Python usage of Pub/Sub. The > chart [here|[https://beam.apache.org/documentation/io/built-in/]] claims that > Pub/Sub is available for usage in Batch Pipelines, but the source [from > clicking on that > link|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/pubsub.py]] > explicitly says only streaming is supported. Both of these cannot be > correct. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7150) Support nested AND
Rui Wang created BEAM-7150: -- Summary: Support nested AND Key: BEAM-7150 URL: https://issues.apache.org/jira/browse/BEAM-7150 Project: Beam Issue Type: Sub-task Components: dsl-sql Reporter: Rui Wang There might be nested AND cases: a = b AND (c = d AND (e = f)). Such cases should be flattened as a = b AND c = d AND e = f. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7149) support OR
Rui Wang created BEAM-7149: -- Summary: support OR Key: BEAM-7149 URL: https://issues.apache.org/jira/browse/BEAM-7149 Project: Beam Issue Type: Sub-task Components: dsl-sql Reporter: Rui Wang OR supposed to be implement in rules. Basically when LogicalRel is generated and being converted to PhysicalRel, OR should be handled by a separate converter rule: all conjunction clauses of OR should be converted to independent JOIN and their results are combined by UNION rel with deduplication. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6112) SQL could support equijoins on complex expressions, not just column refs
[ https://issues.apache.org/jira/browse/BEAM-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang reassigned BEAM-6112: -- Assignee: (was: Rui Wang) > SQL could support equijoins on complex expressions, not just column refs > > > Key: BEAM-6112 > URL: https://issues.apache.org/jira/browse/BEAM-6112 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kenneth Knowles >Priority: Major > > Currently a join such as {{CAST(A.x AS BIGINT) = B.y}} will fail, along with > other similar simple expressions. We only support joining directly on > immediate column references. It can be worked around by inserting a WITH > clause for the intermediate value. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-7147) Several PR Postcommits failing: java.lang.OutOfMemoryError
[ https://issues.apache.org/jira/browse/BEAM-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yifan zou closed BEAM-7147. --- Resolution: Information Provided Fix Version/s: Not applicable > Several PR Postcommits failing: java.lang.OutOfMemoryError > -- > > Key: BEAM-7147 > URL: https://issues.apache.org/jira/browse/BEAM-7147 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Daniel Oliveira >Assignee: yifan zou >Priority: Major > Labels: currently-failing > Fix For: Not applicable > > > While working on my PR ([PR #6880|https://github.com/apache/beam/pull/8380]) > I tried to run a few Postcommits and got the same error on each one. Not sure > if this is related to the work being done to move to new machines or > something else. Can someone help diagnose? > From this run > ([https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch_PR/38/]): > {noformat} > *17:10:36* Error occurred during initialization of VM > *17:10:36* java.lang.OutOfMemoryError: unable to create new native thread > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7139) Blog post announcing Kotlin Sample addition to Beam
[ https://issues.apache.org/jira/browse/BEAM-7139?focusedWorklogId=233031&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233031 ] ASF GitHub Bot logged work on BEAM-7139: Author: ASF GitHub Bot Created on: 25/Apr/19 18:38 Start Date: 25/Apr/19 18:38 Worklog Time Spent: 10m Work Description: harshithdwivedi commented on issue #8391: [BEAM-7139] Blogpost for Kotlin Samples URL: https://github.com/apache/beam/pull/8391#issuecomment-486790554 Oh neat, I didn't know that! Done 🙂 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233031) Time Spent: 40m (was: 0.5h) > Blog post announcing Kotlin Sample addition to Beam > --- > > Key: BEAM-7139 > URL: https://issues.apache.org/jira/browse/BEAM-7139 > Project: Beam > Issue Type: Task > Components: website >Reporter: Harshit Dwivedi >Assignee: Harshit Dwivedi >Priority: Trivial > Time Spent: 40m > Remaining Estimate: 0h > > Publishing a quick blog post that lets the users know that the Beam samples > now have kotlin support. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6880) Deprecate Java Portable Reference Runner
[ https://issues.apache.org/jira/browse/BEAM-6880?focusedWorklogId=233023&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233023 ] ASF GitHub Bot logged work on BEAM-6880: Author: ASF GitHub Bot Created on: 25/Apr/19 18:20 Start Date: 25/Apr/19 18:20 Worklog Time Spent: 10m Work Description: youngoli commented on issue #8380: [BEAM-6880] Remove deprecated Reference Runner code. URL: https://github.com/apache/beam/pull/8380#issuecomment-486783882 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233023) Time Spent: 3h 20m (was: 3h 10m) > Deprecate Java Portable Reference Runner > > > Key: BEAM-6880 > URL: https://issues.apache.org/jira/browse/BEAM-6880 > Project: Beam > Issue Type: New Feature > Components: runner-direct, test-failures, testing >Reporter: Mikhail Gryzykhin >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > This ticket is about deprecating Java Portable Reference runner. > > Discussion is happening in [this > thread|[https://lists.apache.org/thread.html/0b68efce9b7f2c5297b32d09e5d903e9b354199fe2ce446fbcd240bc@%3Cdev.beam.apache.org%3E]] > > > Current summary is: disable beam_PostCommit_Java_PVR_Reference job. > Keeping or removing reference runner code is still under discussion. It is > suggested to create PR that removes relevant code and start voting there. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7148) LTS backport: AfterProcessingTime trigger doesn't fire reliably
Maximilian Michels created BEAM-7148: Summary: LTS backport: AfterProcessingTime trigger doesn't fire reliably Key: BEAM-7148 URL: https://issues.apache.org/jira/browse/BEAM-7148 Project: Beam Issue Type: Bug Components: runner-flink Affects Versions: 2.1.0, 2.2.0, 2.3.0 Reporter: Maximilian Michels Assignee: Maximilian Michels Fix For: 2.13.0 *Issue* Beam AfterProcessingTime trigger doesn't fire always reliably after a configured delay. The following job triggers should fire after watermark passes the end of the window and then every 5 seconds for late data and the finally at the end of allowed lateness. *Expected behaviour* Late firing after processing time trigger should fire after 5 seconds since first late records arrive in the pane. *Actual behaviour* >From my testings late triggers works for some keys but not for the other - >it's pretty random which keys are affected. The DummySource generates 15 >distinct keys AA,BB,..., PP. For each key it sends 5 on time records and one >late record. In case late trigger firing is missed it won't fire until the >allowed lateness period. *Job code* {code:java} String[] runnerArgs = {"--runner=FlinkRunner", "--parallelism=8"}; FlinkPipelineOptions options = PipelineOptionsFactory.fromArgs(runnerArgs).as(FlinkPipelineOptions.class); Pipeline pipeline = Pipeline.create(options); PCollection apply = pipeline.apply(Read.from(new DummySource())) .apply(Window.into(FixedWindows.of(Duration.standardSeconds(10))) .triggering(AfterWatermark.pastEndOfWindow() .withLateFirings( AfterProcessingTime .pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(5 .accumulatingFiredPanes() .withAllowedLateness(Duration.standardMinutes(2), Window.ClosingBehavior.FIRE_IF_NON_EMPTY) ); apply.apply(Count.perElement()) .apply(ParDo.of(new DoFn, Long>() { @ProcessElement public void process(ProcessContext context, BoundedWindow window) { LOG.info("Count: {}. For window {}, Pane {}", context.element(), window, context.pane()); } })); pipeline.run().waitUntilFinish();{code} *How can you replicate the issue?* I've created a github repo [https://github.com/pbartoszek/BEAM-3863_late_trigger] with the code shown above. Please check out the README file for details how to replicate the issue. *What's is causing the issue?* I explained the cause in PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7148) LTS backport: AfterProcessingTime trigger doesn't fire reliably
[ https://issues.apache.org/jira/browse/BEAM-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels updated BEAM-7148: - Fix Version/s: (was: 2.13.0) 2.7.1 > LTS backport: AfterProcessingTime trigger doesn't fire reliably > --- > > Key: BEAM-7148 > URL: https://issues.apache.org/jira/browse/BEAM-7148 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.1.0, 2.2.0, 2.3.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Labels: triaged > Fix For: 2.7.1 > > > *Issue* > Beam AfterProcessingTime trigger doesn't fire always reliably after a > configured delay. > The following job triggers should fire after watermark passes the end of the > window and then every 5 seconds for late data and the finally at the end of > allowed lateness. > *Expected behaviour* > Late firing after processing time trigger should fire after 5 seconds since > first late records arrive in the pane. > *Actual behaviour* > From my testings late triggers works for some keys but not for the other - > it's pretty random which keys are affected. The DummySource generates 15 > distinct keys AA,BB,..., PP. For each key it sends 5 on time records and one > late record. In case late trigger firing is missed it won't fire until the > allowed lateness period. > *Job code* > {code:java} > String[] runnerArgs = {"--runner=FlinkRunner", "--parallelism=8"}; > FlinkPipelineOptions options = > PipelineOptionsFactory.fromArgs(runnerArgs).as(FlinkPipelineOptions.class); > Pipeline pipeline = Pipeline.create(options); > PCollection apply = pipeline.apply(Read.from(new DummySource())) > > .apply(Window.into(FixedWindows.of(Duration.standardSeconds(10))) > .triggering(AfterWatermark.pastEndOfWindow() > .withLateFirings( > AfterProcessingTime > > .pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(5 > .accumulatingFiredPanes() > .withAllowedLateness(Duration.standardMinutes(2), > Window.ClosingBehavior.FIRE_IF_NON_EMPTY) > ); > apply.apply(Count.perElement()) > .apply(ParDo.of(new DoFn, Long>() { > @ProcessElement > public void process(ProcessContext context, BoundedWindow window) > { > LOG.info("Count: {}. For window {}, Pane {}", > context.element(), window, context.pane()); > } > })); > pipeline.run().waitUntilFinish();{code} > > *How can you replicate the issue?* > I've created a github repo > [https://github.com/pbartoszek/BEAM-3863_late_trigger] with the code shown > above. Please check out the README file for details how to replicate the > issue. > *What's is causing the issue?* > I explained the cause in PR. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-3863) AfterProcessingTime trigger doesn't fire reliably
[ https://issues.apache.org/jira/browse/BEAM-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels resolved BEAM-3863. -- Resolution: Fixed Fix Version/s: 2.13.0 This should be resolved now. Thanks again for the great report! > AfterProcessingTime trigger doesn't fire reliably > - > > Key: BEAM-3863 > URL: https://issues.apache.org/jira/browse/BEAM-3863 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.1.0, 2.2.0, 2.3.0 >Reporter: Pawel Bartoszek >Assignee: Maximilian Michels >Priority: Major > Labels: triaged > Fix For: 2.13.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > *Issue* > Beam AfterProcessingTime trigger doesn't fire always reliably after a > configured delay. > The following job triggers should fire after watermark passes the end of the > window and then every 5 seconds for late data and the finally at the end of > allowed lateness. > *Expected behaviour* > Late firing after processing time trigger should fire after 5 seconds since > first late records arrive in the pane. > *Actual behaviour* > From my testings late triggers works for some keys but not for the other - > it's pretty random which keys are affected. The DummySource generates 15 > distinct keys AA,BB,..., PP. For each key it sends 5 on time records and one > late record. In case late trigger firing is missed it won't fire until the > allowed lateness period. > *Job code* > {code:java} > String[] runnerArgs = {"--runner=FlinkRunner", "--parallelism=8"}; > FlinkPipelineOptions options = > PipelineOptionsFactory.fromArgs(runnerArgs).as(FlinkPipelineOptions.class); > Pipeline pipeline = Pipeline.create(options); > PCollection apply = pipeline.apply(Read.from(new DummySource())) > > .apply(Window.into(FixedWindows.of(Duration.standardSeconds(10))) > .triggering(AfterWatermark.pastEndOfWindow() > .withLateFirings( > AfterProcessingTime > > .pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(5 > .accumulatingFiredPanes() > .withAllowedLateness(Duration.standardMinutes(2), > Window.ClosingBehavior.FIRE_IF_NON_EMPTY) > ); > apply.apply(Count.perElement()) > .apply(ParDo.of(new DoFn, Long>() { > @ProcessElement > public void process(ProcessContext context, BoundedWindow window) > { > LOG.info("Count: {}. For window {}, Pane {}", > context.element(), window, context.pane()); > } > })); > pipeline.run().waitUntilFinish();{code} > > *How can you replicate the issue?* > I've created a github repo > [https://github.com/pbartoszek/BEAM-3863_late_trigger] with the code shown > above. Please check out the README file for details how to replicate the > issue. > *What's is causing the issue?* > I explained the cause in PR. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3863) AfterProcessingTime trigger doesn't fire reliably
[ https://issues.apache.org/jira/browse/BEAM-3863?focusedWorklogId=233020&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233020 ] ASF GitHub Bot logged work on BEAM-3863: Author: ASF GitHub Bot Created on: 25/Apr/19 18:13 Start Date: 25/Apr/19 18:13 Worklog Time Spent: 10m Work Description: mxm commented on pull request #8366: [BEAM-3863] Ensure correct firing of processing time timers URL: https://github.com/apache/beam/pull/8366 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233020) Time Spent: 2h 40m (was: 2.5h) > AfterProcessingTime trigger doesn't fire reliably > - > > Key: BEAM-3863 > URL: https://issues.apache.org/jira/browse/BEAM-3863 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.1.0, 2.2.0, 2.3.0 >Reporter: Pawel Bartoszek >Assignee: Maximilian Michels >Priority: Major > Labels: triaged > Time Spent: 2h 40m > Remaining Estimate: 0h > > *Issue* > Beam AfterProcessingTime trigger doesn't fire always reliably after a > configured delay. > The following job triggers should fire after watermark passes the end of the > window and then every 5 seconds for late data and the finally at the end of > allowed lateness. > *Expected behaviour* > Late firing after processing time trigger should fire after 5 seconds since > first late records arrive in the pane. > *Actual behaviour* > From my testings late triggers works for some keys but not for the other - > it's pretty random which keys are affected. The DummySource generates 15 > distinct keys AA,BB,..., PP. For each key it sends 5 on time records and one > late record. In case late trigger firing is missed it won't fire until the > allowed lateness period. > *Job code* > {code:java} > String[] runnerArgs = {"--runner=FlinkRunner", "--parallelism=8"}; > FlinkPipelineOptions options = > PipelineOptionsFactory.fromArgs(runnerArgs).as(FlinkPipelineOptions.class); > Pipeline pipeline = Pipeline.create(options); > PCollection apply = pipeline.apply(Read.from(new DummySource())) > > .apply(Window.into(FixedWindows.of(Duration.standardSeconds(10))) > .triggering(AfterWatermark.pastEndOfWindow() > .withLateFirings( > AfterProcessingTime > > .pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(5 > .accumulatingFiredPanes() > .withAllowedLateness(Duration.standardMinutes(2), > Window.ClosingBehavior.FIRE_IF_NON_EMPTY) > ); > apply.apply(Count.perElement()) > .apply(ParDo.of(new DoFn, Long>() { > @ProcessElement > public void process(ProcessContext context, BoundedWindow window) > { > LOG.info("Count: {}. For window {}, Pane {}", > context.element(), window, context.pane()); > } > })); > pipeline.run().waitUntilFinish();{code} > > *How can you replicate the issue?* > I've created a github repo > [https://github.com/pbartoszek/BEAM-3863_late_trigger] with the code shown > above. Please check out the README file for details how to replicate the > issue. > *What's is causing the issue?* > I explained the cause in PR. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4543) Remove dependency on googledatastore in favor of google-cloud-datastore.
[ https://issues.apache.org/jira/browse/BEAM-4543?focusedWorklogId=233019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233019 ] ASF GitHub Bot logged work on BEAM-4543: Author: ASF GitHub Bot Created on: 25/Apr/19 18:13 Start Date: 25/Apr/19 18:13 Worklog Time Spent: 10m Work Description: udim commented on issue #8262: [BEAM-4543] Python Datastore IO using google-cloud-datastore URL: https://github.com/apache/beam/pull/8262#issuecomment-486781398 run python postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233019) Time Spent: 3h 50m (was: 3h 40m) > Remove dependency on googledatastore in favor of google-cloud-datastore. > > > Key: BEAM-4543 > URL: https://issues.apache.org/jira/browse/BEAM-4543 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Minor > Labels: triaged > Time Spent: 3h 50m > Remaining Estimate: 0h > > apache-beam[gcp] package depends [1] on googledatastore package [2]. We > should replace this dependency with google-cloud-datastore [3] which is > officially supported, has better release cadence and also has Python 3 > support. > [1] > https://github.com/apache/beam/blob/fad655462f8fadfdfaab0b7a09cab538f076f94e/sdks/python/setup.py#L126 > [2] [https://pypi.org/project/googledatastore/] > [3] [https://pypi.org/project/google-cloud-datastore/] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7147) Several PR Postcommits failing: java.lang.OutOfMemoryError
[ https://issues.apache.org/jira/browse/BEAM-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826315#comment-16826315 ] yifan zou commented on BEAM-7147: - the beam4 is offline. The OOM is misleading, it is actually caused by some thread leaking. A major leak was fixed in [https://issues.apache.org/jira/browse/BEAM-7109], but seems like there are still other tests eating the thread pool of the VM. I'll keep beam4 offline for investigation. You can rerun the test and it could be assigned to a healthy node. > Several PR Postcommits failing: java.lang.OutOfMemoryError > -- > > Key: BEAM-7147 > URL: https://issues.apache.org/jira/browse/BEAM-7147 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Daniel Oliveira >Assignee: yifan zou >Priority: Major > Labels: currently-failing > > While working on my PR ([PR #6880|https://github.com/apache/beam/pull/8380]) > I tried to run a few Postcommits and got the same error on each one. Not sure > if this is related to the work being done to move to new machines or > something else. Can someone help diagnose? > From this run > ([https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch_PR/38/]): > {noformat} > *17:10:36* Error occurred during initialization of VM > *17:10:36* java.lang.OutOfMemoryError: unable to create new native thread > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7029) Support KafkaIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=233018&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233018 ] ASF GitHub Bot logged work on BEAM-7029: Author: ASF GitHub Bot Created on: 25/Apr/19 18:10 Start Date: 25/Apr/19 18:10 Worklog Time Spent: 10m Work Description: mxm commented on pull request #8322: [BEAM-7029] Add KafkaIO.Write as an external transform URL: https://github.com/apache/beam/pull/8322#discussion_r278670159 ## File path: sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java ## @@ -1325,12 +1325,89 @@ public void populateDisplayData(DisplayData.Builder builder) { abstract Builder toBuilder(); @AutoValue.Builder -abstract static class Builder { +abstract static class Builder +implements ExternalTransformBuilder>, PDone> { abstract Builder setTopic(String topic); abstract Builder setWriteRecordsTransform(WriteRecords transform); abstract Write build(); + + @Override + public PTransform>, PDone> buildExternal( + External.Configuration configuration) { +String topic = utf8String(configuration.topic); +setTopic(topic); + +Map producerConfig = new HashMap<>(); +for (KV kv : configuration.producerConfig) { + String key = utf8String(kv.getKey()); + String value = utf8String(kv.getValue()); + producerConfig.put(key, value); +} +Class keySerializer = resolveClass(utf8String(configuration.keySerializer)); +Class valSerializer = resolveClass(utf8String(configuration.valueSerializer)); + +WriteRecords writeRecords = +KafkaIO.writeRecords() +.updateProducerProperties(producerConfig) +.withKeySerializer(keySerializer) +.withValueSerializer(valSerializer) +.withTopic(topic); +setWriteRecordsTransform(writeRecords); + +return build(); + } + + private static Class resolveClass(String className) { Review comment: Actually, I've moved the method to the root class because I didn't know whether this would be useful beyond this scope. That way at least both inner classes can use it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233018) Time Spent: 13h 10m (was: 13h) > Support KafkaIO to be configured externally for use with other SDKs > --- > > Key: BEAM-7029 > URL: https://issues.apache.org/jira/browse/BEAM-7029 > Project: Beam > Issue Type: New Feature > Components: io-java-kafka, runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 13h 10m > Remaining Estimate: 0h > > As of BEAM-6730, we can externally configure existing transforms from SDKs. > We should add more useful transforms then just {{GenerateSequence}}. > {{KafkaIO}} is a good candidate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7139) Blog post announcing Kotlin Sample addition to Beam
[ https://issues.apache.org/jira/browse/BEAM-7139?focusedWorklogId=233015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233015 ] ASF GitHub Bot logged work on BEAM-7139: Author: ASF GitHub Bot Created on: 25/Apr/19 18:04 Start Date: 25/Apr/19 18:04 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8391: [BEAM-7139] Blogpost for Kotlin Samples URL: https://github.com/apache/beam/pull/8391#issuecomment-486778315 Thank you Harshith. This looks good to me : ) just one more thing: If you add the language to the code snippet like so: ` ```java `, the website will add syntax highlighting to your code samples. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233015) Time Spent: 0.5h (was: 20m) > Blog post announcing Kotlin Sample addition to Beam > --- > > Key: BEAM-7139 > URL: https://issues.apache.org/jira/browse/BEAM-7139 > Project: Beam > Issue Type: Task > Components: website >Reporter: Harshit Dwivedi >Assignee: Harshit Dwivedi >Priority: Trivial > Time Spent: 0.5h > Remaining Estimate: 0h > > Publishing a quick blog post that lets the users know that the Beam samples > now have kotlin support. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7137) TypeError caused by using str variable as header argument in apache_beam.io.textio.WriteToText
[ https://issues.apache.org/jira/browse/BEAM-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826287#comment-16826287 ] Chamikara Jayalath commented on BEAM-7137: -- +1 for encoding header to bytes (using UTF-8). We have clearly documented that header should be a string here: [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/textio.py#L599] > TypeError caused by using str variable as header argument in > apache_beam.io.textio.WriteToText > -- > > Key: BEAM-7137 > URL: https://issues.apache.org/jira/browse/BEAM-7137 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Affects Versions: 2.11.0 > Environment: Python 3.5.6 > macOS Mojave 10.14.4 >Reporter: yoshiki obata >Assignee: yoshiki obata >Priority: Major > > Using str header to apache_beam.io.textio.WriteToText as argument cause > TypeError with Python 3.5.6 - or maybe higher - despite docstring says header > is str. > This error occurred by writing header to file without encoding to bytes at > apache_beam.io.textio._TextSink.open. > > {code:java} > Traceback (most recent call last): > File "apache_beam/runners/common.py", line 727, in > apache_beam.runners.common.DoFnRunner.process > File "apache_beam/runners/common.py", line 555, in > apache_beam.runners.common.PerWindowInvoker.invoke_process > File "apache_beam/runners/common.py", line 625, in > apache_beam.runners.common.PerWindowInvoker._invoke_per_window > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/iobase.py", > line 1033, in process > self.writer = self.sink.open_writer(init_result, str(uuid.uuid4())) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/options/value_provider.py", > line 137, in _f > return fnc(self, *args, **kwargs) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 185, in open_writer > return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 389, in __init__ > self.temp_handle = self.sink.open(temp_shard_path) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/textio.py", > line 393, in open > file_handle.write(self._header) > TypeError: a bytes-like object is required, not 'str' > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7147) Several PR Postcommits failing: java.lang.OutOfMemoryError
[ https://issues.apache.org/jira/browse/BEAM-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-7147: -- Description: While working on my PR ([PR #6880|https://github.com/apache/beam/pull/8380]) I tried to run a few Postcommits and got the same error on each one. Not sure if this is related to the work being done to move to new machines or something else. Can someone help diagnose? >From this run >([https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch_PR/38/]): {noformat} *17:10:36* Error occurred during initialization of VM *17:10:36* java.lang.OutOfMemoryError: unable to create new native thread {noformat} was: While working on my PR ([PR #6880|https://github.com/apache/beam/pull/8380]) I tried to run a few Postcommits and got the same error on each one. Not sure if this is related to the work being done to move to new machines or something else. Can someone help diagnose? >From this run >([https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch_PR/38/]): *17:10:36* Error occurred during initialization of VM*17:10:36* java.lang.OutOfMemoryError: unable to create new native thread > Several PR Postcommits failing: java.lang.OutOfMemoryError > -- > > Key: BEAM-7147 > URL: https://issues.apache.org/jira/browse/BEAM-7147 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Daniel Oliveira >Assignee: yifan zou >Priority: Major > Labels: currently-failing > > While working on my PR ([PR #6880|https://github.com/apache/beam/pull/8380]) > I tried to run a few Postcommits and got the same error on each one. Not sure > if this is related to the work being done to move to new machines or > something else. Can someone help diagnose? > From this run > ([https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch_PR/38/]): > {noformat} > *17:10:36* Error occurred during initialization of VM > *17:10:36* java.lang.OutOfMemoryError: unable to create new native thread > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7147) Several PR Postcommits failing: java.lang.OutOfMemoryError
[ https://issues.apache.org/jira/browse/BEAM-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826285#comment-16826285 ] Daniel Oliveira commented on BEAM-7147: --- Yifan, I'm adding you as assignee since you've been working on the migration to new machines, so you might know if this is related or not. > Several PR Postcommits failing: java.lang.OutOfMemoryError > -- > > Key: BEAM-7147 > URL: https://issues.apache.org/jira/browse/BEAM-7147 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Daniel Oliveira >Assignee: yifan zou >Priority: Major > Labels: currently-failing > > While working on my PR ([PR #6880|https://github.com/apache/beam/pull/8380]) > I tried to run a few Postcommits and got the same error on each one. Not sure > if this is related to the work being done to move to new machines or > something else. Can someone help diagnose? > From this run > ([https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch_PR/38/]): > *17:10:36* Error occurred during initialization of VM*17:10:36* > java.lang.OutOfMemoryError: unable to create new native thread -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7147) Several PR Postcommits failing: java.lang.OutOfMemoryError
Daniel Oliveira created BEAM-7147: - Summary: Several PR Postcommits failing: java.lang.OutOfMemoryError Key: BEAM-7147 URL: https://issues.apache.org/jira/browse/BEAM-7147 Project: Beam Issue Type: Bug Components: test-failures Reporter: Daniel Oliveira Assignee: yifan zou While working on my PR ([PR #6880|https://github.com/apache/beam/pull/8380]) I tried to run a few Postcommits and got the same error on each one. Not sure if this is related to the work being done to move to new machines or something else. Can someone help diagnose? >From this run >([https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch_PR/38/]): *17:10:36* Error occurred during initialization of VM*17:10:36* java.lang.OutOfMemoryError: unable to create new native thread -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7137) TypeError caused by using str variable as header argument in apache_beam.io.textio.WriteToText
[ https://issues.apache.org/jira/browse/BEAM-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826279#comment-16826279 ] Valentyn Tymofieiev commented on BEAM-7137: --- [~yoshiki.obata], if you don't plan to work on this issue, feel free to reassign it to me and I will try to find an owner. > TypeError caused by using str variable as header argument in > apache_beam.io.textio.WriteToText > -- > > Key: BEAM-7137 > URL: https://issues.apache.org/jira/browse/BEAM-7137 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Affects Versions: 2.11.0 > Environment: Python 3.5.6 > macOS Mojave 10.14.4 >Reporter: yoshiki obata >Assignee: yoshiki obata >Priority: Major > > Using str header to apache_beam.io.textio.WriteToText as argument cause > TypeError with Python 3.5.6 - or maybe higher - despite docstring says header > is str. > This error occurred by writing header to file without encoding to bytes at > apache_beam.io.textio._TextSink.open. > > {code:java} > Traceback (most recent call last): > File "apache_beam/runners/common.py", line 727, in > apache_beam.runners.common.DoFnRunner.process > File "apache_beam/runners/common.py", line 555, in > apache_beam.runners.common.PerWindowInvoker.invoke_process > File "apache_beam/runners/common.py", line 625, in > apache_beam.runners.common.PerWindowInvoker._invoke_per_window > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/iobase.py", > line 1033, in process > self.writer = self.sink.open_writer(init_result, str(uuid.uuid4())) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/options/value_provider.py", > line 137, in _f > return fnc(self, *args, **kwargs) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 185, in open_writer > return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 389, in __init__ > self.temp_handle = self.sink.open(temp_shard_path) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/textio.py", > line 393, in open > file_handle.write(self._header) > TypeError: a bytes-like object is required, not 'str' > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-7137) TypeError caused by using str variable as header argument in apache_beam.io.textio.WriteToText
[ https://issues.apache.org/jira/browse/BEAM-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826278#comment-16826278 ] Valentyn Tymofieiev edited comment on BEAM-7137 at 4/25/19 5:51 PM: [~yoshiki.obata], thanks a lot for trying out Beam on Python 3 and reporting this! We have to either encode header to bytes [here|https://github.com/apache/beam/blob/2cb44a81b258a64544fd8ca387305b2d5ccce13b/sdks/python/apache_beam/io/textio.py#L393], or require header to be bytes in the first place. Looking through the IO codebase, it seems that we assume header to be a string in quite a few places starting from [WriteToText PTransform|https://github.com/apache/beam/blob/2cb44a81b258a64544fd8ca387305b2d5ccce13b/sdks/python/apache_beam/io/textio.py#L599], so encoding header to bytes may be the path of least resistance. We can revise this if we find a strong reason to require header to be bytes. cc: [~chamikara] Also, we should find out why this was not caught by our postcommit integration tests, and improve test coverage so that we have confidence that that both write and read path work correctly. cc: [~Juta] was (Author: tvalentyn): [~yoshiki.obata], thanks a lot for trying out Beam on Python 3 and reporting this! We have to either encode header to bytes [here|https://github.com/apache/beam/blob/2cb44a81b258a64544fd8ca387305b2d5ccce13b/sdks/python/apache_beam/io/textio.py#L393], or require header to be bytes in the first place. Looking through the IO codebase, it seems that we assume header to be a string in quite a few places starting from [WriteToText PTransform|https://github.com/apache/beam/blob/2cb44a81b258a64544fd8ca387305b2d5ccce13b/sdks/python/apache_beam/io/textio.py#L599], so encoding header to bytes may be the past of least resistance. We can revise this if we find a strong reason to require header to be bytes. cc: [~chamikara] Also, we should find out why this was not caught by our postcommit integration tests, and improve test coverage so that we have confidence that that both write and read path work correctly. cc: [~Juta] > TypeError caused by using str variable as header argument in > apache_beam.io.textio.WriteToText > -- > > Key: BEAM-7137 > URL: https://issues.apache.org/jira/browse/BEAM-7137 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Affects Versions: 2.11.0 > Environment: Python 3.5.6 > macOS Mojave 10.14.4 >Reporter: yoshiki obata >Assignee: yoshiki obata >Priority: Major > > Using str header to apache_beam.io.textio.WriteToText as argument cause > TypeError with Python 3.5.6 - or maybe higher - despite docstring says header > is str. > This error occurred by writing header to file without encoding to bytes at > apache_beam.io.textio._TextSink.open. > > {code:java} > Traceback (most recent call last): > File "apache_beam/runners/common.py", line 727, in > apache_beam.runners.common.DoFnRunner.process > File "apache_beam/runners/common.py", line 555, in > apache_beam.runners.common.PerWindowInvoker.invoke_process > File "apache_beam/runners/common.py", line 625, in > apache_beam.runners.common.PerWindowInvoker._invoke_per_window > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/iobase.py", > line 1033, in process > self.writer = self.sink.open_writer(init_result, str(uuid.uuid4())) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/options/value_provider.py", > line 137, in _f > return fnc(self, *args, **kwargs) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 185, in open_writer > return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 389, in __init__ > self.temp_handle = self.sink.open(temp_shard_path) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/textio.py", > line 393, in open > file_handle.write(self._header) > TypeError: a bytes-like object is required, not 'str' > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7137) TypeError caused by using str variable as header argument in apache_beam.io.textio.WriteToText
[ https://issues.apache.org/jira/browse/BEAM-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-7137: -- Issue Type: Sub-task (was: Bug) Parent: BEAM-1251 > TypeError caused by using str variable as header argument in > apache_beam.io.textio.WriteToText > -- > > Key: BEAM-7137 > URL: https://issues.apache.org/jira/browse/BEAM-7137 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Affects Versions: 2.11.0 > Environment: Python 3.5.6 > macOS Mojave 10.14.4 >Reporter: yoshiki obata >Assignee: yoshiki obata >Priority: Minor > > Using str header to apache_beam.io.textio.WriteToText as argument cause > TypeError with Python 3.5.6 - or maybe higher - despite docstring says header > is str. > This error occurred by writing header to file without encoding to bytes at > apache_beam.io.textio._TextSink.open. > > {code:java} > Traceback (most recent call last): > File "apache_beam/runners/common.py", line 727, in > apache_beam.runners.common.DoFnRunner.process > File "apache_beam/runners/common.py", line 555, in > apache_beam.runners.common.PerWindowInvoker.invoke_process > File "apache_beam/runners/common.py", line 625, in > apache_beam.runners.common.PerWindowInvoker._invoke_per_window > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/iobase.py", > line 1033, in process > self.writer = self.sink.open_writer(init_result, str(uuid.uuid4())) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/options/value_provider.py", > line 137, in _f > return fnc(self, *args, **kwargs) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 185, in open_writer > return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 389, in __init__ > self.temp_handle = self.sink.open(temp_shard_path) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/textio.py", > line 393, in open > file_handle.write(self._header) > TypeError: a bytes-like object is required, not 'str' > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7137) TypeError caused by using str variable as header argument in apache_beam.io.textio.WriteToText
[ https://issues.apache.org/jira/browse/BEAM-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-7137: -- Priority: Major (was: Minor) > TypeError caused by using str variable as header argument in > apache_beam.io.textio.WriteToText > -- > > Key: BEAM-7137 > URL: https://issues.apache.org/jira/browse/BEAM-7137 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Affects Versions: 2.11.0 > Environment: Python 3.5.6 > macOS Mojave 10.14.4 >Reporter: yoshiki obata >Assignee: yoshiki obata >Priority: Major > > Using str header to apache_beam.io.textio.WriteToText as argument cause > TypeError with Python 3.5.6 - or maybe higher - despite docstring says header > is str. > This error occurred by writing header to file without encoding to bytes at > apache_beam.io.textio._TextSink.open. > > {code:java} > Traceback (most recent call last): > File "apache_beam/runners/common.py", line 727, in > apache_beam.runners.common.DoFnRunner.process > File "apache_beam/runners/common.py", line 555, in > apache_beam.runners.common.PerWindowInvoker.invoke_process > File "apache_beam/runners/common.py", line 625, in > apache_beam.runners.common.PerWindowInvoker._invoke_per_window > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/iobase.py", > line 1033, in process > self.writer = self.sink.open_writer(init_result, str(uuid.uuid4())) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/options/value_provider.py", > line 137, in _f > return fnc(self, *args, **kwargs) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 185, in open_writer > return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 389, in __init__ > self.temp_handle = self.sink.open(temp_shard_path) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/textio.py", > line 393, in open > file_handle.write(self._header) > TypeError: a bytes-like object is required, not 'str' > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7137) TypeError caused by using str variable as header argument in apache_beam.io.textio.WriteToText
[ https://issues.apache.org/jira/browse/BEAM-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826278#comment-16826278 ] Valentyn Tymofieiev commented on BEAM-7137: --- [~yoshiki.obata], thanks a lot for trying out Beam on Python 3 and reporting this! We have to either encode header to bytes [here|https://github.com/apache/beam/blob/2cb44a81b258a64544fd8ca387305b2d5ccce13b/sdks/python/apache_beam/io/textio.py#L393], or require header to be bytes in the first place. Looking through the IO codebase, it seems that we assume header to be a string in quite a few places starting from [WriteToText PTransform|https://github.com/apache/beam/blob/2cb44a81b258a64544fd8ca387305b2d5ccce13b/sdks/python/apache_beam/io/textio.py#L599], so encoding header to bytes may be the past of least resistance. We can revise this if we find a strong reason to require header to be bytes. cc: [~chamikara] Also, we should find out why this was not caught by our postcommit integration tests, and improve test coverage so that we have confidence that that both write and read path work correctly. cc: [~Juta] > TypeError caused by using str variable as header argument in > apache_beam.io.textio.WriteToText > -- > > Key: BEAM-7137 > URL: https://issues.apache.org/jira/browse/BEAM-7137 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.11.0 > Environment: Python 3.5.6 > macOS Mojave 10.14.4 >Reporter: yoshiki obata >Assignee: yoshiki obata >Priority: Minor > > Using str header to apache_beam.io.textio.WriteToText as argument cause > TypeError with Python 3.5.6 - or maybe higher - despite docstring says header > is str. > This error occurred by writing header to file without encoding to bytes at > apache_beam.io.textio._TextSink.open. > > {code:java} > Traceback (most recent call last): > File "apache_beam/runners/common.py", line 727, in > apache_beam.runners.common.DoFnRunner.process > File "apache_beam/runners/common.py", line 555, in > apache_beam.runners.common.PerWindowInvoker.invoke_process > File "apache_beam/runners/common.py", line 625, in > apache_beam.runners.common.PerWindowInvoker._invoke_per_window > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/iobase.py", > line 1033, in process > self.writer = self.sink.open_writer(init_result, str(uuid.uuid4())) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/options/value_provider.py", > line 137, in _f > return fnc(self, *args, **kwargs) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 185, in open_writer > return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", > line 389, in __init__ > self.temp_handle = self.sink.open(temp_shard_path) > File > "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/textio.py", > line 393, in open > file_handle.write(self._header) > TypeError: a bytes-like object is required, not 'str' > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-2857) Create FileIO in Python
[ https://issues.apache.org/jira/browse/BEAM-2857?focusedWorklogId=233005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233005 ] ASF GitHub Bot logged work on BEAM-2857: Author: ASF GitHub Bot Created on: 25/Apr/19 17:39 Start Date: 25/Apr/19 17:39 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8394: [BEAM-2857] Implementing WriteToFiles transform for fileio (Python) URL: https://github.com/apache/beam/pull/8394#issuecomment-486769282 Passing postcommit: https://builds.apache.org/job/beam_PostCommit_Python_Verify_PR/593/ PAssing py3 postcommit: https://builds.apache.org/job/beam_PostCommit_Python3_Verify_PR/230/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233005) Time Spent: 40m (was: 0.5h) > Create FileIO in Python > --- > > Key: BEAM-2857 > URL: https://issues.apache.org/jira/browse/BEAM-2857 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Pablo Estrada >Priority: Major > Labels: gsoc, gsoc2019, mentor, triaged > Time Spent: 40m > Remaining Estimate: 0h > > Beam Java has a FileIO with operations: match()/matchAll(), readMatches(), > which together cover the majority of needs for general-purpose file > ingestion. Beam Python should have something similar. > An early design document for this: https://s.apache.org/fileio-beam-python -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7029) Support KafkaIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=233004&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233004 ] ASF GitHub Bot logged work on BEAM-7029: Author: ASF GitHub Bot Created on: 25/Apr/19 17:36 Start Date: 25/Apr/19 17:36 Worklog Time Spent: 10m Work Description: mxm commented on pull request #8322: [BEAM-7029] Add KafkaIO.Write as an external transform URL: https://github.com/apache/beam/pull/8322#discussion_r278657310 ## File path: sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java ## @@ -1325,12 +1325,89 @@ public void populateDisplayData(DisplayData.Builder builder) { abstract Builder toBuilder(); @AutoValue.Builder -abstract static class Builder { +abstract static class Builder +implements ExternalTransformBuilder>, PDone> { abstract Builder setTopic(String topic); abstract Builder setWriteRecordsTransform(WriteRecords transform); abstract Write build(); + + @Override + public PTransform>, PDone> buildExternal( + External.Configuration configuration) { +String topic = utf8String(configuration.topic); +setTopic(topic); + +Map producerConfig = new HashMap<>(); +for (KV kv : configuration.producerConfig) { + String key = utf8String(kv.getKey()); + String value = utf8String(kv.getValue()); + producerConfig.put(key, value); +} +Class keySerializer = resolveClass(utf8String(configuration.keySerializer)); +Class valSerializer = resolveClass(utf8String(configuration.valueSerializer)); + +WriteRecords writeRecords = +KafkaIO.writeRecords() +.updateProducerProperties(producerConfig) +.withKeySerializer(keySerializer) +.withValueSerializer(valSerializer) +.withTopic(topic); +setWriteRecordsTransform(writeRecords); + +return build(); + } + + private static Class resolveClass(String className) { Review comment: +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233004) Time Spent: 13h (was: 12h 50m) > Support KafkaIO to be configured externally for use with other SDKs > --- > > Key: BEAM-7029 > URL: https://issues.apache.org/jira/browse/BEAM-7029 > Project: Beam > Issue Type: New Feature > Components: io-java-kafka, runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 13h > Remaining Estimate: 0h > > As of BEAM-6730, we can externally configure existing transforms from SDKs. > We should add more useful transforms then just {{GenerateSequence}}. > {{KafkaIO}} is a good candidate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7029) Support KafkaIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=233003&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233003 ] ASF GitHub Bot logged work on BEAM-7029: Author: ASF GitHub Bot Created on: 25/Apr/19 17:35 Start Date: 25/Apr/19 17:35 Worklog Time Spent: 10m Work Description: mxm commented on pull request #8322: [BEAM-7029] Add KafkaIO.Write as an external transform URL: https://github.com/apache/beam/pull/8322#discussion_r278657210 ## File path: sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java ## @@ -1325,12 +1325,89 @@ public void populateDisplayData(DisplayData.Builder builder) { abstract Builder toBuilder(); @AutoValue.Builder -abstract static class Builder { +abstract static class Builder +implements ExternalTransformBuilder>, PDone> { abstract Builder setTopic(String topic); abstract Builder setWriteRecordsTransform(WriteRecords transform); abstract Write build(); + + @Override + public PTransform>, PDone> buildExternal( + External.Configuration configuration) { +String topic = utf8String(configuration.topic); +setTopic(topic); + +Map producerConfig = new HashMap<>(); +for (KV kv : configuration.producerConfig) { + String key = utf8String(kv.getKey()); + String value = utf8String(kv.getValue()); + producerConfig.put(key, value); +} +Class keySerializer = resolveClass(utf8String(configuration.keySerializer)); +Class valSerializer = resolveClass(utf8String(configuration.valueSerializer)); + +WriteRecords writeRecords = +KafkaIO.writeRecords() +.updateProducerProperties(producerConfig) +.withKeySerializer(keySerializer) +.withValueSerializer(valSerializer) +.withTopic(topic); +setWriteRecordsTransform(writeRecords); + +return build(); + } + + private static Class resolveClass(String className) { +try { + return Class.forName(className); +} catch (ClassNotFoundException e) { + throw new RuntimeException("Could not find Serializer class: " + className); +} + } + + private static String utf8String(byte[] bytes) { Review comment: I think it makes sense not to repeat this. Let me see if I can find something. `StringUtils` seems like a good place otherwise. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233003) Time Spent: 12h 50m (was: 12h 40m) > Support KafkaIO to be configured externally for use with other SDKs > --- > > Key: BEAM-7029 > URL: https://issues.apache.org/jira/browse/BEAM-7029 > Project: Beam > Issue Type: New Feature > Components: io-java-kafka, runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 12h 50m > Remaining Estimate: 0h > > As of BEAM-6730, we can externally configure existing transforms from SDKs. > We should add more useful transforms then just {{GenerateSequence}}. > {{KafkaIO}} is a good candidate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7029) Support KafkaIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=233000&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-233000 ] ASF GitHub Bot logged work on BEAM-7029: Author: ASF GitHub Bot Created on: 25/Apr/19 17:34 Start Date: 25/Apr/19 17:34 Worklog Time Spent: 10m Work Description: mxm commented on pull request #8322: [BEAM-7029] Add KafkaIO.Write as an external transform URL: https://github.com/apache/beam/pull/8322#discussion_r278656684 ## File path: sdks/python/apache_beam/io/external/kafka.py ## @@ -118,35 +118,112 @@ def expand(self, pbegin): payload.SerializeToString(), self.expansion_service)) - @staticmethod - def _encode_map(dict_obj): -kv_list = [(key.encode('utf-8'), val.encode('utf-8')) - for key, val in dict_obj.items()] -coder = IterableCoder(TupleCoder( -[LengthPrefixCoder(BytesCoder()), LengthPrefixCoder(BytesCoder())])) -coder_urns = ['beam:coder:iterable:v1', - 'beam:coder:kv:v1', - 'beam:coder:bytes:v1', - 'beam:coder:bytes:v1'] -return ConfigValue( -coder_urn=coder_urns, -payload=coder.encode(kv_list)) - - @staticmethod - def _encode_list(list_obj): -encoded_list = [val.encode('utf-8') for val in list_obj] -coder = IterableCoder(LengthPrefixCoder(BytesCoder())) -coder_urns = ['beam:coder:iterable:v1', - 'beam:coder:bytes:v1'] -return ConfigValue( -coder_urn=coder_urns, -payload=coder.encode(encoded_list)) - - @staticmethod - def _encode_str(str_obj): -encoded_str = str_obj.encode('utf-8') -coder = LengthPrefixCoder(BytesCoder()) -coder_urns = ['beam:coder:bytes:v1'] -return ConfigValue( -coder_urn=coder_urns, -payload=coder.encode(encoded_str)) + +class WriteToKafka(ptransform.PTransform): + """ +An external PTransform which writes KV data to a specified Kafka topic. +If no Kafka Serializer for key/value is provided, then key/value are +assumed to be byte arrays. + +Note: To use this transform, you need to start the Java expansion service. +Please refer to the portability documentation on how to do that. The +expansion service address has to be provided when instantiating this +transform. During pipeline translation this transform will be replaced by +the Java SDK's KafkaIO. Review comment: Wasn't sure how this is displayed in IDEs for Python development but seems fair not to repeat it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 233000) Time Spent: 12h 40m (was: 12.5h) > Support KafkaIO to be configured externally for use with other SDKs > --- > > Key: BEAM-7029 > URL: https://issues.apache.org/jira/browse/BEAM-7029 > Project: Beam > Issue Type: New Feature > Components: io-java-kafka, runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 12h 40m > Remaining Estimate: 0h > > As of BEAM-6730, we can externally configure existing transforms from SDKs. > We should add more useful transforms then just {{GenerateSequence}}. > {{KafkaIO}} is a good candidate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7029) Support KafkaIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=232998&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-232998 ] ASF GitHub Bot logged work on BEAM-7029: Author: ASF GitHub Bot Created on: 25/Apr/19 17:33 Start Date: 25/Apr/19 17:33 Worklog Time Spent: 10m Work Description: mxm commented on pull request #8322: [BEAM-7029] Add KafkaIO.Write as an external transform URL: https://github.com/apache/beam/pull/8322#discussion_r278656442 ## File path: sdks/python/apache_beam/runners/portability/flink_runner_test.py ## @@ -189,12 +190,28 @@ def test_external_transforms(self): value_deserializer='org.apache.kafka.' 'common.serialization.' 'LongDeserializer', - expansion_service=expansion_address)) + expansion_service=get_expansion_service())) self.assertTrue('No resolvable bootstrap urls given in bootstrap.servers' in str(ctx.exception), 'Expected to fail due to invalid bootstrap.servers, but ' 'failed due to:\n%s' % str(ctx.exception)) + # We just test the expansion but do not execute. Review comment: Yes, I saw the PR (#8397). That will be great to extend our test coverage here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 232998) Time Spent: 12.5h (was: 12h 20m) > Support KafkaIO to be configured externally for use with other SDKs > --- > > Key: BEAM-7029 > URL: https://issues.apache.org/jira/browse/BEAM-7029 > Project: Beam > Issue Type: New Feature > Components: io-java-kafka, runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 12.5h > Remaining Estimate: 0h > > As of BEAM-6730, we can externally configure existing transforms from SDKs. > We should add more useful transforms then just {{GenerateSequence}}. > {{KafkaIO}} is a good candidate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3863) AfterProcessingTime trigger doesn't fire reliably
[ https://issues.apache.org/jira/browse/BEAM-3863?focusedWorklogId=232994&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-232994 ] ASF GitHub Bot logged work on BEAM-3863: Author: ASF GitHub Bot Created on: 25/Apr/19 17:29 Start Date: 25/Apr/19 17:29 Worklog Time Spent: 10m Work Description: mxm commented on issue #8366: [BEAM-3863] Ensure correct firing of processing time timers URL: https://github.com/apache/beam/pull/8366#issuecomment-486765910 Run Flink ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 232994) Time Spent: 2.5h (was: 2h 20m) > AfterProcessingTime trigger doesn't fire reliably > - > > Key: BEAM-3863 > URL: https://issues.apache.org/jira/browse/BEAM-3863 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.1.0, 2.2.0, 2.3.0 >Reporter: Pawel Bartoszek >Assignee: Maximilian Michels >Priority: Major > Labels: triaged > Time Spent: 2.5h > Remaining Estimate: 0h > > *Issue* > Beam AfterProcessingTime trigger doesn't fire always reliably after a > configured delay. > The following job triggers should fire after watermark passes the end of the > window and then every 5 seconds for late data and the finally at the end of > allowed lateness. > *Expected behaviour* > Late firing after processing time trigger should fire after 5 seconds since > first late records arrive in the pane. > *Actual behaviour* > From my testings late triggers works for some keys but not for the other - > it's pretty random which keys are affected. The DummySource generates 15 > distinct keys AA,BB,..., PP. For each key it sends 5 on time records and one > late record. In case late trigger firing is missed it won't fire until the > allowed lateness period. > *Job code* > {code:java} > String[] runnerArgs = {"--runner=FlinkRunner", "--parallelism=8"}; > FlinkPipelineOptions options = > PipelineOptionsFactory.fromArgs(runnerArgs).as(FlinkPipelineOptions.class); > Pipeline pipeline = Pipeline.create(options); > PCollection apply = pipeline.apply(Read.from(new DummySource())) > > .apply(Window.into(FixedWindows.of(Duration.standardSeconds(10))) > .triggering(AfterWatermark.pastEndOfWindow() > .withLateFirings( > AfterProcessingTime > > .pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(5 > .accumulatingFiredPanes() > .withAllowedLateness(Duration.standardMinutes(2), > Window.ClosingBehavior.FIRE_IF_NON_EMPTY) > ); > apply.apply(Count.perElement()) > .apply(ParDo.of(new DoFn, Long>() { > @ProcessElement > public void process(ProcessContext context, BoundedWindow window) > { > LOG.info("Count: {}. For window {}, Pane {}", > context.element(), window, context.pane()); > } > })); > pipeline.run().waitUntilFinish();{code} > > *How can you replicate the issue?* > I've created a github repo > [https://github.com/pbartoszek/BEAM-3863_late_trigger] with the code shown > above. Please check out the README file for details how to replicate the > issue. > *What's is causing the issue?* > I explained the cause in PR. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3863) AfterProcessingTime trigger doesn't fire reliably
[ https://issues.apache.org/jira/browse/BEAM-3863?focusedWorklogId=232992&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-232992 ] ASF GitHub Bot logged work on BEAM-3863: Author: ASF GitHub Bot Created on: 25/Apr/19 17:28 Start Date: 25/Apr/19 17:28 Worklog Time Spent: 10m Work Description: mxm commented on issue #8366: [BEAM-3863] Ensure correct firing of processing time timers URL: https://github.com/apache/beam/pull/8366#issuecomment-486765577 Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 232992) Time Spent: 2h 20m (was: 2h 10m) > AfterProcessingTime trigger doesn't fire reliably > - > > Key: BEAM-3863 > URL: https://issues.apache.org/jira/browse/BEAM-3863 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.1.0, 2.2.0, 2.3.0 >Reporter: Pawel Bartoszek >Assignee: Maximilian Michels >Priority: Major > Labels: triaged > Time Spent: 2h 20m > Remaining Estimate: 0h > > *Issue* > Beam AfterProcessingTime trigger doesn't fire always reliably after a > configured delay. > The following job triggers should fire after watermark passes the end of the > window and then every 5 seconds for late data and the finally at the end of > allowed lateness. > *Expected behaviour* > Late firing after processing time trigger should fire after 5 seconds since > first late records arrive in the pane. > *Actual behaviour* > From my testings late triggers works for some keys but not for the other - > it's pretty random which keys are affected. The DummySource generates 15 > distinct keys AA,BB,..., PP. For each key it sends 5 on time records and one > late record. In case late trigger firing is missed it won't fire until the > allowed lateness period. > *Job code* > {code:java} > String[] runnerArgs = {"--runner=FlinkRunner", "--parallelism=8"}; > FlinkPipelineOptions options = > PipelineOptionsFactory.fromArgs(runnerArgs).as(FlinkPipelineOptions.class); > Pipeline pipeline = Pipeline.create(options); > PCollection apply = pipeline.apply(Read.from(new DummySource())) > > .apply(Window.into(FixedWindows.of(Duration.standardSeconds(10))) > .triggering(AfterWatermark.pastEndOfWindow() > .withLateFirings( > AfterProcessingTime > > .pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(5 > .accumulatingFiredPanes() > .withAllowedLateness(Duration.standardMinutes(2), > Window.ClosingBehavior.FIRE_IF_NON_EMPTY) > ); > apply.apply(Count.perElement()) > .apply(ParDo.of(new DoFn, Long>() { > @ProcessElement > public void process(ProcessContext context, BoundedWindow window) > { > LOG.info("Count: {}. For window {}, Pane {}", > context.element(), window, context.pane()); > } > })); > pipeline.run().waitUntilFinish();{code} > > *How can you replicate the issue?* > I've created a github repo > [https://github.com/pbartoszek/BEAM-3863_late_trigger] with the code shown > above. Please check out the README file for details how to replicate the > issue. > *What's is causing the issue?* > I explained the cause in PR. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7130) convertAvroFieldStrict as public static function could handle more types of value for logical type timestamp-millis
[ https://issues.apache.org/jira/browse/BEAM-7130?focusedWorklogId=232989&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-232989 ] ASF GitHub Bot logged work on BEAM-7130: Author: ASF GitHub Bot Created on: 25/Apr/19 17:18 Start Date: 25/Apr/19 17:18 Worklog Time Spent: 10m Work Description: bmv126 commented on pull request #8376: [BEAM-7130]Support Datetime value conversion in convertAvroFieldStrict URL: https://github.com/apache/beam/pull/8376#discussion_r278650481 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java ## @@ -669,7 +670,11 @@ public static Object convertAvroFieldStrict( .fromBytes(byteBuffer.duplicate(), type.type, logicalType); return convertDecimal(bigDecimal, fieldType); } else if (logicalType instanceof LogicalTypes.TimestampMillis) { -return convertDateTimeStrict((Long) value, fieldType); +if (value instanceof DateTime) { + return convertDateTimeStrict(((DateTime) value).getMillis(), fieldType); Review comment: As the TimestampConversion is set in a static initialization block, the expectation of the initial code is to covert long to JodaTime during deserialization. So with that expectation the else part will not be needed in this case. Also how are the other logical-types of Avro handled when converting to Row ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 232989) Time Spent: 10m Remaining Estimate: 0h > convertAvroFieldStrict as public static function could handle more types of > value for logical type timestamp-millis > --- > > Key: BEAM-7130 > URL: https://issues.apache.org/jira/browse/BEAM-7130 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Rui Wang >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > https://lists.apache.org/thread.html/68322dcf9418b1d1640273f1a58f70874b61b4996d08dd982b29492c@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-7070) JOIN condition should accept field access
[ https://issues.apache.org/jira/browse/BEAM-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang closed BEAM-7070. -- Resolution: Fixed Fix Version/s: Not applicable > JOIN condition should accept field access > - > > Key: BEAM-7070 > URL: https://issues.apache.org/jira/browse/BEAM-7070 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: Not applicable > > Time Spent: 5.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7146) Code generator is accessible from all RelNode
Rui Wang created BEAM-7146: -- Summary: Code generator is accessible from all RelNode Key: BEAM-7146 URL: https://issues.apache.org/jira/browse/BEAM-7146 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Rui Wang Right now code generation is used in Projection (in BeamCalcRel). However, code generation can be used in all places that might need to execute a function. Therefore, this JIRA propose to have a layer (called code generator) that could be used in other rels: join, aggregation, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6695) Latest transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-6695?focusedWorklogId=232980&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-232980 ] ASF GitHub Bot logged work on BEAM-6695: Author: ASF GitHub Bot Created on: 25/Apr/19 17:00 Start Date: 25/Apr/19 17:00 Worklog Time Spent: 10m Work Description: robinyqiu commented on issue #8206: [BEAM-6695] Latest PTransform for Python SDK URL: https://github.com/apache/beam/pull/8206#issuecomment-486755234 LGTM, and thank you very much for making these changes! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 232980) Time Spent: 6h 10m (was: 6h) > Latest transform for Python SDK > --- > > Key: BEAM-6695 > URL: https://issues.apache.org/jira/browse/BEAM-6695 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Tanay Tummalapalli >Priority: Minor > Time Spent: 6h 10m > Remaining Estimate: 0h > > Add a PTransform} and Combine.CombineFn for computing the latest element in a > PCollection. > It should offer the same API as its Java counterpart: > https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Latest.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7070) JOIN condition should accept field access
[ https://issues.apache.org/jira/browse/BEAM-7070?focusedWorklogId=232973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-232973 ] ASF GitHub Bot logged work on BEAM-7070: Author: ASF GitHub Bot Created on: 25/Apr/19 16:47 Start Date: 25/Apr/19 16:47 Worklog Time Spent: 10m Work Description: akedin commented on pull request #8301: [BEAM-7070] JOIN condition should accept field access URL: https://github.com/apache/beam/pull/8301 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 232973) Time Spent: 5.5h (was: 5h 20m) > JOIN condition should accept field access > - > > Key: BEAM-7070 > URL: https://issues.apache.org/jira/browse/BEAM-7070 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 5.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7029) Support KafkaIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=232961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-232961 ] ASF GitHub Bot logged work on BEAM-7029: Author: ASF GitHub Bot Created on: 25/Apr/19 16:17 Start Date: 25/Apr/19 16:17 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #8322: [BEAM-7029] Add KafkaIO.Write as an external transform URL: https://github.com/apache/beam/pull/8322#discussion_r278627111 ## File path: sdks/python/apache_beam/io/external/kafka.py ## @@ -118,35 +118,112 @@ def expand(self, pbegin): payload.SerializeToString(), self.expansion_service)) - @staticmethod - def _encode_map(dict_obj): -kv_list = [(key.encode('utf-8'), val.encode('utf-8')) - for key, val in dict_obj.items()] -coder = IterableCoder(TupleCoder( -[LengthPrefixCoder(BytesCoder()), LengthPrefixCoder(BytesCoder())])) -coder_urns = ['beam:coder:iterable:v1', - 'beam:coder:kv:v1', - 'beam:coder:bytes:v1', - 'beam:coder:bytes:v1'] -return ConfigValue( -coder_urn=coder_urns, -payload=coder.encode(kv_list)) - - @staticmethod - def _encode_list(list_obj): -encoded_list = [val.encode('utf-8') for val in list_obj] -coder = IterableCoder(LengthPrefixCoder(BytesCoder())) -coder_urns = ['beam:coder:iterable:v1', - 'beam:coder:bytes:v1'] -return ConfigValue( -coder_urn=coder_urns, -payload=coder.encode(encoded_list)) - - @staticmethod - def _encode_str(str_obj): -encoded_str = str_obj.encode('utf-8') -coder = LengthPrefixCoder(BytesCoder()) -coder_urns = ['beam:coder:bytes:v1'] -return ConfigValue( -coder_urn=coder_urns, -payload=coder.encode(encoded_str)) + +class WriteToKafka(ptransform.PTransform): + """ +An external PTransform which writes KV data to a specified Kafka topic. +If no Kafka Serializer for key/value is provided, then key/value are +assumed to be byte arrays. + +Note: To use this transform, you need to start the Java expansion service. +Please refer to the portability documentation on how to do that. The +expansion service address has to be provided when instantiating this +transform. During pipeline translation this transform will be replaced by +the Java SDK's KafkaIO. Review comment: Prob. move this information to the top of the module (instead of repeating for read and write transforms). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 232961) Time Spent: 12h (was: 11h 50m) > Support KafkaIO to be configured externally for use with other SDKs > --- > > Key: BEAM-7029 > URL: https://issues.apache.org/jira/browse/BEAM-7029 > Project: Beam > Issue Type: New Feature > Components: io-java-kafka, runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 12h > Remaining Estimate: 0h > > As of BEAM-6730, we can externally configure existing transforms from SDKs. > We should add more useful transforms then just {{GenerateSequence}}. > {{KafkaIO}} is a good candidate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7029) Support KafkaIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=232962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-232962 ] ASF GitHub Bot logged work on BEAM-7029: Author: ASF GitHub Bot Created on: 25/Apr/19 16:17 Start Date: 25/Apr/19 16:17 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #8322: [BEAM-7029] Add KafkaIO.Write as an external transform URL: https://github.com/apache/beam/pull/8322#discussion_r278624714 ## File path: sdks/java/container/Dockerfile ## @@ -22,6 +22,8 @@ MAINTAINER "Apache Beam " ADD target/slf4j-api.jar /opt/apache/beam/jars/ ADD target/slf4j-jdk14.jar /opt/apache/beam/jars/ ADD target/beam-sdks-java-harness.jar /opt/apache/beam/jars/ +ADD target/beam-sdks-java-io-kafka.jar /opt/apache/beam/jars/ Review comment: Please add a TODO to remove these when we support custom environments. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 232962) Time Spent: 12h 10m (was: 12h) > Support KafkaIO to be configured externally for use with other SDKs > --- > > Key: BEAM-7029 > URL: https://issues.apache.org/jira/browse/BEAM-7029 > Project: Beam > Issue Type: New Feature > Components: io-java-kafka, runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 12h 10m > Remaining Estimate: 0h > > As of BEAM-6730, we can externally configure existing transforms from SDKs. > We should add more useful transforms then just {{GenerateSequence}}. > {{KafkaIO}} is a good candidate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7029) Support KafkaIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=232964&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-232964 ] ASF GitHub Bot logged work on BEAM-7029: Author: ASF GitHub Bot Created on: 25/Apr/19 16:17 Start Date: 25/Apr/19 16:17 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #8322: [BEAM-7029] Add KafkaIO.Write as an external transform URL: https://github.com/apache/beam/pull/8322#discussion_r278624790 ## File path: sdks/java/container/Dockerfile-java11 ## @@ -22,6 +22,8 @@ MAINTAINER "Apache Beam " ADD target/slf4j-api.jar /opt/apache/beam/jars/ ADD target/slf4j-jdk14.jar /opt/apache/beam/jars/ ADD target/beam-sdks-java-harness.jar /opt/apache/beam/jars/ +ADD target/beam-sdks-java-io-kafka.jar /opt/apache/beam/jars/ Review comment: Ditto (here and below). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 232964) Time Spent: 12h 20m (was: 12h 10m) > Support KafkaIO to be configured externally for use with other SDKs > --- > > Key: BEAM-7029 > URL: https://issues.apache.org/jira/browse/BEAM-7029 > Project: Beam > Issue Type: New Feature > Components: io-java-kafka, runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 12h 20m > Remaining Estimate: 0h > > As of BEAM-6730, we can externally configure existing transforms from SDKs. > We should add more useful transforms then just {{GenerateSequence}}. > {{KafkaIO}} is a good candidate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7029) Support KafkaIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=232963&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-232963 ] ASF GitHub Bot logged work on BEAM-7029: Author: ASF GitHub Bot Created on: 25/Apr/19 16:17 Start Date: 25/Apr/19 16:17 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #8322: [BEAM-7029] Add KafkaIO.Write as an external transform URL: https://github.com/apache/beam/pull/8322#discussion_r278627923 ## File path: sdks/python/apache_beam/runners/portability/flink_runner_test.py ## @@ -189,12 +190,28 @@ def test_external_transforms(self): value_deserializer='org.apache.kafka.' 'common.serialization.' 'LongDeserializer', - expansion_service=expansion_address)) + expansion_service=get_expansion_service())) self.assertTrue('No resolvable bootstrap urls given in bootstrap.servers' in str(ctx.exception), 'Expected to fail due to invalid bootstrap.servers, but ' 'failed due to:\n%s' % str(ctx.exception)) + # We just test the expansion but do not execute. Review comment: FYI: @ihji is looking into adding a validates runner test suite for cross-language transforms that includes Kafka. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 232963) Time Spent: 12h 10m (was: 12h) > Support KafkaIO to be configured externally for use with other SDKs > --- > > Key: BEAM-7029 > URL: https://issues.apache.org/jira/browse/BEAM-7029 > Project: Beam > Issue Type: New Feature > Components: io-java-kafka, runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 12h 10m > Remaining Estimate: 0h > > As of BEAM-6730, we can externally configure existing transforms from SDKs. > We should add more useful transforms then just {{GenerateSequence}}. > {{KafkaIO}} is a good candidate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7029) Support KafkaIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=232944&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-232944 ] ASF GitHub Bot logged work on BEAM-7029: Author: ASF GitHub Bot Created on: 25/Apr/19 15:30 Start Date: 25/Apr/19 15:30 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on pull request #8322: [BEAM-7029] Add KafkaIO.Write as an external transform URL: https://github.com/apache/beam/pull/8322#discussion_r278579554 ## File path: sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java ## @@ -1325,12 +1325,89 @@ public void populateDisplayData(DisplayData.Builder builder) { abstract Builder toBuilder(); @AutoValue.Builder -abstract static class Builder { +abstract static class Builder +implements ExternalTransformBuilder>, PDone> { abstract Builder setTopic(String topic); abstract Builder setWriteRecordsTransform(WriteRecords transform); abstract Write build(); + + @Override + public PTransform>, PDone> buildExternal( + External.Configuration configuration) { +String topic = utf8String(configuration.topic); +setTopic(topic); + +Map producerConfig = new HashMap<>(); +for (KV kv : configuration.producerConfig) { + String key = utf8String(kv.getKey()); + String value = utf8String(kv.getValue()); + producerConfig.put(key, value); +} +Class keySerializer = resolveClass(utf8String(configuration.keySerializer)); +Class valSerializer = resolveClass(utf8String(configuration.valueSerializer)); + +WriteRecords writeRecords = +KafkaIO.writeRecords() +.updateProducerProperties(producerConfig) +.withKeySerializer(keySerializer) +.withValueSerializer(valSerializer) +.withTopic(topic); +setWriteRecordsTransform(writeRecords); + +return build(); + } + + private static Class resolveClass(String className) { +try { + return Class.forName(className); +} catch (ClassNotFoundException e) { + throw new RuntimeException("Could not find Serializer class: " + className); +} + } + + private static String utf8String(byte[] bytes) { Review comment: There is the same method for Read transform. Do we have similar function in any utils class already? If not, it'd make sense to put it into `org.apache.beam.sdk.util.StringUtils` to avoid code duplication. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 232944) Time Spent: 11h 50m (was: 11h 40m) > Support KafkaIO to be configured externally for use with other SDKs > --- > > Key: BEAM-7029 > URL: https://issues.apache.org/jira/browse/BEAM-7029 > Project: Beam > Issue Type: New Feature > Components: io-java-kafka, runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 11h 50m > Remaining Estimate: 0h > > As of BEAM-6730, we can externally configure existing transforms from SDKs. > We should add more useful transforms then just {{GenerateSequence}}. > {{KafkaIO}} is a good candidate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7029) Support KafkaIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=232914&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-232914 ] ASF GitHub Bot logged work on BEAM-7029: Author: ASF GitHub Bot Created on: 25/Apr/19 14:40 Start Date: 25/Apr/19 14:40 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on pull request #8322: [BEAM-7029] Add KafkaIO.Write as an external transform URL: https://github.com/apache/beam/pull/8322#discussion_r278579554 ## File path: sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java ## @@ -1325,12 +1325,89 @@ public void populateDisplayData(DisplayData.Builder builder) { abstract Builder toBuilder(); @AutoValue.Builder -abstract static class Builder { +abstract static class Builder +implements ExternalTransformBuilder>, PDone> { abstract Builder setTopic(String topic); abstract Builder setWriteRecordsTransform(WriteRecords transform); abstract Write build(); + + @Override + public PTransform>, PDone> buildExternal( + External.Configuration configuration) { +String topic = utf8String(configuration.topic); +setTopic(topic); + +Map producerConfig = new HashMap<>(); +for (KV kv : configuration.producerConfig) { + String key = utf8String(kv.getKey()); + String value = utf8String(kv.getValue()); + producerConfig.put(key, value); +} +Class keySerializer = resolveClass(utf8String(configuration.keySerializer)); +Class valSerializer = resolveClass(utf8String(configuration.valueSerializer)); + +WriteRecords writeRecords = +KafkaIO.writeRecords() +.updateProducerProperties(producerConfig) +.withKeySerializer(keySerializer) +.withValueSerializer(valSerializer) +.withTopic(topic); +setWriteRecordsTransform(writeRecords); + +return build(); + } + + private static Class resolveClass(String className) { +try { + return Class.forName(className); +} catch (ClassNotFoundException e) { + throw new RuntimeException("Could not find Serializer class: " + className); +} + } + + private static String utf8String(byte[] bytes) { Review comment: We have the same method for Read transform. Do we have similar function in any utils class already? If not, it'd make sense to put it into `org.apache.beam.sdk.util.StringUtils` to avoid code duplication. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 232914) Time Spent: 11.5h (was: 11h 20m) > Support KafkaIO to be configured externally for use with other SDKs > --- > > Key: BEAM-7029 > URL: https://issues.apache.org/jira/browse/BEAM-7029 > Project: Beam > Issue Type: New Feature > Components: io-java-kafka, runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 11.5h > Remaining Estimate: 0h > > As of BEAM-6730, we can externally configure existing transforms from SDKs. > We should add more useful transforms then just {{GenerateSequence}}. > {{KafkaIO}} is a good candidate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7029) Support KafkaIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=232915&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-232915 ] ASF GitHub Bot logged work on BEAM-7029: Author: ASF GitHub Bot Created on: 25/Apr/19 14:40 Start Date: 25/Apr/19 14:40 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on pull request #8322: [BEAM-7029] Add KafkaIO.Write as an external transform URL: https://github.com/apache/beam/pull/8322#discussion_r278581393 ## File path: sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java ## @@ -1325,12 +1325,89 @@ public void populateDisplayData(DisplayData.Builder builder) { abstract Builder toBuilder(); @AutoValue.Builder -abstract static class Builder { +abstract static class Builder +implements ExternalTransformBuilder>, PDone> { abstract Builder setTopic(String topic); abstract Builder setWriteRecordsTransform(WriteRecords transform); abstract Write build(); + + @Override + public PTransform>, PDone> buildExternal( + External.Configuration configuration) { +String topic = utf8String(configuration.topic); +setTopic(topic); + +Map producerConfig = new HashMap<>(); +for (KV kv : configuration.producerConfig) { + String key = utf8String(kv.getKey()); + String value = utf8String(kv.getValue()); + producerConfig.put(key, value); +} +Class keySerializer = resolveClass(utf8String(configuration.keySerializer)); +Class valSerializer = resolveClass(utf8String(configuration.valueSerializer)); + +WriteRecords writeRecords = +KafkaIO.writeRecords() +.updateProducerProperties(producerConfig) +.withKeySerializer(keySerializer) +.withValueSerializer(valSerializer) +.withTopic(topic); +setWriteRecordsTransform(writeRecords); + +return build(); + } + + private static Class resolveClass(String className) { Review comment: Does it makes sense to move into `org.apache.beam.sdk.util.SerializableUtils` ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 232915) Time Spent: 11h 40m (was: 11.5h) > Support KafkaIO to be configured externally for use with other SDKs > --- > > Key: BEAM-7029 > URL: https://issues.apache.org/jira/browse/BEAM-7029 > Project: Beam > Issue Type: New Feature > Components: io-java-kafka, runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 11h 40m > Remaining Estimate: 0h > > As of BEAM-6730, we can externally configure existing transforms from SDKs. > We should add more useful transforms then just {{GenerateSequence}}. > {{KafkaIO}} is a good candidate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7112) State cleanup interferes with user timer callback
[ https://issues.apache.org/jira/browse/BEAM-7112?focusedWorklogId=232916&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-232916 ] ASF GitHub Bot logged work on BEAM-7112: Author: ASF GitHub Bot Created on: 25/Apr/19 14:40 Start Date: 25/Apr/19 14:40 Worklog Time Spent: 10m Work Description: mxm commented on pull request #8399: [BEAM-7112] Timer race with state cleanup - take two URL: https://github.com/apache/beam/pull/8399#discussion_r278583803 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java ## @@ -401,7 +402,14 @@ private void setTimer(WindowedValue timerElement, TimerInternals.TimerDa @SuppressWarnings("ByteBufferBackingArray") private void fireCleanupTimers() { while (!cleanupTimers.isEmpty()) { - InternalTimer timer = cleanupTimers.remove(); + KV> kv = cleanupTimers.peek(); + if (kv.getKey() != null && kv.getKey() == sdkHarnessRunner.remoteBundle) { +// user timer and cleanup may trigger together in finish bundle, +// defer state cleanup to allow user timers to complete +return; Review comment: Let's move it to where it belongs, i.e. after the bundle has been finished. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 232916) Time Spent: 5h 40m (was: 5.5h) > State cleanup interferes with user timer callback > - > > Key: BEAM-7112 > URL: https://issues.apache.org/jira/browse/BEAM-7112 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.12.0 >Reporter: Thomas Weise >Assignee: Thomas Weise >Priority: Major > Labels: portability-flink > Fix For: 2.13.0 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > Cleanup timers and user timers are fired at the watermark. Processing of > timers in the SDK worker is asynchronous, so it is possible that the state is > already removed when the user timer callback executes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7029) Support KafkaIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=232911&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-232911 ] ASF GitHub Bot logged work on BEAM-7029: Author: ASF GitHub Bot Created on: 25/Apr/19 14:36 Start Date: 25/Apr/19 14:36 Worklog Time Spent: 10m Work Description: mxm commented on issue #8322: [BEAM-7029] Add KafkaIO.Write as an external transform URL: https://github.com/apache/beam/pull/8322#issuecomment-486699358 I already discovered this before opening this PR and created https://jira.apache.org/jira/browse/BEAM-7084. I think this can be solved independently of this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 232911) Time Spent: 11h 20m (was: 11h 10m) > Support KafkaIO to be configured externally for use with other SDKs > --- > > Key: BEAM-7029 > URL: https://issues.apache.org/jira/browse/BEAM-7029 > Project: Beam > Issue Type: New Feature > Components: io-java-kafka, runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 11h 20m > Remaining Estimate: 0h > > As of BEAM-6730, we can externally configure existing transforms from SDKs. > We should add more useful transforms then just {{GenerateSequence}}. > {{KafkaIO}} is a good candidate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7029) Support KafkaIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=232909&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-232909 ] ASF GitHub Bot logged work on BEAM-7029: Author: ASF GitHub Bot Created on: 25/Apr/19 14:31 Start Date: 25/Apr/19 14:31 Worklog Time Spent: 10m Work Description: tweise commented on issue #8322: [BEAM-7029] Add KafkaIO.Write as an external transform URL: https://github.com/apache/beam/pull/8322#issuecomment-486697457 We should allow the user to specify the environment for the external transform. Would like to be able to use EMBEDDED when using the Flink runner. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 232909) Time Spent: 11h 10m (was: 11h) > Support KafkaIO to be configured externally for use with other SDKs > --- > > Key: BEAM-7029 > URL: https://issues.apache.org/jira/browse/BEAM-7029 > Project: Beam > Issue Type: New Feature > Components: io-java-kafka, runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 11h 10m > Remaining Estimate: 0h > > As of BEAM-6730, we can externally configure existing transforms from SDKs. > We should add more useful transforms then just {{GenerateSequence}}. > {{KafkaIO}} is a good candidate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)