[jira] [Work logged] (BEAM-7970) Regenerate Go SDK proto files in correct version
[ https://issues.apache.org/jira/browse/BEAM-7970?focusedWorklogId=307635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307635 ]

ASF GitHub Bot logged work on BEAM-7970:
Author: ASF GitHub Bot
Created on: 06/Sep/19 05:38
Start Date: 06/Sep/19 05:38
Worklog Time Spent: 10m

Work Description: youngoli commented on pull request #9367: [BEAM-7970] Regenerate Go SDK protos in correct version.
URL: https://github.com/apache/beam/pull/9367

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 307635)
Time Spent: 0.5h (was: 20m)

> Regenerate Go SDK proto files in correct version
>
> Key: BEAM-7970
> URL: https://issues.apache.org/jira/browse/BEAM-7970
> Project: Beam
> Issue Type: Bug
> Components: sdk-go
> Reporter: Daniel Oliveira
> Assignee: Daniel Oliveira
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Generated proto files in the Go SDK currently include this bit:
> {{// This is a compile-time assertion to ensure that this generated file}}
> {{// is compatible with the proto package it is being compiled against.}}
> {{// A compilation error at this line likely means your copy of the}}
> {{// proto package needs to be updated.}}
> {{const _ = proto.ProtoPackageIsVersion2 // please upgrade the proto package}}
>
> This indicates that the protos are being generated as proto v2 for whatever
> reason. Most likely, as mentioned in this post by someone with a similar
> issue, it is because the proto generation binary needs to be rebuilt before
> generating the files again:
> [https://github.com/golang/protobuf/issues/449#issuecomment-340884839]
> This hasn't caused any errors so far, but might eventually cause errors if we
> hit version differences between the v2 and v3 protos.
-- This message was sent by Atlassian Jira (v8.3.2#803003)
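The compile-time assertion quoted in the issue encodes the generator's proto API version, so it can be detected mechanically. Below is a hedged Python sketch; the helper name is invented and this is not part of Beam's tooling, just an illustration of scanning generated Go source for the version marker:

```python
import re

def generated_proto_version(source):
    """Return the proto package version asserted in generated Go proto
    source (e.g. 2 for proto.ProtoPackageIsVersion2), or None if the
    assertion is absent. Hypothetical helper, not Beam's tooling."""
    m = re.search(r"proto\.ProtoPackageIsVersion(\d+)", source)
    return int(m.group(1)) if m else None

# The exact line quoted in BEAM-7970:
snippet = "const _ = proto.ProtoPackageIsVersion2 // please upgrade the proto package"
```

A check like this could flag files regenerated with a stale protoc-gen-go binary before they are committed.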
[jira] [Work logged] (BEAM-7970) Regenerate Go SDK proto files in correct version
[ https://issues.apache.org/jira/browse/BEAM-7970?focusedWorklogId=307634&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307634 ]

ASF GitHub Bot logged work on BEAM-7970:
Author: ASF GitHub Bot
Created on: 06/Sep/19 05:38
Start Date: 06/Sep/19 05:38
Worklog Time Spent: 10m

Work Description: youngoli commented on issue #9367: [BEAM-7970] Regenerate Go SDK protos in correct version.
URL: https://github.com/apache/beam/pull/9367#issuecomment-528716476

Closing, because it turns out that attempting to regenerate the protos with an updated version of the proto compiler also requires updating a number of the Go SDK's dependencies, which is a huge can of worms. And it's mostly unnecessary anyway: the real way to avoid issues is to use an older version of the proto compiler, so the change that really needs to be made is updating the documentation to mention which version to use.
Issue Time Tracking
---
Worklog Id: (was: 307634)
Time Spent: 20m (was: 10m)
[jira] [Commented] (BEAM-3165) Mongo document read with non hex objectid
[ https://issues.apache.org/jira/browse/BEAM-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923932#comment-16923932 ]

Maxim Ermilov commented on BEAM-3165:
-
Python's apache_beam.io.mongodbio has the same bug. From [https://docs.mongodb.com/manual/core/document/#the-id-field]: "The {{_id}} field may contain values of any [BSON data type|https://docs.mongodb.com/manual/reference/bson-types/], other than an array." So the code should not assume that _id is an [ObjectId|https://docs.mongodb.com/manual/reference/bson-types/#objectid].

> Mongo document read with non hex objectid
>
> Key: BEAM-3165
> URL: https://issues.apache.org/jira/browse/BEAM-3165
> Project: Beam
> Issue Type: Bug
> Components: io-java-mongodb
> Affects Versions: 2.1.0
> Reporter: Utkarsh Sopan
> Priority: Major
>
> I have a Mongo collection whose documents have a non-hex '_id' in the form of a string.
> I can't read them into a PCollection; reading fails with the following exception:
> Exception in thread "main" java.lang.IllegalArgumentException: invalid hexadecimal representation of an ObjectId: [somestring]
> at org.bson.types.ObjectId.parseHexString(ObjectId.java:523)
> at org.bson.types.ObjectId.<init>(ObjectId.java:237)
> at org.bson.json.JsonReader.visitObjectIdConstructor(JsonReader.java:674)
> at org.bson.json.JsonReader.readBsonType(JsonReader.java:197)
> at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:139)
> at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:45)
> at org.bson.codecs.configuration.LazyCodec.decode(LazyCodec.java:47)
> at org.bson.codecs.DocumentCodec.readValue(DocumentCodec.java:215)
> at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:141)
> at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:45)
> at org.bson.codecs.DocumentCodec.readValue(DocumentCodec.java:215)
> at org.bson.codecs.DocumentCodec.readList(DocumentCodec.java:222)
> at org.bson.codecs.DocumentCodec.readValue(DocumentCodec.java:208)
> at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:141)
> at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:45)
> at org.bson.Document.parse(Document.java:105)
> at org.bson.Document.parse(Document.java:90)
> at org.apache.beam.sdk.io.mongodb.MongoDbIO$BoundedMongoDbReader.start(MongoDbIO.java:472)
> at org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:141)
> at org.apache.beam.runners.direct.TransformExecutor.processElements(TransformExecutor.java:146)
> at org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:110)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
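Since _id may hold any BSON type other than an array, reading code has to branch on the value's shape rather than unconditionally parsing it as ObjectId hex. A minimal Python sketch of that defensive check; the helper is hypothetical and MongoDbIO's real splitting logic differs:

```python
def id_filter(doc_id):
    """Build an _id filter without assuming the value is an ObjectId.

    Only values that actually look like a 24-character hex ObjectId are
    treated as one; anything else (non-hex string, int, ...) is used
    as-is. Hypothetical helper for illustration only."""
    if isinstance(doc_id, str) and len(doc_id) == 24:
        try:
            int(doc_id, 16)  # valid 24-char hex: ObjectId-style range filter
            return {"_id": {"$gte": doc_id}}
        except ValueError:
            pass  # not hex; fall through to the generic case
    # Non-ObjectId _id values (e.g. "somestring", 42) match by equality.
    return {"_id": doc_id}
```

Parsing unconditionally, as the stack trace above shows, throws IllegalArgumentException for values like "somestring".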
[jira] [Comment Edited] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923930#comment-16923930 ]

Rui Wang edited comment on BEAM-8036 at 9/6/19 5:11 AM:
-
I believe the root cause is [#9158|https://github.com/apache/beam/pull/9158], as [#9210|https://github.com/apache/beam/pull/9210] is built directly on top of it.

was (Author: amaliujia): I believe the root cause is from #9158 as #9210 is directly built from it.

> [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
>
> Key: BEAM-8036
> URL: https://issues.apache.org/jira/browse/BEAM-8036
> Project: Beam
> Issue Type: Bug
> Components: test-failures
> Reporter: Mikhail Gryzykhin
> Assignee: Rui Wang
> Priority: Major
> Labels: currently-failing
> Fix For: Not applicable
> Time Spent: 2h
> Remaining Estimate: 0h
>
> _Use this form to file an issue for test failure:_
> * [Jenkins Job|https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]
> * [Gradle Build Scan|TODO]
> * [Test source code|TODO]
> Initial investigation:
> *09:03:27* > Task :sdks:java:extensions:sql:datacatalog:integrationTest
> *09:03:27* org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > testReadWrite FAILED
> *09:03:27* java.lang.NoSuchMethodError at DataCatalogBigQueryIT.java:69
> *09:03:27* 1 test completed, 1 failed
> *09:03:28* > Task :sdks:java:extensions:sql:datacatalog:integrationTest FAILED
> *09:03:28* FAILURE: Build failed with an exception.
>
> _After you've filled out the above details, please [assign the issue to an individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. Assignee should [treat test failures as high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], helping to fix the issue or find a more appropriate owner. See [Apache Beam Post-Commit Policies|https://beam.apache.org/contribute/postcommits-policies]._
[jira] [Commented] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923930#comment-16923930 ]

Rui Wang commented on BEAM-8036:
-
I believe the root cause is from #9158 as #9210 is directly built from it.
[jira] [Work logged] (BEAM-7947) Improves the interfaces of classes such as FnDataService, BundleProcessor, ActiveBundle, etc to change the parameter type from WindowedValue<T> to T
[ https://issues.apache.org/jira/browse/BEAM-7947?focusedWorklogId=307626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307626 ]

ASF GitHub Bot logged work on BEAM-7947:
Author: ASF GitHub Bot
Created on: 06/Sep/19 05:07
Start Date: 06/Sep/19 05:07
Worklog Time Spent: 10m

Work Description: sunjincheng121 commented on issue #9496: [BEAM-7947] Improves the interfaces of classes such as FnDataService,…
URL: https://github.com/apache/beam/pull/9496#issuecomment-528710018

R: @kennknowles @robertwb @lukecwik

Issue Time Tracking
---
Worklog Id: (was: 307626)
Time Spent: 20m (was: 10m)

> Improves the interfaces of classes such as FnDataService, BundleProcessor,
> ActiveBundle, etc to change the parameter type from WindowedValue<T> to T
>
> Key: BEAM-7947
> URL: https://issues.apache.org/jira/browse/BEAM-7947
> Project: Beam
> Issue Type: Sub-task
> Components: java-fn-execution
> Reporter: sunjincheng
> Assignee: sunjincheng
> Priority: Major
> Fix For: 2.16.0
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Both `Coder<WindowedValue<T>>` and `FnDataReceiver<WindowedValue<T>>` use
> `WindowedValue` as the data structure that both the Runner and the SDK
> Harness agree on. The Control Plane/Data Plane/State Plane/Logging services
> are high-level abstractions, and services such as the Control Plane and
> Logging are common requirements for all multi-language platforms. For
> example, the Flink community is also discussing how to support Python UDFs
> and faces the same questions: how to deal with the docker environment, how
> to transfer data, how to access state, how to handle logging, etc.
> If Beam can further abstract these service interfaces, i.e. make the
> interface definitions compatible with multiple engines, and finally provide
> them to other projects in the form of class libraries, it will definitely
> help other platforms that want to support multiple languages. To start the
> discussion, take the FnDataService#receive interface as an example and turn
> `WindowedValue<T>` into `T`, so that other platforms can extend it
> arbitrarily, as follows:
> {code}
> InboundDataClient receive(LogicalEndpoint inputLocation, Coder<T> coder,
> FnDataReceiver<T> listener);
> {code}
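The proposed generic receive() can be mirrored outside Java. Below is a minimal Python sketch; the class and method names only imitate the Java snippet above and are not Beam's actual API. It shows how parameterizing on T, instead of hard-wiring WindowedValue<T>, lets any engine-specific element type flow through the interface:

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")

class Coder(Generic[T]):
    """Stand-in for Beam's Coder, parameterized on the element type T."""

class FnDataService(Generic[T]):
    """Hypothetical mirror of the proposed signature: receive() takes a
    Coder[T] and a listener of T, not WindowedValue-wrapped elements."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[T], None]] = []

    def receive(self, input_location: str, coder: Coder[T],
                listener: Callable[[T], None]) -> None:
        # Register the listener; elements of type T are delivered to it.
        self._listeners.append(listener)

    def deliver(self, element: T) -> None:
        for listener in self._listeners:
            listener(element)

# Usage: a plain int flows through with no WindowedValue wrapper.
svc: FnDataService[int] = FnDataService()
seen: List[int] = []
svc.receive("endpoint-1", Coder(), seen.append)
svc.deliver(42)
```

The design point is that the wrapper type becomes the caller's choice: Beam would instantiate T as WindowedValue, while another engine could use its own record type.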
[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads
[ https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=307614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307614 ]

ASF GitHub Bot logged work on BEAM-8151:
Author: ASF GitHub Bot
Created on: 06/Sep/19 04:24
Start Date: 06/Sep/19 04:24
Worklog Time Spent: 10m

Work Description: lukecwik commented on issue #9477: [BEAM-8151, BEAM-7848] Up the max number of threads inside the SDK harness to a default of 10k
URL: https://github.com/apache/beam/pull/9477#issuecomment-528702292

Run Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 307614)
Time Spent: 4h 40m (was: 4.5h)

> Allow the Python SDK to use many many threads
>
> Key: BEAM-8151
> URL: https://issues.apache.org/jira/browse/BEAM-8151
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core, sdk-py-harness
> Reporter: Luke Cwik
> Assignee: Luke Cwik
> Priority: Major
> Time Spent: 4h 40m
> Remaining Estimate: 0h
>
> We need to use a thread pool which shrinks the number of active threads when
> they are not being used.
>
> This is to prevent any stuckness issues related to a runner scheduling more
> work items than there are "work" threads inside the SDK harness.
>
> By default the control plane should have all "requests" processed in
> parallel, and the runner is responsible for not overloading the SDK with too
> much work.
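The shrinking behavior the issue asks for can be sketched in a few lines. This is an illustrative Python toy under stated assumptions, not the SDK harness's actual pool (the class name is invented): workers exit after an idle timeout, so a very large cap such as 10k threads costs nothing while the pool is quiet:

```python
import queue
import threading
import time

class ShrinkingThreadPool:
    """Toy pool for BEAM-8151's idea: cap the worker count very high so
    the runner can't schedule more work items than there are threads,
    and let idle workers exit so the high cap is cheap."""

    def __init__(self, max_workers=10000, idle_timeout=0.1):
        self._tasks = queue.Queue()
        self._max_workers = max_workers
        self._idle_timeout = idle_timeout
        self._lock = threading.Lock()
        self.active_threads = 0

    def submit(self, fn, *args):
        self._tasks.put((fn, args))
        with self._lock:
            # Spawn a worker for the new task unless we hit the cap.
            if self.active_threads < self._max_workers:
                self.active_threads += 1
                threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            try:
                fn, args = self._tasks.get(timeout=self._idle_timeout)
            except queue.Empty:
                # Idle too long: shrink the pool by exiting this thread.
                with self._lock:
                    self.active_threads -= 1
                return
            fn(*args)

# Usage: five quick tasks run; once idle, every worker exits.
pool = ShrinkingThreadPool()
results = []
results_lock = threading.Lock()

def record(i):
    with results_lock:
        results.append(i)

for i in range(5):
    pool.submit(record, i)
time.sleep(1.0)
```

The real harness has more to worry about (fairness, shutdown, exception handling); the point here is only the idle-timeout shrink.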
[jira] [Work logged] (BEAM-7947) Improves the interfaces of classes such as FnDataService, BundleProcessor, ActiveBundle, etc to change the parameter type from WindowedValue<T> to T
[ https://issues.apache.org/jira/browse/BEAM-7947?focusedWorklogId=307611&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307611 ]

ASF GitHub Bot logged work on BEAM-7947:
Author: ASF GitHub Bot
Created on: 06/Sep/19 04:11
Start Date: 06/Sep/19 04:11
Worklog Time Spent: 10m

Work Description: sunjincheng121 commented on pull request #9496: [BEAM-7947] Improves the interfaces of classes such as FnDataService,…
URL: https://github.com/apache/beam/pull/9496

Currently, both `Coder<WindowedValue<T>>` and `FnDataReceiver<WindowedValue<T>>` use `WindowedValue` as the data structure that both the Runner and the SDK Harness agree on. The Control Plane/Data Plane/State Plane/Logging services are high-level abstractions, and services such as the Control Plane and Logging are common requirements for all multi-language platforms. For example, the Flink community is also discussing how to support Python UDFs and faces the same questions: how to deal with the docker environment, how to transfer data, how to access state, how to handle logging, etc. If Beam can further abstract these service interfaces, i.e. make the interface definitions compatible with multiple engines, and finally provide them to other projects in the form of class libraries, it will definitely help other platforms that want to support multiple languages.

Take the FnDataService#receive interface as an example and turn `WindowedValue<T>` into `T`, so that other platforms can extend it arbitrarily, as follows:

```
InboundDataClient receive(LogicalEndpoint inputLocation, Coder<T> coder, FnDataReceiver<T> listener);
```

The details of the discussion can be found here: https://lists.apache.org/list.html?d...@beam.apache.org:lte=1M:%5BDISCUSS%5D%20Turn%20%60WindowedValue
[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness
[ https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=307599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307599 ]

ASF GitHub Bot logged work on BEAM-7945:
Author: ASF GitHub Bot
Created on: 06/Sep/19 03:21
Start Date: 06/Sep/19 03:21
Worklog Time Spent: 10m

Work Description: sunjincheng121 commented on pull request #9452: [BEAM-7945] Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r321562506

## File path: model/fn-execution/src/main/proto/beam_fn_api.proto
## @@ -815,6 +815,7 @@ message StartWorkerRequest {
org.apache.beam.model.pipeline.v1.ApiServiceDescriptor logging_endpoint = 3;
org.apache.beam.model.pipeline.v1.ApiServiceDescriptor artifact_endpoint = 4;
org.apache.beam.model.pipeline.v1.ApiServiceDescriptor provision_endpoint = 5;
+ string semi_persist_dir = 6;

Review comment: Oh, yes, I see; this change is unnecessary. `pipeline_options` is already defined in `ProvisionInfo`.

Issue Time Tracking
---
Worklog Id: (was: 307599)
Time Spent: 1h 20m (was: 1h 10m)

> Allow runner to configure "semi_persist_dir" which is used in the SDK harness
>
> Key: BEAM-7945
> URL: https://issues.apache.org/jira/browse/BEAM-7945
> Project: Beam
> Issue Type: Sub-task
> Components: java-fn-execution, sdk-go, sdk-java-core, sdk-py-core
> Reporter: sunjincheng
> Assignee: sunjincheng
> Priority: Major
> Fix For: 2.16.0
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Currently "semi_persist_dir" is not configurable. This may become a problem
> in certain scenarios.
> For example, the default value of "semi_persist_dir" is "/tmp"
> ([https://github.com/apache/beam/blob/master/sdks/python/container/boot.go#L48])
> in the Python SDK harness. When the environment type is "PROCESS", the disk
> backing "/tmp" may fill up and cause unexpected issues in a production
> environment. We should provide a way to configure "semi_persist_dir" in the
> EnvironmentFactory on the runner side.
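As the review comment above suggests, a setting like this could ride along in the existing pipeline options rather than in a new proto field. A hypothetical Python sketch: the flag name mirrors the issue, but the parsing below is not Beam's actual option handling:

```python
import argparse

# Hypothetical sketch: the harness-side directory arrives as an ordinary
# pipeline option (already carried by ProvisionInfo.pipeline_options in
# the Fn API), instead of a dedicated StartWorkerRequest field.
parser = argparse.ArgumentParser()
parser.add_argument("--semi_persist_dir", default="/tmp",
                    help="Local directory for semi-persistent harness state.")

# Simulate a runner overriding the default for a PROCESS environment
# whose /tmp is too small.
opts, _ = parser.parse_known_args(["--semi_persist_dir", "/var/beam"])
```

Unknown flags pass through parse_known_args untouched, which is why options can be forwarded this way without each layer knowing every flag.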
[jira] [Updated] (BEAM-8088) PCollection boundedness should be tracked and propagated in python sdk
[ https://issues.apache.org/jira/browse/BEAM-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chad Dombrova updated BEAM-8088:
Summary: PCollection boundedness should be tracked and propagated in python sdk (was: [python] PCollection boundedness should be tracked and propagated)

> PCollection boundedness should be tracked and propagated in python sdk
>
> Key: BEAM-8088
> URL: https://issues.apache.org/jira/browse/BEAM-8088
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Chad Dombrova
> Assignee: Chad Dombrova
> Priority: Major
> Fix For: 2.16.0
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> As far as I can tell, Python does not care about the boundedness of
> PCollections even in streaming mode, but external transforms _do_. In my
> ongoing effort to get PubsubIO external transforms working, I discovered
> that I could not generate an unbounded write.
> My pipeline looks like this:
> {code:python}
> (
>     pipe
>     | 'PubSubInflow' >> external.pubsub.ReadFromPubSub(subscription=subscription, with_attributes=True)
>     | 'PubSubOutflow' >> external.pubsub.WriteToPubSub(topic=OUTPUT_TOPIC, with_attributes=True)
> )
> {code}
> The PCollections returned from the external Read are Unbounded, as expected,
> but Python is responsible for creating the intermediate PCollection, which is
> always Bounded, and thus the external Write is always Bounded.
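The propagation being requested can be illustrated with a toy model. The classes below are hypothetical stand-ins, not apache_beam's; the point is simply that an element-wise transform should inherit its input's boundedness rather than always producing a Bounded output:

```python
from enum import Enum

class IsBounded(Enum):
    BOUNDED = 1
    UNBOUNDED = 2

class PColl:
    """Toy PCollection that carries a boundedness flag (illustration
    only; not the real apache_beam.pvalue.PCollection)."""
    def __init__(self, is_bounded):
        self.is_bounded = is_bounded

def apply_elementwise(pcoll):
    """An element-wise transform: its output is bounded exactly when its
    input is, which is the propagation BEAM-8088 asks for."""
    return PColl(pcoll.is_bounded)

# Usage: an unbounded source (e.g. an external ReadFromPubSub) stays
# unbounded through the intermediate PCollection.
source = PColl(IsBounded.UNBOUNDED)
out = apply_elementwise(source)
```

With tracking like this, the intermediate PCollection feeding the external Write would be Unbounded whenever the Read is, instead of unconditionally Bounded.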
[jira] [Updated] (BEAM-8088) [python] PCollection boundedness should be tracked and propagated
[ https://issues.apache.org/jira/browse/BEAM-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chad Dombrova updated BEAM-8088:
Summary: [python] PCollection boundedness should be tracked and propagated (was: PCollection boundedness should be tracked and propagated)
[jira] [Work logged] (BEAM-8161) Upgrade to joda time 2.10.3 to get updated TZDB
[ https://issues.apache.org/jira/browse/BEAM-8161?focusedWorklogId=307598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307598 ]

ASF GitHub Bot logged work on BEAM-8161:
Author: ASF GitHub Bot
Created on: 06/Sep/19 03:18
Start Date: 06/Sep/19 03:18
Worklog Time Spent: 10m

Work Description: lukecwik commented on pull request #9495: [BEAM-8161] Update joda time to 2.10.3 to get TZDB 2019b
URL: https://github.com/apache/beam/pull/9495

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
- [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
[jira] [Commented] (BEAM-5530) Migrate to java.time lib instead of joda-time
[ https://issues.apache.org/jira/browse/BEAM-5530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923892#comment-16923892 ]

Luke Cwik commented on BEAM-5530:
-
One advantage of Joda-Time is that its TZDB is typically updated more often, and you don't have to wait for a JVM release or muck with the JVM installation to get an updated TZDB. Once custom containers are used for the Java SDK everywhere, this point will be moot, because then anybody will be able to roll their own container containing an updated TZDB.

> Migrate to java.time lib instead of joda-time
>
> Key: BEAM-5530
> URL: https://issues.apache.org/jira/browse/BEAM-5530
> Project: Beam
> Issue Type: Improvement
> Components: dependencies
> Reporter: Alexey Romanenko
> Priority: Major
> Fix For: 3.0.0
>
> Joda-Time had been used until the move to Java 8. For now, these two time
> libraries are used together. It would make sense to finally move everywhere
> to only one library, *java.time*, the standard Java time library (see the
> mailing list discussion:
> [https://lists.apache.org/thread.html/b10f6f9daed44f5fa65e315a44b68b2f57c3e80225f5d549b84918af@%3Cdev.beam.apache.org%3E]).
> Since this migration will introduce breaking API changes, we should target it
> for the 3.0 release.
[jira] [Created] (BEAM-8161) Upgrade to joda time 2.10.3 to get updated TZDB
Luke Cwik created BEAM-8161: --- Summary: Upgrade to joda time 2.10.3 to get updated TZDB Key: BEAM-8161 URL: https://issues.apache.org/jira/browse/BEAM-8161 Project: Beam Issue Type: Improvement Components: sdk-java-core Reporter: Luke Cwik Assignee: Luke Cwik -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8161) Upgrade to joda time 2.10.3 to get updated TZDB
[ https://issues.apache.org/jira/browse/BEAM-8161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Cwik updated BEAM-8161: Status: Open (was: Triage Needed) > Upgrade to joda time 2.10.3 to get updated TZDB > --- > > Key: BEAM-8161 > URL: https://issues.apache.org/jira/browse/BEAM-8161 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads
[ https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=307590&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307590 ] ASF GitHub Bot logged work on BEAM-8151: Author: ASF GitHub Bot Created on: 06/Sep/19 02:56 Start Date: 06/Sep/19 02:56 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #9477: [BEAM-8151, BEAM-7848] Up the max number of threads inside the SDK harness to a default of 10k URL: https://github.com/apache/beam/pull/9477#issuecomment-528687056 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307590) Time Spent: 4.5h (was: 4h 20m) > Allow the Python SDK to use many many threads > - > > Key: BEAM-8151 > URL: https://issues.apache.org/jira/browse/BEAM-8151 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > We need to use a thread pool which shrinks the number of active threads when > they are not being used. > > This is to prevent any stuckness issues related to a runner scheduling more > work items than there are "work" threads inside the SDK harness. > > By default the control plane should have all "requests" being processed in > parallel, and the runner is responsible for not overloading the SDK with too > much work. -- This message was sent by Atlassian Jira (v8.3.2#803003)
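The issue description asks for a thread pool that shrinks its active thread count when workers sit idle. A minimal sketch of that behavior, in Python: workers exit after an idle timeout, and new ones are spawned lazily up to a cap. This is a hypothetical illustration only, not the Beam SDK harness's actual implementation (which PR #9477 changes to a 10k-thread default).

```python
import queue
import threading


class ShrinkingThreadPool(object):
    """Thread pool whose idle workers exit after a timeout (sketch only)."""

    def __init__(self, max_workers=10000, idle_timeout=5.0):
        self._max_workers = max_workers
        self._idle_timeout = idle_timeout
        self._tasks = queue.Queue()
        self._lock = threading.Lock()
        self.num_workers = 0

    def submit(self, fn, *args):
        with self._lock:
            self._tasks.put((fn, args))
            # Spawn another worker unless we are at the cap; surplus
            # workers simply idle out and exit on their own.
            if self.num_workers < self._max_workers:
                self.num_workers += 1
                threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            try:
                fn, args = self._tasks.get(timeout=self._idle_timeout)
            except queue.Empty:
                with self._lock:
                    if not self._tasks.empty():
                        continue  # a task raced in; keep this worker alive
                    self.num_workers -= 1  # idle too long: shrink the pool
                    return
            try:
                fn(*args)
            finally:
                self._tasks.task_done()
```

The empty-queue re-check under the lock prevents a task enqueued just after a worker's timeout from being stranded with no worker left to serve it.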
[jira] [Work logged] (BEAM-7984) [python] The coder returned for typehints.List should be IterableCoder
[ https://issues.apache.org/jira/browse/BEAM-7984?focusedWorklogId=307570=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307570 ] ASF GitHub Bot logged work on BEAM-7984: Author: ASF GitHub Bot Created on: 06/Sep/19 01:49 Start Date: 06/Sep/19 01:49 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9344: [BEAM-7984] The coder returned for typehints.List should be IterableCoder URL: https://github.com/apache/beam/pull/9344#issuecomment-528673785 Long queue... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307570) Time Spent: 4h (was: 3h 50m) > [python] The coder returned for typehints.List should be IterableCoder > -- > > Key: BEAM-7984 > URL: https://issues.apache.org/jira/browse/BEAM-7984 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > IterableCoder encodes a list and decodes to list, but > typecoders.registry.get_coder(typehints.List[bytes]) returns a > FastPrimitiveCoder. I don't see any reason why this would be advantageous. -- This message was sent by Atlassian Jira (v8.3.2#803003)
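The property the ticket asks for — a coder for typehints.List that round-trips back to a list — can be sketched with a toy length-prefixed coder. This is a hypothetical simplification; Beam's real IterableCoder uses a different wire format.

```python
import struct


class ToyIterableCoder(object):
    """Length-prefixed coder for lists of bytes (toy sketch)."""

    def encode(self, values):
        # Element count, then each element as (length, payload).
        parts = [struct.pack('>i', len(values))]
        for value in values:
            parts.append(struct.pack('>i', len(value)))
            parts.append(value)
        return b''.join(parts)

    def decode(self, data):
        (count,) = struct.unpack_from('>i', data, 0)
        offset = 4
        values = []
        for _ in range(count):
            (size,) = struct.unpack_from('>i', data, offset)
            offset += 4
            values.append(data[offset:offset + size])
            offset += size
        return values  # always a list, matching the List[bytes] hint
```

A primitives-style fallback coder gives no such guarantee about the decoded container type, which is the asymmetry the ticket complains about.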
[jira] [Work logged] (BEAM-7984) [python] The coder returned for typehints.List should be IterableCoder
[ https://issues.apache.org/jira/browse/BEAM-7984?focusedWorklogId=307572=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307572 ] ASF GitHub Bot logged work on BEAM-7984: Author: ASF GitHub Bot Created on: 06/Sep/19 01:49 Start Date: 06/Sep/19 01:49 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9344: [BEAM-7984] The coder returned for typehints.List should be IterableCoder URL: https://github.com/apache/beam/pull/9344 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307572) Time Spent: 4h 10m (was: 4h) > [python] The coder returned for typehints.List should be IterableCoder > -- > > Key: BEAM-7984 > URL: https://issues.apache.org/jira/browse/BEAM-7984 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > IterableCoder encodes a list and decodes to list, but > typecoders.registry.get_coder(typehints.List[bytes]) returns a > FastPrimitiveCoder. I don't see any reason why this would be advantageous. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7984) [python] The coder returned for typehints.List should be IterableCoder
[ https://issues.apache.org/jira/browse/BEAM-7984?focusedWorklogId=307565=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307565 ] ASF GitHub Bot logged work on BEAM-7984: Author: ASF GitHub Bot Created on: 06/Sep/19 01:29 Start Date: 06/Sep/19 01:29 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9344: [BEAM-7984] The coder returned for typehints.List should be IterableCoder URL: https://github.com/apache/beam/pull/9344#issuecomment-528669756 yay, it works! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307565) Time Spent: 3h 50m (was: 3h 40m) > [python] The coder returned for typehints.List should be IterableCoder > -- > > Key: BEAM-7984 > URL: https://issues.apache.org/jira/browse/BEAM-7984 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > IterableCoder encodes a list and decodes to list, but > typecoders.registry.get_coder(typehints.List[bytes]) returns a > FastPrimitiveCoder. I don't see any reason why this would be advantageous. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=307561=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307561 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 06/Sep/19 01:01 Start Date: 06/Sep/19 01:01 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9257: [BEAM-7389] Add DoFn methods sample URL: https://github.com/apache/beam/pull/9257 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307561) Time Spent: 55h 50m (was: 55h 40m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 55h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function
[ https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=307552&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307552 ] ASF GitHub Bot logged work on BEAM-8146: Author: ASF GitHub Bot Created on: 06/Sep/19 00:42 Start Date: 06/Sep/19 00:42 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #9493: [BEAM-8146] Add equals and hashCode to SchemaCoder and RowCoder URL: https://github.com/apache/beam/pull/9493 Adds a `typeDescriptor` parameter to SchemaCoder. Two SchemaCoders are considered equal if their type descriptors are equal and their schemas are equal (i.e. the schemas have the same UUID). This change required plumbing a TypeDescriptor through everywhere a SchemaCoder is created. Fortunately, most of these occurrences are already using a TypeDescriptor to look up a Schema from a SchemaRegistry. There are a few exceptions to this rule, where I have to construct a `TypeDescriptor.of(clazz)`; we may be able to eliminate this with a bit of refactoring. This PR also adds a SchemaCoderTest file which just contains tests of `SchemaCoder::equals`. 
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | --- | [![Build
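The equality contract the PR describes — two coders are equal when their type descriptors match and their schemas share a UUID — can be illustrated with a small value-equality sketch. This is a hypothetical Python analogue for illustration only; the actual change is to Beam's Java SDK.

```python
class SchemaCoderSketch(object):
    """Value equality keyed on (type descriptor, schema UUID) -- sketch only."""

    def __init__(self, type_descriptor, schema_uuid):
        self.type_descriptor = type_descriptor
        self.schema_uuid = schema_uuid

    def __eq__(self, other):
        return (isinstance(other, SchemaCoderSketch)
                and self.type_descriptor == other.type_descriptor
                and self.schema_uuid == other.schema_uuid)

    def __hash__(self):
        # Kept consistent with __eq__, as the Java equals/hashCode
        # contract likewise requires.
        return hash((self.type_descriptor, self.schema_uuid))
```

Without such value equality, two coders built from the same schema compare by identity and spuriously differ, which is the problem the ticket title names.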
[jira] [Work logged] (BEAM-5428) Implement cross-bundle state caching.
[ https://issues.apache.org/jira/browse/BEAM-5428?focusedWorklogId=307545=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307545 ] ASF GitHub Bot logged work on BEAM-5428: Author: ASF GitHub Bot Created on: 06/Sep/19 00:04 Start Date: 06/Sep/19 00:04 Worklog Time Spent: 10m Work Description: tweise commented on pull request #9418: [BEAM-5428] Implement cross-bundle user state caching in the Python SDK URL: https://github.com/apache/beam/pull/9418#discussion_r321527154 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker_main.py ## @@ -149,6 +149,7 @@ def main(unused_argv): control_address=service_descriptor.url, worker_count=_get_worker_count(sdk_pipeline_options), worker_id=_worker_id, +state_cache_size=_get_state_cache_size(sdk_pipeline_options), Review comment: It might be better to pass the pipeline options to SDKHarness and have it extract what it needs. There is currently quite some (parameter related) noise wherever SdkHarness is constructed in general. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307545) Time Spent: 17h 50m (was: 17h 40m) > Implement cross-bundle state caching. 
> - > > Key: BEAM-5428 > URL: https://issues.apache.org/jira/browse/BEAM-5428 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Robert Bradshaw >Assignee: Rakesh Kumar >Priority: Major > Time Spent: 17h 50m > Remaining Estimate: 0h > > Tech spec: > [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m] > Relevant document: > [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit] > Mailing list link: > [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.2#803003)
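A size-bounded LRU cache is the usual shape for the `state_cache_size` option discussed in the review comments above. A minimal sketch follows; it is hypothetical — the real SDK harness cache must also handle cache tokens and invalidation far more carefully.

```python
import collections
import threading


class StateCache(object):
    """Size-bounded LRU cache for user state across bundles (sketch only)."""

    def __init__(self, max_entries):
        self._max_entries = max_entries
        self._cache = collections.OrderedDict()
        self._lock = threading.Lock()

    def get(self, state_key, cache_token):
        with self._lock:
            value = self._cache.get((state_key, cache_token))
            if value is not None:
                # Mark as most recently used.
                self._cache.move_to_end((state_key, cache_token))
            return value

    def put(self, state_key, cache_token, value):
        with self._lock:
            self._cache[(state_key, cache_token)] = value
            self._cache.move_to_end((state_key, cache_token))
            while len(self._cache) > self._max_entries:
                self._cache.popitem(last=False)  # evict least recently used
```

Keying on a (state key, cache token) pair mirrors the idea that a runner-issued token scopes how long a cached value may be reused across bundles.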
[jira] [Work logged] (BEAM-7600) Spark portable runner: reuse SDK harness
[ https://issues.apache.org/jira/browse/BEAM-7600?focusedWorklogId=307543=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307543 ] ASF GitHub Bot logged work on BEAM-7600: Author: ASF GitHub Bot Created on: 05/Sep/19 23:57 Start Date: 05/Sep/19 23:57 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9095: [BEAM-7600] borrow SDK harness management code into Spark runner URL: https://github.com/apache/beam/pull/9095#discussion_r321531229 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultExecutableStageContext.java ## @@ -102,7 +100,8 @@ private JobFactoryState(int maxFactories) { new ConcurrentHashMap<>(); @Override -public FlinkExecutableStageContext get(JobInfo jobInfo) { +public ExecutableStageContext get( +JobInfo jobInfo, SerializableFunction isReleaseSynchronous) { Review comment: As discussed offline, I made separate contexts for Flink and Spark. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307543) Time Spent: 6h 50m (was: 6h 40m) > Spark portable runner: reuse SDK harness > > > Key: BEAM-7600 > URL: https://issues.apache.org/jira/browse/BEAM-7600 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 6h 50m > Remaining Estimate: 0h > > Right now, we're creating a new SDK harness every time an executable stage is > run [1], which is expensive. We should be able to re-use code from the Flink > runner to re-use the SDK harness [2]. 
> > [1] > [https://github.com/apache/beam/blob/c9fb261bc7666788402840bb6ce1b0ce2fd445d1/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java#L135] > [2] > [https://github.com/apache/beam/blob/c9fb261bc7666788402840bb6ce1b0ce2fd445d1/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkDefaultExecutableStageContext.java] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8113) FlinkRunner: Stage files from context classloader
[ https://issues.apache.org/jira/browse/BEAM-8113?focusedWorklogId=307542=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307542 ] ASF GitHub Bot logged work on BEAM-8113: Author: ASF GitHub Bot Created on: 05/Sep/19 23:56 Start Date: 05/Sep/19 23:56 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #9451: [BEAM-8113] Stage files from context classloader URL: https://github.com/apache/beam/pull/9451#discussion_r321526565 ## File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PipelineResourcesTest.java ## @@ -44,24 +49,98 @@ @Rule public transient TemporaryFolder tmpFolder = new TemporaryFolder(); @Rule public transient ExpectedException thrown = ExpectedException.none(); + ClassLoader currentContext = null; Review comment: nit: Consider making this a test rule like [RestoreSystemProperties](https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/RestoreSystemProperties.java) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307542) Time Spent: 7h 10m (was: 7h) > FlinkRunner: Stage files from context classloader > - > > Key: BEAM-8113 > URL: https://issues.apache.org/jira/browse/BEAM-8113 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Major > Time Spent: 7h 10m > Remaining Estimate: 0h > > Currently, only files from {{FlinkRunner.class.getClassLoader()}} are staged > by default. Add also files from > {{Thread.currentThread().getContextClassLoader()}}. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8113) FlinkRunner: Stage files from context classloader
[ https://issues.apache.org/jira/browse/BEAM-8113?focusedWorklogId=307541=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307541 ] ASF GitHub Bot logged work on BEAM-8113: Author: ASF GitHub Bot Created on: 05/Sep/19 23:56 Start Date: 05/Sep/19 23:56 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #9451: [BEAM-8113] Stage files from context classloader URL: https://github.com/apache/beam/pull/9451#discussion_r321530670 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineResources.java ## @@ -40,34 +44,42 @@ public class PipelineResources { /** - * Attempts to detect all the resources the class loader has access to. This does not recurse to - * class loader parents stopping it from pulling in resources from the system class loader. + * Detects all URLs that are present in all class loaders in between context class loader of + * calling thread and class loader of class passed as parameter. It doesn't follow parents above + * this class loader stopping it from pulling in resources from the system class loader. * - * @param classLoader The URLClassLoader to use to detect resources to stage. - * @throws IllegalArgumentException If either the class loader is not a URLClassLoader or one of - * the resources the class loader exposes is not a file resource. + * @param cls Class whose class loader stops recursion into parent loaders + * @throws IllegalArgumentException no classloader in context hierarchy is URLClassloader or if + * one of the resources any class loader exposes is not a file resource. * @return A list of absolute paths to the resources the class loader uses. */ - public static List detectClassPathResourcesToStage(ClassLoader classLoader) { -if (!(classLoader instanceof URLClassLoader)) { - String message = - String.format( - "Unable to use ClassLoader to detect classpath elements. 
" - + "Current ClassLoader is %s, only URLClassLoaders are supported.", - classLoader); - throw new IllegalArgumentException(message); -} + public static List detectClassPathResourcesToStage(Class cls) { +return detectClassPathResourcesToStage(cls.getClassLoader()); + } -List files = new ArrayList<>(); -for (URL url : ((URLClassLoader) classLoader).getURLs()) { - try { -files.add(new File(url.toURI()).getAbsolutePath()); - } catch (IllegalArgumentException | URISyntaxException e) { -String message = String.format("Unable to convert url (%s) to file.", url); -throw new IllegalArgumentException(message, e); - } + /** + * Detect resources to stage, by using hierarchy between current context classloader and the given + * one. Always include the given class loader in the process. + * + * @param loader the loader to use as stopping condition when traversing context class loaders + * @return A list of absolute paths to the resources the class loader uses. + */ + @VisibleForTesting + static List detectClassPathResourcesToStage(final ClassLoader loader) { +Set files = new HashSet<>(); Review comment: Making this a set even when there is a single classloader means that we will reorder that classloaders classpath which will cause lots of dependency conflict issues. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307541) Time Spent: 7h 10m (was: 7h) > FlinkRunner: Stage files from context classloader > - > > Key: BEAM-8113 > URL: https://issues.apache.org/jira/browse/BEAM-8113 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Major > Time Spent: 7h 10m > Remaining Estimate: 0h > > Currently, only files from {{FlinkRunner.class.getClassLoader()}} are staged > by default. Add also files from > {{Thread.currentThread().getContextClassLoader()}}. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5428) Implement cross-bundle state caching.
[ https://issues.apache.org/jira/browse/BEAM-5428?focusedWorklogId=307536=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307536 ] ASF GitHub Bot logged work on BEAM-5428: Author: ASF GitHub Bot Created on: 05/Sep/19 23:43 Start Date: 05/Sep/19 23:43 Worklog Time Spent: 10m Work Description: tweise commented on pull request #9418: [BEAM-5428] Implement cross-bundle user state caching in the Python SDK URL: https://github.com/apache/beam/pull/9418#discussion_r321527901 ## File path: sdks/python/apache_beam/runners/worker/worker_pool_main.py ## @@ -47,21 +47,26 @@ class BeamFnExternalWorkerPoolServicer( beam_fn_api_pb2_grpc.BeamFnExternalWorkerPoolServicer): - def __init__(self, worker_threads, use_process=False, - container_executable=None): + def __init__(self, worker_threads, + use_process=False, + container_executable=None, + state_cache_size=0): self._worker_threads = worker_threads self._use_process = use_process self._container_executable = container_executable +self._state_cache_size = state_cache_size self._worker_processes = {} @classmethod def start(cls, worker_threads=1, use_process=False, port=0, -container_executable=None): +state_cache_size=0, container_executable=None): worker_server = grpc.server(futures.ThreadPoolExecutor(max_workers=10)) worker_address = 'localhost:%s' % worker_server.add_insecure_port( '[::]:%s' % port) -worker_pool = cls(worker_threads, use_process=use_process, - container_executable=container_executable) +worker_pool = cls(worker_threads, + use_process=use_process, + container_executable=container_executable, + state_cache_size=state_cache_size) Review comment: This should not occur here but come from the pipeline options from the provisioning endpoint. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307536) Time Spent: 17h 40m (was: 17.5h) > Implement cross-bundle state caching. > - > > Key: BEAM-5428 > URL: https://issues.apache.org/jira/browse/BEAM-5428 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Robert Bradshaw >Assignee: Rakesh Kumar >Priority: Major > Time Spent: 17h 40m > Remaining Estimate: 0h > > Tech spec: > [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m] > Relevant document: > [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit] > Mailing list link: > [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5428) Implement cross-bundle state caching.
[ https://issues.apache.org/jira/browse/BEAM-5428?focusedWorklogId=307529=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307529 ] ASF GitHub Bot logged work on BEAM-5428: Author: ASF GitHub Bot Created on: 05/Sep/19 23:38 Start Date: 05/Sep/19 23:38 Worklog Time Spent: 10m Work Description: tweise commented on pull request #9418: [BEAM-5428] Implement cross-bundle user state caching in the Python SDK URL: https://github.com/apache/beam/pull/9418#discussion_r321527154 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker_main.py ## @@ -149,6 +149,7 @@ def main(unused_argv): control_address=service_descriptor.url, worker_count=_get_worker_count(sdk_pipeline_options), worker_id=_worker_id, +state_cache_size=_get_state_cache_size(sdk_pipeline_options), Review comment: It might be better to pass the pipeline options to SDKHarness and have it extract what it needs. There is currently quite some noise wherever SdkHarness is constructed in general. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307529) Time Spent: 17.5h (was: 17h 20m) > Implement cross-bundle state caching. 
> - > > Key: BEAM-5428 > URL: https://issues.apache.org/jira/browse/BEAM-5428 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Robert Bradshaw >Assignee: Rakesh Kumar >Priority: Major > Time Spent: 17.5h > Remaining Estimate: 0h > > Tech spec: > [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m] > Relevant document: > [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit] > Mailing list link: > [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8160) Add instructions about how to set FnApi multi-threads/processes
Hannah Jiang created BEAM-8160: -- Summary: Add instructions about how to set FnApi multi-threads/processes Key: BEAM-8160 URL: https://issues.apache.org/jira/browse/BEAM-8160 Project: Beam Issue Type: Task Components: sdk-py-core Reporter: Hannah Jiang Assignee: Hannah Jiang Add instructions to Beam site or Beam wiki for easy discovery. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8158) Loosen Python dependency restrictions: httplib2, oauth2client
[ https://issues.apache.org/jira/browse/BEAM-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923795#comment-16923795 ] Jonathan Jin commented on BEAM-8158: [~udim]: The product in question is unfortunately stuck on Python 2 for now. We're planning a migration to Python 3 over the next several months, but we'd ideally like to be able to integrate cleanly in Python 2 without resorting to stopgaps. > Loosen Python dependency restrictions: httplib2, oauth2client > - > > Key: BEAM-8158 > URL: https://issues.apache.org/jira/browse/BEAM-8158 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Jonathan Jin >Assignee: Chamikara Jayalath >Priority: Major > > The Beam Python SDK's current pinned dependencies create dependency conflict > issues for my team at Twitter. > I'd like the following expansions of the Python SDK's dependency ranges: > * oauth2client>=2.0.1,<*4* to at least oauth2client>=2.0.1,<=*4.1.2* > * httplib2>=0.8,<=*0.12.0* to at least httplib2>=0.8,<=*0.12.3* > I understand, from pull request > [8653|https://github.com/apache/beam/pull/8653] by [~altay], that the latter > is blocked in turn by googledatastore. -- This message was sent by Atlassian Jira (v8.3.2#803003)
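The requested widenings are easiest to see with a toy range check over dotted numeric versions. This is a sketch only; real tooling should use PEP 440 semantics (e.g. the `packaging` library) rather than naive tuple comparison.

```python
def parse_version(version):
    """Parse a dotted numeric version like '0.12.3' into a comparable tuple."""
    return tuple(int(part) for part in version.split('.'))


def satisfies(version, lower_inclusive, upper_inclusive):
    """True if lower_inclusive <= version <= upper_inclusive."""
    return (parse_version(lower_inclusive)
            <= parse_version(version)
            <= parse_version(upper_inclusive))
```

Under the current pin httplib2>=0.8,<=0.12.0, version 0.12.3 is rejected; the proposed <=0.12.3 upper bound admits it, which is exactly the conflict the reporter describes.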
[jira] [Work logged] (BEAM-7984) [python] The coder returned for typehints.List should be IterableCoder
[ https://issues.apache.org/jira/browse/BEAM-7984?focusedWorklogId=307504=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307504 ] ASF GitHub Bot logged work on BEAM-7984: Author: ASF GitHub Bot Created on: 05/Sep/19 22:51 Start Date: 05/Sep/19 22:51 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9344: [BEAM-7984] The coder returned for typehints.List should be IterableCoder URL: https://github.com/apache/beam/pull/9344#issuecomment-528622418 wow, Python PreCommit takes _forever_ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307504) Time Spent: 3h 40m (was: 3.5h) > [python] The coder returned for typehints.List should be IterableCoder > -- > > Key: BEAM-7984 > URL: https://issues.apache.org/jira/browse/BEAM-7984 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > IterableCoder encodes a list and decodes to list, but > typecoders.registry.get_coder(typehints.List[bytes]) returns a > FastPrimitiveCoder. I don't see any reason why this would be advantageous. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads
[ https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=307498=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307498 ] ASF GitHub Bot logged work on BEAM-8151: Author: ASF GitHub Bot Created on: 05/Sep/19 22:37 Start Date: 05/Sep/19 22:37 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9477: [BEAM-8151, BEAM-7848] Up the max number of threads inside the SDK harness to a default of 10k URL: https://github.com/apache/beam/pull/9477#issuecomment-528618322 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307498) Time Spent: 4h 20m (was: 4h 10m) > Allow the Python SDK to use many many threads > - > > Key: BEAM-8151 > URL: https://issues.apache.org/jira/browse/BEAM-8151 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > We need to use a thread pool which shrinks the number of active threads when > they are not being used. > > This is to prevent any stuckness issues related to a runner scheduling more > work items than there are "work" threads inside the SDK harness. > > By default the control plane should have all "requests" being processed in > parallel and the runner is responsible for not overloading the SDK with too > much work. -- This message was sent by Atlassian Jira (v8.3.2#803003)
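The issue description above asks for a pool that grows on demand and shrinks when threads sit idle. A minimal sketch of that shape (an assumption of how such a pool could look, not Beam's actual SDK-harness code; a production pool would also hand new tasks to already-idle workers instead of spawning):

```python
import queue
import threading

class ShrinkingPool:
    """Minimal sketch: workers are spawned on demand up to max_workers
    and exit after sitting idle for idle_timeout seconds, so the pool
    shrinks back down when load drops."""

    def __init__(self, max_workers=10000, idle_timeout=30.0):
        self._tasks = queue.Queue()
        self._lock = threading.Lock()
        self._workers = 0
        self._max_workers = max_workers
        self._idle_timeout = idle_timeout

    def submit(self, fn, *args):
        self._tasks.put((fn, args))
        with self._lock:
            # Simplification: spawn a worker per submit up to the cap;
            # a real pool would first try to reuse an idle worker.
            if self._workers < self._max_workers:
                self._workers += 1
                threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            try:
                fn, args = self._tasks.get(timeout=self._idle_timeout)
            except queue.Empty:
                break  # idle too long: let the thread exit (pool shrinks)
            fn(*args)
        with self._lock:
            self._workers -= 1
```

With a large `max_workers` such as the 10k in the PR title, the control plane can accept all requests in parallel while the runner remains responsible for not overloading the SDK.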
[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307497=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307497 ] ASF GitHub Bot logged work on BEAM-7967: Author: ASF GitHub Bot Created on: 05/Sep/19 22:27 Start Date: 05/Sep/19 22:27 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9408: [BEAM-7967] Execute portable Flink application jar URL: https://github.com/apache/beam/pull/9408#issuecomment-528615272 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307497) Time Spent: 3h 10m (was: 3h) > Execute portable Flink application jar > -- > > Key: BEAM-7967 > URL: https://issues.apache.org/jira/browse/BEAM-7967 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 3h 10m > Remaining Estimate: 0h > > [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=307493=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307493 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 05/Sep/19 22:26 Start Date: 05/Sep/19 22:26 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9262: [BEAM-7389] Add code examples for Regex page URL: https://github.com/apache/beam/pull/9262 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307493) Time Spent: 55h 40m (was: 55.5h) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 55h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307496=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307496 ] ASF GitHub Bot logged work on BEAM-7967: Author: ASF GitHub Bot Created on: 05/Sep/19 22:26 Start Date: 05/Sep/19 22:26 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9408: [BEAM-7967] Execute portable Flink application jar URL: https://github.com/apache/beam/pull/9408#discussion_r321509256 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java ## @@ -120,4 +132,62 @@ private PortablePipelineResult createPortablePipelineResult( return flinkRunnerResult; } } + + public static void main(String[] args) throws Exception { Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307496) Time Spent: 3h (was: 2h 50m) > Execute portable Flink application jar > -- > > Key: BEAM-7967 > URL: https://issues.apache.org/jira/browse/BEAM-7967 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 3h > Remaining Estimate: 0h > > [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=307492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307492 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 05/Sep/19 22:25 Start Date: 05/Sep/19 22:25 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9262: [BEAM-7389] Add code examples for Regex page URL: https://github.com/apache/beam/pull/9262#issuecomment-528614780 We can wait until the next release for pydocs. Their main use is for released versions really. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307492) Time Spent: 55.5h (was: 55h 20m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 55.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads
[ https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=307491=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307491 ] ASF GitHub Bot logged work on BEAM-8151: Author: ASF GitHub Bot Created on: 05/Sep/19 22:24 Start Date: 05/Sep/19 22:24 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9477: [BEAM-8151, BEAM-7848] Up the max number of threads inside the SDK harness to a default of 10k URL: https://github.com/apache/beam/pull/9477#issuecomment-528614423 Thanks Luke. This should be good for the worker. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307491) Time Spent: 4h 10m (was: 4h) > Allow the Python SDK to use many many threads > - > > Key: BEAM-8151 > URL: https://issues.apache.org/jira/browse/BEAM-8151 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > We need to use a thread pool which shrinks the number of active threads when > they are not being used. > > This is to prevent any stuckness issues related to a runner scheduling more > work items than there are "work" threads inside the SDK harness. > > By default the control plane should have all "requests" being processed in > parallel and the runner is responsible for not overloading the SDK with too > much work. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307489 ] ASF GitHub Bot logged work on BEAM-7967: Author: ASF GitHub Bot Created on: 05/Sep/19 22:18 Start Date: 05/Sep/19 22:18 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9408: [BEAM-7967] Execute portable Flink application jar URL: https://github.com/apache/beam/pull/9408#discussion_r321506835 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/PortablePipelineJarUtils.java ## @@ -51,4 +83,102 @@ ARTIFACT_STAGING_FOLDER_PATH + "/artifact-manifest.json"; static final String PIPELINE_PATH = PIPELINE_FOLDER_PATH + "/pipeline.json"; static final String PIPELINE_OPTIONS_PATH = PIPELINE_FOLDER_PATH + "/pipeline-options.json"; + + private static final Logger LOG = LoggerFactory.getLogger(PortablePipelineJarCreator.class); + + private static InputStream getResourceFromClassPath(String resourcePath) throws IOException { +InputStream inputStream = PortablePipelineJarUtils.class.getResourceAsStream(resourcePath); +if (inputStream == null) { + throw new FileNotFoundException( + String.format("Resource %s not found on classpath.", resourcePath)); +} +return inputStream; + } + + /** Populates {@code builder} using the JSON resource specified by {@code resourcePath}. 
*/ + private static void parseJsonResource(String resourcePath, Builder builder) throws IOException { +try (InputStream inputStream = getResourceFromClassPath(resourcePath)) { + String contents = new String(ByteStreams.toByteArray(inputStream), StandardCharsets.UTF_8); + JsonFormat.parser().merge(contents, builder); +} + } + + public static Pipeline getPipelineFromClasspath() throws IOException { +Pipeline.Builder builder = Pipeline.newBuilder(); +parseJsonResource("/" + PIPELINE_PATH, builder); +return builder.build(); + } + + public static Struct getPipelineOptionsFromClasspath() throws IOException { +Struct.Builder builder = Struct.newBuilder(); +parseJsonResource("/" + PIPELINE_OPTIONS_PATH, builder); +return builder.build(); + } + + public static ProxyManifest getArtifactManifestFromClassPath() throws IOException { +ProxyManifest.Builder builder = ProxyManifest.newBuilder(); +parseJsonResource("/" + ARTIFACT_MANIFEST_PATH, builder); +return builder.build(); + } + + /** Writes artifacts listed in {@code proxyManifest}. */ + public static String stageArtifacts( + ProxyManifest proxyManifest, + PipelineOptions options, + String invocationId, + String artifactStagingPath) + throws Exception { +Collection<StagedFile> filesToStage = +prepareArtifactsForStaging(proxyManifest, options, invocationId); +try (GrpcFnServer<BeamFileSystemArtifactStagingService> artifactServer = +GrpcFnServer.allocatePortAndCreateFor( +new BeamFileSystemArtifactStagingService(), InProcessServerFactory.create())) { + ManagedChannel grpcChannel = + InProcessManagedChannelFactory.create() + .forDescriptor(artifactServer.getApiServiceDescriptor()); + ArtifactServiceStager stager = ArtifactServiceStager.overChannel(grpcChannel); + String stagingSessionToken = + BeamFileSystemArtifactStagingService.generateStagingSessionToken( + invocationId, artifactStagingPath); + String retrievalToken = stager.stage(stagingSessionToken, filesToStage); + // Clean up. 
+ for (StagedFile file : filesToStage) { +if (!file.getFile().delete()) { + LOG.warn("Failed to delete file {}", file.getFile()); +} + } + grpcChannel.shutdown(); + return retrievalToken; +} + } + + /** + * Artifacts are expected to exist as resources on the classpath, located using {@code + * proxyManifest}. Write them to tmp files so they can be staged. + */ + private static Collection<StagedFile> prepareArtifactsForStaging( + ProxyManifest proxyManifest, PipelineOptions options, String invocationId) + throws IOException { +List<StagedFile> filesToStage = new ArrayList<>(); +Path outputFolderPath = +Paths.get( +MoreObjects.firstNonNull( +options.getTempLocation(), System.getProperty("java.io.tmpdir")), +invocationId); +if (!outputFolderPath.toFile().mkdir()) { + throw new IOException("Failed to create folder " + outputFolderPath); +} +for (Location location : proxyManifest.getLocationList()) { + try (InputStream inputStream = getResourceFromClassPath(location.getUri())) { Review comment: As far as I know, the jar's contents only exist as class path resources (streams), not regular files, and the artifact staging code expects that we are staging regular files.
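The review comment above explains the design: jar contents are reachable only as classpath streams, while the stager expects regular files, so each stream is first copied into a per-invocation temp folder. A Python sketch of that approach (names such as `prepare_for_staging` and `open_resource` are illustrative stand-ins for the Java `prepareArtifactsForStaging` and `getResourceAsStream`, not real Beam APIs):

```python
import io
import os
import shutil
import tempfile

def prepare_for_staging(open_resource, artifact_names, invocation_id, tmp_root=None):
    """Copy stream-only artifacts to regular files so they can be staged.

    open_resource: any callable returning a readable binary stream for a
    given artifact name (a stand-in for classpath resource lookup).
    """
    out_dir = os.path.join(tmp_root or tempfile.gettempdir(), invocation_id)
    os.makedirs(out_dir)  # fail loudly if the invocation folder already exists
    staged = []
    for name in artifact_names:
        dest = os.path.join(out_dir, os.path.basename(name))
        with open_resource(name) as src, open(dest, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream -> regular file on disk
        staged.append(dest)
    return staged

# Demo with in-memory streams standing in for jar resources:
demo_root = tempfile.mkdtemp()
staged = prepare_for_staging(
    lambda name: io.BytesIO(b"payload:" + name.encode()),
    ["udf.jar"], "invocation-1", demo_root)
```

As in the Java version, the caller is expected to delete the copies once staging has produced a retrieval token.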
[jira] [Commented] (BEAM-7978) ArithmeticExceptions on getting backlog bytes
[ https://issues.apache.org/jira/browse/BEAM-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923775#comment-16923775 ] Kenneth Knowles commented on BEAM-7978: --- I just happened to dig in to this, and really Minutes.between(instant, instant) is not safe unless you handle the overflow and convert to UNKNOWN behavior. Localizing this to the simplified kinesis client seems best. That keeps the troublesome behavior to a minimum. It is better than forcing a complex spec outside this local function. > ArithmeticExceptions on getting backlog bytes > -- > > Key: BEAM-7978 > URL: https://issues.apache.org/jira/browse/BEAM-7978 > Project: Beam > Issue Type: Bug > Components: io-java-kinesis >Affects Versions: 2.14.0 >Reporter: Mateusz >Assignee: Alexey Romanenko >Priority: Major > Fix For: 2.16.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Hello, > Beam 2.14.0 > (and to be more precise > [commit|https://github.com/apache/beam/commit/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad#diff-b4964a457006b1555c7042c739b405ec]) > introduced a change in watermark calculation in Kinesis IO causing below > error: > {code:java} > exception: "java.lang.RuntimeException: Unknown kinesis failure, when trying > to reach kinesis > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:227) > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getBacklogBytes(SimplifiedKinesisClient.java:167) > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getBacklogBytes(SimplifiedKinesisClient.java:155) > at > org.apache.beam.sdk.io.kinesis.KinesisReader.getTotalBacklogBytes(KinesisReader.java:158) > at > org.apache.beam.runners.dataflow.worker.StreamingModeExecutionContext.flushState(StreamingModeExecutionContext.java:433) > at > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1289) > at > 
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149) > at > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1024) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.ArithmeticException: Value cannot fit in an int: > 153748963401 > at org.joda.time.field.FieldUtils.safeToInt(FieldUtils.java:229) > at > org.joda.time.field.BaseDurationField.getDifference(BaseDurationField.java:141) > at > org.joda.time.base.BaseSingleFieldPeriod.between(BaseSingleFieldPeriod.java:72) > at org.joda.time.Minutes.minutesBetween(Minutes.java:101) > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.lambda$getBacklogBytes$3(SimplifiedKinesisClient.java:169) > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:210) > ... 10 more > {code} > We spotted this issue on Dataflow runner. It's problematic as inability to > get backlog bytes seems to result in constant recreation of KinesisReader. > The issue happens if the backlog bytes are retrieved before watermark value > is updated from initial default value. Easy way to reproduce it is to create > a pipeline with Kinesis source for a stream where no records are being put. > While debugging it locally, you can observe that the watermark is set to the > value on the past (like: "-290308-12-21T19:59:05.225Z"). After two minutes > (default watermark idle duration threshold is set to 2 minutes) , the > watermark is set to value of > [watermarkIdleThreshold|https://github.com/apache/beam/blob/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/WatermarkPolicyFactory.java#L110]), > so the next backlog bytes retrieval should be correct. 
However, as described > before, running the pipeline on Dataflow runner results in KinesisReader > being closed just after creation, so the watermark won't be fixed. > The reason for the issue is the following: The introduced watermark policies are > relying on > [WatermarkParameters|https://github.com/apache/beam/blob/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/WatermarkParameters.java] > which initialises currentWatermark and eventTime to > [BoundedWindow.TIMESTAMP_MIN_VALUE|https://github.com/apache/beam/commit/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad#diff-b4964a457006b1555c7042c739b405ecR52]. > This results in the watermark being set to that minimum value.
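The failure mode above and the local fix Kenneth suggests can be sketched compactly. Joda's `Minutes.minutesBetween` throws `ArithmeticException` once the difference exceeds `Integer.MAX_VALUE` minutes, which is exactly what happens while the watermark still holds its extreme initial value. A Python sketch of the overflow-safe clamping pattern (illustrative only; `safe_minutes_between` is a made-up name, and the real fix lives in the Java `SimplifiedKinesisClient`):

```python
# Clamp instead of raising, mirroring the "handle the overflow and
# convert to UNKNOWN behavior" idea from the comment above.

INT_MIN, INT_MAX = -2**31, 2**31 - 1

def safe_minutes_between(start_ms, end_ms):
    """Whole minutes between two epoch-millis instants, saturated to
    the 32-bit int range instead of raising on overflow."""
    minutes = (end_ms - start_ms) // 60_000
    return max(INT_MIN, min(INT_MAX, minutes))

# The minute count from the stack trace above (153748963401) no longer
# blows up; it saturates at Integer.MAX_VALUE.
assert safe_minutes_between(0, 153748963401 * 60_000) == INT_MAX
assert safe_minutes_between(0, 5 * 60_000) == 5
```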
[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads
[ https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=307488=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307488 ] ASF GitHub Bot logged work on BEAM-8151: Author: ASF GitHub Bot Created on: 05/Sep/19 22:17 Start Date: 05/Sep/19 22:17 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #9477: [BEAM-8151, BEAM-7848] Up the max number of threads inside the SDK harness to a default of 10k URL: https://github.com/apache/beam/pull/9477#discussion_r321506498 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker_main.py ## @@ -202,7 +202,7 @@ def _get_worker_count(pipeline_options): re.match(r'worker_threads=(?P<worker_threads>.*)', experiment).group('worker_threads')) - return 12 + return 1 Review comment: also +1 for this change if it does not have performance impact as it will solve a problem which we discussed over the mailing thread https://lists.apache.org/thread.html/0cbf73d696e0d3a5bb8e93618ac9d6bb81daecf2c9c8e11ee220c8ae@%3Cdev.beam.apache.org%3E Where this was the 2nd option in the proposal. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307488) Time Spent: 4h (was: 3h 50m) > Allow the Python SDK to use many many threads > - > > Key: BEAM-8151 > URL: https://issues.apache.org/jira/browse/BEAM-8151 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > We need to use a thread pool which shrinks the number of active threads when > they are not being used. > > This is to prevent any stuckness issues related to a runner scheduling more > work items than there are "work" threads inside the SDK harness. 
> > By default the control plane should have all "requests" being processed in > parallel and the runner is responsible for not overloading the SDK with too > much work. -- This message was sent by Atlassian Jira (v8.3.2#803003)
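The diff quoted above parses a `worker_threads=N` experiment to pick the harness thread count. A simplified sketch of that parsing (illustrative; the real `_get_worker_count` in `sdk_worker_main.py` reads pipeline options, and the archive has dropped the named group from the pattern, which must be `(?P<worker_threads>...)` for `.group('worker_threads')` to work; `\d+` here narrows the original `.*` for safety):

```python
import re

def get_worker_count(experiments, default=12):
    """Scan experiment strings for 'worker_threads=N' and fall back to
    a default when the experiment is absent or malformed."""
    for experiment in experiments:
        match = re.match(r'worker_threads=(?P<worker_threads>\d+)', experiment)
        if match:
            return int(match.group('worker_threads'))
    return default

assert get_worker_count(['worker_threads=100', 'beam_fn_api']) == 100
assert get_worker_count(['beam_fn_api']) == 12
```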
[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads
[ https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=307487=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307487 ] ASF GitHub Bot logged work on BEAM-8151: Author: ASF GitHub Bot Created on: 05/Sep/19 22:16 Start Date: 05/Sep/19 22:16 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #9477: [BEAM-8151, BEAM-7848] Up the max number of threads inside the SDK harness to a default of 10k URL: https://github.com/apache/beam/pull/9477#discussion_r321506498 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker_main.py ## @@ -202,7 +202,7 @@ def _get_worker_count(pipeline_options): re.match(r'worker_threads=(?P<worker_threads>.*)', experiment).group('worker_threads')) - return 12 + return 1 Review comment: also +1 for this change if it does not have performance impact as it will solve a problem which we discussed over the mailing thread https://lists.apache.org/thread.html/0cbf73d696e0d3a5bb8e93618ac9d6bb81daecf2c9c8e11ee220c8ae@%3Cdev.beam.apache.org%3E This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307487) Time Spent: 3h 50m (was: 3h 40m) > Allow the Python SDK to use many many threads > - > > Key: BEAM-8151 > URL: https://issues.apache.org/jira/browse/BEAM-8151 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > We need to use a thread pool which shrinks the number of active threads when > they are not being used. > > This is to prevent any stuckness issues related to a runner scheduling more > work items than there are "work" threads inside the SDK harness. 
> > By default the control plane should have all "requests" being processed in > parallel and the runner is responsible for not overloading the SDK with too > much work. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5539) Beam Dependency Update Request: google-cloud-pubsub
[ https://issues.apache.org/jira/browse/BEAM-5539?focusedWorklogId=307482=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307482 ] ASF GitHub Bot logged work on BEAM-5539: Author: ASF GitHub Bot Created on: 05/Sep/19 22:13 Start Date: 05/Sep/19 22:13 Worklog Time Spent: 10m Work Description: udim commented on pull request #9491: [BEAM-5539] Upgrade google-cloud-pubsub and bigquery packages. URL: https://github.com/apache/beam/pull/9491 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads
[ https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=307478=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307478 ] ASF GitHub Bot logged work on BEAM-8151: Author: ASF GitHub Bot Created on: 05/Sep/19 22:12 Start Date: 05/Sep/19 22:12 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #9477: [BEAM-8151, BEAM-7848] Up the max number of threads inside the SDK harness to a default of 10k URL: https://github.com/apache/beam/pull/9477#discussion_r321505439 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker_main.py ## @@ -202,7 +202,7 @@ def _get_worker_count(pipeline_options): re.match(r'worker_threads=(?P<worker_threads>.*)', experiment).group('worker_threads')) - return 12 + return 1 Review comment: > Do we know why 10K is a good number? Would not this practically accept all work? > > How did we come up with 12 in the first place. (@angoenka may have information) It was just some basic experimentation where the job I was experimenting on did not show a lot of benefit beyond 12 threads. For TFX pipelines, we recommend 100 threads when running on Flink because of some scheduling issues where we used to get deadlocks in situations similar to the one mentioned by Luke. I am not sure how it will impact the resource usage if we have 10k standing threads for a single core. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307478) Time Spent: 3h 40m (was: 3.5h) > Allow the Python SDK to use many many threads > - > > Key: BEAM-8151 > URL: https://issues.apache.org/jira/browse/BEAM-8151 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > We need to use a thread pool which shrinks the number of active threads when > they are not being used. > > This is to prevent any stuckness issues related to a runner scheduling more > work items than there are "work" threads inside the SDK harness. > > By default the control plane should have all "requests" being processed in > parallel and the runner is responsible for not overloading the SDK with too > much work. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307477=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307477 ] ASF GitHub Bot logged work on BEAM-7967: Author: ASF GitHub Bot Created on: 05/Sep/19 22:09 Start Date: 05/Sep/19 22:09 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9408: [BEAM-7967] Execute portable Flink application jar URL: https://github.com/apache/beam/pull/9408#issuecomment-528610103 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307477) Time Spent: 2h 40m (was: 2.5h) > Execute portable Flink application jar > -- > > Key: BEAM-7967 > URL: https://issues.apache.org/jira/browse/BEAM-7967 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 2h 40m > Remaining Estimate: 0h > > [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307476 ] ASF GitHub Bot logged work on BEAM-7967: Author: ASF GitHub Bot Created on: 05/Sep/19 22:08 Start Date: 05/Sep/19 22:08 Worklog Time Spent: 10m Work Description: angoenka commented on issue #9408: [BEAM-7967] Execute portable Flink application jar URL: https://github.com/apache/beam/pull/9408#issuecomment-528609802 Thanks! This will be really useful. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307476) Time Spent: 2.5h (was: 2h 20m) > Execute portable Flink application jar > -- > > Key: BEAM-7967 > URL: https://issues.apache.org/jira/browse/BEAM-7967 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 2.5h > Remaining Estimate: 0h > > [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307474&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307474 ] ASF GitHub Bot logged work on BEAM-7967: Author: ASF GitHub Bot Created on: 05/Sep/19 22:08 Start Date: 05/Sep/19 22:08 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #9408: [BEAM-7967] Execute portable Flink application jar URL: https://github.com/apache/beam/pull/9408#discussion_r321504185

## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/PortablePipelineJarUtils.java ##

@@ -51,4 +83,102 @@
      ARTIFACT_STAGING_FOLDER_PATH + "/artifact-manifest.json";
  static final String PIPELINE_PATH = PIPELINE_FOLDER_PATH + "/pipeline.json";
  static final String PIPELINE_OPTIONS_PATH = PIPELINE_FOLDER_PATH + "/pipeline-options.json";

  private static final Logger LOG = LoggerFactory.getLogger(PortablePipelineJarCreator.class);

  private static InputStream getResourceFromClassPath(String resourcePath) throws IOException {
    InputStream inputStream = PortablePipelineJarUtils.class.getResourceAsStream(resourcePath);
    if (inputStream == null) {
      throw new FileNotFoundException(
          String.format("Resource %s not found on classpath.", resourcePath));
    }
    return inputStream;
  }

  /** Populates {@code builder} using the JSON resource specified by {@code resourcePath}. */
  private static void parseJsonResource(String resourcePath, Builder builder) throws IOException {
    try (InputStream inputStream = getResourceFromClassPath(resourcePath)) {
      String contents = new String(ByteStreams.toByteArray(inputStream), StandardCharsets.UTF_8);
      JsonFormat.parser().merge(contents, builder);
    }
  }

  public static Pipeline getPipelineFromClasspath() throws IOException {
    Pipeline.Builder builder = Pipeline.newBuilder();
    parseJsonResource("/" + PIPELINE_PATH, builder);
    return builder.build();
  }

  public static Struct getPipelineOptionsFromClasspath() throws IOException {
    Struct.Builder builder = Struct.newBuilder();
    parseJsonResource("/" + PIPELINE_OPTIONS_PATH, builder);
    return builder.build();
  }

  public static ProxyManifest getArtifactManifestFromClassPath() throws IOException {
    ProxyManifest.Builder builder = ProxyManifest.newBuilder();
    parseJsonResource("/" + ARTIFACT_MANIFEST_PATH, builder);
    return builder.build();
  }

  /** Writes artifacts listed in {@code proxyManifest}. */
  public static String stageArtifacts(
      ProxyManifest proxyManifest,
      PipelineOptions options,
      String invocationId,
      String artifactStagingPath)
      throws Exception {
    Collection<StagedFile> filesToStage =
        prepareArtifactsForStaging(proxyManifest, options, invocationId);
    try (GrpcFnServer<BeamFileSystemArtifactStagingService> artifactServer =
        GrpcFnServer.allocatePortAndCreateFor(
            new BeamFileSystemArtifactStagingService(), InProcessServerFactory.create())) {
      ManagedChannel grpcChannel =
          InProcessManagedChannelFactory.create()
              .forDescriptor(artifactServer.getApiServiceDescriptor());
      ArtifactServiceStager stager = ArtifactServiceStager.overChannel(grpcChannel);
      String stagingSessionToken =
          BeamFileSystemArtifactStagingService.generateStagingSessionToken(
              invocationId, artifactStagingPath);
      String retrievalToken = stager.stage(stagingSessionToken, filesToStage);
      // Clean up.
      for (StagedFile file : filesToStage) {
        if (!file.getFile().delete()) {
          LOG.warn("Failed to delete file {}", file.getFile());
        }
      }
      grpcChannel.shutdown();
      return retrievalToken;
    }
  }

  /**
   * Artifacts are expected to exist as resources on the classpath, located using {@code
   * proxyManifest}. Write them to tmp files so they can be staged.
   */
  private static Collection<StagedFile> prepareArtifactsForStaging(
      ProxyManifest proxyManifest, PipelineOptions options, String invocationId)
      throws IOException {
    List<StagedFile> filesToStage = new ArrayList<>();
    Path outputFolderPath =
        Paths.get(
            MoreObjects.firstNonNull(
                options.getTempLocation(), System.getProperty("java.io.tmpdir")),
            invocationId);
    if (!outputFolderPath.toFile().mkdir()) {
      throw new IOException("Failed to create folder " + outputFolderPath);
    }
    for (Location location : proxyManifest.getLocationList()) {
      try (InputStream inputStream = getResourceFromClassPath(location.getUri())) {

Review comment: Can we use the jar directly instead of copying the artifacts to local file system? Copying files to local file system will also call for cleanup.
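The review question above (stage from the jar directly versus copy to the local file system first) hinges on the pattern prepareArtifactsForStaging uses: read each manifest location out of the jar's classpath and write it to a temp folder, which then needs cleanup. A rough Python analogue of that copy-then-stage pattern may make the trade-off concrete; the function name and file layout here are invented for illustration, not part of the Beam codebase:

```python
import tempfile
import zipfile
from pathlib import Path

def prepare_artifacts_for_staging(jar_path, locations, invocation_id):
    """Copy artifacts packed inside a jar (a zip archive) out to a temp
    folder so a stager that expects real files can upload them.

    Mirrors the copy-to-tmp approach in the PR; a jar-direct approach
    would instead hand the stager the open zip entries and avoid the
    later cleanup pass entirely."""
    out_dir = Path(tempfile.gettempdir()) / invocation_id
    # Fail loudly if the folder already exists, like mkdir() in the Java code.
    out_dir.mkdir()
    staged = []
    with zipfile.ZipFile(jar_path) as jar:
        for name in locations:
            target = out_dir / Path(name).name
            # jar.read raises KeyError for a missing entry, akin to the
            # FileNotFoundException thrown by getResourceFromClassPath.
            target.write_bytes(jar.read(name))
            staged.append(target)
    return staged
```

Either way the stager sees ordinary files; the copy variant simply pays an extra write and a delete per artifact.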
[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307475&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307475 ] ASF GitHub Bot logged work on BEAM-7967: Author: ASF GitHub Bot Created on: 05/Sep/19 22:08 Start Date: 05/Sep/19 22:08 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #9408: [BEAM-7967] Execute portable Flink application jar URL: https://github.com/apache/beam/pull/9408#discussion_r321496144

## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java ##

@@ -120,4 +132,62 @@
  private PortablePipelineResult createPortablePipelineResult(
    return flinkRunnerResult;
  }
}

  public static void main(String[] args) throws Exception {

Review comment: Let's add a comment as to why this main method is needed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307475) Time Spent: 2h 20m (was: 2h 10m) > Execute portable Flink application jar > -- > > Key: BEAM-7967 > URL: https://issues.apache.org/jira/browse/BEAM-7967 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 2h 20m > Remaining Estimate: 0h > > [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923767#comment-16923767 ] Udi Meiri commented on BEAM-7060: - Thanks! Standard typing migration is tracked in https://issues.apache.org/jira/browse/BEAM-8156 > Design Py3-compatible typehints annotation support in Beam 3. > - > > Key: BEAM-7060 > URL: https://issues.apache.org/jira/browse/BEAM-7060 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Fix For: 2.16.0 > > Time Spent: 15.5h > Remaining Estimate: 0h > > Existing [Typehints implementaiton in > Beam|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/ > ] heavily relies on internal details of CPython implementation, and some of > the assumptions of this implementation broke as of Python 3.6, see for > example: https://issues.apache.org/jira/browse/BEAM-6877, which makes > typehints support unusable on Python 3.6 as of now. [Python 3 Kanban > Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail] > lists several specific typehints-related breakages, prefixed with "TypeHints > Py3 Error". > We need to decide whether to: > - Deprecate in-house typehints implementation. > - Continue to support in-house implementation, which at this point is a stale > code and has other known issues. > - Attempt to use some off-the-shelf libraries for supporting > type-annotations, like Pytype, Mypy, PyAnnotate. > WRT to this decision we also need to plan on immediate next steps to unblock > adoption of Beam for Python 3.6+ users. One potential option may be to have > Beam SDK ignore any typehint annotations on Py 3.6+. > cc: [~udim], [~altay], [~robertwb]. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout
[ https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=307470=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307470 ] ASF GitHub Bot logged work on BEAM-7616: Author: ASF GitHub Bot Created on: 05/Sep/19 22:04 Start Date: 05/Sep/19 22:04 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools use urllib with the global timeout. Set it to 60 seconds # to prevent network related stuckness issues. URL: https://github.com/apache/beam/pull/9401#issuecomment-528608447 Run Python 2 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307470) Time Spent: 5h 20m (was: 5h 10m) > urlopen calls could get stuck without a timeout > --- > > Key: BEAM-7616 > URL: https://issues.apache.org/jira/browse/BEAM-7616 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Udi Meiri >Priority: Blocker > Fix For: 2.14.0, 2.16.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout
[ https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=307468=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307468 ] ASF GitHub Bot logged work on BEAM-7616: Author: ASF GitHub Bot Created on: 05/Sep/19 22:03 Start Date: 05/Sep/19 22:03 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools use urllib with the global timeout. Set it to 60 seconds # to prevent network related stuckness issues. URL: https://github.com/apache/beam/pull/9401#issuecomment-528608240 This version seems to work. I can find the new logs in stackdriver logs. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307468) Time Spent: 5h 10m (was: 5h) > urlopen calls could get stuck without a timeout > --- > > Key: BEAM-7616 > URL: https://issues.apache.org/jira/browse/BEAM-7616 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Udi Meiri >Priority: Blocker > Fix For: 2.14.0, 2.16.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
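The fix discussed in this thread relies on the fact that urllib honors Python's process-wide default socket timeout, so apitools calls made through it stop hanging indefinitely. A minimal sketch of that mechanism with the standard library only (the 60-second value mirrors the PR description; fetch is a placeholder helper, not Beam code):

```python
import socket
from urllib.request import urlopen

# Any socket created without an explicit timeout, including those urllib
# opens internally, now gives up after 60 seconds instead of blocking forever.
socket.setdefaulttimeout(60)

def fetch(url):
    # urlopen here inherits the 60s default; a stuck server raises
    # socket.timeout instead of wedging the worker.
    with urlopen(url) as resp:
        return resp.read()
```

Because the setting is global, it also covers third-party libraries such as apitools that never expose a timeout parameter of their own, which is the "stuckness" scenario the PR targets.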
[jira] [Work logged] (BEAM-8113) FlinkRunner: Stage files from context classloader
[ https://issues.apache.org/jira/browse/BEAM-8113?focusedWorklogId=307465=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307465 ] ASF GitHub Bot logged work on BEAM-8113: Author: ASF GitHub Bot Created on: 05/Sep/19 21:54 Start Date: 05/Sep/19 21:54 Worklog Time Spent: 10m Work Description: je-ik commented on issue #9451: [BEAM-8113] Stage files from context classloader URL: https://github.com/apache/beam/pull/9451#issuecomment-528605188 @lukecwik sure ... the related code goes apparently through kinda rapid development, as there are more and more uses of the modified method in core. Essentially, that was a sort of why I wanted to keep that in flink runner code only. But if we want to solve the issues for all runners, then it should be placed in core, but that will probably make this PR more complicated to merge. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307465) Time Spent: 7h (was: 6h 50m) > FlinkRunner: Stage files from context classloader > - > > Key: BEAM-8113 > URL: https://issues.apache.org/jira/browse/BEAM-8113 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h > > Currently, only files from {{FlinkRunner.class.getClassLoader()}} are staged > by default. Add also files from > {{Thread.currentThread().getContextClassLoader()}}. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8159) Support TIME ADD/SUB
Rui Wang created BEAM-8159: -- Summary: Support TIME ADD/SUB Key: BEAM-8159 URL: https://issues.apache.org/jira/browse/BEAM-8159 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang https://github.com/google/zetasql/blob/master/docs/time_functions.md#time_add https://github.com/google/zetasql/blob/master/docs/time_functions.md#time_sub -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8159) Support TIME ADD/SUB
[ https://issues.apache.org/jira/browse/BEAM-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-8159: --- Status: Open (was: Triage Needed) > Support TIME ADD/SUB > > > Key: BEAM-8159 > URL: https://issues.apache.org/jira/browse/BEAM-8159 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Priority: Major > > https://github.com/google/zetasql/blob/master/docs/time_functions.md#time_add > https://github.com/google/zetasql/blob/master/docs/time_functions.md#time_sub -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8158) Loosen Python dependency restrictions: httplib2, oauth2client
[ https://issues.apache.org/jira/browse/BEAM-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923750#comment-16923750 ] Udi Meiri commented on BEAM-8158: - Jonathan are you using Python 2 or 3? Perhaps we can conditionally increase httplib2 max only on Python 3. > Loosen Python dependency restrictions: httplib2, oauth2client > - > > Key: BEAM-8158 > URL: https://issues.apache.org/jira/browse/BEAM-8158 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Jonathan Jin >Assignee: Chamikara Jayalath >Priority: Major > > The Beam Python SDK's current pinned dependencies create dependency conflict > issues for my team at Twitter. > I'd like the following expansions of the Python SDK's dependency ranges: > * oauth2client>=2.0.1,<*4* to at least oauth2client>=2.0.1,<=*4.1.2* > * httplib2>=0.8,<=*0.12.0* to at least httplib2>=0.8,<=*0.12.3* > I understand, from pull request > [8653|https://github.com/apache/beam/pull/8653] by [~altay], that the latter > is blocked in turn by googledatastore. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Assigned] (BEAM-8158) Loosen Python dependency restrictions: httplib2, oauth2client
[ https://issues.apache.org/jira/browse/BEAM-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Altay reassigned BEAM-8158: - Assignee: Chamikara Jayalath (was: Ahmet Altay) > Loosen Python dependency restrictions: httplib2, oauth2client > - > > Key: BEAM-8158 > URL: https://issues.apache.org/jira/browse/BEAM-8158 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Jonathan Jin >Assignee: Chamikara Jayalath >Priority: Major > > The Beam Python SDK's current pinned dependencies create dependency conflict > issues for my team at Twitter. > I'd like the following expansions of the Python SDK's dependency ranges: > * oauth2client>=2.0.1,<*4* to at least oauth2client>=2.0.1,<=*4.1.2* > * httplib2>=0.8,<=*0.12.0* to at least httplib2>=0.8,<=*0.12.3* > I understand, from pull request > [8653|https://github.com/apache/beam/pull/8653] by [~altay], that the latter > is blocked in turn by googledatastore. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8158) Loosen Python dependency restrictions: httplib2, oauth2client
[ https://issues.apache.org/jira/browse/BEAM-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923745#comment-16923745 ] Ahmet Altay commented on BEAM-8158: --- /cc [~udim] > Loosen Python dependency restrictions: httplib2, oauth2client > - > > Key: BEAM-8158 > URL: https://issues.apache.org/jira/browse/BEAM-8158 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Jonathan Jin >Assignee: Chamikara Jayalath >Priority: Major > > The Beam Python SDK's current pinned dependencies create dependency conflict > issues for my team at Twitter. > I'd like the following expansions of the Python SDK's dependency ranges: > * oauth2client>=2.0.1,<*4* to at least oauth2client>=2.0.1,<=*4.1.2* > * httplib2>=0.8,<=*0.12.0* to at least httplib2>=0.8,<=*0.12.3* > I understand, from pull request > [8653|https://github.com/apache/beam/pull/8653] by [~altay], that the latter > is blocked in turn by googledatastore. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Assigned] (BEAM-8158) Loosen Python dependency restrictions: httplib2, oauth2client
[ https://issues.apache.org/jira/browse/BEAM-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Altay reassigned BEAM-8158: - Assignee: Ahmet Altay > Loosen Python dependency restrictions: httplib2, oauth2client > - > > Key: BEAM-8158 > URL: https://issues.apache.org/jira/browse/BEAM-8158 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Jonathan Jin >Assignee: Ahmet Altay >Priority: Major > > The Beam Python SDK's current pinned dependencies create dependency conflict > issues for my team at Twitter. > I'd like the following expansions of the Python SDK's dependency ranges: > * oauth2client>=2.0.1,<*4* to at least oauth2client>=2.0.1,<=*4.1.2* > * httplib2>=0.8,<=*0.12.0* to at least httplib2>=0.8,<=*0.12.3* > I understand, from pull request > [8653|https://github.com/apache/beam/pull/8653] by [~altay], that the latter > is blocked in turn by googledatastore. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8158) Loosen Python dependency restrictions: httplib2, oauth2client
[ https://issues.apache.org/jira/browse/BEAM-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923744#comment-16923744 ] Ahmet Altay commented on BEAM-8158: --- [~jinnovation] no updates, but let's explore a few options. [~chamikara] could we update the googledatastore dependency to 7.1.0 or make this dependency optional? > Loosen Python dependency restrictions: httplib2, oauth2client > - > > Key: BEAM-8158 > URL: https://issues.apache.org/jira/browse/BEAM-8158 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Jonathan Jin >Priority: Major > > The Beam Python SDK's current pinned dependencies create dependency conflict > issues for my team at Twitter. > I'd like the following expansions of the Python SDK's dependency ranges: > * oauth2client>=2.0.1,<*4* to at least oauth2client>=2.0.1,<=*4.1.2* > * httplib2>=0.8,<=*0.12.0* to at least httplib2>=0.8,<=*0.12.3* > I understand, from pull request > [8653|https://github.com/apache/beam/pull/8653] by [~altay], that the latter > is blocked in turn by googledatastore. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (BEAM-7839) Distinguish unknown and unrecognized states in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev resolved BEAM-7839. --- Fix Version/s: 2.16.0 Assignee: Valentyn Tymofieiev Resolution: Fixed > Distinguish unknown and unrecognized states in Dataflow runner. > --- > > Key: BEAM-7839 > URL: https://issues.apache.org/jira/browse/BEAM-7839 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.16.0 > > Time Spent: 40m > Remaining Estimate: 0h > > In BEAM-7766 [~kenn] mentioned that when reasoning about pipeline state, > Dataflow runner should distinguish between an unknown (to the service) > pipeline state state and unrecognized by the SDK pipeline state. An > unrecognized state may happen when service API introduces a new state that > older versions of the SDK will not know about. Filing this issue to track > introducing a distinction. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923739#comment-16923739 ] Chad Dombrova commented on BEAM-7060: - Note that work on type annotations is continuing in BEAM-7746 > Design Py3-compatible typehints annotation support in Beam 3. > - > > Key: BEAM-7060 > URL: https://issues.apache.org/jira/browse/BEAM-7060 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Fix For: 2.16.0 > > Time Spent: 15.5h > Remaining Estimate: 0h > > Existing [Typehints implementaiton in > Beam|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/ > ] heavily relies on internal details of CPython implementation, and some of > the assumptions of this implementation broke as of Python 3.6, see for > example: https://issues.apache.org/jira/browse/BEAM-6877, which makes > typehints support unusable on Python 3.6 as of now. [Python 3 Kanban > Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail] > lists several specific typehints-related breakages, prefixed with "TypeHints > Py3 Error". > We need to decide whether to: > - Deprecate in-house typehints implementation. > - Continue to support in-house implementation, which at this point is a stale > code and has other known issues. > - Attempt to use some off-the-shelf libraries for supporting > type-annotations, like Pytype, Mypy, PyAnnotate. > WRT to this decision we also need to plan on immediate next steps to unblock > adoption of Beam for Python 3.6+ users. One potential option may be to have > Beam SDK ignore any typehint annotations on Py 3.6+. > cc: [~udim], [~altay], [~robertwb]. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7839) Distinguish unknown and unrecognized states in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-7839?focusedWorklogId=307450=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307450 ] ASF GitHub Bot logged work on BEAM-7839: Author: ASF GitHub Bot Created on: 05/Sep/19 21:22 Start Date: 05/Sep/19 21:22 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #9132: [BEAM-7839] Default to UNRECOGNIZED state when a state cannot be accurately interpreted by the SDK. URL: https://github.com/apache/beam/pull/9132 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307450) Time Spent: 40m (was: 0.5h) > Distinguish unknown and unrecognized states in Dataflow runner. > --- > > Key: BEAM-7839 > URL: https://issues.apache.org/jira/browse/BEAM-7839 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Valentyn Tymofieiev >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > In BEAM-7766 [~kenn] mentioned that when reasoning about pipeline state, > Dataflow runner should distinguish between an unknown (to the service) > pipeline state state and unrecognized by the SDK pipeline state. An > unrecognized state may happen when service API introduces a new state that > older versions of the SDK will not know about. Filing this issue to track > introducing a distinction. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri resolved BEAM-7060. - Resolution: Fixed > Design Py3-compatible typehints annotation support in Beam 3. > - > > Key: BEAM-7060 > URL: https://issues.apache.org/jira/browse/BEAM-7060 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Fix For: 2.16.0 > > Time Spent: 15.5h > Remaining Estimate: 0h > > Existing [Typehints implementaiton in > Beam|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/ > ] heavily relies on internal details of CPython implementation, and some of > the assumptions of this implementation broke as of Python 3.6, see for > example: https://issues.apache.org/jira/browse/BEAM-6877, which makes > typehints support unusable on Python 3.6 as of now. [Python 3 Kanban > Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail] > lists several specific typehints-related breakages, prefixed with "TypeHints > Py3 Error". > We need to decide whether to: > - Deprecate in-house typehints implementation. > - Continue to support in-house implementation, which at this point is a stale > code and has other known issues. > - Attempt to use some off-the-shelf libraries for supporting > type-annotations, like Pytype, Mypy, PyAnnotate. > WRT to this decision we also need to plan on immediate next steps to unblock > adoption of Beam for Python 3.6+ users. One potential option may be to have > Beam SDK ignore any typehint annotations on Py 3.6+. > cc: [~udim], [~altay], [~robertwb]. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923735#comment-16923735 ] Udi Meiri commented on BEAM-7060: - Py3 annotations support has been merged. Closing. Other aspects such as mypy support and using the official types in the typing module will be handled in other JIRAs. > Design Py3-compatible typehints annotation support in Beam 3. > - > > Key: BEAM-7060 > URL: https://issues.apache.org/jira/browse/BEAM-7060 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Fix For: 2.16.0 > > Time Spent: 15.5h > Remaining Estimate: 0h > > Existing [Typehints implementaiton in > Beam|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/ > ] heavily relies on internal details of CPython implementation, and some of > the assumptions of this implementation broke as of Python 3.6, see for > example: https://issues.apache.org/jira/browse/BEAM-6877, which makes > typehints support unusable on Python 3.6 as of now. [Python 3 Kanban > Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail] > lists several specific typehints-related breakages, prefixed with "TypeHints > Py3 Error". > We need to decide whether to: > - Deprecate in-house typehints implementation. > - Continue to support in-house implementation, which at this point is a stale > code and has other known issues. > - Attempt to use some off-the-shelf libraries for supporting > type-annotations, like Pytype, Mypy, PyAnnotate. > WRT to this decision we also need to plan on immediate next steps to unblock > adoption of Beam for Python 3.6+ users. One potential option may be to have > Beam SDK ignore any typehint annotations on Py 3.6+. > cc: [~udim], [~altay], [~robertwb]. -- This message was sent by Atlassian Jira (v8.3.2#803003)
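For context on what "Py3 annotations support" means in practice: Python 3 function annotations are machine-readable through typing.get_type_hints, which is the raw material an SDK can consume instead of decorator-declared hints. A toy stdlib-only illustration of reading annotations this way (not Beam's actual implementation):

```python
from typing import List, get_type_hints

def split_words(line: str) -> List[str]:
    """Annotated the Python 3 way; no Beam decorators needed."""
    return line.split()

# get_type_hints resolves the annotations into real type objects,
# carrying the same information that explicit @with_input_types /
# @with_output_types decorators used to express.
hints = get_type_hints(split_words)
```

An SDK can then map hints["line"] and hints["return"] onto its own typehint and coder machinery, falling back to decorators where no annotations exist.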
[jira] [Work logged] (BEAM-7859) Portable Wordcount on Spark runner does not work in DOCKER execution mode.
[ https://issues.apache.org/jira/browse/BEAM-7859?focusedWorklogId=307446=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307446 ] ASF GitHub Bot logged work on BEAM-7859: Author: ASF GitHub Bot Created on: 05/Sep/19 21:16 Start Date: 05/Sep/19 21:16 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #9421: [BEAM-7859] set SDK worker parallelism to 1 in word count test URL: https://github.com/apache/beam/pull/9421 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307446) Time Spent: 1h (was: 50m) > Portable Wordcount on Spark runner does not work in DOCKER execution mode. > -- > > Key: BEAM-7859 > URL: https://issues.apache.org/jira/browse/BEAM-7859 > Project: Beam > Issue Type: Bug > Components: runner-spark, sdk-py-harness >Reporter: Valentyn Tymofieiev >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 1h > Remaining Estimate: 0h > > The error was observed during Beam 2.14.0 release validation, see: > https://issues.apache.org/jira/browse/BEAM-7224?focusedCommentId=16896831=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16896831 > Looks like master currently fails with a different error, both in Loopback > and Docker modes. > [~ibzib] [~altay] [~robertwb] [~angoenka] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=307444=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307444 ] ASF GitHub Bot logged work on BEAM-7060: Author: ASF GitHub Bot Created on: 05/Sep/19 21:13 Start Date: 05/Sep/19 21:13 Worklog Time Spent: 10m Work Description: udim commented on pull request #9283: [BEAM-7060] Type hints from Python 3 annotations URL: https://github.com/apache/beam/pull/9283 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307444) Time Spent: 15.5h (was: 15h 20m) > Design Py3-compatible typehints annotation support in Beam 3. > - > > Key: BEAM-7060 > URL: https://issues.apache.org/jira/browse/BEAM-7060 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Fix For: 2.16.0 > > Time Spent: 15.5h > Remaining Estimate: 0h > > Existing [Typehints implementaiton in > Beam|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/ > ] heavily relies on internal details of CPython implementation, and some of > the assumptions of this implementation broke as of Python 3.6, see for > example: https://issues.apache.org/jira/browse/BEAM-6877, which makes > typehints support unusable on Python 3.6 as of now. [Python 3 Kanban > Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail] > lists several specific typehints-related breakages, prefixed with "TypeHints > Py3 Error". > We need to decide whether to: > - Deprecate in-house typehints implementation. > - Continue to support in-house implementation, which at this point is a stale > code and has other known issues. 
> - Attempt to use some off-the-shelf libraries for supporting > type-annotations, like Pytype, Mypy, PyAnnotate. > WRT to this decision we also need to plan on immediate next steps to unblock > adoption of Beam for Python 3.6+ users. One potential option may be to have > Beam SDK ignore any typehint annotations on Py 3.6+. > cc: [~udim], [~altay], [~robertwb]. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7984) [python] The coder returned for typehints.List should be IterableCoder
[ https://issues.apache.org/jira/browse/BEAM-7984?focusedWorklogId=307439=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307439 ] ASF GitHub Bot logged work on BEAM-7984: Author: ASF GitHub Bot Created on: 05/Sep/19 21:07 Start Date: 05/Sep/19 21:07 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9344: [BEAM-7984] The coder returned for typehints.List should be IterableCoder URL: https://github.com/apache/beam/pull/9344#issuecomment-528582137 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307439) Time Spent: 3.5h (was: 3h 20m) > [python] The coder returned for typehints.List should be IterableCoder > -- > > Key: BEAM-7984 > URL: https://issues.apache.org/jira/browse/BEAM-7984 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > IterableCoder encodes a list and decodes to list, but > typecoders.registry.get_coder(typehints.List[bytes]) returns a > FastPrimitiveCoder. I don't see any reason why this would be advantageous. -- This message was sent by Atlassian Jira (v8.3.2#803003)
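[Editor's note] The point of BEAM-7984 is that a `typehints.List` hint should resolve to a coder that round-trips lists, not a generic fallback coder. The following toy class is purely illustrative (it is not Beam's IterableCoder, and the framing scheme is invented here) but shows the round-trip property being asked for:

```python
class IterableCoderSketch:
    """Illustrative only: encodes a list of byte strings with a simple
    4-byte length-prefixed framing, and decodes back to a list."""

    def encode(self, values):
        out = bytearray()
        for v in values:
            out += len(v).to_bytes(4, "big") + v
        return bytes(out)

    def decode(self, data):
        values, i = [], 0
        while i < len(data):
            n = int.from_bytes(data[i:i + 4], "big")
            i += 4
            values.append(data[i:i + n])
            i += n
        return values

coder = IterableCoderSketch()
encoded = coder.encode([b"a", b"bc"])
assert coder.decode(encoded) == [b"a", b"bc"]  # list in, list out
```

The property a registry lookup for `typehints.List[bytes]` should guarantee is exactly this: `decode(encode(xs)) == xs` for list inputs.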
[jira] [Work logged] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7738?focusedWorklogId=307440=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307440 ] ASF GitHub Bot logged work on BEAM-7738: Author: ASF GitHub Bot Created on: 05/Sep/19 21:07 Start Date: 05/Sep/19 21:07 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9268: [BEAM-7738] Add external transform support to PubsubIO URL: https://github.com/apache/beam/pull/9268#issuecomment-528582296 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307440) Time Spent: 3h (was: 2h 50m) > Support PubSubIO to be configured externally for use with other SDKs > > > Key: BEAM-7738 > URL: https://issues.apache.org/jira/browse/BEAM-7738 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp, runner-flink, sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Labels: portability > Time Spent: 3h > Remaining Estimate: 0h > > Now that KafkaIO is supported via the external transform API (BEAM-7029) we > should add support for PubSub. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8000) Add Delete method to gRPC JobService
[ https://issues.apache.org/jira/browse/BEAM-8000?focusedWorklogId=307438=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307438 ] ASF GitHub Bot logged work on BEAM-8000: Author: ASF GitHub Bot Created on: 05/Sep/19 21:05 Start Date: 05/Sep/19 21:05 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9489: [BEAM-8000] Add Delete method to JobService for clearing old jobs from memory URL: https://github.com/apache/beam/pull/9489#issuecomment-528581673 R: @lukecwik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307438) Time Spent: 20m (was: 10m) > Add Delete method to gRPC JobService > > > Key: BEAM-8000 > URL: https://issues.apache.org/jira/browse/BEAM-8000 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Labels: portability > Time Spent: 20m > Remaining Estimate: 0h > > As a user of the InMemoryJobService, I want a method to purge jobs from > memory when they are no longer needed, so that the service does not balloon > in memory usage over time. > I was planning to name this Delete. Also considering the name Purge. -- This message was sent by Atlassian Jira (v8.3.2#803003)
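[Editor's note] The pattern requested in BEAM-8000 is a purge operation on an in-memory job store. A toy dict-backed sketch of the idea (class and method names are hypothetical, not the actual InMemoryJobService API):

```python
class InMemoryJobStore:
    """Toy stand-in for an in-memory job service: jobs accumulate in a
    dict and can be purged once no longer needed, so memory does not
    balloon over time."""

    def __init__(self):
        self._jobs = {}

    def run(self, job_id, result):
        # Record the (finished) job under its id.
        self._jobs[job_id] = result

    def get(self, job_id):
        return self._jobs[job_id]

    def delete(self, job_id):
        # Purge the job; returns its result, or None if already gone.
        return self._jobs.pop(job_id, None)

store = InMemoryJobStore()
store.run("job-1", "DONE")
assert store.delete("job-1") == "DONE"
assert store.delete("job-1") is None  # already purged
```

In a gRPC JobService the `delete` call would be exposed as an RPC method; the memory-reclamation logic is the same.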
[jira] [Work logged] (BEAM-8000) Add Delete method to gRPC JobService
[ https://issues.apache.org/jira/browse/BEAM-8000?focusedWorklogId=307437=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307437 ] ASF GitHub Bot logged work on BEAM-8000: Author: ASF GitHub Bot Created on: 05/Sep/19 21:05 Start Date: 05/Sep/19 21:05 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9489: [BEAM-8000] Add Delete method to JobService for clearing old jobs from memory URL: https://github.com/apache/beam/pull/9489 As a user of the InMemoryJobService, I want a method to remove jobs from memory when they are no longer needed, so that the service does not grow in memory usage over time. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
[jira] [Commented] (BEAM-8148) [python] allow tests to be specified when using tox
[ https://issues.apache.org/jira/browse/BEAM-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923730#comment-16923730 ] Chad Dombrova commented on BEAM-8148: - PR is merged, so this is fixed! > [python] allow tests to be specified when using tox > --- > > Key: BEAM-8148 > URL: https://issues.apache.org/jira/browse/BEAM-8148 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > I don't know how to specify individual tests using gradle, and as a python > developer, it's way too opaque for me to figure out on my own. > The developer wiki suggests calling the tests directly: > {noformat} > python setup.py nosetests --tests :. > {noformat} > But this defeats the purpose of using tox, which is there to help users > manage the myriad virtual envs required to run the different tests. > Luckily there is an easier way! You just add {posargs} to the tox commands. > PR is coming shortly. -- This message was sent by Atlassian Jira (v8.3.2#803003)
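[Editor's note] The fix described in BEAM-8148 amounts to forwarding positional arguments through tox. A hedged sketch of the relevant tox.ini fragment (the env name and command are illustrative, not Beam's actual configuration):

```ini
[testenv:py37]
# {posargs} forwards anything after "--" on the tox command line to
# the test runner, so individual tests can be selected without
# bypassing tox's virtualenv management.
commands = python setup.py nosetests {posargs}
```

Invocation would then look like `tox -e py37 -- --tests some_module:SomeTestCase.test_method` (test path illustrative).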
[jira] [Work logged] (BEAM-7600) Spark portable runner: reuse SDK harness
[ https://issues.apache.org/jira/browse/BEAM-7600?focusedWorklogId=307435=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307435 ] ASF GitHub Bot logged work on BEAM-7600: Author: ASF GitHub Bot Created on: 05/Sep/19 21:02 Start Date: 05/Sep/19 21:02 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #9095: [BEAM-7600] borrow SDK harness management code into Spark runner URL: https://github.com/apache/beam/pull/9095#discussion_r321478397 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultExecutableStageContext.java ## @@ -102,7 +100,8 @@ private JobFactoryState(int maxFactories) { new ConcurrentHashMap<>(); @Override -public FlinkExecutableStageContext get(JobInfo jobInfo) { +public ExecutableStageContext get( +JobInfo jobInfo, SerializableFunction isReleaseSynchronous) { Review comment: I still feel that we should not have `isReleaseSynchronous` in `get` method as the usage can be unpredictable in case where we call `get` method at 2 locations for the same jobInfo but with different `isReleaseSynchronous` method. As we are using caching, the 1st `isReleaseSynchronous` will be applied and can lead to concurrency bug. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307435) Time Spent: 6h 40m (was: 6.5h) > Spark portable runner: reuse SDK harness > > > Key: BEAM-7600 > URL: https://issues.apache.org/jira/browse/BEAM-7600 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 6h 40m > Remaining Estimate: 0h > > Right now, we're creating a new SDK harness every time an executable stage is > run [1], which is expensive. We should be able to re-use code from the Flink > runner to re-use the SDK harness [2]. > > [1] > [https://github.com/apache/beam/blob/c9fb261bc7666788402840bb6ce1b0ce2fd445d1/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java#L135] > [2] > [https://github.com/apache/beam/blob/c9fb261bc7666788402840bb6ce1b0ce2fd445d1/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkDefaultExecutableStageContext.java] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307429=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307429 ] ASF GitHub Bot logged work on BEAM-7967: Author: ASF GitHub Bot Created on: 05/Sep/19 20:57 Start Date: 05/Sep/19 20:57 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9408: [BEAM-7967] Execute portable Flink application jar URL: https://github.com/apache/beam/pull/9408#issuecomment-528578625 Run PortableJar_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307429) Time Spent: 2h (was: 1h 50m) > Execute portable Flink application jar > -- > > Key: BEAM-7967 > URL: https://issues.apache.org/jira/browse/BEAM-7967 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 2h > Remaining Estimate: 0h > > [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307431=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307431 ] ASF GitHub Bot logged work on BEAM-7967: Author: ASF GitHub Bot Created on: 05/Sep/19 20:57 Start Date: 05/Sep/19 20:57 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9408: [BEAM-7967] Execute portable Flink application jar URL: https://github.com/apache/beam/pull/9408#issuecomment-528578775 Run PortableJar_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307431) Time Spent: 2h 10m (was: 2h) > Execute portable Flink application jar > -- > > Key: BEAM-7967 > URL: https://issues.apache.org/jira/browse/BEAM-7967 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 2h 10m > Remaining Estimate: 0h > > [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8155) Add PubsubIO.readMessagesWithCoderAndParseFn method
[ https://issues.apache.org/jira/browse/BEAM-8155?focusedWorklogId=307427=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307427 ] ASF GitHub Bot logged work on BEAM-8155: Author: ASF GitHub Bot Created on: 05/Sep/19 20:52 Start Date: 05/Sep/19 20:52 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #9473: [BEAM-8155] Add static PubsubIO.readMessagesWithCoderAndParseFn method URL: https://github.com/apache/beam/pull/9473#issuecomment-528576937 Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307427) Remaining Estimate: 0h Time Spent: 10m > Add PubsubIO.readMessagesWithCoderAndParseFn method > --- > > Key: BEAM-8155 > URL: https://issues.apache.org/jira/browse/BEAM-8155 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Claire McGinty >Assignee: Claire McGinty >Priority: Major > Fix For: 2.16.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Complements [#8879|https://github.com/apache/beam/pull/8879]; the constructor > for a generic-typed PubsubIO.Read is private, so some kind of public > constructor is needed to use custom parse functions with generic types. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8155) Add PubsubIO.readMessagesWithCoderAndParseFn method
[ https://issues.apache.org/jira/browse/BEAM-8155?focusedWorklogId=307428=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307428 ] ASF GitHub Bot logged work on BEAM-8155: Author: ASF GitHub Bot Created on: 05/Sep/19 20:52 Start Date: 05/Sep/19 20:52 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #9473: [BEAM-8155] Add static PubsubIO.readMessagesWithCoderAndParseFn method URL: https://github.com/apache/beam/pull/9473 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307428) Time Spent: 20m (was: 10m) > Add PubsubIO.readMessagesWithCoderAndParseFn method > --- > > Key: BEAM-8155 > URL: https://issues.apache.org/jira/browse/BEAM-8155 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Claire McGinty >Assignee: Claire McGinty >Priority: Major > Fix For: 2.16.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Complements [#8879|https://github.com/apache/beam/pull/8879]; the constructor > for a generic-typed PubsubIO.Read is private, so some kind of public > constructor is needed to use custom parse functions with generic types. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8113) FlinkRunner: Stage files from context classloader
[ https://issues.apache.org/jira/browse/BEAM-8113?focusedWorklogId=307426=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307426 ] ASF GitHub Bot logged work on BEAM-8113: Author: ASF GitHub Bot Created on: 05/Sep/19 20:48 Start Date: 05/Sep/19 20:48 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #9451: [BEAM-8113] Stage files from context classloader URL: https://github.com/apache/beam/pull/9451#issuecomment-528575435 I have to take another look through the comments/updates. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307426) Time Spent: 6h 50m (was: 6h 40m) > FlinkRunner: Stage files from context classloader > - > > Key: BEAM-8113 > URL: https://issues.apache.org/jira/browse/BEAM-8113 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Major > Time Spent: 6h 50m > Remaining Estimate: 0h > > Currently, only files from {{FlinkRunner.class.getClassLoader()}} are staged > by default. Add also files from > {{Thread.currentThread().getContextClassLoader()}}. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=307424=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307424 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 05/Sep/19 20:44 Start Date: 05/Sep/19 20:44 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #9257: [BEAM-7389] Add DoFn methods sample URL: https://github.com/apache/beam/pull/9257#issuecomment-528574027 @aaltay Tests are passing. More detailed descriptions will be available in the docs. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307424) Time Spent: 55h 20m (was: 55h 10m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 55h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307421=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307421 ] ASF GitHub Bot logged work on BEAM-7967: Author: ASF GitHub Bot Created on: 05/Sep/19 20:43 Start Date: 05/Sep/19 20:43 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9408: [BEAM-7967] Execute portable Flink application jar URL: https://github.com/apache/beam/pull/9408#issuecomment-528573675 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307421) Time Spent: 1h 50m (was: 1h 40m) > Execute portable Flink application jar > -- > > Key: BEAM-7967 > URL: https://issues.apache.org/jira/browse/BEAM-7967 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 1h 50m > Remaining Estimate: 0h > > [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads
[ https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=307420=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307420 ] ASF GitHub Bot logged work on BEAM-8151: Author: ASF GitHub Bot Created on: 05/Sep/19 20:35 Start Date: 05/Sep/19 20:35 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #9477: [BEAM-8151, BEAM-7848] Up the max number of threads inside the SDK harness to a default of 10k URL: https://github.com/apache/beam/pull/9477#issuecomment-528571071 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307420) Time Spent: 3.5h (was: 3h 20m) > Allow the Python SDK to use many many threads > - > > Key: BEAM-8151 > URL: https://issues.apache.org/jira/browse/BEAM-8151 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > We need to use a thread pool which shrinks the number of active threads when > they are not being used. > > This is to prevent any stuckness issues related to a runner scheduling more > work items then there are "work" threads inside the SDK harness. > > By default the control plane should have all "requests" being processed in > parallel and the runner is responsible for not overloading the SDK with too > much work. -- This message was sent by Atlassian Jira (v8.3.2#803003)
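[Editor's note] BEAM-8151 calls for a thread pool that shrinks when idle. A toy sketch of that idea, not the SDK harness implementation: each worker thread exits after sitting idle for a timeout, so the pool shrinks back down when load drops.

```python
import queue
import threading

class ShrinkingPool:
    """Toy shrinking pool: a worker thread is spawned per submission
    (up to max_workers) and exits after idle_timeout seconds without
    work, so thread count tracks actual load."""

    def __init__(self, max_workers=10000, idle_timeout=0.2):
        self._tasks = queue.Queue()
        self._lock = threading.Lock()
        self._workers = 0
        self._max = max_workers
        self._idle = idle_timeout

    def _worker(self):
        while True:
            try:
                fn = self._tasks.get(timeout=self._idle)
            except queue.Empty:
                break  # idle too long: let this thread die
            fn()
        with self._lock:
            self._workers -= 1  # pool shrinks

    def submit(self, fn):
        self._tasks.put(fn)
        with self._lock:
            if self._workers < self._max:
                self._workers += 1
                threading.Thread(target=self._worker, daemon=True).start()
```

The high default cap mirrors the "many many threads" goal: the runner controls parallelism, while idle threads cost nothing because they terminate themselves.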
[jira] [Created] (BEAM-8158) Loosen Python dependency restrictions: httplib2, oauth2client
Jonathan created BEAM-8158: -- Summary: Loosen Python dependency restrictions: httplib2, oauth2client Key: BEAM-8158 URL: https://issues.apache.org/jira/browse/BEAM-8158 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Jonathan The Beam Python SDK's current pinned dependencies create dependency conflict issues for my team at Twitter. I'd like the following expansions of the Python SDK's dependency ranges: * oauth2client>=2.0.1,<4 to at least oauth2client>=2.0.1,<=4.1.2 * httplib2>=0.8,<=0.12.0 to at least httplib2>=0.8,<=0.12.3 I understand, from pull request [8653|https://github.com/apache/beam/pull/8653] by [~altay], that the latter is blocked in turn by googledatastore. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-2572) Implement an S3 filesystem for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923701#comment-16923701 ] Pablo Estrada commented on BEAM-2572: - [~yajunwong] thanks for working on this. I think it's worth contributing - it just requires healthy unit tests. > Implement an S3 filesystem for Python SDK > - > > Key: BEAM-2572 > URL: https://issues.apache.org/jira/browse/BEAM-2572 > Project: Beam > Issue Type: Task > Components: sdk-py-core >Reporter: Dmitry Demeshchuk >Priority: Minor > Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec > > There are two paths worth exploring, to my understanding: > 1. Sticking to the HDFS-based approach (like it's done in Java). > 2. Using boto/boto3 for accessing S3 through its common API endpoints. > I personally prefer the second approach, for a few reasons: > 1. In real life, HDFS and S3 have different consistency guarantees, therefore > their behaviors may contradict each other in some edge cases (say, we write > something to S3, but it's not immediately accessible for reading from another > end). > 2. There are other AWS-based sources and sinks we may want to create in the > future: DynamoDB, Kinesis, SQS, etc. > 3. boto3 already provides somewhat good logic for basic things like > reattempting. > Whatever path we choose, there's another problem related to this: we > currently cannot pass any global settings (say, pipeline options, or just an > arbitrary kwarg) to a filesystem. Because of that, we'd have to setup the > runner nodes to have AWS keys set up in the environment, which is not trivial > to achieve and doesn't look too clean either (I'd rather see one single place > for configuring the runner options). > Also, it's worth mentioning that I already have a janky S3 filesystem > implementation that only supports DirectRunner at the moment (because of the > previous paragraph). I'm perfectly fine finishing it myself, with some > guidance from the maintainers. 
> Where should I move on from here, and whose input should I be looking for? > Thanks! -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8113) FlinkRunner: Stage files from context classloader
[ https://issues.apache.org/jira/browse/BEAM-8113?focusedWorklogId=307406=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307406 ] ASF GitHub Bot logged work on BEAM-8113: Author: ASF GitHub Bot Created on: 05/Sep/19 19:44 Start Date: 05/Sep/19 19:44 Worklog Time Spent: 10m Work Description: je-ik commented on issue #9451: [BEAM-8113] Stage files from context classloader URL: https://github.com/apache/beam/pull/9451#issuecomment-528549486 @lukecwik can I take your last comment as that you are okay with this PR? I'd like to merge it tomorrow, so that I don't have to rebase it again and again. :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307406) Time Spent: 6h 40m (was: 6.5h) > FlinkRunner: Stage files from context classloader > - > > Key: BEAM-8113 > URL: https://issues.apache.org/jira/browse/BEAM-8113 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Major > Time Spent: 6h 40m > Remaining Estimate: 0h > > Currently, only files from {{FlinkRunner.class.getClassLoader()}} are staged > by default. Add also files from > {{Thread.currentThread().getContextClassLoader()}}. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8113) FlinkRunner: Stage files from context classloader
[ https://issues.apache.org/jira/browse/BEAM-8113?focusedWorklogId=307405=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307405 ] ASF GitHub Bot logged work on BEAM-8113: Author: ASF GitHub Bot Created on: 05/Sep/19 19:42 Start Date: 05/Sep/19 19:42 Worklog Time Spent: 10m Work Description: je-ik commented on issue #9451: [BEAM-8113] Stage files from context classloader URL: https://github.com/apache/beam/pull/9451#issuecomment-528548813 Run Flink ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307405) Time Spent: 6.5h (was: 6h 20m) > FlinkRunner: Stage files from context classloader > - > > Key: BEAM-8113 > URL: https://issues.apache.org/jira/browse/BEAM-8113 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Major > Time Spent: 6.5h > Remaining Estimate: 0h > > Currently, only files from {{FlinkRunner.class.getClassLoader()}} are staged > by default. Add also files from > {{Thread.currentThread().getContextClassLoader()}}. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8157) Flink state requests return wrong state in timers when encoded key is length-prefixed
[ https://issues.apache.org/jira/browse/BEAM-8157?focusedWorklogId=307397=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307397 ] ASF GitHub Bot logged work on BEAM-8157: Author: ASF GitHub Bot Created on: 05/Sep/19 19:18 Start Date: 05/Sep/19 19:18 Worklog Time Spent: 10m Work Description: tweise commented on pull request #9484: [BEAM-8157] Remove length prefix from state key for Flink's state backend URL: https://github.com/apache/beam/pull/9484#discussion_r321437183 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java ## @@ -343,11 +343,11 @@ public void clear(K key, W window) { } private void prepareStateBackend(K key) { - // Key for state request is shipped already encoded as ByteString, - // this is mostly a wrapping with ByteBuffer. We still follow the - // usual key encoding procedure. - // final ByteBuffer encodedKey = FlinkKeyUtils.encodeKey(key, keyCoder); - final ByteBuffer encodedKey = ByteBuffer.wrap(key.toByteArray()); + // Key for state request is shipped already encoded as ByteString, but it is Review comment: https://gist.github.com/tweise/e1d95369234a52b12f90794f6032f039 That's working on 2.14 and also on master as of 4460f03cefbde5cf66c797408439321de2f40692 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307397) Time Spent: 2h (was: 1h 50m) > Flink state requests return wrong state in timers when encoded key is > length-prefixed > - > > Key: BEAM-8157 > URL: https://issues.apache.org/jira/browse/BEAM-8157 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.13.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.16.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Due to multiple changes made in BEAM-7126, the Flink internal key encoding is > broken when the key is encoded with a length prefix. The Flink runner > requires the internal key to be encoded without a length prefix. -- This message was sent by Atlassian Jira (v8.3.2#803003)
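The BEAM-8157 diff above turns on how the key bytes are framed before they reach Flink's state backend. The actual code is Java inside Beam's Flink runner; the following is only a conceptual Python sketch (a fixed 4-byte length stands in for Beam's varint length prefix, and all names here are illustrative) of why a length-prefixed encoding of the same key can no longer match the raw key bytes used elsewhere:

```python
import struct

def encode_raw(key: bytes) -> bytes:
    # The state request ships the key already encoded: use the bytes as-is.
    return key

def encode_length_prefixed(key: bytes) -> bytes:
    # A framing that a nested/length-prefixing coder context would add:
    # here a 4-byte big-endian length followed by the payload (Beam's real
    # LengthPrefixCoder uses a varint; this is just the shape of the problem).
    return struct.pack(">I", len(key)) + key

key = b"user-42"
# The two framings disagree, so state written under one framing is
# invisible when looked up under the other -- which is how timers ended
# up reading the "wrong" (empty) state for a length-prefixed key.
assert encode_raw(key) != encode_length_prefixed(key)
```

The fix discussed in PR #9484 is, per its title, to make both sides agree by removing the length prefix from the state key.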
[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=307388=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307388 ] ASF GitHub Bot logged work on BEAM-7060: Author: ASF GitHub Bot Created on: 05/Sep/19 18:15 Start Date: 05/Sep/19 18:15 Worklog Time Spent: 10m Work Description: udim commented on issue #9283: [BEAM-7060] Type hints from Python 3 annotations URL: https://github.com/apache/beam/pull/9283#issuecomment-528509353 Run Python 2 PostCommit Issue Time Tracking --- Worklog Id: (was: 307388) Time Spent: 15h 20m (was: 15h 10m) > Design Py3-compatible typehints annotation support in Beam 3. > - > > Key: BEAM-7060 > URL: https://issues.apache.org/jira/browse/BEAM-7060 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Fix For: 2.16.0 > > Time Spent: 15h 20m > Remaining Estimate: 0h > > Existing [Typehints implementation in > Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/] > heavily relies on internal details of the CPython implementation, and some > of the assumptions of this implementation broke as of Python 3.6, see for > example: https://issues.apache.org/jira/browse/BEAM-6877, which makes > typehints support unusable on Python 3.6 as of now. [Python 3 Kanban > Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail] > lists several specific typehints-related breakages, prefixed with "TypeHints > Py3 Error". > We need to decide whether to: > - Deprecate the in-house typehints implementation. > - Continue to support the in-house implementation, which at this point is > stale code and has other known issues. 
> - Attempt to use some off-the-shelf libraries for supporting > type-annotations, like Pytype, Mypy, PyAnnotate. > WRT to this decision we also need to plan on immediate next steps to unblock > adoption of Beam for Python 3.6+ users. One potential option may be to have > Beam SDK ignore any typehint annotations on Py 3.6+. > cc: [~udim], [~altay], [~robertwb]. -- This message was sent by Atlassian Jira (v8.3.2#803003)
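For context on the feature under review in PR #9283 (type hints from Python 3 annotations), the mechanism such support builds on is available in the standard library: annotations live directly on the function and can be read back at runtime, with no Beam-specific decorator. A minimal standard-library sketch (the function name is illustrative, not Beam's API):

```python
from typing import List, get_type_hints

def split_words(line: str) -> List[str]:
    """A DoFn-style callable whose type hints come from Py3 annotations."""
    return line.split()

# An SDK can recover the input and output types from the annotations alone:
hints = get_type_hints(split_words)
assert hints["line"] is str
assert hints["return"] == List[str]
```

This is the portable alternative to the pre-3.0 decorator style that the in-house typehints implementation relied on.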
[jira] [Commented] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields
[ https://issues.apache.org/jira/browse/BEAM-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923683#comment-16923683 ] Udi Meiri commented on BEAM-7819: - Also, you might get inconsistent results from Dataflow: sometimes the message_id and publish_time will be filled and sometimes they will be empty. The feature is currently being rolled out.
> PubsubMessage message parsing is lacking non-attribute fields
> Key: BEAM-7819
> URL: https://issues.apache.org/jira/browse/BEAM-7819
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Reporter: Ahmet Altay
> Assignee: Matthew Darwin
> Priority: Major
> Time Spent: 9h 40m
> Remaining Estimate: 0h
>
> User reported issue:
> https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E
> """
> Looking at the source code, with my untrained python eyes, I think if the
> intention is to include the message id and the publish time in the attributes
> attribute of the PubsubMessage type, then the protobuf mapping is missing
> something:
> {code}
> @staticmethod
> def _from_proto_str(proto_msg):
>   """Construct from serialized form of ``PubsubMessage``.
>   Args:
>     proto_msg: String containing a serialized protobuf of type
>       https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
>   Returns:
>     A new PubsubMessage object.
>   """
>   msg = pubsub.types.pubsub_pb2.PubsubMessage()
>   msg.ParseFromString(proto_msg)
>   # Convert ScalarMapContainer to dict.
>   attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
>   return PubsubMessage(msg.data, attributes)
> {code}
> The protobuf definition is here:
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> and so it looks as if the message_id and publish_time are not being parsed as
> they are separate from the attributes. Perhaps the PubsubMessage class needs
> expanding to include these as attributes, or they would need adding to the
> dictionary for attributes. This would only need doing for _from_proto_str
> as obviously they would not need to be populated when transmitting a message
> to PubSub.
> My python is not great, I'm assuming the latter option would need to look
> something like this?
> {code}
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> attributes.update({'message_id': msg.message_id,
>                    'publish_time': msg.publish_time})
> return PubsubMessage(msg.data, attributes)
> {code}
> """
-- This message was sent by Atlassian Jira (v8.3.2#803003)
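The pattern the reporter proposes — grafting the non-attribute fields onto the parsed attributes dict — can be shown without the real pubsub_pb2 bindings. In this sketch a plain stand-in class (`FakeProtoMessage` and `parse` are hypothetical names, not Beam's implementation) takes the place of the protobuf message:

```python
class FakeProtoMessage:
    """Stand-in for pubsub_pb2.PubsubMessage with the fields in question."""
    def __init__(self, data, attributes, message_id, publish_time):
        self.data = data
        self.attributes = attributes
        self.message_id = message_id
        self.publish_time = publish_time

def parse(msg):
    # Copy the map-like attributes, then fold in the non-attribute fields,
    # mirroring the reporter's proposed attributes.update(...) approach.
    attributes = dict(msg.attributes)
    attributes.update({'message_id': msg.message_id,
                       'publish_time': msg.publish_time})
    return (msg.data, attributes)

data, attrs = parse(
    FakeProtoMessage(b'hi', {'k': 'v'}, 'id-1', '2019-09-05T00:00:00Z'))
assert attrs['message_id'] == 'id-1'
```

As the reporter notes, this would only apply on the parsing path; the fields are server-assigned and would not be populated when publishing.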
[jira] [Comment Edited] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields
[ https://issues.apache.org/jira/browse/BEAM-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923665#comment-16923665 ] Udi Meiri edited comment on BEAM-7819 at 9/5/19 6:06 PM: - Adding '--kms_key_name ""' should fix the issue. Don't forget to rerun "python setup.py sdist" to recreate the tarball passed to sdk_location every time. I'm not sure what the worker_jar does but I'd leave it in. :) {code} python setup.py sdist && ./scripts/run_integration_test.sh --project [my-project] --gcs_location gs://[my-project]/tmp --test_opts --tests=apache_beam.io.gcp.pubsub_integration_test --runner TestDataflowRunner --sdk_location ./dist/apache-beam-2.16.0.dev0.tar.gz --worker_jar ../../runners/google-cloud-dataflow-java/worker/build/libs/beam-runners-google-cloud-dataflow-java-fn-api-worker-2.16.0-SNAPSHOT.jar --kms_key_name "" {code} was (Author: udim): Adding '--kms_key_name ""' should fix the issue. Don't forget to rerun "python setup.py sdist" to recreate the tarball passed to sdk_location every time. I'm not sure what the worker_jar does but I'd leave it in. 
:) {code} python setup.py sdist && ./scripts/run_integration_test.sh --project [my-project] --gcs_location gs://[my-project]/tmp --test_opts --tests=apache_beam.io.gcp.pubsub_integration_test --runner TestDataflowRunner --sdk_location ./dist/apache-beam-2.16.0.dev0.tar.gz --worker_jar ../../runners/google-cloud-dataflow-java/worker/build/libs/beam-runners-google-cloud-dataflow-java-fn-api-worker-2.16.0-SNAPSHOT.jar --kms_key_name="" {code}
[jira] [Commented] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields
[ https://issues.apache.org/jira/browse/BEAM-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923665#comment-16923665 ] Udi Meiri commented on BEAM-7819: - Adding '--kms_key_name ""' should fix the issue. Don't forget to rerun "python setup.py sdist" to recreate the tarball passed to sdk_location every time. I'm not sure what the worker_jar does but I'd leave it in. :) {code} python setup.py sdist && ./scripts/run_integration_test.sh --project [my-project] --gcs_location gs://[my-project]/tmp --test_opts --tests=apache_beam.io.gcp.pubsub_integration_test --runner TestDataflowRunner --sdk_location ./dist/apache-beam-2.16.0.dev0.tar.gz --worker_jar ../../runners/google-cloud-dataflow-java/worker/build/libs/beam-runners-google-cloud-dataflow-java-fn-api-worker-2.16.0-SNAPSHOT.jar --kms_key_name="" {code}
[jira] [Work logged] (BEAM-7859) Portable Wordcount on Spark runner does not work in DOCKER execution mode.
[ https://issues.apache.org/jira/browse/BEAM-7859?focusedWorklogId=307374=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307374 ] ASF GitHub Bot logged work on BEAM-7859: Author: ASF GitHub Bot Created on: 05/Sep/19 17:51 Start Date: 05/Sep/19 17:51 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9421: [BEAM-7859] set SDK worker parallelism to 1 in word count test URL: https://github.com/apache/beam/pull/9421#issuecomment-528496517 Ping @angoenka -- this should be a no-op This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307374) Time Spent: 50m (was: 40m) > Portable Wordcount on Spark runner does not work in DOCKER execution mode. > -- > > Key: BEAM-7859 > URL: https://issues.apache.org/jira/browse/BEAM-7859 > Project: Beam > Issue Type: Bug > Components: runner-spark, sdk-py-harness >Reporter: Valentyn Tymofieiev >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 50m > Remaining Estimate: 0h > > The error was observed during Beam 2.14.0 release validation, see: > https://issues.apache.org/jira/browse/BEAM-7224?focusedCommentId=16896831=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16896831 > Looks like master currently fails with a different error, both in Loopback > and Docker modes. > [~ibzib] [~altay] [~robertwb] [~angoenka] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8153) PubSubIntegrationTest failing in post-commit
[ https://issues.apache.org/jira/browse/BEAM-8153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923655#comment-16923655 ] Udi Meiri commented on BEAM-8153: - Yes, the immediate issue is resolved > PubSubIntegrationTest failing in post-commit > > > Key: BEAM-8153 > URL: https://issues.apache.org/jira/browse/BEAM-8153 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Udi Meiri >Assignee: Matthew Darwin >Priority: Major > > Most likely due to: https://github.com/apache/beam/pull/9232 > {code} > 11:44:31 > == > 11:44:31 ERROR: test_streaming_with_attributes > (apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest) > 11:44:31 > -- > 11:44:31 Traceback (most recent call last): > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", > line 199, in test_streaming_with_attributes > 11:44:31 self._test_streaming(with_attributes=True) > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", > line 191, in _test_streaming > 11:44:31 timestamp_attribute=self.TIMESTAMP_ATTRIBUTE) > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub_it_pipeline.py", > line 91, in run_pipeline > 11:44:31 result = p.run() > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py", > line 420, in run > 11:44:31 return self.runner.run_pipeline(self, self._options) > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py", > line 51, in run_pipeline > 11:44:31 hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 11:44:31 File > 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/assert_that.py", > line 43, in assert_that > 11:44:31 _assert_match(actual=arg1, matcher=arg2, reason=arg3) > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/assert_that.py", > line 49, in _assert_match > 11:44:31 if not matcher.matches(actual): > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/core/allof.py", > line 16, in matches > 11:44:31 if not matcher.matches(item): > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/base_matcher.py", > line 28, in matches > 11:44:31 match_result = self._matches(item) > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/tests/pubsub_matcher.py", > line 91, in _matches > 11:44:31 return Counter(self.messages) == Counter(self.expected_msg) > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/collections/__init__.py", > line 566, in __init__ > 11:44:31 self.update(*args, **kwds) > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/collections/__init__.py", > line 653, in update > 11:44:31 _count_elements(self, iterable) > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub.py", > line 83, in __hash__ > 11:44:31 self.message_id, self.publish_time.seconds, > 11:44:31 AttributeError: 'NoneType' object has no attribute 'seconds' > {code} -- This message was sent by Atlassian Jira 
(v8.3.2#803003)
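The BEAM-8153 traceback bottoms out in PubsubMessage.__hash__ dereferencing publish_time.seconds when publish_time is None — consistent with the rollout note above that Dataflow sometimes leaves the field empty. A guarded sketch of the pattern (illustrative class, not the committed fix):

```python
class Msg:
    """Toy message with the optional server-assigned fields."""
    def __init__(self, data, message_id=None, publish_time=None):
        self.data = data
        self.message_id = message_id
        self.publish_time = publish_time

    def __hash__(self):
        # Fall back to None when the optional proto timestamp is absent,
        # instead of unconditionally reading publish_time.seconds.
        seconds = self.publish_time.seconds if self.publish_time else None
        return hash((self.data, self.message_id, seconds))

# Hashing no longer raises AttributeError for messages without a publish time,
# so Counter(...) in pubsub_matcher can count them.
assert isinstance(hash(Msg(b'payload')), int)
```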
[jira] [Work logged] (BEAM-7984) [python] The coder returned for typehints.List should be IterableCoder
[ https://issues.apache.org/jira/browse/BEAM-7984?focusedWorklogId=307367=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307367 ] ASF GitHub Bot logged work on BEAM-7984: Author: ASF GitHub Bot Created on: 05/Sep/19 17:42 Start Date: 05/Sep/19 17:42 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9344: [BEAM-7984] The coder returned for typehints.List should be IterableCoder URL: https://github.com/apache/beam/pull/9344#issuecomment-528493036 Thanks for tracking this down. https://github.com/apache/beam/pull/9486 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307367) Time Spent: 3h 20m (was: 3h 10m) > [python] The coder returned for typehints.List should be IterableCoder > -- > > Key: BEAM-7984 > URL: https://issues.apache.org/jira/browse/BEAM-7984 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > IterableCoder encodes a list and decodes to list, but > typecoders.registry.get_coder(typehints.List[bytes]) returns a > FastPrimitiveCoder. I don't see any reason why this would be advantageous. -- This message was sent by Atlassian Jira (v8.3.2#803003)
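BEAM-7984 is about which coder a type-to-coder registry hands back for a List hint. A toy registry (pure illustration using `typing`, not Beam's typecoders module or its real coders) makes the requested dispatch concrete:

```python
import typing

class IterableCoder:
    """Toy coder: encodes a list of bytes and decodes back to a list.
    Uses b'|' as a separator, so it is only valid for elements without b'|'."""
    def encode(self, value):
        return b'|'.join(value)
    def decode(self, data):
        return data.split(b'|')

class FastPrimitiveCoder:
    """Toy fallback coder (decodes to whatever it pickled, not necessarily list)."""

def get_coder(typehint):
    # Dispatch List[...] hints to IterableCoder rather than the fallback --
    # the behaviour the ticket asks for, so round-tripping preserves list-ness.
    if typing.get_origin(typehint) is list:
        return IterableCoder()
    return FastPrimitiveCoder()

coder = get_coder(typing.List[bytes])
assert isinstance(coder, IterableCoder)
assert coder.decode(coder.encode([b'a', b'b'])) == [b'a', b'b']
```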
[jira] [Comment Edited] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields
[ https://issues.apache.org/jira/browse/BEAM-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923646#comment-16923646 ] Matthew Darwin edited comment on BEAM-7819 at 9/5/19 5:40 PM: -- Struggling a little with running the integration test locally; does it need both a the tarball building and the worker_jar? ./scripts/run_integration_test.sh --project [my-project] --gcs_location gs://[my-project]/tmp --test_opts --tests=apache_beam.io.gcp.pubsub_integration_test --runner TestDataflowRunner --sdk_location ./dist/apache-beam-2.16.0.dev0.tar.gz --worker_jar ../../runners/google-cloud-dataflow-java/worker/build/libs/beam-runners-google-cloud-dataflow-java-fn-api-worker-2.16.0-SNAPSHOT.jar {\{AssertionError: }} {{Expected: (Test pipeline expected terminated in state: RUNNING and Expected 4 messages.)}} \{{ but: Test pipeline expected terminated in state: RUNNING Test pipeline job terminated in state: FAILED}} In addition, for the pubsub integration test, given that pubsub will generate the message_id and publish_time, I'm not sure how exactly to handle that in the expected messages? was (Author: matt-darwin): Struggling a little with running the integration test locally; does it need both a the tarball building and the worker_jar? ./scripts/run_integration_test.sh --project [my-project] --gcs_location gs://[my-project]/tmp --test_opts --tests=apache_beam.io.gcp.pubsub_integration_test --runner TestDataflowRunner --sdk_location ./dist/apache-beam-2.16.0.dev0.tar.gz {{AssertionError: }} {{Expected: (Test pipeline expected terminated in state: RUNNING and Expected 4 messages.)}} {{ but: Test pipeline expected terminated in state: RUNNING Test pipeline job terminated in state: FAILED}} In addition, for the pubsub integration test, given that pubsub will generate the message_id and publish_time, I'm not sure how exactly to handle that in the expected messages? 
[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307365=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307365 ] ASF GitHub Bot logged work on BEAM-7967: Author: ASF GitHub Bot Created on: 05/Sep/19 17:39 Start Date: 05/Sep/19 17:39 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9408: [BEAM-7967] Execute portable Flink application jar URL: https://github.com/apache/beam/pull/9408#issuecomment-528491791 > When is this test intended to run? Postcommit only? @tweise Postcommit or trigger only This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307365) Time Spent: 1h 40m (was: 1.5h) > Execute portable Flink application jar > -- > > Key: BEAM-7967 > URL: https://issues.apache.org/jira/browse/BEAM-7967 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 1h 40m > Remaining Estimate: 0h > > [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Assigned] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields
[ https://issues.apache.org/jira/browse/BEAM-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Darwin reassigned BEAM-7819: Assignee: Matthew Darwin (was: Udi Meiri)
[jira] [Commented] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields
[ https://issues.apache.org/jira/browse/BEAM-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923646#comment-16923646 ] Matthew Darwin commented on BEAM-7819: -- Struggling a little with running the integration test locally; does it need both the tarball building and the worker_jar? ./scripts/run_integration_test.sh --project [my-project] --gcs_location gs://[my-project]/tmp --test_opts --tests=apache_beam.io.gcp.pubsub_integration_test --runner TestDataflowRunner --sdk_location ./dist/apache-beam-2.16.0.dev0.tar.gz {{AssertionError: }} {{Expected: (Test pipeline expected terminated in state: RUNNING and Expected 4 messages.)}} {{ but: Test pipeline expected terminated in state: RUNNING Test pipeline job terminated in state: FAILED}} In addition, for the pubsub integration test, given that pubsub will generate the message_id and publish_time, I'm not sure how exactly to handle that in the expected messages?
[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307364=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307364 ] ASF GitHub Bot logged work on BEAM-7967: Author: ASF GitHub Bot Created on: 05/Sep/19 17:37 Start Date: 05/Sep/19 17:37 Worklog Time Spent: 10m Work Description: tweise commented on issue #9408: [BEAM-7967] Execute portable Flink application jar URL: https://github.com/apache/beam/pull/9408#issuecomment-528491002 I mean it is great to have a backdoor to publish container snapshots, but.. When is this test intended to run? Postcommit only? Issue Time Tracking --- Worklog Id: (was: 307364) Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307357=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307357 ] ASF GitHub Bot logged work on BEAM-7967: Author: ASF GitHub Bot Created on: 05/Sep/19 17:30 Start Date: 05/Sep/19 17:30 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9408: [BEAM-7967] Execute portable Flink application jar URL: https://github.com/apache/beam/pull/9408#issuecomment-528488379 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307357) Time Spent: 1h 20m (was: 1h 10m) > Execute portable Flink application jar > -- > > Key: BEAM-7967 > URL: https://issues.apache.org/jira/browse/BEAM-7967 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 1h 20m > Remaining Estimate: 0h > > [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8157) Flink state requests return wrong state in timers when encoded key is length-prefixed
[ https://issues.apache.org/jira/browse/BEAM-8157?focusedWorklogId=307355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307355 ] ASF GitHub Bot logged work on BEAM-8157: Author: ASF GitHub Bot Created on: 05/Sep/19 17:25 Start Date: 05/Sep/19 17:25 Worklog Time Spent: 10m Work Description: tweise commented on pull request #9484: [BEAM-8157] Remove length prefix from state key for Flink's state backend URL: https://github.com/apache/beam/pull/9484#discussion_r321387508
## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
## @@ -343,11 +343,11 @@ public void clear(K key, W window) { }
   private void prepareStateBackend(K key) {
-    // Key for state request is shipped already encoded as ByteString,
-    // this is mostly a wrapping with ByteBuffer. We still follow the
-    // usual key encoding procedure.
-    // final ByteBuffer encodedKey = FlinkKeyUtils.encodeKey(key, keyCoder);
-    final ByteBuffer encodedKey = ByteBuffer.wrap(key.toByteArray());
+    // Key for state request is shipped already encoded as ByteString, but it is
Review comment: There is something that I'm missing. State and timers work as of release 2.14 with Flink 1.8. What has caused it to break? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307355) Time Spent: 1h 50m (was: 1h 40m) > Flink state requests return wrong state in timers when encoded key is > length-prefixed > - > > Key: BEAM-8157 > URL: https://issues.apache.org/jira/browse/BEAM-8157 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.13.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.16.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Due to multiple changes made in BEAM-7126, the Flink internal key encoding is > broken when the key is encoded with a length prefix. The Flink runner > requires the internal key to be encoded without a length prefix. -- This message was sent by Atlassian Jira (v8.3.2#803003)
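The mismatch the BEAM-8157 entry describes can be illustrated with a small pure-Python sketch. This is not Beam's actual coder code; the 4-byte big-endian prefix below is an assumption standing in for Beam's varint-based LengthPrefixCoder, used only to show why the same logical key stops matching.

```python
# Illustrative sketch: a state backend keyed on raw bytes treats a
# length-prefixed encoding of the same key as a different key, so
# state lookups made with the prefixed form miss.

import struct

def encode_raw(key: bytes) -> bytes:
    # The un-prefixed encoding the Flink runner expects internally.
    return key

def encode_length_prefixed(key: bytes) -> bytes:
    # Hypothetical 4-byte length prefix; Beam's real LengthPrefixCoder
    # uses a varint, but the effect on the key bytes is the same.
    return struct.pack('>I', len(key)) + key

key = b'user-42'
state = {encode_raw(key): 'expected value'}  # state stored under raw bytes

# A lookup made with the length-prefixed encoding does not find it:
assert encode_length_prefixed(key) not in state
assert encode_raw(key) in state
```

This is consistent with the issue summary: the runner requires the internal key without a length prefix, so any path that wraps the key breaks state and timer lookups.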
[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout
[ https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=307354=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307354 ] ASF GitHub Bot logged work on BEAM-7616: Author: ASF GitHub Bot Created on: 05/Sep/19 17:23 Start Date: 05/Sep/19 17:23 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools use urllib with the global timeout. Set it to 60 seconds # to prevent network related stuckness issues. URL: https://github.com/apache/beam/pull/9401#issuecomment-528485156 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307354) Time Spent: 5h (was: 4h 50m) > urlopen calls could get stuck without a timeout > --- > > Key: BEAM-7616 > URL: https://issues.apache.org/jira/browse/BEAM-7616 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Udi Meiri >Priority: Blocker > Fix For: 2.14.0, 2.16.0 > > Time Spent: 5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout
[ https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=307353=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307353 ] ASF GitHub Bot logged work on BEAM-7616: Author: ASF GitHub Bot Created on: 05/Sep/19 17:22 Start Date: 05/Sep/19 17:22 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools use urllib with the global timeout. Set it to 60 seconds # to prevent network related stuckness issues. URL: https://github.com/apache/beam/pull/9401#issuecomment-528485086 Run Python 2 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307353) Time Spent: 4h 50m (was: 4h 40m) > urlopen calls could get stuck without a timeout > --- > > Key: BEAM-7616 > URL: https://issues.apache.org/jira/browse/BEAM-7616 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Udi Meiri >Priority: Blocker > Fix For: 2.14.0, 2.16.0 > > Time Spent: 4h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
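The BEAM-7616 PR title describes setting a global timeout because apitools uses urllib. A minimal sketch of the standard-library mechanism involved (the 60-second value is taken from the PR title; this is not the PR's actual diff):

```python
# Sketch of the global-timeout approach: socket.setdefaulttimeout
# applies to any socket created without an explicit timeout, including
# those opened by urllib's urlopen, so a stuck connection raises a
# timeout error instead of blocking forever.

import socket

socket.setdefaulttimeout(60)  # seconds

# Any subsequent urlopen call without its own timeout now inherits it,
# e.g.:
#   from urllib.request import urlopen
#   resp = urlopen('https://example.com')  # fails after 60s, never hangs
```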
[jira] [Work logged] (BEAM-8131) Provide Kubernetes setup with Prometheus
[ https://issues.apache.org/jira/browse/BEAM-8131?focusedWorklogId=307352=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307352 ] ASF GitHub Bot logged work on BEAM-8131: Author: ASF GitHub Bot Created on: 05/Sep/19 17:21 Start Date: 05/Sep/19 17:21 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #9482: [BEAM-8131] Provide Kubernetes setup for Prometheus URL: https://github.com/apache/beam/pull/9482#issuecomment-528484358 In doc I see that we decided to move from BQ to Prometheus. Can you add some background on this decision compared to using Grafana BQ connector? https://grafana.com/grafana/plugins/doitintl-bigquery-datasource/installation (I added comment in doc as well) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 307352) Time Spent: 1h (was: 50m) > Provide Kubernetes setup with Prometheus > > > Key: BEAM-8131 > URL: https://issues.apache.org/jira/browse/BEAM-8131 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)