[jira] [Created] (BEAM-10212) Add support for Java SDK harness state caching

2020-06-08 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-10212:


 Summary: Add support for Java SDK harness state caching
 Key: BEAM-10212
 URL: https://issues.apache.org/jira/browse/BEAM-10212
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-java-harness
Reporter: Luke Cwik


Tech spec: 
[https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m]

Relevant document: 
[https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit]

Mailing list link: 
[https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E]

 

See https://issues.apache.org/jira/browse/BEAM-5428 and 
https://issues.apache.org/jira/browse/BEAM-8298 for Python implementation 
details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-10142) Remove additional identity function workaround in View.java

2020-06-05 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik closed BEAM-10142.

Fix Version/s: Not applicable
 Assignee: Luke Cwik
   Resolution: Not A Problem

This isn't needed as the workaround wasn't implemented in pr/11821

> Remove additional identity function workaround in View.java
> ---
>
> Key: BEAM-10142
> URL: https://issues.apache.org/jira/browse/BEAM-10142
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Labels: portability
> Fix For: Not applicable
>
>
> Dataflow is impacted by a bug in how it does graph replacement for transforms 
> and needs the view transforms to have any transform so that the resulting 
> CreateDataflowView is not considered a composite transform.
>  
> We need to either fix the expansion, migrate DataflowTranslator to use the 
> pipeline proto, or migrate Dataflow to only run when using the beam_fn_api 
> experiment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-10190) Reduce cost of toString of MetricKey and MetricName

2020-06-04 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-10190.
--
Fix Version/s: 2.23.0
   Resolution: Fixed

> Reduce cost of toString of MetricKey and MetricName
> ---
>
> Key: BEAM-10190
> URL: https://issues.apache.org/jira/browse/BEAM-10190
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Yixing Zhang
>Assignee: Yixing Zhang
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Samza runner heavily uses MetricKey.toString() and MetricName.toString() to 
> update Samza metrics. We found that the toString methods have high CPU cost. 
> And according to this article: 
> [https://redfin.engineering/java-string-concatenation-which-way-is-best-8f590a7d22a8],
>  we should use "+" operator instead of String.format for string concatenation 
> for better performance.
> We do see a 10% QPS gain in nexmark queries using Samza runner with the 
> change of using "+" operator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8602) Always use shadow configuration for direct runner dependencies

2020-06-03 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-8602.
-
Fix Version/s: 2.23.0
   Resolution: Fixed

> Always use shadow configuration for direct runner dependencies
> --
>
> Key: BEAM-8602
> URL: https://issues.apache.org/jira/browse/BEAM-8602
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql, sdk-java-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-10179) Remove URNJavaDoFn since the workaround is no longer required

2020-06-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-10179.
--
Fix Version/s: 2.23.0
   Resolution: Fixed

> Remove URNJavaDoFn since the workaround is no longer required
> -
>
> Key: BEAM-10179
> URL: https://issues.apache.org/jira/browse/BEAM-10179
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10179) Remove URNJavaDoFn since the workaround is no longer required

2020-06-02 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-10179:


 Summary: Remove URNJavaDoFn since the workaround is no longer 
required
 Key: BEAM-10179
 URL: https://issues.apache.org/jira/browse/BEAM-10179
 Project: Beam
  Issue Type: Improvement
  Components: sdk-go
Reporter: Luke Cwik
Assignee: Luke Cwik






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8374) PublishResult returned by SnsIO is missing sdkResponseMetadata and sdkHttpMetadata

2020-06-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-8374.
-
Fix Version/s: 2.20.0
   Resolution: Fixed

> PublishResult returned by SnsIO is missing sdkResponseMetadata and 
> sdkHttpMetadata
> --
>
> Key: BEAM-8374
> URL: https://issues.apache.org/jira/browse/BEAM-8374
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws
>Affects Versions: 2.13.0, 2.14.0, 2.15.0
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: P3
>  Labels: stale-assigned
> Fix For: 2.20.0
>
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> Currently the PublishResultCoder in SnsIO only serializes the messageId field 
> so the PublishResult returned by Beam returns null for 
> getSdkResponseMetadata() and getSdkHttpMetadata(). This makes it impossible 
> to check the HTTP status for errors, which is necessary since this is not 
> handled in SnsIO.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-4766) Use vendored version of protobuf in sdks/java/core or replace usage of ByteString with something else

2020-06-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-4766.
-
Fix Version/s: 2.23.0
 Assignee: Luke Cwik
   Resolution: Fixed

> Use vendored version of protobuf in sdks/java/core or replace usage of 
> ByteString with something else
> -
>
> Key: BEAM-4766
> URL: https://issues.apache.org/jira/browse/BEAM-4766
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Labels: portability, stale-P2
> Fix For: 2.23.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The sdk/java/core library will bring in the pipeline model proto to be able 
> to perform Pipeline > proto translation within the SDK instead of it being 
> hidden away in runners core and similar libraries so it makes a lot of sense 
> to just migrate the implementation to use vendored protobuf.
>  
> This is currently used in ByteKey and TextSource.
>  
> An alternative would be to re-implement the logic to not use ByteStrings.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-3736) Add SetUp() and TearDown() for CombineFns

2020-06-02 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124134#comment-17124134
 ] 

Luke Cwik commented on BEAM-3736:
-

May I suggest:
Step 0: Add a ValidatesRunner test that fails if setup isn't invoked. Take a 
look at some of the lifecycle tests: 
https://github.com/apache/beam/blob/9c16b898f0c90e83d74f5ac1a0d5b8853f872ebb/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L1500

You can then validate that in step 1 it fails because the runner is rejecting 
it and in step 2 it should now pass.

> Add SetUp() and TearDown() for CombineFns
> -
>
> Key: BEAM-3736
> URL: https://issues.apache.org/jira/browse/BEAM-3736
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-py-core
>Reporter: Chuan Yu Foo
>Priority: P2
>
> I have a CombineFn that has a large amount of state that needs to be loaded 
> once before it can add_input or merge_combiners (for example, the CombineFn 
> might load up a large lookup table used for combining). 
> Right now, to initialise this state, for each of the methods, I check if the 
> state has already been initialised, and if not, I initialise it. It would be 
> nice if CombineFn provided a SetUp() method that is called once to initialise 
> this state (and a corresponding TearDown() method to clean up this state if 
> necessary).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-7063) Stage Dataflow runner harness in Dataflow FnApi Python test suites that run using an unreleased SDK.

2020-06-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-7063.
-
Fix Version/s: Not applicable
   Resolution: Fixed

I believe we pass the --dataflowWorkerJar everywhere now and have been for a 
while.

> Stage Dataflow runner harness in Dataflow FnApi Python test suites that run 
> using an unreleased SDK.
> 
>
> Key: BEAM-7063
> URL: https://issues.apache.org/jira/browse/BEAM-7063
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Valentyn Tymofieiev
>Priority: P2
>  Labels: stale-assigned
> Fix For: Not applicable
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Seems like this was the first post-commit failure: 
> [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_Python_Verify/7855/]
> (the one before this also failed but that looks like a flake)
>  
>  
> Looking at failed tests I noticed that some of the were due to failed 
> Dataflow jobs. I noticed three issues.
> (1) Error in Dataflow jobs "This handler is only capable of dealing with 
> urn:beam:sideinput:materialization:multimap:0.1 materializations but was 
> asked to handle beam:side_input:multimap:v1 for PCollectionView with tag 
> side0-FilterOutSpammers."
> [https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2019-04-08_17_57_01-16397932959358575026?project=apache-beam-testing]
> [https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2019-04-08_18_07_11-1342017143600540529?project=apache-beam-testing]
>  
> (2)  BigQuery job failures (possibly this is transient)
> [https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2019-04-08_18_25_22-16437560120793921616?project=apache-beam-testing]
>  
> (3) Some batch Dataflow jobs timed out and were cancelled (possibly also 
> transient).
>  
> (1) is probably due to 
> [https://github.com/apache/beam/commit/684f8130284a7c7979773300d04e5473ca0ac8f3]
>  
>  
> cc: [~altay] [~lostluck] [~alanmyrvold]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8539) Clearly define the valid job state transitions

2020-06-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-8539.
-
Fix Version/s: 2.21.0
 Assignee: Luke Cwik
   Resolution: Fixed

> Clearly define the valid job state transitions
> --
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, runner-core, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Luke Cwik
>Priority: P2
>  Labels: stale-P2
> Fix For: 2.21.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> The Beam job state transitions are ill-defined, which is big problem for 
> anything that relies on the values coming from JobAPI.GetStateStream.
> I was hoping to find something like a state transition diagram in the docs so 
> that I could determine the start state, the terminal states, and the valid 
> transitions, but I could not find this. The code reveals that the SDKs differ 
> on the fundamentals:
> Java InMemoryJobService:
>  * start state: *STOPPED*
>  * run - about to submit to executor:  STARTING
>  * run - actually running on executor:  RUNNING
>  * terminal states: DONE, FAILED, CANCELLED, DRAINED
> Python AbstractJobServiceServicer / LocalJobServicer:
>  * start state: STARTING
>  * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
> I think it would be good to make python work like Java, so that there is a 
> difference in state between a job that has been prepared and one that has 
> additionally been run.
> It's hard to tell how far this problem has spread within the various runners. 
>  I think a simple thing that can be done to help standardize behavior is to 
> implement the terminal states as an enum in the beam_job_api.proto, or create 
> a utility function in each language for checking if a state is terminal, so 
> that it's not left up to each runner to reimplement this logic.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-6142) Improve ByteBuddy DoFnInvoker implementation wrt SplittableDoFns / BundleFinalization

2020-06-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-6142.
-
Fix Version/s: 2.21.0
 Assignee: Luke Cwik
   Resolution: Fixed

> Improve ByteBuddy DoFnInvoker implementation wrt SplittableDoFns / 
> BundleFinalization
> -
>
> Key: BEAM-6142
> URL: https://issues.apache.org/jira/browse/BEAM-6142
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Labels: stale-P2
> Fix For: 2.21.0
>
>
> Add support for bundle finalization parameter to start bundle and finish 
> bundle and allow for arbitrary ordering of parameters.
> Make InputT optional for GetInitialRestricton and allow for arbitrary 
> ordering of parameters.
> Make InputT and Backlog parameters optional for SplitRestriction and allow 
> for arbitrary ordering of parameters/
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-4660) Add well known timer coder for Python SDK

2020-06-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-4660.
-
Fix Version/s: 2.21.0
 Assignee: Robert Bradshaw
   Resolution: Fixed

> Add well known timer coder for Python SDK
> -
>
> Key: BEAM-4660
> URL: https://issues.apache.org/jira/browse/BEAM-4660
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Robert Bradshaw
>Priority: P2
>  Labels: portability, stale-P2
> Fix For: 2.21.0
>
>
> Encoding is: [Instant, Payload]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-4657) Python SDK harness should support user timers

2020-06-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-4657.
-
Fix Version/s: 2.21.0
   Resolution: Fixed

> Python SDK harness should support user timers
> -
>
> Key: BEAM-4657
> URL: https://issues.apache.org/jira/browse/BEAM-4657
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Boyuan Zhang
>Priority: P2
>  Labels: portability, stale-P2
> Fix For: 2.21.0
>
>
> Wire up the onTimer method in the Python SDK harness connecting it to the 
> RemoteGrpcPort read/write that is responsible for producing/consumer timers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-4657) Python SDK harness should support user timers

2020-06-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-4657:
---

Assignee: Boyuan Zhang

> Python SDK harness should support user timers
> -
>
> Key: BEAM-4657
> URL: https://issues.apache.org/jira/browse/BEAM-4657
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Boyuan Zhang
>Priority: P2
>  Labels: portability, stale-P2
>
> Wire up the onTimer method in the Python SDK harness connecting it to the 
> RemoteGrpcPort read/write that is responsible for producing/consumer timers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-4582) Incorrectly translates apache_beam.runners.dataflow.native_io.streaming_create.DecodeAndEmitDoFn when creating the Dataflow pipeline json description

2020-06-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-4582.
-
Fix Version/s: 2.21.0
 Assignee: Luke Cwik
   Resolution: Cannot Reproduce

Streaming create for portable Dataflow now works.

> Incorrectly translates 
> apache_beam.runners.dataflow.native_io.streaming_create.DecodeAndEmitDoFn 
> when creating the Dataflow pipeline json description
> -
>
> Key: BEAM-4582
> URL: https://issues.apache.org/jira/browse/BEAM-4582
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Labels: portability, stale-P2
> Fix For: 2.21.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When executing against Dataflow, the JSON pipeline description contains the 
> following JSON which doesn't appear in the pipeline proto:
>  
> {code:java}
> {
>   "kind": "ParallelDo", 
>   "name": "s2", 
>   "properties": {
> "display_data": [
>   {
> "key": "fn", 
> "label": "Transform Function", 
> "namespace": "apache_beam.transforms.core.ParDo", 
> "shortValue": "DecodeAndEmitDoFn", 
> "type": "STRING", 
> "value": 
> "apache_beam.runners.dataflow.native_io.streaming_create.DecodeAndEmitDoFn"
>   }
> ], 
> "non_parallel_inputs": {}, 
> "output_info": [
>   {
> "encoding": {
>   "@type": "kind:windowed_value", 
>   "component_encodings": [
> {
>   "@type": 
> "FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
>  
>   "component_encodings": [
> {
>   "@type": 
> "FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
>  
>   "component_encodings": []
> }, 
> {
>   "@type": 
> "FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
>  
>   "component_encodings": []
> }
>   ], 
>   "is_pair_like": true
> }, 
> {
>   "@type": "kind:global_window"
> }
>   ], 
>   "is_wrapper": true
> }, 
> "output_name": "out", 
> "user_name": "Some Numbers/Decode Values.out"
>   }
> ], 
> "parallel_input": {
>   "@type": "OutputReference", 
>   "output_name": "out", 
>   "step_name": "s1"
> }, 
> "serialized_fn": "ref_AppliedPTransform_AppliedPTransform_45", 
> "user_name": "Some Numbers/Decode Values"
>   }
> }, 
> {code}
> This causes the DataflowRunner to use a legacy code path and ask the Python 
> SDK harness to execute a transform with a payload 
> *ref_AppliedPTransform_AppliedPTransform_45* instead of sending the 
> PTransform proto.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-4655) Update pipeline translation for timers inside Python SDK

2020-06-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik closed BEAM-4655.
---
Resolution: Not A Problem

Migrated timers to use a separate field on the data channel and not model them 
as PCollections.

> Update pipeline translation for timers inside Python SDK
> 
>
> Key: BEAM-4655
> URL: https://issues.apache.org/jira/browse/BEAM-4655
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Boyuan Zhang
>Priority: P2
>  Labels: portability, stale-P2
> Fix For: 2.21.0
>
>
> Add the timer PCollection and treat timers as inputs/outputs of the ParDo 
> PTransform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-4655) Update pipeline translation for timers inside Python SDK

2020-06-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-4655.
-
Fix Version/s: 2.21.0
   Resolution: Fixed

> Update pipeline translation for timers inside Python SDK
> 
>
> Key: BEAM-4655
> URL: https://issues.apache.org/jira/browse/BEAM-4655
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Boyuan Zhang
>Priority: P2
>  Labels: portability, stale-P2
> Fix For: 2.21.0
>
>
> Add the timer PCollection and treat timers as inputs/outputs of the ParDo 
> PTransform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-4655) Update pipeline translation for timers inside Python SDK

2020-06-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reopened BEAM-4655:
-

> Update pipeline translation for timers inside Python SDK
> 
>
> Key: BEAM-4655
> URL: https://issues.apache.org/jira/browse/BEAM-4655
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Boyuan Zhang
>Priority: P2
>  Labels: portability, stale-P2
> Fix For: 2.21.0
>
>
> Add the timer PCollection and treat timers as inputs/outputs of the ParDo 
> PTransform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-4948) Beam Dependency Update Request: com.google.guava

2020-06-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-4948:

Fix Version/s: (was: 2.15.0)

> Beam Dependency Update Request: com.google.guava
> 
>
> Key: BEAM-4948
> URL: https://issues.apache.org/jira/browse/BEAM-4948
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: P2
>  Labels: stale-P2
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> 2018-07-25 20:28:03.628639
> Please review and upgrade the com.google.guava to the latest version 
> None 
>  
> cc: 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-4655) Update pipeline translation for timers inside Python SDK

2020-06-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-4655:
---

Assignee: Boyuan Zhang

> Update pipeline translation for timers inside Python SDK
> 
>
> Key: BEAM-4655
> URL: https://issues.apache.org/jira/browse/BEAM-4655
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Boyuan Zhang
>Priority: P2
>  Labels: portability, stale-P2
>
> Add the timer PCollection and treat timers as inputs/outputs of the ParDo 
> PTransform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-4766) Use vendored version of protobuf in sdks/java/core or replace usage of ByteString with something else

2020-06-02 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17123968#comment-17123968
 ] 

Luke Cwik commented on BEAM-4766:
-

Fix in https://github.com/apache/beam/pull/11891

> Use vendored version of protobuf in sdks/java/core or replace usage of 
> ByteString with something else
> -
>
> Key: BEAM-4766
> URL: https://issues.apache.org/jira/browse/BEAM-4766
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Priority: P2
>  Labels: portability, stale-P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The sdk/java/core library will bring in the pipeline model proto to be able 
> to perform Pipeline > proto translation within the SDK instead of it being 
> hidden away in runners core and similar libraries so it makes a lot of sense 
> to just migrate the implementation to use vendored protobuf.
>  
> This is currently used in ByteKey and TextSource.
>  
> An alternative would be to re-implement the logic to not use ByteStrings.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8647) Remove .mailmap from the sources

2020-06-02 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-8647:
---

Assignee: Romain Manni-Bucau

> Remove .mailmap from the sources
> 
>
> Key: BEAM-8647
> URL: https://issues.apache.org/jira/browse/BEAM-8647
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Romain Manni-Bucau
>Assignee: Romain Manni-Bucau
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hi,
>  
> .mailmap manipulates individuals data which are considered "personal" (name, 
> email etc)
> AFAIK Apache/Beam is not allowed to do it straight, in particular for EU 
> citizens (_GDPR)._
> Can the file be removed since it is not used by the beam project (at least 
> apache/beam repo)?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-10158) [Python] Reuse a shared unbounded thread pool

2020-06-01 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-10158.
--
Fix Version/s: 2.23.0
   Resolution: Fixed

> [Python] Reuse a shared unbounded thread pool
> -
>
> Key: BEAM-10158
> URL: https://issues.apache.org/jira/browse/BEAM-10158
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> During testing we create a lot of thread pools many of which we don't 
> shutdown which can lead to thread exhaustion on some machiens.
>  
> Swapping to use a shared thread pool will decrease the memory overhead for 
> these unused threads and allow for greater reuse.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10158) [Python] Reuse a shared unbounded thread pool

2020-05-29 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-10158:


 Summary: [Python] Reuse a shared unbounded thread pool
 Key: BEAM-10158
 URL: https://issues.apache.org/jira/browse/BEAM-10158
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Luke Cwik
Assignee: Luke Cwik


During testing we create a lot of thread pools many of which we don't shutdown 
which can lead to thread exhaustion on some machiens.

 

Swapping to use a shared thread pool will decrease the memory overhead for 
these unused threads and allow for greater reuse.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10053) Timers exception on "Job Drain" while using stateful beam processing in global window

2020-05-29 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-10053:
-
Status: Open  (was: Triage Needed)

> Timers exception on "Job Drain" while using stateful beam processing in 
> global window
> -
>
> Key: BEAM-10053
> URL: https://issues.apache.org/jira/browse/BEAM-10053
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-core
>Affects Versions: 2.19.0
>Reporter: MOHIL
>Priority: P2
>
> Hello,
>  
> I have a use case where I have two sets of PCollections (RecordA and RecordB) 
> coming from a real time streaming source like Kafka.
>  
> Both Records are correlated with a common key, let's say KEY.
>  
> The purpose is to enrich RecordA with RecordB's data for which I am using 
> CoGbByKey. Since RecordA and RecordB for a common key can come within 1-2 
> minutes of event time, I am maintaining a sliding window for both records and 
> then do CoGpByKey for both PCollections.
>  
> The sliding windows that will find both RecordA and RecordB for a common key 
> KEY, will emit enriched output. Now, since multiple sliding windows can emit 
> the same output, I finally remove duplicate results by feeding aforementioned 
> outputs to a global window where I maintain a state to check whether output 
> has already been processed or not. Since it is a global window, I maintain a 
> Timer on state (for GC) to let it expire after 10 minutes have elapsed since 
> state has been written.
>  
> This is working perfectly fine w.r.t the expected results. However, I am 
> unable to stop job gracefully i.e. Drain the job gracefully. I see following 
> exception:
>  
> java.lang.IllegalStateException: 
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn@16b089a6 received state 
> cleanup timer for window 
> org.apache.beam.sdk.transforms.windowing.GlobalWindow@29ca0210 that is before 
> the appropriate cleanup time 294248-01-24T04:00:54.776Z
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:842)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processSystemTimer(SimpleParDoFn.java:384)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.access$700(SimpleParDoFn.java:73)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn$TimerType$2.processTimer(SimpleParDoFn.java:444)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:467)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:354)
> org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:52)
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1316)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1049)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> java.lang.IllegalStateException: 
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn@59902a10 received state 
> cleanup timer for window 
> org.apache.beam.sdk.transforms.windowing.GlobalWindow@29ca0210 that is before 
> the appropriate cleanup time 294248-01-24T04:00:54.776Z
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:842)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processSystemTimer(SimpleParDoFn.java:384)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.access$700(SimpleParDoFn.java:73)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn$TimerType$2.processTimer(SimpleParDoFn.java:444)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:467)
> org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:354)
> org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:52)
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1316)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1049)
> 

[jira] [Updated] (BEAM-10114) Add Pub/Sub Lite IO to beam builtin

2020-05-28 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-10114:
-
Status: Open  (was: Triage Needed)

> Add Pub/Sub Lite IO to beam builtin
> ---
>
> Key: BEAM-10114
> URL: https://issues.apache.org/jira/browse/BEAM-10114
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Daniel Collins
>Priority: P2
>
> The IO currently lives [on the pubsub lite 
> github|[https://github.com/googleapis/java-pubsublite/tree/master/pubsublite-beam-io]]
>  but should be moved to being part of beam.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10142) Remove additional identity function workaround in View.java

2020-05-28 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-10142:


 Summary: Remove additional identity function workaround in 
View.java
 Key: BEAM-10142
 URL: https://issues.apache.org/jira/browse/BEAM-10142
 Project: Beam
  Issue Type: Improvement
  Components: runner-dataflow, sdk-java-core
Reporter: Luke Cwik


Dataflow is impacted by a bug in how it does graph replacement for transforms 
and needs the view transforms to have any transform so that the resulting 
CreateDataflowView is not considered a composite transform.

 

We need to either fix the expansion, migrate DataflowTranslator to use the 
pipeline proto, or migrate Dataflow to only run when using the beam_fn_api 
experiment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-7568) Java dataflow harness re-encodes value state cells even if they haven't changed

2020-05-28 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-7568.
-
Fix Version/s: 2.23.0
   Resolution: Fixed

> Java dataflow harness re-encodes value state cells even if they haven't 
> changed
> ---
>
> Key: BEAM-7568
> URL: https://issues.apache.org/jira/browse/BEAM-7568
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Affects Versions: 2.13.0
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The java dataflow worker seems to re-encode ValueState cells after every work 
> item, even they weren't modified.
> You can see here 
> [https://github.com/apache/beam/blob/a71bfda77df36aa1531f01533c372233cfba0dd9/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternals.java#L413]
>  that the value is always encoded (and used to weight the cache entry) even 
> if it won't be persisted back to windmill. 
> This can have some large performance implications if they values being stored 
> are expensive/large to encode, and infrequently modified.  Ideally, the 
> weight would be also cached, and the value would only need to be modified if 
> it was changed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-10075) Allow users to tune the grouping table size in batch dataflow pipelines

2020-05-27 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-10075.
--
Resolution: Fixed

> Allow users to tune the grouping table size in batch dataflow pipelines
> ---
>
> Key: BEAM-10075
> URL: https://issues.apache.org/jira/browse/BEAM-10075
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The dataflow worker hard-codes the grouping table size to 100 MB.  We should 
> allow users to specify this as a pipeline parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-26 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117378#comment-17117378
 ] 

Luke Cwik edited comment on BEAM-10016 at 5/27/20, 4:19 AM:


The issue is that the GreedyPipelineFuser (and related classes) doesn't take 
into account the change in the encoding from the flattens input to the flattens 
output in certain scenarios where the flatten isn't being merged with an 
existing stage.

Normally one could copy the coder from the flatten's output PCollection to all 
the input PCollections to fix this but this doesn't hold when dealing with 
cross language pipelines because we could have
{code:java}
ParDo(Java) -> PC(big endian int coder)   -> Flatten(Python) -> 
PC(varint coder)
ParDo(Go) -> PCollection(little endian int coder) /{code}
The Python SDK in this case would know big endian int coder, little endian int 
coder and varint coder but Java/Go would only know the big endian int coder and 
little endian int coder respectively.

The solution in the above example is to make the Python SDK perform the 
transcoding by having it execute the flatten. Only flattens where the 
input/output coder matches can be done by a runner since no transcoding is 
necessary.

An alternative would be to require flattens have the same input and output 
coders but this has its own problems since it would be a backwards incompatible 
change or to insert identity ParDo's within SDKs to make sure that input/output 
coders match whenever there is a Flatten.


was (Author: lcwik):
The issue is that the GreedyPipelineFuser (and related classes) doesn't take 
into account the change in the encoding from the flattens input to the flattens 
output in certain scenarios where the flatten isn't being merged with an 
existing stage.

Normally one could copy the coder from the flatten's output PCollection to all 
the input PCollections to fix this but this doesn't hold when dealing with 
cross language pipelines because we could have
{code:java}
ParDo(Java) -> PC(big endian int coder)   -> Flatten(Python) -> 
PC(varint coder)
ParDo(Go) -> PCollection(little endian int coder) /{code}
The Python SDK in this case would know big endian int coder, little endian int 
coder and varint coder but Java/Go would only know the big endian int coder and 
little endian int coder respectively.

The solution in the above example is to make the Python SDK perform the 
transcoding by having it execute the flatten. Only flattens where the 
input/output coder matches can be done by a runner since no transcoding is 
necessary.

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: P2
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> SEVERE: Error in task code:  CHAIN MapPartition (MapPartition at [6]{Values, 
> FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map 
> (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
> PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) 
> (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be 
> cast to [B
>   at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
>   at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at 

[jira] [Commented] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-26 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117378#comment-17117378
 ] 

Luke Cwik commented on BEAM-10016:
--

The issue is that the GreedyPipelineFuser (and related classes) doesn't take 
into account the change in the encoding from the flattens input to the flattens 
output in certain scenarios where the flatten isn't being merged with an 
existing stage.

Normally one could copy the coder from the flatten's output PCollection to all 
the input PCollections to fix this but this doesn't hold when dealing with 
cross language pipelines because we could have
{code:java}
ParDo(Java) -> PC(big endian int coder)   -> Flatten(Python) -> 
PC(varint coder)
ParDo(Go) -> PCollection(little endian int coder) /{code}
The Python SDK in this case would know big endian int coder, little endian int 
coder and varint coder but Java/Go would only know the big endian int coder and 
little endian int coder respectively.

The solution in the above example is to make the Python SDK perform the 
transcoding by having it execute the flatten. Only flattens where the 
input/output coder matches can be done by a runner since no transcoding is 
necessary.

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: P2
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing newly added test 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> SEVERE: Error in task code:  CHAIN MapPartition (MapPartition at [6]{Values, 
> FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map 
> (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
> PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) 
> (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be 
> cast to [B
>   at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
>   at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-10076) Dataflow worker status page incorrectly displays work item statuses

2020-05-26 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-10076.
--
Resolution: Fixed

> Dataflow worker status page incorrectly displays work item statuses
> ---
>
> Key: BEAM-10076
> URL: https://issues.apache.org/jira/browse/BEAM-10076
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
> Fix For: 2.23.0
>
> Attachments: image-2020-05-25-17-13-49-465.png
>
>
> The work item status page incorrectly renders its table due to an incorrectly 
> placed  tag.
>  (see attached screenshot)
> !image-2020-05-25-17-13-49-465.png|width=512,height=94!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10094) Spark failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-26 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-10094:
-
Status: Open  (was: Triage Needed)

> Spark failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10094
> URL: https://issues.apache.org/jira/browse/BEAM-10094
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: P2
>
> Spark portable validates runner is failing on newly added test 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> SEVERE: Error in task code:  CHAIN MapPartition (MapPartition at [6]{Values, 
> FlatMapElements, PAssert$0}) -> FlatMap (FlatMap at ExtractOutput[0]) -> Map 
> (Key Extractor) -> GroupCombine (GroupCombine at GroupCombine: 
> PAssert$0/GroupGlobally/GatherAllOutputs/GroupByKey) -> Map (Key Extractor) 
> (2/2) java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be 
> cast to [B
>   at 
> org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
>   at 
> org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:590)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:581)
>   at 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:541)
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:109)
>   at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$CountingFnDataReceiver.accept(SdkHarnessClient.java:667)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:271)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:203)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns

2020-05-26 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-10097:


 Summary: Migrate PCollection views to use both iterable and 
multimap materializations/access patterns
 Key: BEAM-10097
 URL: https://issues.apache.org/jira/browse/BEAM-10097
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core, sdk-java-harness
Reporter: Luke Cwik
Assignee: Luke Cwik


Currently all the PCollection views have a trival mapping from KV> to the view that is being requested (singleton, iterable, list, 
map, multimap.

We should be using the primitive views (iterable, multimap) directly without 
going through the naive mapping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9371) Implement SideInput load test in Java

2020-05-26 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117146#comment-17117146
 ] 

Luke Cwik commented on BEAM-9371:
-

Is this a duplicate of https://issues.apache.org/jira/browse/BEAM-5982?

> Implement SideInput load test in Java
> -
>
> Key: BEAM-9371
> URL: https://issues.apache.org/jira/browse/BEAM-9371
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michał Walenia
>Assignee: Michał Walenia
>Priority: P2
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-3419) Enable iterable side input for beam runners.

2020-05-26 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-3419.
-
Fix Version/s: 2.19.0
 Assignee: Luke Cwik
   Resolution: Fixed

> Enable iterable side input for beam runners.
> 
>
> Key: BEAM-3419
> URL: https://issues.apache.org/jira/browse/BEAM-3419
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Robert Bradshaw
>Assignee: Luke Cwik
>Priority: P2
> Fix For: 2.19.0
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10057) Failure when getting watermark "getWatermark is never meant to be invoked."

2020-05-26 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116945#comment-17116945
 ] 

Luke Cwik commented on BEAM-10057:
--

[~bhulette] https://github.com/apache/beam/pull/11781 is ready for cherry pick 
into 2.22 release branch.

> Failure when getting watermark "getWatermark is never meant to be invoked."
> ---
>
> Key: BEAM-10057
> URL: https://issues.apache.org/jira/browse/BEAM-10057
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Luke Cwik
>Priority: P1
> Fix For: 2.22.0
>
>
> generic::unknown: org.apache.beam.sdk.util.UserCodeException: 
> java.lang.UnsupportedOperationException: getWatermark is never meant to be 
> invoked. at 
> org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:36) at 
> org.apache.beam.sdk.io.Read$UnboundedSourceAsSDFWrapperFn$DoFnInvoker.invokeProcessElement(Unknown
>  Source) at 
> org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForElementAndRestriction(FnApiDoFnRunner.java:838)
>  at 
> org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForSizedElementAndRestriction(FnApiDoFnRunner.java:808)
>  at 
> org.apache.beam.fn.harness.FnApiDoFnRunner.access$200(FnApiDoFnRunner.java:132)
>  at 
> org.apache.beam.fn.harness.FnApiDoFnRunner$Factory$2.accept(FnApiDoFnRunner.java:226)
>  at 
> org.apache.beam.fn.harness.FnApiDoFnRunner$Factory$2.accept(FnApiDoFnRunner.java:223)
>  at 
> org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:216)
>  at 
> org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:179)
>  at 
> org.apache.beam.fn.harness.BeamFnDataReadRunner.forwardElementToConsumer(BeamFnDataReadRunner.java:204)
>  at 
> org.apache.beam.fn.harness.data.QueueingBeamFnDataClient.drainAndBlock(QueueingBeamFnDataClient.java:106)
>  at 
> org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:295)
>  at 
> org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:173)
>  at 
> org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:157)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
> Seems to be a breakage in SDF due to a recent change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10057) Failure when getting watermark "getWatermark is never meant to be invoked."

2020-05-21 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113492#comment-17113492
 ] 

Luke Cwik commented on BEAM-10057:
--

Cham is working on verifying it fixes the Xlang Kafka issue.

> Failure when getting watermark "getWatermark is never meant to be invoked."
> ---
>
> Key: BEAM-10057
> URL: https://issues.apache.org/jira/browse/BEAM-10057
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Luke Cwik
>Priority: P1
> Fix For: 2.22.0
>
>
> generic::unknown: org.apache.beam.sdk.util.UserCodeException: 
> java.lang.UnsupportedOperationException: getWatermark is never meant to be 
> invoked. at 
> org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:36) at 
> org.apache.beam.sdk.io.Read$UnboundedSourceAsSDFWrapperFn$DoFnInvoker.invokeProcessElement(Unknown
>  Source) at 
> org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForElementAndRestriction(FnApiDoFnRunner.java:838)
>  at 
> org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForSizedElementAndRestriction(FnApiDoFnRunner.java:808)
>  at 
> org.apache.beam.fn.harness.FnApiDoFnRunner.access$200(FnApiDoFnRunner.java:132)
>  at 
> org.apache.beam.fn.harness.FnApiDoFnRunner$Factory$2.accept(FnApiDoFnRunner.java:226)
>  at 
> org.apache.beam.fn.harness.FnApiDoFnRunner$Factory$2.accept(FnApiDoFnRunner.java:223)
>  at 
> org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:216)
>  at 
> org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:179)
>  at 
> org.apache.beam.fn.harness.BeamFnDataReadRunner.forwardElementToConsumer(BeamFnDataReadRunner.java:204)
>  at 
> org.apache.beam.fn.harness.data.QueueingBeamFnDataClient.drainAndBlock(QueueingBeamFnDataClient.java:106)
>  at 
> org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:295)
>  at 
> org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:173)
>  at 
> org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:157)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
> Seems to be a breakage in SDF due to a recent change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10057) Failure when getting watermark "getWatermark is never meant to be invoked."

2020-05-21 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113491#comment-17113491
 ] 

Luke Cwik commented on BEAM-10057:
--

Fix in https://github.com/apache/beam/pull/11781

> Failure when getting watermark "getWatermark is never meant to be invoked."
> ---
>
> Key: BEAM-10057
> URL: https://issues.apache.org/jira/browse/BEAM-10057
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Luke Cwik
>Priority: P1
> Fix For: 2.22.0
>
>
> generic::unknown: org.apache.beam.sdk.util.UserCodeException: 
> java.lang.UnsupportedOperationException: getWatermark is never meant to be 
> invoked. at 
> org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:36) at 
> org.apache.beam.sdk.io.Read$UnboundedSourceAsSDFWrapperFn$DoFnInvoker.invokeProcessElement(Unknown
>  Source) at 
> org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForElementAndRestriction(FnApiDoFnRunner.java:838)
>  at 
> org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForSizedElementAndRestriction(FnApiDoFnRunner.java:808)
>  at 
> org.apache.beam.fn.harness.FnApiDoFnRunner.access$200(FnApiDoFnRunner.java:132)
>  at 
> org.apache.beam.fn.harness.FnApiDoFnRunner$Factory$2.accept(FnApiDoFnRunner.java:226)
>  at 
> org.apache.beam.fn.harness.FnApiDoFnRunner$Factory$2.accept(FnApiDoFnRunner.java:223)
>  at 
> org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:216)
>  at 
> org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:179)
>  at 
> org.apache.beam.fn.harness.BeamFnDataReadRunner.forwardElementToConsumer(BeamFnDataReadRunner.java:204)
>  at 
> org.apache.beam.fn.harness.data.QueueingBeamFnDataClient.drainAndBlock(QueueingBeamFnDataClient.java:106)
>  at 
> org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:295)
>  at 
> org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:173)
>  at 
> org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:157)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
> Seems to be a breakage in SDF due to a recent change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9979) Fix race condition where the read index maybe reported from the last executed bundle

2020-05-19 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9979:

Fix Version/s: (was: 2.22.0)
   2.23.0

> Fix race condition where the read index maybe reported from the last executed 
> bundle
> 
>
> Key: BEAM-9979
> URL: https://issues.apache.org/jira/browse/BEAM-9979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Luke Cwik
>Priority: P3
> Fix For: 2.23.0
>
>
> When the BeamFnDataReadRunner is reused there is a short period of time when 
> a progress request could happen before the the start function is called 
> resetting the read index to -1.
> I believe there should be a way to *reset* an operator before it gets added 
> to the set of cached bundle processors separate instead of placing clean-up 
> in any *start* functions that those operators may rely on preventing exposing 
> details of those operators before *start* may have been invoked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9979) Fix race condition where the read index maybe reported from the last executed bundle

2020-05-19 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111371#comment-17111371
 ] 

Luke Cwik commented on BEAM-9979:
-

Nobody is working on it, moved to 2.23.

> Fix race condition where the read index maybe reported from the last executed 
> bundle
> 
>
> Key: BEAM-9979
> URL: https://issues.apache.org/jira/browse/BEAM-9979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Luke Cwik
>Priority: P3
> Fix For: 2.22.0
>
>
> When the BeamFnDataReadRunner is reused there is a short period of time when 
> a progress request could happen before the the start function is called 
> resetting the read index to -1.
> I believe there should be a way to *reset* an operator before it gets added 
> to the set of cached bundle processors separate instead of placing clean-up 
> in any *start* functions that those operators may rely on preventing exposing 
> details of those operators before *start* may have been invoked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110811#comment-17110811
 ] 

Luke Cwik edited comment on BEAM-9339 at 5/19/20, 3:13 AM:
---

Java SDK for Dataflow was fixed in 2.22 while Python has worked since 2.21. Go 
also has worked since 2.21


was (Author: lcwik):
Java SDK for Dataflow was fixed in 2.22

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-9339.
-
Resolution: Fixed

Java SDK for Dataflow was fixed in 2.22

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.

2020-05-18 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110748#comment-17110748
 ] 

Luke Cwik edited comment on BEAM-9577 at 5/19/20, 1:06 AM:
---

I found out that the DataflowPipelineTranslator does not use the 
getOrCreateDefaultEnvironment logic and its environment currently lacks all the 
artifact metadata currently.
See: 
https://github.com/apache/beam/blob/76fbe45189ef0fa4b770d607c2f86e8870974523/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L139

Is that covered by this task or is there another one this should be added to?


was (Author: lcwik):
I found out that the DataflowPipelineTranslator does not use the 
getOrCreateDefaultEnvironment logic and its environment currently lacks all the 
artifact metadata currently.
See: 
https://github.com/apache/beam/blob/76fbe45189ef0fa4b770d607c2f86e8870974523/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L139

> Update artifact staging and retrieval protocols to be dependency aware.
> ---
>
> Key: BEAM-9577
> URL: https://issues.apache.org/jira/browse/BEAM-9577
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
>  Time Spent: 23h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.

2020-05-18 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110748#comment-17110748
 ] 

Luke Cwik commented on BEAM-9577:
-

I found out that the DataflowPipelineTranslator does not use the 
getOrCreateDefaultEnvironment logic and its environment currently lacks all the 
artifact metadata currently.
See: 
https://github.com/apache/beam/blob/76fbe45189ef0fa4b770d607c2f86e8870974523/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L139

> Update artifact staging and retrieval protocols to be dependency aware.
> ---
>
> Key: BEAM-9577
> URL: https://issues.apache.org/jira/browse/BEAM-9577
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
>  Time Spent: 23h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110747#comment-17110747
 ] 

Luke Cwik commented on BEAM-9339:
-

Fix in https://github.com/apache/beam/pull/11748

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110744#comment-17110744
 ] 

Luke Cwik edited comment on BEAM-9339 at 5/19/20, 1:01 AM:
---

Dataflow doesn't specify the capabilities in its pipeline proto representation 
since it isn't using the default getOrCreate environment call.

See:
https://github.com/apache/beam/blob/76fbe45189ef0fa4b770d607c2f86e8870974523/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L139


was (Author: lcwik):
Dataflow doesn't specify the capabilities in its pipeline proto representation 
since it isn't using the default getOrCreate environment call.

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110744#comment-17110744
 ] 

Luke Cwik commented on BEAM-9339:
-

Dataflow doesn't specify the capabilities in its pipeline proto representation 
since it isn't using the default getOrCreate environment call.

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.21.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9339:

Status: Open  (was: Triage Needed)

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9339:

Fix Version/s: (was: 2.21.0)
   2.22.0

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reopened BEAM-9339:
-

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.21.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10028) [Java SDK] Support state backed iterables within the SDK harness

2020-05-18 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-10028:


 Summary: [Java SDK] Support state backed iterables within the SDK 
harness
 Key: BEAM-10028
 URL: https://issues.apache.org/jira/browse/BEAM-10028
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-harness
Reporter: Luke Cwik
Assignee: Luke Cwik






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10014) [Java] Support retrieval of large gbk iterables over the state API.

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-10014:
-
Status: Open  (was: Triage Needed)

> [Java] Support retrieval of large gbk iterables over the state API.
> ---
>
> Key: BEAM-10014
> URL: https://issues.apache.org/jira/browse/BEAM-10014
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-10014) [Java] Support retrieval of large gbk iterables over the state API.

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-10014:


Assignee: Luke Cwik  (was: Robert Bradshaw)

> [Java] Support retrieval of large gbk iterables over the state API.
> ---
>
> Key: BEAM-10014
> URL: https://issues.apache.org/jira/browse/BEAM-10014
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10014) [Java] Support retrieval of large gbk iterables over the state API.

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-10014:
-
Component/s: (was: sdk-py-harness)
 sdk-java-harness

> [Java] Support retrieval of large gbk iterables over the state API.
> ---
>
> Key: BEAM-10014
> URL: https://issues.apache.org/jira/browse/BEAM-10014
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10014) [Java] Support retrieval of large gbk iterables over the state API.

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-10014:
-
Fix Version/s: (was: 2.20.0)

> [Java] Support retrieval of large gbk iterables over the state API.
> ---
>
> Key: BEAM-10014
> URL: https://issues.apache.org/jira/browse/BEAM-10014
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Robert Bradshaw
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10014) [Java] Support retrieval of large gbk iterables over the state API.

2020-05-15 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-10014:


 Summary: [Java] Support retrieval of large gbk iterables over the 
state API.
 Key: BEAM-10014
 URL: https://issues.apache.org/jira/browse/BEAM-10014
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-harness
Reporter: Luke Cwik
Assignee: Robert Bradshaw
 Fix For: 2.20.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-6120) Support retrieval of large gbk iterables over the state API.

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-6120.
-
Fix Version/s: 2.20.0
   Resolution: Fixed

> Support retrieval of large gbk iterables over the state API.
> 
>
> Key: BEAM-6120
> URL: https://issues.apache.org/jira/browse/BEAM-6120
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-2902) Add user state support for ParDo.Multi for the Dataflow runner

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-2902.
-
Fix Version/s: 2.21.0
   Resolution: Fixed

Works with Dataflow runner v2

> Add user state support for ParDo.Multi for the Dataflow runner
> --
>
> Key: BEAM-2902
> URL: https://issues.apache.org/jira/browse/BEAM-2902
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>  Labels: portability, sickbay
> Fix For: 2.21.0
>
>
> Add support for user state in a ParDo.Multi by:
> * adding expansion/conversion logic within Dataflow service to add any needed 
> GBKs to allow expansion.
> *OR*
> * Add support for expansion inside the Beam SDK once the PTransformMatcher 
> exposes a way to know when the replacement is not required by checking that 
> the preceding ParDos to a GBK are key preserving.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-2902) Add user state support for ParDo.Multi for the Dataflow runner

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-2902:
---

Assignee: Luke Cwik

> Add user state support for ParDo.Multi for the Dataflow runner
> --
>
> Key: BEAM-2902
> URL: https://issues.apache.org/jira/browse/BEAM-2902
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>  Labels: portability, sickbay
>
> Add support for user state in a ParDo.Multi by:
> * adding expansion/conversion logic within Dataflow service to add any needed 
> GBKs to allow expansion.
> *OR*
> * Add support for expansion inside the Beam SDK once the PTransformMatcher 
> exposes a way to know when the replacement is not required by checking that 
> the preceding ParDos to a GBK are key preserving.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-6559) IsmReaderImpl makes inconsistent assumptions about nullness

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-6559.
-
Fix Version/s: Not applicable
   Resolution: Won't Fix

Not worth investigating if this is a real issue as this code is going the way 
of the dodo.

> IsmReaderImpl makes inconsistent assumptions about nullness
> ---
>
> Key: BEAM-6559
> URL: https://issues.apache.org/jira/browse/BEAM-6559
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Luke Cwik
>Priority: Critical
> Fix For: Not applicable
>
>
> In IsmReaderImpl, there is a dereference of indexPerShard where a conditional 
> just above implies that it could be null. You can find it by searching for 
> this Jira number in the codebase, where I am suppressing the FB warning, 
> though it implies a real bug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-6560) IsmReaderImpl has inconsistent synchronization

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-6560.
-
Fix Version/s: Not applicable
   Resolution: Won't Fix

Not worth investigating if this is a real issue as this code is going the way 
of the dodo.

> IsmReaderImpl has inconsistent synchronization
> --
>
> Key: BEAM-6560
> URL: https://issues.apache.org/jira/browse/BEAM-6560
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Luke Cwik
>Priority: Major
> Fix For: Not applicable
>
>
> This bug filed because findbugs complained. There are many complaints about 
> inconsistent synchronization that deserve review, and suppression if they are 
> good to go. Grep for this issue in the codebase to find, or just open 
> IsmReaderImpl.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-6597) Put MonitoringInfos/metrics in the Java SDK ProcessBundleProgressResponse

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-6597.
-
Fix Version/s: 2.21.0
   Resolution: Fixed

> Put MonitoringInfos/metrics in the Java SDK ProcessBundleProgressResponse
> -
>
> Key: BEAM-6597
> URL: https://issues.apache.org/jira/browse/BEAM-6597
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Luke Cwik
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I think this is the correct approach, as I don't believe there is any hook in 
> the Java SDK yet for ProcessBundleProgressResponses.
> (1) Implement ProcessBundleProgressResponse
> See FnHarness.main to add a handle for RequestCase.PROGRESS_BUNDLE.
> (2) Refactor ProgressBundleHandler so that the metrics can be extracted from 
> the MetricContainerStep map and SimpleExecutionStates for the instrucitonId 
> when the call comes in. (Right now all these objects only live in the local 
> function, they may need to live in an object instead which can be accessed by 
> both process bundle and progress bundle responses). Be careful to not 
> introduce thread contention. Ideally we need a way to read the values without 
> locking new ones from being written.
> (Test 1) Also be sure to simplify RemoteExecutionTest.testMetrics().
> By inspecting the metric progress, we can remove the sleeps from this code. 
> Currently there are sleeps in start, process and finish to ensure execution 
> time metrics are added. Instead, once progress bundle responses are 
> introduced, the metrics can be examined here



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-1866) Fn API support for Metrics

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-1866:
---

Assignee: Luke Cwik

> Fn API support for Metrics
> --
>
> Key: BEAM-1866
> URL: https://issues.apache.org/jira/browse/BEAM-1866
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Dan Halperin
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
> Fix For: Not applicable
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> As part of the Fn API work, we need to define a Metrics interface between the 
> Runner and the SDK. Right now, Metrics are simply lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10012) Update Python SDK to construct Dataflow job requests from Beam runner API protos

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-10012:
-
Status: Open  (was: Triage Needed)

> Update Python SDK to construct Dataflow job requests from Beam runner API 
> protos
> 
>
> Key: BEAM-10012
> URL: https://issues.apache.org/jira/browse/BEAM-10012
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Priority: Major
>
> Currently, portable runners are expected to do following when constructing a 
> runner specific job.
> SDK specific job graph -> Beam runner API proto -> Runner specific job request
> Portable Spark and Flink follow this model.
> Dataflow does following.
> SDK specific job graph -> Runner specific job request
> Beam runner API proto -> Upload to GCS -> Download at workers
>  
> We should update Dataflow to follow the prior path which is expected to be 
> followed by all portable runners.
> This will simplify the cross-language transforms job construction logic for 
> Dataflow.
> We can probably start this by just implementing this for Python SDK for 
> portions of pipeline received by expanding external transforms.
> cc: [~lcwik] [~robertwb]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-3545) Fn API metrics in Go SDK harness

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-3545.
-
Fix Version/s: 2.21.0
   Resolution: Fixed

Any additional styles of metrics/capabilities have been tracked under other 
bugs.

> Fn API metrics in Go SDK harness
> 
>
> Key: BEAM-3545
> URL: https://issues.apache.org/jira/browse/BEAM-3545
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Kenneth Knowles
>Assignee: Robert Burke
>Priority: Major
>  Labels: portability
> Fix For: 2.21.0
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-3543) Fn API metrics in Java SDK Harness

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-3543:
---

Assignee: Luke Cwik

> Fn API metrics in Java SDK Harness
> --
>
> Key: BEAM-3543
> URL: https://issues.apache.org/jira/browse/BEAM-3543
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-harness
>Reporter: Kenneth Knowles
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-3543) Fn API metrics in Java SDK Harness

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-3543.
-
Fix Version/s: 2.21.0
   Resolution: Fixed

> Fn API metrics in Java SDK Harness
> --
>
> Key: BEAM-3543
> URL: https://issues.apache.org/jira/browse/BEAM-3543
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-harness
>Reporter: Kenneth Knowles
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
> Fix For: 2.21.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-3544) Fn API metrics in Python SDK Harness

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-3544.
-
Fix Version/s: 2.21.0
   Resolution: Fixed

> Fn API metrics in Python SDK Harness
> 
>
> Key: BEAM-3544
> URL: https://issues.apache.org/jira/browse/BEAM-3544
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-harness
>Reporter: Kenneth Knowles
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: 2.21.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-3544) Fn API metrics in Python SDK Harness

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-3544:
---

Assignee: Robert Bradshaw

> Fn API metrics in Python SDK Harness
> 
>
> Key: BEAM-3544
> URL: https://issues.apache.org/jira/browse/BEAM-3544
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-harness
>Reporter: Kenneth Knowles
>Assignee: Robert Bradshaw
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9934) Resolve differences in beam:metric:element_count:v1 implementations

2020-05-14 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9934:

Fix Version/s: (was: 2.22.0)

> Resolve differences in beam:metric:element_count:v1 implementations
> ---
>
> Key: BEAM-9934
> URL: https://issues.apache.org/jira/browse/BEAM-9934
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>
> The [element 
> count|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/model/pipeline/src/main/proto/metrics.proto#L206]
>  metric represents the number of elements within a PCollection and is 
> interpreted differently across the Beam SDK versions.
> In the [Java 
> SDK|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java#L207]
>  this represents the number of elements and includes how many windows those 
> elements are in. This metric is incremented as soon as the element has been 
> output.
> In the [Python 
> SDK|https://github.com/apache/beam/blame/bfd151aa4c3aad29f3aea6482212ff8543ded8d7/sdks/python/apache_beam/runners/worker/opcounters.py#L247]
>  this represents the number of elements and doesn't include how many windows 
> those elements are in. The metric is also only incremented after the element 
> has finished processing.
> The [Go 
> SDK|https://github.com/apache/beam/blob/7097850daa46674b88425a124bc442fc8ce0dcb8/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L260]
>  does the same thing as Python.
> Traditionally in Dataflow this has always been the exploded window element 
> count and the counter is incremented as soon as the element is output.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9995) Ensure that the environment is propagated through from ExpansionService to Dataflow

2020-05-14 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9995:

Component/s: runner-dataflow

> Ensure that the environment is propagated through from ExpansionService to 
> Dataflow
> ---
>
> Key: BEAM-9995
> URL: https://issues.apache.org/jira/browse/BEAM-9995
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>
> The CrossLanguageTest#test_flatten fails because the environment being 
> returned from the expansion service is being lost during translation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9995) Ensure that the environment is propagated through from ExpansionService to Dataflow

2020-05-14 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-9995:
---

 Summary: Ensure that the environment is propagated through from 
ExpansionService to Dataflow
 Key: BEAM-9995
 URL: https://issues.apache.org/jira/browse/BEAM-9995
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Luke Cwik
Assignee: Luke Cwik


The CrossLanguageTest#test_flatten fails because the environment being returned 
from the expansion service is being lost during translation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9964) Setting workerCacheMb to make its way to the WindmillStateCache Constructor

2020-05-14 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9964:

Status: Open  (was: Triage Needed)

> Setting workerCacheMb to make its way to the WindmillStateCache Constructor
> ---
>
> Key: BEAM-9964
> URL: https://issues.apache.org/jira/browse/BEAM-9964
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Omar Ismail
>Priority: Minor
>
> Setting --workerCacheMB seems to affect batch pipelines only. For Streaming, 
> the cache seems to be hardcoded to 100Mb [1]. If possible, I would like to 
> make it allowable to change the cache value in Streaming when setting 
> -workerCacheMB.
> I've never made changes to the Beam SDK, so I am super excited to work on 
> this! 
>  
> [[1] 
> https://github.com/apache/beam/blob/5e659bb80bcbf70795f6806e05a255ee72706d9f/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java#L73|https://github.com/apache/beam/blob/5e659bb80bcbf70795f6806e05a255ee72706d9f/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java#L73]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9964) Setting workerCacheMb to make its way to the WindmillStateCache Constructor

2020-05-14 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-9964:
---

Assignee: Omar Ismail

> Setting workerCacheMb to make its way to the WindmillStateCache Constructor
> ---
>
> Key: BEAM-9964
> URL: https://issues.apache.org/jira/browse/BEAM-9964
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Omar Ismail
>Assignee: Omar Ismail
>Priority: Minor
>
> Setting --workerCacheMB seems to affect batch pipelines only. For Streaming, 
> the cache seems to be hardcoded to 100Mb [1]. If possible, I would like to 
> make it allowable to change the cache value in Streaming when setting 
> -workerCacheMB.
> I've never made changes to the Beam SDK, so I am super excited to work on 
> this! 
>  
> [[1] 
> https://github.com/apache/beam/blob/5e659bb80bcbf70795f6806e05a255ee72706d9f/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java#L73|https://github.com/apache/beam/blob/5e659bb80bcbf70795f6806e05a255ee72706d9f/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java#L73]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9935) Resolve differences in allowed_split_point implementations

2020-05-14 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107289#comment-17107289
 ] 

Luke Cwik commented on BEAM-9935:
-

Turns out the go implementation respects the value even if it is too strict in 
its implementation so removed from 2.22 list.

> Resolve differences in allowed_split_point implementations
> --
>
> Key: BEAM-9935
> URL: https://issues.apache.org/jira/browse/BEAM-9935
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [Java SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataReadRunner.java#L223]
>  doesn't support it yet which is also safe.
> [Go SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L273]
>  only supports splits if points are specified and it doesn't use the fraction 
> at all.
> [Python SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/python/apache_beam/runners/worker/bundle_processor.py#L947]
>  ignores the split points meaning that it may return an invalid split 
> location based upon the runners limitations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9935) Resolve differences in allowed_split_point implementations

2020-05-14 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9935:

Fix Version/s: (was: 2.22.0)

> Resolve differences in allowed_split_point implementations
> --
>
> Key: BEAM-9935
> URL: https://issues.apache.org/jira/browse/BEAM-9935
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [Java SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataReadRunner.java#L223]
>  doesn't support it yet which is also safe.
> [Go SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L273]
>  only supports splits if points are specified and it doesn't use the fraction 
> at all.
> [Python SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/python/apache_beam/runners/worker/bundle_processor.py#L947]
>  ignores the split points meaning that it may return an invalid split 
> location based upon the runners limitations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9935) Resolve differences in allowed_split_point implementations

2020-05-13 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-9935:
---

Assignee: Daniel Oliveira  (was: Robert Bradshaw)

> Resolve differences in allowed_split_point implementations
> --
>
> Key: BEAM-9935
> URL: https://issues.apache.org/jira/browse/BEAM-9935
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Daniel Oliveira
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [Java SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataReadRunner.java#L223]
>  doesn't support it yet which is also safe.
> [Go SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L273]
>  only supports splits if points are specified and it doesn't use the fraction 
> at all.
> [Python SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/python/apache_beam/runners/worker/bundle_processor.py#L947]
>  ignores the split points meaning that it may return an invalid split 
> location based upon the runners limitations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9935) Resolve differences in allowed_split_point implementations

2020-05-13 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106776#comment-17106776
 ] 

Luke Cwik commented on BEAM-9935:
-

[~danoliveira], can you address the Go version of this code borrowing on pr 
11671, 11688, and 11689 for implementation details?

> Resolve differences in allowed_split_point implementations
> --
>
> Key: BEAM-9935
> URL: https://issues.apache.org/jira/browse/BEAM-9935
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Daniel Oliveira
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [Java SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataReadRunner.java#L223]
>  doesn't support it yet which is also safe.
> [Go SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L273]
>  only supports splits if points are specified and it doesn't use the fraction 
> at all.
> [Python SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/python/apache_beam/runners/worker/bundle_processor.py#L947]
>  ignores the split points meaning that it may return an invalid split 
> location based upon the runners limitations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9935) Resolve differences in allowed_split_point implementations

2020-05-13 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106775#comment-17106775
 ] 

Luke Cwik commented on BEAM-9935:
-

This is not a blocker for 2.21 after 11671 was merged. 11688 was for 2.22. 
There is still changes for a Go implementation that is necessary.

> Resolve differences in allowed_split_point implementations
> --
>
> Key: BEAM-9935
> URL: https://issues.apache.org/jira/browse/BEAM-9935
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [Java SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataReadRunner.java#L223]
>  doesn't support it yet which is also safe.
> [Go SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L273]
>  only supports splits if points are specified and it doesn't use the fraction 
> at all.
> [Python SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/python/apache_beam/runners/worker/bundle_processor.py#L947]
>  ignores the split points meaning that it may return an invalid split 
> location based upon the runners limitations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9935) Resolve differences in allowed_split_point implementations

2020-05-13 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9935:

Priority: Major  (was: Blocker)

> Resolve differences in allowed_split_point implementations
> --
>
> Key: BEAM-9935
> URL: https://issues.apache.org/jira/browse/BEAM-9935
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [Java SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataReadRunner.java#L223]
>  doesn't support it yet which is also safe.
> [Go SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L273]
>  only supports splits if points are specified and it doesn't use the fraction 
> at all.
> [Python SDK 
> harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/python/apache_beam/runners/worker/bundle_processor.py#L947]
>  ignores the split points meaning that it may return an invalid split 
> location based upon the runners limitations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9001) Allow setting environment ID to all transforms in the SDK

2020-05-13 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106714#comment-17106714
 ] 

Luke Cwik commented on BEAM-9001:
-

We only need to cherry pick the Python portion of 
https://github.com/apache/beam/pull/11670 if there are merge conflicts with the 
other parts for 2.21. This can be resolved after the cherry pick.

> Allow setting environment ID to all transforms in the SDK
> -
>
> Key: BEAM-9001
> URL: https://issues.apache.org/jira/browse/BEAM-9001
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-harness, sdk-py-core, 
> sdk-py-harness
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Luke Cwik
>Priority: Blocker
> Fix For: 2.21.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently Beam SDKs set environment in a known set of transforms and do not 
> not set it in others. Runners expect certain transforms to not to resolve to 
> an environment.
> It might be cleaner to set environment in all transforms by default (at the 
> SDKs) and allow runners to override this for transforms that are naively 
> implemented in the corresponding runners.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-6327) Don't attempt to fuse subtransforms of primitive/known transforms.

2020-05-13 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-6327:
---

Assignee: Luke Cwik  (was: Kyle Weaver)

> Don't attempt to fuse subtransforms of primitive/known transforms.
> --
>
> Key: BEAM-6327
> URL: https://issues.apache.org/jira/browse/BEAM-6327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-direct
>Reporter: Robert Bradshaw
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently we must remove all sub-components of any known transform that may 
> have an optional substructure, e.g. 
> [https://github.com/apache/beam/blob/release-2.9.0/sdks/python/apache_beam/runners/portability/portable_runner.py#L126]
>  (for GBK) and [https://github.com/apache/beam/pull/7360] (Reshuffle).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-6327) Don't attempt to fuse subtransforms of primitive/known transforms.

2020-05-13 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-6327.
-
Fix Version/s: 2.22.0
   Resolution: Fixed

> Don't attempt to fuse subtransforms of primitive/known transforms.
> --
>
> Key: BEAM-6327
> URL: https://issues.apache.org/jira/browse/BEAM-6327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-direct
>Reporter: Robert Bradshaw
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
> Fix For: 2.22.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently we must remove all sub-components of any known transform that may 
> have an optional substructure, e.g. 
> [https://github.com/apache/beam/blob/release-2.9.0/sdks/python/apache_beam/runners/portability/portable_runner.py#L126]
>  (for GBK) and [https://github.com/apache/beam/pull/7360] (Reshuffle).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9945) Use consistent element count for progress counter.

2020-05-13 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106619#comment-17106619
 ] 

Luke Cwik commented on BEAM-9945:
-

[~ibzib] Can you cherrypick https://github.com/apache/beam/pull/11689 or just 
the Python portion of it if there is an issue generating the cherrypick?

> Use consistent element count for progress counter.
> --
>
> Key: BEAM-9945
> URL: https://issues.apache.org/jira/browse/BEAM-9945
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Critical
> Fix For: 2.21.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This influences how the SDK communicates progress and how splitting is 
> performed.
> We currently inspect the element count of the output (which has inconsistent 
> definitions across SDKs, see BEAM-9934). Instead, we can move to using an 
> explicit, separate metric. 
> This can currently lead to incorrect progress reporting and in some cases 
> even a crash in the UW depending on the SDK, and makes it more difficult (but 
> not impossible) to fix element counts in the future. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9945) Use consistent element count for progress counter.

2020-05-13 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106619#comment-17106619
 ] 

Luke Cwik edited comment on BEAM-9945 at 5/13/20, 7:45 PM:
---

[~ibzib] Can you cherrypick https://github.com/apache/beam/pull/11689 or just 
the Python portion of it if there is an issue generating the cherrypick?

This would close the issue.


was (Author: lcwik):
[~ibzib] Can you cherrypick https://github.com/apache/beam/pull/11689 or just 
the Python portion of it if there is an issue generating the cherrypick?

> Use consistent element count for progress counter.
> --
>
> Key: BEAM-9945
> URL: https://issues.apache.org/jira/browse/BEAM-9945
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Critical
> Fix For: 2.21.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This influences how the SDK communicates progress and how splitting is 
> performed.
> We currently inspect the element count of the output (which has inconsistent 
> definitions across SDKs, see BEAM-9934). Instead, we can move to using an 
> explicit, separate metric. 
> This can currently lead to incorrect progress reporting and in some cases 
> even a crash in the UW depending on the SDK, and makes it more difficult (but 
> not impossible) to fix element counts in the future. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9979) Fix race condition where the read index maybe reported from the last executed bundle

2020-05-13 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9979:

Component/s: (was: sdk-py-harness)

> Fix race condition where the read index maybe reported from the last executed 
> bundle
> 
>
> Key: BEAM-9979
> URL: https://issues.apache.org/jira/browse/BEAM-9979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Luke Cwik
>Priority: Minor
> Fix For: 2.22.0
>
>
> When the BeamFnDataReadRunner/DataInputOperation is reused there is a short 
> period of time when a progress request could happen before the the start 
> function is called resetting the read index to -1.
> I believe there should be a way to *reset* an operator before it gets added 
> to the set of cached bundle processors separate instead of placing clean-up 
> in any *start* functions that those operators may rely on preventing exposing 
> details of those operators before *start* may have been invoked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9979) Fix race condition where the read index maybe reported from the last executed bundle

2020-05-13 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9979:

Description: 
When the BeamFnDataReadRunner is reused there is a short period of time when a 
progress request could happen before the the start function is called resetting 
the read index to -1.

I believe there should be a way to *reset* an operator before it gets added to 
the set of cached bundle processors separate instead of placing clean-up in any 
*start* functions that those operators may rely on preventing exposing details 
of those operators before *start* may have been invoked.

  was:
When the BeamFnDataReadRunner/DataInputOperation is reused there is a short 
period of time when a progress request could happen before the the start 
function is called resetting the read index to -1.

I believe there should be a way to *reset* an operator before it gets added to 
the set of cached bundle processors separate instead of placing clean-up in any 
*start* functions that those operators may rely on preventing exposing details 
of those operators before *start* may have been invoked.


> Fix race condition where the read index maybe reported from the last executed 
> bundle
> 
>
> Key: BEAM-9979
> URL: https://issues.apache.org/jira/browse/BEAM-9979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Luke Cwik
>Priority: Minor
> Fix For: 2.22.0
>
>
> When the BeamFnDataReadRunner is reused there is a short period of time when 
> a progress request could happen before the the start function is called 
> resetting the read index to -1.
> I believe there should be a way to *reset* an operator before it gets added 
> to the set of cached bundle processors separate instead of placing clean-up 
> in any *start* functions that those operators may rely on preventing exposing 
> details of those operators before *start* may have been invoked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9979) Fix race condition where the read index maybe reported from the last executed bundle

2020-05-12 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9979:

Description: 
When the BeamFnDataReadRunner/DataInputOperation is reused there is a short 
period of time when a progress request could happen before the the start 
function is called resetting the read index to -1.

I believe there should be a way to *reset* an operator before it gets added to 
the set of cached bundle processors separate instead of placing clean-up in any 
*start* functions that those operators may rely on preventing exposing details 
of those operators before *start* may have been invoked.

  was:When the BeamFnDataReadRunner/DataInputOperation is reused there is a 
short period of time when a progress request could happen before the the start 
function is called resetting the read index to -1.


> Fix race condition where the read index maybe reported from the last executed 
> bundle
> 
>
> Key: BEAM-9979
> URL: https://issues.apache.org/jira/browse/BEAM-9979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Luke Cwik
>Priority: Minor
> Fix For: 2.22.0
>
>
> When the BeamFnDataReadRunner/DataInputOperation is reused there is a short 
> period of time when a progress request could happen before the the start 
> function is called resetting the read index to -1.
> I believe there should be a way to *reset* an operator before it gets added 
> to the set of cached bundle processors separate instead of placing clean-up 
> in any *start* functions that those operators may rely on preventing exposing 
> details of those operators before *start* may have been invoked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9979) Fix race condition where the read index maybe reported from the last executed bundle

2020-05-12 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9979:

Status: Open  (was: Triage Needed)

> Fix race condition where the read index maybe reported from the last executed 
> bundle
> 
>
> Key: BEAM-9979
> URL: https://issues.apache.org/jira/browse/BEAM-9979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Luke Cwik
>Priority: Minor
> Fix For: 2.22.0
>
>
> When the BeamFnDataReadRunner/DataInputOperation is reused there is a short 
> period of time when a progress request could happen before the the start 
> function is called resetting the read index to -1.
> I believe there should be a way to *reset* an operator before it gets added 
> to the set of cached bundle processors separate instead of placing clean-up 
> in any *start* functions that those operators may rely on preventing exposing 
> details of those operators before *start* may have been invoked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9979) Fix race condition where the read index maybe reported from the last executed bundle

2020-05-12 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-9979:
---

 Summary: Fix race condition where the read index maybe reported 
from the last executed bundle
 Key: BEAM-9979
 URL: https://issues.apache.org/jira/browse/BEAM-9979
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-harness, sdk-py-harness
Reporter: Luke Cwik
 Fix For: 2.22.0


When the BeamFnDataReadRunner/DataInputOperation is reused there is a short 
period of time when a progress request could happen before the the start 
function is called resetting the read index to -1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9945) Use consistent element count for progress counter.

2020-05-12 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105926#comment-17105926
 ] 

Luke Cwik edited comment on BEAM-9945 at 5/13/20, 3:23 AM:
---

When testing the implementation against runner v2, I found that the Go SDK's 
read index ends at *# elements* or the *stop index* when splitting while the 
Python/Java implementations always stop at *# elements - 1* or *stop index - 1*

I think it makes sense to use Go's definition.


was (Author: lcwik):
When testing the implementation against runner v2, I found that the Go SDK's 
read index ends at *# elements* or the *stop index* when splitting while the 
Python/Java implements always stop at *# elements - 1* or *stop index - 1*

I think it makes sense to use Go's definition.

> Use consistent element count for progress counter.
> --
>
> Key: BEAM-9945
> URL: https://issues.apache.org/jira/browse/BEAM-9945
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Critical
> Fix For: 2.21.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This influences how the SDK communicates progress and how splitting is 
> performed.
> We currently inspect the element count of the output (which has inconsistent 
> definitions across SDKs, see BEAM-9934). Instead, we can move to using an 
> explicit, separate metric. 
> This can currently lead to incorrect progress reporting and in some cases 
> even a crash in the UW depending on the SDK, and makes it more difficult (but 
> not impossible) to fix element counts in the future. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9945) Use consistent element count for progress counter.

2020-05-12 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105937#comment-17105937
 ] 

Luke Cwik commented on BEAM-9945:
-

Fix in https://github.com/apache/beam/pull/11689

> Use consistent element count for progress counter.
> --
>
> Key: BEAM-9945
> URL: https://issues.apache.org/jira/browse/BEAM-9945
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Critical
> Fix For: 2.21.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This influences how the SDK communicates progress and how splitting is 
> performed.
> We currently inspect the element count of the output (which has inconsistent 
> definitions across SDKs, see BEAM-9934). Instead, we can move to using an 
> explicit, separate metric. 
> This can currently lead to incorrect progress reporting and in some cases 
> even a crash in the UW depending on the SDK, and makes it more difficult (but 
> not impossible) to fix element counts in the future. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9945) Use consistent element count for progress counter.

2020-05-12 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9945:

Status: Open  (was: Triage Needed)

> Use consistent element count for progress counter.
> --
>
> Key: BEAM-9945
> URL: https://issues.apache.org/jira/browse/BEAM-9945
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Critical
> Fix For: 2.21.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This influences how the SDK communicates progress and how splitting is 
> performed.
> We currently inspect the element count of the output (which has inconsistent 
> definitions across SDKs, see BEAM-9934). Instead, we can move to using an 
> explicit, separate metric. 
> This can currently lead to incorrect progress reporting and in some cases 
> even a crash in the UW depending on the SDK, and makes it more difficult (but 
> not impossible) to fix element counts in the future. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-9945) Use consistent element count for progress counter.

2020-05-12 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reopened BEAM-9945:
-

When testing the implementation against runner v2, I found that the Go SDK's 
read index ends at *# elements* or the *stop index* when splitting while the 
Python/Java implements always stop at *# elements - 1* or *stop index - 1*

I think it makes sense to use Go's definition.

> Use consistent element count for progress counter.
> --
>
> Key: BEAM-9945
> URL: https://issues.apache.org/jira/browse/BEAM-9945
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Critical
> Fix For: 2.21.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This influences how the SDK communicates progress and how splitting is 
> performed.
> We currently inspect the element count of the output (which has inconsistent 
> definitions across SDKs, see BEAM-9934). Instead, we can move to using an 
> explicit, separate metric. 
> This can currently lead to incorrect progress reporting and in some cases 
> even a crash in the UW depending on the SDK, and makes it more difficult (but 
> not impossible) to fix element counts in the future. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9934) Resolve differences in beam:metric:element_count:v1 implementations

2020-05-12 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105866#comment-17105866
 ] 

Luke Cwik commented on BEAM-9934:
-

2.22

> Resolve differences in beam:metric:element_count:v1 implementations
> ---
>
> Key: BEAM-9934
> URL: https://issues.apache.org/jira/browse/BEAM-9934
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
> Fix For: 2.22.0
>
>
> The [element 
> count|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/model/pipeline/src/main/proto/metrics.proto#L206]
>  metric represents the number of elements within a PCollection and is 
> interpreted differently across the Beam SDK versions.
> In the [Java 
> SDK|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java#L207]
>  this represents the number of elements and includes how many windows those 
> elements are in. This metric is incremented as soon as the element has been 
> output.
> In the [Python 
> SDK|https://github.com/apache/beam/blame/bfd151aa4c3aad29f3aea6482212ff8543ded8d7/sdks/python/apache_beam/runners/worker/opcounters.py#L247]
>  this represents the number of elements and doesn't include how many windows 
> those elements are in. The metric is also only incremented after the element 
> has finished processing.
> The [Go 
> SDK|https://github.com/apache/beam/blob/7097850daa46674b88425a124bc442fc8ce0dcb8/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L260]
>  does the same thing as Python.
> Traditionally in Dataflow this has always been the exploded window element 
> count and the counter is incremented as soon as the element is output.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9934) Resolve differences in beam:metric:element_count:v1 implementations

2020-05-12 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9934:

Fix Version/s: (was: 2.21.0)
   2.22.0

> Resolve differences in beam:metric:element_count:v1 implementations
> ---
>
> Key: BEAM-9934
> URL: https://issues.apache.org/jira/browse/BEAM-9934
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
> Fix For: 2.22.0
>
>
> The [element 
> count|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/model/pipeline/src/main/proto/metrics.proto#L206]
>  metric represents the number of elements within a PCollection and is 
> interpreted differently across the Beam SDK versions.
> In the [Java 
> SDK|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java#L207]
>  this represents the number of elements and includes how many windows those 
> elements are in. This metric is incremented as soon as the element has been 
> output.
> In the [Python 
> SDK|https://github.com/apache/beam/blame/bfd151aa4c3aad29f3aea6482212ff8543ded8d7/sdks/python/apache_beam/runners/worker/opcounters.py#L247]
>  this represents the number of elements and doesn't include how many windows 
> those elements are in. The metric is also only incremented after the element 
> has finished processing.
> The [Go 
> SDK|https://github.com/apache/beam/blob/7097850daa46674b88425a124bc442fc8ce0dcb8/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L260]
>  does the same thing as Python.
> Traditionally in Dataflow this has always been the exploded window element 
> count and the counter is incremented as soon as the element is output.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9488) Python SDK sending unexpected MonitoringInfo

2020-05-12 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9488:

Fix Version/s: (was: 2.22.0)
   2.21.0

> Python SDK sending unexpected MonitoringInfo
> 
>
> Key: BEAM-9488
> URL: https://issues.apache.org/jira/browse/BEAM-9488
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Assignee: Luke Cwik
>Priority: Minor
>  Labels: portability
> Fix For: 2.21.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> element_count metrics is supposed to be tied with pcollection ids, but by 
> inspecting what is sent over by python sdk, we see there are monitoringInfo 
> sent wit ptransforms in it. 
> [Double checked the job graph, these seem to be redundant. i.e. the 
> corresponding pcollection does have its own MonitoringInfo reported.]
> Likely a bug. 
> Proof: 
> urn: "beam:metric:element_count:v1"
> type: "beam:metrics:sum_int_64"
> metric {
>   counter_data {
> int64_value: 1
>   }
> }
> labels {
>   key: "PTRANSFORM"
>   value: "start/MaybeReshuffle/Reshuffle/RemoveRandomKeys-ptransform-85"
> }
> labels {
>   key: "TAG"
>   value: "None"
> }
> timestamp {
>   seconds: 1583949073
>   nanos: 842402935
> }



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-6327) Don't attempt to fuse subtransforms of primitive/known transforms.

2020-05-12 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105629#comment-17105629
 ] 

Luke Cwik commented on BEAM-6327:
-

I'm not sure if we want to add it to the fuser since the pipeline runner may 
want to choose the order in which they perform transform expansion/replacement.

> Don't attempt to fuse subtransforms of primitive/known transforms.
> --
>
> Key: BEAM-6327
> URL: https://issues.apache.org/jira/browse/BEAM-6327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-direct
>Reporter: Robert Bradshaw
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently we must remove all sub-components of any known transform that may 
> have an optional substructure, e.g. 
> [https://github.com/apache/beam/blob/release-2.9.0/sdks/python/apache_beam/runners/portability/portable_runner.py#L126]
>  (for GBK) and [https://github.com/apache/beam/pull/7360] (Reshuffle).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   3   4   5   6   7   8   9   >