[GitHub] [beam] dmvk commented on a change in pull request #13353: [BEAM-11267] Remove unecessary reshuffle for stateful ParDo after key…

2020-11-16 Thread GitBox


dmvk commented on a change in pull request #13353:
URL: https://github.com/apache/beam/pull/13353#discussion_r524545989



##
File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTranslationContext.java
##
@@ -84,6 +85,17 @@ public void setOutputDataStream(PValue value, DataStream set) {
     }
   }
 
+  void setProducer(T value, PTransform producer) {
+    if (!producers.containsKey(value)) {

Review comment:
   👍 makes sense





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] dmvk commented on a change in pull request #13353: [BEAM-11267] Remove unecessary reshuffle for stateful ParDo after key…

2020-11-16 Thread GitBox


dmvk commented on a change in pull request #13353:
URL: https://github.com/apache/beam/pull/13353#discussion_r524545316



##
File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/WorkItemKeySelector.java
##
@@ -49,6 +52,6 @@ public ByteBuffer getKey(WindowedValue> value) thro
 
   @Override
   public TypeInformation getProducedType() {
-    return new GenericTypeInfo<>(ByteBuffer.class);
+    return new CoderTypeInformation<>(FlinkKeyUtils.ByteBufferCoder.of(), pipelineOptions.get());

Review comment:
   I'm not sure this change is necessary. I wanted to ensure that the new
"reinterpreted partitioning" is compatible with the one used by GBK / Combine.
   
   The idea was that if the partitioning is not compatible, it may cause
glitches related to state partitioning (e.g. you wouldn't have local state for
a key group you need).
   
   On second thought, Flink selects the target partition (key group) based on
the key's "POJO hash code" (not on its binary representation), so the previous
version was probably compatible enough 🤔
   
   @mxm WDYT?
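
A minimal sketch of the reasoning above, not the actual Flink or Beam code. It assumes that key-group selection depends only on `key.hashCode()` (mixed and taken modulo the max parallelism), and that `ByteBuffer.hashCode()` depends only on the buffer's remaining bytes. The class `KeyGroupSketch`, its methods, and the `mix` function are hypothetical stand-ins; Flink applies its own murmur-style hash at that step.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class KeyGroupSketch {

  // Hypothetical stand-in for the hash mixing Flink applies on top of hashCode().
  // This is NOT Flink's real function; it only illustrates the shape of the computation.
  static int mix(int h) {
    h ^= (h >>> 16);
    h *= 0x85ebca6b;
    h ^= (h >>> 13);
    return h;
  }

  // Key group derived solely from the key object's hashCode(),
  // independent of how the key would be serialized.
  static int keyGroupFor(Object key, int maxParallelism) {
    return Math.floorMod(mix(key.hashCode()), maxParallelism);
  }

  // Maps a key group to the index of the parallel operator instance that owns it.
  static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
    return keyGroup * parallelism / maxParallelism;
  }

  public static void main(String[] args) {
    // Two distinct ByteBuffer instances wrapping the same encoded key bytes.
    ByteBuffer a = ByteBuffer.wrap("user-42".getBytes(StandardCharsets.UTF_8));
    ByteBuffer b = ByteBuffer.wrap("user-42".getBytes(StandardCharsets.UTF_8));

    int maxParallelism = 128;
    int parallelism = 4;

    int groupA = keyGroupFor(a, maxParallelism);
    int groupB = keyGroupFor(b, maxParallelism);

    // Equal content gives equal hashCode(), hence the same key group and operator.
    System.out.println("same key group: " + (groupA == groupB));
    System.out.println("operator index: "
        + operatorIndexFor(groupA, maxParallelism, parallelism));
  }
}
```

Under these assumptions, the `TypeInformation` returned by `getProducedType()` affects how the key is serialized, while the target key group stays the same as long as the key objects hash identically.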








