mwinkels commented on PR #32:
URL: 
https://github.com/apache/flink-connector-gcp-pubsub/pull/32#issuecomment-3062000817

   Hi @clmccart ,
   
   We are trying to use this code with our Flink setup and we found a couple of 
issues.
   
   First, we are migrating to Flink v2.0 and had to adjust the code - mostly 
remove the classes that are no longer supported.
   
   This is what we found:
   
   1. the streaming pull works fine and with Flink v2.0 job restarts are much 
faster, which leads to better performance overall.
   2. the `PubSubSplitEnumerator` could be completely stateless. GCP PubSub 
does not support splits that can be referenced externally, so there is no need 
to maintain any state on the connector side for assigning splits to readers.
   3. the `SubscriptionSplit`-state is similarly not really necessary, since 
there is only one subscription for a source and there is no meaningful split. 
The current implementation is harmful, since it stores the task-id in the split 
and when it recovers, there is a potential of re-creating subscriptions that 
are not part of an active reader. We remove the `uid` field from the 
`SubscriptionSplitProto` to fix this issue.
   4. the `PubSubSplitReader` is more complex then it needs to be. A 
SplitReader instance is always bound to a single source task, so it only ever 
needs a single subscription  for the single split that it will received. The 
`Map<Split, Subscription>` that is currently in the `PubSubSplitReader` should 
only ever hold one instance. We replaced it with an instance field of type 
`NotifyingPullSubscriber`.
   
   I hope these findings will help to improve this MR and get it to the finish 
line.
   
   Best,
   -Maarten Winkels
   
   P.S.: I'm not sure how to contribute changes back, since they are Flink v2 
based.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to