Github user tillrohrmann commented on the pull request:
https://github.com/apache/flink/pull/1069#issuecomment-137140076
@gdfm, thanks for the great explanation. It's a good way to visualize what's
happening.
I'm just wondering why you shouldn't be able to connect more than 2
containers, say 3. With extremely high data skew, this might be helpful.
Imagine that you have 10 containers and only 2 of them are full. After
connecting two containers you'll get, in the best case, 4 half-filled
containers, which still leaves 6 containers unused. Wouldn't it be better to
connect, for example, 5 containers in this case? Then in the best case you
would use all available containers. Of course, this strongly depends on your
actual data, so I'd vote to make the number of connected containers
configurable.
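
To make the arithmetic above concrete, here is a rough sketch of the best-case numbers. The function name `best_case_usage` and its parameters are hypothetical and not part of the pull request; it assumes that only the full containers carry load, that each one is spread evenly over its connected containers, and that the connected groups do not overlap.

```python
def best_case_usage(total, full, connected):
    """Best-case container usage when each full container's load is
    spread over `connected` containers (hypothetical helper).

    Assumes non-overlapping groups and an even split of the load.
    Returns (used, unused, fill), where fill is the load per used
    container, measured in units of one full container.
    """
    if total < 1 or full < 0 or connected < 1:
        raise ValueError("total and connected must be >= 1, full must be >= 0")
    # Cannot use more containers than exist.
    used = min(total, full * connected)
    unused = total - used
    # The load of `full` containers is shared by `used` containers.
    fill = full / used if used else 0.0
    return used, unused, fill


if __name__ == "__main__":
    for d in (2, 3, 5):
        used, unused, fill = best_case_usage(total=10, full=2, connected=d)
        print(f"connected={d}: used={used}, unused={unused}, fill={fill:.2f}")
```

Under these assumptions, 10 containers with 2 full ones give 4 used and 6 unused when connecting 2, and all 10 used when connecting 5, which matches the numbers in the comment. Real data will rarely hit the best case, which is the argument for keeping the value configurable.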