Github user fhueske commented on the pull request:

    https://github.com/apache/incubator-flink/pull/128#issuecomment-56951810
  
    The DOP (degree of parallelism) of the partition operator needs to be 
explicitly set to the DOP of the receiving task. Otherwise, the data is 
shuffled again.
    I'm pretty sure this behavior is never wanted, and I think it sets a 
trap for users. Also, multiple successors with different DOPs might cause 
problems.
    These are exactly the cases I tried to avoid with my implementation (along 
with repartitioning where it does not make any sense).
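    To make the trap concrete, here is a minimal sketch of the rule described above: the partitioning only survives if the receiving task runs with the same DOP as the partition operator. The class, the method `needsReshuffle`, and the successor names are hypothetical and only illustrate the reasoning; they are not part of Flink.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Flink: models the DOP rule from the comment.
class DopMismatchCheck {

    // Assumption: data is shuffled again whenever the two DOPs differ.
    static boolean needsReshuffle(int partitionDop, int receiverDop) {
        return partitionDop != receiverDop;
    }

    public static void main(String[] args) {
        int partitionDop = 4;

        // Two made-up successors of the same partition operator with different DOPs.
        Map<String, Integer> successors = new LinkedHashMap<>();
        successors.put("reduceA", 4);
        successors.put("reduceB", 8);

        for (Map.Entry<String, Integer> s : successors.entrySet()) {
            boolean reshuffle = needsReshuffle(partitionDop, s.getValue());
            System.out.println(s.getKey() + " (DOP " + s.getValue() + "): reshuffle=" + reshuffle);
        }
    }
}
```

    With a single partition DOP, at most one of two successors with different DOPs can match it, which is why that case is problematic.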
    
    The way it is done in this PR makes things more explicit and controllable 
for the user. A user can do more things, but also a lot of very stupid things.
    I would prefer the safer alternative, but I won't veto if others find this 
to be the better solution.
    
    However, if we go with the PR, I vote to make the risk of this operator 
much clearer in the JavaDocs and the documentation.
    We could also add the DOP as an additional parameter to rebalance() and 
partitionByHash() and deactivate setParallelism() to make clear that the 
DOP is very important for this operator (-1 could be used for the default 
parallelism).
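
    A rough sketch of what this proposal could look like, assuming a stand-in class rather than the real operator. `PartitionOp`, `resolveParallelism`, and the exact signatures are hypothetical; only the ideas (DOP as a required argument, setParallelism() deactivated, -1 for the default parallelism) come from the suggestion above.

```java
// Hypothetical stand-in for the partition operator; not Flink's actual API.
final class PartitionOp {

    // Sentinel from the proposal: -1 means "use the default parallelism".
    static final int DEFAULT_DOP = -1;

    private final String strategy;
    private final int dop;

    private PartitionOp(String strategy, int dop) {
        if (dop != DEFAULT_DOP && dop < 1) {
            throw new IllegalArgumentException("DOP must be positive or -1, got " + dop);
        }
        this.strategy = strategy;
        this.dop = dop;
    }

    // The DOP is a mandatory argument, so it cannot be forgotten.
    static PartitionOp rebalance(int dop) {
        return new PartitionOp("REBALANCE", dop);
    }

    static PartitionOp partitionByHash(int keyField, int dop) {
        return new PartitionOp("HASH(" + keyField + ")", dop);
    }

    // Resolves the -1 sentinel against a default supplied by the caller.
    int resolveParallelism(int defaultDop) {
        return dop == DEFAULT_DOP ? defaultDop : dop;
    }

    // Deactivated: the DOP can only be given when the operator is created.
    PartitionOp setParallelism(int ignored) {
        throw new UnsupportedOperationException(
                "Pass the DOP to rebalance() or partitionByHash() instead");
    }

    String strategy() {
        return strategy;
    }

    public static void main(String[] args) {
        System.out.println(PartitionOp.partitionByHash(0, 4).resolveParallelism(8)); // 4
        System.out.println(PartitionOp.rebalance(DEFAULT_DOP).resolveParallelism(8)); // 8
    }
}
```

    Throwing from setParallelism() is only one way to deactivate it; the method could also be left off the operator's public type entirely.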


