[GitHub] [flink] zentol edited a comment on pull request #15497: [FLINK-22084][runtime] Use a consistent default max parallelism in the Adaptive Scheduler

GitBox Wed, 07 Apr 2021 15:30:01 -0700


zentol edited a comment on pull request #15497:
URL: https://github.com/apache/flink/pull/15497#issuecomment-815303500



   > We could also add the invariant that is currently defined for reactive 
mode, where the savepoint can set the max parallelism as long as it is at least 
what has been auto-configured, though I agree that this adds yet another bit of 
complexity to this tiny use case.
   
   It also wouldn't solve the issue for the RescalingITCase :/
   
   I would rule out option 2 because it will cause the system to behave 
inconsistently; it will work if the initially set max parallelism is larger 
than the one derived from the parallelism, until a user increases the 
parallelism too much. If we're breaking the behavior, let's be strict about it.
   
   As for option 1 vs. option 3, so far we have only concerned ourselves with 
the case where the max parallelism was set initially and then removed in a 
later submission. I'm perfectly content with removing support for such use-cases.
   
   However, I'm concerned about the following:
   Consider a job for which the max parallelism was never set by the user. We 
ran the job with some initial parallelism P1, derived max parallelism MP1, and 
stored it in the savepoint. On the second run we increase the parallelism to P2.
   Is there a case where P2 > P1 AND P2 < MP1, but the derived max parallelism 
MP2 > MP1? IOW, is there a case where a user can increase the parallelism such 
that it is still reasonable to expect the job to run (because we aren't exceeding 
the initial max parallelism), but our new strict rules would forbid it?
   If so, then I think we cannot really refuse to let this work.
   
   Unless I'm missing something, here is an example where this happens:
   ```
   P1= 80 => MP1=128
   P2=100 => MP2=256
   ```
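
To make the numbers above reproducible, here is a minimal sketch of the derivation. It assumes the default max parallelism is computed as 1.5 × parallelism rounded up to the next power of two and clamped to [128, 32768], which is how I understand `KeyGroupRangeAssignment#computeDefaultMaxParallelism` to behave. The class `DefaultMaxParallelism` and its methods `derive` and `breaksOnRescale` are hypothetical names for illustration only, not Flink API.

```java
// Illustrative sketch only; not Flink code. The bounds and the 1.5x rule are
// assumptions modeled on Flink's default max parallelism derivation.
final class DefaultMaxParallelism {
    static final int LOWER_BOUND = 1 << 7;   // 128
    static final int UPPER_BOUND = 1 << 15;  // 32768

    // Smallest power of two that is >= x (for 1 <= x <= 2^30).
    static int roundUpToPowerOfTwo(int x) {
        return x <= 1 ? 1 : Integer.highestOneBit(x - 1) << 1;
    }

    // Derive the max parallelism for a job that never set it explicitly.
    static int derive(int parallelism) {
        if (parallelism <= 0 || parallelism > UPPER_BOUND) {
            throw new IllegalArgumentException("parallelism out of range: " + parallelism);
        }
        int candidate = roundUpToPowerOfTwo(parallelism + parallelism / 2);
        return Math.min(Math.max(candidate, LOWER_BOUND), UPPER_BOUND);
    }

    // True if rescaling from p1 to p2 stays below the stored MP1 but yields
    // a larger derived MP2, i.e. the case a strict equality check would reject.
    static boolean breaksOnRescale(int p1, int p2) {
        int mp1 = derive(p1);
        int mp2 = derive(p2);
        return p2 > p1 && p2 < mp1 && mp2 > mp1;
    }

    public static void main(String[] args) {
        System.out.println("P1=80  => MP1=" + derive(80));
        System.out.println("P2=100 => MP2=" + derive(100));
        System.out.println("breaks on rescale: " + breaksOnRescale(80, 100));
    }
}
```

Under these assumptions, any P1 in the range 43..85 derives MP1=128, while any P2 in 86..127 derives MP2=256 yet is still below MP1, so the problematic window is fairly wide.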
   
   So similarly to option 2, with option 1 we still have this inconsistency 
that can very well break existing jobs when migrated to the adaptive scheduler, 
_or at some point in the future after migration_. The only way to prevent that 
is option 3, or, option 4: outright reject jobs that have not explicitly set 
the max parallelism.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
