Till Rohrmann created FLINK-3885:
------------------------------------

             Summary: Revisit parallelism logic
                 Key: FLINK-3885
                 URL: https://issues.apache.org/jira/browse/FLINK-3885
             Project: Flink
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.1.0
            Reporter: Till Rohrmann


Flink has multiple ways to specify the parallelism of programs. One can set it 
directly at the operator or job-wide via the {{ExecutionConfig}}. Then it is 
also possible to define a default parallelism in the {{flink-conf.yaml}}. This 
default parallelism is only respected on the client side.

For batch jobs, the default parallelism is consistently set in the 
{{Optimizer}}. However, for streaming jobs the default parallelism is only set 
iff the {{StreamExecutionEnvironment}} has been created via 
{{StreamExecutionEnvironment.getExecutionEnvironment()}}. If one creates a 
{{RemoteStreamEnvironment}}, then the default parallelism won't be respected. 
Instead the parallelism is {{-1}}. Also the {{JobManager}} does not respect the 
default parallelism. But whenever it sees a parallelism of {{-1}} it sets it to 
{{1}}. This behaviour is not consistent with the batch behaviour and the 
behaviour described in the documentation of {{parallelism.default}}.

On top of that we also have the {{PARALLELISM_AUTO_MAX}} value for the 
parallelism. This value tells the system to use all available slots. However, 
this feature is nowhere documented in our docs.

I would propose to get rid of {{PARALLELISM_AUTO_MAX}} and to make the default 
parallelism behaviour in the batch and streaming API consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to