[ https://issues.apache.org/jira/browse/FLINK-20115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233765#comment-17233765 ]
Till Rohrmann commented on FLINK-20115:
---------------------------------------
I started testing the new batch execution. Since the documentation isn't ready
yet, I used the corresponding
[FLIP-134|https://cwiki.apache.org/confluence/display/FLINK/FLIP-134%3A+Batch+execution+for+the+DataStream+API]
as a starting point. The first thing I noticed is that the current
implementation does not fully match the FLIP, as some of the proposed APIs
do not exist. For example,
{code}
// Proposed in FLIP-134, but not available in the current implementation
StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
env.setRuntimeMode(RuntimeMode.BATCH);
{code}
does not work. Moreover, some of the configuration parameters mentioned in the
FLIP do not exist (only {{execution.runtime-mode}} does). I would suggest
updating the FLIP to reflect the actual state of the implementation in order
to avoid confusion.
I am not entirely convinced that the execution mode should be set via a
configuration parameter. I see a couple of problems with this:
1) The execution mode is probably a job-specific setting (e.g. it depends on
the actual sources). Why should it be configurable via the client's
{{flink-conf.yaml}}?
2) The discoverability of these options is poor, imo. Users need to know about
{{ExecutionOptions}} in order to find {{ExecutionOptions.RUNTIME_MODE}}.
Letting users configure the mode more prominently (e.g. directly on the
{{StreamExecutionEnvironment}}, as the FLIP proposes) could avoid confusion.
Another problem I ran into is that one can construct a {{BATCH}} job that
needs more than a single slot to execute. Hence, the FLIP's statement that a
single slot is sufficient is wrong. The reason is that {{rescale}} still lets
one construct a pipelined region that requires more than a single slot.
A job showing the problem can be found
[here|https://github.com/tillrohrmann/flink-streaming-batch-execution/tree/rescale-fails-scheduling-with-single-slot].
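To make the {{rescale}} problem concrete, here is a minimal sketch in plain Java (no Flink dependency). It assumes that the pointwise edge created by {{rescale}} stays pipelined in {{BATCH}} mode, so all subtasks connected through it end up in one pipelined region. The class {{RescaleRegions}} and its method are hypothetical, and the index mapping is a simplification rather than Flink's exact pointwise arithmetic. Because a slot can host at most one subtask of a given operator, the largest per-operator subtask count within a region is a lower bound on the slots that region needs.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model (not Flink's scheduler code) of a single
 * rescale() edge from an upstream operator with parallelism p to a downstream
 * operator with parallelism q. Assumption: in BATCH mode this pointwise edge
 * stays pipelined, so every pair of subtasks it connects shares a region.
 */
class RescaleRegions {

    // Union-find lookup with path halving.
    // Upstream subtasks have ids 0..p-1, downstream subtasks have ids p..p+q-1.
    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    private static void union(int[] parent, int a, int b) {
        parent[find(parent, a)] = find(parent, b);
    }

    /**
     * Lower bound on the slots needed by the largest pipelined region.
     * Simplified pointwise mapping: every subtask on the side with more
     * subtasks connects to exactly one subtask on the other side, in
     * contiguous blocks (Flink's exact index arithmetic may differ).
     */
    static int requiredSlotsLowerBound(int p, int q) {
        int[] parent = new int[p + q];
        for (int i = 0; i < parent.length; i++) {
            parent[i] = i;
        }
        if (p <= q) {
            for (int j = 0; j < q; j++) {
                union(parent, j * p / q, p + j);
            }
        } else {
            for (int i = 0; i < p; i++) {
                union(parent, i, p + i * q / p);
            }
        }
        // Per region, count the subtasks of each operator
        // (index 0 = upstream operator, index 1 = downstream operator).
        Map<Integer, int[]> counts = new HashMap<>();
        for (int x = 0; x < p + q; x++) {
            int[] c = counts.computeIfAbsent(find(parent, x), k -> new int[2]);
            c[x < p ? 0 : 1]++;
        }
        // A slot hosts at most one subtask of a given operator, so a region
        // needs at least as many slots as its largest per-operator count.
        int slots = 0;
        for (int[] c : counts.values()) {
            slots = Math.max(slots, Math.max(c[0], c[1]));
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(requiredSlotsLowerBound(1, 2)); // prints 2
        System.out.println(requiredSlotsLowerBound(2, 2)); // prints 1
    }
}
```

Under these assumptions, equal parallelism (e.g. {{requiredSlotsLowerBound(2, 2)}}) yields 1, which matches the FLIP's single-slot claim, whereas {{requiredSlotsLowerBound(1, 2)}} yields 2: the single upstream subtask feeds both downstream subtasks, so both have to be deployed at the same time.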
Next I tried to use the new {{FileSource}}. The first problem I ran into was
that the connector depends on {{flink-connector-base}}, which I had to add as
an additional dependency. This seems a bit cumbersome. The ticket to discuss
this problem is FLINK-20196.
Then I tried to add the new {{FileSink}}. The job can be found
[here|https://github.com/tillrohrmann/flink-streaming-batch-execution/tree/file-sink-fails-with-default-bucket-assigner].
Here I immediately ran into an {{UnsupportedOperationException}} when using
the default {{BucketAssigner}}. The ticket to discuss this problem is
FLINK-20197.
> Test Batch execution for the DataStream API
> --------------------------------------------
>
> Key: FLINK-20115
> URL: https://issues.apache.org/jira/browse/FLINK-20115
> Project: Flink
> Issue Type: Sub-task
> Components: API / DataStream
> Affects Versions: 1.12.0
> Reporter: Robert Metzger
> Assignee: Till Rohrmann
> Priority: Critical
> Fix For: 1.12.0
>
>
> Test the following new features:
> - https://issues.apache.org/jira/browse/FLINK-19316
> - https://issues.apache.org/jira/browse/FLINK-19268
> - https://issues.apache.org/jira/browse/FLINK-19758
> The three issues can really only be tested in combination. FLINK-19316 is
> done but missing documentation.
> - Write an example that uses a (new) FileSource, a (new) FileSink, and some
> random transformations.
> - Run the example in BATCH mode.
> - How ergonomic is the API/configuration?
> - Are there any weird log messages/exceptions in the JM/TM logs?
> - Maybe try something that doesn't work in BATCH execution, such as
> iterations/feedback edges.
> ----
> [General Information about the Flink 1.12 release
> testing|https://cwiki.apache.org/confluence/display/FLINK/1.12+Release+-+Community+Testing]
> When testing a feature, consider the following aspects:
> - Is the documentation easy to understand?
> - Are the error messages, log messages, APIs etc. easy to understand?
> - Is the feature working as expected under normal conditions?
> - Is the feature working / failing as expected with invalid input, induced
> errors etc.?
> If you find a problem during testing, please file a ticket
> (Priority=Critical; Fix Version = 1.12.0), and link it in this testing ticket.
> During the testing, and once you are finished, please write a short summary
> of all things you have tested.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)