+1 for removing them.
From a quick look, most of them (not all) have been deprecated a long
time ago.
Cheers,
Kostas
On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote:
@David Yes, my idea was to remove any use of fold method and all
related classes including WindowedStream#fold
@Klou Good idea to also remove the deprecated enableCheckpointing() &
StreamExecutionEnvironment#readFile and alike. I did another pass over some
of the classes and thought we could also drop:
ExecutionConfig#set/getCodeAnalysisMode
ExecutionConfig#disable/enableSysoutLogging
ExecutionConfig#set/isFailTaskOnCheckpointError
ExecutionConfig#isLatencyTrackingEnabled
As for the `forceCheckpointing` I am not fully convinced to doing it.
As far as I know iterations still do not participate in checkpointing
correctly. Therefore it still might make sense to force it. In other words
there is no real alternative to that method. Unless we only remove the
methods from StreamExecutionEnvironment and redirect to the setter in
CheckpointConfig. WDYT?
An updated list of methods I suggest to remove:
ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
1.1?)
StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
(deprecated in 1.2)
RuntimeContext#getAllAccumulators (deprecated in 0.10)
DataStream#fold and all related classes and methods such as
FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
1.3/1.4)
StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
(deprecated in 1.5)
DataStream#split (deprecated in 1.8)
Methods in (Connected)DataStream that specify keys as either indices
or field names such as DataStream#keyBy, DataStream#partitionCustom,
ConnectedStream#keyBy, .... (deprecated in 1.11)
Bear in mind that majority of the options listed above in
ExecutionConfig take no effect. They were left there purely to satisfy the
binary compatibility. Personally I don't see any benefit of leaving a
method and silently dropping the underlying feature. The only configuration
that is respected is setting the number of execution retries.
I also wanted to make it explicit that most of the changes above
would result in a binary incompatibility and require additional exclusions
in the japicmp. Those are:
ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
1.1?)
DataStream#fold and all related classes and methods such as
FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
1.3/1.4)
DataStream#split (deprecated in 1.8)
Methods in (Connected)DataStream that specify keys as either indices
or field names such as DataStream#keyBy, DataStream#partitionCustom,
ConnectedStream#keyBy, .... (deprecated in 1.11)
StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
(deprecated in 1.2)
Looking forward to more opinions on the issue.
Best,
Dawid
On 17/08/2020 12:49, Kostas Kloudas wrote:
Thanks a lot for starting this Dawid,
Big +1 for the proposed clean-up, and I would also add the deprecated
methods of the StreamExecutionEnvironment like:
enableCheckpointing(long interval, CheckpointingMode mode, boolean
force)
enableCheckpointing()
isForceCheckpointing()
readFile(FileInputFormat inputFormat,String
filePath,FileProcessingMode watchType,long interval, FilePathFilter
filter)
readFileStream(...)
socketTextStream(String hostname, int port, char delimiter, long
maxRetry)
socketTextStream(String hostname, int port, char delimiter)
There are more, like the (get)/setNumberOfExecutionRetries() that were
deprecated long ago, but I have not investigated to see if they are
actually easy to remove.
Cheers,
Kostas
On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
wrote:
Hi devs and users,
I wanted to ask you what do you think about removing some of the
deprecated APIs around the DataStream API.
The APIs I have in mind are:
RuntimeContext#getAllAccumulators (deprecated in 0.10)
DataStream#fold and all related classes and methods such as
FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
1.3/1.4)
StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
(deprecated in 1.5)
DataStream#split (deprecated in 1.8)
Methods in (Connected)DataStream that specify keys as either indices
or field names such as DataStream#keyBy, DataStream#partitionCustom,
ConnectedStream#keyBy, .... (deprecated in 1.11)
I think the first three should be straightforward. They are long
deprecated. The getAccumulators method is not used very often in my
opinion. The same applies to the DataStream#fold which additionally is not
very performant. Lastly the setStateBackend has an alternative with a class
from the AbstractStateBackend hierarchy, therefore it will be still code
compatible. Moreover if we remove the
#setStateBackend(AbstractStateBackend) we will get rid off warnings users
have right now when setting a statebackend as the correct method cannot be
used without an explicit casting.
As for the DataStream#split I know there were some objections against
removing the #split method in the past. I still believe the output tags can
replace the split method already.
The only problem in the last set of methods I propose to remove is
that they were deprecated only in the last release and those method were
only partially deprecated. Moreover some of the methods were not deprecated
in ConnectedStreams. Nevertheless I'd still be inclined to remove the
methods in this release.
Let me know what do you think about it.
Best,
Dawid