@David Yes, my idea was to remove any use of fold method and all related
classes including WindowedStream#fold

@Klou Good idea to also remove the deprecated enableCheckpointing() &
StreamExecutionEnvironment#readFile and alike. I did another pass over
some of the classes and thought we could also drop:

  * ExecutionConfig#set/getCodeAnalysisMode
  * ExecutionConfig#disable/enableSysoutLogging
  * ExecutionConfig#set/isFailTaskOnCheckpointError
  * ExecutionConfig#isLatencyTrackingEnabled

As for the `forceCheckpointing` I am not fully convinced to doing it. As
far as I know iterations still do not participate in checkpointing
correctly. Therefore it still might make sense to force it. In other
words there is no real alternative to that method. Unless we only remove
the methods from StreamExecutionEnvironment and redirect to the setter
in CheckpointConfig. WDYT?

An updated list of methods I suggest to remove:

  * ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
  * ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
  * ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
  * ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
  * ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?)
  * 
StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
    (deprecated in 1.2)
  * RuntimeContext#getAllAccumulators (deprecated in 0.10)
  * DataStream#fold and all related classes and methods such as
    FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated
    in 1.3/1.4)
  * StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
    (deprecated in 1.5)
  * DataStream#split (deprecated in 1.8)
  * Methods in (Connected)DataStream that specify keys as either indices
    or field names such as DataStream#keyBy, DataStream#partitionCustom,
    ConnectedStream#keyBy, .... (deprecated in 1.11)

Bear in mind that majority of the options listed above in
ExecutionConfig take no effect. They were left there purely to satisfy
the binary compatibility. Personally I don't see any benefit of leaving
a method and silently dropping the underlying feature. The only
configuration that is respected is setting the number of execution retries.

I also wanted to make it explicit that most of the changes above would
result in a binary incompatibility and require additional exclusions in
the japicmp. Those are:

  * ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
  * ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
  * ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
  * ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
  * ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?)
  * DataStream#fold and all related classes and methods such as
    FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated
    in 1.3/1.4)
  * DataStream#split (deprecated in 1.8)
  * Methods in (Connected)DataStream that specify keys as either indices
    or field names such as DataStream#keyBy, DataStream#partitionCustom,
    ConnectedStream#keyBy, .... (deprecated in 1.11)
  * 
StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
    (deprecated in 1.2)

Looking forward to more opinions on the issue.

Best,

Dawid


On 17/08/2020 12:49, Kostas Kloudas wrote:
> Thanks a lot for starting this Dawid,
>
> Big +1 for the proposed clean-up, and I would also add the deprecated
> methods of the StreamExecutionEnvironment like:
>
> enableCheckpointing(long interval, CheckpointingMode mode, boolean force)
> enableCheckpointing()
> isForceCheckpointing()
>
> readFile(FileInputFormat<OUT> inputFormat,String
> filePath,FileProcessingMode watchType,long interval, FilePathFilter
> filter)
> readFileStream(...)
>
> socketTextStream(String hostname, int port, char delimiter, long maxRetry)
> socketTextStream(String hostname, int port, char delimiter)
>
> There are more, like the (get)/setNumberOfExecutionRetries() that were
> deprecated long ago, but I have not investigated to see if they are
> actually easy to remove.
>
> Cheers,
> Kostas
>
> On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
> <dwysakow...@apache.org> wrote:
>> Hi devs and users,
>>
>> I wanted to ask you what do you think about removing some of the deprecated 
>> APIs around the DataStream API.
>>
>> The APIs I have in mind are:
>>
>> RuntimeContext#getAllAccumulators (deprecated in 0.10)
>> DataStream#fold and all related classes and methods such as FoldFunction, 
>> FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4)
>> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) (deprecated 
>> in 1.5)
>> DataStream#split (deprecated in 1.8)
>> Methods in (Connected)DataStream that specify keys as either indices or 
>> field names such as DataStream#keyBy, DataStream#partitionCustom, 
>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>
>> I think the first three should be straightforward. They are long deprecated. 
>> The getAccumulators method is not used very often in my opinion. The same 
>> applies to the DataStream#fold which additionally is not very performant. 
>> Lastly the setStateBackend has an alternative with a class from the 
>> AbstractStateBackend hierarchy, therefore it will be still code compatible. 
>> Moreover if we remove the #setStateBackend(AbstractStateBackend) we will get 
>> rid off warnings users have right now when setting a statebackend as the 
>> correct method cannot be used without an explicit casting.
>>
>> As for the DataStream#split I know there were some objections against 
>> removing the #split method in the past. I still believe the output tags can 
>> replace the split method already.
>>
>> The only problem in the last set of methods I propose to remove is that they 
>> were deprecated only in the last release and those method were only 
>> partially deprecated. Moreover some of the methods were not deprecated in 
>> ConnectedStreams. Nevertheless I'd still be inclined to remove the methods 
>> in this release.
>>
>> Let me know what do you think about it.
>>
>> Best,
>>
>> Dawid

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to