Thanks for the discussion!

From this thread, I do not see any objections to moving forward with
removing the sink.
Given this, I will open a voting thread tomorrow.

Cheers,
Kostas

On Wed, Oct 28, 2020 at 6:50 PM Stephan Ewen <se...@apache.org> wrote:
>
> +1 to remove the Bucketing Sink.
>
> In the past, it has been very common to remove code that was deprecated for
> multiple releases in order to reduce baggage, also in cases that had no
> perfect drop-in replacement and required users to forward-fit their code.
> I am not sure I understand why this case is so different.
>
> Why the Bucketing Sink should be thrown out, in my opinion:
>
> The Bucketing Sink makes it easier for users to add general Hadoop writers.
> But the price is that it easily leads to data loss, because it assumes that
> flush()/sync() work reliably on Hadoop, which they don't (HDFS works
> somewhat, S3 does not work at all).
> I think the Bucketing Sink is a trap for users; that's why it was deprecated
> long ago.
>
> The StreamingFileSink covers the majority of cases from the Bucketing Sink.
> It does have some friction when adding/wrapping some general Hadoop writers.
> Parts of this will be solved by the transactional sink work.
> If something is missing and blocking users, we can prioritize adding it to
> the StreamingFileSink. That is also something we have done before, and it
> helped us be pragmatic and move forward, rather than being held back by
> "maybe there is something we don't know".
>
>
>
>
> On Wed, Oct 28, 2020 at 12:36 PM Chesnay Schepler <ches...@apache.org> wrote:
>>
>> Then we can't remove it, because there is no way for us to ascertain
>> whether anyone is still using it.
>>
>> Sure, the user ML is the best we've got, but you can't argue that we don't
>> want any users to be affected and then use an imperfect means to find users.
>> If you are fine with relying on the user ML, then you _are_ fine with
>> removing it at the cost of friction for some users.
>>
>> To be clear, I, personally, don't have a problem with removing it (we
>> have removed other connectors in the past that did not have a migration
>> plan), I just reject the argumentation.
>>
>> On 10/28/2020 12:21 PM, Kostas Kloudas wrote:
>> > No, I do not think that "we are fine with removing it at the cost of
>> > friction for some users".
>> >
>> > I believe that this can be another discussion that we should have as
>> > soon as we establish that someone is actually using it. The point I am
>> > trying to make is that if no user is using it, we should remove it and
>> > not leave unmaintained code around.
>> >
>> > On Wed, Oct 28, 2020 at 12:11 PM Chesnay Schepler <ches...@apache.org> 
>> > wrote:
>> >> The alternative could also be to use a different argument than "no one
>> >> uses it", e.g., we are fine with removing it at the cost of friction for
>> >> some users because there are better alternatives.
>> >>
>> >> On 10/28/2020 10:46 AM, Kostas Kloudas wrote:
>> >>> I think that the mailing lists are the best we can do, and I would say
>> >>> that they seem to be working pretty well (e.g., the recent Mesos
>> >>> discussion).
>> >>> Of course, they are not perfect, but the alternative would be to never
>> >>> remove anything user-facing until the next major release, which I find
>> >>> pretty strict.
>> >>>
>> >>> On Wed, Oct 28, 2020 at 10:04 AM Chesnay Schepler <ches...@apache.org> 
>> >>> wrote:
>> >>>> If the conclusion is that we shouldn't remove it if _anyone_ is using
>> >>>> it, then we cannot remove it because the user ML obviously does not
>> >>>> reach all users.
>> >>>>
>> >>>> On 10/28/2020 9:28 AM, Kostas Kloudas wrote:
>> >>>>> Hi all,
>> >>>>>
>> >>>>> I am bringing this up again to see if there are any users actively
>> >>>>> using the BucketingSink.
>> >>>>> So far, if I am not mistaken (and really sorry if I forgot anything),
>> >>>>> it has only been a discussion among devs about the potential problems of
>> >>>>> removing it. I totally understand Chesnay's concern about not
>> >>>>> providing compatibility with the StreamingFileSink (SFS) and if there
>> >>>>> are any users, then we should not remove it without trying to find a
>> >>>>> solution for them.
>> >>>>>
>> >>>>> But if there are no users then I would still propose to remove the
>> >>>>> module, given that I am not aware of any efforts to provide
>> >>>>> compatibility with the SFS any time soon.
>> >>>>> The reasons for removing it also include the fact that we do not
>> >>>>> actively maintain it and do not add new features to it. As for the
>> >>>>> potential missing features in the SFS compared to the BucketingSink that
>> >>>>> were mentioned before, I am not aware of any fundamental limitations and
>> >>>>> even if there are, I would assume that the solution is not to direct
>> >>>>> the users to a deprecated sink but rather try to increase the
>> >>>>> functionality of the actively maintained one.
>> >>>>>
>> >>>>> Please keep in mind that the BucketingSink has been deprecated since
>> >>>>> Flink 1.9 and there is a new File Sink that is coming as part of FLIP-143
>> >>>>> [1].
>> >>>>> Again, if there are any active users who cannot migrate easily, then
>> >>>>> we cannot remove it before trying to provide a smooth migration path.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Kostas
>> >>>>>
>> >>>>> [1] 
>> >>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API
>> >>>>>
>> >>>>> On Fri, Oct 16, 2020 at 4:36 PM Chesnay Schepler <ches...@apache.org> 
>> >>>>> wrote:
>> >>>>>> @Seth: Earlier in this discussion, it was said that the BucketingSink
>> >>>>>> would not be usable in 1.12.
>> >>>>>>
>> >>>>>> On 10/16/2020 4:25 PM, Seth Wiesman wrote:
>> >>>>>>> +1. It has been deprecated for some time, and the StreamingFileSink
>> >>>>>>> has stabilized with a large number of formats and features.
>> >>>>>>>
>> >>>>>>> Plus, the bucketing sink only implements a small number of stable
>> >>>>>>> interfaces [1]. I would expect users to continue to use the bucketing
>> >>>>>>> sink from the 1.11 release with future versions for some time.
>> >>>>>>>
>> >>>>>>> Seth
>> >>>>>>>
>> >>>>>>> [1] https://github.com/apache/flink/blob/2ff3b771cbb091e1f43686dd8e176cea6d435501/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L170-L172
>> >>>>>>>
>> >>>>>>> On Thu, Oct 15, 2020 at 2:57 PM Kostas Kloudas <kklou...@gmail.com> 
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>>> @Arvid Heise I also do not remember exactly what all the problems
>> >>>>>>>> were. The fact that we added some more bulk formats to the
>> >>>>>>>> StreamingFileSink definitely reduced the set of unsupported features.
>> >>>>>>>> In addition, the latest discussion I found on the topic was [1], and
>> >>>>>>>> the conclusion of that discussion seems to be to remove it.
>> >>>>>>>>
>> >>>>>>>> Currently, I cannot find any obvious reason for keeping the
>> >>>>>>>> BucketingSink, apart from the fact that, unfortunately, we do not
>> >>>>>>>> have a migration plan. This is why I posted this to dev@ and user@.
>> >>>>>>>>
>> >>>>>>>> Cheers,
>> >>>>>>>> Kostas
>> >>>>>>>>
>> >>>>>>>> [1]
>> >>>>>>>> https://lists.apache.org/thread.html/r799be74658bc7e169238cc8c1e479e961a9e85ccea19089290940ff0%40%3Cdev.flink.apache.org%3E
>> >>>>>>>>
>> >>>>>>>> On Wed, Oct 14, 2020 at 8:03 AM Arvid Heise <ar...@ververica.com> 
>> >>>>>>>> wrote:
>> >>>>>>>>> I remember this conversation popping up a few times already, and
>> >>>>>>>>> I'm in general a big fan of removing the BucketingSink.
>> >>>>>>>>>
>> >>>>>>>>> However, until now there were a few features lacking in the
>> >>>>>>>>> StreamingFileSink that are present in the BucketingSink and that are
>> >>>>>>>>> being actively used (I can't exactly remember them now, but I can
>> >>>>>>>>> look them up if everyone else is also suffering from bad memory).
>> >>>>>>>>> Did we manage to add them in the meantime? If not, then it feels
>> >>>>>>>>> rushed to remove it at this point.
>> >>>>>>>>>
>> >>>>>>>>> On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas <kklou...@gmail.com>
>> >>>>>>>>> wrote:
>> >>>>>>>>>> @Chesnay Schepler Off the top of my head, I cannot find an easy way
>> >>>>>>>>>> to migrate from the BucketingSink to the StreamingFileSink. It may
>> >>>>>>>>>> be possible, but it will require some effort because the logic
>> >>>>>>>>>> would be "read the old state, commit it, and start fresh with the
>> >>>>>>>>>> StreamingFileSink."
>> >>>>>>>>>>
>> >>>>>>>>>> On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek
>> >>>>>>>>>> <aljos...@apache.org> wrote:
>> >>>>>>>>>>> On 13.10.20 14:01, David Anderson wrote:
>> >>>>>>>>>>>> I thought this was waiting on FLIP-46 -- Graceful Shutdown
>> >>>>>>>>>>>> Handling -- and in fact, the StreamingFileSink is mentioned in
>> >>>>>>>>>>>> that FLIP as a motivating use case.
>> >>>>>>>>>>> Ah yes, I see FLIP-147 as a more general replacement for FLIP-46.
>> >>>>>>>>>>> Thanks for the reminder; we should close FLIP-46 now with an
>> >>>>>>>>>>> explanatory message to avoid confusion.
>> >>>>>>>>> --
>> >>>>>>>>>
>> >>>>>>>>> Arvid Heise | Senior Java Developer
>> >>
>>
