Thanks for the discussion!

From this thread I do not see any objection to moving forward with removing the sink. Given this, I will open a voting thread tomorrow.

Cheers,
Kostas

On Wed, Oct 28, 2020 at 6:50 PM Stephan Ewen <se...@apache.org> wrote:
>
> +1 to remove the Bucketing Sink.
>
> It has been very common in the past to remove code that was deprecated for
> multiple releases in favor of reducing baggage.
> Also in cases that had no perfect drop-in replacement, but needed users to
> forward fit the code.
> I am not sure I understand why this case is so different.
>
> Why the Bucketing Sink should be thrown out, in my opinion:
>
> The Bucketing sink makes it easier for users to add general Hadoop writes.
> But the price is that it easily leads to data loss, because it assumes
> flush()/sync() work reliably on Hadoop, which they don't (HDFS
> works somewhat, S3 works not at all).
> I think the Bucketing sink is a trap for users, that's why it was deprecated
> long ago.
>
> The StreamingFileSink covers the majority of cases from the Bucketing Sink.
> It does have some friction when adding/wrapping some general Hadoop writers.
> Parts will be solved with the transactional sink work.
> If something is missing and blocking users, we can prioritize adding it to
> the Streaming File Sink. Also that is something we did before and it helped
> being pragmatic with moving forward, rather than being held back by "maybe
> there is something we don't know".
>
>
> On Wed, Oct 28, 2020 at 12:36 PM Chesnay Schepler <ches...@apache.org> wrote:
>>
>> Then we can't remove it, because there is no way for us to ascertain
>> whether anyone is still using it.
>>
>> Sure, the user ML is the best we got, but you can't argue that we don't
>> want any users to be affected and then use an imperfect means to find users.
>> If you are fine with relying on the user ML, then you _are_ fine with
>> removing it at the cost of friction for some users.
>>
>> To be clear, I, personally, don't have a problem with removing it (we
>> have removed other connectors in the past that did not have a migration
>> plan), I just reject the argumentation.
>>
>> On 10/28/2020 12:21 PM, Kostas Kloudas wrote:
>> > No, I do not think that "we are fine with removing it at the cost of
>> > friction for some users".
>> >
>> > I believe that this can be another discussion that we should have as
>> > soon as we establish that someone is actually using it. The point I am
>> > trying to make is that if no user is using it, we should remove it and
>> > not leave unmaintained code around.
>> >
>> > On Wed, Oct 28, 2020 at 12:11 PM Chesnay Schepler <ches...@apache.org> wrote:
>> >> The alternative could also be to use a different argument than "no one
>> >> uses it", e.g., we are fine with removing it at the cost of friction for
>> >> some users because there are better alternatives.
>> >>
>> >> On 10/28/2020 10:46 AM, Kostas Kloudas wrote:
>> >>> I think that the mailing lists are the best we can do and I would say
>> >>> that they seem to be working pretty well (e.g. the recent Mesos
>> >>> discussion).
>> >>> Of course they are not perfect but the alternative would be to never
>> >>> remove anything user facing until the next major release, which I find
>> >>> pretty strict.
>> >>>
>> >>> On Wed, Oct 28, 2020 at 10:04 AM Chesnay Schepler <ches...@apache.org> wrote:
>> >>>> If the conclusion is that we shouldn't remove it if _anyone_ is using
>> >>>> it, then we cannot remove it because the user ML obviously does not
>> >>>> reach all users.
>> >>>>
>> >>>> On 10/28/2020 9:28 AM, Kostas Kloudas wrote:
>> >>>>> Hi all,
>> >>>>>
>> >>>>> I am bringing this up again to see if there are any users actively
>> >>>>> using the BucketingSink.
>> >>>>> So far, if I am not mistaken (and really sorry if I forgot anything),
>> >>>>> it is only a discussion between devs about the potential problems of
>> >>>>> removing it. I totally understand Chesnay's concern about not
>> >>>>> providing compatibility with the StreamingFileSink (SFS) and if there
>> >>>>> are any users, then we should not remove it without trying to find a
>> >>>>> solution for them.
>> >>>>>
>> >>>>> But if there are no users then I would still propose to remove the
>> >>>>> module, given that I am not aware of any efforts to provide
>> >>>>> compatibility with the SFS any time soon.
>> >>>>> The reasons for removing it also include the facts that we do not
>> >>>>> actively maintain it and we do not add new features. As for potential
>> >>>>> missing features in the SFS compared to the BucketingSink that were
>> >>>>> mentioned before, I am not aware of any fundamental limitations and
>> >>>>> even if there are, I would assume that the solution is not to direct
>> >>>>> the users to a deprecated sink but rather try to increase the
>> >>>>> functionality of the actively maintained one.
>> >>>>>
>> >>>>> Please keep in mind that the BucketingSink has been deprecated since
>> >>>>> Flink 1.9 and there is a new File Sink that is coming as part of
>> >>>>> FLIP-143 [1].
>> >>>>> Again, if there are any active users who cannot migrate easily, then
>> >>>>> we cannot remove it before trying to provide a smooth migration path.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Kostas
>> >>>>>
>> >>>>> [1]
>> >>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API
>> >>>>>
>> >>>>> On Fri, Oct 16, 2020 at 4:36 PM Chesnay Schepler <ches...@apache.org> wrote:
>> >>>>>> @Seth: Earlier in this discussion it was said that the BucketingSink
>> >>>>>> would not be usable in 1.12.
>> >>>>>>
>> >>>>>> On 10/16/2020 4:25 PM, Seth Wiesman wrote:
>> >>>>>>> +1 It has been deprecated for some time and the StreamingFileSink has
>> >>>>>>> stabilized with a large number of formats and features.
>> >>>>>>>
>> >>>>>>> Plus, the bucketing sink only implements a small number of stable
>> >>>>>>> interfaces[1]. I would expect users to continue to use the bucketing
>> >>>>>>> sink from the 1.11 release with future versions for some time.
>> >>>>>>>
>> >>>>>>> Seth
>> >>>>>>>
>> >>>>>>> [1]
>> >>>>>>> https://github.com/apache/flink/blob/2ff3b771cbb091e1f43686dd8e176cea6d435501/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L170-L172
>> >>>>>>>
>> >>>>>>> On Thu, Oct 15, 2020 at 2:57 PM Kostas Kloudas <kklou...@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>>> @Arvid Heise I also do not remember exactly what all the problems
>> >>>>>>>> were. The fact that we added some more bulk formats to the
>> >>>>>>>> streaming file sink definitely reduced the non-supported features.
>> >>>>>>>> In addition, the latest discussion I found on the topic was [1] and
>> >>>>>>>> the conclusion of that discussion seems to be to remove it.
>> >>>>>>>>
>> >>>>>>>> Currently, I cannot find any obvious reason for keeping the
>> >>>>>>>> BucketingSink, apart from the fact that we do not have a migration
>> >>>>>>>> plan unfortunately. This is why I posted this to dev@ and user@.
>> >>>>>>>>
>> >>>>>>>> Cheers,
>> >>>>>>>> Kostas
>> >>>>>>>>
>> >>>>>>>> [1]
>> >>>>>>>> https://lists.apache.org/thread.html/r799be74658bc7e169238cc8c1e479e961a9e85ccea19089290940ff0%40%3Cdev.flink.apache.org%3E
>> >>>>>>>>
>> >>>>>>>> On Wed, Oct 14, 2020 at 8:03 AM Arvid Heise <ar...@ververica.com> wrote:
>> >>>>>>>>> I remember this conversation popping up a few times already and
>> >>>>>>>>> I'm in general a big fan of removing BucketingSink.
>> >>>>>>>>>
>> >>>>>>>>> However, until now there were a few features lacking in
>> >>>>>>>>> StreamingFileSink that are present in BucketingSink and that are
>> >>>>>>>>> being actively used (I can't exactly remember them now, but I can
>> >>>>>>>>> look it up if everyone else is also suffering from bad memory).
>> >>>>>>>>> Did we manage to add them in the meantime? If not, then it feels
>> >>>>>>>>> rushed to remove it at this point.
>> >>>>>>>>>
>> >>>>>>>>> On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas <kklou...@gmail.com> wrote:
>> >>>>>>>>>> @Chesnay Schepler Off the top of my head, I cannot find an easy
>> >>>>>>>>>> way to migrate from the BucketingSink to the StreamingFileSink.
>> >>>>>>>>>> It may be possible but it will require some effort because the
>> >>>>>>>>>> logic would be "read the old state, commit it, and start fresh
>> >>>>>>>>>> with the StreamingFileSink."
>> >>>>>>>>>>
>> >>>>>>>>>> On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek <aljos...@apache.org> wrote:
>> >>>>>>>>>>> On 13.10.20 14:01, David Anderson wrote:
>> >>>>>>>>>>>> I thought this was waiting on FLIP-46 -- Graceful Shutdown
>> >>>>>>>>>>>> Handling -- and in fact, the StreamingFileSink is mentioned in
>> >>>>>>>>>>>> that FLIP as a motivating use case.
>> >>>>>>>>>>> Ah yes, I see FLIP-147 as a more general replacement for FLIP-46.
>> >>>>>>>>>>> Thanks for the reminder, we should close FLIP-46 now with an
>> >>>>>>>>>>> explanatory message to avoid confusion.
>> >>>>>>>>> --
>> >>>>>>>>>
>> >>>>>>>>> Arvid Heise | Senior Java Developer