+1 to remove the Bucketing Sink.

Thanks for the effort on ORC and `HadoopPathBasedBulkFormatBuilder`. I
think it's safe to get rid of the old Bucketing API with them.

Best,
Jingsong

On Thu, Oct 29, 2020 at 3:06 AM Kostas Kloudas <kklou...@gmail.com> wrote:

> Thanks for the discussion!
>
> From this thread I do not see any objection to moving forward with
> removing the sink.
> Given this I will open a voting thread tomorrow.
>
> Cheers,
> Kostas
>
> On Wed, Oct 28, 2020 at 6:50 PM Stephan Ewen <se...@apache.org> wrote:
> >
> > +1 to remove the Bucketing Sink.
> >
> > It has been very common in the past to remove code that was deprecated
> for multiple releases in favor of reducing baggage.
> > Also in cases that had no perfect drop-in replacement, but needed users
> to forward fit the code.
> > I am not sure I understand why this case is so different.
> >
> > Why the Bucketing Sink should be thrown out, in my opinion:
> >
> > The Bucketing sink makes it easier for users to add general Hadoop
> writes.
> > But the price is that it easily leads to dataloss, because it assumes
> flush()/sync() work reliably on Hadoop, which they don't (HDFS
> works somewhat, S3 works not at all).
> > I think the Bucketing sink is a trap for users, that's why it was
> deprecated long ago.
> >
> > The StreamingFileSink covers the majority of cases from the Bucketing
> Sink.
> > It does have some friction when adding/wrapping some general Hadoop
> writers. Parts will be solved with the transactional sink work.
> > If something is missing and blocking users, we can prioritize adding it
> to the Streaming File Sink. Also that is something we did before and it
> helped being pragmatic with moving forward, rather than being held back by
> "maybe there is something we don't know".
> >
> >
> >
> >
> > On Wed, Oct 28, 2020 at 12:36 PM Chesnay Schepler <ches...@apache.org>
> wrote:
> >>
> >> Then we can't remove it, because there is no way for us to ascertain
> >> whether anyone is still using it.
> >>
> >> Sure, the user ML is the best we got, but you can't argue that we don't
> >> want any users to be affected and then use an imperfect means to find
> users.
> >> If you are fine with relying on the user ML, then you _are_ fine with
> >> removing it at the cost of friction for some users.
> >>
> >> To be clear, I, personally, don't have a problem with removing it (we
> >> have removed other connectors in the past that did not have a migration
> >> plan), I just reject the argumentation.
> >>
> >> On 10/28/2020 12:21 PM, Kostas Kloudas wrote:
> >> > No, I do not think that "we are fine with removing it at the cost of
> >> > friction for some users".
> >> >
> >> > I believe that this can be another discussion that we should have as
> >> > soon as we establish that someone is actually using it. The point I am
> >> > trying to make is that if no user is using it, we should remove it and
> >> > not leave unmaintained code around.
> >> >
> >> > On Wed, Oct 28, 2020 at 12:11 PM Chesnay Schepler <ches...@apache.org>
> wrote:
> >> >> The alternative could also be to use a different argument than "no
> one
> >> >> uses it", e.g., we are fine with removing it at the cost of friction
> for
> >> >> some users because there are better alternatives.
> >> >>
> >> >> On 10/28/2020 10:46 AM, Kostas Kloudas wrote:
> >> >>> I think that the mailing lists are the best we can do and I would say
> >> >>> that they seem to be working pretty well (e.g. the recent Mesos
> >> >>> discussion).
> >> >>> Of course they are not perfect but the alternative would be to never
> >> >>> remove anything user facing until the next major release, which I
> find
> >> >>> pretty strict.
> >> >>>
> >> >>> On Wed, Oct 28, 2020 at 10:04 AM Chesnay Schepler <
> ches...@apache.org> wrote:
> >> >>>> If the conclusion is that we shouldn't remove it if _anyone_ is
> using
> >> >>>> it, then we cannot remove it because the user ML obviously does not
> >> >>>> reach all users.
> >> >>>>
> >> >>>> On 10/28/2020 9:28 AM, Kostas Kloudas wrote:
> >> >>>>> Hi all,
> >> >>>>>
> >> >>>>> I am bringing this up again to see if there are any users actively
> >> >>>>> using the BucketingSink.
> >> >>>>> So far, if I am not mistaken (and really sorry if I forgot
> anything),
> >> >>>>> it is only a discussion between devs about the potential problems
> of
> >> >>>>> removing it. I totally understand Chesnay's concern about not
> >> >>>>> providing compatibility with the StreamingFileSink (SFS) and if
> there
> >> >>>>> are any users, then we should not remove it without trying to
> find a
> >> >>>>> solution for them.
> >> >>>>>
> >> >>>>> But if there are no users then I would still propose to remove the
> >> >>>>> module, given that I am not aware of any efforts to provide
> >> >>>>> compatibility with the SFS any time soon.
> >> >>>>> The reasons for removing it also include the facts that we do not
> >> >>>>> actively maintain it and we do not add new features. As for
> potential
> >> >>>>> missing features in the SFS compared to the BucketingSink that was
> >> >>>>> mentioned before, I am not aware of any fundamental limitations
> and
> >> >>>>> even if there are, I would assume that the solution is not to
> direct
> >> >>>>> the users to a deprecated sink but rather try to increase the
> >> >>>>> functionality of the actively maintained one.
> >> >>>>>
> >> >>>>> Please keep in mind that the BucketingSink has been deprecated since
> FLINK
> >> >>>>> 1.9 and there is a new File Sink that is coming as part of
> FLIP-143
> >> >>>>> [1].
> >> >>>>> Again, if there are any active users who cannot migrate easily,
> then
> >> >>>>> we cannot remove it before trying to provide a smooth migration
> path.
> >> >>>>>
> >> >>>>> Thanks,
> >> >>>>> Kostas
> >> >>>>>
> >> >>>>> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API
> >> >>>>>
> >> >>>>> On Fri, Oct 16, 2020 at 4:36 PM Chesnay Schepler <
> ches...@apache.org> wrote:
> >> >>>>>> @Seth: Earlier in this discussion it was said that the
> BucketingSink
> >> >>>>>> would not be usable in 1.12.
> >> >>>>>>
> >> >>>>>> On 10/16/2020 4:25 PM, Seth Wiesman wrote:
> >> >>>>>>> +1 It has been deprecated for some time and the
> StreamingFileSink has
> >> >>>>>>> stabilized with a large number of formats and features.
> >> >>>>>>>
> >> >>>>>>> Plus, the bucketing sink only implements a small number of
> stable
> >> >>>>>>> interfaces[1]. I would expect users to continue to use the
> bucketing sink
> >> >>>>>>> from the 1.11 release with future versions for some time.
> >> >>>>>>>
> >> >>>>>>> Seth
> >> >>>>>>>
> >> >>>>>>> [1]
> https://github.com/apache/flink/blob/2ff3b771cbb091e1f43686dd8e176cea6d435501/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L170-L172
> >> >>>>>>>
> >> >>>>>>> On Thu, Oct 15, 2020 at 2:57 PM Kostas Kloudas <
> kklou...@gmail.com> wrote:
> >> >>>>>>>
> >> >>>>>>>> @Arvid Heise I also do not remember exactly what were all the
> >> >>>>>>>> problems. The fact that we added some more bulk formats to the
> >> >>>>>>>> streaming file sink definitely reduced the non-supported
> features. In
> >> >>>>>>>> addition, the latest discussion I found on the topic was [1]
> and the
> >> >>>>>>>> conclusion of that discussion seems to be to remove it.
> >> >>>>>>>>
> >> >>>>>>>> Currently, I cannot find any obvious reason for keeping the
> >> >>>>>>>> BucketingSink, apart from the fact that we do not have a
> migration
> >> >>>>>>>> plan unfortunately. This is why I posted this to dev@ and
> user@.
> >> >>>>>>>>
> >> >>>>>>>> Cheers,
> >> >>>>>>>> Kostas
> >> >>>>>>>>
> >> >>>>>>>> [1]
> >> >>>>>>>>
> https://lists.apache.org/thread.html/r799be74658bc7e169238cc8c1e479e961a9e85ccea19089290940ff0%40%3Cdev.flink.apache.org%3E
> >> >>>>>>>>
> >> >>>>>>>> On Wed, Oct 14, 2020 at 8:03 AM Arvid Heise <
> ar...@ververica.com> wrote:
> >> >>>>>>>>> I remember this conversation popping up a few times already
> and I'm in
> >> >>>>>>>>> general a big fan of removing BucketingSink.
> >> >>>>>>>>>
> >> >>>>>>>>> However, until now there were a few features lacking in
> StreamingFileSink
> >> >>>>>>>>> that are present in BucketingSink and that are being actively
> used (I
> >> >>>>>>>> can't
> >> >>>>>>>>> exactly remember them now, but I can look it up if everyone
> else is also
> >> >>>>>>>>> suffering from bad memory). Did we manage to add them in the
> meantime? If
> >> >>>>>>>>> not, then it feels rushed to remove it at this point.
> >> >>>>>>>>>
> >> >>>>>>>>> On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas <
> kklou...@gmail.com>
> >> >>>>>>>> wrote:
> >> >>>>>>>>>> @Chesnay Schepler  Off the top of my head, I cannot find an
> easy way
> >> >>>>>>>>>> to migrate from the BucketingSink to the StreamingFileSink.
> It may be
> >> >>>>>>>>>> possible but it will require some effort because the logic
> would be
> >> >>>>>>>>>> "read the old state, commit it, and start fresh with the
> >> >>>>>>>>>> StreamingFileSink."
> >> >>>>>>>>>>
> >> >>>>>>>>>> On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek <
> aljos...@apache.org>
> >> >>>>>>>>>> wrote:
> >> >>>>>>>>>>> On 13.10.20 14:01, David Anderson wrote:
> >> >>>>>>>>>>>> I thought this was waiting on FLIP-46 -- Graceful Shutdown
> >> >>>>>>>> Handling --
> >> >>>>>>>>>> and
> >> >>>>>>>>>>>> in fact, the StreamingFileSink is mentioned in that FLIP
> as a
> >> >>>>>>>>>> motivating
> >> >>>>>>>>>>>> use case.
> >> >>>>>>>>>>> Ah yes, I see FLIP-147 as a more general replacement for
> FLIP-46.
> >> >>>>>>>> Thanks
> >> >>>>>>>>>>> for the reminder, we should close FLIP-46 now with an
> explanatory
> >> >>>>>>>>>>> message to avoid confusion.
> >> >>>>>>>>> --
> >> >>>>>>>>>
> >> >>>>>>>>> Arvid Heise | Senior Java Developer
> >> >>
> >>
>


-- 
Best, Jingsong Lee
