Well, the difference is that a technical user writes the UDF, while a
non-technical user may use this built-in thing, misconfigure it, and shoot
themselves in the foot.
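
For what it's worth, the foreachBatch approach Jungtaek suggests below can be
sketched roughly as follows in PySpark. This is only a sketch under
assumptions: the endpoint URL is a hypothetical placeholder, the payload is a
plain JSON array of rows, collect() is only reasonable for small micro-batches,
and checkpointing gives you at-least-once (not exactly-once) delivery to an
external REST endpoint:

```python
import json
import urllib.request


def rows_to_payload(rows):
    """Serialize a list of row dicts into a JSON array request body."""
    return json.dumps(rows).encode("utf-8")


def post_batch(batch_df, batch_id):
    """foreachBatch callback: POST one micro-batch to a REST endpoint.

    ENDPOINT is an assumed placeholder URL, not a real service.
    """
    ENDPOINT = "https://example.com/ingest"  # hypothetical endpoint
    # collect() pulls the batch to the driver -- fine only for small batches.
    rows = [row.asDict() for row in batch_df.collect()]
    req = urllib.request.Request(
        ENDPOINT,
        data=rows_to_payload(rows),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises on HTTP errors, which fails the batch and lets the
    # checkpoint retry it (hence at-least-once semantics downstream).
    urllib.request.urlopen(req).close()


# Wiring it into a streaming query (sketch):
# query = (df.writeStream
#            .foreachBatch(post_batch)
#            .option("checkpointLocation", "/tmp/rest-sink-ckpt")
#            .start())
```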

On Wed, Jul 1, 2020, 6:40 PM Andrew Melo <andrew.m...@gmail.com> wrote:

> On Wed, Jul 1, 2020 at 8:13 PM Burak Yavuz <brk...@gmail.com> wrote:
> >
> > I'm not sure having a built-in sink that allows you to DDoS servers is
> the best idea either. foreachWriter is typically used for such use cases,
> not foreachBatch. It's also pretty hard to guarantee exactly-once, rate
> limiting, etc.
>
> If you control the machines and can run arbitrary code, you can DDoS
> whatever you want. What's the difference between this proposal and
> writing a UDF that opens 1,000 connections to a target machine?
>
> > Best,
> > Burak
> >
> > On Wed, Jul 1, 2020 at 5:54 PM Holden Karau <hol...@pigscanfly.ca>
> wrote:
> >>
> >> I think adding something like this (if it doesn't already exist) could
> help make Structured Streaming easier to use; foreachBatch is not the best
> API.
> >>
> >> On Wed, Jul 1, 2020 at 2:21 PM Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
> >>>
> >>> I guess the method, query parameters, headers, and payload would all be
> different for almost every use case - that makes it hard to generalize, and
> the implementation would need to be fairly complicated to be flexible
> enough.
> >>>
> >>> I'm not aware of any custom sink implementing REST, so your best bet
> would be simply implementing your own with foreachBatch - but someone
> might jump in and provide a pointer if there is something in the Spark
> ecosystem.
> >>>
> >>> Thanks,
> >>> Jungtaek Lim (HeartSaVioR)
> >>>
> >>> On Thu, Jul 2, 2020 at 3:21 AM Sam Elamin <hussam.ela...@gmail.com>
> wrote:
> >>>>
> >>>> Hi All,
> >>>>
> >>>>
> >>>> We ingest a lot of RESTful APIs into our lake and I'm wondering if it
> is at all possible to create a REST sink in Structured Streaming?
> >>>>
> >>>> For now I'm only focusing on RESTful services that have an
> incremental ID, so my sink can just poll for new data and then ingest it.
> >>>>
> >>>> I can't seem to find a connector that does this, and my gut instinct
> tells me it's probably because it isn't possible due to something
> completely obvious that I am missing.
> >>>>
> >>>> I know some RESTful APIs obfuscate the IDs to a hash of strings, and
> that could be a problem, but since I'm planning on focusing on just
> numerical IDs that get incremented, I think I won't be facing that issue.
> >>>>
> >>>>
> >>>> Can anyone let me know if this sounds like a daft idea? Will I need
> something like Kafka or Kinesis as a buffer for redundancy, or am I
> overthinking this?
> >>>>
> >>>>
> >>>> I would love to bounce ideas off people who run Structured
> Streaming jobs in production.
> >>>>
> >>>>
> >>>> Kind regards
> >>>> Sam
> >>>>
> >>>>
> >>
> >>
> >> --
> >> Twitter: https://twitter.com/holdenkarau
> >> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
