Well, the difference is that a technical user writes the UDF, while a non-technical user may use this built-in thing, misconfigure it, and shoot themselves in the foot.
On Wed, Jul 1, 2020, 6:40 PM Andrew Melo <andrew.m...@gmail.com> wrote:

> On Wed, Jul 1, 2020 at 8:13 PM Burak Yavuz <brk...@gmail.com> wrote:
> >
> > I'm not sure having a built-in sink that allows you to DDoS servers is
> > the best idea either. foreachWriter is typically used for such use cases,
> > not foreachBatch. It's also pretty hard to guarantee exactly-once, rate
> > limiting, etc.
>
> If you control the machines and can run arbitrary code, you can DDoS
> whatever you want. What's the difference between this proposal and
> writing a UDF that opens 1,000 connections to a target machine?
>
> > Best,
> > Burak
> >
> > On Wed, Jul 1, 2020 at 5:54 PM Holden Karau <hol...@pigscanfly.ca> wrote:
> >>
> >> I think adding something like this (if it doesn't already exist) could
> >> help make structured streaming easier to use; foreachBatch is not the
> >> best API.
> >>
> >> On Wed, Jul 1, 2020 at 2:21 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
> >>>
> >>> I guess the method, query parameters, headers, and the payload would
> >>> all be different for almost every use case - that makes it hard to
> >>> generalize, and the implementation would need to be quite complicated
> >>> to be flexible enough.
> >>>
> >>> I'm not aware of any custom sink implementing REST, so your best bet
> >>> would be to simply implement your own with foreachBatch - but someone
> >>> might jump in and provide a pointer if there is something in the Spark
> >>> ecosystem.
> >>>
> >>> Thanks,
> >>> Jungtaek Lim (HeartSaVioR)
> >>>
> >>> On Thu, Jul 2, 2020 at 3:21 AM Sam Elamin <hussam.ela...@gmail.com> wrote:
> >>>>
> >>>> Hi All,
> >>>>
> >>>> We ingest a lot of RESTful APIs into our lake, and I'm wondering if
> >>>> it is at all possible to create a REST sink in structured streaming?
> >>>>
> >>>> For now I'm only focusing on RESTful services that have an
> >>>> incremental ID, so my sink can just poll for new data and then ingest it.
> >>>>
> >>>> I can't seem to find a connector that does this, and my gut instinct
> >>>> tells me it's probably because it isn't possible due to something
> >>>> completely obvious that I am missing.
> >>>>
> >>>> I know some RESTful APIs obfuscate the IDs to a hash of strings, and
> >>>> that could be a problem, but since I'm planning on focusing on just
> >>>> numerical IDs that get incremented, I think I won't be facing that issue.
> >>>>
> >>>> Can anyone let me know if this sounds like a daft idea? Will I need
> >>>> something like Kafka or Kinesis as a buffer and redundancy, or am I
> >>>> overthinking this?
> >>>>
> >>>> I would love to bounce ideas with people who run structured streaming
> >>>> jobs in production.
> >>>>
> >>>> Kind regards
> >>>> Sam
> >>
> >> --
> >> Twitter: https://twitter.com/holdenkarau
> >> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
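[Editor's note: Jungtaek's foreachBatch suggestion could look roughly like the sketch below. This is a minimal illustration, not a tested Spark job: the endpoint URL, the `chunk_size`, and the injectable `send` function are all assumptions, and a production version would need retries, rate limiting, and idempotency (e.g. keyed on the incremental ID) to approximate exactly-once delivery.]

```python
import json
import urllib.request

API_URL = "https://example.com/api/events"  # hypothetical endpoint


def post_json(url, payload):
    """Default transport: POST a JSON body using only the stdlib."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def make_rest_writer(url=API_URL, chunk_size=100, send=post_json):
    """Build a function usable as df.writeStream.foreachBatch(...).

    `send` is injectable so the batching logic can be exercised
    without a live HTTP server.
    """
    def write_batch(batch_df, batch_id):
        # Collect the micro-batch on the driver; fine for small batches.
        # For large ones you would POST from executors via foreachPartition.
        rows = [row.asDict() for row in batch_df.collect()]
        for i in range(0, len(rows), chunk_size):
            send(url, {"batch_id": batch_id,
                       "records": rows[i:i + chunk_size]})
    return write_batch
```

The `send` indirection is the design point: foreachBatch hands you a plain DataFrame and batch ID, so everything after `collect()` is ordinary Python and can be unit-tested without Spark.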
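[Editor's note: Sam's incremental-ID polling idea can be sketched independently of Spark: keep a watermark of the last numeric ID seen and request only newer records on each poll. This is a hypothetical illustration - `fetch_since` stands in for whatever HTTP call the real API requires - and it ignores auth, pagination limits, and failure handling.]

```python
def poll_new_records(fetch_since, last_id):
    """Fetch records with id > last_id; return (fresh_records, new_last_id).

    `fetch_since(last_id)` is a placeholder for the real REST call; it must
    return dicts carrying a monotonically increasing numeric "id" field.
    """
    records = fetch_since(last_id)
    # Drop anything at or below the watermark in case the API overlaps.
    fresh = [r for r in records if r["id"] > last_id]
    new_last_id = max((r["id"] for r in fresh), default=last_id)
    return fresh, new_last_id
```

Because the watermark only ever advances and duplicates below it are dropped, re-polling after a failure re-fetches at most one window of data rather than the whole history - which is also why Kafka/Kinesis buffering, as raised in the thread, is optional rather than required for this pattern.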