I'm not sure having a built-in sink that allows you to DDoS servers is the best idea either. ForeachWriter is typically used for such use cases, not foreachBatch. It's also pretty hard to guarantee exactly-once delivery, enforce rate limiting, etc.
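
To make the rate-limiting concern concrete, here is a minimal sketch of a row-at-a-time writer with the open/process/close shape that PySpark's `foreach` accepts. Everything specific in it is an assumption for illustration: the endpoint URL, the JSON POST body, the `max_per_second` parameter, and the injectable `send` hook. It is not an existing connector.

```python
import json
import time
import urllib.request


class RateLimitedRestWriter:
    """Row-at-a-time writer with the open/process/close methods that
    PySpark's DataStreamWriter.foreach() looks for on an object."""

    def __init__(self, url, max_per_second=5.0, send=None):
        self.url = url                            # hypothetical endpoint
        self.min_interval = 1.0 / max_per_second  # seconds between requests
        self._send = send                         # optional hook, used for testing
        self._last_sent = None

    def _post(self, payload):
        # Plain JSON POST. Method, headers and payload shape are assumptions
        # and will differ for almost every API.
        req = urllib.request.Request(
            self.url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status

    def open(self, partition_id, epoch_id):
        self._last_sent = None
        return True  # True means "process the rows of this partition"

    def process(self, row):
        # Sleep just long enough to stay under the per-task request rate.
        if self._last_sent is not None:
            wait = self.min_interval - (time.monotonic() - self._last_sent)
            if wait > 0:
                time.sleep(wait)
        payload = row.asDict() if hasattr(row, "asDict") else dict(row)
        (self._send or self._post)(payload)
        self._last_sent = time.monotonic()

    def close(self, error):
        pass  # nothing to release in this sketch
```

It would be attached with something like `df.writeStream.foreach(RateLimitedRestWriter("https://example.invalid/ingest")).start()`. Note that the limit here applies per task, so the total request rate scales with the number of partitions running concurrently, and a retried task will resend rows unless the endpoint is idempotent. That is a large part of why exactly-once is hard to promise.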

Best,
Burak

On Wed, Jul 1, 2020 at 5:54 PM Holden Karau <hol...@pigscanfly.ca> wrote:

> I think adding something like this (if it doesn't already exist) could
> help make structured streaming easier to use; foreachBatch is not the best
> API.
>
> On Wed, Jul 1, 2020 at 2:21 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> I guess the method, query parameters, headers, and the payload would all
>> be different for almost every use case. That makes it hard to generalize
>> and requires a fairly complicated implementation to be flexible enough.
>>
>> I'm not aware of any custom sink implementing REST, so your best bet would
>> be simply implementing your own with foreachBatch, but someone might
>> jump in and provide a pointer if there is something in the Spark ecosystem.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Thu, Jul 2, 2020 at 3:21 AM Sam Elamin <hussam.ela...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> We ingest a lot of RESTful APIs into our lake and I'm wondering if it is
>>> at all possible to create a REST sink in structured streaming?
>>>
>>> For now I'm only focusing on RESTful services that have an incremental
>>> ID so my sink can just poll for new data then ingest.
>>>
>>> I can't seem to find a connector that does this and my gut instinct
>>> tells me it's probably because it isn't possible due to something
>>> completely obvious that I am missing.
>>>
>>> I know some RESTful APIs obfuscate the IDs to a hash of strings and that
>>> could be a problem, but since I'm planning on focusing on just numerical IDs
>>> that just get incremented I think I won't be facing that issue.
>>>
>>> Can anyone let me know if this sounds like a daft idea? Will I need
>>> something like Kafka or Kinesis as a buffer and for redundancy, or am I
>>> overthinking this?
>>>
>>> I would love to bounce ideas with people who run structured streaming
>>> jobs in production.
>>>
>>> Kind regards
>>> San
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
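
For the incremental-ID polling described in the original question, one polling step might look roughly like the sketch below. The `since_id` and `limit` query parameters, the `id` field name, and both function names are hypothetical; real APIs will name and paginate these differently.

```python
import json
import urllib.parse
import urllib.request


def fetch_since(base_url, last_id, limit=100):
    """Fetch records with an id greater than last_id.

    The `since_id` and `limit` query parameters are assumptions made
    for this sketch, not part of any particular API.
    """
    query = urllib.parse.urlencode({"since_id": last_id, "limit": limit})
    with urllib.request.urlopen(f"{base_url}?{query}", timeout=10) as resp:
        return json.load(resp)


def poll_once(fetch, last_id):
    """Run one polling step and return (new_records, new_last_id).

    `fetch` is any callable taking last_id and returning a list of dicts
    that each carry a numeric "id" field.
    """
    # Drop anything at or below the checkpoint in case the API is inclusive.
    records = [r for r in fetch(last_id) if r["id"] > last_id]
    records.sort(key=lambda r: r["id"])
    new_last_id = records[-1]["id"] if records else last_id
    return records, new_last_id
```

A driver loop would call `poll_once(lambda i: fetch_since(url, i), last_id)` repeatedly and persist `last_id` together with the ingested records, so a restart resumes from the last committed ID rather than re-reading or skipping data. Whether a buffer such as Kafka or Kinesis is also needed depends on how much replay the API itself allows, which this sketch does not address.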