Fwd: Announcing ApacheCon @Home 2020

2020-07-01 Thread Felix Cheung
-- Forwarded message - We are pleased to announce that ApacheCon @Home will be held online, September 29 through October 1. More event details are available at https://apachecon.com/acah2020 but there are a few things that I want to highlight for you, the members. Yes, the CFP

Re: REST Structured Streaming Sink

2020-07-01 Thread Burak Yavuz
Well, the difference is, a technical user writes the UDF, whereas a non-technical user may use this built-in thing, misconfigure it, and shoot themselves in the foot. On Wed, Jul 1, 2020, 6:40 PM Andrew Melo wrote: > On Wed, Jul 1, 2020 at 8:13 PM Burak Yavuz wrote: > > > > I'm not sure having a

Re: REST Structured Streaming Sink

2020-07-01 Thread Andrew Melo
On Wed, Jul 1, 2020 at 8:13 PM Burak Yavuz wrote: > > I'm not sure having a built-in sink that allows you to DDOS servers is the > best idea either. foreachWriter is typically used for such use cases, not > foreachBatch. It's also pretty hard to guarantee exactly-once, rate limiting, > etc.

Re: REST Structured Streaming Sink

2020-07-01 Thread Holden Karau
On Wed, Jul 1, 2020 at 6:13 PM Burak Yavuz wrote: > I'm not sure having a built-in sink that allows you to DDOS servers is the > best idea either > Do you think it would be used accidentally? If so we could have it with default per server rate limits that people would have to explicitly tune. >

Re: REST Structured Streaming Sink

2020-07-01 Thread Burak Yavuz
I'm not sure having a built-in sink that allows you to DDOS servers is the best idea either. foreachWriter is typically used for such use cases, not foreachBatch. It's also pretty hard to guarantee exactly-once, rate limiting, etc. Best, Burak On Wed, Jul 1, 2020 at 5:54 PM Holden Karau wrote:
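
A minimal sketch of the foreachWriter approach Burak mentions, posting each record of a streaming DataFrame to a REST endpoint. The endpoint URL and JSON payload shape are assumptions, not anything specified in the thread, and a real implementation would still need batching, retries, and rate limiting:

    import java.net.{HttpURLConnection, URL}
    import java.nio.charset.StandardCharsets
    import org.apache.spark.sql.ForeachWriter

    // Posts one JSON record per call; the endpoint is a placeholder.
    class RestForeachWriter(endpoint: String) extends ForeachWriter[String] {
      override def open(partitionId: Long, epochId: Long): Boolean = true

      override def process(json: String): Unit = {
        val conn = new URL(endpoint).openConnection().asInstanceOf[HttpURLConnection]
        conn.setRequestMethod("POST")
        conn.setRequestProperty("Content-Type", "application/json")
        conn.setDoOutput(true)
        conn.getOutputStream.write(json.getBytes(StandardCharsets.UTF_8))
        conn.getResponseCode  // force the request; a real sink would check and retry
        conn.disconnect()
      }

      override def close(errorOrNull: Throwable): Unit = ()
    }

    // Usage: convert rows to JSON strings first, then attach the writer.
    // df.toJSON.writeStream.foreach(new RestForeachWriter("https://example.com/ingest")).start()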

Re: REST Structured Streaming Sink

2020-07-01 Thread Holden Karau
I think adding something like this (if it doesn't already exist) could help make structured streaming easier to use, foreachBatch is not the best API. On Wed, Jul 1, 2020 at 2:21 PM Jungtaek Lim wrote: > I guess the method, query parameter, header, and the payload would be all > different for

Re: REST Structured Streaming Sink

2020-07-01 Thread Jungtaek Lim
I guess the method, query parameter, header, and the payload would all be different for almost every use case - that makes it hard to generalize and requires the implementation to be quite complicated to be flexible enough. I'm not aware of any custom sink implementing REST, so your best bet
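
Since each REST integration ends up bespoke, as Jungtaek notes, a hand-rolled foreachBatch version might look like the sketch below; the endpoint, the JSON-array payload, and the driver-side collect are all assumptions, so this only suits small micro-batches:

    import java.net.{HttpURLConnection, URL}
    import java.nio.charset.StandardCharsets
    import org.apache.spark.sql.DataFrame

    // Posts each micro-batch as one JSON array; collects to the driver,
    // so it is only reasonable for small batches. Endpoint is a placeholder.
    def postBatch(batch: DataFrame, batchId: Long): Unit = {
      val body = batch.toJSON.collect().mkString("[", ",", "]")
      val conn = new URL("https://example.com/ingest").openConnection().asInstanceOf[HttpURLConnection]
      conn.setRequestMethod("POST")
      conn.setRequestProperty("Content-Type", "application/json")
      conn.setDoOutput(true)
      conn.getOutputStream.write(body.getBytes(StandardCharsets.UTF_8))
      conn.getResponseCode
      conn.disconnect()
    }

    // streamingDf.writeStream.foreachBatch(postBatch _).start()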

REST Structured Streaming Sink

2020-07-01 Thread Sam Elamin
Hi All, We ingest a lot of RESTful APIs into our lake and I'm wondering if it is at all possible to create a REST sink in Structured Streaming? For now I'm only focusing on RESTful services that have an incremental ID, so my sink can just poll for new data and then ingest it. I can't seem to find a
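
What Sam describes (polling a service that exposes an incremental ID) can also be run as a plain scheduled batch job rather than a streaming sink. A minimal sketch, where the URL, the since_id parameter, and the lake path are all hypothetical:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("rest-poll").getOrCreate()
    import spark.implicits._

    // The last ingested id would normally come from the lake or a checkpoint table.
    val lastSeenId = 12345L
    val response = scala.io.Source
      .fromURL(s"https://api.example.com/records?since_id=$lastSeenId")
      .mkString

    // Parse the JSON payload on the driver and append it to the lake.
    val batch = spark.read.json(Seq(response).toDS())
    batch.write.mode("append").parquet("s3://my-lake/raw/records/")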

Re: Truncate table

2020-07-01 Thread Russell Spitzer
I'm not sure what you're really trying to do here, but it sounds like saving the data to a Parquet file or other temporary store before truncating would protect you in case of failure. On Wed, Jul 1, 2020, 9:48 AM Amit Sharma wrote: > Hi, I have a scenario where I have to read certain rows from a
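
A minimal sketch of the staging step Russell suggests, using the Spark Cassandra Connector's DataFrame API. The keyspace, table, filter, and staging path are placeholders, and the confirm.truncate option reflects the connector's requirement for truncating on overwrite (worth verifying against the connector version in use):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("truncate-safely").getOrCreate()

    // Rows to keep; keyspace, table, and predicate are placeholders.
    val keep = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "ks", "table" -> "a"))
      .load()
      .filter("id > 100")

    // 1. Stage to Parquet first so a failed write cannot lose the rows.
    keep.write.mode("overwrite").parquet("/tmp/table_a_staging")

    // 2. Re-read the staged copy and overwrite (truncate + write) the Cassandra table.
    spark.read.parquet("/tmp/table_a_staging")
      .write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "ks", "table" -> "a", "confirm.truncate" -> "true"))
      .mode("overwrite")
      .save()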

Truncate table

2020-07-01 Thread Amit Sharma
Hi, I have a scenario where I have to read certain rows from a table, truncate the table, and store those rows back into the table. I am doing the steps below: 1. reading certain rows into DF1 from Cassandra table A. 2. saving DF1 back into Cassandra table A with overwrite. The problem is when I truncate the

Re: Running Apache Spark Streaming on the GraalVM Native Image

2020-07-01 Thread Pasha Finkelshteyn
Hi Ivo, I believe there's absolutely no way that Spark will work on GraalVM Native Image, because Spark generates code and loads classes at runtime, while GraalVM Native Image works only in a closed world and has no way to load classes that are not present on the classpath at compile time. On

upsert dataframe to kudu

2020-07-01 Thread Umesh Bansal
Hi All, We are running into issues when Spark is trying to insert a dataframe into a Kudu table that has 300 columns. A few of the columns are getting inserted with NULL values. In code, we are using the built-in upsert method and passing the dataframe to it. Thanks
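
For reference, a minimal sketch of an upsert through the kudu-spark integration's KuduContext; the master address, table name, and source path are placeholders. When only some columns arrive as NULL, comparing df.schema with the Kudu table schema (column names and case) is a common first check:

    import org.apache.kudu.spark.kudu.KuduContext
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("kudu-upsert").getOrCreate()

    // Master address and table name are placeholders.
    val kuduContext = new KuduContext("kudu-master:7051", spark.sparkContext)

    val df = spark.read.parquet("/data/incoming")   // the wide (300-column) dataframe
    kuduContext.upsertRows(df, "impala::default.my_table")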

Running Apache Spark Streaming on the GraalVM Native Image

2020-07-01 Thread ivo.kn...@t-online.de
Hi guys, so I want to get Apache Spark to run on the GraalVM Native Image in a simple single-node streaming application, but I get the following error when trying to build the native image: (check attached file) And as I researched online, there seems to be no successful combination of