Spark structured streaming with periodic persist and unpersist

2021-02-11 Thread act_coder
I am currently building a spark structured streaming application where I am doing a batch-stream join, and the source for the batch data gets updated periodically. So, I am planning to persist/unpersist that batch data periodically. Below is sample code which I am using to persist and
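
A minimal sketch of the refresh pattern discussed here, assuming an active SparkSession spark, a streaming dataframe streamDf, a join key id, and hypothetical paths and a 30-minute interval. Doing the join inside foreachBatch means every micro-batch picks up the latest cached snapshot:

    import java.util.concurrent.{Executors, TimeUnit}
    import org.apache.spark.sql.DataFrame

    // Periodically re-read and re-cache the batch side, unpersisting the
    // stale snapshot only after the swap.
    @volatile var refDf: DataFrame = spark.read.parquet("/data/reference").cache()

    Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() => {
      val fresh = spark.read.parquet("/data/reference").cache()
      fresh.count()             // materialize the new cache before swapping
      val stale = refDf
      refDf = fresh
      stale.unpersist()
    }, 30, 30, TimeUnit.MINUTES)

    streamDf.writeStream.foreachBatch { (batch: DataFrame, _: Long) =>
      batch.join(refDf, "id").write.mode("append").parquet("/data/out")
    }.start()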

Find difference between two dataframes in spark structured streaming

2020-12-16 Thread act_coder
I am creating a spark structured streaming job where I need to find the difference between two dataframes. Dataframe 1: [1, item1, value1] [2, item2, value2] [3, item3, value3] [4, item4, value4] [5, item5, value5] Dataframe 2: [4, item4, value4] [5, item5, value5] New Dataframe with
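
A minimal sketch, assuming both frames share an id column: a left_anti join keeps the rows of dataframe 1 that have no match in dataframe 2. except() between streaming dataframes is not supported, so a left_anti join is the usual alternative (subject to the stream-static join support of your Spark version):

    // Rows present in df1 but absent from df2.
    val diff = df1.join(df2, Seq("id"), "left_anti")

    // On two static frames with identical schemas, except() is equivalent:
    // val diff = df1.except(df2)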

Converting dataframe from one format to another in spark structured streaming

2020-11-26 Thread act_coder
I am creating a spark structured streaming application and I have a streaming dataframe which has the below data in it. { "name":"sensor1", "time":"2020-11-27T01:01:00", "sensorvalue":11.0, "tag1":"tagvalue" } I would like to convert that dataframe into the below format. { "name":"sensor1",
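
The target layout in this message is truncated, so the nested tags struct below is only an assumed illustration of the general technique, given a raw dataframe rawDf with a string value column: parse the payload with from_json against an explicit schema, then rebuild the shape with select and struct:

    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    // Schema matching the incoming sensor event.
    val schema = new StructType()
      .add("name", StringType)
      .add("time", StringType)
      .add("sensorvalue", DoubleType)
      .add("tag1", StringType)

    val reshaped = rawDf
      .select(from_json(col("value").cast("string"), schema).as("e"))
      .select(
        col("e.name"),
        to_timestamp(col("e.time")).as("time"),
        col("e.sensorvalue"),
        struct(col("e.tag1")).as("tags")   // hypothetical nested layout
      )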

Single spark streaming job to read incoming events with dynamic schema

2020-11-16 Thread act_coder
I am trying to create a spark structured streaming job which reads from a Kafka topic and the events coming from that Kafka topic will have different schemas (There is no standard schema for the incoming events). Sample incoming events: event1: {timestamp:2018-09-28T15:50:57.2420418+00:00,
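
A minimal sketch for the no-fixed-schema case, with hypothetical broker, topic, and field names: keep the Kafka value as a raw JSON string, pull out only the fields common to all events with get_json_object, and apply per-type schemas downstream:

    import org.apache.spark.sql.functions._

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(value AS STRING) AS json")

    // Extract shared fields; keep the full payload for per-type parsing later.
    val routed = raw.select(
      get_json_object(col("json"), "$.timestamp").as("timestamp"),
      col("json"))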

Re: Using two WriteStreams in same spark structured streaming job

2020-11-04 Thread act_coder
If I use the foreach function, then I may need to use a custom Kafka stream writer, right? And I might not be able to use the default writeStream.format("kafka") method?

Using two WriteStreams in same spark structured streaming job

2020-11-04 Thread act_coder
I have a scenario where I would like to save the same streaming dataframe to two different streaming sinks. I have created a streaming dataframe which I need to send to both a Kafka topic and delta lake. I thought of using foreachBatch, but it looks like it doesn't support multiple streaming sinks.
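
A minimal sketch of the foreachBatch approach, with hypothetical servers, topic, and Delta path: foreachBatch hands each micro-batch over as a plain batch dataframe, so the built-in batch writers for Kafka and Delta can both be reused (no custom Kafka writer is needed, addressing the reply above), and persist() avoids recomputing the micro-batch for the second sink:

    import org.apache.spark.sql.DataFrame

    streamDf.writeStream.foreachBatch { (batch: DataFrame, _: Long) =>
      batch.persist()
      // Sink 1: Kafka, via the batch writer.
      batch.selectExpr("to_json(struct(*)) AS value")
        .write.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("topic", "out-topic")
        .save()
      // Sink 2: Delta Lake.
      batch.write.format("delta").mode("append").save("/delta/out")
      batch.unpersist()
    }.start()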

How to use groupByKey() in spark structured streaming without aggregates

2020-10-27 Thread act_coder
Is there a way through which we can use the groupByKey() function in spark structured streaming without aggregates? I have a scenario like the one below, where we would like to group the items based on a key without applying any aggregates. Sample incoming data: I would like to apply groupByKey on
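
A minimal sketch of one common workaround, with assumed column names and watermark: structured streaming will not accept a grouping without an aggregate, but collect_list() gathers each key's rows into an array, which often serves the same group-without-aggregating purpose:

    import org.apache.spark.sql.functions._

    val grouped = events
      .withWatermark("time", "10 minutes")
      .groupBy(window(col("time"), "5 minutes"), col("key"))
      .agg(collect_list(struct(col("item"), col("value"))).as("items"))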

Re: How to disable 'spark.security.credentials.${service}.enabled' in Structured streaming while connecting to a kafka cluster

2020-01-12 Thread act_coder
Hi Gabor, Thanks for the reply. Yes, I have created a JIRA here and have added the details over there - https://issues.apache.org/jira/browse/SPARK-30495 Please let me know if that is the right place to create it. If not, where should I be creating it? Also, I would like to know whether it is

How to disable 'spark.security.credentials.${service}.enabled' in Structured streaming while connecting to a kafka cluster

2020-01-08 Thread act_coder
I am trying to read data from a secured Kafka cluster using spark structured streaming. Also, I am using the below library to read the data - "spark-sql-kafka-0-10_2.12":"3.0.0-preview" - since it has the feature to specify our own group id (instead of spark setting its own group id)
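
A minimal sketch of turning the delegation-token retrieval off so the job authenticates through SASL options passed straight to the Kafka source; the broker, topic, and JAAS string are assumptions, and the flag is usually supplied at submit time as --conf spark.security.credentials.kafka.enabled=false:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("secured-kafka-read")
      .config("spark.security.credentials.kafka.enabled", "false")
      .getOrCreate()

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9093")
      .option("subscribe", "events")
      .option("kafka.security.protocol", "SASL_SSL")
      .option("kafka.sasl.jaas.config",
        "org.apache.kafka.common.security.plain.PlainLoginModule required " +
        "username=\"user\" password=\"secret\";")
      .load()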