Spark structured streaming with periodic persist and unpersist

2021-02-11 Thread act_coder
I am currently building a spark structured streaming application where I am doing a batch-stream join, and the source for the batch data gets updated periodically. So, I am planning to persist/unpersist that batch data periodically. Below is sample code which I am using to persist and
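
A minimal sketch of the refresh pattern discussed here, assuming an active SparkSession spark, a streaming dataframe streamDf, a join key id, and hypothetical paths and a 30-minute interval. Doing the join inside foreachBatch means every micro-batch picks up the latest cached snapshot:

    import java.util.concurrent.{Executors, TimeUnit}
    import org.apache.spark.sql.DataFrame

    // Periodically re-read and re-cache the batch side, unpersisting the
    // stale snapshot only after the swap.
    @volatile var refDf: DataFrame = spark.read.parquet("/data/reference").cache()

    Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() => {
      val fresh = spark.read.parquet("/data/reference").cache()
      fresh.count()             // materialize the new cache before swapping
      val stale = refDf
      refDf = fresh
      stale.unpersist()
    }, 30, 30, TimeUnit.MINUTES)

    streamDf.writeStream.foreachBatch { (batch: DataFrame, _: Long) =>
      batch.join(refDf, "id").write.mode("append").parquet("/data/out")
    }.start()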

Find difference between two dataframes in spark structured streaming

2020-12-16 Thread act_coder
I am creating a spark structured streaming job where I need to find the difference between two dataframes. Dataframe 1: [1, item1, value1] [2, item2, value2] [3, item3, value3] [4, item4, value4] [5, item5, value5] Dataframe 2: [4, item4, value4] [5, item5, value5] New Dataframe with
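
A minimal sketch, assuming both frames share an id column: a left_anti join keeps the rows of dataframe 1 that have no match in dataframe 2. except() between streaming dataframes is not supported, so a left_anti join is the usual alternative (subject to the stream-static join support of your Spark version):

    // Rows present in df1 but absent from df2.
    val diff = df1.join(df2, Seq("id"), "left_anti")

    // On two static frames with identical schemas, except() is equivalent:
    // val diff = df1.except(df2)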

Converting dataframe from one format to another in spark structured streaming

2020-11-26 Thread act_coder
I am creating a spark structured streaming application and I have a streaming dataframe which has the below data in it. { "name":"sensor1", "time":"2020-11-27T01:01:00", "sensorvalue":11.0, "tag1":"tagvalue" } I would like to convert that dataframe into the below format. { "name":"sensor1",
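
The target layout in this message is truncated, so the nested tags struct below is only an assumed illustration of the general technique, given a raw dataframe rawDf with a string value column: parse the payload with from_json against an explicit schema, then rebuild the shape with select and struct:

    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    // Schema matching the incoming sensor event.
    val schema = new StructType()
      .add("name", StringType)
      .add("time", StringType)
      .add("sensorvalue", DoubleType)
      .add("tag1", StringType)

    val reshaped = rawDf
      .select(from_json(col("value").cast("string"), schema).as("e"))
      .select(
        col("e.name"),
        to_timestamp(col("e.time")).as("time"),
        col("e.sensorvalue"),
        struct(col("e.tag1")).as("tags")   // hypothetical nested layout
      )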

Single spark streaming job to read incoming events with dynamic schema

2020-11-16 Thread act_coder
I am trying to create a spark structured streaming job which reads from a Kafka topic and the events coming from that Kafka topic will have different schemas (There is no standard schema for the incoming events). Sample incoming events: event1: {timestamp:2018-09-28T15:50:57.2420418+00:00,
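
A minimal sketch for the no-fixed-schema case, with hypothetical broker, topic, and field names: keep the Kafka value as a raw JSON string, pull out only the fields common to all events with get_json_object, and apply per-type schemas downstream:

    import org.apache.spark.sql.functions._

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(value AS STRING) AS json")

    // Extract shared fields; keep the full payload for per-type parsing later.
    val routed = raw.select(
      get_json_object(col("json"), "$.timestamp").as("timestamp"),
      col("json"))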

Re: Using two WriteStreams in same spark structured streaming job

2020-11-04 Thread act_coder
If I use the foreach function, then I may need to use a custom Kafka stream writer, right? And I might not be able to use the default writeStream.format("kafka") method?

Using two WriteStreams in same spark structured streaming job

2020-11-04 Thread act_coder
I have a scenario where I would like to save the same streaming dataframe to two different streaming sinks. I have created a streaming dataframe which I need to send to both a Kafka topic and delta lake. I thought of using foreachBatch, but it looks like it doesn't support multiple streaming sinks.
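
A minimal sketch of the foreachBatch approach, with hypothetical servers, topic, and Delta path: foreachBatch hands each micro-batch over as a plain batch dataframe, so the built-in batch writers for Kafka and Delta can both be reused (no custom Kafka writer is needed, addressing the reply above), and persist() avoids recomputing the micro-batch for the second sink:

    import org.apache.spark.sql.DataFrame

    streamDf.writeStream.foreachBatch { (batch: DataFrame, _: Long) =>
      batch.persist()
      // Sink 1: Kafka, via the batch writer.
      batch.selectExpr("to_json(struct(*)) AS value")
        .write.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("topic", "out-topic")
        .save()
      // Sink 2: Delta Lake.
      batch.write.format("delta").mode("append").save("/delta/out")
      batch.unpersist()
    }.start()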

How to use groupByKey() in spark structured streaming without aggregates

2020-10-27 Thread act_coder
Is there a way through which we can use the groupByKey() function in spark structured streaming without aggregates? I have a scenario like the one below, where we would like to group the items based on a key without applying any aggregates. Sample incoming data: I would like to apply groupByKey on
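
A minimal sketch of one common workaround, with assumed column names and watermark: structured streaming will not accept a grouping without an aggregate, but collect_list() gathers each key's rows into an array, which often serves the same group-without-aggregating purpose:

    import org.apache.spark.sql.functions._

    val grouped = events
      .withWatermark("time", "10 minutes")
      .groupBy(window(col("time"), "5 minutes"), col("key"))
      .agg(collect_list(struct(col("item"), col("value"))).as("items"))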

Re: How to disable 'spark.security.credentials.${service}.enabled' in Structured streaming while connecting to a kafka cluster

2020-01-12 Thread act_coder
Hi Gabor, Thanks for the reply. Yes, I have created a JIRA here and have added the details over there - https://issues.apache.org/jira/browse/SPARK-30495 Please let me know if that is the right place to create it. If not, where should I be creating it? Also, I would like to know whether it is

How to disable 'spark.security.credentials.${service}.enabled' in Structured streaming while connecting to a kafka cluster

2020-01-08 Thread act_coder
I am trying to read data from a secured Kafka cluster using spark structured streaming. Also, I am using the below library to read the data - "spark-sql-kafka-0-10_2.12":"3.0.0-preview" - since it has the feature to specify our own group id (instead of spark setting its own group id)
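
A minimal sketch of turning the delegation-token retrieval off so the job authenticates through SASL options passed straight to the Kafka source; the broker, topic, and JAAS string are assumptions, and the flag is usually supplied at submit time as --conf spark.security.credentials.kafka.enabled=false:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("secured-kafka-read")
      .config("spark.security.credentials.kafka.enabled", "false")
      .getOrCreate()

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9093")
      .option("subscribe", "events")
      .option("kafka.security.protocol", "SASL_SSL")
      .option("kafka.sasl.jaas.config",
        "org.apache.kafka.common.security.plain.PlainLoginModule required " +
        "username=\"user\" password=\"secret\";")
      .load()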