Re: DB Config data update across multiple Spark Streaming Jobs

2021-03-15 Thread forece85
Any suggestions on this? How can we update configuration data on all executors without downtime? -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

DB Config data update across multiple Spark Streaming Jobs

2021-03-13 Thread forece85
Hi, we have multiple Spark jobs running on a single EMR cluster. All jobs use the same business-related configuration, which is stored in Postgres. How can we update this configuration data on all executors dynamically whenever the Postgres data changes, without restarting Spark? We are using
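One common way to avoid restarts (a hedged sketch, not a solution confirmed anywhere in this thread) is to skip re-broadcasting entirely and let each executor lazily re-read the config after a TTL expires. The `ConfigCache` class, its `loader` supplier, and the TTL value below are all hypothetical stand-ins; in practice the loader would wrap the actual JDBC query against Postgres.

```java
import java.util.Map;
import java.util.function.Supplier;

/**
 * Executor-side config holder that re-loads from the database once its TTL
 * expires. Hold one static instance per executor JVM and call get() inside
 * foreachRDD/mapPartitions, so each batch sees config at most ttlMillis old.
 */
public class ConfigCache {
    private final Supplier<Map<String, String>> loader; // stand-in for a JDBC query to Postgres
    private final long ttlMillis;
    private Map<String, String> current;
    private long loadedAt;

    public ConfigCache(Supplier<Map<String, String>> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    public synchronized Map<String, String> get() {
        long now = System.currentTimeMillis();
        if (current == null || now - loadedAt >= ttlMillis) {
            current = loader.get(); // hits the database only on first use or after expiry
            loadedAt = now;
        }
        return current;
    }
}
```

With, say, a 60-second TTL, a config change in Postgres propagates to every executor within a minute, and each executor issues at most one query per minute regardless of batch rate.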

Re: Spark Streaming - Routing rdd to Executor based on Key

2021-03-09 Thread forece85
Not sure if Kinesis has such flexibility. What other possibilities are there at the transformation level?

Re: Spark Streaming - Routing rdd to Executor based on Key

2021-03-09 Thread forece85
Any example for this, please?

Spark Streaming - Routing rdd to Executor based on Key

2021-03-09 Thread forece85
We are doing batch processing using Spark Streaming with Kinesis, with a batch size of 5 minutes. We want to send all events with the same eventId to the same executor within a batch, so that we can do multiple event-grouping operations based on eventId. No previous-batch or future-batch data is involved.
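The standard way to get this co-location (a hedged suggestion, not something confirmed later in the thread) is a keyed shuffle, e.g. `stream.mapToPair(e -> new Tuple2<>(e.getEventId(), e)).groupByKey()` (method names hypothetical): Spark's default HashPartitioner then routes every record with a given key to one partition, which is processed by a single executor for that batch. The helper below mirrors the HashPartitioner arithmetic (non-negative modulo of the key's hash code) so you can see the invariant in isolation:

```java
/**
 * Mirrors the arithmetic of Spark's default HashPartitioner: a key is mapped
 * to a partition via a non-negative modulo of its hashCode. The same key
 * always maps to the same partition for a fixed partition count, which is
 * what co-locates all events sharing an eventId on one executor per batch.
 */
public final class KeyRouting {
    private KeyRouting() {}

    public static int partitionFor(Object key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod; // hashCode may be negative
    }
}
```

Note the invariant only holds within one shuffle with a fixed partition count; a later `repartition(n)` with round-robin distribution would break the co-location.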

Spark Streaming - Dstreams - Removing RDD

2020-07-27 Thread forece85
We are using Spark Streaming (DStreams) with Kinesis and a batch interval of 10 seconds. For random batches, processing time takes very long. While checking logs, we found the below log lines whenever we get a spike in processing time:

Re: Spark Streaming - Set Parallelism and Optimize driver

2020-07-20 Thread forece85
Thanks for the reply. Please find pseudo code below. It is DStreams reading every 10 seconds from a Kinesis stream and, after transformations, pushing into HBase. Once we have the DStream, we use the below code to repartition and do processing: dStream = dStream.repartition(javaSparkContext.defaultMinPartitions()

Re: Spark Streaming - Set Parallelism and Optimize driver

2020-07-20 Thread forece85
Thanks for the reply. Please find pseudo code below. We fetch DStreams from a Kinesis stream every 10 seconds, perform transformations, and finally persist to HBase tables using batch insertions. dStream = dStream.repartition(jssc.defaultMinPartitions() * 3); dStream.foreachRDD(javaRDD ->
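The snippet is cut off mid-expression, but the "batch insertions" it describes usually mean buffering records per partition and flushing them in fixed-size chunks rather than one put per record. A hedged sketch of just that batching piece, with the HBase client omitted (`flusher` is a hypothetical stand-in for something like `Table.put(List<Put>)`):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Collects records and flushes them in fixed-size batches. Inside
 * foreachRDD/foreachPartition, one Batcher per partition keeps HBase writes
 * batched instead of issuing one RPC per record; call flush() at the end of
 * the partition to push any remainder.
 */
public class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> flusher; // stand-in for e.g. table.put(puts)
    private final List<T> buffer = new ArrayList<>();

    public Batcher(int batchSize, Consumer<List<T>> flusher) {
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    public void add(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    public void flush() {
        if (!buffer.isEmpty()) {
            flusher.accept(new ArrayList<>(buffer)); // hand off a copy, then reset
            buffer.clear();
        }
    }
}
```

Pairing this with `foreachPartition` (rather than per-record `foreach`) also lets the HBase connection be opened once per partition instead of once per record.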

Spark Streaming - Set Parallelism and Optimize driver

2020-07-20 Thread forece85
I am new to Spark Streaming and trying to understand the Spark UI and to do optimizations. 1. Processing at executors took less time than at the driver. How to optimize to make driver tasks fast? 2. We are using dstream.repartition(defaultParallelism*3) to increase parallelism, which is causing high
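On point 2, the `*3` multiplier follows a common rule of thumb (partitions roughly 2-3x the total executor cores), but oversizing it inflates scheduling and shuffle overhead, which may be the "high ..." cost the truncated message describes. A tiny hypothetical helper for reasoning about the target, assuming cluster figures you would substitute from your own EMR setup:

```java
/**
 * Rule-of-thumb partition sizing: roughly `factor` tasks per executor core
 * (factor of 2-3 is commonly suggested), so every core stays busy without
 * creating so many small tasks that scheduling overhead dominates.
 */
public final class Parallelism {
    private Parallelism() {}

    public static int targetPartitions(int numExecutors, int coresPerExecutor, int factor) {
        return Math.max(1, numExecutors * coresPerExecutor * factor);
    }
}
```

For example, 5 executors with 4 cores each and a factor of 3 suggests about 60 partitions; if `defaultParallelism*3` lands far above that, reducing the multiplier is a reasonable first experiment.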