Any suggestions on this? How can we update configuration data on all executors
without downtime?
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Hi,
We have multiple Spark jobs running on a single EMR cluster. All jobs use the
same business-related configuration, which is stored in Postgres. How can we
update this configuration data on all executors dynamically whenever the
Postgres data changes, without restarting Spark?
We are using
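One common approach is an executor-side cache with a time-to-live: each executor JVM lazily loads the configuration and re-reads it from Postgres once the TTL expires, so changes propagate within one refresh interval and no restart is needed. Below is a minimal sketch of that idea; `RefreshableConfig` is a hypothetical name, and the `Supplier` stands in for a real JDBC query against the Postgres config table.

```java
import java.util.Map;
import java.util.function.Supplier;

// Executor-side config holder: each JVM lazily loads the config and
// re-reads it once the TTL expires, so no Spark restart is needed.
// The Supplier stands in for a real JDBC query against Postgres
// (e.g. SELECT key, value FROM business_config).
final class RefreshableConfig {
    private static volatile Map<String, String> cached;
    private static volatile long loadedAt = 0L;
    private static final long TTL_MS = 5 * 60 * 1000; // refresh every 5 min

    private RefreshableConfig() {}

    public static Map<String, String> get(Supplier<Map<String, String>> loader) {
        long now = System.currentTimeMillis();
        if (cached == null || now - loadedAt > TTL_MS) {
            synchronized (RefreshableConfig.class) {
                if (cached == null || now - loadedAt > TTL_MS) {
                    cached = loader.get(); // hits Postgres at most once per TTL
                    loadedAt = now;
                }
            }
        }
        return cached;
    }
}
```

Inside a transformation you would call `RefreshableConfig.get(loader)` once per partition (for example at the top of a `mapPartitions` function), so each executor refreshes independently and the database is not hit per record.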
Not sure if Kinesis has such flexibility. What other possibilities are there
at the transformation level?
Any example for this, please?
We are doing batch processing using Spark Streaming with Kinesis, with a batch
size of 5 minutes. We want to send all events with the same eventId to the
same executor within a batch, so that we can do multiple grouping operations
based on eventId. No previous or future batch data is involved.
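Within a batch, keying the stream by eventId and grouping with a `HashPartitioner` sends all records with an equal key to the same partition, and each partition is processed by a single executor. The sketch below shows the partition computation a hash partitioner performs (hashCode modulo the partition count, forced non-negative), which is why equal eventIds always co-locate; `PartitionFor` is a hypothetical illustration, not Spark API.

```java
// A hash partitioner places a key in partition
// nonNegativeMod(key.hashCode(), numPartitions), so two records with
// an equal eventId always map to the same partition within a batch.
// In the streaming job this is typically expressed as:
//   events.mapToPair(e -> new Tuple2<>(e.getEventId(), e))
//         .groupByKey(new HashPartitioner(numPartitions));
final class PartitionFor {
    public static int partitionFor(String eventId, int numPartitions) {
        int mod = eventId.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod; // keep it non-negative
    }
}
```

Because the mapping depends only on the key and the partition count, the grouping is deterministic within a batch, which is exactly the per-batch co-location described above; no state needs to carry across batches.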
We are using Spark Streaming (DStreams) with Kinesis and a batch interval of
10 seconds. For random batches, processing time takes very long. While
checking the logs, we found the log lines below whenever we get a spike in
processing time:
Thanks for the reply. Please find pseudocode below. We are reading DStreams
from the Kinesis stream every 10 seconds, performing transformations, and
finally persisting to HBase tables using batch insertions.
dStream = dStream.repartition(jssc.defaultMinPartitions() * 3);
dStream.foreachRDD(javaRDD ->
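For batch insertions into HBase, the usual shape is `foreachRDD` → `foreachPartition`, buffering records and flushing them in groups so each executor issues a few large mutations instead of one write per record. Below is a minimal sketch of just that batching logic; `BatchWriter` is a hypothetical helper, and the `Consumer` callback stands in for the actual HBase `table.put(List<Put>)` call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Buffers an iterator of rows and flushes every `batchSize` records —
// the shape used inside foreachPartition so each executor performs a
// few large HBase mutations instead of one Put per record, e.g.:
//   dStream.foreachRDD(rdd -> rdd.foreachPartition(it ->
//       BatchWriter.writeInBatches(it, 500, puts -> table.put(puts))));
final class BatchWriter {
    public static <T> int writeInBatches(Iterator<T> rows, int batchSize,
                                         Consumer<List<T>> flush) {
        List<T> buffer = new ArrayList<>(batchSize);
        int flushes = 0;
        while (rows.hasNext()) {
            buffer.add(rows.next());
            if (buffer.size() >= batchSize) {
                flush.accept(buffer);          // one bulk write per batch
                buffer = new ArrayList<>(batchSize);
                flushes++;
            }
        }
        if (!buffer.isEmpty()) {               // flush the tail of the partition
            flush.accept(buffer);
            flushes++;
        }
        return flushes;
    }
}
```

Opening the HBase connection inside `foreachPartition` (not in the driver) also avoids serializing non-serializable clients to the executors.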
I am new to Spark Streaming and am trying to understand the Spark UI and do
optimizations.
1. Processing at the executors took less time than at the driver. How can we
optimize to make the driver tasks fast?
2. We are using dstream.repartition(defaultParallelism * 3) to increase
parallelism, which is causing high