Re: FlatMapGroupsWithStateFunction is called thrice - Production use case.

2021-03-18 Thread Kuttaiah Robin
; Jungtaek Lim (HeartSaVioR) > > On Thu, Mar 11, 2021 at 4:54 PM Kuttaiah Robin wrote: > >> Hello, >> >> I have a use case where I need to read events(non correlated) from a >> source kafka topic, then correlate and push forward to another target topic. >> &g

FlatMapGroupsWithStateFunction is called thrice - Production use case.

2021-03-10 Thread Kuttaiah Robin
Hello, I have a use case where I need to read events(non correlated) from a source kafka topic, then correlate and push forward to another target topic. I use spark structured streaming with FlatMapGroupsWithStateFunction along with GroupStateTimeout.ProcessingTimeTimeout() . After each

How to handle spark state which is growing too big even with timeout set.

2021-02-11 Thread Kuttaiah Robin
Hello, I have a use case where I need to read events(non correlated) from a kafka topic, then correlate and push forward to another topic. I use spark structured streaming with FlatMapGroupsWithStateFunction along with GroupStateTimeout.ProcessingTimeTimeout() . After each timeout, I do some

Spark Listeners for getting dataset partition information in streaming application

2018-11-02 Thread Kuttaiah Robin
Hello, Is there a way spark streaming application will get to know during the start and end of the data read from a dataset partition? I want to create partition specific cache during the start and delete during the partition is read completely. Thanks for you help in advance. regards, Robin

How to use Dataset forEachPartion and groupByKey together

2018-11-01 Thread Kuttaiah Robin
Hello all, Am using spark-2.3.0 and hadoop-2.7.4. I have spark streaming application which listens to kafka topic, does some transformation and writes to Oracle database using JDBC client. Step 1. Read events from Kafka as shown below; -- Dataset

Redeploying spark streaming application aborts because of checkpoint issue

2018-10-14 Thread Kuttaiah Robin
Hello all, Am using spark-2.3.0 and hadoop-2.7.4. I have spark streaming application which listens to kafka topic, does some transformation and writes to Oracle database using JDBC client. Read events from Kafka as shown below; m_oKafkaEvents =

Re: Recreate Dataset from list of Row in spark streaming application.

2018-10-06 Thread Kuttaiah Robin
rocess > method as it will run in executors. > > Best Regards, > Ryan > > > On Fri, Oct 5, 2018 at 6:54 AM Kuttaiah Robin wrote: > >> Hello, >> >> I have a spark streaming application which reads from Kafka based on the >> given schema. >> >>

Recreate Dataset from list of Row in spark streaming application.

2018-10-05 Thread Kuttaiah Robin
Hello, I have a spark streaming application which reads from Kafka based on the given schema. Dataset m_oKafkaEvents = getSparkSession().readStream().format("kafka") .option("kafka.bootstrap.servers", strKafkaAddress) .option("assign", strSubscription)

Re: Creating spark Row from database values

2018-09-26 Thread Kuttaiah Robin
it helps. > > Regards, > Shahab > > On Wed, Sep 26, 2018 at 8:02 AM Kuttaiah Robin wrote: > >> Hello, >> >> Currently I have Oracle database table with description as shown below; >> >> Table INSIGHT_ID_FED_IDENTIFIERS >> --

Creating spark Row from database values

2018-09-26 Thread Kuttaiah Robin
Hello, Currently I have Oracle database table with description as shown below; Table INSIGHT_ID_FED_IDENTIFIERS - CURRENT_INSTANCE_ID VARCHAR2(100) PREVIOUS_INSTANCE_ID VARCHAR2(100) Sample values in the table basically output of select * from

Spark FlatMapGroupsWithStateFunction throws cannot resolve 'named_struct()' due to data type mismatch 'SerializeFromObject"

2018-09-17 Thread Kuttaiah Robin
Hello, Am using FlatMapGroupsWithStateFunction in my spark streaming application. FlatMapGroupsWithStateFunction idstateUpdateFunction = new FlatMapGroupsWithStateFunction() {.} SessionUpdate class is having trouble when added the highlighted code which throws below exception; The same