Re: Kafka Spark Structured Streaming out-of-memory issue

2020-08-13 Thread Srinivas V
It depends on how much memory is available and how much data you are processing. Please provide data size and cluster details to help. On Fri, Aug 14, 2020 at 12:54 AM km.santanu wrote: > Hi > I am using stateless Structured Streaming with Kafka. I have enabled a watermark of > 1 > hour. After long
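A watermark by itself does not cap memory on a stateless query; it only lets Spark drop state held by stateful operators. A minimal sketch of the setup being discussed, assuming a hypothetical `events` Dataset with `eventTime` and `eventId` columns (names are placeholders, not from the thread):

```java
// Hedged sketch; column names are assumptions. A watermark only bounds state
// for stateful operators (aggregation, dedup, stream-stream join) -- on a
// truly stateless query it does not limit memory at all.
Dataset<Row> bounded = events
    .withWatermark("eventTime", "1 hour")
    .dropDuplicates("eventId", "eventTime"); // example stateful op the watermark can clean up
```

If the query really is stateless, OOMs are more likely driven by batch size (`maxOffsetsPerTrigger`) and executor memory than by the watermark.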

Re: Metrics Problem

2020-06-30 Thread Srinivas V
packaged as part of my main jar. > It worked well both on driver and locally, but not on executors. > > Regards, > > Bryan Jeffrey > > Get Outlook for Android <https://aka.ms/ghei36> > > ------ > *From:* Srinivas V > *Sent:* Saturday,

Re: Metrics Problem

2020-06-26 Thread Srinivas V
6> > > -- > *From:* Srinivas V > *Sent:* Friday, June 26, 2020 9:47:52 PM > *To:* Bryan Jeffrey > *Cc:* user > *Subject:* Re: Metrics Problem > > It should work when you are giving hdfs path as long as your jar exists in > the path. > Your

Re: Metrics Problem

2020-06-26 Thread Srinivas V
It should work when you are giving an HDFS path, as long as your jar exists in the path. Your error looks more like a security issue (Kerberos) or missing Hadoop dependencies, I think; your error says: org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation On Fri, Jun 26, 2020 at 8:44 PM

Re: Spark Structured Streaming: “earliest” as “startingOffsets” is not working

2020-06-26 Thread Srinivas V
Cool. Are you not using a watermark? Also, is it possible to start listening to offsets from a specific date-time? Regards Srini On Sat, Jun 27, 2020 at 6:12 AM Eric Beabes wrote: > My apologies... After I set the 'maxOffsetsPerTrigger' to a value such as > '20' it started working. Hopefully

[spark-structured-streaming] [stateful]

2020-06-14 Thread Srinivas V
Does stateful Structured Streaming work on a standalone Spark cluster with a few nodes? Does it need HDFS? If not, how do I get it working without HDFS? Regards Srini
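For context (not from the thread): in Spark 2.x the default state store keeps state in executor memory and snapshots it under the query's checkpoint location, so HDFS specifically is not required — but the checkpoint path should be fault-tolerant storage reachable by both driver and executors (HDFS, S3, NFS). A hedged sketch, with placeholder paths:

```java
// Sketch only: the checkpointLocation is where stream progress and state
// snapshots are written. Both paths below are placeholders.
StreamingQuery query = counts.writeStream()
    .outputMode("update")
    .option("checkpointLocation", "file:///tmp/ckpt")           // single-node testing only
    // .option("checkpointLocation", "hdfs://namenode:8020/ckpt") // fault-tolerant storage
    .format("console")
    .start();
```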

Re: [spark-structured-streaming] [kafka] consume topics from multiple Kafka clusters

2020-06-09 Thread Srinivas V
,") > .load() > > After acquiring the DataFrame, you can union them and treat all the data > with a single process. > > val unifiedData = df_cluster1.union(df_cluster2) > // apply further transformations on `unifiedData` > > kr, Gerard. > > > : > > >
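The quoted reply is truncated above; a hedged Java reconstruction of the approach it describes (one reader per Kafka cluster, then union — broker hosts and topic names are placeholders):

```java
// Hedged sketch of the quoted approach. Both readers produce the standard
// Kafka source schema (key, value, topic, partition, offset, ...), so union
// is safe.
Dataset<Row> dfCluster1 = spark.readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "cluster1-broker:9092") // placeholder
    .option("subscribe", "topicA")
    .load();

Dataset<Row> dfCluster2 = spark.readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "cluster2-broker:9092") // placeholder
    .option("subscribe", "topicB")
    .load();

Dataset<Row> unifiedData = dfCluster1.union(dfCluster2);
// apply further transformations on `unifiedData`
```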

Re: [spark-structured-streaming] [kafka] consume topics from multiple Kafka clusters

2020-06-09 Thread Srinivas V
Thanks for the quick reply. This may work, but I have about 5 topics to listen to right now. I am trying to keep all topics in an array in a properties file and read them all at once. This way it is dynamic: you have one code block like below, and you may add or delete topics from the config
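The dynamic-topic idea described here can be sketched without Spark at all: read the topic list from a properties file and join it into the comma-separated string the Kafka source's `subscribe` option expects. The key name `kafka.topics` and the topic names are assumptions:

```java
import java.io.StringReader;
import java.util.Properties;

public class TopicConfig {
    // Builds the comma-separated topic list expected by the Kafka source's
    // "subscribe" option from a single properties entry (key name assumed).
    static String subscribeList(Properties props) {
        String raw = props.getProperty("kafka.topics", "");
        return String.join(",", raw.trim().split("\\s*,\\s*"));
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for loading a real .properties file from disk.
        Properties props = new Properties();
        props.load(new StringReader("kafka.topics = orders , payments ,shipments"));
        System.out.println(subscribeList(props)); // orders,payments,shipments
    }
}
```

The resulting string would then be passed as `.option("subscribe", subscribeList(props))` on the Kafka reader, so adding or removing a topic is a config change only.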

[spark-structured-streaming] [kafka] consume topics from multiple Kafka clusters

2020-06-09 Thread Srinivas V
Hello, In Structured Streaming, is it possible to have one Spark application with one query consuming topics from multiple Kafka clusters? I am trying to consume two topics, each from a different Kafka cluster, but it reports one of the topics as an unknown topic and the job keeps running without

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Srinivas V
hing < > mailinglist...@gmail.com> wrote: > >> Right, this is exactly how I've it right now. Problem is in the cluster >> mode 'myAcc' does NOT get distributed. Try it out in the cluster mode & you >> will see what I mean. >> >> I think how Zh

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Srinivas V
apGroupsWithState(timeoutConf = >> > > > GroupStateTimeout.ProcessingTimeTimeout)(func = mappingFunc) >> > > > >> > > > val query = wordCounts.writeStream >> > > > .outputMode(OutputMode.Update) >> > > > ... >> > > > ``` >> >

Re: Using Spark Accumulators with Structured Streaming

2020-05-30 Thread Srinivas V
, Iterator > eventsIterator, GroupState state) { > } > } > > On Fri, May 29, 2020 at 1:08 PM Srinivas V wrote: > >> >> Yes, accumulators are updated in the call method of StateUpdateTask. Like >> when state times out or when the data is pushed to next Kafka topic

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Srinivas V
n Fri, May 29, 2020 at 6:51 AM Srinivas V wrote: > >> Yes it is application specific class. This is how java Spark Functions >> work. >> You can refer to this code in the documentation: >> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Srinivas V
plication specific class. Does it > have 'updateState' method or something? I googled but couldn't find any > documentation about doing it this way. Can you please direct me to some > documentation. Thanks. > > On Thu, May 28, 2020 at 4:43 AM Srinivas V wrote: > >> yes, I am us

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread Srinivas V
ue) > Thread.sleep(5 * 1000) > } > > //query.awaitTermination() > ``` > > And the accumulator value updated can be found from driver stdout. > > -- > Cheers, > -z > > On Thu, 28 May 2020 17:12:48 +0530 > Srinivas V wrote: > > > yes, I

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread Srinivas V
n other streaming > applications it works as expected. > > > > On Wed, May 27, 2020 at 9:01 AM Srinivas V wrote: > >> Yes, I am talking about Application specific Accumulators. Actually I am >> getting the values printed in my driver log as well as sent to Grafa

Re: Using Spark Accumulators with Structured Streaming

2020-05-27 Thread Srinivas V
in > your code? I am talking about the Application Specific Accumulators. The > other standard counters such as 'event.progress.inputRowsPerSecond' are > getting populated correctly! > > On Mon, May 25, 2020 at 8:39 PM Srinivas V wrote: > >> Hello, >> Even for me it comes

Re: Using Spark Accumulators with Structured Streaming

2020-05-25 Thread Srinivas V
Hello, Even for me it comes as 0 when I print it in onQueryProgress. I use LongAccumulator as well. Yes, it prints on my local machine but not on the cluster. But one consolation is that when I send metrics to Grafana, the values are coming through there. On Tue, May 26, 2020 at 3:10 AM Something Something <
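A likely-relevant detail (an assumption on my part, not confirmed in the thread): the accumulator must be created from the driver's SparkContext, and its merged value is only visible on the driver after tasks complete. A minimal sketch with placeholder names:

```java
// Hedged sketch; "eventCount" is a placeholder name. Creating the accumulator
// via the SparkContext registers it so executor-side add() calls are merged
// back to the driver at task completion.
LongAccumulator eventCount = spark.sparkContext().longAccumulator("eventCount");

// inside the state-update function running on executors:
//   eventCount.add(1);

// on the driver, e.g. in StreamingQueryListener.onQueryProgress:
//   System.out.println(eventCount.value()); // aggregated after each completed batch
```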

[structured streaming] [stateful] Null value appeared in non-nullable field

2020-05-23 Thread Srinivas V
Hello, I am listening to a Kafka topic through Spark Structured Streaming [2.4.5]. After processing messages for a few minutes, I am getting the NullPointerException below. I have three beans used here: 1. Event 2. StateInfo 3. SessionUpdateInfo. I am suspecting that the problem is with StateInfo, when it is

Re: GroupState limits

2020-05-12 Thread Srinivas V
If you are talking about the total number of objects the state can hold, that depends on the executor memory you have on your cluster, apart from the rest of the memory required for processing. The state is stored in HDFS and retrieved while processing the next events. If you maintain a million objects with
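The sizing question above can be approached with back-of-envelope arithmetic; the per-object size below is purely an assumption, so measure your real serialized state instead:

```java
public class StateSizeEstimate {
    public static void main(String[] args) {
        // Back-of-envelope sizing under stated assumptions; both figures
        // are hypothetical and should be replaced with measured values.
        long objects = 1_000_000L;    // keys held in GroupState
        long bytesPerObject = 1_024L; // assumed serialized size per state object
        long mib = objects * bytesPerObject / (1024 * 1024);
        System.out.println(mib + " MiB of state, before store overhead");
    }
}
```

Under these assumptions a million 1 KiB objects is roughly 1 GiB of state, which must fit in executor memory alongside processing memory.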

Re: Spark structured streaming - performance tuning

2020-05-08 Thread Srinivas V
Can anyone else answer the questions below on performance tuning Structured Streaming? @Jacek? On Sun, May 3, 2020 at 12:07 AM Srinivas V wrote: > Hi Alex, I read the book; it is a good one, but I don’t see the things I > strongly want to understand. > You are right on the partition and t

Re: Spark structured streaming - performance tuning

2020-05-02 Thread Srinivas V
every partition in Kafka is mapped into Spark > partition. And in Spark, every partition is mapped into task. But you can > use `coalesce` to decrease the number of Spark partitions, so you'll have > less tasks... > > Srinivas V at "Sat, 18 Apr 2020 10:32:33 +0530"

Re: Spark structured streaming - performance tuning

2020-04-17 Thread Srinivas V
of > each partition is a separate task that need to be executed, so you need to > plan number of cores correspondingly. > > Srinivas V at "Thu, 16 Apr 2020 22:49:15 +0530" wrote: > SV> Hello, > SV> Can someone point me to a good video or document which takes about &g

Spark structured streaming - performance tuning

2020-04-16 Thread Srinivas V
Hello, Can someone point me to a good video or document which talks about performance tuning for a Structured Streaming app? I am looking especially at listening to Kafka topics, say 5 topics each with 100 partitions. Trying to figure out the best cluster size and the number of executors and cores required.
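For the stated shape (5 topics × 100 partitions), a rough parallelism estimate can be computed directly, since each Kafka partition becomes one input task per micro-batch; the executor counts below are assumptions for illustration only:

```java
public class ParallelismEstimate {
    public static void main(String[] args) {
        // 5 topics x 100 partitions => 500 Kafka partitions, i.e. up to 500
        // input tasks per micro-batch. Executor sizing below is hypothetical.
        int totalPartitions = 5 * 100;
        int executors = 20, coresPerExecutor = 5;
        int totalCores = executors * coresPerExecutor;
        int waves = (totalPartitions + totalCores - 1) / totalCores; // ceiling division
        System.out.println(totalPartitions + " tasks, " + waves
                + " waves on " + totalCores + " cores");
    }
}
```

Fewer waves per micro-batch generally means lower batch latency, so core count is usually chosen as a divisor-friendly fraction of the total partition count.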

Re: Spark Streaming not working

2020-04-10 Thread Srinivas V
Check if your broker details are correct, and verify that you have network connectivity between your client box and the Kafka broker host. On Fri, Apr 10, 2020 at 11:04 PM Debabrata Ghosh wrote: > Hi, > I have a spark streaming application where Kafka is producing > records but unfortunately
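The connectivity check suggested above can be sketched as a plain TCP probe; the demo below connects to a local listener it opens itself, and the real broker host in the comment is hypothetical:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class BrokerCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo against a local listener; in practice probe your broker, e.g.
        // canConnect("broker1.example.com", 9092, 2000)  (hypothetical host)
        try (ServerSocket server = new ServerSocket(0)) {
            System.out.println(canConnect("127.0.0.1", server.getLocalPort(), 1000)
                    ? "reachable" : "unreachable");
        }
    }
}
```

A probe like this only proves the port is open; a wrong `advertised.listeners` setting on the broker can still break the client even when the probe succeeds.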

Re: spark structured streaming GroupState returns weird values from state

2020-03-31 Thread Srinivas V
of non-getter methods of the fields defined? Still, how is that causing the state object to get so corrupted? On Sat, Mar 28, 2020 at 7:46 PM Srinivas V wrote: > Ok, I will try to create some simple code to reproduce, if I can. Problem > is that I am adding this code in an existing big p

Re: spark structured streaming GroupState returns weird values from state

2020-03-28 Thread Srinivas V
iness logic which you may want to > redact, and provide full of source code which reproduces the bug? > > On Sat, Mar 28, 2020 at 8:11 PM Srinivas V wrote: > >> Sorry for typos , correcting them below >> >> On Sat, Mar 28, 2020 at 4:39 PM Srinivas V wrote: >>

Re: spark structured streaming GroupState returns weird values from state

2020-03-28 Thread Srinivas V
Sorry for typos , correcting them below On Sat, Mar 28, 2020 at 4:39 PM Srinivas V wrote: > Sorry I was just changing some names not to send exact names. Please > ignore that. I am really struggling with this since couple of days. Can > this happen due to > 1. some of the value

Re: spark structured streaming GroupState returns weird values from state

2020-03-28 Thread Srinivas V
t across multiple JVM >> runs to make it work properly, but I suspect it doesn't retain the order. >> >> On Fri, Mar 27, 2020 at 10:28 PM Srinivas V wrote: >> >>> I am listening to Kafka topic with a structured streaming application >>> with Java, te

spark structured streaming GroupState returns weird values from state

2020-03-27 Thread Srinivas V
I am listening to a Kafka topic with a Structured Streaming application in Java, testing it on my local Mac. When I retrieve the GroupState object back with state.get(), it gives random values for the fields in the object: some are interchanged, some are defaults, and some are junk values.
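One possible cause, hinted at later in this thread, is the state class not matching the bean shape `Encoders.bean()` expects; this is a guess, not a confirmed diagnosis. A minimal conforming bean (field names are hypothetical):

```java
import java.io.Serializable;

// Minimal shape bean encoders expect: a public zero-arg constructor plus a
// matching getter/setter pair for every field. Getters that do not map to a
// real field (or mismatched pairs) can plausibly make the encoder read
// columns in an unexpected order, producing interchanged/junk values.
public class StateInfo implements Serializable {
    private String businessId;
    private long count;

    public StateInfo() { } // required by bean encoders

    public String getBusinessId() { return businessId; }
    public void setBusinessId(String businessId) { this.businessId = businessId; }

    public long getCount() { return count; }
    public void setCount(long count) { this.count = count; }

    public static void main(String[] args) {
        StateInfo s = new StateInfo();
        s.setBusinessId("b-1");
        s.setCount(2);
        System.out.println(s.getBusinessId() + ":" + s.getCount());
    }
}
```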

Re: structured streaming Kafka consumer group.id override

2020-03-19 Thread Srinivas V
ailure or killing , the group id changes, but the > starting offset will be the last one you consumed last time . > > Srinivas V 于2020年3月19日周四 下午12:36写道: > >> Hello, >> 1. My Kafka consumer name is randomly being generated by spark structured >> streaming. Can I override thi

structured streaming Kafka consumer group.id override

2020-03-18 Thread Srinivas V
Hello, 1. My Kafka consumer name is randomly generated by Spark Structured Streaming. Can I override this? 2. When testing in development, when I stop my Kafka consumer streaming job for a couple of days and try to start it back again, the job keeps failing for missing offsets, as the
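For reference (not stated in the thread): Spark 2.x always generates the consumer group id itself, while Spark 3.0 added the `groupIdPrefix` and `kafka.group.id` options. The missing-offsets failure after a long stop is typically the broker's retention expiring the checkpointed offsets, which `failOnDataLoss` controls. A hedged sketch with placeholder broker and topic names:

```java
// Hedged sketch; the group-id options require Spark 3.0+.
Dataset<Row> df = spark.readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092") // placeholder
    .option("subscribe", "events")                     // placeholder
    .option("groupIdPrefix", "my-app")       // Spark 3.0+: control the generated prefix
    // .option("kafka.group.id", "my-group") // Spark 3.0+: fixed group id (use with care)
    .option("failOnDataLoss", "false")       // tolerate offsets expired by retention
    .load();
```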

structured streaming with mapGroupsWithState

2020-03-11 Thread Srinivas V
Is anyone using this combination in prod? I am planning to use it for a use case with 15000 events per second from a few Kafka topics. Though events are big, I would just have to take the businessId, frequency, first and last event timestamps and save these into mapGroupsWithState. I need to keep them for
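The plan described (keep businessId, frequency, first/last timestamps per key) maps directly onto the Java `mapGroupsWithState` API; `Event` and `SessionState` below are hypothetical beans, not code from the thread:

```java
// Hedged Java sketch with a processing-time timeout; Event/SessionState are
// assumed beans with the getters/setters used below.
Dataset<SessionState> states = events
    .groupByKey((MapFunction<Event, String>) Event::getBusinessId, Encoders.STRING())
    .mapGroupsWithState(
        (MapGroupsWithStateFunction<String, Event, SessionState, SessionState>)
        (key, values, state) -> {
            SessionState s = state.exists() ? state.get() : new SessionState();
            while (values.hasNext()) {
                Event e = values.next();
                s.setCount(s.getCount() + 1);                            // frequency
                if (s.getFirstSeen() == 0) s.setFirstSeen(e.getTimestamp()); // first event
                s.setLastSeen(e.getTimestamp());                         // last event
            }
            state.update(s);
            state.setTimeoutDuration("30 minutes"); // how long to retain idle state
            return s;
        },
        Encoders.bean(SessionState.class),
        Encoders.bean(SessionState.class),
        GroupStateTimeout.ProcessingTimeTimeout());
```

Keeping only these few scalar fields per key, rather than the full events, is what makes this state footprint manageable at 15000 events/second.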