Re: Kafka Spark Structured Streaming out of memory issue

2020-08-13 Thread Srinivas V
It depends on how much memory is available and how much data you are processing. Please provide the data size and cluster details so we can help. On Fri, Aug 14, 2020 at 12:54 AM km.santanu wrote: > Hi, I am using Kafka stateless Structured Streaming. I have enabled the watermark as 1 hour. After long…

Re: Metrics Problem

2020-06-30 Thread Srinivas V
Then it should be a permission issue. What kind of cluster is it, and which user is running it? Does that user have HDFS permissions to access the folder where the jar file is? On Mon, Jun 29, 2020 at 1:17 AM Bryan Jeffrey wrote: > Srinivas, > Interestingly, I did have the metrics jar…

Re: Metrics Problem

2020-06-26 Thread Srinivas V
One option is to build your main jar with the metrics jar included, i.e. a fat jar. On Sat, Jun 27, 2020 at 8:04 AM Bryan Jeffrey wrote: > Srinivas, > Thanks for the insight. I had not considered a dependency issue, as the metrics jar works well applied on the driver. Perhaps…
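A minimal sketch of that fat-jar option, assuming sbt-assembly (the plugin version and the dependency line below are illustrative, not from the thread; Maven's shade plugin is the equivalent):

```scala
// project/plugins.sbt -- sbt-assembly builds one "fat" jar that bundles the
// metrics classes with the application code (version is an assumption):
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt -- mark Spark itself as "provided" so only extras such as the
// metrics library end up inside the assembled jar:
// libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5" % "provided"
// Then run `sbt assembly` and point spark-submit at the produced *-assembly.jar.
```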

Re: Metrics Problem

2020-06-26 Thread Srinivas V
It should work when you give an HDFS path, as long as your jar exists at that path. I think your error is more of a security issue (Kerberos) or missing Hadoop dependencies; it says: org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation… On Fri, Jun 26, 2020 at 8:44 PM…

Re: Spark Structured Streaming: “earliest” as “startingOffsets” is not working

2020-06-26 Thread Srinivas V
Cool. Are you not using a watermark? Also, is it possible to start listening to offsets from a specific date-time? Regards, Srini On Sat, Jun 27, 2020 at 6:12 AM Eric Beabes wrote: > My apologies... After I set 'maxOffsetsPerTrigger' to a value such as '20' it started working. Hopefully…
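A hedged sketch of the fix Eric describes, plus the date-time question: 'maxOffsetsPerTrigger' caps how much a batch reads when starting from "earliest", and Spark 3.0+ adds 'startingOffsetsByTimestamp' for starting at a specific time (broker address and topic below are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("earliest-capped").getOrCreate()

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")  // placeholder
  .option("subscribe", "events")                      // placeholder
  .option("startingOffsets", "earliest")
  // Cap each micro-batch so the first batch after a fresh start does not
  // try to pull the entire topic backlog at once:
  .option("maxOffsetsPerTrigger", 20)
  // Spark 3.0+: start from a timestamp (ms) per topic/partition instead:
  // .option("startingOffsetsByTimestamp", """{"events": {"0": 1593561600000}}""")
  .load()
```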

[spark-structured-streaming] [stateful]

2020-06-14 Thread Srinivas V
Does stateful Structured Streaming work on a standalone Spark cluster with a few nodes? Does it need HDFS? If not, how do I get it working without HDFS? Regards, Srini
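A minimal sketch of the checkpoint side of this question, assuming the answer is that any fault-tolerant directory visible to the whole cluster works (paths and names below are placeholders): stateful queries need a checkpointLocation the driver and executors can all reach, which on a small standalone cluster could be a shared NFS mount (or a plain local path on a single node) instead of HDFS:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("stateful-standalone").getOrCreate()
import spark.implicits._

val counts = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")  // placeholder
  .option("subscribe", "events")                      // placeholder
  .load()
  .selectExpr("CAST(key AS STRING) AS k")
  .groupBy($"k")
  .count()                                            // stateful aggregation

counts.writeStream
  .outputMode("update")
  // State snapshots and the write-ahead log go here; a shared mount
  // stands in for HDFS on a standalone cluster:
  .option("checkpointLocation", "file:///mnt/shared/checkpoints/app1")
  .format("console")
  .start()
  .awaitTermination()
```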

Re: [spark-structured-streaming] [kafka] consume topics from multiple Kafka clusters

2020-06-09 Thread Srinivas V
OK, thanks for confirming; I will do it this way. Regards, Srini On Tue, Jun 9, 2020 at 11:31 PM Gerard Maas wrote: > Hi Srinivas, > Reading from different brokers is possible, but you need to connect to each Kafka cluster separately. Trying to mix connections to two…

Re: [spark-structured-streaming] [kafka] consume topics from multiple Kafka clusters

2020-06-09 Thread Srinivas V
Thanks for the quick reply. This may work, but I have about 5 topics to listen to right now; I am keeping all the topics in an array in a properties file and trying to read them all at once. This way it is dynamic: you have one code block (like the sketch below) and you can add or delete topics from the config…
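Following Gerard's advice, a sketch of that dynamic setup, with one connection per cluster and the topic lists driven by config (the config shape and names here are assumptions, not the poster's actual file):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder.appName("multi-cluster").getOrCreate()

// Assumed config shape: (bootstrap servers, topics) per Kafka cluster,
// e.g. loaded from a properties file.
val clusters: Seq[(String, Seq[String])] = Seq(
  ("cluster1-broker:9092", Seq("t1", "t2", "t3")),
  ("cluster2-broker:9092", Seq("t4", "t5"))
)

// One readStream (i.e. one connection) per cluster; many topics per
// connection is fine, but topics from different clusters cannot share one.
val streams: Seq[DataFrame] = clusters.map { case (bootstrap, topics) =>
  spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", bootstrap)
    .option("subscribe", topics.mkString(","))
    .load()
}

// Union into a single DataFrame so downstream logic stays in one query.
val unified = streams.reduce(_ union _)
```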

[spark-structured-streaming] [kafka] consume topics from multiple Kafka clusters

2020-06-09 Thread Srinivas V
Hello, in Structured Streaming, is it possible to have one Spark application with one query consume topics from multiple Kafka clusters? I am trying to consume two topics, each from a different Kafka cluster, but one of the topics is reported as an unknown topic and the job keeps running without…

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Srinivas V
…Something Something < mailinglist...@gmail.com> wrote: > Right, this is exactly how I have it right now. The problem is that in cluster mode 'myAcc' does NOT get distributed. Try it out in cluster mode and you will see what I mean. I think how Zh…

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Srinivas V
…mapGroupsWithState(timeoutConf = GroupStateTimeout.ProcessingTimeTimeout)(func = mappingFunc) … val query = wordCounts.writeStream .outputMode(OutputMode.Update) …

Re: Using Spark Accumulators with Structured Streaming

2020-05-30 Thread Srinivas V
…, Iterator eventsIterator, GroupState state) { } } On Fri, May 29, 2020 at 1:08 PM Srinivas V wrote: > Yes, accumulators are updated in the call method of StateUpdateTask, like when the state times out or when the data is pushed to the next Kafka topic…

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Srinivas V
…On Fri, May 29, 2020 at 6:51 AM Srinivas V wrote: > Yes, it is an application-specific class. This is how Java Spark Functions work. You can refer to this code in the documentation: https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark…

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Srinivas V
…application-specific class. Does it have an 'updateState' method or something? I googled but couldn't find any documentation about doing it this way. Can you please direct me to some documentation? Thanks. On Thu, May 28, 2020 at 4:43 AM Srinivas V wrote: > Yes, I am us…

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread Srinivas V
…ue) Thread.sleep(5 * 1000) } //query.awaitTermination() … And the updated accumulator value can be found in the driver stdout. -- Cheers, -z On Thu, 28 May 2020 17:12:48 +0530 Srinivas V wrote: > Yes, I…

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread Srinivas V
…In other streaming applications it works as expected. On Wed, May 27, 2020 at 9:01 AM Srinivas V wrote: > Yes, I am talking about application-specific accumulators. Actually, I am getting the values printed in my driver log as well as sent to Grafa…

Re: Using Spark Accumulators with Structured Streaming

2020-05-27 Thread Srinivas V
…in your code? I am talking about the application-specific accumulators. The other standard counters, such as 'event.progress.inputRowsPerSecond', are getting populated correctly! On Mon, May 25, 2020 at 8:39 PM Srinivas V wrote: > Hello, even for me it comes…

Re: Using Spark Accumulators with Structured Streaming

2020-05-25 Thread Srinivas V
Hello, even for me it comes out as 0 when I print it in onQueryProgress. I use LongAccumulator as well. Yes, it prints locally but not on the cluster. One consolation is that when I send metrics to Grafana, the values do show up there. On Tue, May 26, 2020 at 3:10 AM Something Something <…
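For context, a sketch of the pattern this thread circles around (names are illustrative). The usual cause of seeing 0 on a cluster is reading a different accumulator instance than the one the executors update: the accumulator should be created once on the driver, captured by the processing code, and its merged value read back on the driver, e.g. in a listener:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

val spark = SparkSession.builder.appName("acc-demo").getOrCreate()

// Created once on the driver; naming it also makes it visible in the Spark UI.
val eventCount = spark.sparkContext.longAccumulator("eventCount")

// Executors call eventCount.add(1) inside the processing code (e.g. a
// mapGroupsWithState update function); the driver reads the merged value here:
spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryProgress(event: QueryProgressEvent): Unit =
    println(s"eventCount so far: ${eventCount.value}")  // driver-side read
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
})
```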

[structured streaming] [stateful] Null value appeared in non-nullable field

2020-05-23 Thread Srinivas V
Hello, I am listening to a Kafka topic through Spark Structured Streaming [2.4.5]. After processing messages for a few minutes, I get the NullPointerException below. I have three beans here: 1. Event, 2. StateInfo, 3. SessionUpdateInfo. I suspect the problem is with StateInfo, when it is…

Re: GroupState limits

2020-05-12 Thread Srinivas V
If you are talking about the total number of objects the state can hold, that depends on the executor memory you have on your cluster, apart from the rest of the memory required for processing. The state is stored in HDFS and retrieved while processing the next events. If you maintain a million objects with…
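A hedged sketch of keeping that object count bounded (the event and state shapes here are made up): give each group a timeout so idle state is evicted instead of accumulating indefinitely:

```scala
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

case class Event(id: String, ts: Long)   // illustrative input
case class Counter(count: Long)          // illustrative state

def update(id: String, events: Iterator[Event], state: GroupState[Counter]): Counter = {
  if (state.hasTimedOut) {               // no events for an hour: evict the state
    val last = state.get
    state.remove()
    last
  } else {
    val next = Counter(state.getOption.map(_.count).getOrElse(0L) + events.size)
    state.update(next)
    state.setTimeoutDuration("1 hour")   // refreshed on every batch with data
    next
  }
}

// usage: ds.groupByKey(_.id)
//          .mapGroupsWithState(GroupStateTimeout.ProcessingTimeTimeout)(update)
```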

Re: Spark structured streaming - performance tuning

2020-05-08 Thread Srinivas V
Can anyone else answer the questions below on performance tuning Structured Streaming? @Jacek? On Sun, May 3, 2020 at 12:07 AM Srinivas V wrote: > Hi Alex, I read the book; it is a good one, but I don't see the things I strongly want to understand. You are right on the partition and t…

Re: Spark structured streaming - performance tuning

2020-05-02 Thread Srinivas V
…every partition in Kafka is mapped to a Spark partition, and in Spark every partition is mapped to a task. But you can use `coalesce` to decrease the number of Spark partitions, so you'll have fewer tasks… Srinivas V at "Sat, 18 Apr 2020 10:32:33 +0530"…

Re: Spark structured streaming - performance tuning

2020-04-17 Thread Srinivas V
…each partition is a separate task that needs to be executed, so you need to plan the number of cores correspondingly. Srinivas V at "Thu, 16 Apr 2020 22:49:15 +0530" wrote: SV> Hello, SV> Can someone point me to a good video or document which talks about…

Spark structured streaming - performance tuning

2020-04-16 Thread Srinivas V
Hello, can someone point me to a good video or document that talks about performance tuning for a Structured Streaming app? I am looking especially at listening to Kafka topics, say 5 topics each with 100 partitions. I am trying to figure out the best cluster size and the number of executors and cores required.
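Since the thread stays mostly abstract, a hedged sketch of the knobs it circles around, sized for the 5-topics x 100-partitions example (all numbers and names are illustrative, not recommendations):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

// 5 topics x 100 partitions = 500 Kafka partitions -> up to 500 read tasks
// per micro-batch. Total executor cores decide how many "waves" of tasks a
// batch needs; e.g. 25 executors x 5 cores = 125 cores -> about 4 waves.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")  // placeholder
  .option("subscribe", "t1,t2,t3,t4,t5")              // placeholder topics
  .option("minPartitions", 1000)           // optionally split Kafka partitions further
  .option("maxOffsetsPerTrigger", 500000)  // cap per-batch volume for stable latency
  .load()
```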

Re: Spark Streaming not working

2020-04-10 Thread Srinivas V
Check that your broker details are correct, and verify that you have network connectivity between your client box and the Kafka broker host. On Fri, Apr 10, 2020 at 11:04 PM Debabrata Ghosh wrote: > Hi, I have a Spark Streaming application where Kafka is producing records but unfortunately…

Re: spark structured streaming GroupState returns weird values from state

2020-03-31 Thread Srinivas V
…of non-getter methods for the fields defined? Still, how is that corrupting the state object so much? On Sat, Mar 28, 2020 at 7:46 PM Srinivas V wrote: > OK, I will try to create some simple code to reproduce it if I can. The problem is that I am adding this code to an existing big p…

Re: spark structured streaming GroupState returns weird values from state

2020-03-28 Thread Srinivas V
…business logic, which you may want to redact, and provide the full source code which reproduces the bug? On Sat, Mar 28, 2020 at 8:11 PM Srinivas V wrote: > Sorry for the typos; correcting them below. On Sat, Mar 28, 2020 at 4:39 PM Srinivas V wrote:…

Re: spark structured streaming GroupState returns weird values from state

2020-03-28 Thread Srinivas V
Sorry for the typos; correcting them below. On Sat, Mar 28, 2020 at 4:39 PM Srinivas V wrote: > Sorry, I was just changing some names so as not to send the exact names; please ignore that. I have really been struggling with this for a couple of days. Can this happen due to: 1. some of the value…

Re: spark structured streaming GroupState returns weird values from state

2020-03-28 Thread Srinivas V
…t across multiple JVM runs to make it work properly, but I suspect it doesn't retain the order. On Fri, Mar 27, 2020 at 10:28 PM Srinivas V wrote: > I am listening to a Kafka topic with a Structured Streaming application in Java, te…

spark structured streaming GroupState returns weird values from state

2020-03-27 Thread Srinivas V
I am listening to a Kafka topic with a Structured Streaming application in Java, testing it on my local Mac. When I retrieve the GroupState object with state.get(), it gives random values for the fields in the object: some are interchanged, some are defaults, and some are junk values.
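For reference, a sketch of the state-class constraints the thread suspects (class shape is illustrative): the encoder binds state fields through the bean's getters/setters, so a field without a conventional getter/setter pair, or a bean changed between runs against an old checkpoint, can plausibly come back interchanged or defaulted. A Scala case class (or a strictly conventional Java bean) sidesteps that:

```scala
import org.apache.spark.sql.{Encoder, Encoders}

// Every field is a plain constructor parameter, so the product encoder can
// bind them unambiguously; no hand-written getters/setters to get wrong.
case class SessionState(sessionId: String, count: Long, firstTs: Long, lastTs: Long)

implicit val sessionStateEnc: Encoder[SessionState] = Encoders.product[SessionState]

// A Java-bean equivalent must have a no-arg constructor plus a matching
// getX/setX pair for every serialized field, and its shape must not change
// while restoring from an existing checkpoint.
```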

Re: structured streaming Kafka consumer group.id override

2020-03-19 Thread Srinivas V
…ailure or killing, the group id changes, but the starting offset will be the last one you consumed last time. Srinivas V wrote on Thu, Mar 19, 2020 at 12:36 PM: > Hello, 1. My Kafka consumer group name is randomly generated by Spark Structured Streaming. Can I override thi…

structured streaming Kafka consumer group.id override

2020-03-18 Thread Srinivas V
Hello, 1. My Kafka consumer group name is randomly generated by Spark Structured Streaming. Can I override this? 2. When testing in development, if I stop my Kafka consumer streaming job for a couple of days and then try to start it back up, the job keeps failing for missing offsets, as the…
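A hedged sketch touching both questions (broker and topic are placeholders): newer Spark versions expose 'groupIdPrefix' (and Spark 3.0+ 'kafka.group.id') to control the consumer group name, and 'failOnDataLoss=false' stops a restart from failing outright when retention has already deleted the checkpointed offsets:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("groupid-sketch").getOrCreate()

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")  // placeholder
  .option("subscribe", "events")                      // placeholder
  .option("groupIdPrefix", "my-app")      // prefix for the generated group id
  // .option("kafka.group.id", "my-app")  // Spark 3.0+: fixed id (one query per id)
  .option("failOnDataLoss", "false")      // tolerate offsets aged out by retention
  .load()
```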

structured streaming with mapGroupsWithState

2020-03-11 Thread Srinivas V
Is anyone using this combination in prod? I am planning to use it for a use case with 15,000 events per second from a few Kafka topics. Though the events are big, I would just take the businessIds, frequency, and first and last event timestamps and save these into mapGroupsWithState. I need to keep them for…
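A sketch of the small per-businessId summary described above (the event and state classes are assumptions): only the id, count, and first/last timestamps live in state, never the big events themselves:

```scala
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

case class Event(businessId: String, ts: Long)   // assumed input shape
case class BizStats(businessId: String, freq: Long, firstTs: Long, lastTs: Long)

def track(id: String, events: Iterator[Event], st: GroupState[BizStats]): BizStats = {
  val ts = events.map(_.ts).toSeq                // only timestamps are retained
  val prev = st.getOption.getOrElse(BizStats(id, 0L, ts.min, ts.max))
  val next = BizStats(id, prev.freq + ts.size,
    math.min(prev.firstTs, ts.min), math.max(prev.lastTs, ts.max))
  st.update(next)                                // state stays a few fields per key
  next
}

// usage (ds: Dataset[Event]):
//   ds.groupByKey(_.businessId)
//     .mapGroupsWithState(GroupStateTimeout.NoTimeout)(track)
```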

[ spark-streaming ] - Data Locality issue

2020-02-04 Thread Karthik Srinivas
Hi, I am using Spark 2.3.2 and I am facing issues due to data locality: even after setting spark.locality.wait.rack=200, the locality level is always RACK_LOCAL. Can someone help me with this? Thank you

Data locality

2020-02-04 Thread Karthik Srinivas
Hi all, I am using Spark 2.3.2 and I am facing issues due to data locality: even after setting spark.locality.wait.rack=200, the locality level is always RACK_LOCAL. Can someone help me with this? Thank you

Please unsubscribe me

2016-12-05 Thread Srinivas Potluri

Re: retrieve cell value from a rowMatrix.

2016-01-21 Thread Srivathsan Srinivas
…//www.baidu.com/link?url=NXDpnPwRtM663-SOtf7UI7Vn88RsjDBih1D0weJSyZKL55PYEfeX6xc_dQUB3bpcAxBmEF9qhmXxZZFRGk0N43KqVWwLLAjgC6_W43ex9Rm> ? -- Original message -- From: "Srivathsan Srinivas" <srivathsan.srini...@gmail.com> Sent: January 21, 2016…

retrieve cell value from a rowMatrix.

2016-01-20 Thread Srivathsan Srinivas
Hi, is there a way to retrieve a cell value of a RowMatrix, like m(i,j)? The docs say that the indices are long. Maybe I am doing something wrong... but there doesn't seem to be any such direct method. Any suggestions? -- Thanks, Srini.
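There is indeed no direct m(i,j) on RowMatrix; its rows are an unindexed RDD. Two workaround sketches (both assume the RDD's row order is the intended matrix order, which RowMatrix does not itself guarantee):

```scala
import org.apache.spark.mllib.linalg.distributed.{IndexedRowMatrix, RowMatrix}

// Workaround 1: derive an index on the fly and filter for the wanted row.
def cell(m: RowMatrix, i: Long, j: Int): Double =
  m.rows.zipWithIndex()
    .filter { case (_, idx) => idx == i }
    .map { case (row, _) => row(j) }
    .first()

// Workaround 2: carry explicit indices so lookups don't depend on RDD order.
def cellIndexed(m: IndexedRowMatrix, i: Long, j: Int): Double =
  m.rows.filter(_.index == i).map(_.vector(j)).first()
```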

Re: Spray client reports Exception: akka.actor.ActorSystem.dispatcher()Lscala/concurrent/ExecutionContext

2014-11-10 Thread Srinivas Chamarthi
I am trying to use Spark with Spray, and I have a dependency problem with quasiquotes. The issue comes up only when I include the Spark dependencies, and I am not sure how this one can be excluded. Jianshi: can you let me know what versions of Spray + Akka + Spark you are using? [error]…

Spray with Spark-sql build fails with Incompatible dependencies

2014-11-10 Thread Srinivas Chamarthi
…-version suffixes in {file:/… [error] org.scalamacros:quasiquotes _2.10, _2.10.3 [trace] Stack trace suppressed: run last *:update for the full output. [error] (*:update) Conflicting cross-version suffixes in: org.scalamacros:quasiquotes thx srinivas

Re: Unresolved Attributes

2014-11-09 Thread Srinivas Chamarthi
…work. Not sure if there's an issue already filed and fixed in master; I will raise one if someone else also confirms it. thx srinivas On Sat, Nov 8, 2014 at 3:26 PM, Srinivas Chamarthi <srinivas.chamar...@gmail.com> wrote: I have an exception when I am trying to run a simple where-clause…

Unresolved Attributes

2014-11-08 Thread Srinivas Chamarthi
I have an exception when I am trying to run a simple where-clause query. I can see that the name attribute is present in the schema, but somehow it still throws the exception. query = "select name from business where business_id=" + business_id. What am I doing wrong? thx srinivas Exception in thread…

contains in array in Spark SQL

2014-11-08 Thread Srinivas Chamarthi
hi, what would be the syntax to check for a value in an array-typed attribute in my where clause? select * from business where categories contains 'X' // something like this; is this the right syntax? attribute: categories, type: Array thx srinivas
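The built-in predicate for this is array_contains rather than a "contains" keyword; a sketch using the table and column names from the question (in 2014-era Spark this function came via the Hive dialect, so it assumes a HiveContext-backed sqlContext; later versions have it natively):

```scala
// array_contains(column, value) tests membership in an ARRAY column:
val hits = sqlContext.sql(
  "SELECT * FROM business WHERE array_contains(categories, 'X')")
hits.collect().foreach(println)
```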

RE: Data from Mysql using JdbcRDD

2014-08-01 Thread srinivas
Hi, thanks all. I have a few more questions on this: suppose I don't want to pass a where clause in my SQL, is there a way I can do this? Right now I am trying to modify the JdbcRDD class by removing all the parameters for the lower and upper bounds, but I am getting runtime exceptions. Is…
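A common workaround instead of modifying JdbcRDD (connection details below are placeholders): keep the two required "?" placeholders but make the predicate trivially true, and use a single partition, so no real where-clause filtering happens:

```scala
import java.sql.DriverManager
import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD

def loadWholeTable(sc: SparkContext): JdbcRDD[Array[Object]] =
  new JdbcRDD(
    sc,
    () => DriverManager.getConnection("jdbc:mysql://host/db", "user", "pass"), // placeholder
    // The two "?" are mandatory; binding both to 1 makes the predicate a no-op:
    "SELECT * FROM mytable WHERE ? <= 1 AND 1 <= ?",
    lowerBound = 1, upperBound = 1, numPartitions = 1,
    mapRow = JdbcRDD.resultSetToObjectArray)
```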

Does spark streaming fit to our application

2014-07-21 Thread srinivas
Hi, our application is required to do some aggregations on data that will arrive as a stream over two months. I would like to know if Spark Streaming is suitable for this requirement. After going through some documentation and videos, I think we can do aggregations on data based on…

Re: Spark Streaming Json file groupby function

2014-07-18 Thread srinivas
…) }) //results.print() ssc.start() ssc.awaitTermination() } Thanks, -Srinivas

Re: Spark Streaming Json file groupby function

2014-07-17 Thread srinivas
…: java.lang.String cannot be cast to java.lang.Integer at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) Basically I am trying to do a max operation in my Spark SQL. Please let me know if there is any workaround for this. Thanks, -Srinivas.

Re: Spark Streaming Json file groupby function

2014-07-17 Thread srinivas
Hi TD, it worked... Thank you so much for all your help. Thanks, -Srinivas.

Re: Spark Streaming Json file groupby function

2014-07-16 Thread srinivas
…,name:srinivas,score:10,school:lfs} I think something is wrong with the input RDD. Please let me know what's causing this error. Thanks, -Srinivas.

Re: Spark Streaming Json file groupby function

2014-07-15 Thread srinivas
…of org.apache.spark.rdd.RDD[Record] [error] recrdd.registerAsTable(table1) [error] ^ [error] one error found [error] (compile:compile) Compilation failed. Please look into this and let me know if I am missing anything. Thanks, -Srinivas.

Re: Spark Streaming Json file groupby function

2014-07-15 Thread srinivas
…/main/scala/jsonfile.scala:38: No TypeTag available for Record [error] recrdd.registerAsTable(table1) [error] ^ [error] one error found [error] (compile:compile) Compilation failed [error] Total time: 17 s, completed Jul 16, 2014 3:11:53 AM. Please advise me on how to proceed. Thanks, -Srinivas

Spark Streaming Json file groupby function

2014-07-14 Thread srinivas
hi, I am new to Spark and Scala, and I am trying to do some aggregations on a JSON file stream using Spark Streaming. I am able to parse the JSON string, and it is converted to map(id -> 123, name -> srini, mobile -> 12324214, score -> 123, test_type -> math). Now I want to use a GROUP BY function on each…

Re: Spark Streaming Json file groupby function

2014-07-14 Thread srinivas
Hi, thanks for your reply... I imported StreamingContext, and right now I am getting my DStream as something like map(id -> 123, name -> srini, mobile -> 12324214, score -> 123, test_type -> math) map(id -> 321, name -> vasu, mobile -> 73942090, score -> 324, test_type -> sci) map(id -> 432, name ->,…

Re: Spark Streaming Json file groupby function

2014-07-14 Thread srinivas
Hi TD, thanks for your help... I am able to convert the map to records using a case class. I am left with doing some aggregations; I am trying some SQL-type operations on my record set. My code looks like: case class Record(ID: Int, name: String, score: Int, school: String) //val records = jsonf.map(m =…
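Piecing this thread together, a sketch of the working end state (Spark 1.0-era API, as in the thread; the aggregation query is illustrative). The earlier "No TypeTag available for Record" error is the classic symptom of defining the case class inside a method, so Record goes at the top level:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext

// Top-level definition: nesting this inside a method causes
// "No TypeTag available for Record" when creating the SchemaRDD.
case class Record(ID: Int, name: String, score: Int, school: String)

def aggregate(sqlContext: SQLContext, records: RDD[Record]): Unit = {
  import sqlContext.createSchemaRDD   // Spark 1.0-era implicit: RDD -> SchemaRDD
  records.registerAsTable("table1")
  // score must be parsed as Int upstream, or MAX throws the
  // "String cannot be cast to Integer" error seen earlier in the thread.
  sqlContext.sql("SELECT school, MAX(score) FROM table1 GROUP BY school")
    .collect().foreach(println)
}
```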