Unsubscribe

2023-12-29 Thread Vinti Maheshwari

Re: Spark and KafkaUtils

2016-03-15 Thread Vinti Maheshwari
("meta-inf/services/") => MergeStrategy.filterDistinctLines case "reference.conf"=> MergeStrategy.concat case _ => MergeStrategy.first } Thanks & Regards, Vinti On Wed, Feb

Re: [MARKETING] Spark Streaming stateful transformation mapWithState function getting error scala.MatchError: [Ljava.lang.Object]

2016-03-14 Thread Vinti Maheshwari
t state and what is new data. > > > > Assuming it works like the Java API, to use this function to maintain > State you must call State.update, while you can return anything, not just > the state. > > > > Cheers > > Iain > > > > *From:* Vinti Maheshw

Spark Streaming stateful transformation mapWithState function getting error scala.MatchError: [Ljava.lang.Object]

2016-03-12 Thread Vinti Maheshwari
Hi All, I wanted to replace my updateStateByKey function with the mapWithState function (Spark 1.6) to improve the performance of my program. I was following these two documents: https://databricks.com/blog/2016/02/01/faster-stateful-stream-processing-in-spark-streaming.html
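
For readers following the same migration, a minimal sketch of the mapWithState pattern discussed in this thread (Spark 1.6 Streaming API); pairDstream, the key/value types, and the summing logic are illustrative assumptions, not the original job:

    import org.apache.spark.streaming.{State, StateSpec}

    // Mapping function: receives the key, the new value (if any), and the per-key State.
    def trackState(key: String, value: Option[Int], state: State[Int]): (String, Int) = {
      val sum = value.getOrElse(0) + state.getOption.getOrElse(0)
      state.update(sum)   // State.update must be called to persist the new state
      (key, sum)          // the return value can be anything, not just the state itself
    }

    // pairDstream is assumed to be a DStream[(String, Int)]
    val stateDstream = pairDstream.mapWithState(StateSpec.function(trackState _))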

Re: Spark + Kafka all messages being used in 1 batch

2016-03-06 Thread Vinti Maheshwari
I have 2 machines in my cluster with the below specifications: 128 GB RAM and 8 cores per machine. Regards, ~Vinti On Sun, Mar 6, 2016 at 7:54 AM, Vinti Maheshwari <vinti.u...@gmail.com> wrote: > Thanks Supreeth and Shahbaz. I will try adding > spark.streaming.kafka.maxRatePerParti

Re: Spark + Kafka all messages being used in 1 batch

2016-03-06 Thread Vinti Maheshwari
tting spark.streaming.kafka.maxRatePerPartition, this can help >> control the number of messages read from Kafka per partition on the spark >> streaming consumer. >> >> -S >> >> >> On Mar 5, 2016, at 10:02 PM, Vinti Maheshwari <vinti.u...@gmail.com>
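
A hedged sketch of the rate-limiting suggestion above; the rate value and batch interval are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("KafkaRateLimited")
      // cap on messages read per Kafka partition per second by the direct stream
      .set("spark.streaming.kafka.maxRatePerPartition", "1000")
      // optionally let Spark tune the ingestion rate after the first batch
      .set("spark.streaming.backpressure.enabled", "true")

    val ssc = new StreamingContext(conf, Seconds(10))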

Spark + Kafka all messages being used in 1 batch

2016-03-05 Thread Vinti Maheshwari
Hello, I am trying to figure out why my Kafka + Spark job is running slow. I found that Spark is consuming all the messages from Kafka in a single batch and not sending any messages to the other batches. 2016/03/05 21:57:05

Re: spark streaming

2016-03-02 Thread Vinti Maheshwari
shixi...@databricks.com > wrote: > Hey, > > KafkaUtils.createDirectStream doesn't need a StorageLevel as it doesn't > store blocks to BlockManager. However, the error is not related > to StorageLevel. It may be a bug. Could you provide more info about it? > E.g., Spark version,

Re: Spark streaming: StorageLevel.MEMORY_AND_DISK_SER setting for KafkaUtils.createDirectStream

2016-03-02 Thread Vinti Maheshwari
Increasing SPARK_EXECUTOR_INSTANCES to 4 worked. SPARK_EXECUTOR_INSTANCES="4" #Number of workers to start (Default: 2) Regards, Vinti On Wed, Mar 2, 2016 at 4:28 AM, Vinti Maheshwari <vinti.u...@gmail.com> wrote: > Thanks much Saisai. Got it. > So i think increasing
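
The same executor count can also be set programmatically or on the spark-submit command line (--num-executors 4 on YARN); a sketch, with the value 4 carried over from the message above:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("StreamingJob")
      .set("spark.executor.instances", "4")   // number of executors to request (YARN)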

Re: Spark streaming: StorageLevel.MEMORY_AND_DISK_SER setting for KafkaUtils.createDirectStream

2016-03-02 Thread Vinti Maheshwari
ire to store the input data ahead of time. Only receiver-based > approach could specify the storage level. > > Thanks > Saisai > > On Wed, Mar 2, 2016 at 7:08 PM, Vinti Maheshwari <vinti.u...@gmail.com> > wrote: > >> Hi All, >> >> I wanted to set *Storag
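
A sketch contrasting the two approaches described in this thread, assuming an existing StreamingContext ssc; the topic, broker, and ZooKeeper addresses are placeholders:

    import kafka.serializer.StringDecoder
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Direct approach: no StorageLevel parameter -- the stream does not store blocks in the BlockManager.
    val direct = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, Map("metadata.broker.list" -> "broker1:9092"), Set("my_topic"))

    // Receiver-based approach: the storage level is passed as the last argument.
    val receiverBased = KafkaUtils.createStream(
      ssc, "zkhost:2181", "my_consumer_group", Map("my_topic" -> 1),
      StorageLevel.MEMORY_AND_DISK_SER)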

Spark streaming: StorageLevel.MEMORY_AND_DISK_SER setting for KafkaUtils.createDirectStream

2016-03-02 Thread Vinti Maheshwari
Hi All, I wanted to set *StorageLevel.MEMORY_AND_DISK_SER* in my spark-streaming program, as currently I am getting *MetadataFetchFailedException*. I am not sure where I should pass StorageLevel.MEMORY_AND_DISK, as it seems like createDirectStream doesn't allow passing that parameter. val

spark streaming

2016-03-02 Thread Vinti Maheshwari
Hi All, I wanted to set *StorageLevel.MEMORY_AND_DISK_SER* in my spark-streaming program, as currently I am getting *MetadataFetchFailedException*. I am not sure where I should pass StorageLevel.MEMORY_AND_DISK, as it seems like createDirectStream doesn't allow passing that parameter. val

Re: perl Kafka::Producer, “Kafka::Exception::Producer”, “code”, -1000, “message”, "Invalid argument

2016-02-29 Thread Vinti Maheshwari
may have better luck > on a perl or kafka related list. > > On Mon, Feb 29, 2016 at 3:26 PM, Vinti Maheshwari <vinti.u...@gmail.com> > wrote: > >> Hi All, >> >> I wrote kafka producer using kafka perl api, But i am getting error when >> i am passing

Re: Spark streaming not remembering previous state

2016-02-27 Thread Vinti Maheshwari
Thanks much Amit, Sebastian. It worked. Regards, ~Vinti On Sat, Feb 27, 2016 at 12:44 PM, Amit Assudani <aassud...@impetus.com> wrote: > Your context is not being created using checkpoints, use get or create, > > From: Vinti Maheshwari <vinti.u...@gmail.com> > Date: Sat
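
A minimal sketch of the get-or-create pattern referred to above; the checkpoint directory and batch interval are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("StatefulJob")
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint("hdfs:///tmp/streaming-checkpoint")   // state and metadata checkpoints go here
      // ... define DStreams and the stateful transformations here ...
      ssc
    }

    // Recover the context (and its state) from the checkpoint if one exists,
    // otherwise build a fresh one with createContext.
    val ssc = StreamingContext.getOrCreate("hdfs:///tmp/streaming-checkpoint", createContext _)
    ssc.start()
    ssc.awaitTermination()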

Re: Spark and KafkaUtils

2016-02-24 Thread Vinti Maheshwari
filter {x => x.data.getName.matches("sbt.*") || x.data.getName.matches(".*macros.*")}} Thanks, ~Vinti On Wed, Feb 24, 2016 at 12:55 PM, Vinti Maheshwari <vinti.u...@gmail.com> wrote: > Thanks much Cody, I added assembly.sbt and modified build.sbt with ivy bug > related con

Re: Spark and KafkaUtils

2016-02-24 Thread Vinti Maheshwari
build a jar with > all the dependencies. > > On Wed, Feb 24, 2016 at 1:50 PM, Vinti Maheshwari <vinti.u...@gmail.com> > wrote: > >> I am not using sbt assembly currently. I need to check how to use sbt >> assembly. >> >> Regards, >> ~Vinti >> &g

Re: Spark and KafkaUtils

2016-02-24 Thread Vinti Maheshwari
le jar along with your code. Otherwise > you'd have to specify each separate jar in your spark-submit line, which is > a pain. > > On Wed, Feb 24, 2016 at 12:49 PM, Vinti Maheshwari <vinti.u...@gmail.com> > wrote: > >> Hi Cody, >> >> I tried with the build file

Re: Spark and KafkaUtils

2016-02-24 Thread Vinti Maheshwari
r/kafka-exactly-once/blob/master/build.sbt > > includes some hacks for ivy issues that may no longer be strictly > necessary, but try that build and see if it works for you. > > > On Wed, Feb 24, 2016 at 11:14 AM, Vinti Maheshwari <vinti.u...@gmail.com> > wrote: >

Spark and KafkaUtils

2016-02-24 Thread Vinti Maheshwari
Hello, I have tried multiple different settings in build.sbt but it seems like nothing is working. Can anyone suggest the right syntax/way to include Kafka with Spark? Error Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$ build.sbt
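
A sketch of the dependency block this thread converges on; the versions (Spark 1.6.0) are illustrative, and marking the Spark core/streaming artifacts as "provided" assumes the assembly jar is run with spark-submit:

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"            % "1.6.0" % "provided",
      "org.apache.spark" %% "spark-streaming"       % "1.6.0" % "provided",
      // the Kafka integration is not bundled with Spark, so it must end up in the assembly jar
      "org.apache.spark" %% "spark-streaming-kafka" % "1.6.0"
    )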

Network Spark Streaming from multiple remote hosts

2016-02-23 Thread Vinti Maheshwari
Hi All, I wrote a program for Spark Streaming in Scala. In my program, I passed 'remote-host' and 'remote-port' to socketTextStream. And on the remote machine, I have one Perl script that calls the system command: echo 'data_str' | nc <> In that way, my Spark program is able to get data,
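
A minimal sketch of the setup described above, assuming the remote host and port arrive as command-line arguments; the names and batch interval are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object NetworkStream {
      def main(args: Array[String]): Unit = {
        val Array(remoteHost, remotePort) = args          // e.g. "10.0.0.5" "9999"
        val conf = new SparkConf().setAppName("NetworkStream")
        val ssc  = new StreamingContext(conf, Seconds(5))

        // Lines arrive as plain text from the remote machine, e.g. piped in with
        // echo 'data_str' | nc <host> <port>
        val lines = ssc.socketTextStream(remoteHost, remotePort.toInt)
        lines.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }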

Re: Spark Streaming not reading input coming from the other ip

2016-02-22 Thread Vinti Maheshwari
e "telnet" to > test if the network is normal? > > On Mon, Feb 22, 2016 at 6:59 AM, Vinti Maheshwari <vinti.u...@gmail.com> > wrote: > >> For reference, my program: >> >> def main(args: Array[String]): Unit = { >> val conf = new SparkConf()

Re: Spark Streaming not reading input coming from the other ip

2016-02-22 Thread Vinti Maheshwari
dl.So.gcda,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 On Mon, Feb 22, 2016 at 6:38 AM, Vinti Maheshwari <vinti.u...@gmail.com> wrote: > Hi > > I am in spark Streaming context, and i am reading input from the the > socket using nc

Spark Streaming not reading input coming from the other ip

2016-02-22 Thread Vinti Maheshwari
Hi, I am in a Spark Streaming context, and I am reading input from the socket using nc -lk . When I run it and manually give input, it works. But if input comes from a different IP to this socket, then Spark is not reading that input, though it's showing all the input coming

Re: Stream group by

2016-02-21 Thread Vinti Maheshwari
, record_4, record._5)) >>> .groupByKey((r1, r2) => (r1._1 + r2._1, r1._2 + r2._2, r1._3 + >>> r2._3)) >>> .foreachRDD(rdd => { >>>rdd.collect().foreach((fileName, valueTuple) => >> global map here>) >>> })

Re: Stream group by

2016-02-21 Thread Vinti Maheshwari
dd.collect().foreach((fileName, valueTuple) => > map here>) >> }) >> >> -- >> Thanks >> Jatin Kumar | Rocket Scientist >> +91-7696741743 m >> >> On Sun, Feb 21, 2016 at 11:30 PM, Vinti Maheshwari <vinti.u...@gmail.com> >>

Re: Stream group by

2016-02-21 Thread Vinti Maheshwari
Nevermind, seems like an executor level mutable map is not recommended as stated in http://blog.guillaume-pitel.fr/2015/06/spark-trick-shot-node-centered-aggregator/ On Sun, Feb 21, 2016 at 9:54 AM, Vinti Maheshwari <vinti.u...@gmail.com> wrote: > Thanks for your reply Jatin. I c

Re: Stream group by

2016-02-21 Thread Vinti Maheshwari
1, r2) => (r1._1 + r2._1, r1._2 + r2._2, r1._3 + > r2._3)) > > I hope that helps. > > -- > Thanks > Jatin Kumar | Rocket Scientist > +91-7696741743 m > > On Sun, Feb 21, 2016 at 10:35 PM, Vinti Maheshwari <vinti.u...@gmail.com> > wrote: > >> Hello, >

Stream group by

2016-02-21 Thread Vinti Maheshwari
Hello, I have input lines like below *Input* t1, file1, 1, 1, 1 t1, file1, 1, 2, 3 t1, file2, 2, 2, 2, 2 t2, file1, 5, 5, 5 t2, file2, 1, 1, 2, 2 and I want to achieve output like the rows below, which is a vertical addition of the corresponding numbers. *Output* “file1” : [ 1+1+5, 1+2+5, 1+3+5
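
A hedged sketch of the per-file vertical addition asked about here, using reduceByKey with element-wise sums; lines is assumed to be a DStream[String], and the parsing is an assumption about the input format:

    val sums = lines
      .map { line =>
        val fields = line.split(",").map(_.trim)
        val file   = fields(1)                               // e.g. "file1"
        val counts = fields.drop(2).map(_.toInt).toVector    // the numeric columns
        (file, counts)
      }
      .reduceByKey { (a, b) =>
        // element-wise addition; assumes rows for a given file have the same length
        a.zip(b).map { case (x, y) => x + y }
      }

    sums.print()   // e.g. (file1,Vector(7, 8, 9)) for the sample input above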

Re: Need help in spark-Scala program

2016-02-01 Thread Vinti Maheshwari
Hi, Sorry, please ignore my message; it was sent by mistake. I am still drafting. Regards, Vinti On Mon, Feb 1, 2016 at 2:25 PM, Vinti Maheshwari <vinti.u...@gmail.com> wrote: > Hi All, > > I recently started learning Spark. I need to use spark-streaming. > > 1) Inp

Spark Streaming application designing question

2016-02-01 Thread Vinti Maheshwari
Hi, I am new to Spark. I wanted to set up Spark Streaming to retrieve key-value pairs from files of the below format: file: info1 Note: Each info file will have around 1,000 of these records, and our system is continuously generating info files, so through Spark Streaming I wanted to aggregate the results.
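
One possible shape for this design, sketched under assumptions: the info files land in a monitored directory, and each record is a "key: value" line; the paths, types, and parsing are placeholders, not the actual system:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("InfoFileAggregator")
    val ssc  = new StreamingContext(conf, Seconds(30))
    ssc.checkpoint("hdfs:///tmp/infofile-checkpoint")

    // Pick up new info files as the system drops them into the directory.
    val records = ssc.textFileStream("hdfs:///data/info-files")
      .map { line =>
        val Array(key, value) = line.split(":", 2).map(_.trim)   // assumed "key: value" layout
        (key, value.toLong)
      }

    // Keep a running total per key across batches.
    val totals = records.updateStateByKey[Long] { (values: Seq[Long], state: Option[Long]) =>
      Some(values.sum + state.getOrElse(0L))
    }
    totals.print()

    ssc.start()
    ssc.awaitTermination()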

Need help in spark-Scala program

2016-02-01 Thread Vinti Maheshwari
Hi All, I recently started learning Spark. I need to use spark-streaming. 1) Input: need to read from MongoDB db.event_gcovs.find({executions:"56791a746e928d7b176d03c0", valid:1, infofile:{$exists:1}, geo:"sunnyvale"}, {infofile:1}).count() > Number of Info files: 24441 /* 0 */ { "_id" :