Re: How to clean up logs-dirs and local-dirs of running spark streaming in yarn cluster mode

2018-12-26 Thread shyla deshpande
90%, is there anything else I can do? Thanks -Shyla On Tue, Dec 25, 2018 at 7:26 PM Fawze Abujaber wrote: > http://shzhangji.com/blog/2015/05/31/spark-streaming-logging-configuration/ > > On Wed, Dec 26, 2018 at 1:05 AM shyla deshpande > wrote: > >> Please point me

Re: How to clean up logs-dirs and local-dirs of running spark streaming in yarn cluster mode

2018-12-25 Thread shyla deshpande
Please point me to any documentation if available. Thanks On Tue, Dec 18, 2018 at 11:10 AM shyla deshpande wrote: > Is there a way to do this without stopping the streaming application in > yarn cluster mode? > > On Mon, Dec 17, 2018 at 4:42 PM shyla deshpande > wrote: >

Re: How to clean up logs-dirs and local-dirs of running spark streaming in yarn cluster mode

2018-12-18 Thread shyla deshpande
Is there a way to do this without stopping the streaming application in yarn cluster mode? On Mon, Dec 17, 2018 at 4:42 PM shyla deshpande wrote: > I get the ERROR > 1/1 local-dirs are bad: /mnt/yarn; 1/1 log-dirs are bad: > /var/log/hadoop-yarn/containers > > Is there a

How to clean up logs-dirs and local-dirs of running spark streaming in yarn cluster mode

2018-12-17 Thread shyla deshpande
I get the ERROR 1/1 local-dirs are bad: /mnt/yarn; 1/1 log-dirs are bad: /var/log/hadoop-yarn/containers Is there a way to clean up these directories while the spark streaming application is running? Thanks

spark job error

2018-01-30 Thread shyla deshpande
I am running Zeppelin on EMR with the default settings. I am getting the following error; restarting the Zeppelin application fixes the problem. What default settings do I need to override to fix this error? org.apache.spark.SparkException: Job aborted due to stage failure: Task 71

How do I save the dataframe data as a pdf file?

2017-12-12 Thread shyla deshpande
Hello all, Is there a way to write the dataframe data as a pdf file? Thanks -Shyla

Re: Any libraries to do Complex Event Processing with spark streaming?

2017-10-03 Thread shyla deshpande
On Tue, Oct 3, 2017 at 10:50 AM, shyla deshpande <deshpandesh...@gmail.com> wrote: > Hi all, > I have a data pipeline using Spark streaming, Kafka and Cassandra. > Are there any libraries to help me with complex event processing using > Spark Streaming? > > I appreciate your help. > > Thanks >

Any libraries to do Complex Event Processing with spark streaming?

2017-10-03 Thread shyla deshpande
Hi all, I have a data pipeline using Spark streaming, Kafka and Cassandra. Are there any libraries to help me with complex event processing using Spark Streaming? I appreciate your help. Thanks

What is the right way to stop a streaming application?

2017-08-22 Thread shyla deshpande
Hi all, I am running a spark streaming application on an AWS EC2 cluster in standalone mode. I am using DStreams and Spark 2.0.2. I have set stopGracefullyOnShutdown to true. What is the right way to stop the streaming application? Thanks
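
Beyond stopGracefullyOnShutdown, a common pattern is to poll for an external marker file and call ssc.stop with stopGracefully = true, so in-flight batches finish before the process exits. A minimal sketch (the marker path is hypothetical):

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.streaming.StreamingContext

    // check every 10s whether the marker file exists; if so, stop gracefully
    def awaitShutdown(ssc: StreamingContext): Unit = {
      var stopped = false
      while (!stopped) {
        stopped = ssc.awaitTerminationOrTimeout(10000L)
        val fs = FileSystem.get(ssc.sparkContext.hadoopConfiguration)
        if (!stopped && fs.exists(new Path("/tmp/stop-streaming-app"))) {
          ssc.stop(stopSparkContext = true, stopGracefully = true)
          stopped = true
        }
      }
    }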

Re: How do I pass multiple cassandra hosts in spark submit?

2017-08-10 Thread shyla deshpande
Got the answer from https://groups.google.com/a/lists.datastax.com/forum/#!topic/spark-connector-user/ETCZdCcaKq8 On Thu, Aug 10, 2017 at 11:59 AM, shyla deshpande <deshpandesh...@gmail.com> wrote: > I have a 3 node cassandra cluster. I want to pass all the 3 nodes in spark > subm

How do I pass multiple cassandra hosts in spark submit?

2017-08-10 Thread shyla deshpande
I have a 3 node cassandra cluster. I want to pass all the 3 nodes in spark submit. How do I do that? Any code samples will help. Thanks
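
For the spark-cassandra-connector, all contact points can go into spark.cassandra.connection.host as a comma-separated list; the connector discovers the rest of the ring from any of them. A minimal sketch (addresses are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder
      .config("spark.cassandra.connection.host", "10.0.0.1,10.0.0.2,10.0.0.3")
      .getOrCreate()

The same value can be passed on the command line via --conf spark.cassandra.connection.host=10.0.0.1,10.0.0.2,10.0.0.3.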

Re: KafkaUtils.createRDD, How do I read all the data from kafka in a batch program for a given topic?

2017-08-10 Thread shyla deshpande
Thanks Cody. On Wed, Aug 9, 2017 at 8:46 AM, Cody Koeninger <c...@koeninger.org> wrote: > org.apache.spark.streaming.kafka.KafkaCluster has methods > getLatestLeaderOffsets and getEarliestLeaderOffsets > > On Mon, Aug 7, 2017 at 11:37 PM, shyla deshpande > <deshpan

Re: KafkaUtils.createRDD, How do I read all the data from kafka in a batch program for a given topic?

2017-08-07 Thread shyla deshpande
Thanks TD. On Mon, Aug 7, 2017 at 8:59 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote: > I dont think there is any easier way. > > On Mon, Aug 7, 2017 at 7:32 PM, shyla deshpande <deshpandesh...@gmail.com> > wrote: > >> Thanks TD for the response. I forg

Re: KafkaUtils.createRDD, How do I read all the data from kafka in a batch program for a given topic?

2017-08-07 Thread shyla deshpande
ng-in-apache-spark-2-2.html > > On Mon, Aug 7, 2017 at 6:03 PM, shyla deshpande <deshpandesh...@gmail.com> > wrote: > >> Hi all, >> >> What is the easiest way to read all the data from kafka in a batch >> program for a given topic? >> I have 10 kafka parti

KafkaUtils.createRDD, How do I read all the data from kafka in a batch program for a given topic?

2017-08-07 Thread shyla deshpande
Hi all, What is the easiest way to read all the data from kafka in a batch program for a given topic? I have 10 kafka partitions, but the data is not much. I would like to read from the earliest offset for all the partitions of a topic. I appreciate any help. Thanks
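
A sketch of one way to do this with spark-streaming-kafka-0-10, assuming a Kafka 0.10.1+ client (which provides beginningOffsets/endOffsets) and an existing SparkContext sc; the topic and broker names are placeholders:

    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.common.TopicPartition
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.{KafkaUtils, LocationStrategies, OffsetRange}

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "batch-reader")

    // ask Kafka for the earliest and latest offset of every partition of the topic
    val consumer = new KafkaConsumer[String, String](kafkaParams.asJava)
    val parts = consumer.partitionsFor("mytopic").asScala
      .map(p => new TopicPartition("mytopic", p.partition))
    val earliest = consumer.beginningOffsets(parts.asJava).asScala
    val latest = consumer.endOffsets(parts.asJava).asScala
    consumer.close()

    // one OffsetRange per partition, spanning everything currently in the topic
    val ranges = parts.map(tp => OffsetRange(tp.topic, tp.partition, earliest(tp), latest(tp))).toArray
    val rdd = KafkaUtils.createRDD[String, String](
      sc, kafkaParams.asJava, ranges, LocationStrategies.PreferConsistent)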

Re: kafka setting, enable.auto.commit to false is being overridden and I lose data. please help!

2017-08-07 Thread shyla deshpande
I am doing that already for all known messy data. Thanks Cody for all your time and input On Mon, Aug 7, 2017 at 11:58 AM, Cody Koeninger <c...@koeninger.org> wrote: > Yes > > On Mon, Aug 7, 2017 at 12:32 PM, shyla deshpande > <deshpandesh...@gmail.com> wrot

Re: kafka setting, enable.auto.commit to false is being overridden and I lose data. please help!

2017-08-07 Thread shyla deshpande
l, since map is not an action. > > Filtering for unsuccessful attempts and collecting those back to the > driver would be one way for the driver to know whether it was safe to > commit. > > On Mon, Aug 7, 2017 at 12:31 AM, shyla deshpande > <deshpandesh...@gmail.com> wrote: >

Re: kafka setting, enable.auto.commit to false is being overridden and I lose data. please help!

2017-08-06 Thread shyla deshpande
raised? If so is there a way to commit the offsets only when there are no exceptions? On Sun, Aug 6, 2017 at 10:23 PM, shyla deshpande <deshpandesh...@gmail.com> wrote: > Thanks again Cody, > > My understanding is all the code inside foreachRDD is running on the > driver excep

Re: kafka setting, enable.auto.commit to false is being overridden and I lose data. please help!

2017-08-06 Thread shyla deshpande
> > http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd > > On Sun, Aug 6, 2017 at 12:55 PM, shyla deshpande > <deshpandesh...@gmail.com> wrote: > > Thanks Cody for your response. > > > > All I want to

Re: kafka setting, enable.auto.commit to false is being overridden and I lose data. please help!

2017-08-06 Thread shyla deshpande
foreach isn't being retried and succeeding > the second time around, etc > > On Sat, Aug 5, 2017 at 5:10 PM, shyla deshpande > <deshpandesh...@gmail.com> wrote: > > Hello All, > > I am using spark 2.0.2 and spark-streaming-kafka-0-10_2.11 . > > > > I

kafka setting, enable.auto.commit to false is being overridden and I lose data. please help!

2017-08-05 Thread shyla deshpande
Hello All, I am using spark 2.0.2 and spark-streaming-kafka-0-10_2.11. I am setting enable.auto.commit to false, and want to manually commit the offsets after my output operation is successful. So when an exception is raised during the processing I do not want the offsets to be committed.
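
The pattern documented for the kafka-0-10 integration fits this: grab the offset ranges at the top of foreachRDD, run the output operation, and call commitAsync only after it succeeds. A sketch, where stream is the DStream from KafkaUtils.createDirectStream and writeToSink is a hypothetical output operation:

    import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      try {
        writeToSink(rdd) // hypothetical output operation
        // reached only when the output succeeded; failed batches are replayed on restart
        stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
      } catch {
        case e: Exception => // do not commit; log and let the batch be reprocessed
      }
    }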

Spark streaming application is failing after running for few hours

2017-07-10 Thread shyla deshpande
My Spark streaming application is failing after running for a few hours. After it fails, when I check the storage tab, I see that MapWithStateRDD is less than 100%. Is this the reason why it is failing? What does MapWithStateRDD 90% cached mean? Does this mean I lost 10%, or is the 10% spilled to

Re: Spark streaming giving me a bunch of WARNINGS, please help me understand them

2017-07-10 Thread shyla deshpande
卜丝炒饭 <1427357...@qq.com> wrote: > It seems you are using kafka 0.10. > See my comments below. > > ---Original--- > *From:* "shyla deshpande"<deshpandesh...@gmail.com> > *Date:* 2017/7/10 08:17:10 > *To:* "user"<user@spark.apache.org>; > *Subje

Spark streaming giving me a bunch of WARNINGS, please help me understand them

2017-07-09 Thread shyla deshpande
WARN Use an existing SparkContext, some configuration may not take effect. I wanted to restart the spark streaming app, so I stopped the running one and issued a new spark-submit. Why and how will it use an existing SparkContext? WARN Spark is not running in local mode, therefore the

Spark sql with Zeppelin, Task not serializable error when I try to cache the spark sql table

2017-05-31 Thread shyla deshpande
Hello all, I am using Zeppelin 0.7.1 with Spark 2.1.0. I am getting org.apache.spark.SparkException: Task not serializable error when I try to cache the spark sql table. I am using a UDF on a column of the table and want to cache the resultant table. I can execute the paragraph successfully when

Re: Structured streaming and writing output to Cassandra

2017-04-08 Thread shyla deshpande
-streaming.html > > Cheers > Jules > > Sent from my iPhone > Pardon the dumb thumb typos :) > > On Apr 7, 2017, at 11:23 AM, shyla deshpande <deshpandesh...@gmail.com> > wrote: > > Is anyone using structured streaming and writing the results to Cassandra > databa

Structured streaming and writing output to Cassandra

2017-04-07 Thread shyla deshpande
Is anyone using structured streaming and writing the results to a Cassandra database in a production environment? I do not think I have enough expertise to write a custom sink that can be used in a production environment. Please help!
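
For what it's worth, the foreach sink is the usual route until a packaged Cassandra sink exists. A minimal ForeachWriter sketch against the DataStax Java driver, where df is the streaming Dataset and the host, keyspace, table and columns are placeholders (a production writer would reuse a pooled session rather than connect per partition):

    import com.datastax.driver.core.{Cluster, Session}
    import org.apache.spark.sql.{ForeachWriter, Row}

    class CassandraWriter extends ForeachWriter[Row] {
      var cluster: Cluster = _
      var session: Session = _
      def open(partitionId: Long, version: Long): Boolean = {
        cluster = Cluster.builder().addContactPoint("cassandrahost").build()
        session = cluster.connect()
        true
      }
      def process(row: Row): Unit =
        session.execute(s"INSERT INTO ks.events (id, value) VALUES ('${row.getString(0)}', '${row.getString(1)}')")
      def close(errorOrNull: Throwable): Unit = cluster.close()
    }

    val query = df.writeStream.foreach(new CassandraWriter).start()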

What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-06 Thread shyla deshpande
I want to run a spark batch job, maybe hourly, on AWS EC2. What is the easiest way to do this? Thanks

Re: dataframe filter, unable to bind variable

2017-03-31 Thread shyla deshpande
Works. Thanks Hosur. On Thu, Mar 30, 2017 at 8:37 PM, hosur narahari <hnr1...@gmail.com> wrote: > Try lit(fromDate) and lit(toDate). You've to import > org.apache.spark.sql.functions.lit > to use it > > On 31 Mar 2017 7:45 a.m., "shyla deshpande" <

dataframe filter, unable to bind variable

2017-03-30 Thread shyla deshpande
The following works df.filter($"createdate".between("2017-03-20", "2017-03-22")) I would like to pass variables fromdate and todate to the filter instead of constants. Unable to get the syntax right. Please help. Thanks
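
The fix suggested in the reply above is to wrap the variables in lit() so they become Column literals:

    import org.apache.spark.sql.functions.lit

    val fromdate = "2017-03-20"
    val todate   = "2017-03-22"
    // assumes import spark.implicits._ for the $ column syntax
    df.filter($"createdate".between(lit(fromdate), lit(todate)))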

Re: Will the setting for spark.default.parallelism be used for spark.sql.shuffle.output.partitions?

2017-03-30 Thread shyla deshpande
The spark version I am using is spark 2.1. On Thu, Mar 30, 2017 at 9:58 AM, shyla deshpande <deshpandesh...@gmail.com> wrote: > Thanks >

Will the setting for spark.default.parallelism be used for spark.sql.shuffle.output.partitions?

2017-03-30 Thread shyla deshpande
Thanks
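
For the record: the two settings are independent in Spark 2.x. spark.default.parallelism applies to RDD shuffles, while DataFrame/SQL shuffles are controlled by spark.sql.shuffle.partitions (default 200), so the latter has to be set on its own:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder
      .config("spark.default.parallelism", "48")    // RDD operations
      .config("spark.sql.shuffle.partitions", "48") // DataFrame/Dataset/SQL shuffles
      .getOrCreate()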

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread shyla deshpande
On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande <deshpandesh...@gmail.com> wrote: > Following are my questions. Thank you. > > 1. When joining dataframes is it a good idea to repartition on the key column > that is used in the join or > the optimizer is too smart so for

Re: dataframe join questions. Appreciate your input.

2017-03-29 Thread shyla deshpande
On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande <deshpandesh...@gmail.com> wrote: > Following are my questions. Thank you. > > 1. When joining dataframes is it a good idea to repartition on the key column > that is used in the join or > the optimizer is too smart so for

dataframe join questions?

2017-03-28 Thread shyla deshpande
Following are my questions. Thank you. 1. When joining dataframes, is it a good idea to repartition on the key column that is used in the join, or is the optimizer smart enough that we can forget it? 2. In RDD join, wherever possible we do reduceByKey before the join to avoid a big shuffle of data. Do we need

Re: Converting dataframe to dataset question

2017-03-23 Thread shyla deshpande
import spark.implicits._ > val df = Seq(TeamUser("t1", "u1", "r1")).toDF() > df.printSchema() > spark.close() > } > } > > case class TeamUser(teamId: String, userId: String, role: String) > > > On Fri, Mar 24, 2017 at 5:23 A

Re: Spark dataframe, UserDefinedAggregateFunction(UDAF) help!!

2017-03-23 Thread shyla deshpande
Seq[String]](0) > > > Yong > > > ------ > *From:* shyla deshpande <deshpandesh...@gmail.com> > *Sent:* Thursday, March 23, 2017 8:18 PM > *To:* user > *Subject:* Spark dataframe, UserDefinedAggregateFunction(UDAF) help!! > > This is my input data. The UDAF n

Spark dataframe, UserDefinedAggregateFunction(UDAF) help!!

2017-03-23 Thread shyla deshpande
This is my input data. The UDAF needs to aggregate the goals for a team and return a map that gives the count for every goal in the team. I am getting the following error java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to [Ljava.lang.String; at
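
Per the reply above, the array column has to be read back as a Seq: Spark hands it over as a WrappedArray, which cannot be cast to a Java Array[String]. The idiom, usable inside the UDAF's update/evaluate:

    import org.apache.spark.sql.Row

    // ask for a Seq instead of casting to Array[String]; convert afterwards if needed
    def readGoals(row: Row): Seq[String] = row.getAs[Seq[String]](0)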

Re: Converting dataframe to dataset question

2017-03-23 Thread shyla deshpande
userDS.show > +------+------+-----+ > |teamid|userid| role| > +------+------+-----+ > |    t1|    u1|role1| > +------+------+-----+ > > scala> userDS.printSchema > root > |-- teamid: string (nullable = true) > |-- userid: string (nullable = true) > |-- ro

Re: Converting dataframe to dataset question

2017-03-23 Thread shyla deshpande
userDS:Dataset[Teamuser] = userDF.as[Teamuser] On Thu, Mar 23, 2017 at 12:49 PM, shyla deshpande <deshpandesh...@gmail.com> wrote: > I realized, my case class was inside the object. It should be defined > outside the scope of the object. Thanks > > On Wed, Mar 22, 2017 at 6:07 P

Re: Converting dataframe to dataset question

2017-03-23 Thread shyla deshpande
I realized, my case class was inside the object. It should be defined outside the scope of the object. Thanks On Wed, Mar 22, 2017 at 6:07 PM, shyla deshpande <deshpandesh...@gmail.com> wrote: > Why userDS is Dataset[Any], instead of Dataset[Teamuser]? Appreciate your >

Converting dataframe to dataset question

2017-03-22 Thread shyla deshpande
Why is userDS Dataset[Any] instead of Dataset[Teamuser]? Appreciate your help. Thanks val spark = SparkSession .builder .config("spark.cassandra.connection.host", cassandrahost) .appName(getClass.getSimpleName) .getOrCreate() import spark.implicits._ val
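
The resolution appears later in the thread: the case class must live at the top level, outside the enclosing object, so the implicit Encoder is found and .as[Teamuser] comes back typed. A compact sketch along the lines of the example quoted above:

    import org.apache.spark.sql.{Dataset, SparkSession}

    case class Teamuser(teamId: String, userId: String, role: String)

    object TeamuserApp {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("TeamuserApp").getOrCreate()
        import spark.implicits._
        val userDF = Seq(Teamuser("t1", "u1", "r1")).toDF()
        val userDS: Dataset[Teamuser] = userDF.as[Teamuser] // Dataset[Teamuser], not Dataset[Any]
        userDS.show()
        spark.stop()
      }
    }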

Spark Streaming questions, just 2

2017-03-21 Thread shyla deshpande
Hello all, I have a couple of spark streaming questions. Thanks. 1. In the case of stateful operations, the data is, by default, persistent in memory. Does in memory mean MEMORY_ONLY? When is it removed from memory? 2. I do not see any documentation for spark.cleaner.ttl. Is this no

Re: spark streaming with kafka source, how many concurrent jobs?

2017-03-21 Thread shyla deshpande
p1 of batch X+1 > This is not safe because it breaks the checkpointing logic in subtle ways. > Note that this was never documented in the spark online docs. > > On Tue, Mar 14, 2017 at 2:29 PM, shyla deshpande <deshpandesh...@gmail.com > > wrote: > >> Thanks TD

Re: spark streaming with kafka source, how many concurrent jobs?

2017-03-14 Thread shyla deshpande
, Tathagata Das <t...@databricks.com> wrote: > That config is not safe. Please do not use it. > > On Mar 10, 2017 10:03 AM, "shyla deshpande" <deshpandesh...@gmail.com> > wrote: > >> I have a spark streaming application which processes 3 kafka streams and >

spark streaming with kafka source, how many concurrent jobs?

2017-03-10 Thread shyla deshpande
I have a spark streaming application which processes 3 kafka streams and has 5 output operations. Not sure what should be the setting for spark.streaming.concurrentJobs. 1. If the concurrentJobs setting is 4 does that mean 2 output operations will be run sequentially? 2. If I had 6 cores what

Re: error in kafka producer

2017-02-28 Thread shyla deshpande
5 PM, Marco Mistroni <mmistr...@gmail.com> wrote: > This exception coming from a Spark program? > could you share few lines of code ? > > kr > marco > > On Tue, Feb 28, 2017 at 10:23 PM, shyla deshpande < > deshpandesh...@gm

error in kafka producer

2017-02-28 Thread shyla deshpande
producer send callback exception: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for positionevent-6 due to 30003 ms has passed since batch creation plus linger time
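
As a general Kafka-producer note (hedged, not specific to this cluster): the ~30000 ms in the error matches the producer's request.timeout.ms default, meaning records expired in the send buffer before the broker acknowledged them. After confirming the brokers are reachable, raising the timeout and retries is the usual first adjustment (broker address is a placeholder):

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig}

    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "60000") // default is 30000 ms
    props.put(ProducerConfig.RETRIES_CONFIG, "3")
    val producer = new KafkaProducer[String, String](props)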

Re: In Spark streaming, will saved kafka offsets become invalid if I change the number of partitions in a kafka topic?

2017-02-26 Thread shyla deshpande
Please help! On Sat, Feb 25, 2017 at 11:10 PM, shyla deshpande <deshpandesh...@gmail.com> wrote: > I am committing offsets to Kafka after my output has been stored, using the > commitAsync API. > > My question is if I increase/decrease the number of kafka partitions, will

In Spark streaming, will saved kafka offsets become invalid if I change the number of partitions in a kafka topic?

2017-02-25 Thread shyla deshpande
I am committing offsets to Kafka after my output has been stored, using the commitAsync API. My question is if I increase/decrease the number of kafka partitions, will the saved offsets become invalid? Thanks

Spark streaming on AWS EC2 error . Please help

2017-02-20 Thread shyla deshpande
I am running Spark streaming on AWS EC2 in standalone mode. When I do a spark-submit, I get the following message. I am subscribing to 3 kafka topics and it is reading and processing just 2 topics. Works fine in local mode. Appreciate your help. Thanks Exception in thread "pool-26-thread-132"

Re: Spark standalone cluster on EC2 error .. Checkpoint..

2017-02-17 Thread shyla deshpande
<tathagata.das1...@gmail.com > wrote: > Seems like an issue with the HDFS you are using for checkpointing. Its not > able to write data properly. > > On Thu, Feb 16, 2017 at 2:40 PM, shyla deshpande <deshpandesh...@gmail.com > > wrote: > >> Caused by: org.apache.hadoop.ipc

Spark standalone cluster on EC2 error .. Checkpoint..

2017-02-16 Thread shyla deshpande
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /checkpoint/11ea8862-122c-4614-bc7e-f761bb57ba23/rdd-347/.part-1-attempt-3 could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this

Re: Spark streaming question - SPARK-13758 Need to use an external RDD inside DStream processing...Please help

2017-02-07 Thread shyla deshpande
and my cached RDD is not small. If it was maybe I could materialize and broadcast. Thanks On Tue, Feb 7, 2017 at 10:28 AM, shyla deshpande <deshpandesh...@gmail.com> wrote: > I have a situation similar to the following and I get SPARK-13758 > <https://issues.apache.org/jira/bro

Spark streaming question - SPARK-13758 Need to use an external RDD inside DStream processing...Please help

2017-02-07 Thread shyla deshpande
I have a situation similar to the following and I get SPARK-13758. I understand why I get this error, but I want to know what should be the approach in dealing with these situations. Thanks > var cached =

Re: saveToCassandra issue. Please help

2017-02-03 Thread shyla deshpande
> > 1.- Do some grouping before insert to Cassandra. > 2.- Insert to cassandra all the entries and add some logic to your > request to get the most recent. > > Regards, > > 2017-02-03 10:26 GMT-08:00 shyla deshpande <deshpandesh...@gmail.com>: > > Hi All, > &

Re: saveToCassandra issue. Please help

2017-02-03 Thread shyla deshpande
is PT0H9M30S and then PT0H10M0S. Appreciate your input. Thanks On Fri, Feb 3, 2017 at 12:45 AM, shyla deshpande <deshpandesh...@gmail.com> wrote: > Hello All, > > This is the content of my RDD which I am saving to Cassandra table. > > But looks like the 2nd row is written first a

saveToCassandra issue. Please help

2017-02-03 Thread shyla deshpande
Hello All, This is the content of my RDD which I am saving to a Cassandra table. But it looks like the 2nd row is written first and then the first row overwrites it, so I end up with bad output. (494bce4f393b474980290b8d1b6ebef9, 2017-02-01, PT0H9M30S, WEDNESDAY) (494bce4f393b474980290b8d1b6ebef9,
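
Following the "group before insert" suggestion from the reply, one sketch is to reduce to a single winning record per primary key before writing, so row order inside the RDD no longer decides which write lands last. Keyspace, table and column names are placeholders; durations like PT0H9M30S parse with java.time.Duration:

    import java.time.Duration
    import com.datastax.spark.connector._

    // rows: (id, date, duration, weekday); keep the longest duration per (id, date)
    rdd.map { case (id, date, dur, day) => ((id, date), (dur, day)) }
       .reduceByKey { (a, b) =>
         if (Duration.parse(a._1).compareTo(Duration.parse(b._1)) >= 0) a else b
       }
       .map { case ((id, date), (dur, day)) => (id, date, dur, day) }
       .saveToCassandra("ks", "sessions", SomeColumns("id", "date", "duration", "weekday"))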

Re: mapWithState question

2017-01-30 Thread shyla deshpande
So, when tasks get reexecuted due to any failure, your mapping function > will also be reexecuted, and the writes to kafka can happen multiple times. > So you may only get at least once guarantee about those Kafka writes > > > On Mon, Jan 30, 2017 at 10:02 AM, shyla deshpande < > desh

Re: mapWithState question

2017-01-30 Thread shyla deshpande
Hello, TD, your suggestion works great. Thanks. I have 1 more question: I need to write to kafka from within the mapWithState function. Just wanted to check if this is a bad pattern in any way. Thank you. On Sat, Jan 28, 2017 at 9:14 AM, shyla deshpande <deshpandesh...@gmail.com>

Re: mapWithState question

2017-01-28 Thread shyla deshpande
Jan 28, 2017 at 12:30 AM, shyla deshpande < > deshpandesh...@gmail.com> wrote: > >> Can multiple DStreams manipulate a state? I have a stream that gives me >> total minutes the user spent on a course material. I have another stream >> that gives me chapters completed a

mapWithState question

2017-01-28 Thread shyla deshpande
Can multiple DStreams manipulate a state? I have a stream that gives me total minutes the user spent on a course material. I have another stream that gives me chapters completed and lessons completed by the user. I want to keep track, for each user, of total_minutes, chapters_completed and
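
One approach (the suggestion in the reply is truncated above, so this is a hedged reconstruction): fold both sources into a single keyed event stream, then run one mapWithState over the union. A sketch:

    import org.apache.spark.streaming.{State, StateSpec}

    case class Progress(totalMinutes: Long, chaptersCompleted: Int, lessonsCompleted: Int)
    sealed trait Event extends Serializable
    case class Minutes(m: Long) extends Event
    case object ChapterDone extends Event
    case object LessonDone extends Event

    val spec = StateSpec.function((userId: String, ev: Option[Event], state: State[Progress]) => {
      val cur = state.getOption.getOrElse(Progress(0, 0, 0))
      val next = ev match {
        case Some(Minutes(m))  => cur.copy(totalMinutes = cur.totalMinutes + m)
        case Some(ChapterDone) => cur.copy(chaptersCompleted = cur.chaptersCompleted + 1)
        case Some(LessonDone)  => cur.copy(lessonsCompleted = cur.lessonsCompleted + 1)
        case None              => cur
      }
      state.update(next)
      (userId, next)
    })

    // minutesStream and chaptersStream are DStream[(String, Event)] keyed by userId
    val tracked = minutesStream.union(chaptersStream).mapWithState(spec)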

Re: where is mapWithState executed?

2017-01-25 Thread shyla deshpande
After more reading, I know the state is distributed across the cluster. But if I need to look up a map in the update function, I need to broadcast it. Just want to make sure I am on the right path. Appreciate your help. Thanks On Wed, Jan 25, 2017 at 2:33 PM, shyla deshpande <deshpand

where is mapWithState executed?

2017-01-25 Thread shyla deshpande
Is it executed on the driver or the executor? If mapWithState runs on the executor, then I need to broadcast any map I look up in the update function. Thanks
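
Right: the mapping function runs on the executors, so a driver-side lookup table should be broadcast and read through .value inside the function. A short sketch (names are illustrative):

    import org.apache.spark.broadcast.Broadcast
    import org.apache.spark.streaming.{State, StateSpec}

    val courseNames: Broadcast[Map[String, String]] = sc.broadcast(Map("c1" -> "Course 1"))

    val spec = StateSpec.function((courseId: String, minutes: Option[Long], state: State[Long]) => {
      val total = state.getOption.getOrElse(0L) + minutes.getOrElse(0L)
      state.update(total)
      (courseNames.value.getOrElse(courseId, "unknown"), total) // executor-side lookup
    })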

Re: How to make the state in a streaming application idempotent?

2017-01-25 Thread shyla deshpande
Processing of the same data more than once can happen only when the app recovers after failure or during upgrade. So how do I apply your 2nd solution only for 1-2 hrs after restart? On Wed, Jan 25, 2017 at 12:51 PM, shyla deshpande <deshpandesh...@gmail.com> wrote: > Thanks Burak.

Re: How to make the state in a streaming application idempotent?

2017-01-25 Thread shyla deshpande
ith Big Data, generally you would be okay with > approximations. Try both out, see what scales/works with your dataset. > Maybe you may handle the second implementation. > > On Wed, Jan 25, 2017 at 12:23 PM, shyla deshpande < > deshpandesh...@gmail.com> wrote: > >> Thanks Burak

Re: How to make the state in a streaming application idempotent?

2017-01-25 Thread shyla deshpande
ore space than the BloomFilter though depending on > your data volume. > > On Wed, Jan 25, 2017 at 11:24 AM, shyla deshpande < > deshpandesh...@gmail.com> wrote: > >> In the previous email you gave me 2 solutions >> 1. Bloom filter --> problem in repopulating the

Re: How to make the state in a streaming application idempotent?

2017-01-25 Thread shyla deshpande
lter. > > Does that make sense? > > > On Wed, Jan 25, 2017 at 9:28 AM, shyla deshpande <deshpandesh...@gmail.com > > wrote: > >> Hi Burak, >> Thanks for the response. Can you please elaborate on your idea of storing >> the state of the unique ids. >> Do

Re: How to make the state in a streaming application idempotent?

2017-01-25 Thread shyla deshpande
> Burak > > On Tue, Jan 24, 2017 at 9:53 PM, shyla deshpande <deshpandesh...@gmail.com > > wrote: > >> Please share your thoughts. >> >> On Tue, Jan 24, 2017 at 4:01 PM, shyla deshpande < >> deshpandesh...@gmail.com> wrote: >> >&g

Recovering from checkpoint question

2017-01-24 Thread shyla deshpande
If I just want to resubmit the spark streaming app with different configuration options like different --executor-memory or --total-executor-cores, will the checkpoint directory help me continue from where I left off. Appreciate your response. Thanks

Re: How to make the state in a streaming application idempotent?

2017-01-24 Thread shyla deshpande
Please share your thoughts. On Tue, Jan 24, 2017 at 4:01 PM, shyla deshpande <deshpandesh...@gmail.com> wrote: > > > On Tue, Jan 24, 2017 at 9:44 AM, shyla deshpande <deshpandesh...@gmail.com > > wrote: > >> My streaming application stores a lot of aggregations

Re: How to make the state in a streaming application idempotent?

2017-01-24 Thread shyla deshpande
On Tue, Jan 24, 2017 at 9:44 AM, shyla deshpande <deshpandesh...@gmail.com> wrote: > My streaming application stores a lot of aggregations using mapWithState. > > I want to know what are all the possible ways I can make it idempotent. > > Please share your views. > > Than

Re: How to make the state in a streaming application idempotent?

2017-01-24 Thread shyla deshpande
My streaming application stores a lot of aggregations using mapWithState. I want to know all the possible ways I can make it idempotent. Please share your views. Thanks On Mon, Jan 23, 2017 at 5:41 PM, shyla deshpande <deshpandesh...@gmail.com> wrote: > In a Wordcount applicat

How to make the state in a streaming application idempotent?

2017-01-23 Thread shyla deshpande
In a Wordcount application that stores the count of all the words input so far using mapWithState, how do I make sure my counts are not messed up if I happen to read a line more than once? Appreciate your response. Thanks
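
A sketch in the spirit of the "store the unique ids" idea discussed later in this thread: keep the set of already-applied line ids inside the state, so a replayed line becomes a no-op. It assumes every line carries a unique id and that no state timeout is configured (update cannot be called on a timed-out key); the id set also grows without bound, which is exactly the trade-off against the Bloom filter mentioned above:

    import org.apache.spark.streaming.{State, StateSpec}

    case class WordState(count: Long, seenIds: Set[String])

    // value = (lineId, occurrencesInThisLine)
    val spec = StateSpec.function((word: String, ev: Option[(String, Long)], state: State[WordState]) => {
      val s = state.getOption.getOrElse(WordState(0L, Set.empty))
      val next = ev match {
        case Some((id, n)) if !s.seenIds.contains(id) => WordState(s.count + n, s.seenIds + id)
        case _ => s // replayed line: leave the count untouched
      }
      state.update(next)
      (word, next.count)
    })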

Re: Using mapWithState without a checkpoint

2017-01-23 Thread shyla deshpande
Hello spark users, I do have the same question as Daniel. I would like to save the state in Cassandra and on failure recover using the initialState. If some one has already tried this, please share your experience and sample code. Thanks. On Thu, Nov 17, 2016 at 9:45 AM, Daniel Haviv <
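
A hedged sketch of the idea: load the saved (key, state) pairs back from Cassandra as an RDD and pass them to StateSpec.initialState. Note that mapWithState itself still requires a checkpoint directory; initialState only repopulates the state after a restart. Keyspace and table names are placeholders:

    import com.datastax.spark.connector._
    import org.apache.spark.streaming.{State, StateSpec}

    // previously persisted word counts, mapped straight into (key, state) pairs
    val initial = sc.cassandraTable[(String, Long)]("ks", "word_counts")

    val spec = StateSpec.function((word: String, one: Option[Long], state: State[Long]) => {
      val total = state.getOption.getOrElse(0L) + one.getOrElse(0L)
      state.update(total)
      (word, total)
    }).initialState(initial)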

Re: Spark streaming app that processes Kafka DStreams produces no output and no error

2017-01-19 Thread shyla deshpande
There was an issue connecting to Kafka; once that was fixed, the spark app works. Hope this helps someone. Thanks On Mon, Jan 16, 2017 at 7:58 AM, shyla deshpande <deshpandesh...@gmail.com> wrote: > Hello, > I checked the log file on the worker node and don't s

Re: Spark streaming app that processes Kafka DStreams produces no output and no error

2017-01-16 Thread shyla deshpande
PM, shyla deshpande <deshpandesh...@gmail.com> wrote: > Hello, > > I want to add that, > I don't even see the streaming tab in the application UI on port 4040 when > I run it on the cluster. > The cluster on EC2 has 1 master node and 1 worker node. > The cores used o

Re: Spark streaming app that processes Kafka DStreams produces no output and no error

2017-01-14 Thread shyla deshpande
with just 2 cores? Appreciate your time and help. Thanks On Fri, Jan 13, 2017 at 10:46 PM, shyla deshpande <deshpandesh...@gmail.com> wrote: > Hello, > > My spark streaming app that reads kafka topics and prints the DStream > works fine on my laptop, but on AWS cluster it

Spark streaming app that processes Kafka DStreams produces no output and no error

2017-01-13 Thread shyla deshpande
Hello, My spark streaming app that reads kafka topics and prints the DStream works fine on my laptop, but on AWS cluster it produces no output and no errors. Please help me debug. I am using Spark 2.0.2 and kafka-0-10 Thanks The following is the output of the spark streaming app... 17/01/14

Re: Docker image for Spark streaming app

2017-01-08 Thread shyla deshpande
started. Thanks On Sun, Jan 8, 2017 at 1:52 PM, shyla deshpande <deshpandesh...@gmail.com> wrote: > Thanks really appreciate. > > On Sun, Jan 8, 2017 at 1:02 PM, vvshvv <vvs...@gmail.com> wrote: > >> Hi, >> >> I am running spark streaming job using s

Docker image for Spark streaming app

2017-01-08 Thread shyla deshpande
I am looking for a docker image that I can use from docker hub for running a spark streaming app with Scala and Spark 2.0+. I am new to docker and unable to find an image on docker hub that suits my needs. Please let me know if anyone is using docker for a spark streaming app and share your

How do I read data in dockerized kafka from a spark streaming application

2017-01-06 Thread shyla deshpande
My kafka is in a docker container. How do I read this Kafka data in my Spark streaming app? Also, I need to write data from Spark Streaming to a Cassandra database which is in a docker container. I appreciate any help. Thanks.

Re: How many Spark streaming applications can be run at a time on a Spark cluster?

2016-12-24 Thread shyla deshpande
>> streaming application. >> >> >> Thanks, >> Divya >> >> On 15 December 2016 at 08:42, shyla deshpande <deshpandesh...@gmail.com> >> wrote: >> >>> How many Spark streaming applications can be run at a time on a Spark >>> c

How many Spark streaming applications can be run at a time on a Spark cluster?

2016-12-14 Thread shyla deshpande
How many Spark streaming applications can be run at a time on a Spark cluster? Is it better to have 1 spark streaming application to consume all the Kafka topics or have multiple streaming applications when possible to keep it simple? Thanks

Livy VS Spark Job Server

2016-12-12 Thread shyla deshpande
It will be helpful if someone can compare Livy and Spark Job Server. Thanks

Writing data from Spark streaming to AWS Redshift?

2016-12-09 Thread shyla deshpande
Hello all, Is it possible to write data from Spark streaming to AWS Redshift? I came across the following article, so it looks like it works from a Spark batch program. https://databricks.com/blog/2015/10/19/introducing-redshift-data-source-for-spark.html I want to write to AWS Redshift from
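
A hedged sketch of how the batch path from the linked post extends to streaming: turn each micro-batch into a DataFrame inside foreachRDD and write it through the spark-redshift source. URL, table and tempdir are placeholders, and dstream is assumed to carry case-class records:

    import org.apache.spark.sql.SparkSession

    dstream.foreachRDD { rdd =>
      val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
      import spark.implicits._
      rdd.toDF().write // assumes an RDD of case-class records
        .format("com.databricks.spark.redshift")
        .option("url", "jdbc:redshift://host:5439/db?user=u&password=p")
        .option("dbtable", "events")
        .option("tempdir", "s3n://bucket/tmp/")
        .mode("append")
        .save()
    }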

Re: Spark 2.0.2 , using DStreams in Spark Streaming . How do I create SQLContext? Please help

2016-12-01 Thread shyla deshpande
option. > > Thanks > Deepak > > On Thu, Dec 1, 2016 at 12:27 PM, shyla deshpande <deshpandesh...@gmail.com > > wrote: > >> I am on Spark 2.0.2, using DStreams because I need the Cassandra Sink. >> >> How do I create SQLContext? I get the error SQLConte

Spark 2.0.2 , using DStreams in Spark Streaming . How do I create SQLContext? Please help

2016-11-30 Thread shyla deshpande
I am on Spark 2.0.2, using DStreams because I need the Cassandra Sink. How do I create SQLContext? I get the error that SQLContext is deprecated. [image: Inline image 1] Thanks
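
In Spark 2.x the streaming guide's pattern is to obtain a SparkSession inside foreachRDD instead of the deprecated SQLContext; getOrCreate returns the per-JVM session built from the RDD's own conf. A sketch:

    import org.apache.spark.sql.SparkSession

    dstream.foreachRDD { rdd =>
      val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
      import spark.implicits._
      val df = rdd.toDF() // assumes an RDD of case-class records
      df.createOrReplaceTempView("events")
      spark.sql("SELECT count(*) FROM events").show()
    }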

Re: updateStateByKey -- when the key is multi-column (like a composite key)

2016-11-30 Thread shyla deshpande
ich would include > your fields. Another approach would be to create a hash (like create a > json version of the hash and return that.) > > On Wed, Nov 30, 2016 at 12:30 PM, shyla deshpande < > deshpandesh...@gmail.com> wrote: > >> updateStateByKey - Can this be

updateStateByKey -- when the key is multi-column (like a composite key)

2016-11-30 Thread shyla deshpande
updateStateByKey - Can this be used when the key is multi-column (like a composite key) and the value is not numeric? All the examples I have come across are ones where the key is a simple String and the value is numeric. Appreciate any help. Thanks
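
updateStateByKey is generic in both the key and the state type, so a tuple (or case class) key with a non-numeric state works exactly like the usual (String, Int) examples; checkpointing must be enabled, as with any stateful operation. A sketch (Activity and its fields are illustrative):

    import org.apache.spark.streaming.dstream.DStream

    case class Activity(userId: String, courseId: String, event: String)

    def trackEvents(activities: DStream[Activity]): DStream[((String, String), List[String])] = {
      val keyed = activities.map(a => ((a.userId, a.courseId), a.event)) // composite key
      keyed.updateStateByKey[List[String]] { (newEvents: Seq[String], old: Option[List[String]]) =>
        Some(old.getOrElse(Nil) ++ newEvents) // non-numeric state: the full event list
      }
    }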

Re: Do I have to wrap akka around spark streaming app?

2016-11-29 Thread shyla deshpande
tions > http://allegro.tech/2015/08/spark-kafka-integration.html > > > 2016-11-29 2:18 GMT+01:00 shyla deshpande <deshpandesh...@gmail.com>: > >> Hello All, >> >> I just want to make sure this is a right use case for Kafka --> Spark >> Streaming >

Re: Do I have to wrap akka around spark streaming app?

2016-11-28 Thread shyla deshpande
deo as complete and that triggers a lot of other events. I need a way to notify the app about the creation of the completion event. Appreciate any suggestions. Thanks On Mon, Nov 28, 2016 at 2:35 PM, shyla deshpande <deshpandesh...@gmail.com> wrote: > In this case, persisting to Cassandra i

Re: Do I have to wrap akka around spark streaming app?

2016-11-28 Thread shyla deshpande
notify the user ? > > 2016-11-28 23:15 GMT+01:00 shyla deshpande <deshpandesh...@gmail.com>: > >> Thanks Vincent for the input. Not sure I understand your suggestion. >> Please clarify. >> >> Few words about my use case : >> When the user watches a v

Re: Do I have to wrap akka around spark streaming app?

2016-11-28 Thread shyla deshpande
> > 2016-11-28 21:46 GMT+01:00 shyla deshpande <deshpandesh...@gmail.com>: > >> Thanks Daniel for the response. >> >> I am planning to use Spark streaming to do Event Processing. I will have >> akka actors sending messages to kafka. I process them using Spark s

Re: Do I have to wrap akka around spark streaming app?

2016-11-28 Thread shyla deshpande
use case a bit? > > In general, there is no single correct answer to your current question as > it's quite broad. > > Daniel > > On Mon, Nov 28, 2016 at 9:11 AM, shyla deshpande <deshpandesh...@gmail.com > > wrote: > >> My data pipeline is Kafka --> Spark Streaming -->

Re: Do I have to wrap akka around spark streaming app?

2016-11-28 Thread shyla deshpande
Anyone with experience of spark streaming in production, appreciate your input. Thanks -shyla On Mon, Nov 28, 2016 at 12:11 AM, shyla deshpande <deshpandesh...@gmail.com> wrote: > My data pipeline is Kafka --> Spark Streaming --> Cassandra. > > Can someone please explain

Do I have to wrap akka around spark streaming app?

2016-11-28 Thread shyla deshpande
My data pipeline is Kafka --> Spark Streaming --> Cassandra. Can someone please explain to me when I would need to wrap akka around the spark streaming app. My knowledge of akka and the actor system is poor. Please help! Thanks

Re: How do I persist the data after I process the data with Structured streaming...

2016-11-22 Thread shyla deshpande
.foreach. >> >> On Tue, Nov 22, 2016 at 12:55 PM, shyla deshpande < >> deshpandesh...@gmail.com> wrote: >> >>> Hi, >>> >>> Structured streaming works great with Kafka source but I need to persist >>> the data after processing in some database like Cassandra or at least >>> Postgres. >>> >>> Any suggestions, help please. >>> >>> Thanks >>> >> >> >

How do I persist the data after I process the data with Structured streaming...

2016-11-22 Thread shyla deshpande
Hi, Structured streaming works great with Kafka source but I need to persist the data after processing in some database like Cassandra or at least Postgres. Any suggestions, help please. Thanks
