Interesting Stateful Streaming question

2017-06-29 Thread kant kodali
Hi All, Here is a problem and I am wondering if Spark Streaming is the right tool for this? I have a stream of messages m1, m2, m3 and each of those messages can be in state s1, s2, s3, ..., sn (you can imagine the number of states is about 100) and I want to compute some metrics that visit all

Re: Interesting Stateful Streaming question

2017-06-29 Thread kant kodali
Is mapWithState an answer for this ? https://databricks.com/blog/2016/02/01/faster-stateful-stream-processing-in-apache-spark-streaming.html On Thu, Jun 29, 2017 at 11:55 AM, kant kodali <kanth...@gmail.com> wrote: > Hi All, > > Here is a problem and I am wondering if
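The core idea behind mapWithState — maintain per-key state and fold each arriving event into it — can be sketched outside Spark. This is plain Python illustrating the concept, not the Spark API; the event data and helper names are made up:

```python
# Illustration only: the per-key state-update idea behind mapWithState,
# in plain Python (not the Spark API). Events and names are hypothetical.
def update_state(state, key, value):
    """Fold one (key, value) event into a per-key state dict."""
    seen = state.setdefault(key, [])
    seen.append(value)  # e.g. record every state a message has visited
    return state

events = [("m1", "s1"), ("m2", "s1"), ("m1", "s2"), ("m1", "s3")]
state = {}
for key, value in events:
    update_state(state, key, value)

print(state["m1"])  # ['s1', 's2', 's3']
```

In Spark Streaming the framework manages this state map across batches and checkpoints it; the update function is the part you supply.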

Re: What's the simplest way to Read Avro records from Kafka to Spark DataSet/DataFrame?

2017-07-03 Thread kant kodali
roSchema", schema.toString) .load("src/test/resources/episodes.avro").show() On Thu, Jun 29, 2017 at 1:59 AM, kant kodali <kanth...@gmail.com> wrote: > Forgot to mention I am getting a stream of Avro records and I want to do > Structured streaming on these Avro r

Re: What's the simplest way to Read Avro records from Kafka to Spark DataSet/DataFrame?

2017-06-29 Thread kant kodali
Forgot to mention I am getting a stream of Avro records and I want to do Structured Streaming on these Avro records, but first I want to be able to parse them and put them in a DataSet or something like that. On Thu, Jun 29, 2017 at 12:56 AM, kant kodali <kanth...@gmail.com> wrote:

What's the simplest way to Read Avro records from Kafka to Spark DataSet/DataFrame?

2017-06-29 Thread kant kodali
Hi All, What's the simplest way to read Avro records from Kafka and put them into a Spark DataSet/DataFrame without using the Confluent Schema Registry or Twitter's Bijection API? Thanks!

GraphQL to Spark SQL

2017-07-06 Thread kant kodali
Hi All, I wonder if anyone had experience exposing Spark SQL interface through GraphQL? The main benefit I see is that we could send Spark SQL query through REST so clients can express their own transformations over REST. I understand the final outcome is probably the same as what one would

If I pass raw SQL string to dataframe do I still get the Spark SQL optimizations?

2017-07-06 Thread kant kodali
Hi All, I am wondering: if I pass a raw SQL string to a dataframe, do I still get the Spark SQL optimizations? Why or why not? Thanks!

How to create SparkSession using SparkConf?

2017-04-26 Thread kant kodali
Hi All, I am wondering how to create a SparkSession using a SparkConf object? Although I can see that most of the key-value pairs we set in SparkConf can also be set in SparkSession or SparkSession.Builder, I don't see sparkConf.setJars, which is required, right? Because we want the driver jar

Re: How to create SparkSession using SparkConf?

2017-04-27 Thread kant kodali
on").config("spark.jars", "a.jar, > b.jar").getOrCreate() > > > Thanks > > Yanbo > > > On Thu, Apr 27, 2017 at 9:21 AM, kant kodali <kanth...@gmail.com> wrote: > >> I am using Spark 2.1 BTW. >> >> On Wed, Apr 26, 2017 a

Re: How to create SparkSession using SparkConf?

2017-04-26 Thread kant kodali
I am using Spark 2.1 BTW. On Wed, Apr 26, 2017 at 3:22 PM, kant kodali <kanth...@gmail.com> wrote: > Hi All, > > I am wondering how to create SparkSession using SparkConf object? Although > I can see that most of the key value pairs we set in SparkConf we can also >

Re: How to create SparkSession using SparkConf?

2017-04-27 Thread kant kodali
normally pass when we call StreamingContext.getOrCreate. On Thu, Apr 27, 2017 at 8:47 AM, kant kodali <kanth...@gmail.com> wrote: > Ahhh Thanks much! I miss my sparkConf.setJars function instead of this > hacky comma separated jar names. > > On Thu, Apr 27, 2017 at 8:

How to convert Dstream of JsonObject to Dataframe in spark 2.1.0?

2017-04-24 Thread kant kodali
Hi All, How to convert a DStream of JsonObject to a Dataframe in Spark 2.1.0? That JsonObject is from the Gson library. Thanks!

Re: How to convert Dstream of JsonObject to Dataframe in spark 2.1.0?

2017-04-24 Thread kant kodali
t; which you can use to convert between JsonObjects to StructType schemas > > Regards > Sam > > > On Sun, Apr 23, 2017 at 7:50 PM, kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> How to convert Dstream of JsonObject to Dataframe in spark 2.1.0? That >> JsonObject is from Gson Library. >> >> Thanks! >> > >

Fwd: PageRank - 4x slower then Spark?!

2017-08-18 Thread kant kodali
-- Forwarded message -- From: Kaepke, Marc Date: Fri, Aug 18, 2017 at 10:51 AM Subject: PageRank - 4x slower then Spark?! To: "u...@flink.apache.org" Hi everyone, I compared Flink and Spark by using PageRank. I guessed Flink

Re: Does Spark SQL uses Calcite?

2017-08-19 Thread kant kodali
, kant On Sat, Aug 12, 2017 at 6:07 AM, Russell Spitzer <russell.spit...@gmail.com> wrote: > You don't have to go through hive. It's just spark sql. The application is > just a forked hive thrift server. > > On Fri, Aug 11, 2017 at 8:53 PM kant kodali <kanth...@gmail.c

What are Analysis Errors With respect to Spark Sql DataFrames and DataSets?

2017-05-03 Thread kant kodali
Hi All, I understand the compile-time errors this blog is talking about, but I don't understand what Analysis Errors are. Any examples? Thanks!

Re: What are Analysis Errors With respect to Spark Sql DataFrames and DataSets?

2017-05-03 Thread kant kodali
) would throw an analysis exception. > > On Wed, May 3, 2017 at 1:38 PM, kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> I understand the compile time Errors this blog >> <https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds

Re: How to print data to console in structured streaming using Spark 2.1.0?

2017-05-17 Thread kant kodali
afka-clients > 0.10.0.1 > > > > On Tue, May 16, 2017 at 4:29 PM, kant kodali <kanth...@gmail.com> wrote: > >> Hi Michael, >> >> Thanks for the catch. I assume you meant >> *spark-streaming-kafka-0-10_2.11-2.1.0.jar* >> >> I add this

How to flatten struct into a dataframe?

2017-05-17 Thread kant kodali
Hi, I have the following schema, and I am trying to put the structure below in a data frame or dataset such that each field inside a struct is a column in a data frame. I tried to follow this link and
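Flattening a struct means promoting every leaf field of a nested record to a top-level column. In Spark this is roughly `df.select("structCol.*")`; the sketch below shows just the underlying transformation in plain Python, with made-up field names:

```python
# Plain-Python sketch of "flattening a struct": every leaf field of a
# nested record becomes a top-level column. This is only the idea, not
# the Spark API; the record shape here is hypothetical.
def flatten(record, prefix=""):
    flat = {}
    for field, value in record.items():
        name = f"{prefix}{field}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

row = {"payload": {"data": {"name": "a", "count": 2}}}
print(flatten(row))  # {'payload.data.name': 'a', 'payload.data.count': 2}
```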

Re: How to see the full contents of dataset or dataframe in structured streaming?

2017-05-18 Thread kant kodali
so for console sink it is not possible? On Wed, May 17, 2017 at 11:30 PM, Jörn Franke <jornfra...@gmail.com> wrote: > You can also write it into a file and view it using your favorite > viewer/editor > > On 18. May 2017, at 04:55, kant kodali <kanth...@gmail.com> wr

Spark Structured Streaming is taking too long to process 2KB messages

2017-05-18 Thread kant kodali
Hi All, Here is my code. Dataset<Row> df = ds.select(functions.from_json(new Column("value").cast("string"), getSchema()).as("payload")); Dataset<Row> df1 = df.selectExpr("payload.data.*"); StreamingQuery query = df1.writeStream().outputMode("append").option("truncate",

Re: How to see the full contents of dataset or dataframe in structured streaming?

2017-05-18 Thread kant kodali
Looks like there is .option("truncate", "false") On Wed, May 17, 2017 at 11:30 PM, Jörn Franke <jornfra...@gmail.com> wrote: > You can also write it into a file and view it using your favorite > viewer/editor > > On 18. May 2017, at 04:55, kant kodali

How to see the full contents of dataset or dataframe in structured streaming?

2017-05-17 Thread kant kodali
Hi All, How to see the full contents of a dataset or dataframe in structured streaming, just like we normally do with *df.show(false)*? Is there any parameter I can pass in to the code below? val df1 = df.selectExpr("payload.data.*"); df1.writeStream().outputMode("append").format("console").start()

Re: How to print data to console in structured streaming using Spark 2.1.0?

2017-05-16 Thread kant kodali
and print it > val wordCountsDataFrame = spark.sql("select *, now() as TStamp from > myTempTable") > wordCountsDataFrame.write.mode(mode).save(output) > val lines = wordCountsDataFrame.count().toInt > // wordCountsDataFrame.show(20, false) > print

what is the difference between json format vs kafka format?

2017-05-13 Thread kant kodali
Hi All, What is the difference between sparkSession.readStream.format("kafka") vs sparkSession.readStream.format("json")? I am sending JSON-encoded messages in Kafka and I am not sure which one of the above I should use. Thanks!
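The distinction the thread is after: format("kafka") reads raw Kafka records whose value bytes still need parsing, while format("json") reads JSON files directly. The parsing step that from_json performs over a Kafka value column can be sketched in plain Python (field names are made up for illustration):

```python
import json

# Sketch of parsing JSON-encoded message values: the step that
# from_json performs after format("kafka") hands you raw bytes.
# This is not the Spark API; the payloads are hypothetical.
raw_values = [b'{"name": "hello", "count": 34}',
              b'{"name": "world", "count": 1}']
records = [json.loads(v.decode("utf-8")) for v in raw_values]
print([r["count"] for r in records])  # [34, 1]
```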

Re: what is the difference between json format vs kafka format?

2017-05-13 Thread kant kodali
> schema. > > Make sure that the generated text file have sufficient data to infer the > full schema. Let me know if this works for you. > > TD > > > On Sat, May 13, 2017 at 6:04 PM, kant kodali <kanth...@gmail.com> wrote: > >> Hi! >> >> Thanks

Re: what is the difference between json format vs kafka format?

2017-05-13 Thread kant kodali
d > above). Here is our blog post on this. > > https://databricks.com/blog/2017/04/26/processing-data-in- > apache-kafka-with-structured-streaming-in-apache-spark-2-2.html > > And my talk also explains this. > > https://spark-summit.org/east-2017/events/making-structured- >

How to print data to console in structured streaming using Spark 2.1.0?

2017-05-16 Thread kant kodali
Hi All, I have the following code. val ds = sparkSession.readStream() .format("kafka") .option("kafka.bootstrap.servers", bootstrapServers) .option("subscribe", topicName) .option("checkpointLocation", hdfsCheckPointDir)

Question on Spark code

2017-06-25 Thread kant kodali
Hi All, I came across this file https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/Logging.scala and I am wondering what the purpose of this is? In particular, it doesn't prevent any string concatenation, and the if checks are already done by the library

Re: Question on Spark code

2017-06-25 Thread kant kodali
e, and a fairly niche/advanced feature. > > > On Sun, Jun 25, 2017 at 8:25 PM kant kodali <kanth...@gmail.com> wrote: > >> @Sean Got it! I come from Java world so I guess I was wrong in assuming >> that arguments are evaluated during the method invocation time. How about

Re: Question on Spark code

2017-06-25 Thread kant kodali
hat you want with log > statements. The message isn't constructed unless it will be logged. > > protected def logInfo(msg: => String) { > > > On Sun, Jun 25, 2017 at 10:28 AM kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> I came across this
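The key in the reply above is the by-name parameter `msg: => String`: the message expression is only evaluated if the log level is enabled, so expensive string construction is skipped otherwise. The same deferred-evaluation trick can be shown in plain Python by passing a zero-argument callable instead of a string (a sketch, not Spark's actual implementation):

```python
import logging

# Deferred log-message construction, mirroring Scala's by-name
# parameter `msg: => String`: the (possibly expensive) string is only
# built when the level is enabled. Sketch only; names are made up.
calls = {"built": 0}

def expensive_message():
    calls["built"] += 1
    return "state dump: " + ",".join(str(i) for i in range(1000))

def log_info(logger, msg_thunk):
    if logger.isEnabledFor(logging.INFO):
        logger.info(msg_thunk())  # message evaluated only here

logger = logging.getLogger("demo")
logger.setLevel(logging.WARNING)  # INFO disabled
log_info(logger, expensive_message)
print(calls["built"])  # 0 — the message was never constructed
```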

Re: Is there a Kafka sink for Spark Structured Streaming

2017-05-19 Thread kant kodali
Thanks! On Fri, May 19, 2017 at 4:50 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote: > Should release by the end of this month. > > On Fri, May 19, 2017 at 4:07 PM, kant kodali <kanth...@gmail.com> wrote: > >> Hi Patrick, >> >> I am using 2.1.1

Re: couple naive questions on Spark Structured Streaming

2017-05-22 Thread kant kodali
HI Burak, My response is inline. Thanks a lot! On Mon, May 22, 2017 at 9:26 AM, Burak Yavuz wrote: > Hi Kant, > >> >> >> 1. Can we use Spark Structured Streaming for stateless transformations >> just like we would do with DStreams or Spark Structured Streaming is only >>

Re: Are there any Kafka forEachSink examples?

2017-05-23 Thread kant kodali
PM, Michael Armbrust <mich...@databricks.com> wrote: > There is an example in this post: > > https://databricks.com/blog/2017/04/04/real-time-end-to- > end-integration-with-apache-kafka-in-apache-sparks- > structured-streaming.html > > On Tue, May 23, 2017 at 1

Re: 2.2. release date ?

2017-05-23 Thread kant kodali
Heard its end of this month (May) On Tue, May 23, 2017 at 9:41 AM, mojhaha kiklasds wrote: > Hello, > > I could see a RC2 candidate for Spark 2.2, but not sure about the expected > release timeline on that. > Would be great if somebody can confirm it. > > Thanks, >

Are there any Kafka forEachSink examples?

2017-05-23 Thread kant kodali
Hi All, Are there any Kafka forEachSink examples preferably in Java but Scala is fine too? Thanks!

Re: Running into the same problem as JIRA SPARK-19268

2017-05-24 Thread kant kodali
s there? > > On Wed, May 24, 2017 at 3:50 PM, Michael Armbrust <mich...@databricks.com> > wrote: > >> -dev >> >> Have you tried clearing out the checkpoint directory? Can you also give >> the full stack trace? >> >> On Wed, May 24, 2017 at 3:4

Re: Running into the same problem as JIRA SPARK-19268

2017-05-25 Thread kant kodali
uration).getClass) > > On Wed, May 24, 2017 at 5:59 PM, kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> I specified hdfsCheckPointDir = /usr/local/hadoop/checkpoint as you can >> see below however I dont see checkpoint directory under my hadoop_hom

Re: Running into the same problem as JIRA SPARK-19268

2017-05-25 Thread kant kodali
currently not setting that in conf/spark-env.sh and thats the only hadoop related environment variable I see. please let me know thanks! On Thu, May 25, 2017 at 1:19 AM, kant kodali <kanth...@gmail.com> wrote: > Hi Ryan, > > I did add that print statement and here is what I

couple naive questions on Spark Structured Streaming

2017-05-20 Thread kant kodali
Hi, 1. Can we use Spark Structured Streaming for stateless transformations just like we would do with DStreams, or is Spark Structured Streaming only meant for stateful computations? 2. When we use groupBy and window operations for event-time processing and specify a watermark, does this mean the

Re: Running into the same problem as JIRA SPARK-19268

2017-05-25 Thread kant kodali
s? Thanks! On Thu, May 25, 2017 at 1:31 AM, kant kodali <kanth...@gmail.com> wrote: > Executing this bin/hadoop fs -ls /usr/local/hadoop/checkpoint says > > ls: `/usr/local/hadoop/checkpoint': No such file or directory > > This is what I expected as well since I don't see any ch

Re: Running into the same problem as JIRA SPARK-19268

2017-05-25 Thread kant kodali
Should I file a ticket or should I try another version like Spark 2.2 since I am currently using 2.1.1? On Thu, May 25, 2017 at 2:38 PM, kant kodali <kanth...@gmail.com> wrote: > Hi Ryan, > > You are right I was setting checkpointLocation for readStream. Now I did > set

Re: Running into the same problem as JIRA SPARK-19268

2017-05-25 Thread kant kodali
riteStream`. However, I still have no > idea why use a temp checkpoint location will fail. > > On Thu, May 25, 2017 at 2:23 PM, kant kodali <kanth...@gmail.com> wrote: > >> I did the following >> >> *bin/hadoop fs -mkdir -p **/usr/local/hadoop/checkpoint* and did *bin

Re: Running into the same problem as JIRA SPARK-19268

2017-05-25 Thread kant kodali
On Thu, May 25, 2017 at 3:41 PM, Shixiong(Ryan) Zhu wrote: > bin/hadoop fs -ls /usr/local/hadoop/checkpoint/state/0/* > Hi, There are no files under bin/hadoop fs -ls /usr/local/hadoop/checkpoint/state/0/* but all the directories until

Re: Running into the same problem as JIRA SPARK-19268

2017-05-24 Thread kant kodali
Even if I do simple count aggregation like below I get the same error as https://issues.apache.org/jira/browse/SPARK-19268 Dataset df2 = df1.groupBy(functions.window(df1.col("Timestamp5"), "24 hours", "24 hours"), df1.col("AppName")).count(); On Wed, May

Re: Running into the same problem as JIRA SPARK-19268

2017-05-26 Thread kant kodali
logs output > by HDFSBackedStateStoreProvider. > > On Thu, May 25, 2017 at 4:05 PM, kant kodali <kanth...@gmail.com> wrote: > >> >> On Thu, May 25, 2017 at 3:41 PM, Shixiong(Ryan) Zhu < >> shixi...@databricks.com> wrote: >> >>> bin/hadoop fs -ls /usr/local/hadoop/checkpoint

Re: Spark Structured Streaming is taking too long to process 2KB messages

2017-05-18 Thread kant kodali
OK, so the problem really was that I was compiling with 2.1.0 jars and at run time supplying 2.1.1. Once I changed to 2.1.1 at compile time as well, it seems to work fine and I can see all my 75 fields. On Thu, May 18, 2017 at 2:39 AM, kant kodali <kanth...@gmail.com> wrote: > Hi All, >

Re: Spark Streaming: Custom Receiver OOM consistently

2017-05-22 Thread kant kodali
Well, there are a few things here. 1. What is the Spark version? 2. You said there is an OOM error, but what is the cause that appears in the log message or stack trace? OOM can happen for various reasons, and the JVM usually specifies the cause in the error message. 3. What is the driver and executor

Re: How to print data to console in structured streaming using Spark 2.1.0?

2017-05-16 Thread kant kodali
ngOffsets", "earliest")" to start the > stream from the beginning. > > On Tue, May 16, 2017 at 12:36 AM, kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> I have the following code. >> >> val ds = sparkSession.readStream() &

Re: How to print data to console in structured streaming using Spark 2.1.0?

2017-05-16 Thread kant kodali
, Michael Armbrust <mich...@databricks.com> wrote: > Looks like you are missing the kafka dependency. > > On Tue, May 16, 2017 at 1:04 PM, kant kodali <kanth...@gmail.com> wrote: > >> Looks like I am getting the following runtime exception. I am using Spark &

How to convert Dataset<Row> to Dataset<String> in Spark Structured Streaming?

2017-05-30 Thread kant kodali
Hi All, I have a Dataset<Row> and I am trying to convert it into a Dataset<String> (of JSON strings) using Spark Structured Streaming. I have tried the following. df2.toJSON().writeStream().foreach(new KafkaSink()) This doesn't seem to work for the following reason. "Queries with streaming sources must be

Re: How to convert Dataset<Row> to Dataset<String> in Spark Structured Streaming?

2017-05-31 Thread kant kodali
om/blog/2017/02/23/working-complex-data-formats- > structured-streaming-apache-spark-2-1.html > > Cheers > Jules > > Sent from my iPhone > Pardon the dumb thumb typos :) > > On May 30, 2017, at 7:31 PM, kant kodali <kanth...@gmail.com> wrote: > > Hi All, > &

Re: How to convert Dataset<Row> to Dataset<String> in Spark Structured Streaming?

2017-05-31 Thread kant kodali
: 34} but *what I need is something like this { result: {"name": "hello", "ratio": 1.56, "count": 34} } however I don't have a result column or even this {**"name": "hello", "ratio": 1.56, "count": 34} **would work.* On
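What the thread is after — each row rendered as a JSON string nested under a single `result` field — is a small serialization step. A plain-Python sketch using the field names quoted in the message (the Spark-side equivalent would build this string per row before writing to the sink):

```python
import json

# Wrapping each record under a single "result" field before emitting
# it as a JSON string. Field names come from the example in the
# thread; this is an illustration, not the Spark API.
row = {"name": "hello", "ratio": 1.56, "count": 34}
wrapped = json.dumps({"result": row})
print(wrapped)  # {"result": {"name": "hello", "ratio": 1.56, "count": 34}}
```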

Is there a way to do conditional group by in spark 2.1.1?

2017-06-03 Thread kant kodali
Hi All, Is there a way to do a conditional group by in Spark 2.1.1? In other words, I want to do something like this: if (field1 == "foo") { df.groupBy(field1) } else if (field2 == "bar") df.groupBy(field2) Thanks
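A per-row conditional grouping key (in Spark SQL terms, roughly a groupBy over a when(...).otherwise(...) expression) reduces to choosing the key per record before grouping. The logic in plain Python, with made-up field names and data:

```python
from collections import defaultdict

# Sketch of grouping by a conditionally chosen key. Not the Spark API;
# rows and field names are hypothetical, matching the thread's example.
rows = [
    {"field1": "foo", "field2": "x"},
    {"field1": "baz", "field2": "bar"},
    {"field1": "foo", "field2": "y"},
]

groups = defaultdict(list)
for row in rows:
    key = row["field1"] if row["field1"] == "foo" else row["field2"]
    groups[key].append(row)

print(sorted(groups))  # ['bar', 'foo']
```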

Re: What is the easiest way for an application to Query parquet data on HDFS?

2017-06-04 Thread kant kodali
wrote: > Check out http://livy.io/ > > > On Sun, Jun 4, 2017 at 11:59 AM, kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> I am wondering what is the easiest way for a Micro service to query data >> on HDFS? By easiest way I mean using mi

Is there a way to do partial sort in update mode in Spark Structured Streaming?

2017-06-03 Thread kant kodali
Hi All, 1. Is there a way to do a partial sort of, say, a timestamp column in update mode? I am currently using Spark 2.1.1 and it looks like it is not possible; however, I am wondering if this is possible in 2.2? 2. Can we do a full sort in update mode with a specified watermark? since after a specified

What is the easiest way for an application to Query parquet data on HDFS?

2017-06-04 Thread kant kodali
Hi All, I am wondering what is the easiest way for a Micro service to query data on HDFS? By easiest way I mean using minimal number of tools. Currently I use spark structured streaming to do some real time aggregations and store it in HDFS. But now, I want my Micro service app to be able to

Re: What is the easiest way for an application to Query parquet data on HDFS?

2017-06-05 Thread kant kodali
Spark > Standalone with a 32 node cluster. > > Hope this gives some better idea. > > Thanks, > Muthu > > > On Sun, Jun 4, 2017 at 10:33 PM, kant kodali <kanth...@gmail.com> wrote: > >> Hi Muthu, >> >> I am actually using Play framework for my Mic

Running into the same problem as JIRA SPARK-20325

2017-05-31 Thread kant kodali
Hi All, I am using Spark 2.1.1 and forEachSink to write to Kafka. I call .start and .awaitTermination for each query however I get the following error "Cannot start query with id d4b3554d-ee1d-469c-bf9d-19976c8a7b47 as another query with same id is already active" So, my question is the same

Re: How to convert Dataset<Row> to Dataset<String> in Spark Structured Streaming?

2017-05-31 Thread kant kodali
https://stackoverflow.com/questions/44280360/how-to-convert-datasetrow-to-dataset-of-json-messages-to-write-to-kafka Thanks! On Wed, May 31, 2017 at 1:41 AM, kant kodali <kanth...@gmail.com> wrote: > small correction. > > If I try to convert a Row into a Json String it results

Re: What is the easiest way for an application to Query parquet data on HDFS?

2017-06-04 Thread kant kodali
Hi Muthu, I am actually using the Play framework for my microservice, which uses Akka, but I still don't understand how SparkSession can use Akka to communicate with the Spark cluster? SparkPi or SparkPl? any link? Thanks!

Re: What is the real difference between Kafka streaming and Spark Streaming?

2017-06-15 Thread kant kodali
applications > <https://databricks.com/blog/2017/05/22/running-streaming-jobs-day-10x-cost-savings.html> > . > > Michael > > On Sun, Jun 11, 2017 at 1:12 AM, kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> I am trying hard

Re: What is the real difference between Kafka streaming and Spark Streaming?

2017-06-11 Thread kant kodali
ng application – >latency or throughput? >5. Kafka streaming is relatively new and less mature than Spark >Streaming > > > > Mohammed > > > > *From:* vincent gromakowski [mailto:vincent.gromakow...@gmail.com] > *Sent:* Sunday, June 11, 2017 12:09 PM >

Re: What is the real difference between Kafka streaming and Spark Streaming?

2017-06-11 Thread kant kodali
Also another difference I see is some thing like Spark Sql where there are logical plans, physical plans, Code generation and all those optimizations I don't see them in Kafka Streaming at this time. On Sun, Jun 11, 2017 at 2:19 PM, kant kodali <kanth...@gmail.com> wrote: > I a

What is the charting library used by Databricks UI?

2017-06-16 Thread kant kodali
Hi All, I am wondering what is the charting library used by Databricks UI to display graphs in real time while streaming jobs? Thanks!

Re: What is the charting library used by Databricks UI?

2017-06-16 Thread kant kodali
e in firebug. > > > > *From:* kant kodali [mailto:kanth...@gmail.com] > *Sent:* Friday, June 16, 2017 1:26 PM > *To:* user @spark > *Subject:* What is the charting library used by Databricks UI? > > > > Hi All, > > > > I am wondering what is the charting

What is the real difference between Kafka streaming and Spark Streaming?

2017-06-11 Thread kant kodali
Hi All, I am trying hard to figure out what the real difference is between Kafka Streaming and Spark Streaming, other than saying one can be used as part of microservices (since Kafka Streaming is just a library) and the other is a standalone framework by itself. If I can accomplish the same job one

Re: What are Analysis Errors With respect to Spark Sql DataFrames and DataSets?

2017-05-04 Thread kant kodali
Thanks a lot! On Wed, May 3, 2017 at 4:36 PM, Michael Armbrust wrote: > if I do dataset.select("nonExistentColumn") then the Analysis Error is >> thrown at compile time right? >> > > if you do df.as[MyClass].map(_.badFieldName) you will get a compile > error. However,

unable to find how to integrate SparkSession with a Custom Receiver.

2017-05-04 Thread kant kodali
Hi All, I have a Custom Receiver that implements onStart() and OnStop Methods of the Receiver class and I am trying to figure out how to integrate with SparkSession since I want to do stateful analytics using Structured Streaming. I couldn't find it in the docs. any idea? When I was doing

is Spark Application code dependent on which mode we run?

2017-05-05 Thread kant kodali
Hi All, Does the rdd.collect() call work for client mode but not for cluster mode? If so, is there a way for the application to know which mode it is running in? It looks like for cluster mode we don't need to call rdd.collect(); instead we can just call rdd.first() or whatever. Thanks!

Re: unable to find how to integrate SparkSession with a Custom Receiver.

2017-05-04 Thread kant kodali
-in- > apache-kafka-with-structured-streaming-in-apache-spark-2-2.html > > > On Thu, May 4, 2017 at 12:43 AM, kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> I have a Custom Receiver that implements onStart() and OnStop Methods of >> the Receiver cl

do we need to enable writeAheadLogs for DirectStream as well or is it only for indirect stream?

2017-05-02 Thread kant kodali
Hi All, I need some fault tolerance for my stateful computations and I am wondering why we need to enable writeAheadLogs for DirectStream like Kafka (for Indirect stream it makes sense). In case of driver failure DirectStream such as Kafka can pull the messages again from the last committed
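The reasoning in the question is sound: a direct stream can re-pull messages from the last committed offset after a driver failure, which is why a write-ahead log buys little there. Offset-based replay, sketched in plain Python with a list standing in for a Kafka partition (purely illustrative):

```python
# Sketch of offset-based replay: after a failure, a direct consumer
# re-reads from the last committed offset instead of relying on a
# write-ahead log. The "topic" here is just a Python list.
topic = ["a", "b", "c", "d", "e"]
committed_offset = 2  # offsets 0 and 1 were fully processed

def replay(log, offset):
    """Messages a restarted consumer would re-pull."""
    return log[offset:]

print(replay(topic, committed_offset))  # ['c', 'd', 'e']
```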

Re: Is there a Kafka sink for Spark Structured Streaming

2017-05-19 Thread kant kodali
Hi Patrick, I am using 2.1.1 and I tried the above code you sent and I get "java.lang.UnsupportedOperationException: Data source kafka does not support streamed writing" so yeah this probably works only from Spark 2.2 onwards. I am not sure when it officially releases. Thanks! On Fri, May 19,

The following Error seems to happen once in every ten minutes (Spark Structured Streaming)?

2017-05-31 Thread kant kodali
Hi All, When my query is streaming I get the following error once in, say, 10 minutes. A lot of the solutions online seem to suggest just clearing the data directories under the datanode and namenode and restarting the HDFS cluster, but I didn't see anything that explains the cause. If it happens so frequent what

Re: Did the Kafka dependency jars change for Spark Structured Streaming 2.2.0?

2017-09-11 Thread kant kodali
;) .option("subscribe", "hello")) .option("startingOffsets", "earliest") .load() .writeStream() .format("console") .start(); query.awaitTermination(); On Mon, Sep 11, 2017 at 6:24 PM, kant kodali <kanth...@gmail.com> wrote: > Hi All, > > Does

Did the Kafka dependency jars change for Spark Structured Streaming 2.2.0?

2017-09-11 Thread kant kodali
Hi All, Did the Kafka dependency jars change for Spark Structured Streaming 2.2.0? kafka-clients-0.10.0.1.jar spark-streaming-kafka-0-10_2.11-2.2.0.jar 1) Are the above two the only Kafka-related jars, or am I missing something? 2) What is the difference between the above two jars? 3) If I have

unable to read from Kafka (very strange)

2017-09-11 Thread kant kodali
Hi All, I started using spark 2.2.0 very recently and now I can't even get the json data from Kafka out to console. I have no clue what's happening. This was working for me when I was using 2.1.1 Here is my code StreamingQuery query = sparkSession.readStream() .format("kafka")

Should I use Dstream or Structured Stream to transfer data from source to sink and then back from sink to source?

2017-09-13 Thread kant kodali
Hi All, I am trying to read data from Kafka, insert into Mongo, and read from Mongo and insert back into Kafka. I went with the structured stream approach first; however, I believe I am making some naive error because my map operations are not getting invoked. The pseudo code looks like this DataSet

Re: Should I use Dstream or Structured Stream to transfer data from source to sink and then back from sink to source?

2017-09-14 Thread kant kodali
t getting executed. ... are you sure > the query is running? Maybe actual code (not pseudocode) may help debug > this. > > > On Wed, Sep 13, 2017 at 11:20 AM, kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> I am trying to read data from kafka, insert

Structured streaming coding question

2017-09-19 Thread kant kodali
Hi All, I have the following pseudo code (I could paste the real code; however, it is pretty long and involves database calls inside a dataset.map operation and so on), so I am just trying to simplify my question. I would like to know if there is something wrong with the following pseudo code? DataSet

Re: How to read from multiple kafka topics using structured streaming (spark 2.2.0)?

2017-09-19 Thread kant kodali
tml :) > > Pozdrawiam, > Jacek Laskowski > > https://about.me/JacekLaskowski > Spark Structured Streaming (Apache Spark 2.2+) https://bit.ly/spark- > structured-streaming > Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark > Follow me at https://twitter.com

How to read from multiple kafka topics using structured streaming (spark 2.2.0)?

2017-09-19 Thread kant kodali
Hi All, I am wondering how to read from multiple Kafka topics using structured streaming (code below)? I googled prior to asking this question and I see responses related to DStreams but not structured streams. Is it possible to read multiple topics using the same spark structured stream?

Re: Structured streaming coding question

2017-09-19 Thread kant kodali
Looks like my problem was the order of awaitTermination() for some reason. Doesn't work On Tue, Sep 19, 2017 at 1:54 PM, kant kodali <kanth...@gmail.com> wrote: > Hi All, > > I have the following Psuedo code (I could paste the real code however it > is pretty long and

Re: Structured streaming coding question

2017-09-19 Thread kant kodali
processingTime(1000)).foreach(new KafkaSink("hello2")).start(); query2.awaitTermination() On Tue, Sep 19, 2017 at 10:09 PM, kant kodali <kanth...@gmail.com> wrote: > Looks like my problem was the order of awaitTermination() for some reason. > > Doesn't work > >

Is there a SparkILoop for Java?

2017-09-20 Thread kant kodali
Hi All, I am wondering if there is a SparkILoop for java so I can pass Java code as a string to repl? Thanks!

Re: Structured streaming coding question

2017-09-20 Thread kant kodali
second query may fail, and as long as your > first query is running, you will not know. Put this as the last line > instead: > > spark.streams.awaitAnyTermination() > > On Tue, Sep 19, 2017 at 10:11 PM, kant kodali <kanth...@gmail.com> wrote: > >> Looks like my problem

Re: Structured streaming coding question

2017-09-20 Thread kant kodali
brk...@gmail.com> wrote: > Please remove > > query1.awaitTermination(); > query2.awaitTermination(); > > once > > query1.awaitTermination(); > > is called, you don't even get to query2.awaitTermination(). > > > On Tue, Sep 19, 2017 at 11:59 PM, kant kod

Re: Structured streaming coding question

2017-09-20 Thread kant kodali
Hi Burak, Are you saying get rid of both query1.awaitTermination(); query2.awaitTermination(); and just have the line below? sparkSession.streams().awaitAnyTermination(); Thanks! On Wed, Sep 20, 2017 at 12:51 AM, kant kodali <kanth...@gmail.com> wrote: > If I don't get anywh

Re: Structured streaming coding question

2017-09-20 Thread kant kodali
Just tried with sparkSession.streams().awaitAnyTermination(); and that's the only await* I had, and it works! But what if I don't want all my queries to fail or stop making progress if one of them fails? On Wed, Sep 20, 2017 at 2:26 AM, kant kodali <kanth...@gmail.com> wrote: >
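The "wait for whichever of several long-running queries finishes or fails first" semantics of awaitAnyTermination can be sketched with plain Python futures (an analogy only — Spark's StreamingQueryManager works differently under the hood; the query names and sleep times here are made up):

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
import time

# Analogy for awaitAnyTermination: block until the first of several
# concurrently running "queries" terminates, leaving the rest running.
def query(name, seconds):
    time.sleep(seconds)
    return name

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(query, "q1", 0.05),
               pool.submit(query, "q2", 0.5)]
    done, pending = wait(futures, return_when=FIRST_COMPLETED)

first = next(iter(done)).result()
print(first)  # 'q1'
```

The follow-up concern in the message (not wanting one failure to stop everything) maps to looping: after the wait returns, inspect which future terminated, handle or restart it, and wait again.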

Re: Is there a SparkILoop for Java?

2017-09-20 Thread kant kodali
gt; interpretation of Java code through dynamic class loading, not as powerful > and requires some work on your end, but you can build a Java-based > interpreter. > > jg > > > On Sep 20, 2017, at 08:06, kant kodali <kanth...@gmail.com> wrote: > > Got it! Thats what I

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-17 Thread kant kodali
Are you creating threads in your application? On Sun, Sep 17, 2017 at 7:48 AM, HARSH TAKKAR wrote: > > Hi > > I am using spark 2.1.0 with scala 2.11.8, and while iterating over the > partitions of each rdd in a dStream formed using KafkaUtils, i am getting > the below

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-18 Thread kant kodali
ting any thread for kafka DStream > however, we have a single thread for refreshing a resource cache on > driver, but that is totally separate to this connection. > > On Mon, Sep 18, 2017 at 12:29 AM kant kodali <kanth...@gmail.com> wrote: > >> Are you creating threads in yo

Re: Is there a SparkILoop for Java?

2017-09-20 Thread kant kodali
gt; > On Wed, Sep 20, 2017 at 8:38 PM, kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> I am wondering if there is a SparkILoop >> <https://spark.apache.org/docs/0.8.0/api/repl/org/apache/spark/repl/SparkILoop.html> >> for >> java so I can pass Java code as a string to repl? >> >> Thanks! >> > >

Re: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?

2017-10-08 Thread kant kodali
om/blog/2017/06/27/4-sql-high-order- > lambda-functions-examine-complex-structured-data-databricks.html > > https://databricks.com/blog/2017/06/13/five-spark-sql- > utility-functions-extract-explore-complex-data-types.html > > Cheers > Jules > > > Sent from my iPhone >

Re: Does Spark 2.2.0 support Dataset<List<Map<String,Object>>> ?

2017-10-10 Thread kant kodali
public Seq<Map<String, String>> call(String input) throws Exception { List<Map<String, String>> temp = new ArrayList<>(); temp.add(new HashMap<String, String>()); return JavaConverters.asScalaBufferConverter(temp).asScala().toSeq();

Re: Does Spark 2.2.0 support Dataset<List<Map<String,Object>>> ?

2017-10-09 Thread kant kodali
use kryo encoder you can do your original Dataset< > List<Map<String,Object>>>> i think > > for example in scala i create here an intermediate Dataset[Any]: > > scala> Seq(1,2,3).toDS.map(x => if (x % 2 == 0) x else > x.toString)(org.apache.spark.sql.

Re: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?

2017-10-09 Thread kant kodali
https://issues.apache.org/jira/browse/SPARK-8 On Sun, Oct 8, 2017 at 11:58 AM, kant kodali <kanth...@gmail.com> wrote: > I have the following so far > > private StructType getSchema() { > return new StructType() > .add("name", StringTy

Re: Does Spark 2.2.0 support Dataset<List<Map<String,Object>>> ?

2017-10-09 Thread kant kodali
gt; it supports Dataset<Seq<Map<String, X>>> where X must be a supported type > also. Object is not a supported type. > > On Mon, Oct 9, 2017 at 7:36 AM, kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> I am wondering if spark supports

Re: is it ok to have multiple sparksession's in one spark structured streaming app?

2017-09-08 Thread kant kodali
perspective it is not good > idea. > > On Thu, Sep 7, 2017 at 2:40 AM, kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> I am wondering if it is ok to have multiple sparksession's in one spark >> structured streaming app? Basically, I want to create 1)

How to select the entire row that has max timestamp for every key in Spark Structured Streaming 2.1.1?

2017-08-29 Thread kant kodali
Hi All, I am wondering what the easiest and most concise way is to express the computation below in Spark Structured Streaming, given that it supports both imperative and declarative styles? I am just trying to select the rows that have the max timestamp for each train. Instead of doing some sort of nested
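The computation being asked about — keep, for each key, the entire row carrying the maximum timestamp — is a single fold. The core logic in plain Python (not the Spark API; the column names follow the question, the data is made up):

```python
# Picking the entire row with the max timestamp per key (the thread's
# "latest row per train"), in plain Python. Illustrative data only.
rows = [
    {"train": "A", "ts": 1, "dest": "x"},
    {"train": "A", "ts": 3, "dest": "y"},
    {"train": "B", "ts": 2, "dest": "z"},
]

latest = {}
for row in rows:
    key = row["train"]
    if key not in latest or row["ts"] > latest[key]["ts"]:
        latest[key] = row

print(latest["A"]["dest"])  # 'y' — the whole row, not just the max ts
```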
