Re: structured streaming join of streaming dataframe with static dataframe performance

2022-08-04 Thread kant kodali
I suspect it is because the incoming rows, when joined with the static frame, can lead to a variable degree of skewness over time, and if so it is probably better to employ different join strategies at run time. But if you know your Dataset, I believe you can just do a broadcast join for your
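The broadcast-join idea above can be sketched as follows; a minimal Java sketch assuming the static side fits in executor memory (class and column names are hypothetical, not from the original thread):

```java
import static org.apache.spark.sql.functions.broadcast;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class BroadcastJoinSketch {
    static Dataset<Row> joinWithStatic(Dataset<Row> streamingDf, Dataset<Row> staticDf) {
        // broadcast() hints Spark to ship the static side to every executor,
        // turning the join into a map-side join and sidestepping shuffle skew
        // on the streaming side.
        return streamingDf.join(broadcast(staticDf), "userId");
    }
}
```

Since the static frame is broadcast whole, skew in the streaming keys no longer forces a shuffle, which is why this works regardless of how the key distribution drifts over time.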

Re: https://spark-project.atlassian.net/browse/SPARK-1153

2020-02-24 Thread kant kodali
great! Thanks On Sun, Feb 23, 2020 at 3:53 PM kant kodali wrote: > Hi All, > > Any chance of fixing this one ? > https://spark-project.atlassian.net/browse/SPARK-1153 or offer some work > around may be? > > Currently, I got bunch of events streaming into kafka ac

Re: SparkGraph review process

2020-02-23 Thread kant kodali
Hi Sean, In that case, can we have GraphFrames as part of the Spark release? A separate release is also fine. Currently, I don't see any releases w.r.t. GraphFrames. Thanks On Fri, Feb 14, 2020 at 9:06 AM Sean Owen wrote: > This will not be Spark 3.0, no. > > On Fri, Feb 14, 2020 at 1:1

https://spark-project.atlassian.net/browse/SPARK-1153

2020-02-23 Thread kant kodali
Hi All, Any chance of fixing this one? https://spark-project.atlassian.net/browse/SPARK-1153 Or offer some workaround, maybe? Currently, I have a bunch of events streaming into Kafka across various topics and they are stamped with a UUIDv1 for each event, so it is easy to construct edges using

Re: SparkGraph review process

2020-02-13 Thread kant kodali
Any update on this? Is SparkGraph going to make it into Spark or not? On Mon, Oct 14, 2019 at 12:26 PM Holden Karau wrote: > Maybe let’s ask the folks from Lightbend who helped with the previous > scala upgrade for their thoughts? > > On Mon, Oct 14, 2019 at 8:24 PM Xiao Li wrote: > >> 1. On

https://github.com/google/zetasql

2019-05-21 Thread kant kodali
https://github.com/google/zetasql

Re: queryable state & streaming

2019-03-16 Thread kant kodali
Any update on this? On Wed, Oct 24, 2018 at 4:26 PM Arun Mahadevan wrote: > I don't think separate API or RPCs etc might be necessary for queryable > state if the state can be exposed as just another datasource. Then the sql > queries can be issued against it just like executing sql queries

Re: Plan on Structured Streaming in next major/minor release?

2018-11-02 Thread kant kodali
If I can add one thing to this list, I would say stateless aggregations using raw SQL. For example: as I read micro-batches from Kafka, I want to do, say, a count of that micro-batch and spit it out using raw SQL. (No count aggregation across batches.) On Tue, Oct 30, 2018 at 4:55 PM Jungtaek Lim
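A per-micro-batch count with no state carried across batches can be sketched with `foreachBatch` (added in Spark 2.4, which was in development when this thread ran); the topic name and bootstrap server below are placeholders:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MicroBatchCount {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("batch-count").getOrCreate();

        Dataset<Row> stream = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "events")
                .load();

        stream.writeStream()
                // Each invocation sees exactly one micro-batch, so the SQL
                // count below is a stateless, per-batch aggregate -- nothing
                // accumulates across batches.
                .foreachBatch((Dataset<Row> batch, Long batchId) -> {
                    batch.createOrReplaceTempView("current_batch");
                    spark.sql("SELECT COUNT(*) AS cnt FROM current_batch").show();
                })
                .start()
                .awaitTermination();
    }
}
```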

Re: Plan on Structured Streaming in next major/minor release?

2018-10-20 Thread kant kodali
+1 for raising all this. +1 for Queryable State (SPARK-16738 [3]) On Thu, Oct 18, 2018 at 9:59 PM Jungtaek Lim wrote: > Small correction: "timeout" in map/flatmapGroupsWithState would not work > similarly to state TTL when event time and watermark are set. So timeout in >

Re: Feature request: Java-specific transform method in Dataset

2018-07-01 Thread kant kodali
I am not affiliated with Flink or Spark, but I do think some of the thoughts here make sense On Sun, Jul 1, 2018 at 4:12 PM, Sean Owen wrote: > It's true, that

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-16 Thread kant kodali
, 2018 at 3:22 AM, Marco Gaido <marcogaid...@gmail.com> wrote: > I'd be against having a new feature in a minor maintenance release. I > think such a release should contain only bugfixes. > > 2018-05-16 12:11 GMT+02:00 kant kodali <kanth...@gmail.com>: > >> Can thi

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-16 Thread kant kodali
Can this https://issues.apache.org/jira/browse/SPARK-23406 be part of 2.3.1? On Tue, May 15, 2018 at 2:07 PM, Marcelo Vanzin wrote: > Bummer. People should still feel welcome to test the existing RC so we > can rule out other issues. > > On Tue, May 15, 2018 at 2:04 PM,

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-21 Thread kant kodali
Hi All, +1 for the tickets proposed by Ryan Blue. Any possible chance of this one https://issues.apache.org/jira/browse/SPARK-23406 getting into 2.3.0? It's a very important feature for us, so if it doesn't make the cut I would have to cherry-pick this commit and compile from the source for our

Re: [ANNOUNCE] Announcing Apache Spark 2.2.0

2017-07-17 Thread kant kodali
+1 On Tue, Jul 11, 2017 at 3:56 PM, Jean Georges Perrin wrote: > Awesome! Congrats! Can't wait!! > > jg > > > On Jul 11, 2017, at 18:48, Michael Armbrust > wrote: > > Hi all, > > Apache Spark 2.2.0 is the third release of the Spark 2.x line. This > release

Re: Question on Spark code

2017-06-25 Thread kant kodali
e, and a fairly niche/advanced feature. > > > On Sun, Jun 25, 2017 at 8:25 PM kant kodali <kanth...@gmail.com> wrote: > >> @Sean Got it! I come from Java world so I guess I was wrong in assuming >> that arguments are evaluated during the method invocation time. How about

Re: Question on Spark code

2017-06-25 Thread kant kodali
hat you want with log > statements. The message isn't constructed unless it will be logged. > > protected def logInfo(msg: => String) { > > > On Sun, Jun 25, 2017 at 10:28 AM kant kodali <kanth...@gmail.com> wrote: > >> Hi All, >> >> I came across this

Question on Spark code

2017-06-25 Thread kant kodali
Hi All, I came across this file https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/Logging.scala and I am wondering what the purpose of it is? In particular, it doesn't seem to prevent any string concatenation, and the if checks are already done by the library
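As the replies above point out, the `msg: => String` in Logging.scala is a Scala by-name parameter: the message string is only built if the log level is enabled. Since the question comes from a Java perspective, here is a rough Java analogue using `Supplier` to defer construction (the class is hypothetical, not Spark's actual code):

```java
import java.util.function.Supplier;

public class LazyLogger {
    private final boolean infoEnabled;

    public LazyLogger(boolean infoEnabled) {
        this.infoEnabled = infoEnabled;
    }

    // The Supplier is only invoked when the level is enabled, so an
    // expensive string concatenation is skipped otherwise -- the same
    // effect as Scala's by-name parameter `msg: => String`.
    public void logInfo(Supplier<String> msg) {
        if (infoEnabled) {
            System.out.println("INFO: " + msg.get());
        }
    }
}
```

In Java, arguments are indeed evaluated at call time, which is exactly why a lambda (evaluated only when `get()` is called) is needed to reproduce Scala's by-name behavior.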

Re: Running into the same problem as JIRA SPARK-19268

2017-05-24 Thread kant kodali
Even if I do simple count aggregation like below I get the same error as https://issues.apache.org/jira/browse/SPARK-19268 Dataset df2 = df1.groupBy(functions.window(df1.col("Timestamp5"), "24 hours", "24 hours"), df1.col("AppName")).count(); On Wed, May

Running into the same problem as JIRA SPARK-19268

2017-05-24 Thread kant kodali
Hi All, I am using Spark 2.1.1 and running in Standalone mode using HDFS and Kafka. I am running into the same problem as https://issues.apache.org/jira/browse/SPARK-19268 with my app (not KafkaWordCount). Here is my sample code. *Here is how I create ReadStream* sparkSession.readStream()

Spark 2.2.0 or Spark 2.3.0?

2017-05-01 Thread kant kodali
Hi All, If I understand the Spark standard release process correctly, it looks like the official release is going to be sometime at the end of this month and it is going to be 2.2.0, right (not 2.3.0)? I am eagerly looking for Spark 2.2.0 because of the "update mode" option in Spark Streaming. Please

Re: is there a way to persist the lineages generated by spark?

2017-04-07 Thread kant kodali
client > describes a calculation, but in the end the description is wrong. > > > On 4. Apr 2017, at 05:19, kant kodali <kanth...@gmail.com> wrote: > > > > Hi All, > > > > I am wondering if there a way to persist the lineages generated by spark >

is there a way to persist the lineages generated by spark?

2017-04-03 Thread kant kodali
Hi All, I am wondering if there is a way to persist the lineages generated by Spark underneath? Some of our clients want us to prove that the result of the computation we are showing on a dashboard is correct, and for that, if we can show the lineage of transformations that are executed to get to
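Spark does not persist lineage as a queryable artifact, but a textual rendering of it is available at runtime via `toDebugString`, which an application could log or store alongside its results. A minimal sketch (the sample data is made up for illustration):

```java
import java.util.Arrays;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LineageDump {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local[*]", "lineage-demo");
        JavaRDD<Integer> result = sc.parallelize(Arrays.asList(1, 2, 3, 4))
                .map(x -> x * 2)
                .filter(x -> x > 2);
        // toDebugString renders the recursive lineage (the chain of parent
        // RDDs and the transformations between them) as text, which the
        // application can persist, e.g. to a log or an audit table.
        System.out.println(result.toDebugString());
        sc.stop();
    }
}
```

Note this captures the logical chain of transformations, not a proof of correctness; it only shows which operations produced the result.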

Are we still dependent on Guava jar in Spark 2.1.0 as well?

2017-02-26 Thread kant kodali
Are we still dependent on Guava jar in Spark 2.1.0 as well (Given Guava jar incompatibility issues)?

Re: Java 9

2017-02-07 Thread kant kodali
Well and the module system! On Tue, Feb 7, 2017 at 4:03 AM, Timur Shenkao wrote: > If I'm not wrong, they got rid of *sun.misc.Unsafe* in Java 9. > > This class is still used by several libraries & frameworks. > >

Re: Writing data from Spark streaming to AWS Redshift?

2016-12-11 Thread kant kodali
@shyla a side question: What can Redshift do that Spark cannot? Trying to understand your use case. On Fri, Dec 9, 2016 at 8:47 PM, ayan guha wrote: > Ideally, saving data to external sources should not be any different. give > the write options as stated in the

Re: How do I convert json_encoded_blob_column into a data frame? (This may be a feature request)

2016-11-17 Thread kant kodali
out. > > On Wed, Nov 16, 2016 at 4:39 PM, kant kodali <kanth...@gmail.com> wrote: > >> 1. I have a Cassandra Table where one of the columns is blob. And this >> blob contains a JSON encoded String however not all the blob's across the >> Cassandra table for that co

Another Interesting Question on SPARK SQL

2016-11-17 Thread kant kodali
Which parts in the diagram above are executed by DataSource connectors and which parts are executed by Tungsten? Or, to put it another way, in which phase in the diagram above does Tungsten leverage the DataSource connectors (such as, say, the Cassandra connector)? My understanding so far is that

How do I convert json_encoded_blob_column into a data frame? (This may be a feature request)

2016-11-16 Thread kant kodali
https://spark.apache.org/docs/2.0.2/sql-programming-guide.html#json-datasets "Spark SQL can automatically infer the schema of a JSON dataset and load it as a DataFrame. This conversion can be done using SQLContext.read.json() on either an RDD of String, or a JSON file." val df =
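Building on the quoted documentation, one way to decode a column of JSON strings (such as a blob read as UTF-8) is to extract it as a `Dataset<String>` and re-read it with the JSON datasource; `read().json(Dataset<String>)` is available in Spark 2.2+, while the 2.0-era API took an `RDD<String>` instead. Table and column names below are placeholders:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonBlobToDataFrame {
    static Dataset<Row> decode(SparkSession spark, Dataset<Row> table) {
        // Pull out the JSON column as plain strings, then let the JSON
        // reader infer a schema from them. Rows whose blob is not valid
        // JSON land in the _corrupt_record column instead of failing the job.
        Dataset<String> json = table.select("json_blob").as(Encoders.STRING());
        return spark.read().json(json);
    }
}
```

Since not all blobs in the table are guaranteed to be JSON (per the question), the `_corrupt_record` behavior is what makes this approach tolerant of mixed content.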

Re: Spark Improvement Proposals

2016-10-12 Thread kant kodali
Some of you guys may have already seen this but in case if you haven't you may want to check it out. http://www.slideshare.net/sbaltagi/flink-vs-spark On Tue, Oct 11, 2016 at 1:57 PM, Ryan Blue wrote: > I don't think we will have trouble with whatever rule that is

Re: This Exception has been really hard to trace

2016-10-10 Thread kant kodali
rk is different from the compiled one. You should mark the Spark components  "provided". See https://issues.apache.org/jira/browse/SPARK-9219 On Sun, Oct 9, 2016 at 8:13 PM, kant kodali <kanth...@gmail.com> wrote: I tried SpanBy but look like there is a strange error that happening no

Re: This Exception has been really hard to trace

2016-10-09 Thread kant kodali
Hi Reynold, Actually, I did that well before posting my question here. Thanks, kant On Sun, Oct 9, 2016 8:48 PM, Reynold Xin r...@databricks.com wrote: You should probably check with DataStax who build the Cassandra connector for Spark. On Sun, Oct 9, 2016 at 8:13 PM, kant kodali <ka

This Exception has been really hard to trace

2016-10-09 Thread kant kodali
I tried spanBy but it looks like there is a strange error happening no matter which way I try, like the one described here for the Java solution. http://qaoverflow.com/question/how-to-use-spanby-in-java/ java.lang.ClassCastException: cannot assign instance of

Fwd: seeing this message repeatedly.

2016-09-05 Thread kant kodali
-- Forwarded message -- From: kant kodali <kanth...@gmail.com> Date: Sat, Sep 3, 2016 at 5:39 PM Subject: seeing this message repeatedly. To: "user @spark" <u...@spark.apache.org> Hi Guys, I am running my driver program on my local machine and my s

Re: What are the names of the network protocols used between Spark Driver, Master and Workers?

2016-08-30 Thread kant kodali
Ok, I will answer my own question. It looks like Netty-based RPC. On Mon, Aug 29, 2016 9:22 PM, kant kodali kanth...@gmail.com wrote: What are the names of the network protocols used between Spark Driver, Master and Workers?

Re: is the Lineage of RDD stored as a byte code in memory or a file?

2016-08-24 Thread kant kodali
, 2016 at 2:00 AM, kant kodali < kanth...@gmail.com > wrote: Hi Guys, I have this question for a very long time and after diving into the source code(specifically from the links below) I have a feeling that the lineage of an RDD (the transformations) are converted into byte code and stored in

is the Lineage of RDD stored as a byte code in memory or a file?

2016-08-23 Thread kant kodali
Hi Guys, I have had this question for a very long time, and after diving into the source code (specifically from the links below) I have a feeling that the lineage of an RDD (the transformations) is converted into byte code and stored in memory or on disk. Or if I were to ask another question on a