Re: How to split a dataframe into two dataframes based on count

2020-05-18 Thread Vipul Rajan
Hi Mohit, "Seems like the limit on parent is executed twice and return different records each time. Not sure why it is executed twice when I mentioned only once." That is to be expected. Since Spark follows lazy evaluation, which means that execution only happens when you call an action, every
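
The re-evaluation described above can be sketched without Spark. The following is a plain-Scala analogy only: the `rows` vector, the `countEvaluations` helper, and the collection view are hypothetical stand-ins for a parent DataFrame, a pair of actions, and a lazy plan, and are not Spark APIs. It counts how often the upstream stage runs when a lazy pipeline is traversed twice versus materialized once.

```scala
// Plain-Scala analogy only: `rows`, `countEvaluations`, and the view pipeline
// are hypothetical stand-ins, not Spark APIs.
object LazyReevaluation {
  private val rows: Vector[Int] = (1 to 100).toVector

  /** Runs two "actions" over a lazy pipeline and returns how many times
    * the upstream stage was evaluated. */
  def countEvaluations(materializeOnce: Boolean): Int = {
    var evaluations = 0
    // Lazy: nothing below runs until the view is traversed.
    val limited = rows.view.map { r => evaluations += 1; r }.take(5)

    if (materializeOnce) {
      val snapshot = limited.toVector // evaluate once and keep the result
      val first = snapshot
      val second = snapshot.map(_ * 2)
      require(first.size == second.size)
    } else {
      val first = limited.toVector             // first traversal
      val second = limited.map(_ * 2).toVector // second traversal recomputes
      require(first.size == second.size)
    }
    evaluations
  }

  def main(args: Array[String]): Unit = {
    println(s"re-evaluated per action: ${countEvaluations(materializeOnce = false)}")
    println(s"materialized once:       ${countEvaluations(materializeOnce = true)}")
  }
}
```

In Spark itself, the usual counterpart of the `snapshot` step is to call `cache()` or `persist()` on the limited DataFrame before deriving the two halves from it. A `limit` without an explicit ordering does not promise the same rows on every evaluation, which is why the two runs can disagree.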

Re: Issue with UDF Int Conversion - Str to Int

2020-03-23 Thread Vipul Rajan
Hi Ayan, You don't have to bother with conversion at all. All functions that work on numeric columns will still work as long as all values in the column are numbers:

scala> df2.printSchema
root
 |-- id: string (nullable = false)
 |-- id2: string (nullable = false)

scala> df2.show

Re: [External]Re: spark 2.x design docs

2019-09-19 Thread Vipul Rajan
ng specifically for documents Spark committers use for reference. > > > > Currently I’ve put custom logs in the spark-core sources, then build and > run jobs on it. > > From the printed logs I try to understand execution flows. > > > > *From:* Vipul Rajan > *Sent:* T

Re: spark 2.x design docs

2019-09-19 Thread Vipul Rajan
https://github.com/JerryLead/SparkInternals/blob/master/EnglishVersion/2-JobLogicalPlan.md This is pretty old, but it might help a little bit. I myself am going through the source code and trying to reverse engineer stuff. Let me know if you'd like to pool resources sometime. Regards On Thu, Sep

Re: Use derived column for other derived column in the same statement

2019-04-22 Thread Vipul Rajan
Hi Rishi, TL;DR: using Scala, this would work: df.withColumn("derived1", lit("something")).withColumn("derived2", col("derived1") === "something") Just note that I used three equals signs instead of two. That should be enough; if you want to understand why, read further. So "==" gives boolean
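
The difference between the two operators can be sketched in plain Scala. The `Expr` class below is a hypothetical stand-in for a column expression, not Spark's `Column`, and `render` is an illustrative helper. It only shows the general idea: `==` is ordinary object equality and produces a Boolean immediately, while a `===` method can return a new expression to be evaluated per row later.

```scala
// Hypothetical stand-in for a column expression; this is NOT Spark's Column.
final case class Expr(sql: String) {
  // Builds a new expression describing the comparison instead of performing it.
  def ===(other: Any): Expr = Expr(s"($sql = ${Expr.render(other)})")
}

object Expr {
  // Illustrative helper: renders an operand as SQL-like text.
  def render(value: Any): String = value match {
    case e: Expr   => e.sql
    case s: String => s"'$s'"
    case other     => other.toString
  }
}

object EqualityDemo {
  def main(args: Array[String]): Unit = {
    val derived1 = Expr("derived1")
    val literal: Any = "something"

    // `==` is ordinary object equality: it runs right now and yields a Boolean.
    val eager: Boolean = derived1 == literal

    // `===` yields another expression that can be evaluated per row later.
    val deferred: Expr = derived1 === literal

    println(eager)
    println(deferred.sql)
  }
}
```

Because `withColumn` expects a column expression as its second argument, a Boolean produced by `==` would not fit there, which is consistent with the advice to use `===`.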

Re: Structured Streaming initialized with cached data or others

2019-04-22 Thread Vipul Rajan
Please look into arbitrary stateful aggregation. I do not completely understand your problem, though. If you could give me an example, I'd be happy to help. On Mon, 22 Apr 2019, 15:31 shicheng31...@gmail.com, wrote: > Hi, all: > As we all know, structured streaming is used to handle