Re: Select entire row based on a logic applied on 2 columns across multiple rows

2017-08-30 Thread Andrés Ivaldi
I see. As @ayan said, it's valid, but why not use the API or SQL? The built-in options are optimized. I understand that the SQL API is hard to work with when you are trying to build an API over it, but the Spark API isn't, and you can do a lot with it. Regards, On Wed, Aug 30, 2017 at 10:31 AM, ayan guha

Re: Select entire row based on a logic applied on 2 columns across multiple rows

2017-08-30 Thread ayan guha
Well, using raw SQL is a valid option, but if you do not want to, you can always implement the concept using the APIs. All these constructs have API counterparts, such as filter, window, over, row_number, etc. On Wed, 30 Aug 2017 at 10:49 pm, purna pradeep wrote: > @Andres I

Re: Select entire row based on a logic applied on 2 columns across multiple rows

2017-08-30 Thread purna pradeep
@Andres I need the latest value, but it should be less than 10 months old based on the income_age column, and I don't want to use SQL here. On Wed, Aug 30, 2017 at 8:08 AM Andrés Ivaldi wrote: > Hi, if you need the last value of income in a window function you can use > last_value. > Not tested, but maybe

Re: Select entire row based on a logic applied on 2 columns across multiple rows

2017-08-30 Thread Andrés Ivaldi
Hi, if you need the last value of income in a window function, you can use last_value. Not tested, but maybe with @ayan's SQL: spark.sql("select *, row_number(), last_value(income) over (partition by id order by income_age_ts desc) r from t") On Tue, Aug 29, 2017 at 11:30 PM, purna pradeep
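The query in the message above is flagged as untested, and as written its row_number() has no OVER clause. The sketch below shows one way the "latest row per id, no older than 10 months" idea could look with standard window syntax. It is only an illustration under stated assumptions: it runs against SQLite through Python's sqlite3 module as a stand-in for spark.sql, the sample rows and the as-of date are made up, the dates are ISO-formatted rather than M/D/YY, and the function name latest_recent_income is hypothetical. The date arithmetic is SQLite-specific; in Spark SQL a function such as add_months would play that role.

```python
import sqlite3


def latest_recent_income(rows, as_of, max_age_months=10):
    """Return, per id, the newest row whose income_age_ts is less than
    max_age_months older than as_of (ISO date string).

    Assumption: this is one reading of "latest, but less than 10 months";
    the thread does not spell out the exact rule.
    """
    # Window functions need SQLite 3.25 or newer.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER, dept TEXT, income INTEGER, income_age_ts TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

    query = """
        SELECT id, dept, income, income_age_ts
        FROM (
            SELECT t.*,
                   row_number() OVER (
                       PARTITION BY id ORDER BY income_age_ts DESC
                   ) AS rn
            FROM t
            WHERE income_age_ts > date(?, ?)   -- drop rows that are too old
        ) AS ranked
        WHERE rn = 1                            -- newest remaining row per id
        ORDER BY id
    """
    modifier = f"-{int(max_age_months)} months"
    result = conn.execute(query, (as_of, modifier)).fetchall()
    conn.close()
    return result


# Hypothetical sample data: (id, dept, income, income_age_ts)
sample_rows = [
    (101, "ES", 19000, "2017-04-20"),
    (101, "OS", 10000, "2015-10-03"),
    (102, "DS", 13000, "2017-02-20"),
    (102, "CS", 12000, "2016-05-11"),
]
latest = latest_recent_income(sample_rows, as_of="2017-08-30")
```

Filtering before ranking means row number 1 is always the newest row that passed the age check. Ordering descending and keeping row number 1 also avoids relying on last_value, whose default window frame ends at the current row.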

Re: Select entire row based on a logic applied on 2 columns across multiple rows

2017-08-29 Thread purna pradeep
@ayan, thanks for your response. I would like to have functions, in this case calculateIncome, and the reason I need a function is to reuse it in other parts of the application. That's why I'm planning to use mapGroups with a function as the argument, which takes a row iterator, but I'm not sure if this is
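The shape of that idea, a reusable per-group function that receives an iterator of rows, can be sketched in plain Python. This is not Spark code: map_groups below is a hypothetical local helper that only mimics grouping by key, the IncomeRow type and sample data are invented for illustration, and the body of calculate_income is an assumption about the rule (newest row less than 10 whole months old), since the thread does not show the real calculateIncome.

```python
from datetime import date
from itertools import groupby
from typing import Callable, Iterable, Iterator, NamedTuple, Optional, Tuple


class IncomeRow(NamedTuple):
    id: int
    dept: str
    income: int
    income_age_ts: date


def months_between(earlier: date, later: date) -> int:
    """Whole months elapsed from earlier to later."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the last month is not complete yet
    return months


def calculate_income(
    rows: Iterator[IncomeRow], as_of: date, max_age_months: int = 10
) -> Optional[IncomeRow]:
    """Reusable per-group logic: newest row younger than max_age_months."""
    recent = [
        r for r in rows if months_between(r.income_age_ts, as_of) < max_age_months
    ]
    return max(recent, key=lambda r: r.income_age_ts, default=None)


def map_groups(
    rows: Iterable[IncomeRow],
    key: Callable[[IncomeRow], int],
    fn: Callable[[Iterator[IncomeRow]], Optional[IncomeRow]],
) -> Iterator[Tuple[int, Optional[IncomeRow]]]:
    """Local stand-in for a group-then-apply step: fn sees one group at a time."""
    for k, group in groupby(sorted(rows, key=key), key=key):
        yield k, fn(group)


# Hypothetical sample data.
sample = [
    IncomeRow(101, "ES", 19000, date(2017, 4, 20)),
    IncomeRow(101, "OS", 10000, date(2015, 10, 3)),
    IncomeRow(102, "DS", 13000, date(2017, 2, 20)),
    IncomeRow(103, "CS", 12000, date(2016, 5, 11)),
]
as_of = date(2017, 8, 30)
picked = dict(
    map_groups(sample, lambda r: r.id, lambda g: calculate_income(g, as_of))
)
```

Keeping calculate_income free of any grouping logic is what makes it reusable: the same function can be called on any iterator of rows, whichever mechanism produced the group.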

Re: Select entire row based on a logic applied on 2 columns across multiple rows

2017-08-29 Thread ayan guha
Hi, the tool you are looking for is a window function. Example:

>>> df.show()
+--------+----+---+------+-------------+
|JoinDate|dept| id|income|income_age_ts|
+--------+----+---+------+-------------+
| 4/20/13|  ES|101| 19000|      4/20/17|
| 4/20/13|  OS|101|     1|      10/3/15|
| 4/20/12|

Re: Select entire row based on a logic applied on 2 columns across multiple rows

2017-08-29 Thread purna pradeep
Please click on the unnamed text/html link for a better view. On Tue, Aug 29, 2017 at 8:11 PM purna pradeep wrote: > > ---------- Forwarded message --------- > From: Mamillapalli, Purna Pradeep < > purnapradeep.mamillapa...@capitalone.com> > Date: Tue, Aug 29, 2017 at 8:08 PM

Select entire row based on a logic applied on 2 columns across multiple rows

2017-08-29 Thread purna pradeep
---------- Forwarded message --------- From: Mamillapalli, Purna Pradeep Date: Tue, Aug 29, 2017 at 8:08 PM Subject: Spark question To: purna pradeep Below is the input DataFrame (in reality this is a very large DataFrame)