Re: add an auto_increment column

2022-02-08 Thread capitnfrakass
I got the answer from Mich's reply. Thank you both.

frakass

On 08/02/2022 16:36, Gourav Sengupta wrote:
> Hi,
>
> so do you want to rank apple and tomato both as 2? Not quite clear on the use case here though.
>
> Regards,
> Gourav Sengupta
>
> On Tue, Feb 8, 2022 at 7:10 AM wrote:
>> Hello Gourav

Re: add an auto_increment column

2022-02-08 Thread Gourav Sengupta
Hi,

so do you want to rank apple and tomato both as 2? Not quite clear on the use case here though.

Regards,
Gourav Sengupta

On Tue, Feb 8, 2022 at 7:10 AM wrote:
> Hello Gourav
>
> As you see here orderBy has already given the solution for "equal amount":
>
> >>> df =
> >>>
>

Re: add an auto_increment column

2022-02-08 Thread Bitfox
Maybe the col func is not even needed here. :)

>>> df.select(F.dense_rank().over(wOrder).alias("rank"), "fruit","amount").show()
+----+------+------+
|rank| fruit|amount|
+----+------+------+
|   1|cherry|     5|
|   2| apple|     3|
|   2|tomato|     3|
|   3|orange|     2|

Re: add an auto_increment column

2022-02-07 Thread Mich Talebzadeh
Simple: use either rank() or dense_rank().

>>> from pyspark.sql import functions as F
>>> from pyspark.sql.functions import col
>>> from pyspark.sql.window import Window
>>> wOrder = Window().orderBy(df['amount'].desc())
>>> df.select(F.rank().over(wOrder).alias("rank"), col('fruit'),
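
For readers following along, here is a minimal plain-Python sketch of how rank() and dense_rank() differ on ties. It does not use Spark at all; it only mimics the ranking semantics on the sample rows from this thread. The helper name rank_rows is hypothetical and is not part of any library.

```python
# Plain-Python sketch of rank() vs dense_rank() semantics (no Spark needed).
# The rows are the thread's sample data; rank_rows is a hypothetical helper.

rows = [("orange", 2), ("apple", 3), ("tomato", 3), ("cherry", 5)]


def rank_rows(rows):
    """Return (rank, dense_rank, fruit, amount) tuples, ordered by amount descending."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    result = []
    rank = 0
    dense = 0
    previous = None
    for position, (fruit, amount) in enumerate(ordered, start=1):
        if amount != previous:
            rank = position  # rank() jumps to the row position after a tie
            dense += 1       # dense_rank() advances by exactly one
            previous = amount
        result.append((rank, dense, fruit, amount))
    return result


ranked = rank_rows(rows)
for row in ranked:
    print(row)
```

With apple and tomato tied at 3, both schemes give them 2. The difference shows up on the next row: the rank-style counter skips to 4 for orange, while the dense_rank-style counter continues with 3.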

Re: add an auto_increment column

2022-02-07 Thread Stelios Philippou
https://stackoverflow.com/a/51854022/299676

On Tue, 8 Feb 2022 at 09:25, Stelios Philippou wrote:
> This has the information that you require in order to add an extra column with a sequence to it.
>
> On Tue, 8 Feb 2022 at 09:11, wrote:
>> Hello Gourav
>>
>> As you see here orderBy

Re: add an auto_increment column

2022-02-07 Thread Stelios Philippou
This has the information that you require in order to add an extra column with a sequence to it.

On Tue, 8 Feb 2022 at 09:11, wrote:
> Hello Gourav
>
> As you see here orderBy has already given the solution for "equal amount":
>
> >>> df =
> >>>
>

Re: add an auto_increment column

2022-02-07 Thread capitnfrakass
Hello Gourav,

As you can see here, orderBy has already given the solution for "equal amount":

>>> df = sc.parallelize([("orange",2),("apple",3),("tomato",3),("cherry",5)]).toDF(['fruit','amount'])
>>> df.select("*").orderBy("amount",ascending=False).show()
+------+------+
| fruit|amount|

Re: add an auto_increment column

2022-02-07 Thread Gourav Sengupta
Hi, sorry once again, will try to understand the problem first :) As we can clearly see, the initial responses incorrectly guessed the solution to be the monotonically_increasing_id function. What if there are two fruits with equal amount? For any real life application, can we understand what

Re: add an auto_increment column

2022-02-07 Thread ayan guha
For this requirement you can use rank or dense_rank.

On Tue, 8 Feb 2022 at 1:12 pm, wrote:
> Hello,
>
> For this query:
>
> >>> df.select("*").orderBy("amount",ascending=False).show()
> +------+------+
> | fruit|amount|
> +------+------+
> |tomato|     9|
> | apple|     6|
> |cherry|     5|
> |orange|

Re: add an auto_increment column

2022-02-07 Thread capitnfrakass
Hello,

For this query:

>>> df.select("*").orderBy("amount",ascending=False).show()
+------+------+
| fruit|amount|
+------+------+
|tomato|     9|
| apple|     6|
|cherry|     5|
|orange|     3|
+------+------+

I want to add a column "top", in which the value is: 1,2,3... meaning top1, top2,

Re: add an auto_increment column

2022-02-07 Thread Gourav Sengupta
Hi, can we understand the requirement first? What is it that you are trying to achieve with an auto-increment ID? Do you just want different IDs for rows, or do you also want to keep track of the record count of a table, or do you want to use them as surrogate keys? If you are going to insert

Re: add an auto_increment column

2022-02-06 Thread Siva Samraj
monotonically_increasing_id() will give the same functionality.

On Mon, 7 Feb, 2022, 6:57 am, wrote:
> For a dataframe object, how to add a column that is auto_increment, like MySQL's behavior?
>
> Thank you.
>
> -
> To
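
A caveat worth illustrating: the Spark documentation describes the IDs from monotonically_increasing_id() as increasing and unique but not consecutive, with the partition ID in the upper 31 bits and the record number within the partition in the lower 33 bits. The plain-Python sketch below only mimics that documented layout; it does not call Spark, and sketch_ids is a hypothetical helper name.

```python
# Plain-Python sketch of the ID layout documented for
# monotonically_increasing_id(): partition ID in the upper bits,
# per-partition record number in the lower 33 bits.
# sketch_ids is a hypothetical helper, not a Spark API.

RECORD_BITS = 33


def sketch_ids(partition_sizes):
    """Return the IDs a layout like this would yield for the given partition sizes."""
    ids = []
    for partition_index, size in enumerate(partition_sizes):
        for record_number in range(size):
            ids.append((partition_index << RECORD_BITS) + record_number)
    return ids


# Two partitions with three records each.
ids = sketch_ids([3, 3])
print(ids)
```

The IDs for the second partition start at 1 << 33, so unlike MySQL's auto_increment the sequence has large gaps whenever the data spans more than one partition.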

Re: add an auto_increment column

2022-02-06 Thread ayan guha
Try this:
https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.functions.monotonically_increasing_id.html

On Mon, 7 Feb 2022 at 12:27 pm, wrote:
> For a dataframe object, how to add a column that is auto_increment, like MySQL's behavior?
>
> Thank you.
>
>

add an auto_increment column

2022-02-06 Thread capitnfrakass
For a DataFrame object, how do I add a column that is auto_increment, like MySQL's behavior?

Thank you.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org