I got the answer from Mich's reply. Thank you both.
frakass
On 08/02/2022 16:36, Gourav Sengupta wrote:
Hi,
so do you want to rank apple and tomato both as 2? Not quite clear on
the use case here though.
Regards,
Gourav Sengupta
On Tue, Feb 8, 2022 at 7:10 AM wrote:
Maybe col func is not even needed here. :)

>>> df.select(F.dense_rank().over(wOrder).alias("rank"),
...   "fruit","amount").show()
+----+------+------+
|rank| fruit|amount|
+----+------+------+
|   1|cherry|     5|
|   2| apple|     3|
|   2|tomato|     3|
|   3|orange|     2|
+----+------+------+
simple, either rank() or dense_rank():

>>> from pyspark.sql import functions as F
>>> from pyspark.sql.functions import col
>>> from pyspark.sql.window import Window
>>> wOrder = Window().orderBy(df['amount'].desc())
>>> df.select(F.rank().over(wOrder).alias("rank"), col('fruit'),
...   col('amount')).show()
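For readers without a Spark shell handy, the tie behaviour of rank() versus dense_rank() can be sketched in plain Python. This is only an illustration of the semantics, not Spark itself; the helper names are made up, and the fruit/amount data is the sample from this thread:

```python
# Sketch of rank() vs dense_rank() tie semantics (not Spark).

def rank(values):
    # rank(): ties share a rank; the next rank skips ahead by the
    # number of tied rows (1, 2, 2, 4, ...).
    # Assumes values are already sorted descending.
    return [values.index(v) + 1 for v in values]

def dense_rank(values):
    # dense_rank(): ties share a rank; no gaps (1, 2, 2, 3, ...).
    seen = []
    ranks = []
    for v in values:
        if v not in seen:
            seen.append(v)
        ranks.append(seen.index(v) + 1)
    return ranks

# Sample data from the thread, sorted by amount descending.
rows = [("cherry", 5), ("apple", 3), ("tomato", 3), ("orange", 2)]
amounts = [amount for _, amount in rows]

print(rank(amounts))        # [1, 2, 2, 4]
print(dense_rank(amounts))  # [1, 2, 2, 3]
```

Note how apple and tomato (both 3) share rank 2 either way; the difference is whether orange then gets 4 (rank) or 3 (dense_rank).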
https://stackoverflow.com/a/51854022/299676
On Tue, 8 Feb 2022 at 09:25, Stelios Philippou wrote:
> This has the information that you require in order to add an extra column
> with a sequence to it.
Hello Gourav

As you can see here, orderBy has already given the solution for the "equal
amount" case:

df = sc.parallelize([("orange",2),("apple",3),("tomato",3),("cherry",5)]).toDF(['fruit','amount'])
df.select("*").orderBy("amount",ascending=False).show()
+------+------+
| fruit|amount|
+------+------+
|cherry|     5|
| apple|     3|
|tomato|     3|
|orange|     2|
+------+------+
Hi,
sorry once again, will try to understand the problem first :)
As we can clearly see, the initial responses were incorrectly guessing the
solution to be the monotonically_increasing_id function.
What if there are two fruits with equal amount? For any real life
application, can we understand what
For this requirement you can use rank or dense_rank.
On Tue, 8 Feb 2022 at 1:12 pm, wrote:
Hello,
For this query:
df.select("*").orderBy("amount",ascending=False).show()
+------+------+
| fruit|amount|
+------+------+
|tomato|     9|
| apple|     6|
|cherry|     5|
|orange|     3|
+------+------+
I want to add a column "top", in which the value is: 1,2,3... meaning
top1, top2,
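The "top" column asked for here is what row_number() over a descending window produces in Spark. A minimal plain-Python sketch of that numbering, using the sample data above (an illustration of the semantics, not Spark itself):

```python
# Sketch of the requested "top" column: number the rows 1, 2, 3, ...
# after sorting by amount descending (row_number() semantics).

rows = [("tomato", 9), ("apple", 6), ("cherry", 5), ("orange", 3)]

ranked = [
    (top, fruit, amount)
    for top, (fruit, amount) in enumerate(
        sorted(rows, key=lambda r: r[1], reverse=True), start=1
    )
]

for top, fruit, amount in ranked:
    print(top, fruit, amount)
```

Unlike rank()/dense_rank(), row_number() always assigns distinct consecutive numbers, so two fruits with equal amount would get arbitrary but different "top" values.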
Hi,
can we understand the requirement first?
What is that you are trying to achieve by auto increment id? Do you just
want different ID's for rows, or you may want to keep track of the record
count of a table as well, or do you want to do use them for surrogate keys?
If you are going to insert
monotonically_increasing_id() will give the same functionality
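Worth noting: monotonically_increasing_id() guarantees increasing and unique IDs, not consecutive ones. Per the Spark docs, the current implementation puts the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits. A plain-Python sketch of that layout (the partition sizes here are made up for illustration):

```python
# Sketch of how monotonically_increasing_id() composes an ID:
# partition ID in the upper 31 bits, per-partition record number
# in the lower 33 bits. IDs are increasing and unique, but there
# are gaps between partitions -- they are NOT consecutive, so this
# is not a drop-in replacement for MySQL's auto_increment.

def monotonic_id(partition_id, record_number):
    return (partition_id << 33) | record_number

# Two partitions with 2 records each (sizes are illustrative):
ids = [monotonic_id(p, r) for p in range(2) for r in range(2)]
print(ids)  # [0, 1, 8589934592, 8589934593]
```

The jump from 1 to 8589934592 between partitions is exactly why the later replies in this thread steer the requirement toward rank/row_number instead.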
On Mon, 7 Feb, 2022, 6:57 am , wrote:
> For a dataframe object, how to add a column who is auto_increment like
> mysql's behavior?
>
> Thank you.
>
Try this:
https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.functions.monotonically_increasing_id.html
On Mon, 7 Feb 2022 at 12:27 pm, wrote:
> For a dataframe object, how to add a column who is auto_increment like
> mysql's behavior?
>
> Thank you.
For a dataframe object, how to add a column who is auto_increment like
mysql's behavior?
Thank you.
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org