Re: Question about spark.sql min_by

2022-02-21 Thread Mich Talebzadeh
I gave a similar answer using windowing functions in the thread "add an auto_increment column" dated 7th February: https://lists.apache.org/list.html?user@spark.apache.org HTH
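A minimal sketch of the windowing approach (assuming a dataframe df with the productId/sellerId/price columns from the original question below; the names come from that question, not from the other thread):

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # Rank sellers within each product by ascending price, keep the cheapest.
    w = Window.partitionBy("productId").orderBy(F.col("price").asc())
    cheapest = (df.withColumn("rn", F.row_number().over(w))
                  .filter(F.col("rn") == 1)
                  .drop("rn"))

row_number() keeps exactly one row per product even when two sellers tie on price; use rank() instead if all tied sellers should be kept.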

Re: Question about spark.sql min_by

2022-02-21 Thread David Diebold
Thank you for your answers. Indeed, windowing should help there. Also, I just realized I could try creating a struct column with both price and sellerId and applying min() on it; the ordering would consider price first (https://stackoverflow.com/a/52669177/2015762). Cheers!
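A hedged sketch of that struct trick: Spark compares structs field by field, left to right, so putting price first makes min() pick the lowest price and carry the matching sellerId along (column names assumed from the original question):

    from pyspark.sql import functions as F

    # min() over a struct orders by price first, then sellerId as tiebreaker.
    result = (df.groupBy("productId")
                .agg(F.min(F.struct("price", "sellerId")).alias("cheapest"))
                .select("productId",
                        F.col("cheapest.price").alias("price"),
                        F.col("cheapest.sellerId").alias("sellerId")))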

Re: Question about spark.sql min_by

2022-02-21 Thread ayan guha
Why can this not be done by a window function? Or is min_by just a shorthand? On Tue, 22 Feb 2022 at 12:42 am, Sean Owen wrote: > From the source code, it looks like this function was added to pyspark in > Spark 3.3, up for release soon. It exists in SQL. You can still use it in > SQL with

Re: Question about spark.sql min_by

2022-02-21 Thread Sean Owen
From the source code, it looks like this function was added to pyspark in Spark 3.3, up for release soon. It exists in SQL. You can still use it in SQL with `spark.sql(...)` in Python though, not hard. On Mon, Feb 21, 2022 at 4:01 AM David Diebold wrote: > Hello all, > > I'm trying to use the
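A minimal sketch of this suggestion (the view and column names are assumptions taken from the question below): since min_by exists in Spark SQL even though the pyspark wrapper only lands in 3.3, expose the dataframe as a temp view and call it through spark.sql(...).

    # min_by(x, y) returns the value of x for the row with the minimum y.
    df.createOrReplaceTempView("offers")
    cheapest = spark.sql("""
        SELECT productId,
               min_by(sellerId, price) AS sellerId,
               min(price)              AS price
        FROM offers
        GROUP BY productId
    """)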

Question about spark.sql min_by

2022-02-21 Thread David Diebold
Hello all,

I'm trying to use the spark.sql min_by aggregation function with pyspark. I'm relying on this distribution of Spark: spark-3.2.1-bin-hadoop3.2

I have a dataframe made of these columns:
- productId : int
- sellerId : int
- price : double

For each product, I want to get the seller who
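For reference, a tiny reproducible setup matching the schema above, which the sketches earlier in the thread assume as df (the sample rows are invented for illustration):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    df = spark.createDataFrame(
        [(1, 10, 9.99), (1, 11, 7.49), (2, 10, 3.25)],
        schema="productId INT, sellerId INT, price DOUBLE")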