[jira] [Commented] (SPARK-37491) Fix Series.asof when values of the series is not sorted

pralabhkumar (Jira) Mon, 10 Jan 2022 06:24:14 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-37491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472056#comment-17472056
 ]


pralabhkumar commented on SPARK-37491:
--------------------------------------

Let's take this example:

pser = pd.Series([2, 1, np.nan, 4], index=[10, 20, 30, 40], name="Koalas")

pser.asof([5, 20]) gives [NaN, 1]

while

ps.from_pandas(pser).asof([5, 20]) gives [NaN, 2]

*Explanation*

This is the DataFrame produced by applying the condition

F.when(index_scol <= SF.lit(index).cast(index_type)

without the max aggregation:

+-----+------+-----------------+
|col_5|col_20|__index_level_0__|
+-----+------+-----------------+
|null |2.0   |10               |
|null |1.0   |20               |
|null |null  |30               |
|null |null  |40               |
+-----+------+-----------------+

Since we take the max of each column, the output for 20 is 2. What we actually 
need is the last non-null value of each column, in increasing order of 
__index_level_0__.
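
To make the difference concrete, here is a minimal pure-Python sketch (no Spark involved) that mimics the conditional column and compares the max aggregation with "last non-null value by increasing index". The helper names are hypothetical and exist only for this illustration, and treating NaN as null is an assumption of the sketch.

```python
import math

# Series from the example, as parallel lists sorted by index.
index = [10, 20, 30, 40]
values = [2.0, 1.0, math.nan, 4.0]


def conditional_column(where):
    """Mimic F.when(index <= where, value): keep the value where the index
    qualifies, otherwise None. Assumption: NaN is treated as null."""
    return [
        v if i <= where and not math.isnan(v) else None
        for i, v in zip(index, values)
    ]


def asof_with_max(where):
    """Behaviour described above: the largest non-null value in the column."""
    non_null = [v for v in conditional_column(where) if v is not None]
    return max(non_null) if non_null else None


def asof_with_last(where):
    """Desired behaviour: the last non-null value by increasing index."""
    non_null = [v for v in conditional_column(where) if v is not None]
    return non_null[-1] if non_null else None


print([asof_with_max(w) for w in (5, 20)])   # [None, 2.0]
print([asof_with_last(w) for w in (5, 20)])  # [None, 1.0]
```

The two functions only disagree when the qualifying values are not sorted, which is exactly the case in this issue.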

Now, to implement the logic: what I am planning to do is build the DataFrame 
below from the one above, using explode, partition and row_number.

__index_level_0__    Identifier    value    row_number
40                   col_5         null     1
30                   col_5         null     2
20                   col_5         null     3
10                   col_5         null     4
40                   col_20        2        1
30                   col_20        1        2
20                   col_20        null     3
10                   col_20        null     4

 

Then filter on row_number = 1. There are other details to take care of, but 
this is the bulk of the logic.
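
As a rough illustration of the explode / partition / row_number idea, here is a pure-Python sketch that does not use Spark. It assumes that null values are dropped before ranking and that rows within each identifier are ordered by index descending; both are guesses at the "other details", not a description of the actual patch. The variable names are made up for the example.

```python
from itertools import groupby

# Wide rows: (index, {column: value}); None stands for null.
wide_rows = [
    (10, {"col_5": None, "col_20": 2.0}),
    (20, {"col_5": None, "col_20": 1.0}),
    (30, {"col_5": None, "col_20": None}),
    (40, {"col_5": None, "col_20": None}),
]

# "Explode": one (identifier, index, value) row per cell.
long_rows = [
    (name, idx, val) for idx, cols in wide_rows for name, val in cols.items()
]

# Assumption: nulls are removed before ranking.
non_null = [row for row in long_rows if row[2] is not None]

# "Partition" by identifier, order by index descending, assign row numbers.
non_null.sort(key=lambda row: (row[0], -row[1]))
ranked = []
for name, group in groupby(non_null, key=lambda row: row[0]):
    for row_number, (_, idx, val) in enumerate(group, start=1):
        ranked.append((name, idx, val, row_number))

# Keep row_number == 1; identifiers with no non-null value stay None.
result = {name: None for _, cols in wide_rows for name in cols}
for name, idx, val, row_number in ranked:
    if row_number == 1:
        result[name] = val

print(result)  # {'col_5': None, 'col_20': 1.0}
```

In Spark terms the ranking step would correspond to a window partitioned by the identifier and ordered by the index, but the exact calls are left out here on purpose.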

Please let me know if this is the right direction. (It actually passes all the 
asof test cases, including the case described in the Jira.)

 

[~itholic]  

> Fix Series.asof when values of the series is not sorted
> -------------------------------------------------------
>
>                 Key: SPARK-37491
>                 URL: https://issues.apache.org/jira/browse/SPARK-37491
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: dch nguyen
>            Priority: Major
>
> https://github.com/apache/spark/pull/34737#discussion_r758223279



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

