[ https://issues.apache.org/jira/browse/SPARK-25996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marco Gaido resolved SPARK-25996.
---------------------------------
    Resolution: Not A Problem

[~igomezraggio] Check the TS of the first row: it is {{00:00:01}}, not 
{{10:00:01}}. It therefore falls outside the 5-second window of the third 
row, so the result is correct.
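You can confirm this without Spark by recomputing the frame by hand. The plain-Python sketch below is an illustration, not Spark's implementation; the helper name `range_window_count` is hypothetical. It applies a RANGE frame running from 5000 ms preceding through the current row, partitioned by ACCOUNTID, to the rows from the report. It assumes the TS strings are naive local timestamps in the format shown.

```python
from datetime import datetime, timedelta

# Rows as listed in the report: (ACCOUNTID, AMOUNT, TS)
ROWS = [
    (1, 100, "2018-01-01 00:00:01"),
    (1, 1000, "2018-01-01 10:00:01"),
    (1, 25, "2018-01-01 10:00:02"),
    (1, 500, "2018-01-01 10:00:03"),
    (1, 100, "2018-01-01 10:00:04"),
    (1, 80, "2018-01-01 10:00:05"),
    (1, 700, "2018-01-01 11:00:04"),
    (1, 205, "2018-01-02 10:00:02"),
    (1, 500, "2018-01-02 10:00:03"),
    (3, 80, "2018-01-02 10:00:05"),
]


def range_window_count(rows, window):
    """Hypothetical helper: for each row, count the rows in the same ACCOUNTID
    partition whose TS lies in [current TS - window, current TS]. This mirrors
    RANGE BETWEEN <window> PRECEDING AND CURRENT ROW."""
    parsed = [(acct, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"))
              for acct, _, ts in rows]
    counts = []
    for acct, ts in parsed:
        counts.append(sum(
            1 for other_acct, other_ts in parsed
            if other_acct == acct and ts - window <= other_ts <= ts
        ))
    return counts


print(range_window_count(ROWS, timedelta(milliseconds=5000)))
# [1, 1, 2, 3, 4, 5, 1, 1, 2, 1]
```

The output matches the total_count column Spark returned. The row at {{00:00:01}} is about ten hours earlier than the third row, so it never enters that row's frame.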

> Aggregations do not return the correct values for rows with equal timestamps
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-25996
>                 URL: https://issues.apache.org/jira/browse/SPARK-25996
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.1, 2.4.0
>         Environment: Windows 10
> PyCharm 2018.2.2
> Python 3.6
>  
>            Reporter: Ignacio Gómez
>            Priority: Major
>
> Hi all,
> When using PySpark, I count, for each row, the rows whose timestamp falls 
> within the preceding 5 seconds, including the current row itself. I use the 
> following query:
> query = """
>     select *, count(*) over (partition by ACCOUNTID
>                              order by TS
>                              range between interval 5000 milliseconds preceding and current row) as total_count
>     from df3
> """
> df3 = sqlContext.sql(query)
> It returns the following:
>  
> |ACCOUNTID|AMOUNT|TS|total_count|
> |1|100|2018-01-01 00:00:01|1|
> |1|1000|2018-01-01 10:00:01|1|
> |1|25|2018-01-01 10:00:02|2|
> |1|500|2018-01-01 10:00:03|3|
> |1|100|2018-01-01 10:00:04|4|
> |1|80|2018-01-01 10:00:05|5|
> |1|700|2018-01-01 11:00:04|1|
> |1|205|2018-01-02 10:00:02|1|
> |1|500|2018-01-02 10:00:03|2|
> |3|80|2018-01-02 10:00:05|1|
>  
> As you can see, total_count in the third row should be 3 instead of 2, 
> because there are two previous records, not one. The error carries over 
> into the following rows.
> The same happens with the other aggregation functions.
> The first rows share the same timestamp, but both rows still exist and 
> should be counted. The window should not treat the last row with that 
> timestamp as the only one.
>  
> Could you help me?
> Thank you
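The report is about how rows with identical timestamps are treated. In standard SQL semantics, which Spark follows for RANGE frames, a frame ending at CURRENT ROW includes every peer row with the same ORDER BY value. A ROWS frame, in contrast, counts physical positions. The plain-Python sketch below contrasts the two on a hypothetical variant of the data in which the first two rows really do share {{10:00:01}}. The data, the helper names `range_frame_counts` and `rows_frame_counts`, and the choice of 5 preceding rows are illustrative assumptions, not Spark code.

```python
from datetime import datetime, timedelta

# Hypothetical data: unlike the report, the first two rows share 10:00:01.
TIMESTAMPS = [
    "2018-01-01 10:00:01",
    "2018-01-01 10:00:01",
    "2018-01-01 10:00:02",
    "2018-01-01 10:00:03",
]


def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def range_frame_counts(timestamps, window):
    """RANGE BETWEEN <window> PRECEDING AND CURRENT ROW: the bounds are TS
    values, so every row with the same TS (a peer) falls inside the frame."""
    ts = [parse(t) for t in timestamps]
    return [sum(1 for other in ts if cur - window <= other <= cur) for cur in ts]


def rows_frame_counts(timestamps, n_preceding):
    """ROWS BETWEEN <n> PRECEDING AND CURRENT ROW: the bounds are positions in
    the sorted order, so ties are split by whatever order the rows happen to
    have (Python's stable sort keeps input order here)."""
    ordered = sorted(range(len(timestamps)), key=lambda i: parse(timestamps[i]))
    counts = [0] * len(timestamps)
    for pos, idx in enumerate(ordered):
        counts[idx] = min(pos, n_preceding) + 1
    return counts


print(range_frame_counts(TIMESTAMPS, timedelta(seconds=5)))  # [2, 2, 3, 4]
print(rows_frame_counts(TIMESTAMPS, 5))                      # [1, 2, 3, 4]
```

With a genuine tie, the RANGE frame gives 3 for the row at {{10:00:02}}, which is the count the reporter expected. The RANGE frame already counts tied rows, so the 2 in the reported output reflects the actual timestamps rather than any dropped duplicate.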



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
