[ https://issues.apache.org/jira/browse/SPARK-25996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marco Gaido resolved SPARK-25996.
---------------------------------
    Resolution: Not A Problem

[~igomezraggio] check the ts of the first row. It is {{00:00:01}}, so the result is correct.

> Aggregations do not return the correct values for rows with equal timestamps
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-25996
>                 URL: https://issues.apache.org/jira/browse/SPARK-25996
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.1, 2.4.0
>        Environment: Windows 10
> PyCharm 2018.2.2
> Python 3.6
>            Reporter: Ignacio Gómez
>            Priority: Major
>
> Hi all,
> When using PySpark I compute, for each row, the count of rows whose timestamp falls within the 5000 milliseconds preceding the current row's timestamp, including the current row itself, with the following query:
> query = """
> select *, count(*) over (partition by ACCOUNTID
> order by TS
> range between interval 5000 milliseconds preceding and current row) as total_count
> from df3
> """
> df3 = sqlContext.sql(query)
> which returns the following:
>
> |ACCOUNTID|AMOUNT|TS|total_count|
> |1|100|2018-01-01 00:00:01|1|
> |1|1000|2018-01-01 10:00:01|1|
> |1|25|2018-01-01 10:00:02|2|
> |1|500|2018-01-01 10:00:03|3|
> |1|100|2018-01-01 10:00:04|4|
> |1|80|2018-01-01 10:00:05|5|
> |1|700|2018-01-01 11:00:04|1|
> |1|205|2018-01-02 10:00:02|1|
> |1|500|2018-01-02 10:00:03|2|
> |3|80|2018-01-02 10:00:05|1|
>
> As you can see, in the third row total_count should be 3 instead of 2, because there are 2 previous records, not 1. The error then propagates to the following rows.
> The same happens with the other aggregation operations.
> Even though the timestamp of the first rows is the same, that does not mean that only one of them exists; both rows should be counted, rather than treating the last row with that timestamp as the only one that exists.
>
> Could you help me?
> Thank you
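To make the resolution concrete, below is a minimal PySpark sketch (not part of the original report; it assumes a local SparkSession and copies the data from the table above) that rebuilds the query. It illustrates why total_count is 2 for the row at 10:00:02: the RANGE frame only admits rows whose TS falls within the 5000 milliseconds preceding the current row's TS, and the first row's timestamp (00:00:01) is roughly ten hours earlier, so it is excluded.

{code:python}
# Reproduction sketch for SPARK-25996 (assumes a local SparkSession; data is
# taken from the table in the report, not fabricated).
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_timestamp

spark = SparkSession.builder.appName("SPARK-25996").getOrCreate()

rows = [
    (1, 100,  "2018-01-01 00:00:01"),
    (1, 1000, "2018-01-01 10:00:01"),
    (1, 25,   "2018-01-01 10:00:02"),
    (1, 500,  "2018-01-01 10:00:03"),
    (1, 100,  "2018-01-01 10:00:04"),
    (1, 80,   "2018-01-01 10:00:05"),
    (1, 700,  "2018-01-01 11:00:04"),
    (1, 205,  "2018-01-02 10:00:02"),
    (1, 500,  "2018-01-02 10:00:03"),
    (3, 80,   "2018-01-02 10:00:05"),
]
df = (spark.createDataFrame(rows, ["ACCOUNTID", "AMOUNT", "TS"])
           .withColumn("TS", to_timestamp("TS")))
df.createOrReplaceTempView("df3")

# The RANGE frame is defined on the TS values themselves, so only rows whose
# timestamp lies in [current TS - 5 seconds, current TS] are counted. For the
# row at 2018-01-01 10:00:02 that is the rows at 10:00:01 and 10:00:02, hence
# total_count = 2; the row at 00:00:01 is outside the frame.
spark.sql("""
    select *, count(*) over (partition by ACCOUNTID
                             order by TS
                             range between interval 5000 milliseconds preceding
                                       and current row) as total_count
    from df3
""").show(truncate=False)
{code}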