RalfJL commented on issue #12949: URL: https://github.com/apache/pinot/issues/12949#issuecomment-2082309994
Thanks for the query. I can confirm that it returns the same amount of records as the query ``` select a.acctuniquesessionid, a.acctstatustype, a.ts, b.acctstatustype, b.ts, (cast(b.ts as BIGINT)-cast(a.ts as BIGINT))/1000 as Zeit from radius_start a join radius_stop b on a.acctuniquesessionid = b.acctuniquesessionid order by Zeit desc ``` But it is not the same as the second query with the "not exists" clause. Please correct me if I am wrong. And by the way, my "not exists" clause is wrong, see explanation below. Never the less it is astonishing, that adding a filter produces more output than the same query without the filter. The overall goal is to find matching Start-Stop Records though they have the same "acctuniquesessionid". The key is the timestamp column "ts". Think about the following example: acctuniquesssionid ts-Start ts-Stop 099d8dd5581b5e511f895c7c736913d4 2024-04-28 13:05:05.0 **2024-04-28 13:15:08.0** 099d8dd5581b5e511f895c7c736913d4 **2024-04-28 13:15:09.0** 2024-04-28 14:24:48.0 Simply joining both tables will return 4 records. ts acctstatustype ts acctstatustype Zeit 2024-04-28 13:15:09.0 Start 2024-04-28 13:15:08.0 Stop -1 2024-04-28 13:15:09.0 Start 2024-04-28 14:24:48.0 Stop 4179 2024-04-28 13:05:05.0 Start 2024-04-28 13:15:08.0 Stop 603 2024-04-28 13:05:05.0 Start 2024-04-28 14:24:48.0 Stop 4783 And obviously the first and the last records are wrong Next step is to filter out the record, where Stop is before Start (kick out the 1. Record) ``` select a.ts, a.acctstatustype, b.ts, b.acctstatustype,(cast(b.ts as BIGINT) - cast(a.ts as BIGINT)) / 1000 as Zeit from radius_start a inner join radius_stop b on a.acctuniquesessionid = b.acctuniquesessionid where a.ts < b.ts and a.acctuniquesessionid = '099d8dd5581b5e511f895c7c736913d4' ``` And the next step is to kick out the pair where Start from the first session is combined with the Stop of the second session. In other words there can not be any other Start record between our Start-Stop pair ``` select a.ts, a.acctstatustype, b.ts, b.acctstatustype,(cast(b.ts as BIGINT) - cast(a.ts as BIGINT)) / 1000 as Zeit from radius_start a inner join radius_stop b on a.acctuniquesessionid = b.acctuniquesessionid and a.ts < b.ts where a.acctuniquesessionid = '099d8dd5581b5e511f895c7c736913d4' and not exists ( select 1 from radius_start c where c.acctuniquesessionid = a.acctuniquesessionid and cast(c.ts as BIGINT) between cast(a.ts as BIGINT) +1 and cast(b.ts as BIGINT) -1) ``` Which returns the desired pairs: acctuniquesessionid ts acctstatustype ts acctstatustype Zeit 099d8dd5581b5e511f895c7c736913d4 2024-04-28 13:15:09.0 Start 2024-04-28 14:24:48.0 Stop 4179 099d8dd5581b5e511f895c7c736913d4 2024-04-28 13:05:05.0 Start 2024-04-28 13:15:08.0 Stop 603 Removing the filter for acctuniquessionid still shows the right records but the number of records returned differ by the factor 2.5 depending on the where clause Query with "not exists" returns 467702 records Query without any where clause returns 185501 records actually it should be the other way round, right? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org