RalfJL commented on issue #12949:
URL: https://github.com/apache/pinot/issues/12949#issuecomment-2082309994

   Thanks for the query.
   I can confirm that it returns the same amount of records as the query
   ```
   select a.acctuniquesessionid,  a.acctstatustype, a.ts, b.acctstatustype, 
b.ts, (cast(b.ts as BIGINT)-cast(a.ts as BIGINT))/1000 as Zeit from 
radius_start a
   join radius_stop b
   on a.acctuniquesessionid = b.acctuniquesessionid
   order by Zeit desc
   ```
   But it is not the same as the second query with the "not exists" clause. 
Please correct me if I am wrong.
   And by the way, my "not exists" clause is wrong, see explanation below.
   Never the less it is astonishing, that adding a filter produces more output 
than the same query without the filter.
   
   
   The overall goal is to find matching Start-Stop Records though they have the 
same "acctuniquesessionid". The key is the timestamp column "ts".
   Think about the following example:
   acctuniquesssionid                                    ts-Start               
            ts-Stop
   099d8dd5581b5e511f895c7c736913d4   2024-04-28 13:05:05.0  **2024-04-28 
13:15:08.0**
   099d8dd5581b5e511f895c7c736913d4   **2024-04-28 13:15:09.0**  2024-04-28 
14:24:48.0
   
   Simply joining both tables will return 4 records.
   ts   acctstatustype  ts      acctstatustype  Zeit
   2024-04-28 13:15:09.0        Start   2024-04-28 13:15:08.0   Stop    -1
   2024-04-28 13:15:09.0        Start   2024-04-28 14:24:48.0   Stop    4179
   2024-04-28 13:05:05.0        Start   2024-04-28 13:15:08.0   Stop    603
   2024-04-28 13:05:05.0        Start   2024-04-28 14:24:48.0   Stop    4783
   And obviously the first and the last records are wrong
   Next step is to filter out the record, where Stop is before Start (kick out 
the 1. Record)
   ```
   select a.ts, a.acctstatustype, b.ts, b.acctstatustype,(cast(b.ts as BIGINT) 
- cast(a.ts as BIGINT)) / 1000 as Zeit from radius_start a
   inner join radius_stop b
   on a.acctuniquesessionid = b.acctuniquesessionid
   where a.ts < b.ts and a.acctuniquesessionid = 
'099d8dd5581b5e511f895c7c736913d4'
   ```
   And the next step is to kick out the pair where Start from the first session 
is combined with the Stop of the second session.
   In other words there can not be any other Start record between our 
Start-Stop pair
   ```
   select a.ts, a.acctstatustype, b.ts, b.acctstatustype,(cast(b.ts as BIGINT) 
- cast(a.ts as BIGINT)) / 1000 as Zeit from radius_start a
   inner join radius_stop b
   on a.acctuniquesessionid = b.acctuniquesessionid and a.ts < b.ts 
   where  a.acctuniquesessionid = '099d8dd5581b5e511f895c7c736913d4'
   and not exists ( select 1 from radius_start c where c.acctuniquesessionid = 
a.acctuniquesessionid and cast(c.ts as BIGINT) between cast(a.ts as BIGINT) +1 
and cast(b.ts as BIGINT) -1)
   ```
   Which returns the desired pairs:
   acctuniquesessionid  ts      acctstatustype  ts      acctstatustype  Zeit
   099d8dd5581b5e511f895c7c736913d4     2024-04-28 13:15:09.0   Start   
2024-04-28 14:24:48.0   Stop    4179
   099d8dd5581b5e511f895c7c736913d4     2024-04-28 13:05:05.0   Start   
2024-04-28 13:15:08.0   Stop    603
   
   Removing the filter for acctuniquessionid still shows the right records but 
the number of records returned differ by the factor 2.5 depending on the where 
clause
   Query with "not exists" returns 467702 records
   Query without any where clause returns 185501 records
   actually it should be the other way round, right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to