RalfJL opened a new issue, #12949:
URL: https://github.com/apache/pinot/issues/12949

   Currently I am investigating if Pinot can help us to calculate inbytes and 
outbytes from Radius accounting data.
   The setup is:
   sysbench (produces artificial accounting data) -> freeradius -> kafka -> 
pinot
   The randomness of the artificial data is low so we end up with several 
"sessions" having the same Acct-Unique-Session-ID.
   e.g. query:
   ```
   select a.acctuniquesessionid, a.acctstatustype, a.eventtime, 
a.eventtimestamp from radius_json a
   where a.acctuniquesessionid = 'da0f86138ec36b2364889f048bf2ac82'
   order by a.eventtime
   ```
   returns:
   ```
   acctuniquesessionid  acctstatustype  eventtime       eventtimestamp
   da0f86138ec36b2364889f048bf2ac82     Start   1712050245468   Apr 2 2024 
09:30:45 UTC
   da0f86138ec36b2364889f048bf2ac82     Stop    1712050293427   Apr 2 2024 
09:31:33 UTC
   da0f86138ec36b2364889f048bf2ac82     Start   1712759564017   Apr 10 2024 
14:32:43 UTC
   da0f86138ec36b2364889f048bf2ac82     Interim-Update  1712759589307   Apr 10 
2024 14:33:09 UTC
   da0f86138ec36b2364889f048bf2ac82     Interim-Update  1712759612449   Apr 10 
2024 14:33:32 UTC
   da0f86138ec36b2364889f048bf2ac82     Interim-Update  1712759616370   Apr 10 
2024 14:33:36 UTC
   da0f86138ec36b2364889f048bf2ac82     Stop    1712759668126   Apr 10 2024 
14:34:27 UTC
   ```
   Obviously there are two different sessions with the same acctuniquesessionid 
in the data.
   In real life we can not guarantee that Session ID's are always unique, so 
this might also happen in real life data.
   
   The solution is to filter out all 'Start' - 'Stop' combinations where there 
is a 'Start' or a 'Stop' in between.
   And here comes the problem:
   query:
   ```
   select a.acctuniquesessionid, (a.eventtime - b.eventtime)/1000 as Zeit, 
a.acctstatustype, b.acctstatustype, a.eventtime, b.eventtime 
    ,( select count(c.eventtime) from radius_json c where b.acctuniquesessionid 
= c.acctuniquesessionid 
                                 and a.acctuniquesessionid = 
c.acctuniquesessionid
                                and a.eventtime > 0 and b.eventtime > 0
                                and (c.acctstatustype = 'Stop' or 
c.acctstatustype = 'Start' ) 
                                and c.eventtime between b.eventtime+1 and 
a.eventtime-1
                                 ) 
   
   from radius_json a
   join radius_json b
   on b.acctuniquesessionid = a.acctuniquesessionid 
      and a.acctstatustype = 'Stop' and b.acctstatustype = 'Start' -- and 
a._3gppimsi <> 'null'
      and a.eventtime > b.eventtime
   where  
        not exists ( select 1 from radius_json c where b.acctuniquesessionid = 
c.acctuniquesessionid and a.acctuniquesessionid = c.acctuniquesessionid
                                and a.eventtime > 0 and b.eventtime > 0
                                and (c.acctstatustype = 'Stop' or 
c.acctstatustype = 'Start' ) 
                                and c.eventtime between b.eventtime+1 and 
a.eventtime-1
                                 ) 
   -- and                       a.acctuniquesessionid = 
'fac07ae7853810946d87e868e463af2c'
   -- and                               a.acctuniquesessionid = 
'ac6baa744130522c1eb1eec161114d1b'
   and             a.acctuniquesessionid = 'da0f86138ec36b2364889f048bf2ac82'
   order by Zeit desc
   ```
   Returns:
   ```
   acctuniquesessionid  Zeit    acctstatustype  acctstatustype  eventtime       
eventtime       EXPR$4
   da0f86138ec36b2364889f048bf2ac82     104     Stop    Start   1712759668126   
1712759564017   0
   da0f86138ec36b2364889f048bf2ac82     47      Stop    Start   1712050293427   
1712050245468   0
   ```
   The two sessions as expected.
   
   Removing the filter a.acctuniquesessionid = 
'da0f86138ec36b2364889f048bf2ac82' shows a very different result
   query:
   ```
   select a.acctuniquesessionid, (a.eventtime - b.eventtime)/1000 as Zeit, 
a.acctstatustype, b.acctstatustype, a.eventtime, b.eventtime 
    ,( select count(c.eventtime) from radius_json c where b.acctuniquesessionid 
= c.acctuniquesessionid 
                                 and a.acctuniquesessionid = 
c.acctuniquesessionid
                                and a.eventtime > 0 and b.eventtime > 0
                                and (c.acctstatustype = 'Stop' or 
c.acctstatustype = 'Start' ) 
                                and c.eventtime between b.eventtime+1 and 
a.eventtime-1
                                 ) 
   
   from radius_json a
   join radius_json b
   on b.acctuniquesessionid = a.acctuniquesessionid 
      and a.acctstatustype = 'Stop' and b.acctstatustype = 'Start' -- and 
a._3gppimsi <> 'null'
      and a.eventtime > b.eventtime
   where  
        not exists ( select 1 from radius_json c where b.acctuniquesessionid = 
c.acctuniquesessionid and a.acctuniquesessionid = c.acctuniquesessionid
                                and a.eventtime > 0 and b.eventtime > 0
                                and (c.acctstatustype = 'Stop' or 
c.acctstatustype = 'Start' ) 
                                and c.eventtime between b.eventtime+1 and 
a.eventtime-1
                                 ) 
   -- and                       a.acctuniquesessionid = 
'fac07ae7853810946d87e868e463af2c'
   -- and                               a.acctuniquesessionid = 
'ac6baa744130522c1eb1eec161114d1b'
   -- and                  a.acctuniquesessionid = 
'da0f86138ec36b2364889f048bf2ac82'
   order by Zeit desc
   ```
   result (shortened by thousands of other sessions):
   ```
   da0f86138ec36b2364889f048bf2ac82     709422  Stop    Start   1712759668126   
1712050245468   0
   ```
   So here obviously both subqueries return wrong results. There is a 'Start' 
and a 'Stop' between timestamp 1712759668126 and 1712050245468 but it is not 
detected anymore when the filter for the acctuniquessionid is removed
   
   Am I missing here something or am I hitting a bug in multistage engine?
   
   I am using Pinot 1.1.0 as docker compose image (single instance of all 
server processes)
   The query plan for both queries is the same beside the Logical Filter
   first query:
                       LogicalFilter(condition=[AND(=($14, 
_UTF-8'da0f86138ec36b2364889f048bf2ac82'), =($12, _UTF-8'Stop'))])
   second query:
                       LogicalFilter(condition=[=($12, _UTF-8'Stop')])
   
   PLEASE NOTE: Some filters are redundant like "b.acctuniquesessionid = 
c.acctuniquesessionid and a.acctuniquesessionid = c.acctuniquesessionid" and 
are a test if the result changes. But it doesn't


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to