RalfJL opened a new issue, #12949: URL: https://github.com/apache/pinot/issues/12949
Currently I am investigating if Pinot can help us to calculate inbytes and outbytes from Radius accounting data. The setup is: sysbench (produces artificial accounting data) -> freeradius -> kafka -> pinot The randomness of the artificial data is low so we end up with several "sessions" having the same Acct-Unique-Session-ID. e.g. query: ``` select a.acctuniquesessionid, a.acctstatustype, a.eventtime, a.eventtimestamp from radius_json a where a.acctuniquesessionid = 'da0f86138ec36b2364889f048bf2ac82' order by a.eventtime ``` returns: ``` acctuniquesessionid acctstatustype eventtime eventtimestamp da0f86138ec36b2364889f048bf2ac82 Start 1712050245468 Apr 2 2024 09:30:45 UTC da0f86138ec36b2364889f048bf2ac82 Stop 1712050293427 Apr 2 2024 09:31:33 UTC da0f86138ec36b2364889f048bf2ac82 Start 1712759564017 Apr 10 2024 14:32:43 UTC da0f86138ec36b2364889f048bf2ac82 Interim-Update 1712759589307 Apr 10 2024 14:33:09 UTC da0f86138ec36b2364889f048bf2ac82 Interim-Update 1712759612449 Apr 10 2024 14:33:32 UTC da0f86138ec36b2364889f048bf2ac82 Interim-Update 1712759616370 Apr 10 2024 14:33:36 UTC da0f86138ec36b2364889f048bf2ac82 Stop 1712759668126 Apr 10 2024 14:34:27 UTC ``` Obviously there are two different sessions with the same acctuniquesessionid in the data. In real life we can not guarantee that Session ID's are always unique, so this might also happen in real life data. The solution is to filter out all 'Start' - 'Stop' combinations where there is a 'Start' or a 'Stop' in between. And here comes the problem: query: ``` select a.acctuniquesessionid, (a.eventtime - b.eventtime)/1000 as Zeit, a.acctstatustype, b.acctstatustype, a.eventtime, b.eventtime ,( select count(c.eventtime) from radius_json c where b.acctuniquesessionid = c.acctuniquesessionid and a.acctuniquesessionid = c.acctuniquesessionid and a.eventtime > 0 and b.eventtime > 0 and (c.acctstatustype = 'Stop' or c.acctstatustype = 'Start' ) and c.eventtime between b.eventtime+1 and a.eventtime-1 ) from radius_json a join radius_json b on b.acctuniquesessionid = a.acctuniquesessionid and a.acctstatustype = 'Stop' and b.acctstatustype = 'Start' -- and a._3gppimsi <> 'null' and a.eventtime > b.eventtime where not exists ( select 1 from radius_json c where b.acctuniquesessionid = c.acctuniquesessionid and a.acctuniquesessionid = c.acctuniquesessionid and a.eventtime > 0 and b.eventtime > 0 and (c.acctstatustype = 'Stop' or c.acctstatustype = 'Start' ) and c.eventtime between b.eventtime+1 and a.eventtime-1 ) -- and a.acctuniquesessionid = 'fac07ae7853810946d87e868e463af2c' -- and a.acctuniquesessionid = 'ac6baa744130522c1eb1eec161114d1b' and a.acctuniquesessionid = 'da0f86138ec36b2364889f048bf2ac82' order by Zeit desc ``` Returns: ``` acctuniquesessionid Zeit acctstatustype acctstatustype eventtime eventtime EXPR$4 da0f86138ec36b2364889f048bf2ac82 104 Stop Start 1712759668126 1712759564017 0 da0f86138ec36b2364889f048bf2ac82 47 Stop Start 1712050293427 1712050245468 0 ``` The two sessions as expected. Removing the filter a.acctuniquesessionid = 'da0f86138ec36b2364889f048bf2ac82' shows a very different result query: ``` select a.acctuniquesessionid, (a.eventtime - b.eventtime)/1000 as Zeit, a.acctstatustype, b.acctstatustype, a.eventtime, b.eventtime ,( select count(c.eventtime) from radius_json c where b.acctuniquesessionid = c.acctuniquesessionid and a.acctuniquesessionid = c.acctuniquesessionid and a.eventtime > 0 and b.eventtime > 0 and (c.acctstatustype = 'Stop' or c.acctstatustype = 'Start' ) and c.eventtime between b.eventtime+1 and a.eventtime-1 ) from radius_json a join radius_json b on b.acctuniquesessionid = a.acctuniquesessionid and a.acctstatustype = 'Stop' and b.acctstatustype = 'Start' -- and a._3gppimsi <> 'null' and a.eventtime > b.eventtime where not exists ( select 1 from radius_json c where b.acctuniquesessionid = c.acctuniquesessionid and a.acctuniquesessionid = c.acctuniquesessionid and a.eventtime > 0 and b.eventtime > 0 and (c.acctstatustype = 'Stop' or c.acctstatustype = 'Start' ) and c.eventtime between b.eventtime+1 and a.eventtime-1 ) -- and a.acctuniquesessionid = 'fac07ae7853810946d87e868e463af2c' -- and a.acctuniquesessionid = 'ac6baa744130522c1eb1eec161114d1b' -- and a.acctuniquesessionid = 'da0f86138ec36b2364889f048bf2ac82' order by Zeit desc ``` result (shortened by thousands of other sessions): ``` da0f86138ec36b2364889f048bf2ac82 709422 Stop Start 1712759668126 1712050245468 0 ``` So here obviously both subqueries return wrong results. There is a 'Start' and a 'Stop' between timestamp 1712759668126 and 1712050245468 but it is not detected anymore when the filter for the acctuniquessionid is removed Am I missing here something or am I hitting a bug in multistage engine? I am using Pinot 1.1.0 as docker compose image (single instance of all server processes) The query plan for both queries is the same beside the Logical Filter first query: LogicalFilter(condition=[AND(=($14, _UTF-8'da0f86138ec36b2364889f048bf2ac82'), =($12, _UTF-8'Stop'))]) second query: LogicalFilter(condition=[=($12, _UTF-8'Stop')]) PLEASE NOTE: Some filters are redundant like "b.acctuniquesessionid = c.acctuniquesessionid and a.acctuniquesessionid = c.acctuniquesessionid" and are a test if the result changes. But it doesn't -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org