kmitchener commented on issue #166:
URL: 
https://github.com/apache/arrow-datafusion/issues/166#issuecomment-1237211919

   This runs now, but returns 0 results *most of the time*. The way views are 
executed still looks a little funny to me, so to exclude the possibility of 
some bug in the view code, I converted it to a with statement like below:
   
   ```sql
   with revenue as (
        select
                l_suppkey as supplier_no,
                sum(l_extendedprice * (1 - l_discount)) as total_revenue
        from
                lineitem
        where
                l_shipdate >= date '1996-01-01'
                and l_shipdate < date '1996-01-01' + interval '3' month
        group by
                l_suppkey)
   select
        s_suppkey,
        s_name,
        s_address,
        s_phone,
        total_revenue
   from
        supplier,
        revenue
   where
        s_suppkey = supplier_no
        and total_revenue = (
                select
                        max(total_revenue)
                from
                        revenue
        )
   order by
        s_suppkey;
   ```
   
   Running this query back to back multiple times usually returns 0 results, 
but sometimes it correctly returns the top supplier as it's supposed to -- 1 
result:
   
   ```sql
   
+-----------+--------------------+-------------------+-----------------+--------------------+
   | s_suppkey | s_name             | s_address         | s_phone         | 
total_revenue      |
   
+-----------+--------------------+-------------------+-----------------+--------------------+
   | 8449      | Supplier#000008449 | Wp34zim9qYFbVctdW | 20-469-856-8873 | 
1772627.2086999998 |
   
+-----------+--------------------+-------------------+-----------------+--------------------+
   1 row in set. Query took 3.363 seconds.
   ```
   
   I extracted just the WITH section to see what it's returning. And 2 back to 
back runs of this query shows different results for `total_revenue`:
   ```sql
   with revenue as (
           select
                   l_suppkey as supplier_no,
                   sum(l_extendedprice * (1 - l_discount)) as total_revenue
           from
                   lineitem
           where
                   l_shipdate >= date '1996-01-01'
                   and l_shipdate < date '1996-01-01' + interval '3' month
           group by
                   l_suppkey) select * from revenue order by 2 desc limit 1;
   ```
   ```sql
   +-------------+---------------+
   | supplier_no | total_revenue |
   +-------------+---------------+
   | 8449        | 1772627.2087  |
   +-------------+---------------+
   1 row in set. Query took 2.959 seconds.
   ```
   then
   ```sql
   +-------------+--------------------+
   | supplier_no | total_revenue      |
   +-------------+--------------------+
   | 8449        | 1772627.2086999998 |
   +-------------+--------------------+
   1 row in set. Query took 2.554 seconds.
   ```
   
   I understand floating point results are uncertain. Is that what's going on 
here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to