osawyerr opened a new issue, #6794:
URL: https://github.com/apache/arrow-datafusion/issues/6794

   ### Describe the bug
   
   Datafusion gives incorrect results when running TPC-H Query 8 with parquet 
files. 
   
   ### To Reproduce
   
   1. Generate TPC-H parquet files for scale factor 10
   2. Open datafusion-cli and create external tables pointing to files
   ```sql
   create external table lineitem stored as parquet location 
'/path/to/lineitem/lineitem_1687987398_default/';
   create external table customer stored as parquet location 
'/path/to/customer/customer_1687987384_default/';
   create external table nation stored as parquet location 
'/path/to/nation/nation_1687988005_default/';
   create external table orders stored as parquet location 
'/path/to/orders/orders_1687988005_default/';
   create external table region stored as parquet location 
'/path/to/region/region_1687988223_default/';
   create external table supplier stored as parquet location 
'/path/to/supplier/supplier_1687988223_default/';
   create external table part stored as parquet location 
'/path/to/part/part_1687988133_default/';
   ```
   3. Run TPC-H Query 8
   ```sql
   select
     o_year, sum(case when nation = 'BRAZIL' then volume else 
   0
    end) / sum(volume) as mkt_share
   from
     (
       select
         extract(year from o_orderdate) as o_year,
         l_extendedprice * (
   1
    - l_discount) as volume,
         n2.n_name as nation
       from part, supplier, lineitem, orders, customer, nation n1, nation n2, 
region
       where
         p_partkey = l_partkey
         and s_suppkey = l_suppkey
         and l_orderkey = o_orderkey
         and o_custkey = c_custkey
         and c_nationkey = n1.n_nationkey
         and n1.n_regionkey = r_regionkey
         and r_name = 'AMERICA'
         and s_nationkey = n2.n_nationkey
         and o_orderdate between date '1995-01-01' and date '1996-12-31'
         and p_type = 'ECONOMY ANODIZED STEEL'
     ) as all_nations
   group by o_year
   order by o_year;
   ```
   4. Incorrect results displayed below
   ```
   +--------+-------------------------------------------+
   | o_year | mkt_share                                 |
   +--------+-------------------------------------------+
   | 1995.0 | -0.00000000000011380044067220119495060732 |
   | 1996.0 | 0.00000000000019588288500717383285218261  |
   +--------+-------------------------------------------+
   2 rows in set. Query took 1.131 seconds.
   ```
   
   
   ### Expected behavior
   
   The correct results should be:
   1. From Postgres:
   ```
    o_year |       mkt_share        
   --------+------------------------
      1995 | 0.03882014251433219622
      1996 | 0.03948968749183991638
   (2 rows)
   
   ```
   2. From DuckDb (with same parquet files):
   ```
   ┌────────┬─────────────────────┐
   │ o_year │      mkt_share      │
   │ int64  │       double        │
   ├────────┼─────────────────────┤
   │   1995 │  0.0388201425143322 │
   │   1996 │ 0.03948968749183992 │
   └────────┴─────────────────────┘
   Run Time (s): real 1.328 user 8.095730 sys 0.358842
   ```
   
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to