alchemist51 opened a new issue, #18863:
URL: https://github.com/apache/datafusion/issues/18863

   ### Describe the bug
   
   We ran a ClickBench query on the partitioned `hits` data with different values of `datafusion.execution.target_partitions` and observed that the results differ between runs.
   
   ### To Reproduce
   
   Download the partitioned data for `hits` using the command below:
   
   ```
   seq 0 99 | xargs -P100 -I{} bash -c 'wget --directory-prefix partitioned --continue --progress=dot:giga https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_{}.parquet'
   ```
   
   Register the data folder as an external table from the DataFusion CLI:
   
   ```
   CREATE EXTERNAL TABLE hits
   STORED AS PARQUET
   LOCATION '~/hits/partitioned';
   ```
   
   Then set the number of target partitions to 3:
   
   ```
   set datafusion.execution.target_partitions =3;
   ```
   
   The results are like this:
   
   ```
   +---------------------+-------------+---+---------------------+---------------------------+
   | WatchID             | ClientIP    | c | sum(hits.IsRefresh) | avg(hits.ResolutionWidth) |
   +---------------------+-------------+---+---------------------+---------------------------+
   | 7904046282518428963 | 1509330109  | 2 | 0                   | 1368.0                    |
   | 7224410078130478461 | -776509581  | 2 | 0                   | 1368.0                    |
   | 8566928176839891583 | -1402644643 | 2 | 0                   | 1368.0                    |
   | 6655575552203051303 | 1611957945  | 2 | 0                   | 1638.0                    |
   | 8449123891155589752 | 558210368   | 1 | 0                   | 1368.0                    |
   | 7249929090756875277 | 558210368   | 1 | 0                   | 1368.0                    |
   | 8346088010501248028 | -2091064649 | 1 | 0                   | 1638.0                    |
   | 8797199898703927977 | -2140306534 | 1 | 0                   | 2038.0                    |
   | 7860441087193910310 | 1154898388  | 1 | 0                   | 1638.0                    |
   | 6255708766253389085 | 1154898388  | 1 | 0                   | 1638.0                    |
   +---------------------+-------------+---+---------------------+---------------------------+
   ```
   
   
   When we change `target_partitions` to 10, the results look like this:
   
   ```
   +---------------------+-------------+---+---------------------+---------------------------+
   | WatchID             | ClientIP    | c | sum(hits.IsRefresh) | avg(hits.ResolutionWidth) |
   +---------------------+-------------+---+---------------------+---------------------------+
   | 7224410078130478461 | -776509581  | 2 | 0                   | 1368.0                    |
   | 8566928176839891583 | -1402644643 | 2 | 0                   | 1368.0                    |
   | 6655575552203051303 | 1611957945  | 2 | 0                   | 1638.0                    |
   | 7904046282518428963 | 1509330109  | 2 | 0                   | 1368.0                    |
   | 9074542984305678345 | 53624704    | 1 | 0                   | 1750.0                    |
   | 8657316443585142993 | 1566983451  | 1 | 0                   | 362.0                     |
   | 7644914580732725077 | 1328088797  | 1 | 0                   | 1638.0                    |
   | 6952801458493199229 | 1328088797  | 1 | 0                   | 1638.0                    |
   | 7412006209029830543 | -51933821   | 1 | 0                   | 1250.0                    |
   | 7393014425039729645 | -51933821   | 1 | 0                   | 1250.0                    |
   +---------------------+-------------+---+---------------------+---------------------------+
   ```
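   
   To make the difference between the two outputs concrete, the following is a small Python sketch that compares them row by row. The rows are transcribed from the two tables above as `(WatchID, ClientIP, c)`; the variable names `run_3`, `run_10`, `common`, `only_3`, and `only_10` are illustrative and not part of DataFusion.

```python
# Rows transcribed from the two outputs above as (WatchID, ClientIP, c).
run_3 = [  # target_partitions = 3
    (7904046282518428963, 1509330109, 2),
    (7224410078130478461, -776509581, 2),
    (8566928176839891583, -1402644643, 2),
    (6655575552203051303, 1611957945, 2),
    (8449123891155589752, 558210368, 1),
    (7249929090756875277, 558210368, 1),
    (8346088010501248028, -2091064649, 1),
    (8797199898703927977, -2140306534, 1),
    (7860441087193910310, 1154898388, 1),
    (6255708766253389085, 1154898388, 1),
]
run_10 = [  # target_partitions = 10
    (7224410078130478461, -776509581, 2),
    (8566928176839891583, -1402644643, 2),
    (6655575552203051303, 1611957945, 2),
    (7904046282518428963, 1509330109, 2),
    (9074542984305678345, 53624704, 1),
    (8657316443585142993, 1566983451, 1),
    (7644914580732725077, 1328088797, 1),
    (6952801458493199229, 1328088797, 1),
    (7412006209029830543, -51933821, 1),
    (7393014425039729645, -51933821, 1),
]

# Compare the two runs as sets, ignoring row order.
common = set(run_3) & set(run_10)
only_3 = set(run_3) - set(run_10)
only_10 = set(run_10) - set(run_3)

print("rows in both runs:", len(common), "with c values", sorted({c for _, _, c in common}))
print("rows only in the 3-partition run:", len(only_3), "with c values", sorted({c for _, _, c in only_3}))
print("rows only in the 10-partition run:", len(only_10), "with c values", sorted({c for _, _, c in only_10}))
```

   For the rows shown here, the four rows with `c = 2` appear in both runs (in a different order), while the six rows that differ between the runs all have `c = 1`.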
   
   ### Expected behavior
   
   Consistent results for both cases.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

