comphead commented on PR #25:
URL: 
https://github.com/apache/datafusion-benchmarks/pull/25#issuecomment-3633070813

   TPC-DS data has been regenerated.
   
   The reason some queries still return empty results is that the query generator
   builds each query from a template like q1 below and substitutes random values
   during generation, completely independent of the generated data. I believe we
   need to adjust the filters in the existing queries so that they return results.
   
   ```
   define COUNTY = random(1, rowcount("active_counties", "store"), uniform);
   define STATE = distmember(fips_county, [COUNTY], 3); 
   define YEAR = random(1998, 2002, uniform);
   define AGG_FIELD = text({"SR_RETURN_AMT",1},{"SR_FEE",1},{"SR_REFUNDED_CASH",1},{"SR_RETURN_AMT_INC_TAX",1},{"SR_REVERSED_CHARGE",1},{"SR_STORE_CREDIT",1},{"SR_RETURN_TAX",1});
   define _LIMIT=100;
   
   with customer_total_return as
   (select sr_customer_sk as ctr_customer_sk
   ,sr_store_sk as ctr_store_sk
   ,sum([AGG_FIELD]) as ctr_total_return
   from store_returns
   ,date_dim
   where sr_returned_date_sk = d_date_sk
   and d_year =[YEAR]
   group by sr_customer_sk
   ,sr_store_sk)
   [_LIMITA] select [_LIMITB] c_customer_id
   from customer_total_return ctr1
   ,store
   ,customer
   where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
   from customer_total_return ctr2
   where ctr1.ctr_store_sk = ctr2.ctr_store_sk)
   and s_store_sk = ctr1.ctr_store_sk
   and s_state = '[STATE]'
   and ctr1.ctr_customer_sk = c_customer_sk
   order by c_customer_id
   [_LIMITC];
   ```
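
To illustrate the point, here is a minimal sketch (not the actual generator) of why a value substituted without looking at the data can yield an empty result, and how choosing the filter value from the data avoids that. Everything in it is an assumption for illustration: the tiny `store` table, the state codes, the simplified template that keeps only the `[STATE]` filter, and the helper names `render`, `pick_state` and `states_in_data`.

```python
import random
import re
import sqlite3

# Simplified stand-in for the q1 template: only the [STATE] filter is kept.
TEMPLATE = "select s_store_sk from store where s_state = '[STATE]' order by s_store_sk"


def render(template, values):
    """Replace each [NAME] placeholder with its value; unknown names are left as-is."""
    return re.sub(r"\[(\w+)\]", lambda m: values.get(m.group(1), m.group(0)), template)


def states_in_data(conn):
    """Return the distinct states that actually occur in the store table."""
    query = "select distinct s_state from store order by s_state"
    return [row[0] for row in conn.execute(query)]


def pick_state(rng, candidates):
    """Choose a substitution value from a list of candidates."""
    return rng.choice(candidates)


# Hypothetical toy data, not real TPC-DS output.
conn = sqlite3.connect(":memory:")
conn.execute("create table store (s_store_sk integer, s_state text)")
conn.executemany("insert into store values (?, ?)", [(1, "TN"), (2, "TN"), (3, "GA")])

# Data-independent substitution: 'SD' is a plausible value that is absent from the data.
blind_sql = render(TEMPLATE, {"STATE": "SD"})
blind_rows = conn.execute(blind_sql).fetchall()

# Data-aware substitution: pick the value from states present in the table.
state = pick_state(random.Random(0), states_in_data(conn))
aware_rows = conn.execute(render(TEMPLATE, {"STATE": state})).fetchall()

print(len(blind_rows), len(aware_rows))
```

The same idea could be applied by hand to the generated queries: look up which filter values exist in the regenerated data and edit the literal in the query accordingly.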


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

