dongjoon-hyun commented on issue #27233: [SPARK-29701][SQL] Correct behaviours of group analytical queries when empty input given
URL: https://github.com/apache/spark/pull/27233#issuecomment-578350299

When you run a `SELECT` query, how would you feel if you always got a result set with one missing row?

> To me correctness means you get the wrong data back out

Regarding the following statement: do you find the query below counter-intuitive? SQL should return `0` for `COUNT(*)` when there are no rows. If you agree that is correct, why not for the `GROUPING SETS` grand total?

> the 0 to me seems counter intuitive anyway

```
spark-sql> select sum(a), count(*) from (select 1 a where false);
NULL	0
```

We are discussing this now because of the following.

> If there are other people that disagree then it obviously needs discussion,

`GROUPING SETS` is a commonly used expression in analytics queries (including A/B testing). You may not hit this kind of query if your workload doesn't look like that. I understand your situation, but please don't overlook other people's situations. We have such queries.

> if someone can give me a concrete example that this caused them $$ lost that might change my mind.

As for $$: as you can see in this PR, if Spark doesn't give the `grand total` by definition, we need to run another, separate query to get that value at this step. That means maintaining a second full, complex query just for this one missing `() /*grand total*/` row, which is a maintenance cost (see the sketch below). (I won't argue about the computing-resource cost, because the current implementation in this PR is not optimized yet.)
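To make that maintenance cost concrete, here is a minimal sketch of the query shape in question. The `events` view and its columns are hypothetical (not taken from the PR); it reuses the `where false` trick from the snippet above to model empty input:

```
-- Hypothetical, always-empty relation for illustration only.
CREATE OR REPLACE TEMPORARY VIEW events AS
SELECT 1 AS country, 1 AS clicks WHERE false;

-- The query shape under discussion: per-country counts plus a () grand total.
-- The argument in this thread is that, on empty input, the grand-total row
-- should still appear as (NULL, 0), consistent with the plain aggregate above.
SELECT country, COUNT(*) AS cnt
FROM events
GROUP BY GROUPING SETS ((country), ());

-- Without that row, the workaround is to maintain a second, separate query
-- and glue it on with UNION ALL; this is the maintenance cost described above.
SELECT country, COUNT(*) AS cnt FROM events GROUP BY country
UNION ALL
SELECT CAST(NULL AS INT) AS country, COUNT(*) AS cnt FROM events;
```

The `UNION ALL` workaround relies on the fact that a plain aggregate with no `GROUP BY` always returns exactly one row, even on empty input, which is the same behaviour being asked of the `() /*grand total*/` grouping set here.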