[GitHub] [spark] mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum

2019-08-28 Thread GitBox
mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal 
buffer for long sum
URL: https://github.com/apache/spark/pull/25347#issuecomment-525768794
 
 
   yes, I agree the perf issue is very significant. Let me close this one, 
thank you all for the reviews!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum

2019-08-27 Thread GitBox
mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal 
buffer for long sum
URL: https://github.com/apache/spark/pull/25347#issuecomment-525314313
 
 
   @cloud-fan unfortunately the performance overhead is very significant. I 
tried and run the benchmarks in both modes.
   
   Here you can see the code:
   
   ```
   runBenchmark("aggregate without grouping") {
 val N = 500L << 22
 withSQLConf(SQLConf.SUM_DECIMAL_BUFFER_FOR_LONG.key -> "false") {
   codegenBenchmark("agg w/o group long buffer", N) {
 spark.range(N).selectExpr("sum(id)").collect()
   }
 }
 withSQLConf(SQLConf.SUM_DECIMAL_BUFFER_FOR_LONG.key -> "true") {
   codegenBenchmark("agg w/o group decimal buffer", N) {
 spark.range(N).selectExpr("sum(id)").collect()
   }
 }
   }
   ```
   and here it is the output (as you can see, the overhead is more than 10x on 
a simple sum of longs):
   ``
   [info] Running benchmark: agg w/o group long buffer
   [info]   Running case: agg w/o group long buffer wholestage off
   [info]   Stopped after 2 iterations, 105407 ms
   [info]   Running case: agg w/o group long buffer wholestage on
   [info]   Stopped after 5 iterations, 6282 ms
   [info] 
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Mac OS X 10.13.6
   [info] Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
   [info] agg w/o group long buffer:Best Time(ms)   Avg 
Time(ms)   Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
   [info] 

   [info] agg w/o group long buffer wholestage off  48538  
52704 NaN 43,2  23,1   1,0X
   [info] agg w/o group long buffer wholestage on1231   
1257  28   1703,6   0,6  39,4X
   [info] 
   [info] Running benchmark: agg w/o group decimal buffer
   [info]   Running case: agg w/o group decimal buffer wholestage off
   [info]   Stopped after 2 iterations, 1276890 ms
   [info]   Running case: agg w/o group decimal buffer wholestage on
   [info]   Stopped after 5 iterations, 496100 ms
   [info] 
   [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Mac OS X 10.13.6
   [info] Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
   [info] agg w/o group decimal buffer: Best Time(ms)   Avg 
Time(ms)   Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
   [info] 

   [info] agg w/o group decimal buffer wholestage off 633585 
638445 NaN  3,3 302,1   1,0X
   [info] agg w/o group decimal buffer wholestage on  92037  
99220 NaN 22,8  43,9   6,9X
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum

2019-08-26 Thread GitBox
mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal 
buffer for long sum
URL: https://github.com/apache/spark/pull/25347#issuecomment-524938517
 
 
   I'll do that @cloud-fan , thanks


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum

2019-08-22 Thread GitBox
mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal 
buffer for long sum
URL: https://github.com/apache/spark/pull/25347#issuecomment-523800999
 
 
   > The change to support decimal sum buffer itself looks nice to me, but I 
think we'd like a coarser flag for that.
   
   @maropu @cloud-fan any suggestion on what this flag should be/look like?
   
   Meanwhile I'll update the PR to limit the scope only to the buffer, thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum

2019-08-21 Thread GitBox
mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal 
buffer for long sum
URL: https://github.com/apache/spark/pull/25347#issuecomment-523508395
 
 
   > Spark needs to follow SQL standard but not Postgres, so making Spark 
behave the same with Postgres is not a strong reason to me.
   
   I 100% agree on that.
   
   @maropu what do you think?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum

2019-08-21 Thread GitBox
mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal 
buffer for long sum
URL: https://github.com/apache/spark/pull/25347#issuecomment-523439380
 
 
   I think the reason is to have feature parity with Postgres. Using it as 
return type too supports also cases which would overflow otherwise. I think, 
since we are guarding it with a flag, it is fine to return a decimal and have 
wider support.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum

2019-08-21 Thread GitBox
mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal 
buffer for long sum
URL: https://github.com/apache/spark/pull/25347#issuecomment-523383193
 
 
   thanks @wangyum !
   
   Actually I tried your very same command and Oracle returned the exact result 
to me. Anyway, I don't think it is relevant here. Oracle uses always DECIMAL, 
it doesn't really have a LONG datatype...
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum

2019-08-20 Thread GitBox
mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal 
buffer for long sum
URL: https://github.com/apache/spark/pull/25347#issuecomment-523091311
 
 
   Well, actually mainstream DBs behave differently among each other:
   
- SQLServer returns bigint for a sum of bigint, int for a sum of int, 
decimal(28,s) for a sum of decimal(p,s);
- MySQL returns decimal(38,0) for a sum of int, decimal(41, 0) for a sum of 
long;
- Postgres behaves like after this PR with the flag on, ie. a sum of 
integers returns a long, while a sum of longs returns a decimal.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum

2019-08-19 Thread GitBox
mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal 
buffer for long sum
URL: https://github.com/apache/spark/pull/25347#issuecomment-522552348
 
 
   thanks for your comments @cloud-fan and @maropu.
   
   I agree long term to have coarser flags controlling all these things, like: 
ansi strict, legacy and so on. So far we are creating a new flag for each case 
as it is done now in this PR.
   
   I am not sure if your comments say that we should hold this until we create 
such flags or we can go ahead on this as it is and create those flags later. 
May you please clarify?
   
   Thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum

2019-08-06 Thread GitBox
mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal 
buffer for long sum
URL: https://github.com/apache/spark/pull/25347#issuecomment-518550239
 
 
   Yes @maropu, you're right. The reason why I didn't change the output 
attribute was not to cause a breaking change. But since we are introducing a 
flag for it, it may be ok to do so. What do you think? cc @cloud-fan what do 
you think about this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum

2019-08-04 Thread GitBox
mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal 
buffer for long sum
URL: https://github.com/apache/spark/pull/25347#issuecomment-518010363
 
 
   @kiszk no, since the buffer value for average is a double for longs, so 
there is not the same issue.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum

2019-08-04 Thread GitBox
mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal 
buffer for long sum
URL: https://github.com/apache/spark/pull/25347#issuecomment-517982880
 
 
   thanks for the suggestion @xuanyuanking , I added the link!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org