Hi Olivier,

Here it is.
== Physical Plan ==
Aggregate false, [PartialGroup#155], [PartialGroup#155 AS is_bad#108,Coalesce(SUM(PartialCount#152L),0) AS count#109L,(CAST(SUM(PartialSum#153), DoubleType) / CAST(SUM(PartialCount#154L), DoubleType)) AS avg#110]
 Exchange (HashPartitioning [PartialGroup#155], 200), []
  Aggregate true, [meta#143[is_bad]], [meta#143[is_bad] AS PartialGroup#155,COUNT(1) AS PartialCount#152L,COUNT(nvar#145[var1]) AS PartialCount#154L,SUM(nvar#145[var1]) AS PartialSum#153]
   Project [meta#143,nvar#145]
    Filter ((date#147 >= 2014-04-01) && (date#147 <= 2014-04-30))
     PhysicalRDD [meta#143,nvar#145,date#147], MapPartitionsRDD[6] at explain at <console>:32

Jianshi

On Tue, May 12, 2015 at 10:34 PM, Olivier Girardot <ssab...@gmail.com> wrote:

> can you post the explain too ?
>
> On Tue, May 12, 2015 at 12:11, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a SQL query on tables containing big Map columns (thousands of
>> keys). I found it to be very slow.
>>
>> select meta['is_bad'] as is_bad, count(*) as count, avg(nvar['var1']) as avg
>> from test
>> where date between '2014-04-01' and '2014-04-30'
>> group by meta['is_bad']
>>
>> =>
>>
>> +---------+-----------+-----------------------+
>> | is_bad  | count     | avg                   |
>> +---------+-----------+-----------------------+
>> | 0       | 17024396  | 0.16257395850742645   |
>> | 1       | 179729    | -0.37626256661125485  |
>> | 2       | 28128     | 0.11674427263203344   |
>> | 3       | 116327    | -0.6398689187187386   |
>> | 4       | 87715     | -0.5349632960030563   |
>> | 5       | 169771    | 0.40812641191854626   |
>> | 6       | 542447    | 0.5238256418341465    |
>> | 7       | 160324    | 0.29442847034840386   |
>> | 8       | 2099      | -0.9165701665162977   |
>> | 9       | 3104      | 0.3845685004598235    |
>> +---------+-----------+-----------------------+
>> 10 rows selected (130.5 seconds)
>>
>> The total number of rows is less than 20M. Why so slow?
>>
>> I'm running on Spark 1.4.0-SNAPSHOT with 100 executors, each having
>> 4 GB of RAM and 2 CPU cores.
>>
>> Looks like https://issues.apache.org/jira/browse/SPARK-5446 is still
>> open; when can we have it fixed? :)
>>
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/
>>

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
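[Editor's note: for readers unfamiliar with the query's semantics, the sketch below simulates it in plain Python on a few made-up rows. The table name `test`, the map keys `is_bad` and `var1`, and the date filter come from the thread; the sample rows and values are invented for illustration, and this says nothing about why the Spark map-column access itself is slow.]

```python
from collections import defaultdict
from datetime import date

# Synthetic rows standing in for the `test` table: each row has two map
# columns (`meta`, `nvar`) and a `date`, mirroring the schema in the thread.
rows = [
    {"meta": {"is_bad": 0}, "nvar": {"var1": 1.0},  "date": date(2014, 4, 5)},
    {"meta": {"is_bad": 0}, "nvar": {"var1": 3.0},  "date": date(2014, 4, 10)},
    {"meta": {"is_bad": 1}, "nvar": {"var1": -4.0}, "date": date(2014, 4, 12)},
    {"meta": {"is_bad": 1}, "nvar": {"var1": None}, "date": date(2014, 4, 20)},
    {"meta": {"is_bad": 0}, "nvar": {"var1": 9.0},  "date": date(2014, 5, 1)},  # outside date range
]

lo, hi = date(2014, 4, 1), date(2014, 4, 30)
counts = defaultdict(int)    # count(*) per group
sums = defaultdict(float)    # running SUM(nvar['var1']) per group
nonnull = defaultdict(int)   # COUNT(nvar['var1']): NULLs are excluded, as in SQL

for r in rows:
    if lo <= r["date"] <= hi:                 # where date between ... and ...
        k = r["meta"]["is_bad"]               # group by meta['is_bad']
        counts[k] += 1
        v = r["nvar"]["var1"]
        if v is not None:                     # avg() ignores NULL inputs
            sums[k] += v
            nonnull[k] += 1

# avg = SUM / COUNT over non-null values, matching the plan's partial aggregates
result = {k: (counts[k], sums[k] / nonnull[k]) for k in counts}
print(result)  # {0: (2, 2.0), 1: (2, -4.0)}
```

Note that `count(*)` counts both rows in group 1, while the average is taken only over the single non-NULL `var1` value, which is the same split the physical plan makes with its separate `PartialCount#152L` and `PartialCount#154L` aggregates.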