Hi,

I tested with HiveContext as well, and it takes a similar amount of time. To make things clear, the following is the select clause for a given column:

*aggregateStats("$columnName", max(cast($columnName as double)), min(cast($columnName as double)), avg(cast($columnName as double)), count(*))*

aggregateStats is a UDF that generates a case class to hold the values.

Regards,
Madhukara Phatak
http://datamantra.io/

On Tue, May 19, 2015 at 5:57 PM, madhu phatak <phatak....@gmail.com> wrote:
> Hi,
> I tested calculating the values for 300 columns. The analyser takes around 4 minutes to generate the plan. Is this normal?
>
> On Tue, May 19, 2015 at 4:35 PM, madhu phatak <phatak....@gmail.com> wrote:
>> Hi,
>> I am using Spark 1.3.1.
>>
>> On Tue, May 19, 2015 at 4:34 PM, Wangfei (X) <wangf...@huawei.com> wrote:
>>> And which version are you using?
>>>
>>> On May 19, 2015, at 18:29, "ayan guha" <guha.a...@gmail.com> wrote:
>>>
>>> Can you kindly share your code?
>>>
>>> On Tue, May 19, 2015 at 8:04 PM, madhu phatak <phatak....@gmail.com> wrote:
>>>> Hi,
>>>> I am trying to run a Spark SQL aggregation on a file with 26k columns. The number of rows is very small. The issue I am running into is that Spark takes a huge amount of time to parse the SQL and create a logical plan. Even with just one row, it takes more than 1 hour just to get past the parsing. Any ideas on how to optimize this kind of scenario?
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
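To give a sense of the scale involved, here is a small plain-Scala sketch (it does not use Spark) that repeats the per-column select clause from this thread for every column and joins the results into one query string. `aggregateStats` is the poster's own UDF and is referenced only by name; the table name `stats_table` and the `c0`, `c1`, ... column names are hypothetical placeholders. With 26k columns, the single SELECT holds 104,000 built-in aggregate calls (max, min, avg, count per column) plus 26,000 UDF calls, and every one of them has to be parsed and resolved before a plan exists.

```scala
// Sketch only: rebuilds the per-column clause quoted in the thread and
// assembles a single wide SELECT. `aggregateStats` is the poster's UDF,
// referenced by name only; `stats_table` and the c0..cN column names are
// hypothetical placeholders, not taken from the original code.
object WideSelectSketch {

  // One column's clause, matching the expression in the message.
  def columnClause(columnName: String): String =
    s"""aggregateStats("$columnName", max(cast($columnName as double)), """ +
      s"min(cast($columnName as double)), avg(cast($columnName as double)), count(*))"

  // Joins every column's clause into one query string.
  def wideQuery(columns: Seq[String], table: String): String =
    columns.map(columnClause).mkString("SELECT ", ", ", s" FROM $table")

  // Built-in aggregate calls per column: max, min, avg, count.
  val builtinAggregatesPerColumn = 4

  def main(args: Array[String]): Unit = {
    val columns = (0 until 26000).map(i => s"c$i")
    val query = wideQuery(columns, "stats_table")
    println(s"columns: ${columns.size}")
    println(s"built-in aggregate calls: ${columns.size * builtinAggregatesPerColumn}")
    println(s"UDF calls: ${columns.size}")
    println(s"query length (chars): ${query.length}")
  }
}
```

Running `main` prints the column count, the aggregate-call counts, and the length of the generated SQL text, which is a quick way to see how large the input to the parser and analyser becomes as the column count grows from 300 to 26k.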