One option is to batch up the columns and run the batches in sequence.

On 20 May 2015 00:20, "madhu phatak" <phatak....@gmail.com> wrote:
> Hi,
> Another update: when run on more than 1000 columns, I am getting
>
> Could not write class
> __wrapper$1$40255d281a0d4eacab06bcad6cf89b0d/__wrapper$1$40255d281a0d4eacab06bcad6cf89b0d$$anonfun$wrapper$1$$anon$1
> because it exceeds JVM code size limits. Method apply's code too large!
>
> Regards,
> Madhukara Phatak
> http://datamantra.io/
>
> On Tue, May 19, 2015 at 6:23 PM, madhu phatak <phatak....@gmail.com> wrote:
>
>> Hi,
>> Tested with HiveContext also. It also takes a similar amount of time.
>>
>> To make things clear, the following is the select clause for a given
>> column:
>>
>> aggregateStats( "$columnName" , max( cast($columnName as double)),
>> |min(cast($columnName as double)), avg(cast($columnName as double)),
>> count(*) )
>>
>> aggregateStats is a UDF generating a case class to hold the values.
>>
>> On Tue, May 19, 2015 at 5:57 PM, madhu phatak <phatak....@gmail.com> wrote:
>>
>>> Hi,
>>> Tested calculating values for 300 columns. The analyser takes around 4
>>> minutes to generate the plan. Is this normal?
>>>
>>> On Tue, May 19, 2015 at 4:35 PM, madhu phatak <phatak....@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I am using Spark 1.3.1.
>>>>
>>>> On Tue, May 19, 2015 at 4:34 PM, Wangfei (X) <wangf...@huawei.com> wrote:
>>>>
>>>>> And which version are you using?
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On 19 May 2015, at 18:29, "ayan guha" <guha.a...@gmail.com> wrote:
>>>>>
>>>>> Can you kindly share your code?
>>>>>
>>>>> On Tue, May 19, 2015 at 8:04 PM, madhu phatak <phatak....@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I am trying to run a Spark SQL aggregation on a file with 26k columns.
>>>>>> The number of rows is very small. I am running into an issue where
>>>>>> Spark takes a huge amount of time to parse the SQL and create a
>>>>>> logical plan. Even with just one row, it takes more than 1 hour just
>>>>>> to get past the parsing. Any idea how to optimize in this kind of
>>>>>> scenario?
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Ayan Guha
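
To make the batching suggestion at the top of this thread concrete, here is a minimal sketch in plain Python that only builds the SQL text, one query per batch of columns. It reuses the select clause quoted in the thread. The table name `wide_table`, the helper names, and the batch size are all assumptions for illustration. The thread reports that 300 columns took about 4 minutes to analyse and that more than 1000 columns hit the JVM code size limit, so a batch size well below those numbers seems a reasonable starting point, but nothing here has been measured.

```python
from typing import Iterator, List


def select_clause(column: str) -> str:
    """Build the per-column aggregate expression quoted in the thread."""
    return (
        f'aggregateStats("{column}", '
        f"max(cast({column} as double)), "
        f"min(cast({column} as double)), "
        f"avg(cast({column} as double)), "
        f"count(*))"
    )


def batches(columns: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size column names."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(columns), batch_size):
        yield columns[start:start + batch_size]


def batched_queries(columns: List[str], table: str, batch_size: int) -> List[str]:
    """Return one SELECT statement per batch instead of one huge statement."""
    return [
        f"SELECT {', '.join(select_clause(c) for c in batch)} FROM {table}"
        for batch in batches(columns, batch_size)
    ]


if __name__ == "__main__":
    # Hypothetical column names; the real file is said to have about 26k columns.
    cols = [f"c{i}" for i in range(10)]
    for query in batched_queries(cols, "wide_table", batch_size=4):
        print(query)
```

Each generated string would then be submitted on its own (for example through the SQL context's `sql` method) and the per-batch results combined afterwards. Every query still scans the input, so caching the table first may help if the row count ever grows.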