Re: Spark SQL on large number of columns

madhu phatak Tue, 19 May 2015 07:20:57 -0700

Hi,
Another update: when I run on more than 1000 columns, I get the following error:

Could not write class
__wrapper$1$40255d281a0d4eacab06bcad6cf89b0d/__wrapper$1$40255d281a0d4eacab06bcad6cf89b0d$$anonfun$wrapper$1$$anon$1
because it exceeds JVM code size limits. Method apply's code too large!
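
As a rough illustration of how the query in this thread scales, here is a minimal Scala sketch that assembles the per-column select clause quoted further down. It assumes the query text is built with Scala string interpolation, which the `$columnName` placeholders suggest but do not confirm. The table name `data` and the column names `c0`, `c1`, ... are hypothetical, and the sketch only builds the SQL string; it does not call Spark. Each column contributes eight calls (the `aggregateStats` UDF, four aggregates and three casts), so 26k columns put roughly 208,000 calls into a single SELECT list for the parser and analyzer to resolve.

```scala
// Sketch only: builds the SQL text described in this thread, without running Spark.
// The table name "data" and the column names "c0", "c1", ... are hypothetical.
object WideAggregateQuery {

  // One select expression per column, mirroring the clause quoted in the thread.
  // aggregateStats is the poster's own UDF; here it is only referenced by name.
  def columnExpr(columnName: String): String =
    s"""aggregateStats("$columnName", max(cast($columnName as double)), """ +
      s"min(cast($columnName as double)), avg(cast($columnName as double)), count(*))"

  // Join every per-column expression into one SELECT statement.
  def buildQuery(table: String, columns: Seq[String]): String =
    columns.map(columnExpr).mkString("SELECT ", ",\n  ", s"\nFROM $table")

  def main(args: Array[String]): Unit = {
    for (n <- Seq(300, 1000, 26000)) {
      val cols = (0 until n).map(i => s"c$i")
      val sql = buildQuery("data", cols)
      // The query text grows linearly with the number of columns.
      println(s"$n columns -> ${sql.length} characters of SQL")
    }
  }
}
```

Running `main` prints the generated query length for 300, 1000 and 26,000 columns. The JVM caps the bytecode of a single method at 64 KB, so if the generated `apply` method named in the error evaluates every column's expressions in one body, it is plausible that this cap is what gets hit somewhere past 1000 columns.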

Regards,
Madhukara Phatak
http://datamantra.io/

On Tue, May 19, 2015 at 6:23 PM, madhu phatak <phatak....@gmail.com> wrote:

> Hi,
> I tested with HiveContext as well; it takes a similar amount of time.
>
> To make things clear, the following is the select clause for a given column:
>
>
> aggregateStats("$columnName", max(cast($columnName as double)),
>   min(cast($columnName as double)), avg(cast($columnName as double)),
>   count(*))
>
> aggregateStats is a UDF that builds a case class to hold the values.
>
> Regards,
> Madhukara Phatak
> http://datamantra.io/
>
> On Tue, May 19, 2015 at 5:57 PM, madhu phatak <phatak....@gmail.com>
> wrote:
>
>> Hi,
>> I tested computing these values for 300 columns. The analyzer takes around 4
>> minutes to generate the plan. Is this normal?
>>
>> Regards,
>> Madhukara Phatak
>> http://datamantra.io/
>>
>> On Tue, May 19, 2015 at 4:35 PM, madhu phatak <phatak....@gmail.com>
>> wrote:
>>
>>> Hi,
>>> I am using Spark 1.3.1.
>>>
>>> Regards,
>>> Madhukara Phatak
>>> http://datamantra.io/
>>>
>>> On Tue, May 19, 2015 at 4:34 PM, Wangfei (X) <wangf...@huawei.com>
>>> wrote:
>>>
>>>> And which version are you using?
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On May 19, 2015, at 18:29, "ayan guha" <guha.a...@gmail.com> wrote:
>>>>
>>>> Can you kindly share your code?
>>>>
>>>> On Tue, May 19, 2015 at 8:04 PM, madhu phatak <phatak....@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> I am trying to run a Spark SQL aggregation on a file with 26k columns.
>>>>> The number of rows is very small. The problem is that Spark takes a huge
>>>>> amount of time to parse the SQL and create a logical plan. Even with
>>>>> just one row, it takes more than an hour just to get past parsing.
>>>>> Any idea how to optimize for this kind of scenario?
>>>>>
>>>>>
>>>>>  Regards,
>>>>>  Madhukara Phatak
>>>>> http://datamantra.io/
>>>>>
>>>>
>>>>
>>>>
>>>>  --
>>>> Best Regards,
>>>> Ayan Guha
>>>>
>>>>
>>>
>>
>
