Re: Spark SQL on large number of columns

madhu phatak Tue, 19 May 2015 05:53:59 -0700

Hi,
I tested with HiveContext as well; it takes a similar amount of time.

To make things clear, the following is the select clause for a given column:

aggregateStats("$columnName", max(cast($columnName as double)),
  min(cast($columnName as double)), avg(cast($columnName as double)),
  count(*))

aggregateStats is a UDF that builds a case class to hold the values.
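For anyone trying to reproduce this, the following is a minimal sketch of how such a query might be assembled. It assumes the per-column clause is generated with a Scala string interpolator; the stray "|" in the original paste looks like residue of stripMargin. The names ColumnStats, selectClauseFor and buildQuery are hypothetical, and the commented sqlContext.udf.register call is an assumed way of registering the UDF, not code from this thread. Each column contributes one UDF call, four aggregates and three casts, so 300 columns already produce 1,200 aggregate expressions and 26k columns about 104,000.

```scala
// Hypothetical reconstruction: ColumnStats, selectClauseFor and buildQuery are
// illustrative names, not code from this thread.
case class ColumnStats(name: String, max: Double, min: Double, avg: Double, count: Long)

object StatsQuery {
  // Plain Scala function wrapping the aggregates into a case class. In Spark 1.3
  // it could presumably be registered like this (assumed usage):
  //   sqlContext.udf.register("aggregateStats", aggregateStats _)
  def aggregateStats(name: String, max: Double, min: Double, avg: Double, count: Long): ColumnStats =
    ColumnStats(name, max, min, avg, count)

  // Select-clause fragment for one column; stripMargin removes the leading '|'.
  def selectClauseFor(columnName: String): String =
    s"""aggregateStats("$columnName", max(cast($columnName as double)),
       |min(cast($columnName as double)), avg(cast($columnName as double)),
       |count(*))""".stripMargin

  // One SELECT over all columns: per column, 1 UDF call, 4 aggregates and 3 casts.
  def buildQuery(table: String, columns: Seq[String]): String =
    columns.map(selectClauseFor).mkString("SELECT ", ",\n", s" FROM $table")

  def main(args: Array[String]): Unit = {
    val cols = (1 to 300).map(i => s"c$i")
    val query = buildQuery("data", cols)
    println(s"${cols.size} columns, ${cols.size * 4} aggregates, ${query.length} chars of SQL")
  }
}
```

The whole statement is a single SELECT, so the analyser has to resolve every one of these expressions before any data is read. This is consistent with the plan-generation time growing with the number of columns even when there is only one row.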

Regards,
Madhukara Phatak
http://datamantra.io/

On Tue, May 19, 2015 at 5:57 PM, madhu phatak <phatak....@gmail.com> wrote:

> Hi,
> I tested computing these values for 300 columns. The analyser takes around 4
> minutes to generate the plan. Is this normal?
>
>
>
>
>
> On Tue, May 19, 2015 at 4:35 PM, madhu phatak <phatak....@gmail.com>
> wrote:
>
>> Hi,
>> I am using Spark 1.3.1.
>>
>>
>>
>>
>>
>> On Tue, May 19, 2015 at 4:34 PM, Wangfei (X) <wangf...@huawei.com> wrote:
>>
>>> And which version are you using?
>>>
>>> Sent from my iPhone
>>>
>>> On May 19, 2015, at 18:29, "ayan guha" <guha.a...@gmail.com> wrote:
>>>
>>> Can you kindly share your code?
>>>
>>> On Tue, May 19, 2015 at 8:04 PM, madhu phatak <phatak....@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> I am trying to run a Spark SQL aggregation on a file with 26k columns. The
>>>> number of rows is very small. The issue I am running into is that Spark takes
>>>> a huge amount of time to parse the SQL and create a logical plan. Even with
>>>> just one row, it takes more than 1 hour just to get past the parsing.
>>>> Any idea how to optimize for these kinds of scenarios?
>>>>
>>>>
>>>
>>>
>>>
>>>  --
>>> Best Regards,
>>> Ayan Guha
>>>
>>>
>>
>
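The optimization question at the bottom of the thread is left open in these messages. One possible workaround, offered only as an untested sketch and not something proposed by the participants, is to skip the wide SQL plan and compute the same statistics in a single pass over the rows. The sketch assumes each row has already been parsed into an Array[Double] with no missing values. This differs from the SQL version, where a failed cast yields null, which max/min/avg skip but count(*) still counts. The object name ColumnStatsFold and its members are hypothetical. The zero/seqOp/combOp shape is what RDD.aggregate expects, so in Spark the call would presumably be rdd.aggregate(ColumnStatsFold.zero(numCols))(ColumnStatsFold.seqOp, ColumnStatsFold.combOp).

```scala
// Hypothetical single-pass alternative: max/min/sum/count for every column with
// one fold, instead of a SELECT containing thousands of aggregate expressions.
object ColumnStatsFold {
  // Running statistics for one column; avg is derived from sum / count.
  final case class Acc(max: Double, min: Double, sum: Double, count: Long) {
    def add(v: Double): Acc = Acc(math.max(max, v), math.min(min, v), sum + v, count + 1)
    def merge(o: Acc): Acc =
      Acc(math.max(max, o.max), math.min(min, o.min), sum + o.sum, count + o.count)
    def avg: Double = if (count == 0) Double.NaN else sum / count
  }

  val empty: Acc = Acc(Double.NegativeInfinity, Double.PositiveInfinity, 0.0, 0L)

  // Zero value: one empty accumulator per column.
  def zero(numCols: Int): Array[Acc] = Array.fill(numCols)(empty)

  // seqOp: fold one parsed row into the per-column accumulators (updates in place).
  def seqOp(acc: Array[Acc], row: Array[Double]): Array[Acc] = {
    var i = 0
    while (i < acc.length) { acc(i) = acc(i).add(row(i)); i += 1 }
    acc
  }

  // combOp: merge the accumulators of two partitions (updates the first in place).
  def combOp(a: Array[Acc], b: Array[Acc]): Array[Acc] = {
    var i = 0
    while (i < a.length) { a(i) = a(i).merge(b(i)); i += 1 }
    a
  }

  def main(args: Array[String]): Unit = {
    // A local foldLeft stands in for a single Spark partition.
    val rows = Seq(Array(1.0, 10.0), Array(3.0, 20.0), Array(2.0, 30.0))
    val stats = rows.foldLeft(zero(2))(seqOp)
    stats.zipWithIndex.foreach { case (s, i) =>
      println(s"col$i max=${s.max} min=${s.min} avg=${s.avg} count=${s.count}")
    }
  }
}
```

Updating the accumulator array in place keeps allocation low with 26k columns. The RDD.aggregate documentation allows both functions to modify and return their first argument.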