Hi,
Some additional information: the table is backed by a CSV file which is read using spark-csv from Databricks.
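Since the code was asked for below, here is a minimal sketch of the approach, assuming Spark 1.3-era APIs run from spark-shell (the file path and the temp table name wide_table are placeholders, not the real ones):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)  // sc is provided by spark-shell

// Load the CSV via the spark-csv data source (path is a placeholder).
val df = sqlContext.load(
  "com.databricks.spark.csv",
  Map("path" -> "/path/to/data.csv", "header" -> "true"))

df.registerTempTable("wide_table")

// One max/min/avg triple per column, joined into a single SELECT.
val aggExprs = df.columns.map { c =>
  s"max(cast($c as double)), min(cast($c as double)), avg(cast($c as double))"
}.mkString(", ")

val result = sqlContext.sql(s"SELECT $aggExprs, count(*) FROM wide_table")
result.show()

Note that with 26000 columns this generates roughly 78000 aggregate expressions in one statement, which is what the SQL parser and analyzer have to work through.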
Regards,
Madhukara Phatak
http://datamantra.io/

On Tue, May 19, 2015 at 4:05 PM, madhu phatak <phatak....@gmail.com> wrote:

> Hi,
> I have fields from field_0 to field_26000. The query selects
>
>   max(cast($columnName as double)),
>   min(cast($columnName as double)),
>   avg(cast($columnName as double)),
>   count(*)
>
> for all those 26000 fields in one query.
>
> Regards,
> Madhukara Phatak
> http://datamantra.io/
>
> On Tue, May 19, 2015 at 3:59 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> Can you kindly share your code?
>>
>> On Tue, May 19, 2015 at 8:04 PM, madhu phatak <phatak....@gmail.com> wrote:
>>
>>> Hi,
>>> I am trying to run a Spark SQL aggregation on a file with 26k columns.
>>> The number of rows is very small. I am running into an issue where Spark
>>> takes a huge amount of time to parse the SQL and create a logical plan.
>>> Even if I have just one row, it takes more than 1 hour just to get past
>>> the parsing. Any idea how to optimize these kinds of scenarios?
>>>
>>> Regards,
>>> Madhukara Phatak
>>> http://datamantra.io/
>>
>> --
>> Best Regards,
>> Ayan Guha