Re: Spark SQL on large number of columns

2015-05-19 Thread madhu phatak
Hi, I am using Spark 1.3.1. Regards, Madhukara Phatak http://datamantra.io/ On Tue, May 19, 2015 at 4:34 PM, Wangfei (X) wangf...@huawei.com wrote: And which version are you using? Sent from my iPhone On May 19, 2015, at 18:29, ayan guha guha.a...@gmail.com wrote: can you kindly share your code? On

Re: Spark SQL on large number of columns

2015-05-19 Thread Wangfei (X)
And which version are you using? Sent from my iPhone On May 19, 2015, at 18:29, ayan guha guha.a...@gmail.com wrote: can you kindly share your code? On Tue, May 19, 2015 at 8:04 PM, madhu phatak phatak@gmail.com wrote: Hi, I am trying to run a Spark SQL

Spark SQL on large number of columns

2015-05-19 Thread madhu phatak
Hi, I am trying to run a Spark SQL aggregation on a file with 26k columns. The number of rows is very small. I am running into an issue where Spark takes a huge amount of time to parse the SQL and create a logical plan. Even if I have just one row, it takes more than 1 hour just to get past the parsing. Any

Re: Spark SQL on large number of columns

2015-05-19 Thread madhu phatak
Hi, I have fields from field_0 to field_26000. The query selects max(cast($columnName as double)), min(cast($columnName as double)), avg(cast($columnName as double)), count(*) for all those 26,000 fields in one query. Regards, Madhukara Phatak http://datamantra.io/ On Tue, May 19,
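
A minimal sketch (Scala, Spark 1.3.x style) of how a query of that shape could be generated; the table name wide_table and the sqlContext variable are assumptions for illustration, not taken from the thread:

    // Build one SELECT containing max/min/avg (and count) for every field.
    val columns = (0 to 26000).map(i => s"field_$i")

    val selectExprs = columns.map { c =>
      s"max(cast($c as double)), min(cast($c as double)), avg(cast($c as double)), count(*)"
    }.mkString(", ")

    // A single very wide query like this is what the analyser has to plan.
    val result = sqlContext.sql(s"SELECT $selectExprs FROM wide_table")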

Re: Spark SQL on large number of columns

2015-05-19 Thread ayan guha
can you kindly share your code? On Tue, May 19, 2015 at 8:04 PM, madhu phatak phatak@gmail.com wrote: Hi, I am trying to run a Spark SQL aggregation on a file with 26k columns. The number of rows is very small. I am running into an issue where Spark takes a huge amount of time to parse the SQL and

Re: Spark SQL on large number of columns

2015-05-19 Thread madhu phatak
Hi, some additional information: the table is backed by a CSV file which is read using spark-csv from Databricks. Regards, Madhukara Phatak http://datamantra.io/ On Tue, May 19, 2015 at 4:05 PM, madhu phatak phatak@gmail.com wrote: Hi, I have fields from field_0 to field_26000. The query
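
A minimal sketch of loading such a file with spark-csv on Spark 1.3.x; the file path, header option, and table name are assumptions for illustration:

    // Load the CSV through the Databricks spark-csv data source (Spark 1.3.x API).
    val df = sqlContext.load(
      "com.databricks.spark.csv",
      Map("path" -> "/path/to/wide_file.csv", "header" -> "true"))

    // Register it so the aggregation query above can refer to it by name.
    df.registerTempTable("wide_table")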

Re: Spark SQL on large number of columns

2015-05-19 Thread madhu phatak
Hi, tested calculating values for 300 columns. The analyser takes around 4 minutes to generate the plan. Is this normal? Regards, Madhukara Phatak http://datamantra.io/ On Tue, May 19, 2015 at 4:35 PM, madhu phatak phatak@gmail.com wrote: Hi, I am using Spark 1.3.1 Regards,

Re: Spark SQL on large number of columns

2015-05-19 Thread madhu phatak
Hi, another update: when run on more than 1,000 columns I am getting "Could not write class __wrapper$1$40255d281a0d4eacab06bcad6cf89b0d/__wrapper$1$40255d281a0d4eacab06bcad6cf89b0d$$anonfun$wrapper$1$$anon$1 because it exceeds JVM code size limits. Method apply's code too large!" Regards,

Re: Spark SQL on large number of columns

2015-05-19 Thread madhu phatak
Hi, tested with HiveContext also. It also takes a similar amount of time. To make things clear, the following is the select clause for a given column: aggregateStats($columnName, max(cast($columnName as double)), min(cast($columnName as double)), avg(cast($columnName as double)), count(*))
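
One plausible way such a clause could be assembled in Scala; aggregateStats is taken from the mail and assumed to be a UDF registered in the user's environment, and the stripMargin formatting is an assumption suggested by the "|" characters in the quoted text:

    // Per-column clause as quoted in the mail, built as a multiline string.
    def columnClause(columnName: String): String =
      s"""aggregateStats($columnName, max(cast($columnName as double)),
         |min(cast($columnName as double)), avg(cast($columnName as double)),
         |count(*))""".stripMargin

    // HiveContext can be swapped in for SQLContext without changing the query text.
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)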

Re: Spark SQL on large number of columns

2015-05-19 Thread ayan guha
One option is to batch up the columns and process the batches in sequence. On 20 May 2015 00:20, madhu phatak phatak@gmail.com wrote: Hi, another update: when run on more than 1,000 columns I am getting Could not write class
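
A minimal sketch of that batching idea; the batch size, the columns and wide_table names, and the sequential loop are assumptions for illustration:

    // Process a few hundred columns per query instead of all 26k at once,
    // so each plan the analyser has to build stays small.
    val batchSize = 300
    val partialRows = columns.grouped(batchSize).toSeq.map { batch =>
      val exprs = batch.map { c =>
        s"max(cast($c as double)), min(cast($c as double)), avg(cast($c as double))"
      }.mkString(", ")
      sqlContext.sql(s"SELECT $exprs, count(*) FROM wide_table").collect()(0)
    }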