Re: spark sql query optimization , and decision tree building

2014-10-27 Thread Yanbo Liang
If you want to calculate the mean, variance, minimum, maximum, and total count for each column, especially for machine learning features, you can try MultivariateOnlineSummarizer. MultivariateOnlineSummarizer implements a numerically stable algorithm to compute sample mean and variance by column in
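
For illustration, here is a minimal sketch of feeding a few vectors to MultivariateOnlineSummarizer and reading the per-column statistics back. It assumes spark-mllib is on the classpath; the object name ColumnStatsSketch and the sample rows are made up for the example and are not from the thread.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer

object ColumnStatsSketch {
  // Hypothetical sample data: three rows, two feature columns.
  val rows: Seq[Vector] = Seq(
    Vectors.dense(1.0, 10.0),
    Vectors.dense(2.0, 20.0),
    Vectors.dense(3.0, 60.0))

  // Feed the rows in one at a time; add() updates the summarizer and returns it.
  val summary: MultivariateOnlineSummarizer =
    rows.foldLeft(new MultivariateOnlineSummarizer)((s, v) => s.add(v))

  def main(args: Array[String]): Unit = {
    println(s"count    = ${summary.count}")
    println(s"mean     = ${summary.mean}")
    println(s"variance = ${summary.variance}")
    println(s"min      = ${summary.min}")
    println(s"max      = ${summary.max}")
  }
}
```

On an RDD[Vector] the same summarizer can be built with rdd.aggregate(new MultivariateOnlineSummarizer)((s, v) => s.add(v), (a, b) => a.merge(b)), or more simply with Statistics.colStats(rdd), which returns the same kind of summary.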

RE: spark sql query optimization , and decision tree building

2014-10-22 Thread Cheng, Hao
The “output” variable is actually a SchemaRDD; it provides a lot of DSL APIs, see http://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD 1) How to save the result values of a query into a list? [CH:] val list: Array[Row] = output.collect, however get 1M records into

Re: spark sql query optimization , and decision tree building

2014-10-22 Thread sanath kumar
Thank you very much. Two more small questions: 1) val output = sqlContext.sql("SELECT * FROM people") My output has 128 columns and a single row. How do I find which column has the maximum value in a single row using Scala? 2) As each row has 128 columns, how do I print each row into a text
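
One possible way to approach both questions in plain Scala, as a rough sketch only: it assumes the values of a row have already been pulled out as Doubles (for example from a collected Row), and the names RowMaxSketch, argMax and toLine are hypothetical helpers, not Spark APIs.

```scala
object RowMaxSketch {
  // Returns (column index, value) of the largest entry in one row.
  // On ties, the first column holding the maximum is returned.
  def argMax(values: Seq[Double]): (Int, Double) = {
    require(values.nonEmpty, "row must have at least one column")
    val (value, index) = values.zipWithIndex.maxBy(_._1)
    (index, value)
  }

  // Renders one row as a single line of delimited text.
  def toLine(values: Seq[Double], sep: String = ","): String =
    values.mkString(sep)

  def main(args: Array[String]): Unit = {
    val row = Seq(3.0, 9.5, 1.0) // stand-in for a 128-column row
    val (index, value) = argMax(row)
    println(s"max value $value is in column $index")
    println(toLine(row))
  }
}
```

The column index here is zero-based. For many rows, mapping each row through toLine and writing the result with saveAsTextFile on the RDD would avoid collecting everything to the driver.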