If you want to calculate the mean, variance, minimum, maximum and total count
for each column, especially for machine learning features, you can try
MultivariateOnlineSummarizer.
MultivariateOnlineSummarizer implements a numerically stable algorithm to
compute sample mean and variance by column in
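
For illustration, here is a minimal sketch of feeding a few samples into the summarizer and reading the per-column statistics back. The object name ColumnStatsSketch, the helper summarize and the sample values are made up for this example; it assumes spark-mllib is on the classpath and that each row is already an MLlib Vector.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer

// Hypothetical wrapper object, used only for this illustration.
object ColumnStatsSketch {

  // Add every sample (one row of features) to a fresh summarizer.
  def summarize(rows: Seq[Vector]): MultivariateOnlineSummarizer =
    rows.foldLeft(new MultivariateOnlineSummarizer())((s, v) => s.add(v))

  def main(args: Array[String]): Unit = {
    // Made-up sample data: three rows, two columns.
    val rows = Seq(
      Vectors.dense(1.0, 10.0),
      Vectors.dense(2.0, 20.0),
      Vectors.dense(3.0, 30.0))

    val stats = summarize(rows)
    println(s"count    = ${stats.count}")
    println(s"mean     = ${stats.mean}")
    println(s"variance = ${stats.variance}")
    println(s"min      = ${stats.min}")
    println(s"max      = ${stats.max}")
  }
}
```

On an RDD[Vector] the same add step can be combined across partitions with rdd.aggregate and the summarizer's merge method, or you can call Statistics.colStats(rdd), which returns the same kind of summary.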
The “output” variable is actually a SchemaRDD; it provides a rich DSL API, see
http://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD
1) How do I save the result values of a query into a list?
[CH:] val list: Array[Row] = output.collect; however, getting 1M records into
Thank you very much.
Two more small questions:
1) val output = sqlContext.sql("SELECT * FROM people")
My output has 128 columns and a single row.
How do I find which column has the maximum value in a single row using
Scala?
2) As each row has 128 columns, how do I print each row into a text
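
One possible approach to both questions, as a rough sketch in plain Scala: it assumes the values of a row have already been extracted and converted to Double (how you do that depends on your column types). The names RowMaxSketch, argMax and toLine are hypothetical helpers, not part of Spark.

```scala
// Hypothetical helper object, used only for this illustration.
object RowMaxSketch {

  // Returns (columnIndex, value) of the largest entry in the row.
  // On ties, the first column holding the maximum is returned.
  def argMax(values: Seq[Double]): (Int, Double) = {
    require(values.nonEmpty, "row must have at least one column")
    val (value, index) = values.zipWithIndex.maxBy(_._1)
    (index, value)
  }

  // Renders one row as a single delimited line of text.
  def toLine(values: Seq[Double], sep: String = ","): String =
    values.mkString(sep)

  def main(args: Array[String]): Unit = {
    // Made-up row with 4 columns instead of 128.
    val row = Seq(3.0, 42.5, 7.0, 42.0)
    val (idx, max) = argMax(row)
    println(s"max value $max is in column $idx")
    println(toLine(row))
  }
}
```

The column index here is zero-based, so add 1 if you count columns from one.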