You can use Spark MLlib. Its DataFrame-based API is now the primary API:
http://spark.apache.org/docs/latest/ml-guide.html#announcement-dataframe-based-api-is-primary-api
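For the t-test case in the question, one option is Welch's t-test computed from per-group summary statistics. This is a different technique from MLlib, offered only as a sketch. Spark then only has to aggregate, rather than ship each group's raw values through groupByKey. Assume a hypothetical DataFrame with a group column `g` and a value column `x`. The count, mean and sample variance per group could come from one pass such as `df.groupBy("g").agg(F.count("x"), F.avg("x"), F.var_samp("x"))`. The Python below does only the final arithmetic on those summaries. `GroupSummary`, `summarize` and `welch_t_test` are illustrative names, not a Spark API.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupSummary:
    """Per-sample summary: count, mean and sample variance (n - 1 denominator)."""
    n: int
    mean: float
    var: float


def summarize(values: list[float]) -> GroupSummary:
    """Local stand-in (hypothetical helper) for a Spark count/avg/var_samp aggregation."""
    return GroupSummary(len(values), statistics.fmean(values), statistics.variance(values))


def welch_t_test(a: GroupSummary, b: GroupSummary) -> tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Only the three summaries per sample are needed, so the raw values never
    have to be collected into one array per group.
    """
    se_a = a.var / a.n  # squared standard error of mean a
    se_b = b.var / b.n  # squared standard error of mean b
    t = (a.mean - b.mean) / math.sqrt(se_a + se_b)
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (a.n - 1) + se_b ** 2 / (b.n - 1))
    return t, dof


if __name__ == "__main__":
    control = summarize([5.1, 4.9, 5.3, 5.0])
    treated = summarize([5.6, 5.8, 5.5, 5.9])
    t, dof = welch_t_test(control, treated)
    print(f"t = {t:.3f}, dof = {dof:.3f}")
```

A p-value would still need the t distribution's CDF, for example from SciPy on the driver. That lookup is cheap because it runs once per group on a handful of numbers.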
On Tue, Aug 23, 2016 at 7:34 AM, Wen Pei Yu <yuw...@cn.ibm.com> wrote:
> Hi,
>
> We have a DataFrame and want to group it, then apply an ML algorithm or
> a statistic (say, a t-test) to each group. Is there an efficient way to
> do this?
>
> Currently, we switch to PySpark, use groupByKey and apply a NumPy
> function to each array. But this isn't an efficient way, right?
>
> Regards,
> Wenpei

--
Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/