On Tue, Aug 23, 2016 at 10:56 AM, Wen Pei Yu <yuw...@cn.ibm.com> wrote:
> We can group a dataframe by one column, like
>
> df.groupBy(df.col("gender"))

On top of this DF, use a filter to extract each group as a separate DF.
Then you can apply ML to each DF, e.g.:

    xyzDF.filter(col("x").equalTo(x))

>
> It's like splitting a dataframe into multiple dataframes. Currently, we can
> only apply simple SQL functions such as agg, max, etc. to this GroupedData.
>
> What we want is to apply one ML algorithm to each group.
>
> Regards.
>
> From: Nirmal Fernando <nir...@wso2.com>
> To: Wen Pei Yu/China/IBM@IBMCN
> Cc: User <user@spark.apache.org>
> Date: 08/23/2016 01:14 PM
> Subject: Re: Apply ML to grouped dataframe
> ------------------------------
>
> Hi Wen,
>
> AFAIK Spark MLlib implements its machine learning algorithms on top of the
> Spark DataFrame API. What did you mean by a grouped dataframe?
>
> On Tue, Aug 23, 2016 at 10:42 AM, Wen Pei Yu <yuw...@cn.ibm.com> wrote:
>
> Hi Nirmal,
>
> I didn't get your point.
> Can you tell me more about how to use MLlib on a grouped dataframe?
>
> Regards.
> Wenpei.
>
> From: Nirmal Fernando <nir...@wso2.com>
> To: Wen Pei Yu/China/IBM@IBMCN
> Cc: User <user@spark.apache.org>
> Date: 08/23/2016 10:26 AM
> Subject: Re: Apply ML to grouped dataframe
> ------------------------------
>
> You can use Spark MLlib:
> http://spark.apache.org/docs/latest/ml-guide.html#announcement-dataframe-based-api-is-primary-api
>
> On Tue, Aug 23, 2016 at 7:34 AM, Wen Pei Yu <yuw...@cn.ibm.com> wrote:
>
> Hi,
>
> We have a dataframe that we want to group, and then apply an ML algorithm
> or a statistic (say, a t-test) to each group. Is there an efficient way
> to do this?
>
> Currently, we switch to PySpark, use groupByKey, and apply a numpy
> function to each resulting array. But this isn't an efficient way, right?
>
> Regards.
> Wenpei.
>
> --
>
> Thanks & regards,
> Nirmal
>
> Team Lead - WSO2 Machine Learner
> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
> Mobile: +94715779733
> Blog: http://nirmalfdo.blogspot.com/
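The following is a minimal sketch of the split-then-apply pattern discussed in this thread. It is written in plain Python rather than Spark, so it runs without a cluster. The record layout, the functions `split_by_filter`, `split_by_group` and `one_sample_t`, and the hypothesized mean `mu0` are illustrative assumptions. None of them are part of any Spark API.

- `split_by_filter` mimics the suggestion above: one filter pass per distinct key, analogous to calling `df.filter(col("gender").equalTo(value))` once for each value.
- `split_by_group` mimics the groupByKey approach.
- The one-sample t statistic stands in for whatever per-group statistic or model you actually want to apply.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical records: (group key, measured value).
records = [
    ("F", 5.1), ("F", 4.8), ("F", 5.6), ("F", 5.0),
    ("M", 6.2), ("M", 5.9), ("M", 6.8), ("M", 6.1),
]

def split_by_filter(rows):
    """One filter pass over the rows per distinct key, analogous to
    calling df.filter(col("gender").equalTo(k)) once for each key k."""
    keys = sorted({k for k, _ in rows})
    return {k: [v for key, v in rows if key == k] for k in keys}

def split_by_group(rows):
    """A single pass over the rows, analogous to groupByKey."""
    groups = defaultdict(list)
    for k, v in rows:
        groups[k].append(v)
    return dict(groups)

def one_sample_t(values, mu0):
    """One-sample t statistic: (mean - mu0) / (s / sqrt(n)),
    where s is the sample standard deviation (n - 1 denominator)."""
    n = len(values)
    s = statistics.stdev(values)
    return (statistics.mean(values) - mu0) / (s / math.sqrt(n))

mu0 = 5.5  # hypothetical null-hypothesis mean, chosen for illustration
per_group_t = {k: one_sample_t(vs, mu0)
               for k, vs in split_by_filter(records).items()}

for k, t in per_group_t.items():
    print(k, round(t, 3))
```

Both split functions produce the same groups; the difference is cost. The filter version reads the data once per distinct key, while the grouping version reads it once overall. With a real DataFrame, a per-key filter loop is therefore likely to be practical mainly when the number of groups is small and the DataFrame is cached before the loop.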