Re: Apply ML to grouped dataframe

2016-08-23 Thread Wen Pei Yu
[2.0,16.0]| |12462589343|3| [1.0,1.0]| +---+-++ From: ayan guha <guha.a...@gmail.com> To: Wen Pei Yu/China/IBM@IBMCN Cc: user <user@spark.apache.org>, Nirmal Fernando <nir...@wso2.com> Date: 08/23/2016 05:13 PM Subject: Re: A

Re: Apply ML to grouped dataframe

2016-08-23 Thread ayan guha
do <nir...@wso2.com> > To: Wen Pei Yu/China/IBM@IBMCN > Cc: User <user@spark.apache.org> > Date: 08/23/2016 01:55 PM > Subject: Re: Apply ML to grouped dataframe > > On Tue, Aug 23, 2016 at 10:56 AM, Wen Pei Yu <*yuw...

Re: Apply ML to grouped dataframe

2016-08-23 Thread Wen Pei Yu
User <user@spark.apache.org> Date: 08/23/2016 01:55 PM Subject: Re: Apply ML to grouped dataframe On Tue, Aug 23, 2016 at 10:56 AM, Wen Pei Yu <yuw...@cn.ibm.com> wrote: We can group a dataframe by one column like df.groupBy(df.col("gender")) On top of this

Re: Apply ML to grouped dataframe

2016-08-22 Thread Nirmal Fernando
> From: Nirmal Fernando <nir...@wso2.com> > To: Wen Pei Yu/China/IBM@IBMCN > Cc: User <user@spark.apache.org> > Date: 08/23/2016 01:14 PM > Subject: Re: Apply ML to grouped dataframe > > Hi Wen, > > AFAIK

Re: Apply ML to grouped dataframe

2016-08-22 Thread Wen Pei Yu
: Nirmal Fernando <nir...@wso2.com> To: Wen Pei Yu/China/IBM@IBMCN Cc: User <user@spark.apache.org> Date: 08/23/2016 01:14 PM Subject: Re: Apply ML to grouped dataframe Hi Wen, AFAIK Spark MLlib implements its machine learning algorithms on top of Spark dataframe 

Re: Apply ML to grouped dataframe

2016-08-22 Thread Nirmal Fernando
b > http://spark.apache.org/docs/latest/ml-guide.html# > announcement-dataframe-bas > > From: Nirmal Fernando <nir...@wso2.com> > To: Wen Pei Yu/China/IBM@IBMCN > Cc: User <user@spark.apache.org> > Date: 08/23/2016 10:26 AM > Subject: Re: Apply ML to grouped

Re: Apply ML to grouped dataframe

2016-08-22 Thread Wen Pei Yu
Hi Nirmal, I didn't get your point. Can you tell me more about how to apply MLlib to a grouped dataframe? Regards, Wenpei. From: Nirmal Fernando <nir...@wso2.com> To: Wen Pei Yu/China/IBM@IBMCN Cc: User <user@spark.apache.org> Date: 08/23/2016 10:26 AM Subject:

Re: Apply ML to grouped dataframe

2016-08-22 Thread Nirmal Fernando
You can use Spark MLlib http://spark.apache.org/docs/latest/ml-guide.html#announcement-dataframe-based-api-is-primary-api On Tue, Aug 23, 2016 at 7:34 AM, Wen Pei Yu wrote: > Hi > > We have a dataframe, then want group it and apply a ML algorithm or > statistics(say t test)

Apply ML to grouped dataframe

2016-08-22 Thread Wen Pei Yu
Hi, We have a dataframe that we want to group, and then apply an ML algorithm or a statistic (say, a t-test) to each group. Is there an efficient way to do this? Currently we move to PySpark, use groupByKey, and apply a NumPy function to each group's array. But this isn't an efficient approach, right? Regards.
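
For readers following the thread, the group-then-apply pattern described in this message can be sketched locally in plain Python. This is only an illustration under stated assumptions, not the poster's code and not a Spark API: the helper names (group_values, one_sample_t, t_per_group), the sample rows, and the choice of a one-sample t statistic against a reference mean mu0 are all hypothetical. It uses the standard library in place of PySpark and NumPy so that it runs anywhere.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def group_values(rows):
    """Collect (key, value) pairs into {key: [values]}.

    This plays the role that groupByKey plays in the message above,
    but runs locally on an in-memory list (assumption: data fits in memory).
    """
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)


def one_sample_t(values, mu0=0.0):
    """One-sample t statistic: (mean - mu0) / (s / sqrt(n)).

    Uses the sample standard deviation (n - 1 in the denominator),
    so each group needs at least two values.
    """
    n = len(values)
    return (mean(values) - mu0) / (stdev(values) / sqrt(n))


def t_per_group(rows, mu0=0.0):
    """Apply the statistic independently to every group."""
    return {key: one_sample_t(vals, mu0)
            for key, vals in group_values(rows).items()}


# Hypothetical sample data: (group key, measurement) pairs.
rows = [("F", 2.0), ("F", 4.0), ("F", 6.0), ("M", 1.0), ("M", 3.0)]
result = t_per_group(rows, mu0=0.0)
```

The point of the sketch is the shape of the computation: every group is materialized as a full list before the statistic is applied to it, which is the step the message suspects of being inefficient when done through groupByKey on a cluster.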