Re: Filtering RDD Using Spark.mllib's ChiSqSelector

2016-07-19 Thread Tobi Bosede
Thanks Yanbo, will try that!

On Sun, Jul 17, 2016 at 10:26 PM, Yanbo Liang wrote:
> Hi Tobi,
>
> Thanks for clarifying the question. It's very straightforward to convert
> the filtered RDD to a DataFrame; you can refer to the following code snippet:
>
> from pyspark.sql import

Re: Filtering RDD Using Spark.mllib's ChiSqSelector

2016-07-17 Thread Yanbo Liang
Hi Tobi,

Thanks for clarifying the question. It's very straightforward to convert the filtered RDD to a DataFrame; you can refer to the following code snippet:

    from pyspark.sql import Row

    # Wrap each filtered feature vector in a Row so that toDF() can infer a schema.
    rdd2 = filteredRDD.map(lambda v: Row(features=v))
    df = rdd2.toDF()

Thanks,
Yanbo

2016-07-16 14:51 GMT-07:00

Re: Filtering RDD Using Spark.mllib's ChiSqSelector

2016-07-16 Thread Tobi Bosede
Hi Yanbo,

Appreciate the response. I might not have phrased this correctly, but I really wanted to know how to convert the pipeline RDD into a DataFrame. I have seen the example you posted. However, I need to transform all my data, not just 1 line. So I did successfully use map to use the chisq

Re: Filtering RDD Using Spark.mllib's ChiSqSelector

2016-07-16 Thread Yanbo Liang
Hi Tobi,

The MLlib RDD-based API does support applying a transformation to both a Vector and an RDD, but you did not call it in the appropriate way. Suppose you have an RDD with a LabeledPoint in each row; you can refer to the following code snippet to train a ChiSqSelectorModel and apply the transformation:

Filtering RDD Using Spark.mllib's ChiSqSelector

2016-07-14 Thread Tobi Bosede
Hi everyone,

I am trying to filter my features based on the spark.mllib ChiSqSelector.

    filteredData = vectorizedTestPar.map(lambda lp: LabeledPoint(lp.label, model.transform(lp.features)))

However, when I do the following I get the error below. Is there any other way to filter my data to avoid
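For readers unfamiliar with what the selector does, the idea behind chi-squared feature filtering can be sketched in plain Python, without Spark. This is not Spark's implementation: the helper names (`chi_squared`, `select_top_features`, `filter_row`) and the toy data are hypothetical, and ranking features by the raw statistic is an assumption made for illustration. Each feature is treated as categorical, scored against the label with Pearson's chi-squared statistic, and only the top-scoring columns are kept.

```python
from collections import Counter


def chi_squared(feature_values, labels):
    """Pearson chi-squared statistic for one categorical feature vs. the label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            # Expected count of this (feature value, label) cell under independence.
            expected = f_count * l_count / n
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat


def select_top_features(rows, labels, num_top_features):
    """Return the indices (ascending) of the highest-scoring features."""
    num_features = len(rows[0])
    stats = [
        chi_squared([row[i] for row in rows], labels)
        for i in range(num_features)
    ]
    ranked = sorted(range(num_features), key=lambda i: stats[i], reverse=True)
    return sorted(ranked[:num_top_features])


def filter_row(row, selected):
    """Keep only the selected feature columns of one row."""
    return [row[i] for i in selected]


# Hypothetical toy data: feature 0 follows the label, features 1 and 2 do not.
rows = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
]
labels = [0.0, 0.0, 1.0, 1.0]

selected = select_top_features(rows, labels, num_top_features=1)
filtered = [filter_row(row, selected) for row in rows]
```

With this toy data, `selected` contains only index 0, since that is the one column that depends on the label.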