Yes, perhaps we could use SQLTransformer as well. http://spark.apache.org/docs/latest/ml-features.html#sqltransformer
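
A minimal sketch of how that might look for the EMOTION example quoted below, assuming Spark 2.x and a local session. The object name `EmotionFilterSketch`, the helper `emotionFilter`, and the sample rows are made up for illustration and are not from this thread:

```scala
import org.apache.spark.ml.feature.SQLTransformer
import org.apache.spark.sql.SparkSession

object EmotionFilterSketch {
  // Allowed categories, taken from the EMOTION example in the quoted thread.
  val allowed: Seq[String] = Seq("HAPPY", "SAD", "ANGRY", "NEUTRAL", "NA")

  // Build a SQLTransformer that keeps only rows whose EMOTION is allowed.
  // __THIS__ is the placeholder SQLTransformer replaces with the input dataset.
  def emotionFilter(): SQLTransformer = {
    val inList = allowed.map(v => s"'$v'").mkString(", ")
    new SQLTransformer()
      .setStatement(s"SELECT * FROM __THIS__ WHERE EMOTION IN ($inList)")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run, for illustration only
      .appName("EmotionFilterSketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; "BORED" is not an allowed value.
    val df = Seq("HAPPY", "BORED", "SAD").toDF("EMOTION")
    emotionFilter().transform(df).show()

    spark.stop()
  }
}
```

Since SQLTransformer is itself a Transformer, it can go straight into the stages of an ML Pipeline, which should cover the pipeline concern raised below.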
On Sun, Jun 18, 2017 at 10:47 AM, Pralabh Kumar <pralabhku...@gmail.com> wrote:

> Hi Yan
>
> Yes, SQL is a good option, but if we have to create an ML Pipeline, then
> having transformers and setting them as pipeline stages would be a better
> option.
>
> Regards
> Pralabh Kumar
>
> On Sun, Jun 18, 2017 at 4:23 AM, 颜发才(Yan Facai) <facai....@gmail.com> wrote:
>
>> To filter data, how about using SQL?
>>
>> df.createOrReplaceTempView("df")
>> // string literals in the IN list must be quoted
>> val sqlDF = spark.sql(
>>   "SELECT * FROM df WHERE EMOTION IN ('HAPPY','SAD','ANGRY','NEUTRAL','NA')")
>>
>> https://spark.apache.org/docs/latest/sql-programming-guide.html#sql
>>
>> On Fri, Jun 16, 2017 at 11:28 PM, Pralabh Kumar <pralabhku...@gmail.com> wrote:
>>
>>> Hi Saatvik
>>>
>>> You can write your own transformer to make sure that the column contains
>>> only the values you provided, and filter out the rows that don't.
>>>
>>> Something like this:
>>>
>>> import org.apache.spark.annotation.DeveloperApi
>>> import org.apache.spark.ml.Transformer
>>> import org.apache.spark.ml.param.ParamMap
>>> import org.apache.spark.sql.{DataFrame, Dataset}
>>> import org.apache.spark.sql.types.StructType
>>>
>>> case class CategoryTransformer(override val uid: String) extends Transformer {
>>>   // In Spark 2.x, transform takes a Dataset[_] rather than a DataFrame
>>>   override def transform(inputData: Dataset[_]): DataFrame = {
>>>     inputData.select("col1").filter("col1 in ('happy')")
>>>   }
>>>   // defaultCopy re-creates the transformer from its uid
>>>   override def copy(extra: ParamMap): Transformer = defaultCopy(extra)
>>>   @DeveloperApi
>>>   override def transformSchema(schema: StructType): StructType = {
>>>     schema
>>>   }
>>> }
>>>
>>> Usage:
>>>
>>> val data = sc.parallelize(List("abce","happy")).toDF("col1")
>>> val trans = new CategoryTransformer("1")
>>> data.show()
>>> trans.transform(data).show()
>>>
>>> This transformer will make sure that col1 only ever holds the values you
>>> provided.
>>>
>>> Regards
>>> Pralabh Kumar
>>>
>>> On Fri, Jun 16, 2017 at 8:10 PM, Saatvik Shah <saatvikshah1...@gmail.com> wrote:
>>>
>>>> Hi Pralabh,
>>>>
>>>> I want the ability to create a column whose values are restricted to a
>>>> specific set of predefined values.
>>>> For example, suppose I have a column called EMOTION: I want to ensure
>>>> each row value is one of HAPPY,SAD,ANGRY,NEUTRAL,NA.
>>>>
>>>> Thanks and Regards,
>>>> Saatvik Shah
>>>>
>>>> On Fri, Jun 16, 2017 at 10:30 AM, Pralabh Kumar <pralabhku...@gmail.com> wrote:
>>>>
>>>>> Hi Saatvik
>>>>>
>>>>> Can you please provide an example of what exactly you want?
>>>>>
>>>>> On 16-Jun-2017 7:40 PM, "Saatvik Shah" <saatvikshah1...@gmail.com> wrote:
>>>>>
>>>>>> Hi Yan,
>>>>>>
>>>>>> Basically, the reason I was looking for the categorical datatype is as
>>>>>> given here
>>>>>> <https://pandas.pydata.org/pandas-docs/stable/categorical.html>:
>>>>>> the ability to fix column values to specific categories. Is it possible
>>>>>> to create a user-defined data type that could do so?
>>>>>>
>>>>>> Thanks and Regards,
>>>>>> Saatvik Shah
>>>>>>
>>>>>> On Fri, Jun 16, 2017 at 1:42 AM, 颜发才(Yan Facai) <facai....@gmail.com> wrote:
>>>>>>
>>>>>>> You can use some Transformers to handle categorical data.
>>>>>>> For example, StringIndexer encodes a string column of labels to a
>>>>>>> column of label indices:
>>>>>>> http://spark.apache.org/docs/latest/ml-features.html#stringindexer
>>>>>>>
>>>>>>> On Thu, Jun 15, 2017 at 10:19 PM, saatvikshah1994 <saatvikshah1...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I'm trying to convert a Pandas dataframe to a Spark dataframe. One of
>>>>>>>> the columns I have is of the Category type in Pandas, but there does
>>>>>>>> not seem to be support for this type in Spark. What is the best
>>>>>>>> alternative?
>>>>>>>>
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Best-alternative-for-Category-Type-in-Spark-Dataframe-tp28764.html
>>>>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>>>
>>>>>> --
>>>>>> *Saatvik Shah,*
>>>>>> *1st Year,*
>>>>>> *Masters in the School of Computer Science,*
>>>>>> *Carnegie Mellon University*
>>>>>>
>>>>>> *https://saatvikshah1994.github.io/*