Yes, perhaps we could use SQLTransformer as well.

http://spark.apache.org/docs/latest/ml-features.html#sqltransformer
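
A minimal sketch of that idea, assuming Spark 2.x with spark-mllib on the classpath. The object name, the helper name, and the sample rows are made up for illustration; `__THIS__` is the placeholder SQLTransformer uses for the input dataset. Because SQLTransformer is a regular Transformer, the filter can be set as a pipeline stage like any other:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.SQLTransformer
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; names and data are illustrative only.
object EmotionFilterExample {

  // Keeps only rows whose EMOTION value is one of the allowed categories.
  def keepKnownEmotions(df: DataFrame): DataFrame = {
    val filterStage = new SQLTransformer().setStatement(
      "SELECT * FROM __THIS__ " +
        "WHERE EMOTION IN ('HAPPY', 'SAD', 'ANGRY', 'NEUTRAL', 'NA')")

    // A single-stage pipeline; other stages (e.g. StringIndexer) could follow.
    val pipeline = new Pipeline().setStages(Array(filterStage))
    pipeline.fit(df).transform(df)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("EmotionFilterExample")
      .getOrCreate()
    import spark.implicits._

    // "abce" is not an allowed category, so that row is dropped.
    val df = Seq("HAPPY", "SAD", "abce", "NA").toDF("EMOTION")
    keepKnownEmotions(df).show()

    spark.stop()
  }
}
```

Note that this drops rows with unexpected values rather than rejecting them, so it is a filter and not a true categorical type.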

On Sun, Jun 18, 2017 at 10:47 AM, Pralabh Kumar <pralabhku...@gmail.com>
wrote:

> Hi Yan
>
> Yes, SQL is a good option, but if we have to create an ML Pipeline, then
> having transformers and setting them as pipeline stages would be a better
> option.
>
> Regards
> Pralabh Kumar
>
> On Sun, Jun 18, 2017 at 4:23 AM, 颜发才(Yan Facai) <facai....@gmail.com>
> wrote:
>
>> To filter data, how about using sql?
>>
>> df.createOrReplaceTempView("df")
>> val sqlDF = spark.sql(
>>   "SELECT * FROM df WHERE EMOTION IN ('HAPPY', 'SAD', 'ANGRY', 'NEUTRAL', 'NA')")
>>
>> https://spark.apache.org/docs/latest/sql-programming-guide.html#sql
>>
>>
>>
>> On Fri, Jun 16, 2017 at 11:28 PM, Pralabh Kumar <pralabhku...@gmail.com>
>> wrote:
>>
>>> Hi Saatvik
>>>
>>> You can write your own transformer that checks that the column contains
>>> only the values you provided, and filters out the rows that do not.
>>>
>>> Something like this:
>>>
>>>
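>>> import org.apache.spark.annotation.DeveloperApi
>>> import org.apache.spark.ml.Transformer
>>> import org.apache.spark.ml.param.ParamMap
>>> import org.apache.spark.sql.{DataFrame, Dataset}
>>> import org.apache.spark.sql.types.StructType
>>>
>>> // Keeps only the rows whose col1 value is in the allowed set.
>>> // The signature assumes Spark 2.x, where transform takes a Dataset[_].
>>> case class CategoryTransformer(override val uid: String) extends Transformer {
>>>   override def transform(inputData: Dataset[_]): DataFrame = {
>>>     // Filter without select, so the output matches transformSchema.
>>>     inputData.filter("col1 in ('happy')").toDF()
>>>   }
>>>   override def copy(extra: ParamMap): Transformer = defaultCopy(extra)
>>>   @DeveloperApi
>>>   override def transformSchema(schema: StructType): StructType = {
>>>     // Filtering rows leaves the schema unchanged.
>>>     schema
>>>   }
>>> }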
>>>
>>>
>>> Usage
>>>
>>> val data = sc.parallelize(List("abce","happy")).toDF("col1")
>>> val trans = new CategoryTransformer("1")
>>> data.show()
>>> trans.transform(data).show()
>>>
>>>
>>> This transformer makes sure that col1 only ever contains the values
>>> you specified.
>>>
>>>
>>> Regards
>>> Pralabh Kumar
>>>
>>> On Fri, Jun 16, 2017 at 8:10 PM, Saatvik Shah <saatvikshah1...@gmail.com
>>> > wrote:
>>>
>>>> Hi Pralabh,
>>>>
>>>> I want the ability to create a column such that its values be
>>>> restricted to a specific set of predefined values.
>>>> For example, suppose I have a column called EMOTION: I want to ensure
>>>> each row value is one of HAPPY,SAD,ANGRY,NEUTRAL,NA.
>>>>
>>>> Thanks and Regards,
>>>> Saatvik Shah
>>>>
>>>>
>>>> On Fri, Jun 16, 2017 at 10:30 AM, Pralabh Kumar <pralabhku...@gmail.com
>>>> > wrote:
>>>>
>>>>> Hi Saatvik
>>>>>
>>>>> Can you please provide an example of exactly what you want?
>>>>>
>>>>>
>>>>>
>>>>> On 16-Jun-2017 7:40 PM, "Saatvik Shah" <saatvikshah1...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Yan,
>>>>>>
>>>>>> Basically the reason I was looking for the categorical datatype is as
>>>>>> given here
>>>>>> <https://pandas.pydata.org/pandas-docs/stable/categorical.html>:
>>>>>> ability to fix column values to specific categories. Is it possible to
>>>>>> create a user defined data type which could do so?
>>>>>>
>>>>>> Thanks and Regards,
>>>>>> Saatvik Shah
>>>>>>
>>>>>> On Fri, Jun 16, 2017 at 1:42 AM, 颜发才(Yan Facai) <facai....@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> You can use some Transformers to handle categorical data.
>>>>>>> For example, StringIndexer encodes a string column of labels to a
>>>>>>> column of label indices:
>>>>>>> http://spark.apache.org/docs/latest/ml-features.html#stringindexer
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 15, 2017 at 10:19 PM, saatvikshah1994 <
>>>>>>> saatvikshah1...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I'm trying to convert a Pandas -> Spark dataframe. One of the
>>>>>>>> columns I have
>>>>>>>> is of the Category type in Pandas. But there does not seem to be
>>>>>>>> support for
>>>>>>>> this same type in Spark. What is the best alternative?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Best-alternative-for-Category-Type-in-Spark-Dataframe-tp28764.html
>>>>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>>>>> Nabble.com.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Saatvik Shah,*
>>>>>> *1st  Year,*
>>>>>> *Masters in the School of Computer Science,*
>>>>>> *Carnegie Mellon University*
>>>>>>
>>>>>> *https://saatvikshah1994.github.io/
>>>>>> <https://saatvikshah1994.github.io/>*
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
