Hosur Narahari created SPARK-22021:
--------------------------------------

             Summary: Add a feature transformation to accept a function and 
apply it on all rows of dataframe
                 Key: SPARK-22021
                 URL: https://issues.apache.org/jira/browse/SPARK-22021
             Project: Spark
          Issue Type: New Feature
          Components: ML
    Affects Versions: 2.3.0
            Reporter: Hosur Narahari


In ML pipelines we often generate derived features by applying a mathematical
or other operation to columns of a dataframe, such as summing a few columns
into a new column, or taking the length of a text field such as a message.
We currently don't have an efficient way to handle such scenarios in an ML
pipeline.

A transformer that accepts a function and applies it to the specified input
columns to generate an output column of numerical type would give the user
the flexibility to derive features using any domain-specific logic.
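
To make the idea concrete, here is a minimal sketch of such a transformer
built on the existing Transformer base class. It is not the proposed
implementation: the class name FuncTransformerSketch and its constructor
arguments are hypothetical, and it assumes the function is given as a Scala
function over the input values (cast to double) rather than as a string.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{array, col, udf}
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Hypothetical sketch, not the proposed GenFuncTransformer: it takes a Scala
// function over the input values instead of a function passed as a string.
class FuncTransformerSketch(
    override val uid: String,
    inputCols: Seq[String],
    outputCol: String,
    f: Seq[Double] => Double) extends Transformer {

  def this(inputCols: Seq[String], outputCol: String, f: Seq[Double] => Double) =
    this(Identifiable.randomUID("funcSketch"), inputCols, outputCol, f)

  override def transform(dataset: Dataset[_]): DataFrame = {
    transformSchema(dataset.schema)
    val fn = f // local copy so the UDF closure does not capture the transformer
    val rowFn = udf((values: Seq[Double]) => fn(values))
    // Pack the input columns into one array column and apply the function per row.
    val inputs = inputCols.map(c => col(c).cast(DoubleType))
    dataset.withColumn(outputCol, rowFn(array(inputs: _*)))
  }

  override def transformSchema(schema: StructType): StructType = {
    require(!schema.fieldNames.contains(outputCol), s"Column $outputCol already exists")
    StructType(schema.fields :+ StructField(outputCol, DoubleType, nullable = false))
  }

  override def copy(extra: ParamMap): FuncTransformerSketch =
    new FuncTransformerSketch(uid, inputCols, outputCol, f)
}

// Usage on a local session (spark-shell already provides `spark`).
val spark = SparkSession.builder().master("local[*]").appName("func-sketch").getOrCreate()
import spark.implicits._

val sketchDf = Seq((1.0, 2.0), (3.0, 4.0)).toDF("v1", "v2")
val sumTransformer = new FuncTransformerSketch(Seq("v1", "v2"), "result", _.sum)
val sketchResult = sumTransformer.transform(sketchDf)
sketchResult.show()
```

The sketch does not handle null input values and skips the Param-based
setters (setInputCols, setOutputCol) that a real pipeline stage would expose;
it only shows that a per-row function over several columns fits the
Transformer contract.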

Example:

// GenFuncTransformer is the transformer proposed in this issue.
val function = "function(a,b) { return a+b;}"
val transformer = new GenFuncTransformer()
  .setInputCols(Array("v1", "v2"))
  .setOutputCol("result")
  .setFunction(function)
val df = Seq((1.0, 2.0), (3.0, 4.0)).toDF("v1", "v2")
val result = transformer.transform(df)
result.show

v1   v2   result
1.0  2.0  3.0
3.0  4.0  7.0
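
For comparison, the derived column in the example above can also be computed
with the existing SQLTransformer, which covers cases that can be written as a
SQL expression. This is only a sketch for reference; the session setup and
variable names are assumptions, and it does not cover arbitrary user-defined
logic as the proposed transformer would.

```scala
import org.apache.spark.ml.feature.SQLTransformer
import org.apache.spark.sql.SparkSession

// Local session for illustration (spark-shell already provides `spark`).
val spark = SparkSession.builder().master("local[*]").appName("sql-transformer-example").getOrCreate()
import spark.implicits._

val inputDf = Seq((1.0, 2.0), (3.0, 4.0)).toDF("v1", "v2")

// __THIS__ is SQLTransformer's placeholder for the input dataset.
val sqlTransformer = new SQLTransformer()
  .setStatement("SELECT *, (v1 + v2) AS result FROM __THIS__")

val sqlResult = sqlTransformer.transform(inputDf)
sqlResult.show()
```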



