[ 
https://issues.apache.org/jira/browse/SPARK-22021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167806#comment-16167806
 ] 

Nick Pentreath commented on SPARK-22021:
----------------------------------------

Why a JavaScript function? I think this is not a good fit to go into Spark ML 
core. You can easily have this as an external library or Spark package.

We are potentially looking at a transformer for generic Scala functions in 
SPARK-20271.
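
As a rough sketch of what is already possible outside Spark ML core: the sum 
from the example in the issue can be produced with the existing 
SQLTransformer, or with a plain Scala function wrapped as a UDF. The column 
names v1, v2 and result come from the issue description. The object name, 
app name and local master setting are assumptions made only for illustration.

```scala
import org.apache.spark.ml.feature.SQLTransformer
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical wrapper object, used only to make the sketch self-contained.
object DerivedFeatureSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this toy dataframe.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("derived-feature-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same input data as the example in the issue description.
    val df = Seq((1.0, 2.0), (3.0, 4.0)).toDF("v1", "v2")

    // Option 1: SQLTransformer is an ML Transformer, so it can be used as a
    // Pipeline stage. __THIS__ stands for the input dataframe.
    val sqlTransformer = new SQLTransformer()
      .setStatement("SELECT *, (v1 + v2) AS result FROM __THIS__")
    sqlTransformer.transform(df).show()

    // Option 2: an ordinary Scala function applied row by row through a UDF.
    val add = udf((a: Double, b: Double) => a + b)
    df.withColumn("result", add(col("v1"), col("v2"))).show()

    spark.stop()
  }
}
```

Both options add a numeric result column without embedding a JavaScript 
function. The SQLTransformer variant fits directly into an ML Pipeline, 
while the UDF variant is the more general route for arbitrary Scala logic.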

> Add a feature transformation to accept a function and apply it on all rows of 
> dataframe
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-22021
>                 URL: https://issues.apache.org/jira/browse/SPARK-22021
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: Hosur Narahari
>
> We often generate derived features in an ML pipeline by applying a 
> mathematical or other kind of operation to columns of a dataframe, such as 
> taking the total of a few columns as a new column or, if there is a text 
> field "message", taking the length of the message. We currently don't have 
> an efficient way to handle such scenarios in an ML pipeline.
> By providing a transformer that accepts a function and applies it to the 
> specified columns to generate an output column of numerical type, the user 
> has the flexibility to derive features by applying any domain-specific logic.
> Example:
> val function = "function(a,b) { return a+b;}"
> val transformer = new GenFuncTransformer()
>   .setInputCols(Array("v1", "v2"))
>   .setOutputCol("result")
>   .setFunction(function)
> val df = Seq((1.0, 2.0), (3.0, 4.0)).toDF("v1", "v2")
> val result = transformer.transform(df)
> result.show
> v1   v2  result
> 1.0 2.0 3.0
> 3.0 4.0 7.0


