[jira] [Commented] (SPARK-7514) Add MinMaxScaler to feature transformation
[ https://issues.apache.org/jira/browse/SPARK-7514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537651#comment-14537651 ]

yuhao yang commented on SPARK-7514:
---

Thanks Joseph. I have just one concern about using center. It changes the core function from

Normalized(x) = (x - min) / (max - min) * scale + newBase

to

Normalized(x) = ((x - min) / (max - min) - 0.5) * scale + center

which seems less straightforward. Sure, we can discuss it further over the code.

Add MinMaxScaler to feature transformation
--

Key: SPARK-7514
URL: https://issues.apache.org/jira/browse/SPARK-7514
Project: Spark
Issue Type: New Feature
Components: MLlib
Reporter: yuhao yang
Original Estimate: 24h
Remaining Estimate: 24h

Add a popular scaling method to the feature component, commonly known as min-max normalization or Rescaling. The core function is

Normalized(x) = (x - min) / (max - min) * scale + newBase

where newBase and scale are parameters of the VectorTransformer. newBase is the new minimum value for the feature, and scale controls the width of the range after transformation. This is a little more complicated than basic min-max normalization, but it gives users finer control over the output range, such as [0.1, 0.9] in some NN applications. When max == min, 0.5 is used as the raw value.

Reference:
http://en.wikipedia.org/wiki/Feature_scaling
http://stn.spotfire.com/spotfire_client_help/index.htm#norm/norm_scale_between_0_and_1.htm

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
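The sketch below compares the two formulas from yuhao yang's comment for a single feature value. It assumes plain Double inputs rather than Spark's Vector. The names MinMaxFormulas, rescaleBase, rescaleCenter and baseFromCenter are hypothetical. It follows the issue's rule of using 0.5 as the raw value when max == min. It also shows that the center form is the newBase form with newBase = center - scale / 2.

```scala
// Hypothetical helper object, not Spark code: compares the two proposed parameterizations.
object MinMaxFormulas {
  // Relative position of x within [min, max]; 0.5 when max == min, as proposed in the issue.
  private def unitValue(x: Double, min: Double, max: Double): Double =
    if (max == min) 0.5 else (x - min) / (max - min)

  // Normalized(x) = (x - min) / (max - min) * scale + newBase
  def rescaleBase(x: Double, min: Double, max: Double, newBase: Double, scale: Double): Double =
    unitValue(x, min, max) * scale + newBase

  // Normalized(x) = ((x - min) / (max - min) - 0.5) * scale + center
  def rescaleCenter(x: Double, min: Double, max: Double, center: Double, scale: Double): Double =
    (unitValue(x, min, max) - 0.5) * scale + center

  // Both forms produce the same output when newBase = center - scale / 2.
  def baseFromCenter(center: Double, scale: Double): Double = center - scale / 2
}
```

Take min = 2, max = 10, newBase = 0.1 and scale = 0.8. Then rescaleBase maps the data range onto [0.1, 0.9]. rescaleCenter with center = 0.5 and the same scale gives the same outputs, up to floating-point rounding. The two parameterizations differ only in which end of the range (lower bound vs. midpoint) the user specifies.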
[jira] [Commented] (SPARK-7514) Add MinMaxScaler to feature transformation
[ https://issues.apache.org/jira/browse/SPARK-7514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537602#comment-14537602 ]

yuhao yang commented on SPARK-7514:
---

The class name has always been MinMaxScaler in the code; I just named the JIRA wrongly...

For the parameters, the model currently looks like:

    class MinMaxScalerModel (
        val min: Vector,
        val max: Vector,
        var newBase: Double,
        var scale: Double) extends VectorTransformer

I have used min and max to store the model statistics. In some articles the range bounds are named newMin / newMax, which I think can be confusing. I ran out of variable names here... setCenterScale looks good.
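The model statistics mentioned above are the per-feature min and max. The sketch below shows how they could be gathered in one pass and then applied. It is not Spark code. Array[Double] stands in for Vector, and SimpleMinMaxScaler and SimpleMinMaxModel are hypothetical names. A real implementation would compute the statistics over a distributed dataset rather than a local Seq.

```scala
// Hypothetical model mirroring the fields discussed above (min, max, newBase, scale).
final case class SimpleMinMaxModel(
    min: Array[Double], // per-feature minimum seen during fitting
    max: Array[Double], // per-feature maximum seen during fitting
    newBase: Double,    // lower bound of the output range
    scale: Double) {    // width of the output range

  def transform(features: Array[Double]): Array[Double] = {
    require(features.length == min.length, "dimension mismatch")
    features.indices.map { i =>
      val range = max(i) - min(i)
      // When max == min, use 0.5 as the raw value, as proposed in the issue.
      val raw = if (range == 0.0) 0.5 else (features(i) - min(i)) / range
      raw * scale + newBase
    }.toArray
  }
}

object SimpleMinMaxScaler {
  // Column-wise min and max over all rows: the "model statistics".
  def fit(rows: Seq[Array[Double]], newBase: Double = 0.0, scale: Double = 1.0): SimpleMinMaxModel = {
    require(rows.nonEmpty, "need at least one row")
    val dim = rows.head.length
    require(rows.forall(_.length == dim), "all rows must have the same dimension")
    val mins = Array.fill(dim)(Double.PositiveInfinity)
    val maxs = Array.fill(dim)(Double.NegativeInfinity)
    for (row <- rows; i <- 0 until dim) {
      mins(i) = math.min(mins(i), row(i))
      maxs(i) = math.max(maxs(i), row(i))
    }
    SimpleMinMaxModel(mins, maxs, newBase, scale)
  }
}
```

Here the statistics are immutable and fixed by fitting. newBase and scale are plain parameters that could equally be exposed as setters, as the var fields in the proposed class suggest.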
[jira] [Commented] (SPARK-7514) Add MinMaxScaler to feature transformation
[ https://issues.apache.org/jira/browse/SPARK-7514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537614#comment-14537614 ]

Joseph K. Bradley commented on SPARK-7514:
--

Let's use only one of base or center. I prefer center since it seems more specific. I'll try to check out the PR soon.
[jira] [Commented] (SPARK-7514) Add MinMaxScaler to feature transformation
[ https://issues.apache.org/jira/browse/SPARK-7514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537570#comment-14537570 ]

Joseph K. Bradley commented on SPARK-7514:
--

Thanks for checking! I like the updated name MinMaxScaler.

For parameters, I like specifying min and max as in sklearn. We could also provide a setter method of the form {{setCenterScale(center: Double, scale: Double)}} which computes and sets min and max. (I'm using center instead of translation since that seems more natural to me.) But that may be superfluous, so let me know if it sounds useful.
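Joseph's comment suggests specifying the output range as (min, max), sklearn-style, plus an optional setCenterScale(center, scale) setter. The sketch below shows one way the two could fit together. The output range is stored as (min, max), and setCenterScale sets it to [center - scale / 2, center + scale / 2]. OutputRange, setMin, setMax, getMin, getMax and rescale are hypothetical names for illustration, not Spark APIs.

```scala
// Hypothetical parameter holder; the output range is stored as (min, max).
final class OutputRange {
  private var lo: Double = 0.0 // output minimum
  private var hi: Double = 1.0 // output maximum

  def getMin: Double = lo
  def getMax: Double = hi

  def setMin(value: Double): this.type = { lo = value; this }
  def setMax(value: Double): this.type = { hi = value; this }

  // Convenience setter: computes and sets min and max from a center and a width.
  def setCenterScale(center: Double, scale: Double): this.type = {
    require(scale >= 0.0, "scale must be non-negative")
    lo = center - scale / 2
    hi = center + scale / 2
    this
  }

  // Map x, from a feature spanning [dataMin, dataMax], into [getMin, getMax].
  // When dataMax == dataMin, 0.5 is used as the raw value, as proposed in the issue.
  def rescale(x: Double, dataMin: Double, dataMax: Double): Double = {
    val raw = if (dataMax == dataMin) 0.5 else (x - dataMin) / (dataMax - dataMin)
    raw * (hi - lo) + lo
  }
}
```

For example, new OutputRange().setCenterScale(0.5, 0.8) yields the range [0.1, 0.9] (up to floating-point rounding), the same range cited in the issue description. Storing only min and max keeps a single source of truth. setCenterScale then adds no extra state, which fits the "only one of base or center" preference.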