[jira] [Commented] (SPARK-7514) Add MinMaxScaler to feature transformation

2015-05-11 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537651#comment-14537651
 ] 

yuhao yang commented on SPARK-7514:
---

Thanks Joseph. Just one concern about using center: it will change the core 
function from
Normalized( x ) = (x - min) / (max - min) * scale + newBase
to 
Normalized( x ) = ((x - min) / (max - min) - 0.5) * scale + center
which seems less straightforward.

Sure, we can discuss it further over the code.
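
To make the comparison concrete, here is a minimal Scala sketch of the two forms quoted above. The object and method names (CenterVsBase, withBase, withCenter) are made up for illustration and are not taken from the PR. It assumes max != min, and shows that the two forms give the same result when center = newBase + scale / 2.

```scala
object CenterVsBase {
  // Base form: maps [min, max] onto [newBase, newBase + scale].
  def withBase(x: Double, min: Double, max: Double, scale: Double, newBase: Double): Double =
    (x - min) / (max - min) * scale + newBase

  // Center form: maps [min, max] onto [center - scale / 2, center + scale / 2].
  def withCenter(x: Double, min: Double, max: Double, scale: Double, center: Double): Double =
    ((x - min) / (max - min) - 0.5) * scale + center

  def main(args: Array[String]): Unit = {
    val (min, max, scale, newBase) = (2.0, 10.0, 0.8, 0.1)
    // The center that corresponds to the same target range as newBase.
    val center = newBase + scale / 2
    for (x <- Seq(2.0, 4.0, 6.0, 10.0)) {
      val a = withBase(x, min, max, scale, newBase)
      val b = withCenter(x, min, max, scale, center)
      println(f"x=$x%.1f base=$a%.4f center=$b%.4f")
    }
  }
}
```

So the two parameterizations describe the same family of transformations; the difference is only in which quantity the user specifies.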

 Add MinMaxScaler to feature transformation
 --

 Key: SPARK-7514
 URL: https://issues.apache.org/jira/browse/SPARK-7514
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: yuhao yang
   Original Estimate: 24h
  Remaining Estimate: 24h

 Add a popular scaling method to the feature component, commonly known as 
 min-max normalization or rescaling.
 The core function is
 Normalized( x ) = (x - min) / (max - min) * scale + newBase
 where newBase and scale are parameters of the VectorTransformer. newBase is 
 the new minimum value for the feature, and scale controls the width of the 
 range after transformation. This is a little more complicated than basic 
 min-max normalization, yet it gives users the flexibility to control the 
 output range more precisely, e.g. [0.1, 0.9] in some NN applications.
 For the case where max == min, 0.5 is used as the raw value.
 reference:
  http://en.wikipedia.org/wiki/Feature_scaling
 http://stn.spotfire.com/spotfire_client_help/index.htm#norm/norm_scale_between_0_and_1.htm
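
 The core function in the description can be sketched in Scala as follows. This is only an illustration under stated assumptions: MinMaxSketch, rescale and transform are hypothetical names, feature vectors are plain Array[Double] rather than MLlib Vector, and "raw value" is read as the value of (x - min) / (max - min) before scale and newBase are applied.

```scala
object MinMaxSketch {
  // Rescales one feature value; min and max are that feature's observed statistics.
  def rescale(x: Double, min: Double, max: Double, scale: Double, newBase: Double): Double = {
    val range = max - min
    // Constant feature (max == min): use 0.5 as the raw value instead of dividing by zero.
    val raw = if (range == 0.0) 0.5 else (x - min) / range
    raw * scale + newBase
  }

  // Applies the rescaling element-wise to a feature vector.
  def transform(
      v: Array[Double],
      min: Array[Double],
      max: Array[Double],
      scale: Double,
      newBase: Double): Array[Double] = {
    require(v.length == min.length && v.length == max.length, "dimension mismatch")
    Array.tabulate(v.length)(i => rescale(v(i), min(i), max(i), scale, newBase))
  }
}
```

 With scale = 0.8 and newBase = 0.1, the observed minimum maps to 0.1 and the observed maximum to 0.9, which is the [0.1, 0.9] range mentioned above.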



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7514) Add MinMaxScaler to feature transformation

2015-05-10 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537602#comment-14537602
 ] 

yuhao yang commented on SPARK-7514:
---

The class name has always been MinMaxScaler in the code; I just named the JIRA wrongly...

For the parameters, the model currently looks like:
class MinMaxScalerModel (
    val min: Vector,
    val max: Vector,
    var newBase: Double,
    var scale: Double) extends VectorTransformer

I have used min and max to store the model statistics. In some articles, the 
range bounds are named newMin / newMax, which I think can be confusing. 
I ran out of variable names here...

setCenterScale looks good.









[jira] [Commented] (SPARK-7514) Add MinMaxScaler to feature transformation

2015-05-10 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537614#comment-14537614
 ] 

Joseph K. Bradley commented on SPARK-7514:
--

Let's use only one of base or center.  I prefer center since it 
seems more specific.
I'll try to check out the PR soon.




[jira] [Commented] (SPARK-7514) Add MinMaxScaler to feature transformation

2015-05-10 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537570#comment-14537570
 ] 

Joseph K. Bradley commented on SPARK-7514:
--

Thanks for checking!  I like the updated name MinMaxScaler.

For parameters, I like specifying min and max, as in sklearn.

We could also provide a setter method of the form {{setCenterScale(center: 
Double, scale: Double)}} which computes and sets min,max.  (I'm using center 
instead of translation since that seems more natural to me.)  But that may be 
superfluous, so let me know if that sounds useful.
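
A rough Scala sketch of what such a setter might look like, purely for discussion. RangeParams and dataMin / dataMax are hypothetical names, not the class from the PR, and it assumes center is the midpoint of the output range and scale is its width.

```scala
// Hypothetical parameter holder; min and max are the bounds of the output range.
class RangeParams(var min: Double = 0.0, var max: Double = 1.0) {
  // Sets the output range from its midpoint and width (assumed meaning of center and scale).
  def setCenterScale(center: Double, scale: Double): this.type = {
    require(scale >= 0.0, "scale must be non-negative")
    min = center - scale / 2
    max = center + scale / 2
    this
  }

  // Maps x from the observed range [dataMin, dataMax] onto [min, max].
  // Assumes dataMax != dataMin.
  def rescale(x: Double, dataMin: Double, dataMax: Double): Double =
    (x - dataMin) / (dataMax - dataMin) * (max - min) + min
}
```

For example, setCenterScale(0.5, 0.8) would be equivalent to setting min = 0.1 and max = 0.9 directly.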
