[ https://issues.apache.org/jira/browse/SPARK-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng resolved SPARK-2272.
----------------------------------

    Resolution: Fixed
 Fix Version/s: 1.1.0

Issue resolved by pull request 1207
[https://github.com/apache/spark/pull/1207]

> Feature scaling which standardizes the range of independent variables or features of data.
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2272
>                 URL: https://issues.apache.org/jira/browse/SPARK-2272
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: DB Tsai
>            Assignee: DB Tsai
>             Fix For: 1.1.0
>
>
> Feature scaling is a method used to standardize the range of independent
> variables or features of data. In data processing, it is also known as data
> normalization and is generally performed during the data preprocessing step.
> In this work, a trait called `VectorTransformer` is defined for generic
> transformation of a vector. It contains two methods: `apply`, which applies
> the transformation to a vector, and `unapply`, which applies the inverse
> transformation to a vector.
> There are three concrete implementations of `VectorTransformer`, and all of
> them can easily be extended with PMML transformation support.
> 1) `VectorStandardizer` - Standardizes a vector given the mean and variance.
> Since subtracting the mean generally turns zero entries into nonzero values,
> the output is always in dense vector format.
>
> 2) `VectorRescaler` - Rescales a vector into a target range, specified either
> by a tuple of two double values or by two vectors, as the new target minimum
> and maximum. Since rescaling subtracts the minimum of each column first, the
> output is always a dense vector regardless of the input vector type.
>
> 3) `VectorDivider` - Transforms a vector by dividing it by a constant or by
> another vector on an element-by-element basis. This transformation preserves
> the type of the input vector without densifying the result.
>
> Utility helper methods are implemented for dividing, rescaling, normalization,
> and standardization. Each takes an RDD[Vector] as input and returns the
> transformed RDD[Vector] together with the transformer that was applied.
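A minimal, Spark-free Scala sketch of the design described in the issue: a `VectorTransformer` trait with `apply`/`unapply`, the three transformers, and a standardization helper that returns the transformed data together with its transformer. This is not the code merged in pull request 1207. Assumptions: `Vec`, `DenseVec` and `SparseVec` are hypothetical stand-ins for MLlib's vector types; `FeatureScaling.standardize` is a hypothetical helper that works on a local `Seq` instead of an `RDD[Vector]`; population variance is used; constant columns map to 0 (standardizer) or to the target minimum (rescaler); and divisor entries are assumed nonzero.

```scala
// Hypothetical stand-ins for MLlib's dense and sparse vectors (NOT the Spark API).
sealed trait Vec {
  def size: Int
  def toArray: Array[Double]
}
final case class DenseVec(values: Array[Double]) extends Vec {
  def size: Int = values.length
  def toArray: Array[Double] = values.clone()
}
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) extends Vec {
  def toArray: Array[Double] = {
    val out = new Array[Double](size)
    for (k <- indices.indices) out(indices(k)) = values(k)
    out
  }
}

// Generic transformation plus its inverse, mirroring the apply/unapply pair in the issue.
trait VectorTransformer {
  def apply(v: Vec): Vec
  def unapply(v: Vec): Vec
}

// x -> (x - mean) / std per column. Subtracting the mean turns zeros into
// nonzeros, so the output is always dense. Zero-variance columns map to 0.
class VectorStandardizer(mean: Array[Double], variance: Array[Double]) extends VectorTransformer {
  private val std = variance.map(math.sqrt)
  def apply(v: Vec): Vec = DenseVec(v.toArray.zipWithIndex.map { case (x, i) =>
    if (std(i) == 0.0) 0.0 else (x - mean(i)) / std(i)
  })
  def unapply(v: Vec): Vec = DenseVec(v.toArray.zipWithIndex.map { case (z, i) =>
    z * std(i) + mean(i)
  })
}

// Maps each column from its observed [colMin, colMax] into [newMin, newMax].
// Subtracting colMin densifies the output. Constant columns map to newMin.
class VectorRescaler(colMin: Array[Double], colMax: Array[Double],
                     newMin: Array[Double], newMax: Array[Double]) extends VectorTransformer {
  // Same target range for every column, given as a (min, max) tuple.
  def this(colMin: Array[Double], colMax: Array[Double], target: (Double, Double)) =
    this(colMin, colMax, Array.fill(colMin.length)(target._1), Array.fill(colMin.length)(target._2))
  private def span(i: Int): Double = colMax(i) - colMin(i)
  private def newSpan(i: Int): Double = newMax(i) - newMin(i)
  def apply(v: Vec): Vec = DenseVec(v.toArray.zipWithIndex.map { case (x, i) =>
    if (span(i) == 0.0) newMin(i) else (x - colMin(i)) / span(i) * newSpan(i) + newMin(i)
  })
  def unapply(v: Vec): Vec = DenseVec(v.toArray.zipWithIndex.map { case (y, i) =>
    if (newSpan(i) == 0.0) colMin(i) else (y - newMin(i)) / newSpan(i) * span(i) + colMin(i)
  })
}

// Divides element-wise by `divisor`; sparse inputs stay sparse because only the
// stored entries are touched (0 / d == 0). Assumes every divisor entry is nonzero.
class VectorDivider(divisor: Array[Double]) extends VectorTransformer {
  // Divide every element by the same constant `c`.
  def this(size: Int, c: Double) = this(Array.fill(size)(c))
  private def mapStored(v: Vec)(f: (Double, Int) => Double): Vec = v match {
    case DenseVec(xs) => DenseVec(xs.zipWithIndex.map { case (x, i) => f(x, i) })
    case SparseVec(n, idx, xs) => SparseVec(n, idx.clone(), xs.indices.map(k => f(xs(k), idx(k))).toArray)
  }
  def apply(v: Vec): Vec = mapStored(v)((x, i) => x / divisor(i))
  def unapply(v: Vec): Vec = mapStored(v)((x, i) => x * divisor(i))
}

// Hypothetical local analogue of the RDD[Vector] helpers: fits the column
// statistics, then returns the transformed data together with the transformer.
object FeatureScaling {
  def standardize(data: Seq[Vec]): (Seq[Vec], VectorStandardizer) = {
    require(data.nonEmpty, "need at least one vector")
    val rows = data.map(_.toArray)
    val count = rows.size.toDouble
    val mean = Array.tabulate(rows.head.length)(j => rows.map(_(j)).sum / count)
    // Population variance (divides by n); the issue does not say which estimator is used.
    val variance = Array.tabulate(mean.length)(j => rows.map(r => math.pow(r(j) - mean(j), 2)).sum / count)
    val transformer = new VectorStandardizer(mean, variance)
    (data.map(v => transformer(v)), transformer)
  }
}
```

Returning the fitted transformer alongside the data is what makes `unapply` useful in practice: predictions made in the scaled space can be mapped back to the original units with the same object.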