zhengruifeng opened a new pull request #26832: [SPARK-30202][ML][PYSPARK] impl QuantileTransform
URL: https://github.com/apache/spark/pull/26832

### What changes were proposed in this pull request?
Implement QuantileTransform, a non-parametric transformation that maps the data to another distribution.

The implementation follows scikit-learn's, but there are still several differences:
1. it uses `QuantileSummaries` for approximation, regardless of the size of the dataset;
2. it uses linear interpolation, with logic similar to the existing `IsotonicRegression`, while scikit-learn uses a bi-directional interpolation (the two methods differ only in the special case where values are repeated in the features);
3. it treats sparse vectors just like dense ones, while scikit-learn has two different code paths for sparse and dense datasets.

### Why are the changes needed?
1. It is common to map data to another desired distribution, and this is already implemented in scikit-learn.
2. It is easy to parallelize.

### Does this PR introduce any user-facing change?
Yes, a new model.

### How was this patch tested?
Added test suites.
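
For readers unfamiliar with the technique, the linear-interpolation step mentioned in the description can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this PR: the function names `fit_quantiles` and `transform` are hypothetical, the reference quantiles are computed exactly by sorting rather than approximated with `QuantileSummaries`, the target distribution is taken to be uniform on [0, 1], and repeated values are resolved by simply taking the leftmost matching quantile.

```python
from bisect import bisect_left


def fit_quantiles(values, n_quantiles=5):
    """Return n_quantiles evenly spaced reference quantiles of `values`.

    Assumption: exact quantiles via sorting (the PR uses an approximation).
    Requires n_quantiles >= 2 and a non-empty input.
    """
    s = sorted(values)
    n = len(s)
    refs = []
    for i in range(n_quantiles):
        # Fractional position of the i-th quantile in the sorted data.
        pos = i * (n - 1) / (n_quantiles - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        refs.append(s[lo] + frac * (s[hi] - s[lo]))
    return refs


def transform(x, refs):
    """Map x to [0, 1] by linear interpolation between reference quantiles."""
    m = len(refs)
    # Clamp values outside the fitted range.
    if x <= refs[0]:
        return 0.0
    if x >= refs[-1]:
        return 1.0
    # bisect_left gives refs[j - 1] < x <= refs[j], so x1 > x0 below.
    j = bisect_left(refs, x)
    x0, x1 = refs[j - 1], refs[j]
    p0, p1 = (j - 1) / (m - 1), j / (m - 1)
    return p0 + (p1 - p0) * (x - x0) / (x1 - x0)


refs = fit_quantiles([1, 2, 3, 4, 5], n_quantiles=5)
mapped = [transform(v, refs) for v in (0, 2.5, 3, 10)]
```

Because each feature value is mapped independently once the reference quantiles are known, the transform step is straightforward to apply in parallel, which matches the motivation given in the description.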