zhengruifeng opened a new pull request #26832: [SPARK-30202][ML][PYSPARK] impl QuantileTransform
URL: https://github.com/apache/spark/pull/26832
 
 
   ### What changes were proposed in this pull request?
   Implement `QuantileTransform`, a non-parametric transformation that maps data to another distribution.
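As a rough illustration of what such a mapping does, here is a minimal Python sketch under simplifying assumptions: exact quantiles computed from the full (small) dataset, evenly spaced reference probabilities, and an optional `"normal"` output through the inverse normal CDF. The normal option is borrowed from scikit-learn and is not confirmed for this PR. The names `fit_quantiles`, `interp`, `quantile_transform`, and the `eps` clipping parameter are hypothetical helpers, not Spark APIs.

```python
import bisect
from statistics import NormalDist


def fit_quantiles(values, n_quantiles):
    """Compute n_quantiles (>= 2) evenly spaced, exact quantiles of `values`."""
    data = sorted(values)
    n = len(data)
    quantiles = []
    for i in range(n_quantiles):
        # Position of probability i / (n_quantiles - 1) on the sorted data.
        pos = i * (n - 1) / (n_quantiles - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        quantiles.append(data[lo] + (pos - lo) * (data[hi] - data[lo]))
    return quantiles


def interp(x, xp, fp):
    """Piecewise-linear interpolation of x over non-decreasing xp, clipped at both ends."""
    if x <= xp[0]:
        return fp[0]
    if x >= xp[-1]:
        return fp[-1]
    j = bisect.bisect_right(xp, x)  # xp[j - 1] <= x < xp[j]
    x0, x1, y0, y1 = xp[j - 1], xp[j], fp[j - 1], fp[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def quantile_transform(x, quantiles, output="uniform", eps=1e-7):
    """Map x to [0, 1] via the empirical CDF; optionally map that on to a standard normal."""
    k = len(quantiles)
    references = [i / (k - 1) for i in range(k)]
    u = interp(x, quantiles, references)
    if output == "normal":
        # Hypothetical clipping: inv_cdf is undefined at exactly 0 and 1.
        u = min(max(u, eps), 1 - eps)
        return NormalDist().inv_cdf(u)
    return u
```

For example, with `q = fit_quantiles([1, 2, 3, 4, 5], n_quantiles=5)`, the value `2.5` maps to `0.375` on the uniform scale, and values outside the fitted range are clipped to `0` or `1`.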
   The implementation follows scikit-learn's `QuantileTransformer`, but there are several differences:
   1. It always uses `QuantileSummaries` for approximation, regardless of the dataset size.
   2. It uses linear interpolation, with logic similar to the existing `IsotonicRegression`, whereas scikit-learn uses bi-directional interpolation (the two methods differ only in the special case where values are repeated in the features).
   3. It treats sparse vectors the same as dense ones, whereas scikit-learn has separate logic for sparse and dense datasets.
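To make difference 2 concrete, the following Python sketch compares one-directional linear interpolation with a bi-directional variant that averages a forward interpolation and one over the negated, reversed arrays (our understanding of scikit-learn's approach). The helper names are hypothetical, and the tie-breaking of `interp` (via `bisect_right`, which lands on the last of several equal quantiles) is an assumption, not necessarily what Spark's `IsotonicRegression` does.

```python
import bisect


def interp(x, xp, fp):
    """Piecewise-linear interpolation over non-decreasing xp, clipped at both ends.

    Ties: bisect_right lands on the last of several equal xp values (an assumption).
    """
    if x <= xp[0]:
        return fp[0]
    if x >= xp[-1]:
        return fp[-1]
    j = bisect.bisect_right(xp, x)  # xp[j - 1] <= x < xp[j]
    x0, x1, y0, y1 = xp[j - 1], xp[j], fp[j - 1], fp[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def linear_interp(x, quantiles, references):
    """One-directional interpolation: a repeated quantile maps to one end of its run."""
    return interp(x, quantiles, references)


def bidirectional_interp(x, quantiles, references):
    """Average of forward interpolation and interpolation over negated, reversed arrays."""
    forward = interp(x, quantiles, references)
    backward = -interp(
        -x,
        [-q for q in reversed(quantiles)],
        [-r for r in reversed(references)],
    )
    return 0.5 * (forward + backward)
```

With quantiles `[0, 1, 1, 1, 2]` and references `[0, 0.25, 0.5, 0.75, 1]`, the repeated value `1` maps to `0.75` under the one-directional version but to the midpoint `0.5` under the bi-directional one; for non-repeated values such as `0.5` or `1.5`, both return the same result.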
   
   ### Why are the changes needed?
   1. Mapping data to a desired distribution is a common preprocessing step, and it is already implemented in scikit-learn.
   2. The algorithm is easy to parallelize.
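A sketch of why the approach parallelizes, assuming a map-reduce layout: each partition builds a summary independently, summaries are combined with an associative merge, and the transform itself is a per-row map that needs only the fitted quantiles. For simplicity, the hypothetical `summarize_partition` keeps all sorted values, so the merge here is exact; Spark's `QuantileSummaries` instead keeps a compressed, approximate summary that can also be merged, which keeps memory bounded on large data. All function names below are illustrative only.

```python
import bisect
import heapq
from functools import reduce


def summarize_partition(rows):
    """Map step (runs independently per partition): here simply the sorted values."""
    return sorted(rows)


def merge_summaries(a, b):
    """Reduce step: merge two sorted summaries. Associative, so merge order does not matter."""
    return list(heapq.merge(a, b))


def quantiles_from_summary(summary, n_quantiles):
    """Evenly spaced quantiles (n_quantiles >= 2) from a sorted summary."""
    n = len(summary)
    out = []
    for i in range(n_quantiles):
        pos = i * (n - 1) / (n_quantiles - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        out.append(summary[lo] + (pos - lo) * (summary[hi] - summary[lo]))
    return out


def to_uniform(x, quantiles):
    """Per-row transform: needs only the (broadcast) quantiles, no other rows."""
    k = len(quantiles)
    if x <= quantiles[0]:
        return 0.0
    if x >= quantiles[-1]:
        return 1.0
    j = bisect.bisect_right(quantiles, x)  # quantiles[j - 1] <= x < quantiles[j]
    x0, x1 = quantiles[j - 1], quantiles[j]
    # Linear interpolation between reference probabilities (j - 1) / (k - 1) and j / (k - 1).
    return (j - 1 + (x - x0) / (x1 - x0)) / (k - 1)


partitions = [[5, 1, 9], [3, 7], [2, 8, 4, 6]]
summary = reduce(merge_summaries, map(summarize_partition, partitions))
quantiles = quantiles_from_summary(summary, n_quantiles=5)
# Each partition is transformed on its own, e.g. by separate workers.
transformed = [[to_uniform(x, quantiles) for x in part] for part in partitions]
```

Here the reduced summary is the sorted values `1` to `9`, and `quantiles_from_summary(summary, 5)` returns `[1, 3, 5, 7, 9]` (as floats).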
   
   ### Does this PR introduce any user-facing change?
   Yes, it adds a new model.
   
   
   ### How was this patch tested?
   Added test suites.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
