[ https://issues.apache.org/jira/browse/SPARK-28499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-28499.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 25244
[https://github.com/apache/spark/pull/25244]

> Optimize MinMaxScaler
> ---------------------
>
>                 Key: SPARK-28499
>                 URL: https://issues.apache.org/jira/browse/SPARK-28499
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 3.0.0
>            Reporter: zhengruifeng
>            Assignee: zhengruifeng
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> The current implementation of MinMaxScaler has a few small places that can
> be optimized:
> 1. Avoid calling param getters inside the udf.
> If I remember correctly, there were some tickets and PRs about this: calling
> a param getter in a udf or map function significantly slows down the
> computation.
> 2. For a constant dimension, the transformed value is also a constant, so it
> can be precomputed.
> 3. For a non-constant dimension (the i-th), the value is updated by
> values(i) = (values(i) - minArray(i)) / range(i) * scale + $(min)
> Here we can precompute scale / range(i) once per dimension, so that the
> per-value division is replaced by a multiplication.
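
The three points in the issue description can be sketched in a few lines of plain Scala. This is only an illustration under stated assumptions, not the code merged in pull request 25244: plain arrays stand in for Spark ML vectors, and the names MinMaxSketch, makeTransform, lower, upper, scaleArray and constantOutput are hypothetical. The value used for constant dimensions, 0.5 * (max + min), follows the behaviour documented for MinMaxScaler. The sketch assumes lower < upper.

```scala
// Sketch only: plain arrays stand in for Spark ML vectors; all names are hypothetical.
object MinMaxSketch {

  /** Builds a per-row transform with every constant hoisted out of the hot loop. */
  def makeTransform(
      minArray: Array[Double], // per-dimension minimum seen during fit
      maxArray: Array[Double], // per-dimension maximum seen during fit
      lower: Double,           // plays the role of $(min)
      upper: Double            // plays the role of $(max)
  ): Array[Double] => Array[Double] = {
    require(minArray.length == maxArray.length, "min/max length mismatch")
    require(lower < upper, "lower bound must be below upper bound")

    val n = minArray.length
    // Point 1: read the bounds once here, not inside the per-row closure.
    val scale = upper - lower
    // Point 2: a constant dimension always maps to the same value.
    val constantOutput = (lower + upper) / 2
    // Point 3: precompute scale / range per dimension; 0.0 marks a constant dimension.
    val scaleArray = Array.tabulate(n) { i =>
      val range = maxArray(i) - minArray(i)
      if (range != 0) scale / range else 0.0
    }

    (row: Array[Double]) => {
      require(row.length == n, "row length mismatch")
      val out = new Array[Double](n)
      var i = 0
      while (i < n) {
        out(i) =
          if (scaleArray(i) != 0) (row(i) - minArray(i)) * scaleArray(i) + lower
          else constantOutput
        i += 1
      }
      out
    }
  }
}
```

For example, with minArray = (0.0, 5.0), maxArray = (4.0, 5.0) and target range [0, 1], the row (1.0, 5.0) maps to (0.25, 0.5): the first dimension is rescaled with one multiplication, and the second is constant, so it receives the precomputed midpoint.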