zhengruifeng created SPARK-28499:
------------------------------------

             Summary: Optimize MinMaxScaler
                 Key: SPARK-28499
                 URL: https://issues.apache.org/jira/browse/SPARK-28499
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 3.0.0
            Reporter: zhengruifeng

The current implementation of MinMaxScaler has a few small places that can be optimized:

1, Avoid calling param getters inside the udf. If I remember correctly, there were some tickets and PRs about this: calling a param getter in a udf or map function significantly slows down the computation.

2, For a constant dim, the transformed value is also a constant, which can be precomputed.

3, For a usual dim (i-th), the value is updated by

    values(i) = (values(i) - minArray(i)) / range(i) * scale + $(min)

Here we can precompute scale / range(i), so that the per-value division is replaced by a multiplication.
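
The three points in this ticket can be illustrated with a minimal plain-Scala sketch that does not depend on Spark. It is not the actual patch: `MinMaxScalerSketch`, `transformNaive`, `makeTransform`, `scaleArray` and `constantOutput` are hypothetical names chosen for illustration, and `min` / `max` stand in for the values that `$(min)` and `$(max)` would return. The output for a constant dim is assumed here to be the midpoint `(min + max) / 2`, which is how the MinMaxScaler documentation describes the case where a feature's maximum equals its minimum. The per-dim factor precomputed below is `scale / range(i)`.

```scala
object MinMaxScalerSketch {

  // Baseline: every value pays for a division, and the constant-dim
  // output is recomputed on each call.
  def transformNaive(
      values: Array[Double],
      minArray: Array[Double],
      maxArray: Array[Double],
      min: Double,
      max: Double): Array[Double] = {
    val scale = max - min
    Array.tabulate(values.length) { i =>
      val range = maxArray(i) - minArray(i)
      if (range != 0) (values(i) - minArray(i)) / range * scale + min
      else (max + min) / 2 // constant dim (assumed midpoint behavior)
    }
  }

  // Optimized: everything that does not depend on the input row is
  // computed once, outside the returned closure. In Spark, the closure
  // would be the udf body, and min / max would be read from the params
  // once before the udf is created (point 1).
  def makeTransform(
      minArray: Array[Double],
      maxArray: Array[Double],
      min: Double,
      max: Double): Array[Double] => Array[Double] = {
    val n = minArray.length
    val scale = max - min

    // Point 2: the output for a constant dim is itself a constant.
    val constantOutput = (min + max) / 2

    // Point 3: precompute scale / range(i); 0.0 marks a constant dim.
    val scaleArray = Array.tabulate(n) { i =>
      val range = maxArray(i) - minArray(i)
      if (range != 0) scale / range else 0.0
    }

    (values: Array[Double]) => {
      require(values.length == n, "unexpected number of features")
      Array.tabulate(n) { i =>
        if (scaleArray(i) != 0.0) (values(i) - minArray(i)) * scaleArray(i) + min
        else constantOutput
      }
    }
  }
}
```

Multiplying by a precomputed `scale / range(i)` can differ from the original expression in the last few bits of the floating-point result, so any comparison between the two versions should use a small tolerance rather than exact equality.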