GitHub user srowen commented on the issue: https://github.com/apache/spark/pull/22784

Hm, as a general comment, is this going to scale? This makes a potentially huge sparse data set dense and then computes a PCA via SVD. I get the idea that it's better to have some option than none, but I wonder whether this approach is realistic for a data set of even 100K rows, and if not, whether it will confuse people.
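The scaling concern can be made concrete with a back-of-the-envelope memory estimate (a hypothetical sketch, not from the PR itself; the row/column/density figures are illustrative assumptions):

```python
def dense_bytes(rows, cols, itemsize=8):
    # Memory needed once a matrix is stored densely (float64 values).
    return rows * cols * itemsize

def sparse_bytes(rows, cols, density, itemsize=8):
    # Rough CSR-style cost: one value plus one int32 column index
    # per nonzero, plus int64 row pointers.
    nnz = int(rows * cols * density)
    return nnz * (itemsize + 4) + (rows + 1) * 8

# Hypothetical data set: 100K rows, 100K features, 0.1% nonzeros.
rows, cols, density = 100_000, 100_000, 0.001
print(f"dense:  {dense_bytes(rows, cols) / 2**30:.1f} GiB")   # ~74.5 GiB
print(f"sparse: {sparse_bytes(rows, cols, density) / 2**30:.1f} GiB")  # ~0.1 GiB
```

At these (assumed) dimensions, densifying inflates roughly 0.1 GiB of sparse data into about 75 GiB, which is the heart of the objection: the densify-then-SVD route stops being feasible well before 100K rows when the feature dimension is also large.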