srowen commented on a change in pull request #22764: [SPARK-25765][ML] Add training cost to BisectingKMeans summary
URL: https://github.com/apache/spark/pull/22764#discussion_r243301364
##########
File path: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala
##########
@@ -195,7 +200,7 @@ object BisectingKMeansModel extends Loader[BisectingKMeansModel] {
       val data = rows.select("index", "size", "center", "norm", "cost", "height", "children")
       val nodes = data.rdd.map(Data.apply).collect().map(d => (d.index, d)).toMap
       val rootNode = buildTree(rootId, nodes)
-      new BisectingKMeansModel(rootNode, DistanceMeasure.EUCLIDEAN)
+      new BisectingKMeansModel(rootNode, DistanceMeasure.EUCLIDEAN, 0.0)

Review comment:
Would it not just be the same? `rootNode.leafNodes.map(_.cost).sum`? If that cost info is present in the nodes (?) it doesn't need a pass over data (which indeed doesn't exist at this point). If it's valuable enough to include at all, should this info not be correct where it is in fact available?
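
To illustrate the suggestion, here is a minimal, self-contained sketch of summing leaf costs over a loaded tree instead of passing `0.0`. The `Node` class and `trainingCost` helper are hypothetical stand-ins, not Spark's actual classes, and the sketch assumes the `cost` stored on each node is the same quantity the training summary would report.

```scala
// Hypothetical stand-in for a persisted tree node; NOT Spark's internal class.
// Assumption: `cost` is the clustering cost of the points assigned to this node.
final case class Node(index: Int, cost: Double, children: Seq[Node] = Seq.empty) {
  def isLeaf: Boolean = children.isEmpty

  // Collect every leaf reachable from this node (the node itself if it is a leaf).
  def leafNodes: Seq[Node] =
    if (isLeaf) Seq(this) else children.flatMap(_.leafNodes)
}

object TrainingCostSketch {
  // Total cost is the sum over leaves only; internal nodes are skipped because
  // their points are already accounted for by their descendants.
  def trainingCost(root: Node): Double = root.leafNodes.map(_.cost).sum

  def main(args: Array[String]): Unit = {
    val root = Node(1, 10.0, Seq(
      Node(2, 3.0),
      Node(3, 5.5, Seq(Node(6, 1.5), Node(7, 2.0)))
    ))
    println(trainingCost(root)) // leaves 2, 6 and 7: 3.0 + 1.5 + 2.0 = 6.5
  }
}
```

If that assumption holds, the load path could compute the value from `rootNode` alone, with no pass over the training data.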