srowen commented on a change in pull request #22764: [SPARK-25765][ML] Add training cost to BisectingKMeans summary
URL: https://github.com/apache/spark/pull/22764#discussion_r243301364
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala
 ##########
 @@ -195,7 +200,7 @@ object BisectingKMeansModel extends Loader[BisectingKMeansModel] {
       val data = rows.select("index", "size", "center", "norm", "cost", "height", "children")
       val nodes = data.rdd.map(Data.apply).collect().map(d => (d.index, d)).toMap
       val rootNode = buildTree(rootId, nodes)
-      new BisectingKMeansModel(rootNode, DistanceMeasure.EUCLIDEAN)
+      new BisectingKMeansModel(rootNode, DistanceMeasure.EUCLIDEAN, 0.0)
 
 Review comment:
   Wouldn't the training cost simply be `rootNode.leafNodes.map(_.cost).sum`? If
the cost is stored in the nodes (?), computing it doesn't need a pass over the
data (which indeed doesn't exist at this point). If the value is worth including
at all, shouldn't it be correct wherever it is in fact available, rather than
hard-coded to 0.0?
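
A minimal sketch of the suggestion above, assuming each tree node carries its own `cost` and exposes its leaves. `Node`, `LeafCostSketch`, and `trainingCost` are hypothetical stand-ins for illustration only; the real `ClusteringTreeNode` in Spark is not used here and may differ in its details.

```scala
// Hypothetical stand-in for the clustering tree node; it models only the
// fields this example needs (a per-node cost and the child nodes).
final case class Node(index: Int, cost: Double, children: Seq[Node] = Seq.empty) {
  def isLeaf: Boolean = children.isEmpty

  // All leaves at or below this node, collected recursively.
  def leafNodes: Seq[Node] =
    if (isLeaf) Seq(this) else children.flatMap(_.leafNodes)
}

object LeafCostSketch {
  // Sum the cost over the leaf clusters only. Internal nodes are skipped,
  // so no cost is counted twice and no pass over the training data is needed.
  def trainingCost(root: Node): Double = root.leafNodes.map(_.cost).sum

  def main(args: Array[String]): Unit = {
    val root = Node(1, 10.0, Seq(
      Node(2, 1.5),
      Node(3, 4.0, Seq(Node(6, 2.0), Node(7, 0.5)))
    ))
    println(trainingCost(root)) // leaves 2, 6 and 7: 1.5 + 2.0 + 0.5 = 4.0
  }
}
```

In the loader, the equivalent value would be computed from `rootNode` after `buildTree` and passed in place of the `0.0` literal, provided the saved nodes really do contain the cost.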
