Github user GeorgeDittmar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6112#discussion_r31387313

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---
    @@ -717,6 +719,49 @@ class SparseVector(
             new SparseVector(size, ii, vv)
           }
         }
    +
    +  override def argmax: Int = {
    +    if (size == 0) {
    +      -1
    +    } else {
    +
    +      //grab first active index and value by default
    +      var maxIdx = indices(0)
    +      var maxValue = values(0)
    +
    +      foreachActive { (i, v) =>
    +        if (v > maxValue) {
    +          maxIdx = i
    +          maxValue = v
    +        }
    +      }
    +
    +      // look for inactive values incase all active node values are negative
    +      if(size != values.size && maxValue <= 0){
    --- End diff --

    Found another corner case: a 0 value defined in the active set of values at the very end of the vector. I'm wondering if it might make more sense to enforce more strongly in the SparseVector implementation that 0s can't be in the set of active values? Maybe that's too strict a rule, but it would cut down on these corner cases. It seems odd to allow the addition of active entries with value 0 when they should really be inactive. Also, if we call SparseVector.toSparseVector, it looks like it filters out the zeros to begin with, so it might make sense to do this more formally at object creation time. @mengxr thoughts?
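To make the corner cases concrete, here is a minimal, hypothetical Python sketch (not the Spark implementation) of the same argmax logic for a sparse vector, where unstored positions are implicit zeros. It includes the fallback scan for inactive positions when the best active value is non-positive, and it handles the case raised above: an explicit 0 stored as an active value late in the vector, where the true argmax is an earlier implicit zero.

```python
def sparse_argmax(size, indices, values):
    """Argmax of a sparse vector; unstored entries are implicitly 0.0.

    Illustrative sketch mirroring the logic discussed in the diff.
    Ties are broken by the smallest index, matching dense-argmax behavior.
    """
    if size == 0:
        return -1

    # Default to the first active entry, then scan the active values.
    max_idx, max_value = indices[0], values[0]
    for i, v in zip(indices, values):
        if v > max_value:
            max_idx, max_value = i, v

    # If inactive (implicit-zero) entries exist and the best active value
    # is <= 0, an implicit zero may be the argmax: 0 beats any negative
    # value, and on a tie with an explicit 0 the smaller index wins.
    if len(values) < size and max_value <= 0:
        active = set(indices)
        first_inactive = next(i for i in range(size) if i not in active)
        if max_value < 0 or first_inactive < max_idx:
            return first_inactive

    return max_idx
```

For example, `sparse_argmax(3, [0, 2], [-1.0, 0.0])` must return 1, not 2: the explicit 0 at index 2 ties with the implicit 0 at index 1, and the smaller index wins, which is exactly the corner case of an explicit zero stored at the end of the active set.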