Hi,
I need to change the toString on LabeledPoint to emit libsvm format so that
I can dump an RDD[LabeledPoint] in a form that sparse glmnet (R) and other
packages can read, in order to benchmark MLlib classification accuracy.
Basically, I have to change the toString of both LabeledPoint and
SparseVector.
Should I add it as a PR, or is it already being added?
For now I have added these toLibSvm functions to my internal util class:
def toLibSvm(labelPoint: LabeledPoint): String = {
  labelPoint.label.toString + " " +
    toLibSvm(labelPoint.features.asInstanceOf[SparseVector])
}
def toLibSvm(features: SparseVector): String = {
  val indices = features.indices
  val values = features.values
  indices.zip(values).mkString(" ").replace(',', ':')
    .replace("(", "").replace(")", "")
}
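A minimal standalone sketch of the same formatting, written without the tuple-toString replace chain. It has no Spark dependency: SparseRow and LibSvmFormat are hypothetical stand-ins for LabeledPoint with SparseVector features, not MLlib classes. It also assumes the reader on the other side expects the usual LIBSVM convention of one-based feature indices, so each zero-based index is shifted by one.

```scala
// Hypothetical stand-in for a LabeledPoint with SparseVector features.
final case class SparseRow(label: Double, indices: Array[Int], values: Array[Double])

object LibSvmFormat {
  // Formats one row as "<label> <index>:<value> <index>:<value> ...".
  // Assumption: the consumer expects one-based indices (LIBSVM convention),
  // so every zero-based index is written as index + 1.
  def toLibSvm(row: SparseRow): String = {
    require(row.indices.length == row.values.length,
      "indices and values must have the same length")
    val features = row.indices.zip(row.values).map { case (i, v) => s"${i + 1}:$v" }
    (row.label.toString +: features).mkString(" ")
  }
}
```

For example, LibSvmFormat.toLibSvm(SparseRow(1.0, Array(0, 2), Array(1.0, -2.0))) yields "1.0 1:1.0 3:-2.0". Building each "index:value" pair directly avoids rewriting every comma and parenthesis in the string after the fact.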
Thanks.
Deb
On Fri, May 9, 2014 at 10:09 PM, mateiz g...@git.apache.org wrote:
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/685#discussion_r12502569
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala ---
@@ -100,4 +100,27 @@ class VectorsSuite extends FunSuite {
assert(vec2(6) === 4.0)
assert(vec2(7) === 0.0)
}
+
+  test("parse vectors") {
+    val vectors = Seq(
+      Vectors.dense(Array.empty[Double]),
+      Vectors.dense(1.0),
+      Vectors.dense(1.0, 0.0, -2.0),
+      Vectors.sparse(0, Array.empty[Int], Array.empty[Double]),
+      Vectors.sparse(1, Array(0), Array(1.0)),
+      Vectors.sparse(3, Array(0, 2), Array(1.0, -2.0)))
+    vectors.foreach { v =>
+      val v1 = Vectors.parse(v.toString)
+      assert(v.getClass === v1.getClass)
+      assert(v === v1)
+    }
+
+    val malformatted = Seq("1", "[1,,]", "[1,2", "(1,[1,2])", "(1,[1],[2.0,1.0])")
+    malformatted.foreach { s =>
+      intercept[RuntimeException] {
--- End diff --
Should be Exception instead