GitHub user mengxr commented on the pull request:
https://github.com/apache/incubator-spark/pull/575#issuecomment-35449886
@fommil @MLnick I included MTJ in the benchmarks (see the updated comment
above). Basically it performs very similarly to Breeze.
@martinjaggi Gradient-based methods need dot products between sparse and
dense vectors, or multiplication of a sparse matrix with dense vectors if we
consider creating a local sparse matrix first.
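For concreteness, here is a minimal sketch of such a sparse-dense dot product
using Breeze (one of the libraries compared in the benchmarks above). The
vector sizes and values are made up for illustration, not benchmark data:

```scala
import breeze.linalg.{DenseVector, SparseVector}

object SparseDenseDotExample extends App {
  // Made-up data: a sparse feature vector with two non-zeros and a
  // dense weight vector of the same length.
  val features = SparseVector.zeros[Double](5)
  features(0) = 1.0
  features(3) = 2.0
  val weights = DenseVector(0.5, 0.5, 0.5, 0.5, 0.5)

  // Breeze dispatches to a sparse-aware implementation that only
  // touches the active (non-zero) entries of the sparse operand.
  val margin: Double = features dot weights
  println(margin) // 1.0 * 0.5 + 2.0 * 0.5 = 1.5
}
```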
If the input RDD to a gradient-based method is not cached, I would recommend
caching it first, or down-sampling it if it is too large to cache (sketched
below). If the input data has to be deserialized on every iteration, that cost
dominates and the computation itself becomes negligible by comparison. If the
data is cached and we avoid copying data during the conversion between the
data model we defined and the underlying vector implementation, the overhead
is very small. I'm also working on a performance test suite for MLlib
algorithms to make it easy for us to do such comparisons.
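As an illustration of the caching recommendation, here is a minimal sketch;
the input path and the comma-separated point format are assumptions for the
example, not part of this PR:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object CachingExample extends App {
  val sc = new SparkContext("local[2]", "caching-demo")

  // Hypothetical input: one comma-separated point per line.
  val points: RDD[Array[Double]] =
    sc.textFile("data/points.txt").map(_.split(',').map(_.toDouble))

  // Mark the RDD as cached before iterating. cache() is lazy: the data
  // is materialized on the first action, so only the first pass pays
  // the read-and-deserialize cost; later iterations hit memory.
  points.cache()

  // If the data is too large to cache, down-sample it first.
  val sample = points.sample(false, 0.1, 42)

  sc.stop()
}
```

Note that `cache()` only marks the RDD for storage; the first iteration still
pays the full read cost, and subsequent iterations read from memory.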