JBlas gave roughly a 5x-7x speedup for solving the dense linear systems in ALS when I integrated it into a prototype of Mahout's ALS for a research paper.

Unfortunately, there are some caveats:

- it requires certain Fortran libraries to be installed on every machine of the cluster

- its jar is really large, so it would blow up the size of "uber-jars" built from Mahout

- AFAIK it's also a problem to ship it license-wise, as the required libraries are not Apache-licensed

See this discussion from the Spark community for details:

https://github.com/apache/incubator-spark/pull/575
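For context: the dense systems in question are the small k x k normal equations that ALS solves per user and per item (k = number of latent features), which is exactly the kernel a native BLAS/LAPACK backend accelerates. A self-contained plain-Java sketch of one such solve (Gaussian elimination with partial pivoting; illustrative code, not Mahout's or jblas's actual implementation):

```java
import java.util.Arrays;

public class DenseSolve {
    // Solve A x = b in place via Gaussian elimination with partial pivoting.
    // ALS repeats many such small k x k solves, one per user/item, which is
    // why swapping in a native backend pays off so much.
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        for (int col = 0; col < n; col++) {
            // pick the pivot row with the largest absolute value in this column
            int pivot = col;
            for (int row = col + 1; row < n; row++) {
                if (Math.abs(a[row][col]) > Math.abs(a[pivot][col])) pivot = row;
            }
            double[] tmpRow = a[col]; a[col] = a[pivot]; a[pivot] = tmpRow;
            double tmp = b[col]; b[col] = b[pivot]; b[pivot] = tmp;
            // eliminate entries below the pivot
            for (int row = col + 1; row < n; row++) {
                double f = a[row][col] / a[col][col];
                for (int j = col; j < n; j++) a[row][j] -= f * a[col][j];
                b[row] -= f * b[col];
            }
        }
        // back substitution on the upper-triangular system
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = b[i];
            for (int j = i + 1; j < n; j++) s -= a[i][j] * x[j];
            x[i] = s / a[i][i];
        }
        return x;
    }

    public static void main(String[] args) {
        // 4x + y = 1, x + 3y = 2  ->  x = 1/11, y = 7/11
        double[][] a = {{4, 1}, {1, 3}};
        double[] b = {1, 2};
        System.out.println(Arrays.toString(solve(a, b)));
    }
}
```

A native LAPACK routine replaces exactly this loop nest with optimized, cache-blocked code, which is where the measured speedup comes from.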


Best,
Sebastian

On 03/04/2014 11:17 PM, Suneel Marthi wrote:
There's JBlas, which is used by Spark, Deeplearning.org and other ML projects.
IIRC, there was some prototyping done in the past using JBlas for Mahout -
Sebastian or Sean can speak better to that. It definitely has better
performance than Mahout-Math.

Managing the native Fortran dependencies could be challenging with JBlas, not 
to mention that JBlas may not support sparse matrices (someone correct me here).
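On the sparse-matrix point: jblas stores matrix data densely, so Mahout's sparse vectors would have to be densified before each call. A toy sketch (plain Java, illustrative names only; not Mahout's or jblas's actual API) of why dense storage hurts for sparse data:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class SparseDot {
    // A dense dot product touches every index, zeros included.
    static double denseDot(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }

    // A sparse representation stores only the non-zero entries, so the dot
    // product iterates over nnz(a) instead of the full dimension.
    static double sparseDot(Map<Integer, Double> a, double[] b) {
        double sum = 0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            sum += e.getValue() * b[e.getKey()];
        }
        return sum;
    }

    public static void main(String[] args) {
        int dim = 1_000_000;
        double[] dense = new double[dim];   // 8 MB for two non-zeros
        dense[3] = 2.0;
        dense[dim - 1] = 5.0;
        Map<Integer, Double> sparse = new HashMap<>();  // two entries
        sparse.put(3, 2.0);
        sparse.put(dim - 1, 5.0);
        double[] ones = new double[dim];
        Arrays.fill(ones, 1.0);
        System.out.println(denseDot(dense, ones));   // same result,
        System.out.println(sparseDot(sparse, ones)); // very different cost
    }
}
```

For typical recommender input (user-item matrices with a tiny fraction of non-zeros), that memory and time overhead is exactly the concern raised above.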

On Tuesday, March 4, 2014 4:57 PM, Giorgio Zoppi <giorgio.zo...@gmail.com> wrote:

I would like to find some way to speed up the matrix library, e.g. via JNI+C++.


2014-03-04 22:53 GMT+01:00 Frank Scholten <fr...@frankscholten.nl>:

Yes, I'd like to work on standardizing the code around input formats.


On Mon, Mar 3, 2014 at 7:37 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:

To get things moving for 1.0:


a) Address the 4 issues that Sean had raised - we have already started
looking at the backlog and closing items, and started looking at converting
the old MapReduce code to the newer MapReduce API.

   If someone could start looking at standardizing the input/output
formats across classifiers, clustering and recommenders, that would be
great. I guess Frank S. has already started work in that direction.

b) We need a better and cleaner serialized form of Vectors to handle names
and other kinds of metadata; this is going to impact everything that's
presently implemented.

c) Agree with ssc; we should start looking at Spark-Mahout integration.


d) We need volunteers to QA and address issues with the present
classifier/clustering algorithms. I can personally vouch for how
disastrous it is to deploy any of Mahout's classifier/clustering
implementations in an operations environment. A good example of that is
Sean's recent patch for RDF.

The Naive Bayes code as it stands seems half-baked and incomplete. Not
every code path has been tested in Streaming KMeans.

This should go some way toward addressing the technical debt that has
piled up over the years.

On Monday, March 3, 2014 1:05 PM, Sebastian Schelter <s...@apache.org> wrote:

I would like to discuss whether we should start to have some
Spark-related code in Mahout.

--sebastian


On 03/03/2014 06:56 PM, Suneel Marthi wrote:
Grant had set up a Google Hangout for Mahout sometime last year before the
0.8 release. I had one set up too for the 0.9 release. I definitely wouldn't
want to have a hangout on a Saturday or the weekend.

On Monday, March 3, 2014 12:52 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

Happy to organize a Google Hangout. That has the advantage of allowing
more attendees and supporting YouTube archiving.

Sent from my iPhone


On Mar 3, 2014, at 9:34, Giorgio Zoppi <giorgio.zo...@gmail.com> wrote:

Hello All,
Dr. Dunning, could you set up a meeting next Saturday morning, so we can
chat over Skype to discuss improvements, figure out what to do, and
identify volunteers and tasks?
Best Regards,
Giorgio


2014-03-03 18:30 GMT+01:00 peng <pc...@uowmail.edu.au>:

Me three


On Sun 02 Mar 2014 11:45:33 AM EST, Ted Dunning wrote:

Ravi,

Good points.

On Sun, Mar 2, 2014 at 12:38 AM, Ravi Mummulla <ravi.mummu...@gmail.com> wrote:

- Natively support Windows (guidance, etc. No documentation exists today,
for instance)

There is a bit of demand for that.

- Faster time to first application (from discovery to first application
currently takes a non-trivial amount of effort; how can we lower the bar
and reduce the friction for adoption?)

There is huge evidence that this is important.

- Better documentation of use cases with working samples/examples
(documentation on https://mahout.apache.org/users/basics/algorithms.html
is spread out and there is too much focus on algorithms as opposed to
use cases - this is an adoption blocker)

This is also important.

- Uniformity of the API set across all algorithms (are we providing the
same experience across all APIs?)

And many people have been tripped up by this.

- Measuring/publishing scalability metrics of various algorithms (why
would we want users to adopt Mahout vs. other frameworks for ML at
scale?)

I don't see this as important as some of your other points, but it is
still useful.


--
I want to be the ray of sun that wakes you each day,
to make you breathe and live in me.
"Favola - Moda".
