+1

-------- Original message --------
From: Dmitriy Lyubimov <dlie...@gmail.com>
Date: 01/23/2015 6:06 PM (GMT-05:00)
To: dev@mahout.apache.org
Subject: Codebase refactoring proposal

So right now mahout-spark depends on mr-legacy.
I did a quick refactoring, and it turns out it only _irrevocably_ depends on
the following classes there:

MatrixWritable, VectorWritable, Varint/Varlong and VarintWritable, and ...
*sigh* o.a.m.common.Pair

So I just dropped those five classes into a new, tiny mahout-hadoop
module (to signify stuff that is directly relevant to serializing things to
the DFS API) and completely removed mrlegacy and its transitive dependencies
from the spark and spark-shell dependencies.

So non-CLI applications (shell scripts and embedded API use) actually only
need the spark dependencies (which come from the SPARK_HOME classpath, of
course) and the mahout jars: mahout-spark, mahout-math(-scala), mahout-hadoop
and, optionally, mahout-spark-shell (for running the shell).
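
As a rough sketch of what that could look like for an embedded application
built with sbt: the artifact coordinates, the Scala suffix handling and both
version numbers below are assumptions for illustration only, and mahout-hadoop
in particular is just the module proposed here, not a published artifact.

```scala
// build.sbt (sketch only; all coordinates and versions are assumptions)
scalaVersion := "2.10.4" // assumed Scala version

val mahoutVersion = "1.0-SNAPSHOT" // hypothetical Mahout version
val sparkVersion = "1.1.0"         // hypothetical Spark version

libraryDependencies ++= Seq(
  // Spark comes from the SPARK_HOME classpath at run time, so it is
  // marked "provided" and not packaged with the application.
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",

  // The mahout jars listed above; note there is no mrlegacy entry.
  "org.apache.mahout" % "mahout-math" % mahoutVersion,
  "org.apache.mahout" %% "mahout-math-scala" % mahoutVersion,
  "org.apache.mahout" %% "mahout-spark" % mahoutVersion,
  "org.apache.mahout" % "mahout-hadoop" % mahoutVersion // proposed module
)
```

The optional mahout-spark-shell jar is left out because it is only needed
for running the shell.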

This of course still doesn't address the problem of drivers that want to
throw more stuff onto the front-end classpath (such as the CLI parser), but
at least it makes the transitive luggage of mr-legacy (and the size of the
worker-shipped jars) much more tolerable.

How does that sound?
