+1
-------- Original message --------
From: Dmitriy Lyubimov <dlie...@gmail.com>
Date: 01/23/2015 6:06 PM (GMT-05:00)
To: dev@mahout.apache.org
Subject: Codebase refactoring proposal

So right now mahout-spark depends on mr-legacy. I did a quick refactoring, and it turns out it only _irrevocably_ depends on the following classes there:

MatrixWritable, VectorWritable, Varint/Varlong and VarintWritable, and ... *sigh* o.a.m.common.Pair

So I just dropped those five classes into a new, tiny mahout-hadoop module (to signify stuff that is directly relevant to serializing things to the DFS API) and completely removed mrlegacy and its transitive dependencies from the spark and spark-shell dependencies.

So non-cli applications (shell scripts and embedded api use) actually only need the spark dependencies (which come from the SPARK_HOME classpath, of course) and the mahout jars: mahout-spark, mahout-math(-scala), mahout-hadoop and, optionally, mahout-spark-shell (for running the shell).

This of course still doesn't address the problem of drivers that want to throw more stuff onto the front-end classpath (such as a cli parser), but at least it makes the transitive luggage of mr-legacy (and the size of the worker-shipped jars) much more tolerable.

How does that sound?
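
To make the "transitive luggage" point concrete, here is a rough sketch of how the reachable dependency set of mahout-spark shrinks once mrlegacy is swapped for the small mahout-hadoop module. The graph is a toy: the `legacy-transitive-*` entries are hypothetical placeholders for whatever mrlegacy actually pulls in, and the edge from mahout-hadoop to mahout-math is an assumption, not something taken from the real poms.

```python
from collections import deque


def transitive_deps(graph, root):
    """Return every module reachable from root, excluding root itself."""
    seen = set()
    queue = deque(graph.get(root, ()))
    while queue:
        mod = queue.popleft()
        if mod in seen:
            continue
        seen.add(mod)
        queue.extend(graph.get(mod, ()))
    return seen


# Hypothetical, simplified dependency graphs (not read from the real build).
before = {
    "mahout-spark": ["mahout-math-scala", "mahout-mrlegacy"],
    "mahout-math-scala": ["mahout-math"],
    # Placeholder names standing in for mrlegacy's own dependencies.
    "mahout-mrlegacy": ["mahout-math", "legacy-transitive-a", "legacy-transitive-b"],
}
after = {
    "mahout-spark": ["mahout-math-scala", "mahout-hadoop"],
    "mahout-math-scala": ["mahout-math"],
    # Assumed edge: the Writable classes wrap mahout-math types.
    "mahout-hadoop": ["mahout-math"],
}

# Modules that no longer reach the spark classpath after the refactoring.
removed = transitive_deps(before, "mahout-spark") - transitive_deps(after, "mahout-spark")
print(sorted(removed))
```

On the real build, the same comparison could be made by diffing the module's resolved dependency tree before and after the change.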