And in case anyone wonders: yes, the shell starts and runs the test script totally fine without the mr-legacy dependency on the classpath (startup script modified to use mahout-hadoop instead) -- both in local and distributed (standalone) mode:
================================================

$ MASTER=spark://localhost:7077 bin/mahout spark-shell

                 _                 _
 _ __ ___   __ _| |__   ___  _   _| |_
| '_ ` _ \ / _` | '_ \ / _ \| | | | __|
| | | | | | (_| | | | | (_) | |_| | |_
|_| |_| |_|\__,_|_| |_|\___/ \__,_|\__|  version 1.0

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Type :help for more information.
15/01/23 15:28:25 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/01/23 15:28:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Created spark context..
Mahout distributed context is available as "implicit val sdc".

mahout> :load spark-shell/src/test/mahout/simple.mscala
Loading spark-shell/src/test/mahout/simple.mscala...
a: org.apache.mahout.math.DenseMatrix =
{
  0 => {0:1.0,1:2.0,2:3.0}
  1 => {0:3.0,1:4.0,2:5.0}
}
drmA: org.apache.mahout.math.drm.CheckpointedDrm[Int] = org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark@7940bbc5
drmAtA: org.apache.mahout.math.drm.DrmLike[Int] = OpAB(OpAt(org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark@7940bbc5),org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark@7940bbc5)
r: org.apache.mahout.math.drm.CheckpointedDrm[Int] = org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark@3c46dadf
res4: org.apache.mahout.math.Matrix =
{
  0 => {0:11.0,1:15.0,2:19.0}
  1 => {0:15.0,1:21.0,2:27.0}
  2 => {0:19.0,1:27.0,2:35.0}
}

mahout>

On Fri, Jan 23, 2015 at 3:07 PM, Suneel Marthi <suneel.mar...@gmail.com> wrote:

> +1
>
> On Fri, Jan 23, 2015 at 6:04 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> wrote:
>
> > So right now mahout-spark depends on mr-legacy.
> > I did a quick refactoring, and it turns out it only _irrevocably_
> > depends on the following classes there:
> >
> > MatrixWritable, VectorWritable, Varint/Varlong and VarintWritable, and
> > ...
> > *sigh* o.a.m.common.Pair.
> >
> > So I just dropped those five classes into a new tiny mahout-hadoop
> > module (to signify stuff that is directly relevant to serializing
> > things to the DFS API) and completely removed mr-legacy and its
> > transitives from the spark and spark-shell dependencies.
> >
> > So non-CLI applications (shell scripts and embedded API use) actually
> > only need the Spark dependencies (which come from the SPARK_HOME
> > classpath, of course) and the Mahout jars: mahout-spark,
> > mahout-math(-scala), mahout-hadoop, and optionally mahout-spark-shell
> > (for running the shell).
> >
> > This of course still doesn't address drivers that want to throw more
> > stuff onto the front-end classpath (such as the CLI parser), but at
> > least it makes the transitive luggage of mr-legacy (and the size of the
> > worker-shipped jars) much more tolerable.
> >
> > How does that sound?
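As a side note, the numbers in the shell session above can be sanity-checked without a cluster: `res4` equals AᵀA (computed from the 2x3 matrix printed as `a`) with 1.0 added to every entry. A minimal plain-Scala sketch of that arithmetic, assuming nothing but the standard library (the object name is mine, not part of Mahout):

```scala
// Sanity check of the session's numbers: res4 == A^T A + 1.0 (elementwise),
// where A is the 2x3 matrix printed as `a` in the shell session.
// Plain Scala only -- no Spark or Mahout on the classpath.
object SessionArithmetic {
  val a: Array[Array[Double]] = Array(
    Array(1.0, 2.0, 3.0), // row 0 of `a`
    Array(3.0, 4.0, 5.0)  // row 1 of `a`
  )

  // (A^T A)(i)(j) = sum over rows k of A(k)(i) * A(k)(j), then + 1.0
  def ataPlusOne: Array[Array[Double]] = {
    val n = a(0).length
    Array.tabulate(n, n)((i, j) => a.map(row => row(i) * row(j)).sum + 1.0)
  }

  def main(args: Array[String]): Unit =
    ataPlusOne.foreach(row => println(row.mkString(", ")))
  // prints:
  // 11.0, 15.0, 19.0
  // 15.0, 21.0, 27.0
  // 19.0, 27.0, 35.0
}
```

This reproduces `res4` exactly, which suggests the loaded script evaluates something like `drmA.t %*% drmA` plus an elementwise 1.0 before collecting (the exact contents of simple.mscala are not shown in the session).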