Sorry, I meant _without_ mrlegacy on the classpath.
On Fri, Jan 23, 2015 at 3:31 PM, Dmitriy Lyubimov <[email protected]> wrote:
> And in case anyone wonders yes shell starts and runs test script totally
> fine with mrlegacy dependency on classpath (startup script modified to use
> mahout-hadoop instead) -- both in local and distributed (standalone) mode:
>
> ================================================
>
> $ MASTER=spark://localhost:7077 bin/mahout spark-shell
>
> _ _
> _ __ ___ __ _| |__ ___ _ _| |_
> | '_ ` _ \ / _` | '_ \ / _ \| | | | __|
> | | | | | | (_| | | | | (_) | |_| | |_
> |_| |_| |_|\__,_|_| |_|\___/ \__,_|\__| version 1.0
>
>
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
> 1.7.0_71)
> Type in expressions to have them evaluated.
> Type :help for more information.
> 15/01/23 15:28:25 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
> another address
> 15/01/23 15:28:26 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> Created spark context..
> Mahout distributed context is available as "implicit val sdc".
>
>
> mahout> :load spark-shell/src/test/mahout/simple.mscala
> Loading spark-shell/src/test/mahout/simple.mscala...
> a: org.apache.mahout.math.DenseMatrix =
> {
> 0 => {0:1.0,1:2.0,2:3.0}
> 1 => {0:3.0,1:4.0,2:5.0}
> }
> drmA: org.apache.mahout.math.drm.CheckpointedDrm[Int] =
> org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark@7940bbc5
> drmAtA: org.apache.mahout.math.drm.DrmLike[Int] =
> OpAB(OpAt(org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark@7940bbc5
> ),org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark@7940bbc5)
> r: org.apache.mahout.math.drm.CheckpointedDrm[Int] =
> org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark@3c46dadf
> res4: org.apache.mahout.math.Matrix =
> {
> 0 => {0:11.0,1:15.0,2:19.0}
> 1 => {0:15.0,1:21.0,2:27.0}
> 2 => {0:19.0,1:27.0,2:35.0}
> }
> mahout>
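For readers following along: the session's `drmAtA` is the logical transpose-times-self product A'A, computed distributedly via the DRM operators. A minimal plain-Scala sketch of that algebra, using the 2x3 matrix `a` shown above; this is local arithmetic only, not the Mahout DRM API, and since simple.mscala itself isn't reproduced here the printed values are not expected to match `res4`:

```scala
// Local sketch of the A'A (transpose-times-self) product that the
// session above computes distributedly as drmA.t %*% drmA.
object AtA {
  // r(i)(j) = sum over rows k of a(k)(i) * a(k)(j), i.e. (A'A)(i)(j)
  def ata(a: Array[Array[Double]]): Array[Array[Double]] = {
    val (m, n) = (a.length, a(0).length)
    val r = Array.ofDim[Double](n, n)
    for (i <- 0 until n; j <- 0 until n; k <- 0 until m)
      r(i)(j) += a(k)(i) * a(k)(j)
    r
  }

  def main(args: Array[String]): Unit = {
    // The 2x3 matrix from the session transcript.
    val a = Array(Array(1.0, 2.0, 3.0), Array(3.0, 4.0, 5.0))
    ata(a).foreach(row => println(row.mkString(" ")))
  }
}
```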
>
>
> On Fri, Jan 23, 2015 at 3:07 PM, Suneel Marthi <[email protected]>
> wrote:
>
>> +1
>>
>> On Fri, Jan 23, 2015 at 6:04 PM, Dmitriy Lyubimov <[email protected]>
>> wrote:
>>
>> > So right now mahout-spark depends on mr-legacy.
>> > I did a quick refactoring, and it turns out it only _irrevocably_
>> > depends on the following classes there:
>> >
>> > MatrixWritable, VectorWritable, Varint/Varlong and VarintWritable,
>> > and ... *sigh* o.a.m.common.Pair
>> >
>> > So I just dropped those five classes into a new tiny mahout-hadoop
>> > module (to signify stuff that is directly relevant to serializing
>> > things to the DFS API) and completely removed mrlegacy and its
>> > transients from the spark and spark-shell dependencies.
>> >
>> > So non-cli applications (shell scripts and embedded api use) actually
>> > only need the spark dependencies (which come from the SPARK_HOME
>> > classpath, of course) and the mahout jars: mahout-spark,
>> > mahout-math(-scala), mahout-hadoop, and optionally mahout-spark-shell
>> > (for running the shell).
>> >
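To make that dependency set concrete, here is a hedged sketch of what a downstream application pom might declare under this proposal. The artifact ids are taken from the module names above; the group id, version property, and the absence of Scala-version suffixes are illustrative assumptions, not the actual published coordinates:

```xml
<!-- Illustrative only: artifact names follow the modules proposed in
     this thread; group id and version handling are assumptions. -->
<dependencies>
  <dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-math</artifactId>
    <version>${mahout.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-math-scala</artifactId>
    <version>${mahout.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-hadoop</artifactId>
    <version>${mahout.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-spark</artifactId>
    <version>${mahout.version}</version>
  </dependency>
  <!-- Spark itself is provided at runtime from the SPARK_HOME classpath,
       so no mr-legacy transitives are shipped to the workers. -->
</dependencies>
```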
>> > This of course still doesn't address drivers that want to throw more
>> > stuff onto the front-end classpath (such as a cli parser), but at
>> > least it renders the transitive luggage of mr-legacy (and the size of
>> > the worker-shipped jars) much more tolerable.
>> >
>> > How does that sound?
>> >
>>
>
>