Re: Codebase refactoring proposal

Dmitriy Lyubimov Fri, 30 Jan 2015 16:10:53 -0800

In the 'spark' module it is overridden by the Spark dependency, which, as it
happens, comes at the same version. So it should be fine with 1.1.x.


[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ mahout-spark_2.10 ---
[INFO] org.apache.mahout:mahout-spark_2.10:jar:1.0-SNAPSHOT
[INFO] +- org.apache.spark:spark-core_2.10:jar:1.1.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile
[INFO] |  |  +- org.apache.hadoop:hadoop-common:jar:2.2.0:compile
[INFO] |  |  |  +- commons-cli:commons-cli:jar:1.2:compile
[INFO] |  |  |  +- org.apache.commons:commons-math:jar:2.1:compile
[INFO] |  |  |  +- commons-io:commons-io:jar:2.4:compile
[INFO] |  |  |  +- commons-logging:commons-logging:jar:1.1.3:compile
[INFO] |  |  |  +- commons-lang:commons-lang:jar:2.6:compile
[INFO] |  |  |  +- commons-configuration:commons-configuration:jar:1.6:compile
[INFO] |  |  |  |  +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |  |  |  |  +- commons-digester:commons-digester:jar:1.8:compile
[INFO] |  |  |  |  |  \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] |  |  |  |  \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
[INFO] |  |  |  +- org.apache.avro:avro:jar:1.7.4:compile
[INFO] |  |  |  +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
[INFO] |  |  |  +- org.apache.hadoop:hadoop-auth:jar:2.2.0:compile
[INFO] |  |  |  \- org.apache.commons:commons-compress:jar:1.4.1:compile
[INFO] |  |  |     \- org.tukaani:xz:jar:1.0:compile
[INFO] |  |  +- org.apache.hadoop:hadoop-hdfs:jar:2.2.0:compile
[INFO] |  |  +- org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.2.0:compile
[INFO] |  |  |  +- org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.2.0:compile
[INFO] |  |  |  |  +- org.apache.hadoop:hadoop-yarn-client:jar:2.2.0:compile
[INFO] |  |  |  |  |  +- com.google.inject:guice:jar:3.0:compile
[INFO] |  |  |  |  |  |  +- javax.inject:javax.inject:jar:1:compile
[INFO] |  |  |  |  |  |  \- aopalliance:aopalliance:jar:1.0:compile
[INFO] |  |  |  |  |  +- com.sun.jersey.jersey-test-framework:jersey-test-framework-grizzly2:jar:1.9:compile
[INFO] |  |  |  |  |  |  +- com.sun.jersey.jersey-test-framework:jersey-test-framework-core:jar:1.9:compile
[INFO] |  |  |  |  |  |  |  +- javax.servlet:javax.servlet-api:jar:3.0.1:compile
[INFO] |  |  |  |  |  |  |  \- com.sun.jersey:jersey-client:jar:1.9:compile
[INFO] |  |  |  |  |  |  \- com.sun.jersey:jersey-grizzly2:jar:1.9:compile
[INFO] |  |  |  |  |  |     +- org.glassfish.grizzly:grizzly-http:jar:2.1.2:compile
[INFO] |  |  |  |  |  |     |  \- org.glassfish.grizzly:grizzly-framework:jar:2.1.2:compile
[INFO] |  |  |  |  |  |     |     \- org.glassfish.gmbal:gmbal-api-only:jar:3.0.0-b023:compile
[INFO] |  |  |  |  |  |     |        \- org.glassfish.external:management-api:jar:3.0.0-b012:compile
[INFO] |  |  |  |  |  |     +- org.glassfish.grizzly:grizzly-http-server:jar:2.1.2:compile
[INFO] |  |  |  |  |  |     |  \- org.glassfish.grizzly:grizzly-rcm:jar:2.1.2:compile
[INFO] |  |  |  |  |  |     +- org.glassfish.grizzly:grizzly-http-servlet:jar:2.1.2:compile
[INFO] |  |  |  |  |  |     \- org.glassfish:javax.servlet:jar:3.1:compile
[INFO] |  |  |  |  |  +- com.sun.jersey:jersey-server:jar:1.9:compile
[INFO] |  |  |  |  |  |  +- asm:asm:jar:3.1:compile
[INFO] |  |  |  |  |  |  \- com.sun.jersey:jersey-core:jar:1.9:compile
[INFO] |  |  |  |  |  +- com.sun.jersey:jersey-json:jar:1.9:compile
[INFO] |  |  |  |  |  |  +- org.codehaus.jettison:jettison:jar:1.1:compile
[INFO] |  |  |  |  |  |  |  \- stax:stax-api:jar:1.0.1:compile
[INFO] |  |  |  |  |  |  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
[INFO] |  |  |  |  |  |  |  \- javax.xml.bind:jaxb-api:jar:2.2.2:compile
[INFO] |  |  |  |  |  |  |     \- javax.activation:activation:jar:1.1:compile
[INFO] |  |  |  |  |  |  +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.3:compile
[INFO] |  |  |  |  |  |  \- org.codehaus.jackson:jackson-xc:jar:1.8.3:compile
[INFO] |  |  |  |  |  \- com.sun.jersey.contribs:jersey-guice:jar:1.9:compile
[INFO] |  |  |  |  \- org.apache.hadoop:hadoop-yarn-server-common:jar:2.2.0:compile
[INFO] |  |  |  \- org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.2.0:compile
[INFO] |  |  +- org.apache.hadoop:hadoop-yarn-api:jar:2.2.0:compile
[INFO] |  |  +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.2.0:compile
[INFO] |  |  |  \- org.apache.hadoop:hadoop-yarn-common:jar:2.2.0:compile
[INFO] |  |  +- org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.2.0:compile
[INFO] |  |  \- org.apache.hadoop:hadoop-annotations:jar:2.2.0:compile
[INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.7.1:compile
[INFO] |  |  +- commons-codec:commons-codec:jar:1.3:compile
[INFO] |  |  \- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] |  +- org.apache.curator:curator-recipes:jar:2.4.0:compile
[INFO] |  |  +- org.apache.curator:curator-framework:jar:2.4.0:compile
[INFO] |  |  |  \- org.apache.curator:curator-client:jar:2.4.0:compile
[INFO] |  |  \- org.apache.zookeeper:zookeeper:jar:3.4.5:compile
[INFO] |  |     \- jline:jline:jar:0.9.94:compile
[INFO] |  +- org.eclipse.jetty:jetty-plus:jar:8.1.14.v20131031:compile
[INFO] |  |  +- org.eclipse.jetty.orbit:javax.transaction:jar:1.1.1.v201105210645:compile
[INFO] |  |  +- org.eclipse.jetty:jetty-webapp:jar:8.1.14.v20131031:compile
[INFO] |  |  |  +- org.eclipse.jetty:jetty-xml:jar:8.1.14.v20131031:compile
[INFO] |  |  |  \- org.eclipse.jetty:jetty-servlet:jar:8.1.14.v20131031:compile
[INFO] |  |  \- org.eclipse.jetty:jetty-jndi:jar:8.1.14.v20131031:compile
[INFO] |  |     \- org.eclipse.jetty.orbit:javax.mail.glassfish:jar:1.4.1.v201005082020:compile
[INFO] |  |        \- org.eclipse.jetty.orbit:javax.activation:jar:1.1.0.v201105071233:compile
[INFO] |  +- org.eclipse.jetty:jetty-security:jar:8.1.14.v20131031:compile
[INFO] |  +- org.eclipse.jetty:jetty-util:jar:8.1.14.v20131031:compile
[INFO] |  +- org.eclipse.jetty:jetty-server:jar:8.1.14.v20131031:compile
[INFO] |  |  +- org.eclipse.jetty.orbit:javax.servlet:jar:3.0.0.v201112011016:compile
[INFO] |  |  +- org.eclipse.jetty:jetty-continuation:jar:8.1.14.v20131031:compile
[INFO] |  |  \- org.eclipse.jetty:jetty-http:jar:8.1.14.v20131031:compile
[INFO] |  |     \- org.eclipse.jetty:jetty-io:jar:8.1.14.v20131031:compile
[INFO] |  +- com.google.guava:guava:jar:16.0:compile
d
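Whether Guava stays in sync with Spark can be checked mechanically by saving each module's 'mvn dependency:tree' output (as above) and comparing the versions reported for com.google.guava:guava. The following is a minimal Java sketch under that assumption; the class and method names ('DependencyVersions', 'versionsOf', 'allSame') are hypothetical, and it only reads the group:artifact:type[:classifier]:version:scope coordinate at the end of each line. It does not reproduce Maven's own conflict mediation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical helper: finds the versions at which one artifact appears in
// saved `mvn dependency:tree` output. It only parses coordinates; it does
// not model Maven's dependency mediation.
final class DependencyVersions {

    // Returns every version listed for groupId:artifactId, in line order.
    static List<String> versionsOf(List<String> treeLines, String groupId, String artifactId) {
        List<String> versions = new ArrayList<>();
        for (String line : treeLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // The coordinate is the last whitespace-separated token on the line.
            String[] tokens = trimmed.split("\\s+");
            String[] parts = tokens[tokens.length - 1].split(":");
            // group:artifact:type:version:scope or
            // group:artifact:type:classifier:version:scope; the root line
            // (no scope) and plugin header lines are skipped.
            if (parts.length != 5 && parts.length != 6) {
                continue;
            }
            if (parts[0].equals(groupId) && parts[1].equals(artifactId)) {
                versions.add(parts[parts.length - 2]);
            }
        }
        return versions;
    }

    // True when all reported versions agree (also true for an empty list).
    static boolean allSame(List<String> versions) {
        return new HashSet<>(versions).size() <= 1;
    }

    public static void main(String[] args) {
        // Guava lines as they appear in the two trees quoted in this thread.
        List<String> lines = Arrays.asList(
            "[INFO] +- org.apache.spark:spark-core_2.10:jar:1.1.0:compile",
            "[INFO] |  +- com.google.guava:guava:jar:16.0:compile",
            "[INFO] +- org.apache.mahout:mahout-math:jar:1.0-SNAPSHOT:compile",
            "[INFO] |  +- com.google.guava:guava:jar:16.0:compile");
        List<String> guava = versionsOf(lines, "com.google.guava", "guava");
        System.out.println("guava versions: " + guava + ", in sync: " + allSame(guava));
    }
}
```

With the two Guava lines from this thread (16.0 under spark-core and 16.0 under mahout-math), the check reports both versions as in sync.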

On Fri, Jan 30, 2015 at 4:03 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> Looks like it is also requested by mahout-math; I wonder what is using it
> there.
>
> At the very least, it needs to be synchronized with the one currently used
> by Spark.
>
> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ mahout-hadoop ---
> [INFO] org.apache.mahout:mahout-hadoop:jar:1.0-SNAPSHOT
> *[INFO] +- org.apache.mahout:mahout-math:jar:1.0-SNAPSHOT:compile*
> [INFO] |  +- org.apache.commons:commons-math3:jar:3.2:compile
> *[INFO] |  +- com.google.guava:guava:jar:16.0:compile*
> [INFO] |  \- com.tdunning:t-digest:jar:2.0.2:compile
> [INFO] +- org.apache.mahout:mahout-math:test-jar:tests:1.0-SNAPSHOT:test
> [INFO] +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile
> [INFO] |  +- org.apache.hadoop:hadoop-common:jar:2.2.0:compile
>
>
> On Fri, Jan 30, 2015 at 7:52 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
>> Looks like Guava is in Spark.
>>
>> On Jan 29, 2015, at 4:03 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>
>> IndexedDataset uses Guava. Can’t tell for sure, but it sounds like this
>> would not be included, since I think it was taken from the mrlegacy jar.
>>
>> On Jan 25, 2015, at 10:52 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>
>> ---------- Forwarded message ----------
>> From: "Pat Ferrel" <p...@occamsmachete.com>
>> Date: Jan 25, 2015 9:39 AM
>> Subject: Re: Codebase refactoring proposal
>> To: <dev@mahout.apache.org>
>> Cc:
>>
>> > When you get a chance, a PR would be good.
>>
>> Yes, it would. And not just for that.
>>
>> > As I understand it, you are putting some class jars somewhere in the
>> > classpath. Where? How?
>> >
>>
>> /bin/mahout
>>
>> (It computes two different classpaths. See 'bin/mahout classpath' vs.
>> 'bin/mahout -spark'.)
>>
>> If I interpret the current shell code there correctly, the legacy path
>> tries to use the examples assemblies if not packaged, or /lib if packaged.
>> The true motivation for that significantly predates 2010, and I suspect
>> only Benson knows the whole true intent there.
>>
>> The Spark path, which is really a quick hack of the script, tries to get
>> only selected Mahout jars plus the locally installed Spark classpath,
>> which I guess is just the shaded Spark jar in recent Spark releases. It
>> also apparently tries to include /libs/*, which is never compiled in the
>> unpackaged version. Now I think including it is a bug, because /libs/* is
>> apparently legacy packaging and shouldn't be used in Spark jobs with a
>> wildcard. I can't believe how lazy I am; I still have not found time to
>> understand the Mahout build in all cases.
>>
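Putting only named Mahout jars on the Spark classpath, instead of a /libs/* wildcard, can be sketched as a filter over jar file names. This is a hypothetical Java illustration, not what bin/mahout actually does: 'SparkClasspathSketch', its methods, and the example file names are assumptions, and the artifact list is the one given in the original proposal in this thread (mahout-math, mahout-math-scala, mahout-hadoop, mahout-spark, optionally mahout-spark-shell). Spark's own jars would still come from SPARK_HOME.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: pick explicitly named Mahout jars from a directory
// listing instead of adding every jar via a wildcard.
final class SparkClasspathSketch {

    // Artifact ids named in the thread; mahout-spark-shell is only needed for the shell.
    static final List<String> MAHOUT_ARTIFACTS = Arrays.asList(
        "mahout-math", "mahout-math-scala", "mahout-hadoop",
        "mahout-spark", "mahout-spark-shell");

    // True if fileName is a main jar of artifactId, e.g.
    // "mahout-spark_2.10-1.0-SNAPSHOT.jar" for "mahout-spark", but not
    // "mahout-spark-shell_2.10-1.0-SNAPSHOT.jar".
    static boolean isJarFor(String fileName, String artifactId) {
        if (!fileName.endsWith(".jar") || !fileName.startsWith(artifactId)) {
            return false;
        }
        if (fileName.endsWith("-tests.jar") || fileName.endsWith("-sources.jar")
                || fileName.endsWith("-javadoc.jar")) {
            return false;
        }
        String rest = fileName.substring(artifactId.length());
        // Next must be "_" (Scala binary version) or "-" plus a digit (version).
        return rest.startsWith("_")
            || (rest.length() > 1 && rest.charAt(0) == '-' && Character.isDigit(rest.charAt(1)));
    }

    // Keeps directory entries that belong to one of the wanted artifacts, in listing order.
    static List<String> select(List<String> fileNames, List<String> artifactIds) {
        List<String> selected = new ArrayList<>();
        for (String name : fileNames) {
            for (String id : artifactIds) {
                if (isJarFor(name, id)) {
                    selected.add(name);
                    break;
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical directory listing.
        List<String> libDir = Arrays.asList(
            "mahout-math-1.0-SNAPSHOT.jar",
            "mahout-math-scala_2.10-1.0-SNAPSHOT.jar",
            "mahout-hadoop-1.0-SNAPSHOT.jar",
            "mahout-spark_2.10-1.0-SNAPSHOT.jar",
            "mahout-mrlegacy-1.0-SNAPSHOT.jar",
            "guava-16.0.jar");
        List<String> jars = select(libDir, MAHOUT_ARTIFACTS);
        System.out.println(String.join(File.pathSeparator, jars));
    }
}
```

Matching on the artifact id followed by '_' or '-<digit>' keeps 'mahout-spark' from also matching 'mahout-spark-shell', which a plain prefix match would get wrong.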
>> I am not even sure packaged Mahout will work with Spark, honestly,
>> because of the /lib. I never tried that, since I mostly use application
>> embedding techniques.
>>
>> The same solution may apply to adding external dependencies and removing
>> the assembly in the Spark module, which would leave only one major build
>> issue AFAIK.
>> >
>> > On Jan 24, 2015, at 11:53 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>> >
>> > No, no PR. Only an experiment in private. But I believe I have
>> > sufficiently defined what I want to do, in order to gauge whether we may
>> > want to advance it some time later. The goal is a much lighter dependency
>> > footprint for the Spark code: eliminate everything that is not a
>> > compile-time dependency (a lot of it comes through legacy MR code, which
>> > of course we don't use).
>> >
>> > Can't say I understand the remaining issues you are talking about, though.
>> >
>> > If you are talking about compiling lib or the shaded assembly, no, this
>> > doesn't do anything about it. Although the point is, as it stands, the
>> > algebra and shell don't have any external dependencies but Spark and
>> > these 4 (5?) Mahout jars, so they technically don't even need an assembly
>> > (as demonstrated).
>> >
>> > As I said, it seems the driver code is the only part that may need some
>> > external dependencies, but that's a different scenario from the ones I am
>> > talking about. But I am relatively happy with having the first two
>> > working nicely at this point.
>> >
>> > On Sat, Jan 24, 2015 at 9:06 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>> >
>> >> +1
>> >>
>> >> Is there a PR? You mention a “tiny mahout-hadoop” module. It would be
>> >> nice to see how you’ve structured that in case we can use the same
>> >> model to solve the two remaining refactoring issues:
>> >> 1) external dependencies in the spark module
>> >> 2) no spark or h2o in the release artifacts.
>> >>
>> >> On Jan 23, 2015, at 6:45 PM, Shannon Quinn <squ...@gatech.edu> wrote:
>> >>
>> >> Also +1
>> >>
>> >>> On Jan 23, 2015, at 18:38, Andrew Palumbo <ap....@outlook.com> wrote:
>> >>>
>> >>> +1
>> >>>
>> >>>
>> >>> -------- Original message --------
>> >>> From: Dmitriy Lyubimov <dlie...@gmail.com>
>> >>> Date: 01/23/2015 6:06 PM (GMT-05:00)
>> >>> To: dev@mahout.apache.org
>> >>> Subject: Codebase refactoring proposal
>> >>>
>> >>> So right now mahout-spark depends on mr-legacy.
>> >>> I did a quick refactoring, and it turns out it only _irrevocably_
>> >>> depends on the following classes there:
>> >>>
>> >>> MatrixWritable, VectorWritable, Varint/Varlong and VarintWritable, and
>> >>> ... *sigh* o.a.m.common.Pair
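Varint/Varlong are small but easy to get subtly wrong. Assuming they follow the variable-length scheme popularized by Protocol Buffers (7 payload bits per byte, plus zig-zag encoding so that small negative numbers stay short), the Java sketch below illustrates that scheme. 'VarLongSketch' and its method names are hypothetical; this is not Mahout's source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of protobuf-style varints with zig-zag encoding for
// signed longs. Not Mahout's Varint class, just the same general scheme.
final class VarLongSketch {

    // Unsigned varint: 7 payload bits per byte; the high bit is set on every
    // byte except the last one.
    static void writeUnsignedVarLong(long value, DataOutput out) throws IOException {
        while ((value & ~0x7FL) != 0L) {
            out.writeByte(((int) value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.writeByte((int) value & 0x7F);
    }

    // Zig-zag: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... so small magnitudes stay short.
    static void writeSignedVarLong(long value, DataOutput out) throws IOException {
        writeUnsignedVarLong((value << 1) ^ (value >> 63), out);
    }

    static long readUnsignedVarLong(DataInput in) throws IOException {
        long value = 0L;
        int shift = 0;
        int b;
        while (((b = in.readByte()) & 0x80) != 0) {
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
            if (shift > 63) {
                throw new IOException("malformed varlong: more than 10 bytes");
            }
        }
        return value | ((long) b << shift);
    }

    static long readSignedVarLong(DataInput in) throws IOException {
        long raw = readUnsignedVarLong(in);
        // Undo zig-zag: even values are non-negative, odd values negative.
        return (raw >>> 1) ^ -(raw & 1L);
    }

    static byte[] encodeSigned(long value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeSignedVarLong(value, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static long decodeSigned(byte[] encoded) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            return readSignedVarLong(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (long v : new long[] {0L, -1L, 1L, 300L, Long.MIN_VALUE}) {
            byte[] enc = encodeSigned(v);
            System.out.println(v + " -> " + enc.length + " byte(s), decodes to " + decodeSigned(enc));
        }
    }
}
```

With zig-zag, -1 and 1 each encode to a single byte, while Long.MIN_VALUE needs the maximum of 10 bytes.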
>> >>>
>> >>> So I just dropped those five classes into a new tiny mahout-hadoop
>> >>> module (to signify stuff that is directly relevant to serializing
>> >>> things to the DFS API) and completely removed mrlegacy and its
>> >>> transitive dependencies from the spark and spark-shell dependencies.
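"Serializing things to the DFS API" here refers to the Hadoop Writable contract: a type writes itself to a DataOutput and reads itself back from a DataInput. The Java sketch below mimics that contract with a hypothetical 'DenseVectorRecord' that uses only java.io, so it runs without Hadoop. Its wire format (an int length followed by raw doubles) is an assumption for illustration and is not VectorWritable's actual format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical record following the shape of Hadoop's Writable contract
// (write(DataOutput) / readFields(DataInput)) using only java.io.
// Wire format (int length, then raw doubles) is illustrative, NOT VectorWritable's.
final class DenseVectorRecord {
    private double[] values = new double[0];

    // No-arg constructor: Hadoop creates Writables reflectively, then calls readFields.
    DenseVectorRecord() {
    }

    DenseVectorRecord(double[] values) {
        this.values = values.clone();
    }

    double[] values() {
        return values.clone();
    }

    void write(DataOutput out) throws IOException {
        out.writeInt(values.length);
        for (double v : values) {
            out.writeDouble(v);
        }
    }

    // Overwrites all state, since a reader may reuse one instance for many records.
    void readFields(DataInput in) throws IOException {
        int n = in.readInt();
        if (n < 0) {
            throw new IOException("negative vector length: " + n);
        }
        double[] read = new double[n];
        for (int i = 0; i < n; i++) {
            read[i] = in.readDouble();
        }
        values = read;
    }

    static byte[] toBytes(DenseVectorRecord record) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            record.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static DenseVectorRecord fromBytes(byte[] bytes) {
        DenseVectorRecord record = new DenseVectorRecord();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            record.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }

    public static void main(String[] args) {
        DenseVectorRecord original = new DenseVectorRecord(new double[] {1.0, -2.5, 3.0});
        byte[] bytes = toBytes(original);
        DenseVectorRecord copy = fromBytes(bytes);
        System.out.println(bytes.length + " bytes, round trip equal: "
            + Arrays.equals(original.values(), copy.values()));
    }
}
```

Keeping a no-arg constructor and overwriting all state in readFields matters because Hadoop instantiates Writables reflectively and reuses instances while reading records.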
>> >>>
>> >>> So non-CLI applications (shell scripts and embedded API use) actually
>> >>> only need Spark dependencies (which come from the SPARK_HOME classpath,
>> >>> of course) and Mahout jars (mahout-spark, mahout-math(-scala),
>> >>> mahout-hadoop and, optionally, mahout-spark-shell for running the
>> >>> shell).
>> >>>
>> >>> This of course still doesn't address the problems of drivers that want
>> >>> to throw more stuff into the front-end classpath (such as the CLI
>> >>> parser), but at least it renders the transitive luggage of mr-legacy
>> >>> (and the size of worker-shipped jars) much more tolerable.
>> >>>
>> >>> How does that sound?
>> >>
>> >>
>> >
>>
>>
>>
>