Re: Mahout distro Size

2016-09-06 Thread Andrew Palumbo
Actually, I think I remember Dr. Cos saying, around the time we started working
with Bigtop, that he went through their POMs with a fine-toothed comb and
helped them get everything in order. Maybe we could ask him to help us out.


From: Dmitriy Lyubimov 
Sent: Tuesday, September 6, 2016 8:24:29 PM
To: dev@mahout.apache.org
Subject: Re: Mahout distro Size

I dunno. They build a shaded assembly artifact, it seems, and are happy with
this approach. It would seem we'd just need to handle the legacy deps in a
similar way.



Re: Mahout distro Size

2016-09-06 Thread Dmitriy Lyubimov
I dunno. They build a shaded assembly artifact, it seems, and are happy with
this approach. It would seem we'd just need to handle the legacy deps in a
similar way.
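For readers unfamiliar with the term: an assembly artifact merges a project's classes and its dependencies into a single jar. Below is a minimal Python sketch of just that merge step, assuming plain jars on disk. `build_assembly` is a hypothetical helper, not Mahout's, Spark's, or Maven's code. Real shading (for example with the Maven Shade plugin) also relocates packages and merges service files, and this sketch does neither.

```python
import zipfile
from pathlib import Path

# Signature files copied from signed dependency jars would no longer match
# the merged contents, so they are left out of the assembly.
SIGNATURE_SUFFIXES = (".SF", ".DSA", ".RSA", ".EC")


def build_assembly(jars, out_path):
    """Merge the entries of several jars into one assembly ("fat") jar.

    Hypothetical helper: the first jar that provides an entry wins and later
    copies are skipped. No package relocation is done, so this is plain
    assembly, not shading.
    Returns a list of (entry_name, jar_name) pairs skipped as duplicates.
    """
    seen = set()
    skipped = []
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as out:
        for jar in jars:
            with zipfile.ZipFile(jar) as src:
                for name in src.namelist():
                    if name.endswith("/"):
                        continue  # directory entries are implied by file paths
                    upper = name.upper()
                    if upper.startswith("META-INF/") and upper.endswith(SIGNATURE_SUFFIXES):
                        continue  # drop stale jar signatures
                    if name in seen:
                        skipped.append((name, Path(jar).name))
                        continue
                    seen.add(name)
                    out.writestr(name, src.read(name))
    return skipped
```

A non-empty return value means two inputs ship the same entry path. That is the kind of conflict a real shading setup has to resolve, either with relocation or with explicit filters.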

On Tue, Sep 6, 2016 at 4:48 PM, Andrew Palumbo  wrote:

> > 4: other projects do something too. Spark (at least it used to) produces
> > tons of lib-managed deps as the result of its build; they probably still
> > do?
>
> Do you mean using something like Spark's dependency resolver?


Re: Mahout distro Size

2016-09-06 Thread Andrew Palumbo
> 4: other projects do something too. Spark (at least it used to) produces
> tons of lib-managed deps as the result of its build; they probably still
> do?


Do you mean using something like Spark's dependency resolver?


From: Dmitriy Lyubimov 
Sent: Tuesday, September 6, 2016 4:46:24 PM
To: dev@mahout.apache.org
Subject: Re: Mahout distro Size

2: +1
3: +1

4: other projects do something too. Spark (at least it used to) produces
tons of lib-managed deps as the result of its build; they probably still
do?

On the other hand, the Samsara-only dependencies are really light. Backends
are really always "provided", and the rest of it is small enough not
to be an issue either way. But we probably definitely should drop local
support for MR stuff (MR local mode didn't work correctly anyway, last time
I checked).



Re: Mahout distro Size

2016-09-06 Thread Andrew Palumbo
OK, sounds good. I agree with all of them as well.



I think that I have a PR that I could resurrect for #4. I'd tried it just
before a release, and then pulled it at the last minute. I think that it was
relatively simple: https://github.com/apache/mahout/pull/129. I only pulled
it because I did not have time to test it well enough. With some minor updates,
this should take care of it.


MAHOUT-1706: remove dependency jars from /lib in the binary distribution by
andrewpalumbo · Pull Request #129 · apache/mahout <https://github.com/apache/mahout/pull/129>
The mahout distribution currently is shipping ~56 MB of dependency jars in the
/lib directory of the distribution. These are only added to the classpath by
/bin/mahout in the binary distribution. ...
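Figures like the ~56 MB above, or the 16M/85M `du` numbers in the original post, can be re-measured after each change. Below is a minimal Python sketch that sums the sizes of the `.jar` files under a directory. `jar_sizes` is a hypothetical helper. `du` reports allocated disk blocks, so its totals can differ slightly from summed file sizes.

```python
from pathlib import Path


def jar_sizes(lib_dir):
    """Sizes in bytes of every .jar under lib_dir, keyed by relative path.

    Returns (sizes, total_bytes). Non-jar files are ignored.
    """
    root = Path(lib_dir)
    sizes = {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in sorted(root.rglob("*.jar"))
        if p.is_file()
    }
    return sizes, sum(sizes.values())
```

For example, `jar_sizes("lib/hadoop")[1] / 2**20` gives the MiB total for that subdirectory, assuming it is run from the unpacked distribution root.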




+1 to #5, which is covered by MAHOUT-1705 and needs to be reopened. This
will take a bit of work and, I'm sure, a good amount of testing.



As far as MAHOUT_LOCAL goes, it is already in the process of being
phased out. It has been removed from all of the examples.

Here's my +1 to dropping it altogether.


From: Suneel Marthi 
Sent: Tuesday, September 6, 2016 4:55:10 PM
To: mahout
Subject: Re: Mahout distro Size

+1 to all of them. 2 and 3 are very trivial to do.  Definitely consider
doing #5.




Re: Mahout distro Size

2016-09-06 Thread Andrew Palumbo
I'm uncertainly sure that each is understood.


From: Suneel Marthi 
Sent: Tuesday, September 6, 2016 4:54:09 PM
To: mahout
Subject: Re: Mahout distro Size

On Tue, Sep 6, 2016 at 4:47 PM, Dmitriy Lyubimov  wrote:

> PS: I probably should not put "probably" and "definitely" next to each other.
> Definitely, just definitely :)
>

That's fine.

"Openly Closed" is now officially part of the Apache Lexicon, so why not add
"Definitely Probable".




Re: Mahout distro Size

2016-09-06 Thread Suneel Marthi
+1 to all of them. 2 and 3 are very trivial to do.  Definitely consider
doing #5.


On Tue, Sep 6, 2016 at 4:33 PM, Andrew Palumbo  wrote:

> The current apache-mahout-distribution-0.12.2.tar.gz
> <http://mirror.stjschools.org/public/apache/mahout/0.12.2/apache-mahout-distribution-0.12.2.tar.gz>
> is 224M. We need to look for ways to get this size down.
>
>   1.  A few possibilities:
>
>   2.  Drop h2o (binary only) from the distro? (18M - unused)
>
>   3.  MAHOUT-1865 <https://issues.apache.org/jira/browse/MAHOUT-1865>:
> Remove Hadoop 1 support. This could save us some space.
>
>   4.  MAHOUT-1706 <https://issues.apache.org/jira/browse/MAHOUT-1706>:
> Remove dependency jars from /lib in the mahout binary distribution. This
> should also save space.
>
>   5.  Having dropped support for MAHOUT_LOCAL, we can now likely set a lot
> of dependencies to "provided" scope. We can revisit MAHOUT-1705
> <https://issues.apache.org/jira/browse/MAHOUT-1705>: Verify dependencies
> in job jar for mahout-examples.
>
>  *   16M  ./lib/hadoop
>
>  *   85M  ./lib/
>
> *   Many of the jars in /lib/, and possibly /lib/hadoop, are already
> packaged into the mahout-examples jar, so adding them to the classpath from
> /lib/ is redundant. As well, many may be "provided".
>


Re: Mahout distro Size

2016-09-06 Thread Suneel Marthi
On Tue, Sep 6, 2016 at 4:47 PM, Dmitriy Lyubimov  wrote:

> PS: I probably should not put "probably" and "definitely" next to each other.
> Definitely, just definitely :)
>

That's fine.

"Openly Closed" is now officially part of the Apache Lexicon, so why not add
"Definitely Probable".




Re: Mahout distro Size

2016-09-06 Thread Dmitriy Lyubimov
PS: I probably should not put "probably" and "definitely" next to each other.
Definitely, just definitely :)

On Tue, Sep 6, 2016 at 1:46 PM, Dmitriy Lyubimov  wrote:

> 2: +1
> 3: +1
>
> 4: other projects do something too. Spark (at least it used to) produces
> tons of lib-managed deps as the result of its build; they probably still
> do?
>
> On the other hand, the Samsara-only dependencies are really light.
> Backends are really always "provided", and the rest of it is small
> enough not to be an issue either way. But we probably definitely should
> drop local support for MR stuff (MR local mode didn't work correctly
> anyway, last time I checked).
>


Re: Mahout distro Size

2016-09-06 Thread Dmitriy Lyubimov
2: +1
3: +1

4: other projects do something too. Spark (at least it used to) produces
tons of lib-managed deps as the result of its build; they probably still
do?

On the other hand, the Samsara-only dependencies are really light. Backends
are really always "provided", and the rest of it is small enough not
to be an issue either way. But we probably definitely should drop local
support for MR stuff (MR local mode didn't work correctly anyway, last time
I checked).
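Marking backend dependencies "provided" shrinks the distro because Maven keeps provided, system, and test dependencies off the runtime classpath. Below is a rough Python model of that rule. It assumes the binary distribution collects its jars through a runtime-scoped dependency set, which as far as I know is the Maven Assembly plugin's default. It ignores transitive resolution. `Dep`, `packaged`, and the artifact names are hypothetical.

```python
from dataclasses import dataclass

# Scopes on Maven's runtime classpath. "provided" and "system" dependencies
# are expected to be supplied by the environment (e.g. the Spark or Hadoop
# installation), and "test" ones are only used when running tests.
RUNTIME_SCOPES = frozenset({"compile", "runtime"})


@dataclass(frozen=True)
class Dep:
    artifact: str            # hypothetical "groupId:artifactId" label
    scope: str = "compile"   # Maven's default scope
    size_mb: float = 0.0     # jar size, for estimating the distro footprint


def packaged(deps):
    """Dependencies a runtime-scoped dependency set would copy into the distro."""
    return [d for d in deps if d.scope in RUNTIME_SCOPES]


def packaged_size_mb(deps):
    """Total size of the jars that would actually ship."""
    return sum(d.size_mb for d in packaged(deps))
```

Moving a dependency's `scope` from "compile" to "provided" drops its `size_mb` from `packaged_size_mb`. That is the effect item 5 is after.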

On Tue, Sep 6, 2016 at 1:33 PM, Andrew Palumbo  wrote:

> The current apache-mahout-distribution-0.12.2.tar.gz
> <http://mirror.stjschools.org/public/apache/mahout/0.12.2/apache-mahout-distribution-0.12.2.tar.gz>
> is 224M. We need to look for ways to get this size down.
>
>   1.  A few possibilities:
>
>   2.  Drop h2o (binary only) from the distro? (18M - unused)
>
>   3.  MAHOUT-1865 <https://issues.apache.org/jira/browse/MAHOUT-1865>:
> Remove Hadoop 1 support. This could save us some space.
>
>   4.  MAHOUT-1706 <https://issues.apache.org/jira/browse/MAHOUT-1706>:
> Remove dependency jars from /lib in the mahout binary distribution. This
> should also save space.
>
>   5.  Having dropped support for MAHOUT_LOCAL, we can now likely set a lot
> of dependencies to "provided" scope. We can revisit MAHOUT-1705
> <https://issues.apache.org/jira/browse/MAHOUT-1705>: Verify dependencies
> in job jar for mahout-examples.
>
>  *   16M  ./lib/hadoop
>
>  *   85M  ./lib/
>
> *   Many of the jars in /lib/, and possibly /lib/hadoop, are already
> packaged into the mahout-examples jar, so adding them to the classpath from
> /lib/ is redundant. As well, many may be "provided".
>


Mahout distro Size

2016-09-06 Thread Andrew Palumbo
The current apache-mahout-distribution-0.12.2.tar.gz
<http://mirror.stjschools.org/public/apache/mahout/0.12.2/apache-mahout-distribution-0.12.2.tar.gz>
is 224M. We need to look for ways to get this size down.

  1.  A few possibilities:

  2.  Drop h2o (binary only) from the distro? (18M - unused)

  3.  MAHOUT-1865 <https://issues.apache.org/jira/browse/MAHOUT-1865>: Remove
Hadoop 1 support. This could save us some space.

  4.  MAHOUT-1706 <https://issues.apache.org/jira/browse/MAHOUT-1706>: Remove
dependency jars from /lib in the mahout binary distribution. This should also save space.

  5.  Having dropped support for MAHOUT_LOCAL, we can now likely set a lot of
dependencies to "provided" scope. We can revisit
MAHOUT-1705 <https://issues.apache.org/jira/browse/MAHOUT-1705>: Verify
dependencies in job jar for mahout-examples.

 *   16M  ./lib/hadoop

 *   85M  ./lib/

*   Many of the jars in /lib/, and possibly /lib/hadoop, are already
packaged into the mahout-examples jar, so adding them to the classpath from
/lib/ is redundant. As well, many may be "provided".
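The redundancy claim in the last bullet can be checked mechanically. Below is a minimal Python sketch that lists the jars under `lib/` whose `.class` entries are all already present in the examples job jar. It assumes the job jar holds dependency classes unpacked at their normal paths. How Mahout's job jar is actually laid out is an assumption here, and if it nests dependency jars instead, this check would need to open those too. `class_entries`, `redundant_jars`, and the job jar file name are hypothetical.

```python
import zipfile
from pathlib import Path


def class_entries(jar_path):
    """Names of the .class files inside a jar (a jar is a zip archive)."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def redundant_jars(lib_dir, job_jar):
    """Relative paths of jars under lib_dir whose classes are all in job_jar.

    Assumes job_jar holds dependency classes unpacked at their normal paths;
    jars nested inside job_jar are not inspected. Jars without any .class
    entries are never reported.
    """
    root = Path(lib_dir)
    job = Path(job_jar).resolve()
    bundled = class_entries(job)
    redundant = []
    for jar in sorted(root.rglob("*.jar")):
        if jar.resolve() == job:
            continue  # don't compare the job jar with itself
        classes = class_entries(jar)
        if classes and classes <= bundled:
            redundant.append(jar.relative_to(root).as_posix())
    return redundant
```

Usage would look like `redundant_jars("lib", "mahout-examples-0.12.2-job.jar")`, with the job jar name adjusted to whatever the distribution actually ships.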