Re: Removing MAHOUT_LOCAL option

2016-03-21 Thread Dmitriy Lyubimov
The stochastic SVD in DSSVD.scala is identical to the MR version, with the
exception that MR uses a more numerically stable reordered Givens QR, while
DSSVD.scala uses a less numerically stable Cholesky QR.
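
For readers unfamiliar with the term, here is a minimal sketch of what a
Cholesky QR does, in plain Scala on dense arrays. It is not the Mahout
implementation; the object name CholeskyQrSketch and all helper names are
made up for illustration, and the input is assumed to be a tall matrix with
full column rank.

```scala
// Illustrative only: plain-Scala Cholesky QR on dense row-major arrays.
// This is NOT Mahout's DSSVD code; every name here is hypothetical.
object CholeskyQrSketch {
  type Matrix = Array[Array[Double]]

  // Gram matrix G = A^T A (n x n) of a tall m x n matrix A.
  def gram(a: Matrix): Matrix = {
    val n = a.head.length
    Array.tabulate(n, n)((i, j) => a.map(row => row(i) * row(j)).sum)
  }

  // Lower-triangular L with G = L L^T; assumes G is symmetric positive definite.
  def cholesky(g: Matrix): Matrix = {
    val n = g.length
    val l = Array.ofDim[Double](n, n)
    for (i <- 0 until n; j <- 0 to i) {
      val s = (0 until j).map(k => l(i)(k) * l(j)(k)).sum
      l(i)(j) =
        if (i == j) math.sqrt(g(i)(i) - s)
        else (g(i)(j) - s) / l(j)(j)
    }
    l
  }

  // Returns (Q, R) with A = Q R, where R = L^T and Q = A R^{-1}.
  def choleskyQr(a: Matrix): (Matrix, Matrix) = {
    val l = cholesky(gram(a))
    val n = l.length
    // Solve q * L^T = row for each row of A by forward substitution.
    val q = a.map { row =>
      val out = new Array[Double](n)
      for (j <- 0 until n) {
        val s = (0 until j).map(k => out(k) * l(j)(k)).sum
        out(j) = (row(j) - s) / l(j)(j)
      }
      out
    }
    val r = Array.tabulate(n, n)((i, j) => l(j)(i)) // R = L^T, upper triangular
    (q, r)
  }

  // Largest absolute entry of Q^T Q - I; zero for perfectly orthonormal columns.
  def orthogonalityError(q: Matrix): Double = {
    val g = gram(q)
    val n = g.length
    (for (i <- 0 until n; j <- 0 until n)
      yield math.abs(g(i)(j) - (if (i == j) 1.0 else 0.0)))
      .foldLeft(0.0)((m, x) => math.max(m, x))
  }
}
```

Forming A^T A squares the condition number of A, so the orthogonality error
of Q grows quickly for ill-conditioned input. Givens rotations work on A
directly and avoid that squaring, which is the stability difference
mentioned above.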

Aside from that, the DrmLike input parameter is fully compatible with the HDFS
sequence file input of the MR version.

In Samsara the code would be (I am writing from memory, so hopefully I spell
everything right):



val drmX = drmDfsRead(path=)
val (drmU, drmV, s) = dssvd(drmX, k=..., q=..., ...)  // whatever parameters you normally use here

This should do it.
Of course, you'd run into a significant infrastructure migration if you do
not already have H2O or Spark available and running somewhere.

-d

On Mon, Mar 21, 2016 at 12:57 PM, Mihai Dascalu 
wrote:

> We still have legacy code that uses the local Hadoop instance directly in a
> Java desktop application to run a Stochastic SVD. But if the desire is to
> eliminate it, we’ve been inclined for a while to migrate everything to
> Spark.
>
> Sorry, I’m old school and use MR, plus I’m new to Spark :) Is there an
> easy way to migrate your Spark example into the Java source code so that we
> do not disrupt the overall flow?
>
>
> Have a great evening!
> Mihai

Re: Removing MAHOUT_LOCAL option

2016-03-21 Thread Mihai Dascalu
We still have legacy code that uses the local Hadoop instance directly in a
Java desktop application to run a Stochastic SVD. But if the desire is to
eliminate it, we’ve been inclined for a while to migrate everything to Spark.

Sorry, I’m old school and use MR, plus I’m new to Spark :) Is there an easy way 
to migrate your Spark example into the Java source code so that we do not 
disrupt the overall flow?


Have a great evening!
Mihai

> On 21 Mar 2016, at 19:31, Dmitriy Lyubimov  wrote:
> 
> my 1 cents (since it is less than 2) is MAHOUT_LOCAL is part of MR legacy
> packaging. as long as MR is still here (and I would say it needs to be
> still here, unless it falls in complete disrepair and totally out of sync
> with even dated mapreduce apis), MAHOUT_LOCAL needs to stay. As soon as MR
> goes, it goes too.
> 
> maybe we just simply need a separate mahout script for non-legacy things,
> or factor out legacy related shell things into another script (something
> like mahout-mr.sh instead of mahout.sh)

Re: Removing MAHOUT_LOCAL option

2016-03-21 Thread Andrew Musselman
I haven't but if you'd like to try it out and report back I'd love to hear
about it.

The MR jobs are staying in for now; there is no active move to remove them.

On Mon, Mar 21, 2016 at 12:20 AM, David Starina 
wrote:

> Has anyone tried to run the deprecated MapReduce code on Ignite? Is the
> performance improvement good enough to reconsider leaving those algorithms
> in Mahout?
>


Re: Removing MAHOUT_LOCAL option

2016-03-21 Thread Dmitriy Lyubimov
My 1 cent (since it is less than 2) is that MAHOUT_LOCAL is part of the MR
legacy packaging. As long as MR is still here (and I would say it needs to
stay, unless it falls into complete disrepair and totally out of sync
with even dated MapReduce APIs), MAHOUT_LOCAL needs to stay. As soon as MR
goes, it goes too.

Maybe we simply need a separate mahout script for non-legacy things, or to
factor out the legacy-related shell things into another script (something
like mahout-mr.sh instead of mahout.sh).

On Mon, Mar 21, 2016 at 8:45 AM, Suneel Marthi  wrote:

> Some background on this issue:
>
> 1.  Now that we support Spark and H2O as back ends since 0.10.0, with Flink
> coming soon in 0.12.0, the size of our release artifacts has been bloating
> when pushing releases to Apache mirrors. Hence we were looking at pruning
> some of the components that have not been used or have long been marked
> deprecated and are not being worked on.
>
> 2.  Since the Mahout 0.7 release in June 2012, the project has diverged from
> the MiA book even for legacy MapReduce.  Not sure if that's indeed helping
> onboard new users.
>
> 3.  Seems like the consensus so far, based on the user responses, is to
> retain the MAHOUT_LOCAL option. Thanks all for your responses.


Re: Removing MAHOUT_LOCAL option

2016-03-21 Thread Suneel Marthi
Some background on this issue:

1.  Now that we support Spark and H2O as back ends since 0.10.0, with Flink
coming soon in 0.12.0, the size of our release artifacts has been bloating
when pushing releases to Apache mirrors. Hence we were looking at pruning
some of the components that have not been used or have long been marked
deprecated and are not being worked on.

2.  Since the Mahout 0.7 release in June 2012, the project has diverged from
the MiA book even for legacy MapReduce.  Not sure if that's indeed helping
onboard new users.

3.  Seems like the consensus so far, based on the user responses, is to
retain the MAHOUT_LOCAL option. Thanks all for your responses.


On Mon, Mar 21, 2016 at 11:38 AM, scott cote  wrote:

> one more comment - I understand that it only works for the legacy code.
> Kill it when the legacy code is no longer deprecated, but gone ….
>
> Otherwise - you will shut out people who buy the older mahout books (such
> as MIA) which are still good reads, even though the tech is dated.
>
> SCott


Re: Removing MAHOUT_LOCAL option

2016-03-21 Thread scott cote
One more comment - I understand that it only works for the legacy code.  Kill
it when the legacy code is no longer just deprecated, but gone.

Otherwise you will shut out people who buy the older Mahout books (such as
MIA), which are still good reads even though the tech is dated.

SCott

> On Mar 21, 2016, at 2:24 AM, David Starina  wrote:
> 
> Anyhow, I'm +1 for removing MAHOUT_LOCAL, but I believe the deprecated
> MapReduce-based code still makes sense if it is running well on Ignite.



Re: Removing MAHOUT_LOCAL option

2016-03-21 Thread scott cote
I know that I’m not a contributor, but the local option allowed me to get into
the use of Mahout without a lot of upfront cost.  Please don’t lose sight of
acquiring new users of Mahout - the local install is a key part of that process.

SCott
> On Mar 19, 2016, at 11:01 PM, Andrew Musselman  
> wrote:
> 
> We're discussing removing the MAHOUT_LOCAL option in order to trim artifact
> sizes.
> 
> If you think keeping the option to use MAHOUT_LOCAL for testing with the
> single-node mode of Hadoop is important please let us know. It can be handy
> for trying things out but it would be nice to ditch the effort required to
> maintain it.
> 
> See https://issues.apache.org/jira/browse/MAHOUT-1705 for more context.
> 
> Thanks!



Re: Removing MAHOUT_LOCAL option

2016-03-21 Thread David Starina
Anyhow, I'm +1 for removing MAHOUT_LOCAL, but I believe the deprecated
MapReduce-based code still makes sense if it is running well on Ignite.

On Mon, Mar 21, 2016 at 8:20 AM, David Starina 
wrote:

> Has anyone tried to run the deprecated MapReduce code on Ignite? Is the
> performance improvement good enough to reconsider leaving those algorithms
> in Mahout?


Re: Removing MAHOUT_LOCAL option

2016-03-21 Thread David Starina
Has anyone tried to run the deprecated MapReduce code on Ignite? Is the
performance improvement good enough to reconsider leaving those algorithms
in Mahout?

On Mon, Mar 21, 2016 at 12:45 AM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> Yes I agree; will leave the question open a couple days.


Re: Removing MAHOUT_LOCAL option

2016-03-20 Thread Andrew Musselman
Yes I agree; will leave the question open a couple days.

On Sunday, March 20, 2016, Pat Ferrel  wrote:

> Maybe a better user question is: How many people are still using the
> deprecated Hadoop code?
>
> If the number is small +1 for removal.


Re: Removing MAHOUT_LOCAL option

2016-03-20 Thread Pat Ferrel
Maybe a better user question is: How many people are still using the deprecated 
Hadoop code?

If the number is small +1 for removal.

On Mar 20, 2016, at 11:04 AM, Andrew Musselman  
wrote:

To clarify, the MAHOUT_LOCAL option only works for legacy Hadoop
MapReduce-based jobs which officially became deprecated in 0.10.0.




Re: Removing MAHOUT_LOCAL option

2016-03-20 Thread Andrew Musselman
To clarify, the MAHOUT_LOCAL option only works for legacy Hadoop
MapReduce-based jobs which officially became deprecated in 0.10.0.

On Sun, Mar 20, 2016 at 10:25 AM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> Yes as I understand it.


Re: Removing MAHOUT_LOCAL option

2016-03-20 Thread Andrew Musselman
Yes as I understand it.

On Sunday, March 20, 2016, Pat Ferrel  wrote:

> Are we just talking about Hadoop MapReduce? I thought it was ignored when
> using Spark.
>
> On Mar 20, 2016, at 8:20 AM, alok tanna  > wrote:
>
> -1 MAHOUT_LOCAL  is very useful for quick POC .
>
> Thanks,
> Alok Tanna
> Sent from my iPhone
>
> > On Mar 20, 2016, at 5:01 AM, Mihai Dascalu wrote:
> >
> > -1 I still use it for fast deployment and it’s really helpful for small
> > local processing
> >
> > Have a great weekend!
> > Mihai
> >
> >> On 20 Mar 2016, at 06:13, Suneel Marthi wrote:
> >>
> >> +1 to remove this
> >>
> >> Sent from my iPhone
> >>
> >>> On Mar 20, 2016, at 12:01 AM, Andrew Musselman
> >>> <andrew.mussel...@gmail.com> wrote:
> >>>
> >>> We're discussing removing the MAHOUT_LOCAL option in order to trim
> >>> artifact sizes.
> >>>
> >>> If you think keeping the option to use MAHOUT_LOCAL for testing with
> >>> the single-node mode of Hadoop is important please let us know. It
> >>> can be handy for trying things out but it would be nice to ditch the
> >>> effort required to maintain it.
> >>>
> >>> See https://issues.apache.org/jira/browse/MAHOUT-1705 for more
> >>> context.
> >>>
> >>> Thanks!
> >
>
>


Re: Removing MAHOUT_LOCAL option

2016-03-20 Thread Pat Ferrel
Are we just talking about Hadoop MapReduce? I thought it was ignored when using
Spark.

On Mar 20, 2016, at 8:20 AM, alok tanna  wrote:

-1 MAHOUT_LOCAL is very useful for a quick POC.

Thanks,
Alok Tanna
Sent from my iPhone

> On Mar 20, 2016, at 5:01 AM, Mihai Dascalu  wrote:
> 
> -1 I still use it for fast deployment and it’s really helpful for small local 
> processing
> 
> Have a great weekend!
> Mihai
> 
>> On 20 Mar 2016, at 06:13, Suneel Marthi  wrote:
>> 
>> +1 to remove this
>> 
>> Sent from my iPhone
>> 
>>> On Mar 20, 2016, at 12:01 AM, Andrew Musselman  
>>> wrote:
>>> 
>>> We're discussing removing the MAHOUT_LOCAL option in order to trim artifact
>>> sizes.
>>> 
>>> If you think keeping the option to use MAHOUT_LOCAL for testing with the
>>> single-node mode of Hadoop is important please let us know. It can be handy
>>> for trying things out but it would be nice to ditch the effort required to
>>> maintain it.
>>> 
>>> See https://issues.apache.org/jira/browse/MAHOUT-1705 for more context.
>>> 
>>> Thanks!
> 



Re: Removing MAHOUT_LOCAL option

2016-03-20 Thread alok tanna
-1 MAHOUT_LOCAL is very useful for a quick POC.

Thanks,
Alok Tanna
Sent from my iPhone

> On Mar 20, 2016, at 5:01 AM, Mihai Dascalu  wrote:
> 
> -1 I still use it for fast deployment and it’s really helpful for small local 
> processing
> 
> Have a great weekend!
> Mihai
> 
>> On 20 Mar 2016, at 06:13, Suneel Marthi  wrote:
>> 
>> +1 to remove this
>> 
>> Sent from my iPhone
>> 
>>> On Mar 20, 2016, at 12:01 AM, Andrew Musselman  
>>> wrote:
>>> 
>>> We're discussing removing the MAHOUT_LOCAL option in order to trim artifact
>>> sizes.
>>> 
>>> If you think keeping the option to use MAHOUT_LOCAL for testing with the
>>> single-node mode of Hadoop is important please let us know. It can be handy
>>> for trying things out but it would be nice to ditch the effort required to
>>> maintain it.
>>> 
>>> See https://issues.apache.org/jira/browse/MAHOUT-1705 for more context.
>>> 
>>> Thanks!
> 


Re: Removing MAHOUT_LOCAL option

2016-03-20 Thread Mihai Dascalu
-1 I still use it for fast deployment, and it’s really helpful for small local
processing.

Have a great weekend!
Mihai

> On 20 Mar 2016, at 06:13, Suneel Marthi  wrote:
> 
> +1 to remove this
> 
> Sent from my iPhone
> 
>> On Mar 20, 2016, at 12:01 AM, Andrew Musselman  
>> wrote:
>> 
>> We're discussing removing the MAHOUT_LOCAL option in order to trim artifact
>> sizes.
>> 
>> If you think keeping the option to use MAHOUT_LOCAL for testing with the
>> single-node mode of Hadoop is important please let us know. It can be handy
>> for trying things out but it would be nice to ditch the effort required to
>> maintain it.
>> 
>> See https://issues.apache.org/jira/browse/MAHOUT-1705 for more context.
>> 
>> Thanks!



Re: Removing MAHOUT_LOCAL option

2016-03-19 Thread Suneel Marthi
+1 to remove this

Sent from my iPhone

> On Mar 20, 2016, at 12:01 AM, Andrew Musselman  
> wrote:
> 
> We're discussing removing the MAHOUT_LOCAL option in order to trim artifact
> sizes.
> 
> If you think keeping the option to use MAHOUT_LOCAL for testing with the
> single-node mode of Hadoop is important please let us know. It can be handy
> for trying things out but it would be nice to ditch the effort required to
> maintain it.
> 
> See https://issues.apache.org/jira/browse/MAHOUT-1705 for more context.
> 
> Thanks!