My take is that legacy is just a module (aka Maven artifact), just like it
is now. We just need to re-route (cut) dependencies on it.

On Fri, Mar 6, 2015 at 2:56 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> The simplest way to split the project is into engines—hadoop and spark. What 
> is happening with H2O? Is it being used? Flink isn’t anything like ready for 
> a release.
>
> Again the simplest would be two packaged builds, one for legacy stuff, which 
> would not require Scala or Spark at all.
>
> The other would be a maven based Scala + Spark + java math module. So this 
> would be mostly Scala with only the math module overlap. It requires the 
> refactoring work that Dmitriy has done, which would make it stand-alone. An 
> sbt build is clearly optional here but would be in keeping with our all-in 
> Scala approach. Personally I like sbt a lot better than maven but it is less 
> mature.
>
> The benefit would be:
> 1) potentially separate release schedules, hadoop not so often and eventually 
> not at all, spark every few days if you follow their schedule (not suggesting 
> this)
> 2) much faster build times for either branch—as anyone knows, building with 
> tests is starting to take a long time.
> 3) possible use of new tool chain like sbt in scala branch
> 4) much simpler launcher script—mahout’s is getting a mess and doesn’t run at 
> all on Windows. Requiring it to support both engines is not making things 
> easy and much work goes into getting around old ideas like the classpath and 
> job.jars. Creating one for each engine would seem to reduce complexity.
> 5) easier to support. If we really are going to have 4 engines the current 
> build and launch mechanisms along with release schedules can’t really be 
> maintained and even 2 is ugly.
>
> On Mar 6, 2015, at 11:52 AM, Suneel Marthi <suneel.mar...@gmail.com> wrote:
>
> On Fri, Mar 6, 2015 at 1:41 PM, Andrew Palumbo <ap....@outlook.com> wrote:
>
>>
>> On 03/06/2015 12:44 PM, Pat Ferrel wrote:
>>
>>> This is great.
>>>
>>> So we’ve talked about a name change and shortly we’ll be forced to come
>>> up with something that describes what Mahout has become. Most past users
>>> think of it as a scalable ML library on Hadoop. That may describe
>>> Mahout-Legacy but it seems like we need a name for the Scala
>>> DSL/Spark/other? part of the project. Lots of projects have sub-projects so
>>> we know there is no issue with naming sub-projects. So my question to
>>> everyone is:
>>>
>>> Should (or can) the Top Level Project be renamed? If so to what?
>>>
>> I don't like the idea of a top level name change.  I think that it would
>> be a much better idea to direct our resources at polishing and developing
>> what we have now.  As well, especially for this release, I think that it
>> would do a disservice to the "legacy" components (which as you point out
>> have not been deprecated) with ~45 completed bugfixes and several more in
>> the pipe.
>>
>> I don't like the idea of renaming Mahout either and agree with AP.
>
>>
>>> If we don’t rename the TLP then what should we call legacy (not very
>>> appealing) and scala/DSL (not a name really)
>>>
>> agreed.  Legacy is not the most appealing name.  Maybe something like
>> Mahout-MapReduce?  Though that could cause some confusion regarding the "no
>> new MapReduce code"
>>
>>> My opinion:
>>> Since we are deemphasizing legacy I’m not sure there is a need to call
>>> attention to it by giving it a subproject name. However it is not
>>> deprecated so we need to include it in releases and even fix the minimum of
>>> critical bugs for some time to come.
>>>
>> agreed regarding fixing critical legacy bugs.  Looking through the issues
>> last night there didn't seem to be a lot of critical bugs, and probably a
>> good amount of issues can be closed out as won't fix/not an issue.
>>
>
> +1
>
>
>>
>>> Mahout is getting beat up in the circles of those who talk about such
>>> things and much of this is because people don’t understand what it has
>>> become. Therefore I’d like to see a project rename to reset expectations.
>>> Leave the name Mahout for legacy stuff and give a new name to the Scala
>>> environment. Split the builds and create new docs for the Scala stuff. This
>>> would seem to make it easier to document since legacy is most of what the
>>> CMS documents; we could create a whole new template for the new project name.
>>>
>> What is the upside to splitting the builds? I'm not against it- I'm just
>> not sure I understand.
>>
>>>
>>> Failing this, many of the same benefits could be gained by creating
>>> legacy and scala sub-projects with better names. This I know we can do and
>>> recall that things like MLlib are generally not tied to Spark when speaking
>>> about them. So a subproject could have very much its own identity.
>>>
>>> Looking at the long history of Mahout it seems like the current
>>> generality was hard gained through implementing many special purpose
>>> algorithms, some of which were grad student projects. This is where MLlib
>>> is today in some ways. So a general framework and environment makes a lot
>>> of sense as the evolution of Mahout. Let’s give it a name, something better
>>> than DSL.
>>>
>> I think that a pretty clear description of what the other side of the
>> project is has been emerging recently.  IMO we need to start getting it out
>> there.  Probably a good start would be to update the front page of the
>> mahout site.
>
>
>  +1
>
>> I don't have any good ideas regarding names for this side of the project.
>>
>>
>>
>>> On Mar 5, 2015, at 7:43 PM, Andrew Musselman <andrew.mussel...@gmail.com>
>>> wrote:
>>>
>>> Thanks AP
>>>
>>> On Thursday, March 5, 2015, Andrew Palumbo <ap....@outlook.com> wrote:
>>>
>>>> I went through all of the unresolved JIRA issues and marked each with at
>>>> least a "legacy" or "scala" label ("scala" for lack of a better name for
>>>> everything that is not legacy). Hopefully I got them all.
>>>>
>>>> Some are labelled with both (math, build, documentation related to both or
>>>> neither, etc.)
>>>>
>>>> "scala" issues:
>>>>
>>>> https://issues.apache.org/jira/browse/MAHOUT-1522?jql=project%20%3D%20MAHOUT%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20scala%20ORDER%20BY%20priority%20DESC
>>>>
>>>> legacy issues:
>>>>
>>>> https://issues.apache.org/jira/browse/MAHOUT-1522?jql=project%20%3D%20MAHOUT%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20legacy%20ORDER%20BY%20priority%20DESC
>>>>
>>>> Hopefully this will help us get started closing up some old issues. I'll
>>>> try to make another pass over them tomorrow and find some that need to be
>>>> closed out.
>>>>
>>>>
>>
>
