Re: Welcome Pat Ferrel as new committer on Mahout

2014-04-24 Thread Martin, Nick
Awesome, Pat, congrats!!! Very well deserved.

On Apr 24, 2014, at 6:20 AM, Sebastian Schelter s...@apache.org wrote:

 Hi,
 
 this is to announce that the Project Management Committee (PMC) for Apache 
 Mahout has asked Pat Ferrel to become committer and we are pleased to 
 announce that he has accepted.
 
 Being a committer enables easier contribution to the project since in 
 addition to posting patches on JIRA it also gives write access to the code 
 repository. That also means that now we have yet another person who can 
 commit patches submitted by others to our repo *wink*
 
 Pat, we look forward to working with you in the future. Welcome! It would be 
 great if you could introduce yourself with a few words.
 
 -s


RE: Documentation, Documentation, Documentation

2014-04-14 Thread Martin, Nick
Drafted a little intro to the item based rec and dropped it in the comments for 
1445. Aimed to include some examples of the variety of things one can do with 
the algo and hopefully enough info that someone hitting the page could get a 
feel for what they can potentially accomplish before diving directly into the 
'guts' of the workflow/config options, etc. 

Happy to take edits, saw there was another submission a bit ahead of mine this 
morning so not sure how that gets resolved. 

Anyways, maybe this can get us closer on cleanup!

-Original Message-
From: Sebastian Schelter [mailto:s...@apache.org] 
Sent: Sunday, April 13, 2014 7:49 AM
To: u...@mahout.apache.org; dev@mahout.apache.org
Subject: Documentation, Documentation, Documentation

Hi,

this is another reminder that we still have to finish our documentation 
improvements! The website looks shiny now and there have been lots of 
discussions about new directions, but we still have some work to do in cleaning 
up webpages. We should especially make sure that the examples work.

Please help with that: anyone who is willing to sacrifice some time to go 
through a webpage and try out the steps described is of great help to the 
project. It would also be awesome to get some help in creating a few new pages, 
especially for the recommenders.

Here's the list of documentation-related JIRAs for 1.0:

https://issues.apache.org/jira/browse/MAHOUT-1441?jql=project%20%3D%20MAHOUT%20AND%20component%20%3D%20Documentation%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC

Best,
Sebastian


RE: Board Report

2014-04-07 Thread Martin, Nick
I'll join Chandler w/ the downstream user input fwiw...

Pat's earlier email described our shop perfectly re: recommendations. We're a 
large organization using Mahout's recommendation capability in several projects 
but shy away from the other components. We have a half dozen business units and 
several of them have had fits and starts with Mahout for 
clustering/classification/some fpm but collectively we've started to share the 
recommendation capability because it's approachable, has efficient data 
requirements for input and is fairly well documented for our use cases. I think 
the documentation ills have been captured extensively (esp. recently) on @user 
and even here on @dev around some of the other components and I can vouch that 
folks in our organization cite that as a reason they abandoned Mahout.

I share Chandler's desire (and that of others who have offered thoughts in this 
direction in the past week or so on @dev) that whatever the roadmap is, it's 
clear and I can plan around it for the next 24-36 months. We have H2O up
and I confess the potential of migrating our 'data science' activities to a 
singular execution framework/interface/ride-along atop existing Hadoop clusters 
is alluring. We have expansive sprawl wrt stats packages and some diversity in 
ML libs/packages and for an organization our size that's extremely costly. Any 
opportunity we have to consolidate capabilities in this space helps us 
tremendously. Re: Spark, we understand the diversification away from MR is coming, but 
in many important areas of our business we're only now gaining traction with 
leaders to implement MR-based solutions. We're a large ship and turn slow, so 
all I ask is that there's a long tail for deprecated MR capabilities because 
we'll be slow to convert. 

-Original Message-
From: Chandler Burgess [mailto:cburg...@icontrolesi.com] 
Sent: Monday, April 07, 2014 4:11 PM
To: dev@mahout.apache.org
Subject: RE: Board Report

First, take my opinions with a grain of salt, as I'm sure most will. This is 
basically an anecdote to back up Sean's and Pat's concerns.

I come from an industry (legal) where there is a huge demand for increased 
analytics and machine learning applications. Our stack already includes 
Lucene/Solr; I had heard about Mahout and was curious about applying it to some 
of the things we wanted to do.

I spent around a month playing with Mahout, reading all the documentation and 
articles I could: Mahout in Action, Taming Text, etc. After a month, I came 
away highly disappointed. The documentation in general is very poor, some of 
the drivers are buggy, others unusable because there is basically no 
documentation, examples/potential applications are missing (what the hell can I 
do with Lanczos SVD output? I just want LSI!), and now, reading more about 
Spark/H2O leaves me uneasy that anything I write and use Mahout for will 
change in the near future, not to mention another platform/technology 
(potentially 2!) I have to learn. 

It seems far, far away from a 1.0 release, which by all public indications is 
next.

It was attractive from a licensing standpoint, and we will probably still use 
it just for seq2sparse. And that will be about it. We're already putting a 
stack together using other libraries which are better documented, from all 
appearances more stable and feature rich, and faster (though maybe not as 
scalable in some cases).

I have deadlines to meet, deliverables to produce, and other projects to work 
on. As it is, I can't trust Mahout and the learning curve is too steep for 
someone like me to apply this in a production environment without being in a 
much bigger company with a lot more resources.

That said, my opinion would be that ONE direction needs to be chosen as the 
main focus and efforts geared toward that. If it's moving to Spark, which 
sounds awesome, then so be it. Otherwise, I fear Mahout will end up a toy for 
hobbyists, people who are already vested in it, or relegated to the trash bin 
while industry moves on to bigger and better things.

-Original Message-
From: Pat Ferrel [mailto:p...@occamsmachete.com]
Sent: Monday, April 07, 2014 1:03 PM
To: dev@mahout.apache.org
Subject: Re: Board Report

Mahout needs a reboot. Grant has the right perspective, but I'd take it 
further. His #2 (two efforts) is not and never would be reasonable in anything 
but a huge company. 

I have never taken and would never take a team the size of Mahout's (even with 
some new committers) and split a reboot into two parts on two engines. No sane project 
manager would allow this. Why do we think it will work here?

The recent Gigaom article left me sympathetic to how confused the readers 
must be, let alone potential users or contributors.

Sean is not being nihilistic; two directions will not work for Mahout. Mahout 
already has a bad reputation for being a poorly documented, poorly integrated, 
loose collection of code with a lot of technical debt. Honestly has 

RE: moving to java 1.7

2014-03-21 Thread Martin, Nick
FWIW, 6 is the standard where I am. No idea when the move away will begin but I 
can tell you it's not currently very high on the list.

-Original Message-
From: Ted Dunning [mailto:ted.dunn...@gmail.com] 
Sent: Friday, March 21, 2014 12:27 AM
To: Mahout Dev List; Suneel Marthi
Subject: Re: moving to java 1.7

Even more than vendors supporting Java 7, the move of Mahout to Java 7 is 
predicated on users of Hadoop (and Mahout) dropping Java 6.

I still see significant numbers of MapR customers with Java 6 as a corporate 
standard.




On Thu, Mar 20, 2014 at 9:06 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:

 Mahout moving to Java 7 is subject to the different Hadoop vendors 
 supporting Java 7 with their distros. As Andrew mentioned, Mahout 
 builds with 1.7 already.





 On Friday, March 21, 2014 12:04 AM, Andrew Musselman  
 andrew.mussel...@gmail.com wrote:

 I don't think it's a priority but stuff builds with Java 7 already.

  On Mar 20, 2014, at 8:32 PM, Saikat Kanjilal sxk1...@hotmail.com
 wrote:
 
  Hey Guys, I'm curious whether there are any plans to move to JDK 1.7 for
  the 1.0 release, or if this is more effort than what is planned. Seems
  like a lot of people are moving to 1.7, and I wanted to understand
  whether there's a need.
  Regards



RE: Welcome Andrew Musselman as new committer

2014-03-07 Thread Martin, Nick
Awesome! Congrats, Andrew, very well deserved.

-Original Message-
From: Sebastian Schelter [mailto:s...@apache.org] 
Sent: Friday, March 07, 2014 12:13 PM
To: u...@mahout.apache.org; dev@mahout.apache.org
Subject: Welcome Andrew Musselman as new committer

Hi,

this is to announce that the Project Management Committee (PMC) for Apache 
Mahout has asked Andrew Musselman to become committer and we are pleased to 
announce that he has accepted.

Being a committer enables easier contribution to the project since in addition 
to posting patches on JIRA it also gives write access to the code repository. 
That also means that now we have yet another person who can commit patches 
submitted by others to our repo *wink*

Andrew, we look forward to working with you in the future. Welcome! It would be 
great if you could introduce yourself with a few words :)

Sebastian