Re: Mahout contributions

Saikat Kanjilal Thu, 28 Apr 2016 10:33:11 -0700

Because EL gives you visualization and non-Lucene query constructs as well, 
and it already has a REST API that I plan on tying into Mahout.  I plan on 
wrapping some of the clustering algorithms that I implement using Mahout and 
Spark as a service, which can then make calls into other services (namely 
Elasticsearch and the Neo4j graph service).


Sent from my iPhone

> On Apr 28, 2016, at 10:22 AM, Khurrum Nasim <[email protected]> wrote:
> 
> @Saikat - why use EL instead of Lucene directly?
> 
> 
> 
>> On Apr 28, 2016, at 12:08 PM, Saikat Kanjilal <[email protected]> wrote:
>> 
>> This is great information, thank you. Based on this recommendation I won't 
>> create a JIRA yet but will start work on my project, and when the code 
>> approaches the percentages you are describing I will create the appropriate 
>> JIRAs and put together a proposal to send to the list. Sound OK?  Based on 
>> your latest updates to the wiki, I will work on a handful of the clustering 
>> algorithms, since I see that the Spark implementations for these are not yet 
>> complete.
>> Thank you again
>> 
>>> From: [email protected]
>>> To: [email protected]
>>> Subject: Re: Mahout contributions
>>> Date: Thu, 28 Apr 2016 01:31:09 +0000
>>> 
>>> Saikat, 
>>> 
>>> One other thing that I should say is that you do not need clearance or 
>>> input from the committers to begin work on your project, and the interest 
>>> can and should come from the community as a whole. You can write a 
>>> proposal as you've done, and if you don't see any "+1"s or responses from 
>>> the community as a whole within a few days, you may want to explain in more 
>>> detail, give examples and use cases.  If you are still not seeing +1s or 
>>> any responses from others then I think you can assume that there may not be 
>>> interest; this is usually how things work.  
>>> 
>>> However, if it's something that you're passionate about and you feel like 
>>> you can deliver, this should not stop you.  People do not always read the 
>>> dev@ emails or have time to respond.  You can still move forward with your 
>>> proposed contribution by following the steps laid out in my previous email; 
>>> follow the protocol at:
>>> 
>>> http://mahout.apache.org/developers/how-to-contribute.html
>>> 
>>> and create a JIRA.  When you have reached a significant amount of 
>>> completion (around 70-80%), open a PR for review; this way you can explain 
>>> in more detail. 
>>> 
>>> But please realize that when you open a JIRA for a new issue there is some 
>>> expectation of a commitment on your part to complete it. 
>>> 
>>> For example, I am currently investigating some new plotting features.  I 
>>> have spent a good deal of time this week and last already and am even 
>>> mocking up code as a sketch of what may become an implementation before I 
>>> open a "New Feature" JIRA for it.    
>>> 
>>> My point is absolutely not to discourage you or anybody else from opening 
>>> JIRAs for new features, but rather to let you know that when you open a 
>>> JIRA for a new issue, it tells others that you are working on it, and thus 
>>> may discourage someone else with a similar idea from contributing this 
>>> feature.  So it is best to open it once you've begun your work and are 
>>> committed to it.
>>> 
>>> Andy
>>> 
>>> ________________________________________
>>> From: Saikat Kanjilal <[email protected]>
>>> Sent: Wednesday, April 27, 2016 8:24 PM
>>> To: [email protected]
>>> Subject: RE: Mahout contributions
>>> 
>>> Andrew, thank you very much for your input. I actually want to start a new 
>>> set of JIRAs. Here's what I want to work on: I want to build a framework 
>>> that ties together search/visualization capability with some machine 
>>> learning algorithms. Essentially, think of it as tying Elasticsearch and 
>>> Kibana into Mahout: the user can search for their data with Elasticsearch, 
>>> and for deeper analysis they can feed that data into one or more Mahout 
>>> backends.  Another interesting tie-in might be to hack Kibana to render 
>>> ggplot-like graphics based on the output of Mahout algorithms (assuming 
>>> this can be a Kibana plugin).
>>> Before I go hog wild creating a bunch of JIRAs, I'd like to know if 
>>> there's interest in this initiative.  The tool will bring together the ELK 
>>> stack with dynamic machine learning algorithms.  I can go into a lot more 
>>> detail around use cases if there's enough interest.
>>> Looking forward to your and other committers' input. Thanks
>>> 
>>>> From: [email protected]
>>>> To: [email protected]
>>>> Subject: Re: Mahout contributions
>>>> Date: Wed, 27 Apr 2016 20:16:38 +0000
>>>> 
>>>> Hello Saikat,
>>>> 
>>>> #1 and #2 above are already implemented.  #4 is tricky, so I would not 
>>>> recommend it without a strong knowledge of the codebase, and #5 is now 
>>>> deprecated.  (I've just updated the algorithms grid to reflect this).  The 
>>>> algorithms page includes both algorithms implemented in the math-scala 
>>>> library and algorithms which have CLI drivers written for them.
>>>> 
>>>> Please see: http://mahout.apache.org/developers/how-to-contribute.html
>>>> 
>>>> And please note that per that documentation, it is in everybody's best 
>>>> interest to keep messages on list; contacting committers directly is 
>>>> discouraged.
>>>> 
>>>> The best way to contribute (if you have not found a new bug or issue) 
>>>> would be for you to pick a single open issue in the mahout JIRA which is 
>>>> not already assigned, and start work on it.  When your work is ready for 
>>>> review, just open up a PR and the committers will review it.  Please note 
>>>> that if you do pick up an issue to work on, we do expect some amount of 
>>>> responsibility and reliability and a tangible amount of satisfactory work, 
>>>> since once you've marked a JIRA as something you're working on, others 
>>>> will pass on it.
>>>> 
>>>> Another good way to contribute would be to look for enhancements that 
>>>> you could make to existing code, not necessarily open JIRAs that need to 
>>>> be assigned to you.  For example, please see the recent contribution and 
>>>> workflow on: https://issues.apache.org/jira/browse/MAHOUT-1833 .
>>>> 
>>>> If you have something new that you'd like to implement, simply start a new 
>>>> JIRA issue and begin work on it.  In this case, when you have some code 
>>>> that is ready for review,  you can simply open up a PR for it and 
>>>> committers will review it.  For new implementations, we generally say that 
>>>> you should do this when you are at least 70-80% finished with your coding.
>>>> 
>>>> Thank You,
>>>> 
>>>> Andy
>>>> 
>>>> 
>>>> 
>>>> ________________________________________
>>>> From: Saikat Kanjilal <[email protected]>
>>>> Sent: Tuesday, April 26, 2016 7:17 PM
>>>> To: [email protected]
>>>> Subject: RE: Mahout contributions
>>>> 
>>>> Hello, following up on my last email with more specifics: I've looked 
>>>> through the wiki (https://mahout.apache.org/users/basics/algorithms.html) 
>>>> and I'm interested in implementing one or more of the following 
>>>> algorithms with Mahout using Spark:
>>>> 1) Matrix Factorization with ALS
>>>> 2) Naive Bayes
>>>> 3) Weighted Matrix Factorization, SVD++
>>>> 4) Sparse TF-IDF Vectors from Text
>>>> 5) Lucene integration
>>>> I had a few questions:
>>>> 1) Which of these should I start with, and where is there the greatest 
>>>> need?
>>>> 2) Should I fork the repo and create branches for each of the above 
>>>> implementations?
>>>> 3) Should I go ahead and create some JIRAs for these?
>>>> I would love to have some pointers to get started. Regards
>>>> 
>>>> From: [email protected]
>>>> To: [email protected]
>>>> Subject: Mahout contributions
>>>> Date: Wed, 30 Mar 2016 10:23:45 -0700
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Hello Committers, I was looking through the current JIRA tickets and was 
>>>> wondering if there's a particular area of Mahout that needs more help 
>>>> than others. Should I focus on contributing some algorithms using the DSL 
>>>> or Samsara-related efforts? I've finally got some bandwidth to do some 
>>>> work and would love some guidance before assigning myself some tickets. 
>>>> Regards
>
