Re: Mahout contributions

Khurrum Nasim Thu, 28 Apr 2016 10:22:31 -0700

@Saikat - why use EL instead of Lucene directly?



> On Apr 28, 2016, at 12:08 PM, Saikat Kanjilal <[email protected]> wrote:
> 
> This is great information, thank you. Based on this recommendation I won't 
> create a JIRA yet but will start work on my project, and when the code 
> approaches the completion percentage you describe I will create the 
> appropriate JIRAs and put together a proposal to send to the list. Sound OK? 
> Based on your latest updates to the wiki I will work on a handful of the 
> clustering algorithms, since I see that the Spark implementations for these 
> are not yet complete.
> Thank you again
> 
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: Mahout contributions
>> Date: Thu, 28 Apr 2016 01:31:09 +0000
>> 
>> Saikat, 
>> 
>> One other thing that I should say is that you do not need clearance or input 
>> from the committers to begin work on your project, and the interest can and 
>> should come from the community as a whole. You can write a proposal as you've 
>> done, and if you don't see any "+1"s or responses from the community as a 
>> whole within a few days, you may want to explain in more detail and give 
>> examples and use cases.  If you are still not seeing +1s or any responses 
>> from others, then I think you can assume that there may not be interest; this 
>> is usually how things work.
>> 
>> However, if it's something that you're passionate about and you feel like you 
>> can deliver, this should not stop you.  People do not always read the dev@ 
>> emails or have time to respond.  You can still move forward with your 
>> proposed contribution by following the steps laid out in my previous email; 
>> follow the protocol at:
>> 
>> http://mahout.apache.org/developers/how-to-contribute.html
>> 
>> and create a JIRA.  When you have reached a significant amount of completion 
>> (around 70-80%), open a PR for review; this way you can explain in more 
>> detail.
>> 
>> But please realize that when you open a JIRA for a new issue there is some 
>> expectation of a commitment on your part to complete it. 
>> 
>> For example, I am currently investigating some new plotting features.  I 
>> have spent a good deal of time this week and last already and am even 
>> mocking up code as a sketch of what may become an implementation before I 
>> open a "New Feature" JIRA for it.    
>> 
>> My point is absolutely not to discourage you or anybody else from opening 
>> JIRAs for new features, but rather to let you know that when you open a JIRA 
>> for a new issue, it tells others that you are working on it, and thus may 
>> discourage another person with a similar idea from contributing this 
>> feature.  So it is best to open it once you've begun your work and are 
>> committed to it.
>> 
>> Andy
>> 
>> ________________________________________
>> From: Saikat Kanjilal <[email protected]>
>> Sent: Wednesday, April 27, 2016 8:24 PM
>> To: [email protected]
>> Subject: RE: Mahout contributions
>> 
>> Andrew, thank you very much for your input. I actually want to start a new 
>> set of JIRAs. Here's what I want to work on: I want to build a framework 
>> that ties together search/visualization capability with some machine 
>> learning algorithms. Essentially, think of it as tying Elasticsearch 
>> and Kibana into Mahout: the user can search for their data with 
>> Elasticsearch, and for deeper analysis they can feed that data 
>> into one or more Mahout backends.  Another interesting tie-in 
>> might be to hack Kibana to render ggplot-like graphics based on the output 
>> of Mahout algorithms (assuming this can be a Kibana plugin).
>> Before I go hog wild creating a bunch of JIRAs, I'd like to know if there's 
>> interest in this initiative.  The tool will bring together the ELK stack 
>> with dynamic machine learning algorithms.  I can go into a lot more detail 
>> around use cases if there's enough interest.
>> Looking forward to your and other committers' input. Thanks
>> 
>>> From: [email protected]
>>> To: [email protected]
>>> Subject: Re: Mahout contributions
>>> Date: Wed, 27 Apr 2016 20:16:38 +0000
>>> 
>>> Hello Saikat,
>>> 
>>> #1 and #2 above are already implemented.  #4 is tricky, so I would not 
>>> recommend it without a strong knowledge of the codebase, and #5 is now 
>>> deprecated.  (I've just updated the algorithms grid to reflect this).  The 
>>> algorithms page includes both algorithms implemented in the math-scala 
>>> library and algorithms which have CLI drivers written for them.
>>> 
>>> Please see: http://mahout.apache.org/developers/how-to-contribute.html
>>> 
>>> And please note that per that documentation, it is in everybody's best 
>>> interest to keep messages on list; contacting committers directly is 
>>> discouraged.
>>> 
>>> The best way to contribute (if you have not found a new bug or issue) would 
>>> be for you to pick a single open issue in the mahout JIRA which is not 
>>> already assigned, and start work on it.  When your work is ready for 
>>> review, just open up a PR and the committers will review it.  Please note 
>>> that if you do pick up an issue to work on, we do expect some amount of 
>>> responsibility and reliability and a tangible amount of satisfactory work, 
>>> since once you've marked a JIRA as something you're working on, others will 
>>> pass on it.
>>> 
>>> Another good way to contribute would be to look for enhancements that you 
>>> could make to existing code, not necessarily open JIRAs that need to be 
>>> assigned to you.  For example, please see the recent contribution and 
>>> workflow on: 
>>> https://issues.apache.org/jira/browse/MAHOUT-1833 .
>>> 
>>> If you have something new that you'd like to implement, simply start a new 
>>> JIRA issue and begin work on it.  In this case, when you have some code 
>>> that is ready for review,  you can simply open up a PR for it and 
>>> committers will review it.  For new implementations, we generally say that 
>>> you should do this when you are at least 70-80% finished with your coding.
>>> 
>>> Thank You,
>>> 
>>> Andy
>>> 
>>> 
>>> 
>>> ________________________________________
>>> From: Saikat Kanjilal <[email protected]>
>>> Sent: Tuesday, April 26, 2016 7:17 PM
>>> To: [email protected]
>>> Subject: RE: Mahout contributions
>>> 
>>> Hello, following up on my last email with more specifics: I've looked 
>>> through the wiki (https://mahout.apache.org/users/basics/algorithms.html) 
>>> and I'm interested in implementing one or more of the following 
>>> algorithms with Mahout using Spark:
>>> 1) Matrix Factorization with ALS
>>> 2) Naive Bayes
>>> 3) Weighted Matrix Factorization, SVD++
>>> 4) Sparse TF-IDF Vectors from Text
>>> 5) Lucene integration
>>> I had a few questions:
>>> 1) Which of these should I start with, and where is there the greatest 
>>> need?
>>> 2) Should I fork the repo and create branches for each of the above 
>>> implementations?
>>> 3) Should I go ahead and create some JIRAs for these?
>>> I would love some pointers to get started. Regards
>>> 
>>> From: [email protected]
>>> To: [email protected]
>>> Subject: Mahout contributions
>>> Date: Wed, 30 Mar 2016 10:23:45 -0700
>>> 
>>> 
>>> 
>>> 
>>> Hello Committers, I was looking through the current JIRA tickets and was 
>>> wondering if there's a particular area of Mahout that needs more help 
>>> than others. Should I focus on contributing some algorithms using the DSL 
>>> or on Samsara-related efforts? I've finally got some bandwidth to do some 
>>> work and would love some guidance before assigning myself some tickets. 
>>> Regards
>
