I agree with Andrew. Mahout should remain indigenous.
Prakash - you may want to create your own project on GitHub using the Mahout library.

> On Apr 28, 2016, at 5:43 PM, Andrew Palumbo <ap....@outlook.com> wrote:
>
> I don't think that this sort of integration work would be a good fit
> directly to the Mahout project. Mahout is more about math, algorithms and an
> environment to develop algorithms. We stay away from direct platform
> integration. In the past we did have some elasticsearch/mahout integration
> work that is not in the code base for this exact reason. I would suggest
> that better places to contribute something like this may be: PIO
> (https://prediction.io/), or even directly as a package for spark
> http://spark-packages.org/ .
>
> Projects integrating Mahout have recently been added to PIO:
> https://github.com/PredictionIO/template-scala-parallel-universal-recommendation.
>
> I think that the project that you are proposing would be a better fit there.
>
> Thanks,
>
> Andy
>
> ________________________________________
> From: Saikat Kanjilal <sxk1...@hotmail.com>
> Sent: Thursday, April 28, 2016 1:50 PM
> To: dev@mahout.apache.org
> Subject: Re: Mahout contributions
>
> I want to start with social data as an example, for example data returned
> from FB graph API as well as user Twitter data. I will send some samples
> later if you're interested.
>
> Sent from my iPhone
>
>> On Apr 28, 2016, at 10:41 AM, Khurrum Nasim <khurrum.na...@useitc.com> wrote:
>>
>> What type of JSON payload size are we talking about here?
>>
>>> On Apr 28, 2016, at 1:32 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
>>>
>>> Because EL gives you the visualization and non-Lucene type query constructs
>>> as well, and also because it already has a REST API that I plan on tying
>>> into Mahout. I plan on wrapping some of the clustering algorithms that I
>>> implement using Mahout and Spark as a service which can then make calls
>>> into other services (namely elasticsearch and neo4j graph service).
>>>
>>> Sent from my iPhone
>>>
>>>> On Apr 28, 2016, at 10:22 AM, Khurrum Nasim <khurrum.na...@useitc.com>
>>>> wrote:
>>>>
>>>> @Saikat - why use EL instead of Lucene directly?
>>>>
>>>>> On Apr 28, 2016, at 12:08 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
>>>>>
>>>>> This is great information, thank you. Based on this recommendation I
>>>>> won't create a JIRA but will start work on my project, and when the code
>>>>> approaches the percentages you are describing I will create the
>>>>> appropriate JIRAs and put together a proposal to send to the list.
>>>>> Sound OK? Based on your latest updates to the wiki I will work on a
>>>>> handful of the clustering algorithms, since I see that the Spark
>>>>> implementations for these are not yet complete.
>>>>> Thank you again
>>>>>
>>>>>> From: ap....@outlook.com
>>>>>> To: dev@mahout.apache.org
>>>>>> Subject: Re: Mahout contributions
>>>>>> Date: Thu, 28 Apr 2016 01:31:09 +0000
>>>>>>
>>>>>> Saikat,
>>>>>>
>>>>>> One other thing that I should say is that you do not need clearance or
>>>>>> input from the committers to begin work on your project, and the
>>>>>> interest can and should come from the community as a whole. You can
>>>>>> write a proposal as you've done, and if you don't see any "+1"s or
>>>>>> responses from the community as a whole within a few days, you may want
>>>>>> to explain in more detail, give examples and use cases. If you are
>>>>>> still not seeing +1s or any responses from others then I think you can
>>>>>> assume that there may not be interest; this is usually how things work.
>>>>>>
>>>>>> However, if it's something that you're passionate about and you feel
>>>>>> like you can deliver, this should not stop you. People do not always
>>>>>> read the dev@ emails or have time to respond.
>>>>>> You can still move forward
>>>>>> with your proposed contribution by following the steps laid out in my
>>>>>> previous email; follow the protocol at:
>>>>>>
>>>>>> http://mahout.apache.org/developers/how-to-contribute.html
>>>>>>
>>>>>> and create a JIRA. When you have reached a significant amount of
>>>>>> completion (around 70-80%), open a PR for review; this way you can
>>>>>> explain in more detail.
>>>>>>
>>>>>> But please realize that when you open a JIRA for a new issue there is
>>>>>> some expectation of a commitment on your part to complete it.
>>>>>>
>>>>>> For example, I am currently investigating some new plotting features. I
>>>>>> have spent a good deal of time this week and last already and am even
>>>>>> mocking up code as a sketch of what may become an implementation before
>>>>>> I open a "New Feature" JIRA for it.
>>>>>>
>>>>>> My point is absolutely not to discourage you or anybody else from
>>>>>> opening JIRAs for new features, rather to let you know that when you
>>>>>> open a JIRA for a new issue, it tells others that you are working on
>>>>>> it, and thus may discourage another with a similar idea from
>>>>>> contributing this feature. So it is best to open it once you've begun
>>>>>> your work and are committed to it.
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Saikat Kanjilal <sxk1...@hotmail.com>
>>>>>> Sent: Wednesday, April 27, 2016 8:24 PM
>>>>>> To: dev@mahout.apache.org
>>>>>> Subject: RE: Mahout contributions
>>>>>>
>>>>>> Andrew, thank you very much for your input. I actually want to start a
>>>>>> new set of JIRAs. Here's what I want to work on: I want to build a
>>>>>> framework that ties together search/visualization capability with some
>>>>>> machine learning algorithms, so essentially think of it as tying
>>>>>> elasticsearch and kibana into mahout. The user can search for their
>>>>>> data with elasticsearch, and for deeper analysis on that data they can
>>>>>> feed that data into one or more mahout backends for analysis. Another
>>>>>> interesting tie-in might be to hack kibana to render ggplot-like
>>>>>> graphics based on the output of mahout algorithms (assuming this can be
>>>>>> a kibana plugin).
>>>>>> Before I go hog wild creating a bunch of JIRAs I'd like to know if
>>>>>> there's interest in this initiative. The tool will bring together the
>>>>>> ELK stack with dynamic machine learning algorithms. I can go into a lot
>>>>>> more detail around use cases if there's enough interest.
>>>>>> Looking forward to your and other committers' input. Thanks
>>>>>>
>>>>>>> From: ap....@outlook.com
>>>>>>> To: dev@mahout.apache.org
>>>>>>> Subject: Re: Mahout contributions
>>>>>>> Date: Wed, 27 Apr 2016 20:16:38 +0000
>>>>>>>
>>>>>>> Hello Saikat,
>>>>>>>
>>>>>>> #1 and #2 above are already implemented. #4 is tricky so I would not
>>>>>>> recommend it without a strong knowledge of the codebase, and #5 is now
>>>>>>> deprecated. (I've just updated the algorithms grid to reflect this).
>>>>>>> The algorithms page includes both algorithms implemented in the
>>>>>>> math-scala library and algorithms which have CLI drivers written for
>>>>>>> them.
>>>>>>>
>>>>>>> Please see: http://mahout.apache.org/developers/how-to-contribute.html
>>>>>>>
>>>>>>> And please note that per that documentation, it is in everybody's best
>>>>>>> interest to keep messages on list; contacting committers directly is
>>>>>>> discouraged.
>>>>>>>
>>>>>>> The best way to contribute (if you have not found a new bug or issue)
>>>>>>> would be for you to pick a single open issue in the mahout JIRA which
>>>>>>> is not already assigned, and start work on it. When your work is ready
>>>>>>> for review, just open up a PR and the committers will review it.
>>>>>>> Please note that if you do pick up an issue to work on, we do expect
>>>>>>> some amount of responsibility and reliability and a tangible amount of
>>>>>>> satisfactory work, since once you've marked a JIRA as something you're
>>>>>>> working on, others will pass on it.
>>>>>>>
>>>>>>> Another good way to contribute would be to look for enhancements that
>>>>>>> you could make to existing code, not necessarily open JIRAs that need
>>>>>>> to be assigned to you. For example please see the recent contribution
>>>>>>> and workflow on: https://issues.apache.org/jira/browse/MAHOUT-1833 .
>>>>>>>
>>>>>>> If you have something new that you'd like to implement, simply start a
>>>>>>> new JIRA issue and begin work on it. In this case, when you have some
>>>>>>> code that is ready for review, you can simply open up a PR for it and
>>>>>>> committers will review it. For new implementations, we generally say
>>>>>>> that you should do this when you are at least 70-80% finished with your
>>>>>>> coding.
>>>>>>>
>>>>>>> Thank You,
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: Saikat Kanjilal <sxk1...@hotmail.com>
>>>>>>> Sent: Tuesday, April 26, 2016 7:17 PM
>>>>>>> To: dev@mahout.apache.org
>>>>>>> Subject: RE: Mahout contributions
>>>>>>>
>>>>>>> Hello, following up on my last email with more specifics: I've looked
>>>>>>> through the wiki
>>>>>>> (https://mahout.apache.org/users/basics/algorithms.html) and I'm
>>>>>>> interested in implementing one or more of the following algorithms
>>>>>>> with Mahout using Spark:
>>>>>>> 1) Matrix Factorization with ALS
>>>>>>> 2) Naive Bayes
>>>>>>> 3) Weighted Matrix Factorization, SVD++
>>>>>>> 4) Sparse TF-IDF Vectors from Text
>>>>>>> 5) Lucene integration.
>>>>>>> I had a few questions:
>>>>>>> 1) Which of these should I start with and where is there the greatest
>>>>>>> need?
>>>>>>> 2) Should I fork the repo and create branches for each of the above
>>>>>>> implementations?
>>>>>>> 3) Should I go ahead and create some JIRAs for these?
>>>>>>> I would love to have some pointers to get started. Regards
>>>>>>>
>>>>>>> From: sxk1...@hotmail.com
>>>>>>> To: dev@mahout.apache.org
>>>>>>> Subject: Mahout contributions
>>>>>>> Date: Wed, 30 Mar 2016 10:23:45 -0700
>>>>>>>
>>>>>>> Hello Committers, I was looking through the current JIRA tickets and
>>>>>>> was wondering if there's a particular area of Mahout that needs some
>>>>>>> more help than others. Should I focus on contributing some algorithms
>>>>>>> using the DSL or on Samsara-related efforts? I've finally got some
>>>>>>> bandwidth to do some work and would love some guidance before
>>>>>>> assigning myself some tickets. Regards