Thank you Dmitriy. So is there an architectural blueprint for Mahout? What I mean is, how can I get the 1,000-foot overview, or the bird’s-eye view, of the project? I do see Mahout is very modularized - however I’m still trying to make heads or tails of it :)
@Dmitriy - “my investigation points that there are architectural problems in spark that are hard to overcome at this point for high IO algorithms.” - Can you share some more details about this? I’m just curious.

> On Apr 18, 2016, at 8:18 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> Khurrum,
>
> mahout is so much a library at this point.
>
> if you mean if it can be used to build networks with 2d inputs, yes i did
> some of that. multi-epoch SGD based systems should be easy enough to build,
> and will probably have a reasonable performance -- although I think
> dedicated CNN systems like Caffe would still run faster at this point. Full
> batch trainers are somewhat slow for larger problems though, my
> investigation points that there are architectural problems in spark that
> are hard to overcome at this point for high IO algorithms.
>
> On Mon, Apr 18, 2016 at 11:49 AM, Khurrum Nasim <[email protected]>
> wrote:
>
>> Hi Guys,
>>
>> Can Mahout be used for things like face detection ? Also which unit
>> tests or integration tests do you recommend I should run just to get a
>> better feel of the execution flow.
>>
>> I’m still slowly acclimating to the project. But hopefully should come up
>> to speed soon.
>>
>> Many Thanks,
>>
>> Khurrum
>>
>>> On Mar 30, 2016, at 3:10 PM, Suneel Marthi <[email protected]> wrote:
>>>
>>> Thanks Khurrum for stepping up.
>>>
>>> You just need basic programming skills - Java/Scala to be able to
>>> contribute. We can help you with the algorithms and linear algebra stuff.
>>>
>>> Welcome aboard !!
>>>
>>> On Wed, Mar 30, 2016 at 3:05 PM, Khurrum Nasim <[email protected]>
>>> wrote:
>>>
>>>> Thanks for the advice Dmitriy. I’m already signed up on ASF jira. My
>>>> handle is “nasimk”
>>>>
>>>> Do I need to be a linear algebra expert and or math phd to contribute ?
>>>> I have 10 plus years of computer programming experience. my background is
>>>> comp sci.
>>>>
>>>> Khurrum
>>>>
>>>>> On Mar 30, 2016, at 2:57 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>>
>>>>> PS You may also want to sign up with ASF Jira so we can assign issues to
>>>>> yourself.
>>>>>
>>>>> On Wed, Mar 30, 2016 at 11:52 AM, Dmitriy Lyubimov <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> On Wed, Mar 30, 2016 at 11:43 AM, Khurrum Nasim <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Dmitriy.
>>>>>>>
>>>>>>> I’ll take a look and see where I can start pitching in. Do I need
>>>>>>> contributor access ? how would I create feature branch of my work ?
>>>>>>>
>>>>>>
>>>>>> Khurrum,
>>>>>>
>>>>>> you only need github account. What you need is to create mahout's master
>>>>>> fork in your github space and keep it in sync, as possible, with master as
>>>>>> you go (by doing regular pulls). That way you have the most chance of
>>>>>> having least conflicts possible.
>>>>>>
>>>>>> At any point in time (I recommend at perhaps when you feel you are about
>>>>>> 50 to 70% done or just need a code advice), you can create a github pull
>>>>>> request to the apache/mahout master. Make sure to include MAHOUT-XXX issue
>>>>>> in the head of the pull request, that way ASF will automatically propagate
>>>>>> code comments to jira, and so all discussion can be done entirely on github.
>>>>>>
>>>>>> Again, if you take on a significant contribution (such as a new numerical
>>>>>> method contribution), I recommend to discuss the proposal on the @dev list
>>>>>>
>>>>>> thanks.
>>>>>>
>>>>>>>
>>>>>>> Khurrum
>>>>>>>
>>>>>>>> On Mar 30, 2016, at 1:12 PM, Dmitriy Lyubimov <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Oh but of course! please do!
>>>>>>>>
>>>>>>>> You may work on any issue, this or any other of your choice, or even on any
>>>>>>>> new issue you can think of (for sizeable contributions it is recommended to
>>>>>>>> start discussion on the @dev list first though, to make sure to benefit
>>>>>>>> from experience of others. Please file any new issue first to jira).
>>>>>>>>
>>>>>>>> On Wed, Mar 30, 2016 at 9:05 AM, shashi bushan dongur (JIRA) <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> [ https://issues.apache.org/jira/browse/MAHOUT-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218216#comment-15218216 ]
>>>>>>>>>
>>>>>>>>> shashi bushan dongur commented on MAHOUT-1788:
>>>>>>>>> ----------------------------------------------
>>>>>>>>>
>>>>>>>>> Hello. I would like to start contributing to mahout. Can I work on this
>>>>>>>>> issue?
>>>>>>>>>
>>>>>>>>>> spark-itemsimilarity integration test script cleanup
>>>>>>>>>> ----------------------------------------------------
>>>>>>>>>>
>>>>>>>>>> Key: MAHOUT-1788
>>>>>>>>>> URL: https://issues.apache.org/jira/browse/MAHOUT-1788
>>>>>>>>>> Project: Mahout
>>>>>>>>>> Issue Type: Improvement
>>>>>>>>>> Components: cooccurrence
>>>>>>>>>> Affects Versions: 0.11.0
>>>>>>>>>> Reporter: Pat Ferrel
>>>>>>>>>> Assignee: Pat Ferrel
>>>>>>>>>> Priority: Trivial
>>>>>>>>>> Fix For: 1.0.0
>>>>>>>>>>
>>>>>>>>>> binary release does not contain data for itemsimilarity tests, neither
>>>>>>>>>> binary nor source versions will run on a cluster unless data is hand copied
>>>>>>>>>> to hdfs.
>>>>>>>>>> Clean this up so it copies data if needed and the data is in both
>>>>>>>>>> versions.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> This message was sent by Atlassian JIRA
>>>>>>>>> (v6.3.4#6332)
