Thank you, Dmitriy.

So is there an architectural blueprint for Mahout? What I mean is, how can I
get the 1,000-foot overview, or a bird's-eye view, of the project?
I do see that Mahout is very modularized; however, I’m still trying to make
heads or tails of it :)

@Dmitriy -
"my investigation points that there are architectural problems in spark that
are hard to overcome at this point for high IO algorithms." Can you share
some more details about this? I’m just curious.


> On Apr 18, 2016, at 8:18 PM, Dmitriy Lyubimov <[email protected]> wrote:
> 
> Khurrum,
> 
> mahout is so much  a library at this point.
> 
> if you mean if it can be used to build networks with 2d inputs, yes i did
> some of that. multi-epoch SGD based systems should be easy enough to build,
> and will probably have a reasonable performance -- although I think
> dedicated CNN systems like Caffe would still run faster at this point. Full
> batch trainers are somewhat slow for larger problems though, my
> investigation points that  there are architectural problems in spark that
> are hard to overcome at this point for high IO algorithms.
> 
> On Mon, Apr 18, 2016 at 11:49 AM, Khurrum Nasim <[email protected]>
> wrote:
> 
>> Hi Guys,
>> 
>> Can Mahout be used for things like face detection? Also, which unit
>> tests or integration tests do you recommend I run to get a
>> better feel for the execution flow?
>> 
>> I’m still slowly acclimating to the project, but hopefully I should come up
>> to speed soon.
>> 
>> 
>> Many Thanks,
>> 
>> Khurrum
>> 
>> 
>> 
>> 
>>> On Mar 30, 2016, at 3:10 PM, Suneel Marthi <[email protected]> wrote:
>>> 
>>> Thanks Khurrum for stepping up.
>>> 
>>> You just need basic programming skills - Java/Scala to be able to
>>> contribute. We can help you with the algorithms and linear algebra stuff.
>>> 
>>> 
>>> Welcome aboard !!
>>> 
>>> 
>>> On Wed, Mar 30, 2016 at 3:05 PM, Khurrum Nasim <[email protected]>
>>> wrote:
>>> 
>>>> Thanks for the advice, Dmitriy. I’m already signed up on ASF Jira. My
>>>> handle is “nasimk”.
>>>> 
>>>> Do I need to be a linear algebra expert and/or a math PhD to contribute?
>>>> I have 10-plus years of computer programming experience. My background is
>>>> comp sci.
>>>> 
>>>> Khurrum
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Mar 30, 2016, at 2:57 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>> 
>>>>> PS You may also want to sign up with ASF Jira so we can assign issues to
>>>>> you.
>>>>> 
>>>>> On Wed, Mar 30, 2016 at 11:52 AM, Dmitriy Lyubimov <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Mar 30, 2016 at 11:43 AM, Khurrum Nasim <[email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> Thanks, Dmitriy.
>>>>>>> 
>>>>>>> I'll take a look and see where I can start pitching in. Do I need
>>>>>>> contributor access? How would I create a feature branch for my work?
>>>>>>> 
>>>>>> 
>>>>>> Khurrum,
>>>>>> 
>>>>>> you only need a GitHub account. What you need to do is fork mahout's
>>>>>> master into your GitHub space and keep it in sync with master, as far as
>>>>>> possible, as you go (by doing regular pulls). That gives you the best
>>>>>> chance of having the fewest conflicts.
>>>>>> 
>>>>>> At any point in time (I recommend when you feel you are about
>>>>>> 50 to 70% done, or just need code advice), you can create a GitHub pull
>>>>>> request against apache/mahout master. Make sure to include the MAHOUT-XXX
>>>>>> issue in the title of the pull request; that way ASF will automatically
>>>>>> propagate code comments to Jira, so all discussion can be done entirely
>>>>>> on GitHub.
>>>>>> 
>>>>>> Again, if you take on a significant contribution (such as a new numerical
>>>>>> method), I recommend discussing the proposal on the @dev list.
>>>>>> 
>>>>>> thanks.
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> Khurrum
>>>>>>> 
>>>>>>>> On Mar 30, 2016, at 1:12 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Oh but of course! please do!
>>>>>>>> 
>>>>>>>> You may work on any issue, this or any other of your choice, or even on
>>>>>>>> any new issue you can think of (for sizeable contributions it is
>>>>>>>> recommended to start a discussion on the @dev list first, though, to
>>>>>>>> make sure you benefit from the experience of others. Please file any
>>>>>>>> new issue in Jira first).
>>>>>>>> 
>>>>>>>> On Wed, Mar 30, 2016 at 9:05 AM, shashi bushan dongur (JIRA) <
>>>>>>>> [email protected]> wrote:
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> [
>>>>>>>>> https://issues.apache.org/jira/browse/MAHOUT-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218216#comment-15218216
>>>>>>>>> ]
>>>>>>>>> 
>>>>>>>>> shashi bushan dongur commented on MAHOUT-1788:
>>>>>>>>> ----------------------------------------------
>>>>>>>>> 
>>>>>>>>> Hello. I would like to start contributing to mahout. Can I work on this
>>>>>>>>> issue?
>>>>>>>>> 
>>>>>>>>>> spark-itemsimilarity integration test script cleanup
>>>>>>>>>> ----------------------------------------------------
>>>>>>>>>> 
>>>>>>>>>>             Key: MAHOUT-1788
>>>>>>>>>>             URL: https://issues.apache.org/jira/browse/MAHOUT-1788
>>>>>>>>>>         Project: Mahout
>>>>>>>>>>      Issue Type: Improvement
>>>>>>>>>>      Components: cooccurrence
>>>>>>>>>> Affects Versions: 0.11.0
>>>>>>>>>>        Reporter: Pat Ferrel
>>>>>>>>>>        Assignee: Pat Ferrel
>>>>>>>>>>        Priority: Trivial
>>>>>>>>>>         Fix For: 1.0.0
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> binary release does not contain data for itemsimilarity tests, neither
>>>>>>>>>> binary nor source versions will run on a cluster unless data is hand
>>>>>>>>>> copied to hdfs.
>>>>>>>>>> Clean this up so it copies data if needed and the data is in both
>>>>>>>>>> versions.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
