[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...

loachli Thu, 08 Jan 2015 17:06:54 -0800

Github user loachli commented on the pull request:

    https://github.com/apache/spark/pull/1290#issuecomment-69278665
  
    Hi jkbradley:
      Could you tell me the JIRA number related to the "new spark.ml package and 
its design doc"?
    
    
    From: jkbradley [mailto:notificati...@github.com]
    Sent: January 9, 2015 3:51
    To: apache/spark
    Cc: Lizhengbing (bing, BIPA)
    Subject: Re: [spark] [MLLIB] [spark-2352] Implementation of an Artificial 
Neural Network (ANN) (#1290)
    
    
    @bgreeven<https://github.com/bgreeven> I'm not too surprised that the 
majority vote (a.k.a. one vs. all) did not do very well; it does not scale well 
with the number of classes. A tree (or better yet, error-correcting output 
codes) generally works better, in my experience.
    
    @avulanov<https://github.com/avulanov> True, we try for consistency with 
APIs, except where we're changing the norm. There is not a clear write-up 
about the "norm," although the new spark.ml package and its design doc (in the 
JIRA) give an overview of some parts. Basically, we're aiming to make things 
more pluggable and extensible, while minimizing API change. If that requires 
short-term API changes (such as switching away from ANNWithX method names), 
that can be acceptable.
    
    @bgreeven<https://github.com/bgreeven> 
@avulanov<https://github.com/avulanov> The test results look pretty good, 
though I'm not sure what to expect for accuracy. I think the main item 
remaining is figuring out the public API. It's tough since neural networks / 
deep learning are a rapidly evolving field, and there are a lot of model & 
algorithm variants out there. Ideally, we could put together a design doc (to 
be linked from the JIRA) for this big feature which would:
    
      *   Design a public API for neural networks and deep learning
         *   Comparison of other major libraries' APIs
         *   Minimum viable product API for an initial PR
         *   Path for the future:
            *   What extensions might we need to do, and can we keep the public 
API stable for these?
            *   What extensions might users want to do? Is the API easily 
extensible and/or pluggable, or can we make it so in the future without 
changing the existing public API?
      *   Briefly discuss the algorithm
         *   Alg sketch, limitations, etc.
         *   Alternative algorithms, and a path for making the optimization 
algorithm pluggable in the future (as we've discussed a bit in the PR 
conversation)
    
    I realize it takes quite a while to get a big new feature ready. If you'd 
like to encourage early adoption, you could also post this for now as a package 
for Spark, while the PR is made fully ready.
    
    CC: @mengxr<https://github.com/mengxr>
    
    Reply to this email directly or view it on 
GitHub<https://github.com/apache/spark/pull/1290#issuecomment-69237765>.



