OK. I am confused now as well.
Even so, I would recommend that you propose a non-map-reduce but still parallel version.

Some of the confusion may stem from the fact that you can design some non-map-reduce programs to run in such a way that a map-reduce execution framework like Hadoop thinks that they are doing map-reduce. Instead, these programs are doing whatever they feel like and are just pretending to be map-reduce programs in order to get a bunch of processes launched.

On Sun, Mar 16, 2014 at 1:27 PM, Maciej Mazur <[email protected]> wrote:

> I have one final question.
>
> I have mixed feelings about this discussion.
> You are saying that there is no point in doing a map-reduce implementation
> of neural networks (with pretraining).
> Then you say that a non-map-reduce version would be of substantial interest.
> On the other hand you say that it would be easy, and that it defeats the
> purpose of doing it in Mahout (because it is not an MR version).
> Finally you are saying that building something simple and working is a good
> thing.
>
> I do not really know what to think about it.
> Could you give me some advice on whether I should write a proposal or not?
> (And if I should: should I propose a MapReduce or a non-MapReduce version?
> There is already an NN algorithm, but without pretraining.)
>
> Thanks,
> Maciej Mazur
>
>
> On Fri, Feb 28, 2014 at 5:44 AM, peng <[email protected]> wrote:
>
> > Oh, thanks a lot, I missed that one :)
> > +1 on the easiest one being implemented first. I hadn't thought about the
> > difficulty issue; I need to read more about the YARN extension.
> >
> > Yours Peng
> >
> >
> > On Thu 27 Feb 2014 08:06:27 PM EST, Yexi Jiang wrote:
> >
> >> Hi, Peng,
> >>
> >> Do you mean the MultilayerPerceptron? There are three 'train' methods,
> >> and only one (the one without the parameters trackingKey and groupKey)
> >> is implemented. In the current implementation, they are not used.
> >>
> >> Regards,
> >> Yexi
> >>
> >>
> >> 2014-02-27 19:31 GMT-05:00 Ted Dunning <[email protected]>:
> >>
> >>> Generally for training models like this, there is an assumption that
> >>> fault tolerance is not particularly necessary, because the low risk of
> >>> failure trades against algorithmic speed. For a reasonably small chance
> >>> of failure, simply re-running the training is just fine. If there is a
> >>> high risk of failure, simply checkpointing the parameter server is
> >>> sufficient to allow restarts without redundancy.
> >>>
> >>> Sharding the parameters is quite possible and is reasonable when the
> >>> parameter vector exceeds 10's or 100's of millions of parameters, but it
> >>> isn't likely to be necessary below that.
> >>>
> >>> The asymmetry is similarly not a big deal. The traffic to and from the
> >>> parameter server isn't enormous.
> >>>
> >>> Building something simple and working first is a good thing.
> >>>
> >>>
> >>> On Thu, Feb 27, 2014 at 3:56 PM, peng <[email protected]> wrote:
> >>>
> >>>> With pleasure! The original downpour paper proposes a parameter server
> >>>> from which subnodes download shards of the old model and to which they
> >>>> upload gradients. So if the parameter server is down, the process has
> >>>> to be delayed. It also requires that all model parameters be stored and
> >>>> atomically updated on (and fetched from) a single machine, imposing
> >>>> asymmetric HDD and bandwidth requirements. This design is necessary
> >>>> only because each -=delta operation has to be atomic, which cannot be
> >>>> ensured across the network (e.g. on HDFS).
> >>>>
> >>>> But that doesn't mean that the operation cannot be decentralized:
> >>>> parameters can be sharded across multiple nodes, and multiple
> >>>> accumulator instances can handle parts of the vector subtraction. This
> >>>> should be easy if you create a buffer for the stream of gradients and
> >>>> allocate proper numbers of producers and consumers on each machine to
> >>>> make sure it doesn't overflow (e.g. we can simply use a
> >>>> producer/consumer pattern for all gradients). Obviously this is far
> >>>> from the MR framework, but at least it can be made homogeneous and
> >>>> slightly faster (because sparse data can be distributed in a way that
> >>>> minimizes its overlap, so gradients don't have to cross the network
> >>>> that frequently).
> >>>>
> >>>> If we instead use a centralized architecture, then there must be >= 1
> >>>> backup parameter server for mission-critical training.
> >>>>
> >>>> Yours Peng
> >>>>
> >>>>
> >>>> On Thu 27 Feb 2014 05:09:52 PM EST, Yexi Jiang wrote:
> >>>>
> >>>>> Peng,
> >>>>>
> >>>>> Can you provide more details about your thought?
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>>
> >>>>> 2014-02-27 16:00 GMT-05:00 peng <[email protected]>:
> >>>>>
> >>>>>> That should be easy. But that defeats the purpose of using Mahout,
> >>>>>> as there are already enough implementations of single-node
> >>>>>> backpropagation (in which case a GPU is much faster).
> >>>>>>
> >>>>>> Yexi:
> >>>>>>
> >>>>>> Regarding downpour SGD and sandblaster, may I suggest that the
> >>>>>> implementation had better have no parameter server? It is obviously
> >>>>>> a single point of failure and, in terms of bandwidth, a bottleneck.
> >>>>>> I heard that MLlib on top of Spark has a functional implementation
> >>>>>> (I have never read or tested it), and it is possible to build the
> >>>>>> workflow on top of YARN. None of those frameworks has a heterogeneous
> >>>>>> topology.
> >>>>>>
> >>>>>> Yours Peng
> >>>>>>
> >>>>>>
> >>>>>> On Thu 27 Feb 2014 09:43:19 AM EST, Maciej Mazur (JIRA) wrote:
> >>>>>>
> >>>>>>> [ https://issues.apache.org/jira/browse/MAHOUT-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913488#comment-13913488 ]
> >>>>>>>
> >>>>>>> Maciej Mazur edited comment on MAHOUT-1426 at 2/27/14 2:41 PM:
> >>>>>>> ---------------------------------------------------------------
> >>>>>>>
> >>>>>>> I've read the papers. I didn't think about a distributed network. I
> >>>>>>> had in mind a network that will fit into memory but will require a
> >>>>>>> significant amount of computation.
> >>>>>>>
> >>>>>>> I understand that there are better options for neural networks than
> >>>>>>> map reduce.
> >>>>>>> How about a non-map-reduce version?
> >>>>>>> I see that you think it is something that would make sense. ("Doing
> >>>>>>> a non-map-reduce neural network in Mahout would be of substantial
> >>>>>>> interest.")
> >>>>>>> Do you think it would be a valuable contribution?
> >>>>>>> Is there a need for this type of algorithm?
> >>>>>>> I am thinking about multi-threaded batch gradient descent with
> >>>>>>> pretraining (RBM and/or Autoencoders).
> >>>>>>>
> >>>>>>> I have looked into these old JIRAs.
> >>>>>>> The RBM patch was withdrawn:
> >>>>>>> "I would rather like to withdraw that patch, because by the time I
> >>>>>>> implemented it I didn't know that the learning algorithm is not
> >>>>>>> suited for MR, so I think there is no point including the patch."
> >>>>>>>
> >>>>>>>
> >>>>>>> GSOC 2013 Neural network algorithms
> >>>>>>> -----------------------------------
> >>>>>>>
> >>>>>>>          Key: MAHOUT-1426
> >>>>>>>          URL: https://issues.apache.org/jira/browse/MAHOUT-1426
> >>>>>>>      Project: Mahout
> >>>>>>>   Issue Type: Improvement
> >>>>>>>   Components: Classification
> >>>>>>>     Reporter: Maciej Mazur
> >>>>>>>
> >>>>>>> I would like to ask about the possibility of implementing neural
> >>>>>>> network algorithms in Mahout during GSOC.
> >>>>>>> There is a classifier.mlp package with a neural network.
> >>>>>>> I can see neither RBM nor Autoencoder in these classes.
> >>>>>>> There is only one mention of Autoencoders, in the NeuralNetwork class.
> >>>>>>> As far as I know, Mahout doesn't support convolutional networks.
> >>>>>>> Is it a good idea to implement one of these algorithms?
> >>>>>>> Is it a reasonable amount of work?
> >>>>>>> How hard is it to get into GSOC with Mahout?
> >>>>>>> Did anyone succeed last year?
> >>>>>>>
> >>>>>>> --
> >>>>>>> This message was sent by Atlassian JIRA
> >>>>>>> (v6.1.5#6160)
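
For concreteness, here is a minimal single-process sketch of the decentralized
scheme Peng describes above: the parameter vector is split into shards, each
shard owns a bounded queue of gradient slices, and one consumer thread per
shard applies the -=delta step locally. Every class and method name below is
made up for illustration (this is not Mahout or Hadoop API), and a real
implementation would replace the in-memory queues with network transport
between sub-nodes.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class ShardedGradientSink {

  /** One shard of the model: a slice of the parameter vector plus its queue. */
  static final class Shard implements Runnable {
    final double[] params;                 // parameters owned by this shard
    final BlockingQueue<double[]> updates; // bounded buffer of gradient slices
    volatile boolean done = false;

    Shard(int size, int capacity) {
      this.params = new double[size];
      this.updates = new ArrayBlockingQueue<>(capacity);
    }

    /** Consumer: drain gradient slices and apply them to the local shard. */
    @Override
    public void run() {
      try {
        while (!done || !updates.isEmpty()) {
          double[] grad = updates.poll(100, TimeUnit.MILLISECONDS);
          if (grad == null) continue;
          for (int i = 0; i < params.length; i++) {
            params[i] -= grad[i];          // the -=delta step, now shard-local
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    int numShards = 4;
    int shardSize = 1000;

    Shard[] shards = new Shard[numShards];
    Thread[] consumers = new Thread[numShards];
    for (int s = 0; s < numShards; s++) {
      shards[s] = new Shard(shardSize, 128);
      consumers[s] = new Thread(shards[s], "shard-consumer-" + s);
      consumers[s].start();
    }

    // Producers: worker threads computing stand-in gradients and pushing each
    // slice to the shard that owns it.
    Thread[] workers = new Thread[2];
    for (int w = 0; w < workers.length; w++) {
      workers[w] = new Thread(() -> {
        java.util.Random rnd = new java.util.Random();
        for (int step = 0; step < 100; step++) {
          for (Shard shard : shards) {
            double[] grad = new double[shardSize];
            for (int i = 0; i < grad.length; i++) {
              grad[i] = 0.01 * rnd.nextGaussian();   // stand-in for a real gradient
            }
            try {
              shard.updates.put(grad);               // blocks when the buffer is full
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
              return;
            }
          }
        }
      }, "worker-" + w);
      workers[w].start();
    }

    for (Thread w : workers) w.join();
    for (Shard s : shards) s.done = true;
    for (Thread c : consumers) c.join();
    System.out.println("first parameter of shard 0: " + shards[0].params[0]);
  }
}

The bounded queues give the back-pressure Peng mentions: producers block when a
shard's buffer is full, so the gradient stream cannot overflow a slow consumer,
and no single machine has to hold or atomically update the whole model.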
