OK. I am confused now as well.
Even so, I would recommend that you propose a non-map-reduce but still parallel version.

Some of the confusion may stem from the fact that you can design some non-map-reduce programs to run in such a way that a map-reduce execution framework like Hadoop thinks that they are doing map-reduce. Instead, these programs are doing whatever they feel like and are just pretending to be map-reduce programs in order to get a bunch of processes launched.

On Sun, Mar 16, 2014 at 1:27 PM, Maciej Mazur <[email protected]> wrote:

> I have one final question.
>
> I have mixed feelings about this discussion.
> You are saying that there is no point in doing a map-reduce implementation
> of neural networks (with pretraining).
> Then you say that a non-map-reduce version would be of substantial interest.
> On the other hand you say that it would be easy, and that it defeats the
> purpose of doing it in Mahout (because it is not an MR version).
> Finally you are saying that building something simple and working is a good
> thing.
>
> I do not really know what to think about it.
> Could you give me some advice on whether I should write a proposal or not?
> (And if I should: should I propose a MapReduce or a non-MapReduce version?
> There is already an NN algorithm, but without pretraining.)
>
> Thanks,
> Maciej Mazur
>
>
> On Fri, Feb 28, 2014 at 5:44 AM, peng <[email protected]> wrote:
>
> > Oh, thanks a lot, I missed that one :)
> > +1 on the easiest one being implemented first. I hadn't thought about the
> > difficulty issue; I need to read more about the YARN extension.
> >
> > Yours Peng
> >
> >
> > On Thu 27 Feb 2014 08:06:27 PM EST, Yexi Jiang wrote:
> >
> >> Hi, Peng,
> >>
> >> Do you mean the MultilayerPerceptron? There are three 'train' methods,
> >> and only one (the one without the parameters trackingKey and groupKey)
> >> is implemented. In the current implementation, they are not used.
> >>
> >> Regards,
> >> Yexi
> >>
> >>
> >> 2014-02-27 19:31 GMT-05:00 Ted Dunning <[email protected]>:
> >>
> >>> Generally for training models like this, there is an assumption that
> >>> fault tolerance is not particularly necessary, because the low risk of
> >>> failure trades against algorithmic speed. For a reasonably small chance
> >>> of failure, simply re-running the training is just fine. If there is a
> >>> high risk of failure, simply checkpointing the parameter server is
> >>> sufficient to allow restarts without redundancy.
> >>>
> >>> Sharding the parameters is quite possible and is reasonable when the
> >>> parameter vector exceeds 10's or 100's of millions of parameters, but it
> >>> isn't likely to be necessary below that.
> >>>
> >>> The asymmetry is similarly not a big deal. The traffic to and from the
> >>> parameter server isn't enormous.
> >>>
> >>> Building something simple and working first is a good thing.
> >>>
> >>>
> >>> On Thu, Feb 27, 2014 at 3:56 PM, peng <[email protected]> wrote:
> >>>
> >>>> With pleasure! The original downpour paper proposes a parameter server
> >>>> from which subnodes download shards of the old model and to which they
> >>>> upload gradients. So if the parameter server is down, the process has
> >>>> to be delayed. It also requires that all model parameters be stored and
> >>>> atomically updated on (and fetched from) a single machine, imposing
> >>>> asymmetric HDD and bandwidth requirements. This design is necessary
> >>>> only because each -=delta operation has to be atomic, which cannot be
> >>>> ensured across the network (e.g. on HDFS).
> >>>>
> >>>> But that doesn't mean that the operation cannot be decentralized:
> >>>> parameters can be sharded across multiple nodes, and multiple
> >>>> accumulator instances can handle parts of the vector subtraction. This
> >>>> should be easy if you create a buffer for the stream of gradients and
> >>>> allocate proper numbers of producers and consumers on each machine to
> >>>> make sure it doesn't overflow (e.g. we can simply use a
> >>>> producer/consumer pattern for all gradients). Obviously this is far
> >>>> from the MR framework, but at least it can be made homogeneous and
> >>>> slightly faster (because sparse data can be distributed in a way that
> >>>> minimizes its overlap, so gradients don't have to cross the network
> >>>> that frequently).
> >>>>
> >>>> If we instead use a centralized architecture, then there must be >= 1
> >>>> backup parameter server for mission-critical training.
> >>>>
> >>>> Yours Peng
> >>>>
> >>>>
> >>>> On Thu 27 Feb 2014 05:09:52 PM EST, Yexi Jiang wrote:
> >>>>
> >>>>> Peng,
> >>>>>
> >>>>> Can you provide more details about your thought?
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>>
> >>>>> 2014-02-27 16:00 GMT-05:00 peng <[email protected]>:
> >>>>>
> >>>>>> That should be easy. But that defeats the purpose of using Mahout,
> >>>>>> as there are already enough implementations of single-node
> >>>>>> backpropagation (in which case a GPU is much faster).
> >>>>>>
> >>>>>> Yexi:
> >>>>>>
> >>>>>> Regarding downpour SGD and sandblaster, may I suggest that the
> >>>>>> implementation had better have no parameter server? It is obviously
> >>>>>> a single point of failure and, in terms of bandwidth, a bottleneck.
> >>>>>> I heard that MLlib on top of Spark has a functional implementation
> >>>>>> (I have never read or tested it), and it is possible to build the
> >>>>>> workflow on top of YARN. None of those frameworks has a heterogeneous
> >>>>>> topology.
> >>>>>>
> >>>>>> Yours Peng
> >>>>>>
> >>>>>>
> >>>>>> On Thu 27 Feb 2014 09:43:19 AM EST, Maciej Mazur (JIRA) wrote:
> >>>>>>
> >>>>>>> [ https://issues.apache.org/jira/browse/MAHOUT-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913488#comment-13913488 ]
> >>>>>>>
> >>>>>>> Maciej Mazur edited comment on MAHOUT-1426 at 2/27/14 2:41 PM:
> >>>>>>> ---------------------------------------------------------------
> >>>>>>>
> >>>>>>> I've read the papers. I didn't think about a distributed network. I
> >>>>>>> had in mind a network that will fit into memory but will require a
> >>>>>>> significant amount of computation.
> >>>>>>>
> >>>>>>> I understand that there are better options for neural networks than
> >>>>>>> map reduce.
> >>>>>>> How about a non-map-reduce version?
> >>>>>>> I see that you think it is something that would make sense. ("Doing
> >>>>>>> a non-map-reduce neural network in Mahout would be of substantial
> >>>>>>> interest.")
> >>>>>>> Do you think it would be a valuable contribution?
> >>>>>>> Is there a need for this type of algorithm?
> >>>>>>> I am thinking about multi-threaded batch gradient descent with
> >>>>>>> pretraining (RBM and/or Autoencoders).
> >>>>>>>
> >>>>>>> I have looked into these old JIRAs.
> >>>>>>> The RBM patch was withdrawn:
> >>>>>>> "I would rather like to withdraw that patch, because by the time I
> >>>>>>> implemented it I didn't know that the learning algorithm is not
> >>>>>>> suited for MR, so I think there is no point including the patch."
> >>>>>>>
> >>>>>>>
> >>>>>>> GSOC 2013 Neural network algorithms
> >>>>>>> -----------------------------------
> >>>>>>>
> >>>>>>>          Key: MAHOUT-1426
> >>>>>>>          URL: https://issues.apache.org/jira/browse/MAHOUT-1426
> >>>>>>>      Project: Mahout
> >>>>>>>   Issue Type: Improvement
> >>>>>>>   Components: Classification
> >>>>>>>     Reporter: Maciej Mazur
> >>>>>>>
> >>>>>>> I would like to ask about the possibility of implementing neural
> >>>>>>> network algorithms in Mahout during GSOC.
> >>>>>>> There is a classifier.mlp package with a neural network.
> >>>>>>> I can see neither RBM nor Autoencoder in these classes.
> >>>>>>> There is only one mention of Autoencoders, in the NeuralNetwork class.
> >>>>>>> As far as I know, Mahout doesn't support convolutional networks.
> >>>>>>> Is it a good idea to implement one of these algorithms?
> >>>>>>> Is it a reasonable amount of work?
> >>>>>>> How hard is it to get into GSOC with Mahout?
> >>>>>>> Did anyone succeed last year?
> >>>>>>>
> >>>>>>> --
> >>>>>>> This message was sent by Atlassian JIRA
> >>>>>>> (v6.1.5#6160)
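
For concreteness, here is a minimal single-process sketch of the decentralized
scheme Peng describes above: the parameter vector is split into shards, each
shard owns a bounded queue of gradient slices, and one consumer thread per
shard applies the -=delta step locally. Every class and method name below is
made up for illustration (this is not Mahout or Hadoop API), and a real
implementation would replace the in-memory queues with network transport
between sub-nodes.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class ShardedGradientSink {

  /** One shard of the model: a slice of the parameter vector plus its queue. */
  static final class Shard implements Runnable {
    final double[] params;                 // parameters owned by this shard
    final BlockingQueue<double[]> updates; // bounded buffer of gradient slices
    volatile boolean done = false;

    Shard(int size, int capacity) {
      this.params = new double[size];
      this.updates = new ArrayBlockingQueue<>(capacity);
    }

    /** Consumer: drain gradient slices and apply them to the local shard. */
    @Override
    public void run() {
      try {
        while (!done || !updates.isEmpty()) {
          double[] grad = updates.poll(100, TimeUnit.MILLISECONDS);
          if (grad == null) continue;
          for (int i = 0; i < params.length; i++) {
            params[i] -= grad[i];          // the -=delta step, now shard-local
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    int numShards = 4;
    int shardSize = 1000;

    Shard[] shards = new Shard[numShards];
    Thread[] consumers = new Thread[numShards];
    for (int s = 0; s < numShards; s++) {
      shards[s] = new Shard(shardSize, 128);
      consumers[s] = new Thread(shards[s], "shard-consumer-" + s);
      consumers[s].start();
    }

    // Producers: worker threads computing stand-in gradients and pushing each
    // slice to the shard that owns it.
    Thread[] workers = new Thread[2];
    for (int w = 0; w < workers.length; w++) {
      workers[w] = new Thread(() -> {
        java.util.Random rnd = new java.util.Random();
        for (int step = 0; step < 100; step++) {
          for (Shard shard : shards) {
            double[] grad = new double[shardSize];
            for (int i = 0; i < grad.length; i++) {
              grad[i] = 0.01 * rnd.nextGaussian();   // stand-in for a real gradient
            }
            try {
              shard.updates.put(grad);               // blocks when the buffer is full
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
              return;
            }
          }
        }
      }, "worker-" + w);
      workers[w].start();
    }

    for (Thread w : workers) w.join();
    for (Shard s : shards) s.done = true;
    for (Thread c : consumers) c.join();
    System.out.println("first parameter of shard 0: " + shards[0].params[0]);
  }
}

The bounded queues give the back-pressure Peng mentions: producers block when a
shard's buffer is full, so the gradient stream cannot overflow a slow consumer,
and no single machine has to hold or atomically update the whole model.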
