[jira] [Updated] (MAHOUT-1329) Mahout for hadoop 2

2014-02-25 Thread Gokhan Capan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gokhan Capan updated MAHOUT-1329:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Mahout for hadoop 2
 ---

 Key: MAHOUT-1329
 URL: https://issues.apache.org/jira/browse/MAHOUT-1329
 Project: Mahout
  Issue Type: Task
  Components: build
Affects Versions: 0.9
Reporter: Sergey Svinarchuk
Assignee: Gokhan Capan
  Labels: patch
 Fix For: 1.0

 Attachments: 1329-2.patch, 1329-3.patch, 1329.patch


 Update Mahout to work with Hadoop 2.x, targeting this for Mahout 1.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2

2014-02-25 Thread Gokhan Capan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911436#comment-13911436
 ] 

Gokhan Capan commented on MAHOUT-1329:
--

I committed this to trunk



[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2

2014-02-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911451#comment-13911451
 ] 

Hudson commented on MAHOUT-1329:


SUCCESS: Integrated in Mahout-Quality #2490 (See 
[https://builds.apache.org/job/Mahout-Quality/2490/])
MAHOUT-1329: Mahout for hadoop 2 (gcapan: rev 1571637)
* /mahout/trunk/core/pom.xml
* /mahout/trunk/integration/pom.xml
* /mahout/trunk/pom.xml




[jira] [Updated] (MAHOUT-1419) Random decision forest is excessively slow on numeric features

2014-02-25 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated MAHOUT-1419:
--

   Resolution: Fixed
Fix Version/s: 1.0
 Assignee: Sean Owen
   Status: Resolved  (was: Patch Available)

OK, the core patch is in. I think additional test scripts can be added 
separately as desired.

 Random decision forest is excessively slow on numeric features
 --

 Key: MAHOUT-1419
 URL: https://issues.apache.org/jira/browse/MAHOUT-1419
 Project: Mahout
  Issue Type: Bug
  Components: Classification
Affects Versions: 0.7, 0.8, 0.9
Reporter: Sean Owen
Assignee: Sean Owen
 Fix For: 1.0

 Attachments: MAHOUT-1419.patch, create-rf-data.sh, run-rf.sh


 Follow-up to MAHOUT-1417. There's a customer running this and observing it 
 take an unreasonably long time on about 2GB of data: 24 hours, where 
 other RDF M/R implementations take 9 minutes. The difference is big enough to 
 probably be considered a defect. MAHOUT-1417 got that down to about 5 hours; 
 I am trying to further improve it.
 One key issue seems to be how splits are evaluated over numeric features. A 
 split is tried for every distinct numeric value of the feature in the whole 
 data set. Since these are floating-point values, they could be (and in the 
 customer's case are) all distinct. 200K rows means 200K splits to evaluate 
 every time a node is built on the feature.
 A better approach is to sample percentiles of the feature and evaluate 
 only those as splits. Doing that really efficiently would require a lot of 
 rewriting. However, there are some modest changes possible which get some of 
 the benefit, and they appear to make it run about 3x faster, that is, on a 
 data set that exhibits this problem, meaning one using numeric features that 
 are generally distinct, which is not exotic.
 There are comparable but different problems with handling of categorical 
 features, but that's for a different patch.
 I have a patch, but it changes behavior to some extent since it is evaluating 
 only a sample of splits instead of every single possible one. In particular 
 it makes the output of OptIgSplit no longer match the DefaultIgSplit. 
 Although I think the point is that optimized may mean giving different 
 choices of split here, which could yield differing trees. So that test 
 probably has to go.
 (Along the way I found a number of micro-optimizations in this part of the 
 code that added up to maybe a 3% speedup. And fixed an NPE too.)
 I will propose a patch shortly with all of this for thoughts.
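The percentile-sampling approach described above can be sketched as follows. This is an illustrative Python sketch of the idea only, not Mahout's actual OptIgSplit implementation; the function names are hypothetical:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split_sampled(values, labels, num_candidates=16):
    """Evaluate splits only at sampled percentiles of a numeric feature,
    instead of at every distinct value, and return (threshold, gain)."""
    # Candidate thresholds: evenly spaced interior percentiles of the feature.
    qs = np.linspace(0, 100, num_candidates + 2)[1:-1]
    thresholds = np.unique(np.percentile(values, qs))
    base = entropy(labels)
    n = len(labels)
    best_gain, best_t = 0.0, None
    for t in thresholds:
        left, right = labels[values <= t], labels[values > t]
        if len(left) == 0 or len(right) == 0:
            continue
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain
```

With 200K distinct values, this evaluates a fixed handful of candidate thresholds per node instead of 200K, which is the source of the speedup described.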





[jira] [Commented] (MAHOUT-1419) Random decision forest is excessively slow on numeric features

2014-02-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911662#comment-13911662
 ] 

Hudson commented on MAHOUT-1419:


SUCCESS: Integrated in Mahout-Quality #2492 (See 
[https://builds.apache.org/job/Mahout-Quality/2492/])
MAHOUT-1419: Random decision forest is excessively slow on numeric features 
(srowen: rev 1571704)
* /mahout/trunk/CHANGELOG
* 
/mahout/trunk/core/src/main/java/org/apache/mahout/classifier/df/split/OptIgSplit.java
* 
/mahout/trunk/core/src/test/java/org/apache/mahout/classifier/df/split/OptIgSplitTest.java
* 
/mahout/trunk/core/src/test/java/org/apache/mahout/classifier/df/tools/VisualizerTest.java
* 
/mahout/trunk/examples/src/main/java/org/apache/mahout/classifier/df/mapreduce/BuildForest.java




[jira] [Updated] (MAHOUT-1346) Spark Bindings (DRM)

2014-02-25 Thread Dmitriy Lyubimov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Lyubimov updated MAHOUT-1346:
-

Attachment: ScalaSparkBindings.pdf

WIP manual and working notes

 Spark Bindings (DRM)
 

 Key: MAHOUT-1346
 URL: https://issues.apache.org/jira/browse/MAHOUT-1346
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.8
Reporter: Dmitriy Lyubimov
Assignee: Dmitriy Lyubimov
 Fix For: 1.0

 Attachments: ScalaSparkBindings.pdf


 Spark bindings for Mahout DRM. 
 DRM DSL. 
 Disclaimer: this will all be experimental at this point.
 The idea is to wrap a DRM in a Spark RDD with support for some basic 
 functionality, and perhaps the humble beginnings of a cost-based optimizer:
 (0) Spark serialization support for Vector, Matrix 
 (1) Bagel transposition 
 (2) slim X'X
 (2a) not-so-slim X'X
 (3) blockify() (compose RDD containing vertical blocks of original input)
 (4) read/write Mahout DRM off HDFS
 (5) A'B
 ...
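As an illustration of item (2): when the number of columns k is small, "slim" X'X is just the sum of the row outer products, which is what a distributed implementation would accumulate per partition and then reduce. The proposed bindings are Scala; this plain Python sketch only shows the idea:

```python
import numpy as np

def slim_xtx(rows):
    """Compute X'X by summing outer products of rows, the way a
    distributed job would accumulate per-partition partial k-by-k sums
    before a final reduce. Assumes k is small enough to fit in memory."""
    rows = iter(rows)
    first = next(rows)
    acc = np.outer(first, first)
    for r in rows:
        acc += np.outer(r, r)  # each row contributes r r'
    return acc
```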





[jira] [Created] (MAHOUT-1426) GSOC 2013 Neural network algorithms

2014-02-25 Thread Maciej Mazur (JIRA)
Maciej Mazur created MAHOUT-1426:


 Summary: GSOC 2013 Neural network algorithms
 Key: MAHOUT-1426
 URL: https://issues.apache.org/jira/browse/MAHOUT-1426
 Project: Mahout
  Issue Type: Improvement
  Components: Classification
Reporter: Maciej Mazur


I would like to ask about the possibilities of implementing neural network 
algorithms in Mahout during GSoC.

There is a classifier.mlp package with a neural network.
I can see neither an RBM nor an Autoencoder in these classes.
There is only one mention of Autoencoders, in the NeuralNetwork class.
As far as I know, Mahout doesn't support convolutional networks.

Is it a good idea to implement one of these algorithms?
Is it a reasonable amount of work?





[jira] [Updated] (MAHOUT-1426) GSOC 2013 Neural network algorithms

2014-02-25 Thread Maciej Mazur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Mazur updated MAHOUT-1426:
-

Description: 
I would like to ask about the possibilities of implementing neural network 
algorithms in Mahout during GSoC.

There is a classifier.mlp package with a neural network.
I can see neither an RBM nor an Autoencoder in these classes.
There is only one mention of Autoencoders, in the NeuralNetwork class.
As far as I know, Mahout doesn't support convolutional networks.

Is it a good idea to implement one of these algorithms?
Is it a reasonable amount of work?

How hard is it to get a GSoC slot in Mahout?
Did anyone succeed last year?

  was:
I would like to ask about possibilites of implementing neural network 
algorithms in mahout during GSOC.

There is a classifier.mlp package with neural network.
I can't see neighter RBM  nor Autoencoder in these classes.
There is only one word about Autoencoders in NeuralNetwork class.
As far as I know Mahout doesn't support convolutional networks.

Is it a good idea to implement one of these algorithms?
Is it a reasonable amount of work?




Re: [jira] [Created] (MAHOUT-1426) GSOC 2013 Neural network algorithms

2014-02-25 Thread Yexi Jiang
Since training a neural network typically requires many iterations, it is not
perfectly suited to a MapReduce-style implementation.

Currently, the NeuralNetwork is implemented as an online learning model and
the training is conducted via stochastic gradient descent.

Moreover, the current version of NeuralNetwork is mainly used for supervised
learning, so there is no RBM or Autoencoder.
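The online-SGD training style described here can be illustrated with a minimal single-unit sketch; plain NumPy logistic regression stands in for a network layer, and this is not Mahout's NeuralNetwork API:

```python
import numpy as np

def train_online_sgd(X, y, lr=0.5, epochs=50):
    """Online (one-example-at-a-time) SGD for logistic regression:
    each example triggers an immediate model update, the same training
    style as an online neural-network layer."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + np.exp(-(xi @ w + b)))  # forward pass
            g = p - yi                               # gradient of log-loss
            w -= lr * g * xi                         # update per example
            b -= lr * g
    return w, b
```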

Regards,
Yexi






-- 
--
Yexi Jiang,
ECS 251,  yjian...@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/


[jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms

2014-02-25 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911680#comment-13911680
 ] 

Suneel Marthi edited comment on MAHOUT-1426 at 2/25/14 3:59 PM:


The classifier.mlp package is a supervised classifier based on online learning 
using SGD. There are old JIRAs that had an RBM implementation (not MapReduce), 
MAHOUT-968, and one for Autoencoders (MAHOUT-732), neither of which ever made 
it into the codebase.


was (Author: smarthi):
The classifier.mlp is a supercised classifier based on Online learning training 
using SGD.  There are old JIRAs that had RBM implementation (not MapReduce)  - 
Mahout-968 and one for Autoencoders (MAhout-732). Both of which never made it 
to the codebase. 



[jira] [Commented] (MAHOUT-1426) GSOC 2013 Neural network algorithms

2014-02-25 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911680#comment-13911680
 ] 

Suneel Marthi commented on MAHOUT-1426:
---

The classifier.mlp package is a supervised classifier based on online learning 
using SGD. There are old JIRAs that had an RBM implementation (not MapReduce), 
MAHOUT-968, and one for Autoencoders (MAHOUT-732), neither of which ever made 
it into the codebase.



Re: [jira] [Commented] (MAHOUT-1426) GSOC 2013 Neural network algorithms

2014-02-25 Thread Maciej Mazur
I understand that neural networks aren't perfectly suitable for MapReduce.
But if there is a very large network and a large training set, it seems to be
a good solution to use MapReduce.

RBMs and Autoencoders are used for pretraining. They allow learning better
representations for deep architectures (according to
http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf). Deep supervised
multi-layer neural networks are very hard to train starting from random
initialization.
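As a sketch of the pretraining building block mentioned above, here is a minimal tied-weight autoencoder in NumPy. It is illustrative only (unrelated to any Mahout code), and the factor of 2 in the squared-error gradient is absorbed into the learning rate:

```python
import numpy as np

def train_autoencoder(X, hidden=2, lr=0.1, epochs=200, seed=0):
    """Single-hidden-layer autoencoder with tied weights (W encodes,
    W.T decodes), trained by full-batch gradient descent to reduce
    squared reconstruction error. Returns (W, per-epoch losses)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, hidden))
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W)            # encode
        R = H @ W.T                   # decode with tied weights
        E = R - X                     # reconstruction error
        losses.append(float((E ** 2).mean()))
        gH = E @ W                    # dLoss/dH (up to a constant factor)
        gPre = gH * (1 - H ** 2)      # back through tanh
        gW = X.T @ gPre + E.T @ H     # encoder grad + decoder grad (tied)
        W -= lr * gW / n
    return W, losses
```

In the stacked-pretraining recipe, each layer's W would be trained this way on the previous layer's activations before supervised fine-tuning.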






Re: [jira] [Commented] (MAHOUT-1426) GSOC 2013 Neural network algorithms

2014-02-25 Thread Ted Dunning
Doing a non-map-reduce neural network in Mahout would be of substantial
interest.

I don't see a role for something that is 10x slower than it should be.





[jira] [Commented] (MAHOUT-1426) GSOC 2013 Neural network algorithms

2014-02-25 Thread Yexi Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911865#comment-13911865
 ] 

Yexi Jiang commented on MAHOUT-1426:


I totally agree with you. From the algorithmic perspective, RBMs and
Autoencoders have proven to be very effective for feature learning. When
training a multi-layer neural network, it is usually necessary to stack the
RBMs or Autoencoders to learn the representative features first.

1. If the training dataset is large.
It is true that if the training data is huge, the online version will be slow,
as it is not a parallel implementation. If we implement the algorithm in the
MapReduce way, the data can be read in parallel. No matter whether we use
stochastic gradient descent, mini-batch gradient descent, or full-batch
gradient descent, we need to train the model over many iterations. In
practice, we need one job for each iteration. It is known that the start-up
time of a Hadoop job is significant; therefore, the overhead can be even
higher than the actual computing time. For example, if we use stochastic
gradient descent, after each partition reads one data instance, we need to
update and synchronize the model. IMHO, BSP is more effective than MapReduce
in such a scenario.

2. If the model is large.
If the model is large, we need to partition the model and store it in a
distributed fashion; you can find a solution in a related NIPS paper
(http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/large_deep_networks_nips2012.pdf).

In this case, the distributed system needs to be heterogeneous, since
different nodes may have different tasks (parameter storage or computing). It
is difficult to design an algorithm to conduct such work in the MapReduce
style, as each task is considered homogeneous in MapReduce.

Actually, according to the Tera-scale deep learning talk
(http://static.googleusercontent.com/media/research.google.com/en/us/archive/unsupervised_learning_talk_2012.pdf),
even BSP is not quite suitable, since errors can always happen in a
large-scale distributed system. In their implementation, they built an
asynchronous computing framework to conduct large-scale learning.

In summary, implementing a MapReduce version of NeuralNetwork is OK, but
compared with more suitable computing frameworks, it is not as efficient.
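The one-job-per-iteration structure described in point 1 can be sketched as follows. The map/reduce phases are simulated in plain Python, a least-squares model stands in for the network, and all names are hypothetical:

```python
import numpy as np

def mapreduce_style_gd(partitions, w0, lr=0.1, iterations=100):
    """Full-batch gradient descent structured the way a MapReduce driver
    would run it: one 'job' per iteration, where each partition (a map
    task) computes a partial gradient and the reduce step sums them.
    Model: least squares, partial gradient X'(Xw - y) per partition."""
    w = w0.copy()
    n = sum(len(y) for _, y in partitions)
    for _ in range(iterations):  # each loop body = one MapReduce job
        partials = [X.T @ (X @ w - y) for X, y in partitions]  # map phase
        grad = np.sum(partials, axis=0) / n                    # reduce phase
        w -= lr * grad                                         # driver update
    return w
```

In a real cluster, each of those iterations pays the Hadoop job start-up cost, which is exactly the overhead the comment above is pointing at.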



