[jira] [Updated] (MAHOUT-1319) seqdirectory -filter argument silently ignored when run as MR

2013-12-20 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1319:
--

Attachment: (was: MAHOUT-1319.patch)

 seqdirectory -filter argument silently ignored when run as MR
 -

 Key: MAHOUT-1319
 URL: https://issues.apache.org/jira/browse/MAHOUT-1319
 Project: Mahout
  Issue Type: Bug
  Components: Integration
Affects Versions: 0.8
Reporter: Liz Merkhofer
Assignee: Suneel Marthi
  Labels: seqdirectory, text
 Fix For: 0.9

 Attachments: MAHOUT-1319-custom-filter.patch


 When seqdirectory (Sequence Files from Input Directory) is run from the 
 command line with a custom filter specified via the -filter parameter, the 
 argument is ignored and the default PrefixAdditionFilter is applied to the 
 input. No exception is thrown.
 When the same command is run with -xm sequential, the filter is found and 
 works as expected.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (MAHOUT-1319) seqdirectory -filter argument silently ignored when run as MR

2013-12-20 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853776#comment-13853776
 ] 

Suneel Marthi commented on MAHOUT-1319:
---

Uploading a new patch that takes a filter class that implements PathFilter.  
Unlike the sequential version, the MR version already handles the keyprefix and 
chunk sizes without needing a filter class (like PrefixAdditionFilter).

With this patch it should be possible to pass in a CustomFilter that implements 
PathFilter to the MR version of seqdirectory.

 seqdirectory -filter argument silently ignored when run as MR
 -

 Key: MAHOUT-1319
 URL: https://issues.apache.org/jira/browse/MAHOUT-1319
 Project: Mahout
  Issue Type: Bug
  Components: Integration
Affects Versions: 0.8
Reporter: Liz Merkhofer
Assignee: Suneel Marthi
  Labels: seqdirectory, text
 Fix For: 0.9

 Attachments: MAHOUT-1319-custom-filter.patch, MAHOUT-1319.patch


 When seqdirectory (Sequence Files from Input Directory) is run from the 
 command line with a custom filter specified via the -filter parameter, the 
 argument is ignored and the default PrefixAdditionFilter is applied to the 
 input. No exception is thrown.
 When the same command is run with -xm sequential, the filter is found and 
 works as expected.





[jira] [Updated] (MAHOUT-1319) seqdirectory -filter argument silently ignored when run as MR

2013-12-20 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1319:
--

Attachment: MAHOUT-1319.patch

 seqdirectory -filter argument silently ignored when run as MR
 -

 Key: MAHOUT-1319
 URL: https://issues.apache.org/jira/browse/MAHOUT-1319
 Project: Mahout
  Issue Type: Bug
  Components: Integration
Affects Versions: 0.8
Reporter: Liz Merkhofer
Assignee: Suneel Marthi
  Labels: seqdirectory, text
 Fix For: 0.9

 Attachments: MAHOUT-1319-custom-filter.patch, MAHOUT-1319.patch


 When seqdirectory (Sequence Files from Input Directory) is run from the 
 command line with a custom filter specified via the -filter parameter, the 
 argument is ignored and the default PrefixAdditionFilter is applied to the 
 input. No exception is thrown.
 When the same command is run with -xm sequential, the filter is found and 
 works as expected.





[jira] [Created] (MAHOUT-1384) Executing the MR version of Naive Bayes/CNB of classify_20newgroups.sh fails.

2013-12-20 Thread Suneel Marthi (JIRA)
Suneel Marthi created MAHOUT-1384:
-

 Summary: Executing the MR version of Naive Bayes/CNB of 
classify_20newgroups.sh fails.
 Key: MAHOUT-1384
 URL: https://issues.apache.org/jira/browse/MAHOUT-1384
 Project: Mahout
  Issue Type: Bug
  Components: Examples
Affects Versions: 0.8
Reporter: Suneel Marthi
Assignee: Suneel Marthi
Priority: Minor
 Fix For: 0.9


Executing the MR version of Naive Bayes/CNB of classify_20newgroups.sh fails.  
This is because the example files are not copied to HDFS for the MR version 
(like what's presently being done in cluster-reuters.sh).







[jira] [Updated] (MAHOUT-1384) Executing the MR version of Naive Bayes/CNB of classify_20newgroups.sh fails.

2013-12-20 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1384:
--

Status: Patch Available  (was: Open)

 Executing the MR version of Naive Bayes/CNB of classify_20newgroups.sh fails.
 -

 Key: MAHOUT-1384
 URL: https://issues.apache.org/jira/browse/MAHOUT-1384
 Project: Mahout
  Issue Type: Bug
  Components: Examples
Affects Versions: 0.8
Reporter: Suneel Marthi
Assignee: Suneel Marthi
Priority: Minor
 Fix For: 0.9

 Attachments: MAHOUT-1384.patch


 Executing the MR version of Naive Bayes/CNB of classify_20newgroups.sh fails. 
  This is because the example files are not copied to HDFS for the MR version 
 (like what's presently being done in cluster-reuters.sh).





[jira] [Updated] (MAHOUT-1384) Executing the MR version of Naive Bayes/CNB of classify_20newgroups.sh fails.

2013-12-20 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1384:
--

Attachment: MAHOUT-1384.patch

 Executing the MR version of Naive Bayes/CNB of classify_20newgroups.sh fails.
 -

 Key: MAHOUT-1384
 URL: https://issues.apache.org/jira/browse/MAHOUT-1384
 Project: Mahout
  Issue Type: Bug
  Components: Examples
Affects Versions: 0.8
Reporter: Suneel Marthi
Assignee: Suneel Marthi
Priority: Minor
 Fix For: 0.9

 Attachments: MAHOUT-1384.patch


 Executing the MR version of Naive Bayes/CNB of classify_20newgroups.sh fails. 
  This is because the example files are not copied to HDFS for the MR version 
 (like what's presently being done in cluster-reuters.sh).





[jira] [Updated] (MAHOUT-1384) Executing the MR version of Naive Bayes/CNB of classify_20newgroups.sh fails in seqdirectory step.

2013-12-20 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1384:
--

Summary: Executing the MR version of Naive Bayes/CNB of 
classify_20newgroups.sh fails in seqdirectory step.  (was: Executing the MR 
version of Naive Bayes/CNB of classify_20newgroups.sh fails.)

 Executing the MR version of Naive Bayes/CNB of classify_20newgroups.sh fails 
 in seqdirectory step.
 --

 Key: MAHOUT-1384
 URL: https://issues.apache.org/jira/browse/MAHOUT-1384
 Project: Mahout
  Issue Type: Bug
  Components: Examples
Affects Versions: 0.8
Reporter: Suneel Marthi
Assignee: Suneel Marthi
Priority: Minor
 Fix For: 0.9

 Attachments: MAHOUT-1384.patch


 Executing the MR version of Naive Bayes/CNB of classify_20newgroups.sh fails. 
  This is because the example files are not copied to HDFS for the MR version 
 (like what's presently being done in cluster-reuters.sh).





[jira] [Updated] (MAHOUT-1384) Executing the MR version of Naive Bayes/CNB of classify_20newgroups.sh fails in seqdirectory step.

2013-12-20 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1384:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Fix committed to trunk.

 Executing the MR version of Naive Bayes/CNB of classify_20newgroups.sh fails 
 in seqdirectory step.
 --

 Key: MAHOUT-1384
 URL: https://issues.apache.org/jira/browse/MAHOUT-1384
 Project: Mahout
  Issue Type: Bug
  Components: Examples
Affects Versions: 0.8
Reporter: Suneel Marthi
Assignee: Suneel Marthi
Priority: Minor
 Fix For: 0.9

 Attachments: MAHOUT-1384.patch


 Executing the MR version of Naive Bayes/CNB of classify_20newgroups.sh fails. 
  This is because the example files are not copied to HDFS for the MR version 
 (like what's presently being done in cluster-reuters.sh).





Re: Mahout 0.9 release

2013-12-20 Thread Gokhan Capan
+1 for 1.0.

This is more challenging than expected (the old hadoop 0.23 profile
support is misleading)

Sent from my iPhone

 On Dec 19, 2013, at 19:48, Andrew Musselman andrew.mussel...@gmail.com 
 wrote:

 +1


 On Thu, Dec 19, 2013 at 9:20 AM, Suneel Marthi suneel_mar...@yahoo.com wrote:

 +1

 Sent from my iPhone

 On Dec 19, 2013, at 12:17 PM, Frank Scholten fr...@frankscholten.nl
 wrote:

 I am looking at M-1329 (Support for Hadoop 2.x) as we speak. This change
 requires quite some testing and I prefer to push this to 1.0. I am thinking
 of creating a unit test that starts miniclusters for each version and runs
 a job in them.




 On Thu, Dec 19, 2013 at 12:28 AM, Suneel Marthi suneel_mar...@yahoo.com
 wrote:

 There's M-1329 that covers this. Hopefully it should make it for 0.9

 Sent from my iPhone

 On Dec 18, 2013, at 6:20 PM, Isabel Drost-Fromm isa...@apache.org
 wrote:

 On Mon, 16 Dec 2013 23:16:36 +0200
 Gokhan Capan gkhn...@gmail.com wrote:

 M-1354 (Support for Hadoop 2.x) - Patch available.
 Gokhan, any updates on this?

 Nope, still couldn't make it work.


 Should we push that for 1.0 then (if this is shortly before completion
 and there's too much in 1.0 to push for a release early next year, I'd
 also be happy to have a smaller release between now and Berlin
 Buzzwords that includes the fix...).

 Isabel



[jira] [Updated] (MAHOUT-1030) Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable

2013-12-20 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1030:
--

Fix Version/s: (was: 1.0)

 Regression: Clustered Points Should be WeightedPropertyVectorWritable not 
 WeightedVectorWritable
 

 Key: MAHOUT-1030
 URL: https://issues.apache.org/jira/browse/MAHOUT-1030
 Project: Mahout
  Issue Type: Bug
  Components: Clustering, Integration
Affects Versions: 0.7
Reporter: Jeff Eastman
Assignee: Andrew Musselman
 Fix For: 0.9

 Attachments: MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, 
 MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch


 Looks like this won't make it into this build. Pretty widespread impact on 
 code and tests and I don't know which properties were implemented in the old 
 version. I will create a JIRA and post my interim results.
 On 6/8/12 12:21 PM, Jeff Eastman wrote:
  That's a reversion that evidently got in when the new 
  ClusterClassificationDriver was introduced. It should be a pretty easy fix 
  and I will see if I can make the change before Paritosh cuts the release 
  bits tonight.
 
  On 6/7/12 1:00 PM, Pat Ferrel wrote:
  It appears that in kmeans the clusteredPoints are now written as 
  WeightedVectorWritable where in mahout 0.6 they were 
  WeightedPropertyVectorWritable? This means that the distance from the 
  centroid is no longer stored here? Why? I hope I'm wrong because that is 
  not a welcome change. How is one to order clustered docs by distance from 
  cluster centroid?
 
  I'm sure I could calculate the distance but that would mean looking up the 
  centroid for the cluster id given in the above WeightedVectorWritable, 
  which means iterating through all the clusters for each clustered doc. In 
  my case the number of clusters could be fairly large.
 
  Am I missing something?
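
 The lookup cost raised above can be sketched as follows. This is only an
 illustration, assuming the centroids are first loaded into an in-memory map
 keyed by cluster id so that each clustered point needs one lookup rather than
 a scan over all clusters. The names (order_by_distance, euclidean) are
 hypothetical and plain Python lists stand in for Mahout vectors; none of this
 is the Mahout API.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def order_by_distance(centroids, clustered_points):
    """Group points by cluster id, each group sorted by distance to its centroid.

    centroids: dict of cluster_id -> centroid vector (hypothetical index)
    clustered_points: iterable of (cluster_id, doc_id, vector)
    """
    grouped = {}
    for cluster_id, doc_id, vector in clustered_points:
        centroid = centroids[cluster_id]  # one dict lookup, no scan over clusters
        distance = euclidean(vector, centroid)
        grouped.setdefault(cluster_id, []).append((distance, doc_id))
    return {cid: sorted(members) for cid, members in grouped.items()}


# Toy data: two clusters, three documents.
centroids = {0: [0.0, 0.0], 1: [10.0, 10.0]}
points = [(0, "a", [3.0, 4.0]), (0, "b", [1.0, 0.0]), (1, "c", [10.0, 12.0])]
ranked = order_by_distance(centroids, points)
```

 Storing the distance with each clustered point, as the property-carrying
 writable did, avoids even this recomputation.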
 
 
 





[jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

2013-12-20 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853937#comment-13853937
 ] 

Suneel Marthi commented on MAHOUT-976:
--

Can this be marked as Duplicate of M-1265 since the code for M-1265 was 
committed to trunk?

 Implement Multilayer Perceptron
 ---

 Key: MAHOUT-976
 URL: https://issues.apache.org/jira/browse/MAHOUT-976
 Project: Mahout
  Issue Type: New Feature
Affects Versions: 0.7
Reporter: Christian Herta
Assignee: Ted Dunning
Priority: Minor
  Labels: multilayer, networks, neural, perceptron
 Fix For: Backlog

 Attachments: MAHOUT-976.patch, MAHOUT-976.patch, MAHOUT-976.patch, 
 MAHOUT-976.patch

   Original Estimate: 80h
  Remaining Estimate: 80h

 Implement a multi layer perceptron
  * via Matrix Multiplication
  * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: 
 Efficient BackProp
  * arbitrary number of hidden layers (also 0  - just the linear model)
  * connection between proximate layers only 
  * different cost and activation functions (different activation function in 
 each layer) 
  * test of backprop by gradient checking 
  * normalization of the inputs (storeable) as part of the model
  
 First:
  * implementation of stochastic gradient descent, like the gradient machine
  * simple gradient descent incl. momentum
 Later (new jira issues):  
  * Distributed Batch learning (see below)  
  * Stacked (Denoising) Autoencoder - Feature Learning
  * advanced cost minimization, such as 2nd-order methods, conjugate gradient, etc.
 Distribution of learning can be done by (batch learning):
  1 Partitioning of the data into x chunks 
  2 Learning the weight changes as matrices in each chunk
  3 Combining the matrices and updating the weights - back to 2
 Maybe this procedure can be done with random parts of the chunks (distributed 
 quasi online learning). 
 Batch learning with delta-bar-delta heuristics for adapting the learning 
 rates.
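
 The "test of backprop by gradient checking" item in the list above can be
 sketched as follows. This is a minimal illustration, not the code from any
 attached patch: it assumes a tiny network with one tanh hidden layer, a linear
 output and a squared-error cost, and all function names and the flat parameter
 layout are hypothetical. The analytic gradient from backpropagation is
 compared against central finite differences.

```python
import math


def forward(params, x, n_hidden):
    """One tanh hidden layer, linear output.

    params is a flat list: W1 (n_hidden * n_in), b1 (n_hidden), w2 (n_hidden), b2.
    """
    n_in = len(x)
    w1 = params[: n_hidden * n_in]
    b1 = params[n_hidden * n_in : n_hidden * n_in + n_hidden]
    w2 = params[n_hidden * n_in + n_hidden : n_hidden * n_in + 2 * n_hidden]
    b2 = params[-1]
    hidden = [
        math.tanh(sum(w1[j * n_in + i] * x[i] for i in range(n_in)) + b1[j])
        for j in range(n_hidden)
    ]
    output = sum(w2[j] * hidden[j] for j in range(n_hidden)) + b2
    return hidden, output


def cost(params, x, target, n_hidden):
    """Squared-error cost 0.5 * (output - target)^2."""
    _, output = forward(params, x, n_hidden)
    return 0.5 * (output - target) ** 2


def backprop(params, x, target, n_hidden):
    """Analytic gradient of the cost, in the same flat layout as params."""
    n_in = len(x)
    hidden, output = forward(params, x, n_hidden)
    w2 = params[n_hidden * n_in + n_hidden : n_hidden * n_in + 2 * n_hidden]
    delta_out = output - target
    # tanh'(z) = 1 - tanh(z)^2
    delta_hidden = [delta_out * w2[j] * (1.0 - hidden[j] ** 2) for j in range(n_hidden)]
    grad_w1 = [delta_hidden[j] * x[i] for j in range(n_hidden) for i in range(n_in)]
    grad_w2 = [delta_out * hidden[j] for j in range(n_hidden)]
    return grad_w1 + delta_hidden + grad_w2 + [delta_out]


def numerical_gradient(params, x, target, n_hidden, eps=1e-5):
    """Central finite differences, one parameter at a time."""
    grad = []
    for k in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[k] += eps
        minus[k] -= eps
        diff = cost(plus, x, target, n_hidden) - cost(minus, x, target, n_hidden)
        grad.append(diff / (2 * eps))
    return grad


# Arbitrary illustrative values: 2 inputs, 2 hidden units, 9 parameters.
params = [0.1, -0.2, 0.3, 0.4, 0.05, -0.05, 0.7, -0.6, 0.2]
x, target, n_hidden = [1.0, 2.0], 0.5, 2
analytic = backprop(params, x, target, n_hidden)
numeric = numerical_gradient(params, x, target, n_hidden)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
```

 A large max_diff would point to a bug in the backward pass; a value close to
 zero suggests the two gradients agree.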
  





[jira] [Created] (MAHOUT-1385) Caching Encoders don't cache

2013-12-20 Thread Johannes Schulte (JIRA)
Johannes Schulte created MAHOUT-1385:


 Summary: Caching Encoders don't cache
 Key: MAHOUT-1385
 URL: https://issues.apache.org/jira/browse/MAHOUT-1385
 Project: Mahout
  Issue Type: Bug
Affects Versions: 0.8
Reporter: Johannes Schulte
Priority: Minor


The Caching... line of encoders contains code for caching the hash codes of 
terms added to the vector. However, the method hashForProbe inside these 
classes is never called, because its signature takes a String for the 
original-form parameter (instead of byte[] like the other encoders).

Changing this to byte[], however, would lose Java's internal caching of the 
String's hash code, which is used as the key in the cache map, and would 
trigger another hash code calculation.







[jira] [Updated] (MAHOUT-1385) Caching Encoders don't cache

2013-12-20 Thread Johannes Schulte (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johannes Schulte updated MAHOUT-1385:
-

Attachment: MAHOUT-1385-test.patch

No solution but demonstration of the defect

 Caching Encoders don't cache
 

 Key: MAHOUT-1385
 URL: https://issues.apache.org/jira/browse/MAHOUT-1385
 Project: Mahout
  Issue Type: Bug
Affects Versions: 0.8
Reporter: Johannes Schulte
Priority: Minor
 Attachments: MAHOUT-1385-test.patch


 The Caching... line of encoders contains code for caching the hash codes of 
 terms added to the vector. However, the method hashForProbe inside these 
 classes is never called, because its signature takes a String for the 
 original-form parameter (instead of byte[] like the other encoders).
 Changing this to byte[], however, would lose Java's internal caching of the 
 String's hash code, which is used as the key in the cache map, and would 
 trigger another hash code calculation.





[jira] [Commented] (MAHOUT-1384) Executing the MR version of Naive Bayes/CNB of classify_20newgroups.sh fails in seqdirectory step.

2013-12-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854018#comment-13854018
 ] 

Hudson commented on MAHOUT-1384:


SUCCESS: Integrated in Mahout-Quality #2377 (See 
[https://builds.apache.org/job/Mahout-Quality/2377/])
MAHOUT-1384: Executing the MR version of Naive Bayes/CNB of 
classify_20newgroups.sh fails in seqdirectory step. (smarthi: rev 1552538)
* /mahout/trunk/CHANGELOG
* /mahout/trunk/examples/bin/classify-20newsgroups.sh


 Executing the MR version of Naive Bayes/CNB of classify_20newgroups.sh fails 
 in seqdirectory step.
 --

 Key: MAHOUT-1384
 URL: https://issues.apache.org/jira/browse/MAHOUT-1384
 Project: Mahout
  Issue Type: Bug
  Components: Examples
Affects Versions: 0.8
Reporter: Suneel Marthi
Assignee: Suneel Marthi
Priority: Minor
 Fix For: 0.9

 Attachments: MAHOUT-1384.patch


 Executing the MR version of Naive Bayes/CNB of classify_20newgroups.sh fails. 
  This is because the example files are not copied to HDFS for the MR version 
 (like what's presently being done in cluster-reuters.sh).





[jira] [Updated] (MAHOUT-1319) seqdirectory -filter argument silently ignored when run as MR

2013-12-20 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1319:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to trunk.

 seqdirectory -filter argument silently ignored when run as MR
 -

 Key: MAHOUT-1319
 URL: https://issues.apache.org/jira/browse/MAHOUT-1319
 Project: Mahout
  Issue Type: Bug
  Components: Integration
Affects Versions: 0.8
Reporter: Liz Merkhofer
Assignee: Suneel Marthi
  Labels: seqdirectory, text
 Fix For: 0.9

 Attachments: MAHOUT-1319-custom-filter.patch, MAHOUT-1319.patch


 When seqdirectory (Sequence Files from Input Directory) is run from the 
 command line with a custom filter specified via the -filter parameter, the 
 argument is ignored and the default PrefixAdditionFilter is applied to the 
 input. No exception is thrown.
 When the same command is run with -xm sequential, the filter is found and 
 works as expected.





[jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

2013-12-20 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854566#comment-13854566
 ] 

Ted Dunning commented on MAHOUT-976:


Seems like a dupe to me.  Yexi has incorporated the good bits.

 Implement Multilayer Perceptron
 ---

 Key: MAHOUT-976
 URL: https://issues.apache.org/jira/browse/MAHOUT-976
 Project: Mahout
  Issue Type: New Feature
Affects Versions: 0.7
Reporter: Christian Herta
Assignee: Ted Dunning
Priority: Minor
  Labels: multilayer, networks, neural, perceptron
 Fix For: Backlog

 Attachments: MAHOUT-976.patch, MAHOUT-976.patch, MAHOUT-976.patch, 
 MAHOUT-976.patch

   Original Estimate: 80h
  Remaining Estimate: 80h

 Implement a multi layer perceptron
  * via Matrix Multiplication
  * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: 
 Efficient BackProp
  * arbitrary number of hidden layers (also 0  - just the linear model)
  * connection between proximate layers only 
  * different cost and activation functions (different activation function in 
 each layer) 
  * test of backprop by gradient checking 
  * normalization of the inputs (storeable) as part of the model
  
 First:
  * implementation of stochastic gradient descent, like the gradient machine
  * simple gradient descent incl. momentum
 Later (new jira issues):  
  * Distributed Batch learning (see below)  
  * Stacked (Denoising) Autoencoder - Feature Learning
  * advanced cost minimization, such as 2nd-order methods, conjugate gradient, etc.
 Distribution of learning can be done by (batch learning):
  1 Partitioning of the data into x chunks 
  2 Learning the weight changes as matrices in each chunk
  3 Combining the matrices and updating the weights - back to 2
 Maybe this procedure can be done with random parts of the chunks (distributed 
 quasi online learning). 
 Batch learning with delta-bar-delta heuristics for adapting the learning 
 rates.
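
 The three batch-learning steps above (partition, learn weight changes per
 chunk, combine and update) can be sketched as follows. This is a sequential
 simulation under stated assumptions: it uses the zero-hidden-layer case (the
 plain linear model) with a squared-error cost, the chunks are processed in a
 simple loop where a real job would run them in parallel, and every name here
 is hypothetical rather than taken from Mahout.

```python
def partition(data, n_chunks):
    """Step 1: split the data into n_chunks interleaved chunks."""
    return [data[i::n_chunks] for i in range(n_chunks)]


def chunk_gradient(weights, chunk):
    """Step 2: summed squared-error gradient of a linear model over one chunk."""
    grad = [0.0] * len(weights)
    for features, target in chunk:
        error = sum(w * f for w, f in zip(weights, features)) - target
        for i, f in enumerate(features):
            grad[i] += error * f
    return grad


def batch_step(weights, chunks, learning_rate):
    """Step 3: combine the per-chunk gradients and update the weights once."""
    n_examples = sum(len(chunk) for chunk in chunks)
    partials = [chunk_gradient(weights, chunk) for chunk in chunks]
    combined = [
        sum(partial[i] for partial in partials) / n_examples
        for i in range(len(weights))
    ]
    return [w - learning_rate * g for w, g in zip(weights, combined)]


# Toy data generated from y = 2x + 1; the first feature is a constant bias input.
data = [([1.0, x], 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0]]
chunks = partition(data, 2)
weights = [0.0, 0.0]
for _ in range(2000):  # "back to 2": repeat steps 2 and 3
    weights = batch_step(weights, chunks, 0.1)
```

 Because the per-chunk gradients are summed before the update, the result of
 one step matches a full-batch step over the unpartitioned data up to
 floating-point rounding.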
  





[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

2013-12-20 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-976:
-

   Resolution: Duplicate
Fix Version/s: 0.9
   Status: Resolved  (was: Patch Available)

 Implement Multilayer Perceptron
 ---

 Key: MAHOUT-976
 URL: https://issues.apache.org/jira/browse/MAHOUT-976
 Project: Mahout
  Issue Type: New Feature
Affects Versions: 0.7
Reporter: Christian Herta
Assignee: Ted Dunning
Priority: Minor
  Labels: multilayer, networks, neural, perceptron
 Fix For: Backlog, 0.9

 Attachments: MAHOUT-976.patch, MAHOUT-976.patch, MAHOUT-976.patch, 
 MAHOUT-976.patch

   Original Estimate: 80h
  Remaining Estimate: 80h

 Implement a multi layer perceptron
  * via Matrix Multiplication
  * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: 
 Efficient BackProp
  * arbitrary number of hidden layers (also 0  - just the linear model)
  * connection between proximate layers only 
  * different cost and activation functions (different activation function in 
 each layer) 
  * test of backprop by gradient checking 
  * normalization of the inputs (storeable) as part of the model
  
 First:
  * implementation of stochastic gradient descent, like the gradient machine
  * simple gradient descent incl. momentum
 Later (new jira issues):  
  * Distributed Batch learning (see below)  
  * Stacked (Denoising) Autoencoder - Feature Learning
  * advanced cost minimization, such as 2nd-order methods, conjugate gradient, etc.
 Distribution of learning can be done by (batch learning):
  1 Partitioning of the data into x chunks 
  2 Learning the weight changes as matrices in each chunk
  3 Combining the matrices and updating the weights - back to 2
 Maybe this procedure can be done with random parts of the chunks (distributed 
 quasi online learning). 
 Batch learning with delta-bar-delta heuristics for adapting the learning 
 rates.
  





[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

2013-12-20 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-976:
-

Fix Version/s: (was: Backlog)

 Implement Multilayer Perceptron
 ---

 Key: MAHOUT-976
 URL: https://issues.apache.org/jira/browse/MAHOUT-976
 Project: Mahout
  Issue Type: New Feature
Affects Versions: 0.7
Reporter: Christian Herta
Assignee: Ted Dunning
Priority: Minor
  Labels: multilayer, networks, neural, perceptron
 Fix For: 0.9

 Attachments: MAHOUT-976.patch, MAHOUT-976.patch, MAHOUT-976.patch, 
 MAHOUT-976.patch

   Original Estimate: 80h
  Remaining Estimate: 80h

 Implement a multi layer perceptron
  * via Matrix Multiplication
  * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: 
 Efficient BackProp
  * arbitrary number of hidden layers (also 0  - just the linear model)
  * connection between proximate layers only 
  * different cost and activation functions (different activation function in 
 each layer) 
  * test of backprop by gradient checking 
  * normalization of the inputs (storeable) as part of the model
  
 First:
  * implementation of stochastic gradient descent, like the gradient machine
  * simple gradient descent incl. momentum
 Later (new jira issues):  
  * Distributed Batch learning (see below)  
  * Stacked (Denoising) Autoencoder - Feature Learning
  * advanced cost minimization, such as 2nd-order methods, conjugate gradient, etc.
 Distribution of learning can be done by (batch learning):
  1 Partitioning of the data into x chunks 
  2 Learning the weight changes as matrices in each chunk
  3 Combining the matrices and updating the weights - back to 2
 Maybe this procedure can be done with random parts of the chunks (distributed 
 quasi online learning). 
 Batch learning with delta-bar-delta heuristics for adapting the learning 
 rates.
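
 The "simple gradient descent incl. momentum" item above can be sketched as
 follows. This is a generic illustration of the classical momentum update on a
 toy quadratic, not the implementation from any attached patch; the function
 name, the default learning rate and the momentum coefficient are assumptions
 chosen for the example.

```python
def momentum_descent(gradient, start, learning_rate=0.1, momentum=0.9, steps=500):
    """Gradient descent with classical momentum.

    velocity <- momentum * velocity - learning_rate * gradient(position)
    position <- position + velocity
    """
    position = list(start)
    velocity = [0.0] * len(position)
    for _ in range(steps):
        grad = gradient(position)
        velocity = [momentum * v - learning_rate * g for v, g in zip(velocity, grad)]
        position = [p + v for p, v in zip(position, velocity)]
    return position


def toy_gradient(p):
    """Gradient of f(x, y) = (x - 3)^2 + 2 * (y + 1)^2, minimised at (3, -1)."""
    return [2.0 * (p[0] - 3.0), 4.0 * (p[1] + 1.0)]


minimum = momentum_descent(toy_gradient, [0.0, 0.0])
```

 Setting momentum to 0 reduces the update to plain gradient descent, which
 makes the two variants easy to compare on the same cost function.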
  





Mahout 0.9 Release - code freeze

2013-12-20 Thread Suneel Marthi
We fixed all the bugs planned for 0.9 and the code has been committed to trunk. 
The plan is to freeze the trunk this Sunday in preparation for the 0.9 release.

Please let this group know if there's any code that needs to make it to trunk 
before the code freeze date; otherwise, please hold off on committing new code 
to trunk.

Thank you.

Re: Getting off MODERATE list?

2013-12-20 Thread Ted Dunning
Hmm

I should probably be ON that list, but clearly am not.  Not being on the
list, I probably can't help.

Isabel, Grant,

Are you guys on this?  Can you boot Otis and add me?




On Fri, Dec 20, 2013 at 4:27 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 Hi,

 Does anyone know how I can get off the Mahout moderator list? It would be an
 awesome Christmas present. :)  Any pointers would be greatly appreciated.

 Thanks,
 Otis
 --
 Performance Monitoring * Log Analytics * Search Analytics
 Solr & Elasticsearch Support * http://sematext.com/


 -- Forwarded message --
 From: dev-reject-1387561207.4887.lfipnnphncnmnkkno...@mahout.apache.org
 Date: Fri, Dec 20, 2013 at 12:40 PM
 Subject: MODERATE for dev@mahout.apache.org
 To:
 Cc: dev-allow-tc.1387561207.llgedpelfhophepigcda-pat=
 occamsmachete@mahout.apache.org



 To approve:
dev-accept-1387561207.4887.lfipnnphncnmnkkno...@mahout.apache.org
 To reject:
dev-reject-1387561207.4887.lfipnnphncnmnkkno...@mahout.apache.org
 To give a reason to reject:
 %%% Start comment
 %%% End comment