[jira] [Updated] (MAHOUT-1285) Arff loader can misparse string data as double

2013-11-28 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1285:
--

Attachment: MAHOUT-1285.patch

> Arff loader can misparse string data as double
> --
>
> Key: MAHOUT-1285
> URL: https://issues.apache.org/jira/browse/MAHOUT-1285
> Project: Mahout
> Issue Type: Bug
> Affects Versions: 0.9
> Environment: Linux Ubuntu 12.4
> Reporter: Neil Walkinshaw
> Fix For: Backlog
>
> Attachments: MAHOUT-1285.patch, tempArff
>
>
> Have successfully loaded numerous ARFF files with Mahout (originally 
> generated via WEKA). The files contain randomly generated data. For a 
> specific random seed, the following exception is thrown:
> java.lang.NumberFormatException: For input string: "b1shkt70694difsmmmdv0ikmoh"
>   at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
>   at java.lang.Double.parseDouble(Double.java:540)
>   at org.apache.mahout.utils.vectors.arff.MapBackedARFFModel.processNumeric(MapBackedARFFModel.java:146)
>   at org.apache.mahout.utils.vectors.arff.MapBackedARFFModel.getValue(MapBackedARFFModel.java:97)
>   at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:77)
>   at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:30)
>   at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>   at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>   at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:44)
>   at org.apache.mahout.utils.vectors.arff.Driver.writeFile(Driver.java:251)
>   at org.apache.mahout.utils.vectors.arff.Driver.main(Driver.java:145)
>   at libInterfaces.MahoutTraceBuilder.generateMahoutFile(MahoutTraceBuilder.java:38)
>   at libInterfaces.MahoutTraceBuilder.generateMahoutReader(MahoutTraceBuilder.java:42)
>   at tests.InputTester.testMahoutMeansShift(InputTester.java:111)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAHOUT-1285) Arff loader can misparse string data as double

2013-11-28 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1285:
--

Status: Patch Available  (was: Open)

A simple fix would be to check whether the input String is in a numeric format
before parsing it.
See the attached patch; it has not been tested.
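
To illustrate the kind of guard described here, the following is a minimal sketch and not the content of the attached MAHOUT-1285.patch. The class and method names (NumericGuardSketch, isNumeric, parseOrDefault) are hypothetical; the only library call used is the standard Double.parseDouble.

```java
// Hypothetical sketch; not the attached MAHOUT-1285.patch.
final class NumericGuardSketch {

  /** Returns true only if Double.parseDouble would accept the token. */
  static boolean isNumeric(String token) {
    if (token == null) {
      return false; // parseDouble(null) would throw NullPointerException
    }
    try {
      Double.parseDouble(token.trim());
      return true;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  /** Parses a numeric token, or returns the fallback instead of throwing. */
  static double parseOrDefault(String token, double fallback) {
    return isNumeric(token) ? Double.parseDouble(token.trim()) : fallback;
  }

  public static void main(String[] args) {
    // The token from the reported stack trace is rejected rather than parsed.
    System.out.println(isNumeric("b1shkt70694difsmmmdv0ikmoh"));
    System.out.println(isNumeric("3.14"));
    System.out.println(parseOrDefault("b1shkt70694difsmmmdv0ikmoh", Double.NaN));
  }
}
```

A guard like this only avoids the exception. If a string value reaches a numeric column in the first place, the attribute type or column alignment may have been misread upstream, so a silent fallback could hide the root cause.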



(MAHOUT-1285) Re-generate the exception programmatically

2013-11-28 Thread Tharindu Rusira
Hi,
Pardon if I'm missing something trivial; I'm new to Mahout.
Is there a way to reproduce this exception scenario from code (within a
debugger)? I could only find this [1], which explains how to load ARFF files
from the command line.


[1]
https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Weka's+ARFF+Format#

Thanks.


[jira] [Issue Comment Deleted] (MAHOUT-1285) Arff loader can misparse string data as double

2013-11-28 Thread Tharindu Rusira (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tharindu Rusira updated MAHOUT-1285:


Comment: was deleted

(was: [~neilwalkinshaw], could you please show me how to re-generate this 
exception? 
Thanks)



[jira] [Commented] (MAHOUT-1285) Arff loader can misparse string data as double

2013-11-28 Thread Tharindu Rusira (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835177#comment-13835177
 ] 

Tharindu Rusira commented on MAHOUT-1285:
-

[~neilwalkinshaw], could you please show me how to re-generate this exception? 
Thanks



[jira] [Commented] (MAHOUT-1261) TasteHadoopUtils.idToIndex can return an int that has size Integer.MAX_VALUE

2013-11-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835094#comment-13835094
 ] 

Hudson commented on MAHOUT-1261:


SUCCESS: Integrated in Mahout-Quality #2338 (See [https://builds.apache.org/job/Mahout-Quality/2338/])
MAHOUT-1261: TasteHadoopUtils.idToIndex can return an int that has size Integer.MAX_VALUE (smarthi: rev 1546379)
* /mahout/trunk/CHANGELOG
* /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/TasteHadoopUtils.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/hadoop/TasteHadoopUtilsTest.java


> TasteHadoopUtils.idToIndex can return an int that has size Integer.MAX_VALUE
> 
>
> Key: MAHOUT-1261
> URL: https://issues.apache.org/jira/browse/MAHOUT-1261
> Project: Mahout
> Issue Type: Bug
> Components: Collaborative Filtering
> Affects Versions: 0.8
> Reporter: Dan Filimon
> Assignee: Sebastian Schelter
> Priority: Minor
> Fix For: 0.9
>
> Attachments: MAHOUT-1261.patch
>
>
> I'm running ItemSimilarityJob on a very large (~600M by 4B) matrix that's 
> very sparse (total set of associations is 630MB).
> The job fails because of an IndexException in ToUserVectorsReducer.
> TasteHadoopUtils.idToIndex(long id) hashes a long with:
> 0x7FFFFFFF & Longs.hashCode(id) (line 
> o.a.m.cf.taste.hadoop.TasteHadoopUtils:57).
> For some id (I don't know what value), the result returned is 
> Integer.MAX_VALUE.
> This cannot be set in the userVector because the cardinality of that is also 
> Integer.MAX_VALUE and it throws an exception.
> So, the issue is that values from 0 to INT_MAX are returned by idToIndex but 
> the vector only has 0 to INT_MAX - 1 possible entries.
> It's a nasty little off-by-one bug.
> I'm thinking of just % size when setting.
> [~ssc] & everyone else, thoughts? :)
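
The off-by-one described above can be reproduced in a few lines. This is a sketch under assumptions, not the committed fix: it assumes the mask is the 31-bit value 0x7FFFFFFF (the only way Integer.MAX_VALUE is reachable), uses the JDK's Long.hashCode(long), which computes the same value as Guava's Longs.hashCode, and applies the "% size" idea with size taken to be Integer.MAX_VALUE. The class and method names are hypothetical.

```java
// Hypothetical sketch of the MAHOUT-1261 off-by-one; not the committed patch.
final class IdToIndexSketch {

  /** Masked hash as described in the report: result lies in [0, Integer.MAX_VALUE]. */
  static int maskedIndex(long id) {
    return 0x7FFFFFFF & Long.hashCode(id);
  }

  /** Same hash reduced modulo the cardinality: result lies in [0, Integer.MAX_VALUE - 1]. */
  static int boundedIndex(long id) {
    return maskedIndex(id) % Integer.MAX_VALUE;
  }

  public static void main(String[] args) {
    long id = Integer.MAX_VALUE; // one id whose hash is exactly Integer.MAX_VALUE
    System.out.println(maskedIndex(id) == Integer.MAX_VALUE); // index equals the cardinality
    System.out.println(boundedIndex(id));                     // now a valid index
  }
}
```

The modulo maps the single out-of-range value onto index 0, so it adds one more collision to a hash that already collides. Whether that is acceptable depends on how the index is used downstream.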





[jira] [Resolved] (MAHOUT-1350) Bean Utils JarClassLoader Warnings

2013-11-28 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1350.
---

   Resolution: Not A Problem
Fix Version/s: 0.9

> Bean Utils JarClassLoader Warnings
> --
>
> Key: MAHOUT-1350
> URL: https://issues.apache.org/jira/browse/MAHOUT-1350
> Project: Mahout
> Issue Type: Bug
> Reporter: David Williams
> Assignee: Suneel Marthi
> Priority: Minor
> Fix For: 0.9
>
>
> Hi all,
> I am trying to embed a user-based recommender in a web service using embedded 
> Jetty and Spring 3.  However, including the Mahout libraries leads to this 
> collision.  It means I CANNOT use Mahout in its current implementation.
> {code}
> JarClassLoader: Warning: org/apache/commons/collections/FastHashMap.class in 
> lib/commons-beanutils-1.7.0.jar is hidden by 
> lib/commons-collections-3.2.1.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/collections/ArrayStack.class in 
> lib/commons-beanutils-1.7.0.jar is hidden by 
> lib/commons-collections-3.2.1.jar (with different bytecode)
> JarClassLoader: Warning: 
> org/apache/commons/collections/FastHashMap$Values.class in 
> lib/commons-beanutils-1.7.0.jar is hidden by 
> lib/commons-collections-3.2.1.jar (with different bytecode)
> JarClassLoader: Warning: 
> org/apache/commons/collections/FastHashMap$CollectionView$CollectionViewIterator.class
>  in lib/commons-beanutils-1.7.0.jar is hidden by 
> lib/commons-collections-3.2.1.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/collections/FastHashMap$1.class 
> in lib/commons-beanutils-1.7.0.jar is hidden by 
> lib/commons-collections-3.2.1.jar (with different bytecode)
> JarClassLoader: Warning: 
> org/apache/commons/collections/BufferUnderflowException.class in 
> lib/commons-beanutils-1.7.0.jar is hidden by 
> lib/commons-collections-3.2.1.jar (with different bytecode)
> JarClassLoader: Warning: 
> org/apache/commons/collections/FastHashMap$KeySet.class in 
> lib/commons-beanutils-1.7.0.jar is hidden by 
> lib/commons-collections-3.2.1.jar (with different bytecode)
> JarClassLoader: Warning: 
> org/apache/commons/collections/FastHashMap$CollectionView.class in 
> lib/commons-beanutils-1.7.0.jar is hidden by 
> lib/commons-collections-3.2.1.jar (with different bytecode)
> JarClassLoader: Warning: 
> org/apache/commons/collections/FastHashMap$EntrySet.class in 
> lib/commons-beanutils-1.7.0.jar is hidden by 
> lib/commons-collections-3.2.1.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/beanutils/BasicDynaBean.class in 
> lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/beanutils/BasicDynaClass.class in 
> lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: 
> org/apache/commons/beanutils/BeanAccessLanguageException.class in 
> lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/beanutils/BeanUtils.class in 
> lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/beanutils/BeanUtilsBean$1.class 
> in lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/beanutils/BeanUtilsBean.class in 
> lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/beanutils/ConstructorUtils.class 
> in lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: 
> org/apache/commons/beanutils/ContextClassLoaderLocal.class in 
> lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: 
> org/apache/commons/beanutils/ConversionException.class in 
> lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/beanutils/ConvertUtils.class in 
> lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/beanutils/ConvertUtilsBean.class 
> in lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/beanutils/Converter.class in 
> lib/commons-beanutils-core-1.8.0.jar is hidden by

[jira] [Commented] (MAHOUT-1350) Bean Utils JarClassLoader Warnings

2013-11-28 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835076#comment-13835076
 ] 

Suneel Marthi commented on MAHOUT-1350:
---

Resolving this as 'Not a Problem': the issue is caused by conflicting versions 
of similar third-party jars in Hadoop and Spring 3.x, and is definitely not a 
Mahout issue.
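
When two jars ship the same class, it can help to check which one the JVM actually loaded. The sketch below is only a diagnostic aid and not part of this issue's resolution; ClassOriginSketch and originOf are hypothetical names, and the calls used (Class.getProtectionDomain, ProtectionDomain.getCodeSource, CodeSource.getLocation) are standard JDK APIs.

```java
// Hypothetical diagnostic sketch: report where a loaded class came from.
final class ClassOriginSketch {

  /** Returns the jar or directory a class was loaded from, if the JVM records one. */
  static String originOf(Class<?> type) {
    java.security.CodeSource source = type.getProtectionDomain().getCodeSource();
    if (source == null || source.getLocation() == null) {
      // Classes loaded by the bootstrap loader have no code source.
      return "(bootstrap or unknown)";
    }
    return source.getLocation().toString();
  }

  public static void main(String[] args) throws ClassNotFoundException {
    // Pass a fully qualified name such as org.apache.commons.beanutils.BeanUtils.
    String name = args.length > 0 ? args[0] : "java.lang.String";
    System.out.println(name + " -> " + originOf(Class.forName(name)));
  }
}
```

Run with the application's classpath, this shows which of the competing jars wins for a given class name.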


[jira] [Updated] (MAHOUT-1364) Upgrade Mahout codebase to Lucene 4.6

2013-11-28 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1364:
--

Component/s: Clustering
 CLI
 Classification

> Upgrade Mahout codebase to Lucene 4.6
> -
>
> Key: MAHOUT-1364
> URL: https://issues.apache.org/jira/browse/MAHOUT-1364
> Project: Mahout
> Issue Type: Improvement
> Components: Classification, CLI, Clustering, Examples, Integration
> Affects Versions: 0.8
> Reporter: Suneel Marthi
> Assignee: Suneel Marthi
> Fix For: 0.9
>
>
> Parallel Randomized tests (using Carrot RandomizedRunner) fail on Mac OS for 
> code that invokes the Lucene API; see the discussion in M-1345.  The fix is to 
> upgrade to a Lucene version > 4.3.1 (which is the present Lucene version in 
> Mahout trunk).  





[jira] [Comment Edited] (MAHOUT-1364) Upgrade Mahout codebase to Lucene 4.6

2013-11-28 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835069#comment-13835069
 ] 

Suneel Marthi edited comment on MAHOUT-1364 at 11/28/13 8:42 PM:
-

My initial attempt at this broke all of the FeatureVectorEncoders, due to the 
strict TokenStream workflow in Lucene 4.6.  This may be more involved than 
initially anticipated. I will still target this for 0.9, but if we can't make 
it, it may have to be deferred to Release 1.0, with an upgrade to Lucene 4.5.1 
for the 0.9 release instead.


was (Author: smarthi):
My initial attempt at this broke all of the FeatureValueEncoders, due to the 
strict TokenStream workflow in Lucene 4.6.  This may be more involved than 
initially anticipated, will still target this for 0.9 but may have to be 
deferred to Release 1.0 and upgrade to Lucene 4.5.1 for 0.9 release if we can't 
make it.



[jira] [Commented] (MAHOUT-1364) Upgrade Mahout codebase to Lucene 4.6

2013-11-28 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835069#comment-13835069
 ] 

Suneel Marthi commented on MAHOUT-1364:
---

My initial attempt at this broke all of the FeatureValueEncoders, due to the 
strict TokenStream workflow in Lucene 4.6.  This may be more involved than 
initially anticipated, will still target this for 0.9 but may have to be 
deferred to Release 1.0 and upgrade to Lucene 4.5.1 for 0.9 release if we can't 
make it.



[jira] [Updated] (MAHOUT-1273) Single Pass Algorithm for Penalized Linear Regression with Cross Validation on MapReduce

2013-11-28 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1273:
--

Fix Version/s: (was: 0.9)
   1.0

Deferring to Release 1.0. [~kuny...@stanford.edu], feel free to bring this back 
to the 0.9 queue if you are around.

> Single Pass Algorithm for Penalized Linear Regression with Cross Validation 
> on MapReduce
> 
>
> Key: MAHOUT-1273
> URL: https://issues.apache.org/jira/browse/MAHOUT-1273
> Project: Mahout
> Issue Type: New Feature
> Affects Versions: 0.9
> Reporter: Kun Yang
> Labels: documentation, features, patch, test
> Fix For: 1.0
>
> Attachments: Algorithm and Numeric Stability.pdf, Examples.pdf, 
> Manual and Example.pdf, Manual and Example.pdf, Notes.pdf, 
> PenalizedLinear.pdf, PenalizedLinearRegression.patch, java files.pdf
>
> Original Estimate: 720h
> Remaining Estimate: 720h
>
> Penalized linear regression methods such as Lasso and Elastic-net are widely 
> used in machine learning, but there are no very efficient scalable 
> implementations on MapReduce.
> The published distributed algorithms for solving this problem are either 
> iterative (which is not good for MapReduce, see Stephen Boyd's paper) or 
> approximate (what if we need exact solutions, see Parallelized stochastic 
> gradient descent); another disadvantage of these algorithms is that they 
> cannot do cross validation in the training phase, so they require a 
> user-specified penalty parameter in advance. 
> My ideas can train the model with cross validation in a single pass. They are 
> based on some simple observations.
> The core algorithm is a modified version of coordinate descent (see J. 
> Friedman's paper). Its authors implemented a very efficient R package, 
> "glmnet", which is the de facto standard for penalized regression.
> I have implemented a primitive version of this algorithm at Alpine Data 
> Labs.  
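
For readers unfamiliar with coordinate descent for the lasso, its per-coefficient update is built on the soft-thresholding operator S(z, g) = sign(z) * max(|z| - g, 0). The sketch below shows only that standard operator; it is not the single-pass algorithm proposed in this issue, and the names SoftThresholdSketch and softThreshold are hypothetical.

```java
// Hypothetical sketch of the standard soft-thresholding operator; not the
// single-pass MapReduce algorithm proposed in MAHOUT-1273.
final class SoftThresholdSketch {

  /** Shrinks z toward zero by gamma and clamps to zero when |z| <= gamma. */
  static double softThreshold(double z, double gamma) {
    return Math.signum(z) * Math.max(Math.abs(z) - gamma, 0.0);
  }

  public static void main(String[] args) {
    System.out.println(softThreshold(3.0, 1.0));  // shrunk toward zero
    System.out.println(softThreshold(-3.0, 1.0)); // shrunk toward zero, sign kept
    System.out.println(softThreshold(0.5, 1.0));  // inside the threshold: zeroed out
  }
}
```

The clamping to exactly zero is what gives lasso solutions their sparsity, and it is why the penalty parameter (gamma here) has such a strong effect on the fitted model.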



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: Mahout 0.9 release

2013-11-28 Thread Ted Dunning
On Thu, Nov 28, 2013 at 7:36 AM, Suneel Marthi wrote:

> M-1273 - Kun Yang, Ted, defer this to next release ???
>

Yes.  We haven't heard from Kun for quite a while.


Re: Mahout 0.9 release

2013-11-28 Thread Yexi Jiang
I am working on M-1265.


2013/11/28 Suneel Marthi 

> Update on Open JIRAs for 0.9:
>
> Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - all
> related to Wiki updates, please see Isabel's updates.
>
> M-1286 - Peng and Sebastian, we had talked about this during the last
> hangout. Can this be included in 0.9?
>
> M-1030 - Andrew Musselman, it's critical that we get this into 0.9; it's been
> deferred for the last 2 Mahout releases.
>
> M-1319, M-1328, M-1347, M-1350 - Suneel
>
>
> M-1265 - Multi Layer Perceptron, Yexi please look at my comments on
> Reviewboard.
>
> M-1273 - Kun Yang, Ted, defer this to next release ???
>
>
>
> M-1312, M-1256 - Stevo, could u take one of them
>
> On Thursday, November 28, 2013 5:01 AM, Isabel Drost-Fromm <
> isa...@apache.org> wrote:
>
> On Wed, 27 Nov 2013 14:23:11 -0800 (PST)
> Suneel Marthi  wrote:
> > Below are the Open issues for 0.9:-
>
> This looks like we should be targeting Dec. 9th as code freeze to me.
> What do you all think?
>
>
> > Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - All
> > related to Wiki updates, missing Wiki documentation and Wiki
> > migration to new CMS.  Isabel's working on M-1245 (migrating to new
> > CMS). Could some of the others be consolidated with that?
>
> I believe MAHOUT-1245 essentially is ready to be published - all I want
> before notifying INFRA to switch to the new cms based site is one other
> person to take at least a brief look.
>
> For MAHOUT-1304 - Sebastian, can you please check that the cms based
> site actually does fit on 1280px? We can close this issue then.
>
> MAHOUT-1305 - I think this should be turned into a task to actually
> delete most of the pages that have been migrated to the new CMS (almost
> all of them). Once 1245 is shipped, it would be great if a few more
> people could lend a hand in getting this done.
>
> MAHOUT-1307 - Can be closed once switched to CMS
>
> MAHOUT-1326 - This really relates to the old Confluence export plugin,
> no longer active, that we once used to generate static pages out of our
> wiki. Unless anyone on the Mahout dev list knows how to fully delete all
> exported static pages, we should file an issue with INFRA to ask for help
> getting those deleted. They definitely are confusing to users.
>
>
>
> > M-1286 - Peng and ssc, we had talked about this during the last
> > hangout. Can this be included in 0.9?
> >
> > M-1030 - Andrew Musselman? Any updates on this? It's important that we
> > fix this for 0.9
> >
> > M-1319, M-1328, M-1347, M-1364 - Suneel
> >
> > M-1273 - Kun Yang, remember talking about this in one of the earlier
> > hangouts; can't recall what was decided?
> >
> > M-1312, M-1256 - Dan Filimon (or Stevo??)
> >
> > M-996  someone could pick this up (if its still relevant with present
> > codebase i.e.)
>
> I think this can move to the next release - according to the
> contributor and Sebastian the patch is rather hacky and there for
> illustration purposes only. I'd rather see some more thought go into
> that instead of pushing to have this in 0.9.
>
>
> > M-1265 Yexi had submitted a patch for this, it would be good if this
> > could go in as part of 0.9
> >
> > M-1288 Solr Recommender - Pat Ferrell
> >
> > M-1285: Any takers for this?
>
> Would be nice to have - in particular if someone on dev@ (not
> necessarily a committer) wants to get started with the code base.
> Otherwise I'd say fix for next release if time gets short.
>
>
> > M-1356: Isabel's started on this, Stevo could u review this?
>
> We definitely can punt that for the next release or even thereafter. It
> would be great if someone who has some knowledge of Java security
> policies would take a look. The implication of not fixing this
> essentially is that in case someone commits test code that writes
> outside of target or to some globally shared directory we might end up
> having randomly failing tests due to the parallel setup again. But as
> these will occur shortly after the commit it should be easy enough to
> find the code change that caused the breakage.
>
>
>
> > M-1329: Support for Hadoop 2
>
> Is that truly feasible within a week?
>
>
> > M-1366:  Stevo, Isabel 
>
> This should be done as part of the release process by release manager
> at the latest.
>
>
> > M-1261: Sebastian???
> >
> > M-1309, M-1310, M-1311, M-1316 - all related to running Mahout on
> > Windows ??
>
> I'm not aware of us supporting Windows.
>
>
> > M-1350 - Any takers?? (Stevo??)
>
> To me this looks like a broken classpath on the user side. Without a
> patch to at least reproduce the issue I wouldn't spend too much time
> on this.
>
>
> Isabel
>



-- 
--
Yexi Jiang,
ECS 251,  yjian...@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/


Re: Mahout 0.9 release

2013-11-28 Thread Shannon Quinn
Possibly. I'll know more after Monday (got a few big deadlines then). 

iPhone'd

> On Nov 28, 2013, at 13:32, Suneel Marthi  wrote:
> 
> Shannon,
> 
> Would it be possible to add Spectral clustering to 
> examples/bin/cluster-reuters.sh (for 0.9)?
> 

[jira] [Commented] (MAHOUT-1030) Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable

2013-11-28 Thread Andrew Musselman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835036#comment-13835036
 ] 

Andrew Musselman commented on MAHOUT-1030:
--

I'm planning on fixing this for clustering to get into the 0.9 release next 
week, and then do the bigger change for 1.0.

> Regression: Clustered Points Should be WeightedPropertyVectorWritable not 
> WeightedVectorWritable
> 
>
> Key: MAHOUT-1030
> URL: https://issues.apache.org/jira/browse/MAHOUT-1030
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering, Integration
>Affects Versions: 0.7
>Reporter: Jeff Eastman
>Assignee: Andrew Musselman
> Fix For: 1.0, 0.9
>
> Attachments: MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch
>
>
> Looks like this won't make it into this build. Pretty widespread impact on 
> code and tests and I don't know which properties were implemented in the old 
> version. I will create a JIRA and post my interim results.
> On 6/8/12 12:21 PM, Jeff Eastman wrote:
> > That's a reversion that evidently got in when the new 
> > ClusterClassificationDriver was introduced. It should be a pretty easy fix 
> > and I will see if I can make the change before Paritosh cuts the release 
> > bits tonight.
> >
> > On 6/7/12 1:00 PM, Pat Ferrel wrote:
> >> It appears that in kmeans the clusteredPoints are now written as 
> >> WeightedVectorWritable where in mahout 0.6 they were 
> >> WeightedPropertyVectorWritable? This means that the distance from the 
> >> centroid is no longer stored here? Why? I hope I'm wrong because that is 
> >> not a welcome change. How is one to order clustered docs by distance from 
> >> cluster centroid?
> >>
> >> I'm sure I could calculate the distance but that would mean looking up the 
> >> centroid for the cluster id given in the above WeightedVectorWritable, 
> >> which means iterating through all the clusters for each clustered doc. In 
> >> my case the number of clusters could be fairly large.
> >>
> >> Am I missing something?
> >>
> >>
> >



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAHOUT-1030) Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable

2013-11-28 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835031#comment-13835031
 ] 

Pat Ferrel commented on MAHOUT-1030:


Broken record warning: The bigger issue (I agree with Grant about tackling it) 
is that each part of Mahout may or may not support NamedVectors, let alone 
WeightedPropertyVectorWritable. It is probably a big job but before 1.0 it 
would sure be nice to have something like WeightedPropertyVectorWritable 
supported optionally everywhere in Mahout. I've run across many occasions where 
it would save a lot of extra import/export code and do it in a completely 
scalable way. Import/export code is almost always non-scalable because people, 
including me, are too lazy to write external-id to internal-id to external-id 
lookup code in a scalable way. There are a fair number of cases where using a 
WeightedPropertyVectorWritable raises some issues like matrix transpose and 
multiply. Maybe a better way to solve the external to internal to external 
problem is with a scalable implementation supplied as a separate tool in 
Mahout. If there is a feature request for this larger issue maybe that is a 
better place for this discussion. 

As far as this issue is concerned, it is related only to refactoring of the 
clustering code. In 0.6 the distance to centroid was stored in a 
WeightedPropertyVectorWritable. In 0.7 during refactoring, the vector type was 
changed and the distance to centroid was no longer stored with the clustered 
vectors. Restoring this would make iterating through every clustered vector to 
recalculate the distance unnecessary.
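
The ordering use case described above can be sketched as follows. This is a minimal illustration, not Mahout code: `ClusteredPoint` and `ClusteredPointOrdering` are hypothetical names standing in for a clustered vector that carries its distance to the centroid, as the comment says WeightedPropertyVectorWritable did in 0.6. With the distance stored next to each point, ordering the documents of a cluster is a plain sort and needs no centroid lookup.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a clustered point that stores its distance to the
// centroid. This is NOT a Mahout class; it only illustrates the idea.
final class ClusteredPoint {
  final int clusterId;
  final String docId;
  final double distanceToCentroid;

  ClusteredPoint(int clusterId, String docId, double distanceToCentroid) {
    this.clusterId = clusterId;
    this.docId = docId;
    this.distanceToCentroid = distanceToCentroid;
  }
}

final class ClusteredPointOrdering {
  private ClusteredPointOrdering() {}

  // Returns the points assigned to one cluster, closest to the centroid first.
  // No centroid is looked up: the stored distance is all that is needed.
  static List<ClusteredPoint> orderByDistance(List<ClusteredPoint> points, int clusterId) {
    List<ClusteredPoint> result = new ArrayList<>();
    for (ClusteredPoint p : points) {
      if (p.clusterId == clusterId) {
        result.add(p);
      }
    }
    result.sort(Comparator.comparingDouble((ClusteredPoint p) -> p.distanceToCentroid));
    return result;
  }
}
```

Without the stored distance, the same ordering would first require finding the centroid for each point's cluster id and recomputing the distance, which is the extra pass over the clusters that the original report objects to.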




Re: Mahout 0.9 release

2013-11-28 Thread Suneel Marthi
Shannon,

Would it be possible to add Spectral clustering to 
examples/bin/cluster-reuters.sh (for 0.9)?






On Thursday, November 28, 2013 12:59 PM, Shannon Quinn  
wrote:
 
I'll aim to get the documentation on spectral clustering done by 0.9, and the 
code fixes and improvements in for 1.0.

iPhone'd



Jenkins build is back to normal : Mahout-Examples-Cluster-Reuters-II #678

2013-11-28 Thread Apache Jenkins Server

Re: Mahout 0.9 release

2013-11-28 Thread Shannon Quinn
I'll aim to get the documentation on spectral clustering done by 0.9, and the 
code fixes and improvements in for 1.0.

iPhone'd

> On Nov 28, 2013, at 12:15, Suneel Marthi  wrote:
> 
> Yes, lets defer the arbitrary properties to next release.
> 

[jira] [Commented] (MAHOUT-1030) Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable

2013-11-28 Thread Andrew Musselman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834990#comment-13834990
 ] 

Andrew Musselman commented on MAHOUT-1030:
--

I'll dig through the 0.6 code for this; in the meantime does anyone remember if 
every usage of WeightedPropertyVectorWritable was changed to 
WeightedVectorWritable, or just certain ones used in clustering?



Re: Mahout 0.9 release

2013-11-28 Thread Suneel Marthi
Yes, let's defer the arbitrary properties to the next release.





On Thursday, November 28, 2013 11:02 AM, Andrew Musselman 
 wrote:
 
Was going to open M-1030 this weekend; I think doing the quick fix can be done 
in time and the more involved job of putting arbitrary properties on vectors 
should be pushed to 1.0.

Sound reasonable?



On Thu, Nov 28, 2013 at 7:58 AM, Suneel Marthi  wrote:

>Forgot to add
>
>M-1288 Solr Recommender - Pat Ferrell
>
>to my earlier email.
>
>
>
>
>On Thursday, November 28, 2013 10:38 AM, Suneel Marthi 
> wrote:
>
>Adding Mahout-1349 to the list of JIRAs .
>
>
>
>
>
>On Thursday, November 28, 2013 10:37 AM, Suneel Marthi 
> wrote:
>
>Update on Open JIRAs for 0.9:
>
>Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - all related 
>to Wiki updates, please see Isabel's updates.
>
>M-1286 - Peng and Sebastian, we had talked about this during the last
>hangout. Can this be included in 0.9?
>
>M-1030 - Andrew Musselman, it's critical that we get this into 0.9; it's been
>deferred for the last 2 Mahout releases.
>
>M-1319, M-1328, M-1347, M-1350 - Suneel
>
>
>M-1265 - Multi Layer Perceptron, Yexi please look at my comments on 
>Reviewboard.
>
>M-1273 - Kun Yung, Ted, defer this to next release ???
>
>
>
>M-1312, M-1256 - Stevo, could you take one of them?
>
>
>On Thursday, November 28, 2013 5:01 AM, Isabel Drost-Fromm  
>wrote:
>
>On Wed, 27 Nov 2013 14:23:11 -0800 (PST)
>Suneel Marthi wrote:
>> Below are the Open issues for 0.9:-
>
>This looks like we should be targeting Dec. 9th as code freeze to me.
>What do you all think?
>
>
>> Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - All
>> related to Wiki updates, missing Wiki documentation and Wiki
>> migration to new CMS.  Isabel's working on M-1245 (migrating to new
>> CMS). Could some of the others be consolidated with that?
>
>I believe MAHOUT-1245 essentially is ready to be published - all I want
>before notifying INFRA to switch to the new cms based site is one other
>person to take at least a brief look.
>
>For MAHOUT-1304 - Sebastian, can you please check that the cms based
>site actually does fit on 1280px? We can close this issue then.
>
>MAHOUT-1305 - I think this should be turned into a task to actually
>delete most of the pages that have been migrated to the new CMS (almost
>all of them). Once 1245 is shipped, it would be great if a few more
>people could lend a hand in getting this done.
>
>MAHOUT-1307 - Can be closed once switched to CMS
>
>MAHOUT-1326 - This really relates to the old Confluence export plugin
>we once used to generate static pages out of our wiki, which
>is no longer active. Unless anyone on the Mahout dev list knows how to
>fully delete all exported static pages, we should file an issue with
>INFRA to ask for help getting those deleted. They definitely are
>confusing to users.
>
>
>
>> M-1286 - Peng and ssc, we had talked about this during the last
>> hangout. Can this be included in 0.9?
>>
>> M-1030 - Andrew Musselman? Any updates on this? It's important that we
>> fix this for 0.9
>>
>> M-1319, M-1328, M-1347, M-1364 - Suneel
>>
>> M-1273 - Kun Yung, remember talking about this in one of the earlier
>> hangouts; can't recall what was decided?
>>
>> M-1312, M-1256 - Dan Filimon (or Stevo??)
>>
>> M-996 - someone could pick this up (i.e. if it's still relevant with the
>> present codebase)
>
>I think this can move to the next release - according to the
>contributor and Sebastian the patch is rather hacky and there for
>illustration purposes only. I'd rather see some more thought go into
>that instead of pushing to have this in 0.9.
>
>
>> M-1265 Yexi had submitted a patch for this, it would be good if this
>> could go in as part of 0.9 
>>
>> M-1288 Solr Recommender - Pat Ferrell
>>
>> M-1285: Any takers for this?
>
>Would be nice to have - in particular if someone on dev@ (not
>necessarily a committer) wants to get started with the code base.
>Otherwise I'd say fix for next release if time gets short.
>
>
>> M-1356: Isabel's started on this, Stevo could you review this?
>
>We definitely can punt that for the next release or even thereafter. It
>would be great if someone who has some knowledge of Java security
>policies would take a look. The implication of not fixing this
>essentially is that in case someone commits test code that writes
>outside of target or to some globally shared directory we might end up
>having randomly failing tests due to the parallel setup again. But as
>these will occur shortly after the commit it should be easy enough to
>find the code change that caused the breakage.
>
>
>
>> M-1329: Support for Hadoop 2
>
>Is that truly feasible within a week?
>
>
>> M-1366:  Stevo, Isabel 
>
>This should be done as part of the release process by release manager
>at the latest.
>
>
>> M-1261: Sebastian???
>>
>> M-1309, M-1310, M-1311, M-1316 - all related to running Mahout on
>> Windows ??
>
>I'm not aware of us supporting Windows.
>
>
>> M-1

[jira] [Commented] (MAHOUT-1265) Add Multilayer Perceptron

2013-11-28 Thread Yexi Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834959#comment-13834959
 ] 

Yexi Jiang commented on MAHOUT-1265:


OK, I'll revise it accordingly.

> Add Multilayer Perceptron 
> --
>
> Key: MAHOUT-1265
> URL: https://issues.apache.org/jira/browse/MAHOUT-1265
> Project: Mahout
>  Issue Type: New Feature
>Reporter: Yexi Jiang
>  Labels: machine_learning, neural_network
> Attachments: mahout-1265.patch
>
>
> Design of multilayer perceptron
> 1. Motivation
> A multilayer perceptron (MLP) is a kind of feed-forward artificial neural
> network, a mathematical model inspired by biological neural networks.
> The multilayer perceptron can be used for various machine learning
> tasks such as classification and regression. It would be helpful to
> include it in Mahout.
> 2. API
> The design goal of the API is to make the MLP easy to use and to keep
> the implementation details transparent to the user.
> The following example code shows how a user would use the MLP.
> -
> // set the parameters
> double learningRate = 0.5;
> double momentum = 0.1;
> int[] layerSizeArray = new int[] {2, 5, 1};
> String costFuncName = "SquaredError";
> String squashingFuncName = "Sigmoid";
> // the location to store the model; if there is already an existing model at
> // the specified location, MLP will throw an exception
> URI modelLocation = ...
> MultilayerPerceptron mlp = new MultilayerPerceptron(layerSizeArray,
>     modelLocation);
> mlp.setLearningRate(learningRate).setMomentum(momentum).setRegularization(...).setCostFunction(...).setSquashingFunction(...);
> // the user can also load an existing model with a given URI and update the
> // model with new training data; if there is no existing model at the
> // specified location, an exception will be thrown
> /*
> MultilayerPerceptron mlp = new MultilayerPerceptron(learningRate,
>     regularization, momentum, squashingFuncName, costFuncName, modelLocation);
> */
> URI trainingDataLocation = ...
> // the detail of training is transparent to the user; it may run on a
> // single machine or in a distributed environment
> mlp.train(trainingDataLocation);
> // the user can also train the model with one training instance at a time,
> // in stochastic gradient descent fashion
> Vector trainingInstance = ...
> mlp.train(trainingInstance);
> // prepare the input feature
> Vector inputFeature = ...
> // the semantic meaning of the output result is defined by the user
> // in the general case, the dimension of the output vector is 1 for
> // regression and two-class classification
> // the dimension of the output vector is n for n-class classification (n > 2)
> Vector outputVector = mlp.output(inputFeature);
> -
> 3. Methodology
> The output calculation can be easily implemented with a feed-forward approach.
> Also, single-machine training is straightforward. The following
> describes how to train the MLP in a distributed way with batch gradient
> descent. The workflow is illustrated in the figure below.
> https://docs.google.com/drawings/d/1s8hiYKpdrP3epe1BzkrddIfShkxPrqSuQBH0NAawEM4/pub?w=960&h=720
> For distributed training, each training iteration is divided into two
> steps: the weight update calculation step and the weight update step. The
> distributed MLP can only be trained with the batch-update approach.
> 3.1 The partial weight update calculation step:
> This step trains the MLP in a distributed fashion. Each task gets a copy of
> the MLP model and calculates the weight update with a partition of the data.
> Suppose the training error is E(w) = \frac{1}{2} \sum_{d \in D} cost(t_d, y_d), 
> where D denotes the training set, d denotes a training instance, t_d denotes 
> the class label and y_d denotes the output of the MLP. Also, suppose that 
> the sigmoid function is used as the squashing function, 
> the squared error is used as the cost function, 
> t_i denotes the target value for the ith dimension of the output layer, 
> o_i denotes the actual output for the ith dimension of the output layer, 
> l denotes the learning rate, and 
> w_{ij} denotes the weight between the jth neuron in the previous layer and the 
> ith neuron in the next layer. 
> The weight of each edge is updated as 
> \Delta w_{ij} = l * 1 / m * \delta_j * o_i, 
> where \delta_j = - \sum_{m} o_j^{(m)} * (1 - o_j^{(m)}) * (t_j^{(m)} - 
> o_j^{(m)}) for the output layer, and \delta_j = - \sum_{m} o_j^{(m)} * (1 - 
> o_j^{(m)}) * \sum_k \delta_k * w_{jk} for a hidden layer. 
> It is easy to see that \delta_j can be rewritten as 
> \delta_j = - \sum_{i = 1}^k \sum_{m_i} o_j^{(m_i)} * (1 - o_j^{(m_i)}) 
> * (t_j^{(m_i)} - o_j^{(m_i)})
> The above equation indicates that \delta_j can be divided into k parts.
> So for the implementation, each mapper can calculate par

Re: Mahout 0.9 release

2013-11-28 Thread Andrew Musselman
I was going to open M-1030 this weekend. I think the quick fix can be
done in time, and the more involved job of putting arbitrary properties on
vectors should be pushed to 1.0.

Sound reasonable?


On Thu, Nov 28, 2013 at 7:58 AM, Suneel Marthi wrote:

> Forgot to add
>
> M-1288 Solr Recommender - Pat Ferrell
>
> to my earlier email.
>
>
>
> On Thursday, November 28, 2013 10:38 AM, Suneel Marthi <
> suneel_mar...@yahoo.com> wrote:
>
> Adding Mahout-1349 to the list of JIRAs .
>
>
>
>
>
> On Thursday, November 28, 2013 10:37 AM, Suneel Marthi <
> suneel_mar...@yahoo.com> wrote:
>
> Update on Open JIRAs for 0.9:
>
> Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - all
> related to Wiki updates, please see Isabel's updates.
>
> M-1286 - Peng and Sebastian, we had talked about this during the last hangout.
> Can this be included in 0.9?
>
> M-1030 - Andrew Musselman, it's critical that we get this into 0.9; it's been
> deferred for the last 2 Mahout releases.
>
> M-1319, M-1328, M-1347, M-1350 - Suneel
>
>
> M-1265 - Multi Layer Perceptron, Yexi please look at my comments on
> Reviewboard.
>
> M-1273 - Kun Yung, Ted, defer this to next release ???
>
>
>
> M-1312, M-1256 - Stevo, could u take one of them
>
>
> On Thursday, November 28, 2013 5:01 AM, Isabel Drost-Fromm <
> isa...@apache.org> wrote:
>
> On Wed, 27 Nov 2013 14:23:11 -0800 (PST)
> Suneel Marthi  wrote:
> > Below are the Open issues for 0.9:-
>
> This looks like we should be targeting Dec. 9th as code freeze to me.
> What do you all think?
>
>
> > Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - All
> > related to Wiki updates, missing Wiki documentation and Wiki
> > migration to new CMS.  Isabel's working on M-1245 (migrating to new
> > CMS). Could some of the others be consolidated with that?
>
> I believe MAHOUT-1245 essentially is ready to be published - all I want
> before notifying INFRA to switch to the new CMS-based site is one other
> person to take at least a brief look.
>
> For MAHOUT-1304 - Sebastian, can you please check that the cms based
> site actually does fit on 1280px? We can close this issue then.
>
> MAHOUT-1305 - I think this should be turned into a task to actually
> delete most of the pages that have been migrated to the new CMS (almost
> all of them). Once 1245 is shipped, it would be great if a few more
> people could lend a hand in getting this done.
>
> MAHOUT-1307 - Can be closed once switched to CMS
>
> MAHOUT-1326 - This really relates to the old Confluence export plugin
> we once used to generate static pages out of our wiki that
> is no longer active. Unless anyone on the Mahout dev list knows how to
> fully delete all exported static pages, we should file an issue with
> INFRA to ask for help getting those deleted. They definitely are
> confusing to users.
>
>
>
> > M-1286 - Peng and ssc, we had talked about this during the last
> > hangout. Can this be included in 0.9?
> >
> > M-1030 - Andrew Musselman? Any updates on this? It's important that we
> > fix this for 0.9
> >
> > M-1319, M-1328, M-1347, M-1364 - Suneel
> >
> > M-1273 - Kun Yung, remember talking about this in one of the earlier
> > hangouts; can't recall what was decided?
> >
> > M-1312, M-1256 - Dan Filimon (or Stevo??)
> >
> > M-996 - someone could pick this up (if it's still relevant with the present
> > codebase, that is)
>
> I think this can move to the next release - according to the
> contributor and Sebastian the patch is rather hacky and there for
> illustration purposes only. I'd rather see some more thought go into
> that instead of pushing to have this in 0.9.
>
>
> > M-1265 Yexi had submitted a patch for this, it would be good if this
> > could go in as part of 0.9
> >
> > M-1288 Solr Recommender - Pat Ferrell
> >
> > M-1285: Any takers for this?
>
> Would be nice to have - in particular if someone on dev@ (not
> necessarily a committer) wants to get started with the code base.
> Otherwise I'd say fix for the next release if time gets short.
>
>
> > M-1356: Isabel's started on this, Stevo could u review this?
>
> We definitely can punt that for the next release or even thereafter. It
> would be great if someone who has some knowledge of Java security
> policies would take a look. The implication of not fixing this
> essentially is that in case someone commits test code that writes
> outside of target or to some globally shared directory we might end up
> having randomly failing tests due to the parallel setup again. But as
> these will occur shortly after the commit it should be easy enough to
> find the code change that caused the breakage.
>
>
>
> > M-1329: Support for Hadoop 2
>
> Is that truly feasible within a week?
>
>
> > M-1366:  Stevo, Isabel 
>
> This should be done as part of the release process by release manager
> at the latest.
>
>
> > M-1261: Sebastian???
> >
> > M-1309, M-1310, M-1311, M-1316 - all related to running Mahout on
> > Windows ??
>
> I'm not aware of us sup

Re: Mahout 0.9 release

2013-11-28 Thread Suneel Marthi
Forgot to add 

M-1288 Solr Recommender - Pat Ferrell 

to my earlier email.



On Thursday, November 28, 2013 10:38 AM, Suneel Marthi 
 wrote:
 
Adding Mahout-1349 to the list of JIRAs . 





On Thursday, November 28, 2013 10:37 AM, Suneel Marthi 
 wrote:
 
Update on Open JIRAs for 0.9:

Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - all related 
to Wiki updates, please see Isabel's updates.

M-1286 - Peng and Sebastian, we had talked about this during the last hangout. 
Can this be included in 0.9?

M-1030 - Andrew Musselman, it's critical that we get this into 0.9; it's been 
deferred for the last 2 Mahout releases.

M-1319, M-1328, M-1347, M-1350 - Suneel


M-1265 - Multi Layer Perceptron, Yexi please look at my comments on Reviewboard.

M-1273 - Kun Yung, Ted, defer this to next release ???



M-1312, M-1256 - Stevo, could u take one of them


On Thursday, November 28, 2013 5:01 AM, Isabel Drost-Fromm  
wrote:

On Wed, 27 Nov 2013 14:23:11 -0800 (PST)
Suneel Marthi  wrote:
> Below are the Open issues for 0.9:-

This looks like we should be targeting Dec. 9th as code freeze to me.
What do you all think?


> Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - All
> related to Wiki updates, missing Wiki documentation and Wiki
> migration to new CMS.  Isabel's working on M-1245 (migrating to new
> CMS). Could some of the others be consolidated with that?

I believe MAHOUT-1245 essentially is ready to be published - all I want
before notifying INFRA to switch to the new CMS-based site is one other
person to take at least a brief look.

For MAHOUT-1304 - Sebastian, can you please check that the cms based
site actually does fit on 1280px? We can close this issue then.

MAHOUT-1305 - I think this should be turned into a task to actually
delete most of the pages that have been migrated to the new CMS (almost
all of them). Once 1245 is shipped, it would be great if a few more
people could lend a hand in getting this done.

MAHOUT-1307 - Can be closed once switched to CMS

MAHOUT-1326 - This really relates to the old Confluence export plugin
we once used to generate static pages out of our wiki that
is no longer active. Unless anyone on the Mahout dev list knows how to
fully delete all exported static pages, we should file an issue with
INFRA to ask for help getting those deleted. They definitely are
confusing to users.



> M-1286 - Peng and ssc, we had talked about this during the last
> hangout. Can this be included in 0.9?
> 
> M-1030 - Andrew Musselman? Any updates on this? It's important that we
> fix this for 0.9
> 
> M-1319, M-1328, M-1347, M-1364 - Suneel
> 
> M-1273 - Kun Yung, remember talking about this in one of the earlier
> hangouts; can't recall what was decided?
> 
> M-1312, M-1256 - Dan Filimon (or Stevo??)
> 
> M-996 - someone could pick this up (if it's still relevant with the present
> codebase, that is)

I think this can move to the next release - according to the
contributor and Sebastian the patch is rather hacky and there for
illustration purposes only. I'd rather see some more thought go into
that instead of pushing to have this in 0.9.


> M-1265 Yexi had submitted a patch for this, it would be good if this
> could go in as part of 0.9  
> 
> M-1288 Solr Recommender - Pat Ferrell 
> 
> M-1285: Any takers for this?

Would be nice to have - in particular if someone on dev@ (not
necessarily a committer) wants to get started with the code base.
Otherwise I'd say fix for the next release if time gets short.


> M-1356: Isabel's started on this, Stevo could u review this?

We definitely can punt that for the next release or even thereafter. It
would be great if someone who has some knowledge of Java security
policies would take a look. The implication of not fixing this
essentially is that in case someone commits test code that writes
outside of target or to some globally shared directory we might end up
having randomly failing tests due to the parallel setup again. But as
these will occur shortly after the commit it should be easy enough to
find the code change that caused the breakage.



> M-1329: Support for Hadoop 2

Is that truly feasible within a week?


> M-1366:  Stevo, Isabel 

This should be done as part of the release process by release manager
at the latest.


> M-1261: Sebastian???
> 
> M-1309, M-1310, M-1311, M-1316 - all related to running Mahout on
> Windows ??

I'm not aware of us supporting Windows.


> M-1350 - Any takers?? (Stevo??)

To me this looks like a broken classpath on the user side. Without a
patch to at least reproduce the issue I wouldn't spend too much time
on this.


Isabel

[jira] [Commented] (MAHOUT-1265) Add Multilayer Perceptron

2013-11-28 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834939#comment-13834939
 ] 

Suneel Marthi commented on MAHOUT-1265:
---

[~yxjiang] Please look at my comments on Reviewboard.

> Add Multilayer Perceptron 
> --
>
> Key: MAHOUT-1265
> URL: https://issues.apache.org/jira/browse/MAHOUT-1265
> Project: Mahout
>  Issue Type: New Feature
>Reporter: Yexi Jiang
>  Labels: machine_learning, neural_network
> Attachments: mahout-1265.patch
>
>
> Design of multilayer perceptron
> 1. Motivation
> A multilayer perceptron (MLP) is a kind of feedforward artificial neural 
> network, which is a mathematical model inspired by the biological neural 
> network. The multilayer perceptron can be used for various machine learning 
> tasks such as classification and regression. It would be helpful to include 
> it in Mahout.
> 2. API
> The design goal of the API is to make the MLP easy to use and to keep the 
> implementation details transparent to the user.
> The following example shows how a user would use the MLP.
> -
> // Set the parameters.
> double learningRate = 0.5;
> double momentum = 0.1;
> int[] layerSizeArray = new int[] {2, 5, 1};
> String costFuncName = "SquaredError";
> String squashingFuncName = "Sigmoid";
> // The location to store the model. If a model already exists at the
> // specified location, the MLP throws an exception.
> URI modelLocation = ...
> MultilayerPerceptron mlp = new MultilayerPerceptron(layerSizeArray, 
> modelLocation);
> mlp.setLearningRate(learningRate).setMomentum(momentum).setRegularization(...).setCostFunction(...).setSquashingFunction(...);
> // The user can also load an existing model from a given URI and update it
> // with new training data. If no model exists at the specified location,
> // an exception is thrown.
> /*
> MultilayerPerceptron mlp = new MultilayerPerceptron(learningRate, 
> regularization, momentum, squashingFuncName, costFuncName, modelLocation);
> */
> URI trainingDataLocation = ...
> // The details of training are transparent to the user; training may run on a
> // single machine or in a distributed environment.
> mlp.train(trainingDataLocation);
> // The user can also train the model with one training instance at a time,
> // in the manner of stochastic gradient descent.
> Vector trainingInstance = ...
> mlp.train(trainingInstance);
> // Prepare the input feature.
> Vector inputFeature = ...
> // The semantic meaning of the output is defined by the user.
> // In the general case, the output vector has dimension 1 for regression and
> // two-class classification,
> // and dimension n for n-class classification (n > 2).
> Vector outputVector = mlp.output(inputFeature);
> -
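
To make the call to mlp.output(inputFeature) in the listing above more concrete, here is a minimal Java sketch of a feed-forward pass. It assumes sigmoid squashing in every layer and a bias weight stored at index 0 of each neuron's weight row. The class FeedForwardSketch, its weight layout, and its method names are hypothetical and are not taken from the MAHOUT-1265 patch.

```java
/** Hypothetical illustration of a feed-forward pass; not the MAHOUT-1265 code. */
final class FeedForwardSketch {

  /** Sigmoid squashing function. */
  static double sigmoid(double x) {
    return 1.0 / (1.0 + Math.exp(-x));
  }

  /**
   * Computes the network output for one input vector.
   * weights[l][i][0] is the bias of neuron i in layer l + 1 (assumed layout);
   * weights[l][i][j + 1] is the weight from neuron j in layer l to neuron i.
   */
  static double[] output(double[][][] weights, double[] input) {
    double[] activation = input;
    for (double[][] layer : weights) {
      double[] next = new double[layer.length];
      for (int i = 0; i < layer.length; i++) {
        double sum = layer[i][0]; // bias term
        for (int j = 0; j < activation.length; j++) {
          sum += layer[i][j + 1] * activation[j];
        }
        next[i] = sigmoid(sum);
      }
      activation = next;
    }
    return activation;
  }

  public static void main(String[] args) {
    // Layer sizes {2, 5, 1}: 5 x (1 + 2) hidden weights, 1 x (1 + 5) output weights.
    double[][][] weights = {new double[5][3], new double[1][6]};
    double[] out = output(weights, new double[] {0.3, 0.8});
    System.out.println(out[0]);
  }
}
```

With the layer sizes {2, 5, 1} from the example, the weights hold a 5 x 3 matrix followed by a 1 x 6 matrix, and the result has a single component, matching the one-dimensional output described for regression and two-class classification.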
> 3. Methodology
> The output calculation can be implemented easily with a feed-forward pass. 
> Single-machine training is also straightforward. The following describes 
> how to train the MLP in a distributed way with batch gradient descent. The 
> workflow is illustrated in the figure below.
> https://docs.google.com/drawings/d/1s8hiYKpdrP3epe1BzkrddIfShkxPrqSuQBH0NAawEM4/pub?w=960&h=720
> For distributed training, each training iteration is divided into two 
> steps: the weight update calculation step and the weight update step. The 
> distributed MLP can only be trained with the batch-update approach.
> 3.1 The partial weight update calculation step:
> This step trains the MLP in a distributed fashion. Each task gets a copy of 
> the MLP model and calculates the weight update with a partition of the data.
> Suppose the training error is E(w) = \frac{1}{2} \sum_{d \in D} cost(t_d, y_d), 
> where D denotes the training set, d denotes a training instance, t_d denotes 
> the class label and y_d denotes the output of the MLP. Also, suppose that 
> the sigmoid function is used as the squashing function, 
> the squared error is used as the cost function, 
> t_i denotes the target value for the ith dimension of the output layer, 
> o_i denotes the actual output for the ith dimension of the output layer, 
> l denotes the learning rate, and 
> w_{ij} denotes the weight between the jth neuron in the previous layer and the 
> ith neuron in the next layer. 
> The weight of each edge is updated as 
> \Delta w_{ij} = l * 1 / m * \delta_j * o_i, 
> where \delta_j = - \sum_{m} o_j^{(m)} * (1 - o_j^{(m)}) * (t_j^{(m)} - 
> o_j^{(m)}) for the output layer, and \delta_j = - \sum_{m} o_j^{(m)} * (1 - 
> o_j^{(m)}) * \sum_k \delta_k * w_{jk} for a hidden layer. 
> It is easy to see that \delta_j can be rewritten as 
> \delta_j = - \sum_{i = 1}^k \sum_{m_i} o_j^{(m_i)} * (1 - o_j^{(m_i)}) 
> * (t_j^{(m_i)} - o_j^{(m_i)})
> The above equation indicates that \delta_j can be divided into k parts.
> So for the implementation, e
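
The decomposition of \delta_j into k parts can be checked numerically. The following is a minimal Java sketch, assuming a single sigmoid output neuron with squared-error cost, that sums the per-instance term -o * (1 - o) * (t - o) over k contiguous partitions and compares the result with the sum over the whole training set. The class PartialDeltaSketch and its methods are hypothetical; how the patch actually distributes the work is not shown in the quoted design.

```java
/** Hypothetical illustration of splitting delta_j across k data partitions. */
final class PartialDeltaSketch {

  /** Sums -o * (1 - o) * (t - o) over the instances in [from, to). */
  static double partialDelta(double[] outputs, double[] targets, int from, int to) {
    double sum = 0.0;
    for (int m = from; m < to; m++) {
      double o = outputs[m];
      sum += -o * (1.0 - o) * (targets[m] - o);
    }
    return sum;
  }

  /** Splits the instances into k contiguous partitions and adds up the partial sums. */
  static double combinedDelta(double[] outputs, double[] targets, int k) {
    int n = outputs.length;
    double total = 0.0;
    for (int i = 0; i < k; i++) {
      int from = i * n / k;       // first instance of partition i
      int to = (i + 1) * n / k;   // one past the last instance of partition i
      total += partialDelta(outputs, targets, from, to);
    }
    return total;
  }

  public static void main(String[] args) {
    double[] outputs = {0.2, 0.7, 0.9, 0.4, 0.5, 0.1};
    double[] targets = {0.0, 1.0, 1.0, 0.0, 1.0, 0.0};
    double full = partialDelta(outputs, targets, 0, outputs.length);
    double split = combinedDelta(outputs, targets, 3);
    // The two values agree up to floating-point rounding.
    System.out.println(Math.abs(full - split) < 1e-12);
  }
}
```

Because each partition contributes an independent partial sum, the partial sums can be computed by separate tasks and added afterwards, which is the property the batch-update approach relies on.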

[jira] [Commented] (MAHOUT-1345) Enable randomised testing for all Mahout modules

2013-11-28 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834935#comment-13834935
 ] 

Suneel Marthi commented on MAHOUT-1345:
---

[~isabel]  This can be committed to trunk. We have a separate JIRA to upgrade 
to Lucene 4.6.0 (if possible within the 0.9 timeline); otherwise we should 
upgrade to Lucene 4.5.1 to avoid failures on Mac OS.

> Enable randomised testing for all Mahout modules
> 
>
> Key: MAHOUT-1345
> URL: https://issues.apache.org/jira/browse/MAHOUT-1345
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.8
>Reporter: Isabel Drost-Fromm
>Priority: Minor
> Fix For: 0.9
>
> Attachments: MAHOUT-1345.diff, MAHOUT-1345.patch
>
>
> When enabling randomised testing for all modules I found a few tests became 
> unstable or even fail deterministically due to lingering threads. The 
> attached patch:
> * defines the randomised testing dependency in our parent pom
> * re-uses said dependencies in all depending modules (makes upgrading easier 
> as the version number needs to be changed in just one place)
> * adds several code changes that fixed the failures due to lingering threads 
> for me on my machine. I'd greatly appreciate input a) from those who wrote 
> the respective code and b) others who ran the tests with these changes to 
> make sure there are no other tests that suffer from the same issues. 
> Warning: I touched quite a few bits and pieces I'm not intimately familiar 
> with over the last few weeks  (whenever I had a few spare minutes) - second 
> pair of eyes needed.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAHOUT-1354) Mahout Support for Hadoop 2

2013-11-28 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1354:
--

Fix Version/s: (was: 0.9)
   1.0

> Mahout Support for Hadoop 2 
> 
>
> Key: MAHOUT-1354
> URL: https://issues.apache.org/jira/browse/MAHOUT-1354
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.8
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 1.0
>
>
> Mahout support for Hadoop, now that Hadoop 2 is official.





Re: Mahout 0.9 release

2013-11-28 Thread Suneel Marthi
Adding Mahout-1349 to the list of JIRAs . 





On Thursday, November 28, 2013 10:37 AM, Suneel Marthi 
 wrote:
 
Update on Open JIRAs for 0.9:

Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - all related 
to Wiki updates, please see Isabel's updates.

M-1286 - Peng and Sebastian, we had talked about this during the last hangout. 
Can this be included in 0.9?

M-1030 - Andrew Musselman, it's critical that we get this into 0.9; it's been 
deferred for the last 2 Mahout releases.

M-1319, M-1328, M-1347, M-1350 - Suneel


M-1265 - Multi Layer Perceptron, Yexi please look at my comments on Reviewboard.

M-1273 - Kun Yung, Ted, defer this to next release ???



M-1312, M-1256 - Stevo, could u take one of them


On Thursday, November 28, 2013 5:01 AM, Isabel Drost-Fromm  
wrote:

On Wed, 27 Nov 2013 14:23:11 -0800 (PST)
Suneel Marthi  wrote:
> Below are the Open issues for 0.9:-

This looks like we should be targeting Dec. 9th as code freeze to me.
What do you all think?


> Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - All
> related to Wiki updates, missing Wiki documentation and Wiki
> migration to new CMS.  Isabel's working on M-1245 (migrating to new
> CMS). Could some of the others be consolidated with that?

I believe MAHOUT-1245 essentially is ready to be published - all I want
before notifying INFRA to switch to the new CMS-based site is one other
person to take at least a brief look.

For MAHOUT-1304 - Sebastian, can you please check that the cms based
site actually does fit on 1280px? We can close this issue then.

MAHOUT-1305 - I think this should be turned into a task to actually
delete most of the pages that have been migrated to the new CMS (almost
all of them). Once 1245 is shipped, it would be great if a few more
people could lend a hand in getting this done.

MAHOUT-1307 - Can be closed once switched to CMS

MAHOUT-1326 - This really relates to the old Confluence export plugin
we once used to generate static pages out of our wiki that
is no longer active. Unless anyone on the Mahout dev list knows how to
fully delete all exported static pages, we should file an issue with
INFRA to ask for help getting those deleted. They definitely are
confusing to users.



> M-1286 - Peng and ssc, we had talked about this during the last
> hangout. Can this be included in 0.9?
> 
> M-1030 - Andrew Musselman? Any updates on this? It's important that we
> fix this for 0.9
> 
> M-1319, M-1328, M-1347, M-1364 - Suneel
> 
> M-1273 - Kun Yung, remember talking about this in one of the earlier
> hangouts; can't recall what was decided?
> 
> M-1312, M-1256 - Dan Filimon (or Stevo??)
> 
> M-996 - someone could pick this up (if it's still relevant with the present
> codebase, that is)

I think this can move to the next release - according to the
contributor and Sebastian the patch is rather hacky and there for
illustration purposes only. I'd rather see some more thought go into
that instead of pushing to have this in 0.9.


> M-1265 Yexi had submitted a patch for this, it would be good if this
> could go in as part of 0.9  
> 
> M-1288 Solr Recommender - Pat Ferrell 
> 
> M-1285: Any takers for this?

Would be nice to have - in particular if someone on dev@ (not
necessarily a committer) wants to get started with the code base.
Otherwise I'd say fix for next release if time gets short.


> M-1356: Isabel's started on this, Stevo could u review this?

We definitely can punt that for the next release or even thereafter. It
would be great if someone who has some knowledge of Java security
policies would take a look. The implication of not fixing this
essentially is that in case someone commits test code that writes
outside of target or to some globally shared directory we might end up
having randomly failing tests due to the parallel setup again. But as
these will occur shortly after the commit it should be easy enough to
find the code change that caused the breakage.



> M-1329: Support for Hadoop 2

Is that truly feasible within a week?


> M-1366:  Stevo, Isabel 

This should be done as part of the release process by release manager
at the latest.


> M-1261: Sebastian???
> 
> M-1309, M-1310, M-1311, M-1316 - all related to running Mahout on
> Windows ??

I'm not aware of us supporting Windows.


> M-1350 - Any takers?? (Stevo??)

To me this looks like a broken classpath on the user side. Without a
patch to at least reproduce the issue I wouldn't spend too much time
on this.


Isabel

Re: Mahout 0.9 release

2013-11-28 Thread Suneel Marthi
Update on Open JIRAs for 0.9:

Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - all related 
to Wiki updates, please see Isabel's updates.

M-1286 - Peng and Sebastian, we had talked about this during the last hangout. 
Can this be included in 0.9?

M-1030 - Andrew Musselman, it's critical that we get this into 0.9; it's been 
deferred for the last 2 Mahout releases.

M-1319, M-1328, M-1347, M-1350 - Suneel


M-1265 - Multi Layer Perceptron, Yexi please look at my comments on Reviewboard.

M-1273 - Kun Yung, Ted, defer this to next release ???



M-1312, M-1256 - Stevo, could u take one of them

On Thursday, November 28, 2013 5:01 AM, Isabel Drost-Fromm  
wrote:
 
On Wed, 27 Nov 2013 14:23:11 -0800 (PST)
Suneel Marthi  wrote:
> Below are the Open issues for 0.9:-

This looks like we should be targeting Dec. 9th as code freeze to me.
What do you all think?


> Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - All
> related to Wiki updates, missing Wiki documentation and Wiki
> migration to new CMS.  Isabel's working on M-1245 (migrating to new
> CMS). Could some of the others be consolidated with that?

I believe MAHOUT-1245 essentially is ready to be published - all I want
before notifying INFRA to switch to the new CMS-based site is one other
person to take at least a brief look.

For MAHOUT-1304 - Sebastian, can you please check that the cms based
site actually does fit on 1280px? We can close this issue then.

MAHOUT-1305 - I think this should be turned into a task to actually
delete most of the pages that have been migrated to the new CMS (almost
all of them). Once 1245 is shipped, it would be great if a few more
people could lend a hand in getting this done.

MAHOUT-1307 - Can be closed once switched to CMS

MAHOUT-1326 - This really relates to the old Confluence export plugin
we once used to generate static pages out of our wiki that
is no longer active. Unless anyone on the Mahout dev list knows how to
fully delete all exported static pages, we should file an issue with
INFRA to ask for help getting those deleted. They definitely are
confusing to users.



> M-1286 - Peng and ssc, we had talked about this during the last
> hangout. Can this be included in 0.9?
> 
> M-1030 - Andrew Musselman? Any updates on this? It's important that we
> fix this for 0.9
> 
> M-1319, M-1328, M-1347, M-1364 - Suneel
> 
> M-1273 - Kun Yung, remember talking about this in one of the earlier
> hangouts; can't recall what was decided?
> 
> M-1312, M-1256 - Dan Filimon (or Stevo??)
> 
> M-996 - someone could pick this up (if it's still relevant with the present
> codebase, that is)

I think this can move to the next release - according to the
contributor and Sebastian the patch is rather hacky and there for
illustration purposes only. I'd rather see some more thought go into
that instead of pushing to have this in 0.9.


> M-1265 Yexi had submitted a patch for this, it would be good if this
> could go in as part of 0.9  
> 
> M-1288 Solr Recommender - Pat Ferrell 
> 
> M-1285: Any takers for this?

Would be nice to have - in particular if someone on dev@ (not
necessarily a committer) wants to get started with the code base.
Otherwise I'd say fix for next release if time gets short.


> M-1356: Isabel's started on this, Stevo could u review this?

We definitely can punt that for the next release or even thereafter. It
would be great if someone who has some knowledge of Java security
policies would take a look. The implication of not fixing this
essentially is that in case someone commits test code that writes
outside of target or to some globally shared directory we might end up
having randomly failing tests due to the parallel setup again. But as
these will occur shortly after the commit it should be easy enough to
find the code change that caused the breakage.



> M-1329: Support for Hadoop 2

Is that truly feasible within a week?


> M-1366:  Stevo, Isabel 

This should be done as part of the release process by release manager
at the latest.


> M-1261: Sebastian???
> 
> M-1309, M-1310, M-1311, M-1316 - all related to running Mahout on
> Windows ??

I'm not aware of us supporting Windows.


> M-1350 - Any takers?? (Stevo??)

To me this looks like a broken classpath on the user side. Without a
patch to at least reproduce the issue I wouldn't spend too much time
on this.


Isabel

[jira] [Updated] (MAHOUT-1285) Arff loader can misparse string data as double

2013-11-28 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1285:
--

Fix Version/s: Backlog

Marking this as 'Backlog'; feel free to bring it back to the 0.9 queue if 
there's a patch.

> Arff loader can misparse string data as double
> --
>
> Key: MAHOUT-1285
> URL: https://issues.apache.org/jira/browse/MAHOUT-1285
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.9
> Environment: Linux Ubuntu 12.4
>Reporter: Neil Walkinshaw
> Fix For: Backlog
>
> Attachments: tempArff
>
>
> Have successfully loaded numerous ARFF files with Mahout (originally 
> generated via WEKA). The files contain randomly generated data. For a 
> specific random seed, the following exception is thrown:
> java.lang.NumberFormatException: For input string: 
> "b1shkt70694difsmmmdv0ikmoh"
>   at 
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
>   at java.lang.Double.parseDouble(Double.java:540)
>   at 
> org.apache.mahout.utils.vectors.arff.MapBackedARFFModel.processNumeric(MapBackedARFFModel.java:146)
>   at 
> org.apache.mahout.utils.vectors.arff.MapBackedARFFModel.getValue(MapBackedARFFModel.java:97)
>   at 
> org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:77)
>   at 
> org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:30)
>   at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>   at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>   at 
> org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:44)
>   at 
> org.apache.mahout.utils.vectors.arff.Driver.writeFile(Driver.java:251)
>   at org.apache.mahout.utils.vectors.arff.Driver.main(Driver.java:145)
>   at 
> libInterfaces.MahoutTraceBuilder.generateMahoutFile(MahoutTraceBuilder.java:38)
>   at 
> libInterfaces.MahoutTraceBuilder.generateMahoutReader(MahoutTraceBuilder.java:42)
>   at tests.InputTester.testMahoutMeansShift(InputTester.java:111)
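
The trace above ends with Double.parseDouble being handed a non-numeric token. As a minimal sketch, assuming nothing about how MapBackedARFFModel decides an attribute's type, the following Java snippet reproduces that failure mode in isolation and shows one defensive check. NumericTokenSketch and isNumeric are hypothetical names, and this is not a proposed fix for the issue.

```java
/** Hypothetical illustration of the failure mode in the trace; not Mahout code. */
final class NumericTokenSketch {

  /** Returns true only if the token can be parsed by Double.parseDouble. */
  static boolean isNumeric(String token) {
    if (token == null) {
      return false; // parseDouble(null) would throw NullPointerException
    }
    try {
      Double.parseDouble(token);
      return true;
    } catch (NumberFormatException e) {
      // This is the exception reported in the stack trace above.
      return false;
    }
  }

  public static void main(String[] args) {
    // The token from the report, followed by an ordinary numeric token.
    System.out.println(isNumeric("b1shkt70694difsmmmdv0ikmoh"));
    System.out.println(isNumeric("3.14"));
  }
}
```

Note that Double.parseDouble also accepts tokens such as "1e5" or "NaN", so a check like this cannot by itself tell a string attribute from a numeric one; the attribute type declared in the ARFF header remains the reliable source.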





[jira] [Commented] (MAHOUT-1350) Bean Utils JarClassLoader Warnings

2013-11-28 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834915#comment-13834915
 ] 

Suneel Marthi commented on MAHOUT-1350:
---

[~mobiusinversion]  This is not an issue with Mahout per se, but a result of 
combining Hadoop and Spring.
What you are seeing has been reported elsewhere at 
http://stackoverflow.com/questions/18690582/how-to-create-jetty-spring-app-with-hbase-connection.

Hadoop comes with commons-collections-3.2.1.jar and 
commons-beanutils-1.7.1.jar, while Spring 3 is bundled with 
commons-beanutils-1.8.0.jar;  hence the jar conflict at runtime. 
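
When a jar conflict like this is suspected, it can help to check which jar a class was actually loaded from. The following is a minimal Java sketch using only standard JDK calls (Class.getProtectionDomain and CodeSource.getLocation). The class ClassOriginSketch and its methods are hypothetical helpers for illustration, and what they print depends entirely on the classpath in use.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical diagnostic helper; reports where a class was loaded from. */
final class ClassOriginSketch {

  /** Returns the location (typically a jar URL) that supplied the given class. */
  static String originOf(Class<?> type) {
    CodeSource source = type.getProtectionDomain().getCodeSource();
    URL location = (source == null) ? null : source.getLocation();
    // Classes from the JDK's bootstrap loader usually have no code source.
    return (location == null) ? "(bootstrap or unknown)" : location.toString();
  }

  /** Looks the class up by name without initializing it. */
  static String originOf(String className) {
    try {
      ClassLoader loader = ClassOriginSketch.class.getClassLoader();
      return originOf(Class.forName(className, false, loader));
    } catch (ClassNotFoundException e) {
      return "(not on classpath)";
    }
  }

  public static void main(String[] args) {
    // One of the classes named in the JarClassLoader warnings of this issue.
    System.out.println(originOf("org.apache.commons.collections.FastHashMap"));
  }
}
```

Running this inside the affected application would show whether FastHashMap comes from the commons-collections jar or from a commons-beanutils jar, which is the information needed to decide which dependency to exclude.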

I'm not sure there's anything that can be fixed in Mahout, since both Hadoop and 
Spring 3.x are third-party libraries as far as Mahout is concerned.

With your permission, I would like to close this JIRA as 'Not a Problem'.

> Bean Utils JarClassLoader Warnings
> --
>
> Key: MAHOUT-1350
> URL: https://issues.apache.org/jira/browse/MAHOUT-1350
> Project: Mahout
>  Issue Type: Bug
>Reporter: David Williams
>Priority: Minor
>
> Hi all,
> I am trying to embed a user based recommender in a web service using embedded 
> jetty, and spring 3.  However, including the mahout libraries leads to this 
> collision.  It means I CANNOT use Mahout in its current implementation.
> {code}
> JarClassLoader: Warning: org/apache/commons/collections/FastHashMap.class in 
> lib/commons-beanutils-1.7.0.jar is hidden by 
> lib/commons-collections-3.2.1.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/collections/ArrayStack.class in 
> lib/commons-beanutils-1.7.0.jar is hidden by 
> lib/commons-collections-3.2.1.jar (with different bytecode)
> JarClassLoader: Warning: 
> org/apache/commons/collections/FastHashMap$Values.class in 
> lib/commons-beanutils-1.7.0.jar is hidden by 
> lib/commons-collections-3.2.1.jar (with different bytecode)
> JarClassLoader: Warning: 
> org/apache/commons/collections/FastHashMap$CollectionView$CollectionViewIterator.class
>  in lib/commons-beanutils-1.7.0.jar is hidden by 
> lib/commons-collections-3.2.1.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/collections/FastHashMap$1.class 
> in lib/commons-beanutils-1.7.0.jar is hidden by 
> lib/commons-collections-3.2.1.jar (with different bytecode)
> JarClassLoader: Warning: 
> org/apache/commons/collections/BufferUnderflowException.class in 
> lib/commons-beanutils-1.7.0.jar is hidden by 
> lib/commons-collections-3.2.1.jar (with different bytecode)
> JarClassLoader: Warning: 
> org/apache/commons/collections/FastHashMap$KeySet.class in 
> lib/commons-beanutils-1.7.0.jar is hidden by 
> lib/commons-collections-3.2.1.jar (with different bytecode)
> JarClassLoader: Warning: 
> org/apache/commons/collections/FastHashMap$CollectionView.class in 
> lib/commons-beanutils-1.7.0.jar is hidden by 
> lib/commons-collections-3.2.1.jar (with different bytecode)
> JarClassLoader: Warning: 
> org/apache/commons/collections/FastHashMap$EntrySet.class in 
> lib/commons-beanutils-1.7.0.jar is hidden by 
> lib/commons-collections-3.2.1.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/beanutils/BasicDynaBean.class in 
> lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/beanutils/BasicDynaClass.class in 
> lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: 
> org/apache/commons/beanutils/BeanAccessLanguageException.class in 
> lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/beanutils/BeanUtils.class in 
> lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/beanutils/BeanUtilsBean$1.class 
> in lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/beanutils/BeanUtilsBean.class in 
> lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: org/apache/commons/beanutils/ConstructorUtils.class 
> in lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: 
> org/apache/commons/beanutils/ContextClassLoaderLocal.class in 
> lib/commons-beanutils-core-1.8.0.jar is hidden by 
> lib/commons-beanutils-1.7.0.jar (with different bytecode)
> JarClassLoader: Warning: 
> org/apache/commons/beanutils/ConversionException.class in 
> lib/commons-bean

[jira] [Assigned] (MAHOUT-1350) Bean Utils JarClassLoader Warnings

2013-11-28 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned MAHOUT-1350:
-

Assignee: Suneel Marthi

> Bean Utils JarClassLoader Warnings
> --
>
> Key: MAHOUT-1350
> URL: https://issues.apache.org/jira/browse/MAHOUT-1350
> Project: Mahout
>  Issue Type: Bug
>Reporter: David Williams
>Assignee: Suneel Marthi
>Priority: Minor
>
> Hi all,
> I am trying to embed a user based recommender in a web service using embedded 
> jetty, and spring 3.  However, including the mahout libraries leads to this 
> collision.  It means I CANNOT use Mahout in its current implementation.

[jira] [Issue Comment Deleted] (MAHOUT-1305) Rework the wiki

2013-11-28 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1305:
--

Comment: was deleted

(was: I am in all day meetings Monday and Tuesday and will have very limited 
access to email. I will respond to emails as soon as possible.
)

> Rework the wiki
> ---
>
> Key: MAHOUT-1305
> URL: https://issues.apache.org/jira/browse/MAHOUT-1305
> Project: Mahout
>  Issue Type: Bug
>  Components: Website
>Reporter: Sebastian Schelter
>Priority: Blocker
> Fix For: 0.9
>
>
> We should think about completely redoing our wiki. At the moment, we're 
> listing lots of algorithms that we either never implemented or already 
> removed. I also have the impression that a lot of stuff is outdated.
> It would be awesome if we had an up-to-date documentation of the code with 
> instructions on how to get into using mahout quickly.
> We should also have examples for all our 3 C's.





[jira] [Resolved] (MAHOUT-1261) TasteHadoopUtils.idToIndex can return an int that has size Integer.MAX_VALUE

2013-11-28 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1261.
---

   Resolution: Fixed
Fix Version/s: 0.9

Patch committed to trunk.

> TasteHadoopUtils.idToIndex can return an int that has size Integer.MAX_VALUE
> 
>
> Key: MAHOUT-1261
> URL: https://issues.apache.org/jira/browse/MAHOUT-1261
> Project: Mahout
>  Issue Type: Bug
>  Components: Collaborative Filtering
>Affects Versions: 0.8
>Reporter: Dan Filimon
>Assignee: Sebastian Schelter
>Priority: Minor
> Fix For: 0.9
>
> Attachments: MAHOUT-1261.patch
>
>
> I'm running ItemSimilarityJob on a very large (~600M by 4B) matrix that's 
> very sparse (total set of associations is 630MB).
> The job fails because of an IndexException in ToUserVectorsReducer.
> TasteHadoopUtils.idToIndex(long id) hashes a long with:
> 0x7FFFFFFF & Longs.hashCode(id) (line 
> o.a.m.cf.taste.hadoop.TasteHadoopUtils:57).
> For some id (I don't know what value), the result returned is 
> Integer.MAX_VALUE.
> This cannot be set in the userVector because the cardinality of that is also 
> Integer.MAX_VALUE and it throws an exception.
> So, the issue is that values from 0 to INT_MAX are returned by idToIndex but 
> the vector only has 0 to INT_MAX - 1 possible entries.
> It's a nasty little off-by-one bug.
> I'm thinking of just % size when setting.
> [~ssc] & everyone else, thoughts? :)
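
The off-by-one described above can be reproduced in a few lines. This is a sketch, not the actual TasteHadoopUtils code: the class and method names are hypothetical, the mask is assumed to be the full 31-bit 0x7FFFFFFF (which is what an Integer.MAX_VALUE result implies), and `Long.hashCode` from the JDK stands in for Guava's `Longs.hashCode`, which computes the same value.

```java
/** Sketch of the off-by-one described above; not the actual TasteHadoopUtils code. */
final class IdToIndexSketch {

  private IdToIndexSketch() {}

  /** Masking off the sign bit yields values in 0..Integer.MAX_VALUE inclusive. */
  static int maskedIndex(long id) {
    // Long.hashCode computes (int) (id ^ (id >>> 32)), the same value as Guava's Longs.hashCode.
    return 0x7FFFFFFF & Long.hashCode(id);
  }

  /** The "% size" idea from the report: fold the result into 0..Integer.MAX_VALUE - 1. */
  static int boundedIndex(long id) {
    return maskedIndex(id) % Integer.MAX_VALUE;
  }
}
```

For example, the id 0x7FFFFFFFL hashes to Integer.MAX_VALUE under `maskedIndex`, which is one past the last valid index of a vector with cardinality Integer.MAX_VALUE; `boundedIndex` maps it to 0 instead. The trade-off is that such ids now share an index with ids that hash to 0, which is no different in kind from the collisions the hashing already allows.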





Re: Mahout 0.9 release

2013-11-28 Thread Suneel Marthi





On Thursday, November 28, 2013 5:01 AM, Isabel Drost-Fromm wrote:
 
On Wed, 27 Nov 2013 14:23:11 -0800 (PST)
Suneel Marthi  wrote:
> Below are the Open issues for 0.9:-

This looks like we should be targeting Dec. 9th as code freeze to me.
What do you all think?


> Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - All
> related to Wiki updates, missing Wiki documentation and Wiki
> migration to new CMS.  Isabel's working on M-1245 (migrating to new
> CMS). Could some of the others be consolidated with that?

I believe MAHOUT-1245 essentially is ready to be published - all I want
before notifying INFRA to switch to the new cms based site is one other
person to take at least a brief look.

For MAHOUT-1304 - Sebastian, can you please check that the cms based
site actually does fit on 1280px? We can close this issue then.

MAHOUT-1305 - I think this should be turned into a task to actually
delete most of the pages that have been migrated to the new CMS (almost
all of them). Once 1245 is shipped, it would be great if a few more
people could lend a hand in getting this done.

MAHOUT-1307 - Can be closed once switched to CMS

MAHOUT-1326 - This really relates to the old Confluence export plugin
we once used to generate static pages out of our wiki and that
is no longer active. Unless anyone on the Mahout dev list knows how to
fully delete all exported static pages we should file an issue with
INFRA to ask for help getting those deleted. They definitely are
confusing to users.



> M-1286 - Peng and ssc, we had talked about this during the last
> hangout. Can this be included in 0.9?
> 
> M-1030 - Andrew Musselman? Any updates on this? It's important that we
> fix this for 0.9.
> 
> M-1319, M-1328, M-1347, M-1364 - Suneel
> 
> M-1273 - Kun Yung, remember talking about this in one of the earlier
> hangouts; can't recall what was decided?
> 
> M-1312, M-1256 - Dan Filimon (or Stevo??)
> 
> M-996 - someone could pick this up (that is, if it's still relevant to
> the present codebase)

I think this can move to the next release - according to the
contributor and Sebastian the patch is rather hacky and there for
illustration purposes only. I'd rather see some more thought go into
that instead of pushing to have this in 0.9.


> M-1265 Yexi had submitted a patch for this, it would be good if this
> could go in as part of 0.9  
> 
> M-1288 Solr Recommender - Pat Ferrell 
> 
> M-1285: Any takers for this?

Would be nice to have - in particular if someone on dev@ (not
necessarily a committer) wants to get started with the code base.
Otherwise I'd say fix for next release if time gets short.


> M-1356: Isabel's started on this. Stevo, could you review this?

We definitely can punt that for the next release or even thereafter. It
would be great if someone who has some knowledge of Java security
policies would take a look. The implication of not fixing this
essentially is that in case someone commits test code that writes
outside of target or to some globally shared directory we might end up
having randomly failing tests due to the parallel setup again. But as
these will occur shortly after the commit it should be easy enough to
find the code change that caused the breakage.



> M-1329: Support for Hadoop 2

Is that truly feasible within a week?


> M-1366:  Stevo, Isabel 

This should be done as part of the release process by release manager
at the latest.


> M-1261: Sebastian???
> 
> M-1309, M-1310, M-1311, M-1316 - all related to running Mahout on
> Windows ??

I'm not aware of us supporting Windows.


>>> M-1350 - Any takers?? (Stevo??)

>> To me this looks like a broken classpath on the user side. Without a
>> patch to at least reproduce the issue I wouldn't spend too much time
>> on this.
>>
>> Isabel

Not really. Per the report, this happens when Mahout is used in a Spring 
environment.
Spring has its third-party jars and so does Mahout, so at runtime the application 
sees two different versions of the same jar
(one from Spring, one from Mahout). It may be that Mahout's jars are older versions 
and need to be updated to the latest.

[jira] [Commented] (MAHOUT-1345) Enable randomised testing for all Mahout modules

2013-11-28 Thread Frank Scholten (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834850#comment-13834850
 ] 

Frank Scholten commented on MAHOUT-1345:


[~smarthi] Agreed.

> Enable randomised testing for all Mahout modules
> 
>
> Key: MAHOUT-1345
> URL: https://issues.apache.org/jira/browse/MAHOUT-1345
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.8
>Reporter: Isabel Drost-Fromm
>Priority: Minor
> Fix For: 0.9
>
> Attachments: MAHOUT-1345.diff, MAHOUT-1345.patch
>
>
> When enabling randomised testing for all modules I found a few tests became 
> unstable or even fail deterministically due to lingering threads. The 
> attached patch:
> * defines the randomised testing dependency in our parent pom
> * re-uses said dependencies in all depending modules (makes upgrading easier 
> as the version number needs to be changed in just one place)
> * adds several code changes that fixed the failures due to lingering threads 
> for me on my machine. I'd greatly appreciate input a) from those who wrote 
> the respective code and b) others who ran the tests with these changes to 
> make sure there are no other tests that suffer from the same issues. 
> Warning: I touched quite a few bits and pieces I'm not intimately familiar 
> with over the last few weeks  (whenever I had a few spare minutes) - second 
> pair of eyes needed.
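
To illustrate the kind of lingering-thread failure mentioned above: a test that starts a thread pool and never shuts it down leaves worker threads alive after the test returns, which a test runner that checks for leaked threads will flag. The sketch below shows the usual remedy; it is not taken from the attached patch, and `LingeringThreadSketch` is a hypothetical name.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: run work on a pool without leaving threads behind. */
final class LingeringThreadSketch {

  private LingeringThreadSketch() {}

  /** Runs a task on a pool and reports whether every worker thread has terminated. */
  static boolean runAndShutDown(Runnable task) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      pool.submit(task);
    } finally {
      // Without this call the non-daemon worker threads outlive the caller.
      pool.shutdown();
    }
    try {
      // Wait for already submitted work; true means the pool has fully terminated.
      return pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

The same pattern applies to any code under test that owns an executor: shut it down in a `finally` block (or a test tear-down method) and wait for termination before the test ends.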





[jira] [Commented] (MAHOUT-1343) JSON output format for clusterdumper

2013-11-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834709#comment-13834709
 ] 

Hudson commented on MAHOUT-1343:


SUCCESS: Integrated in Mahout-Quality #2337 (See 
[https://builds.apache.org/job/Mahout-Quality/2337/])
MAHOUT-1343: More Lucene 3.x calls that need to be replaced by equivalent 
Lucene 4.x API (smarthi: rev 1546288)
* 
/mahout/trunk/integration/src/main/java/org/apache/mahout/utils/vectors/lucene/CachedTermInfo.java
* 
/mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java
* 
/mahout/trunk/integration/src/test/java/org/apache/mahout/utils/vectors/lucene/CachedTermInfoTest.java
* 
/mahout/trunk/integration/src/test/java/org/apache/mahout/utils/vectors/lucene/LuceneIterableTest.java


> JSON output format for clusterdumper
> 
>
> Key: MAHOUT-1343
> URL: https://issues.apache.org/jira/browse/MAHOUT-1343
> Project: Mahout
>  Issue Type: Improvement
>  Components: Clustering, Integration
>Affects Versions: 0.8
>Reporter: Telvis Calhoun
>Assignee: Stevo Slavic
>  Labels: dumper
> Fix For: 0.9
>
> Attachments: MAHOUT-1343.patch, MAHOUT-1343.patch, 
> clusterdump-example.json
>
>
> This patch adds JSON output format to the clusterdump utility. Each cluster 
> is represented as a JSON-encoded line. The command is something like:
> >> mahout clusterdump -d dictionary -dt text -i clusters/clusters-2-final -p 
> >> clusters/clusteredPoints -n 10 -o clusterdump.json -of JSON





[jira] [Commented] (MAHOUT-1347) Add Streaming K-Means clustering algorithm to examples/bin/cluster-reuters.sh

2013-11-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834708#comment-13834708
 ] 

Hudson commented on MAHOUT-1347:


SUCCESS: Integrated in Mahout-Quality #2337 (See 
[https://builds.apache.org/job/Mahout-Quality/2337/])
MAHOUT-1347: Add Streaming K-Means clustering algorithm to 
examples/bin/cluster-reuters.sh (smarthi: rev 1546250)
* /mahout/trunk/CHANGELOG
MAHOUT-1347: Add Streaming K-Means clustering algorithm to 
examples/bin/cluster-reuters.sh (smarthi: rev 1546232)
* /mahout/trunk/examples/bin/cluster-reuters.sh


> Add Streaming K-Means clustering algorithm to examples/bin/cluster-reuters.sh
> -
>
> Key: MAHOUT-1347
> URL: https://issues.apache.org/jira/browse/MAHOUT-1347
> Project: Mahout
>  Issue Type: Improvement
>  Components: Examples
>Affects Versions: 0.8
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 0.9
>
> Attachments: MAHOUT-1347.patch
>
>
> Add Streaming K-Means Clustering to examples/bin/cluster-reuters.sh





[jira] [Updated] (MAHOUT-1309) Install mahout on windows

2013-11-28 Thread Isabel Drost-Fromm (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost-Fromm updated MAHOUT-1309:
---

Priority: Trivial  (was: Major)

> Install mahout on windows
> -
>
> Key: MAHOUT-1309
> URL: https://issues.apache.org/jira/browse/MAHOUT-1309
> Project: Mahout
>  Issue Type: Task
>  Components: build
>Affects Versions: 0.7
> Environment: Operating system: Windows server
>Reporter: Sergey Svinarchuk
>Priority: Trivial
> Attachments: patchfile.patch
>
>
> Need to create an installation script for Mahout on Windows and install it.





[jira] [Commented] (MAHOUT-1309) Install mahout on windows

2013-11-28 Thread Isabel Drost-Fromm (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834686#comment-13834686
 ] 

Isabel Drost-Fromm commented on MAHOUT-1309:


Also, at first sight this patch looks like it will break builds on any 
non-Windows system (I see hard-coded C:/ paths when quickly skimming the 
patch).

Though not an officially supported platform, any patch that makes developing, 
building and running Mahout easier for people in Windows-land is welcome. 
Breaking builds on other systems is not an option though.
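
As a general illustration of avoiding hard-coded drive letters (a sketch only, since the patch itself may touch build scripts rather than Java code; `PortablePaths` is a hypothetical name), paths can be assembled from a system property and individual segments, so the same code works on Windows and on Unix-like systems:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch: build paths without literals such as "C:/...". */
final class PortablePaths {

  private PortablePaths() {}

  /** Resolves the given segments under the platform's temp directory. */
  static Path workDir(String... segments) {
    // java.io.tmpdir is defined by the JVM on every platform; Paths.get
    // joins the segments with the separator of the current file system.
    return Paths.get(System.getProperty("java.io.tmpdir"), segments);
  }
}
```

For instance, `PortablePaths.workDir("mahout", "work")` yields a directory below the temp folder on whichever platform the build runs.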

> Install mahout on windows
> -
>
> Key: MAHOUT-1309
> URL: https://issues.apache.org/jira/browse/MAHOUT-1309
> Project: Mahout
>  Issue Type: Task
>  Components: build
>Affects Versions: 0.7
> Environment: Operating system: Windows server
>Reporter: Sergey Svinarchuk
> Attachments: patchfile.patch
>
>
> Need to create an installation script for Mahout on Windows and install it.





[jira] [Commented] (MAHOUT-1345) Enable randomised testing for all Mahout modules

2013-11-28 Thread Isabel Drost-Fromm (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834681#comment-13834681
 ] 

Isabel Drost-Fromm commented on MAHOUT-1345:


Fine by me.

> Enable randomised testing for all Mahout modules
> 
>
> Key: MAHOUT-1345
> URL: https://issues.apache.org/jira/browse/MAHOUT-1345
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.8
>Reporter: Isabel Drost-Fromm
>Priority: Minor
> Fix For: 0.9
>
> Attachments: MAHOUT-1345.diff, MAHOUT-1345.patch
>
>
> When enabling randomised testing for all modules I found a few tests became 
> unstable or even fail deterministically due to lingering threads. The 
> attached patch:
> * defines the randomised testing dependency in our parent pom
> * re-uses said dependencies in all depending modules (makes upgrading easier 
> as the version number needs to be changed in just one place)
> * adds several code changes that fixed the failures due to lingering threads 
> for me on my machine. I'd greatly appreciate input a) from those who wrote 
> the respective code and b) others who ran the tests with these changes to 
> make sure there are no other tests that suffer from the same issues. 
> Warning: I touched quite a few bits and pieces I'm not intimately familiar 
> with over the last few weeks  (whenever I had a few spare minutes) - second 
> pair of eyes needed.





[jira] [Commented] (MAHOUT-1350) Bean Utils JarClassLoader Warnings

2013-11-28 Thread Isabel Drost-Fromm (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834680#comment-13834680
 ] 

Isabel Drost-Fromm commented on MAHOUT-1350:


[~mobiusinversion] would be great if you could supply a patch that at least 
shows how to re-produce the issue.

To me this looks like a classpath issue rather than a Mahout specific issue.

> Bean Utils JarClassLoader Warnings
> --
>
> Key: MAHOUT-1350
> URL: https://issues.apache.org/jira/browse/MAHOUT-1350
> Project: Mahout
>  Issue Type: Bug
>Reporter: David Williams
>Priority: Minor
>
> Hi all,
> I am trying to embed a user based recommender in a web service using embedded 
> jetty, and spring 3.  However, including the mahout libraries leads to this 
> collision.  It means I CANNOT use Mahout in its current implementation.

[jira] [Updated] (MAHOUT-1350) Bean Utils JarClassLoader Warnings

2013-11-28 Thread Isabel Drost-Fromm (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost-Fromm updated MAHOUT-1350:
---

Priority: Minor  (was: Blocker)

> Bean Utils JarClassLoader Warnings
> --
>
> Key: MAHOUT-1350
> URL: https://issues.apache.org/jira/browse/MAHOUT-1350
> Project: Mahout
>  Issue Type: Bug
>Reporter: David Williams
>Priority: Minor
>
> Hi all,
> I am trying to embed a user based recommender in a web service using embedded 
> jetty, and spring 3.  However, including the mahout libraries leads to this 
> collision.  It means I CANNOT use Mahout in its current implementation.

Re: Mahout 0.9 release

2013-11-28 Thread Isabel Drost-Fromm
On Wed, 27 Nov 2013 14:23:11 -0800 (PST)
Suneel Marthi wrote:
> Below are the Open issues for 0.9:-

To me, this looks like we should be targeting Dec. 9th as code freeze.
What do you all think?


> Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - All
> related to Wiki updates, missing Wiki documentation and Wiki
> migration to new CMS.  Isabel's working on M-1245 (migrating to new
> CMS). Could some of the others be consolidated with that?

I believe MAHOUT-1245 is essentially ready to be published - all I want
before notifying INFRA to switch to the new CMS-based site is for one
other person to take at least a brief look.

For MAHOUT-1304 - Sebastian, can you please check that the CMS-based
site actually does fit on 1280px? We can close this issue then.

MAHOUT-1305 - I think this should be turned into a task to actually
delete most of the pages that have been migrated to the new CMS (almost
all of them). Once 1245 is shipped, it would be great if a few more
people could lend a hand in getting this done.

MAHOUT-1307 - Can be closed once switched to CMS

MAHOUT-1326 - This really relates to the old Confluence export plugin,
no longer active, that we once used to generate static pages out of our
wiki. Unless anyone on the Mahout dev list knows how to fully delete
all exported static pages, we should file an issue with INFRA to ask
for help getting those deleted. They are definitely confusing to
users.



> M-1286 - Peng and ssc, we had talked about this during the last
> hangout. Can this be included in 0.9?
> 
> M-1030 - Andrew Musselman? Any updates on this, its important that we
> fix this for 0.9
> 
> M-1319, M-1328,
>  M-1347, M-1364 - Suneel
> 
> M-1273 - Kun Yung, remember talking about this in one of the earlier
> hangouts; can't recall what was decided?
> 
> M-1312, M-1256 - Dan Filimon (or Stevo??)
> 
> M-996  someone could pick this up (if its still relevant with present
> codebase i.e.)

I think this can move to the next release - according to the
contributor and Sebastian the patch is rather hacky and there for
illustration purposes only. I'd rather see some more thought go into
that instead of pushing to have this in 0.9.

 
> M-1265 Yexi had submitted a patch for this, it would be good if this
> could go in as part of 0.9  
> 
> M-1288 Solr Recommender - Pat Ferrell 
> 
> M-1285: Any takers for this?

Would be nice to have - in particular if someone on dev@ (not
necessarily a committer) wants to get started with the code base.
Otherwise I'd say fix for next release if time gets short.

 
> M-1356: Isabel's started on this, Stevo could u review this?

We can definitely punt that to the next release or even thereafter. It
would be great if someone who has some knowledge of Java security
policies would take a look. The implication of not fixing this is
essentially that if someone commits test code that writes outside of
target or to some globally shared directory, we might again end up with
randomly failing tests due to the parallel setup. But as these will
occur shortly after the commit, it should be easy enough to find the
code change that caused the breakage.


 
> M-1329: Support for Hadoop 2

Is that truly feasible within a week?


> M-1366:  Stevo, Isabel 

This should be done as part of the release process by the release
manager at the latest.


> M-1261: Sebastian???
> 
> M-1309, M-1310, M-1311, M-1316 - all related to running Mahout on
> Windows ??

I'm not aware of us supporting Windows.


> M-1350 - Any takers?? (Stevo??)

To me this looks like a broken classpath on the user side. Without a
patch to at least reproduce the issue I wouldn't spend too much time
on this.


Isabel





[jira] [Updated] (MAHOUT-1309) Install mahout on windows

2013-11-28 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous updated MAHOUT-1309:
--

Status: Patch Available  (was: Reopened)

> Install mahout on windows
> -
>
> Key: MAHOUT-1309
> URL: https://issues.apache.org/jira/browse/MAHOUT-1309
> Project: Mahout
>  Issue Type: Task
>  Components: build
>Affects Versions: 0.7
> Environment: Operation system: Windows server
>Reporter: Sergey Svinarchuk
> Attachments: patchfile.patch
>
>
> Need to create an installation script for Mahout on Windows and install it.



--
This message was sent by Atlassian JIRA
(v6.1#6144)