The maintainer of FPG algorithm
If it is available, then I would also like to become a maintainer of the FPG algorithm.

Thanks,
Yoonmin
Re: About Parallel Frequent Growth algorithm
(1) diligently answer inquiries about method use and theoretical foundation on the user list
(2) fix arising issues related to that stuff, diligently as well.

When (1) and (2) stop happening, the method gets axed in a year or a couple of releases (which is pretty much what happened this time, I think).

Do you think you'd be able to subscribe to that for the next few years?

On Mon, Jan 20, 2014 at 4:46 PM, Qinghao Dai wrote:

> May I ask what is the qualification to be the maintainer?
> I have read this part of code, and would like to have a try.
>
> Best Regards,
> Qinghao
>
> 2014/1/20 Ted Dunning
>
> > On Mon, Jan 20, 2014 at 5:44 PM, Suneel Marthi wrote:
> >
> > > I was asked this question too and I had no clear answer. May be it wasn't
> > > right to remove FP from the codebase.
> >
> > The major problem was that we had no maintainers for the code.
Re: About Parallel Frequent Growth algorithm
May I ask what the qualifications are for becoming the maintainer?
I have read this part of the code and would like to give it a try.

Best Regards,
Qinghao

2014/1/20 Ted Dunning

> On Mon, Jan 20, 2014 at 5:44 PM, Suneel Marthi wrote:
>
> > I was asked this question too and I had no clear answer. May be it wasn't
> > right to remove FP from the codebase.
>
> The major problem was that we had no maintainers for the code.
Re: About Parallel Frequent Growth algorithm
On Mon, Jan 20, 2014 at 5:44 PM, Suneel Marthi wrote:

> I was asked this question too and I had no clear answer. May be it wasn't
> right to remove FP from the codebase.

The major problem was that we had no maintainers for the code.
Re: About Parallel Frequent Growth algorithm
It seems more like it is not supported. I'd port it to a Spark counterpart and make sure there's support (i.e. a person to go after when it breaks :)

On Mon, Jan 20, 2014 at 3:44 PM, Suneel Marthi wrote:

> I was asked this question too and I had no clear answer. May be it wasn't
> right to remove FP from the codebase.
> Not having this may well be one another reason for users to look at
> options other than Mahout.
>
> Given the issues that Frank's reported with Streaming KMeans (and I am
> seeing them too) I was gonna rollback the Release presently in staging
> anyways.
>
> Do we take a step back and restore FP for 0.9?
>
> On Monday, January 20, 2014 6:31 PM, Dmitriy Lyubimov wrote:
>
> that's a bit weird though. Association mining is still a pretty popular
> technique. (our scientists use it, albeit not in exact FPGrowth form)
>
> On Mon, Jan 20, 2014 at 3:15 PM, Sebastian Schelter wrote:
>
> > Hi Yoonmin,
> >
> > we removed a bunch of algorithms either because they were rarely used or
> > not actively maintained anymore. IIRC the first thing was true for PFG.
> >
> > --sebastian
> >
> > On 01/20/2014 03:42 AM, Yoonmin Nam wrote:
> >
> >> Hello, everyone!
> >>
> >> Is there anyone know about the reason why PFG is deprecated in Mahout?
> >>
> >> I knew that new algorithm (BIGFIM) will be implemented as a substitution
> >> of old PFG algorithm for parallel frequent pattern mining.
> >>
> >> Please let me know if you knew the reason.
> >>
> >> Thanks!
Re: About Parallel Frequent Growth algorithm
I was asked this question too and I had no clear answer. Maybe it wasn't right to remove FP from the codebase. Not having this may well be one more reason for users to look at options other than Mahout.

Given the issues that Frank has reported with Streaming KMeans (and I am seeing them too), I was gonna roll back the release presently in staging anyway.

Do we take a step back and restore FP for 0.9?

On Monday, January 20, 2014 6:31 PM, Dmitriy Lyubimov wrote:

that's a bit weird though. Association mining is still a pretty popular technique. (our scientists use it, albeit not in exact FPGrowth form)

On Mon, Jan 20, 2014 at 3:15 PM, Sebastian Schelter wrote:

> Hi Yoonmin,
>
> we removed a bunch of algorithms either because they were rarely used or
> not actively maintained anymore. IIRC the first thing was true for PFG.
>
> --sebastian
>
> On 01/20/2014 03:42 AM, Yoonmin Nam wrote:
>
>> Hello, everyone!
>>
>> Is there anyone know about the reason why PFG is deprecated in Mahout?
>>
>> I knew that new algorithm (BIGFIM) will be implemented as a substitution
>> of old PFG algorithm for parallel frequent pattern mining.
>>
>> Please let me know if you knew the reason.
>>
>> Thanks!
Re: About Parallel Frequent Growth algorithm
That's a bit weird though. Association mining is still a pretty popular technique. (Our scientists use it, albeit not in exact FPGrowth form.)

On Mon, Jan 20, 2014 at 3:15 PM, Sebastian Schelter wrote:

> Hi Yoonmin,
>
> we removed a bunch of algorithms either because they were rarely used or
> not actively maintained anymore. IIRC the first thing was true for PFG.
>
> --sebastian
>
> On 01/20/2014 03:42 AM, Yoonmin Nam wrote:
>
>> Hello, everyone!
>>
>> Is there anyone know about the reason why PFG is deprecated in Mahout?
>>
>> I knew that new algorithm (BIGFIM) will be implemented as a substitution
>> of old PFG algorithm for parallel frequent pattern mining.
>>
>> Please let me know if you knew the reason.
>>
>> Thanks!
Re: About Parallel Frequent Growth algorithm
Hi Yoonmin,

We removed a bunch of algorithms either because they were rarely used or not actively maintained anymore. IIRC the first thing was true for PFG.

--sebastian

On 01/20/2014 03:42 AM, Yoonmin Nam wrote:

> Hello, everyone!
>
> Is there anyone know about the reason why PFG is deprecated in Mahout?
>
> I knew that new algorithm (BIGFIM) will be implemented as a substitution of
> old PFG algorithm for parallel frequent pattern mining.
>
> Please let me know if you knew the reason.
>
> Thanks!
[OT] Uses Cases for Taming Text, 2nd ed.
Hi Mahout Users,

Drew Farris, Tom Morton and I are currently working on the 2nd Edition of Taming Text (http://www.manning.com/ingersoll for first ed.) and are soliciting interested parties who would be willing to contribute to a chapter on practical use cases (i.e. you have something in production and are willing to write about it) for search with Solr, NLP using OpenNLP or Stanford NLP and machine learning using Mahout, OpenNLP or MALLET -- ideally you are using combinations of 2 or more of these to solve your problems. We are especially interested in large scale use cases in eCommerce, Advertising, social media analytics, fraud, etc.

The writing process is fairly straightforward. A section roughly equates to somewhere between 3 - 10 pages, including diagrams/pictures. After writing, there will be some feedback from editors and us, but otherwise the process is fairly simple.

In order to participate, you must have permission from your company to write on the topic. You would not need to divulge any proprietary information, but we would want enough information for our readers to gain a high-level understanding of your use case. In exchange for your participation, you will have your name and company published on that section of the book as well as in the acknowledgments section. If you have a copy of Lucene in Action or Mahout In Action, it would be similar to the use case sections in those books.

If you are interested, please respond privately to me using my gsing...@apache.org email address with this subject line.

Thanks,
Grant, Drew, Tom
Re: MAHOUT 0.9 Release - New URL
This is an issue (a trivial one, though) that needs to be fixed for the 0.9 release. I will be rerolling the release today (in the next few hrs) and putting out a new release candidate in staging.

Thanks for reporting this, Andrew P.

On Monday, January 20, 2014 12:34 AM, Andrew Palumbo wrote:

I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). Had a bit of trouble getting the Hadoop natives to compile and therefore may have run into some problems because of the Hadoop setup.

Ran into some problems in the example scripts, particularly with ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the examples when I'm sure I've got Hadoop set up right.

Apache Maven 3.1.2-SNAPSHOT
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
$MAHOUT_LOCAL=true
Hadoop 2.2.0

a) Verify that u can unpack the release (tar or zip)
...passed (tar) [passed ]

b) Verify u r able to compile the distro
mvn compile- [passed with warnings]
[WARNING] Expected all dependencies to require Scala version: 2.9.3
[WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
[WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
[WARNING] Multiple versions of scala libraries detected!

c) Run through the unit tests: mvn clean test
mvn clean test [passed]

d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script

Running example scripts with $MAHOUT_LOCAL=true

./cluster-syntheticcontrol.sh ->1 [works]
./cluster-syntheticcontrol.sh ->2 [works]
./cluster-syntheticcontrol.sh ->3 [works]
./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
[...]
WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:171)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn

./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:171)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.

./classify-20newsgroups.sh ->1 [works]
./classify-20newsgroups.sh ->2 [works]

cluster-reuters.sh ->1 [works]
cluster-reuters.sh ->2 [works]
cluster-reuters.sh ->3 [works]

Same error as noted previously in the thread:

cluster-reuters.sh ->4 [0 clusters]
[...]
WARNING: No qualcluster.props found on classpath, will use command-line arguments only
Num clusters: 0; maxDistance: 0.00
[Dunn Index] First: Infinity
[Davies-Bouldin Index] First: NaN
Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 669 ms (Minutes: 0.01115)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train

> Date: Thu, 16 Jan 2014 06:41:09 -0800
> From: suneel_mar...@yahoo.com
> Subject: MAHOUT 0.9 Release - New URL
> To: user@mahout.apache.org; d...@mahout.apache.org
>
> Third time's a Charm!!!
>
> Here's the new URL for Mah
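
The `cluster-reuters.sh ->4` output above pairs `Num clusters: 0` with a Dunn index of `Infinity` and a Davies-Bouldin index of `NaN`. One plausible reading is that these values are simply what IEEE 754 arithmetic produces when both indices are computed over an empty set of clusters. The sketch below illustrates that reading only. The class `EmptyClusterIndices` and its methods are hypothetical and are not taken from Mahout's code, and the formulas are the textbook definitions, which may differ from what the script actually runs.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; not Mahout's cluster evaluation code.
final class EmptyClusterIndices {

    /** Dunn index: smallest inter-cluster distance divided by largest intra-cluster distance. */
    static double dunnIndex(List<Double> interClusterDistances, List<Double> intraClusterDistances) {
        double minInter = Double.POSITIVE_INFINITY; // starting value for a running minimum
        for (double d : interClusterDistances) {
            minInter = Math.min(minInter, d);
        }
        double maxIntra = 0.0; // starting value for a running maximum of non-negative distances
        for (double d : intraClusterDistances) {
            maxIntra = Math.max(maxIntra, d);
        }
        // With no clusters this is Infinity / 0.0, which evaluates to Infinity.
        return minInter / maxIntra;
    }

    /** Davies-Bouldin index: mean of the per-cluster similarity ratios. */
    static double daviesBouldinIndex(List<Double> perClusterRatios) {
        double sum = 0.0;
        for (double r : perClusterRatios) {
            sum += r;
        }
        // With no clusters this is 0.0 / 0, which evaluates to NaN.
        return sum / perClusterRatios.size();
    }

    public static void main(String[] args) {
        List<Double> none = Collections.<Double>emptyList();
        System.out.println("[Dunn Index] " + dunnIndex(none, none));
        System.out.println("[Davies-Bouldin Index] " + daviesBouldinIndex(none));
    }
}
```

If this reading is right, the `Infinity` and `NaN` values are a symptom of the clustering step producing zero clusters rather than a separate bug in the index calculation.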
Re: MAHOUT 0.9 Release - New URL
Hmmm... that's an issue. Since both Dirichlet and Meanshift clustering have been removed from 0.9, cluster-syntheticcontrol.sh options 4 and 5 are not gonna work and should have been removed for 0.9.

To PMC: roll back the release, fix this issue (and other patches that were submitted in the last few days) and put out another release?

On Monday, January 20, 2014 12:33 AM, Andrew Palumbo wrote:

I ran through the tests with on a CentOS VM AMD64 2 cores 4 GB RAM. Had a bit of trouble getting the Hadoop natives to compile and therefore may have run into some problems because of the hadoop setup.

Ran into some problems in the example scripts. Particularly with ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the examples when im sure I've got hadoop setup right.

Apache Maven 3.1.2-SNAPSHOT
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
$MAHOUT_LOCAL=true
Hadoop 2.2.0

a) Verify that u can unpack the release (tar or zip)
...passed (tar) [passed ]

b) Verify u r able to compile the distro
mvn compile- [passed with warnings]
[WARNING] Expected all dependencies to require Scala version: 2.9.3
[WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
[WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
[WARNING] Multiple versions of scala libraries detected!

c) Run through the unit tests: mvn clean test
mvn clean test [passed]

d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script

Running example scripts with $MAHOUT_LOCAL=true

./cluster-syntheticcontrol.sh ->1 [works]
./cluster-syntheticcontrol.sh ->2 [works]
./cluster-syntheticcontrol.sh ->3 [works]
./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
[...]
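
The `ClassNotFoundException` traces quoted in this thread pass through `Class.forName`, called from `MahoutDriver.addClass`. That fits the explanation given here: the example script still asks for the Dirichlet and Meanshift `Job` classes, which are no longer in the 0.9 jars. A minimal sketch of that failure mode follows. The class `DriverLookupSketch` and its method `findMissing` are hypothetical and not part of Mahout; only `Class.forName` and the two class names come from the traces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a driver that resolves programs by class name.
final class DriverLookupSketch {

    /** Returns the entries of classNames that cannot be loaded from the current classpath. */
    static List<String> findMissing(List<String> classNames) {
        List<String> missing = new ArrayList<String>();
        for (String name : classNames) {
            try {
                Class.forName(name); // same lookup that appears in the stack traces
            } catch (ClassNotFoundException e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> requested = new ArrayList<String>();
        requested.add("java.lang.String"); // always loadable, included for contrast
        requested.add("org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job");
        requested.add("org.apache.mahout.clustering.syntheticcontrol.meanshift.Job");
        for (String name : findMissing(requested)) {
            System.out.println("WARNING: Unable to add class: " + name);
        }
    }
}
```

A check along these lines, run against the release jars with the class names each example script uses, might catch a stale menu option before a release candidate goes to staging.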