The maintainer of FPG algorithm

2014-01-20 Thread Yoonmin Nam
If it is available, then I would also like to become a maintainer of the FPG
algorithm.

Thanks

Yoonmin



Re: About Parallel Frequent Growth algorithm

2014-01-20 Thread Dmitriy Lyubimov
(1) diligently answer inquiries about method use and theoretical foundation
on the user list
(2) fix arising issues related to that stuff, diligently as well.

When (1) and (2) stop happening, the method gets axed within a year or a couple
of releases (which is pretty much what happened this time, I think).

You think you'd be able to subscribe for that for the next few years?


On Mon, Jan 20, 2014 at 4:46 PM, Qinghao Dai  wrote:

> May I ask what is the qualification to be the maintainer?
> I have read this part of code, and would like to have a try.
>
> Best Regards,
> Qinghao
>
>
> 2014/1/20 Ted Dunning 
>
> > On Mon, Jan 20, 2014 at 5:44 PM, Suneel Marthi wrote:
> >
> > > I was asked this question too and I had no clear answer. May be it wasn't
> > > right to remove FP from the codebase.
> > >
> >
> > The major problem was that we had no maintainers for the code.
> >
>


Re: About Parallel Frequent Growth algorithm

2014-01-20 Thread Qinghao Dai
May I ask what the qualifications are to be the maintainer?
I have read this part of the code and would like to give it a try.

Best Regards,
Qinghao


2014/1/20 Ted Dunning 

> On Mon, Jan 20, 2014 at 5:44 PM, Suneel Marthi wrote:
>
> > I was asked this question too and I had no clear answer. May be it wasn't
> > right to remove FP from the codebase.
> >
>
> The major problem was that we had no maintainers for the code.
>


Re: About Parallel Frequent Growth algorithm

2014-01-20 Thread Ted Dunning
On Mon, Jan 20, 2014 at 5:44 PM, Suneel Marthi wrote:

> I was asked this question too and I had no clear answer. May be it wasn't
> right to remove FP from the codebase.
>

The major problem was that we had no maintainers for the code.


Re: About Parallel Frequent Growth algorithm

2014-01-20 Thread Dmitriy Lyubimov
It seems more like it is not supported.
I'd port it to a Spark counterpart and make sure there is support for it (i.e. a
person to go after when it breaks :)


On Mon, Jan 20, 2014 at 3:44 PM, Suneel Marthi wrote:

> I was asked this question too and I had no clear answer. May be it wasn't
> right to remove FP from the codebase.
> Not having this may well be one another reason for users to look at
> options other than Mahout.
>
> Given the issues that Frank's reported with Streaming KMeans (and I am
> seeing them too) I was gonna rollback the Release presently in staging
> anyways.
>
> Do we take a step back and restore FP for 0.9?
>
>
>
>
>
>
> On Monday, January 20, 2014 6:31 PM, Dmitriy Lyubimov 
> wrote:
>
> that's a bit weird though. Association mining is still a pretty popular
> technique. (our scientists use it, albeit not in exact FPGrowth form)
>
>
>
> On Mon, Jan 20, 2014 at 3:15 PM, Sebastian Schelter 
> wrote:
>
> > Hi Yoonmin,
> >
> > we removed a bunch of algorithms either because they were rarely used or
> > not actively maintained anymore. IIRC the first thing was true for PFG.
> >
> > --sebastian
> >
> >
> > On 01/20/2014 03:42 AM, Yoonmin Nam wrote:
> >
> >> Hello, everyone!
> >>
> >>
> >>
> >> Is there anyone know about the reason why PFG is deprecated in Mahout?
> >>
> >>
> >>
> >> I knew that new algorithm (BIGFIM) will be implemented as a substitution
> >> of
> >> old PFG algorithm for parallel frequent pattern mining.
> >>
> >>
> >>
> >> Please let me know if you knew the reason.
> >>
> >>
> >>
> >> Thanks!
> >>
> >>
> >>
> >
>


Re: About Parallel Frequent Growth algorithm

2014-01-20 Thread Suneel Marthi
I was asked this question too and I had no clear answer. Maybe it wasn't right
to remove FP from the codebase.
Not having this may well be one more reason for users to look at options
other than Mahout.

Given the issues that Frank has reported with Streaming KMeans (and I am seeing
them too), I was gonna roll back the release presently in staging anyway.

Do we take a step back and restore FP for 0.9?






On Monday, January 20, 2014 6:31 PM, Dmitriy Lyubimov  wrote:
 
that's a bit weird though. Association mining is still a pretty popular
technique. (our scientists use it, albeit not in exact FPGrowth form)



On Mon, Jan 20, 2014 at 3:15 PM, Sebastian Schelter  wrote:

> Hi Yoonmin,
>
> we removed a bunch of algorithms either because they were rarely used or
> not actively maintained anymore. IIRC the first thing was true for PFG.
>
> --sebastian
>
>
> On 01/20/2014 03:42 AM, Yoonmin Nam wrote:
>
>> Hello, everyone!
>>
>>
>>
>> Is there anyone know about the reason why PFG is deprecated in Mahout?
>>
>>
>>
>> I knew that new algorithm (BIGFIM) will be implemented as a substitution
>> of
>> old PFG algorithm for parallel frequent pattern mining.
>>
>>
>>
>> Please let me know if you knew the reason.
>>
>>
>>
>> Thanks!
>>
>>
>>
>

Re: About Parallel Frequent Growth algorithm

2014-01-20 Thread Dmitriy Lyubimov
that's a bit weird though. Association mining is still a pretty popular
technique. (our scientists use it, albeit not in exact FPGrowth form)


On Mon, Jan 20, 2014 at 3:15 PM, Sebastian Schelter  wrote:

> Hi Yoonmin,
>
> we removed a bunch of algorithms either because they were rarely used or
> not actively maintained anymore. IIRC the first thing was true for PFG.
>
> --sebastian
>
>
> On 01/20/2014 03:42 AM, Yoonmin Nam wrote:
>
>> Hello, everyone!
>>
>>
>>
>> Is there anyone know about the reason why PFG is deprecated in Mahout?
>>
>>
>>
>> I knew that new algorithm (BIGFIM) will be implemented as a substitution
>> of
>> old PFG algorithm for parallel frequent pattern mining.
>>
>>
>>
>> Please let me know if you knew the reason.
>>
>>
>>
>> Thanks!
>>
>>
>>
>


Re: About Parallel Frequent Growth algorithm

2014-01-20 Thread Sebastian Schelter

Hi Yoonmin,

we removed a bunch of algorithms either because they were rarely used or 
not actively maintained anymore. IIRC the first thing was true for PFG.


--sebastian

On 01/20/2014 03:42 AM, Yoonmin Nam wrote:

Hello, everyone!



Does anyone know the reason why PFG is deprecated in Mahout?



I understand that a new algorithm (BIGFIM) will be implemented as a substitute
for the old PFG algorithm for parallel frequent pattern mining.



Please let me know if you know the reason.



Thanks!






[OT] Use Cases for Taming Text, 2nd ed.

2014-01-20 Thread Grant Ingersoll
Hi Mahout Users,

Drew Farris, Tom Morton and I are currently working on the 2nd Edition of 
Taming Text (http://www.manning.com/ingersoll for first ed.) and are soliciting 
interested parties who would be willing to contribute to a chapter on practical 
use cases (i.e. you have something in production and are willing to write about 
it) for search with Solr, NLP using OpenNLP or Stanford NLP and machine 
learning using Mahout, OpenNLP or MALLET -- ideally you are using combinations 
of 2 or more of these to solve your problems.  We are especially interested in 
large scale use cases in eCommerce, Advertising, social media analytics, fraud, 
etc.

The writing process is fairly straightforward.  A section roughly equates to 
somewhere between 3 - 10 pages, including diagrams/pictures.  After writing, 
there will be some feedback from editors and us, but otherwise the process is 
fairly simple.

In order to participate, you must have permission from your company to write on 
the topic.  You would not need to divulge any proprietary information, but we 
would want enough information for our readers to gain a high-level 
understanding of your use case.  In exchange for your participation, you will 
have your name and company published on that section of the book as well as in 
the acknowledgments section.  If you have a copy of Lucene in Action or Mahout 
In Action, it would be similar to the use case sections in those books.

If you are interested, please respond privately to me using my 
gsing...@apache.org email address with this subject line.

Thanks,
Grant, Drew, Tom






Re: MAHOUT 0.9 Release - New URL

2014-01-20 Thread Suneel Marthi
This is an issue (a trivial one, though) that needs to be fixed for the 0.9
release. I will be rerolling the release today (in the next few hours) and
putting out a new release candidate in staging.

Thanks for reporting this, Andrew P.





On Monday, January 20, 2014 12:34 AM, Andrew Palumbo  wrote:
 
I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM).  I had a bit
of trouble getting the Hadoop natives to compile and therefore may have run
into some problems because of the Hadoop setup.  I ran into some problems in
the example scripts, particularly with ./cluster-syntheticcontrol.sh ->4,5.  I
will run through the rest of the examples when I'm sure I've got Hadoop set up
right.


Apache Maven 3.1.2-SNAPSHOT 
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
$MAHOUT_LOCAL=true
Hadoop 2.2.0


a) Verify that u can unpack the release (tar or zip) ...passed (tar) [passed ]

b) Verify u r able to compile the distro

    mvn compile- [passed with warnings]

    [WARNING]  Expected all dependencies to require Scala version: 2.9.3
    [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
    [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
    [WARNING] Multiple versions of scala libraries detected!

c)  Run through the unit tests: mvn clean test
    mvn clean test [passed]

d) Run the example scripts under $MAHOUT_HOME/examples/bin.
Please run through all the different options in each script

    Running example scripts with $MAHOUT_LOCAL=true

    ./cluster-syntheticcontrol.sh ->1 [works]
    ./cluster-syntheticcontrol.sh ->2 [works]
    ./cluster-syntheticcontrol.sh ->3 [works]


    ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
    [...]
    WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
    java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:171)
        at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn


    ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]

    WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
    java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:171)
        at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
    WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
    Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.


    ./classify-20newsgroups.sh ->1 [works]
    ./classify-20newsgroups.sh ->2 [works]


    cluster-reuters.sh ->1 [works]
    cluster-reuters.sh ->2 [works]
    cluster-reuters.sh ->3 [works]
    
    Same error as noted previously in the thread:

    cluster-reuters.sh ->4 [0 clusters]

    [...]

    WARNING: No qualcluster.props found on classpath, will use command-line arguments only
    Num clusters: 0; maxDistance: 0.00
    [Dunn Index] First: Infinity
    [Davies-Bouldin Index] First: NaN
    Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
    INFO: Program took 669 ms (Minutes: 0.01115)
    
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train






> Date: Thu, 16 Jan 2014 06:41:09 -0800
> From: suneel_mar...@yahoo.com
> Subject: MAHOUT 0.9 Release - New URL 
> To: user@mahout.apache.org; d...@mahout.apache.org
> 
> Third time's a Charm!!!
> 
> 
> Here's the new URL for Mah

Re: MAHOUT 0.9 Release - New URL

2014-01-20 Thread Suneel Marthi
Hmmm... that's an issue. Since both Dirichlet and Meanshift clustering have 
been removed from 0.9, cluster-syntheticcontrol.sh options 4,5 are not gonna 
work and should have been removed for 0.9.
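
For anyone wanting to check a script menu against a given build, here is a
minimal sketch of the failure mode seen in the stack traces: a class name is
looked up with Class.forName and the lookup fails because the class is no longer
on the classpath. The class ExampleJobCheck and its two methods are hypothetical
names made up for illustration and are not part of Mahout; only the two job class
names are taken from the log. It assumes Java 7 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: reports which menu options point
// at job classes that cannot be loaded from the current classpath.
public class ExampleJobCheck {

    /** Returns true if the named class can be located on the classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, ExampleJobCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the option numbers whose job class is missing, in insertion order. */
    public static List<Integer> missingOptions(Map<Integer, String> optionToClass) {
        List<Integer> missing = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : optionToClass.entrySet()) {
            if (!isOnClasspath(entry.getValue())) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Class names copied from the ClassNotFoundException messages in the report.
        Map<Integer, String> options = new LinkedHashMap<>();
        options.put(4, "org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job");
        options.put(5, "org.apache.mahout.clustering.syntheticcontrol.meanshift.Job");
        System.out.println("Options with missing job classes: " + missingOptions(options));
    }
}
```

Run against a classpath that lacks the two removed jobs, this should list both
options as missing; a check of this kind only shows which menu entries are stale
and does not replace removing them from the script.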

To the PMC:

Should we roll back the release, fix this issue (along with the other patches
that were submitted in the last few days) and put out another release?







On Monday, January 20, 2014 12:33 AM, Andrew Palumbo  wrote:
 
I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM).  I had a bit
of trouble getting the Hadoop natives to compile and therefore may have run
into some problems because of the Hadoop setup.  I ran into some problems in
the example scripts, particularly with ./cluster-syntheticcontrol.sh ->4,5.  I
will run through the rest of the examples when I'm sure I've got Hadoop set up
right.


Apache Maven 3.1.2-SNAPSHOT 
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
$MAHOUT_LOCAL=true
Hadoop 2.2.0


a) Verify that u can unpack the release (tar or zip) ...passed (tar) [passed ]

b) Verify u r able to compile the distro

    mvn compile- [passed with warnings]

    [WARNING]  Expected all dependencies to require Scala version: 2.9.3
    [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
    [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
    [WARNING] Multiple versions of scala libraries detected!

c)  Run through the unit tests: mvn clean test
    mvn clean test [passed]

d) Run the example scripts under $MAHOUT_HOME/examples/bin. 
Please run through all the different options in each script

    Running example scripts with $MAHOUT_LOCAL=true

    ./cluster-syntheticcontrol.sh ->1 [works]
    ./cluster-syntheticcontrol.sh ->2 [works]
    ./cluster-syntheticcontrol.sh ->3 [works]


    ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
    [...]
    WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
    java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:171)
        at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn


    ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]

    WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
    java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:171)
        at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
    WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
    Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.


    ./classify-20newsgroups.sh ->1 [works]
    ./classify-20newsgroups.sh ->2 [works]


    cluster-reuters.sh ->1 [works]
    cluster-reuters.sh ->2 [works]
    cluster-reuters.sh ->3 [works]
    
    Same error as noted previously in the thread:

    cluster-reuters.sh ->4 [0 clusters]

    [...]

    WARNING: No qualcluster.props found on classpath, will use command-line arguments only
    Num clusters: 0; maxDistance: 0.00
    [Dunn Index] First: Infinity
    [Davies-Bouldin Index] First: NaN
    Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
    INFO: Program took 669 ms (Minutes: 0.01115)
    
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train






> Date: Thu, 16 Jan 2014 06:41:09 -0800
> From: suneel_mar...@yahoo.com
> Subject: MAHOUT 0.9 Release - New URL 
> To: