Mahout GSoC 2010: Association Mining

Neal Clark Fri, 09 Apr 2010 14:12:42 -0700

Hello,

I just wanted to introduce myself. I am an MSc Computer Science
student at the University of Victoria. My research over the past year
has focused on developing and implementing an Apriori-based
frequent item-set mining algorithm for mining large data sets at low
support counts.


https://docs.google.com/Doc?docid=0ATkk_-6ZolXnZGZjeGYzNzNfOTBjcjJncGpkaA&hl=en

The main finding of the above report is that frequent item sets can
be computed efficiently at support levels as low as 0.001% on the
webdocs (1.4GB) dataset. On a 100-core cluster, all frequent k2 pairs
can be calculated in approximately 6 minutes.

I have an optimized k2 Hadoop implementation and algorithm for
generating frequent pairs, and I am currently extending my work to
item sets of any length. The analysis of the extended approach will be
complete within the next two weeks.
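
For anyone unfamiliar with the k2 step, here is a minimal
single-machine sketch of Apriori-style frequent pair counting. It is
only an illustration, not the Hadoop implementation described above:
the function name frequent_pairs and the min_support parameter (an
absolute transaction count) are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {pair: count} for item pairs found in >= min_support transactions.

    Illustrative only: min_support is an absolute count, and everything
    is held in memory rather than distributed across a cluster.
    """
    # Pass 1: count each item at most once per transaction.
    item_counts = Counter()
    for t in transactions:
        item_counts.update(set(t))
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Pass 2: count only pairs whose items are both frequent.
    # This is the Apriori pruning step: a pair cannot be frequent
    # unless each of its items is frequent.
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {p: c for p, c in pair_counts.items() if c >= min_support}


if __name__ == "__main__":
    baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
    print(frequent_pairs(baskets, min_support=2))
```

Sorting the surviving items before pairing means each pair has one
canonical order, so ("a", "b") and ("b", "a") are counted together.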

Would you be interested in moving forward with such an implementation
as a GSoC project? If so, any comments or feedback would be very much
appreciated. If you are interested, I can create a proposal and submit
it to your issue tracker when it comes back online.

Thanks,

Neal.
