Re: Mahout GSoC 2010: Association Mining

Ted Dunning Fri, 09 Apr 2010 17:22:16 -0700

Neal, I think that this might well be a useful contribution to Mahout, but,
if I am not mistaken, the deadline for student proposals for GSoC has just
passed.


That likely means that making this contribution an official GSoC project is
not possible.  I am sure that the Mahout community would welcome you as a
contributor even without official Google status.  If you would like to do
this, go ahead and propose what you want to do (when JIRA comes back or just
by email discussion) and you can get started.

On Fri, Apr 9, 2010 at 2:11 PM, Neal Clark <[email protected]> wrote:

> Hello,
>
> I just wanted to introduce myself. I am an MSc Computer Science
> student at the University of Victoria. My research over the past year
> has been focused on developing and implementing an Apriori-based
> frequent item-set mining algorithm for mining large data sets at low
> support counts.
>
>
> https://docs.google.com/Doc?docid=0ATkk_-6ZolXnZGZjeGYzNzNfOTBjcjJncGpkaA&hl=en
>
> The main finding of the above report is that support levels as low as
> 0.001% on the webdocs (1.4GB) dataset can be efficiently calculated.
> On a 100-core cluster all frequent k2 pairs can be calculated in
> approximately 6 minutes.
>
> I currently have an optimized k2 Hadoop implementation and algorithm
> for generating frequent pairs and I am currently extending my work to
> items of any length. The analysis of the extended approach will be
> complete within the next two weeks.
>
> Would you be interested in moving forward with such an implementation
> as a GSoC project? If so, any comments/feedback would be very much
> appreciated. If you are interested I can create a proposal and submit
> it to your issue tracker when it comes back online.
>
> Thanks,
>
> Neal.
>
