What is the best way to tell whether Apache code is actively maintained, in particular the FP-growth implementation in Spark's MLlib?

My original intent (five months ago) was to replace the MapReduce portion of the FP-growth code with an alternative, though I wasn't sure what that alternative should be.

My motivation for wanting frequent itemsets is that they are downward closed: every subset of a frequent itemset is itself frequent (the Apriori property), so the collection of frequent itemsets forms an abstract simplicial complex. I've written software for mining simplicial complexes for their geometry, specifically their 2-dimensional persistent homology. That means I can watch how the geometry changes as the support and confidence parameters vary. I'm hoping to take at least some of the guesswork out of choosing those parameters, which still seems to be something of an open question.
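To make the closure point concrete, here is a tiny Scala sketch (the helper name is mine, not from any library): a family of itemsets is an abstract simplicial complex exactly when it contains every nonempty subset of each of its members.

    // Hypothetical helper: true iff the family is downward closed,
    // i.e. forms an abstract simplicial complex.
    def isSimplicialComplex(family: Set[Set[String]]): Boolean =
      family.forall(s => s.subsets().filter(_.nonEmpty).forall(family.contains))

    // Frequent itemsets always pass this check, by the Apriori
    // property: every subset of a frequent itemset is frequent.
    val freq = Set(
      Set("a"), Set("b"), Set("c"),
      Set("a", "b"), Set("a", "c"), Set("b", "c"),
      Set("a", "b", "c"))
    assert(isSimplicialComplex(freq))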

So for now I'll see whether Spark's implementation generates usable frequent itemsets, have some fun learning Scala, and look into getting FP-growth running on top of Flink.
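Concretely, I expect the sanity check to look something like this minimal sketch against the MLlib FPGrowth API introduced in Spark 1.3 (the toy transactions and the 0.5 threshold are just placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.fpm.FPGrowth

    val sc = new SparkContext(
      new SparkConf().setAppName("fpgrowth-check").setMaster("local[*]"))

    // Toy transactions; each row is one basket of items.
    val transactions = sc.parallelize(Seq(
      Array("a", "b", "c"),
      Array("a", "b"),
      Array("b", "c"),
      Array("a", "c")))

    val model = new FPGrowth()
      .setMinSupport(0.5)   // keep itemsets appearing in >= 50% of baskets
      .setNumPartitions(4)
      .run(transactions)

    model.freqItemsets.collect().foreach { fi =>
      println(fi.items.mkString("{", ",", "}") + " : " + fi.freq)
    }

If that produces sensible itemsets, feeding them to the persistent homology code is just a matter of re-reading each itemset as a simplex.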


On 04/27/2015 07:59 AM, Ted Dunning wrote:

Ray,

Is the Spark implementation usable?  Is it maintained?  If not, there is
a decent reason to move forward.

I don't think that we want to revive the old map-reduce implementation.



On Mon, Apr 27, 2015 at 5:48 AM, ray <rtmel...@gmail.com> wrote:

    I had it in mind to volunteer to maintain the fp-growth code in
    Mahout, but I see that Spark has an fp-growth implementation.  So
    now that I have the time to work on this, I'm wondering if there is
    any point, or if there is still any interest in the Mahout community.

    If not, so be it.  If so, I volunteer.

    Regards, Ray.
