What is the best way to tell if Apache code is being maintained, in
particular the fp-growth algorithm in Spark's MLlib?
My original intent (5 months ago) was to replace the MapReduce portion
of the fp-growth code with an alternative, though I wasn't sure what
that alternative should be.
My motivation for wanting frequent itemsets is that they are downward
closed (every subset of a frequent itemset is itself frequent), and
hence closed under intersection, so they form abstract simplicial
complexes. I've written software for mining simplicial complexes for
their geometry; more precisely, for their 2-dimensional persistent
homology. That means I can look at how the geometry changes as the
support and confidence parameters vary. I'm hoping to take at least
some of the guesswork out of choosing these parameters, which still
seems to be something of an open question.
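The closure claim above can be checked on a toy example. The sketch
below is a hypothetical stand-in, not Spark's FP-growth: a naive
exhaustive miner over made-up transactions, followed by assertions
that the frequent itemsets are downward closed and closed under
intersection, which is exactly the simplicial-complex structure.

```python
from itertools import combinations

# Toy transaction database (hypothetical example data).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]

def frequent_itemsets(transactions, min_support):
    """Naively enumerate all itemsets meeting min_support (absolute
    count). Fine for tiny data; FP-growth exists precisely because
    this approach is exponential in general."""
    items = set().union(*transactions)
    frequent = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(sorted(items), k):
            s = frozenset(candidate)
            count = sum(1 for t in transactions if s <= t)
            if count >= min_support:
                frequent[s] = count
    return frequent

freq = frequent_itemsets(transactions, min_support=3)

# Downward closure (the Apriori property): every nonempty subset of a
# frequent itemset is itself frequent ...
for s in freq:
    for k in range(1, len(s)):
        for sub in combinations(s, k):
            assert frozenset(sub) in freq

# ... and therefore the family is also closed under intersection,
# since an intersection of two frequent itemsets is a subset of each.
for s in freq:
    for t in freq:
        if s & t:
            assert (s & t) in freq
```

On this data with min_support=3, the frequent itemsets are the three
singletons {a}, {b}, {c} and the three pairs {a,b}, {a,c}, {b,c}, while
{a,b,c} falls short: the complex is the boundary of a triangle, which
is the kind of geometric object the homology software consumes.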
So for now I'll see whether Spark's implementation generates usable
frequent itemsets, have some fun learning Scala along the way, and
look into getting fp-growth running on top of Flink.
On 04/27/2015 07:59 AM, Ted Dunning wrote:
Ray,
Is the Spark implementation usable? Is it maintained? If not, there is
a decent reason to move forward.
I don't think that we want to revive the old map-reduce implementation.
On Mon, Apr 27, 2015 at 5:48 AM, ray <rtmel...@gmail.com
<mailto:rtmel...@gmail.com>> wrote:
I had it in mind to volunteer to maintain the fp-growth code in
Mahout, but I see that Spark has an fp-growth implementation. So
now that I have the time to work on this, I'm wondering if there is
any point, or if there is still any interest in the Mahout community.
If not, so be it. If so, I volunteer.
Regards, Ray.