Grant: Chapter 5 of Han and Kamber (Data Mining: Concepts and Techniques) details itemset mining and the FP-growth algorithm. Han is one of its co-inventors.
There is a bit of repetition in the output compared to other itemset mining packages, though this structure is convenient for relational indexing by key.

- Neal

On Mon, Feb 15, 2010 at 6:49 AM, Robin Anil <[email protected]> wrote:
> Ok.. A bit more background..
>
> An itemset is a subset of the items I1, I2, I3, ..., In
>
> so [I2, I4, I7] is an itemset, and its support (the number of times it
> appears in the dataset) is, say, Y
>
> A Pattern is Pair<Itemset, support>
>
> Take a look at it in this format
>
> 68:
> ([68],90692),
> ([17, 68],90683),
> ([12, 68],90490),
> ([17, 12, 68],90481),
> ([18, 68],90291)
>
> These are the top patterns containing 68, with their support in descending order.
> 68 occurs with 12, 90490 times.
>
> Robin
>
>
> On Mon, Feb 15, 2010 at 6:27 PM, Grant Ingersoll <[email protected]> wrote:
>
>>
>> On Feb 14, 2010, at 11:37 PM, Robin Anil wrote:
>>
>> > Each key is a feature and each attribute is the top-K frequent patterns in
>> > which the feature exists
>>
>> Still a bit confused.
>> Given:
>> Key: 68: Value: ([68],90692), ([17, 68],90683), ([12, 68],90490),
>> ([17, 12, 68],90481), ([18, 68],90291), ([17, 18, 68],90282),
>> ([12, 18, 68],90229), ([17, 12, 18, 68],90220), ([31, 68],89071),
>> ([17, 31, 68],89062), ([12, 31, 68],88874), ([17, 12, 31, 68],88865),
>> ([18, 31, 68],88681), ([17, 18, 31, 68],88672), ([12, 18, 31, 68],88619),
>> ([17, 12, 18, 31, 68],88610), ([16, 68],87933),
>>
>> So, 68 is the feature in question. That makes sense. Then, what is the
>> significance of the [] areas, as in [68],90692 or [17,12,68], 90481? Why
>> all the repetition?
>>
>> -Grant
>
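To make the key/value structure concrete, here is a minimal Python sketch of "for each feature, the top-K patterns containing it, by descending support". It rests on several assumptions. The transactions are made-up toy data. Support is counted by naive subset enumeration rather than FP-growth. The function names are hypothetical, and this is not Mahout's API. It also shows where the repetition Neal mentions comes from: a pattern such as [12, 17, 68] is listed under every feature it contains, so it shows up under keys 12, 17, and 68. Items inside each itemset are simply sorted ascending here, which differs from the item order in the output quoted above.

```python
from collections import Counter
from itertools import combinations


def count_supports(transactions, max_len=3):
    """Count support (number of transactions containing the itemset) for every
    itemset of up to max_len items. Naive enumeration, NOT FP-growth."""
    support = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, min(max_len, len(items)) + 1):
            for itemset in combinations(items, k):
                support[itemset] += 1
    return support


def top_k_patterns_per_feature(transactions, k=5, max_len=3):
    """Hypothetical helper: return {feature: [(itemset, support), ...]} holding
    the k highest-support itemsets that contain each feature."""
    support = count_supports(transactions, max_len)
    by_feature = {}
    for itemset, s in support.items():
        # The same pattern is filed under every feature it contains,
        # which is the source of the repetition in the output.
        for feature in itemset:
            by_feature.setdefault(feature, []).append((list(itemset), s))
    result = {}
    for feature, patterns in by_feature.items():
        # Descending support; ties broken by shorter itemset, then item order.
        patterns.sort(key=lambda p: (-p[1], len(p[0]), p[0]))
        result[feature] = patterns[:k]
    return result


if __name__ == "__main__":
    # Toy transactions, invented for illustration only.
    transactions = [
        [17, 12, 68],
        [17, 12, 68, 18],
        [17, 68],
        [12, 68],
        [17, 12],
    ]
    patterns = top_k_patterns_per_feature(transactions, k=5)
    print(patterns[68])
    # → [([68], 4), ([12, 68], 3), ([17, 68], 3), ([12, 17, 68], 2), ([18, 68], 1)]
```

Storing the patterns per feature duplicates data, but as Neal notes, it lets a lookup by a single key return every top pattern for that feature without scanning the whole pattern set.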
