Ted,

Could you point me to some good papers or other resources on the approaches
you mentioned? I would like to look at them after the weekend.
As far as I can see, association mining (and the GUHA method in its
original form) is not meant to be a predictive method but rather an
exploratory one (it has some predictive power, but that power is not
formally supported by the theoretical background). However, comparing
association mining to other approaches could be a very interesting topic as
well.

Best regards,
Lukas

On Fri, Apr 9, 2010 at 8:03 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Lukas,
>
> The strongest alternative for this kind of application (and the normal
> choice for large scale applications) is on-line gradient descent learning
> with an L_1 or L_1 + L_2 regularization.  The typical goal is to predict
> some outcome (click or purchase or signup) from a variety of large
> vocabulary features.  As such, association mining is usually just a
> pre-processing step before actual learning is applied.  There is some
> indication that an efficient sparse on-line gradient descent algorithm
> applied to features and combinations could do just as well, especially if
> the learning on combinations is applied in several passes.
>
> These on-line algorithms have the virtue of being extremely fast and, with
> feature sharding, have substantial potential for parallel implementation.
>
> What do you think about these two methods?  Can you compare them?
>
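
For concreteness, here is a minimal sketch of the kind of on-line learner
described above: logistic regression trained by stochastic gradient descent on
sparse features, with an L_1 + L_2 penalty. The class and function names, the
"a&b" naming of feature combinations, and the toy click-stream features are
all made up for illustration. As a simplification, the penalty is applied only
to the features present in each example, so this is a sketch rather than a
tuned implementation.

```python
import math
from itertools import combinations


def soft_threshold(w, t):
    """Shrink w toward zero by t (the proximal step for an L_1 penalty)."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def with_pairs(features):
    """Add pairwise feature combinations (hypothetical 'a&b' naming)."""
    expanded = dict(features)
    for a, b in combinations(sorted(features), 2):
        expanded[a + "&" + b] = features[a] * features[b]
    return expanded


class OnlineLogisticRegression:
    """Illustrative sparse on-line learner; not taken from any library."""

    def __init__(self, learning_rate=0.1, l1=0.0, l2=0.0):
        self.learning_rate = learning_rate
        self.l1 = l1
        self.l2 = l2
        self.weights = {}  # sparse: only nonzero weights are stored

    def predict(self, features):
        z = sum(self.weights.get(k, 0.0) * v for k, v in features.items())
        z = max(-35.0, min(35.0, z))  # keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, label):
        # Gradient of the log loss with respect to each active weight is
        # (prediction - label) * feature value; L_2 adds l2 * w.
        error = self.predict(features) - label
        for k, v in features.items():
            w = self.weights.get(k, 0.0)
            w -= self.learning_rate * (error * v + self.l2 * w)
            # The L_1 part is applied by soft-thresholding after the step.
            w = soft_threshold(w, self.learning_rate * self.l1)
            if w == 0.0:
                self.weights.pop(k, None)
            else:
                self.weights[k] = w


# Toy data (invented): label 1 means the session reached the order form.
examples = [
    ({"saw_product_page": 1.0, "returning_visitor": 1.0}, 1),
    ({"saw_negative_comment": 1.0, "returning_visitor": 1.0}, 0),
]

model = OnlineLogisticRegression(learning_rate=0.1, l1=0.001, l2=0.001)
for _ in range(200):  # several passes over features and their combinations
    for features, label in examples:
        model.update(with_pairs(features), label)
```

Each update touches only the features present in the example, which is what
makes the method cheap on large-vocabulary data; the L_1 step drives weights
that carry no signal back to exactly zero so they drop out of the model.
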
> On Fri, Apr 9, 2010 at 4:26 AM, Lukáš Vlček <lukas.vl...@gmail.com> wrote:
>
> > One example would be analysis of a click stream, where you can learn
> > that people who visit some negative comment on the product blog never
> > enter the order form. I am not saying this is the best example, but in
> > general this is the essence of it. You simply need to take all possible
> > values from the transaction into account, even if they are missing from
> > the market basket....
> >
> > The biggest challenge in implementing this would be the fact that the
> > analysis has to deal with all the data (not just the most frequent
> > patterns) and combinations. It is very resource-intensive.
> >
>
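
To make the click-stream example concrete, here is a small sketch that counts
the four-fold (2x2) contingency table for a pair of pages and from it derives
the confidence of a rule with a negated consequent ("visited X => did NOT
visit Y"). The function names, page names, and sessions are invented for
illustration. Note that the counts need every session, including those where
a page is absent, which is why the frequent patterns alone are not enough.

```python
def fourfold_table(sessions, antecedent, succedent):
    """Return (a, b, c, d) counts over all sessions:
    a: antecedent and succedent      b: antecedent, no succedent
    c: succedent, no antecedent      d: neither
    """
    a = b = c = d = 0
    for pages in sessions:
        has_ant = antecedent in pages
        has_suc = succedent in pages
        if has_ant and has_suc:
            a += 1
        elif has_ant:
            b += 1
        elif has_suc:
            c += 1
        else:
            d += 1
    return a, b, c, d


def negative_rule_confidence(sessions, antecedent, succedent):
    """Confidence of 'antecedent => NOT succedent', or None if the
    antecedent never occurs."""
    a, b, _, _ = fourfold_table(sessions, antecedent, succedent)
    return b / (a + b) if (a + b) else None


# Invented sessions; each is the set of pages visited.
sessions = [
    {"home", "negative_comment"},
    {"home", "product", "order_form"},
    {"negative_comment", "product"},
    {"home", "order_form"},
    {"home"},
]

table = fourfold_table(sessions, "negative_comment", "order_form")
confidence = negative_rule_confidence(sessions, "negative_comment", "order_form")
```

In this toy data no session containing the negative comment reaches the order
form, so the rule holds in every session where the antecedent occurs.
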