Thanks for pitching in.  Ordering is extremely important indeed.

On Thu, Nov 19, 2009 at 12:56 AM, Ted Dunning <[email protected]> wrote:
> If you want to preserve some ordering information, then you have a bit more
> of a problem.  The same basic idea can work where you model your data as a
> mixture density over sequence models.  Once you do that, then the mixture
> parameters make a reasonable space to cluster in.  If you have some kind of
> sequence model then the dirichlet process code currently in Mahout can be
> used to do your clustering.

Don't they (hidden-variable mixture models) contradict De Finetti's
basic exchangeability theorem, unless you are treating each sequence
itself as a term (which I think is probably what you are referring
to) and sampling over those? In that case, how would I create the
documents?

>
> There is probably one too many if's in the previous paragraph for you to be
> happy with it.
>
> Can you say something more about your sequences?  Can you say something
> about your resources?  Do you have a good sequence model?

Basically I want to cluster users' browsing behavior and see what the
dominant browsing paths are for a particular user. For example:
portal->sports->ad-click->movies->ad-click->ad-click etc.
I would also appreciate your thoughts on Suffix-Tree-Clustering based
approaches, which I have been contemplating. Meanwhile, there seems to
be a lot more work done on sequence clustering in bioinformatics than
in text/web mining.
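
To make the browsing-path example concrete, here is a rough sketch of one
way to keep local ordering while still ending up with term-doc style
counts: treat each contiguous n-gram of page types as a "term" and each
user as a "document". This is only an illustration under my own
assumptions. The function names (parse_path, ngram_counts, user_profile)
and the second sample session are made up for the example, and none of
this is Mahout code.

```python
from collections import Counter


def parse_path(path):
    """Split a path such as 'a->b->c' into the list ['a', 'b', 'c']."""
    return [step.strip() for step in path.split("->") if step.strip()]


def ngram_counts(steps, n=2):
    """Count contiguous n-grams, so local ordering survives as 'terms'."""
    return Counter(tuple(steps[i:i + n]) for i in range(len(steps) - n + 1))


def user_profile(paths, n=2):
    """Sum n-gram counts over all sessions of one user (the 'document')."""
    total = Counter()
    for path in paths:
        total.update(ngram_counts(parse_path(path), n))
    return total


# First session is the example from this mail; the second is hypothetical.
sessions = [
    "portal->sports->ad-click->movies->ad-click->ad-click",
    "portal->sports->ad-click",
]

profile = user_profile(sessions, n=2)

# The most frequent n-grams are a crude stand-in for "dominant paths".
for gram, count in profile.most_common(3):
    print("->".join(gram), count)
```

The resulting per-user counters could be fed to any ordinary term-doc
clustering step. Larger n keeps more ordering but makes the vectors
sparser, so this is a trade-off rather than a full sequence model.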

-Prasen

>
> On Wed, Nov 18, 2009 at 4:03 AM, prasenjit mukherjee
> <[email protected]> wrote:
>
>> Can we model the sequence clustering problem as a traditional
>> term-doc clustering problem?
>>
>> One approach I can think of is creating a self-similarity matrix
>> between the sequences and then running a traditional clustering
>> algorithm (spectral or k-means). That seems too expensive, though.
>>
>> Any suggestions?
>>
>> Thanks,
>> -Prasen
>>
>> On Wed, Nov 11, 2009 at 3:53 PM, Isabel Drost <[email protected]> wrote:
>> > On Sat prasenjit mukherjee <[email protected]> wrote:
>> >
>> >> I was thinking of using a semi-supervised (unsupervised would be even
>> >> better) sequence clustering technique (like CRF, HMM, etc.). Just
>> >> curious: has any work been done (or discussed) on this mailing list
>> >> to perform sequence clustering on temporal data?
>> >
>> > So far none that I am aware of. There were a few discussions on HMMs
>> > early on, but I am not sure what came out of that.
>> >
>> > Isabel
>> >
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
