Thanks for pitching in. Ordering is indeed extremely important.

On Thu, Nov 19, 2009 at 12:56 AM, Ted Dunning <[email protected]> wrote:
> If you want to preserve some ordering information, then you have a bit more
> of a problem. The same basic idea can work where you model your data as a
> mixture density over sequence models. Once you do that, then the mixture
> parameters make a reasonable space to cluster in. If you have some kind of
> sequence model then the dirichlet process code currently in Mahout can be
> used to do your clustering.

Don't they (hidden-variable mixture models) contradict De Finetti's basic
exchangeability theorem? Unless you are treating each sequence itself as a
term (which I think is probably what you are referring to) and doing
sampling on them. In that case, how am I creating documents?

> There is probably one too many if's in the previous paragraph for you to be
> happy with it.
>
> Can you say something more about your sequences? Can you say something
> about your resources? Do you have a good sequence model?

Basically I want to cluster users' browsing behavior and see what the
dominant browsing paths are for a particular user. For example:
portal->sports->ad-click->movies->ad-click->ad-click etc.

I would also appreciate your thoughts on suffix-tree-clustering based
approaches, which I have been contemplating. Meanwhile, there seems to be a
lot more work done on sequence clustering in bioinformatics than in
text/web mining.

-Prasen

>
> On Wed, Nov 18, 2009 at 4:03 AM, prasenjit mukherjee
> <[email protected]> wrote:
>
>> Can we model the sequence clustering problem into a traditional
>> term-doc clustering ?
>>
>> One approach I can think of is creating a self-similarity matrix
>> between the sequences and then running a traditional clustering algo (
>> spectral or k-means ). That seems to be too expensive though.
>>
>> Any suggestions ?
>>
>> Thanks,
>> -Prasen
>>
>> On Wed, Nov 11, 2009 at 3:53 PM, Isabel Drost <[email protected]> wrote:
>> > On Sat prasenjit mukherjee <[email protected]> wrote:
>> >
>> >> I was thinking of using a semi-supervised ( unsupervised will be even
>> >> better ) sequence clustering technique ( like CRF, HMM etc. ) Just
>> >> curious, any work been done ( or discussed ) in this mailing list to
>> >> perform sequence clustering using temporal data.
>> >
>> > So far none that I am aware of. There were a few discussions on HMMs
>> > early on, but I am not sure what came out of that.
>> >
>> > Isabel
>> >
>
> --
> Ted Dunning, CTO
> DeepDyve
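
P.S. To make the term-doc question from earlier in the thread concrete, here is a minimal sketch of one way it could look. It assumes each browsing session is a "document" and each transition between consecutive pages is a "term", so some local ordering survives in a plain bag-of-terms vector. The session data, the names bigram_terms and cosine, and the choice of cosine similarity are all made up for illustration; none of this is Mahout code or anything proposed by Ted.

```python
from collections import Counter
from math import sqrt


def bigram_terms(path):
    """Turn an ordered page path into a bag of 'a->b' transition terms."""
    return Counter(f"{a}->{b}" for a, b in zip(path, path[1:]))


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical sessions; the first one is the example path from this mail.
sessions = {
    "s1": ["portal", "sports", "ad-click", "movies", "ad-click", "ad-click"],
    "s2": ["portal", "sports", "ad-click", "movies"],
    "s3": ["movies", "sports", "portal"],
}

# One term-count vector ("document") per session.
docs = {name: bigram_terms(path) for name, path in sessions.items()}

# Pairwise self-similarity matrix, stored as a dict keyed by session pair.
similarity = {(a, b): cosine(docs[a], docs[b]) for a in docs for b in docs}
```

With single pages as terms, s3 would look similar to s1 and s2 because it visits the same pages; with transitions as terms it shares nothing with them, since the order is reversed. The pairwise matrix is quadratic in the number of sessions, which is the cost I was worried about, but the term vectors in docs could presumably be fed to k-means directly without building it.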
