On Sun, Nov 22, 2009 at 9:54 AM, Ted Dunning <[email protected]> wrote:
> Expressing your symbolic sequences by tiling these phrases gives you much of
> the temporality that you are interested and lets you use algorithms like
> k-means pretty much directly.

The approach sounds interesting. Can you explain a bit about how you intend to
represent a sequence as a vector here? Assume the sequence is "a b a a c". I
was thinking of the following two approaches:

If I use symbols as my basis and time-slices as the coefficients, then I would
lose the information about recurring symbols (symbol a in my example).
e.g. vector representation of "a b a a c": 1(a) + 2(b) + 5(c)
(problem: how to incorporate 3a, 4a)

On the other hand, if I use time-slices as my basis and some mapping of the
symbols as the coefficients, then a simple Euclidean measure won't make any
sense, since the numeric codes assigned to the symbols are arbitrary.
e.g. let a->1, b->2, c->3; then the vector representation of "a b a a c" is:
1(t1) + 2(t2) + 1(t3) + 1(t4) + 3(t5)

-Prasen

>
> If you don't have symbolic sequences, you have another problem, but you
> might get similar results by doing vector quantization on your continuous
> time-series expressed in terms of multi-scale localized spectral detectors.
> Some problems work well with those techniques, some definitely need more
> interesting feature detectors. The spectral processing and vector
> quantization are fairly natural tasks for map-reduce which is nice. In
> fact, vector quantization is commonly done with some variant on k-means.
>
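
To make the question concrete, here is a minimal Python sketch of the two representations of "a b a a c" described above, plus one possible reading of the phrase-tiling suggestion: counting overlapping n-grams. The function names, the choice of n = 2, and the n-gram interpretation itself are assumptions for illustration; nothing in this thread specifies them.

```python
from collections import Counter
from itertools import product


def symbol_basis_vector(seq, alphabet):
    """Symbols as basis, time-slice of first occurrence as coefficient.

    Later occurrences of a symbol are dropped, which is the information
    loss described above.
    """
    first_seen = {}
    for t, s in enumerate(seq, start=1):
        first_seen.setdefault(s, t)
    return [first_seen.get(s, 0) for s in alphabet]


def time_slice_vector(seq, codes):
    """Time-slices as basis, an arbitrary numeric code per symbol as coefficient."""
    return [codes[s] for s in seq]


def ngram_count_vector(seq, alphabet, n=2):
    """Assumed reading of 'tiling': counts of overlapping n-grams.

    The basis is every possible n-gram over the alphabet, so the vector
    length depends only on the alphabet size and n, not on len(seq).
    """
    basis = list(product(alphabet, repeat=n))
    counts = Counter(zip(*(seq[i:] for i in range(n))))
    return [counts[g] for g in basis]


seq = "a b a a c".split()
alphabet = ["a", "b", "c"]
codes = {"a": 1, "b": 2, "c": 3}  # arbitrary mapping from the example

print(symbol_basis_vector(seq, alphabet))  # 1(a) + 2(b) + 5(c)
print(time_slice_vector(seq, codes))       # 1(t1) + 2(t2) + 1(t3) + 1(t4) + 3(t5)
print(ngram_count_vector(seq, alphabet))   # basis: aa ab ac ba bb bc ca cb cc
```

Under this reading, every sequence maps to a vector of the same length, repeated symbols contribute through the n-grams they take part in, and local ordering is kept within each n-gram, so a Euclidean distance between two such vectors is at least well defined. Whether this matches the tiling you have in mind is exactly what I am asking.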
