Re: boost selected dimensions in kmeans clustering
On Thu, Jan 15, 2015 at 5:23 AM, Miguel Angel Martin junquera <mianmarjun.mailingl...@gmail.com> wrote:
> My question is:
> Is it better to scale up these dimensions directly in the final mixed tf-idf
> sequence file using these correction factors, OR to first scale up each
> tf-vector, then mix the vectors and recalculate the final tf-idf, to minimize
> errors or deviations in a subsequent clustering of these final mixed tf-idf
> vectors?

Mathematically it doesn't matter whether you scale the vectors at generation time, just before computing distances, or during the distance computation itself. Some places are easier to change than others in terms of programming. The two easiest places tend to be at the beginning (if you know the weights), since you have to write that code anyway, or at the end, since some programs provide a way to change the distance metric.
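A minimal Python sketch of why the placement doesn't matter, under stated assumptions: sparse vectors are plain {dimension: value} dicts, tf-idf is plain tf × log(N/df) with no sublinear tf and no length normalization, and distance is Euclidean. Because idf depends only on document frequency, which multiplying a term's count by a positive factor does not change, scaling tf before computing tf-idf gives the same vector as scaling the final tf-idf. Likewise, scaling dimension i by w_i before a Euclidean distance is the same as weighting that dimension's squared difference by w_i squared inside the distance. All helper names and data below are hypothetical.

```python
import math

# Sparse vectors are {dimension_index: value} dicts (hypothetical toy format).

def tfidf(tf, df, n_docs):
    """Plain tf * idf with idf = log(N / df); no sublinear tf, no normalization."""
    return {i: v * math.log(n_docs / df[i]) for i, v in tf.items()}

def scale(vec, weights):
    """Multiply selected dimensions by their weight; other dimensions unchanged."""
    return {i: v * weights.get(i, 1.0) for i, v in vec.items()}

def euclidean(a, b):
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(i, 0.0) - b.get(i, 0.0)) ** 2 for i in keys))

def weighted_euclidean(a, b, weights):
    """Scaling dimension i by w_i equals weighting its squared difference by w_i**2."""
    keys = set(a) | set(b)
    return math.sqrt(sum(weights.get(i, 1.0) ** 2 * (a.get(i, 0.0) - b.get(i, 0.0)) ** 2
                         for i in keys))

n_docs = 10
df = {0: 2, 1: 5, 2: 1}          # made-up document frequencies
tf_a = {0: 3.0, 1: 1.0}
tf_b = {1: 2.0, 2: 4.0}
weights = {0: 2.5}               # hypothetical positive boost for dimension 0

# 1) Scale tf, then compute tf-idf  vs.  compute tf-idf, then scale.
early = tfidf(scale(tf_a, weights), df, n_docs)
late = scale(tfidf(tf_a, df, n_docs), weights)
print(all(math.isclose(early[i], late[i]) for i in early))

# 2) Scale the vectors first  vs.  scale inside the distance computation.
va, vb = tfidf(tf_a, df, n_docs), tfidf(tf_b, df, n_docs)
d_scaled = euclidean(scale(va, weights), scale(vb, weights))
d_weighted = weighted_euclidean(va, vb, weights)
print(math.isclose(d_scaled, d_weighted))
```

One caveat under these assumptions: if the vectors are length-normalized after tf-idf, scaling before normalization and scaling after it can give different Euclidean distances, because each document's norm changes. Cosine distance is unaffected, since it ignores each vector's length.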
Re: boost selected dimensions in kmeans clustering
Hi Ted,

Yes, I was considering various possibilities, and this was one of them (scaling up these dimensions, for example by multiplying by a configurable correction factor).

What I actually want is to mix two different vectors from the same documents, with different lengths and dictionaries (perhaps some terms in the two dictionaries are the same). Then I will multiply each vector's dimensions by a configurable correction factor.

My question is:
Is it better to scale up these dimensions directly in the final mixed tf-idf sequence file using these correction factors, OR to first scale up each tf-vector, then mix the vectors and recalculate the final tf-idf, to minimize errors or deviations in a subsequent clustering of these final mixed tf-idf vectors?

Thanks in advance for your help.

One last note: I am a bass player, and the 701q AKG with the fiio E12+E09K is a perfect combination!! ;-)

2015-01-14 20:12 GMT+01:00 Ted Dunning:
> The easiest way is to scale those dimensions up.
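One possible way to do the mixing described above, sketched in Python under assumptions: each document's two vectors are {index: value} dicts over dictionaries A and B, and the mixed vector is their concatenation, with B's indices shifted by the size of dictionary A so the index spaces cannot collide. The function name, sizes, and factors are hypothetical. A term that appears in both dictionaries ends up as two separate dimensions here; merging such terms would require building a shared dictionary first, which this sketch does not do.

```python
from typing import Dict

SparseVector = Dict[int, float]  # {dimension_index: value}

def mix_vectors(vec_a: SparseVector, size_a: int, factor_a: float,
                vec_b: SparseVector, factor_b: float) -> SparseVector:
    """Concatenate two sparse vectors built from different dictionaries.

    Indices of vec_a are kept; indices of vec_b are shifted by size_a (the
    size of dictionary A) so the two index spaces cannot collide. Each part
    is multiplied by its own correction factor.
    """
    if any(i < 0 or i >= size_a for i in vec_a):
        raise ValueError("vec_a has an index outside dictionary A")
    mixed = {i: v * factor_a for i, v in vec_a.items()}
    for j, v in vec_b.items():
        mixed[size_a + j] = v * factor_b
    return mixed

# Hypothetical example: dictionary A has 4 terms, dictionary B has 3.
doc_a = {0: 0.5, 3: 1.2}
doc_b = {1: 0.8}
mixed = mix_vectors(doc_a, 4, 1.0, doc_b, 2.0)
print(mixed)  # {0: 0.5, 3: 1.2, 5: 1.6}
```

Keeping the factors as parameters makes them easy to tune between clustering runs without regenerating the per-source vectors.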
Re: boost selected dimensions in kmeans clustering
The easiest way is to scale those dimensions up.

On Wed, Jan 14, 2015 at 2:41 AM, Miguel Angel Martin junquera <mianmarjun.mailingl...@gmail.com> wrote:
> Hi all,
>
> I am using kmeans to cluster several text documents from distinct sources,
> and I have already generated the sparse vectors for each document.
> I want to boost some dimensions in the sparse vectors.
>
> What is the best way to do this?
>
> Is it a good idea to load the vectors, find the tf or tf-idf values of
> those dimensions, and boost these values?
>
> Thanks in advance and regards
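A minimal Python sketch of "scale those dimensions up", assuming the vectors have already been loaded into {dimension_index: value} dicts and that a term-to-index dictionary is available to locate the dimensions to boost. The dictionary contents, boost factor, and function name are hypothetical, and reading the actual vector files is not shown.

```python
from typing import Dict, Iterable, List

SparseVector = Dict[int, float]  # {dimension_index: value}

def boost_dimensions(vectors: Iterable[SparseVector],
                     boosts: Dict[int, float]) -> List[SparseVector]:
    """Return copies of the vectors with selected dimensions multiplied by
    their boost factor. Dimensions a vector does not contain stay absent
    (boosting a zero still gives zero), so sparsity is preserved."""
    return [{i: v * boosts.get(i, 1.0) for i, v in vec.items()}
            for vec in vectors]

# Hypothetical dictionary mapping terms to dimension indices.
dictionary = {"alpha": 0, "beta": 1, "gamma": 2}

# Boost the dimension for "alpha" by a hypothetical factor of 3.
boosts = {dictionary["alpha"]: 3.0}

# Two toy tf-idf vectors with made-up values.
docs = [{0: 0.2, 1: 0.4}, {1: 0.1, 2: 0.7}]
boosted = boost_dimensions(docs, boosts)
print(boosted)
```

The function returns new vectors rather than modifying the loaded ones in place, so the unboosted originals remain available for comparison runs.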