Re: boost selected dimensions in kmeans clustering

2015-01-15 Thread Ted Dunning
On Thu, Jan 15, 2015 at 5:23 AM, Miguel Angel Martin junquera 
mianmarjun.mailingl...@gmail.com wrote:

 My question is:
 Is it better to scale up these dimensions directly in the final mixed
 tf-idf sequence file using these correction factors, OR to first scale up
 each tf vector, then mix the vectors and recalculate the final tf-idf, in
 order to minimize errors or deviations in a subsequent clustering of these
 final mixed tf-idf vectors?


Mathematically it doesn't matter whether you scale the vectors at
generation time, just before computing distance, or during the distance
computation itself.
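
As a small check of this equivalence, here is a sketch in Python. It assumes
Euclidean distance and one weight per dimension. The function names and the
sample numbers are made up for illustration and are not Mahout API.

```python
import math

def scale(vec, weights):
    """Multiply each dimension of a dense vector by its weight."""
    return [w * x for w, x in zip(weights, vec)]

def euclidean(a, b):
    """Plain Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def weighted_distance(a, b, weights):
    """Euclidean distance with the weights applied inside the metric."""
    return math.sqrt(sum((w * (x - y)) ** 2 for w, x, y in zip(weights, a, b)))

a = [1.0, 0.0, 2.0]
b = [0.0, 3.0, 1.0]
weights = [1.0, 1.0, 5.0]  # boost the third dimension (illustrative values)

# Option 1: scale the vectors first, then use the ordinary metric.
d_scaled_vectors = euclidean(scale(a, weights), scale(b, weights))
# Option 2: leave the vectors alone and weight inside the metric.
d_scaled_metric = weighted_distance(a, b, weights)
print(d_scaled_vectors, d_scaled_metric)
```

Both values agree up to floating-point rounding, so the choice of where to
apply the weights can be made on programming convenience alone.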

Different places for the change may be more or less easy in terms of
programming.  The two easiest places tend to be at the beginning (if you
know the weights) since you have to write that code anyway, or at the end
since there are provisions for changing the metric in some programs.
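
For the specific question of tf versus tf-idf, the sketch below suggests that
the order does not change the result. This holds under two assumptions: idf
depends only on document frequency (here log(N/df)), and no length
normalization or sublinear tf transform is applied afterwards. If either is
used, the two orders can differ. All names and numbers are illustrative.

```python
import math

def idf(docs):
    """idf per term index: log(N / df), where df counts docs with nonzero tf."""
    n = len(docs)
    df = {}
    for doc in docs:
        for term, tf in doc.items():
            if tf != 0:
                df[term] = df.get(term, 0) + 1
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf(docs):
    """Plain tf * idf, with no normalization."""
    weights = idf(docs)
    return [{t: tf * weights[t] for t, tf in doc.items()} for doc in docs]

def boost(docs, factors):
    """Multiply selected dimensions by their factor; others stay unchanged."""
    return [{t: v * factors.get(t, 1.0) for t, v in doc.items()} for doc in docs]

# Sparse tf vectors as {term index: count}; values are made up.
tf_docs = [{0: 2.0, 1: 1.0}, {1: 3.0, 2: 1.0}, {0: 1.0, 2: 4.0, 3: 1.0}]
factors = {2: 3.0}  # boost term 2

boost_then_tfidf = tfidf(boost(tf_docs, factors))
tfidf_then_boost = boost(tfidf(tf_docs), factors)
```

Scaling a tf value by a positive factor does not change which documents
contain the term, so df and idf are the same in both orders.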


Re: boost selected dimensions in kmeans clustering

2015-01-15 Thread Miguel Angel Martin junquera
Hi Ted,

Yes. I was considering various possibilities, and this was one of them
(scaling up these dimensions, for example by multiplying them by a
configurable correction factor).

What I really want is to mix two different vectors from the same documents,
with different lengths and dictionaries (perhaps some terms of the
dictionaries are the same). Then I will be multiplying the dimensions of
each vector by a configurable correction factor.
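
One way the mixing step described above could look, as a sketch only. It
assumes each vector is a sparse mapping from term to value, and it keeps the
two sources in separate dimensions by tagging each term with its source, so a
term present in both dictionaries stays as two dimensions. Summing shared
terms instead would be an equally valid choice. The function name `mix`, the
variable names, and the factors are hypothetical.

```python
def mix(vec_a, vec_b, factor_a, factor_b):
    """Combine two sparse vectors of one document into a single sparse vector.

    Keys of the result are (source, term) pairs, so the two dictionaries
    never collide. Each source is scaled by its own correction factor.
    """
    mixed = {("a", term): factor_a * value for term, value in vec_a.items()}
    mixed.update({("b", term): factor_b * value for term, value in vec_b.items()})
    return mixed

# Two vectors for the same document built from different dictionaries
# (terms and values are made up for illustration).
title_vec = {"kmeans": 1.0, "mahout": 1.0}
body_vec = {"kmeans": 3.0, "cluster": 2.0, "vector": 1.0}

doc_vec = mix(title_vec, body_vec, factor_a=2.0, factor_b=1.0)
print(doc_vec)
```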

My question is:
Is it better to scale up these dimensions directly in the final mixed
tf-idf sequence file using these correction factors, OR to first scale up
each tf vector, then mix the vectors and recalculate the final tf-idf, in
order to minimize errors or deviations in a subsequent clustering of these
final mixed tf-idf vectors?

Thanks in advance for your help.

One last note:

I am a bass player, and the 701q AKG with fiio E12+E09K is a perfect
combination!!


;-)






2015-01-14 20:12 GMT+01:00 Ted Dunning ted.dunn...@gmail.com:

 The easiest way is to scale those dimensions up.



 On Wed, Jan 14, 2015 at 2:41 AM, Miguel Angel Martin junquera 
 mianmarjun.mailingl...@gmail.com wrote:

  hi all,
 
 
  I am clustering several text documents from distinct sources using kmeans,
  and I have already generated the sparse vectors of each document.
  I want to boost some dimensions in the sparse vectors.
 
  What is the best way to do this?
 
  Is it a good idea to load the vectors, find the values of the tf or
  tf-idf dimensions, and boost these values?
 
 
  Thanks in advance and regards
 



Re: boost selected dimensions in kmeans clustering

2015-01-14 Thread Ted Dunning
The easiest way is to scale those dimensions up.
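
To illustrate what scaling a dimension up does to clustering, here is a
sketch in Python of the assignment step only. It assumes sparse vectors
stored as {index: value} and Euclidean distance. The helper names and the
numbers are invented for this example and are not Mahout API.

```python
import math

def scale_dims(vec, factors):
    """Return a copy of a sparse vector with selected dimensions multiplied."""
    return {i: v * factors.get(i, 1.0) for i, v in vec.items()}

def distance(a, b):
    """Euclidean distance between sparse vectors; missing entries count as 0."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

def nearest(point, centroids):
    """Index of the closest centroid, as in the assignment step of k-means."""
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))

point = {0: 2.0}
centroids = [{}, {0: 2.0, 1: 3.0}]
factors = {0: 2.0}  # boost dimension 0 (illustrative value)

before = nearest(point, centroids)
after = nearest(scale_dims(point, factors),
                [scale_dims(c, factors) for c in centroids])
print(before, after)  # prints "0 1"
```

Before boosting, the point is closer to the empty centroid. After boosting
dimension 0, agreement on that dimension counts for more, and the point
moves to the second centroid.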



On Wed, Jan 14, 2015 at 2:41 AM, Miguel Angel Martin junquera 
mianmarjun.mailingl...@gmail.com wrote:

 hi all,


 I am clustering several text documents from distinct sources using kmeans,
 and I have already generated the sparse vectors of each document.
 I want to boost some dimensions in the sparse vectors.

 What is the best way to do this?

 Is it a good idea to load the vectors, find the values of the tf or
 tf-idf dimensions, and boost these values?


 Thanks in advance and regards