[R] Histogram equalization of dataset

2013-03-20 Thread SpaceSeller
I have read about histogram equalization of a dataset, a technique commonly
used for image data. The description says: each value in the dataset is
assigned to one bin, incrementing the count of that bin by one. Then, a
cumulative histogram is created by adding to each bin the counts of its
preceding bins. Finally, the value of each bin is divided by the total
number of observations, thus standardizing the data to the interval [0,1].

Could you show me how to get it?
Suppose I have a dataset with the summary below:
       X1                  X2
 Min.   :-4.37371   Min.   :-27.84627
 1st Qu.:-0.20558   1st Qu.:  0.0
 Median :-0.01528   Median :  0.07848
 Mean   :-0.04896   Mean   :  0.02751
 3rd Qu.: 0.14511   3rd Qu.:  0.28831
 Max.   : 0.78047   Max.   :  0.89851
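
The binning, cumulative sum, and division by the number of observations described in this post might be sketched in R roughly as follows. This is only an illustration under stated assumptions: the function name hist_equalize, the default of 256 equal-width bins, and the simulated data frame are all hypothetical, and missing values are assumed to be absent.

```r
# Histogram-equalize one numeric vector: bin, count, accumulate, normalise.
# hist_equalize and n_bins are illustrative names, not from any package.
hist_equalize <- function(x, n_bins = 256) {
  # Equal-width bin edges spanning the observed range (an assumption;
  # the description in the post does not say how bins are chosen)
  breaks <- seq(min(x), max(x), length.out = n_bins + 1)
  # Bin index (1..n_bins) of every observation; the maximum goes in the last bin
  bin <- findInterval(x, breaks, rightmost.closed = TRUE, all.inside = TRUE)
  # Number of observations per bin
  counts <- tabulate(bin, nbins = n_bins)
  # Cumulative histogram divided by the total number of observations
  cdf <- cumsum(counts) / length(x)
  # Each observation is replaced by the cumulative value of its bin
  cdf[bin]
}

# Simulated stand-in for the real data, which is not shown in the post
set.seed(1)
df <- data.frame(X1 = rnorm(100), X2 = rnorm(100))

# Apply the transformation column by column
df_eq <- as.data.frame(lapply(df, hist_equalize))
summary(df_eq)
```

Every transformed value lies in (0, 1], and the ordering of the observations within each column is preserved. The empirical distribution function from the stats package, used as ecdf(x)(x), is the unbinned analogue of the same idea.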



--
View this message in context: 
http://r.789695.n4.nabble.com/Histogram-equalization-of-dataset-tp4661887.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Feature selection for kmeans

2013-02-07 Thread SpaceSeller

I know that the within-cluster sum of squares, the Davies-Bouldin (DB)
index, the silhouette and the cophenetic correlation are indicators of
clustering quality, but which indicators should I look at when choosing
attributes for kmeans?
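
For reference, the within-cluster sum of squares mentioned above can be read directly from a kmeans fit. The sketch below uses the built-in iris measurements as a stand-in for the poster's data (an assumption), and the choice of 3 centers is arbitrary. It only shows where the quantities live. It is not a recommended attribute selection criterion.

```r
# Built-in iris measurements stand in for the real data (an assumption)
x <- scale(iris[, 1:4])   # standardize so that no attribute dominates the distances

set.seed(42)
fit <- kmeans(x, centers = 3, nstart = 20)   # 3 centers chosen arbitrarily

# Total within-cluster sum of squares (smaller means tighter clusters)
wss <- fit$tot.withinss

# Share of the total sum of squares that lies between the clusters
ratio <- fit$betweenss / fit$totss

wss
ratio
```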



--
View this message in context: 
http://r.789695.n4.nabble.com/Feature-selection-for-kmeans-tp4657830.html
Sent from the R help mailing list archive at Nabble.com.



[R] What methods you use for choosing attributes for kmeans and other clustering (unsupervised) methods?

2013-02-04 Thread SpaceSeller
What characteristics make an attribute well suited to kmeans clustering, and
how do you measure these characteristics? Do you look at the distribution
shape, the correlation matrix between attributes, pairs plots, statistical
tests...?
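
The diagnostics named in the question can each be produced with base R. The sketch below again uses the built-in iris measurements as stand-in data (an assumption), and the Shapiro-Wilk test is just one possible example of a statistical test on distribution shape.

```r
# Built-in iris measurements stand in for the real data (an assumption)
x <- iris[, 1:4]

# Correlation matrix between attributes
cm <- cor(x)
round(cm, 2)

# Pairs plot (scatterplot matrix) of all attributes
pairs(x)

# Distribution shape: Shapiro-Wilk normality test p-value per attribute
p <- sapply(x, function(v) shapiro.test(v)$p.value)
p
```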




--
View this message in context: 
http://r.789695.n4.nabble.com/What-methods-you-use-for-choosing-attributes-for-kmeans-and-other-clustering-unsupervised-methods-tp4657464.html
Sent from the R help mailing list archive at Nabble.com.
