[R] Outer function in R

2013-07-02 Thread Dzu
Dear members 

I am trying to apply the function kl.dist (Kullback-Leibler distance
measure) to multiple matrices.

I tried the following:

veckldist <- Vectorize(kl.dist)
distancematrix <- outer(matrix1, matrix2, veckldist)


But R complains that the lengths of the objects do not match. The
lengths of my matrices are the same.

How could I fix the error?

Thanks
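
One possible reading of the error: outer() expands both arguments element by element and expects one result per pair of elements, so it does not compare matrix with matrix. A minimal sketch of a workaround, assuming the matrices are stored in a list and using a hypothetical helper kl_div in place of kl.dist (whose return value is not shown in the post), is to run outer() over the list indices:

```r
# Hypothetical helper (not from the post): discrete Kullback-Leibler
# divergence between two non-negative matrices or vectors of equal length.
kl_div <- function(p, q, eps = 1e-10) {
  p <- as.vector(p) + eps          # small constant avoids log(0)
  q <- as.vector(q) + eps
  p <- p / sum(p)                  # normalise to probabilities
  q <- q / sum(q)
  sum(p * log(p / q))
}

# Assumed example data: three matrices of the same dimensions in a list.
set.seed(1)
mats <- list(m1 = matrix(runif(12), 3, 4),
             m2 = matrix(runif(12), 3, 4),
             m3 = matrix(runif(12), 3, 4))

# outer() runs over the indices; Vectorize() lets the index function
# accept the index vectors that outer() passes in.
pair_kl <- Vectorize(function(i, j) kl_div(mats[[i]], mats[[j]]))
distancematrix <- outer(seq_along(mats), seq_along(mats), pair_kl)
dimnames(distancematrix) <- list(names(mats), names(mats))
print(distancematrix)
```

The result has one row and one column per matrix. The diagonal is zero, and the matrix is generally not symmetric because the Kullback-Leibler divergence is not symmetric.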





--
View this message in context: 
http://r.789695.n4.nabble.com/Outer-function-in-R-tp4670738.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Distance Measurement between probability distributions

2013-07-01 Thread Dzu
Dear R -Users,

I would like to know which existing functions (other than Euclidean
distance) compute the distance between multiple histograms.
I have found some examples such as the Kullback-Leibler divergence, but I
could not find the syntax for it.

Does anybody have an idea?
Thanks
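
As a rough illustration of the question, several common distances between two histograms can be computed in base R once both histograms share the same bins. This is only a sketch: the samples x and y, the bin width of 0.5 and the smoothing constant eps are assumptions, not values from the post.

```r
set.seed(42)
x <- rnorm(500, mean = 0)      # assumed sample 1
y <- rnorm(500, mean = 0.5)    # assumed sample 2

# Shared breaks so that both histograms use identical bins.
breaks <- seq(floor(min(x, y)), ceiling(max(x, y)), by = 0.5)
p <- hist(x, breaks = breaks, plot = FALSE)$counts
q <- hist(y, breaks = breaks, plot = FALSE)$counts

# Convert counts to probabilities; eps guards against empty bins.
eps <- 1e-10
p <- (p + eps) / sum(p + eps)
q <- (q + eps) / sum(q + eps)

kl_pq     <- sum(p * log(p / q))                    # Kullback-Leibler KL(p || q)
kl_sym    <- kl_pq + sum(q * log(q / p))            # symmetrised Kullback-Leibler
hellinger <- sqrt(sum((sqrt(p) - sqrt(q))^2) / 2)   # Hellinger distance
tot_var   <- sum(abs(p - q)) / 2                    # total variation distance

print(c(KL = kl_pq, KLsym = kl_sym, Hellinger = hellinger, TV = tot_var))
```

Hellinger and total variation are bounded between 0 and 1, which can make them easier to compare across many histograms than the unbounded Kullback-Leibler divergence.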



--
View this message in context: 
http://r.789695.n4.nabble.com/Distance-Measurement-between-probability-distributions-tp4670646.html


[R] Bhattacharyya in R

2013-07-01 Thread Dzu
Dear R-user,
I am trying to apply a Bhattacharyya distance function to my data. Has anybody
used it before?

My code is the following:

# Bhattacharyya distance measure
# a and b are vectors
a <- c(1,2,3,4,2,2,2,2,2,2,2,1,4,5,6,-1,-1,-1,-1,-1,-3,-3,-3)
b <- c(1.1,1.1,1.2,1.2,1.2,1.2,1.2,2.1,2.1,2.2,2.2,2,0,0,0,0,2,2,2,2,2,3.1,3.1)

dist <- bhattacharyya.matrix(a, b, missclasification = TRUE)
plot(dist)


Could somebody give me some guidance on the syntax?

Thanks
Dizem
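
For two plain numeric vectors, the Bhattacharyya distance can be sketched in base R without a package. The two helpers below are hypothetical (they are not the bhattacharyya.matrix function from the post): one assumes each vector is a sample from a univariate normal distribution, the other compares histograms on shared bins, with the number of bins chosen arbitrarily.

```r
a <- c(1,2,3,4,2,2,2,2,2,2,2,1,4,5,6,-1,-1,-1,-1,-1,-3,-3,-3)
b <- c(1.1,1.1,1.2,1.2,1.2,1.2,1.2,2.1,2.1,2.2,2.2,2,0,0,0,0,2,2,2,2,2,3.1,3.1)

# Hypothetical helper: Bhattacharyya distance between two univariate
# normal distributions estimated from the samples x and y.
bhatt_normal <- function(x, y) {
  m1 <- mean(x); m2 <- mean(y)
  v1 <- var(x);  v2 <- var(y)
  0.25 * (m1 - m2)^2 / (v1 + v2) +
    0.5 * log((v1 + v2) / (2 * sqrt(v1 * v2)))
}

# Hypothetical helper: Bhattacharyya distance between two histograms
# built on the same bins (nbins is an assumed default).
bhatt_hist <- function(x, y, nbins = 10) {
  breaks <- seq(min(x, y), max(x, y), length.out = nbins + 1)
  p <- hist(x, breaks = breaks, plot = FALSE)$counts / length(x)
  q <- hist(y, breaks = breaks, plot = FALSE)$counts / length(y)
  -log(sum(sqrt(p * q)))   # minus log of the Bhattacharyya coefficient
}

print(bhatt_normal(a, b))
print(bhatt_hist(a, b))
```

Both helpers return 0 for identical inputs and grow as the distributions move apart. The histogram version returns Inf when the two histograms share no occupied bin.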





--
View this message in context: 
http://r.789695.n4.nabble.com/Bhattacharyya-in-R-tp4670671.html


[R] K-means results understanding!!!

2013-06-24 Thread Dzu
Dear  members.

I am having trouble understanding the kmeans results in R. I am applying
the kmeans algorithm to my big data file, and it produces the clusters.

Q1) Does anybody know how to find out which data were assigned to which
cluster (I have fixed the number of clusters at 5)?
COMMAND
(kmeans.results <- kmeans(mydata, centers = 5, iter.max = 1000, nstart = 1))

Q2) When I call kmeans.results I have the following output: 


K-means clustering with 5 clusters of sizes 17, 1, 6, 4, 32

Cluster means:
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]     [,11]        [,12]
1    0    0    0    0    0    0    0    0    0     0     0.000 0.0008235294
2    0    0    0    0    0    0    0    0    0     0     0.000         0.00
3    0    0    0    0    0    0    0    0    0     0     0.000         0.00
4    0    0    0    0    0    0    0    0    0     0     0.000     0.004000
5    0    0    0    0    0    0    0    0    0     0 0.0003125     0.000375
         [,13]       [,14]       [,15]       [,16]       [,17]      [,18]
1 0.0008235294 0.001176471 0.005176471 0.012471295 0.041181652 0.10663935
2         0.00         0.0         0.0         0.0 0.169491525 0.61016949
3         0.00         0.0         0.0     0.00233     0.00667 0.07695015
4     0.003000     0.00150     0.00100     0.01750     0.02900     0.0615
5 0.0015625000 0.003437500 0.010687500 0.046375000 0.100062500 0.14306250
       [,19]     [,20]     [,21]     [,22]      [,23]      [,24]   [,25]
1 0.12946535 1.0017347 0.3360283 0.2455259 0.08565672 0.02553212 0.00600
2 0.94915254 0.1694915 0.1016949     0.000         0.         0.     0.0
3 0.09376439 1.3857837 0.2659812 0.1015707 0.03804953 0.02023362 0.00767
4     0.1710 0.6665000     0.786     0.186     0.0465     0.0145 0.01200
5     0.1810 0.5200625 0.4156875 0.3461250 0.16925000 0.04918750 0.01150
         [,26]       [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35]
1 0.0005882353 0.001176471     0     0     0     0     0     0     0     0
2         0.00         0.0     0     0     0     0     0     0     0     0
3     0.001000         0.0     0     0     0     0     0     0     0     0
4         0.00         0.0     0     0     0     0     0     0     0     0
5 0.0013125000         0.0     0     0     0     0     0     0     0     0
  [,36] [,37] [,38] [,39] [,40]
1     0     0     0     0     0
2     0     0     0     0     0
3     0     0     0     0     0
4     0     0     0     0     0
5     0     0     0     0     0

Clustering vector:
 [1] 1 5 5 3 1 5 5 5 5 1 4 1 5 5 5 5 4 5 2 3 5 5 1 5 5 5 5 1 3 1 4 5 5 1 5 5 5 1
[39] 3 1 5 5 3 1 1 1 1 5 5 1 4 1 3 5 5 5 5 5 5 1

Within cluster sum of squares by cluster:
[1] 0.6702803 0.000 0.2453294 0.1860180 1.3535263
 (between_SS / total_SS =  76.8 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"
 
Q3) I would like to understand which raw data are in which cluster. Does
somebody know how to access the table of raw data that are in the same
cluster?

Thanks for help
DZU
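
The "Clustering vector" in the output above holds one cluster label per row of the input, in row order, so it can be used directly to index the raw data. A minimal sketch, where the random matrix is only an assumed stand-in for mydata:

```r
set.seed(123)
# Assumed stand-in for 'mydata': 60 rows, 4 numeric columns.
mydata <- matrix(rnorm(240), nrow = 60, ncol = 4)

kmeans.results <- kmeans(mydata, centers = 5, iter.max = 1000, nstart = 1)

# One cluster label per row of mydata, in row order.
head(kmeans.results$cluster)

# Rows of the raw data that belong to cluster 1; drop = FALSE keeps a
# matrix even when the cluster contains a single row.
cluster1 <- mydata[kmeans.results$cluster == 1, , drop = FALSE]

# Row indices per cluster, and the raw data with its label attached.
rows_by_cluster <- split(seq_len(nrow(mydata)), kmeans.results$cluster)
labelled <- data.frame(mydata, cluster = kmeans.results$cluster)
```

rows_by_cluster is a list with one vector of row numbers per cluster, and labelled can be sorted or filtered by its cluster column like any other data frame.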



--
View this message in context: 
http://r.789695.n4.nabble.com/K-means-results-understanding-tp4670171.html


Re: [R] K-means results understanding!!!

2013-06-24 Thread Dzu
Hi,
Thanks for the reply, but I have already read the help page. I am new to R
and did not understand the output description of the kmeans function. That
is why I wanted to ask some experts in the group.

My point is that I do not understand which data are combined in a specific
cluster.

I tried the following:

(kmeans.results <- kmeans(mydata, centers = 4, iter.max = 1000, nstart = 1))
# The comparison gives a logical vector; cl1 marks the members of cluster 1

cl1 <- data.frame(as.numeric(kmeans.results$cluster == 1))
nbcl1 <- sum(cl1, na.rm = TRUE)
# The number of 1 values in cl1 is, for example, 22
# This means there are 22 vectors which are similar

But when I call:
mydata[kmeans.results$cluster == 1, ]
I only get 1 vector, not the 22 vectors that are in cluster 1.

I thought cluster 1 contained many vectors that are similar according to
the kmeans function. But the output is only one vector!
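
One possible explanation, offered only as a guess: kmeans() starts from randomly chosen centres, so with nstart = 1 the cluster numbering and sizes can change every time the call is re-run. A count taken from one run and a subset taken from another need not agree. The sketch below fixes the seed and takes all three counts from the same result object; the random matrix is an assumed stand-in for mydata.

```r
set.seed(2013)
# Assumed stand-in for 'mydata': 60 rows, 4 numeric columns.
mydata <- matrix(rnorm(240), nrow = 60, ncol = 4)
kmeans.results <- kmeans(mydata, centers = 4, iter.max = 1000, nstart = 1)

# Number of rows in cluster 1, counted three ways on the same object.
n_logical <- sum(kmeans.results$cluster == 1)
n_size    <- kmeans.results$size[1]
n_rows    <- nrow(mydata[kmeans.results$cluster == 1, , drop = FALSE])
print(c(n_logical, n_size, n_rows))

# Cluster sizes at a glance.
print(table(kmeans.results$cluster))
```

If the three numbers differ on the real data, comparing nrow(mydata) with length(kmeans.results$cluster) is a reasonable next check.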



--
View this message in context: 
http://r.789695.n4.nabble.com/K-means-results-understanding-tp4670171p4670187.html


[R] hist function in a for loop

2013-06-18 Thread Dzu
Dear all,

I need to create a for loop in which I can compute multiple histograms.
My code is the following:
# singlefile contains a huge csv file
# I want to specify the bin size
# I would like to compute the histograms in the for loop


numfiles <- length(singlefile)
for (i in 1:51)
{
binsize <- -20:20/2
hist(singlefile(singlefile$GVC[singlefile$new_id==i]], break=seq(), by = binsize)))

What do I have to do?
How can I specify the range for i?

I am totally lost
Thanks for support
D.U
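
A minimal sketch of such a loop, assuming singlefile is a data frame with the columns GVC and new_id used in the post (the random data below is only a stand-in). The range for i is taken from the ids that actually occur, and every histogram uses the same vector of bin edges, passed through the breaks argument of hist():

```r
set.seed(7)
# Assumed stand-in for 'singlefile': values in GVC, group id in new_id.
singlefile <- data.frame(GVC = rnorm(600), new_id = rep(1:6, each = 100))

ids <- sort(unique(singlefile$new_id))   # the loop range comes from the data
binwidth <- 0.5                          # assumed bin width
breaks <- seq(floor(min(singlefile$GVC)), ceiling(max(singlefile$GVC)),
              by = binwidth)             # bin edges covering all values

hists <- vector("list", length(ids))
for (k in seq_along(ids)) {
  values <- singlefile$GVC[singlefile$new_id == ids[k]]
  # plot = FALSE only computes the histogram; plot(hists[[k]]) draws it later.
  hists[[k]] <- hist(values, breaks = breaks, plot = FALSE)
}

print(hists[[1]]$counts)
```

Because all histograms share the same breaks, their counts vectors have the same length and can be compared bin by bin.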



--
View this message in context: 
http://r.789695.n4.nabble.com/hist-function-in-a-for-loop-tp4669797.html


Re: [R] hist function in a for loop

2013-06-18 Thread Dzu
Hello 
Thanks for the reply.

I want to compute several histograms in a for loop. I am trying to set the
bin size constant at the beginning.

# compute the histograms
for (i in 1:12)
{
binsize <- -20:20/2

hist(singlefile$GVC(singlefile$new_id[,i], freq = FALSE, xlab = "Graph i",
col = "pink", main = "Example Histogram", ylim = c(-3.0,3.0)))
singlefile$GVCmin <- min(singlefile$GVC[1])
singlefile$GVCmin <- min(singlefile$GVC[1])
x1 <- seq(-3.0,3.0,by=.01)
lines(x1, dnorm(x1), col = "black")
}

I also tried this, but it does not do anything. I also tried your proposal,
but R says that breaks = binsize is not allowed.

I think I am far away from what I want to do with my code.

Plotting and computing one single histogram is easy, but inside the loop the
syntax for feeding the function with the counter i is not working.

Thanks
Dizem
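
A hedged sketch of how the loop might look, again with random stand-in data for singlefile. Three points are illustrated: the subset is written with square brackets, the label is built with paste() so that the value of i appears in it, and with freq = FALSE the y axis shows densities, so ylim has to be non-negative. The expression -20:20/2 yields bin edges from -10 to 10 in steps of 0.5, which hist() only accepts if all values lie inside that range, so the sketch derives the edges from the data instead.

```r
set.seed(11)
# Assumed stand-in for 'singlefile': 12 groups of 100 values each.
singlefile <- data.frame(GVC = rnorm(1200), new_id = rep(1:12, each = 100))

# Bin edges that cover every value (an assumption; adjust the width as needed).
breaks <- seq(floor(min(singlefile$GVC)), ceiling(max(singlefile$GVC)), by = 0.5)
x1 <- seq(-3, 3, by = 0.01)

hs <- vector("list", 12)
op <- par(mfrow = c(3, 4), mar = c(4, 4, 2, 1))   # 12 panels on one page
for (i in 1:12) {
  values <- singlefile$GVC[singlefile$new_id == i]
  hs[[i]] <- hist(values, breaks = breaks, freq = FALSE,
                  xlab = paste("Graph", i), col = "pink",
                  main = "Example Histogram", ylim = c(0, 0.6))
  lines(x1, dnorm(x1), col = "black")   # standard normal density for comparison
}
par(op)
```

Each panel then shows one group on the same bins, with the standard normal curve overlaid on the same density scale.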




--
View this message in context: 
http://r.789695.n4.nabble.com/hist-function-in-a-for-loop-tp4669797p4669816.html


Re: [R] hist function in a for loop

2013-06-18 Thread Dzu
Dear,

I want to do the following :

# I have created a huge csv file with 44 columns
# I want to select specific columns from this file
# CL1 contains the data from which I want to compute the histograms; CL2 is
the column whose numbers identify the line at which the data for my next
histogram should start.
THE CSV FILE looks like this:

CL1   CL2  CL3   CL4 ..CLn
0.316.7  4.3  ...
...   ..   ...       
0.82  .. .

My goal is to select only CL1 and CL2 and to compute a histogram of the CL1
data for each CL2 block, for example from CL2 [1:2] up to CL2 [1:60].

I can print the histograms, but only one by one. I want to
compute all of them with the same bin size!

Therefore I wrote this code:

# Combine different csv files into one

files <- list.files(path = "./Inputfiles", ".csv")
numfiles <- length(files)
print(files)
singlefile <- list()

# for loop
offset <- 1
mytotaldata <- list()   # mytotaldata holds the csv files to be merged
for (i in 1:numfiles)
{
mytotaldata[[files[i]]] <- read.csv(files[i], header = TRUE, sep = ",", quote = "\"")

# Add CL5 as an identifier of the source file
mytotaldata[[files[i]]]["CL5"] <- i

# Add CL2 as an identifier for blocks of lines

mytotaldata[[files[i]]]["CL2"] <-
as.character(floor(as.numeric(rownames(mytotaldata[[files[i]]]))/1000)+offset)
offset <- as.numeric(tail(mytotaldata[[files[i]]],1)["CL2"]) + 1

# Create a single data frame for the whole data
singlefile <- rbind(singlefile,mytotaldata[[files[i]]])
}

# Now I have a combined csv file with 2 added columns, CL2 and CL5
# Compute the histograms
#library (lattice)
numfiles <- length(singlefile)  ### Is this necessary???
for (i in 1:i)
{
# all the histograms from the same csv file
binsize <- -20:20/2
hist(singlefile$CL1(singlefile$CL2[,1], freq = FALSE, xlab = "Graph i",
col = "pink", main = "Example Histogram", ylim = c(-3.0,3.0)))
singlefile$GVCmin <- min(singlefile$CL1[1])
singlefile$GVCmin <- min(singlefile$CL1[1])
x1 <- seq(-3.0,3.0,by=.01)
lines(x1, dnorm(x1), col = "black")
}

My sticking point is the for loop: computing the histograms inside the loop
while using the bin size I have specified.

Maybe the question is clear now!
If somebody has faced a similar problem, please let me know about any
tricks or ideas!
I have tried many different things to get this for loop to work, but I did
not find a solution, so I decided to ask in the forum.

Thanks in advance
DZU
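
A sketch of the block-wise computation described above, under the assumption that the merged data already sits in one data frame with a numeric column CL1 (random stand-in data here). The block identifier is built with integer division, which is a simplified variant of the floor(row/1000) expression in the post and assigns rows 1 to 1000 to block 1, rows 1001 to 2000 to block 2, and so on:

```r
set.seed(3)
# Assumed stand-in for the merged data: CL1 holds the values to histogram.
singlefile <- data.frame(CL1 = rnorm(5500))

# Block identifier: rows 1-1000 -> 1, rows 1001-2000 -> 2, ...
block_size <- 1000
singlefile$CL2 <- (seq_len(nrow(singlefile)) - 1) %/% block_size + 1

# One set of bin edges for all blocks (the width of 0.5 is an assumption).
breaks <- seq(floor(min(singlefile$CL1)), ceiling(max(singlefile$CL1)), by = 0.5)

# One column of bin counts per block, all on the same bins.
counts <- sapply(split(singlefile$CL1, singlefile$CL2),
                 function(v) hist(v, breaks = breaks, plot = FALSE)$counts)
print(dim(counts))
```

counts has one row per bin and one column per CL2 block, which makes the histograms directly comparable, for example with a distance measure.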















--
View this message in context: 
http://r.789695.n4.nabble.com/hist-function-in-a-for-loop-tp4669797p4669823.html


[R] Reading multiple csv files with a for loop

2013-06-17 Thread Dzu
Dear R-help users,

I am quite new to R. I have multiple csv files of different sizes. I would
like to read them with a for loop and, while reading, add a new column
whose value I can specify myself.

But my for loop does not work!
Could somebody give me any idea?
Many thanks!

myfiles <- list()
for (i in 1:11) myfiles[i] <- read.csv(toread, header = TRUE, sep = "")
names(myfiles) <- paste(myfiles)
mytotalfiles <- myfiles

# sample the data by the number of the columns by adding a new column
sample(i1, 1000, replace = FALSE, prob = NULL)
for n <- 1000
sample <- myfiles[sample(nrow(df), 1000),]
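
A self-contained sketch of one way to do this. Everything about the files is an assumption made for illustration: the example first writes three small csv files to a temporary folder, and the column names CL1, CL3 and file_id are hypothetical. Two details differ from the code above: each file is read from its own path, and the data frame is stored with double square brackets so the list keeps the whole data frame.

```r
# Assumed setup: write three small csv files of different sizes to a temp folder.
input_dir <- file.path(tempdir(), "Inputfiles")
dir.create(input_dir, showWarnings = FALSE)
set.seed(5)
for (k in 1:3) {
  write.csv(data.frame(CL1 = rnorm(50 * k), CL3 = runif(50 * k)),
            file.path(input_dir, paste0("file", k, ".csv")), row.names = FALSE)
}

# full.names = TRUE returns paths that read.csv can open from any working directory.
files <- list.files(path = input_dir, pattern = "\\.csv$", full.names = TRUE)

myfiles <- list()
for (i in seq_along(files)) {
  df <- read.csv(files[i], header = TRUE, sep = ",")
  df$file_id <- i                       # new column with a user-chosen value
  myfiles[[basename(files[i])]] <- df   # [[ ]] stores the whole data frame
}

# Stack all files into one data frame.
mytotalfiles <- do.call(rbind, myfiles)

# Random sample of n rows, without replacement, from the combined data.
n <- 100
mysample <- mytotalfiles[sample(nrow(mytotalfiles), n), ]
```

The same pattern works for any number of files, since the loop length comes from length(files) rather than a hard-coded 11.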







--
View this message in context: 
http://r.789695.n4.nabble.com/Reading-multiple-csv-files-with-a-for-loop-tp4669681.html