Hi r-help-boun...@r-project.org napsal dne 21.04.2009 12:25:01:
> > Dear R users, > > I need to do sampling without replacement (bootstraps). I have two variables > (Xvar, Yvar). > I have a correlation from original data set cor(Xvar, Yvar)=0.6174221. I am > doing 50000 sampling, > and in each sampling calculating correlations, saving, sorting and getting > 95% cutt off point (0.1351877). > I am getting maximum value as 0.3507219 (much smaller than correlation of my > original data). > I repeated the sampling a couple of time and none of them produced a correlation > coefficient higher than my original data set. However, if I sort out my Xvar > and Yvar and > obtain correlation it is 0.9657125 which is much higher than correlation for > my original data. > I am doing sampling in another program and getting at least 1% higher > correlation than mine. > Now I am getting confused with sampling(random data) in R. My data and codes > for the scenario above are below > > > Xvar<-c(0.1818182,0.5384615,0.5535714,0.4680851,0.4545455,0.4385965,0.5185185, > 0.4035088,0.4901961,0.3650794,0.462963,0.4,0.56,0.3965517,0.4909091, > 0.4716981,0.4310345,0.2,0.1509434,0.2647059,0.173913,0.1914894,0. > 1914894,0.1489362,0.1363636,0.2244898,0.2325581,0.1333333,0.1818182,0.1702128, > 0.2173913,0.2380952,0.1632653,0.5614035,0.3396226,0.4909091,0.3770492, > 0.5,0.5185185,0.5,0.4666667,0.4464286,0.362069,0.4285714,0.4561404, > 0.4736842,0.4545455,0.4166667,0.4181818,0.4590164,0.5166667,0.5423729, > 0.4833333,0.5454545,0.4393939,0.5172414,0.4098361,0.4745763,0.4754098, > 0.5166667,0.5,0.4603175,0.42,0.4038462,0.4897959,0.3148148,0.3673469, > 0.4,0.4583333,0.3877551,0.4375,0.4117647,0.4313725,0.5333333,0.3962264, > 0.3548387,0.5272727,0.4137931,0.3928571,0.4666667,0.4210526,0.4363636, > 0.4545455,0.4310345,0.4237288,0.4814815,0.4912281,0.4333333,0.4,0.4285714, > 0.4516129,0.5090909,0.4464286,0.4642857,0.4166667,0.4098361,0.4909091, > 0.3809524,0.5272727,0.4814815,0.5254237,0.627451,0.5,0.5471698,0.5454545, > 0.5925926,0.5769231,0.5818182,0.4444444,0.4915254,0.4727273,0.4107143, > 0.4285714,0.4310345,0.4237288,0.4285714,0.440678,0.4237288,0.4807692, > 0.4150943,0.4615385,0.4107143,0.4814815,0.4074074,0.4210526,0.5263158, > 0.440678,0.4576271,0.5344828,0.5,0.5636364,0.4677419,0.5,0.5192308, > 0.4642857,0.5090909,0.58,0.4482759,0.5098039,0.4035088,0.4210526,0. > 5098039,0.4385965,0.5283019,0.5471698,0.625,0.4310345,0.4912281,0.5283019, > 0.4576271,0.5471698,0.4745763,0.4821429) > > Yvar<-c(0.2553191,0.4107143,0.5660377,0.3888889,0.3606557,0.2898551,0.3818182, > 0.4,0.4,0.3278689,0.2903226,0.4074074,0.4181818,0.3,0.2238806,0.3728814, > 0.3709677,0.2307692,0.2830189,0.2244898,0.2142857,0.2131148,0.22,0. > 2258065,0.2321429,0.2,0.2264151,0.22,0.2115385,0.2459016,0.1166667,0.1785714, > 0.2068966,0.6,0.4285714,0.3134328,0.4461538,0.3965517,0.4769231,0. > 6181818,0.4827586,0.3709677,0.3965517,0.4821429,0.4545455,0.359375,0.4576271, > 0.4516129,0.5272727,0.4603175,0.4,0.4912281,0.5384615,0.5,0.4516129,0. > 4126984,0.4655172,0.5263158,0.4925373,0.358209,0.4285714,0.4920635, > 0.4482759,0.3235294,0.4,0.4375,0.440678,0.3898305,0.35,0.4528302,0.58, > 0.4153846,0.3174603,0.5185185,0.3870968,0.2894737,0.3709677,0.369863, > 0.3676471,0.3636364,0.3088235,0.328125,0.4032258,0.4084507,0.3188406, > 0.3636364,0.3823529,0.2816901,0.4722222,0.5,0.3521127,0.4393939,0.3787879, > 0.453125,0.4324324,0.4057971,0.4545455,0.4492754,0.5,0.4098361,0. > 4067797,0.3666667,0.3928571,0.4285714,0.5,0.2923077,0.4561404,0.45,0.5538462, > 0.4626866,0.4057971,0.3676471,0.5322581,0.5428571,0.375,0.4411765,0. > 4571429,0.4,0.3846154,0.3870968,0.4915254,0.530303,0.4375,0.4918033,0.4179104, > 0.4032258,0.3606557,0.5178571,0.4848485,0.390625,0.375,0.4375,0. > 3666667,0.4,0.4477612,0.2571429,0.4032258,0.3382353,0.4814815,0.4090909,0.3548387, > 0.4821429,0.5,0.557377,0.4333333,0.5454545,0.4590164,0.3943662,0. > 5076923,0.5,0.3283582,0.3676471,0.559322) > > my.cor<-cor(Xvar, Yvar) > print(my.cor) > > nperm<-49999 > Perm.Cor<-NULL > > for (iperm in 1:nperm) { > XvarNew<-sample(Xvar, size=length(Xvar), replace=FALSE) > YvarNew<-sample(Yvar, size=length(Yvar), replace=FALSE) > perm.cor<-cor(XvarNew, YvarNew) > Perm.Cor<-c(Perm.Cor, perm.cor) > } AFAICU you do not sample your data you shuffle them. Then you compute cor with shuffled data (X and Y are shuffled independently) which results in low correlation (it is like shuffling cards). Maybe you could use smaller size and sample not original data but a vector of indices perm.cor<-rep(NA, 49999) for (iperm in 1:nperm) { ind <- sample(1:length(Xvar), size = 100, replace=FALSE) perm.cor[iperm] <- cor(Xvar[ind], Yvar[ind]) perm.cor } max(perm.cor) hist(perm.cor) The result seems to be quite reasonable. Regards Petr > print(max(Perm.Cor)) > XvarSorted<-sort(Xvar, decreasing=TRUE) > YvarSorted<-sort(Yvar, decreasing=TRUE) > max.cor<-cor(XvarSorted, YvarSorted) > print(max.cor) > if(mat.cor>0) Perm.Cor.Sorted<-sort(Perm.Cor, decreasing=TRUE) > if(mat.cor<0) Perm.Cor.Sorted<-sort(Perm.Cor, decreasing=FALSE) > T95<-Perm.Cor.Sorted[(nperm+1)*0.05] # 95% treshold value > T99<-Perm.Cor.Sorted[(nperm+1)*0.01] # 99% treshold value > > > > I want to understand where I am making a mistake. Any comment is deeply appreciated. > > Kind Regards > > Seyit Ali > > > ------------------------------------------------------------------------------------------------------------------ > Dr. Seyit Ali KAYIS > Selcuk University > Faculty of Agriculture > Kampus, Konya, TURKEY > > s_a_ka...@yahoo.com, s_a_ka...@hotmail.com > Tell: +90 332 223 2830 Mobile: +90 535 587 1139 Fax: +90 332 241 0108 > > Greetings from Konya, TURKEY > http://www.ziraat.selcuk.edu.tr/skayis/ > ---------------------------------------------------------------------------------------------------------------------- > > > > > > > > _________________________________________________________________ > Earning enough? Find out with SEEK Salary Survey > > %2Eco%2Enz%2F%3Ftracking%3Dsk%3Atl%3Asknzsal%3Amsnnz%3A0%3Ahottag%3Aearn% > 5Fenough&_t=757263783&_r=Seek_NZ_tagline&_m=EXT > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.