On Thu, 2007-08-16 at 12:33 -0400, AbouEl-Makarim Aboueissa wrote:
> Dear All:
>
> Urgent help is needed.
>
> I have a data set in matrix format of three columns: X, Y and index
> of four groups (1,2,3,4). What I need to do is the following:
>
> 1- How can I subtract the sample mean of each group indexed 1,2,3,4
> from the corresponding data values of this group and create new
> columns, say X-sample mean and Y-sample mean? I tried to use
> "tapply" but I have some difficulties restoring the new data.
>
> 2- How can I use "tapply", if possible, or any other R function to
> find the correlation coefficient between the X and Y columns for
> each group indexed 1,2,3,4? I could not use "tapply".
>
> I attached part of the data as a txt file.
>
> Thank you so much for your attention to this matter, and I look
> forward to hearing from you soon.
>
> Regards,
>
> Abou
>
> Data:
> ====
>        x        y index
> 15807.24     12.5     4
> 15752.51     33.5     4
> 12893.76     01.5     3
>  8426.88     22.2     3
>  5706.24      333     3
>  3982.08      560     2
>  3642.62      670     2
>   295.68      124     1
>   215.40      104     1
>   195.40      204     1
>  4240.21     22.4     2
>  1222.72     45.9     2
>  1142.26     23.6     2
>    63.00     90.1     1
>  1216.00     82.4     2
>  2769.60      111     2
>  1790.46     34.7     2
>    26.10    26.10     1
> 19676.83     0.99     4
> 10920.60      203     3
>  6144.00       46     3
>  4534.48  4534.48     3
> 40000.00       65     4
> 29500.00       56     4
> 17100.00       77     4
>  9000.00      435     3
>  6300.00       84     3
>  3962.88      334     2
>  5690.00      653     3
>  3736.00      233     2
>  2750.00       22     2
>  1316.00      345     2
>  4595.00  4595.00     3
>  5928.00       45     3
>  2645.70     0.00     2
>  2580.24      454     2
>  6547.34  6547.34     3
>  1615.68        5     2
>   194.06       55     1
>   184.80        6     1
>    82.94       44     1
> 16649.00       56     4
>  4500.00       74     3
>  1600.00      744     2
>
> =================

I might be tempted to take the following approach:

If your data is a matrix, coerce it to a data frame first. Let's call
that 'DF'.

> str(DF)
'data.frame':   44 obs. of  3 variables:
 $ x    : num  15807 15753 12894 8427 5706 ...
 $ y    : num  12.5 33.5 1.5 22.2 333 560 670 124 104 204 ...
 $ index: int  4 4 3 3 3 2 2 1 1 1 ...

Now use split() to break up the data frame into a list of 4
sub-dataframes, based upon the index value. We can use scale() within
a lapply() loop to center the 'x' and 'y' columns for each
sub-dataframe. Note that scale() returns a matrix, so each element of
the resulting list is a numeric matrix rather than a data frame:

DF.ctr <- lapply(split(DF[, -3], DF$index), scale, scale = FALSE)

> str(DF.ctr)
List of 4
 $ 1: num [1:8, 1:2] 138.5 58.2 38.2 -94.2 -131.1 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:8] "8" "9" "10" "14" ...
  .. ..$ : chr [1:2] "x" "y"
  ..- attr(*, "scaled:center")= Named num [1:2] 157.2 81.7
  .. ..- attr(*, "names")= chr [1:2] "x" "y"
 $ 2: num [1:16, 1:2] 1469 1129 1727 -1291 -1371 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:16] "6" "7" "11" "12" ...
  .. ..$ : chr [1:2] "x" "y"
  ..- attr(*, "scaled:center")= Named num [1:2] 2513 230
  .. ..- attr(*, "names")= chr [1:2] "x" "y"
 $ 3: num [1:13, 1:2] 5879 1413 -1308 3906 -870 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:13] "3" "4" "5" "20" ...
  .. ..$ : chr [1:2] "x" "y"
  ..- attr(*, "scaled:center")= Named num [1:2] 7014 1352
  .. ..- attr(*, "names")= chr [1:2] "x" "y"
 $ 4: num [1:7, 1:2] -6262 -6317 -2393 17931 7431 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:7] "1" "2" "19" "23" ...
  .. ..$ : chr [1:2] "x" "y"
  ..- attr(*, "scaled:center")= Named num [1:2] 22069 43
  .. ..- attr(*, "names")= chr [1:2] "x" "y"

Now, combine the centered matrices in DF.ctr into a single matrix. The
original row names are carried along, which is what lets us join back
to DF below:

DF.new <- do.call(rbind, DF.ctr)

Define the column names:

colnames(DF.new) <- c("x-mean", "y-mean")

> str(DF.new)
 num [1:44, 1:2] 138.5 58.2 38.2 -94.2 -131.1 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:44] "8" "9" "10" "14" ...
  ..$ : chr [1:2] "x-mean" "y-mean"

Now, use merge() to join DF and DF.new by the row names. Note that the
result is sorted by row name as a character string (1, 10, 11, ...),
not in the original row order:

DF.final <- merge(DF, DF.new, by = "row.names")

> DF.final
   Row.names        x       y index      x-mean       y-mean
1          1 15807.24   12.50     4 -6262.12857   -30.498571
2         10   195.40  204.00     1    38.22750   122.350000
3         11  4240.21   22.40     2  1726.93188  -208.037500
4         12  1222.72   45.90     2 -1290.55812  -184.537500
5         13  1142.26   23.60     2 -1371.01812  -206.837500
6         14    63.00   90.10     1   -94.17250     8.450000
7         15  1216.00   82.40     2 -1297.27812  -148.037500
8         16  2769.60  111.00     2   256.32188  -119.437500
9         17  1790.46   34.70     2  -722.81812  -195.737500
10        18    26.10   26.10     1  -131.07250   -55.550000
11        19 19676.83    0.99     4 -2392.53857   -42.008571
12         2 15752.51   33.50     4 -6316.85857    -9.498571
13        20 10920.60  203.00     3  3906.26923 -1148.809231
14        21  6144.00   46.00     3  -870.33077 -1305.809231
15        22  4534.48 4534.48     3 -2479.85077  3182.670769
16        23 40000.00   65.00     4 17930.63143    22.001429
17        24 29500.00   56.00     4  7430.63143    13.001429
18        25 17100.00   77.00     4 -4969.36857    34.001429
19        26  9000.00  435.00     3  1985.66923  -916.809231
20        27  6300.00   84.00     3  -714.33077 -1267.809231
21        28  3962.88  334.00     2  1449.60188   103.562500
22        29  5690.00  653.00     3 -1324.33077  -698.809231
23         3 12893.76    1.50     3  5879.42923 -1350.309231
24        30  3736.00  233.00     2  1222.72188     2.562500
25        31  2750.00   22.00     2   236.72188  -208.437500
26        32  1316.00  345.00     2 -1197.27812   114.562500
27        33  4595.00 4595.00     3 -2419.33077  3243.190769
28        34  5928.00   45.00     3 -1086.33077 -1306.809231
29        35  2645.70    0.00     2   132.42188  -230.437500
30        36  2580.24  454.00     2    66.96187   223.562500
31        37  6547.34 6547.34     3  -466.99077  5195.530769
32        38  1615.68    5.00     2  -897.59812  -225.437500
33        39   194.06   55.00     1    36.88750   -26.650000
34         4  8426.88   22.20     3  1412.54923 -1329.609231
35        40   184.80    6.00     1    27.62750   -75.650000
36        41    82.94   44.00     1   -74.23250   -37.650000
37        42 16649.00   56.00     4 -5420.36857    13.001429
38        43  4500.00   74.00     3 -2514.33077 -1277.809231
39        44  1600.00  744.00     2  -913.27812   513.562500
40         5  5706.24  333.00     3 -1308.09077 -1018.809231
41         6  3982.08  560.00     2  1468.80188   329.562500
42         7  3642.62  670.00     2  1129.34188   439.562500
43         8   295.68  124.00     1   138.50750    42.350000
44         9   215.40  104.00     1    58.22750    22.350000

With respect to getting the correlation coefficient for each
sub-group, you can do the following:

> unlist(lapply(split(DF[, -3], DF$index), function(x) cor(x)[1, 2]))
         1          2          3          4
 0.4468744  0.2619220 -0.3608070  0.3848641

See ?split, ?lapply, ?scale, ?do.call, ?rbind, ?unlist, ?merge and ?cor

HTH,

Marc Schwartz

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
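
As a possible shorter alternative to the split()/scale()/merge() route above, here is a minimal sketch using ave(), which centers within groups and returns the values in the original row order, so no merge() step is needed. This is not the approach from the reply, just one other way it might be done. The column names 'x.ctr' and 'y.ctr' and the object 'grp.cor' are placeholder names of my own, and 'DF' is rebuilt here from only the first ten rows of the posted data so that the block runs on its own.

```r
# Illustrative subset: the first ten rows of the posted data
DF <- data.frame(
  x     = c(15807.24, 15752.51, 12893.76, 8426.88, 5706.24,
            3982.08, 3642.62, 295.68, 215.40, 195.40),
  y     = c(12.5, 33.5, 1.5, 22.2, 333, 560, 670, 124, 104, 204),
  index = c(4, 4, 3, 3, 3, 2, 2, 1, 1, 1)
)

# ave() applies FUN to the values within each level of 'index' and
# returns a vector aligned with the original rows
DF$x.ctr <- ave(DF$x, DF$index, FUN = function(v) v - mean(v))
DF$y.ctr <- ave(DF$y, DF$index, FUN = function(v) v - mean(v))

# Correlation of x and y within each group, as a named vector
grp.cor <- sapply(split(DF, DF$index), function(d) cor(d$x, d$y))
```

Because the centered columns are added directly to DF, the rows stay in their original order rather than being re-sorted by row name. With only two rows in a group (as for groups 2 and 4 in this small subset) the correlation is not informative, so run this on the full data set before reading anything into 'grp.cor'.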