[R] return() in nested functions
Dear WizaRds,

After consulting different sources I am still unable to understand the correct use of return() in nested functions. To illustrate the problem:

    f <- function(x, y, type) {
      est1 <- function(x, y) {
        z = x + y
        out(x, y, z)
      }
      est2 <- function(x, y) {
        z = x * y
        out(x, y, z)
      }
      out <- function(x, y, z) return(x, y, z)
      if (type == "est1") est1(x, y)
      if (type == "est2") est2(x, y)
    }
    test <- f(1, 2, type = "est1")  # gives NULL for test

However, without the second 'if' condition, it works properly:

    Warning message:
    multi-argument returns are deprecated in: return(x, y, z)
    > test
    $x
    [1] 1

    $y
    [1] 2

    $z
    [1] 3

Basically, the function I am working on has the above structure, albeit more complex. I would like f to return the results of function 'out' to the user in the assigned variable, e.g. 'test'. I did consult try() and tryCatch(), but they do not seem to be what I am looking for.

Thank you for your help and understanding

mark

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
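A note on why test ends up NULL, offered as a sketch and not as the only possible fix: an R function returns the value of the last expression it evaluates. In f that is the second 'if', and an 'if' without 'else' whose condition is FALSE evaluates to an invisible NULL, so the result of est1(x, y) is computed and then discarded. The version below dispatches with switch() (an if ... else if chain or explicit return() calls would work as well) and returns a named list, since current versions of R reject a multi-argument return(). The stop() branch for unknown types is an addition for illustration only.

```r
f <- function(x, y, type) {
  # helper that packs the results into one named list
  out <- function(x, y, z) list(x = x, y = y, z = z)
  est1 <- function(x, y) out(x, y, x + y)
  est2 <- function(x, y) out(x, y, x * y)
  # switch() is the last expression, so its value is what f returns
  switch(type,
         est1 = est1(x, y),
         est2 = est2(x, y),
         stop("unknown type: ", type))   # hypothetical guard, not in the post
}

test <- f(1, 2, type = "est1")
str(test)
```

Because only one branch of switch() is evaluated and its value is passed straight through, the order of the branches no longer matters.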
[R] Conditional Selection of Columns for Tables
Dear Wizards -

Thank you so much for your help. That was exactly what I was looking for. Now, I have been working on conditional selection of columns in a data frame. My goal is to calculate the total revenue per sales representative per status in a table. I have come to a complete stop.

Let's say we have a data.frame called df with several columns and a number of rows:

    df <- data.frame(
      nr=101:110,
      letter=LETTERS[1:10],
      name=c(rep("eenie",3), rep("meenie",2), rep("miney",4), "moe"),
      revenue=round(runif(10, min=100, max=1000),0),
      status=round(runif(10,min=1, max=3),0)
    )

gives

        nr letter   name revenue status
    1  101      A  eenie     764      2
    2  102      B  eenie     918      2
    3  103      C  eenie     936      3
    4  104      D meenie     770      2
    5  105      E meenie     280      1
    6  106      F  miney     172      2
    7  107      G  miney     439      2
    8  108      H  miney     607      1
    9  109      I  miney     553      1
    10 110      J    moe     322      2

where status means: 3=no deal, 2=pending, 1=good job.

Now, we want the total revenue per sales representative per status in a table.

    sum( subset(df, name=="eenie", select=revenue) )

gives the total of eenie without status, but I would like to have something like:

    status 1
    name    revenue
    eenie   1000
    meenie  2000
    ...

    status 2
    name    revenue
    eenie   100
    meenie  200
    ...

Are these flat contingency tables? How can I get the results without much hassle in one list/table? I did read ?ftable, and what I was able to derive so far is:

    flat.df <- ftable(df[c("name", "revenue", "status")])

but I am unable to further aggregate the data. hmpf. Good God, what would I do without my R-help forum?

Thank you again
Cheers and a relaxing weekend

mark
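One way to get revenue totals per name and status, shown as a minimal sketch: xtabs() with revenue on the left-hand side sums it within each name/status cell, and aggregate() gives the same totals in long form. The data frame below copies the values printed in the post (the original uses runif(), so a fresh run would differ); the object names rev_by_status and rev_long are made up for this example.

```r
df <- data.frame(
  name    = c(rep("eenie", 3), rep("meenie", 2), rep("miney", 4), "moe"),
  revenue = c(764, 918, 936, 770, 280, 172, 439, 607, 553, 322),
  status  = c(2, 2, 3, 2, 1, 2, 2, 1, 1, 2)
)

# wide form: one row per name, one column per status, cells hold summed revenue
rev_by_status <- xtabs(revenue ~ name + status, data = df)
print(rev_by_status)

# long form: one row per name/status combination that actually occurs
rev_long <- aggregate(revenue ~ name + status, data = df, FUN = sum)
print(rev_long)
```

Combinations that never occur show up as 0 in the xtabs() table and are simply absent from the aggregate() result.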
[R] select subsets in data frame
Dear WizaRds!

A trivial question indeed on selecting subsets in data frames. I am sorry. Unfortunately, I did not find any helpful information in the introduction, although I searched the help archive and read introductory books. Please help:

I want to select column "KB" which is read via read.csv2 as a data.frame into d. I checked that it is indeed a data.frame object and included the correct header information in line 1. For example purposes, look at this small object:

    d <- data.frame(A=1:3, Date=c("01.01.07","02.01.07","03.01.07"),
                    KB=c("Eenie", "Meenie", "Miney") )
    d["KB"=="Eenie",]   # gives

    [1] A    Date KB
    <0 rows> (or 0-length row.names)

If I follow Venables/Ripley in Modern Applied Statistics with S, it should look like this:

    library(MASS)
    attach(painters)
    painters[Colour>=17,]

which gives the correct subset. But

    d[KB=="Eenie",]   # gives
    Error in `[.data.frame`(d, KB == "Eenie", ) : object "KB" not found

I need every KB named Eenie. What did I do wrong? The alternative I found seems to be quite complicated:

    d[which( d[,"KB"]=="Eenie" ), ]

      A     Date    KB
    1 1 01.01.07 Eenie

Thank you so much for your help.

cheers
mark

"I believe I found the missing link between animal and civilized man. It's us." -- Konrad Lorenz
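A short sketch of what seems to be going on, under the assumption that d was not attached: "KB"=="Eenie" compares two literal strings and is simply FALSE, so no rows are selected, while a bare KB is only visible after attach(d), which is what attach(painters) did for Colour in the MASS example. Referring to the column through the data frame, or using subset(), avoids both problems. The names sel1 and sel2 are made up for this example.

```r
d <- data.frame(A = 1:3,
                Date = c("01.01.07", "02.01.07", "03.01.07"),
                KB = c("Eenie", "Meenie", "Miney"))

"KB" == "Eenie"                    # a comparison of two strings: FALSE

sel1 <- d[d$KB == "Eenie", ]       # logical index built from the column itself
sel2 <- subset(d, KB == "Eenie")   # subset() looks KB up inside d
print(sel1)
```

Both forms return the single row for Eenie; the which() wrapper in the post is not required unless the column contains NA values.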
[R] Two Phase Sampling
Dear WizaRds,

I tried to construct a two-phase sampling design in Survey just the way I hoped I had understood it in Vienna - I was wrong. I think I am too stupid to create the correct subset for phase 2.

Phase 1: Sample 1000 parts with 80 defective.
Phase 2: Sample 100 parts out of these 1000 with 15 defective.
0: ok, 1: defunct. The table below gives the conditional sampling values. Please help me:

    library(survey)
    ss1 <- data.frame(id=1:1000,
                      ph1.x=rep(c(1,0),c(10,990)),
                      subset=rep(c(1,0),c(100,900)),
                      ph2.y=rep(c(1,0,NA),c(15,85,900)),
                      n1=rep(1000,1000), n2=rep(100,1000) )
    table(ss1$ph2.y, ss1$ph1.x)
    >          Phase1.x
    > Phase2.y   0   1
    >        0  85   0
    >        1   5  10

    p2 <- twophase(id=list(~id,~id), strata=list(NULL,NULL), data=ss1,
                   subset=~subset, fpc=list(~n1,~n2))
    svymean(~ph2.y, design=p2)
    >        mean SE
    > ph2.y  0.15  0

However, taking into consideration the 2nd sample, the estimator should be: ph1.x.bar (phase 1) = 80/1000 = 0.08 and ph2.y.bar (phase 2) = 15/100 = 0.15 defect boards. That means y.est = 1.5*0.08 = 0.12 defect boards, since the RATIO ESTIMATOR equals 15/10 = 1.5 defect parts for the ratio of defect ph2/defect ph1.

What again did I do wrong? I am positive that the estimator is 12 defective parts per 100 on average, so how do I correctly construct the twophase design?

ps: I hope this is not something undergraduates master eloquently...

Thank you so much for your help. I invite you to all the BBQ and beer there is in Europe!

Yours always
mark
[R] Multistage Sampling
Dear WizaRds, dear Thomas,

First of all, I want to tell you how grateful I am for all your support. I hope I will be able to help others along one day the same way you do. Thank you so much.

I am struggling with a multistage sampling design:

    library(survey)
    multi3 <- data.frame(cluster=c(1,1,1,1, 2,2,2, 3,3),
                         id=c(1,2,3,4, 1,2,3, 1,2),
                         nl=c(4,4,4,4, 3,3,3, 2,2),
                         Nl=c(100,100,100,100, 50,50,50, 75,75),
                         M=rep(23,9),
                         y=c(23,33,77,25, 35,74,27, 37,72) )
    dmulti3 <- svydesign(id=~cluster+id, fpc=~M+Nl, data=multi3)
    svymean(~y, dmulti3)
        mean     SE
    y 45.796 5.5483
    svytotal(~y, dmulti3)
      total    SE
    y 78999 13643

and I estimate the population total as N = M/m * sum(Nl) = 23/3*(100+50+75) = 1725. With this, my variance estimator is:

    y1 <- mean(multi3$y[1:4])   # 39.5
    y2 <- mean(multi3$y[5:7])   # 45.33
    y3 <- mean(multi3$y[8:9])   # 54.5
    yT1 <- 100*y1               # 3950 total cluster 1
    yT2 <- 50*y2                # 2266.67 total cluster 2
    yT3 <- 75*y3                # 4087.5 total cluster 3
    ybarT <- 1/3*sum(yT1,yT2,yT3)   # 3434.722
    s1 <- var(multi3$y[1:4])    # 643.67 var cluster 1
    s2 <- var(multi3$y[5:7])    # 632.33 var cluster 2
    s3 <- var(multi3$y[8:9])    # 612.5 var cluster 3
    var.yT <- 23^2*( 20/23*1/6*sum( (yT1-ybarT)^2,(yT2-ybarT)^2,(yT3-ybarT)^2 )
                     + 1/69 * sum(100*96*s1, 50*47*s2, 75*73*s3) )   # 242 101 517

but var.yT/1725^2 = 81.36157, i.e. SE = 9.02006, whereas it should be SE = 13643/1725 = 7.90899. Is this calculation correct? I remember svytotal using a different variance estimator compared to svymean, and that svytotal gives the unbiased estimation.

To solve the problem, I went ahead and tried to calibrate the design object, telling Survey the population total N=1725:

    dmulti3.cal <- calibrate(dmulti3, ~1, pop=1725)
    svymean(~y, dmulti3.cal)
        mean     SE
    y 45.796 5.5483
    svytotal(~y, dmulti3.cal)
      total     SE
    y 78999 9570.7

which indeed gives me the computed svymean SE, but alas, I still don't know why my variance is so different. I think it might have something to do with a differently computed N and the fact that your estimator formula is a different one. Since I calculated the Taylor series solution, I suppose there must be another approach?

The calibration help page tells me to enter a list of population total vectors for each cluster, which would result in:

    dmulti3.cal <- calibrate(dmulti3, ~1, pop=c(100,50,75))
    Error in regcalibrate.survey.design2(design, formula, population, aggregate.stage = aggregate.stage, :
      Population and sample totals are not the same length.

I am very grateful for your help and wish you all the best

Yours
mark
[R] persp/ scatterplot3d
Dear WizaRds,

I would like to create a 3d-plot with persp(). I sampled 17 points with xyz-coordinates (real-life example!), representing the peaks of the whole plane with "zero coordinates" x=3, y=3, z=3. My intention is to show which entries are above or below the "zero" level with persp() on a nicely created grid. I also tried scatterplot3d(), but, alas, I am unable to tell the function that my points represent the peaks of the plane and are either above or below "normal" (whatever that means...). Please help me:

    library(scatterplot3d)
    dat <- matrix(c(1.33,1.00,2.67,4.33,4.00,5.00,0.67,3.33,1.00,3.00,
                    3.00,1.33,1.67,1.67,1.33,1.67,2.33,1.67,0.67,1.00,
                    1.33,3.33,2.67,1.67,1.67,2.67,2.00,0.33,0.67,0.33,
                    2.67,3.33,0.67,0.67,1.33,0.00,4.33,3.33,4.67,3.00,
                    4.00,5.00,4.00,3.67,1.67,3.00,3.67,3.33,1.00,1.33,
                    0.33), ncol=3)
    colnames(dat) <- c("x","y","z")
    ## dat is a matrix, so the columns are extracted by name, not with $
    x <- dat[,"x"]; y <- dat[,"y"]; z <- dat[,"z"]
    persp(x,y,z)          # doesn't work at all, of course, even if I utilize outer()
    scatterplot3d(x,y,z)  # returns a 3d scatterplot, but not the way I would like to see it

Thank you so much for your help and support!

mark
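A possible route, sketched under clearly stated assumptions: persp() needs increasing x and y grid vectors and a matrix of heights over that grid, so scattered points have to be interpolated first. The example below uses a hand-written inverse-distance weighting (a swapped-in, simple technique, not something the post or persp() prescribes), only the first six points of dat (assuming the matrix is filled column-wise), and colours facets by whether they lie above the "zero" level z = 3. The helper idw_point and the grid size of 25 are made up for illustration.

```r
x <- c(1.33, 1.00, 2.67, 4.33, 4.00, 5.00)
y <- c(1.67, 0.67, 1.00, 1.33, 3.33, 2.67)
z <- c(1.33, 0.00, 4.33, 3.33, 4.67, 3.00)

# inverse-distance-weighted height at one grid location (px, py)
idw_point <- function(px, py, power = 2) {
  d <- sqrt((x - px)^2 + (y - py)^2)
  if (any(d == 0)) return(z[which.min(d)])   # exactly on a sampled point
  w <- 1 / d^power
  sum(w * z) / sum(w)
}

gx <- seq(min(x), max(x), length.out = 25)   # increasing grid in x
gy <- seq(min(y), max(y), length.out = 25)   # increasing grid in y
zg <- outer(gx, gy, Vectorize(idw_point))    # 25 x 25 matrix of heights

# mean height of each facet, used to colour above/below the level z = 3
zf <- (zg[-1, -1] + zg[-1, -25] + zg[-25, -1] + zg[-25, -25]) / 4
facet_col <- ifelse(zf > 3, "tomato", "steelblue")

persp(gx, gy, zg, theta = 30, phi = 25, ticktype = "detailed",
      col = facet_col, xlab = "x", ylab = "y", zlab = "z")
```

Inverse-distance weighting never leaves the range of the sampled z values, which keeps the surface tame with so few points; a dedicated interpolation package may give a smoother result.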
[R] Post Stratification
Dear WizaRds,

having met some of you in person in Vienna, I think even more fondly of this community and hope to continue on this route. It was great talking with you and learning from you. Thank you.

I am trying to work through an artificial example in post stratification. This is my dataset:

    library(survey)
    age <- data.frame(id=1:8,
                      stratum=rep(c("S1","S2"),c(5,3)),
                      weight=rep(c(3,4),c(5,3)),
                      nh=rep(c(5,3),c(5,3)),
                      Nh=rep(c(15,12),c(5,3)),
                      y=c(23,25,27,21,22, 77,72,74) )
    pop.types <- table(stratum=age$stratum)
    age.post <- svydesign(ids=~1, strata=NULL, data=age, fpc=~Nh)   ## no clusters, no strata
    post <- postStratify(design=age.post, strata=~stratum, population=pop.types)
    svymean(~y, post)
    svytotal(~y, post)

gives

        mean     SE
    y 42.625 0.5467
      total     SE
    y   341 4.3737

So, is it correct to define pop.types as the number of elements sampled per stratum (nh), or rather as the total number of elements per stratum (Nh)? If the latter:

    pop.types <- data.frame(stratum = c("S1","S2"), Freq = c(15, 12))

The help says: The 'population' totals can be specified as a table with the strata variables in the margins, or as a data frame where one column lists frequencies and the other columns list the unique combinations of strata variables. ??

However, I compute:

    Nh=c(15,12); nh=c(5,3); sh=by(age$y, age$stratum, var); N=sum(Nh)
    # Mean estimator
    y.bar=by(age$y, age$stratum, mean)   ## 23.6; 74.33
    estimator=1/N*sum(Nh*y.bar)          ## 46.14815
    # Variance estimator
    vari=1/N^2*sum(Nh*(Nh-nh)*sh/nh)
    sqrt(vari)                           ## .7425903

and with Taylor expansion .7750118. Please help me correct my mistakes. Thank you so much.

Yours
mark
[R] Survey - twophase
Dear WizaRds,

I am struggling with the use of twophase in package survey. My goal is to compute a simple example in two phase sampling:

phase 1: I sample n1=1000 circuit boards and find 80 non functional
phase 2: Given the n1=1000 sample I sample n2=100 and find 15 non functional.

Let's say phase 2 shows this result together with phase 1:

                     phase 1
                     ok   defunct   sum
    phase 2 ok       85      0       85
            defunct   5     10       15
            sum      90     10      100

That is in R:

    fail <- data.frame(id=1:1000,
                       x=c(rep(0,920), rep(1,80)),
                       y=c(rep(0,985), rep(1,15)),
                       n1=rep(1000,1000), n2=rep(100,1000), N=rep(5000,1000))
    des.fail <- twophase(id=list(~id,~id), data=fail, subset=~I(x==1))   # fpc=list(~n1,~n2)
    svymean(~y, des.fail)

gives mean y 0.1875, SE 0.0196, but theoretically, we have x.bar1 (phase 1) = 0.08 and y.bar2 (phase 2) = 0.15 defect boards.

Two phase sampling assumes some relation between the easily/quickly obtained x-information and the elaborate/time-consuming y-information, say a ratio r = sum y (phase 2) / sum x (phase 2) = 15/10 = 1.5 (from the above table). Ergo, the y.ratio estimator = r*x.bar(phase 1) = 1.5*0.08 = 0.12 with

    variance = (n1-n2)/n1 * s_regression^2/n2 + s_y^2/n1
             = 900/1000 * 0.0765/100 + 0.129/1000 = .00081
    SE = .02846

with

    yk=c(rep(0,85), rep(1,15)); xk=c(rep(0,90), rep(1,10))
    s_regression^2 = 1/98*sum((yk-1.5*xk)^2)
    s_yk^2 = 1/99 * sum( (yk-.15)^2) = 0.1287879

I am sorry to bother you with my false calculations, but I just don't know how to obtain the correct results. Please help. My example is taken from Kauermann/Kuechenhoff 2006, p. 111f.

thank you so much
yours always
mark
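For reference, the hand calculation in the post can be replayed in a few lines of base R. This is only a sketch of the formulas as they are written above (including the divisor n2 - 2 = 98), not a statement about what twophase() computes; the variable names are made up. Without intermediate rounding the variance comes out near 0.00082 and the standard error near 0.0286, slightly above the .02846 obtained from the rounded variance.

```r
n1 <- 1000                       # phase 1 sample size
n2 <- 100                        # phase 2 sample size
x_bar1 <- 80 / n1                # phase 1 share of defective boards

yk <- c(rep(0, 85), rep(1, 15))  # phase 2 y values
xk <- c(rep(0, 90), rep(1, 10))  # phase 2 x values

r <- sum(yk) / sum(xk)           # ratio of defective counts, 15/10
y_ratio <- r * x_bar1            # ratio estimator of the defective share

s2_reg <- sum((yk - r * xk)^2) / (n2 - 2)   # residual variance around r * x
s2_y   <- var(yk)                           # same as sum((yk - 0.15)^2) / 99

v  <- (n1 - n2) / n1 * s2_reg / n2 + s2_y / n1
se <- sqrt(v)
print(c(estimate = y_ratio, variance = v, se = se))
```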
[R] pairwise combinations of variables
Dear WizaRds,

although this might be a trivial question to the community, I was unable to find anything solving my problem in the help files on CRAN. Please help.

Suppose I have 4 variables and want to use all possible combinations:

    1,2  1,3  1,4  2,3  2,4  3,4

for a further kmeans partitioning. I tried permutations() of package e1071, but this is not what I need.

Thank you for your help and support.

mark.

Additionally: For anybody who is willing to offer some advice, here is my complete approach:

    library(e1071)
    mat <- matrix( c(6,7,8,2,3,4,12,14,14, 14,15,13,3,1,2,3,4,2,
                     15,3,10,5,11,7,13,6,1, 15,4,10,6,12,8,12,7,1),
                   ncol=9, byrow=T )
    rownames(mat) <- paste("v", 1:4, sep="" )
    tmat <- t(mat)
    cluster <- c(1, 2, 1, 3, 3, 3, 1, 2, 2)
    centroids <- matrix( 0, ncol=3, nrow=4 )
    obj <- vector(mode="list", length=3)
    for (j in 1:4){
      for (i in 1:3){
        where <- which(cluster==i)   # which obj is in which class?
        centroids[j,i] <- mean( tmat[ where,j ] )
        obj[[i]] <- tmat[where,]
      }
    }
    colnames(centroids) <- paste( c("Cluster"), 1:3)
    rownames(centroids) <- rownames(mat)
    centroids
    obj
    ## now I want to do kmeans of all possible variable pairs, e.g. v1 and v3
    ## automation in a second step later
    wjk <- kmeans(tmat[,c(1,3)], centers=centroids[c(1,3),], iter.max=10,
                  algorithm="MacQueen")   ## obviously wrong
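A hedged sketch of one way to approach both parts of the question: combn(4, 2) lists the six variable pairs, one per column, and kmeans() expects its centers as a matrix with one row per cluster and one column per variable, i.e. the transpose of the centroids layout in the post. The names centers_all, centers_by_pair and fit13 are made up here; using the class means of the given partition as start centers is taken from the post, not from any package requirement.

```r
mat <- matrix(c(6,7,8,2,3,4,12,14,14,
                14,15,13,3,1,2,3,4,2,
                15,3,10,5,11,7,13,6,1,
                15,4,10,6,12,8,12,7,1), ncol = 9, byrow = TRUE)
rownames(mat) <- paste("v", 1:4, sep = "")
tmat <- t(mat)                        # 9 objects (rows) x 4 variables (columns)
cluster <- c(1, 2, 1, 3, 3, 3, 1, 2, 2)

pairs <- combn(4, 2)                  # 2 x 6 matrix, one variable pair per column
print(pairs)

# class means: 3 clusters (rows) x 4 variables (columns)
centers_all <- apply(tmat, 2, function(v) tapply(v, cluster, mean))

# start centers for every pair: the same rows, restricted to the pair's columns
centers_by_pair <- lapply(seq_len(ncol(pairs)),
                          function(k) centers_all[, pairs[, k], drop = FALSE])

# k-means on variables 1 and 3 only, started from the matching centers
fit13 <- kmeans(tmat[, c(1, 3)], centers = centers_all[, c(1, 3)],
                iter.max = 10, algorithm = "MacQueen")
print(fit13$cluster)
```

So the center matrix is not the same for every pair: its rows always refer to the same clusters, but its columns change with the variables selected.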
[R] kmeans Clustering
Dear WizaRds,

My goal is to program the VS-KM algorithm by Brusco and Cradit 01 and I have come to a complete stop in my efforts. Maybe somebody is willing to follow my thoughts and offer some help.

In a first step, I want to use a single variable for the partitioning process. As the center matrix I use the objects that belong to the clusters I found with the hierarchical Ward algorithm. Then, I have to take all possible variable pairs and apply kmeans again, which is quite confusing to me. Here is what I do:

    ## 0. data
    mat <- matrix( c(6,7,8,2,3,4,12,14,14, 14,15,13,3,1,2,3,4,2,
                     15,3,10,5,11,7,13,6,1, 15,4,10,6,12,8,12,7,1),
                   ncol=9, byrow=T )
    rownames(mat) <- paste("v", 1:4, sep="" )
    tmat <- t(mat)

    ## 1. Provide clusters via Ward:
    ward <- hclust(d=dist(tmat), method = "ward", members=NULL)

    ## 2. Compute cluster centers and create center-matrix for kmeans:
    groups <- cutree(ward, k = 3, h = NULL)
    centroids <- vector(mode="numeric", length=3)
    obj <- vector(mode="list", length=3)
    for (i in 1:3){
      where <- which(groups==i)   # which object belongs to which group?
      centroids[i] <- mean( tmat[ where, ] )
      obj[[i]] <- tmat[where,]
    }
    P <- vector(mode="numeric", dim(mat)[2] )
    pj <- vector(mode="list", length=dim(mat)[1])
    for (i in 1:dim(mat)[1]){
      pj[[i]] <- kmeans( tmat[,i], centers=centroids, iter.max=10,
                         algorithm="MacQueen")
      P <- rbind(P, pj[[i]]$cluster)
    }
    P <- P[-1,]   ## gives a matrix of partitions using each single variable
                  ## (I'm sure P can be programmed much more easily)

    ## 3. kmeans using all possible pairs of variables,
    ##    here just e.g. variables 1 and 3:
    wjk <- kmeans(tmat[,c(1,3)], centers=centroids, iter.max=10,
                  algorithm="MacQueen")

which, of course, gives an error message since "centroids" is not a matrix of the cluster centers. How on earth do I correctly construct a matrix of centers corresponding to the pairwise variables? Is it always the same matrix no matter which pair of variables I choose?
I apologize for my lack of clustering knowledge and expertise - any help is welcome. Thank you very much.

Many greetings
mark
[R] Clustering and Rand Index - VS-KM
Dear WizaRds,

I have been trying to compute the adjusted Rand index as by Hubert/Arabie, and could not work out how to define a partition object as in my last request yesterday. With package fpc I try to work around the problem, using my original data:

    mat <- matrix( c(6,7,8,2,3,4,12,14,14, 14,15,13,3,1,2,3,4,2,
                     15,3,10,5,11,7,13,6,1, 15,4,10,6,12,8,12,7,1),
                   ncol=9, byrow=T )
    rownames(mat) <- paste("v", 1:4, sep="" )
    ## and the given partitions:
    p1=c(1,1,1,2,2,2,3,3,3)
    p2=c(1,1,1,3,2,2,3,3,2)
    p3=c(1,2,1,3,1,3,1,3,2)
    p4=c(1,2,1,3,1,3,1,3,2)
    ## Now
    cluster.stats(d=dist(mat), clustering=p1, alt.clustering=p2)
    ## just gives
    Error in as.dist(dmat[clustering == i, clustering == i]) :
      (subscript) logical subscript too long

I think I don't understand the use of 'd' here. How can I calculate the corrected Rand matrix:

    (  .000   .407  -.071  -.071 )
    (  .407   .000  -.071  -.071 )
    ( -.071  -.071   .000  1.000 )
    ( -.071  -.071  1.000   .000 )

Does the clue package help me here? Does anyone know if there is a VS-KM algorithm (Variable Selection Heuristic for K-Means Clustering) implemented in R? Unfortunately, I did not find any search entries.

Thank you for your help and support

Mark
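Two remarks, offered as a sketch. First, the error most likely arises because dist(mat) measures distances between the 4 rows (variables) of mat, while the partitions describe 9 objects; dist(t(mat)) would give a 9 x 9 distance structure. Second, the adjusted Rand index needs only the two label vectors, so it can be computed directly from their contingency table in base R. The function adj_rand below is written for this note and is not part of fpc or clue; it reproduces the off-diagonal values of the matrix in the post, while its diagonal is 1 (a partition agrees perfectly with itself) instead of the .000 shown there.

```r
p1 <- c(1,1,1,2,2,2,3,3,3)
p2 <- c(1,1,1,3,2,2,3,3,2)
p3 <- c(1,2,1,3,1,3,1,3,2)
p4 <- c(1,2,1,3,1,3,1,3,2)

# adjusted Rand index (Hubert/Arabie) from the contingency table of two partitions
adj_rand <- function(a, b) {
  tab  <- table(a, b)
  s_ij <- sum(choose(tab, 2))             # object pairs joined in both partitions
  s_a  <- sum(choose(rowSums(tab), 2))    # object pairs joined in a
  s_b  <- sum(choose(colSums(tab), 2))    # object pairs joined in b
  expected <- s_a * s_b / choose(sum(tab), 2)
  (s_ij - expected) / ((s_a + s_b) / 2 - expected)
}

parts <- list(p1 = p1, p2 = p2, p3 = p3, p4 = p4)
ari <- outer(1:4, 1:4,
             Vectorize(function(i, j) adj_rand(parts[[i]], parts[[j]])))
dimnames(ari) <- list(names(parts), names(parts))
print(round(ari, 3))
```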
[R] Clustering and Rand Index
Dear WizaRds,

I am trying to compute the (adjusted) Rand Index in order to comprehend the variable selection heuristic (VS-KM) according to Brusco/Cradit 2001 (Psychometrika 66 No. 2, p. 249-270, 2001). Unfortunately, I am unable to correctly use cl_ensemble and cl_agreement (package: clue). Here is what I am trying to do:

    library(clue)
    ## Let p1..p4 be four partitions of the kind
    p1=c(1,1,1,2,2,2,3,3,3)
    p2=c(1,1,1,3,2,2,3,3,2)
    p3=c(1,2,1,3,1,3,1,3,2)
    p4=c(1,2,1,3,1,3,1,3,2)

Each object within the partitions is assigned to cluster 1, 2 or 3. Now I have to create a cl_ensemble object, so that I can calculate the Rand index:

    ens <- cl_ensemble(list=c(p1,p2,p3,p4))

which only leads to "Ensemble elements must be all partitions or all hierarchies."

Although I understand that p1..p4 are vectors in this example, they represent the partitions I want to use. I don't know how to create the necessary partition object in order to transform it into an ensemble object, so that I can run cl_agreement - so much transformation, so little time...

I have also tried to work around this problem by creating partitions via k-means, but I do not get the same partitions I need to validate. I am sure the following algorithm needs improvement, especially the way it puts matrices into a list through a for loop (ouch) - I am very grateful for your comments on improving this terrible piece of R-work (is it easier to do something with apply?).

Thank you very much for your help and support

Mark

    mat <- matrix( c(6,7,8,2,3,4,12,14,14, 14,15,13,3,1,2,3,4,2,
                     15,3,10,5,11,7,13,6,1, 15,4,10,6,12,8,12,7,1),
                   ncol=9, byrow=T )
    rownames(mat) <- paste("v", 1:4, sep="" )
    clus.mat <- vector(mode="list", length=4)
    for (i in 1:4){
      ## run kmeans on each row (clustering per single variable)
      clus.mat[[i]] <- kmeans(mat[i,], centers=3, nstart=1, algorithm="MacQueen")
    }
    clus.mat
[R] nls() fit to Kahneman/Tversky function
Dear WizaRds,

I would like to fit a curve to ten points with nls() for one unknown parameter gamma in the Kahneman/Tversky function, but somehow it won't work and I am unable to locate my mistake.

    p.kum <- seq(0.1,1, by=0.1)
    felt.prob.kum <- c(0.16, 0.23, 0.36, 0.49, 0.61, 0.71, 0.85, 0.89, 0.95, 1)
    ## how to find a function that fits these points nicely?
    plot(p.kum, felt.prob.kum)   ## looks a little like an "S"
    gamma <- rep(0.5, 10)
    nls.dataframe <- data.frame(p.kum, felt.prob.kum, gamma)
    nls.kurve <- nls( formula = felt.prob.kum ~
                        p.kum^gamma/(p.kum^gamma+(1-p.kum)^gamma)^(1/gamma),
                      data=nls.dataframe, start=c(gamma=gamma),
                      algorithm="plinear" )
    summary(nls.kurve)

gives:

    Error in La.chol2inv(x, size) : 'size' cannot exceed nrow(x) = 10

If I go with the Gauss-Newton algorithm I get a singular gradient matrix error, so I tried the Golub-Pereyra algorithm for partially linear least-squares. It also seems the nls model tries to find ten different gammas, but I want only one single gamma parameter for the function.

I appreciate your help and support. Thank you.

sol lucet omnibus
Mark Hempelmann
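A sketch of one way to get a single gamma, with the assumptions spelled out. In the post gamma is both a column of the data frame and a start vector of length ten, which is why nls() sees ten parameters; for nls() the usual pattern would be to leave gamma out of the data and pass start = list(gamma = 0.5) with the default algorithm. Because there is only one parameter, the residual sum of squares can also be minimised directly with optimize(), which is what the block below does. The helper names w and rss and the search interval c(0.3, 3) are assumptions made for this example.

```r
p    <- seq(0.1, 1, by = 0.1)
felt <- c(0.16, 0.23, 0.36, 0.49, 0.61, 0.71, 0.85, 0.89, 0.95, 1)

# Kahneman/Tversky probability weighting function with a single parameter gamma
w <- function(p, gamma) p^gamma / (p^gamma + (1 - p)^gamma)^(1 / gamma)

# residual sum of squares as a function of the scalar gamma
rss <- function(gamma) sum((felt - w(p, gamma))^2)

# one-dimensional minimisation over an assumed search interval
fit <- optimize(rss, interval = c(0.3, 3))
print(fit$minimum)     # estimated gamma
print(fit$objective)   # residual sum of squares at that gamma

plot(p, felt)
curve(w(x, fit$minimum), from = 0.01, to = 1, add = TRUE)
```

The plot is worth checking: if the fitted curve misses the points systematically, the one-parameter form may simply be too rigid for these data.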
[R] regression with restrictions - optimization problem
Dear WizaRds!

I am sorry to ask for some help, but I have come to a complete stop in my efforts. I hope, though, that some of you might find the problem quite interesting to look at.

I have been trying to estimate parameters for lotteries, the so-called utility of chance, i.e. the "felt" probability compared to a rational given probability. A really brief example: Given is a lottery payoff matrix of the type

    x1     x2    ...  x10    median
    1000   5000  ...  5000   3750
    0      1000  ...  5000   2250
    etc.

The actual data frame consists of 11 columns and 28 rows. Each entry x1 ... x10 gives the amount of money, or the utility of that amount, you receive playing the lottery. The probability for each column is 10%. The median represents the empirical answers of players, i.e. the sure payoff at which a person is indifferent between playing the lottery and receiving that sum of money.

I try to determine the probability people feel instead of the known 10% probability of each column payoff entry. But here's the catch: People also give different utilities to each amount of money, which basically gives us some sort of function like this:

    u(x1...x10) = u(x1)*pi(p1) + u(x2)*pi(p2) + ... + u(x10)*pi(p10) = y

    u()     - unknown utility function
    pi()    - unknown probability function
    y       - empirical answer
    p1..p10 - probabilities, here always 0.1

To keep it simple, I set u(0)=0 and u(5000)=5000 and vary u(1000) between a start and an end point. On each cycle R computes the regression coefficients that serve as the pi(p) estimators for every 10% step. Then I minimize the residual sum of squares, which should give the best estimators for every 10% step.

How can I possibly calculate a "smooth" pi(p) curve, a curve that should look like an "S", plotted against the cumulative 10% probabilities? I only have my ten estimators. How can I possibly tell R the necessary restrictions of nonnegative estimators and their sum being equal to one?
Here is my quite naive approach:

    a70 <- matrix(c(1000,5000,5000,5000,2150,
                    0,1000,5000,5000,1750,
                    0,0,1000,5000,1150,
                    0,0,0,1000,200,
                    1000,1000,5000,5000,2050,
                    0,1000,1000,5000,1972), ncol=5, byrow=T)
    colnames(a70)=c(paste("x", 1:4, sep=""), "med")
    a70 <- as.data.frame(a70)

    start=800; end=2000
    step=10; u1000=start-step
    u1000 <- u1000+step   # varying the 1000 entry
    a70[a70==1000] <- u1000
    reg70 <- lm(a70$med ~ -1+x1+x2+x3+x4, data=a70)
    res <- sum( (reg70$residuals^2) )

    for (i in 1:( (end-start)/step) ){
      a70[a70==u1000] <- u1000+step
      u1000 <- u1000+step
      reg70 <- lm(a70$med ~ -1+x1+x2+x3+x4, data=a70)
      if (res >= sum( (reg70$residuals^2) )) {
        res <- sum( (reg70$residuals^2) )
        print(paste("cycle", i, "u1000=", u1000, "RSS=", res))
        final70 <- a70
        finalreg <- reg70
      }
    }
    print(final70)
    summary(finalreg)

Maybe a better approach works with optim (stats) or dfp (Bhat), but I have no idea how to correctly approach such a restricted optimization problem.

Thank you so much for your help and support.

Mark Hempelmann
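One hedged way to impose "nonnegative and summing to one" with plain optim(): write the weights as a softmax transform of unconstrained parameters and minimise the residual sum of squares over those. This is a swapped-in reparametrisation, not the approach of the post and not the only option (quadratic programming would be another); it uses the small a70 example with u(1000) fixed at 1000, and the names softmax, rss and w_hat are made up. Note that softmax weights are strictly positive, so an estimate can approach zero but never reach it exactly.

```r
X <- matrix(c(1000, 5000, 5000, 5000,
              0,    1000, 5000, 5000,
              0,    0,    1000, 5000,
              0,    0,    0,    1000,
              1000, 1000, 5000, 5000,
              0,    1000, 1000, 5000), ncol = 4, byrow = TRUE)
y <- c(2150, 1750, 1150, 200, 2050, 1972)   # the "med" column

# map unconstrained theta to positive weights that sum to one
softmax <- function(theta) {
  e <- exp(theta - max(theta))   # subtract the maximum for numerical stability
  e / sum(e)
}

# residual sum of squares of the constrained regression
rss <- function(theta) sum((y - X %*% softmax(theta))^2)

opt   <- optim(par = rep(0, 4), fn = rss)   # default Nelder-Mead search
w_hat <- softmax(opt$par)                   # estimated pi() weights
print(round(w_hat, 4))
print(opt$value)
```

The outer loop over u(1000) from the post could call this fit once per candidate value and keep the one with the smallest opt$value.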
[R] Survey - Cluster Sampling
Dear WizaRds,

I am struggling to compute a cluster sampling design correctly. I want to do one-stage clustering with different parametric changes:

Let M be the total number of clusters in the population, and m the number sampled. Let N be the total number of elements in the population and n the number sampled. y are the values sampled. This is my example data:

    clus1 <- data.frame(cluster=c(1,1,1,2,2,2,3,3,3),
                        id=rep(1:3,3),
                        weight=rep(72/9,9),
                        nl=rep(3,9), Nl=rep(3,9), N=rep(72,9),
                        y=c(23,33,77, 25,35,74, 27,37,72) )

1. Let M=m=3 and N=n=9. Then:

    dclus1 <- svydesign(id=~cluster, data=clus1)
    svymean(~y, dclus1)
        mean    SE
    y 44.778 0.294

the unweighted mean, assuming equal probability in the clusters. ok.

2. Let M=23, m=3 and N=72, n=9; then I am unable to use svydesign correctly:

    dclus2 <- svydesign(id=~cluster, data=clus1, fpc=~N)
    svymean(~y, dclus2)
        mean     SE
    y 44.778 0.2878

but it should be 23/72 * 1/3*(133+134+136) = 42.91, since I have to include the total number of clusters/total population M/N in the estimator. How can I include the information of the total number of clusters?

3. How do I work with weights correctly? I understand that weights imply inverse probability weighting 1/p with p=n/N in simple sampling, in our case 72/9=8, because I sample 9 units out of a total population of 72. Again, I couldn't tell survey the number of total clusters M. So:

    dclus3 <- svydesign(id=~cluster, weights=~weight, data=clus1, fpc=~N)
    svymean(~y, dclus3)
        mean     SE
    y 44.778 0.2878

still exactly the same numbers, although I provided the weights. What am I doing wrong?

I am sorry to bother you. Studying Statistics isn't done in a day, that's for sure. Thank you so much for your understanding and support.

mark
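The two competing point estimates can be compared in base R; the block below is a sketch of the arithmetic only and does not call survey. As far as I understand, svymean() divides an estimated total by an estimated population size, so with equal weights it returns the plain sample mean whatever constant weight is supplied, whereas the estimator in the post scales the mean cluster total by M/N with N known. It may also help that, for a one-stage cluster design, the fpc would be expected to refer to the number of clusters in the population (M = 23), not to the number of elements. The variable names are made up.

```r
y       <- c(23, 33, 77,  25, 35, 74,  27, 37, 72)
cluster <- rep(1:3, each = 3)
M <- 23    # clusters in the population
N <- 72    # elements in the population

tot <- tapply(y, cluster, sum)         # cluster totals: 133, 134, 136

# ratio-type mean: total of the sampled y over the number of sampled elements
ratio_mean <- sum(tot) / length(y)

# mean based on cluster totals with known N: (M / N) * mean cluster total
unbiased_mean <- (M / N) * mean(tot)

print(c(ratio = ratio_mean, unbiased = unbiased_mean))
```

The first value matches the 44.778 reported by svymean, the second the 42.91 computed by hand in the post.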
[R] Survey and Stratification
Dear WizaRds,

Working through sampling theory, I tried to comprehend the concept of stratification and apply it with Survey to a small example. My question is more of a theoretical nature, so I apologize if this does not fully fit this board's intention, but I have come to a complete stop in my efforts and need an expert to help me along. Please help:

    age <- matrix(c(rep(1,5), rep(2,3), 1:8, rep(3,5), rep(4,3),
                    rep(5,5), rep(3,3), rep(15,5), rep(12,3),
                    23,25,27,21,22, 33,27,29), ncol=6, byrow=F)
    colnames(age) <- c("stratum", "id", "weight", "nh", "Nh", "y")
    age <- as.data.frame(age)

    ## create survey design objects
    age.des1 <- svydesign(ids=~id, strata=~stratum, weight=~Nh, data=age)
    svymean(~y, age.des1)   ## gives mean 25.568, SE 0.9257

    age.des2 <- svydesign(ids=~id, strata=~stratum, weight=~I(nh/Nh), data=age)
    svymean(~y, age.des2)   ## gives mean 25.483, SE 0.9227

    age.des3 <- svydesign(ids=~id, strata=~stratum, weight=~weight, data=age)
    svymean(~y, age.des3)   ## gives mean 26.296, SE 0.9862

    age.des4 <- svydesign(ids=~id, strata=~stratum, data=age)
    svymean(~y, age.des4)   ## gives mean 25.875, SE 0.9437

age.des3 is the only estimator I am able to compute correctly by hand. It is stratified random sampling with inverse probability weighting with weight = nh/Nh ## sample size/stratum size. Basically, I thought the option weight=~Nh as well as weight=~I(nh/Nh) would result in the same number, but they do not. I am reading Thompson (02), Cochran (77) and of course Lumley on his Survey package, but I can't find my mistake.

I thought the Hansen-Hurwitz estimator per stratum offers the right numbers: p1=5/15, p2=3/12, so y1.total=1/5*(3*118), y2.total=1/3*(4*89), and the stratified estimator with this design should be 1/27*(y1.total+y2.total), which is obviously wrong. How on earth do I get the numbers Survey is calculating?

I am very sorry to bother you with this problem; however, I didn't find anybody who was willing to help me.
Thank you so much

Mark
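A small check in base R, given as a sketch: each of the four means reported in the post equals a weighted mean sum(w*y)/sum(w) of the eight observations, with w taken from the respective weight specification. Only the third choice, w = Nh/nh (3 and 4, the 'weight' column), is the inverse of the sampling fraction nh/Nh; Nh alone and nh/Nh are different weightings, which would explain why they do not agree. The helper wmean is made up for this example and says nothing about how survey computes the standard errors.

```r
y  <- c(23, 25, 27, 21, 22,  33, 27, 29)
nh <- rep(c(5, 3),   c(5, 3))    # sample size of each observation's stratum
Nh <- rep(c(15, 12), c(5, 3))    # population size of each observation's stratum

# weighted mean: sum(w * y) / sum(w)
wmean <- function(w) sum(w * y) / sum(w)

means <- c(des1 = wmean(Nh),           # weight = Nh
           des2 = wmean(nh / Nh),      # weight = nh/Nh (the sampling fraction)
           des3 = wmean(Nh / nh),      # weight = Nh/nh (inverse probability)
           des4 = wmean(rep(1, 8)))    # no weights: plain sample mean
print(round(means, 3))
```

With w = Nh/nh the weighted mean is the same as the textbook stratified estimator (15*23.6 + 12*29.667)/27.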
[R] firefox and R 201
Please help. Why is it that I cannot open html help pages from the R menu? Here is what I do:

Using browser firefox1.0 (open source!), java plugin jre 150 installed, supposedly working properly.
Opening R201patched, html help, link: search engine and keywords: works properly, jre symbol appears.
Clicking on any link (keywords on that page): no reaction whatsoever.

What am I doing wrong? Closing and reopening Firefox won't help, since the browser then asks me to create a new profile. Maybe Firefox and R (interacting with Java) are conflicting? I couldn't find any help entry, so I am sorry if this problem was addressed earlier.

R forever!

Many greetings
mark hempelmann
Universität Bielefeld