[R] return() in nested functions

2007-07-14 Thread Mark Hempelmann
Dear WizaRds,

After consulting different sources I am still unable to understand the 
correct use of return() in nested functions. To illustrate the problem:

f <- function(x,y,type){




if (type=="est1") est1(x,y)
if (type=="est2") est2(x,y)
}

test<-f(1,2,type="est1") # gives NULL for test

However, without the second 'if' condition, it works properly:
Warning message:
multi-argument returns are deprecated in: return(x, y, z)
> test
[1] 1
[1] 2
[1] 3

Basically, the function I am working on has the above structure, albeit
more complex. I would like f to return the results of function 'out' to 
the user in the assigned variable, e.g. 'test'. I did consult try() and 
tryCatch(), but they do not seem to be what I am looking for.
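
A possible explanation, sketched under assumptions: an R function returns the value of the last expression it evaluates. In f that is the second 'if'. When its condition is FALSE and there is no else branch, it evaluates to an invisible NULL, which replaces the value produced by the first 'if'. The helpers est1(), est2() and out() below are hypothetical stand-ins, because their definitions are not shown in the post. Several values are packed into one list, since return() accepts a single object.

```r
# Hypothetical helpers; the original definitions are not shown in the post.
out  <- function(x, y, z) list(x = x, y = y, z = z)  # one object instead of return(x, y, z)
est1 <- function(x, y) out(x, y, x + y)
est2 <- function(x, y) out(x, y, x - y)

f <- function(x, y, type) {
  # if / else if / else is ONE expression, so its value is the value of f
  if (type == "est1") {
    est1(x, y)
  } else if (type == "est2") {
    est2(x, y)
  } else {
    stop("unknown type: ", type)
  }
}

test <- f(1, 2, type = "est1")
str(test)
```

An alternative with the same effect would be switch(type, est1 = est1(x, y), est2 = est2(x, y)), or an explicit return() inside each branch.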

Thank you for your help and understanding

R-help@stat.math.ethz.ch mailing list
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Conditional Selection of Columns for Tables

2007-01-27 Thread Mark Hempelmann
Dear Wizards -

Thank you so much for your help. That was exactly what I was looking 
for. Now, I have been working on conditional selection of columns in a 
data frame. My goal is to calculate the total revenue per sales 
representative per status in a table. I have come to a complete stop:

Let's say, we have a data.frame called df with several columns and a
number of rows:

df <- data.frame( nr=101:110, letter=LETTERS[1:10],
name=c(rep("eenie",3), rep("meenie",2), rep("miney",4),
   "moe"), revenue=round(runif(10, min=100, max=1000),0),
status=round(runif(10,min=1, max=3),0) )

    nr letter   name revenue status
1  101      A  eenie     764      2
2  102      B  eenie     918      2
3  103      C  eenie     936      3
4  104      D meenie     770      2
5  105      E meenie     280      1
6  106      F  miney     172      2
7  107      G  miney     439      2
8  108      H  miney     607      1
9  109      I  miney     553      1
10 110      J    moe     322      2

where status means: 3=no deal, 2=pending, 1=good job.
Now we want the total revenue per sales representative per status in a
table.

sum( subset(df, name=="eenie", select=revenue) )

gives the total of eenie without status, but I would like to have something like:

name   revenue
eenie  1000
meenie 2000...

name   revenue
eenie  100
meenie 200...

Are these flat contingency tables? How can I get the results without
much hassle in one list/table? I did read ?ftable, and what I was 
able to derive so far is:

flat.df <- ftable(df[c("name", "revenue", "status")])

but I am unable to further agglomerate the data. hmpf.
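
One way to get revenue totals per representative and status, as a minimal sketch: xtabs() sums a numeric column over the cells of a cross-classification, and aggregate() returns the same sums in long format. The data frame below copies the printout above instead of calling runif(), so the result is reproducible.

```r
# Data copied from the printout above (fixed values instead of runif())
df <- data.frame(
  nr      = 101:110,
  letter  = LETTERS[1:10],
  name    = c(rep("eenie", 3), rep("meenie", 2), rep("miney", 4), "moe"),
  revenue = c(764, 918, 936, 770, 280, 172, 439, 607, 553, 322),
  status  = c(2, 2, 3, 2, 1, 2, 2, 1, 1, 2)
)

# Cross-table: rows = representative, columns = status, cells = summed revenue
rev_tab <- xtabs(revenue ~ name + status, data = df)
print(rev_tab)

# Long format: one row per observed name/status combination
rev_long <- aggregate(revenue ~ name + status, data = df, FUN = sum)
print(rev_long)
```

ftable() counts combinations of factor levels, which is why it does not add up the revenue column by itself.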
Good God, what would I do without my R-help forum?

Thank you again
Cheers and a relaxing weekend


[R] select subsets in data frame

2007-01-10 Thread Mark Hempelmann
Dear WizaRds!

A trivial question indeed on selecting subsets in data frames. I am 
sorry. Unfortunately, I did not find any helpful information in the 
introduction, although I searched the help archive and read introductory 
books. Please help:

I want to select column "KB" which is read via read.csv2 as a data.frame 
into d. I checked that it is indeed a data.frame object and included the 
correct header information in line 1. For example purposes, look at this 
small object:
<<*>>= (4)
d <- data.frame(A=1:3, Date=c("01.01.07","02.01.07","03.01.07"),
KB=c("Eenie", "Meenie", "Miney") )

d["KB"=="Eenie",] # gives
[1] A    Date KB
<0 rows> (or 0-length row.names)
If I follow Venables/ Ripley in Modern Applied Statistics with S, it 
should look like this:

<<*>>= (5)
gives the correct subset. But
d[KB=="Eenie",] # gives

Error in `[.data.frame`(d, KB == "Eenie", ) :
 object "KB" not found

I need every KB named Eenie. What did I do wrong? The alternative I 
found seems to be quite complicated:

<<*>>= (6)
d[which( d[,"KB"]=="Eenie" ), ]
  A     Date    KB
1 1 01.01.07 Eenie
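
A short sketch of why the first two attempts fail and what tends to work instead. "KB"=="Eenie" compares two character strings and is simply FALSE, so no row is selected. A bare KB is looked up as a variable in the workspace, not as a column of d, unless the data frame is attached or the expression is evaluated inside it.

```r
d <- data.frame(A = 1:3,
                Date = c("01.01.07", "02.01.07", "03.01.07"),
                KB = c("Eenie", "Meenie", "Miney"))

"KB" == "Eenie"              # FALSE: two strings are compared, no column is involved

d[d$KB == "Eenie", ]         # logical row index built from the column itself
subset(d, KB == "Eenie")     # subset() evaluates KB inside d
with(d, d[KB == "Eenie", ])  # with() does the same for an arbitrary expression
```

All three selections return the same single row; which() is not needed unless the column contains NA values.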

Thank you so much for your help.


"I believe I found the missing link between animal and civilized man. 
It's us." -- Konrad Lorenz


[R] Two Phase Sampling

2006-07-11 Thread Mark Hempelmann
Dear WizaRds,

I tried to construct a two-phase sampling design in Survey just the way 
I hoped I had understood it in Vienna - I was wrong. I think I am too 
stupid to create the correct subset for phase 2. Phase 1: sample 1000 
parts, 80 of them defective. Phase 2: sample 100 parts out of these 1000, 
15 of them defective. 0: ok, 1: defunct. The table below gives the 
conditional sampling values.

Please help me:

ss1 <- data.frame(id=1:1000, ph1.x=rep(c(1,0),c(10,990)),
subset=rep(c(1,0),c(100,900)), ph2.y=rep(c(1,0,NA),c(15,85,900)),
n1=rep(1000,1000), n2=rep(100,1000) )
table(ss1$ph2.y, ss1$ph1.x)

 >Phase2.y  0  1
 >       0 85  0
 >       1  5 10

p2 <- twophase(id=list(~id,~id), strata=list(NULL,NULL),
data=ss1, subset=~subset, fpc=list(~n1,~n2))
svymean (~ph2.y, design=p2)

 >      mean SE
 >ph2.y 0.15  0

However, taking into consideration the 2nd sample, the estimator should be:

ph1.x.bar (phase1)=80/1000=0.08 and ph2.y.bar (phase2)=15/100=0.15 
defect boards, that means y.est=1.5*0.08=0.12 defect boards, since the 
RATIO ESTIMATOR equals 15/10=1.5 defect parts for the ratio of defect 
ph2/defect ph1.

What again did I do wrong? I am positive that the estimator is 12 
defective parts per 100 average, so how do I correctly construct the 
twophase design?

PS: I hope this is not something undergraduates master eloquently...

Thank you so much for your help. I invite you to all the BBQ and beer 
there is in Europe!

Yours always


[R] Multistage Sampling

2006-07-07 Thread Mark Hempelmann
Dear WizaRds, dear Thomas,

First of all, I want to tell you how grateful I am for all your 
support. I wish I will be able to help others along one day the same way 
you do. Thank you so much. I am struggling with a multistage sampling 
design:

multi3  <- data.frame(cluster=c(1,1,1,1 ,2,2,2, 3,3), id=c(1,2,3,4, 
1,2,3, 1,2),
nl=c(4,4,4,4, 3,3,3, 2,2), Nl=c(100,100,100,100, 50,50,50, 75,75), 
M=rep(23,9), y=c(23,33,77,25, 35,74,27, 37,72) )

dmulti3 <- svydesign(id=~cluster+id, fpc=~M+Nl, data=multi3)
svymean (~y, dmulti3)
    mean     SE
y 45.796 5.5483

svytotal(~y, dmulti3)
  total    SE
y 78999 13643

and I estimate the population size as N = M/m * sum(Nl) = 
23/3*(100+50+75) = 1725. With this, my variance estimator is:
y1<-mean(multi3$y[1:4]) # 39.5
y2<-mean(multi3$y[5:7]) # 45.33
y3<-mean(multi3$y[8:9]) # 54.5

yT1<-100*y1 # 3950 total cluster 1
yT2<-50*y2 # 2266.67 total cluster 2
yT3<-75*y3 # 4087.5 total cluster 3
ybarT<-1/3*sum(yT1,yT2,yT3) # 3434.722
s1 <- var(multi3$y[1:4]) # 643.67 var cluster 1
s2 <- var(multi3$y[5:7]) # 632.33 var cluster 2
s3 <- var(multi3$y[8:9]) # 612.5 var cluster 3

var.yT <- 23^2*( 20/23*1/6*sum( 
(yT1-ybarT)^2,(yT2-ybarT)^2,(yT3-ybarT)^2 ) +
1/69 * sum(100*96*s1, 50*47*s2, 75*73*s3) ) # 242 101 517

var.yT/1725^2 = 81.36157
SE = 9.02006,
but it should be SE=13643/1725=7.90899
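
As a cross-check of the point estimates only, here is a minimal base-R sketch. It assumes M = 23 clusters in the population, as used in the text, and reproduces the quoted total of about 78999 and mean of about 45.796 by hand. The mean comes out as a ratio of two estimated quantities (total and population size), which may be one reason why dividing the SE of the total by 1725 does not give the SE reported for the mean.

```r
multi3 <- data.frame(cluster = c(1,1,1,1, 2,2,2, 3,3),
                     Nl = c(100,100,100,100, 50,50,50, 75,75),
                     y  = c(23,33,77,25, 35,74,27, 37,72))
M <- 23                               # clusters in the population (from the text)
m <- length(unique(multi3$cluster))   # clusters in the sample

ybar_l <- tapply(multi3$y,  multi3$cluster, mean)    # cluster means
N_l    <- tapply(multi3$Nl, multi3$cluster, unique)  # cluster sizes

total_hat <- M / m * sum(N_l * ybar_l)  # estimated population total
N_hat     <- M / m * sum(N_l)           # estimated population size
mean_hat  <- total_hat / N_hat          # ratio of the two estimates
c(total = total_hat, N = N_hat, mean = mean_hat)
```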

Is this calculation correct? I remember svytotal using a different 
variance estimator compared to svymean, and that svytotal gives the 
unbiased estimation. To solve the problem, I went ahead and tried to 
calibrate the design object, telling Survey the population total N=1725:

dmulti3.cal <- calibrate(dmulti3, ~1, pop=1725)
svymean (~y, dmulti3.cal)
mean SE
y 45.796 5.5483

svytotal(~y, dmulti3.cal)
  total SE
y 78999 9570.7

which indeed gives me the computed svymean SE, but alas, I still don't 
know why my variance is so different. I think it might have something to 
do with a differently computed N and the fact that your estimator formula 
is a different one. Since I calculated the Taylor series solution, I 
suppose there must be another approach? The calibration help page tells 
me to enter a list of population total vectors for each cluster, which 
would result in:

dmulti3.cal <- calibrate(dmulti3, ~1, pop=c(100,50,75))
Error in regcalibrate.survey.design2(design, formula, population, 
aggregate.stage = aggregate.stage,  :
Population and sample totals are not the same length.

I am very grateful for your help and wish you all the best


[R] persp/ scatterplot3d

2006-06-28 Thread Mark Hempelmann
Dear WizaRds,

I would like to create a 3d-plot with persp(). I sampled 17 points 
with xyz-coordinates (real-life example!), representing the peaks of the 
whole plane with "zero coordinates" x=3,y=3,z=3. My intention is to show 
which entries are above or below the "zero" level with persp() on a 
nicely created grid. I also tried scatterplot3d(), but, alas, I am 
unable to tell the function that my points represent the peaks of the 
plane and are either above or below "normal" (whatever that means...) 
Please help me:


x=dat$x; y=dat$y; z=dat$z
persp(x,y,z) #   doesn't work at all, of course, even if I utilize outer()

scatterplot3d(x,y,z) #   returns a 3d scatterplot, but not the way I would like to see it
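
persp() expects z as a matrix of heights over a regular x/y grid, so scattered points have to be turned into a surface first. The sketch below is one possible approach, not the only one: it smooths the points with loess() from base R, predicts on a grid, and colours each facet by whether it lies above or below the reference level 3. The data frame 'dat' is a made-up stand-in for the 17 measured points, which are not shown in the post, and a smoothed surface will not pass exactly through the measured peaks.

```r
set.seed(1)
# Hypothetical stand-in for the 17 measured points (the real 'dat' is not shown)
dat <- data.frame(x = runif(17, 0, 6), y = runif(17, 0, 6))
dat$z <- 3 + sin(dat$x) * cos(dat$y)

# Smooth the scattered points with a local regression surface
fit <- loess(z ~ x + y, data = dat, span = 1, degree = 2)

# Regular grid inside the observed range, as persp() requires
xg <- seq(min(dat$x), max(dat$x), length.out = 25)
yg <- seq(min(dat$y), max(dat$y), length.out = 25)
grid <- expand.grid(x = xg, y = yg)   # x varies fastest
zg <- matrix(as.numeric(predict(fit, newdata = grid)), nrow = length(xg))

# Colour each facet by whether its mean height is above the reference level 3
nx <- nrow(zg); ny <- ncol(zg)
zfacet <- (zg[-1, -1] + zg[-1, -ny] + zg[-nx, -1] + zg[-nx, -ny]) / 4
cols <- ifelse(zfacet > 3, "tomato", "steelblue")

persp(xg, yg, zg, col = cols, theta = 30, phi = 25,
      xlab = "x", ylab = "y", zlab = "z", ticktype = "detailed")
```

Contributed interpolation packages could be used in place of loess() if the surface has to honour the measured points exactly.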

Thank you so much for your help and support!



[R] Post Stratification

2006-06-18 Thread Mark Hempelmann
Dear WizaRds,

having met some of you in person in Vienna,  I think even more fondly 
of this community and hope to continue on this route. It was great 
talking with you and learning from you. Thank you. I am trying to work 
through an artificial example in post stratification. This is my dataset:

age <- data.frame(id=1:8, stratum=rep( c("S1","S2"),c(5,3)), 
weight=rep(c(3,4),c(5,3)), nh=rep(c(5,3),c(5,3)), 
Nh=rep(c(15,12),c(5,3)), y=c(23,25,27,21,22, 77,72,74) )

pop.types <- table(stratum=age$stratum)
age.post <- svydesign(ids=~1, strata=NULL, data=age, fpc=~Nh) ## no clusters, no strata

post <- postStratify(design=age.post, strata=~stratum, population=pop.types)

svymean  (~y, post)
svytotal (~y, post)

    mean     SE
y 42.625 0.5467
  total     SE
y   341 4.3737

So, is it correct to define pop.types as the number of elements sampled 
per stratum (nh), or rather as the total number of elements per stratum 
(Nh)? If the latter:

pop.types <- data.frame(stratum = c("S1","S2"), Freq = c(15, 12))

The help says: The 'population' totals can be specified as a table with 
the strata variables in the margins, or as a data frame where one 
column lists frequencies and the other columns list the unique 
combinations of strata variables. ??

However, I compute:
Nh=c(15,12); nh=c(5,3); sh=by(age$y, age$stratum, var); N=sum(Nh)
# Mean estimator
y.bar=by(age$y, age$stratum, mean) ## 23.6; 74.33
estimator=1/N*sum(Nh*y.bar) ## 46.14815
# Variance estimator
sqrt(vari) ##   .7425903

and with Taylor expansion .7750118
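
The hand calculation can be written out in base R as below. The variance line is an assumption: the usual stratified formula with finite population correction, chosen because it reproduces the quoted .7425903. Note also that the reported mean 42.625 equals the plain sample mean 341/8, which is what one gets when the sample counts (5, 3) rather than the population sizes (15, 12) serve as post-stratum totals.

```r
age <- data.frame(stratum = rep(c("S1", "S2"), c(5, 3)),
                  y = c(23, 25, 27, 21, 22, 77, 72, 74))
Nh <- c(S1 = 15, S2 = 12)                     # population stratum sizes
nh <- tapply(age$y, age$stratum, length)      # sample stratum sizes
N  <- sum(Nh)

y.bar <- tapply(age$y, age$stratum, mean)     # stratum means
sh    <- tapply(age$y, age$stratum, var)      # within-stratum variances

estimator <- sum(Nh * y.bar) / N
# Assumed formula: stratified variance with finite population correction
vari <- sum((Nh / N)^2 * (1 - nh / Nh) * sh / nh)
c(mean = estimator, SE = sqrt(vari))

# Using the sample counts as "population" gives back the unweighted mean
sum(nh * y.bar) / sum(nh)
```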

Please help me correct my mistakes. Thank you so much.


[R] Survey - twophase

2006-06-05 Thread Mark Hempelmann
Dear WizaRds,

I am struggling with the use of twophase in package survey. My goal 
is to compute a simple example in two phase sampling:

phase 1: I sample n1=1000 circuit boards and find 80 non functional
phase 2: Given the n1=1000 sample I sample n2=100 and find 15 non 
functional. Let's say, phase 2 shows this result together with phase 1:
                  phase1 ok  phase1 defunct
phase2 ok                85               0
phase2 defunct            5              10

That is in R:
fail <- data.frame(id=1:1000 , x=c(rep(0,920), rep(1,80)), 
y=c(rep(0,985), rep(1,15)), n1=rep(1000,1000), n2=rep(100,1000) )

des.fail<- twophase(id=list(~id,~id), data=fail, subset=~I(x==1)) 
svymean(~y, des.fail)

gives mean y 0.1875, SE 0.0196, but theoretically,
we have x.bar1 (phase1)=0.08 and y.bar2 (phase2)=0.15 defect boards.

Two-phase sampling assumes some relation between the easily and quickly 
obtained x-information and the elaborate, time-consuming y-information, 
say a ratio r=sum y (phase2)/ sum x (phase2)=15/10=1.5 (out of the above 
table). Ergo, the y.ratio estimator = r*x.bar(phase1) = 1.5*0.08 = 0.12 with 
variance = (n1-n2)/n1 * s_regression^2/n2 + s_y^2/n1 = 900/1000 * 
0.0765/100 + 0.129/1000 = .00081 SE .02846
with s_regression^2 =
yk=c(rep(0,85), rep(1,15)); xk=c(rep(0,90), rep(1,10))
s_yk^2 =
1/99 * sum( (yk-.15)^2)=0.1287879
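
The arithmetic above can be checked with a few lines of base R. The formula for s_regression^2 is not shown in the post; the sketch assumes it is the residual variance around the ratio line, sum((yk - r*xk)^2)/(n2 - 2), because that reproduces the quoted value 0.0765.

```r
n1 <- 1000; n2 <- 100
x.bar1 <- 80 / n1                            # phase-1 share of defective boards

# Phase-2 subsample as coded in the post
yk <- c(rep(0, 85), rep(1, 15))
xk <- c(rep(0, 90), rep(1, 10))

r       <- sum(yk) / sum(xk)                 # 15/10
y.ratio <- r * x.bar1                        # ratio estimator

s2.y   <- sum((yk - mean(yk))^2) / (n2 - 1)  # variance of y in phase 2
# Assumption: residual variance around the ratio line, n2 - 2 in the denominator
s2.reg <- sum((yk - r * xk)^2) / (n2 - 2)

v <- (n1 - n2) / n1 * s2.reg / n2 + s2.y / n1
c(estimate = y.ratio, variance = v, SE = sqrt(v))
```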

I am sorry to bother you with my false calculations, but I just don't 
know how to obtain the correct results. Please help. My example is 
taken from Kauermann/ Kuechenhoff 2006, p. 111f.

thank you so much
yours always



[R] pairwise combinations of variables

2006-03-25 Thread Mark Hempelmann
Dear WizaRds,

Although this might be a trivial question to the community, I was 
unable to find anything solving my problem in the help files on CRAN. 
Please help.

Suppose I have 4 variables and want to use all possible combinations
for a further kmeans partitioning.

I tried permutations() of package e1071, but this is not what I need. Thank you 
for your help and support.
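
For unordered combinations, base R offers combn(). A minimal sketch, assuming the four variables are named v1 to v4 as in the code further down:

```r
vars  <- paste("v", 1:4, sep = "")
pairs <- combn(vars, 2)        # 2 x 6 character matrix, one pair per column
pairs

# All non-empty subsets of any size, as a list of character vectors
all_sets <- unlist(lapply(seq_along(vars),
                          function(k) combn(vars, k, simplify = FALSE)),
                   recursive = FALSE)
length(all_sets)

# Hypothetical use: walk over the pairs, e.g. to select columns for kmeans()
for (k in seq_len(ncol(pairs))) {
  cat("pair", k, ":", pairs[, k], "\n")
}
```

permutations() treats (v1, v3) and (v3, v1) as different, which is why it returns more than is needed here.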


Additionally: For anybody who is willing to offer some advice, here is my 
complete approach:

mat <- matrix( c(6,7,8,2,3,4,12,14,14, 14,15,13,3,1,2,3,4,2, 
15,3,10,5,11,7,13,6,1, 15,4,10,6,12,8,12,7,1), ncol=9, byrow=T )
rownames(mat) <- paste("v", 1:4, sep="" )

tmat <- t(mat)
cluster <- c(1, 2, 1, 3, 3, 3, 1, 2, 2)
centroids   <- matrix( 0, ncol=3, nrow=4 )
obj <- vector(mode="list", length=3)

for (j in 1:4){
for (i in 1:3){
where <- which(cluster==i) # which obj is in which class?
centroids[j,i] <- mean( tmat[ where,j ] )
obj[[i]] <- tmat[where,]
}
}
colnames(centroids) <- paste( c("Cluster"), 1:3)
rownames(centroids) <- rownames(mat)


##  now I want to do kmeans of all possible variable pairs, e.g. v1 and v3
##  automation in a second step later
wjk <- kmeans(tmat[,c(1,3)], centers=centroids[c(1,3),], iter.max=10, 
algorithm="MacQueen")   ## obviously wrong
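
A likely cause of the problem, as a sketch: kmeans() expects the centers as a matrix with one row per cluster and one column per variable, whereas 'centroids' holds variables in rows and clusters in columns. Transposing the selected rows gives the required 3 x 2 shape. The sapply() line is only a compact replacement for the double loop.

```r
mat <- matrix(c(6,7,8,2,3,4,12,14,14, 14,15,13,3,1,2,3,4,2,
                15,3,10,5,11,7,13,6,1, 15,4,10,6,12,8,12,7,1),
              ncol = 9, byrow = TRUE)
rownames(mat) <- paste("v", 1:4, sep = "")
tmat    <- t(mat)                       # 9 objects x 4 variables
cluster <- c(1, 2, 1, 3, 3, 3, 1, 2, 2)

# 4 x 3 matrix: variables in rows, clusters in columns (as in the post)
centroids <- sapply(1:3, function(i) colMeans(tmat[cluster == i, , drop = FALSE]))

pair <- c(1, 3)
# kmeans() wants clusters in rows and variables in columns, hence t()
wjk <- kmeans(tmat[, pair], centers = t(centroids[pair, ]),
              iter.max = 10, algorithm = "MacQueen")
wjk$cluster
```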


[R] kmeans Clustering

2006-03-23 Thread Mark Hempelmann
Dear WizaRds,

My goal is to program the VS-KM algorithm by Brusco and Cradit (2001), and I 
have come to a complete stop in my efforts. Maybe somebody is willing to follow my 
thoughts and offer some help.
In a first step, I want to use a single variable for the partitioning. 
As the center matrix I use the objects that belong to the clusters I found with 
the hierarchical Ward algorithm. Then, I have to take all possible variable 
pairs and apply kmeans again, which is quite confusing to me. Here is
what I do:

##  0. data
mat <- matrix( c(6,7,8,2,3,4,12,14,14, 14,15,13,3,1,2,3,4,2,
15,3,10,5,11,7,13,6,1, 15,4,10,6,12,8,12,7,1), ncol=9, byrow=T )
rownames(mat) <- paste("v", 1:4, sep="" )
tmat <- t(mat)

##  1. Provide clusters via Ward:
ward<- hclust(d=dist(tmat), method = "ward", members=NULL)

##  2. Compute cluster centers and create center-matrix for kmeans:
groups  <- cutree(ward, k = 3, h = NULL)

centroids   <- vector(mode="numeric", length=3)
obj <- vector(mode="list", length=3)

for (i in 1:3){
where <- which(groups==i) # which object belongs to which group?
centroids[i] <- mean( tmat[ where, ] )
obj[[i]] <- tmat[where,]
}
P   <- vector(mode="numeric", dim(mat)[2] )
pj  <- vector(mode="list", length=dim(mat)[1])

for (i in 1:dim(mat)[1]){
pj[[i]] <- kmeans( tmat[,i], centers=centroids, iter.max=10, 
algorithm="MacQueen")
P <- rbind(P, pj[[i]]$cluster)
}
P   <- P[-1,]

##  gives a matrix of partitions using each single variable
##  (I'm sure, P can be programmed much easier)

##  3. kmeans using all possible pairs of variables, here just e.g. variables 1 and 3:
wjk <- kmeans(tmat[,c(1,3)], centers=centroids, iter.max=10, 
algorithm="MacQueen")

which, of course, gives an error message since "centroids" is not a 
matrix of the cluster centers. How on earth do I correctly construct a 
matrix of centers corresponding to the pairwise variables? Is it always 
the same matrix no matter which pair of variables I choose?
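
One way to build the center matrix, sketched under the assumption that the Ward groups are meant to supply the starting centers: compute the mean of every variable within every group, which gives a 3 x 4 matrix, and then pick the columns that belong to the variables passed to kmeans(). The matrix of centers is therefore not the same for every pair; its columns have to match the chosen variables. In later R versions the method formerly called "ward" is named "ward.D".

```r
mat <- matrix(c(6,7,8,2,3,4,12,14,14, 14,15,13,3,1,2,3,4,2,
                15,3,10,5,11,7,13,6,1, 15,4,10,6,12,8,12,7,1),
              ncol = 9, byrow = TRUE)
rownames(mat) <- paste("v", 1:4, sep = "")
tmat <- t(mat)                                   # 9 objects x 4 variables

ward   <- hclust(dist(tmat), method = "ward.D")  # called "ward" in older R versions
groups <- cutree(ward, k = 3)

# 3 x 4 matrix of cluster means: one row per Ward cluster, one column per variable
centers <- apply(tmat, 2, function(v) tapply(v, groups, mean))

# One variable: a one-column matrix of three starting centers
km1 <- kmeans(tmat[, 1, drop = FALSE], centers = centers[, 1, drop = FALSE])

# A pair of variables: pick the matching columns of the center matrix
pair <- c(1, 3)
km13 <- kmeans(tmat[, pair], centers = centers[, pair], iter.max = 10)
km13$cluster
```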
I apologize for my lack of clustering knowledge and expertise - any 
help is welcome. Thank you very much.

Many greetings


[R] Clustering and Rand Index - VS-KM

2006-01-08 Thread Mark Hempelmann
Dear WizaRds,

I have been trying to compute the adjusted Rand index as by Hubert/ 
Arabie, and could not correctly approach how to define a partition 
object as in my last request yesterday.

With package fpc I try to work around the problem, using my original data:

mat <- matrix( c(6,7,8,2,3,4,12,14,14, 14,15,13,3,1,2,3,4,2, 
15,3,10,5,11,7,13,6,1, 15,4,10,6,12,8,12,7,1), ncol=9, byrow=T )
rownames(mat) <- paste("v", 1:4, sep="" )

## and the given partitions:


## Now

cluster.stats(d=dist(mat), clustering=p1, alt.clustering=p2)

## just gives
Error in as.dist(dmat[clustering == i, clustering == i]) :
(subscript) logical subscript too long

I think I don't understand the use of 'd' here. How can I calculate the 
corrected Rand matrix:
( .000  .407 -.071 -.071)
( .407  .000 -.071 -.071)
(-.071 -.071  .000 1.000)
(-.071 -.071 1.000  .000)

Does the clue package help me here? Does anyone know if there is a VS-KM 
algorithm (Variable Selection Heuristic for K-Means Clustering) 
implemented in R? Unfortunately, I did not find any search results.
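
Two hedged remarks. First, the error may come from 'd': dist(mat) measures distances between the 4 rows (variables), while a partition of the 9 objects has 9 entries, so dist(t(mat)) is probably what is needed there. Second, the adjusted Rand index only depends on the two label vectors, so it can be computed without any distance matrix. The function below is a base-R sketch of the Hubert/Arabie formula; adj_rand() and the four partitions are made up for illustration, since the original p1..p4 are not shown.

```r
# Adjusted Rand index (Hubert/Arabie) for two label vectors of equal length
adj_rand <- function(a, b) {
  tab <- table(a, b)                        # contingency table of the two partitions
  n   <- sum(tab)
  sum_ij <- sum(choose(tab, 2))             # pairs placed together in both partitions
  sum_a  <- sum(choose(rowSums(tab), 2))    # pairs placed together in a
  sum_b  <- sum(choose(colSums(tab), 2))    # pairs placed together in b
  expected  <- sum_a * sum_b / choose(n, 2)
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

# Hypothetical partitions of 9 objects into 3 clusters
parts <- list(p1 = c(1,1,1,2,2,2,3,3,3),
              p2 = c(1,1,2,2,2,3,3,3,1),
              p3 = c(1,2,3,1,2,3,1,2,3),
              p4 = c(3,1,2,3,1,2,3,1,2))   # p3 with relabelled clusters

# Pairwise matrix of adjusted Rand indices
ari <- sapply(parts, function(a) sapply(parts, function(b) adj_rand(a, b)))
round(ari, 3)
```

The diagonal of this matrix is 1 by construction; the matrix quoted above shows .000 there, presumably by convention.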

Thank you for your help and support


[R] Clustering and Rand Index

2006-01-07 Thread Mark Hempelmann
Dear WizaRds,

I am trying to compute the (adjusted) Rand Index in order to comprehend 
the variable selection heuristic (VS-KM) according to Brusco/ Cradit 
2001 (Psychometrika 66 No.2 p.249-270, 2001).

Unfortunately, I am unable to correctly use
cl_ensemble and cl_agreement (package: clue). Here is what I am trying 
to do:


##  Let p1..p4 be four partitions of the kind


Each object within the partitions is assigned to cluster 1,2,3 
respectively. Now I have to create a cl_ensemble object, so that I can 
calculate the Rand index:

ens <- cl_ensemble(list=c(p1,p2,p3,p4))

which only leads to
"Ensemble elements must be all partitions or all hierarchies."

Although I understand that p1..p4 are vectors in this example, they 
represent the partitions I want to use. I don't know how to create the 
necessary partition object in order to transform it into an ensemble 
object, so that I can run cl_agreement - so much transformation, so 
little time...
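
A hedged sketch of how the ensemble might be built. c(p1,p2,p3,p4) glues the four vectors into one vector of length 36, whereas the list argument expects a list with one element per partition. The calls as.cl_partition(), cl_ensemble(list = ) and cl_agreement(method = "cRand") follow my reading of the clue documentation and should be checked against the installed version; the partitions are hypothetical because the originals are not shown.

```r
# Hypothetical partitions of 9 objects (the original p1..p4 are not shown)
raw <- list(p1 = c(1,1,1,2,2,2,3,3,3),
            p2 = c(1,1,2,2,2,3,3,3,1),
            p3 = c(1,2,3,1,2,3,1,2,3),
            p4 = c(3,1,2,3,1,2,3,1,2))

# list() keeps the four partitions apart; c() would merge them into one vector
if (requireNamespace("clue", quietly = TRUE)) {
  parts <- lapply(raw, clue::as.cl_partition)       # class ids -> partition objects
  ens   <- clue::cl_ensemble(list = parts)
  print(clue::cl_agreement(ens, method = "cRand"))  # corrected Rand index
} else {
  message("Package 'clue' is not installed; skipping the example.")
}
```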

I have also tried to work around this problem by creating partitions via 
k-means, but I do not get the same partitions I need to validate. I am 
sure the following algorithm needs improvement, especially the use of 
putting matrices into a list through a for loop (ouch) - I am very 
grateful for your comments on improving this terrible piece of R-work 
(is it easier to do something with apply?).

Thank you very much for your help and support

mat <- matrix( c(6,7,8,2,3,4,12,14,14, 14,15,13,3,1,2,3,4,2, 
15,3,10,5,11,7,13,6,1, 15,4,10,6,12,8,12,7,1), ncol=9, byrow=T )
rownames(mat) <- paste("v", 1:4, sep="" )

clus.mat <- vector(mode="list", length=4)
for (i in 1:4){
clus.mat[[i]] <- kmeans(mat[i,], centers=3, nstart=1, 
algorithm="MacQueen") ## run kmeans on each row (clustering per single variable)
}
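
The loop can be written more compactly with lapply(), and the cluster vectors can be collected into one matrix with sapply(). This is only a sketch; set.seed() is added because kmeans() draws its starting centers at random when centers is a number.

```r
mat <- matrix(c(6,7,8,2,3,4,12,14,14, 14,15,13,3,1,2,3,4,2,
                15,3,10,5,11,7,13,6,1, 15,4,10,6,12,8,12,7,1),
              ncol = 9, byrow = TRUE)
rownames(mat) <- paste("v", 1:4, sep = "")

set.seed(42)   # random starting centers, fixed here for reproducibility
clus.mat <- lapply(seq_len(nrow(mat)), function(i)
  kmeans(mat[i, ], centers = 3, nstart = 1, algorithm = "MacQueen"))
names(clus.mat) <- rownames(mat)

# 4 x 9 matrix: one row per variable, one column per object
P <- t(sapply(clus.mat, function(km) km$cluster))
P
```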



[R] nls() fit to Kahneman/ Tversky function

2005-10-31 Thread Mark Hempelmann
Dear WizaRds,

 I would like to fit a curve to ten points with nls() for one 
unknown parameter gamma in the Kahneman/ Tversky function, but somehow 
it won't work and I am unable to locate my mistake.

p.kum <- seq(0.1,1, by=0.1)
felt.prob.kum <- c(0.16, 0.23, 0.36, 0.49, 0.61, 0.71, 0.85, 0.89, 0.95, 
1) ## how to find a function that fits these points nicely?
plot(p.kum, felt.prob.kum) ## looks a little like an "S"

gamma <- rep(0.5, 10)
nls.dataframe <- data.frame(p.kum,felt.prob.kum, gamma)

nls.kurve <- nls( formula = felt.prob.kum ~ 
p.kum^gamma/(p.kum^gamma+(1-p.kum)^gamma)^(1/gamma), data=nls.dataframe, 
start=c(gamma=gamma), algorithm="plinear" )


gives: Error in La.chol2inv(x, size) : 'size' cannot exceed nrow(x) = 10

 If I go with the Gauss-Newton algorithm I get a singular gradient 
matrix error, so I tried the Golub-Pereyra algorithm for partially 
linear least-squares.

 It also seems the nls model tries to find ten different gammas, but 
I want only one single gamma parameter for the function. I appreciate 
your help and support. Thank you.
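
A sketch of what probably goes wrong: start=c(gamma=gamma) passes a vector of ten values, so nls() sets up ten parameters for ten observations, and putting a gamma column into the data frame hides the parameter behind data. With a single scalar starting value and the default algorithm the model has one parameter. The "plinear" algorithm is meant for models with conditionally linear parameters, which this one does not have.

```r
p.kum <- seq(0.1, 1, by = 0.1)
felt.prob.kum <- c(0.16, 0.23, 0.36, 0.49, 0.61, 0.71, 0.85, 0.89, 0.95, 1)
nls.dataframe <- data.frame(p.kum, felt.prob.kum)   # gamma is NOT a data column

nls.kurve <- nls(felt.prob.kum ~ p.kum^gamma /
                   (p.kum^gamma + (1 - p.kum)^gamma)^(1 / gamma),
                 data  = nls.dataframe,
                 start = list(gamma = 0.5))         # one scalar starting value
coef(nls.kurve)

# Overlay the fitted curve on the observed points
plot(p.kum, felt.prob.kum)
lines(p.kum, fitted(nls.kurve))
```

Whether this one-parameter form describes the ten points well is a separate question; the plot gives a first impression.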

sol lucet omnibus
Mark Hempelmann


[R] regression with restrictions - optimization problem

2005-09-09 Thread Mark Hempelmann
Dear WizaRds!

I am sorry to ask for some help, but I have come to a complete stop in 
my efforts. I hope, though, that some of you might find the problem 
quite interesting to look at.

I have been trying to estimate parameters for lotteries, the so called 
utility of chance, i.e. the "felt" probability compared to a rational 
given probability. A real brief example: Given is a lottery payoff 
matrix of the type

x1    x2   ... x10  median
1000  5000 ... 5000 3750
0     1000 ... 5000 2250   etc.

The actual data frame consists of 11 columns and 28 rows.

Each entry x1 ... x10 gives the amount of money resp. the utility of 
that amount you receive playing the lottery. The probability for each 
column is 10%. The median represents the empirical answers of players 
where the person is indifferent if they prefer to receive the lottery or 
the sum of money as a sure payoff.

I try to determine the probability people feel instead of the known 10% 
probability of each column payoff entry. But here's the catch:

People also give different utilities to each amount of money, which 
basically gives us some sort of function like this:
u(x1...x10) = u(x1)*pi(p1) + u(x2)*pi(p2) +...+u(x10)*pi(p10)=y
u() - unknown utility function
pi() - unknown probability function
y - empirical answer
p1..p10 - probabilities, here always 0.1

To keep it simple, I set u(0)=0 and u(5000)=5000 and vary u(1000) 
between a start and end point. On each cycle R computes the regression 
coefficients that serve as the pi(p) estimators for every 10% step.
Then I minimize the residual sum of squares which should give the best 
estimators for every 10% step.

How can I possibly calculate a "smooth" pi(p) curve, a curve that should 
look like an "S", plotted against the cumulative 10% probabilities? I 
only have my ten estimators. How can I possibly tell R the necessary 
restrictions of nonnegative estimators and their sum to equal one? Here 
is my quite naive approach:

a70 <- matrix(c(1000,5000,5000,5000,2150, 0,1000,5000,5000,1750, 
0,0,1000,5000,1150, 0,0,0,1000,200, 1000,1000,5000,5000,2050, 
0,1000,1000,5000,1972), ncol=5, byrow=T)
colnames(a70)=c(paste("x", 1:4, sep=""), "med")
a70 <- as.data.frame(a70)

start=800; end=2000
step=10; u1000=start-step

u1000 <- u1000+step # varying the 1000 entry
a70[a70==1000] <- u1000
reg70 <- lm(a70$med ~ -1+x1+x2+x3+x4, data=a70)
res <- sum( (reg70$residuals^2) )

for (i in 1:( (end-start)/step) ){
 a70[a70==u1000]<- u1000+step
 u1000 <- u1000+step
 reg70 <- lm(a70$med ~ -1+x1+x2+x3+x4, data=a70)
 if (res >= sum( (reg70$residuals^2) )) {
 res <- sum( (reg70$residuals^2) )
 print(paste("cycle", i, "u1000=", u1000, "RSS=", res))
 final70 <- a70
 finalreg <- reg70
 }
}

 Maybe a better approach works with optim (stats) or dfp (Bhat), but I 
have no idea how to correctly approach such a restricted optimization 
problem.
 Thank you so much for your help and support.
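
One possible way to impose both restrictions with optim(), as a sketch rather than a recommendation: write the weights as a softmax transform of unrestricted parameters. The transformed weights are positive and sum to one by construction, so an unconstrained optimizer can be used. The example keeps u(1000) fixed at 1000 and uses the six-row excerpt from the post; softmax() and rss() are helper names introduced here.

```r
a70 <- matrix(c(1000,5000,5000,5000,2150, 0,1000,5000,5000,1750,
                0,0,1000,5000,1150, 0,0,0,1000,200, 1000,1000,5000,5000,2050,
                0,1000,1000,5000,1972), ncol = 5, byrow = TRUE)
X <- a70[, 1:4]   # utilities of the payoffs (u(1000) fixed at 1000 here)
y <- a70[, 5]     # empirical answers

# Softmax maps any real vector to positive weights that sum to one
softmax <- function(theta) {
  e <- exp(theta - max(theta))   # subtract the maximum for numerical stability
  e / sum(e)
}
# Residual sum of squares as a function of the unrestricted parameters
rss <- function(theta) sum((y - X %*% softmax(theta))^2)

fit <- optim(par = rep(0, ncol(X)), fn = rss, method = "BFGS")
w   <- softmax(fit$par)          # estimated pi(p) weights
round(w, 3)
fit$value                        # residual sum of squares at the optimum
```

The search over u(1000) could then be wrapped around this fit, comparing fit$value across the candidate values.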

Mark Hempelmann


[R] Survey - Cluster Sampling

2005-06-16 Thread Mark Hempelmann
Dear WizaRds,

I am struggling to compute correctly a cluster sampling design. I want
to do one stage clustering with different parametric changes:

Let M be the total number  of clusters in the population, and m the
number sampled. Let N be the total of elements in the population and n
the number sampled. y are the values sampled. This is my example data:

clus1 <- data.frame(cluster=c(1,1,1,2,2,2,3,3,3), id=rep(1:3,3),
weight=rep(72/9,9), nl=rep(3,9), Nl=rep(3,9), N=rep(72,9), y=c(23,33,77,
25,35,74, 27,37,72) )

1. Let M=m=3 and N=n=9. Then:

dclus1<-svydesign(id=~cluster,  data=clus1)
svymean(~y, dclus1)

y 44.778 0.294, the unweighted mean, assuming equal probability in the
clusters. ok.

2. Let M=23, m=3 and N=72, n=9, then I am unable to use svydesign correctly:

dclus2<-svydesign(id=~cluster,  data=clus1, fpc=~N)
svymean(~y, dclus2)

    mean     SE
y 44.778 0.2878, but it should be 23/72 * 1/3(133+134+136)=42.91, since
I have to include the total number of clusters/total population M/N into
the estimator. How can I include the information of the total number of
clusters M?
3. How do I work with weights correctly? I understand that weights imply
  inverse probability weighting 1/p with p=n/N in simple sampling, in
our case 72/9=8, because I sample 9 units out of a total population of
72. Again, I couldn't tell survey the number of total clusters M. So:

dclus3<-svydesign(id=~cluster,  weights=~weight, data=clus1, fpc=~N)
svymean(~y, dclus3)

 mean SE
y 44.778 0.2878, still exactly the same numbers, although I provided the
weights. What am I doing wrong?
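
The two point estimates can be compared by hand in base R. As far as I understand it, svymean() reports a weighted mean of the form sum(w*y)/sum(w); with equal weights this is the plain sample mean whatever the size of the weights, and a finite population correction changes the standard error but not the mean. The estimator with the factor M/N is a different one, as the sketch shows.

```r
clus1 <- data.frame(cluster = rep(1:3, each = 3),
                    y = c(23, 33, 77, 25, 35, 74, 27, 37, 72))
M <- 23; m <- 3    # clusters in the population / in the sample
N <- 72; n <- 9    # elements in the population / in the sample

y_tot <- tapply(clus1$y, clus1$cluster, sum)   # cluster totals 133, 134, 136

ratio_mean    <- sum(y_tot) / n                # weighted mean with equal weights
unbiased_mean <- (M / N) * sum(y_tot) / m      # estimated total divided by known N
c(ratio = ratio_mean, unbiased = unbiased_mean)

# Equal weights cancel: multiplying all of them by 8 leaves the mean unchanged
w <- rep(72 / 9, n)
sum(w * clus1$y) / sum(w)
```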

I am sorry to bother you. Studying Statistics isn't done in a day,
that's for sure. Thank you so much for your understanding and support.



[R] Survey and Stratification

2005-05-26 Thread Mark Hempelmann

Dear WizaRds,

	Working through sampling theory, I tried to comprehend the concept of 
stratification and apply it with Survey to a small example. My question 
is more of theoretic nature, so I apologize if this does not fully fit 
this board's intention, but I have come to a complete stop in my efforts 
and need an expert to help me along. Please help:

age<-matrix(c(rep(1,5), rep(2,3), 1:8, rep(3,5), rep(4,3), rep(5,5), 
rep(3,3), rep(15,5), rep(12,3), 23,25,27,21,22, 33,27,29), ncol=6, byrow=F)

colnames(age)<-c("stratum", "id", "weight", "nh", "Nh", "y")

## create survey design object
age.des1<-svydesign(ids=~id, strata=~stratum, weight=~Nh, data=age)
svymean(~y, age.des1)
## gives mean 25.568, SE 0.9257

age.des2<-svydesign(ids=~id, strata=~stratum, weight=~I(nh/Nh), data=age)
svymean(~y, age.des2)
## gives mean 25.483, SE 0.9227

age.des3<-svydesign(ids=~id, strata=~stratum, weight=~weight, data=age)
svymean(~y, age.des3)
## gives mean 26.296, SE 0.9862

age.des4<-svydesign(ids=~id, strata=~stratum, data=age)
svymean(~y, age.des4)
## gives mean 25.875, SE 0.9437

age.des3 is the only estimator I am able to compute by hand correctly. 
It is stratified random sampling with inverse probability weighting 
with weight= nh/Nh ## sample size/ stratum size.

Basically, I thought the option weight=~Nh as well as weight=~I(nh/Nh) 
would result in the same number, but it does not. I am reading 
Thompson(02), Cochran(77) and of course Lumley on his Survey package, 
but I can't find my mistake.

I thought the Hansen-Hurwitz estimator per stratum offers the right numbers:
p1=5/15, p2=3/12, so y1.total=1/5*(3*118), y2.total=1/3*(4*89) and the 
stratified estimator with this design should be: 
1/27(y1.total+y2.total), obviously wrong. How on earth do I get the 
numbers Survey is calculating?
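
All four means quoted above can be reproduced as weighted means sum(w*y)/sum(w), each with the weights that were passed to svydesign(). A base-R sketch, where wmean() is a helper introduced for illustration:

```r
y       <- c(23, 25, 27, 21, 22, 33, 27, 29)
stratum <- rep(1:2, c(5, 3))
nh <- c(5, 3)[stratum]      # sample size of each unit's stratum
Nh <- c(15, 12)[stratum]    # population size of each unit's stratum

wmean <- function(w) sum(w * y) / sum(w)

round(c(des1 = wmean(Nh),          # weights 15 and 12
        des2 = wmean(nh / Nh),     # weights 1/3 and 1/4 (sampling fractions)
        des3 = wmean(Nh / nh),     # weights 3 and 4 (inverse sampling fractions)
        des4 = wmean(rep(1, 8))),  # equal weights
      3)
```

Seen this way, weight=~Nh and weight=~I(nh/Nh) are simply two different sets of weights, and only Nh/nh (3 and 4, the 'weight' column) corresponds to inverse probability weighting.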

I am very sorry to bother you with this problem, however, I didn't find 
anybody who was willing to help me.

Thank you so much


[R] firefox and R 201

2004-12-12 Thread Mark Hempelmann
Please help.
Why is it that I cannot open HTML help pages from the R menu? Here is what I do: I use the browser Firefox 1.0 (open 
source!) with the Java plugin JRE 1.5.0 installed, supposedly working properly. I open R 2.0.1 patched, HTML help, link: search engine 
and keywords. This works properly, and the JRE symbol appears. Clicking on any link (keywords on that page) gives no reaction whatsoever. 
What am I doing wrong?

Closing and reopening Firefox won't help, since the browser then asks me to create a new profile. Maybe Firefox and R 
(interacting with Java) are conflicting? I couldn't find any help entry, so I am sorry if this problem was addressed 

R forever!
Best regards
Mark Hempelmann
Universität Bielefeld