[R] confidence intervals for differences in proportions from complex survey design?

2015-05-10 Thread Brown, Tony Nicholas
All:

I need to generate confidence intervals for differences in proportions using 
data from a complex survey design. An example follows where I attempt to 
estimate the difference in depression prevalence by sex.

# Data might look something like this:
Dfr <- data.frame(depression=sample(c("yes","no"), size=30, replace=TRUE),
                  sex=sample(c("M","F"), size=30, replace=TRUE),
                  cluster=rep(1:10, times=3),
                  stratum=rep(1:5, each=2, times=3),
                  pweight=runif(n=30, min=1, max=3))
Dfr
library(survey)
msdesign <- svydesign(id=~cluster, strata=~stratum, weights=~pweight, nest=TRUE,
                      data=Dfr)
# When searching online, one recommendation was to use svyglm() to generate an
# approximation as follows:
confint(with(Dfr, svyglm(I(depression=="yes")~sex,
                         family=gaussian(link="identity"),
                         msdesign)), level=0.95, method="Wald")
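
A minimal sketch of the linear-probability idea above, assuming the survey package is installed. The numeric indicator `dep`, the seed, and the object names `des`, `fit`, and `by_sex` are introduced here for illustration only. With a single binary predictor and an identity link, the `sexM` coefficient is the design-weighted difference in proportions (M minus F), so its interval is a Wald interval for that difference. The comparison with the `svyby()` group proportions is only a sanity check on the point estimate, not a claim that this is the recommended method.

```r
library(survey)

set.seed(1)  # arbitrary seed, only to make the simulated data reproducible
Dfr <- data.frame(
  depression = sample(c("yes", "no"), size = 30, replace = TRUE),
  sex        = sample(c("M", "F"), size = 30, replace = TRUE),
  cluster    = rep(1:10, times = 3),
  stratum    = rep(1:5, each = 2, times = 3),
  pweight    = runif(n = 30, min = 1, max = 3)
)
# 0/1 indicator (illustrative name) so that its mean is a proportion
Dfr$dep <- as.numeric(Dfr$depression == "yes")

des <- svydesign(id = ~cluster, strata = ~stratum, weights = ~pweight,
                 nest = TRUE, data = Dfr)

# Linear probability model: the sexM coefficient is p(M) - p(F)
fit      <- svyglm(dep ~ sex, design = des, family = gaussian())
diff_glm <- coef(fit)["sexM"]
ci_glm   <- confint(fit)["sexM", ]

# Sanity check: difference of the design-weighted group proportions
by_sex  <- svyby(~dep, ~sex, des, svymean)
diff_by <- by_sex$dep[by_sex$sex == "M"] - by_sex$dep[by_sex$sex == "F"]

diff_glm
ci_glm
diff_by
```

The two point estimates should agree; only the svyglm() fit supplies the interval here.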

This question has been asked before on the listserv (circa 2007) and I 
contacted the original poster, who indicated that they never received a reply.

Here is the question as described by the original poster:

I'm trying to get confidence intervals of proportions (sometimes for 
subgroups) estimated from complex survey data. Because a function like 
prop.test() does not exist for the survey package I tried the following:

1) Define a survey object (PSU of clustered sample, population weights);
2) Use svyglm() of the package survey to estimate a binary logistic 
regression (family='binomial'): For the confidence interval of a single 
proportion regress the binary dependent variable on a constant (1), for 
confidence intervals of that variable for subgroups regress this 
variable on the groups (factor) variable;
3) Use predict() to obtain estimated logits and the respective standard 
errors (mod.dat specifying either the constant or the subgroups):

pred=predict(model,mod.dat,type='link',se.fit=T)

and apply the following to obtain the proportion with its confidence 
intervals (for example, for conf.level=.95):

lo.e = pred[1:length(pred)]-qnorm((1+conf.level)/2)*SE(pred)
hi.e = pred[1:length(pred)]+qnorm((1+conf.level)/2)*SE(pred)
prop = 1/(1+exp(-pred[1:length(pred)]))
lo = 1/(1+exp(-lo.e))
hi = 1/(1+exp(-hi.e))

I think that in that way I get CI's based on asymptotic normality - 
either for a single proportion or split up into subgroups.

Question: Is this a correct or a defensible procedure? Or should I use a 
different approach? Note that this approach should also allow one to 
estimate CI's for proportions of subgroups taking into account the 
complex survey design.
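
The back-transformation in step 3 can be sketched in base R with `plogis()`, the inverse logit. The logits and standard errors below are hypothetical numbers standing in for the output of predict(); they are not estimates from any real survey. The survey package also provides svyciprop(), whose method="logit" option builds an interval on the logit scale for a single proportion, which may be a simpler route than doing this by hand.

```r
conf.level <- 0.95

# Hypothetical logits and standard errors for three groups (illustrative only)
logit_hat <- c(all = -1.2, men = -1.5, women = -0.9)
se_logit  <- c(all = 0.20, men = 0.30, women = 0.25)

# Wald interval on the logit scale, then mapped back to the proportion scale
z  <- qnorm((1 + conf.level) / 2)
ci <- data.frame(
  prop = plogis(logit_hat),
  lo   = plogis(logit_hat - z * se_logit),
  hi   = plogis(logit_hat + z * se_logit)
)
ci
```

Because the interval is built on the logit scale, the limits always stay inside (0, 1) and are not symmetric around the estimated proportion.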

Thanks in advance for any help that you can provide.

Tony

--
Tony N. Brown, Ph.D.
Associate Chair and Associate Professor of Sociology
Google Scholar Profile: http://tinyurl.com/lozlht8
LinkedIn Profile: https://www.linkedin.com/pub/tony-nicholas-brown/a6/64/31a

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] graphically representing frequency of words in a speech?

2009-06-10 Thread Brown, Tony Nicholas
Yihui,

This is quite impressive, thanks for helping me think about how to make tag 
clouds in R.

Tony

-Original Message-
From: Yihui Xie [mailto:xieyi...@gmail.com] 
Sent: Wednesday, June 10, 2009 3:15 AM
To: Brown, Tony Nicholas
Cc: r-help@r-project.org
Subject: Re: [R] graphically representing frequency of words in a speech?

Hi,

As Gregor Gorjanc mentioned, it's very inconvenient to let R decide
the font size and placement of words in a plot. There are already
very mature tag cloud applications; one that I'm relatively
familiar with is the WordPress plugin wp-cumulus, which uses a
Flash object to generate the tag cloud and gives it a fantastic 3D
rotation effect. I've spent a couple of hours porting it to R;
see the source code and the result here:

http://yihui.name/en/2009/06/creating-tag-cloud-using-r-and-flash-javascript-swfobject/

HTH.

Regards,
Yihui
--
Yihui Xie xieyi...@gmail.com
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building,
Renmin University of China, Beijing, 100872, China



On Mon, Jun 8, 2009 at 2:41 AM, Brown, Tony
Nicholas <tony.n.br...@vanderbilt.edu> wrote:
> Dear all,
>
> I recently saw a graph on television that displayed selected
> words/phrases in a speech scaled in size according to their frequency.
> So words/phrases that were often used appeared large and words that were
> rarely used appeared small. The closest thing I can find on the web to
> approximate what I saw can be found here:
> http://stateoftheunion.onetwothree.net/ The example at that website is
> more complicated but captures the general idea.
>
> Would someone point me in the right direction in terms of replicating
> such a graph?
>
> Thanks in advance,
>
> Tony
>
> -
>
> Tony N. Brown, Ph.D.
> Editor-Elect, American Sociological Review
> Associate Professor of Sociology and Human and Organizational
> Development (secondary)
> Program Faculty, Effective Health Communication and African American
> Diaspora Studies
> Faculty Head of Hank Ingram House, The Commons
> Vanderbilt University
> (615) 322-7518
> (615) 322-7505 fax






[R] graphically representing frequency of words in a speech?

2009-06-07 Thread Brown, Tony Nicholas
Dear all,

I recently saw a graph on television that displayed selected
words/phrases in a speech scaled in size according to their frequency.
So words/phrases that were often used appeared large and words that were
rarely used appeared small. The closest thing I can find on the web to
approximate what I saw can be found here:
http://stateoftheunion.onetwothree.net/ The example at that website is
more complicated but captures the general idea.

Would someone point me in the right direction in terms of replicating
such a graph?

Thanks in advance,

Tony

 


-

Tony N. Brown, Ph.D.

Editor-Elect, American Sociological Review

Associate Professor of Sociology and Human and Organizational
Development (secondary)

Program Faculty, Effective Health Communication and African American 
Diaspora Studies

Faculty Head of Hank Ingram House, The Commons

Vanderbilt University

(615) 322-7518

(615) 322-7505 fax






Re: [R] graphically representing frequency of words in a speech?

2009-06-07 Thread Brown, Tony Nicholas
Thank you so much, Marc and Gregor. The basic information, suggestions,
and R code that you provided are most helpful.

Tony

-Original Message-
From: Gorjanc Gregor [mailto:gregor.gorj...@bfro.uni-lj.si] 
Sent: Sunday, June 07, 2009 2:17 PM
To: Marc Schwartz; Brown, Tony Nicholas
Cc: rhelp help
Subject: RE: [R] graphically representing frequency of words in a
speech?

 The only thing that I found for R is by Gregor Gorjanc, but the
 information seems to be dated:

http://www.bfro.uni-lj.si/MR/ggorjan/software/R/index.html#tagCloud

Hi,

Yes, I have tried to create a tag cloud plot in R, but I abandoned the
project due to other things. The main obstacle was that in R we need to
take care of the font sizes and placement of words ourselves, while this
is very easy with, say, browsers, which do all the rendering. I tracked
down the last version of the R file, which is pasted below. I must say
that I do not remember the status of the code, so use it as you wish. If
anyone wishes to take this project further, please do so!

gg
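
Ahead of the pasted file, here is a minimal base-R sketch of the frequency-to-font-size mapping described above. It is not Gregor's code: the example sentence, the helper names `scale_size()` and `draw_cloud()`, and the random word placement are assumptions made for illustration, and the placement step makes no attempt to avoid overlapping words, which is exactly the hard part mentioned in the message.

```r
# Hypothetical example text (illustrative only)
speech <- "we will work and we will build and we will hope"

# Split into lowercase words and count them
words <- strsplit(tolower(speech), "[^a-z]+")[[1]]
words <- words[nzchar(words)]
freq  <- table(words)

# Map frequencies linearly onto a font-size range given in points
scale_size <- function(freq, size_range = c(12, 36)) {
  f    <- as.numeric(freq)
  span <- max(f) - min(f)
  if (span == 0) return(rep(min(size_range), length(f)))  # all equally frequent
  (f - min(f)) / span * diff(size_range) + min(size_range)
}

pts <- scale_size(freq)
names(pts) <- names(freq)
pts

# Naive drawing: random positions, so words may overlap
draw_cloud <- function(pts, base_size = 12) {
  plot.new()
  text(x = runif(length(pts)), y = runif(length(pts)),
       labels = names(pts), cex = pts / base_size, col = "navy")
}

if (interactive()) draw_cloud(pts)
```

The most frequent words get the largest size and the rarest get the smallest; anything beyond that (row layout, collision avoidance) needs the kind of bookkeeping the file below starts to do with grid.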

### tagCloud.R
###------------------------------------------------------------------------
### What: Tag cloud plot functions
### Time-stamp: 2006-09-10 02:53:29 ggorjan
###------------------------------------------------------------------------

library(grid) ## gpar(), unit(), viewport(), textGrob(), ... come from grid

tagCloud <- function(x, n=100, decreasing=TRUE,
                     threshold=NULL, fontsize=c(12, 36),
                     align=TRUE, expandRow=TRUE,
                     justRow="bottom", title,
                     textGpar=gpar(col="navy"),
                     rectGpar=gpar(col="white"),
                     titleGpar=gpar(), viewGpar=gpar(),
                     mar=c(1, 1, 1, 1))
{
  UseMethod("tagCloud")
}

tagCloud.default <- function(x, n=100, decreasing=TRUE,
                             threshold=NULL, fontsize=c(12, 36),
                             align=TRUE, expandRow=TRUE,
                             justRow="bottom", title,
                             textGpar=gpar(col="navy"),
                             rectGpar=gpar(col="white"),
                             titleGpar=gpar(), viewGpar=gpar(),
                             mar=c(1, 1, 1, 1))
{
  if(!is.null(dim(x))) stop("'x' must be a vector")

  tagCloud.table(table(x), n=n, decreasing=decreasing, fontsize=fontsize,
                 threshold=threshold, align=align, expandRow=expandRow,
                 justRow=justRow, title=title, textGpar=textGpar,
                 rectGpar=rectGpar, titleGpar=titleGpar, viewGpar=viewGpar,
                 mar=mar)
}

tagCloud.table <- function(x, n=100, decreasing=TRUE,
                           threshold=NULL, fontsize=c(12, 36),
                           align=TRUE, expandRow=TRUE,
                           justRow="bottom", title,
                           textGpar=gpar(col="navy"),
                           rectGpar=gpar(col="white"),
                           titleGpar=gpar(), viewGpar=gpar(),
                           mar=c(1, 1, 1, 1))
{
  ## --- Check ---

  if(length(dim(x)) != 1)
    stop("'x' must be one dimensional table")

  ## --- Threshold ---

  if(!is.null(threshold)) x <- x[x >= threshold]

  ## --- Number of units ---

  N <- length(x)                 ## length of table
  if(is.null(n)) {               ## if n=NULL, plot all units
    n <- N
  } else {
    if(n > N) n <- N             ## if n is too big, decrease it
    if(n < 1) n <- round(N * n)  ## if n is percentage of units
  }

  fontsizeLength <- length(fontsize)
  if(fontsizeLength != 2)
    stop("'fontsize' must be of length two")

  ## --- Sort and subset ---

  if(n < N) { ## only if we want to plot subset of units
    tmp <- sort(x, decreasing=decreasing)
    x <- x[names(x) %in% names(tmp[1:n])]
  }

  ## --- Get relative freq ---

  x <- prop.table(x)

  ## --- Fontsize ---

  fontsizeDiff <- diff(fontsize)
  xDiff <- max(x) - min(x)
  if(xDiff != 0) {
    off <- ifelse(fontsizeDiff > 0, min(x), max(x))
    fontsize <- (x - off) / xDiff * fontsizeDiff + min(fontsize)
  } else { ## all units have the same frequency
    fontsize <- rep(min(fontsize), times=n)
  }

  ## --- Viewport and rectangle ---

  grid.newpage()
  width <- unit(1, "npc")
  height <- unit(1, "npc")
  vp <- viewport(y=unit(mar[1], "lines"), x=unit(mar[2], "lines"),
                 width=width - unit(mar[2] + mar[4], "lines"),
                 height=height - unit(mar[1] + mar[3], "lines"),
                 just=c("left", "bottom"), gp=viewGpar, name="main")
  pushViewport(vp)

  if(!missing(title))
    grid.text(title, y=height, gp=titleGpar, name="title")

  grid.rect(gp=rectGpar, name="cloud")

  ## --- Grobs ---

  tag <- vector(mode="list", length=4)
  names(tag) <- c("fontsize", "grob", "width", "height")
  tag[[1]] <- tag[[2]] <- tag[[3]] <- tag[[4]] <- vector(mode="list", length=n)
  for(i in 1:n) {
    tag$fontsize[[i]] <- fontsize[i]
    tag$grob[[i]] <- textGrob(names(x[i]), gp=gpar(fontsize=fontsize[i]))
    tag$width[[i]] <- convertWidth(grobWidth(tag$grob[[i]]), unitTo="npc",
                                   valueOnly=TRUE)
    tag$height[[i]] <- convertHeight

Re: [R] randomly sample within clustered data?

2008-09-15 Thread Brown, Tony Nicholas
Thierry,

Thanks so much. Your solution works perfectly.

Tony

-Original Message-
From: ONKELINX, Thierry [mailto:[EMAIL PROTECTED] 
Sent: Monday, September 15, 2008 2:56 AM
To: Brown, Tony Nicholas; r-help@r-project.org
Subject: RE: [R] randomly sample within clustered data?

Something like this?

do.call(rbind, 
    lapply(
        split(Dataf, Dataf$id), 
        function(x){
            x[sample(seq_len(nrow(x)), size=2), ]
        }
    )
)
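
As a self-contained illustration of the split/lapply/rbind pattern above, the sketch below regenerates a data frame like the one in the question and wraps the pattern in a small helper. The helper name `sample_per_cluster()`, its arguments, and the seed are hypothetical and not part of the original reply; the size check is an added assumption about how too-small clusters should be handled.

```r
set.seed(123)  # arbitrary seed so the example is reproducible
Dataf <- data.frame(
  id       = c(rep(100, 4), rep(101, 3), rep(102, 6), rep(103, 7)),
  sex      = sample(c("m", "f"), 20, replace = TRUE),
  weight   = rnorm(n = 20, mean = 150, sd = 3),
  attitude = sample(1:7, 20, replace = TRUE)
)

# Draw `size` rows without replacement from every level of `cluster`
sample_per_cluster <- function(df, cluster, size) {
  pieces <- lapply(split(df, df[[cluster]]), function(x) {
    if (nrow(x) < size)
      stop("cluster has fewer than 'size' rows")
    x[sample(seq_len(nrow(x)), size = size), , drop = FALSE]
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL  # drop the "100.3"-style row names left by rbind
  out
}

Dataf2 <- sample_per_cluster(Dataf, cluster = "id", size = 2)
Dataf2
table(Dataf2$id)
```

Using `seq_len(nrow(x))` rather than `1:nrow(x)` keeps the indexing safe even for an empty piece.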
 
HTH,

Thierry



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
[EMAIL PROTECTED] 
www.inbo.be 

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Brown, Tony Nicholas
Sent: Monday, September 15, 2008 9:40
To: r-help@r-project.org
Subject: [R] randomly sample within clustered data?

Dear useRs,

What is an efficient way to randomly sample from clustered data such
that I get equal representation from each cluster? For example, let's
say I want to randomly sample two cases from each cluster created by the
id variable in the following data frame:

> id <- c(rep(100, 4), rep(101, 3), rep(102, 6), rep(103, 7))
> sex <- sample(c("m","f"), 20, replace=TRUE)
> weight <- rnorm(n=20, mean=150, sd=3)
> attitude <- sample(1:7, 20, replace=TRUE)
> Dataf <- data.frame(id,sex,weight,attitude)
> Dataf
    id sex   weight attitude
1  100   m 146.5064        6
2  100   f 150.2317        4
3  100   f 149.3686        5
4  100   m 144.7218        7
5  101   m 147.9071        4
6  101   m 148.3802        6
7  101   m 154.4634        1
8  102   m 153.2719        5
9  102   m 148.9821        5
10 102   f 148.0656        1
11 102   f 148.8949        6
12 102   m 146.9963        4
13 102   m 153.0542        4
14 103   m 148.1558        1
15 103   f 148.0482        4
16 103   m 151.8044        2
17 103   f 155.4976        4
18 103   m 150.0423        1
19 103   f 146.0487        5
20 103   m 154.6651        7

Here's the R code I wrote that obviously does not work:

sapply(split(Dataf, Dataf$id), sample, size=2)

I would prefer a data frame (i.e., Dataf2) as the final output and it
should look something like this:

> Dataf2
   id sex   weight attitude
1 100   m 146.5064        6
2 100   m 144.7218        7
3 101   m 147.9071        4
4 101   m 154.4634        1
5 102   m 153.2719        5
6 102   m 148.9821        5
7 103   f 155.4976        4
8 103   f 146.0487        5

Thanks in advance for your assistance.

Tony





--



Tony N. Brown, Ph.D.

Associate Professor of Sociology

Faculty Head of Hank Ingram House, The Commons

Research Fellow, Vanderbilt Center for Nashville Studies

Vanderbilt University

(615) 322-7518

(615) 322-7505 fax

[EMAIL PROTECTED] mailto:[EMAIL PROTECTED] 






[R] randomly sample within clustered data?

2008-09-15 Thread Brown, Tony Nicholas
Dear useRs,

What is an efficient way to randomly sample from clustered data such
that I get equal representation from each cluster? For example, let's
say I want to randomly sample two cases from each cluster created by the
id variable in the following data frame:

> id <- c(rep(100, 4), rep(101, 3), rep(102, 6), rep(103, 7))
> sex <- sample(c("m","f"), 20, replace=TRUE)
> weight <- rnorm(n=20, mean=150, sd=3)
> attitude <- sample(1:7, 20, replace=TRUE)
> Dataf <- data.frame(id,sex,weight,attitude)
> Dataf
    id sex   weight attitude
1  100   m 146.5064        6
2  100   f 150.2317        4
3  100   f 149.3686        5
4  100   m 144.7218        7
5  101   m 147.9071        4
6  101   m 148.3802        6
7  101   m 154.4634        1
8  102   m 153.2719        5
9  102   m 148.9821        5
10 102   f 148.0656        1
11 102   f 148.8949        6
12 102   m 146.9963        4
13 102   m 153.0542        4
14 103   m 148.1558        1
15 103   f 148.0482        4
16 103   m 151.8044        2
17 103   f 155.4976        4
18 103   m 150.0423        1
19 103   f 146.0487        5
20 103   m 154.6651        7

Here's the R code I wrote that obviously does not work:

sapply(split(Dataf, Dataf$id), sample, size=2)
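
One likely reason the call above does not do what is wanted: a data frame is a list of columns, so `sample()` applied to a data frame draws columns, not rows. A small sketch with a hypothetical toy data frame `toy` (not the data from this message) shows the difference between sampling columns and sampling row indices:

```r
toy <- data.frame(a = 1:3, b = 4:6, c = 7:9)  # hypothetical toy data

set.seed(1)  # arbitrary seed for reproducibility

# sample() on the data frame itself picks two of its columns
cols <- sample(toy, size = 2)

# Sampling row indices and subsetting picks two of its rows
rows <- toy[sample(nrow(toy), size = 2), ]

dim(cols)
dim(rows)
```

So any per-cluster solution has to sample row indices within each piece returned by split(), rather than passing `sample` directly.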

 

I would prefer a data frame (i.e., Dataf2) as the final output and it
should look something like this:

> Dataf2
   id sex   weight attitude
1 100   m 146.5064        6
2 100   m 144.7218        7
3 101   m 147.9071        4
4 101   m 154.4634        1
5 102   m 153.2719        5
6 102   m 148.9821        5
7 103   f 155.4976        4
8 103   f 146.0487        5

Thanks in advance for your assistance.

Tony

 

 

--



Tony N. Brown, Ph.D.

Associate Professor of Sociology

Faculty Head of Hank Ingram House, The Commons

Research Fellow, Vanderbilt Center for Nashville Studies

Vanderbilt University

(615) 322-7518

(615) 322-7505 fax

[EMAIL PROTECTED] mailto:[EMAIL PROTECTED] 



