[R] mlogit.effects()
Dear colleagues,

I am struggling mightily with the mlogit package. First, the reason that I am using mlogit as opposed to multinom() in nnet is that my data are ranked, not just ordinal. So I’m really trying to fit an exploded logit or rank-ordered model. All of the covariates of interest are individual-specific; none are alternative-specific.

The code below produces a model with my covariates of interest, so that is good. But I cannot get predict.mlogit or effects.mlogit to work *at all*. The package help is quite unclear as to how to format the sample data that is fed to either of those two functions. Can anyone help in that regard?

Failing that, can anyone suggest an alternative way of modelling ranked categorical data? I’m aware of the pmr and Rankcluster packages. The former, however, is also poorly documented, and the latter is computationally intensive when selecting clusters. I’m trying to do this as simply as possible while remaining loyal to the ranked structure of the data.

Thanks,
Simon Kiss

#Load packages
library(RCurl)
library(mlogit)
library(tidyr)
library(dplyr)
#URL where data is stored
dat.url<- 'https://raw.githubusercontent.com/sjkiss/Survey/master/mlogit.out.csv'
#Get data
dat<-read.csv(dat.url)
#Complete cases only as it seems mlogit cannot handle missing values or tied data which in this case you might get because of median imputation
dat<-dat[complete.cases(dat),]
#Tidy data to get it into long format
dat.out<-dat %>%
  gather(Open, Rank, -c(1,9:12)) %>%
  arrange(X, Open, Rank)
#Create mlogit object
mlogit.out<-mlogit.data(dat.out, shape='long',alt.var='Open',choice='Rank', ranked=TRUE,chid.var='X')
#Fit Model
mod1<-mlogit(Rank~1|gender+age+economic+Job,data=mlogit.out)

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
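For readers unfamiliar with the term, here is a minimal base R sketch of what "exploding" a ranking means: a full ranking is treated as a sequence of choices, each made from the alternatives not yet ranked. The function name explode_ranking and the alternatives "A", "B", "C" are made up for illustration; this is not mlogit's internal code and it does not address the predict/effects question.

```r
# Illustrative only: split one respondent's ranking into successive choice situations.
explode_ranking <- function(ranking) {
  # ranking: character vector of alternatives, most preferred first
  stages <- seq_len(length(ranking) - 1)
  lapply(stages, function(k) {
    list(chosen = ranking[k],                        # alternative picked at this stage
         choice_set = ranking[k:length(ranking)])    # alternatives still available
  })
}

ex <- explode_ranking(c("A", "C", "B"))
str(ex)
```

A ranking of three alternatives yields two choice situations; the last-ranked item is never "chosen" because nothing remains to compare it with.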
Re: [R] recode the same subset of variables in several list elements
Hi Jim,

So that does the rescale part very efficiently. But I’d like to know how to do that on each list element using lapply or llply. I have about 4 data frames and a few other recodes to do, so automating would be nice, rather than applying your code to each individual list element.

simon

> On Apr 2, 2015, at 6:30 PM, Jim Lemon wrote:
>
> Hi Simon,
> How about this?
>
> library(plotrix)
> revlist<-grep("i",names(df),fixed=TRUE)
> df[,revlist]<-sapply(df[,revlist],rescale,c(3,1))
>
> Jim
>
> On Fri, Apr 3, 2015 at 6:30 AM, Simon Kiss wrote:
> Hi there: I have a list of data frames with identical variable names. I’d
> like to reverse scale the same variables in each data.frame.
> I’d appreciate any one’s suggestions as to how to accomplish this. Right now,
> I’m working with the code at the very bottom of my sample data.
> Thanks, Simon Kiss
>
> #Create data.frame1
> df<-data.frame(
> ivar1=sample(c(1,2,3), replace=TRUE, size=100),
> ivar2=sample(c(1,2,3), replace=TRUE, size=100),
> hvar1=sample(c(1,2,3), replace=TRUE, size=100),
> hvar2=sample(c(1,2,3), replace=TRUE, size=100),
> evar1=sample(c(1,2,3), replace=TRUE, size=100),
> evar2=sample(c(1,2,3), replace=TRUE, size=100)
> )
>
> #data.frame2
> df1<-data.frame(
> ivar1=sample(c(1,2,3), replace=TRUE, size=100),
> ivar2=sample(c(1,2,3), replace=TRUE, size=100),
> hvar1=sample(c(1,2,3), replace=TRUE, size=100),
> hvar2=sample(c(1,2,3), replace=TRUE, size=100),
> evar1=sample(c(1,2,3), replace=TRUE, size=100),
> evar2=sample(c(1,2,3), replace=TRUE, size=100)
> )
>
> #List
> list1<-list(df, df1)
> #vector of first variables I’d like to recode
> i.recodes<-grep('^i.', names(df), value=TRUE)
> #Vector of second variables to recode
> e.recodes<-grep('^e.', names(df), value=TRUE)
>
> #Set up RESCALE function from RPMG package
> RESCALE <- function (x, nx1, nx2, minx, maxx)
> { nx = nx1 + (nx2 - nx1) * (x - minx)/(maxx - minx)
> return(nx)
> }
>
> #This is what I’m playing around with
> test<-lapply(list1, function(y) {
> out<-y[,i.recodes]
> out<-lapply(out, function(x) RESCALE(x, 0,1,1,6))
> y[,names(x)]<-out
> })
[R] recode the same subset of variables in several list elements
Hi there:

I have a list of data frames with identical variable names. I’d like to reverse scale the same variables in each data.frame. I’d appreciate any one’s suggestions as to how to accomplish this. Right now, I’m working with the code at the very bottom of my sample data.

Thanks,
Simon Kiss

#Create data.frame1
df<-data.frame(
ivar1=sample(c(1,2,3), replace=TRUE, size=100),
ivar2=sample(c(1,2,3), replace=TRUE, size=100),
hvar1=sample(c(1,2,3), replace=TRUE, size=100),
hvar2=sample(c(1,2,3), replace=TRUE, size=100),
evar1=sample(c(1,2,3), replace=TRUE, size=100),
evar2=sample(c(1,2,3), replace=TRUE, size=100)
)

#data.frame2
df1<-data.frame(
ivar1=sample(c(1,2,3), replace=TRUE, size=100),
ivar2=sample(c(1,2,3), replace=TRUE, size=100),
hvar1=sample(c(1,2,3), replace=TRUE, size=100),
hvar2=sample(c(1,2,3), replace=TRUE, size=100),
evar1=sample(c(1,2,3), replace=TRUE, size=100),
evar2=sample(c(1,2,3), replace=TRUE, size=100)
)

#List
list1<-list(df, df1)
#vector of first variables I’d like to recode
i.recodes<-grep('^i.', names(df), value=TRUE)
#Vector of second variables to recode
e.recodes<-grep('^e.', names(df), value=TRUE)

#Set up RESCALE function from RPMG package
RESCALE <- function (x, nx1, nx2, minx, maxx)
{ nx = nx1 + (nx2 - nx1) * (x - minx)/(maxx - minx)
return(nx)
}

#This is what I’m playing around with
test<-lapply(list1, function(y) {
out<-y[,i.recodes]
out<-lapply(out, function(x) RESCALE(x, 0,1,1,6))
y[,names(x)]<-out
})
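A minimal sketch of one way to apply the recode to every list element, assuming the items run from 1 to 3 as in the sample data, so that reversing maps 1 to 3 and 3 to 1. The helper names make_df and reverse_items, the trimmed set of columns and the seed are illustrative only. Compared with the attempt above, the selected columns are assigned back under their own names, and the function returns the modified data frame rather than the value of the assignment.

```r
set.seed(1)
# Hypothetical helper that builds a small data frame like the ones in the post
make_df <- function() data.frame(
  ivar1 = sample(1:3, 100, replace = TRUE),
  hvar1 = sample(1:3, 100, replace = TRUE),
  evar1 = sample(1:3, 100, replace = TRUE)
)
list1 <- list(make_df(), make_df())

# Same linear rescaling as the RESCALE function quoted in the post
RESCALE <- function(x, nx1, nx2, minx, maxx) {
  nx1 + (nx2 - nx1) * (x - minx) / (maxx - minx)
}

# Reverse every column whose name matches `pattern`; other columns are untouched
reverse_items <- function(d, pattern, lo = 1, hi = 3) {
  cols <- grep(pattern, names(d), value = TRUE)
  d[cols] <- lapply(d[cols], RESCALE, nx1 = hi, nx2 = lo, minx = lo, maxx = hi)
  d  # return the whole data frame so lapply() collects it
}

# Recode the i* and e* variables in each data frame of the list
list2 <- lapply(list1, reverse_items, pattern = "^[ie]")
```

Further recodes can be chained the same way, since each call takes a data frame and returns a data frame.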
[R] basic help with as.Date()
Hi there:

Normally I’m quite comfortable with as.Date(). But this data set is causing problems. The core of the data frame looks like the sample data frame below, but my attempt to convert df$mydate to a date object returns only NA. Can anyone provide a suggestion?

Thank you,
Simon Kiss

#sample data frame
df<-data.frame(mydate=factor(c('Jan-15', 'Feb-13', 'Mar-11', 'Jul-12')), other=rnorm(4, 3))
#Attempt to convert
as.Date(as.character(df$mydate), format='%b-%y')
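A likely cause, going by the strptime documentation, is that a date needs a day of the month: when a month is given without %d, the result is NA. A minimal sketch of a workaround is to paste a dummy day onto each value before converting. It assumes an English (or C) locale so that %b matches "Jan", "Feb" and so on, and that pinning every date to the first of the month is acceptable; the new column name date is illustrative.

```r
df <- data.frame(mydate = factor(c("Jan-15", "Feb-13", "Mar-11", "Jul-12")))

# Prepend a day so that the string fully specifies a date, then parse it
df$date <- as.Date(paste0("01-", as.character(df$mydate)), format = "%d-%b-%y")
df
```

If only month and year matter, the dummy day is harmless for sorting and plotting; format(df$date, "%b %Y") recovers a month-year label.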
[R] foreign:::writeForeignSPSS vs. write.foreign(df, datafile, codefile, package='spss')
Hello:

I discovered recently that the function foreign:::writeForeignSPSS allows for variable names longer than 8 characters and has an additional argument, varnames. Neither of these capabilities exists in write.foreign. But according to the help file for write.foreign, it seems that the latter actually somehow calls the former. Am I reading this wrong? Can someone explain the difference between the two functions?

Thanks.
Yours, Simon Kiss
Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame
HI, of course. The a mini-version of my data-set is below, stored in d2. Then the code I'm working follows. library(reshape2) #Create d2 structure(list(row = 1:50, rank1 = structure(c(3L, 3L, 3L, 4L, 3L, 3L, NA, NA, 3L, NA, 3L, 3L, 1L, NA, 2L, NA, 3L, NA, 2L, 1L, 1L, 3L, NA, 6L, NA, 1L, NA, 3L, 1L, NA, 1L, NA, NA, 6L, 3L, NA, 1L, 3L, 3L, 4L, 1L, NA, 3L, 3L, 3L, NA, 3L, 3L, NA, 1L), .Label = c("accessible", "alternatives", "information", "responsive", "social", "technical", "trade"), class = "factor"), rank2 = structure(c(6L, 1L, 1L, 2L, 4L, 6L, NA, NA, 6L, NA, 6L, 4L, 2L, NA, 4L, NA, 6L, NA, 1L, 6L, 3L, 2L, NA, 3L, NA, 6L, NA, 6L, 6L, NA, 3L, NA, NA, 3L, 6L, NA, 6L, 6L, 6L, 7L, 3L, NA, 1L, 6L, 6L, NA, 2L, 6L, NA, 2L), .Label = c("accessible", "alternatives", "information", "responsive", "social", "technical", "trade"), class = "factor"), rank3 = structure(c(1L, 6L, 4L, 3L, 2L, 4L, NA, NA, 4L, NA, 1L, 1L, 6L, NA, 1L, NA, 1L, NA, 7L, 3L, 6L, 1L, NA, 2L, NA, 4L, NA, 1L, 3L, NA, 6L, NA, NA, 4L, 2L, NA, 7L, 1L, 1L, 6L, 7L, NA, 6L, 1L, 1L, NA, 4L, 1L, NA, 3L), .Label = c("accessible", "alternatives", "information", "responsive", "social", "technical", "trade"), class = "factor"), rank4 = structure(c(7L, 4L, 2L, 1L, 1L, 7L, NA, NA, 1L, NA, 7L, 2L, 7L, NA, 3L, NA, 2L, NA, 3L, 4L, 5L, 6L, NA, 4L, NA, 3L, NA, 4L, 4L, NA, 4L, NA, NA, 2L, 7L, NA, 2L, 2L, 2L, 3L, 6L, NA, 2L, 5L, 4L, NA, 1L, 2L, NA, 4L), .Label = c("accessible", "alternatives", "information", "responsive", "social", "technical", "trade"), class = "factor"), rank5 = structure(c(2L, 7L, 6L, 7L, 7L, 2L, NA, NA, 2L, NA, 2L, 7L, 3L, NA, 6L, NA, 7L, NA, 6L, 7L, 4L, 7L, NA, 7L, NA, 7L, NA, 2L, 2L, NA, 2L, NA, NA, 7L, 1L, NA, 3L, 7L, 4L, 2L, 2L, NA, 4L, 2L, 2L, NA, 6L, 4L, NA, 5L), .Label = c("accessible", "alternatives", "information", "responsive", "social", "technical", "trade"), class = "factor"), rank6 = structure(c(4L, 2L, 7L, 6L, 6L, 1L, NA, NA, 7L, NA, 4L, 5L, 4L, NA, 7L, NA, 4L, NA, 4L, 2L, 2L, 4L, NA, 1L, NA, 
2L, NA, 7L, 7L, NA, 7L, NA, NA, 1L, 4L, NA, 4L, 4L, 7L, 1L, 4L, NA, 7L, 7L, 7L, NA, 7L, 7L, NA, 7L),
.Label = c("accessible", "alternatives", "information", "responsive", "social", "technical", "trade"), class = "factor"),
rank7 = structure(c(5L, 5L, 5L, 5L, 5L, 5L, NA, NA, 5L, NA, 5L, 6L, 5L, NA, 5L, NA, 5L, NA, 5L, 5L, 7L, 5L, NA, 5L, NA, 5L, NA, 5L, 5L, NA, 5L, NA, NA, 5L, 5L, NA, 5L, NA, 5L, 5L, 5L, NA, 5L, 4L, 5L, NA, 5L, 5L, NA, 6L),
.Label = c("accessible", "alternatives", "information", "responsive", "social", "technical", "trade"), class = "factor")),
.Names = c("row", "rank1", "rank2", "rank3", "rank4", "rank5", "rank6", "rank7"), row.names = c(NA, 50L), class = "data.frame")

#This code is a replication of David Carlson's code (below) which works splendidly, but does not work on my data-set
#Melt d2: Note, I've used value.name='color' to maximize comparability with David's suggestion
d3 <- melt(d2, id.vars=1, measure.vars=2:8, variable.name="rank",value.name="color")
#Make Rank Variable Numeric
d3$rank<-as.numeric(d3$rank)
#Recast d3 into d4
d4<- dcast(d3, row~color,value.var="rank", fill=0)
#Note that d4 appears to provide a binary variable (1 if a respondent checked the option) but no information as to which rank they assigned each option; it also seems to summarize the number of missing values

#David Carlson's Code
mydf <- data.frame(t(replicate(100, sample(c("red", "blue", "green", "yellow", NA), 4))))
mydf <- data.frame(rows=1:100, mydf)
colnames(mydf) <- c("row", "rank1", "rank2", "rank3", "rank4")
mymelt <- melt(mydf, id.vars=1, measure.vars=2:5, variable.name="rank", value.name="color")
mymelt$rank <- as.numeric(mymelt$rank)
mycast <- dcast(mymelt, row~color, value.var="rank", fill=0)

#Compare
str(mydf)
str(d2)
head(mycast)
head(d4)

Again, I'm grateful for assistance. I can't understand how my data-set differs from David's sample data-set.

Simon Kiss

On Sep 4, 2014, at 2:35 PM, David L Carlson wrote:

> I think we would need enough of the data you are using to figure out h
Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame
Hi David and list:

This is working, except at this command

mycast <- dcast(mymelt, row~color, value.var="rank", fill=0)

dcast is using "length" as the default aggregating function. This gives inaccurate results. It tells me, for example, how many choices were missing values, and it tells me whether a person selected any given option (the value is reported as 1).

When I run your reproducible example, it works great, but something with the aggregating function is not working properly with mine. Any other thoughts?

Simon

On Aug 18, 2014, at 10:44 AM, David L Carlson wrote:

> Another approach using reshape2:
>
> > library(reshape2)
> > # Construct data/ add column of row numbers
> > set.seed(42)
> > mydf <- data.frame(t(replicate(100, sample(c("red", "blue",
> + "green", "yellow", NA), 4))))
> > mydf <- data.frame(rows=1:100, mydf)
> > colnames(mydf) <- c("row", "rank1", "rank2", "rank3", "rank4")
> > head(mydf)
>   row  rank1  rank2  rank3  rank4
> 1   1   <NA> yellow    red   blue
> 2   2 yellow  green   <NA>    red
> 3   3 yellow  green   blue   <NA>
> 4   4   <NA>   blue yellow  green
> 5   5   <NA>    red   blue  green
> 6   6   <NA>    red  green   blue
> > # Reshape
> > mymelt <- melt(mydf, id.vars=1, measure.vars=2:5,
> + variable.name="rank", value.name="color")
> > # Convert rank to numeric
> > mymelt$rank <- as.numeric(mymelt$rank)
> > mycast <- dcast(mymelt, row~color, value.var="rank", fill=0)
> > head(mycast)
>   row blue green red yellow NA
> 1   1    4     0   3      2  1
> 2   2    0     2   4      1  3
> 3   3    3     2   0      1  4
> 4   4    2     4   0      3  1
> 5   5    3     4   2      0  1
> 6   6    4     3   2      0  1
>
> David C
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David L Carlson
> Sent: Sunday, August 17, 2014 6:32 PM
> To: Simon Kiss; r-help@r-project.org
> Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame
>
> There is probably an easier way to do this, but
>
> > set.seed(42)
> > mydf <- data.frame(t(replicate(100, sample(c("red", "blue",
> + "green", "yellow", NA), 4))))
> > colnames(mydf) <- c("rank1", "rank2", "rank3", "rank4")
> > head(mydf)
>    rank1  rank2  rank3  rank4
> 1   <NA> yellow    red   blue
> 2 yellow  green   <NA>    red
> 3 yellow  green   blue   <NA>
> 4   <NA>   blue yellow  green
> 5   <NA>    red   blue  green
> 6   <NA>    red  green   blue
> > lvls <- levels(mydf$rank1)
> > # convert color factors to numeric
> > for (i in seq_along(mydf)) mydf[,i] <- as.numeric(mydf[,i])
> > # stack the columns
> > mydf2 <- stack(mydf)
> > # convert rank factor to numeric
> > mydf2$ind <- as.numeric(mydf2$ind)
> > # add row numbers
> > mydf2 <- data.frame(rows=1:100, mydf2)
> > # Create table
> > mytbl <- xtabs(ind~rows+values, mydf2)
> > # convert to data frame
> > mydf3 <- data.frame(unclass(mytbl))
> > colnames(mydf3) <- lvls
> > head(mydf3)
>   blue green red yellow
> 1    4     0   3      2
> 2    0     2   4      1
> 3    3     2   0      1
> 4    2     4   0      3
> 5    3     4   2      0
> 6    4     3   2      0
>
> David C
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Simon Kiss
> Sent: Friday, August 15, 2014 3:58 PM
> To: r-help@r-project.org
> Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame
>
> Both the suggestions I got work very well, but what I didn't realize is that
> NA values would cause serious problems. Where there is a missing value,
> using the argument na.last=NA to order just returns the order of the
> factor levels, but excludes the missing values, and I have no idea where
> those occur, or rather which of those variables were actually missing.
> Have I explained this problem sufficiently?
> I didn't think it would cause such a problem so I didn't include it in the
> original problem definition.
> Yours, Simon
> On Jul 25, 2014, at 4:58 PM, David L Carlson wrote:
>
>> I think this gets what you want. But your data are not reproducible since
>> they are randomly drawn without setting a seed and the two data sets have no
>> relationship to one another.
>>
>>> set.seed(42)
>>> mydf <- data.frame(t(replicate(100, sample(c("r
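One plausible reason for the fallback to "length", sketched here in base R on made-up data: when a respondent leaves more than one rank blank, the long data contain the same (row, NA) pair several times, so a single cell would have to hold several values and the cast has to aggregate. The toy data frame long and the object names below are illustrative, not taken from the thread. Checking for duplicated row/colour pairs shows the problem, and dropping the missing colours before tabulating avoids it.

```r
# Toy long-format data: respondent 2 skipped ranks 2 and 3,
# so the pair (row = 2, color = NA) occurs twice.
long <- data.frame(
  row   = c(1, 1, 1, 2, 2, 2),
  rank  = c(1, 2, 3, 1, 2, 3),
  color = c("red", "blue", "green", "red", NA, NA)
)

# Any duplicated row/colour pair forces a reshaping function to aggregate
dup <- duplicated(long[c("row", "color")])
any(dup)

# Drop the missing colours, then put each colour's rank in its own column;
# unranked colours come out as 0, as with fill = 0 in the thread
complete <- long[!is.na(long$color), ]
scores <- xtabs(rank ~ row + color, data = complete)
scores
```

The same duplicate check can be run on the melted survey data before casting to confirm whether repeated NA entries are the culprit.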
Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame
Both the suggestions I got work very well, but what I didn't realize is that NA values would cause serious problems. Where there is a missing value, using the argument na.last=NA to order just returns the order of the factor levels, but excludes the missing values, and I have no idea where those occur, or rather which of those variables were actually missing.

Have I explained this problem sufficiently? I didn't think it would cause such a problem so I didn't include it in the original problem definition.

Yours, Simon

On Jul 25, 2014, at 4:58 PM, David L Carlson wrote:

> I think this gets what you want. But your data are not reproducible since
> they are randomly drawn without setting a seed and the two data sets have no
> relationship to one another.
>
> > set.seed(42)
> > mydf <- data.frame(t(replicate(100, sample(c("red", "blue",
> + "green", "yellow")))))
> > colnames(mydf) <- c("rank1", "rank2", "rank3", "rank4")
> > mydf2 <- data.frame(t(apply(mydf, 1, order)))
> > colnames(mydf2) <- levels(mydf$rank1)
> > head(mydf)
>    rank1  rank2  rank3 rank4
> 1 yellow  green    red  blue
> 2  green   blue yellow   red
> 3  green yellow    red  blue
> 4 yellow    red  green  blue
> 5 yellow    red  green  blue
> 6 yellow    red   blue green
> > head(mydf2)
>   blue green red yellow
> 1    4     2   3      1
> 2    2     1   4      3
> 3    4     1   3      2
> 4    4     3   2      1
> 5    4     3   2      1
> 6    3     4   2      1
>
> -------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Simon Kiss
> Sent: Friday, July 25, 2014 2:34 PM
> To: r-help@r-project.org
> Subject: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame
>
> Hello:
> I have data that looks like mydf, below. It is the results of a survey where
> participants were to put a number of statements (in this case colours) in
> their order of preference. In this case, the rank number is the variable, and
> the factor level for each respondent is which colour they assigned to that
> rank. I would like to find a way to effectively transpose the data frame so
> that it looks like mydf2, also below, where the colours the participants were
> able to choose are the variables and the variable score is what that person
> ranked that variable.
>
> Ultimately what I would like to do is a factor analysis on these items, so
> I'd like to be able to see if people ranked red and yellow higher together
> but ranked green and blue together lower, that sort of thing.
> I have played around with different variations of t(), melt(), ifelse() and
> if() but can't find a solution.
> Thank you
> Simon
> #Reproducible code
> mydf<-data.frame(rank1=sample(c('red', 'blue', 'green', 'yellow'),
> replace=TRUE, size=100), rank2=sample(c('red', 'blue', 'green', 'yellow'),
> replace=TRUE, size=100), rank3=sample(c('red', 'blue', 'green', 'yellow'),
> replace=TRUE, size=100), rank4=sample(c('red', 'blue', 'green', 'yellow'),
> replace=TRUE, size=100))
>
> mydf2<-data.frame(red=sample(c(1,2,3,4),
> replace=TRUE,size=100),blue=sample(c(1,2,3,4),
> replace=TRUE,size=100),green=sample(c(1,2,3,4), replace=TRUE,size=100)
> ,yellow=sample(c(1,2,3,4), replace=TRUE,size=100))
>
> Simon J. Kiss, PhD
> Assistant Professor, Wilfrid Laurier University
> 73 George Street
> Brantford, Ontario, Canada
> N3T 2C9

Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606
[R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame
Hello:

I have data that looks like mydf, below. It is the results of a survey where participants were to put a number of statements (in this case colours) in their order of preference. In this case, the rank number is the variable, and the factor level for each respondent is which colour they assigned to that rank. I would like to find a way to effectively transpose the data frame so that it looks like mydf2, also below, where the colours the participants were able to choose are the variables and the variable score is what that person ranked that variable.

Ultimately what I would like to do is a factor analysis on these items, so I'd like to be able to see if people ranked red and yellow higher together but ranked green and blue together lower, that sort of thing.

I have played around with different variations of t(), melt(), ifelse() and if() but can't find a solution.

Thank you
Simon

#Reproducible code
mydf<-data.frame(rank1=sample(c('red', 'blue', 'green', 'yellow'), replace=TRUE, size=100),
rank2=sample(c('red', 'blue', 'green', 'yellow'), replace=TRUE, size=100),
rank3=sample(c('red', 'blue', 'green', 'yellow'), replace=TRUE, size=100),
rank4=sample(c('red', 'blue', 'green', 'yellow'), replace=TRUE, size=100))

mydf2<-data.frame(red=sample(c(1,2,3,4), replace=TRUE,size=100),
blue=sample(c(1,2,3,4), replace=TRUE,size=100),
green=sample(c(1,2,3,4), replace=TRUE,size=100),
yellow=sample(c(1,2,3,4), replace=TRUE,size=100))

Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
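A minimal base R sketch of the transposition described above: for each respondent, look up the position at which each colour appears among the rank columns. The function name to_scores and the three-row example are made up for illustration. Colours a respondent did not rank come out as NA, so missing ranks do not shift the scores of the colours that were ranked.

```r
# Turn rank columns holding item names into item columns holding ranks.
# `ranks`: data frame whose columns are rank1, rank2, ... in that order
# `items`: character vector of all items that could be ranked
to_scores <- function(ranks, items) {
  m <- as.matrix(ranks)                                   # factors become character
  scores <- t(apply(m, 1, function(r) match(items, r)))   # rank position of each item
  colnames(scores) <- items
  as.data.frame(scores)
}

mydf <- data.frame(
  rank1 = c("red",   "blue", NA),
  rank2 = c("blue",  NA,     "green"),
  rank3 = c("green", "red",  "red"),
  stringsAsFactors = TRUE
)

scores <- to_scores(mydf, c("red", "blue", "green"))
scores
```

Because match() returns the first position at which an item occurs, a colour that a respondent entered twice would get its better rank only; that case would need separate handling.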
[R] Help with polychoric correlation in psych library
Hello

I have a data.frame of 32 variables, all are ordered factors. str(dat) returns the following

'data.frame': 32 obs. of 43 variables:
 $ q1a: Ord.factor w/ 6 levels "Strongly Disagree"<..: 3 4 2 5 NA NA 5 5 3 5 ...
 $ q1b: Ord.factor w/ 6 levels "Strongly Disagree"<..: 3 NA 4 NA NA NA NA 5 4 4 ...
 $ q1c: Ord.factor w/ 6 levels "Strongly Disagree"<..: NA NA 5 5 NA 4 NA 5 NA 5 ...
 $ q1d: Ord.factor w/ 6 levels "Strongly Disagree"<..: 5 NA 5 NA NA 5 NA 5 NA 4 ...
 $ q1e: Ord.factor w/ 6 levels "Strongly Disagree"<..: 5 NA NA 5 5 NA NA 5 5 NA ...
 $ q1f: Ord.factor w/ 6 levels "Strongly Disagree"<..: 4 5 5 5 5 5 5 4 5 5 ...

I'm trying to come up with a polychoric correlation matrix for these, and so I convert them to numeric values:

'data.frame': 32 obs. of 43 variables:
 $ q1a: num 3 4 2 5 NA NA 5 5 3 5 ...
 $ q1b: num 3 NA 4 NA NA NA NA 5 4 4 ...
 $ q1c: num NA NA 5 5 NA 4 NA 5 NA 5 ...
 $ q1d: num 5 NA 5 NA NA 5 NA 5 NA 4 ...

and try:

library(psych)
polychoric(values, na.rm=TRUE)

but this returns the following error

The items do not have an equal number of response alternatives, global set to FALSE
Error in poly[1, ] : incorrect number of dimensions
In addition: Warning message:
In mclapply(seq_len(n), do_one, mc.preschedule = mc.preschedule, :
  all scheduled cores encountered errors in user code

Can anyone provide any guidance?

Thanks, Simon Kiss

Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606
[R] escape characters for apostrophes in a .csv file
Hello:

I have a .csv file that includes some character strings (open-ended survey responses) that contain apostrophes. Using read.csv() the file reads in just fine, except that the apostrophes are displayed with a double slash, i.e. 'I've' becomes 'I\\'ve'. I'd like to print these responses out for a report. Is there a way that I can have the apostrophes read in as in the original, or print them out without the escape characters?

Thank you.

Simon J. Kiss, PhD
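A minimal sketch of one way to handle this, assuming the strings really do contain a literal backslash before each apostrophe (print() shows a single stored backslash as two). The sample strings are made up. A fixed-pattern gsub() removes the backslash, and writeLines() shows the text as it is stored rather than in print()'s escaped form.

```r
# Each string holds one literal backslash before the apostrophe;
# "\\" in R source code is a single backslash character.
x <- c("I\\'ve seen it", "Don\\'t know")

# Replace backslash-apostrophe with a plain apostrophe (no regex involved)
clean <- gsub("\\'", "'", x, fixed = TRUE)

# writeLines() outputs the characters themselves, without escapes
writeLines(clean)
```

If the strings turn out to have no stray backslashes at all, writeLines() or cat() on the original vector is enough, since the escapes are then only an artefact of print().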
[R] ggplot2 counts versus percentages
Hello:

I’m having trouble plotting a barchart with percentages rather than counts in ggplot2. I’m aware that others have had a problem with this, but I cannot get it to work as I wish. At the end, I’d like a facetted barchart with percentages rather than with counts. Thank you for any assistance!

I have a data.frame that looks like the one below; I table the data and then melt it to get it into long format.

#Libraries
l<-c('reshape', 'ggplot2')
lapply(l, library, character.only=T)
#Sample
test<-data.frame(society=sample(myvalues, size=100, replace=TRUE), equality=sample(myvalues, size=100, replace=TRUE), discrim=sample(myvalues, size=100, replace=TRUE))
#Long format
test.table<-apply(test, 2, table)
test.table<-melt(test.table)
#And now I do this to create a facetted series of barcharts
ggplot(test.table,aes(x=X1, y=value))+geom_bar(stat='identity')+facet_grid(~X2)
#How do I get it to plot percentages, rather than the counts?
#I’ve tried several variations of this to no success
ggplot(test.table,aes(x=X1, y=value))+geom_bar(stat='identity', aes(y=value, (..count..)/sum(..count..)))+facet_grid(~X2)
ggplot(test.table,aes(x=X1))+geom_bar(stat='identity', aes(y=value/(..sum..)/value))+facet_grid(~X2)

Thank you for your assistance!
Yours, Simon Kiss

Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
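One way around this, sketched in base R, is to compute the percentages before plotting, so that the plot only has to draw the values it is given. The response labels in myvalues are assumed, since the post does not define them, and the names counts, pct and pct.long are illustrative.

```r
set.seed(1)
myvalues <- c("Agree", "Neutral", "Disagree")  # assumed labels, not from the post
test <- data.frame(
  society  = sample(myvalues, size = 100, replace = TRUE),
  equality = sample(myvalues, size = 100, replace = TRUE),
  discrim  = sample(myvalues, size = 100, replace = TRUE)
)

# Counts per response (rows) for each item (columns)
counts <- sapply(test, function(v) table(factor(v, levels = myvalues)))

# Convert each column to percentages of that item's total
pct <- 100 * prop.table(counts, margin = 2)

# Long format: one row per response/item combination
pct.long <- as.data.frame(as.table(pct), responseName = "percent")
names(pct.long)[1:2] <- c("response", "item")
head(pct.long)
```

With the percentages in a column of their own, the original plotting call should carry over with only the variable names changed, for example ggplot(pct.long, aes(x=response, y=percent)) + geom_bar(stat='identity') + facet_grid(~item).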
[R] Searching the help archives - 404 error?
I'm using Mac OS 10.8.5, Chrome 31 and Safari 6.1. Recently, when entering anything into the search box here:

http://tolstoy.newcastle.edu.au/R/

I get this response when searching using either Chrome or Safari:

404. That’s an error. The requested URL /u/newcastlemaths?q=rprofile&sa=Google+Search was not found on this server. That’s all we know.

Has the search engine for the help archives moved?

Yours, Simon Kiss

Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Re: [R] Help using mapply to run multiple models
Hi there:

Just to tie this all together, here is the final function:

f<- function (modelType, responseName, predictorNames, data, ..., envir = parent.frame())
{
  call <- match.call()
  call$formula <- formula(envir = envir, paste(responseName, sep = " ~ ",
      paste0("`", predictorNames, "`", collapse = " + ")))
  call[[1]] <- as.name(modelType)
  call$responseName <- NULL # omit responseName=
  call$predictorNames <- NULL # omit 'predictorNames='
  eval(call, envir = envir)
}

Here I apply the function to a list of predictor variables and one dependent variable. Note "glm" and not glm.

z <- lapply(list(c("hp","drat"), c("cyl"), c("am","gear")),
     FUN=function(preds)f("glm", "carb", preds, data=mtcars, family='binomial'))

I do get this error:

Error in glm.control(modelType = "glm") :
  unused argument(s) (modelType = "glm")

But lapply(z, summary) does seem to return a list of model summaries. It looks like it worked.

I also tried:

z <- lapply(list(c("hp","drat"), c("cyl"), c("am","gear")),
     FUN=function(preds)f("lm", "mpg", preds, data=mtcars))

Here, I get:

1: In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  extra argument ‘modelType’ is disregarded.
2: In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  extra argument ‘modelType’ is disregarded.
3: In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  extra argument ‘modelType’ is disregarded.

But again, it actually looks like it worked. So, thank you very much!

Yours, Simon Kiss

On 2013-12-19, at 1:55 PM, Simon Kiss wrote:

> Hello Bill, that is fantastic and it's quite a bit above what I could write.
> Is there a way to make the model type an argument to the function so that you
> can specify whether one is running glm, lm and such?
> I tried to modify it by inserting an argument modelType below, but that
> doesn't work.
> Yours, simon Kiss
>> f <- function (modelType, responseName, predictorNames, data, ..., envir =
>> parent.frame())
>> {
>> call <- match.call()
>> call$formula <- formula(envir = envir, paste(responseName, sep = " ~ ",
>> paste0("`", predictorNames, "`", collapse = " + ")))
>> call[[1]] <- quote(modelType) # '
>> call$responseName <- NULL # omit responseName=
>> call$predictorNames <- NULL # omit 'predictorNames='
>> eval(call, envir = envir)
>> }
> On 2013-12-18, at 3:07 PM, William Dunlap wrote:
>
>> f <- function (responseName, predictorNames, data, ..., envir =
>> parent.frame())
>> {
>> call <- match.call()
>> call$formula <- formula(envir = envir, paste(responseName, sep = " ~ ",
>> paste0("`", predictorNames, "`", collapse = " + ")))
>> call[[1]] <- quote(glm) # 'f' -> 'glm'
>> call$responseName <- NULL # omit responseName=
>> call$predictorNames <- NULL # omit 'predictorNames='
>> eval(call, envir = envir)
>> }
>> as in
>> z <- lapply(list(c("hp","drat"), c("cyl"), c("am","gear")),
>> FUN=function(preds)f("carb", preds, data=mtcars, family=poisson))
>> lapply(z, summary)
>
> Simon J. Kiss, PhD
> Assistant Professor, Wilfrid Laurier University
> 73 George Street
> Brantford, Ontario, Canada
> N3T 2C9
> Cell: +1 905 746 7606

Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606
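The error and the warnings in the message above both mention modelType, which suggests that the argument is still present in the call that gets handed on to glm() or lm(). A minimal sketch of the same function with that one argument removed as well follows; it also uses as.formula(..., env = envir) in place of formula(envir = ...), which is my substitution rather than part of the posted code.

```r
f <- function(modelType, responseName, predictorNames, data, ...,
              envir = parent.frame()) {
  call <- match.call()
  # Build "response ~ `pred1` + `pred2`" and store it as the formula argument
  call$formula <- as.formula(
    paste(responseName, "~", paste0("`", predictorNames, "`", collapse = " + ")),
    env = envir
  )
  call[[1]] <- as.name(modelType)   # 'f' -> e.g. 'lm' or 'glm'
  call$modelType <- NULL            # omit 'modelType=' so the model function never sees it
  call$responseName <- NULL         # omit 'responseName='
  call$predictorNames <- NULL       # omit 'predictorNames='
  eval(call, envir = envir)
}

z <- lapply(list(c("hp", "drat"), c("cyl"), c("am", "gear")),
            FUN = function(preds) f("lm", "mpg", preds, data = mtcars))
lapply(z, summary)
```

Arguments supplied through ..., such as family=poisson in the glm example quoted above, stay in the call and reach the model function unchanged.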
Re: [R] Help using mapply to run multiple models
Hello Bill,

That is fantastic and it's quite a bit above what I could write. Is there a way to make the model type an argument to the function so that you can specify whether one is running glm, lm and such? I tried to modify it by inserting an argument modelType below, but that doesn't work.

Yours, simon Kiss

> f <- function (modelType, responseName, predictorNames, data, ..., envir =
> parent.frame())
> {
> call <- match.call()
> call$formula <- formula(envir = envir, paste(responseName, sep = " ~ ",
> paste0("`", predictorNames, "`", collapse = " + ")))
> call[[1]] <- quote(modelType) # '
> call$responseName <- NULL # omit responseName=
> call$predictorNames <- NULL # omit 'predictorNames='
> eval(call, envir = envir)
> }

On 2013-12-18, at 3:07 PM, William Dunlap wrote:

> f <- function (responseName, predictorNames, data, ..., envir =
> parent.frame())
> {
> call <- match.call()
> call$formula <- formula(envir = envir, paste(responseName, sep = " ~ ",
> paste0("`", predictorNames, "`", collapse = " + ")))
> call[[1]] <- quote(glm) # 'f' -> 'glm'
> call$responseName <- NULL # omit responseName=
> call$predictorNames <- NULL # omit 'predictorNames='
> eval(call, envir = envir)
> }
> as in
> z <- lapply(list(c("hp","drat"), c("cyl"), c("am","gear")),
> FUN=function(preds)f("carb", preds, data=mtcars, family=poisson))
> lapply(z, summary)

Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606
Re: [R] Help using mapply to run multiple models
Dennis,

How would your function be modified to make it more flexible in future? I'm thinking of something like:

    f <- function(x='Dependent variable', y='List of Independent Variables', z='Data Frame')
    {
        form <- as.formula(paste(x, y, sep = " ~ "))
        glm(form, data = z)
    }

I tried that and then ran modlist <- lapply(xvars, f), but it didn't work.

On 2013-12-18, at 3:29 AM, Dennis Murphy wrote:

> Hi:
>
> Here's a way to generate a list of model objects. Once you have the
> list, you can write or call functions to extract useful pieces of
> information from each model object and use lapply() to call each list
> component recursively.
>
> sample.df<-data.frame(var1=rbinom(50, size=1, prob=0.5),
>     var2=rbinom(50, size=2, prob=0.5),
>     var3=rbinom(50, size=3, prob=0.5),
>     var4=rbinom(50, size=2, prob=0.5),
>     var5=rbinom(50, size=2, prob=0.5))
>
> # vector of x-variable names
> xvars <- names(sample.df)[-1]
>
> # function to paste a variable x into a formula object and
> # then pass it to glm()
> f <- function(x)
> {
>     form <- as.formula(paste("var1", x, sep = " ~ "))
>     glm(form, data = sample.df)
> }
>
> # Apply the function f to each variable in xvars
> modlist <- lapply(xvars, f)
>
> To give you an idea of some of the things you can do with the list:
>
> sapply(modlist, class)    # return class of each component
> lapply(modlist, summary)  # return the summary of each model
>
> # combine the model coefficients into a two-column matrix
> do.call(rbind, lapply(modlist, coef))
>
> You'd probably want to rename the second column since the slopes are
> associated with different x variables.
>
> Dennis
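The three-argument version probably fails because lapply(xvars, f) passes each element of xvars as the first argument (the dependent variable), while y and z keep their placeholder string defaults. A minimal sketch of a more flexible variant follows; the name fit_one and its argument names are hypothetical, and family = binomial is carried over from the original question rather than from Dennis's code.

```r
set.seed(2013)
sample.df <- data.frame(var1 = rbinom(50, size = 1, prob = 0.5),
                        var2 = rbinom(50, size = 2, prob = 0.5),
                        var3 = rbinom(50, size = 3, prob = 0.5),
                        var4 = rbinom(50, size = 2, prob = 0.5),
                        var5 = rbinom(50, size = 2, prob = 0.5))

# fit_one() is a hypothetical helper: extra arguments in ... go on to glm().
fit_one <- function(response, predictor, data, ...) {
  form <- reformulate(predictor, response = response)
  glm(form, data = data, ...)
}

xvars <- setdiff(names(sample.df), "var1")

# Name the fixed arguments so that each element of xvars fills 'predictor'.
modlist <- lapply(xvars, fit_one, response = "var1", data = sample.df,
                  family = binomial)
names(modlist) <- xvars
```

Naming response and data in the lapply() call is the design choice that matters here: the varying element is then matched to the one remaining positional argument.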
Re: [R] Help using mapply to run multiple models
Thanks! That works, more or less, although the wonky behaviour of mapply that David pointed out is irritating. I tried deleting the $call item from the models it produces and passing them to stargazer to report the results, but stargazer won't recognize them even though the class is explicitly "glm" "lm". Does anyone know why mapply produces such weird results?
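On the mapply question: by default mapply() is called with SIMPLIFY = TRUE, so it tries to collapse the results the way sapply() does. Each glm fit is a list of the same length, so the fits get bound together into a matrix of list components instead of staying separate model objects. A minimal sketch, reusing the call from the original post and only adding SIMPLIFY = FALSE (whether stargazer then accepts the models is not something this sketch checks):

```r
set.seed(2013)
sample.df <- data.frame(var1 = rbinom(50, size = 1, prob = 0.5),
                        var2 = rbinom(50, size = 2, prob = 0.5),
                        var3 = rbinom(50, size = 3, prob = 0.5),
                        var4 = rbinom(50, size = 2, prob = 0.5),
                        var5 = rbinom(50, size = 2, prob = 0.5))

sample.formula <- list(var1 ~ var2, var1 ~ var3, var1 ~ var4, var1 ~ var5)

# SIMPLIFY = FALSE keeps one intact glm object per formula.
mods <- mapply(glm, formula = sample.formula, data = list(sample.df),
               family = "binomial", SIMPLIFY = FALSE)

sapply(mods, function(m) class(m)[1])  # each element is a glm fit
lapply(mods, summary)                  # summary() now works per model
```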
[R] Help using mapply to run multiple models
I think I'm missing something. I have a data frame that looks like the one below.

    sample.df<-data.frame(var1=rbinom(50, size=1, prob=0.5),
        var2=rbinom(50, size=2, prob=0.5),
        var3=rbinom(50, size=3, prob=0.5),
        var4=rbinom(50, size=2, prob=0.5),
        var5=rbinom(50, size=2, prob=0.5))

I'd like to run a series of univariate general linear models where var1 is always the dependent variable and each of the other variables is the independent. Then I'd like to summarize each in a table. I've tried:

    sample.formula=list(var1~var2, var1 ~var3, var1 ~var4, var1~var5)
    mapply(glm, formula=sample.formula, data=list(sample.df), family='binomial')

And that works pretty well, except that I'm left with a matrix that contains all the information I need. I can't figure out how to use summary() properly on this matrix to report the results usefully.

Thank you for any suggestions.

Simon Kiss
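For the "summarize each in a table" part, one possible sketch is to fit the models with lapply() and then pull the slope row out of each coefficient table. The column names of coef_table are illustrative choices, not anything prescribed by glm().

```r
set.seed(2013)
sample.df <- data.frame(var1 = rbinom(50, size = 1, prob = 0.5),
                        var2 = rbinom(50, size = 2, prob = 0.5),
                        var3 = rbinom(50, size = 3, prob = 0.5),
                        var4 = rbinom(50, size = 2, prob = 0.5),
                        var5 = rbinom(50, size = 2, prob = 0.5))

sample.formula <- list(var1 ~ var2, var1 ~ var3, var1 ~ var4, var1 ~ var5)

# One binomial glm per formula, kept as a plain list.
mods <- lapply(sample.formula, function(fm) {
  glm(fm, data = sample.df, family = binomial)
})

# Row 2 of each coefficient table is the slope of the single predictor.
coef_table <- do.call(rbind, lapply(mods, function(m) {
  cf <- coef(summary(m))
  data.frame(predictor = rownames(cf)[2],
             estimate  = cf[2, "Estimate"],
             std.error = cf[2, "Std. Error"],
             p.value   = cf[2, "Pr(>|z|)"])
}))
coef_table
```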
[R] ggplot2 percentages of subpopulations
Hi there:

I have a sample data set that looks like the one below. The variable 'value' represents the counts of cases in each response category. I would like the bar chart to show the number of responses as a percentage of each *subpopulation* total (Males compared to Females), rather than as a percentage of *all* the responses. Can someone provide a suggestion?

Thank you.

Yours,
Simon Kiss

    #Sample Code
    library(ggplot2)
    sample.dat<-data.frame(response.category=rep(c('A', 'B','C'), 2),
        value=c(50,25,25, 25,25,25),
        pop=c(rep('Males', 3), rep('Females', 3)))

    #Draw GGPLot
    test<-ggplot(sample.dat, aes(x=response.category,y=value, group=pop))
    test+geom_bar(stat='identity', position='dodge',aes(fill=pop))
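One way to get within-group percentages is to compute them before plotting, dividing each count by the total of its own subpopulation. A minimal sketch, in which the pct column name is an assumption and the plotting step is skipped if ggplot2 is not installed:

```r
sample.dat <- data.frame(response.category = rep(c("A", "B", "C"), 2),
                         value = c(50, 25, 25, 25, 25, 25),
                         pop = c(rep("Males", 3), rep("Females", 3)))

# ave() returns the group total for every row, so this is a within-pop share.
sample.dat$pct <- 100 * sample.dat$value /
  ave(sample.dat$value, sample.dat$pop, FUN = sum)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sample.dat, aes(x = response.category, y = pct, fill = pop)) +
    geom_bar(stat = "identity", position = "dodge") +
    ylab("Percent of subpopulation")
  # print(p) draws the chart
}
```

With these data the Males bars are 50, 25 and 25 percent, and each Females bar is one third of the Females total.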
[R] Goodness of fit statistics for cfa are missing (sem package)
Dear colleagues,

I'm working on a confirmatory factor analysis and the model is not returning most of the usual goodness-of-fit statistics. I'm testing whether this survey data confirms a hypothesized two-factor uncorrelated structure that has theoretical and empirical support from another case.

Below is (hopefully!) reproducible code which creates ff.cov, a replica of the covariance matrix from my own data; the model that I have specified and am testing (cfa.mod.1); the sem test of that model (cfa); and then a summary of the model fit (summary(cfa1)). The problem is that many of the usual measures of goodness-of-fit do not appear after summary(cfa1). I only get a chi-square statistic, degrees of freedom and a BIC.

I saw a previous question on the R mailing list that raised a similar issue, and it was suggested that the problem lay in the specification of the model and that the degrees of freedom there were 0. Here, though, the df is 77. Unfortunately I can't find that question in the archives again or I would have linked to it.

The data set includes 376 observations and has 14 variables. Seven (coded with an h in the variable name, as in cc.h.varname.e or h) are hypothesized to load on one factor uncorrelated with the second factor, coded with a c (as in cc.c.varname.c or i).
I used this for guidance http://vimeo.com/38941937 Yours, Simon Kiss #Load Libraries library(sem) #Create Covariance Matrix ff.cov <- structure(c(0.0925407885304659, 0.0296839426523298, 0.00787168458781362, 0.0261784946236559, 0.0031878853046595, 0.0261837275985663, 0.00847584229390681, -0.00106, -0.00600867383512545, -0.010714623655914, -0.0123756272401434, -0.00528007168458781, -0.00116, 0.000812186379928316, 0.0296839426523298, 0.0665810023041475, 0.00836764592933948, 0.0281491359447005, 0.00793406810035842, 0.0169870865335381, 0.00258921786994368, 0.000712720174091142, -0.00318649385560676, -0.0083643253968254, -0.0133228366615463, -0.00557817844342038, -0.00224328341013825, -0.000821018945212493, 0.00787168458781362, 0.00836764592933948, 0.0804340181771633, 0.00589630696364567, 0.0201758960573477, 0.012536866359447, 0.000343669994879673, 0.00425544674859191, -0.00838453020993344, 0.00563975294418843, 0.00256180235535074, 0.00609073860727087, -0.00659535970302099, 0.00495727086533538, 0.0261784946236559, 0.0281491359447005, 0.00589630696364567, 0.0716310995903738, 0.00856442652329749, 0.0175328725038402, 0.0104401625704045, 0.0074095942140297, 0.00455983742959549, -0.0123115783410138, -0.0192821300563236, -0.0166337109575013, 0.00623943292370712, -0.0114852790578597, 0.0031878853046595, 0.00793406810035842, 0.0201758960573477, 0.00856442652329749, 0.0550506451612903, 0.00998831541218638, -0.000462311827956989, -0.0019576523297491, -0.0053855017921147, 0.00893281362007168, 8.10035842293904e-05, 0.00704324372759857, -0.00593381720430108, -0.00112867383512545, 0.0261837275985663, 0.0169870865335381, 0.012536866359447, 0.0175328725038402, 0.00998831541218638, 0.050779846390169, 0.00705281105990783, -0.00306167946748592, -0.00736291858678955, 0.00135779825908858, -0.00280025601638505, 0.00161077316948285, -0.00899035330261137, 0.00921536098310292, 0.00847584229390681, 0.00258921786994368, 0.000343669994879673, 0.0104401625704045, -0.000462311827956989, 
0.00705281105990783, 0.0475710074244752, -0.0101174718381976, -0.0102886418330773, -0.0175483013312852, -0.0107030209933436, -0.00982140168970814, -0.00959551843317972, -0.00663934971838198, -0.00106, 0.000712720174091142, 0.00425544674859191, 0.0074095942140297, -0.0019576523297491, -0.00306167946748592, -0.0101174718381976, 0.0572903110599078, 0.0148363965693804, 0.0134899987199181, 0.0172146441372248, 0.00366528545826933, 0.0148914759344598, 0.00469321556579621, -0.00600867383512545, -0.00318649385560676, -0.00838453020993344, 0.00455983742959549, -0.0053855017921147, -0.00736291858678955, -0.0102886418330773, 0.0148363965693804, 0.0650389644137225, 0.00571948412698413, 0.00671484895033282, -0.000752505120327701, 0.0295244790066564, -0.00901979006656426, -0.010714623655914, -0.0083643253968254, 0.00563975294418843, -0.0123115783410138, 0.00893281362007168, 0.00135779825908858, -0.0175483013312852, 0.0134899987199181, 0.00571948412698413, 0.0839045558115719, 0.0463349206349206,
[R] Psych package: Error in biplot.psych(sample.mod) : Biplot requires factor/component scores:
Hello:

I'm trying to construct a biplot with the psych package. The underlying data frame looks just like sample.data, below. I turned it into a polychoric correlation matrix, sample.cor, below, as it is derived from a series of Likert (ordinal) items. All are positive; I just used negative numbers in this dataset to get two separate factors. I created a PCA from sample.cor$rho, specifying that scores were to be kept via scores=TRUE, but the command biplot.psych(sample.mod) returns the error message: Error in biplot.psych, Biplot requires factor/component scores. Yet it seems from the help documentation that one really only has to use the command biplot(mod) to get the plot. Can someone please advise?

Yours,
Simon Kiss

    library(psych)

    #Sample data
    sample.data<-data.frame(var1=sample(c(0,0.33, 0.66, 1), size=100, replace=TRUE),
        var2=sample(c(0,0.33, 0.66, 1), size=100, replace=TRUE),
        var3=sample(c(0,-0.33, -0.66, -1), size=100, replace=TRUE),
        var4=sample(c(0,-0.33,-0.66,-1), size=100, replace=TRUE))

    #Correlation Matrix
    sample.cor<-polychoric(sample.data, polycor=TRUE)

    #Principal Components Analysis
    sample.mod<-principal(sample.cor$rho, nfactors=2,scores=TRUE,covar=TRUE)

    #Draw Biplot
    biplot.psych(sample.mod)

    #error
    Error in biplot.psych(sample.mod) :
      Biplot requires factor/component scores:
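A likely explanation is that component scores can only be computed from raw observations, and a correlation matrix such as sample.cor$rho contains none, so scores=TRUE has nothing to work with. A minimal sketch, assuming an installed psych package and passing the raw data frame instead; note that this uses ordinary (Pearson) correlations, which is a substitution for the polychoric step in the post, not the poster's method:

```r
if (requireNamespace("psych", quietly = TRUE)) {
  library(psych)
  set.seed(1)
  sample.data <- data.frame(
    var1 = sample(c(0, 0.33, 0.66, 1), size = 100, replace = TRUE),
    var2 = sample(c(0, 0.33, 0.66, 1), size = 100, replace = TRUE),
    var3 = sample(c(0, -0.33, -0.66, -1), size = 100, replace = TRUE),
    var4 = sample(c(0, -0.33, -0.66, -1), size = 100, replace = TRUE))

  # Raw data in, so principal() can return one row of scores per observation.
  sample.mod <- principal(sample.data, nfactors = 2, scores = TRUE)
  dim(sample.mod$scores)

  biplot(sample.mod)  # dispatches to the psych biplot method
}
```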
[R] Adding time series to time graphs
Hello:

I have done this before but cannot figure out how to do it again. I would like to graph the evolution of news stories on certain topics over a campaign. The campaign time period is as follows:

    campaign<-seq.Date(from=as.Date('2011-09-06'), to=as.Date('2011-10-5'), by=1)

I have a table of frequencies of newspaper stories containing a certain word, which can be turned into a data.frame (or not). I'll reproduce it as a data.frame:

    plotdf<-data.frame(story.dates=seq.Date(as.Date('2011-09-17'),as.Date('2011-09-30'), by=1),
        Freq=seq(1,14, by=1))

How do I overlay the frequency of newspaper stories as a line plot on a graph where the x-axis is a series of dates twice as long as the time series itself? The reason I'd like this is that I'd like to add a couple of other story time series as well. They may appear at other points in time in the campaign.

Thanks.
Simon Kiss
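One base-graphics approach is to set the x-axis limits from the full campaign period and then add each story series with lines(). A minimal sketch: the second series (other) is made-up illustration data, not something from the post.

```r
campaign <- seq.Date(from = as.Date("2011-09-06"),
                     to = as.Date("2011-10-05"), by = 1)
plotdf <- data.frame(
  story.dates = seq.Date(as.Date("2011-09-17"), as.Date("2011-09-30"), by = 1),
  Freq = seq(1, 14, by = 1))

# Hypothetical second series, only to show how more topics could be overlaid.
other <- data.frame(story.dates = campaign[1:10], Freq = rev(1:10))

# xlim spans the whole campaign, so the shorter series sits inside it.
plot(plotdf$story.dates, plotdf$Freq, type = "l",
     xlim = range(campaign), ylim = c(0, max(plotdf$Freq, other$Freq)),
     xlab = "Campaign date", ylab = "Number of stories")
lines(other$story.dates, other$Freq, lty = 2)
legend("topleft", legend = c("Topic 1", "Topic 2 (hypothetical)"),
       lty = c(1, 2), bty = "n")
```

Because the axis comes from campaign rather than from the data, further series can be added with more lines() calls wherever they fall in the period.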
[R] latex(test, collabel=) returns wrong latex code?
Hello:

I'm working with a 2-dimensional table that looks something like test below. I'm trying to produce LaTeX code that adds dimension names for both the rows and the columns. With the following code, LaTeX chokes when I include collabel='Vote', but it's fine without it. The R code below produces the LaTeX code further below. I'm confused by this, because it looks like it's creating two bits of text for each instance of \multicolumn. Is that really allowed in \multicolumn? Could someone clarify? Thank you!

Yours,
SJK

    library(Hmisc)
    test<-as.table(matrix(c(50,50,50,50), ncol=2))
    latex(test, rowlabel='Gender',collabel='Vote', file='')

    % latex.default(test, rowlabel = "Gender", collabel = "vote", file = "")
    %
    \begin{table}[!tbp]
    \begin{center}
    \begin{tabular}{lrr}
    \hline\hline
    \multicolumn{1}{l}{Gender}&\multicolumn{1}{vote}{A}&\multicolumn{1}{l}{B}\tabularnewline
    \hline
    A&$50$&$50$\tabularnewline
    B&$50$&$50$\tabularnewline
    \hline
    \end{tabular}
    \end{center}
    \end{table}
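A plausible reading of the output above, offered as an assumption rather than a confirmed diagnosis: latex.default() does not appear to have a collabel argument, but it does have collabel.just, and R's partial argument matching would send 'Vote' there, where it is used as the column justification inside \multicolumn. The base R part of the sketch below shows that matching behaviour with a hypothetical function (show_just); the Hmisc part, run only if Hmisc is installed, uses cgroup and n.cgroup to put a spanning "Vote" header over the columns.

```r
# show_just() is a hypothetical stand-in with an argument named like Hmisc's.
show_just <- function(x, collabel.just = "c") collabel.just

# 'collabel' is a prefix of 'collabel.just', so partial matching fills it.
show_just(1, collabel = "Vote")

if (requireNamespace("Hmisc", quietly = TRUE)) {
  library(Hmisc)
  test <- as.table(matrix(c(50, 50, 50, 50), ncol = 2))
  # cgroup adds a header spanning n.cgroup columns; file = "" prints the code.
  latex(test, rowlabel = "Gender", cgroup = "Vote", n.cgroup = 2, file = "")
}
```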
[R] Reading outdated .Rprofile file
Hi there:

I'm having a weird problem with my startup procedure. R.app is reading an unknown .Rprofile file. First, I'm on Mac OS X 10.6.8 running R.app 2.15.0. On startup:

    > getwd()
    [1] "/Users/simon"

But the contents of the .Rprofile file in my home directory, when viewed with a text editor, are:

    .First<-function() {
    source("/Users/simon/Documents/R/functions/trim.leading.R")
    source("/Users/simon/Documents/R/functions/trim.trailing.R")
    source("/Users/simon/Documents/R/functions/trim.R")
    source("/Users/simon/Documents/R/functions/pseudor2.R")
    source("/Users/simon/Documents/R/functions/dates.R")
    source("/Users/simon/Documents/R/functions/andersen.R")
    source("/Users/simon/Documents/R/functions/tabfun.R")
    source("/Users/simon/Documents/R/functions/cox_snell.R")
    source("/Users/simon/Documents/R/functions/cor.prob.R")
    source("/Users/simon/Documents/R/functions/kmo.R")
    source("/Users/simon/Documents/R/functions/residual.stats.R")
    source("/Users/simon/Documents/R/functions/missings.plot.R")
    }

Then, when I type .First at the command line, I get:

    function () {
    source("/Users/simon/Documents/R/functions/sample_size.R")
    source("/Users/simon/Documents/R/functions/pseudor2.R")
    source("/Users/simon/Documents/R/functions/dates.R")
    source("/Users/simon/Documents/R/functions/andersen.R")
    source("/Users/simon/Documents/R/functions/tabfun.R")
    source("/Users/simon/Documents/R/functions/cox_snell.R")
    source("/Users/simon/Documents/R/functions/cor.prob.R")
    source("/Users/simon/Documents/R/functions/kmo.R")
    }

Needless to say, I get an error, because the file sample_size.R was deleted a long time ago. So how do I get R.app to read the updated .Rprofile file?
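One common cause of this symptom, which the post itself does not confirm: at startup R sources .Rprofile first and then restores a saved workspace (.RData) from the working directory, so an old .First stored in that workspace replaces the one just defined. A minimal sketch of how one might check for this; has_saved_first is a hypothetical helper, and the demonstration uses a temporary file so no real workspace is touched.

```r
# Hypothetical helper: does a saved workspace file contain a .First object?
has_saved_first <- function(rdata = file.path(getwd(), ".RData")) {
  if (!file.exists(rdata)) return(FALSE)
  e <- new.env()
  load(rdata, envir = e)  # load into a scratch environment, not the workspace
  exists(".First", envir = e, inherits = FALSE)
}

# Demonstration: save a stale .First into a temporary workspace file.
tmp <- file.path(tempdir(), "demo.RData")
stale <- new.env()
assign(".First", function() source("sample_size.R"), envir = stale)
save(list = ".First", file = tmp, envir = stale)

has_saved_first(tmp)  # TRUE: this workspace would override .Rprofile's .First

# In the affected session one might then run, for example:
# rm(.First); save.image()   # or delete ~/.RData and restart R
```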
[R] Help with multiple barplots
Hello:

I need to create six barplots from data that looks pretty close to what appears below. There are two grouping variables (age and gender) and three dependent variables for each grouping variable. I'm not really familiar with trellis graphics; perhaps there is something there that can do what I need, I don't know.

The thing is: I *need* these to appear in one row, with some way of differentiating between the three barplots of one grouping variable and the three from the other grouping variable. It's for a grant application and space is at a premium. The whole figure can be about 7 inches wide and maybe 2 to 2.5 inches high. I also need an outer margin to place a legend.

I can do this with the following, using the layout command, but I cannot figure out a nice way to differentiate the two groups of variables. I'd like to find a way to put a little bit of space between the three from one grouping variable and the three from the other. If anyone has any thoughts, I'd be very grateful.

Yours truly,
Simon J. Kiss

    ###Random Data
    crime<-sample(c('agree' ,'disagree'), replace=TRUE, size=100)
    guns<-sample(c('agree','disagree'), replace=TRUE, size=100)
    climate<-sample(c('agree', 'disagree'), replace=TRUE, size=100)
    gender<-sample(c('male','both' ,'female'), replace=TRUE, size=100)
    age<-sample(c('old', 'neither', 'young'), replace=TRUE, size=100)
    dat<-as.data.frame(cbind(crime, guns, climate, gender, age))

    ###Code I'm working with now
    layout(matrix(c(1,2,3,4,5,6), nrow=1))   # one row of six panels
    barplot(prop.table(table(dat$guns, dat$gender), 2))
    barplot(prop.table(table(dat$crime, dat$gender), 2))
    barplot(prop.table(table(dat$climate, dat$gender), 2))
    barplot(prop.table(table(dat$guns, dat$age), 2))   # by age, not gender
    barplot(prop.table(table(dat$crime, dat$age), 2))
    barplot(prop.table(table(dat$climate, dat$age), 2))
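One way to separate the two groups with layout() is to insert an empty, narrow column between panels three and four (a 0 in the layout matrix) and to label each group in the outer margin. A minimal sketch under assumptions: the output file name, the margin sizes and the label positions are illustrative guesses and would need tuning for the real figure.

```r
set.seed(1)
n <- 100
dat <- data.frame(
  crime   = sample(c("agree", "disagree"), n, replace = TRUE),
  guns    = sample(c("agree", "disagree"), n, replace = TRUE),
  climate = sample(c("agree", "disagree"), n, replace = TRUE),
  gender  = sample(c("male", "both", "female"), n, replace = TRUE),
  age     = sample(c("old", "neither", "young"), n, replace = TRUE))

out <- file.path(tempdir(), "barplots.pdf")  # hypothetical output file
pdf(out, width = 7, height = 2.5)

# The 0 leaves an empty column, 0.4 panels wide, between the two groups.
layout(matrix(c(1, 2, 3, 0, 4, 5, 6), nrow = 1),
       widths = c(1, 1, 1, 0.4, 1, 1, 1))
par(mar = c(2, 2, 2, 0.5), oma = c(0, 0, 1.5, 6), cex.axis = 0.7)

for (g in c("gender", "age")) {
  for (v in c("guns", "crime", "climate")) {
    barplot(prop.table(table(dat[[v]], dat[[g]]), 2),
            main = v, cex.names = 0.6, cex.main = 0.9)
  }
}

# Group titles in the top outer margin, roughly centred over each group.
mtext("By gender", side = 3, outer = TRUE, at = 1.5 / 6.4, cex = 0.8)
mtext("By age", side = 3, outer = TRUE, at = 4.9 / 6.4, cex = 0.8)

# Overlay an empty full-device plot to hold the legend on the right.
par(fig = c(0, 1, 0, 1), oma = c(0, 0, 0, 0), mar = c(0, 0, 0, 0), new = TRUE)
plot.new()
legend("right", legend = c("agree", "disagree"), fill = gray.colors(2),
       bty = "n", cex = 0.8)
dev.off()
```

The legend colours rely on barplot()'s default grey shading for a two-row matrix; if custom colours are passed to barplot(), the same vector would go to legend().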
[R] Function failure in tm
Hi all:

I have a customized source reader for the tm package (which Milan Bouchet-Valat has been instrumental in producing). I can get it to produce a corpus of class "VCorpus" "Corpus" "list". class(mycorp[1]) returns "VCorpus" "Corpus" "list", and class(mycorp[[1]]) returns "PlainTextDocument" "TextDocument" "character".

But now that I've got a corpus, none of the transformation functions work at all. They all return the following error (with the respective function name):

    Error in UseMethod("stripWhitespace", x) :
      no applicable method for 'stripWhitespace' applied to an object of class "NULL"

I haven't seen this error reported anywhere in the R-help archives. Does anyone have any suggestions?

Yours,
Simon Kiss

P.S. The results of sessionInfo() are:

    R version 2.15.0 (2012-03-30)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

    locale:
    [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] RWekajars_3.7.7-2 rJava_0.9-3 RWeka_0.4-13 Snowball_0.0-8
    [5] tm.plugin.factiva_1.1 tm_0.5-8.1

    loaded via a namespace (and not attached):
    [1] grid_2.15.0 slam_0.1-26 tools_2.15.0 XML_3.9-4
Re: [R] tm: custom reader for readPlain
Hmm... Thanks a lot! That seems like really useful stuff. It might be a bit over my head, but I'll look into it. The articles are all contained in one text file, but they are clearly delimited (either by a series of ) or the regular expression ^Document.[0-9].

Simon

On 2013-01-08, at 4:44 PM, Milan Bouchet-Valat wrote:

> On Tuesday 08 January 2013 at 15:56 -0500, Simon Kiss wrote:
>> Hello:
>> I have a series of newspaper articles from a Canadian newspaper
>> database (Canadian Newsstand) that look just like below.
>>
>> I've read through this vignette
>> (http://cran.r-project.org/web/packages/tm/vignettes/extensions.pdf)
>> about creating a custom reader to extract meta-data, but I can't
>> understand how to apply this in the context of a text document, rather
>> than in the tabular format as in the vignette. You can see there's
>> all kinds of valuable information in each document -Author, page
>> number, publication year, section, publication title
>> Can anyone provide some suggestions to someone unfamiliar with the tm
>> package as to how to go about creating a custom reader for this
>> situation?
>
> You should create a reader function that takes as input the text
> content you pasted at the end of your message, parses it as
> appropriate, and returns a PlainTextDocument. The information can be set
> using the meta() function on the document object before returning it.
> You can see how this process works by looking at the readFactivaHTML.R
> file from my tm.plugin.factiva package, and probably from other packages
> too (do not use readFactivaXML.R as it uses a method that only works for
> XML input). Of course, parsing the input will take some work, but it
> shouldn't be too hard if you split each line into a field identifier
> (the part before ":") and the value of the field, and create a character
> vector from that.
>
> One piece of information you did not give us is how the different
> articles you need to import are distributed. If they are each in a
> separate file, you can adapt DirSource() from tm so that it calls your
> reader function on each file. If they are in one file, you need to
> create a custom source that will read the file, split it and call the
> reader function on the part corresponding to each article; this latter
> way is illustrated by the HTML part of the FactivaSource.R file (again,
> skip the XML part).
>
> Finally, maybe you can extract the articles in a different format,
> ideally in XML, which is easier to use? Or maybe this newspaper is
> available on Factiva, in which case my package will work for you?
>
> Hope this helps
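The parsing step Milan describes (split the file into articles, then split each line into a field identifier and a value) can be sketched in base R. This is only an illustration of that step under assumptions: split_articles and parse_fields are hypothetical names, continuation lines of wrapped fields are ignored, and none of the tm source or reader machinery is shown. The second article in the sample is invented for the demonstration.

```r
# Hypothetical helper: one character vector of lines per article.
split_articles <- function(lines) {
  starts <- grepl("^Document [0-9]+ of [0-9]+", lines)
  id <- cumsum(starts)
  split(lines[id > 0], id[id > 0])
}

# Hypothetical helper: "Field: value" lines become a named character vector.
parse_fields <- function(lines) {
  hit <- grepl("^[^:]+: ", lines)           # URLs ("http://") do not match
  key <- sub(": .*$", "", lines[hit])       # text before the first ": "
  val <- sub("^[^:]+: ", "", lines[hit])    # text after the first ": "
  setNames(val, key)
}

txt <- c("Document 1 of 40",
         "First Nation agrees not to block trains",
         "Author: SHAWN BERRY Legislature Bureau",
         "Publication info: Daily Gleaner [Fredericton, N.B] 07 Jan 2013: A.3.",
         "Publication year: 2013",
         "Section: Main",
         "Document 2 of 40",                # made-up second article
         "A second headline",
         "Author: A HYPOTHETICAL AUTHOR")

articles <- split_articles(txt)
fields1 <- parse_fields(articles[[1]])
fields1[["Author"]]
```

Each named vector could then be attached to a document with meta() inside a custom reader, as described in the quoted reply.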
[R] tm: custom reader for readPlain
Hello:

I have a series of newspaper articles from a Canadian newspaper database (Canadian Newsstand) that look just like the one below.

I've read through this vignette (http://cran.r-project.org/web/packages/tm/vignettes/extensions.pdf) about creating a custom reader to extract meta-data, but I can't understand how to apply this in the context of a text document, rather than in the tabular format used in the vignette. You can see there's all kinds of valuable information in each document: author, page number, publication year, section, publication title. Can anyone provide some suggestions to someone unfamiliar with the tm package as to how to go about creating a custom reader for this situation?

Yours truly,
Simon Kiss

    Document 1 of 40
    First Nation agrees not to block trains
    Author: SHAWN BERRY Legislature Bureau
    Publication info: Daily Gleaner [Fredericton, N.B] 07 Jan 2013: A.3.
    http://remote.libproxy.wlu.ca/login?url=http://search.proquest.com/docview/1266701269?accountid=15090
    Abstract: Participants are also concerned about Chief Theresa Spence who stopped eating solid food on Dec. 11 in a bid to secure a meeting between First Nations leaders, Prime Minister Stephen Harper and Gov. Gen. David Johnston to discuss the treaty relationship.
    Links: null
    Full Text: A bunch of text about a story here
    Subject: Railroads; Native North Americans; Meetings; Injunctions
    Title: First Nation agrees not to block trains
    Publication title: Daily Gleaner
    First page: A.3
    Publication year: 2013
    Publication date: Jan 7, 2013
    Year: 2013
    Section: Main
    Publisher: Infomart, a division of Postmedia Network Inc.
    Place of publication: Fredericton, N.B.
    Country of publication: Canada
    Journal subject: GENERAL INTEREST PERIODICALS--UNITED STATES
    ISSN: 08216983
    Source type: Newspapers
    Language of publication: English
    Document type: News
    ProQuest document ID: 1266701269
    Document URL: http://remote.libproxy.wlu.ca/login?url=http://search.proquest.com/docview/1266701269?accountid=15090
    Copyright: (Copyright (c) 2013 The Daily Gleaner (Fredericton))
    Last updated: 2013-01-07
    Database: Canadian Newsstand Complete
[R] Changing Variable Names In VCD
Hello:

What is the most efficient way to change the plotted variable names in mosaic plots in the vcd package? Should one build a separate contingency table first, change the dimension names there and then pass that to mosaic? Or is there a way to do it simply within mosaic? I was thinking of something like:

    mosaic(~var1+var2, labelling_args=list(varnames=c('newvar1', 'newvar2')))

Simon Kiss
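Both routes mentioned in the question can be sketched. The first renames the dimensions of a contingency table in base R; the second, run only if vcd is installed, passes set_varnames through labeling_args (spelled with one "l" in vcd). The built-in Titanic table and the new names "Gender" and "Outcome" are stand-ins chosen for illustration.

```r
# Route 1: rename the dimensions of the table itself (base R only).
tab <- margin.table(Titanic, c(2, 4))   # Sex by Survived
names(dimnames(tab)) <- c("Gender", "Outcome")

if (requireNamespace("vcd", quietly = TRUE)) {
  library(vcd)
  mosaic(tab)  # plot uses the renamed dimensions

  # Route 2: keep the data as they are and relabel inside mosaic().
  mosaic(~ Sex + Survived, data = Titanic,
         labeling_args = list(set_varnames = c(Sex = "Gender",
                                               Survived = "Outcome")))
}
```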
[R] xtable with psych objects
Hello:

Is there a way to use xtable with objects from the psych package, particularly principal()? Is there a difference between princomp and principal? xtable seems to play better with princomp.

Thank you.

Yours,
Simon Kiss
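xtable() works on plain matrices and data frames, so one workaround is to extract the loadings and strip their class before handing them over. A minimal sketch using princomp() from base R so that it runs without psych; the assumption, not verified here, is that the same unclass() step applies to the loadings component returned by principal().

```r
pc <- princomp(USArrests, cor = TRUE)

# Loadings carry class "loadings"; unclass() leaves an ordinary matrix.
load_mat <- unclass(loadings(pc))

if (requireNamespace("xtable", quietly = TRUE)) {
  print(xtable::xtable(load_mat, digits = 2))
}

# Assumed analogue for psych (not run here):
# fit <- psych::principal(USArrests, nfactors = 2)
# xtable::xtable(unclass(fit$loadings))
```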
[R] Warning message: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
Hi there I'm trying to fit a logistic regression model to data that looks very similar to the data in the sample below. I don't understand why I'm getting this error; none of the data are proportional and the weights are numeric values. Should I be concerned about the warning about non-integer successes in my binomial glm? If I should be, how do I go about addressing it? I'm pretty sure the weights in the data frame are sampling weights. What follows is the result of str() on my data, the series of commands I'm using to fit the model, the responses I'm getting and then some code to reproduce the data and go through the same steps with that code. One last (minor) question. When calling svyglm on the sample data, I actually get some information about the model fitting results as well as the error about non-integer successes. In my real data, you only get the warning. Calling summary(mod1) on the real data does return information about the coefficients and the model fitting. I'm grateful for any help. I'm aware that the topic of non-integer successes has been addressed before, but I could not find my answer to this question. Yours, Simon Kiss ##str() on original data str(mat1) 'data.frame': 1001 obs. of 5 variables: $ prov : Factor w/ 4 levels "Ontario","PQ",..: 2 2 2 2 2 2 2 2 2 2 ... $ edu : Factor w/ 2 levels "secondary","post-secondary": 2 2 2 1 1 2 2 2 1 1 ... $ gender: Factor w/ 2 levels "Male","Female": 1 1 2 2 2 2 1 1 2 2 ... $ weight: num 1.145 1.436 0.954 0.765 0.776 ... $ trust : Factor w/ 2 levels "no trust","trust": 2 1 1 1 1 2 1 2 1 2 ... ###Set up survey design des.1<-svydesign(~0, weights=~weight, data=mat1) ###model and response to svyglm mod1<-svyglm(trust ~ gender+edu+prov, design=des.1, family='binomial') Warning message: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm! 
Model Summary
summary(mod1)
Call:
svyglm(formula = trust ~ gender + edu + prov, design = des.1, family = "binomial")
Survey design:
svydesign(~0, weights = ~weight, data = mat1)
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)       -0.625909   0.156560  -3.998 6.87e-05 ***
genderFemale       0.013519   0.140574   0.096    0.923
edupost-secondary -0.011569   0.141528  -0.082    0.935
provPQ            -0.006614   0.172105  -0.038    0.969
provatl            0.335166   0.297860   1.125    0.261
provwest          -0.053862   0.174826  -0.308    0.758
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1.002254)
Number of Fisher Scoring iterations: 4

#Attempt To Reproduce The Problem Data
mat.test<-data.frame(edu=c(rep('secondary', 300), rep('post-secondary', 300)), prov=c(rep('ON', 200), rep('PQ', 200), rep('AB', 200)), trust=c(rep('trust',200), rep('notrust',400)), gender=c(rep('Male', 300), rep('Female', 300)), weight=rnorm(600, mean=1, sd=0.3))
###Survey Design object
test<-svydesign(~0, weights=~weight, data=mat.test)
#Call To svyglm
svyglm(trust ~ edu+prov+gender, design=test, family='binomial')
#Results
Independent Sampling design (with replacement)
svydesign(~0, weights = ~weight, data = mat.test)
Call: svyglm(formula = trust ~ edu + prov + gender, design = test, family = "binomial")
Coefficients:
 (Intercept)  edusecondary        provON        provPQ    genderMale
  -2.658e+01    -8.454e-04     5.317e+01    -1.408e-02            NA
Degrees of Freedom: 599 Total (i.e. Null); 596 Residual
Null Deviance: 759.6
Residual Deviance: 3.406e-09    AIC: 8
Warning messages:
1: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
2: glm.fit: algorithm did not converge
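A sketch of one commonly suggested remedy, not something stated in the post above: the warning comes from the binomial family checking that the weighted number of successes is a whole number, and fitting with the quasibinomial family gives the same coefficients without that check. The survey package documentation recommends the same thing for svyglm(). The data below are simulated and every object name is illustrative; the base-R glm() call merely stands in for svyglm() so the example runs without extra packages.

```r
set.seed(1)
# Simulated 0/1 outcome with non-integer weights (all values invented)
d <- data.frame(y = rbinom(200, 1, 0.4),
                x = rnorm(200),
                w = runif(200, 0.5, 1.5))

# family = binomial: record whether a warning is raised, then muffle it
warned <- FALSE
fit_b <- withCallingHandlers(
  glm(y ~ x, data = d, weights = w, family = binomial),
  warning = function(cond) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  })

# family = quasibinomial: same point estimates, no integer check
fit_q <- glm(y ~ x, data = d, weights = w, family = quasibinomial)

all.equal(coef(fit_b), coef(fit_q))

# With the survey package the analogous call would be, for example:
# svyglm(trust ~ gender + edu + prov, design = des.1, family = quasibinomial())
```

The second warning in the reproduction attempt (algorithm did not converge) looks like a separate issue: in mat.test the outcome is fully determined by prov, so the fitted probabilities run to 0 and 1.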
[R] Ordering List Items Chronologically
Dear colleagues, Is there a way to order list items by date? I have a series of surveys in a list where the name of each list item is the date the survey was taken, but the list items are out of order. Each data frame also has a variable in it with the survey date, if that helps. Yours, Simon Kiss
#Sample Data
mylist<-list('1991-01-01'=data.frame(a=rep(5,5), survey.date=rep(as.Date('1991-01-01', format='%Y-%m-%d'), 5)), '1979-01-01'=data.frame(aa=rep(5,5), survey.date=rep(as.Date('1979-01-01', format='%Y-%m-%d'), 5)), '2001-01-01'=data.frame(c=rep(6,5), survey.date=rep(as.Date('2001-01-01', format='%Y-%m-%d'), 5)))
mylist
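One way this could be done, sketched under the assumption that every list name is an ISO-formatted date: convert the names to Date, take order() of them, and index the list with the result. The small list below is illustrative only.

```r
# Hypothetical list of surveys named by date, deliberately out of order
surveys <- list(
  "1991-01-01" = data.frame(a = rep(5, 5)),
  "1979-01-01" = data.frame(a = rep(5, 5)),
  "2001-01-01" = data.frame(a = rep(6, 5))
)

# order() on the parsed names gives the chronological permutation
surveys_sorted <- surveys[order(as.Date(names(surveys)))]
names(surveys_sorted)
```

If the names were not reliable, the same idea should work with the date column instead, e.g. ordering on the first survey.date value of each data frame.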
[R] select different variables from a list of data frames
Hi: How do I select different variables from a list of data frames? I have a list of 13 that looks like the one below. Each data frame has more variables than I need. How do I go through the list and select the variables that I need? In the example below, I need the variables "a" and "q10" from the first data frame and "a" and "q14" from the second, returned as two separate data frames. Thank you. Yours, Simon Kiss
#Sample data
mylist<-list(df1=data.frame(a=seq(1,10,1), c=seq(1,10,1), q10=rep('favour', 10)), df2=data.frame(a=seq(1,10,1), b=seq(15,24,1), q14=rep('favour', 10)))
#The variables with different names that I need are
q<-c('q10', 'q14')
#My current code
dat<-mapply(function(x,y) { data.frame(a=x$a, y$q) }, x=mylist, y=q)
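A minimal sketch of one approach, assuming the wanted column names are listed in the same order as the data frames: Map() walks the list and the name vector in parallel and subsets each data frame by name. The object names here are illustrative.

```r
# Hypothetical list mirroring the structure described above
dfs <- list(
  df1 = data.frame(a = 1:10, c = 1:10,  q10 = rep("favour", 10)),
  df2 = data.frame(a = 1:10, b = 15:24, q14 = rep("favour", 10))
)
wanted <- c("q10", "q14")

# Pair each data frame with its own column name and keep "a" plus that column
picked <- Map(function(d, v) d[, c("a", v)], dfs, wanted)

names(picked$df1)
names(picked$df2)
```

Map() returns a list and keeps the names of dfs, which avoids the simplification to a matrix that mapply() attempts by default.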
[R] using lapply with recode
Hello: Forgive me, this is surely a simple question, but I can't figure it out, having consulted the help archives and "Data Manipulation With R" (Spector). I have a list of 11 data frames with one common variable in each (prov). I'd like to use lapply to go through and recode one particular level of that common variable. I can get the recode to work, but it only returns the variable that has been recoded. I need the whole data frame with the recoded variable. Thank you for your help. Reproducible data and my current code are below.
###Sample Data
mylist<-list(df1=data.frame(a=seq(1,10,1), prov=c(rep('QUE', 5), rep('BC', 5))), df2=data.frame(a=seq(1,10,1), prov=c(rep('Quebec', 5), rep('AB', 5))))
str(mylist)
###My current code
lapply(mylist, function(x) { recode(x$prov, "'QUE'='QC' ; 'Quebec'='QC'") } )
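A sketch of the usual fix, under the assumption that prov can be treated as character: assign the recoded values back into the data frame inside the function and return the data frame as the last expression. Base-R indexing is used here in place of recode() so the example has no package dependency; with recode() the pattern would presumably be x$prov <- recode(x$prov, ...); x.

```r
# Hypothetical list of data frames sharing a 'prov' column
dfs <- list(
  df1 = data.frame(a = 1:10, prov = c(rep("QUE", 5), rep("BC", 5)),
                   stringsAsFactors = FALSE),
  df2 = data.frame(a = 1:10, prov = c(rep("Quebec", 5), rep("AB", 5)),
                   stringsAsFactors = FALSE)
)

recoded <- lapply(dfs, function(d) {
  d$prov[d$prov %in% c("QUE", "Quebec")] <- "QC"  # recode in place
  d                                               # return the whole data frame
})

str(recoded)
```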
[R] Combine two variables
Hi: I have two variables in a data frame that are the results of a wording experiment in a survey. I'd like to create a third variable that combines the two. Recode doesn't seem to work, because it recodes the first variable into the third, then recodes the second variable into the third, overwriting the first recode. I can do this with a rather elaborate indexing process, subsetting the first column and then copying the data into the second, etc., but I'm looking for a cleaner way. The data frame looks like this.
df<-data.frame(var1=sample(c('a','b','c',NA),replace=TRUE, size=100), var2=sample(c('a','b','c',NA),replace=TRUE,size=100))
df<-subset(df, !is.na(var1) | !is.na(var2))
In my real data, if one variable has an NA, then the other variable has a valid value, so how do I combine the two variables into one? Thank you for your assistance. Simon Kiss
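One possible approach, sketched on a tiny made-up data frame: take var1 where it is present and fall back to var2 otherwise. The as.character() calls are there because ifelse() on factors would return the underlying integer codes.

```r
# Hypothetical data: each row has a value in exactly one of the two columns
d <- data.frame(var1 = c("a", NA, "c", NA),
                var2 = c(NA, "b", NA, "a"),
                stringsAsFactors = TRUE)

# Use var1 unless it is missing, in which case use var2
d$combined <- ifelse(is.na(d$var1),
                     as.character(d$var2),
                     as.character(d$var1))
d
```

If a row happened to have values in both columns, this version keeps var1.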
Re: [R] Changing line length in Sweave output works for numeric, but not for character vectors
Hi: I'm not sure how to do 1, but I also tried strwrap() and that worked OK. Although it's not pretty. But it'll do. Simon On 2012-08-20, at 5:52 PM, Yihui Xie wrote: > Two possible solutions: > > 1. Redefine the LaTeX environment so it allows wrapping (see listings > for example); > 2. Manually break your long string into shorter pieces and paste() > them together, e.g. paste('long', 'long', 'string') > > Regards, > Yihui > -- > Yihui Xie > Phone: 515-294-2465 Web: http://yihui.name > Department of Statistics, Iowa State University > 2215 Snedecor Hall, Ames, IA > > > On Mon, Aug 20, 2012 at 5:03 PM, Simon Kiss wrote: >> Hi there: I'm preparing a report in RStudio 0.96.330 on a Mac OS. I'm >> running R 2.15.0 >> >> I understand from Ross Ihaka's document >> (http://www.stat.auckland.ac.nz/~stat782/downloads/Sweave-customisation.pdf) >> that you can modify the line length of Sweave output by a call to >> options(wdith=x). >> >> This works great for me for numeric output, but not for character vectors >> that I have to print. The following is some sample code that illustrates my >> problem. >> >> Is there a different way to format character vectors that are stored in R? >> Yours, Simon Kiss >> >> \documentclass{article} >> >> \begin{document} >> \SweaveOpts{concordance=TRUE} >> >> <>= >> seq(1,100,1) >> @ >> >> <>= >> options(width=30) >> @ >> <>= >> seq(1,100,1) >> @ >> >> <>= >> test<-c('The government should do more to advance societys goals, even if >> that means limiting the freedom and choices of individuals.') >> @ >> >> \end{document} >> * >> Simon J. Kiss, PhD >> Assistant Professor, Wilfrid Laurier University >> 73 George Street >> Brantford, Ontario, Canada >> N3T 2C9 >> Cell: +1 905 746 7606 >> >> Please avoid sending me Word, PowerPoint or Excel attachments. Sending these >> documents puts pressure on them to use Microsoft software and helps to deny >> them any other choice. In effect, you become a buttress of the Microsoft >> monopoly. 
[R] Changing line length in Sweave output works for numeric, but not for character vectors
Hi there: I'm preparing a report in RStudio 0.96.330 on Mac OS, running R 2.15.0. I understand from Ross Ihaka's document (http://www.stat.auckland.ac.nz/~stat782/downloads/Sweave-customisation.pdf) that you can modify the line length of Sweave output by a call to options(width=x). This works great for numeric output, but not for character vectors that I have to print. The following sample code illustrates my problem. Is there a different way to format character vectors that are stored in R? Yours, Simon Kiss
\documentclass{article}
\begin{document}
\SweaveOpts{concordance=TRUE}
<<>>=
seq(1,100,1)
@
<<>>=
options(width=30)
@
<<>>=
seq(1,100,1)
@
<<>>=
test<-c('The government should do more to advance societys goals, even if that means limiting the freedom and choices of individuals.')
@
\end{document}
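A short sketch of the strwrap() route for character output: options(width) governs how print() lays out vector elements, but it does not break a single long string, whereas strwrap() splits the string at whitespace into pieces shorter than the requested width. The width of 30 simply mirrors the example above.

```r
test <- c("The government should do more to advance societys goals, even if that means limiting the freedom and choices of individuals.")

# Split the long string into pieces of fewer than 30 characters each
wrapped <- strwrap(test, width = 30)

# writeLines() prints one piece per line, without quotes or index prefixes
writeLines(wrapped)
```

Inside a Sweave chunk, the writeLines() call would take the place of printing test directly.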
[R] Rcurl, postForm()
Dear colleagues, Could I get some assistance using postForm() to scrape the business names and addresses at this website: http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx I've read through (http://www.omegahat.org/RCurl/RCurlJSS.pdf) and scoured the web for tutorials, but I can't crack it. I'm aware that this is probably a pretty basic question, but I need some help regardless. Yours, Simon Kiss library(XML) library(RCurl) library(scrapeR) library(RHTMLForms) #Set URL bus<-c('http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx') #Scrape URL orig<-getURLContent(url=bus) #Parse doc doc<-htmlParse(orig[[1]], asText=TRUE) #Get The forms forms<-getNodeSet(doc, "//form") forms[[1]] #These are the input nodes getNodeSet(forms[[1]], ".//input") #These are the select nodes getNodeSet(forms[[1]], ".//select") * Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 905 746 7606 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Using xpathapply or getnodeset to get text between two distinct tags
Hello: The following code extracts the links to the daily transcripts of Canada's House of Commons. 'links' is a matrix of URLs (ncol=1), each of which points to one day's transcripts. If you inspect the code for scrape(links[1]), you will find that periodically there appears an italic tag after a paragraph tag (Translation). At this point, the speaker is speaking French. Then there are some tags that list some text, and then, after the speaker has returned to English, you get the same formula as above with 'English', followed by some speech. Ultimately, what I'd like to do is count the words between the tags 'Translation' and 'English'. I'm pretty sure I can get the text into the tm package to do the word counts; what I really don't know how to do is return the text between 'Translation' and 'English' so that I can mark it as French, and then return the text between 'English' and 'Translation' and mark it as English. Does anyone have any suggestions? Yours truly, Simon J. Kiss
#Necessary libraries
library(XML)
library(scrapeR)
#URL for links to 2012 transcripts
hansard<-c('http://www.parl.gc.ca/housechamberbusiness/ChamberSittings.aspx?View=H&Language=E&Mode=1&Parl=41&Ses=1')
#Scrape the page with the links
doc<-scrape(url=hansard, parse=TRUE, follow=TRUE)
#Not sure what exactly this does, but it is necessary
doc<-doc[[1]]
#Get the xmlRoot directory
doc<- xmlRoot(doc)
#Get nodes that contain only the links to each day's transcripts
links<- getNodeSet(doc, "//a[@class='PublicationCalendarLink']/@href")
links<-matrix(links)
#Paste those href links to the root URL
links<-apply(links, 1, function(x) paste('http://www.parl.gc.ca', x, sep=''))
#Inspect
links[1]
#Scrape text from first URL in 'links'
oneday<-scrape(links[1])[[1]]
#Return p/i elements from 'oneday'
getNodeSet(oneday, "//p/i")
#sessionInfo()
R version 2.15.0 (2012-03-30) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale: [1] C/en_US.UTF-8/C/C/C/C
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] scrapeR_0.1.6 RCurl_1.91-1 bitops_1.0-4.1 XML_3.9-4
loaded via a namespace (and not attached): [1] tools_2.15.0
[R] adding a caption to a mosaic plot?
Dear all: Is there a way to add text to the margins or outer margins of a mosaic plot using the vcd package? I understand the margins argument to mosaic, but I don't know how to add text there. I'd like to add a caption to a plot. If possible, I'd like to know how to set the font and size for that text as well. My plot looks roughly as below. Thank you for your time! Simon J. Kiss
library(vcd)
library(grid)
mydat<-data.frame(gender=factor(rbinom(100, 1, 0.5), labels=c('female', 'male')), hair=factor(rbinom(100, 1, 0.5), labels=c('blonde', 'black')))
mosaic_1<-table(mydat)
mosaic(mosaic_1, gp=shading_hsv, main='my title', pop=FALSE, split_vertical=FALSE, margins=c(4.1, 2.1, 8, 5.1), labeling_args=list(rot_labels=c(left=0), offset_labels=c(left=3), gp_main=gpar(cex=2), offset_varnames=c(left=5.5), gp_labels=gpar(cex=1.5), gp_varnames=gpar(cex=1.5), labeling_values=c('observed')))
labeling_cells(text=round(prop.table(mosaic_1, 1)*100), gp_text=gpar(cex=2), clip=FALSE)(mosaic_1)
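Since vcd draws with grid, one option that might work is to add the caption with grid.text() after the plot is finished. The sketch below is an assumption-laden illustration, not a vcd feature: add_caption() is a hypothetical helper, the caption text is invented, and the plot itself is left out so the example runs with base R only. The upViewport(0) call matters when pop=FALSE has left the plot's viewports on the stack, because grid.text() positions relative to the current viewport.

```r
library(grid)

# Hypothetical helper: draw a caption one line above the bottom page edge
add_caption <- function(label, fontsize = 10, fontfamily = "serif") {
  upViewport(0)  # navigate back to the top-level viewport
  grid.text(label,
            x = unit(0.5, "npc"), y = unit(1, "lines"),
            gp = gpar(fontsize = fontsize, fontfamily = fontfamily))
}

pdf(NULL)        # null device, so the sketch runs without opening a window
grid.newpage()
# ... the mosaic() and labeling_cells() calls would go here ...
cap <- add_caption("Source: simulated data")
invisible(dev.off())
```

The bottom margin given to mosaic() would need to be large enough that the caption does not collide with the axis labels.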
[R] Use scores from factor analysis and missing values factanal(), napredict(), na.omit()
Dear all, I have a series of variables that looks roughly like the sample data below and I'm trying to conduct a factor analysis. I've omitted cases with missing values for the factor analysis, but now I'd like to use the scores on each factor as new variables in the *original* data set for analysis. That is, I'd like to take the scores on each of the two factors and see how they relate to the variable "trust" in the original data set. It looks like I could create a common index variable out of the rownames in each data set and then merge them, but I'm wondering if there is a less bulky way to do that, perhaps via ?napredict. Thank you for your time. Yours, Simon J. Kiss
#Sample Data
mydat<-data.frame(trust=rnorm(100, mean=5, sd=2), v=rnorm(100, mean=1, sd=0.2), w=rnorm(100, mean=2, sd=0.5), x=rnorm(100, mean=0.2, sd=0.2), y=rnorm(100, mean=0.3, sd=0.1), z=rnorm(100, mean=0.5, sd=0.3))
#Set some missing values
mydat[52,2]<-NA
mydat[53,1]<-NA
mydat[95,3]<-NA
#Subset original data set by variables for factor analysis
my<-subset(mydat, select=c(v,w,x,y,z))
#Omit cases with missing variables
my<-na.omit(my)
#Factor analysis plus generate Scores
myfit<-factanal(my, 2, rotation='varimax', scores='Bartlett')
#Reintegrate Scores from two factors to original dataset for regression analysis
#?na.predict ?merge(rownames)
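A sketch of one way to put the scores back without a merge, assuming it is acceptable for the incomplete cases to get NA scores: keep a logical index of the complete cases, fit on those rows only, then write the scores into new columns at the same positions. The data are simulated with a deliberate two-factor structure (six indicators rather than five) so that factanal() has something to find; all names are illustrative.

```r
set.seed(123)
n  <- 100
f1 <- rnorm(n)
f2 <- rnorm(n)

# Simulated data: three noisy indicators per latent factor, plus 'trust'
dat <- data.frame(
  trust = rnorm(n, mean = 5, sd = 2),
  v = f1 + rnorm(n, sd = 0.5), w = f1 + rnorm(n, sd = 0.5),
  x = f1 + rnorm(n, sd = 0.5), y = f2 + rnorm(n, sd = 0.5),
  z = f2 + rnorm(n, sd = 0.5), u = f2 + rnorm(n, sd = 0.5)
)
dat[52, "v"] <- NA
dat[95, "w"] <- NA

items <- c("v", "w", "x", "y", "z", "u")
ok <- complete.cases(dat[, items])   # TRUE for rows usable in the analysis

fit <- factanal(dat[ok, items], factors = 2,
                rotation = "varimax", scores = "Bartlett")

# Write the scores back at the positions of the complete cases
dat$F1 <- NA_real_
dat$F2 <- NA_real_
dat$F1[ok] <- fit$scores[, 1]
dat$F2[ok] <- fit$scores[, 2]

# The scores can now be related to 'trust' in the original data frame
summary(lm(trust ~ F1 + F2, data = dat))
```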
[R] grep and XML
Hi all: I struggle a lot scraping web data. I still haven't got a handle on the XML package. I'd like to get particular exchange rates from this table: https://raw.github.com/currencybot/open-exchange-rates/master/latest.json This is the code that I'm working with:
library(RCurl)
library(XML)
txt<-getURL("https://raw.github.com/currencybot/open-exchange-rates/master/latest.json")
txt<-htmlParse(txt, asText=TRUE)
txt<- getNodeSet(txt, '//p')
So, I can get the node properly, but then, if I try something like this:
grep(c('USD'), txt)
I get: integer(0)
Can anyone suggest a way forward? Yours, Simon Kiss
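The URL ends in .json, so the content is presumably JSON rather than HTML, and parsing it as HTML is unlikely to yield anything grep() can search. A hedged sketch follows: the sample string is invented to resemble such a feed (its keys and values are assumptions, not the real file), and get_rate() is a hypothetical helper that pulls one rate out of the raw text with a regular expression in base R.

```r
# Invented snippet shaped like a JSON exchange-rate feed
txt <- '{"base": "USD", "rates": {"CAD": 1.02, "EUR": 0.76, "GBP": 0.63}}'

# Hypothetical helper: find "<code>": <number> and return the number
get_rate <- function(json, code) {
  pattern <- sprintf('"%s":[[:space:]]*([0-9.]+)', code)
  hit <- regmatches(json, regexec(pattern, json))[[1]]
  as.numeric(hit[2])   # hit[1] is the full match, hit[2] the captured number
}

get_rate(txt, "CAD")
get_rate(txt, "EUR")

# A JSON parser is the more robust route; if jsonlite is installed:
if (requireNamespace("jsonlite", quietly = TRUE)) {
  parsed <- jsonlite::fromJSON(txt)
  parsed$rates$CAD
}
```

With the real feed, txt would be the string returned by getURL(), used directly without htmlParse().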
Re: [R] lapply to change variable names and variable values
Thanks both! That solves ! You've made a very happy newbie! Simon On 2012-03-12, at 2:52 PM, Sarah Goslee wrote: > Hi Simon, > > On Mon, Mar 12, 2012 at 2:37 PM, Simon Kiss wrote: >> Hi: I'm sure this is a very easy problem. I've consulted Data Manipulation >> With R and the R Book and can't find an answer. >> >> Sample list of data frames looks as follows: >> >> .xx<-list(df<-data.frame(Var1=rep('Alabama', 400), Var2=rep(c(2004, 2005, >> 2006, 2007), 400)), df2<-data.frame(Var1=rep('Tennessee', 400), >> Var2=rep(c(2004,2005,2006,2007), 400)), df3<-data.frame(Var1=rep('Alaska', >> 400), Var2=rep(c(2004,2005,2006,2007), 400)) ) > > I tweaked this a bit so that it doesn't actually create df, df2, df3 as well > as > making a list of them, and so that xx doesn't begin with a . and shows up with > ls(). I don't need invisible objects in my testing session. > > xx<-list(df=data.frame(Var1=rep('Alabama', 400), Var2=rep(c(2004, > 2005, 2006, 2007), 400)), df2=data.frame(Var1=rep('Tennessee', 400), > Var2=rep(c(2004,2005,2006,2007), 400)), > df3=data.frame(Var1=rep('Alaska', 400), > Var2=rep(c(2004,2005,2006,2007), 400)) ) > > >> I would like to accomplish the following two tasks. >> First, I'd like to go through and change the names of each of the data >> frames within the list >> to be 'State' and 'Year' >> >> Second, I'd like to go through and add one year to each of the 'Var2' >> variables. >> >> Third, I'd like to then delete those cases in the data frames that have >> values of Var2 (or Year) values of 2008. >> >> I could do this manually, but my data are actually bigger than this, plus >> I'd really like to learn. I've been trying to use lapply, but I can't get my >> head around how it works: >> .xx<- lapply(.xx, function(x) colnames(x)<-c('State', 'Year') >> just changes the actual list of data frames to a list of the character >> string ('State' and 'Year') How do I actually change the underlying >> variable names? > > Your function doesn't return the right thing. 
To see how it works, it's often > a > good idea to write a stand-alone function and see what it does. For instance, > > rename <- function(x) { > colnames(x)<-c('State', 'Year') > x > } > > To me at least, as soon as it's written as a stand-alone it's obvious that > you have to return x in the last line. You can either use rename() in your > lapply statement: > xx<- lapply(xx, rename) > > or you can write the full function into the lapply statement: >> xx<-list(df=data.frame(Var1=rep('Alabama', 400), Var2=rep(c(2004, 2005, >> 2006, 2007), 400)), df2=data.frame(Var1=rep('Tennessee', 400), >> Var2=rep(c(2004,2005,2006,2007), 400)), df3=data.frame(Var1=rep('Alaska', >> 400), Var2=rep(c(2004,2005,2006,2007), 400)) ) >> xx <- lapply(xx, function(x){ colnames(x)<-c('State', 'Year'); x} ) >> colnames(xx[[1]]) > [1] "State" "Year" > > The same strategy should work for your other needs as well. > > Sarah > > > > -- > Sarah Goslee > http://www.functionaldiversity.org * Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 905 746 7606 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] lapply to change variable names and variable values
Hi: I'm sure this is a very easy problem. I've consulted Data Manipulation With R and the R Book and can't find an answer. Sample list of data frames looks as follows:
.xx<-list(df<-data.frame(Var1=rep('Alabama', 400), Var2=rep(c(2004, 2005, 2006, 2007), 400)), df2<-data.frame(Var1=rep('Tennessee', 400), Var2=rep(c(2004,2005,2006,2007), 400)), df3<-data.frame(Var1=rep('Alaska', 400), Var2=rep(c(2004,2005,2006,2007), 400)) )
I would like to accomplish the following three tasks. First, I'd like to go through and change the variable names of each of the data frames within the list to 'State' and 'Year'. Second, I'd like to go through and add one year to each of the 'Var2' variables. Third, I'd like to then delete those cases in the data frames that have Var2 (or Year) values of 2008. I could do this manually, but my data are actually bigger than this, plus I'd really like to learn. I've been trying to use lapply, but I can't get my head around how it works:
.xx<- lapply(.xx, function(x) colnames(x)<-c('State', 'Year'))
just changes the actual list of data frames to a list of the character strings ('State' and 'Year'). How do I actually change the underlying variable names? I'm grateful for your suggestions! Simon Kiss
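All three tasks can be folded into one function, as in the sketch below, which follows the pattern from the reply in this thread: do the work on the data frame inside the function and return the data frame as the last expression. The list is a shortened stand-in for the real data and tidy_one() is an illustrative name.

```r
# Small stand-in for the list of state data frames
xx <- list(
  df  = data.frame(Var1 = rep("Alabama",   8), Var2 = rep(2004:2007, 2)),
  df2 = data.frame(Var1 = rep("Tennessee", 8), Var2 = rep(2004:2007, 2)),
  df3 = data.frame(Var1 = rep("Alaska",    8), Var2 = rep(2004:2007, 2))
)

tidy_one <- function(d) {
  colnames(d) <- c("State", "Year")  # task 1: rename the variables
  d$Year <- d$Year + 1               # task 2: add one year
  d[d$Year != 2008, ]                # task 3: drop 2008 and return the result
}

xx <- lapply(xx, tidy_one)
str(xx)
```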
[R] txtStart creates a NULL file
Hello all: I'm trying to use the following code to get commands, comments and results to a .txt file. It only appears to capture comments. When I comment those out with #, it creates a NULL file. Someone seemed to have a similar problem with a mac GUI (https://stat.ethz.ch/pipermail/r-help/2010-September/253177.html) but the result seemed to be ambiguous. Is there a work-around? Reproducible code and sessioninfo are below. The OS is Mac OS 10.6.8. Yours truly, Simon Kiss install.packages("HSAUR") library(HSAUR) library(TeachingDemos) data("Forbes2000", package="HSAUR") #This is a test of R output for the blind txtStart('test.txt', commands=TRUE, results=TRUE) txtComment('This command provides the mean profit in the data set') mean(Forbes2000$profits, na.rm=TRUE) txtComment('This command provides the standard deviation of the profits data set') sd(Forbes2000$profits, na.rm=TRUE) txtComment('This command provides the average profit by country') aggregate(Forbes2000$profits, by=list(Forbes2000$country), function(x) mean(x, na.rm=TRUE)) txtStop() SessionInfo() R version 2.13.2 (2011-09-30) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] TeachingDemos_2.7 loaded via a namespace (and not attached): [1] tools_2.13.2 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] (no subject)
Dear colleagues, I'm trying to fit a multinomial logistic regression for an ordinal variable. I see in the help pages for multinom in nnet that one should scale the predictors to the range 0-1. Is that really necessary? Also: can anyone clarify what the difference is between alternative-specific and individual-specific variables? Yours, Simon Kiss
[R] Listing tables together from random samples from a generated population?
Hi there, I'd like to demonstrate how the chi-squared distribution works, so I've come up with a sample data frame of two categorical variables:
.y<-data.frame(gender=sample(c('Male', 'Female'), size=1000, replace=TRUE, c(0.5, 0.5)), tea=sample(c('Yes', 'No'), size=1000, replace=TRUE, c(0.5, 0.5)))
And I'd like to create a list of 100 different samples of those two variables and the resulting 2x2 contingency tables:
table(.y[sample(nrow(.y), 100), ])
How would I combine these 100 tables into a list? I'd like to be able to go in and find some of the extreme values to show the sampling distribution of the chi-square values. I can already get a histogram of 100 different chi-squared values that shows the distribution nicely (see below), but I'd like to actually show the underlying tables, for demonstration's sake.
.z<-vector()
for (i in 1:100) { .z<-c(.z, chisq.test(table(.y[sample(nrow(.y), 200), ]))$statistic) }
hist(.z, xlab='Chi-Square Value', main="Chi-Squared Values From 100 different samples asking\nabout gender and tea/coffee drinking")
abline(v=3.84, lty=2)
Thank you in advance, Simon Kiss
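One way to keep the tables themselves, sketched under the assumption of a simulated population of 1000 rows: replicate() with simplify = FALSE returns a list of tables, and the chi-squared statistics can then be computed from that list, so each statistic stays matched to its table. Object names are illustrative.

```r
set.seed(2024)
# Simulated population of two independent categorical variables
pop <- data.frame(
  gender = sample(c("Male", "Female"), 1000, replace = TRUE),
  tea    = sample(c("Yes", "No"),      1000, replace = TRUE)
)

# 100 samples of 200 rows each, stored as a list of 2x2 tables
tabs <- replicate(100, table(pop[sample(nrow(pop), 200), ]),
                  simplify = FALSE)

# One chi-squared statistic per table, in the same order as the list
stats <- unname(sapply(tabs, function(tb) chisq.test(tb)$statistic))

# Tables beyond the 5% critical value, and the single most extreme one
extreme <- tabs[stats > 3.84]
tabs[[which.max(stats)]]
```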
[R] Help combining cell labelling and multiple mosaic plots
Dear colleagues, I'm using data that looks like .test and .test1 below to draw two mosaic plots with cell labelling (the row percentages from the tables). When I take out the pop=FALSE arguments in the mosaic calls and comment out the two lines labelling the cells, the plots are laid out exactly as I'd like: side by side. But I do require the cell labelling and the pop=FALSE arguments. I suspect I need to add a call to pushViewport or upViewport, but I'm not sure. Any advice is welcome.

library(vcd)
library(grid)
.test<-as.table(matrix(c(1, 2, 3, 4, 5, 6), nrow=3, ncol=2, byrow=TRUE))
.test<-prop.table(.test, 1)
.test1<-as.table(matrix(c(1, 2, 3, 4), nrow=2, ncol=2, byrow=TRUE))
.test1<-prop.table(.test1, 1)
dimnames(.test)<-list("Fluoride Cluster"=c('Beneficial\nand Safe', 'Mixed Opinion', 'Harmful With No Benefits'), "Governments Should Not Impose Treatment"=c('Agree', 'Disagree'))
dimnames(.test1)<-list("Vaccines Are Too Much To Handle"= c('Agree' , 'Disagree'), "Governments Should Not Oblige Treatment" =c('Agree', 'Disagree'))
grid.newpage()
pushViewport(viewport(layout=grid.layout(1,2)))
pushViewport(viewport(layout.pos.col=1))
mosaic(.test, gp=shading_hsv, pop=FALSE, split_vertical=FALSE, newpage=FALSE, labeling_args=list(offset_varnames=c(top=3), offset_labels=c(top=2)))
labeling_cells(text=round(prop.table(.test, 1), 2)*100, clip=FALSE)(.test)
popViewport()
pushViewport(viewport(layout.pos.col=2))
mosaic(.test1, gp=shading_hsv, newpage=FALSE, pop=FALSE, split_vertical=FALSE, labeling_args=list(offset_varnames=c(top=3), offset_labels=c(top=2)))
labeling_cells(text=round(prop.table(.test1, 1), 2)*100, clip=FALSE)(.test1)
popViewport(2)

Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 905 746 7606
[R] htmlParse hangs or crashes
Dear colleagues, each time I use htmlParse, R crashes or hangs. The URL I'd like to parse is included below, as are the results of a series of basic commands that describe what I'm experiencing. The results of sessionInfo() are at the bottom of the message. The thing is, htmlTreeParse appears to work just fine, although it doesn't appear to contain the information I need (the URLs of the articles linked to on this search page). Regardless, I'd still like to understand why htmlParse doesn't work. Thank you for any insight. Yours, Simon Kiss

myurl<-c("http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1=2001&date2mm=08&date2dd=25&date2=2011")
.x<-htmlParse(myurl)
class(.x)
#returns "HTMLInternalDocument" "XMLInternalDocument"
.x
#returns
*** caught segfault ***
address 0x1398754, cause 'memory not mapped'

Traceback:
1: .Call("RS_XML_dumpHTMLDoc", doc, as.integer(indent), as.character(encoding), as.logical(indent), PACKAGE = "XML")
2: saveXML(from)
3: saveXML(from)
4: asMethod(object)
5: as(x, "character")
6: cat(as(x, "character"), "\n")
7: print.XMLInternalDocument()
8: print()

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] XML_3.4-0 RCurl_1.5-0 bitops_1.0-4.1

Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 905 746 7606
[R] R hangs after htmlTreeParse
Dear colleagues, I'm trying to parse the HTML content from this webpage:

http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1=2001&date2mm=08&date2dd=25&date2=2011

using the following code:

library(RCurl)
library(XML)
myurl<-c("http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1=2001&date2mm=08&date2dd=25&date2=2011")
.x<-getURL(myurl)
htmlTreeParse(.x, asText=T)

This prints approximately 15 lines of the output from the HTML document and then mysteriously stops. The command line prompt does not reappear and force quit is the only option. I'm running R 2.13 on Mac OS 10.6 and the latest versions of XML and RCurl are installed. Yours, Simon Kiss
[R] Comparison of means in survey package
Dear list colleagues, I'm trying to come up with a test question for undergraduates to illustrate comparison of means from a complex survey design. The data for the example looks roughly like this:

mytest<-data.frame(harper=rnorm(500, mean=60, sd=1), party=sample(c("BQ", "NDP", "Conservative", "Liberal", "None", NA), size=500, replace=TRUE), natwgt=sample(c(0.88, 0.99, 1.43, 1.22, 1.1), size=500, replace=TRUE), gender=sample(c("Male", "Female"), size=500, replace=TRUE))

Using svyby I can get the means for each group of interest (primarily the party variable), but I can't get further to actually do the comparison of means. I saw a reference on the help listserv to the effect that the survey package does not do t-tests and that one should use svyglm. However, that was in 2009 and I see that there's a command, svyttest, in the package which seems to be on point. However, when I've tried that command I can't get it to work; it returns the following output:

t = NaN, df = 3255, p-value = NA
alternative hypothesis: true difference in mean is not equal to 0
sample estimates:
difference in mean
          38.80387

This is from my data, not the code above. Would there also be a way to do the comparison of means test between just two subgroups of a factor, and not on all factor levels? Using R 2.13 on Mac OS 10.6 and the latest version of the survey package. Yours, Simon Kiss

Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 905 746 7606
[R] Applying Weights in R Commander
Dear Colleagues, Do any R Commander plug-ins handle complex sampling procedures? I know that survey is probably the best one from the command line and the standard linear model can handle it in R Commander, but I'd like to be able to show students how to apply weights when doing simple descriptive statistics as well, in R Commander. Yours, Simon Kiss

Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 905 746 7606
[R] Apostrophes in R Commander in recode
Dear colleagues, I'm using R64 (2.13) on Mac OS 10.6.8 and I've encountered a problem with the recode function in R Commander: it cannot deal with apostrophes ( ' ). I've got a factor from the 2008 Canada Election Study (highest level of schooling) and some of the values include "Bachelor's Degree" and "Master's Degree". I've troubleshot the recode function for all the levels and it's really the apostrophe that is the problem. When entering "Bachelor's Degree"=1, I get the error message

[39] ERROR: Use only double-quotes (" ") in recode directives

I see also that the same problem exists in recode from the command line. There are two ways I can solve this myself, but both are a bit more complex than the context requires (exercises for an undergraduate class). I can use gsub from the command line to remove the apostrophes, or I can import the data file without using value labels as factor levels, and that would doubtless work. But the technical documentation for the CES is very poor; my students would end up having to open the original .sav file in PASW and hunt down what the underlying factor levels refer to. Is there a solution within R Commander? Yours, Simon Kiss

Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 905 746 7606
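For reference, the gsub workaround mentioned above might look roughly like the following in base R. The factor educ and its values are made up for illustration; the real CES variable name is not given in the post.

```r
# Hypothetical factor standing in for the CES schooling variable
educ <- factor(c("Bachelor's Degree", "Master's Degree",
                 "High School", "Bachelor's Degree"))

# Strip apostrophes from the level labels only; the underlying codes
# and the order of the levels are left untouched
levels(educ) <- gsub("'", "", levels(educ), fixed = TRUE)

levels(educ)
```

Because only the labels change, this could be run once on the data set before handing it to students, after which the recode dialog should no longer see any single quotes.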
Re: [R] cycling from x11 window in RCommander to graphics device window: Mac Os 10.6.8
Dear John, Command-tab does not work for me, but I have been able to get Expose to work, i.e. it does bring up all windows, including the X11 terminal. It will take a little getting used to, but it is functional. I apologize for cluttering the list with minutiae. Thank you! Yours, S.

On 2011-07-28, at 5:21 PM, John Fox wrote:
> Dear Simon,
>
> I'm sitting in front of a MacBook Pro and Command-tab works perfectly fine
> for me: Selecting X11 brings the R Commander Window to the front, and
> selecting R brings the Quartz graphics window to the front. I must admit that
> my habit in classroom demonstrations on a Mac is to use Expose to select
> Windows, but, unless I misunderstand your problem, Command-tab also works.
>
> I'm using R 2.13.1 under Mac OS X 10.6.7 with XQuartz 2.3.6 and
> tcltk-8.5.5-x11.
>
> I hope this helps,
> John
>
> John Fox
> Sen. William McMaster Prof. of Social Statistics
> Department of Sociology
> McMaster University
> Hamilton, Ontario, Canada
> http://socserv.mcmaster.ca/jfox/

Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 905 746 7606
[R] cycling from x11 window in RCommander to graphics device window: Mac Os 10.6.8
Dear Colleagues, I have recently installed R Commander on my Mac OS 10.6.8. I'd like to use it for an undergraduate class this year. Everything appears to be working fine, except for one thing. I cannot use Command-tab to cycle from the X11 window in which R Commander is running to any other window open in my workspace. This is particularly important because I cannot cycle to the graphics device window that is opened when I call a new plot. If I force quit the X11 window and R Commander, R remains running and I can see the graphics device window and the plot looks fine. But as you can imagine, this is quite laborious, having to restart. I've looked through the help documentation and tried reinstalling tcltk prior to opening up R Commander, but that does not address the problem. Any thoughts? Yours, Simon Kiss

Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 905 746 7606
[R] sum part of a vector
Dear colleagues, I have a data set that looks roughly like this:

mydat<-data.frame(state=c(rep("Alabama", 5), rep("Delaware", 5), rep("California", 5)), news=runif(15, min=0, max=8), cum.news=rep(0, 15))

For each state, I'd like to cumulatively sum the value of "news" and put that value in cum.news. I'm trying as follows but I get really weird results. One thing is that it keeps counting 0's as 1.

for (i in levels(mydat$state)) {
  mydat[mydat$state==i, ]$cum.news<-sapply(mydat[mydat$state==i, ]$news, function(x) sum(1:x))
}

I can sort of get the same sapply function to do what I want when working on a test vector:

test<-1:10
sapply(test, function(x) sum(1:x))

Any thoughts? Simon Kiss

Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 905 746 7606
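A note on why the loop misbehaves: sum(1:x) adds up the integers from 1 to x rather than the earlier values of news, and since 1:0 is the vector c(1, 0), a zero comes out as 1. A running total within each state can be sketched with the base functions ave() and cumsum(); the seed below is an assumption added only for reproducibility.

```r
set.seed(42)  # assumed seed, only for reproducibility
mydat <- data.frame(
  state = rep(c("Alabama", "Delaware", "California"), each = 5),
  news  = runif(15, min = 0, max = 8)
)

# ave() applies cumsum() separately within each state and returns the
# results in the original row order
mydat$cum.news <- ave(mydat$news, mydat$state, FUN = cumsum)
```

Since ave() preserves row order, no loop over the factor levels and no pre-filled cum.news column are needed.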
Re: [R] grep lines before or after pattern matched?
Josh, that's amazing. Is there any way to have it grab two different lines after the grep, say the second and the fourth line? There's some other information in the text file I'd like to grab. I could do two separate commands, but I'd like to know if this could be done in one command. Simon Kiss

On 2011-07-11, at 1:31 PM, Joshua Wiley wrote:
> If you know you can find the start of the document (say that line
> always starts with Document...), then:
>
> grep("Document+.", yourfile, value = FALSE) + 4
>
> should give you 4 lines after each line where Document occurred. No
> loop needed :)
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> https://joshuawiley.com/

Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 905 746 7606
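The index arithmetic discussed in this thread extends to several offsets at once. The sketch below assumes the file has already been read with readLines() into a character vector; here a small made-up vector x stands in for it, laid out as in the original post (newspaper name 4 lines below the "Document" line, date 7 lines below). The names x, start and picked are illustrative only.

```r
# Made-up stand-in for readLines("yourfile.txt")
x <- c("Document 1 of 100", "", "", "", "Newspaper A", "", "", "Monday 2011-07-11",
       "Document 2 of 100", "", "", "", "Newspaper B", "", "", "Tuesday 2011-07-12")

# Line numbers where each document starts
start <- grep("^Document [0-9]+", x)

# One column per offset, one row per document
picked <- sapply(c(name = 4, date = 7), function(k) x[start + k])
```

This is roughly what grep -A does at the shell, except that only the lines at the chosen offsets are kept. If the last document were truncated, start + k could run past the end of x and yield NA for that row.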
Re: [R] grep lines before or after pattern matched?
Hi Josh, Sorry for the insufficient introduction. This might work, but I'm not sure. The file that I have includes up to 100 documents (Document 1, Document 2, Document 3 ... Document 100) with the newspaper name following 4 lines below each Document number. I'm using readLines to get the text file into R and then trying to use grep to get the newspaper name for each record. But your idea of indexing the text object read into R with the line number where the newspaper name is found is a good one. I'll just have to come up with a loop to tell R to get the 4th, 8th, 12th, 16th line, etc. I'll see if I can get that to work. Simon

On 2011-07-11, at 12:45 PM, Joshua Wiley wrote:
> Dear Simon,
>
> Maybe I don't understand properly; if you are doing this in R, can't
> you just pick the line you want?
>
> Josh
>
> ## print your data to clipboard
> cat("Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date", file =
> "clipboard")
> ## read data in, and only select the 4th line to pass to grep()
> grep("pattern", x = readLines("clipboard")[4])
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> https://joshuawiley.com/

Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 905 746 7606
[R] grep lines before or after pattern matched?
Dear colleagues, I have a series of newspaper articles in a text file, downloaded from a text file. They look as follows:

Document 1 of 100
\n
\n
\n
Newspaper Name
\n
\n
Day Date

I have a series of grep scripts that can extract the date and convert it to a date object, but I can't figure out how to grep the newspaper name. There is no field ID attached to those lines. The best I can come up with would be to have the program grab the four lines following each line that matches the pattern "Document [0-9]". There is an argument to grep in Unix that can do this (grep -A4 'pattern' infile > outfile), but I don't know if there is an equivalent argument in R. Any thoughts? Yours, Simon Kiss

Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 905 746 7606
[R] error message trying to plot survival curves from hypothetical covariate profiles
Dear colleagues, following John Fox's advice in this article (http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf), I'm trying to create a new data frame to examine the differential survival curves from a combination of covariates. These are derived from a Cox proportional hazards model I fit to data about the diffusion of a particular policy across American states over a period of 7 years. The original dataset looks as follows:

'data.frame': 819 obs. of 10 variables:
 $ state   : Factor w/ 39 levels "Alabama","Arkansas",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ year    : num 2005 2005 2005 2006 2006 ...
 $ enviro  : num 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 ...
 $ ban     : num 0 0 0 0 0 0 0 0 0 0 ...
 $ partisan: Factor w/ 3 levels "democrat","mixed",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ news    : num 0 0 0 0 0 0 0 0 0 0 ...
 $ start   : num 2005 2005 2005 2006 2006 ...
 $ stop    : num 2006 2006 2006 2007 2007 ...
 $ risk    : num 1 2 3 1 2 3 1 2 3 1 ...
 $ evstatus: num 0 0 0 0 0 0 0 0 0 0 ...

I am modelling the survival time until the adoption of the policy as follows:

mod1<-Surv(newdat$start, newdat$stop, newdat$evstatus)
mymod1<-coxph(mod1 ~ news + enviro + partisan + cluster(state) + strata(evstatus), method=c("efron"), robust=TRUE)

Again, following Fox, I try to construct a data frame with a hypothetical covariate profile:

n<-data.frame(news=rep(c(1,4,8)), evstatus=as.factor(1:3), enviro=mean(newdat$enviro), partisan=c("democrat", "mixed", "republican"))
plot(survfit(mymod1, newdata=n))

Error in scale.default(x2, center = xcenter, scale = FALSE) : length of 'center' must equal the number of columns of 'x'

I've looked and someone encountered a similar error trying to plot predicted values from a stepwise regression. That issue did not appear to be solved. On the surface of it, it seems that I need to expand the 'n' data frame to have the same number of columns as the original (newdat), although perusing Fox's data and instructions, that does not appear to be the case there, so I'm a little bit lost. Any guidance is appreciated. Yours truly, Simon J. Kiss

Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 905 746 7606
[R] indexing list elements with lapply?
Dear colleagues, I have a list that looks like what the code below produces. I need a function to go through each list element and work on the second column of each list element (the first column is irrelevant to me; if the proposed function also works on the first column as a consequence of writing something simple, that's fine). I need to index the second column of each list element to the first item in that column. So for each list element, I need to divide each number in the second column by the first number in that column. This code does what I want, but it only works on one item in the list:

r[[1]][,2] / r[[1]][1,2]

I've tried working with this function but can't get it to work:

f<-function(x) {
  for (i in 1:5) {
    x[[i]][,2]/x[[i]][1,2]
  }
}
lapply(r, f)

But I get this error message:

Error in x[[i]][, 2] : incorrect number of dimensions

Hope someone can help. I'm grateful for any suggestions. Yours, Simon Kiss

## dataset
ff<-runif(10, 0.85, 1)
ff<-cbind(ff, 1-ff)
gg<-runif(10, 0.85, 1)
gg<-cbind(gg, 1-ff)
hh<-runif(10, 0.86, 1)
hh<-cbind(hh, 1-hh)
ii<-runif(10, 0.92, 1)
ii<-cbind(ii, 1-ii)
jj<-runif(10, 0.76, 1)
jj<-cbind(jj, 1-jj)
r<-list(ff, gg, hh, ii, jj)

Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 519 761 7606
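The error arises because lapply() already hands f one matrix at a time, so inside f the expression x[[i]] picks out a single number, which has no columns. In addition, a for loop returns NULL, so nothing would come back even without the error. A minimal sketch of the intended operation follows; the helper make() and the name indexed are assumptions used here to keep the example short.

```r
set.seed(1)  # assumed seed, only for reproducibility

# Helper (hypothetical) that builds one two-column matrix like ff, hh, ii, jj
make <- function(lo) {
  a <- runif(10, lo, 1)
  cbind(a, 1 - a)
}
r <- lapply(c(0.85, 0.85, 0.86, 0.92, 0.76), make)

# m is one matrix; divide its second column by that column's first entry
indexed <- lapply(r, function(m) m[, 2] / m[1, 2])
```

Each element of indexed starts at 1, and the remaining values are expressed relative to that first observation.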
[R] error in recode.default ....object '.data' not found
Dear colleagues, working with the data frame below, trying to reverse two variables, I get the error message below. I searched through the help list but could not find any postings which could help me solve the situation. I tried attaching and detaching the data frame to no avail. Yours, Simon Kiss

## DATA FRAME
'data.frame': 1569 obs. of 9 variables:
 $ equal     : num 3 4 3 2 3 4 2 3 2 2 ...
 $ disc      : num 3 2 3 3 2 2 3 3 3 3 ...
 $ family    : num 3 2 2 2 3 2 2 1 2 1 ...
 $ special   : num 3 3 4 4 3 3 4 4 3 4 ...
 $ immigrants: num 3 8 3 8 3 3 4 1 1 2 ...
 $ wedlock   : num 3 3 3 3 3 2 2 8 2 3 ...
 $ crime     : num 3 2 2 1 2 3 1 8 2 1 ...
 $ breakdown : num 3 3 3 2 2 4 8 2 2 4 ...
 $ nonwhites : num 2 4 3 3 2 2 3 4 3 3 ...

## RECODE
social$nonwhites<-recode(social$nonwhites, "1=4; 2=3; 3=2; 4=1; 8=NA; -9=NA")

## ERROR
Error in recode.default(social$nonwhites, "1=4; 2=3; 3=2; 4=1; 8=NA; -9=NA") : object '.data' not found

Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 519 761 7606
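The mention of recode.default in the error suggests that a recode() from some other attached package may be masking the one intended. That is a guess, since the post does not say which packages were loaded. Independently of the cause, this particular reversal of a 1 to 4 scale can be sketched in base R without any recode function; the vector below is made up to resemble the nonwhites column.

```r
# Made-up values resembling social$nonwhites, including the missing codes
nonwhites <- c(2, 4, 3, 3, 2, 8, 1, -9)

# Turn the "don't know" / "refused" codes into NA first
nonwhites[nonwhites %in% c(8, -9)] <- NA

# Reverse a 1..4 scale: 1 becomes 4, 2 becomes 3, and so on; NA stays NA
reversed <- 5 - nonwhites
```

If a masking conflict does turn out to be the cause, calling the intended function with an explicit namespace prefix (for example car::recode, assuming car is the package meant) would be another thing to try.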
[R] sampling design runs with no errors but returns empty data set
Dear colleagues, I'm working with the 2008 Canada Election Study (http://www.queensu.ca/cora/_files/_CES/CES2008.sav.zip), trying to construct a weighted national sample using the survey package. Three weights are included in the national survey (a household weight, a provincial weight and a national weight which is the product of the first two). In the following code I removed cases with missing national weights and tried to construct the sample from advice I've gleaned from the documentation for the survey package and other help requests. There are no errors, but the data frame (weight_test) contains no data. What am I missing? Yours, Simon Kiss

P.S. The code is only reproducible if the data set is downloadable. I'm not sure.

ces<-read.spss(file.choose(), to.data.frame=TRUE, use.value.labels=FALSE)
missing_data<-subset(ces1, !is.na(ces08_NATWGT))
weight_test<-svydesign(id=~0, weights=~ces08_NATWGT, data=missing_data)

Note: this is some reproducible code that creates a data set that is a very stripped down version of what I'm working with, but with this, the svydesign function appears to work properly.

mydat<-data.frame(ces08_HHWGT=runif(3000, 0.5, 5), ces08_PROVWGT=runif(3000, 0.6, 1.2), party=sample(c("NDP", "BQ", "Lib", "Con"), 3000, replace=TRUE), age=sample(seq(18, 72, 1), 3000, replace=TRUE), income=sample(seq(21, 121, 1), 3000, replace=TRUE))
mydat$ces08_NATWGT<-mydat$ces08_HHWGT*mydat$ces08_PROVWGT
weight_test<-svydesign(id=~1, weights=~ces08_NATWGT, data=mydat)

Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 519 761 7606
[R] baseline hazard function
Dear colleagues, I have the following dataset. It is modelled on the data included in Box-Steffensmeier and Jones, "Event History Modeling". Using the following code, I try to find the baseline hazard function:

haz_1<-muhaz(bpa$time, bpa$censored, subset=(bpa$year=="2010" | bpa$ban=="1"), min.time=1, max.time=3)

I think I'm doing everything right, but what I don't understand is how to derive a duration dependency coefficient from the values contained in the muhaz object as per Box-Steffensmeier and Jones' recommendations in Ch. 5 of Event History Modeling. I get the following:

summary(haz_1)
Number of Observations .......... 50
Censored Observations ........... 43
Method used ..................... Local
Boundary Correction Type ........ Left and Right
Kernel type ..................... Epanechnikov
Minimum Time .................... 1
Maximum Time .................... 3
Number of minimization points ... 51
Number of estimation points ..... 101
Pilot Bandwidth ................. 0.25
Smoothing Bandwidth ............. 1.27
Minimum IMSE .................... 6716.9

Can anyone provide any advice? Yours, Simon Kiss

'data.frame': 147 obs. of 7 variables:
 $ state   : Factor w/ 50 levels "Alabama","Alaska",..: 1 1 1 2 2 2 3 3 3 4 ...
 $ partisan: Factor w/ 3 levels "democrat","mixed",..: 1 1 1 2 2 2 3 3 3 1 ...
 $ ban     : num 0 0 0 0 0 0 0 0 0 0 ...
 $ year    : num 2008 2009 2010 2008 2009 ...
 $ news    : num 1.67 1.67 0 2 0 ...
 $ time    : num 1 2 3 1 2 3 1 2 3 1 ...
 $ censored: num 0 0 0 0 0 0 0 0 0 0 ...

Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 519 761 7606
[R] subsetting based on joint values of criteria
Dear colleagues,

I have a dataset that looks as below. I would like to make a new dataset that excludes the cases which are joint conjunctions of particular state names and years: Connecticut and 2010, Maryland and 2010, and Vermont and 2010. I'm trying the following subset code:

newdata<- subset(bpa, (!State=="Connecticut" & year<"2010"))

It appears that it's evaluating the two criteria independently and not jointly, so this is returning all cases in 2008 and 2009, leaving out Connecticut for those years as well. How do I get subset to return a dataset based on the joint occurrence of values of two variables?

Yours, Simon Kiss

str(bpa)
'data.frame': 150 obs. of 5 variables:
 $ State   : Factor w/ 50 levels "Alabama","Alaska",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ year    : num 2008 2008 2008 2008 2008 ...
 $ ban     : num 0 0 0 0 0 0 0 0 0 0 ...
 $ partisan: Factor w/ 3 levels "democrat","mixed",..: 1 1 1 1 1 1 1 2 3 2 ...
 $ news    : num 1.67 2 0 0 2.38 ...
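A note on the joint condition asked about above: one way to express "drop a row only when both things are true" is to negate the conjunction as a whole rather than each piece. The following is a minimal sketch, assuming a small made-up stand-in for bpa (only the column names State and year come from the post; the rows and the name drop_states are hypothetical).

```r
# Hypothetical stand-in for the poster's 'bpa' data (values are made up)
bpa <- data.frame(
  State = factor(rep(c("Alabama", "Connecticut", "Maryland", "Vermont"), times = 3)),
  year  = rep(c(2008, 2009, 2010), each = 4)
)

# States to drop, but only for the year 2010
drop_states <- c("Connecticut", "Maryland", "Vermont")

# Negate the whole conjunction: keep every row that is NOT
# (one of these states AND year 2010)
newdata <- subset(bpa, !(State %in% drop_states & year == 2010))

print(nrow(newdata))
```

In the original attempt the negation applies only to the State test, so the condition reads "not Connecticut, and before 2010", which matches the behaviour described in the post.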
[R] Counting dates in arbitrary ranges
Dear Colleagues,

I have a data set that looks as below. I'd like to count the number of dates in a series of arbitrary ranges (breaks), i.e. not pre-defined breaks such as months, quarters or years. table(format()) produces ideally formatted output, but table() does not appear to accept arbitrary ranges. I also tried converting the dates to numeric and using hist to try to get the data, but that doesn't work either. cut appears to accept an arbitrary range, but I could only get it to produce NAs. Any suggestions?

Yours, Simon Kiss

mydata<-list(x=seq(as.Date("2007-05-01"), as.Date("2009-09-10"),"days"), y=seq(as.Date("2007-06-16"), as.Date("2009-11-12"),"days"))
table(format(mydata[[1]], "%Y"))
t_1<-hist(as.numeric(mydata[[1]], breaks=c("14056", "14421")))$counts
cut(mydata[[1]], breaks=c(as.Date("2008-06-26"), ("2009=06-26")))
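For counting dates in arbitrary ranges, one option is to treat the cut points as Date values and bin on the underlying day numbers. This is a sketch only, assuming the first date series from the post; the particular break dates and the names breaks, bin and counts are illustrative choices, not part of the original question.

```r
x <- seq(as.Date("2007-05-01"), as.Date("2009-09-10"), by = "day")

# Arbitrary cut points. The last one is one day past the final date so that
# every date falls into some left-closed interval [start, end).
breaks <- as.Date(c("2007-05-01", "2008-06-26", "2009-06-26", "2009-09-11"))

# findInterval() returns, for each date, the index of the interval it is in
bin <- findInterval(as.numeric(x), as.numeric(breaks))

# Count the dates per interval and label each count with its date range
counts <- tabulate(bin, nbins = length(breaks) - 1)
names(counts) <- paste(head(breaks, -1), "to", tail(breaks, -1) - 1)
print(counts)
```

Dates that fall before the first break or on or after the last break would not land in any of these intervals, which may be related to the NAs seen with cut when the breaks do not span the data.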
Re: [R] Importing multiple text files with lapply.
Hi Jim,

Ultimately, I'm going to want to count the frequency of dates by particular time periods (months, quarters, years) for each state and then plot the data. I know there are commands in ggplot2 that will do that, so I'm not too worried about that, but I was stuck on getting 50 text files (one for each state) read into R. For the record, using read.table individually on a state file will get it into a usable format, but it wasn't working in conjunction with lapply. To reiterate, the home folder has 50 .txt files, each with a column of dates in the format I sent you. I will try readLines and see if I can get it to loop through.

Yours, Simon Kiss

On 2011-01-17, at 7:44 PM, jim holtman wrote:

> It sounds like you want to use 'readLines' and not 'read.table'
>
>> x <- readLines(textConnection("January 11, 2009
> + January 11, 2009
> + October 19, 2008
> + October 13, 2008
> + August 16, 2008
> + June 19, 2008
> + April 19, 2008
> + April 16, 2008
> + February 9, 2008
> + September 2, 2007"))
>> closeAllConnections()
>> x
> [1] "January 11, 2009" "January 11, 2009" "October 19, 2008"
> "October 13, 2008" "August 16, 2008"
> [6] "June 19, 2008" "April 19, 2008" "April 16, 2008"
> "February 9, 2008" "September 2, 2007"
>
> What exactly are you going to do with the data after you read it in?
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
Re: [R] Importing multiple text files with lapply.
readLines worked great Jim, thanks!

Simon Kiss
Re: [R] Importing multiple text files with lapply.
Dear Jim,

Yes, it's true, the data are separated onto new lines as follows:

January 11, 2009
January 11, 2009
October 19, 2008
October 13, 2008
August 16, 2008
June 19, 2008
April 19, 2008
April 16, 2008
February 9, 2008
September 2, 2007

I tried your attempt and it didn't work either; it returned the error message:

Error in FUN(X[[1L]], ...) :
  'file' must be a character string or connection

On 2011-01-17, at 2:02 PM, jim holtman wrote:

> try:
>
> mylist <- lapply(a, read.table, header = TRUE, sep = '\n')
>
> also is the separator really '\n' meaning a new-line? What exactly
> does the data look like?
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
[R] Importing multiple text files with lapply.
Hello,

I'm trying to read in 50 text files with dates as content to create a list of tables.

a is the list of filenames that need to be read in.

The following command returns the following error:

mylist<-lapply(a, read.table(header=TRUE, sep="\n"))

Error in read.table(header = TRUE, sep = "\n") :
  element 1 is empty;
  the part of the args list of 'is.character' being evaluated was:
  (file)

Does anyone have any suggestions?

Yours, Simon Kiss
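The readLines approach that resolved this thread can be sketched as follows. This is an illustration under stated assumptions: the two temporary files, their names and their contents are made up to stand in for the 50 state files, and the Date conversion at the end assumes an English locale for month names.

```r
# Create two small temporary files standing in for the 50 state files
# (file names and contents are made up for illustration)
tmp_dir <- tempfile("states")
dir.create(tmp_dir)
writeLines(c("January 11, 2009", "October 19, 2008"),
           file.path(tmp_dir, "alabama.txt"))
writeLines(c("June 19, 2008", "April 16, 2008", "September 2, 2007"),
           file.path(tmp_dir, "alaska.txt"))

# A character vector of full file paths
a <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)

# Pass the function itself; lapply() supplies each path as its first argument
mylist <- lapply(a, readLines)
names(mylist) <- basename(a)

# Optional: parse the strings to Date (%B assumes English month names)
dates <- lapply(mylist, as.Date, format = "%B %d, %Y")

print(lengths(mylist))
```

The first error in the thread arises because read.table(header=TRUE, sep="\n") is a call that is evaluated immediately, with no file argument, before lapply ever runs. The second error may indicate that the elements of a were not plain character strings; list.files() with full.names = TRUE is one way to get a character vector of paths.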
[R] 45 Degree labels on barplot? Help understanding code previously posted.
Dear colleagues,

I found a line or two of code in the help archives from Uwe Ligges about creating slanted x-labels for a barplot, and it works well for my purposes (code below). However, I was hoping someone could explain to me precisely what the code is doing. I'm aware it's invoking the text command, and I know the first two arguments to text are x and y co-ordinates. I'm also aware that par("usr")[3] is grabbing the third element of the vector of plotting co-ordinates. But I tried replacing par("usr")[3] with just "0" and that didn't work; all the labels got bunched up on the left. Is it necessary to create a new object via "barplot" and then quote that in the x,y coordinates of text? Like I said, the code works great, but I'm trying to actually understand the rationale behind the elements so I can apply it in future.

Yours, Simon Kiss

#Reproducible Code
mydat<-data.frame(countries=c("Canada", "Denmark", "France", "United Kingdom", "Germany", "Australia", "New Zealand", "Switzerland", "Belgium", "Netherlands"), stories_total=c(429, 25, 239, 99, 100, 96, 18, 21, 0, 6), avg=c(4.165048544, 6.25, 6.459459459, 0.908256881, 1.923076923, 1.103448276, 1.058823529, 1.615384615, 0, 0.107142857), steps=c(2, 2, 2, 0,1, 1, 1, 0,0,0), newspapers=c(103, 4, 37, 109, 52, 87, 17, 13, 10, 56))
mydat.sort1<-mydat[order(-mydat$avg), ]
myplot<-barplot(mydat.sort1$avg, col=c("black", "black", "black", "grey", "white", "grey", "grey", "white", "white", "white"), ylim=c(0,7), main="Regulatory Action On Bisphenol A By Newspaper Coverage")
col.vec=c("black", "grey", "white")
legend("topright", col=col.vec, fill=c("black", "grey", "white"), legend=c("Meaningful Ban", "Recommendations To Withdraw", "No Legislative Action"))
labels=mydat.sort1$countries
#These lines create the labels
text(myplot, par("usr")[3], labels=labels, srt=35, offset=1, adj=1, xpd=TRUE)
axis(2)
par("usr")[3]
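The pieces of the label code can be looked at one at a time. The sketch below assumes a tiny made-up data set (heights and labs are hypothetical) and uses pdf(NULL) only so that it runs without opening a window; on an interactive device that line and dev.off() can be dropped.

```r
heights <- c(6.5, 6.3, 4.2, 1.9)                       # made-up bar heights
labs <- c("France", "Denmark", "Canada", "Germany")   # made-up labels

pdf(NULL)  # throwaway graphics device so the sketch runs non-interactively

# barplot() draws the bars and returns the x positions of their midpoints
mids <- barplot(heights, ylim = c(0, 7))

# par("usr") is c(x_left, x_right, y_bottom, y_top) of the plot region,
# in the same user coordinates as the bars
usr <- par("usr")

# x = one midpoint per bar, y = bottom edge of the plot region.
# srt rotates the text, adj = 1 right-aligns each label at its (x, y) point,
# and xpd = TRUE permits drawing outside the plot region.
text(x = mids, y = usr[3], labels = labs, srt = 45, adj = 1, xpd = TRUE)

invisible(dev.off())
print(as.numeric(mids))
```

So the object returned by barplot is what spreads the labels out horizontally: it holds one x position per bar. If that argument were replaced by a single number, every label would be drawn at the same x position, which would look like labels bunched together.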
[R] separate elements of a character vector
Dear colleagues,

This seems like an easy problem, and I found some suggestions in the help list, which I've incorporated, but I can't quite get it right. I want to add a series of years as a second x-axis category label. I generate them with test and test_2 below, format them with some spacing (which is the suggestion I took from the R-list), concatenate them, and then write them with mtext. At the end, the labels in test are bunched up together in the center of the plot window. Can anyone suggest a way to space out the elements of "test" to look like evenly-spaced x-labels?

Yours, Simon Kiss

x1<-rnorm(500)
plot(x1)
test<-seq(1987, 2002, by=1)
test_2<-seq(2003, 2006, by=1)
test<-format(c(test, test_2), width=5)
mtext(test, side=1, line=2)
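One way to space the year labels is to give mtext an explicit position for each label through its at argument. A minimal sketch, assuming the labels should be spread evenly across the plotted index range; the names years and at_pos are illustrative, and pdf(NULL) is there only so the code runs without a window.

```r
x1 <- rnorm(500)
years <- c(1987:2002, 2003:2006)  # the 20 labels from the post

pdf(NULL)  # throwaway graphics device
plot(x1)

# One x position (in user coordinates) per label, evenly spread over 1..500
at_pos <- seq(1, length(x1), length.out = length(years))

# 'at' places each string at its own x position instead of one shared spot
mtext(years, side = 1, line = 2, at = at_pos, cex = 0.7)

invisible(dev.off())
```

Without at, all the strings are placed relative to the same point on the margin, which is consistent with the bunching described above; padding the strings with format() does not change where each one is anchored.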
Re: [R] Create single vector after looping through multiple data frames with GREP
Hello all, I changed the subject line of the e-mail because the question I'm posing now is different from the first one. I hope that this is proper etiquette. However, the original chain is included below. I've incorporated bits of both Ethan's and Brian's code into the script below, but there's one aspect I can't get my head around. I'm totally new to programming with control structures. The reproducible code below creates a list containing 19 data frames, one for each year of the "Most Important Problem" survey data for Canada. What I'd like at this stage is a loop where I can search through all the data frames for rows containing the search term and then bind the rows together in a plottable format. At the bottom of the code below, you'll find my first attempt to make use of a search string and to put it into a plottable format. It only partially works. I can only get the numbers for one year, where I'd like to be able to get a string of numbers for several years. But, on the upside, grep appears to do the trick in terms of selecting rows. Can anyone suggest a solution?
Yours truly, Simon Kiss

#This is the reproducible code to set-up all the data frames
library(XML)
#This gets the data from the web and lists them
mylist <- paste ("http://www.queensu.ca/cora/_trends/mip_", c(1987:2001,2003:2006), ".htm", sep="")
alltables <- lapply(mylist, readHTMLTable)
#convert to dataframes
r<-lapply(alltables, function(x) {as.data.frame(x)} )
#This is just some house-cleaning; structuring all the tables so they are uniform
r[[1]][3]<-r[[1]][2]
r[[1]][2]<-c(" ")
r[[2]][4]<-r[[2]][2]
r[[2]][5]<-r[[2]][3]
r[[2]][2:3]<-c(" ")
r[[3]][4:5]<-r[[3]][3:4]
r[[3]][3]<-c(" ")
#This loop deletes some superfluous columns and rows, turns the first column into character strings and the data into numeric
for (i in 1:19) {
n.rows<-dim(r[[i]])[1]
r[[i]] <- r[[i]][15:n.rows-3, 1:5]
n.rows<-dim(r[[i]])[1]
row.names(r[[i]]) <-NULL
names(r[[i]]) <- c("Response", "Q1", "Q2", "Q3", "Q4")
r[[i]][, 1]<-as.character(r[[i]][,1])
#r[[i]][,2:5]<-as.numeric(as.character(r[[i]][,2:5]))
r[[i]][, 2:5]<-lapply(r[[i]][, 2:5], function(x) {as.numeric(as.character(x))})
#n.rows<-dim(r[[i]])[1]
#r[[i]]<-r[[i]][9
}
#This code is my first attempt at introducing a search string, getting the rows, binding and plotting;
economy<-r[[10]][grep('Economy', r[[10]][,1]),]
economy_2<-r[[11]][grep('Economy', r[[11]][,1]),]
test<-cbind(economy, economy_2)
plot(as.numeric(test), type='l')
#here's another attempt I'm trying
economy<-data.frame
for (i in 15:19) {
economy[i,] <-r[[i]][grep('Economy', r[[i]][,1]), ]
}

Begin forwarded message:

> From: Simon Kiss
> Date: October 7, 2010 4:59:46 PM EDT
> To: Simon Kiss
> Subject: Fwd: [R] Converting scraped data
>
> Begin forwarded message:
>
>> From: Ethan Brown
>> Date: October 6, 2010 4:22:41 PM GMT-04:00
>> To: Simon Kiss
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Converting scraped data
>>
>> Hi Simon,
>>
>> You'll notice the "test" data.frame has a whole mix of characters in
>> the columns you're interested, including a "-" for missing values, and
>> that the columns you're interested in are in fact factors.
>>
>> as.numeric(factor) returns the level of the factor, not the value of
>> the level. (See ?levels and ?factor)--that's why it's giving you those
>> irrelevant integers. I always end up using something like this handy
>> code snippet to deal with the situation:
>>
>> unfactor <- function(factors)
>> # From http://psychlab2.ucr.edu/rwiki/index.php/R_Code_Snippets#unfactor
>> # Transform a factor back into its factor names
>> {
>> return(levels(factors)[factors])
>> }
>>
>> Then, to get your data to where you want it, I'd do this:
>>
>> require(XML)
>> theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
>> tables <- readHTMLTable(theurl)
>> n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
>> class(tables)
>> test<-data.frame(tables, stringsAsFactors=FALSE)
>>
>>
>> result <- test[11:42, 1:5] #Extract the actual data we want
>> names(result) <- c("Response", "Q1", "Q2","Q3","Q4")
>> for(i in 2:5) {
>> # Convert columns to factors
>> result[,i] <- as.numeric(unfactor(result[,i]))
>> }
>> result
>>
>> From here
[R] Converting scraped data
Dear Colleagues,

I used this code to scrape data from the URL contained within. This code should be reproducible.

library(XML)
theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
tables <- readHTMLTable(theurl)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
class(tables)
test<-data.frame(tables, stringsAsFactors=FALSE)
test[16,c(2:5)]
as.numeric(test[16,c(2:5)])
quartz()
plot(c(1:4), test[15, c(2:5)])

Calling the values from the row of interest using test[16, c(2:5)] brings them up as represented on the screen, but plotting them or coercing them to numeric changes the values, and in a way that doesn't make sense to me. My intuition is that there is something going on with the way the characters are coded or classed when they're scraped into R. I've looked around the help files for converting from character to numeric but can't find a solution. I also tried this: as.numeric(as.character(test[16,c(2:5)])) and that also changed the values from what they originally were. I'm grateful for any suggestions.

Yours, Simon Kiss
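The behaviour described above is what one would expect if the scraped columns are factors. A small sketch of the difference, assuming made-up values (f, test and the column names Q1 to Q3 here are illustrative, not the scraped table):

```r
# A column as it might arrive from a scraped table: numbers stored as a factor
f <- factor(c("12", "7", "30"))

codes  <- as.numeric(f)                # internal level codes, not the values
values <- as.numeric(as.character(f))  # the numbers actually displayed

# For a one-row slice of a data frame, convert each column separately;
# sapply() walks over the columns of the slice
test <- data.frame(Q1 = factor("12"), Q2 = factor("7"), Q3 = factor("30"))
row_values <- sapply(test[1, ], function(col) as.numeric(as.character(col)))

print(codes)
print(values)
print(row_values)
```

Applying as.character() to a whole data-frame row at once operates on a list of factor columns rather than on a single factor, which is probably why that attempt also returned codes; converting column by column avoids this.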
[R] group means of multi-way table?
Hello,

Can someone tell me how to generate the means for a data frame that looks like this? My data frame has many more variables, but I won't bother you with those; these are the ones that I'm interested in. z is the outcome variable. I'd like to find the mean score of z for NDP managers, Conservative managers and Liberal managers, and then for a few other configurations. I've played around with aggregate, tapply and by, but I can't get it to work.

Cordially, Simon Kiss

mydata<-data.frame(
 x=as.factor(sample(c("labourers", "salaried", "managers"), size=300, replace=TRUE)),
 y=as.factor(sample(c("NDP", "Green", "Liberal", "Conservative"), size=300, replace=TRUE)),
 z=as.numeric(sample(1:4, size=300, replace=TRUE)))

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
223 Grand River Hall, 171 Colborne Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 519 761 7606
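Both aggregate and tapply can produce the cell means asked about here. A minimal sketch on simulated data of the same shape as in the post (the seed and the object names cell_means and mean_table are arbitrary):

```r
set.seed(42)
mydata <- data.frame(
  x = factor(sample(c("labourers", "salaried", "managers"), 300, replace = TRUE)),
  y = factor(sample(c("NDP", "Green", "Liberal", "Conservative"), 300, replace = TRUE)),
  z = sample(1:4, 300, replace = TRUE)
)

# Long format: one row per occupation/party combination
cell_means <- aggregate(z ~ x + y, data = mydata, FUN = mean)

# Wide format: occupations in rows, parties in columns
mean_table <- with(mydata, tapply(z, list(x, y), mean))

# Means of z for managers, by party
print(mean_table["managers", c("NDP", "Conservative", "Liberal")])
```

The formula interface of aggregate needs the grouping variables on the right-hand side of the tilde; tapply needs them wrapped in a list when there is more than one.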
[R] Grouping and stacking bar plot for categorical variables
Hi all,

I have a series of categorical variables that look just like this:

welfare=sample(c("less", "same", "more"), 1000, replace=TRUE)
education=sample(c("less", "same", "more"), 1000, replace=TRUE)
defence=sample(c("less", "same", "more"), 1000, replace=TRUE)
egp=sample(c("salariat", "routine non-manual", "self-employed, farmers", "skilled labour, foremen", "unskilled labour", "social and cultural specialists"), 1000, replace=TRUE)

welfare, education and defence are responses to a series of questions about whether the respondent supports less, the same or more spending on an issue. egp is a class category. What I would like is a barplot that is both stacked and grouped. The x-axis categories should be the egp class categories. Within each class category I would like a cluster of stacked bars that show the distribution of spending support for each issue. Can anyone suggest something?

Yours, Simon Kiss

*
Simon J. Kiss, PhD
SSHRC and DAAD Post-Doctoral Fellow
John F. Kennedy Institute of North America Studies
Free University of Berlin
Lansstraße 7-9
14195 Berlin, Germany
Cell: +49 (0)1525-300-2812, Web: http://www.jfki.fu-berlin.de/index.html
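One base-graphics way to get bars that are both stacked and grouped is to build a matrix with one column per class-and-issue pair and use the space argument of barplot to separate the clusters. This is a sketch under assumptions: the data are simulated as in the post, and the names d, props, mat and group_mid are illustrative.

```r
set.seed(1)
n <- 1000
resp <- c("less", "same", "more")
d <- data.frame(
  welfare   = factor(sample(resp, n, replace = TRUE), levels = resp),
  education = factor(sample(resp, n, replace = TRUE), levels = resp),
  defence   = factor(sample(resp, n, replace = TRUE), levels = resp),
  egp = sample(c("salariat", "routine non-manual", "self-employed, farmers",
                 "skilled labour, foremen", "unskilled labour",
                 "social and cultural specialists"), n, replace = TRUE)
)
issues <- c("welfare", "education", "defence")

# For each issue: share of less/same/more within each class (columns sum to 1)
props <- lapply(issues, function(v) prop.table(table(d[[v]], d$egp), margin = 2))
classes <- colnames(props[[1]])

# Interleave so that each class contributes three adjacent columns (one per issue)
mat <- do.call(cbind, lapply(classes, function(cl) sapply(props, function(p) p[, cl])))
colnames(mat) <- rep(issues, times = length(classes))

pdf(NULL)  # throwaway graphics device
# A wide gap before the first bar of each cluster, narrow gaps within a cluster
mids <- barplot(mat, space = rep(c(1, 0.1, 0.1), times = length(classes)),
                col = c("grey20", "grey60", "grey90"), las = 2, cex.names = 0.6,
                legend.text = rownames(mat))
# Label each cluster with its class name, centred over its three bars
group_mid <- tapply(mids, rep(seq_along(classes), each = 3), mean)
mtext(classes, side = 3, at = as.numeric(group_mid), cex = 0.5)
invisible(dev.off())
```

Each bar is one issue within one class, stacked by response, and each cluster of three bars is one class. Faceting in a plotting package would be another route to the same layout.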
Re: [R] generate irregular series of dates
Dear Gabor,

Yours worked really well. For what it's worth, here is the final product. I also added a line or two to reconvert the dates back to written form (October 15 2010).

require(chron)
dd <- seq(as.Date("INSERT FIRST DATE OF CLASSES IN TERM HERE"), as.Date("INSERT LAST DAY OF CLASSES IN TERM HERE"), "day")
a=as.character(dd[weekdays(dd) %in% c("INSERT FIRST WEEKDAY OF CLASS", "INSERT SECOND WEEKDAY OF CLASS")])
a=chron(a, format = c(dates="y-m-d"), out.format=c(dates="month day, year"))
write.table(a, "INSERT FILE LOCATION WHERE YOU WISH TO SAVE DATES", quote=FALSE, col.names=FALSE, row.names=FALSE)

Thanks a lot.

Simon Kiss

On 29-Jun-10, at 9:21 PM, Gabor Grothendieck wrote:

> On Tue, Jun 29, 2010 at 6:22 AM, Simon Kiss wrote:
>> Dear colleagues, particularly academic ones,
>> So I'm creating a Microsoft Word template for myself so that every time I
>> teach a new course, I don't have to enter in the dates manually for each
>> class session. I'd like to use an R script that can generate an irregular
>> series of dates starting from one date (semester begin) to another
>> (semester end) using an irregular interval in between (Tuesdays and
>> Thursdays, for example). I know that a regular series of dates is no
>> problem, but what about an irregular series?
>
> Generate all the dates in the range of interest and then pick off the
> Tuesdays and Thursdays:
>
> dd <- seq(as.Date("2010-01-01"), as.Date("2010-12-31"), "day")
> dd[weekdays(dd) %in% c("Tuesday", "Thursday")]
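A variation on the approach above that does not depend on weekday names in the current language, and that formats the written dates in base R. This is a sketch only; the two-week date range and the names first_day, last_day and class_days are placeholders.

```r
first_day <- as.Date("2010-01-01")  # placeholder for the first day of term
last_day  <- as.Date("2010-01-15")  # placeholder for the last day of term
dd <- seq(first_day, last_day, by = "day")

# $wday runs from 0 (Sunday) to 6 (Saturday), whatever the locale
class_days <- dd[as.POSIXlt(dd)$wday %in% c(2, 4)]  # Tuesdays and Thursdays

# Written form; the month name produced by %B follows the current locale
written <- format(class_days, "%B %d, %Y")

print(class_days)
print(written)
```

weekdays() returns names in the session's language, so matching on "Tuesday" and "Thursday" works only in an English locale; the numeric weekday avoids that assumption.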
[R] generate irregular series of dates
Dear colleagues, particularly academic ones,

I'm creating a Microsoft Word template for myself so that every time I teach a new course, I don't have to enter the dates manually for each class session. I'd like to use an R script that can generate an irregular series of dates from one date (semester begin) to another (semester end), using an irregular interval in between (Tuesdays and Thursdays, for example). I know that a regular series of dates is no problem, but what about an irregular series?

Yours, Simon Kiss
[R] Stacked Histogram, multiple lines for dates of news stories?
Dear colleagues,

I have extracted the dates of several news stories from a newspaper database to chart coverage trends of an issue over time. They are in a data frame that looks just like the one generated by the reproducible code below. I can already generate a histogram of the dates with various intervals (weeks, months, quarters, years) using hist.Date. However, there are two other things I'd like to do.

First, I'd like to create a stacked histogram so that one could see whether one newspaper really pushed coverage of an issue at a certain point while others followed later on. Second, or alternatively, I would like to do a line graph of the same data for the different papers to represent the same trends.

What I'm finding challenging is that I don't have counts of the number of stories on each day or in each week or in each month; I just have the dates themselves. The hist.Date command was very useful in turning those into bins, but I'd like to push it a bit further and do a stacked histogram or a multiple line chart. Can anyone suggest a way to go about doing this?

I should say, I played around in Hadley Wickham's ggplot package and looked at his website, and there is a way to render multiple lines here: http://had.co.nz/ggplot2/scale_date.html but it was not clear to me how to plot just the dates or an index of the dates, as I don't have a value for the y axis other than the number of times a story was published in that time frame. Regardless, I hope someone can suggest something.

Yours, Simon J. Kiss

test=sample(1:3, 50, replace=TRUE)
test=as.factor(test)
levels(test)=c("Star", "Globe and Mail", "Post")
test2=ISOdatetime(sample(2004:2009, 50, replace=TRUE), sample(1:12, size=50, replace=TRUE), sample(1:30, 50, replace=TRUE), 0,0,0)
test2=as.Date(test2)
test_df=data.frame(test, test2)
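Since only the dates are available, the counts can be built first by binning the dates and cross-tabulating the bins with the newspaper; the resulting table can then feed either a stacked bar chart or a line chart. A minimal sketch, assuming simulated data of the same shape as in the post (the column names paper and date, the quarterly bins and the seed are illustrative choices):

```r
set.seed(1)
papers <- c("Star", "Globe and Mail", "Post")
all_days <- seq(as.Date("2004-01-01"), as.Date("2009-12-31"), by = "day")
test_df <- data.frame(
  paper = factor(sample(papers, 50, replace = TRUE), levels = papers),
  date  = sample(all_days, 50, replace = TRUE)
)

# Bin the dates by quarter, then count stories per paper per quarter
bins <- cut(test_df$date, breaks = "quarter")
counts <- table(test_df$paper, bins)

pdf(NULL)  # throwaway graphics device
# Stacked bars: one bar per quarter, one segment per newspaper
barplot(counts, col = c("black", "grey", "white"), legend.text = TRUE,
        las = 2, cex.names = 0.6)
# Alternative view: one line per newspaper over the same quarters
matplot(t(unclass(counts)), type = "l", lty = 1, col = 1:3,
        xlab = "Quarter (index)", ylab = "Number of stories")
invisible(dev.off())
```

The counts table is the y value that seemed to be missing: once it exists, most plotting functions can use it directly.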
[R] Comparing a 4-point and 5-point Likert scale
Help with survey data:

Hello R colleagues,

I hope this is an appropriate place to direct this question. It relates specifically to the comparability of a 5-point Likert scale and a 4-point Likert scale. One question in my dataset asks "How much should be done to reduce the gap between rich and poor?": much more, somewhat more, about the same, somewhat less, or much less. The second question asks: "People who can afford to should be able to pay for their own health care": strongly agree, agree, disagree, strongly disagree.

Now, assuming that I rescale them so that 1 equals the most egalitarian position and the highest number (4 or 5) equals the least egalitarian position, how can I make these two results comparable? Two ways come to mind. One is to collapse both into a dichotomous variable and do a logistic regression on both. The danger here is that I have to decide what to do with the middle position in the first question: assign it to the egalitarian or the non-egalitarian category. A second way would be to multiply the scores in the first question by 4 (to get results that are either 4, 8, 12, 16 or 20) and then multiply the second question by 5 to get responses that are either 5, 10, 15 or 20. My idea is then to add the two, average them and use that value as an index of economic egalitarianism. Yes / no? Suggestions?

I am an R user and I hope that a purely statistical question is not especially misplaced.

Yours truly, Simon Kiss
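On the arithmetic of the second proposal: multiplying by 4 and by 5 gives ranges of 4 to 20 and 5 to 20, so the two items share a maximum but not a minimum. A common alternative is to map each item onto the 0 to 1 interval before averaging. The sketch below only illustrates that arithmetic, with made-up responses and a hypothetical helper rescale01; whether the resulting index is substantively defensible is a separate question.

```r
# Map an item coded 1..k onto 0..1, so both endpoints line up across items
rescale01 <- function(x, k) (x - 1) / (k - 1)

gap    <- c(1, 3, 5, 2)  # made-up responses on the 5-point item
health <- c(1, 2, 4, 3)  # made-up responses on the 4-point item

# Average the two rescaled items into a single index per respondent
index <- rowMeans(cbind(rescale01(gap, 5), rescale01(health, 4)))
print(index)
```

With this coding a respondent at the most egalitarian end of both items scores 0 and one at the least egalitarian end of both scores 1.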
[R] help calculating variable based on factor level of another
Dear colleagues, I want to calculate the value of x2 based on the value of x1. x1 is a factor with three separate levels. I want to make sure that missing values remain as NA in x2, but non-missing values take on a value of either 0 or 1 depending on the value in x1. This is the code I'm working with. Can anyone help? I've seen some other requests on a topic like this, but not using factors with strings as levels; only with numeric variables.

Simon

x1<-factor(levels="social and cultural specialists", "labour", "salariat")
x2<-if(x1==c("social and cultural specialists")) "1" elseif (x1==NA) "NA" else "0"
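A minimal sketch of one way to do this in base R, using made-up data. Three things in the posted code stand in the way: `if` tests a single condition rather than a whole vector, R spells the keyword `else if` rather than `elseif`, and `x1 == NA` always evaluates to NA (missing values are tested with `is.na()`). The vectorised `ifelse()` avoids all three; the example values below are assumptions for illustration.

```r
# Hypothetical data: a three-level factor with one missing value
x1 <- factor(c("social and cultural specialists", "labour", "salariat", NA),
             levels = c("social and cultural specialists", "labour", "salariat"))

# ifelse() works element by element. Comparing NA with == yields NA,
# so missing values in x1 stay missing in x2 without a separate branch.
x2 <- ifelse(x1 == "social and cultural specialists", 1, 0)
x2
```

With these values, x2 is 1 for the first element, 0 for the next two and NA for the last.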
[R] Compressed values on y-axis in effects plot
Dear colleagues, the code below generated the two effects plots that I have attached. I hope they are not stripped. The original two models are as follows:

green_shift_mod=glm(green_shift ~ educ+party_id+educ:party_id, family=binomial, data=x)
carbon_tax_mod=glm(carbon_tax ~ educ+party_id+educ:party_id, family=binomial, data=x)

Then I try to plot the effects of party_id by education for both models. It works well for carbon_tax_mod, but for green_shift_mod the effects package plots the effects of party ID by education as a straight, horizontal line, with the values completely compressed. I've looked through the data; all the variables included in the two models are identical save for the DV, and the DVs in both models are ordered factors. Is anyone familiar with this problem in effects plots?

Yours, Simon Kiss

quartz()
jpeg(filename="test.jpeg", type=c("quartz"))
plot(effect("educ:party_id", green_shift_mod, rug=TRUE), ylab="Probability of Disagreeing", xlab="Party ID", main="Probability of Disagreeing That The Green Shift Would Hurt The Economy")
dev.off()

quartz()
jpeg(filename="test2.jpeg", type=c("quartz"))
plot(effect("educ:party_id", carbon_tax_mod, rug=TRUE), ylab="Probability of Disagreeing", xlab="Party ID", main="Probability of Disagreeing That The Carbon Tax Would Hurt The Economy")
dev.off()
Re: [R] Finding different hues for a mosaic plot compatible with grayscale printing
Dear Colleagues, Thanks for that, Jim, but it strikes me that printing the residual values in the cells might be a simpler way of communicating the direction of each cell. I can get the residuals printed via the labeling_values commands in mosaic, but I cannot seem to *combine* this with the labeling_borders commands that I'd like to use to modify the rotation, font size and contents of variable names and labels. The following mosaic command draws the plot with the labeling I'd like.

mosaic(~social_class+ctax_agg_scaled, pop=FALSE, shade=TRUE,
  main="The Liberals Carbon Tax Or Green Shift Would Hurt The Canadian Economy By EGP Class Category",
  main_gp=gpar(fontsize=16),
  gp=shading_hcl(CST21$observed, CST21$expected, ASR21, df=12, h=c(260,0), c=c(100,0), l=c(90,50), interpolate=c(1,2,3,4)),
  labeling_args=list(labels=TRUE, rot_labels=c(25,0,0,25), gp_labels=gpar(fontsize=7), just_labels="center", offset_labels=c(1,0,0,4), offset_varnames=c(2,0,0,4),
    set_varnames=c(ctax_agg_scaled="The Liberal Green Shift Or Carbon Tax Would Hurt The Canadian Economy", social_class="EGP Class Category")))

And when I take out the labeling_borders commands and insert the following,

labeling=labeling_values(value_type=c("residuals"), suppress=0)

then I do get the residuals printed, but the labels are unattractive. How do I combine the labeling_borders and labeling_values commands in one command?

Yours, Simon Kiss

On 12-May-10, at 2:42 PM, Jim Lemon wrote:

On 05/12/2010 07:34 PM, Simon Kiss wrote:

I'm working with the code below to generate a mosaic plot. How do I set the h, c, and l values such that the significant, positive residuals appear different on a grayscale printer from the significant, negative residuals? The challenge as I see it is that one can only distinguish the positive and negative residuals with the hue. Varying the chroma and the luminance only affects the distinctions between large and small and between significant and non-significant. But my positive and negative residuals are both large (absolutely) and significant, meaning that they will have the same chroma and luminance, but different hues. I guess the key here is to find two separate hue values that appear substantially different *on a grayscale printer* at the same chroma and luminance. I have read through Zeileis et al. (2007, 2008) but can't quite find the answer there. I have also tried the Friendly shading to vary the line type, but I can't find line types that are different enough to communicate the difference between positive and negative residuals clearly. Your assistance is appreciated.

mosaic(~educ+trade_off_scaled, shade=TRUE,
  main="Support For Environmental Protection At The Expense of Creating Jobs By Education",
  gp=shading_hcl(CST17$observed, CST17$expected, ASR17, df=6, h=c(260,0), c=c(100,0), l=c(90,0)),
  labeling_args=list(rot_labels=c(25,90,0,0), offset_labels=c(1,0,0,2), offset_varnames=c(2,0,0,4),
    set_varnames=c(trade_off_scaled="Protecting The Environment Is More Important Than Creating Jobs", educ="Level of Education")))

Hi Simon, I thought that the symbolbox function might do something useful, but it required a bit of modification. The attached mod allows the user to fill a rectangle with symbols, which includes things like "+" and "-".

Jim
[R] Finding different hues for a mosaic plot compatible with grayscale printing
I'm working with the code below to generate a mosaic plot. How do I set the h, c, and l values such that the significant, positive residuals appear different on a grayscale printer from the significant, negative residuals? The challenge as I see it is that one can only distinguish the positive and negative residuals with the hue. Varying the chroma and the luminance only affects the distinctions between large and small and between significant and non-significant. But my positive and negative residuals are both large (absolutely) and significant, meaning that they will have the same chroma and luminance, but different hues. I guess the key here is to find two separate hue values that appear substantially different *on a grayscale printer* at the same chroma and luminance. I have read through Zeileis et al. (2007, 2008) but can't quite find the answer there. I have also tried the Friendly shading to vary the line type, but I can't find line types that are different enough to communicate the difference between positive and negative residuals clearly. Your assistance is appreciated.

mosaic(~educ+trade_off_scaled, shade=TRUE,
  main="Support For Environmental Protection At The Expense of Creating Jobs By Education",
  gp=shading_hcl(CST17$observed, CST17$expected, ASR17, df=6, h=c(260,0), c=c(100,0), l=c(90,0)),
  labeling_args=list(rot_labels=c(25,90,0,0), offset_labels=c(1,0,0,2), offset_varnames=c(2,0,0,4),
    set_varnames=c(trade_off_scaled="Protecting The Environment Is More Important Than Creating Jobs", educ="Level of Education")))

Simon J. Kiss, PhD
[R] NULL variable read in from SPSS
Hello all, I'm having difficulty getting one particular variable into R from SPSS v. 16.0 for Mac. The R version is 2.10.1. I saved the relevant variables from SPSS into a .csv file and then read them into R. All the variables worked fine, except for one (enviro_spending). In the SPSS file it is correctly coded as a nominal variable and, as far as I can tell, nothing distinguishes it from the others. I have tried to include a good representation of reproducible code below along with the results I am obtaining.

Yours, Simon

The variables are as follows:

educ =c("university", "university")
trade_off =c("*this cell is blank*", "disagree")
age=c(45,43)
gender_1=c("female", "female")
eviro_spending=c("Less/Same", "Less/Same")
carbon_tax_agg=c("agree", "disagree")
y=data.frame(educ, trade_off, age, gender_1, enviro_spending, carbon_tax_agg)

#The following are the original commands I used to read the .csv file into R.
y=read.csv(file.choose(), header=TRUE)

#When I do the following, all the variable names are correct
names(y)

#When I do the following, all the data in the dataframe are correct.
y

#But when I do the following, I get the following results
y$enviro_spending
#NULL
is.character(y$enviro_spending)
#FALSE
is.factor(y$enviro_spending)
#FALSE

#I tried to save the single problematic variable from my SPSS file to a .csv file and then read that into R.
z=read.csv(file.choose(), header=TRUE)

#Just as before, calling the dataframe gives the data exactly as it should
z
#less/same
#more
#less/same
#more
#more

#But when I call the specific variable, I get NULL
z$enviro_spending
#NULL
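A minimal sketch of one thing worth checking, offered as a possible cause rather than a confirmed diagnosis. In R, `$` on a data frame returns NULL without any error when no column matches the requested name, so a stored name that differs even slightly from the typed one produces exactly this symptom. The listing in the post spells the vector `eviro_spending`, and the made-up data frame below borrows that spelling to show the behaviour.

```r
# Hypothetical data frame whose column name differs by one letter
y <- data.frame(eviro_spending = c("Less/Same", "Less/Same"),
                carbon_tax_agg = c("agree", "disagree"))

# `$` returns NULL, with no error, when no column matches the name
is.null(y$enviro_spending)          # TRUE

# Compare the name being typed against the names actually stored
"enviro_spending" %in% names(y)     # FALSE
names(y)                            # shows the stored spelling

# str() lists every column's name and type in one view
str(y)
```

If `names(y)` shows a spelling that differs from what is typed after `$`, copying the name straight from that output sidesteps the mismatch.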
[R] Problem with recode -Error in parse(text = range[[1]][1]) : unexpected end of input in " c(0"
Dear colleagues, in the help archive there was a previous person who encountered a problem with the "recode" command in the car library. I'm not sure if that was solved, as there was no posting to that effect, but I'm having the same problem. I'm trying to recode a numeric variable with values from 0-100 into a binary variable with values (0,1). The following command:

recode(green_2004_2$french, "c(50:100)=0; c(0:49.99)=1")

gets the following error message:

Error in parse(text = range[[1]][1]) : unexpected end of input in " c(0"

I tried it with a second numerical variable in the same data set, but get precisely the same error at precisely the same location in the command, i.e. the second colon. As far as I can tell I have the most up-to-date version of car installed. Any suggestions?

Yours, Simon Kiss
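A minimal sketch of two ways to get the binary variable, using made-up scores. The error message appears consistent with recode splitting the specification at the colon, which would break a range wrapped in `c()`; that reading is an assumption, not something confirmed in the thread. The car range syntax writes a range as `low:high` without `c()`, and the base R line needs no package at all. The vector `french` below is hypothetical.

```r
# Hypothetical scores between 0 and 100
french <- c(0, 12.5, 49.99, 50, 75, 100, NA)

# Base R: 1 for values below 50, 0 for values of 50 or more; NA stays NA
french_bin <- as.integer(french < 50)
french_bin

# car::recode, run only if car is installed: ranges are written
# low:high inside the specification string, without c()
if (requireNamespace("car", quietly = TRUE)) {
  french_car <- car::recode(french, "50:100=0; 0:49.99=1")
}
```

The base R version also covers values that fall strictly between 49.99 and 50, which the two explicit ranges would leave unrecoded.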
[R] Summing a series made up of part of a vector
Dear colleagues, I have a data frame that looks like this:

x1    x4
1     4.2
2     3.6
3     2.7
.     .
308   n.a.

x4 is a vector of percentages, sorted in descending order. I would like to create a new variable that represents the sum of the series of values of x4 up to that row. So I would like x5 to look like this:

      x5
1     4.2
2     7.8    (4.2+3.6)
3     10.5   (4.2+3.6+2.7)
308   n.a.

So the last number in the vector x5 should be 100, as these are all percentages. Any suggestions?

Yours truly, Simon Kiss
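A minimal sketch of the running total described above, using base R's `cumsum()`. The data frame `df` is made up and only reuses the first three values from the post plus one missing value.

```r
# Hypothetical data frame; the last row is missing, as in the post
df <- data.frame(x1 = 1:4, x4 = c(4.2, 3.6, 2.7, NA))

# cumsum() returns the running total. Once it meets an NA,
# that element and every later one are NA as well.
df$x5 <- cumsum(df$x4)
df
```

Because the missing values sit at the end of the sorted vector, they do not disturb the totals above them; the last non-missing element of x5 is the overall sum.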
[R] row names in regression results and saving the identification results from added variable plots
Hello all, Is there a way to take the row names from my data.frame and have them carried over to the regression results? At the moment, my original data frame looks like this:

  / Riding name / Turnout / Margin / Expenditures
1 / Abbotsford
2 / .
3 / .
4 / .Willow

I know how to set the row names for the original data frame to be the riding name, but when I run the regression, the residuals, dfbetas and Cook's D all lose those and are listed with the original row number. This does not pose a significant problem when I'm just looking at residuals and dfbetas, because I've figured out how to match up the row names to those variables. But it is posing a bit of a problem now that I'm looking at added variable plots; it is more difficult to match up the results to the row names.

As a second question, I have figured out how to identify the points in added variable plots:

av.plots(model, labels=names(residuals(model_name)), identify.points=TRUE)

However, when I'm finished identifying points, the results are not saved. I'm not sure if I can use the "identify" command with the av.plots command in (car) as you can with other standard plots, because av.plots brings up an interactive menu that does not appear to allow for that. If anyone can help, it would be appreciated!

Yours, Simon Kiss
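A minimal sketch for the first question, using made-up data. In base R, `lm()` normally carries the data frame's row names through to the residuals and the influence measures, provided the row names are set before the model is fitted. All riding names other than Abbotsford and all numbers below are hypothetical, and this does not address the second question about saving identified points.

```r
# Hypothetical riding-level data (names and values are made up)
ridings <- data.frame(
  riding       = c("Abbotsford", "Riding B", "Riding C", "Riding D", "Riding E"),
  turnout      = c(61.2, 58.4, 55.0, 63.7, 59.9),
  margin       = c(12.1, 3.4, 20.5, 7.8, 5.2),
  expenditures = c(80, 95, 60, 88, 91),
  stringsAsFactors = FALSE
)

# Set the row names before fitting so lm() can carry them through
rownames(ridings) <- ridings$riding

fit <- lm(turnout ~ margin + expenditures, data = ridings)

names(residuals(fit))        # riding names
names(cooks.distance(fit))   # riding names
rownames(dfbetas(fit))       # riding names
```

If the names still come back as row numbers, it may be worth checking whether the row names were assigned after the model was fitted or on a different copy of the data frame.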
[R] R Full Screen
Hello all, I'm a new user of R and just completed a five-day course on the program. Somehow, a few basic questions remain unanswered. I'm working on a Mac OS X system and have my laptop connected to a large, flat-screen monitor. I can't make any of the Quartz windows fill the monitor's screen; I'd like to make them full screen to identify points in a dense scatterplot. Thank you for any suggestions.

Yours, Simon Kiss