from:"Simon Kiss"

[R] mlogit.effects()

2015-06-17 Thread Simon Kiss

Dear colleagues,
I am struggling mightily with the mlogit package.  First, the reason that I am 
using mlogit as opposed to multinom() in nnet is because my data is ranked, not 
just ordinal.  So, I’m really trying to fit an exploded logit or rank-ordered 
model.  All of the covariates of interest are individual-specific, none are 
alternative specific.  The code below produces a model with my covariates of 
interest, so that is good. But I cannot get predict.mlogit or effects.mlogit 
to work *at all*.  The package help is quite unclear as to how to format the 
sample data that is fed to either of those two functions.
Can anyone help in that regard?  Failing that, can anyone provide a suggestion 
for an alternative way of modelling ranked categorical data? I’m aware of the 
pmr and Rankcluster packages. The former however is also poorly documented and 
the latter is computationally intense to select clusters.  
I’m trying to do this as simply as possible while remaining loyal to the ranked 
structure of the data. 

Thanks, Simon Kiss

#Load packages
library(RCurl)
library(mlogit)
library(tidyr)
library(dplyr)
#URL where data is stored
dat.url<-  
'https://raw.githubusercontent.com/sjkiss/Survey/master/mlogit.out.csv'
#Get data
dat<-read.csv(dat.url)
#Complete cases only, as it seems mlogit cannot handle missing values or tied 
#data, which in this case you might get because of median imputation
dat<-dat[complete.cases(dat),]
#Tidy data to get it into long format
dat.out<-dat %>%
  gather(Open, Rank, -c(1,9:12)) %>%
  arrange(X, Open, Rank)
#Create mlogit object
mlogit.out<-mlogit.data(dat.out, shape='long',alt.var='Open',choice='Rank', 
ranked=TRUE,chid.var='X')
#Fit Model
mod1<-mlogit(Rank~1|gender+age+economic+Job,data=mlogit.out)
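For readers unfamiliar with the term: an "exploded" (rank-ordered) logit treats one ranking of J alternatives as J-1 successive choices, each made from the alternatives not yet ranked. The base R sketch below only illustrates that decomposition. explode_ranking() and the alternative names are hypothetical, and this is not a description of what mlogit does internally when ranked=TRUE.

```r
# Hypothetical helper: split one ranking (most preferred first) into the
# sequence of choice situations that an exploded logit would model.
explode_ranking <- function(ranking) {
  n <- length(ranking)
  lapply(seq_len(n - 1), function(k) {
    list(chosen = ranking[k],          # alternative picked at step k
         choice_set = ranking[k:n])    # alternatives still available at step k
  })
}

# Made-up ranking of three alternatives
steps <- explode_ranking(c("alternatives", "information", "trade"))
str(steps)
```

The last-ranked alternative contributes no choice of its own because it is the only one left.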

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] recode the same subset of variables in several list elements

2015-04-06 Thread Simon Kiss

Hi Jim, So that does the rescale part very efficiently. But I’d like to know 
how to do that on each list element using lapply or llply.  I have about 4 data 
frames and a few other recodes to do so automating would be nice, rather than 
applying your code to each individual list element.
simon
> On Apr 2, 2015, at 6:30 PM, Jim Lemon  wrote:
> 
> Hi Simon,
> How about this?
> 
> library(plotrix)
> revlist<-grep("i",names(df),fixed=TRUE)
> df[,revlist]<-sapply(df[,revlist],rescale,c(3,1))
> 
> Jim
> 
> 
> On Fri, Apr 3, 2015 at 6:30 AM, Simon Kiss  <mailto:sjk...@gmail.com>> wrote:
> Hi there: I have a list of data frames with identical variable  names.  I’d 
> like to reverse scale the same variables in each data.frame.
> I’d appreciate any one’s suggestions as to how to accomplish this. Right now, 
> I’m working with the code at the very bottom of my sample data.
> Thanks, Simon Kiss
> 
> #Create data.frame1
> df<-data.frame(
>   ivar1=sample(c(1,2,3), replace=TRUE, size=100),
>   ivar2=sample(c(1,2,3), replace=TRUE, size=100),
>   hvar1=sample(c(1,2,3), replace=TRUE, size=100),
>   hvar2=sample(c(1,2,3), replace=TRUE, size=100),
>   evar1=sample(c(1,2,3), replace=TRUE, size=100),
>   evar2=sample(c(1,2,3), replace=TRUE, size=100)
>   )
> 
> #data.frame2
>   df1<-data.frame(
> ivar1=sample(c(1,2,3), replace=TRUE, size=100),
> ivar2=sample(c(1,2,3), replace=TRUE, size=100),
> hvar1=sample(c(1,2,3), replace=TRUE, size=100),
> hvar2=sample(c(1,2,3), replace=TRUE, size=100),
> evar1=sample(c(1,2,3), replace=TRUE, size=100),
> evar2=sample(c(1,2,3), replace=TRUE, size=100)
>   )
> 
> #List
> list1<-list(df, df1)
> #vector of first variables I’d like to recode
> i.recodes<-grep('^i.', names(df), value=TRUE)
> #Vector of second variables to recode
> e.recodes<-grep('^e.', names(df), value=TRUE)
> 
> #Set up RESCALE function from RPMG package
> RESCALE <- function (x, nx1, nx2, minx, maxx)
> { nx = nx1 + (nx2 - nx1) * (x - minx)/(maxx - minx)
>   return(nx)
> }
> 
> #This is what I’m playing around with
> test<-lapply(list1, function(y) {
>   out<-y[,i.recodes]
>   out<-lapply(out, function(x) RESCALE(x, 0,1,1,6))
>   y[,names(x)]<-out
> })

[R] recode the same subset of variables in several list elements

2015-04-02 Thread Simon Kiss

Hi there: I have a list of data frames with identical variable  names.  I’d 
like to reverse scale the same variables in each data.frame.  
I’d appreciate any one’s suggestions as to how to accomplish this. Right now, 
I’m working with the code at the very bottom of my sample data. 
Thanks, Simon Kiss

#Create data.frame1
df<-data.frame(
  ivar1=sample(c(1,2,3), replace=TRUE, size=100),
  ivar2=sample(c(1,2,3), replace=TRUE, size=100),
  hvar1=sample(c(1,2,3), replace=TRUE, size=100),
  hvar2=sample(c(1,2,3), replace=TRUE, size=100),
  evar1=sample(c(1,2,3), replace=TRUE, size=100),
  evar2=sample(c(1,2,3), replace=TRUE, size=100)
  )
  
#data.frame2
  df1<-data.frame(
ivar1=sample(c(1,2,3), replace=TRUE, size=100),
ivar2=sample(c(1,2,3), replace=TRUE, size=100),
hvar1=sample(c(1,2,3), replace=TRUE, size=100),
hvar2=sample(c(1,2,3), replace=TRUE, size=100),
evar1=sample(c(1,2,3), replace=TRUE, size=100),
evar2=sample(c(1,2,3), replace=TRUE, size=100)
  )

#List
list1<-list(df, df1)
#vector of first variables I’d like to recode
i.recodes<-grep('^i.', names(df), value=TRUE)
#Vector of second variables to recode
e.recodes<-grep('^e.', names(df), value=TRUE)

#Set up RESCALE function from RPMG package
RESCALE <- function (x, nx1, nx2, minx, maxx) 
{ nx = nx1 + (nx2 - nx1) * (x - minx)/(maxx - minx)
  return(nx)
}

#This is what I’m playing around with
test<-lapply(list1, function(y) {
  out<-y[,i.recodes]
  out<-lapply(out, function(x) RESCALE(x, 0,1,1,6))
  y[,names(x)]<-out
})
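Two things stop the attempt above from working: inside the outer function, x is not defined (it only exists inside the inner lapply), and the last expression is an assignment, so the function would return the recoded columns rather than the whole data frame. A minimal sketch of one way around both, assuming a 1 to 3 scale as in the sample data; reverse_cols() and make_df() are hypothetical helpers, and the seed is added only for reproducibility:

```r
set.seed(123)  # added only so the sketch is reproducible

# Build one sample data frame shaped like df and df1 in the question
make_df <- function(n = 100) {
  vars <- c("ivar1", "ivar2", "hvar1", "hvar2", "evar1", "evar2")
  as.data.frame(setNames(
    lapply(vars, function(v) sample(c(1, 2, 3), size = n, replace = TRUE)),
    vars))
}
list1 <- list(make_df(), make_df())

# Reverse-score every column whose name matches `pattern`.
# lo and hi are the assumed scale endpoints (1 and 3 here).
reverse_cols <- function(d, pattern, lo = 1, hi = 3) {
  cols <- grep(pattern, names(d), value = TRUE)
  d[cols] <- lapply(d[cols], function(x) lo + hi - x)
  d  # return the whole data frame, not just the recoded columns
}

# Apply the same recode to the i* and e* variables in every list element
recoded <- lapply(list1, reverse_cols, pattern = "^[ie]")
```

Further recodes can be chained by calling lapply() again on recoded with a different function.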

[R] basic help with as.Date()

2015-03-26 Thread Simon Kiss

Hi there: normally I’m quite comfortable with as.Date(). But this data set is 
causing problems.

The core of the data frame looks like the sample data frame below, but my 
attempt to convert df$mydate to a date object returns only NA. Can anyone 
provide a suggestion?

Thank you, Simon Kiss

#sample data frame
df<-data.frame(mydate=factor(c('Jan-15', 'Feb-13', 'Mar-11', 'Jul-12')), 
other=rnorm(4, 3))
#Attempt to convert
as.Date(as.character(df$mydate), format='%b-%y')
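The likely cause is that a month and a two-digit year do not identify a day, so as.Date() has nothing to build a complete date from and returns NA. A minimal sketch of two workarounds follows. The first pastes on a dummy day and relies on %b, which depends on the locale. The second uses a hypothetical helper, parse_month_year(), which avoids the locale via month.abb but assumes every year is 20xx.

```r
df <- data.frame(mydate = factor(c("Jan-15", "Feb-13", "Mar-11", "Jul-12")))

# Workaround 1: add a dummy day so the string describes a complete date.
# %b matches month abbreviations in the current locale (English assumed here).
as.Date(paste0("01-", as.character(df$mydate)), format = "%d-%b-%y")

# Workaround 2: locale-independent parsing with the built-in month.abb.
# Hypothetical helper; assumes all two-digit years belong to the 2000s.
parse_month_year <- function(x) {
  parts <- strsplit(as.character(x), "-", fixed = TRUE)
  mon <- match(vapply(parts, `[`, character(1), 1), month.abb)
  yr  <- 2000L + as.integer(vapply(parts, `[`, character(1), 2))
  as.Date(sprintf("%04d-%02d-01", yr, mon))
}
df$date <- parse_month_year(df$mydate)
```

Either way the result is the first day of each month, which is usually what is wanted for monthly data.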


[R] foreign:::writeForeignSPSS vs. write.foreign(df, datafile, codefile, package='spss')

2015-01-29 Thread Simon Kiss

Hello:
I discovered recently that the function foreign:::writeForeignSPSS allows for 
variable names longer than 8 characters and has an additional argument 
varnames.  Neither of these capabilities exist with write.foreign. But 
according to the help file for write.foreign it seems that the latter actually 
somehow calls the former.  Am I reading this wrong? Can someone explain the 
difference between the two functions?
Thanks.
Yours, Simon Kiss


Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

2014-09-05 Thread Simon Kiss

HI, of course.

A mini-version of my data set is below, stored in d2. The code I'm 
working with follows.
library(reshape2)
#Create d2
d2 <- structure(list(row = 1:50, rank1 = structure(c(3L, 3L, 3L, 4L, 
3L, 3L, NA, NA, 3L, NA, 3L, 3L, 1L, NA, 2L, NA, 3L, NA, 2L, 1L, 
1L, 3L, NA, 6L, NA, 1L, NA, 3L, 1L, NA, 1L, NA, NA, 6L, 3L, NA, 
1L, 3L, 3L, 4L, 1L, NA, 3L, 3L, 3L, NA, 3L, 3L, NA, 1L), .Label = 
c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor"), rank2 = structure(c(6L, 1L, 1L, 
2L, 4L, 6L, NA, NA, 6L, NA, 6L, 4L, 2L, NA, 4L, NA, 6L, NA, 1L, 
6L, 3L, 2L, NA, 3L, NA, 6L, NA, 6L, 6L, NA, 3L, NA, NA, 3L, 6L, 
NA, 6L, 6L, 6L, 7L, 3L, NA, 1L, 6L, 6L, NA, 2L, 6L, NA, 2L), .Label = 
c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor"), rank3 = structure(c(1L, 6L, 4L, 
3L, 2L, 4L, NA, NA, 4L, NA, 1L, 1L, 6L, NA, 1L, NA, 1L, NA, 7L, 
3L, 6L, 1L, NA, 2L, NA, 4L, NA, 1L, 3L, NA, 6L, NA, NA, 4L, 2L, 
NA, 7L, 1L, 1L, 6L, 7L, NA, 6L, 1L, 1L, NA, 4L, 1L, NA, 3L), .Label = 
c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor"), rank4 = structure(c(7L, 4L, 2L, 
1L, 1L, 7L, NA, NA, 1L, NA, 7L, 2L, 7L, NA, 3L, NA, 2L, NA, 3L, 
4L, 5L, 6L, NA, 4L, NA, 3L, NA, 4L, 4L, NA, 4L, NA, NA, 2L, 7L, 
NA, 2L, 2L, 2L, 3L, 6L, NA, 2L, 5L, 4L, NA, 1L, 2L, NA, 4L), .Label = 
c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor"), rank5 = structure(c(2L, 7L, 6L, 
7L, 7L, 2L, NA, NA, 2L, NA, 2L, 7L, 3L, NA, 6L, NA, 7L, NA, 6L, 
7L, 4L, 7L, NA, 7L, NA, 7L, NA, 2L, 2L, NA, 2L, NA, NA, 7L, 1L, 
NA, 3L, 7L, 4L, 2L, 2L, NA, 4L, 2L, 2L, NA, 6L, 4L, NA, 5L), .Label = 
c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor"), rank6 = structure(c(4L, 2L, 7L, 
6L, 6L, 1L, NA, NA, 7L, NA, 4L, 5L, 4L, NA, 7L, NA, 4L, NA, 4L, 
2L, 2L, 4L, NA, 1L, NA, 2L, NA, 7L, 7L, NA, 7L, NA, NA, 1L, 4L, 
NA, 4L, 4L, 7L, 1L, 4L, NA, 7L, 7L, 7L, NA, 7L, 7L, NA, 7L), .Label = 
c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor"), rank7 = structure(c(5L, 5L, 5L, 
5L, 5L, 5L, NA, NA, 5L, NA, 5L, 6L, 5L, NA, 5L, NA, 5L, NA, 5L, 
5L, 7L, 5L, NA, 5L, NA, 5L, NA, 5L, 5L, NA, 5L, NA, NA, 5L, 5L, 
NA, 5L, NA, 5L, 5L, 5L, NA, 5L, 4L, 5L, NA, 5L, 5L, NA, 6L), .Label = 
c("accessible", 
"alternatives", "information", "responsive", "social", "technical", 
"trade"), class = "factor")), .Names = c("row", "rank1", "rank2", 
"rank3", "rank4", "rank5", "rank6", "rank7"), row.names = c(NA, 
50L), class = "data.frame")


#This code is a replication of David Carlson's code (below), which works 
#splendidly but does not work on my data set
#Melt d2: Note, I've used value.name='color' to maximize comparability with 
#David's suggestion
d3 <- melt(d2, id.vars=1, measure.vars=2:8, 
variable.name="rank",value.name="color")
#Make Rank Variable Numeric
d3$rank<-as.numeric(d3$rank)
#Recast d3 into d4
d4<- dcast(d3, row~color,value.var="rank", fill=0)
#Note that d4 appears to provide a binary variable equal to one if a respondent 
#checked the option, but does not provide information as to which rank they 
#assigned each option; it also seems to count the number of missing values

#David Carlson's Code
mydf <- data.frame(t(replicate(100, sample(c("red", "blue",  "green", "yellow", 
NA), 4))))
mydf <- data.frame(rows=1:100, mydf)
colnames(mydf) <- c("row", "rank1", "rank2", "rank3", "rank4")
mymelt <- melt(mydf, id.vars=1, measure.vars=2:5, variable.name="rank", 
value.name="color")
mymelt$rank <- as.numeric(mymelt$rank)
mycast <- dcast(mymelt, row~color, value.var="rank", fill=0)

#Compare
str(mydf)
str(d2)
head(mycast)
head(d4)

Again, I'm grateful for assistance. I can't understand how my data set 
differs from David's sample data set.
Simon Kiss
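One plausible difference: d2 contains respondents whose ranks are all NA, so after melting, the same row/color combination (row, NA) occurs several times. When a cell receives more than one value, dcast() has to aggregate and falls back to length, which would explain the 0/1 entries and the NA counts. In David's data each row has at most one NA, so every cell is unique. The base R sketch below reproduces the collision on made-up toy data and shows that dropping the NA lines first restores one rank per cell; with reshape2 the analogous step would presumably be d3 <- d3[!is.na(d3$color), ] before dcast().

```r
# Toy data in the same shape as d2 (values are made up):
# respondent 2 skipped every item.
d <- data.frame(row = 1:3,
                rank1 = c("info", NA, "trade"),
                rank2 = c("trade", NA, "info"),
                stringsAsFactors = TRUE)

# Long format, one line per respondent and rank (base R stand-in for melt())
long <- data.frame(row   = rep(d$row, times = 2),
                   rank  = rep(1:2, each = nrow(d)),
                   color = c(as.character(d$rank1), as.character(d$rank2)))

# Respondent 2 contributes two lines with color NA, so row ~ color is not unique
n_dups <- sum(duplicated(long[c("row", "color")]))

# Drop the NA lines; keep all respondents as factor levels so nobody disappears
long_ok <- long[!is.na(long$color), ]
long_ok$row <- factor(long_ok$row, levels = d$row)

# Each cell now holds at most one rank, so summing returns the rank itself
wide <- as.data.frame.matrix(xtabs(rank ~ row + color, data = long_ok))
wide
```

Respondents with no rankings end up as rows of zeros, matching the fill=0 convention used in the thread.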
On Sep 4, 2014, at 2:35 PM, David L Carlson  wrote:

> I think we would need enough of the data you are using to figure out h

Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

2014-09-04 Thread Simon Kiss

Hi David and list:
This is working, except at this command
mycast <- dcast(mymelt, row~color, value.var="rank", fill=0)

dcast is using "length" as the default aggregating function. This gives 
inaccurate results. It tells me, for example, how many choices were missing 
values, and it tells me whether a person selected any given option (the value 
is reported as 1).
When I try to run your reproducible research, it works great, but something 
with the aggregating function is not working properly with mine. 
Any other thoughts?
Simon
On Aug 18, 2014, at 10:44 AM, David L Carlson  wrote:

> Another approach using reshape2:
> 
>> library(reshape2)
>> # Construct data/ add column of row numbers
>> set.seed(42)
>> mydf <- data.frame(t(replicate(100, sample(c("red", "blue",
> +   "green", "yellow", NA), 4))))
>> mydf <- data.frame(rows=1:100, mydf)
>> colnames(mydf) <- c("row", "rank1", "rank2", "rank3", "rank4")
>> head(mydf)
>   row  rank1  rank2  rank3 rank4
> 1   1   <NA> yellow    red  blue
> 2   2 yellow  green   <NA>   red
> 3   3 yellow  green   blue  <NA>
> 4   4   <NA>   blue yellow green
> 5   5   <NA>    red   blue green
> 6   6   <NA>    red  green  blue
>> # Reshape
>> mymelt <- melt(mydf, id.vars=1, measure.vars=2:5, 
> + variable.name="rank", value.name="color")
>> # Convert rank to numeric
>> mymelt$rank <- as.numeric(mymelt$rank)
>> mycast <- dcast(mymelt, row~color, value.var="rank", fill=0)
>> head(mycast)
>   row blue green red yellow NA
> 1   1    4     0   3      2  1
> 2   2    0     2   4      1  3
> 3   3    3     2   0      1  4
> 4   4    2     4   0      3  1
> 5   5    3     4   2      0  1
> 6   6    4     3   2      0  1
> 
> David C
> 
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of David L Carlson
> Sent: Sunday, August 17, 2014 6:32 PM
> To: Simon Kiss; r-help@r-project.org
> Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A 
> Data Frame
> 
> There is probably an easier way to do this, but
> 
>> set.seed(42)
>> mydf <- data.frame(t(replicate(100, sample(c("red", "blue",
> +  "green", "yellow", NA), 4))))
>> colnames(mydf) <- c("rank1", "rank2", "rank3", "rank4")
>> head(mydf)
>    rank1  rank2  rank3 rank4
> 1   <NA> yellow    red  blue
> 2 yellow  green   <NA>   red
> 3 yellow  green   blue  <NA>
> 4   <NA>   blue yellow green
> 5   <NA>    red   blue green
> 6   <NA>    red  green  blue
>> lvls <- levels(mydf$rank1)
>> # convert color factors to numeric
>> for (i in seq_along(mydf)) mydf[,i] <- as.numeric(mydf[,i]) 
>> # stack the columns
>> mydf2 <- stack(mydf)
>> # convert rank factor to numeric
>> mydf2$ind <- as.numeric(mydf2$ind)
>> # add row numbers
>> mydf2 <- data.frame(rows=1:100, mydf2)
>> # Create table
>> mytbl <- xtabs(ind~rows+values, mydf2)
>> # convert to data frame
>> mydf3 <- data.frame(unclass(mytbl))
>> colnames(mydf3) <- lvls
>> head(mydf3)
>   blue green red yellow
> 1    4     0   3      2
> 2    0     2   4      1
> 3    3     2   0      1
> 4    2     4   0      3
> 5    3     4   2      0
> 6    4     3   2      0
> 
> David C
> 
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of Simon Kiss
> Sent: Friday, August 15, 2014 3:58 PM
> To: r-help@r-project.org
> Subject: Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A 
> Data Frame
> 
> 
> Both the suggestions I got work very well, but what I didn't realize is that 
> NA values would cause serious problems.  Where there is a missing value, 
> using the argument na.last=NA to order just returns the order of the 
> factor levels but excludes the missing values, and I have no idea where 
> those occur, or rather which of those variables were actually missing.  
> Have I explained this problem sufficiently? 
> I didn't think it would cause such a problem so I didn't include it in the 
> original problem definition.
> Yours, Simon
> On Jul 25, 2014, at 4:58 PM, David L Carlson  wrote:
> 
>> I think this gets what you want. But your data are not reproducible since 
>> they are randomly drawn without setting a seed and the two data sets have no 
>> relationship to one another.
>> 
>>> set.seed(42)
>>> mydf <- data.frame(t(replicate(100, sample(c("r

Re: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

2014-08-15 Thread Simon Kiss


Both the suggestions I got work very well, but what I didn't realize is that NA 
values would cause serious problems.  Where there is a missing value, using the 
argument na.last=NA to order just returns the order of the factor levels 
but excludes the missing values, and I have no idea where those occur, or 
rather which of those variables were actually missing.  
Have I explained this problem sufficiently? 
I didn't think it would cause such a problem so I didn't include it in the 
original problem definition.
Yours, Simon
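One way to keep track of which items went unranked is to avoid order() altogether and look up each item's position in the respondent's row with match(), which returns NA for an item that does not appear. This is a minimal sketch on made-up data, not a drop-in replacement for the code quoted below; ranks_by_item() is a hypothetical helper.

```r
# Hypothetical helper: for every respondent (row of d), return the rank
# position at which each item was placed, or NA if the item was not ranked.
ranks_by_item <- function(d, items) {
  m <- as.matrix(d)                        # factors become character strings
  out <- t(apply(m, 1, function(r) match(items, r)))
  colnames(out) <- items
  as.data.frame(out)
}

# Made-up example: respondent 2 ranked only one colour
mydf <- data.frame(rank1 = c("red", "blue"),
                   rank2 = c("blue", NA),
                   stringsAsFactors = TRUE)
scores <- ranks_by_item(mydf, items = c("blue", "red"))
scores
```

The NA entries can be left as missing or replaced with 0 afterwards, depending on what the factor analysis should do with unranked items.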
On Jul 25, 2014, at 4:58 PM, David L Carlson  wrote:

> I think this gets what you want. But your data are not reproducible since 
> they are randomly drawn without setting a seed and the two data sets have no 
> relationship to one another.
> 
>> set.seed(42)
>> mydf <- data.frame(t(replicate(100, sample(c("red", "blue",
> + "green", "yellow")))))
>> colnames(mydf) <- c("rank1", "rank2", "rank3", "rank4")
>> mydf2 <- data.frame(t(apply(mydf, 1, order)))
>> colnames(mydf2) <- levels(mydf$rank1)
>> head(mydf)
>    rank1  rank2  rank3 rank4
> 1 yellow  green    red  blue
> 2  green   blue yellow   red
> 3  green yellow    red  blue
> 4 yellow    red  green  blue
> 5 yellow    red  green  blue
> 6 yellow    red   blue green
>> head(mydf2)
>   blue green red yellow
> 1    4     2   3      1
> 2    2     1   4      3
> 3    4     1   3      2
> 4    4     3   2      1
> 5    4     3   2      1
> 6    3     4   2      1
> 
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
> 
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of Simon Kiss
> Sent: Friday, July 25, 2014 2:34 PM
> To: r-help@r-project.org
> Subject: [R] Turn Rank Ordering Into Numerical Scores By Transposing A Data 
> Frame
> 
> Hello:
> I have data that looks like mydf, below.  It is the results of a survey where 
> participants were to put a number of statements (in this case colours) in 
> their order of preference. In this case, the rank number is the variable, and 
> the factor level for each respondent is which colour they assigned to that 
> rank.  I would like to find a way to effectively transpose the data frame so 
> that it looks like mydf2, also below, where the colours the participants were 
> able to choose are the variables and the variable score is what that person 
> ranked that variable. 
> 
> Ultimately what I would like to do is a factor analysis on these items, so 
> I'd like to be able to see if people ranked red and yellow higher together 
> but ranked green and blue together lower, that sort of thing.  
> I have played around with different variations of t(), melt(), ifelse() and 
> if() but can't find a solution. 
> Thank you
> Simon
> #Reproducible code
> mydf<-data.frame(rank1=sample(c('red', 'blue', 'green', 'yellow'), 
> replace=TRUE, size=100), rank2=sample(c('red', 'blue', 'green', 'yellow'), 
> replace=TRUE, size=100), rank3=sample(c('red', 'blue', 'green', 'yellow'), 
> replace=TRUE, size=100), rank4=sample(c('red', 'blue', 'green', 'yellow'), 
> replace=TRUE, size=100))
> 
> mydf2<-data.frame(red=sample(c(1,2,3,4), 
> replace=TRUE,size=100),blue=sample(c(1,2,3,4), 
> replace=TRUE,size=100),green=sample(c(1,2,3,4), replace=TRUE,size=100) 
> ,yellow=sample(c(1,2,3,4), replace=TRUE,size=100))
> *
> Simon J. Kiss, PhD
> Assistant Professor, Wilfrid Laurier University
> 73 George Street
> Brantford, Ontario, Canada
> N3T 2C9
> 


[R] Turn Rank Ordering Into Numerical Scores By Transposing A Data Frame

2014-07-25 Thread Simon Kiss

Hello:
I have data that looks like mydf, below.  It is the results of a survey where 
participants were to put a number of statements (in this case colours) in their 
order of preference. In this case, the rank number is the variable, and the 
factor level for each respondent is which colour they assigned to that rank.  I 
would like to find a way to effectively transpose the data frame so that it 
looks like mydf2, also below, where the colours the participants were able to 
choose are the variables and the variable score is what that person ranked that 
variable. 

Ultimately what I would like to do is a factor analysis on these items, so I'd 
like to be able to see if people ranked red and yellow higher together but 
ranked green and blue together lower, that sort of thing.  
I have played around with different variations of t(), melt(), ifelse() and 
if() but can't find a solution. 
Thank you
Simon
#Reproducible code
mydf<-data.frame(rank1=sample(c('red', 'blue', 'green', 'yellow'), 
replace=TRUE, size=100), rank2=sample(c('red', 'blue', 'green', 'yellow'), 
replace=TRUE, size=100), rank3=sample(c('red', 'blue', 'green', 'yellow'), 
replace=TRUE, size=100), rank4=sample(c('red', 'blue', 'green', 'yellow'), 
replace=TRUE, size=100))

mydf2<-data.frame(red=sample(c(1,2,3,4), 
replace=TRUE,size=100),blue=sample(c(1,2,3,4), 
replace=TRUE,size=100),green=sample(c(1,2,3,4), replace=TRUE,size=100) 
,yellow=sample(c(1,2,3,4), replace=TRUE,size=100))

[R] Help with polychoric correlation in psych library

2014-06-02 Thread Simon Kiss

Hello I have a data.frame of 32 variables, all are ordered factors. str(dat) 
returns the following
'data.frame':   32 obs. of  43 variables:
 $ q1a: Ord.factor w/ 6 levels "Strongly Disagree"<..: 3 4 2 5 NA NA 5 5 3 5 ...
 $ q1b: Ord.factor w/ 6 levels "Strongly Disagree"<..: 3 NA 4 NA NA NA NA 5 4 4 
...
 $ q1c: Ord.factor w/ 6 levels "Strongly Disagree"<..: NA NA 5 5 NA 4 NA 5 NA 5 
...
 $ q1d: Ord.factor w/ 6 levels "Strongly Disagree"<..: 5 NA 5 NA NA 5 NA 5 NA 4 
...
 $ q1e: Ord.factor w/ 6 levels "Strongly Disagree"<..: 5 NA NA 5 5 NA NA 5 5 NA 
...
 $ q1f: Ord.factor w/ 6 levels "Strongly Disagree"<..: 4 5 5 5 5 5 5 4 5 5 ...

I'm trying to come up with a polychoric correlation matrix for these, and so I 
convert them to numeric values:
'data.frame':   32 obs. of  43 variables:
 $ q1a: num  3 4 2 5 NA NA 5 5 3 5 ...
 $ q1b: num  3 NA 4 NA NA NA NA 5 4 4 ...
 $ q1c: num  NA NA 5 5 NA 4 NA 5 NA 5 ...
 $ q1d: num  5 NA 5 NA NA 5 NA 5 NA 4 ...

and try: 
library(psych)
polychoric(values, na.rm=TRUE), but this returns the following error


The items do not have an equal number of response alternatives, global set to 
FALSE
Error in poly[1, ] : incorrect number of dimensions
In addition: Warning message:
In mclapply(seq_len(n), do_one, mc.preschedule = mc.preschedule,  :
  all scheduled cores encountered errors in user code

Can anyone provide any guidance?
Thanks, Simon Kiss


[R] escape characters for apostrophes in a .csv file

2014-04-23 Thread Simon Kiss

Hello: 
I have a .csv file that includes some character strings (open-ended survey 
responses) that contain apostrophes. Using read.csv() the file reads in 
just fine, except that once read in the apostrophes are displayed with a 
double backslash, i.e. 'I've' becomes 'I\\'ve'.  I'd like to print these 
responses out for a report.  Is there a way that I can have the apostrophes 
read in as in the original, or print them out without the escape characters?
Thank you. 
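Part of what is displayed may come from print(), which shows strings in escaped form: a single backslash stored in the data is printed as two. writeLines() and cat() show the characters as stored. A minimal sketch, assuming the file really contains a backslash before each apostrophe (the sample strings are made up):

```r
# Made-up responses; "\\'" in R source is one backslash plus an apostrophe
responses <- c("I\\'ve no idea", "It\\'s fine")

print(responses)       # escaped display: the stored backslash appears doubled
writeLines(responses)  # the characters as actually stored

# Remove the stored backslash; fixed = TRUE treats the pattern literally
cleaned <- gsub("\\'", "'", responses, fixed = TRUE)
writeLines(cleaned)
```

If writeLines() already shows the text without backslashes, nothing needs cleaning and only the printing method has to change.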

[R] ggplot2 counts versus percentages

2014-02-09 Thread Simon Kiss

Hello: I’m having trouble plotting a barchart with percentages rather 
than counts in ggplot2. I’m aware that others have had a problem with this, but 
I cannot get this to work as I wish. In the end, I’d like a facetted barchart 
with percentages rather than counts.   Thank you for any assistance!

I have a data.frame that looks like this below, table the data and then melt it 
to get it into long format.
#Libraries
l<-c('reshape', 'ggplot2')
lapply(l, library, character.only=T)

#Sample
test<-data.frame(society=sample(myvalues, size=100, replace=TRUE), 
equality=sample(myvalues, size=100, replace=TRUE), discrim=sample(myvalues, 
size=100, replace=TRUE))

#Long format
test.table<-apply(test, 2, table)
test.table<-melt(test.table)

#And now I do this to create a facetted series of barcharts
ggplot(test.table,aes(x=X1, y=value))+geom_bar(stat='identity')+facet_grid(~X2)
#How do I get it to plot percentages, rather than the counts?
#I’ve tried several variations of this to no success
ggplot(test.table,aes(x=X1, y=value))+geom_bar(stat='identity', aes(y=value, 
(..count..)/sum(..count..)))+facet_grid(~X2)
ggplot(test.table,aes(x=X1))+geom_bar(stat='identity', 
aes(y=value/(..sum..)/value))+facet_grid(~X2)

Thank you for your assistance!
Yours, Simon Kiss
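Since the data are already tabulated before plotting, one option is to compute the percentages at that stage and keep stat='identity', rather than asking ggplot2 to derive them. A minimal sketch under stated assumptions: myvalues is not defined in the post, so the three labels below are made up, and the column names item, response and percent are hypothetical. The plotting step only runs if ggplot2 is installed.

```r
set.seed(1)  # added only so the sketch is reproducible
myvalues <- c("Agree", "Neutral", "Disagree")  # assumption: not given in the post

test <- data.frame(society  = sample(myvalues, size = 100, replace = TRUE),
                   equality = sample(myvalues, size = 100, replace = TRUE),
                   discrim  = sample(myvalues, size = 100, replace = TRUE))

# Percentages within each variable, stacked into long format
pct <- do.call(rbind, lapply(names(test), function(v) {
  share <- prop.table(table(factor(test[[v]], levels = myvalues)))
  data.frame(item = v, response = names(share), percent = 100 * as.vector(share))
}))

# Plot the precomputed percentages, one facet per variable
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(pct, ggplot2::aes(x = response, y = percent)) +
    ggplot2::geom_bar(stat = "identity") +
    ggplot2::facet_grid(~item)
}
```

Because prop.table() is applied per variable, the bars in each facet sum to 100.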



[R] Searching the help archives - 404 error?

2013-12-19 Thread Simon Kiss

I'm using Mac OS 10.8.5, Chrome 31 and Safari 6.1. 
Recently, when entering anything into the search box here:
http://tolstoy.newcastle.edu.au/R/
I get this response when searching using either Chrome or Safari:

404. That’s an error.

The requested URL /u/newcastlemaths?q=rprofile&sa=Google+Search was not found 
on this server. That’s all we know.

Has the search engine for the help archives moved?
Yours, Simon Kiss

Re: [R] Help using mapply to run multiple models

2013-12-19 Thread Simon Kiss

Hi there: Just to tie this all together.

Here is the final function

f<- function (modelType, responseName, predictorNames, data, ..., envir = 
parent.frame())
{
  call <- match.call()
  call$formula <- formula(envir = envir, paste(responseName, sep = " ~ ",
   paste0("`", predictorNames, "`", 
collapse = " + ")))
  call[[1]] <- as.name(modelType)
  call$responseName <- NULL # omit responseName=
  call$predictorNames <- NULL # omit 'predictorNames='
  eval(call, envir = envir)
}
  
Here I apply the function to a list of predictor variables and one dependent 
variable. Note "glm" and not glm.
z <- lapply(list(c("hp","drat"), c("cyl"), c("am","gear")), 
FUN=function(preds)f("glm", "carb", preds, data=mtcars, family='binomial'))

I do get this error:
Error in glm.control(modelType = "glm") : 
  unused argument(s) (modelType = "glm")

But 
lapply(z, summary)

does seem to return a list of model summaries. It looks like it worked.

I also tried. 
z <- lapply(list(c("hp","drat"), c("cyl"), c("am","gear")), 
FUN=function(preds)f("lm", "mpg", preds, data=mtcars))

Here, I get:
1: In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  extra argument ‘modelType’ is disregarded.
2: In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  extra argument ‘modelType’ is disregarded.
3: In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  extra argument ‘modelType’ is disregarded.

But again, it actually looks like it worked.
So, thank you very much!
Yours, Simon Kiss
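The modelType messages above most likely arise because modelType is still part of the matched call when it is evaluated, so it gets passed on to glm() or lm() as an extra argument. A minimal sketch of the same function with that argument also removed from the call; as.formula() is swapped in for formula() here as a stylistic choice, and the rest follows the function in this thread.

```r
f <- function(modelType, responseName, predictorNames, data, ...,
              envir = parent.frame()) {
  call <- match.call()
  # Build the formula from the variable names; backticks protect odd names
  call$formula <- as.formula(
    paste(responseName, "~",
          paste0("`", predictorNames, "`", collapse = " + ")),
    env = envir)
  call[[1]] <- as.name(modelType)  # 'f' -> the requested modelling function
  call$modelType <- NULL           # omit 'modelType=' (the missing step)
  call$responseName <- NULL        # omit 'responseName='
  call$predictorNames <- NULL      # omit 'predictorNames='
  eval(call, envir = envir)
}

z <- lapply(list(c("hp", "drat"), c("cyl"), c("am", "gear")),
            FUN = function(preds) f("lm", "mpg", preds, data = mtcars))
lapply(z, summary)
```

Additional arguments such as family still travel through ..., for example f("glm", "carb", preds, data = mtcars, family = poisson).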

On 2013-12-19, at 1:55 PM, Simon Kiss  wrote:

> Hello Bill, that is fantastic and it's quite a bit above what I could write. 
> Is there a way to make the model type an argument to the function so that you 
> can specify whether one is running glm, lm and such? 
> I tried to modify it by inserting an argument modelType below, but that 
> doesn't work.
> Yours, simon Kiss
>> f <- function (modelType, responseName, predictorNames, data, ..., envir = 
>> parent.frame())
>>   {
>>   call <- match.call()
>>   call$formula <- formula(envir = envir, paste(responseName, sep = " ~ ",
>>   paste0("`", predictorNames, "`", collapse = " + ")))
>>   call[[1]] <- quote(modelType) # '
>>   call$responseName <- NULL # omit responseName=
>>   call$predictorNames <- NULL # omit 'predictorNames='
>>   eval(call, envir = envir)
>>   }
> On 2013-12-18, at 3:07 PM, William Dunlap  wrote:
> 
>> f <- function (responseName, predictorNames, data, ..., envir = 
>> parent.frame())
>>   {
>>   call <- match.call()
>>   call$formula <- formula(envir = envir, paste(responseName, sep = " ~ ",
>>   paste0("`", predictorNames, "`", collapse = " + ")))
>>   call[[1]] <- quote(glm) # 'f' -> 'glm'
>>   call$responseName <- NULL # omit responseName=
>>   call$predictorNames <- NULL # omit 'predictorNames='
>>   eval(call, envir = envir)
>>   }
>> as in
>>   z <- lapply(list(c("hp","drat"), c("cyl"), c("am","gear")), 
>> FUN=function(preds)f("carb", preds, data=mtcars, family=poisson))
>>   lapply(z, summary)
> 

Re: [R] Help using mapply to run multiple models

2013-12-19 Thread Simon Kiss

Hello Bill, that is fantastic and it's quite a bit above what I could write. Is 
there a way to make the model type an argument to the function so that you can 
specify whether one is running glm, lm and such? 
I tried to modify it by inserting an argument modelType below, but that doesn't 
work.
Yours, Simon Kiss
> f <- function (modelType, responseName, predictorNames, data, ...,
>                envir = parent.frame())
>   {
>   call <- match.call()
>   call$formula <- formula(envir = envir, paste(responseName, sep = " ~ ",
>     paste0("`", predictorNames, "`", collapse = " + ")))
>   call[[1]] <- quote(modelType) # '
>   call$responseName <- NULL # omit responseName=
>   call$predictorNames <- NULL # omit 'predictorNames='
>   eval(call, envir = envir)
>   }
On 2013-12-18, at 3:07 PM, William Dunlap  wrote:

> f <- function (responseName, predictorNames, data, ...,
>                envir = parent.frame())
>   {
>   call <- match.call()
>   call$formula <- formula(envir = envir, paste(responseName, sep = " ~ ",
>     paste0("`", predictorNames, "`", collapse = " + ")))
>   call[[1]] <- quote(glm) # 'f' -> 'glm'
>   call$responseName <- NULL # omit responseName=
>   call$predictorNames <- NULL # omit 'predictorNames='
>   eval(call, envir = envir)
>   }
> as in
>   z <- lapply(list(c("hp","drat"), c("cyl"), c("am","gear")),
>     FUN=function(preds)f("carb", preds, data=mtcars, family=poisson))
>   lapply(z, summary)
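
A possible reason the modelType attempt fails: quote(modelType) yields the literal symbol modelType, so eval() then looks for a function of that name. The sketch below is one way around it, not necessarily what the original author would recommend. It assumes the fitting function is passed unquoted (glm, lm), and it uses as.formula() in place of the formula() call quoted above.

```r
# Sketch: same idea as the quoted function, but the fitting function is
# taken from the first argument `modelType` (passed unquoted, e.g. glm).
f <- function(modelType, responseName, predictorNames, data, ...,
              envir = parent.frame()) {
  call <- match.call()
  # Build the formula from the variable names; backticks protect odd names.
  call$formula <- as.formula(
    paste(responseName, "~",
          paste0("`", predictorNames, "`", collapse = " + ")),
    env = envir)
  # match.call() stores the unevaluated argument (e.g. the symbol `glm`),
  # so it can be moved into the function position of the call.
  call[[1]] <- call$modelType
  call$modelType <- NULL      # omit 'modelType='
  call$responseName <- NULL   # omit 'responseName='
  call$predictorNames <- NULL # omit 'predictorNames='
  call$envir <- NULL          # omit 'envir=' if it was supplied
  eval(call, envir = envir)
}

fit_lm  <- f(lm,  "mpg",  c("hp", "wt"),   data = mtcars)
fit_glm <- f(glm, "carb", c("hp", "drat"), data = mtcars, family = poisson)
```

Because the stored call names the real fitting function, summary() and update() on the result should show glm(...) or lm(...) rather than f(...).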

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606


Re: [R] Help using mapply to run multiple models

2013-12-18 Thread Simon Kiss

Dennis, how would your function be modified to make it more flexible in 
future? 
I'm thinking of something like:
> f <- function(x='Dependent variable', y='List of Independent Variables',
>               z='Data Frame')
> {
>   form <- as.formula(paste(x, y, sep = " ~ "))
>   glm(form, data = z)
> }

I tried that then using 
modlist <- lapply(xvars, f), but it didn't work. 
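
A likely reason this fails: lapply(xvars, f) passes each element of xvars as the first argument (x, the dependent variable here) and leaves y and z at their placeholder string defaults. A minimal sketch of one alternative follows; the argument names predictor, response and data are made up for illustration. The varying argument goes first and the fixed ones are passed through lapply()'s extra arguments.

```r
set.seed(1)
sample.df <- data.frame(var1 = rbinom(50, size = 1, prob = 0.5),
                        var2 = rbinom(50, size = 2, prob = 0.5),
                        var3 = rbinom(50, size = 3, prob = 0.5),
                        var4 = rbinom(50, size = 2, prob = 0.5),
                        var5 = rbinom(50, size = 2, prob = 0.5))

# The argument that varies comes first; everything else is named.
fit_one <- function(predictor, response, data, ...) {
  form <- as.formula(paste(response, predictor, sep = " ~ "))
  glm(form, data = data, ...)
}

xvars <- setdiff(names(sample.df), "var1")
# Extra arguments after the function are handed to every call of fit_one().
modlist <- lapply(xvars, fit_one, response = "var1", data = sample.df,
                  family = binomial)
names(modlist) <- xvars
```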

On 2013-12-18, at 3:29 AM, Dennis Murphy  wrote:

> Hi:
> 
> Here's a way to generate a list of model objects. Once you have the
> list, you can write or call functions to extract useful pieces of
> information from each model object and use lapply() to call each list
> component recursively.
> 
> sample.df<-data.frame(var1=rbinom(50, size=1, prob=0.5),
>  var2=rbinom(50, size=2, prob=0.5),
>  var3=rbinom(50, size=3, prob=0.5),
>  var4=rbinom(50, size=2, prob=0.5),
>  var5=rbinom(50, size=2, prob=0.5))
> 
> # vector of x-variable names
> xvars <- names(sample.df)[-1]
> 
> # function to paste a variable x into a formula object and
> # then pass it to glm()
> f <- function(x)
> {
>    form <- as.formula(paste("var1", x, sep = " ~ "))
>    glm(form, data = sample.df)
> }
> 
> # Apply the function f to each variable in xvars
> modlist <- lapply(xvars, f)
> 
> To give you an idea of some of the things you can do with the list:
> 
> sapply(modlist, class)# return class of each component
> lapply(modlist, summary)   # return the summary of each model
> 
> # combine the model coefficients into a two-column matrix
> do.call(rbind, lapply(modlist, coef))
> 
> 
> You'd probably want to rename the second column since the slopes are
> associated with different x variables.
> 
> Dennis
> 
> On Tue, Dec 17, 2013 at 5:53 PM, Simon Kiss  wrote:
>> I think I'm missing something.  I have a data frame that looks below.
>> sample.df<-data.frame(var1=rbinom(50, size=1, prob=0.5), var2=rbinom(50, 
>> size=2, prob=0.5), var3=rbinom(50, size=3, prob=0.5), var4=rbinom(50, 
>> size=2, prob=0.5), var5=rbinom(50, size=2, prob=0.5))
>> 
>> I'd like to run a series of univariate general linear models where var1 is 
>> always the dependent variable and each of the other variables is the 
>> independent. Then I'd like to summarize each in a table.
>> I've tried :
>> 
>> sample.formula=list(var1~var2, var1 ~var3, var1 ~var4, var1~var5)
>> mapply(glm, formula=sample.formula, data=list(sample.df), family='binomial')
>> 
>> And that works pretty well, except, I'm left with a matrix that contains all 
>> the information I need. I can't figure out how to use summary() properly on 
>> this information to usefully report that information.
>> 
>> Thank you for any suggestions.
>> 
>> *
>> Simon J. Kiss, PhD
>> Assistant Professor, Wilfrid Laurier University
>> 73 George Street
>> Brantford, Ontario, Canada
>> N3T 2C9
>> 

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606


Re: [R] Help using mapply to run multiple models

2013-12-18 Thread Simon Kiss

Thanks! that works, more or less. Although the wonky behaviour of mapply that 
David pointed out is irritating. I tried deleting the $call item from the 
models produced and passing them to stargazer for reporting the results, but 
stargazer won't recognize the results even though the class is explicitly "glm 
lm".  
Does anyone know why mapply produces such weird results?
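
One plausible explanation, sketched below: mapply() defaults to SIMPLIFY = TRUE, and because every fitted glm object is a list with the same number of components, the results are folded into a list-matrix with one column per model. The class attribute is lost along the way. Asking for SIMPLIFY = FALSE keeps a plain list of glm objects. The MoreArgs form is used here instead of data = list(sample.df) purely for readability.

```r
set.seed(1)
sample.df <- data.frame(var1 = rbinom(50, size = 1, prob = 0.5),
                        var2 = rbinom(50, size = 2, prob = 0.5),
                        var3 = rbinom(50, size = 3, prob = 0.5),
                        var4 = rbinom(50, size = 2, prob = 0.5),
                        var5 = rbinom(50, size = 2, prob = 0.5))
sample.formula <- list(var1 ~ var2, var1 ~ var3, var1 ~ var4, var1 ~ var5)

# Default behaviour: the four model objects are simplified into one structure.
simplified <- mapply(glm, formula = sample.formula,
                     MoreArgs = list(data = sample.df, family = binomial))
is.matrix(simplified)

# Without simplification each element is still a complete glm object.
kept <- mapply(glm, formula = sample.formula,
               MoreArgs = list(data = sample.df, family = binomial),
               SIMPLIFY = FALSE)
sapply(kept, function(m) class(m)[1])
```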
On 2013-12-18, at 3:29 AM, Dennis Murphy  wrote:

> Hi:
> 
> Here's a way to generate a list of model objects. Once you have the
> list, you can write or call functions to extract useful pieces of
> information from each model object and use lapply() to call each list
> component recursively.
> 
> sample.df<-data.frame(var1=rbinom(50, size=1, prob=0.5),
>  var2=rbinom(50, size=2, prob=0.5),
>  var3=rbinom(50, size=3, prob=0.5),
>  var4=rbinom(50, size=2, prob=0.5),
>  var5=rbinom(50, size=2, prob=0.5))
> 
> # vector of x-variable names
> xvars <- names(sample.df)[-1]
> 
> # function to paste a variable x into a formula object and
> # then pass it to glm()
> f <- function(x)
> {
>    form <- as.formula(paste("var1", x, sep = " ~ "))
>    glm(form, data = sample.df)
> }
> 
> # Apply the function f to each variable in xvars
> modlist <- lapply(xvars, f)
> 
> To give you an idea of some of the things you can do with the list:
> 
> sapply(modlist, class)# return class of each component
> lapply(modlist, summary)   # return the summary of each model
> 
> # combine the model coefficients into a two-column matrix
> do.call(rbind, lapply(modlist, coef))
> 
> 
> You'd probably want to rename the second column since the slopes are
> associated with different x variables.
> 
> Dennis
> 
> On Tue, Dec 17, 2013 at 5:53 PM, Simon Kiss  wrote:
>> I think I'm missing something.  I have a data frame that looks below.
>> sample.df<-data.frame(var1=rbinom(50, size=1, prob=0.5), var2=rbinom(50, 
>> size=2, prob=0.5), var3=rbinom(50, size=3, prob=0.5), var4=rbinom(50, 
>> size=2, prob=0.5), var5=rbinom(50, size=2, prob=0.5))
>> 
>> I'd like to run a series of univariate general linear models where var1 is 
>> always the dependent variable and each of the other variables is the 
>> independent. Then I'd like to summarize each in a table.
>> I've tried :
>> 
>> sample.formula=list(var1~var2, var1 ~var3, var1 ~var4, var1~var5)
>> mapply(glm, formula=sample.formula, data=list(sample.df), family='binomial')
>> 
>> And that works pretty well, except, I'm left with a matrix that contains all 
>> the information I need. I can't figure out how to use summary() properly on 
>> this information to usefully report that information.
>> 
>> Thank you for any suggestions.
>> 
>> *
>> Simon J. Kiss, PhD
>> Assistant Professor, Wilfrid Laurier University
>> 73 George Street
>> Brantford, Ontario, Canada
>> N3T 2C9
>> 

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606


[R] Help using mapply to run multiple models

2013-12-17 Thread Simon Kiss

I think I'm missing something.  I have a data frame that looks like the one below.  
sample.df<-data.frame(var1=rbinom(50, size=1, prob=0.5), var2=rbinom(50, 
size=2, prob=0.5), var3=rbinom(50, size=3, prob=0.5), var4=rbinom(50, size=2, 
prob=0.5), var5=rbinom(50, size=2, prob=0.5))

I'd like to run a series of univariate general linear models where var1 is 
always the dependent variable and each of the other variables is the 
independent. Then I'd like to summarize each in a table.
I've tried : 

sample.formula=list(var1~var2, var1 ~var3, var1 ~var4, var1~var5)
mapply(glm, formula=sample.formula, data=list(sample.df), family='binomial')

And that works pretty well, except that I'm left with a matrix that contains all 
the information I need. I can't figure out how to use summary() on that matrix 
to report the results usefully. 
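
A minimal sketch of one way to get a summary table, assuming the goal is one row per predictor holding its slope, standard error, z value and p-value. Map() returns a list of model objects, and the second row of each coefficient table (the slope) is stacked with rbind(). The names xvars, models and coef_table are illustrative.

```r
set.seed(1)
sample.df <- data.frame(var1 = rbinom(50, size = 1, prob = 0.5),
                        var2 = rbinom(50, size = 2, prob = 0.5),
                        var3 = rbinom(50, size = 3, prob = 0.5),
                        var4 = rbinom(50, size = 2, prob = 0.5),
                        var5 = rbinom(50, size = 2, prob = 0.5))

xvars <- c("var2", "var3", "var4", "var5")

# One binomial glm per predictor; the list is named after the predictors.
models <- Map(function(x) {
  glm(reformulate(x, response = "var1"), data = sample.df, family = binomial)
}, xvars)

# Row 2 of each coefficient table is the slope for that model's predictor.
coef_table <- do.call(rbind, lapply(models, function(m) {
  coef(summary(m))[2, , drop = FALSE]
}))
round(coef_table, 3)
```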

Thank you for any suggestions. 

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9


[R] ggplot2 percentages of subpopulations

2013-09-10 Thread Simon Kiss

Hi there: 
I have a sample data set that looks like below.  The variable 'value' 
represents the counts of cases in each response category.  And I would like to 
get the barchart to graph the number of responses as a percentage of each total 
*subpopulation* (Males compared to Females), rather than as a percentage of 
*all* the responses.
Can someone provide a suggestion?
Thank you

Yours, Simon Kiss
#Sample Code
library(ggplot2)
sample.dat<-data.frame(response.category=rep(c('A', 'B','C'), 2), 
value=c(50,25,25, 25,25,25), pop=c(rep('Males', 3), rep('Females', 3)))
#Draw GGPLot
test<-ggplot(sample.dat, aes(x=response.category,y=value, group=pop))
test+geom_bar(stat='identity', position='dodge',aes(fill=pop))
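
One way to get within-group percentages, sketched under the assumption that it is acceptable to compute them before plotting: divide each count by the total for its own subpopulation with ave(), then plot the new column. The column name percent is made up for illustration, and the plotting part only runs if ggplot2 is installed.

```r
sample.dat <- data.frame(
  response.category = rep(c("A", "B", "C"), 2),
  value = c(50, 25, 25, 25, 25, 25),
  pop = c(rep("Males", 3), rep("Females", 3))
)

# Share of each response within its own subpopulation (Males or Females).
sample.dat$percent <- 100 * sample.dat$value /
  ave(sample.dat$value, sample.dat$pop, FUN = sum)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sample.dat, aes(x = response.category, y = percent, fill = pop)) +
    geom_bar(stat = "identity", position = "dodge") +
    ylab("Percent of subpopulation")
  print(p)
}
```

With the sample data, the Males bars come out at 50, 25 and 25 percent and the Females bars at one third each.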


[R] Goodness of fit statistics for cfa are missing (sem package)

2013-07-09 Thread Simon Kiss

Dear colleagues,
I'm working on a confirmatory factor analysis and the model is not returning 
most of the usual goodness-of-fit statistics. 

I'm testing whether this survey data confirms a hypothesized two-factor 
uncorrelated structure that has theoretical and empirical support from another 
case.

Below is (hopefully!) reproducible code which creates ff.cov a replica of the 
covariance matrix from my own data, the model that I have specified and am 
testing (cfa.mod.1) and the sem test of that model (cfa) and then a summary of 
the model fit (summary(cfa1)). 

The problem is that many of the usual measures of goodness-of-fit do not appear 
after summary(cfa1).  I only get a chi-square statistic, degrees of freedom and 
a BIC.  

I saw a previous question on the R-mailing list that raised a similar issue and 
it was suggested that the problem lay in the specification of the model and 
that the degrees of freedom there were 0.  Here, though, the df is 77.  
Unfortunately I can't find that question in the archives again or I would have 
linked to it.

The data set includes 376 observations and has 14 variables.  Seven (coded with 
an h in the variable name, as in cc.h.varname.e or h) are hypothesized to load 
on one factor uncorrelated with the second factor, coded with a c (as in 
cc.c.varname.c or i). 
I used this for guidance http://vimeo.com/38941937
Yours, Simon Kiss


#Load Libraries
library(sem)

#Create Covariance Matrix
ff.cov <-
  structure(c(0.0925407885304659, 0.0296839426523298, 0.00787168458781362, 
  0.0261784946236559, 0.0031878853046595, 0.0261837275985663, 
0.00847584229390681, 
  -0.00106, -0.00600867383512545, -0.010714623655914, 
-0.0123756272401434, 
  -0.00528007168458781, -0.00116, 0.000812186379928316, 
0.0296839426523298, 
  0.0665810023041475, 0.00836764592933948, 0.0281491359447005, 
  0.00793406810035842, 0.0169870865335381, 0.00258921786994368, 
  0.000712720174091142, -0.00318649385560676, -0.0083643253968254, 
  -0.0133228366615463, -0.00557817844342038, -0.00224328341013825, 
  -0.000821018945212493, 0.00787168458781362, 0.00836764592933948, 
  0.0804340181771633, 0.00589630696364567, 0.0201758960573477, 
  0.012536866359447, 0.000343669994879673, 0.00425544674859191, 
  -0.00838453020993344, 0.00563975294418843, 0.00256180235535074, 
  0.00609073860727087, -0.00659535970302099, 0.00495727086533538, 
  0.0261784946236559, 0.0281491359447005, 0.00589630696364567, 
  0.0716310995903738, 0.00856442652329749, 0.0175328725038402, 
  0.0104401625704045, 0.0074095942140297, 0.00455983742959549, 
  -0.0123115783410138, -0.0192821300563236, -0.0166337109575013, 
  0.00623943292370712, -0.0114852790578597, 0.0031878853046595, 
  0.00793406810035842, 0.0201758960573477, 0.00856442652329749, 
  0.0550506451612903, 0.00998831541218638, -0.000462311827956989, 
  -0.0019576523297491, -0.0053855017921147, 0.00893281362007168, 
  8.10035842293904e-05, 0.00704324372759857, -0.00593381720430108, 
  -0.00112867383512545, 0.0261837275985663, 0.0169870865335381, 
  0.012536866359447, 0.0175328725038402, 0.00998831541218638, 
0.050779846390169, 
  0.00705281105990783, -0.00306167946748592, -0.00736291858678955, 
  0.00135779825908858, -0.00280025601638505, 0.00161077316948285, 
  -0.00899035330261137, 0.00921536098310292, 0.00847584229390681, 
  0.00258921786994368, 0.000343669994879673, 0.0104401625704045, 
  -0.000462311827956989, 0.00705281105990783, 0.0475710074244752, 
  -0.0101174718381976, -0.0102886418330773, -0.0175483013312852, 
  -0.0107030209933436, -0.00982140168970814, -0.00959551843317972, 
  -0.00663934971838198, -0.00106, 0.000712720174091142, 
0.00425544674859191, 
  0.0074095942140297, -0.0019576523297491, -0.00306167946748592, 
  -0.0101174718381976, 0.0572903110599078, 0.0148363965693804, 
  0.0134899987199181, 0.0172146441372248, 0.00366528545826933, 
  0.0148914759344598, 0.00469321556579621, -0.00600867383512545, 
  -0.00318649385560676, -0.00838453020993344, 0.00455983742959549, 
  -0.0053855017921147, -0.00736291858678955, -0.0102886418330773, 
  0.0148363965693804, 0.0650389644137225, 0.00571948412698413, 
  0.00671484895033282, -0.000752505120327701, 0.0295244790066564, 
  -0.00901979006656426, -0.010714623655914, -0.0083643253968254, 
  0.00563975294418843, -0.0123115783410138, 0.00893281362007168, 
  0.00135779825908858, -0.0175483013312852, 0.0134899987199181, 
  0.00571948412698413, 0.0839045558115719, 0.0463349206349206,

[R] Psych package: Error in biplot.psych(sample.mod) : Biplot requires factor/component scores:

2013-06-25 Thread Simon Kiss

Hello: I'm trying to construct a biplot from the psych package. The underlying 
data frame looks just like sample.data, below. I turned it into a polychoric 
correlation matrix sample.cor, below, as it is derived from a series of Likert 
(ordinal) items. All are positive, I just used negative numbers in this dataset 
to get two separate factors.  I created a PCA from sample.cor$rho, specifying 
that scores were to be kept via scores=TRUE, but the command, 
biplot.psych(sample.mod) returns the error message: Error in biplot.psych, 
Biplot requires factor/component scores.
But it seems from the help documentation, that one really only has to use the 
command biplot(mod) to get the plot.
Can someone please advise?
Yours, Simon Kiss


#Load library
library(psych)
#Sample data
sample.data<-data.frame(var1=sample(c(0,0.33, 0.66, 1), size=100, 
replace=TRUE), var2=sample(c(0,0.33, 0.66, 1), size=100, replace=TRUE), 
var3=sample(c(0,-0.33, -0.66, -1), size=100, replace=TRUE), 
var4=sample(c(0,-0.33,-0.66,-1), size=100, replace=TRUE))

#Correlation Matrix
sample.cor<-polychoric(sample.data, polycor=TRUE)
#Principal Components Analysis
sample.mod<-principal(sample.cor$rho, nfactors=2,scores=TRUE,covar=TRUE)
#Draw Biplot
biplot.psych(sample.mod)

#error
Error in biplot.psych(sample.mod) : 
  Biplot requires factor/component scores:
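
A possible explanation, offered as a sketch and not a confirmed diagnosis: component scores need the raw observations, and principal() only receives a correlation matrix here, so no scores can be stored regardless of scores=TRUE. The example below passes raw data instead. It uses integer-coded items and the default Pearson correlations, so it does not reproduce the polychoric step, and it only runs if psych is installed.

```r
if (requireNamespace("psych", quietly = TRUE)) {
  library(psych)
  set.seed(1)
  # Illustrative Likert-style items coded 1 to 4.
  sample.data <- data.frame(var1 = sample(1:4, 100, replace = TRUE),
                            var2 = sample(1:4, 100, replace = TRUE),
                            var3 = sample(1:4, 100, replace = TRUE),
                            var4 = sample(1:4, 100, replace = TRUE))
  # Raw data in, so component scores can be computed and kept.
  sample.mod <- principal(sample.data, nfactors = 2, scores = TRUE)
  has_scores <- !is.null(sample.mod$scores)
  # biplot() dispatches to the psych method for this object.
  biplot(sample.mod)
}
```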


[R] Adding time series to time graphs

2013-04-11 Thread Simon Kiss

Hello: I have done this before but cannot figure out how to do it again.

I would like to graph campaign evolution of news stories on certain topics. The 
campaign time period is as follows:

campaign<-seq.Date(from=as.Date('2011-09-06'), to=as.Date('2011-10-5'), by=1)

I have a table of newspaper story frequencies containing a certain word that 
can be turned into a data.frame (or not). I'll reproduce it as a data.frame

plotdf<-data.frame(story.dates=seq.Date(as.Date('2011-09-17'),as.Date('2011-09-30'),
 by=1), Freq=seq(1,14, by=1))

How do I overlay the frequency of newspaper stories in a line plot on a graph 
where the x-axis is a series of dates twice as long as the time series itself? 
The reason I'd like this is because I'd like to add a couple of other story 
time series as well. They may appear at other points in time in the campaign as 
well.

Thanks.
Simon Kiss
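
A minimal base-graphics sketch of one approach: draw the first series with xlim set to the full campaign range, then add further series with lines(). The second data frame, other, is invented purely to show the overlay.

```r
campaign <- seq.Date(from = as.Date("2011-09-06"), to = as.Date("2011-10-05"),
                     by = 1)
plotdf <- data.frame(
  story.dates = seq.Date(as.Date("2011-09-17"), as.Date("2011-09-30"), by = 1),
  Freq = seq(1, 14, by = 1)
)
# Hypothetical second topic appearing earlier in the campaign.
other <- data.frame(
  story.dates = seq.Date(as.Date("2011-09-08"), as.Date("2011-09-14"), by = 1),
  Freq = c(2, 5, 3, 6, 4, 1, 2)
)

# The x-axis covers the whole campaign, not just the dates with stories.
plot(plotdf$story.dates, plotdf$Freq, type = "l",
     xlim = range(campaign),
     ylim = c(0, max(plotdf$Freq, other$Freq)),
     xlab = "Date", ylab = "Number of stories")
lines(other$story.dates, other$Freq, lty = 2)
legend("topleft", legend = c("Topic 1", "Topic 2 (made up)"), lty = c(1, 2))
```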

[R] latex(test, collabel=) returns wrong latex code?

2013-03-15 Thread Simon Kiss

Hello:
I'm working with a 2-dimensional table that looks sort of like test below.
I'm trying to produce latex code that will add dimension names for both the 
rows and the columns.
In using the following code, latex chokes when I include collabel='Vote' but 
it's fine without it.

The code below produces the latex code further below.  I'm confused by this, 
because it looks like it's creating two bits of text for each instance of 
\multicolumn.  Is that really allowed in \multicolumn?
Could someone clarify?
Thank you!
Yours, SJK


library(Hmisc)
test<-as.table(matrix(c(50,50,50,50), ncol=2))
latex(test, rowlabel='Gender',collabel='Vote', file='')

% latex.default(test, rowlabel = "Gender", collabel = "vote", file = "") 
%
\begin{table}[!tbp]
\begin{center}
\begin{tabular}{lrr}
\hline\hline
\multicolumn{1}{l}{Gender}&\multicolumn{1}{vote}{A}&\multicolumn{1}{l}{B}\tabularnewline
\hline
A&$50$&$50$\tabularnewline
B&$50$&$50$\tabularnewline
\hline
\end{tabular}
\end{center}
\end{table}
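
A hedged sketch of an alternative: the output above suggests the value given as collabel ended up in the column-justification slot of \multicolumn, although the exact mechanism is not confirmed here. Hmisc's latex() does document cgroup and n.cgroup for a header spanning several columns, so a label over both columns could be requested as below. The code only runs if Hmisc is installed.

```r
if (requireNamespace("Hmisc", quietly = TRUE)) {
  library(Hmisc)
  test <- as.table(matrix(c(50, 50, 50, 50), ncol = 2))
  # cgroup gives the spanning header text, n.cgroup the number of columns
  # it covers; file = "" sends the LaTeX to the console.
  tex <- capture.output(
    invisible(latex(test, rowlabel = "Gender", cgroup = "Vote",
                    n.cgroup = 2, file = ""))
  )
  cat(tex, sep = "\n")
}
```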
*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606

Please avoid sending me Word, PowerPoint or Excel attachments. Sending these 
documents puts pressure on many people to use Microsoft software and helps to 
deny them any other choice. In effect, you become a buttress of the Microsoft 
monopoly.

To convert to plain text choose Text Only or Text Document as the Save As Type. 
 Your computer may also have a program to convert to PDF format. Select File, 
then Print. Scroll through available printers and select the PDF converter. 
Click on the Print button and enter a name for the PDF file when requested.


[R] Reading outdated .Rprofile file

2013-03-05 Thread Simon Kiss

Hi there:
I'm having a weird problem with my startup procedure. R.app is reading an 
unknown .Rprofile file.

First, I'm on Mac OS X 10.6.8 running R.app 2.15.0

On startup
 > getwd()
[1] "/Users/simon"

But: the contents of my .Rprofile file in my home directory when viewed with a 
text editor are:

.First<-function() {
  source("/Users/simon/Documents/R/functions/trim.leading.R")
  source("/Users/simon/Documents/R/functions/trim.trailing.R")
  source("/Users/simon/Documents/R/functions/trim.R")
  source("/Users/simon/Documents/R/functions/pseudor2.R")
  source("/Users/simon/Documents/R/functions/dates.R")
  source("/Users/simon/Documents/R/functions/andersen.R")
  source("/Users/simon/Documents/R/functions/tabfun.R")
  source("/Users/simon/Documents/R/functions/cox_snell.R")
  source("/Users/simon/Documents/R/functions/cor.prob.R")
  source("/Users/simon/Documents/R/functions/kmo.R")
  source("/Users/simon/Documents/R/functions/residual.stats.R")
  source("/Users/simon/Documents/R/functions/missings.plot.R")
}


but then, when I type .First from the command line I get
function () 
{
source("/Users/simon/Documents/R/functions/sample_size.R")
source("/Users/simon/Documents/R/functions/pseudor2.R")
source("/Users/simon/Documents/R/functions/dates.R")
source("/Users/simon/Documents/R/functions/andersen.R")
source("/Users/simon/Documents/R/functions/tabfun.R")
source("/Users/simon/Documents/R/functions/cox_snell.R")
source("/Users/simon/Documents/R/functions/cor.prob.R")
source("/Users/simon/Documents/R/functions/kmo.R")
}

Needless to say, I get an error because the file sample_size.R was deleted a 
long time ago.

So how do I get R.app to read the updated .Rprofile file?
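
One possibility, which nothing in this message confirms: the old .First is restored from a saved workspace (.RData). R loads that file after reading the profile, so a .First stored there would replace the one defined in .Rprofile. The diagnostic sketch below lists the profile files R would consider and checks whether a workspace in the startup directory contains a .First object. The variable names are illustrative.

```r
# User profile candidates, in the order described in ?Startup:
# R_PROFILE_USER if set, otherwise ./.Rprofile, otherwise ~/.Rprofile.
profile_candidates <- c(
  Sys.getenv("R_PROFILE_USER"),
  file.path(getwd(), ".Rprofile"),
  path.expand("~/.Rprofile")
)
profile_candidates <- unique(profile_candidates[nzchar(profile_candidates)])
data.frame(path = profile_candidates, exists = file.exists(profile_candidates))

# Does a saved workspace in the startup directory carry its own .First?
workspace <- file.path(getwd(), ".RData")
first_in_workspace <- FALSE
if (file.exists(workspace)) {
  saved <- new.env()
  load(workspace, envir = saved)
  first_in_workspace <- exists(".First", envir = saved, inherits = FALSE)
}
first_in_workspace
```

If this returns TRUE, removing .First from the workspace and saving it again, or starting without restoring the workspace, would be the things to try.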


[R] Help with multiple barplots

2013-01-31 Thread Simon Kiss

Hello: I need to create six barplots from data that look pretty close to 
what appears below. There are two grouping variables (age and gender) and three 
dependent variables for each grouping variable.  I'm not really familiar with 
trellis graphics; perhaps there is something there that can do what I need, I 
don't know.  
The thing is: I *need* these to appear on one row, with some way of 
differentiating between the three barplots of one grouping variable and the 
three from the other grouping variable.  It's for a grant application and space 
is at a premium.  Everything can be about 7 inches wide and maybe 2 to 2.5 
inches high. I also need an outer margin to place a legend.  I can do this with 
the layout command, as below, but I cannot figure out a nice way to 
differentiate the two groups of variables.  I'd like to find a way to put a 
little bit of space between the three from one grouping variable and the three 
from the other grouping variable.  

If anyone has any thoughts, I'd be very grateful. Yours truly, Simon J. Kiss

###Random Data
crime<-sample(c('agree' ,'disagree'), replace=TRUE, size=100)
guns<-sample(c('agree','disagree'), replace=TRUE, size=100)
climate<-sample(c('agree', 'disagree'), replace=TRUE, size=100)
gender<-sample(c('male','both' ,'female'), replace=TRUE, size=100)
age<-sample(c('old', 'neither', 'young'), replace=TRUE, size=100)
dat<-as.data.frame(cbind(crime, guns, climate, gender, age))
###Code I'm working with now
layout(matrix(1:6, nrow=1))
barplot(prop.table(table(dat$guns, dat$gender), 2))
barplot(prop.table(table(dat$crime, dat$gender), 2))
barplot(prop.table(table(dat$climate, dat$gender), 2))
barplot(prop.table(table(dat$guns, dat$age), 2))
barplot(prop.table(table(dat$crime, dat$age), 2))
barplot(prop.table(table(dat$climate, dat$age), 2))
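
A minimal sketch of one way to separate the two groups with layout() alone: add a seventh, narrow column whose entry is 0, which layout() leaves empty. The spacer width, margins and titles are arbitrary choices for illustration.

```r
set.seed(1)
crime   <- sample(c("agree", "disagree"), replace = TRUE, size = 100)
guns    <- sample(c("agree", "disagree"), replace = TRUE, size = 100)
climate <- sample(c("agree", "disagree"), replace = TRUE, size = 100)
gender  <- sample(c("male", "both", "female"), replace = TRUE, size = 100)
age     <- sample(c("old", "neither", "young"), replace = TRUE, size = 100)
dat <- data.frame(crime, guns, climate, gender, age)

# Small plot margins, plus an outer margin on the right for a legend.
op <- par(mar = c(3, 2, 2, 0.5), oma = c(0, 0, 0, 4))
# Three plots, an empty spacer (0), then three more plots, all on one row.
lay <- matrix(c(1, 2, 3, 0, 4, 5, 6), nrow = 1)
layout(lay, widths = c(1, 1, 1, 0.3, 1, 1, 1))

for (g in c("gender", "age")) {
  for (v in c("guns", "crime", "climate")) {
    barplot(prop.table(table(dat[[v]], dat[[g]]), 2),
            main = paste(v, "by", g), cex.names = 0.7)
  }
}

par(op)
layout(1)  # back to a single plotting region
```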


[R] Function failure in tm

2013-01-15 Thread Simon Kiss

Hi all:
I have a customized source reader for the package tm (that Milan Bouchet-Valat 
has been instrumental in producing). 
I can get it to produce a corpus of class:
"VCorpus" "Corpus"  "list"   

class(mycorp[1]) returns
"VCorpus" "Corpus"  "list"   

and class(mycorp[[1]]) returns 
"PlainTextDocument" "TextDocument"  "character"   

But now that I've got a corpus, none of the transformation functions work at 
all. They all return the following error (with the respective function name)
Error in UseMethod("stripWhitespace", x) : 
  no applicable method for 'stripWhitespace' applied to an object of class 
"NULL"

I haven't seen this error reported anywhere in the R-list archives.  Does 
anyone have any suggestions?
Yours, Simon Kiss

P.S. The results of sessionInfo() are
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] RWekajars_3.7.7-2 rJava_0.9-3   RWeka_0.4-13  
Snowball_0.0-8   
[5] tm.plugin.factiva_1.1 tm_0.5-8.1   

loaded via a namespace (and not attached):
[1] grid_2.15.0  slam_0.1-26  tools_2.15.0 XML_3.9-4   


Re: [R] tm: custom reader for readPlain

2013-01-08 Thread Simon Kiss

Hmm...Thanks a lot! that seems like really useful stuff. It might be a bit over 
 my head, but I'll look into it. 
The articles are all contained in one text file, but they are clearly delimited 
(either by a series of  ) or the regular expression ^Document.[0-9]. 
Simon
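
A base-R sketch of the splitting step, assuming every article starts with a line of the form "Document N of M" as in the sample. The lines vector stands in for readLines() on the real export, and the second article is invented. The pattern is a stricter version of the ^Document.[0-9] expression mentioned above.

```r
# In practice: lines <- readLines("export.txt")  # hypothetical file name
lines <- c("Document 1 of 40",
           "First Nation agrees not to block trains",
           "Author: SHAWN BERRY Legislature Bureau",
           "",
           "Document 2 of 40",
           "A second, made-up headline",
           "Author: SOMEONE ELSE")

# TRUE on the first line of each article.
starts <- grepl("^Document [0-9]+ of [0-9]+", lines)
# The running count of start lines gives one group per article.
articles <- split(lines, cumsum(starts))
length(articles)
```

Each element of articles is then the character vector for one article, ready to hand to a reader function.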
On 2013-01-08, at 4:44 PM, Milan Bouchet-Valat wrote:

> Le mardi 08 janvier 2013 à 15:56 -0500, Simon Kiss a écrit :
>> Hello:
>> I have a series of newspaper articles from a Canadian newspaper
>> database (Canadian Newsstand) that look just like below.
>> 
>> I've read through this vignette
>> (http://cran.r-project.org/web/packages/tm/vignettes/extensions.pdf)
>> about creating a custom reader to extract meta-data, but I can't
>> understand how to apply this in the context of a text document, rather
>> than in the tabular format as in the vignette.  You can see there's
>> all kinds of valuable information in each document -Author, page
>> number, publication year, section, publication title
>> Can anyone provide some suggestions to someone unfamiliar with the tm
>> package as to how to go about creating a custom reader for this
>> situation?
> You should create a reader function that takes as an input the text
> content you pasted at the end of your messages, parses it as
> appropriate, and returns a PlainTextDocument. The information can be set
> using the meta() function on the document object before returning it.
> You can see how this process works by looking at the readFactivaHTML.R
> file from my tm.plugin.factiva package, and probably from other packages
> too (do not use readFactivaXML.R as it uses a method that only works for
> XML input). Of course, parsing the input will take some work, but it
> shouldn't be too hard if you split each line into a field identifier
> (the part before ":") and the value of the field, and create a character
> vector from that.
> 
> An information you did not give us is how are distributed the different
> articles you need to import. If they are each in a separate files, you
> can adapt DirSource() from tm so that it calls your reader function on
> each file. If they are in one file, you need to create a custom source
> that will read the file, split it and call the reader function on the
> part corresponding to each article; this latter way is illustrated by
> the HTML part of the FactivaSource.R file (again, skip the XML part).
> 
> Finally, maybe you can extract the articles in a different format,
> ideally in XML, which is easier to use? Or maybe this newspaper is
> available on Factiva, in which case my package will work for you?
> 
> 
> Hope this helps
> 
> 
>> Yours truly,
>> Simon Kiss
>> 
>> 
>> 
>> Document 1 of 40
>> First Nation agrees not to block trains
>> Author: SHAWN BERRY Legislature Bureau
>> Publication info: Daily Gleaner [Fredericton, N.B] 07 Jan 2013: A.3.
>> http://remote.libproxy.wlu.ca/login?url=http://search.proquest.com/docview/1266701269?accountid=15090
>> Abstract: Participants are also concerned about Chief Theresa Spence who 
>> stopped eating solid food on Dec. 11 in a bid to secure a meeting between 
>> First Nations leaders, Prime Minister Stephen Harper and Gov. Gen. David 
>> Johnston to discuss the treaty relationship.
>> Links: null
>> Full Text: A bunch of text about a story here
>> Subject: Railroads; Native North Americans; Meetings; Injunctions
>> Title: First Nation agrees not to block trains
>> Publication title: Daily Gleaner
>> First page: A.3
>> Publication year: 2013
>> Publication date: Jan 7, 2013
>> Year: 2013
>> Section: Main
>> Publisher: Infomart, a division of Postmedia Network Inc.
>> Place of publication: Fredericton, N.B.
>> Country of publication: Canada
>> Journal subject: GENERAL INTEREST PERIODICALS--UNITED STATES
>> ISSN: 08216983
>> Source type: Newspapers
>> Language of publication: English
>> Document type: News
>> ProQuest document ID: 1266701269
>> Document URL: 
>> http://remote.libproxy.wlu.ca/login?url=http://search.proquest.com/docview/1266701269?accountid=15090
>> Copyright: (Copyright (c) 2013 The Daily Gleaner (Fredericton))
>> Last updated: 2013-01-07
>> Database: Canadian Newsstand Complete
>> 
>> 
>> *
>> Simon J. Kiss, PhD
>> Assistant Professor, Wilfrid Laurier University
>> 73 George Street
>> Brantford, Ontario, Canada
>> N3T 2C9
>> Cell: +1 905 746 7606
>> 
>> Please avoid sending me Word, PowerPoint or Excel attachments.

[R] tm: custom reader for readPlain

2013-01-08 Thread Simon Kiss

Hello:
I have a series of newspaper articles from a Canadian newspaper database 
(Canadian Newsstand) that look just like below.

I've read through this vignette 
(http://cran.r-project.org/web/packages/tm/vignettes/extensions.pdf) about 
creating a custom reader to extract meta-data, but I can't understand how to 
apply this in the context of a text document, rather than in the tabular format 
as in the vignette.  You can see there's all kinds of valuable information in 
each document -Author, page number, publication year, section, publication 
title
Can anyone provide some suggestions to someone unfamiliar with the tm package 
as to how to go about creating a custom reader for this situation?
Yours truly,
Simon Kiss



Document 1 of 40
First Nation agrees not to block trains
Author: SHAWN BERRY Legislature Bureau
Publication info: Daily Gleaner [Fredericton, N.B] 07 Jan 2013: A.3.
http://remote.libproxy.wlu.ca/login?url=http://search.proquest.com/docview/1266701269?accountid=15090
Abstract: Participants are also concerned about Chief Theresa Spence who 
stopped eating solid food on Dec. 11 in a bid to secure a meeting between First 
Nations leaders, Prime Minister Stephen Harper and Gov. Gen. David Johnston to 
discuss the treaty relationship.
Links: null
Full Text: A bunch of text about a story here
Subject: Railroads; Native North Americans; Meetings; Injunctions
Title: First Nation agrees not to block trains
Publication title: Daily Gleaner
First page: A.3
Publication year: 2013
Publication date: Jan 7, 2013
Year: 2013
Section: Main
Publisher: Infomart, a division of Postmedia Network Inc.
Place of publication: Fredericton, N.B.
Country of publication: Canada
Journal subject: GENERAL INTEREST PERIODICALS--UNITED STATES
ISSN: 08216983
Source type: Newspapers
Language of publication: English
Document type: News
ProQuest document ID: 1266701269
Document URL: 
http://remote.libproxy.wlu.ca/login?url=http://search.proquest.com/docview/1266701269?accountid=15090
Copyright: (Copyright (c) 2013 The Daily Gleaner (Fredericton))
Last updated: 2013-01-07
Database: Canadian Newsstand Complete


*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606

Please avoid sending me Word, PowerPoint or Excel attachments. Sending these 
documents puts pressure on many people to use Microsoft software and helps to 
deny them any other choice. In effect, you become a buttress of the Microsoft 
monopoly.

To convert to plain text choose Text Only or Text Document as the Save As Type. 
 Your computer may also have a program to convert to PDF format. Select File, 
then Print. Scroll through available printers and select the PDF converter. 
Click on the Print button and enter a name for the PDF file when requested.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
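
A rough sketch of one way to pull the "Label: value" lines of such a document into a named list, using base R only. The helper name parse_fields() and the shortened sample are assumptions made for illustration; neither comes from tm. The resulting list could then be attached to a document as metadata by whatever reader is wrapped around it.

```r
# Hypothetical sample mirroring the "Label: value" layout of the article above
doc <- c(
  "Document 1 of 40",
  "First Nation agrees not to block trains",
  "Author: SHAWN BERRY Legislature Bureau",
  "Publication info: Daily Gleaner [Fredericton, N.B] 07 Jan 2013: A.3.",
  "Full Text: A bunch of text about a story here",
  "Publication year: 2013",
  "Section: Main"
)

# parse_fields() is an illustrative helper, not part of the tm package
parse_fields <- function(lines) {
  # Keep only lines that begin with a label followed by ": "
  is_field <- grepl("^[A-Za-z][A-Za-z ]*: ", lines)
  fields <- lines[is_field]
  # Split on the first ": " only, so values may contain colons themselves
  pos <- regexpr(": ", fields, fixed = TRUE)
  values <- substring(fields, pos + 2)
  names(values) <- substring(fields, 1, pos - 1)
  as.list(values)
}

meta <- parse_fields(doc)
meta[["Author"]]
meta[["Publication year"]]
```

Lines without a label (the headline, or wrapped continuation lines of a long abstract) are dropped by this sketch, so a real reader would probably need to glue continuation lines back onto the preceding field first.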

[R] Changing Variable Names In VCD

2012-12-18 Thread Simon Kiss

Hello:
What is the most efficient way to change the plotted variable names in mosaic 
plots in the vcd package? Should one do a separate contingency table first, 
change the dimension names there and then pass that to mosaic?
Or is there a way to do it simply within mosaic.
I was thinking something like:
mosaic(~var1+var2, labelling_args=list(varnames=c('newvar1', 'newvar2')))
Simon Kiss
*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
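
A minimal sketch of the first route mentioned in the post (build the contingency table, rename its dimensions, then plot that). The data and the names var1/var2 are hypothetical stand-ins. Only base R is used, so the mosaic() call itself appears as a comment.

```r
# Hypothetical data; var1/var2 stand in for the poster's variables
dat <- data.frame(var1 = c("a", "b", "a", "b"),
                  var2 = c("x", "x", "y", "y"))
tab <- table(dat)

# Rename the table's dimensions; plots of the table take their labels from these
names(dimnames(tab)) <- c("newvar1", "newvar2")
names(dimnames(tab))

# With vcd attached, the renamed table would then be passed on: mosaic(tab)
```

vcd's labeling functions also document a set_varnames argument that may do this inside mosaic() directly; that is worth checking against ?labeling_border before relying on it.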

[R] xtable with psych objects

2012-12-18 Thread Simon Kiss

Hello:
Is there a way to use xtable with objects from the psych package, particularly 
principal()?
Is there a difference between princomp and principal? xtable seems to play 
better with princomp.
Thank you.
Yours, Simon Kiss
*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Warning message: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

2012-12-10 Thread Simon Kiss

Hi there 
I'm trying to fit a logistic regression model to data that looks very similar 
to the data in the sample below.   I don't understand why I'm getting this 
warning; none of the data are proportions and the weights are numeric values.  
Should I be concerned about the warning about non-integer successes in my 
binomial glm? If I should be, how do I go about addressing it?
I'm pretty sure the weights in the data frame are sampling weights.  

What follows is the result of str() on my data, the series of commands I'm 
using to fit the model, the responses I'm getting and then some code to 
reproduce the data and go through the same steps with that code.  One last 
(minor) question.  When calling svyglm on the sample data, I actually get some 
information about the model fitting results as well as the error about 
non-integer successes.  In my real data, you only get the warning. Calling 
summary(mod1) on the real data does return information about the coefficients 
and the model fitting.

I'm grateful for any help. I'm aware that the topic of non-integer successes 
has been addressed before, but I could not find my answer to this question.

Yours, Simon Kiss

##str() on original data
str(mat1)
'data.frame':   1001 obs. of  5 variables:
 $ prov  : Factor w/ 4 levels "Ontario","PQ",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ edu   : Factor w/ 2 levels "secondary","post-secondary": 2 2 2 1 1 2 2 2 1 1 
...
 $ gender: Factor w/ 2 levels "Male","Female": 1 1 2 2 2 2 1 1 2 2 ...
 $ weight: num  1.145 1.436 0.954 0.765 0.776 ...
 $ trust : Factor w/ 2 levels "no trust","trust": 2 1 1 1 1 2 1 2 1 2 ...

###Set up survey design
des.1<-svydesign(~0, weights=~weight, data=mat1)

###model and response to svyglm
mod1<-svyglm(trust ~ gender+edu+prov, design=des.1, family='binomial')

Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

Model Summary
summary(mod1)

Call:
svyglm(formula = trust ~ gender + edu + prov, design = des.1, 
family = "binomial")

Survey design:
svydesign(~0, weights = ~weight, data = mat1)

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       -0.625909   0.156560  -3.998 6.87e-05 ***
genderFemale       0.013519   0.140574   0.096    0.923    
edupost-secondary -0.011569   0.141528  -0.082    0.935    
provPQ            -0.006614   0.172105  -0.038    0.969    
provatl            0.335166   0.297860   1.125    0.261    
provwest          -0.053862   0.174826  -0.308    0.758    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1.002254)

Number of Fisher Scoring iterations: 4




#Attempt To Reproduce The Problem
Data
mat.test<-data.frame(edu=c(rep('secondary', 300), rep('post-secondary', 300)), 
prov=c(rep('ON', 200), rep('PQ', 200), rep('AB', 200)), 
trust=c(rep('trust',200), rep('notrust',400)), gender=c(rep('Male', 300), 
rep('Female', 300)), weight=rnorm(600, mean=1, sd=0.3))
###Survey Design object
test<-svydesign(~0, weights=~weight, data=mat.test)

#Call To svyglm
svyglm(trust ~ edu+prov+gender, design=test, family='binomial')

#Results
Independent Sampling design (with replacement)
svydesign(~0, weights = ~weight, data = mat.test)

Call:  svyglm(formula = trust ~ edu + prov + gender, design = test, 
family = "binomial")

Coefficients:
 (Intercept)  edusecondary        provON        provPQ    genderMale  
  -2.658e+01    -8.454e-04     5.317e+01    -1.408e-02            NA  

Degrees of Freedom: 599 Total (i.e. Null);  596 Residual
Null Deviance:      759.6 
Residual Deviance: 3.406e-09    AIC: 8 
Warning messages:
1: In eval(expr, envir, enclos) :
  non-integer #successes in a binomial glm!
2: glm.fit: algorithm did not converge 
*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
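
A small sketch of where the warning comes from, using plain glm() and simulated data (all names and values here are made up). The binomial family checks that the weighted number of successes is a whole number, and non-integer weights fail that check. The quasibinomial family skips the check and returns the same point estimates.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 * x))
w <- runif(n, 0.5, 1.5)   # non-integer weights, standing in for sampling weights

# family = binomial warns because y * w is not a whole number
fit_bin <- withCallingHandlers(
  glm(y ~ x, weights = w, family = binomial),
  warning = function(cond) {
    message("binomial: ", conditionMessage(cond))
    invokeRestart("muffleWarning")
  }
)

# family = quasibinomial does not run that check
fit_quasi <- glm(y ~ x, weights = w, family = quasibinomial)

# The coefficient estimates agree
all.equal(coef(fit_bin), coef(fit_quasi))
```

The svyglm help page recommends family=quasibinomial() for this reason, so the warning on the real data looks harmless. The non-convergence in the reproduced data is a separate matter: there, trust is perfectly determined by prov, and gender is perfectly collinear with edu, which is why genderMale comes back NA.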

[R] Ordering List Items Chronologically

2012-11-20 Thread Simon Kiss

Dear colleagues,
Is there a way to order list items by date? I have a series of surveys in a 
list where the name of each list item is the date the survey was taken but the 
list items are out of order.  Each data frame has a variable in it with the 
survey date as well, if that helps.
Yours, Simon Kiss
#Sample Data
mylist<-list('1991-01-01'=data.frame(a=rep(5,5), 
survey.date=rep(as.Date('1991-01-01', format='%Y-%m-%d'))), 
'1979-01-01'=data.frame(aa=rep(5,5), survey.date=rep(as.Date('1979-01-01', 
format='%Y-%m-%d'), 5)), '2001-01-01'=data.frame(c=rep(6,5), 
survey.date=rep(as.Date('2001-01-01', format='%Y-%m-%d'), 5)))
mylist

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
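
One possible approach, sketched with a cut-down version of the sample list: convert the list names to Date and index the list by their order.

```r
# Cut-down version of the sample list; names are the survey dates
mylist <- list(
  "1991-01-01" = data.frame(a = rep(5, 5)),
  "1979-01-01" = data.frame(aa = rep(5, 5)),
  "2001-01-01" = data.frame(c = rep(6, 5))
)

# order() on the parsed names gives the chronological positions
sorted <- mylist[order(as.Date(names(mylist)))]
names(sorted)
```

Because the names are in ISO year-month-day form, sorting them as plain character strings would give the same order; going through as.Date() just makes the intent explicit and guards against other date formats.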

[R] select different variables from a list of data frames

2012-11-12 Thread Simon Kiss

Hi:
How do I select different variables from a list of data frames?
I have a list of 13 that looks like below.  Each data frame has more variables 
than I need.  How do I go through the list and select the variables that I need?
In the example below, I need to get the variables "a", and "q10" and "q14" to 
be returned to two separate data frames.
Thank you.
Yours, Simon Kiss

#Sample data
  mylist<-list(df1=data.frame(a=seq(1,10,1), c=seq(1,10,1), q10=rep('favour', 
10)), df2=data.frame(a=seq(1,10,1), b=seq(15,24,1), q14=rep('favour', 10)))

#The variables with different names that I need are
q<-c('q10', 'q14')
#My current code

dat<-mapply(function(x,y) {
  data.frame(a=x$a, y$q)
}, x=mylist, y=q)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
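
A sketch of one way this might be done with Map(), which walks the list and the vector of column names in parallel. The sample data is a simplified version of the list in the post.

```r
mylist <- list(
  df1 = data.frame(a = 1:10, c = 101:110, q10 = rep("favour", 10)),
  df2 = data.frame(a = 1:10, b = 15:24, q14 = rep("favour", 10))
)
q <- c("q10", "q14")

# Map() pairs each data frame with its own column name;
# indexing with a character vector avoids the x$name form, which cannot
# take a name stored in a variable
dat <- Map(function(x, y) x[, c("a", y)], mylist, q)
sapply(dat, names)
```

The result is a list of two data frames, one per element of mylist, each holding "a" plus its own question column.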

[R] using lapply with recode

2012-11-08 Thread Simon Kiss

Hello: 
Forgive me, this is surely a simple question but I can't figure it out, having 
consulted the help archives and "Data Manipulation With R" (Spector).
I have a list of 11 data frames with one common variable in each (prov). I'd 
like to use lapply to go through and recode one particular level of that common 
variable. 
I can get the recode to work, but it only returns the variable that has been 
recoded.  I need the whole data frame with the recoded variable.

Thank you for your help. Reproducible data and my current code are below.


###Sample Data
mylist<-list(df1=data.frame(a=seq(1,10,1), prov=c(rep('QUE', 5), rep('BC', 
5))), df2=data.frame(a=seq(1,10,1), prov=c(rep('Quebec', 5), rep('AB', 5))))
str(mylist)

###My current code
lapply(mylist, function(x) {
recode(x$prov, "'QUE'='QC' ; 'Quebec'='QC'")
}
)

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
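
A sketch of the general pattern: inside the function, assign the recoded values back into the data frame and make the data frame itself the last expression. Base R replacement is used here in place of recode() so the example has no package dependency.

```r
mylist <- list(
  df1 = data.frame(a = 1:10, prov = c(rep("QUE", 5), rep("BC", 5))),
  df2 = data.frame(a = 1:10, prov = c(rep("Quebec", 5), rep("AB", 5)))
)

recoded <- lapply(mylist, function(x) {
  prov <- as.character(x$prov)   # works whether prov is a factor or character
  prov[prov %in% c("QUE", "Quebec")] <- "QC"
  x$prov <- prov                 # put the recoded column back
  x                              # return the whole data frame
})
str(recoded)
```

The same shape should work with recode(): assign its result to x$prov, then end the function with x.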

[R] Combine two variables

2012-09-11 Thread Simon Kiss

Hi:
I have two variables in a data frame that are the results of a wording 
experiment in a survey. I'd like to create a third variable that combines the 
two variables.  Recode doesn't seem to work, because it just recodes the first 
variable into the third, then recodes the second variable into the third, 
overwriting the first recode. I can do this with a rather elaborate indexing 
process, subsetting the first column and then copying the data into the second 
etc. But I'm looking for a cleaner way to do this. The data frame looks like 
this.


df<-data.frame(var1=sample(c('a','b','c',NA),replace=TRUE, size=100), 
var2=sample(c('a','b','c',NA),replace=TRUE,size=100))

df<-subset(df, !is.na(var1) |!is.na(var2))

As you can see, if one variable has an NA, then the other variable has a valid 
value, so how do I just combine the two variables into one?
Thank you for your assistance.
Simon Kiss

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
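
One possible way to combine the two columns, sketched on a small deterministic data frame: take var1 where it is present and fall back to var2 otherwise.

```r
df <- data.frame(var1 = c("a", NA, "c", NA),
                 var2 = c(NA, "b", NA, "a"),
                 stringsAsFactors = FALSE)

# Use var1 when it is not missing, otherwise var2
df$var3 <- ifelse(is.na(df$var1), df$var2, df$var1)
df
```

If the columns are factors, wrapping them in as.character() first avoids ifelse() returning the underlying integer codes. In the sampled data from the post both columns can be non-missing on the same row; this sketch then keeps var1.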

Re: [R] Changing line length in Sweave output works for numeric, but not for character vectors

2012-08-21 Thread Simon Kiss

Hi: I'm not sure how to do 1, but I also tried strwrap() and that worked OK. 
Although it's not pretty. But it'll do.
Simon
On 2012-08-20, at 5:52 PM, Yihui Xie wrote:

> Two possible solutions:
> 
> 1. Redefine the LaTeX environment so it allows wrapping (see listings
> for example);
> 2. Manually break your long string into shorter pieces and paste()
> them together, e.g. paste('long', 'long', 'string')
> 
> Regards,
> Yihui
> --
> Yihui Xie 
> Phone: 515-294-2465 Web: http://yihui.name
> Department of Statistics, Iowa State University
> 2215 Snedecor Hall, Ames, IA
> 
> 
> On Mon, Aug 20, 2012 at 5:03 PM, Simon Kiss  wrote:
>> Hi there: I'm preparing a report in RStudio 0.96.330 on a Mac OS. I'm 
>> running R 2.15.0
>> 
>> I understand from Ross Ihaka's document 
>> (http://www.stat.auckland.ac.nz/~stat782/downloads/Sweave-customisation.pdf) 
>> that you can modify the line length of Sweave output by a call to 
>> options(width=x).
>> 
>> This works great for me for numeric output, but not for character vectors 
>> that I have to print. The following is some sample code that illustrates my 
>> problem.
>> 
>> Is there a different way to format character vectors that are stored in R?
>> Yours, Simon Kiss
>> 
>> \documentclass{article}
>> 
>> \begin{document}
>> \SweaveOpts{concordance=TRUE}
>> 
>> <<>>=
>> seq(1,100,1)
>> @
>> 
>> <<>>=
>> options(width=30)
>> @
>> <<>>=
>> seq(1,100,1)
>> @
>> 
>> <<>>=
>> test<-c('The government should do more to advance societys goals, even if 
>> that means limiting the freedom and choices of individuals.')
>> @
>> 
>> \end{document}
>> *
>> Simon J. Kiss, PhD
>> Assistant Professor, Wilfrid Laurier University
>> 73 George Street
>> Brantford, Ontario, Canada
>> N3T 2C9
>> Cell: +1 905 746 7606
>> 
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
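
A short sketch of the strwrap() route mentioned in the reply. options(width=) governs how vectors are laid out when printed, but it does not break a single long string, so the string has to be split before printing.

```r
test <- "The government should do more to advance societys goals, even if that means limiting the freedom and choices of individuals."

# Break the string into pieces shorter than 30 characters
wrapped <- strwrap(test, width = 30)

# writeLines() prints one piece per line, without quotes or index markers
writeLines(wrapped)
```

Inside a Sweave chunk, the writeLines() output appears as ordinary chunk output, so it should respect the page width without changes to the LaTeX environment.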

[R] Changing line length in Sweave output works for numeric, but not for character vectors

2012-08-20 Thread Simon Kiss

Hi there: I'm preparing a report in RStudio 0.96.330 on a Mac OS. I'm running R 
2.15.0

I understand from Ross Ihaka's document 
(http://www.stat.auckland.ac.nz/~stat782/downloads/Sweave-customisation.pdf) 
that you can modify the line length of Sweave output by a call to 
options(width=x).

This works great for me for numeric output, but not for character vectors that 
I have to print. The following is some sample code that illustrates my problem. 

Is there a different way to format character vectors that are stored in R?
Yours, Simon Kiss

\documentclass{article}

\begin{document}
\SweaveOpts{concordance=TRUE}

<<>>=
seq(1,100,1)
@

<<>>=
options(width=30)
@
<<>>=
seq(1,100,1)
@

<<>>=
test<-c('The government should do more to advance societys goals, even if that 
means limiting the freedom and choices of individuals.')
@

\end{document}
*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Rcurl, postForm()

2012-05-28 Thread Simon Kiss

Dear colleagues,
Could I get some assistance using postForm() to scrape the business names and 
addresses at this website: 
http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx

I've read through (http://www.omegahat.org/RCurl/RCurlJSS.pdf) and scoured the 
web for tutorials, but I can't crack it.  I'm aware that this is probably a 
pretty basic question, but I need some help regardless. Yours, Simon Kiss

library(XML)
library(RCurl)
library(scrapeR)
library(RHTMLForms)
#Set URL
bus<-c('http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx')
#Scrape URL
orig<-getURLContent(url=bus)
#Parse doc
doc<-htmlParse(orig[[1]], asText=TRUE)
#Get The forms 
forms<-getNodeSet(doc, "//form")
forms[[1]]
#These are the input nodes
getNodeSet(forms[[1]], ".//input")
#These are the select nodes
getNodeSet(forms[[1]], ".//select")

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Using xpathapply or getnodeset to get text between two distinct tags

2012-05-11 Thread Simon Kiss

Hello:
 
The following code extracts the links to the daily transcripts of Canada's 
House Of Commons.  'links' is a matrix of URLs (ncol=1), each of which points 
to one day's transcripts.

If you inspect the code for scrape(links[1]), you will find that periodically 
there appears an italicize tag after a paragraph tag (Translation). At this point, the speaker is speaking French.

Then there are some  tags that list some text, and then, after the speaker 
has returned to English, you get the same formula as above, English some speech Some Speech 
Ultimately, what I'd like to do is count the words between the  tags 
'Translation' and 'English'.
I'm pretty sure I can get the text into the tm package to do the word counts; 
what I really don't know how to do is return the text between 'Translation' and 
'English' so that I can mark it as 'French' and then return the text between 
'English' and 'Translation' and mark it as English.  
Does any one have any suggestions? Yours truly,
Simon J. Kiss


#Necessary libraries
library(XML)
library(scrapeR)
#URL for links to 2012 transcripts
hansard<-c('http://www.parl.gc.ca/housechamberbusiness/ChamberSittings.aspx?View=H&Language=E&Mode=1&Parl=41&Ses=1')
#Scrape the page with the links
doc<-scrape(url=hansard, parse=TRUE, follow=TRUE)
#Not sure what exactly this does, but it is necessary
doc<-doc[[1]]
#Get the xmlRoot directory
doc<-  xmlRoot(doc)
#Get nodes that contain only the links to each day's transcripts
links<-  getNodeSet(doc, "//a[@class='PublicationCalendarLink']/@href")
links<-matrix(links)
#Paste those href links to the root URL
links<-apply(links, 1, function(x) paste('http://www.parl.gc.ca', x, sep=''))
#Inspect
links[1]
#Scrape text from first URL in 'links'
oneday<-scrape(links[1])[[1]]

#Return p/i elements from 'oneday'
getNodeSet(oneday, "//p/i")

#sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C/en_US.UTF-8/C/C/C/C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] scrapeR_0.1.6  RCurl_1.91-1   bitops_1.0-4.1 XML_3.9-4 

loaded via a namespace (and not attached):
[1] tools_2.15.0
*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
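
A sketch of the labelling step only, assuming the paragraph text has already been extracted into a character vector in document order, with the 'Translation' and 'English' markers as their own elements. The vector paras is a made-up stand-in, not real transcript text.

```r
# Hypothetical stand-in for the paragraph text pulled from one day's transcript
paras <- c("Mr. Speaker, I rise today.", "Translation",
           "Monsieur le President, merci.", "Je suis d'accord.",
           "English", "I thank the member.")

is_marker <- paras %in% c("Translation", "English")
# Each marker opens a new segment; text before the first marker is segment 0
segment <- cumsum(is_marker)
# Look up which marker opened each segment (segment 0 is taken to be English)
opened_by <- c("English", paras[is_marker])[segment + 1]
language <- ifelse(opened_by == "Translation", "French", "English")

# Drop the marker rows themselves and count words per language
speech <- data.frame(text = paras, language = language,
                     stringsAsFactors = FALSE)[!is_marker, ]
words <- lengths(strsplit(speech$text, "[[:space:]]+"))
counts <- tapply(words, speech$language, sum)
counts
```

The cumsum() trick relies on the markers alternating as described in the post; two consecutive 'Translation' markers would simply open two French segments in a row.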

[R] adding a caption to a mosaic plot?

2012-05-02 Thread Simon Kiss

Dear all:
Is there a way to add text to the margins or outer margins of a mosaic plot 
using the vcd package? I understand the margins argument to mosaic, but I don't 
know how to add text to that. 
I'd like to add a caption to a plot.  If possible, I'd like to know how to set 
the font and size for that function as well. My plot looks roughly as below. 
Thank you for your time!
Simon J. Kiss

mydat<-data.frame(gender=factor(rbinom(100, 1, 0.5),  labels=c('female', 
'male')), hair=factor(rbinom(100, 1, 0.5), labels=c('blonde', 'black')))
mosaic_1<-table(mydat) 
mosaic(mosaic_1, gp=shading_hsv, main='my title', pop=FALSE, 
split_vertical=FALSE,  margins=c(4.1, 2.1, 8, 5.1), 
labeling_args=list(rot_labels=c(left=0), offset_labels=c(left=3), 
gp_main=gpar(cex=2), offset_varnames=c(left=5.5), gp_labels=gpar(cex=1.5), 
gp_varnames=gpar(cex=1.5), labeling_values=c('observed')))
labeling_cells(text=round(prop.table(mosaic_1, 1)*100), gp_text=gpar(cex=2), 
clip=FALSE)(mosaic_1)

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
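
Since vcd draws with grid, one option that may work is to add the caption afterwards with grid.text(), placing it in the bottom margin and setting the font through gpar(). The sketch below uses a plain rectangle as a stand-in for the mosaic so that it runs with the grid package alone; the caption text and font choices are arbitrary.

```r
library(grid)

# Stand-in for the plot: any grid graphic (such as a vcd mosaic) would do
grid.newpage()
grid.rect(width = 0.6, height = 0.6)

# Draw a caption one line above the bottom edge of the page;
# gpar() sets the font family, size and face
caption <- grid.text("Source: hypothetical survey data",
                     x = unit(0.5, "npc"), y = unit(1, "lines"),
                     gp = gpar(fontfamily = "serif", fontsize = 10,
                               fontface = "italic"))
```

With a real mosaic, the bottom entry of the margins argument would need to leave enough room for the caption line.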

[R] Use scores from factor analysis and missing values factanal(), napredict(), na.omit()

2012-04-26 Thread Simon Kiss

Dear all,
I have a series of variables that looks roughly like the sample data below and 
I'm trying to conduct a factor analysis.  I've omitted cases with missing 
values for the factor analysis, but now I'd like to use the scores on each 
component as new variables in the *original* data set for analysis. That is, 
I'd like to take the scores on each of the two factors and see how they relate 
to the variable "trust" in the original data set. It looks like I could create 
a common index variable out of the rownames in each data set and then merge 
them, but I'm wondering if there is a less bulky way to do that perhaps via 
?napredict?
Thank you for your time.
Yours, Simon J. Kiss
*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606



#Sample Data
mydat<-data.frame(trust=rnorm(100, mean=5, sd=2), v=rnorm(100, mean=1, sd=0.2), 
w=rnorm(100, mean=2, sd=0.5), x=rnorm(100, mean=0.2, sd=0.2), y=rnorm(100, 
mean=0.3, sd=0.1), z=rnorm(100, mean=0.5, sd=0.3))
#Set some missing values
mydat[52,2]<-NA
mydat[53,1]<-NA
mydat[95,3]<-NA
#Subset original data set by variables for factor analysis
my<-subset(mydat, select=c(v,w,x,y,z))
#Omit cases with missing variables
my<-na.omit(my)
#Factor analysis plus generate Scores
myfit<-factanal(my, 2, rotation='varimax', scores='Bartlett')

#Reintegrate Scores from two factors to original dataset for regression analysis
#?napredict ?merge(rownames)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
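
A sketch of one way to carry the scores back without merging on row names: keep a logical index of the complete cases and write the scores into those rows. The simulated data below is an assumption; it builds six items from two latent factors so that the factor model has something to find, which differs from the independent columns in the post.

```r
set.seed(42)
n <- 100
f1 <- rnorm(n); f2 <- rnorm(n)   # two hypothetical latent factors
mydat <- data.frame(
  trust = rnorm(n, mean = 5, sd = 2),
  u = f1 + rnorm(n, sd = 0.5), v = f1 + rnorm(n, sd = 0.5),
  w = f1 + rnorm(n, sd = 0.5), x = f2 + rnorm(n, sd = 0.5),
  y = f2 + rnorm(n, sd = 0.5), z = f2 + rnorm(n, sd = 0.5)
)
mydat[52, "v"] <- NA
mydat[95, "w"] <- NA

items <- c("u", "v", "w", "x", "y", "z")
keep <- complete.cases(mydat[, items])   # rows used in the factor analysis
myfit <- factanal(mydat[keep, items], factors = 2,
                  rotation = "varimax", scores = "Bartlett")

# Write the scores back by row position; omitted rows stay NA
mydat$F1 <- NA_real_
mydat$F2 <- NA_real_
mydat$F1[keep] <- myfit$scores[, 1]
mydat$F2[keep] <- myfit$scores[, 2]

summary(lm(trust ~ F1 + F2, data = mydat))
```

lm() drops the rows with missing scores by default, so the regression runs on the same cases as the factor analysis (minus any rows where trust itself is missing).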

[R] grep and XML

2012-04-16 Thread Simon Kiss

Hi all:
I struggle a lot scraping web data. I still haven't got a handle on the XML 
package. 
I'd like to get particular exchange rates from this table: 
https://raw.github.com/currencybot/open-exchange-rates/master/latest.json
This is the code that I'm working with:
library(RCurl)
library(XML)

txt<-getURL("https://raw.github.com/currencybot/open-exchange-rates/master/latest.json")
txt<-htmlParse(txt, asText=TRUE)
txt<-  getNodeSet(txt, '//p')
So, I can get the node properly, but then, if I try something like this:
grep(c('USD'), txt)

I get: 
integer(0)

Can anyone suggest a way forward?
Yours, Simon Kiss

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
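
The file is JSON rather than HTML, so one option is to skip the XML parser and search the text returned by getURL() directly. The sketch below uses a made-up string in roughly the same shape (the values are invented, and the real file may be laid out differently) and a hypothetical helper get_rate() built on base R regular expressions.

```r
# Hypothetical snippet in the shape of a rates file; the values are made up
txt <- '{"base": "USD", "rates": {"CAD": 1.25, "EUR": 0.9, "USD": 1}}'

# get_rate() is an illustrative helper, not from RCurl or XML
get_rate <- function(json, code) {
  # Match "CODE": followed by a number
  pattern <- sprintf('"%s":[[:space:]]*[0-9.]+', code)
  hit <- regmatches(json, regexpr(pattern, json))
  # Strip everything up to the colon and convert what is left
  as.numeric(sub('.*:[[:space:]]*', '', hit))
}

get_rate(txt, "CAD")
get_rate(txt, "USD")
```

A dedicated JSON parser would be more robust than a regular expression; this is only meant to show that the rates are reachable as plain text.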

Re: [R] lapply to change variable names and variable values

2012-03-12 Thread Simon Kiss

Thanks both! That solves it! You've made a very happy newbie!
Simon
On 2012-03-12, at 2:52 PM, Sarah Goslee wrote:

> Hi Simon,
> 
> On Mon, Mar 12, 2012 at 2:37 PM, Simon Kiss  wrote:
>> Hi: I'm sure this is a very easy problem. I've consulted Data Manipulation 
>> With R and the R Book and can't find an answer.
>> 
>> Sample list of data frames looks as follows:
>> 
>> .xx<-list(df<-data.frame(Var1=rep('Alabama', 400), Var2=rep(c(2004, 2005, 
>> 2006, 2007), 400)), df2<-data.frame(Var1=rep('Tennessee', 400), 
>> Var2=rep(c(2004,2005,2006,2007), 400)), df3<-data.frame(Var1=rep('Alaska', 
>> 400), Var2=rep(c(2004,2005,2006,2007), 400)) )
> 
> I tweaked this a bit so that it doesn't actually create df, df2, df3 as well as
> making a list of them, and so that xx doesn't begin with a . and shows up with
> ls(). I don't need invisible objects in my testing session.
> 
> xx<-list(df=data.frame(Var1=rep('Alabama', 400), Var2=rep(c(2004,
> 2005, 2006, 2007), 400)), df2=data.frame(Var1=rep('Tennessee', 400),
> Var2=rep(c(2004,2005,2006,2007), 400)),
> df3=data.frame(Var1=rep('Alaska', 400),
> Var2=rep(c(2004,2005,2006,2007), 400)) )
> 
> 
>> I would like to accomplish the following three tasks.
>> First, I'd like to go through and change the names of each of the data 
>> frames within the list
>> to be 'State' and 'Year'
>> 
>> Second, I'd like to go through and add one year to each of the 'Var2'  
>> variables.
>> 
>> Third, I'd like to then delete those cases in the data frames that have 
>> values of Var2 (or Year) values of 2008.
>> 
>> I could do this manually, but my data are actually bigger than this, plus 
>> I'd really like to learn. I've been trying to use lapply, but I can't get my 
>> head around how it works:
>>  .xx<- lapply(.xx, function(x) colnames(x)<-c('State', 'Year'))
>> just changes the actual list of data frames to a list of the character 
>> string ('State' and 'Year')  How do I actually change the underlying 
>> variable names?
> 
> Your function doesn't return the right thing. To see how it works, it's often a
> good idea to write a stand-alone function and see what it does. For instance,
> 
> rename <- function(x) {
>   colnames(x)<-c('State', 'Year')
>   x
> }
> 
> To me at least, as soon as it's written as a stand-alone it's obvious that
> you have to return x in the last line. You can either use rename() in your
> lapply statement:
> xx<- lapply(xx, rename)
> 
> or you can write the full function into the lapply statement:
>> xx<-list(df=data.frame(Var1=rep('Alabama', 400), Var2=rep(c(2004, 2005, 
>> 2006, 2007), 400)), df2=data.frame(Var1=rep('Tennessee', 400), 
>> Var2=rep(c(2004,2005,2006,2007), 400)), df3=data.frame(Var1=rep('Alaska', 
>> 400), Var2=rep(c(2004,2005,2006,2007), 400)) )
>> xx <- lapply(xx, function(x){ colnames(x)<-c('State', 'Year'); x} )
>> colnames(xx[[1]])
> [1] "State" "Year"
> 
> The same strategy should work for your other needs as well.
> 
> Sarah
> 
> 
> 
> -- 
> Sarah Goslee
> http://www.functionaldiversity.org

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
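
Following the same strategy, the remaining two tasks might be sketched as below. The list is a shortened version of the one in the thread (8 rows per data frame where the original has many more).

```r
xx <- list(
  df  = data.frame(Var1 = rep("Alabama", 8),   Var2 = rep(2004:2007, 2)),
  df2 = data.frame(Var1 = rep("Tennessee", 8), Var2 = rep(2004:2007, 2))
)

xx <- lapply(xx, function(x) {
  colnames(x) <- c("State", "Year")   # task 1: rename the columns
  x$Year <- x$Year + 1                # task 2: add one year
  x[x$Year != 2008, ]                 # task 3: drop 2008; this subset is returned
})
sapply(xx, function(x) sort(unique(x$Year)))
```

As in the rename example, the last expression in the function is what lapply() keeps, so it has to be the data frame and not the result of an assignment.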

[R] lapply to change variable names and variable values

2012-03-12 Thread Simon Kiss

Hi: I'm sure this is a very easy problem. I've consulted Data Manipulation With 
R and the R Book and can't find an answer.

Sample list of data frames looks as follows: 

.xx<-list(df<-data.frame(Var1=rep('Alabama', 400), Var2=rep(c(2004, 2005, 2006, 
2007), 400)), df2<-data.frame(Var1=rep('Tennessee', 400), 
Var2=rep(c(2004,2005,2006,2007), 400)), df3<-data.frame(Var1=rep('Alaska', 
400), Var2=rep(c(2004,2005,2006,2007), 400)) )

I would like to accomplish the following three tasks. 
First, I'd like to go through and change the names of each of the data frames 
within the list
to be 'State' and 'Year'

Second, I'd like to go through and add one year to each of the 'Var2'  
variables.

Third, I'd like to then delete those cases in the data frames that have values 
of Var2 (or Year) values of 2008.

I could do this manually, but my data are actually bigger than this, plus I'd 
really like to learn. I've been trying to use lapply, but I can't get my head 
around how it works: 
  .xx<- lapply(.xx, function(x) colnames(x)<-c('State', 'Year'))
just changes the actual list of data frames to a list of the character string 
('State' and 'Year')  How do I actually change the underlying variable names?

I'm grateful for your suggestions!
Simon Kiss

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] txtStart creates a NULL file

2012-02-14 Thread Simon Kiss

Hello all: 
I'm trying to use the following code to get commands, comments and results to a 
.txt file.  It only appears to capture comments. When I comment those out with 
#, it creates a NULL file.  
Someone seemed to have a similar problem with a mac GUI 
(https://stat.ethz.ch/pipermail/r-help/2010-September/253177.html) but the 
result seemed to be ambiguous. Is there a work-around? Reproducible code and 
sessioninfo are below.   The OS is Mac OS 10.6.8.

Yours truly, Simon Kiss

install.packages("HSAUR")
library(HSAUR)
library(TeachingDemos)
data("Forbes2000", package="HSAUR")
#This is a test of R output for the blind
txtStart('test.txt', commands=TRUE, results=TRUE)
txtComment('This command provides the mean profit in the data set')
mean(Forbes2000$profits, na.rm=TRUE)
txtComment('This command provides the standard deviation of the profits data set')
sd(Forbes2000$profits, na.rm=TRUE)
txtComment('This command provides the average profit by country')
aggregate(Forbes2000$profits, by=list(Forbes2000$country), function(x) mean(x, 
na.rm=TRUE))
txtStop()


SessionInfo()

R version 2.13.2 (2011-09-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] TeachingDemos_2.7

loaded via a namespace (and not attached):
[1] tools_2.13.2


[R] (no subject)

2011-11-11 Thread Simon Kiss

Dear colleagues,
I'm trying to fit a multinomial logistic regression for an ordinal variable.  
I see in the help pages for multinom in nnet that one should scale the 
predictors from 0-1.  Is that really necessary?
Also: can anyone clarify what the difference is between alternative-specific 
and individual-specific variables?
Yours, Simon Kiss
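
For reference, the 0-1 rescaling mentioned above can be sketched in a few lines of base R. The helper name rescale01 and the example vector are hypothetical; this only shows the transformation, not whether multinom requires it.

```r
# Min-max rescaling: maps the smallest value to 0 and the largest to 1
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

age <- c(18, 35, 52, 90)      # hypothetical predictor
age01 <- rescale01(age)
```

Rescaling changes the units of the coefficients but not the fitted probabilities, so it is a numerical convenience rather than a change to the model.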




[R] Listing tables together from random samples from a generated population?

2011-11-10 Thread Simon Kiss

Hi there,
I'd like to demonstrate how the chi-squared distribution works, so I've 
come up with a sample data frame of two categorical variables:
.y<-data.frame(gender=sample(c('Male', 'Female'), size=10, replace=TRUE, 
prob=c(0.5, 0.5)), tea=sample(c('Yes', 'No'), size=10, replace=TRUE, 
prob=c(0.5, 0.5)))

And I'd like to create a list of 100 different samples of those two variables 
and the resulting 2X2 contingency tables

table(.y[sample(nrow(.y), 100), ])

How would I combine these 100 tables into a list? I'd like to be able to go in 
and find some of the extreme values to show how the sampling distribution of 
the chi-square values behaves.

I can already get a histogram of 100 different chi-squared values that shows 
the distribution nicely (see below), but I'd like to actually show the 
underlying tables, for demonstration's sake.

 .z<-vector()
for (i in 1:100) {
.z<-c(.z, chisq.test(table(.y[sample(nrow(.y), 200), ]))$statistic)
}
hist(.z, xlab='Chi-Square Value', main="Chi-Squared Values From 100 different samples asking\nabout gender and tea/coffee drinking")
abline(v=3.84, lty=2)

Thank you in advance,
Simon Kiss
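
A minimal sketch of one way to keep the tables: build them with lapply(), which returns a list, and compute the statistics from the stored tables afterwards. The population size of 1000 is an assumption made here so that samples of 200 can be drawn without replacement; the object names are hypothetical.

```r
set.seed(1)  # for reproducibility

# Hypothetical population; factors keep every table 2x2 even if a level is rare
pop <- data.frame(
  gender = factor(sample(c("Male", "Female"), 1000, replace = TRUE)),
  tea    = factor(sample(c("Yes", "No"), 1000, replace = TRUE))
)

# One contingency table per sample of 200 rows, collected in a list
tabs <- lapply(1:100, function(i) table(pop[sample(nrow(pop), 200), ]))

# Chi-squared statistic for each stored table
stats <- sapply(tabs, function(tb) unname(chisq.test(tb)$statistic))

extreme  <- tabs[[which.max(stats)]]   # the single most extreme table
rejected <- tabs[stats > 3.84]         # tables beyond the 5% critical value
```

Because the tables and the statistics share the same index, any bar of the histogram can be traced back to the tables that produced it.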


[R] Help combining cell labelling and multiple mosaic plots

2011-10-31 Thread Simon Kiss

Dear colleagues
I'm using data that looks like .test and .test1 below to draw two mosaic plots 
with cell labelling (the row percentages from the tables). 
When I take out the pop=FALSE commands in the mosaic commands and comment out 
the two lines labelling the cells, then the plots are laid out exactly as I'd 
like: side-by-side.
But I do require the cell labelling and the pop=FALSE arguments. I suspect I 
need to add in a call to pushViewport or an upViewport command, but I'm not 
sure. Any advice is welcome.


library(vcd)
library(grid)


.test<-as.table(matrix(c(1, 2, 3, 4, 5, 6), nrow=3, ncol=2, byrow=TRUE))
   .test<-prop.table(.test, 1)
.test1<-as.table(matrix(c(1, 2, 3, 4), nrow=2, ncol=2, byrow=TRUE))
   .test1<-prop.table(.test1, 1)

dimnames(.test)<-list("Fluoride Cluster"=c('Beneficial\nand Safe', 'Mixed Opinion', 
'Harmful With No Benefits'), "Governments Should Not Impose Treatment"=c('Agree', 'Disagree'))
dimnames(.test1)<-list("Vaccines Are Too Much To Handle"= c('Agree' , 
'Disagree'), "Governments Should Not Oblige Treatment" =c('Agree', 'Disagree'))
 grid.newpage()
 pushViewport(viewport(layout=grid.layout(1,2)))
pushViewport(viewport(layout.pos.col=1))
 mosaic(.test, gp=shading_hsv, pop=FALSE, 
split_vertical=FALSE, newpage=FALSE, 
labeling_args=list(offset_varnames=c(top=3), offset_labels=c(top=2)))
labeling_cells(text=round(prop.table(.test, 1), 2)*100, clip=FALSE)(.test)
popViewport()

pushViewport(viewport(layout.pos.col=2))
   mosaic(.test1, gp=shading_hsv, newpage=FALSE,pop=FALSE, 
split_vertical=FALSE, labeling_args=list(offset_varnames=c(top=3), 
offset_labels=c(top=2)))
labeling_cells(text=round(prop.table(.test1, 1), 2)*100, clip=FALSE)(.test1)
popViewport(2)

[R] htmlParse hangs or crashes

2011-09-05 Thread Simon Kiss

Dear colleagues,
Each time I use htmlParse, R crashes or hangs.  The URL I'd like to parse is 
included below, as are the results of a series of basic commands that show 
what I'm experiencing.  The output of sessionInfo() is included at the bottom 
of the message.
The thing is, htmlTreeParse appears to work just fine, although it doesn't 
appear to contain the information I need (the URLs of the articles linked to on 
this search page).  Regardless, I'd still like to understand why htmlParse 
doesn't work.
Thank you for any insight.
Yours, 
Simon Kiss


myurl<-c("http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1=2001&date2mm=08&date2dd=25&date2=2011")

.x<-htmlParse(myurl)

class(.x)
#returns "HTMLInternalDocument" "XMLInternalDocument" 

.x
#returns
*** caught segfault ***
address 0x1398754, cause 'memory not mapped'

Traceback:
 1: .Call("RS_XML_dumpHTMLDoc", doc, as.integer(indent), 
as.character(encoding), as.logical(indent), PACKAGE = "XML")
 2: saveXML(from)
 3: saveXML(from)
 4: asMethod(object)
 5: as(x, "character")
 6: cat(as(x, "character"), "\n")
 7: print.XMLInternalDocument()
 8: print()

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] XML_3.4-0      RCurl_1.5-0    bitops_1.0-4.1

[R] R hangs after htmlTreeParse

2011-08-25 Thread Simon Kiss

Dear colleagues,
I'm trying to parse the html content from this webpage:
http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1=2001&date2mm=08&date2dd=25&date2=2011

Using the following code
library(RCurl)
library(XML)
myurl<-c("http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1=2001&date2mm=08&date2dd=25&date2=2011")

.x<-getURL(myurl)
htmlTreeParse(.x, asText=TRUE)

This prints approximately 15 lines of the output from the html document and 
then mysteriously stops. The command line prompt does not reappear and force 
quit is the only option. 
I'm running R 2.13 on Mac OS 10.6 and the latest versions of XML and RCurl are 
installed.
Yours, Simon Kiss
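
If the goal is only to collect the URLs linked from the page, a crude fallback that avoids the XML parser altogether is to pull href attributes out of the downloaded text with a regular expression. This is a swapped-in technique, not a fix for the hang, and it assumes double-quoted href values; extract_hrefs and the sample page string are hypothetical.

```r
# Extract the values of double-quoted href attributes from HTML text
extract_hrefs <- function(html) {
  html <- paste(html, collapse = "\n")      # accept a vector of lines
  pattern <- "href\\s*=\\s*\"[^\"]*\""
  hits <- regmatches(html, gregexpr(pattern, html,
                                    ignore.case = TRUE, perl = TRUE))[[1]]
  # Strip the attribute name and the surrounding quotes
  hits <- sub("^href\\s*=\\s*\"", "", hits, ignore.case = TRUE, perl = TRUE)
  sub("\"$", "", hits)
}

# Hypothetical snippet standing in for the text returned by getURL()
page <- '<a href="/articleshow/1.cms">One</a> <A HREF="http://example.org/2">Two</A>'
links <- extract_hrefs(page)
```

Regular expressions are fragile on real-world HTML, so this is best treated as a stopgap until the parser problem is understood.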


[R] Comparison of means in survey package

2011-08-18 Thread Simon Kiss

Dear list colleagues,
I'm trying to come up with a test question for undergraduates to illustrate 
comparison of means from a complex survey design. The data for the example 
looks roughly like this:

mytest<-data.frame(harper=rnorm(500, mean=60, sd=1), party=sample(c("BQ", 
"NDP", "Conservative", "Liberal", "None", NA), size=500, replace=TRUE), 
natwgt=sample(c(0.88, 0.99, 1.43, 1.22, 1.1), size=500, replace=TRUE), 
gender=sample(c("Male", "Female"), size=500, replace=TRUE))

Using svyby I can get the means for each group of interest (primarily the party 
variable), but I can't get further to actually do the comparison of means.  I 
saw a reference on the help listserv to the effect that the survey package does 
not do t-tests and that one should use svyglm.  However, that was in 2009 and I 
see that there's a command, svyttest, in the package which seems to be on 
point.  However, when I try that command I can't get it to work: it 
returns the following output:

t = NaN, df = 3255, p-value = NA 
alternative hypothesis: true difference in mean is not equal to 0 
sample estimates:
difference in mean 
  38.80387 

This is from my data, not the code above.  Would there also be a way just to do 
the comparison of means test between two subgroups of a factor, and not just on 
all factor levels?

Using R 2.13 on Mac OS 10.6 and the latest version of the survey package.

Yours, Simon Kiss
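
A hedged sketch of a two-group comparison with svyttest, assuming the survey package is installed. It subsets the design to two parties so that the grouping variable has exactly two levels; a grouping factor with more levels is one possible source of a NaN statistic, though that is a guess about the real data. The simulated values and the weights-only design are assumptions for illustration.

```r
library(survey)
set.seed(42)

# Simulated data shaped like the example above (all values are made up)
mytest <- data.frame(
  harper = rnorm(500, mean = 60, sd = 1),
  party  = sample(c("BQ", "NDP", "Conservative", "Liberal", "None", NA),
                  size = 500, replace = TRUE),
  natwgt = sample(c(0.88, 0.99, 1.43, 1.22, 1.1), size = 500, replace = TRUE),
  stringsAsFactors = FALSE
)

# Weights-only design; a real complex design would also declare clusters/strata
des <- svydesign(ids = ~1, weights = ~natwgt, data = mytest)

# Restrict the design (not the raw data) to the two groups being compared
des2 <- subset(des, party %in% c("NDP", "Liberal"))

tt <- svyttest(harper ~ party, design = des2)
tt
```

Subsetting the design object rather than the data frame keeps the design information intact for variance estimation.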



[R] Applyin Weights In RCommander

2011-08-12 Thread Simon Kiss

Dear Colleagues,
Do any R Commander plug-ins handle complex sampling procedures? I know that survey is 
probably the best option from the command line and that the standard linear model can 
handle weights in R Commander, but I'd like to be able to show students how to 
apply weights when doing simple descriptive statistics as well, in R Commander.
Yours, Simon Kiss

[R] Apostrophes in R Commander in recode

2011-07-29 Thread Simon Kiss

Dear colleagues, 

I'm using R64 (2.13) on Mac OS 10.6.8 and I've encountered a problem with the 
recode function in R Commander.  It cannot deal with apostrophes ( ' ).  
I've got a factor from the 2008 Canadian Election Study (highest 
level of schooling) and some of the values include "Bachelor's Degree" , 
"Master's Degree".

I've troubleshooted (shot?) the recode function for all the levels and it's 
really the apostrophe that is the problem. 

When entering "Bachelor's Degree"=1, I get the error message 

[39] ERROR: Use only double-quotes (" ") in recode directives

I see also that the same problem exists in recode from the command line.

There are two ways I can solve this myself, but both are a bit more 
complex than the context requires (e.g. exercises for an undergraduate class). 
I can use gsub from the command line to remove the apostrophes, or I can 
import the data file without using value labels as factor levels, which would 
doubtless work.  But the technical documentation for the CES is very poor; my 
students would end up having to open the original .sav file in PASW and 
hunt down what the underlying factor levels refer to.

Is there a solution within R Commander?
Yours, Simon Kiss
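
From the command line, one alternative that avoids recode's quoting rules is to relabel the factor levels directly with base R's levels<- and a named list. This sketch does not address the R Commander dialog itself; the factor below is a hypothetical stand-in for the CES schooling variable.

```r
# Hypothetical factor with apostrophes in its level labels
school <- factor(c("Bachelor's Degree", "Master's Degree",
                   "High School", "Bachelor's Degree"))

# Names are the new labels; values are the old labels they replace
levels(school) <- list(
  "University"  = c("Bachelor's Degree", "Master's Degree"),
  "High School" = "High School"
)

table(school)
```

Old levels that are not listed on the right-hand side become NA, so every level to be kept needs an entry.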

Re: [R] cycling from x11 window in RCommander to graphics device window: Mac Os 10.6.8

2011-07-28 Thread Simon Kiss

Dear John,
The Command-Tab shortcut does not work for me, but I have been able to get Expose to 
work, i.e. it does bring up all windows, including the X11 terminal.  It will 
take a little getting used to, but it is functional.
I apologize for cluttering the list with minutiae.
Thank you!
Yours
S.
On 2011-07-28, at 5:21 PM, John Fox wrote:

> Dear Simon,
> 
> I'm sitting in front of a MacBook Pro and Command-tab works perfectly fine 
> for me: Selecting X11 brings the R Commander Window to the front, and 
> selecting R brings the Quartz graphics window to the front. I must admit that 
> my habit in classroom demonstrations on a Mac is to use Expose to select 
> Windows, but, unless I misunderstand your problem, Command-tab also works.
> 
> I'm using R 2.13.1 under Mac OS X 10.6.7 with XQuartz 2.3.6 and 
> tcltk-8.5.5-x11.
> 
> I hope this helps,
> John
> 
> 
> John Fox
> Sen. William McMaster Prof. of Social Statistics
> Department of Sociology
> McMaster University
> Hamilton, Ontario, Canada
> http://socserv.mcmaster.ca/jfox/
> On Thu, 28 Jul 2011 13:40:11 -0400
> Simon Kiss  wrote:
>> Dear Colleagues, 
>> I have recently installed R Commander on my Mac OS 10.6.8. I'd like to use 
>> it for an undergraduate class this year.
>> Everything appears to be working fine, except for one thing.  I cannot use 
>> Command-tab to cycle from the X11 window in which RCommander is running to 
>> any other window open in my workspace.  This is particularly important 
>> because I cannot cycle to the graphics device window that is opened when I 
>> call a new plot.  If I force quit the X11 window and Rcommander, R remains 
>> running and I can see the graphics device window and the plot looks fine.
>> But as you can imagine, this is quite laborious, having to restart.
>> I've looked through the help documentation and tried reinstalling tcltk 
>> prior to opening up Rcommander, but that does not address the problem.
>> Any thoughts?
>> Yours, Simon Kiss
>> *
>> Simon J. Kiss, PhD
>> Assistant Professor, Wilfrid Laurier University
>> 73 George Street
>> Brantford, Ontario, Canada
>> N3T 2C9
>> Cell: +1 905 746 7606
>> 
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
>   


[R] cycling from x11 window in RCommander to graphics device window: Mac Os 10.6.8

2011-07-28 Thread Simon Kiss

Dear Colleagues, 
I have recently installed R Commander on my Mac OS 10.6.8. I'd like to use it 
for an undergraduate class this year.
Everything appears to be working fine, except for one thing.  I cannot use 
Command-tab to cycle from the X11 window in which RCommander is running to any 
other window open in my workspace.  This is particularly important because I 
cannot cycle to the graphics device window that is opened when I call a new 
plot.  If I force quit the X11 window and Rcommander, R remains running and I 
can see the graphics device window and the plot looks fine.
But as you can imagine, this is quite laborious, having to restart.
I've looked through the help documentation and tried reinstalling tcltk prior 
to opening up Rcommander, but that does not address the problem.
Any thoughts?
Yours, Simon Kiss

[R] sum part of a vector

2011-07-23 Thread Simon Kiss

Dear colleagues, I have a data set that looks roughly like this;
mydat<-data.frame(state=c(rep("Alabama", 5), rep("Delaware", 5), 
rep("California", 5)), news=runif(15, min=0, max=8), cum.news=rep(0, 15))

For each state, I'd like to cumulatively sum the value of "news" and 
put that value in cum.news.

I'm trying as follows but I get really weird results. One thing is that it 
keeps counting 0's as 1. 

for (i in levels(mydat$state)) {
mydat[mydat$state==i, ]$cum.news<-sapply(mydat[mydat$state==i, ]$news, 
function(x) sum(1:x))
}

I can sort of get the same sapply function to do what I want when working on a 
test string
test<-1:10
sapply(test, function(x) sum(1:x))

Any thoughts?
Simon Kiss
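
A note on the attempt above: sum(1:x) adds the integers from 1 to x rather than the data values, and 1:0 is the vector c(1, 0), which is why a zero comes back as 1. A minimal sketch of a per-state running total using base R's ave() with cumsum(), on simulated data like that in the post:

```r
set.seed(1)
mydat <- data.frame(
  state = rep(c("Alabama", "Delaware", "California"), each = 5),
  news  = runif(15, min = 0, max = 8)
)

# ave() applies cumsum() within each state and returns values in row order
mydat$cum.news <- ave(mydat$news, mydat$state, FUN = cumsum)
```

No loop over states is needed, and the rows do not have to be sorted by state beforehand.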

Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread Simon Kiss

Josh, that's amazing. Is there any way to have it grab two different lines 
after the grep, say the second and the fourth line? There's some other 
information in the text file I'd like to grab.  I could do two separate 
commands, but I'd like to know if this could be done in one command...
Simon Kiss
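
One way to grab several lines per match in a single step is to reuse the vector of match positions with different offsets. The sketch below assumes the layout shown in the original post (newspaper name 4 lines below the match, date 7 lines below); the text vector is a hypothetical stand-in for the result of readLines(), and other offsets such as 2 and 4 would work the same way.

```r
# Hypothetical stand-in for readLines("yourfile.txt")
txt <- c("Document 1 of 100", "", "", "", "Newspaper A", "", "", "Mon 2011-07-11",
         "Document 2 of 100", "", "", "", "Newspaper B", "", "", "Tue 2011-07-12")

# Line numbers where each document starts
start <- grep("^Document [0-9]+", txt)

# Index the text at fixed offsets from every match, one column per offset
info <- data.frame(
  newspaper = txt[start + 4],
  date      = txt[start + 7],
  stringsAsFactors = FALSE
)
```

Each row of info then corresponds to one document, which makes it easy to attach further fields later.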
On 2011-07-11, at 1:31 PM, Joshua Wiley wrote:

> If you know you can find the start of the document (say that line
> always starts with Document...), then:
> 
> grep("Document+.", yourfile, value = FALSE) + 4
> 
> should give you 4 lines after each line where Document occurred.  No
> loop needed :)
> 
> On Mon, Jul 11, 2011 at 10:25 AM, Simon Kiss  wrote:
>> Hi Josh,
>> Sorry for the insufficient introduction. This might work, but I'm not sure.
>> The file that I have includes up to 100 documents (Document 1, Document 2, 
>> Document 3 ... Document 100) with the newspaper name following 4 lines below 
>> each Document number.
>> I'm using readlines to get the text file into R and then trying to use grep 
>> to get the newspaper name for each record. But your idea of indexing the 
>> text object read into R with the line number where the newspaper name is 
>> found is a good one.  I'll just have to come up with a loop to tell R to get 
>> the 4th, 8th, 12, 16th, line, etc.
>> I'll see if I can get that to work.
>> Simon
>> On 2011-07-11, at 12:45 PM, Joshua Wiley wrote:
>> 
>>> Dear Simon,
>>> 
>>> Maybe I don't understand properly... if you are doing this in R, can't
>>> you just pick the line you want?
>>> 
>>> Josh
>>> 
>>> ## print your data to clipboard
>>> cat("Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date", file =
>>> "clipboard")
>>> ## read data in, and only select the 4th line to pass to grep()
>>> grep("pattern", x = readLines("clipboard")[4])
>>> 
>>> 
>>> On Mon, Jul 11, 2011 at 9:31 AM, Simon Kiss  wrote:
>>>> Dear colleagues,
>>>> I have a series of newspaper articles in a text file, downloaded from a 
>>>> text file.  They look as follows:
>>>> 
>>>> Document 1 of 100
>>>> \n
>>>> \n
>>>> \n
>>>> Newspaper Name
>>>> \n
>>>> \n
>>>> Day Date
>>>> 
>>>> I have a series of grep scripts that can extract the date and convert it 
>>>> to a date object, but I can't figure out how to grep the newspaper name.  
>>>> There is no field ID attached to those lines. The best I can come up with 
>>>> would be to have the program grep the four lines following matching the 
>>>> pattern "Document [0-9]".  There is an an argument to grep in unix that 
>>>> can do this ...grep -A4 'pattern' infile>outfile, but I don't know if 
>>>> there is an equivalent argument in R.
>>>> 
>>>> Any thoughts.
>>>> Yours, Simon Kiss
>>>> *
>>>> Simon J. Kiss, PhD
>>>> Assistant Professor, Wilfrid Laurier University
>>>> 73 George Street
>>>> Brantford, Ontario, Canada
>>>> N3T 2C9
>>>> Cell: +1 905 746 7606
>>>> 
>>>> __
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide 
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Joshua Wiley
>>> Ph.D. Student, Health Psychology
>>> University of California, Los Angeles
>>> https://joshuawiley.com/
>> 
>> *
>> Simon J. Kiss, PhD
>> Assistant Professor, Wilfrid Laurier University
>> 73 George Street
>> Brantford, Ontario, Canada
>> N3T 2C9
>> Cell: +1 905 746 7606
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> -- 
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> https://joshuawiley.com/


Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread Simon Kiss

Hi Josh,
Sorry for the insufficient introduction. This might work, but I'm not sure.
The file that I have includes up to 100 documents (Document 1, Document 2, 
Document 3 ... Document 100) with the newspaper name following 4 lines below 
each Document number.
I'm using readLines to get the text file into R and then trying to use grep to 
get the newspaper name for each record. But your idea of indexing the text 
object read into R with the line number where the newspaper name is found is a 
good one.  I'll just have to come up with a loop to tell R to get the 4th, 8th, 
12th, 16th line, etc. 
I'll see if I can get that to work.
Simon
On 2011-07-11, at 12:45 PM, Joshua Wiley wrote:

> Dear Simon,
> 
> Maybe I don't understand properly... if you are doing this in R, can't
> you just pick the line you want?
> 
> Josh
> 
> ## print your data to clipboard
> cat("Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date", file =
> "clipboard")
> ## read data in, and only select the 4th line to pass to grep()
> grep("pattern", x = readLines("clipboard")[4])
> 
> 
> On Mon, Jul 11, 2011 at 9:31 AM, Simon Kiss  wrote:
>> Dear colleagues,
>> I have a series of newspaper articles in a text file, downloaded from a text 
>> file.  They look as follows:
>> 
>> Document 1 of 100
>> \n
>> \n
>> \n
>> Newspaper Name
>> \n
>> \n
>> Day Date
>> 
>> I have a series of grep scripts that can extract the date and convert it to 
>> a date object, but I can't figure out how to grep the newspaper name.  There 
>> is no field ID attached to those lines. The best I can come up with would be 
>> to have the program grep the four lines following matching the pattern 
>> "Document [0-9]".  There is an an argument to grep in unix that can do this 
>> ...grep -A4 'pattern' infile>outfile, but I don't know if there is an 
>> equivalent argument in R.
>> 
>> Any thoughts.
>> Yours, Simon Kiss
>> *
>> Simon J. Kiss, PhD
>> Assistant Professor, Wilfrid Laurier University
>> 73 George Street
>> Brantford, Ontario, Canada
>> N3T 2C9
>> Cell: +1 905 746 7606
>> 
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 
> 
> 
> -- 
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> https://joshuawiley.com/


[R] grep lines before or after pattern matched?

2011-07-11 Thread Simon Kiss

Dear colleagues,
I have a series of newspaper articles in a text file, downloaded from a text 
file.  They look as follows:

Document 1 of 100
\n
\n
\n
Newspaper Name
\n
\n
Day Date

I have a series of grep scripts that can extract the date and convert it to a 
date object, but I can't figure out how to grep the newspaper name.  There is 
no field ID attached to those lines. The best I can come up with would be to 
have the program grab the four lines following each match of the pattern "Document 
[0-9]".  There is an argument to grep in Unix that can do this (grep -A4 
'pattern' infile>outfile), but I don't know if there is an equivalent argument 
in R.

Any thoughts?
Yours, Simon Kiss
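
R's grep() has no direct counterpart to the -A flag as far as this note is aware, but the behaviour can be approximated by indexing the text with the match positions. The helper grep_after below is hypothetical, and the text vector stands in for the result of readLines().

```r
# Return each matching line plus the n lines after it, similar to grep -A n
grep_after <- function(pattern, x, n) {
  hits <- grep(pattern, x)
  lapply(hits, function(i) x[i:min(i + n, length(x))])
}

# Hypothetical stand-in for the downloaded file
txt <- c("Document 1 of 100", "", "", "", "Newspaper A",
         "Document 2 of 100", "", "", "", "Newspaper B")

blocks <- grep_after("^Document [0-9]+", txt, 4)

# The fifth element of each block is the line 4 below the match
papers <- sapply(blocks, function(b) b[5])
```

Returning a list keeps each document's block separate, and min() guards against running past the end of the file.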

[R] error message trying to plot survival curves from hypothetical covariate profiles

2011-06-14 Thread Simon Kiss

Dear colleagues, 
Following John Fox's advice in this article 
(http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf), 
I'm trying to create a new data frame to examine the differential survival 
curves from a combination of covariates.
These are derived from a Cox Proportional Hazards model I fit to data about the 
diffusion of a particular policy across American states over a period of 7 
years.

The original dataset looks as follows:
'data.frame':   819 obs. of  10 variables:
 $ state   : Factor w/ 39 levels "Alabama","Arkansas",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ year    : num  2005 2005 2005 2006 2006 ...
 $ enviro  : num  0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 ...
 $ ban     : num  0 0 0 0 0 0 0 0 0 0 ...
 $ partisan: Factor w/ 3 levels "democrat","mixed",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ news    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ start   : num  2005 2005 2005 2006 2006 ...
 $ stop    : num  2006 2006 2006 2007 2007 ...
 $ risk    : num  1 2 3 1 2 3 1 2 3 1 ...
 $ evstatus: num  0 0 0 0 0 0 0 0 0 0 ...

I am modelling the survival time until the adoption of the policy as follows:

mod1<-Surv(newdat$start, newdat$stop, newdat$evstatus)
mymod1<-coxph(mod1 ~ news + enviro + partisan + cluster(state) + 
strata(evstatus), method=c("efron"), robust=TRUE)

Again, following Fox, I try to construct a data frame with a hypothetical 
covariate profile:

n<-data.frame(news=rep(c(1,4,8)), evstatus=as.factor(1:3), 
enviro=mean(newdat$enviro), partisan=c("democrat", "mixed", "republican"))
plot(survfit(mymod1, newdata=n))

Error in scale.default(x2, center = xcenter, scale = FALSE) : 
  length of 'center' must equal the number of columns of 'x'

I've looked, and someone encountered a similar error trying to plot predicted 
values from a stepwise regression. That issue did not appear to be solved.  
On the surface, it seems that I need to expand the 'n' data frame to have 
the same number of columns as the original (newdat), although perusing Fox's 
data and instructions, that does not appear to be the case there, so I'm a 
little bit lost.

Any guidance is appreciated.
Yours truly,
Simon J. Kiss
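
For comparison, here is a minimal sketch of the pattern that survfit() expects, using the lung data shipped with the survival package as a stand-in (its variables differ from the model above). The model is fitted with bare variable names and a data= argument, and newdata supplies one column per covariate with matching names and types. Whether a mismatch of that kind explains the error above is an assumption; note only that evstatus is numeric in the data but a factor in n.

```r
library(survival)

# Built-in lung data as a stand-in for the state-level data set
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# One row per hypothetical covariate profile; names and types match the fit
profiles <- data.frame(age = c(50, 70), sex = c(1, 1))

sf <- survfit(fit, newdata = profiles)
plot(sf, lty = 1:2, xlab = "Days", ylab = "Survival probability")
```

The resulting object holds one survival curve per row of profiles, so the two line types in the plot correspond to ages 50 and 70.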

[R] indexing list elements with lapply?

2011-04-22 Thread Simon Kiss

Dear colleagues,
I have a list that looks like what the code below produces.  I need a function 
to go through each list element and work on the second column of each list 
element (the first column is irrelevant to me; if the proposed function works 
on the first column as a consequence of writing something simple, that's 
fine).
I need to index the second column of each list element to the first item in 
that column.  So for each list element, I need to divide each number in the second 
column by the first number in that column.

This code does what I want, but it only works on one item in the list:
r[[1]][,2] / r[[1]][1,2]

I've tried working with this function but can't get it to work: 
f<-function(x) {
for (i in 1:5)
{

x[[i]][,2]/x[[i]][1,2]
}
}

lapply(r, f)

But I get this error message:
Error in x[[i]][, 2] : incorrect number of dimensions

Hope someone can help. I'm grateful for any suggestions. 
Yours, Simon Kiss
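
A note on the error above: lapply() already passes one list element at a time to f, so inside f the argument x is a single matrix and x[[i]] picks out one number, which has no dimensions. A minimal sketch of a version without the loop, on a hypothetical list built like r:

```r
set.seed(1)
# Hypothetical list of five two-column matrices, similar to r above
r <- lapply(1:5, function(i) {
  x <- runif(10, 0.85, 1)
  cbind(x, 1 - x)
})

# f receives one matrix, rescales its second column, and returns the matrix
f <- function(m) {
  m[, 2] <- m[, 2] / m[1, 2]
  m
}

r_indexed <- lapply(r, f)
```

Returning m at the end matters: without it the function would return only the rescaled column.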

**dataset

ff<-runif(10, 0.85, 1)
ff<-cbind(ff, 1-ff)
gg<-runif(10, 0.85, 1)
gg<-cbind(gg, 1-gg)
hh<-runif(10, 0.86, 1)
hh<-cbind(hh, 1-hh)
ii<-runif(10, 0.92, 1)
ii<-cbind(ii, 1-ii)
jj<-runif(10,0.76, 1)
jj<-cbind(jj, 1-jj)
r<-list(ff, gg, hh,ii, jj)
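
A possible fix, sketched under the assumption that every element of r is a matrix with at least two columns. lapply() already hands f one list element at a time, so inside f the argument x is a single matrix and x[[i]] picks out a single number, which is why x[[i]][,2] fails with "incorrect number of dimensions". Dropping the loop and indexing the matrix directly is enough. The names make_mat, r_demo and scale_by_first are illustrative only.

```r
# Toy list of two-column matrices, built the same way as in the post
set.seed(42)
make_mat <- function(lo) {
  u <- runif(10, lo, 1)
  cbind(u, 1 - u)
}
r_demo <- list(make_mat(0.85), make_mat(0.86), make_mat(0.92))

# lapply() passes ONE matrix at a time, so no inner loop is needed
scale_by_first <- function(m) m[, 2] / m[1, 2]
indexed <- lapply(r_demo, scale_by_first)

# The first entry of every result is 1 by construction
sapply(indexed, function(v) v[1])
```

The same call on the list from the post would be lapply(r, scale_by_first).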
*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 519 761 7606


[R] error in recode.defalt ....object '.data' not found

2011-03-31 Thread Simon Kiss

Dear colleagues, working with the data frame below and trying to reverse-code 
two variables, I get the error message below.
I searched through the help list but could not find any postings which could 
help me solve the situation. I tried attaching and detaching the data frame to 
no avail.
Yours, Simon Kiss

*DATA FRAME
'data.frame':   1569 obs. of  9 variables:
 $ equal : num  3 4 3 2 3 4 2 3 2 2 ...
 $ disc  : num  3 2 3 3 2 2 3 3 3 3 ...
 $ family: num  3 2 2 2 3 2 2 1 2 1 ...
 $ special   : num  3 3 4 4 3 3 4 4 3 4 ...
 $ immigrants: num  3 8 3 8 3 3 4 1 1 2 ...
 $ wedlock   : num  3 3 3 3 3 2 2 8 2 3 ...
 $ crime : num  3 2 2 1 2 3 1 8 2 1 ...
 $ breakdown : num  3 3 3 2 2 4 8 2 2 4 ...
 $ nonwhites : num  2 4 3 3 2 2 3 4 3 3 ...

*RECODE
social$nonwhites<-recode(social$nonwhites, "1=4; 2=3; 3=2; 4=1; 8=NA; -9=NA")

*ERROR
Error in recode.default(social$nonwhites, "1=4; 2=3; 3=2; 4=1; 8=NA; -9=NA") : 
  object '.data' not found
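
The function name in the error, recode.default, suggests that the recode() being called may not be the one from the car package, since another attached package can mask it. That is an assumption, because the post does not list the loaded packages; calling car::recode() explicitly would rule it out. Independently of any package, the same reverse-coding can be sketched in base R with a lookup vector. The vector nonwhites below is a toy stand-in for social$nonwhites.

```r
# Toy stand-in for social$nonwhites (values 1-4, with 8 as a missing-data code)
nonwhites <- c(2, 4, 3, 3, 8, 1)

# Named lookup: old value -> new value; codes absent from the table become NA
lookup <- c("1" = 4, "2" = 3, "3" = 2, "4" = 1)
reversed <- unname(lookup[as.character(nonwhites)])

reversed
```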


*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 519 761 7606


[R] sampling design runs with no errors but returns empty data set

2011-03-30 Thread Simon Kiss

Dear colleagues,
I'm working with the 2008 Canada Election Studies 
(http://www.queensu.ca/cora/_files/_CES/CES2008.sav.zip), trying to construct a 
weighted national sample using the survey package.
Three weights are included in the national survey (a household weight,  a 
provincial weight and a national weight which is a product of the first two).
In the following code I removed variables with missing national weights and 
tried to construct the sample from advice I've gleaned from the documentation 
for the survey package and other help requests.
There are no errors, but the data frame (weight_test) contains no 
What am I missing?  
Yours, Simon Kiss
P.S. The code is only reproducible if the data set is downloadable.  I'm not sure

ces<-read.spss(file.choose(), to.data.frame=TRUE, use.value.labels=FALSE)
missing_data<-subset(ces1, !is.na(ces08_NATWGT))
weight_test<-svydesign(id=~0, weights=~ces08_NATWGT, data=missing_data)

Note: this is some reproducible code that creates a data set that is a very 
stripped down version of what I'm working with, but with this, the surveydesign 
function appears to work properly.

mydat<-data.frame(ces08_HHWGT=runif(3000, 0.5, 5), ces08_PROVWGT=runif(3000, 
0.6, 1.2), party=sample(c("NDP", "BQ", "Lib", "Con"), 3000, replace=TRUE), 
age=sample(seq(18, 72,1), 3000, replace=TRUE), income=sample(seq(21,121,1), 
3000, replace=TRUE))
mydat$ces08_NATWGT<-mydat$ces08_HHWGT*mydat$ces08_PROVWGT
weight_test<-svydesign(id=~1, weights=~ces08_NATWGT, data=mydat)
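
Two things may be worth checking here, offered as a sketch rather than a diagnosis. First, svydesign() returns a survey design object, not a data frame, so printing it shows only a short description of the design; the weights are then used by functions such as svymean(). Second, the first snippet reads the file into ces but subsets ces1. The data frame toy and its columns are made up for illustration, and the survey calls are guarded so that the block also runs where the package is not installed.

```r
set.seed(1)
toy <- data.frame(age = sample(18:72, 200, replace = TRUE),
                  wgt = runif(200, 0.5, 5))
toy$wgt[sample(200, 10)] <- NA        # some cases lack a weight
toy_ok <- subset(toy, !is.na(wgt))    # drop cases (rows) without a weight

# Weighted mean in base R, for comparison with the survey estimate
base_mean <- weighted.mean(toy_ok$age, toy_ok$wgt)

if (requireNamespace("survey", quietly = TRUE)) {
  des <- survey::svydesign(ids = ~1, weights = ~wgt, data = toy_ok)
  print(des)                          # a description of the design, not the data
  print(survey::svymean(~age, des))   # point estimate should agree with base_mean
}
```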



*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 519 761 7606


[R] baseline hazard function

2011-01-26 Thread Simon Kiss

Dear colleagues, I have the following dataset.  It is modelled on the data 
included in Box-Steffensmeier and Jones, "Event History Modelling".
Using the following code, I try to find the baseline hazard function:

haz_1<-muhaz(bpa$time, bpa$censored, subset=(bpa$year=="2010" | bpa$ban=="1"), 
min.time=1, max.time=3)

I think I'm doing everything right, but what I don't understand is how to 
derive a duration dependency coefficient from the values contained in the muhaz 
object as per Box-Steffensmeier and Jones' recommendations in Ch. 5 of Event 
History Modelling.

I get the following summary(haz_1)
Number of Observations .. 50
Censored Observations ... 43
Method used . Local
Boundary Correction Type  Left and Right
Kernel type . Epanechnikov
Minimum Time  1
Maximum Time  3
Number of minimization points ... 51
Number of estimation points . 101
Pilot Bandwidth . 0.25
Smoothing Bandwidth . 1.27
Minimum IMSE  6716.9


Can anyone provide any advice?
Yours, Simon Kiss

'data.frame':   147 obs. of  7 variables:
 $ state   : Factor w/ 50 levels "Alabama","Alaska",..: 1 1 1 2 2 2 3 3 3 4 ...
 $ partisan: Factor w/ 3 levels "democrat","mixed",..: 1 1 1 2 2 2 3 3 3 1 ...
 $ ban : num  0 0 0 0 0 0 0 0 0 0 ...
 $ year: num  2008 2009 2010 2008 2009 ...
 $ news: num  1.67 1.67 0 2 0 ...
 $ time: num  1 2 3 1 2 3 1 2 3 1 ...
 $ censored: num  0 0 0 0 0 0 0 0 0 0 ...


*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 519 761 7606


[R] subsetting based on joint values of critera

2011-01-25 Thread Simon Kiss

Dear colleagues, I have a dataset that looks as below.

I would like to make a new dataset that excludes the cases which are joint 
conjunctions of particular state names and years, so Connecticut and 2010, 
Maryland and 2010 and Vermont and 2010.

I'm trying the following subset code: 
newdata<- subset(bpa, (!State=="Connecticut" & year<"2010"))

It appears that it's evaluating the two criteria independently rather than 
jointly, so this is returning all cases in 2008 and 2009, leaving out 
Connecticut for those years as well.
How do I get subset to return a dataset based on the joint occurrence of values 
of two variables?

Yours,  Simon Kiss

str(bpa)
'data.frame':   150 obs. of  5 variables:
 $ State   : Factor w/ 50 levels "Alabama","Alaska",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ year: num  2008 2008 2008 2008 2008 ...
 $ ban : num  0 0 0 0 0 0 0 0 0 0 ...
 $ partisan: Factor w/ 3 levels "democrat","mixed",..: 1 1 1 1 1 1 1 2 3 2 ...
 $ news: num  1.67 2 0 0 2.38 ...
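
One way to express the joint condition, as a minimal sketch on a made-up data frame (bpa_demo and drop_states are illustrative names). In the attempt above, ! applies only to State=="Connecticut", so the filter reads "not Connecticut AND before 2010". Wrapping the whole conjunction in parentheses and negating it drops only the rows where both conditions hold; year is compared as a number because it is stored as numeric.

```r
# Toy data: four states observed in three years
bpa_demo <- expand.grid(State = c("Alabama", "Connecticut", "Maryland", "Vermont"),
                        year = 2008:2010, stringsAsFactors = FALSE)

drop_states <- c("Connecticut", "Maryland", "Vermont")

# Negate the JOINT condition: drop a row only if state matches AND year is 2010
newdata <- subset(bpa_demo, !(State %in% drop_states & year == 2010))

table(newdata$State, newdata$year)
```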
*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 519 761 7606


[R] Counting dates in arbitrary ranges

2011-01-18 Thread Simon Kiss

Dear Colleagues,
I have a data set that looks as below. I'd like to count the number of dates in 
a series of arbitrary ranges (breaks) i.e. not pre-defined breaks such as 
months, quarters or years. table(format()) produces ideally formatted output, 
but table() does not appear to accept arbitrary ranges.
I also tried converting the dates to numeric and using histogram to try to get 
the data, but that doesn't work either.  Cut appears to accept an arbitrary 
range, but I could only get it to produce NAs.

Any suggestions? Yours, Simon Kiss

mydata<-list(x=seq(as.Date("2007-05-01"), as.Date("2009-09-10"),"days"), 
y=seq(as.Date("2007-06-16"), as.Date("2009-11-12"),"days"))
table(format(mydata[[1]], "%Y")) 
t_1<-hist(as.numeric(mydata[[1]], breaks=c("14056", "14421")))$counts
cut(mydata[[1]], breaks=c(as.Date("2008-06-26"), ("2009=06-26")))
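
A minimal sketch of counting dates in arbitrary ranges with cut() and table(), assuming the break points are supplied as one proper Date vector. In the cut() call above, the second break is a bare string containing "=", so the breaks are not a clean Date vector. The break dates below are chosen for illustration; dates outside the outermost breaks become NA, so the breaks should span the whole series.

```r
x <- seq(as.Date("2007-05-01"), as.Date("2009-09-10"), by = "day")

# All cut points are Dates; the last one sits one day past the final date
# because intervals are closed on the left and open on the right by default
brks <- as.Date(c("2007-05-01", "2008-06-26", "2009-06-26", "2009-09-11"))

counts <- table(cut(x, breaks = brks))
counts
```

Each level is labelled with the start date of its interval.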


*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 519 761 7606


Re: [R] Importing multiple text files with lapply.

2011-01-17 Thread Simon Kiss

Hi Jim, 
Ultimately, I'm going to want to count the frequency of dates by particular 
time periods (months, quarters, years) for each state and then plot the data. I 
know there are commands in ggplot2 that will do that, so I'm not too worried 
about that, but I was stuck on getting 50 text files (one for each state) read 
into R.  For the record, using read.table individually on a state file will get 
it into a usable format, but it wasn't working in conjunction with lapply.
To reiterate, the home folder has 50 .txt files, each with a column of dates in 
the format I sent you.  
I will try readLines and see if I can get it to loop through.
Yours, Simon Kiss
On 2011-01-17, at 7:44 PM, jim holtman wrote:

> It sounds like you want to use 'readLines' and not 'read.table'
> 
>> x <- readLines(textConnection("January 11, 2009
> + January 11, 2009
> + October 19, 2008
> + October 13, 2008
> + August 16, 2008
> + June 19, 2008
> + April 19, 2008
> + April 16, 2008
> + February 9, 2008
> + September 2, 2007"))
>> closeAllConnections()
>> x
> [1] "January 11, 2009"  "January 11, 2009"  "October 19, 2008"
> "October 13, 2008"  "August 16, 2008"
> [6] "June 19, 2008" "April 19, 2008"    "April 16, 2008"
> "February 9, 2008"  "September 2, 2007"
>> 
> 
> What exactly are you going to do with the data after you read it in?
> 
> On Mon, Jan 17, 2011 at 6:22 PM, Simon Kiss  wrote:
>> Dear jim,
>> Yes, it's true, the data are separated onto new lines as follows:
>> January 11, 2009
>> January 11, 2009
>> October 19, 2008
>> October 13, 2008
>> August 16, 2008
>> June 19, 2008
>> April 19, 2008
>> April 16, 2008
>> February 9, 2008
>> September 2, 2007
>> 
>> I tried your attempt and it didn't work either; it returned the error 
>> message:
>> Error in FUN(X[[1L]], ...) :
>>  'file' must be a character string or connection
>> 
>> On 2011-01-17, at 2:02 PM, jim holtman wrote:
>> 
>>> try:
>>> 
>>> mylist <- lapply(a, read.table, header = TRUE, sep = '\n')
>>> 
>>> also is the separator really '\n' meaning a new-line?  What exactly
>>> does the data look like?
>>> 
>>> On Mon, Jan 17, 2011 at 11:47 AM, Simon Kiss  wrote:
>>>> Hello,
>>>> I'm trying to read in 50 text files with dates as content to create a 
>>>> list of tables.
>>>> 
>>>> a is the list of filenames that need to be read in.
>>>> 
>>>> The following command returns the following error
>>>> mylist<-lapply(a, read.table(header=TRUE, sep="\n"))
>>>> 
>>>> Error in read.table(header = TRUE, sep = "\n") :
>>>>  element 1 is empty;
>>>>   the part of the args list of 'is.character' being evaluated was:
>>>>   (file)
>>>> 
>>>> Does anyone have any suggestions?
>>>> Yours, Simon Kiss
>>>> *
>>>> Simon J. Kiss, PhD
>>>> Assistant Professor, Wilfrid Laurier University
>>>> 73 George Street
>>>> Brantford, Ontario, Canada
>>>> N3T 2C9
>>>> Cell: +1 519 761 7606
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Jim Holtman
>>> Data Munger Guru
>>> 
>>> What is the problem that you are trying to solve?
>> 
>> *
>> Simon J. Kiss, PhD
>> Assistant Professor, Wilfrid Laurier University
>> 73 George Street
>> Brantford, Ontario, Canada
>> N3T 2C9
>> Cell: +1 519 761 7606
>> 
> 
> 
> 
> -- 
> Jim Holtman
> Data Munger Guru
> 
> What is the problem that you are trying to solve?

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 519 761 7606


Re: [R] Importing multiple text files with lapply.

2011-01-17 Thread Simon Kiss

readLines worked great Jim, thanks!
Simon Kiss
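
For reference, a minimal sketch of the readLines approach applied to a whole folder. The original call failed because read.table(header=TRUE, sep="\n") is evaluated immediately, without a file, before lapply() ever sees it; lapply() wants the function itself, plus any extra arguments. The folder and file names below are invented so the example is self-contained.

```r
# Write two small demo files so the sketch is self-contained
demo_dir <- file.path(tempdir(), "state_dates")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("January 11, 2009", "October 19, 2008", "June 19, 2008"),
           file.path(demo_dir, "Alabama.txt"))
writeLines(c("April 16, 2008", "September 2, 2007"),
           file.path(demo_dir, "Alaska.txt"))

a <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Pass the function itself; lapply() supplies each file name as its first argument
date_lines <- lapply(a, readLines)
names(date_lines) <- sub("\\.txt$", "", basename(a))

# Month names are parsed according to the current locale (English assumed here)
dates <- lapply(date_lines, as.Date, format = "%B %d, %Y")

# Frequency of dates per year, one table per state
per_year <- lapply(dates, function(d) table(format(d, "%Y")))
```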

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 519 761 7606


Re: [R] Importing multiple text files with lapply.

2011-01-17 Thread Simon Kiss

Dear jim,
Yes, it's true, the data are separated onto new lines as follows:
January 11, 2009 
January 11, 2009 
October 19, 2008 
October 13, 2008 
August 16, 2008 
June 19, 2008 
April 19, 2008 
April 16, 2008 
February 9, 2008
September 2, 2007

I tried your attempt and it didn't work either; it returned the error message:
Error in FUN(X[[1L]], ...) : 
  'file' must be a character string or connection

On 2011-01-17, at 2:02 PM, jim holtman wrote:

> try:
> 
> mylist <- lapply(a, read.table, header = TRUE, sep = '\n')
> 
> also is the separator really '\n' meaning a new-line?  What exactly
> does the data look like?
> 
> On Mon, Jan 17, 2011 at 11:47 AM, Simon Kiss  wrote:
>> Hello,
>> I'm trying to read in 50 text files with dates as content to create a list 
>> of tables.
>> 
>> a is the list of filenames that need to be read in.
>> 
>> The following command returns the following error
>> mylist<-lapply(a, read.table(header=TRUE, sep="\n"))
>> 
>> Error in read.table(header = TRUE, sep = "\n") :
>>  element 1 is empty;
>>   the part of the args list of 'is.character' being evaluated was:
>>   (file)
>> 
>> Does anyone have any suggestions?
>> Yours, Simon Kiss
>> *
>> Simon J. Kiss, PhD
>> Assistant Professor, Wilfrid Laurier University
>> 73 George Street
>> Brantford, Ontario, Canada
>> N3T 2C9
>> Cell: +1 519 761 7606
>> 
>> 
> 
> 
> 
> -- 
> Jim Holtman
> Data Munger Guru
> 
> What is the problem that you are trying to solve?

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 519 761 7606


[R] Importing multiple text files with lapply.

2011-01-17 Thread Simon Kiss

Hello,
I'm trying to read in 50 text files with dates as content to create a list of 
tables.  

a is the list of filenames that need to be read in.

The following command returns the following error
mylist<-lapply(a, read.table(header=TRUE, sep="\n"))

Error in read.table(header = TRUE, sep = "\n") : 
  element 1 is empty;
   the part of the args list of 'is.character' being evaluated was:
   (file)

Does anyone have any suggestions?
Yours, Simon Kiss
*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 519 761 7606


[R] 45 Degree labels on barplot? Help understanding code previously posted.

2010-12-10 Thread Simon Kiss

Dear colleagues,
I found a line or two of code in the help archives from Uwe Ligges about 
creating slanted x-labels for a barplot and it works well for my purposes (code 
below). However, I was hoping someone could explain to me precisely what the 
code is doing.  
I'm aware it's invoking the text command, and I know the first two arguments 
to text are x and y co-ordinates.  I'm also aware that par("usr")[3] is 
grabbing the third element of the vector of plotting co-ordinates.  But I tried 
replacing par("usr")[3] with just "0" and that didn't work; all the labels got 
bunched up on the left.  Is it necessary to create a new object via "barplot" 
and then quote that in the x,y coordinates of text? 
Like I said, the code works great, but I'm trying to actually understand the 
rationale behind the elements so I can apply it in future.
Yours,  Simon Kiss

#Reproducible Code
mydat<-data.frame(countries=c("Canada", "Denmark", "France", "United Kingdom", 
"Germany", "Australia", "New Zealand", "Switzerland", "Belgium", 
"Netherlands"), stories_total=c(429, 25,
239, 99, 100, 96, 18, 21, 0, 6), avg=c(4.165048544, 6.25, 6.459459459, 
0.908256881, 1.923076923, 1.103448276, 1.058823529, 1.615384615, 0, 
0.107142857), steps=c(2, 2, 2, 0,1, 1, 1, 0,0,0), 
newspapers=c(103, 4, 37, 109, 52, 87, 17, 13, 10, 56))
mydat.sort1<-mydat[order(-mydat$avg), ]
myplot<-barplot(mydat.sort1$avg, col=c("black", "black", "black", "grey", 
"white", "grey", "grey", "white", "white", "white"), ylim=c(0,7), 
main="Regulatory Action On Bisphenol A By Newspaper Coverage")
col.vec=c("black", "grey", "white")
legend("topright", col=col.vec, fill=c("black", "grey", "white"), 
legend=c("Meaningful Ban", "Recommendations To Withdraw", 
"No Legislative Action"))
labels=mydat.sort1$countries
#These lines create the labels
text(myplot, par("usr")[3], labels=labels, srt=35, offset=1, adj=1, xpd=TRUE)
axis(2)
par("usr")[3]
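
A stripped-down sketch of what each piece of the text() call does, using made-up bar heights and labels. barplot() returns the x-coordinates of the bar midpoints, which is why its result is saved and passed as the x argument. par("usr") is the vector c(x_left, x_right, y_bottom, y_top) of the plot region in user coordinates, so its third element is the y-value of the bottom edge.

```r
pdf(NULL)  # null graphics device so the sketch runs without a display

heights <- c(6.5, 6.3, 4.2, 1.9)                        # illustrative values
labs    <- c("France", "Denmark", "Canada", "Germany")  # illustrative labels

mids <- barplot(heights, ylim = c(0, 7))  # x-midpoints of the bars
usr  <- par("usr")                        # c(x_left, x_right, y_bottom, y_top)

# x = bar midpoints, y = bottom edge of the plot region;
# srt rotates the text, adj = 1 right-aligns it so each label ends at its bar,
# xpd = TRUE allows drawing in the margin, outside the plot region
text(x = mids, y = usr[3], labels = labs, srt = 35, adj = 1, xpd = TRUE)

dev.off()
```

Supplying a single number where the midpoints belong places every label at the same x position, which is one way labels can end up bunched together.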

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 519 761 7606


[R] separate elements of a character vector

2010-10-19 Thread Simon Kiss

Dear colleagues, this seems like an easy problem, and I found some suggestions 
in the help list, which I've incorporated, but I can't quite get it right.  
I want to add a series of years as a second row of x-axis category labels. I 
generate them with test and test_2 below, format them with some spacing (which is 
the suggestion I took from the R-list), concatenate them and then write them 
with mtext.  At the end, the labels in test are bunched up together in the 
center of the plot window.  Can anyone suggest a way to space out the elements 
of "test" to look like evenly-spaced x-labels?
Yours, 
Simon Kiss

x1<-rnorm(500)
plot(x1)

test<-seq(1987, 2002, by=1)

test_2<-seq(2003, 2006, by=1)

test<-format(c(test, test_2), width=5)

mtext(test, side=1, line=2)
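
One option, sketched here with evenly spaced positions as an assumption about the desired layout: mtext() accepts an at argument giving the user coordinate of each label, so the labels need no padding with format(). Without at, every element is drawn at the same default position, which would explain the bunching. The names years and at_pos are illustrative.

```r
pdf(NULL)  # null graphics device so the sketch runs without a display

set.seed(1)
x1 <- rnorm(500)
plot(x1)

years <- c(1987:2002, 2003:2006)

# One x position per label, spread evenly across the plotted index range
at_pos <- seq(1, length(x1), length.out = length(years))

mtext(as.character(years), side = 1, line = 2, at = at_pos, cex = 0.7)

dev.off()
```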
*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 519 761 7606


Re: [R] Create single vector after looping through multiple data frames with GREP

2010-10-10 Thread Simon Kiss

Hello all, 

I changed the subject line of the e-mail, because the question I'm posing now 
is different from the first one. I hope that this is proper etiquette.  
However, the original chain is included below.

I've incorporated bits of  both Ethan and Brian's code into the script below, 
but there's one aspect I can't get my head around. I'm totally new to 
programming with control structures. The reproducible code below creates a list 
containing 19 data frames, one each for the "Most Important Problem"  survey 
data for Canada.

What I'd like at this stage is a loop where I can search through all the data 
frames for rows containing the search term and then bind the rows together in a 
plottable format.

At the bottom of the code below, you'll find my first attempt to make use of a 
search string and to put it into a plottable format.  It only partially works.  
I can only get the numbers for one year, where I'd like to be able to get a 
string of numbers for several years. But, on the upside, grep appears to do the 
trick in terms of selecting rows.  

Can anyone suggest a solution?
Yours truly,
Simon Kiss

#This is the reproducible code to set-up all the data frames
require("XML")
library(XML)
#This gets the data from the web and lists them
mylist <- paste ("http://www.queensu.ca/cora/_trends/mip_",
c(1987:2001,2003:2006), ".htm", sep="")
alltables <- lapply(mylist, readHTMLTable)

#convert to dataframes
r<-lapply(alltables, function(x) {as.data.frame(x)} )

#This is just some house-cleaning; structuring all the tables so they are 
#uniform 
r[[1]][3]<-r[[1]][2]
r[[1]][2]<-c(" ")
r[[2]][4]<-r[[2]][2]
r[[2]][5]<-r[[2]][3]
r[[2]][2:3]<-c(" ")
r[[3]][4:5]<-r[[3]][3:4]
r[[3]][3]<-c(" ")

#This loop deletes some superfluous columns and rows, turns the first column in 
#to character strings and the data into numeric
for (i in 1:19) {
n.rows<-dim(r[[i]])[1]
r[[i]] <- r[[i]][15:n.rows-3, 1:5]
n.rows<-dim(r[[i]])[1]
row.names(r[[i]]) <-NULL
names(r[[i]]) <- c("Response", "Q1", "Q2", "Q3", "Q4")

r[[i]][, 1]<-as.character(r[[i]][,1])
#r[[i]][,2:5]<-as.numeric(as.character(r[[i]][,2:5]))
r[[i]][, 2:5]<-lapply(r[[i]][, 2:5], function(x) {as.numeric(as.character(x))})
#n.rows<-dim(r[[i]])[1]
#r[[i]]<-r[[i]][9
}

#This code is my first attempt at introducing a search string, getting the 
#rows, binding and plotting;
economy<-r[[10]][grep('Economy', r[[10]][,1]),]
economy_2<-r[[11]][grep('Economy', r[[11]][,1]),]
test<-cbind(economy, economy_2)
plot(as.numeric(test), type='l')

#here's another attempt I'm trying
economy<-data.frame
for (i in 15:19) {
economy[i,] <-r[[i]][grep('Economy', r[[i]][,1]), ]
}
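
A minimal sketch of the search-and-bind step, using two small made-up tables in place of the scraped ones (all numbers are invented). It applies the same grep() row selection to every data frame with lapply() and stacks the hits with rbind(). Note that economy<-data.frame in the second attempt assigns the function data.frame itself rather than creating an empty data frame, so the loop has nothing to fill. The names tables, pick and series are illustrative.

```r
# Two toy yearly tables with the same layout as the cleaned data frames
tables <- list(
  "2005" = data.frame(Response = c("Economy", "Health care", "Unemployment"),
                      Q1 = c(10, 40, 8), Q2 = c(12, 38, 7),
                      Q3 = c(11, 41, 9), Q4 = c(13, 39, 6),
                      stringsAsFactors = FALSE),
  "2006" = data.frame(Response = c("Health care", "Economy", "Environment"),
                      Q1 = c(35, 14, 10), Q2 = c(33, 15, 12),
                      Q3 = c(30, 16, 15), Q4 = c(29, 18, 20),
                      stringsAsFactors = FALSE)
)

# Select the rows whose Response matches the search term
pick <- function(df, pattern) df[grep(pattern, df$Response), , drop = FALSE]

hits    <- lapply(tables, pick, pattern = "Economy")
economy <- do.call(rbind, hits)   # one row per matching table row

# Flatten row by row: year 1 Q1..Q4, then year 2 Q1..Q4
series <- as.vector(t(as.matrix(economy[, c("Q1", "Q2", "Q3", "Q4")])))

pdf(NULL)                         # null device so the sketch runs anywhere
plot(series, type = "l")
dev.off()
```

With the list from the post, hits would be lapply(r, pick, pattern = "Economy").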

Begin forwarded message:

> From: Simon Kiss 
> Date: October 7, 2010 4:59:46 PM EDT
> To: Simon Kiss 
> Subject: Fwd: [R] Converting scraped data
> 
> 
> 
> Begin forwarded message:
> 
>> From: Ethan Brown 
>> Date: October 6, 2010 4:22:41 PM GMT-04:00
>> To: Simon Kiss 
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Converting scraped data
>> 
>> Hi Simon,
>> 
>> You'll notice the "test" data.frame has a whole mix of characters in
>> the columns you're interested, including a "-" for missing values, and
>> that the columns you're interested in are in fact factors.
>> 
>> as.numeric(factor) returns the level of the factor, not the value of
>> the level. (See ?levels and ?factor)--that's why it's giving you those
>> irrelevant integers. I always end up using something like this handy
>> code snippet to deal with the situation:
>> 
>> unfactor <- function(factors)
>> # From http://psychlab2.ucr.edu/rwiki/index.php/R_Code_Snippets#unfactor
>> # Transform a factor back into its factor names
>> {
>>  return(levels(factors)[factors])
>> }
>> 
>> Then, to get your data to where you want it, I'd do this:
>> 
>> require(XML)
>> theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
>> tables <- readHTMLTable(theurl)
>> n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
>> class(tables)
>> test<-data.frame(tables, stringsAsFactors=FALSE)
>> 
>> 
>> result <- test[11:42, 1:5] #Extract the actual data we want
>> names(result) <- c("Response", "Q1", "Q2","Q3","Q4")
>> for(i in 2:5) {
>> # Convert factor columns to numeric
>> result[,i] <- as.numeric(unfactor(result[,i]))
>> }
>> result
>> 
>> From here

[R] Converting scraped data

2010-10-06 Thread Simon Kiss


Dear Colleagues,
I used this code to scrape data from the URL contained within.  This  
code should be reproducible.


require("XML")
library(XML)
theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
tables <- readHTMLTable(theurl)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
class(tables)
test<-data.frame(tables, stringsAsFactors=FALSE)
test[16,c(2:5)]
as.numeric(test[16,c(2:5)])
quartz()
plot(c(1:4), test[15, c(2:5)])

Calling the values from the row of interest using test[16, c(2:5)] brings  
them up as they appear on the web page, but plotting them or coercing  
them to numeric changes the values in a way that doesn't make  
sense to me. My intuition is that there is something going on with the  
way the characters are coded or classed when they're scraped into R.   
I've looked around the help files for converting from character to  
numeric but can't find a solution.


I also tried this:

as.numeric(as.character(test[16,c(2:5)]))

and that also changed the values from what they originally were.
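
A small sketch of the likely mechanism, assuming the scraped columns are factors, as is usual when stringsAsFactors is left at its old default during scraping. as.numeric() on a factor returns the internal level codes, and applying as.character() to a whole data-frame row does not convert each factor cell to its label. Converting column by column avoids both problems. The toy values are invented.

```r
f <- factor(c("10", "25", "-", "7"))

codes  <- as.numeric(f)   # internal level codes, not the printed values
values <- suppressWarnings(as.numeric(as.character(f)))   # 10 25 NA 7

# For one row of a data frame, convert each column separately
row_df   <- data.frame(Q1 = factor(c("10", "3")), Q2 = factor(c("25", "4")))
row_vals <- vapply(row_df[1, ],
                   function(col) as.numeric(as.character(col)),
                   numeric(1))
row_vals
```

The "-" placeholder for a missing value becomes NA, which is why the conversion is wrapped in suppressWarnings().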


I'm grateful for any suggestions.
Yours, Simon Kiss



*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 519 761 7606


[R] group means of multi-way table?

2010-09-21 Thread Simon Kiss

Hello, can someone tell me how to generate the means for a data frame  
that looks like this? My data frame has many more variables, but I  
won't bother you with those; these are the ones that I'm interested in.
Needless to say, z is the variable in which I'm interested. I'd like  
to find out the mean score of z for NDP managers, Conservative  
managers and Liberal managers and then for a few other configurations.
I've played around with aggregate, tapply and by, but I can't get it to  
work.

Cordially,
Simon Kiss
mydata=data.frame(x=as.factor(sample(c("labourers", "salaried", "managers"),  
size=300, replace=TRUE)))
mydata$y=as.factor(sample(c("NDP", "Green", "Liberal",  
"Conservative"), size=300, replace=TRUE))

mydata$z=as.numeric(sample(1:4, size=300, replace=TRUE))
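
Two base-R ways to get the cell means, as a minimal sketch on data simulated like the example above (the data frame d and the dimension names occupation and party are illustrative). tapply() with a list of two factors returns a matrix of means, one cell per combination; aggregate() with a formula returns the same numbers in long format.

```r
set.seed(123)
n <- 300
d <- data.frame(
  x = factor(sample(c("labourers", "salaried", "managers"), n, replace = TRUE)),
  y = factor(sample(c("NDP", "Green", "Liberal", "Conservative"), n, replace = TRUE)),
  z = sample(1:4, n, replace = TRUE)
)

# Matrix of means: rows are occupations, columns are parties
cell_means <- with(d, tapply(z, list(occupation = x, party = y), mean))

# The same means, one row per occupation-party combination
long_means <- aggregate(z ~ x + y, data = d, FUN = mean)

# Managers only, for the three parties of interest
cell_means["managers", c("NDP", "Conservative", "Liberal")]
```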

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
223 Grand River Hall, 171 Colborne Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 519 761 7606


[R] Grouping and stacking bar plot for categorical variables

2010-07-19 Thread Simon Kiss


Hi all,
I have a series of categorical variables that look just like this:

welfare=sample(c("less", "same", "more"), 1000, replace=TRUE)
education=sample(c("less", "same", "more"), 1000, replace=TRUE)
defence=sample(c("less", "same", "more"), 1000, replace=TRUE)
egp=sample(c("salariat", "routine non-manual", "self-employed, farmers",  
"skilled labour, foremen", "unskilled labour",  
"social and cultural specialists"), 1000, replace=TRUE)


welfare, education and defence are responses to a series of questions  
about whether the respondent supports less, the same or more  
spending on an issue.


egp is a class category.

What I would like is a barplot that is both stacked and grouped.  The  
x-axis categories should be the egp class category.  Within each class  
category I would like a cluster of stacked bars that show the  
distribution of spending support for each issue.


Can anyone suggest something?
Yours, Simon Kiss
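
One base-graphics possibility, offered as a sketch rather than the only layout: compute the response shares within each class for every issue, interleave the columns so that the three issue bars of one class sit next to each other, and use the space argument of barplot() to open a wider gap before each cluster. The names shares, mat, gap and centres are illustrative.

```r
pdf(NULL)  # null graphics device so the sketch runs without a display

set.seed(1)
n    <- 1000
resp <- c("less", "same", "more")
d <- data.frame(
  welfare   = sample(resp, n, replace = TRUE),
  education = sample(resp, n, replace = TRUE),
  defence   = sample(resp, n, replace = TRUE),
  egp       = sample(c("salariat", "routine non-manual", "self-employed, farmers",
                       "skilled labour, foremen", "unskilled labour",
                       "social and cultural specialists"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
issues  <- c("welfare", "education", "defence")
classes <- sort(unique(d$egp))

# For each issue: table of response shares within each class (columns sum to 1)
shares <- lapply(issues, function(v)
  prop.table(table(factor(d[[v]], levels = resp), d$egp), margin = 2))

# Interleave columns: for each class, one column per issue
mat <- do.call(cbind, lapply(classes, function(k)
  sapply(shares, function(p) p[, k])))

# Wide gap before the first bar of each cluster, narrow gaps inside it
gap  <- rep(c(1, 0.1, 0.1), times = length(classes))
mids <- barplot(mat, space = gap, col = c("grey20", "grey60", "grey90"),
                legend.text = resp, ylab = "Share of respondents")

# One class label centred under each cluster of three bars
centres <- colMeans(matrix(mids, nrow = length(issues)))
axis(1, at = centres, labels = classes, tick = FALSE, cex.axis = 0.6)

dev.off()
```

Within each cluster the bars run in the order welfare, education, defence; labelling the individual bars is left out to keep the sketch short.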

*
Simon J. Kiss, PhD
SSHRC and DAAD Post-Doctoral Fellow
John F. Kennedy Institute of North America Studies
Free University of Berlin
Lansstraße 7-9
14195 Berlin, Germany
Cell: +49 (0)1525-300-2812,
Web: http://www.jfki.fu-berlin.de/index.html


Re: [R] generate irregular series of dates

2010-07-01 Thread Simon Kiss


Dear Gabor,
Yours worked really well. For what it's worth, here is the final  
product.
I also added a line or two to reconvert the dates back to written form  
(October 15 2010).



require(chron)
dd <- seq(as.Date("INSERT FIRST DATE OF CLASSES IN TERM HERE"),
          as.Date("INSERT LAST DAY OF CLASSES IN TERM HERE"), "day")
a=as.character(dd[weekdays(dd) %in% c("INSERT FIRST WEEKDAY OF CLASS",
                                      "INSERT SECOND WEEKDAY OF CLASS")])
a=chron(a, format = c(dates="y-m-d"), out.format=c(dates="month day, year"))
write.table(a, "INSERT FILE LOCATION WHERE YOU WISH TO SAVE DATES",
            quote=FALSE, col.names=FALSE, row.names=FALSE)


Thanks a lot.
Simon Kiss

On 29-Jun-10, at 9:21 PM, Gabor Grothendieck wrote:


On Tue, Jun 29, 2010 at 6:22 AM, Simon Kiss  wrote:

Dear colleagues, particularly academic ones,
So I'm creating a Microsoft Word template for myself so that every
time I teach a new course, I don't have to enter in the dates manually
for each class session.
I'd like to use an R script that can generate an irregular series of
dates starting from one date (semester begin) to another (semester
end) using an irregular interval in between (Tuesdays and Thursdays,
for example).
I know that a regular series of dates is no problem, but what about an
irregular series?


Generate all the dates in the range of interest and then pick off the
Tuesdays and Thursdays:

dd <- seq(as.Date("2010-01-01"), as.Date("2010-12-31"), "day")
dd[weekdays(dd) %in% c("Tuesday", "Thursday")]



[R] generate irregular series of dates

2010-06-29 Thread Simon Kiss


Dear colleagues, particularly academic ones,
So I'm creating a Microsoft Word template for myself so that every  
time I teach a new course, I don't have to enter in the dates manually  
for each class session.
I'd like to use an R script that can generate an irregular series of  
dates starting from one date (semester begin) to another (semester  
end) using an irregular interval in between (Tuesdays and Thursdays,  
for example).
I know that a regular series of dates is no problem, but what about an  
irregular series?

Yours,
Simon Kiss

[R] Stacked Histogram, multiple lines for dates of news stories?

2010-06-28 Thread Simon Kiss


Dear colleagues,
I have extracted the dates of several news stories from a newspaper  
data base to chart coverage trends of an issue over time. They are in  
a data frame that looks just like one generated by the reproducible  
code below.
I can already generate a histogram of the dates with various intervals  
(months, quarters, weeks, years) using hist.Date.  However, there are
two other things I'd like to do.
First, I'd like to either create a stacked histogram so that one could  
see whether one newspaper really pushed coverage of an issue at a  
certain point while others then followed later on in time.  Second, or  
alternatively, I would like to do a line graph of the same data for  
the different papers to represent the same trends.
I guess what I'm finding challenging is that I don't have counts of  
the number of stories on each day or in each week or in each month; I  
just have the dates themselves.  The date.Hist command was very useful  
in turning those into bins, but I'd like to push it a bit further and  
to a stacked histogram or a multiple line chart.

Can anyone suggest a way to go about doing this?

I should say, I played around in Hadley Wickham's ggplot package and  
looked at his website, and there is a way to render multiple lines  
here: http://had.co.nz/ggplot2/scale_date.html
but it was not clear to me how to plot just the dates or an index of  
the dates as I don't have a value for the y axis, other than the  
number of times a story was published in that time frame.


Regardless, I hope someone can suggest something.
Yours,
Simon J. Kiss

test=sample(1:3, 50, replace=TRUE)
test=as.factor(test)
levels(test)=c("Star", "Globe and Mail", "Post")
# Days are drawn from 1:28 so that every year/month/day combination is a valid date
test2=ISOdatetime(sample(2004:2009, 50, replace=TRUE),
sample(1:12, size=50, replace=TRUE), sample(1:28, 50, replace=TRUE), 0,0,0)

test2=as.Date(test2)
test_df=data.frame(test, test2)
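
Since the counts are not stored anywhere, one approach is to create them by binning the dates and cross-tabulating the bins against the newspaper. The sketch below uses base R only; the data frame `stories`, its column names and the quarterly bins are assumptions for illustration, not part of the data described above.

```r
# Simulated stories: one row per story, with its newspaper and date
set.seed(42)
papers <- c("Star", "Globe and Mail", "Post")
stories <- data.frame(
  paper = factor(sample(papers, 50, replace = TRUE), levels = papers),
  date  = as.Date("2004-01-01") + sample(0:2191, 50, replace = TRUE)
)

# Bin the dates by quarter, then count stories per newspaper and quarter
stories$period <- cut(stories$date, breaks = "quarter")
counts <- unclass(table(stories$paper, stories$period))

par(mfrow = c(2, 1), mar = c(6, 4, 2, 1))

# Stacked bars: one bar per quarter, one segment per newspaper
barplot(counts, col = gray.colors(3), legend.text = TRUE,
        las = 2, cex.names = 0.6, ylab = "Stories")

# The same counts as one line per newspaper
matplot(t(counts), type = "l", lty = 1:3, col = "black",
        xaxt = "n", xlab = "", ylab = "Stories")
axis(1, at = seq_len(ncol(counts)), labels = colnames(counts),
     las = 2, cex.axis = 0.6)
legend("topright", legend = rownames(counts), lty = 1:3, bty = "n")
```

Changing `breaks` to "month" or "year" gives the other intervals mentioned above.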


[R] Comparing a 4-point and 5-point Likert scale

2010-06-03 Thread Simon Kiss


Help with survey data:
Hello R colleagues,
I hope this is an appropriate place to direct this question.  It
relates specifically to the comparability of a 5-point Likert to a
4-point Likert scale.


One question in my dataset asks "How much should be done to reduce the  
gap between rich and poor"

Much more, somewhat more, about the same, somewhat less and much less.

The second question asks:
"People who can afford to, should be able to pay for their own health  
care"

strongly agree, agree, disagree, strongly disagree.

Now, assuming that I rescale them so that 1 equals the most  
egalitarian position and the highest number (4 or 5) equals the least  
egalitarian position, how can I make these two results comparable?


Two ways come to mind: one is to collapse both into a dichotomous  
variable and do a logistic regression on both. The danger here is that  
I have to decide what to do with the middle position in the first  
question, assign it to the egalitarian or non-egalitarian category.
A second way would be to multiply the scores in the first question by  
4 (to get results that are either 4, 8, 12, 16 or 20) and then  
multiply the second question by five to get responses that are either  
5, 10, 15 or 20. My idea is then to add the two, average them and use  
that value as an index of economic egalitarianism?
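
To make the arithmetic of the second idea concrete, here is a small sketch with made-up responses. It also shows, purely as a point of comparison and not as a recommendation from anyone on the list, a min-max rescaling that puts both items on a 0 to 1 range. The vectors `gap` and `health` and the helper `rescale01` are hypothetical.

```r
# Hypothetical responses, already coded so that 1 = most egalitarian
gap    <- c(1, 3, 5, 2, 4)   # 5-point item
health <- c(1, 2, 4, 3, 4)   # 4-point item

# The multiplication idea: both items now top out at 20
gap_x4    <- gap * 4          # possible values 4, 8, 12, 16, 20
health_x5 <- health * 5       # possible values 5, 10, 15, 20
index_mult <- (gap_x4 + health_x5) / 2

# Note that the minimums still differ (4 versus 5), so the two items
# do not cover the same range after multiplying.

# Alternative for comparison: rescale each item to run from 0 to 1
rescale01 <- function(x, k) (x - 1) / (k - 1)
index_01 <- (rescale01(gap, 5) + rescale01(health, 4)) / 2

cbind(gap, health, index_mult, index_01)
```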

Yes / no? Suggestions?
I am an R user and I hope that a purely statistical question is not  
especially misplaced.

Yours truly,
Simon Kiss

[R] help calculating variable based on factor level of another

2010-05-27 Thread Simon Kiss


Dear colleagues,

I want to calculate the value of x2 based on the value of x1.  x1 is a
factor with three separate levels. I want to make sure that missing
values remain as NA in x2, but non-missing values take on a value of
either 0 or 1 depending on the value in x1.

This is the code I'm working with...Can any one help?
I've seen some other requests on a topic like this, but not using  
factors with strings as levels; only with numeric variables.

Simon

x1<-factor(levels="social and cultural specialists", "labour",
"salariat")

x2<-if(x1==c("social and cultural specialists")) "1" elseif (x1==NA)
"NA" else "0"
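
A minimal sketch of one way to do this, assuming a small made-up factor with the three levels named above: a comparison with `==` already returns NA for missing values, and ifelse() works element by element, so no separate NA branch is needed.

```r
# Hypothetical factor with the three levels and one missing value
x1 <- factor(c("social and cultural specialists", "labour", NA, "salariat"),
             levels = c("social and cultural specialists", "labour", "salariat"))

# 1 for the level of interest, 0 for the other levels, NA stays NA
x2 <- ifelse(x1 == "social and cultural specialists", 1, 0)
x2
# 1 0 NA 0
```

Testing for missing values with `x1==NA` does not work because any comparison with NA is itself NA; is.na(x1) is the usual test.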

[R] Compressed values on y-axis in effects plot

2010-05-20 Thread Simon Kiss

Dear colleagues, the code below generated the two effects plots that I  
have attached. I hope they are not stripped.


The original two models are as follows:
green_shift_mod=glm(green_shift ~ educ+party_id+educ:party_id,  
family=binomial, data=x)
carbon_tax_mod=glm(carbon_tax ~ educ+party_id+educ:party_id,  
family=binomial, data=x)


Then, I try to plot the effects of party_id by education for both models.

It works well for carbon_tax_mod; but for green_shift_mod, the effects
package plots the effects of party ID by education as a straight,
horizontal line, with the values completely compressed.
I've looked through; all the variables included in the two models are
identical save for the DV. And the DVs in both models are ordered
factors.

Is anyone familiar with this problem in effects plots?
Yours, Simon Kiss

quartz()
jpeg(filename="test.jpeg", type=c("quartz"))
plot(effect("educ:party_id", green_shift_mod, rug=TRUE),
ylab="Probability of Disagreeing", xlab="Party ID",
main="Probability of Disagreeing That The Green Shift Would Hurt The Economy")

dev.off()
quartz()
jpeg(filename="test2.jpeg", type=c("quartz"))
plot(effect("educ:party_id", carbon_tax_mod, rug=TRUE),
ylab="Probability of Disagreeing", xlab="Party ID",
main="Probability of Disagreeing That The Carbon Tax Would Hurt The Economy")

dev.off()
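
One thing that may be worth checking, offered only as a possible cause and not as a diagnosis: when the response in a binomial glm is a factor, R treats the first level as failure and all other levels as success. If a DV has more than two levels, or almost nobody falls in the first level, the fitted probabilities can all sit close to 1 and an effects plot will look flat. The sketch below illustrates this with a simulated three-level ordered factor; the variable `y` and its levels are made up.

```r
# Simulated ordered DV with three levels, very few cases in the first level
set.seed(1)
y <- factor(sample(c("agree", "neutral", "disagree"), 200, replace = TRUE,
                   prob = c(0.05, 0.45, 0.50)),
            levels = c("agree", "neutral", "disagree"), ordered = TRUE)

# How the responses are distributed across the levels
table(y)

# Intercept-only binomial model on the factor response
fit <- glm(y ~ 1, family = binomial)
p_hat <- unname(fitted(fit)[1])

# Share of cases that are NOT in the first level
share_not_first <- mean(y != levels(y)[1])

# The two numbers agree: the model is predicting "anything but the first level"
c(p_hat = p_hat, share_not_first = share_not_first)
```

Running table() and levels() on green_shift and carbon_tax would show whether the two DVs differ in this respect.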




Re: [R] Finding different hues for a mosaic plot compatible with grayscale printing

2010-05-13 Thread Simon Kiss

Dear Colleagues,
Thanks for that, Jim, but it strikes me that printing the residual
values in the cells might be a simpler way of communicating the
direction of each cell.
I can get the residuals printed via the labeling_values commands in
mosaic, but I cannot seem to *combine* this with labeling_borders
commands that I'd like to use to modify the rotation, font size and
contents of variable names and labels.

The following mosaic command draws the plot with the labeling I'd like.

>mosaic(~social_class+ctax_agg_scaled, pop=FALSE, shade=TRUE,
main="The Liberals Carbon Tax Or Green Shift Would Hurt The Canadian Economy By EGP Class Category",
main_gp=gpar(fontsize=16),
gp=shading_hcl(CST21$observed, CST21$expected, ASR21, df=12,
h=c(260,0), c=c(100,0), l=c(90,50), interpolate=c(1,2,3,4)),
labeling_args=list(labels=TRUE, rot_labels=c(25,0,0,25),
gp_labels=gpar(fontsize=7), just_labels="center",
offset_labels=c(1,0,0,4), offset_varnames=c(2,0,0,4),
set_varnames=c(ctax_agg_scaled="The Liberal Green Shift Or Carbon Tax Would Hurt The Canadian Economy",
social_class="EGP Class Category")))

And when I take out the labeling_borders commands and insert the  
following,

>labeling=labeling_values(value_type=c("residuals"), suppress=0)

then I do get the residuals printed, but the labels are unattractive.

How do I combine labeling_borders and labeling_values commands in one
command?

Yours, Simon Kiss   
On 12-May-10, at 2:42 PM, Jim Lemon wrote:

On 05/12/2010 07:34 PM, Simon Kiss wrote:

I'm working with the following code below to generate a mosaic plot.
How do I set the h, c, and l values such that the significant, positive
residuals appear different on a grayscale printer from significant,
negative residuals? The challenge as I see it is that one can only
distinguish the positive and negative residuals by hue. Varying
the chroma and the luminance only affects the distinctions between
large and small and significant and non-significant. But my positive
and negative residuals are both large (absolutely) and significant,
meaning that they will have the same chroma and luminance, but
different hues.

I guess the key here is to find two separate hue values that appear
substantially different *on a grayscale printer* at the same chroma
and luminance. I have read through Zeileis et al. (2007, 2008) but
can't quite find the answer there.
I have also tried the Friendly shading to vary the line type, but I
can't find line types that are different enough to communicate the
difference between positive and negative residuals clearly.

Your assistance is appreciated.

>mosaic(~educ+trade_off_scaled, shade=TRUE,
main="Support For Environmental Protection At The Expense of Creating Jobs By Education",
gp=shading_hcl(CST17$observed, CST17$expected, ASR17, df=6,
h=c(260,0), c=c(100,0), l=c(90,0)),
labeling_args=list(rot_labels=c(25,90,0,0),
offset_labels=c(1,0,0,2), offset_varnames=c(2,0,0,4),
set_varnames=c(trade_off_scaled="Protecting The Environment Is More Important Than Creating Jobs",
educ="Level of Education")))

Hi Simon,
I thought that the symbolbox function might do something useful, but  
it required a bit of modification. The attached mod allows the user  
to fill a rectangle with symbols, which includes things like "+" and  
"-".

Jim


[R] Finding different hues for a mosaic plot compatible with grayscale printing

2010-05-12 Thread Simon Kiss


I'm working with the following code below to generate a mosaic plot.
How do I set the h, c, and l values such that the significant, positive
residuals appear different on a grayscale printer from significant,
negative residuals?  The challenge as I see it is that one can only
distinguish the positive and negative residuals by hue. Varying
the chroma and the luminance only affects the distinctions between
large and small and significant and non-significant.  But my positive
and negative residuals are both large (absolutely) and significant,
meaning that they will have the same chroma and luminance, but
different hues.
I guess the key here is to find two separate hue values that appear  
substantially different *on a grayscale printer* at the same chroma  
and luminance. I have read through Zeileis et al. (2007, 2008) but  
can't quite find the answer there.
I have also tried the Friendly shading to vary the line type, but I  
can't find line types that are different enough to communicate the  
difference between positive and negative residuals clearly.


Your assistance is appreciated.

>mosaic(~educ+trade_off_scaled, shade=TRUE,
main="Support For Environmental Protection At The Expense of Creating Jobs By Education",
gp=shading_hcl(CST17$observed, CST17$expected, ASR17, df=6,
h=c(260,0), c=c(100,0), l=c(90,0)),
labeling_args=list(rot_labels=c(25,90,0,0),
offset_labels=c(1,0,0,2), offset_varnames=c(2,0,0,4),
set_varnames=c(trade_off_scaled="Protecting The Environment Is More Important Than Creating Jobs",
educ="Level of Education")))



[R] NULL variable read in from SPS

2010-04-27 Thread Simon Kiss


Hello all,
I'm having difficulty getting one particular variable into R from SPSS  
v. 16.0 for mac.  R version is 2.10.1.  I saved the relevant variables  
from SPSS into a .csv file and then read them into R.  All the  
variables worked fine, except for one (enviro_spending). In the SPSS  
file it is correctly coded as a nominal variable and there is nothing  
that I can tell that distinguishes it from the others.


I have tried to include a good representation of reproducible code
below along with the results I am obtaining.


Yours, Simon

The variables are as follows:

educ =c("university", "university")
trade_off =c("*this cell is blank*", "disagree")
age=c(45,43)
gender_1=c("female", "female")
enviro_spending=c("Less/Same", "Less/Same")
carbon_tax_agg=c("agree", "disagree")
y=data.frame(educ, trade_off, age, gender_1, enviro_spending,
carbon_tax_agg)


#The following are the original commands I used to read the .csv file into R.

y=read.csv(file.choose(), header=TRUE)

#When I do the following, all the variable names are correct

names(y)

#When I do the following, all the data in the dataframe are correct.
y

#But when I do the following, I get the following results
y$enviro_spending
#NULL
is.character(y$enviro_spending)
#FALSE
is.factor(y$enviro_spending)
#FALSE

#I tried to save the single problematic variable from my SPSS file to
#a .csv file and then read that into R.

z=read.csv(file.choose(), header=TRUE)
#Just as before, calling the dataframe gives the data exactly as it should

z
#less/same
#more
#less/same
#more
#more

#But when I call the specific variable, I get #NULL
z$enviro_spending
#NULL
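
A NULL from `$` usually means that no column carries that name as R sees it, even if the printed header looks right. The sketch below shows one way to check; the CSV text and the misspelled header in it are invented for illustration and are only a guess at the kind of mismatch involved.

```r
# A small CSV whose header is subtly different from the name being requested
csv_text <- "educ,eviro_spending\nuniversity,Less/Same\nuniversity,More\n"
y <- read.csv(text = csv_text, header = TRUE)

# The requested name does not match any column, so the result is NULL
before <- y$enviro_spending
before

# Check for an exact match and print the names unambiguously
"enviro_spending" %in% names(y)
dput(names(y))

# Find the column by a fragment of its name and rename it
hit <- grep("spending", names(y))
names(y)[hit] <- "enviro_spending"
y$enviro_spending
```

dput(names(y)) is useful here because it also reveals stray spaces or odd characters that a normal print can hide.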

[R] Problem with recode -Error in parse(text = range[[1]][1]) : unexpected end of input in " c(0"

2010-04-14 Thread Simon Kiss


Dear colleagues,
in the help archive there was a previous person who encountered a  
problem with the "recode" command in the car library. I'm not sure if  
that was solved, there was no posting to that effect, but I'm having  
the same problem.


I'm trying to recode a numeric variable with values from 0-100 into a  
binary variable with values (0,1).


The following command:

recode(green_2004_2$french, "c(50:100)=0; c(0:49.99)=1")

gets the following error message

Error in parse(text = range[[1]][1]) : unexpected end of input in " c(0"

I tried it with a second numerical variable in the same data set, but  
get precisely the same error at precisely the same location in the  
command, i.e. the second colon.
As far as I can tell I have the most up-to-date version of car  
installed.

Any suggestions?
Yours, Simon Kiss
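
As far as I can tell from the car documentation, ranges in a recode specification are written as lo:hi without wrapping them in c(), for example "50:100=0; 0:49.99=1"; inside c() the colon appears to be read as the range separator, which would split the text at " c(0" as in the error message. That reading is an assumption on my part. A base R sketch that sidesteps the issue, using a made-up vector `french`:

```r
# Hypothetical percentages between 0 and 100, with one missing value
french <- c(0, 12.5, 49.99, 50, 75, 100, NA)

# 1 for values below 50, 0 for values of 50 and above; NA stays NA
french_bin <- ifelse(french < 50, 1, 0)
french_bin
# 1 1 1 0 0 0 NA
```

Unlike a pair of ranges, the comparison also covers values that fall between 49.99 and 50.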


[R] Summing a series made up of part of a vector

2010-04-07 Thread Simon Kiss


Dear colleagues,
I have a data frame that looks like this:
x1  x4
1   4.2
2   3.6
3   2.7
.
.
308 n.a.

x4 is a vector of percentages, sorted in descending value. I would  
like to create a new variable that represents the sum of the series of  
values of x4 to that row.  So I would like x5 to look like this.


x5
1 4.2
2 7.8 (4.2 +3.6)
3 10.5 (4.2+3.6+2.7)

308 =na

So the last number in the vector x5 should be 100, as these are all  
percentages.
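
A running total of this kind is what cumsum() computes. A minimal sketch with the first three values from the table above and a missing value at the end; the data frame name `dat` is made up.

```r
# First three rows from the example plus a missing value in the last row
dat <- data.frame(x1 = 1:4, x4 = c(4.2, 3.6, 2.7, NA))

# Running sum of x4 down the rows
dat$x5 <- cumsum(dat$x4)
dat$x5
# 4.2 7.8 10.5 NA
```

cumsum() returns NA from the first missing value onward, so this matches the desired output only if the missing values are at the end of the sorted vector.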


Any suggestions? Yours truly,
Simon Kiss

[R] row names in regression results and saving the identification results from added variable plots

2010-03-26 Thread Simon Kiss


Hello all,
Is there a way to take the row names from my data.frame and have them  
imported to the regression results?

At the moment, my original data frame looks like this:
/ Riding name / Turnout / Margin / Expenditures
1 / Abbotsford
2 / .
3 / .
4 / .Willow

I know how to set the row names for the original data frame to be the
Riding name, but when I run the regression, the residuals, dfbetas and
Cook's d all lose those and are listed with the original row number.
This does not pose a significant problem when I'm just looking at
residuals and dfbetas, because I've figured out how to match up the
row names to those variables.


But it is posing a bit of a problem now that I'm looking at added  
variable plots; the calculations are more difficult to match up the  
results to the row names.
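
For what it is worth, in a small test lm() does carry the row names through, provided they are set on the data frame before the model is fitted. The sketch below uses invented ridings and numbers (only Abbotsford comes from the table above); the names `ridings` and `fit` are assumptions.

```r
# Hypothetical data: riding names and made-up values
ridings <- data.frame(
  riding       = c("Abbotsford", "Burnaby", "Calgary Centre",
                   "Davenport", "Edmonton East", "Willowdale"),
  turnout      = c(61.2, 58.4, 55.1, 63.7, 52.9, 60.3),
  margin       = c(12.1, 3.4, 20.5, 8.8, 15.2, 5.6),
  expenditures = c(71, 85, 64, 90, 58, 80)
)

# Set the row names BEFORE fitting the model
rownames(ridings) <- ridings$riding
fit <- lm(turnout ~ margin + expenditures, data = ridings)

# Residuals, dfbetas and Cook's distances are now labelled by riding
names(residuals(fit))
rownames(dfbetas(fit))
names(cooks.distance(fit))
```

If the row names are assigned only after the model has been fitted, the model still holds the old row numbers and has to be refitted.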


As a second question, I have figured out how to identify the points in
added variable plots: av.plots(model_name,
labels=names(residuals(model_name)), identify.points=TRUE)


However, when I'm finished identifying points, the results are not  
saved. I'm not sure if I can use the "identify" command with the  
av.plots command in (car) as you can with other standard plots because  
av.plots brings up an interactive menu that does not appear to allow  
for that.


If any one can help, it would be appreciated!


Yours,
Simon Kiss

[R] R Full Screen

2010-03-24 Thread Simon Kiss


Hello all,
I'm a new user of R and just completed a five-day course on the
program. Somehow, a few basic questions remain unanswered. I'm working
on a Mac OS X system and have my laptop connected to a large,
flat-screen monitor. I can't make any of the Quartz windows fill the
monitor's screen; I'd like to make them full screen to identify points
in a dense scatterplot.

Thank you for any suggestions. Yours, Simon Kiss
