[R] Sample of a subsample
Hello everybody!

I have the following problem: I'd like to select a sample from a subsample in a dataset. Actually, I don't want to select it, but to create a new variable sampleNo that indicates which sample (one or two) a case belongs to.

Let's suppose I have a dataset containing 40 cases:

    data <- data.frame(var1=seq(1:40), var2=seq(40,1))

The first sample (n=10) I drew like this:

    data$sampleNo <- 0
    idx <- sample(seq(1,nrow(data)), size=10, replace=F)
    data[idx,]$sampleNo <- 1

Now (and here my problems start) I'd like to draw a second sample (n=10). But this sample should be drawn only from the cases that don't belong to the first sample. *Additionally, "var1" should be an even number.*

So sampleNo should be 0 for cases that were not drawn at all, 1 for cases that belong to the first sample and 2 for cases belonging to the second sample (drawn from the cases where sampleNo equals 0 and var1 is even).

I was trying to solve it like this:

    idx2<-data$var1%%2 & data$sampleNo==0
    sample(data[idx2,], size=10, replace=F)

But how can I set sampleNo to 2?

Thank you very much for your help!
David

[[alternative HTML version deleted]]
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
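One possible way to set sampleNo to 2, as a minimal sketch rather than a definitive answer. Two things differ from the attempt in the question: `data$var1 %% 2` is TRUE for odd values, so the test for even numbers needs `== 0`, and `sample()` applied to a data frame samples its columns, so the sketch samples row indices instead. The seed and the name `candidates` are assumptions made only for this illustration.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible

data <- data.frame(var1 = 1:40, var2 = 40:1)

# first sample: 10 rows out of all 40
data$sampleNo <- 0
idx1 <- sample(nrow(data), size = 10)
data$sampleNo[idx1] <- 1

# candidate rows for the second sample: not drawn yet and var1 even
candidates <- which(data$sampleNo == 0 & data$var1 %% 2 == 0)

# draw positions within 'candidates' (safe even if only one candidate is left)
idx2 <- candidates[sample.int(length(candidates), size = 10)]
data$sampleNo[idx2] <- 2

table(data$sampleNo)
```

With 20 even values and at most 10 of them used by the first sample, at least 10 candidates always remain for the second draw.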
[R] calculate factor scores
Hello everybody,

I have a problem regarding factor analysis: As I am using the hetcor() function from the polycor package in order to calculate different kinds of correlation coefficients automatically, I cannot obtain factor scores using fit$scores. The problem is that I am using the fa() function with a correlation table (structural level) instead of raw data.

Can anyone help me in calculating factor scores ex post?

Thank you for any hints!
David

-- Here's the code:

    # select variables
    df<-data[c("var1", "var2", "var3", "var4", "var5", "var6")]

    # compute heterogeneous correlation matrix
    library(polycor)
    hetmat<-hetcor(df)$cor

    # factor analysis
    library(psych)
    fa.parallel(hetmat) # number of factors?
    fit<-fa(hetmat, nfactors=4, fm="ml", rotate="varimax") # factor analysis
    colnames(fit$loadings)<-c("Factor1","Factor2","Factor3","Factor4")
    fit$scores???
[R] aggregating variables (sum within groups)
Hello everybody!

I have a (probably very easy) problem. Even though I looked in several R books, I could not find a suitable function for this problem; that's why I hope that someone here can help me:

    # Sample data:
    group <- c("A","A","A","B","B","C","C","C")
    var1 <- c(1,0,0,1,1,0,NA,1)
    var2 <- c(0,1,NA,0,1,1,0,0)
    testdata <- data.frame(group, var1, var2)

Now, I'd like to generate two aggregated variables:

    testdata$x <- ???   # should count the sum of var1 within each group (=4)
    testdata$y <- ???   # should count the sum of var2 within each group (=3)

Therefore I am looking for a function like ave() which does not calculate the mean value but a sum.

Thank you for any hints!
David
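A minimal sketch of one way to get group sums with ave() itself: its FUN argument accepts any function, not only the default mean. The helper name `sum_na` is an assumption for this illustration; it ignores the NA values in the sample data.

```r
group <- c("A", "A", "A", "B", "B", "C", "C", "C")
var1 <- c(1, 0, 0, 1, 1, 0, NA, 1)
var2 <- c(0, 1, NA, 0, 1, 1, 0, 0)
testdata <- data.frame(group, var1, var2)

# hypothetical helper: sum that skips missing values
sum_na <- function(v) sum(v, na.rm = TRUE)

# ave() returns one value per row, repeated within each group
testdata$x <- ave(testdata$var1, testdata$group, FUN = sum_na)
testdata$y <- ave(testdata$var2, testdata$group, FUN = sum_na)

testdata
```

Note that FUN has to be passed by name, because every unnamed argument after the first is treated as a grouping variable.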
[R] factor levels numeric values
Hi everybody,

I have another question (to which I could not find an answer in my R books). I am sure it's not a great issue, but I simply lack a good idea how to solve this:

One of my variables gets imported as a factor instead of a numeric variable. Now I have a...

    Factor w/ 63 levels "0","0.02","0.03",..: 1 NA NA 1 NA NA 1 1 53 10 ...

How can I transform these factor levels into actual values?

Thank you very much for any help!
David
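A short sketch of the usual conversion, using a small made-up factor in place of the 63-level variable. Calling as.numeric() directly on a factor returns the internal level codes (1, 2, 3, ...), so the levels have to be converted instead.

```r
# hypothetical stand-in for the imported variable
f <- factor(c("0", NA, "0.02", "0.03", "0"))

# option 1: go through the character representation
num_a <- as.numeric(as.character(f))

# option 2: convert the levels once, then index them by the factor codes
num_b <- as.numeric(levels(f))[f]

num_a
```

If the import produced a factor because of stray text in the column, the non-numeric entries become NA with a warning during this step.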
[R] Counting within groups / means by groups
Hi everyone!

I have problems finding a solution to the following two problems. My sample data frame consists of two variables, group and value:

    group <- c("A", "A", "A", "B", "B", "B", "B", "C")
    value <- c(1,3,2,2,2,4,4,1)
    df <- as.data.frame(cbind(group, value))

Problem 1:
Now I'd like to count the number of group-A cases, group-B cases etc. and write this number into a new column. It should be like:

    count_group <- c(3, 3, 3, 4, 4, 4, 4, 1)

Problem 2:
I'd like to add a new column with the mean values (or any other function) within my groups. E.g.:

    Group A: (1+3+2)/3 = 2
    Group B: (2+2+4+4)/4 = 3
    Group C: = 1

Now I'd add another column:

    2 2 2 3 3 3 3 1

Can anyone help me with how this can be done best?

Thank you!
David
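Both columns can be sketched with ave(), which applies a function within groups and returns one value per row. The sketch builds the data frame with data.frame() directly, because cbind() on a character and a numeric vector turns value into text.

```r
df <- data.frame(
  group = c("A", "A", "A", "B", "B", "B", "B", "C"),
  value = c(1, 3, 2, 2, 2, 4, 4, 1)
)

# Problem 1: number of cases per group, repeated on every row of the group
df$count_group <- ave(df$value, df$group, FUN = length)

# Problem 2: group mean (swap 'mean' for any other summary function)
df$mean_group <- ave(df$value, df$group, FUN = mean)

df
```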
[R] Crime hotspot maps (kernel density)
Hi everybody,

does any of you know how to create a (crime) hotspot map using R? Are there any packages, or do you know any resources?

It should be something like this:
http://www.caliper.com/Maptitude/Crime/MotorVehicleTheft2.png
(but it doesn't necessarily have to be a map)

Many thanks,
David
[R] reduce three columns to one with the colnames
Hello everybody,

I have three variables blue, green and red containing the values 0 (no) and 1 (yes). How can I easily create another variable colors with the values blue, green and red?

I hope that you can understand my question and appreciate any solutions or hints!

Thank you!
David
Re: [R] reduce three columns to one with the colnames
OK, seems like nobody understood my question ;-) Let's make another example:

I have three variables: data$male, data$female and data$transsexuals. All three of them contain the values 0 and 1. Now I'd like to create another variable data$sex. In all cases where data$female==1 the variable data$sex should be set to 'female', in all cases where data$male==1 the variable data$sex should be set to 'male', and so on...

Thank you!
David

2013/5/13 Bert Gunter gunter.ber...@gene.com

No -- my answer is wrong. I'll leave it to others to correct. Obvious question to OP: What if more than one of your colors variables simultaneously have a 1?

-- Bert

On Mon, May 13, 2013 at 8:09 AM, Bert Gunter bgun...@gene.com wrote:

Cute answer, Pascal. It may even be the answer to the question the OP should have asked, but I don't think it answered the question that was asked. That might be:

    c(red[red], green[green], blue[blue])

Cheers,
Bert

On Mon, May 13, 2013 at 7:36 AM, Pascal Oettli kri...@ymail.com wrote:

Hi,

?rgb

HTH
Pascal

2013/5/13 David Studer stude...@gmail.com

Hello everybody, I have three variables blue, green and red containing values 0 (no) and 1 (yes). How can I easily create another variable colors with the values blue, green and red? I hope that you can understand my question and appreciate any solutions or hints! Thank you! David
-- Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
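A minimal sketch of one reading of the question: pick, for each row, the name of the column that holds the 1. The example data are made up, and the handling of rows with several 1s or no 1 at all (the point raised in the thread) is an assumption: such rows are set to NA here.

```r
# hypothetical data: rows 4 and 5 are the ambiguous / empty cases
data <- data.frame(
  blue  = c(1, 0, 0, 1, 0),
  green = c(0, 1, 0, 1, 0),
  red   = c(0, 0, 1, 0, 0)
)

cols <- c("blue", "green", "red")
m <- as.matrix(data[cols])

data$colors <- NA_character_

# rows with exactly one indicator set to 1
one <- rowSums(m == 1) == 1

# max.col() gives the column position of the row maximum, i.e. of the single 1
data$colors[one] <- cols[max.col(m[one, , drop = FALSE], ties.method = "first")]

data
```

The same pattern would apply to the male/female example by changing `cols`.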
[R] Color palettes for black/white printing
Hi everybody!

Does anyone know a good way to color my images so that the colors used can be distinguished well when I print them on a non-color printer? As I have many categories, I would not want to assign the colors c("black", "grey", "white") by hand.

Thank you!
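One option in base R is gray.colors(), which generates n evenly spread gray levels. A small sketch, where the number of categories and the start/end values are assumptions chosen for illustration:

```r
n <- 6  # hypothetical number of categories

# gray levels from fairly dark (0.1) to fairly light (0.9)
cols <- gray.colors(n, start = 0.1, end = 0.9)

# quick visual check of how well the levels can be told apart
barplot(rep(1, n), col = cols, names.arg = paste0("cat", seq_len(n)))
```

With many categories neighbouring grays get hard to distinguish, so combining a few gray levels with hatching (the density and angle arguments of barplot()) may work better in print.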
[R] urgent: question concerning data manipulation
Hello everyone!

Does any of you know how I could solve the following problem? I guess it is not a very difficult question, but I simply lack the right idea:

I have a dataset containing data on convictions. This dataset contains three columns:
- personId: individual number that identifies the offender
- law: law which has been violated
- article: article which has been violated

    # Testdata:
    personId <- c(1,1,2,2,2,2,2,3,4,4)
    law <- c("SVG", "SVG", "StGB", "StGB", "SVG", "AuG", "StGB", "SVG", "StGB", "AuG")
    article <- c(10, 10, 123, 122, 10, 40, 126, 10, 111, 40)
    testdata <- data.frame(personId, law, article)

Now I'd like to create three additional dummy-coded columns, one for each law (SVG, StGB, AuG). For each offender (all rows of one offender share the same personId) it should be checked whether there are any violations of the three laws. If there are any violations of SVG (for example), then the column SVG should have the value 1 in all rows of this offender (otherwise 0).

For example, offender 2 has violated the law SVG once, therefore his five entries should have the value 1 in the column SVG.

I hope you can understand my problem. I'd really appreciate any hints and solutions!

Thank you!
David
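A possible sketch: for each law, build a 0/1 indicator per row and spread its maximum over all rows of the same person with ave(). The loop variable `l` is a hypothetical name; `stringsAsFactors = FALSE` is set explicitly so that the law names stay plain text on older R versions as well.

```r
personId <- c(1, 1, 2, 2, 2, 2, 2, 3, 4, 4)
law <- c("SVG", "SVG", "StGB", "StGB", "SVG", "AuG", "StGB", "SVG", "StGB", "AuG")
article <- c(10, 10, 123, 122, 10, 40, 126, 10, 111, 40)
testdata <- data.frame(personId, law, article, stringsAsFactors = FALSE)

for (l in unique(testdata$law)) {
  # 1 if this row violates law 'l', else 0 ...
  hit <- as.integer(testdata$law == l)
  # ... then take the maximum within each offender, so one hit marks all rows
  testdata[[l]] <- ave(hit, testdata$personId, FUN = max)
}

testdata
```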
[R] recoding variables again :(
Hello everybody!

I again have a rather simple question concerning the recoding of variables:

I have a variable/data-frame column BIRTHPLACE containing abbreviations of the 26 Swiss cantons (AG, AI, AR, BE, ZH, ...) as well as international country codes (USA, GER, ESP, etc.) and another variable RESIDENCE_STATUS indicating the residence status (A, B, C, X, Y).

My goal is now to create a new variable VARNEW under the following conditions:
- it should be the RESIDENCE_STATUS
- except:
  - if RESIDENCE_STATUS is X and at the same time BIRTHPLACE is one of the 26 Swiss cantons, then it should be "swiss"
  - otherwise it should be "unknown"

I have already tried the following code:

    mydata$VARNEW <- mydata$RESIDENCE_STATUS   # setting VARNEW as RESIDENCE_STATUS
    idx <- (mydata$RESIDENCE_STATUS=="X" & !(   # TRUE: unknown; FALSE: swiss
      mydata$BIRTHPLACE=="AG" | mydata$BIRTHPLACE=="BE" | mydata$BIRTHPLACE=="AR" ... ) )

and then?

Thank you for any help!
David
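A minimal sketch of how the rule could be written with %in% instead of a long chain of comparisons. The example rows are made up, and `swiss` holds only the five abbreviations named in the question; the remaining cantons would have to be added.

```r
# hypothetical example rows
mydata <- data.frame(
  BIRTHPLACE       = c("AG", "USA", "ZH", "GER", "BE"),
  RESIDENCE_STATUS = c("X", "X", "B", "C", "X"),
  stringsAsFactors = FALSE
)

# assumption: incomplete list, extend to all 26 cantons
swiss <- c("AG", "AI", "AR", "BE", "ZH")

mydata$VARNEW <- mydata$RESIDENCE_STATUS

isX  <- mydata$RESIDENCE_STATUS == "X"
inCH <- mydata$BIRTHPLACE %in% swiss

mydata$VARNEW[isX & inCH]  <- "swiss"
mydata$VARNEW[isX & !inCH] <- "unknown"

mydata
```

If the columns are factors rather than character vectors, converting them with as.character() first avoids assigning values that are not existing levels.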
[R] Recoding variables (without recode() )
Hi everybody!

I have a rather simple question:

    # play data
    persId <- c(1,2,3,1,4,5,2)
    varA <- c(11,12,13,12,14,15,10)
    df <- as.data.frame(cbind(persId, varA))

Now I'd like to create new columns (df$new) according to the value of df$varA. For example, df$new1 should be 1 if df$varA==2, or df$new2 should be 1 if df$varA13.

I tried to do it like this:

    if(df$varA==2) {df$new1 <- 1}

But, obviously, that's not how it works (I might be thinking too much in MySQL: update table set new1=1 where varA==2).

How can I solve this problem using if? I would not want to use recode() as my conditions might become more complicated later on.

Thank you very much!
David
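if() looks at a single condition, while a column holds one condition per row, so the vectorised equivalents are a logical comparison, ifelse(), or logical indexing. A sketch with the play data; the value 12 and the comparison `> 13` are assumptions for illustration, since the exact conditions in the question are not recoverable.

```r
persId <- c(1, 2, 3, 1, 4, 5, 2)
varA <- c(11, 12, 13, 12, 14, 15, 10)
df <- data.frame(persId, varA)

# comparison gives TRUE/FALSE per row; as.integer() turns it into 1/0
df$new1 <- as.integer(df$varA == 12)       # 12 is a hypothetical value

# ifelse() is the row-wise counterpart of if/else
df$new2 <- ifelse(df$varA > 13, 1, 0)      # hypothetical condition

# closest to "update ... set ... where": assign only to the matching rows
df$new3 <- 0
df$new3[df$varA > 13 & df$persId == 4] <- 1

df
```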
[R] missing values are not allowed in subscripted assignments of data frames
Hello everybody!

I am trying to replace community numbers with community names (character). I am using the following code:

    data[data$commNo==786, "commNo"] <- "Name of the Community"

Unfortunately, I get the error message

    missing values are not allowed in subscripted assignments of data frames

However, when I check data$commNo with table(data$commNo, useNA="always") or with table(is.na(data$commNo)), it tells me that there are no NAs at all...?

Can anyone help please?

Thank you very much!
David
[R] importing a SAS syntax-files (value labels)
Hello everybody,

I imported a SAS data file into R. open.sas7bdat() did not work, so I had to convert it to csv first. Now I would like to recode the values into factors. Unfortunately I only have a SAS syntax file, which has this form:

    proc format;
    value $resstatus
      'B'= 'Jahresaufenthalter'
      'C' = 'Niedergelassene'
      'I' = 'Dipl./int. Funkt. und Angehörige'
    ;
    run;

Does anyone know if there is a possibility to change the value labels into factor levels according to the SAS syntax file? I cannot do this manually as there are several hundred labels...

Thank you!
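If the syntax file keeps the simple 'code' = 'label' layout shown above, the pairs could be pulled out with a regular expression and handed to factor(). This is only a sketch under that assumption: the lines are typed in here as a shortened excerpt (in practice they would come from readLines() on the syntax file), the column `x` is made up, and only a single value block is handled.

```r
# shortened excerpt of the syntax file, one element per line
sas <- c(
  "proc format;",
  "value $resstatus",
  "'B'= 'Jahresaufenthalter'",
  "'C' = 'Niedergelassene'",
  ";",
  "run;"
)

# a line of the form  'code' = 'label'
pat <- "^\\s*'([^']*)'\\s*=\\s*'([^']*)'\\s*$"
hit <- grepl(pat, sas)

codes  <- sub(pat, "\\1", sas[hit])
labels <- sub(pat, "\\2", sas[hit])

# hypothetical data column coded with the SAS values
x <- c("B", "C", "C", "B")
f <- factor(x, levels = codes, labels = labels)

table(f)
```

A file with several value blocks would additionally need to be split at each `value` statement so that every variable gets its own set of codes and labels.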
[R] p-values from lm()
Hi everyone!

Can anyone tell me how to obtain p-values from a linear model? Example:

    mod1 <- lm(dV~iV1+iV2)

Now, I can get the coefficients with

    mod1$coef

But how can I get p-values? ($p.values seems to work with cor.test() only)

Thank you!
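The p-values are not stored in the lm object itself but in the coefficient table of its summary. A short sketch with simulated data (the data and the seed are assumptions made only so the example runs):

```r
set.seed(42)  # assumption: simulated data for illustration
iV1 <- rnorm(30)
iV2 <- rnorm(30)
dV  <- 1 + 2 * iV1 + rnorm(30)

mod1 <- lm(dV ~ iV1 + iV2)

# matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(mod1)$coefficients

# p-values of the individual coefficients
pvals <- coefs[, "Pr(>|t|)"]
pvals
```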
[R] divide factor in n equal groups?
Could anyone please tell me what the most elegant way is to divide an ordinal variable into equal groups (as cut() does with continuous variables)? For example, I'd like to have the factor "educational level" in three groups: low, medium and high.

Thank you!
David
[R] regression methods for rare events?
Hi everybody!

I have a sample with n=2,000. This sample contains rare events (10, 20, 30 individuals with a specific illness). Now I'd like to do a logistic regression in order to identify risk factors. I have several independent variables on an interval scale.

Does anyone know whether the number of these rare events is sufficient to calculate a multivariate logistic regression? Or are there any alternative models I should use (which are available in R)?

Thank you very much for any advice!
David
[R] inter-item-correlation-table
Hi everybody!

Does anyone know how to obtain an inter-item correlation table (with p-values or significance levels), as SPSS does, either Spearman or Pearson? Repeatedly using cor.test() is pretty exhausting as the table size increases...

Thank you!
David
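The repeated cor.test() calls can be wrapped in a loop over all pairs of items, which gives a correlation matrix and a matching matrix of p-values. A base-R sketch; the data frame `items` is simulated and stands in for the real item columns.

```r
set.seed(1)  # assumption: simulated items for illustration
items <- data.frame(i1 = rnorm(50), i2 = rnorm(50), i3 = rnorm(50))

k <- ncol(items)
r <- cor(items, method = "pearson")   # correlation table

# matching table of p-values, filled pair by pair
p <- matrix(NA_real_, k, k, dimnames = dimnames(r))
for (i in seq_len(k - 1)) {
  for (j in (i + 1):k) {
    ct <- cor.test(items[[i]], items[[j]], method = "pearson")
    p[i, j] <- p[j, i] <- ct$p.value
  }
}

round(r, 2)
round(p, 3)
```

Replacing "pearson" by "spearman" in both calls gives the rank-based version; with tied values cor.test() then warns that exact p-values cannot be computed.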
[R] y-axis-problem (barplots)
Hi everybody!

I would like to plot a barplot, but, unfortunately, when I change the y-axis limits the bars no longer start at the lower limit but extend below the axis:

    # Data (just a short example):
    a <- c(1.61, 2.1)
    b <- c(1.5, 1.9)
    c <- c(1.85, 2.2)
    d <- c(1.63, 2.3)
    x <- rbind(a,b,c,d)
    colnames(x) <- c("var1","var2")

    # Problem:
    barplot(x[,1])
    barplot(x[,1], ylim=c(1,2))   # Bars do not start at the axis

Could anyone help me?

Thank you!
David
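barplot() always draws bars from 0, and by default it does not clip them to the plot region, which is why they stick out below a y-axis that starts at 1. One way around it, sketched with the first column of the example data, is to switch the clipping on with xpd = FALSE:

```r
x1 <- c(a = 1.61, b = 1.5, c = 1.85, d = 1.63)

# xpd = FALSE clips the bars at the limits given in ylim
mids <- barplot(x1, ylim = c(1, 2), xpd = FALSE)

# optional frame so the cut-off lower edge is clearly visible
box()
```

A truncated axis makes differences between bars look larger than they are, so it may be worth marking the break in the axis or the caption.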
[R] Transform dataframe
Hi everyone!

I have the following question: I have three items that had to be ranked (e.g. three persons put var1 in first rank):

    var1 var2 var3
       1    2    3
       2    1    3
       1    3    2
       1    2    3

Now I'd like to have the data.frame the other way round, so that the ranks are in the columns:

    rank1 rank2 rank3
    var1  var2  var3
    var2  var1  var3
    var1  var3  var2
    var1  var2  var3

Can anyone help me achieve this?

    # code:
    var1 <- c(1,2,1,1)
    var2 <- c(2,1,3,2)
    var3 <- c(3,3,2,3)
    df <- as.data.frame(cbind(var1,var2,var3))
    ??

Thank you very much!
David
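A sketch of one way to flip the table: within each row, order() gives the positions of the items sorted by rank, and indexing the column names with it yields the item in rank 1, rank 2 and so on. The name `byRank` is an assumption; the sketch also assumes every row contains each rank exactly once.

```r
df <- data.frame(
  var1 = c(1, 2, 1, 1),
  var2 = c(2, 1, 3, 2),
  var3 = c(3, 3, 2, 3)
)

# for each row: column names sorted by the rank they received
byRank <- t(apply(df, 1, function(r) names(df)[order(r)]))
colnames(byRank) <- paste0("rank", seq_len(ncol(df)))
byRank <- as.data.frame(byRank, stringsAsFactors = FALSE)

byRank
```

The same lines work unchanged for more items, for example six ranked columns p1 to p6.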
[R] ggplot2-Problem (plot different variables)
Hi everyone!

I have the following difficulties using ggplot2:

    # My Data
    data <- as.data.frame(cbind(a=c(1,1,2,2,2,2,3,3,4), b=c(1,2,3,3,4,4,4,4,4)))

I would like to plot the frequency distributions of both variables in one plot as lines. For both variables the values (1-4) should be on the x-axis and the frequency on the y-axis.

I have already found out that this should work (somehow) using stat_summary(). Can anyone help me?

    library(ggplot2)
    p <- ggplot(data=data, aes(a,b))
    p + stat_summary(??)

Thank you very much for your help!
David
[R] Recode Variable
Hello everybody,

I know this is pretty basic stuff, but could anyone explain to me how to recode a single value of a variable into a missing value? I used to do it like this:

    myData[myData$var1==5, "var1"] <- NA   # recode value 5 into NA

But the column var1 already contains NAs, which results in the following error message:

    missing values are not allowed in subscripted assignments of data frames

Thank you very much for any advice!
David
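The error arises because `myData$var1 == 5` is itself NA wherever var1 is NA, and a data frame refuses NA in a row index during assignment. A small sketch of two ways around it, using made-up values: which() drops the NA positions, and %in% never returns NA.

```r
myData <- data.frame(var1 = c(1, 5, NA, 3, 5))

# which() keeps only the positions where the comparison is TRUE
myData$var1[which(myData$var1 == 5)] <- NA

# equivalent alternative on a second, hypothetical column
myData$var2 <- c(5, 2, NA, 5, 4)
myData[myData$var2 %in% 5, "var2"] <- NA

myData
```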
[R] Multiple line-plot
Hello everybody!

I have yet another newbie question. I was trying to plot three curves within one single plot: crime development (relative frequencies) according to the hours of TV consumption per week (high/low/all together). Here are the data:

    par(mfrow=c(1,1))

    # Data input
    tvHrs <- c(21,22,23,24,25,26,27,28,21,22,23,24,25,26,27,28,2,3,4,5,6,7,8,9,10,11,12,13,14)
    crimeDvp <- c(2,2,2,2,2,3,3,3,3,3,3,3,4,4,5,5,2,2,3,3,3,3,3,4,4,4,4,5,5)
    crimeDvp <- factor(crimeDvp, levels=1:5,
                       labels=c("strongly\nincreased","increased","equal","decreased","strongly\ndecreased"),
                       ordered=T)
    data <- data.frame(tvHrs, crimeDvp)

    # Plotting lines
    plot(prop.table(table(crimeDvp)), type='b', ylab='percent', xlab='crime development')
    legend("topright", inset=.05, title="TV consume", c("high","low","all"), fill=c("red","black","green"))

I have experimented with the lines() function, but couldn't do it.

Thank you for any hints!
David
[R] Multiple linear Regression: Standardized Coefficients
Hello everybody,

Can anyone tell me how to obtain standardized regression coefficients (betas) for my independent variables when doing a multiple linear regression?

    height <- c(180,160,150,170,190,172)
    sex <- c(1,2,2,1,1,2)
    age <- c(40,20,30,40,20,25)
    fit <- lm(height~age+sex)
    summary(fit)

I already heard about the QuantPsyc package, which, unfortunately, produces an error (it says sd(<data.frame>) is deprecated).

Thank you very much!
David
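Standardized coefficients can be computed by hand without an extra package: beta_j = b_j * sd(x_j) / sd(y). A sketch with the data from the question; refitting the model on z-standardized variables gives the same values and serves as a cross-check. The names `beta`, `z` and `fit_z` are assumptions for this illustration.

```r
height <- c(180, 160, 150, 170, 190, 172)
sex <- c(1, 2, 2, 1, 1, 2)
age <- c(40, 20, 30, 40, 20, 25)

fit <- lm(height ~ age + sex)

# beta_j = b_j * sd(x_j) / sd(y)
b <- coef(fit)[c("age", "sex")]
beta <- b * c(sd(age), sd(sex)) / sd(height)

# cross-check: regression on z-standardized variables
z <- as.data.frame(scale(data.frame(height, age, sex)))
fit_z <- lm(height ~ age + sex, data = z)

beta
coef(fit_z)[c("age", "sex")]
```

For a dummy-coded predictor such as sex, a standardized coefficient is harder to interpret than the raw one, so it may be preferable to standardize only the continuous variables.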
[R] Change dataframe-structure
Hello everybody,

I have the following problem and have no idea how to solve it: In my data frame I have six columns representing six societal problems (p1, p2, ..., p6). The values are ranks between 1 (worst problem) and 6 (least serious problem):

    p1 p2 p3 p4 p5 p6
     1  3  2  5  4  6
     2  3  1  6  4  5
     1  2  3  4  6  5

but I'd like the data frame the other way round:

     1  2  3  4  5  6
    p1 p3 p2 p5 p4 p6
    p3 p1 p2 p5 p6 p4
    p1 p2 p3 p4 p6 p5

Can anyone help?

Thanks!
[R] factor level for non-existing value
Hello everybody!

Let's assume I have the following factor with its levels:

    a <- factor(c(2,3,3,2,4,2,3,2,2,2,3,2,3))
    mydata <- data.frame(a)

When I plot the vector a using

    barplot(table(mydata$a))

unfortunately the value 1 does not show up, as it does not appear in my data. But still, it theoretically exists. How can I assign the following levels to the factor?

    1: dislike very much
    2: dislike
    3: like
    4: like very much

I have already tried the following code, which does not work

    levels(mydata$a) <- c("dislike very much","dislike","like","like very much")

as 2 then becomes "dislike very much".

I hope you understand my problem. Thank you for any help!
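One way to keep the unused category is to rebuild the factor and state all possible values in `levels`, with the matching texts in `labels`. A sketch with the data from the question:

```r
a <- factor(c(2, 3, 3, 2, 4, 2, 3, 2, 2, 2, 3, 2, 3))
mydata <- data.frame(a)

# declare every possible value (including the unobserved 1) and its label
mydata$a <- factor(
  as.character(mydata$a),
  levels = 1:4,
  labels = c("dislike very much", "dislike", "like", "like very much")
)

counts <- table(mydata$a)   # the empty level is kept with a count of 0
barplot(counts)
```

Assigning to levels() alone only renames the levels that already exist, in their current order, which is why 2 turned into "dislike very much" in the attempt above.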
[R] how to select columns
Hello,

I have the following question: when creating a data.frame

    a1 <- c(1,2,3)
    a2 <- c(1,2,3)
    c <- data.frame(a1,a2)

I can select columns using an index like:

    c[,1:2]

Is this possible too when using column names? (something like c(,a1:a2), which doesn't work)

Alternative question: Is there a function to get the index of a variable by name, or can I select certain columns using a loop (a_1, a_2, ..., a_n)?

Thank you very much!
David
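A few base-R options, sketched on a small example. The data frame is called `dat` here (an assumption, to avoid reusing the name of the function c()), and a third column is added so that the selections actually drop something.

```r
dat <- data.frame(a1 = c(1, 2, 3), a2 = c(1, 2, 3), a3 = c(4, 5, 6))

# select by a vector of column names
by_name <- dat[, c("a1", "a2")]

# index of a column by name
idx <- match("a2", names(dat))

# a name range a1:a2 is understood by subset()
by_range <- subset(dat, select = a1:a2)

# names built programmatically instead of a loop (a1, a2, ..., an)
wanted <- paste0("a", 1:2)
by_pattern <- dat[, wanted]

by_range
```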