[R] Banner using R
Hi,

I am a bit rusty with R programming and I would appreciate some assistance with the following. I have a dataset like:

    Data <- data.frame(v1 = c('A', 'B', 'B', 'A', 'B'),
                       v2 = c('A', 'B', 'A', 'A', 'B'),
                       v3 = c('A', 'A', 'A', 'A', 'A'))

How can I get a banner of the following sort?

    Count  v1  v2  v3  TOT
    A       2   3   5   10
    B       3   2   0    5

I have tried with xtabs and expss but I do not seem to get what I need...

Thanks,
Luca

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
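One possible way to build such a banner in base R is sketched below. It is not taken from the thread; it assumes every column should be tabulated against the same set of levels, so that a value missing from a column (B in v3) is counted as 0.

```r
Data <- data.frame(v1 = c('A', 'B', 'B', 'A', 'B'),
                   v2 = c('A', 'B', 'A', 'A', 'B'),
                   v3 = c('A', 'A', 'A', 'A', 'A'),
                   stringsAsFactors = FALSE)

# Common set of levels across all columns, so each column yields the same rows
lev <- sort(unique(unlist(lapply(Data, as.character))))

# One column of counts per variable; rows are the levels
banner <- sapply(Data, function(x) table(factor(x, levels = lev)))

# Append the row totals as a TOT column
banner <- cbind(banner, TOT = rowSums(banner))
print(banner)
```

Forcing the shared levels with factor() is the design choice that matters here: a plain table() on v3 would drop the B row and the columns would no longer line up.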
Re: [R] How to visualise what code is processed within a for loop
Thank you for both replies, Don & Rui,

The real issue here is that a search needs to be done within a text field, and I agree with Rui's later comment that regexpr might indeed be the time-consuming piece of code. I might try to optimise that piece of code later on, but for the time being I am working on the next part, building a neural network to try classifying some text.

Again, thanks,
Luca

2018-04-30 17:25 GMT+02:00 MacQueen, Don:

> Luca,
>
> If speed is important, you might improve performance by making d0 into a true matrix, rather than a data frame (assuming d0 is indeed a data frame at this point). Although data frames may look like matrices, they aren't, and they have some overhead that matrices don't. I don't think you would be able to use the [[nm]] syntax with a matrix, but [ , nm] should work, provided the matrix has column names. Or you could perhaps index by column number.
>
> I had a project some years ago in which I reduced calculation time a lot by extracting the numeric columns of a data frame and working with them, then recombining them with the character columns. R's performance working with data frames has improved since then, so I really don't know if it would make a difference for your task.
>
> -Don
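Don's matrix suggestion could look roughly like the sketch below. The contents of d0 and d1 are invented toy data (the thread never shows them), and whether this is actually faster on the real 5954x899 problem is not something the thread establishes.

```r
# Toy stand-ins for the poster's objects (assumed structure)
d0 <- data.frame(X0 = c("red apple", "green pear", "apple and pear"),
                 stringsAsFactors = FALSE)
d1 <- data.frame(term = c("apple", "pear"), stringsAsFactors = FALSE)

n_terms <- nrow(d1)

# Preallocate a true integer matrix with column names V1, V2, ...
m <- matrix(0L, nrow = nrow(d0), ncol = n_terms,
            dimnames = list(NULL, paste0("V", seq_len(n_terms))))

for (i in seq_len(n_terms)) {
  nm <- paste0("V", i)
  # [ , nm] indexing works on a matrix that has column names
  m[, nm] <- as.integer(regexpr(d1[i, 1], d0$X0) > 0)
}

# Recombine the numeric part with the character column at the end
d0 <- cbind(d0, m)
```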
Re: [R] How to visualise what code is processed within a for loop
Hi Rui,

Thank you for your suggestion.

I have tested your code against the one supplied by Don in terms of timing, and the results are very much aligned: to populate a 5954x899 0/1 matrix on my machine your procedure took 79 secs, while the one with ifelse took 80 secs, so unfortunately there is no significant time saving there.

Nevertheless, thank you for your contribution.

Kind regards,
Luca

2018-04-28 23:18 GMT+02:00 Rui Barradas:

> I forgot to explain the reason for my suggestion.
>
> The logical condition returns FALSE/TRUE, which in R are coded as 0/1. So all you have to do is coerce to integer.
>
> This works because the ifelse would return a 1 or a 0 depending on the condition, meaning exactly the same values. And it is more efficient, since ifelse creates both vectors, the true part and the false part, and then indexes those vectors in order to return the appropriate values. This is double the trouble and a great deal of memory used.
>
> Rui Barradas
>
> On 4/28/2018 10:12 PM, Rui Barradas wrote:
>
>> Hello,
>>
>> instead of ifelse, the following is exactly the same and much more efficient.
>>
>>     d0[[nm]] <- as.integer(regexpr(d1[i,1], d0$X0) > 0)
>>
>> Hope this helps,
>>
>> Rui Barradas
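Rui's point can be checked on a small vector. The sketch below uses invented strings and also shows grepl(), which returns the logical vector directly; that third variant is an addition for comparison, not something Rui proposed.

```r
x <- c("red apple", "green pear", "apple and pear")
pattern <- "apple"

# Don's version: ifelse() on the logical condition (returns doubles 1/0)
via_ifelse <- ifelse(regexpr(pattern, x) > 0, 1, 0)

# Rui's version: coerce the logical condition straight to integer
via_integer <- as.integer(regexpr(pattern, x) > 0)

# Variant with grepl(), which yields TRUE/FALSE without the "> 0" step
via_grepl <- as.integer(grepl(pattern, x))

# Same values in all three cases (the ifelse result is double, not integer)
print(all(via_ifelse == via_integer))
print(identical(via_integer, via_grepl))
```

As Luca's timing suggests, the pattern matching itself is likely to dominate the run time, so swapping ifelse() for as.integer() mainly tidies the code.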
Re: [R] How to visualise what code is processed within a for loop
Thanks Don,

    for (i in 1:10){
      nm <- paste0("V", i)
      d0[[nm]] <- ifelse( regexpr(d1[i,1], d0$X0) > 0, 1, 0)
    }

is exactly what I needed.

Best regards,
Luca

2018-04-25 23:03 GMT+02:00 MacQueen, Don:

> Your code doesn't make sense to me in a couple of ways.
>
> Inside the loop, the first line assigns a value to an object named "t". Then, the second line does the same thing, assigns a value to an object named "t".
>
> The value of the object named "t" after the second line will be the output of the ifelse() expression, whatever that is. This has the effect of making the first line irrelevant. Whatever value t has after the first line is replaced by whatever it gets from the second line.
>
> It looks like the first line inside the loop is constructing the name of a data frame column, and storing that name as a character string. However, the second line doesn't use that name at all. If your goal is to update the contents of a column, you need to assign something to that column in the next line. Instead you assign it to the object named "t".
>
> What you're looking for will be more along the lines of this:
>
>     for (i in 1:10){
>       nm <- paste0("V", i)
>       d0[[nm]] <- ifelse( regexpr(d1[i,1], d0$X0) > 0, 1, 0)
>     }
>
> This may not be a complete solution, since I have no idea what the contents or structure of d1 are, or what the regexpr() is expected to return.
>
> And notice the use of double brackets, [[ and ]]. This is one way to reference a column of a data frame when you have the column's name stored in a variable. Another way is d0[ , nm]
>
> A couple of additional comments:
>
> "t" is a poor choice of object name, because it is one of R's built-in functions (immediately after starting a fresh session of R, with nothing left over from any previous session, type help("t") and see what you get).
>
> ifelse() is intended for use on vectors, not scalars, and it looks like maybe you're using it on a scalar (can't be sure about this, though).
>
> For example, ifelse() is designed for this kind of usage:
>
>     > ifelse( c(TRUE, FALSE, TRUE) , 1:3, 11:13)
>     [1]  1 12  3
>
> Although it works ok for these
>
>     > ifelse(TRUE, 3, 4)
>     [1] 3
>     > ifelse(FALSE, 3, 4)
>     [1] 4
>
> they are not really what it is intended for.
>
> --
> Don MacQueen
> Lawrence Livermore National Laboratory
> 7000 East Ave., L-627
> Livermore, CA 94550
> 925-423-1062
> Lab cell 925-724-7509
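The difference between the original loop and Don's correction can be reproduced on toy data. In the sketch below d0 and d1 are invented stand-ins (two search terms instead of ten); only the loop bodies come from the thread.

```r
# Toy stand-ins: V1 and V2 start at zero, as in the original problem
d0 <- data.frame(X0 = c("red apple", "green pear", "apple and pear"),
                 V1 = 0, V2 = 0, stringsAsFactors = FALSE)
d1 <- data.frame(term = c("apple", "pear"), stringsAsFactors = FALSE)

# Original pattern: both assignments go to `t`, so d0 is never touched
for (i in 1:2) {
  t <- paste("d0$V", i, sep = "")
  t <- ifelse(regexpr(d1[i, 1], d0$X0) > 0, 1, 0)
}
unchanged <- all(d0$V1 == 0) && all(d0$V2 == 0)
print(unchanged)

# Don's correction: build the column name, then assign through [[nm]]
for (i in 1:2) {
  nm <- paste0("V", i)
  d0[[nm]] <- ifelse(regexpr(d1[i, 1], d0$X0) > 0, 1, 0)
}
print(d0)
```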
Re: [R] How to visualise what code is processed within a for loop
Hi Bob,

Thank you for your suggestion. Actually d0 is a data frame; does that change anything in the code you propose?

Kind regards,
Luca

2018-04-24 10:19 GMT+02:00 Bob O'Hara:

> The loop never assigns anything to d0, only t. The first line makes t a character string "d0$V1" (or "d0$V2" etc.). The second line assigns either 0 or 1 to t.
>
> Looking at this, I don't think you've got into the R psychology (bad news if you want to use R, good news in many other ways). I assume d0 is a list, so could you put the V's into a vector, and then just use this:
>
>     d0$V <- sapply(d1[1:10,1], grepl, d0$X0)
>
> (I haven't checked it, but it looks like it will do the trick. It returns a logical vector, so if you need integers, then use an as.numeric() around the right hand side. Or hope that R does type conversion for you when you need it.)
>
> HTH
>
> Bob
>
> --
> Bob O'Hara
> NOTE NEW ADDRESS!!!
> Institutt for matematiske fag
> NTNU
> 7491 Trondheim
> Norway
>
> Mobile: +49 1515 888 5440
> Journal of Negative Results - EEB: www.jnr-eeb.org
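For a data frame, Bob's sapply() idea needs a small adjustment, because sapply() over several patterns returns a matrix (one column per pattern) rather than a single vector. The sketch below is one way to do it under that reading; d0 and d1 are invented toy data.

```r
d0 <- data.frame(X0 = c("apple pie", "banana split", "cherry tart"),
                 stringsAsFactors = FALSE)
d1 <- data.frame(term = c("apple", "an", "tart"), stringsAsFactors = FALSE)

# One grepl() call per pattern; the result is a logical matrix with
# nrow(d0) rows and one column per pattern
hits <- sapply(d1[, 1], grepl, x = d0$X0)

# Convert TRUE/FALSE to 1/0 and name the columns V1, V2, ...
storage.mode(hits) <- "integer"
colnames(hits) <- paste0("V", seq_len(ncol(hits)))

# Attach the indicator columns to the data frame
d0 <- cbind(d0, hits)
print(d0)
```

Note that this adds fresh V columns; if d0 already contains V1 to V10, they would need to be dropped or overwritten first.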
[R] How to visualise what code is processed within a for loop
Hi,

I am trying to debug the following code:

    for (i in 1:10){
      t <- paste("d0$V",i,sep="")
      t <- ifelse(regexpr(d1[i,1],d0$X0)>0,1,0)
    }

and I would like to see what code R is actually processing. How can I do that?

More to the point, I am trying to update my variables d0$V1 to d0$V10 according to the presence or absence of some text (contained in the file d1) within the d0$X0 variable.

The code seems to run ok: if I add print(table(t)) within the loop I can see that the ifelse procedure is working and that a 1 is assigned to some cases within the d0$V1 to d0$V10 variable range. But when checking my d0$V1 to d0$V10 after the for loop they are all still equal to zero...

Thanks,
Luca
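One simple way to see what each iteration actually produces is to print the class and value of the intermediate object. This is only a tracing sketch (three iterations, first line of the loop only); for interactive inspection, base R's browser() can also be placed inside the loop body.

```r
for (i in 1:3) {
  t <- paste("d0$V", i, sep = "")
  # Show what `t` holds at this point: a character string, not a column
  cat("iteration", i, ": class =", class(t), ", value =", t, "\n")
}
```

The trace makes it visible that paste() only builds the text "d0$V1", "d0$V2", ...; R never evaluates that text as a reference to a column.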
[R] How to dynamically add variables to a dataframe
Hi,

I am a bit rusty with R programming and cannot seem to find a way to add a number of variables to my existing data frame. Basically I need to add n = dim(d1)[1] variables to my d0 data frame and I would like them to be named V1, V2, V3, ... , V[dim(d1)[1]].

When running the following code:

    for (t in 1:dim(d1)[1]){
      d0$V[t] <- 0
    }

all I get is a single V variable populated with zeros... I am sure there is fairly straightforward code to accomplish what I need; any suggestions?

Thank you,
Luca
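The loop in the question writes into element t of one column called V, which is why only a single V appears. A minimal sketch of one alternative, on invented d0 and d1, builds the column names first and assigns them all at once:

```r
# Toy stand-ins for the poster's objects (assumed structure)
d0 <- data.frame(X0 = c("a", "b", "c"), stringsAsFactors = FALSE)
d1 <- data.frame(term = c("x", "y", "z", "w"), stringsAsFactors = FALSE)

# Names V1 ... Vn, where n is the number of rows of d1
new_names <- paste0("V", seq_len(nrow(d1)))

# Assigning to not-yet-existing names creates the columns, filled with 0
d0[new_names] <- 0
print(names(d0))
```

Inside a loop the equivalent would be d0[[paste0("V", t)]] <- 0, using double brackets with the constructed name.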
[R] How to integrate a dynamic code within a R script?
Hi,

I am working on a script which should include a dynamic listing, i.e.

    # SCRIPT BEGINS
    # some R procedures here
    # DYNAMIC PART BEGINS
    d1$X5 <-f1("AAA")
    d1$X5 <-f1("AAa")
    d1$X5 <-f1("ABa")
    # etc...
    d1$X6 <-f2("AAA")
    d1$X6 <-f2("AAs")
    d1$X6 <-f2("ABs")
    # etc...
    # DYNAMIC PART ENDS
    # other procedures here
    # SCRIPT ENDS

Basically I have an Excel page with a quite long listing of "AAA", "AAa", "ABa", "ccc", "Ded", etc., one entry on each line. The listing is likely to change over time and the script will run at least once a day.

My initial plan was to do something like

    f1 <- read.xlsx("LIST.xlsx",1, startRow=2, colNames = F)
    f1$X2 <- paste('d1$X5 <-f1("',f1$X1,'")', sep='')
    f1$X3 <- paste('d1$X6 <-f2("',f1$X1,'")', sep='')

and I obtain something like

       X1  X2                 X3
    1  AAA d1$X5 <-f1("AAA")  d1$X6 <-f2("AAA")
    2  AAa d1$X5 <-f1("AAa")  d1$X6 <-f2("AAs")
    3  ABa d1$X5 <-f1("ABa")  d1$X6 <-f2("ABs")

How can I integrate the above in the DYNAMIC PART of my script? I am sure there is a pretty simple solution but I cannot seem to find it.

Thanks,
Luca
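Rather than generating lines of code as text, the dynamic part could loop over the listing and call the function for each entry. The sketch below is only an illustration: the post does not show what f1 does, so the f1 defined here is a hypothetical stand-in that recodes matching values of d1$X5, and the keys vector stands in for the column read from LIST.xlsx.

```r
d1 <- data.frame(X5 = c("AAA", "zzz", "ABa"), stringsAsFactors = FALSE)

# Hypothetical stand-in for the poster's f1(): lower-cases entries equal to key
f1 <- function(key) ifelse(d1$X5 == key, tolower(key), d1$X5)

# In the real script these would come from the Excel listing (f1$X1 in the post)
keys <- c("AAA", "AAa", "ABa")

# DYNAMIC PART: one call per entry, whatever the current length of the listing
for (key in keys) {
  d1$X5 <- f1(key)
}
print(d1$X5)
```

If the generated strings themselves must be executed, base R's eval(parse(text = ...)) can run them, but a plain loop over the entries is usually easier to read and debug.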
[R] Writing text files out of a dataset
Hello,

I am trying to run the following syntax for all cases within the data frame "data":

    d1 <- data[1,c("material")]
    fileConn<-file("TESTI/d1.txt")
    writeLines(d1, fileConn)
    close(fileConn)

I am trying to use a for loop:

    for (i in 1:nrow(data)){
      d[i] <- data[i,c("material")]
      fileConn<-file("TESTI/d[i].txt")
      writeLines(d[i], fileConn)
      close(fileConn)
    }

but I get the error:

    Object "d" not found

Any suggestions on how I can solve the above?

Thanks,
Luca
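Two things go wrong in the loop above: d is indexed before it exists, and "TESTI/d[i].txt" is a literal string, so i is never substituted into the file name. A minimal sketch of one fix follows; it uses invented data and writes under tempdir() instead of the poster's TESTI folder so that it can run anywhere.

```r
# Toy data; the real data frame has a "material" text column
data <- data.frame(material = c("first text", "second text"),
                   stringsAsFactors = FALSE)

# Output folder (an assumption: a TESTI folder inside the session temp dir)
out_dir <- file.path(tempdir(), "TESTI")
dir.create(out_dir, showWarnings = FALSE)

for (i in seq_len(nrow(data))) {
  # Build the file name from i: d1.txt, d2.txt, ...
  out_file <- file.path(out_dir, paste0("d", i, ".txt"))
  # writeLines() accepts a file path and opens/closes the file itself
  writeLines(data[i, "material"], out_file)
}
print(list.files(out_dir))
```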
Re: [R] R and Supervised learning
Hi Bert,

Thank you for your useful suggestions. I will follow them and come back to this list with any specific R code issue I might have.

Kind regards,
Luca

2017-10-02 16:57 GMT+02:00 Bert Gunter:

> Luca:
>
> 1. We are not a consulting service. We *help* with R programming issues. Users are typically expected to make an effort by providing R code and, if appropriate, small data sets that illustrate their difficulties.
>
> 2. SEARCH! e.g. on "text processing R" or some such; or try Rseek.org with such searches. R has extensive text processing capabilities, e.g. via regex's.
>
> 3. "Supervised Learning algorithm" is far too vague to be useful.
>
> 4. See this CRAN task view: https://cran.r-project.org/web/views/MachineLearning.html
>
> 5. The answer to your query is almost certainly yes, but you may have to do some reading to clarify your thinking. As this involves primarily statistical issues, you may wish to post on a statistical site like http://stats.stackexchange.com/ to get advice. The R-help site helps primarily with R programming, not statistical methodology (although they do sometimes intersect).
>
> Cheers,
> Bert
[R] R and Supervised learning
Hi,

I currently find myself manually selecting, among several hundred Google Alerts (GA) texts, those that are indeed relevant for my research versus those which are not (even though they are triggered by some relevant search keywords).

Basically each week I get several hundred GA emails such as:

https://www.dropbox.com/s/u7rp0ez1tamq001/Alerte%20Google%C2%A0-%20laitier%20-%20lucam1968%40gmail.com%20-%20Gmail.pdf?dl=0

and

https://www.dropbox.com/s/1ubx5enw6tc90hj/Google%20Alert%20-%20latte%20-%20lucam1968%40gmail.com%20-%20Gmail.pdf?dl=0

From such emails I create a file such as:

https://www.dropbox.com/s/y5yqcsxp1zcmnhc/test_sample.xlsx?dl=0

This is becoming a really time-consuming procedure, hence my decision to try applying artificial intelligence solutions to such a case.

What I would really need are 2 separate steps:

(1) A procedure that reads the GA email and creates a file such as the Excel file I have shared here (only the first 3 columns).

(2) Some sort of supervised learning algorithm that can learn by example from my choices and decide on my behalf (see column 4 in the attached file). That is: taking the output from step (1) above, I can classify a few hundred cases and then let the algorithm learn and classify future/additional data. I plan to regularly review such a classification, correct misclassifications and train the algorithm again, with the objective of improving its ability to correctly classify the GA texts.

Is my explanation clear enough? Can all the above be done within R? If so, is there any package/procedure I should be using?

Thank you in advance for any suggestion you might have.

Luca
[R] How do I best create a R procedure from a R file?
Hi,

I am working on the following file:

    > str(elencositi)
    'data.frame': 641 obs. of 2 variables:
     $ indirizzo.sito: chr "10ahora.com.ar" "abceconomia.co" "accmag.com" " actu.orange.fr" ...
     $ nome.sito     : chr "10ahora" "ABC economia" "Acc Magazine" "Orange Actu" ...
    > head(elencositi)
              indirizzo.sito    nome.sito
    1         10ahora.com.ar      10ahora
    2         abceconomia.co ABC economia
    3             accmag.com Acc Magazine
    4         actu.orange.fr  Orange Actu
    5   affaires.lapresse.ca    La Presse
    6 agipapress.blogspot.it   Agigapress

This file is regularly updated, and I consequently need to update a procedure that takes the elencositi data to update dati$FONTE as indicated below:

    dati$FONTE <- ifelse(dati$FONTE=='10ahora.com.ar','10ahora',dati$FONTE)
    dati$FONTE <- ifelse(dati$FONTE=='abceconomia.co','ABC economia',dati$FONTE)
    dati$FONTE <- ifelse(dati$FONTE=='accmag.com','Acc Magazine',dati$FONTE)

Currently I am using a time-consuming procedure involving Excel to update that, but how can I make it automatic?

Thank you in advance,
Luca
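The chain of ifelse() calls amounts to a table lookup, so one option is to let match() do the recoding directly from elencositi. The sketch below uses a three-row excerpt of the table and an invented dati; the trimws() call is an assumption prompted by the leading space visible in " actu.orange.fr" in the str() output.

```r
# Excerpt of the lookup table from the post
elencositi <- data.frame(
  indirizzo.sito = c("10ahora.com.ar", "abceconomia.co", "accmag.com"),
  nome.sito      = c("10ahora", "ABC economia", "Acc Magazine"),
  stringsAsFactors = FALSE
)

# Invented data to recode; the middle value is deliberately not in the table
dati <- data.frame(FONTE = c("accmag.com", "unlisted.example.org", "10ahora.com.ar"),
                   stringsAsFactors = FALSE)

# Position of each FONTE value in the lookup table (NA when not listed)
idx <- match(dati$FONTE, trimws(elencositi$indirizzo.sito))

# Replace listed addresses by their site name, keep everything else unchanged
dati$FONTE <- ifelse(is.na(idx), dati$FONTE, elencositi$nome.sito[idx])
print(dati$FONTE)
```

With this approach, updating elencositi is enough; no recoding line has to be written per site.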
Re: [R] Script/function/procedure with loop
Thanks Sarah,

The code works just fine.

Luca

2016-07-11 22:43 GMT+02:00 Sarah Goslee:

> Taking your question at face value, except for the factors in your original data frame, you can output anything you'd like to text onscreen using cat(). Output can also be saved to text files with sink() or using batch files, etc. and so forth.
>
>     date <- c("07-jul-16","07-jul-16","07-jul-16","08-jul-16","08-jul-16","08-jul-16","09-jul-16","09-jul-16")
>     varA <- c("text A1","text A2","text A3","text A4","text A5","text A6","text A7","text A8")
>     varB <- c("link B1","link B2","link B3","link B4","link B5","link B6","link B7","link B8")
>     mydf <- data.frame(date, varA, varB, stringsAsFactors=FALSE)
>
>     for(i in sort(unique(mydf$date))) {
>       thisdate <- subset(mydf, date==i)
>       cat(i, "\n\n")
>       for(j in seq_len(nrow(thisdate))) {
>         cat(thisdate$varA[j], "\n")
>         cat(thisdate$varB[j], "\n\n")
>       }
>     }
>
> This code prints to screen:
>
>     07-jul-16
>
>     text A1
>     link B1
>
>     text A2
>     link B2
>
>     text A3
>     link B3
>
>     08-jul-16
>
>     text A4
>     link B4
>
>     text A5
>     link B5
>
>     text A6
>     link B6
>
>     09-jul-16
>
>     text A7
>     link B7
>
>     text A8
>     link B8
>
> Please post in plain text.
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
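Sarah mentions that the same output can be saved to a text file. A minimal sketch of that follows, using a file connection passed to cat() rather than sink(); the shortened data and the temporary file name are assumptions made for the example.

```r
mydf <- data.frame(date = c("07-jul-16", "07-jul-16", "08-jul-16"),
                   varA = c("text A1", "text A2", "text A3"),
                   varB = c("link B1", "link B2", "link B3"),
                   stringsAsFactors = FALSE)

out_file <- tempfile(fileext = ".txt")
con <- file(out_file, "w")

for (d in unique(mydf$date)) {
  thisdate <- mydf[mydf$date == d, ]
  # Date header followed by a blank line
  cat(d, "\n\n", sep = "", file = con)
  # One "text / link / blank line" group per row of this date
  cat(paste0(thisdate$varA, "\n", thisdate$varB, "\n\n"), sep = "", file = con)
}
close(con)

report <- readLines(out_file)
print(report)
```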
[R] Script/function/procedure with loop
Can anyone point me to an R script/function/procedure which, starting from the following sample data

    #sample data
    #NB: nrow(df) is variable
    date = c("07-jul-16","07-jul-16","07-jul-16","08-jul-16","08-jul-16","08-jul-16","09-jul-16","09-jul-16")
    varA = c("text A1","text A2","text A3","text A4","text A5","text A6","text A7","text A8")
    varB = c("link B1","link B2","link B3","link B4","link B5","link B6","link B7","link B8")
    df = data.frame(date, varA, varB)

allows me to obtain a text output such as:

    > 07-jul-16

    text A1
    link B1

    text A2
    link B2

    text A3
    link B3

    > 08-jul-16

    text A4
    link B4

    text A5
    link B5

    text A6
    link B6

    > 09-jul-16

    text A7
    link B7

    text A8
    link B8

    etc...

Thanks,
Luca
Re: [R] Assistance with httr package with R version 3.3.0
Hi Jim,

Thank you for your suggestion. I have actually tried loading XML and xml2 but nothing changed... any other suggestions?

Kind regards,
Luca

    > rm(list=ls())
    > library(httr)
    > library(XML)
    > library(xml2)
    >
    > #load the data from Google spreadsheets
    > url <- "https://docs.google.com/spreadsheets/d/102-jJ7x1YfIe4Kkvb9olQ4chQ_TS90jxoU0vAbFZewc/pubhtml?gid=0&single=true"
    > readSpreadsheet <- function(url, sheet = 1){
    +   r <- GET(url)
    +   html <- content(r)
    +   sheets <- readHTMLTable(html, header=FALSE, stringsAsFactors=FALSE)
    +   df <- sheets[[sheet]]
    +   dfClean <- function(df){
    +     nms <- t(df[1,])
    +     names(df) <- nms
    +     df <- df[-1,-1]
    +     row.names(df) <- seq(1,nrow(df))
    +     df
    +   }
    +   dfClean(df)
    + }
    > dati <- readSpreadsheet(url)
    Error in (function (classes, fdef, mtable) :
      unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"xml_document"’
    > rm(readSpreadsheet,url)

2016-05-10 8:52 GMT+02:00 Jim Lemon:

> Hi Luca,
> The function readHTMLTable is in the XML package, not httr. Perhaps that is the problem, as I don't see a dependency in httr for XML (although xml2 is suggested).
>
> Jim
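The error message says that readHTMLTable() was handed an xml_document, i.e. an object produced by xml2, while the XML package works on its own parsed documents. One possible workaround, not confirmed anywhere in this thread, is to parse the page text with the XML package itself. The sketch below uses a literal HTML string so that it runs without network access; inside readSpreadsheet() the text would presumably come from content(r, as = "text").

```r
# Stand-in for the page text; in the real function this would be
# html_text <- content(r, as = "text")   (assumption, not from the thread)
html_text <- paste0(
  "<html><body><table>",
  "<tr><td>name</td><td>value</td></tr>",
  "<tr><td>a</td><td>1</td></tr>",
  "</table></body></html>"
)

if (requireNamespace("XML", quietly = TRUE)) {
  # Parse with the XML package so the document has the class it expects
  doc <- XML::htmlParse(html_text, asText = TRUE)
  sheets <- XML::readHTMLTable(doc, header = FALSE, stringsAsFactors = FALSE)
  df <- sheets[[1]]
  print(df)
}
```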
[R] Assistance with httr package with R version 3.3.0
Hello,

I am trying to run some code I have been using for a few years now, after downloading the new R version 3.3.0, and I get the following error:

    > rm(list=ls())
    > library(httr)
    >
    > #load the data from Google spreadsheets
    > url <- "https://docs.google.com/spreadsheets/d/102-jJ7x1YfIe4Kkvb9olQ4chQ_TS90jxoU0vAbFZewc/pubhtml?gid=0&single=true"
    > readSpreadsheet <- function(url, sheet = 1){
    +   r <- GET(url)
    +   html <- content(r)
    +   sheets <- readHTMLTable(html, header=FALSE, stringsAsFactors=FALSE)
    +   df <- sheets[[sheet]]
    +   dfClean <- function(df){
    +     nms <- t(df[1,])
    +     names(df) <- nms
    +     df <- df[-1,-1]
    +     row.names(df) <- seq(1,nrow(df))
    +     df
    +   }
    +   dfClean(df)
    + }
    > dati <- readSpreadsheet(url)
    Error in (function (classes, fdef, mtable) :
      unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"xml_document"’
    > rm(readSpreadsheet,url)

Can anyone suggest a solution to it?

Thanks,
Luca
Re: [R] [FORGED] How to remove the grid around the plot(ca(...)) function?
That worked just fine. Thanks Paul!

Luca

2015-10-09 0:11 GMT+02:00 Paul Murrell :

> Hi
>
> The plot.ca() function contains explicit calls to axis(), box(), and
> abline(), so, for example, ...
>
> plot(ca(d1), mass = c(TRUE,FALSE), xlab="", ylab="", axes=FALSE)
>
> ... does not work.
>
> One option is draw-it-yourself (as suggested by David Carlson), another
> option is to copy the function source and write your own version that has
> those axis(), box(), and abline() calls removed (not recommended for a
> number of reasons), and another option is like this ...
>
> # Draw original plot
> plot(ca(d1), mass = c(TRUE,FALSE), xlab="", ylab="")
> # Generate 'grid' version of the plot
> library(gridGraphics)
> grid.echo()
> # What has been drawn?
> grid.ls()
> # Remove whichever bits you want
> grid.remove("axis", grep=TRUE, global=TRUE)
> grid.remove("box", grep=TRUE)
> grid.remove("abline", grep=TRUE, global=TRUE)
>
> Paul
>
> On 09/10/15 07:06, Luca Meyer wrote:
>
>> Hello R-experts,
>>
>> Could anyone suggest how I can remove the grid coming out of the
>> plot(ca(...)) function?
>>
>> For instance I have:
>>
>> library(ca)
>> v1 <- c(10,15,20,15,25)
>> v2 <- c(23,4,7,12,2)
>> v3 <- c(10,70,2,3,7)
>> d1 <- data.frame(v1,v2,v3)
>> rownames(d1) <- c("B1","B2","B3","B4","B5")
>> plot(ca(d1), mass = c(TRUE,FALSE), xlab="", ylab="")
>>
>> As you can I could remove the X and Y axis label, but basically I am
>> looking for a chart containing only the data points - with relative
>> inertia represented by their size - and labels with no extra lines or
>> number, any clue on how I can do that?
>>
>> Thank you,
>>
>> Luca
>>
> --
> Dr Paul Murrell
> Department of Statistics
> The University of Auckland
> Private Bag 92019
> Auckland
> New Zealand
> 64 9 3737599 x85392
> p...@stat.auckland.ac.nz
> http://www.stat.auckland.ac.nz/~paul/
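The draw-it-yourself option mentioned in the reply above can be sketched in base R without the ca package. This is only an illustration under stated assumptions: ca_row_coords() and plot_points_only() are hypothetical helpers, the coordinates are row principal coordinates obtained from the SVD of the standardized residuals, and point size follows row mass. Axis signs may differ from what plot(ca(d1)) shows, and column points are left out.

```r
d1 <- data.frame(v1 = c(10, 15, 20, 15, 25),
                 v2 = c(23, 4, 7, 12, 2),
                 v3 = c(10, 70, 2, 3, 7),
                 row.names = c("B1", "B2", "B3", "B4", "B5"))

# Row principal coordinates of a simple correspondence analysis.
ca_row_coords <- function(tab) {
  P <- as.matrix(tab) / sum(tab)          # correspondence matrix
  rmass <- rowSums(P)                     # row masses
  cmass <- colSums(P)                     # column masses
  S <- (P - outer(rmass, cmass)) / sqrt(outer(rmass, cmass))
  dec <- svd(S)
  # Scale each left singular vector by its singular value, then each row by 1/sqrt(mass).
  coords <- sweep(dec$u, 2, dec$d, "*") / sqrt(rmass)
  rownames(coords) <- rownames(tab)
  list(coords = coords, mass = rmass, sv = dec$d)
}

# Points and labels only: no axes, box, or reference lines.
plot_points_only <- function(res) {
  xy <- res$coords[, 1:2]
  plot(xy, type = "n", axes = FALSE, xlab = "", ylab = "", asp = 1)
  points(xy, pch = 16, cex = 1 + 10 * res$mass)  # size factor chosen arbitrarily
  text(xy, labels = rownames(xy), pos = 3)
}

res <- ca_row_coords(d1)
plot_points_only(res)
```

Because nothing but points() and text() is called, there is nothing to remove afterwards, at the price of recomputing the coordinates by hand.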
[R] How to remove the grid around the plot(ca(...)) function?
Hello R-experts,

Could anyone suggest how I can remove the grid coming out of the plot(ca(...)) function?

For instance I have:

library(ca)
v1 <- c(10,15,20,15,25)
v2 <- c(23,4,7,12,2)
v3 <- c(10,70,2,3,7)
d1 <- data.frame(v1,v2,v3)
rownames(d1) <- c("B1","B2","B3","B4","B5")
plot(ca(d1), mass = c(TRUE,FALSE), xlab="", ylab="")

As you can see, I could remove the X and Y axis labels, but basically I am looking for a chart containing only the data points - with relative inertia represented by their size - and labels, with no extra lines or numbers. Any clue on how I can do that?

Thank you,

Luca
Re: [R] Joining two datasets - recursive procedure?
Hi David, hello R-experts Thank you for your input. I have tried the syntax you suggested but unfortunately the marginal distributions v1xv2 change after the manipulation. Please see below or https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0 for the syntax. > rm(list=ls()) > > # this is usual (an extract of) the INPUT file I have: > f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B", + "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", + "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C", + "B", "B", "B", "C", "C", "C"), v4 = c(18.18530, 3.43806,0.00273, 1.42917, 1.05786, 0.00042, 2.37232, 3.01835, 0, 1.13430, 0.92872, + 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names = c(2L, + 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L)) > > #first I order the file such that I have 6 distinct v1xv2 combinations > f1 <- f1[order(f1$v1,f1$v2),] > > # then I compute (manually) the relative importance of each v1xv2 combination: > tAA <- (18.18530+1.42917)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0) # this is for combination v1=A & v2=A > tAB <- (3.43806+1.05786)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0) # this is for combination v1=A & v2=B > tAC <- (0.00273+0.00042)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0) # this is for combination v1=A & v2=C > tBA <- (2.37232+1.13430)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0) # this is for combination v1=B & v2=A > tBB <- (3.01835+0.92872)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0) # this is for combination v1=B & v2=B > tBC <- (0.0+0.0)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0) # this is for combination v1=B & v2=C > # and just to make sure I have not made mistakes the 
following should be equal to 1
> tAA+tAB+tAC+tBA+tBB+tBC
[1] 1
>
> # procedure suggested by David Winsemius
> lookarr <- array(NA, dim=c(length(unique(f1$v1)),length(unique(f1$v2)),length(unique(f1$v3)) ) , dimnames=list( unique(f1$v1), unique(f1$v2), unique(f1$v3) ) )
> lookarr[] <- c(tAA,tAA,tAB,tAB,tAC,tAC,tBA,tBA,tBB,tBB,tBC,tBC)
> lookarr["A","B","C"]
[1] 0.1250369
> lookarr[ with(f1, cbind(v1, v2, v3)) ]
 [1] 6.213554e-01 1.110842e-01 1.424236e-01 1.250369e-01 9.978703e-05 0.00e+00 6.213554e-01 1.110842e-01 1.424236e-01 1.250369e-01 9.978703e-05
[12] 0.00e+00
> f1$v4mod <- f1$v4*lookarr[ with(f1, cbind(v1,v2,v3)) ]
>
> # I compare original vs modified marginal distributions
> aggregate(v4~v1*v2,f1,sum)
  v1 v2       v4
1  A  A 19.61447
2  B  A  3.50662
3  A  B  4.49592
4  B  B  3.94707
5  A  C  0.00315
6  B  C  0.0
> aggregate(v4mod~v1*v2,f1,sum)
  v1 v2        v4mod
1  A  A 1.145829e+01
2  B  A 1.600057e+00
3  A  B 6.219326e-01
4  B  B 5.460087e-01
5  A  C 2.724186e-07
6  B  C 0.00e+00
> aggregate(v4~v3,f1,sum)
  v3       v4
1  B 27.01676
2  C  4.55047
> aggregate(v4mod~v3,f1,sum)
  v3      v4mod
1  B 13.6931347
2  C  0.5331569

Any suggestion on how this can be fixed? Remember, I am searching for a solution whereby aggregate(v4~v1*v2,f1,sum)==aggregate(v4mod~v1*v2,f1,sum) while aggregate(v4~v3,f1,sum)!=aggregate(v4mod~v3,f1,sum) by specified amounts (see my earlier example).

Thank you very much,

Luca

2015-03-22 22:11 GMT+01:00 David Winsemius :

>
> On Mar 22, 2015, at 1:12 PM, Luca Meyer wrote:
>
> > Hi Bert,
> >
> > Maybe I did not explain myself clearly enough. But let me show you with a
> > manual example that indeed what I would like to do is feasible.
> >
> > The following is also available for download from
> > https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0
> >
> > rm(list=ls())
> >
> > This is usual (an extract of) the INPUT file I have:
> >
> > f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B",
> > "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A",
>
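One likely reason the lookup above returns unexpected values: `lookarr[] <- c(...)` fills the array in column-major order, with the first index varying fastest, so the hand-ordered vector c(tAA,tAA,tAB,...) does not land in the cells its names suggest. The transcript shows this: lookarr["A","B","C"] prints 0.1250369, which is tBB (3.94707/31.56723), not tAB. A sketch that avoids hand-ordering lets tapply() build a named v1xv2 table of shares and indexes it with a two-column character matrix. The `shares` and `share` names are introduced here for illustration only.

```r
f1 <- data.frame(
  v1 = rep(c("A", "B"), each = 6),
  v2 = rep(c("A", "B", "C"), times = 4),
  v3 = rep(rep(c("B", "C"), each = 3), times = 2),
  v4 = c(18.18530, 3.43806, 0.00273, 1.42917, 1.05786, 0.00042,
         2.37232, 3.01835, 0, 1.13430, 0.92872, 0),
  stringsAsFactors = FALSE
)

# Share of the grand total held by each v1 x v2 combination (a 2 x 3 named matrix).
shares <- tapply(f1$v4, list(f1$v1, f1$v2), sum) / sum(f1$v4)

# Look up each row's share by name; no manual ordering of the cells is needed.
f1$share <- shares[cbind(f1$v1, f1$v2)]
f1
```

How the share is then applied to v4 (added to a difference, or used as a multiplier) is a separate decision; this only fixes the lookup step.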
Re: [R] Joining two datasets - recursive procedure?
Dear All, I think I have found a fix developing the draft syntax I have provided yesterday, see below or https://www.dropbox.com/s/pbz9dcgxu6ljj8x/sample_code_1.txt?dl=0. Only desirable improvement is related to the block where I compute the modified v4 (lines 46-60 in the attached file). Provided the real data are of the dimension 8x13x13 (v1xv2xv3), is there anyway to write that block sentence in an automated way? I recall some function that could do that but I can't remenber which one... Thanks to everybody and especially to Bert and David for trying to assist me with this one. And apologizes for not being so clear upfront but I was trying to figure it out myself too... Kind regards, Luca === rm(list=ls()) # this is usual (an extract of) the INPUT file I have: f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C", "B", "B", "B", "C", "C", "C"), v4 = c(18.18530, 3.43806,0.00273, 1.42917, 1.05786, 0.00042, 2.37232, 3.01835, 0, 1.13430, 0.92872, 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names = c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L)) #first I order the file such that I have 6 distinct v1xv2 combinations f1 <- f1[order(f1$v1,f1$v2),] #I compute the relative importance of each v1xv2 automatically t1 <- aggregate(v4~1,f1,sum) tXX <- aggregate(v4~v1*v2,f1,sum) tAA <- as.numeric(tXX$v4[tXX$v1=="A"&tXX$v2=="A"]/t1) tAB <- as.numeric(tXX$v4[tXX$v1=="A"&tXX$v2=="B"]/t1) tAC <- as.numeric(tXX$v4[tXX$v1=="A"&tXX$v2=="C"]/t1) tBA <- as.numeric(tXX$v4[tXX$v1=="B"&tXX$v2=="A"]/t1) tBB <- as.numeric(tXX$v4[tXX$v1=="B"&tXX$v2=="B"]/t1) tBC <- as.numeric(tXX$v4[tXX$v1=="B"&tXX$v2=="C"]/t1) tAA+tAB+tAC+tBA+tBB+tBC rm(t1) # Next, I compute the difference I need to compute for each C category (t1 <- aggregate(v4~v3,f1,sum)) # this is the actual distribution (t2 <- structure(list(v3 = c("B", "C"), 
v4 = c(29, 2.56723)), .Names = c("v3", "v4"), row.names = c(NA, -2L), class = "data.frame")) # this is the target distribution # I verify t1 & t2 total is the same aggregate(v4~1,t1,sum) aggregate(v4~1,t2,sum) # I determine the value to be added/subtracted to each instance of v3 t1 <- merge(t1,t2,by="v3") t1$dif <- t1$v4.y-t1$v4.x t1 <- t1[,c("v3","dif")] t1 # I merge the t1 file with the f1 f1 <- merge (f1,t1,by="v3") f1 rm(t1,t2) # I compute the modified v4 value f1$v4mod <- f1$v4 f1$v4mod <- ifelse (f1$v1=="A" & f1$v2=="A" & f1$v3=="B", f1$v4+(tAA*f1$dif), f1$v4mod) f1$v4mod <- ifelse (f1$v1=="A" & f1$v2=="A" & f1$v3=="C", f1$v4+(tAA*f1$dif), f1$v4mod) f1$v4mod <- ifelse (f1$v1=="A" & f1$v2=="B" & f1$v3=="B", f1$v4+(tAB*f1$dif), f1$v4mod) f1$v4mod <- ifelse (f1$v1=="A" & f1$v2=="B" & f1$v3=="C", f1$v4+(tAB*f1$dif), f1$v4mod) f1$v4mod <- ifelse (f1$v1=="A" & f1$v2=="C" & f1$v3=="B", f1$v4+(tAC*f1$dif), f1$v4mod) f1$v4mod <- ifelse (f1$v1=="A" & f1$v2=="C" & f1$v3=="C", f1$v4+(tAC*f1$dif), f1$v4mod) f1$v4mod <- ifelse (f1$v1=="B" & f1$v2=="A" & f1$v3=="B", f1$v4+(tBA*f1$dif), f1$v4mod) f1$v4mod <- ifelse (f1$v1=="B" & f1$v2=="A" & f1$v3=="C", f1$v4+(tBA*f1$dif), f1$v4mod) f1$v4mod <- ifelse (f1$v1=="B" & f1$v2=="B" & f1$v3=="B", f1$v4+(tBB*f1$dif), f1$v4mod) f1$v4mod <- ifelse (f1$v1=="B" & f1$v2=="B" & f1$v3=="C", f1$v4+(tBB*f1$dif), f1$v4mod) f1$v4mod <- ifelse (f1$v1=="B" & f1$v2=="C" & f1$v3=="B", f1$v4+(tBC*f1$dif), f1$v4mod) f1$v4mod <- ifelse (f1$v1=="B" & f1$v2=="C" & f1$v3=="C", f1$v4+(tBC*f1$dif), f1$v4mod) f1 # i compare original vs modified marginal distributions aggregate(v4~v1*v2,f1,sum) aggregate(v4mod~v1*v2,f1,sum) aggregate(v4~v3,f1,sum) aggregate(v4mod~v3,f1,sum) aggregate(v4~1,f1,sum) aggregate(v4mod~1,f1,sum) rm(list=ls()) 2015-0
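The block of repeated ifelse() calls can probably be collapsed into a few vectorized lines. This is a sketch, not a tested replacement for the full 8x13x13 data. It assumes that every v1xv2 combination occurs exactly once within each v3 category (as in this extract) and that the target v3 totals add up to the current grand total. The names `target`, `actual`, `dif`, and `share` are introduced here for illustration.

```r
f1 <- data.frame(
  v1 = rep(c("A", "B"), each = 6),
  v2 = rep(c("A", "B", "C"), times = 4),
  v3 = rep(rep(c("B", "C"), each = 3), times = 2),
  v4 = c(18.18530, 3.43806, 0.00273, 1.42917, 1.05786, 0.00042,
         2.37232, 3.01835, 0, 1.13430, 0.92872, 0),
  stringsAsFactors = FALSE
)

target <- c(B = 29, C = 2.56723)                        # desired v3 totals
actual <- vapply(split(f1$v4, f1$v3), sum, numeric(1))  # current v3 totals
dif <- target[names(actual)] - actual                   # amount to move per v3 category

# Relative importance of each row's v1 x v2 combination.
share <- ave(f1$v4, f1$v1, f1$v2, FUN = sum) / sum(f1$v4)

# Same rule as the ifelse() block: v4 plus share times the v3 difference.
f1$v4mod <- f1$v4 + share * unname(dif[f1$v3])

aggregate(cbind(v4, v4mod) ~ v1 + v2, f1, sum)  # v1 x v2 marginals, before and after
aggregate(cbind(v4, v4mod) ~ v3, f1, sum)       # v3 marginals, before and after
```

Nothing in these lines names a specific category, so the same code should run unchanged when v1, v2, and v3 have more levels, provided the assumptions above still hold.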
Re: [R] Fwd: Joining two datasets - recursive procedure?
Hi Bert, Maybe I did not explain myself clearly enough. But let me show you with a manual example that indeed what I would like to do is feasible. The following is also available for download from https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0 rm(list=ls()) This is usual (an extract of) the INPUT file I have: f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C", "B", "B", "B", "C", "C", "C"), v4 = c(18.18530, 3.43806,0.00273, 1.42917, 1.05786, 0.00042, 2.37232, 3.01835, 0, 1.13430, 0.92872, 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names = c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L)) This are the initial marginal distributions aggregate(v4~v1*v2,f1,sum) aggregate(v4~v3,f1,sum) First I order the file such that I have nicely listed 6 distinct v1xv2 combinations. f1 <- f1[order(f1$v1,f1$v2),] Then I compute (manually) the relative importance of each v1xv2 combination: tAA <- (18.18530+1.42917)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0) # this is for combination v1=A & v2=A tAB <- (3.43806+1.05786)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0) # this is for combination v1=A & v2=B tAC <- (0.00273+0.00042)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0) # this is for combination v1=A & v2=C tBA <- (2.37232+1.13430)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0) # this is for combination v1=B & v2=A tBB <- (3.01835+0.92872)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0) # this is for combination v1=B & v2=B tBC <- (0.0+0.0)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0) # this is for combination v1=B & v2=C # 
and just to make sure I have not made mistakes the following should be equal to 1 tAA+tAB+tAC+tBA+tBB+tBC Next, I know I need to increase v4 any time v3=B and the total increase I need to have over the whole dataset is 29-27.01676=1.98324. In turn, I need to dimish v4 any time V3=C by the same amount (4.55047-2.56723=1.98324). This aspect was perhaps not clear at first. I need to move v4 across v3 categories, but the totals will always remain unchanged. Since I want the data alteration to be proportional to the v1xv2 combinations I do the following: f1$v4 <- ifelse (f1$v1=="A" & f1$v2=="A" & f1$v3=="B", f1$v4+(tAA*1.98324), f1$v4) f1$v4 <- ifelse (f1$v1=="A" & f1$v2=="A" & f1$v3=="C", f1$v4-(tAA*1.98324), f1$v4) f1$v4 <- ifelse (f1$v1=="A" & f1$v2=="B" & f1$v3=="B", f1$v4+(tAB*1.98324), f1$v4) f1$v4 <- ifelse (f1$v1=="A" & f1$v2=="B" & f1$v3=="C", f1$v4-(tAB*1.98324), f1$v4) f1$v4 <- ifelse (f1$v1=="A" & f1$v2=="C" & f1$v3=="B", f1$v4+(tAC*1.98324), f1$v4) f1$v4 <- ifelse (f1$v1=="A" & f1$v2=="C" & f1$v3=="C", f1$v4-(tAC*1.98324), f1$v4) f1$v4 <- ifelse (f1$v1=="B" & f1$v2=="A" & f1$v3=="B", f1$v4+(tBA*1.98324), f1$v4) f1$v4 <- ifelse (f1$v1=="B" & f1$v2=="A" & f1$v3=="C", f1$v4-(tBA*1.98324), f1$v4) f1$v4 <- ifelse (f1$v1=="B" & f1$v2=="B" & f1$v3=="B", f1$v4+(tBB*1.98324), f1$v4) f1$v4 <- ifelse (f1$v1=="B" & f1$v2=="B" & f1$v3=="C", f1$v4-(tBB*1.98324), f1$v4) f1$v4 <- ifelse (f1$v1=="B" & f1$v2=="C" & f1$v3=="B", f1$v4+(tBC*1.98324), f1$v4) f1$v4 <- ifelse (f1$v1=="B" & f1$v2=="C" & f1$v3=="C", f1$v4-(tBC*1.98324), f1$v4) This are the final marginal distributions: aggregate(v4~v1*v2,f1,sum) aggregate(v4~v3,f1,sum) Can this procedure be made programmatic so that I can run it on the (8x13x13) categories matrix? if so, how would you do it? I have really hard time to do it with some (semi)automatic procedure. Thank you very much indeed once more :) Luca 2015-03-22 18:32 GMT+01:00 Bert Gunter : > Nonsense. You are not telling us somet
[R] Fwd: Joining two datasets - recursive procedure?
Sorry, I forgot to keep the rest of the group in the loop - Luca

-- Forwarded message --
From: Luca Meyer
Date: 2015-03-22 16:27 GMT+01:00
Subject: Re: [R] Joining two datasets - recursive procedure?
To: Bert Gunter

Hi Bert,

That is exactly what I am trying to achieve. Please notice that negative v4 values are allowed. I have done a similar task in the past manually, by recursively altering the v4 distribution across v3 categories within each fixed v1&v2 combination, so I am quite positive it can be achieved. Honestly, though, it took me forever to do it manually, and since this is likely to be an exercise I need to repeat from time to time, I wish I could learn how to do it programmatically.

Thanks again for any further suggestion you might have,

Luca

2015-03-22 16:05 GMT+01:00 Bert Gunter :

> Oh, wait a minute ...
>
> You still want the marginals for the other columns to be as originally?
>
> If so, then this is impossible in general as the sum of all the values
> must be what they were originally and you cannot therefore choose your
> values for V3 arbitrarily.
>
> Or at least, that seems to be what you are trying to do.
>
> -- Bert
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
> (650) 467-7374
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
> Clifford Stoll
>
> On Sun, Mar 22, 2015 at 7:55 AM, Bert Gunter wrote:
> > I would have thought that this is straightforward given my previous email...
> >
> > Just set z to what you want -- e,g, all B values to 29/number of B's,
> > and all C values to 2.567/number of C's (etc. for more categories).
> > > > A slick but sort of cheat way to do this programmatically -- in the > > sense that it relies on the implementation of factor() rather than its > > API -- is: > > > > y <- f1$v3 ## to simplify the notation; could be done using with() > > z <- (c(29,2.567)/table(y))[c(y)] > > > > Then proceed to z1 as I previously described > > > > -- Bert > > > > > > Bert Gunter > > Genentech Nonclinical Biostatistics > > (650) 467-7374 > > > > "Data is not information. Information is not knowledge. And knowledge > > is certainly not wisdom." > > Clifford Stoll > > > > > > > > > > On Sun, Mar 22, 2015 at 2:00 AM, Luca Meyer wrote: > >> Hi Bert, hello R-experts, > >> > >> I am close to a solution but I still need one hint w.r.t. the following > >> procedure (available also from > >> https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0) > >> > >> rm(list=ls()) > >> > >> # this is (an extract of) the INPUT file I have: > >> f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B", > >> "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", > >> "B", "C"), v3 = c("B", "B", "B", "C", "C", "C", "B", "B", "B", "C", "C", > >> "C"), v4 = c(18.18530, 3.43806,0.00273, 1.42917, 1.05786, 0.00042, > 2.37232, > >> 3.01835, 0, 1.13430, 0.92872, 0)), .Names = c("v1", "v2", "v3", "v4"), > class > >> = "data.frame", row.names = c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L, > 167L, > >> 197L, 204L, 206L)) > >> > >> # this is the procedure that Bert suggested (slightly adjusted): > >> z <- rnorm(nrow(f1)) ## or anything you want > >> z1 <- round(with(f1,v4 + z -ave(z,v1,v2,FUN=mean)), digits=5) > >> aggregate(v4~v1*v2,f1,sum) > >> aggregate(z1~v1*v2,f1,sum) > >> aggregate(v4~v3,f1,sum) > >> aggregate(z1~v3,f1,sum) > >> > >> My question to you is: how can I set z so that I can obtain specific > values > >> for z1-v4 in the v3 aggregation? > >> In other words, how can I configure the procedure so that e.g. 
B=29 and > >> C=2.56723 after running the procedure: > >> aggregate(z1~v3,f1,sum) > >> > >> Thank you, > >> > >> Luca > >> > >> PS: to avoid any doubts you might have about who I am the following is > my > >> web page: http://lucameyer.wordpress.com/ > >> > >> > >> 2015-03-21 18:13 GMT+01:00 Bert Gunter : > >
Re: [R] Joining two datasets - recursive procedure?
Hi Bert, Thanks again for your assistance. Unfortunately when I apply the additional code you suggest I get B=40.23326 & C=-8.66603 and not B=29 & C=2.56723. Any idea why that might be happening? Please see below or on https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0 the code I am running: rm(list=ls()) # this is (an extract of) the INPUT file I have: f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C", "B", "B", "B", "C", "C", "C"), v4 = c(18.18530, 3.43806,0.00273, 1.42917, 1.05786, 0.00042, 2.37232, 3.01835, 0, 1.13430, 0.92872, 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names = c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L)) # this is the procedure that Bert suggested (slightly adjusted): y <- f1$v3 ## to simplify the notation; could be done using with() z <- (c(29,2.567)/table(y))[c(y)] # z <- rnorm(nrow(f1)) ## or anything you want z1 <- round(with(f1,v4 + z -ave(z,v1,v2,FUN=mean)), digits=5) aggregate(v4~v1*v2,f1,sum) aggregate(z1~v1*v2,f1,sum) aggregate(v4~v3,f1,sum) aggregate(z1~v3,f1,sum) Thanks again, Luca 2015-03-22 15:55 GMT+01:00 Bert Gunter : > I would have thought that this is straightforward given my previous > email... > > Just set z to what you want -- e,g, all B values to 29/number of B's, > and all C values to 2.567/number of C's (etc. for more categories). > > A slick but sort of cheat way to do this programmatically -- in the > sense that it relies on the implementation of factor() rather than its > API -- is: > > y <- f1$v3 ## to simplify the notation; could be done using with() > z <- (c(29,2.567)/table(y))[c(y)] > > Then proceed to z1 as I previously described > > -- Bert > > > Bert Gunter > Genentech Nonclinical Biostatistics > (650) 467-7374 > > "Data is not information. Information is not knowledge. And knowledge > is certainly not wisdom." 
> Clifford Stoll > > > > > On Sun, Mar 22, 2015 at 2:00 AM, Luca Meyer wrote: > > Hi Bert, hello R-experts, > > > > I am close to a solution but I still need one hint w.r.t. the following > > procedure (available also from > > https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0) > > > > rm(list=ls()) > > > > # this is (an extract of) the INPUT file I have: > > f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B", > > "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", > > "B", "C"), v3 = c("B", "B", "B", "C", "C", "C", "B", "B", "B", "C", "C", > > "C"), v4 = c(18.18530, 3.43806,0.00273, 1.42917, 1.05786, 0.00042, > 2.37232, > > 3.01835, 0, 1.13430, 0.92872, 0)), .Names = c("v1", "v2", "v3", "v4"), > class > > = "data.frame", row.names = c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L, > 167L, > > 197L, 204L, 206L)) > > > > # this is the procedure that Bert suggested (slightly adjusted): > > z <- rnorm(nrow(f1)) ## or anything you want > > z1 <- round(with(f1,v4 + z -ave(z,v1,v2,FUN=mean)), digits=5) > > aggregate(v4~v1*v2,f1,sum) > > aggregate(z1~v1*v2,f1,sum) > > aggregate(v4~v3,f1,sum) > > aggregate(z1~v3,f1,sum) > > > > My question to you is: how can I set z so that I can obtain specific > values > > for z1-v4 in the v3 aggregation? > > In other words, how can I configure the procedure so that e.g. B=29 and > > C=2.56723 after running the procedure: > > aggregate(z1~v3,f1,sum) > > > > Thank you, > > > > Luca > > > > PS: to avoid any doubts you might have about who I am the following is my > > web page: http://lucameyer.wordpress.com/ > > > > > > 2015-03-21 18:13 GMT+01:00 Bert Gunter : > >> > >> ... or cleaner: > >> > >> z1 <- with(f1,v4 + z -
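A possible explanation for B=40.23326 and C=-8.66603: z is added to v4 after being centered within each v1xv2 group, so it acts as an adjustment and not as a target level. With z set to 29/6 and 2.567/6, each B row gains (29 - 2.567)/12, roughly 2.203, which gives 27.01676 + 6 x 2.203, or about 40.23. Feeding in the required change per row appears to give the intended totals. This is a sketch, assuming that each v1xv2 combination occurs once in every v3 category and that the targets preserve the grand total. The names `target`, `actual`, and `n_k` are illustrative.

```r
f1 <- data.frame(
  v1 = rep(c("A", "B"), each = 6),
  v2 = rep(c("A", "B", "C"), times = 4),
  v3 = rep(rep(c("B", "C"), each = 3), times = 2),
  v4 = c(18.18530, 3.43806, 0.00273, 1.42917, 1.05786, 0.00042,
         2.37232, 3.01835, 0, 1.13430, 0.92872, 0),
  stringsAsFactors = FALSE
)

target <- c(B = 29, C = 2.56723)                           # desired v3 totals
actual <- vapply(split(f1$v4, f1$v3), sum, numeric(1))     # current v3 totals
n_k    <- vapply(split(f1$v4, f1$v3), length, integer(1))  # rows per v3 category

# z is the change each row must absorb, not the level it should reach.
z <- unname(((target[names(actual)] - actual) / n_k)[f1$v3])

# Centering z within v1 x v2 keeps those marginals unchanged.
z1 <- with(f1, v4 + z - ave(z, v1, v2, FUN = mean))

aggregate(z1 ~ v3, f1, sum)
aggregate(z1 ~ v1 + v2, f1, sum)
```

Note that this spreads the change equally over the rows of each v3 category; it is not proportional to the v1xv2 shares, so the individual z1 values will differ from a proportional adjustment even though both sets of marginals match.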
Re: [R] Joining two datasets - recursive procedure?
Hi Bert, hello R-experts, I am close to a solution but I still need one hint w.r.t. the following procedure (available also from https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0) rm(list=ls()) # this is (an extract of) the INPUT file I have: f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C", "B", "B", "B", "C", "C", "C"), v4 = c(18.18530, 3.43806,0.00273, 1.42917, 1.05786, 0.00042, 2.37232, 3.01835, 0, 1.13430, 0.92872, 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names = c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L)) # this is the procedure that Bert suggested (slightly adjusted): z <- rnorm(nrow(f1)) ## or anything you want z1 <- round(with(f1,v4 + z -ave(z,v1,v2,FUN=mean)), digits=5) aggregate(v4~v1*v2,f1,sum) aggregate(z1~v1*v2,f1,sum) aggregate(v4~v3,f1,sum) aggregate(z1~v3,f1,sum) My question to you is: how can I set z so that I can obtain specific values for z1-v4 in the v3 aggregation? In other words, how can I configure the procedure so that e.g. B=29 and C=2.56723 after running the procedure: aggregate(z1~v3,f1,sum) Thank you, Luca PS: to avoid any doubts you might have about who I am the following is my web page: http://lucameyer.wordpress.com/ 2015-03-21 18:13 GMT+01:00 Bert Gunter : > ... or cleaner: > > z1 <- with(f1,v4 + z -ave(z,v1,v2,FUN=mean)) > > > Just for curiosity, was this homework? (in which case I should > probably have not provided you an answer -- that is, assuming that I > HAVE provided an answer). > > Cheers, > Bert > > Bert Gunter > Genentech Nonclinical Biostatistics > (650) 467-7374 > > "Data is not information. Information is not knowledge. And knowledge > is certainly not wisdom." 
> Clifford Stoll > > > > > On Sat, Mar 21, 2015 at 7:53 AM, Bert Gunter wrote: > > z <- rnorm(nrow(f1)) ## or anything you want > > z1 <- f1$v4 + z - with(f1,ave(z,v1,v2,FUN=mean)) > > > > > > aggregate(v4~v1,f1,sum) > > aggregate(z1~v1,f1,sum) > > aggregate(v4~v2,f1,sum) > > aggregate(z1~v2,f1,sum) > > aggregate(v4~v3,f1,sum) > > aggregate(z1~v3,f1,sum) > > > > > > Cheers, > > Bert > > > > Bert Gunter > > Genentech Nonclinical Biostatistics > > (650) 467-7374 > > > > "Data is not information. Information is not knowledge. And knowledge > > is certainly not wisdom." > > Clifford Stoll > > > > > > > > > > On Sat, Mar 21, 2015 at 6:49 AM, Luca Meyer wrote: > >> Hi Bert, > >> > >> Thank you for your message. I am looking into ave() and tapply() as you > >> suggested but at the same time I have prepared a example of input and > output > >> files, just in case you or someone else would like to make an attempt to > >> generate a code that goes from input to output. > >> > >> Please see below or download it from > >> https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0 > >> > >> # this is (an extract of) the INPUT file I have: > >> f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B", > >> "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", > >> "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C", > >> "B", "B", "B", "C", "C", "C"), v4 = c(18.18530, 3.43806,0.00273, > 1.42917, > >> 1.05786, 0.00042, 2.37232, 3.01835, 0, 1.13430, 0.92872, > >> 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", > row.names = > >> c(2L, > >> 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L)) > >> > >> # this is (an extract of) the OUTPUT file I would like to obtain: > >> f2 <- structure(list(v1 = c("A", "A", "A", "A", "A
Re: [R] Joining two datasets - recursive procedure?
Hi Bert, Thank you for your message. I am looking into ave() and tapply() as you suggested but at the same time I have prepared a example of input and output files, just in case you or someone else would like to make an attempt to generate a code that goes from input to output. Please see below or download it from https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0 # this is (an extract of) the INPUT file I have: f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C", "B", "B", "B", "C", "C", "C"), v4 = c(18.18530, 3.43806,0.00273, 1.42917, 1.05786, 0.00042, 2.37232, 3.01835, 0, 1.13430, 0.92872, 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names = c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L)) # this is (an extract of) the OUTPUT file I would like to obtain: f2 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C", "B", "B", "B", "C", "C", "C"), v4 = c(17.83529, 3.43806,0.00295, 1.77918, 1.05786, 0.0002, 2.37232, 3.01835, 0, 1.13430, 0.92872, 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names = c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L)) # please notice that while the aggregated v4 on v3 has changed … aggregate(f1[,c("v4")],list(f1$v3),sum) aggregate(f2[,c("v4")],list(f2$v3),sum) # … the aggregated v4 over v1xv2 has remained unchanged: aggregate(f1[,c("v4")],list(f1$v1,f1$v2),sum) aggregate(f2[,c("v4")],list(f2$v1,f2$v2),sum) Thank you very much in advance for your assitance. Luca 2015-03-21 13:18 GMT+01:00 Bert Gunter : > 1. Still not sure what you mean, but maybe look at ?ave and ?tapply, > for which ave() is a wrapper. > > 2. You still need to heed the rest of Jeff's advice. 
> > Cheers, > Bert > > Bert Gunter > Genentech Nonclinical Biostatistics > (650) 467-7374 > > "Data is not information. Information is not knowledge. And knowledge > is certainly not wisdom." > Clifford Stoll > > > > > On Sat, Mar 21, 2015 at 4:53 AM, Luca Meyer wrote: > > Hi Jeff & other R-experts, > > > > Thank you for your note. I have tried myself to solve the issue without > > success. > > > > Following your suggestion, I am providing a sample of the dataset I am > > using below (also downloadble in plain text from > > https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0): > > > > #this is an extract of the overall dataset (n=1200 cases) > > f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B", > > "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", > > "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C", > > "B", "B", "B", "C", "C", "C"), v4 = c(18.1853007621835, 3.43806581506388, > > 0.002733567617055, 1.42917483425029, 1.05786640463504, > > 0.000420548864162308, > > 2.37232740842861, 3.01835841813241, 0, 1.13430282139936, > 0.928725667117666, > > 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names > = > > c(2L, > > 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L)) > > > > I need to find a automated procedure that allows me to adjust v3 > marginals > > while maintaining v1xv2 marginals unchanged. > > > > That is: modify the v4 values you can find by running: > > > &g
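The by-eye comparison of the aggregate() outputs can be wrapped in a small check. This is a sketch in which same_marginal() is a hypothetical helper, and f1 and f2 are the input and desired output from the message above, rebuilt here with data.frame() so the block runs on its own.

```r
f1 <- data.frame(
  v1 = rep(c("A", "B"), each = 6),
  v2 = rep(c("A", "B", "C"), times = 4),
  v3 = rep(rep(c("B", "C"), each = 3), times = 2),
  v4 = c(18.18530, 3.43806, 0.00273, 1.42917, 1.05786, 0.00042,
         2.37232, 3.01835, 0, 1.13430, 0.92872, 0),
  stringsAsFactors = FALSE
)

# Desired output: same layout, with v4 moved between v3 categories.
f2 <- f1
f2$v4 <- c(17.83529, 3.43806, 0.00295, 1.77918, 1.05786, 0.0002,
           2.37232, 3.01835, 0, 1.13430, 0.92872, 0)

# TRUE when both data frames give the same sums for the given formula,
# up to the default numerical tolerance of all.equal().
same_marginal <- function(a, b, formula) {
  isTRUE(all.equal(aggregate(formula, a, sum), aggregate(formula, b, sum)))
}

same_marginal(f1, f2, v4 ~ v1 + v2)  # v1 x v2 marginals: expected TRUE
same_marginal(f1, f2, v4 ~ v3)       # v3 marginals: expected FALSE
```

Using all.equal() instead of `==` avoids false alarms from floating-point rounding when the sums are computed along different paths.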
Re: [R] Joining two datasets - recursive procedure?
Hi Jeff & other R-experts,

Thank you for your note. I have tried to solve the issue myself, without success.

Following your suggestion, I am providing below a sample of the dataset I am using (also downloadable in plain text from https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0):

# this is an extract of the overall dataset (n=1200 cases)
f1 <- structure(list(
  v1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  v2 = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"),
  v3 = c("B", "B", "B", "C", "C", "C", "B", "B", "B", "C", "C", "C"),
  v4 = c(18.1853007621835, 3.43806581506388, 0.002733567617055, 1.42917483425029, 1.05786640463504, 0.000420548864162308, 2.37232740842861, 3.01835841813241, 0, 1.13430282139936, 0.928725667117666, 0)),
  .Names = c("v1", "v2", "v3", "v4"), class = "data.frame",
  row.names = c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L))

I need to find an automated procedure that allows me to adjust the v3 marginals while keeping the v1xv2 marginals unchanged. That is: modify the v4 values you can find by running:

aggregate(f1[,c("v4")],list(f1$v3),sum)

while keeping constant the values you can find by running:

aggregate(f1[,c("v4")],list(f1$v1,f1$v2),sum)

Does it make sense now? Please note that I have tried to build some syntax that modifies the values within each v1xv2 combination by computing the sum of v4 and the row percentages in terms of v4, and that is where I am stuck. I am not really sure how I should proceed.

Any suggestions?

Thanks,
Luca

2015-03-19 2:38 GMT+01:00 Jeff Newmiller:
> I don't understand your description. The standard practice on this list is to provide a reproducible R example [1] of the kind of data you are working with (and any code you have tried) to go along with your description. In this case, that would be two dputs of your input data frames and a dput of an output data frame (generated by hand from your input data frame). (Probably best to not use the full number of input values just to keep the size down.) We could then make an attempt to generate code that goes from input to output.
>
> Of course, if you post that hard work using HTML then it will get corrupted (much like the text below from your earlier emails) and we won't be able to use it. Please learn to post from your email software using plain text when corresponding with this mailing list.
>
> [1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
> ---
> Jeff Newmiller
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---
> Sent from my phone. Please excuse my brevity.
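One possible way to approach the adjustment described in this message is iterative proportional fitting (raking), the same algorithm that loglin() uses for loglinear models of the kind suggested earlier in the thread. This is only a sketch and not something proposed by the posters: the toy data, the target v3 totals and the helper scale_to() are all hypothetical, and the method assumes non-negative v4 values, so it would need rethinking for the negative cells mentioned in the original post.

```r
# Hypothetical toy data in the same shape as f1 (all v4 values positive)
d <- data.frame(
  v1 = rep(c("A", "B"), each = 4),
  v2 = rep(rep(c("A", "B"), each = 2), times = 2),
  v3 = rep(c("B", "C"), times = 4),
  v4 = c(18, 2, 3, 1, 2, 1, 3, 1),
  stringsAsFactors = FALSE
)

# Hypothetical target totals for v3; they must add up to sum(d$v4) = 31
target_v3 <- c(B = 24, C = 7)

# Rescale v4 so that the group totals match 'target' (a named vector)
scale_to <- function(v4, group, target) {
  current <- vapply(split(v4, group), sum, numeric(1))
  ratio <- target[names(current)] / current
  ratio[!is.finite(ratio)] <- 1   # groups summing to zero are left untouched
  v4 * unname(ratio[as.character(group)])
}

cell <- paste(d$v1, d$v2, sep = ".")                     # v1 x v2 cell label
cell_tot <- vapply(split(d$v4, cell), sum, numeric(1))   # totals to preserve

for (k in seq_len(200)) {
  d$v4 <- scale_to(d$v4, d$v3, target_v3)   # move the v3 margin to the target
  d$v4 <- scale_to(d$v4, cell, cell_tot)    # restore every v1 x v2 total
  v3_now <- vapply(split(d$v4, d$v3), sum, numeric(1))
  gap <- max(abs(v3_now - target_v3[names(v3_now)]))
  if (gap < 1e-10) break
}

aggregate(v4 ~ v3, data = d, FUN = sum)        # should be close to target_v3
aggregate(v4 ~ v1 + v2, data = d, FUN = sum)   # v1 x v2 totals as before
```

Because the last step of every pass restores the v1xv2 totals, those are preserved up to floating-point error, while the v3 margin is only matched as closely as the iteration gets.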
[R] Joining two datasets - recursive procedure?
Thanks for your input Michael,

The continuous variable I have measures quantities (down to the 3rd decimal place), so unfortunately the values are not frequencies.

Any more specific suggestions on how that could be tackled?

Thanks & kind regards,
Luca

===

Michael Friendly wrote:
I'm not sure I understand completely what you want to do, but if the data were frequencies, it sounds like a task for fitting a loglinear model with the model formula

~ V1*V2 + V3
[R] Joining two datasets - recursive procedure?
Hello,

I am facing quite a challenging task (at least for me) and I was wondering if someone could advise how R could help me speed it up.

I am dealing with a dataset with 3 discrete variables and one continuous variable. The discrete variables are:

V1: 8 modalities
V2: 13 modalities
V3: 13 modalities

The continuous variable V4 is a decimal number that is always greater than zero in the marginals of each of the 3 variables, but it is sometimes equal to zero (and sometimes negative) in the joint tables.

I have got 2 files:

=> one with the distribution of all possible combinations of V1xV2 (some of which are zero or negative) and
=> one with the marginal distribution of V3.

I am trying to build the long and narrow dataset V1xV2xV3 in such a way that each V1xV2 cell does not get modified and V3 fits as closely as possible to its marginal distribution. Does it make sense?

To be even more specific, my 2 input files look like the following.

FILE 1
V1,V2,V4
A, A, 24.251
A, B, 1.065
(...)
B, C, 0.294
B, D, 2.731
(...)
H, L, 0.345
H, M, 0.000

FILE 2
V3, V4
A, 1.575
B, 4.294
C, 10.044
(...)
L, 5.123
M, 3.334

What I need to achieve is a file such as the following:

FILE 3
V1, V2, V3, V4
A, A, A, ???
A, A, B, ???
(...)
D, D, E, ???
D, D, F, ???
(...)
H, M, L, ???
H, M, M, ???

Please notice that FILE 3 needs to be such that if I aggregate on V1+V2 I recover exactly FILE 1, and that if I aggregate on V3 I recover a file as close as possible to FILE 2 (ideally the same file).

Can anyone suggest how I could do that with R?

Thank you very much indeed for any assistance you are able to provide.

Kind regards,
Luca
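If FILE 1 and FILE 2 are the only information available, one simple construction spreads each V1xV2 value over the V3 categories in proportion to the V3 marginals. The sketch below uses made-up numbers and assumes that the grand totals of the two files agree; under that assumption, aggregating over V3 returns FILE 1 and aggregating over V1 and V2 returns FILE 2. Because the allocation is linear it also tolerates zero or negative V1xV2 cells, but it imposes independence between V3 and the V1xV2 cells, which may or may not be acceptable for the data at hand.

```r
# Hypothetical stand-ins for FILE 1 (V1 x V2 values) and FILE 2 (V3 marginals);
# both are constructed so that their V4 columns sum to 25
file1 <- data.frame(V1 = c("A", "A", "B", "B"),
                    V2 = c("A", "B", "A", "B"),
                    V4 = c(24.251, 1.065, 0.294, -0.610),
                    stringsAsFactors = FALSE)
file2 <- data.frame(V3 = c("A", "B", "C"),
                    V4 = c(5, 7.5, 12.5),
                    stringsAsFactors = FALSE)

# Share of the grand total that belongs to each V3 category
share <- file2$V4 / sum(file2$V4)

# Every FILE 1 row paired with every FILE 2 row
idx <- expand.grid(i = seq_len(nrow(file1)), j = seq_len(nrow(file2)))

file3 <- data.frame(V1 = file1$V1[idx$i],
                    V2 = file1$V2[idx$i],
                    V3 = file2$V3[idx$j],
                    V4 = file1$V4[idx$i] * share[idx$j],
                    stringsAsFactors = FALSE)

# Both aggregations can be compared with the input files
back1 <- merge(file1, aggregate(V4 ~ V1 + V2, data = file3, FUN = sum),
               by = c("V1", "V2"), suffixes = c(".file1", ".file3"))
back2 <- aggregate(V4 ~ V3, data = file3, FUN = sum)
back1
back2
```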
Re: [R] How to verify char variables contain at least one value
Hi Jim,

Thank you, it works indeed :)

Luca

2014/1/2 Jim Lemon:
> Hi Luca,
> Perhaps you are looking for something like this:
>
> d2check<-unlist(apply(as.matrix(d2[,paste("V",13:239,sep="")]),1,nchar))
> # to test for any non empty rows
> any(d2check)
>
> Jim
[R] How to verify char variables contain at least one value
Happy new year fellows,

I am trying to do something I believe should be fairly straightforward, but I cannot find my way out.

My dataset d2 is 26 rows by 245 columns, exclusively char variables. I would like to check whether at least one column from V13 to V239 (they are in numerical sequence) has been filled in, so I try

d2$check <- c(d2$V13:d2$V239)

and/or

d2$check <- paste(d2$V13:d2$V239,sep="")

but I get (translated from Italian):

Error in d2$V13:d2$V239 : argument NA/NaN

I have tried nchar but the same error occurs. I have also tried to run the above functions on a smaller subset of variables (V13, V14, V15, see below for details), just to double check in case some variable was erroneously in another format, but the same error occurs.

> d2$V13
 [1] "" "" "" "" "" "" "da -5.1% a -10%" ""
 [9] "" "" "" "" "" "" "" ""
[17] "" "" "" "" "" "" "" ""
[25] "" ""
> d2$V14
 [1] "" "" "" "" "" "" "da -10.1% a -15%" ""
 [9] "" "" "" "" "" "" "" ""
[17] "" "" "" "" "" "" "" ""
[25] "" ""
> d2$V15
 [1] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""

Can anyone suggest an alternative function for me to create a variable that checks whether there is at least one value for each of the 26 records I need to analyze?

Thank you in advance,
Luca
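The error in this post arises because the `:` operator builds a numeric sequence from two values; it does not select a range of columns. A minimal sketch of a per-row check follows, using a tiny made-up d2 with only three columns (with the real data the column names would be built with paste0("V", 13:239)).

```r
# Hypothetical three-row, three-column stand-in for d2
d2 <- data.frame(V13 = c("", "da -5.1% a -10%", ""),
                 V14 = c("da -10.1% a -15%", "", ""),
                 V15 = c("", "", ""),
                 stringsAsFactors = FALSE)

# Columns to inspect, selected by name
cols <- paste0("V", 13:15)

# TRUE for every row in which at least one of the columns is non-empty
d2$check <- rowSums(d2[, cols] != "", na.rm = TRUE) > 0
d2$check
```

Comparing the selected columns with "" gives a logical matrix, so rowSums() counts the filled-in cells per record and the result has one value per row.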
Re: [R] sourcing from 2 different computers R code
Hi,

Thanks for the advice. I solved it by using an option on the drop-down menu which I had not seen earlier, and got:

source("c:\\Users\\...\\filename.R")

Thank you for the prompt reply,
Luca

2013/11/12 Pascal Oettli:
> Hello,
>
> What is the result when you use source("C:/Users/...R")?
>
> Regards,
> Pascal
>
> --
> Pascal Oettli
> Project Scientist
> JAMSTEC
> Yokohama, Japan
[R] sourcing from 2 different computers R code
Hi,

I have a piece of code sitting in a Dropbox directory and have installed R 3.0.2 on 2 machines: one MacBook Pro and one Sony Vaio PC.

Now, when I use

source("/Users/R")

to call the script from the Mac there are no problems, but when I use

source("C:\Users\...R")

to call the script from the Sony Vaio I get the following:

Error: '\U' used without hex digits in character string starting "'C:\U"

What am I doing wrong?

Thanks in advance,
Luca
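Inside an R string a backslash starts an escape sequence, which is why "\U" triggers the error quoted above. The sketch below shows the two usual spellings of a Windows path and one way to let the same script run on both machines; the folder and file names are hypothetical.

```r
# Hypothetical Windows location of the script
win_path <- "C:\\Users\\me\\Dropbox\\myscript.R"   # every backslash doubled
fwd_path <- "C:/Users/me/Dropbox/myscript.R"       # forward slashes also work on Windows

# Both strings describe the same file once the separators are normalised
same <- identical(gsub("\\", "/", win_path, fixed = TRUE), fwd_path)

# file.path() joins its arguments with "/" on every platform
built <- file.path("C:", "Users", "me", "Dropbox", "myscript.R")

# Pick the base folder by operating system, so one call serves both machines
base_dir <- if (.Platform$OS.type == "windows") {
  "C:/Users/me/Dropbox"
} else {
  "/Users/me/Dropbox"
}
script <- file.path(base_dir, "myscript.R")
# source(script)   # not run here: the file is hypothetical
```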
Re: [R] Uploading Google Spreadsheet data into R
It does indeed.

Thank you David,
Luca

2013/11/8 David Carlson:
> Stripping down to the bare essentials seems to get it. In particular making the query just "select *" instead of "select * where B!=''" works. You don't need the processing that the more complicated Guardian web page requires. After loading the RCurl package and creating the gsqAPI function:
>
> > tmp=gsqAPI("0AkvLBhzbLcz5dHljNGhUdmNJZ0dOdGJLTVRjTkRhTkE","select *", 0)
> > str(tmp)
> 'data.frame': 9 obs. of 3 variables:
>  $ COL1: chr "25/10/2013" "25/10/2013" "31/10/2013" "31/10/2013" ...
>  $ COL2: int 50 10 16 18 25 34 56 47 50
>  $ COL3: chr "TEXT" "TEXT TEXT" "TEXT" "TEXT" ...
> > tmp
>         COL1 COL2      COL3
> 1 25/10/2013   50      TEXT
> 2 25/10/2013   10 TEXT TEXT
> 3 31/10/2013   16      TEXT
> 4 31/10/2013   18      TEXT
> 5 31/10/2013   25 TEXT TEXT
> 6 31/10/2013   34      TEXT
> 7 31/10/2013   56      TEXT
> 8 31/10/2013   47      TEXT
> 9 31/10/2013   50      TEXT
>
> -----
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
[R] Uploading Google Spreadsheet data into R
Hello,

I am trying to load data I have in a Google Spreadsheet into R to perform some analysis. I regularly update the data and need to perform the analysis in the quickest possible way, i.e. without needing to publish the data, so I was wondering how to make this piece of code (source http://www.r-bloggers.com/datagrabbing-commonly-formatted-sheets-from-a-google-spreadsheet-guardian-2014-university-guide-data/) work with my dataset (see https://docs.google.com/spreadsheet/ccc?key=0AkvLBhzbLcz5dHljNGhUdmNJZ0dOdGJLTVRjTkRhTkE#gid=0 ):

library(RCurl)
gsqAPI = function(key,query,gid=0){
  tmp=getURL( paste( sep="",'https://spreadsheets.google.com/tq?', 'tqx=out:csv','&tq=', curlEscape(query), '&key=', key, '&gid=', gid), ssl.verifypeer = FALSE )
  return( read.csv( textConnection( tmp ), stringsAsFactors=FALSE ) )
}
handler=function(key,i){
  tmp=gsqAPI(key,"select * where B!=''", i)
  subject=sub(".Rank",'',colnames(tmp)[1])
  colnames(tmp)[1]="Subject.Rank"
  tmp$subject=subject
  tmp
}
key='0AkvLBhzbLcz5dHljNGhUdmNJZ0dOdGJLTVRjTkRhTkE'
gdata=handler(key,0)

The code is currently returning the following:

Error in `$<-.data.frame`(`*tmp*`, "subject", value = "COL1") : replacement has 1 row, data has 0

Thank you in advance,
Luca
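The message "replacement has 1 row, data has 0" indicates that the query returned a data frame with no rows, so the later assignment of a single value to tmp$subject fails. The sketch below reproduces that situation offline, without contacting Google; the helper add_subject() is hypothetical and only illustrates one way to make the labelling step tolerate an empty result.

```r
# A zero-row result, standing in for a query whose "where" clause matched nothing
tmp <- data.frame(COL1 = character(0), stringsAsFactors = FALSE)

# Assigning a length-one value to a zero-row data frame raises an error
msg <- tryCatch({
  tmp$subject <- "COL1"
  "no error"
}, error = function(e) conditionMessage(e))
msg

# Hypothetical guard: repeat the label once per row, which also covers zero rows
add_subject <- function(tmp, subject) {
  tmp$subject <- rep(subject, nrow(tmp))
  tmp
}

empty_ok <- add_subject(tmp, "COL1")
filled_ok <- add_subject(data.frame(COL1 = c("a", "b")), "COL1")
```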
Re: [R] Reading .gsheet within R
Thank you Henrik & the others that have commented.

Accessing the actual online data is what I would need, but apparently this is not yet feasible…

Luca

On 01 Dec 2012, at 04:53, Henrik Bengtsson wrote:

> On Fri, Nov 30, 2012 at 9:43 AM, Luca Meyer wrote:
>> Hello R-experts,
>>
>> I would like to know if there is a solution to read files with extension .gsheet directly into R - see http://www.fileinfo.com/extension/gsheet for more info on this file format.
>
> AFAIK, those files (*.gsheet, *.gdoc, *.gslides) are just tiny JSON files containing references to the online/cloud resource (specifying the "url" and the "resource_id"). There are several packages on CRAN for parsing JSON files. Accessing the actual online data is a different story...
>
> My $0.02
>
> /Henrik
[R] Reading .gsheet within R
Hello R-experts,

I would like to know if there is a solution to read files with extension .gsheet directly into R - see http://www.fileinfo.com/extension/gsheet for more info on this file format.

Thank you,
Luca

Mr. Luca Meyer
www.lucameyer.com
R 2.15.1
Mac OS X 10.8.2
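According to the reply in this thread, a .gsheet file is a small JSON file holding a "url" and a "resource_id" for the online spreadsheet. The sketch below pulls those two fields out with base R only; the file content shown is invented, the helper json_field() is hypothetical, and a proper JSON parser from CRAN would be more robust. It only locates the resource and does not download any spreadsheet data.

```r
# Hypothetical content of a .gsheet file (the real fields and values may differ)
gsheet_txt <- '{"url": "https://example.com/my-sheet", "resource_id": "spreadsheet:abc123"}'
# With a real file: gsheet_txt <- paste(readLines("myfile.gsheet", warn = FALSE), collapse = "")

# Minimal extraction of one string-valued field from the JSON text
json_field <- function(txt, name) {
  pattern <- paste0('"', name, '"[[:space:]]*:[[:space:]]*"([^"]*)"')
  regmatches(txt, regexec(pattern, txt))[[1]][2]
}

sheet_url <- json_field(gsheet_txt, "url")
sheet_id <- json_field(gsheet_txt, "resource_id")
```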
[R] Leading plus in numeric fields
Hello R experts,

I have got this data frame:

'data.frame': 1 obs. of 20 variables:
 $ Anno  : chr "PREVISIONI VS TARGET"
 $ OreTot: num 41
 $ GioTot: logi NA
 $ OrGTot: logi NA
 $ OreCli: num 99
 $ GioCli: logi NA
 $ OrGCli: logi NA
 $ OreFor: num -27
 $ GioFor: logi NA
 $ OrGFor: logi NA
 $ OreOrt: num -18
 $ GioOrt: logi NA
 $ OrGOrt: logi NA
 $ OreSpo: num -6
 $ GioSpo: logi NA
 $ OrGSpo: logi NA
 $ OreUff: num -7
 $ GioUff: logi NA
 $ OrGUff: logi NA
 $ temp  : num 0

Is there any way I can format the numeric fields so that I get a leading "+" whenever the value is > 0? In this specific case I would need something like:

'data.frame': 1 obs. of 20 variables:
 $ Anno  : chr "PREVISIONI VS TARGET"
 $ OreTot: num +41
 $ GioTot: logi NA
 $ OrGTot: logi NA
 $ OreCli: num +99
 $ GioCli: logi NA
 $ OrGCli: logi NA
 $ OreFor: num -27
 $ GioFor: logi NA
 $ OrGFor: logi NA
 $ OreOrt: num -18
 $ GioOrt: logi NA
 $ OrGOrt: logi NA
 $ OreSpo: num -6
 $ GioSpo: logi NA
 $ OrGSpo: logi NA
 $ OreUff: num -7
 $ GioUff: logi NA
 $ OrGUff: logi NA
 $ temp  : num 0

Thank you in advance,
Luca

Mr. Luca Meyer
www.lucameyer.com
R version 2.15.1
Mac OS X 10.8
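A numeric value carries no sign formatting of its own, so a leading "+" has to be added when the numbers are turned into text for display. A minimal sketch, using a cut-down made-up version of the data frame above and a hypothetical helper plus_fmt():

```r
# Prefix strictly positive values with "+"; zero and negatives are left as they print
plus_fmt <- function(v) ifelse(v > 0, paste0("+", v), as.character(v))

x <- c(41, 99, -27, -18, -6, -7, 0)
plus_fmt(x)

# sprintf() can add the sign too, but it also marks zero as "+0"
sprintf("%+.0f", x)

# Applied to the numeric columns of a (shortened, hypothetical) data frame
dat <- data.frame(Anno = "PREVISIONI VS TARGET", OreTot = 41, GioTot = NA,
                  OreFor = -27, temp = 0, stringsAsFactors = FALSE)
num <- vapply(dat, is.numeric, logical(1))
dat[num] <- lapply(dat[num], plus_fmt)
dat
```

The formatted columns become character vectors, so this is best done on a copy used only for printing or reporting.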
Re: [R] regexpr with accents
Thanks Arun,

It works all right. I just found out that my problem was not with accents but with the correct spelling of "some text".

Kind regards,
Luca

On 06 Aug 2012, at 15:01, arun wrote:

> Hi,
>
> Here, the strings within the quotes are read exactly as written. So, you may have to use the symbol instead of the "friendly" or "numeric" codes from the link. Or you have to convert those.
>
> d1 <- data.frame(V1 = 1:4,
>   V2 = c("some text = 9", "some tèxt = 9", "some t&egrave;xt = 9", "some t&#232;xt = 9"))
>
> d1$V1[regexpr("some tèxt = 9",d1$V2)>0] <- 9
> d1$V1[regexpr("some t&egrave;xt = 9",d1$V2)>0] <- 9
> d1$V1[regexpr("some t&#232;xt = 9",d1$V2)>0] <- 9
>
> d1
>   V1                   V2
> 1  1        some text = 9
> 2  9        some tèxt = 9
> 3  9 some t&egrave;xt = 9
> 4  9   some t&#232;xt = 9
>
> A.K.
[R] regexpr with accents
Sorry but my previous email did not go through properly. Instead of the ? you should really read an &egrave; or &#232; according to http://www.lookuptables.com/.

So there are extended ASCII characters I need to deal with.

I have tried

d1$V1[regexpr("some t&egrave;xt = 9",d1$V2)>0] <- 9

and

d1$V1[regexpr("some t&#232;xt = 9",d1$V2)>0] <- 9

without success...

Thanks,
Luca
[R] regexpr with accents
Hello,

I have built a syntax to find out whether a given substring is included in a larger string. It works like this:

d1$V1[regexpr("some text = 9",d1$V2)>0] <- 9

and this works all right as long as "some text" contains only standard ASCII characters. However, it does not work when accents are included, as in the following:

d1$V1[regexpr("some tèxt = 9",d1$V2)>0] <- 9

I have tried to substitute "è" with several wildcards but it did not work. Can anyone suggest how to have the syntax parse the string ignoring the accent?

Thank you in advance,
Luca
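One way to match a string whether or not a letter carries an accent is to put both variants in a character class. The sketch below uses made-up data and writes the accented letter as the escape "\u00e8" so that the code does not depend on the encoding of the script file; it assumes the strings in V2 really contain the accented character rather than an HTML entity.

```r
# Hypothetical data: one plain spelling, one accented spelling, one non-match
d1 <- data.frame(V1 = 1:3,
                 V2 = c("some text = 9", "some t\u00e8xt = 9", "something else"),
                 stringsAsFactors = FALSE)

# [e\u00e8] matches either the plain or the accented letter
hit <- grepl("some t[e\u00e8]xt = 9", d1$V2)
d1$V1[hit] <- 9
d1
```

grepl() returns TRUE or FALSE directly, which avoids the regexpr(...) > 0 comparison used in the post.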
Re: [R] Multiple lines for each record: how do I handle that
Hi Jorge,

The method you suggest is indeed working fine on the small sample data set. When I apply it to a larger dataset (714 rows by 160 columns) it transforms some variables from "factor" to "list". How can I change them back to their original class in an automatic way?

Thanks,
Luca

On 22 Feb 2012, at 21:05, Jorge I Velez wrote:

> Hi Luca,
>
> Thank you for the example. Here is one way of doing what you want (of course there are many of them!):
>
> # data
> d0 <- structure(list(id = c(1, 1, 2, 2, 2, 3), v1 = c(NA, 1, NA, 1, NA, 1),
>   v2 = structure(c(3L, 1L, 2L, 1L, 1L, 3L), .Label = c("", "no", "yes"), class = "factor"),
>   v3 = structure(c(NA, 1L, NA, NA, 3L, 2L), .Label = c("1", "2", "3"), class = "factor")),
>   .Names = c("id", "v1", "v2", "v3"), row.names = c(NA, -6L), class = "data.frame")
>
> # processing
> out <- lapply(split(d0, d0$id), function(l) apply(l[,-1], 2, function(x) x[!is.na(x) & x != ""]))
> out <- data.frame(do.call(rbind, out))
>
> # output
> cbind(id = unique(d0$id), out)
>
> Perhaps plyr would be a better way ;-)
>
> HTH,
> Jorge.-
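A possible variant that keeps every column in its original class is to take, within each id, the first value that is neither NA nor an empty string. This is a sketch on the sample data from the thread, not the method posted above; the helpers first_valid() and collapse_by_id() are hypothetical, and it assumes that at most one meaningful value per column and id matters (later ones are ignored).

```r
# Sample data from the thread
id <- c(1, 1, 2, 2, 2, 3)
v1 <- c(NA, 1, NA, 1, NA, 1)
v2 <- c("yes", "", "no", "", "", "yes")
v3 <- as.factor(c(NA, 1, NA, NA, 3, 2))
d0 <- data.frame(id, v1, v2, v3, stringsAsFactors = FALSE)

# First value that is neither NA nor ""; an NA of the same class if there is none
first_valid <- function(x) {
  keep <- !is.na(x) & as.character(x) != ""
  if (any(keep)) x[keep][1] else x[NA_integer_]
}

# One row per id, built column by column so classes (and factor levels) survive
collapse_by_id <- function(d, id = "id") {
  parts <- lapply(split(d, d[[id]]), function(g) {
    as.data.frame(lapply(g, first_valid), stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, unname(parts))
  rownames(out) <- NULL
  out
}

d1 <- collapse_by_id(d0)
d1
sapply(d1, class)
```

Working on whole columns with lapply() rather than apply() avoids the conversion to a character matrix, which is one common reason for classes being lost in this kind of reshaping.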
Re: [R] Multiple lines for each record: how do I handle that
Sure, I am sorry I did not do that in the first place.

The datasets I have look like:

id <- c(1,1,2,2,2,3)
v1 <- c(NA,1,NA,1,NA,1)
v2 <- as.character(c("yes","","no","","","yes"))
v3 <- as.factor(c(NA,1,NA,NA,3,2))
d0 <- data.frame(id,v1,v2,v3)
d0

What I would need is to derive a dataset that looks like:

id <- c(1,2,3)
v1 <- c(1,1,1)
v2 <- as.character(c("yes","no","yes"))
v3 <- as.factor(c(1,3,2))
d1 <- data.frame(id,v1,v2,v3)
d1

The issue is the need for an automated procedure that reads in the different variable types and aggregates them accordingly, as every dataset will differ from the previous one in the number of variables and records involved.

Thank you,
Luca

On 22 Feb 2012, at 20:26, Sarah Goslee wrote:

> If you provide a small reproducible example of your data format and expected output, I'm sure someone here can offer a useful solution.
>
> Without knowing what your data look like, not so easy.
>
> Sarah
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
[R] Multiple lines for each record: how do I handle that
Hi Folks,

I just discovered that my dataset (coming from the QuestionPro platform) has multiple lines for each respondent id, but what I really need is a "regular" data matrix where each respondent's data is shown on a single line.

Has anyone already developed a procedure that automatically takes the multiple lines and aggregates them into a single line?

Thank you in advance,
Luca

Mr. Luca Meyer
www.lucameyer.com
R version 2.14.1 (2011-12-22)
Mac OS X 10.6.8
[R] Row percentage labels within mosaic graph
Hi,

I have the following sample data

x <- c(1,2,1,1,2,3,1,2,3,2,3,1,2,2,1,3,2,3,1,3)
y <- c(1,2,1,2,2,2,1,2,1,1,1,1,2,2,1,2,2,1,2,1)
w <- c(1, 1, 1.5, 1, 1.2, 0.8, 0.9, 1.7, 1, 1.3, 1, 1, 0.7, 0.8, 1.4, 1.3, 1, 1, 0.9, 0.7)
d1 <- data.frame(x,y,w)

and I wish to build an x,y mosaic graph that shows both row percentages and number of cases as labels. So far I have been using this script:

require(gmodels)
require(vcd)
d2 <- xtabs(~ x + y, data=d1)
mosaic(d2,
  gp = shading_max,
  labeling_args = list(
    gp_labels = gpar(fontsize = 10, fontface = 1),
    rot_labels = c(0,90,90,0),
    gp_varnames = gpar(fontsize = 0, fontface = 2)
  ),
  main = "title",
  main_gp = gpar(fontsize = 16, fontface = 2),
  pop = FALSE
)
d3 <- CrossTable(d1$x,d1$y)
etichette <- ifelse(d2 < 5, "<5", paste(round(d3$prop.row*100, digits=1),"%\n(n=",d2,")", sep=""))
labeling_cells(text = etichette, clip = FALSE, gp_text=gpar(fontsize=10))(d2)

This works just fine, but now I have to apply the weight w to the computation. I have modified the first part of the above script to

d2 <- round(xtabs(w ~ x + y, data=d1), digits=0)
mosaic(d2,
  gp = shading_max,
  labeling_args = list(
    gp_labels = gpar(fontsize = 10, fontface = 1),
    rot_labels = c(0,90,90,0),
    gp_varnames = gpar(fontsize = 0, fontface = 2)
  ),
  main = "title",
  main_gp = gpar(fontsize = 16, fontface = 2),
  pop = FALSE
)

but I have some difficulty with the labeling part. I can show the number of observations in the labels using:

d3 <- as.list(round(xtabs(w ~ x + y, data=d1), digits=0))
etichette <- ifelse(d2 < 5, "<5", paste("(n=",d3,")", sep=""))
labeling_cells(text = etichette, clip = FALSE, gp_text=gpar(fontsize=10))(d2)

but I would need to show row proportions. Do you know how I can do that?

Thanks,
Luca

Mr. Luca Meyer
www.lucameyer.com
R version 2.13.1 (2011-07-08)
Mac OS X 10.6.8
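Weighted row percentages can be obtained in base R by applying prop.table() to the weighted cross-tabulation. The sketch below builds the label matrix only and leaves the plotting to the vcd calls from the post (shown as a comment, not run here); the object names wt, rowpct, n_lab and pct_lab are illustrative, and the percentages are computed from the unrounded weighted counts, which is an assumption about what is wanted.

```r
# Sample data from the post
x <- c(1,2,1,1,2,3,1,2,3,2,3,1,2,2,1,3,2,3,1,3)
y <- c(1,2,1,2,2,2,1,2,1,1,1,1,2,2,1,2,2,1,2,1)
w <- c(1, 1, 1.5, 1, 1.2, 0.8, 0.9, 1.7, 1, 1.3, 1, 1, 0.7, 0.8, 1.4, 1.3, 1, 1, 0.9, 0.7)
d1 <- data.frame(x, y, w)

wt <- xtabs(w ~ x + y, data = d1)            # weighted counts, unrounded
rowpct <- prop.table(wt, margin = 1) * 100   # row percentages of the weighted counts

n_lab <- round(as.vector(wt))                # rounded weighted n per cell
pct_lab <- round(as.vector(rowpct), 1)

# Same labelling rule as in the post: hide cells with fewer than 5 (weighted) cases
etichette <- matrix(ifelse(n_lab < 5, "<5",
                           paste0(pct_lab, "%\n(n=", n_lab, ")")),
                    nrow = nrow(wt), dimnames = dimnames(wt))
etichette

# With vcd loaded and the mosaic of the rounded table d2 drawn with pop = FALSE:
# labeling_cells(text = etichette, clip = FALSE, gp_text = gpar(fontsize = 10))(d2)
```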
[R] ordering rows within CrossTable
Hi,

I am running the following (masked) code:

    set.seed(23)
    city <- sample(c("C1","C2"),size=100,replace=T)
    reason <- sample(c("R1","R2","R3","R4"),size=100,replace=T)
    df <- data.frame(city,reason)
    library(gmodels)
    CrossTable(df$reason,df$city,prop.r=F,prop.c=F,prop.t=F,prop.chisq=F)

And I get the following output:

                 | df$city
       df$reason |        C1 |        C2 | Row Total |
    -------------|-----------|-----------|-----------|
              R1 |         4 |        13 |        17 |
    -------------|-----------|-----------|-----------|
              R2 |        19 |        10 |        29 |
    -------------|-----------|-----------|-----------|
              R3 |        12 |        13 |        25 |
    -------------|-----------|-----------|-----------|
              R4 |        11 |        18 |        29 |
    -------------|-----------|-----------|-----------|
    Column Total |        46 |        54 |       100 |
    -------------|-----------|-----------|-----------|

I would like to have df$reason sorted by decreasing Row Total, that is showing R2, R4, R3 and finally R1. How can I do that?

Thanks,
Luca

Mr. Luca Meyer
www.lucameyer.com
R version 2.13.1 (2011-07-08)
Mac OS X 10.6.8
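One common approach is to reorder the factor levels by frequency before tabulating, since tables list rows in level order. The sketch below rebuilds data with the same cell counts as the output above (rather than relying on `set.seed`, whose results differ across R versions) and uses base `table()`. Passing the reordered factor to `CrossTable()` should give the same row order, but that is an assumption and is not run here.

```r
# Data with the same cell counts as the table in the question
reason <- rep(c("R1", "R2", "R3", "R4"), times = c(17, 29, 25, 29))
city   <- c(rep(c("C1", "C2"), times = c(4, 13)),
            rep(c("C1", "C2"), times = c(19, 10)),
            rep(c("C1", "C2"), times = c(12, 13)),
            rep(c("C1", "C2"), times = c(11, 18)))
df <- data.frame(city, reason)

# Levels ordered by decreasing frequency; order() is stable, so ties keep their original order
counts <- table(df$reason)
ord    <- names(counts)[order(-as.vector(counts))]

df$reason <- factor(df$reason, levels = ord)
tab <- table(df$reason, df$city)   # rows now appear as R2, R4, R3, R1
```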
Re: [R] Showing zero frequencies with xtabs
Thanks Peter & Petr,

It was indeed an issue of having some character variables in there. Now it works just fine.

Cheers,
Luca

On 30 Aug 2011, at 10:15, peter dalgaard wrote:

> On Aug 30, 2011, at 10:04 , Luca Meyer wrote:
>
>> Hi,
>>
>> Does anyone know how to show variable levels with zero frequencies with the xtabs command? They show with the table(x,y) command, but I need to apply weights to frequency tables and I also need to cbind several tables together, which implies that they all need to show the same number of rows.
>
> Are you sure you are doing the same thing as with table()? I'd expect it to work if you ensure that the variables are factors:
>
>     > library(ISwR)
>     > xtabs(~sex+menarche,data=juul)
>        menarche
>     sex   1   2
>       2 369 335
>
>     > juul$sex <- factor(juul$sex,levels=1:2)
>     > xtabs(~sex+menarche,data=juul)
>        menarche
>     sex   1   2
>       1   0   0
>       2 369 335
>
>> Alternatively, do you know how to column bind tables with different numbers of rows? I cannot use merge as it requires a data.frame and that modifies the look of the banner table I am trying to create...
>>
>> Thanks,
>> Luca
>
> --
> Peter Dalgaard, Professor
> Center for Statistics, Copenhagen Business School
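Peter's point (declare the variables as factors with their full set of levels) can be combined with weights and `cbind()` for the banner. The following is a small sketch with made-up data; the column names `answer`, `region`, `sex` and `w` are hypothetical and not from the thread.

```r
# Hypothetical survey data: answer "C" and sex "M" never occur
d <- data.frame(
  answer = c("A", "A", "B", "A"),
  region = c("N", "S", "N", "N"),
  sex    = c("F", "F", "F", "F"),
  w      = c(1, 2, 0.5, 1.5)
)

# Declare every level explicitly so that empty ones are kept
d$answer <- factor(d$answer, levels = c("A", "B", "C"))
d$region <- factor(d$region, levels = c("N", "S"))
d$sex    <- factor(d$sex,    levels = c("F", "M"))

# Weighted tables: the weight goes on the left-hand side of the formula
t_region <- xtabs(w ~ answer + region, data = d)
t_sex    <- xtabs(w ~ answer + sex,    data = d)

# Both tables have the same three rows, so they can be bound column-wise
banner <- cbind(t_region, t_sex)
```

Because every table is built from the same `answer` factor, the row counts match and `cbind()` lines the rows up without needing `merge()`.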
[R] Showing zero frequencies with xtabs
Hi,

Does anyone know how to show variable levels with zero frequencies with the xtabs command? They show with the table(x,y) command, but I need to apply weights to frequency tables and I also need to cbind several tables together, which implies that they all need to show the same number of rows.

Alternatively, do you know how to column bind tables with different numbers of rows? I cannot use merge as it requires a data.frame and that modifies the look of the banner table I am trying to create...

Thanks,
Luca

Mr. Luca Meyer
www.lucameyer.com
R version 2.13.1 (2011-07-08)
Mac OS X 10.6.8
Re: [R] How do I get a weighted frequency table?
Thank you, that works just fine.

Luca

On 29 Aug 2011, at 23:48, H. T. Reynolds wrote:

> Hi,
>
> I use xtabs with the weight variable on the left hand side of the formula, as in
>
>     xtabs(weight ~ opinion + gender + ...)
Re: [R] How do I get a weighted frequency table?
Hi David,

Unfortunately I need to use the "should have been" frequencies, i.e. those I would observe if the sample corresponded perfectly to the population in terms of some "reference" variables. That is, if in my sample I observe V1_R1=10%, V1_R2=50%, V1_R3=40% while the known population distribution is V1_R1=20%, V1_R2=30%, V1_R3=50%, then I would like to see what V2*V3, V2*V4, ... , V2*VN, V3*V4, ... , VN-1*VN would have been had the sample perfectly reflected the population in terms of V1.

I hope that clarifies what I am trying to achieve...

Thanks,
Luca

On 29 Aug 2011, at 16:29, David L Carlson wrote:

> If you are talking about weights that are the frequencies in each cell, you can use xtabs():
>
>     df <- data.frame(Var1=c("Absent", "Present", "Absent", "Present"),
>                      Var2=c("Absent", "Absent", "Present", "Present"), Freq=c(17, 6, 3, 12))
>     df
>     xtabs(Freq~Var1+Var2, data=df)
>
> --
> David L Carlson
> Associate Professor of Anthropology
> Texas A&M University
>
> -----Original Message-----
> From: Leandro Marino
> Sent: Sunday, August 28, 2011 12:15 PM
> To: Luca Meyer
> Cc: r-help@r-project.org
> Subject: Re: [R] How do I get a weighted frequency table?
>
> Luca,
>
> you may use the survey package. You have to declare the design with the design function and then you can use the svytotal, svyby and svymean functions to do your tabulations.
>
> Regards,
> Leandro Marino
>
> 2011/8/28 Luca Meyer
>
>> Hello,
>>
>> I have to run a set of crosstabulations to which I need to apply some weights. I am currently doing an unweighted version of such crosstabs using table(x,y).
>>
>> In SPSS I am used to creating a weighting variable and using WEIGHT BY VAR before running the CTABLES. Is there a similar procedure in R?
>>
>> Thanks,
>> Luca
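The adjustment Luca describes amounts to giving each case a weight equal to the population share of its V1 category divided by the sample share, and then tabulating with that weight. A minimal base R sketch follows; the ten-row data set and the variable values are invented to reproduce the 10/50/40 sample and 20/30/50 population shares from the message, and the survey package Leandro mentions is not used here.

```r
# Hypothetical sample of 10 cases: V1 is distributed 10% / 50% / 40%
d <- data.frame(
  V1 = factor(rep(c("R1", "R2", "R3"), times = c(1, 5, 4))),
  V2 = factor(c("a", "a", "a", "b", "b", "b", "a", "b", "b", "b")),
  V3 = factor(c("x", "x", "y", "x", "y", "y", "y", "x", "y", "y"))
)

pop  <- c(R1 = 0.20, R2 = 0.30, R3 = 0.50)   # known population shares of V1
samp <- table(d$V1) / nrow(d)                 # observed sample shares of V1

# Weight per case = population share / sample share of its V1 category
v   <- as.character(d$V1)
d$w <- unname(pop[v]) / as.numeric(samp[v])

# Weighted V1 distribution now matches the population ...
v1_weighted <- xtabs(w ~ V1, data = d) / sum(d$w)

# ... and any other crosstab can be run with the same weight
v2_by_v3 <- xtabs(w ~ V2 + V3, data = d)
```

With these shares the weights are 2, 0.6 and 1.25, and they sum to the original sample size, so weighted totals stay on the same scale as the unweighted ones.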
[R] How do I get a weighted frequency table?
Hello,

I have to run a set of crosstabulations to which I need to apply some weights. I am currently doing an unweighted version of such crosstabs using table(x,y).

In SPSS I am used to creating a weighting variable and using WEIGHT BY VAR before running the CTABLES. Is there a similar procedure in R?

Thanks,
Luca

Mr. Luca Meyer
www.lucameyer.com
R version 2.13.1 (2011-07-08)
Mac OS X 10.6.8
Re: [R] Adding labels into lattice's barchart
Thanks Deepayan,

What you suggest is quite fine, but it provides the overall number of cases for the entire dataset split into V2 levels. What if I need to show panel-specific values? For instance, I want to show not the total number of Female but the total number of Female in 1st Class.

In other words, take your example and suppose I have:

    barchart(V2 ~ Freq | V1, data = tdf, groups = V3, layout=c(1,4), stack=TRUE,
             ylim = sprintf("%s (n=%g)", names(numByV2), numByV2))

and now what I would like to show is the result of

    with(tdf, tapply(Freq, list(V2,V1), sum))

next to each stacked bar. In the previous example, I would need to show Female (n=23) in the Crew panel, Female (n=196) in the 3rd Class panel, etc. Can I do that?

Thanks,
Luca

On 14 Feb 2011, at 11:43, Deepayan Sarkar wrote:

> On Wed, Feb 9, 2011 at 11:04 PM, Luca Meyer wrote:
>> *** APOLOGIES TO THOSE READING THE LIST THROUGH NABBLE: THIS WAS ALREADY POSTED THERE BUT NOT FORWARDED TO THE LIST FOR SOME UNKNOWN REASON ***
>>
>> I have a dataset that looks like:
>>
>>     $ V1: factor with 4 levels
>>     $ V2: factor with 4 levels
>>     $ V3: factor with 2 levels
>>     $ V4: num (summing up to 100 within V3 levels)
>>     $ V5: num (nr of cases for each unique combination of V1*V2*V3 levels)
>>
>> Quite new to lattice - I've started reading Deepayan's book a few days ago - I have written the following:
>>
>>     barchart(V2 ~ V4 | V1,
>>              data=d1,
>>              groups=V3,
>>              stack=TRUE,
>>              auto.key= list(space="top"),
>>              layout = c(1,4),
>>              xlab=" "
>>              )
>>
>> which works just fine as a stacked bar chart with bars adding up to 100%. Now what I would like to see is the number of cases showing next to the 4 axis labels, i.e. V2_L1, ... V2_L4.
>>
>> In other words now I see something like:
>>
>>     *** V1_L1 ***
>>     V2_L4 AAAVVV
>>     V2_L3 AA
>>     V2_L2 AV
>>     V2_L1 AA
>>     *** V1_L2 ***
>>     V2_L4 AA
>>     V2_L3 AV
>>     etc...
>>
>> But what I am looking for is something like:
>>
>>     *** V1_L1 ***
>>     V2_L4 (n=60) AAAVVV
>>     V2_L3 (n=10) AA
>>     V2_L2 (n=52) AV
>>     V2_L1 (n=15) AA
>>     *** V1_L2 ***
>>     V2_L4 (n=18) AA
>>     V2_L3 (n=74) AV
>>     etc...
>>
>> How can I do that? I have tried:
>>
>>     V6 <- paste(V2," (n",V5,")")
>
> What you really want is to compute the total sum of V5 per level of V2 (and add that to the labels of V2). There are many ways of doing so, one is tapply().
>
> In the absence of a reproducible example, here is an approximation:
>
>     tdf <- as.data.frame.table(apply(Titanic, c(1, 2, 4), sum))
>     names(tdf)[1:3] <- paste("V", 1:3, sep = "")
>
>     str(tdf)
>
>     barchart(V2 ~ Freq | V1, data=tdf, groups=V3, stack=TRUE)
>
>     with(tdf, tapply(Freq, V2, sum))
>
>     numByV2 <- with(tdf, tapply(Freq, V2, sum))
>
>     barchart(V2 ~ Freq | V1, data = tdf, groups = V3, stack=TRUE,
>              ylim = sprintf("%s (n=%g)", names(numByV2), numByV2))
>
>     ## or
>
>     levels(tdf$V2) <- sprintf("%s (n=%g)", levels(tdf$V2), numByV2)
>     barchart(V2 ~ Freq | V1, data=tdf, groups=V3, stack=TRUE)
>
> -Deepayan
>
>> but what I get when I run
>>
>>     barchart(V6 ~ V4 | V1,
>>              data=d1,
>>              groups=V3,
>>              stack=TRUE,
>>              auto.key= list(space="top"),
>>              layout = c(1,4),
>>              xlab=" "
>>              )
>>
>> is a bunch of empty bars, because the number of unique combinations has increased.
>>
>> Any help would be appreciated.
>>
>> Thanks,
>> Luca
>>
>> Mr. Luca Meyer
>> www.lucameyer.com
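The panel-specific counts Luca asks about can at least be computed and attached to each row with base R, starting from Deepayan's Titanic approximation. The sketch below only prepares the numbers and label strings; the names `numByV2V1`, `n` and `label` are illustrative, and how best to draw the labels inside each lattice panel (for example through a custom panel function) is left open, since the thread does not settle it.

```r
# Deepayan's approximation of the data, from the quoted reply
tdf <- as.data.frame.table(apply(Titanic, c(1, 2, 4), sum))
names(tdf)[1:3] <- paste("V", 1:3, sep = "")   # V1 = Class, V2 = Sex, V3 = Survived

# Number of cases per V2 level within each V1 panel
numByV2V1 <- with(tdf, tapply(Freq, list(V2, V1), sum))

# Look up the panel-specific count for every row via (row name, column name) pairs
idx       <- cbind(as.character(tdf$V2), as.character(tdf$V1))
tdf$n     <- numByV2V1[idx]
tdf$label <- sprintf("%s (n=%g)", as.character(tdf$V2), tdf$n)
```

For the Titanic data this gives, for instance, "Female (n=23)" for the Crew rows and "Female (n=196)" for the 3rd class rows, matching the figures in the message.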
[R] Adding labels into lattice's barchart
*** APOLOGIES TO THOSE READING THE LIST THROUGH NABBLE: THIS WAS ALREADY POSTED THERE BUT NOT FORWARDED TO THE LIST FOR SOME UNKNOWN REASON ***

I have a dataset that looks like:

    $ V1: factor with 4 levels
    $ V2: factor with 4 levels
    $ V3: factor with 2 levels
    $ V4: num (summing up to 100 within V3 levels)
    $ V5: num (nr of cases for each unique combination of V1*V2*V3 levels)

Quite new to lattice - I've started reading Deepayan's book a few days ago - I have written the following:

    barchart(V2 ~ V4 | V1,
             data=d1,
             groups=V3,
             stack=TRUE,
             auto.key= list(space="top"),
             layout = c(1,4),
             xlab=" "
             )

which works just fine as a stacked bar chart with bars adding up to 100%. Now what I would like to see is the number of cases showing next to the 4 axis labels, i.e. V2_L1, ... V2_L4.

In other words now I see something like:

    *** V1_L1 ***
    V2_L4 AAAVVV
    V2_L3 AA
    V2_L2 AV
    V2_L1 AA
    *** V1_L2 ***
    V2_L4 AA
    V2_L3 AV
    etc...

But what I am looking for is something like:

    *** V1_L1 ***
    V2_L4 (n=60) AAAVVV
    V2_L3 (n=10) AA
    V2_L2 (n=52) AV
    V2_L1 (n=15) AA
    *** V1_L2 ***
    V2_L4 (n=18) AA
    V2_L3 (n=74) AV
    etc...

How can I do that? I have tried:

    V6 <- paste(V2," (n",V5,")")

but what I get when I run

    barchart(V6 ~ V4 | V1,
             data=d1,
             groups=V3,
             stack=TRUE,
             auto.key= list(space="top"),
             layout = c(1,4),
             xlab=" "
             )

is a bunch of empty bars, because the number of unique combinations has increased.

Any help would be appreciated.

Thanks,
Luca

Mr. Luca Meyer
www.lucameyer.com
[R] Panel title: mfrow() or ?
Hi,

I am trying to build a 3 rows by 2 columns panel using

    par(mfrow=c(3,2))

The 6 graphs are coming out quite all right, but now I would like to put a title at the top of the page, i.e. something that is common to all 6 graphs. How can I do that?

Thanks,
Luca

Mr. Luca Meyer
www.lucameyer.com
IBM SPSS Statistics release 19.0.0
R version 2.12.1 (2010-12-16)
Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0
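A usual base-graphics answer to this kind of question is to reserve an outer margin with `par(oma = ...)` and write the common title there with `mtext(..., outer = TRUE)`. The sketch below wraps that in a hypothetical helper, `draw_six()`, with placeholder plots; the function name, the plotted curves and the margin sizes are all illustrative choices.

```r
# Draw six placeholder plots in a 3 x 2 layout with one title common to the page
draw_six <- function(title = "Common title") {
  op <- par(mfrow = c(3, 2),
            oma = c(0, 0, 3, 0),    # reserve 3 lines of outer margin at the top
            mar = c(4, 4, 2, 1))
  on.exit(par(op))                  # restore the previous settings afterwards

  for (i in 1:6) {
    plot(1:10, (1:10)^(i / 2), main = paste("Graph", i), xlab = "x", ylab = "y")
  }

  # outer = TRUE writes into the outer margin, i.e. above all six panels
  mtext(title, side = 3, outer = TRUE, line = 1, cex = 1.2, font = 2)
  invisible(NULL)
}
```

Calling `draw_six("My page title")` on any open graphics device draws the page; the outer margin must be set before the first plot is drawn, which is why it is part of the same `par()` call as `mfrow`.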
[R] Getting total bar's label & value labels in a barplot
Hi,

I have been trying to get a label under the total column (i.e. the mean value of columns 2 to 6) in a barplot I generate with this script:

    t1 <- tapply(A, B, sum)
    t1[8] <- mean(t1[2:6])
    t1 <- as.table(t1)
    barplot(t1, ylim=c(0,3000))
    mtext("Var1", side = 1, line = 3)
    mtext("Var2", side = 2, line = 3)

I have been trying to use

    axis(1, at=1:8, labels=c("1","2","3","4","5","6","7","8"))

but the labels do not end up underneath the columns... can someone help me out on this one?

Also, I would like to plot onto each bar the corresponding numerical value, e.g. "1824" on the first bar, etc.

Please notice that str(t1) looks like:

    Named num [1:8] 1824 2339 2492 2130 2360 ...
    - attr(*, "names")= chr [1:8] "1" "2" "3" "4" ...

Thanks,
Luca

Mr. Luca Meyer
www.lucameyer.com
IBM SPSS Statistics release 19.0.0
R version 2.12.1 (2010-12-16)
Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0
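Two things are probably at play here: the eighth element gets no name when it is appended by position, and bars are not centred at 1, 2, ..., 8, so `axis(1, at = 1:8)` cannot line up with them. `barplot()` returns the bar midpoints, which can be reused for any extra annotation. A sketch under those assumptions follows; the first five counts come from the `str()` output above, the last two and the label "Mean 2-6" are made up, and `draw_bars()` is a hypothetical helper.

```r
# First five values from the post; the remaining two are invented for illustration
t1 <- c("1" = 1824, "2" = 2339, "3" = 2492, "4" = 2130, "5" = 2360,
        "6" = 2210, "7" = 1950)

# Appending by name gives the extra bar a label of its own
t1["Mean 2-6"] <- mean(t1[2:6])

draw_bars <- function(t1) {
  # barplot() returns the x coordinates of the bar midpoints
  bp <- barplot(t1, ylim = c(0, 3000), names.arg = names(t1))
  # Write each value just above its bar, at the bar's own midpoint
  text(x = bp, y = t1, labels = round(t1), pos = 3)
  mtext("Var1", side = 1, line = 3)
  mtext("Var2", side = 2, line = 3)
  invisible(bp)
}
```

Calling `bp <- draw_bars(t1)` draws the chart and keeps the midpoints in `bp`, so a later `axis(1, at = bp, labels = ...)` would sit under the bars if custom labels are still wanted.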
Re: [R] error in calling source(): invalid multibyte character in parser
How would I go about doing that? I have tried:

    source("file.R", encoding="it_IT.UTF-8")

but I get:

    Error in file(file, "r", encoding = encoding) :
      unsupported conversion from 'it_IT.UTF-8' to ''

Thanks,
Luca

PS: "it_IT.UTF-8" is what I get under locale when I run sessionInfo()

On 3 Jan 2011, at 09:48, Prof Brian Ripley wrote:

> On Mon, 3 Jan 2011, peter dalgaard wrote:
>
>> On Jan 3, 2011, at 08:32 , Luca Meyer wrote:
>>
>>> Being Italian, when writing comments/instructions we use accented letters such as à, ò, è, etc. When running R scripts containing such characters I get an error saying:
>>>
>>>     invalid multibyte character in parser
>>>
>>> I have been looking at the help and searched the r-help archives but I haven't found anything that I could intelligibly apply to my case.
>>>
>>> Can anyone suggest a fix for this error?
>>
>> The most likely cause is that your scripts are written in an "8 bit ASCII" encoding (Latin-1 or -9, most likely), while R is running in a UTF8 locale. If that is the cause, the fix is to standardize things to use the same locale. You can convert the encoding of your source file using the iconv utility (in a Terminal window).
>
> Or use the 'encoding' argument of source() to tell R what the encoding is, e.g. encoding="latin1" or "latin-9" (the inconsistency being in the iconv used on Macs, not in R).
>
>> -pd
>>
>> --
>> Peter Dalgaard
>> Center for Statistics, Copenhagen Business School
>
> --
> Brian D. Ripley
> Professor of Applied Statistics
> University of Oxford
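The error Luca reports is consistent with `encoding` expecting an encoding name (such as "latin1" or "UTF-8", see `iconvlist()`) rather than a locale name like "it_IT.UTF-8". Besides `source("file.R", encoding = "latin1")` as Prof Ripley suggests, the file itself can be converted once from within R with `iconv()`, which mirrors Peter's command-line suggestion. The sketch below fabricates a one-line Latin-1 script in a temporary file to show the conversion; the file contents and variable names are illustrative.

```r
# Build a tiny Latin-1 encoded script: x <- "perch?" where ? is e-acute (byte 0xe9)
latin1_bytes <- c(charToRaw('x <- "perch'), as.raw(0xe9), charToRaw('"'), as.raw(0x0a))
src <- tempfile(fileext = ".R")
writeBin(latin1_bytes, src)

# Read the raw lines; the 0xe9 byte makes them invalid as UTF-8
lines_latin1 <- readLines(src, warn = FALSE)

# Re-encode from Latin-1 to UTF-8
lines_utf8 <- iconv(lines_latin1, from = "latin1", to = "UTF-8")

# Write the converted script; useBytes = TRUE keeps the UTF-8 bytes untouched
dst <- tempfile(fileext = ".R")
writeLines(lines_utf8, dst, useBytes = TRUE)
```

`validUTF8(lines_latin1)` is `FALSE` and `validUTF8(lines_utf8)` is `TRUE`, which is a quick way to check which encoding a problematic script is likely in before deciding on a fix.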
Re: [R] error in calling source(): invalid multibyte character in parser
It works fine, thanks. I was just wondering if there is any way to run the command you suggest automatically, as a default, when I open R.

Thanks,
Luca

On 3 Jan 2011, at 08:36, Phil Spector wrote:

> Luca -
> What happens when you type
>
>     Sys.setlocale('LC_ALL','C')
>
> before issuing the source command?
>
> - Phil Spector
>   Statistical Computing Facility
>   Department of Statistics
>   UC Berkeley
>
> On Mon, 3 Jan 2011, Luca Meyer wrote:
>
>> Being Italian, when writing comments/instructions we use accented letters such as à, ò, è, etc. When running R scripts containing such characters I get an error saying:
>>
>>     invalid multibyte character in parser
>>
>> I have been looking at the help and searched the r-help archives but I haven't found anything that I could intelligibly apply to my case.
>>
>> Can anyone suggest a fix for this error?
>>
>> Thanks,
>> Luca
>>
>> Mr. Luca Meyer
>> www.lucameyer.com
>> IBM SPSS Statistics release 19.0.0
>> R version 2.12.1 (2010-12-16)
>> Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0
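R reads a user profile file at startup, so one way to make Phil's command a default is to put it in `~/.Rprofile` (or in an `.Rprofile` in the project directory). The fragment below is a sketch of such a file's contents. Note that setting every locale category to "C" also affects sorting and date formats, which may or may not be wanted.

```r
# Contents for ~/.Rprofile: run at the start of every R session.
# invisible() suppresses the printed locale string at startup.
invisible(Sys.setlocale("LC_ALL", "C"))
```

Running `Sys.getlocale()` in a fresh session shows whether the setting has been picked up.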
[R] error in calling source(): invalid multibyte character in parser
Being Italian, when writing comments/instructions we use accented letters such as à, ò, è, etc. When running R scripts containing such characters I get an error saying:

    invalid multibyte character in parser

I have been looking at the help and searched the r-help archives but I haven't found anything that I could intelligibly apply to my case.

Can anyone suggest a fix for this error?

Thanks,
Luca

Mr. Luca Meyer
www.lucameyer.com
IBM SPSS Statistics release 19.0.0
R version 2.12.1 (2010-12-16)
Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0
Re: [R] Passing parameter to a function
Hi Duncan,

Yes, A and B are columns in D. Having said that, I am trying to avoid tab(D$A,D$B) and I would prefer:

    tab(A,B)

Unfortunately the syntax you suggest is giving me the same error:

    Error in eval(expr, envir, enclos) : object "A" not found

I have tried to add some deparse() calls but I got the same error again. The last version I have tried:

    function(x,y){
      z <- substitute(time ~ x + y, list(x = deparse(substitute(x)), y = deparse(substitute(y))))
      xtabs(z, data=D)
    }

gives me another error:

    Error in terms.formula(formula, data = data) :
      formula models not valid in ExtractVars

Any idea on how I should modify the function to make it work?

Thanks,
Luca

On 20 Dec 2010, at 19:28, Duncan Murdoch wrote:

> On 20/12/2010 1:13 PM, Luca Meyer wrote:
>> I am trying to pass a couple of variable names to an xtabs formula:
>>
>>     > tab <- function(x,y){
>>         xtabs(time~x+y, data=D)
>>       }
>>
>> But when I run:
>>
>>     > tab(A,B)
>>
>> I get:
>>
>>     Error in eval(expr, envir, enclos) : object "A" not found
>>
>> I am quite sure that there is some easy way out, but I have tried different combinations of deparse(), substitute(), eval(), etc. without success. Can someone help?
>
> I assume that A and B are columns in D? If so, you could use
>
>     tab(D$A, D$B)
>
> to get what you want. If you really want tab(A,B) to work, you'll need to do messy work with substitute, e.g. in the tab function, something like
>
>     fla <- substitute(time ~ x + y, list(x = substitute(x), y = substitute(y)))
>     xtabs(fla, data=D)
>
> Duncan Murdoch
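One possible reason the attempts above fail is that `substitute()` returns an unevaluated call rather than a formula object, and that `deparse()` inserts character strings where variable names are needed. Evaluating the substituted call once turns it into a real formula. The sketch below follows Duncan's idea with that extra `eval()`; the `data` argument and the three-row data frame are additions for illustration (the original function used a global `D`).

```r
# tab(A, B, data = D): weighted crosstab of two columns named without quotes
tab <- function(x, y, data) {
  # Insert the unevaluated argument names into the formula template,
  # then evaluate the resulting call so it becomes a formula object
  fla <- eval(substitute(time ~ x + y,
                         list(x = substitute(x), y = substitute(y))))
  xtabs(fla, data = data)
}

# Hypothetical data with columns A, B and a 'time' variable to sum
D <- data.frame(
  A    = factor(c("a", "a", "b")),
  B    = factor(c("u", "v", "u")),
  time = c(1, 2, 3)
)

res <- tab(A, B, data = D)
```

Here `res` is a 2 x 2 table with the summed `time` in each A by B cell, and the dimension names `A` and `B` are carried over from the call.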
Re: [R] Alternative to extended recode sintax? Bug?
Yes, I am seeing that at the end of 2010 and the beginning of 2011. Try:

    weekdays(as.POSIXct("2010-12-25")+(0:20)*24*60*60)
    week(as.POSIXct("2010-12-25")+(0:20)*24*60*60)

Week 1 (2011) is made up of 6 days.

Luca

On 20 Dec 2010, at 17:54, David Winsemius wrote:

> On Dec 20, 2010, at 10:58 AM, Luca Meyer wrote:
>
>> Right, I appreciate the first-day-of-the-year start date. I am just wondering why the cut-off day is then not the same for the rest of the year... but it's all right to use other packages.
>
> Are you saying it shifts within the year? I am not seeing that:
>
>     require(lubridate)
>
>     > weekdays(as.POSIXct("2010-01-01")+(0:8)*24*60*60)
>     [1] "Friday"    "Saturday"  "Sunday"    "Monday"    "Tuesday"   "Wednesday"
>     [7] "Thursday"  "Friday"    "Saturday"
>     > week(as.POSIXct("2010-01-01")+(0:8)*24*60*60)
>     [1] 1 1 1 1 1 1 2 2 2
>
> Looks to be incrementing weeks between Wed and Thurs at the beginning of the year, just as it did in your example. I admit that I thought it should be shifting at the Thursday - Friday divide, but setting a zero point can be ambiguous. I thought if it were midnight Thursday-Friday that all of the Thursdays would be in week 1. But at least it appears consistent.
>
> David Winsemius, MD
> West Hartford, CT
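The week numbers quoted in this thread (50 for 15 December, 51 for 16 December, and a six-day week 1) are what a rule of the form "day of year, integer-divided by 7, plus 1" produces. That is an inference from the printed output only, not a statement about lubridate's source code. The base R sketch below reproduces the numbers with that rule and contrasts it with counting complete 7-day blocks from 1 January; both function names are hypothetical.

```r
# Day of the year, 1-based (as.POSIXlt()$yday is 0-based)
yday1 <- function(d) as.POSIXlt(as.Date(d))$yday + 1L

# Rule consistent with the week() output quoted in the thread
week_as_reported <- function(d) yday1(d) %/% 7L + 1L

# Alternative: complete 7-day blocks counted from 1 January, so week 1 has 7 days
week_from_jan1 <- function(d) (yday1(d) - 1L) %/% 7L + 1L

# Compare both rules around the turn of the year
d <- as.Date("2010-12-27") + 0:14
comparison <- data.frame(
  date      = d,
  weekday   = weekdays(d),
  reported  = week_as_reported(d),
  from_jan1 = week_from_jan1(d)
)
```

Under the first rule, day 7 of the year already falls in week 2, which is why week 1 has only six days; every later week has seven days and starts on the weekday of 7 January. Under the second rule all weeks, including the first, start on the weekday of 1 January.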
[R] Passing parameter to a function
I am trying to pass a couple of variable names to an xtabs formula:

    > tab <- function(x,y){
        xtabs(time~x+y, data=D)
      }

But when I run:

    > tab(A,B)

I get:

    Error in eval(expr, envir, enclos) : object "A" not found

I am quite sure that there is some easy way out, but I have tried different combinations of deparse(), substitute(), eval(), etc. without success. Can someone help?

Thanks,
Luca

Luca Meyer
www.lucameyer.com
IBM SPSS Statistics release 19.0.0
R version 2.12.1 (2010-12-16)
Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0
Re: [R] Alternative to extended recode sintax? Bug?
Right, I appreciate the first-day-of-the-year start date. I am just wondering why the cut-off day is then not the same for the rest of the year... but it's all right to use other packages.

Thanks,
Luca

On 20 Dec 2010, at 14:16, David Winsemius wrote:

> On Dec 20, 2010, at 12:54 AM, Luca Meyer wrote:
>
>> All right, I get it now: lubridate's week() defines weeks from Thursday till the following Wednesday. You'd probably agree with me that what it does over the turn of the year is a bit strange:
>>
>>     > y <- as.POSIXct(c("2010-12-27","2010-12-28","2010-12-29","2010-12-30","2010-12-31",
>>                         "2011-01-01","2011-01-02","2011-01-03","2011-01-04","2011-01-05",
>>                         "2011-01-06","2011-01-07","2011-01-08","2011-01-09","2011-01-10",
>>                         "2011-01-11","2010-01-12","2010-01-13","2010-01-14"))
>>     > week(y)
>>     [1] 52 52 52 53 53 1 1 1 1 1 1 2 2 2 2 2 2 2 3
>>
>> Why would the first week of the year be made of 6 days, with the turn from week 1 to week 2 on the night between Thursday and Friday and not between Wednesday and Thursday like every other week?
>
> weeks in lubridate start on whatever day of the week is the first of that year.
>
> If you want a Monday starting day (or the option to change to another starting day), then package chron has such facilities.
>
> David Winsemius, MD
> West Hartford, CT
Re: [R] Alternative to extended recode sintax? Bug?
All right, I get it now: lubridate's week() defines weeks from Thursday until the following Wednesday. You'd probably agree with me that what it does over the turn of the year is a bit strange:

> y <- as.POSIXct(c("2010-12-27","2010-12-28","2010-12-29","2010-12-30","2010-12-31","2011-01-01","2011-01-02","2011-01-03","2011-01-04","2011-01-05","2011-01-06","2011-01-07","2011-01-08","2011-01-09","2011-01-10","2011-01-11","2010-01-12","2010-01-13","2010-01-14"))
> week(y)
[1] 52 52 52 53 53 1 1 1 1 1 1 2 2 2 2 2 2 2 3

Why would the first week of the year be made of 6 days, with the turn from week 1 to week 2 falling on the night between Thursday and Friday rather than between Wednesday and Thursday like every other week?

Cheers,
Luca

On 19 Dec 2010, at 18:14, Uwe Ligges wrote:

> On 19.12.2010 13:20, David Winsemius wrote:
>>
>> On Dec 19, 2010, at 5:11 AM, Luca Meyer wrote:
>>
>>> Something goes wrong with the week function of the lubridate package:
>>>
>>>> x= as.POSIXct(factor(c("2010-12-15 17:28:27",
>>> + "2010-12-15 17:32:34",
>>> + "2010-12-15 18:48:39",
>>> + "2010-12-15 19:25:00",
>>> + "2010-12-16 08:00:00",
>>> + "2010-12-16 08:25:49",
>>> + "2010-12-16 09:00:00")))
>>>> require(lubridate)
>>>> weekdays(x)
>>> [1] "Mercoledì" "Mercoledì" "Mercoledì" "Mercoledì" "Giovedì" "Giovedì" "Giovedì"
>>>> week(x)
>>> [1] 50 50 50 50 51 51 51
>>
>> But 2010-12-15 is a Wednesday and 2010-12-16 is a Thursday.
>
> Together with the description of ?week this shows that lubridate's week()
> function works as documented rather than as expected by Luca Meyer.
>
> Uwe Ligges
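The week numbers posted in this thread are consistent with an index derived purely from the day of the year: the 1-based day of year, integer-divided by 7, plus one. This is an inference from the posted output, not a statement about lubridate's source code, and other lubridate releases may number weeks differently. A minimal base-R sketch (the helper name week_from_yday is hypothetical):

```r
# Hypothetical helper: week index computed from the 1-based day of the year.
week_from_yday <- function(d) {
  yday <- as.POSIXlt(d)$yday + 1   # POSIXlt stores yday as 0-based
  yday %/% 7 + 1
}

# A few of the dates from the post, as Date objects to avoid time zone effects.
y <- as.Date(c("2010-12-29", "2010-12-30", "2011-01-01",
               "2011-01-06", "2011-01-07"))
wk <- week_from_yday(y)
print(wk)
# 52 53 1 1 2, the same values the post shows for these dates
```

Under such a rule the index changes on days 7, 14, 21, ... of the year, so the weekday on which a new "week" begins depends on the weekday of 1 January, and week 1 contains only six days.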
[R] tabulating 2 factors weighting by a third var
Hi,

This must be an easy one, but so far I haven't found a way out...

I have a data frame such as:

$ v1: Factor w/ 5 levels
$ v2: Factor w/ 2 levels
$ v3: Class 'difftime' atomic [1:]

Basically, v1 and v2 are factors, while v3 is a variable containing the duration of certain activities (values ranging from 11 to 45000 sec, no missing values).

How can I get a table in which the v1 levels appear as rows, the v2 levels as columns, and v3 is the weight applied to table(v1,v2)? That is, instead of the count of occurrences in each of the 10 cells of table(v1,v2), I would like to get sum(v3). How can it be done?

Thanks,
Luca

Luca Meyer
www.lucameyer.com
IBM SPSS Statistics release 19.0.0
R version 2.12.1 (2010-12-16)
Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0
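One way to get such a weighted table, sketched under assumptions: xtabs() sums its left-hand-side variable within each cell, so converting the difftime to plain seconds first gives sum(v3) per combination of v1 and v2. The data below and the helper column name `secs` are made up for illustration.

```r
# Hypothetical data with the structure described in the post.
d <- data.frame(
  v1 = factor(c("a", "a", "b", "b", "a")),
  v2 = factor(c("x", "y", "x", "x", "x"))
)
d$v3 <- as.difftime(c(11, 20, 30, 40, 100), units = "secs")

# Convert durations to numeric seconds so they can be summed by xtabs().
d$secs <- as.numeric(d$v3, units = "secs")

# Rows are v1 levels, columns are v2 levels, cells hold the summed duration.
tab <- xtabs(secs ~ v1 + v2, data = d)
print(tab)
```

Combinations that never occur show up as 0 here; tapply(d$secs, list(d$v1, d$v2), sum) is an alternative that returns NA for empty cells instead.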
[R] Ifelse stability problems?
%) 45.9 134 B (10-20%) 24.9 135 C (5-15%) 0.2 136 D (1-5%) 1.6 137 E (10-20%) 27.4 138 A (50-67%) 61.5 139 B (10-20%) 8.7 140 C (5-15%) 22.5 141 D (1-5%) 0.1 142 E (10-20%) 7.2 143 A (50-67%) 64.1 144 B (10-20%) 0.9 145 C (5-15%) 14.4ok 146 D (1-5%) 11.2ok 147 E (10-20%) 9.4ok 148 A (50-67%) 67.8 149 B (10-20%) 11.7ok 150 C (5-15%) 10.6ok 151 D (1-5%) 1.3 152 E (10-20%) 8.6 153 A (50-67%) 65.9 154 B (10-20%) 9.9ok 155 C (5-15%) 11.3ok 156 D (1-5%) 1.6 157 E (10-20%) 11.3ok 158 A (50-67%) 77.0 159 B (10-20%) 5.3 160 C (5-15%) 8.6 161 D (1-5%) 2.6 162 E (10-20%) 6.5 163 A (50-67%) 77.5 164 B (10-20%) 5.7 165 C (5-15%) 8.1 166 D (1-5%) 4.6 167 E (10-20%) 4.2 168 A (50-67%) 40.1 169 B (10-20%) 12.9ok 170 C (5-15%) 33.2 171 D (1-5%) 0.3 172 E (10-20%) 13.6ok 173 A (50-67%) 53.9 174 B (10-20%) 10.1ok 175 C (5-15%) 8.4 176 D (1-5%) 4.2 177 E (10-20%) 23.4 178 A (50-67%) 94.3 179 C (5-15%) 1.7 180 E (10-20%) 4.0 181 A (50-67%) 62.1 182 B (10-20%) 12.3ok 183 C (5-15%) 5.3 184 D (1-5%) 7.3 185 E (10-20%) 13.0ok 186 A (50-67%) 49.2 187 B (10-20%) 14.1ok 188 C (5-15%) 7.9 189 D (1-5%) 8.9 190 E (10-20%) 20.0ok 191 A (50-67%) 63.6 192 B (10-20%) 10.4ok 193 C (5-15%) 11.9ok 194 D (1-5%) 2.4 195 E (10-20%) 11.7ok 196 A (50-67%) 55.1 197 B (10-20%) 13.5ok 198 C (5-15%) 11.2ok 199 D (1-5%) 4.8 200 E (10-20%) 15.5ok 201 A (50-67%) 68.6 202 B (10-20%) 3.1 203 C (5-15%) 8.2 204 D (1-5%) 9.2ok 205 E (10-20%) 10.8ok 206 A (50-67%) 45.0 207 B (10-20%) 4.8 208 C (5-15%) 7.1 209 D (1-5%) 4.9 210 E (10-20%) 38.2 211 A (50-67%) 85.2 212 B (10-20%) 3.1 213 C (5-15%) 4.4 214 D (1-5%) 0.4 215 E (10-20%) 6.9 216 A (50-67%) 60.5 217 B (10-20%) 10.1ok 218 C (5-15%) 11.1ok 219 D (1-5%) 1.8 220 E (10-20%) 16.5ok 221 A (50-67%) 58.7 222 B (10-20%) 7.0 223 C (5-15%) 10.5ok 224 D (1-5%) 5.2 225 E (10-20%) 18.7ok 226 A (50-67%) 90.0 227 C (5-15%) 5.6 228 D (1-5%) 0.7 229 E (10-20%) 3.8 230 A (50-67%) 62.5 231 B (10-20%) 13.7ok 232 C (5-15%) 9.7ok 233 D (1-5%) 2.6 234 E (10-20%) 11.6ok 235 A (50-67%) 
55.6 236 B (10-20%) 17.6ok 237 C (5-15%) 11.8ok 238 D (1-5%) 2.6 239 E (10-20%) 12.4ok 240 A (50-67%) 85.2 241 B (10-20%) 0.6 242 C (5-15%) 2.1 243 D (1-5%) 2.3 244 E (10-20%) 9.8ok 245 A (50-67%) 87.4 246 B (10-20%) 0.4 247 C (5-15%) 2.9 248 D (1-5%) 2.8 249 E (10-20%) 6.4 250 A (50-67%) 73.0 251 B (10-20%) 4.0 252 C (5-15%) 15.6ok 253 D (1-5%) 0.7 254 E (10-20%) 6.7 255 A (50-67%) 90.4 256 C (5-15%) 2.4 257 D (1-5%) 2.5 258 E (10-20%) 4.7 259 A (50-67%) 64.3 260 B (10-20%) 6.6 261 C (5-15%) 13.3ok 262 D (1-5%) 3.5 263 E (10-20%) 12.3ok 264 A (50-67%) 65.5 265 B (10-20%) 13.5ok 266 C (5-15%) 4.6 267 D (1-5%) 0.9 268 E (10-20%) 15.4ok 269 A (50-67%) 72.1 270 B (10-20%) 6.4 271 C (5-15%) 12.7ok 272 D (1-5%) 1.1 273 E (10-20%) 7.7 274 A (50-67%) 71.4 275 B (10-20%) 0.9 276 C (5-15%) 21.9 277 E (10-20%) 5.7 278 A (50-67%) 53.0 279 B (10-20%) 3.6 280 C (5-15%) 36.4 281 E (10-20%) 7.0 Can anyone explain why this might occur? Thanks, Luca Luca Meyer www.lucameyer.com IBM SPSS Statistics release 19.0.0 R version 2.12.1 (2010-12-16) Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Alternative to extended recode sintax? Bug?
Something goes wrong with the week function of the lubridate package:

> x= as.POSIXct(factor(c("2010-12-15 17:28:27",
+ "2010-12-15 17:32:34",
+ "2010-12-15 18:48:39",
+ "2010-12-15 19:25:00",
+ "2010-12-16 08:00:00",
+ "2010-12-16 08:25:49",
+ "2010-12-16 09:00:00")))
> require(lubridate)
> weekdays(x)
[1] "Mercoledì" "Mercoledì" "Mercoledì" "Mercoledì" "Giovedì" "Giovedì" "Giovedì"
> week(x)
[1] 50 50 50 50 51 51 51
>

Please note that Mercoledì = Wednesday and Giovedì = Thursday. Why would the week begin on Thursday? Also note that this does not occur in earlier weeks: all weeks up to 49 begin on Monday and end on Sunday, as required.

Thanks,
Luca

On 18 Dec 2010, at 14:39, David Winsemius wrote:

> On Dec 17, 2010, at 11:08 AM, Luca Meyer wrote:
>
> x= factor(c("2009-03-30 00:00:00", "2009-04-06 00:00:00", "2009-04-13 00:00:00", "2009-04-20 00:00:00", "2009-04-27 00:00:00", "2009-05-04 00:00:00" ,"2009-05-11 00:00:00", "2009-05-18 00:00:00"))
> require(lubridate)
> xd=as.POSIXct(x)
> week(xd)
> # [1] 13 14 15 16 17 18 19 20
> year(xd)
> # [1] 2009 2009 2009 2009 2009 2009 2009 2009
> paste(year(xd), " W",week(xd), sep="")
> #[1] "2009 W13" "2009 W14" "2009 W15" "2009 W16" "2009 W17" "2009 W18" "2009 W19" "2009 W20"
>
> David Winsemius, MD
> West Hartford, CT
[R] testing with if: what I am doing wrong?
I am running this small program:

x <- factor(c("A","B","A","C"))
y <- c(1,2,3,4)
w <- data.frame(x,y)
if (w$x=="A"){
w$z=1
}
w

And I obtain:

  x y z
1 A 1 1
2 B 2 1
3 A 3 1
4 C 4 1

and not what I expected:

  x y  z
1 A 1  1
2 B 2 NA
3 A 3  1
4 C 4 NA

What am I doing wrong? Please note that I get a warning which, translated from Italian, says approximately:

In if (w$x == "A") { : the condition has length > 1 and only the first element will be used

Thanks,
Luca
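The warning points at the cause: if() tests a single condition, so only the first element of w$x == "A" is used, and w$z = 1 is then assigned to every row. A vectorised alternative, given here as a sketch rather than as the reply from the list, is ifelse(), which evaluates the test element by element:

```r
x <- factor(c("A", "B", "A", "C"))
y <- c(1, 2, 3, 4)
w <- data.frame(x, y)

# Element-wise test: 1 where x is "A", NA everywhere else.
w$z <- ifelse(w$x == "A", 1, NA)
print(w)
```

Logical indexing, w$z[w$x == "A"] <- 1, gives the same column. Since R 4.2.0 a condition of length greater than one in if() is an error rather than a warning.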
[R] Alternative to extended recode sintax?
Dear R-users, I have a factor variable within my data frame which I derive week after week from a POSIXct variable using the cut(var,"weeks") command I have found in the chron package. The levels() command gives me: [1] "2009-03-30 00:00:00" "2009-04-06 00:00:00" "2009-04-13 00:00:00" "2009-04-20 00:00:00" "2009-04-27 00:00:00" "2009-05-04 00:00:00" "2009-05-11 00:00:00" "2009-05-18 00:00:00" [9] "2009-05-25 00:00:00" "2009-06-01 00:00:00" "2009-06-08 00:00:00" "2009-06-15 00:00:00" "2009-06-22 00:00:00" "2009-06-29 00:00:00" "2009-07-06 00:00:00" "2009-07-13 00:00:00" [17] "2009-07-20 00:00:00" "2009-07-27 00:00:00" "2009-08-03 00:00:00" "2009-08-10 00:00:00" "2009-08-17 00:00:00" "2009-08-24 00:00:00" "2009-08-31 00:00:00" "2009-09-07 00:00:00" [25] "2009-09-14 00:00:00" "2009-09-21 00:00:00" "2009-09-28 00:00:00" "2009-10-05 00:00:00" "2009-10-12 00:00:00" "2009-10-19 00:00:00" "2009-10-25 23:00:00" "2009-11-01 23:00:00" [33] "2009-11-08 23:00:00" "2009-11-15 23:00:00" "2009-11-22 23:00:00" "2009-11-29 23:00:00" "2009-12-06 23:00:00" "2009-12-13 23:00:00" "2009-12-20 23:00:00" "2009-12-27 23:00:00" [41] "2010-01-03 23:00:00" "2010-01-10 23:00:00" "2010-01-17 23:00:00" "2010-01-24 23:00:00" "2010-01-31 23:00:00" "2010-02-07 23:00:00" "2010-02-14 23:00:00" "2010-02-21 23:00:00" [49] "2010-02-28 23:00:00" "2010-03-07 23:00:00" "2010-03-14 23:00:00" "2010-03-21 23:00:00" "2010-03-29 00:00:00" "2010-04-05 00:00:00" "2010-04-12 00:00:00" "2010-04-19 00:00:00" [57] "2010-04-26 00:00:00" "2010-05-03 00:00:00" "2010-05-10 00:00:00" "2010-05-17 00:00:00" "2010-05-24 00:00:00" "2010-05-31 00:00:00" "2010-06-07 00:00:00" "2010-06-14 00:00:00" [65] "2010-06-21 00:00:00" "2010-06-28 00:00:00" "2010-07-05 00:00:00" "2010-07-12 00:00:00" "2010-07-19 00:00:00" "2010-07-26 00:00:00" "2010-08-02 00:00:00" "2010-08-09 00:00:00" [73] "2010-08-16 00:00:00" "2010-08-23 00:00:00" "2010-08-30 00:00:00" "2010-09-06 00:00:00" "2010-09-13 00:00:00" "2010-09-20 00:00:00" "2010-09-27 
00:00:00" "2010-10-04 00:00:00" [81] "2010-10-11 00:00:00" "2010-10-18 00:00:00" "2010-10-25 00:00:00" "2010-10-31 23:00:00" "2010-11-07 23:00:00" "2010-11-14 23:00:00" "2010-11-21 23:00:00" "2010-11-28 23:00:00" [89] "2010-12-05 23:00:00" "2010-12-12 23:00:00"

Now what I would like is to have more readable labels, such as 2010-W01 for the first week of 2010, 2009-W34 for the 34th week of 2009, etc. Is there an easier way to achieve that than writing out the whole recode syntax?

library(car)
dataset$newvar <- recode(dataset$oldvar, "
c('2009-03-30 00:00:00')='2009-W13';
c('2009-04-06 00:00:00')='2009-W14';
# etc...
c('2010-12-05 23:00:00')='2010-W48';
c('2010-12-12 23:00:00')='2010-W49';
# etc... this part would have to be updated over time unless I find some automatic procedure
")

Thanks,
Luca

Luca Meyer
www.lucameyer.com
IBM SPSS Statistics release 19.0.0
R version 2.12.1 (2010-12-16)
Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0
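A possible automatic alternative to the recode table, sketched in base R: format() accepts %G (ISO 8601 week-based year) and %V (ISO 8601 week number, weeks starting on Monday) on output, so labels of the form 2010-W01 can be generated directly from the timestamps. The vector `when` below is hypothetical and stands in for the original POSIXct variable; ISO week numbers need not coincide with those returned by lubridate's week().

```r
# Hypothetical timestamps standing in for the original POSIXct variable.
when <- as.POSIXct(c("2010-01-04 09:30:00", "2010-01-10 18:00:00",
                     "2010-01-11 08:00:00", "2011-01-01 12:00:00"),
                   tz = "UTC")

# %G = ISO week-based year, %V = ISO week number (01-53).
week_label <- factor(format(when, "%G-W%V"))
print(week_label)
```

Working from the original timestamps rather than from the cut() levels sidesteps the levels that read 23:00:00 after a daylight saving change. Note that 2011-01-01, a Saturday, is labelled 2010-W52 because it belongs to the last ISO week of 2010.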
[R] how to use diff() with different variables?
Hi,

I should first say that I am new to R. I have searched the R archives without success for an answer to what I am about to ask.

My dataset looks like:

  x                fine
1 A 2010-12-09 07:57:33
2 B 2010-12-09 08:05:00
3 C 2010-12-08 20:42:00
...

that is:

'data.frame': 3 obs. of 2 variables:
 $ x   : Factor w/ 3 levels "A","B","C": 1 2 3
 $ fine: POSIXct, format: "2010-12-09 07:57:33" "2010-12-09 08:05:00" "2010-12-08 20:42:00"

What I am trying to do is build another variable, fine1, that contains the lagged value of "fine", that is:

  x                fine               fine1
1 A 2010-12-09 07:57:33                  NA
2 B 2010-12-09 08:05:00 2010-12-09 07:57:33
3 C 2010-12-08 20:42:00 2010-12-09 08:05:00

How can I do that?

Thanks,
Luca

Luca Meyer
www.lucameyer.com
IBM SPSS Statistics release 19.0.0
R version 2.12.0 (2010-10-15)
Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0
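A lagged column like the one described can be built without diff(), by indexing the vector with positions shifted by one. The sketch below recreates the three rows from the post (the UTC time zone is an assumption); indexing with NA yields a missing value while keeping the POSIXct class.

```r
# The three rows shown in the post; the time zone is assumed.
d <- data.frame(
  x    = factor(c("A", "B", "C")),
  fine = as.POSIXct(c("2010-12-09 07:57:33",
                      "2010-12-09 08:05:00",
                      "2010-12-08 20:42:00"), tz = "UTC")
)

# Row i receives the value of row i - 1; the first row has no predecessor.
d$fine1 <- d$fine[c(NA, seq_len(nrow(d) - 1))]
print(d)
```

Once fine1 exists, the elapsed time between consecutive rows is simply d$fine - d$fine1, which is presumably what diff() was meant to provide.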