[R] Coding your Secret Santa in R!
Hello everyone!

Christmas is coming, and with it the gift exchange! Every year my family draws names from a hat to decide who gives a gift to whom. It is a basic and annoying method: it doesn't prevent people from drawing themselves, their partner, or the same person year after year, and it forces either everybody to be in the same place at the same time for the draw, or somebody to manage everything, which spoils the fun for that person.

This year I decided it was time to upgrade and enter the 2.0 era of Secret Santa: I've coded it in R!

The principle is simple. You enter the participants' names and the draw restrictions, and the program randomly picks everyone's Secret Santa and sends each giver an email with the result. R is so great...

If you're interested, here is my code. It's probably not optimal, but it works.

Merry Christmas!

Bastien

### code for the Christmas gift draw

### set working directory
setwd("U:\\Dropbox\\Gestion familiale\\tirage Noël Lombardo")

### load required package (only if you want to send emails)
library(sendmailR)

### set the year (only used a little below, could be more useful)
an <- 2015

### write a vector of all participants
#participants.2014 <- c("Bastien","Isa","Cath","Rob","Matt","Sylvie","John","Myriam","Yolande","Mike", "Audrey") # if you want history
participants.2015 <- c("Bastien","Isa","Cath","Rob","Matt","Sylvie","John")
participants <- participants.2015 ## the one to use this year

### If you want the code to send emails, make a named vector of the participants' email addresses
list.email <- c(Bastien="", Isa=" ", John=" ", Sylvie=" ", Cath=" ", Rob=" ", Matt=" ")

### You can add restrictions, i.e. people who can't give to other people. Create as many as you want.
### They take the form of a 2-column matrix, with the giver in the first column and the receiver in the second.
### In this case, there are 3 kinds of restrictions:
### 1) you don't want to draw yourself
### 2) you don't want to draw your partner, girlfriend or boyfriend
### 3) you don't want to draw the same person as last year

# 1)
restiction.soismeme <- cbind(giver=participants, receiver=participants)
# 2)
restriction.couple <- matrix(c("Bastien","Isa","Cath","Rob","Sylvie","John","Mike","Audrey"), 4, 2, byrow=T)
# 3) (restriction.2014 reads last year's results from my hard drive; it will not work on your computer)
#restriction.2013 <- matrix(c("Bastien","Sylvie","Isa","Bastien","Matt","Yolande","Rob","John","Cath","Rob"), 5, 2, byrow=T)
restriction.2014 <- cbind(unlist(strsplit(list.files("2014"), ".txt")),
                          as.character(unlist(sapply(list.files("2014", full.names=T), read.table))))

## then you append (rbind) all the restrictions, the order matters!
restrictions <- rbind(restriction.couple, restriction.couple[,2:1], restiction.soismeme, restriction.2014)

### I created a simple function validating the draw (checking whether a pair is in the restrictions).
### It returns TRUE when the pair matches a restriction.
### This function is used later in a "while" loop.
valide.res <- function(paires, restric){
  any(apply(restric, 1, function(xx) all(paires==xx)))
}

### Draw again as long as at least one restriction shows up in the results
res <- TRUE
while(res){
  tirage <- cbind(giver=sample(participants, length(participants)),
                  receiver=sample(participants, length(participants)))
  res <- any(apply(tirage, 1, valide.res, restrictions))
}

### This loop outputs the draw results
### It does 2 things:
### 1) save a text file named after the giver, which contains the receiver's name
### 2) send an email to the giver, with the receiver's name in the body of the message
for(i in 1:nrow(tirage)){
  # 1) write text file
  write.table(tirage[i,"receiver"], file=paste0(an,"\\",tirage[i,"giver"],".txt"),
              quote=F, row.names=F, col.names=F)
  # 2) send an email
  # (the body is in French: "Here is the result of the draw for the gift exchange <year>!
  #  You drew: *** <name> *** Well done, and Merry Christmas!")
  body <- list(paste0("Voici le résultat du tirage pour l'échange de cadeaux ", an, "!"," Vous avez pigé : "),
               paste0("*** ",tirage[i,"receiver"]," ***"),
               paste0("Bravo! et Joyeux Noël!"))
  sendmail(" ", list.email[[tirage[i,"giver"]]], "Secret Santa des Lombardo!", body,
           control=list(smtpServer="relais.videotron.ca"))
}
### It's all done!

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
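The draw described in the post above can also be sketched as a single permutation of the receivers that is rejected and redrawn whenever it contains a forbidden pair. This is only a minimal illustration under stated assumptions, not the author's code: the names `couples`, `forbidden`, `forbidden_keys` and `draw` are hypothetical, the 2014 history restriction is left out because it depends on files on the author's disk, and no email is sent.

```r
set.seed(2015)  # only to make the sketch reproducible

participants <- c("Bastien", "Isa", "Cath", "Rob", "Matt", "Sylvie", "John")

# Hypothetical restriction table: couples (both directions) and self-draws
couples <- matrix(c("Bastien", "Isa", "Cath", "Rob", "Sylvie", "John"),
                  ncol = 2, byrow = TRUE)
forbidden <- rbind(couples, couples[, 2:1], cbind(participants, participants))
forbidden_keys <- paste(forbidden[, 1], forbidden[, 2], sep = ">")

# Rejection sampling: permute the receivers until no pair is forbidden
repeat {
  receivers <- sample(participants)
  draw_keys <- paste(participants, receivers, sep = ">")
  if (!any(draw_keys %in% forbidden_keys)) break
}

draw <- cbind(giver = participants, receiver = receivers)
print(draw)
```

Because the givers stay in a fixed order and only the receivers are permuted, every participant gives exactly once and receives exactly once by construction; the loop only has to check the restrictions.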
[R] Weird behavior of aggregate() function
Hello list,

I have found a weird behavior of the aggregate() function when used with characters. I think the problem has to do with converting characters to factors. I'm trying to aggregate a character vector using a homemade function. My function gives me all the possible pairs of modalities observed.

Reproducible code:

###
### my grouping variable
gr <- c("A","A","B","B","C","C","C","D","D","E","E","E")
### my variable
vari <- c("rs2","rs2","mj2","mj1","rs1","rs1","rs2","mj1","mj1","rs1","mj1","mj2")
### what the table would look like
cbind(gr,vari)

### My function that gives every possible pair of variables
### (my real function can go up to length(TE)==5, but for the sake of the example, I've reduced it here)
faire.paires <- function(TE){
  gg <- rbind(c(TE[1],TE[2]),
              c(TE[1],TE[3]))
  gg <- gg[rowSums(is.na(gg))==0,,drop=F]
  gg
}

### The function gives exactly what I want when I run it on a specific entry
faire.paires(TE = vari[gr=="B"])

### But with aggregate(), it transforms everything into integers
res <- aggregate(list(TE = vari), by=list(gr), faire.paires)
res
str(res)
### it's like it's using factors, then losing the key that tells me which integer
### means which modality

### if I give it factors directly:
res2 <- aggregate(list(TE = as.factor(vari)), by=list(gr), faire.paires)
res2
str(res2)
### it does not fix the problem.

Any idea? I know my function may not be the best or most efficient way to do this. However, I'm still puzzled as to why aggregate() gives me this weird output.

Best regards,

Bastien Ferland-Raymond, M.Sc. Stat., M.Sc. Biol.
Division des orientations et projets spéciaux
Direction des inventaires forestiers
Ministère des Forêts, de la Faune et des Parcs
Re: [R] Weird behavior of aggregate() function
Thanks Ista for your help, it works and I understand why.

However, I'm still confused about why the previous code lost the factor key. It could just have converted to factors and output factors, but instead it's outputting integers...

I'm not a very big fan of the default stringsAsFactors=T, but that's another debate.

Anyway, thanks again,

Bastien

-----Original Message-----
From: Ista Zahn [mailto:istaz...@gmail.com]
Sent: January 26, 2015 11:51
To: Ferland-Raymond, Bastien (DIF)
Cc: r-help@r-project.org
Subject: Re: [R] Weird behavior of aggregate() function

?aggregate informs you that unless x is a time series it will be converted to a data.frame. data.frame will convert your character to a factor unless you tell it not to. You can prevent this by converting vari to a data.frame yourself, passing the stringsAsFactors argument, like this:

aggregate(data.frame(TE = vari, stringsAsFactors = FALSE), by=list(gr), faire.paires)

Best,
Ista

On Mon, Jan 26, 2015 at 11:30 AM, bastien.ferland-raym...@mffp.gouv.qc.ca wrote:
> Hello list,
> I have found a weird behavior of the aggregate() function when used with characters.
> [...]
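As a complement to the fix discussed in this thread, the same grouping can be sketched without aggregate() at all, using split() and lapply(), which never pass the character vector through a data.frame. This is an alternative technique, not what was proposed in the thread; the name `paires` is hypothetical, and the data and function are retyped from the original post.

```r
# Data and function from the original post
gr   <- c("A","A","B","B","C","C","C","D","D","E","E","E")
vari <- c("rs2","rs2","mj2","mj1","rs1","rs1","rs2","mj1","mj1","rs1","mj1","mj2")

faire.paires <- function(TE) {
  gg <- rbind(c(TE[1], TE[2]),
              c(TE[1], TE[3]))
  gg[rowSums(is.na(gg)) == 0, , drop = FALSE]
}

# split() cuts the character vector by group; lapply() applies the
# function to each piece, so the values stay character throughout
paires <- lapply(split(vari, gr), faire.paires)
str(paires)
```

The result is a named list with one character matrix per group rather than a data.frame, which may or may not suit the downstream code.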
[R] bad label change in step() from lmerTest package
Hello list,

I recently started working with the step() function in the lmerTest package and I noticed a weird behavior that may be a bug. The package performs stepwise selection of fixed and random effects; however, when it discards the random variable because it is not significant, it changes the label of the dependent variable in the best model formula.

Here is a reproducible example:

### load the library:
library(lmerTest)

### data preparation
set.seed(1234)
## the Xs
x1 = rnorm(100,23,2)
x2 = rnorm(100,15,3)
x3 = rnorm(100,5,2)
x4 = rnorm(100,10,5)
## the dependent variable
dep = (2 * x1 + rnorm(100,0,5)) + (-4 * x2 + rnorm(100,0,1)) + (0.1 * x3 + rnorm(100,0,3)) + (1 * x4 + rnorm(100,0,8))
## the random variables, one good (significant) and one bad (not significant)
good.random = as.character(cut(dep+rnorm(100,0,2), 3, c("group1","group2","group3")))
bad.random = sample(c("group1","group2","group3"), 100, replace=T)

### we make the starting models, one with the good and one with the bad random variable
mod.good <- lmer(dep ~ x1+x2+x3+x4+(1|good.random))
mod.bad <- lmer(dep ~ x1+x2+x3+x4+(1|bad.random))

### we do the stepwise selection
select.good <- step(mod.good) # should keep the random variable
select.bad <- step(mod.bad)   # should remove the random variable

### The label of the dependent variable changes between the model where the random effect
### was removed and the one where it was kept.
formula(select.good$model)
# output : dep ~ x1 + x2 + x4 + (1 | good.random)
# it's what it's supposed to be : dep ~
formula(select.bad$model)
# output : y ~ x1 + x2 + x3 + x4
# here, it's changed to : y ~
### end code

This is problematic when you're doing automatic model selection. Is it an option that I missed, or a bug?

Also, it's interesting to notice that the stepwise selection on the model with the bad random variable didn't remove the variable x3, which is clearly not significant. So I wonder whether the function still performs selection of fixed effects after having removed the random effects.

Thanks,

Bastien Ferland-Raymond, M.Sc. Stat., M.Sc. Biol.
Division des orientations et projets spéciaux
Direction des inventaires forestiers
Ministère des Forêts, de la Faune et des Parcs
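For automatic model selection, one possible workaround for the changed label (a sketch only, not something confirmed by the lmerTest authors) is to restore the response name with update() from the stats package. The formula below is typed in by hand to mimic the output reported in the post, so the sketch runs without lmerTest; `reported` and `fixed_formula` are hypothetical names.

```r
# Formula as reported by formula(select.bad$model) in the post
reported <- y ~ x1 + x2 + x3 + x4

# Replace only the left-hand side; "." keeps the right-hand side unchanged
fixed_formula <- update(reported, dep ~ .)
print(fixed_formula)
```

This only repairs the label. It says nothing about whether the fixed-effects selection was actually carried out after the random effect was dropped, which is the second question raised in the post.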
Re: [R] column width in .dbf files using write.dbf ... to be continued
Hello Arnaud,

You posted this question a long, long time ago; however, I found the answer, so I decided to post it anyway in case somebody else has the same problem as you and me.

You were actually very close to finding the solution. The function DoWritedbf is an internal function of the foreign package. To access it from outside the package, just use:

foreign:::DoWritedbf

so your line becomes:

invisible(.Call(foreign:::DoWritedbf, as.character(file), dataframe, as.integer(precision), as.integer(scale), as.character(DataTypes)))

It is explained here:
http://stackoverflow.com/questions/2165342/r-calling-a-function-from-a-namespace

Sorry for the delay in my answer...

Bastien Ferland-Raymond, M.Sc. Stat., M.Sc. Biol.
Division des orientations et projets spéciaux
Direction des inventaires forestiers
Ministère des Ressources naturelles

In reply to:

Dear UseRs,

I did not get any answer to my previous message (Is there a way to manually define column widths when using the write.dbf function from the foreign library?), so I tried to modify the write.dbf function to do what I want.

Here is my modified version:

write.dbfMODIF <- function (dataframe, file, factor2char = TRUE, max_nchar = 254, width = "d")
{
    allowed_classes <- c("logical", "integer", "numeric", "character", "factor", "Date")
    if (!is.data.frame(dataframe))
        dataframe <- as.data.frame(dataframe)
    if (any(sapply(dataframe, function(x) !is.null(dim(x)))))
        stop("cannot handle matrix/array columns")
    cl <- sapply(dataframe, function(x) class(x[1L]))
    asis <- cl == "AsIs"
    cl[asis & sapply(dataframe, mode) == "character"] <- "character"
    if (length(cl0 <- setdiff(cl, allowed_classes)))
        stop("data frame contains columns of unsupported class(es) ",
             paste(cl0, collapse = ","))
    m <- ncol(dataframe)
    DataTypes <- c(logical = "L", integer = "N", numeric = "F", character = "C",
                   factor = if (factor2char) "C" else "N", Date = "D")[cl]
    for (i in seq_len(m)) {
        x <- dataframe[[i]]
        if (is.factor(x))
            dataframe[[i]] <- if (factor2char) as.character(x) else as.integer(x)
        else if (inherits(x, "Date"))
            dataframe[[i]] <- format(x, "%Y%m%d")
    }
    precision <- integer(m)
    scale <- integer(m)
    dfnames <- names(dataframe)
    for (i in seq_len(m)) {
        nlen <- nchar(dfnames[i], "b")
        x <- dataframe[, i]
        if (is.logical(x)) {
            precision[i] <- 1L
            scale[i] <- 0L
        }
        else if (is.integer(x)) {
            rx <- range(x, na.rm = TRUE)
            rx[!is.finite(rx)] <- 0
            if (any(rx == 0))
                rx <- rx + 1
            mrx <- as.integer(max(ceiling(log10(abs(rx)))) + 3L)
            precision[i] <- min(max(nlen, mrx), 19L)
            scale[i] <- 0L
        }
        else if (is.double(x)) {
            precision[i] <- 19L
            rx <- range(x, na.rm = TRUE)
            rx[!is.finite(rx)] <- 0
            mrx <- max(ceiling(log10(abs(rx))))
            scale[i] <- min(precision[i] - ifelse(mrx > 0L, mrx + 3L, 3L), 15L)
        }
        else if (is.character(x)) {
            if (width == "d") {
                # default: width computed from the data, as in the original write.dbf
                mf <- max(nchar(x[!is.na(x)], "b"))
                p <- max(nlen, mf)
                if (p > max_nchar)
                    warning(gettextf("character column %d will be truncated to %d bytes",
                                     i, max_nchar), domain = NA)
                precision[i] <- min(p, max_nchar)
                scale[i] <- 0L
            }
            else {
                # user-supplied width
                if (width > max_nchar)
                    warning(gettextf("character column %d will be truncated to %d bytes",
                                     i, max_nchar), domain = NA)
                precision[i] <- min(width, max_nchar)
            }
        }
        else stop("unknown column type in data frame")
    }
    if (any(is.na(precision)))
        stop("NA in precision")
    if (any(is.na(scale)))
        stop("NA in scale")
    invisible(.Call(DoWritedbf, as.character(file), dataframe,
                    as.integer(precision), as.integer(scale), as.character(DataTypes)))
}

However, when I wanted to use this function ... it does not find the DoWritedbf function that is called in the last lines (a function written in C).

Is there a way to temporarily replace the original write.dbf function with this one in the foreign package?

Thanks,
Arnaud

R version 2.10.0 (2009-10-26)
i386-pc-mingw32
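The `:::` operator used in the answer above works for any package, not only foreign. A minimal sketch with a package that ships with every R installation: Pillai() is assumed here to be an internal, non-exported helper of the stats package (it is used by summary.manova()); the names `internal_fun` and `same_fun` are hypothetical.

```r
# `:::` reaches objects that a package does not export
internal_fun <- stats:::Pillai

# utils::getFromNamespace() does the same lookup with the name as a string
same_fun <- utils::getFromNamespace("Pillai", "stats")

is.function(internal_fun)
identical(internal_fun, same_fun)

# Check whether the name is part of the package's exported interface
"Pillai" %in% getNamespaceExports("stats")
```

Internal objects are not part of a package's public interface and can change or disappear between versions, so code relying on `:::` is best kept to personal scripts.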
[R] Modifying values into XML with R
Dear R gurus,

I use R all the time at work, so one day, when a problem arose managing my personal data, I thought: why not use R, it does everything! Anyway, my goal is to use R to manage my personal music library, and more precisely my playcounts.

I have two XML files, one from Winamp and the other from iTunes. Both contain pretty much the same songs, but their playcounts are different. I want to import both of them into R, merge their playcounts and export the result back to an XML file that I will be able to reload in Winamp (or iTunes).

So far, I have managed to import both libraries into R, extract their playcounts and merge them. But now I'm stuck at putting the new playcount back into the original XML.

Here is a reproducible example showing what I want to do:

### first download one of my XML files from:
## https://www.dropbox.com/s/qxteao3z8ypyfqh/petitXMLwinamp.xml

## load it in R and get its root:
library(XML)
winamp <- xmlTreeParse("petitXMLwinamp.xml", useInternalNodes = TRUE)
racine <- xmlRoot(winamp)
racine # to view the library

### I can extract one song (the first one for the example):
une.chanson <- xmlSApply(racine[[1]][["dict"]][[2]], xmlValue)

### I can extract the playcount of this track with:
racine[[1]][["dict"]][[2]][which.max(une.chanson == "Play Count")+1]

### Now, I would simply like to change it from 2 to, let's say, 17:
racine[[1]][["dict"]][[2]][which.max(une.chanson == "Play Count")+1] <- 17
### it doesn't work, I get the error:
# Error in racine[[1]][["dict"]][[2]][which.max(une.chanson == "Play Count") + :
#   object of type 'externalptr' is not subsettable

### If I try again, digging further into the node, I get the same error but with a different outcome:
racine[[1]][["dict"]][[2]][which.max(une.chanson == "Play Count")+1][[1]][[1]] <- 17
#Error in racine[[1]][["dict"]][[2]][which.max(une.chanson == "Play Count") + :
#  object of type 'externalptr' is not subsettable

racine[[1]][["dict"]][[2]][which.max(une.chanson == "Play Count")+1]
#$integer
#<integer>217</integer>
#
#attr(,"class")
#[1] "XMLInternalNodeList" "XMLNodeList"

### It raised an error, but appended 17 to the 2 to create 217...

Does anybody here have an idea how to just change values in my XML document?

Also, the class of my playcount node is XMLNodeList. What is an XMLNodeList? I can't find any reference to it in the XML package manual, so I don't know how to handle or create one. The solution may lie in switching from XMLNodeList to XMLNode and back to XMLNodeList. Anyway, these things are kind of complicated, and I don't think I understand the whole XML structure well yet.

Thanks in advance for your help; don't hesitate to ask questions if you need more details.

Bastien

R version 2.14.1 (2011-12-22)
Platform: i386-pc-mingw32/i386 (32-bit) on Windows
[R] optimising a loop
Dear R community,

I'm trying to remove a loop from my code, but I'm stuck and I can't find a good way to do it. Hopefully one of you will have something clever to propose.

Here is a simplified example. I have a square matrix:

nom.plac2 <- c("102", "103", "301", "303", "304", "403")
poids2 <- matrix(NA, 6, 6, dimnames=list(nom.plac2, nom.plac2))
poids2
    102 103 301 303 304 403
102  NA  NA  NA  NA  NA  NA
103  NA  NA  NA  NA  NA  NA
301  NA  NA  NA  NA  NA  NA
303  NA  NA  NA  NA  NA  NA
304  NA  NA  NA  NA  NA  NA
403  NA  NA  NA  NA  NA  NA

I want to replace some of the NAs according to specific criteria contained in 2 other matrices:

wei2 <- matrix(c(.6,.4,.5,.5,.9,.1,.8,.2,.7,.3,.6,.4), 6, 2, dimnames=list(nom.plac2, c("p1","p2")), byrow=T)
wei2
     p1  p2
102 0.6 0.4
103 0.5 0.5
301 0.9 0.1
303 0.8 0.2
304 0.7 0.3
403 0.6 0.4

voisin <- matrix(c("103","304","303","102","103","303","403","304","303","102","103","303"), 6, 2, dimnames=list(nom.plac2, c("v1","v2")), byrow=T)
voisin
    v1    v2
102 "103" "304"
103 "303" "102"
301 "103" "303"
303 "403" "304"
304 "303" "102"
403 "103" "303"

So my final result is:

    102 103 301 303 304 403
102  NA 0.6  NA  NA 0.4  NA
103 0.5  NA  NA 0.5  NA  NA
301  NA 0.9  NA 0.1  NA  NA
303  NA  NA  NA  NA 0.2 0.8
304 0.3  NA  NA 0.7  NA  NA
403  NA 0.6  NA 0.4  NA  NA

So, overall, for each row of poids2 I want to fill in the values from wei2 in the columns given by the matching identifiers found in voisin. This can easily be done with a loop:

loop <- poids2
for(i in 1:6){
  loop[i,voisin[i,]] <- wei2[i,]
}

But I expect it to be quite slow with my larger dataset. Does anyone have an idea how I could remove the loop and speed up the operation?

Best regards,

Bastien Ferland-Raymond, M.Sc. Stat., M.Sc. Biol.
Division des orientations et projets spéciaux
Direction des inventaires forestiers
Ministère des Ressources naturelles et de la Faune du Québec
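One way the loop in this post could be removed is matrix indexing: R accepts a two-column matrix of (row, column) positions as an index, so all cells can be assigned in one call. The sketch below is a suggestion under stated assumptions, not an answer taken from the thread: the identifiers are assumed to be character strings, the names `idx` and `vec` are hypothetical, and match() is used to translate the neighbour identifiers into column positions.

```r
nom.plac2 <- c("102", "103", "301", "303", "304", "403")
poids2 <- matrix(NA_real_, 6, 6, dimnames = list(nom.plac2, nom.plac2))
wei2 <- matrix(c(.6,.4,.5,.5,.9,.1,.8,.2,.7,.3,.6,.4), 6, 2,
               dimnames = list(nom.plac2, c("p1", "p2")), byrow = TRUE)
voisin <- matrix(c("103","304","303","102","103","303",
                   "403","304","303","102","103","303"), 6, 2,
                 dimnames = list(nom.plac2, c("v1", "v2")), byrow = TRUE)

# Reference result: the loop from the post
loop <- poids2
for (i in seq_len(nrow(voisin))) loop[i, voisin[i, ]] <- wei2[i, ]

# Vectorised version: one (row, column) pair per cell to fill.
# as.vector() unrolls voisin and wei2 column by column, so the row
# numbers are repeated once per column to stay aligned with them.
idx <- cbind(rep(seq_len(nrow(voisin)), times = ncol(voisin)),
             match(as.vector(voisin), colnames(poids2)))
vec <- poids2
vec[idx] <- as.vector(wei2)

all.equal(vec, loop)
```

On a large dataset the single indexed assignment avoids the per-row overhead of the loop; whether the gain matters in practice would need to be timed on the real data.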