[R] Coding your Secret Santa in R!

2015-12-01 Thread Bastien.Ferland-Raymond
Hello Everyone!

Christmas is coming and with it, gift exchange!  Every year, my family draws 
names from a hat to decide who gives a gift to whom.  It's a very basic and 
annoying method: it doesn't prevent somebody from drawing themselves, their 
partner, or the same person year after year, and it forces you either to have 
everybody in the same place at the same time for the draw or to have somebody 
manage everything, which spoils the fun for him/her. 

This year, I decided it was time to upgrade and enter the 2.0 era of Secret 
Santa: I've coded it in R!

The principle is simple.  You enter the people's names and the draw 
restrictions, and the program randomly picks everyone's Secret Santa and sends 
them an email to tell them.  R is so great...

If you're interested, here is my code.  It's probably not optimal but it still 
works.  Some of the comments are in French, sorry about that.

Merry Christmas!
Bastien




###  Code for the Christmas gift draw

###  set working directory
setwd("U:\\Dropbox\\Gestion familiale\\tirage Noël Lombardo")

### load required package (only if you want to send emails)
library(sendmailR)

### set the year (used a little further down; it could be put to more use)
an <- 2015

### write a vector of all participants
#participants.2014 <- c("Bastien","Isa","Cath","Rob","Matt","Sylvie","John","Myriam","Yolande","Mike","Audrey") # if you want history
participants.2015 <- c("Bastien","Isa","Cath","Rob","Matt","Sylvie","John")

participants <- participants.2015   ## The one to use this year

###  If you want the code to send emails, make a named vector of the 
###  participants' email addresses
list.email <- c(Bastien="", 
Isa="",
John="", 
Sylvie="",
Cath="", Rob="",
Matt="")


###  You can add restrictions, i.e. people who can't give to other people.
###  Create as many as you want.
###  They take the form of 2-column matrices, with the first column being the 
###  giver and the second column the receiver.
###  In this case, there are 3 kinds of restrictions: 
###1) you don't want to draw yourself
###2) you don't want to draw your partner, girlfriend or boyfriend
###3) you don't want to draw the same person as last year 
#1)
restiction.soismeme <- cbind(giver=participants,receiver=participants)  
   

#2)
restriction.couple <- matrix(c("Bastien","Isa","Cath","Rob","Sylvie", 
"John","Mike","Audrey"),4,2,byrow=T)

#3) (restriction.2014 reads last year's results from my hard drive; it will 
#   not work on your computer)
#restriction.2013 <- matrix(c("Bastien","Sylvie", "Isa", "Bastien", "Matt", 
#   "Yolande","Rob","John","Cath","Rob"),5,2,byrow=T)
restriction.2014 <- 
cbind(unlist(strsplit(list.files("2014"),".txt")),
      as.character(unlist(sapply(list.files("2014", full.names=T),read.table))))

##  then you append (rbind) all the restrictions, the order matters!
restrictions <- 
rbind(restriction.couple,restriction.couple[,2:1],restiction.soismeme,restriction.2014)


###  I created a simple function validating the draw (making sure the draw 
###  isn't in the restrictions).
###  It returns TRUE when the pair matches a restriction, i.e. is NOT allowed.
###  This function is used later in a "while" loop.
valide.res <- function(paires, restric){
any(apply(restric, 1, function(xx) all(paires==xx)))
}


###  Draw people as long as you have a restriction in the results
res=T
while(res==T){
tirage <- 
cbind(giver=sample(participants,length(participants)),receiver=sample(participants,length(participants)))
res <- any(apply(tirage,1,valide.res,restrictions))
}



###  This loop is run to output the draw results
###  It does 2 things:
###   1) save a text file named with the giver's name which contains the 
###      receiver's name 
###   2) send an email to the giver with the body of the message being the 
###      receiver's name
for(i in 1:nrow(tirage)){ 
  # 1) write text file (the folder named after the year must already exist)
  write.table(tirage[i,"receiver"],file=paste0(an,"\\",tirage[i,"giver"],".txt"), 
              quote=F,row.names=F, col.names=F) 
  # 2) send an email (the message is in French: "Here is the result of the 
  #    draw for the <year> gift exchange!  You drew: *** name ***")
  body <- list(paste0("Voici le résultat du tirage pour l'échange de cadeaux ", 
                      an, "!","  Vous avez pigé : "),
               paste0("*** ",tirage[i,"receiver"]," ***"),
               paste0("Bravo! et Joyeux Noël!"))
  sendmail("", list.email[[tirage[i,"giver"]]], 
           "Secret Santa des Lombardo!", body, 
           control=list(smtpServer="relais.videotron.ca"))
}


###  It's all done!
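
A possible variation on the draw step, offered only as a sketch: the script above shuffles both the giver and the receiver columns, but it is enough to keep the givers fixed and shuffle the receivers, then reject the draw if any pair is forbidden. The function name draw_santa, the max_tries argument and the example names below are assumptions made for illustration; they are not part of the original script.

```r
# Sketch: rejection sampling with fixed givers and shuffled receivers.
# `forbidden` is a 2-column matrix (giver, receiver), like `restrictions` above.
draw_santa <- function(participants, forbidden, max_tries = 10000) {
  forbidden_keys <- paste(forbidden[, 1], forbidden[, 2], sep = "->")
  for (k in seq_len(max_tries)) {
    receivers <- sample(participants)              # random permutation
    keys <- paste(participants, receivers, sep = "->")
    if (!any(keys %in% forbidden_keys)) {
      return(cbind(giver = participants, receiver = receivers))
    }
  }
  stop("no valid draw found; the restrictions may be unsatisfiable")
}

set.seed(2015)
people <- c("Bastien", "Isa", "Cath", "Rob", "Matt")
forbidden <- rbind(
  cbind(people, people),                           # nobody draws themselves
  c("Bastien", "Isa"), c("Isa", "Bastien"),        # couples, both directions
  c("Cath", "Rob"), c("Rob", "Cath")
)
draw <- draw_santa(people, forbidden)
draw
```

The max_tries guard avoids the infinite loop that a plain while loop would run into if the restrictions left no valid assignment.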

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Weird behavior of aggregate() function

2015-01-26 Thread Bastien.Ferland-Raymond

Hello list,

I have found a weird behavior of the aggregate() function when used with 
character vectors. I think the problem has to do with the conversion of 
characters to factors.

I'm trying to aggregate a character vector using a homemade function.  My 
function gives me all the possible pairs of modalities observed.


Reproducible code:

###
### my grouping variable
gr <- c("A","A","B","B","C","C","C","D","D","E","E","E")
### my variable
vari <- c("rs2","rs2","mj2","mj1","rs1","rs1","rs2","mj1","mj1","rs1","mj1","mj2")

### what the table would look like
cbind(gr,vari)

###  My function that gives every possible pair of values (my real function 
###  can go up to length(TE)==5, but for the sake of the example, I've reduced 
###  it here)
faire.paires <- function(TE){
gg <- rbind(c(TE[1],TE[2]),
c(TE[1],TE[3]))
gg <- gg[rowSums(is.na(gg))==0,,drop=F]
gg
}

###  The function gives exactly what I want when I run it on a specific entry
faire.paires(TE = vari[gr=="B"])

###  But with aggregate(), it transforms everything into integers
res <- aggregate(list(TE = vari), by=list(gr),faire.paires)
res
str(res)

###  it's like it's using factors, then losing the key that tells me which
###  integer means which modality


###  if I give it factors directly:
res2 <- aggregate(list(TE = as.factor(vari)), by=list(gr),faire.paires)
res2
str(res2)

###  it does not fix the problem.


Any idea?

I know my function may not be the best or most efficient way to do this. 
However, I'm still puzzled as to why aggregate() gives me this weird output.

Best regards,

Bastien Ferland-Raymond, M.Sc. Stat., M.Sc. Biol.
Division des orientations et projets spéciaux
Direction des inventaires forestiers
Ministère des Forêts, de la Faune et des Parcs 



Re: [R] Weird behavior of aggregate() function

2015-01-26 Thread Bastien.Ferland-Raymond
Thanks Ista for your help, it works and I understand why.

However, I'm still confused about why the previous code lost the factor key.  
It could just have converted to factors and output factors, but instead it's 
outputting integers...

I'm not a very big fan of the default stringsAsFactors=TRUE, but that's 
another debate.
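
One likely explanation for the lost key, sketched here under the assumption that the values reach faire.paires() as a factor: rbind() discards the classes of its inputs, so factors are replaced by their internal integer codes and the levels attribute is dropped. The small vector below is made up for illustration. Note also that since R 4.0.0 the default is stringsAsFactors = FALSE, so the original example may behave differently on a recent R.

```r
# A factor stores integer codes plus a "levels" lookup table.
f <- factor(c("rs2", "mj2", "mj1"))   # levels: mj1, mj2, rs2

# rbind() keeps only the codes; the result is a plain numeric matrix.
codes <- rbind(f[1:2], f[2:3])
codes
is.factor(codes)                      # FALSE: the level labels are gone

# The labels can be recovered as long as the levels are still at hand.
labels <- matrix(levels(f)[codes], nrow = nrow(codes))
labels
```

Converting to character before combining (for example as.character(TE) at the top of the function) avoids the problem without relying on stringsAsFactors.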

Anyway, thanks again,

Bastien 

-----Original Message-----
From: Ista Zahn [mailto:istaz...@gmail.com] 
Sent: January 26, 2015 11:51
To: Ferland-Raymond, Bastien (DIF)
Cc: r-help@r-project.org
Subject: Re: [R] Weird behavior of aggregate() function

?aggregate informs you that unless x is a time series it will be converted to a 
data.frame. data.frame will convert your character to a factor unless you tell 
it not to.

You can prevent this by converting vari to a data.frame yourself, passing the 
stringsAsFactors argument, like this:

aggregate(data.frame(TE = vari, stringsAsFactors = FALSE),
by=list(gr),faire.paires)

Best,
Ista


[R] bad label change in step() from lmerTest package

2014-12-16 Thread Bastien.Ferland-Raymond
Hello list,

I recently started working with the step() function in the lmerTest package and 
I noticed a weird behavior that may be a bug.  The function performs stepwise 
selection of fixed and random effects; however, when it discards the random 
effect because it is not significant, it changes the label of the dependent 
variable in the best model formula. 

Here is a reproducible example :

### load the library:
library(lmerTest)

###  data preparation
set.seed(1234)

## the Xs
x1 = rnorm(100,23,2)
x2 = rnorm(100,15,3)
x3 = rnorm(100,5,2)
x4 = rnorm(100,10,5)

## the dependent variable
dep = (2 * x1 +  rnorm(100,0,5)) + (-4 * x2 +  rnorm(100,0,1)) + (0.1 * x3 +  
rnorm(100,0,3)) + (1 * x4 +  rnorm(100,0,8))

## the random variables, one good (significant) and one bad (not significant)
good.random = as.character(cut(dep+rnorm(100,0,2),3, 
c("group1","group2","group3")))
bad.random = sample(c("group1","group2","group3"), 100, replace=T)

###  we make the starting models, one with the good and one with the bad 
###  random variable
mod.good <- lmer(dep ~ x1+x2+x3+x4+(1|good.random))
mod.bad  <- lmer(dep ~ x1+x2+x3+x4+(1|bad.random))

### we do the stepwise selection
select.good <- step(mod.good)   # should keep the random variable
select.bad  <- step(mod.bad)    # should remove the random variable

###  The label of the dependent variable changes between the model where the 
###  random effect was removed and the one where it was kept.
formula(select.good$model)
# output : dep ~ x1 + x2 + x4 + (1 | good.random)
# this is what it's supposed to be : dep ~

formula(select.bad$model)
# output : y ~ x1 + x2 + x3 + x4
# here, it has been changed to : y ~
### end code

This is problematic when you're doing automatic model selection.  Is it an 
option that I missed, or a bug?
Also, it's interesting to note that the stepwise selection of the model with 
the bad random variable didn't remove the variable x3, which is clearly not 
significant.  So I wonder if the function is doing selection of fixed effects 
after having removed the random effects.

Thanks,



Bastien Ferland-Raymond, M.Sc. Stat., M.Sc. Biol.
Division des orientations et projets spéciaux
Direction des inventaires forestiers
Ministère des Forêts, de la Faune et des Parcs 



Re: [R] column width in .dbf files using write.dbf ... to be continued

2013-05-22 Thread Bastien.Ferland-Raymond

Hello Arnaud,

You posted this question a long, long time ago; however, I found the answer, so 
I decided to post it anyway in case somebody else has the same problem as you 
and me.

You were actually very close to finding your solution.  The function DoWritedbf 
is an internal (unexported) function from the foreign package.  To access it 
from outside the package, just do:

foreign:::DoWritedbf

so in your line:

invisible(.Call(foreign:::DoWritedbf, as.character(file), dataframe,
  as.integer(precision), as.integer(scale), as.character(DataTypes)))

It is explained here: 
http://stackoverflow.com/questions/2165342/r-calling-a-function-from-a-namespace
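
As a minimal sketch of the same idea on a package that ships with every R installation: an unexported object can be fetched either with `:::` or by looking it up in the namespace environment. The helper name get_internal is made up for this example, and Pillai (an internal helper of the stats package, as far as I know) is used only as a stand-in for DoWritedbf.

```r
# Sketch: fetch an object from a package namespace, exported or not.
# get_internal() is a hypothetical helper, not part of any package.
get_internal <- function(name, pkg) {
  get(name, envir = asNamespace(pkg), inherits = FALSE)
}

# Equivalent to stats:::Pillai
pillai <- get_internal("Pillai", "stats")
is.function(pillai)

# utils::getFromNamespace() does the same lookup
pillai2 <- utils::getFromNamespace("Pillai", "stats")
identical(pillai, pillai2)
```

For the case discussed in this thread, the corresponding call would be get_internal("DoWritedbf", "foreign"), provided the foreign package is installed. Relying on internal objects is fragile, since they can change between package versions without notice.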

Sorry for the delay in my answer...

Bastien Ferland-Raymond, M.Sc. Stat., M.Sc. Biol.
Division des orientations et projets spéciaux
Direction des inventaires forestiers
Ministère des Ressources naturelles

In reply to :
#
Dear UseRs,
I did not get any answer to my previous message ("Is there a way to define 
manually columns width when using write.dbf function from the library foreign 
?"), so I tried to modify the write.dbf function to do what I want.
Here is my modified version:
write.dbfMODIF <- function (dataframe, file, factor2char = TRUE, max_nchar = 
254, width = "d")
{
allowed_classes <- c("logical", "integer", "numeric", "character",
"factor", "Date")
if (!is.data.frame(dataframe))
dataframe <- as.data.frame(dataframe)
if (any(sapply(dataframe, function(x) !is.null(dim(x)))))
stop("cannot handle matrix/array columns")
cl <- sapply(dataframe, function(x) class(x[1L]))
asis <- cl == "AsIs"
cl[asis & sapply(dataframe, mode) == "character"] <- "character"
if (length(cl0 <- setdiff(cl, allowed_classes)))
stop("data frame contains columns of unsupported class(es) ",
paste(cl0, collapse = ","))

m <- ncol(dataframe)
DataTypes <- c(logical = "L", integer = "N", numeric = "F",
character = "C", factor = if (factor2char) "C" else "N",
Date = "D")[cl]
for (i in seq_len(m)) {
x <- dataframe[[i]]
if (is.factor(x))
dataframe[[i]] <- if (factor2char)
as.character(x)
else as.integer(x)
else if (inherits(x, "Date"))
dataframe[[i]] <- format(x, "%Y%m%d")
}
precision <- integer(m)
scale <- integer(m)
dfnames <- names(dataframe)
for (i in seq_len(m)) {
nlen <- nchar(dfnames[i], "b")
x <- dataframe[, i]
if (is.logical(x)) {
precision[i] <- 1L
scale[i] <- 0L
}
else if (is.integer(x)) {
rx <- range(x, na.rm = TRUE)
rx[!is.finite(rx)] <- 0
if (any(rx == 0))
rx <- rx + 1
mrx <- as.integer(max(ceiling(log10(abs(rx)))) + 3L)
precision[i] <- min(max(nlen, mrx), 19L)
scale[i] <- 0L
}
else if (is.double(x)) {
precision[i] <- 19L
rx <- range(x, na.rm = TRUE)
rx[!is.finite(rx)] <- 0
mrx <- max(ceiling(log10(abs(rx))))
scale[i] <- min(precision[i] - ifelse(mrx > 0L, mrx +
3L, 3L), 15L)
}
else if (is.character(x)) {
if (width == "d") {
# default: width taken from the longest string in the column
mf <- max(nchar(x[!is.na(x)], "b"))
p <- max(nlen, mf)
if (p > max_nchar)
warning(gettextf("character column %d will be truncated to %d bytes",
  i, max_nchar), domain = NA)
precision[i] <- min(p, max_nchar)
scale[i] <- 0L
} else {
# user-supplied width
if (width > max_nchar)
warning(gettextf("character column %d will be truncated to %d bytes",
  i, max_nchar), domain = NA)
precision[i] <- min(width, max_nchar)
}
}
else stop("unknown column type in data frame")
}
if (any(is.na(precision)))
stop("NA in precision")
if (any(is.na(scale)))
stop("NA in scale")
invisible(.Call(DoWritedbf, as.character(file), dataframe,
as.integer(precision), as.integer(scale), as.character(DataTypes)))
}
However, when I wanted to use this function ... it does not find the DoWritedbf 
function that is called in the last lines (a function written in C).
Is there a way to temporarily replace the original write.dbf function with this 
one in the foreign package?
Thanks,
Arnaud
R version 2.10.0 (2009-10-26)
i386-pc-mingw32
##



[R] Modifying values into XML with R

2012-04-26 Thread Bastien.Ferland-Raymond
Dear R gurus,

I use R all the time at work, so one day a problem managing my personal data 
arose and made me think: "Why not use R, it does everything!".

Anyway, my goal is to use R to manage my personal music library, and more 
precisely my playcounts.  I have two XML files, one from Winamp and the other 
one from iTunes.  Both have pretty much the same songs, but their playcounts 
are different. I want to import both of them into R, merge their playcounts and 
export the result back to an XML file that I will be able to reload in Winamp 
(or iTunes).  So far, I have managed to import both libraries into R, extract 
their playcounts and merge them.  But now I'm stuck at putting this new 
playcount back into the original XML.

Here is a reproducible example showing what I want to do:

##

### first download one of my xml from :
##  https://www.dropbox.com/s/qxteao3z8ypyfqh/petitXMLwinamp.xml

library(XML)

## load it in R and root it:
winamp <- xmlTreeParse("petitXMLwinamp.xml", useInternal = T)
racine <- xmlRoot(winamp)

racine  #  to view the library


###  I can extract one song (the first one for the example):
une.chanson <- xmlSApply(racine[[1]][["dict"]][[2]],xmlValue)


###  I can extract the playcount of this track with:
racine[[1]][["dict"]][[2]][which.max(une.chanson == "Play Count")+1]

### Now, I would simply want to change it from 2 to, let's say, 17:
racine[[1]][["dict"]][[2]][which.max(une.chanson == "Play Count")+1] <- 17

###  it doesn't work, I get the error:
#Error in racine[[1]][["dict"]][[2]][which.max(une.chanson == "Play Count") +  :
#  object of type 'externalptr' is not subsettable

###  If I try again, digging further into the node, I get the same error but 
###  with a different outcome:
racine[[1]][["dict"]][[2]][which.max(une.chanson == "Play Count")+1][[1]][[1]] <- 17
#Error in racine[[1]][["dict"]][[2]][which.max(une.chanson == "Play Count") +  :
#  object of type 'externalptr' is not subsettable
racine[[1]][["dict"]][[2]][which.max(une.chanson == "Play Count")+1]
#$integer
#<integer>217</integer>

#attr(,"class")
#[1] "XMLInternalNodeList" "XMLNodeList"

###  It raised an error, but appended 17 to the 2 to create 217...

##

Does anybody here have an idea how to just change values in my XML document?  
Also, the class of my playcount node is XMLNodeList.  What is an XMLNodeList?  
I can't find any reference to it in the XML package manual, so I don't know how 
to manage or create one.  The solution may lie in switching from XMLNodeList 
to XMLNode and back to XMLNodeList.  Anyway, those things are kind of 
complicated; I don't think I understand the whole XML structure well yet.
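
One approach that may work here, sketched on a made-up miniature document rather than the real Winamp file: locate the <integer> node that follows the "Play Count" key with an XPath query, then assign to it with the replacement form of xmlValue() from the XML package. The XML snippet and the XPath expression are assumptions about the file layout (key/value siblings inside a dict, as in the output shown above).

```r
library(XML)

# Hypothetical miniature library with the same key/value layout
txt <- '<plist><dict><key>Tracks</key><dict><key>1</key><dict>
  <key>Name</key><string>Song A</string>
  <key>Play Count</key><integer>2</integer>
</dict></dict></dict></plist>'

doc <- xmlParse(txt, asText = TRUE)

# The value node is the first <integer> sibling after the matching <key>
node <- getNodeSet(
  doc, "//key[text()='Play Count']/following-sibling::integer[1]"
)[[1]]

xmlValue(node)            # current play count, as a string
xmlValue(node) <- "17"    # modifies the internal document in place
xmlValue(node)

# Serialise the modified document (pass file = "..." to write it to disk)
out <- saveXML(doc)
```

Because the document is parsed into internal (C-level) nodes, the change is made by reference, so there is no need to assign anything back into racine.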


Thanks in advance for your help; don't hesitate to ask questions if you need 
more details.

Bastien
R version 2.14.1 (2011-12-22)
Platform: i386-pc-mingw32/i386 (32-bit)
on Windows



[R] optimising a loop

2011-11-03 Thread Bastien.Ferland-Raymond
Dear R community,

I'm trying to remove a loop from my code but I'm stuck and I can't find a good 
way to do it.  Hopefully one of you will have something clever to propose.

Here is a simplified example:

I have a square matrix:

> nom.plac2 <- c(102, 103, 301, 303,304, 403)
> poids2 <- matrix(NA, 6,6, dimnames=list(nom.plac2,nom.plac2))
> poids2
    102 103 301 303 304 403
102  NA  NA  NA  NA  NA  NA
103  NA  NA  NA  NA  NA  NA
301  NA  NA  NA  NA  NA  NA
303  NA  NA  NA  NA  NA  NA
304  NA  NA  NA  NA  NA  NA
403  NA  NA  NA  NA  NA  NA

I want to replace some of the NAs following specific criteria contained in 2 
other matrices:

> wei2 <- 
+ matrix(c(.6,.4,.5,.5,.9,.1,.8,.2,.7,.3,.6,.4),6,2,dimnames=list(nom.plac2, 
+ c("p1","p2")),byrow=T)
> wei2
     p1  p2
102 0.6 0.4
103 0.5 0.5
301 0.9 0.1
303 0.8 0.2
304 0.7 0.3
403 0.6 0.4
> voisin <- matrix(c("103","304", "303", "102", "103", 
+ "303","403","304","303","102","103", "303"),
+  6,2,dimnames=list(nom.plac2, c("v1","v2")),byrow=T)
> voisin
    v1    v2
102 "103" "304"
103 "303" "102"
301 "103" "303"
303 "403" "304"
304 "303" "102"
403 "103" "303"
So my final result is:

    102 103 301 303 304 403
102  NA 0.6  NA  NA 0.4  NA
103 0.5  NA  NA 0.5  NA  NA
301  NA 0.9  NA 0.1  NA  NA
303  NA  NA  NA  NA 0.2 0.8
304 0.3  NA  NA 0.7  NA  NA
403  NA 0.6  NA 0.4  NA  NA


So, overall, for each row of poids2 I want to fill in the values from wei2 at 
the columns whose identifiers are given in voisin.

This can easily be done with a loop:

> loop <- poids2
> for(i in 1:6){
+ loop[i,voisin[i,]] <- wei2[i,]
+ }

But I expect it to be quite slow with my larger dataset.

Do any of you have an idea how I could remove the loop and speed up the 
operation?
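
One loop-free possibility, given as a sketch: a matrix can be indexed by a two-column matrix of (row, column) pairs, and when the matrix has dimnames those pairs may be character names. The object names below are shortened copies of the ones in this message, and the neighbour identifiers are assumed to be stored as character strings.

```r
nom <- c("102", "103", "301", "303", "304", "403")
poids <- matrix(NA_real_, 6, 6, dimnames = list(nom, nom))
wei <- matrix(c(.6,.4,.5,.5,.9,.1,.8,.2,.7,.3,.6,.4), 6, 2, byrow = TRUE,
              dimnames = list(nom, c("p1", "p2")))
voisin <- matrix(c("103","304","303","102","103","303",
                   "403","304","303","102","103","303"), 6, 2, byrow = TRUE,
                 dimnames = list(nom, c("v1", "v2")))

# One (row name, column name) pair per weight.  as.vector() unrolls both
# matrices column by column, so the pairs and the weights stay aligned.
idx <- cbind(rep(nom, times = ncol(voisin)), as.vector(voisin))
poids[idx] <- as.vector(wei)
poids
```

This performs a single vectorised assignment in place of one assignment per row; whether it is noticeably faster depends on the size of the real dataset.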

Best regards,


Bastien Ferland-Raymond, M.Sc. Stat., M.Sc. Biol.
Division des orientations et projets spéciaux
Direction des inventaires forestiers
Ministère des Ressources naturelles et de la Faune du Québec
