Re: [R] extracting characters from a string

2013-01-23 Thread Biau David
thanks, it works well. I have to work on Arun's previous answer to make it work 
too.


 
David


>
>From: Rui Barradas
>To: Biau David
>Cc: r help list
>Sent: Wednesday, 23 January 2013, 19:57
>Subject: Re: [R] extracting characters from a string
> 
>Hello,
>
>I've just noticed that my first solution would only return the first set 
>of alphabetic characters, such as "Van", not "Van den Hoops".
>The following will solve that problem.
>
>
>fun2 <- function(x, sep = ", "){
>    x <- strsplit(x, sep)
>    m <- lapply(x, function(y) gregexpr(" [[:alpha:]]*$", y))
>    res <- lapply(seq_along(x), function(i)
>        regmatches(x[[i]], m[[i]], invert = TRUE))
>    res <- lapply(res, unlist)
>    lapply(res, function(y) y[nchar(y) > 0])
>}
>fun2(pub)
>
>
>Hope this helps,
>
>Rui Barradas
>
>On 23-01-2013 18:33, Rui Barradas wrote:
>> Hello,
>>
>> Try the following.
>>
>> fun <- function(x, sep = ", "){
>>      s <- unlist(strsplit(x, sep))
>>      regmatches(s, regexpr("[[:alpha:]]*", s))
>> }
>>
>> fun(pub)
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> On 23-01-2013 17:38, Biau David wrote:
>>> Dear All,
>>>
>>> I have a data frame of vectors of publication names such as 'pub':
>>>
>>> pub1 <- c('Brown DK, Santos R, Rome DF, Don Juan X')
>>> pub2 <- c('Benigni D')
>>> pub3 <- c('Arstra SD, Van den Hoops DD, lamarque D')
>>>
>>> pub <- rbind(pub1, pub2, pub3)
>>>
>>>
>>> I would like to construct a dataframe with only author's last name and
>>> each last name in columns and the publication in rows. Basically I
>>> want to get rid of the initials (max 2, always before a comma) and
>>> spaces surrounding last name. I would like to avoid a loop.
>>>
>>> ps: If I could have even a short explanation of the code that extract
>>> the values of the character string that would also be great!
>>>
>>>
>>> David
>>>
>>>     [[alternative HTML version deleted]]
>>>
>>>
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
>
>


[R] extracting characters from a string

2013-01-23 Thread Biau David
Dear All,

I have a data frame of vectors of publication names such as 'pub':

pub1 <- c('Brown DK, Santos R, Rome DF, Don Juan X')
pub2 <- c('Benigni D')
pub3 <- c('Arstra SD, Van den Hoops DD, lamarque D')

pub <- rbind(pub1, pub2, pub3)


I would like to construct a data frame with only the authors' last names, with
each last name in a column and each publication in a row. Basically I want to
get rid of the initials (max 2, always before a comma) and the spaces
surrounding the last name. I would like to avoid a loop.

ps: If I could have even a short explanation of the code that extracts the 
values of the character string that would also be great!
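
A commented sketch of one way to do this, given only as an illustration and not as the code posted in the replies: split each publication at the commas, trim the pieces, then strip a trailing block of one or two initials. It assumes 'pub' is built as above and that the initials are always the last one or two letters of an entry; 'authors', 'last_names', 'padded' and 'res' are names made up for this example.

```r
pub1 <- c('Brown DK, Santos R, Rome DF, Don Juan X')
pub2 <- c('Benigni D')
pub3 <- c('Arstra SD, Van den Hoops DD, lamarque D')
pub <- rbind(pub1, pub2, pub3)

# strsplit() cuts each publication string at the commas and returns
# one character vector of "Name Initials" entries per publication.
authors <- strsplit(pub[, 1], ",")

# For every entry: trim surrounding blanks, then delete the final
# " X" or " XY" (one or more spaces followed by 1-2 letters at the end).
last_names <- lapply(authors, function(a) {
  a <- gsub("^ +| +$", "", a)
  sub(" +[[:alpha:]]{1,2}$", "", a)
})

# Pad every publication with NA up to the largest number of authors so
# the result is rectangular (publications in rows, authors in columns).
n <- max(vapply(last_names, length, integer(1)))
padded <- lapply(last_names, function(a) c(a, rep(NA_character_, n - length(a))))
res <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
res
```

The pattern cannot tell initials from a short final word, so an entry with no initials whose last word has one or two letters would lose that word.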

 
David



Re: [R] removing loops from code in making data.frame

2013-01-16 Thread Biau David
Thanks, it runs a lot faster. Just one thing though: when I apply the code to 
my data, the two data.frames end up "different", or at least identical(df1, df2) 
is FALSE.

However, when I do which(df1 != df2) it returns 'integer(0)'.

Could that be due to the class of the vectors or something of the sort?
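
One possible cause, sketched on made-up data: identical() also compares storage type and attributes (dimnames, class, ...), whereas != only compares values. The example below assumes the difference is integer versus double storage; on the real data the cause may just as well be dimnames or factor versus character columns. 'm_int' and 'm_dbl' are hypothetical names.

```r
# Two matrices holding the same values with different storage types.
m_int <- matrix(c(1L, 0L, 0L, 1L), nrow = 2)   # integer
m_dbl <- matrix(c(1, 0, 0, 1), nrow = 2)       # double

identical(m_int, m_dbl)   # FALSE: the storage types differ
which(m_int != m_dbl)     # integer(0): no value differs
all.equal(m_int, m_dbl)   # TRUE: equal up to type and numeric tolerance

# Comparing type and attributes usually shows where the objects differ.
storage.mode(m_int)
storage.mode(m_dbl)
attributes(m_int)
```

Running str() on both objects is another quick way to spot such a difference.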

thanks,


 
David Biau


>
>From: arun
>To: Biau David
>Cc: R help
>Sent: Tuesday, 15 January 2013, 21:54
>Subject: Re: [R] removing loops from code in making data.frame
> 
>Hi,
>
>You could also do this:
>res1<-do.call(rbind,lapply(xaulist,function(x) 
>as.numeric(apply(t(mapply(`==`,tata,x)),2,any))))
>identical(res1,tutu)
>#[1] TRUE
>A.K.
>
>
>
>
>
>- Original Message -
>From: Biau David 
>To: r help list 
>Cc: 
>Sent: Tuesday, January 15, 2013 2:41 PM
>Subject: [R] removing loops from code in making data.frame
>
>Dear all,
>
>I am working on an author network and to do so I have to arrange a data.frame 
>(tutu) crossing author names (rows) per publication number (column). The 
>participation of the author to a study is indicated by a 1 and 0 otherwise.
>
>I have a vector (xaulist) of all the names of authors and a data.frame (tata) 
>with all the publications in row and the authors in columns. I have written a 
>loop to obtain my data.frame but it takes a long time when the number of 
>studies increases. I was looking for a more efficient code.
>
>Here is a minimal working example (my code is terrible i know...):
>
>#-
>
>au1 <- c('deb', 'art', 'deb', 'seb', 'deb', 'deb', 'mar', 'mar', 'joy', 'deb')
>au2 <- c('art', 'deb', 'soy', 'deb', 'joy', 'ani', 'deb', 'deb', 'nem', 'mar')
>au3 <- c('mar', 'lio', 'mil', 'mar', 'ani', 'lul', 'nem', 'art', 'deb', 'tat')
>
>tata <- data.frame(au1, au2, au3)
>xaulist2 <- levels(factor(unlist(tata[,])))
>xaulist <- levels(as.factor(xaulist2))
>
>tutu <- matrix(NA, nrow=length(xaulist), ncol=dim(tata)[1]) # rows are authors and cols are papers
>for (i in 1:length(xaulist))
>{
>  for (j in 1:dim(tata)[1])
>  {
>  ifelse('TRUE' %in% as.character(tata[j,]==xaulist[i]), tutu[i,j] <- 1,  
>tutu[i,j] <- 0)
>  }
>}
>tutu[is.na(tutu)] <- 0
>
>#-
>
>I am looking at some more efficient way to build 'tutu'.
>
>Thank you very much,
>
> 
>David
>
>    [[alternative HTML version deleted]]
>
>
>
>
>
>


[R] removing loops from code in making data.frame

2013-01-15 Thread Biau David
Dear all,

I am working on an author network and to do so I have to arrange a data.frame 
(tutu) crossing author names (rows) per publication number (column). The 
participation of the author to a study is indicated by a 1 and 0 otherwise.

I have a vector (xaulist) of all the names of authors and a data.frame (tata) 
with all the publications in rows and the authors in columns. I have written a 
loop to obtain my data.frame but it takes a long time when the number of 
studies increases. I am looking for more efficient code.

Here is a minimal working example (my code is terrible i know...):

#-

au1 <- c('deb', 'art', 'deb', 'seb', 'deb', 'deb', 'mar', 'mar', 'joy', 'deb')
au2 <- c('art', 'deb', 'soy', 'deb', 'joy', 'ani', 'deb', 'deb', 'nem', 'mar')
au3 <- c('mar', 'lio', 'mil', 'mar', 'ani', 'lul', 'nem', 'art', 'deb', 'tat')

tata <- data.frame(au1, au2, au3)
xaulist2 <- levels(factor(unlist(tata[,])))
xaulist <- levels(as.factor(xaulist2))

tutu <- matrix(NA, nrow=length(xaulist), ncol=dim(tata)[1]) # rows are authors and cols are papers
for (i in 1:length(xaulist))
{
  for (j in 1:dim(tata)[1])
  {
  ifelse('TRUE' %in% as.character(tata[j,]==xaulist[i]), tutu[i,j] <- 1,  
tutu[i,j] <- 0)
  }
}
tutu[is.na(tutu)] <- 0

#-

I am looking at some more efficient way to build 'tutu'.
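
A possible loop-free sketch, assuming the example data above: stack the author columns into one long vector, remember which paper each entry came from, and cross-tabulate. 'author', 'paper' and 'tutu2' are names invented here, and stringsAsFactors = FALSE is an assumption added so the columns stay character.

```r
au1 <- c('deb', 'art', 'deb', 'seb', 'deb', 'deb', 'mar', 'mar', 'joy', 'deb')
au2 <- c('art', 'deb', 'soy', 'deb', 'joy', 'ani', 'deb', 'deb', 'nem', 'mar')
au3 <- c('mar', 'lio', 'mil', 'mar', 'ani', 'lul', 'nem', 'art', 'deb', 'tat')
tata <- data.frame(au1, au2, au3, stringsAsFactors = FALSE)

# Stack all author columns (au1, then au2, then au3) into one vector and
# record, for each entry, the paper (row of tata) it belongs to.
author <- unlist(tata, use.names = FALSE)
paper  <- rep(seq_len(nrow(tata)), times = ncol(tata))

# Cross-tabulate authors by papers. A count above 0 means the author took
# part in that paper; multiplying by 1 turns TRUE/FALSE into 1/0.
tutu2 <- (table(author, paper) > 0) * 1
tutu2
```

The rows come out in the sorted order of the author names, which is also the order that levels(factor(...)) gives for 'xaulist'.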

Thank you very much,

 
David



Re: [R] extracting character values

2013-01-13 Thread Biau David
Thanks to you too. It also works perfectly. I am not sure I understand all the
code though: I will have to look into it!


 
David Biau


>
>From: arun
>To: Biau David
>Cc: R help; Uwe Ligges
>Sent: Sunday, 13 January 2013, 18:36
>Subject: Re: [R] extracting character values
> 
>Hi,
>This should also work:
>do.call(data.frame,lapply(netw,function(x) gsub("^ *(\\D+) \\w+$","\\1",x)))
>A.K.
>
>
>
>
>
>
>From: Biau David 
>To: arun ; r help list  
>Sent: Sunday, January 13, 2013 12:02 PM
>Subject: Re: [R] extracting character values
>
>
>OK,
>
>here is a minimal working example:
>
>au1 <- c('biau dj', 'jones kb', 'van den hoofs j', ' biau dj', 'biau dj',
>         'campagna r', 'biau dj', 'weiss kr', 'verdegaal sh', 'riad s')
>au2 <- c('weiss kr', 'ferguson pc', ' greidanus nv', ' porcher r',
>         'ferguson pc', 'pessis e', 'leclerc p', 'biau dj', 'bovee jv', 'biau d')
>au3 <- c('bhumbra rs', 'lam b', 'garbuz ds', NA, 'chung p', ' biau dj',
>         'marmor s', 'bhumbra r', 'pansuriya tc', NA)
>
>netw <- data.frame(au1, au2, au3)
>res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
>
>for (i in 1:dim(netw)[2])
>{
>wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
>res[i] <- substring(as.character(netw[,i]), wh, wh + attr(wh,'match.length')-1)
>}
>
> problem is for author "van den hoofs j" who is only retrieved as 'van'
>
>thanks,
>
>
>David Biau
>
>
>>
>>From: arun
>>To: Biau David
>>Sent: Sunday, 13 January 2013, 17:38
>>Subject: Re: [R] extracting character values
>> 
>>HI,
>>
>>
>> res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
>>#Error in matrix(NA, nrow = dim(netw)[1], ncol = dim(netw)[2]) : 
>> # object 'netw' not found
>>Can you provide an example dataset of netw?
>>Thanks.
>>A.K.
>>
>>
>>
>>- Original Message -
>>From: Biau David 
>>To: r help list 
>>Cc: 
>>Sent: Sunday, January 13, 2013 3:53 AM
>>Subject: [R] extracting character values
>>
>>Dear all,
>>
>>I have a dataframe of names (netw), with each cell including last name and 
>>initials of an author; some cells have NA. I would like to extract only the 
>>last name from each cell; this new dataframe is called 'res'
>>
>>
>>Here is what I do:
>>
>>res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
>>
>>for (i in 1:x)
>>{
>>wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
>>res[i] <- substring(as.character(netw[,i]), wh, wh + 
>>attr(wh,'match.length')-1)
>>}
>>
>> 
>>the problem is that I cannot manage to extract 'complex' names properly such 
>>as ' van der hoops bf  ': here I only get 'van', the real last name is
>>'van der hoops' and 'bf' are the initials. Basically the last name has always 
>>a minimum of 3 consecutive letters, but may have 3 or more letters separated 
>>by one or more space; the cell may start by a space too; initials never have 
>>more than 2 letters.
>>
>>Someone would have a nice idea for that? Thanks,
>>
>>
>>David
>>
>>    [[alternative HTML version deleted]]
>>
>>
>>
>>
>>
>>
>
>
>


Re: [R] extracting character values

2013-01-13 Thread Biau David
Works great, thanks. You also shortened my code a lot and removed the loop.


 
David Biau


>
>From: Uwe Ligges
>To: Biau David
>Cc: arun; r help list
>Sent: Sunday, 13 January 2013, 18:22
>Subject: Re: [R] extracting character values
> 
>
>
>On 13.01.2013 18:02, Biau David wrote:
>> OK,
>>
>> here is a minimal working example:
>>
>> au1 <- c('biau dj', 'jones kb', 'van den hoofs j', ' biau dj', 'biau dj',
>>          'campagna r', 'biau dj', 'weiss kr', 'verdegaal sh', 'riad s')
>> au2 <- c('weiss kr', 'ferguson pc', ' greidanus nv', ' porcher r',
>>          'ferguson pc', 'pessis e', 'leclerc p', 'biau dj', 'bovee jv', 'biau d')
>> au3 <- c('bhumbra rs', 'lam b', 'garbuz ds', NA, 'chung p', ' biau dj',
>>          'marmor s', 'bhumbra r', 'pansuriya tc', NA)
>>
>> netw <- data.frame(au1, au2, au3)
>> res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
>>
>> for (i in 1:dim(netw)[2])
>> {
>> wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
>> res[i] <- substring(as.character(netw[,i]), wh, wh + 
>> attr(wh,'match.length')-1)
>> }
>
>
>There may be an easier solution, but this should do:
>
>res <- data.frame(lapply(netw,
>      function(x)
>        gsub("^ *([[:alpha:] ]*) +[[:alpha:]]+$", "\\1", x)))
>
>Uwe Ligges
>
>
>
>
>>   problem is for author "van den hoofs j" who is only retrieved as 'van'
>>
>> thanks,
>>
>>
>> David Biau
>>
>>
>>> 
>>> From: arun
>>> To: Biau David
>>> Sent: Sunday, 13 January 2013, 17:38
>>> Subject: Re: [R] extracting character values
>>>
>>> HI,
>>>
>>>
>>>   res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
>>> #Error in matrix(NA, nrow = dim(netw)[1], ncol = dim(netw)[2]) :
>>>   # object 'netw' not found
>>> Can you provide an example dataset of netw?
>>> Thanks.
>>> A.K.
>>>
>>>
>>>
>>> - Original Message -
>>> From: Biau David 
>>> To: r help list 
>>> Cc:
>>> Sent: Sunday, January 13, 2013 3:53 AM
>>> Subject: [R] extracting character values
>>>
>>> Dear all,
>>>
>>> I have a dataframe of names (netw), with each cell including last name and 
>>> initials of an author; some cells have NA. I would like to extract only the 
>>> last name from each cell; this new dataframe is called 'res'
>>>
>>>
>>> Here is what I do:
>>>
>>> res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
>>>
>>> for (i in 1:x)
>>> {
>>> wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
>>> res[i] <- substring(as.character(netw[,i]), wh, wh + 
>>> attr(wh,'match.length')-1)
>>> }
>>>
>>>
>>> the problem is that I cannot manage to extract 'complex' names properly 
>>> such as ' van der hoops bf  ': here I only get 'van', the real last name is 
>>> 'van der hoops' and 'bf' are the initials. Basically the last name has 
>>> always a minimum of 3 consecutive letters, but may have 3 or more letters 
>>> separated by one or more space; the cell may start by a space too; initials 
>>> never have more than 2 letters.
>>>
>>> Someone would have a nice idea for that? Thanks,
>>>
>>>
>>> David
>>>
>>>      [[alternative HTML version deleted]]
>>>
>>>
>>>
>>>
>>>
>>>
>>     [[alternative HTML version deleted]]
>>
>>
>>
>>
>
>
>


Re: [R] extracting character values

2013-01-13 Thread Biau David
OK,

here is a minimal working example:

au1 <- c('biau dj', 'jones kb', 'van den hoofs j', ' biau dj', 'biau dj',
         'campagna r', 'biau dj', 'weiss kr', 'verdegaal sh', 'riad s')
au2 <- c('weiss kr', 'ferguson pc', ' greidanus nv', ' porcher r',
         'ferguson pc', 'pessis e', 'leclerc p', 'biau dj', 'bovee jv', 'biau d')
au3 <- c('bhumbra rs', 'lam b', 'garbuz ds', NA, 'chung p', ' biau dj',
         'marmor s', 'bhumbra r', 'pansuriya tc', NA)

netw <- data.frame(au1, au2, au3)
res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))

for (i in 1:dim(netw)[2])
{
wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
res[i] <- substring(as.character(netw[,i]), wh, wh + attr(wh,'match.length')-1)
}

 problem is for author "van den hoofs j" who is only retrieved as 'van'

thanks,


David Biau


>
>From: arun
>To: Biau David
>Sent: Sunday, 13 January 2013, 17:38
>Subject: Re: [R] extracting character values
> 
>HI,
>
>
> res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
>#Error in matrix(NA, nrow = dim(netw)[1], ncol = dim(netw)[2]) : 
> # object 'netw' not found
>Can you provide an example dataset of netw?
>Thanks.
>A.K.
>
>
>
>- Original Message -
>From: Biau David 
>To: r help list 
>Cc: 
>Sent: Sunday, January 13, 2013 3:53 AM
>Subject: [R] extracting character values
>
>Dear all,
>
>I have a dataframe of names (netw), with each cell including last name and 
>initials of an author; some cells have NA. I would like to extract only the 
>last name from each cell; this new dataframe is called 'res'
>
>
>Here is what I do:
>
>res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
>
>for (i in 1:x)
>{
>wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
>res[i] <- substring(as.character(netw[,i]), wh, wh + attr(wh,'match.length')-1)
>}
>
> 
>the problem is that I cannot manage to extract 'complex' names properly such 
>as ' van der hoops bf  ': here I only get 'van', the real last name is 'van 
>der hoops' and 'bf' are the initials. Basically the last name has always a 
>minimum of 3 consecutive letters, but may have 3 or more letters separated by 
>one or more space; the cell may start by a space too; initials never have more 
>than 2 letters.
>
>Someone would have a nice idea for that? Thanks,
>
>
>David
>
>    [[alternative HTML version deleted]]
>
>
>
>
>
>


Re: [R] count combined occurrences of categories

2013-01-13 Thread Biau David
OK, thanks for the tips. I have abandoned the use of cbind inside data.frame.
I've used both dcast() and melt() and they both work fine. Thanks again.

 
David Biau


>
>From: David Winsemius
>To: arun
>Cc: R help; Biau David
>Sent: Friday, 11 January 2013, 18:54
>Subject: Re: [R] count combined occurrences of categories
> 
>
>On Jan 11, 2013, at 9:47 AM, arun wrote:
>
>> HI David,
>> 
>> I get different results with dcast()
>> 
>> library(reshape2)
>>   dcast(melt(tutu,"nam"),nam~value,length)
>> #  nam art deb joy mar seb lio nem tat
>> #1  da   2   3   1   4   1   1   0   0
>> #2  fr   2   2   2   3   0   1   1   1
>> #3  ya   1   2   1   0   0   1   1   0
>> 
>>  tutus <- data.frame(nam=tutu$nam, au=with(tutu, c(au1,au2,au3)))
>>  with(tutus,table(nam,au))
>> #    au
>> #nam  1 2 3 4 5 6 7
>>  # da 2 3 1 2 4 0 0   #some numbers don't match the previous result
>>   #fr 2 2 2 2 2 1 1
>>   #ya 1 2 1 1 0 1 0
>> #If I convert to as.character(), it matched with the dcast() results
>
>Probably due to the fact I used c() on factors:
>
>tutu <- data.frame(nam, au1, au2, au3, stringsAsFactors=FALSE)
>> tutus <- data.frame(nam=tutu$nam, au=with(tutu, c(au1,au2,au3)))
>> tutab <- with(tutus, table(nam, au)  )
>> tutab
>    au
>nam  art deb joy lio mar nem seb tat
>  da   2   3   1   1   4   0   1   0
>  fr   2   2   2   1   3   1   0   1
>  ya   1   2   1   1   0   1   0   0
>
>-- David.
>> 
>> tutunew<-data.frame(nam=tutu$nam,au=with(tutu,c(as.character(au1),as.character(au2),as.character(au3))))
>> with(tutunew,table(nam,au))
>> #    au
>> #nam  art deb joy lio mar nem seb tat
>>  # da   2   3   1   1   4   0   1   0
>>   #fr   2   2   2   1   3   1   0   1
>>   #ya   1   2   1   1   0   1   0   0
>> A.K.
>> 
>> 
>> 
>> 
>> 
>> - Original Message -
>> From: David Winsemius 
>> To: Biau David 
>> Cc: r help list 
>> Sent: Friday, January 11, 2013 12:20 PM
>> Subject: Re: [R] count combined occurrences of categories
>> 
>> 
>> On Jan 11, 2013, at 2:54 AM, Biau David wrote:
>> 
>>> Dear all,
>>> 
>>> i would like to count the number of times where I have combined occurrences 
>>> of the categories of 2 variables.
>>> 
>>> For instance, in the dataframe below, i would like to know how many times 
>>> each author (au1, au2, au3 represent the first, second, third author) is 
>>> associated with each of the category of the variable 'nam'. The position of 
>>> the author does not matter.
>>> 
>>> nam <- c('da', 'ya', 'da', 'da', 'fr', 'fr', 'fr', 'da', 'ya', 'fr')
>>> au1 <- c('deb', 'art', 'deb', 'seb', 'deb', 'deb', 'mar', 'mar', 'joy', 
>>> 'joy')
>>> au2 <- c('art', 'deb', 'mar', 'deb', 'joy', 'mar', 'art', 'lio', 'nem', 
>>> 'mar')
>>> au3 <- c('mar', 'lio', 'joy', 'mar', 'art', 'lio', 'nem', 'art', 'deb', 
>>> 'tat')
>>> tutu <- data.frame(cbind(nam, au1, au2, au3))
>> 
>> You should first abandon the practice of using `cbind` inside `data.frame`. 
>> Obscure errors will plague your R experience until you do so.
>> 
>> Base solution:
>> 
>>> tutus <- data.frame(nam=tutu$nam, au=with(tutu, c(au1,au2,au3)))
>>> tutab <- with(tutus, table(nam, au)  )
>>> tutab
>>     au
>> nam  1 2 3 4 5 6 7
>>   da 2 3 1 2 4 0 0
>>   fr 2 2 2 2 2 1 1
>>   ya 1 2 1 1 0 1 0
>> 
>> --
>> David Winsemius, MD
>> Alameda, CA, USA
>> 
>> 
>
>David Winsemius, MD
>Alameda, CA, USA
>
>
>
>


[R] extracting character values

2013-01-13 Thread Biau David
Dear all,

I have a dataframe of names (netw), with each cell including the last name and 
initials of an author; some cells have NA. I would like to extract only the 
last name from each cell; this new dataframe is called 'res'.


Here is what I do:

res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))

for (i in 1:x)
{
wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
res[i] <- substring(as.character(netw[,i]), wh, wh + attr(wh,'match.length')-1)
}

 
The problem is that I cannot manage to extract 'complex' names properly, such as 
' van der hoops bf  ': here I only get 'van', whereas the real last name is
'van der hoops' and 'bf' are the initials. Basically the last name always has a
minimum of 3 consecutive letters, but may consist of several words separated by
one or more spaces; the cell may start with a space too; initials never have
more than 2 letters.

Someone would have a nice idea for that? Thanks,
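
One way to express the rule above as a single pattern, sketched under the assumption that every non-missing cell ends with a block of one or two initials. 'last_name' is a hypothetical helper, not a function from any package: it keeps everything from the first letter up to the last letter before the final one- or two-letter block.

```r
# Hypothetical helper: capture the name (letters and inner spaces) and
# drop leading blanks, the trailing 1-2 letter initials and end blanks.
last_name <- function(x) {
  sub("^ *([[:alpha:]][[:alpha:] ]*[[:alpha:]]) +[[:alpha:]]{1,2} *$",
      "\\1", as.character(x))
}

x <- c(" van der hoops bf  ", "biau dj", " porcher r", NA)
last_name(x)

# Applied column by column, this avoids the explicit loop, e.g.
# res <- data.frame(lapply(netw, last_name))
```

Cells that do not match the pattern (for example a name without initials) are returned unchanged, and NA stays NA.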


David



[R] count combined occurrences of categories

2013-01-11 Thread Biau David
Dear all,
 
I would like to count the number of times the categories of 2 variables occur
together.

For instance, in the dataframe below, I would like to know how many times each 
author (au1, au2, au3 represent the first, second, third author) is associated 
with each category of the variable 'nam'. The position of the author does not
matter.
 
nam <- c('da', 'ya', 'da', 'da', 'fr', 'fr', 'fr', 'da', 'ya', 'fr')
au1 <- c('deb', 'art', 'deb', 'seb', 'deb', 'deb', 'mar', 'mar', 'joy', 'joy')
au2 <- c('art', 'deb', 'mar', 'deb', 'joy', 'mar', 'art', 'lio', 'nem', 'mar')
au3 <- c('mar', 'lio', 'joy', 'mar', 'art', 'lio', 'nem', 'art', 'deb', 'tat')
tutu <- data.frame(cbind(nam, au1, au2, au3))
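
A minimal sketch of one way to get these counts, along the lines of the table()-based answers in this thread: stack the three author columns next to a repeated copy of 'nam', then cross-tabulate. It builds the data frame with stringsAsFactors = FALSE instead of cbind(), which is an assumption made here so that the columns stay character; 'long' and 'counts' are made-up names.

```r
nam <- c('da', 'ya', 'da', 'da', 'fr', 'fr', 'fr', 'da', 'ya', 'fr')
au1 <- c('deb', 'art', 'deb', 'seb', 'deb', 'deb', 'mar', 'mar', 'joy', 'joy')
au2 <- c('art', 'deb', 'mar', 'deb', 'joy', 'mar', 'art', 'lio', 'nem', 'mar')
au3 <- c('mar', 'lio', 'joy', 'mar', 'art', 'lio', 'nem', 'art', 'deb', 'tat')
tutu <- data.frame(nam, au1, au2, au3, stringsAsFactors = FALSE)

# One row per (publication, author) pair: 'nam' is repeated once for each
# author column, and the author columns are stacked in the same order.
long <- data.frame(
  nam = rep(tutu$nam, times = 3),
  au  = c(tutu$au1, tutu$au2, tutu$au3),
  stringsAsFactors = FALSE
)

# Rows are categories of 'nam', columns are authors, cells are counts.
counts <- table(long$nam, long$au)
counts
```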
 
thanks,

David


[R] Re : interpretation of coefficients in survreg AND obtaining the hazard function

2010-11-15 Thread Biau David
Dear Prof Therneau,

Thank you for this information: this is going to be most useful for what I want 
to do. I will look into the AFT model.

Yours,

 David Biau.





From: Terry Therneau

Cc: r-help@r-project.org
Sent: Monday, 15 November 2010, 15:33:23
Subject: Re: interpretation of coefficients in survreg AND obtaining the hazard function

1. The Weibull is the only distribution that can be written in both a
proportional hazards form and an accelerated failure time (AFT) form.  Survreg
uses the latter.
   In an AFT model, we model the time to failure.  Positive coefficients
are good (longer time to death).
   In a PH model, we model the death rate.  Positive coefficients are
bad (higher death rate).

You are not the first to be confused by the change in sign between the
two models.

2. There are about 5 different ways to parameterize a Weibull
distribution, 1-4 appear in various texts and the AFT form is #5.  This
is a second common issue with survreg that strikes only the more
sophisticated users: to understand the output they look up the Weibull
in a textbook, and become even more confused!  

Kalbfleisch and Prentice is a good reference for the AFT form.  The
manual page for psurvreg has some information on this, as does the very
end of ?survreg.  The psurvreg page also has an example of how to
extract the hazard function for a Weibull fit.

Begin included message 

Dear R help list,

I am modeling some survival data with coxph and survreg (dist='weibull') using
package survival. I have 2 problems:

1) I do not understand how to interpret the regression coefficients in the
survreg output and it is not clear, for me, from ?survreg.objects how to.

Here is an example of the codes that points out my problem:
- data is stc1
- the factor is dichotomous with 'low' and 'high' categories

slr <- Surv(stc1$ti_lr, stc1$ev_lr==1)

mca <- coxph(slr~as.factor(grade2=='high'), data=stc1)
mcb <- coxph(slr~as.factor(grade2), data=stc1)
mwa <- survreg(slr~as.factor(grade2=='high'), data=stc1, dist='weibull',
               scale=0)
mwb <- survreg(slr~as.factor(grade2), data=stc1, dist='weibull', scale=0)

> summary(mca)$coef
                                      coef exp(coef)  se(coef)         z  Pr(>|z|)
as.factor(grade2 == "high")TRUE 0.2416562  1.273356 0.2456232 0.9838494 0.3251896

> summary(mcb)$coef
                           coef exp(coef)  se(coef)          z  Pr(>|z|)
as.factor(grade2)low -0.2416562 0.7853261 0.2456232 -0.9838494 0.3251896

> summary(mwa)$coef
(Intercept) as.factor(grade2 == "high")TRUE
  7.9068380                      -0.4035245

> summary(mwb)$coef
(Intercept) as.factor(grade2)low
  7.5033135            0.4035245


No problem with the interpretation of the coefs in the cox model. However, i do
not understand why
a) the coefficients in the survreg model are the opposite (negative when the
other is positive) of what I have in the cox model? are these not the log(HR)
given the categories of these variable?
b) how come the intercept coefficient changes (the scale parameter does not
change)?

2) My second question relates to the first.
a) given a model from survreg, say mwa above, how should i do to extract the
base hazard and the hazard of each patient given a set of predictors? With the
hazard function for the ith individual in the study given by  h_i(t) =
exp(\beta'x_i)*\lambda*\gamma*t^{\gamma-1}, it doesn't look like to me that
predict(mwa, type='linear') is \beta'x_i.
b) since I need the coefficient intercept from the model to obtain the scale
parameter  to obtain the base hazard function as defined in Collett
(h_0(t)=\lambda*\gamma*t^{\gamma-1}), I am concerned that this coefficient
intercept changes depending on the reference level of the factor entered in the
model. The change is very important when I have more than one predictor in the
model.

Any help would be greatly appreciated,

David Biau.


  


[R] Re : interpretation of coefficients in survreg AND obtaining the hazard function for an individual given a set of predictors

2010-11-14 Thread Biau David
Dear Prof Lumley,

This is a very clear, precise, and useful answer to all my questions.

Thank you very much.

 David Biau.





From: Thomas Lumley

Cc: r help list
Sent: Sunday, 14 November 2010, 23:54:23
Subject: Re: [R] interpretation of coefficients in survreg AND obtaining the hazard function for an individual given a set of predictors


> Dear R help list,
>
> I am modeling some survival data with coxph and survreg (dist='weibull') using
> package survival. I have 2 problems:
>
> 1) I do not understand how to interpret the regression coefficients in the
> survreg output and it is not clear, for me, from ?survreg.objects how to.
>
> Here is an example of the codes that points out my problem:
> - data is stc1
> - the factor is dichotomous with 'low' and 'high' categories
>
> slr <- Surv(stc1$ti_lr, stc1$ev_lr==1)
>
> mca <- coxph(slr~as.factor(grade2=='high'), data=stc1)
> mcb <- coxph(slr~as.factor(grade2), data=stc1)
> mwa <- survreg(slr~as.factor(grade2=='high'), data=stc1, dist='weibull',
> scale=0)
> mwb <- survreg(slr~as.factor(grade2), data=stc1, dist='weibull', scale=0)
>
>> summary(mca)$coef
>                                      coef exp(coef)  se(coef)         z  Pr(>|z|)
> as.factor(grade2 == "high")TRUE 0.2416562  1.273356 0.2456232 0.9838494 0.3251896
>
>> summary(mcb)$coef
>                            coef exp(coef)  se(coef)          z  Pr(>|z|)
> as.factor(grade2)low -0.2416562 0.7853261 0.2456232 -0.9838494 0.3251896
>
>> summary(mwa)$coef
> (Intercept) as.factor(grade2 == "high")TRUE
>   7.9068380                      -0.4035245
>
>> summary(mwb)$coef
> (Intercept) as.factor(grade2)low
>   7.5033135            0.4035245
>
>
> No problem with the interpretation of the coefs in the cox model. However, i do
> not understand why
> a) the coefficients in the survreg model are the opposite (negative when the
> other is positive) of what I have in the cox model? are these not the log(HR)
> given the categories of these variable?

No. survreg() fits accelerated failure models, not proportional
hazards models.   The coefficients are logarithms of ratios of
survival times, so a positive coefficient means longer survival.


> b) how come the intercept coefficient changes (the scale parameter does not
> change)?

Because you have reversed the order of the factor levels.  The
coefficient of that variable changes sign and the intercept changes to
compensate.
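
The compensation can be checked with the coefficients quoted above. This is only arithmetic on the printed numbers, not a refit; 'mwa', 'mwb', 'lp_a' and 'lp_b' are names chosen for this sketch.

```r
# Coefficients copied from the summary() output quoted in this thread.
mwa <- c(intercept = 7.9068380, high = -0.4035245)  # reference level: low
mwb <- c(intercept = 7.5033135, low  =  0.4035245)  # reference level: high

# Linear predictor for each grade under the two codings.
lp_a <- c(low  = mwa[["intercept"]],
          high = mwa[["intercept"]] + mwa[["high"]])
lp_b <- c(low  = mwb[["intercept"]] + mwb[["low"]],
          high = mwb[["intercept"]])

# Both codings give the same linear predictor for 'low' and for 'high',
# so the two fits describe the same model.
all.equal(lp_a, lp_b)
```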


> 2) My second question relates to the first.
> a) given a model from survreg, say mwa above, how should i do to extract the
> base hazard and the hazard of each patient given a set of predictors? With the
> hazard function for the ith individual in the study given by  h_i(t) =
> exp(\beta'x_i)*\lambda*\gamma*t^{\gamma-1}, it doesn't look like to me that
> predict(mwa, type='linear') is \beta'x_i.

No, it's beta'x_i for the accelerated failure parametrization of the
Weibull.  In terms of the CDF

F_i(t) = F_0( (log(t) - beta'x_i)/scale )

So you need to divide by the scale parameter and change sign to get
the log hazard ratios.
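
As a rough check of the relationship between the two parametrizations, the sketch below fits both models to the `lung` data that ships with the survival package (the poster's `stc1` data is not available, so `lung` and the object names are stand-ins). It assumes the usual Weibull identity, log hazard ratio = -coef/scale, and simply prints the converted survreg coefficient next to the coxph one.

```r
library(survival)

# lung ships with survival; it stands in for the poster's stc1 data
fit_cox <- coxph(Surv(time, status) ~ sex, data = lung)
fit_wei <- survreg(Surv(time, status) ~ sex, data = lung, dist = "weibull")

# AFT coefficient -> log hazard ratio: change sign and divide by the scale
log_hr_weibull <- -coef(fit_wei)["sex"] / fit_wei$scale
log_hr_cox     <- coef(fit_cox)["sex"]

print(c(weibull = unname(log_hr_weibull), cox = unname(log_hr_cox)))
```

The two numbers need not agree exactly, since the Cox model leaves the baseline hazard unspecified while the Weibull model does not.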


> b) since I need the coefficient intercept from the model to obtain the scale
> parameter  to obtain the base hazard function as defined in Collett
> (h_0(t)=\lambda*\gamma*t^{\gamma-1}), I am concerned that this coefficient
> intercept changes depending on the reference level of the factor entered in the
> model. The change is very important when I have more than one predictor in the
> model.

As Terry Therneau pointed out recently in the context of the Cox
model, there is no such thing as "the" baseline hazard.  The baseline
hazard is the hazard when all your covariates are equal to zero, and
this depends on how you parametrize.  In mwa, zero is grade2="low", in
mwb, zero is grade2="high", so the hazard at zero has to be different
in the two cases.
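
To illustrate that only the "baseline" moves with the reference level while an individual's hazard does not, here is a minimal sketch. It again uses the `lung` data as a stand-in; `weibull_hazard` is a hypothetical helper (not part of survival) that assumes the Weibull hazard implied by a survreg fit is h(t) = gamma * t^(gamma-1) * exp(-lp*gamma), with gamma = 1/scale and lp the linear predictor.

```r
library(survival)

# Hypothetical helper: Weibull hazard at time t implied by a survreg fit
weibull_hazard <- function(fit, newdata, t) {
  lp    <- predict(fit, newdata = newdata, type = "lp")  # beta'x on the log-time scale
  gamma <- 1 / fit$scale                                 # Weibull shape
  unname(gamma * t^(gamma - 1) * exp(-lp * gamma))
}

# Same factor, two different reference levels
lung_a <- transform(lung, sex_f = factor(sex, levels = 1:2,
                                         labels = c("male", "female")))
lung_b <- transform(lung_a, sex_f = relevel(sex_f, ref = "female"))

fit_a <- survreg(Surv(time, status) ~ sex_f, data = lung_a, dist = "weibull")
fit_b <- survreg(Surv(time, status) ~ sex_f, data = lung_b, dist = "weibull")

# One female patient, described with each fit's own level ordering
patient_a <- data.frame(sex_f = factor("female", levels = levels(lung_a$sex_f)))
patient_b <- data.frame(sex_f = factor("female", levels = levels(lung_b$sex_f)))

h_a <- weibull_hazard(fit_a, patient_a, t = 365)
h_b <- weibull_hazard(fit_b, patient_b, t = 365)

print(rbind(intercept = c(coef(fit_a)[1], coef(fit_b)[1]),
            hazard    = c(h_a, h_b)))
```

The intercepts differ between the two fits, but the hazard computed for the same patient should agree up to convergence tolerance.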

 -thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland



  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Re : interpretation of coefficients in survreg AND obtaining the hazard function for an individual given a set of predictors

2010-11-13 Thread Biau David
Thank you David for your answer,

- grade2 is a factor with 2 categories: "high" and "low"
- yes, as.factor is superfluous; it is just that it sometimes avoids warnings.
  This can be overlooked.
- I will look into Terry Therneau's answers; he gives a good explanation of how
  to obtain the hazard for an individual given a set of predictors for the Cox
  model. I will see whether this works for survreg, and look into
  survreg.distributions if it doesn't.
- I'll come back if I can't figure it out.

Thanks again.

Best,

 David Biau.





From: David Winsemius

Cc: r help list
Sent: Sat 13 November 2010, 19:55:10
Subject: Re: [R] interpretation of coefficients in survreg AND obtaining the
hazard function for an individual given a set of predictors


On Nov 13, 2010, at 12:51 PM, Biau David wrote:

> Dear R help list,
> 
> I am modeling some survival data with coxph and survreg (dist='weibull') using
> package survival. I have 2 problems:
> 
> 1) I do not understand how to interpret the regression coefficients in the
> survreg output, and it is not clear to me from ?survreg.object how to do so.

Have you read:

?survreg.distributions  # linked from survreg help

> 
> Here is an example of the codes that points out my problem:
> - data is stc1
> - the factor is dichotomous with 'low' and 'high' categories

Not an unambiguous description for the purposes of answering your many 
questions. Please provide data or at the very least: str(stc1)

> 
> slr <- Surv(stc1$ti_lr, stc1$ev_lr==1)
> 
> mca <- coxph(slr~as.factor(grade2=='high'), data=stc1)

Not sure what that would be returning since we do not know the encoding of
grade2. If you want an estimate on a subset wouldn't you do the subsetting
outside of the formula? (You may be reversing the order by offering a logical 
test for grade2.)

> mcb <- coxph(slr~as.factor(grade2), data=stc1)

You have not provided the data or str(stc1), so it is entirely possible that 
as.factor is superfluous in this call.


> mwa <- survreg(slr~as.factor(grade2=='high'), data=stc1, dist='weibull',
> scale=0)
> mwb <- survreg(slr~as.factor(grade2), data=stc1, dist='weibull', scale=0)
> 
> > summary(mca)$coef
>                                      coef exp(coef)  se(coef)         z  Pr(>|z|)
> as.factor(grade2 == "high")TRUE 0.2416562  1.273356 0.2456232 0.9838494 0.3251896
>
> > summary(mcb)$coef
>                            coef exp(coef)  se(coef)          z  Pr(>|z|)
> as.factor(grade2)low -0.2416562 0.7853261 0.2456232 -0.9838494 0.3251896
>
> > summary(mwa)$coef
> (Intercept) as.factor(grade2 == "high")TRUE
>   7.9068380                      -0.4035245
>
> > summary(mwb)$coef
> (Intercept) as.factor(grade2)low
>   7.5033135            0.4035245
> 
> 
> No problem with the interpretation of the coefs in the cox model. However, I do
> not understand why
> a) the coefficients in the survreg model are the opposite (negative when the
> other is positive) of what I have in the cox model? Are these not the log(HR)
> given the categories of this variable?

Probably because the order of the factor got reversed when you changed the
covariate to logical and then back to factor.

> b) how come the intercept coefficient changes (the scale parameter does not
> change)?
> 
> 2) My second question relates to the first.
> a) given a model from survreg, say mwa above, how should i do to extract the
> base hazard

Answered by Therneau earlier this week and the next question last month:

https://stat.ethz.ch/pipermail/r-help/2010-November/259570.html

https://stat.ethz.ch/pipermail/r-help/2010-October/257941.html


> and the hazard of each patient given a set of predictors? With the
> hazard function for the ith individual in the study given by  h_i(t) =
> exp(\beta'x_i)*\lambda*\gamma*t^{\gamma-1}, it doesn't look like to me that
> predict(mwa, type='linear') is \beta'x_i.


> b) since I need the coefficient intercept from the model to obtain the scale
> parameter  to obtain the base hazard function as defined in Collett
> (h_0(t)=\lambda*\gamma*t^{\gamma-1}), I am concerned that this coefficient
> intercept changes depending on the reference level of the factor entered in the
> model. The change is very important when I have more than one predictor in the
> model.
> 
> Any help would be greatly appreciated,
> 
> David Biau.
> 


David Winsemius, MD
West Hartford, CT


  


[R] interpretation of coefficients in survreg AND obtaining the hazard function for an individual given a set of predictors

2010-11-13 Thread Biau David
Dear R help list,

I am modeling some survival data with coxph and survreg (dist='weibull') using 
package survival. I have 2 problems:

1) I do not understand how to interpret the regression coefficients in the
survreg output, and it is not clear to me from ?survreg.object how to do so.

Here is an example of the code that illustrates my problem:
- data is stc1
- the factor is dichotomous with 'low' and 'high' categories

slr <- Surv(stc1$ti_lr, stc1$ev_lr==1)

mca <- coxph(slr~as.factor(grade2=='high'), data=stc1)
mcb <- coxph(slr~as.factor(grade2), data=stc1)
mwa <- survreg(slr~as.factor(grade2=='high'), data=stc1, dist='weibull', 
scale=0)
mwb <- survreg(slr~as.factor(grade2), data=stc1, dist='weibull', scale=0)

> summary(mca)$coef
                                     coef exp(coef)  se(coef)         z  Pr(>|z|)
as.factor(grade2 == "high")TRUE 0.2416562  1.273356 0.2456232 0.9838494 0.3251896

> summary(mcb)$coef
                           coef exp(coef)  se(coef)          z  Pr(>|z|)
as.factor(grade2)low -0.2416562 0.7853261 0.2456232 -0.9838494 0.3251896

> summary(mwa)$coef
(Intercept) as.factor(grade2 == "high")TRUE
  7.9068380                      -0.4035245

> summary(mwb)$coef
(Intercept) as.factor(grade2)low
  7.5033135            0.4035245


No problem with the interpretation of the coefs in the cox model. However, I do
not understand:
a) why the coefficients in the survreg model have the opposite sign (negative
when the other is positive) to those in the cox model. Are these not the
log(HR) for the categories of this variable?
b) why the intercept coefficient changes (the scale parameter does not
change).

2) My second question relates to the first.
a) Given a model from survreg, say mwa above, how do I extract the
base hazard and the hazard of each patient given a set of predictors? With the
hazard function for the ith individual in the study given by  h_i(t) =
exp(\beta'x_i)*\lambda*\gamma*t^{\gamma-1}, it does not look to me as though
predict(mwa, type='linear') is \beta'x_i.
b) Since I need the intercept coefficient from the model to obtain the scale
parameter and hence the base hazard function as defined in Collett
(h_0(t)=\lambda*\gamma*t^{\gamma-1}), I am concerned that this intercept
changes depending on the reference level of the factor entered in the
model. The change is very large when I have more than one predictor in the
model.

Any help would be greatly appreciated,

David Biau.



  


[R] Re : Re : How to compare the effect of a variable across regression models?

2010-08-13 Thread Biau David
},
>  year = 1995,
>  title = {Model inconsistency, illustrated by the {Cox} proportional hazards
>  model},
>  journal = Stat in Med,
>  volume = 14,
>  pages = {735-746},
>  annote = {covariable adjustment; adjusted estimates; baseline imbalances;
>   RCT; model misspecification; model identification}
> }
>
> One possible remedy, which may not work for your goals, is to embed all models
> in a grand model that is used for inference.
>
> When coefficients ARE comparable in some sense, you can use the bootstrap to get
> confidence bands for differences in regressor effects between models.
>
> Frank
>
> Frank E Harrell Jr   Professor and Chairman        School of Medicine
>                      Department of Biostatistics   Vanderbilt University
>
> On Fri, 13 Aug 2010, Biau David wrote:
>
>> Hello,
>>
>> I would like, if it is possible, to compare the effect of a variable across
>> regression models. I have looked around but I haven't found anything. Maybe
>> someone could help? Here is the problem:
>>
>> I am studying the effect of a variable (age) on an outcome (local recurrence:
>> lr). I have built 3 models:
>> - model 1: lr ~ age;  y = \beta_(a1).age
>> - model 2: lr ~ age + presentation variables (X_p);  y = \beta_(a2).age +
>>   \BETA_(p2).X_p
>> - model 3: lr ~ age + presentation variables + treatment variables (X_t);
>>   y = \beta_(a3).age + \BETA_(p3).X_p + \BETA_(t3).X_t
>>
>> Presentation variables include variables such as tumor grade, tumor size,
>> etc.; the physician cannot interfere with these variables.
>> Treatment variables include variables such as chemotherapy, radiation, and
>> surgical margins (a surrogate for adequate surgery).
>>
>> I have used cph for the models and restricted cubic splines (Design library)
>> for age. I have noted that the effect of age decreases from model 1 to 3.
>>
>> I would like to compare the effect of age on the outcome across the different
>> models. A test of \beta_(a1) = \beta_(a2) = \beta_(a3) and then two by two
>> comparisons or a global trend test maybe? Is that possible?
>>
>> Thank you for your help,
>>
>>
>> David Biau.
>>
>>
>>
>>
>>
>
>
>
>
>
>



  


[R] Re : How to compare the effect of a variable across regression models?

2010-08-13 Thread Biau David
OK,

Thank you very much for the answer. I will look into that. Hopefully I'll find
something that will work out.

Best,

 David Biau.





From: Frank Harrell

Cc: r help list
Sent: Fri 13 August 2010, 15:50:18
Subject: Re: [R] How to compare the effect of a variable across regression
models?


David,

In the Cox and many other regression models, the effect of a variable is 
context-dependent.  There is an identifiability problem in what you are doing, 
as discussed by

@ARTICLE{for95mod,
  author = {Ford, Ian and Norrie, John and Ahmadi, Susan},
  year = 1995,
  title = {Model inconsistency, illustrated by the {Cox} proportional hazards
  model},
  journal = {Stat in Med},
  volume = 14,
  pages = {735-746},
  annote = {covariable adjustment; adjusted estimates; baseline imbalances;
   RCT; model misspecification; model identification}
}

One possible remedy, which may not work for your goals, is to embed all models 
in a grand model that is used for inference.

When coefficients ARE comparable in some sense, you can use the bootstrap to
get confidence bands for differences in regressor effects between models.
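
The bootstrap idea can be sketched as follows. This is only a mechanical illustration under several assumptions: it uses the `lung` data from the survival package rather than the poster's data, coxph rather than cph, a linear age term rather than restricted cubic splines, and a simple percentile interval. Whether the coefficients are comparable at all is the substantive question raised above, and the code does not settle it. The helper `age_diff` is hypothetical.

```r
library(survival)

# Complete cases so both models are fitted to the same rows
d <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])

# Difference in the age coefficient between an unadjusted and an adjusted model
age_diff <- function(data) {
  m1 <- coxph(Surv(time, status) ~ age, data = data)
  m2 <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = data)
  unname(coef(m1)["age"] - coef(m2)["age"])
}

set.seed(42)
B <- 200  # number of bootstrap resamples (kept small for speed)
boot_diff <- replicate(B, age_diff(d[sample(nrow(d), replace = TRUE), ]))

observed <- age_diff(d)
ci <- quantile(boot_diff, c(0.025, 0.975))  # percentile interval
print(list(observed = observed, ci = ci))
```

Resampling whole rows keeps each patient's covariates and outcome together, so both models are refitted to the same bootstrap sample each time.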

Frank

Frank E Harrell Jr   Professor and Chairman        School of Medicine
                     Department of Biostatistics   Vanderbilt University

On Fri, 13 Aug 2010, Biau David wrote:

> Hello,
> 
> I would like, if it is possible, to compare the effect of a variable across
> regression models. I have looked around but I haven't found anything. Maybe
> someone could help? Here is the problem:
> 
> I am studying the effect of a variable (age) on an outcome (local recurrence:
> lr). I have built 3 models:
> - model 1: lr ~ age;  y = \beta_(a1).age
> - model 2: lr ~ age + presentation variables (X_p);  y = \beta_(a2).age +
>   \BETA_(p2).X_p
> - model 3: lr ~ age + presentation variables + treatment variables (X_t);
>   y = \beta_(a3).age + \BETA_(p3).X_p + \BETA_(t3).X_t
>
> Presentation variables include variables such as tumor grade, tumor size,
> etc.; the physician cannot interfere with these variables.
> Treatment variables include variables such as chemotherapy, radiation, and
> surgical margins (a surrogate for adequate surgery).
>
> I have used cph for the models and restricted cubic splines (Design library)
> for age. I have noted that the effect of age decreases from model 1 to 3.
> 
> I would like to compare the effect of age on the outcome across the different
> models. A test of \beta_(a1) = \beta_(a2) = \beta_(a3) and then two by two
> comparisons or a global trend test maybe? Is that possible?
> 
> Thank you for your help,
> 
> 
> David Biau.
> 
> 
> 
> 
> 



  


[R] How to compare the effect of a variable across regression models?

2010-08-13 Thread Biau David
Hello,

I would like, if it is possible, to compare the effect of a variable across 
regression models. I have looked around but I haven't found anything. Maybe 
someone could help? Here is the problem:

I am studying the effect of a variable (age) on an outcome (local recurrence: 
lr). I have built 3 models:
- model 1: lr ~ age;  y = \beta_(a1).age
- model 2: lr ~ age + presentation variables (X_p);  y = \beta_(a2).age +
  \BETA_(p2).X_p
- model 3: lr ~ age + presentation variables + treatment variables (X_t);
  y = \beta_(a3).age + \BETA_(p3).X_p + \BETA_(t3).X_t

Presentation variables include variables such as tumor grade, tumor size,
etc.; the physician cannot interfere with these variables.
Treatment variables include variables such as chemotherapy, radiation, and
surgical margins (a surrogate for adequate surgery).

I have used cph for the models and restricted cubic splines (Design library)
for age. I have noted that the effect of age decreases from model 1 to 3.

I would like to compare the effect of age on the outcome across the different 
models. A test of \beta_(a1) = \beta_(a2) = \beta_(a3) and then two by two 
comparisons or a global trend test maybe? Is that possible?

Thank you for your help,


David Biau.



  


[R] Re : How to extract se(coef) from cph?

2010-08-05 Thread Biau David
thanks, it works just great.

 David Biau.





From: "Abhijit Dasgupta, PhD"

Cc: r help list
Sent: Thu 5 August 2010, 22:15:37
Subject: Re: [R] How to extract se(coef) from cph?

if the cph model fit is m1, you can try

sqrt(diag(m1$var))

This is coded in print.cph.fit (library(rms))

On 08/05/2010 04:03 PM, Biau David wrote:
> Hello,
>
> I am modeling some survival data with cph (Design). I have modeled a predictor
> which showed non linear effect with restricted cubic splines. I would like to
> retrieve the se(coef) for other, linear, predictors. This is just to make nice
> LaTeX tables automatically. I have the coefficients with coef().
>
> How do I do that?
>
> Thanks,
>
>   David Biau.
>
>
>
>
>


-- 

Abhijit Dasgupta, PhD
Director and Principal Statistician
ARAASTAT
Ph: 301.385.3067
E: adasgu...@araastat.com
W: http://www.araastat.com


  


[R] Re : How to extract se(coef) from cph?

2010-08-05 Thread Biau David
Excellent!

Yes, FH has a function to get LaTeX tables, but it is not malleable enough.

Thanks,

 David Biau.





From: David Winsemius

Cc: r help list
Sent: Thu 5 August 2010, 22:11:20
Subject: Re: [R] How to extract se(coef) from cph?


On Aug 5, 2010, at 4:03 PM, Biau David wrote:

> Hello,
> 
> I am modeling some survival data with cph (Design). I have modeled a predictor
> which showed non linear effect with restricted cubic splines. I would like to
> retrieve the se(coef) for other, linear, predictors.

The cph object has a "var". The vcov function is an extractor function. You
would probably be using something like:

diag(vcov(fit))^(1/2)
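
A minimal sketch of that extraction is shown below. It uses coxph from the survival package and its bundled `lung` data as stand-ins, on the assumption that the same vcov() approach carries over to a cph fit; the table layout is only illustrative.

```r
library(survival)

# coxph on lung stands in for the poster's cph fit
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Standard errors are the square roots of the diagonal of the
# estimated variance-covariance matrix of the coefficients
se <- sqrt(diag(vcov(fit)))

# Coefficients and standard errors side by side, ready for a table
coef_table <- cbind(coef = coef(fit), se = se)
print(round(coef_table, 4))
```

The resulting matrix can then be passed to whichever LaTeX table generator is preferred.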

> This is just to make nice
> LaTeX tables automatically.

Are you sure Frank has not already programed that for you somewhere? Perhaps 
latex.cph?

> I have the coefficients with coef().
> 
> How do I do that?
> 
> Thanks,
> 
> David Biau.
> 

--
David Winsemius, MD
West Hartford, CT


  


[R] How to extract se(coef) from cph?

2010-08-05 Thread Biau David
Hello,

I am modeling some survival data with cph (Design). I have modeled a predictor
which showed non linear effect with restricted cubic splines. I would like to
retrieve the se(coef) for other, linear, predictors. This is just to make nice
LaTeX tables automatically. I have the coefficients with coef().

How do I do that?

Thanks,

 David Biau.



  


[R] Re : Re : COXPH: how to get the score test and likelihood ratio test for a specific variable in a multivariate Coxph ?

2010-07-30 Thread Biau David
Well, thank you very much for these explanations. Unfortunately, I must admit
the book I have for survival analysis seems less precise as to which test to
use and why.

Still, in coxph (survival), if I have multiple variables in a model, say X_1,
X_2, and X_3, how do I test their respective coefficients \beta_1, \beta_2, and
\beta_3 with the LR, score and Wald tests? I guess I can do it by comparing the
model with all three variables to those without each of the variables, but is
there not a more straightforward way?
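
The model-comparison approach described above can be sketched as a small loop. This is an illustration only: it uses the `lung` data from the survival package with three arbitrary covariates, restricts to complete cases so that the nested models are fitted to the same rows, and `lr_test` is a hypothetical helper rather than a survival function.

```r
library(survival)

# Complete cases so that full and reduced models use identical rows
d <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])
full <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = d)

# Hypothetical helper: LR test for dropping one term from the full model
lr_test <- function(term) {
  reduced <- update(full, as.formula(paste(". ~ . -", term)))
  chisq <- 2 * (full$loglik[2] - reduced$loglik[2])  # twice the loglik difference
  df <- length(coef(full)) - length(coef(reduced))
  c(chisq = chisq, df = df, p = pchisq(chisq, df, lower.tail = FALSE))
}

lr_table <- t(sapply(c("age", "sex", "ph.ecog"), lr_test))
print(lr_table)

# Wald tests per coefficient are already in the summary
print(summary(full)$coefficients[, c("z", "Pr(>|z|)")])
```

Each row of `lr_table` compares the full model with the model lacking that one variable, which is the sense in which a likelihood ratio test exists "for a variable".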

 David Biau.





From: "Therneau, Terry M., Ph.D."
To: David Winsemius ; Biau David
Cc: r help list
Sent: Fri 30 July 2010, 19:07:15
Subject: RE: Re : [R] COXPH: how to get the score test and likelihood ratio test
for a specific variable in a multivariate Coxph ?

The Wald, score, and LR tests are discussed in full in my book.  They
are not the same.
The LR test is twice the difference between the log-likelihood at
beta=final and at beta=0. The score test is a Taylor series approximation
to this using an expansion around beta=0.  The Wald test is a similar
Taylor series approximation, but around beta=final.
  If there are no tied times the score test = Log-rank test.  If there
are ties, then they are just a tiny bit different: the usual log-rank
has an n-1 in its variance term and the Cox model has an n.
Neither is right or wrong, just a different choice.
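
For a single-covariate model, the three statistics and the log-rank test can be put side by side as in the sketch below. It uses the `lung` data from the survival package as a stand-in; the object names are illustrative.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ sex, data = lung)
s <- summary(fit)

# Overall LR, Wald and score tests as stored in the summary object
tests <- rbind(LR = s$logtest, Wald = s$waldtest, Score = s$sctest)
print(tests)

# Log-rank test from survdiff, for comparison with the score test
logrank <- survdiff(Surv(time, status) ~ sex, data = lung)$chisq
print(c(score = unname(tests["Score", "test"]), logrank = logrank))
```

Because `lung` contains tied event times, the score and log-rank statistics are expected to be close but not necessarily identical.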

Terry Therneau



  


[R] Re : COXPH: how to get the score test and likelihood ratio test for a specific variable in a multivariate Coxph ?

2010-07-30 Thread Biau David
Thx for the answer.

I am using survival.

I didn't know that the Wald and score tests were the same for individual
variables in a coxph; I thought the score test was the "multivariate version"
of the log-rank.

However, say I have only one variable in the model: should I expect the test
for the full model and the one for a single variable to be the same? It then
seems to me that the default test is the Wald and that the Wald and the score
are different.

> cox_lr_age <- coxph(Surv(tilr, ev_lr==1)~age, data=tam)
> summary(cox_lr_age)
Call:
coxph(formula = Surv(tilr, ev_lr == 1) ~ age, data = tam)

  n=2156 (76 observations deleted due to missingness)

        coef exp(coef) se(coef)     z Pr(>|z|)
age 0.019504  1.019696 0.004651 4.193 2.75e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    exp(coef) exp(-coef) lower .95 upper .95
age     1.020     0.9807     1.010     1.029

Rsquare= 0.008   (max possible= 0.669 )
Likelihood ratio test= 18.4  on 1 df,   p=1.787e-05
Wald test= 17.58  on 1 df,   p=2.751e-05
Score (logrank) test = 17.86  on 1 df,   p=2.375e-05


 David Biau.





From: David Winsemius

Cc: r help list
Sent: Fri 30 July 2010, 17:34:28
Subject: Re: [R] COXPH: how to get the score test and likelihood ratio test for
a specific variable in a multivariate Coxph ?


On Jul 30, 2010, at 11:08 AM, Biau David wrote:

> Hello,
> 
> I would like to get the likelihood ratio and score tests for specific variables
> in a multivariate coxph model. The default is Wald, so the tests for each
> separate variable is based on Wald's test. I have the other tests for the full
> model but I don't know how to get them for each variable.
> 
> Any idea?
> 

The first idea would be to specify which function in which package you are
asking questions about. In the case of coxph in the survival package, for
instance, you do get a likelihood ratio test (== differences in
log-likelihoods) by default. A score test is, at least as I understand it for
individual variables, equivalent to a Wald test, so I don't really understand
your question, since you are already getting all of that in the survival
package.

(You can extract a "score" value and loglik values from a coxph object by:
(with the first example in the coxph help page)

coxph(Surv(time, status) ~ x + strata(sex), test1)$score
coxph(Surv(time, status) ~ x + strata(sex), test1)$loglik

But anova(coxph-object) would give you these values in a neater bundle.
#Analysis of Deviance Table
# Cox model: response is Surv(time, status)
#Terms added sequentially (first to last)
#       loglik  Chisq Df Pr(>|Chi|)
# NULL -3.8712
# x    -3.3277 1.0871  1     0.2971

The question about "getting them for each variable" does not make a lot of
sense to me, since likelihood tests are model comparisons. You can only make
such statements about the consequences of adding or deleting a variable to/from
an existing model.

--David Winsemius, MD
West Hartford, CT


  


[R] COXPH: how to get the score test and likelihood ratio test for a specific variable in a multivariate Coxph ?

2010-07-30 Thread Biau David
Hello,

I would like to get the likelihood ratio and score tests for specific variables 
in a multivariate coxph model. The default is Wald, so the tests for each 
separate variable is based on Wald's test. I have the other tests for the full 
model but I don't know how to get them for each variable.

Any idea?

 David Biau.



  


[R] longitudinal tobit regression in R

2010-06-30 Thread Biau David
Hi,

I am trying to model a score over time. This score shows a ceiling effect. I
would like to use a longitudinal tobit model, such as the one described by
Twisk et al. (Twisk_Longitudinal tobit regression: A new approach to analyze
outcome variables with floor or ceiling effects_JCE_2009), but it is programmed
for Stata.

Has anyone used such models in R?
Any other idea?

David Biau.


  


[R] crosstabling multiple variables at once

2010-05-20 Thread Biau David


Hi,

I am trying to describe a data.frame by obtaining multiple crosstable summary
statistics at once. I have tried table, xtab, crosstable, summaryBy and
describe, but none of these functions seems to allow multiple comparisons at
once.

Here is what I would like to do:

I have, for instance, age, sex (M and F), grade (1, 2, 3) and site (limb,
trunk), and I want, for instance, the following summary statistics:
- age (mean, SD) for males and age for females
- age for grade 1, grade 2, and grade 3
- age for site limb, site trunk
- sex (count, proportions) for grade 1, grade 2, and grade 3
- sex (count, proportions) for site limb, site trunk (already have sex/age above)
- grade (count, proportions) for site limb, site trunk (already have grade/sex
  and grade/age above)

Also, I want each of these not crossed with any others (overall mean age,
number of males, etc.), which could be seen as each crossed with itself.

I have at least 10 variables: continuous, categorical ordered and non-ordered.
I don't want any tests.

Any idea?

David Biau.
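
One way the request above might be approached in base R is sketched below. Everything here is an assumption for illustration: the data frame `d` is simulated, `cross_one` is a hypothetical helper, and the choice of mean/SD for continuous variables and counts/column proportions for categorical ones simply mirrors the list in the message.

```r
# Toy data standing in for the poster's data frame (all values are made up)
set.seed(1)
d <- data.frame(
  age   = round(rnorm(60, mean = 55, sd = 12)),
  sex   = factor(sample(c("M", "F"), 60, replace = TRUE)),
  grade = factor(sample(1:3, 60, replace = TRUE)),
  site  = factor(sample(c("limb", "trunk"), 60, replace = TRUE))
)

# Hypothetical helper: summarise variable `v` within each level of `g`
cross_one <- function(data, v, g) {
  x   <- data[[v]]
  grp <- data[[g]]
  if (is.numeric(x)) {
    # continuous variable: mean and SD per group
    sapply(split(x, grp), function(z) c(mean = mean(z), sd = sd(z)))
  } else {
    # categorical variable: counts and column proportions
    tab <- table(x, grp)
    list(count = tab, prop = prop.table(tab, margin = 2))
  }
}

# Every variable crossed with every grouping factor, except with itself
combos <- expand.grid(v = c("age", "sex", "grade", "site"),
                      g = c("sex", "grade", "site"),
                      stringsAsFactors = FALSE)
combos <- combos[combos$v != combos$g, ]

res <- Map(function(v, g) cross_one(d, v, g), combos$v, combos$g)
names(res) <- paste(combos$v, "by", combos$g)

print(res[["age by sex"]])
print(res[["sex by grade"]]$prop)
```

This produces both orientations of each categorical pair (for example "sex by grade" and "grade by sex"); filtering `combos` further would avoid the duplicates mentioned in the message.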


  


[R] multiple 2 by 2 crosstabulations?

2010-05-19 Thread Biau David
Hello,

I have a dataframe (var_1, var_2, ..., var_n) and I would like to export 
summary statistics to Latex in the form of a table. I want specific summary 
statistics by crossing numerous variables 2x2 AT ONCE. In each cell I would 
like sometimes to have the median (Q1 - Q3), or frequency and proportion, etc. 
CrossTable, xtab, etc... do not allow for multiple 2 by 2 crosstabulation. The 
table would look like this:

        var_1   var_2   var_3   ...
var_1   a       b       c
var_2   d       e       f
var_3   ...     ...     ...

with a, b, c, ... the results of each crosstabulation. I have continuous and 
categorical variables.

Any idea?

Thank you very much,

David.


  