[R] Comparing two diagnostic tests with lme4

2016-07-03 Thread Keno Kyrill Bressem
Dear R experts,

I am comparing two diagnostic tests. To do so, I collected patient data from
several studies. The data frame is similar to this one:

set.seed(10)
data = data.frame( test1 = rbinom(1000, 1, 0.6),
   test2 = rbinom(1000, 1, 0.4),
   reference = rbinom(1000, 1, 0.7),
   study = sort(paste("study_", round(runif(1000, 1, 20),0)
,sep = "")),
   id = 1:1000,
   age = round(rnorm(1000, 60, 10), 0))

I did a lot of research on how to use hierarchical models to calculate
the respective sensitivities and specificities of my tests, and tried many
variations of the glmer formula. However, I don't have sufficient
statistical knowledge to interpret these models, so I don't know whether my
approach is correct. I am therefore showing you my latest model.

First, I would like to calculate the logit sensitivity and specificity for
each test. In a paper by Genders et al. (appendices), Stata code to
calculate the logit sensitivity and specificity is provided. I translated
this code to R, but I am not sure whether it is correct this way.


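library(lme4)    # glmer(), glmerControl(), fixef()
library(useful)  # binary.flip()

# Sensitivity: model test1 among reference-positive patients
m.sen <- glmer(test1 ~ (1 | study) + (1 | id),
               data = subset(data, reference == 1),
               family = binomial(link = "logit"),
               control = glmerControl(optimizer = "bobyqa"), nAGQ = 1)

# Specificity: model the flipped test result among reference-negative patients
# (nAGQ = 1 is assumed to match m.sen; the value was cut off in the post)
m.spe <- glmer(binary.flip(test1) ~ (1 | study) + (1 | id),
               data = subset(data, reference == 0),
               family = binomial(link = "logit"),
               control = glmerControl(optimizer = "bobyqa"), nAGQ = 1)

logit.sen = fixef(m.sen)
logit.spe = fixef(m.spe)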


My first question is whether it is possible to calculate the logit sensitivity
and specificity of a diagnostic test like this. The next step would be to
adjust for patient characteristics, such as age.
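
As a rough sanity check on this idea (a sketch only, not the model from the
Genders et al. appendix): in an intercept-only logistic regression the
intercept is the logit of the positive fraction, so plogis() maps it back to
a proportion. The example uses plain glm() without the study and id random
effects, and the simulated columns are assumptions that mirror the data
frame above. With glmer(), the same back-transform of fixef() would
presumably describe a typical study (random effects at zero) rather than
the pooled crude proportion.

```r
# Assumed toy data, similar in spirit to the data frame in the post.
set.seed(10)
d <- data.frame(test1     = rbinom(1000, 1, 0.6),
                reference = rbinom(1000, 1, 0.7))

pos <- subset(d, reference == 1)  # reference-positive: used for sensitivity
neg <- subset(d, reference == 0)  # reference-negative: used for specificity

# Intercept-only logistic regressions (no random effects in this sketch).
fit_sen <- glm(test1 ~ 1, data = pos, family = binomial(link = "logit"))
fit_spe <- glm(I(1 - test1) ~ 1, data = neg, family = binomial(link = "logit"))

# Back-transform the logit intercepts to probabilities.
sen <- plogis(unname(coef(fit_sen)[1]))
spe <- plogis(unname(coef(fit_spe)[1]))

# Crude proportions for comparison.
crude_sen <- mean(pos$test1)
crude_spe <- mean(1 - neg$test1)
```

Here sen and spe should agree with crude_sen and crude_spe up to numerical
tolerance, which is one way to confirm that the subsetting and the flipped
outcome are set up as intended before adding random effects.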

data <- within(data, {age = as.factor(round(age, -1))})
m.sen.age <- glmer(test1 ~ age + ( 1 | study) + ( 1 | id ), data =
subset(data, reference == 1),
family = binomial(link = "logit"), control = glmerControl(optimizer =
"bobyqa"), nAGQ = 1)
fix <- fixef(m.sen.age)

Now I add the estimates. For example, to determine the logit sensitivity of
test1 in patients aged between 55 and 65: sen.50 = fix[1] + fix[5]
As an alternative, I thought about subsetting the data further, which
produces nearly identical results.

m.sen.age <- glmer(test1 ~ ( 1 | study) + ( 1 | id ), data = subset(data,
reference == 1 & age == 60),
family = binomial(link = "logit"), control = glmerControl(optimizer =
"bobyqa"), nAGQ = 1)
sen.age50 = fixef(m.sen.age)
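
One possible reading of why the two approaches agree so closely, sketched
with glm() instead of glmer() (so without random effects, which is an
assumption of this example): when age enters as a factor, the intercept plus
the coefficient of a level is the logit for that level, and fitting only the
subset estimates the same quantity directly. Indexing the coefficient by
name, as below, avoids depending on its position in the vector. In the mixed
model the random-effect variances are estimated from different data in the
two fits, which would plausibly explain "nearly" rather than "exactly".

```r
# Assumed toy data; age is rounded to the nearest ten and used as a factor.
set.seed(10)
d <- data.frame(test1 = rbinom(1000, 1, 0.6),
                age   = round(rnorm(1000, 60, 10), 0))
d$age_grp <- factor(round(d$age, -1))

# Approach 1: one model with age group as a fixed effect.
fit <- glm(test1 ~ age_grp, data = d, family = binomial(link = "logit"))
b <- coef(fit)
logit_60 <- unname(b["(Intercept)"] + b["age_grp60"])

# Approach 2: intercept-only model on the age-60 subset.
sub_fit <- glm(test1 ~ 1, data = subset(d, age_grp == "60"),
               family = binomial(link = "logit"))
logit_60_sub <- unname(coef(sub_fit)[1])
```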

Thank you very much in advance
kb

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Jeff Newmiller
There are a great many hits when I search on the keywords "kaplan meier plot 
R"... so my first reaction is that you should be referring to some of the 
existing packages for doing this type of analysis. I do not do this type of 
analysis normally, so am probably not your best helper... perhaps someone else 
will chime in if you show that you have read some existing KM examples. 

My second reaction is that if you want to avoid losing records you should also 
avoid adding records. Your example extends from the first matching date to and 
including the next matching date, which conflicts with analysis of successive 
treatment periods. You may have a good reason for doing this, but in my 
experience this is usually a mistake. 

Finally, I think you should more closely study the use of the ave function that 
I already used if you want to work with the data in its original form. It 
should not be too difficult to generate your diff_days column using ave if you 
have the admin_period column that I showed you how to make. 
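
A minimal sketch of that last suggestion, on a made-up data frame whose
column names follow the thread (the rows are assumptions, not the real
data): ave() first numbers the administration periods within each ID, then
computes the days elapsed since the first row of each period. It assumes the
rows are already sorted by date within each ID. Rows before the first "Y"
fall into period 0 and would be measured from the first row of that ID.

```r
# Hypothetical miniature of the drug_study data.
drug_study <- data.frame(
  ID         = c("R1/3", "R1/3", "R1/3", "R1/3", "R1/3"),
  date       = as.Date(c("2007-05-11", "2007-05-16", "2007-05-22",
                         "2008-02-04", "2008-02-11")),
  drug_admin = c("Y", "", "", "Y", ""),
  stringsAsFactors = FALSE
)

# Running count of "Y" rows per ID: each "Y" starts a new period.
drug_study$admin_period <- ave(as.integer(drug_study$drug_admin == "Y"),
                               drug_study$ID, FUN = cumsum)

# Days since the first date of the period, kept in the original data frame.
drug_study$diff_days <- ave(as.numeric(drug_study$date),
                            drug_study$ID, drug_study$admin_period,
                            FUN = function(d) d - d[1])
```

No rows are added or dropped, so this keeps the data in its original form.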
-- 
Sent from my phone. Please excuse my brevity.

On July 3, 2016 1:47:17 PM PDT, Kevin Wamae  wrote:
>Hi Bert, my first task is to make a Kaplan Meier Plot to evaluate the
>risk of developing disease in the treated vs the non-treated
>individuals. I therefore figured it might be easier to compute dates
>first as any further analysis will be based on time, in this case days.
>I keep getting recommendations on how to tweak my analysis and keeps
>coming down to dates between the start of drug administration and the
>end of it.
>
>Can you suggest an “easier” way to go about this.. 
>
>Regards
>---
>Kevin Wame 

Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Kevin Wamae
Hi Bert, my first task is to make a Kaplan-Meier plot to evaluate the risk of 
developing disease in the treated vs. the non-treated individuals. I therefore 
figured it might be easier to compute dates first, as any further analysis will 
be based on time, in this case days. I keep getting recommendations on how to 
tweak my analysis, and it keeps coming down to the dates between the start of 
drug administration and the end of it.

Can you suggest an “easier” way to go about this?

Regards
---
Kevin Wame 
 

On 7/3/16, 11:28 PM, "Bert Gunter"  wrote:

I haven't followed this thread closely, but if it's not too late, I
might suggest that you stop worrying about how you want your data
frame to look and start worrying about how you want to display/analyze
your data. As Jeff suggested, you and your supervisor are probably
being driven by paradigms from Excel, SPSS, or whatever that are
simply unnecessary for R. My guess would be that if you explained the
sort of analyses/plots you wish to do, you will find it can be done
fairly directly from your existing data. At the very least it would
give Jeff and other helpeRs a better idea of what you might need
rather than what you and your supervisor think you need.


Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Bert Gunter
I haven't followed this thread closely, but if it's not too late, I
might suggest that you stop worrying about how you want your data
frame to look and start worrying about how you want to display/analyze
your data. As Jeff suggested, you and your supervisor are probably
being driven by paradigms from Excel, SPSS, or whatever that are
simply unnecessary for R. My guess would be that if you explained the
sort of analyses/plots you wish to do, you will find it can be done
fairly directly from your existing data. At the very least it would
give Jeff and other helpeRs a better idea of what you might need
rather than what you and your supervisor think you need.


Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



Re: [R] regroup row names

2016-07-03 Thread Ulrik Stervbo
Do the elements in 'locs' have an _ somewhere? If not, the search and
replace finds nothing.

Bert's suggestion of taking a substring is better if you are just
interested in characters on fixed positions.
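
A small illustration of both points, using made-up values (the vector names
are assumptions): gsub() coerces a factor to character before matching, which
is why the column ends up as character, and it returns its input unchanged
when the pattern does not occur.

```r
# A factor column with underscores: everything from "_" onward is removed.
locs <- factor(c("aClim_st02", "aClim_st03", "bClim_st01"))
grouped <- gsub("_.*", "", locs)          # character vector, not a factor

# No underscore in the values: nothing matches, so nothing changes.
no_underscore <- c("aClim", "bClim.st01")
unchanged <- gsub("_.*", "", no_underscore)

# Convert back to a factor if a grouping variable is wanted.
grouped_factor <- factor(grouped)
```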

Bert also suggested that you could maybe benefit from reading a few
tutorials and I agree.

Best,
Ulrik

On Sun, 3 Jul 2016, 21:36 lily li,  wrote:

> Hi Ulrik,
>
> I created another column named locs, and used the code df$locs <-
> gsub("_.*", "", df$locs), but I found that the names does not change at
> all. And the new column becomes characters after using the gsub function.
> What is the problem? Thanks again.


Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Kevin Wamae
Hi Jeff, it works well on a dataset with 10 rows and I figure it will 
work well with the “real” dataset. You’ve been of great help and I am starting 
to make headway. 

It creates a new data frame (result), shown below, that doesn’t quite have the 
result I want.

ID      admin_period  start     end       ddays
J1/3    1             5/11/07   8/13/07   94
J1/3    2             8/13/07   11/12/07  91
J1/3    3             11/12/07  2/4/08    84
J1/3    4             2/4/08    5/5/08    91
J1/3    5             5/5/08    5/4/09    364
J1/3    6             5/4/09    5/17/10   378
J1/3    7             5/17/10   5/16/11   364
J10/1   1             5/11/07   8/13/07   94
J10/1   2             8/13/07   11/12/07  91
J10/1   3             11/12/07  2/4/08    84
J10/1   4             2/4/08    5/5/08    91
J10/1   5             5/5/08    5/8/09    368
J10/1   6             5/8/09    5/17/10   374
J10/1   7             5/17/10   5/16/11   364
J102/1  1             5/15/07   8/15/07   92
J102/1  2             8/15/07   11/13/07  90
J102/1  3             11/13/07  2/5/08    84
J102/1  4             2/5/08    5/6/08    91
J102/1  5             5/6/08    5/5/09    364
J102/1  6             5/5/09    5/19/10   379

My supervisor doesn’t want me to create a new dataset; she’s afraid I might 
lose some data, and I cannot fight that.

As you mentioned earlier, I might be mixing things up.

After consultation with my supervisor, this is what we’ve agreed. For every 
individual, given the start and end date, create a new column (say, diff_days), 
and for every row that falls within the range of start and end_date, compute 
the difference between the date in that row and the start date and store it in 
the diff_days column. Below is an example of the result. As can be seen, 
5/11/2007 is the start and 2/4/2008 is the end. diff_days has been populated 
excluding the end date, because that date is the start of the study period in 
2008, which continues into 2009; from 2/4/2008 I should therefore compute 
diff_days until 2009, and so on (I hope this makes sense).

ID    date       drug_admin  year  month  diff_days
R1/3  5/11/2007  Y           2007  5      0
R1/3  5/16/2007              2007  5      6
R1/3  5/22/2007              2007  5      11
R1/3  5/28/2007              2007  5      17
R1/3  1/14/2008              2008  1      248
R1/3  1/21/2008              2008  1      255
R1/3  1/28/2008              2008  1      263
R1/3  2/4/2008   Y           2008  2


Regards
---
Kevin Wame 
 

On 7/3/16, 10:09 PM, "Jeff Newmiller"  wrote:

Typo on the second line

result <- (   result0 
  %>% select( -admin_period1 )
  %>% inner_join( result0 %>% select( ID, admin_period1, end=start )
   , by = c( ID="ID", admin_period ="admin_period1" )
)
  %>% mutate( ddays = end - start )
  )
-- 
Sent from my phone. Please excuse my brevity.

Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Kevin Wamae
Thanks Jeff, let me try it on the larger dataset.

Regards
---
Kevin Wame 
 

On 7/3/16, 10:09 PM, "Jeff Newmiller"  wrote:

result <- (   result0 
  %>% select( -admin_period1 )
  %>% inner_join( result0 %>% select( ID, admin_period1, end=start )
   , by = c( ID="ID", admin_period ="admin_period1" )
)
  %>% mutate( ddays = end - start )
  )




Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Jeff Newmiller
Typo on the second line

result <- (   result0 
  %>% select( -admin_period1 )
  %>% inner_join( result0 %>% select( ID, admin_period1, end=start )
   , by = c( ID="ID", admin_period ="admin_period1" )
)
  %>% mutate( ddays = end - start )
  )
-- 
Sent from my phone. Please excuse my brevity.

On July 3, 2016 11:55:14 AM PDT, Kevin Wamae  wrote:
>Hi Jeff, “likes its Excel”, I don’t follow. Pardon me for any mix up.
>
>Thanks for the code.  After running it, this is the error I get.
>
>Error: cannot join on columns 'admin_period' x 'admin_period1': index
>out of bounds
>
>Regards
>---
>Kevin Wame | Ph.D. Student (IDeAL)
>KEMRI-Wellcome Trust Collaborative Research Programme
>Centre for Geographic Medicine Research
>P.O. Box 230-80108, Kilifi, Kenya
> 
>
>On 7/3/16, 9:34 PM, "Jeff Newmiller"  wrote:
>
>I still get the impression from your mixing of information types that
>you are thinking like this is Excel.
>
>Perhaps something like
>
>drug_study$admin_period  <- ave( "Y" == drug_study$drug_admin,
>drug_study$ID, FUN=cumsum )
>library(dplyr)
>result0 <- (   drug_study
>  %>% filter( 0 != admin_period )
>  %>% group_by( ID, admin_period )
>  %>% summarise( start = min( date ) )
>  %>% mutate( admin_period1 = admin_period -1 )
>  )
>result <- (   result0 
>  %>% select( -admin_period )
> %>% inner_join( result0 %>% select( ID, admin_period1, end=start )
> , by = c( ID="ID", admin_period ="admin_period1" )
>)
>  %>% mutate( ddays = end - start )
>  )
>-- 
>Sent from my phone. Please excuse my brevity.
>
>On July 3, 2016 10:24:51 AM PDT, Kevin Wamae
> wrote:
>>HI Jeff, it’s been an uphill task working with the dataset and I am not
>>the first to complain. Nonetheless, data-cleaning is ongoing and since
>>I cannot wait for that to get done, I decided to make the most of what
>>the dataset looks like at this time. It appears the process may take a
>>while.
>>
>>Thanks for the script. From the output, I noticed that “result”
>>contains the first and last date for each of the individuals and not
>>taking into account the variable “drug-admin”. 
>>
>>ID     start   end
>>J1/3   1/5/09  12/25/10
>>R1/3   1/4/07  12/15/08
>>R10/1  1/4/07  3/5/12
>>
>>My aim is to pick the date, for example in 2007, where drug-admin ==
>>“Y” as my start and the date in the subsequent year (2008 in this case)
>>where drug-admin == “Y” as my end. Then, I should populate the variable
>>“study_id” with “start” up to the entry just above the one whose date
>>matches “end”, as the output below shows (I hope its structure is
>>maintained as I have copied it from R-Studio). The goal for now is to
>>then get difference in days between “date” and “study_id” and still get
>>to keep that column for “study_id” as I might use it later.
>>
>>From the output, it can be seen that for this individual, the dates run
>>from 2007 to 2008. However, for some individuals, the dates run from
>>2008-2009, 2009-2010 and so on. Therefore, I need to make the script
>>deal with all the years as the dates range from 2001-2016
>>
>>ID    date      drug_admin  year  month  study_id
>>R1/3  5/11/07   Y           2007  5      5/11/07
>>R1/3  5/16/07               2007  5      5/11/07
>>R1/3  5/22/07               2007  5      5/11/07
>>R1/3  5/28/07               2007  5      5/11/07
>>R1/3  6/5/07                2007  6      5/11/07
>>R1/3  6/11/07               2007  6      5/11/07
>>R1/3  6/18/07               2007  6      5/11/07
>>R1/3  6/25/07               2007  6      5/11/07
>>R1/3  7/2/07                2007  7      5/11/07
>>R1/3  7/16/07               2007  7      5/11/07
>>R1/3  7/29/07               2007  7      5/11/07
>>R1/3  8/2/07                2007  8      5/11/07
>>R1/3  8/7/07                2007  8      5/11/07
>>R1/3  8/13/07               2007  8      5/11/07
>>R1/3  9/18/07               2007  9      5/11/07
>>R1/3  9/24/07               2007  9      5/11/07
>>R1/3  10/6/07               2007  10     5/11/07
>>R1/3  10/8/07               2007  10     5/11/07
>>R1/3  10/15/07              2007  10     5/11/07
>>R1/3  10/22/07              2007  10     5/11/07
>>R1/3  10/29/07              2007  10     5/11/07
>>R1/3  11/8/07               2007  11     5/11/07
>>R1/3  11/12/07              2007  11     5/11/07
>>R1/3  11/19/07              2007  11     5/11/07
>>R1/3  11/29/07              2007  11     5/11/07
>>R1/3  12/6/07               2007  12     5/11/07
>>R1/3  12/10/07              2007  12     5/11/07
>>R1/3  12/21/07              2007  12     5/11/07
>>R1/3  1/7/08                2008  1      5/11/07
>>R1/3  1/14/08               2008  1

Re: [R] regroup row names

2016-07-03 Thread Ulrik Stervbo
Hi Lily,

My suggestion should remove the underscore and everything after it, leaving
just aClim and bClim in the ID column.

Best
Ulrik

On Sun, 3 Jul 2016, 20:34 lily li,  wrote:

> Hi Ulrik,
>
> Thanks. This is for one group, but how to do for several groups? I tried
> gsub(c(),c(),df$ID), but it does not work.
>
>
> On Sun, Jul 3, 2016 at 12:24 PM, Ulrik Stervbo 
> wrote:
>
>> Hi Lily,
>>
>> you can use gsub:
>>
>> df$ID <- gsub("_.*", "", df$ID)
>>
>> HTH
>> Ulrik
>>
>> On Sun, 3 Jul 2016 at 20:16 lily li  wrote:
>>
>>> I have a problem in changing row names in a dataframe in R. The first
>>> column is ID, such as aClim_st02, aClim_st03, aClim_st 05, bClim_st01,
>>> bClim_st02, etc. How to rename the names, so that aClim_ all grouped to
>>> aClim, while bClim_ all grouped to bClim? Thanks for your help.
>>> df
>>>
>>> IDtemp   precip   LW   SW
>>> aClim_st02
>>> aClim_st03
>>> aClim_st05
>>> bClim_st01
>>> bClim_st02
>>> ...
>>>


Re: [R] regroup row names

2016-07-03 Thread Bert Gunter
I strongly suspect that you do not need to do this. What I think you
do need to do is to create a new column (which will be a factor)
identifying the climate ("a" or "b"), which can then be used to group
climates in plots, used as a covariate in statistical analyses, etc.
Moreover, there is probably no need for things to be in order (R is
not Excel or SPSS or ...).

You can either use regexp's (e.g. grep -- very powerful but with a
hefty learning curve) or because of the simplicity of your ID's,
?substring ; e.g.

yourdat$clim_type <- substring(yourdat$ID,1,1)
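
As a sketch of this suggestion on made-up data (the temp values and the use
of tapply() are assumptions for illustration), the new column can be stored
as a factor and used directly for grouping, without renaming or reordering
the IDs:

```r
# Hypothetical data with IDs shaped like those in the question.
yourdat <- data.frame(
  ID   = c("aClim_st02", "aClim_st03", "bClim_st01", "bClim_st02"),
  temp = c(10, 12, 20, 22),
  stringsAsFactors = FALSE
)

# First character of the ID identifies the climate ("a" or "b").
yourdat$clim_type <- factor(substring(yourdat$ID, 1, 1))

# Use the factor as a grouping variable, e.g. mean temperature per climate.
mean_temp <- tapply(yourdat$temp, yourdat$clim_type, mean)
```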

Please do some more tutorials on your own, as these (not regexp's) are
fairly basic R features that all users should be aware of.

Incidentally, check out the "stringr" package, which is supposed to
make string manipulation tasks like this easier (I have not used it
though).

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sun, Jul 3, 2016 at 11:14 AM, lily li  wrote:
> I have a problem in changing row names in a dataframe in R. The first
> column is ID, such as aClim_st02, aClim_st03, aClim_st 05, bClim_st01,
> bClim_st02, etc. How to rename the names, so that aClim_ all grouped to
> aClim, while bClim_ all grouped to bClim? Thanks for your help.
> df
>
> IDtemp   precip   LW   SW
> aClim_st02
> aClim_st03
> aClim_st05
> bClim_st01
> bClim_st02
> ...


Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Jeff Newmiller
I still get the impression from your mixing of information types that you are 
thinking like this is Excel.

Perhaps something like

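# Number the administration periods within each ID: running count of "Y" rows
drug_study$admin_period  <- ave( "Y" == drug_study$drug_admin, drug_study$ID, 
FUN=cumsum )
library(dplyr)
# Start date of each administration period (rows before the first "Y" are dropped)
result0 <- (   drug_study
  %>% filter( 0 != admin_period )
  %>% group_by( ID, admin_period )
  %>% summarise( start = min( date ) )
  %>% mutate( admin_period1 = admin_period - 1 )
  )
# Pair each period with the next one: the next period's start is this period's end.
# The left table keeps admin_period, because the join matches it to admin_period1.
result <- (   result0 
  %>% select( -admin_period1 )
  %>% inner_join( result0 %>% select( ID, admin_period1, end=start )
   , by = c( ID="ID", admin_period ="admin_period1" )
)
  %>% mutate( ddays = end - start )
  )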
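
The same idea can be tried without dplyr. The following is only a sketch in base R on a made-up miniature of drug_study (the real data are not shown in this thread), assuming rows are sorted by date within each ID and that blank drug_admin entries are empty strings.

```r
# Hypothetical miniature of drug_study; values are invented for illustration.
drug_study <- data.frame(
  ID = c("R1/3", "R1/3", "R1/3", "R1/3", "J1/3", "J1/3", "J1/3"),
  date = as.Date(c("2007-05-11", "2007-05-16", "2008-02-04", "2008-02-11",
                   "2009-01-05", "2009-03-02", "2010-01-04")),
  drug_admin = c("Y", "", "Y", "", "", "Y", "Y"),
  stringsAsFactors = FALSE
)

# Running count of "Y" rows within each ID; 0 marks rows before the first "Y".
drug_study$admin_period <- ave(as.integer(drug_study$drug_admin == "Y"),
                               drug_study$ID, FUN = cumsum)

# Each "Y" row opens a new period, so it carries that period's start date.
starts <- drug_study[drug_study$drug_admin == "Y", c("ID", "admin_period", "date")]
names(starts)[3] <- "start"

# The end of period p is the start of period p + 1 for the same ID.
ends <- data.frame(ID = starts$ID,
                   admin_period = starts$admin_period - 1L,
                   end = starts$start,
                   stringsAsFactors = FALSE)

# Only periods that have a following "Y" survive the merge.
result <- merge(starts, ends, by = c("ID", "admin_period"))
result$ddays <- as.numeric(result$end - result$start)
result
```

The last period of each ID has no end date and therefore drops out of the merge; whether that is desirable depends on the analysis.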
-- 
Sent from my phone. Please excuse my brevity.


Re: [R] regroup row names

2016-07-03 Thread lily li
Hi Ulrik,

Thanks. This works for one group, but how can it be done for several groups? I
tried gsub(c(),c(),df$ID), but it does not work.


On Sun, Jul 3, 2016 at 12:24 PM, Ulrik Stervbo 
wrote:

> Hi Lily,
>
> you can use gsub:
>
> df$ID <- gsub("_.*", "", df$ID)
>
> HTH
> Ulrik
>


Re: [R] regroup row names

2016-07-03 Thread Ulrik Stervbo
Hi Lily,

you can use gsub:

df$ID <- gsub("_.*", "", df$ID)
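
A small sketch of how this behaves when several prefixes are present: the pattern removes everything from the first underscore onward in every element, so no per-group call should be needed. The temp values below are invented for illustration.

```r
# Toy data frame; only the ID pattern comes from the question.
df <- data.frame(
  ID = c("aClim_st02", "aClim_st03", "aClim_st05", "bClim_st01", "bClim_st02"),
  temp = c(10.5, 11.0, 9.5, 20.0, 22.0),
  stringsAsFactors = FALSE
)

# One vectorised call handles all groups: "_.*" matches the underscore and
# everything after it, whatever the prefix is.
df$ID <- gsub("_.*", "", df$ID)

# The cleaned ID can then serve as a grouping variable.
group_means <- aggregate(temp ~ ID, data = df, FUN = mean)
group_means
```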

HTH
Ulrik



[R] regroup row names

2016-07-03 Thread lily li
I have a problem changing row names in a data frame in R. The first
column is ID, such as aClim_st02, aClim_st03, aClim_st05, bClim_st01,
bClim_st02, etc. How can I rename them so that all aClim_ rows are grouped as
aClim and all bClim_ rows as bClim? Thanks for your help.
df

ID   temp   precip   LW   SW
aClim_st02
aClim_st03
aClim_st05
bClim_st01
bClim_st02
...



Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Kevin Wamae
Hi Jeff, it’s been an uphill task working with the dataset and I am not the 
first to complain. Nonetheless, data cleaning is ongoing, and since I cannot 
wait for that to finish, I decided to make the most of the dataset as it 
stands. It appears the process may take a while.

Thanks for the script. From the output, I noticed that “result” contains the 
first and last date for each individual and does not take into account the 
variable “drug_admin”. 

ID     start   end
J1/3   1/5/09  12/25/10
R1/3   1/4/07  12/15/08
R10/1  1/4/07  3/5/12

My aim is to pick the date, for example in 2007, where drug-admin == “Y” as my 
start and the date in the subsequent year (2008 in this case) where drug-admin 
== “Y” as my end. Then, I should populate the variable “study_id” with “start” 
up to the entry just above the one whose date matches “end”, as the output 
below shows (I hope its structure is maintained as I have copied it from 
R-Studio). The goal for now is to then get difference in days between “date” 
and “study_id” and still get to keep that column for “study_id” as I might use 
it later.

From the output, it can be seen that for this individual, the dates run from 
2007 to 2008. However, for some individuals, the dates run from 2008-2009, 
2009-2010 and so on. Therefore, I need to make the script deal with all the 
years as the dates range from 2001-2016

ID    date      drug_admin  year  month  study_id
R1/3  5/11/07   Y           2007  5      5/11/07
R1/3  5/16/07               2007  5      5/11/07
R1/3  5/22/07               2007  5      5/11/07
R1/3  5/28/07               2007  5      5/11/07
R1/3  6/5/07                2007  6      5/11/07
R1/3  6/11/07               2007  6      5/11/07
R1/3  6/18/07               2007  6      5/11/07
R1/3  6/25/07               2007  6      5/11/07
R1/3  7/2/07                2007  7      5/11/07
R1/3  7/16/07               2007  7      5/11/07
R1/3  7/29/07               2007  7      5/11/07
R1/3  8/2/07                2007  8      5/11/07
R1/3  8/7/07                2007  8      5/11/07
R1/3  8/13/07               2007  8      5/11/07
R1/3  9/18/07               2007  9      5/11/07
R1/3  9/24/07               2007  9      5/11/07
R1/3  10/6/07               2007  10     5/11/07
R1/3  10/8/07               2007  10     5/11/07
R1/3  10/15/07              2007  10     5/11/07
R1/3  10/22/07              2007  10     5/11/07
R1/3  10/29/07              2007  10     5/11/07
R1/3  11/8/07               2007  11     5/11/07
R1/3  11/12/07              2007  11     5/11/07
R1/3  11/19/07              2007  11     5/11/07
R1/3  11/29/07              2007  11     5/11/07
R1/3  12/6/07               2007  12     5/11/07
R1/3  12/10/07              2007  12     5/11/07
R1/3  12/21/07              2007  12     5/11/07
R1/3  1/7/08                2008  1      5/11/07
R1/3  1/14/08               2008  1      5/11/07
R1/3  1/21/08               2008  1      5/11/07
R1/3  1/28/08               2008  1      5/11/07
R1/3  2/4/08    Y           2008  2      
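
One possible way to fill a start-date column like the one in the table above, without hard-coding years, is sketched here in base R. The miniature data frame is invented, the column name days_since_start is a placeholder, and rows are assumed to be sorted by date within each ID. Unlike the table above, a trailing "Y" row also receives its own date as study_id.

```r
# Hypothetical miniature of drug_study; values are invented for illustration.
drug_study <- data.frame(
  ID = c("R1/3", "R1/3", "R1/3", "R1/3", "J1/3", "J1/3"),
  date = as.Date(c("2007-05-11", "2007-05-16", "2007-05-22", "2008-02-04",
                   "2009-01-05", "2009-03-02")),
  drug_admin = c("Y", "", "", "Y", "", "Y"),
  stringsAsFactors = FALSE
)

# Running count of "Y" rows within each ID identifies the current period.
drug_study$admin_period <- ave(as.integer(drug_study$drug_admin == "Y"),
                               drug_study$ID, FUN = cumsum)

# Look up, for every row, the date of the "Y" row that opened its period.
key <- paste(drug_study$ID, drug_study$admin_period)
is_start <- drug_study$drug_admin == "Y"
drug_study$study_id <- drug_study$date[is_start][match(key, key[is_start])]

# Days elapsed since the period start; NA for rows before the first "Y".
drug_study$days_since_start <- as.numeric(drug_study$date - drug_study$study_id)
drug_study
```

Keeping study_id as a Date rather than a character string makes the day difference a plain subtraction.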


Regards
---
Kevin Wame 




On 7/3/16, 7:05 PM, "Jeff Newmiller"  wrote:

result <- setNames( data.frame( aggregate( date~ID, data=drug_study, FUN=min ), 
 aggregate( date~ID, data=drug_study, FUN=max )[2] ), c( "ID", "start", "end" ) 
)



Re: [R] BCa Bootstrapped regression coefficients from lmrob function not working

2016-07-03 Thread peter dalgaard

> On 03 Jul 2016, at 13:47 , varin sacha via R-help  
> wrote:
> 
> Dear R-experts,
> 
> I am trying to calculate the bootstrapped (BCa) regression coefficients for a 
> robust regression using MM-type estimator (lmrob function from robustbase 
> package).
> 
> My R code here below is showing a warning message ([1] "All values of t are 
> equal to 
> 22.2073014256803\n Can not calculate confidence intervals" NULL), I was 
> wondering if it was because I am trying to fit a robust regression with lmrob 
> function rather than a simple lm ? I mean maybe the boot.ci function does not 
> work with lmrob function ? If not, I was wondering what was going on ?

You need to review your code. You calculate a,b,c,d in the global environment 
and create newdata as a subset of Dataset, then use a,b,c,d in the formula, but 
no such variables are in newdata. AFAICT, all your bootstrap fits use the 
_same_ global values for a,b,c,d hence give the same result 1000 times...
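
To make the point concrete, here is a small sketch with simulated data and lm() standing in for lmrob(); the names Dataset, x and y are placeholders. It is meant to show that a formula built from global vectors ignores the resampled rows, while a formula built from column names does not. With the posted data, the formula would presumably be PIBparHab ~ QUALITESANSREDONDANCE + competitivite + innovation.

```r
set.seed(1)
# Simulated stand-in for the posted data set.
Dataset <- data.frame(x = rnorm(40))
Dataset$y <- 2 + 3 * Dataset$x + rnorm(40)

# Bootstrap statistic: refit the model on the resampled rows.
boot_coef <- function(formula, data, indices) {
  resampled <- data[indices, ]
  coef(lm(formula, data = resampled))
}

# Global copies, as in the original post.
a <- Dataset$y
b <- Dataset$x

idx1 <- sample(nrow(Dataset), replace = TRUE)
idx2 <- sample(nrow(Dataset), replace = TRUE)

# a and b are not columns of the resampled data, so lm() falls back to the
# global vectors and every "resample" gives the same fit.
same_fit <- identical(boot_coef(a ~ b, Dataset, idx1),
                      boot_coef(a ~ b, Dataset, idx2))

# x and y are columns, so the fit changes with the resampled rows.
fits_differ <- !identical(boot_coef(y ~ x, Dataset, idx1),
                          boot_coef(y ~ x, Dataset, idx2))

c(same_fit = same_fit, fits_differ = fits_differ)
```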

-pd


> 
> Here is the reproducible example
> 
> 
> Dataset = 
> data.frame(PIBparHab=c(43931,67524,48348,44827,52409,15245,24453,57636,28992,17102,51495,47243,40908,22494,12784,48391,44221,32514,35132,46679,106022,9817,99635,38678,49128,12876,20732,17151,19670,41053,22488,57134,83295,10660),
> 
> QUALITESANSREDONDANCE=c(1082.5,1066.6,1079.3,1079.9,1074.9,1008.6,1007.5,.3,1108.2,1109.7,1059.6,1165.1,1026.7,1035.1,997.8,1044.8,1073.6,1085.7,1083.8,1021.6,1036.2,1075.3,1069.3,1101.4,1086.9,1072.1,1166.7,983.9,1004.5,1082.5,1123.5,1094.9,1105.1,1010.8),
> 
> competitivite=c(89,83,78,73,90,71,77,85,61,67,98,82,70,43,57,78,72,79,61,71,86,63,90,75,87,64,60,56,66,80,53,91,97,62),
> 
> innovation=c(56,52,53,54,57,43,54,60,47,55,58,62,52,35,47,59,56,56,45,52,58,33,57,57,61,40,45,41,50,61,50,65,68,34))
> 
> library("robustbase")
> newdata=na.omit(Dataset)
> a=Dataset$PIBparHab
> b=Dataset$QUALITESANSREDONDANCE
> c=Dataset$competitivite
> d=Dataset$innovation
> 
> fm.lmrob=lmrob(a~b+c+d,data=newdata)
> fm.lmrob
> 
> boot.Lmrob=function(formula,data,indices) {
> d=data[indices,]
> fit=lmrob(formula,data=d)
> return(coef(fit))
> }
> 
> library(boot)
> results=boot(data=newdata, statistic=boot.Lmrob, R=1000,formula=a~b+c+d)
> boot.ci(results, type= "bca",index=2)
> 
> 
> Any help would be highly appreciated,
> S

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Jeff Newmiller
Your goal of putting character representations of dates in certain rows of a 
column is hard to imagine a use for.  Your goal of identifying start and end 
dates seems reasonable enough. It can be accomplished using aggregate from base 
R (less external dependency) or summarise from dplyr (faster, simpler syntax):

result <- setNames( data.frame( aggregate( date~ID, data=drug_study, FUN=min ), 
 aggregate( date~ID, data=drug_study, FUN=max )[2] ), c( "ID", "start", "end" ) 
)

or

library( dplyr )
result <- (   drug_study
  %>% group_by( ID )
  %>% summarise( start=min( date ), end=max( date) )
   )

-- 
Sent from my phone. Please excuse my brevity.

On July 3, 2016 5:19:01 AM PDT, Kevin Wamae  wrote:
>Hi John, attached is the file in txt. Kindly let me know if it fails
>again..
>
>Regards
>---
>Kevin Wame | Ph.D. Student (IDeAL)
>KEMRI-Wellcome Trust Collaborative Research Programme
>Centre for Geographic Medicine Research
>P.O. Box 230-80108, Kilifi, Kenya
> 
>
Re: [R] Extracting matrix from netCDF file using ncdf4 package

2016-07-03 Thread Bert Gunter
Well, yes, ... but no: there is no need to pre-define the matrix.

The following is still a (interpreted) loop, but it is fast and short.

## ex is the downloaded array, here filled with random numbers

reqX = c(35,35,40,65,95)
reqY = c(2,5,10,112,120)

out <-sapply(seq_along(reqX), function(i)ex[reqX[i],reqY[i],] )

> dim(out)
[1] 365   5
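
For anyone wanting to run the lines above as they stand, here is a self-contained sketch. The array ex is simulated with the dimensions mentioned in the thread (100 x 200 x 365); in practice it would come from ncvar_get().

```r
set.seed(42)
# Stand-in for the downloaded array: X = 100, Y = 200, T = 365.
ex <- array(rnorm(100 * 200 * 365), dim = c(100, 200, 365))

reqX <- c(35, 35, 40, 65, 95)
reqY <- c(2, 5, 10, 112, 120)

# One time series per requested (X, Y) pair; sapply binds them as columns.
out <- sapply(seq_along(reqX), function(i) ex[reqX[i], reqY[i], ])
dim(out)

# Transpose if one row per grid cell is preferred (5 x 365).
out_rows <- t(out)
dim(out_rows)
```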

You might find it useful to go through a (web) tutorial or two to
learn more about such R functionality.
Useful suggestions can be found here: https://www.rstudio.com/online-learning/#R

Cheers,
Bert


On Sat, Jul 2, 2016 at 3:43 PM, Hemant Chowdhary via R-help
 wrote:
>  I am working with a 3-dimensional netCDF file having dimensions of X=100, 
> Y=200, T=365.
> My objective is to extract time vectors of a few specific grids that may not 
> be contiguous on X and/or Y.
>
> For example, I want to extract a 5x365 matrix where 5 rows are each vectors 
> of length 365 of 5 specific X,Y combinations.
>
> For this, I am currently using the following
>
> reqX = c(35,35,40,65,95);
> reqY = c(2,5,10,112,120,120);
> nD = length(reqX)
> for(i in 1:nD){
> idX = ncvar_get(nc=myNC, varid="myVar", start=c(reqX[i],reqY[i]), 
> count=c(1,1))
> if(i==1){dX = idX} else {dX = rbind(dX,idX)}
> }
>
> Is there more elegant/faster way other than to using a For Loop like this? It 
> seems very slow when I may have to get much larger matrix where nD can be 
> more than 1000.
>
> Thank you HC
>


Re: [R] Extracting matrix from netCDF file using ncdf4 package

2016-07-03 Thread Hemant Chowdhary via R-help
Thank you both.
Yes, this is basically a matter of subsetting an array rather than 
extracting from the netCDF file. The command dX = ncvar_get(nc=myNC, 
varid="myVar") already returns the full array, and one can subset that 
array using indices.

In turn, the problem can be stated as follows. Let us say dX is a 3D array with 
dimensions 100x200x365. The objective is to extract five specific vectors of 
length 365, corresponding to reqX = c(35,35,40,65,95) and reqY = 
c(2,5,10,112,120).

dX2 = dX[reqX, reqY,] again results in an array, of 5x5x365, i.e., corresponding 
to all 25 combinations of reqX and reqY.

Somehow, I was expecting a subsetting function that returns a 5x365 matrix 
directly. If there is none, then one can extract one grid cell at a 
time and fill the pre-defined matrix as you have suggested.
Thank you again,
HC



 

On Saturday, 2 July 2016 7:26 PM, Roy Mendelssohn - NOAA Federal 
 wrote:
 

Sending this to Hemant a second time as I forgot to reply to the list.

Hi Hemant:

Well, technically the code you give below shouldn’t work, because “start” and 
“count” are supposed to have one entry per dimension of the variable.  I guess 
Pierce’s code must be very forgiving if that is working.  One thing you can do 
to speed things up is pre-allocate the array you want to create, say

> dX <- array(NA_real_, dim=c(5,365))


and then have the ncvar_get call write directly to the array:

> dX[i,] <- ncvar_get(nc=myNC, varid="myVar", start=c(reqX[i],reqY[i],1), 
> count=c(1,1,-1)) 

The second thing you can do, is to use “lapply” instead of the “for” loop, but 
I don’t know how much faster that will make your code.  The fastest however, if 
you have the memory, is to just read the array into memory:

> dX <-  ncvar_get(nc=myNC, varid="myVar")


and then use R’s subsetting abilities. You can do fancier subsetting of arrays 
in memory than you can to arrays on disk.

HTH,

-Roy


> On Jul 2, 2016, at 3:43 PM, Hemant Chowdhary via R-help 
>  wrote:
> 
> I am working with a 3-dimensional netCDF file having dimensions of X=100, 
> Y=200, T=365. 
> My objective is to extract time vectors of a few specific grids that may not 
> be contiguous on X and/or Y. 
> 
> For example, I want to extract a 5x365 matrix where 5 rows are each vectors 
> of length 365 of 5 specific X,Y combinations. 
> 
> For this, I am currently using the following 
> 
> reqX = c(35,35,40,65,95); 
> reqY = c(2,5,10,112,120,120); 
> nD = length(reqX) 
> for(i in 1:nD){ 
> idX = ncvar_get(nc=myNC, varid="myVar", start=c(reqX[i],reqY[i]), 
> count=c(1,1)) 
> if(i==1){dX = idX} else {dX = rbind(dX,idX)} 
> } 
> 
> Is there more elegant/faster way other than to using a For Loop like this? It 
> seems very slow when I may have to get much larger matrix where nD can be 
> more than 1000. 
> 
> Thank you HC
> 

**
"The contents of this message do not reflect any position of the U.S. 
Government or NOAA."
**
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new address and phone***
110 Shaffer Road
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: roy.mendelss...@noaa.gov www: http://www.pfeg.noaa.gov/

"Old age and treachery 

[R-es] Help with ggplot graphics

2016-07-03 Thread Alexa Aristizabal
Hello everyone!

I am new to R and need to make some plots for a research project. I have
explored a bit and am trying to use ggplot2, since it produces very
high-quality graphics. I have data for several years for different groups
of companies, and I want to plot them both in a single plot and in several
panels (facet_wrap), but I run into problems from the very start. This is the
code I am using:

library(reshape2)
library(ggplot2)

emp <- read.csv("C:/Users/usuario/Documents/tamano_empresas.csv",
header=TRUE, sep=";", comment.char="" , strip.white=FALSE, dec = ",")

melted = melt(emp, id.vars="Empresas")

ggplot(data=melted, aes(x=variable, y=value, col=Empresas)) + geom_line()

It produces the following error:

geom_path: Each group consists of only one observation. Do you need to
adjust the
group aesthetic?

At that point I no longer know what to do. Also, on the X axis, where the
years appear, I would like only the years to be shown, not X2003, X2004
and so on!

I would appreciate any help. The dataset is attached.

Many thanks.
Empresas;2003;2004;2005;2006;2007;2008;2009;2010;2011;2012;2013;2014;2015
UNI;1,150833;1,56;3,511667;5,15;5,268333;2,965;0,5558333;0,3116667;0,303;0,2825;0,167;0,1241667;0,227
PEQ;3,665492;4,570317;4,696733;4,7982;5,955667;5,489925;1,198842;0,6899583;0,887625;0,8387083;0,4930333;0,5391167;0,5525417
MED;0,0883;0,0883;0,09;0,324675;0,7458333;0,847;0,578;0,383;0,3325967;0,3272108;0,2367042;0,2049167;0,1696667
GRA;2,333467;2,106333;2,184683;3,079225;4,277608;4,634233;1,228358;0,8109583;1,3906;0,5731834;0,2206667;0,2099333;-0,01936667
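
A possible starting point for the two issues described above, offered as a sketch rather than a definitive answer: read.csv() with check.names=FALSE should keep the year headers as 2003, 2004, ... instead of X2003, X2004, ..., and a numeric year on the X axis together with an explicit group aesthetic lets geom_line() connect the points. The example uses an excerpt of the posted data (two groups, three years) and reshapes it with base R instead of reshape2; the column name year is a placeholder.

```r
# Excerpt of the posted data set (semicolon-separated, decimal comma).
csv_text <- "Empresas;2003;2004;2005
UNI;1,150833;1,56;3,511667
PEQ;3,665492;4,570317;4,696733"

# check.names = FALSE keeps the column names "2003", "2004", ... unchanged.
emp <- read.csv(text = csv_text, sep = ";", dec = ",",
                check.names = FALSE, stringsAsFactors = FALSE)

# Long format: one row per (company group, year), with year stored as a number.
melted <- data.frame(
  Empresas = rep(emp$Empresas, times = ncol(emp) - 1),
  year = rep(as.integer(names(emp)[-1]), each = nrow(emp)),
  value = unlist(emp[-1], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Plot only if ggplot2 is installed; one line per company group.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(melted, aes(x = year, y = value,
                          colour = Empresas, group = Empresas)) +
    geom_line()
  # p + facet_wrap(~ Empresas) would draw one panel per group instead.
}
```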
___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es

Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread John Kane
The data set did not show up. The R-help list tends to strip out most file 
types as a safety precaution.  Try renaming the file from xxx.csv to xxx.txt 
and it should come through alright.



John Kane
Kingston ON Canada


> -Original Message-
> From: kwa...@kemri-wellcome.org
> Sent: Sun, 3 Jul 2016 09:39:59 +
> To: jdnew...@dcn.davis.ca.us, r-help@r-project.org
> Subject: Re: [R] R - Populate Another Variable Based on Multiple
> Conditions | For a Large Dataset
> 
> Hi Jeff, pardon me, I was surely not making it easy. I hope this time I
> will ☺
> 
> Attached is snippet of the dataset in csv format and below is the
> R.script I have managed so far.
> 
> ---
> ---
> 
> drug_study <- read.csv("drug_study.csv", header = T); head(drug_study)
> drug_study$date <- as.Date(drug_study$date, "%m/%d/%Y")
> drug_study$study_id <- ""  #create new column
> 
> individual <- unique (drug_study$ID)  #vector of individuals
> datalength <- dim(drug_study)[1]  #number of rows in dataframe
> 
> for (i in 1:length(individual)) {
>   for (j in 1:datalength) {
>     start_admin <- drug_study[c(drug_study$ID == individual[i] &
>       drug_study$year == 2007 & drug_study$drug_admin == "Y" &
>       drug_study$month == 5),2]  #capture date of start
>     end_admin <- drug_study[(drug_study$ID == individual[i] &
>       drug_study$year == 2008 & drug_study$drug_admin == "Y" &
>       drug_study$month == 2),2]  #capture date of end
> 
>     if(drug_study[j,1] == individual[i] & drug_study[j,2] >= start_admin
>        & drug_study[j,2] < end_admin) {
>       #populate respective row if condition is met
>       drug_study[j,6] <- paste(start_admin)
>     }
>   }
> }
> ~
> ~
> 
> For this dataset, there exists three individuals, J1/3, R1/3, R10/1.
> 
> The script works for the last two individuals but not J1/3 with the error
> below:
> 
> ~
> ~
> Error in if (drug_study[j, 1] == individual[i] & drug_study[j, 2] >=
> start_admin &  :
>   argument is of length zero
> ~
> ~
> 
> I figured it’s because this individuals start_admin and end_admin dates
> aren’t captured because the if-loop fails. There’s my first problem,
> there are thousands of individuals with varying
> start_admin and end_admin dates and I need a script to capture these for
> every individual.
> 
> Secondly, the above script is taking almost an hour to run for the entire
> dataset, just for the individuals whose start_admin and end_admin dates
> can be captured by the if-loop.
> 
> I need help in coming up with a script that will tackle the problem
> taking into account the different start_admin and end_admin dates and be
> resourceful with regards to time.
> 
> Regards
> ---
> Kevin Kariuki
> 
> ###
> ###
> 
> On 7/3/16, 8:42 AM, "Jeff Newmiller"  wrote:
> 
> You are making this hard on yourself by not paying attention the Posting
> Guide listed in the footer of every email on this list. You would
> probably also find [1] helpful also.
> 
> [1]
> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
> --
> Sent from my phone. Please excuse my brevity.
> 
> On July 2, 2016 3:41:07 PM PDT, Kevin Wamae 
> wrote:
> >Hi Jeff, sorry for referring to you as Jennifer earlier, accept my
> >apologies.
>> 
> >I attached a sample dataset in the question, am afraid it must have
> >failed to attach.
>> 
> >I have attached it again..
>> 
>> 
> >Regards
> >---
> >Kevin Kariuki
>> 
>> 
> >On 7/2/16, 7:37 PM, "Jeff Newmiller"  wrote:
>> 
> >I can understand you not wanting to supply your actual data online, but
> >only you know what your data looks like so only you can create a
> >simulated data set that we could show you how to work with.
> >--
> >Sent from my phone. Please 

[R] BCa Bootstrapped regression coefficients from lmrob function not working

2016-07-03 Thread varin sacha via R-help
Dear R-experts,

I am trying to calculate the bootstrapped (BCa) regression coefficients for a 
robust regression using MM-type estimator (lmrob function from robustbase 
package).

My R code here below is showing a warning message ([1] "All values of t are 
equal to 
22.2073014256803\n Can not calculate confidence intervals" NULL), I was 
wondering if it was because I am trying to fit a robust regression with lmrob 
function rather than a simple lm ? I mean maybe the boot.ci function does not 
work with lmrob function ? If not, I was wondering what was going on ?

Here is the reproducible example


Dataset = 
data.frame(PIBparHab=c(43931,67524,48348,44827,52409,15245,24453,57636,28992,17102,51495,47243,40908,22494,12784,48391,44221,32514,35132,46679,106022,9817,99635,38678,49128,12876,20732,17151,19670,41053,22488,57134,83295,10660),

QUALITESANSREDONDANCE=c(1082.5,1066.6,1079.3,1079.9,1074.9,1008.6,1007.5,.3,1108.2,1109.7,1059.6,1165.1,1026.7,1035.1,997.8,1044.8,1073.6,1085.7,1083.8,1021.6,1036.2,1075.3,1069.3,1101.4,1086.9,1072.1,1166.7,983.9,1004.5,1082.5,1123.5,1094.9,1105.1,1010.8),

competitivite=c(89,83,78,73,90,71,77,85,61,67,98,82,70,43,57,78,72,79,61,71,86,63,90,75,87,64,60,56,66,80,53,91,97,62),

innovation=c(56,52,53,54,57,43,54,60,47,55,58,62,52,35,47,59,56,56,45,52,58,33,57,57,61,40,45,41,50,61,50,65,68,34))

library("robustbase")
newdata=na.omit(Dataset)
a=Dataset$PIBparHab
b=Dataset$QUALITESANSREDONDANCE
c=Dataset$competitivite
d=Dataset$innovation

fm.lmrob=lmrob(a~b+c+d,data=newdata)
fm.lmrob

boot.Lmrob=function(formula,data,indices) {
d=data[indices,]
fit=lmrob(formula,data=d)
return(coef(fit))
}

library(boot)
results=boot(data=newdata, statistic=boot.Lmrob, R=1000,formula=a~b+c+d)
boot.ci(results, type= "bca",index=2)


Any help would be highly appreciated,
S
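
One likely explanation is that the formula a~b+c+d refers to vectors in the global workspace rather than to columns of the resampled data frame, so every bootstrap replicate refits the same full data set. The sketch below illustrates this on simulated data, using lm() as a stand-in for lmrob() (an assumption: both are expected to resolve formula variables through model.frame in the same way). The column names y, x1, x2 and the helper boot_coef are hypothetical.

```r
set.seed(1)

# Simulated stand-in data (hypothetical column names y, x1, x2).
dat <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(30)

# Global copies of the columns, mirroring a, b, c, d in the post.
a <- dat$y
b <- dat$x1
cc <- dat$x2

# Statistic in the shape boot() expects: function(data, indices, ...).
boot_coef <- function(data, indices, formula) {
  resample <- data[indices, ]
  coef(lm(formula, data = resample))
}

# Five sets of resampled row indices, one per column.
idx <- replicate(5, sample(nrow(dat), replace = TRUE))

# Formula on the global vectors: the resample is ignored, all values are equal.
t_global <- apply(idx, 2, function(i) boot_coef(dat, i, a ~ b + cc)[2])

# Formula on column names: each replicate is fitted to its own resample.
t_columns <- apply(idx, 2, function(i) boot_coef(dat, i, y ~ x1 + x2)[2])

print(t_global)
print(t_columns)
```

If this is indeed the cause, writing the formula with the data frame's own column names (PIBparHab ~ QUALITESANSREDONDANCE + competitivite + innovation) in the call to boot should let the replicates vary.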

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] trouble double looping to generate data for a meta-analysis

2016-07-03 Thread Jim Lemon
Hi Marietta,
You may not be aware that the loop over k does nothing useful in your
example: it just runs the random variable generation 2 or 3 times for
each cycle of the outer loop, and each successive run overwrites
the one before. If you want to keep all two or three lots of values,
you will have to do something like this:

library(fGarch)  # provides rsnorm()
db<-list()
gen_sample<-function(n,k) {
 for(m in 1:length(n)) {
  for(j in 1:n[m]) {
   for(i in 1:k[m]) {
    # position of this sample in the list
    dbindx<-i+(j-1)*k[m]+(m-1)*(n[m]+k[1]+k[2])
    db[[dbindx]]<-
     matrix(c(rnorm(n[m], 100, 15),
      rsnorm(n[m], 100, 15),
      rlogis(n[m], 100, 15)),ncol=3)
   }
  }
 }
 return(db)
}
gen_sample(c(10,15),c(2,3))

Jim
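
A variation on the same idea, offered as a sketch rather than a drop-in replacement: append each sample to the list instead of computing an index by hand, then stack the pieces into one matrix. It assumes the goal is k[m] samples of size n[m] per setting, which gives the 65 rows mentioned in the original question. The function rskew is a hypothetical stand-in for fGarch::rsnorm so that the example runs without extra packages.

```r
set.seed(42)

# Hypothetical stand-in for fGarch::rsnorm(): a right-skewed variable
# with the requested mean and standard deviation.
rskew <- function(n, mean, sd) mean + sd * (rexp(n) - 1)

gen_sample <- function(n, k) {
  db <- list()
  for (m in seq_along(n)) {
    for (i in seq_len(k[m])) {
      # Appending avoids overwriting earlier samples.
      db[[length(db) + 1]] <- cbind(
        normal   = rnorm(n[m], 100, 15),
        skewed   = rskew(n[m], 100, 15),
        logistic = rlogis(n[m], 100, 15)
      )
    }
  }
  # Stack all samples into one matrix with three columns.
  do.call(rbind, db)
}

# One replication: 2 samples of 10 rows plus 3 samples of 15 rows.
one_rep <- gen_sample(n = c(10, 15), k = c(2, 3))
print(dim(one_rep))

# Ten replications, kept as a list of matrices.
reps <- replicate(10, gen_sample(c(10, 15), c(2, 3)), simplify = FALSE)
```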


On Sat, Jul 2, 2016 at 3:28 AM, Marietta Suarez  wrote:
> i'm trying to generate data for a meta analysis. 1- generate data following
> a normal distribution, 2- generate data following a skewed distribution, 3-
> generate data following a logistic distribution. i need to loop this
> because the # of studies in each meta will be either 10 or 15. k or total
> number of studies in the meta will be 5. i need to loop twice to repeat
> this process 10 times. database should be 3 columns (distributions) by 65
> rows x 10 reps
>
>
> here's my code, not sure what's not working:
> library(fGarch)
>
> #n reps =10
> rep=10
>
> #begin function here, need to vary n and k, when k=2 n=10, when k3 n=15
> fun=function(n, k){
>
>   #prepare to store data
>   data=matrix(0,nrow=10*k, ncol=3)
>   db=matrix(0,nrow=650, ncol=3)
>
>   for (j in 1:rep)
>   {
> for (i in 1:k)
> {
>   #generate data under normal, skewed, and logistic distributions here
>
>   data[,1]=rnorm(n, 100, 15)
>   data[,2]=rsnorm(n, 100, 15, 1)
>   data[,3]=rlogis(n, 100, 15)
> }
>   [j]=db
>   }
> }
>
> save=fun(10,2)
>
> Please help!!!


Re: [R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

2016-07-03 Thread Kevin Wamae
Hi Jeff, pardon me, I was surely not making it easy. I hope this time I will ☺

Attached is a snippet of the dataset in csv format, and below is the R script I 
have managed so far.

---
---

drug_study <- read.csv("drug_study.csv", header = T); head(drug_study)
drug_study$date <- as.Date(drug_study$date, "%m/%d/%Y")
drug_study$study_id <- ""  #create new column

individual <- unique (drug_study$ID)  #vector of individuals
datalength <- dim(drug_study)[1]  #number of rows in dataframe

for (i in 1:length(individual)) {
  for (j in 1:datalength) {
    # capture date of start
    start_admin <- drug_study[drug_study$ID == individual[i] &
                              drug_study$year == 2007 &
                              drug_study$drug_admin == "Y" &
                              drug_study$month == 5, 2]
    # capture date of end
    end_admin <- drug_study[drug_study$ID == individual[i] &
                            drug_study$year == 2008 &
                            drug_study$drug_admin == "Y" &
                            drug_study$month == 2, 2]

    # populate respective row if condition is met
    if (drug_study[j, 1] == individual[i] &
        drug_study[j, 2] >= start_admin &
        drug_study[j, 2] < end_admin) {
      drug_study[j, 6] <- paste(start_admin)
    }
  }
}
~
~

In this dataset there are three individuals: J1/3, R1/3 and R10/1.

The script works for the last two individuals but fails for J1/3 with the error below:

~
~
Error in if (drug_study[j, 1] == individual[i] & drug_study[j, 2] >= 
start_admin &  : 
  argument is of length zero
~
~

I figured this is because this individual's start_admin and end_admin dates 
aren't captured, so the if statement fails. That is my first problem: there 
are thousands of individuals with varying start_admin and end_admin dates, 
and I need a script that captures these for every individual.

Secondly, the above script takes almost an hour to run on the entire 
dataset, and that is just for the individuals whose start_admin and end_admin 
dates can be captured.

I need help coming up with a script that tackles the problem, takes into 
account the different start_admin and end_admin dates, and is efficient with 
regard to time.

Regards
---
Kevin Kariuki
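
One possible vectorized approach, sketched on simulated data: compute each individual's first and last drug_admin == "Y" date once, then fill study_id for all rows in a single step. It assumes the treatment window runs from an individual's first "Y" date up to (but not including) the last one, rather than matching May 2007 and February 2008 specifically. The toy data frame below is hypothetical and only reuses the column names from the script above.

```r
# Hypothetical stand-in for drug_study.csv; individual "C" has no "Y" rows.
drug_study <- data.frame(
  ID = rep(c("A", "B", "C"), each = 4),
  date = as.Date(c("2007-04-20", "2007-05-10", "2007-09-01", "2008-02-15",
                   "2007-06-01", "2007-06-20", "2008-01-05", "2008-03-01",
                   "2007-05-01", "2007-07-01", "2007-12-01", "2008-02-01")),
  drug_admin = c("N", "Y", "N", "Y",
                 "Y", "N", "Y", "N",
                 "N", "N", "N", "N"),
  stringsAsFactors = FALSE
)

# First and last "Y" date per individual, computed once instead of in a loop.
is_y <- drug_study$drug_admin == "Y"
y_dates <- split(drug_study$date[is_y], drug_study$ID[is_y])
first_y <- vapply(y_dates, function(d) as.numeric(min(d)), numeric(1))
last_y  <- vapply(y_dates, function(d) as.numeric(max(d)), numeric(1))

# Look up each row's window by ID; individuals without "Y" rows get NA.
start <- as.Date(unname(first_y[drug_study$ID]), origin = "1970-01-01")
end   <- as.Date(unname(last_y[drug_study$ID]),  origin = "1970-01-01")

# Rows inside the window get the start date; all other rows stay empty.
in_window <- !is.na(start) & drug_study$date >= start & drug_study$date < end
drug_study$study_id <- ifelse(in_window, format(start), "")

print(drug_study)
```

Because nothing is recomputed per row, this should scale much better than the nested loop, and individuals without a usable start or end date are simply left blank instead of raising an error. If ID is read in as a factor, converting it with as.character() before the lookup would be needed.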

###
###

On 7/3/16, 8:42 AM, "Jeff Newmiller"  wrote:

You are making this hard on yourself by not paying attention to the Posting Guide 
listed in the footer of every email on this list. You would probably also find 
[1] helpful. 

[1] 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
-- 
Sent from my phone. Please excuse my brevity.

On July 2, 2016 3:41:07 PM PDT, Kevin Wamae  wrote:
>Hi Jeff, sorry for referring to you as Jennifer earlier, accept my
>apologies.
>
>I attached a sample dataset in the question, am afraid it must have
>failed to attach.
>
>I have attached it again..
>
>
>Regards
>---
>Kevin Kariuki
> 
>
>On 7/2/16, 7:37 PM, "Jeff Newmiller"  wrote:
>
>I can understand you not wanting to supply your actual data online, but
>only you know what your data looks like so only you can create a
>simulated data set that we could show you how to work with. 
>-- 
>Sent from my phone. Please excuse my brevity.
>
>On July 2, 2016 2:57:39 AM PDT, Kevin Wamae 
>wrote:
>>I have a drug-trial study dataset (attached image).
>>
>>Since it's a large and complex dataset (at least to me), I hope to be
>>as clear as possible with my question.
>>The dataset is from a study where individuals are given drugs and
>>followed up over a period spanning two consecutive years. Individuals
>>do not start treatment on the same day and once they start, the
>>variable "drug-admin" is marked "x" as well as the time they stop
>>treatment in the following year.
>>There exists another variable, "study_id", that I hope to populate as
>>can 

Re: [R] lineplot.CI xaxis scale change in sciplot?

2016-07-03 Thread Jim Lemon
Hi Clemence,
I don't have sciplot installed, but the help page suggests that the
"xaxt" argument is available. Setting xaxt="n" should prevent the x axis
from being displayed, and you can then add the x axis you want. Assuming
that you want an x axis from 0 to 300 by 50:

axis(1,at=seq(0,300,by=50))

Jim
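
The same idea in a self-contained form, using base plot() as a stand-in for lineplot.CI() so that it runs without sciplot. The data values are made up, and the sketch assumes the plot's x coordinates are on the same scale as Ci (as they should be with x.cont=TRUE).

```r
# Made-up stand-in data; the real panel would be drawn by lineplot.CI().
ci    <- c(40, 90, 140, 210, 290)
photo <- c(5, 14, 22, 30, 34)

# Draw the points with the default x axis suppressed.
plot(ci, photo, xaxt = "n", xlim = c(0, 300), xlab = "Ci", ylab = "Photo")

# Add tick marks every 50 units; any custom vector of positions also works.
ticks <- seq(0, 300, by = 50)
axis(1, at = ticks)
```

In a multipanel layout, the axis() call has to follow each panel's plotting call, since it draws on the currently active panel.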


On Thu, Jun 30, 2016 at 12:04 PM, Clemence Henry
 wrote:
> Hi,
>
> I am trying to change the values of the tick marks on the xaxis of the 
> following multipanel plot (see relevant bits of script below) to increments 
> of 50 or to a custom scale (ie. 50, 100, 150, 200, 300...).
> So far I tried using xaxp or xlim both in par() or lineplot.CI(), as well as 
> axTicks and axisTicks but did not get it to work.
> Suggestions?
>
> #Plots average A/Ci for each day from ACi
> #Parameters of the panels
> par(mfcol=c(3,2), #row,col
> mar=c(2,2,1,1), #inner margin (bottom, left, top, right)
> oma=c(4,4,1,1), #outer margin (bottom, left, top, right)
> omd=c(0.1,0.8,0.1,0.95), #outer dimensions, values {0-1}, (x1, x2, y1, y2)
> xpd=NA)
>
> ...
>
>
> #PAR = 1000, Day2
> with(subset1000_2,
>  lineplot.CI(x.factor=Ci.average,
>  response=Photo,
>  group=Treatment,
>  ylab=NA,
>  xlab=NA,
>  legend=FALSE,
>  type="p",
>  x.cont=TRUE, #continuous x axis (spacing proportional to 
> values)
>  ylim=c(1,45), #range y axis
>  err.width=0.05,
>  pch = c(16,16,16), #symbols shape
>  col=c("gray84","black","gray48"),
>  fun=
>  ))
> mtext("Day2, PAR=1000", side = 3, line= -1, adj=0, at=1, cex=0.6) #subtitle
>
> 
>
> #legends
> mtext("Ci", side = 1, line= 1, outer = TRUE, cex=0.7) #x legend
> mtext("Photosynthetic rate", side = 2, line= 1, outer = TRUE, cex=0.7) #y 
> legend
> Thank you kindly for your support.
>
> Clemence