Re: [R] Create a function problem

2021-05-15 Thread Mathew Guilfoyle
Have you tried putting the a,b, and c column names you are passing in quotes 
i.e. as strings? Currently your function is expecting separate objects with 
those names.

The select function itself can accept unquoted column names, as can others in 
R, because specific processing they do in the background.  Your wrapper 
function doesn’t have that facility.  I have a very limited understanding of 
the details to explain this but it is one of the ‘gotchas’ when starting.

No doubt others will point out that the approach you are using is probably not 
the best (lacks flexibility; some column names are hard coded etc), but if it 
gets the job done, to some extent that’s all that matters.

> On 14 May 2021, at 23:06, Kai Yang via R-help  wrote:
> 

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] change frequency of wind data correctly

2020-12-06 Thread Mathew Guilfoyle
Hi Stefano

I think either of these does what you need...


1: This gets the interval column as you want it, but utilises the lubridate 
package:

library(lubridate)
mydf$interval = ceiling_date(mydf$data_POSIX, unit="30 minutes”)


2: Alternative in base R is a bit more long winded: convert the date to numeric 
(in seconds), divide by 1800 (seconds in 30min), take the ceiling, and convert 
back.

mydf$interval = as.POSIXct(ceiling(as.numeric(mydf$data_POSIX)/1800)*1800, 
origin="1970-01-01", tz="Etc/GMT-1")


Cheers


> On 2 Dec 2020, at 17:53, Stefano Sofia  
> wrote:
> 
> init_day <- as.POSIXct("2018-02-01-00-00", format="%Y-%m-%d-%H-%M", 
> tz="Etc/GMT-1")
> fin_day <- as.POSIXct("2018-02-01-02-00", format="%Y-%m-%d-%H-%M", 
> tz="Etc/GMT-1")
> mydf <- data.frame(data_POSIX=seq(init_day, fin_day, by="10 mins"))
> mydf$vmax <- round(rnorm(13, 35, 10))
> mydf$interval <- cut(mydf$data_POSIX, , breaks="30 min")
> means <- aggregate(vmax ~ interval, mydf, mean)


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] struggling with apply

2020-05-27 Thread Mathew Guilfoyle
A bit quicker:

t(pmin(t(somematrix), UB))



> On 27 May 2020, at 20:56, Bert Gunter  wrote:
> 
> Jeff: Check it!
> 
>> somematrix <- matrix(c(1,4,3,6,3,9,12,8,5,7,11,11),nrow=3,ncol=4)
>> UB=c(2.5, 5.5, 8.5, 10.5)
>> apply( somematrix, 2, function( x ) pmin( x, UB ) )
> [,1] [,2] [,3] [,4]
> [1,]1  2.5  2.5  2.5
> [2,]4  3.0  5.5  5.5
> [3,]3  8.5  5.0  8.5
> [4,]1  6.0 10.5  7.0
> 
> Not what was wanted.
> Am I missing something?
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Wed, May 27, 2020 at 12:38 PM Jeff Newmiller  >
> wrote:
> 
>> Sigh. Transpose?
>> 
>> apply( somematrix, 2, function( x ) pmin( x, UB ) )
>> 
>> On May 27, 2020 11:22:06 AM PDT, Bert Gunter 
>> wrote:
>>> Better, I think (no indexing):
>>> 
>>> t(apply(somematrix,1,function(x)pmin(x,UB)))
>>> 
>>> 
>>> Bert Gunter
>>> 
>>> "The trouble with having an open mind is that people keep coming along
>>> and
>>> sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>> 
>>> 
>>> On Wed, May 27, 2020 at 10:56 AM Rui Barradas 
>>> wrote:
>>> 
 Hello,
 
 Try pmin. And loop by column/UB index with sapply/seq_along.
 
 
 sapply(seq_along(UB), function(i) pmin(UB[i], somematrix[,i]))
 # [,1] [,2] [,3] [,4]
 #[1,]  1.0  5.5  8.5  7.0
 #[2,]  2.5  3.0  8.0 10.5
 #[3,]  2.5  5.5  5.0 10.5
 
 
 Hope this helps,
 
 Rui Barradas
 
 
 Às 18:46 de 27/05/20, Michael Ashton escreveu:
> Hi -
> 
> I have a matrix of n rows and 4 columns.
> 
> I want to cap the value in each column by a different upper bound.
>>> So,
 suppose my matrix is
> 
> somematrix <- matrix(c(1,4,3,6,3,9,12,8,5,7,11,11),nrow=3,ncol=4)
>> somematrix
>  [,1] [,2] [,3] [,4]
> [1,]16   127
> [2,]438   11
> [3,]395   11
> 
> Now I want to have the maximum value in each column described by
> UB=c(2.5, 5.5, 8.5, 10.5)
> 
> So that the right answer will look like:
>  [,1]  [,2][,3]   [,4]
> [1,]1  5.5 8.57
> [2,]2.538 10.5
> [3,]2.5   5.5  510.5
> 
> I've tried a few things, like:
> newmatrix <- apply(somematrix,c(1,2),function(x) min(UB,x))
> 
> but I can't figure out to apply the relevant element of the UB list
>>> to
 the right element of the matrix. When I run the above, for example,
>>> it
 takes min(UB,x) over all UB, so I get:
> 
> newmatrix
>  [,1] [,2] [,3] [,4]
> [1,]  1.0  2.5  2.5  2.5
> [2,]  2.5  2.5  2.5  2.5
> [3,]  2.5  2.5  2.5  2.5
> 
> I'm sure there's a simple and elegant solution but I don't know
>>> what it
 is!
> 
> Thanks in advance,
> 
> Mike
> 
> Michael Ashton, CFA
> Managing Principal
> 
> Enduring Investments LLC
> W: 973.457.4602
> C: 551.655.8006
> Schedule a Call: https://calendly.com/m-ashton
> 
> 
>  [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
>>> 
>>>  [[alternative HTML version deleted]]
>>> 
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> --
>> Sent from my phone. Please excuse my brevity.
>> 
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org  mailing list -- To 
> UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help 
> 
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html 
> 
> and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]


[R] mgcv::gamm error when combining random smooths and correlation/autoregressive term

2018-04-18 Thread Mathew Guilfoyle
I am having difficulty fitting a mgcv::gamm model that includes both a random 
smooth term (i.e. 'fs' smooth) and autoregressive errors.  Standard smooth 
terms with a factor interaction using the 'by=' option work fine.  Both on my 
actual data and a toy example (below) I am getting the same error so am 
inclined to wonder if this is either a bug or a model that gamm is simply 
unable to fit?

Any insight or suggestions would be much appreciated.

M


Example:

library('mgcv')
set.seed(1)
df = data.frame(index=rep(1:10,5), x=runif(50,0,1), subject = 
as.factor(sort(rep(1:5,10

# random intercept
m1 = gamm(x~s(index), random=list(subject=~1), data=df, method = 'REML')

#factor interaction, random intercept, AR errors
m2 = gamm(x~s(index, by=subject), random=list(subject=~1), 
correlation=corAR1(form=~index|subject), data=df, method = 'REML')

#factor interaction, random intercept and slope, AR errors
m3 = gamm(x~s(index, by=subject), random=list(subject=~index), 
correlation=corAR1(form=~index|subject), data=df, method = 'REML')

#random smooth on its own works ok
m4 = gamm(x~s(index, subject, bs='fs'), data=df, method = 'REML')


#combination of 'fs' smooth and AR term generates the error: "Error in 
matrix(0, size.cg[i], size.cg[i]) : object 'size.cg' not found"
#the grouping term ( |subject ) is redundant in corAR1 as the mgcv 
documentation indicates that gamm will assume the grouping 
#from the random smooth term; the same error happens irrespective of whether 
the grouping is included in corAR1 or not.

m5 = gamm(x~s(index, subject, bs='fs'), 
correlation=corAR1(form=~index|subject), data=df, method = 'REML')__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R help

2018-03-31 Thread Mathew Guilfoyle
When i increments to 6 (during the fifth iteration) the subsequent test of 
x[i]<=5 will produce an error since x has only five elements.

> On 31 Mar 2018, at 14:45, Henri Moolman  wrote:
> 
> Could you please provide help with something from R that I find rather
> puzzling? In the small program below x[1]=1, .  .  .  , x[5]=5. R also
> finds that x[1]<=5 is TRUE. Yet when you attempt to execute while, R does
> not seem to recognize the condition. Any thoughts on why this happens?
> 
> Regards
> 
> Henri Moolman
> 
>> x=c(1,2,3,4,5)
>> x[1]
> [1] 1
>> i=1
>> x[1]<=5
> [1] TRUE
>> while(x[i]<=5){
> + i=i+1
> + }
> Error in while (x[i] <= 5) { : missing value where TRUE/FALSE needed
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] mgcv::gam is it possible to have a 'simple' product of 1-d smooths?

2018-01-17 Thread Mathew Guilfoyle
I am trying to test out several mgcv::gam models in a scalar-on-function 
regression analysis.

The following is the 'hierarchy' of models I would like to test:

(1)  Y_i = a + integral[ X_i(t)*Beta(t) dt ]

(2)  Y_i = a + integral[ F{X_i(t)}*Beta(t) dt ]

(3)  Y_i = a + integral[ F{X_i(t),t} dt ]

equivalents for discrete data might be:

1)  Y_i = a + sum_t[ L_t * X_it * Beta_t ]

(2)  Y_i = a + sum_t[ L_t * F{X_it} * Beta_t ]

(3)  Y_i = a + sum_t[ L_t * F{X_it,t} ]

where Y_i are scalar outcomes for the i-th subject, and X_i(t) is a functional 
covariate observed at times t in [0,1,...T], and L are the quadrature weights.  
Beta() and/or F{} are the functions to be estimated.   Intuitively, model 1 is 
a linear functional model with a (potentially non-linear) time-dependent 
regression coefficient (beta()) for the covariate.  Model 2 allows for a 
non-linear function of the covariate (F{}), but which is constant over time.  
Model 3 is the full 'functional GAM' that allows for a fully flexible 
non-linear covariate- and time- dependent function.  

In my mind at least these would seem to form a natural step-by-step approach of 
increasing complexity for exploring this type of regression model.

Models 1 (linear functional) and 3 (functional GAM) are relatively 
straightforward to do with matrix arguments to mgcv::gam.  Assume N subjects 
observed at T time points.  Y is the length-N vector of scalar outcomes and X, 
T, and W are the N*T matrices of the functional predictor data values, their 
observation times, and trapezoidal quadrature weights, respectively.

Models (1) and (3) could be obtained with:

   m1 = gam(Y ~ s(T, by=I(X*W), bs='ps')
   m3 = gam(Y ~ te(X, T, by=W), bs='ps')

However, I cannot find a way to achieve model (2) where there is a 'simple' 
product of the smooth functions of X and T.  Effectively what I need (I think) 
is a way of creating a `te()` tensor product smooth but somehow constraining 
each marginal smooth to be the same for all values of the other variable?  Is 
this possible?

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] mgcv::gam is it possible to have a 'simple' product of 1-d smooths?

2018-01-17 Thread Mathew Guilfoyle
I am trying to test out several mgcv::gam models in a scalar-on-function 
regression analysis.

The following is the 'hierarchy' of models I would like to test:

(1)  Y_i = a + integral[ X_i(t)*Beta(t) dt ]

(2)  Y_i = a + integral[ F{X_i(t)}*Beta(t) dt ]

(3)  Y_i = a + integral[ F{X_i(t),t} dt ]

equivalents for discrete data might be:

1)  Y_i = a + sum_t[ L_t * X_it * Beta_t ]

(2)  Y_i = a + sum_t[ L_t * F{X_it} * Beta_t ]

(3)  Y_i = a + sum_t[ L_t * F{X_it,t} ]

where Y_i are scalar outcomes for the i-th subject, and X_i(t) is a functional 
covariate observed at times t in [0,1,...T], and L are the quadrature weights.  
Beta() and/or F{} are the functions to be estimated.   Intuitively, model 1 is 
a linear functional model with a (potentially non-linear) time-dependent 
regression coefficient (beta()) for the covariate.  Model 2 allows for a 
non-linear function of the covariate (F{}), but which is constant over time.  
Model 3 is the full 'functional GAM' that allows for a fully flexible 
non-linear covariate- and time- dependent function.  

In my mind at least these would seem to form a natural step-by-step approach of 
increasing complexity for exploring this type of regression model.

Models 1 (linear functional) and 3 (functional GAM) are relatively 
straightforward to do with matrix arguments to mgcv::gam.  Assume N subjects 
observed at T time points.  Y is the length-N vector of scalar outcomes and X, 
T, and W are the N*T matrices of the functional predictor data values, their 
observation times, and trapezoidal quadrature weights, respectively.

Models (1) and (3) could be obtained with:

m1 = gam(Y ~ s(T, by=I(X*W), bs='ps')
m3 = gam(Y ~ te(X, T, by=W), bs='ps')

However, I cannot find a way to achieve model (2) where there is a 'simple' 
product of the smooth functions of X and T.  Effectively what I need (I think) 
is a way of creating a `te()` tensor product smooth but somehow constraining 
each marginal smooth to be the same for all values of the other variable?  Is 
this possible?
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] correlation between nominal and ordinal

2017-09-01 Thread Mathew Guilfoyle
Any of the usual rank correlation methods should be fine if you're expecting a 
monotonic relationship e.g. Spearman's rho or Kendall's tau.

> On 1 Sep 2017, at 21:25, merlinverde...@infomed.sld.cu wrote:
> 
> I would be very grateful if you would tell me how I can find the degree of 
> correlation between a nominal dependent variable and an independent ordinal 
> variable. The nominal variable has only two levels: YES and NO.
> thank you very much in advance
> regards,
> merlin
> 
> 
> 
> 
> 
> 
> --
> Este mensaje le ha llegado mediante el servicio de correo electronico que 
> ofrece Infomed para respaldar el cumplimiento de las misiones del Sistema 
> Nacional de Salud. La persona que envia este correo asume el compromiso de 
> usar el servicio a tales fines y cumplir con las regulaciones establecidas
> 
> Infomed: http://www.sld.cu/
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R Programming help needed - Returning dataframes + 2 Variables dynamically

2017-07-28 Thread Mathew Guilfoyle
The returned values are in the list you assign to test_data, the original x and 
y are not modified, i.e the returned value for x will be test_data[[1]] and for 
y will be test_data[[2]].  Using the same variable names in the function and 
test is perhaps what is leading to confusion.

> On 28 Jul 2017, at 09:13, Vijaya Kumar Regati  
> wrote:
> 
> Hi,
> 
> 
> That was very useful information. Thanks.
> 
> But still I am not able to get the desired output with updated code.
> 
> Kindly help if you have any further thoughts ...
> 
> Updated code :
> x <- 0
> y <- 0
> 
> Logic_fn <- function(x,y){
> 
> print("Passed Values")
> print(x)
> print(y)
> x <- x + 1
> y <- y + 1
> 
> print("After addition :")
> print(x)
> print(y)
> 
> test_data <- rbind(x,y)
> test_data <- data.frame(test_data)
> return(list(x,y,test_data))
> }
> 
> for ( i in 1:1 ) {
> test_data <- Logic_fn(x,y)
> print("Returned Values :")
> print(x)
> print(y)
> }
> 
> 
> Wrong output :
> [1] "Passed Values"
> [1] 0
> [1] 0
> [1] "After addition :"
> [1] 1
> [1] 1
> [1] "Returned Values :"
> [1] 0
> [1] 0
> 
> 
> With Regards,
> Vijaya Kumar Regati
> Technical Lead, M3bi India Private Ltd
> Work: 040-67064732
> 
> 
> From: Koustav Pal 
> Sent: Friday, July 28, 2017 12:25:54 PM
> To: Vijaya Kumar Regati
> Cc: vijaykr@gmail.com; r-help@R-project.org
> Subject: Re: [R] R Programming help needed - Returning dataframes + 2 
> Variables dynamically
> 
> c() is used for constructing vectors. Or in other words using the method 
> c(x,y) provides a vector of length 2 (hopefully). Therefore, you are pushing 
> a single vector to your function and not two arguments as you want.
> 
> It should be Logic_fn(x,y) not Logic_fn(c(x,y)).
> 
> Furthermore, i would recommend reading up on how function constructs work. 
> Two return statements within a function are redundant because the function 
> will exit after the first return statement, and anything else just creates 
> more ambiguity. If you want to return the x y variables as well, you should 
> use a list like so, list(test_data, x, y) and then retrieve information from 
> the list like this Boo[[1]].
> 
> Finally i would also like to point out that you should note R's behaviour 
> unless explicitly stated while calling the function is to assign the first 
> argument to the first declared variable and the second to the second. So, if 
> you called Logic_fn(y,x), then y will be assigned to x and x to y. So to 
> avoid such a scenario you can mention it explicitly as Logic_fn(y=y,x=x).
> 
> 
> On Jul 28, 2017 8:39 AM, "Vijaya Kumar Regati" 
> > wrote:
> Hi,
> 
> 
> Can someone please help me on below issue I am facing :
> 
> 
> I am trying to play with returning a dataframe+2 variables using a fn.
> But facing an issue :
> 
> Error in Logic_fn(c(x, y)) : argument "y" is missing, with no default
> 
> This is the code I am using :
> 
> 
> x <- 0
> y <- 0
> 
> Logic_fn <- function(x,y){
> x <- x + 1
> y < y + 1
> test_data <- rbind(x,y)
> test_data <- data.frame(test_data)
> return(test_data)
> return(c(x,y))
> }
> 
> for ( i in 1:1) {
>  test_data <- Logic_fn(c(x,y))
> test_data[1]
> test_data[2]
> test_data
> 
> }
> 
> With Regards,
> Vijaya Kumar Regati
> 
> 
> Disclaimer: IMPORTANT NOTICE: This e-mail (including any attachments) are 
> confidential, may contain proprietary or privileged information and is 
> intended for the named recipient(s) only. If you are not the intended 
> recipient, any disclosures, use, review, distribution, printing or copying of 
> the information contained in this e-mail message and/or attachments to it are 
> strictly prohibited. If you have received this email in error, please notify 
> the sender by return e-mail or telephone immediately and permanently delete 
> the message and any attachments.
> 
>[[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To 
> UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> Disclaimer: IMPORTANT NOTICE: This e-mail (including any attachments) are 
> confidential, may contain proprietary or privileged information and is 
> intended for the named recipient(s) only. If you are not the intended 
> recipient, any disclosures, use, review, distribution, printing or copying of 
> the information contained in this e-mail message and/or attachments to it are 
> strictly prohibited. If you have received this email in error, please notify 
> the sender by return e-mail or telephone immediately and permanently delete 
> the message and any attachments.
> 
>[[alternative HTML version deleted]]
> 
> 

Re: [R] Problem in shiny writing a .txt file

2017-07-19 Thread Mathew Guilfoyle
Is res.path usually empty? If the res.path directory is empty (i.e. 
dir(res.path) is an empty vector) the file.remove operation will remove the 
directory.  This behaviour is documented in the help for file.remove.  When 
your subsequent function tries to write to that directory it does not exist 
hence the error. 

wd = tempdir()
res.path = paste0(wd,'/OUT/')
dir.create(res.path)
#this will delete the directory as res.path is empty
file.remove(paste0(res.path,dir(res.path))) 

dir.create(res.path)
file.create(paste0(res.path,'test.txt')
#this will delete the test.txt file and leave the directory in place
file.remove(paste0(res.path,dir(res.path))) 

Does your function work if you don't do the file.remove beforehand?

HTH

> On 19 Jul 2017, at 09:19, Ana Belén Marín  wrote:
> 
> Hi all!
> 
> I'm developing a shiny app and I have problems when I wanna write a .txt 
> file.
> 
> First of all, I change the directory in order to work in a temporal one:
> 
> wd   <- tempdir()
> setwd( wd )
> res.path <- paste0( wd, "/OUT/" )
> dir.create( res.path )
> 
> Just before calling the function that fails, I remove, if exist, the old 
> files of the directory:
> 
> file.remove( paste0( res.path, dir( res.path ) ) )
> 
> Then, I call f.texto, whose code is as follows:
> 
> f.texto <- function( l, res, a ){
>   myfile <- res
>   write( paste( "SAIC50. ", a, ", ", date(), sep = "" ), file = paste( 
> res.path, myfile, sep = "" ) )
>   write( "--", file = paste (res.path, 
> myfile, sep = ""), append = T )
>   write( "\r\n", file = paste( res.path, myfile, sep = "" ), append = T )
>   write( l[[6]], file = paste( res.path, myfile, sep = "" ), append = T )
>   write( "\r\n", file = paste( res.path, myfile, sep = "" ), append = T )
> }
> 
> and I get the next error the first time I run the app and choose the 
> wanted option  (then, the error disappears):
> 
> Error in file: cannot open the connection.
> 
> Warning message:
> 
> In file(file, ifelse(append, "a", "w")) :
> 
> cannot open the file 
> '/tmp/RtmpSQOYnV/OUT/Resultado-ajuste5-ic50-2017-07-19.txt': No such 
> file or director
> 
> It's like the first time running the app, the file is not created with 
> the command /write("...", file = "...") in spite of the option append is 
> false. /
> 
> Thanks for your help,
> 
> Anab.
> 
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] mgcv gam/bam model selection with random effects and AR terms

2017-04-08 Thread Mathew Guilfoyle
Would be grateful for advice on gam/bam model selection incorporating random 
effects and autoregressive terms.

I have a multivariate time series recorded on ~500 subjects at ~100 time 
points.  One of the variables (A) is the dependent and four others (B to E) are 
predictors.  My basic formula is:

[model 1]: bam(A ~ s(time)+s(B)+s(C)+s(D)+s(E))

I've then included a random intercept and a random effect for time as the 
pattern of A over time is highly variable across subjects.

[model 2]: bam(A ~ s(time)+s(B)+s(C)+s(D)+s(E)+s(id, bs='re')+s(id,time, 
bs='re'))

I expect there is also potential for autocorrelation within the time series. So:

[model 3]: bam(A ~ s(time)+s(B)+s(C)+s(D)+s(E)+s(id, bs='re')+s(id,time, 
bs='re'), AR.start = startindex, rho = 0.52)

The rho value of 0.52 was settled on by trial-and-error minimising fREML/ML 
(side question: am I correct in understanding that bam can only use a fixed rho 
rather than taking this as a value to optimise as in gamm?)

The lowest fREML or ML values are obtained by model 3 (71674 vs 72099) for 
model 2) but the highest adjusted R2/deviance explained is with model 2 (37.7 
vs 42.1%).  Model 1 is inferior to both the others on all measures.

Is it better to select the model including the AR term given the lower ML or is 
it legitimate to go with the 'simpler' model 2 that has higher R2/deviance 
explained?

I am unable to provide a fully reproducible example as I don't know how to 
generate sample data with these specific characteristics.

Many thanks
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Taking the sum of only some columns of a data frame

2017-03-31 Thread Mathew Guilfoyle
This does the summation you want in one line:

#create example data and column selection
d = as.data.frame(matrix(rnorm(50),ncol=5))
cols = c(1,3)

#sum selected columns and put results in new row
d[nrow(d)+1,cols] = colSums(d[,cols])

However, I would agree with the sentiments that this is a bad idea; far better 
to have the mean values stored in a new object leaving the original data table 
untainted.  


> On 31 Mar 2017, at 17:20, Bruce Ratner PhD  wrote:
> 
> Hi R'ers:
> Given a data.frame of five columns and ten rows. 
> I would like to take the sum of, say, the first and third columns only.
> For the remaining columns, I do not want any calculations, thus rending their 
> "values" on the "total" row blank. The sum/total row is to be combined to the 
> original data.frame, yielding a data.frame with five columns and eleven rows. 
> 
> Thanks, in advance. 
> Bruce 
> 
> 
> __
> Bruce Ratner PhD
> The Significant Statistician™
> 
> 
> 
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] lagging over consecutive pairs of rows in dataframe

2017-03-18 Thread Mathew Guilfoyle
If you are strict about your data formatting then the following is a fast way 
of calculating the differences, based on reshaping the data column:

A = matrix(mydata$rslt, nrow=2)
data.frame(exp=1:ncol(A), diff=A[2,]-A[1,])

alternatively, if the 'exp' values are not guaranteed to be sequential you can 
reshape an index:

A = matrix(1:nrow(mydata), nrow=2)
data.frame(exp=mydata$exp[A[1,]], diff=mydata$rslt[A[2,]]-mydata$rslt[A[1,]])

However, I would suggest that you have a further variable to label 'control' 
and 'treatment' groups and explicitly use this for the calculation.  Otherwise, 
if at any time you sort or reorder the data you will run into problems or 
produce erroneous results (but more than likely won't generate any actual R 
errors to alert you):

data.frame(exp = unique(mydata$exp), diff = as.vector(by(mydata, mydata$exp, 
function(x) x$rslt[x$type=='treatment']-x$rslt[x$type=='control'])))

The efficiency of the various options that have been suggested in this thread 
piqued my interest so a quick benchmark seemed in order (see below, including a 
'safer' method).  Of course, this is probably only relevant if you have huge 
datasets that you are repeatedly performing this calculation on.


library('microbenchmark')

#create some example data similar to the OP
ndata = 1000
mydata = data.frame(exp = cumsum(rep(c(1,0),ndata)),rslt=sample(1:50, size = 
ndata*2, replace = TRUE), type=rep(c('control','treatment'),ndata))

#various suggested options
diff.BG = function(mydata) {
  evens <-  (seq_len(nrow(mydata)) %% 2) == 0
  data.frame(exp = mydata[evens,1 ], diff = diff(mydata[,2])[evens[-1]])
}

diff.US = function(mydata) {
  aggregate(mydata$rslt, by = list(group = mydata$exp), FUN = diff)
}

diff.MG1 = function(mydata) {
  A = matrix(mydata$rslt, nrow=2)
  data.frame(exp=1:ncol(A), diff=A[2,]-A[1,])
}

diff.MG2 = function(mydata) {
  A = matrix(1:nrow(mydata), nrow=2)
  data.frame(exp=mydata$exp[A[1,]], diff=mydata$rslt[A[2,]]-mydata$rslt[A[1,]])
}

diff.safe = function(mydata) {
  data.frame(exp = unique(mydata$exp), diff = as.vector(by(mydata, mydata$exp, 
function(x) x$rslt[x$type=='treatment']-x$rslt[x$type=='control'])))
}

#benchmark
microbenchmark(BG=diff.BG(mydata), US=diff.US(mydata), MG1=diff.MG1(mydata), 
MG2=diff.MG2(mydata), safe=diff.safe(my data))

Unit: microseconds
 expr   min  lqmean  median  uqmax neval
   BG   273.837299.0015351.0377316.7400350.5220   2385.289   100
   US  9872.457  10511.1065  11555.6048  11108.0790  12471.8060  17609.518   100
  MG1   168.783196.8635229.9329210.9370249.4895471.381   100
  MG2   221.303237.0480265.5097254.3895280.7815418.728   100
 safe 97869.540 104164.5130 109579.9834 107199.7715 110315.8590 170028.377   100



> On 17 Mar 2017, at 14:54, Evan Cooch  wrote:
> 
> Suppose I have a dataframe that looks like the following:
> 
> n=2
> mydata <- data.frame(exp = rep(1:5,each=n), rslt = 
> c(12,15,7,8,24,28,33,15,22,11))
> mydata
>exp rslt
> 11   12
> 21   15
> 327
> 428
> 53   24
> 63   28
> 74   33
> 84   15
> 95   22
> 10   5   11
> 
> The variable 'exp' (for experiment') occurs in pairs over consecutive 
> rows -- 1,1, then 2,2, then 3,3, and so on. The first row in a pair is 
> the 'control', and the second is a 'treatment'. The rslt column is the 
> result.
> 
> What I'm trying to do is create a subset of this dataframe that consists 
> of the exp number, and the lagged difference between the 'control' and 
> 'treatment' result.  So, for exp=1, the difference is (15-12)=3. For 
> exp=2,  the difference is (8-7)=1, and so on. What I'm hoping to do is 
> take mydata (above), and turn it into
> 
>  exp  diff
> 1   1  3
> 2   2  1
> 3   3  4
> 4   4  -18
> 5   5  -11
> 
> The basic 'trick' I can't figure out is how to create a lagged variable 
> between the second row (record) for a given level of exp, and the first 
> row for that exp.  This is easy to do in SAS (which I'm more familiar 
> with), but I'm struggling with the equivalent in R. The brute force 
> approach  I thought of is to simply split the dataframe into to (one 
> even rows, one odd rows), merge by exp, and then calculate a difference. 
> But this seems to require renaming the rslt column in the two new 
> dataframes so they are different in the merge (say, rslt_cont n the odd 
> dataframe, and rslt_trt in the even dataframe), allowing me to calculate 
> a difference between the two.
> 
> While I suppose this would work, I'm wondering if I'm missing a more 
> elegant 'in place' approach that doesn't require me to split the data 
> frame and do every via a merge.
> 
> Suggestions/pointers to the obvious welcome. I've tried playing with 
> lag, and some approaches using lag in the zoo package,  but haven't 
> found the magic trick. The problem (meaning, what I can't figure out) 
> seems to 

Re: [R] Matrix

2017-03-06 Thread Mathew Guilfoyle
Effectively you want a circulant matrix but filled by column.

Example input vector and number of columns

x = c(1:8,19:20)
nc = 5

For the result you specifically describe, the following generalises for any 
vector and any arbitrary number of columns 'nc', padding with zeros as 
necessary.

matrix(c(rep(c(x,rep(0,nc)),nc-1),x), ncol=nc)


For the unpadded column circulant on the same inputs

nx = length(x)
ind = 1+ outer(0:(nx-1),0:(1-nc),'+') %% nx
matrix(x[ind], ncol=nc)


Cheers

> On 6 Mar 2017, at 16:18, Peter Thuresson  wrote:
> 
> Hello,
> 
> Is there a function in R which can transform, let say a vector:
> 
> c(1:4)
> 
> to a matrix where the vector is repeated but "projected" +1 one step down for 
> every (new) column.
> I want the output below from the vector above, like this:
> 
> p<-c(1,2,3,4,0,0,0,0,1,2,3,4,0,0,0,0,1,2,3,4,0,0,0,0,1,2,3,4)
> 
> matrix(p,7,4)
> 
> best regards / Peter
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] gamm problem/error fitting smooth by factor

2016-12-26 Thread Mathew Guilfoyle
I have a (unbalanced) dataset of time series collected across several subjects 
(n~500, ~6 observations).  I would like to model the overall smooth time 
trend of a variable and how this trend differs by various categorical factors, 
with the subject as a random effect.

My baseline model 

m1 = gamm(v1 ~ s(time), random=list(id=~1+time), data=dat)

shows that time is a significant term.

I have then tried to run this model:

m2 = gamm(v1 ~ s(time)+s(time, by=fac1), random=list(id=~1+time), data=dat)

where fac1 is a binary factor.

My intuitive understanding of this is that the first smooth term will capture 
the overall major trend common to both groups of fac1 and the second smooth 
will model the deviation from this mean trend for the fac1==1 subgroup.  
However, I get the error:

 Error in MEestimate(lmeSt, grps) : 
  Singularity in backsolve at level 0, block 1 

I've tried other models to isolate the problem but get the exact same error

m3 = gamm(v1 ~ s(time)+s(time, by=fac1), random=list(id=~1), data=dat)   
#remove the time random effect
m4 = gamm(v1 ~ fac1+s(time)+s(time, by=fac1), random=list(id=~1+time), 
data=dat)   #have the factor as a main effect also

I'm not sure if the whole notion of what I'm trying to do is wrong-headed or if 
I need to adjust some parameters to get the model (m2) to fit.

Thanks

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.