Re: [R] Should there be an R-beginners list?

2013-11-27 Thread Joris Meys
StackOverflow certainly has its merits, although I miss a bit the good ol'
Oxford sarcasm gems you find on this list.

That said: a beginners' list is a bad, bad idea. The first rule in my classes
is RTFI (Read The Fucking Internet). Anybody using R should be able to do a
basic Google search, and a beginners' list is not going to help them learn
that.

If beginners make the effort of following the posting guidelines, netiquette
or any other guide to getting help on the internet, they can safely use
this list.

Cheers
Joris




On Wed, Nov 27, 2013 at 9:47 AM, Rolf Turner wrote:

> On 11/25/13 09:04, Rich Shepard wrote:
>
>> On Sun, 24 Nov 2013, Yihui Xie wrote:
>>
>>> Mailing lists are good for a smaller group of people, and especially
>>> good when more focused on discussions on development (including bug
>>> reports). The better place for questions is a web forum.
>>>
>>
>>   I disagree. Mail lists push messages to subscribers, while web fora
>> require one to use a browser, log in, then pull messages. Not nearly as
>> convenient.
>>
>
> Well expressed Rich.  I agree with you completely.
>
> cheers,
>
> Rolf Turner
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tinn-R user guide (latex sources) available on GitHub

2013-11-27 Thread Joris Meys
Thank you for doing this!


On Wed, Nov 27, 2013 at 11:22 AM, Jose Claudio Faria <
joseclaudio.fa...@gmail.com> wrote:

> Dear users,
>
> The Tinn-R User Guide is written entirely in LaTeX, and the idea behind
> making it available on GitHub is that it can receive contributions
> from multiple users.
>
> If you find something that you would like to include or improve:
> please, feel free to make it better.
>
> This User Guide has been developed under both operating systems: Windows and Linux.
>
> Under Windows: we have been using Tinn-R as editor and MikTeX as compiler.
>
> Under Linux: we have been using Vim (with LaTeX-Box plugin) as editor
> and TexLive as compiler.
>
> Link: https://github.com/jcfaria/Tinn-R-User-Guide
>
> Regards,
> --
> ///\\\///\\\///\\\///\\\///\\\///\\\///\\\///\\\
> Jose Claudio Faria
> Estatistica
> UESC/DCET/Brasil
> joseclaudio.faria at gmail.com
> Telefones:
> 55(73)3680.5545 - UESC
> 55(73)9100.7351 - TIM
> 55(73)8817.6159 - OI
> ///\\\///\\\///\\\///\\\///\\\///\\\///\\\///\\\
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problems when Apply a script to a list

2010-08-27 Thread Joris Meys
Where exactly did you put the sink() statement? I tried it with 1000
data frames and had no problem whatsoever.

Cheers
Joris

On Fri, Aug 27, 2010 at 6:56 AM,   wrote:
> Joris,
> thank you very much for your help.
> It is very helpful for me.
> I still have a problem with the sink stack, although I put sink() after my
> last printed result.
> I want to ask you one more question. Do you know how I can have, in the
> sink output file, a message saying which set of the list the results
> belong to? When I used for instead of apply I had
> cat("\n***\nEstimation \n\nDataset
> Sim : ",
>          i )
> Thank you in advance
> Evgenia
> Joris Meys writes:
>>
>> Answers below.
>> On Thu, Aug 26, 2010 at 11:20 AM, Evgenia  wrote:
>>>
>>> Dear users,
>>> ***I have a function f to simulate data from a model (example below
>>> used
>>> only to show my problems)
>>> f<-function(n,mean1){
>>> a<-matrix(rnorm(n, mean1 , sd = 1),ncol=5)
>>> b<-matrix(runif(n),ncol=5)
>>> data<-rbind(a,b)
>>> out<-data
>>> out}
>>> *I want to simulate 1000 datasets (here only 5) so I use
>>> S<-list()
>>> for (i in 1:5){
>>> S[[i]]<-f(n=10,mean1=0)}
>>> **I have a very complicated function  for estimation of a model which
>>> I
>>> want to apply to Each one of the above simulated datasets
>>> fun<-function(data){data<-as.matrix(data)
>>> sink(' Example.txt',append=TRUE)
>>>          cat("\n***\nEstimation
>>> \n\nDataset Sim : ",
>>>            i )
>>> d<-data%*%t(data)
>>> s<-solve(d)
>>> print(s)
>>> out<-s
>>> out
>>> }
>>> results<-list()
>>> for (i in 1:5){results[[i]]<-fun(data=S[[i]])}
>>>
>>> My questions are:
>>> 1) for some datasets system is computational singular and this causes
>>> execution of the for to stop.By this way I have only results until this
>>> problem happens.How can I pass over the execution for this step and have
>>> results for All other datasets for which function fun is applicable?
>>
>> see ?try, or ?tryCatch.
>> I'd do something along the lines of
>> for(i in 1:5){
>>     tmp <- try(fun(data=S[[i]]))
>>     results[[i]] <- if (is(tmp, "try-error")) NA else tmp
>> }
>> Alternatively, you could also use lapply :
>> results <- lapply(S, function(x){
>>   tmp <- try(fun(data=x))
>>   if (is(tmp, "try-error")) NA else tmp
>> })
>>>
>>> 2) After some runs of my program, I receive the error message: Error in
>>> sink("output.txt") : sink stack is full. How can I solve this problem, as I
>>> want to have results of my program for 1000 datasets.
>>
>> That is because you never close the sink. Every call to sink('file')
>> pushes a new diversion onto the sink stack; add a matching sink() after
>> the last line you want in the file to pop it again. Otherwise the stack
>> grows with every call to fun(), and eventually it is full.
>>>
>>> 3) Is using for the correct way to run my program for a list?
>>
>> See the lapply solution.
>>>
>>> Thanks a lot
>>> Evgenia
>>> --
>>> View this message in context:
>>> http://r.789695.n4.nabble.com/Problems-when-Apply-a-script-to-a-list-tp2339403p2339403.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Joris Meys
>> Statistical consultant
>> Ghent University
>> Faculty of Bioscience Engineering
>> Department of Applied mathematics, biometrics and process control
>> tel : +32 9 264 59 87
>> joris.m...@ugent.be
>> ---
>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>
>
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Passing arguments between S4 methods fails within a function:bug? example with raster package.

2010-08-26 Thread Joris Meys
Dear all,

This problem came up initially while debugging a function, but it
seems to be a more general problem in R. I hope I'm wrong, but I can't
find another explanation. Let me illustrate with the raster package.

For an object "RasterLayer" (which inherits from Raster), there is a
method xyValues defined with the signature
(object="RasterLayer", xy="matrix"). There is also a method with
signature (object="Raster", xy="vector"). The only thing the latter
method does is change xy into a matrix and then pass on to the former
using callGeneric; the remaining arguments are passed along.

Now this all works smoothly, as long as you stay in the global environment :
require(raster)

a <- raster()
a[] <- 1:ncell(a)

origin <- c(-80,50)
eff.dist <- 10

unlist(xyValues(a,xy=origin,buffer=eff.dist))
[1] 14140 14141 14500 14501

Now let's make a very basic test function :

test <- function(x,orig.point){
eff.distance <- 10
p <- unlist(xyValues(x,xy=orig.point,buffer=eff.distance))
return(p)
}

This gives the following result :
> test(a,origin)
Error in .local(object, xy, ...) : object 'eff.distance' not found

Huh? Apparently, eff.distance got lost somewhere in the parse tree (am
I saying this correctly?)

The funny thing is when we change origin to a matrix :
> origin <- matrix(origin,ncol=2)

> unlist(xyValues(a,xy=origin,buffer=eff.dist))
[1] 14140 14141 14500 14501

> test(a,origin)
[1] 14140 14141 14500 14501

It all works again! So something goes wrong with passing the arguments
from one method to another using callGeneric. Is this a bug in R or am
I missing something obvious?
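For reference, the dispatch pattern can be reduced to a self-contained sketch. The generic `vals` and the wrapper `test_fun` below are made up for illustration and are not part of raster; the vector method coerces and re-dispatches exactly like the raster code quoted further down.

```r
library(methods)

# Made-up generic mimicking the xyValues pattern: the vector method
# coerces xy to a matrix and re-dispatches, forwarding extra
# arguments through '...'.
setGeneric("vals", function(object, xy, ...) standardGeneric("vals"))

setMethod("vals", signature(object = "numeric", xy = "numeric"),
  function(object, xy, ...) {
    callGeneric(object, matrix(xy, ncol = 2), ...)
  })

setMethod("vals", signature(object = "numeric", xy = "matrix"),
  function(object, xy, buffer = NULL) {
    if (is.null(buffer)) "no buffer" else buffer
  })

test_fun <- function(x, orig) {
  # 'buffer' is a promise bound in this function's frame,
  # just like eff.distance in the failing example above
  eff.distance <- 10
  vals(x, xy = orig, buffer = eff.distance)
}
```

In current R versions `test_fun(1, c(-80, 50))` returns 10, i.e. the forwarded promise is evaluated correctly, so the behaviour described above may well have been specific to the R/raster versions of the time.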

The relevant code from the raster package :

setMethod("xyValues", signature(object='Raster', xy='vector'),
    function(object, xy, ...) {
        if (length(xy) == 2) {
            callGeneric(object, matrix(xy, ncol=2), ...)
        } else {
            stop('xy coordinates should be a two-column matrix or data.frame, or a vector of two numbers.')
        }
    }
)

setMethod("xyValues", signature(object='RasterLayer', xy='matrix'),
    function(object, xy, method='simple', buffer=NULL, fun=NULL, na.rm=TRUE) {

        if (dim(xy)[2] != 2) {
            stop('xy has wrong dimensions; it should have 2 columns')
        }

        if (!is.null(buffer)) {
            return( .xyvBuf(object, xy, buffer, fun, na.rm=na.rm) )
        }

        if (method=='bilinear') {
            return(.bilinearValue(object, xy))
        } else if (method=='simple') {
            cells <- cellFromXY(object, xy)
            return(.readCells(object, cells))
        } else {
            stop('invalid method argument. Should be simple or bilinear.')
        }
    }
)



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problems when Apply a script to a list

2010-08-26 Thread Joris Meys
Answers below.

On Thu, Aug 26, 2010 at 11:20 AM, Evgenia  wrote:
>
> Dear users,
>
> ***I have a function f to simulate data from a model (example below used
> only to show my problems)
>
> f<-function(n,mean1){
> a<-matrix(rnorm(n, mean1 , sd = 1),ncol=5)
> b<-matrix(runif(n),ncol=5)
> data<-rbind(a,b)
> out<-data
> out}
>
> *I want to simulate 1000 datasets (here only 5) so I use
> S<-list()
>
> for (i in 1:5){
> S[[i]]<-f(n=10,mean1=0)}
>
> **I have a very complicated function  for estimation of a model which I
> want to apply to Each one of the above simulated datasets
>
> fun<-function(data){data<-as.matrix(data)
> sink(' Example.txt',append=TRUE)
>          cat("\n***\nEstimation
> \n\nDataset Sim : ",
>            i )
> d<-data%*%t(data)
> s<-solve(d)
> print(s)
> out<-s
> out
> }
> results<-list()
> for (i in 1:5){results[[i]]<-fun(data=S[[i]])}
>
>
> My questions are:
> 1) for some datasets system is computational singular and this causes
> execution of the for to stop.By this way I have only results until this
> problem happens.How can I pass over the execution for this step and have
> results for All other datasets for which function fun is applicable?

see ?try, or ?tryCatch.

I'd do something along the lines of
for(i in 1:5){
  tmp <- try(fun(data=S[[i]]))
  results[[i]] <- if (is(tmp, "try-error")) NA else tmp
}

Alternatively, you could also use lapply :
results <- lapply(S, function(x){
  tmp <- try(fun(data=x))
  if (is(tmp, "try-error")) NA else tmp
})
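For what it's worth, tryCatch can collapse the try-then-test pattern into one step. A minimal sketch, with made-up stand-ins for S and fun so the example is self-contained:

```r
# Stand-in data: the middle matrix is singular, like the problem datasets.
S <- list(diag(2), matrix(1, 2, 2), 3 * diag(2))
fun <- function(data) solve(data %*% t(data))

# tryCatch turns any error directly into NA, so every dataset gets processed.
results <- lapply(S, function(x) {
  tryCatch(fun(data = x), error = function(e) NA)
})

failed <- vapply(results, function(r) identical(r, NA), logical(1))
# failed is FALSE TRUE FALSE: only the singular dataset produced NA
```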

>
> 2) After some runs of my program, I receive the error message: Error in
> sink("output.txt") : sink stack is full. How can I solve this problem, as I
> want to have results of my program for 1000 datasets.

That is because you never close the sink. Every call to sink('file') pushes
a new diversion onto the sink stack; add a matching sink() after the last
line you want in the file to pop it again. Otherwise the stack grows with
every call to fun(), and eventually it is full.
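One way to make that robust (a sketch; `log_result` is a hypothetical name, not part of the original code) is to pair every sink(file) with an on.exit(sink()), so the diversion is popped even when solve() fails on a singular matrix:

```r
log_result <- function(data, i, file = "Example.txt") {
  sink(file, append = TRUE)
  on.exit(sink())   # runs on normal exit AND on error: the stack stays balanced
  cat("\n*** Estimation, dataset", i, ":\n")
  s <- solve(data %*% t(data))
  print(s)
  invisible(s)
}
```

After a failing call, sink.number() is back at 0, so the stack can never fill up over 1000 datasets.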

>
> 3) Is using for the correct way to run my program for a list?
See the lapply solution.

>
> Thanks a lot
>
> Evgenia
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Problems-when-Apply-a-script-to-a-list-tp2339403p2339403.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Change value of a slot of an S4 object within a method.

2010-08-26 Thread Joris Meys
OK, I admit it will never win a beauty prize, but I won't either, so
we go pretty well together.

Seriously, I am aware this is about the worst way to solve such a
problem. But as I said, rewriting the class definitions wasn't an
option any more. I'll definitely take your advice for the next
project.

Cheers
Joris

On Wed, Aug 25, 2010 at 9:36 PM, Steve Lianoglou
 wrote:
> Howdy,

> My eyes start to gloss over on their first encounter of `substitute`
> ... add enough `eval`s and (even) an `expression` (!) to that, and
> you'll probably see me running for the hills ... but hey, if it makes
> sense to you, more power to you ;-)
>
> Glad you found a fix that works,
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Change value of a slot of an S4 object within a method.

2010-08-25 Thread Joris Meys
Hi Steve,

thanks for the tip. I'll definitely take a closer look at your
solution for future use. But right now I don't
have the time to start rewriting my class definitions.

Luckily, I found where exactly things were going wrong. After reading
the documentation on evaluation in R, I figured out that I have to
explicitly specify the environment where substitute should look, as
parent.frame(1). I still don't completely understand why exactly, but
it does the job.

Thus :
eval(eval(substitute(expression(object@extra[[name]] <<- value))))

should become :

eval(
  eval(
    substitute(
      expression(object@extra[[name]] <<- value),
      env = parent.frame(1)
    )
  )
)

Tried it out and it works.
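What parent.frame() buys you can be seen in a tiny stand-alone demo (g and h are made-up names): by default substitute() consults the current evaluation frame, where the name is unbound, while an explicit env makes it consult the frame where the binding actually lives.

```r
g <- function() {
  value <- 42
  h()
}
h <- function() {
  here   <- substitute(value)                        # unbound in h: stays a symbol
  caller <- substitute(value, env = parent.frame())  # bound in g: replaced by 42
  list(here = here, caller = caller)
}
res <- g()
# res$here is the symbol `value`; res$caller is the number 42
```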

Cheers
Joris

On Wed, Aug 25, 2010 at 6:21 PM, Steve Lianoglou
 wrote:
> Hi Joris,
>
> On Wed, Aug 25, 2010 at 11:56 AM, Joris Meys  wrote:
>> Dear all,
>>
>> I have an S4 class with a slot "extra" which is a list. I want to be
>> able to add an element called "name" to that list, containing the
>> object "value" (which can be a vector, a dataframe or another S4
>> object)
>>
>> Obviously
>>
>> setMethod("add.extra", signature=c("PM10Data","character","vector"),
>>   function(object, name, value){
>>     object@extra[[name]] <- value
>>   }
>> )
>>
>> Contrary to what I would expect, the line :
>> eval(eval(substitute(expression(object@extra[[name]] <<- value))))
>>
>> gives the error :
>> Error in object@extra[[name]] <<- value : object 'object' not found
>>
>> Substitute apparently doesn't work any more in S4 methods...
>>
>>  I found a work-around by calling the initializer every time, but this
>> influences the performance. Returning the object is also not an
>> option, as I'd have to remember to assign that one each time to the
>> original name.
>>
>> Basically I'm trying to do some call by reference with S4, but don't
>> see how I should do that. How would I be able to tackle this problem
>> in an efficient and elegant way?
>
> In lots of my own S4 classes I define a slot called ".cache" which is
> an environment for this exact purpose.
>
> Using this solution for your scenario might look something like this:
>
> setMethod("add.extra", signature=c("PM10Data","character","vector"),
> function(object, name, value) {
>  object@.cache$extra[[name]] <- value
> })
>
> I'm not sure what your entire problem looks like, but to "get" your
> extra list, or a value from it, you could:
>
> setMethod("extra", signature="PM10Data",
> function(object, name=NULL) {
>  if (!is.null(name)) {
>    object@.cache$extra[[name]]
>  } else {
>    object@.cache$extra
>  }
> })
>
> ... or something like that.
>
> The last thing you have to be careful of is to make sure
> that each new("PM10Data") object you have initializes its *own* cache:
>
> setClass("PM10Data", representation(..., .cache='environment'))
> setMethod("initialize", "PM10Data",
> function(.Object, ..., .cache=new.env()) {
>  callNextMethod(.Object, .cache=.cache, ...)
> })
>
> Does that make sense?
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
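The reason the .cache trick works is that environments are the one kind of R object handled by reference: mutating one inside a function is visible to the caller, with no copy made. A minimal demonstration (names made up):

```r
e <- new.env()
e$extra <- list()

add_item <- function(env, name, value) {
  env$extra[[name]] <- value   # mutates the caller's environment in place
  invisible(env)
}

add_item(e, "scores", 1:3)
# e$extra$scores is now 1:3, even though nothing was reassigned by the caller
```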

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Change value of a slot of an S4 object within a method.

2010-08-25 Thread Joris Meys
Dear all,

I have an S4 class with a slot "extra" which is a list. I want to be
able to add an element called "name" to that list, containing the
object "value" (which can be a vector, a dataframe or another S4
object)

Obviously

setMethod("add.extra", signature=c("PM10Data","character","vector"),
  function(object, name, value){
    object@extra[[name]] <- value
  }
)

Contrary to what I would expect, the line :
eval(eval(substitute(expression(object@extra[[name]] <<- value))))

gives the error :
Error in object@extra[[name]] <<- value : object 'object' not found

Substitute apparently doesn't work any more in S4 methods...

I found a work-around by calling the initializer every time, but this
hurts performance. Returning the object is not an option either, as I'd
have to remember to assign it each time to the original name.

Basically I'm trying to do some call by reference with S4, but don't
see how I should do that. How would I be able to tackle this problem
in an efficient and elegant way?

Thank you in advance
Cheers
Joris

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] strange error : isS4(x) in gamm function (mgcv package). Variable in data-frame not recognized???

2010-07-28 Thread Joris Meys
Follow up:

I finally succeeded in more or less reproducing the error. The origin
lay in the fact that I accidentally loaded a function while being in
browser mode for debugging that function, so something went very much
wrong with the namespaces. Teaches me right...

Cheers
Joris

On Wed, Jul 28, 2010 at 2:27 PM, Joris Meys  wrote:
> Dear all,
>
> it gets even more weird. After restarting R, the code I used works
> just fine. The call is generated in a function that I debugged using
> browser(). Problem is solved, but I have no clue whatsoever how that
> error came about. It must have something to do with namespaces, but
> the origin is dark. I tried to regenerate the error, but didn't
> succeed.
>
> Does somebody have an idea where I should look for a cause?
>
> Cheers
> Joris
>
> On Wed, Jul 28, 2010 at 1:16 PM, Joris Meys  wrote:
>> Dear all,
>>
>> I run a gamm with following call :
>>
>> result <- try(gamm(values~ s( VM )+s( RH )+s( TT )+s( PP
>> )+RF+weekend+s(day)+s(julday) ,correlation=corCAR1(form=~ day|month
>> ),data=tmp) )"
>>
>> with mgcv version 1.6.2
>>
>> No stress about the data, the error is not data-related. I get :
>>
>> Error in isS4(x) : object 'VM' not found
>>
>> How so? I did define the data frame to be used, and the data frame
>> contains a variable VM :
>>
>>> str(tmp)
>> 'data.frame':   4014 obs. of  12 variables:
>>  $ values : num  73.45 105.45 74.45 41.45 -4.55 ...
>>  $ dates  :Class 'Date'  num [1:4014] 9862 9863 9864 9865 9866 ...
>>  $ year   : num  -5.65 -5.65 -5.65 -5.65 -5.65 ...
>>  $ day    : num  -178 -177 -176 -175 -174 ...
>>  $ month  : Factor w/ 156 levels "1996-April","1996-August",..: 17 17
>> 17 17 17 17 17 17 17 17 ...
>>  $ julday : num  -2241 -2240 -2239 -2238 -2237 ...
>>  $ weekend: num  -0.289 -0.289 -0.289 0.711 0.711 ...
>>  $ VM     : num  0.139 -1.451 0.349 0.839 -0.611 ...
>>  $ RH     : num  55.2 61.4 59.8 64.1 60.7 ...
>>  $ TT     : num  -23.4 -23.6 -19.5 -16.1 -15.3 ...
>>  $ PP     : num  6.17 4.27 -4.93 -9.23 -2.63 ...
>>  $ RF     : Ord.factor w/ 3 levels "None"<"<2.5mm"<..: 1 1 1 1 1 1 1 1 1 1 
>> ...
>>  - attr(*, "means")= num
>>
>> Any idea what I'm missing here?
>>
>> Cheers
>> Joris
>>
>>
>> --
>> Joris Meys
>> Statistical consultant
>>
>> Ghent University
>> Faculty of Bioscience Engineering
>> Department of Applied mathematics, biometrics and process control
>>
>> tel : +32 9 264 59 87
>> joris.m...@ugent.be
>> ---
>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>>
>
>
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
>
> tel : +32 9 264 59 87
> joris.m...@ugent.be
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] strange error : isS4(x) in gamm function (mgcv package). Variable in data-frame not recognized???

2010-07-28 Thread Joris Meys
Dear all,

it gets even more weird. After restarting R, the code I used works
just fine. The call is generated in a function that I debugged using
browser(). Problem is solved, but I have no clue whatsoever how that
error came about. It must have something to do with namespaces, but
the origin is dark. I tried to regenerate the error, but didn't
succeed.

Does somebody have an idea where I should look for a cause?

Cheers
Joris

On Wed, Jul 28, 2010 at 1:16 PM, Joris Meys  wrote:
> Dear all,
>
> I run a gamm with following call :
>
> result <- try(gamm(values~ s( VM )+s( RH )+s( TT )+s( PP
> )+RF+weekend+s(day)+s(julday) ,correlation=corCAR1(form=~ day|month
> ),data=tmp) )"
>
> with mgcv version 1.6.2
>
> No stress about the data, the error is not data-related. I get :
>
> Error in isS4(x) : object 'VM' not found
>
> How so? I did define the data frame to be used, and the data frame
> contains a variable VM :
>
>> str(tmp)
> 'data.frame':   4014 obs. of  12 variables:
>  $ values : num  73.45 105.45 74.45 41.45 -4.55 ...
>  $ dates  :Class 'Date'  num [1:4014] 9862 9863 9864 9865 9866 ...
>  $ year   : num  -5.65 -5.65 -5.65 -5.65 -5.65 ...
>  $ day    : num  -178 -177 -176 -175 -174 ...
>  $ month  : Factor w/ 156 levels "1996-April","1996-August",..: 17 17
> 17 17 17 17 17 17 17 17 ...
>  $ julday : num  -2241 -2240 -2239 -2238 -2237 ...
>  $ weekend: num  -0.289 -0.289 -0.289 0.711 0.711 ...
>  $ VM     : num  0.139 -1.451 0.349 0.839 -0.611 ...
>  $ RH     : num  55.2 61.4 59.8 64.1 60.7 ...
>  $ TT     : num  -23.4 -23.6 -19.5 -16.1 -15.3 ...
>  $ PP     : num  6.17 4.27 -4.93 -9.23 -2.63 ...
>  $ RF     : Ord.factor w/ 3 levels "None"<"<2.5mm"<..: 1 1 1 1 1 1 1 1 1 1 ...
>  - attr(*, "means")= num
>
> Any idea what I'm missing here?
>
> Cheers
> Joris
>
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
>
> tel : +32 9 264 59 87
> joris.m...@ugent.be
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] strange error : isS4(x) in gamm function (mgcv package). Variable in data-frame not recognized???

2010-07-28 Thread Joris Meys
Dear all,

I run a gamm with following call :

result <- try(gamm(values~ s( VM )+s( RH )+s( TT )+s( PP
)+RF+weekend+s(day)+s(julday) ,correlation=corCAR1(form=~ day|month
),data=tmp) )"

with mgcv version 1.6.2

No stress about the data, the error is not data-related. I get :

Error in isS4(x) : object 'VM' not found

How so? I did define the data frame to be used, and the data frame
contains a variable VM :

> str(tmp)
'data.frame':   4014 obs. of  12 variables:
 $ values : num  73.45 105.45 74.45 41.45 -4.55 ...
 $ dates  :Class 'Date'  num [1:4014] 9862 9863 9864 9865 9866 ...
 $ year   : num  -5.65 -5.65 -5.65 -5.65 -5.65 ...
 $ day: num  -178 -177 -176 -175 -174 ...
 $ month  : Factor w/ 156 levels "1996-April","1996-August",..: 17 17
17 17 17 17 17 17 17 17 ...
 $ julday : num  -2241 -2240 -2239 -2238 -2237 ...
 $ weekend: num  -0.289 -0.289 -0.289 0.711 0.711 ...
 $ VM : num  0.139 -1.451 0.349 0.839 -0.611 ...
 $ RH : num  55.2 61.4 59.8 64.1 60.7 ...
 $ TT : num  -23.4 -23.6 -19.5 -16.1 -15.3 ...
 $ PP : num  6.17 4.27 -4.93 -9.23 -2.63 ...
 $ RF : Ord.factor w/ 3 levels "None"<"<2.5mm"<..: 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "means")= num

Any idea what I'm missing here?

Cheers
Joris


-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] random sample from arrays

2010-07-09 Thread Joris Meys
Could you elaborate?

Both

 x <- 1:4
 set <- matrix(nrow = 50, ncol = 11)
 for (i in 1:11) {
   set[, i] <- sample(x, 50, replace = TRUE)
   print(c(i, "->", set), quote = FALSE)
 }

and

 x <- 1:4
 set <- matrix(nrow = 50, ncol = 11)
 for (i in 1:50) {
   set[i, ] <- sample(x, 11, replace = TRUE)
   print(c(i, "->", set), quote = FALSE)
 }

run perfectly fine on my computer.
Cheers


On Fri, Jul 9, 2010 at 3:10 PM, Assa Yeroslaviz  wrote:
> Hi Joris,
> I guess I did it wrong again.
> but your example didn't work either. I still get the error message.
>
> but replicate works just fine. I can even replicate whole array
> lines.
>
> THX
>
> Assa
>
> On Thu, Jul 8, 2010 at 15:20, Joris Meys  wrote:
>>
>> Don't know what exactly you're trying to do, but you make a matrix
>> with 11 columns and 50 rows, then treat it as a vector. On top of
>> that, you try to fill 50 rows/columns with 50 values. Of course that
>> doesn't work. Did you check the warning messages when running the
>> code?
>>
>> Either do :
>>
>>  for (i in 1:11) {
>>    set[, i] <- sample(x, 50, replace = TRUE)
>>    print(c(i, "->", set), quote = FALSE)
>>  }
>>
>> or
>>
>>  for (i in 1:50) {
>>    set[i, ] <- sample(x, 11, replace = TRUE)
>>    print(c(i, "->", set), quote = FALSE)
>>  }
>>
>> Or just forget about the loop altogether and do :
>>
>> set <- replicate(11, sample(x, 50, replace = TRUE))
>> or
>> set <- t(replicate(50, sample(x, 11, replace = TRUE)))
>>
>> cheers
>>
>> On Thu, Jul 8, 2010 at 8:04 AM, Assa Yeroslaviz  wrote:
>> > Hello R users,
>> >
>> > I'm trying to extract random samples from a big array I have.
>> >
>> > I have a data frame of over 40k lines and would like to produce around
>> > 50 random samples of around 200 lines each from this array.
>> >
>> > this is the matrix
>> >          ID xxx_1c xxx__2c xxx__3c xxx__4c xxx__5T xxx__6T xxx__7T
>> > xxx__8T
>> > yyy_1c yyy_1c _2c
>> > 1 A_512  2.150295  2.681759  2.177138  2.142790  2.115344  2.013047
>> > 2.115634  2.189372  1.643328  1.563523
>> > 2 A_134 12.832488 12.596373 12.882581 12.987091 11.956149 11.994779
>> > 11.650336 11.995504 13.024494 12.776322
>> > 3 A_152  2.063276  2.160961  2.067549  2.059732  2.656416  2.075775
>> > 2.033982  2.111937  1.606340  1.548940
>> > 4 A_163  9.570761 10.448615  9.432859  9.732615 10.354234 10.993279
>> > 9.160038  9.104121 10.079177  9.828757
>> > 5 A_184  3.574271  4.680859  4.517047  4.047096  3.623668  3.021356
>> > 3.559434  3.156093  4.308437  4.045098
>> > 6 A_199  7.593952  7.454087  7.513013  7.449552  7.345718  7.367068
>> > 7.410085  7.022582  7.668616  7.953706
>> > ...
>> >
>> > I tried to do it with a for loop:
>> >
>> > genelist <- read.delim("/user/R/raw_data.txt")
>> > rownames(genelist) <- genelist[,1]
>> > genes <- rownames(genelist)
>> >
>> > x <- 1:4
>> > set <- matrix(nrow = 50, ncol = 11)
>> >
>> > for(i in c(1:50)){
>> >    set[i] <-sample(x,50)
>> >    print(c(i,"->", set), quote = FALSE)
>> >    }
>> >
>> > which basically does the trick, but I just can't save the results
>> > outside the loop.
>> > After having the random sets of lines it wasn't a problem to extract the
>> > line from the arrays using subset.
>> >
>> > genSet1 <-sample(x,50)
>> > random1 <- genes %in% genSet1
>> > subsetGenelist <- subset(genelist, random1)
>> >
>> >
>> > is there a different way of creating these random vectors or saving the
>> > loop
>> > results outside the loop so I can work with them?
>> >
>> > Thanks a lot
>> >
>> > Assa
>> >
>> >        [[alternative HTML version deleted]]
>> >
>> > __
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>
>>
>> --
>> Joris Meys
>> Statistical consultant
>>
>> Ghent University
>> Faculty of Bioscience Engineering
>> Department of Applied mathematics, biometrics and process control
>>
>> tel : +32 9 264 59 87
>> joris.m...@ugent.be
>> ---
>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] split with list

2010-07-09 Thread Joris Meys
One solution is to set these unwanted entries to "". Note that a name
starting with a digit needs backticks or [[ ]] indexing:

repor[["9853312"]][1:2, 2:3] <- ""
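
For completeness, a minimal sketch on invented data (the column names are
taken from the question, the values are made up; [[ ]] indexing is used
because the list name starts with a digit):

```r
# toy version of the data frame from the question
dat <- data.frame(CFISCALE = rep(c("9853312", "9653421"), each = 3),
                  RAGSOCB  = rep(c("astra", "geox"), each = 3),
                  ANNO     = rep(2005:2007, 2),
                  VAR1     = c(6, 78, 55, 35, 24, 54))

# split by CFISCALE, then transpose every element
repor <- lapply(split(dat, dat$CFISCALE), t)

repor[["9853312"]][1:2, 2:3] <- ""       # keep only the first occurrence
colnames(repor[["9853312"]]) <- NULL     # drop the "1 2 3" header row
print(repor[["9853312"]], quote = FALSE)
```

The result can then be passed on to xtable as before.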

Cheers
Joris

On Fri, Jul 9, 2010 at 12:18 PM, n.via...@libero.it  wrote:
>
> Dear List I would like to ask you something concerning a better print of the 
> R output:
> I have a big data frame which has the following structure:
> CFISCALE              RAGSOCB            ANNO       VAR1        VAR2.
> 9853312                     astra                 2005           6            
>    45
>
> 9853312                     astra                 2006          78            
>   45
>
>
> 9853312                     astra                 2007           55           
>    76
>
>
> 9653421                      geox                 2005           35           
>   89
>
>
>
> 9653421                     geox                 2006            24           
>     33
>
> 9653421                      geox                 2007           54           
>    55
>
>
> The first thing I did is to split my data frame for CFISCALE. The result is 
> that R has transformed my data frame into a list. The second step was to 
> transpose each element of my list.
> repo=split(rep,rep$CFISCALE)
> repor=lapply(repo,function(x){
> t(x)})
>
>
> When I print my list the format is the following
> $9853312
>                                   1                           2               
>          3
>
> CFISCALE            "9853312"         "9853312"             "9853312"
>
> RAGSOCB            "astra"               "astra"                    "astra"
>
> ANNO                   "2005"                "2006"                      
> "2007"
>
> VAR1                       6                         78                       
>        55
>
> VAR2                       45                        45                       
>       76
>
>
> There is a way to remove the  first row I mean 1, 2 , 3 and to have just one 
> CFISCALE and RAGSOCB???
> For the second problem I tried to use unique but it seems that it doesnt work 
> for list. So what I would like to get is:
> $9853312
>
>
>
>
> CFISCALE            "9853312"
>
>
> RAGSOCB            "astra"
> ANNO                   "2005"                "2006"                      
> "2007"
>
> VAR1                       6                         78                       
>        55
>
> VAR2                       45                        45                       
>       76
>
>
> This is because I next run xtable on my list in order to get a table in 
> Latex, which I woud like to be in a nice format.
> Thanks a lot for your attention!
>
>
>
>
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Non-parametric regression

2010-07-09 Thread Joris Meys
Just to be correct: gam is mentioned on the page Tal linked to, but it
is a semi-parametric approach that uses maximum likelihood. The advice
stays valid, though.

Another thing: you detected non-normality, but could you use a Poisson
distribution, for example? The framework of generalized linear models
and generalized additive models allows you to deal with non-normal
data.
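
To illustrate the idea, a hedged sketch on invented count data (base R
only; the data are simulated, not Ralf's):

```r
# Poisson GLM: counts are clearly non-normal, but a GLM handles them
set.seed(1)
x <- runif(50)
y <- rpois(50, exp(1 + 2 * x))       # simulated counts
fit <- glm(y ~ x, family = poisson)  # log link by default
summary(fit)$coefficients
```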

In any case, I suggest you contact a statistician nearby for guidance.

Cheers
Joris

On Fri, Jul 9, 2010 at 10:26 AM, Tal Galili  wrote:
> >From reviewing the first google page result for "Non-parametric regression
> R", I hope this link will prove useful:
>
> http://socserv.mcmaster.ca/jfox/Courses/Oxford-2005/R-nonparametric-regression.html
>
>
>
> Contact
> Details:---
> Contact me: tal.gal...@gmail.com |  972-52-7275845
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> www.r-statistics.com (English)
> --
>
>
>
>
> On Fri, Jul 9, 2010 at 11:01 AM, Ralf B  wrote:
>
>> I have two data sets, each a vector of 1000 numbers, each vector
>> representing a distribution (i.e. 1000 numbers each of which
>> representing a frequency at one point on a scale between 1 and 1000).
>> For simplification, here is a short version with only 5 points.
>>
>>
>> a <- c(8,10,8,12,4)
>> b <- c(7,11,8,10,5)
>>
>> Leaving the obvious discussion about causality aside for a moment, I
>> would like to see how well i can predict b from a using a regression.
>> Since I do not know anything about the distribution type and already
>> discovered non-normality I cannot use parametric regression or
>> anything GLM for that matter.
>>
>> How should I proceed in using non-parametric regression to model
>> vector a and see how well it predicts b? Perhaps you could extend the
>> given lines into a short example script to give me an idea? Are there
>> any other options?
>>
>> Best,
>> Ralf
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>        [[alternative HTML version deleted]]
>
> ______
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R^2 in loess and predict?

2010-07-09 Thread Joris Meys
I do not agree with your interpretation of the adjusted R^2. The R^2
is no more than the ratio of the explained variance to the total
variance, expressed in sums of squares. The adjusted R^2 is adjusted
for the degrees of freedom and can only be used for selection
purposes. Its interpretation for the final model is difficult, and it
is definitely not a measure of how well the model describes the
population.

For a loess regression it can be calculated as well. But loess is a
local regression technique, highly flexible and highly dependent on
the window you use. In that case R^2 (or any other goodness-of-fit
measure) tells you even less: you can get an R^2 of 1 if you choose
the window small enough.

If you want to do inference on nonlinear regression techniques, I
strongly suggest you use Generalized Additive Models, eg from the
package mgcv. There you can use the framework of likelihood ratio
tests for determination of goodness of fit by comparing models.
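
A hedged sketch of that comparison, assuming the mgcv package is
installed (the data are simulated for illustration):

```r
library(mgcv)

set.seed(42)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

m0 <- gam(y ~ x)       # purely linear fit
m1 <- gam(y ~ s(x))    # smooth fit

anova(m0, m1, test = "Chisq")  # approximate likelihood ratio comparison
summary(m1)$r.sq               # adjusted r-squared as reported by mgcv
```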

Cheers
Joris

On Fri, Jul 9, 2010 at 10:42 AM, Ralf B  wrote:
> Parametric regression produces R^2 as a measure of how well the model
> predicts the sample and adjusted R^2 as a measure of how well it
> models the population. What is the equalvalent for non-parametric
> regression (e.g. loess function) ?
>
> Ralf
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] package installation for Windows 7

2010-07-08 Thread Joris Meys
On Thu, Jul 8, 2010 at 3:39 PM, Duncan Murdoch  wrote:
> On 08/07/2010 9:26 AM, David Bickel wrote:
>>
>> Yes, the User into which I logged in before launching RGui is an
>> Administrator. Correct, the problem is not limited to Bioconductor packages.
>>
>
> On Windows 7, it's not enough to have the user be an administrator, you need
> to run programs specifically with administrative rights.  You do this by
> right clicking on the icon, and choosing "Run as administrator".

The reason for this is that "Program Files" is a protected directory:
changes can only be made by programs that are granted administrator
rights (which is not the same as running a program from an
administrator account). Without those rights, R cannot write to its
default package folder, as that one also lives under Program Files.
Changing Rprofile.site becomes a hassle too, as do many other tweaks.

Actually, about the only programs that belong there are
Microsoft-related applications. If there is no strict need to place a
program there, I stay away from that folder as far as possible. Vista
has the same problem, by the way, and obviously the same solutions.
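
One way around it is installing into a user-writable library instead.
A hedged sketch (the package name is hypothetical, and the exact path
reported by R_LIBS_USER depends on the Windows account and R version):

```r
.libPaths()                               # where R currently looks for packages

userlib <- Sys.getenv("R_LIBS_USER")      # per-user library location
dir.create(userlib, recursive = TRUE, showWarnings = FALSE)

# install into the user library rather than under Program Files
install.packages("some.package", lib = userlib)
```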

Cheers
Joris

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using nlm or optim

2010-07-08 Thread Joris Meys
Without data I can't check, but try :
mle(nll,start=list(c=0.01,z=2.1,s=200),fixed=list(V=Var,M=Mean))

With a random dataset I get :
> Mean <- rnorm(136)

> Var <- 1 + rnorm(136)^2
> mle(nll,start=list(c=0.01,z=2.1,s=200),fixed=list(V=Var,M=Mean))
Error in optim(start, f, method = method, hessian = TRUE, ...) :
  initial value in 'vmmin' is not finite

This might be just a data problem, but again, I'm not sure.

Cheers
Joris

On Thu, Jul 8, 2010 at 3:11 AM, Anita Narwani  wrote:
> Hello,
> I am trying to use nlm to estimate the parameters that minimize the
> following function:
>
> Predict<-function(M,c,z){
> + v = c*M^z
> + return(v)
> + }
>
> M is a variable and c and z are parameters to be estimated.
>
> I then write the negative loglikelihood function assuming normal errors:
>
> nll<-function(M,V,c,z,s){
> n<-length(Mean)
> logl<- -.5*n*log(2*pi) -.5*n*log(s) - (1/(2*s))*sum((V-Predict(Mean,c,z))^2)
> return(-logl)
> }
>
> When I put the Mean and Variance (variables with 136 observations) into this
> function, and estimates for c,z, and s, it outputs the estimate for the
> normal negative loglikelihood given the data, so I know that this works.
>
> However, I am unable to use mle to estimate the parameters c, z, and s. I do
> not know how or where the data i.e. Mean (M) and Variance (V) should enter
> into the mle function. I have tried variations on
>
> mle(nll,start=list(c=0.01,z=2.1,s=200)) including
> mle(nll,start=list(M=Mean,V=Var, c=0.01,z=2.1,s=200))
>
> I keep getting errors and am quite certain that I just have a syntax error
> in the script because I don't know how to enter the variables into MLE.
>
> Thanks for your help,
> Anita.
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] random sample from arrays

2010-07-08 Thread Joris Meys
I don't know what exactly you're trying to do, but you make a matrix
with 11 columns and 50 rows, then treat it as a vector. On top of
that, you try to fill 50 rows/columns with 50 values each. Of course
that doesn't work. Did you check the warning messages when running the
code?

Either do :

for(i in 1:11){
    set[,i] <- sample(x, 50)
    print(c(i, "->", set), quote = FALSE)
}

or

for(i in 1:50){
    set[i,] <- sample(x, 11)
    print(c(i, "->", set), quote = FALSE)
}

Or just forget about the loop altogether and do :

set <- replicate(11,sample(x,50))
or
set <- t(replicate(50,sample(x,11)))
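
The replicate() versions can be checked directly (assuming x has at
least 50 elements, e.g. x <- 1:200 instead of the x <- 1:4 from the
question):

```r
x <- 1:200
set <- t(replicate(50, sample(x, 11)))
dim(set)   # 50 rows, 11 columns: one random sample per row
```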

cheers

On Thu, Jul 8, 2010 at 8:04 AM, Assa Yeroslaviz  wrote:
> Hello R users,
>
> I'm trying to extract random samples from a big array I have.
>
> I have a data frame of over 40k lines and would like to produce around 50
> random sample of around 200 lines each from this array.
>
> this is the matrix
>          ID xxx_1c xxx__2c xxx__3c xxx__4c xxx__5T xxx__6T xxx__7T xxx__8T
> yyy_1c yyy_1c _2c
> 1 A_512  2.150295  2.681759  2.177138  2.142790  2.115344  2.013047
> 2.115634  2.189372  1.643328  1.563523
> 2 A_134 12.832488 12.596373 12.882581 12.987091 11.956149 11.994779
> 11.650336 11.995504 13.024494 12.776322
> 3 A_152  2.063276  2.160961  2.067549  2.059732  2.656416  2.075775
> 2.033982  2.111937  1.606340  1.548940
> 4 A_163  9.570761 10.448615  9.432859  9.732615 10.354234 10.993279
> 9.160038  9.104121 10.079177  9.828757
> 5 A_184  3.574271  4.680859  4.517047  4.047096  3.623668  3.021356
> 3.559434  3.156093  4.308437  4.045098
> 6 A_199  7.593952  7.454087  7.513013  7.449552  7.345718  7.367068
> 7.410085  7.022582  7.668616  7.953706
> ...
>
> I tried to do it with a for loop:
>
> genelist <- read.delim("/user/R/raw_data.txt")
> rownames(genelist) <- genelist[,1]
> genes <- rownames(genelist)
>
> x <- 1:4
> set <- matrix(nrow = 50, ncol = 11)
>
> for(i in c(1:50)){
>    set[i] <-sample(x,50)
>    print(c(i,"->", set), quote = FALSE)
>    }
>
> which basically do the trick, but I just can't save the results outside the
> loop.
> After having the random sets of lines it wasn't a problem to extract the
> line from the arrays using subset.
>
> genSet1 <-sample(x,50)
> random1 <- genes %in% genSet1
> subsetGenelist <- subset(genelist, random1)
>
>
> is there a different way of creating these random vectors or saving the loop
> results outside the loop so I can work with them?
>
> Thanks a lot
>
> Assa
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] package installation for Windows 7

2010-07-08 Thread Joris Meys
>> source("http://bioconductor.org/biocLite.R")
>> BioC_mirror = http://www.bioconductor.org
>> Change using chooseBioCmirror().
>> Warning messages:
>> 1: In safeSource() : Redefining ‘biocinstall’
>> 2: In safeSource() : Redefining ‘biocinstallPkgGroups’
>> 3: In safeSource() : Redefining ‘biocinstallRepos’
>>  > biocLite("Biobase")
>> Using R version 2.11.1, biocinstall version 2.6.7.
>> Installing Bioconductor version 2.6 packages:
>> [1] "Biobase"
>> Please wait...
>>
>> Warning in install.packages(pkgs = pkgs, repos = repos, ...) :
>> argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11'
>> Error in if (ok) { : missing value where TRUE/FALSE needed
>>
>>  > utils:::menuInstallLocal() # "Install package(s) from local zip
>> files..."
>> Error in if (ok) { : missing value where TRUE/FALSE needed
>>
>>  > utils:::menuInstallPkgs() # "Install package(s)..."
>> --- Please select a CRAN mirror for use in this session ---
>> Error in if (ok) { : missing value where TRUE/FALSE needed
>>
>>
>> I would appreciate any assistance.
>>
>> David
>>
>>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] strsplit("dia ma", "\\b") splits characterwise

2010-07-08 Thread Joris Meys
I guess this is expected behaviour, although counterintuitive. \b
represents an empty string indicating a word boundary, but it is
coerced to character and thus becomes simply the empty string. This
means the output you get is the same as
> strsplit("dia ma", "", perl=TRUE)
[[1]]
[1] "d" "i" "a" " " "m" "a"

I'd use the separating character as the split argument in strsplit, e.g.

> strsplit("dia ma", "\\s")
[[1]]
[1] "dia" "ma"

If you need the space in the list as well, you'll have to work around it, I guess.

> test <- as.vector(gregexpr("\\b", "dia ma", perl=TRUE)[[1]])
> test
[1] 1 4 5 7
> apply(embed(test,2),1,function(x) substr("dia ma",x[2],x[1]-1))
[1] "dia" " "   "ma"

It would be nice if special characters like \b would be recognized by
strsplit as well though.
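
As a side note: later R versions offer regmatches() (available from R
2.14.0 on, so not in the R 2.11 used above), which keeps the
separators by matching runs of non-space and space alternately:

```r
# match either a run of non-whitespace or a run of whitespace
regmatches("dia ma", gregexpr("\\S+|\\s+", "dia ma"))[[1]]
# [1] "dia" " "   "ma"
```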

Cheers
Joris

On Thu, Jul 8, 2010 at 10:15 AM, Suharto Anggono Suharto Anggono
 wrote:
> \b is word boundary.
> But, unexpectedly, strsplit("dia ma", "\\b") splits character by character.
>
>> strsplit("dia ma", "\\b")
> [[1]]
> [1] "d" "i" "a" " " "m" "a"
>
>> strsplit("dia ma", "\\b", perl=TRUE)
> [[1]]
> [1] "d" "i" "a" " " "m" "a"
>
>
> How can that be?
>
> This is the output of 'gregexpr'.
>
>> gregexpr("\\b", "dia ma")
> [[1]]
> [1] 1 2 3 4 5 6
> attr(,"match.length")
> [1] 0 0 0 0 0 0
>
>> gregexpr("\\b", "dia ma", perl=TRUE)
> [[1]]
> [1] 1 4 5 7
> attr(,"match.length")
> [1] 0 0 0 0
>
>
> The output from gregexpr("\\b", "dia ma", perl=TRUE) is what I expect. I 
> expect 'strsplit' to split at that points.
>
> This is in Windows. R was installed from binary.
>
>> sessionInfo()
> R version 2.11.1 (2010-05-31)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
>
>
> R 2.8.1 shows the same 'strsplit' behavior, but the behavior of default 
> 'gregexpr' (i.e. perl=FALSE) is different.
>
>> strsplit("dia ma", "\\b")
> [[1]]
> [1] "d" "i" "a" " " "m" "a"
>
>> strsplit("dia ma", "\\b", perl=TRUE)
> [[1]]
> [1] "d" "i" "a" " " "m" "a"
>
>> gregexpr("\\b", "dia ma")
> [[1]]
> [1] 1 4 5 7
> attr(,"match.length")
> [1] 0 0 0 0
>
>> gregexpr("\\b", "dia ma", perl=TRUE)
> [[1]]
> [1] 1 4 5 7
> attr(,"match.length")
> [1] 0 0 0 0
>
>> sessionInfo()
> R version 2.8.1 (2008-12-22)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
> States.1252;LC_MON
> ETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United 
> States.1252
>
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
>
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to determine order of arima bt acf and pacf

2010-07-08 Thread Joris Meys
Did you read these already?

http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_intro.pdf

http://www.barigozzi.eu/ARIMA.pdf
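
As a starting point, a hedged sketch on a simulated series (the AR
coefficients are invented; on real data you would inspect the plots
before picking an order):

```r
set.seed(1)
x <- arima.sim(model = list(ar = c(0.6, 0.3)), n = 500)  # AR(2) series

acf(x)    # tails off gradually for an AR process
pacf(x)   # cuts off after lag 2 for an AR(2)

fit <- arima(x, order = c(2, 0, 0))  # order = (p, d, q)
fit$coef
```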

Cheers
Joris

On Thu, Jul 8, 2010 at 11:45 AM, vijaysheegi  wrote:
>
> Hi R community,
> I am new to R.Have  some doubts on ACF and pacf.Appying acf and pacf to
> dataframe or table.How to determine ARIMA degrees which suits our example .
> Please assist me on this and also please give me link for the same so that i
> will also try understand the solution.
>
> Thanks in advance
> vijaysheegi
> student
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/how-to-determine-order-of-arima-bt-acf-and-pacf-tp2282053p2282053.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] package installation for Windows 7

2010-07-08 Thread Joris Meys
> --- Please select a CRAN mirror for use in this session ---
> Error in if (ok) { : missing value where TRUE/FALSE needed
>
>
> I would appreciate any assistance.
>
> David
>
> --
> David R. Bickel, PhD
> Associate Professor
> Ottawa Institute of Systems Biology
> Biochem., Micro. and I. Department
> Mathematics and Statistics Department
> University of Ottawa
> 451 Smyth Road
> Ottawa, Ontario K1H 8M5
>
> http://www.statomics.com
>
> Office Tel: (613) 562-5800 ext. 8670
> Office Fax: (613) 562-5185
> Office Room: RGN 4510F (Follow the signs to the elevator, and take it to the
> fourth floor. Turn left and go all the way to the end of the hall, and enter
> the door to the OISB area.)
> Lab Tel.: (613) 562-5800 ext. 8304
> Lab Room: RGN 4501T
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] relation in aggregated data

2010-07-08 Thread Joris Meys
Depending on the data and the research question, a meta-analytic
approach might be appropriate: you can treat every campaign as a
"study". See the package metafor, for example. You can only draw very
general conclusions, but at least your inference will be closer to
correct.
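
A hedged sketch of the idea, assuming metafor is installed (the
per-campaign effect sizes yi and sampling variances vi are invented
here; in practice they come from the within-campaign analyses):

```r
library(metafor)

dat <- data.frame(yi = c(0.2, 0.5, 0.3, 0.6),      # effect size per campaign
                  vi = c(0.04, 0.02, 0.05, 0.03))  # its sampling variance

res <- rma(yi, vi, data = dat)  # random-effects meta-analysis
summary(res)
```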

Cheers
Joris

On Thu, Jul 8, 2010 at 10:03 AM, Petr PIKAL  wrote:
> Thank you
>
> Actually when I do this myself I always try to make day or week averages
> if possible. However this was done by one of my colleagues and basically
> the aggregation was done on basis of campaigns. There is 4 to 6 campaigns
> per year and sometimes there is apparent relationship in aggregated data
> sometimes is not. My opinion is that I can not say much about exact
> relations until I have other clues or ways like expected underlaying laws
> of physics.
>
> Thanks again
>
> Best regards
> Petr
>
>
>
> Joris Meys  napsal dne 07.07.2010 17:33:55:
>
>> You examples are pretty extreme... Combining 120 data points in 4
>> points is off course never going to give a result. Try :
>>
>> fac <- rep(1:8,each=15)
>> xprum <- tapply(x, fac, mean)
>> yprum <- tapply(y, fac, mean)
>> plot(xprum, yprum)
>>
>> Relation is not obvious, but visible.
>>
>> Yes, you lose information. Yes, your hypothesis changes. But in the
>> case you describe, averaging the x-values for every day (so you get an
>> average linked to 1 y value) seems like a possibility, given you take
>> that into account when formulating the hypothesis. Optimally, you
>> should take the standard error on the average into account for the
>> analysis, but this is complicated, often not done and in most cases
>> ignoring this issue is not influencing the results to that extent it
>> becomes important.
>>
>> YMMV
>>
>> Cheers
>>
>> On Wed, Jul 7, 2010 at 4:24 PM, Petr PIKAL 
> wrote:
>> > Dear all
>> >
>> > My question is more on statistics than on R, however it can be
>> > demonstrated by R. It is about pros and cons trying to find a
> relationship
>> > by aggregated data. I can have two variables which can be related and
> I
>> > measure them regularly during some time (let say a year) but I can not
>> > measure them in a same time - (e.g. I can not measure x and respective
>> > value of y, usually I have 3 or more values of x and only one value of
> y
>> > per day).
>> >
>> > I can make a aggregated values (let say quarterly). My questions are:
>> >
>> > 1.      Is such approach sound? Can I use it?
>> > 2.      What could be the problems
>> > 3.      Is there any other method to inspect variables which can be
>> > related but you can not directly measure them in a same time?
>> >
>> > My opinion is, that it is not much sound to inspect aggregated values
> and
>> > there can be many traps especially if there are only few aggregated
>> > values. Below you can see my examples.
>> >
>> > If you have some opinion on this issue, please let me know.
>> >
>> > Best regards
>> > Petr
>> >
>> > Let us have a relation x/y
>> >
>> > set.seed(555)
>> > x <- rnorm(120)
>> > y <- 5*x+3+rnorm(120)
>> > plot(x, y)
>> >
>> > As you can see there is clear relation which can be seen from plot.
> Now I
>> > make a factor for aggregation.
>> >
>> > fac <- rep(1:4,each=30)
>> >
>> > xprum <- tapply(x, fac, mean)
>> > yprum <- tapply(y, fac, mean)
>> > plot(xprum, yprum)
>> >
>> > Relationship is completely gone. Now let us make other fake data
>> >
>> > xn <- runif(120)*rep(1:4, each=30)
>> > yn <- runif(120)*rep(1:4, each=30)
>> > plot(xn,yn)
>> >
>> > There is no visible relation, xn and yn are independent but related to
>> > aggregation factor.
>> >
>> > xprumn <- tapply(xn, fac, mean)
>> > yprumn <- tapply(yn, fac, mean)
>> > plot(xprumn, yprumn)
>> >
>> > Here you can see perfect relation which is only due to aggregation
> factor.
>> >
>> > __
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>
>>
>> --
>> Joris Meys
>> Statistical consultant
>>
>> Ghent University
>> Faculty of Bioscience Engineering
>> Department of Applied mathematics, biometrics and process control
>>
>> tel : +32 9 264 59 87
>> joris.m...@ugent.be
>> ---
>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] when all I have is a contingency table....

2010-07-07 Thread Joris Meys
See Ted's answer for the histogram (I'd go with barplot).

For most statistical procedures there is a weighted version (e.g.
weighted.mean() for the mean), and your counts are valid weights for
most of them.
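
Applied to the table from the question (base R only):

```r
val   <- 1:6
count <- c(10, 8, 12, 9, 14, 7)

weighted.mean(val, count)        # 3.5, without expanding the data

x <- rep(val, count)             # or expand to a pseudo-raw sample
mean(x); length(x)               # length is 60, matching the question

barplot(count, names.arg = val)  # bar plot of the counts
```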

Cheers
Joris

On Wed, Jul 7, 2010 at 10:39 PM, Andrei Zorine  wrote:
> Hello,
> I just need a hint here:
> Suppose I have no raw data, but only a frequency table I have, and I
> want to run basic statistical procedures with it, like histogram,
> descriptive statistics, etc. How do I do this with R?
> For example, how do I plot a histogram for this table for a sample of size 60?
>
> Value   Count
> 1       10
> 2       8
> 3       12
> 4       9
> 5       14
> 6       7
>
>
> Thanks,
> A.Z.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>





Re: [R] problems with write.table, involving loops & paste statement

2010-07-07 Thread Joris Meys
To be precise: according to my testing, everything is written to all folders.

There is absolutely no need whatsoever to use l_ply. And in any case,
move as much as possible outside the loop, like the library() call
and the max.col computation.

The following is _a_ solution, not optimal, but as close as
feasible to your code:

A_var_df <- data.frame(index=1:length(seq(1.0, -0.9, by= -0.1)),
from=seq(1.0, -0.9, by= -0.1), to=seq(0.9, -1.0, by= -0.1))

# I make some data and make sure I can adjust the number of dirs and the steps
Dchr1 <-matrix(rep(1:100,each=10),ncol=100)
dirs <- 20
max.col <- ncol(Dchr1)
steps = ceiling(max.col/dirs)

cols <- seq(1, max.col, by=steps)

for(i in 1:length(A_var_df[,1]))
{
  k <- cols[i]
  print(as.data.frame(Dchr1[,k:min(k+steps, max.col)]))
  print(paste("./A_", A_var_df[i,1], "/k.csv", sep=""))
  # I use print here just to show which data frame is going to which directory;
  # you can reconstruct the write.table yourself.
}
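
To go from the print() calls above to actual files, a possible completion
(the folders A_1 ... A_20 must already exist; min(k + steps - 1, ...) takes
exactly `steps` columns per chunk):

```r
# Write chunk i of Dchr1 into folder A_i as k.csv
for (i in seq_len(nrow(A_var_df))) {
  k <- cols[i]
  chunk <- as.data.frame(Dchr1[, k:min(k + steps - 1, max.col)])
  write.table(chunk,
              file = paste("./A_", A_var_df[i, 1], "/k.csv", sep = ""),
              sep = ",", row.names = FALSE, quote = FALSE)
}
```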

Cheers
Joris

On Wed, Jul 7, 2010 at 9:32 PM, CC  wrote:
> Hi!
>
> I want to write portions of my data (3573 columns at a time) to twenty
> folders I have available titled "A_1" to "A_20" such that the first 3573
> columns will go to folder A_1, next 3573 to folder A_2 and so on.
>
> This code below ensures that the data is written into all 20 folders, but
> only the last iteration of the loop (last 3573 columns) is being written
> into ALL of the folders (A_1 to A_20) rather than the sequential order that
> I would like.
>
> How can I fix this?
>
> Thank you!
>
> *
>
> Code:
>
> A_var_df <- data.frame(index=1:length(seq(1.0, -0.9, by= -0.1)),
> from=seq(1.0, -0.9, by= -0.1), to=seq(0.9, -1.0, by= -0.1))
>
> for(i in 1:length(A_var_df[,1]))
> {
> library(plyr)
> max.col <- ncol(Dchr1)
> l_ply(seq(1, max.col, by=3573), function(k)
> write.table(as.data.frame(Dchr1[,k:min(k+3572, max.col)]), paste("./A_",
> A_var_df[i,1], "/k.csv", sep=""), sep=",", row.names=F, quote=F) )
> }
>
> **
>
> --
> Thanks,
> CC
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>





Re: [R] relation in aggregated data

2010-07-07 Thread Joris Meys
Your examples are pretty extreme... Condensing 120 data points into 4
points is of course never going to give a result. Try:

fac <- rep(1:8,each=15)
xprum <- tapply(x, fac, mean)
yprum <- tapply(y, fac, mean)
plot(xprum, yprum)

The relation is not obvious, but it is visible.

Yes, you lose information. Yes, your hypothesis changes. But in the
case you describe, averaging the x-values for every day (so you get an
average linked to one y value) seems like a possibility, provided you
take that into account when formulating the hypothesis. Optimally, you
should take the standard error of the average into account in the
analysis, but this is complicated, often not done, and in most cases
ignoring it does not influence the results enough to matter.
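
As a rough sketch of what taking the standard error into account could
look like (assuming x, fac, xprum and yprum from the code above):

```r
# Each aggregated point is the mean of 15 observations; weight the
# regression on the group means by group size as a crude precision weight.
n  <- tapply(x, fac, length)             # observations behind each mean
se <- tapply(x, fac, sd) / sqrt(n)       # standard error of each x-mean
summary(lm(yprum ~ xprum, weights = n))  # weighted least squares on the means
```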

YMMV

Cheers

On Wed, Jul 7, 2010 at 4:24 PM, Petr PIKAL  wrote:
> Dear all
>
> My question is more on statistics than on R, however it can be
> demonstrated by R. It is about pros and cons trying to find a relationship
> by aggregated data. I can have two variables which can be related and I
> measure them regularly during some time (let say a year) but I can not
> measure them in a same time - (e.g. I can not measure x and respective
> value of y, usually I have 3 or more values of x and only one value of y
> per day).
>
> I can make a aggregated values (let say quarterly). My questions are:
>
> 1.      Is such approach sound? Can I use it?
> 2.      What could be the problems
> 3.      Is there any other method to inspect variables which can be
> related but you can not directly measure them in a same time?
>
> My opinion is, that it is not much sound to inspect aggregated values and
> there can be many traps especially if there are only few aggregated
> values. Below you can see my examples.
>
> If you have some opinion on this issue, please let me know.
>
> Best regards
> Petr
>
> Let us have a relation x/y
>
> set.seed(555)
> x <- rnorm(120)
> y <- 5*x+3+rnorm(120)
> plot(x, y)
>
> As you can see there is clear relation which can be seen from plot. Now I
> make a factor for aggregation.
>
> fac <- rep(1:4,each=30)
>
> xprum <- tapply(x, fac, mean)
> yprum <- tapply(y, fac, mean)
> plot(xprum, yprum)
>
> Relationship is completely gone. Now let us make other fake data
>
> xn <- runif(120)*rep(1:4, each=30)
> yn <- runif(120)*rep(1:4, each=30)
> plot(xn,yn)
>
> There is no visible relation, xn and yn are independent but related to
> aggregation factor.
>
> xprumn <- tapply(xn, fac, mean)
> yprumn <- tapply(yn, fac, mean)
> plot(xprumn, yprumn)
>
> Here you can see perfect relation which is only due to aggregation factor.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>





Re: [R] Different goodness of fit tests leads to contradictory conclusions

2010-07-07 Thread Joris Meys
The first two options are GOF tests for ungrouped data; the latter two
can only be used for grouped data. In my experience, the G^2
test is more affected by this than the X^2 test (it gives -wrongly-
significant deviations more easily when used on ungrouped data).

If you started from binary data, you can only use the Hosmer tests.
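
For what it's worth, a small simulated sketch (the data here are invented)
of how the deviance "test" misleads on ungrouped binary data:

```r
# With ungrouped 0/1 responses the residual deviance is not chi-squared
# distributed, so this p-value is unreliable even when the model is true.
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 * x))   # data generated exactly from the model
fit <- glm(y ~ x, family = binomial)
1 - pchisq(deviance(fit), df.residual(fit))
```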

Cheers
Joris

On Wed, Jul 7, 2010 at 4:00 PM, Kiyoshi Sasaki  wrote:
> I am trying to test goodness of fit for my logistic regression using 
> several options as shown below.  Hosmer-Lemeshow test (whose function I 
> borrowed from a previous post), Hosmer–le Cessie omnibus lack of fit test 
> (also borrowed from a previous post), Pearson chi-square test, and deviance 
> test.  All the tests, except the deviance tests, produced p-values well above 
> 0.05.  Would anyone please teach me why deviance test produce p-values such a 
> small value (0.001687886) that suggest lack of fit, while other tests suggest 
> good fit? Did I do something wrong?
>
> Thank you for your time and help!
>
> Kiyoshi
>
>
> mod.fit <- glm(formula = no.NA$repcnd ~  no.NA$svl, family = binomial(link = 
> logit), data =  no.NA, na.action = na.exclude, control = list(epsilon = 
> 0.0001, maxit = 50, trace = F))
>
>> # Option 1: Hosmer-Lemeshow test
>> mod.fit <- glm(formula = no.NA$repcnd ~  no.NA$svl, family = binomial(link = 
>> logit), data =  no.NA, na.action = na.exclude, control = list(epsilon = 
>> 0.0001, maxit = 50, trace = F))
>
>  >  hosmerlem <- function (y, yhat, g = 10)
> {
> cutyhat <- cut(yhat, breaks = quantile(yhat, probs = seq(0, 1, 1/g)), 
> include.lowest = T)
> obs <- xtabs(cbind(1 - y, y) ~ cutyhat)
> expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
> chisq <- sum((obs - expect)^2/expect)
> P <- 1 - pchisq(chisq, g - 2)
> c("X^2" = chisq, Df = g - 2, "P(>Chi)" = P)
> }
>
>> hosmerlem(no.NA$repcnd, fitted(mod.fit))
>  X^2       Df       P(>Chi)
> 7.8320107    8.000    0.4500497
>
>
>> # Option 2 - Hosmer–le Cessie omnibus lack of fit test:
>> library(Design)
>> lrm.GOF <- lrm(formula = no.NA$repcnd ~  no.NA$svl, data =  no.NA, y = T, x 
>> = T)
>> resid(lrm.GOF,type = "gof")
> Sum of squared errors  Expected value|H0         SD          Z          P
>            48.3487115         48.3017399  0.1060826  0.4427829  0.6579228
>
>> # Option 3 - Pearson chi-square p-value:
>> pp <- sum(resid(lrm.GOF,typ="pearson")^2)
>> 1-pchisq(pp, mod.fit$df.resid)
> [1] 0.4623282
>
>
>> # Option 4 - Deviance (G^2) test:
>> 1-pchisq(mod.fit$deviance, mod.fit$df.resid)
> [1] 0.001687886
>
>
>
>        [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>





Re: [R] PCA and Regression

2010-07-06 Thread Joris Meys
PCA components are orthogonal by definition, so no, that doesn't make
sense at all. Do yourself a favor and get a book on multivariate data
analysis. Two books come to mind: obviously the one by Hair and
colleagues, called "Multivariate Data Analysis" and easily found in any
university library (or on the internet).

The second is "An R and S-Plus Companion to Multivariate Analysis"
by Everitt. You have less chance of finding this one in the library,
but it is for sale online.

This is a nice introduction as well :
https://netfiles.uiuc.edu/miguez/www/Teaching/MultivariateRGGobi.pdf

Never use a chainsaw without reading the manual.
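
To see the orthogonality concretely, a quick toy check (random data,
nothing from the original question):

```r
# PC scores are uncorrelated by construction, so regressing one
# component on another explains nothing.
set.seed(42)
m <- matrix(rnorm(100 * 5), ncol = 5)
scores <- prcomp(m)$x
round(cor(scores), 10)   # identity matrix up to rounding
```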

Cheers
Joris

On Tue, Jul 6, 2010 at 9:07 PM, Marino Taussig De Bodonia, Agnese
 wrote:
> Hello,
>
> I am currently analyzing responses to questionnaires about general attitudes. 
> I have performed a PCA on my data, and have retained two Principal 
> Components. Now I would like to use the scores of both the principal 
> comonents in a multiple regression. I would like to know if it makes sense to 
> use the scores of one principal component to explain the variance in the 
> scores of another principal component:
>
> lm(scores of principal component 1~scores of principal component 2+ age, 
> gender, etc..)
>
> Please could you let me know if this is statistically sound?
>
> Thank you in advance for you time,
>
> Agnese
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>





Re: [R] Help With ANOVA

2010-07-06 Thread Joris Meys
I still can't reproduce your example. The aov output gives me the following :

> anova(aov(Intensity ~ Group, data = zzzanova))
Analysis of Variance Table

Response: Intensity
  Df Sum Sq Mean Sq F value  Pr(>F)
Group  5  98.85  19.771  2.1469 0.07576 .
Residuals 48 442.03   9.209
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Besides that, I noticed you have truncated data, which has implications
for the analysis as well. If you use a Kruskal-Wallis test, the
p-value becomes larger:

> kruskal.test(Intensity ~ Group, data = zzzanova)

Kruskal-Wallis rank sum test

data:  Intensity by Group
Kruskal-Wallis chi-squared = 6.6955, df = 5, p-value = 0.2443

Which is to be expected, as almost 50% of your data are truncated. So a
p-value of 0.005 seems very wrong to me.

Cheers
Joris

On Tue, Jul 6, 2010 at 7:11 PM, Amit Patel  wrote:
> Hi Joris
>
> Sorry i had a misprint in the appendix code in the last email
>
> datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, 
> -4.60517, 3.003749, -4.60517,
>    2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743, 
> -4.60517,
>    -4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075, 
> -4.60517, 2.39127,
>    2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162, 
> -4.60517, 2.121427, 1.973118,
>    -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517,  
> 0.023703, -4.60517,
>    2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517, 
> 1.371895, 1.533227)
>
> "zzzanova" <-
> structure(list(Intensity = datalist,
> Group = structure(c(1,1,1,1,1,1,1,1,1,
>         2,2,2,2,2,2,2,2,
>         3,3,3,3,3,3,3,3,3,
>         4,4,4,4,4,4,4,4,4,4,
>         5,5,5,5,5,5,5,5,5,
>         6,6,6,6,6,6,6,6,6), .Label = c("Group1", "Group2", "Group3", 
> "Group4", "Group5", "Group6"), class = "factor"),
>    Sample = structure(c( 1, 2, 3, 4, 5, 6, 7, 8, 9,
>    10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
>    20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
>    30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 
> 40,41,42,43,44,45,46,47,48,49,50,51,52,53,54)
> ))
> , .Names = c("Intensity",
> "Group", "Sample"), row.names =
> c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
> "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
> "21", "22", "23", "24", "25", "26", "27", "28", "29", "30",
> "31", "32", "33", "34", "35", "36", "37", "38", "39", "40",
> "41", "42", "43", "44", "45", "46", "47", "48", "49", "50",
> "51", "52", "53", "54"),class = "data.frame")
>
>
> Thanks for your reply
>
>
>
>
>
> - Original Message 
> From: Joris Meys 
> To: Amit Patel 
> Cc: r-help@r-project.org
> Sent: Tue, 6 July, 2010 17:04:40
> Subject: Re: [R] Help With ANOVA
>
> We're missing the samp1 etc. in order to be able to test the code.
> Where did you get the other p-value?
> Cheers
> Joris
>
> On Tue, Jul 6, 2010 at 3:08 PM, Amit Patel  wrote:
>> Hi I needed some help with ANOVA
>>
>> I have a problem with My ANOVA
>> analysis. I have a dataset with a known ANOVA p-value, however I can
>> not seem to re-create it in R.
>>
>> I have created a list (zzzanova) which contains
>> 1)Intensity Values
>> 2)Group Number (6 Different Groups)
>> 3)Sample Number (54 different samples)
>> this is created by the script in Appendix 1
>>
>> I then conduct ANOVA with the command
>>> zzz.aov <- aov(Intensity ~ Group, data = zzzanova)
>>
>> I get a p-value of
>> Pr(>F)1
>> 0.9483218
>>
>> The
> expected p-value is 0.00490, so I feel I may be using ANOVA incorrectly
> or have put in a wrong formula. I am trying to do an ANOVA analysis
> across all 6 Groups. Is there something wrong with my formula? I think I
> have made a mistake in the formula rather than anything else.
>>
>>
>>
>>
>> APPENDIX 1
>>
>> datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, 
>> -4.60517, 3.003749, -4.60517

Re: [R] Help With ANOVA

2010-07-06 Thread Joris Meys
We're missing the samp1 etc. in order to be able to test the code.
Where did you get the other p-value?
Cheers
Joris

On Tue, Jul 6, 2010 at 3:08 PM, Amit Patel  wrote:
> Hi I needed some help with ANOVA
>
> I have a problem with My ANOVA
> analysis. I have a dataset with a known ANOVA p-value, however I can
> not seem to re-create it in R.
>
> I have created a list (zzzanova) which contains
> 1)Intensity Values
> 2)Group Number (6 Different Groups)
> 3)Sample Number (54 different samples)
> this is created by the script in Appendix 1
>
> I then conduct ANOVA with the command
>> zzz.aov <- aov(Intensity ~ Group, data = zzzanova)
>
> I get a p-value of
> Pr(>F)1
> 0.9483218
>
> The
> expected p-value is 0.00490, so I feel I may be using ANOVA incorrectly
> or have put in a wrong formula. I am trying to do an ANOVA analysis
> across all 6 Groups. Is there something wrong with my formula? I think I
> have made a mistake in the formula rather than anything else.
>
>
>
>
> APPENDIX 1
>
> datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, 
> -4.60517, 3.003749, -4.60517,
>    2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743, 
> -4.60517,
>    -4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075, 
> -4.60517, 2.39127,
>    2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162, 
> -4.60517, 2.121427, 1.973118,
>    -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517,  
> 0.023703, -4.60517,
>    2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517, 
> 1.371895, 1.533227)
>
> "zzzanova" <-
> structure(list(Intensity = c(t(Samp1), t(Samp2), t(Samp3), t(Samp4)),
> Group = structure(c(1,1,1,1,1,1,1,1,1,
>         2,2,2,2,2,2,2,2,
>         3,3,3,3,3,3,3,3,3,
>         4,4,4,4,4,4,4,4,4,4,
>         5,5,5,5,5,5,5,5,5,
>         6,6,6,6,6,6,6,6,6), .Label = c("Group1", "Group2", "Group3", 
> "Group4", "Group5", "Group6"), class = "factor"),
>    Sample = structure(c( 1, 2, 3, 4, 5, 6, 7, 8, 9,
>    10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
>    20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
>    30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 
> 40,41,42,43,44,45,46,47,48,49,50,51,52,53,54)
> ))
> , .Names = c("Intensity",
> "Group", "Sample"), row.names =
> c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
> "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
> "21", "22", "23", "24", "25", "26", "27", "28", "29", "30",
> "31", "32", "33", "34", "35", "36", "37", "38", "39", "40",
> "41", "42", "43", "44", "45", "46", "47", "48", "49", "50",
> "51", "52", "53", "54"),class = "data.frame")
>
>
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>





Re: [R] calculation on series with different time-steps

2010-07-06 Thread Joris Meys
Look at ?ifelse and ?abs, e.g.:

data_frame$new_column_in_dataframe <- ifelse(stage$julian_day ==
baro$julian_day & abs(stage$time -
baro$hour) <= 30,
stage$stage.cm - baro$pressure, NA )
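
Since the two series have different lengths (5-minute vs hourly readings),
the vectors above may need aligning first; a toy sketch, with column names
assumed from the question:

```r
# Invented toy data in the shape described by the question
stage <- data.frame(julian_day = 1, time = seq(5, 120, by = 5),
                    stage.cm = 100 + rnorm(24))
baro  <- data.frame(julian_day = 1, hour = c(60, 120), pressure = c(10, 11))

# For each stage reading, find the pressure reading closest in time
i  <- sapply(stage$time, function(t) which.min(abs(baro$hour - t)))
ok <- stage$julian_day == baro$julian_day[i] &
      abs(stage$time - baro$hour[i]) <= 30
stage$corrected <- ifelse(ok, stage$stage.cm - baro$pressure[i], NA)
```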

Cheers
Joris


On Thu, Jul 1, 2010 at 10:09 PM, Jeana Lee  wrote:
> Hello,
>
> I have two series, one with stream stage measurements every 5 minutes, and
> the other with barometric pressure measurements every hour.  I want to
> subtract each barometric pressure measurement from the 12 stage measurements
> closest in time to it (6 stage measurements on either side of the hour).
>
> I want to do something like the following, but I don't know the syntax.
>
> "If the Julian day of the stage measurement is equal to the Julian day of
> the pressure measurement, AND the absolute value of the difference between
> the time of the stage measurement and the hour of the pressure measurement
> is less than or equal to 30 minutes, then subtract the pressure measurement
> from the stage measurement (and put it in a new column in the stage data
> frame)."
>
>                     if ( stage$julian_day = baro$julian_day & |stage$time -
> baro$hour| <= 30 )
>                     then (stage$stage.cm - baro$pressure)
>
> Can you help me?
>
> Thanks,
> JL
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>





Re: [R] timeseries

2010-07-06 Thread Joris Meys
Difficult to guess why, but I reckon you should use ts() instead of
as.ts(). Otherwise, set the tsp attribute correctly. E.g.:

> x <- cumsum(1+round(rnorm(20),2))
> as.ts(x)
Time Series:
Start = 1
End = 20
Frequency = 1
 [1]  0.87  3.51  4.08  4.20  3.25  4.63  6.30  6.89  9.28  9.93 10.19
 9.97 10.20 11.51 11.52 11.53 12.97 14.04 17.02 18.47

> tseries <- ts(x,frequency=12,start=c(2004,3))
> tseries
       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2004               0.87  3.51  4.08  4.20  3.25  4.63  6.30  6.89  9.28  9.93
2005 10.19  9.97 10.20 11.51 11.52 11.53 12.97 14.04 17.02 18.47

> tsp(x) <- c(2004+2/12,2005.75,12)
> as.ts(x)
       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2004               0.87  3.51  4.08  4.20  3.25  4.63  6.30  6.89  9.28  9.93
2005 10.19  9.97 10.20 11.51 11.52 11.53 12.97 14.04 17.02 18.47

See ?ts to get the options right. I suggest using the function ts()
rather than assigning the tsp attribute directly.
Cheers
Joris


On Mon, Jul 5, 2010 at 9:35 AM, nuncio m  wrote:
> Dear useRs,
> I am trying to construct a time series using as.ts function, surprisingly
> when I plot
> the data the x axis do not show the time in years, however if I use
> ts(data), time in years are shown in the
> x axis.  Why such difference in the results of both the commands
> Thanks
> nuncio
>
>
> --
> Nuncio.M
> Research Scientist
> National Center for Antarctic and Ocean research
> Head land Sada
> Vasco da Gamma
> Goa-403804
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>





Re: [R] acf

2010-07-06 Thread Joris Meys
Your question is a bit confusing. "acfresidfit" is an object whose
origin we don't know. With your test file, I arrive at the first set of
correlations (but with integer lag headings):

> residfit <- read.table("fileree2_test_out.txt")
> acf(residfit)
> acfresid <- acf(residfit)
> acfresid

Autocorrelations of series ‘residfit’, by lag

     0      1      2      3      4      5      6      7      8      9     10
 1.000 -0.015  0.010  0.099  0.048 -0.014 -0.039 -0.019  0.040  0.018  0.042
    11     12     13     14     15     16     17     18     19     20     21
 0.078 -0.029  0.028 -0.016 -0.021 -0.109  0.000 -0.038 -0.006  0.015 -0.032
    22     23     24     25
-0.002  0.014 -0.226 -0.030

Could you please check where the object acfresidfit comes from and
how you generated it?
Cheers
Joris

On Tue, Jul 6, 2010 at 9:47 AM, nuncio m  wrote:
> Hi list,
>          I have the following code to compute the acf of a time series
> acfresid <- acf(residfit), where residfit is the series
> when I type acfresid at the prompt the follwoing is displayed
>
> Autocorrelations of series ‘residfit’, by lag
>
> 0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333
>
>  1.000 -0.015  0.010  0.099  0.048 -0.014 -0.039 -0.019  0.040  0.018  0.042
>
> 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833 1.6667 1.7500
>
>  0.078 -0.029  0.028 -0.016 -0.021 -0.109  0.000 -0.038 -0.006  0.015 -0.032
>
> 1.8333 1.9167 2.0000 2.0833
> -0.002  0.014 -0.226 -0.030
> Residfit is a timeseries object at monthly interval (0.0833), Here I
> understand R computed the correlation at lags 0 to 2 years.
>
> What is surprising to me is
> if I type acfresidfit at the prompt the following is displayed
>
> Autocorrelations of series ‘residfit’, by lag
>
>     0      1      2      3      4      5      6      7      8      9     10
>
>  1.000 -0.004  0.011  0.041 -0.056  0.019 -0.052 -0.027 -0.008 -0.012 -0.034
>
>    11     12     13     14     15     16     17     18     19     20     21
>
>  0.024 -0.005  0.006 -0.045  0.031 -0.035 -0.011 -0.021 -0.020 -0.010 -0.007
>
>    22     23     24     25
> -0.038  0.017  0.051  0.038
> From the header I understand both are autocorrelations computed at the same
> lags. but the correlations are different
>
> where am I going wrong and which is the correct one.
>
> file residfit is also attached(filename-fileree2_test_out.txt)
> Thanks
> nuncio
> --
> Nuncio.M
> Research Scientist
> National Center for Antarctic and Ocean research
> Head land Sada
> Vasco da Gamma
> Goa-403804
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>





Re: [R] S4 classes and debugging - Is there a summary?

2010-07-02 Thread Joris Meys
On Fri, Jul 2, 2010 at 4:01 PM, Martin Morgan  wrote:
>
>  selectMethod(xyValues, c('RasterLayer', 'matrix'))
>
> would be my choice.
>
Thanks, that's a discovery. I guess I misunderstood the help pages on that one.
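
For anyone trying this without the raster package, a self-contained toy
(the generic and class names are invented) showing what selectMethod()
gives you:

```r
library(methods)
setGeneric("area", function(shape) standardGeneric("area"))
setClass("Square", representation(side = "numeric"))
setMethod("area", "Square", function(shape) shape@side^2)

# selectMethod() returns the method a call would dispatch to,
# so its body can be read directly instead of digging through the source
m <- selectMethod("area", "Square")
body(m)   # shape@side^2
```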
>
> I don't really have the right experience, but Chamber's 2008 Software
> for Data Analysis... and Gentleman's 2008 R Programming for
> Bioinformatics... books would be where I'd start. ?Methods and ?Classes
> are I think under-used.

I did read the posting guide ;-) but I have to admit the help pages
are not always that clear, which is one of the reasons why S4 gets me
frustrated so easily. I also have Chambers' book, and Gentleman's book
is here at the department as well. I'll go through the relevant
chapters again.

Thanks
Joris



Re: [R] S4 classes and debugging - Is there a summary?

2010-07-02 Thread Joris Meys
Correction: trace() of course works when specifying the signature, e.g.:

trace(xyValues, tracer = browser, signature = c(object = "RasterLayer", xy = "matrix"))

allows me to browse through the function. I should have specified the
signature; I overlooked that. Still, if there's a manual on S4 that
anybody would like to mention, please go ahead. I find bits and pieces
on the internet, but haven't yet found an overview dealing with all the

Cheers
Joris

On Fri, Jul 2, 2010 at 2:05 PM, Joris Meys  wrote:
> Dear all,
>
> I'm getting more and more frustrated with the whole S4 thing and I'm
> looking for a more advanced summary on how to deal with them. Often I
> get error messages that don't make sense at all, or the code is not
> doing what I think it would do. Far too often inspecting the code
> requires me to go to the source, which doesn't really help in easily
> finding the bit of code you're interested in.
>
> Getting the code with getAnywhere() doesn't always work. Debug()
> doesn't work. Using trace() and browser() is not an option, as I can't
> find the correct method.
>
> eg :
> library(raster)
>> getAnywhere(xyValues)
> A single object matching ‘xyValues’ was found
> It was found in the following places
>  package:raster
>  namespace:raster
> with value
>
> standardGeneric for "xyValues" defined from package "raster"
>
> function (object, xy, ...)
> standardGeneric("xyValues")
> 
> Methods may be defined for arguments: object, xy
> Use  showMethods("xyValues")  for currently available ones.
>> showMethods("xyValues")
> Function: xyValues (package raster)
> object="Raster", xy="data.frame"
> object="Raster", xy="SpatialPoints"
> object="Raster", xy="vector"
> object="RasterLayer", xy="matrix"
> object="RasterStackBrick", xy="matrix"
>
> And now...?
>
> Is there an overview that actually explains how you get the
> information you're looking for without strolling through the complete
> source? Sorry if I sound frustrated, but this is costing me huge
> amounts of time, to that extent that I rather write a custom function
> than use one in an S4 package if I'm not absolutely sure about what it
> does and how it achieves it.
>
> Cheers
> Joris
>
>
>





[R] S4 classes and debugging - Is there a summary?

2010-07-02 Thread Joris Meys
Dear all,

I'm getting more and more frustrated with the whole S4 thing and I'm
looking for a more advanced summary on how to deal with them. Often I
get error messages that don't make sense at all, or the code is not
doing what I think it would do. Far too often inspecting the code
requires me to go to the source, which doesn't really help in easily
finding the bit of code you're interested in.

Getting the code with getAnywhere() doesn't always work. debug()
doesn't work either. Using trace() and browser() is not an option, as I can't
find the correct method.
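The usual route to a specific S4 method body is getMethod() with an explicit signature, and trace() also accepts a signature argument. A minimal sketch on a toy generic (raster's actual classes are not used here; area2, Square and Circle are invented for illustration):

```r
# Toy S4 generic with two methods, standing in for a generic like xyValues()
setGeneric("area2", function(shape, ...) standardGeneric("area2"))
setClass("Square", slots = c(side = "numeric"))
setClass("Circle", slots = c(r = "numeric"))
setMethod("area2", "Square", function(shape, ...) shape@side^2)
setMethod("area2", "Circle", function(shape, ...) pi * shape@r^2)

showMethods("area2")                 # list the available signatures
body(getMethod("area2", "Square"))   # inspect the code of one specific method
# To step through one method rather than the generic:
# trace("area2", browser, signature = "Square")
area2(new("Square", side = 3))       # returns 9
```

getMethod() fails loudly if the signature doesn't match exactly, which is often the quickest way to discover which method is actually dispatched.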

eg :
library(raster)
> getAnywhere(xyValues)
A single object matching ‘xyValues’ was found
It was found in the following places
  package:raster
  namespace:raster
with value

standardGeneric for "xyValues" defined from package "raster"

function (object, xy, ...)
standardGeneric("xyValues")

Methods may be defined for arguments: object, xy
Use  showMethods("xyValues")  for currently available ones.
> showMethods("xyValues")
Function: xyValues (package raster)
object="Raster", xy="data.frame"
object="Raster", xy="SpatialPoints"
object="Raster", xy="vector"
object="RasterLayer", xy="matrix"
object="RasterStackBrick", xy="matrix"

And now...?

Is there an overview that actually explains how you get the
information you're looking for without trawling through the complete
source? Sorry if I sound frustrated, but this is costing me huge
amounts of time, to the extent that I'd rather write a custom function
than use one from an S4 package if I'm not absolutely sure about what it
does and how it achieves it.

Cheers
Joris


-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] left end or right end

2010-07-01 Thread Joris Meys
First of all, read the posting guide carefully :
http://www.R-project.org/posting-guide.html
Your question is far from clear. When you say that the lengths of P
and Q are different, do you mean the length of the data or the difference
between start and end? That makes a world of difference.

Regarding the statistical test, that depends on what your data
represents. Is it possible for P to fall close to both the left and the
right end:
P-
Q   ---
For example.

You should also specify which test you want to use. Then people on the
list will be able to tell you whether it is available in R. You can
of course construct your own test with the tools R provides, but
again, this requires a lot more information. Besides, the list is
actually not intended for statistical advice, but for advice regarding
R code. Maybe somebody will join in with some statistical guidance,
but if you don't know what to do, you'd better consult a statistician at
your department.
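If "close to an end" only needs to be quantified rather than formally tested, one could sketch it like this (rel_pos and its interval arguments are a hypothetical helper invented here, not something from the thread):

```r
# Where does the midpoint of interval P sit within interval Q?
# Returns a value in [0, 1]: 0 = left end of Q, 1 = right end of Q.
rel_pos <- function(p_start, p_end, q_start, q_end) {
  ((p_start + p_end) / 2 - q_start) / (q_end - q_start)
}

rel_pos(8, 10, 0, 10)   # 0.9 -> close to the right end of Q
```

Collecting this value over all P/Q pairs would at least give a distribution one could then reason about statistically.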

Cheers
Joris

On Thu, Jul 1, 2010 at 1:53 PM, ravikumar sukumar
 wrote:
> Dear all,
> I am a biologist. I have two sets of distance P(start1, end1) and Q(start2,
> end2).
> The distance will be like this.
> P         
> Q  
>
> I want to know whether P falls closely to the right end or left  end of Q.
>  P and Q are of different lengths for each data point. There are more than
> 1 pairs of P and Q.
> Is there any test or function in R to bring a statistically significant
> conclusion.
>
> Thanks for all,
> Suku
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] run R

2010-06-30 Thread Joris Meys
Welcome to a new world. You're using the _programming language_ R with
thousands of packages, so first of all you should read the
introduction material thoroughly:

http://cran.r-project.org/doc/manuals/R-intro.pdf
http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf

Regarding your question: check the help file of source():
?source

If you have a script, say foo.R, in the folder c:\bar, do:

source("c:/bar/foo.R")

Mind the forward slashes instead of backslashes. I assume you're on a
Windows system...
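Concretely, file.path() builds such a path with forward slashes, which sidesteps the backslash-escaping problem entirely (a sketch reusing the foo.R / c:\bar names from above):

```r
# file.path() joins path components with "/" on every platform,
# so no backslash escaping is needed on Windows
script <- file.path("c:", "bar", "foo.R")   # "c:/bar/foo.R"

# echo = TRUE also prints each command as it runs, handy for debugging
# source(script, echo = TRUE)
```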

Cheers
Joris

On Wed, Jun 30, 2010 at 3:08 PM,   wrote:
>
>
>      Hi,
>
>
>      I'm starting use the R Package and I have some .R scripts. How
>      can I run these .R scripts.
>
>
>      Conrado
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Use of processor by R 32bit on a 64bit machine

2010-06-29 Thread Joris Meys
*slaps forehead*
Thanks. So out it goes, that hyperthreading. Who invented
hyperthreading on a quad-core anyway?


Cheers
Joris

2010/6/29 Uwe Ligges :
>
>
> On 29.06.2010 15:30, Joris Meys wrote:
>>
>> Dear all,
>>
>> I've recently purchased a new 64bit system with an intel i7 quadcore
>> processor. As I understood (maybe wrongly) that to date the 32bit
>> version of R is more stable than the 64bit, I installed the 32bit
>> version and am happily using it ever since. Now I'm running a whole
>> lot of models, which goes smoothly, and I thought out of curiosity to
>> check how much processor I'm using. I would have thought I used 25%
>> (being one core), as on my old dual core R uses 50% of the total
>> processor capacity. Funny, it turns out that R is currently using only
>> 12-13% of my cpu, which is about half of what I expected.
>>
>
> An Intel Core i7 Quadcore has 8 virtual cores since it supports
> hyperthreading. R uses one of these virtual cores. Note that 2 virtual cores
> won't be twice as fast since they are running on the same physical core.
> Hence this is expected.
>
> Uwe Ligges
>
>
>
>> Did I miss something somewhere? Should I change some settings? I'm
>> running on a Windows 7 enterprise. I looked around already, but I have
>> the feeling I overlooked something.
>>
>> Cheers
>> Joris
>>
>> sessionInfo()
>> R version 2.10.1 (2009-12-14)
>> i386-pc-mingw32
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
>> States.1252    LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C                           LC_TIME=English_United
>> States.1252
>>
>> attached base packages:
>> [1] grDevices datasets  splines   graphics  stats     tcltk     utils
>>    methods   base
>>
>> other attached packages:
>> [1] svSocket_0.9-48 TinnR_1.0.3     R2HTML_2.0.0    Hmisc_3.7-0
>> survival_2.35-7
>>
>> loaded via a namespace (and not attached):
>> [1] cluster_1.12.3 grid_2.10.1    lattice_0.18-3 svMisc_0.9-57
>>  tools_2.10.1
>>
>>
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Use of processor by R 32bit on a 64bit machine

2010-06-29 Thread Joris Meys
Dear all,

I've recently purchased a new 64bit system with an intel i7 quadcore
processor. As I understood (maybe wrongly) that to date the 32bit
version of R is more stable than the 64bit, I installed the 32bit
version and am happily using it ever since. Now I'm running a whole
lot of models, which goes smoothly, and I thought out of curiosity to
check how much processor I'm using. I would have thought I used 25%
(being one core), as on my old dual core R uses 50% of the total
processor capacity. Funny, it turns out that R is currently using only
12-13% of my cpu, which is about half of what I expected.

Did I miss something somewhere? Should I change some settings? I'm
running on a Windows 7 enterprise. I looked around already, but I have
the feeling I overlooked something.

Cheers
Joris

sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] grDevices datasets  splines   graphics  stats     tcltk     utils     methods   base

other attached packages:
[1] svSocket_0.9-48 TinnR_1.0.3     R2HTML_2.0.0    Hmisc_3.7-0     survival_2.35-7

loaded via a namespace (and not attached):
[1] cluster_1.12.3 grid_2.10.1    lattice_0.18-3 svMisc_0.9-57  tools_2.10.1


-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (New) Popularity of R, SAS, SPSS, Stata...

2010-06-28 Thread Joris Meys
Dear Robert,

I've tried to access that link, but to no avail. It seems the server
r4stats.com is down, as it doesn't respond. This link got me to the
site:
http://sites.google.com/site/r4statistics/popularity

Cheers
Joris

On Mon, Jun 28, 2010 at 3:52 PM, Muenchen, Robert A (Bob)
 wrote:
> Greeting Listserv Readers,
>
> At http://r4stats.com/popularity I have added plots, data, and/or
> discussion of:
>
> 1. Scholarly impact of each package across the years
> 2. The number of subscribers to some of the listservs
> 3. How popular each package is among Google searches across the years
> 4. Survey results from a Rexer Analytics poll
> 5. Survey results from a KDnuggests poll
> 6. A rudimentary analysis of the software skills that employers are
> seeking
>
> Thanks very much to all the folks who helped on this project including:
> John Fox, Marc Schwartz, Duncan Murdoch, Martin Weiss, John (Jiangtang)
> HU, Andre Wielki, Kjetil Halvorsen, Dario Solari, Joris Meys, Keo
> Ormsby, Karl Rexer, and Gregory Piatetsky-Shapiro.
>
> If anyone can think of other angles, please let me know.
>
> Cheers,
> Bob
>
> =
>  Bob Muenchen (pronounced Min'-chen), Manager
>  Research Computing Support
>  Voice: (865) 974-5230
>  Email: muenc...@utk.edu
>  Web:   http://oit.utk.edu/research,
>  News:  http://oit.utk.edu/research/news.php
>  Feedback: http://oit.utk.edu/feedback/
> =
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Different standard errors from R and other software

2010-06-26 Thread Joris Meys
If I understand correctly from their website, "discrete choice" models
are mostly generalized linear models with the common link functions
for discrete data? Apart from a few names I didn't recognize, all
analyses seem quite "standard" to me. So I wonder why you would write
the log-likelihood yourself for techniques that are implemented in R.

Unless I missed something pretty important, or you want to do a
specific analysis that wasn't clear to me, you should take a closer
look at the possibilities in R for generalized linear (mixed)
modelling and so on.

Binary choice translates to a simple glm with a logit function.
Multinomial choice can be done with e.g. multinom() from nnet. Ordered
choice can be done with polr() from the MASS package. A nice one to
look at is the package mgcv or gamm4 in case of big datasets. They
offer very flexible models that can include random terms, specific
variance-covariance structures and non-linear relations in the form of
splines.
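A minimal sketch of the binary-choice case as a logit glm, on simulated data (the coefficients -0.5 and 1.2 and the sample size are invented for the simulation, not taken from the poster's model):

```r
# Binary choice as a logistic regression via glm()
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))   # true intercept -0.5, slope 1.2

fit <- glm(y ~ x, family = binomial(link = "logit"))
coef(fit)   # estimates should land roughly near -0.5 and 1.2

# Multinomial choice: nnet::multinom(choice ~ x)
# Ordered choice:     MASS::polr(factor(choice, ordered = TRUE) ~ x)
```

vcov(fit) then gives the variance-covariance matrix from which the standard errors are derived, which is exactly the quantity the poster is comparing across packages.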

Apologies if this is all obvious and known to you. In that case you
might want to specify what exactly it is you are comparing and how
exactly you calculated it yourself.

Cheers
Joris

On Fri, Jun 25, 2010 at 11:47 PM, Min Chen  wrote:
> Hi all,
>
>    Sorry to bother you. I'm estimating a discrete choice model in R using
> the maxBFGS command. Since I wrote the log-likelihood myself, in order to
> double check, I run the same model in Limdep. It turns out that the
> coefficient estimates are quite close; however, the standard errors are very
> different. I also computed the hessian and outer product of the gradients in
> R using the numDeriv package, but the results are still very different from
> those in Limdep. Is it the routine to compute the inverse hessian that
> causes the difference? Thank you very much!
>
>     Best wishes.
>
>
> Min
>
>
> --
> Min Chen
> Ph.D. Candidate
> Department of Agricultural, Food, and Resource Economics
> 125 Cook Hall
> Michigan State University
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-25 Thread Joris Meys
>>I had taken the opposite tack with Google Trends by subtracting
> keywords
>>like:
>>SAS -shoes -airlines -sonar...
>>but never got as good results as that beautiful "X code for" search.
>>When you see the end-of-semester panic bumps in traffic, you know
> you're
>>nailing it!
>
> I have to eat those words already. The "R code for" search that showed a
> peak every December did not have quotes around it, so it was searching
> for those three words not the complete phrase. When you add the quotes,
> the peaks vanish.

Don't swallow! You're looking through search terms, not through web
pages. R code for regression, regression code R etc. are all valid
searches, no quotation marks needed.

http://www.google.com/insights/search/#q=code%20for%20r%2Ccode%20for%20SAS%2Ccode%20for%20SPSS%2Ccode%20for%20matlab&cmpt=q

This one is nice too. You can see that the bump in the autumn semester
for R is replacing the one for Matlab. Then in the spring semester
Matlab stays high but R drops. And both the US and India always have a
very large search index, whereas the rest of the world barely
registers. Which leads me to the conclusion that: 1) the results are
probably coming from google.com, excluding local versions, and 2) in
the US (and India), statistics is mainly taught in the autumn
semester. Given that daylight has a beneficial effect on emotional
well-being, the unpopularity of statistics is likely caused by
unfortunate scheduling.

Forget Excel. Google rocks! ;-)

Cheers
Joris

>
> Once you go the phrase route, you gain precision but end up with zero
> counts on various phrases. I avoided that by combining them with "+" to
> get enough to plot. The resulting graph shows SAS dominant until
> mid-2006 when SPSS takes the top position, followed by R, SAS, Stata in
> order:
>
> http://www.google.com/insights/search/#q=%22r%20code%20for%22%2B%22r%20m
> anual%22%2B%22r%20tutorial%22%2B%22r%20graph%22%2C%22sas%20code%20for%22
> %2B%22sas%20manual%22%2B%22sas%20tutorial%22%2B%22sas%20graph%22%2C%22sp
> ss%20code%20for%22%2B%22spss%20manual%22%2B%22spss%20tutorial%22%2B%22sp
> ss%20graph%22%2C%22stata%20code%20for%22%2B%22stata%20manual%22%2B%22sta
> ta%20tutorial%22%2B%22stata%20graph%22%2C%22s-plus%20code%20for%22%2B%22
> s-plus%20manual%22%2Bs-plus%20tutorial%22%2B%22s-plus%20graph%22&cmpt=q
>
> This might be a good one to add to http://r4stats.com/popularity
>
> Bob
>
>>
>>I see that there's a car, the R Code Mustang, that adding "for" gets
> rid
>>of.
>>
>>Thanks for getting me back on a topic that I had given up on!
>>
>>Bob
>>
>>>-Original Message-
>>>From: r-help-boun...@r-project.org
>>[mailto:r-help-boun...@r-project.org]
>>>On Behalf Of Joris Meys
>>>Sent: Thursday, June 24, 2010 7:56 PM
>>>To: Dario Solari
>>>Cc: r-help@r-project.org
>>>Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...
>>>
>>>Nice idea, but quite sensitive to search terms, if you compare your
>>>result on "... code" with "... code for":
>>>http://www.google.com/insights/search/#q=r%20code%20for%2Csas%20code%2
> 0
>>f
>>>or%2Cspss%20code%20for&cmpt=q
>>>
>>>On Thu, Jun 24, 2010 at 10:48 PM, Dario Solari
> 
>>>wrote:
>>>> First: excuse for my english
>>>>
>>>> My opinion: a useful font for measuring "popoularity" can be Google
>>>> Insights for Search - http://www.google.com/insights/search/#
>>>>
>>>> Every person using a software like R, SAS, SPSS needs first to learn
>>>> it. So probably he make a web-search for a manual, a tutorial, a
>>>> guide. One can measure the share of this kind of serach query.
>>>> This kind of results can be useful to determine trends of
>>>> "popularity".
>>>>
>>>> Example 1: "R tutorial/manual/guide", "SAS tutorial/manual/guide",
>>>> "SPSS tutorial/manual/guide"
>>>>
>>>http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20m
> a
>>n
>>>ual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%
> 2
>>B
>>>%22spss%20manual%22%2B%22spss%20guide%22%2C%22sas%20tutorial%22%2B%22s
> a
>>s
>>>%20manual%22%2B%22sas%20guide%22&cmpt=q
>>>>
>>>> Example 2: "R software", "SAS software", "SPSS software"
>>>>
>>>http://www.google.com/insights/search/#q=%22r%20software%22%2C%22spss%
> 2
>>0
>

Re: [R] All a column to a data frame with a specific condition

2010-06-25 Thread Joris Meys
merge(sum_plan_a,data,by="plan_a")

Cheers
Joris
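Spelled out on the data from the question quoted below, with base aggregate() producing the per-group sums first; ave() is shown as an alternative that skips the reshape/merge step entirely (it is not what the poster used, just another base-R route):

```r
plan_a <- c('apple','orange','apple','apple','pear','bread')
plan_b <- c('bread','bread','orange','bread','bread','yogurt')
value  <- 1:6
data   <- data.frame(plan_a, plan_b, value)

# Route 1: aggregate the sums per plan_a, then merge them back
sums <- aggregate(value ~ plan_a, data = data, FUN = sum)
names(sums)[2] <- "sum_plan_a"
merged <- merge(data, sums, by = "plan_a")

# Route 2: ave() computes the group sum in place, keeping row order
data$sum_plan_a <- ave(data$value, data$plan_a, FUN = sum)
data$sum_plan_a   # 8 2 8 8 5 6
```

Note that merge() may reorder the rows (here it sorts by plan_a), while ave() preserves the original order, which matches the layout the poster asked for.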


On Sat, Jun 26, 2010 at 2:27 AM, Yi  wrote:
> Hi, folks,
>
> Please first look at the codes:
>
> plan_a=c('apple','orange','apple','apple','pear','bread')
> plan_b=c('bread','bread','orange','bread','bread','yogurt')
> value=1:6
> data=data.frame(plan_a,plan_b,value)
> library(plyr)
> library(reshape)
> mm=melt(data, id=c('plan_a','plan_b'))
> sum_plan_a=cast(mm,plan_a~variable,sum)
>
> ### I would like to add a new column to the data.frame named 'data',  with
> the same sum of value for the same type of plan_a
> ### The result should come up like this:
>
>   plan_a  plan_b  value  sum_plan_a
> 1  apple  bread      1        8
> 2 orange  bread     2        2
> 3  apple orange     3        8
> 4  apple  bread      4        8
> 5   pear  bread      5         5
> 6  bread yogurt     6         6
>
> Any tips?
>
> Thank you.
>
>        [[alternative HTML version deleted]]
>
> ______
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Average 2 Columns when possible, or return available value

2010-06-25 Thread Joris Meys
Just want to add that if you want to clean out the NA rows in a matrix
or data frame, take a look at ?complete.cases. It can be handy
with big datasets. I got curious, so I just ran the code given here
on a big dataset, before and after removing the NA rows. I have to be
honest, this is surely an illustration of the power of rowMeans. I'm
amazed myself.

DF <- data.frame(
  A=rep(DF$A,1),
  B=rep(DF$B,1)
)

> system.time(apply(DF,1,mean,na.rm=TRUE))
   user  system elapsed
  13.26    0.06   13.46

> system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
   user  system elapsed
   0.03    0.00    0.03

> system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
+ na.rm=TRUE)[,-1]))
+ )

Timing stopped at: 227.84 1.03 249.31  -- I got impatient and pressed escape

> DF <- DF[complete.cases(DF),]

> system.time(apply(DF,1,mean,na.rm=TRUE))
   user  system elapsed
   0.39    0.00    0.39

> system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
   user  system elapsed
   0.01    0.00    0.02

> system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
+ na.rm=TRUE)[,-1]))
+ )
   user  system elapsed
  10.01    0.07   13.40

Cheers
Joris
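One caveat worth sketching for the rowMeans() route: with na.rm = TRUE, rows that are entirely NA come back as NaN rather than NA, so a small clean-up step may be needed to match the output the original poster asked for (toy three-row data, not the 30-row frame from the question):

```r
DF <- data.frame(A = c(22.6, NA, 18.3),
                 B = c(NA,   NA, 18.2))

res <- rowMeans(DF, na.rm = TRUE)   # all-NA rows come back as NaN, not NA
res[is.nan(res)] <- NA              # convert NaN back to NA
res                                  # 22.60    NA 18.25
```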


On Sat, Jun 26, 2010 at 1:08 AM, emorway  wrote:
>
> Forum,
>
> Using the following data:
>
> DF<-read.table(textConnection("A B
> 22.60 NA
>  NA NA
>  NA NA
>  NA NA
>  NA NA
>  NA NA
>  NA NA
>  NA NA
> 102.00 NA
>  19.20 NA
>  19.20 NA
>  NA NA
>  NA NA
>  NA NA
>  11.80 NA
>  7.62 NA
>  NA NA
>  NA NA
>  NA NA
>  NA NA
>  NA NA
>  75.00 NA
>  NA NA
>  18.30 18.2
>  NA NA
>  NA NA
>  8.44 NA
>  18.00 NA
>  NA NA
>  12.90 NA"),header=T)
> closeAllConnections()
>
> The second column is a duplicate reading of the first column, and when two
> values are available, I would like to average column 1 and 2 (example code
> below).  But if there is only one reading, I would like to retain it, but I
> haven't found a good way to exclude NA's using the following code:
>
> t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean)[,-1]))
>
> Currently, row 24 is the only row with a returned value.  I'd like the
> result to return column "A" if it is the only available value, and average
> where possible.  Of course, if both columns are NA, NA is the only possible
> result.
>
> The result I'm after would look like this (row 24 is an avg):
>
>  22.60
>    NA
>    NA
>    NA
>    NA
>    NA
>    NA
>    NA
> 102.00
>  19.20
>  19.20
>    NA
>    NA
>    NA
>  11.80
>  7.62
>    NA
>    NA
>    NA
>    NA
>    NA
>  75.00
>    NA
>  18.25
>    NA
>    NA
>  8.44
>  18.00
>    NA
>  12.90
>
> This is a small example from a much larger data frame, so if you're
> wondering what the deal is with list(), that will come into play for the
> larger problem I'm trying to solve.
>
> Respectfully,
> Eric
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Average-2-Columns-when-possible-or-return-available-value-tp2269049p2269049.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?

2010-06-25 Thread Joris Meys
Thanks for the link, very interesting book. Yet, I couldn't find the
part about EDMA. It would have surprised me anyway, as the input of
multidimensional scaling is one matrix with euclidean distances
between your observations, whereas in EDMA the data consist of a
number of distance matrices.

Quite a different thing if you ask me. Neither cmdscale nor isoMDS nor
its derived functions (e.g. metaMDS in the vegan package) are going to
be of any help.

Now I come to think of it, vegan has a procrustes function, but I'm
not sure if it is generalized to be of use in EDMA.

Cheers
Joris

On Fri, Jun 25, 2010 at 7:42 PM, Kjetil Halvorsen
 wrote:
> There is a freely downloadable and very relevant (& readable) book at
> https://ccrma.stanford.edu/~dattorro/mybook.html
> "Convex Optimization and Euclidean Distance geometry", and it indeed names 
> EDMA
> as a form of multidimensional scaling (or maybe in the oposite way).
> You should have a look
> at the codes for multidimensional scaling in R.
>
> Kjetil
>
> On Fri, Jun 25, 2010 at 6:25 AM, gokhanocakoglu  
> wrote:
>>
>> thanks for your interests Joris
>>
>>
>>
>> Gokhan OCAKOGLU
>> Uludag University
>> Faculty of Medicine
>> Department of Biostatistics
>> http://www20.uludag.edu.tr/~biostat/ocakoglui.htm
>>
>> --
>> View this message in context: 
>> http://r.789695.n4.nabble.com/Euclidean-Distance-Matrix-Analysis-EDMA-in-R-tp2266797p2268257.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Joris Meys
2010/6/25 Frank E Harrell Jr :
> The central limit theorem doesn't help.  It just addresses type I error,
> not power.
>
> Frank

I don't think I stated otherwise. I am aware of the fact that the
Wilcoxon has an asymptotic relative efficiency greater than 1 compared
to the t-test in the case of skewed distributions. Apologies if I caused
more confusion.

The "problem" with the Wilcoxon is twofold, as far as I understood the
data correctly:
- there are quite a few ties
- the Wilcoxon assumes under the null that the distributions are the
same, not only in location. The influence of unequal variances and/or
shapes of the distribution is enhanced in the case of unequal sample
sizes.

The central limit theorem means that:
- the t-test will do correct inference in the presence of ties
- unequal variances can be taken into account using the modified
t-test, both in the case of equal and unequal sample sizes

For these reasons, I would personally use the t-test for comparing two
samples from the described population. Your mileage may vary.

Cheers
Joris
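A rough sketch of that comparison on simulated skewed data with ties (the population and sample sizes mimic the thread's n=2418 and n≈250, but the data themselves are invented):

```r
set.seed(42)
pop <- round(rexp(2418), 1)      # skewed population; rounding creates many ties
smp <- sample(pop, 250)

# t-test against the population mean: with n = 250 the CLT makes the
# sample mean close to normal, ties and skewness notwithstanding
t.test(smp, mu = mean(pop))

# One-sample Wilcoxon signed rank test: with this many ties it falls
# back to a normal approximation and typically warns that an exact
# p-value cannot be computed
wilcox.test(smp, mu = mean(pop))
```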

>
> On 06/25/2010 04:29 AM, Joris Meys wrote:
>> As a remark on your histogram : use fewer breaks! This histogram tells
>> you nothing. An interesting function is ?density , e.g. :
>>
>> x<-rnorm(250)
>> hist(x,freq=F)
>> lines(density(x),col="red")
>>
>> See also this ppt, a very nice and short introduction to graphics in R :
>> http://csg.sph.umich.edu/docs/R/graphics-1.pdf
>>
>> 2010/6/25 Atte Tenkanen:
>>> Is there anything for me?
>>>
>>> There is a lot of data, n=2418, but there are also a lot of ties.
>>> My sample n≈250-300
>>
>> You should think about the central limit theorem. Actually, you can
>> just use a t-test to compare means, as with those sample sizes the
>> mean is almost certainly normally distributed.
>>>
>>> i would like to test, whether the mean of the sample differ significantly 
>>> from the population mean.
>>>
>> According to probability theory, this will be in 5% of the cases if
>> you repeat your sampling infinitly. But as David asked: why on earth
>> do you want to test that?
>>
>> cheers
>> Joris
>>
>
>
> --
> Frank E Harrell Jr   Professor and ChairmanSchool of Medicine
> Department of Biostatistics   Vanderbilt University
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] i want create script

2010-06-25 Thread Joris Meys
Please read the posting guide :  http://www.R-project.org/posting-guide.html

Your question is very vague. One could assume you're completely new to
R and want the commands to read a csv file (see ?read.csv) and to
write out a table (e.g. ?write.table to write your predicted data in
text format).

My guess is you want to run this script from the shell without having to
open R, similar to a Perl script. For this, take a look at:

http://cran.r-project.org/doc/manuals/R-intro.html#Scripting-with-R
http://projects.uabgrid.uab.edu/r-group/wiki/CommandLineProcessing
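A minimal sketch of such a script (foo.R, input.csv and output.txt are placeholder names, and the prediction step is left as comments since the poster's model isn't specified):

```r
# foo.R -- run from the shell as:  Rscript foo.R input.csv output.txt
args <- commandArgs(trailingOnly = TRUE)   # e.g. c("input.csv", "output.txt")

# dat       <- read.csv(args[1])           # read the input table
# predicted <- ...                         # whatever prediction is needed
# write.table(predicted, file = args[2], row.names = FALSE)
```

Note that Excel sheets would first need to be saved as .csv (or read with a package such as readxl) before read.csv() can handle them.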

Cheers
Joris

On Fri, Jun 25, 2010 at 8:26 AM, vijaysheegi  wrote:
>
> Hi R community,
> I want to create a script which will take the .csv table as input and do
> some prediction and output should be returned to some file.Inputs is exel
> sheet containing some tables of data.out should be table of predicted
> data.Will some one help me in this regards...
> Thanks in advance.
>
> I am using Windows R.Please advise proccedure to create Rscript.
>
>
> Regards
> -
> Vijay
> Research student
> Bangalore
> India
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/i-want-create-script-tp2268011p2268011.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] Optimizing given two vectors of data

2010-06-25 Thread Joris Meys
optim uses vectors of _parameters_, not of data. You supply a
(likelihood) function, give initial values for the parameters, and get
the optimized parameters back. See ?optim and the examples therein;
they include an example of optimization using multiple data columns.
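For instance, a sketch of a maximum-likelihood fit where both data vectors are passed to the objective function through optim()'s `...` argument (the exponential model below is illustrative only, not the poster's Arrhenius model):

```r
# Negative log-likelihood taking the parameter vector plus two data vectors.
negll <- function(par, failtimes, temps) {
  rate <- exp(par[1] + par[2] * temps)           # rate depends on temperature
  -sum(dexp(failtimes, rate = rate, log = TRUE))
}
set.seed(1)
temps     <- runif(50, 300, 400)                       # simulated temperatures
failtimes <- rexp(50, rate = exp(-3 + 0.005 * temps))  # simulated failure times
# Extra named arguments to optim() are forwarded to the objective function.
fit <- optim(c(0, 0), negll, failtimes = failtimes, temps = temps)
fit$par
```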

Cheers
Joris

On Fri, Jun 25, 2010 at 8:12 AM, confusedSoul  wrote:
>
> I am trying to estimate an Arrhenius-exponential model in R.  I have one
> vector of data containing failure times, and another containing
> corresponding temperatures.  I am trying to optimize a maximum likelihood
> function given BOTH these vectors.  However, the optim command takes only
> one such vector parameter.
>
> How can I pass both vectors into the function?
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Optimizing-given-two-vectors-of-data-tp2268002p2268002.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>






Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?

2010-06-25 Thread Joris Meys
I've been looking around myself, but I couldn't find any. Maybe
somebody will chime in to direct you to the correct places. I also
checked the papers, and it seems not too hard to implement. If I find
some time, I'll take a look at it next week.

For the other two gentlemen, check:

http://www.getahead.psu.edu/PDF/EuclideanDistanceMatrixAnalysis.pdf
http://www.getahead.psu.edu/PDF/no.1.pdf

Cheers
Joris

On Fri, Jun 25, 2010 at 8:30 AM, gokhanocakoglu  wrote:
>
> In fact, Euclidean Distance Matrix Analysis (EDMA) of form is a coordinate-
> free approach to the analysis of form using landmark data, developed by
> Subhash Lele and Joan Richtsmeier. They also developed a computer program
> (http://www.getahead.psu.edu/comment/edma.asp) that allows one to perform
> several techniques, including EDMA I-II, but I wonder whether there is a
> package or code available in R to perform EDMA...
>
> thanks
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Euclidean-Distance-Matrix-Analysis-EDMA-in-R-tp2266797p2268018.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>






Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Joris Meys
As a remark on your histogram: use fewer breaks! This histogram tells
you nothing. An interesting function is ?density, e.g.:

x <- rnorm(250)
hist(x, freq = FALSE)
lines(density(x), col = "red")

See also this ppt, a very nice and short introduction to graphics in R :
http://csg.sph.umich.edu/docs/R/graphics-1.pdf

2010/6/25 Atte Tenkanen :
> Is there anything for me?
>
> There is a lot of data, n=2418, but there are also a lot of ties.
> My sample n≈250-300

You should think about the central limit theorem. Actually, you can
just use a t-test to compare means, as with those sample sizes the
mean is almost certainly normally distributed.
>
> I would like to test whether the mean of the sample differs significantly
> from the population mean.
>
According to probability theory, it will differ significantly in 5% of
cases if you repeat your sampling infinitely often. But as David asked:
why on earth do you want to test that?
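For completeness, a sketch of that comparison with a one-sample t-test, on simulated data standing in for the poster's population of n = 2418 and a sample of 250:

```r
set.seed(42)
pop  <- rpois(2418, lambda = 4)  # stand-in for the population (n = 2418)
samp <- sample(pop, 250)         # sample of 250, as in the post
t.test(samp, mu = mean(pop))     # test the sample mean against the population mean
```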

cheers
Joris




Re: [R] Simple qqplot question

2010-06-25 Thread Joris Meys
Sorry, I missed the two-variable thing. Go with the lm solution then;
you can tweak the plot yourself (the prediction intervals are
easily obtained via predict(lm.object, interval = "prediction")). The
function qq.plot uses robust regression, but in your case ordinary
regression will do.

Regarding the shapes: this just indicates that both tails are shorter than
expected, so you have a kurtosis greater than 3 (or positive,
depending on whether you apply the correction or not).
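A sketch of that lm approach on simulated quantiles (the variable names are made up; note that predict() warns that prediction intervals computed on the training data refer to future observations):

```r
set.seed(7)
q1 <- sort(rnorm(200))           # quantiles of sample 1
q2 <- sort(rnorm(200, 0, 1.1))   # quantiles of sample 2
fit <- lm(q2 ~ q1)
pred_int <- predict(fit, interval = "prediction")  # columns: fit, lwr, upr
plot(q1, q2, main = "Q-Q plot with prediction band")
abline(fit, col = "red")
matlines(q1, pred_int[, c("lwr", "upr")], lty = 2, col = "blue")
```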

Cheers
Joris

On Fri, Jun 25, 2010 at 4:10 AM, Ralf B  wrote:
> Short rep: I have two distributions, data and data2; each build from
> about 3 million data points; they appear similar when looking at
> densities and histograms. I plotted qqplots for further eye-balling:
>
> qqplot(data, data2, xlab = "1", ylab = "2")
>
> and get an almost perfect diagonal line which means they are in fact
> very alike. Now I tried to check normality using qqnorm -- and I think
> I am doing something wrong here:
>
> qqnorm(data, main = "Q-Q normality plot for 1")
> qqnorm(data2, main = "Q-Q normality plot for 2")
>
> I am getting perfect S-shaped curves (??) for both distributions. Am I
> missing something here?
>
> |
> |                               *  *   *  *
> |                           *
> |                        *
> |                    *
> |               *
> |            *
> |         *
> | * * *
> |-
>
> Thanks, Ralf
>






Re: [R] write a loop for tallies

2010-06-24 Thread Joris Meys
It would help if you placed r <- 0, s <- 0, etc. outside the loop. The same
goes for cat(...). And get rid of the sum(r), sum(s) and so on; that does
nothing, since r, s, ... are single numbers.

This said :

See Peter Langfelder's response.

Cheers
Joris

> # see ?table for a better approach
> r<-0
> s<-0
> t<-0
> u<-0
> v<-0
>
> a<- for (i in 1:length(n)) {
> ifelse(n[i] == "3000", r <- r+1,
> ifelse(n[i] == "4000", s <- r+1,
> ifelse(n[i] == "5000", t <- r+1,
> ifelse(n[i] == "6000", u <- r+1,
> ifelse(n[i] == "7000", v <- r+1, NA)))))
> }
> cat("r = ", r, "\n")
> cat("s = ", s, "\n")
> cat("t = ", t, "\n")
> cat("u = ", u, "\n")
> cat("v = ", v, "\n")
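As the comment above hints, the whole tallying loop collapses into one call to table(); the n vector here is made-up example data:

```r
n <- c(3000, 4000, 3000, 7000, 5000, 6000, 3000)   # example data
# factor() with explicit levels keeps zero-count categories in the output
counts <- table(factor(n, levels = c(3000, 4000, 5000, 6000, 7000)))
counts
```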





Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-24 Thread Joris Meys
dangit, tab in the way...

On Fri, Jun 25, 2010 at 1:56 AM, Joris Meys  wrote:
> Nice idea, but quite sensitive to search terms, if you compare your
> result on "... code" with "... code for":
> http://www.google.com/insights/search/#q=r%20code%20for%2Csas%20code%20for%2Cspss%20code%20for&cmpt=q

This one actually shows quite nicely when students start panicking about
their projects.
Cheers
Joris

> On Thu, Jun 24, 2010 at 10:48 PM, Dario Solari  wrote:
>> First: excuse my English.
>>
>> My opinion: a useful source for measuring "popularity" can be Google
>> Insights for Search - http://www.google.com/insights/search/#
>>
>> Everyone using software like R, SAS, or SPSS first needs to learn
>> it, so they will probably do a web search for a manual, a tutorial, or a
>> guide. One can measure the share of this kind of search query.
>> These results can be useful for determining trends in
>> "popularity".
>>
>> Example 1: "R tutorial/manual/guide", "SAS tutorial/manual/guide",
>> "SPSS tutorial/manual/guide"
>> http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20manual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%2B%22spss%20manual%22%2B%22spss%20guide%22%2C%22sas%20tutorial%22%2B%22sas%20manual%22%2B%22sas%20guide%22&cmpt=q
>>
>> Example 2: "R software", "SAS software", "SPSS software"
>> http://www.google.com/insights/search/#q=%22r%20software%22%2C%22spss%20software%22%2C%22sas%20software%22&cmpt=q
>>
>> Example 3: "R code", "SAS code", "SPSS code"
>> http://www.google.com/insights/search/#q=%22r%20code%22%2C%22spss%20code%22%2C%22sas%20code%22&cmpt=q
>>
>> Example 4: "R graph", "SAS graph", "SPSS graph"
>> http://www.google.com/insights/search/#q=%22r%20graph%22%2C%22spss%20graph%22%2C%22sas%20graph%22&cmpt=q
>>
>> Example 5: "R regression", "SAS regression", "SPSS regression"
>> http://www.google.com/insights/search/#q=%22r%20regression%22%2C%22spss%20regression%22%2C%22sas%20regression%22&cmpt=q
>>
>> Some examples are cross-software (learning needs - Example 1); others can
>> be biased by the traditional use of that software (in SPSS you usually
>> don't manipulate graphs, I think).
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
>
> tel : +32 9 264 59 87
> joris.m...@ugent.be
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>






Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-24 Thread Joris Meys
Nice idea, but quite sensitive to search terms, if you compare your
result on "... code" with "... code for":
http://www.google.com/insights/search/#q=r%20code%20for%2Csas%20code%20for%2Cspss%20code%20for&cmpt=q

On Thu, Jun 24, 2010 at 10:48 PM, Dario Solari  wrote:
> First: excuse my English.
>
> My opinion: a useful source for measuring "popularity" can be Google
> Insights for Search - http://www.google.com/insights/search/#
>
> Everyone using software like R, SAS, or SPSS first needs to learn
> it, so they will probably do a web search for a manual, a tutorial, or a
> guide. One can measure the share of this kind of search query.
> These results can be useful for determining trends in
> "popularity".
>
> Example 1: "R tutorial/manual/guide", "SAS tutorial/manual/guide",
> "SPSS tutorial/manual/guide"
> http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20manual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%2B%22spss%20manual%22%2B%22spss%20guide%22%2C%22sas%20tutorial%22%2B%22sas%20manual%22%2B%22sas%20guide%22&cmpt=q
>
> Example 2: "R software", "SAS software", "SPSS software"
> http://www.google.com/insights/search/#q=%22r%20software%22%2C%22spss%20software%22%2C%22sas%20software%22&cmpt=q
>
> Example 3: "R code", "SAS code", "SPSS code"
> http://www.google.com/insights/search/#q=%22r%20code%22%2C%22spss%20code%22%2C%22sas%20code%22&cmpt=q
>
> Example 4: "R graph", "SAS graph", "SPSS graph"
> http://www.google.com/insights/search/#q=%22r%20graph%22%2C%22spss%20graph%22%2C%22sas%20graph%22&cmpt=q
>
> Example 5: "R regression", "SAS regression", "SPSS regression"
> http://www.google.com/insights/search/#q=%22r%20regression%22%2C%22spss%20regression%22%2C%22sas%20regression%22&cmpt=q
>
> Some examples are cross-software (learning needs - Example 1); others can
> be biased by the traditional use of that software (in SPSS you usually
> don't manipulate graphs, I think).
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>






Re: [R] Install package automatically if not there?

2010-06-24 Thread Joris Meys
If you want to make the change more permanent, you should take a look at
the "Rprofile.site" file. It gets loaded by R at startup of the
console. You can also set the CRAN mirror there if you want to.
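A sketch of what such an Rprofile.site entry might look like (the file lives under R_HOME/etc/, the exact path varies by install, and the mirror URL is just an example):

```r
# In R_HOME/etc/Rprofile.site -- runs at startup of every R session
local({
  r <- getOption("repos")
  r["CRAN"] <- "https://cran.r-project.org"   # set your preferred mirror
  options(repos = r)
})
```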

Cheers
Joris

On Fri, Jun 25, 2010 at 1:32 AM, Joshua Wiley  wrote:
> Hello Ralf,
>
> Glad it works for you.  As far as avoiding any prompting if packages
> are out of date: I am not sure. It honestly seems like a risky idea
> to me to have old packages overwritten without the user knowing.
> However, I added a few lines of code here to set the CRAN mirror, and
> to revert it back to whatever it was when the function ends, to make it as
> unobtrusive as possible.
>
>
> load.fun <- function(x) {
>  old.repos <- getOption("repos")
>  on.exit(options(repos = old.repos)) #this resets the repos option
> when the function exits
>  new.repos <- old.repos
>  new.repos["CRAN"] <- "http://cran.stat.ucla.edu" # set your favorite
> CRAN mirror here
>  options(repos = new.repos)
>  x <- as.character(substitute(x))
>  if(isTRUE(x %in% .packages(all.available=TRUE))) {
>    eval(parse(text=paste("require(", x, ")", sep="")))
>  } else {
>    update.packages() # recommended before installing so that
> dependencies are the latest version
>    eval(parse(text=paste("install.packages('", x, "')", sep="")))
>    eval(parse(text=paste("require(", x, ")", sep="")))
>  }
> }
>
> Best regards,
>
> Josh
>
> On Thu, Jun 24, 2010 at 3:34 PM, Ralf B  wrote:
>> Thanks, works great.
>>
>> By the way, is there a way to go even further and avoid prompting
>> (e.g. picking a particular server by default and updating without
>> further prompts)?  I could understand if such a feature does not exist for
>> security reasons...
>>
>> Thanks again,
>> Ralf
>>
>> On Thu, Jun 24, 2010 at 5:12 PM, Joshua Wiley  wrote:
>>> On Thu, Jun 24, 2010 at 1:51 PM, Joshua Wiley  
>>> wrote:
>>>> Hello Ralf,
>>>>
>>>> This is a little function that you may find helpful.  If the package
>>>> is already installed, it just loads it, otherwise it updates the
>>>> existing packages and then installs the required package.  As in
>>>> require(), 'x' does not need to be quoted.
>>>>
>>>> load.fun <- function(x) {
>>>>  x <- as.character(substitute(x))
>>>>  if(isTRUE(x %in% .packages(all.available=TRUE))) {
>>>>    eval(parse(text=paste("require(", x, ")", sep="")))
>>>>  } else {
>>>>    update.packages() # recommended before installing so that
>>>> dependencies are the latest version
>>>>    eval(parse(text=paste("install.packages('", x, "')", sep="")))
>>>>  }
>>>> }
>>>
>>> I miscopied the last line; it should be
>>>
>>> ###
>>> load.fun <- function(x) {
>>>  x <- as.character(substitute(x))
>>>  if(isTRUE(x %in% .packages(all.available=TRUE))) {
>>>    eval(parse(text=paste("require(", x, ")", sep="")))
>>>  } else {
>>>    update.packages() # recommended before installing so that
>>> dependencies are the latest version
>>>    eval(parse(text=paste("install.packages('", x, "')", sep="")))
>>>    eval(parse(text=paste("require(", x, ")", sep="")))
>>>  }
>>> }
>>> ###
>>>
>>>>
>>>> HTH,
>>>>
>>>> Josh
>>>>
>>>> On Thu, Jun 24, 2010 at 12:25 PM, Ralf B  wrote:
>>>>> Hi fans,
>>>>>
>>>>> is it possible for a script to check if a library has been installed?
>>>>> I want to automatically install it if it is missing to avoid scripts
>>>>> to crash when running on a new machine...
>>>>>
>>>>> Ralf
>>>>>
>>>>> __
>>>>> R-help@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide 
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Joshua Wiley
>>>> Ph.D. Student
>>>> Health Psychology
>>>> University of California, Los Angeles
>>>>
>>>
>>>
>>>
>>> --
>>> Joshua Wiley
>>> Ph.D. Student
>>> Health Psychology
>>> University of California, Los Angeles
>>>
>>
>
>
>
> --
> Joshua Wiley
> Ph.D. Student
> Health Psychology
> University of California, Los Angeles
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>






Re: [R] Install package automatically if not there?

2010-06-24 Thread Joris Meys
I've never seen R Perl'd this nicely before.

On Thu, Jun 24, 2010 at 10:32 PM, Albert-Jan Roskam  wrote:
> require(pkg) || install.packages(pkg)
>
> Cheers!!
>
> Albert-Jan
>
>
>
> ~~
>
> All right, but apart from the sanitation, the medicine, education, wine, 
> public order, irrigation, roads, a fresh water system, and public health, 
> what have the Romans ever done for us?
>
> ~~
>
> --- On Thu, 6/24/10, Ralf B  wrote:
>
> From: Ralf B 
> Subject: [R] Install package automatically if not there?
> To: "r-help@r-project.org" 
> Date: Thursday, June 24, 2010, 9:25 PM
>
> Hi fans,
>
> is it possible for a script to check if a library has been installed?
> I want to automatically install it if it is missing to avoid scripts
> to crash when running on a new machine...
>
> Ralf
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>






Re: [R] Simple qqplot question

2010-06-24 Thread Joris Meys
Also take a look at qq.plot in the package "car". It gives you exactly
what you want.
Cheers
Joris

On Fri, Jun 25, 2010 at 12:55 AM, Ralf B  wrote:
> More details...
>
> I have two very similar distributions. I have already plotted
> density plots from the two distributions. In addition,
> I created a qqplot that shows an almost straight line. What I want is a
> line that represents the ideal case in which the two
> distributions match perfectly. I would use this line to see how much
> the errors diverge at different stages of the plot.
>
> Ralf
>
>
>
> On Thu, Jun 24, 2010 at 5:56 PM, stephen sefick  wrote:
>> You are going to have to define the question a little better.  Also,
>> please provide a reproducible example.
>>
>> On Thu, Jun 24, 2010 at 4:44 PM, Ralf B  wrote:
>>> I am a beginner in R, so please don't step on me if this is too
>>> simple. I have two data sets datax and datay for which I created a
>>> qqplot
>>>
>>> qqplot(datax,datay)
>>>
>>> but now I want a line that indicates the perfect match so that I can
>>> see how much the plot diverges from the ideal. This ideal, however, is
>>> not normal, so I think qqnorm and qqline cannot be applied.
>>>
>>> Perhaps you can help?
>>>
>>> Ralf
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Stephen Sefick
>> 
>> | Auburn University                                   |
>> | Department of Biological Sciences           |
>> | 331 Funchess Hall                                  |
>> | Auburn, Alabama                                   |
>> | 36849                                                    |
>> |___|
>> | sas0...@auburn.edu                             |
>> | http://www.auburn.edu/~sas0025             |
>> |___|
>>
>> Let's not spend our time and resources thinking about things that are
>> so little or so large that all they really do for us is puff us up and
>> make us feel like gods.  We are mammals, and have not exhausted the
>> annoying little problems of being mammals.
>>
>>                                                                -K. Mullis
>>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>






Re: [R] help, bifurcation diagram efficiency

2010-06-24 Thread Joris Meys
Basically, don't write loops. Think vectors, matrices, ... The R
Inferno by Patrick Burns contains a lot of valuable information on
optimizing code:

http://lib.stat.cmu.edu/S/Spoetry/Tutor/R_inferno.pdf
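A toy illustration of the "think vectors" advice (both forms compute the same thing; the vectorized one is the idiomatic and much faster version):

```r
x <- runif(1e5)
out_loop <- numeric(length(x))
for (i in seq_along(x)) out_loop[i] <- x[i]^2 + 1  # element-by-element loop
out_vec <- x^2 + 1                                 # vectorized equivalent
all.equal(out_loop, out_vec)
```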

Cheers
Joris


On Thu, Jun 24, 2010 at 7:51 PM, Tyler Massaro  wrote:
> Hello all -
>
> This code will run, but it bogs down my computer when I run it for finer and
> finer time increments and more generations.  I was wondering if there is a
> better way to write my loops so that this wouldn't happen.  Thanks!
>
> -Tyler
>
> #
>
> # Bifurcation diagram
>
> # Using Braaksma system of equations
>
> # We have however used a Fourier analysis
>
> # to get a forcing function similar to
>
> # cardiac action potential...
>
> #
>
> require(odesolve)
>
>
>
> # We get this s_of_t function from Maple ws
>
> s_of_t = function(t)
>
> {
>
> (1/10) * (( (1/2) + (1/2) * (sin((1/4)*pi*t))/(abs(sin((1/4)*pi*t * (
> 6.588315815*sin((1/4)*pi*t) - 1.697435362*sin((1/2)*pi*t) - 1.570296922*sin
> ((3/4)*pi*t) + 0.3247901958*sin(pi*t) + 0.7962749105*sin((5/4)*pi*t) +
> 0.07812230515*sin((3/2)*pi*t) - 0.3424877143*sin((7/4)*pi*t) - 0.1148306748*
> sin(2*pi*t) + 0.1063696962*sin((9/4)*pi*t) + 0.02812403009*sin((5/2)*pi*t)))
>
>  }
>
>
> ModBraaksma = function(t, n, p)
>
> {
>
>
>  dx.dt = (1/0.01)*(n[2]-((1/2)*n[1]^2+(1/3)*n[1]^3))
>
>  dy.dt = -(n[1]+p["alpha"]) + 0.032 * s_of_t(t)
>
> list(c(dx.dt, dy.dt))
>
> }
>
>
> initial.values = c(0.1, -0.02)
>
>
> alphamin = 0.01
>
> alphamax = 0.02
>
>
> alphas = seq(alphamin, alphamax, by = 0.1)
>
>
> TimeInterval = 100
>
>
> times = seq(0.001, TimeInterval, by = 0.001)
>
>
> plot(1, xlim = c(alphamin, alphamax), ylim = c(0,0.3), type = "n",xlab
> = "Values
> of alpha", ylab = "Approximate loop size for a limit cycle", main =
> "Bifurcation
> Diagram")
>
>
> for (i in 1:length(alphas)){
>
>  params = c(alpha=alphas[i])
>
> out = lsoda(initial.values, times, ModBraaksma, params)
>
> X = out[,2]
>
> Y = out[,3]
>
>  for(j in 200:length(times)){
>
>  if (abs(X[j]) < 0.001) {
>
> points(alphas[i], Y[j], pch = ".")
>
>  }
>
> }
>
> }
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>






Re: [R] Wilcoxon signed rank test and its requirements

2010-06-24 Thread Joris Meys
On Fri, Jun 25, 2010 at 12:17 AM, David Winsemius
 wrote:
>
> On Jun 24, 2010, at 6:09 PM, Joris Meys wrote:
>
>> I do agree that one should not trust solely on sources like wikipedia
>> and graphpad, although they contain a lot of valuable information.
>>
>> This said, it is not too difficult to illustrate why, in the case of
>> the one-sample signed rank test,
>
> That is a key point. I was assuming that you were using the paired sample
> version of the WSRT and I may have been misleading the OP. For the
> one-sample situation, the assumption of symmetry is needed but for the
> paired sampling version of the test, the location shift becomes the tested
> hypothesis, and no assumptions about the form of the hypothesis are made
> except that they be the same.

I believe you mean the form of the distributions. The assumption that
the distributions of both samples are the same (or similar; it is a
robust test) implies that the differences x_i - y_i are more or less
symmetrically distributed. The key point here is that we're not talking
about the distribution of the populations/samples (as in the OP) but
about the distribution of the differences. I may not have been clear
enough on that one.

Cheers
Joris

> Any consideration of median or mean (which
> will be the same in the case of symmetric distributions) gets lost in the
> paired test case.
>
> --
> David.
>
>
>> the differences should not be too far
>> from symmetrical. It just needs some reflection on how the
>> statistic is calculated. If you have an asymmetrical distribution, you
>> have a lot of small differences with a negative sign and a lot of
>> large differences with a positive sign if you test against the median
>> or mean. Hence the sum of ranks for one side will be higher than for
>> the other, leading eventually to a significant result.
>>
>> An extreme example :
>>
>>> set.seed(100)
>>> y <- rnorm(100,1,2)^2
>>> wilcox.test(y,mu=median(y))
>>
>>       Wilcoxon signed rank test with continuity correction
>>
>> data:  y
>> V = 3240.5, p-value = 0.01396
>> alternative hypothesis: true location is not equal to 1.829867
>>
>>> wilcox.test(y,mu=mean(y))
>>
>>       Wilcoxon signed rank test with continuity correction
>>
>> data:  y
>> V = 1763, p-value = 0.008837
>> alternative hypothesis: true location is not equal to 5.137409
>>
>> Which brings us to the question what location is actually tested in
>> the wilcoxon test. For the measure of location to be the mean (or
>> median), one has to assume that the distribution of the differences is
>> rather symmetrical, which implies your data has to be distributed
>> somewhat symmetrical. The test is robust against violations of this
>> -implicit- assumption, but in more extreme cases skewness does matter.
>>
>> Cheers
>> Joris
>>
>> On Thu, Jun 24, 2010 at 7:40 PM, David Winsemius 
>> wrote:
>>>
>>>
>>> You are being misled. Simply finding a statement on a statistics software
>>> website, even one as reputable as Graphpad (???), does not mean that it
>>> is
>>> necessarily true. My understanding (confirmed reviewing "Nonparametric
>>> statistical methods for complete and censored data" by M. M. Desu,
>>> Damaraju
>>> Raghavarao, is that the Wilcoxon signed-rank test does not require that
>>> the
>>> underlying distributions be symmetric. The above quotation is highly
>>> inaccurate.
>>>
>>> --
>>> David.
>>>
>>>>
>>
>> --
>> Joris Meys
>> Statistical consultant
>>
>> Ghent University
>> Faculty of Bioscience Engineering
>> Department of Applied mathematics, biometrics and process control
>>
>> tel : +32 9 264 59 87
>> joris.m...@ugent.be
>> ---
>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>
>






Re: [R] Wilcoxon signed rank test and its requirements

2010-06-24 Thread Joris Meys
I do agree that one should not trust solely on sources like wikipedia
and graphpad, although they contain a lot of valuable information.

This said, it is not too difficult to illustrate why, in the case of
the one-sample signed rank test, the differences should not be too far
from symmetrical. It just needs some reflection on how the
statistic is calculated. If you have an asymmetrical distribution, you
have a lot of small differences with a negative sign and a lot of
large differences with a positive sign if you test against the median
or mean. Hence the sum of ranks for one side will be higher than for
the other, eventually leading to a significant result.

An extreme example :

> set.seed(100)
> y <- rnorm(100,1,2)^2
> wilcox.test(y,mu=median(y))

Wilcoxon signed rank test with continuity correction

data:  y
V = 3240.5, p-value = 0.01396
alternative hypothesis: true location is not equal to 1.829867

> wilcox.test(y,mu=mean(y))

Wilcoxon signed rank test with continuity correction

data:  y
V = 1763, p-value = 0.008837
alternative hypothesis: true location is not equal to 5.137409

This brings us to the question of which location is actually tested in
the Wilcoxon test. For the measure of location to be the mean (or
median), one has to assume that the distribution of the differences is
rather symmetric, which implies your data has to be distributed
somewhat symmetrically. The test is robust against violations of this
(implicit) assumption, but in more extreme cases skewness does matter.
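
A complementary check, purely my own illustration and not part of the
original email: for data that actually are symmetric around the tested
location, the same test has no reason to reject.

```r
# Counterpart to the skewed example above: symmetric data tested at its
# true centre. The signed rank test should not flag a difference here.
set.seed(100)
x <- rnorm(100, mean = 1, sd = 2)  # symmetric around 1
wilcox.test(x, mu = 1)             # p-value is typically large here
```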

Cheers
Joris

On Thu, Jun 24, 2010 at 7:40 PM, David Winsemius  wrote:
>
>
> You are being misled. Simply finding a statement on a statistics software
> website, even one as reputable as Graphpad (???), does not mean that it is
> necessarily true. My understanding (confirmed reviewing "Nonparametric
> statistical methods for complete and censored data" by M. M. Desu, Damaraju
> Raghavarao, is that the Wilcoxon signed-rank test does not require that the
> underlying distributions be symmetric. The above quotation is highly
> inaccurate.
>
> --
> David.
>
>>

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] PD: ?to calculate sth for groups defined between points in one variable (string), / value separating/ spliting variable into groups by i.e. between start, NA, NA, stop1, start2, NA, stop2

2010-06-24 Thread Joris Meys
Same trick :
c0<-rbind( 1,  2 , 3, 4,  5, 6, 7, 8, 9,10,11,
12,13,14,15,16,17 )
c0
c1<-rbind(10, 20 ,30,40, 50,10,60,20,30,40,50,  30,10,
0,NA,20,10.3444)
c1
c2<-rbind(NA,"A",NA,NA,"B",NA,NA,NA,NA,NA,NA,"C",NA,NA,NA,NA,"D")
c2
C.df<-data.frame(c0,c1,c2)
C.df

pos <- which(!is.na(C.df$c2))

idx <- sapply(2:length(pos),function(i) pos[i-1]:(pos[i]-1))
names(idx) <- sapply(2:length(pos),
function(i) paste(C.df$c2[pos[i-1]],"-",C.df$c2[pos[i]]))


out <- lapply(idx,function(i) summary(C.df[i,1:2]))
out



2010/6/24 Eugeniusz Kałuża :
> Dear useRs,
>
> Thanks for the advice from Joris Meys.
> Now I will try to think about how to make it work for a less specific
> case, to make the problem more general.
> Then the result should be displayed for every group between non-empty
> strings in c2,
> i.e. not only the result for:
>  #mean:
>          c1     c3    c4           c5
>          20  Start1 Stop1 Start1-Stop1
>    25.48585  Start2 Stop2 Start2-Stop2
>
> but also for every group created by the space between the two closest
> strings in c2, which contains only series of NA, NA, NA, separated from
> time to time by one string, i.e.:
>  #mean:
>          c1     c3    c4           c5
>          20 Start1 Stop1 Start1-Stop1
>          .. Stop1 Start2 Stop1-Start2
>    25.48585  Start2 Stop2 Start2-Stop2
>
> i.e.
> to rewrite this, maybe as another, simpler version of the command,
>
> but also for every group created by the space between the two closest
> strings in c2, which contains only series of NA, NA, NA, separated from
> time to time by one string: A, NA, NA, NA, NA, B, NA, NA, NA, C,
> NA, NA, NA, NA, D, NA, NA
> i.e.:
>  #mean:
>          c1     c3    c4           c5
>          20      A     B          A-B
>          ..      B     C          B-C
>    25.48585      C     D          C-D
> ...
>
>
> Looking for a more general method (function) for grouping between these
> letters in c2, I will now try to study the solution proposed by Joris
> Meys.
> Thanks for the immediate answer,
> Kaluza
>
>
>
>
> -Original message-
> From: Joris Meys [mailto:jorism...@gmail.com]
> Sent: Thu 2010-06-24 15:14
> To: Eugeniusz Kałuża
> Cc: r-help@r-project.org
> Temat: Re: [R] ?to calculate sth for groups defined between points in one 
> variable (string), / value separating/ spliting variable into groups by i.e. 
> between start, NA, NA, stop1, start2, NA, stop2
>
> On Thu, Jun 24, 2010 at 1:18 PM, Eugeniusz Kaluza
>  wrote:
>>
>> Dear useRs,
>>
>> Thanks for any advices
>>
>> # I do not know where to find examples of how to mark groups
>> #  based on signal occurrence in the additional variable: cf. variable c2,
>> # How to perform different calculations for groups defined by (split by)
>> occurrence of the characteristic data in c2
>>
>>
>> #First example of simple data
>> #mexample   1      2    3  4     5  6  7  8  9  10 11       12 13 14 15 16 17
>> c0<-rbind( 1,      2 , 3, 4,      5, 6, 7, 8, 9,10,11,      
>> 12,13,14,15,16,17     )
>> c0
>> c1<-rbind(10,     20 ,30,40,     50,10,60,20,30,40,50,      30,10, 
>> 0,NA,20,10.3444)
>> c1
>> c2<-rbind(NA,"Start1",NA,NA,"Stop1",NA,NA,NA,NA,NA,NA,"Start2",NA,NA,NA,NA,"Stop2")
>> c2
>> C.df<-data.frame(cbind(c0,c1,c2))
>> colnames(C.df)<-c("c0","c1","c2")
>> C.df
>>
>> # preparation of form for explaining further needed result (next 3 lines are 
>> not needed indeed, they are only  to explain how to obtain final result
>>  c3<-rbind(NA,"Start1","Start1","Start1","Start1","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2")
>>  c4<-rbind(NA, "Stop1", "Stop1", "Stop1", "Stop1", "Stop2", "Stop2", 
>> "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", 
>> "Stop2", "Stop2")
>>  C.df<-data.frame(cbind(c0,c1,c2,c3,c4))
>>  colnames(C.df)<-c("c0","c1","c2","c3","c4")
>>  C.df$c5<-paste(C.df$c3,C.df$c4,sep="-")
>>  C.df
>>
> Now this is something I don't get. The list "Start2-Stop2" starts way
> before Start2, actually at Stop1. Sure that's what you want?
>

Re: [R] How to say "if error"

2010-06-24 Thread Joris Meys
You could do that using options, e.g.:

set.seed(1)
x <- rnorm(1:10)
y <- letters[1:10]
z <- rnorm(1:10)

warn <-getOption("warn")
options(warn=2)
for (i in list(x,y,z)){
  cc <- try(mean(i), silent=T)
  if(is(cc,"try-error")) {next}
  print(cc)
}
options(warn=warn)

see ?options under "warn"
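
A different approach, my own addition rather than part of the thread, is
to catch the warning condition directly with tryCatch() instead of
promoting warnings to errors:

```r
# Wrap a computation so that any warning or error yields NA instead of
# interrupting a loop.
safe_mean <- function(x) {
  tryCatch(mean(x),
           warning = function(w) NA,  # a warning was raised: return NA
           error   = function(e) NA)  # an error was raised: return NA
}

safe_mean(c(1, 2, 3))    # 2
safe_mean(letters[1:3])  # NA, since mean() warns on character input
```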

Cheers
Joris

On Thu, Jun 24, 2010 at 5:12 PM, Paul Chatfield
 wrote:
>
> On a similar issue, how can you detect a warning in a loop - e.g. the
> following gives a warning, so I'd like to set up code to recognise that and
> then carry on in a loop
>
> x<-rnorm(2);y<-c(1,0)
> ff<-glm(y/23~x, family=binomial)
>
> so this would be incorporated into a loop that might be
>
> x<-rnorm(10);y<-rep(c(1,0),5)
> for (i in 1:10)
> {ee<-glm(y~x, family=binomial)
> ff<-glm(y/23~x, family=binomial)}
>
> from which I would recognise the warning in ff and not those in ee, saving
> results from ee and not from ff. The last bit would be easy adding a line
> if(there_is_a_warning_message) {newvector<-NA} else {use results} but how do
> you detect the warning message?
>
> Thanks all for your feedback so far,
>
> Paul
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/How-to-say-if-error-tp2266619p2267140.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Averaging half hourly data to hourly

2010-06-24 Thread Joris Meys
See ?aggregate

Cheers
J
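
A minimal sketch of what the pointer to ?aggregate suggests, assuming
the data sit in a data frame with a POSIXct time stamp column (the
column names here are my own):

```r
# Half-hourly series: truncate each time stamp to its hour, then average
# (or sum) within each hour via aggregate().
dat <- data.frame(
  time  = seq(as.POSIXct("2010-06-24 00:00", tz = "UTC"),
              by = "30 min", length.out = 6),
  value = c(1, 3, 2, 4, 5, 7)
)
dat$hour <- format(dat$time, "%Y-%m-%d %H:00")   # hour-level grouping key
aggregate(value ~ hour, data = dat, FUN = mean)  # hourly means: 2, 3, 6
# use FUN = sum instead to sum the two half hours
```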

On Thu, Jun 24, 2010 at 10:45 AM, Jennifer Wright
 wrote:
> Hi all,
>
> I have some time-series data in half hourly time steps and I want to be able
> to either average or sum the two half hours together into hours. Any help
> greatly appreciated.
>
> Thanks,
>
> Jenny
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "If-error-resume-next"

2010-06-24 Thread Joris Meys
http://r.789695.n4.nabble.com/How-to-say-if-error-td2266619.html#a2266619

Cheers

On Thu, Jun 24, 2010 at 4:30 PM, Christofer Bogaso
 wrote:
> In VBA there is a syntax "if error resume next" which sometimes acts as very
> beneficial on ignoring error. Is there any similar kind of function/syntax
> here in R as well?
>
> Thanks for your attention
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to say "if error"

2010-06-24 Thread Joris Meys
An old-fashioned, and I guess also advised-against, method would be to
use try() itself, e.g.:

set.seed(1)
x <- rnorm(1:10)
y <- letters[1:10]
z <- rnorm(1:10)

for (i in list(x,y,z)){
  cc <- try(sum(i), silent=T)
  if(is(cc,"try-error")) {next}
  print(cc)
}

Set silent=FALSE if you want to see the error messages. See also ?try (and ?is).
Cheers
Joris

On Thu, Jun 24, 2010 at 3:34 PM, Duncan Murdoch
 wrote:
> On 24/06/2010 7:06 AM, Paul Chatfield wrote:
>>
>> I've had a look at the conditions in base and I can't get the ones to work
>> I've looked at but it is all new to me.
>> For example, I can work the examples for tryCatch, but it won't print a
>> finally message for me when I apply it to my model.  Even if I could get
>> this to work, I think it would still cause a break e.g.
>> for (j in 1:10)
>> {tryCatch(ifelse(j==5, stop(j), j), finally=print("oh dear"))}
>>
>> Thanks for the suggestion though - any others?
>>
>
> I think you don't want to use finally, which is just code that's guaranteed
> to be executed at the end.  You want to catch the errors and continue.  For
> example,
>
> for (j in 1:10)
> { tryCatch(ifelse(j==5, stop(j), print(j)), error=function(e) {print("caught
> error"); print(e)}) }
>
> Duncan Murdoch
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question on WLS (gls vs lm)

2010-06-24 Thread Joris Meys
Thanks! I was getting confused as well; Wolf really had a point.

I have to admit that this is all a bit counterintuitive. I would
expect those weights to be the inverses of the variances, as GLS uses
the inverse of the variance-covariance matrix. I finally found the
needed information in ?nlme::varWeights, after you pointed out the
problem and after strolling through quite a part of the help files...

Does this mean that the GLS algorithm uses 1/sd as weights (I can't
imagine that...), or is this one of those things in R that are just
the way they are?

Cheers
Joris
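
The point can be checked numerically; this sketch is my own, building on
Wolfgang's correction below that the stored weights are inverse standard
deviations and must be squared:

```r
# Compare gls() with a per-group variance structure to lm() using the
# squared (i.e. inverse-variance) weights extracted from the gls fit.
library(nlme)
f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris,
          weights = varIdent(form = ~ 1 | Species))
w  <- attributes(summary(f1)$modelStruct$varStruct)$weights  # these are 1/sd
f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris,
         weights = w^2)                                      # 1/variance
# The coefficient estimates now agree very closely:
max(abs(coef(f1) - coef(f2)))
```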

On Thu, Jun 24, 2010 at 3:20 PM, Viechtbauer Wolfgang (STAT)
 wrote:
> The weights in 'aa' are the inverse standard deviations. But you want to use 
> the inverse variances as the weights:
>
> aa <- (attributes(summary(f1)$modelStruct$varStruct)$weights)^2
>
> And then the results are essentially identical.
>
> Best,
>
> --
> Wolfgang Viechtbauer                        http://www.wvbauer.com/
> Department of Methodology and Statistics    Tel: +31 (0)43 388-2277
> School for Public Health and Primary Care   Office Location:
> Maastricht University, P.O. Box 616         Room B2.01 (second floor)
> 6200 MD Maastricht, The Netherlands         Debyeplein 1 (Randwyck)
>
>
> Original Message
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Stats Wolf Sent:
> Thursday, June 24, 2010 15:00 To: Joris Meys
> Cc: r-help@r-project.org
> Subject: Re: [R] Question on WLS (gls vs lm)
>
>> Indeed, they should give the same results, and hence I was worried to
>> see that the results were not the same. It suffices to look at the
>> standard errors and p-values: they do differ, and the differences are
>> not really that small.
>>
>>
>> Thanks,
>> Stats Wolf
>>
>>
>>
>> On Thu, Jun 24, 2010 at 2:39 PM, Joris Meys 
>> wrote:
>>> Indeed, WLS is a special case of GLS, where the error covariance
>>> matrix is a diagonal matrix. OLS is a special case of GLS, where the
>>> error is considered homoscedastic and all weights are equal to 1. And
>>> I now realized that the varIdent() indeed makes a diagonal covariance
>>> matrix, so the results should be the same in fact. Sorry for missing
>>> that one.
>>>
>>> A closer inspection shows that the results don't differ too much. The
>>> fitting method differs between both functions; lm.wfit uses the QR
>>> decomposition, whereas gls() uses restricted maximum likelihood. In
>>> Asymptopia, they should give the same result.
>>>
>>> Cheers
>>> Joris
>>>
>>>
>>> On Thu, Jun 24, 2010 at 12:54 PM, Stats Wolf 
>>> wrote:
>>>> Thanks for reply.
>>>>
>>>> Yes, they do differ, but does not gls() with the weights argument
>>>> (correlation being unchanged) make the special version of GLS, as
>>>> this sentence from the page you provided says: "The method leading
>>>> to this result is called Generalized Least Squares estimation
>>>> (GLS), of which WLS is just a special case"?
>>>>
>>>> Best,
>>>> Stats Wolf
>>>>
>>>>
>>>>
>>>> On Thu, Jun 24, 2010 at 12:49 PM, Joris Meys 
>>>> wrote:
>>>>> Isn't that exactly what you would expect when using a _generalized_
>>>>> least squares compared to a normal least squares? GLS is not the
>>>>> same as WLS.
>>>>>
>>>>> http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_least_square
>>>>> s_generalized.htm
>>>>>
>>>>> Cheers
>>>>> Joris
>>>>>
>>>>> On Thu, Jun 24, 2010 at 9:16 AM, Stats Wolf 
>>>>> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I understand that gls() uses generalized least squares, but I
>>>>>> thought that maybe optimum weights from gls might be used as
>>>>>> weights in lm (as shown below), but apparently this is not the
>>>>>> case. See:
>>>>>>
>>>>>> library(nlme)
>>>>>> f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris,
>>>>>> weights = varIdent(form = ~ 1 | Species)) aa <-
>>>>>> attributes(summary(f1)$modelStruct$varStruct)$weights
>>>>>> f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris,
>>>>>> weights = aa)
>>>>>>
>>&

Re: [R] ?to calculate sth for groups defined between points in one variable (string), / value separating/ spliting variable into groups by i.e. between start, NA, NA, stop1, start2, NA, stop2

2010-06-24 Thread Joris Meys
On Thu, Jun 24, 2010 at 1:18 PM, Eugeniusz Kaluza
 wrote:
>
> Dear useRs,
>
> Thanks for any advices
>
> # I do not know where to find examples of how to mark groups
> #  based on signal occurrence in the additional variable: cf. variable c2,
> # How to perform different calculations for groups defined by (split by)
> occurrence of the characteristic data in c2
>
>
> #First example of simple data
> #mexample   1      2    3  4     5  6  7  8  9  10 11       12 13 14 15 16 17
> c0<-rbind( 1,      2 , 3, 4,      5, 6, 7, 8, 9,10,11,      12,13,14,15,16,17 
>     )
> c0
> c1<-rbind(10,     20 ,30,40,     50,10,60,20,30,40,50,      30,10, 
> 0,NA,20,10.3444)
> c1
> c2<-rbind(NA,"Start1",NA,NA,"Stop1",NA,NA,NA,NA,NA,NA,"Start2",NA,NA,NA,NA,"Stop2")
> c2
> C.df<-data.frame(cbind(c0,c1,c2))
> colnames(C.df)<-c("c0","c1","c2")
> C.df
>
> # preparation of form for explaining further needed result (next 3 lines are 
> not needed indeed, they are only  to explain how to obtain final result
>  c3<-rbind(NA,"Start1","Start1","Start1","Start1","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2")
>  c4<-rbind(NA, "Stop1", "Stop1", "Stop1", "Stop1", "Stop2", "Stop2", "Stop2", 
> "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", 
> "Stop2")
>  C.df<-data.frame(cbind(c0,c1,c2,c3,c4))
>  colnames(C.df)<-c("c0","c1","c2","c3","c4")
>  C.df$c5<-paste(C.df$c3,C.df$c4,sep="-")
>  C.df
>
Now this is something I don't get. The list "Start2-Stop2" starts way
before Start2, actually at Stop1. Sure that's what you want?

I took the liberty of showing how to get the data between start and
stop for every entry, and how to apply functions to it. If you don't
get the code, look at
?lapply
?apply
?grep

I also adjusted your example, as you caused all variables to become
factors by using cbind() inside the data.frame() call. Never do this
unless you're really sure you have to; I can't think of a case where
it would be beneficial...

...
C.df<-data.frame(c0,c1,c2)
C.df

# find positions
Start <- grep("Start",C.df$c2)
Stop <- grep("Stop",C.df$c2)

# create indices
idx <- apply(cbind(Start,Stop),1,function(i) i[1]:i[2])
names(idx) <- paste("Start",1:length(Start),"-Stop",1:length(Start),sep="")

# Apply the function summary and get a list back named by the interval.
out <- lapply(idx,function(i) summary(C.df[i,1:2]))
out

If you really need to start Start2 right after Stop1, you can use a
similar approach.

Cheers
Joris

> # NEEDED RESULTS
>  # needed result
> # for Stat1-Stop1: mean(20,30,40,50)
> # for Stat2-Stop2: mean(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T)
> #mean:
>         c1     c3    c4           c5
>         20  Start1 Stop1 Start1-Stop1
>   25.48585  Start2 Stop2 Start2-Stop2
>
> #sum
> # for Stat1-Stop1: sum(20,30,40,50)
> # for Stat2-Stop2: sum(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T)
> #sum:
>         c1     c3    c4           c5
>        140  Start1 Stop1 Start1-Stop1
>   280.3444  Start2 Stop2 Start2-Stop2
>
> # for Stat1-Stop1: max(20,30,40,50)
> # for Stat2-Stop2: max(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T)
> #max:
>         c1     c3    c4           c5
>        50  Start1 Stop1 Start1-Stop1
>        60  Start2 Stop2 Start2-Stop2
>
> # place of max  (in Start1-Stop1): 4th element in group Start1-Stop1
> # place of max  (in Start2-Stop2): 2nd element in group Start2-Stop2
>
>        c0     c3    c4           c5
>         4  Start1 Stop1 Start1-Stop1
>         2  Start2 Stop2 Start2-Stop2
>
>
> Thanks for any suggestion,
> Kaluza
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question on WLS (gls vs lm)

2010-06-24 Thread Joris Meys
Indeed, WLS is a special case of GLS, where the error covariance
matrix is a diagonal matrix. OLS is a special case of GLS, where the
error is considered homoscedastic and all weights are equal to 1. And
I now realized that the varIdent() indeed makes a diagonal covariance
matrix, so the results should be the same in fact. Sorry for missing
that one.

A closer inspection shows that the results don't differ too much. The
fitting method differs between both functions; lm.wfit uses the QR
decomposition, whereas gls() uses restricted maximum likelihood. In
Asymptopia, they should give the same result.

Cheers
Joris


On Thu, Jun 24, 2010 at 12:54 PM, Stats Wolf  wrote:
> Thanks for reply.
>
> Yes, they do differ, but does not gls() with the weights argument
> (correlation being unchanged) make the special version of GLS, as this
> sentence from the page you provided says: "The method leading to this
> result is called Generalized Least Squares estimation (GLS), of which
> WLS is just a special case"?
>
> Best,
> Stats Wolf
>
>
>
> On Thu, Jun 24, 2010 at 12:49 PM, Joris Meys  wrote:
>> Isn't that exactly what you would expect when using a _generalized_
>> least squares compared to a normal least squares? GLS is not the same
>> as WLS.
>>
>> http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_least_squares_generalized.htm
>>
>> Cheers
>> Joris
>>
>> On Thu, Jun 24, 2010 at 9:16 AM, Stats Wolf  wrote:
>>> Hi all,
>>>
>>> I understand that gls() uses generalized least squares, but I thought
>>> that maybe optimum weights from gls might be used as weights in lm (as
>>> shown below), but apparently this is not the case. See:
>>>
>>> library(nlme)
>>> f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris,  weights
>>> = varIdent(form = ~ 1 | Species))
>>> aa <- attributes(summary(f1)$modelStruct$varStruct)$weights
>>> f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris, weights = aa)
>>>
>>> summary(f1)$tTable; summary(f2)
>>>
>>> So, the two models with the very same weights do differ (in terms of
>>> standard errors). Could you please explain why? Are these different
>>> types of weights?
>>>
>>> Many thanks in advance,
>>> Stats Wolf
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Joris Meys
>> Statistical consultant
>>
>> Ghent University
>> Faculty of Bioscience Engineering
>> Department of Applied mathematics, biometrics and process control
>>
>> tel : +32 9 264 59 87
>> joris.m...@ugent.be
>> ---
>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>>
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-24 Thread Joris Meys
One way of looking at it is doing a sign test after subtraction of
the mean. For symmetric data sets, E[X-mean(X)] = 0, so you expect
to have about as many values above as below zero. There is a sign test
somewhere in one of the packages, but it's easily done using
binom.test as well:

> set.seed(12345)
> x1 <- rnorm(100)
> x2 <- rpois(100,2)

>  binom.test((sum(x1-mean(x1)>0)),length(x1))

Exact binomial test

data:  (sum(x1 - mean(x1) > 0)) and length(x1)
number of successes = 56, number of trials = 100, p-value = 0.2713
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.4571875 0.6591640
sample estimates:
probability of success
  0.56

>  binom.test((sum(x2-mean(x2)>0)),length(x2))

Exact binomial test

data:  (sum(x2 - mean(x2) > 0)) and length(x2)
number of successes = 37, number of trials = 100, p-value = 0.01203
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.2755666 0.4723516
sample estimates:
probability of success
  0.37

Cheers
Joris

On Thu, Jun 24, 2010 at 4:16 AM, Atte Tenkanen  wrote:
> PS.
>
> Maybe I can somehow try to transform the data and check it, for example 
> using the skewness function of the timeDate package?
>
>> Thanks. What I have had to ask is that
>>
>> how do you test that the data is symmetric enough?
>> If it is not, is it ok to use some data transformation?
>>
>> when it is said:
>>
>> "The Wilcoxon signed rank test does not assume that the data are
>> sampled from a Gaussian distribution. However it does assume that the
>> data are distributed symmetrically around the median. If the
>> distribution is asymmetrical, the P value will not tell you much about
>> whether the median is different than the hypothetical value."
>>
>> > On Wed, Jun 23, 2010 at 10:27 PM, Atte Tenkanen  wrote:
>> > > Hi all,
>> > >
>> > > I have a distribution, and take a sample of it. Then I compare
>> that
>> > sample with the mean of the population like here in "Wilcoxon signed
>>
>> > rank test with continuity correction":
>> > >
>> > >> wilcox.test(Sample,mu=mean(All), alt="two.sided")
>> > >
>> > >        Wilcoxon signed rank test with continuity correction
>> > >
>> > > data:  AlphaNoteOnsetDists
>> > > V = 63855, p-value = 0.0002093
>> > > alternative hypothesis: true location is not equal to 0.4115136
>> > >
>> > >> wilcox.test(Sample,mu=mean(All), alt = "greater")
>> > >
>> > >        Wilcoxon signed rank test with continuity correction
>> > >
>> > > data:  AlphaNoteOnsetDists
>> > > V = 63855, p-value = 0.0001047
>> > > alternative hypothesis: true location is greater than 0.4115136
>> > >
>> > > What assumptions are needed for the population?
>> >
>> > wikipedia says:
>> > "The Wilcoxon signed-rank test is a _non-parametric_ statistical
>> > hypothesis test for... "
>> > it also talks about the assumptions.
>> >
>> > > What can we say according these results?
>> > > p-value for the "less" is 0.999.
>> >
>> > That the p-value for less and greater seem to sum up to one, and that
>> > the p-value of greater is half of that for two-sided. You shouldn't
>> > ask what we can say. You should ask yourself "What was the question
>> > and is this test giving me an answer on that question?"
>> >
>> > Cheers
>> > Joris
>> >
>> > --
>> > Joris Meys
>> > Statistical consultant
>> >
>> > Ghent University
>> > Faculty of Bioscience Engineering
>> > Department of Applied mathematics, biometrics and process control
>> >
>> > tel : +32 9 264 59 87
>> > joris.m...@ugent.be
>> > ---
>> > Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question on WLS (gls vs lm)

2010-06-24 Thread Joris Meys
Isn't that exactly what you would expect when using a _generalized_
least squares compared to a normal least squares? GLS is not the same
as WLS.

http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_least_squares_generalized.htm

Cheers
Joris

On Thu, Jun 24, 2010 at 9:16 AM, Stats Wolf  wrote:
> Hi all,
>
> I understand that gls() uses generalized least squares, but I thought
> that maybe optimum weights from gls might be used as weights in lm (as
> shown below), but apparently this is not the case. See:
>
> library(nlme)
> f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris,  weights
> = varIdent(form = ~ 1 | Species))
> aa <- attributes(summary(f1)$modelStruct$varStruct)$weights
> f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris, weights = aa)
>
> summary(f1)$tTable; summary(f2)
>
> So, the two models with the very same weights do differ (in terms of
> standard errors). Could you please explain why? Are these different
> types of weights?
>
> Many thanks in advance,
> Stats Wolf
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] count data with a specific range

2010-06-24 Thread Joris Meys
see ?levels eg:

x <- rnorm(10)
y <- cut(x,c(-10,0,10))
levels(y)<-c("-10-0","0-10")

cheers
Joris
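
Putting the pieces of this thread together, here is a sketch of the full
count table the original question asked for (my own assembly, not from
the thread; I use equal-length columns, since data.frame() requires
them):

```r
# Count how many values of each column fall in each range, using cut()
# to bin and table() to count; the result is ready for barplot().
set.seed(1)
dat <- data.frame(x1 = rnorm(15, mean = 2),
                  x2 = rnorm(15, mean = -1),
                  x3 = rnorm(15, mean = 3))
breaks <- c(-10, 0, 10)
counts <- sapply(dat, function(col) table(cut(col, breaks)))
rownames(counts) <- c("-10-0", "0-10")   # relabel the intervals
counts
# barplot(counts, beside = TRUE, legend.text = rownames(counts))
```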

On Thu, Jun 24, 2010 at 4:14 AM, Yi  wrote:
> Yeap. It works. Just to make the result more beautiful.
>
> One more question.
>
> The interval is showns as (0,10].
>
> Is there a way to change it into the format 0-10?
> Thanks.
>
> On Wed, Jun 23, 2010 at 6:12 PM, Joris Meys  wrote:
>>
>> see ?cut
>>
>> Cheers
>> Joris
>>
>> On Thu, Jun 24, 2010 at 2:57 AM, Yi  wrote:
>> > I would like to prepare the data for barplot. But I only have the data
>> > frame
>> > now.
>> >
>> > x1=rnorm(10,mean=2)
>> > x2=rnorm(20,mean=-1)
>> > x3=rnorm(15,mean=3)
>> > data=data.frame(x1,x2,x3)
>> >
>> > Is there a way to put the data within a specific range? The expected result
>> > is
>> > as follows:
>> >  range       x1                  x2                    x3
>> > -10-0        2                      5                     1  (# points
>> > in
>> > this range)
>> > 0-10         7                     9                      6
>> > ...
>> >
>> > I know the table function but I do not know how to deal with the range
>> > issue.
>> >
>> > Thanks in advance.
>> >
>> > Yi
>> >
>> >        [[alternative HTML version deleted]]
>> >
>> >
>>
>>
>>
>
>





Re: [R] count data with a specific range

2010-06-23 Thread Joris Meys
see ?cut
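To sketch how this fits together for the table you're after: cut() bins
each column and table() counts per bin (breaks and seed here are chosen
purely for illustration):

```r
set.seed(1)  # only so the sketch is reproducible
x1 <- rnorm(10, mean = 2)
x2 <- rnorm(20, mean = -1)
x3 <- rnorm(15, mean = 3)
breaks <- c(-10, 0, 10)
# one table() per column, all over the same breaks
counts <- sapply(list(x1 = x1, x2 = x2, x3 = x3),
                 function(col) table(cut(col, breaks = breaks)))
counts
```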

Cheers
Joris

On Thu, Jun 24, 2010 at 2:57 AM, Yi  wrote:
> I would like to prepare the data for barplot. But I only have the data frame
> now.
>
> x1=rnorm(10,mean=2)
> x2=rnorm(20,mean=-1)
> x3=rnorm(15,mean=3)
> data=data.frame(x1,x2,x3)
>
> Is there a way to put data within a specific range? The expected result is
> as follows:
>  range       x1                  x2                    x3
> -10-0        2                      5                     1  (# points in this range)
> 0-10         7                     9                      6
> ...
>
> I know the table function but I do not know how to deal with the range
> issue.
>
> Thanks in advance.
>
> Yi
>
>        [[alternative HTML version deleted]]
>
>





Re: [R] problem to building R (datasets)

2010-06-23 Thread Joris Meys
Why compile from source? 2.11.1 installs fine on XP from the binary,
so that's the more obvious solution.

Cheers
Joris

On Thu, Jun 24, 2010 at 12:39 AM, Geun Seop Lee  wrote:
>>
>>  Dear all,
>>>
>>> While I was trying to build R source, I found an error at datasets package
>>> (there was no error before that)
>>>
>>> ../../../library/datasets/R/datasets is unchanged
>>> Error in dir.create(Rdatadir, showWarnings = FALSE) :
>>>   file name conversion problem
>>> Calls:  ->  -> dir.create
>>> Execution halted
>>> make[2]: *** [all] Error 1
>>> make[1]: *** [R] Error 1
>>> make: *** [all] Error 2
>>>
>>> And it was caused by
>>>
>>> @$(ECHO) "tools:::data2LazyLoadDB(\"$(pkg)\", compress=3)" | \
>>>    R_DEFAULT_PACKAGES=NULL LC_ALL=C $(R_EXE) > /dev/null
>>> at the Makefile.win in the src/datasets directory
>>> I am using Window XP and tried to compile 2.11.1 version.
>>>
>>> I can't imagine how I can solve this problem. Any hints or suggestions
>>> will be appreciated.
>>>
>>> Thank you.
>>>
>>> Lee.
>>>
>>
>
>        [[alternative HTML version deleted]]
>
>






Re: [R] Beginning Eigen System question.

2010-06-23 Thread Joris Meys
On Thu, Jun 24, 2010 at 12:41 AM, Joris Meys  wrote:
> Which other Linear Algebra system, and which function did you use in R?
> Cheers
> Joris

Never mind, of course you used eigen()...
Eigenvectors are only determined up to a constant. If I'm not mistaken
(but check the help files on it), R normalizes the eigenvectors to unit
length (and so does your other linear algebra system). This leaves the
eigenvectors defined only up to their sign.
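A quick sketch with the matrix from the question: the columns returned
by eigen() have unit length, and flipping the sign of a column still
gives a valid eigenvector:

```r
A <- matrix(c(1, 3, 1,
              1, 2, 2,
              1, 1, 3), nrow = 3, byrow = TRUE)
e <- eigen(A)
v <- e$vectors[, 1]
sum(v^2)                                   # 1: normalized to unit length
max(abs(A %*% v - e$values[1] * v))        # ~0: v is an eigenvector
max(abs(A %*% (-v) - e$values[1] * (-v)))  # ~0: so is -v
```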

Cheers
Joris
>
> On Thu, Jun 24, 2010 at 12:32 AM,   wrote:
>> Forgive me if I misunderstand a basic eigensystem, but when I present the 
>> following matrix to most any other linear algebra system:
>>
>>  1  3  1
>>  1  2  2
>>  1  1  3
>>
>> I get an answer like:
>>
>> //$values
>> //[1]  5.00e+00  1.00e+00 -5.536207e-16
>>
>> //$vectors
>> //           [,1]       [,2]       [,3]
>> //[1,] 0.5773503 -0.8451543 -0.9428090
>> //[2,] 0.5773503 -0.1690309  0.2357023
>> //[3,] 0.5773503  0.5070926  0.2357023
>>
>> But R gives me:
>>
>> //$values
>> //[1]  5.00e+00  1.00e+00 -5.536207e-16
>>
>> //$vectors
>> //           [,1]       [,2]       [,3]
>> //[1,] -0.5773503 -0.8451543 -0.9428090
>> //[2,] -0.5773503 -0.1690309  0.2357023
>> //[3,] -0.5773503  0.5070926  0.2357023
>>
>> The only difference seems to be the sign on the first eigen vector. What am 
>> I missing?
>>
>> Kevin
>>
>>
>
>
>
>





Re: [R] Beginning Eigen System question.

2010-06-23 Thread Joris Meys
Which other Linear Algebra system, and which function did you use in R?
Cheers
Joris

On Thu, Jun 24, 2010 at 12:32 AM,   wrote:
> Forgive me if I misunderstand a basic eigensystem, but when I present the 
> following matrix to most any other linear algebra system:
>
>  1  3  1
>  1  2  2
>  1  1  3
>
> I get an answer like:
>
> //$values
> //[1]  5.00e+00  1.00e+00 -5.536207e-16
>
> //$vectors
> //           [,1]       [,2]       [,3]
> //[1,] 0.5773503 -0.8451543 -0.9428090
> //[2,] 0.5773503 -0.1690309  0.2357023
> //[3,] 0.5773503  0.5070926  0.2357023
>
> But R gives me:
>
> //$values
> //[1]  5.00e+00  1.00e+00 -5.536207e-16
>
> //$vectors
> //           [,1]       [,2]       [,3]
> //[1,] -0.5773503 -0.8451543 -0.9428090
> //[2,] -0.5773503 -0.1690309  0.2357023
> //[3,] -0.5773503  0.5070926  0.2357023
>
> The only difference seems to be the sign on the first eigen vector. What am I 
> missing?
>
> Kevin
>
>





Re: [R] list operation

2010-06-23 Thread Joris Meys
Another variation on the same theme :

lst=list(m=c('a','b','c'),n=c('c','a'),l=c('a','bc'))
set <- c('a','c')

f <-function(lst,set) sapply(lst,function(x) sum(set %in% x)==length(set) )
i <- f(lst,set)
names(i[i])

Doesn't serve anybody but keeps my mind fresh.

For long lists, you might benefit from first calculating the length of
set, to avoid having to do that n times for a list of length n.
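A sketch of that precomputation (using all(set %in% x) would work just
as well):

```r
lst <- list(m = c('a','b','c'), n = c('c','a'), l = c('a','bc'))
set <- c('a','c')
f2 <- function(lst, set) {
  n_set <- length(set)               # computed once, not once per element
  names(lst)[sapply(lst, function(x) sum(set %in% x) == n_set)]
}
f2(lst, set)   # "m" "n"
```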

Cheers
Joris

On Wed, Jun 23, 2010 at 11:01 PM, Peter Alspach
 wrote:
> Tena koe Yu
>
> One possibility:
>
> lst[sapply(lst, function(x) length(x[x%in% c('a','c')])==2)]
>
> HTH ...
>
> Peter Alspach
>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
>> project.org] On Behalf Of Yuan Jian
>> Sent: Thursday, 24 June 2010 1:35 a.m.
>> To: r-help@r-project.org
>> Subject: [R] list operation
>>
>> Hi,
>>
>> it seems a simple problem, but I can not find a clear way.
>> I have a list:
>> lst=list(m=c('a','b','c'),n=c('c','a'),l=c('a','bc'))
>> > lst
>> $m
>> [1] "a" "b" "c"
>> $n
>> [1] "c" "a"
>> $l
>> [1] "a"  "bc"
>>
>> how can I get list elements that include a given subset? for example,
>> for given subset {'a','c'}, the answer should be 'm' and 'n'.
>>
>> thanks
>> Yu
>>
>>
>>
>>       [[alternative HTML version deleted]]
>
>





Re: [R] Probabilities from survfit.coxph:

2010-06-23 Thread Joris Meys
On Wed, Jun 23, 2010 at 9:03 PM, Parminder Mankoo  wrote:
> Hello:
>
> In the example below (or for a censored data) using survfit.coxph, can
> anyone point me to a link or a pdf as to how the probabilities appearing in
> bold under "summary(pred$surv)" are calculated? Do these represent
> acumulative probability distribution in time (not including censored time)?

Those are predicted survival probabilities, i.e. the estimated survival
curve for the new data. Hence the column name "survival"...
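In other words, each entry is the estimated probability of surviving
past that time for the supplied covariate value, so the column is
non-increasing in time. A small check, rerunning the example above
(survival package assumed installed):

```r
library(survival)
fit  <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
pred <- survfit(fit, newdata = data.frame(age = 60))
s    <- summary(pred)
# survival probabilities never increase over time
all(diff(s$surv) <= 0)   # TRUE
```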

Cheers
Joris
>
> Thanks very much,
> parmee
>
> *fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)*
> *pred <- survfit(fit, newdata=data.frame(age=60))*
> *summary(pred)*
>
>  time n.risk n.event survival std.err lower 95% CI upper 95% CI
>   59     26       1    *0.978*  0.0240        0.932        1.000
>  115     25       1    *0.952*  0.0390        0.878        1.000
>  156     24       1    *0.917*  0.0556        0.814        1.000
>  268     23       1    *0.880*  0.0704        0.752        1.000
>  329     22       1    *0.818*  0.0884        0.662        1.000
>  353     21       1    *0.760*  0.0991        0.588        0.981
>  365     20       1    *0.698*  0.1079        0.516        0.945
>  431     17       1    *0.623*  0.1187        0.429        0.905
>  464     15       1    *0.549*  0.1248        0.352        0.858
>  475     14       1    *0.480*  0.1267        0.286        0.805
>  563     12       1    *0.382*  0.1332        0.193        0.757
>  638     11       1    *0.297*  0.1292        0.127        0.697
>
>        [[alternative HTML version deleted]]
>
>





Re: [R] Wilcoxon signed rank test and its requirements

2010-06-23 Thread Joris Meys
On Wed, Jun 23, 2010 at 10:27 PM, Atte Tenkanen  wrote:
> Hi all,
>
> I have a distribution, and take a sample of it. Then I compare that sample 
> with the mean of the population like here in "Wilcoxon signed rank test with 
> continuity correction":
>
>> wilcox.test(Sample,mu=mean(All), alt="two.sided")
>
>        Wilcoxon signed rank test with continuity correction
>
> data:  AlphaNoteOnsetDists
> V = 63855, p-value = 0.0002093
> alternative hypothesis: true location is not equal to 0.4115136
>
>> wilcox.test(Sample,mu=mean(All), alt = "greater")
>
>        Wilcoxon signed rank test with continuity correction
>
> data:  AlphaNoteOnsetDists
> V = 63855, p-value = 0.0001047
> alternative hypothesis: true location is greater than 0.4115136
>
> What assumptions are needed for the population?

wikipedia says:
"The Wilcoxon signed-rank test is a _non-parametric_ statistical
hypothesis test for... "
it also talks about the assumptions.

> What can we say according these results?
> p-value for the "less" is 0.999.

That the p-values for "less" and "greater" seem to sum to (roughly) one,
and that the p-value for "greater" is half of the two-sided one. You
shouldn't ask what we can say. You should ask yourself: "What was the
question, and does this test give me an answer to that question?"
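Those relations can be seen directly (a sketch; for a sample this size
without ties, wilcox.test uses the exact distribution):

```r
set.seed(1)
x <- rnorm(30, mean = 0.3)
p_less    <- wilcox.test(x, mu = 0, alternative = "less")$p.value
p_greater <- wilcox.test(x, mu = 0, alternative = "greater")$p.value
p_two     <- wilcox.test(x, mu = 0, alternative = "two.sided")$p.value
p_less + p_greater   # slightly above 1: both sides include P(V = v_obs)
p_two                # 2 * min(p_less, p_greater), capped at 1
```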

Cheers
Joris



Re: [R] Comparing distributions

2010-06-23 Thread Joris Meys
A qqplot would indeed help. ?ks.test for more formal testing, but be
aware: You should also think about what you call similar
distributions. See following example :

set.seed(12345)
x1 <- c(rnorm(100),rnorm(150,3.3,0.7))
x2 <- c(rnorm(140,1,1.2),rnorm(110,3.3,0.6))
x3 <- c(rnorm(140,2,1.2),rnorm(110,4.3,0.6))
d1 <-density(x1)
d2 <- density(x2)
d3 <- density(x3)

xlim <- 1.2*c(min(x1,x2,x3),max(x1,x2,x3))
ylim <- 1.2*c(0,max(d1$y,d2$y,d3$y))

op <- par(mfrow=c(1,3))
plot(d1,xlim=xlim,ylim=ylim)
lines(d2,col="red")
lines(d3,col="blue")
qqplot(x1,x2)
qqplot(x2,x3)
par(op)

# formal testing
ks.test(x1,x2)
ks.test(x2,x3)

# relocate x3
x3b <- x3 - mean(x3-x2)
x3c <- x3 - mean(x3-x1)

# formal testing
ks.test(x2,x3b)
ks.test(x1,x3c)

# test location
t.test(x2-x1)
t.test(x3-x2)
t.test(x3-x1)

Cheers
Joris

On Wed, Jun 23, 2010 at 9:33 PM, Ralf B  wrote:
> I am trying to do something in R and would appreciate a push into the
> right direction. I hope some of you experts can help.
>
> I have two distributions obtained from 1 datapoints each (about
> 1 datapoints each, non-normal with a multi-modal shape (when
> eye-balling densities) but other than that I know little about its
> distribution). When plotting the two distributions together I can see
> that the two densities are alike with a certain distance to each other
> (e.g. 50 units on the X axis). I tried to plot a simplified picture of
> the density plot below:
>
>
>
>
> |
> |                                                         *
> |                                                      *     *
> |                                                   *    +   *
> |                                              *     +     +  *
> |                     *        +           *   +            +  *
> |                 *        +*     +   *  +                   + *
> |              *       +       *     +                           +*
> |           *       +                                               +*
> |        *       +                                                    +*
> |     *      +                                                          + *
> |  *      +                                                               + *
> |___
>
>
> What I would like to do is to formally test their similarity or
> otherwise measure it more reliably than just showing and discussing a
> plot. Is there a general approach other than using a Mann-Whitney test
> which is very strict and seems to assume a perfect match. Is there a
> test that takes in a certain 'band' (e.g. 50,100, 150 units on X) or
> are there any other similarity measures that could give me a statistic
> about how close these two distributions are to each other ? All I can
> say from eye-balling is that they seem to follow each other and it
> appears that one distribution is shifted by an amount from the other.
> Any ideas?
>
> Ralf
>
>





Re: [R] question about a program

2010-06-23 Thread Joris Meys
Most of the computation time is in the functions qvnorm. You can win a
little bit by optimizing the code, but the gain is relatively small.
You can also decrease the interval used to evaluate qvnorm to win some
speed there. As you look for the upper tail, no need to evaluate the
function in negative numbers. Eg :

pair_kFWER2=function(m,alpha,rho,k,pvaluessort){
library(mvtnorm)
cc_z <- numeric(m)
Var <- matrix(c(1,rho,rho,1), nrow=2, ncol=2, byrow=T)

lpart <- 1:k  # first part
hpart <- (k+1):m  # second part

# make first part of the cc_z
cc_z[lpart] <- qmvnorm((k*(k-1))/(m*(m-1))*alpha, tail="upper",
interval=c(0,4),sigma=Var)$quantile

# make second part of the cc_z
cc_z[hpart] <- sapply(hpart,function(i){
qmvnorm((k*(k-1))/((m-i+k)*(m-i+k-1))*alpha,interval=c(0,4),
tail="upper", sigma=Var)$quantile
})

crit_cons <- 1-pnorm(cc_z)

# does the test whether values of crit_cons[i] are smaller than
# values pvaluessort[i] and returns a vector
# which says at which positions does not occur
# tail takes the last position. I guess that's what you wanted
out <- tail(which(!(crit_cons < pvaluessort)),1)
return(out)
}

> system.time(replicate(100,pair_kFWER(m,alpha,rho,k,pvaluessort)))
   user  system elapsed
   5.97    0.04    6.04

> system.time(replicate(100,pair_kFWER2(m,alpha,rho,k,pvaluessort)))
   user  system elapsed
   4.43    0.00    4.44

Check whether the function works correctly. It gives the same result
as the other one in my tests. Only in the case where your function
returns 0 does the modified one return integer(0).

Cheers
Joris


On Wed, Jun 23, 2010 at 2:21 PM, li li  wrote:
> Dear all,
>   I have the following program for a multiple comparison procedure.
> There are two functions for the two steps. First step is to calculate the
> critical values,
> while the second step is the actual procedure [see below: program with two
> functions].
>   This  work fine. However, However I want to put them into one function
> for the convenience
> of later use [see below: program with one function]. Some how the big
> function works extremely
> slow.  For example I chose
> m <- 10
> rho <- 0.1
> k <- 2
> alpha <- 0.05
> pvaluessort <- sort(1-pnorm(rnorm(10))
>
> Is there anything that I did wrong?
>  Thank you!
>                    Hannah
>
>
> ##Program with two functions
> ## first step
> library(mvtnorm)
> cc_f <- function(m, rho, k, alpha) {
>
> cc_z <- numeric(m)
> var <- matrix(c(1,rho,rho,1), nrow=2, ncol=2, byrow=T)
> for (i in 1:m){
>   if (i <= k) {cc_z[i] <- qmvnorm((k*(k-1))/(m*(m-1))*alpha, tail="upper",
> sigma=var)$quantile} else
>               {cc_z[i] <- qmvnorm((k*(k-1))/((m-i+k)*(m-i+k-1))*alpha,
> tail="upper", sigma=var)$quantile}
>               }
> cc <- 1-pnorm(cc_z)
> return(cc)
>                             }
> ## second step
> pair_kFWER=function(m,crit_cons,pvaluessort){
> k=0
> while((k  k=k+1
> }
> return(m-k)
> }
> ###
> ##Program with one function ##
>
> pair_kFWER=function(m,alpha,rho,k,pvaluessort){
> library(mvtnorm)
> cc_z <- numeric(m)
> var <- matrix(c(1,rho,rho,1), nrow=2, ncol=2, byrow=T)
> for (i in 1:m){
>   if (i <= k) {cc_z[i] <- qmvnorm((k*(k-1))/(m*(m-1))*alpha, tail="upper",
> sigma=var)$quantile} else
>               {cc_z[i] <- qmvnorm((k*(k-1))/((m-i+k)*(m-i+k-1))*alpha,
> tail="upper", sigma=var)$quantile}
>               }
> crit_cons <- 1-pnorm(cc_z)
>
> k=0
> while((k  k=k+1
> }
> return(m-k)
> }
> #####
>
>        [[alternative HTML version deleted]]
>
>





Re: [R] (help) This is an R workspace memory processing question

2010-06-23 Thread Joris Meys
http://lmgtfy.com/?q=R+large+data+sets

Cheers
Joris

On Wed, Jun 23, 2010 at 6:38 AM, 나여나  wrote:
>
>    Hi all!
>
>
>   This is an R workspace memory processing question.
>
>
>   There is a method which from the R will control 10GB data at 500MB units?
>
>
>   my work environment :
>
>   R version : 2.11.1
>   OS : WinXP Pro sp3
>
>
>   Thanks and best regards.
>
>
>   Park, Young-Ju
>
>   from Korea.
>
>





Re: [R] "save scores" from sem

2010-06-23 Thread Joris Meys
I should have specified: lavaan is not familiar to me. I'm also not
familiar enough with the sem package to tell you how to obtain the
scores, but all information regarding the fit is in the object. With
that, it shouldn't be too difficult to get the scores you want. This
paper might give you some more information, in case you didn't know it
yet :

http://socserv.mcmaster.ca/jfox/Misc/sem/SEM-paper.pdf

On a side note, sem with a single latent variable might be seen as a
factor analysis with one component, but definitely not as a PCA. A PCA
is constructed based on the total variance, rendering an orthogonal
space with as many dimensions as there are variables. Not so for a FA,
as the matrix used to calculate the eigenvectors and eigenvalues is a
reduced matrix, in essence only taking into account part of the
variation in the data for calculation of the loadings. This makes PCA
absolutely defined, but FA only up to a rotation.

On a second side note, using the saved scores in some subsequent
analysis is in my view completely against the idea behind sem.
Structural equation modelling combines those observed variables
exactly to be able to take the variation on the combined latent
variable into account. If you use those latent variables as input in a
second analysis, you lose the information regarding the variation.
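That said, for completeness, a sketch of score extraction on both sides
(built-in mtcars data; the variable choice is purely illustrative):

```r
vars <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
pc <- princomp(vars, cor = TRUE)
head(pc$scores)    # PCA scores: fully determined
fa <- factanal(vars, factors = 1, scores = "regression")
head(fa$scores)    # FA scores: determined only up to rotation/sign
```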

Cheers
Joris



On Wed, Jun 23, 2010 at 9:53 AM, Steve Powell  wrote:
> Dear Joris,
> thanks for your reply - it is the sem case which interests me. Suppose
> for example I use sem to construct a CFA for a set of variables, with
> a single latent variable, then this could be equivalent to a PCA with
> a single component, couldn't it? From the PCA I could "save" the
> scores as new variables; is there an equivalent with sem? This would
> be particularly useful if e.g. in sem I let some of the errors covary
> and then wanted to use the "saved scores" in some subsequent analysis.
>
> By the way, lavaan is at cran.r-project.org/web/packages/lavaan/index.html
>
> Best Wishes
> Steve
>
> www.promente.org | skype stevepowell99 | +387 61 215 997
>
>
>
>
> On Tue, Jun 22, 2010 at 7:08 PM, Joris Meys  wrote:
>> PCA and factor analysis is implemented in the core R distribution, no
>> extra packages needed. When using princomp, they're in the object.
>>
>>  pr.c <- princomp(USArrests, cor=TRUE)  # princomp() has no 'scale' argument
>>  pr.c$scores # gives you the scores
>>
>> see ?princomp
>>
>> When using factanal, you can ask for regression scores or Bartlett
>> scorse. See ?factanal.
>> Mind you, you will get different -i.e. more correct- results than
>> those obtained by SPSS.
>>
>> I don't understand what you mean with scores in the context of
>> structural equation modelling. Lavaan is unknown to me.
>>
>> Cheers
>> Joris
>>
>> On Tue, Jun 22, 2010 at 3:11 PM, Steve Powell  wrote:
>>>  Dear expeRts,
>>> sorry for such a newbie question -
>>> in PCA/factor analysis e.g. in SPSS it is possible to save scores from the
>>> factors. Is it analogously possible to "save" the implied scores from the
>>> latent variables in a measurement model or structural model e.g. using the
>>> sem or lavaan packages, to use in further analyses?
>>> Best wishes
>>> Steve Powell
>>>
>>> www.promente.org | skype stevepowell99 | +387 61 215 997
>>>
>>>
>>
>>
>>
>>
>





Re: [R] Mahalanobis distance

2010-06-22 Thread Joris Meys
Say X is your data matrix with the variable, then you could do :
 X <- matrix(rnorm(2100),300,7)
 S <- var(X)
 dist <- as.dist(
 apply(X,1,function(i){
 mahalanobis(X,i,S)
 }
 )
 )
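Note that mahalanobis() returns *squared* distances; for the MDS plot in
the original question you may want the square root before handing the
object to cmdscale(). A sketch on example data:

```r
set.seed(42)                          # example data only
X <- matrix(rnorm(2100), 300, 7)
S <- var(X)
D2 <- apply(X, 1, function(i) mahalanobis(X, i, S))  # 300 x 300, squared
D  <- as.dist(sqrt(D2))               # proper distances
mds <- cmdscale(D, k = 2)             # coordinates for an MDS plot
dim(mds)                              # 300 2
```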

Cheers
Joris

On Tue, Jun 22, 2010 at 11:41 PM, yoo hoo  wrote:
> I am a new R user.  i have a question about Mahalanobis distance.actually i 
> have 300 rows and 7 columns. columns are different measurements, 300 rows are 
> genes. since genes can
> classify into 4 categories. i used dist() with euclidean distance and 
> cmdscale to do MDS plot. but find out Mahalanobis distance may be
> better. how do i use mahalanobis() to generate a similar dist object which i 
> can use for the MDS plot? my second question is whether i should calculate the mean for
> every categories for every measurement first and do 4*4 distance matrix, or i 
> should calculate pairwise distance first and then find category
> means since i only care about relative position of 4 categories in MDS
> plot. Thank you very much.
>
>
>
>        [[alternative HTML version deleted]]
>
>





Re: [R] How to 'understand' R functions besides reading R codes

2010-06-22 Thread Joris Meys
There is:
1) read the help files
2) read the vignettes (see ?vignette )
3) look it up at www.rseek.org
4) check Google
5) look at the references mentioned in the help files (or do this first if
you're not familiar with the method)
6) ask here

If still nothing :
use another function/method.
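For point 1 and beyond, it also helps to look at the code a function
actually dispatches to; a few ways (sketch):

```r
var                                  # typing the name prints the R source
m <- getS3method("residuals", "lm")  # the method behind a generic
getAnywhere("predict.lm")            # find code hidden in a namespace
```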

Cheers
Joris

On Wed, Jun 23, 2010 at 2:29 AM, Yi  wrote:
> Apologize for not being clearer earlier. I would like to ask again. Thank
> Joris and Markleeds for response.
>
> Two examples:
>
> 1. Function 'var'. In R, it is the sum of square divided by (n-1) but not by
> n. (I know this in R class)
>
> 2. Function 'lm'. In R, the residual variance is the residual sum of squares
> divided by (n-2), not by n; the former is the least-squares estimate, while
> dividing by n would follow from the method of maximum likelihood. (I know
> this by looking up the book 'Introductory Statistics with R').
>
> But not all functions are explained in the books.
>
> I am wondering whether there is a general way to figure out how an R function
> works and what the underlying theory is, besides looking at the code
> (type var or lm at the R prompt), because R code is too difficult to
> understand.
>
> Thanks
>
> Yi
>
>        [[alternative HTML version deleted]]
>
>





Re: [R] Verify the linear regression model used in R ( fundamental theory)

2010-06-22 Thread Joris Meys
It's normally always specified, except in the case of least-squares
linear regression (lm), where it is considered obvious.

Cheers
Joris

On Wed, Jun 23, 2010 at 1:46 AM, Yi  wrote:
> Hi, folks,
>
> As I understand, Least-squares Estimate (second-moment assumption) and the
> Method of Maximum Likelihood (full distribtuion assumption) are used for
> linear regression.
>
> I do >?lm, but the help file does not tell me the model employed in R. But
> in the book 'Introductory Statistics with R', it indicates R estimate the
> parameters using the method of Least-squares. However it assumes the error
> is iid N(0, sigma^2).
>
> Am I correct? Is there any general way (like RSiteSearch() ) to find out
> what the model (theory) is for specific function? Let's say how to find out
> the assumption and the model used for rlm.
>
> Thanks
>
> Yi
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>





Re: [R] glm

2010-06-22 Thread Joris Meys
On Tue, Jun 22, 2010 at 1:00 AM, Samuel Okoye  wrote:
> Hi,
>
> I have the following data
>
> data1 <- data.frame(count = c(0,1,1,2,4,5,13,16,14), weeks = 1:9,
>     treat=c(rep("1mg",3),rep("5mg",3),rep("10mg",3)))
> and I am using
>
> library(splines)
>
> to fit
>
> glm.m <- glm(count ~ bs(weeks) + as.factor(treat), family = poisson, data = data1)
>
> and I am interested in predicting the count variale for the weeks 10, 11 and
> 12 with treat 10mg and 15mg.

bad luck for you.

newdat <-data.frame(
weeks=rep(10:12,each=2),
treat=rep(c("5mg","10mg"),times=3)
)

preds <- predict(glm.m,type="response",newdata=newdat,se.fit=T)
cbind(newdat,preds)

gives as expected :
Warning message:
In bs(weeks, degree = 3L, knots = numeric(0), Boundary.knots = c(1L,  :
  some 'x' values beyond boundary knots may cause ill-conditioned bases

  weeks treat       fit    se.fit residual.scale
1    10   5mg  5.934881  5.205426              1
2    10  10mg 12.041639  9.514347              1
3    11   5mg  4.345165  6.924663              1
4    11  10mg  8.816168 15.805171              1
5    12   5mg  2.781063  8.123436              1
6    12  10mg  5.642667 18.221007              1


Watch the standard errors on the predicted values. No, you shouldn't
predict outside your data space, especially when using splines. And
when interested in 15mg, well, you shouldn't put treatment as a factor
to start with.
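To make the 15mg prediction possible at all, dose has to enter the model as a number rather than a factor. A minimal sketch of that alternative (a hypothetical model, linear in dose on the log scale, not the OP's spline fit):

```r
# encode dose numerically so unseen doses can be extrapolated
data1 <- data.frame(count = c(0, 1, 1, 2, 4, 5, 13, 16, 14),
                    weeks = 1:9,
                    dose  = rep(c(1, 5, 10), each = 3))
m <- glm(count ~ weeks + dose, family = poisson, data = data1)
# predicted counts for weeks 10-12 at 15mg (still extrapolation, so be careful)
predict(m, newdata = data.frame(weeks = 10:12, dose = 15), type = "response")
```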

Cheers
Joris



Re: [R] "save scores" from sem

2010-06-22 Thread Joris Meys
PCA and factor analysis are implemented in the core R distribution; no
extra packages needed. When using princomp, the scores are in the object:

 pr.c <- princomp(USArrests, cor = TRUE)  # princomp has no 'scale' argument; cor = TRUE standardizes
 pr.c$scores # gives you the scores

see ?princomp

When using factanal, you can ask for regression scores or Bartlett
scores. See ?factanal.
Mind you, you will get different (i.e. more correct) results than
those obtained by SPSS.
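A short sketch of both kinds of scores in base R (the data sets and factor count here are just illustrative):

```r
# component scores from princomp (cor = TRUE works from the correlation matrix)
pr <- princomp(USArrests, cor = TRUE)
head(pr$scores)

# factor scores from factanal, requested explicitly
fa <- factanal(mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")],
               factors = 2, scores = "regression")
head(fa$scores)
```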

I don't understand what you mean by scores in the context of
structural equation modelling. Lavaan is unknown to me.

Cheers
Joris

On Tue, Jun 22, 2010 at 3:11 PM, Steve Powell  wrote:
>  Dear expeRts,
> sorry for such a newbie question -
> in PCA/factor analysis e.g. in SPSS it is possible to save scores from the
> factors. Is it analogously possible to "save" the implied scores from the
> latent variables in a measurement model or structural model e.g. using the
> sem or lavaan packages, to use in further analyses?
> Best wishes
> Steve Powell
>
> www.promente.org | skype stevepowell99 | +387 61 215 997
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>





Re: [R] Duplicate dates in zoo objects

2010-06-22 Thread Joris Meys
Oops, I was too fast. I meant :

i <- length(x)- match(unique(index(x)),rev(index(x)))+1

(one has to reverse the indices again)

But in any case, the aggregate-way is indeed far cleaner. Thx for
pointing that out.
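The idea behind the corrected index trick can also be expressed with duplicated(); a base-R sketch on plain vectors, so the logic is visible without zoo:

```r
# keep the last value for each duplicated date
d <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14, 15) - 1
v <- 1:6
keep <- !duplicated(d, fromLast = TRUE)  # marks earlier copies of a date as duplicates
data.frame(date = d[keep], value = v[keep])
# the 2003-02-07 duplicate resolves to value 4, matching the aggregate() result
```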
Cheers
Joris

On Tue, Jun 22, 2010 at 7:24 PM, Achim Zeileis  wrote:
> On Tue, 22 Jun 2010, Joris Meys wrote:
>
>> Try this :
>>>
>>> x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14) - 1
>>
>>> x <- zoo(1:5, x.Date)
>>
>>> x
>>
>> 2003-02-01 2003-02-03 2003-02-07 2003-02-07 2003-02-14
>>        1          2          3          4          5
>>
>>> i <- match(unique(index(x)),rev(index(x)))
>>
>>> x[i]
>>
>> 2003-02-01 2003-02-03 2003-02-07 2003-02-14
>>        1          2          4          5
>
> Note that this happens to do the right thing in this particular toy example
> but not more generally. Simply adding a single observation will make it
> fail. The aggregate() solution I posted previously is preferred:
>
> R> x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14, 15) - 1
> R> x <- zoo(1:6, x.Date)
> Warning message:
> In zoo(1:6, x.Date) :
>  some methods for "zoo" objects do not work if the index entries in
> 'order.by' are not unique
> R> x
> 2003-02-01 2003-02-03 2003-02-07 2003-02-07 2003-02-14 2003-02-15
>         1          2          3          4          5          6
> R> aggregate(x, time(x), tail, 1)
> 2003-02-01 2003-02-03 2003-02-07 2003-02-14 2003-02-15
>         1          2          4          5          6
> R> i <- match(unique(index(x)),rev(index(x)))
> R>  x[i]
> 2003-02-01 2003-02-03 2003-02-07 2003-02-14 2003-02-15
>         1          2          3          5          6
>
> Best,
> Z
>
>> Cheers
>> Joris
>>
>>
>> On Tue, Jun 22, 2010 at 4:35 PM, Research 
>> wrote:
>>>
>>> Hello,
>>>
>>> I have a zoo time series read from an excel file which has some dates the
>>> same, such as the following example:
>>>
>>> 02/10/1995     4925.5
>>> 30/10/1995     4915.9
>>> 23/01/1996     4963.5
>>> 23/01/1996     5009.2
>>> 04/03/1996     5031.9     # here
>>> 04/03/1996     5006.5     # here
>>> 03/04/1996     5069.2
>>> 03/05/1996     5103.7
>>> 31/05/1996     5107.1
>>> 01/07/1996     5153.1
>>> 02/08/1996     5151.7
>>>
>>> Is there a simple way to keep the last  price of the ones that have the
>>> same
>>> dates?
>>>
>>> 04/03/1996    5031.9
>>> 04/03/1996    5006.5
>>>
>>> i.e., keep only the "04/03/1996    5006.5"  price and discard the
>>> previous
>>> one... Is there an implicit function that does that or do I need some
>>> sort
>>> of recursive algorithm?
>>>
>>> You can try a solution on this example (for convenience):
>>>
>>> x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14) - 1
>>> x <- zoo(rnorm(5), x.Date)
>>>
>>> Zoo object  has 2 prices with same dates.
>>>
>>> Many thanks in advance,
>>> Costas
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Joris Meys
>> Statistical consultant
>>
>> Ghent University
>> Faculty of Bioscience Engineering
>> Department of Applied mathematics, biometrics and process control
>>
>> tel : +32 9 264 59 87
>> joris.m...@ugent.be
>> ---
>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>





Re: [R] Duplicate dates in zoo objects

2010-06-22 Thread Joris Meys
Try this :
> x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14) - 1

> x <- zoo(1:5, x.Date)

> x
2003-02-01 2003-02-03 2003-02-07 2003-02-07 2003-02-14
         1          2          3          4          5

> i <- match(unique(index(x)),rev(index(x)))

> x[i]
2003-02-01 2003-02-03 2003-02-07 2003-02-14
         1          2          4          5

Cheers
Joris


On Tue, Jun 22, 2010 at 4:35 PM, Research  wrote:
> Hello,
>
> I have a zoo time series read from an excel file which has some dates the
> same, such as the following example:
>
> 02/10/1995     4925.5
> 30/10/1995     4915.9
> 23/01/1996     4963.5
> 23/01/1996     5009.2
> 04/03/1996     5031.9     # here
> 04/03/1996     5006.5     # here
> 03/04/1996     5069.2
> 03/05/1996     5103.7
> 31/05/1996     5107.1
> 01/07/1996     5153.1
> 02/08/1996     5151.7
>
> Is there a simple way to keep the last  price of the ones that have the same
> dates?
>
> 04/03/1996    5031.9
> 04/03/1996    5006.5
>
> i.e., keep only the "04/03/1996    5006.5"  price and discard the previous
> one... Is there an implicit function that does that or do I need some sort
> of recursive algorithm?
>
> You can try a solution on this example (for convenience):
>
> x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14) - 1
> x <- zoo(rnorm(5), x.Date)
>
> Zoo object  has 2 prices with same dates.
>
> Many thanks in advance,
> Costas
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>





Re: [R] replication time series using permutations of the each "y" values

2010-06-22 Thread Joris Meys
Read the posting guide please. What's your data structure? That's
quite important. As I see it, I can easily get a matrix with what you
want by :
x1 <- rep(c("a", "b"), each = 3)
x2 <- rep(c("c", "d", "f"), times = 2)
x3 <- rep("g", 6)
x4 <- rep("h", 6)
result <- rbind(x1, x2, x3, x4)

But that's not what you want probably.
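For the enumeration itself, expand.grid generates every combination of per-position candidates; a sketch assuming the candidate values are stored in a list, one element per time point:

```r
# one candidate value set per time point; expand.grid builds all series
opts <- list(t1 = c("a", "b"), t2 = c("c", "d", "f"), t3 = "g", t4 = "h")
series <- expand.grid(opts, stringsAsFactors = FALSE)
nrow(series)   # 2 * 3 * 1 * 1 = 6 possible series, one per row
```

The same call works for the larger case, though with 2^14 combinations it may be more practical to sample rows than to materialize all of them.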
Cheers
Joris

2010/6/22 Josué Polanco :
> Dear All,
>
> I'm trying to create this:
> I've these data (a, b,c, etc. are numbers) :
>
> 1  a b
> 2  c d f
> 3  g
> 4  h
>   ...
> N
>
> I'm trying to create the all "time series permutations" (length = N)
> taking account
> the possibilities for each value. For example (N=4),
>
> 1  a
> 2  c
> 3  g
> 4  h
>
> next,
> 1   b
> 2   c
> 3   g
> 4   h
>
> next
> 1   a
> 2   d
> 3   g
> 4   h
>
> and so on. For this example, there are 2*3 different (possibilities)
> time series, but
> I need to do, v. gr., N = 100, and 2^14  different (possibilities)
> time series. I am wondering
> if somebody could give me a hint (suggestion) to make this using a
> function in R.
>
> thank you so much
>
> best regards
>
> --
> Josue Polanco
>
> ______
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>





Re: [R] How do you install R package?

2010-06-22 Thread Joris Meys
open the R console.
type:
?install.packages

press enter.

read.

say "Doh, that's easy."
do what you read.

cheers
Joris

On Tue, Jun 22, 2010 at 5:38 PM, Amy Li  wrote:
> Hi I'm a new user. I need to install some new packages. Can someone show me?
>
> Amy
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>





Re: [R] Project

2010-06-22 Thread Joris Meys
Presuming you're talking about Perl, I might be able to help with
specific issues, but read the posting guide before you actually call
upon our help.

http://www.R-project.org/posting-guide.html

One book I can definitely recommend is Phil Spector's "Data Manipulation with R":

http://www.springer.com/statistics/computanional+statistics/book/978-0-387-74730-9

It covers most you need to know about data structures in R, as they're
quite different from Perl. Pay attention to the vectorization and the
use of indices in R, as both concepts differ substantially from Perl.
In essence, avoid for-loops as much as you can. One other source you
might look at is the R inferno of Patrick Burns :

http://lib.stat.cmu.edu/S/Spoetry/Tutor/R_inferno.pdf
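The loop-avoidance advice in one picture; a toy sketch contrasting the Perl-ish style with idiomatic R:

```r
x <- 1:10

# Perl-style explicit loop
squares_loop <- numeric(length(x))
for (i in seq_along(x)) squares_loop[i] <- x[i]^2

# idiomatic R: operate on the whole vector at once
squares_vec <- x^2

identical(squares_loop, squares_vec)  # TRUE
```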

Cheers
Joris

On Tue, Jun 22, 2010 at 4:35 AM, Colton Hunt  wrote:
> I am an intern in Virginia Beach and I am currently working on a project
> that involves converting a script of Pearl to R. The code takes one input
> text file and outputs six text files. If you think you can help, I will be
> able to provide more detailed information. Thank you for your time!
>
> Colton Hunt
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>





Re: [R] {Spam?} RE: {Spam?} RE: {Spam?} Re: mgcv, testing gamm vs lme, which degrees of freedom?

2010-06-22 Thread Joris Meys
Hi Carlos,

there is no possible way you can compare both models using a classical
statistical framework, be it ML, REML or otherwise. The assumptions
are violated. Regarding the df, see my previous mail.

In your case, I'd resort to the AIC/BIC criteria, and if prediction is
the main focus, compare the predictive power of both models using a
crossvalidation approach. Wood suggests in his book also a MCMC
approach for more difficult comparisons.

Cheers
Joris

On Tue, Jun 22, 2010 at 1:31 AM, Carlo Fezzi  wrote:
> Hi Christos,
>
> thanks for your kind reply, I agree entirely with your interpretation.
>
> In the first model comparison, however, "anova" does seem to work
> according to our interpretation, i.e. the "df" are equal in the two models.
> My intuition is that the "anova" command does a fixed-effect test rather
> than a random-effect one. These are the results I get:
>
> anova(f1$lme,f2$lme)
>
>       Model df      AIC      BIC    logLik
> f1$lme     1  5 466.6232 479.6491 -228.3116
> f2$lme     2  5 347.6293 360.6551 -168.8146
>
> Hence I was not sure our interpretation was correct.
>
> On your second point, I totally agree about the appeal of
> GAMs... however, I am working in a specific application where the quadratic
> function is the established benchmark, and I think that testing against it
> will show even more strongly the appeal of a gamm approach. Any idea of
> which bases could work?
>
> Finally, thanks for the tip regarding gamm4; unfortunately I need to fit a
> bivariate smooth so I cannot use it.
>
> Best wishes,
>
> Carlo
>
>
>
>


Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-22 Thread Joris Meys
Hehe,

You do have a point in not calling R a statistical language. It is
indeed far more than that; Yet, I don't agree that statistics is done
by stuffy professors. Wished it was so, but alas, last time I looked
at my paycheck I had to conclude that I might be stuffy, but I'm far
from being paid as a professor...

Cheers
Joris

On Tue, Jun 22, 2010 at 11:34 AM, Patrick Burns
 wrote:
> I'll expand my statement slightly.
>
> Yes, Peter, you are the archetypical
> stuffy professor.  The truth hurts.
>
> By any reasonable metric that I've
> thought of my company name is at least
> one-third "statistics", from which a
> common (and I think correct) inference
> would be that I'm not anti-statistics.
>
>
> There are two aspects of why I think
> that R should not be called a statistical
> program: marketing and reality.
>
> Marketing
>
> Identifying with the most dreaded experience
> in university is not so good for "sales".
> (Reducing stuffiness might reduce the root
> problem here.)
>
> Reality
>
> R really is used for more than statistics.
> Almost all of my use of R is outside the
> realm of statistics.  Maybe the field of
> statistics should have claim on a lot of
> that, but as of now that isn't the case.
>
> A Fusion
>
> R's real competition is not SAS or SPSS, but
> Excel.  As Brian has pointed out before,
> the vast majority of statistics is actually
> done in Excel.  Is Excel a statistics program?
> I don't think many people think that -- neither
> statisticians nor non-statisticians.
>
> Pat
>
>
> On 21/06/2010 10:32, Joris Meys wrote:
>>
>> On Mon, Jun 21, 2010 at 11:15 AM, Patrick Burns
>>   wrote:
>>>
>>> (Statistics is what stuffy professors
>>> do, I just look at my data and try to
>>> figure out what it means.)
>>
>> Often those stuffy professors have a reason to do so. When they want
>> an objective view on the data for example, or an objective measure of
>> the significance of a hypothesis. But you're right, who cares about
>> objectiveness these days? It doesn't sell you a paper, does it?
>>
>> Cheers
>> Joris
>>
>>
>
> --
> Patrick Burns
> pbu...@pburns.seanet.com
> http://www.burns-stat.com
> (home of 'Some hints for the R beginner'
> and 'The R Inferno')
>





Re: [R] {Spam?} Re: mgcv, testing gamm vs lme, which degrees of freedom?

2010-06-22 Thread Joris Meys
On Mon, Jun 21, 2010 at 10:01 PM, Bert Gunter  wrote:
>
> 1. This discussion probably belongs on the sig-mixed-models list.

Correct, but I'll keep it here now for archive purposes.

>
> 2. Your claim is incorrect, I think. The structure of the random errors =
> model covariance can be parameterized in various ways, and one can try to
> test significance of nested parameterizations (for a particular fixed
> effects parameterizaton). Whether you can do so meaningfully especially in
> the gamm context --  is another issue, but if you check e.g. Bates and
> Pinheiro, anova for different random effects parameterizations is advocated,
> and is implemented in the anova.lme() nlme function.
>
It is a matter of debate and interpretation. Mind you, I never
said it can't be done; I just wanted to point out that in most
cases, it shouldn't be done. As somebody else noted, both Wood and
Pinheiro and Bates suggest testing the random components _if the fixed
effects are the same_ . Yet, even then it gets difficult to offer the
correct hypothesis. In the example of Carlo, H0 is actually not
correct :

1) The "7 extra random components" are not really 7 extra random
components, as they are definitely not independent, but actually all
part of the same spline fit.

2) According to my understanding, the degrees of freedom for a
likelihood is determined by the amount of parameters fitted, not the
amount of variables in the model. The parameters linked to the random
effect are part of the variance structure, and the number of
parameters to be fitted in this variance structure is not dependent on
the number of knots, but on the number of smooths. Indeed, in the gamm
model the variance of the parameters b for the random effect is
considered to be equal for all b's related to the same smooth.

3) it appears to me that you violate the restriction both
Pinheiro/Bates and Wood put on the testing of models with LR : you can
only do so if none of the variance parameters are restricted to the
edge of the feasible parameter space. Again as I see it, the model
with less knots implies the assumption that some variance components
are zero.

Hence, you cannot use a LR test to compare both models. The anova.lme
won't carry out the test anyway :

 library(mgcv)
 library(nlme)
 set.seed(123)

  x  <- runif(100, 1, 10)                            # regressor
  b0 <- rep(rnorm(10, mean = 1, sd = 2), each = 10)  # random intercept
  id <- rep(1:10, each = 10)                         # identifier
  y  <- b0 + x - 0.1 * x^3 + rnorm(100, 0, 1)        # dependent variable

  f1 <- gamm(y ~ s(x, k=3, bs="cr"), random = list(id=~1), method="ML" )

  f2 <- gamm(y ~ s(x, k=10, bs="cr"), random = list(id=~1),method="ML" )

> anova(f1$lme,f2$lme)
       Model df      AIC      BIC    logLik
f1$lme     1  5 466.6232 479.6491 -228.3116
f2$lme     2  5 347.6293 360.6551 -168.8146

If you change the random term not related to the splines, it does
carry out the test. In this case, you can test the random effects as
explained by Pinheiro/Bates.

> f3 <- gamm(y ~ s(x, k=10, bs="cr"),method="ML" )

> anova(f2$lme,f3$lme)
       Model df      AIC      BIC    logLik   Test  L.Ratio p-value
f2$lme     1  5 347.6293 360.6551 -168.8146
f3$lme     2  4 446.2704 456.6910 -219.1352 1 vs 2 100.6411  <.0001

As an illustration, the variance of the random effects :
> f1$lme
...
Random effects:
 Formula: ~Xr.1 - 1 | g.1
            Xr.1
StdDev: 163.8058

 Formula: ~1 | id %in% g.1
        (Intercept) Residual
StdDev:    1.686341 2.063269

Number of Observations: 100
Number of Groups:
        g.1 id %in% g.1
          1          10

> f2$lme
...
Random effects:
 Formula: ~Xr.1 - 1 | g.1
 Structure: pdIdnot
           Xr.11    Xr.12    Xr.13    Xr.14    Xr.15    Xr.16    Xr.17    Xr.18
StdDev: 2.242349 2.242349 2.242349 2.242349 2.242349 2.242349 2.242349 2.242349
...

Clearly, the SD for all parameters b is exactly the same, and hence
only 1 parameter in the likelihood function. So both models won't
differ in df when using the ML estimation. This does not mean that the
b-coefficients for the random effects cannot be obtained. They are
just not part of the minimalization in the likelihood function. Or, as
Wood said about random effects : you don't estimate them, although you
might want to predict them.

Cheers
Joris



Re: [R] {Spam?} Re: mgcv, testing gamm vs lme, which degrees of freedom?

2010-06-21 Thread Joris Meys
>>>>>>> Dear all,
>>>>>>>
>>>>>>> I am using the "mgcv" package by Simon Wood to estimate an additive
>>>>>>> mixed
>>>>>>> model in which I assume normal distribution for the residuals. I
>>>>>>> would
>>>>>>> like to test this model vs a standard parametric mixed model, such
>>>>>>> as
>>>>>>> the
>>>>>>> ones which are possible to estimate with "lme".
>>>>>>>
>>>>>>> Since the smoothing splines can be written as random effects, is it
>>>>>>> correct to use an (approximate) likelihood ratio test for this?
>>>>>> -- yes this is ok (subject to the usual caveats about testing on the
>>>>>> boundary
>>>>>> of the parameter space) but your 2 example models below will have
>>>>>>  the
>>>>>> same
>>>>>> number of degrees of freedom!
>>>>>>
>>>>>>> If so,
>>>>>>> which is the correct number of degrees of freedom?
>>>>>> --- The edf from the lme object, if you are testing using the log
>>>>>> likelihood
>>>>>> returned by the  lme representation of the model.
>>>>>>
>>>>>>> Sometime the function
>>>>>>> LogLik() seems to provide strange results regarding the number of
>>>>>>> degrees
>>>>>>> of freedom (df) for the gam, for instance in the example I copied
>>>>>>> below
>>>>>>> the df for the "gamm" are equal to the ones for the "lme", but the
>>>>>>> summary(model.gam) seems to indicate a much higher edf for the gamm.
>>>>>> --- the edf for the lme representation of the model counts only the
>>>>>> fixed
>>>>>> effects + the variance parameters (which includes smoothing
>>>>>> parameters).
>>>>>> Each
>>>>>> smooth typically contributes only one or two fixed effect parameters,
>>>>>> with
>>>>>> the rest of the coefficients for the smooth treated as random
>>>>>> effects.
>>>>>>
>>>>>> --- the edf for the gam representation of the same model differs in
>>>>>> that
>>>>>> it
>>>>>> also counts the *effective* number of parameters used to represent
>>>>>> each
>>>>>> smooth: this includes contributions from all those coefficients that
>>>>>> the
>>>>>> lme
>>>>>> representation treated as strictly random.
>>>>>>
>>>>>> best,
>>>>>> Simon
>>>>>>
>>>>>>
>>>>>>> I would be very grateful to anybody who could point out a solution,
>>>>>>>
>>>>>>> Best wishes,
>>>>>>>
>>>>>>> Carlo
>>>>>>>
>>>>>>> Example below:
>>>>>>>
>>>>>>> 
>>>>>>>
>>>>>>> rm(list = ls())
>>>>>>> library(mgcv)
>>>>>>> library(nlme)
>>>>>>>
>>>>>>> set.seed(123)
>>>>>>>
>>>>>>> x  <- runif(100,1,10)                                # regressor
>>>>>>> b0 <- rep(rnorm(10,mean=1,sd=2),each=10)     # random intercept
>>>>>>> id <- rep(1:10, each=10)                     # identifier
>>>>>>>
>>>>>>> y <- b0 + x - 0.1 * x^3 + rnorm(100,0,1)  # dependent variable
>>>>>>>
>>>>>>> f1 <- lme(y ~ x + I(x^2), random = list(id=~1) , method="ML" )  #
>>>>>>> lme
>>>>>>> model
>>>>>>>
>>>>>>> f2 <- gamm(y ~ s(x), random = list(id=~1), method="ML" )    # gamm
>>>>>>>
>>>>>>> ## same number of "df" according to logLik:
>>>>>>> logLik(f1)
>>>>>>> logLik(f2$lme)
>>>>>>>
>>>>>>> ## much higher edf according to summary:
>>>>>>> summary(f2$gam)
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> __
>>>>>>> R-help@r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html and provide commented,
>>>>>>> minimal,
>>>>>>> self-contained, reproducible code.
>>>>>>
>>>>>> --
>>>>>>> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY
>>>>>>> UK
>>>>>>> +44 1225 386603  www.maths.bath.ac.uk/~sw283
>>>>>>
>>>>>
>>>>> __
>>>>> R-help@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Joris Meys
>>>> Statistical consultant
>>>>
>>>> Ghent University
>>>> Faculty of Bioscience Engineering
>>>> Department of Applied mathematics, biometrics and process control
>>>>
>>>> tel : +32 9 264 59 87
>>>> joris.m...@ugent.be
>>>> ---
>>>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Joris Meys
>> Statistical consultant
>>
>> Ghent University
>> Faculty of Bioscience Engineering
>> Department of Applied mathematics, biometrics and process control
>>
>> tel : +32 9 264 59 87
>> joris.m...@ugent.be
>> ---
>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>>
>
>
>





Re: [R] ctree

2010-06-21 Thread Joris Meys
You can't get them out of the plot function, but you can calculate them
from the fit. The code below returns a matrix of the class probabilities
with the appropriate rownames and colnames.

# class probabilities per observation, as used by plot()
x1 <- treeresponse(iris.ct)

# one row per terminal node, one column per class
p <- matrix(
  unlist(unique(x1)),
  ncol = 3,
  byrow = TRUE
)

colnames(p) <- levels(iris$Species)
rownames(p) <- unique(where(iris.ct))
p
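A slightly more robust sketch (assuming party's treeresponse() and where()
behaviour; untested against your data) ties each probability row to its node
id explicitly, instead of relying on two calls to unique() returning their
results in the same order:

```r
library(party)

# example fit from the question
iris.ct <- ctree(Species ~ ., data = iris)

node  <- where(iris.ct)        # terminal node id per observation
first <- !duplicated(node)     # one representative observation per node

# treeresponse() gives one probability vector per observation;
# bind the representatives into a node-by-class matrix
probs <- do.call(rbind, treeresponse(iris.ct)[first])
rownames(probs) <- node[first]
colnames(probs) <- levels(iris$Species)
probs
```

The same idea should carry over to your 55-node Marshct fit unchanged.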



On Mon, Jun 21, 2010 at 3:28 PM,   wrote:
>
> Hello,
>
> This is a resubmission of a question I submitted last week, but I haven't
> received any responses.
>
> I need to extract the probabilities used to construct the barplots
> displayed as part of the graph produced by plot() on a "ctree" object.
>
> For example,
>
> library(party)
>
> iris.ct <- ctree(Species ~ . , data = iris)
> plot(iris.ct)
>
> Instead of a simple example with only 4 terminal nodes, my analysis
> produces 55 terminal nodes, so unless someone is really interested I'm only
> using the iris data as an example.
>
> This is the model I am sending to ctree. The graph produced is very
> informative, but I need the information from the plot(Marshct)
>
> Marshct <- ctree(Physiogomy ~ meanAnnualDepthAve + meanAnnualDepthWetAve +
> medAnnualDepthAve + medianAnnualDepthWetAve + medianAnnualDepthDryAve +
> continHydroWetAve + DCHperiodAverage + DCHydroDryAve +
> threeDayWaterDepthMinAve + threeDayWaterDepthMaxAve + sevenDayDryFreqAve +
> sevenDayWaterDepthMaxAve + sevenDayWaterDepthMinAve +
> seventeenDayWaterDepMaxAve + seventeenDayWaterDepMinAve +
> thirtyoneDayWaterDepthMaxAve + wetIntensityAve + BulkDensity + LOI + TP +
> TN + TC + Total_Mg, data = Marsh)
>
> plot(Marshct)
>
>
> Working in Windows XP with R2.11.1 (2010-05-31)
>
> Thanks
> Steve
>
> Steve Friedman Ph. D.
> Spatial Statistical Analyst
> Everglades and Dry Tortugas National Park
> 950 N Krome Ave (3rd Floor)
> Homestead, Florida 33034
>
> steve_fried...@nps.gov
> Office (305) 224 - 4282
> Fax     (305) 224 - 4147
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Return value associated with a factor

2010-06-21 Thread Joris Meys
Your code is not runnable, so I can't check why it goes wrong, but have you
already tried as.character(save.tract$...)?
Cheers
Joris
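As a minimal illustration of the pitfall (a hypothetical toy example, not
your data): a factor stores its values as integer codes into its levels, so
numeric coercion returns the codes, while as.character() returns the labels:

```r
# toy factor resembling FIPS-style codes (hypothetical values)
f <- factor(c("12", "001", "002201"))

as.integer(f)     # internal level codes: 3 1 2 -- not the values
as.character(f)   # the actual labels: "12" "001" "002201"
```

Assigning a factor into a numeric column, as in
dbs.in$state[i] <- save.tract$state, triggers exactly this coercion to the
codes; wrapping the right-hand side in as.character() avoids it.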

On Mon, Jun 21, 2010 at 3:15 PM, GL  wrote:
>
> I am using the code below to extract census tract information.
> save.tract$state, save.tract$county and save.tract$tract are returned as
> factors. In the last three statements, I need to save the actual value of
> the factor, but, instead, the code is yielding the position of the factor.
> How do I instead return the value of the factor?
>
> By way of example, for Lon=-82.49574 and Lat=29.71495, the code returns
> state = 1, county = 1 and tract = 161. The desired results are state=12,
> county = 001 tract = 002201.
>
>
> #set libraries
> library(UScensus2000tract)
> library(gdata)
> data(florida.tract)
>
>
> #read input
> dbs.in = read.delim("addresses_coded_new.txt", header = TRUE, sep = "\t",
>                     quote="\"", dec=".")
>
> #instantiate output
> more.columns <- data.frame( state=double(0),
>                            county=double(0),
>                            tract=double(0))
>
> dbs.in <- cbind(dbs.in,more.columns)
>
> #figure out how many times to loop
> j <- nrow(dbs.in)
>
> #loop through each lab/long and assign census tract
>
> for (i  in 1:j) {
>
> index<-overlay(SpatialPoints(cbind(dbs.in$Lon[i],dbs.in$Lat[i])),florida.tract)
>         save.tract<-florida.tract[index,]
>         dbs.in$state[i] <- save.tract$state #this is returning the position
> in the list instead of the value
>         dbs.in$county[i] <- save.tract$county #this is returning the
> position in the list instead of the value
>         dbs.in$tract[i] <- save.tract$tract #this is returning the position
> in the list instead of the value
>    }
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Return-value-associated-with-a-factor-tp2262605p2262605.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Replacing elements of a list over a certain threshold

2010-06-21 Thread Joris Meys
You shouldn't use sapply/lapply for this; use logical indexing instead:

> set.seed(1)
> r <- round(runif(10,1,10))
> threshold <- 5
> head(r)
[1] 3 4 6 9 3 9

> system.time( r[ r>threshold ] <- threshold )
   user  system elapsed
  0   0   0

> head(r)
[1] 3 4 5 5 3 5
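An equivalent vectorised alternative is pmin(), which caps every element at
the threshold in a single call (same data as above):

```r
set.seed(1)
r <- round(runif(10, 1, 10))

# element-wise minimum of r and the threshold
r2 <- pmin(r, 5)
head(r2)   # 3 4 5 5 3 5
```

If the data really is a list rather than a vector, as in the original
question, unlist() it first: pmin(unlist(example), threshold).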



On Mon, Jun 21, 2010 at 2:03 PM, Christos Argyropoulos
 wrote:
>
> Hi,
> You should use the sapply/lapply for such operations.
>
>> r<-round(runif(10,1,10))
>> head(r)
> [1] 3 7 6 3 2 8
>> filt <- function(x, thres) ifelse(x > thres, thres, x)
>> system.time(r2 <- sapply(r, filt, thres=5))
>   user  system elapsed
>   3.36    0.00    3.66
>> head(r2)
> [1] 3 5 5 3 2 5
>
>
> To return a list, replace "sapply" with "lapply"
>
> Christos
>
>
>
>> Date: Mon, 21 Jun 2010 11:34:01 +0100
>> From: ja...@ipec.co.uk
>> To: r-help@r-project.org
>> Subject: [R]  Replacing elements of a list over a certain threshold
>>
>> Dear List,
>>
>> I have a list of length ~1000 filled with numerics. I need to replace
>> the elements of this list that are above a certain numerical threshold
>> with the value of the threshold.
>>
>> e.g
>> example=list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
>> threshold=5
>> 
>> example=(1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 3, 2, 1).
>>
>> I have written a crude script that achieves this but it's very slow. Is
>> there a way to do this using some R function?
>>
>> Crude script: http://pastebin.com/3KSfi8nD
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> _
> Hotmail: Free, trusted and rich email service.
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

