[R] possible problem with endpoints?

2011-03-17 Thread Erin Hodgess
Dear R People:

Hello again!

I found something unusual in the behavior of the endpoints function
from the xts package:

> x1 <- ts(1:24, start=2008, freq=12)
> dat <- seq(as.Date("2008/01/01"), length=24, by="months")
> library(zoo); library(xts)
> x2 <- zoo(1:24, order=dat)
> # Here is the surprise:
> endpoints(as.xts(x1), 'quarters')
 [1]  0  1  4  7 10 13 16 19 22 24
> endpoints(as.xts(x2), 'quarters')
[1]  0  3  6  9 12 15 18 21 24


Shouldn't the endpoints be the same, please?  What am I doing wrong, please?
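One way to start checking (a hedged diagnostic, not part of the original post) is to compare the index classes that as.xts() produces, since endpoints() works from the time index:

> class(index(as.xts(x1)))
> class(index(as.xts(x2)))   # if these differ, the quarter boundaries can differ too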

Thanks,
Erin


-- 
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plotting multiple figures on one page

2011-03-17 Thread Jim Lemon

On 03/17/2011 07:46 AM, scarlet wrote:

I am new to the R language. I am trying to plot multiple figures on one page
through a loop, but the code just produces one graph per page. Can someone
shed some light on what's wrong?

Here is my code:

library(quantreg)
tcdata <- read.table("mydata.txt", header=TRUE)
postscript("myfigure.ps")
basins <- paste(c("b1","b2","b3","b4","b5","b6"), sep="")
par(mfrow = c(2,3), mar=c(0,4,0,2), oma=c(5,0,3,0))

for (k in 1:6) {

data0 <- subset(tcdata, basin==basins[k])
attach(data0)

model <- rq(rain~year, tau=seq(0.1,0.9,0.1))
plot(summary(model, alpha=0.05, se="iid"), parm=2, pch=19, cex=1.2, mar=c(5,5,5,5),
     ylab=" ", xlab=" ")

}
dev.off()



Hi scarlet,
First, you don't really have to paste the values for basin together;
the c() is all you need. Trying this code:


library(quantreg)   # needed for rq()
tcdata <- data.frame(rain=sample(400:1500,360),
 year=rep(1927:1986,each=6), basin=rep(1:6,60))
par(mfrow = c(2,3), mar=c(0,4,0,2), oma=c(5,0,3,0))
for (k in unique(tcdata$basin)) {
 data0 <- subset(tcdata, basin==k)
 model <- rq(rain~year, data=data0, tau=seq(0.1,0.9,0.1))
 plot(summary(model, alpha=0.05, se="iid"), parm=2, pch=19,
  cex=1.2, mar=c(5,5,5,5), ylab=" ", xlab=" ")
}

does exactly what you say, that is, it ignores the division of the figure
area into six parts. As ordinary plots work okay, I assume that there is
something in the plot method for an rq model that overrides the call to
par(mfrow=...).
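If you want to confirm that, a hedged way to check (the method name plot.summary.rqs is my guess at what gets dispatched here, not something verified in this thread) is to print the method body and look for calls to par() or layout():

library(quantreg)
getAnywhere("plot.summary.rqs")   # inspect the body for par(mfrow=...) or layout()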


Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regex query (Apache logs)

2011-03-17 Thread Allan Engelhardt

Some comments:

1. [^\s] matches everything up to a literal 's', unless perl=TRUE.
2. The (.*) is greedy, so you'll need (.*?)\s(.*?)\s(.*?)$ or 
similar at the end of the expression


With those changes (and removing a space inserted by the newsgroup 
posting) the expression works for me.


> (pat <- readLines("/tmp/b.txt")[1])
[1] "^(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\s([^\\s]*)\\s([^\\s]*)\\s\\[([^\\]]+)\\]\\s\"([A-Z]*)\\s([^\\s]*)\\s([^\\s]*)\"\\s([^\\s]+)\\s(\\d+)\\s\"(.*?)\"\\s\"(.*?)\"\\s\"(.*?)\"$"

> regexpr(pat, test, perl=TRUE)
[1] 1
attr(,"match.length")
[1] 436

3. Consider a different approach, e.g. scan(textConnection(test), 
what=character(0))


Hope this helps

Allan


On 16/03/11 22:18, Saptarshi Guha wrote:

Hello R users,

I have this regex see [1] for apache log lines. I tried using R to parse
some data (only because I wanted to stay in R).
A sample line is [2]

(a) I saved the line in [1] into ~/tmp/a.txt and [2] into /tmp/a.txt

pat <- readLines("~/tmp/a.txt")
test <- readLines("/tmp/a.txt")
test
grep(pat, test)

returns integer(0)

The same query works in python via re.match() (i.e does return groups)

Using readLines, the regex is escaped for me. Do Python and R use
different regex styles?

Cheers
Saptarshi

[1]
  
^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s([^\s]*)\s([^\s]*)\s\[([^\]]+)\]\s"([A-Z]*)\s([^\s]*)\s([^\s]*)"\s([^\s]+)\s(\d+)\s"(.*)"\s"(.*)"\s"(.*)"$

[2]
220.213.119.925 addons.mozilla.org - [10/Jan/2001:01:55:07 -0800] GET
/blocklist/3/%8ce33983c0-fd0e-11dc-12aa-0800200c9a66%7D/4.0b5/Fennec/20110217140304/Android_arm-eabi-gcc3/chrome:%2F%2Fglobal%2Flocale%2Fintl.properties/beta/Linux%
202.6.32.9/default/default/6/6/1/ HTTP/1.1 200 3243 - Mozilla/5.0
(Android; Linux armv7l; rv:2.0b12pre) Gecko/20110217 Firefox/4.0b12pre
Fennec/4.0b5 BLOCKLIST_v3=110.163.217.169.1299218425.9706

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Flexible rbind

2011-03-17 Thread Santosh Srinivas
Dear All,

I am trying to create an empty structure that I want to fill gradually
through the code.
I want to use something like rbind to create the basic structure first.

I am looking for a possibility to do an rbind where the column names
don't match fully (but the missing columns can be defaulted to either
zero or NA); my actual data has a lot of columns.
Please see the data frames below

I have the following data frame that is filled
> dput(d)
structure(list(type = structure(1:9, .Label = c("test1", "test2",
"test3", "test4", "test5", "test6", "test7", "test8", "test9"
), class = "factor"), meter = c(37.25122438, 58.65845732, 63.47108421,
94.76085162, 13.75013867, 8.520664878, 79.74623167, 62.16109819,
20.47806657), degree = c(2.884440333, 74.94067898, 32.64251152,
83.24820274, 58.36084794, 12.64490368, 4.428796741, 32.12824213,
97.83710088)), .Names = c("type", "meter", "degree"), class =
"data.frame", row.names = c(NA,
-9L))

To the above data I want to append the following data to create an
empty structure for the new data
> dput(dNewTests)
structure(list(type = structure(1:9, .Label = c("test14", "test15",
"test16", "test17", "test18", "test19", "test20", "test21", "test22"
), class = "factor")), .Names = "type", class = "data.frame", row.names = c(NA,
-9L))

rbind obviously throws the following error:

> rbind(d, dNewTests)
Error in rbind(deparse.level, ...) :
  numbers of columns of arguments do not match

Is there any way that I can default the missing columns to NA or some
default value like zero?

Also, I tried to go through the archives and tried merge_all. What am
I doing wrong with the code below?

require(reshape)
merge_all(d, dNewTests, by="type")

Error in fix.by(by.x, x) :
  'by' must specify column(s) as numbers, names or logical


Thanks,
Santosh

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] table() reading problem

2011-03-17 Thread Jim Lemon

On 03/16/2011 08:20 PM, fre wrote:

I have the following problem:

I have a vector of numbers like k below. I want a table like the one the
function table() gives. However, I am not able to use the first row (the 1,
2, 3, 5, 6 or 9) in calculations. I tried to do that via a data.frame, but
that doesn't seem to work either. The levels keep bothering me.

This is an example of the code:

k <- c(1,1,1,1,1,3,5,6,2,1,3,9,2,3,1,1,1)

table(k)

k
1 2 3 5 6 9
9 2 3 1 1 1

x <- table(k)

dim(x)

[1] 6


x[1] # But I only want the one

1
9


x <- data.frame(x)

x[1,1] # You are not allowed to use this one; for example 3*x[1,1] is
impossible

[1] 1
Levels: 1 2 3 5 6 9




I hope someone has an idea of how to use the table function without this
inconvenience. I thought about writing a counter myself, but that seems
complicated.
Because I have to examine very large examples later on, I don't want to slow
the calculations down if possible.

Thanks for your help in advance.


Hi Frederique,
It seems to me that you want the names of the table rather than the 
counts, as Sarah has already suggested. So:


as.numeric(names(x))

should give you what you want. However, if all you want to do is 
determine the unique elements of k, then:


unique(k)

might be what you are seeking.
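For the original question (getting the values and their counts into a form you can do arithmetic on), a small hedged sketch along the same lines, assuming x <- table(k):

x <- table(k)
vals   <- as.numeric(names(x))   # the distinct values: 1 2 3 5 6 9
counts <- as.numeric(x)          # their counts:        9 2 3 1 1 1
3 * vals[1]                      # arithmetic now works; gives 3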

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subset using grepl

2011-03-17 Thread Kang Min
Ok thank you!

On Mar 17, 12:12 pm, bill.venab...@csiro.au wrote:
 subset(data, grepl("[1-5]", section) & !grepl("0", section))

 BTW

 grepl("[1:5]", section)

 does work.  It checks for the characters 1, :, or 5.  
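A hedged alternative (not from the thread) that avoids the "10 matches 1" problem altogether is to pull out the leading digits and compare them numerically:

sec <- as.numeric(sub("^([0-9]+).*", "\\1", data$section))
subset(data, sec %in% 1:5)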

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Kang Min
 Sent: Thursday, 17 March 2011 1:29 PM
 To: r-h...@r-project.org
 Subject: Re: [R] Subset using grepl

 I have a new question, also regarding grepl.
 I would like to subset rows with numbers from 1 to 5 in the section
 column, so I used

 subset(data, grepl("[1:5]", section))

 but this gave me rows with 10, 1 and 5. (Why is this so?) So I tried

 subset(data, grepl("[1,2,3,4,5]", section))

 which worked. But I also got 10 in the dataframe as well. How can I
 exclude 10?

 data
 section piece   LTc1    LTc2
 10a     10-1    0.729095368     NA
 10a     10-2    59.53292189     13.95612454
 10h     10-3    0.213756661     NA
 10i     10-4    NA      NA
 10b     NA      NA      NA
 10c     NA      NA      NA
 10d     NA      NA      NA
 10e     NA      NA      NA
 10f     NA      NA      NA
 10g     NA      NA      NA
 10h     NA      NA      NA
 10j     NA      NA      NA
 1b      1-1     NA      NA
 1d      1-2     29.37971303     12.79688209
 1g      1-6     NA      7.607911603
 1h      1-3     0.298059164     27.09896941
 1i      1-4     25.11261782     19.87149991
 1j      1-5     36.66969601     42.28507923
 1a      NA      NA      NA
 1c      NA      NA      NA
 1e      NA      NA      NA
 1f      NA      NA      NA
 2a      2-1     15.98582117     10.58696146
 2a      2-2     0.557308341     41.52650718
 2c      2-3     14.99499024     10.0896793
 2e      2-4     148.4530636     56.45493191
 2f      2-5     25.27493551     12.98808577
 2i      2-6     20.32857108     22.76075728
 2b      NA      NA      NA
 2d      NA      NA      NA
 2g      NA      NA      NA
 2h      NA      NA      NA
 2j      NA      NA      NA
 3a      3-1     13.36602867     11.47541439
 3a      3-7     NA      111.9007822
 3c      3-2     10.57406701     5.58567
 3d      3-3     11.73240891     10.73833651
 3e      3-8     NA      14.54214165
 3h      3-4     21.56072089     21.59748884
 3i      3-5     15.42846935     16.62715409
 3i      3-6     129.7367193     121.8206045
 3b      NA      NA      NA
 3f      NA      NA      NA
 3g      NA      NA      NA
 3j      NA      NA      NA
 5b      5-1     18.61733498     18.13545293
 5d      5-3     NA      7.81018526
 5f      5-2     12.5158971      14.37884817
 5a      NA      NA      NA
 5c      NA      NA      NA
 5e      NA      NA      NA
 5g      NA      NA      NA
 5h      NA      NA      NA
 5i      NA      NA      NA
 5j      NA      NA      NA
 9h      9-1     NA      NA
 9a      NA      NA      NA
 9b      NA      NA      NA
 9c      NA      NA      NA
 9d      NA      NA      NA
 9e      NA      NA      NA
 9f      NA      NA      NA
 9g      NA      NA      NA
 9i      NA      NA      NA
 9j      NA      NA      NA
 8a      8-1     14.29712852     12.83178905
 8e      8-2     23.46594953     9.097377872
 8f      8-3     NA      NA
 8f      8-4     22.20001584     20.39646766
 8h      8-5     50.54497551     56.93752065
 8b      NA      NA      NA
 8c      NA      NA      NA
 8d      NA      NA      NA
 8g      NA      NA      NA
 8i      NA      NA      NA
 8j      NA      NA      NA
 4b      4-1     40.83468857     35.99017683
 4f      4-3     NA      182.8060799
 4f      4-4     NA      36.81401955
 4h      4-2     17.13625062     NA
 4a      NA      NA      NA
 4c      NA      NA      NA
 4d      NA      NA      NA
 4e      NA      NA      NA
 4g      NA      NA      NA
 4i      NA      NA      NA
 4j      NA      NA      NA
 7b      7-1     8.217605633     8.565035083
 7a      NA      NA      NA
 7c      NA      NA      NA
 7d      NA      NA      NA
 7e      NA      NA      NA
 7f      NA      NA      NA
 7g      NA      NA      NA
 7h      NA      NA      NA
 7i      NA      NA      NA
 7j      NA      NA      NA
 6b      6-6     NA      11.57887288
 6c      6-1     27.32608984     17.17778959
 6c      6-2     78.21988783     61.80558768
 6d      6-7     NA      3.599685625
 6f      6-3     26.78838281     23.33258286
 6h      6-4     NA      NA
 6h      6-5     NA      NA
 6a      NA      NA      NA
 6e      NA      NA      NA
 6g      NA      NA      NA
 6i      NA      NA      NA
 6j      NA      NA      NA

 On Jan 29, 10:43 pm, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
  On Sat, 29 Jan 2011, Kang Min wrote:
   Thanks Prof Ripley, the condition worked!
   Btw I tried to search ?repl but I don't have documentation for it. Is
   it in a non-basic package?

  I meant grepl: the edit messed up (but not on my screen, as sometimes
  happens when working remotely).  The point is that 'perl=TRUE'
  guarantees that [A-J] is interpreted in ASCII order.

    On Jan 29, 6:54 pm, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
   The grep 

Re: [R] NaNs in Nested Mixed Model

2011-03-17 Thread Johan Stenberg
Thank you Thierry for your kind answer!

If you don't mind I would like to ask a follow-up question. In your
suggestions I get P-values for Species. However, I am really not
interested in that factor per se. Would it make sense to use this
model instead if I am only interested in Genotype?

 model <- glmer(Nymphs ~ Genotype + (1|Species/Genotype), family=poisson)

All the best,
Johan

2011/3/9 ONKELINX, Thierry thierry.onkel...@inbo.be:
 Dear Johan,

 A few remarks.

 - R-sig-mixed models is a better list for asking questions about mixed model.
 - I presume that Nymphs is the number of insects? In that case you need a 
 generalised linear (mixed) model with poisson family
 - What are you interested in? The variability among genotypes or the effect 
 of each genotype?
        You can achieve the first with a glmm like glmer(Nymphs ~ Species + 
 (1|Genotype), family = poisson). Genotype will be implicitly nested in 
 Species.  Note that since you have only 4 genotypes, you will not get very 
 reliable estimates of the genotype variance.
        For the latter you cannot use a mixed model so you need a simple 
 glm(Nymphs ~ Species/Genotype, family = poisson). Note that several 
 coefficients will be NaN, because you cannot estimate them.

 Best regards,

 Thierry

 
 ir. Thierry Onkelinx
 Instituut voor natuur- en bosonderzoek
 team Biometrie & Kwaliteitszorg
 Gaverstraat 4
 9500 Geraardsbergen
 Belgium

 Research Institute for Nature and Forest
 team Biometrics & Quality Assurance
 Gaverstraat 4
 9500 Geraardsbergen
 Belgium

 tel. + 32 54/436 185
 thierry.onkel...@inbo.be
 www.inbo.be

 To call in the statistician after the experiment is done may be no more than 
 asking him to perform a post-mortem examination: he may be able to say what 
 the experiment died of.
 ~ Sir Ronald Aylmer Fisher

 The plural of anecdote is not data.
 ~ Roger Brinner

 The combination of some data and an aching desire for an answer does not 
 ensure that a reasonable answer can be extracted from a given body of data.
 ~ John Tukey


 -Oorspronkelijk bericht-
 Van: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org] Namens Johan Stenberg
 Verzonden: dinsdag 8 maart 2011 16:52
 Aan: r-help@r-project.org
 Onderwerp: [R] NaNs in Nested Mixed Model

 Dear R users,

 I have a problem with something called NaNs in a nested mixed model.

 The background is that I have studied the number of insect
 nymphs emerging from replicated Willow genotypes in the
 field. I have 15 replicates each of 4 Willow genotypes
 belonging two 2 Willow species.
 Now I want to elucidate the effect of Willow genotype on the
 number of emerging nymphs. Previously I performed a simple
 one-way anova with genotype as explanatory factor and
 number of nymphs emerging as dependent variable, but the
 editor of the journal I've submitted this piece to wants me
 to nest Willow genotype within Willow species before he
 accepts the paper for publication [Species*Genotype(Species)].

 The fact that I didn't include Willow species as a factor
 in my initial analysis reflects that I am not very interested
 in the species factor per se - I am just interested in if
 genetic variation in the host plant is important, but
 species is of course a factor that structures genetic diversity.

 I thought the below model would be appropriate:

  model <- lme(Nymphs ~ Species*Genotype, random = ~1|Species/Genotype)

 ...but I then get the error message Error in MEEM(object, conLin,
 control$niterEM) : Singularity in backsolve at level 0, block 1

 I then tried to remove Genotype from the fixed factors, but
 then I get the error message NaNs produced.

  model <- lme(Nymphs ~ Species, random = ~1|Species/Genotype)
  summary(model)
 Linear mixed-effects model fit by REML
  Data: NULL
        AIC      BIC    logLik
   259.5054 269.8077 -124.7527

 Random effects:
  Formula: ~1 | Species
         (Intercept)
 StdDev:   0.9481812

  Formula: ~1 | Genotype %in% Species
         (Intercept) Residual
 StdDev:   0.3486937 1.947526

 Fixed effects: Nymphs ~ Species
                  Value Std.Error DF   t-value p-value
 (Intercept)   2.67  1.042243 56  2.558585  0.0132
 Speciesviminalis -2.03  1.473954  0 -1.379510     NaN
  Correlation:
              (Intr)
 Speciesviminalis -0.707

 Standardized Within-Group Residuals:
        Min         Q1        Med         Q3        Max
 -1.4581821 -0.3892233 -0.2751795  0.3439871  3.1630658

 Number of Observations: 60
 Number of Groups:
           Species Genotype %in% Species
             2             4
 Warning message:
 In pt(q, df, lower.tail, log.p) : NaNs produced
 ***

 Do you have any idea what these error messages mean in my
 case and how I can get around them?

 Thank you on beforehand! (data set attached).

 Johan


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Specify feature weights in model prediction (CARET)

2011-03-17 Thread Kendric Wang
It was a general question to find out whether the caret framework supports
passing feature weights to any predictive model. But I guess it may only
make sense in the context of a particular model. The ones I am interested in
would be svm and knn.

Thanks,
Kendric

On Wed, Mar 16, 2011 at 6:20 PM, Max Kuhn mxk...@gmail.com wrote:

  Using the 'CARET' package, is it possible to specify weights for features
  used in model prediction?

 For what model?

  And for the 'knn' implementation, is there a way
  to choose a distance metric (i.e. Mahalanobis distance)?
 

 No, sorry.

 Max


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] flow map lines between point pairs (latitude/longitude)

2011-03-17 Thread vioravis
I am working on a similar problem. I have to add two columns: one containing
the US state to which the origin belongs, and another containing the state to
which the destination belongs. All I have is the latitude and the longitude
of the origin and destination. Are there any packages in R that can do
this? Thank you.
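One possible route (an assumption on my part, not something from this thread) is the maps package, whose map.where() reports which state polygon a lon/lat point falls in; the column names below are made up:

library(maps)
mydata$origin_state <- map.where("state", mydata$origin_lon, mydata$origin_lat)
mydata$dest_state   <- map.where("state", mydata$dest_lon,  mydata$dest_lat)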

Ravi

--
View this message in context: 
http://r.789695.n4.nabble.com/flow-map-lines-between-point-pairs-latitude-longitude-tp860009p3383842.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fetch uneven

2011-03-17 Thread rens
Thanks !
You all made my day !


--
View this message in context: 
http://r.789695.n4.nabble.com/fetch-uneven-tp3381949p3383956.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R² for non-linear model

2011-03-17 Thread Rubén Roa
Hi Alexx,

I don't see any problem in comparing models based on different distributions 
for the same data using the AIC, as long as they have a different number of 
parameters and all the constants are included.
For example, you can compare distribution mixture models with different number 
of components using the AIC.
This is one example:
Roa-Ureta. 2010. A Likelihood-Based Model of Fish Growth With Multiple Length 
Frequency Data. Journal of Biological, Agricultural and Environmental 
Statistics 15:416-429.
Here is another example:
www.education.umd.edu/EDMS/fac/Dayton/PCIC_JMASM.pdf
Prof. Dayton writes above that one advantage of AIC over hypothesis testing is:
(d) Considerations related to underlying distributions for random variables
can be incorporated into the decision-making process rather than being
treated as an assumption whose robustness must be considered (e.g., models
based on normal densities and on log-normal densities can be compared).
Last, if you read Akaike's theorem you will see there is nothing precluding 
comparing models built on different distributional models. Here it is:
 the expected (over the sample space and the space of parameter estimates)
maximum log-likelihood of some data on a working model overshoots the expected
(over the sample space only) maximum log-likelihood of the data under the true
model that generated the data by exactly the number of parameters in the
working model.
A remarkable result.

Rubén

-Original Message-
From: r-help-boun...@r-project.org on behalf of Alexx Hardt
Sent: Wed 3/16/2011 7:42 PM
To: r-help@r-project.org
Subject: Re: [R] R² for non-linear model
 
Am 16.03.2011 19:34, schrieb Anna Gretschel:
 Am 16.03.2011 19:21, schrieb Alexx Hardt:
 And to be on-topic: Anna, as far as I know anova's are only useful to
 compare a submodel (e.g. with one less regressor) to another model.

 thanks! I don't get what they mean by fortune either...

It's an R-package (and a pdf [1]) with collected quotes from the mailing 
list.
Be careful with the suggestion to use AIC. If you want to compare two 
models using AICs, you need the same distribution (that is, the 
Verteilungsannahme, or distributional assumption) in both models.
To my knowledge, there is no way to compare a gaussian model to an 
exponential one (except common sense), but my knowledge is very limited.

[1] http://cran.r-project.org/web/packages/fortunes/vignettes/fortunes.pdf

-- 
alexx@alexx-fett:~$ vi .emacs

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rJava software

2011-03-17 Thread Bio7
Hello,

it seems that rJava tries to detect the path to the Java Virtual Machine
from a registry key which is not installed. I guess that HLM means
HKEY_LOCAL_MACHINE, where you normally find the path to your installed Java.
On my Windows system I have, for example, the path
HKEY_LOCAL_MACHINE\Software\JavaSoft\Java Runtime Environment\1.6.0_24
where you can find the key for the Java installation path.
Try to open regedit from your Windows console and convince yourself that the
path is not present.
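A hedged way to check the same key from inside R on Windows (utils::readRegistry; the exact key path depends on the installed Java version):

jre <- try(readRegistry("SOFTWARE\\JavaSoft\\Java Runtime Environment",
                        hive = "HLM", maxdepth = 2), silent = TRUE)
if (inherits(jre, "try-error")) message("JavaSoft key not found") else str(jre)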

Maybe a reinstallation of the JRE or JDK helps (for the registry key). Or
try to put your Java installation (the location) on the OS system path (open
the Windows shell and type java -version to see if an installation is already
on the system path); maybe that helps, too.

Also, does it matter where Java is installed?
Yes, it does matter because the OS has to find the executables and native
libs of the JRE, and rJava has to find them too because it relies on them.

I hope this helps.
 

--
View this message in context: 
http://r.789695.n4.nabble.com/rJava-software-tp3383366p3383977.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] adding linear regression data to plot

2011-03-17 Thread Jim Lemon

On 03/17/2011 02:47 AM, derek wrote:

I know I can add a line to a graph with abline(), but I would like to print
R-squared, the F-test value, residuals and other statistics from lm() on a
graph. I don't know how to access the values from summary(), so that I can
use them in following code or print them on a graph.


Hi derek,
As Peter has already replied, you can use str() to find out the 
components of the summary. Try something like this:


plot(1:10)
library(plotrix)
addtable2plot(2,8,coef(result))

where result is what has been returned by the call to lm.
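A small, hedged sketch (using the built-in cars data, not the poster's) of pulling the pieces out of summary() and drawing them with legend():

fit <- lm(dist ~ speed, data = cars)
s   <- summary(fit)
plot(dist ~ speed, data = cars)
abline(fit)
legend("topleft", bty = "n",
       legend = c(sprintf("R-squared = %.3f", s$r.squared),
                  sprintf("F = %.1f on %.0f and %.0f DF",
                          s$fstatistic[1], s$fstatistic[2], s$fstatistic[3])))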

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Why doesn't this work ?

2011-03-17 Thread Iain Gallagher
The first line of this reply is a definite candidate for the fortunes package!

best

i

--- On Thu, 17/3/11, bill.venab...@csiro.au bill.venab...@csiro.au wrote:

 From: bill.venab...@csiro.au bill.venab...@csiro.au
 Subject: Re: [R] Why doesn't this work ?
 To: ericst...@aol.com, r-help@r-project.org
 Date: Thursday, 17 March, 2011, 3:54
 It doesn't work (in R) because it is
 not written in R.  It's written in some other language
 that looks a bit like R.
 
  t <- 3
  z <- t %in% 1:3
  z
 [1] TRUE
  t <- 4
  z <- t %in% 1:3
  z
 [1] FALSE
  
 
  
 
 -Original Message-
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org]
 On Behalf Of eric
 Sent: Thursday, 17 March 2011 1:26 PM
 To: r-help@r-project.org
 Subject: [R] Why doesn't this work ?
 
 Why doesn't this work and is there a better way ?
 
 z <- ifelse(t==1 || 2 || 3, 1, 0)
 t <- 3
 z
 [1] 1
 t <- 4
 z
 [1] 1
 
 trying to say ...if t == 1 or if t== 2 or if t ==3 then
 true, otherwise
 false
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Why-doesn-t-this-work-tp3383656p3383656.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org
 mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained,
 reproducible code.
 
 __
 R-help@r-project.org
 mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained,
 reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] calculating the occurrences of distinct observations in the subsets of a dataframe

2011-03-17 Thread Bodnar Laszlo EB_HU
Hello everybody,

I have a data frame in R which is similar to the following. Actually my real 'df'
dataframe is much bigger than this one here, but I really do not want to confuse
anybody, so that is why I try to simplify things as much as possible.

So here's the data frame.

id <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <- c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <- c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <- c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <- c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <- c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)
df <- data.frame(id,a,b,c,d,e)
df

Basically what I would like to do is to get the occurrences of numbers for each 
column (a,b,c,d,e) and for each id group (1,2,3) (for this latter grouping see 
my column 'id').

So, for column 'a' and for id number '1' (for the latter see column 'id') the 
code would be something like this:
as.numeric(table(df[1:10,2]))

The results are:
[1] 3 7

Just to briefly explain my results: in column 'a' (and regarding only those 
records which have number '1' in column 'id') we can say that:
number 1 occured 3 times, and
number 3 occured 7 times.

Again, just to show you another example. For column 'a' and for id number '2' 
(for the latter grouping see again column 'id'):
as.numeric(table(df[11:20,2]))

After running the codes the results are: [1] 4 3 3

Let me explain a little again: in column 'a' and regarding only those 
observations which have number '2' in column 'id') we can say that
number 1 occured 4 times
number 2 occured 3 times and
number 3 occured 3 times.

Last example: for column 'e' and for id number '3' the code would be:
as.numeric(table(df[21:30,6]))

With the results:
[1] 1 4 5

...meaning that number '1' occured once, number '2' occured four times and 
number '3' occured 5 times.

So this is what I would like to do. Calculating the occurrences of numbers for 
each custom-defined subsets (and then collecting these values into a data 
frame). I know it is NOT a difficult task but the PROBLEM is that I'm gonna 
have to change the input 'df' dataframe on a regular basis and hence both the 
overall number of rows and columns might CHANGE over time...

What I have done so far is that I have separated the 'df' dataframe by columns, 
like this:
for (z in (2:ncol(df))) assign(paste("df", z, sep="."), df[,z])

So df.2 will refer to df$a, df.3 will equal df$b, df.4 will equal df$c etc. But 
I'm really stuck now and I don't know how to move forward, you know, getting 
the occurrences for each column and each group of ids.

Do you have any ideas?
Best regards,
Laszlo



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Why doesn't this work ?

2011-03-17 Thread Ted Harding
And, just to make it really clear (I hope!):

Your original expression
  z <- ifelse(t==1 || 2 || 3, 1, 0)
looks like a transcription into R of the words

  If t equals 1 or 2 or 3 then z is 1 else z is 0

However, your t==1 || 2 || 3 has to be parsed in the correct
order according to operator precedence. If you look at '?Syntax'
you will see that operator '==' has precedence over the or
operator '||'. Hence the expression t==1 || 2 || 3 will be
parsed as

  (t==1) || 2 || 3

So, whatever the value of 't' (t==1 may be either TRUE or
FALSE), the result will be either

  TRUE || 2 || 3
or
  FALSE || 2 || 3

When numeric values (like 2 or 3) occur in a logical
expression, they are coerced to a logical TRUE (with the
one exception that a numerical value of 0 is coerced
to FALSE). Hence, whatever the outcome of t==1 the result
will be either

  TRUE || TRUE || TRUE
or
  FALSE || TRUE || TRUE

which is always TRUE. This explains your results. Bill and Phil
have given you an alternative which works: t %in% (1:3).
A way of writing it (though longer) which is closer to your
original wording could be

  ifelse( (t==1) || (t==2) || (t==3), 1, 0)

which really spells out how to force the parsing.
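A quick console check of the parsing described above:

  t <- 4
  t == 1 || 2 || 3                   # TRUE: parsed as (t==1) || TRUE || TRUE
  (t == 1) || (t == 2) || (t == 3)   # FALSE, as intended
  t %in% 1:3                         # FALSE, the compact equivalent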

Hoping this helps,
Ted.

On 17-Mar-11 03:54:32, bill.venab...@csiro.au wrote:
 It doesn't work (in R) because it is not written in R.  It's written in
 some other language that looks a bit like R.
 
 t <- 3
 z <- t %in% 1:3
 z
 [1] TRUE
 t <- 4
 z <- t %in% 1:3
 z
 [1] FALSE
 
 -Original Message-
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org] On Behalf Of eric
 Sent: Thursday, 17 March 2011 1:26 PM
 To: r-help@r-project.org
 Subject: [R] Why doesn't this work ?
 
 Why doesn't this work and is there a better way ?
 
 z <- ifelse(t==1 || 2 || 3, 1, 0)
 t <- 3
 z
 [1] 1
 t <- 4
 z
 [1] 1
 
 trying to say ...if t == 1 or if t== 2 or if t ==3 then true, otherwise
 false
 
 --
 View this message in context:
 http://r.789695.n4.nabble.com/Why-doesn-t-this-work-tp3383656p3383656.ht
 ml
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 17-Mar-11   Time: 10:13:53
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] create data set from selection of rows

2011-03-17 Thread e-letter
On 15/03/2011, Francisco Gochez fjgoc...@googlemail.com wrote:
 Hi,

 What you are after is:

 datasubset <- dataset[dataset[,3] == "text3", ]

Thank you. For the set

text1,23,text2,45
text1,23,text3,78
text1,23,text3,56
text1,23,text2,45

Is it possible to write a function that selects the rows containing
'text3' and applies the function 'sum' to the values '78' and '56'? The
control statements described in the document 'An Introduction to R'
(Venables and Smith, 2010) suggest that the if statement would return
'TRUE', to prevent a sum function being applied to 'TRUE' results.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Retrieve an index of nested lists | Changing name delimiter in 'unlist()'

2011-03-17 Thread Janko Thyson
Dear list,

I have two problems that are connected:

PROBLEM 1
I wonder if it is somehow possible to patch the function
'unlist(use.names=TRUE)' such that you can specify an arbitrary name
delimiter, e.g. "/" or "_". As I often name my variables var.x.y, the
default delimiter makes it hard to distinguish the distinct layers of a
nested list after unlisting.
As 'unlist()' is called via '.Internal()', I didn't have a clue of how to
find the section where the names are pasted.

I'd like to use such a patched version in order to create a customized index
(class: data.frame) of nested lists which should look like this

name         index  is.top  is.bottom  degree
a            1      TRUE    FALSE      1
a/a.1        1-1    FALSE   FALSE      2
a/a.1/a.1.1  1-1-1  FALSE   TRUE       3
a/a.2        1-2    FALSE   FALSE      2
a/a.2/a.2.1  1-2-1  FALSE   TRUE       3
b            2      TRUE    FALSE      1
...

Of course I could get such an index by recursively traversing the list
layers, but this takes much too long. Maybe you also have another idea on
how to create such an index.

PROBLEM 2
I'd also like to retrieve such an index for nested environment structures.
Therefore, I would either need something similar to 'unlist()' that works on
environments, or something that coerces nested environments to nested
lists. Again, I solved this by recursively looping over the respective
environment layers, but it's too inefficient. Is there something out there
that does the job via a C routine somehow?
Thanks for any suggestions,
Janko

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] calculating the occurrences of distinct observations in the subsets of a dataframe

2011-03-17 Thread Tóth Dénes

Hi!

Sorry, I made an error in the previous e-mail.
So try this:
by(df[,-1],df$id,function(x) apply(x,2,tabulate))

This gives you a list. You can rearrange it into a data frame or a 3d
array if you wish.
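If a single data frame is more convenient, a hedged sketch of one way to flatten such counts (parallel to the by() call above, and assuming all values fall in 1:3):

counts <- lapply(split(df[,-1], df$id), function(x) apply(x, 2, tabulate, nbins = 3))
do.call(rbind, lapply(names(counts), function(i)
  data.frame(id = i, value = 1:3, counts[[i]], check.names = FALSE)))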

Regards,
  Denes




 Hello everybody,

 I have a data frame in R which is similar to the following. Actually my real
 'df' dataframe is much bigger than this one here, but I really do not want
 to confuse anybody, so that is why I try to simplify things as much as
 possible.

 So here's the data frame.

 id <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
 a <- c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
 b <- c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
 c <- c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
 d <- c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
 e <- c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)
 df <- data.frame(id,a,b,c,d,e)
 df

 Basically what I would like to do is to get the occurrences of numbers for
 each column (a,b,c,d,e) and for each id group (1,2,3) (for this latter
 grouping see my column 'id').

 So, for column 'a' and for id number '1' (for the latter see column 'id')
 the code would be something like this:
 as.numeric(table(df[1:10,2]))

 The results are:
 [1] 3 7

 Just to briefly explain my results: in column 'a' (and regarding only
 those records which have number '1' in column 'id') we can say that:
 number 1 occured 3 times, and
 number 3 occured 7 times.

 Again, just to show you another example. For column 'a' and for id number
 '2' (for the latter grouping see again column 'id'):
 as.numeric(table(df[11:20,2]))

 After running the codes the results are: [1] 4 3 3

 Let me explain a little again: in column 'a' and regarding only those
 observations which have number '2' in column 'id') we can say that
 number 1 occured 4 times
 number 2 occured 3 times and
 number 3 occured 3 times.

 Last example: for column 'e' and for id number '3' the code would be:
 as.numeric(table(df[21:30,6]))

 With the results:
 [1] 1 4 5

 ...meaning that number '1' occured once, number '2' occured four times and
 number '3' occured 5 times.

 So this is what I would like to do. Calculating the occurrences of numbers
 for each custom-defined subsets (and then collecting these values into a
 data frame). I know it is NOT a difficult task but the PROBLEM is that I'm
 gonna have to change the input 'df' dataframe on a regular basis and hence
 both the overall number of rows and columns might CHANGE over time...

 What I have done so far is that I have separated the 'df' dataframe by
 columns, like this:
 for (z in (2:ncol(df))) assign(paste("df", z, sep="."), df[,z])

 So df.2 will refer to df$a, df.3 will equal df$b, df.4 will equal df$c
 etc. But I'm really stuck now and I don't know how to move forward, you
 know, getting the occurrences for each column and each group of ids.

 Do you have any ideas?
 Best regards,
 Laszlo

 

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] changing the dimensions of a matrix in a real specific way

2011-03-17 Thread Bodnar Laszlo EB_HU
Hi again,

I'd like to ask you a question again.

I have a matrix like this:
a <- matrix(c(1,2,3,4,5,6,7,8,9,10,11,12))
a
a

      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
 [5,]    5
 [6,]    6
 [7,]    7
 [8,]    8
 [9,]    9
[10,]   10
[11,]   11
[12,]   12

Is there a proper way to change the dimensions of this matrix so that I'll get 
this as a result:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12

Thank you very much and have a pleasant day,
Laszlo



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] create data set from selection of rows

2011-03-17 Thread jim holtman
Is this what you want:

> x
     V1 V2    V3 V4
1 text1 23 text2 45
2 text1 23 text3 78
3 text1 23 text3 56
4 text1 23 text2 45
> str(x)
'data.frame':   4 obs. of  4 variables:
 $ V1: Factor w/ 1 level "text1": 1 1 1 1
 $ V2: int  23 23 23 23
 $ V3: Factor w/ 2 levels "text2","text3": 1 2 2 1
 $ V4: int  45 78 56 45
> tapply(x$V4, x$V3, sum)
text2 text3
   90   134
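A hedged base-R alternative (not part of the original reply) that returns the same per-level sums as a data frame:

aggregate(V4 ~ V3, data = x, FUN = sum)   # sums V4 within each level of V3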



On Thu, Mar 17, 2011 at 6:50 AM, e-letter inp...@gmail.com wrote:
 On 15/03/2011, Francisco Gochez fjgoc...@googlemail.com wrote:
 Hi,

 What you are after is:

  datasubset <- dataset[dataset[,3] == "text3", ]

 Thank you. For the set

 text1,23,text2,45
 text1,23,text3,78
 text1,23,text3,56
 text1,23,text2,45

 Is it possible to write a function that selects rows containing
 'text3' and applies the function 'sum' to values '78' and '56'? The
 control statements described in the document 'an introduction to r'
 (Venables and Smith, 2010) suggest that the if statement would return
 'true', to prevent a sum function being applied to 'true' results.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] create data set from selection of rows

2011-03-17 Thread Dennis Murphy
Using the summarise function in package plyr is one way; taking df to be
your data frame with variable names V1-V4,

library(plyr)
summarise(subset(df, V3 == 'text3'), sum = sum(V4))
  sum
1 134

Another is to use the data.table package:

library(data.table)
dt - data.table(df)
dt[V3 == 'text3', sum(V4)]
[1] 134

A third route would be with package sqldf, something like

sqldf(df, "select sum(V4) from df where V3 == 'text3'")

but sqldf is messed up on my system (probably because I didn't install it
properly) so the code is untested.

HTH,
Dennis

On Thu, Mar 17, 2011 at 3:50 AM, e-letter inp...@gmail.com wrote:

 On 15/03/2011, Francisco Gochez fjgoc...@googlemail.com wrote:
  Hi,
 
  What you are after is:
 
   datasubset <- dataset[dataset[,3] == "text3", ]

 Thank you. For the set

 text1,23,text2,45
 text1,23,text3,78
 text1,23,text3,56
 text1,23,text2,45

 Is it possible to write a function that selects rows containing
 'text3' and applies the function 'sum' to values '78' and '56'? The
 control statements described in the document 'an introduction to r'
 (Venables and Smith, 2010) suggest that the if statement would return
 'true', to prevent a sum function being applied to 'true' results.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Extracting columns from a class

2011-03-17 Thread nuncio m
Hi list,
   I am not a frequent user of R.  Recently I used R for a principal
component analysis and got the result as an object, which has information like
the standard deviations and principal components 1 to 10.  How is it
possible to extract the column corresponding to the first principal component
and write it to a file?
The output from the prcomp command is something like this:

Standard deviations:

 [1] 3.325801e+00 7.669837e-01 6.625773e-01 4.990732e-01 3.470071e-01
 [6] 2.946679e-01 2.206289e-01 1.645828e-01 1.570887e-01 4.741294e-16


Rotation:
   PC1   PC2   PC3   PC4  PC5
  [1,] -0.07900624 -0.0824864352  0.1208419434  0.1763425845  0.089545020
  [2,] -0.09114708 -0.0901675110  0.1377608881  0.2224127252  0.076620976
  [3,] -0.10510742 -0.0935434206  0.1113586044  0.2513993555  0.029783117

I want to extract PC1 and the first value of the standard deviations.
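A minimal sketch of one way to do that (the object name p and the use of the built-in USArrests data are stand-ins, not the poster's actual analysis):

p <- prcomp(USArrests, scale. = TRUE)
pc1 <- p$rotation[, "PC1"]   # loadings of the first principal component
sd1 <- p$sdev[1]             # first standard deviation
write.csv(data.frame(PC1 = pc1), "pc1.csv")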

Thanks

-- 
Nuncio.M
Research Scientist
National Center for Antarctic and Ocean research
Head land Sada
Vasco da Gamma
Goa-403804

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] assigning to list element within target environment

2011-03-17 Thread Richard D. Morey
I would like to assign an value to an element of a list contained in an 
environment. The list will contain vectors and matrices. Here's a simple 
example:


# create toy environment
testEnv = new.env(parent = emptyenv())

# create list that will be in the environment, then assign() it
x = list(a=1,b=2)
assign("xList", x, testEnv)

# create new element, to be inserted into xList
c = 5:7

Now, what I'd like to do is something like this:

assign("xList[[3]]", c, testEnv)

But the assign() help says:
assign does not dispatch assignment methods, so it cannot be used to 
set elements of vectors, names, attributes, etc.


So that's no good. Also, this fails:
attach(testEnv)
xList[[3]] - c
detach(testEnv)

because, as the attach() help says:
If you use <- or assign to assign to an attached database, you only 
alter the attached copy, not the original object.


What I've been doing is making a copy of xList in the local environment, 
changing it, then using assign() to make the change in the testEnv 
environment. This is unsatisfactory due to the double use of memory 
making a copy of testEnv (the actual list I will use will be quite large).


Another option would be this:

# Assign the new element to a variable in the target environment
assign("temp", c, testEnv)
# Assign it within the target environment, from the temp variable
eval(parse(text="xList[[3]] = temp"), env=testEnv)
# remove the unneeded variable
rm(temp,envir=testEnv)

But I figure there must be a more elegant way. Anyone have any ideas?

Thanks,
Richard

--
Richard D. Morey
Assistant Professor
Psychometrics and Statistics
Rijksuniversiteit Groningen / University of Groningen
http://drsmorey.org/research/rdmorey

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] changing the dimensions of a matrix in a real specific way

2011-03-17 Thread Tóth Dénes

 t(matrix(a,3,4))

for more complex arrays, see ?aperm


 Hi again,

 I'd like to ask you a question again.

 I have a matrix like this:
 a -matrix(c(1,2,3,4,5,6,7,8,9,10,11,12))
 a

       [,1]
  [1,]    1
  [2,]    2
  [3,]    3
  [4,]    4
  [5,]    5
  [6,]    6
  [7,]    7
  [8,]    8
  [9,]    9
 [10,]   10
 [11,]   11
 [12,]   12

 Is there a proper way to change the dimensions of this matrix so that I'll
 get this as a result:
      [,1] [,2] [,3]
 [1,]    1    2    3
 [2,]    4    5    6
 [3,]    7    8    9
 [4,]   10   11   12

 Thank you very much and have a pleasant day,
 Laszlo

 

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Flexible rbind

2011-03-17 Thread sayan dasgupta
Hi Santosh,

Maybe you are looking at something like this:

merge(d, dNewTests, by = "type", all = TRUE)
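
A base-R alternative (just a sketch) is to add the missing columns to the new
data first, filled with NA (or 0), and then rbind; rbind.fill() in the plyr
package does the same NA-filling automatically:

missing <- setdiff(names(d), names(dNewTests))
dNewTests[missing] <- NA          # or 0, if a numeric default is preferred
rbind(d, dNewTests)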


On Thu, Mar 17, 2011 at 1:03 PM, Santosh Srinivas 
santosh.srini...@gmail.com wrote:

 Dear All,

 I am trying to create a empty structure that I want to fill gradually
 through the code.
 I want to use something like rbind to create the basic structure first.

 I am looking for a possibility to do an rbind where the columns names
 dont match fully (but the missing columns can be defaulted to either
 zero or n/a)  (my actual data has a lot of columns).
 Please see the data frames below

 I have the following data frame that is filled
  dput(d)
 structure(list(type = structure(1:9, .Label = c("test1", "test2",
 "test3", "test4", "test5", "test6", "test7", "test8", "test9"
 ), class = "factor"), meter = c(37.25122438, 58.65845732, 63.47108421,
 94.76085162, 13.75013867, 8.520664878, 79.74623167, 62.16109819,
 20.47806657), degree = c(2.884440333, 74.94067898, 32.64251152,
 83.24820274, 58.36084794, 12.64490368, 4.428796741, 32.12824213,
 97.83710088)), .Names = c("type", "meter", "degree"), class =
 "data.frame", row.names = c(NA,
 -9L))

 To the above data I want to append the following data to create an
 empty structure for the new data
  dput(dNewTests)
 structure(list(type = structure(1:9, .Label = c("test14", "test15",
 "test16", "test17", "test18", "test19", "test20", "test21", "test22"
 ), class = "factor")), .Names = "type", class = "data.frame", row.names =
 c(NA,
 -9L))

 rbind obviously throws the following error:

  rbind(d, dNewTests)
 Error in rbind(deparse.level, ...) :
  numbers of columns of arguments do not match

 Any way that I can default the missing columns to NA or some default
 value like zero?

 Also, I tried to go through the archives and tried merge_all. What am
 I doing wrong wih the code below ?

 require(reshape)
 merge_all(d, dNewTests, by = "type")

 Error in fix.by(by.x, x) :
  'by' must specify column(s) as numbers, names or logical


 Thanks,
 Santosh

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] assigning to list element within target environment

2011-03-17 Thread Hadley Wickham
On Thu, Mar 17, 2011 at 7:25 AM, Richard D. Morey r.d.mo...@rug.nl wrote:
 I would like to assign an value to an element of a list contained in an
 environment. The list will contain vectors and matrices. Here's a simple
 example:

 # create toy environment
 testEnv = new.env(parent = emptyenv())

 # create list that will be in the environment, then assign() it
 x = list(a=1,b=2)
 assign(xList,x,testEnv)

 # create new element, to be inserted into xList
 c = 5:7

 Now, what I'd like to do is something like this:

 assign(xList[[3]],c,testEnv)

testEnv$xList[[3]] <- c

?
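
A self-contained sketch of that idea, using the list name xList from the
original post:

testEnv <- new.env(parent = emptyenv())
assign("xList", list(a = 1, b = 2), testEnv)
testEnv$xList[[3]] <- 5:7            # grows the list held inside testEnv
get("xList", envir = testEnv)        # now has a third element, 5:7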

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Why doesn't this work ?

2011-03-17 Thread Barry Rowlingson
On Thu, Mar 17, 2011 at 3:54 AM,  bill.venab...@csiro.au wrote:
 It doesn't work (in R) because it is not written in R.  It's written in some 
 other language that looks a bit like R.

 It parses in R, so I would say it was written in R.

 To paraphrase Obi-wan, it's just not the R you are looking for.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] changing the dimensions of a matrix in a real specific way

2011-03-17 Thread Sarah Goslee
matrix(a, ncol=3, nrow=4, byrow=TRUE)

or

 dim(a) - c(3,4)
 a - t(a)
 a
 [,1] [,2] [,3]
[1,]123
[2,]456
[3,]789
[4,]   10   11   12

Depending on the context of the problem.

Sarah

On Thu, Mar 17, 2011 at 7:59 AM, Bodnar Laszlo EB_HU
laszlo.bod...@erstebank.hu wrote:
 Hi again,

 I'd like to ask you a question again.

 I have a matrix like this:
 a -matrix(c(1,2,3,4,5,6,7,8,9,10,11,12))
 a

      [,1]
  [1,]    1
  [2,]    2
  [3,]    3
  [4,]    4
  [5,]    5
  [6,]    6
  [7,]    7
  [8,]    8
  [9,]    9
 [10,]   10
 [11,]   11
 [12,]   12

 Is there a proper way to change the dimensions of this matrix so that I'll 
 get this as a result:
     [,1] [,2] [,3]
 [1,]   1    2    3
 [2,]   4    5    6
 [3,]   7    8    9
 [4,]  10   11   12

 Thank you very much and have a pleasant day,
 Laszlo


-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Incorrect degrees of freedom in SEM model using lavaan

2011-03-17 Thread Andrew Miles
I have been trying to use lavaan (version 0.4-7) for a simple path model,
but the program seems to be computing far fewer degrees of freedom for my
model than it should have.  I have 7 variables, which should give (7)(8)/2 =
28 covariances, and hence 28 DF.  The model seems to think I have only 13
DF.  The code to reproduce the problem is below.  Have I done something
wrong, or is this something I should take to the developer?


library(lavaan)

myCov = matrix(c(24.40, 0, 0, 0, 0, 0, 0, .03, .03, 0, 0, 0, 0, 0, 6.75, -
.08, 519.38, 0, 0, 0, 0, .36, .01, 2.74, .18, 0, 0, 0, .51, .0, -.31, .02,
.2, .0, 0, -.17, .0, -1.6, -.04, .01, .25, 0, -.1, .02, -.03, .0, -.01, .01
, .02), nrow=7, byrow=TRUE, dimnames=list(c("Internet", "Stress3",
"HHincome", "Race", "Age", "Gender", "Stress1"), c("Internet", "Stress3",
"HHincome", "Race", "Age", "Gender", "Stress1")))


model = '

Internet ~ HHincome + Race + Age + Gender + Stress1

Stress3 ~ Internet + HHincome + Race + Age + Gender + Stress1

'


fit=sem(model, sample.nobs=161, sample.cov=myCov, mimic.Mplus=FALSE)


#check the number of parameters being estimated

inspect(fit, what = "free")


#Note the DF for the Chi-square is 0, when it should be 28-13 = 15

summary(fit)


Andrew Miles

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Autocorrelation in linear models

2011-03-17 Thread Ben Bolker
Arni Magnusson arnima at hafro.is writes:

 
 I have been reading about autocorrelation in linear models over the last 
 couple of days, and I have to say the more I read, the more confused I 
 get. Beyond confusion lies enlightenment, so I'm tempted to ask R-Help for 
 guidance.
 
 Most authors are mainly worried about autocorrelation in the residuals, 
 but some authors are also worried about autocorrelation within Y and 
 within X vectors before any model is fitted. Would you test for 
 autocorrelation both in the data and in the residuals?

   My immediate reaction is that autocorrelation in the raw data
(marginal autocorrelation) is not relevant. (There are exceptions,
of course -- in many ecological systems the marginal autocorrelation
tells us something about the processes driving the system, so we
may want to quantify/estimate it -- but I wouldn't generally think
that *testing* it (e.g. trying to reject a null hypothesis of
ACF=0) makes sense.)

 
 If we limit our worries to the residuals, it looks like we have a variety 
 of tests for lag=1:
 
stats::cor.test(residuals(fm)[-n], residuals(fm)[-1])
stats::Box.test(residuals(fm))
   lmtest::dwtest(fm, alternative = "two.sided")
   lmtest::bgtest(fm, type = "F")

  Note that (I think) all of these tests are based on
lag-1 autocorrelation only (I see you mention this
later).  Have you looked at nlme:::ACF ?  It is possible
to get non-significant autocorrelation at lag 1 with sig.
autocorrelation at higher lags.

 
 In my model, a simple lm(y~x1+x2) with n=20 annual measurements, I have 
 significant _positive_ autocorrelation within Y and within both X vectors, 
 but _negative_ autocorrelation in the residuals. 

  That's plausible. Again, I think the residual autocorrelation
is what you should worry about.

 The residual 
 autocorrelation is not quite significant, with the p-values
 
0.070
0.064
0.125
0.077
 
 from the tests above. I seem to remember some authors saying that the 
 Durbin-Watson test has less power than some alternative tests, as 
 reflected here. The difference in p-values is substantial,

  ?? I wouldn't necessarily say so -- I would guess you could get this
range of p-values from a single test statistic if you had
multiple simulated data sets from the same underlying model
and parameters ...  Have you tried running such simulations?

 so choosing 
 which test to use could in many cases make a big difference for the 
 subsequent analysis and conclusions. Most of them (cor.test, Box.test, 
 bgtest) can also test lags1. Which test would you recommend? I imagine 
 the basic cor.test is somehow inappropriate for this; the other tests 
 wouldn't have been invented otherwise, right?

  I don't know the details (it's been a while since I did time
series analysis, and it wasn't in this particular vein.)

 The car::dwt(fm) has p-values fluctuating by a factor of 2, unless I run a 
 very long simulation, which results in a p-value similar to 
 lmtest::dwtest, at least in my case.
 
 Finally, one question regarding remedies. If there was significant 
 _positive_ autocorrelation in the residuals, some authors suggest 
 remedying this by deflating the df (fewer effective df in the data) and 
 redo the t-tests of the regression coefficients, rejecting fewer null 
 hypotheses. Does that mean if the residuals are _negatively_ correlated 
 then I should inflate the df (more effective df in the data) and reject 
 more null hypotheses?

   My personal taste is that these df adjustments are a bit cheesy.
Most of the time I would prefer to fit a model that incorporates the
autocorrelation, i.e. nlme::gls(y ~ x1 + x2, correlation = corAR1()) [or pick
another choice of time-series model from ?corClasses].
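
e.g., a sketch along those lines (y, x1, x2 and dat stand for the variables
and data frame described above):

library(nlme)
fit0 <- gls(y ~ x1 + x2, data = dat)                          # independent errors
fit1 <- gls(y ~ x1 + x2, data = dat, correlation = corAR1())  # AR(1) errors
anova(fit0, fit1)                                             # compare the two fits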

  More generally, this whole approach falls into the category of
test for presence of XX; if XX is not statistically significant
then ignore it, which is worrisome (if your test for XX is very
powerful then you will be concerned about dealing with XX even when
its effect on your results would be trivial; if your test for XX
is weak or you have very little data then you won't detect
XX even when it is present).  I would say that if you're really
concerned about autocorrelation you should just automatically use
a modeling approach (see above) that incorporates it.

 
 That's four question marks. I'd greatly appreciate guidance on any of 
 them.
 
 Thanks in advance,
 

  cheers
Ben

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bug in lattice auto.key argument

2011-03-17 Thread Deepayan Sarkar
On Wed, Mar 16, 2011 at 12:52 AM,  jlu...@ria.buffalo.edu wrote:
 The Lattice auto.key argument has a bug in R.12.2.

 R version 2.12.2 (2011-02-25)
 Platform: i386-pc-mingw32/i386 (32-bit)
 
 other attached packages:
 [1] lattice_0.19-17

 loaded via a namespace (and not attached):
 [1] grid_2.12.2

 If I set up my plot parameters as

 require(lattice)
 superpose.line.settings <- trellis.par.get("superpose.line")
 str(superpose.line.settings)
 superpose.line.settings$col <- 1
 superpose.line.settings$lty <- c(1,2,5)     *
 superpose.line.settings$lwd <- 2
 trellis.par.set("superpose.line", superpose.line.settings)

 and then set up my key list as

 my.key <- list(space = "top", points = FALSE, lines = TRUE, columns = 3)

 and then run xyplot with the argument auto.key=my.key,
 I do NOT get the proper legend at the top of the plot.
 Instead of the lines in the legend having the characteristics of
 lty=c(1,2,5), they have the characteristics of lwd=c(1,2,5).

I think Felix has fixed this bug a couple of days ago. Please check
the r-forge version from

https://r-forge.r-project.org/R/?group_id=638

There should be a new release sometime next week.

-Deepayan


 The auto.key argument works fine in R.12.1

 R version 2.12.1 (2010-12-16)
 Platform: i386-pc-mingw32/i386 (32-bit)
 .
 other attached packages:
 [1] lattice_0.19-13

 loaded via a namespace (and not attached):
 [1] grid_2.12.1


 Joe
        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] setting up a R working group

2011-03-17 Thread Joanne Demmler

Dear R users,

we are currently trying to set up an R working group at Swansea University.
I would be very grateful to get some information and feedback from R 
users that have done something similar, in particular regarding:


- help with giving a general overview of what R is capable of
- help with what to present or teach (did you organise your own training 
programme or did you get someone from outside?)

- are there people out there who would like to present their work to us?

Any help will be highly appreciated!
Thanks Joanne

--
Joanne Demmler Ph.D.
Research Assistant
College of Medicine
Swansea University
Singleton Park
Swansea SA2 8PP
UK
tel:+44 (0)1792 295674
fax:+44 (0)1792 513430
email:  j.demm...@swansea.ac.uk
DECIPHer:   www.decipher.uk.net

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Using barplot() with zoo -- names.arg not permitted?

2011-03-17 Thread David Wolfskill
I've used barplot(), including the names.arg parameter, on data frames
successfully, but I'm even newer to using zoo than I am to R.  :-}

I am working on a function that accepts a data frame (df) as its
primary argument, extracts information from it to create a zoo, then
generates a plot based on that.

The data frame has a column called time which is a standard UNIX time
-- the number of seconds since 01 Jan 1970 (UTC), so I treat that as a
POSIXct.  (The types of the other columns in the data frame vary
greatly, and the sequence in which they occur is not expected to be
constant.  Thus, I find it easier to let R make a reasonable guess as to
the data type, and override special cases -- such as time -- on an
individual basis.)

The data frame contains certain columns whose values are counters, so I
generate a zoo of the diff():

  oid = "kern.cp_time_"
  pat = paste("^", oid, sep = "")
  class(df$time) <- "POSIXct"
  df_d <- diff(zoo(df[, grep(pat, names(df))], order.by = df$time))

I then save the start  end timestamps for future reference, and
generate another zoo, this one containing only the percentages from
df_d:

  cpu_states <- c("sys", "intr", "user", "nice", "idle")
  d_range <- range(index(df_d))
  df_pct <- sweep(df_d, 1, apply(df_d, 1, sum), "/")[, paste(oid, 
cpu_states[1:4], sep = "")] * 100

Well, that is, I save the percentages in which I'm interested.

So far, so good.  df_pct is a zoo (as expected), and the content looks
reasonable.

I then plot it, e.g.:

barplot(df_pct, border = NA, col = colvec, space = 0, main = title,
  sub = sub, ylab = "CPU %", ylim = c(0, 100), xlab = "Time")

But the X-axis labels show up as large integers -- the POSIXct values
are apparently treated as numeric quantities for this purpose.

And if I try

barplot(df_pct, border = NA, col = colvec, space = 0, main = title,
  sub = sub, ylab = "CPU %", ylim = c(0, 100), xlab = "Time",
  names.arg = index(df_pct))

I am told:

Warning messages:
1: In plot.window(xlim, ylim, log = log, ...) :
  names is not a graphical parameter
2: In axis(if (horiz) 2 else 1, at = at.l, labels = names.arg, lty = axis.lty,  
:
  names is not a graphical parameter
3: In title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...) :
  names is not a graphical parameter
4: In axis(if (horiz) 1 else 2, cex.axis = cex.axis, ...) :
  names is not a graphical parameter

which I find a bit perplexing.

If I plot via:

barplot(df_pct, border = NA, col = col_vec, space = 0, main = title,
  sub = sub, ylab = "CPU %", ylim = c(0, 100), xlab = "Time",
  axisnames = FALSE)

that works OK, but then I'm unable to generate a human-readable set
of X-axis labels: At best, I'm able to get a single label at the origin.

Here's an excerpt from the generated df_d:
Browse[2] df_d[1:10]
   kern.cp_time_idle kern.cp_time_intr kern.cp_time_nice
1297278017   275 0 0
1297278018   231 0 0
1297278019   266 1 0
1297278020   230 2 0
1297278021   191 0 0
1297278022   114 0 0
129727802330 0 0
1297278024 0 0 0
1297278025 0 0 0
1297278026 0 0 0
           kern.cp_time_sys kern.cp_time_user
1297278017                8                 1
1297278018               17                35
1297278019               10                 6
1297278020               28                24
1297278021               21                72
1297278022               27               147
1297278023               43               219
1297278024               47               249
1297278025               41               259
1297278026                8                15


Here's a excerpt from the generated df_pct:

Browse[2] df_pct[1:10]
           kern.cp_time_sys kern.cp_time_intr kern.cp_time_user
1297278017         2.816901         0.000         0.3521127
1297278018         6.007067         0.000        12.3674912
1297278019         3.533569         0.3533569         2.1201413
1297278020         9.859155         0.7042254         8.4507042
1297278021         7.394366         0.000        25.3521127
1297278022         9.375000         0.000        51.0416667
1297278023        14.726027         0.000        75.000
1297278024        15.878378         0.000        84.1216216
1297278025        13.67         0.000        86.333
1297278026        34.782609         0.000        65.2173913
   kern.cp_time_nice
1297278017 0
1297278018 0
1297278019 0
1297278020 0
1297278021 0
1297278022 0
1297278023 0
1297278024  

Re: [R] Help- Fitting a Thin Plate Spline

2011-03-17 Thread Simon Wood

library(mgcv)

b - gam(arcsine.success ~ s(date.num,clutch.size,k=50))
vis.gam(b,theta=30)

will fit a thin plate spline and plot it. k is an upper limit on the 
number of degrees of freedom for the TPS, but the actual degrees of 
freedom are chosen automatically (use the 'method' argument of gam to 
choose exactly how). Increase k if 50 is overly restrictive here.


gam will also let you fit binary/binomial data directly, so you could 
probably do away with the arcsine transformation (if I'm guessing right 
about what you are doing here).
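
e.g. something like this (a sketch; 'successes', 'failures' and 'nestdata' are
made-up names for the raw counts behind the proportion):

b2 <- gam(cbind(successes, failures) ~ s(date.num, clutch.size, k = 50),
          family = binomial, data = nestdata)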


There are also several other libraries that will let you fit thin plate 
splines: gss, for example.


Simon




On 14/03/11 09:55, kehoe wrote:

Hi Everyone,

I'm a pretty useless r-er but have data that SPSS etc doesn't like. I've
managed to do GLMs for my data, but now need to fit a thin plate spline for
my data (arcsine.success~date.num:clutch.size)
If anyone has a bit of spare time and could come up with a bit of code I'd
be very grateful- I just don't get R language!

Thanks

Rach

--
View this message in context: 
http://r.789695.n4.nabble.com/Help-Fitting-a-Thin-Plate-Spline-tp3353485p3353485.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
---
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603   http://people.bath.ac.uk/sw283

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to make sure R's g++ compiler uses certain C++ flags when making a package

2011-03-17 Thread Mike Marchywka




 Date: Wed, 16 Mar 2011 13:50:37 -0700
 From: solomon.mess...@gmail.com
 To: r-help@r-project.org
 Subject: Re: [R] How to make sure R's g++ compiler uses certain C++ flags 
 when making a package

 Looks like the problem may be that R is automatically passing this flag to
 the g++ compiler: -arch x86_64 which appears to be causing trouble for
 opencv. Does anyone know how to suppress this flag?

Are you building a 32 or 64 bit app? I haven't looked but offhand that may be
what this flag is for ( you can man g++ and check I suppose ) 

I was going to reply after doing some checking as I have never actually made
a complete R package but I did make a test one and verified I could call
the c++ code from R. I seem to remember that it was easy to just invoke g++
explicitly and R would use whatever binaries you happen to have left there.
That is, if you just compile the code for your R modules yourself there
is no real reason to worry about how R does it. Mixing some flags will
prevent linking or, worse, create mysterious run time errors.

Also, IIRC, your parameters were the output of foreign scripts ( opencv 
apparently)
and not too helpful here ( and in any case you should echo these to see what 
they
are if they are an issue). 





 --

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lmm WITHOUT random factor (lme4)

2011-03-17 Thread Mark Difford
On Mar 17, 2011; 11:43am Baugh wrote:

 Question: can I simply substitute a dummy var (e.g. populated by zeros)
 for ID to run the model 
 without the random factor? when I try this R returns values that seem
 reasonable, but I want to be sure 
 this is appropriate.

If you can fit the model using lme (and it looks like you easily can) then
another check would be:

## Compare models with and without random effects
fm - lm(Reaction ~ Days, sleepstudy)
fm1 - lme(Reaction ~ Days, random= ~1|Subject, sleepstudy)
anova(fm1, fm) ## lme-fitted model must come first

Regards, Mark.

--
View this message in context: 
http://r.789695.n4.nabble.com/lmm-WITHOUT-random-factor-lme4-tp3384054p3384072.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange R squared, possible error

2011-03-17 Thread derek
Excuse me, I don't claim R^2 can't be negative. What I say is that if I get a
negative R^2 then the data are useless.
I know that what Thomas said is true in the general case. But in my special
case of data, using a nonzero intercept is nonsense, and getting R^2 less than
0.985 is considered a poor job (the standard is R^2 > 0.995). (Here R^2 is
given by R^2 = 1 - Sum(R[i]^2) / Sum((y[i])^2).)

Because lm() uses two different formulas for computing R^2,
it is confusing to get an R^2 closer to 1 when the linear model with zero
intercept y = a*x (a = slope) is used than with the model y = a*x + b
(a = slope, b = nonzero intercept).

I think R^2 is only a measure of goodness of fit for least-squares optimization
and it doesn't matter which formula is used (R^2 = 1 - Sum(R[i]^2) / Sum((y[i])^2)
or R^2 = 1 - Sum(R[i]^2) / Sum((y[i] - y*)^2), with y* the mean of y), but using
both is confusing.

So I would like to know why two different formulas for R^2 are used? 
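
(For concreteness, a small sketch with made-up data showing where the two
denominators come from in summary.lm():)

set.seed(1)
x <- 1:10
y <- 2 * x + rnorm(10)
f0 <- lm(y ~ x + 0)    # zero-intercept model
f1 <- lm(y ~ x)        # model with intercept
summary(f0)$r.squared
1 - sum(resid(f0)^2) / sum(y^2)              # same value: uncentred denominator
summary(f1)$r.squared
1 - sum(resid(f1)^2) / sum((y - mean(y))^2)  # same value: centred denominator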



--
View this message in context: 
http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3383992.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] lmm WITHOUT random factor (lme4)

2011-03-17 Thread Baugh
LMM without Random effect:

I want to run an LMM both with and without the random factor (ID). And then
extract the log-lik values from the two models in order to generate a
p-value.

with random factor as:lmer(y~x+(1|ID),data)

Question: can I simply substitute a dummy var (e.g. populated by zeros) for
ID to run the model without the random factor? when I try this R returns
values that seem reasonable, but I want to be sure this is appropriate.

From other inquiries in the forum it seems that substituting (ID-1) or some
variant like that, does not work.

Thanks for your thoughts. 
Baugh


--
View this message in context: 
http://r.789695.n4.nabble.com/lmm-WITHOUT-random-factor-lme4-tp3384054p3384054.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange R squared, possible error

2011-03-17 Thread Hadley Wickham
 2) I don't want to fit data with linear model of zero intercept.
 3) I dont know if I understand correctly. Im 100% sure the model for my data
 should have zero intercept.
 The only coordinate which Im 100% sure is correct. If I had measured quality
 Y of a same sample X0 number of times I would get E(Y(X0))=0.

Are points 2) and 3) not contradictory?

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cv.lm syntax error

2011-03-17 Thread agent dunham
I rewrote my previous command, and CVlm works now. The problem was related to
having the same names in the df as in the formula passed to CVlm. 

Now I'd like to know if the seed is of any special importance, or whether I can
just type seed=29 as in the example. 
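
For what it's worth, a sketch using the package's own example data: the seed
only fixes the random assignment of rows to the m folds, so any value gives a
valid cross-validation, and re-using the same seed just makes the run
reproducible.

library(DAAG)
CVlm(df = houseprices, form.lm = formula(sale.price ~ area), m = 3, seed = 29)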

Thanks in advance, u...@host.com

--
View this message in context: 
http://r.789695.n4.nabble.com/cv-lm-syntax-error-tp3334889p3384198.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Segmentation fault when using plot function

2011-03-17 Thread cchace

I frequently get a segmentation fault error when using the plot command.
It happens about half the time. 

We are running an old version of R (R version 2.8.0 (2008-10-20) on Linux.

I did a quick search for this problem and didn't find anything. Apologies if
I missed it.


 *** Process received signal ***
 Signal: Segmentation fault (11)
 Signal code: Address not mapped (1)
 Failing at address: 0x231e18f22
 [ 0] /lib64/libc.so.6 [0x3c84a300a0]
 [ 1] /lib64/libc.so.6(strlen+0x10) [0x3c84a76090]
 [ 2] /usr/lib64/libX11.so.6 [0x3c8765d101]
 [ 3] /usr/lib64/libX11.so.6(_XlcCreateLC+0x46) [0x3c8765e5d6]
 [ 4] /usr/lib64/libX11.so.6(_XlcUtf8Loader+0x10) [0x3c8767ebe0]
 [ 5] /usr/lib64/libX11.so.6(_XOpenLC+0x104) [0x3c87665674]
 [ 6] /usr/lib64/libX11.so.6(_XlcCurrentLC+0x8) [0x3c87665768]
 [ 7] /usr/lib64/libX11.so.6(XSupportsLocale+0x9) [0x3c87665b59]
 [ 8] /usr/lib64/R/modules//R_X11.so(X11_Open+0x51) [0x2aaab8a32581]
 [ 9] /usr/lib64/R/modules//R_X11.so(X11DeviceDriver+0x19a) [0x2aaab8a336aa]
 [10] /usr/lib64/R/modules//R_X11.so [0x2aaab8a33c30]
 [11] /usr/lib64/R/lib/libR.so [0x3659cff75e]
 [12] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7]
 [13] /usr/lib64/R/lib/libR.so [0x3659ccd6a2]
 [14] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7]
 [15] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x291) [0x3659ccfa41]
 [16] /usr/lib64/R/lib/libR.so(Rf_eval+0x2fb) [0x3659ccc1ab]
 [17] /usr/lib64/R/lib/libR.so(GEcurrentDevice+0x158) [0x3659c95a88]
 [18] /usr/lib64/R/lib/libR.so [0x3659d25c99]
 [19] /usr/lib64/R/lib/libR.so [0x3659cff75e]
 [20] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7]
 [21] /usr/lib64/R/lib/libR.so [0x3659ccd6a2]
 [22] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7]
 [23] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x291) [0x3659ccfa41]
 [24] /usr/lib64/R/lib/libR.so(Rf_eval+0x2fb) [0x3659ccc1ab]
 [25] /usr/lib64/R/lib/libR.so [0x3659ccd6a2]
 [26] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7]
 [27] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x291) [0x3659ccfa41]
 [28] /usr/lib64/R/lib/libR.so(Rf_usemethod+0x3fa) [0x3659d019da]
 [29] /usr/lib64/R/lib/libR.so [0x3659d0203d]
 *** End of error message ***

Process R segmentation fault at Thu Mar 17 09:33:59 2011

--
View this message in context: 
http://r.789695.n4.nabble.com/Segmentation-fault-when-using-plot-function-tp3384603p3384603.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] setting up a R working group

2011-03-17 Thread Rainer M Krug
On Thu, Mar 17, 2011 at 2:42 PM, Joanne Demmler j.demm...@swansea.ac.uk wrote:
 Dear R users,

Hi


 we are currently trying to set up a R working group at Swansea University.
 I would be very grateful to get some information and feedback from R users
 that have done something similar, in particular regarding:

We started one, but it fell apart after I left the University - first
lesson: you need a critical mass and at least two people who are
prepared to drive it.


 - help with giving a general overview of what R is capable of
 - help with what to present or teach (did you organise your own training
 programme or did you get someone from outside?)

We managed by organising it internally, but there were topics for
which external lecturers were very useful - it depends on what expertise
your R user base has. It is always better to do the training
internally, as then there is always the possibility of going back to the
person who taught the course.

 - are there people out there who would like to present their work to us?

I guess that depends on whether you are able / willing / at least
prepared to provide travel expenses, and on what you expect.


 Any help will be highly appreciated!

Welcome - and please feel free to ask more detailed questions. By the
way - there is also a sig R-teach which might also be of help.

Cheers,

Rainer

 Thanks Joanne

 --
 Joanne Demmler Ph.D.
 Research Assistant
 College of Medicine
 Swansea University
 Singleton Park
 Swansea SA2 8PP
 UK
 tel:            +44 (0)1792 295674
 fax:            +44 (0)1792 513430
 email:          j.demm...@swansea.ac.uk
 DECIPHer:       www.decipher.uk.net

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
NEW GERMAN FAX NUMBER!!!

Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa

Cell:           +27 - (0)83 9479 042
Fax:            +27 - (0)86 516 2782
Fax:            +49 - (0)321 2125 2244
email:          rai...@krugs.de

Skype:          RMkrug
Google:         r.m.k...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using barplot() with zoo -- names.arg not permitted?

2011-03-17 Thread Gabor Grothendieck
On Thu, Mar 17, 2011 at 9:38 AM, David Wolfskill da...@catwhisker.org wrote:
 I've used barplot(), including the anmes.arg parameter, on data frames
 successfully, but I'm even newer to using zoo than I am to R.  :-}

 I am working on a functon that accepts a data frame (df) as its
 primary argument, extracts information from it to create a zoo, then
 generates a plot based on that.

 The data frame has a column called time which is a standard UNIX time
 -- the number of seconds since 01 Jan 1970 (UTC), so I treat that as a
 POSIXct.  (The types of the other columns in the data frame varies
 greatly, and the sequence in which they occur is not expected to be
 constant.  Thus, I find it easier to let R make a reasonable guess as to
 the data type, and override special cases -- such as time -- on an
 individual basis.)

 The data frame contains certain columns whose values are counters, so I
 generate a zoo of the diff():

  oid = kern.cp_time_
  pat = paste(^, oid, sep = )
  class(df$time) - POSIXct
  df_d - diff(zoo(df[, grep(pat, names(df))], order.by = df$time))

 I then save the start  end timestamps for future reference, and
 generate another zoo, this one containing only the percentages from
 df_d:

  cpu_states - c(sys, intr, user, nice, idle)
  d_range - range(index(df_d))
  df_pct - sweep(df_d, 1, apply(df_d, 1, sum), /)[, paste(oid, 
 cpu_states[1:4], sep = )] * 100

 Well, that is, I save the percentages in which I'm interested.

 So far, so good.  df_pct is a zoo (as expected, and the content looks
 reasonable.

 I then plot it, e.g.:

 barplot(df_pct, border = NA, col = colvec, space = 0, main = title,
  sub = sub, ylab = CPU %, ylim = c(0, 100), xlab = Time)

 But the X-axis labels show up as large integers -- the POSIXct values
 are apparently treated as numeric quantities for this purpose.

 And if I try

 barplot(df_pct, border = NA, col = colvec, space = 0, main = title,
  sub = sub, ylab = CPU %, ylim = c(0, 100), xlab = Time,
  names.arg = index(df_pct))

 I am told:

 Warning messages:
 1: In plot.window(xlim, ylim, log = log, ...) :
  names is not a graphical parameter
 2: In axis(if (horiz) 2 else 1, at = at.l, labels = names.arg, lty = 
 axis.lty,  :
  names is not a graphical parameter
 3: In title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...) :
  names is not a graphical parameter
 4: In axis(if (horiz) 1 else 2, cex.axis = cex.axis, ...) :
  names is not a graphical parameter

 which I find a bit perplexing.

 If I plot via:

 barplot(df_pct, border = NA, col = col_vec, space = 0, main = title,
  sub = sub, ylab = CPU %, ylim = c(0, 100), x lab = Time,
  axisnames = FALSE)

 that works OK, but then I'm unable to generate a human-readable set
 of X-axis labels: At best, I'm able to get a single label at the origin.

 Here's an excerpt from the generated df_d:
 Browse[2] df_d[1:10]
           kern.cp_time_idle kern.cp_time_intr kern.cp_time_nice
 1297278017               275                 0                 0
 1297278018               231                 0                 0
 1297278019               266                 1                 0
 1297278020               230                 2                 0
 1297278021               191                 0                 0
 1297278022               114                 0                 0
 1297278023                30                 0                 0
 1297278024                 0                 0                 0
 1297278025                 0                 0                 0
 1297278026                 0                 0                 0
           kern.cp_time_sys kern.cp_time_user
 1297278017                8                 1
 1297278018               17                35
 1297278019               10                 6
 1297278020               28                24
 1297278021               21                72
 1297278022               27               147
 1297278023               43               219
 1297278024               47               249
 1297278025               41               259
 1297278026                8                15


 Here's a excerpt from the generated df_pct:

 Browse[2] df_pct[1:10]
           kern.cp_time_sys kern.cp_time_intr kern.cp_time_user
 1297278017         2.816901         0.000         0.3521127
 1297278018         6.007067         0.000        12.3674912
 1297278019         3.533569         0.3533569         2.1201413
 1297278020         9.859155         0.7042254         8.4507042
 1297278021         7.394366         0.000        25.3521127
 1297278022         9.375000         0.000        51.0416667
 1297278023        14.726027         0.000        75.000
 1297278024        15.878378         0.000        84.1216216
 1297278025        13.67         0.000        86.333
 1297278026        34.782609         0.000        65.2173913
           kern.cp_time_nice
 1297278017                 0
 1297278018                 0
 1297278019      

Re: [R] lmm WITHOUT random factor (lme4)

2011-03-17 Thread ONKELINX, Thierry
Dear Mark,

You cannot compare lm() with lme() because the likelihoods are not the same. 
Use gls() instead of lm()

library(nlme)
data(sleepstudy, package = "lme4")
fm - lm(Reaction ~ Days, sleepstudy)
fm0 - gls(Reaction ~ Days, sleepstudy)
logLik(fm)
logLik(fm0)

fm1 - lme(Reaction ~ Days, random= ~1|Subject, sleepstudy)
anova(fm0, fm1)

Best regards,

Thierry


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie  Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics  Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey
  

 -Oorspronkelijk bericht-
 Van: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] Namens Mark Difford
 Verzonden: donderdag 17 maart 2011 10:55
 Aan: r-help@r-project.org
 Onderwerp: Re: [R] lmm WITHOUT random factor (lme4)
 
 On Mar 17, 2011; 11:43am Baugh wrote:
 
  Question: can I simply substitute a dummy var (e.g. populated by 
  zeros) for ID to run the model without the random factor? when I 
  try this R returns values that seem reasonable, but I want 
 to be sure 
  this is appropriate.
 
 If you can fit the model using lme (and it looks like you 
 easily can) then another check would be:
 
 ## Compare models with and without random effects fm - 
 lm(Reaction ~ Days, sleepstudy)
 fm1 - lme(Reaction ~ Days, random= ~1|Subject, sleepstudy)
 anova(fm1, fm) ## lme-fitted model must come first
 
 Regards, Mark.
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/lmm-WITHOUT-random-factor-lme4-t
 p3384054p3384072.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange R squared, possible error

2011-03-17 Thread Peter Ehlers

On 2011-03-17 02:08, derek wrote:

Exuse me, I don't claim R^2 can't be negative. What I say if I get R^2
negative then the data are useless.
I know, that what Thomas said is true in general case. But in my special
case of data, using nonzero intercept is nonsense, and to get R^2 less than
0.985 is considered poor job (standard R^20.995). (R^2 given by R^2 = 1 -
Sum(R[i]^2) / Sum((y[i])^2) )

Because lm() uses two differrent formulas for computing R^2,
it is confusing to get R^2 closer to 1 when linear model with zero intercept
y=a*x (a = slope) is used, rather than in case with model y=a*x+b (a=slope,
b= nonzero intercept).

I think R^2 is only measure of good fit for least squares optimization and
it doesn't matter which formula is used: (R^2 = 1 - Sum(R[i]^2) /
Sum((y[i])^2) or R^2 = 1 - Sum(R[i]^2) / Sum((y[i])^2-y*)), but using both
is confusing.

So I would like to know why two different formulas for R^2 are used?



The beauty of R is that you can study the code and if you
don't like it, you can just write your own. I doubt that
anyone here would be offended if you used whatever definition
of R^2 you like best.
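
(For instance, a one-off function for the through-the-origin definition -- a
sketch:)

r2.origin <- function(fit) {
  y <- model.response(model.frame(fit))
  1 - sum(residuals(fit)^2) / sum(y^2)
}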

Thomas has explained clearly what R does and why. If you
find R's usage confusing then your understanding of
regression is lacking the depth required to apply it.
You would probably benefit considerably from consulting
a text or two on regression.

Peter Ehlers




--
View this message in context: 
http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3383992.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Segmentation fault when using plot function

2011-03-17 Thread Henrik Bengtsson
On Thu, Mar 17, 2011 at 6:58 AM, cchace cch...@wellington.com wrote:

 I frequently get a segmentation fault error when using the plot command.
 It happens about half the time.

 We are running an old version of R (R version 2.8.0 (2008-10-20) on Linux.

 I did a quick search for this problem and didn't find anything. Apologies if
 I missed it.

Eventually some kind person with lots of extra time on his/her hand
will install R v2.8.0 on their machine, try to reproduce the problem,
troubleshoot it, fix the bug, and provide you with a customized
retroactive patched R v2.8.0 that you could install.  While waiting
for that to happen, you may want to try out R v2.12.2.

/Henrik



  *** Process received signal ***
  Signal: Segmentation fault (11)
  Signal code: Address not mapped (1)
  Failing at address: 0x231e18f22
  [ 0] /lib64/libc.so.6 [0x3c84a300a0]
  [ 1] /lib64/libc.so.6(strlen+0x10) [0x3c84a76090]
  [ 2] /usr/lib64/libX11.so.6 [0x3c8765d101]
  [ 3] /usr/lib64/libX11.so.6(_XlcCreateLC+0x46) [0x3c8765e5d6]
  [ 4] /usr/lib64/libX11.so.6(_XlcUtf8Loader+0x10) [0x3c8767ebe0]
  [ 5] /usr/lib64/libX11.so.6(_XOpenLC+0x104) [0x3c87665674]
  [ 6] /usr/lib64/libX11.so.6(_XlcCurrentLC+0x8) [0x3c87665768]
  [ 7] /usr/lib64/libX11.so.6(XSupportsLocale+0x9) [0x3c87665b59]
  [ 8] /usr/lib64/R/modules//R_X11.so(X11_Open+0x51) [0x2aaab8a32581]
  [ 9] /usr/lib64/R/modules//R_X11.so(X11DeviceDriver+0x19a) [0x2aaab8a336aa]
  [10] /usr/lib64/R/modules//R_X11.so [0x2aaab8a33c30]
  [11] /usr/lib64/R/lib/libR.so [0x3659cff75e]
  [12] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7]
  [13] /usr/lib64/R/lib/libR.so [0x3659ccd6a2]
  [14] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7]
  [15] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x291) [0x3659ccfa41]
  [16] /usr/lib64/R/lib/libR.so(Rf_eval+0x2fb) [0x3659ccc1ab]
  [17] /usr/lib64/R/lib/libR.so(GEcurrentDevice+0x158) [0x3659c95a88]
  [18] /usr/lib64/R/lib/libR.so [0x3659d25c99]
  [19] /usr/lib64/R/lib/libR.so [0x3659cff75e]
  [20] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7]
  [21] /usr/lib64/R/lib/libR.so [0x3659ccd6a2]
  [22] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7]
  [23] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x291) [0x3659ccfa41]
  [24] /usr/lib64/R/lib/libR.so(Rf_eval+0x2fb) [0x3659ccc1ab]
  [25] /usr/lib64/R/lib/libR.so [0x3659ccd6a2]
  [26] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7]
  [27] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x291) [0x3659ccfa41]
  [28] /usr/lib64/R/lib/libR.so(Rf_usemethod+0x3fa) [0x3659d019da]
  [29] /usr/lib64/R/lib/libR.so [0x3659d0203d]
  *** End of error message ***

 Process R segmentation fault at Thu Mar 17 09:33:59 2011

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Segmentation-fault-when-using-plot-function-tp3384603p3384603.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] possible problem with endpoints?

2011-03-17 Thread Joshua Ulrich
Hi Erin,

On Thu, Mar 17, 2011 at 1:27 AM, Erin Hodgess erinm.hodg...@gmail.com wrote:
 Dear R People:

 Hello again!

 I found something unusual in the behavior of the endpoints function
 from the xts package:

 x1 - ts(1:24,start=2008,freq=12)
 dat - seq(as.Date(2008/01/01),length=24,by=months)
 library(zoo);library(xts)
 x2 - zoo(1:24,order=dat)
 #Here is the surprise:
 endpoints(as.xts(x1),'quarters')
  [1]  0  1  4  7 10 13 16 19 22 24
 endpoints(as.xts(x2),'quarters')
 [1]  0  3  6  9 12 15 18 21 24


 Shouldn't the endpoints be the same, please?  What am I doing wrong, please?

Looks like a timezone issue.  I'll investigate further.

 endpoints(as.xts(x1, tz = "GMT"), 'quarters')
[1]  0  3  6  9 12 15 18 21 24

 Thanks,
 Erin


 --
 Erin Hodgess
 Associate Professor
 Department of Computer and Mathematical Sciences
 University of Houston - Downtown
 mailto: erinm.hodg...@gmail.com

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


--
Joshua Ulrich  |  FOSS Trading: www.fosstrading.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R² for non-linear model

2011-03-17 Thread Kjetil Halvorsen
see inline.

On Thu, Mar 17, 2011 at 4:58 AM, Rubén Roa r...@azti.es wrote:
 Hi Alexx,

 I don't see any problem in comparing models based on different distributions 
 for the same data using the AIC, as long as they have a different number of 
 parameters and all the constants are included.
 For example, you can compare distribution mixture models with different 
 number of components using the AIC.
 This is one example:
 Roa-Ureta. 2010. A Likelihood-Based Model of Fish Growth With Multiple Length 
 Frequency Data. Journal of Biological, Agricultural and Environmental 
 Statistics 15:416-429.
 Here is another example:
 www.education.umd.edu/EDMS/fac/Dayton/PCIC_JMASM.pdf
 Prof. Dayton writes above that one advantage of AIC over hypothesis testing 
 is:
 (d) Considerations related to underlying distributions for random
 variables can be incorporated into the decision-making process rather than
 being treated as an assumption whose robustness must be considered (e.g.,
 models based on normal densities and on log-normal densities can be compared).

My  reading of this is that AIC can be used to compare models with
densities relative to the same
dominating measure.
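
(A small illustration of such a comparison -- a sketch with simulated data;
both fits are densities with respect to the same dominating (Lebesgue) measure
on the positive reals, so their AICs are directly comparable:)

library(MASS)
set.seed(123)
x <- rlnorm(200, meanlog = 0, sdlog = 0.5)
AIC(fitdistr(x, "normal"))       # normal fit to the same data
AIC(fitdistr(x, "lognormal"))    # log-normal fit; expected to win here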

Kjetil

 Last, if you read Akaike's theorem you will see there is nothing precluding 
 comparing models built on different distributional models. Here it is:
  the expected (over the sample space and the space of parameter estimates) 
 maximum log-likelihood of some data on a working model overshoots the 
 expected (over the sample space only) maximum log-likelihood of the data 
 under the true model that
 generated the data by exactly the number of  parameters in the working model.
 A remarkable result.

 Rubén

 -Original Message-
 From: r-help-boun...@r-project.org on behalf of Alexx Hardt
 Sent: Wed 3/16/2011 7:42 PM
 To: r-help@r-project.org
 Subject: Re: [R] R² for non-linear model

 Am 16.03.2011 19:34, schrieb Anna Gretschel:
 Am 16.03.2011 19:21, schrieb Alexx Hardt:
 And to be on-topic: Anna, as far as I know anova's are only useful to
 compare a submodel (e.g. with one less regressor) to another model.

 thanks! i don't get it either what they mean by fortune...

 It's an R-package (and a pdf [1]) with collected quotes from the mailing
 list.
 Be careful with the suggestion to use AIC. If you wanted to compare two
 models using AICs, you need the same distribution (that is, the same
 distributional assumption) in both models.
 To my knowledge, there is no way to compare a gaussian model to an
 exponential one (except common sense), but my knowledge is very limited.

 [1] http://cran.r-project.org/web/packages/fortunes/vignettes/fortunes.pdf

 --
 alexx@alexx-fett:~$ vi .emacs

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Incorrect degrees of freedom in SEM model using lavaan

2011-03-17 Thread Gustavo Carvalho
Your model is saturated.

I think lavaan calculates the number of degrees of freedom this way:

DF = n*(n + 1)/2 - t - n.fix*(n.fix + 1)/2

n = number of variables
t = number of free parameters
n.fix = number of fixed exogenous variables

So, if you fix the exogenous variables, as in mimic = Mplus:

DF = 28 - 13 - 15 = 0

If you don't, as in mimic = EQS:

DF = 28 - 28 - 0 = 0

You are probably not considering the variances and covariances of the
exogenous variables to arrive at 15 degrees of freedom:

model = '

Internet + Stress3 ~ HHincome + Race + Age + Gender + Stress1

Stress3 ~ Internet

HHincome ~~ 0*Race + 0*Age + 0*Gender + 0*Stress1
Race ~~  0*Age + 0*Gender + 0*Stress1
Age ~~ 0*Gender + 0*Stress1
Gender ~~  0*Stress1

HHincome ~~ 1*HHincome
Race ~~ 1*Race
Age ~~ 1*Age
Gender ~~ 1*Gender
Stress1 ~~ 1*Stress1
'

fit = lavaan(model, sample.nobs=161, sample.cov=myCov, int.ov.free =
T, int.lv.free = F, auto.fix.first = T, auto.fix.single = T,
auto.resid.var = T, auto.cov.lv.x = T, auto.cov.y = T,
mimic = "EQS")
summary(fit)

Cheers,

Gustavo.



On Thu, Mar 17, 2011 at 9:50 AM, Andrew Miles rstuff.mi...@gmail.com wrote:
 I have been trying to use lavaan (version 0.4-7) for a simple path model,
 but the program seems to be computing far less degrees of freedom for my
 model then it should have.  I have 7 variables, which should give (7)(8)/2 =
 28 covariances, and hence 28 DF.  The model seems to only think I have 13
 DF.  The code to reproduce the problem is below.  Have I done something
 wrong, or is this something I should take to the developer?


 library(lavaan)

 myCov = matrix(c(24.40, 0, 0, 0, 0, 0, 0, .03, .03, 0, 0, 0, 0, 0, 6.75, -
 .08, 519.38, 0, 0, 0, 0, .36, .01, 2.74, .18, 0, 0, 0, .51, .0, -.31, .02,
 .2, .0, 0, -.17, .0, -1.6, -.04, .01, .25, 0, -.1, .02, -.03, .0, -.01, .01
 , .02), nrow=7, byrow=TRUE, dimnames=list(c(Internet, Stress3,
 HHincome, Race, Age, Gender, Stress1), c(Internet, Stress3,
 HHincome, Race, Age, Gender, Stress1)))


 model = '

 Internet ~ HHincome + Race + Age + Gender + Stress1

 Stress3 ~ Internet + HHincome + Race + Age + Gender + Stress1

 '


 fit=sem(model, sample.nobs=161, sample.cov=myCov, mimic.Mplus=FALSE)


 #check the number of parameters being estimated

 inspect(fit, what=free)


 #Note the DF for the Chi-square is 0, when it should be 28-13 = 15

 summary(fit)


 Andrew Miles

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Incorrect degrees of freedom in SEM model using lavaan

2011-03-17 Thread Mike Cheung
Dear Andrew,

The reported df in lavaan is 0 which is correct. It is because this
path model is saturated. 28 is not the df, it is the no. of pieces
of information. The no. of parameter estimates is also 28. Thus, the
df is 0.

However, you are correct that there are only 13, not 28, free
parameters reported by lavaan. It is because fixed.x=TRUE was
implicitly used in your example (see ?sem). Since HHincome + Race +
Age + Gender + Stress1 are predictors, they are fixed at their sample
values when fixed.x=TRUE was used. And their covariance elements are
not involved in calculating the df.

If you want to mirror the behaviors in other SEM packages, you may try:

fit2 <- sem(model, sample.nobs=161, sample.cov=myCov, fixed.x=FALSE)
inspect(fit2, what=free)
summary(fit2)

Hope it helps.
Mike
-- 
-
 Mike W.L. Cheung               Phone: (65) 6516-3702
 Department of Psychology       Fax:   (65) 6773-1843
 National University of Singapore
 http://courses.nus.edu.sg/course/psycwlm/internet/
-


On Thu, Mar 17, 2011 at 8:50 PM, Andrew Miles rstuff.mi...@gmail.com wrote:
 I have been trying to use lavaan (version 0.4-7) for a simple path model,
 but the program seems to be computing far fewer degrees of freedom for my
 model than it should have.  I have 7 variables, which should give (7)(8)/2 =
 28 covariances, and hence 28 DF.  The model seems to only think I have 13
 DF.  The code to reproduce the problem is below.  Have I done something
 wrong, or is this something I should take to the developer?


 library(lavaan)

 myCov = matrix(c(24.40, 0, 0, 0, 0, 0, 0, .03, .03, 0, 0, 0, 0, 0, 6.75, -
 .08, 519.38, 0, 0, 0, 0, .36, .01, 2.74, .18, 0, 0, 0, .51, .0, -.31, .02,
 .2, .0, 0, -.17, .0, -1.6, -.04, .01, .25, 0, -.1, .02, -.03, .0, -.01, .01
 , .02), nrow=7, byrow=TRUE, dimnames=list(c("Internet", "Stress3",
 "HHincome", "Race", "Age", "Gender", "Stress1"), c("Internet", "Stress3",
 "HHincome", "Race", "Age", "Gender", "Stress1")))


 model = '

 Internet ~ HHincome + Race + Age + Gender + Stress1

 Stress3 ~ Internet + HHincome + Race + Age + Gender + Stress1

 '


 fit=sem(model, sample.nobs=161, sample.cov=myCov, mimic.Mplus=FALSE)


 #check the number of parameters being estimated

 inspect(fit, what=free)


 #Note the DF for the Chi-square is 0, when it should be 28-13 = 15

 summary(fit)


 Andrew Miles


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] setting up a R working group

2011-03-17 Thread Spencer Graves




  1.  There is a CRAN mirror at Bristol (roughly 100km from 
Swansea?).  They must have (or have had) some very active R users there.



  2.  Sundar Dorai-Raj and I developed a local R Archive Network 
and subversion repository internal to a company where we used to work.  
I believe that made it easier for us to collaborate and to share our 
results with others.  (See Creating R Packages, Using CRAN, R-Forge, 
And Local R Archive Networks And Subversion (SVN) Repositories, 
available from CRAN - Documentation:  Contributed.)  This is not the 
first thing you should do, but I believe it can help build local 
collaboration and awareness of R.



  Hope this helps.
  Spencer


On 3/17/2011 6:42 AM, Joanne Demmler wrote:

Dear R users,

we are currently trying to set up an R working group at Swansea 
University.
I would be very grateful to get some information and feedback from R 
users that have done something similar, in particular regarding:


- help with giving a general overview of what R is capable of
- help with what to present or teach (did you organise your own 
training programme or did you get someone from outside?)

- are there people out there who would like to present their work to us?

Any help will be highly appreciated!
Thanks Joanne



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need to abstract changing name of column within loop

2011-03-17 Thread Joshua Ulrich
On Wed, Mar 16, 2011 at 6:58 PM, jctoll jct...@gmail.com wrote:
 Hi,

 I'm struggling to figure out the way to change the name of a column
 from within a loop.  The problem is I can't refer to the object by its
 actual variable name, since that will change each time through the
 loop.  My xts object is A.

 head(A)
           A.Open A.High A.Low A.Close A.Volume A.Adjusted A.Adjusted.1
 2007-01-03  34.99  35.48 34.05   34.30  2574600      34.30  1186780
 2007-01-04  34.30  34.60 33.46   34.41  2073700      34.41  1190586
 2007-01-05  34.30  34.40 34.00   34.09  2676600      34.09  1179514
 2007-01-08  33.98  34.08 33.68   33.97  1557200      33.97  1175362
 2007-01-09  34.08  34.32 33.63   34.01  1386200      34.01  1176746
 2007-01-10  34.04  34.04 33.37   33.70  2157400      33.70  1166020

 It's column names are:
 colnames(A)
 [1] A.Open       A.High       A.Low        A.Close
 A.Volume     A.Adjusted   A.Adjusted.1

 I want to change the 7th column name:
 colnames(A)[7]
 [1] A.Adjusted.1

 I need to do that through a reference to i:
 i
 [1] A

 This works:
 colnames(get(i))[7]
 [1] A.Adjusted.1

 And this is what I want to change the column name to:
 paste(i, ".MarketCap", sep = "")
 [1] "A.MarketCap"

 But how do I make the assignment?  This clearly doesn't work:

 colnames(get(i))[7] <- paste(i, ".MarketCap", sep = "")
 Error in colnames(get(i))[7] <- paste(i, ".MarketCap", sep = "") :
  could not find function "get<-"

 Nor does this (it creates a new object A.Adjusted.1 with a value of
 A.MarketCap) :

 assign(colnames(get(i))[7], paste(i, ".MarketCap", sep = ""))

 How can I change the name of that column within my big loop?  Any
 ideas?  Thanks!

I usually make a copy of the object, change it, then overwrite the original:

tmp <- get(i)
colnames(tmp)[7] <- "foo"
assign(i, tmp)
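
In the context of the original loop, a minimal sketch of the same idea might look like this (the vector of object names is hypothetical):

symbols <- c("A", "B", "C")   # hypothetical: names of the xts objects
for (i in symbols) {
  tmp <- get(i)                                         # fetch the object by name
  colnames(tmp)[7] <- paste(i, ".MarketCap", sep = "")  # rename the 7th column
  assign(i, tmp)                                        # write the modified copy back
}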

Hope that helps.

 Best regards,


 James

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


--
Joshua Ulrich  |  FOSS Trading: www.fosstrading.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange R squared, possible error

2011-03-17 Thread Gabor Grothendieck
On Wed, Mar 16, 2011 at 3:49 PM, derek jan.kac...@gmail.com wrote:
 k=lm(y~x)
 summary(k)
 returns R^2=0.9994

 lm(y~x) is supposed to find coef. a and b in y=a*x+b

 l=lm(y~x+0)
 summary(l)
 returns R^2=0.9998
 lm(y~x+0) is supposed to find coef. a in y=a*x+b while setting b=0

 The question is why do I get better R^2, when it should be otherwise?

 I'm sorry to use the word MS Excel here, but I verified it in Excel and it
 gives:
 R^2=0.9994 when y=a*x+b is used
 R^2=0.99938 when y=a*x+0 is used


The idea is that if you have a positive quantity that can be broken
down into two nonnegative quantities: X = X1 + X2 then it makes sense
to ask what proportion X1 is of X.   For example: 10 = 6 + 4 and 6 is
.6 of the total.

Now, in the case of a model with an intercept its a mathematical fact
that the variance of the response equals the variance of the fitted
model plus the variance of the residuals.  Thus it makes sense to ask
what fraction of the variance of the response is represented by the
variance of the fitted model (this fraction is R^2).

But if there is no intercept then that mathematical fact breaks down.
That is, its no longer true that the variance of the response equals
the variance of the fitted model plus the variance of the residuals.
Thus how meaningful is it to ask what proportion the variance of the
fitted model is of the variance of the response in the first place?
In this case we need to rethink the entire approach which is why a
different formula is required.
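
A quick numerical check of that decomposition on simulated data (illustration only):

set.seed(1)
x <- rnorm(50); y <- 5 + 2 * x + rnorm(50)
fit  <- lm(y ~ x)        # with intercept
fit0 <- lm(y ~ x - 1)    # without intercept
# with an intercept the decomposition is exact (difference is ~0):
sum((y - mean(y))^2) - (sum((fitted(fit) - mean(y))^2) + sum(resid(fit)^2))
# without an intercept it generally fails (difference is usually far from 0):
sum((y - mean(y))^2) - (sum((fitted(fit0) - mean(y))^2) + sum(resid(fit0)^2))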

Also, maybe the real problem is not this at all. That is perhaps you
are not really trying to find the goodness of fit but rather you are
trying to compare two particular models: one with intercept and one
without.  In that case R^2 is not really what you want.  Instead use
the R anova command. For example, using the built in BOD data frame:

 fm <- lm(demand ~ Time, BOD)
 fm0 <- lm(demand ~ Time - 1, BOD)
 anova(fm, fm0)
Analysis of Variance Table

Model 1: demand ~ Time
Model 2: demand ~ Time - 1
  Res.Df     RSS Df Sum of Sq      F  Pr(>F)
1      4  38.069
2      5 135.820 -1   -97.751 10.271 0.03275 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Here we see that the residual sum of squares is much less for the full
model than for the reduced model and its significant at the 3.275%
level.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] calculating AUCs for each of the 1000 boot strap samples

2011-03-17 Thread Brian Diggs

Taby,

First, it is better to reply to the whole list (which I have included on 
this reply); there is a better chance of someone helping you.  Just 
because I could help with one aspect does not mean I necessarily can (or 
have the time to) help with more.


Further comments are inline below.

On 3/16/2011 10:45 PM, taby gathoni wrote:

Hi Brian,

Thanks for this comment I will action on this. Thanks also for the
comment, Andrija also advised the same thing and it worked like magic.

My next cause of action was to get the confidence intervals with the
AUC values.

For the confidence intervals I did them manually: for 99% I cut out the
first 5 and last 5 after ranking the AUCs, while for 95% CI I cut out the
first 25 and last 25.


In general, this is right.  The middle 95% excludes the 2.5% on the 
ends, so for 1000 samples that is excluding the 25 most extreme values.
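
In R that is what quantile() gives directly; for example, assuming the 1000 bootstrap AUCs are collected from the list auc built in your loop:

aucs <- unlist(auc)
quantile(aucs, probs = c(0.025, 0.975))   # 95% percentile interval
quantile(aucs, probs = c(0.005, 0.995))   # 99% percentile interval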



and this is my output


Upper bound Lower  Bound

at 99% CI   0.8175  0.50125

at 95% CI   0.7775  0.50375

from my understanding because there are small samples of 20 GOOD and
20 BAD the variations in the upper and lower bound should be minimal in
the 1000 samples.


I don't know why you would necessarily expect the variance to be 
minimal.  It is what it is.  Also, I don't know why you took 20 of each 
rather than just a random sub-sample.



If you get time, Would you be in a position to assist me find out
why  i have such huge variations? Thank you for taking time to respond.


Maybe pull out 10 of your bootstrap samples and look at the ROC curves 
themselves and their associated AUC.  That might give you a sense as to 
the variability that is possible (which is reflected in the confidence 
interval).


As a final note, you are reinventing the wheel.  There are several 
packages that deal with ROC curves.  Two I like in particular are ROCR 
and pROC.  The latter even has built in routines for computing 
confidence intervals for the AUC using bootstrap replication.
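
For example, something along these lines should work with pROC (a sketch only; the object names are hypothetical):

library(pROC)
r <- roc(labels, scores)    # labels: GOOD/BAD factor, scores: numeric model scores
ci.auc(r, method = "bootstrap", boot.n = 1000)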



Kind regards,
Taby


--- On Wed, 3/16/11, Brian Diggsdig...@ohsu.edu  wrote:

From: Brian Diggsdig...@ohsu.edu
Subject: Re: calculating AUCs for each of the 1000 boot strap samples
To: tab...@yahoo.com
Cc: R helpr-help@r-project.org
Date: Wednesday, March 16, 2011, 10:42 PM

On 3/16/2011 8:04 AM, taby gathoni wrote:

data <- data.frame(id=1:(165+42), main_samp$SCORE,
x=rep(c("BAD","GOOD"), c(42,165)))

  f <- function(x) {

+ str.sample <- list()
+ for (i in 1:length(levels(x$x)))
+ {
+ str.sample[[i]] <- x[x$x==levels(x$x)[i]
,][sample(tapply(x$x,x$x,length)[i],20,rep=T),]
+ }
+ strat.sample <- do.call(rbind,str.sample)
+ return(strat.sample$main_samp.SCORE)
+ }

  f(data)

   [1]
   706 633 443 843 756 743 730 843 706 730 606 743 768 768 743 763 608 730
   743 743 530 813 813 831 793 900 793 693 900 738 706 831
[33] 818 758 718 831 768 638 770 738

  repl <- list()
  auc <- list()
  for(i in 1:1000)

+ {
+ repl[[i]] <- f(data)
+ auc[[i]] <- colAUC(repl[[i]], rep(c("BAD","GOOD")), plotROC=FALSE, alg="ROC")
+ }
Error in
   colAUC(repl[[i]], rep(c("BAD", "GOOD")), plotROC = FALSE, alg = "ROC") :
colAUC: length(y) and nrow(X) must be the same

Thanks a lot
Taby


I think (though I can't check because the example is not reproducible without 
main_samp$SCORE), that the problem is that the second argument to colAUC should 
be
rep(c("BAD", "GOOD"), c(20,20))
The error is that repl[[i]] is length 40 while rep(c("BAD", "GOOD")) is length
2.

P.S. When giving an example, it is better to not include the prompts and 
continuation prompts.  Copy it from the script rather than the output. Relevant 
output can then be included as script comments (prefixed with #).  That makes 
cutting-and-pasting to test easier.

-- Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University



--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regex query (Apache logs)

2011-03-17 Thread Saptarshi Guha
Hello Allan,

Thanks for the response. It gives me hope. I appreciate [3], and might even go with
that.
And for posterity, here's the code (assuming pastebin never expires)

[1] Test string : http://pastebin.com/FyAFzmTv
[2] Pattern (modified as per your suggestion) : http://pastebin.com/s7VT0r5K

pattern <- readLines(url("http://pastebin.com/raw.php?i=s7VT0r5K"),
warn=FALSE)
test <- readLines(url("http://pastebin.com/raw.php?i=rbAvR2dK"), warn=FALSE)
regexpr(pattern, test, perl=TRUE)

Thanks
Saptarshi
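
For reference, Allan's alternative [3] amounts to something like this sketch, which splits the line on whitespace while keeping quoted fields together:

fields <- scan(textConnection(test), what = character(0), quiet = TRUE)
length(fields)   # one entry per unquoted word or quoted field
fields[1]        # the client IP for this log format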


On Thu, Mar 17, 2011 at 12:12 AM, Allan Engelhardt all...@cybaea.comwrote:

 Some comments:

 1. [^\s] matches everything up to a literal 's', unless perl=TRUE.
 2. The (.*) is greedy, so you'll need (.*?)\s(.*?)\s(.*?)$ or similar
 at the end of the expression

 With those changes (and removing a space inserted by the newsgroup posting)
 the expression works for me.

  (pat - readLines(/tmp/b.txt)[1])
 [1]
 ^(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\s([^\\s]*)\\s([^\\s]*)\\s\\[([^\\]]+)\\]\\s\([A-Z]*)\\s([^\\s]*)\\s([^\\s]*)\\\s([^\\s]+)\\s(\\d+)\\s\(.*?)\\\s\(.*?)\\\s\(.*?)\$
  regexpr(pat, test, perl=TRUE)
 [1] 1
 attr(,match.length)
 [1] 436

 3. Consider a different approach, e.g. scan(textConnection(test),
 what=character(0))

 Hope this helps

 Allan



 On 16/03/11 22:18, Saptarshi Guha wrote:

 Hello R users,

 I have this regex see [1] for apache log lines. I tried using R to parse
 some data (only because I wanted to stay in R).
 A sample line is [2]

 (a) I saved the line in [1] into ~/tmp/a.txt and [2] into /tmp/a.txt

 pat- readLines(~/tmp/a.txt)
 test- readLines(/tmp/a.txt)
 test
 grep(pat,test)

 returns integer(0)

 The same query works in python via re.match() (i.e does return groups)

 Using readLines, the regex is escaped for me. Does Python and R use
 different regex styles?

 Cheers
 Saptarshi

 [1]

  
 ^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s([^\s]*)\s([^\s]*)\s\[([^\]]+)\]\s([A-Z]*)\s([^\s]*)\s([^\s]*)\s([^\s]+)\s(\d+)\s(.*)\s(.*)\s(.*)$

 [2]
 220.213.119.925 addons.mozilla.org - [10/Jan/2001:01:55:07 -0800] GET

 /blocklist/3/%8ce33983c0-fd0e-11dc-12aa-0800200c9a66%7D/4.0b5/Fennec/20110217140304/Android_arm-eabi-gcc3/chrome:%2F%2Fglobal%2Flocale%2Fintl.properties/beta/Linux%
 202.6.32.9/default/default/6/6/1/ HTTP/1.1 200 3243 - Mozilla/5.0
 (Android; Linux armv7l; rv:2.0b12pre) Gecko/20110217 Firefox/4.0b12pre
 Fennec/4.0b5 BLOCKLIST_v3=110.163.217.169.1299218425.9706


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Spatial cluster analysis of continous outcome variable

2011-03-17 Thread Jon Toledo

Dear R Users, R Core
Team,
I have a two dimensional space where I measure a numerical value in two 
situations at different points. I have measured the change and I would like to 
test if there are areas in this 2D-space where there is a different amount of 
change (no change, increase, decrease). I don't know if it's better to analyse 
the data just with the coordinates or if it's better to group them into pixels 
(and obtain the mean value for each pixel) and then run the cluster analysis. I 
would like to know if there is a package/function that allows me to do these 
calculations. I would also like to know if it could be done in a 3D space (I 
have collapsed the data to 2D because I don't have many points).
Thanks in advance
J Toledo  

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Segmentation fault when using plot function

2011-03-17 Thread Prof Brian Ripley

On Thu, 17 Mar 2011, Henrik Bengtsson wrote:


On Thu, Mar 17, 2011 at 6:58 AM, cchace cch...@wellington.com wrote:


I frequently get a segmentation fault error when using the plot command.
It happens about half the time.

We are running an old version of R (R version 2.8.0 (2008-10-20) on Linux.

I did a quick search for this problem and didn't find anything. Apologies if
I missed it.


Eventually some kind person with lots of extra time on his/her hand
will install R v2.8.0 on their machine, try to reproduce the problem,
troubleshoot it, fix the bug, and provide you with a customized
retroactive patched R v2.8.0 that you could install.  While waiting
for that to happen, you may want to try out R v2.12.2.


If the bug is in R: it looks much more likely to be in the locale 
handling of the X11 functions of the unstated (but probably old and 
unpatched) version of 'Linux'.



 *** Process received signal ***
 Signal: Segmentation fault (11)
 Signal code: Address not mapped (1)
 Failing at address: 0x231e18f22
 [ 0] /lib64/libc.so.6 [0x3c84a300a0]
 [ 1] /lib64/libc.so.6(strlen+0x10) [0x3c84a76090]
 [ 2] /usr/lib64/libX11.so.6 [0x3c8765d101]
 [ 3] /usr/lib64/libX11.so.6(_XlcCreateLC+0x46) [0x3c8765e5d6]
 [ 4] /usr/lib64/libX11.so.6(_XlcUtf8Loader+0x10) [0x3c8767ebe0]
 [ 5] /usr/lib64/libX11.so.6(_XOpenLC+0x104) [0x3c87665674]
 [ 6] /usr/lib64/libX11.so.6(_XlcCurrentLC+0x8) [0x3c87665768]
 [ 7] /usr/lib64/libX11.so.6(XSupportsLocale+0x9) [0x3c87665b59]
 [ 8] /usr/lib64/R/modules//R_X11.so(X11_Open+0x51) [0x2aaab8a32581]


That is all that was useful

--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cv.lm syntax error

2011-03-17 Thread agent dunham
I rewrote my previous command, and CVlm works now. It was related to having the
same names in the df as in the formula included in CVlm.

Now I'd like to know:

Q1: is the seed of any special importance, or can I just type seed=29
as in the example?

Q2: I typed: 

CVlm(df = mydf, form.lm = formula(response~v1+v2+factor), m=3, dots = TRUE,
seed=29, plotit=TRUE, printit=TRUE)

In case it is important: the factor has three levels, as shown by table(factor):

levels:  2   3   4
freq:    2  21  34

And at the 3rd fold it says: 

fold 3 
Observations in test set: 2 5 6 8 11 13 15 16 17 22 29 33 35 37 41 46 51 52
55 
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
object$xlevels) : 
  factor 'factor' has new level(s) 2

What does it mean?

Thanks in advance,u...@host.com

--
View this message in context: 
http://r.789695.n4.nabble.com/cv-lm-syntax-error-tp3334889p3385065.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange R squared, possible error

2011-03-17 Thread derek
Thank you for your very comprehensible answer.
I know a priori that the model y=a*x+0 is right and that I can't get x=constant
or y=constant.
I'm comparing the performance of data gathering in my data set to other data
sets, in which gathering performance is characterized by R-squared. The data in
the data sets are not random but have random error with a normal distribution.
The error in my data is composed of human and equipment error (normal
distribution), appliance error (probably a skewed normal distribution) and
some other interferences. So I compare the models y = a*x + b and y = a*x + 0 to
learn the magnitude of the interferences which I can't influence - the difference
of the R^2's.

--
View this message in context: 
http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3385009.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange R squared, possible error

2011-03-17 Thread derek
Yes they are. I had edited the reply, but it didn't help.
Correction:
2) I meant zero slope, not zero intercept.


--
View this message in context: 
http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3384648.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lmm WITHOUT random factor (lme4)

2011-03-17 Thread Mark Difford
On Mar 17, 2011; 04:29pm Thierry Onkelinx wrote:

 You cannot compare lm() with lme() because the likelihoods are not the
 same. Use gls() instead of lm()

Hi Thierry,

Of course, I stand subject to correction, but unless something dramatic has
changed, you can. gls() can be used if you need to accommodate a correlation
structure.

The method I have outlined, i.e. anova(lme$obj, lm$obj), is detailed in
Pinheiro & Bates (2000) beginning page 154. Please refer to this if you
doubt me.

Regards, Mark.

--
View this message in context: 
http://r.789695.n4.nabble.com/lmm-WITHOUT-random-factor-lme4-tp3384054p3384802.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange R squared, possible error

2011-03-17 Thread derek
That's exactly what I would like to do. Any idea on a good text? I've consulted
several texts, but none defined R^2 as R^2 = 1 - Sum(R[i]^2) / Sum((y[i] - y*)^2),
still less why one should use different formulas for similar models, or why R^2
should be closer to 1 when y=a*x+0 than in the general model y=a*x+b.

from manual:
r.squared R^2, the ‘fraction of variance explained by the model’, 
R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2), 
where y* is the mean of y[i] if there is an intercept and zero otherwise. 

I don't need an explanation of what R^2 does or how to interpret it, because I
know what it means and how it is derived. I don't need to be told which
model I should apply. So the answers from Thomas weren't helpful.

I don't claim it is wrong, otherwise it wouldn't be employed, but I want to see
the reason behind using two formulas.

Control questions:
1) Does the statement "if there is an intercept" also cover a zero intercept?

2) If I use the model y = a*x+0, which formula for R^2 is used: the one with y*
or the one without?

--
View this message in context: 
http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3384844.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Beginner question: How to replace part of a filename in read.csv?

2011-03-17 Thread pierz
I would like to use samp as a part of a filename that I can change. My source
files are .csv files with date as the file name, and I would like to be able
to type in the date (later perhaps automate this using list.files) and then
read the csv and write the pdf automatically. I have tried different
combinations with quotation marks and () around samp, but I keep getting the error
object 'samp.csv' not found.

samp <- "20110317"
read.csv(file=samp.csv,...)
#next R processes some code that works fine, and then should save the
figure:
pdf(file=samp.pdf,...)
dev.off()

How can I get R to replace samp as 20110317 in front of .csv and .pdf?

Best,
Sjouke

--
View this message in context: 
http://r.789695.n4.nabble.com/Beginner-question-How-to-replace-part-of-a-filename-in-read-csv-tp3384786p3384786.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lmm WITHOUT random factor (lme4)

2011-03-17 Thread Mark Difford
On Mar 17, 2011; 04:29pm Thierry Onkelinx wrote:

 You cannot compare lm() with lme() because the likelihoods are not the
 same. Use gls() instead of lm()

And perhaps I should have added the following:

First para on page 155 of Pinheiro & Bates (2000) states, "The anova method
can be used to compare lme and lm objects."

Regards, Mark.

--
View this message in context: 
http://r.789695.n4.nabble.com/lmm-WITHOUT-random-factor-lme4-tp3384054p3384823.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Gelman-Rubin convergence diagnostics via coda package

2011-03-17 Thread fbielejec
Dear,

I'm trying to run diagnostics on MCMC analysis (fitting a log-linear
model to rates data). I'm getting an error message when trying
Gelman-Rubin shrink factor plot:

gelman.plot(out)
Error in chol.default(W) : 
  the leading minor of order 2 is not positive definite

I take it that somewhere, somehow a matrix is singular, but how can
that be remedied? 

My code:

library(rjags)

#JAGS MODEL3 CODE-#
data {
   for (i in 1:N) {
  y[i] <- z2[i] - z1[i]
   }
}

model {

  for (i in 1 : N) {
  
  mean[i] <- (beta[i] * (n[i] - z1[i]) * (n[i] - z2[i]) ) / n[i]

  y[i] ~ dpois( mean[i] )
  log(beta[i]) <- a1 * type[i] + a0
}

a1 ~ dnorm(0, 1.0E-4)
a0 ~ dnorm(0, 1.0E-4)

}
#--# 

N = 12
n = c(63, 33, 18, 16, 36, 3, 33, 101, 64, 30, 10, 36)
z1 = c(33, 8, 14, 4, 2, 8, 24, 76, 35, 14, 6, 22)
z2 = c(37, 8, 16, 6, 3, 10, 28, 91, 56, 17, 8, 28)
type = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)

init0 = list(a0 = 0.0, a1 = 0.0, .RNG.seed = 1234, .RNG.name = "base::Super-Duper")
init1 = list(a0 = 1.0, a1 = 1.0, .RNG.seed = 1234, .RNG.name = "base::Super-Duper")

jags <- jags.model('model3.bug',
   data = list('N' = N,
   'n' = n,
   'z1' = z1,
   'z2' = z2,
   'type' = type),
   inits = list(init0, init1),
   n.chains = 2,
   n.adapt = 100)
  
update(jags, 1000)
  
out <- coda.samples(model = jags,
variable.names =  c('beta'),
n.iter = 25000,
thin = 5)

gelman.plot(out)
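
One thing that may be worth trying first (a sketch, assuming your version of coda accepts the multivariate argument): compute only the univariate shrink factors, which avoid the Cholesky factorisation that fails here:

gelman.diag(out, multivariate = FALSE)   # per-parameter potential scale reduction factors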

-- 
while(!succeed) { try(); }

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] generalized mixed linear models, glmmPQL and GLMER give very different results that both do not fit the data well...

2011-03-17 Thread Franssens, Samuel
Hi,

I have the following type of data: 86 subjects in three independent groups 
(high power vs low power vs control). Each subject solves 8 reasoning problems 
of two kinds: conflict problems and noconflict problems. I measure accuracy in 
solving the reasoning problems. To summarize: binary response, 1 within subject 
var (TYPE), 1 between subject var (POWER).

I wanted to fit the following model: for problem i, person j:
logodds ( Y_ij ) = b_0j + b_1j TYPE_ij
with b_0j = b_00 + b_01 POWER_j + u_0j
and b_1j = b_10 + b_11 POWER_j

I think it makes sense, but I'm not sure.
Here are the observed cell means:
conflict noconflict
control 0.68965520.9568966
high  0.69354840.9677419
low 0.88461540.9903846

GLMER gives me:
summary(glmer(accuracy~f_power*f_type + (1|subject), 
family=binomial,data=syllogisms))
Generalized linear mixed model fit by the Laplace approximation
Formula: accuracy ~ f_power * f_type + (1 | subject)
   Data: syllogisms
 AIC   BIC logLik deviance
406 437.7   -196  392
Random effects:
Groups  NameVariance Std.Dev.
subject (Intercept) 4.9968   2.2353
Number of obs: 688, groups: subject, 86

Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.507450.50507   2.985  0.00284 **
f_powerhp0.130830.70719   0.185  0.85323
f_powerlow   2.041210.85308   2.393  0.01672 *
f_typenoconflict 3.287150.64673   5.083 3.72e-07 ***
f_powerhp:f_typenoconflict   0.216800.93165   0.233  0.81599
f_powerlow:f_typenoconflict -0.011991.45807  -0.008  0.99344
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
(Intr) f_pwrh f_pwrl f_typn f_pwrh:_
f_powerhp   -0.714
f_powerlow  -0.592  0.423
f_typncnflc -0.185  0.132  0.109
f_pwrhp:f_t  0.128 -0.170 -0.076 -0.694
f_pwrlw:f_t  0.082 -0.059 -0.144 -0.444  0.308

glmmPQL gives me:
summary(glmmPQL(fixed=accuracy~f_power*f_type, random=~1|subject, 
family=binomial, data=syllogisms))
iteration 1
iteration 2
iteration 3
iteration 4
iteration 5
iteration 6
Linear mixed-effects model fit by maximum likelihood
Data: syllogisms
  AIC BIC logLik
   NA  NA NA

Random effects:
Formula: ~1 | subject
(Intercept)  Residual
StdDev:1.817202 0.8045027

Variance function:
Structure: fixed weights
Formula: ~invwt
Fixed effects: accuracy ~ f_power * f_type
Value Std.Error  DF  t-value p-value
(Intercept) 1.1403334 0.4064642 599 2.805496  0.0052
f_powerhp   0.0996481 0.5683296  83 0.175335  0.8612
f_powerlow  1.5358270 0.6486150  83 2.367856  0.0202
f_typenoconflict3.0096016 0.4769761 599 6.309754  0.
f_powerhp:f_typenoconflict  0.1856061 0.6790046 599 0.273350  0.7847
f_powerlow:f_typenoconflict 0.0968204 1.0318659 599 0.093830  0.9253
Correlation:
(Intr) f_pwrh f_pwrl f_typn f_pwrh:_
f_powerhp   -0.715
f_powerlow  -0.627  0.448
f_typenoconflict-0.194  0.138  0.121
f_powerhp:f_typenoconflict   0.136 -0.182 -0.085 -0.702
f_powerlow:f_typenoconflict  0.089 -0.064 -0.153 -0.462  0.325

Standardized Within-Group Residuals:
 Min   Q1  Med   Q3  Max
-12.43735991   0.06243699   0.22966010   0.33106978   2.23942234

Number of Observations: 688
Number of Groups: 86


The strange thing is that when you convert the estimates to probabilities, they are
quite far off. For control, no conflict (intercept), the estimate from glmer
is 1.5 (about 81%) and from glmmPQL is 1.14 (about 75%), whereas the observed value is 68%.
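
For reference, the back-transformation behind those percentages is just the inverse logit:

plogis(1.50745)   # glmer intercept, about 0.82
plogis(1.14033)   # glmmPQL intercept, about 0.76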

Am I doing something wrong?

Any help is very much appreciated.
Sam.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Beginner question: How to replace part of a filename in read.csv?

2011-03-17 Thread Sarah Goslee
You need to use paste(), as in
paste(samp, ".csv", sep="")

Use that as your filename.
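
Put together, a sketch of the whole round trip could look like this (file names are only examples):

samp <- "20110317"
dat  <- read.csv(file = paste(samp, ".csv", sep = ""))
pdf(file = paste(samp, ".pdf", sep = ""))
plot(dat)     # placeholder for the real plotting code
dev.off()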

Sarah

On Thu, Mar 17, 2011 at 11:05 AM, pierz s.pier...@gmail.com wrote:
 I would like to use samp as a part of a filename that I can change. My source
 files are .csv files with date as the file name, and I would like to be able
 to type in the date (later perhaps automate this using list.files) and then
 read the csv and write the pdf automatically. I have tried different
 combinations with  and () around samp, but I keep getting the error
 object 'samp.csv' not found.

 samp - 20110317
 read.csv(file=samp.csv,...)
 #next R processes some code that works fine, and then should save the
 figure:
 pdf(file=samp.pdf,...)
 dev.off()

 How can I get R to replace samp as 20110317 in front of .csv and .pdf?

 Best,
 Sjouke


-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] generalized mixed linear models, glmmPQL and GLMER give very different results that both do not fit the data well...

2011-03-17 Thread Bert Gunter
I suggest that you post this on the R-sig-mixed-models list where you
are more likely to find those with bothe interest and expertise in
these matters.

-- Bert

On Thu, Mar 17, 2011 at 7:44 AM, Franssens, Samuel
samuel.franss...@econ.kuleuven.be wrote:
 Hi,

 I have the following type of data: 86 subjects in three independent groups 
 (high power vs low power vs control). Each subject solves 8 reasoning 
 problems of two kinds: conflict problems and noconflict problems. I measure 
 accuracy in solving the reasoning problems. To summarize: binary response, 1 
 within subject var (TYPE), 1 between subject var (POWER).

 I wanted to fit the following model: for problem i, person j:
 logodds ( Y_ij ) = b_0j + b_1j TYPE_ij
 with b_0j = b_00 + b_01 POWER_j + u_0j
 and b_1j = b_10 + b_11 POWER_j

 I think it makes sense, but I'm not sure.
 Here are the observed cell means:
                conflict                 noconflict
 control 0.6896552            0.9568966
 high      0.6935484            0.9677419
 low         0.8846154            0.9903846

 GLMER gives me:
 summary(glmer(accuracy~f_power*f_type + (1|subject), 
 family=binomial,data=syllogisms))
 Generalized linear mixed model fit by the Laplace approximation
 Formula: accuracy ~ f_power * f_type + (1 | subject)
   Data: syllogisms
  AIC   BIC logLik deviance
 406 437.7   -196      392
 Random effects:
 Groups  Name        Variance Std.Dev.
 subject (Intercept) 4.9968   2.2353
 Number of obs: 688, groups: subject, 86

 Fixed effects:
                            Estimate Std. Error z value Pr(|z|)
 (Intercept)                  1.50745    0.50507   2.985  0.00284 **
 f_powerhp                    0.13083    0.70719   0.185  0.85323
 f_powerlow                   2.04121    0.85308   2.393  0.01672 *
 f_typenoconflict             3.28715    0.64673   5.083 3.72e-07 ***
 f_powerhp:f_typenoconflict   0.21680    0.93165   0.233  0.81599
 f_powerlow:f_typenoconflict -0.01199    1.45807  -0.008  0.99344
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Correlation of Fixed Effects:
            (Intr) f_pwrh f_pwrl f_typn f_pwrh:_
 f_powerhp   -0.714
 f_powerlow  -0.592  0.423
 f_typncnflc -0.185  0.132  0.109
 f_pwrhp:f_t  0.128 -0.170 -0.076 -0.694
 f_pwrlw:f_t  0.082 -0.059 -0.144 -0.444  0.308

 glmmPQL gives me:
 summary(glmmPQL(fixed=accuracy~f_power*f_type, random=~1|subject, 
 family=binomial, data=syllogisms))
 iteration 1
 iteration 2
 iteration 3
 iteration 4
 iteration 5
 iteration 6
 Linear mixed-effects model fit by maximum likelihood
 Data: syllogisms
  AIC BIC logLik
   NA  NA     NA

 Random effects:
 Formula: ~1 | subject
        (Intercept)  Residual
 StdDev:    1.817202 0.8045027

 Variance function:
 Structure: fixed weights
 Formula: ~invwt
 Fixed effects: accuracy ~ f_power * f_type
                                Value Std.Error  DF  t-value p-value
 (Intercept)                 1.1403334 0.4064642 599 2.805496  0.0052
 f_powerhp                   0.0996481 0.5683296  83 0.175335  0.8612
 f_powerlow                  1.5358270 0.6486150  83 2.367856  0.0202
 f_typenoconflict            3.0096016 0.4769761 599 6.309754  0.
 f_powerhp:f_typenoconflict  0.1856061 0.6790046 599 0.273350  0.7847
 f_powerlow:f_typenoconflict 0.0968204 1.0318659 599 0.093830  0.9253
 Correlation:
                            (Intr) f_pwrh f_pwrl f_typn f_pwrh:_
 f_powerhp                   -0.715
 f_powerlow                  -0.627  0.448
 f_typenoconflict            -0.194  0.138  0.121
 f_powerhp:f_typenoconflict   0.136 -0.182 -0.085 -0.702
 f_powerlow:f_typenoconflict  0.089 -0.064 -0.153 -0.462  0.325

 Standardized Within-Group Residuals:
         Min           Q1          Med           Q3          Max
 -12.43735991   0.06243699   0.22966010   0.33106978   2.23942234

 Number of Observations: 688
 Number of Groups: 86


 Strange thing is that when you convert the estimates to probabilities, they 
 are quite far off. For control, no conflict (intercept), the estimation from 
 glmer is 1.5 - 81% and for glmmPQL is 1.14 - 75%, whereas the observed is: 
 68%.

 Am I doing something wrong?

 Any help is very much appreciated.
 Sam.


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Bert Gunter
Genentech Nonclinical Biostatistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Beginner question: How to replace part of a filename in read.csv?

2011-03-17 Thread Scott Chamberlain
 paste(samp, ".pdf", sep="")
[1] "20110317.pdf"
 paste(samp, ".csv", sep="")
[1] "20110317.csv"
On Thursday, March 17, 2011 at 10:05 AM, pierz wrote:
I would like to use samp as a part of a filename that I can change. My source
 files are .csv files with date as the file name, and I would like to be able
 to type in the date (later perhaps automate this using list.files) and then
 read the csv and write the pdf automatically. I have tried different
 combinations with  and () around samp, but I keep getting the error
 object 'samp.csv' not found.
 
 samp - 20110317
 read.csv(file=samp.csv,...)
 #next R processes some code that works fine, and then should save the
 figure:
 pdf(file=samp.pdf,...)
 dev.off()
 
 How can I get R to replace samp as 20110317 in front of .csv and .pdf?
 
 Best,
 Sjouke
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Beginner-question-How-to-replace-part-of-a-filename-in-read-csv-tp3384786p3384786.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plotting multiple figures on one page

2011-03-17 Thread scarlet
Jim, 

Thanks for looking into this. The c without paste works. If the rq model
overrides the mfrow, I think I will have to piece together individual plots
using other software. 

scarlet

--
View this message in context: 
http://r.789695.n4.nabble.com/plotting-multiple-figures-on-one-page-tp3382981p3384787.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange R squared, possible error

2011-03-17 Thread Greg Snow
It is all a matter of what you are comparing too, or what the null model is.  
For most cases (standard regression) we compare a model with slope and 
intercept to an intercept only model (looking at the effect of the slope), the 
intercept only model fits a horizontal line through the mean of the y's hence 
the subtraction of the mean.  If we don't do that then R-squared can easily 
become meaningless.  Here is an example where we compute the r-squared using 
the no-intercept formula:

x <- rnorm(100, 1000, 20)
y <- rnorm(100, 1000, 20)
cor(x,y)

summary( lm( y ~ rep(1,100) + x + 0 ) )


Notice how big the r-squared value is (and that it is not anywhere near the 
square of the correlation) for data that is pretty independent.

When you force the intercept to 0, then you are using a different null model 
(mean 0).  Part of Thomas's point was that if we still subtract the mean in 
this case then the calculation of r-squared can give a negative number, which 
you pointed out is meaningless, the gist is that that is the incorrect formula 
to use and so R instead uses the formula without subtracting the mean when you 
don't fit an intercept.

The reason the r-squared values are different is because they are using 
different denominators and are therefore not comparable.

The reason that R uses 2 different formulas/denominators is because there is 
not one single formula/denominator that makes general sense in both cases.
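
A small illustration of the two denominators on simulated data (illustration only):

set.seed(1)
x <- rnorm(30); y <- 3 + 2 * x + rnorm(30)
fit  <- lm(y ~ x)       # with intercept
fit0 <- lm(y ~ x + 0)   # without intercept
1 - sum(resid(fit)^2)  / sum((y - mean(y))^2)   # equals summary(fit)$r.squared
1 - sum(resid(fit0)^2) / sum(y^2)               # equals summary(fit0)$r.squared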

Hope this helps,


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of derek
 Sent: Thursday, March 17, 2011 9:29 AM
 To: r-help@r-project.org
 Subject: Re: [R] Strange R squared, possible error
 
 Thats exactly what I would like to do. Any idea on good text? I've
 consulted
 severel texts, but no one defined R^2 as R^2 = 1 - Sum(R[i]^2) /
 Sum((y[i])^2-y*)) still less why to use different formulas for similar
 model
 or why should be R^2 closer to 1 when y=a*x+0 than in general model
 y=a*x+b.
 
 from manual:
 r.squared R^2, the ‘fraction of variance explained by the model’,
 R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2),
 where y* is the mean of y[i] if there is an intercept and zero
 otherwise.
 
 I don't need explaining what R^2 does nor how to interpret it, because
 I
 know what it means and how it is derived. I don't need to be told which
 model I should apply. So the answers from Thomas weren't helpful.
 
 I don't claim it is wrong, otherwise wouldn't be employed, but I want
 to see
 the reason behind using two formulas.
 
 Control questions:
 1) Statement if there is an intercept means intercept including zero
 intercept?
 
 2) If I use model y = a*x+0 which formula for R^2 is used: the one with
 Y*
 or the one without?
 
 --
 View this message in context: http://r.789695.n4.nabble.com/Strange-R-
 squared-possible-error-tp3382818p3384844.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] fitting gamm with interaction term

2011-03-17 Thread Irene Mantzouni
Hi all,

 

I would like to fit a gamm model of the form:

 

Y~X+X*f(z)

Where f is the smooth function and 

With random effects on X and on the intercept.

 

So, I try to write it like this:

 

gam.lme <- gamm(Y ~ s(z, by=X) + X, random=list(groups=pdDiag(~1+X)))

 

but I get the error message :

 

Error in MEestimate(lmeSt, grps) : 

  Singularity in backsolve at level 0, block 1

 

When I simply fit a gam model using the formula above, then it works ok.

 

Is it possible to fit such a model with gamm?

 

 

Thanks a lot!



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plotting multiple figures on one page

2011-03-17 Thread Muhammad Rahiz

Scarlet,

If the mfrow is being overridden, perhaps the rimage package might be 
able to piece the individual plots...


--
Muhammad Rahiz
Researcher  DPhil Candidate (Climate Systems  Policy)
School of Geography  the Environment
University of Oxford

On Thu, 17 Mar 2011, scarlet wrote:


Jim,

Thanks for looking into this. The c without paste works. If the rq model
overrides the mfrow, I think I will have to piece together individual plots
using other software.

scarlet

--
View this message in context: 
http://r.789695.n4.nabble.com/plotting-multiple-figures-on-one-page-tp3382981p3384787.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fitting gamm with interaction term

2011-03-17 Thread Simon Wood

Is X a numeric variable or a factor? If it's numeric try

gam.lme <- gamm(Y ~ s(z, by=X), random=list(groups=pdDiag(~1+X)))

... since otherwise the separate X term is confounded with s(z, by=X). 
(gam detects such confounding and copes with it, but gamm can't).


Simon


On 17/03/11 17:47, Irene Mantzouni wrote:

Hi all,



I would like to fit a gamm model of the form:



Y~X+X*f(z)

Where f is the smooth function and

With random effects on X and on the intercept.



So, I try to write it like this:



gam.lme- gamm(Y~ s(z, by=X) +X,  random=list(groups=pdDiag(~1+X))  )



but I get the error message :



Error in MEestimate(lmeSt, grps) :

   Singularity in backsolve at level 0, block 1



When I simply fit a gam model using the formula above, then it works ok.



Is it possible to fit such a model with gamm?





Thanks a lot!



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603   http://people.bath.ac.uk/sw283

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Counting

2011-03-17 Thread Jim Silverton
I have a matrix say:

23   1
12  12
 0   0
 0   1
 0   1
 0   2
23   2

I want to count the number of distinct rows and the number of distinct elements
in the second column and put these counts in a column. So at the end of the
day I should have:

c(1, 1, 1, 2, 2, 1, 1) for the distinct rows and c(1, 1, 1, 2, 2, 2, 2) for
the counts of how many times the elements in the second column exists. Any
help is greatly appreciated.

-- 
Thanks,
Jim.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Spatial cluster analysis of continous outcome variable

2011-03-17 Thread Mike Marchywka







Did you post your data or hypothetical data?
Usually that helps make your problem clearer and more interesting
(and more likely to get a useful response to your post).




From: tintin...@hotmail.com
To: r-help@r-project.org
Date: Thu, 17 Mar 2011 17:38:14 +0100
Subject: [R] Spatial cluster analysis of continous outcome variable



Dear R Users, R Core
Team,
I have a two dimensional space where I measure a numerical value in two 
situations at different points. I have measured the change and I would like to 
test if there are areas in this 2D-space where there is a different amount of 
change (no change, increase, decrease). I don't know if it's better to analyse 
the data just with the coordinates or if it's better to group them into pixels 
(and obtain the mean value for each pixel) and then run the cluster analysis. I 
would like to know if there is a package/function that allows me to do these 
calculations. I would also like to know if it could be done in a 3D space (I 
have collapsed the data to 2D because I don't have many points).
Thanks in advance





J Toledo

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Counting

2011-03-17 Thread K. Elo

Dear Jim,

17.03.2011 20:54, Jim Silverton wrote:

I have a matrix say:

23   1
12  12
00
0   1
0   1
0   2
23  2

I want to count of number of distinct rows and the number of disinct element
in the second column and put these counts in a column. SO at the end of the
day I should have:

c(1, 1, 1, 2, 2, 1, 1) for the distinct rows...


Let's suppose my.data is your data frame, var is the 1st column and 
var1 is the second.


1) Create a 3rd column for the first task:
   my.data$var2 <- 0
2) Count distinct rows:

   for (i in 1:nrow(my.data)) { my.data$var2[i] <- nrow(subset(my.data,
var==var[i] & var1==var1[i])) }


After this, the output of my.data$var2 is:

[1] 1 1 1 2 2 1 1

 ... and c(1, 1, 1, 2, 2, 2, 2) for the counts of how many times the
 elements in the second column exists.

Here I'm a bit puzzled. Shouldn't the count for the first element (1)
rather be 3, since the value 1 occurs three times... If this is what
you are looking for, then the following should work:


1) Create a 4th column:
   my.data$var3 <- 0
2) Count distinct elements in the second column:

   for (i in 1:nrow(my.data)) {
my.data$var3[i] <- sum(my.data$var1==my.data$var1[i]) }


After this, the output of my.data$var3 is:

[1] 3 1 1 3 3 2 2

HTH,
Kimmo

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ggplot, transformation, and ylim

2011-03-17 Thread godwin yung
Hey everyone,

I'm having a little trouble with ggplot. I have two sets of y-values, one
whose range is contained in the other. Due to the nature of the y-values, I
wish to scale the y axis with a log base two transformation. Furthermore, I
wish to plot the two sets of y-values as boxplots in separate graphs with
the same ylim (and thus with the same ticks and tick labels). As of now, I
have:
ggplot(data, aes(x1, y1))+geom_boxplot()+scale_y_log2()
ggplot(data, aes(x2, y2))+geom_boxplot()+scale_y_log2()
which successfully creates the boxplot and transforms the y-axis. However, I
can't figure out how to change the range of the y-axis so that they're the
same in both graphs. Inserting +scale_y_continuous(limits=c(...)) before
or after +scale_y_log2() doesn't seem to work. I also tried:
qplot(x1, y1, geom="boxplot", ylim=c(...)) + scale_y_log2()
but my efforts were in vain. Any ideas?

Thanks.

Cheers,
Godwin


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot, transformation, and ylim

2011-03-17 Thread godwin yung
Hey everyone,

Nevermind, I figured it out:
ggplot(data, aes(x1, y1))+geom_boxplot()+scale_y_log2(lim=c(...))
:)

Cheers,
Godwin


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R² for non-linear model

2011-03-17 Thread Alexander Engelhardt

Hi,
thank you for your elaborate answer. I downloaded Prof. Dayton's pdf and 
will read it tomorrow.


A friend also told me that our professor said you can actually compare 
AICs for different distributions. Apparently it's not correct strictly 
speaking, because of the two different likelihoods, but you can get 
meaningful information out of it. Did I understand that right?
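
For what it is worth, a minimal sketch of such a cross-distribution comparison on made-up data, keeping all normalising constants and using simple plug-in estimates:

set.seed(1)
y <- rlnorm(100, meanlog = 0, sdlog = 0.5)       # hypothetical positive data
ll_norm  <- sum(dnorm(y, mean(y), sd(y), log = TRUE))                 # Gaussian, 2 parameters
ll_lnorm <- sum(dlnorm(y, mean(log(y)), sd(log(y)), log = TRUE))      # log-normal, 2 parameters
c(AIC_normal = -2 * ll_norm + 4, AIC_lognormal = -2 * ll_lnorm + 4)   # lower is better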


By the way, what is the etiquette way of answering your post? Should I 
mail to you /and/ the list?


Regards,
 Alex (with a new e-mail address, in case this mixes something up)

On 17.03.2011 09:58, Rubén Roa wrote:

Hi Alexx,

I don't see any problem in comparing models based on different
distributions for the same data using the AIC, as long as they have a
different number of parameters and all the constants are included.
For example, you can compare distribution mixture models with different
number of components using the AIC.
This is one example:
Roa-Ureta. 2010. A Likelihood-Based Model of Fish Growth With Multiple
Length Frequency Data. Journal of Biological, Agricultural and
Environmental Statistics 15:416-429.
Here is another example:
www.education.umd.edu/EDMS/fac/Dayton/PCIC_JMASM.pdf
Prof. Dayton writes above that one advantage of AIC over hypothesis
testing is:
(d) Considerations related to underlying distributions for random
variables can be
incorporated into the decision-making process rather than being treated
as an assumption whose
robustness must be considered (e.g., models based on normal densities
and on log-normal
densities can be compared).
Last, if you read Akaike's theorem you will see there is nothing
precluding comparing models built on different distributional models.
Here it is:
 the expected (over the sample space and the space of parameter
estimates) maximum log-likelihood of some data on a working model
overshoots the expected (over the sample space only) maximum
log-likelihood of the data under the true model that
generated the data by exactly the number of parameters in the working
model.
A remarkable result.

Rubén

-Original Message-
From: r-help-boun...@r-project.org on behalf of Alexx Hardt
Sent: Wed 3/16/2011 7:42 PM
To: r-help@r-project.org
Subject: Re: [R] R² for non-linear model

 On 16.03.2011 19:34, Anna Gretschel wrote:
  Am 16.03.2011 19:21, schrieb Alexx Hardt:
  And to be on-topic: Anna, as far as I know anova's are only useful to
  compare a submodel (e.g. with one less regressor) to another model.
 
  thanks! i don't get it either what they mean by fortune...

It's an R-package (and a pdf [1]) with collected quotes from the mailing
list.
Be careful with the suggestion to use AIC. If you wanted to compare two
models using AICs, you need the same distribution (that is,
Verteilungsannahme) in both models.
To my knowledge, there is no way to compare a gaussian model to an
exponential one (except common sense), but my knowledge is very limited.

[1] http://cran.r-project.org/web/packages/fortunes/vignettes/fortunes.pdf

--
alexx@alexx-fett:~$ vi .emacs

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] fitting gamm with interaction term

2011-03-17 Thread Irene Mantzouni
Hi all,

 

I would like to fit a gamm model of the form:

 

Y~X+X*f(z)

Where f is the smooth function, and with random effects on X and on the intercept.

 

So, I try to write it like this:

 

gam.lme <- gamm(Y ~ s(z, by=X) + X,  random=list(groups=pdDiag(~1+X)))

 

but I get the error message :

 

Error in MEestimate(lmeSt, grps) : 

  Singularity in backsolve at level 0, block 1

 

When I simply fit a gam model using the formula above, then it works ok.

 

Is it possible to fit such a model with gamm?

 

 

Thanks a lot!
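
For what it's worth, a small simulated-data sketch of a varying-coefficient gamm
with a random intercept (the data frame d and its columns are made up, and this
is not a fix for the singularity above, just the kind of call that usually fits).
With a continuous by variable the smooth s(z, by = X) is not centred, so it
already contains the linear X term; the separate X and the random slope on X are
dropped here.

library(mgcv)
set.seed(1)
d <- data.frame(z = runif(200), X = rnorm(200),
                groups = factor(rep(1:10, each = 20)))
d$Y <- d$X * sin(2 * pi * d$z) +          # X * f(z), the varying coefficient
       rnorm(10)[d$groups] +              # group-level random intercept
       rnorm(200, sd = 0.3)
m <- gamm(Y ~ s(z, by = X),               # uncentred by-smooth absorbs the linear X effect
          random = list(groups = ~ 1),    # random intercept only
          data = d)
summary(m$gam)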


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Counting

2011-03-17 Thread Petr Savicky
On Thu, Mar 17, 2011 at 02:54:49PM -0400, Jim Silverton wrote:
 I have a matrix say:
 
 23   1
 12  12
 0    0
 0   1
 0   1
 0   2
 23  2
 
 I want to count the number of distinct rows and the number of distinct elements
 in the second column and put these counts in a column. So at the end of the
 day I should have:
 
 c(1, 1, 1, 2, 2, 1, 1) for the distinct rows and c(1, 1, 1, 2, 2, 2, 2) for
 the counts of how many times the elements in the second column occur. Any
 help is greatly appreciated.

Hi.

I understand the first as follows. For each row compute the number of
rows, which are equal to the given one. If this is correct, then
the following can be used.

  a <- cbind(
  c(23, 12,  0,  0,  0,  0, 23),
  c(1, 12,  0,  1,  1,  2,  2))

  u <- rep(1, times=nrow(a))
  ave(u, a[, 1], a[, 2], FUN=sum)
  [1] 1 1 1 2 2 1 1

I am not sure whether I understand the second correctly. Can you
explain it in more detail? I would expect

  ave(u, a[, 2], FUN=sum)
  [1] 3 1 1 3 3 2 2

However, this is different from your expected output. Do you count
only consecutive equal elements?

Hope this helps.

Petr Savicky.
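
If the second count is meant over consecutive runs only (a guess, reusing the
a and u defined above), run-length encoding reproduces the expected output:

r <- rle(a[, 2])                                # runs of equal values in column 2
run.id <- rep(seq_along(r$lengths), r$lengths)  # run membership for each row
ave(u, run.id, FUN=sum)
# [1] 1 1 1 2 2 2 2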

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Scope and apply-type functions

2011-03-17 Thread jamie.f.olson
So, I've been confused by this for a while.  If I want to create functions in
an apply, it only uses the desired value for the variable if I create a new
local variable:

 lapply(1:5,function(h){k=h;function(){k}})[[1]]()
[1] 1
 lapply(1:5,function(k){function(){k}})[[1]]()
[1] 5
 

Normally, a function will use values for a variable if the variable is local
to the definition of the function:

 a = function(x){function(){x}}
 a(5)
function(){x}

 a(5)()
[1] 5
 a(6)()
[1] 6


So why doesn't this work for apply?  Does apply work more like loops, with a
shared variable that changes values so that the first function defined
actually uses the last value of the variable?

 a = list();for (k in 1:5){a = c(a,function(){k})}
 a[[1]]()
[1] 5
 


Or is there something else entirely?

--
View this message in context: 
http://r.789695.n4.nabble.com/Scope-and-apply-type-functions-tp3385230p3385230.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Scope and apply-type functions

2011-03-17 Thread baptiste auguie
Hi,

I think it's a side effect of lazy evaluation, where you should
probably use the ?force like a jedi,

lapply(1:5,function(k){force(k) ; function(){k}})[[2]]()

HTH,

baptiste

On 18 March 2011 07:01, jamie.f.olson inspired2apa...@gmail.com wrote:
 So, I've been confused by this for a while.  If I want to create functions in
 an apply, it only uses the desired value for the variable if I create a new
 local variable:

 lapply(1:5,function(h){k=h;function(){k}})[[1]]()
 [1] 1
 lapply(1:5,function(k){function(){k}})[[1]]()
 [1] 5


 Normally, a function will use values for a variable if the variable is local
 to the definition of the function:

 a = function(x){function(){x}}
 a(5)
 function(){x}

 a(5)()
 [1] 5
 a(6)()
 [1] 6


 So why doesn't this work for apply?  Does apply work more like loops, with a
 shared variable that changes values so that the first function defined
 actually uses the last value of the variable?

 a = list();for (k in 1:5){a = c(a,function(){k})}
 a[[1]]()
 [1] 5



 Or is there something else entirely?

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Scope-and-apply-type-functions-tp3385230p3385230.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Numeric vector converted mysteriously to characters in data frame?

2011-03-17 Thread Martin Ralphs
Thanks Jim,

It turns out that the problem was that all columns had been converted
to factors.  Once I converted them back to numeric variables the code
worked fine.  If anybody is wondering how, you can do this with the
following:

mynumber <- as.numeric(levels(myfactor))[myfactor]

There are plenty of other threads that describe this process fully.

Martin
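
A short sketch of the root-cause fix that Jim points at below: building the data
frame directly, instead of going through cbind(), keeps the numeric columns
numeric (the vectors here are abbreviated stand-ins for the ones in the
original post):

ageband  <- 1:18
agenames <- sprintf("%02d-%02d", seq(0, 85, by = 5), seq(4, 89, by = 5))  # illustrative labels
popcount <- round(runif(18, 20000, 70000))                                # illustrative counts
test1 <- data.frame(GOR = "North East", Band = ageband,
                    AgeBand = agenames, Persons = popcount)
str(test1)   # Band and Persons stay numeric; only GOR/AgeBand are character or factor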


On Thu, Mar 17, 2011 at 3:11 AM, Jim Holtman jholt...@gmail.com wrote:

 take a look at what the output of your cbind is; my guess is that it is a 
 character matrix.  you probably want to do

 data.frame(...,...,...)

 without the cbind.

 Sent from my iPad

 On Mar 16, 2011, at 18:12, Martin Ralphs martin.ral...@gmail.com wrote:

  Dear R Help,
 
  I would be very grateful if somebody could explain why this is happening.  I
  am trying to plot a lattice barchart of a vector of numbers with age
  bandings that I have brought together into a data frame.  When I plot the
  variables in their raw vector form, it works.  When I try to plot them in
  the df, R labels the numeric axis with default factor values and treats the
  numeric vector as character.  Why is this?  Apologies if this is blindingly
  obvious, I am new to R.
 
  Here is some sample code that demonstrates the problem:
 
  library(lattice)
 
  # Load in some sample data ...
  ageband <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18)
  agenames <-
  c("00-04","05-09","10-14","15-19","20-24","25-29","30-34","35-39",
  "40-44","45-49","50-54","55-59","60-64","65-69","70-74","75-79","80-84",
  "85 plus")
  popcount <-
  c(35274,40958,41574,47973,50384,51248,65748,54854,60948,66473,70854,
  60475,61854,55848,45857,30584,25475,20574)
 
 
  region <- "North East"
  test1 <- as.data.frame(cbind(region,ageband,agenames,popcount))
  region <- "North West"
  test2 <- as.data.frame(cbind(region,ageband,agenames,popcount))
  test <- rbind(test1,test2)
  names(test) <- c("GOR","Band","AgeBand","Persons")
 
  # When I plot my numeric data from the df it is treated as character
  vector...
 
  hg1 <- barchart(AgeBand ~ Persons | GOR,
         data=test,
         origin=0,
         layout=c(2,1),
         reference=TRUE,
         xlab="Count (Persons)",
         ylab="Quinary Age Band",
         scales="free")
 
  hg1
 
  # But when I plot the source vectors in the same way, it works how I want it
  to... (without the region bit, of course)
   hg2 <- barchart(agenames ~ popcount | region,
          origin=0,
          layout=c(2,1),
          reference=TRUE,
          xlab="Count (Persons)",
          ylab="Quinary Age Band",
          scales="free")
  hg2
 
     [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Scope and apply-type functions

2011-03-17 Thread Bert Gunter
Try this:

lapply(1:5,function(i){i;function()i})[[2]]()

or

 lapply(1:5,function(i){i;function(j=i) j } )[[2]]()


## both give 2

Now try:

lapply(1:5,function(i){function(j=i) j } )[[2]]()

## gives 5 !


The problem is that if you do:

 lapply(1:5, function(h) { function(h) h })

what you get is a list of functions of the form function(h) h, where the
h in this function has nothing to do with the index h in the
lapply statement, which no longer exists. So the result is an error
message saying it can't find h.

The top 2 versions specifically put the value of i at the time the ith
function is defined into the function environment, so they exist after
the lapply call is finished.

The 3rd version obviously uses the value of i at the conclusion of the
lapply call when the list of functions is returned from the lapply
call and they need to find the value of i for their environments.

In general, the scoping with lapply gets complicated and, iirc, the
experts have said is nonstandard (Experts: please correct if wrong).
So the best advice is to try to avoid constructs like this if
possible.

-- Bert
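
A small follow-up sketch (not from the original posts): forcing the promise when
each closure is created pins the value it captures, independent of how a given R
version evaluates lapply()'s index.

adders <- lapply(1:5, function(i) { force(i); function(x) x + i })
adders[[2]](10)   # 12: each closure keeps its own, already-forced copy of i
adders[[5]](10)   # 15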


On Thu, Mar 17, 2011 at 11:01 AM, jamie.f.olson
inspired2apa...@gmail.com wrote:
 So, I've been confused by this for a while.  If I want to create functions in
 an apply, it only uses the desired value for the variable if I create a new
 local variable:

 lapply(1:5,function(h){k=h;function(){k}})[[1]]()
 [1] 1
 lapply(1:5,function(k){function(){k}})[[1]]()
 [1] 5


 Normally, a function will use values for a variable if the variable is local
 to the definition of the function:

 a = function(x){function(){x}}
 a(5)
 function(){x}

 a(5)()
 [1] 5
 a(6)()
 [1] 6


 So why doesn't this work for apply?  Does apply work more like loops, with a
 shared variable that changes values so that the first function defined
 actually uses the last value of the variable?

 a = list();for (k in 1:5){a = c(a,function(){k})}
 a[[1]]()
 [1] 5



 Or is there something else entirely?

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Scope-and-apply-type-functions-tp3385230p3385230.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need to abstract changing name of column within loop

2011-03-17 Thread jctoll
On Thu, Mar 17, 2011 at 10:32 AM, Joshua Ulrich josh.m.ulr...@gmail.com wrote:
 On Wed, Mar 16, 2011 at 6:58 PM, jctoll jct...@gmail.com wrote:
 Hi,

 I'm struggling to figure out the way to change the name of a column
 from within a loop.  The problem is I can't refer to the object by its
 actual variable name, since that will change each time through the
 loop.  My xts object is A.

 head(A)
           A.Open A.High A.Low A.Close A.Volume A.Adjusted A.Adjusted.1
 2007-01-03  34.99  35.48 34.05   34.30  2574600      34.30  1186780
 2007-01-04  34.30  34.60 33.46   34.41  2073700      34.41  1190586
 2007-01-05  34.30  34.40 34.00   34.09  2676600      34.09  1179514
 2007-01-08  33.98  34.08 33.68   33.97  1557200      33.97  1175362
 2007-01-09  34.08  34.32 33.63   34.01  1386200      34.01  1176746
 2007-01-10  34.04  34.04 33.37   33.70  2157400      33.70  1166020

 Its column names are:
 colnames(A)
 [1] A.Open       A.High       A.Low        A.Close
 A.Volume     A.Adjusted   A.Adjusted.1

 I want to change the 7th column name:
 colnames(A)[7]
 [1] A.Adjusted.1

 I need to do that through a reference to i:
 i
 [1] A

 This works:
 colnames(get(i))[7]
 [1] A.Adjusted.1

 And this is what I want to change the column name to:
 paste(i, ".MarketCap", sep = "")
 [1] A.MarketCap

 But how do I make the assignment?  This clearly doesn't work:

 colnames(get(i))[7] <- paste(i, ".MarketCap", sep = "")
 Error in colnames(get(i))[7] <- paste(i, ".MarketCap", sep = "") :
  could not find function "get<-"

 Nor does this (it creates a new object A.Adjusted.1 with a value of
 A.MarketCap) :

 assign(colnames(get(i))[7], paste(i, ".MarketCap", sep = ""))

 How can I change the name of that column within my big loop?  Any
 ideas?  Thanks!

 I usually make a copy of the object, change it, then overwrite the original:

  tmp <- get(i)
  colnames(tmp)[7] <- "foo"
  assign(i, tmp)

 Hope that helps.

 Best regards,

 Joshua Ulrich  |  FOSS Trading: www.fosstrading.com


Thank you, that does help.  It sounds like your way is similar to the
way David suggested.  At least now I can just resign myself to doing
it that way rather than torturing my mind trying to figure out a
shorter way.

Thanks,

James
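
An aside, as a sketch under the assumption that the objects can live in a named
list rather than as free-standing variables (for example via quantmod's
getSymbols(..., env = ) or a list built by hand; the symbols here are
hypothetical): the get()/assign() dance then disappears entirely.

symbols <- list(A = A, IBM = IBM)                   # hypothetical: xts objects collected in a list
for (nm in names(symbols)) {
  colnames(symbols[[nm]])[7] <- paste(nm, ".MarketCap", sep = "")
}
colnames(symbols[["A"]])[7]                         # "A.MarketCap"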

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Spatial cluster analysis of continous outcome variable

2011-03-17 Thread Jon Toledo

I attach the data (csv format). There are the 3 coordinates (but as there are 
not so many points I wanted to do 3 analyses, in each of them collapsing one 
variable). There are two variables to study; I have posted the data as a ratio 
between both states and as a percentage state between both states. The data are 
from different samples (and each sample has 3 or 6 measures). Thanks again.

 From: marchy...@hotmail.com
 To: tintin...@hotmail.com; r-help@r-project.org
 Subject: RE: [R] Spatial cluster analysis of continous outcome variable
 Date: Thu, 17 Mar 2011 15:20:09 -0400
 
 
 
 
 
 
 
 
 Did you post your data or hypothetical data?
 Usually that helps make your problem more clear and more interesting
 ( likely to get a useful response to your post).
 
 
 
 
 From: tintin...@hotmail.com
 To: r-help@r-project.org
 Date: Thu, 17 Mar 2011 17:38:14 +0100
 Subject: [R] Spatial cluster analysis of continous outcome variable
 
 
 
 Dear R Users, R Core
 Team,
 I have a two dimensional space where I measure a numerical value in two 
 situations at different points. I have measured the change and I would like 
 to test if there are areas in this 2D-space where there is a different amount 
 of change (no change, increase, decrease). I don´t know if it´s better to 
 analyse the data just with the coordinates or if its better to group them in 
 pixels (and obtain the mean value for each pixel) and then run the cluster 
 analysis. I would like to know if there is a package/function that allows me 
 to do these calculations.I would also like to know if it could be done in a 
 3D space (I have collapsed the data to 2D because I don´t have many points.
 Thanks in advance
 
 
 
 
 
 J Toledo
 [[alternative HTML version deleted]]
 
 
  __
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using barplot() with zoo -- names.arg not permitted?

2011-03-17 Thread David Wolfskill
On Thu, Mar 17, 2011 at 10:23:33AM -0400, Gabor Grothendieck wrote:
 On Thu, Mar 17, 2011 at 9:38 AM, David Wolfskill r...@catwhisker.org wrote:
 ...
  But the X-axis labels show up as large integers -- the POSIXct values
  are apparently treated as numeric quantities for this purpose.
 ...
 Please cut this all down to as small a reproducible example as you can
 manage and place it all in ***one*** code section that also generates
 the data so that we can copy a single section of code from your post
 to R.  Use dput to convert data to code for this purpose.

Thank you for that.  :-)

The data below is a bit larger than absolutely necessary, but it's
much smaller than what I would normally be using -- and I was able
to paste this into R directly (on a different machine, getting the
same results), so I believe it will suffice.

As a benefit, after going through this, I did manage to find an
invocation that is able to generate an acceptable X-axis -- at
least, sometimes.  (Changing the size of the X11 window appears to
mess things up -- at least.)  But this seems rather more complicated
than I expected.

Anyway, here's a cut/paste of the requested code section:

df <- structure(list(time = c(1297278016L, 1297278017L, 1297278018L, 
1297278019L, 1297278020L, 1297278021L, 1297278022L, 1297278023L, 
1297278024L, 1297278025L, 1297278026L, 1297278027L, 1297278028L, 
1297278029L, 1297278030L, 1297278031L, 1297278032L, 1297278033L, 
1297278034L, 1297278035L, 1297278036L, 1297278037L, 1297278038L, 
1297278039L, 1297278040L, 1297278041L, 1297278042L, 1297278043L, 
1297278044L, 1297278045L, 1297278046L, 1297278047L, 1297278048L, 
1297278049L, 1297278050L, 1297278051L, 1297278054L, 1297278055L, 
1297278056L, 1297278057L, 1297278058L, 1297278059L, 1297278060L, 
1297278061L, 1297278062L, 1297278063L, 1297278064L, 1297278065L, 
1297278066L, 1297278067L, 1297278068L, 1297278069L, 1297278070L, 
1297278071L, 1297278072L, 1297278073L, 1297278074L, 1297278075L, 
1297278076L, 1297278077L, 1297278078L, 1297278079L, 1297278080L, 
1297278081L, 1297278082L, 1297278083L, 1297278084L, 1297278085L, 
1297278086L, 1297278087L, 1297278088L, 1297278089L, 1297278090L, 
1297278091L, 1297278092L, 1297278093L, 1297278094L, 1297278095L, 
1297278096L, 1297278097L, 1297278098L, 1297278099L, 1297278100L, 
1297278101L, 1297278102L), kern.cp_time_idle = c(3193265L, 3193540L, 
3193771L, 3194037L, 3194267L, 3194458L, 3194572L, 3194602L, 3194602L, 
3194602L, 3194602L, 3194602L, 3194603L, 3194607L, 3194617L, 3194672L, 
3194802L, 3194933L, 3195067L, 3195225L, 3195378L, 3195378L, 3195418L, 
3195418L, 3195418L, 3195419L, 3195419L, 3195421L, 3195422L, 3195422L, 
3195422L, 3195422L, 3195422L, 3195450L, 3195450L, 3195664L, 3196432L, 
3196440L, 3196871L, 3196872L, 3196874L, 3196944L, 3196944L, 3197037L, 
3197130L, 3197130L, 3197132L, 3197159L, 3197159L, 3197162L, 3197166L, 
3197236L, 3197367L, 3197367L, 3197497L, 3197639L, 3197782L, 3197922L, 
3197946L, 3197947L, 3197954L, 3197954L, 3197954L, 3198194L, 3198194L, 
3198204L, 3198205L, 3198256L, 3198265L, 3198269L, 3198269L, 3198269L, 
3198269L, 3198269L, 3198270L, 3198270L, 3198270L, 3198270L, 3198270L, 
3198270L, 3198270L, 3198323L, 3198323L, 3198367L, 3198631L), 
kern.cp_time_intr = c(5142L, 5142L, 5142L, 5143L, 5145L, 
5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 
5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 
5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 
5146L, 5147L, 5147L, 5147L, 5148L, 5148L, 5149L, 5150L, 5152L, 
5152L, 5152L, 5153L, 5153L, 5155L, 5156L, 5156L, 5156L, 5156L, 
5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 
5156L, 5156L, 5156L, 5156L, 5156L, 5157L, 5157L, 5157L, 5157L, 
5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 
5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L), 
kern.cp_time_nice = c(36L, 36L, 36L, 36L, 36L, 36L, 36L, 
36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 
36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 
36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 
36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 
36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 
36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 
36L, 36L, 36L, 36L, 36L, 36L), kern.cp_time_sys = c(132583L, 
132591L, 132608L, 132618L, 132646L, 132667L, 132694L, 132737L, 
132784L, 132825L, 132833L, 132881L, 132933L, 132977L, 133017L, 
133059L, 133093L, 133121L, 133147L, 133172L, 133210L, 133220L, 
133259L, 133302L, 133342L, 133388L, 133431L, 133494L, 133547L, 
133557L, 133592L, 133631L, 133673L, 133713L, 133765L, 133785L, 
133798L, 133808L, 133816L, 133857L, 133892L, 133951L, 133960L, 
134004L, 134043L, 134109L, 134170L, 134206L, 134268L, 134329L, 
134396L, 134443L, 134486L, 134501L, 134558L, 134611L, 134654L, 
134704L, 134743L, 134789L, 134824L, 134871L, 

[R] A question about list

2011-03-17 Thread ufuk beyaztas
Hi dear all,

It may be a simple question: I have a list output with different numbers of
elements, as follows:

[[1]]
 [1]  0.86801402 -0.82974691  0.3974 -0.98566707 -4.96576856 -1.32056754
 [7] -5.54093319 -0.07600462 -1.34457280 -1.04080125  1.62843297 -0.20473912
[13]  0.30659907  2.66908117  2.53791837  0.53788013 -0.57463077  0.27708874
[19] -2.94233200  1.54565643 -6.83694100  7.21556266 -3.14823536 -1.34590796
[25]  0.78660855  5.53692735  1.22511890  7.65249980 -0.43008997 -0.10536125

[[2]]
 [1] -2.80332826  0.54414548  4.38232256 -1.38407653 -1.59241491 -1.35509664
 [7]  1.04806755 -0.27685465 -1.36671548 -3.16649719  2.17194692 -3.49404253
[13]  4.69102017  2.78297615  0.34565006  1.05954751  1.78836097 -0.80393182
[19]  3.74315304  1.17427902  1.62354686  0.53186688 -6.56519965 -3.39045485
[25]  0.01043676 -0.18857654 -0.57070351 -0.06135564  6.92331269 -1.46544614
[31] -1.65309767

[[3]]
 [1]  4.1923546  0.6319591 -0.8568113 -3.3115788 -2.4166481 -1.1543074
 [7] -0.9333245  0.2632038 -0.6909956 -3.1008763 -2.9557687  1.5382464
[13]  1.2713290  6.6527302  1.0433603 -0.9916190 -2.7724673 -1.6554250
[19]  1.8023591 -1.5101793  1.2604704 -0.2853326 -2.4312827 -0.4731487
[25]  3.5061061  1.7392190  1.5493419 -0.7203778 -0.6995221  2.7686406
[31]  6.1813364 -1.8665294

I want to compute the percentile of all of the 94 elements. I tried
quantile(data) but I got an error like
Error in sort.list(x, partial = unique(c(lo, hi))) : 
  'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
Can someone help me with this?
Thanks for any idea...

--
View this message in context: 
http://r.789695.n4.nabble.com/A-question-about-list-tp3385711p3385711.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Autocorrelation in non-linear regression model

2011-03-17 Thread pruro
Hey all!

I am working on my master thesis and I am desperate with my model. 
It looks as following:

Y(t) = β1*X1(t) + β2*X2(t) + δ*(β1*((1+c)/(δ+c))+β2)*IE(t) -
β2*α*((1+c)/(δ+c))*(δ+g)* IE(t-1)

note: c and g is a constant value

The problem I encounter is that between IE(t) and IE(t-1) there is strong
linear correlation (autocorrelation). How can I solve this problem? Of
utterly importance is to have finally a significant coefficient δ and α
which is than used for a consecutive model. However, I get either no
significant values for δ and α, or for one of the two some unrealistic
values.

Is there an option to combine both in using some non-linear time lagged
model, time series or plugged in autoregression? 

A follow-up question would be how to place penalties on this model. I
would like to restrict the values of δ and α to between 0 and 0.5 and add
penalties when they come close to the boundaries.

I really need some help, because I have been stuck on this for the last two weeks
and don't know how to go about it.

Thanks for the support

Cheers,

Bob
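
A toy sketch of one piece of this (simulated data, not the thesis model above):
nls() with algorithm = "port" accepts box constraints on named parameters, which
is one way to keep estimates such as δ and α inside (0, 0.5); a hand-written
penalised objective passed to optim() would be the alternative.

set.seed(42)
x <- 1:50
y <- 3 * exp(-0.2 * x) + rnorm(50, sd = 0.05)   # toy nonlinear data
fit <- nls(y ~ a * exp(-d * x),
           start = list(a = 1, d = 0.1),
           algorithm = "port",                  # "port" allows bounds
           lower = c(a = 0, d = 0),
           upper = c(a = 10, d = 0.5))          # d constrained to [0, 0.5]
coef(fit)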


--
View this message in context: 
http://r.789695.n4.nabble.com/Autocorrelation-in-non-linear-regression-model-tp3385647p3385647.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Histograms with strings,

2011-03-17 Thread Khanvilkar, Shashank
Hello,
Thanks in advance for any help,

I have read a CSV file in which there is a column for an IP addr as in:

tmpInFile$V2
  [1] 74.125.224.38 74.125.224.38 129.46.71.19  129.46.71.19
  [5] 129.46.71.19  129.46.71.19  129.46.71.19  129.46.71.19
  [9] 129.46.71.19  129.46.71.19  129.46.71.19  129.46.71.19

If I want to find the IP addr that has the highest occurrence (129.46.71.19, in 
this case), is there a simple way to do this?

Thanks
Shank
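
One short way to get that (a sketch; V2 is assumed to be the character or factor
column shown above):

tab <- table(tmpInFile$V2)        # occurrence count per IP address
names(which.max(tab))             # the address with the highest count
max(tab)                          # and how often it occurs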

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help with plotting a line that is multicoloured based on levels of a factor

2011-03-17 Thread Pamela Allen
 

 

Hi All, 

 

I'm trying to plot data that is a time series of flows that are associated
with a specific level, and I would like each level to represent a colour
in a line plot.  Here is some data that approximates what I'm using:

 

date=c(1:300)

flow=sin(2*pi/53*c(1:300))

levels=c(rep(c("high","med","low"),100))

data=cbind.data.frame(date, flow, levels)

 

the levels column represents the levels of flow.  What I've done so far
is to plot this data using coloured points corresponding with each flow
level:

 

colour=ifelse(data$levels=="high","red",

ifelse(data$levels=="med","green",

ifelse(data$levels=="low","blue","")))

plot(date, flow, col=colour)

 

What I would like to do instead is to plot the line of this data, not the
points.  i.e., 

plot(date, flow, type="l")

 

But I would like the colour of the line to change with each level, i.e.,

plot(date, flow, type="l", col=colour)

 

But this doesn't work because the line is continuous and the colours are
discrete.  I looked into using clipplot, but I'm not sure how I would
specify limits that would give different sections of the line correct
colours.  Does anyone know of a way to draw a line with different colours?
One way I thought of was to plot each level of flow separately and then
build the plot up, i.e., 

plot(data$date[data$levels=="high"], data$flow[data$levels=="high"],
col="red", type="l")

lines(data$date[data$levels=="med"], data$flow[data$levels=="med"],
col="green", type="l")

lines(data$date[data$levels=="low"], data$flow[data$levels=="low"],
col="blue", type="l")

 

But the line fills in data gaps, so this doesn't work.

 

Any help would be much appreciated!  Thank you.

 

-Pam Allen

pal...@hatfieldgroup.com
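
One possible sketch (reusing the date, flow and colour objects defined above):
drawing the series piecewise with segments() lets each piece take the colour of
its starting point, without the gap-filling problem of separate lines() calls.

n <- length(date)
plot(date, flow, type = "n")          # set up the axes only
segments(date[-n], flow[-n],          # from each point ...
         date[-1], flow[-1],          # ... to the next one
         col = colour[-n])            # coloured by the level at the start of each segment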

 

 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help with Time Series Plot

2011-03-17 Thread Axel Urbiz
 Dear List,

This is an embarrassing question, but I can't seem to make this work… How do I
change the font size of the xlab and of the numbers shown on the x-axis of
the time series plot below? The arguments cex.lab and cex.axis do not seem
to be passed on to the plot function.

plot(ts(rnorm(100), start=2004, freq=12),
  ylab="RQI", xlab="My X lab", col="black", cex.lab=0.1, cex.axis=0.7)
Thanks,
Axel.
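
A sketch of one workaround (assuming the issue is that plot.ts() draws its axes
and labels without forwarding cex.axis/cex.lab): set the sizes globally with
par() before the call; the values below are illustrative.

op <- par(cex.lab = 0.8, cex.axis = 0.7)      # set sizes globally instead of in the call
plot(ts(rnorm(100), start = 2004, freq = 12),
     ylab = "RQI", xlab = "My X lab", col = "black")
par(op)                                       # restore the previous settings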

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange R squared, possible error

2011-03-17 Thread derek
Thank you for your reply. It's a pity that 2 variables defined by different
formulas have the same name. If the variables had been named differently, I
wouldn't have a problem at all, and it looks as if it was done on purpose.
Because I am testing the quality of the data (the performance of the data
collection), not a model that I know a priori, R^2 gives me simple information
about that quality even for a nonlinear model.

Still, when testing different models R^2 shows nothing important. I didn't
state that a negative R^2 is meaningless. It means that the model or
the data are wrong.

--
View this message in context: 
http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3385837.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A question about list

2011-03-17 Thread Tóth Dénes

?unlist

quantile(unlist(data))
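
For specific percentiles, or a per-element summary instead of the pooled one,
something along these lines works too (a small aside, not part of the original
answer):

quantile(unlist(data), probs = c(0.05, 0.5, 0.95))   # pooled percentiles over all elements
sapply(data, quantile)                               # or a quantile table per list element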


 Hi dear all,

 It may be a simple question, i have a list output with different number of
 elements as following;

 [[1]]
  [1]  0.86801402 -0.82974691  0.3974 -0.98566707 -4.96576856
 -1.32056754
  [7] -5.54093319 -0.07600462 -1.34457280 -1.04080125  1.62843297
 -0.20473912
 [13]  0.30659907  2.66908117  2.53791837  0.53788013 -0.57463077
 0.27708874
 [19] -2.94233200  1.54565643 -6.83694100  7.21556266 -3.14823536
 -1.34590796
 [25]  0.78660855  5.53692735  1.22511890  7.65249980 -0.43008997
 -0.10536125

 [[2]]
  [1] -2.80332826  0.54414548  4.38232256 -1.38407653 -1.59241491
 -1.35509664
  [7]  1.04806755 -0.27685465 -1.36671548 -3.16649719  2.17194692
 -3.49404253
 [13]  4.69102017  2.78297615  0.34565006  1.05954751  1.78836097
 -0.80393182
 [19]  3.74315304  1.17427902  1.62354686  0.53186688 -6.56519965
 -3.39045485
 [25]  0.01043676 -0.18857654 -0.57070351 -0.06135564  6.92331269
 -1.46544614
 [31] -1.65309767

 [[3]]
  [1]  4.1923546  0.6319591 -0.8568113 -3.3115788 -2.4166481 -1.1543074
  [7] -0.9333245  0.2632038 -0.6909956 -3.1008763 -2.9557687  1.5382464
 [13]  1.2713290  6.6527302  1.0433603 -0.9916190 -2.7724673 -1.6554250
 [19]  1.8023591 -1.5101793  1.2604704 -0.2853326 -2.4312827 -0.4731487
 [25]  3.5061061  1.7392190  1.5493419 -0.7203778 -0.6995221  2.7686406
 [31]  6.1813364 -1.8665294

 I want to compute the percentile of all of 94 elements, i tried
 quantile(data) but i got an error like
 Error in sort.list(x, partial = unique(c(lo, hi))) :
   'x' must be atomic for 'sort.list'
 Have you called 'sort' on a list?
 Can some one help me abouth this ?
 Thanks for any idea...

 --
 View this message in context:
 http://r.789695.n4.nabble.com/A-question-about-list-tp3385711p3385711.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

