Re: [R] Subset of time observations where timediff > 60 secs

2009-12-08 Thread Peter Dalgaard

Gabor Grothendieck wrote:

There is an example at the end of the Prices and Returns section of
the zoo-quickref vignette in the zoo package.

library(zoo)
vignette("zoo-quickref")

If speed is your main concern check this recent thread that was posted
on R-sig-finance:
http://n4.nabble.com/SUMMARY-Reducing-an-intra-day-dataset-into-one-obs-per-second-td949612.html


Hmm, not sure the above actually solves the same problem. Looks like one 
of those rare cases where SAS's data step solves something almost 
trivially (apologies if the syntax below is not quite right), but it 
won't vectorize in any sensible way in R:


data fee;
 set foo;
 retain last;
 if time < last + 60 then delete;
 else last = time;
run;

I think the easiest fix could be to code the core loop in C (e.g., 
returning a vector of indices to retain).
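Not from the original thread, but a plain-R sketch of that core loop may help (slow on millions of points, yet a correct reference for a C version; `tt` stands for a hypothetical increasing numeric or POSIXct time vector):

```r
## Return the indices of observations to keep so that each kept
## time is at least `gap` seconds after the previously kept one.
thin_times <- function(tt, gap = 60) {
  tt <- as.numeric(tt)
  keep <- integer(length(tt))
  n <- 0L
  last <- -Inf
  for (i in seq_along(tt)) {
    if (tt[i] >= last + gap) {   # same test as the SAS step above
      n <- n + 1L
      keep[n] <- i
      last <- tt[i]
    }
  }
  keep[seq_len(n)]
}

## Times (in seconds): 0, 30, 70, 120, 125, 200
thin_times(c(0, 30, 70, 120, 125, 200))  # 1 3 6 -> keeps 0, 70, 200
```

A C implementation would follow exactly the same logic; the R version already returns the vector of indices to retain that Peter describes.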





On Mon, Dec 7, 2009 at 10:57 AM, Karl Ove Hufthammer  wrote:

Dear list members

I have a rather large vector (part of a data frame) giving the time
(date + time, POSIXct) of observations. The times are irregular (with
both small and large jumps) but increasing, and there are several
millions of them.

I now wish to reduce my data set, so that I only have observations which
are at least (for example) 60 seconds apart. Basically, I need (all) the
indices of my time variable where the difference in times are at least
60 seconds.

I thought this would be a rather simple task, but perhaps I'm tired, for
I couldn't figure out how to do it in a even moderately elegant way (not
looping over all the values, which is quite slow).

This solution seemed sensible:

x=cumsum(diff(timevar) %/% 60)
ind=c(1,cumsum(rle(x)$lengths)+1) # And perhaps removing the last value

but doesn't work, as it only captures the 'first times' in each
60-second interval following the first time value, and thus may include
times with values that are closer than 60 seconds.

I also considered round.POSIXct and trunc.POSIXct, but these are not
appropriate either, for obvious reasons.

So, any ideas how to do this in an elegant and efficient way?

--
Karl Ove Hufthammer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






--
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907



Re: [R] Ordering Zoo object

2009-12-08 Thread Achim Zeileis

On Tue, 8 Dec 2009, Bogaso wrote:



Hi all, I would like to ask how to order a Zoo object? Consider following
code


dat <- zooreg(rnorm(5), as.yearmon(as.Date("2001-01-01")), frequency=12)
dat

 Jan 2001   Feb 2001   Mar 2001   Apr 2001   May 2001
-0.8916124 -0.4516505  1.1305884 -1.4881309  0.3703734

Here I want to reverse the order, i.e. from May back to Jan, so I used the
following code:


dat[5:1]

 Jan 2001   Feb 2001   Mar 2001   Apr 2001   May 2001
-0.8916124 -0.4516505  1.1305884 -1.4881309  0.3703734

So I am not getting what I want. I would be grateful if someone could point
out the correct code.


The zoo object itself is always ordered by time. So it takes the 
observations you specified (5:1) and orders them by time again (so you get 
1:5). You need to extract the data without the time index and then proceed 
as usual:


  sort(coredata(dat))
or
  coredata(dat)[5:1]
etc.
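As a minimal self-contained illustration (requires the zoo package; the data are random, so exact values vary with the seed):

```r
library(zoo)

set.seed(1)
dat <- zooreg(rnorm(5), as.yearmon(as.Date("2001-01-01")), frequency = 12)

dat[5:1]              # still printed Jan..May: zoo re-sorts by the time index
coredata(dat)[5:1]    # plain vector, May's value first
sort(coredata(dat))   # values sorted; time index dropped
```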
Z


Thanks,
--
View this message in context: 
http://n4.nabble.com/Ordeing-Zoo-object-tp955868p955868.html
Sent from the R help mailing list archive at Nabble.com.







Re: [R] barchart() {lattice} help.

2009-12-08 Thread Deepayan Sarkar
On Sun, Dec 6, 2009 at 8:53 AM, Peng Cai  wrote:
> Please ignore my last question. I found a way to handle that. One last
> thing:
>
> I'm defining my own y scales. In the process the bars start from below the
> "y=0" line. Is there a way to get rid of that?

Have you tried adding 'origin=0' ?

-Deepayan
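A minimal sketch of that suggestion, using the built-in barley data rather than the poster's dataset (so treat the application to his data as an assumption):

```r
library(lattice)

## origin = 0 makes the bars start exactly at y = 0 instead of
## hanging below the axis
p <- barchart(yield ~ variety | site, data = barley, origin = 0)
print(p)  # lattice objects must be printed explicitly in scripts
```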

>
> Here is the code and data I'm using
>  R Code (Data read in object "dta")
>
> dta$age <- factor(dta$age, levels = c("0--4", "5--14", "15--18"), ordered =
> TRUE)
> dta$race <- factor(dta$race, levels = c("White", "Black", "Other"))
> yScale<-seq(0,1000,50)
> library(lattice)
> barchart(dta$sum ~ dta$age | dta$gender,
>  data = dta,
>  groups = dta$race,
>  ylab = "Sum of admissions over 10 years (1996-2005)",
>  xlab = "Age",
>  par.settings = simpleTheme(col = c("green1", "yellow1", "orange")),
>  key = list(space="right",
>  cex=1,
>  text=list(levels(dta$race)),
>  rectangles=list(size=1.7, border="white", col = c("green1", "yellow1",
> "orange"))),
>  strip = strip.custom(bg="greenyellow"),
>  scales=list(y=list(rot=360, at=yScale), tck=c(1,0)),
>  panel=function(x,y,...)
>  {
>  panel.abline(h=c(yScale), col.line="gray")
>            panel.barchart(x,y,...)
>  }
> )
>
>  Data:
>
> age gender  race sum
> 0--4 Female Black 145
> 0--4 Female Other  53
> 0--4 Female White  47
> 0--4   Male Black 286
> 0--4   Male Other 130
> 0--4   Male White  94
> 15--18 Female Black  30
> 15--18 Female Other   3
> 15--18 Female White   9
> 15--18   Male Black  21
> 15--18   Male Other   2
> 15--18   Male White   3
> 5--14 Female Black 138
> 5--14 Female Other  31
> 5--14 Female White  23
> 5--14   Male Black 199
> 5--14   Male Other  65
> 5--14   Male White  29
>
> Thanks,
> Peng
>
>
> On Sun, Dec 6, 2009 at 11:47 AM, Peng Cai  wrote:
>
>> Thank you Uwe, Dennis, and Gary for your help. I have one more question:
>>
>> I'm using pre-defined y-scales and trying to create grid lines.
>>
>> As the "Female" category has low sum values, its y-axis ranges from 0-150 whereas
>> "Male" ranges from 0-300. Is it possible to put them on the same scale? Here is
>> the previous code with an additional yScale code.
>>
>> R Code (Data read in object "dta")
>>
>> dta$age <- factor(dta$age, levels = c("0--4", "5--14", "15--18"), ordered =
>> TRUE)
>> dta$race <- factor(dta$race, levels = c("White", "Black", "Other"))
>> yScale<-seq(0,1000,50)
>> library(lattice)
>> barchart(dta$sum ~ dta$age | dta$gender,
>>  data = dta,
>>  groups = dta$race,
>>  stack = FALSE,
>>  aspect=0.6,
>>  layout=c(2,1),
>>
>>  ylab = "Sum of admissions over 10 years (1996-2005)",
>>  xlab = "Age",
>>  par.settings = simpleTheme(col = c("green1", "yellow1", "orange")),
>>  key = list(space="right",
>>   cex=1,
>>   text=list(levels(dta$race)),
>>   rectangles=list(size=1.7, border="white", col = c("green1", "yellow1",
>> "orange"))),
>>   strip = strip.custom(bg="greenyellow"),
>>  scales=list(relation="free", y=list(rot=360, at=yScale)),
>>  panel=function(x,y,...)
>>  {
>>   panel.abline(h=c(yScale), col.line="gray")
>>             panel.barchart(x,y,...)
>>  }
>> )
>>
>>
>> Data:
>>
>> age gender  race sum
>> 0--4 Female Black 145
>> 0--4 Female Other  53
>> 0--4 Female White  47
>> 0--4   Male Black 286
>> 0--4   Male Other 130
>> 0--4   Male White  94
>> 15--18 Female Black  30
>> 15--18 Female Other   3
>> 15--18 Female White   9
>> 15--18   Male Black  21
>> 15--18   Male Other   2
>> 15--18   Male White   3
>> 5--14 Female Black 138
>> 5--14 Female Other  31
>> 5--14 Female White  23
>> 5--14   Male Black 199
>> 5--14   Male Other  65
>> 5--14   Male White  29
>>
>> Thanks,
>> Peng
>>
>>   On Sun, Dec 6, 2009 at 11:18 AM, Gary Miller 
>> wrote:
>>
>>> Thanks Uwe, I got your suggestions part too.
>>>
>>> 2009/12/6 Uwe Ligges 
>>>
>>> >
>>> >
>>> > Peng Cai wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> I'm plotting grouped barplot using following code and data. I need help
>>> >> with
>>> >> re-ordering the labels.
>>> >>
>>> >> 1. On x-axis the factor "AGE" is grouped in order "0--4", "15--18",
>>> >> "5--14";
>>> >> whereas I would like to have it in "0--4", "5--14", "15--18".
>>> >>
>>> >> 2. If I need to re-order "RACE" variable. How can I do it assuming I
>>> need
>>> >> to
>>> >> change it on both the x-axis and legend. Currently the order is
>>> >> "Black","Other","White"; whereas I would like "White", "Black",
>>> "Other".
>>> >>
>>> >> Can anyone help, please? I'm using the following code, which works fine
>>> >> except for the above issues.
>>> >>
>>> >> Code:
>>> >>
>>> >> library(lattice)
>>> >>
>>> >
>>> > To answer your question:
>>> >
>>> >  dta$age <- factor(dta$age, levels = c("0--4", "5--14", "15--18"),
>>> >                   ordered = TRUE)
>>> >  dta$race <- factor(dta$race, levels = c("White", "Black", "Other"))
>>> >
>>> >  library(lattice)
>>> >
>>> >  barchart(sum ~ age | gender, data = dta, groups = race,
>>> >   stack = FALSE,
>>> >   ylab = "Sum of admissions over 10 years (1996-2005)",
>>> >   xlab = "Age",
>>> >   par

[R] Ordering Zoo object

2009-12-08 Thread Bogaso

Hi all, I would like to ask how to order a Zoo object? Consider following
code

> dat <- zooreg(rnorm(5), as.yearmon(as.Date("2001-01-01")), frequency=12)
> dat
  Jan 2001   Feb 2001   Mar 2001   Apr 2001   May 2001 
-0.8916124 -0.4516505  1.1305884 -1.4881309  0.3703734 

Here I want to reverse the order, i.e. from May back to Jan, so I used the
following code:

> dat[5:1]
  Jan 2001   Feb 2001   Mar 2001   Apr 2001   May 2001 
-0.8916124 -0.4516505  1.1305884 -1.4881309  0.3703734 

So I am not getting what I want. I would be grateful if someone could point
out the correct code.

Thanks,
-- 
View this message in context: 
http://n4.nabble.com/Ordeing-Zoo-object-tp955868p955868.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Serial Correlation in panel data regression

2009-12-08 Thread sayan dasgupta
Dear Sir,
Thanks for your reply
But there is still a catch. Basically I want to do a panel Tobit. I am using
the tobit() function from the AER package on panel data.
Suppose that Gasoline$lgaspcar is zero-inflated data and I do
m1 <- tobit(as.formula(paste("lgaspcar ~", rhs)), data=Gasoline)

then, if I do

library(lmtest)
coeftest(m1, vcovHC)

will it account for the heteroskedasticity and (within-country) serial
correlation of the data?


Regards
Sayan Dasgupta





On Tue, Dec 8, 2009 at 8:29 PM, Millo Giovanni
wrote:

>  Dear Sayan,
>
> there is a vcovHC method for panel models doing the White-Arellano
> covariance matrix, which is robust vs. heteroskedasticity *and* serial
> correlation, although in a different way from that of vcovHAC. You can
> supply it to coeftest as well, just as you did. The point is in estimating
> the model as a panel model in the first place.
>
> So this should do what you need:
>
>
> data("Gasoline", package="plm")
> Gasoline$f.year=as.factor(Gasoline$year)
> library(plm)
>
> rhs <- "-1 + f.year + lincomep+lrpmg+lcarpcap"
> pm1<- plm(as.formula(paste("lgaspcar ~", rhs)), data=Gasoline,
> model="pooling")
> library(lmtest)
> coeftest(pm1, vcov=vcovHC)
>
> Please refer to the package vignette for 'plm' to check what it does
> exactly. Let me know if there are any issues.
>
> Best,
> Giovanni
>
>
>
>
> -Original Message-
> From: Achim Zeileis 
> [mailto:achim.zeil...@wu-wien.ac.at
> ]
> Sent: Tue 08/12/2009 13.48
> To: sayan dasgupta
> Cc: r-help@R-project.org; yves.croiss...@let.ish-lyon.cnrs.fr; Millo
> Giovanni
> Subject: Re: Serial Correlation in panel data regression
>
> On Tue, 8 Dec 2009, sayan dasgupta wrote:
>
> > Dear R users,
> > I have a question here
> >
> > library(AER)
> > library(plm)
> > library(sandwich)
> > ## take the following data
> > data("Gasoline", package="plm")
> > Gasoline$f.year=as.factor(Gasoline$year)
> >
> > Now I run the following regression
> >
> > rhs <- "-1 + f.year + lincomep+lrpmg+lcarpcap"
> > m1<- lm(as.formula(paste("lgaspcar ~", rhs)), data=Gasoline)
> > ###Now I want to find the autocorrelation,heteroskedasticity adjusted
> > standard errors as a part of coeftest
> > ### Basically I would like to take care of the within country serial
> > correlaion
> >
> > ###that is I want to do
> > coeftest(m1, vcov=function(x) vcovHAC(x,order.by=...))
> >
> > Please suggest what should be the argument of order.by and whether that
> will
> > give me the desired result
>
> Currently, the default vcovHAC() method just implements the time series
> case. A generalization to panel data is not yet available.
>
> Maybe Yves and Giovanni (authors of "plm") have done something in that
> direction...
>
> sorry,
> Z
>
>
>




Re: [R] Significant performance difference between split of a data.frame and split of vectors

2009-12-08 Thread David Winsemius


On Dec 9, 2009, at 12:00 AM, Peng Yu wrote:

On Tue, Dec 8, 2009 at 10:37 PM, David Winsemius  wrote:


On Dec 8, 2009, at 11:28 PM, Peng Yu wrote:


I have the following code, which tests the split on a data.frame and
the split on each column (as a vector) separately. The runtimes differ
roughly tenfold. When m and k increase, the difference becomes even
bigger.

I'm wondering why the performance on data.frame is so bad. Is it a bug
in R? Can it be improved?


You might want to look at the data.table package. The author claims
significant speed improvements over data.frames.


This bug was found a long time ago and a package has been developed
around it. Shouldn't the fix be integrated into data.frame rather
than implemented in an additional package?


What bug?




David.



system.time(split(as.data.frame(x),f))


 user  system elapsed
 1.700   0.010   1.786


system.time(lapply(


+ 1:dim(x)[[2]]
+ , function(i) {
+   split(x[,i],f)
+ }
+ )
+ )
 user  system elapsed
 0.170   0.000   0.167

###
m=3
n=6
k=3000

set.seed(0)
x=replicate(n,rnorm(m))
f=sample(1:k, size=m, replace=T)

system.time(split(as.data.frame(x),f))

system.time(lapply(
  1:dim(x)[[2]]
  , function(i) {
split(x[,i],f)
  }
  )
  )



David Winsemius, MD
Heritage Laboratories
West Hartford, CT






David Winsemius, MD
Heritage Laboratories
West Hartford, CT



[R] Exporting Contingency Tables with xtable

2009-12-08 Thread Na'im R. Tyson

Dear R-philes:

I am having an issue with exporting contingency tables with xtable().
I set up a contingency table and convert it to a matrix for passing to
xtable(), as shown below.


v.cont.table <- table(v_lda$class, grps,
dnn=c("predicted", "observed"))
v.cont.mat <- as.matrix(v.cont.table)

Both produce output as follows:

observed
predicted  uh uh~
  uh  201  30
  uh~   6  10

However, when I construct the LaTeX table with xtable(v.cont.mat), I
get a good table, but without the "predicted" and "observed" headings.


\begin{table}[ht]
\begin{center}
\begin{tabular}{rrr}
  \hline
 & uh & uh\~{} \\
  \hline
uh & 201 &  30 \\
  uh\~{} &   6 &  10 \\
   \hline
\end{tabular}
\end{center}
\end{table}

Question: is there any easy way to retain or re-insert the dimension  
names from the contingency table and matrix?
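One workaround (not from the thread, so treat it as a sketch): fold the dimension names into the row and column labels before handing the matrix to xtable(), so any LaTeX converter carries them along.

```r
## Rebuild the 2x2 matrix from the counts shown above
m <- matrix(c(201, 6, 30, 10), nrow = 2,
            dimnames = list(c("uh", "uh~"), c("uh", "uh~")))

## Prefix the labels with the dimension names
rownames(m) <- paste("predicted:", rownames(m))
colnames(m) <- paste("observed:", colnames(m))
m  # xtable(m) now shows "predicted: uh" etc. in the headings
```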




Re: [R] Can elements of a list be passed as multiple arguments?

2009-12-08 Thread David Winsemius


On Dec 8, 2009, at 11:37 PM, Peng Yu wrote:


I want to split a matrix; 'u' and 'w' below are the results of two
possible approaches. However, whenever 'n' changes, the arguments passed to
mapply() have to change. Is there a way to pass the elements of a list as
multiple arguments?


You need to explain what you want in more detail. In your example  
mapply did exactly what you told it to. No errors. Three matrices.  
What were you expecting when you gave it three lists in each argument?




m=10
n=2
k=3

set.seed(0)
x=replicate(n,rnorm(m))
f=sample(1:k, size=m, replace=T)

u=split(as.data.frame(x),f)

v=lapply(
   1:dim(x)[[2]]
   , function(i) {
 split(x[,i],f)
   }
   )

w=mapply(
   function(x,y) {
 cbind(x,y)
   }
   , v[[1]], v[[2]]
   )
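If the underlying goal is to pass each element of a list as a separate argument, `do.call()` is the usual mechanism. A minimal sketch with a hypothetical three-element list (not the `v` from the thread):

```r
v <- list(a = 1:3, b = 4:6, c = 7:9)

## Equivalent to cbind(a = 1:3, b = 4:6, c = 7:9), however long v is:
do.call(cbind, v)

## The same trick works for mapply(): splice FUN and the list together
do.call(mapply, c(list(sum), v))  # elementwise sums: 12 15 18
```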

--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] Significant performance difference between split of a data.frame and split of vectors

2009-12-08 Thread Peng Yu
On Tue, Dec 8, 2009 at 10:37 PM, David Winsemius  wrote:
>
> On Dec 8, 2009, at 11:28 PM, Peng Yu wrote:
>
>> I have the following code, which tests the split on a data.frame and
>> the split on each column (as a vector) separately. The runtimes differ
>> roughly tenfold. When m and k increase, the difference becomes even
>> bigger.
>>
>> I'm wondering why the performance on data.frame is so bad. Is it a bug
>> in R? Can it be improved?
>
> You might want to look at the data.table package. The author claims
> significant speed improvements over data.frames.

This bug was found a long time ago and a package has been developed
around it. Shouldn't the fix be integrated into data.frame rather
than implemented in an additional package?

> David.
>>
>>> system.time(split(as.data.frame(x),f))
>>
>>  user  system elapsed
>>  1.700   0.010   1.786
>>>
>>> system.time(lapply(
>>
>> +         1:dim(x)[[2]]
>> +         , function(i) {
>> +           split(x[,i],f)
>> +         }
>> +         )
>> +     )
>>  user  system elapsed
>>  0.170   0.000   0.167
>>
>> ###
>> m=3
>> n=6
>> k=3000
>>
>> set.seed(0)
>> x=replicate(n,rnorm(m))
>> f=sample(1:k, size=m, replace=T)
>>
>> system.time(split(as.data.frame(x),f))
>>
>> system.time(lapply(
>>       1:dim(x)[[2]]
>>       , function(i) {
>>         split(x[,i],f)
>>       }
>>       )
>>   )
>>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>



Re: [R] How to know what vignettes are available in a package?

2009-12-08 Thread David Winsemius


On Dec 8, 2009, at 11:50 PM, Peng Yu wrote:


For any given package (for example, data.table), is there a way to
show all the available vignettes in the package (or to know whether
it has any vignettes)?



?vignette
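Concretely (using the base "grid" package as an example):

```r
## Vignettes shipped with one package:
vignette(package = "grid")

## Vignettes in all installed packages:
vignette(all = TRUE)
```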

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT



[R] How to know what vignettes are available in a package?

2009-12-08 Thread Peng Yu
For any given package (for example, data.table), is there a way to
show all the available vignettes in the package (or to know whether
it has any vignettes)?



[R] Can elements of a list be passed as multiple arguments?

2009-12-08 Thread Peng Yu
I want to split a matrix; 'u' and 'w' below are the results of two
possible approaches. However, whenever 'n' changes, the arguments passed to
mapply() have to change. Is there a way to pass the elements of a list as
multiple arguments?

m=10
n=2
k=3

set.seed(0)
x=replicate(n,rnorm(m))
f=sample(1:k, size=m, replace=T)

u=split(as.data.frame(x),f)

v=lapply(
1:dim(x)[[2]]
, function(i) {
  split(x[,i],f)
}
)

w=mapply(
function(x,y) {
  cbind(x,y)
}
, v[[1]], v[[2]]
)



Re: [R] Significant performance difference between split of a data.frame and split of vectors

2009-12-08 Thread David Winsemius


On Dec 8, 2009, at 11:28 PM, Peng Yu wrote:


I have the following code, which tests the split on a data.frame and
the split on each column (as a vector) separately. The runtimes differ
roughly tenfold. When m and k increase, the difference becomes even
bigger.

I'm wondering why the performance on data.frame is so bad. Is it a bug
in R? Can it be improved?


You might want to look at the data.table package. The author claims
significant speed improvements over data.frames.


--
David.



system.time(split(as.data.frame(x),f))

  user  system elapsed
 1.700   0.010   1.786


system.time(lapply(

+ 1:dim(x)[[2]]
+ , function(i) {
+   split(x[,i],f)
+ }
+ )
+ )
  user  system elapsed
 0.170   0.000   0.167

###
m=3
n=6
k=3000

set.seed(0)
x=replicate(n,rnorm(m))
f=sample(1:k, size=m, replace=T)

system.time(split(as.data.frame(x),f))

system.time(lapply(
   1:dim(x)[[2]]
   , function(i) {
 split(x[,i],f)
   }
   )
   )



David Winsemius, MD
Heritage Laboratories
West Hartford, CT



[R] Significant performance difference between split of a data.frame and split of vectors

2009-12-08 Thread Peng Yu
I have the following code, which tests the split on a data.frame and
the split on each column (as a vector) separately. The runtimes differ
roughly tenfold. When m and k increase, the difference becomes even
bigger.

I'm wondering why the performance on data.frame is so bad. Is it a bug
in R? Can it be improved?

> system.time(split(as.data.frame(x),f))
   user  system elapsed
  1.700   0.010   1.786
>
> system.time(lapply(
+ 1:dim(x)[[2]]
+ , function(i) {
+   split(x[,i],f)
+ }
+ )
+ )
   user  system elapsed
  0.170   0.000   0.167

###
m=3
n=6
k=3000

set.seed(0)
x=replicate(n,rnorm(m))
f=sample(1:k, size=m, replace=T)

system.time(split(as.data.frame(x),f))

system.time(lapply(
1:dim(x)[[2]]
, function(i) {
  split(x[,i],f)
}
)
)



Re: [R] problem with split eating gigabytes of memory

2009-12-08 Thread Mark Kimpel
Jim, could you provide a code snippet to illustrate what you mean?

Hadley, good point, I did not know that.

Mark

Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Skype No Voicemail please


On Tue, Dec 8, 2009 at 11:00 PM, jim holtman  wrote:

> Also instead of 'splitting' the data frame, I split the indices and then
> use those to access the information in the original dataframe.
>
>
> On Tue, Dec 8, 2009 at 9:54 PM, Mark Kimpel  wrote:
>
>> Hadley, Just as you were apparently writing I had the same thought and did
>> exactly what you suggested, converting all columns except the one that I
>> want split to character. Executed almost instantaneously without problem.
>> Thanks! Mark
>>
>> Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
>> Indiana University School of Medicine
>>
>> 15032 Hunter Court, Westfield, IN  46074
>>
>> (317) 490-5129 Work, & Mobile & VoiceMail
>> (317) 399-1219 Skype No Voicemail please
>>
>>
>>  On Tue, Dec 8, 2009 at 10:48 PM, hadley wickham 
>> wrote:
>>
>> > Hi Mark,
>> >
>> > Why are you using factors?  I think for this case you might find
>> > characters are faster and more space efficient.
>> >
>> > Alternatively, you can have a look at the plyr package which uses some
>> > tricks to keep memory usage down.
>> >
>> > Hadley
>> >
>> > On Tue, Dec 8, 2009 at 9:46 PM, Mark Kimpel  wrote:
>> > > Charles, I suspect you are correct regarding copying of the attributes.
>> > > First off, selectSubAct.df is my "real" data, which turns out to be of
>> > > the same dim() as myDataFrame below, but each column is made up of
>> > > strings, not simple letters, and there are many levels in each column,
>> > > which I did not properly duplicate in my first example. I have amended
>> > > that below, and with the split the new object size is now not 10X the
>> > > size of the original, but 100X. My "real" data is even more complex than
>> > > this, so I suspect that is where the problem lies. I need to search for
>> > > a better solution to my problem than split, for which I will start a
>> > > separate thread if I can't figure something out.
>> > >
>> > > Thanks for pointing me in the right direction,
>> > >
>> > > Mark
>> > >
>> > > myDataFrame <- data.frame(matrix(paste("The rain in Spain",
>> > > as.character(1:1400), sep = "."), ncol = 7, nrow = 399000))
>> > > mySplitVar <- factor(paste("Rainy days and Mondays",
>> > as.character(1:1400),
>> > > sep = "."))
>> > > myDataFrame <- cbind(myDataFrame, mySplitVar)
>> > > object.size(myDataFrame)
>> > > ## 12860880 bytes # ~ 13MB
>> > > myDataFrame.split <- split(myDataFrame, myDataFrame$mySplitVar)
>> > > object.size(myDataFrame.split)
>> > > ## 1,274,929,792 bytes ~ 1.2GB
>> > > object.size(selectSubAct.df)
>> > > ## 52,348,272 bytes # ~ 52MB
>> > > Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
>> > > Indiana University School of Medicine
>> > >
>> > > 15032 Hunter Court, Westfield, IN  46074
>> > >
>> > > (317) 490-5129 Work, & Mobile & VoiceMail
>> > > (317) 399-1219 Skype No Voicemail please
>> > >
>> > >
>> > > On Tue, Dec 8, 2009 at 10:22 PM, Charles C. Berry <
>> cbe...@tajo.ucsd.edu
>> > >wrote:
>> > >
>> > >> On Tue, 8 Dec 2009, Mark Kimpel wrote:
>> > >>
>> > >>> I'm having trouble using split on a very large data-set with ~1400
>> > >>> levels of the factor to be split. Unfortunately, I can't reproduce it
>> > >>> with the simple self-contained example below. As you can see, splitting
>> > >>> the artificial dataframe of size ~13MB results in a split dataframe of
>> > >>> ~144MB, with an increase in memory allocation of ~10-fold for the split
>> > >>> object. If split scales linearly, then my actual 52MB dataframe should
>> > >>> be easily handled by my 12GB of RAM, but it is not. Instead, when I try
>> > >>> to split selectSubAct.df on one of its factors with 1473 levels, my
>> > >>> memory is slowly gobbled up (plus 3GB of swap) until I cancel the
>> > >>> operation.
>> > >>>
>> > >>> Any ideas on what might be happening? Thanks, Mark
>> > >>>
>> > >>
>> > >> Each element of myDataFrame.split contains a copy of the attributes
>> of
>> > the
>> > >> parent data.frame.
>> > >>
>> > >> And probably it does scale linearly. But the scaling factor depends
>> on
>> > the
>> > >> size of the attributes that get copied, I guess.
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>> myDataFrame <- data.frame(matrix(LETTERS, ncol = 7, nrow = 399000))
>> > >>> mySplitVar <- factor(as.character(1:1400))
>> > >>> myDataFrame <- cbind(myDataFrame, mySplitVar)
>> > >>> object.size(myDataFrame)
>> > >>> ## 12860880 bytes # ~ 13MB
>> > >>> myDataFrame.split <- split(myDataFrame, myDataFrame$mySplitVar)
>> > >>> object.size(myDataFrame.split)
>> > >>> ## 144524992 bytes # ~ 144MB

Re: [R] Why I cannot get the expected values in my function

2009-12-08 Thread David Winsemius


On Dec 8, 2009, at 11:07 PM, rusers.sh wrote:


Hi,
 In the following function, I hope to save my simulated data into the
"result" dataset, but the final "result" dataset does not seem to be
generated. Why?




#Function
simdata<-function (nsim) {


# Instead why not:
cbind(x=runif(nsim), y=runif(nsim) )

 }

#simulation

simdata(10)  #correct result

 x   y
[1,] 0.2655087 0.372123900
[2,] 0.1848823 0.702374036
[3,] 0.1680415 0.807516399
[4,] 0.5858003 0.008945796
[5,] 0.2002145 0.685218596
[6,] 0.6062683 0.937641973
[7,] 0.9889093 0.397745453
[8,] 0.4662952 0.207823317
[9,] 0.2216014 0.024233910
[10,] 0.5074782 0.306768506
 But the dataset "result" was not assigned the above values. What is the
problem?

result  #wrong result??

  x  y
[1,] NA NA
[2,] NA NA
[3,] NA NA
[4,] NA NA
[5,] NA NA
[6,] NA NA
[7,] NA NA
[8,] NA NA
[9,] NA NA
[10,] NA NA
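The missing step here is scoping: `result` inside `simdata()` is a local variable, so the filled matrix only leaves the function as its return value, which must be assigned. A sketch of the original function with the assignment made explicit:

```r
simdata <- function(nsim) {
  result <- matrix(NA_real_, nrow = nsim, ncol = 2,
                   dimnames = list(NULL, c("x", "y")))
  for (i in seq_len(nsim)) {
    set.seed(i)          # as in the original post
    result[i, ] <- runif(2)
  }
  result                 # the local `result` is returned, then discarded
}

result <- simdata(10)    # assign the return value explicitly
result                   # now holds the simulated values, not NAs
```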



Thanks a lot.
--
-
Jane Chang
Queen's




David Winsemius, MD
Heritage Laboratories
West Hartford, CT

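The behaviour the poster describes follows from R's lexical scoping: result exists only inside the function's evaluation environment and is discarded when simdata() returns. A minimal sketch of the fix (based on the loop version posted in this thread) is to assign the returned matrix in the calling environment:

```r
simdata <- function(nsim) {
  result <- matrix(NA, nrow = nsim, ncol = 2)
  colnames(result) <- c("x", "y")
  for (i in seq_len(nsim)) {
    result[i, ] <- runif(2)   # fill one row per iteration
  }
  result   # value returned; the local 'result' vanishes afterwards
}

result <- simdata(10)   # capture the return value in the caller
head(result)
```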


[R] Why cannot get the expected values in my function

2009-12-08 Thread rusers.sh
Hi,
  In the following function, I hope to save my simulated data into the
"result" dataset, but the final "result" dataset does not seem to be
generated. Why?
#Function
simdata<-function (nsim) {
 result<-matrix(NA,nrow=nsim,ncol=2)
 colnames(result)<-c("x","y")
 for (i in 1:nsim) {
 set.seed(i)
 result[i,]<- cbind(runif(1),runif(1))
 }
  return(result)
 }

#simulation
> simdata(10)  #correct result
  x   y
 [1,] 0.2655087 0.372123900
 [2,] 0.1848823 0.702374036
 [3,] 0.1680415 0.807516399
 [4,] 0.5858003 0.008945796
 [5,] 0.2002145 0.685218596
 [6,] 0.6062683 0.937641973
 [7,] 0.9889093 0.397745453
 [8,] 0.4662952 0.207823317
 [9,] 0.2216014 0.024233910
[10,] 0.5074782 0.306768506
  But the dataset "result" was not assigned the above values. What is the
problem?
> result  #wrong result??
   x  y
 [1,] NA NA
 [2,] NA NA
 [3,] NA NA
 [4,] NA NA
 [5,] NA NA
 [6,] NA NA
 [7,] NA NA
 [8,] NA NA
 [9,] NA NA
[10,] NA NA
>
 Thanks a lot.
-- 
-
Jane Chang
Queen's



Re: [R] problem with split eating giga-bytes of memory

2009-12-08 Thread jim holtman
Also instead of 'splitting' the data frame, I split the indices and then use
those to access the information in the original dataframe.
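That approach can be sketched as follows, using the toy data frame from this thread (a sketch, not code from the original post):

```r
myDataFrame <- data.frame(matrix(LETTERS, ncol = 7, nrow = 399000))
myDataFrame$mySplitVar <- factor(as.character(1:1400))

# Split only the row indices: each list element is a small integer
# vector, so the data frame's attributes are not copied per group.
idx <- split(seq_len(nrow(myDataFrame)), myDataFrame$mySplitVar)

# Materialize one group on demand instead of all ~1400 at once:
group1 <- myDataFrame[idx[["1"]], ]
```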

On Tue, Dec 8, 2009 at 9:54 PM, Mark Kimpel  wrote:

> Hadley, Just as you were apparently writing I had the same thought and did
> exactly what you suggested, converting all columns except the one that I
> want split to character. Executed almost instantaneously without problem.
> Thanks! Mark
>
> Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
> Indiana University School of Medicine
>
> 15032 Hunter Court, Westfield, IN  46074
>
> (317) 490-5129 Work, & Mobile & VoiceMail
> (317) 399-1219 Skype No Voicemail please
>
>
>  On Tue, Dec 8, 2009 at 10:48 PM, hadley wickham 
> wrote:
>
> > Hi Mark,
> >
> > Why are you using factors?  I think for this case you might find
> > characters are faster and more space efficient.
> >
> > Alternatively, you can have a look at the plyr package which uses some
> > tricks to keep memory usage down.
> >
> > Hadley
> >
> > On Tue, Dec 8, 2009 at 9:46 PM, Mark Kimpel  wrote:
> > > Charles, I suspect you are correct regarding copying of the
> attributes.
> > > First off, selectSubAct.df is my "real" data, which turns out to be of
> > the
> > > same dim() as myDataFrame below, but each column is made up of strings,
> > not
> > > simple letters, and there are many levels in each column, which I did
> not
> > > properly duplicate in my first example. I have amended that below and
> > with
> > > the split the new object size is now not 10X the size of the original,
> > but
> > > 100X. My "real" data is even more complex than this, so I suspect that
> is
> > > where the problem lies. I need to search for a better solution to my
> > problem
> > > than split, for which I will start a separate thread if I can't figure
> > > something out.
> > >
> > > Thanks for pointing me in the right direction,
> > >
> > > Mark
> > >
> > > myDataFrame <- data.frame(matrix(paste("The rain in Spain",
> > > as.character(1:1400), sep = "."), ncol = 7, nrow = 399000))
> > > mySplitVar <- factor(paste("Rainy days and Mondays",
> > as.character(1:1400),
> > > sep = "."))
> > > myDataFrame <- cbind(myDataFrame, mySplitVar)
> > > object.size(myDataFrame)
> > > ## 12860880 bytes # ~ 13MB
> > > myDataFrame.split <- split(myDataFrame, myDataFrame$mySplitVar)
> > > object.size(myDataFrame.split)
> > > ## 1,274,929,792 bytes ~ 1.2GB
> > > object.size(selectSubAct.df)
> > > ## 52,348,272 bytes # ~ 52MB
> > > Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
> > > Indiana University School of Medicine
> > >
> > > 15032 Hunter Court, Westfield, IN  46074
> > >
> > > (317) 490-5129 Work, & Mobile & VoiceMail
> > > (317) 399-1219 Skype No Voicemail please
> > >
> > >
> > > On Tue, Dec 8, 2009 at 10:22 PM, Charles C. Berry <
> cbe...@tajo.ucsd.edu
> > >wrote:
> > >
> > >> On Tue, 8 Dec 2009, Mark Kimpel wrote:
> > >>
> > >>  I'm having trouble using split on a very large data-set with ~1400
> > levels
> > >>> of
> > >>> the factor to be split. Unfortunately, I can't reproduce it with the
> > >>> simple
> > >>> self-contained example below. As you can see, splitting the
> artificial
> > >>> dataframe of size ~13MB results in a split dataframe of ~ 144MB, with
> > an
> > >>> increase memory allocation of ~10 fold for the split object. If split
> > >>> scales
> > >>> linearly, then my actual 52MB dataframe should be easily handled by
> my
> > >>> 12GB
> > >>> of RAM, but it is not. Instead, when I try to split selectSubAct.df
> on
> > one
> > >>> of its factors with 1473 levels, my memory is slowly gobbled up (plus
> 3
> > GB
> > >>> of swap) until I cancel the operation.
> > >>>
> > >>> Any ideas on what might be happening? Thanks, Mark
> > >>>
> > >>
> > >> Each element of myDataFrame.split contains a copy of the attributes of
> > the
> > >> parent data.frame.
> > >>
> > >> And probably it does scale linearly. But the scaling factor depends on
> > the
> > >> size of the attributes that get copied, I guess.
> > >>
> > >>
> > >>
> > >>
> > >>> myDataFrame <- data.frame(matrix(LETTERS, ncol = 7, nrow = 399000))
> > >>> mySplitVar <- factor(as.character(1:1400))
> > >>> myDataFrame <- cbind(myDataFrame, mySplitVar)
> > >>> object.size(myDataFrame)
> > >>> ## 12860880 bytes # ~ 13MB
> > >>> myDataFrame.split <- split(myDataFrame, myDataFrame$mySplitVar)
> > >>> object.size(myDataFrame.split)
> > >>> ## 144524992 bytes # ~ 144MB
> > >>>
> > >>
> > >> Note:
> > >>
> > >>  only.attr <- lapply(myDataFrame.split,function(x)
> sapply(x,attributes))
> > >>>
> > >>>
> >
> (object.size(myDataFrame.split)-object.size(myDataFrame))/object.size(only.attr)
> > >>>
> > >> 1.03726179240978 bytes
> > >>
> > >>
> > >>>
> > >>
> > >>  object.size(selectSubAct.df)
> > >>> ## 52,348,272 bytes # ~ 52MB
> > >>>
> > >>
> > >> What was this??
> > >>
> > >>
> > >> Chuck
> > >>
> > >>
> > >>>  sessionInfo()
> > 
> > >>> R version 2.10.0 Patched (2009-10-27 r50222)
> > >>> x86_64-unknown-linux-gnu

Re: [R] problem with split eating giga-bytes of memory

2009-12-08 Thread Mark Kimpel
Hadley, Just as you were apparently writing I had the same thought and did
exactly what you suggested, converting all columns except the one that I
want split to character. Executed almost instantaneously without problem.
Thanks! Mark

Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Skype No Voicemail please


On Tue, Dec 8, 2009 at 10:48 PM, hadley wickham  wrote:

> Hi Mark,
>
> Why are you using factors?  I think for this case you might find
> characters are faster and more space efficient.
>
> Alternatively, you can have a look at the plyr package which uses some
> tricks to keep memory usage down.
>
> Hadley
>
> On Tue, Dec 8, 2009 at 9:46 PM, Mark Kimpel  wrote:
> > Charles, I suspect you are correct regarding copying of the attributes.
> > First off, selectSubAct.df is my "real" data, which turns out to be of
> the
> > same dim() as myDataFrame below, but each column is made up of strings,
> not
> > simple letters, and there are many levels in each column, which I did not
> > properly duplicate in my first example. I have amended that below and
> with
> > the split the new object size is now not 10X the size of the original,
> but
> > 100X. My "real" data is even more complex than this, so I suspect that is
> > where the problem lies. I need to search for a better solution to my
> problem
> > than split, for which I will start a separate thread if I can't figure
> > something out.
> >
> > Thanks for pointing me in the right direction,
> >
> > Mark
> >
> > myDataFrame <- data.frame(matrix(paste("The rain in Spain",
> > as.character(1:1400), sep = "."), ncol = 7, nrow = 399000))
> > mySplitVar <- factor(paste("Rainy days and Mondays",
> as.character(1:1400),
> > sep = "."))
> > myDataFrame <- cbind(myDataFrame, mySplitVar)
> > object.size(myDataFrame)
> > ## 12860880 bytes # ~ 13MB
> > myDataFrame.split <- split(myDataFrame, myDataFrame$mySplitVar)
> > object.size(myDataFrame.split)
> > ## 1,274,929,792 bytes ~ 1.2GB
> > object.size(selectSubAct.df)
> > ## 52,348,272 bytes # ~ 52MB
> > Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
> > Indiana University School of Medicine
> >
> > 15032 Hunter Court, Westfield, IN  46074
> >
> > (317) 490-5129 Work, & Mobile & VoiceMail
> > (317) 399-1219 Skype No Voicemail please
> >
> >
> > On Tue, Dec 8, 2009 at 10:22 PM, Charles C. Berry  >wrote:
> >
> >> On Tue, 8 Dec 2009, Mark Kimpel wrote:
> >>
> >>  I'm having trouble using split on a very large data-set with ~1400
> levels
> >>> of
> >>> the factor to be split. Unfortunately, I can't reproduce it with the
> >>> simple
> >>> self-contained example below. As you can see, splitting the artificial
> >>> dataframe of size ~13MB results in a split dataframe of ~ 144MB, with
> an
> >>> increase memory allocation of ~10 fold for the split object. If split
> >>> scales
> >>> linearly, then my actual 52MB dataframe should be easily handled by my
> >>> 12GB
> >>> of RAM, but it is not. Instead, when I try to split selectSubAct.df on
> one
> >>> of its factors with 1473 levels, my memory is slowly gobbled up (plus 3
> GB
> >>> of swap) until I cancel the operation.
> >>>
> >>> Any ideas on what might be happening? Thanks, Mark
> >>>
> >>
> >> Each element of myDataFrame.split contains a copy of the attributes of
> the
> >> parent data.frame.
> >>
> >> And probably it does scale linearly. But the scaling factor depends on
> the
> >> size of the attributes that get copied, I guess.
> >>
> >>
> >>
> >>
> >>> myDataFrame <- data.frame(matrix(LETTERS, ncol = 7, nrow = 399000))
> >>> mySplitVar <- factor(as.character(1:1400))
> >>> myDataFrame <- cbind(myDataFrame, mySplitVar)
> >>> object.size(myDataFrame)
> >>> ## 12860880 bytes # ~ 13MB
> >>> myDataFrame.split <- split(myDataFrame, myDataFrame$mySplitVar)
> >>> object.size(myDataFrame.split)
> >>> ## 144524992 bytes # ~ 144MB
> >>>
> >>
> >> Note:
> >>
> >>  only.attr <- lapply(myDataFrame.split,function(x) sapply(x,attributes))
> >>>
> >>>
> (object.size(myDataFrame.split)-object.size(myDataFrame))/object.size(only.attr)
> >>>
> >> 1.03726179240978 bytes
> >>
> >>
> >>>
> >>
> >>  object.size(selectSubAct.df)
> >>> ## 52,348,272 bytes # ~ 52MB
> >>>
> >>
> >> What was this??
> >>
> >>
> >> Chuck
> >>
> >>
> >>>  sessionInfo()
> 
> >>> R version 2.10.0 Patched (2009-10-27 r50222)
> >>> x86_64-unknown-linux-gnu
> >>>
> >>> locale:
> >>> [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
> >>> [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
> >>> [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
> >>> [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
> >>> [9] LC_ADDRESS=C   LC_TELEPHONE=C
> >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >>>
> >>> attached base packages:
> >>> [1] stats graphics  grDevices datasets  utils methods   base
> >>>
> > >>> loaded via a namespace (and not attached):

Re: [R] problem with split eating giga-bytes of memory

2009-12-08 Thread hadley wickham
Hi Mark,

Why are you using factors?  I think for this case you might find
characters are faster and more space efficient.

Alternatively, you can have a look at the plyr package which uses some
tricks to keep memory usage down.

Hadley
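Hadley's factor-to-character suggestion might look like the following sketch (assuming the example data frame from this thread; not code from the original post):

```r
# Convert every factor column except the split variable to character;
# character columns carry no levels attribute to be copied per group.
is_fac <- sapply(myDataFrame, is.factor)
is_fac["mySplitVar"] <- FALSE
myDataFrame[is_fac] <- lapply(myDataFrame[is_fac], as.character)

myDataFrame.split <- split(myDataFrame, myDataFrame$mySplitVar)
```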

On Tue, Dec 8, 2009 at 9:46 PM, Mark Kimpel  wrote:
> Charles, I suspect you are correct regarding copying of the attributes.
> First off, selectSubAct.df is my "real" data, which turns out to be of the
> same dim() as myDataFrame below, but each column is made up of strings, not
> simple letters, and there are many levels in each column, which I did not
> properly duplicate in my first example. I have amended that below and with
> the split the new object size is now not 10X the size of the original, but
> 100X. My "real" data is even more complex than this, so I suspect that is
> where the problem lies. I need to search for a better solution to my problem
> than split, for which I will start a separate thread if I can't figure
> something out.
>
> Thanks for pointing me in the right direction,
>
> Mark
>
> myDataFrame <- data.frame(matrix(paste("The rain in Spain",
> as.character(1:1400), sep = "."), ncol = 7, nrow = 399000))
> mySplitVar <- factor(paste("Rainy days and Mondays", as.character(1:1400),
> sep = "."))
> myDataFrame <- cbind(myDataFrame, mySplitVar)
> object.size(myDataFrame)
> ## 12860880 bytes # ~ 13MB
> myDataFrame.split <- split(myDataFrame, myDataFrame$mySplitVar)
> object.size(myDataFrame.split)
> ## 1,274,929,792 bytes ~ 1.2GB
> object.size(selectSubAct.df)
> ## 52,348,272 bytes # ~ 52MB
> Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
> Indiana University School of Medicine
>
> 15032 Hunter Court, Westfield, IN  46074
>
> (317) 490-5129 Work, & Mobile & VoiceMail
> (317) 399-1219 Skype No Voicemail please
>
>
> On Tue, Dec 8, 2009 at 10:22 PM, Charles C. Berry wrote:
>
>> On Tue, 8 Dec 2009, Mark Kimpel wrote:
>>
>>  I'm having trouble using split on a very large data-set with ~1400 levels
>>> of
>>> the factor to be split. Unfortunately, I can't reproduce it with the
>>> simple
>>> self-contained example below. As you can see, splitting the artificial
>>> dataframe of size ~13MB results in a split dataframe of ~ 144MB, with an
>>> increase memory allocation of ~10 fold for the split object. If split
>>> scales
>>> linearly, then my actual 52MB dataframe should be easily handled by my
>>> 12GB
>>> of RAM, but it is not. Instead, when I try to split selectSubAct.df on one
>>> of its factors with 1473 levels, my memory is slowly gobbled up (plus 3 GB
>>> of swap) until I cancel the operation.
>>>
>>> Any ideas on what might be happening? Thanks, Mark
>>>
>>
>> Each element of myDataFrame.split contains a copy of the attributes of the
>> parent data.frame.
>>
>> And probably it does scale linearly. But the scaling factor depends on the
>> size of the attributes that get copied, I guess.
>>
>>
>>
>>
>>> myDataFrame <- data.frame(matrix(LETTERS, ncol = 7, nrow = 399000))
>>> mySplitVar <- factor(as.character(1:1400))
>>> myDataFrame <- cbind(myDataFrame, mySplitVar)
>>> object.size(myDataFrame)
>>> ## 12860880 bytes # ~ 13MB
>>> myDataFrame.split <- split(myDataFrame, myDataFrame$mySplitVar)
>>> object.size(myDataFrame.split)
>>> ## 144524992 bytes # ~ 144MB
>>>
>>
>> Note:
>>
>>  only.attr <- lapply(myDataFrame.split,function(x) sapply(x,attributes))
>>>
>>> (object.size(myDataFrame.split)-object.size(myDataFrame))/object.size(only.attr)
>>>
>> 1.03726179240978 bytes
>>
>>
>>>
>>
>>  object.size(selectSubAct.df)
>>> ## 52,348,272 bytes # ~ 52MB
>>>
>>
>> What was this??
>>
>>
>> Chuck
>>
>>
>>>  sessionInfo()

>>> R version 2.10.0 Patched (2009-10-27 r50222)
>>> x86_64-unknown-linux-gnu
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>> [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>> [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices datasets  utils     methods   base
>>>
>>> loaded via a namespace (and not attached):
>>> [1] tools_2.10.0
>>>
>>> Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
>>> Indiana University School of Medicine
>>>
>>> 15032 Hunter Court, Westfield, IN  46074
>>>
>>> (317) 490-5129 Work, & Mobile & VoiceMail
>>> (317) 399-1219 Skype No Voicemail please
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>> Charles C. Berry                            (858) 534-2098
>>                                            Dept of Family/Preventive
>> Medicine

Re: [R] problem with split eating giga-bytes of memory

2009-12-08 Thread Mark Kimpel
Charles, I suspect you are correct regarding copying of the attributes.
First off, selectSubAct.df is my "real" data, which turns out to be of the
same dim() as myDataFrame below, but each column is made up of strings, not
simple letters, and there are many levels in each column, which I did not
properly duplicate in my first example. I have amended that below and with
the split the new object size is now not 10X the size of the original, but
100X. My "real" data is even more complex than this, so I suspect that is
where the problem lies. I need to search for a better solution to my problem
than split, for which I will start a separate thread if I can't figure
something out.

Thanks for pointing me in the right direction,

Mark

myDataFrame <- data.frame(matrix(paste("The rain in Spain",
as.character(1:1400), sep = "."), ncol = 7, nrow = 399000))
mySplitVar <- factor(paste("Rainy days and Mondays", as.character(1:1400),
sep = "."))
myDataFrame <- cbind(myDataFrame, mySplitVar)
object.size(myDataFrame)
## 12860880 bytes # ~ 13MB
myDataFrame.split <- split(myDataFrame, myDataFrame$mySplitVar)
object.size(myDataFrame.split)
## 1,274,929,792 bytes ~ 1.2GB
object.size(selectSubAct.df)
## 52,348,272 bytes # ~ 52MB
Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Skype No Voicemail please


On Tue, Dec 8, 2009 at 10:22 PM, Charles C. Berry wrote:

> On Tue, 8 Dec 2009, Mark Kimpel wrote:
>
>  I'm having trouble using split on a very large data-set with ~1400 levels
>> of
>> the factor to be split. Unfortunately, I can't reproduce it with the
>> simple
>> self-contained example below. As you can see, splitting the artificial
>> dataframe of size ~13MB results in a split dataframe of ~ 144MB, with an
>> increase memory allocation of ~10 fold for the split object. If split
>> scales
>> linearly, then my actual 52MB dataframe should be easily handled by my
>> 12GB
>> of RAM, but it is not. Instead, when I try to split selectSubAct.df on one
>> of its factors with 1473 levels, my memory is slowly gobbled up (plus 3 GB
>> of swap) until I cancel the operation.
>>
>> Any ideas on what might be happening? Thanks, Mark
>>
>
> Each element of myDataFrame.split contains a copy of the attributes of the
> parent data.frame.
>
> And probably it does scale linearly. But the scaling factor depends on the
> size of the attributes that get copied, I guess.
>
>
>
>
>> myDataFrame <- data.frame(matrix(LETTERS, ncol = 7, nrow = 399000))
>> mySplitVar <- factor(as.character(1:1400))
>> myDataFrame <- cbind(myDataFrame, mySplitVar)
>> object.size(myDataFrame)
>> ## 12860880 bytes # ~ 13MB
>> myDataFrame.split <- split(myDataFrame, myDataFrame$mySplitVar)
>> object.size(myDataFrame.split)
>> ## 144524992 bytes # ~ 144MB
>>
>
> Note:
>
>  only.attr <- lapply(myDataFrame.split,function(x) sapply(x,attributes))
>>
>> (object.size(myDataFrame.split)-object.size(myDataFrame))/object.size(only.attr)
>>
> 1.03726179240978 bytes
>
>
>>
>
>  object.size(selectSubAct.df)
>> ## 52,348,272 bytes # ~ 52MB
>>
>
> What was this??
>
>
> Chuck
>
>
>>  sessionInfo()
>>>
>> R version 2.10.0 Patched (2009-10-27 r50222)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>> [9] LC_ADDRESS=C   LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics  grDevices datasets  utils methods   base
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.10.0
>>
>> Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
>> Indiana University School of Medicine
>>
>> 15032 Hunter Court, Westfield, IN  46074
>>
>> (317) 490-5129 Work, & Mobile & VoiceMail
>> (317) 399-1219 Skype No Voicemail please
>>
>>[[alternative HTML version deleted]]
>>
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
> Charles C. Berry(858) 534-2098
>Dept of Family/Preventive
> Medicine
> E mailto:cbe...@tajo.ucsd.edu   UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>
>
>


Re: [R] re-ordering x-lables using barchart()

2009-12-08 Thread Xin Ge
In the same function as given below: can anyone suggest how I can make
the "site names" bold? Thanks!
library(lattice)
barchart(yield ~ variety | site, data = barley,
groups = year, layout = c(6,1), aspect=.7,
ylab = "Barley Yield (bushels/acre)",
scales = list(x = list(abbreviate = TRUE, rot=45, minlength =
5)))
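One way to bold the strip (site-name) labels is lattice's par.strip.text argument (a sketch; see ?barchart and ?trellis.par.set to confirm the details):

```r
library(lattice)
barchart(yield ~ variety | site, data = barley,
         groups = year, layout = c(6, 1), aspect = 0.7,
         ylab = "Barley Yield (bushels/acre)",
         par.strip.text = list(font = 2),   # font = 2 requests bold
         scales = list(x = list(abbreviate = TRUE, rot = 45,
                                minlength = 5)))
```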

On Tue, Dec 8, 2009 at 10:35 PM, Xin Ge  wrote:

> Hi Gary, yes I'm using barchart() a lot these days... and was facing same
> problem of re-ordering...
>
>
> On Tue, Dec 8, 2009 at 10:28 PM, Gary Miller wrote:
>
>> @ David and Phil: Thanks for your suggestions.
>>
>> @ Xin: Are you also working with barchart()?
>>
>> -Gary
>>
>> On Tue, Dec 8, 2009 at 5:23 PM, Phil Spector > >wrote:
>>
>> > Gary -
>> >   If you create an ordered factor, barchart will plot the
>> > sites in the order you specify.  For example, try
>> >
>> > barley$site = ordered(barley$site,c('Waseca','Morris','Grand Rapids',
>> >'Duluth','University
>> Farm','Crookston'))
>> >
>> > before plotting.
>> >
>> >- Phil Spector
>> > Statistical Computing Facility
>> > Department of Statistics
>> > UC Berkeley
>> > spec...@stat.berkeley.edu
>>  >
>> >
>> > On Tue, 8 Dec 2009, Gary Miller wrote:
>> >
>> >   Hi R Users,
>> >>
>> >> I'm trying to re-order the "site names" ("Waseca", "Morris", ...). I'm
>> >> using
>> >> following code:
>> >>
>> >> library(lattice)
>> >> barchart(yield ~ variety | site, data = barley,
>> >> groups = year, layout = c(6,1), aspect=.7,
>> >> ylab = "Barley Yield (bushels/acre)",
>> >> scales = list(x = list(abbreviate = TRUE, rot=45, minlength
>> =
>> >> 5)))
>> >>
>> >> Can anyone help please.
>> >>
>> >> -Gary
>> >>
>> >>[[alternative HTML version deleted]]
>> >>
>> >>
>> >> __
>> >> R-help@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> 
>>  >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >>
>>
>>[[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>



Re: [R] re-ordering x-lables using barchart()

2009-12-08 Thread Gary Miller
@ David and Phil: Thanks for your suggestions.

@ Xin: Are you also working with barchart()?

-Gary

On Tue, Dec 8, 2009 at 5:23 PM, Phil Spector wrote:

> Gary -
>   If you create an ordered factor, barchart will plot the
> sites in the order you specify.  For example, try
>
> barley$site = ordered(barley$site,c('Waseca','Morris','Grand Rapids',
>'Duluth','University Farm','Crookston'))
>
> before plotting.
>
>- Phil Spector
> Statistical Computing Facility
> Department of Statistics
> UC Berkeley
> spec...@stat.berkeley.edu
>
>
> On Tue, 8 Dec 2009, Gary Miller wrote:
>
>   Hi R Users,
>>
>> I'm trying to re-order the "site names" ("Waseca", "Morris", ...). I'm
>> using
>> following code:
>>
>> library(lattice)
>> barchart(yield ~ variety | site, data = barley,
>> groups = year, layout = c(6,1), aspect=.7,
>> ylab = "Barley Yield (bushels/acre)",
>> scales = list(x = list(abbreviate = TRUE, rot=45, minlength =
>> 5)))
>>
>> Can anyone help please.
>>
>> -Gary
>>
>>[[alternative HTML version deleted]]
>>
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>



Re: [R] partial match for two datasets

2009-12-08 Thread David Winsemius


On Dec 8, 2009, at 8:46 PM, Lynn Wang wrote:




Hi all,

I have two sets:

dig<-c("DAVID ADAMS","PIERS AKERMAN","SHERYLE BAGWELL","JULIAN  
BAJKOWSKI","CANDIDA BAKER")


import<-c("by DAVID ADAMS","piersAKERMAN","SHERYLE BagWEL","JULIAN  
BAJKOWSKI with ","Cand BAKER","smith green")



I want to get the following result from "import" after comparing the  
two sets


result<-c("by DAVID ADAMS","piersAKERMAN","JULIAN BAJKOWSKI with ")


> sapply(dig, function(x) grep(x, import)) > 0
     DAVID ADAMS    PIERS AKERMAN  SHERYLE BAGWELL JULIAN BAJKOWSKI    CANDIDA BAKER
            TRUE               NA               NA             TRUE               NA


# Not exactly, so a more flexible partial-match function is needed.
# Unfortunately the Levenshtein function in MiscPsycho is not vectorized:



> import<-c("by DAVID ADAMS","piersAKERMAN","SHERYLE BagWEL","JULIAN  
BAJKOWSKI with ","Cand BAKER","smith green")
> dig<-c("DAVID ADAMS","PIERS AKERMAN","SHERYLE BAGWELL","JULIAN  
BAJKOWSKI","CANDIDA BAKER")

> library(MiscPsycho)
> import<-c("by DAVID ADAMS","piersAKERMAN","SHERYLE BagWEL","JULIAN  
BAJKOWSKI with ","Cand BAKER","smith green")

> word.pairs <- expand.grid(dig,import)
> wordpairs <- lapply(word.pairs,  as.character)
> wp2 <-data.frame(dig= wordpairs[[1]], import=wordpairs[[2]],  
stringsAsFactors=F)

> wp2$distnc <- apply(wp2, 1, function(x) stringMatch( x[1], x[2] ) )
>  wp2[wp2$distnc >.7, ]
dig importdistnc
1   DAVID ADAMS by DAVID ADAMS 0.7142857
7 PIERS AKERMAN   piersAKERMAN 0.9230769
13  SHERYLE BAGWELL SHERYLE BagWEL 0.933
19 JULIAN BAJKOWSKI JULIAN BAJKOWSKI with  0.7272727
25CANDIDA BAKER Cand BAKER 0.7692308


(I think you missed a couple of obvious matches that ought to be in  
the list)
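Base R's agrep() also does approximate (Levenshtein-based) matching and avoids building the explicit pairwise grid; a sketch with the data from this thread (the max.distance value is an assumption to tune):

```r
dig    <- c("DAVID ADAMS", "PIERS AKERMAN", "SHERYLE BAGWELL",
            "JULIAN BAJKOWSKI", "CANDIDA BAKER")
import <- c("by DAVID ADAMS", "piersAKERMAN", "SHERYLE BagWEL",
            "JULIAN BAJKOWSKI with ", "Cand BAKER", "smith green")

# agrep() returns the indices of approximate matches of one pattern
# in 'import'; take the union over all names in 'dig'.
hits   <- unlist(lapply(dig, agrep, x = import,
                        max.distance = 0.3, ignore.case = TRUE))
result <- import[sort(unique(hits))]
```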


--
David




I created a "partialmatch" function as follows, but cannot get the right
result.


partialmatch<- function(x, y) as.vector(y[regexpr(as.character(x),  
as.character(y), ignore.case = TRUE)>0])


result<-partialmatch(dig,import)


[1] "by DAVID ADAMS"



Thanks,

lynn


  


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem with split eating giga-bytes of memory

2009-12-08 Thread Charles C. Berry

On Tue, 8 Dec 2009, Mark Kimpel wrote:


I'm having trouble using split on a very large data-set with ~1400 levels of
the factor to be split. Unfortunately, I can't reproduce it with the simple
self-contained example below. As you can see, splitting the artificial
dataframe of size ~13MB results in a split dataframe of ~ 144MB, with an
increase memory allocation of ~10 fold for the split object. If split scales
linearly, then my actual 52MB dataframe should be easily handled by my 12GB
of RAM, but it is not. Instead, when I try to split selectSubAct.df on one
of its factors with 1473 levels, my memory is slowly gobbled up (plus 3 GB
of swap) until I cancel the operation.

Any ideas on what might be happening? Thanks, Mark


Each element of myDataFrame.split contains a copy of the attributes of the 
parent data.frame.


And probably it does scale linearly. But the scaling factor depends on the 
size of the attributes that get copied, I guess.





myDataFrame <- data.frame(matrix(LETTERS, ncol = 7, nrow = 399000))
mySplitVar <- factor(as.character(1:1400))
myDataFrame <- cbind(myDataFrame, mySplitVar)
object.size(myDataFrame)
## 12860880 bytes # ~ 13MB
myDataFrame.split <- split(myDataFrame, myDataFrame$mySplitVar)
object.size(myDataFrame.split)
## 144524992 bytes # ~ 144MB


Note:


only.attr <- lapply(myDataFrame.split,function(x) sapply(x,attributes))
(object.size(myDataFrame.split)-object.size(myDataFrame))/object.size(only.attr)

1.03726179240978 bytes






object.size(selectSubAct.df)
## 52,348,272 bytes # ~ 52MB


What was this??


Chuck




sessionInfo()

R version 2.10.0 Patched (2009-10-27 r50222)
x86_64-unknown-linux-gnu

locale:
[1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8   LC_NAME=C
[9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices datasets  utils methods   base

loaded via a namespace (and not attached):
[1] tools_2.10.0

Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Skype No Voicemail please




Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] Reading from Google Docs

2009-12-08 Thread Farrel Buchinsky
Is anyone using RGoogleDocs? If so, have you used it in the last few weeks,
and is it working as it used to? Look at the problem I have run into.
Farrel Buchinsky




On Sat, Nov 28, 2009 at 14:25, Farrel Buchinsky  wrote:

> Thank you for the interest in my problem.
>
>
> I have been  using the same script (see below) successfully for the past 5
> months and now all of a sudden I have problems. Could R be
> functioning differently under 2.10? Could Google have changed their
> authentication procedures? In other words are you currently able to read
> spreadsheets into R the way you used to?
>
> library(RGoogleDocs)
> ps <-readline(prompt="get the password in ")
> sheets.con = getGoogleDocsConnection(getGoogleAuth("fjb...@gmail.com", ps,
> service ="wise"))
> ts2=getWorksheets("OnCall",sheets.con)
> Error in getDocs(con) : problems connecting to get the list of documents
>
> I used options(error = recover) to troubleshoot but alas I am none the
> wiser (no pun intended). I am pasting the output here. Can you see where the
> problem is coming from? [By the way, I changed my script temporarily to
> service="writely" and the getDocs command worked. If I remember correctly
> RGoogleDocs had a problem about 6 months ago whereby one could list the
> documents but not the spreadsheets and then you fixed it. ]
>
> Enter a frame number, or 0 to exit
>
> 1: getWorksheets("OnCall", sheets.con)
> 2: getDocs(con)
>
> Selection: 2
> Called from: eval(expr, envir, enclos)
> Browse[1]> objects()
> [1] "as.data.frame" "auth"  "curl"  "folders"   "h"
> "status""what"  "x"
> Browse[1]> body()
> {
> if (what %in% names(GoogleURLs))
> what = GoogleURLs[what]
> else if (is(curl, "GoogleSpreadsheetsConnection"))
> what = GoogleURLs["spreadsheets"]
> curlSetOpt(customrequest = "GET", curl = curl)
> h = basicTextGatherer()
> if (folders)
> what = paste(what, "showfolders=true", sep = "?")
> x = getURL(what, curl = curl, headerfunction = h$update,
> followlocation = TRUE, ...)
> status = parseHTTPHeader(h$value())
> if (floor(as.numeric(status[["status"]])/100) != 2)
> stop("problems connecting to get the list of documents")
> doc = xmlParse(x, asText = TRUE)
> if (toupper(xmlName(xmlRoot(doc))) == "HTML")
> stop("Can't get document list. Is the connection still valid?
> Perhaps initialize a new connection.")
> convertDocList(doc, curl, as.data.frame)
> }
> Browse[1]> status
>
> WWW-Authenticate        "GoogleLogin realm=\"http://www.google.com/accounts/ClientLogin\", service=\"writely\""
> Content-Type            "text/html; charset=UTF-8"
> Date                    "Sat, 28 Nov 2009 19:04:22 GMT"
> Expires                 "Sat, 28 Nov 2009 19:04:22 GMT"
> Cache-Control           "private, max-age=0"
> X-Content-Type-Options  "nosniff"
> X-XSS-Protection        "0"
> X-Frame-Options         "SAMEORIGIN"
> Server                  "GFE/2.0"
> Transfer-Encoding       "chunked"
> status                  "401"
> statusMessage           "Token invalid"
>
>
>
> Farrel Buchinsky
> Google Voice Tel: (412) 567-7870
>
>
>
> On Sat, Nov 28, 2009 at 12:45, Duncan Temple Lang  > wrote:
>
>>
>>
>> Farrel Buchinsky wrote:
>> > Please oh please could someone help me or at least confirm that they are
>> > having the same problem.
>> >
>> > Why am I getting the error message from RGoogleDocs
>> >
>> >> getDocs(sheets.con)
>> > Error in getDocs(sheets.con) :
>> >   problems connecting to get the list of documents
>>
>> You are using a connection to the wise service (for worksheets)
>> to get the list of documents from the document service.
>>
>> If you call getDocs() with a connection to writely, I
>> imagine it will succeed.
>>
>> So you have a token, but it is for the wrong thing.
>>
>> >
>> >
>> > How do I troubleshoot?
>>
>> The first thing is to learn about debugging in R.
>> For example,
>>
>> options(error = recover)
>>
>> getDocs(sheets.con)
>>
>> The error occurs and you are presented with a menu prompt that allows you
>> to select the call frame of interest. There is only one - getDocs().
>> Enter 1 .  Now you have an R prompt  that allows you to explore
>> the call frame.
>>
>>  objects()
>>
>>  body()
>>
>>
>> Take a look at status
>>
>>  status
>>
>>
>> WWW-Authenticate  "GoogleLogin realm=\"http://www.google.com/accounts/ClientLogin\", service=\"writely\""
>> Content-Type      "text/html; charset=UTF-8"
>> Date              "Sat, 28 Nov 2009 17:36:16 GMT"
>> Expires           "Sat, 28 Nov 2009 17:36:16 GMT"

Re: [R] Problem with if statement

2009-12-08 Thread Gabor Grothendieck
It would be easier to answer if a portion of the input were shown in
the question using dput(rf) or dput(head(rf)), but let's assume it looks
like this and that our objective is to display a table of Name vs.
DNAME counts, where DNAME consists of manufactured short names -- is
that right?

DF <- data.frame(ID = c(10, 20, 10), Name = c("A blah", "B blah", "A blah"))

# Then here are several alternatives:

# 1. using assignment as in original question
DF1 <- DF
DF1$DNAME[DF1$ID == 10] <- "A"
DF1$DNAME[DF1$ID == 20] <- "B"
with(DF1, table(Name, DNAME))

# 2. using ifelse
DF2 <- DF
DF2$DNAME <- ifelse(DF2$ID == 10, "A", "B")
with(DF2, table(Name, DNAME))

# 3. using sub assuming first word is unique
DF3 <- transform(DF, DNAME = sub(" .*", "", Name))
with(DF3, table(Name, DNAME))

# 4. using abbreviate although abbreviated names do not always look nice
DF4 <- transform(DF, DNAME = abbreviate(Name))
with(DF4, table(Name, DNAME))

# 5. subscripting by name
shortNames <- c("A blah" = "A", "B blah" = "B")
DF5 <- transform(DF, DNAME = shortNames[Name])
with(DF5, table(Name, DNAME))


On Tue, Dec 8, 2009 at 7:45 PM, Arthur Burke
 wrote:
> I am trying to use the value of an ID variable in an if statement and
> not getting the results I expected.
>
> # ID values for two school districts
>> with(rf, tapply(DistrictID, DistrictName, min) )
>
> Aberdeen School Dist. # 58         Buhl Joint School District
>                             59340                              53409
>
>
> This creates DNAME as I expected ...
>
> rf$DNAME[rf$DistrictID==59340] <- 'Aberdeen'
> rf$DNAME[rf$DistrictID==53409] <- 'Buhl'
>
>
>> with(rf, table(DistrictName, DNAME) )
>                                    DNAME
> DistrictName                         Aberdeen Buhl
>  Aberdeen School Dist. # 58              242    0
>  Buhl Joint School District                0  428
>
> But these if statements ...
>
> if(rf$DistrictID == 59340) {rf$D.NAME <- 'Aberdeen'}
> if(rf$DistrictID == 53409) {rf$D.NAME <- 'Buhl'}
>
>
> Lead to this ...
>
> with(rf, table(DistrictName, D.NAME) )
>
>                                    D.NAME
> DistrictName                         Aberdeen
>  Aberdeen School Dist. # 58              242
>  Buhl Joint School District              428
>
>
> What am I doing wrong in the if statement?
>
> Thanks!
> Art
>
> *
>
> Art Burke
>
> Associate, Evaluation Program
>
> Education Northwest
>
> 101 SW Main St, Ste 500
>
> Portland, OR 97204
>
> Phone: 503.275.9592
>
> art.bu...@educationnorthwest.org
> 
>
> http://educationnorthwest.org 
>
> We have recently changed our name to "Education Northwest" from
> "Northwest Regional Educational Laboratory." Please note the new e-mail
> and Web addresses in the signature above. You may continue to find us on
> the Web at http://www.nwrel.org   for the
> immediate future as well.
>
> 
>
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] by function ??

2009-12-08 Thread Ista Zahn
Hi,
I think you want

by(TestData[ , "RATIO"], LEAID, median)
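A hedged, self-contained illustration of the fix (toy data invented; the question suggests LEAID is a column of TestData, so it is referenced explicitly here rather than as an attached variable):

```r
# by() expects a function object (FUN), not an evaluated call like median(RATIO)
TestData <- data.frame(LEAID = c(1, 1, 2), RATIO = c(10, 20, 5))
res <- by(TestData[, "RATIO"], TestData$LEAID, median)
res   # group 1: 15, group 2: 5
```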

-Ista

On Tue, Dec 8, 2009 at 8:36 PM, L.A.  wrote:
>
> I'm just learning and this is probably very simple, but I'm stuck.
>   I'm trying to understand the by().
> This works.
> by(TestData, LEAID, summary)
>
> But, This doesn't.
>
> by(TestData, LEAID, median(RATIO))
>
>
> ERROR: could not find function "FUN"
>
> HELP!
> Thanks,
> LA
> --
> View this message in context: 
> http://n4.nabble.com/by-function-tp955789p955789.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org



[R] partial match for two datasets

2009-12-08 Thread Lynn Wang


Hi all,

I have two sets: 

dig <- c("DAVID ADAMS", "PIERS AKERMAN", "SHERYLE BAGWELL",
         "JULIAN BAJKOWSKI", "CANDIDA BAKER")

import <- c("by DAVID ADAMS", "piersAKERMAN", "SHERYLE BagWEL",
            "JULIAN BAJKOWSKI with ", "Cand BAKER", "smith green")
 
 
I want to get the following result from "import" after comparing the two sets
 
result<-c("by DAVID ADAMS","piersAKERMAN","JULIAN BAJKOWSKI with ")
 
 
I created a "partialmatch" function as follows, but cannot get the right result.

partialmatch <- function(x, y)
    as.vector(y[regexpr(as.character(x), as.character(y), ignore.case = TRUE) > 0])

result<-partialmatch(dig,import)


[1] "by DAVID ADAMS"



Thanks,

lynn
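One hedged fix (the optional-space tweak is invented here to reproduce the desired result): regexpr() is not vectorized over its pattern argument, so only the first element of dig is ever used. Looping over the patterns, and allowing the space to be optional so that "piersAKERMAN" still matches, gives the wanted subset:

```r
dig <- c("DAVID ADAMS", "PIERS AKERMAN", "SHERYLE BAGWELL",
         "JULIAN BAJKOWSKI", "CANDIDA BAKER")
import <- c("by DAVID ADAMS", "piersAKERMAN", "SHERYLE BagWEL",
            "JULIAN BAJKOWSKI with ", "Cand BAKER", "smith green")

# Make the space optional so "piersAKERMAN" matches "PIERS AKERMAN"
pats <- gsub(" ", " ?", dig, fixed = TRUE)
# One logical column per pattern; keep elements matched by any pattern
hits <- sapply(pats, function(p) grepl(p, import, ignore.case = TRUE))
result <- import[rowSums(hits) > 0]
result
# "by DAVID ADAMS"  "piersAKERMAN"  "JULIAN BAJKOWSKI with "
```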





[R] by function ??

2009-12-08 Thread L.A.

I'm just learning and this is probably very simple, but I'm stuck.
   I'm trying to understand the by().
This works.
by(TestData, LEAID, summary)

But, This doesn't.

by(TestData, LEAID, median(RATIO))


ERROR: could not find function "FUN"

HELP!
Thanks,
LA
-- 
View this message in context: 
http://n4.nabble.com/by-function-tp955789p955789.html
Sent from the R help mailing list archive at Nabble.com.



[R] Advice - get my function to work with a list

2009-12-08 Thread gcam

I really need help altering a function I have written.  The function
currently performs a specific task on a dataframe.  I now wish to be able to
use it with a splitting function by either passing it a pre-split dataframe
or by being able to designate the splitting factor.

Can anyone advise?

Here is my function:

attrition <- function(x){
    atr.model <- matrix(nrow = 8, ncol = 2)
    rownames(atr.model) <- c("Report", "Submission", "Loading", "Link",
                             "C.P.Link", "C.P.Sub", "Clear", "Performance")

    # cases reported (note: uses the global data frame 'Reported')
    atr.model["Report", 1] <- Reported[nrow(Reported), 2]
    atr.model["Report", 2] <- (atr.model["Report", 1]/atr.model["Report", 1])*100

    # submission rate
    atr.model["Submission", 1] <- nrow(x)
    atr.model["Submission", 2] <- (atr.model["Submission", 1]/atr.model["Report", 1])*100

    # loading rate
    atr.model["Loading", 1] <- sum(x$Loaded. == "Y", x$Loaded. == "PC", na.rm = TRUE)
    atr.model["Loading", 2] <- (atr.model["Loading", 1]/atr.model["Submission", 1])*100

    # link rate
    atr.model["Link", 1] <- sum(x$Linked. == "Y", na.rm = TRUE)
    atr.model["Link", 2] <- (atr.model["Link", 1]/atr.model["Loading", 1])*100

    # C.P
    atr.model["C.P.Link", 1] <- sum(x$Link.Type == "C.P", x$Link.Type == "C.C.P", na.rm = TRUE)
    atr.model["C.P.Link", 2] <- (atr.model["C.P.Link", 1]/atr.model["Link", 1])*100

    # C.P/Sub
    atr.model["C.P.Sub", 2] <- (atr.model["C.P.Link", 1]/atr.model["Submission", 1])*100

    # clear
    atr.model["Clear", 2] <- 100

    # Performance
    atr.model["Performance", 2] <- prod(atr.model["Submission", 2]/100,
                                        atr.model["Loading", 2]/100,
                                        atr.model["C.P.Link", 2]/100,
                                        atr.model["Clear", 2]/100)*100

    atr.model[, 2] <- round(atr.model[, 2], 1)

    print(atr.model)
}

As you can see it works for a dataframe now.  However I wish to build in a
split function or have it be able to deal with a split dataframe in the form
of a list.
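A hedged sketch of the list idea (toy data and summary function invented; attrition() itself would stand in for summarise_group): a function that works on one data frame can be applied group-wise with split() and lapply(), so no change to the function itself is needed.

```r
# Any one-data-frame function can be applied per group via split() + lapply()
summarise_group <- function(x) c(n = nrow(x), mean.x = mean(x$x))

d <- data.frame(g = c("a", "a", "b"), x = c(1, 3, 10))
res <- lapply(split(d, d$g), summarise_group)
res$a   # n = 2, mean.x = 2
```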

Any advice would be fantastic.

Thanks

Gareth Campbell
-- 
View this message in context: 
http://n4.nabble.com/Advice-get-my-function-to-work-with-a-list-tp955783p955783.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] re-ordering x-lables using barchart()

2009-12-08 Thread Xin Ge
Thanks David, it worked!

On Tue, Dec 8, 2009 at 5:03 PM, David Winsemius wrote:

>
> On Dec 8, 2009, at 4:42 PM, Gary Miller wrote:
>
> Hi R Users,
>>
>> I'm trying to re-order the "site names" ("Waseca", "Morris", ...). I'm
>> using
>> following code:
>>
>> library(lattice)
>>
>
> # slip this code (or one with your preferred ordering) in before the plot
> call:
>
>  barley$site <- factor(barley$site, levels = c("Waseca", "Morris",
>    "Crookston", "University Farm", "Grand Rapids", "Duluth"))
>
>
> barchart(yield ~ variety | site, data = barley,
>> groups = year, layout = c(6,1), aspect=.7,
>> ylab = "Barley Yield (bushels/acre)",
>> scales = list(x = list(abbreviate = TRUE, rot=45, minlength =
>> 5)))
>>
>
> Comes out with rather squished labels although stretching the window helps.
>
>
>
>
>> Can anyone help please.
>>
>> -Gary
>>
>>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>




Re: [R] Problem with if statement

2009-12-08 Thread Erik Iverson
You can use the "ifelse" function for vectorized conditionals like you have 
here. See ?ifelse. 
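A minimal sketch of the difference (IDs taken from the question, toy rows invented): if() looks only at the first element of a condition vector, whereas ifelse() operates element-wise.

```r
rf <- data.frame(DistrictID = c(59340, 53409, 59340))
# if (rf$DistrictID == 59340) ... would test only the first comparison;
# ifelse() evaluates every element of the condition:
rf$D.NAME <- ifelse(rf$DistrictID == 59340, "Aberdeen", "Buhl")
rf$D.NAME
# "Aberdeen" "Buhl" "Aberdeen"
```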

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of 
Arthur Burke [art.bu...@educationnorthwest.org]
Sent: Tuesday, December 08, 2009 6:45 PM
To: r-help@r-project.org
Subject: [R] Problem with if statement

I am trying to use the value of an ID variable in an if statement and
not getting the results I expected.

# ID values for two school districts
> with(rf, tapply(DistrictID, DistrictName, min) )

Aberdeen School Dist. # 58 Buhl Joint School District
 59340  53409


This creates DNAME as I expected ...

rf$DNAME[rf$DistrictID==59340] <- 'Aberdeen'
rf$DNAME[rf$DistrictID==53409] <- 'Buhl'


> with(rf, table(DistrictName, DNAME) )
DNAME
DistrictName Aberdeen Buhl
  Aberdeen School Dist. # 58      242    0
  Buhl Joint School District        0  428

But these if statements ...

if(rf$DistrictID == 59340) {rf$D.NAME <- 'Aberdeen'}
if(rf$DistrictID == 53409) {rf$D.NAME <- 'Buhl'}


Lead to this ...

with(rf, table(DistrictName, D.NAME) )

D.NAME
DistrictName Aberdeen
  Aberdeen School Dist. # 58  242
  Buhl Joint School District  428


What am I doing wrong in the if statement?

Thanks!
Art

*

Art Burke

Associate, Evaluation Program

Education Northwest

101 SW Main St, Ste 500

Portland, OR 97204

Phone: 503.275.9592

art.bu...@educationnorthwest.org


http://educationnorthwest.org 

We have recently changed our name to "Education Northwest" from
"Northwest Regional Educational Laboratory." Please note the new e-mail
and Web addresses in the signature above. You may continue to find us on
the Web at http://www.nwrel.org   for the
immediate future as well.







[R] Problem with if statement

2009-12-08 Thread Arthur Burke
I am trying to use the value of an ID variable in an if statement and
not getting the results I expected.
 
# ID values for two school districts
> with(rf, tapply(DistrictID, DistrictName, min) )

Aberdeen School Dist. # 58 Buhl Joint School District 
 59340  53409 

 
This creates DNAME as I expected ...
 
rf$DNAME[rf$DistrictID==59340] <- 'Aberdeen'
rf$DNAME[rf$DistrictID==53409] <- 'Buhl'

 
> with(rf, table(DistrictName, DNAME) )
DNAME
DistrictName Aberdeen Buhl
  Aberdeen School Dist. # 58      242    0
  Buhl Joint School District        0  428
 
But these if statements ...
 
if(rf$DistrictID == 59340) {rf$D.NAME <- 'Aberdeen'}
if(rf$DistrictID == 53409) {rf$D.NAME <- 'Buhl'} 

 
Lead to this ...
 
with(rf, table(DistrictName, D.NAME) )

D.NAME
DistrictName Aberdeen
  Aberdeen School Dist. # 58  242
  Buhl Joint School District  428

 
What am I doing wrong in the if statement?
 
Thanks!
Art

*

Art Burke

Associate, Evaluation Program

Education Northwest

101 SW Main St, Ste 500

Portland, OR 97204

Phone: 503.275.9592

art.bu...@educationnorthwest.org
 

http://educationnorthwest.org  

We have recently changed our name to "Education Northwest" from
"Northwest Regional Educational Laboratory." Please note the new e-mail
and Web addresses in the signature above. You may continue to find us on
the Web at http://www.nwrel.org   for the
immediate future as well.







Re: [R] conditionally merging adjacent rows in a data frame

2009-12-08 Thread Gabor Grothendieck
Here are a couple of solutions.  The first uses by and the second sqldf:

> Lines <- " rt dur tid  mood roi  x
+ 55 5523 200   4  subj   9  5
+ 56 5523  52   4  subj   7 31
+ 57 5523 209   4  subj   4  9
+ 58 5523 188   4  subj   4  7
+ 70 4016 264   5 indic   9 51
+ 71 4016 195   5 indic   4 14"
> d <- read.table(textConnection(Lines), header = TRUE)
>
>
> # solution 1 - see ?by and ?transform
>
> idx <- cumsum( c(TRUE,diff(d$roi)!=0) )
> do.call(rbind, by(d, idx, function(x)
+ transform(x, dur = sum(dur), x = mean(x))[1,,drop = FALSE ]))
rt dur tid  mood roi  x
1 5523 200   4  subj   9  5
2 5523  52   4  subj   7 31
3 5523 397   4  subj   4  8
4 4016 264   5 indic   9 51
5 4016 195   5 indic   4 14
>
> # solution 2 - see http://sqldf.googlecode.com
>
> dd <- data.frame(d, idx) # idx computed above
> library(sqldf)
> sqldf("select rt, sum(dur) dur, tid, mood, roi, avg(x) x from dd group by 
> idx")
rt dur tid  mood roi  x
1 5523 200   4  subj   9  5
2 5523  52   4  subj   7 31
3 5523 397   4  subj   4  8
4 4016 264   5 indic   9 51
5 4016 195   5 indic   4 14


On Tue, Dec 8, 2009 at 7:50 AM, Titus von der Malsburg
 wrote:
> Hi, I have a data frame and want to merge adjacent rows if some condition is
> met.  There's an obvious solution using a loop but it is prohibitively slow
> because my data frame is large.  Is there an efficient canonical solution for
> that?
>
>> head(d)
>     rt dur tid  mood roi  x
> 55 5523 200   4  subj   9  5
> 56 5523  52   4  subj   7 31
> 57 5523 209   4  subj   4  9
> 58 5523 188   4  subj   4  7
> 70 4016 264   5 indic   9 51
> 71 4016 195   5 indic   4 14
>
> The desired result would have consecutive rows with the same roi value merged.
> dur values should be added and x values averaged, other values don't differ in
> these rows and should stay the same.
>
>> head(result)
>     rt dur tid  mood roi  x
> 55 5523 200   4  subj   9  5
> 56 5523  52   4  subj   7 31
> 57 5523 397   4  subj   4  8
> 70 4016 264   5 indic   9 51
> 71 4016 195   5 indic   4 14
>
> There's also a solution using reshape.  It uses an index for blocks
>
>  d$index <- cumsum(c(TRUE,diff(d$roi)!=0))
>
> melts and then casts for every column using an appropriate fun.aggregate.
> However, this is a bit cumbersome and also I'm not sure how to make sure that
> I get the original order of rows.
>
> Thanks for any suggestion.
>
>  Titus
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] Formatting the length one vector to match another?

2009-12-08 Thread David Winsemius

Or perhaps just:

plot(xrange, yrange, type ="s")

--
On Dec 8, 2009, at 6:00 PM, Peter Alspach wrote:


Tena koe Oliver

?seq

should help if I understand your question correctly 

Peter Alspach


-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Wells Oliver
Sent: Wednesday, 9 December 2009 11:19 a.m.
To: r-help@r-project.org
Subject: [R] Formatting the length one vector to match another?

I have xrange which is a range of values from 1 to a max of 162.

I have a yrange of values which really could be any values,
but there's a min and a max. I'd like to create N number of
steps between the min and the max so the length matches the
xrange, so that I can plot them together.

Any tips? Thank you!

--
Wells Oliver
we...@submute.net



David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] Formatting the length one vector to match another?

2009-12-08 Thread Peter Alspach
Tena koe Oliver

?seq

should help if I understand your question correctly 
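A minimal sketch of the ?seq approach (the yrange values are invented for illustration):

```r
xrange <- 1:162
yrange <- c(0.3, 7.9)   # any observed min and max
# N equally spaced steps, where N matches length(xrange):
ysteps <- seq(min(yrange), max(yrange), length.out = length(xrange))
length(ysteps)   # 162, so plot(xrange, ysteps) now works
```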

Peter Alspach 

> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Wells Oliver
> Sent: Wednesday, 9 December 2009 11:19 a.m.
> To: r-help@r-project.org
> Subject: [R] Formatting the length one vector to match another?
> 
> I have xrange which is a range of values from 1 to a max of 162.
> 
> I have a yrange of values which really could be any values, 
> but there's a min and a max. I'd like to create N number of 
> steps between the min and the max so the length matches the 
> xrange, so that I can plot them together.
> 
> Any tips? Thank you!
> 
> --
> Wells Oliver
> we...@submute.net
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



Re: [R] R Split comma separated list

2009-12-08 Thread Carl Witthoft

All the solutions presented work, but, gosh,
why not fix the source file in the first place?

One way:  Open the text file in Excel or a clone thereof, and tell the 
importer dialog box to use both tabs and commas as delimiters.  Then 
save the result, which will have your 'third column' nicely split into 
separate sets of data.


Actually, I would recommend going back to whatever (stupid, incompetent)
tool produced that file full of inconsistent delimiters and fixing the
problem at the source.
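If editing the source file is not an option, a hedged in-R sketch (file contents invented; a real file would be read with readLines()) is to normalize the delimiters before parsing:

```r
# Stand-in for readLines("mixed.txt"): tab-delimited, with one comma-split column
txt <- c("a\tb\tc,d",
         "1\t2\t3,4")
clean <- gsub("\t", ",", txt, fixed = TRUE)   # one delimiter everywhere
dat <- read.csv(text = paste(clean, collapse = "\n"), header = FALSE)
dim(dat)   # 2 rows, 4 columns -- the third column is now split in two
```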



Carl



Re: [R] conditionally merging adjacent rows in a data frame

2009-12-08 Thread Marek Janad
I sent my last message only to Titus. Sorry :)
Here is my suggestion:

Instead of aggregate, try summaryBy from doBy package. It's much
faster. And this package made my life easier :)

try:
summaryBy(dur + x ~ index, data = d, FUN = c(sum, mean))
but index should be a column in the data frame, as I remember.

I haven't checked this code, so let me know if it doesn't work.
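Without doBy, the same per-column aggregation can be sketched in base R (toy data modelled on the thread's example):

```r
d <- data.frame(index = c(1, 1, 2), dur = c(200, 52, 264), x = c(5, 31, 51))
# Sum dur and average x within each index group
agg <- data.frame(dur = tapply(d$dur, d$index, sum),
                  x   = tapply(d$x,   d$index, mean))
agg   # group 1: dur 252, x 18; group 2: dur 264, x 51
```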

2009/12/8 Titus von der Malsburg :
> On Tue, Dec 8, 2009 at 5:19 PM, Nikhil Kaza  wrote:
>> I suppose that is true, but the example data seem to suggest that it is
>> sorted by rt.
>
> I was not very clear on that.  Sorry.
>
>> d$count <- 1
>>  a <- with(d, aggregate(subset(d, select=c("dur", "x", "count"),
>> list(rt=rt,tid=tid,mood=mood,roi=roi), sum))
>> a$x <- a$x/a$count
>
> This is neat!
>
>> But it would still be nice to get a generic way that uses different
>> functions on different columns much like excel's pivot table.
>
> I guess the most straight-forward thing would be to extend aggregate
> to also accept instead of a FUN a list of FUNs where the first is
> applied to value of the first column of x (the data frame), the second
> to the second column, and so on.
>
>  Titus
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Marek



Re: [R] grep() exclude certain patterns?

2009-12-08 Thread Peng Yu
On Fri, Dec 4, 2009 at 6:12 PM, Gavin Simpson  wrote:
> On Fri, 2009-12-04 at 15:18 -0600, Peng Yu wrote:
>> On Fri, Dec 4, 2009 at 3:06 PM, Peng Yu  wrote:
>> > On Fri, Dec 4, 2009 at 2:35 PM, Greg Snow  wrote:
>> >> The invert argument seems a likely candidate, you could also do perl=TRUE 
>> >> and use negations within the pattern (but that is probably overkill for 
>> >> your original question).
> 
>> Here is another bad example. See ?rep. The Usage section has 'rep(x,
>> ...)'. However, '...' is only explained later in Arguments. I know
>> that it is probably because '...' is from functions underlying rep().
>> But it does not matter to end users whether they are from an
>> underlying function or not. Why not put the arguments in the Usage
>> section?
>
> Because '...' is the argument to rep. Should we document every argument
> for every rep method in existence within this help file? The '...' has
> nothing to do with functions underlying rep if you mean "inside" rep, it
> is just the means by which arguments are passed from the generic to
> methods for objects of particular classes.

I knew what '...' means.

Let's take the following example to be clear what I mean. In order to
know what additional arguments can be passed to fun1() besides 'data',
'data.frame', 'graph', 'limit', I need to know fun1() calls par(). But
the information of which functions are called are not well documented
in the help. Then how can a user know the complete set of arguments
that can be passed to fun1()? I would say '...' makes the
implementation of functions easier but makes maintenance more difficult.

fun1 <- function(data, data.frame, graph=TRUE, limit=20, ...) {
[omitted statements]
if (graph)
par(pch="*", ...)
[more omissions]
}
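For contrast, a small runnable sketch (function name invented) of how '...' forwards otherwise-undeclared arguments:

```r
# wrap_mean() declares no 'trim' or 'na.rm'; '...' forwards them to mean()
wrap_mean <- function(x, ...) mean(x, ...)
wrap_mean(c(1, 2, 3, 100, NA), trim = 0.25, na.rm = TRUE)
# 2.5 -- the NA is dropped, then the 25%-trimmed mean of 1, 2, 3, 100
```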

C has a similar '...' mechanism for accepting arbitrary arguments, which
was useful in the old days. But I don't see many working examples in C++
that use '...': C++ largely eliminates it by passing classes as
arguments. Eliminating '...' would make the code more understandable,
and I believe this is possible in R if we want to do so.

>> Similar cases can be found in the help of many functions.
>
> Probably because that's how the S3 method system works.
>
> If you learn how and why the R system works, things will make much more
> sense.
>
> G
>
> 
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
>  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
>  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
>  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>
>



Re: [R] arrow plots

2009-12-08 Thread Jim Lemon

On 12/09/2009 05:42 AM, Cable, Samuel B Civ USAF AFMC AFRL/RVBXI wrote:

Am doing some vector plots with the arrows() function.  Works well.  But
what I need to do is supply an arrow for scaling for the reader.  I need
to plot an arrow of some known magnitude somewhere on the page
(preferably outside the bounds of the plot, so that it can be seen
clearly) with some text underneath it that says, for instance, "10
kg-m/sec".  Any ideas?  Thanks.

   

Hi Samuel,
To add to what Gavin wrote, you can see code for a key of this sort in 
the lengthKey function in the plotrix package.


Jim
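In base graphics alone, a rough sketch (positions and data invented) of drawing a labelled reference arrow outside the plot region:

```r
plot(1:10, 1:10, type = "n")
op <- par(xpd = NA)                      # allow drawing outside the plot region
arrows(2, 11.5, 4, 11.5, length = 0.1)   # reference arrow above the plot
text(3, 11.0, "10 kg-m/sec")             # magnitude label underneath it
par(op)                                  # restore clipping
```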



Re: [R] grep() exclude certain patterns?

2009-12-08 Thread Peng Yu
On Fri, Dec 4, 2009 at 11:17 PM, Greg Snow  wrote:
> I am sure that you mentioned before that you are using 2.7.1, and possibly 
> even why, but with the number of posts to this list each day and the number 
> of different posters, I cannot keep track of what version everyone is using 
> (well, I probably could, but I am unwilling to put in the time/effort 
> required, and I don't expect anyone else to do it either).  Unless you remind 
> us otherwise, we will generally assume that you are using a reasonably up to 
> date version in answering questions.
>
> You will have a much easier time if you upgrade to a recent version of R, or 
> at least get a copy of the docs for the recent version.  Even if you don't 
> have rights on the computer you mainly use (common on university campuses), 
> you can have a portable version.  I have a copy of R installed on a flash 
> drive that I can run on someone else's computer without having to install 
> anything on it, or look up a reference, etc.
>
> Think about how long it has been since you asked the original question, and 
> you still don't have a usable (for you) answer.  What if instead you had done 
> a little comparison between your version of R and the current version and 
> phrased your question like:
>
> I would like to select some elements of a vector that do not match a given 
> pattern, in R 2.10 I could use the grep function with the argument 
> invert=TRUE, but I would like the script to be able to run on a computer that 
> I do not control with only R version 2.7.1 which has the grep function, but 
> not the invert argument.  Any suggestions?
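An editor's sketch of the workaround that rephrased question is after (pattern and data invented): indexing with setdiff() negates a grep() match in any R version, and unlike x[-grep(...)] it is safe when there are no matches.

```r
x <- c("apple", "banana", "cherry")
# grep(..., invert = TRUE) needs R >= 2.9; this works in 2.7.1 too:
keep <- setdiff(seq_along(x), grep("an", x))
x[keep]
# "apple" "cherry"
```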
>
> If that had been your original question (note that it shows what effort was 
> made, what restrictions you are working under, and other details), I bet that 
> you would have something productive by now.
>
> See further comments interspersed below:
>
>
>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
>> project.org] On Behalf Of Peng Yu
>> Sent: Friday, December 04, 2009 2:06 PM
>> To: r-h...@stat.math.ethz.ch
>> Subject: Re: [R] grep() exclude certain patterns?
>>
>> On Fri, Dec 4, 2009 at 2:35 PM, Greg Snow  wrote:
>> > The invert argument seems a likely candidate, you could also do
>> perl=TRUE and use negations within the pattern (but that is probably
>> overkill for your original question).
>
> [snip]
>
>
>> > Could you explain to us the process that you use to search for
>> answers to your questions before posting?  You have been asking quite a
>> few questions that have answers out there if you can find them.  If you
>> tell us where you are looking (and why) then we may be able to suggest
>> some different search strategies that will help you find the answers
>> quicker.  Also knowing your thought process may help us in designing
>> future help/tutorials that cater more to people learning R for the
>> first time, things that seem obvious to those of us who have been using
>> the current documentation, apparently are not that obvious to some new
>> users (but also realize that the first place that you may think to look
>> may not even occur to some of us that learned computers in a different
>> time, see fortune(89) ).
>>
>> For this particular problem in the original post, it is due to the
>> fact that I use an older R.
>
> I had hoped that you would give us a bit more about your learning process 
> than just a list of a few help pages that you have read.  What tutorials or 
> other documents have you read?, what classes have you taken?, etc.  Also what 
> was your search process, where did you look? What search terms did you use? 
> Etc.

I have read R-intro.pdf, but several parts of it did not make sense to me
when I was new to R, whether I read the document in order or not.

Let's take section 6.3 'Data frames' as an example.

Suppose that a guy had extensive experience with at least one major
programming language (such as C++, Python, etc.), but was new to R.
He wanted to pick up R quickly by working with it, and he needed to learn
about data frames to finish a task under pressure. He read R-intro.pdf,
found that Section 6.3 covers data frames, and went on to read it.

'A data frame is a list with class "data.frame". There are
restrictions on lists that may be made into data frames, namely...'

Does this sentence make sense to him? No. He wants to understand what a
data frame is, but the document explains data.frame by assuming the reader
already knows what a list is, and he doesn't. Many R people on the mailing
list would say he should read about lists first. My response is: why should
he have to? An example of a data frame at the beginning of Section 6.3
would be a much better explanation than the current one with its reference
to 'list'.

Then the guy went on to Section 6.3.1, hoping to understand data
frames. But now 'read.table' jumps out, with a reference to a later
chapter. A

[R] Formatting the length one vector to match another?

2009-12-08 Thread Wells Oliver
I have xrange which is a range of values from 1 to a max of 162.

I have a yrange of values which really could be any values, but there's a
min and a max. I'd like to create N number of steps between the min and the
max so the length matches the xrange, so that I can plot them together.

Any tips? Thank you!

-- 
Wells Oliver
we...@submute.net
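A likely answer (an editor's sketch, not part of the original message): seq() with length.out generates exactly as many evenly spaced steps as xrange has elements. The yvals below are invented stand-ins for the real y data:

```r
xrange <- 1:162
yvals  <- c(3.2, 7.5, 1.1, 4.8)  # hypothetical y data with a min and a max
# N evenly spaced steps from min to max, N matching the length of xrange:
ysteps <- seq(min(yvals), max(yvals), length.out = length(xrange))
length(ysteps)        # 162, matching xrange
plot(xrange, ysteps)  # now the two can be plotted together
```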

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] conditionally merging adjacent rows in a data frame

2009-12-08 Thread Titus von der Malsburg
On Tue, Dec 8, 2009 at 5:19 PM, Nikhil Kaza  wrote:
> I suppose that is true, but the example data seem to suggest that it is
> sorted by rt.

I was not very clear on that.  Sorry.

> d$count <- 1
>  a <- with(d, aggregate(subset(d, select = c("dur", "x", "count")),
> list(rt = rt, tid = tid, mood = mood, roi = roi), sum))
> a$x <- a$x / a$count

This is neat!

> But it would still be nice to get a generic way that uses different
> functions on different columns much like excel's pivot table.

I guess the most straightforward thing would be to extend aggregate to
accept, instead of a single FUN, a list of FUNs, where the first is
applied to the first column of x (the data frame), the second to the
second column, and so on.

  Titus
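Pending such an extension, a workable sketch (column and group names below are invented) is to split the data frame by group and apply a different function to each column inside the per-group function:

```r
d <- data.frame(g   = c("a", "a", "b"),
                dur = c(1, 2, 3),
                x   = c(10, 20, 30))
# sum dur but average x within each level of g, pivot-table style:
res <- do.call(rbind, lapply(split(d, d$g), function(block)
  data.frame(dur = sum(block$dur), x = mean(block$x))))
res
#   dur  x
# a   3 15
# b   3 30
```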



[R] problem with split eating giga-bytes of memory

2009-12-08 Thread Mark Kimpel
I'm having trouble using split on a very large data-set with ~1400 levels of
the factor to be split. Unfortunately, I can't reproduce it with the simple
self-contained example below. As you can see, splitting the artificial
dataframe of size ~13MB results in a split dataframe of ~ 144MB, with an
increase memory allocation of ~10 fold for the split object. If split scales
linearly, then my actual 52MB dataframe should be easily handled by my 12GB
of RAM, but it is not. Instead, when I try to split selectSubAct.df on one
of its factors with 1473 levels, my memory is slowly gobbled up (plus 3 GB
of swap) until I cancel the operation.

Any ideas on what might be happening? Thanks, Mark

myDataFrame <- data.frame(matrix(LETTERS, ncol = 7, nrow = 399000))
mySplitVar <- factor(as.character(1:1400))
myDataFrame <- cbind(myDataFrame, mySplitVar)
object.size(myDataFrame)
## 12860880 bytes # ~ 13MB
myDataFrame.split <- split(myDataFrame, myDataFrame$mySplitVar)
object.size(myDataFrame.split)
## 144524992 bytes # ~ 144MB
object.size(selectSubAct.df)
## 52,348,272 bytes # ~ 52MB
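One way to sidestep the blow-up (an editor's sketch, not from the thread, with smaller made-up dimensions): split the row indices rather than the data frame itself. The split object is then just ~1400 small integer vectors, and each group's rows are materialized only on demand:

```r
df <- data.frame(matrix(rep(LETTERS, length.out = 2800 * 7), ncol = 7))
df$g <- factor(rep(1:1400, length.out = 2800))
# a list of 1400 integer vectors instead of 1400 data frames:
idx <- split(seq_len(nrow(df)), df$g)
one_group <- df[idx[["7"]], ]  # materialize a single group only when needed
length(idx)  # 1400
```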

> sessionInfo()
R version 2.10.0 Patched (2009-10-27 r50222)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices datasets  utils methods   base

loaded via a namespace (and not attached):
[1] tools_2.10.0

Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Skype No Voicemail please



Re: [R] re-ordering x-lables using barchart()

2009-12-08 Thread David Winsemius


On Dec 8, 2009, at 4:42 PM, Gary Miller wrote:


Hi R Users,

I'm trying to re-order the "site names" ("Waseca", "Morris", ...). I'm using
the following code:

library(lattice)


# slip this code (or one with your preferred ordering) in before the plot call:

barley$site <- factor(barley$site,
                      levels = c("Waseca", "Morris", "Crookston",
                                 "University Farm", "Grand Rapids", "Duluth"))

barchart(yield ~ variety | site, data = barley,
         groups = year, layout = c(6, 1), aspect = .7,
         ylab = "Barley Yield (bushels/acre)",
         scales = list(x = list(abbreviate = TRUE, rot = 45, minlength = 5)))

Comes out with rather squished labels, although stretching the window helps.


Can anyone help please.

-Gary




David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] Split comma separated list

2009-12-08 Thread Gaurav Moghe
Thanks a lot to everyone who replied. I used some of Stephan Kolassa's
suggestions (thanks, Stephan). Here's my final code which worked (with
comments to explain novices like me who come looking for the same answer)

x <- read.table("file", sep = "\t", colClasses = "character")
prate <- x[, 2]
lrate <- x[, 3]  # this reads column 3

for (i in 1:nrow(x)) {  # for each row in the file (R indexes from 1, not 0)
  line1 <- lrate[i]     # take the i-th entry
  lsp <- as.vector(as.numeric(unlist(strsplit(line1, ","))))  # see below
}

1) strsplit() splits the line by commas into a list
2) unlist() "unlists" the list
3) as.vector(as.numeric()) converts it to a numeric vector for plotting a
boxplot

It is possible that the other approaches in this thread also work, some
more efficiently.

Thanks again to everyone who replied

Gaurav
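For later readers, the loop can be collapsed into one pass. A self-contained editor's sketch on data mimicking the file format (the real file is not available, so the values are taken from the thread; read.table's text= argument needs a reasonably recent R, and older versions can use textConnection() as Phil Spector showed):

```r
txt <- "ID1\t0.342\t0.01,1.2,0,0.323,0.67
ID2\t0.010\t0.987,0.056,1.3,1.5,0.4
ID3\t0.146\t0.1173,0.1494,0.211,0.1257"
x <- read.table(text = txt, sep = "\t", colClasses = "character")
vals <- lapply(strsplit(x[[3]], ","), as.numeric)  # one numeric vector per row
names(vals) <- x[[1]]
boxplot(vals)  # one boxplot per row
```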


On Tue, Dec 8, 2009 at 4:05 PM, Phil Spector wrote:

> Gaurav -
>   Here's one way:
>
>  x = textConnection('ID1  0.342 0.01,1.2,0,0.323,0.67
>>
> + ID2 0.010  0.987,0.056,1.3,1.5,0.4
> + ID3 0.146  0.1173,0.1494,0.211,0.1257
> + + ')
>
>> y = read.table(x,stringsAsFactors=FALSE)
>> res = apply(y,1,function(x)as.numeric(strsplit(x[3],',')[[1]]))
>> names(res) = y[,1]
>> boxplot(res)
>>
>
>- Phil Spector
> Statistical Computing Facility
> Department of Statistics
> UC Berkeley
> spec...@stat.berkeley.edu
>
>
>
> On Tue, 8 Dec 2009, Gaurav Moghe wrote:
>
>  Hi all,
>>
>> I'm a beginner user of R. I am stuck at what I thought was a very obvious
>> problem, but surprisingly, I havent found any solution on the forum or
>> online till now.
>>
>> My problem is simple. I have a file which has entries like the following:
>> #ID   Value1List_of_values
>> ID1  0.342 0.01,1.2,0,0.323,0.67
>> ID2 0.010  0.987,0.056,1.3,1.5,0.4
>> ID3 0.146  0.1173,0.1494,0.211,0.1257
>> ...
>> ...
>>
>> I want to split the third column (by comma) into individual values and put
>> them in a variable so that I can plot a boxplot with those values, one
>> boxplot per row . I have been having three issues:
>> 1) R identifies the third column as an integer, instead of a list of lists
>> 2) I havent been able to split the third column into individual values
>> 3) How do I get it in a format suitable for plotting a boxplot?
>>
>> Any suggestions? I'd really appreciate any help on this.
>>
>> Thank you,
>> Gaurav
>>
>>
>>



Re: [R] igraph plot - vertex colors

2009-12-08 Thread Jakson A. Aquino
On Sun, Dec 06, 2009 at 04:34:18PM -0800, Brock Tibert wrote:
> I have successfully created and analyzed my network data.  I am
> new to R, and Network Analysis too, but I want to color my
> vertex based on some of the centrality measures calculated.  
> 
> Can someone point me in the right direction? Again, I am new to
> R, but given how powerful R appears to be, I figure this is
> probably pretty easy to do, I just wish I could figure it out.

Below is an example of how to do it. Suppose you have a igraph
object called g:

hc5 <- heat.colors(5)
g.bet <- betweenness(g)
g.bet.max <- max(g.bet)
vcolors <- 5 - round(4 *(g.bet / g.bet.max))
vcolors <- hc5[vcolors]
plot(g, vertex.color=vcolors, main = "Betweenness")

-- 
Jakson



Re: [R] problem in labeling the nodes of tree drawn by rpart

2009-12-08 Thread Frank E Harrell Jr
Depending on your sample size, you might be able to just label the nodes 
by drawing a random sample from the variable names :-)


Frank

kaida ning wrote:

Hi all,

I used rpart to fit a model, where the covariates are categorical variables.
Then I plotted the tree (mytree) and used the command "text" to add labels
to the tree.

In the nodes of the tree, the values of the covariates are represented with
a, b or c (tree attached).
Is there a way to show the real value(s) of the variable in the nodes
instead of a, b or c ?
I found that the command "labels(mytree,minlength=3)" can give me the
desired label, but I don't know how to add it to the tree.


Best,
Kaida
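An editor's note, hedged: per the help page for text.rpart, its pretty argument controls how factor levels are labelled, and pretty = 0 spells out full level names instead of the a/b/c abbreviations. A sketch on a built-in dataset (Species is the categorical covariate here; rpart ships with R as a recommended package):

```r
library(rpart)
fit <- rpart(Sepal.Length ~ Species, data = iris)  # one categorical covariate
plot(fit)
text(fit)              # node labels abbreviate the levels to a, b, c
plot(fit)
text(fit, pretty = 0)  # pretty = 0 shows the full level names at each split
```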


--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] Split comma separated list

2009-12-08 Thread David Winsemius

On Dec 8, 2009, at 4:12 PM, Gaurav Moghe wrote:

> Hi David,
>
> 1) My code is as follows:
> x=read.table("file",sep="\t")
> prate<-x[,2]
> lrates<-(x[,3])
> When I do:
> print (typeof(lrates)): I get "integer"

You've already received several solutions from people more R-savvy  
than I,  so I will try instead to explain what went wrong with your  
code. I will bet you get:

 > class(lrates)
[1] "factor"

typeof() can be rather misleading when it comes to factors. They are
represented internally as integers, even though they look like characters
when printed.

>
> When I do:
> for (line1 in lrates) {
> lsp<-unlist(strsplit(line1,"\\,"))
>   }
> I get some intermediate value

If you had attempted the above with strsplit(as.character(line1), ",") it
would likely have worked, since you almost certainly were operating on a
factor. The as.character() coercion function will give you the character
labels of the levels of a factor.
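The point about factors can be seen in three lines (an editor's sketch, not from the thread):

```r
f <- factor(c("0.1,0.2", "0.3,0.4"))
typeof(f)        # "integer": the internal storage of the level codes
class(f)         # "factor": what the object actually behaves as
as.character(f)  # recovers the printable labels, safe to pass to strsplit()
```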


> 2) With options()$dec, I get "NULL"

No problem here. I was wondering if you might be in a locale that uses
the period as a grouping mark and the comma as the decimal point.

-- 
David


>
> I wonder whether that's what you want,
> Gaurav
>
> On Tue, Dec 8, 2009 at 4:03 PM, David Winsemius  > wrote:
> Two questions:
>
> What is your code?
>
> What do you get with:
> > options()$dec
> decimal_point
>  "."
>
> -- David
>
>
> On Dec 8, 2009, at 3:55 PM, Gaurav Moghe wrote:
>
> Hi all,
>
> I'm a beginner user of R. I am stuck at what I thought was a very  
> obvious
> problem, but surprisingly, I havent found any solution on the forum or
> online till now.
>
> My problem is simple. I have a file which has entries like the  
> following:
> #ID   Value1List_of_values
> ID1  0.342 0.01,1.2,0,0.323,0.67
> ID2 0.010  0.987,0.056,1.3,1.5,0.4
> ID3 0.146  0.1173,0.1494,0.211,0.1257
> ...
> ...
>
> I want to split the third column (by comma) into individual values  
> and put
> them in a variable so that I can plot a boxplot with those values, one
> boxplot per row . I have been having three issues:
> 1) R identifies the third column as an integer, instead of a list of  
> lists
> 2) I havent been able to split the third column into individual values
> 3) How do I get it in a format suitable for plotting a boxplot?
>
> Any suggestions? I'd really appreciate any help on this.
>
> Thank you,
> Gaurav
>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>
>
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT




[R] re-ordering x-lables using barchart()

2009-12-08 Thread Gary Miller
Hi R Users,

I'm trying to re-order the "site names" ("Waseca", "Morris", ...). I'm using
following code:

library(lattice)
barchart(yield ~ variety | site, data = barley,
  groups = year, layout = c(6,1), aspect=.7,
  ylab = "Barley Yield (bushels/acre)",
  scales = list(x = list(abbreviate = TRUE, rot=45, minlength =
5)))

Can anyone help please.

-Gary



Re: [R] Split comma separated list

2009-12-08 Thread William Dunlap
> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Gaurav Moghe
> Sent: Tuesday, December 08, 2009 12:56 PM
> To: r-help@r-project.org
> Subject: [R] Split comma separated list
> 
> Hi all,
> 
> I'm a beginner user of R. I am stuck at what I thought was a 
> very obvious
> problem, but surprisingly, I havent found any solution on the forum or
> online till now.
> 
> My problem is simple. I have a file which has entries like 
> the following:
> #ID   Value1List_of_values
> ID1  0.342 0.01,1.2,0,0.323,0.67
> ID2 0.010  0.987,0.056,1.3,1.5,0.4
> ID3 0.146  0.1173,0.1494,0.211,0.1257
> ...
> ...
> 
> I want to split the third column (by comma) into individual 
> values and put
> them in a variable so that I can plot a boxplot with those values, one
> boxplot per row . I have been having three issues:
> 1) R identifies the third column as an integer, instead of a 
> list of lists
> 2) I havent been able to split the third column into individual values

For 1) and 2) try:
  > z <- read.table(textConnection(input), header = TRUE, comment = "",
                    row.names = 1, stringsAsFactors = FALSE)
  > z$List <- strsplit(z$List_of_values, ",")  # strsplit fails on factors
  > z$List <- lapply(z$List, as.numeric)       # strings -> numbers
  > z
      Value1             List_of_values                              List
  ID1  0.342      0.01,1.2,0,0.323,0.67 0.010, 1.200, 0.000, 0.323, 0.670
  ID2  0.010    0.987,0.056,1.3,1.5,0.4 0.987, 0.056, 1.300, 1.500, 0.400
  ID3  0.146 0.1173,0.1494,0.211,0.1257    0.1173, 0.1494, 0.2110, 0.1257

> 3) How do I get it in a format suitable for plotting a boxplot?

  > boxplot(z$List, names=z$Value1)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> Any suggestions? I'd really appreciate any help on this.
> 
> Thank you,
> Gaurav
> 
> 



Re: [R] {Lattice} help.

2009-12-08 Thread Greg Snow
Here are a couple of others to try (using the lattice package):

dotplot(Factor1 ~ Value | Factor2, data=foo, groups=Factor3, auto.key=T)
dotplot(Factor1 ~ jitter(Value) | Factor2, data=foo, groups=Factor3, auto.key=T)
dotplot(Factor3 ~ Value | Factor2*Factor1, data=foo )
dotplot(Factor1:Factor3 ~ Value | Factor2, data=foo )

also see the dotchart2 function in the Hmisc package for another version of the 
dotplot.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Xin Ge
> Sent: Monday, December 07, 2009 6:50 PM
> To: r-help@r-project.org
> Subject: [R] {Lattice} help.
> 
> Hi All,
> 
> I have a 4-dimensional data. I'm using barchart() function from lattice
> package. The R code and data are below - code includes one for
> stack=TRUE
> and other for stack=FALSE.
> 
> I would like to present the data in another form which would be
> plotting
> Factor3 levels (P, Q, R, S) as two stacked bars (side by side). Like,
> for
> each level of Factor1 there should be two bars: first bar showing
> stacked
> values of "P" and "Q" and the adjacent bar showing stacked values of
> "R" and
> "S". Is it possible using barchart() function?
> 
> OR, if someone can give me some suggestions on the best way to present
> such data. Any help would be highly appreciated.
> 
> # Reading data in object "foo"
> 
> barchart(foo$Value ~ foo$Factor1 | foo$Factor2, data = foo,
>  groups = foo$Factor3, stack = TRUE,
>  auto.key = list(points = FALSE, rectangles = TRUE, space =
> "right"))
> barchart(foo$Value ~ foo$Factor1 | foo$Factor2, data = foo,
>  groups = foo$Factor3, stack = FALSE,
>  auto.key = list(points = FALSE, rectangles = TRUE, space =
> "right"))
> 
> # Data
> 
> Factor1  Factor2  Factor3  Value
> A        X        P        10
> A        X        Q        20
> A        X        R        10
> A        X        S        20
> A        Y        P        20
> A        Y        Q        5
> A        Y        R        20
> A        Y        S        5
> B        X        P        20
> B        X        Q        10
> B        X        R        20
> B        X        S        10
> B        Y        P        30
> B        Y        Q        50
> B        Y        R        30
> B        Y        S        50
> Thanks,
> ~Xin
> 



Re: [R] Split comma separated list

2009-12-08 Thread Gaurav Moghe
Hi David,

1) My code is as follows:
x=read.table("file",sep="\t")
prate<-x[,2]
lrates<-(x[,3])
When I do:
print (typeof(lrates)): I get "integer"

When I do:
for (line1 in lrates) {
lsp<-unlist(strsplit(line1,"\\,"))
  }
I get some intermediate value
2) With options()$dec, I get "NULL"

I wonder whether that's what you want,
Gaurav

On Tue, Dec 8, 2009 at 4:03 PM, David Winsemius wrote:

> Two questions:
>
> What is your code?
>
> What do you get with:
> > options()$dec
> decimal_point
>  "."
>
> -- David
>
>
> On Dec 8, 2009, at 3:55 PM, Gaurav Moghe wrote:
>
>  Hi all,
>>
>> I'm a beginner user of R. I am stuck at what I thought was a very obvious
>> problem, but surprisingly, I havent found any solution on the forum or
>> online till now.
>>
>> My problem is simple. I have a file which has entries like the following:
>> #ID   Value1List_of_values
>> ID1  0.342 0.01,1.2,0,0.323,0.67
>> ID2 0.010  0.987,0.056,1.3,1.5,0.4
>> ID3 0.146  0.1173,0.1494,0.211,0.1257
>> ...
>> ...
>>
>> I want to split the third column (by comma) into individual values and put
>> them in a variable so that I can plot a boxplot with those values, one
>> boxplot per row . I have been having three issues:
>> 1) R identifies the third column as an integer, instead of a list of lists
>> 2) I havent been able to split the third column into individual values
>> 3) How do I get it in a format suitable for plotting a boxplot?
>>
>> Any suggestions? I'd really appreciate any help on this.
>>
>> Thank you,
>> Gaurav
>>
>>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>



Re: [R] Split comma separated list

2009-12-08 Thread Stephan Kolassa

Hi Gaurav,

1) tell R when reading the data to consider the third column as 
"character" via the colClasses argument to read.table()


2) foo <- lapply(strsplit(dataset$List_of_values, ","), as.numeric)

3) unlist(foo) or some such

HTH,
Stephan


Gaurav Moghe schrieb:

Hi all,

I'm a beginner user of R. I am stuck at what I thought was a very obvious
problem, but surprisingly, I havent found any solution on the forum or
online till now.

My problem is simple. I have a file which has entries like the following:
#ID   Value1List_of_values
ID1  0.342 0.01,1.2,0,0.323,0.67
ID2 0.010  0.987,0.056,1.3,1.5,0.4
ID3 0.146  0.1173,0.1494,0.211,0.1257
...
...

I want to split the third column (by comma) into individual values and put
them in a variable so that I can plot a boxplot with those values, one
boxplot per row . I have been having three issues:
1) R identifies the third column as an integer, instead of a list of lists
2) I havent been able to split the third column into individual values
3) How do I get it in a format suitable for plotting a boxplot?

Any suggestions? I'd really appreciate any help on this.

Thank you,
Gaurav






Re: [R] Quadratcount problem in spatstat

2009-12-08 Thread Rolf Turner


Sebastian:  Can you send me (off-list) your point pattern (ppp_cameroon)
so that I can experiment with it and try to figure out what's
going wrong?  (I am one of the maintainers of spatstat.)

Save the point pattern using dput(), e.g.

dput(ppp_cameroon,"cameroon.dput")

and then attach ``cameroon.dput'' to your email.

Thanks.

cheers,

Rolf Turner

On 9/12/2009, at 9:38 AM, Sebastian Schutte wrote:


Hi,

I know there are older threads discussing the quadratcount function in
spatstat. Unfortunately, I could not find a solution to my problem there.

I'm analyzing a point pattern in an irregular polygonal window. Both the
window (an entire country) and the points are projected using WGS84.
When I do quadratcount with only one quadrat for the entire country it
holds all my 154 points:

test <- quadratcount(ppp_cameroon, nx=1, ny=1)
plot(test)
sum(test)
[1] 154

But when I split the country up into several quadrats (which is the only
thing that makes sense for the statistics), the function returns results
that are plainly wrong. The counts always add up to 154, but when
overlaid with the actual points it looks like some points are not counted
at all (quadrats reported as 0 where points clearly lie) and some count
values are much higher than they should be (visually empty quadrats
reported as holding 20 points).

The problem does not occur with the humberside example data. I can also
reproduce everything in example(quadratcount) correctly with humberside,
but not with my own data set.

Quadrat.test and quadrat.test.ppp lead to the same problem.

I'm out of ideas, so thank you very much for any suggestions!
Sebastian

P.S:
I'm running R 2.10.0 under Ubuntu Linux 9.10 with spatstat 1.17-2





Re: [R] Split comma separated list

2009-12-08 Thread Phil Spector

Gaurav -
   Here's one way:


x = textConnection('ID1  0.342 0.01,1.2,0,0.323,0.67

+ ID2 0.010  0.987,0.056,1.3,1.5,0.4
+ ID3 0.146  0.1173,0.1494,0.211,0.1257
+ 
+ ')

y = read.table(x,stringsAsFactors=FALSE)
res = apply(y,1,function(x)as.numeric(strsplit(x[3],',')[[1]]))
names(res) = y[,1]
boxplot(res)


- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu


On Tue, 8 Dec 2009, Gaurav Moghe wrote:


Hi all,

I'm a beginner user of R. I am stuck at what I thought was a very obvious
problem, but surprisingly, I havent found any solution on the forum or
online till now.

My problem is simple. I have a file which has entries like the following:
#ID   Value1List_of_values
ID1  0.342 0.01,1.2,0,0.323,0.67
ID2 0.010  0.987,0.056,1.3,1.5,0.4
ID3 0.146  0.1173,0.1494,0.211,0.1257
...
...

I want to split the third column (by comma) into individual values and put
them in a variable so that I can plot a boxplot with those values, one
boxplot per row . I have been having three issues:
1) R identifies the third column as an integer, instead of a list of lists
2) I havent been able to split the third column into individual values
3) How do I get it in a format suitable for plotting a boxplot?

Any suggestions? I'd really appreciate any help on this.

Thank you,
Gaurav






Re: [R] Changing border width in barplot ?

2009-12-08 Thread Tal Galili
Hi Marc,
You answered my question in depth, leaving me to go with another solution.

Thank you for the detailed answer!
Best,
Tal



Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com/ (English)




On Tue, Dec 8, 2009 at 10:38 PM, Marc Schwartz  wrote:

>
> On Dec 8, 2009, at 2:26 PM, Tal Galili wrote:
>
>  Is it possible?
>> I was hoping to find something like:
>> lwd
>> for the different bars in the barplot but couldn't find it.
>> Does it exist ?
>>
>> Thanks,
>> Tal
>>
>
>
>
> barplot() calls rect() via a wrapper function internally to draw the
> rectangles. rect() uses par("lwd") to define the line widths by default:
>
>  par(mfrow = c(2, 1))
>
>  barplot(1:5)
>
>  par(lwd = 3)
>
>  barplot(1:5)
>
>
> If you wanted to define the widths of the lines on a per bar basis, you
> would need to manually create the bars using rect() directly, since there is
> no other way that I can see to pass a vector of line widths as the code is
> presently implemented.
>
> You might want to look at the barchart() function in lattice to see if
> there is more functionality there.
>
> HTH,
>
> Marc Schwartz
>
>
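Marc's per-bar suggestion might look like this: a sketch assuming barplot()'s default bar width of 1, so each bar spans its returned midpoint plus or minus 0.5 (heights are invented):

```r
heights <- 1:5
mids <- barplot(heights, border = NA)     # draw the bars first, no borders
rect(mids - 0.5, 0, mids + 0.5, heights,  # then redraw borders, one lwd each
     lwd = 1:5, border = "black", col = NA)
```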



Re: [R] Split comma separated list

2009-12-08 Thread David Winsemius

Two questions:

What is your code?

What do you get with:
> options()$dec
decimal_point
  "."

--  
David


On Dec 8, 2009, at 3:55 PM, Gaurav Moghe wrote:


Hi all,

I'm a beginner user of R. I am stuck at what I thought was a very obvious
problem, but surprisingly, I haven't found any solution on the forum or
online till now.

My problem is simple. I have a file which has entries like the following:

#ID    Value1    List_of_values
ID1    0.342     0.01,1.2,0,0.323,0.67
ID2    0.010     0.987,0.056,1.3,1.5,0.4
ID3    0.146     0.1173,0.1494,0.211,0.1257
...
...

I want to split the third column (by comma) into individual values and put
them in a variable so that I can plot a boxplot with those values, one
boxplot per row. I have been having three issues:
1) R identifies the third column as an integer, instead of a list of lists
2) I haven't been able to split the third column into individual values
3) How do I get it in a format suitable for plotting a boxplot?

Any suggestions? I'd really appreciate any help on this.

Thank you,
Gaurav

[[alternative HTML version deleted]]



David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Split comma separated list

2009-12-08 Thread Gaurav Moghe
Hi all,

I'm a beginner user of R. I am stuck at what I thought was a very obvious
problem, but surprisingly, I haven't found any solution on the forum or
online till now.

My problem is simple. I have a file which has entries like the following:
#ID    Value1    List_of_values
ID1    0.342     0.01,1.2,0,0.323,0.67
ID2    0.010     0.987,0.056,1.3,1.5,0.4
ID3    0.146     0.1173,0.1494,0.211,0.1257
...
...

I want to split the third column (by comma) into individual values and put
them in a variable so that I can plot a boxplot with those values, one
boxplot per row. I have been having three issues:
1) R identifies the third column as an integer, instead of a list of lists
2) I haven't been able to split the third column into individual values
3) How do I get it in a format suitable for plotting a boxplot?

Any suggestions? I'd really appreciate any help on this.

Thank you,
Gaurav
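[A minimal sketch of one way to approach this, using a small data frame built inline instead of the poster's actual file (the file itself is not available); with read.table() one would pass colClasses = c("character", "numeric", "character") to keep the third column as character:]

```r
# Reconstruct a data frame like the sample above
df <- data.frame(ID = c("ID1", "ID2", "ID3"),
                 Value1 = c(0.342, 0.010, 0.146),
                 List_of_values = c("0.01,1.2,0,0.323,0.67",
                                    "0.987,0.056,1.3,1.5,0.4",
                                    "0.1173,0.1494,0.211,0.1257"),
                 stringsAsFactors = FALSE)

# Split each entry on commas and convert to numeric:
# the result is a list of numeric vectors, one per row
vals <- lapply(strsplit(df$List_of_values, ","), as.numeric)
names(vals) <- df$ID

# boxplot() accepts such a list directly: one box per row
boxplot(vals)
```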

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Upgrading To 2.10 from 2.6.2

2009-12-08 Thread stephen's mailinglist account
On Tue, Dec 8, 2009 at 8:52 PM, stephen's mailinglist account
 wrote:
> On Tue, Dec 8, 2009 at 7:14 PM,   wrote:
>> Hi Stephen,
>>
>> After running the script
>>
>> sudo apt-get update
>> sudo apt-get install r-base
>>
>> I launch R and find that it still refers to R 2.6.2.
>>
>> How do I ensure that R 2.10.0 is the active version of R?
>>
>> Thanks for the help.
>>
>> Steve
>>
>>
>>
>> Steve Friedman Ph. D.
>> Spatial Statistical Analyst
>> Everglades and Dry Tortugas National Park
>> 950 N Krome Ave (3rd Floor)
>> Homestead, Florida 33034
>>
>> steve_fried...@nps.gov
>> Office (305) 224 - 4282
>> Fax     (305) 224 - 4147
>>
>>
>>
>> From: "stephen's mailinglist account" <i...@googlemail.com>
>> To: steve_fried...@nps.gov, r-h...@r-project.org
>> Date: 12/08/2009 01:43 PM
>> Subject: Re: [R] Upgrading To 2.10 from 2.6.2
>>
>> On Tue, Dec 8, 2009 at 6:24 PM, stephen's mailinglist account
>>  wrote:
>>>
>>>
>>> On Tue, Dec 8, 2009 at 2:25 PM, Iain Gallagher
>>  wrote:

 Hi Steve

 Have you tried:

 apt-cache search gfortran

 in a terminal window.

 Then

 sudo apt-get install theRelevantPackage

 I think you also need the Universe repos enabled.

 HTH

 Iain

 --- On Tue, 8/12/09, steve_fried...@nps.gov 
>> wrote:

 > From: steve_fried...@nps.gov 
 > Subject: [R] Upgrading To 2.10 from 2.6.2
 > To: r-help@r-project.org
 > Date: Tuesday, 8 December, 2009, 13:38
 >
 > Hello
 >
 > I have a Linux machine (Ubuntu 8.04 hardy, Gcc version
 > 4.2.4
 > (i486-linux-gnu) currently running R 2.6.2. I'd like to
 > upgrade to 2.10.
 >
 > First Question):  What is the appropriate way to
 > remove the old version of
 > R?
 >
 >
 > Part 2.
 >  After downloading  r-base_2.10.0.orig.tar.gz and
 > opening the archive. I
 > ran the ./configure routine.
 >
 > It failed claiming that it could not find the F77
 > compiler.
 >

>> Apologies posted with some unintended HTML format which was scrubbed on
>> list
>>
>> repost
>>
>> Why not add the CRAN repository to your sources list in Ubuntu and let
>> your package manager (synaptic?/apt?) sort it out for you.  Version
>> 2.10 is available here for Hardy.  It is a good way to keep your
>> version of R up to date without worrying about what is happening with
>> the rest of your machine, which may be quite comfortably stable, and
>> it saves the hassle of you having to compile it yourself
>>
>> see pages at http://cran.r-project.org/bin/linux/ubuntu/
>>
>> You will need to add a line similar to the one below to
>> /etc/apt/sources.list
>>
>> deb http://cran.r-project.org/bin/linux/ubuntu hardy/
>>
>> you could edit the file directly or do it via synaptic or Gnome menu
>> (system/admin)
>>  has an entry for software sources.
>>
>>
>>
>>
>
Did you see whether it installed 2.10?
If you fired up synaptic and clicked on Origin, do you see a CRAN mirror
site with some packages available to select? If so, is 2.10 amongst
them?

Have you added the GPG signing key?


-- 
Stephen

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] arrow plots

2009-12-08 Thread Greg Snow
You can use the grconvertX and/or grconvertY functions to help find the 
coordinates for your arrow (for plotting in the margin). 
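[For instance, a sketch of the approach Greg suggests: "ndc" coordinates run from 0 to 1 over the whole device, so grconvertX()/grconvertY() place the reference arrow in the margin regardless of the data range; the particular positions 0.70/0.85/0.95 are arbitrary, and par(xpd = NA) keeps the arrow from being clipped:]

```r
plot(1:10, 1:10)                          # dummy plot
op <- par(xpd = NA)                       # don't clip to the plot region
x0 <- grconvertX(0.70, from = "ndc")      # normalized device -> user coords
x1 <- grconvertX(0.85, from = "ndc")
y  <- grconvertY(0.95, from = "ndc")
arrows(x0, y, x1, y, length = 0.1)        # reference arrow in the top margin
text((x0 + x1) / 2, y, labels = "10 kg-m/sec", pos = 1)
par(op)                                   # restore clipping default
```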

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Gavin Simpson
> Sent: Tuesday, December 08, 2009 11:57 AM
> To: Cable, Samuel B Civ USAF AFMC AFRL/RVBXI
> Cc: r-help@r-project.org
> Subject: Re: [R] arrow plots
> 
> On Tue, 2009-12-08 at 13:42 -0500, Cable, Samuel B Civ USAF AFMC
> AFRL/RVBXI wrote:
> > Am doing some vector plots with the arrows() function.  Works well.
> But
> > what I need to do is supply an arrow for scaling for the reader.  I
> need
> > to plot an arrow of some known magnitude somewhere on the page
> > (preferably outside the bounds of the plot, so that it can be seen
> > clearly) with some text underneath it that says, for instance, "10
> > kg-m/sec".  Any ideas?  Thanks.
> 
> You can plot outside the plotting region using the 'xpd' plotting
> parameter:
> 
> plot(1:10, 1:10) ## dummy plot
> op <- par(xpd = TRUE) ## change clipping parameter & save defaults
> arrows(9, 10.75, 10, 10.75, length = 0.1) ## draw reference arrow
> text(9.5, 11,
>  labels = expression(10 ~ kg ~ m ~ sec^{"-1"})) ## add key
> par(op) ## reset plotting parameter defaults
> 
> You'll probably need to tweak the expression to get the units you
> wanted.
> 
> HTH
> 
> G
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>  Dr. Gavin Simpson [t] +44 (0)20 7679 0522
>  ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
>  Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
>  Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
>  UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Changing border width in barplot ?

2009-12-08 Thread Marc Schwartz


On Dec 8, 2009, at 2:26 PM, Tal Galili wrote:


Is it possible?
I was hoping to find something like:
lwd
for the different bars in the barplot but couldn't find it.
Does it exist ?

Thanks,
Tal




barplot() calls rect() via a wrapper function internally to draw the  
rectangles. rect() uses par("lwd") to define the line widths by default:


  par(mfrow = c(2, 1))

  barplot(1:5)

  par(lwd = 3)

  barplot(1:5)


If you wanted to define the widths of the lines on a per bar basis,  
you would need to manually create the bars using rect() directly,  
since there is no other way that I can see to pass a vector of line  
widths as the code is presently implemented.
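[A rough sketch of the manual approach Marc describes: barplot() returns the bar midpoints, and with the default width = 1 each bar extends 0.5 on either side of its midpoint (that half-width is an assumption tied to the defaults); the borders are then redrawn one at a time with rect(), each with its own line width:]

```r
h    <- c(2, 4, 3, 5, 1)                 # bar heights
lwds <- c(1, 2, 3, 4, 5)                 # one border width per bar
mids <- barplot(h, border = NA)          # draw the bars, keep the midpoints
for (i in seq_along(h))                  # redraw each border with its own lwd
  rect(mids[i] - 0.5, 0, mids[i] + 0.5, h[i], lwd = lwds[i])
```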


You might want to look at the barchart() function in lattice to see if  
there is more functionality there.


HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Changing border width in barplot ?

2009-12-08 Thread David Winsemius


On Dec 8, 2009, at 3:26 PM, Tal Galili wrote:


Is it possible?
I was hoping to find something like:
lwd
for the different bars in the barplot but couldn't find it.
Does it exist ?


?box




Thanks,
Tal


--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Quadratcount problem in spatstat

2009-12-08 Thread Sebastian Schutte

Hi,

I know there are older threads discussing the quadratcount function in 
spatstat. Unfortunately, I could not find a solution to my problem there.


I'm analyzing a point pattern in an irregular polygonal window. Both the 
window (an entire country) and the points are projected using WGS84. 
When I do quadratcount with only one quadrat for the entire country it 
holds all my 154 points:


test <- quadratcount(ppp_cameroon, nx=1, ny=1)
plot(test)
sum(test)
[1] 154

But when I split the country up into several quadrats (which is the only
thing that makes sense for the statistics), the function returns results
that are plainly wrong. The counts always add up to 154, but when
overlaid with the actual points it looks like some points are not counted
at all (zero-count quadrats where points clearly are) and some counts are
much higher than they should be (visibly empty quadrats reported to hold
20 points).


The problem does not occur with the humberside example data. I can also 
reproduce everything in example(quadratcount) correctly with humberside, 
but not with my own data set.


quadrat.test and quadrat.test.ppp lead to the same problem.

I'm out of ideas, so thank you very much for any suggestions!
Sebastian

P.S:
I'm running R 2.10.0 under Ubuntu Linux 9.10 with spatstat 1.17-2

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Changing border width in barplot ?

2009-12-08 Thread Tal Galili
Is it possible?
I was hoping to find something like:
lwd
for the different bars in the barplot but couldn't find it.
Does it exist ?

Thanks,
Tal




Contact Details:---
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com/ (English)
--

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] data.frame is slower than matrix?

2009-12-08 Thread Peng Yu
I'm doing some data manipulation. I originally thought I should use a
data.frame, as the columns in a data.frame can have different types,
whereas the elements in a matrix must all be the same type, so all the
data would have to be converted to strings if even a single column is a
string.

However, I observed that if I change the data.frame to a matrix, my
program becomes hundreds of times faster. I'm wondering whether it is in
general better to store and manipulate data in a matrix of strings and
convert to a data.frame whenever needed.
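[A quick way to see the effect Peng describes (the exact timings are illustrative and machine-dependent): repeated element access by index is much cheaper on a matrix than on a data.frame, because each data.frame subscript dispatches through the much heavier `[.data.frame` method.]

```r
n  <- 10000
m  <- matrix("x", nrow = n, ncol = 5)               # character matrix
df <- as.data.frame(m, stringsAsFactors = FALSE)    # same data as a data.frame

# Time n single-element reads from each structure
t_mat <- system.time(for (i in 1:n) m[i, 1])["elapsed"]
t_df  <- system.time(for (i in 1:n) df[i, 1])["elapsed"]
t_mat
t_df   # typically far larger than t_mat
```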

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Forest Plot

2009-12-08 Thread Xin Ge
Thank you for your help!

On Mon, Dec 7, 2009 at 9:53 AM, Viechtbauer Wolfgang (STAT) <
wolfgang.viechtba...@stat.unimaas.nl> wrote:

> If you just want a forest plot, then the forest() function.
>
> If you have the betas and corresponding variances, then you can create a
> forest plot with:
>
> forest(betas, varbetas)
>
> And yes, estimate +/- 1.96*sqrt(variance of estimate) would be an
> *approximate* 95% CI.
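[The CI computation Wolfgang confirms can be sketched as follows; the estimates here are made-up numbers for illustration, and the commented line shows how they would then be passed to metafor's forest() (which takes the effects and their sampling variances):]

```r
betas    <- c(0.21, -0.05, 0.34, 0.12)       # hypothetical regression estimates
varbetas <- c(0.010, 0.008, 0.020, 0.015)    # corresponding variances

# Approximate 95% confidence limits: estimate +/- 1.96 * sqrt(variance)
ci <- cbind(lower = betas - 1.96 * sqrt(varbetas),
            upper = betas + 1.96 * sqrt(varbetas))
ci

# With the metafor package installed:
# library(metafor)
# forest(betas, varbetas, slab = paste("Study", 1:4))
```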
>
> Best,
>
> --
> Wolfgang Viechtbauer            http://www.wvbauer.com/
> Department of Methodology and StatisticsTel: +31 (0)43 388-2277
> School for Public Health and Primary Care   Office Location:
> Maastricht University, P.O. Box 616 Room B2.01 (second floor)
> 6200 MD Maastricht, The Netherlands Debyeplein 1 (Randwyck)
>
>
>  Original Message
> From: Xin Ge [mailto:xingemaill...@gmail.com]
> Sent: Sunday, December 06, 2009 00:40
> To: Viechtbauer Wolfgang (STAT)
> Cc: r-help@r-project.org
> Subject: Re: [R] Forest Plot
>
> > Thanks for your reply. Which function I should explore in "metafor"
> > package for this kind of plot.
> >
> > Also I have to do a forest plot for "regressions estimates" (betas)
> > and corresponding "sqrt(var)". I hope in this case there is no
> > difference between std. error and std. deviation? So, a 95%
> > confidence interval would be [estimate +/- 1.96*sqrt(variance of
> > estimate)]. Am I correct in saying this?
> >
> > Thanks again,
> > Xin
> >
> >
> > On Sat, Dec 5, 2009 at 6:21 PM, Viechtbauer Wolfgang (STAT)
> >  wrote:
> >
> > The figure that you linked to was produced with the "metafor"
> > package. It can also be used to produce a forest plot if you have
> > means and corresponding standard errors of the means. The standard
> > error of a mean is equal to SD / sqrt(n), so as long as you also know
> > the sample sizes (n), you can convert those standard deviations to
> > the standard errors.
> >
> > Best,
> >
> > --
> > Wolfgang Viechtbauer            http://www.wvbauer.com/
> > Department of Methodology and StatisticsTel: +31 (0)43 388-2277
> > School for Public Health and Primary Care   Office Location:
> > Maastricht University, P.O. Box 616 Room B2.01 (second floor)
> > 6200 MD Maastricht, The Netherlands Debyeplein 1 (Randwyck)
> > 
> > From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On
> > Behalf Of Xin Ge [xingemaill...@gmail.com]
> > Sent: Sunday, December 06, 2009 12:11 AM
> > To: r-help@r-project.org
> > Subject: [R] Forest Plot
> >
> >
> > Hi All,
> >
> > I want to produce a similar "Forest Plot" as it is on the following
> > link, but my data would be having only two columns (one for
> > "Estimate" and other for "Std. Dev"). Can anyone suggest some
> > function() {Package} which can take such file as an input and give
> > following forest plot:
> >
> >
> http://bm2.genes.nig.ac.jp/RGM2/R_current/library/metafor/man/images/big_plot.rma.uni_001.png
> >
> > Thanks,
> > Xin
>
>  __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problem in labeling the nodes of tree drawn by rpart

2009-12-08 Thread kaida ning
Hi all,

I used rpart to fit a model, where the covariates are categorical variables.
Then I plotted the tree (mytree) and used the command "text" to add labels
to the tree.

In the nodes of the tree, the values of the covariates are represented with
a, b or c (tree attached).
Is there a way to show the real value(s) of the variable in the nodes
instead of a, b or c ?
I found that the command "labels(mytree,minlength=3)" can give me the
desired label, but I don't know how to add it to the tree.
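[One thing worth trying: text.rpart() has a `pretty` argument analogous to `minlength` in labels(), and `pretty = 0` should print the full factor level names at the splits rather than a/b/c. A sketch using a model from rpart's own examples (the poster's data is not available):]

```r
library(rpart)
# Model with factor covariates, from the rpart documentation examples
fit <- rpart(Price ~ Mileage + Type + Country, data = cu.summary)
plot(fit, margin = 0.1)
text(fit, pretty = 0)   # full factor level names instead of a/b/c
```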


Best,
Kaida


tree.pdf
Description: Adobe PDF document
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] arrow plots

2009-12-08 Thread Gavin Simpson
On Tue, 2009-12-08 at 13:42 -0500, Cable, Samuel B Civ USAF AFMC
AFRL/RVBXI wrote:
> Am doing some vector plots with the arrows() function.  Works well.  But
> what I need to do is supply an arrow for scaling for the reader.  I need
> to plot an arrow of some known magnitude somewhere on the page
> (preferably outside the bounds of the plot, so that it can be seen
> clearly) with some text underneath it that says, for instance, "10
> kg-m/sec".  Any ideas?  Thanks.

You can plot outside the plotting region using the 'xpd' plotting
parameter:

plot(1:10, 1:10) ## dummy plot
op <- par(xpd = TRUE) ## change clipping parameter & save defaults
arrows(9, 10.75, 10, 10.75, length = 0.1) ## draw reference arrow
text(9.5, 11, 
 labels = expression(10 ~ kg ~ m ~ sec^{"-1"})) ## add key
par(op) ## reset plotting parameter defaults

You'll probably need to tweak the expression to get the units you
wanted.

HTH

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] arrow plots

2009-12-08 Thread Cable, Samuel B Civ USAF AFMC AFRL/RVBXI
Am doing some vector plots with the arrows() function.  Works well.  But
what I need to do is supply an arrow for scaling for the reader.  I need
to plot an arrow of some known magnitude somewhere on the page
(preferably outside the bounds of the plot, so that it can be seen
clearly) with some text underneath it that says, for instance, "10
kg-m/sec".  Any ideas?  Thanks.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Upgrading To 2.10 from 2.6.2

2009-12-08 Thread stephen's mailinglist account
On Tue, Dec 8, 2009 at 6:24 PM, stephen's mailinglist account
 wrote:
>
>
> On Tue, Dec 8, 2009 at 2:25 PM, Iain Gallagher 
>  wrote:
>>
>> Hi Steve
>>
>> Have you tried:
>>
>> apt-cache search gfortran
>>
>> in a terminal window.
>>
>> Then
>>
>> sudo apt-get install theRelevantPackage
>>
>> I think you also need the Universe repos enabled.
>>
>> HTH
>>
>> Iain
>>
>> --- On Tue, 8/12/09, steve_fried...@nps.gov  wrote:
>>
>> > From: steve_fried...@nps.gov 
>> > Subject: [R] Upgrading To 2.10 from 2.6.2
>> > To: r-help@r-project.org
>> > Date: Tuesday, 8 December, 2009, 13:38
>> >
>> > Hello
>> >
>> > I have a Linux machine (Ubuntu 8.04 hardy, Gcc version
>> > 4.2.4
>> > (i486-linux-gnu) currently running R 2.6.2. I'd like to
>> > upgrade to 2.10.
>> >
>> > First Question):  What is the appropriate way to
>> > remove the old version of
>> > R?
>> >
>> >
>> > Part 2.
>> >  After downloading  r-base_2.10.0.orig.tar.gz and
>> > opening the archive. I
>> > ran the ./configure routine.
>> >
>> > It failed claiming that it could not find the F77
>> > compiler.
>> >
>>
Apologies posted with some unintended HTML format which was scrubbed on list

repost

Why not add the CRAN repository to your sources list in Ubuntu and let
your package manager (synaptic?/apt?) sort it out for you.  Version
2.10 is available here for Hardy.  It is a good way to keep your
version of R up to date without worrying about what is happening with
the rest of your machine, which may be quite comfortably stable, and
it saves the hassle of you having to compile it yourself

see pages at http://cran.r-project.org/bin/linux/ubuntu/

You will need to add a line similar to the one below to /etc/apt/sources.list

deb http://cran.r-project.org/bin/linux/ubuntu hardy/

you could edit the file directly or do it via synaptic or Gnome menu
(system/admin)
 has an entry for software sources.

Stephen

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Upgrading To 2.10 from 2.6.2

2009-12-08 Thread Steve_Friedman
Ok, that seems to work.  Thanks for the help.

Steve


Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034

steve_fried...@nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147


   
From: "stephen's mailinglist account" <i...@googlemail.com>
To: steve_fried...@nps.gov, r-help@r-project.org
Date: 12/08/2009 01:24 PM
Subject: Re: [R] Upgrading To 2.10 from 2.6.2






On Tue, Dec 8, 2009 at 2:25 PM, Iain Gallagher <
iaingallag...@btopenworld.com> wrote:
  Hi Steve

  Have you tried:

  apt-cache search gfortran

  in a terminal window.

  Then

  sudo apt-get install theRelevantPackage

  I think you also need the Universe repos enabled.

  HTH

  Iain

  --- On Tue, 8/12/09, steve_fried...@nps.gov 
  wrote:

  > From: steve_fried...@nps.gov 
  > Subject: [R] Upgrading To 2.10 from 2.6.2
  > To: r-help@r-project.org
  > Date: Tuesday, 8 December, 2009, 13:38
  >
  > Hello
  >
  > I have a Linux machine (Ubuntu 8.04 hardy, Gcc version
  > 4.2.4
  > (i486-linux-gnu) currently running R 2.6.2. I'd like to
  > upgrade to 2.10.
  >
  > First Question):  What is the appropriate way to
  > remove the old version of
  > R?
  >
  >
  > Part 2.
  >  After downloading  r-base_2.10.0.orig.tar.gz and
  > opening the archive. I
  > ran the ./configure routine.
  >
  > It failed claiming that it could not find the F77
  > compiler.
  >

Why not add the CRAN repository to your sources list in Ubuntu and let your
package manager (synaptic?/apt?) sort it out for you.  Version 2.10 is
available here for Hardy.  It is a good way to keep your version of R up to
date without worrying about what is happening with the rest of your
machine, which may be quite comfortably stable, and it saves the hassle of
you having to compile it yourself

see pages at http://cran.r-project.org/bin/linux/ubuntu/

You will need to add a line similar to the one below to
/etc/apt/sources.list

deb http://cran.r-project.org/bin/linux/ubuntu hardy/


you could edit the file directly or do it via synaptic or Gnome menu
(system/admin) has an entry for software sources.

Stephen

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Upgrading To 2.10 from 2.6.2

2009-12-08 Thread stephen's mailinglist account
On Tue, Dec 8, 2009 at 2:25 PM, Iain Gallagher <
iaingallag...@btopenworld.com> wrote:

> Hi Steve
>
> Have you tried:
>
> apt-cache search gfortran
>
> in a terminal window.
>
> Then
>
> sudo apt-get install theRelevantPackage
>
> I think you also need the Universe repos enabled.
>
> HTH
>
> Iain
>
> --- On Tue, 8/12/09, steve_fried...@nps.gov 
> wrote:
>
> > From: steve_fried...@nps.gov 
> > Subject: [R] Upgrading To 2.10 from 2.6.2
> > To: r-help@r-project.org
> > Date: Tuesday, 8 December, 2009, 13:38
> >
> > Hello
> >
> > I have a Linux machine (Ubuntu 8.04 hardy, Gcc version
> > 4.2.4
> > (i486-linux-gnu) currently running R 2.6.2. I'd like to
> > upgrade to 2.10.
> >
> > First Question):  What is the appropriate way to
> > remove the old version of
> > R?
> >
> >
> > Part 2.
> >  After downloading  r-base_2.10.0.orig.tar.gz and
> > opening the archive. I
> > ran the ./configure routine.
> >
> > It failed claiming that it could not find the F77
> > compiler.
> >
>
> Why not add the CRAN repository to your sources list in Ubuntu and let your
package manager (synaptic?/apt?) sort it out for you.  Version 2.10 is
available here for Hardy.  It is a good way to keep your version of R up to
date without worrying about what is happening with the rest of your machine,
which may be quite comfortably stable, and it saves the hassle of you having
to compile it yourself

see pages at http://cran.r-project.org/bin/linux/ubuntu/

You will need to add a line similar to the one below to
/etc/apt/sources.list

deb http://cran.r-project.org/bin/linux/ubuntu hardy/

you could edit the file directly or do it via synaptic or Gnome menu
(system/admin)
 has an entry for software sources.

Stephen

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] coefficients of each local polynomial from locfit

2009-12-08 Thread Liaw, Andy
I believe the prediction is done on some sort of grid, then
interpolated to fill in the rest.  This is, however, purely for
computational reasons, and not for any theoretical reasons.  The formal
definition of local polynomials is to do a weighted fit of a polynomial
at each point.

Andy 

> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
> Sent: Tuesday, December 08, 2009 12:45 PM
> To: David Grabiner
> Cc: r-help@r-project.org
> Subject: Re: [R] coefficients of each local polynomial from locfit
> 
> 
> On Dec 8, 2009, at 12:33 PM, David Grabiner wrote:
> 
> > Hi list,
> >
> > This was asked a couple of years ago but I can't find a resolution. Is
> > there any way to get the coefficients from one of the local polynomial
> > fits in locfit? I realize that locfit only constructs polynomials at a
> > handful of intelligently selected points and uses interpolation to
> > predict any other points.
> 
> That is not my understanding of what locfit does. I could be wrong and
> am interested in correcting my misunderstanding if that is the case. I
> thought a polynomial was fit at _each_ point.  What pages in Loader's
> book should I be reading?
> 
> 
> >  I would like to know the terms of the polynomials at these points.
> > It seems likely that they are stored in [locfit object]$eva$coef but
> > I'm having trouble making sense of that list.  That list is the second
> > argument passed into the C function spreplot called from
> > preplot.locfit.raw, if that helps.
> >
> > Thanks,
> > David
> --
> 
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
Notice:  This e-mail message, together with any attachme...{{dropped:10}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lm: RME vs. ML

2009-12-08 Thread Ravi Varadhan

"worrying about df (ml vs reml) is just a silly obsession of statisticians (of 
which I'm one)"

I too have often wondered about the importance of such tertiary issues.  My 
half-baked understanding is that the main "practical" difference between ML and 
REML is with regard to ease of computing the estimates, i.e. the REML 
estimates can be computed much more easily than the ML ones.  However, I am wide 
open to being enlightened about other practically important differences.
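[The ML/REML difference being discussed can be seen directly (a sketch using nlme's built-in Orthodont data; with these data the discrepancy is small, and it shrinks further as the sample size grows relative to the number of fixed effects):]

```r
library(nlme)
# Same random-intercept model fitted under both estimation methods
fm_ml   <- lme(distance ~ age, random = ~ 1 | Subject,
               data = Orthodont, method = "ML")
fm_reml <- lme(distance ~ age, random = ~ 1 | Subject,
               data = Orthodont, method = "REML")

# Variance components differ slightly; the REML estimates are
# typically a bit larger than the downwardly biased ML ones
VarCorr(fm_ml)
VarCorr(fm_reml)
```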

Best,
Ravi.



Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


- Original Message -
From: Bert Gunter 
Date: Tuesday, December 8, 2009 11:38 am
Subject: Re: [R] lm: RME vs. ML
To: 'John Sorkin' , r-help-boun...@r-project.org, 
jlu...@ria.buffalo.edu
Cc: r-help@r-project.org


> A contrarian point of view:
> 
> If you have so little data (relative to the number of parameters to be
> estimated, especially NONLINEAR parameters like covariance estimates) that
> the ml vs reml bias could be large, then there's so little information
> anyway that such bias is the least of your problems (identifiability
> probably is a major issue-- mis-shapen confidence regions).
> 
> Ergo, worrying about df (ml vs reml) is just a silly obsession of
> statisticians (of which I'm one).
> 
> Criticisms, public or private, welcome of course. 
> 
> This is my view only and should not be considered a stain on my 
> employer --
> other than its misfortune in employing me.
> 
> 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 
> 
> -Original Message-
> From: r-help-boun...@r-project.org [ On
> Behalf Of John Sorkin
> Sent: Tuesday, December 08, 2009 7:12 AM
> To: r-help-boun...@r-project.org; jlu...@ria.buffalo.edu
> Cc: r-help@r-project.org
> Subject: Re: [R] lm: RME vs. ML
> 
> Your question is well taken. I did not give any criteria because I realized
> there might be different answers based upon different criteria. Certainly
> one fundamental criterion would be that the estimates are BLUE, but this
> is not the only criterion one might use.
> John 
> -Original Message-
> From: 
> To: John Sorkin 
> Cc:  
> To:  
> 
> Sent: 12/8/2009 9:39:28 AM
> Subject: Re: [R] lm: RME vs. ML
> 
You need to give your criteria for "preferable".  For normal-linear
models, REML estimates of variances are unbiased, whereas ML estimates
are downwardly biased.  My intuition is that the ML-induced bias would
be worse in small samples. I don't know about other distributions.
Likewise I don't know about MSE or other criteria for preference.
> 
> 
> 
> 
> 
> 
> "John Sorkin"  
> Sent by: r-help-boun...@r-project.org
> 12/07/2009 09:24 PM
> 
> To
> 
> cc
> 
> Subject
> [R] lm: RME vs. ML
> 
> 
> 
> 
> 
> 
> windows XP
> R 2.10
> 
> As pointed out by Prof. Venables and Ripley (MASS 4th edition, p275), 
> the 
> results obtained from lme using method="ML" and method="REML" are 
> often 
> different, especially for small datasets. Is there any way to 
> determine 
> which method is preferable for a given set of data?
> Thanks,
> john
> 
> 
> John David Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
> 
> Confidentiality Statement:
> This email message, including any attachments, is for th...{{dropped:8}}
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] coefficients of each local polynomial from locfit

2009-12-08 Thread David Winsemius


On Dec 8, 2009, at 12:33 PM, David Grabiner wrote:


Hi list,

This was asked a couple of years ago but I can't find a resolution.  Is
there any way to get the coefficients from one of the local polynomial fits
in locfit.  I realize that locfit only constructs polynomials at a handful
of intelligently selected points and uses interpolation to predict any other
points.


That is not my understanding of what locfit does. I could be wrong and  
am interested in correcting my misunderstanding if that is the case. I  
thought a polynomial was fit at _each_ point. What pages in Loader's  
book should I be reading?
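
For anyone digging into this, here is a minimal sketch of how to inspect the stored evaluation structure (this assumes the object layout the locfit package usually exposes -- `fit$eva$xev` and `fit$eva$coef` -- which is exactly what the original question is trying to decode, so treat the column meanings as unverified):

```r
library(locfit)

set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)

fit <- locfit(y ~ lp(x, deg = 2))

# locfit evaluates local fits only at an adaptively chosen set of points
# and interpolates elsewhere; those points appear to live in fit$eva$xev
length(fit$eva$xev)     # typically far fewer points than length(x)

# candidate home of the local coefficients the poster asks about:
# one row per evaluation point (interpretation of columns uncertain)
str(fit$eva$coef)
```

Comparing `predict(fit, fit$eva$xev)` against the first column of `fit$eva$coef` would be one way to test whether that column holds the fitted local value.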




 I would like to know the terms of the polynomials at these points.
It seems likely that they are stored in [locfit object]$eva$coef but I'm
having trouble making sense of that list.  That list is the second argument
passed into the C function spreplot called from preplot.locfit.raw if that
helps.

Thanks,
David

--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Announcing a new R news site: R-bloggers.com

2009-12-08 Thread Dirk Eddelbuettel

On 8 December 2009 at 12:24, Gabor Grothendieck wrote:
| I am not sure if it's still the case, but one of the problems with the
| Planet R feed was that it had material in it not related to R or
| statistics or any technical subject at all, so if you harvest it be
| sure to exclude such sources.

This comes up on some of the other 'Planet *' aggregators I read --- and the
general consensus is that it is a good thing to see a fuller and broader picture
of the activities of community members. It is not meant to replace a
narrowly-focused mailing list.  So call it a feature and not a bug.  

For filtering of RSS feeds see e.g. Yahoo Pipes (http://pipes.yahoo.com/pipes/)

Dirk

-- 
Three out of two people have difficulties with fractions.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] coefficients of each local polynomial from locfit

2009-12-08 Thread David Grabiner
Hi list,

This was asked a couple of years ago but I can't find a resolution.  Is
there any way to get the coefficients from one of the local polynomial fits
in locfit.  I realize that locfit only constructs polynomials at a handful
of intelligently selected points and uses interpolation to predict any other
points.  I would like to know the terms of the polynomials at these points.
It seems likely that they are stored in [locfit object]$eva$coef but I'm
having trouble making sense of that list.  That list is the second argument
passed into the C function spreplot called from preplot.locfit.raw if that
helps.

Thanks,
David

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "prodlim" problem with censor ticks in stratified KM plot

2009-12-08 Thread David Winsemius


On Dec 8, 2009, at 12:17 PM, David Winsemius wrote:



On Dec 8, 2009, at 11:49 AM, bnorth wrote:



I am having a problem with the censor tick marks with plot and prodlim.
I am using the example code from ?prodlim but I have added mark.time=T to
the plot command to get tick marks at censor times.
The problem is the tick marks occur at exactly the same point in each of the
arms.
This seems wrong as the censorings do not occur at the same times in each
arm.

pfit.edema <- prodlim(Surv(time,status)~edema,data=pbc)


I get an error here, I think because status has three values and  
Surv expects two but possibly because my efforts to work around the  
Design overlays to the survival package were not correct. I tried a  
few other approaches to getting the code to work, but failed.


I did notice that pbc$status is numeric rather than a factor, and  
wondered if that three level categorical variable might be treated  
differently by the various functions you are using because of its R  
class. Perhaps if you created a factor pbc$stat2, it might work  
correctly?




Meant to say that pbc$edema was numeric. I did succeed in reproducing  
the problem using a Hist object instead of a Surv object, and ... no,  
changing edema to a factor variable does not solve the problem. How  
about sending an email to the author?




--
David


summary(pfit.edema)
summary(pfit.edema,intervals=TRUE)
plot(pfit.edema,mark.time=T)

Here is my sessionInfo

R version 2.9.0 (2009-04-17)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
Kingdom.1252;LC_MONETARY=English_United
Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] splines   stats graphics  grDevices utils datasets   
methods

[8] base

other attached packages:
[1] prodlim_1.0.5  survival_2.35-7KernSmooth_2.22-22


many many thanks to anyone who can spot my error
--
View this message in context: 
http://n4.nabble.com/prodlim-problem-with-censor-ticks-in-stratified-KM-plot-tp955433p955433.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Announcing a new R news site: R-bloggers.com

2009-12-08 Thread Gabor Grothendieck
I am not sure if it's still the case, but one of the problems with the
Planet R feed was that it had material in it not related to R or
statistics or any technical subject at all, so if you harvest it be
sure to exclude such sources.

On Tue, Dec 8, 2009 at 12:12 PM, Elijah Wright  wrote:
> Hi Tal!
>
> First let me say that I deeply appreciate the work that you're putting into
> this.  You're doing good things for our community, and that's great!
>
> I put the planet-R stuff together rather hastily a few years ago, as a way
> of seeing whether it was of enough use for the community for
> it to be something that the R project would want to provide as a standard
> service offering.  ;)  At that moment in time, it seemed like a
> slightly fringe thing to do, but I think the community has grown into it a
> bit now.  There are a *lot* more R-related RSS feeds in the
> ecosystem now than there were a couple or three years back.
>
> Another aspect - a couple of years ago, we didn't have decent recommenders
> for related feeds from things like Google Reader.  Nowadays,
> once google finds an R-related feed, it starts to suggest it to me.  That's
> very powerful, and I think the need for a centralized "planet"-style
> site is somewhat reduced by it.
>
> I'd strongly suggest that you make your site as community-oriented as
> possible, probably by asking folks in the
> community (like, say, Romain...) to help you run and administer the thing.
> That will make it more like a community project, and
> reduce the load on you personally.  I should have done something like that
> with the planetr.stderr.org site long ago - as Dirk notes,
> my cycle time for responding to mail and notes is a bit slow, and my time
> budget for messing about with the site is also pretty amazingly limited.
>
> As you note - there's not really much by way of contact information in the
> planetr templates.  They're quite limited and quite unimpressive.  :)  But,
> well, the amount of work that was required to get a "working" site up was
> also incredibly small for me.
>
> I'd be happy to see you harvest links out of planetr.stderr.org and add them
> to your r-bloggers site - some of the links I've collected there are
> institutional (journal feeds and the like) and should be possible to add
> without any consultation.  For the individual bloggers, I'd suggest
> contacting them to get permission to add their feeds - it just seems like
> the polite thing to do.
>
> Again, I want to thank you publicly for spending your time on this, and am
> happy to see someone taking action to improve communication
> and discussion across the R community of users and developers.  This
> benefits us all, greatly!
>
> Best, and be well,
>
> --elijah
>
>
> On Sat, Dec 5, 2009 at 2:32 PM, Tal Galili  wrote:
>
>> Hi Dirk,
>>
>> I wish to emphasize that I came across PlanetR over a year ago,
>> but completely forgot it existed when working on R-bloggers. Also, when I
>> contacted the bloggers about this idea, none of them actually wrote to me
>> about it (which makes me feel better about not remembering it). I apologize
>> if setting up R-bloggers seems like trying to "compete" with PlanetR; this
>> wasn't at all my intention.
>> Yet, now that my website is up, I hope it will be of use, and here
>> are several ways in which (in hindsight) I can say it has something to
>> offer:
>>
>> 1) Planet R is limited (for years) to 26 feeds only, and I don't remember
>> seeing it evolve to include (or allow inclusion) of new R blogs that came
>> around.
>> 2) The feeds are of blogs and non-blogs (such as wiki or CRAN updates). That
>> makes finding "reading material" inside it very difficult, since the site is
>> cluttered with a lot of "updates" from CRANberries and the wiki.
>> 3) In PlanetR, one can only view (about) 5 days back and no more (R-bloggers
>> allows viewing of much more than 5 days back).
>> 4) R-bloggers allows searching inside the content, PlanetR doesn't.
>> 5) R-bloggers allows one to get e-mail updates, PlanetR doesn't.
>> 6) R-bloggers offers "related articles", PlanetR doesn't.
>>
>> I see R-bloggers  as a "news site" based on
>> the
>> R bloggers, and I can't say the same about PlanetR for the reasons I gave
>> above.
>>
>>
>> With much respect to you Dirk,
>> Tal
>>
>>
>>
>>
>>
>>
>> Contact
>> Details:---
>> Contact me: tal.gal...@gmail.com |  972-52-7275845
>> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
>> www.r-statistics.com/ (English)
>>
>> --
>>
>>
>>
>>
>> On Sat, Dec 5, 2009 at 9:59 PM, Dirk Eddelbuettel  wrote:
>>
>> >
>> > On 5 December 2009 at 21:38, Tal Galili wrote:
>> > | R-Bloggers.com hopes to serve the R community by presenting (in one
>> > place)
>> > | all the new articles (posts) written (in English) about R in the "R
>>

[R] Job Opportunity

2009-12-08 Thread David . Siev

Statistical Programmer - USDA Center for Veterinary Biologics

The USDA Center for Veterinary Biologics is seeking a statistical
programmer with good R skills. The position is a two-year term position in
the CVB Statistics Section with the possibility of becoming permanent.
Applicants must be United States citizens.

The position calls for a thorough knowledge of R and aptitude in both
programming and data analysis in order to develop tools for manipulating,
managing, and analyzing data. Experience is preferred, but qualified entry
level candidates will be considered. In addition, a strong candidate should
be able to interact well with consulting statisticians, scientists, and
information technology personnel.

Benefits include flexible work schedule, ten paid federal holidays, liberal
leave allowance (annual leave, sick leave, and family friendly leave
policy), life insurance program, health insurance program, retirement and
thrift savings plan, and optional long term care insurance.  Starting
salary based on experience and education.

The Center for Veterinary Biologics regulates vaccines, diagnostics, and
other immunobiologics used to manage animal diseases.  It is located on the
500-acre campus of the National Centers for Animal Health in Ames, Iowa,
where it maintains a close affiliation with the other animal disease
centers on the campus as well as nearby Iowa State University.  The CVB
Statistics Section provides statistical services to scientific reviewers
and laboratory scientists.

Ames is known for its stimulating cultural environment, pleasant
surroundings, and stable economy.  It was ranked second out of 189 other
small cities by The New Rating Guide to Life in America's Small Cities.  It
is 40 minutes north of Des Moines International Airport.


Contact
David Siev
Statistics Section Leader
USDA Center for Veterinary Biologics
Ames, IA 50010
david.s...@aphis.usda.gov

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "prodlim" problem with censor ticks in stratified KM plot

2009-12-08 Thread David Winsemius


On Dec 8, 2009, at 11:49 AM, bnorth wrote:



I am having a problem with the censor tick marks with plot and prodlim
I am using the example code from ?prodlim but I have added  
mark.time=T to

the plot command to get tick marks at censor times.
The problem is the tick marks occur at exactly the same point in  
each of the

arms.
This seems wrong as the censorings do not occur at the same times in  
each

arm.

pfit.edema <- prodlim(Surv(time,status)~edema,data=pbc)


I get an error here, I think because status has three values and Surv  
expects two but possibly because my efforts to work around the Design  
overlays to the survival package were not correct. I tried a few other  
approaches to getting the code to work, but failed.


I did notice that pbc$status is numeric rather than a factor, and  
wondered if that three level categorical variable might be treated  
differently by the various functions you are using because of its R  
class. Perhaps if you created a factor pbc$stat2, it might work  
correctly?
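
As a concrete sketch of the recode-the-status suggestion (this assumes the survival package's pbc coding, where status is 0 = censored, 1 = transplant, 2 = dead -- check table(pbc$status) first, as older versions coded it differently):

```r
library(survival)
library(prodlim)

data(pbc, package = "survival")

# Collapse the three-level status to a binary event indicator so that
# Surv() gets the two-value status it expects (assumption: 2 = death)
pbc$dead <- as.integer(pbc$status == 2)

pfit.edema <- prodlim(Surv(time, dead) ~ edema, data = pbc)
plot(pfit.edema, mark.time = TRUE)
```

If the censor ticks still coincide across arms after this, that would support the original bug report, and emailing the prodlim maintainer seems the right next step.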


--
David


summary(pfit.edema)
summary(pfit.edema,intervals=TRUE)
plot(pfit.edema,mark.time=T)

Here is my sessionInfo

R version 2.9.0 (2009-04-17)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
Kingdom.1252;LC_MONETARY=English_United
Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] splines   stats graphics  grDevices utils datasets   
methods

[8] base

other attached packages:
[1] prodlim_1.0.5  survival_2.35-7KernSmooth_2.22-22


many many thanks to anyone who can spot my error
--
View this message in context: 
http://n4.nabble.com/prodlim-problem-with-censor-ticks-in-stratified-KM-plot-tp955433p955433.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Announcing a new R news site: R-bloggers.com

2009-12-08 Thread Elijah Wright
Hi Tal!

First let me say that I deeply appreciate the work that you're putting into
this.  You're doing good things for our community, and that's great!

I put the planet-R stuff together rather hastily a few years ago, as a way
of seeing whether it was of enough use for the community for
it to be something that the R project would want to provide as a standard
service offering.  ;)  At that moment in time, it seemed like a
slightly fringe thing to do, but I think the community has grown into it a
bit now.  There are a *lot* more R-related RSS feeds in the
ecosystem now than there were a couple or three years back.

Another aspect - a couple of years ago, we didn't have decent recommenders
for related feeds from things like Google Reader.  Nowadays,
once google finds an R-related feed, it starts to suggest it to me.  That's
very powerful, and I think the need for a centralized "planet"-style
site is somewhat reduced by it.

I'd strongly suggest that you make your site as community-oriented as
possible, probably by asking folks in the
community (like, say, Romain...) to help you run and administer the thing.
That will make it more like a community project, and
reduce the load on you personally.  I should have done something like that
with the planetr.stderr.org site long ago - as Dirk notes,
my cycle time for responding to mail and notes is a bit slow, and my time
budget for messing about with the site is also pretty amazingly limited.

As you note - there's not really much by way of contact information in the
planetr templates.  They're quite limited and quite unimpressive.  :)  But,
well, the amount of work that was required to get a "working" site up was
also incredibly small for me.

I'd be happy to see you harvest links out of planetr.stderr.org and add them
to your r-bloggers site - some of the links I've collected there are
institutional (journal feeds and the like) and should be possible to add
without any consultation.  For the individual bloggers, I'd suggest
contacting them to get permission to add their feeds - it just seems like
the polite thing to do.

Again, I want to thank you publicly for spending your time on this, and am
happy to see someone taking action to improve communication
and discussion across the R community of users and developers.  This
benefits us all, greatly!

Best, and be well,

--elijah


On Sat, Dec 5, 2009 at 2:32 PM, Tal Galili  wrote:

> Hi Dirk,
>
> I wish to emphasize that I came across PlanetR over a year ago,
> but completely forgot it existed when working on R-bloggers. Also, when I
> contacted the bloggers about this idea, none of them actually wrote to me
> about it (which makes me feel better about not remembering it). I apologize
> if setting up R-bloggers seems like trying to "compete" with PlanetR; this
> wasn't at all my intention.
> Yet, now that my website is up, I hope it will be of use, and here
> are several ways in which (in hindsight) I can say it has something to
> offer:
>
> 1) Planet R is limited (for years) to 26 feeds only, and I don't remember
> seeing it evolve to include (or allow inclusion) of new R blogs that came
> around.
> 2) The feeds are of blogs and non-blogs (such as wiki or CRAN updates). That
> makes finding "reading material" inside it very difficult, since the site is
> cluttered with a lot of "updates" from CRANberries and the wiki.
> 3) In PlanetR, one can only view (about) 5 days back and no more (R-bloggers
> allows viewing of much more than 5 days back).
> 4) R-bloggers allows searching inside the content, PlanetR doesn't.
> 5) R-bloggers allows one to get e-mail updates, PlanetR doesn't.
> 6) R-bloggers offers "related articles", PlanetR doesn't.
>
> I see R-bloggers  as a "news site" based on
> the
> R bloggers, and I can't say the same about PlanetR for the reasons I gave
> above.
>
>
> With much respect to you Dirk,
> Tal
>
>
>
>
>
>
> Contact
> Details:---
> Contact me: tal.gal...@gmail.com |  972-52-7275845
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> www.r-statistics.com/ (English)
>
> --
>
>
>
>
> On Sat, Dec 5, 2009 at 9:59 PM, Dirk Eddelbuettel  wrote:
>
> >
> > On 5 December 2009 at 21:38, Tal Galili wrote:
> > | R-Bloggers.com hopes to serve the R community by presenting (in one
> > place)
> > | all the new articles (posts) written (in English) about R in the "R
> > | blogosphere".
> >
> > But how is that different from
> >
> >  http://PlanetR.stderr.org
> >
> > which has been doing the same quite admirably for years?
> >
> > Dirk
> >
> > --
> > Three out of two people have difficulties with fractions.
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PL

[R] "prodlim" problem with censor ticks in stratified KM plot

2009-12-08 Thread bnorth

I am having a problem with the censor tick marks with plot and prodlim
I am using the example code from ?prodlim but I have added mark.time=T to
the plot command to get tick marks at censor times.
The problem is the tick marks occur at exactly the same point in each of the
arms. 
This seems wrong as the censorings do not occur at the same times in each
arm.

pfit.edema <- prodlim(Surv(time,status)~edema,data=pbc)
summary(pfit.edema)
summary(pfit.edema,intervals=TRUE)
plot(pfit.edema,mark.time=T)

Here is my sessionInfo

R version 2.9.0 (2009-04-17) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
Kingdom.1252;LC_MONETARY=English_United
Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] splines   stats graphics  grDevices utils datasets  methods  
[8] base 

other attached packages:
[1] prodlim_1.0.5  survival_2.35-7KernSmooth_2.22-22


many many thanks to anyone who can spot my error 
-- 
View this message in context: 
http://n4.nabble.com/prodlim-problem-with-censor-ticks-in-stratified-KM-plot-tp955433p955433.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lm: RME vs. ML

2009-12-08 Thread Bert Gunter
A contrarian point of view:

If you have so little data (relative to the number of parameters to be
estimated, especially NONLINEAR parameters like covariance estimates) that
the ML vs REML bias could be large, then there's so little information
anyway that such bias is the least of your problems (identifiability
probably is a major issue -- mis-shapen confidence regions).

Ergo, worrying about df (ml vs reml) is just a silly obsession of
statisticians (of which I'm one).

Criticisms, public or private, welcome of course. 

This is my view only and should not be considered a stain on my employer --
other than its misfortune in employing me.


Bert Gunter
Genentech Nonclinical Biostatistics


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of John Sorkin
Sent: Tuesday, December 08, 2009 7:12 AM
To: r-help-boun...@r-project.org; jlu...@ria.buffalo.edu
Cc: r-help@r-project.org
Subject: Re: [R] lm: RME vs. ML

Your question is well taken. I did not give any criteria because I realized
there might be different answers based upon different criteria. Certainly
one fundamental criterion would be that the estimates are BLUE, but this is
not the only criterion one might use.
John 
-Original Message-
From: 
To: John Sorkin 
Cc:  
To:  

Sent: 12/8/2009 9:39:28 AM
Subject: Re: [R] lm: RME vs. ML

You need to give your criteria for "preferable".  For normal-linear 
models, REML estimates of variances are unbiased, whereas ML estimates are 
downwardly biased.  My intuition is that the ML-induced bias would be 
worse in small samples. I don't know about other distributions. Likewise I 
don't know about MSE or other criteria for preference.
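
The direction of that bias is easy to check empirically; here is a sketch using nlme and its built-in Orthodont data (any small grouped dataset would do -- the ML/REML gap grows as the number of groups shrinks):

```r
library(nlme)

# Fit the same random-intercept model both ways
fit.ml   <- lme(distance ~ age, random = ~ 1 | Subject,
                data = Orthodont, method = "ML")
fit.reml <- lme(distance ~ age, random = ~ 1 | Subject,
                data = Orthodont, method = "REML")

# Variance components: the ML estimates are shrunk toward zero
# relative to REML (the downward bias described above)
VarCorr(fit.ml)
VarCorr(fit.reml)
```

The fixed-effect estimates will typically be near-identical; it is the variance components (and hence standard errors) that differ.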






"John Sorkin"  
Sent by: r-help-boun...@r-project.org
12/07/2009 09:24 PM

To

cc

Subject
[R] lm: RME vs. ML






windows XP
R 2.10

As pointed out by Prof. Venables and Ripley (MASS 4th edition, p275), the 
results obtained from lme using method="ML" and method="REML" are often 
different, especially for small datasets. Is there any way to determine 
which method is preferable for a given set of data?
Thanks,
john


John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Confidentiality Statement:
This email message, including any attachments, is for th...{{dropped:8}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Opps Correct Version of Holiday Regressor Perl Script

2009-12-08 Thread Idgarad
Here is the correct version. The old version is the redirect only version of
the script.

### BEGIN SCRIPT 

#!/usr/bin/perl
##
# --start, -s = The date you would like to start generating regressors
# --end, -e = When to stop generating holiday regressors
# --scope, -c = D, W for Daily or Weekly respectively (e.g. Does this week
have a particular holiday)
# --file, -f = Ummm where to write the output silly!
#
# **NOTE** The EOM holiday is "End of Month" for computer systems this may
be important for
# extra processing and what not.
#
# You may need to set your TZ environment variable if the script cannot
# determine your time zone from the system (e.g. SET TZ=CST )
##


use Getopt::Long;
use Date::Manip;
use Spreadsheet::WriteExcel;
use Calendar::Functions;
use Date::Holidays::USFederal;
#use Date::Holidays;
use Set::Array;
use POSIX qw/strftime/;
use Time::Local;

my @regressors = ();
#my $holidays = Date::Holidays->new(countrycode => 'us');

$result = GetOptions ("start|s=s" => \$start,
   "end|e=s" => \$end,
   "scope|c=s" => \$scope,
   "file|f=s" => \$filename);

open (OUTFILE, ">>$filename");


print "Generating Holiday Dummy Variables starting $start to $end generated
by $scope. Output to

$filename \n";

#print all the dates based on scope as a test




$startDate=ParseDate(\$start);
if (! $startDate) {
print "Error in the date";exit;
}
$endDate= ParseDate($end);
print OUTFILE "Start Date: ",UnixDate($startDate,"%m/%d/%Y"),"\n";
print OUTFILE "End Date: ",$end,"\n";

# print OUTFILE "Last Day in Month: ",UnixDate(ParseDate("last day in JAN
2004"),"%m/%d/%Y"),"\n";




print OUTFILE

"Date,HLY-NewYear,HLY-MLK,HLY-PRES,HLY-MEMORIAL,HLY-J4,HLY-LABOR,HLY-COLUMBUS,HLY-VETS,HLY-THANKS,HLY-XMA

S,HLY-ELECT,HLY-PATRIOT,EOM\n";
$baseDate=$startDate;

if ($scope eq "d"){

# One dummy column per holiday, in the same order as the CSV header row
my @holidays = ("New Year's Day", "Martin Luther King, Jr. Day",
                "Presidents' Day", "Memorial Day", "Independence Day",
                "Labor Day", "Columbus Day", "Veterans' Day",
                "Thanksgiving Day", "Christmas Day", "Election Day",
                "U.S. Patriot Day Unofficial Observation");

while(Date_Cmp($baseDate,$endDate)<0)
{
    print OUTFILE UnixDate($baseDate,"%m/%d/%Y"), ",";

    my $today = holidayCheck($baseDate);    # call once per date
    foreach my $h (@holidays) {
        print OUTFILE (($today eq $h) ? "1," : "0,");
    }

    # EOM ("End of Month") is the last field, so no trailing comma
    print OUTFILE (($today eq "EOM") ? "1" : "0");
    print OUTFILE "\n";

    $baseDate = DateCalc($baseDate, "+1 day");
}

} # END IF D

if($scope eq "w") {

while(Date_Cmp($baseDate,DateCalc($endDate,"+7 days"))<0)
{
print OUTFILE UnixDate($baseDate,"%m/%d/%Y"), ",";

if(

(holidayCheck(DateCalc($baseDate,"+0 days")) eq "New Year's Day") ||
(holidayCheck(DateCalc($baseDate,"+1 days")) eq "New Year's Day") ||
(holidayCheck(DateCalc($baseDate,"+2 days")) eq "New Year's Day") ||
(holidayCheck(DateCalc($baseDate,"+3 days")) eq "New Year's Day") ||
(holidayCheck(DateCalc($baseDate,"+4 days")) eq "New Year's Day") ||
(holidayCheck(DateCalc($baseDate,"+5 days")) eq "New Year's Day") ||
(holidayCheck(DateCalc($baseDate,"+6 days")) eq "New Year's Day")

){print OUTFILE "1,"} else {print OUTFILE "0,"};

if(
holidayCheck(DateCalc($baseDate,"+0 days")) eq "Martin Luther King, Jr. Day"
||
holidayCheck(DateCalc($baseDate,"+1 days")) eq "Martin Luther King, Jr. Day"
||
holidayCheck(DateCalc($baseDate,"+2 days")) eq "Martin Luther King, Jr. Day"
||
holidayCheck(DateCalc($baseDate,"+3 days")) eq "Martin Luther King, Jr. Day"
||
holidayCheck(DateCalc($baseDate,"+4 days")) eq "Martin Luther King, Jr. Day"
||
holidayCheck(DateCalc($baseDate,"+5 days")) eq "Martin Luther King, Jr. Day"
||
holidayCheck(DateCalc($baseDate,"+6 days")) eq "Martin Luther King, Jr. Day"

){print OUTFILE "1,"} else {print OUTFILE "0,"};

if(
holidayCheck(DateCalc($baseDate,"+0 days")) eq "Presidents' Day" ||
holidayCheck(DateCalc($baseDate,"+1 days")) eq "Presidents' Day" ||
holidayCheck(DateCalc($baseDate,"+2 days")) eq "Presidents' Day" ||
holidayCheck(DateCalc($baseDate,"+3 days")) eq "Presidents' Day" ||
holidayCheck(DateCalc($baseDate,"+4 days")) eq "Presidents' Day" ||
holidayCheck(DateCalc($baseDate,"+5 days")) eq "Presidents' Day" ||
holidayCheck(DateCalc($baseDate,"+6 days")) eq "Presidents' Day"
){

Re: [R] conditionally merging adjacent rows in a data frame

2009-12-08 Thread Titus von der Malsburg
On Tue, Dec 8, 2009 at 4:50 PM, Gray Calhoun  wrote:
> I think there might be a problem with this approach if roi, tid, rt,
> and mood are the same for nonconsecutive rows.

True, but I can use the index of my reshape solution. Aggregate was
the crucial ingredient.  Thanks both!

For the record, this is the full solution:

head(d)
rt dur tid  mood roi  x
55 5523 200   4  subj   9  5
56 5523  52   4  subj   7 31
57 5523 209   4  subj   4  9
58 5523 188   4  subj   4  7
70 4016 264   5 indic   9 51
71 4016 195   5 indic   4 14

index  <- c(TRUE,diff(d$roi)!=0)
d2 <- d[index,]

index  <- cumsum(index)
# [[2]] extracts the aggregated values as a vector; [2] would store a
# one-column data frame inside the column
d2$dur <- aggregate(d$dur, list(index=index), sum)[[2]]
d2$x   <- aggregate(d$x, list(index=index), mean)[[2]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Holiday Gift Perl Script for US Holiday Dummy Regressors

2009-12-08 Thread Idgarad
# BEGIN CODE ##



#!/usr/bin/perl
##
#
# --start, -s = The date you would like to start generating regressors
# --end, -e   = When to stop generating holiday regressors
# --scope, -c = D or W, for Daily or Weekly respectively (e.g. does this week
#               have a particular holiday)
# --file, -f  = Ummm, where to write the output, silly!
#
# **NOTE** The EOM "holiday" is "End of Month"; for computer systems this may
# be important for extra processing and what not.
#
# You may need to set your TZ environment variable if the script cannot
# determine your time zone from the system (e.g. SET TZ=CST )
##
use Getopt::Long;
use Date::Manip;
use Spreadsheet::WriteExcel;
use Calendar::Functions;
use Date::Holidays::USFederal;
use Set::Array;
use POSIX qw/strftime/;
use Time::Local;

my @regressors = ();
#my $holidays = Date::Holidays->new(countrycode => 'us');

$result = GetOptions ("start|s=s" => \$start,
   "end|e=s" => \$end,
   "scope|c=s" => \$scope,
   "file|f=s" => \$filename);




print "Generating Holiday Dummy Variables starting $start to $end generated by $scope. Output to $filename\n";

#print all the dates based on scope as a test




$startDate = ParseDate($start);
if (! $startDate) {
    print "Error parsing the start date\n"; exit;
}
$endDate= ParseDate($end);
print "Start Date: ",UnixDate($startDate,"%m/%d/%Y"),"\n";
print "End Date: ",$end,"\n";

print "Last Day in Month: ", UnixDate(ParseDate("last day in JAN 2004"), "%m/%d/%Y"), "\n";



# HEADER OUTPUT
print
"Date,HLY-NewYear,HLY-MLK,HLY-PRES,HLY-MEMORIAL,HLY-J4,HLY-LABOR,HLY-COLUMBUS,HLY-VETS,HLY-THANKS,HLY-XMAS,HLY-ELECT,HLY-PATRIOT,EOM\n";
$baseDate=$startDate;

if ($scope eq "d"){

while(Date_Cmp($baseDate,$endDate)<0)
{
print UnixDate($baseDate,"%m/%d/%Y"), ",";
if(holidayCheck($baseDate) eq "New Year's Day"){print "1,"} else {print "0,"};
if(holidayCheck($baseDate) eq "Martin Luther King, Jr. Day"){print "1,"} else {print "0,"};
if(holidayCheck($baseDate) eq "Presidents' Day"){print "1,"} else {print "0,"};
if(holidayCheck($baseDate) eq "Memorial Day"){print "1,"} else {print "0,"};
if(holidayCheck($baseDate) eq "Independence Day"){print "1,"} else {print "0,"};
if(holidayCheck($baseDate) eq "Labor Day"){print "1,"} else {print "0,"};
if(holidayCheck($baseDate) eq "Columbus Day"){print "1,"} else {print "0,"};
if(holidayCheck($baseDate) eq "Veterans' Day"){print "1,"} else {print "0,"};
if(holidayCheck($baseDate) eq "Thanksgiving Day"){print "1,"} else {print "0,"};
if(holidayCheck($baseDate) eq "Christmas Day"){print "1,"} else {print "0,"};
if(holidayCheck($baseDate) eq "Election Day"){print "1,"} else {print "0,"};
if(holidayCheck($baseDate) eq "U.S. Patriot Day Unofficial Observation"){print "1,"} else {print "0,"};
if(holidayCheck($baseDate) eq "EOM"){print "1"} else {print "0"};
print "\n";

$baseDate=DateCalc($baseDate,"+1 day");
}

} # END IF D

if($scope eq "w") {

while(Date_Cmp($baseDate,DateCalc($endDate,"+7 days"))<0)
{
print UnixDate($baseDate,"%m/%d/%Y"), ",";

if(

(holidayCheck(DateCalc($baseDate,"+0 days")) eq "New Year's Day") ||
(holidayCheck(DateCalc($baseDate,"+1 days")) eq "New Year's Day") ||
(holidayCheck(DateCalc($baseDate,"+2 days")) eq "New Year's Day") ||
(holidayCheck(DateCalc($baseDate,"+3 days")) eq "New Year's Day") ||
(holidayCheck(DateCalc($baseDate,"+4 days")) eq "New Year's Day") ||
(holidayCheck(DateCalc($baseDate,"+5 days")) eq "New Year's Day") ||
(holidayCheck(DateCalc($baseDate,"+6 days")) eq "New Year's Day")

){print "1,"} else {print "0,"};

if(
holidayCheck(DateCalc($baseDate,"+0 days")) eq "Martin Luther King, Jr. Day" ||
holidayCheck(DateCalc($baseDate,"+1 days")) eq "Martin Luther King, Jr. Day" ||
holidayCheck(DateCalc($baseDate,"+2 days")) eq "Martin Luther King, Jr. Day" ||
holidayCheck(DateCalc($baseDate,"+3 days")) eq "Martin Luther King, Jr. Day" ||
holidayCheck(DateCalc($baseDate,"+4 days")) eq "Martin Luther King, Jr. Day" ||
holidayCheck(DateCalc($baseDate,"+5 days")) eq "Martin Luther King, Jr. Day" ||
holidayCheck(DateCalc($baseDate,"+6 days")) eq "Martin Luther King, Jr. Day"
){print "1,"} else {print "0,"};

if(
holidayCheck(DateCalc($baseDate,"+0 days")) eq "Presidents' Day" ||
holidayCheck(DateCalc($baseDate,"+1 days")) eq "Presidents' Day" ||
holidayCheck(DateCalc($baseDate,"+2 days")) eq "Presidents' Day" ||
holidayCheck(DateCalc($baseDate,"+3 days")) eq "Presidents' Day" ||
holidayCheck(DateCalc($baseDate,"+4 days")) eq "Presidents' Day" ||
holidayCheck(DateCalc($baseDate,"+5 days")) eq "Presidents' Day" ||
holidayCheck(DateCalc($baseDate,"+6 days")) eq "Presidents' Day"
){print "1,"} else {print "0,"};

if(
holidayCheck(DateCalc($baseDate,"+0 days")) eq "Memorial Day" ||
holidayCheck(DateCalc($baseDate,"+1 days")) eq "Memorial Day" ||
holidayCheck(DateCalc($baseDate,"+2 days")) eq "Memorial Day" ||
holidayCheck(DateCalc($baseDate,"+3 days")) eq "Memorial Day" ||
holidayCheck(DateCalc($baseDate,"+4 days")) eq "Memorial Day" ||
holidayCheck(DateCalc($baseDate,"+5 days")) eq "Memorial Day" ||
h

Re: [R] conditionally merging adjacent rows in a data frame

2009-12-08 Thread Gray Calhoun
I think there might be a problem with this approach if roi, tid, rt,
and mood are the same for nonconsecutive rows.
--Gray

On Tue, Dec 8, 2009 at 9:29 AM, Nikhil Kaza  wrote:
> How about creating an index using multiple columns.
>
>  a <- with(d, aggregate(dur, list(rt=rt,tid=tid,mood=mood,roi=roi), sum))
>  b <- with(d, aggregate(x, list(rt=rt,tid=tid,mood=mood,roi=roi), mean))
> c <- merge(a, b, by=c("rt","tid","mood", "roi"))
>
> I suppose one could save some time by not running aggregate twice on the
> same dataset, but I am not sure how.
>
> Nikhil
>
> On 8 Dec 2009, at 7:50AM, Titus von der Malsburg wrote:
>
>> Hi, I have a data frame and want to merge adjacent rows if some condition
>> is
>> met.  There's an obvious solution using a loop but it is prohibitively
>> slow
>> because my data frame is large.  Is there an efficient canonical solution
>> for
>> that?
>>
>>> head(d)
>>
>>    rt dur tid  mood roi  x
>> 55 5523 200   4  subj   9  5
>> 56 5523  52   4  subj   7 31
>> 57 5523 209   4  subj   4  9
>> 58 5523 188   4  subj   4  7
>> 70 4016 264   5 indic   9 51
>> 71 4016 195   5 indic   4 14
>>
>> The desired result would have consecutive rows with the same roi value
>> merged.
>> dur values should be added and x values averaged, other values don't
>> differ in
>> these rows and should stay the same.
>>
>>> head(result)
>>
>>    rt dur tid  mood roi  x
>> 55 5523 200   4  subj   9  5
>> 56 5523  52   4  subj   7 31
>> 57 5523 397   4  subj   4  8
>> 70 4016 264   5 indic   9 51
>> 71 4016 195   5 indic   4 14
>>
>> There's also a solution using reshape.  It uses an index for blocks
>>
>>  d$index <- cumsum(c(TRUE,diff(d$roi)!=0))
>>
>> melts and then casts for every column using an appropriate fun.aggregate.
>> However, this is a bit cumbersome and also I'm not sure how to make sure
>> that
>> I get the original order of rows.
>>
>> Thanks for any suggestion.
>>
>>  Titus
>>
>



-- 
Gray Calhoun

Assistant Professor of Economics
Iowa State University



[R] Error of Cox model

2009-12-08 Thread Nick Fankhauser

I have a set of parameter estimates for a multivariable Cox model predicting
survival duration and a data-frame of new measurements for the variables in
the model, as well as the actual survival duration.
Is there a function to estimate the error the model has on predicting
survival on this new set of data?


-- 
View this message in context: 
http://n4.nabble.com/Error-of-Cox-model-tp955382p955382.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] conditionally merging adjacent rows in a data frame

2009-12-08 Thread Nikhil Kaza

How about creating an index using multiple columns.

 a <- with(d, aggregate(dur, list(rt=rt,tid=tid,mood=mood,roi=roi), sum))
 b <- with(d, aggregate(x, list(rt=rt,tid=tid,mood=mood,roi=roi), mean))

c <- merge(a, b, by=c("rt","tid","mood", "roi"))

I suppose one could save some time by not running aggregate twice on  
the same dataset, but I am not sure how.


Nikhil

On 8 Dec 2009, at 7:50AM, Titus von der Malsburg wrote:

Hi, I have a data frame and want to merge adjacent rows if some condition is
met.  There's an obvious solution using a loop but it is prohibitively slow
because my data frame is large.  Is there an efficient canonical solution for
that?


head(d)

rt dur tid  mood roi  x
55 5523 200   4  subj   9  5
56 5523  52   4  subj   7 31
57 5523 209   4  subj   4  9
58 5523 188   4  subj   4  7
70 4016 264   5 indic   9 51
71 4016 195   5 indic   4 14

The desired result would have consecutive rows with the same roi value merged.
dur values should be added and x values averaged; other values don't differ in
these rows and should stay the same.


head(result)

rt dur tid  mood roi  x
55 5523 200   4  subj   9  5
56 5523  52   4  subj   7 31
57 5523 397   4  subj   4  8
70 4016 264   5 indic   9 51
71 4016 195   5 indic   4 14

There's also a solution using reshape.  It uses an index for blocks

 d$index <- cumsum(c(TRUE,diff(d$roi)!=0))

melts and then casts for every column using an appropriate fun.aggregate.
However, this is a bit cumbersome and also I'm not sure how to make sure that
I get the original order of rows.

Thanks for any suggestion.

 Titus



Re: [R] lm: RME vs. ML

2009-12-08 Thread Peter Dalgaard
jlu...@ria.buffalo.edu wrote:
> You need to give your criteria for "preferable".  For normal-linear 
> models, REML estimates of variances are unbiased, whereas ML estimates are 
> downwardly biased.  

I suspect that you can't actually say anything general about the
direction of the bias, except for the residual term. In the cases that
can be analyzed explicitly, variance estimates are complicated linear
combinations of sums of squares, and if those are not biased by the same
relative amount, the net result could conceivably be a positive bias. I
could be wrong though.
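For the residual term, at least, the direction is easy to demonstrate in the plain linear-model case, where the ML estimate divides the residual sum of squares by n and the REML (here: unbiased) estimate divides by n - p. A small illustration on synthetic data, nothing from the thread:

```r
set.seed(1)
n <- 30; p <- 2                     # p = number of regression coefficients
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)
fit <- lm(y ~ x)
rss <- sum(resid(fit)^2)
c(ML = rss / n, REML = rss / (n - p))
## The ML estimate is smaller by the fixed factor (n - p)/n --
## the downward bias of the residual variance mentioned above.
```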

> My intuition is that the ML-induced bias would be 
> worse in small samples. I don't know about other distributions. Likewise I 
> don't know about MSE or other criterion for preference.
> 


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907



Re: [R] lm: RME vs. ML

2009-12-08 Thread John Sorkin
Your question is well taken. I did not give any criteria because I realized
there might be different answers based on different criteria. Certainly one
fundamental criterion would be that the estimates are BLUE, but this is not the
only criterion one might use.
John
-Original Message-
From: 
To: John Sorkin 
Cc:  

Sent: 12/8/2009 9:39:28 AM
Subject: Re: [R] lm: RME vs. ML

You need to give your criteria for "preferable".  For normal-linear 
models, REML estimates of variances are unbiased, whereas ML estimates are 
downwardly biased.  My intuition is that the ML-induced bias would be 
worse in small samples. I don't know about other distributions. Likewise I 
don't know about MSE or other criterion for preference.






"John Sorkin"  
Sent by: r-help-boun...@r-project.org
12/07/2009 09:24 PM

To

cc

Subject
[R] lm: RME vs. ML






windows XP
R 2.10

As pointed out by Profs. Venables and Ripley (MASS, 4th edition, p. 275), the 
results obtained from lme using method="ML" and method="REML" are often 
different, especially for small datasets. Is there any way to determine 
which method is preferable for a given set of data?
Thanks,
john


John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)




[R] Difference in S.E. gee/yags and geeglm(/geese)

2009-12-08 Thread Torleif Markussen Lunde
Hi

A quick question. Standard errors reported by gee/yags differs from the ones in 
geeglm (geepack). 


require(gee)
require(geepack)
require(yags)

mm <- gee(breaks ~ tension, id=wool, data=warpbreaks,
  corstr="exchangeable")
mm2 <- geeglm(breaks ~ tension, id=wool, data=warpbreaks,
  corstr="exchangeable", std.err = "san.se")
mm3 <- yags(breaks ~ tension, id=wool, data=warpbreaks,
  corstr="exchangeable", alphainit=0.)

# S.E.
# gee:
sqrt(diag(mm$robust.variance))
# (Intercept)    tensionM    tensionH
#        5.77        7.46        3.73
# geeglm:
sqrt(diag(mm2$geese$vbeta))
# [1]  8.17 10.56  5.28
# yags:
sqrt(diag(slot(mm3, "robust.parmvar")))
# [1] 5.77 7.46 3.73

Any explanation of this behavior is welcome.

Best wishes
Torleif Markussen Lunde
PhD candidate
Centre for International Health/
Bjerkens Centre for Climate Research
University of Bergen
Norway




Re: [R] Serial Correlation in panel data regression

2009-12-08 Thread Millo Giovanni
Dear Sayan,

there is a vcovHC method for panel models doing the White-Arellano covariance 
matrix, which is robust vs. heteroskedasticity *and* serial correlation, 
although in a different way from that of vcovHAC. You can supply it to coeftest 
as well, just as you did. The point is in estimating the model as a panel model 
in the first place.

So this should do what you need:

data("Gasoline", package="plm")
Gasoline$f.year=as.factor(Gasoline$year)
library(plm)
rhs <- "-1 + f.year + lincomep+lrpmg+lcarpcap"
pm1<- plm(as.formula(paste("lgaspcar ~", rhs)), data=Gasoline, model="pooling")
library(lmtest)
coeftest(pm1, vcov=vcovHC)

Please refer to the package vignette for 'plm' to check what it does exactly. 
Let me know if there are any issues.

Best,
Giovanni



-Original Message-
From: Achim Zeileis [mailto:achim.zeil...@wu-wien.ac.at]
Sent: Tue 08/12/2009 13.48
To: sayan dasgupta
Cc: r-help@R-project.org; yves.croiss...@let.ish-lyon.cnrs.fr; Millo Giovanni
Subject: Re: Serial Correlation in panel data regression
 
On Tue, 8 Dec 2009, sayan dasgupta wrote:

> Dear R users,
> I have a question here
> 
> library(AER)
> library(plm)   
> library(sandwich)
> ## take the following data
> data("Gasoline", package="plm")
> Gasoline$f.year=as.factor(Gasoline$year)
> 
> Now I run the following regression
> 
> rhs <- "-1 + f.year + lincomep+lrpmg+lcarpcap"
> m1<- lm(as.formula(paste("lgaspcar ~", rhs)), data=Gasoline)
> ###Now I want to find the autocorrelation,heteroskedasticity adjusted
> standard errors as a part of coeftest
> ### Basically I would like to take care of the within country serial
> correlaion
> 
> ###that is I want to do
> coeftest(m1, vcov=function(x) vcovHAC(x,order.by=...))
> 
> Please suggest what should be the argument of order.by and whether that will
> give me the desired result

Currently, the default vcovHAC() method just implements the time series 
case. A generalization to panel data is not yet available.

Maybe Yves and Giovanni (authors of "plm") have done something in that 
direction...

sorry,
Z
 



[R] Fwd: conditionally merging adjacent rows in a data frame

2009-12-08 Thread Gray Calhoun
I think I forgot to send the original to the mailing list, so I'm
forwarding it (see below).  Sorry about that (and sorry if I did
remember and this is a duplicate).  After a few more minutes of
thought, I realized that you should probably make sure that rt, tid,
and mood are also the same in consecutive rows when constructing the
'consecutiveROI' vector (just as an additional error check).

There may be a built-in function that could replace the first three lines, as
well.

Best,
Gray

-- Forwarded message --
From: Gray Calhoun 
Date: Tue, Dec 8, 2009 at 6:42 AM
Subject: Re: [R] conditionally merging adjacent rows in a data frame
To: Titus von der Malsburg 


Hi Titus,
 This solution isn't great and will probably need some work on your
part.  The basic idea is to create a new index that is shared by
consecutive rows with the same value of roi, then just aggregate by
the new index

> consecutiveROI <- d$roi[-1] == d$roi[1:(length(d$roi)-1)]
> newindex <- 1:dim(d)[1]
> newindex[c(consecutiveROI, FALSE)] <- newindex[c(FALSE, consecutiveROI)]

> a <- aggregate(d$x, list(newindex = newindex), mean)

And the same for dur.  You can get the unique rows of d with

> d[a$newindex,]

There may be bugs, but I think this general approach will work well.

--Gray

On Tue, Dec 8, 2009 at 6:50 AM, Titus von der Malsburg wrote:
> Hi, I have a data frame and want to merge adjacent rows if some condition is
> met.  There's an obvious solution using a loop but it is prohibitively slow
> because my data frame is large.  Is there an efficient canonical solution for
> that?
>
>> head(d)
>     rt dur tid  mood roi  x
> 55 5523 200   4  subj   9  5
> 56 5523  52   4  subj   7 31
> 57 5523 209   4  subj   4  9
> 58 5523 188   4  subj   4  7
> 70 4016 264   5 indic   9 51
> 71 4016 195   5 indic   4 14
>
> The desired result would have consecutive rows with the same roi value merged.
> dur values should be added and x values averaged, other values don't differ in
> these rows and should stay the same.
>
>> head(result)
>     rt dur tid  mood roi  x
> 55 5523 200   4  subj   9  5
> 56 5523  52   4  subj   7 31
> 57 5523 397   4  subj   4  8
> 70 4016 264   5 indic   9 51
> 71 4016 195   5 indic   4 14
>
> There's also a solution using reshape.  It uses an index for blocks
>
>  d$index <- cumsum(c(TRUE,diff(d$roi)!=0))
>
> melts and then casts for every column using an appropriate fun.aggregate.
> However, this is a bit cumbersome and also I'm not sure how to make sure that
> I get the original order of rows.
>
> Thanks for any suggestion.
>
>  Titus
>
>



--
Gray Calhoun

Assistant Professor of Economics
Iowa State University



-- 
Gray Calhoun

Assistant Professor of Economics
Iowa State University



Re: [R] Modula Generators

2009-12-08 Thread Hans W Borchers
Sam K  yahoo.co.uk> writes:
> 
> Hi all,
> 
> Is there a function in R for calculating modular generators? For example, for
> primes above 100, e.g. 157, I want to know which number generates the group
> under multiplication mod 157, i.e. I want to find an element whose order is
> 156. The problem I run into is that modular arithmetic becomes inaccurate
> when dealing with large numbers.

In other words, you are looking for a 'primitive root' in the field F_p
for a prime p. By the way, there are phi(p-1) of them, where phi is
Euler's totient function.

There is no known efficient deterministic algorithm for this unless the
prime factorization of p-1 is given, and even then the standard algorithm
is randomized.

In case your primes do not get too big, say between 100 and 1000, an
exhaustive search is feasible, somewhat simplified by the Euler
criterion.
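For primes in that range the exhaustive search is only a few lines of R (a sketch; it finds each candidate's multiplicative order by brute force, which stays exact in doubles as long as (p-1)^2 is well below 2^53):

```r
## Smallest primitive root mod a prime p, by exhaustive search
primitive.root <- function(p) {
  for (g in 2:(p - 1)) {
    ## multiplicative order of g mod p: smallest e >= 1 with g^e = 1 (mod p)
    x <- g; ord <- 1
    while (x != 1) {
      x <- (x * g) %% p
      ord <- ord + 1
    }
    if (ord == p - 1) return(g)   # g generates the whole group
  }
  NA   # unreachable when p is prime
}
primitive.root(157)   # 5 is the smallest primitive root of 157
```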

Or you locate and read in some tables listing primitive roots. You can
find more information in Wikipedia or MathWorld.

For exact modular arithmetic you may turn to the 'gmp' R package.

Regards
Hans Werner

> Thanks for any help given.
> 
> Sam
>



[R] Printing 'k' levels of factors 'n' times each, but 'n' is unequal for all levels ?

2009-12-08 Thread A Singh

Dear List,

I need to print out each of the 'k' levels of a factor 'n' times each, where 
'n' is the number of elements belonging to that level.


I know that this can normally be done using the gl() command, but in my case 
each level has an unequal number of elements.

Example with code is as below:

vc<-read.table("P:\\Transit\\CORRECT 
files\\Everything-newest.csv",header=T, sep=",", dec=".", na.strings=NA, 
strip.white=T)


vcdf<-data.frame(vc)

tempdf<-data.frame(cbind(vcdf[,1:3], vcdf[,429]))
newtemp<-na.exclude(tempdf)

newtemp[,2]<-factor(newtemp[,2])

groupmean<-tapply(newtemp[,4], newtemp[,2], mean)

newmark<-factor(groupmean, exclude=(groupmean==0 | groupmean==1))
newmark

This is what the output is (going up to 61 levels):

                1                 2                 3                 4
0.142857142857143             0.444
                5                 6                 8                 9
             0.33     0.09090909090      0.3846153846

... up to level 61


The variable 'groupmean' holds the means of newtemp[,4] for the 61 levels (k) 
specified by newtemp[,2].


I now want to be able to print out each value of 'groupmean'  as many times 
as there are elements in the group for which each is calculated.


So, for example, if level 1 of newtemp[,2] has about 15 elements, its mean 
(0.1428...) should be printed 15 times; if level 2 has 12 elements, its mean 
should be printed 12 times; and so on.


Is there a way of specifying that a list should be populated with replicates 
of the group means, based on the group sizes obtained from newtemp[,2]?


I just can't seem to figure this out by myself.
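The repetition being asked for can be sketched as follows (with a small stand-in for the data, since the real newtemp is not available; the names() indexing keeps group sizes aligned with the means):

```r
## Stand-in for the thread's data: a grouping factor with unequal
## level sizes (the real one is newtemp[, 2] with 61 levels)
g <- factor(c("a", "a", "a", "b", "c", "c"))
v <- c(1, 2, 3, 10, 4, 6)
groupmean <- tapply(v, g, mean)           # one mean per level

## Repeat each group mean once per element of its level
n.per.group <- table(g)
expanded <- rep(groupmean, times = n.per.group[names(groupmean)])
expanded                                  # 2 2 2 10 5 5, in per-level order

## Or, aligned row-by-row with the original data:
ave(v, g, FUN = mean)
```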

Many thanks for your help.

Aditi

--
A Singh
aditi.si...@bristol.ac.uk
School of Biological Sciences
University of Bristol


