Re: [R] Vim R plugin-2

2009-05-09 Thread Jose Quesada
Jakson A. Aquino jaksonaquino at gmail.com writes:


 Dear R users,

 People who use vim on Linux/Unix may be interested in checking out the
 plugin for R that I'm developing:

   http://www.vim.org/scripts/script.php?script_id=2628

 The plugin includes omni completion for R objects, code indentation
 and communication with R running in a terminal emulator (xterm or
 gnome-terminal). This last feature was already present in Johannes
 Ranke's plugin.

 I would like to know if you have any suggestions of improvements.

 Best regards,



Excellent work!
I wonder if this could be made more portable (e.g., it depends on Perl, plus
R running in a tty terminal, which is not always the case).

I'll try to look at it and see if I can port it so it works on Windows. But
the only communication method I can use there is the clipboard, so I'm not
sure it'll be possible.

Are there any alternative ways of sending information both ways between R
and an open process (vim) on Windows?

I have, and like, Perl; the only thing that is missing is the tty support on
Windows (I think!)

-- 
Jose Quesada, PhD.
Max Planck Institute,
Center for Adaptive Behavior and Cognition -ABC-, 
Lentzeallee 94, office 224, 14195 Berlin
http://www.josequesada.name/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] caching of the .Rprofile file

2009-05-09 Thread Tobias Verbeke

Hi Tom,


It seems that if I make a change to the .Rprofile file in my working
directory, it is not immediately reflected when the session is
restarted. (I am using statET and rJava)

Is that something I should expect?


No.

Is your launch configuration of R in StatET configured
to take ${resource_loc} as the working directory
(Main tab of the launch configuration)?

That way you can select the directory you want as the working
directory in the Project Explorer and launch R directly in
there. If you do not launch R that way, it will use a
default directory and therefore not load the .Rprofile from
the specific directory you want to be the working directory.

HTH,
Tobias



[R] (no subject)

2009-05-09 Thread Jaana Kettunen

 

Could you help me with a problem? I should put non-linear variables into a
Zelig model; how can that be done? I'm dealing with air pollution data,
trying to find out daily associations between mortality and air pollutants.
Weather variables used as confounders are in some cases non-linear. Since
smoothing is not an option, I don't know how to proceed.

 

 Thanks, Jaana


_
Hotmail® has ever-growing storage! Don’t worry about storage limits.

rial_Storage1_052009
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] new book on (Perl and) R for computational biology

2009-05-09 Thread Daniel Viar
It looks like the correct link is:

http://www.crcpress.com/product/isbn/9781420069730


On Fri, May 8, 2009 at 6:49 PM, Gabriel Valiente valie...@lsi.upc.edu wrote:
 There is a new book on (Perl and) R for computational biology,

 G. Valiente. Combinatorial Pattern Matching Algorithms in Computational
 Biology using Perl and R. Taylor & Francis/CRC Press (2009)

 http://www.crcpress.com/product/isbn/9781420063677

 I hope it will be of much use to R developers and users.

 Gabriel



[R] Strip labels: use xyplot() to plot columns in parallel with outer=TRUE

2009-05-09 Thread John Maindonald

The following tinkers with the strip labels, where the
different panels are for different levels of a conditioning
factor.

library(lattice)
tau <- (0:5)/2.5; m <- length(tau); n <- 200; SD <- 2
x0 <- rnorm(n, mean=12.5, sd=SD)
matdf <- data.frame(
   x = as.vector(sapply((0:5)/2.5, function(s) x0 + rnorm(n, sd=2*s))),
   y = rep(15 + 2.5*x0, m), taugp = factor(rep(tau, rep(n, m))))
names(matdf) <- c("x", "y", "taugp")
lab <- c(list("0 (No error in x)"),
         lapply(tau[-1], function(x) substitute(A*s[z], list(A=x))))
xyplot(y ~ x | taugp, data=matdf,
   strip=strip.custom(strip.names=TRUE,
      var.name="Add error with SD", sep=expression(" = "),
      factor.levels=as.expression(lab)))
Is there any way to get custom labels when the same is done by
plotting variables in parallel?

df <- unstack(matdf, x ~ taugp)
df$y <- 15 + 2.5*x0
lab2 <- c(list("0 (No error in x)"),
          lapply(tau[-1], function(x)
            substitute("Add error with SD" == A*s[z], list(A=x))))
form <- formula(paste("y ~", paste(paste("X", tau, sep=""), collapse="+")))
xyplot(form, data=df, outer=TRUE)

I'd hoped that the following would do the trick, but the first label
is repeated in each panel, and the variable names are added:

xyplot(form, data=df, outer=TRUE, strip=strip.custom(strip.names=TRUE,
       var.name=as.expression(lab2)))

John Maindonald             email: john.maindon...@anu.edu.au
phone: +61 2 6125 3473      fax: +61 2 6125 5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.



[R] pdf transparency not working with Latex documents

2009-05-09 Thread jgarcia
Hello,
I'm using the pdf() device with bg="transparent" to create plots to be used
within a LaTeX (beamer) presentation.

Later on, I see that the background of my pdf() graphics is solid white in
the final presentation.

I'm using R-2.6.0, and I have also tried to set the version argument in
pdf() to 1.5 and 1.6. Later versions are not accepted.

Has anyone used transparency successfully in this way?

Thanks, and best regards,
Javier
...



[R] pdf transparency not working with Latex documents. Solved

2009-05-09 Thread jgarcia
Hi,
I've found that after the call to pdf() I had a later line:
par(bg="white")
that was creating the white background. Setting this to "transparent" works
fine.
Thanks,
Javier
...



Re: [R] Merging two data frames with 3 common variables makes duplicated rows

2009-05-09 Thread Rocko22

Thomas,

You are very clever! The meil2 data frame contains each combination of the
common variables twice:

 meil2
   dist sexe style     meil
1    38    F  clas 02:43:17
2    38    F  free 02:24:46
3    38    H  clas 02:37:36
4    38    H  free 01:59:35
5    45    F  clas 03:46:15
6    45    F  free 02:20:15
7    45    H  clas 02:30:07
8    45    H  free 01:59:36
9    38    F  clas 02:43:17
10   38    F  free 02:24:46
11   38    H  clas 02:37:36
12   38    H  free 01:59:35
13   45    F  clas 03:46:15
14   45    F  free 02:20:15
15   45    H  clas 02:30:07
16   45    H  free 01:59:36

Keeping only the unique combinations merged correctly with the other data
frame. This merge() function is more subtle than I first thought. It means
that when merging two data frames, if the result has more rows than either
input, there are duplicate combinations of the common variables in one or
both of the data frames.
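A minimal illustration of the effect (toy data with made-up names, not the race results):

```r
# Lookup table whose key combinations are duplicated (toy data)
tmv2   <- data.frame(id = c("a", "b"), dist = c(38, 45))
lookup <- data.frame(dist = c(38, 45, 38, 45),
                     meil = c("02:43:17", "03:46:15",
                              "02:43:17", "03:46:15"))

nrow(merge(tmv2, lookup))          # 4 rows: each id matched twice
nrow(merge(tmv2, unique(lookup)))  # 2 rows: one match per id, as intended
```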

Thank you very much, I will try to be more careful about this.

Rock


Thomas Lumley wrote:
 
 On Fri, 8 May 2009, Rock Ouimet wrote:
 
 I am new to R (ex SAS user) , and I cannot merge two data frames without
 getting duplicated rows in the results. How to avoid this happening
 without
 using the unique() function?

 1. First data frame is called tmv with 6 variables and 239 rows:

 tmv[1:10,]
      temps       nom        prenom sexe dist style
1  01:59:36       Cyr         Steve    H   45  free
2  02:09:55  Gosselin         Erick    H   45  free
3  02:12:18 Desfosses         Sacha    H   45  free
4  02:12:23  Lapointe     Sebastien    H   45  free
5  02:12:52    Labrie        Michel    H   45  free
6  02:12:54   Leblanc        Michel    H   45  free
7  02:13:02 Thibeault       Sylvain    H   45  free
8  02:13:49    Martel      Stephane    H   45  free
9  02:14:03    Lavoie Jean-Philippe    H   45  free
10 02:14:05    Boivin   Jean-Claude    H   45  free

 Its structure is:
 str(tmv)
'data.frame':   239 obs. of  6 variables:
 $ temps : Class 'times'  atomic [1:239] 0.0831 0.0902 0.0919 0.0919 0.0923 ...
  .. ..- attr(*, "format")= chr "h:m:s"
 $ nom   : Factor w/ 167 levels "Aubut","Audy",..: 45 84 55 105 98 110 158 117 109 22 ...
 $ prenom: Factor w/ 135 levels "Alain","Alexandre",..: 128 33 121 122 93 93 130 126 63 59 ...
 $ sexe  : Factor w/ 2 levels "F","H": 2 2 2 2 2 2 2 2 2 2 ...
 $ dist  : int  45 45 45 45 45 45 45 45 45 45 ...
 $ style : Factor w/ 2 levels "clas","free": 2 2 2 2 2 2 2 2 2 2 ...


2. The second data frame is called meil2, with 4 variables and 16 rows:
 meil2[1:10,]
   dist sexe style     meil
1    38    F  clas 02:43:17
2    38    F  free 02:24:46
3    38    H  clas 02:37:36
4    38    H  free 01:59:35
5    45    F  clas 03:46:15
6    45    F  free 02:20:15
7    45    H  clas 02:30:07
8    45    H  free 01:59:36
9    38    F  clas 02:43:17
10   38    F  free 02:24:46
 
 
 Lines 9 and 1 appear to be the same in meil2, as do 2 and 10.  If the 16
 rows consist of two repeats of 8 rows that would explain why you are
 getting two copies of each individual in the output. unique(meil2) would
 have just the distinct rows.
 
   -thomas
 
 Thomas Lumley Assoc. Professor, Biostatistics
 tlum...@u.washington.edu  University of Washington, Seattle
 



Re: [R] Gantt chart but slightly different

2009-05-09 Thread Jim Lemon

Beata Czyz wrote:

Hello,
I am new to this list and rather new to graphics with R.
I would like to make a chart like Gantt chart, something like that:
...
but I would like to fill the different blocks of tasks with different
patterns, i.e., first blocks of Male 1 and Male 2 with pattern 1, second
blocks of Male 1 and Male 2 with pattern 2, etc.
Any idea?


Hi Beata,
This could be done by replacing the taskcolors argument in the 
gantt.chart function with an angle argument and passing that 
argument to the rect function that draws the bars. You could then get 
hatching of different directions instead of colors. Like this:


gantt.chart <- function(x=NULL, format="%Y/%m/%d", xlim=NULL,
 angle=c(45,45,90,90,135,135),
 priority.legend=FALSE, vgridpos=NULL, vgridlab=NULL, vgrid.format="%Y/%m/%d",
 half.height=0.25, hgrid=FALSE, main="", xlab="", cylindrical=FALSE) {

(a great chunk of the gantt.chart function)

rect(x$starts[x$labels==tasks[i]], topdown[i]-half.height,
  x$ends[x$labels==tasks[i]], topdown[i]+half.height,
  density=20,  # note: angle only produces hatching when density is also set
  angle=angle[i], border=FALSE)


(the rest of the gantt.chart function)

Jim



Re: [R] (no subject)

2009-05-09 Thread David Winsemius


On May 9, 2009, at 5:39 AM, Jaana Kettunen wrote:





Could you help me with a problem? I should put non-linear variables into
a Zelig model; how can that be done? I'm dealing with air pollution data,
trying to find out daily associations between mortality and air pollutants.
Weather variables used as confounders are in some cases non-linear. Since
smoothing is not an option I don't know how to proceed.


You should search within the Zelig documentation for examples of
regression splines. It is not difficult to find these. This document
has many such examples and it was the first hit on a Google search:

http://gking.harvard.edu/zelig/docs/zelig.pdf
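As a sketch of what a regression-spline term looks like in a plain glm(); the same formula syntax carries over to zelig(). All variable names below are invented for illustration, not taken from the original question:

```r
library(splines)

# Simulated daily data (purely illustrative; not real pollution data)
set.seed(1)
dat <- data.frame(temp = runif(365, -5, 30),
                  pm10 = runif(365, 10, 80))
dat$deaths <- rpois(365, exp(2 + 0.01 * dat$pm10 +
                             0.002 * (dat$temp - 15)^2))

# Model the non-linear confounder with a natural cubic spline
# instead of a smoother
fit <- glm(deaths ~ pm10 + ns(temp, df = 4), family = poisson, data = dat)
summary(fit)$coefficients["pm10", ]
```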


David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] Vim R plugin-2

2009-05-09 Thread Jose Quesada
Jakson Alves de Aquino wrote:
 Jose Quesada wrote:
   
 I'll try to look at it and see if I can port it so it works on windows.
 But the
 current communication method I use there are just the clipboard, not sure if
 it'll be possible.
 

 Unfortunately, I cannot help on Windows environment.

   
 Any alternative ways of sending info both ways from R to any open
 process (vim)
 in windows?
 

 Netbeans could be a fast and portable route for communication between R
 and vim without the need for either pipes or files saved on disk as an
 intermediary. Typing :help netbeans in vim should show the netbeans
 documentation. R also has TCP support: write.socket(), read.socket(),
 etc. We could begin to explore this route...

   
Great find!
I wish I had more time to dedicate to this.

Have you seen:
http://pyclewn.wiki.sourceforge.net/features+

In my view, R as a language is very good, but the tools around it are
not good.
When a Matlab person tries R, their first comments are always about how
poor the environment is.
Sure, one can have a debugger (with a crappy Tk GUI), and there's
some editor support, but it's kind of painful.

Integrating an R debugger with something like pyclewn would be very good.

Best,
-jose




Re: [R] Vim R plugin-2

2009-05-09 Thread Tobias Verbeke

Hi Jose,

Jose Quesada wrote:

snip


In my view, R as a language is very good, but the tools around it are
not good.
When a Matlab person tries R, their first comments are always about how
poor the environment is.
Sure, one can have a debugger (with a crappy Tk GUI), and there's
some editor support, but it's kind of painful.

Integrating an R debugger with something like pyclewn would be very good.


There's no integrated debugger yet, but the StatET plugin
for Eclipse is one example of a mature development environment
for R. Moreover, it lets you leverage the Eclipse ecosystem
and its myriad plug-ins. No painful experience at all
for me.

http://www.walware.de/goto/statet

Best,
Tobias

P.S. When I try Matlab my first comment is always how poor
the language is ;-)



Re: [R] Beyond double-precision?

2009-05-09 Thread joaks1

Yes, all of the numbers are positive.  I actually have a Bayesian posterior
sample of log likelihoods [i.e. thousands of ln(likelihood) scores].  I want
to calculate the harmonic mean of these likelihoods, which means I need to
convert them back into likelihoods [i.e. e^ln(likelihood)], calculate the
harmonic mean, and then take the log of the mean.  I have done this before
in Mathematica, but I have a simulation pipeline written almost entirely in
R, so it would be nice if I could do these calculations in R.

If R cannot handle such small values, then perhaps there's a way to
calculate the harmonic mean from the log likelihood scores without
converting back to likelihoods?  I am a biologist, not a mathematician, so
any recommendations are welcome!  Thanks! -Jamie


spencerg wrote:
 
   Are all your numbers positive?  If yes, have you considered using 
 logarithms? 
 
   I would guess it is quite rare for people to compute likelihoods.  
 Instead I think most people use log(likelihoods).  Most of the 
 probability functions in R have an option of returning the logarithms. 
 
   Hope this helps. 
   Spencer
 
 joaks1 wrote:
 I need to perform some calculations with some extremely small numbers
 (i.e.
 likelihood values on the order of 1.0E-16,000).  Even when using the
 double() function, R is rounding these values to zero.  Is there any way
 to
 get R to deal with such small numbers?

 For example, I would like to be able to calculate e^-1 (i.e.
 exp(-1)) without the result being rounded to zero.

 I know I can do it in Mathematica, but I would prefer to use R if I can. 
 Any help would be appreciated!

 Many Thanks in Advance!

 


Re: [R] Citing R/Packages Question

2009-05-09 Thread roger koenker

I've had an email exchange with the authors of a recent paper in Nature
who also made a good-faith effort to cite both R and the quantreg
package, and were told that the Nature house style didn't allow such
citations, so they were dropped from the published paper and from the
supplementary material appearing on the Nature website.

Since the CRAN website makes a special effort to make prior versions
of packages available, it would seem to me much more useful to cite
version numbers than access dates.  There are serious questions about
the ephemerality of URL citations and of access dating, not all of
which are adequately resolved by the Wayback Machine, but it would be
nice to have better standards for such contingent citations rather
than leave authors at the mercy of copy editors.  I would also be
interested in suggestions from other contributors.


url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: rkoen...@uiuc.edu           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On May 8, 2009, at 5:27 PM, Derek Ogle wrote:


I used R and the quantreg package in a manuscript that is currently in
the proofs stage.  I cited both R and quantreg as suggested by
citation() and noted the version of R and quantreg that I used in the
main text as



 All tests were computed with the R v2.9.0 statistical programming
language (R Development Core 2008).  Quantile regressions were conducted
with the quantreg v4.27 package (Koenker 2008) for R.



The editor has asked me to also provide the date when the webpage was
accessed for both R and quantreg.



This does not seem like an appropriate request to me as both R and the
quantreg package are versioned.  This request seems to me to be the same
as asking someone when they purchased commercial package X version Y
(which I don't think would be asked).



Am I thinking about this correctly or has the editor made a valid
request?



I would be interested in any comments or opinions.



Dr. Derek H. Ogle
Associate Professor of Mathematical Sciences and Natural Resources
Northland College
1411 Ellis Avenue
Box 112
Ashland, WI
715.682.1300
www.ncfaculty.net/dogle/






Re: [R] Vim R plugin-2

2009-05-09 Thread Leo

 Any alternative ways of sending info both ways from R to any open
 process (vim)
 in windows?

On Windows, I'd rather use OLE automation. A few years ago I
successfully used this plugin:
http://www.vim.org/scripts/script.php?script_id=889

I haven't used it since though.



[R] for loop

2009-05-09 Thread aledanda

Hi,
I need your help.
I have a vector of numbers reflecting the switch in the perception of a
figure. For a certain period I have positive numbers (which reflect
perception A), then the perception changes and I have negative numbers
(perception B), and so on for 4 iterations. I need the rate of this switch
for my analysis. Namely, I need a new vector whose numbers reflect how many
digits follow in sequence before the change in perception (I then take the
reciprocal of these numbers to obtain the rate, but that is not a problem).
For example, suppose that the new vector looks like this: new <- c(5,7,8,9),
i.e. 5 positive numbers, then the perception changes and 7 negative numbers
follow, then it changes again and 8 positive numbers follow, and so on.
In brief, I need to write a little script that detects the change in sign of
the elements of the vector and counts how many positive and how many
negative digits there are in sequence.

I would use a for loop; I started, but then I don't know how to continue:

rate <- vector()
for(i in 1:length(a)) rate <- (a[i] > 0

..can you help me?

Alessandra



Re: [R] Rmysql linking to an old-ish mysql build

2009-05-09 Thread Uwe Ligges



Jose Quesada wrote:

Hi,

I'm trying to get RMySQL to work on Windows Server 2008 64-bit.
I have the latest build of MySQL installed (mysql-5.1.34-winx64).



Independent of the version number of MySQL (which is less than 6 months
old):
If you are talking about the RMySQL binary build on CRAN: it is built
against a 32-bit version of MySQL. I am not sure if there is a safe way
to build a binary that properly links against 64-bit MySQL given you are
running 32-bit R.

If there is, you have to install the package from sources yourself anyway.

Best,
Uwe Ligges




When trying to load RMySQL, I got a warning that RMySQL is linking to an
old-ish MySQL build (5.0.67).
I could do some basic stuff (the connection works) but it breaks when
trying to read a large table.

So I set out to use the build 5.0.67 that RMySQL likes.

10 hrs later and after lots of sysadmin work, I have to call it quits. I
couldn't make it work.

Since MySQL 5.0.67 is pretty old, I was wondering if anyone has
binaries for RMySQL that work with a more recent version.
Maybe the authors of the package have plans to update it soon?

I've tried the package on both R 2.9.0 and R 2.8.1.

If nothing comes up, I'll try to spend a few more hours on getting the
old version to work.

Thanks!

Best,
-Jose





Re: [R] Beyond double-precision?

2009-05-09 Thread spencerg
Dear Jamie: 

 The harmonic mean is exp(mean(logs)).  Therefore, log(harmonic 
mean) = mean(logs). 


 Does this make sense?
 Best Wishes,
 Spencer 




Re: [R] for loop

2009-05-09 Thread Uwe Ligges




See ?sign and ?rle which together yield:

a <- c(-1, -2, -3, 1, 2, -1)
rle(sign(a))
# Run Length Encoding
#   lengths: int [1:3] 3 2 1
#   values : num [1:3] -1 1 -1

## or just the vector you want:
rle(sign(a))$lengths
# [1] 3 2 1
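To get the switch rates asked about, take the reciprocals of those run lengths (a small sketch using the example counts from the question):

```r
# Toy vector with runs of 5 positive, 7 negative, 8 positive, 9 negative
a <- c(rep(1, 5), rep(-1, 7), rep(1, 8), rep(-1, 9))

runs <- rle(sign(a))$lengths
runs               # 5 7 8 9
rate <- 1 / runs   # reciprocal of each run length gives the rate
rate
```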

Uwe Ligges









Re: [R] Problem with package SNOW on MacOS X 10.5.5

2009-05-09 Thread Rainer Machne

Hi Greg,

I don't know if this is related to your problem, but
I get the same error (on both Ubuntu and Fedora Linux, R 2.9) and just
found a very curious behaviour: snowfall apply functions don't like the
variable name c.


E.g.:

c <- 1
sfLapply(1:10, exp)

issues the same error you had posted, while subsequent

rm(c)
sfLapply(1:10, exp)

runs fine.


Rainer



On Wed, 31 Dec 2008, Greg Riddick wrote:

 Hello All,

 I can run the lower level functions OK, but many of the higher level
 (e.g. parSApply) functions are generating errors.

 When running the example (from the snow help docs) for parApply on
 MacOSX 10.5.5, I get the
 following error:


 cl <- makeSOCKcluster(c("localhost", "localhost"))
 sum(parApply(cl, matrix(1:100, 10), 1, sum))

 Error in do.call(fun, lapply(args, enquote)) :
 could not find function "fun"



 Any ideas? Do I possibly need MPI or PVM to run the Apply functions?

 Thanks,




[R] Histogram frequencies with a normal pdf curve overlay

2009-05-09 Thread Jacques Wagnor
Dear List,

When I plot a histogram with 'freq=FALSE' and overlay the histogram
with a normal pdf curve, everything looks as expected, as follows:

x <- rnorm(1000)
hist(x, freq=FALSE)
curve(dnorm(x), add=TRUE, col="blue")

What do I need to do if I want to show the frequencies (freq=TRUE)
with the same normal pdf overlay, so that the plot would still look
the same?

Regards,

Jacques

platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          8.0
year           2008
month          10
day            20
svn rev        46754
language       R
version.string R version 2.8.0 (2008-10-20)
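One common answer (a sketch, not from the thread) is to rescale the density curve by n times the bin width, since a frequency histogram's bar heights are counts = density * n * binwidth for equal-width bins:

```r
set.seed(1)
x  <- rnorm(1000)
h  <- hist(x, freq = TRUE)            # frequency (count) histogram
bw <- diff(h$breaks)[1]               # bin width; hist() uses equal bins here
curve(dnorm(x) * length(x) * bw,      # pdf rescaled to count units
      add = TRUE, col = "blue")
```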



Re: [R] I'm offering $300 for someone who know R-programming to do the assignments for me.

2009-05-09 Thread Carl Witthoft

Sorry, but your professor offered me $500 NOT to do your assignments.



Re: [R] Beyond double-precision?

2009-05-09 Thread Berwin A Turlach
G'day all,

On Sat, 09 May 2009 08:01:40 -0700
spencerg spencer.gra...@prodsyse.com wrote:

   The harmonic mean is exp(mean(logs)).  Therefore, log(harmonic 
 mean) = mean(logs). 
 
   Does this make sense?

I think you are talking here about the geometric mean and not the
harmonic mean. :)

The harmonic mean is a bit more complicated.  If x_i are positive
values, then the harmonic mean is

H= n / (1/x_1 + 1/x_2 + ... + 1/x_n)

so

log(H) = log(n) - log( 1/x_1 + 1/x_2 + ... + 1/x_n)

now log(1/x_i) = -log(x_i) so if log(x_i) is available, the logarithm
of the individual terms are easily calculated.  But we need to
calculate the logarithm of a sum from the logarithms of the individual
terms.  

At the C level R's API has the function logspace_add for such tasks, so
it would be easy to do this at the C level.  But one could also
implement the equivalent of the C routine using R commands.  The way to
calculate log(x+y) from lx=log(x) and ly=log(y) according to
logspace_add is:

  max(lx,ly) + log1p(exp(-abs(lx-ly)))

So the following function may be helpful:

logadd <- function(x){
  logspace_add <- function(lx, ly)
    max(lx, ly) + log1p(exp(-abs(lx-ly)))

  len_x <- length(x)
  if(len_x > 1){
    res <- logspace_add(x[1], x[2])
    if( len_x > 2 ){
      for(i in 3:len_x)
        res <- logspace_add(res, x[i])
    }
  }else{
    res <- x
  }
  res
}

R> set.seed(1)
R> x <- runif(50)
R> lx <- log(x)
R> log(1/mean(1/x))  ## logarithm of harmonic mean
[1] -1.600885
R> log(length(x)) - logadd(-lx)
[1] -1.600885
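Berwin's iterated logspace_add is the pairwise form of the familiar log-sum-exp trick; for what it's worth, the same quantity can be computed in one vectorized pass (a sketch, not part of the original post):

```r
## Stable log(sum(exp(lx))): factor out the largest term so the
## exponentials cannot all underflow to zero at once.
logsumexp <- function(lx) {
  m <- max(lx)
  m + log(sum(exp(lx - m)))
}

set.seed(1)
x  <- runif(50)
lx <- log(x)

## logarithm of the harmonic mean, two ways -- they agree
log(1/mean(1/x))
log(length(x)) - logsumexp(-lx)
```

This matches logadd() above while avoiding the explicit loop.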

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability+65 6515 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore
6 Science Drive 2, Blk S16, Level 7  e-mail: sta...@nus.edu.sg
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] I'm offering $300 for someone who know R-programming to do the assignments for me.

2009-05-09 Thread Gábor Csárdi
That's typical, my profs used to do this to me all the time.

G.

On Sat, May 9, 2009 at 6:17 PM, Carl Witthoft c...@witthoft.com wrote:
 Sorry, but your professor offered me $500 NOT to do your assignments.





-- 
Gabor Csardi gabor.csa...@unil.ch UNIL DGM



Re: [R] Beyond double-precision?

2009-05-09 Thread spencerg

Dear Berwin:  Thanks for the elegant correction.  Spencer
Berwin A Turlach wrote:

G'day all,

On Sat, 09 May 2009 08:01:40 -0700
spencerg spencer.gra...@prodsyse.com wrote:

  
  The harmonic mean is exp(mean(logs)).  Therefore, log(harmonic 
mean) = mean(logs). 


  Does this make sense?



I think you are talking here about the geometric mean and not the
harmonic mean. :)

[...]



Re: [R] Beyond double-precision?

2009-05-09 Thread Gabor Grothendieck
The following packages support high precision precision
arithmetic (and the last two also support exact arithmetic):

bc - interface to bc calculator
http://r-bc.googlecode.com

gmp - interface to gmp (gnu multiple precision)
http://cran.r-project.org/web/packages/gmp

rSymPy - interface to sympy computer algebra system
http://rsympy.googlecode.com

Ryacas - interface to yacas computer algebra system
http://ryacas.googlecode.com



On Fri, May 8, 2009 at 4:54 PM, joaks1 joa...@gmail.com wrote:

 I need to perform some calculations with some extremely small numbers (i.e.
 likelihood values on the order of 1.0E-16,000).  Even when using the
 double() function, R is rounding these values to zero.  Is there any way to
 get R to deal with such small numbers?

 For example, I would like to be able to calculate e^-16000 (i.e.
 exp(-16000)) without the result being rounded to zero.

 I know I can do it in Mathematica, but I would prefer to use R if I can.
 Any help would be appreciated!

 Many Thanks in Advance!
 --
 View this message in context: 
 http://www.nabble.com/Beyond-double-precision--tp23452471p23452471.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] Histogram frequencies with a normal pdf curve overlay

2009-05-09 Thread Ted Harding
On 09-May-09 16:10:42, Jacques Wagnor wrote:
 Dear List,
 When I plot a histogram with 'freq=FALSE' and overlay the
 histogram with a normal pdf curve, everything looks as expected,
 as follows:
 
 x - rnorm(1000)
 hist(x, freq=FALSE)
 curve(dnorm(x), add=TRUE, col=blue)
 
 What do I need to do if I want to show the frequencies (freq=TRUE)
 with the same normal pdf overlay, so that the plot would still look
 the same?
 
 Regards,
 Jacques

Think first about how you would convert the histogram densities
(heights of the bars on the density scale) into histogram frequencies.

  Density * (bin width) * N = frequency

where N = total number in sample. Then all you need to do is multiply
the Normal density by the same factor. To find out the bin width,
take the difference between successive values of the breaks component
of the histogram. One way to do all this is

  N <- 1000
  x <- rnorm(N)
  H <- hist(x, freq=TRUE)  ## This will plot the histogram as well
  dx <- min(diff(H$breaks))
  curve(N*dx*dnorm(x), add=TRUE, col="blue")

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 09-May-09   Time: 17:31:03
-- XFMail --



Re: [R] Histogram frequencies with a normal pdf curve overlay

2009-05-09 Thread Jacques Wagnor
Thank you!

On Sat, May 9, 2009 at 11:31 AM, Ted Harding
ted.hard...@manchester.ac.uk wrote:
 [...]




Re: [R] Rmysql linking to an old-ish mysql build

2009-05-09 Thread Prof Brian Ripley
This topic is usually covered on R-sig-db, so its archives will give 
more information (and as I recall, so would the R-help archives, not 
least in pointing you to R-sig-db).


On Sat, 9 May 2009, Uwe Ligges wrote:


Jose Quesada wrote:

Hi,

I'm trying to get RMySQL to work on Windows Server 2008 64-bit.
I have the latest build of MySQL installed (mysql-5.1.34-winx64).



Independent of the version number of MySQL (which is less than 6 months old):
If you are talking about the RMySQL binary build on CRAN: it is built against 
a 32-bit version of MySQL. I am not sure if there is a safe way to build a 
binary that properly links against 64-bit MySQL given you are running 32-bit 
R.


MySQL is a client-server system: this will work if you have a 32-bit 
MySQL client DLL and arrange for RMySQL to find it (as a 32-bit client 
can talk to a 64-bit server).  That client DLL needs to be more or 
less the same MySQL version as RMySQL was built against (and what 
'more or less' means is determined by trial-and-error: there is no 
guarantee whatsoever that any other version will work, and even single
patch-level differences have resulted in crashes).


If there is, you have to install the package from sources yourself anyway.


That is in any case the safest thing to do.


Best,
Uwe Ligges




When trying to load Rmysql, I got a warning that Rmysql is linking to an
old-ish mysql build (5.0.67).
I could do some basic stuff (the connection works) but it breaks when
trying to read a large table.

So I set up to use the build 5.0.67 that RMySQL likes.

10 hrs later and after lots of sysadmin work, I have to call it quits. I
couldn't make it work.

Since this mysql 5.0.67 is pretty old, I was wondering if anyone has
binaries for Rmysql that work for a more recent version.
Maybe the authors of the package have plans to update it soon?

I've tried the package on both R 2.9.0 and R 2.8.1.

If nothing comes up, I'll try to spend a few more hours on getting the
old version to work.

Thanks!

Best,
-Jose






--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] I'm offering $300 for someone who know R-programming to do the assignments for me.

2009-05-09 Thread Wensui Liu
my guess is he might be asking for production code but just didn't want to
tell the truth here.
in some software forums, this kind of things happen all the time :-)

On Fri, May 8, 2009 at 12:36 PM, Wacek Kusnierczyk
waclaw.marcin.kusnierc...@idi.ntnu.no wrote:
 Simon Pickett wrote:
 I bet at least a few people offered their services! It might be an
 undercover sting operation to weed out the unethical amongst us :-)


 ... written by some of the r core developers?

 vQ





-- 
==
WenSui Liu
Acquisition Risk, Chase
Blog   : statcompute.spaces.live.com

Tough Times Never Last. But Tough People Do.  - Robert Schuller



Re: [R] I'm offering $300 for someone who know R-programming to do the assignments for me.

2009-05-09 Thread markleeds

   I hate to start a whole war about this, but isn't there some percent chance
   (not much, but non-zero) that she's willing to pay the $300.00 so that she
   can get a nice solution that she can then learn from? I'm definitely guilty
   of this behavior as a non-student, and I forget, to be honest, whether she
   was definitely a student, but I think she was. Again, not meaning to start
   a war, so no replies preferred, or at least they should be off list?

   On May 9, 2009, Gábor Csárdi csa...@rmki.kfki.hu wrote:

 That's typical, my profs used to do this to me all the time.
 G.
  On Sat, May 9, 2009 at 6:17 PM, Carl Witthoft c...@witthoft.com wrote:
   Sorry, but your professor offered me $500 NOT to do your assignments.

  --
  Gabor Csardi gabor.csa...@unil.ch UNIL DGM


[R] Generating a conditional time variable

2009-05-09 Thread Vincent Arel-Bundock
 Hi everyone,

Please forgive me if my question is simple and my code terrible, I'm new to
R. I am not looking for a ready-made answer, but I would really appreciate
it if someone could share conceptual hints for programming, or point me
toward an R function/package that could speed up my processing time.

Thanks a lot for your help!

##

My dataframe includes the variables 'year', 'id', and 'eif' and has +/- 1.9
million id-year observations

I would like to do 2 things:

-1- I want to create a 'conditional_time' variable, which increases in
increments of 1 every year, but which resets during year(t) if event 'eif'
occured for this 'id' at year(t-1). It should also reset when we switch to a
new 'id'. For example:

dataframe = test
 year   id    eif   conditional_time

 1990   101   0     1
 1991   101   0     2
 1992   101   1     3
 1993   101   0     1
 1994   101   0     2
 1995   101   0     3
 1996   101   0     4
 1997   101   1     5
 1998   101   0     1
 1999   101   0     2
 2000   101   0     3
 2001   101   0     4
 2002   101   0     5
 2003   101   0     6
 1990   201   0     1
 1991   201   0     2
 1992   201   0     3
 1993   201   0     4
 1994   201   0     5
 1995   201   0     6
 1996   201   0     7
 1997   201   0     8
 1998   201   0     9
 1999   201   0     10
 2000   201   0     11
 2001   201   1     12
 2002   201   0     1
 2003   201   0     2

-2- In a copy of the original dataframe, drop all id-year rows that
correspond to years after a given id has experienced his first 'eif' event.

I have written the code below to take care of -1-, but it is incredibly
inefficient. Given the size of my database, and considering how slow my
computer is, I don't think it's practical to use it. Also, it depends on
correct sorting of the dataframe, which might generate errors.

##

for (i in 1:nrow(test)) {
    if (i == 1) {                              # If first id-year
        cond_time <- 1
        test[i, 4] <- cond_time
    } else if (test[i-1, 1] != test[i, 4]) {   # If new id
        cond_time <- 1
        test[i, 4] <- cond_time
    } else {                                   # Same id as previous row
        if (test[i, 3] == 0) {
            test[i, 4] <- sum(cond_time, 1)
            cond_time <- test[i, 6]
        } else {
            test[i, 4] <- sum(cond_time, 1)
            cond_time <- 0
        }
    }
}

-- 
Vincent Arel
M.A. Student, McGill University

[[alternative HTML version deleted]]



[R] a general way to select a subset of matrix rows?

2009-05-09 Thread Peter Kharchenko

Dear fellow R users,
I can't figure out how to do a simple thing properly: apply an operation 
to matrix columns on a selected subset of rows. Things go wrong when 
only one row is being selected. I am sure there's a way to do this 
properly.


 Here's an example:
# define a 3-by-4 matrix x
> x <- matrix(runif(12), ncol=4)
> str(x)
 num [1:3, 1:4] 0.568 0.217 0.309 0.859 0.651 ...

# calculate column means for selected rows
> rows <- c(1,2)
> apply(x[rows,], 2, mean)
[1] 0.3923531 0.7552746 0.3661532 0.1069531
# now the same thing, but the rows vector is actually just one row
> rows <- c(2)
> apply(x[rows,], 2, mean)
Error in apply(x[rows, ], 2, mean) : dim(X) must have a positive length

The problem is that while x[rows,] in the first case returned a matrix, 
in the second case, when only one row was selected, it returned a vector 
(and the apply obviously failed).  Is there a general way to subset a 
matrix so it still returns a matrix even if it's one row?
Unfortunately doing as.matrix(x[rows,]) doesn't work either, as it 
returns a transposed matrix in the case of a single row.


Is there a way to do this properly without writing out hideous if 
statements accounting for single row exception?


thanks,
-peter.
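The behaviour comes from the drop argument of `[` (see ?Extract): x[rows, , drop = FALSE] keeps the result a matrix even when a single row is selected, so no special-casing is needed. A short sketch:

```r
set.seed(42)
x <- matrix(runif(12), ncol = 4)

rows <- c(2)                      # a single selected row
xm   <- x[rows, , drop = FALSE]   # still a 1-by-4 matrix, not a vector
dim(xm)                           # 1 4

## apply() now works uniformly whether `rows` has one element or many
apply(x[rows, , drop = FALSE], 2, mean)

## for column means specifically, colMeans() is more direct
colMeans(x[rows, , drop = FALSE])
```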



[R] R package for estimating markov transition matrix from observations + confidence?

2009-05-09 Thread U.H
Dear R gurus,

I have data for which I want to estimate the markov transition matrix
that generated the sequence, and preferably obtain some measure of
confidence for that estimation.

e.g., for a series such as
 1 3 4 1 2 3 1 2 1 3 4 3 2 4 2 1 4 1 2 4 1 2 4 1 2 1 2 1 3 1

I would want to get an estimate of the matrix that generated it

[[originally:
[,1] [,2] [,3] [,4]
[1,] 0.00 0.33 0.33 0.33
[2,] 0.33 0.00 0.33 0.33
[3,] 0.33 0.33 0.00 0.33
[4,] 0.33 0.33 0.33 0.00
]]

and the confidence in that estimation.

I know that generating the cross-tab matrix is trivial, but if there
is a package that does that and provides a likelihood as well, I'd
appreciate knowing about it.

Best,
Uri
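For reference, the row-normalized cross-tab Uri calls trivial, plus a crude multinomial standard error as a rough confidence measure, takes only a few lines of base R (a sketch, not from the original thread):

```r
s <- c(1,3,4,1,2,3,1,2,1,3,4,3,2,4,2,1,4,1,2,4,1,2,4,1,2,1,2,1,3,1)

## transition counts: state at time t (rows) vs state at time t+1 (columns)
counts <- table(from = head(s, -1), to = tail(s, -1))

## maximum-likelihood estimate: normalize each row to sum to 1
P <- prop.table(counts, margin = 1)
round(P, 2)

## crude confidence: each row is a multinomial sample of size n_i, so an
## approximate standard error for each estimated probability is sqrt(p(1-p)/n_i)
n_i <- rowSums(counts)
se  <- sqrt(P * (1 - P) / n_i)
round(se, 2)
```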



[R] sqlSave()

2009-05-09 Thread Felipe Carrillo

Hi all: I have created an MS Access table named 'PredictedValues' through the
statement below:
myDB <- odbcConnectAccess("C:/Documents and Settings/Owner/Desktop/Rpond Farming.mdb",
                          uid="admin", pwd="")
sqlSave(myDB, PredictedValues, rownames=FALSE)
close(myDB)

But if I run the code again with new values I get the message below:
Error in sqlSave(myDB, PredictedValues, rownames = FALSE) : 
  table ‘PredictedValues’ already exists
and my new records don't get updated.

I was under the impression that 'sqlSave' would copy new data on top of the 
existing one or if the table didn't exist it would create one with the new 
values. I tried 'sqlUpdate' but my existing 'PredictedValues' didn't update. 
What am I doing wrong?



Felipe D. Carrillo  
Supervisory Fishery Biologist  
Department of the Interior  
US Fish  Wildlife Service  
California, USA
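For reference, sqlSave() stops by design when the target table already exists; its append argument inserts the new rows instead. A sketch reusing the post's names (untested here, since it needs a live Access connection):

```r
library(RODBC)

myDB <- odbcConnectAccess("C:/Documents and Settings/Owner/Desktop/Rpond Farming.mdb",
                          uid = "admin", pwd = "")

## append = TRUE inserts into the existing table instead of stopping with
## "table 'PredictedValues' already exists"
sqlSave(myDB, PredictedValues, rownames = FALSE, append = TRUE)

## or, to replace the table wholesale with the current data frame:
## sqlDrop(myDB, "PredictedValues")
## sqlSave(myDB, PredictedValues, rownames = FALSE)

close(myDB)
```

sqlUpdate(), by contrast, matches existing rows on an index column, which may be why it appeared not to add the new records.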






[R] sqlSave()

2009-05-09 Thread Felipe Carrillo

Sorry, I'm resending it because I forgot to send my system info (below)

Hi all: 
I have created a MS Access table named 'PredictedValues' through the statement 
below:
myDB <- odbcConnectAccess("C:/Documents and Settings/Owner/Desktop/Rpond Farming.mdb",
                          uid="admin", pwd="")
sqlSave(myDB, PredictedValues, rownames=FALSE)
close(myDB)

But if I run the code again with new values I get the message below:
Error in sqlSave(myDB, PredictedValues, rownames = FALSE) : 
  table ‘PredictedValues’ already exists
and my new records don't get updated.

I was under the impression that 'sqlSave' would copy new data on top of the 
existing one or if the table didn't exist it would create one with the new 
values. I tried 'sqlUpdate' but my existing 'PredictedValues' didn't update. 
What am I doing wrong?
 

sessionInfo()
R version 2.9.0 (2009-04-17) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] graphics  grDevices datasets  tools stats grid  utils 
methods   base 

other attached packages:
 [1] RODBC_1.2-5 forecast_1.23   tseries_0.10-11 quadprog_1.4-10 zoo_1.3-1  
 hexbin_1.17.0   xtable_1.5-5lattice_0.17-22 plyr_0.1.8  
ggplot2_0.8.3   reshape_0.8.0   proto_0.3-7
[13] rcom_2.1-3  rscproxy_1.3-1 


Felipe D. Carrillo  
Supervisory Fishery Biologist  
Department of the Interior  
US Fish  Wildlife Service  
California, USA







[R] Overloading some non-dispatched S3 methods for new classes

2009-05-09 Thread Carlos J. Gil Bellosta
Hello,

I am building a package that creates a new kind of object not unlike a
dataframe. However, it is not an extension of a dataframe, as the data
themselves reside elsewhere. It only contains metadata.

I would like to be able to retrieve data from my objects such as the
number of rows, the number of columns, the colnames, etc.

I --quite naively-- thought that ncol, nrow, colnames, etc. would be
dispatched, so I would only need to create a, say, ncol.myclassname
function so as to be able to invoke ncol directly and transparently.

However, it is not the case. The only alternative I can think about is
to create decorated versions of ncol, nrow, etc. to avoid naming
conflicts. But I would still prefer my package users to be able to use
the undecorated function names.

Do I have a chance?

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
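There is a chance: nrow(), ncol(), colnames() and friends are thin wrappers around dim() and dimnames(), and those two do dispatch (dim() is an internal generic, dimnames() is a generic), so it is usually enough to supply methods for them. A minimal sketch with an invented class:

```r
## a metadata-only object: the data live elsewhere
meta <- structure(list(n = 10L, vars = c("a", "b", "c")),
                  class = "remote_frame")

## dim() is an internal generic: this method is picked up by nrow()/ncol()
dim.remote_frame <- function(x) c(x$n, length(x$vars))

## dimnames() backs rownames()/colnames()
dimnames.remote_frame <- function(x) list(NULL, x$vars)

nrow(meta)       # 10
ncol(meta)       # 3
colnames(meta)   # "a" "b" "c"
```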



Re: [R] Generating a conditional time variable

2009-05-09 Thread Finak Greg
Assuming the year column has complete data and doesn't skip a year, the 
following should take care of 1)

#Simulated data frame: year from 1990 to 2003, for 5 different ids, each having 
one or two eif events
test <- data.frame(year = rep(1990:2003, 5),
                   id   = gl(5, length(1990:2003)),
                   eif  = as.vector(sapply(1:5, function(z) {
                       a <- rep(0, length(1990:2003))
                       a[sample(1:length(1990:2003), sample(1:2, 1))] <- 1
                       a
                   })))

#Generate the conditional_time column.
test <- do.call(rbind, lapply(split(test, test$id), function(z) {
    s <- 0
    data.frame(z, cond_time = sapply(z$eif, function(i)
        ifelse(i == 1, s <- 1, s <- s + 1)))
}))

Generally sapply, lapply, and apply are faster than for loops. split() will 
split your data frame by the $id column (second argument). lapply() loops 
through the resulting list and generates the cond_time variable, resetting when 
eif==1, otherwise incrementing the count, much as you have in your code.


If I understand 2) correctly, the following should do the trick:
test2 <- test  # copy the data frame
test2 <- do.call(rbind, lapply(split(test, test$id),
                               function(z) z[1:which(z$eif == 1)[1], ]))

Similar to the former, but sub-setting the rows of the data frame up to
the first event, for each id.

If the above is all you need, then 1) and 2) could be combined in a single call.

Others will likely have a different approach..
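One more base-R variant that avoids the running-state variable entirely: within each id, every eif event opens a new segment starting at the next row, so cumsum() of the lagged event indicator yields segment ids, and ave() numbers the rows within each segment (a sketch, assuming rows are sorted by id and then year):

```r
## conditional_time for one id's eif vector:
## the counter restarts on the row after each eif == 1
cond_time_one_id <- function(eif) {
  segment <- cumsum(c(1, head(eif, -1)))   # bumps just after each event
  ave(seq_along(eif), segment, FUN = seq_along)
}

## small check: two ids, a couple of events each
test <- data.frame(
  year = rep(1990:1997, 2),
  id   = rep(c(101, 201), each = 8),
  eif  = c(0,0,1,0,0,0,1,0,  0,1,0,0,1,0,0,0)
)

## ave() applies the function per id and keeps the original row order
test$cond_time <- ave(test$eif, test$id, FUN = cond_time_one_id)
test$cond_time
# 1 2 3 1 2 3 4 1 1 2 1 2 3 1 2 3
```

Being fully vectorized within each id, this should handle 1.9 million rows comfortably.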

Cheers,

--
Greg Finak
Post-Doctoral Research Associate
Computational Biology Unit
Institut des Recherches Cliniques de Montreal
Montreal, QC.


On 09/05/09 1:40 PM, Vincent Arel-Bundock vincent.a...@gmail.com wrote:

[...]



Re: [R] Generating a conditional time variable

2009-05-09 Thread Finak Greg
That will teach me to post without a double-check.

On 09/05/09 3:11 PM, Finak Greg greg.fi...@ircm.qc.ca wrote:

Assuming the year column has complete data and doesn't skip a year, the 
following should take care of 1)

#Simulated data frame: year from 1990 to 2003, for 5 different ids, each having 
one or two eif events
test <- data.frame(year = rep(1990:2003, 5),
                   id   = gl(5, length(1990:2003)),
                   eif  = as.vector(sapply(1:5, function(z) {
                       a <- rep(0, length(1990:2003))
                       a[sample(1:length(1990:2003), sample(1:2, 1))] <- 1
                       a
                   })))

#Generate the conditional_time column.
test <- do.call(rbind, lapply(split(test, test$id), function(z) {
    s <- 0
    data.frame(z, cond_time = sapply(z$eif, function(i)
        ifelse(i == 1, s <- 1, s <- s + 1)))
}))

The above resets the count at eif==1 rather than after, and there's a local 
assignment to s which should be global.
Thanks, David, for noting that.

 
test <- do.call(rbind, lapply(split(test, test$id), function(z) {
    s <- 0
    data.frame(z, cond_time = sapply(z$eif, function(i)
        ifelse(i == 1, {l <- s + 1; s <<- 0; l},
                       {l <- s + 1; s <<- s + 1; l})))
}))

Generally sapply, lapply, and apply are faster than for loops. split() will 
split your data frame by the $id column (second argument). lapply() loops 
through the resulting list and generates the cond_time variable, resetting when 
eif==1, otherwise incrementing the count, much as you have in your code.


If I understand 2) correctly, the following should do the trick:
test2-test; #copy the data frame
test2-do.call(rbind,lapply(split(test,test$id),function(z)z[1:which(z$eif==1)[1],]))

Similar to the former, but sub-setting the rows of the data data frame up to 
the first event, for each id.

If the above is all you need, then 1) and 2) could be combined in a single call.

Others will likely have a different approach..

Cheers,

--
Greg Finak
Post-Doctoral Research Associate
Computational Biology Unit
Institut des Recherches Cliniques de Montreal
Montreal, QC.


On 09/05/09 1:40 PM, Vincent Arel-Bundock vincent.a...@gmail.com wrote:

Hi everyone,

Please forgive me if my question is simple and my code terrible, I'm new to
R. I am not looking for a ready-made answer, but I would really appreciate
it if someone could share conceptual hints for programming, or point me
toward an R function/package that could speed up my processing time.

Thanks a lot for your help!

##

My dataframe includes the variables 'year', 'id', and 'eif' and has +/- 1.9
million id-year observations

I would like to do 2 things:

-1- I want to create a 'conditional_time' variable, which increases in
increments of 1 every year, but which resets during year(t) if event 'eif'
occured for this 'id' at year(t-1). It should also reset when we switch to a
new 'id'. For example:

dataframe = test
 year   id    eif  conditional_time

 1990   1010  0    1
 1991   1010  0    2
 1992   1010  1    3
 1993   1010  0    1
 1994   1010  0    2
 1995   1010  0    3
 1996   1010  0    4
 1997   1010  1    5
 1998   1010  0    1
 1999   1010  0    2
 2000   1010  0    3
 2001   1010  0    4
 2002   1010  0    5
 2003   1010  0    6
 1990   2010  0    1
 1991   2010  0    2
 1992   2010  0    3
 1993   2010  0    4
 1994   2010  0    5
 1995   2010  0    6
 1996   2010  0    7
 1997   2010  0    8
 1998   2010  0    9
 1999   2010  0    10
 2000   2010  0    11
 2001   2010  1    12
 2002   2010  0    1
 2003   2010  0    2

-2- In a copy of the original dataframe, drop all id-year rows that
correspond to years after a given id has experienced his first 'eif' event.

I have written the code below to take care of -1-, but it is incredibly
inefficient. Given the size of my database, and considering how slow my
computer is, I don't think it's practical to use it. Also, it depends on
correct sorting of the dataframe, which might generate errors.

##

for (i in 1:nrow(test)) {
if (i == 1) {# If first id-year
cond_time <- 1
test[i, 4] <- cond_time

} else if ((test[i-1, 1]) != (test[i, 4])) { # If new id
cond_time <- 1
test[i, 4] <- cond_time
 } else {# Same id as previous row
if (test[i, 3] == 0) {
test[i, 4] <- sum(cond_time, 1)
cond_time <- test[i, 6]
} else {
test[i, 4] <- sum(cond_time, 1)
cond_time <- 0
}
}
}

--
Vincent Arel
M.A. Student, McGill University

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] need help with chisq

2009-05-09 Thread JC

I am very new to R. I have some data from a CSV stored in vdata with 4
columns labeled:
X08, Y08, X09, Y09.

I have created two new columns like so:

Z08 <- (vdata$X08-vdata$Y08)

Z09 <- (vdata$X09-vdata$Y09)

I would like to use chisq.test for each row and output the p-value
for each in a stored variable. I don't know how to do it. Can you
help?

so far I have done it for one row (but I want it done automatically
for all my data):

chidata=rbind(c(vdata$Y08[1],Z08[1]),c(vdata$Y09[1],Z09[1]))
results <- chisq.test(chidata)
results$p.value

I tried removing the [1] and the c() but that didn't work...  Any
ideas?

THANKS!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] need help with chisq

2009-05-09 Thread David Winsemius


On May 9, 2009, at 4:53 PM, JC wrote:



I am very new to R. I have some data from a CSV stored in vdata with 4
columns labeled:
X08, Y08, X09, Y09.

I have created two new columns like so:

Z08 <- (vdata$X08-vdata$Y08)

Z09 <- (vdata$X09-vdata$Y09)

I would like to use chisq.test for each row


Of what?


and output the p-value
for each in a stored variable. I don't know how to do it. Can you
help?

so far I have done it for one row (but I want it done automatically
for all my data):

chidata=rbind(c(vdata$Y08[1],Z08[1]),c(vdata$Y09[1],Z09[1]))


Maybe I am dense, but I cannot figure out what hypothesis is being  
tested.


results <- chisq.test(chidata)
results$p.value


Generally using apply(vdata, 1, . would give you a row by row  
computation.
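For concreteness, a hedged sketch of such a row-by-row computation (the data values are invented, and the 2x2 layout follows the poster's `rbind()` call; whether this is the right contingency table for their hypothesis is the open question above):

```r
# Hypothetical data standing in for the poster's vdata
vdata <- data.frame(X08 = c(50, 40), Y08 = c(20, 25),
                    X09 = c(60, 45), Y09 = c(30, 15))
vdata$Z08 <- vdata$X08 - vdata$Y08
vdata$Z09 <- vdata$X09 - vdata$Y09

# One 2x2 chi-squared test per row, collecting the p-values
pvals <- apply(vdata[, c("Y08", "Z08", "Y09", "Z09")], 1, function(r) {
  chisq.test(rbind(r[c("Y08", "Z08")], r[c("Y09", "Z09")]))$p.value
})
pvals
```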



I tried removing the [1] and the c() but that didn't work...  Any
ideas?


As Jim Holtman's tag line says: What problem are you trying to solve?

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] a general way to select a subset of matrix rows?

2009-05-09 Thread Henrique Dallazuanna
Yes,

use the drop argument;

apply(x[rows,,drop=F],2,mean)
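A quick sketch of why `drop = FALSE` solves the single-row case (and an equivalent idiom):

```r
x <- matrix(runif(12), ncol = 4)
rows <- 2
dim(x[rows, ])                            # NULL: a single row drops to a vector
dim(x[rows, , drop = FALSE])              # 1 4: stays a matrix
apply(x[rows, , drop = FALSE], 2, mean)   # works for any number of rows
# colMeans(x[rows, , drop = FALSE]) is an equivalent, usually faster idiom
```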


On Sat, May 9, 2009 at 2:33 PM, Peter Kharchenko 
peter.kharche...@post.harvard.edu wrote:

 Dear fellow R users,
 I can't figure out how to do a simple thing properly: apply an operation to
 matrix columns on a selected subset of rows. Things go wrong when only one
 row is being selected. I am sure there's a way to do this properly.

  Here's an example:
 # define a 3-by-4 matrix x
  x <- matrix(runif(12),ncol=4)
  str(x)
 num [1:3, 1:4] 0.568 0.217 0.309 0.859 0.651 ...

 # calculate column means for selected rows
  rows <- c(1,2)
  apply(x[rows,],2,mean)
 [1] 0.3923531 0.7552746 0.3661532 0.1069531
 # now the same thing, but the rows vector is actually just one row
  rows <- c(2)
  apply(x[rows,],2,mean)
 Error in apply(x[rows, ], 2, mean) : dim(X) must have a positive length

 The problem is that while x[rows,] in the first case returned a matrix, in
 the second case, when only one row was selected, it returned a vector (and
 the apply obviously failed).  Is there a general way to subset a matrix so
 it still returns a matrix even if it's one row?
 Unfortunately doing as.matrix(x[rows,]) doesn't work either, as it returns
 a transposed matrix in the case of a single row.

 Is there a way to do this properly without writing out hideous if
 statements accounting for single row exception?

 thanks,
 -peter.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generating a conditional time variable

2009-05-09 Thread William Dunlap
You might try the following function.  First it identifies the last element in 
each run, then the length of each run, then calls sequence() to generate the 
within-run sequence numbers.  my.sequence is a version of sequence that is more 
efficient (less time, less memory) than sequence when there are lots of short 
runs (sequence() calls lapply, which makes a memory consuming list, and then 
unlists it, and my.sequence avoids the big intermediate list).

For your data, f(data) produces the same thing as data$conditional_time.

f <- function(data, use.my.sequence=FALSE){
   n <- nrow(data)
   lastInRun <- with(data, eif | c(id[-1] != id[-n], TRUE))
   runLengths <- diff(c(0L, which(lastInRun)))
   if (use.my.sequence) {
      my.sequence <- function(nvec)
         seq_len(sum(nvec)) - rep.int(c(0L, cumsum(nvec[-length(nvec)])), nvec)
      my.sequence(runLengths)
   } else {
      sequence(runLengths)
   }
}

Bill Dunlap, Spotfire Division, TIBCO Software Inc.
 


 Hi everyone,

Please forgive me if my question is simple and my code terrible, I'm new to
R. I am not looking for a ready-made answer, but I would really appreciate
it if someone could share conceptual hints for programming, or point me
toward an R function/package that could speed up my processing time.

Thanks a lot for your help!

##

My dataframe includes the variables 'year', 'id', and 'eif' and has +/- 1.9
million id-year observations

I would like to do 2 things:

-1- I want to create a 'conditional_time' variable, which increases in
increments of 1 every year, but which resets during year(t) if event 'eif'
occured for this 'id' at year(t-1). It should also reset when we switch to a
new 'id'. For example:

dataframe = test
 year   id    eif  conditional_time

 1990   1010  0    1
 1991   1010  0    2
 1992   1010  1    3
 1993   1010  0    1
 1994   1010  0    2
 1995   1010  0    3
 1996   1010  0    4
 1997   1010  1    5
 1998   1010  0    1
 1999   1010  0    2
 2000   1010  0    3
 2001   1010  0    4
 2002   1010  0    5
 2003   1010  0    6
 1990   2010  0    1
 1991   2010  0    2
 1992   2010  0    3
 1993   2010  0    4
 1994   2010  0    5
 1995   2010  0    6
 1996   2010  0    7
 1997   2010  0    8
 1998   2010  0    9
 1999   2010  0    10
 2000   2010  0    11
 2001   2010  1    12
 2002   2010  0    1
 2003   2010  0    2

-2- In a copy of the original dataframe, drop all id-year rows that
correspond to years after a given id has experienced his first 'eif' event.

I have written the code below to take care of -1-, but it is incredibly
inefficient. Given the size of my database, and considering how slow my
computer is, I don't think it's practical to use it. Also, it depends on
correct sorting of the dataframe, which might generate errors.

##

for (i in 1:nrow(test)) {
if (i == 1) {# If first id-year
cond_time <- 1
test[i, 4] <- cond_time

} else if ((test[i-1, 1]) != (test[i, 4])) { # If new id
cond_time <- 1
test[i, 4] <- cond_time
 } else {# Same id as previous row
if (test[i, 3] == 0) {
test[i, 4] <- sum(cond_time, 1)
cond_time <- test[i, 6]
} else {
test[i, 4] <- sum(cond_time, 1)
cond_time <- 0
}
}
}

--
Vincent Arel
M.A. Student, McGill University

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Reading large files quickly

2009-05-09 Thread Rob Steele
I'm finding that readLines() and read.fwf() take nearly two hours to
work through a 3.5 GB file, even when reading in large (100 MB) chunks.
 The unix command wc by contrast processes the same file in three
minutes.  Is there a faster way to read files in R?

Thanks!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] clump of binary pixels on raster

2009-05-09 Thread milton ruser
Dear all,

I have a set of 30,000 binary landscapes, which represent habitat and
non-habitat cover. I need to generate images in which neighbouring pixels
(8-neighbour rule) are labelled as one patch, with a different patch ID for
each clump of pixels. I coded it using labcon (adehabitat), but some of my
landscapes have so many patches that labcon never finishes and appears to
loop forever. I also coded a solution using R + GRASS (r.clump), but that
solution is very slow, and as I need to run it many times, I would need
about 3 weeks to finish... I was wondering whether the raster package could
do the job faster than R + GRASS. Below you can find a simulation of what I
need. In the second image, each colour has a different value.

MyMatrix <- matrix(rep(0,100), ncol=10)
MyMatrix[2:4,3:6] <- 1
MyMatrix[7:8,1:3] <- 1
MyMatrix[8,7:8] <- 1
MyMatrix[8,7:8] <- 1
MyMatrix[6:7,8:9] <- 1
x11(800,400)
par(mfrow=c(1,2))
image(MyMatrix)

MyClusters <- matrix(rep(0,100), ncol=10)
MyClusters[2:4,3:6] <- 1
MyClusters[7:8,1:3] <- 2
MyClusters[8,7:8] <- 3
MyClusters[8,7:8] <- 4
MyClusters[6:7,8:9] <- 4
image(MyClusters, col=c("transparent", 1,3,4,5))
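For what it's worth, the raster package does provide a direct patch-labelling function. A hedged sketch (assumes the raster package and its igraph dependency are installed; `clump()` treats NA cells as background):

```r
library(raster)
# Small binary landscape like the simulated one above
m <- matrix(0, 10, 10)
m[2:4, 3:6] <- 1
m[7:8, 1:3] <- 1
m[6:7, 8:9] <- 1
r <- raster(m)
r[r == 0] <- NA                 # clump() labels only non-NA cells
cl <- clump(r, directions = 8)  # one integer ID per 8-connected patch
freq(cl)                        # patch IDs and their cell counts
```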

Regards a lot,

milton
brazil=toronto.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Histogram frequencies with a normal pdf curve overlay

2009-05-09 Thread S Ellison
Assuming a constant bin width, you need to multiply the density by
n*binwidth, where the bin width is (obviously!) the width of the
histogram bins.
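A sketch of that scaling, following the poster's standard-normal example (the equal-width-bin assumption is made explicit):

```r
set.seed(1)
x <- rnorm(1000)
h <- hist(x, freq = TRUE)        # frequency (count) histogram
binwidth <- diff(h$breaks)[1]    # assumes all bins have the same width
n <- length(x)
# Rescale the density so the curve sits on the count scale
curve(dnorm(x) * n * binwidth, add = TRUE, col = "blue")
```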



 Jacques Wagnor jacques.wag...@gmail.com 05/09/09 5:10 PM 
Dear List,

When I plot a histogram with 'freq=FALSE' and overlay the histogram
with a normal pdf curve, everything looks as expected, as follows:

x <- rnorm(1000)
hist(x, freq=FALSE)
curve(dnorm(x), add=TRUE, col="blue")

What do I need to do if I want to show the frequencies (freq=TRUE)
with the same normal pdf overlay, so that the plot would still look
the same?

Regards,

Jacques

platform   i386-pc-mingw32
arch   i386
os mingw32
system i386, mingw32
status
major  2
minor  8.0
year   2008
month  10
day20
svn rev46754
language   R
version.string R version 2.8.0 (2008-10-20)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


***
This email and any attachments are confidential. Any use...{{dropped:8}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files quickly

2009-05-09 Thread Gabor Grothendieck
You could try it with sqldf and see if that is any faster.
It uses RSQLite/sqlite to read the data into a database without
going through R, and from there it reads all or a portion, as
specified, into R.  It requires two lines of code of the form:

f <- file("myfile.dat")
DF <- sqldf("select * from f", dbname = tempfile())

with appropriate modification to specify the format of your file and
possibly to indicate a portion only.  See example 6 on the sqldf
home page: http://sqldf.googlecode.com
and ?sqldf


On Sat, May 9, 2009 at 12:25 PM, Rob Steele
freenx.10.robste...@xoxy.net wrote:
 I'm finding that readLines() and read.fwf() take nearly two hours to
 work through a 3.5 GB file, even when reading in large (100 MB) chunks.
  The unix command wc by contrast processes the same file in three
 minutes.  Is there a faster way to read files in R?

 Thanks!

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files quickly

2009-05-09 Thread jim holtman
First, 'wc' and readLines() perform vastly different functions.  'wc' just
reads through the file without having to allocate memory to it;
'readLines' actually stores the data in memory.

I have a 150MB file I was trying it on, and here is what 'wc' did on my
Windows system:

/cygdrive/c: time wc tempxx.txt
  1055808  13718468 151012320 tempxx.txt
real0m2.343s
user0m1.702s
sys 0m0.436s
/cygdrive/c:

If I multiply that by 25 to extrapolate to a 3.5GB file, it should take
a little less than one minute to process on my relatively slow laptop.

'readLines' on the same file takes:

 system.time(x <- readLines('/tempxx.txt'))
   user  system elapsed
  37.82    0.47   39.23
If I extrapolate that to 3.5GB, it would take about 16 minutes.  Now
considering that I only have 2GB on my system, I would not be able to read
the whole file in at once.

You never did specify what type of system you were running on and how much
memory you had.  Were you 'paging' due to lack of memory?

 system.time(x <- readLines('/tempxx.txt'))
   user  system elapsed
  37.82    0.47   39.23
 object.size(x)
84814016 bytes



On Sat, May 9, 2009 at 12:25 PM, Rob Steele freenx.10.robste...@xoxy.netwrote:

 I'm finding that readLines() and read.fwf() take nearly two hours to
 work through a 3.5 GB file, even when reading in large (100 MB) chunks.
  The unix command wc by contrast processes the same file in three
 minutes.  Is there a faster way to read files in R?

 Thanks!

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generating a conditional time variable

2009-05-09 Thread jim holtman
Here is yet another way of doing it (always the case in R):

#Simulated data frame: year from 1990 to 2003, for 5 different ids, each
having one or two eif events
test <- data.frame(year=rep(1990:2003,5), id=gl(5,length(1990:2003)),
eif=as.vector(sapply(1:5, function(z){
a <- rep(0,length(1990:2003))
a[sample(1:length(1990:2003), sample(1:2,1))] <- 1
a
})))

# partition by 'id' and then by 'eif' changes
test.new <- do.call(rbind, lapply(split(test, test$id), function(.id){
# now by 'eif' changes
do.call(rbind, lapply(split(.id, cumsum(.id$eif)), function(.eif){
# create new dataframe with column
cbind(.eif, conditional_time=seq(nrow(.eif)))
}))
}))



On Sat, May 9, 2009 at 1:40 PM, Vincent Arel-Bundock vincent.a...@gmail.com
 wrote:

  Hi everyone,

 Please forgive me if my question is simple and my code terrible, I'm new to
 R. I am not looking for a ready-made answer, but I would really appreciate
 it if someone could share conceptual hints for programming, or point me
 toward an R function/package that could speed up my processing time.

 Thanks a lot for your help!

 ##

 My dataframe includes the variables 'year', 'id', and 'eif' and has +/- 1.9
 million id-year observations

 I would like to do 2 things:

 -1- I want to create a 'conditional_time' variable, which increases in
 increments of 1 every year, but which resets during year(t) if event 'eif'
 occured for this 'id' at year(t-1). It should also reset when we switch to
 a
 new 'id'. For example:

 dataframe = test
  year   id    eif  conditional_time

 1990   1010  0    1
 1991   1010  0    2
 1992   1010  1    3
 1993   1010  0    1
 1994   1010  0    2
 1995   1010  0    3
 1996   1010  0    4
 1997   1010  1    5
 1998   1010  0    1
 1999   1010  0    2
 2000   1010  0    3
 2001   1010  0    4
 2002   1010  0    5
 2003   1010  0    6
 1990   2010  0    1
 1991   2010  0    2
 1992   2010  0    3
 1993   2010  0    4
 1994   2010  0    5
 1995   2010  0    6
 1996   2010  0    7
 1997   2010  0    8
 1998   2010  0    9
 1999   2010  0    10
 2000   2010  0    11
 2001   2010  1    12
 2002   2010  0    1
 2003   2010  0    2

 -2- In a copy of the original dataframe, drop all id-year rows that
 correspond to years after a given id has experienced his first 'eif' event.

 I have written the code below to take care of -1-, but it is incredibly
 inefficient. Given the size of my database, and considering how slow my
 computer is, I don't think it's practical to use it. Also, it depends on
 correct sorting of the dataframe, which might generate errors.

 ##

 for (i in 1:nrow(test)) {
if (i == 1) {# If first id-year
cond_time <- 1
test[i, 4] <- cond_time

} else if ((test[i-1, 1]) != (test[i, 4])) { # If new id
cond_time <- 1
test[i, 4] <- cond_time
 } else {# Same id as previous row
if (test[i, 3] == 0) {
test[i, 4] <- sum(cond_time, 1)
cond_time <- test[i, 6]
} else {
test[i, 4] <- sum(cond_time, 1)
cond_time <- 0
}
}
 }

 --
 Vincent Arel
 M.A. Student, McGill University

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generating a conditional time variable

2009-05-09 Thread jim holtman
Corrected version.  I forgot the the count had to change 'after' eif==1:

#Simulated data frame: year from 1990 to 2003, for 5 different ids, each
having one or two eif events
test <- data.frame(year=rep(1990:2003,5), id=gl(5,length(1990:2003)),
eif=as.vector(sapply(1:5, function(z){
a <- rep(0,length(1990:2003))
a[sample(1:length(1990:2003), sample(1:2,1))] <- 1
a
})))
# partition by 'id' and then by 'eif' changes
test.new <- do.call(rbind, lapply(split(test, test$id), function(.id){
# now by 'eif' changes
do.call(rbind, lapply(split(.id, cumsum(c(0, diff(.id$eif) == -1))),
function(.eif){
cbind(.eif, conditional_time=seq(nrow(.eif)))
}))
}))



On Sat, May 9, 2009 at 1:40 PM, Vincent Arel-Bundock vincent.a...@gmail.com
 wrote:

  Hi everyone,

 Please forgive me if my question is simple and my code terrible, I'm new to
 R. I am not looking for a ready-made answer, but I would really appreciate
 it if someone could share conceptual hints for programming, or point me
 toward an R function/package that could speed up my processing time.

 Thanks a lot for your help!

 ##

 My dataframe includes the variables 'year', 'id', and 'eif' and has +/- 1.9
 million id-year observations

 I would like to do 2 things:

 -1- I want to create a 'conditional_time' variable, which increases in
 increments of 1 every year, but which resets during year(t) if event 'eif'
 occured for this 'id' at year(t-1). It should also reset when we switch to
 a
 new 'id'. For example:

 dataframe = test
  year   id    eif  conditional_time

 1990   1010  0    1
 1991   1010  0    2
 1992   1010  1    3
 1993   1010  0    1
 1994   1010  0    2
 1995   1010  0    3
 1996   1010  0    4
 1997   1010  1    5
 1998   1010  0    1
 1999   1010  0    2
 2000   1010  0    3
 2001   1010  0    4
 2002   1010  0    5
 2003   1010  0    6
 1990   2010  0    1
 1991   2010  0    2
 1992   2010  0    3
 1993   2010  0    4
 1994   2010  0    5
 1995   2010  0    6
 1996   2010  0    7
 1997   2010  0    8
 1998   2010  0    9
 1999   2010  0    10
 2000   2010  0    11
 2001   2010  1    12
 2002   2010  0    1
 2003   2010  0    2

 -2- In a copy of the original dataframe, drop all id-year rows that
 correspond to years after a given id has experienced his first 'eif' event.

 I have written the code below to take care of -1-, but it is incredibly
 inefficient. Given the size of my database, and considering how slow my
 computer is, I don't think it's practical to use it. Also, it depends on
 correct sorting of the dataframe, which might generate errors.

 ##

 for (i in 1:nrow(test)) {
if (i == 1) {# If first id-year
cond_time <- 1
test[i, 4] <- cond_time

} else if ((test[i-1, 1]) != (test[i, 4])) { # If new id
cond_time <- 1
test[i, 4] <- cond_time
 } else {# Same id as previous row
if (test[i, 3] == 0) {
test[i, 4] <- sum(cond_time, 1)
cond_time <- test[i, 6]
} else {
test[i, 4] <- sum(cond_time, 1)
cond_time <- 0
}
}
 }

 --
 Vincent Arel
 M.A. Student, McGill University

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Citing R/Packages Question

2009-05-09 Thread Achim Zeileis

On Sat, 9 May 2009, roger koenker wrote:


I've had an email exchange with the authors of a recent paper
in Nature who also made a good faith effort to cite both R and the quantreg
package, and were told that the Nature house style didn't allow such
citations so they were dropped from the published paper and the
supplementary material appearing on the Nature website.


Interesting. Software manuals with an ISBN are not good enough for the 
Nature house style? I wonder what the problem with that could be...


Since the CRAN website makes a special effort to make prior versions of 
packages available, it would seem to me to be much more useful to cite 
version numbers than access dates.


Definitely, yes. Current versions of R with current versions of quantreg 
for example yield:


  Roger Koenker (2009). quantreg: Quantile Regression.
  R package version 4.27. http://CRAN.R-project.org/package=quantreg

Even if 4.27 is not current anymore it will be available under the 
archive link at the above URL. So an access date is not necessary. 
Pointing this out to the journal editors might help. If not, providing the 
access date (while keeping all other information) won't do any damage.
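For reference, the entry quoted above is what citation() produces; a quick sketch (the package-specific call assumes quantreg is installed, so it is guarded):

```r
# citation() with no argument cites R itself, including version and year;
# toBibtex() converts any entry to BibTeX for a manuscript.
print(citation())
if (requireNamespace("quantreg", quietly = TRUE))
  print(citation("quantreg"))   # the quantreg entry quoted above
```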



There  are serious questions about the
ephemerality of url citations, not all of which are adequately resolved
by the Wayback machine, and access dating, but it would be nice to
have some better standards for such contingent citations rather than
leave authors at the mercy of copy editors.  I would also be interested in
suggestions by other contributors.


I wouldn't be aware of good generally applicable standards for citing 
software. The default output of citation() has been chosen because 
repository+package+version uniquely identify which package was used 
(not unlike journal+volume+pages). Also, using the URL

  http://CRAN.R-project.org/package=quantreg
has the advantage that it is independent of the physical location on CRAN. 
So in case the structure of the package pages on CRAN changes in the 
future, the URL will still point to the relevant page with all necessary 
information.


Best,
Z

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Spatstat

2009-05-09 Thread Richard Chirgwin
Hi all,

I am trying to install Spatstat on OpenSUSE 11.1.
install.packages("spatstat", dependencies = TRUE)
fails on the basis of various compiler packages (full message below).

I have gcc version 4.3.2, which should include gfortran and g++ - so I'm not
sure what to do!

Richard

* Installing *source* package ‘deldir’ ...
** libs
gfortran   -fpic  -O2 -c acchk.f -o acchk.o
make: gfortran: Command not found
make: *** [acchk.o] Error 127
ERROR: compilation failed for package ‘deldir’
* Removing ‘/home/richard/R/i686-pc-linux-gnu-library/2.9/deldir’
* Installing *source* package ‘spatstat’ ...
** libs
gcc -std=gnu99 -I/usr/lib/R/include  -I/usr/local/include-fpic  -O2 -c
Kborder.c -o Kborder.o
gcc -std=gnu99 -I/usr/lib/R/include  -I/usr/local/include-fpic  -O2 -c
Kwborder.c -o Kwborder.o
g++ -I/usr/lib/R/include  -I/usr/local/include-fpic  -O2 -c
PerfectStrauss.cc -o PerfectStrauss.o
make: g++: Command not found
make: *** [PerfectStrauss.o] Error 127
ERROR: compilation failed for package ‘spatstat’
* Removing ‘/home/richard/R/i686-pc-linux-gnu-library/2.9/spatstat’

The downloaded packages are in
‘/tmp/RtmpdcNYyo/downloaded_packages’
Warning messages:
1: In install.packages("spatstat") :
  installation of package 'deldir' had non-zero exit status
2: In install.packages("spatstat") :
  installation of package 'spatstat' had non-zero exit status

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files quickly

2009-05-09 Thread Jakson Alves de Aquino
Rob Steele wrote:
 I'm finding that readLines() and read.fwf() take nearly two hours to
 work through a 3.5 GB file, even when reading in large (100 MB) chunks.
  The unix command wc by contrast processes the same file in three
 minutes.  Is there a faster way to read files in R?

I use statist to convert the fixed width data file into a csv file
because read.table() is considerably faster than read.fwf(). For example:

system("statist --na-string NA --xcols collist big.txt big.csv")
bigdf <- read.table(file = "big.csv", header=TRUE, as.is=TRUE)

The file collist is a text file whose lines contain the following
information:

variable begin end

where variable is the column name, and begin and end are integer
numbers indicating where in big.txt the columns begin and end.

Statist can be downloaded from: http://statist.wald.intevation.org/

-- 
Jakson Aquino
Social Sciences Department
Federal University of Ceará, Brazil

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] I don't see libR.so in my installation directory

2009-05-09 Thread cls59



Tena Sakai wrote:
 
 
 
 I became aware of such as I was preparing for
 an installation of little r.  The installation
 material stated to look for libR.so, and I want
 to make sure that the one I installed (2.9.0)
 is used by little r.
 
 
 


little r... do you mean the scripting front end for R? If so, the core
utility Rscript is probably installed (it was added in 2.5.0 I believe) and
provides the functionality of little r, including hash-bang lines. Check the
bin folder in the R installation.

If you are talking about something different, ignore this message :)

-Charlie

-
Charlie Sharpsteen
Undergraduate
Environmental Resources Engineering
Humboldt State University
-- 
View this message in context: 
http://www.nabble.com/I-don%27t-see-libR.so-in-my-installation-directory-tp23455074p23466363.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files quickly

2009-05-09 Thread Rob Steele
Thanks guys, good suggestions.  To clarify, I'm running on a fast
multi-core server with 16 GB RAM under 64 bit CentOS 5 and R 2.8.1.
Paging shouldn't be an issue since I'm reading in chunks and not trying
to store the whole file in memory at once.  Thanks again.

Rob Steele wrote:
 I'm finding that readLines() and read.fwf() take nearly two hours to
 work through a 3.5 GB file, even when reading in large (100 MB) chunks.
  The unix command wc by contrast processes the same file in three
 minutes.  Is there a faster way to read files in R?
 
 Thanks!


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading large files quickly

2009-05-09 Thread jim holtman
Since you are reading it in chunks, I assume that you are writing out each
segment as you read it in.  How are you writing it out to save it?  Is the
time you are quoting both the reading and the writing?  If so, can you break
down the differences in what these operations are taking?

How do you plan to use the data?  Is it all numeric?  Are you keeping it in
a dataframe?  Have you considered using 'scan' to read in the data and to
specify what the columns are?  If you would like some more help, the answer
to these questions will help.
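To illustrate the scan() suggestion, here is a hedged sketch of chunked reading from a connection (the file is generated on the fly for the example; the real file's name and column layout would differ). Reading from an open connection preserves the file position between calls, so each scan() picks up where the last one stopped:

```r
# Build a small whitespace-delimited numeric file to stand in for the 3.5 GB one
tf <- tempfile()
write.table(matrix(runif(1e4), ncol = 10), tf,
            row.names = FALSE, col.names = FALSE)

con <- file(tf, "r")
total <- 0
repeat {
  # what = numeric() skips read.table()'s type-guessing overhead
  chunk <- scan(con, what = numeric(), nlines = 100, quiet = TRUE)
  if (length(chunk) == 0) break   # end of file
  total <- total + length(chunk)  # real code would process the chunk here
}
close(con)
total  # 10000 values read
```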

On Sat, May 9, 2009 at 10:09 PM, Rob Steele freenx.10.robste...@xoxy.netwrote:

 Thanks guys, good suggestions.  To clarify, I'm running on a fast
 multi-core server with 16 GB RAM under 64 bit CentOS 5 and R 2.8.1.
 Paging shouldn't be an issue since I'm reading in chunks and not trying
 to store the whole file in memory at once.  Thanks again.

 Rob Steele wrote:
  I'm finding that readLines() and read.fwf() take nearly two hours to
  work through a 3.5 GB file, even when reading in large (100 MB) chunks.
   The unix command wc by contrast processes the same file in three
  minutes.  Is there a faster way to read files in R?
 
  Thanks!
  





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?




[R] how to get design matrix?

2009-05-09 Thread linakpl

If I am doing an ANOVA, how can I get the design matrix R used?
-- 
View this message in context: 
http://www.nabble.com/how-to-get-design-matrix--tp23466549p23466549.html
Sent from the R help mailing list archive at Nabble.com.
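For the archives: model.matrix() returns the design matrix of a fitted model.
A minimal sketch using a built-in data set (the one-way layout is illustrative,
not the poster's actual analysis):

```r
## One-way ANOVA on the built-in PlantGrowth data
fit <- aov(weight ~ group, data = PlantGrowth)

## The design (model) matrix R used internally,
## with treatment contrasts by default
head(model.matrix(fit))
```

The same call works for lm() and glm() fits; the contrast coding can be
inspected with contrasts(PlantGrowth$group).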



Re: [R] how to get design matrix?

2009-05-09 Thread David Winsemius


Got code?

On May 9, 2009, at 10:29 PM, linakpl wrote:



If I was doing an ANOVA analysis how can I get the design matrix R  
used?

--



David Winsemius, MD
Heritage Laboratories
West Hartford, CT



[R] Comparing COXPH models, one with age as a continuous variable, one with age as a three-level factor

2009-05-09 Thread John Sorkin
Windows XP
R 2.8.1

I am trying to use anova(fitCont,fitCat) to compare two Cox models (coxph) one 
in which age is entered as a continuous variable, and a second where age is 
entered as a three-level factor (young, middle, old). The Analysis of Deviance 
Table produced by anova does not give a p value. Is there any way to get anova 
to produce p values?

Thank you,
John Sorkin


ANOVA results are pasted below:

> anova(fitCont,fitCat)
Analysis of Deviance Table

Model 1: Surv(Time30, Died) ~ Rx + Age
Model 2: Surv(Time30, Died) ~ Rx + AgeGrp
  Resid. Df Resid. Dev Df Deviance
1        62     147.38
2        61     142.38  1     5.00



The entire program including the original coxph models follows:


> fitCont <- coxph(Surv(Time30,Died)~Rx+Age,data=GVHDdata)

> summary(fitCont)
Call:
coxph(formula = Surv(Time30, Died) ~ Rx + Age, data = GVHDdata)

  n= 64 
      coef exp(coef) se(coef)    z      p
Rx   1.375      3.96   0.5318 2.59 0.0097
Age  0.055      1.06   0.0252 2.19 0.0290

exp(coef) exp(-coef) lower .95 upper .95
Rx   3.96  0.253  1.40 11.22
Age  1.06  0.946  1.01  1.11

Rsquare= 0.154   (max possible= 0.915 )
Likelihood ratio test= 10.7  on 2 df,   p=0.00483
Wald test= 9.46  on 2 df,   p=0.0088
Score (logrank) test = 10.2  on 2 df,   p=0.00626


> fitCat <- coxph(Surv(Time30,Died)~Rx+AgeGrp,data=GVHDdata)

> summary(fitCat)
Call:
coxph(formula = Surv(Time30, Died) ~ Rx + AgeGrp, data = GVHDdata)

  n= 64 
                   coef exp(coef) se(coef)    z     p
Rx                 1.19      3.27    0.525 2.26 0.024
AgeGrp[T.(15,25]]  1.98      7.26    0.771 2.57 0.010
AgeGrp[T.(25,45]]  1.61      5.02    0.806 2.00 0.045

  exp(coef) exp(-coef) lower .95 upper .95
Rx 3.27  0.306  1.17  9.16
AgeGrp[T.(15,25]]  7.26  0.138  1.60 32.88
AgeGrp[T.(25,45]]  5.02  0.199  1.04 24.38

Rsquare= 0.217   (max possible= 0.915 )
Likelihood ratio test= 15.7  on 3 df,   p=0.00133
Wald test= 12.0  on 3 df,   p=0.0075
Score (logrank) test = 14.5  on 3 df,   p=0.00232


> anova(fitCont,fitCat)
Analysis of Deviance Table

Model 1: Surv(Time30, Died) ~ Rx + Age
Model 2: Surv(Time30, Died) ~ Rx + AgeGrp
  Resid. Df Resid. Dev Df Deviance
1        62     147.38
2        61     142.38  1     5.00
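For what it's worth: for nested models the p-value for the deviance table can
be computed by hand from the chi-square distribution.  With continuous versus
categorized age the two models are arguably not nested, though, in which case
an information criterion such as AIC is a safer comparison.  A sketch, with the
deviance difference taken from the output above:

```r
## Likelihood-ratio p-value for the deviance difference of 5.00 on 1 df,
## valid only if the models are nested
pchisq(5.00, df = 1, lower.tail = FALSE)  # approximately 0.025

## For non-nested alternatives, compare information criteria instead:
## AIC(fitCont, fitCat)
```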

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)




Re: [R] I don't see libR.so in my installation directory

2009-05-09 Thread Dirk Eddelbuettel

On 8 May 2009 at 16:17, Tena Sakai wrote:
| Maybe I know the answer to my own question.
| When I built R 2.9.0, I didn't say:
| 
|   ./configure --enable-R-shlib
| 
| I know I have given --prefix flag, but that's
| the only flag I used.
| 
| I would appreciate it, if someone would give me
| a definitive answer, however.

You found the answer. littler aka 'r' embeds R by loading the shared
library --- the libR.so you were looking for.

Unless you have built R with --enable-R-shlib, you will not be able to use r,
or, for that matter, any other program that embeds R.

Hope this helps, Dirk
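For the archives, the rebuild Dirk describes would look roughly like this from
the top of the R source tree (the installation prefix is an assumption; keep
whatever --prefix you used before):

```shell
# Re-run configure with shared-library support, then rebuild and reinstall;
# this produces lib64/R/lib/libR.so under the prefix
./configure --prefix=/usr/local/R-2.9.0 --enable-R-shlib
make
make install
```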

 
| Regards,
| 
| Tena Sakai
| tsa...@gallo.ucsf.edu
| 
| 
| -Original Message-
| From: r-help-boun...@r-project.org on behalf of Tena Sakai
| Sent: Fri 5/8/2009 4:07 PM
| To: r-help@r-project.org
| Subject: [R] I don't see libR.so in my installation directory
|  
| Hi,
| 
| I installed R 2.9.0 a couple of days ago on a
| linux machine.  At the root of installation,
| I see 4 directories: bin, lib64, share, and src.
| 
| I don't see libR.so anywhere.  (In the following
| context, . (dot) indicates the root of my
| installation.)  I do see:
| ./lib64/R/lib/libRblas.so
| ./lib64/R/lib/libRlapack.so
| 
| I became aware of such as I was preparing for
| an installation of little r.  The installation
| material stated to look for libR.so, and I want
| to make sure that the one I installed (2.9.0)
| is used by little r.
| 
| Would someone please clue me in?  Why don't I
| have libR.so and yet when I execute ./bin/R
| it says:
|   R version 2.9.0 (2009-04-17)
| 
| Regards,
| 
| Tena Sakai
| tsa...@gallo.ucsf.edu
| 

-- 
Three out of two people have difficulties with fractions.
