Re: [R] Problem in installing and starting Rattle
On 15 November 2010 15:30, kgorahava <kgorah...@gmail.com> wrote:

> Hello, I am trying to install and run Rattle on my Dell laptop running
> Windows 7. The following three commands executed successfully:
>
>   install.packages("RGtk2")
>   install.packages("rattle")
>   library(rattle)
>   Rattle: Graphical interface for data mining using R.
>   Version 2.5.47 Copyright (c) 2006-2010 Togaware Pty Ltd.
>   Type 'rattle()' to shake, rattle, and roll your data.
>
> When I type the command below, I get an error message:
>
>   rattle()
>   Error in inDL(x, as.logical(local), as.logical(now), ...) :
>     unable to load shared object
>     'C:/Users/Kaushik/Documents/R/win-library/2.12/RGtk2/libs/i386/RGtk2.dll':
>     LoadLibrary failure: The specified module could not be found.
>   Failed to load RGtk2 dynamic library, attempting to install it.
>   trying URL 'http://downloads.sourceforge.net/gtk-win/gtk2-runtime-2.22.0-2010-10-21-ash.exe?download'
>   Content type 'application/octet-stream' length 7820679 bytes (7.5 Mb)
>   opened URL
>   downloaded 7.5 Mb
>   Learn more about GTK+ at http://www.gtk.org
>   If the package still does not load, please ensure that GTK+ is installed
>   and that it is on your PATH environment variable
>   IN ANY CASE, RESTART R BEFORE TRYING TO LOAD THE PACKAGE AGAIN
>   Error in as.GType(type) : Cannot convert RGtkBuilder to GType
>
>   rattle()
>   Error in as.GType(type) : Cannot convert RGtkBuilder to GType
>
> Why do I get this error message? Please advise.

It looks like you did not install the GTK+ runtime (which is independent of R) first. But in the process, RGtk2 noticed this, and it looks like it installed it okay. You might like to follow the instructions that are shouted at you in all capitals:

  IN ANY CASE, RESTART R BEFORE TRYING TO LOAD THE PACKAGE AGAIN

Hope that works.
Installation instructions for Rattle can be found at:
http://datamining.togaware.com/survivor/Install_MS_Windows.html

Regards,
Graham

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
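Graham's advice hinges on GTK+ being reachable via the PATH once R restarts. As a quick sanity check (a sketch, not Rattle's own diagnostic; the "gtk" directory-name pattern is an assumption and may not match your installer), you can inspect the PATH that R actually sees:

```r
# Split the PATH the R process sees into its entries.
path_entries <- strsplit(Sys.getenv("PATH"), .Platform$path.sep)[[1]]

# Look for anything GTK-related; the pattern "gtk" is an assumption --
# the runtime installer may have used a different directory name.
has_gtk <- any(grepl("gtk", path_entries, ignore.case = TRUE))
print(has_gtk)
```

If this is FALSE after installing the GTK+ runtime, the environment change has not reached the R session, which is exactly what restarting R fixes.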
Re: [R] Problems loading xlsx
Thanks Jim, that is my second-best option, but I would hope to get xlsx working as it seems a more convenient fit to xlsx files.

Paolo

On 15 November 2010 00:12, jim holtman <jholt...@gmail.com> wrote:

> If you are running on Windows, I would suggest that you use the RODBC
> package and the odbcConnectExcel2007 function. I have had reasonable
> success with this.
>
> On Sun, Nov 14, 2010 at 3:23 PM, Paolo Rossi
> <statmailingli...@googlemail.com> wrote:
>
>> Hi all, I am trying to use the package xlsx to read Excel 2007 files
>> and I am getting the error below.
>>
>>   library(xlsx)
>>   Loading required package: xlsxjars
>>   Loading required package: rJava
>>   Error : .onLoad failed in loadNamespace() for 'xlsxjars', details:
>>     call: .jinit()
>>     error: cannot obtain Class.getSimpleName method ID
>>   Error: package 'xlsxjars' could not be loaded
>>
>> By looking this up in the mailing list I have seen that it is an error
>> related to the configuration of the PATH. I was also made aware that
>> the PATH read into R gets truncated if it is too long. To avoid any
>> issue I have added the JRE at the very beginning of the PATH - see below.
>>
>>   p <- Sys.getenv("PATH")
>>   strsplit(p, ";")
>>   $PATH
>>   [1] "c:\\Program Files\\Java\\j2re1.4.2_06\\bin\\client\\"
>>   [2] "c:\\Program Files\\Java\\jre1.5.0_06\\bin\\client\\"
>>   [3] "c:\\oracle\\ora92\\bin\\"
>>   [4] "c:\\oracle\\ora92\\jre\\1.4.2\\bin\\"
>>   [5] "c:\\oracle\\ora92\\jre\\1.4.2\\bin\\client\\"
>>   [6] "c:\\program files\\oracle\\jre\\1.3.1\\bin\\"
>>   [7] "C:\\WINDOWS\\system32"
>>   [8] "C:\\WINDOWS"
>>
>> In the PATH variable the items have been pasted like this:
>>
>>   c:\Program Files\Java\j2re1.4.2_06\bin\client\;c:\Program Files\Java\jre1.5.0_06\bin\client\;
>>
>> The issue still persists. Can you please help?
>>
>> Thanks
>> Paolo
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
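Since rJava fails when the Java DLLs are not found via the PATH that R sees, one session-level workaround (a sketch; the JRE directory below is copied from the poster's listing and is an assumption, not a known-good location) is to prepend the Java client directory before loading the package:

```r
# A Windows-style Java client directory; adjust to where your JRE's
# jvm.dll actually lives (this particular path is the poster's).
java_bin <- "c:\\Program Files\\Java\\jre1.5.0_06\\bin\\client"

# Prepend it to the PATH for this R session only (Windows separates
# PATH entries with ";").
old_path <- Sys.getenv("PATH")
Sys.setenv(PATH = paste(java_bin, old_path, sep = ";"))

# Verify that the new entry is now first.
first_entry <- strsplit(Sys.getenv("PATH"), ";", fixed = TRUE)[[1]][1]
print(first_entry)
```

This only affects the running R session, so it sidesteps any truncation of the system-wide PATH.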
Re: [R] R package 'np' problems
Chris Carleton wrote:

> Hi List,
>
> I'm trying to get a density estimate for a point of interest from an
> npudens object created for a sample of points. I'm working with 4
> variables in total (3 continuous and 1 unordered discrete - the discrete
> variable is the character column in training.csv). When I try to
> evaluate the density for a point that was not used in the training
> dataset, and when I extract the fitted values from the npudens object
> itself, I'm getting values that are much greater than 1 in some cases,
> which, if I understand correctly, shouldn't be possible considering a
> pdf estimate can only be between 0 and 1. I think I must be doing
> something wrong, but I can't see it. Attached I've included the training
> data (training.csv) and the point of interest (origin.csv); below I've
> included the code I'm using and the results I'm getting.
>
> I also don't understand why, when trying to evaluate the npudens object
> at one point, I'm receiving the same set of fitted values from the
> npudens object with the predict() function. It should be noted that I'm
> indexing the dataframe of training data in order to get samples of the
> df for density estimation (the samples are from different geographic
> locations measured on the same set of variables; hence my use of
> sub-setting by [i] and removing columns from the df before running the
> density estimation). Moreover, in the example I'm providing here, the
> point of interest does happen to come from the training dataset, but I'm
> receiving the same results when I compare the point of interest to
> samples of which it is not a part (density estimates that are either
> extremely small, which is acceptable, or much greater than one, which
> doesn't seem right to me).
>
> Any thoughts would be greatly appreciated,
>
> Chris

I haven't looked at this in any detail, but why do you say that pdf values cannot exceed 1? That's certainly not true in general.
-Peter Ehlers

>   fitted(npudens(tdat=training_df[training_cols_select][training_df$cat == i,]))
>    [1] 7.762187e+18 9.385532e+18 6.514318e+18 7.583486e+18 6.283017e+18
>    [6] 6.167344e+18 9.820551e+18 7.952821e+18 7.882741e+18 1.744266e+19
>   [11] 6.653258e+18 8.704722e+18 8.631365e+18 1.876052e+19 1.995445e+19
>   [16] 2.323802e+19 1.203780e+19 8.493055e+18 8.485279e+18 1.722033e+19
>   [21] 2.227207e+19 2.177740e+19 2.168679e+19 9.329572e+18 9.380505e+18
>   [26] 1.023311e+19 2.109676e+19 7.903112e+18 7.935457e+18 8.91e+18
>   [31] 8.899827e+18 6.265440e+18 6.204720e+18 6.276559e+18 6.218002e+18
>
>   npu_dens <- npudens(tdat=training_df[training_cols_select][training_df$cat == i,])
>   summary(npu_dens)
>
>   Density Data: 35 training points, in 4 variable(s)
>                 aster_srtm_aspect aster_srtm_dem_filled aster_srtm_slope
>   Bandwidth(s):          29.22422          2.500559e-24         3.111467
>                 class_unsup_pc_iso
>   Bandwidth(s):          0.2304616
>
>   Bandwidth Type: Fixed
>   Log Likelihood: 1531.598
>
>   Continuous Kernel Type: Second-Order Gaussian
>   No. Continuous Vars.: 3
>   Unordered Categorical Kernel Type: Aitchison and Aitken
>   No. Unordered Categorical Vars.: 1
>
>   predict(npu_dens, newdata=origin[training_cols_select])
>    [1] 7.762187e+18 9.385532e+18 6.514318e+18 7.583486e+18 6.283017e+18
>   [... same 35 values as the fitted() output above ...]
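Peter's point is easy to demonstrate: a density must integrate to 1, but its value at any single point is unbounded. Note also the summary above, where one bandwidth is about 2.5e-24; such an extreme concentration is exactly what produces enormous density values. A minimal illustration in base R:

```r
# A pdf's *values* are not bounded by 1; only its integral is.
# A normal density with a small standard deviation peaks well above 1:
peak <- dnorm(0, mean = 0, sd = 0.05)
print(peak)    # about 7.98

# ...yet the density still integrates to 1 over its support:
area <- integrate(dnorm, -1, 1, mean = 0, sd = 0.05)$value
print(area)
```

As the bandwidth shrinks toward 0, the peak grows without bound, which is what the near-zero bandwidth in the summary has done here.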
[R] Version 2.12.0 exe file
I have R version 2.9.1 on my computer and the analysis is not working because I need to update to R version 2.12.0, the latest release. The person in charge of IT tried to download R version 2.12.0, but the .exe file referenced in the install instructions isn't there. What might we be doing wrong? We have downloaded the tar.gz, uncompressed it, and looked for the .exe file, but it is not present.

Many thanks,
Morris

Morris A. Anglin
Doctoral candidate
Institute of Health and Society
Newcastle University
Baddiley-Clark Building
Richardson Road
NE2 4AX UK
Direct line at Newcastle University: +44 (0)191 222 8899
Direct line at Freeman Hospital: Tel: +44 (0) 191 233 6161 ext 26727
Mobile: +44 (0)07976206103
Email: morris.ang...@ncl.ac.uk
Email: morris.ang...@nuth.nhs.uk
http://www.ncl.ac.uk/ihs/postgrad/research/studentprofile.htm

"To me, diversity is more complicated than rituals, food, clothing, etc. ...or race, language, and class, etc. To me, diversity needs to include divergent viewpoints, divergent interpretations, and divergent perspectives. Thus, diversity needs to include giving power and sanctioned space for divergent voices, even if this means disagreeing."
Re: [R] cannot see the y-labels (getting cut-off)
jim holtman wrote:

> increase the margins on the plot:
>
>   par(mar = c(4, 7, 2, 1))
>   plot(1:5, y, ylab = '', yaxt = 'n')
>   axis(2, at = y, labels = formatC(y, big.mark = ",", format = "fg"), las = 2, cex = 0.1)

That's what I would do, but if you want to see how cex works, use cex.axis=0.5. Check out ?par.

-Peter Ehlers

> On Sun, Nov 14, 2010 at 6:03 PM, sachinthaka.abeyward...@allianz.com.au wrote:
>
>> Hi All,
>>
>> When I run the following code, I cannot see the entire number: as
>> opposed to seeing 1,000,000,000, I only see 000,000 because the rest is
>> cut off. The cex option doesn't seem to be doing anything at all.
>>
>>   y <- seq(1e09, 5e09, 1e09)
>>   plot(1:5, y, ylab = '', yaxt = 'n')
>>   axis(2, at = y, labels = formatC(y, big.mark = ",", format = "fg"), las = 2, cex = 0.1)
>>
>> Any thoughts?
>>
>> Thanks,
>> Sachin
>>
>> p.s. sorry about corporate notice.
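Both suggestions can be combined; below is a self-contained version (with the quoting that was stripped from the original post restored). The key points: par(mar=) widens the left margin so the full label fits, and cex.axis, not cex, controls the axis label size:

```r
y <- seq(1e9, 5e9, 1e9)

# formatC with big.mark produces the comma-grouped labels:
labs <- formatC(y, big.mark = ",", format = "fg")
print(labs[1])   # "1,000,000,000"

# Widen the left margin to 7 lines so the long labels are not cut off,
# and shrink the axis labels with cex.axis (plain cex is ignored here):
par(mar = c(4, 7, 2, 1))
plot(1:5, y, ylab = "", yaxt = "n")
axis(2, at = y, labels = labs, las = 2, cex.axis = 0.5)
```

The cut-off labels in the original were a margin problem, not a formatting one: the labels were already fully formed, but the default left margin was too narrow to show them.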
Re: [R] Version 2.12.0 exe file
Morris Anglin wrote:

> I have R version 2.9.1 on my computer and the analysis is not working
> because I need to update to R version 2.12.0, the latest release. The
> person in charge of IT tried to download R version 2.12.0, but the .exe
> file referenced in the install instructions isn't there. What might we
> be doing wrong? We have downloaded the tar.gz, uncompressed it, and
> looked for the .exe file, but it is not present.

(It's a good idea to state your OS.)

Anyway, assuming that you got your download from your favourite CRAN repository, you may have missed these two sentences:

  "The sources have to be compiled before you can use them. If you do not
  know what this means, you probably do not want to do it!"

Pre-compiled versions are available at CRAN.

-Peter Ehlers
[R] Path to nodes in ctree package party
Hello list,

I'm wondering if there is a way to extract the path to terminal nodes in a BinaryTree object, e.g. from ctree, like the function path.rpart in package rpart.

Thanks,
Sven
Re: [R] Problem in installing and starting Rattle
kgorahava wrote:

> I am trying to install and run Rattle on my Dell laptop running Windows 7.
>
>   install.packages("RGtk2")
>   install.packages("rattle")
>   library(rattle)
>   Rattle: Graphical interface for data mining using R.
>   Version 2.5.47 Copyright (c) 2006-2010 Togaware Pty Ltd.
>   Type 'rattle()' to shake, rattle, and roll your data.
>
> When I type the below command, I get an error message.

This had bugged me before, and reinstalling RGtk2 over and over did not help. I got it to work after a total cleanup of all GTK-related files (use a global search!) and reinstalling GTK+. So old-version files had gotten in the way.

Dieter
Re: [R] Problem in installing and starting Rattle
On Mon, 15 Nov 2010, Dieter Menne wrote:

> kgorahava wrote:
>
>> I am trying to install and run Rattle on my Dell laptop running
>> Windows 7.
>>
>>   install.packages("RGtk2")
>>   install.packages("rattle")
>>   library(rattle)
>>   [...]
>>
>> When I type the below command, I get an error message.
>
> This had bugged me before, and reinstalling RGtk2 over and over did not
> help. I got it to work after a total cleanup of all GTK-related files
> (use a global search!) and reinstalling GTK+. So old-version files had
> gotten in the way.

If they are in your PATH, they will (and we do also warn that 32-bit DLLs in the path for 64-bit R, or vice versa, will lead to confusion). So just check your PATH: you can have multiple Gtk+ installations, and you need to if you use both 32- and 64-bit R (but you need to set the PATH appropriately).

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road,        +44 1865 272866 (PA)
Oxford OX1 3TG, UK         Fax: +44 1865 272595
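To tell which flavour of R a session is running (and hence which Gtk+ build must come first on the PATH), a quick check from within R:

```r
# 8-byte pointers mean 64-bit R; 4-byte pointers mean 32-bit R.
bits <- 8 * .Machine$sizeof.pointer
cat("This R session is", bits, "bit\n")

# The reported architecture (e.g. "x86_64" or "i386") tells the same story:
print(R.version$arch)
```

Whichever this reports, the matching Gtk+ build must appear on the PATH ahead of any other Gtk+ installation.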
[R] Problem with lme4 and lapack library
I have R under Linux Kubuntu Hardy Heron 8.04 LTS. I am trying to install the package 'lme4' and get the following error:

  * installing *source* package 'lme4' ...
  ** libs
  gcc -std=gnu99 -I/usr/share/R/include -I/usr/share/R/include -I/usr/lib/R/library/Matrix/include -I/usr/lib/R/library/stats/include -fpic -g -O2 -c init.c -o init.o
  gcc -std=gnu99 -I/usr/share/R/include -I/usr/share/R/include -I/usr/lib/R/library/Matrix/include -I/usr/lib/R/library/stats/include -fpic -g -O2 -c lmer.c -o lmer.o
  gcc -std=gnu99 -I/usr/share/R/include -I/usr/share/R/include -I/usr/lib/R/library/Matrix/include -I/usr/lib/R/library/stats/include -fpic -g -O2 -c local_stubs.c -o local_stubs.o
  gcc -std=gnu99 -shared -o lme4.so init.o lmer.o local_stubs.o -L/usr/lib64/R/lib -lRlapack -lblas -lgfortran -lm -lgcc_s -L/usr/lib64/R/lib -lR
  /usr/bin/ld: cannot find -lRlapack
  collect2: ld returned 1 exit status
  make: *** [lme4.so] Error 1
  ERROR: compilation failed for package 'lme4'
  * removing '/usr/local/lib/R/site-library/lme4'

  The downloaded packages are in
  '/tmp/Rtmp7uCMsh/downloaded_packages'
  Warning messages:
  In install.packages("lme4") :
    installation of package 'lme4' had non-zero exit status

Searching the internet and the R help system, it seems the problem is related to the LAPACK, BLAS and ATLAS libraries.
R tries, unsuccessfully, to find -lRlapack. I have checked which libraries R is pointing to:

  bu...@temisto:~$ ldd /usr/lib/R/bin/exec/R
        linux-vdso.so.1 => (0x7fff1e10)
        libR.so => /usr/lib/R/lib/libR.so (0x7f9157c54000)
        libc.so.6 => /lib/libc.so.6 (0x7f91578f2000)
        libblas.so.3gf => /usr/lib/atlas/libblas.so.3gf (0x7f9156f42000)
        libgfortran.so.2 => /usr/lib/libgfortran.so.2 (0x7f9156c83000)
        libm.so.6 => /lib/libm.so.6 (0x7f9156a02000)
        libreadline.so.5 => /lib/libreadline.so.5 (0x7f91567c2000)
        libpcre.so.3 => /usr/lib/libpcre.so.3 (0x7f915659c000)
        libz.so.1 => /usr/lib/libz.so.1 (0x7f9156385000)
        libdl.so.2 => /lib/libdl.so.2 (0x7f9156181000)
        /lib64/ld-linux-x86-64.so.2 (0x7f91581c)
        libncurses.so.5 => /lib/libncurses.so.5 (0x7f9155f46000)

I have also checked that the LAPACK and ATLAS libraries are installed:

  bu...@temisto:~$ dpkg -l | grep lapack
  ii lapack3        3.0.2531a-6.1ubuntu1
  ii liblapack-dev  3.1.1-0.3ubuntu2
  ii liblapack3gf   3.1.1-0.3ubuntu2
  bu...@temisto:~$ dpkg -l | grep atlas
  ii atlas3-base        3.6.0-20.6
  ii libatlas-base-dev  3.6.0-21.1ubuntu3
  ii libatlas-headers   3.6.0-21.1ubuntu3
  ii libatlas3gf-base   3.6.0-21.1ubuntu3

And what R itself reports:

  R CMD config BLAS_LIBS
  -lblas
  R CMD config LAPACK_LIBS
  -L/usr/lib64/R/lib -lRlapack

R seems to be looking for LAPACK in /usr/lib64/R/lib (-lRlapack), probably the wrong place. Can somebody point me to a possible solution?

--
Dr Javier Bustamante
Estación Biológica de Doñana, CSIC
Dept. Wetland Ecology
Américo Vespucio s/n
41092 - Sevilla, Spain
voice: +34 954 466700 ext. 1217
fax: +34 954 621125
e-mail: jbustama...@ebd.csic.es
http://www.ebd.csic.es/bustamante/index.html
other links:
http://last-ebd.blogspot.com/
http://euroconbio.blogspot.com/
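Whether a given R installation ships its own Rlapack can be checked from within R; the sketch below only inspects the library directory (the exact shared-library file name is platform-specific and is an assumption):

```r
# Directory where this R installation keeps its private shared libraries:
libdir <- R.home("lib")
print(libdir)

# On builds that bundle R's internal LAPACK, libRlapack lives here; on
# builds linked against an external LAPACK (as some distribution
# packages are), it does not -- and then '-lRlapack' fails at link time.
file.exists(file.path(libdir, "libRlapack.so"))
```

Comparing this directory against the `-L/usr/lib64/R/lib` that `R CMD config LAPACK_LIBS` reports would show whether the configured link path matches the actual installation.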
Re: [R] Sweave question
Duncan Murdoch-2 wrote:

> See SweavePDF() in the patchDVI package on R-forge.

In case googling patchDVI only shows a few Japanese pages, and a search for patchDVI on R-Forge gives nothing: try

  https://r-forge.r-project.org/projects/sweavesearch/

(or did I miss something obvious, Duncan?)

Dieter
Re: [R] Sweave question
On 15/11/2010 6:22 AM, Dieter Menne wrote:

> Duncan Murdoch-2 wrote:
>
>> See SweavePDF() in the patchDVI package on R-forge.
>
> In case googling patchDVI only shows a few Japanese pages, and a search
> for patchDVI on R-Forge gives nothing: try
> https://r-forge.r-project.org/projects/sweavesearch/
> (or did I miss something obvious, Duncan?)

No, I just didn't realize that it was hard to find. But you can always select R-forge as a repository, and then install.packages() will find it.

Duncan Murdoch
[R] mgarch-BEKK
Dear all,

Can anybody help me with mgarchBEKK? After estimating a BEKK model, I want to check whether the residuals meet the required assumptions. Can I perform a portmanteau test and the ARCH-LM test, and plot the ACF and PACF of the residuals? Can you give an example script in R?

Thank you so much.
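mgarchBEKK aside, the diagnostics themselves are standard; here is a sketch on a simulated series, where `res` is a stand-in for one series of standardized residuals extracted from your fitted BEKK object (how you extract them depends on the package):

```r
set.seed(42)
# Stand-in for standardized residuals from a fitted model (assumption:
# replace with the residual series from your own BEKK fit).
res <- rnorm(500)

# Portmanteau (Ljung-Box) test for remaining autocorrelation:
lb <- Box.test(res, lag = 10, type = "Ljung-Box")
print(lb)

# A simple check for remaining ARCH effects: Ljung-Box on the squared
# residuals (the formal ARCH-LM test lives in FinTS::ArchTest, not base R):
lb_sq <- Box.test(res^2, lag = 10, type = "Ljung-Box")
print(lb_sq)

# ACF and PACF plots of the residuals:
acf(res)
pacf(res)
```

Large p-values in both tests are consistent with the residuals being free of autocorrelation and ARCH effects; for a multivariate model, you would run these on each residual series (and ideally a multivariate portmanteau test as well).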
[R] comparing levels of aggregation with negative binomial models
Dear R community,

I would like to compare the degree of aggregation (or dispersion) of bacteria isolated from plant material. My data are discrete counts from leaf washes. While I do have xy coordinates for each plant, it is aggregation in the sense of the concentration of bacteria in high-density patches that I am interested in.

My attempt to analyze this was to fit negative binomial GLMs to each of my leaf treatments (using MASS) and to compare estimates of theta, using the standard errors to calculate confidence limits. My values of theta (se) were 0.387 (0.058) and 0.1035 (0.015), which were in the right direction for my hypothesis. However, some of the stats literature suggests that the confidence intervals of theta (or k) are not very robust and that it would be better to calculate confidence intervals for 1/k.

Is there a way I can estimate confidence intervals for 1/k in R, or indeed a more elegant way of looking at aggregation?

Many thanks for your time.

yours,

Dr Ben Raymond
NERC Advanced Research Fellow, Lecturer in Population Genetics
School of Biological Sciences, Royal Holloway University of London
Egham, Surrey, TW20 0EX
tel 0044 1784443547
ben.raym...@rhul.ac.uk
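One route to a confidence interval for 1/k is the delta method: since the derivative of 1/theta is -1/theta^2, se(1/theta) is approximately se(theta)/theta^2. A sketch using the first posted estimate (this is a normal-approximation interval; a profile-likelihood or bootstrap interval would be more defensible if the sampling distribution of theta is strongly skewed):

```r
# Delta-method CI for 1/theta, with the posted estimates as inputs.
theta    <- 0.387
se_theta <- 0.058

inv_theta <- 1 / theta
# Delta method: Var(1/theta) ~ Var(theta) / theta^4
se_inv <- se_theta / theta^2

ci <- inv_theta + c(-1, 1) * 1.96 * se_inv
round(c(estimate = inv_theta, lower = ci[1], upper = ci[2]), 3)
# estimate    lower    upper
#    2.584    1.825    3.343
```

Running the same calculation on the second estimate (0.1035, se 0.015) and checking whether the two 1/k intervals overlap gives a crude comparison of aggregation between the treatments.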
[R] How to move an internal function to external keeping same environment?
Hi

I have, within a quite big function foo1, an internal function foo2. Now, in order to have cleaner code, I wish to have the internal foo2 as external. This foo2 was using arguments within the foo1 environment that were not declared as inputs of foo2, which works as long as foo2 is within foo1, but not anymore once foo2 is external, as is the case now.

Now, I could add all those arguments as inputs to foo2, but I feel that if foo2 is called often, I would be copying those objects more than required. Am I wrong? I then used this to avoid declaring each argument to foo2 explicitly:

  foo1 <- function(x) {
    b <- x[1] + 2
    environment(foo2) <- new.env(parent = as.environment(-1))
    c <- foo2(x)
    return(c)
  }

  foo2 <- function(x) x * b

  # try:
  foo1(1:100)

This works. But I wanted to be sure:

- am I right that if I instead declare each element to be passed to foo2, this would be more copying than required? (imagine b in my case is a heavy dataset and foo2 a long computation)
- is the line environment(foo2) <- new.env(parent = as.environment(-1)) the good way to do it, or can it have unwanted implications?

Thanks a lot!!

Matthieu
Re: [R] How to move an internal function to external keeping same environment?
On Mon, Nov 15, 2010 at 7:48 AM, Matthieu Stigler <matthieu.stig...@gmail.com> wrote:

> I have, within a quite big function foo1, an internal function foo2.
> [...] This foo2 was using arguments within the foo1 environment that
> were not declared as inputs of foo2, which works as long as foo2 is
> within foo1, but not anymore once foo2 is external. [...]
>
>   foo1 <- function(x) {
>     b <- x[1] + 2
>     environment(foo2) <- new.env(parent = as.environment(-1))
>     c <- foo2(x)
>     return(c)
>   }
>
>   foo2 <- function(x) x * b
>
>   # try:
>   foo1(1:100)
>
> [...] is the line environment(foo2) <- new.env(parent = as.environment(-1))
> the good way to do it, or can it have unwanted implications?

This would be good enough (replacing your environment(foo2) <- ... line):

  environment(foo2) <- environment()

If you add parameters to foo2, it won't actually copy them unless they are modified in foo2.

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
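Gabor's one-liner in a self-contained form, using the poster's toy functions (assignment arrows restored from the garbled post). Note that environment(foo2) <- environment() inside foo1 modifies a *local* copy of foo2 in foo1's frame, so the global foo2 is left untouched:

```r
foo2 <- function(x) x * b   # 'b' is free; it is looked up in foo2's environment

foo1 <- function(x) {
  b <- x[1] + 2
  # Point (a local copy of) foo2 at foo1's evaluation frame, so the
  # free variable 'b' resolves to the local b above:
  environment(foo2) <- environment()
  foo2(x)
}

foo1(1:100)[1:3]   # b = 3, so the first values are 3 6 9
```

Because the replacement assignment creates the modified foo2 locally inside foo1, repeated calls do not interfere with each other or with any other caller of the global foo2.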
Re: [R] How to move an internal function to external keeping same environment?
On 15/11/2010 7:48 AM, Matthieu Stigler wrote:

> I have, within a quite big function foo1, an internal function foo2.
> [...] I then used this to avoid declaring each argument to foo2
> explicitly:
>
>   foo1 <- function(x) {
>     b <- x[1] + 2
>     environment(foo2) <- new.env(parent = as.environment(-1))
>     c <- foo2(x)
>     return(c)
>   }
>
>   foo2 <- function(x) x * b
>
> This works. But I wanted to be sure: [...] can it have unwanted
> implications?

I don't think modifying the environment of a closure could be seen as a way to get cleaner code. I'd just leave foo2 within foo1. There are several unwanted implications of the way you do it:

- Setting the environment of foo2 to the foo1 evaluation frame means those local variables won't be garbage collected. If foo1 creates a large temporary matrix, it will take up space in memory until you modify foo2 again.

- If we say that foo2 has global scope (it might be limited to your package namespace, but let's call that global), then your foo1 has global side effects, and those are often a bad idea. For example, suppose next week you define foo3 that also uses and modifies foo2. It's very easy for foo1 and foo3 to clash with conflicting uses of the global foo2. Don't use globals unless you really have to.
Duncan Murdoch
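Duncan's preferred arrangement, keeping foo2 inside foo1, needs no environment manipulation at all, because a function defined inside foo1 closes over foo1's frame and sees b directly; a sketch:

```r
foo1 <- function(x) {
  b <- x[1] + 2
  # Defined here, foo2's enclosing environment *is* foo1's frame,
  # so the free variable 'b' is found without any extra plumbing:
  foo2 <- function(x) x * b
  foo2(x)
}

foo1(1:100)[1:3]   # 3 6 9, same result as the environment() version
```

Nothing escapes foo1's frame this way, so both of the drawbacks above (retained memory and global side effects) disappear.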
Re: [R] R package 'np' problems
Hi Peter and List, I realized the err of my ways here. Thanks for the response; I appreciate the help. The struggles of self-taught statistics and maths continue! Chris On 15 November 2010 04:34, P Ehlers ehl...@ucalgary.ca wrote: Chris Carleton wrote: Hi List, I'm trying to get a density estimate for a point of interest from an npudens object created for a sample of points. I'm working with 4 variables in total (3 continuous and 1 unordered discrete - the discrete variable is the character column in training.csv). When I try to evaluate the density for a point that was not used in the training dataset, and when I extract the fitted values from the npudens object itself, I'm getting values that are much greater than 1 in some cases, which, if I understand correctly, shouldn't be possible considering a pdf estimate can only be between 0 and 1. I think I must be doing something wrong, but I can't see it. Attached I've included the training data (training.csv) and the point of interest (origin.csv); below I've included the code I'm using and the results I'm getting. I also don't understand why, when trying to evaluate the npudens object at one point, I'm receiving the same set of fitted values from the npudens object with the predict() function. It should be noted that I'm indexing the dataframe of training data in order to get samples of the df for density estimation (the samples are from different geographic locations measured on the same set of variables; hence my use of sub-setting by [i] and removing columns from the df before running the density estimation). Moreover, in the example I'm providing here, the point of interest does happen to come from the training dataset, but I'm receiving the same results when I compare the point of interest to samples of which it is not a part (density estimates that are either extremely small, which is acceptable, or much greater than one, which doesn't seem right to me). 
Any thoughts would be greatly appreciated, Chris I haven't looked at this in any detail, but why do say that pdf values cannot exceed 1? That's certainly not true in general. -Peter Ehlers fitted(npudens(tdat=training_df[training_cols_select][training_df$cat == i,])) [1] 7.762187e+18 9.385532e+18 6.514318e+18 7.583486e+18 6.283017e+18 [6] 6.167344e+18 9.820551e+18 7.952821e+18 7.882741e+18 1.744266e+19 [11] 6.653258e+18 8.704722e+18 8.631365e+18 1.876052e+19 1.995445e+19 [16] 2.323802e+19 1.203780e+19 8.493055e+18 8.485279e+18 1.722033e+19 [21] 2.227207e+19 2.177740e+19 2.168679e+19 9.329572e+18 9.380505e+18 [26] 1.023311e+19 2.109676e+19 7.903112e+18 7.935457e+18 8.91e+18 [31] 8.899827e+18 6.265440e+18 6.204720e+18 6.276559e+18 6.218002e+18 npu_dens - npudens(tdat=training_df[training_cols_select][training_df$cat == i,]) summary(npu_dens) Density Data: 35 training points, in 4 variable(s) aster_srtm_aspect aster_srtm_dem_filled aster_srtm_slope Bandwidth(s): 29.22422 2.500559e-24 3.111467 class_unsup_pc_iso Bandwidth(s): 0.2304616 Bandwidth Type: Fixed Log Likelihood: 1531.598 Continuous Kernel Type: Second-Order Gaussian No. Continuous Vars.: 3 Unordered Categorical Kernel Type: Aitchison and Aitken No. 
Unordered Categorical Vars.: 1
predict(npu_dens, newdata=origin[training_cols_select])
[1] 7.762187e+18 9.385532e+18 6.514318e+18 7.583486e+18 6.283017e+18
[6] 6.167344e+18 9.820551e+18 7.952821e+18 7.882741e+18 1.744266e+19
[11] 6.653258e+18 8.704722e+18 8.631365e+18 1.876052e+19 1.995445e+19
[16] 2.323802e+19 1.203780e+19 8.493055e+18 8.485279e+18 1.722033e+19
[21] 2.227207e+19 2.177740e+19 2.168679e+19 9.329572e+18 9.380505e+18
[26] 1.023311e+19 2.109676e+19 7.903112e+18 7.935457e+18 8.91e+18
[31] 8.899827e+18 6.265440e+18 6.204720e+18 6.276559e+18 6.218002e+18
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
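Peter's point can be checked directly: a continuous density may exceed 1 pointwise, since only its *integral* must equal 1. (That said, values on the order of 1e19 in the output above likely trace to the near-zero bandwidth 2.500559e-24 that np selected for one variable.) A base-R sketch:

```r
# A continuous density can exceed 1 at a point; only its integral is 1.
d0 <- dnorm(0, mean = 0, sd = 0.01)   # density of a narrow normal at its mode
d0 > 1                                 # TRUE: the value is roughly 39.9
area <- integrate(dnorm, -Inf, Inf, sd = 0.01)$value
area                                   # ~1, as required of a pdf
```

So a large fitted density is not by itself evidence of an error, although a bandwidth of 2.5e-24 usually is.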
Re: [R] [klaR package] [NaiveBayes] warning message numerical 0 probability
On 03.11.2010 16:26, Fabon Dzogang wrote: Hi, I run R 2.10.1 under Ubuntu 10.04 LTS (Lucid Lynx) and klaR version 0.6-4. I compute a model over a two-class dataset (composed of 700 examples). To that aim, I use the function NaiveBayes provided in the package klaR. When I then use the prediction function predict(my_model, new_data), I get the following warning: In FUN(1:747[[747L]], ...) : Numerical 0 probability with observation 458 As I did not find any documentation or any discussion concerning this warning message, I looked in the klaR source code and found the following line in predict.NaiveBayes.R: warning("Numerical 0 probability with observation ", i) Within naive Bayes, in order to calculate the posterior probabilities of the classes it is necessary to calculate the probabilities of the observations given the classes. The function NaiveBayes prints a warning if all these probabilities are numerically 0, i.e. the observation has a numerical probability of 0 for *all* classes. Usually this is only the case when the obs. is an extreme outlier. I will change the warning to say "all classes" in further releases of klaR. Best wishes, Uwe Ligges Unfortunately, it is hard to get a clear picture of the whole process reading the code. I wonder if someone could help me with the meaning of this warning message. Sorry I did not provide an example, but I could not simulate the same message over a small toy example. Thank you, Fabon Dzogang.
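The situation Uwe describes can be reproduced with base R alone (a sketch with hypothetical class parameters, not klaR's internals): for an extreme outlier, the Gaussian class-conditional density underflows to exactly 0 for every class, leaving nothing to normalize into posteriors.

```r
# Sketch: why an extreme outlier can have numerical probability 0 for
# *all* classes under a Gaussian naive Bayes (class parameters invented).
means <- c(classA = 0, classB = 5)
sds   <- c(classA = 1, classB = 1)
x <- 1000                              # extreme outlier
dens <- sapply(names(means),
               function(cl) dnorm(x, means[[cl]], sds[[cl]]))
dens                                   # both densities underflow to 0
all(dens == 0)                         # TRUE: the case klaR warns about
```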
[R] Access DDE Data
Hello Group, Is it possible to access DDE data from R? Thanks in advance for any pointers. S
[R] Zero truncated Poisson distribution R2WinBUGS
I am using a binomial mixture model to estimate abundance (N) and detection probability (p) using simulated count data: - Each site has a simulated abundance that follows a Poisson distribution with lambda = 5 - There are 200 simulated sampled sites - 3 repeated counts at each site - only 50 percent of the animals are counted during each count (i.e., detection probability p = 0.5, see code) We removed sites in which animals were never counted (see matrix y in the script). I would like to use a zero-truncated version of the Poisson distribution (I am aware of zero-inflated binomial mixture models, but still want to solve the problem described above). The code below: (1) generates a matrix of counts (y); rows correspond to sites and columns to repeated visits at each site. The script also removes sites where no animals were counted during the 3 consecutive visits. (2) The second part of the script calls WinBUGS and runs the binomial mixture model on the count data. In this case the count matrix y was converted to a vector C1 before being passed over to BUGS. Any idea how to create a zero-truncated Poisson for parameter lam1 (i.e., parameter lambda of the Poisson distribution)? Thank you for your help.
# R script
# Simulated abundance data
n.site <- 200                 # 200 sites visited
lam <- 5                      # mean number of animals per site
R <- n.site                   # number of sites
T <- 3                        # number of replicate counts at each site
N <- rpois(n = n.site, lambda = lam)   # true abundance
# Simulate count data; only a fraction of N is counted, which results in y
y <- array(dim = c(R, T))
for(i in 1:T){ y[, i] <- rbinom(n = n.site, size = N, prob = 0.5) }
# Truncate y matrix; y is an R-by-T matrix of counts
sumy <- apply(y, 1, sum)
cbindysumy <- cbind(y, sumy)
subsetcbindysumy <- subset(cbindysumy, sumy != 0)
y <- subsetcbindysumy[, 1:3]  # sites where no animals were ever counted are removed
C1 <- c(y)                    # vectorized matrix y
R <- dim(y)[1]
site <- 1:R
site.p <- rep(site, T)

# WinBUGS code
library(R2WinBUGS)            # load R2WinBUGS package
sink("Model.txt")
cat("
model {
  # Priors: new uniform priors
  p0 ~ dunif(0, 1)
  lam1 ~ dgamma(.01, .01)
  # Likelihood
  # Biological model for true abundance
  for (i in 1:R) {            # loops over R sites
    N1[i] ~ dpois(lambda1[i])
    lambda1[i] <- lam1
  }
  # Observation model for replicated counts
  for (i in 1:n) {            # loops over all n observations
    C1[i] ~ dbin(p1[i], N1[site.p[i]])
    p1[i] <- p0
  }
  # Derived quantities
  totalN1 <- sum(N1[])        # estimate total population size across all sites
}
", fill = TRUE)
sink()

# Package data for WinBUGS
R <- dim(y)[1]                # number of sites
n <- dim(y)[1] * dim(y)[2]    # number of observations (sites * surveys)
win.data <- list(R = R, n = n, C1 = C1, site.p = site.p)
# Inits
Nst <- apply(y, 1, max) + 1
inits <- function(){ list(N1 = Nst, lam1 = runif(1, 1, 8), p0 = runif(1)) }
parameters <- c("totalN1", "p0", "lam1")
# MCMC settings
nc <- 3
nb <- 400     # need to push to 400 for convergence
ni <- 1400    # need to push to 14000 for convergence
nt <- 1
out <- bugs(win.data, inits, parameters, "Model.txt", n.chains = nc,
            n.iter = ni, n.burn = nb, n.thin = nt, debug = TRUE)
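One route to the zero truncation, hedged as a sketch rather than a tested model change: in BUGS the Poisson prior can be truncated with N1[i] ~ dpois(lambda1[i])I(1,) in WinBUGS (JAGS uses T(1,) for genuine truncation; WinBUGS's I() formally denotes censoring, so check that it matches your intent). On the R side, matching zero-truncated counts can be simulated by inverse-CDF sampling, which is handy for checking the model against known truth:

```r
# Sketch: draw from a zero-truncated Poisson by inverse-CDF.
# The name rztpois is ad hoc, not a base-R function.
rztpois <- function(n, lambda) {
  # exclude P(N = 0): sample the CDF uniformly on (P(0), 1)
  u <- runif(n, min = dpois(0, lambda), max = 1)
  qpois(u, lambda)
}
set.seed(1)
N <- rztpois(200, lambda = 5)   # simulated true abundances, all >= 1
min(N)
```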
Re: [R] Surprising behavior using seq()
Patrick and Bert, Thank you both for your replies to my question. I see how my naïve expectations fall afoul of floating-point arithmetic. However, I still believe there is an underlying problem. It seems to me that when asked, c(7.7, 7.8, 7.9) %in% seq(4, 8, by=0.1) [1] TRUE FALSE TRUE R should return TRUE in all instances. %in% is testing set membership... in that way, shouldn't it be using all.equal() (instead of the implicit '=='), as Patrick suggests in the R Inferno? Is there a convenient way to test set membership using all.equal()? In particular, can you do it (conveniently) when the lengths of the numeric vectors are different? Thanks again for your reply! Vadim On Nov 13, 2010, at 5:46 AM, Patrick Burns wrote: See Circle 1 of 'The R Inferno'.
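One convenient sketch of a tolerance-based membership test (near_in and tol are ad hoc names, not base-R functions): compare each candidate against the whole table with an explicit tolerance, which also works when the two vectors have different lengths.

```r
# Membership up to a numeric tolerance; works for vectors of any lengths.
near_in <- function(x, table, tol = 1e-8) {
  sapply(x, function(xi) any(abs(table - xi) < tol))
}
near_in(c(7.7, 7.8, 7.9), seq(4, 8, by = 0.1))
# [1] TRUE TRUE TRUE
```

The tolerance plays the role of all.equal()'s default, but the vectorized comparison avoids calling all.equal() pairwise.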
Re: [R] likelyhood maximization problem with polr
Hello, Thanks for your answer. I will try this function to see if it gives results equivalent to those obtained with polr() + dropterm() (in a case where polr() works). Many thanks
Re: [R] Replicate Excel's LOGEST worksheet function in R
On Sat, Nov 13, 2010 at 10:37 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote: Anyway, I recommend you learn from David before criticizing his assistance. On Fri, Nov 12, 2010 at 5:28 PM, David Winsemius dwinsem...@comcast.net wrote: Then WHY did you ask for a function that would duplicate a particular Excel function? ... especially an Excel function which DOES a linear fit on the log of the Y argument? It was not a critical response. It was a clarification with the solution that fit my issue. In fact, David was directionally helpful in solving the issue. I am neither a statistician nor well versed in the computational nuance of Excel functions. In the future, I will refrain from such clarifying posts to avoid inflammatory ones.
Re: [R] interpretation of coefficients in survreg AND obtaining the hazard function
1. The Weibull is the only distribution that can be written in both a proportional hazards form and an accelerated failure time form. Survreg uses the latter. In an AFT model, we model the time to failure. Positive coefficients are good (longer time to death). In a PH model, we model the death rate. Positive coefficients are bad (higher death rate). You are not the first to be confused by the change in sign between the two models. 2. There are about 5 different ways to parameterize a Weibull distribution; 1-4 appear in various texts and the AFT form is #5. This is a second common issue with survreg that strikes only the more sophisticated users: to understand the output they look up the Weibull in a textbook, and become even more confused! Kalbfleisch and Prentice is a good reference for the AFT form. The manual page for psurvreg has some information on this, as does the very end of ?survreg. The psurvreg page also has an example of how to extract the hazard function for a Weibull fit. Begin included message Dear R help list, I am modeling some survival data with coxph and survreg (dist='weibull') using package survival. I have 2 problems: 1) I do not understand how to interpret the regression coefficients in the survreg output, and it is not clear to me, from ?survreg.objects, how to.
Here is an example of the code that points out my problem: - data is stc1 - the factor is dichotomous with 'low' and 'high' categories
slr <- Surv(stc1$ti_lr, stc1$ev_lr==1)
mca <- coxph(slr ~ as.factor(grade2=='high'), data=stc1)
mcb <- coxph(slr ~ as.factor(grade2), data=stc1)
mwa <- survreg(slr ~ as.factor(grade2=='high'), data=stc1, dist='weibull', scale=0)
mwb <- survreg(slr ~ as.factor(grade2), data=stc1, dist='weibull', scale=0)
summary(mca)$coef
coef exp(coef) se(coef) z Pr(>|z|) as.factor(grade2 == 'high')TRUE 0.2416562 1.273356 0.2456232 0.9838494 0.3251896
summary(mcb)$coef
coef exp(coef) se(coef) z Pr(>|z|) as.factor(grade2)low -0.2416562 0.7853261 0.2456232 -0.9838494 0.3251896
summary(mwa)$coef
(Intercept) as.factor(grade2 == 'high')TRUE 7.9068380 -0.4035245
summary(mwb)$coef
(Intercept) as.factor(grade2)low 7.5033135 0.4035245
No problem with the interpretation of the coefs in the Cox model. However, I do not understand why a) the coefficients in the survreg model are the opposite (negative when the other is positive) of what I have in the Cox model? Are these not the log(HR) given the categories of this variable? b) How come the intercept coefficient changes (the scale parameter does not change)? 2) My second question relates to the first. a) Given a model from survreg, say mwa above, how do I extract the baseline hazard and the hazard of each patient given a set of predictors? With the hazard function for the ith individual in the study given by h_i(t) = exp(\beta'x_i)*\lambda*\gamma*t^{\gamma-1}, it doesn't look to me like predict(mwa, type='linear') is \beta'x_i. b) Since I need the intercept coefficient from the model to obtain the scale parameter, and hence the baseline hazard function as defined in Collett (h_0(t) = \lambda*\gamma*t^{\gamma-1}), I am concerned that this intercept changes depending on the reference level of the factor entered in the model. The change is very important when I have more than one predictor in the model.
Any help would be greatly appreciated, David Biau.
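The sign flip Terry describes can be checked numerically: for a Weibull fit, the PH coefficient is approximately -coef(AFT)/scale. A sketch on the lung data shipped with the survival package (the two estimates agree only approximately, since the models are fit by different methods; this is not the poster's stc1 data):

```r
# Relate survreg (AFT) and coxph (PH) coefficients for a Weibull model:
# beta_PH ~ -beta_AFT / scale.
library(survival)
aft <- survreg(Surv(time, status) ~ sex, data = lung, dist = "weibull")
cox <- coxph(Surv(time, status) ~ sex, data = lung)
beta_ph_from_aft <- -coef(aft)["sex"] / aft$scale
c(cox = unname(coef(cox)["sex"]), from_aft = unname(beta_ph_from_aft))
# close, but not identical: same sign, similar magnitude
```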
Re: [R] ... predict.coxph
If you are looking at radioactive decay, maybe, but how often do you actually see exponential KM curves in real life? Exponential curves are rare. But proportional hazards does not imply exponential. A trial design could in fact try to get all the control sample to event at the same time if enough was known about prognostic factors and natural trajectory You are a dreamer. We know very little about even the diseases we know best. The life insurers are in no danger yet from accurate predictions by the medical community. Terry T
Re: [R] Surprising behavior using seq()
On 15/11/2010 9:24 AM, Vadim Patsalo wrote: Patrick and Bert, Thank you both for your replies to my question. I see how my naïve expectations fall afoul of floating-point arithmetic. However, I still believe there is an underlying problem. It seems to me that when asked, c(7.7, 7.8, 7.9) %in% seq(4, 8, by=0.1) [1] TRUE FALSE TRUE R should return TRUE in all instances. %in% is testing set membership... in that way, shouldn't it be using all.equal() (instead of the implicit '=='), as Patrick suggests in the R Inferno? No, because 7.8 is not in the set. Some number quite close to it is there, but no 7.8. This is true for both meanings of 7.8: - the number 78/10 - R's representation of that number Neither one is in the set. What you have there is R's representation of 4 plus 38 times R's representation of 0.1. (R can represent 4 and 38 exactly, but not 0.1, 7.8, pi, or most other numbers.) Duncan Murdoch Is there a convenient way to test set membership using all.equal()? In particular, can you do it (conveniently) when the lengths of the numeric vectors are different? Thanks again for your reply! Vadim On Nov 13, 2010, at 5:46 AM, Patrick Burns wrote: See Circle 1 of 'The R Inferno'.
Re: [R] Surprising behavior using seq()
Vadim Patsalo wrote: Patrick and Bert, Thank you both for your replies to my question. I see how my naïve expectations fall afoul of floating-point arithmetic. However, I still believe there is an underlying problem. It seems to me that when asked, c(7.7, 7.8, 7.9) %in% seq(4, 8, by=0.1) [1] TRUE FALSE TRUE R should return TRUE in all instances. %in% is testing set membership... in that way, shouldn't it be using all.equal() (instead of the implicit '=='), as Patrick suggests in the R Inferno? Is there a convenient way to test set membership using all.equal()? In particular, can you do it (conveniently) when the lengths of the numeric vectors are different? This looks like a job for zapsmall; check ?zapsmall. -Peter Ehlers Thanks again for your reply! Vadim On Nov 13, 2010, at 5:46 AM, Patrick Burns wrote: See Circle 1 of 'The R Inferno'.
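Peter's suggestion in a sketch: zapsmall() rounds away the tiny representation error in the sequence before the exact comparison that %in% performs.

```r
# zapsmall rounds to a sensible number of digits, so the rounded
# sequence element compares equal to the literal 7.8.
s <- seq(4, 8, by = 0.1)
c(7.7, 7.8, 7.9) %in% s             # the 7.8 element fails exact comparison
c(7.7, 7.8, 7.9) %in% zapsmall(s)   # TRUE for all three after rounding
```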
[R] Help with Hist
Dear list, I am trying to re-scale a histogram using hist() but cannot seem to create a reduced scale where the upper values are not plotted. What I have is about 100 values, of which 80 or so are between 0 and 40, one or two in the hundreds, and an outlier around 2000. What I would like to do is create an x-scale that shows 5 bins between 0-100 and then 3-4 bins between 100 and 2000, but I don't need any resolution on the values above 100. If I use breaks = c(0, 20, 40, 60, 80, 100, 200, 500), R gives me an error saying that there are values not included, which of course I know, but I wish to ignore them. It seems that I am missing something quite simple. Any help would be appreciated. Regards Steve
[R] Null values in R
Hi R-helpers, can you please let me know the methods in which NULL values can be handled in R? Are there any generic commands/functions that can be given in a workspace, so that the NULL values occurring in that workspace (for any datasets that are loaded, any output that is calculated) are treated in the same way? Thanks in advance.
Re: [R] RCurl and cookies in POST requests
Hello Duncan. Thanks for having a look at this. As soon as I get home I'll try your suggestion. BTW, the link to the omega-help mailing list seems to be broken: http://www.omegahat.org/mailman/listinfo/ Thank you. chr Duncan Temple Lang (Monday 15 November 2010, 01:02): Hi Christian Thanks for finding this. The problem seems to be that the finalizer on the curl handle seems to disappear and so is not being called when the handle is garbage collected. So there is a bug somewhere and I'll try to hunt it down quickly. In the meantime, you can achieve the same effect by calling the C routine curl_easy_cleanup. You can't do this directly with a .Call() or .C() as there is no explicit interface in the RCurl package to this routine. However, you can use the Rffi package (on the Omegahat repository): library(Rffi); cif = CIF(voidType, list(pointerType)); callCIF(cif, "curl_easy_cleanup", c...@ref) I'll keep looking for why the finalizer is getting discarded. Thanks again, D. On 11/14/10 6:30 AM, Christian M. wrote: Hello. I know that it's usually possible to write cookies to a cookie file by removing the curl handle and doing a gc() call. I can do this with getURL(), but I just can't obtain the same results with postForm(). If I use: curlHandle <- getCurlHandle(cookiefile="FILE", cookiejar="FILE") and then do: getURL("http://example.com/script.cgi", curl=curlHandle); rm(curlHandle); gc() it's OK, the cookie is there. But, if I do (same handle; the parameter is a dummy): postForm(site, .params=list(par="cookie"), curl=curlHandle, style="POST"); rm(curlHandle); gc() no cookie is written. Probably I'm doing something wrong, but don't know what. Is it possible to store cookies read from the output of a postForm() call? How? Thanks. Christian PS.: I'm attaching a script that can be sourced (and its .txt version). It contains an example. The expected result is a file (cookies.txt) with two cookies. The script currently uses getURL() and two cookies are stored.
If postForm() is used (currently commented), only 1 cookie is written. -- SDF Public Access UNIX System - http://sdf.lonestar.org
[R] How to Read a Large CSV into a Database with R
Hi, I'm working in R 2.11.1 x64 on Windows x86_64-pc-mingw32. I'm trying to insert a very large CSV file into a SQLite database. I'm pretty new to working with databases in R, so I apologize if I'm overlooking something obvious here. I'm trying to work with the American Community Survey data, which is two 1.3GB csv files. I have enough RAM to read one of them into memory, but not both at the same time. So, in order to analyze them, I'm trying to get them into a SQLite database so I can use the R survey package's database-backed survey objects capabilities (http://faculty.washington.edu/tlumley/survey/svy-dbi.html). I need to combine both of these CSV files into one table (within a database), so I think that I'd need a SQL manipulation technique that reads everything line by line, instead of pulling it all into memory. I've tried using read.csv.sql, but it finishes without an error and then only shows me the table structure when I run the final select statement. When I run these exact same commands on a smaller CSV file, they work fine. I imagine this is not working because the csv is so large, but I'm not sure how to confirm that or what to change if it is. I do want to get all columns from the CSV into the data table, so I don't want to filter anything.
library(sqldf)
setwd("R:\\American Community Survey\\Data\\2009")
sqldf("attach 'sqlite' as new")
read.csv.sql("ss09pusa.csv", sql="create table ss09pusa as select * from file", dbname="sqlite")
sqldf("select * from ss09pusa limit 3", dbname="sqlite")
I've also tried using the SQL IMPORT command, which I couldn't get working properly, even on a tiny two-field, five-row CSV file.
library(RSQLite)
setwd("R:\\American Community Survey\\Data\\2009")
in_csv <- file("test.csv")
out_db <- dbConnect(SQLite(), dbname="sqlite.db")
dbGetQuery(out_db, "create table test (hello integer, world text)")
dbGetQuery(out_db, "import in_csv test")
Any advice would be sincerely appreciated. Thanks!
Anthony Damico Kaiser Family Foundation
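A chunked alternative that never holds the full file in memory, sketched with the RSQLite package (the function name and chunk size are illustrative, and read.csv's end-of-connection error is used as the loop terminator):

```r
library(RSQLite)

# Stream a CSV into an SQLite table in fixed-size chunks.
csv_to_sqlite <- function(csv, dbname, table, chunk = 10000) {
  con <- dbConnect(SQLite(), dbname = dbname)
  inp <- file(csv, open = "r")
  on.exit({ close(inp); dbDisconnect(con) })
  first <- read.csv(inp, nrows = chunk)        # header consumed once here
  dbWriteTable(con, table, first, overwrite = TRUE)
  repeat {
    piece <- tryCatch(
      read.csv(inp, header = FALSE, nrows = chunk, col.names = names(first)),
      error = function(e) NULL)                # read.csv errors at EOF
    if (is.null(piece) || nrow(piece) == 0) break
    dbWriteTable(con, table, piece, append = TRUE)
  }
  invisible(table)
}

# e.g. csv_to_sqlite("ss09pusa.csv", "sqlite.db", "ss09pusa")
```

Running it once per ACS file with the same table name (switching the first call to append as well) would stack both CSVs into one table.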
Re: [R] jackknife-after-bootstrap
Can someone help me about detection of outliers using the jackknife-after-bootstrap algorithm? A simple procedure is to calculate the mean of the bootstrap statistics for all bootstrap samples that omit the first of the original observations. Repeat for the second, third, ... original observation. You now have $n$ means, and can look at these for outliers. A similar approach is to calculate means of bootstrap statistics for samples that include (rather than omit) each of the original observations. Both of those approaches can suffer from considerable random variability. Provided the number of bootstrap samples is large, a better approach is to use linear regression, where y = the vector of bootstrap statistics, length R; X = R by n matrix, with X[i, j] = the number of times original observation j is included in bootstrap sample i; and without an intercept. The $n$ regression coefficients give estimates of the influence of the original observations, and you can look for outliers in these influence estimates. For comparison, the first simple procedure above corresponds to taking averages of y for rows with X[, j] = 0, and the similar approach to averaging y for rows with X[, j] > 0. For further discussion see Hesterberg, Tim C. (1995), Tail-Specific Linear Approximations for Efficient Bootstrap Simulations, Journal of Computational and Graphical Statistics, 4(2), 113-133. Hesterberg, Tim C. and Stephen J. Ellis (1999), Linear Approximations for Functional Statistics in Large-Sample Applications, Technical Report No. 86, Research Department, MathSoft, Inc., 1700 Westlake Ave. N., Suite 500, Seattle, WA 98109. http://home.comcast.net/~timhesterberg/articles/tech86-linear.pdf
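The regression approach above can be sketched in a few lines of base R (a toy example with one planted outlier; for the mean, the bootstrap statistic is exactly linear in the inclusion counts, so the regression recovers x/n):

```r
# Jackknife-after-bootstrap influence via regression: regress bootstrap
# statistics y on the matrix of inclusion counts X, with no intercept.
set.seed(42)
x <- c(rnorm(30), 10)                     # one planted outlier (obs. 31)
n <- length(x); Rboot <- 2000
idx  <- matrix(sample(n, n * Rboot, replace = TRUE), nrow = Rboot)
y    <- apply(idx, 1, function(i) mean(x[i]))   # bootstrap statistics
X    <- t(apply(idx, 1, tabulate, nbins = n))   # inclusion counts, R by n
infl <- coef(lm(y ~ X - 1))                     # influence estimates
which.max(infl)                                 # flags the planted outlier
```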
Re: [R] Help with Hist
Hi, I think you should also give the upper extreme:
x <- c(rnorm(80)+10, 101:110, 2001:2010)
hist(x, breaks=c(0, 20, 40, 60, 80, 100, 200, 500))
Error in hist.default(x, breaks = c(0, 20, 40, 60, 80, 100, 200, 500)) : some 'x' not counted; maybe 'breaks' do not span range of 'x'
hist(x, breaks=c(0, 20, 40, 60, 80, 100, 200, 500, 2100)) ## which looks horrible, but works; up to you how to cut it
HTH, Ivan On 11/15/2010 15:53, Steve Sidney wrote: Dear list I am trying to re-scale a histogram using hist() but cannot seem to create a reduced scale where the upper values are not plotted. What I have is about 100 values, of which 80 or so are between 0 and 40, one or two in the hundreds, and an outlier around 2000. What I would like to do is create an x-scale that shows 5 bins between 0-100 and then 3-4 bins between 100 and 2000, but I don't need any resolution on the values above 100. If I use breaks = c(0, 20, 40, 60, 80, 100, 200, 500), R gives me an error saying that there are values not included, which of course I know, but I wish to ignore them. It seems that I am missing something quite simple. Any help would be appreciated. Regards Steve -- Ivan CALANDRA PhD Student University of Hamburg Biozentrum Grindel und Zoologisches Museum Abt. Säugetiere Martin-Luther-King-Platz 3 D-20146 Hamburg, GERMANY +49(0)40 42838 6231 ivan.calan...@uni-hamburg.de ** http://www.for771.uni-bonn.de http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
Re: [R] Null values in R
Are you talking about NULL or NA that may occur in some data? Can you give an example of what your concern is and what set of operations you want to do? If they are NAs, there are some standard ways that they can be handled. On Mon, Nov 15, 2010 at 9:55 AM, Raji raji.sanka...@gmail.com wrote: Hi R-helpers, can you please let me know the methods in which NULL values can be handled in R? Are there any generic commands/functions that can be given in a workspace, so that the NULL values occurring in that workspace (for any datasets that are loaded, any output that is calculated) are treated in the same way? Thanks in advance. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Re: [R] ... predict.coxph
From: thern...@mayo.edu To: james.whan...@gmail.com Date: Mon, 15 Nov 2010 08:43:04 -0600 CC: r-help@r-project.org Subject: Re: [R] ... predict.coxph If you are looking at radioactive decay maybe but how often do you actually see exponential KM curves in real life? Exponential curves are rare. But proportional hazards does not imply exponential. Well, I'm being a bit extreme, but of course the point would be that as you begin deviating from that, the fit could arguably become less connected to anything of causal relevance. You've probably seen exchanges I've had with people wondering why their linear regression doesn't work over arbitrary data blobs; how much more so here. A time constant that you can relate to tunnelling probability has some relevance. A trial design could in fact try to get all the control sample to event at the same time if enough was known about prognostic factors and natural trajectory You are a dreamer. We know very little about even the diseases we know best. The life insurers are in no danger yet from accurate predictions by the medical community. Obviously the question is a bit hypothetical, but thinking ahead I wonder what kind of curves you are likely to see and how they relate to each other. I just thought that if you defended the model, and I admit to only skimming the prior conversation, you may be willing to volunteer additional summary comments. I guess you could take something like a sigmoidal survival curve and various limits of that. Certainly this won't happen tomorrow, but presumably the goal here is to reduce random noise/variance and in any case use statistics to estimate parameters that can relate to something physically measurable (blood parameters or something). In any case, what kinds of things could a putative treatment do to a curve like this that would not be reasonably described by coxph? Is this not something that you ever expect to care about?
I guess my point is that the control and treatment curves need to have some relationship for PH parameters to fit anything relating to something causal. I'd also note that insurers typically need a prognosis when there is no detectable evidence of a disease, but trial criteria may have some range of disease-stage parameters that will allow enrollment. Picking out small deviations from normal to first identify the disease that will get you and then the time to death is of course much more difficult than plotting a trajectory when someone has clinical manifestations of a specific disease. Thanks. Terry T
Re: [R] Help with Hist
SteveSB wrote: What I have is about 100 of which 80 or so are between a value of 0 and 40, one or two in the hundreds and an outlier around 2000. Have a look at gap.barplot in package plotrix. Personally, I prefer to use xlim = c(0, 100) in this case and note the one outlier at 2000 in the legend. Dieter
Re: [R] Null values in R
Would ?is.null be what you are looking for? -- Jonathan P. Daily Technician - USGS Leetown Science Center 11649 Leetown Road Kearneysville WV, 25430 (304) 724-4480 Is the room still a room when its empty? Does the room, the thing itself have purpose? Or do we, what's the word... imbue it. - Jubal Early, Firefly From: Raji raji.sanka...@gmail.com To: r-help@r-project.org Date: 11/15/2010 09:58 AM Subject: [R] Null values in R Sent by: r-help-boun...@r-project.org Hi R-helpers, can you please let me know the methods in which NULL values can be handled in R? Are there any generic commands/functions that can be given in a workspace, so that the NULL values occurring in that workspace (for any datasets that are loaded, any output that is calculated) are treated in the same way? Thanks in advance.
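A short sketch of the distinction the repliers are driving at: NA marks a missing *value* inside data, while NULL is the absence of an object altogether, so the two are handled by different functions.

```r
# NA: element-wise missingness inside data
v <- c(1, NA, 3)
is.na(v)                 # FALSE TRUE FALSE
mean(v, na.rm = TRUE)    # 2 -- most summaries take an na.rm argument

# NULL: the absence of an object
is.null(NULL)            # TRUE
length(NULL)             # 0 -- NULL has no elements; NA is an element
lst <- list(a = 1, b = 2)
lst$b <- NULL            # assigning NULL *removes* a list component
names(lst)               # "a"
```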
[R] how to store package options over sessions?
I want to define some options for my package that the user may change. It would be convenient if the changes could be saved when terminating an R session and recovered automatically on the next package load. What is the standard way to implement this? TIA Mark
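There is no built-in mechanism for this, but one common approach is to write the options to a file in the user's home directory when the package unloads and read them back in .onLoad. The sketch below invents the package name, file location, and option names; treat it as an illustration under those assumptions, not a standard:

```r
# Hypothetical persistence of package options (names and paths invented).
.mypkg.optfile <- file.path(path.expand("~"), ".mypkg_options.rds")

.onLoad <- function(libname, pkgname) {
  defaults <- list(mypkg.verbose = FALSE, mypkg.digits = 4)
  saved <- if (file.exists(.mypkg.optfile)) readRDS(.mypkg.optfile) else list()
  # saved values override the defaults
  options(modifyList(defaults, saved))
}

.onUnload <- function(libpath) {
  current <- options()
  # keep only this package's options
  saveRDS(current[grep("^mypkg\\.", names(current))], .mypkg.optfile)
}
```

The same save/restore round trip works with any file path; using a package-specific prefix on the option names keeps the grep simple.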
Re: [R] How to Read a Large CSV into a Database with R
Anthony-107 wrote: Hi, I'm working in R 2.11.1 x64 on Windows x86_64-pc-mingw32. I'm trying to insert a very large CSV file into a SQLite database. Better to use an external utility if this is a one-time import for this job: http://sqlitebrowser.sourceforge.net/ Dieter -- View this message in context: http://r.789695.n4.nabble.com/How-to-Read-a-Large-CSV-into-a-Database-with-R-tp3043209p3043226.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Help with Hist
Steve Sidney sbsidney at mweb.co.za writes: I am trying to re-scale a histogram using hist() but cannot seem to create a reduced scale where the upper values are not plotted. What I have is about 100 values, of which 80 or so are between 0 and 40, one or two in the hundreds, and an outlier around 2000. If I use breaks = c(0, 20, 40, 60, 80, 100, 200, 500) R gives me an error saying that there are values not included, which of course I know, but I wish to ignore them. Subset your data first to remove the outlier? x.trimmed <- x[x < 2000] or x.trimmed <- subset(x, x < 2000)
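A runnable sketch of the subsetting approach; the data here are simulated to match the description in the question (about 100 values, most below 40, one outlier near 2000):

```r
set.seed(1)
x <- c(runif(80, 0, 40), runif(17, 40, 100), 150, 400, 2000)

x.trimmed <- x[x < 2000]   # drop the outlier before plotting
h <- hist(x.trimmed, breaks = c(0, 20, 40, 60, 80, 100, 200, 500),
          main = "Histogram (outlier near 2000 omitted)", xlab = "value")

sum(h$counts)              # every remaining value is counted
```

With unequal bin widths hist() plots densities by default, which is usually what you want here; the breaks no longer need to span the outlier because it has been removed.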
Re: [R] Help with Hist
Thanks. What you have suggested of course works, but I am trying to reduce the 'ugliness'. Anybody got any other ideas? Regards Steve Sent from my BlackBerry® wireless device -Original Message- From: Ivan Calandra ivan.calan...@uni-hamburg.de Sender: r-help-boun...@r-project.org Date: Mon, 15 Nov 2010 16:08:47 To: r-help@r-project.org Reply-To: ivan.calan...@uni-hamburg.de Subject: Re: [R] Help with Hist Hi, I think you should also give the upper extreme:

x <- c(rnorm(80)+10, 101:110, 2001:2010)
hist(x, breaks=c(0, 20, 40, 60, 80, 100, 200, 500))
Error in hist.default(x, breaks = c(0, 20, 40, 60, 80, 100, 200, 500)) :
  some 'x' not counted; maybe 'breaks' do not span range of 'x'
hist(x, breaks=c(0, 20, 40, 60, 80, 100, 200, 500, 2100)) ## which looks horrible, but works; up to you how to cut it

HTH, Ivan On 11/15/2010 15:53, Steve Sidney wrote: Dear list I am trying to re-scale a histogram using hist() but cannot seem to create a reduced scale where the upper values are not plotted. What I have is about 100 values, of which 80 or so are between 0 and 40, one or two in the hundreds, and an outlier around 2000. What I would like to do is create an x-scale that shows 5 bins between 0-100 and then 3/4 bins between 100 and 2000, but I don't need any resolution on the values above 100. If I use breaks = c(0, 20, 40, 60, 80, 100, 200, 500) R gives me an error saying that there are values not included, which of course I know, but I wish to ignore them. It seems that I am missing something quite simple. Any help would be appreciated. Regards Steve -- Ivan CALANDRA PhD Student University of Hamburg Biozentrum Grindel und Zoologisches Museum Abt.
Säugetiere Martin-Luther-King-Platz 3 D-20146 Hamburg, GERMANY +49(0)40 42838 6231 ivan.calan...@uni-hamburg.de ** http://www.for771.uni-bonn.de http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
Re: [R] About upgrade R
Tal My main use of R now is on Windows 7. As explained, I always retain at least one previous version on Windows 7 PCs. My upgrade is done as follows - 1) Download the binary installer for R and install it. 2) Rename the library directory (default - C:\Program Files\R\R-2.12.0patched\library) to library2. (Windows 7 will ask for confirmation.) 3) Copy the library directory from the previous version of R to the new R directory (default - C:\Program Files\R\R-2.12.0patched\). (Windows 7 will ask for confirmation.) 4) Copy the contents of the library2 directory to the library directory in the new R directory. 5) Right click on the R program and select "Run as administrator" to start R as administrator. 6) In R run some variant of update.packages(checkBuilt=TRUE). On occasion one will find that packages are reporting errors in the CRAN compile and binary versions are not yet available for the new version of R. I delete these from the library directory and look in CRAN for possible explanations. Anyway, I can revert to the old version if I need these packages. Generally one may find that the missing packages are available shortly afterwards. This procedure is fairly close to that recommended in the R FAQ and meets my needs. I think that it is necessary to keep libraries for different versions separate. Other users may have different requirements, and other update methods may be more appropriate for them. There may be no method that is best for all users. I would imagine that one could write a DOS or PowerShell or Python (or other) script that would automate this process. Best Regards John On 14 November 2010 20:24, Tal Galili tal.gal...@gmail.com wrote: Hi John, thank you for that input.
It could be that the code I wrote here: http://www.r-statistics.com/2010/04/changing-your-r-upgrading-strategy-and-the-r-code-to-do-it-on-windows/ should be updated so that every time you install a new R version, you run the code for it to: 1) copy all packages from the old R version's library to the new R version's library 2) update all the packages. But I have no clue how to do step 1. How do you find out the latest R version that was installed previous to the current one? And then, how would you find where its package library is? If you could do this in R, then installing a new version of R could be made simpler for you. Cheers, Tal Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- On Sun, Nov 14, 2010 at 10:05 PM, John C Frain fra...@gmail.com wrote: The current method allows one to easily retain several versions working in parallel. This is particularly important if some package is not available in the new version. A few years ago there were problems such as these during a major overhaul of the rmetrics group of packages. My current practice is to retain older versions until I am sure that all I need is available in the new version. Thus I am in favour of retaining the current system. John On Sunday, November 14, 2010, Uwe Ligges lig...@statistik.tu-dortmund.de wrote: On 14.11.2010 17:59, Ajay Ohri wrote: wont it make more common sense to make updating packages also a part of every base version install BY default.. just saying At least I do not like the idea: If I just want to try a beta version, I do not want everything to be updated such that I can't switch back to my last stable version. Uwe Ligges Websites- http://decisionstats.com http://dudeofdata.com Linkedin- www.linkedin.com/in/ajayohri 2010/11/14 Uwe Ligges lig...@statistik.tu-dortmund.de: Upgrading is mentioned in the FAQs / R for Windows FAQs.
If you have your additionally installed packages in a separate library (not the R base library) you can simply run update.packages(checkBuilt=TRUE) If not ... Uwe Ligges On 14.11.2010 15:51, Stephen Liu wrote: Hi all, Win 7 64-bit R version 2.11.1 (2010-05-31) I want to upgrade R to version 2.12.0 R-2.12.0 for Windows (32/64 bit) http://cran.r-project.org/bin/windows/base/ I found steps on the following site: How to upgrade R on windows – another strategy (and the R code to do it) http://www.r-statistics.com/2010/04/changing-your-r-upgrading-strategy-and-the-r-code-to-do-it-on-windows/ I wonder is there a straightforward way to upgrade the packages directly from the repo? TIA B.R. Stephen L
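Tal's "step 1" can be sketched in R itself. The path below assumes the default Windows layout and an invented old version number; adjust both to your installation:

```r
# Hypothetical sketch: reinstall into the new R the packages that were
# present in the old R's library (old.lib is an assumed path).
old.lib <- "C:/Program Files/R/R-2.11.1/library"

old.pkgs <- dir(old.lib)                       # package names = directory names
new.pkgs <- rownames(installed.packages())     # what the new R already has
missing  <- setdiff(old.pkgs, new.pkgs)

if (length(missing) > 0) install.packages(missing)
# followed by: update.packages(checkBuilt = TRUE, ask = FALSE)
```

Reinstalling (rather than copying the directories) avoids mixing binaries built for different R versions, which is the problem checkBuilt=TRUE is there to catch.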
[R] repository of earlier Windows versions of R packages
Dear Helpers, I was trying to find a repository of earlier Windows versions of R packages. However, while I can find the Archives for Linux versions (in the Old Sources section of each package's Downloads), I cannot find one for Windows versions. Does such a repository exist? If so, where can I find it? Thanks, Jonathan Williams
Re: [R] Kalman Filter
Hello, thanks for answering my question. I prefer to use KalmanLike(y, mod, nit = 0, fast=TRUE). For parameter estimation I have a given time series. In it there are several components: season and noise; furthermore there is a mean reversion process. The season is modelled as a Fourier polynomial. From the given time series I have to estimate the - season parameters - the mean reversion factor - the variance of the noise. I think in the function KalmanLike y is the vector of the time series; what does mod mean? How can I write the syntax for the state space? Does anybody have a simple example for better understanding KalmanLike? Or is it better to use other packages for parameter estimation? I have no experience working with Kalman filters and I'm a new R user. Thanks for helping. Best, Thomas
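Regarding what mod means: KalmanLike expects a state-space model in the list form that stats::makeARIMA produces. A minimal sketch with a simulated series; the AR(1) coefficient here is invented for illustration, not part of the original question:

```r
set.seed(1)
y <- arima.sim(model = list(ar = 0.7), n = 200)  # simulated AR(1) series

# Build the state-space representation of an ARMA model; 'mod' is the
# list that KalmanLike expects.
mod <- makeARIMA(phi = 0.7, theta = numeric(0), Delta = numeric(0))
res <- KalmanLike(y, mod)
res$Lik   # log-likelihood (up to a constant), as used internally by arima()
```

For a custom model with seasonal Fourier terms and mean reversion, building this list by hand is awkward, which is why dedicated state-space packages are usually easier (see the reply below recommending dlm).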
Re: [R] Problem in installing and starting Rattle
I also have the problem trying to start rattle. Windows 7 32-bit, R 2.12.0. When I try library(rattle) I get an error message: The procedure entry point deflateSetHeader could not be located in the dynamic link library zlib1.dll I hit OK and it prompts me to install GTK+ again. I tried to uninstall GTK+ first and delete all related files as Dieter suggested, but it still won't work :( -- View this message in context: http://r.789695.n4.nabble.com/Problem-in-installing-and-starting-Rattle-tp3042502p3043262.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] How to move an internal function to external keeping same environment?
On 15. 11. 10 14:14, Duncan Murdoch wrote: On 15/11/2010 7:48 AM, Matthieu Stigler wrote: Hi I have, within a quite big function foo1, an internal function foo2. Now, in order to have cleaner code, I wish to have the internal foo2 as external. This foo2 was using arguments within the foo1 environment that were not declared as inputs of foo2, which works as long as foo2 is within foo1, but not anymore if foo2 is external, as is the case now. Now, I could add all those arguments as inputs to foo2, but I feel if foo2 is called often, I would be copying those objects more than required. Am I wrong? I then used this to avoid declaring each argument to foo2 explicitly:

foo1 <- function(x){
  b <- x[1] + 2
  environment(foo2) <- new.env(parent = as.environment(-1))
  c <- foo2(x)
  return(c)
}
foo2 <- function(x) x*b
# try:
foo1(1:100)

This works. But I wanted to be sure: - am I right that if I instead declare each element to be passed to foo2, this would be more copying than required? (imagine b in my case a heavy dataset, foo2 a long computation) - is this line environment(foo2) <- new.env(parent = as.environment(-1)) a good way to do it, or can it have unwanted implications? I don't think modifying the environment of a closure could be seen as a way to get cleaner code. I'd just leave foo2 within foo1. There are several unwanted implications of the way you do it: - Setting the environment of foo2 to the foo1 evaluation frame means those local variables won't be garbage collected. If foo1 creates a large temporary matrix, it will take up space in memory until you modify foo2 again. - If we say that foo2 has global scope (it might be limited to your package namespace, but let's call that global), then your foo1 has global side effects, and those are often a bad idea. For example, suppose next week you define foo3 that also uses and modifies foo2. It's very easy for foo1 and foo3 to clash with conflicting use of the global foo2. Don't use globals unless you really have to.
Duncan Murdoch Dear Gabor, dear Duncan Thanks a lot, both of you, for your fast and interesting answers! I have limited understanding of Duncan's points but will follow your advice not to do it like this. If I am nevertheless quite keen to use foo2 externally, is the use of either assign() in foo1, or mget() in foo2, more appropriate? Or are the same kinds of remarks raised against environment() also relevant for assign() and mget()? Thanks a lot!! Matthieu
Re: [R] How to Read a Large CSV into a Database with R
On Mon, Nov 15, 2010 at 10:07 AM, Anthony Damico ajdam...@gmail.com wrote: Hi, I'm working in R 2.11.1 x64 on Windows x86_64-pc-mingw32. I'm trying to insert a very large CSV file into a SQLite database. I'm pretty new to working with databases in R, so I apologize if I'm overlooking something obvious here. I'm trying to work with the American Community Survey data, which is two 1.3GB csv files. I have enough RAM to read one of them into memory, but not both at the same time. So, in order to analyze them, I'm trying to get them into a SQLite database so I can use the R survey package's database-backed survey objects capabilities ( http://faculty.washington.edu/tlumley/survey/svy-dbi.html). I need to combine both of these CSV files into one table (within a database), so I think that I'd need a SQL manipulation technique that reads everything line by line, instead of pulling it all into memory. I've tried using read.csv.sql, but it finishes without an error and then only shows me the table structure when I run the final select statement. When I run these exact same commands on a smaller CSV file, they work fine. I imagine this is not working because the csv is so large, but I'm not sure how to confirm that or what to change if it is. I do want to get all columns from the CSV into the data table, so I don't want to filter anything. 
library(sqldf)
setwd("R:\\American Community Survey\\Data\\2009")
sqldf("attach 'sqlite' as new")
read.csv.sql("ss09pusa.csv", sql = "create table ss09pusa as select * from file", dbname = "sqlite")
sqldf("select * from ss09pusa limit 3", dbname = "sqlite")

What the above code does, which is unlikely to be what you intended, is to create an sqlite database called 'sqlite', read the indicated file into sqlite, read it into R from sqlite (clearly this step will fail if the data is too big for R, but if it's not then you are OK), and then delete the table from the database. So your sqldf statement should give an error since there is no such table; or else, if you have a data frame in your R workspace called ss09pusa, the sqldf statement will load that into a database table, retrieve its first three rows, and then delete the table. This sort of task is probably more suitable for RSQLite than sqldf, but if you wish to do it with sqldf you need to follow example 9 or example 10 on the sqldf home page. In example 9, http://code.google.com/p/sqldf/#Example_9.__Working_with_Databases it is very important to note that sqldf automatically deletes any table that it created once the sqldf or read.csv.sql statement is done, so the way to keep the table is to issue an SQL statement that creates the table yourself ("create table mytab as select ...") rather than letting sqldf create it. In example 10, http://code.google.com/p/sqldf/#Example_10._Persistent_Connections persistent connections are illustrated, which represent an alternate way to do this in sqldf. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
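As a sketch of the RSQLite route Gabor mentions: append the CSV to a database table in chunks, so neither file has to fit in memory. The CSV below is a small stand-in generated on the fly; the table name echoes the one in the question, and the chunk size is arbitrary:

```r
library(RSQLite)

# Small stand-in for the 1.3 GB file (illustration only).
csvfile <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:250, x = rnorm(250)), csvfile, row.names = FALSE)

dbfile <- tempfile(fileext = ".sqlite")
con <- dbConnect(SQLite(), dbname = dbfile)

infile <- file(csvfile, open = "r")
hdr <- gsub('"', "", strsplit(readLines(infile, n = 1), ",")[[1]])  # column names
repeat {
  chunk <- tryCatch(
    read.csv(infile, header = FALSE, col.names = hdr, nrows = 100),
    error = function(e) NULL)                # EOF raises an error; stop there
  if (is.null(chunk) || nrow(chunk) == 0) break
  dbWriteTable(con, "ss09pusa", chunk, append = TRUE)
}
close(infile)

dbGetQuery(con, "select count(*) from ss09pusa")  # all 250 rows landed
dbDisconnect(con)
```

Reading from an open connection is what makes read.csv continue where the previous chunk stopped; to combine both ACS files into one table, run the loop once per file against the same table with append = TRUE.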
[R] merge two dataset and replace missing by 0
Hi R users, I have two data sets (X1, X2). For example,

time1 <- c(0, 8, 15, 22, 43, 64, 85, 106, 127, 148, 169, 190, 211)
outpue1 <- c(171, 164, 150, 141, 109, 73, 47, 26, 15, 12, 6, 2, 1)
X1 <- cbind(time1, outpue1)
time2 <- c(0, 8, 15, 22, 43, 64, 85, 106, 148)
output2 <- c(5, 5, 4, 5, 5, 4, 1, 2, 1)
X2 <- cbind(time2, output2)

I want to merge X1 and X2 into a big dataset X by time1 and time2 so that the missing items in output2 will be replaced by 0. For example, there is no output2 when time2=127, so the corresponding output will be 0. Does anyone know how to use the merge command to deal with this? Thanks, Kate
Re: [R] Fetching data
IMO it is not possible. The code behind the aspx page queries data from a database server and displays it on the webpage. Maithula Chandrashekhar wrote: Dear all R users, I am wondering whether any procedure exists in R to fetch data directly from http://www.ncdex.com/Market_Data/Spot_price.aspx and save it in some time series object, without filling any field on that website. Can somebody point out whether it is possible? Thanks and regards, -- View this message in context: http://r.789695.n4.nabble.com/Fetching-data-tp3041662p3043275.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Kalman Filter
Try the most excellent package dlm written by Giovanni Petris for all your Kalman filter needs. Also buy the accompanying book - it really integrates the dlm package with the theory behind it. Best, John On Mon, Nov 15, 2010 at 8:39 AM, Garten Stuhl gartenstu...@googlemail.com wrote: Hello, thanks for answering my question. I prefer to use KalmanLike(y, mod, nit = 0, fast=TRUE). For parameter estimation I have a given time series. In it there are several components: season and noise; furthermore there is a mean reversion process. The season is modelled as a Fourier polynomial. From the given time series I have to estimate the - season parameters - the mean reversion factor - the variance of the noise. I think in the function KalmanLike y is the vector of the time series; what does mod mean? How can I write the syntax for the state space? Does anybody have a simple example for better understanding KalmanLike? Or is it better to use other packages for parameter estimation? I have no experience working with Kalman filters and I'm a new R user. Thanks for helping. Best, Thomas
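A minimal sketch with the dlm package (assumed installed): fit a local-level model to a simulated series by maximum likelihood, then run the Kalman filter. The series and the log-variance parametrisation are illustrative choices, not from the original question:

```r
library(dlm)

set.seed(42)
y <- cumsum(rnorm(100)) + rnorm(100, sd = 0.5)   # simulated random walk + noise

# Local-level model; par holds log-variances so the optimizer is unconstrained.
build <- function(par) dlmModPoly(order = 1, dV = exp(par[1]), dW = exp(par[2]))

fit  <- dlmMLE(y, parm = c(0, 0), build = build)  # estimate dV and dW
mod  <- build(fit$par)
filt <- dlmFilter(y, mod)

head(dropFirst(filt$m))   # filtered state estimates (dropFirst removes the prior)
```

Seasonal Fourier terms can be added to the model with dlmModTrig and combined with the level component via "+", which maps more directly onto Thomas's season-plus-mean-reversion setup than hand-building the list that KalmanLike expects.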
Re: [R] Problem in installing and starting Rattle
OK, to follow up my post, I finally got rattle and RGtk2 to work. The trick: when R prompts me to install GTK+ I still hit yes, but after the download, once the installation process starts, I close the R GUI window. After the GTK+ installation is complete I start R again, and it works. -- View this message in context: http://r.789695.n4.nabble.com/Problem-in-installing-and-starting-Rattle-tp3042502p3043318.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] repository of earlier Windows versions of R packages
On 15.11.2010 15:52, Jonathan Williams wrote: Dear Helpers, I was trying to find a repository of earlier Windows versions of R packages. However, while I can find the Archives for Linux versions (in the Old Sources section of each package's Downloads), I cannot find one for Windows versions. Does such a repository exist? If so, where can I find it? Thanks, Jonathan Williams There is no such archive. If you need a specific version of a package for a specific version of R, you need to recompile it yourself from sources. The only thing we have is the last version of a package that was built for/with a specific version of R; see your-CRAN-mirror/bin/windows/contrib/ Best, Uwe Ligges
Re: [R] merge two dataset and replace missing by 0
See ?merge with argument all=TRUE and replace by 0 afterwards. Uwe Ligges On 15.11.2010 16:42, Kate Hsu wrote: Hi R users, I have two data sets (X1, X2). For example, time1 <- c(0, 8, 15, 22, 43, 64, 85, 106, 127, 148, 169, 190, 211) outpue1 <- c(171, 164, 150, 141, 109, 73, 47, 26, 15, 12, 6, 2, 1) X1 <- cbind(time1, outpue1) time2 <- c(0, 8, 15, 22, 43, 64, 85, 106, 148) output2 <- c(5, 5, 4, 5, 5, 4, 1, 2, 1) X2 <- cbind(time2, output2) I want to merge X1 and X2 into a big dataset X by time1 and time2 so that the missing items in output2 will be replaced by 0. For example, there is no output2 when time2=127, so the corresponding output will be 0. Does anyone know how to use the merge command to deal with this? Thanks, Kate
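Putting Uwe's suggestion together (merge with all = TRUE, then replace the NAs by 0), using the data from the question; data frames are used so merge can match on the column names:

```r
time1 <- c(0, 8, 15, 22, 43, 64, 85, 106, 127, 148, 169, 190, 211)
outpue1 <- c(171, 164, 150, 141, 109, 73, 47, 26, 15, 12, 6, 2, 1)
X1 <- data.frame(time1, outpue1)
time2 <- c(0, 8, 15, 22, 43, 64, 85, 106, 148)
output2 <- c(5, 5, 4, 5, 5, 4, 1, 2, 1)
X2 <- data.frame(time2, output2)

X <- merge(X1, X2, by.x = "time1", by.y = "time2", all = TRUE)
X$output2[is.na(X$output2)] <- 0   # e.g. time1 == 127 now gets output2 == 0
X
```

all = TRUE keeps every time point from both sets; without the second line the unmatched rows carry NA rather than 0.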
Re: [R] merge two dataset and replace missing by 0
On Nov 15, 2010, at 10:42 AM, Kate Hsu wrote: Hi R users, I have two data sets (X1, X2). For example, time1 <- c(0, 8, 15, 22, 43, 64, 85, 106, 127, 148, 169, 190, 211) outpue1 <- c(171, 164, 150, 141, 109, 73, 47, 26, 15, 12, 6, 2, 1) X1 <- cbind(time1, outpue1) time2 <- c(0, 8, 15, 22, 43, 64, 85, 106, 148) output2 <- c(5, 5, 4, 5, 5, 4, 1, 2, 1) X2 <- cbind(time2, output2) I want to merge X1 and X2 into a big dataset X by time1 and time2 so that the missing item in output2 will be replace by 0. For example, there is no output2 when time2=127, then the corresponding output will be 0. Anyone know how to use merge command to deal with this?

merge(X1, X2, by.x="time1", by.y="time2", all=TRUE)
   time1 outpue1 output2
1      0     171       5
2      8     164       5
3     15     150       4
4     22     141       5
5     43     109       5
6     64      73       4
7     85      47       1
8    106      26       2
9    127      15      NA
10   148      12       1
11   169       6      NA
12   190       2      NA
13   211       1      NA

Thanks, Kate David Winsemius, MD West Hartford, CT
Re: [R] How to move an internal function to external keeping same environment?
On Mon, Nov 15, 2010 at 10:49 AM, Matthieu Stigler matthieu.stig...@gmail.com wrote: I have limited understanding of Duncan's points but will follow your advice not to do it like this. If I am nevertheless quite keen to use foo2 externally, is the use of either assign() in foo1, or mget() in foo2, more appropriate? Or are the same kinds of remarks raised against environment() also relevant for assign() and mget()? Another way to approach this, which is safer, is to put it into an OO framework, creating a proto object (package proto) that contains the shared data, and make foo and foo2 its methods.

library(proto)  # see http://r-proto.googlecode.com

# create proto object
p <- proto()

# add foo method which stores mydata in receiver object and runs foo2
p$foo <- function(.) { .$mydata <- 1:3; .$foo2() }

# add foo2 method which grabs mydata from receiver and calculates result
p$foo2 <- function(.) sum(.$mydata)

# run foo method - result is 6
p$foo()

-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
[R] R crash
We distribute several R applications using the tcltk package on different servers or PCs (Windows XP). On some machines, and in a non-reproducible way, all the R windows disappear when using functions like tkgetSaveFile or tkchooseDirectory. The R application remains open (the Rgui.exe process is still running)! Despite several tests, we cannot identify the origin of this problem, which seems independent of the R version. Have you any ideas? Are there any compatibility issues between the tcltk package and other applications? Jérémy
Re: [R] merge two dataset and replace missing by 0
On Mon, Nov 15, 2010 at 10:42 AM, Kate Hsu yhsu.rh...@gmail.com wrote: Hi R users, I have two data sets (X1, X2). For example, time1 <- c(0, 8, 15, 22, 43, 64, 85, 106, 127, 148, 169, 190, 211) outpue1 <- c(171, 164, 150, 141, 109, 73, 47, 26, 15, 12, 6, 2, 1) X1 <- cbind(time1, outpue1) time2 <- c(0, 8, 15, 22, 43, 64, 85, 106, 148) output2 <- c(5, 5, 4, 5, 5, 4, 1, 2, 1) X2 <- cbind(time2, output2) I want to merge X1 and X2 into a big dataset X by time1 and time2 so that the missing item in output2 will be replace by 0. For example, there is no output2 when time2=127, then the corresponding output will be 0. Anyone know how to use merge command to deal with this? Since these are time series you might want to use a time series package to do this:

library(zoo)
time1 <- c(0, 8, 15, 22, 43, 64, 85, 106, 127, 148, 169, 190, 211)
output1 <- c(171, 164, 150, 141, 109, 73, 47, 26, 15, 12, 6, 2, 1)
time2 <- c(0, 8, 15, 22, 43, 64, 85, 106, 148)
output2 <- c(5, 5, 4, 5, 5, 4, 1, 2, 1)
z1 <- zoo(output1, time1)
z2 <- zoo(output2, time2)
merge(z1, z2, fill = 0)

-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Re: [R] merge two dataset and replace missing by 0
Thanks for all of your help. It works for me. Kate On Mon, Nov 15, 2010 at 10:06 AM, David Winsemius dwinsem...@comcast.net wrote: On Nov 15, 2010, at 10:42 AM, Kate Hsu wrote: Hi R users, I have two data sets (X1, X2). For example, time1 <- c(0, 8, 15, 22, 43, 64, 85, 106, 127, 148, 169, 190, 211) outpue1 <- c(171, 164, 150, 141, 109, 73, 47, 26, 15, 12, 6, 2, 1) X1 <- cbind(time1, outpue1) time2 <- c(0, 8, 15, 22, 43, 64, 85, 106, 148) output2 <- c(5, 5, 4, 5, 5, 4, 1, 2, 1) X2 <- cbind(time2, output2) I want to merge X1 and X2 into a big dataset X by time1 and time2 so that the missing item in output2 will be replace by 0. For example, there is no output2 when time2=127, then the corresponding output will be 0. Anyone know how to use merge command to deal with this?

merge(X1, X2, by.x="time1", by.y="time2", all=TRUE)
   time1 outpue1 output2
1      0     171       5
2      8     164       5
3     15     150       4
4     22     141       5
5     43     109       5
6     64      73       4
7     85      47       1
8    106      26       2
9    127      15      NA
10   148      12       1
11   169       6      NA
12   190       2      NA
13   211       1      NA

Thanks, Kate David Winsemius, MD West Hartford, CT
Re: [R] Fetching data
On Mon, Nov 15, 2010 at 3:46 PM, Feng Mai maif...@gmail.com wrote: IMO it is not possible. The code behind the aspx page queries data from a database server and displays it on the webpage. That doesn't make it impossible. Your web browser is sending a request to the web server, and whatever happens behind the scenes at that end, what comes back is a table in a web page which can be scraped. The tricky thing is constructing exactly the right request to get the data you want. It can sometimes be as simple as constructing a URL, something like 'http://example.com/price/gold/2010/11/12', or it could need parameters: 'http://example.com/price?commodity=gold&year=2010&month=11'. Or it can be like this - some active server pages that pass a massive VIEWSTATE object back and forth on each request. It's not impossible to script it, but you just have a lot more mess to deal with. I have done this kind of thing for some stupid web sites in the past. These days people enjoy building sensible APIs to their web data. Oh, and it might be violating the site's terms and conditions, of course. Barry
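Barry's "construct the right request" step can be sketched in base R. The endpoint and parameter names below are made up for illustration; the real NCDEX page needs an ASP.NET POST with VIEWSTATE fields, which is the messier case he describes:

```r
# Build a parameterised request URL (host and parameter names invented).
params <- list(commodity = "gold", year = 2010, month = 11)
qs <- paste(names(params), unlist(params), sep = "=", collapse = "&")
u <- paste0("http://example.com/price/Spot_price.aspx?", qs)
u

# The page could then be fetched with readLines(url(u)) and the HTML
# table parsed out of the result (not run here: the URL is fictional).
```

Real parameter values would also need URL-encoding (see ?URLencode) before being pasted in.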
Re: [R] Help with Hist
Well, another possibility would be to edit the plot so that you cut out the empty part (between 300 and 2000). There might be some function that can do it, maybe the plotrix::gap.barplot() that Dieter already told you about.

On 11/15/2010 16:22, sbsid...@mweb.co.za wrote: Thanks. What you have suggested of course works, but I am trying to reduce the 'ugliness'. Anybody got any other ideas? Regards Steve Sent from my BlackBerry® wireless device

-----Original Message-----
From: Ivan Calandra ivan.calan...@uni-hamburg.de
Sender: r-help-boun...@r-project.org
Date: Mon, 15 Nov 2010 16:08:47
To: r-help@r-project.org
Reply-To: ivan.calan...@uni-hamburg.de
Subject: Re: [R] Help with Hist

Hi, I think you should also give the upper extreme:

x <- c(rnorm(80) + 10, 101:110, 2001:2010)
hist(x, breaks = c(0, 20, 40, 60, 80, 100, 200, 500))
Error in hist.default(x, breaks = c(0, 20, 40, 60, 80, 100, 200, 500)) :
  some 'x' not counted; maybe 'breaks' do not span range of 'x'
hist(x, breaks = c(0, 20, 40, 60, 80, 100, 200, 500, 2100))  ## which looks horrible, but works; up to you how to cut it

HTH, Ivan

On 11/15/2010 15:53, Steve Sidney wrote: Dear list, I am trying to re-scale a histogram using hist(), but cannot seem to create a reduced scale where the upper values are not plotted. What I have is about 100 values, of which 80 or so are between 0 and 40, one or two in the hundreds, and an outlier around 2000. What I would like to do is create an x-scale that shows 5 bins between 0-100 and then 3-4 bins between 100 and 2000, but I don't need any resolution on the values above 100. If I use breaks = c(0, 20, 40, 60, 80, 100, 200, 500), R gives me an error saying that there are values not included, which of course I know, but I wish to ignore them. It seems that I am missing something quite simple. Any help would be appreciated.
Regards Steve

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
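One more option for Steve's compact x-axis, not mentioned in the thread: cap everything above 100 into a single overflow bin before calling hist(). A minimal sketch (the cap value 150 is an arbitrary choice, picked only so the capped points land in one final catch-all bin):

```r
set.seed(1)
x <- c(rnorm(80) + 10, 101:110, 2001:2010)   # Ivan's example data

# Collapse all values above 100 into a single ">100" bin by capping them
x_capped <- pmin(x, 150)
h <- hist(x_capped, breaks = c(0, 20, 40, 60, 80, 100, 200),
          plot = FALSE)   # plot = TRUE would draw it; FALSE just returns counts
h$counts                  # the last bin holds everything that was > 100
```

The axis then stops at 200 instead of stretching to 2100; the trade-off is that the last bin's label no longer reflects the true magnitudes, so it should be relabelled (e.g. ">100") with axis().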
Re: [R] How to Read a Large CSV into a Database with R
Hi Gabor,

Thank you for the prompt reply. I definitely looked over all of the examples on the code.google.com sqldf page before sending, which is why I wrote the code

read.csv.sql("ss09pusa.csv", sql = "create table ss09pusa as select * from file", dbname = "sqlite")

directly pulled from their code --

read.csv.sql("~/tmp.csv", sql = "create table mytab as select * from file", dbname = "mydb")

..but I don't understand why this helps me around the memory problem, since I think it still all gets read into memory. Is there a way to do this line by line? I would prefer to use RSQLite directly rather than sqldf, but I could not get the IMPORT command (or .IMPORT) functioning at all. I tried these with both dbGetQuery and dbSendQuery.

library(RSQLite)
setwd("R:\\American Community Survey\\Data\\2009")
out_db <- dbConnect(SQLite(), dbname = "sqlite.db")
dbGetQuery(out_db, "create table test (hello integer, world text)")
dbGetQuery(out_db, "mode csv")
dbGetQuery(out_db, "import test.csv test")

When I hit the mode and import commands, it gives me an error that makes me think it's handling these files in a completely different way.

dbGetQuery(out_db, "mode csv")
Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: near "mode": syntax error)

I suppose I could just run sqlite3 commands from the system() function, but I was hoping there might be a way to accomplish this task entirely within R? Thanks again!

On Mon, Nov 15, 2010 at 10:41 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

On Mon, Nov 15, 2010 at 10:07 AM, Anthony Damico ajdam...@gmail.com wrote:

Hi, I'm working in R 2.11.1 x64 on Windows x86_64-pc-mingw32. I'm trying to insert a very large CSV file into a SQLite database. I'm pretty new to working with databases in R, so I apologize if I'm overlooking something obvious here. I'm trying to work with the American Community Survey data, which comes as two 1.3GB csv files. I have enough RAM to read one of them into memory, but not both at the same time.
So, in order to analyze them, I'm trying to get them into a SQLite database so I can use the R survey package's database-backed survey objects capabilities (http://faculty.washington.edu/tlumley/survey/svy-dbi.html). I need to combine both of these CSV files into one table (within a database), so I think that I'd need a SQL manipulation technique that reads everything line by line, instead of pulling it all into memory. I've tried using read.csv.sql, but it finishes without an error and then only shows me the table structure when I run the final select statement. When I run these exact same commands on a smaller CSV file, they work fine. I imagine this is not working because the csv is so large, but I'm not sure how to confirm that or what to change if it is. I do want to get all columns from the CSV into the data table, so I don't want to filter anything.

library(sqldf)
setwd("R:\\American Community Survey\\Data\\2009")
sqldf("attach 'sqlite' as new")
read.csv.sql("ss09pusa.csv", sql = "create table ss09pusa as select * from file", dbname = "sqlite")
sqldf("select * from ss09pusa limit 3", dbname = "sqlite")

What the above code does, which is unlikely to be what you intended, is to create an sqlite database called 'sqlite', read the indicated file into that database, read it back into R from sqlite (clearly this step will fail if the data is too big for R, but if it is not then you are OK), and then delete the table from the database -- so your sqldf statement should give an error, since there is no such table. Or else, if you have a data frame in your R workspace called ss09pusa, the sqldf statement will load that into a database table, retrieve its first three rows, and then delete the table.
This sort of task is probably more suitable for RSQLite than sqldf, but if you wish to do it with sqldf you need to follow example 9 or example 10 on the sqldf home page. In example 9, http://code.google.com/p/sqldf/#Example_9.__Working_with_Databases, it's very important to note that sqldf automatically deletes any table it created once the sqldf or read.csv.sql statement is done, so the way to keep the table from being dropped is to make sure you issue an SQL statement that creates the table (create table mytab as select ...) rather than letting sqldf create it. In example 10, http://code.google.com/p/sqldf/#Example_10._Persistent_Connections, persistent connections are illustrated, which represent an alternate way to do this in sqldf.

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] jackknife-after-bootstrap
I keep trying, but I could not do it. Is there any example of this?

-- View this message in context: http://r.789695.n4.nabble.com/Re-jackknife-after-bootstrap-tp3043213p3043398.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Integrate to 1? (gauss.quad)
Thank you, Doug. I am still missing something here. Should this simply be sum(f(x_i) * w_i), where x_i is node i and w_i is the weight at node i? So my function

f(x) = (1/(s*sqrt(2*pi))) * exp(-((qq$nodes-mu)^2/(2*s^2)))

would then be multiplied only by the weights, qq$weights. Here is some reproducible code:

library(statmod)
Q <- 5
mu <- 0; s <- 1
qq <- gauss.quad(Q, kind = 'hermite')
sum((1/(s*sqrt(2*pi))) * exp(-((qq$nodes-mu)^2/(2*s^2))) * qq$weights)
[1] 0.5775682
sum(qq$weights)
[1] 1.772454
(1/(s*sqrt(2*pi))) * exp(-((qq$nodes-mu)^2/(2*s^2)))
[1] 0.05184442 0.25198918 0.39894228 0.25198918 0.05184442
dnorm(qq$nodes)
[1] 0.05184442 0.25198918 0.39894228 0.25198918 0.05184442

-----Original Message-----
From: dmba...@gmail.com [mailto:dmba...@gmail.com] On Behalf Of Douglas Bates
Sent: Sunday, November 14, 2010 2:28 PM
To: Doran, Harold
Cc: r-help@r-project.org
Subject: Re: [R] Integrate to 1? (gauss.quad)

I don't know about the statmod package and the gauss.quad function, but generally the definition of Gauss-Hermite quadrature is with respect to the function that is multiplied by exp(-x^2) in the integrand. So your example would reduce to summing the weights.

On Sun, Nov 14, 2010 at 11:18 AM, Doran, Harold hdo...@air.org wrote: Does anyone see why my code does not integrate to 1?

library(statmod)
mu <- 0
s <- 1
Q <- 5
qq <- gauss.quad(Q, kind = 'hermite')
sum((1/(s*sqrt(2*pi))) * exp(-((qq$nodes-mu)^2/(2*s^2))) * qq$weights)

### This does what it is supposed to
myNorm <- function(theta) (1/(s*sqrt(2*pi))) * exp(-((theta-mu)^2/(2*s^2)))
integrate(myNorm, -Inf, Inf)
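Bates's point can be made explicit with a change of variables; the following derivation (an addition, not from the original thread) shows why the correct Gauss-Hermite answer is $\frac{1}{\sqrt{\pi}}\sum_i w_i = 1$. Substituting $t = (x-\mu)/(\sqrt{2}\,s)$, $dx = \sqrt{2}\,s\,dt$:

```latex
\int_{-\infty}^{\infty} \frac{1}{s\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2s^2)}\,dx
  \;=\;
\frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-t^2}\,dt
  \;\approx\;
\frac{1}{\sqrt{\pi}} \sum_{i=1}^{Q} w_i \cdot 1
  \;=\;
\frac{\sqrt{\pi}}{\sqrt{\pi}} \;=\; 1,
```

since Gauss-Hermite quadrature approximates $\int g(t)\,e^{-t^2}\,dt \approx \sum_i w_i\,g(t_i)$ and here $g \equiv 1$. This matches the transcript: sum(qq$weights) prints 1.772454, which is $\sqrt{\pi}$. Harold's sum instead multiplied the weights by the density at the raw nodes, i.e. it approximated $\int \phi(t)\,e^{-t^2}\,dt$ rather than $\int \phi(t)\,dt$, hence 0.5775682 instead of 1.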
Re: [R] RGoogleDocs stopped working
Thanks, Duncan. Finally getting a chance to follow up on this... I tried again, changing and resetting my password, and trying to specify my login and password manually in the getGoogleDocsConnection argument list. I also tried removing either or both of the service and error options. No luck in any case. I also tried a different Google account, also with no luck. I've also tried tweaking the URL being generated by the code, and in all cases I get a 403 Forbidden error with content Error=BadAuthentication. I don't really know enough about how authentication is supposed to work to get much farther. Can you help? Should I try the Google API forum instead? -Harlan

From: Duncan Temple Lang dun...@wald.ucdavis.edu
To: r-help@r-project.org
Date: Wed, 10 Nov 2010 10:33:47 -0800
Subject: Re: [R] RGoogleDocs stopped working

Hi Harlan

I just tried to connect to Google Docs and I had ostensibly the same problem. However, the password was actually different from what I had specified. After resetting it with Google Docs, the getGoogleDocsConnection() worked fine. So I don't doubt that the login and password are correct, but you might just try it again to ensure there are no typos. The other thing to look at is the values for Email and Passwd sent in the URL, i.e. the string in url in your debugging below. (Thanks for that, by the way.) If either has special characters, e.g. '$', it is imperative that they are escaped correctly, i.e. converted to %24. This should happen, and nothing should have changed, but it is worth verifying. So things still seem to work for me. It is a data point, but not one that gives you much of a clue as to what is wrong on your machine. D.

On Wed, Nov 10, 2010 at 10:36 AM, Harlan Harris har...@harris.name wrote: Hello, Some code using RGoogleDocs, which had been working smoothly since the summer, just stopped working. I know that it worked on November 3rd, but it doesn't work today.
I've confirmed that the login and password still work when I log in manually. I've confirmed that the URL gives the same error when I paste it into Firefox. I don't know enough about this web service to figure out the problem myself, alas... Here's the error and other info (login/password omitted):

ss.con <- getGoogleDocsConnection(login = gd.login, password = gd.password, service = 'wise', error = FALSE)
Error: Forbidden
Enter a frame number, or 0 to exit
1: getGoogleDocsConnection(login = gd.login, password = gd.password, service = "wise", error = FALSE)
2: getGoogleAuth(..., error = error)
3: getForm("https://www.google.com/accounts/ClientLogin", accountType = "HOSTED_OR_GOOGLE", Email = login, Passw
4: getURLContent(uri, .opts = .opts, .encoding = .encoding, binary = binary, curl = curl)
5: stop.if.HTTP.error(http.header)
Selection: 4
Called from: eval(expr, envir, enclos)
Browse[1] http.header
Content-Type: text/plain
Cache-control: no-cache, no-store
Pragma: no-cache
Expires: Mon, 01-Jan-1990 00:00:00 GMT
Date: Wed, 10 Nov 2010 15:24:39 GMT
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Content-Length: 24
Server: GSE
status: 403
statusMessage: Forbidden\r\n
Browse[1] url
[1] https://www.google.com/accounts/ClientLogin?accountType=HOSTED%5FOR%5FGOOGLE&Email=***&Passwd=***&service=wise&source=R%2DGoogleDocs%2D0%2E1
Browse[1] .opts
$ssl.verifypeer
[1] FALSE

R.Version()
$platform: i386-apple-darwin9.8.0; $arch: i386; $os: darwin9.8.0; $system: i386, darwin9.8.0; $major: 2; $minor: 10.1; $year: 2009; $month: 12; $day: 14; $`svn rev`: 50720; $language: R; $version.string: R version 2.10.1 (2009-12-14)

installed.packages()[c('RCurl', 'RGoogleDocs'), ]
            Package       LibPath                                             Version  Depends                          License  Built
RCurl       RCurl         /Users/hharris/Library/R/2.10/library               1.4-3    R (>= 2.7.0), methods, bitops    BSD      2.10.1
RGoogleDocs RGoogleDocs   /Library/Frameworks/R.framework/Resources/library   0.4-1    RCurl, XML, methods              BSD      2.10.1
(RCurl also Suggests Rcompression; remaining columns NA)

Any ideas? Thank you! -Harlan
Re: [R] How to Read a Large CSV into a Database with R
On Mon, Nov 15, 2010 at 11:46 AM, Anthony Damico ajdam...@gmail.com wrote: Hi Gabor, Thank you for the prompt reply. I definitely looked over all of the examples on the code.google.com sqldf page before sending, which is why I wrote the code read.csv.sql("ss09pusa.csv", sql = "create table ss09pusa as select * from file", dbname = "sqlite") directly pulled from their code -- read.csv.sql("~/tmp.csv", sql = "create table mytab as select * from file", dbname = "mydb") ..but I don't understand why this helps me around the memory problem, since I think it still all gets read into memory. Is there a way to do this line by line?

OK. Maybe it's something else. The reading of the file into the database should not be a resource problem, provided you have enough disk space and appropriate permissions. sqldf / RSQLite are used to get sqlite to do it, so the data never goes through R at that stage, and R limitations can't affect the reading into the sqlite database. When you read it back from the sqlite database, R limitations do come into effect, so you just have to be sure not to read too much in at a time. The use of create table ... as select ... is to prevent sqldf from deleting the table: sqldf is normally used in a fashion where you don't want to know about the back-end databases, so it tries to create them and delete them behind the scenes, but here you want to use them explicitly, so you have to work around that.

Try this example. It should be reproducible, so you just have to copy it and paste it into your R session. Uncomment the indicated line if you want to remove any pre-existing mydb file in the current directory. Try it in a fresh R session just to be sure that nothing mucks it up.

library(sqldf)

# uncomment next line to make sure we are starting clean
# if (file.exists("mydb")) file.remove("mydb")

# create new database
sqldf("attach 'mydb' as new")

# create a new file. BOD is built into R and has 6 rows.
write.table(BOD, file = "tmp.csv", quote = FALSE, sep = ",")

# read new file into database
read.csv.sql("tmp.csv", sql = "create table mytab as select * from file", dbname = "mydb")

# how many records are in table?
N <- sqldf("select count(*) from mytab", dbname = "mydb")[[1]]

# read in chunks and display what we have read
k <- 4 # no of records to read at once
for(i in seq(0, N-1, k)) {
  s <- sprintf("select * from mytab limit %d, %d", i, k)
  print(sqldf(s, dbname = "mydb"))
}

On my machine I get this output:

  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
  Time demand
1    5   15.6
2    7   19.8

showing that it read the 6-line BOD data frame in chunks of 4, as required.

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
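Anthony's "line by line" question can also be answered without sqldf, by streaming the CSV through a file connection in fixed-size chunks; each chunk could then be appended to a database table (e.g. with RSQLite's dbWriteTable(..., append = TRUE)). A base-R sketch of just the chunking part, not from the thread (file names and the chunk size of 4 are made up for illustration; a real run would use a far larger chunk size):

```r
# Write a small example CSV that stands in for the 1.3 GB file
tmp <- tempfile(fileext = ".csv")
write.table(data.frame(id = 1:10, val = letters[1:10]), tmp,
            sep = ",", row.names = FALSE, quote = FALSE)

con <- file(tmp, open = "r")
header <- strsplit(readLines(con, n = 1), ",")[[1]]   # grab column names once
total <- 0
repeat {
  lines <- readLines(con, n = 4)          # read at most 4 data rows per chunk
  if (length(lines) == 0) break
  chunk <- read.csv(textConnection(lines), header = FALSE,
                    col.names = header)
  # here one would do: dbWriteTable(db, "mytab", chunk, append = TRUE)
  total <- total + nrow(chunk)
}
close(con)
total   # all 10 rows seen, 4 at a time, never more than one chunk in memory
```

Because the connection stays open between readLines() calls, each iteration picks up where the last one stopped; memory use is bounded by the chunk size rather than the file size.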
[R] plot.dendrogram() plot margins
Hello, Is it possible to remove those extra margins on the sample axis from plot.dendrogram?

par(oma = c(0,0,0,0), mar = c(0,0,0,0))
ddr <- as.dendrogram(hclust(dist(matrix(sample(1:1000, 200), nrow = 100))))
stats:::plot.dendrogram(ddr, horiz = FALSE, axes = FALSE, yaxs = "i", leaflab = "none")

vs.

stats:::plot.dendrogram(ddr, horiz = TRUE, axes = FALSE, yaxs = "i", leaflab = "none")

What variable / line of code corresponds to this additional margin space? I would like to modify the code to remove the extra space and have that margin equal to the one when horiz = TRUE, for plotting multiple dendrograms for one image on the same device. Thanks in advance, best.
Re: [R] Sweave question
Thank you for all your comments. As a result of my own research, I found this method, which seems to do what I want in addition to your suggestions:

tools::texi2dvi("myfile.tex", pdf = TRUE)

Thanks again, Ralf

On Mon, Nov 15, 2010 at 6:42 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 15/11/2010 6:22 AM, Dieter Menne wrote: Duncan Murdoch-2 wrote: See SweavePDF() in the patchDVI package on R-forge. In case googling patchDVI only shows a few Japanese pages, and searching for patchDVI in R-Forge gives nothing: try https://r-forge.r-project.org/projects/sweavesearch/ (or did I miss something obvious, Duncan?) No, I just didn't realize that it was hard to find. But you can always select R-forge as a repository, and then install.packages() will find it. Duncan Murdoch
[R] rotate column names in large matrix
Dear List, I have a large (1600*1600) matrix generated with symnum, which I am using to eyeball the structure of a dataset. I have abbreviated the column names with the abbr.colnames option. One way to get an even more compact view of the matrix would be to display the column names rotated by 90 degrees. Any pointers on how to do this would be most useful. Any other tips for displaying the matrix in compact form are of course also welcome. Many thanks, Lara
[R] Sweave: Conditional code chunks?
I have a code chunk that produces a figure. In my special case, however, the data does not always exist. In cases where the data exists, the code chunk is of course trivial (case #1); but what do I do for case #2, where the data does not exist? I can obviously prevent the code from being executed by checking the existence of the object, but on the Sweave level I still have a static figure chunk. Here is an example that should be reproducible:

# case 1
x <- c(1,2,3)
# case 2 - no definition of the variable
#x <- c(1,2,3)

<<echo=FALSE, results=hide, fig=TRUE>>=
if (exists(as.character(substitute(meta.summary)))) {
  plot(x)
}
@

In a way I would need a conditional chunk, or a chunk that draws a figure only if it was generated and ignores it otherwise. Any ideas? Ralf
[R] Non-positive definite cross-covariance matrices
I am creating covariance matrices from sets of points, and I frequently run into problems where the matrices I create are non-positive definite. I've started using the corpcor package, which was specifically designed to address these types of problems. It has solved many of my problems, but I still have one left. One of the matrices I need to calculate is a cross-covariance matrix. In other words, I need to calculate cov(A, B), where A and B are each a matrix defining a set of points. The corpcor package does not seem to be able to perform this operation. Can anyone suggest a way to create cross-covariance matrices that are guaranteed (or at least likely) to be positive definite, either using corpcor or another package? I'm using R 2.8.1 and corpcor 1.5.2 on Mac OS X 10.5.8. - Jeff
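One observation worth adding here (an editorial note, not from the thread): a cross-covariance block cov(A, B) is in general rectangular and has no definiteness of its own; what must be positive semi-definite is the joint covariance of the stacked variables. Estimating cov(cbind(A, B)) from the raw points and extracting the off-diagonal block therefore yields a cross-covariance that is consistent with a PSD joint matrix by construction. A base-R sketch with made-up dimensions:

```r
set.seed(42)
A <- matrix(rnorm(50 * 3), ncol = 3)   # 50 points in 3 dimensions
B <- matrix(rnorm(50 * 2), ncol = 2)   # the same 50 points in 2 other dimensions

# Joint covariance of the stacked columns: PSD by construction
S <- cov(cbind(A, B))
p <- ncol(A)
S_AB <- S[1:p, (p + 1):ncol(S)]        # the 3 x 2 cross-covariance block

# Identical to calling cov(A, B) directly ...
all.equal(S_AB, cov(A, B), check.attributes = FALSE)

# ... and the joint matrix has no negative eigenvalues (up to rounding)
min(eigen(S, symmetric = TRUE, only.values = TRUE)$values) >= -1e-10
```

Shrinkage estimators such as corpcor's cov.shrink could be applied to the stacked matrix cbind(A, B) in the same way, keeping the joint estimate well-conditioned before the cross block is extracted.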
[R] RE : Sweave: Conditional code chunks?
Hi Ralf,

I first create (or not) the figure as a separate file and then use conditional LaTeX to display the existing file(s) (exactly where I want it/them to appear). This also works with png etc., but you'll have to specify the extensions. Just be careful if you change paths ... This will give something like:

<<label=condFig1, fig=TRUE, include=FALSE>>=
wantToPlot <- TRUE
if (wantToPlot) {
  pdf(file = "fig1.pdf")
  plot(1:10)   # whatever you want to plot ...
  dev.off()
}
@

bla .. bla .. bla .. bla ..

\ifnum \Sexpr{as.numeric(file.exists("fig1.pdf"))}>0
{
\begin{figure}[!h]
\includegraphics{fig1} %% fig1
\label{..}
\end{figure}
}
\fi

Wolfgang

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Wolfgang Raffelsberger, PhD
Laboratoire de BioInformatique et Génomique Intégratives
IGBMC, 1 rue Laurent Fries, 67404 Illkirch Strasbourg, France
Tel (+33) 388 65 3300   Fax (+33) 388 65 3276
wolfgang.raffelsberger (at) igbmc.fr

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of Ralf B [ralf.bie...@gmail.com]
Sent: Monday, 15 November 2010 18:49
To: r-help Mailing List
Subject: [R] Sweave: Conditional code chunks?

I have a code chunk that produces a figure. In my special case, however, the data does not always exist. In cases where the data exists, the code chunk is of course trivial (case #1); but what do I do for case #2, where the data does not exist? I can obviously prevent the code from being executed by checking the existence of the object, but on the Sweave level I still have a static figure chunk. Here is an example that should be reproducible:

# case 1
x <- c(1,2,3)
# case 2 - no definition of the variable
#x <- c(1,2,3)

<<echo=FALSE, results=hide, fig=TRUE>>=
if (exists(as.character(substitute(meta.summary)))) {
  plot(x)
}
@

In a way I would need a conditional chunk, or a chunk that draws a figure only if it was generated and ignores it otherwise. Any ideas?
Ralf
Re: [R] How to Read a Large CSV into a Database with R
Hi Gabor,

Thank you for your willingness to help me through this. The code you sent works on my machine exactly the same way as it does on yours. Unfortunately, when I run the same code on the 1.3GB file, it creates the table structure but doesn't read in a single line [confirmed with sqldf("select * from mytab", dbname = "mydb")]. Though I don't expect anyone to download it, the file I'm using is ss09pusa.csv from http://www2.census.gov/acs2009_1yr/pums/csv_pus.zip. I tested both sets of code on my work desktop and personal laptop, so it's not machine-specific (although it might be Windows- or 64-bit-specific). Do you have any other ideas as to how I might diagnose what's going on here? Or, alternatively, is there some workaround that would get this giant CSV into a database? If you think there's a reasonable way to use the IMPORT command with RSQLite, that seems like it would import the fastest, but I don't know that it's compatible with DBI on Windows. Thanks again! Anthony

read.csv.sql("R:\\American Community Survey\\Data\\2009\\ss09pusa.csv", sql = "create table mytab as select * from file", dbname = "mydb")
NULL
Warning message:
closing unused connection 3 (R:\American Community Survey\Data\2009\ss09pusa.csv)

# how many records are in table?
N <- sqldf("select count(*) from mytab", dbname = "mydb")[[1]]

# read in chunks and display what we have read
k <- 4 # no of records to read at once
for(i in seq(0, N-1, k)) {
+   s <- sprintf("select * from mytab limit %d, %d", i, k)
+   print(sqldf(s, dbname = "mydb"))
+ }
Error in seq.default(0, N - 1, k) : wrong sign in 'by' argument
N
[1] 0

On Mon, Nov 15, 2010 at 12:24 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote: On Mon, Nov 15, 2010 at 11:46 AM, Anthony Damico ajdam...@gmail.com wrote: Hi Gabor, Thank you for the prompt reply.
I definitely looked over all of the examples on the code.google.com sqldf page before sending, which is why I wrote the code read.csv.sql("ss09pusa.csv", sql = "create table ss09pusa as select * from file", dbname = "sqlite") directly pulled from their code -- read.csv.sql("~/tmp.csv", sql = "create table mytab as select * from file", dbname = "mydb") ..but I don't understand why this helps me around the memory problem, since I think it still all gets read into memory. Is there a way to do this line by line?

OK. Maybe it's something else. The reading of the file into the database should not be a resource problem, provided you have enough disk space and appropriate permissions. sqldf / RSQLite are used to get sqlite to do it, so the data never goes through R at that stage, and R limitations can't affect the reading into the sqlite database. When you read it back from the sqlite database, R limitations do come into effect, so you just have to be sure not to read too much in at a time. The use of create table ... as select ... is to prevent sqldf from deleting the table: sqldf is normally used in a fashion where you don't want to know about the back-end databases, so it tries to create them and delete them behind the scenes, but here you want to use them explicitly, so you have to work around that.

Try this example. It should be reproducible, so you just have to copy it and paste it into your R session. Uncomment the indicated line if you want to remove any pre-existing mydb file in the current directory. Try it in a fresh R session just to be sure that nothing mucks it up.

library(sqldf)

# uncomment next line to make sure we are starting clean
# if (file.exists("mydb")) file.remove("mydb")

# create new database
sqldf("attach 'mydb' as new")

# create a new file. BOD is built into R and has 6 rows.
write.table(BOD, file = "tmp.csv", quote = FALSE, sep = ",")

# read new file into database
read.csv.sql("tmp.csv", sql = "create table mytab as select * from file", dbname = "mydb")

# how many records are in table?
N <- sqldf("select count(*) from mytab", dbname = "mydb")[[1]]

# read in chunks and display what we have read
k <- 4 # no of records to read at once
for(i in seq(0, N-1, k)) {
  s <- sprintf("select * from mytab limit %d, %d", i, k)
  print(sqldf(s, dbname = "mydb"))
}

On my machine I get this output:

  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
  Time demand
1    5   15.6
2    7   19.8

showing that it read the 6-line BOD data frame in chunks of 4, as required.

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] Sweave question
On 15/11/2010 12:38 PM, Ralf B wrote: Thank you for all your comments. As a result of my own research, I found this method, which seems to do what I want in addition to your suggestions: tools::texi2dvi("myfile.tex", pdf = TRUE)

Sure, but this doesn't quite answer your original question: you can't pass a Sweave document to texi2dvi, you need to pass it through Sweave first. So the two steps would be

Sweave("myfile.Rnw")
tools::texi2dvi("myfile.tex")

That's essentially what SweavePDF does, but it adds some extra bells and whistles, e.g. it can process multiple related files, it can go directly to a previewer, and it will set up forward and reverse searching from the previewer.

Duncan Murdoch

Thanks again, Ralf

On Mon, Nov 15, 2010 at 6:42 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 15/11/2010 6:22 AM, Dieter Menne wrote: Duncan Murdoch-2 wrote: See SweavePDF() in the patchDVI package on R-forge. In case googling patchDVI only shows a few Japanese pages, and searching for patchDVI in R-Forge gives nothing: try https://r-forge.r-project.org/projects/sweavesearch/ (or did I miss something obvious, Duncan?) No, I just didn't realize that it was hard to find. But you can always select R-forge as a repository, and then install.packages() will find it. Duncan Murdoch
Re: [R] rotate column names in large matrix
Hi Lara,

Hmm, I've never seen column names rotated in R (certainly you could in graphics, etc., and this should do it in that case: lapply(strsplit(colnames(x), ""), paste, collapse = "\n")). You could transpose the matrix so the columns become the rows and then just have numbers (1:1600) as the columns? That's the best solution I've found when dealing with large correlation matrices. I also usually set options(digits = 2) or thereabouts (or use round()). I'd be interested in seeing any other ideas people have, as this has been troublesome for me in the past too. As much as I hate to say it, I find it easier to view some of these things in Excel (you can just write the matrix to the clipboard and paste it into Excel, or probably OpenOffice, though I have not tried) because it has easy scroll bars and zooming.

Cheers, Josh

On Mon, Nov 15, 2010 at 9:47 AM, Lara Poplarski larapoplar...@gmail.com wrote: Dear List, I have a large (1600*1600) matrix generated with symnum, that I am using to eyeball the structure of a dataset. I have abbreviated the column names with the abbr.colnames option. One way to get an even more compact view of the matrix would be to display the column names rotated by 90 degrees. Any pointers on how to do this would be most useful. Any other tips for displaying the matrix in compact form are of course also welcome. Many thanks, Lara

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
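Josh's strsplit trick can be wrapped into a tiny helper; a sketch (the helper name vertical_names is made up for illustration) that turns each column name into a newline-separated label, which stacks the letters vertically when drawn with text() or mtext() in base graphics:

```r
# Stack each name's characters vertically by joining them with newlines
vertical_names <- function(nms) {
  vapply(strsplit(nms, ""), paste, character(1), collapse = "\n")
}

m <- matrix(0, nrow = 2, ncol = 3,
            dimnames = list(NULL, c("alpha", "beta", "gamma")))
v <- vertical_names(colnames(m))
cat(v[1])   # prints a, l, p, h, a on five separate lines
```

For true 90-degree rotation (rather than stacked letters), base graphics offers text(..., srt = 90), but that only applies to plots, not to console output of the matrix itself.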
Re: [R] Merge postscript files into ps/pdf
On Fri, 12-Nov-2010 at 03:29PM -0500, Ralf B wrote: | I know such programs, however, for my specific problem I have an R | script that creates a report (which I have to create many times) | and I would like to append about 100 single-page PostScript files at | the end as an appendix. File names are controlled so it would be easy | to detect them; I just miss a useful function/package that allows | me to perhaps print them to a postscript graphics device. R is a somewhat clumsy tool to do what would be more simply done outside R, but if you insist on doing it in R, it makes sense to edit the R code that did the plots in the first place. You just need a good text editor. | | Ralf | | On Fri, Nov 12, 2010 at 11:47 AM, Greg Snow greg.s...@imail.org wrote: | The best approach if creating all the files using R is to change how you create the graphs so that they all go to one file to begin with (as mentioned by Joshua), but if some of the files are created differently (rgl, external programs), then this is not an option. | | One external program that is fairly easy to use is pdftk, which will concatenate multiple pdf files into 1 (among other things). If you want more control of layout then you can use LaTeX, which will read and include ps/pdf. | | If you need to use R, then you can read ps files using the grImport package and then replot them to a postscript/pdf device with onefile set to TRUE. | | -- | Gregory (Greg) L. Snow Ph.D. | Statistical Data Center | Intermountain Healthcare | greg.s...@imail.org | 801.408.8111 | | | -----Original Message----- | From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ralf B | Sent: Friday, November 12, 2010 12:07 AM | To: r-help Mailing List | Subject: [R] Merge postscript files into ps/pdf | | I created multiple postscript files using ?postscript. How can I merge | them into a single postscript file using R? How can I merge them into | a single pdf file?
| | Thanks a lot, | Ralf

--
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
   ___     Patrick Connolly
 {~._.~}   Great minds discuss ideas
 _( Y )_   Average minds discuss events
(:_~*~_:)  Small minds discuss people
 (_)-(_)   . Eleanor Roosevelt
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
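The two routes Greg Snow mentions can be sketched as follows; file names are hypothetical, and the grImport route assumes the inputs are plain single-page PostScript files:

```r
## Outside R: pdftk concatenates PDFs in one shell call, e.g.
##   pdftk report.pdf appendix*.pdf cat output combined.pdf

## Inside R: trace each .ps file with grImport and replot everything
## to one multi-page PostScript device (onefile = TRUE).
library(grImport)
library(grid)

files <- sort(list.files(pattern = "\\.ps$"))
postscript("combined.ps", onefile = TRUE)
for (f in files) {
  xml <- paste(f, ".xml", sep = "")
  PostScriptTrace(f, xml)         # convert to grImport's XML format
  grid.newpage()                  # one page per original file
  grid.picture(readPicture(xml))  # redraw the traced picture
}
dev.off()
```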
Re: [R] How to Read a Large CSV into a Database with R
From: ajdam...@gmail.com Date: Mon, 15 Nov 2010 13:28:40 -0500 To: ggrothendi...@gmail.com; r-help@r-project.org Subject: Re: [R] How to Read a Large CSV into a Database with R

Hi Gabor, Thank you for your willingness to help me through this. The code you sent works on my machine exactly the same way as it does on yours. Unfortunately, when I run the same code on the 1.3GB file, it creates the table structure but doesn't read in a single line [confirmed with sqldf("select * from mytab", dbname = "mydb")]. Though I don't expect anyone to download it, the file I'm using is ss09pusa.csv from http://www2.census.gov/acs2009_1yr/pums/csv_pus.zip. I tested both sets of code on my work desktop and personal laptop, so it's not machine-specific. Do you have any other ideas as to how I might diagnose what's going on here? Or, alternatively, is there some workaround that would get this giant CSV into a database? If you think there's a reasonable way to use the IMPORT command with RSQLite, that seems like it would import the fastest, I think.

Someone else suggested external approaches, and indeed I have loaded census TIGER files into a db for making maps and mobile apps etc. I would mention again that this may eliminate a memory limit and let you limp along, but presumably you want a streaming source or something if your analysis has predictable access patterns and this data will not be used as part of a hotel reservation system. Structured input data (I think TIGER was line oriented) should be easy to load into a db with a bash script or Java app.

Thanks again! Anthony

read.csv.sql("R:\\American Community Survey\\Data\\2009\\ss09pusa.csv", sql = "create table mytab as select * from file", dbname = "mydb")
NULL
Warning message: closing unused connection 3 (R:\American Community Survey\Data\2009\ss09pusa.csv)
# how many records are in table?
N <- sqldf("select count(*) from mytab", dbname = "mydb")[[1]]
# read in chunks and display what we have read
k <- 4 # no of records to read at once
for(i in seq(0, N-1, k)) {
+   s <- sprintf("select * from mytab limit %d, %d", i, k)
+   print(sqldf(s, dbname = "mydb"))
+ }
Error in seq.default(0, N - 1, k) : wrong sign in 'by' argument
N
[1] 0

On Mon, Nov 15, 2010 at 12:24 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote: On Mon, Nov 15, 2010 at 11:46 AM, Anthony Damico wrote: Hi Gabor, Thank you for the prompt reply. I definitely looked over all of the examples on the code.google.com sqldf page before sending, which is why I wrote the code read.csv.sql("ss09pusa.csv", sql = "create table ss09pusa as select * from file", dbname = "sqlite") directly pulled from their code -- read.csv.sql("~/tmp.csv", sql = "create table mytab as select * from file", dbname = "mydb") ... but I don't understand why this helps me around the memory problem, since I think it still all gets read into memory. Is there a way to do this line by line?

OK. Maybe it's something else. The reading of the file into the database should not be a resource problem provided you have enough disk space and appropriate permissions. sqldf / RSQLite are used to get sqlite to do it, so that the data never goes through R at that stage and R limitations can't affect the reading into the sqlite database. When you read it from the sqlite database, R limitations do come into effect, so you just have to be sure not to read too much in at a time. The use of "create table ... as select ..." is to prevent sqldf from deleting the table: sqldf is normally used in a fashion where you don't want to know about the back-end databases, so it tries to create them and delete them behind the scenes, but here you want to explicitly use them, so you have to work around that. Try this example. It should be reproducible, so you just have to copy it and paste it into your R session.
Uncomment the indicated line if you want to be able to remove any pre-existing mydb file in the current directory. Try it in a fresh R session just to be sure that nothing mucks it up.

library(sqldf)
# uncomment next line to make sure we are starting clean
# if (file.exists("mydb")) file.remove("mydb")
# create new database
sqldf("attach 'mydb' as new")
# create a new file. BOD is built into R and has 6 rows.
write.table(BOD, file = "tmp.csv", quote = FALSE, sep = ",")
# read new file into database
read.csv.sql("tmp.csv", sql = "create table mytab as select * from file", dbname = "mydb")
# how many records are in table?
N <- sqldf("select count(*) from mytab", dbname = "mydb")[[1]]
# read in chunks and display what we have read
k <- 4 # no of records to read at once
for(i in seq(0, N-1, k)) {
  s <- sprintf("select * from mytab limit %d, %d", i, k)
  print(sqldf(s, dbname = "mydb"))
}

On my machine I get this output:

  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
[R] Need help with pointLabels()
Hello R-list, I am plotting a weighted linear regression in R. The points on my chart are also scaled to sample size, so some points are large, some are small. I have figured out everything I need using the plot() function: how to plot the points, scale them by sample size, weight the linear regression by sample size, plot that line, and plot the labels for the points. However, although the pointLabel() function assures that the labels do not overlap with each other, they still overlap with the larger points, and sometimes run off the side of the chart too. I have tried saving the plot as EPS, PDF, SVG and opening them in GIMP to try to move the text around, but GIMP does not recognize the text as text, so I cannot select the labels and move them off the points. I have tried various offsets too but that does not work either. So I need to either (a) print the labels on the plot so that they do not overlap the points or (b) be able to move the text around in the resulting image file. Any advice? Here is the code I am using. BTW, I have also tried ggplot2 but it has no function (that I am aware of) like pointLabel() to avoid label overlap. Please feel free to email me directly.

postscript(file="fig_a.eps")
plot(x, y, xlab="X-axis", ylab="Y-axis", cex=0.02*sample_size, pch=21)
abline(lm(y~x, weights=sample_size))
pointLabel(x, y, labels=Category, cex=1, doPlot=TRUE, offset=2.5)
dev.off()

Thank you, -craig.star...@gmail.com
[R] first-order integer valued autoregressive process, inar(1)
Hello, in my doctoral thesis I am trying to model time-series crash count data with an INAR(1) process, but I have a few problems writing the R code. Is there someone who works with INAR processes? I would be very grateful if someone could give me some ideas for writing the code. Nazli
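For what it's worth, here is a minimal sketch (my own, not from the thread) of simulating an INAR(1) process via binomial thinning, X_t = alpha o X_{t-1} + e_t, with Poisson(lambda) innovations; alpha and lambda are illustrative values:

```r
## Simulate an INAR(1) count process: each of the X_{t-1} counts
## "survives" to time t with probability alpha (binomial thinning),
## and a Poisson(lambda) innovation is added.
sim_inar1 <- function(n, alpha = 0.5, lambda = 2) {
  x <- numeric(n)
  x[1] <- rpois(1, lambda / (1 - alpha))  # start near the stationary mean
  for (t in 2:n) {
    survivors <- rbinom(1, size = x[t - 1], prob = alpha)  # alpha o X_{t-1}
    x[t] <- survivors + rpois(1, lambda)                   # + innovation e_t
  }
  x
}

set.seed(1)
x <- sim_inar1(500)
mean(x)                      # should be near lambda / (1 - alpha) = 4
acf(x, plot = FALSE)$acf[2]  # lag-1 autocorrelation should be near alpha
```

Fitting (rather than simulating) an INAR(1) would typically go through conditional least squares or conditional maximum likelihood on the transition probabilities implied by the thinning operator.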
Re: [R] R error using Survr function with gcmrec
I've solved the condition problem, but have come across another one with the gcmrec function and was wondering if anyone could point me in the right direction again? After running the code below, I get this error message:

Error in gcmrec(Survr(id, time, event) ~ var, data = dataOK, s = 1096) : NA/NaN/Inf in foreign function call (arg 14)

I've looked into the code of the function and believe it has something to do with the line call <- match.call(), but I'm not sure how it works. Thanks for any input!

# dataset
id=c(rep(1,4),rep(2,4),rep(3,4),rep(4,5))
start=c("1996-01-01","1997-01-01","1998-01-01","1998-03-15","1996-01-01","1996-04-15","1997-01-01","1998-01-01","1996-01-01","1997-01-01","1998-01-01","1998-09-30","1996-01-01","1997-01-01","1997-12-15","1998-01-01","1998-06-14")
stop=c("1997-01-01","1998-01-01","1998-03-15","1999-01-01","1996-04-15","1997-01-01","1998-01-01","1999-01-01","1997-01-01","1998-01-01","1998-09-30","1999-01-01","1997-01-01","1997-12-15","1998-01-01","1998-06-14","1999-01-01")
time=c(366,365,73,292,105,261,365,365,366,365,272,93,366,348,17,164,201)
event=c(0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,1,0)
enum=c(rep(seq(1,4,1),4),5)
var=c(21312,21869,22441,22441,3105,3105,3086,3075,130610,133147,135692,135692,11686,11976,11976,12251,12251)
data=data.frame(id,start,stop,time,event,enum,var)

# modify Survr function
Survr = function (id, time, event) {
  if (length(unique(id)) > length(event[event == 0])) {
    stop("Data doesn't match. Every subject must have a censored time")
  }
  if (length(unique(event)) > 2 | max(event) != 1 | min(event) != 0) {
    stop("event must be 0-1")
  }
  ans <- cbind(id, time, event)
  oldClass(ans) <- "Survr"
  invisible(ans)
}

# model
dataOK=addCenTime(data)
m <- gcmrec(Survr(id,time,event)~var, data=dataOK, s=1096)

On Thu, Nov 11, 2010 at 3:50 PM, Emily C lia.bede...@gmail.com wrote: Sorry for the lack of context - I found a forum (http://r.789695.n4.nabble.com/R-help-f789696.html) that I thought was easier to navigate and was using it for the first time.
Hit the reply button, but forgot that the mailing list recipients would not see the previous message in the thread. I've pasted it at the bottom for reference. I actually need my data as single-year entries, as I am assessing the effect of a time-varying covariate (var) that is measured (and changes) on an annual basis. However, my events are assessed on a daily basis. But thank you for pointing out the condition in Survr(). To correct the problem, I believe it would still be valid to change the condition to:

if (length(unique(id)) > length(event[event == 0])) { stop("Data doesn't match. Every subject must have a censored time") }

Thank you for the reply!

On Thu, Nov 11, 2010 at 2:54 PM, David Winsemius dwinsem...@comcast.net wrote: On Nov 11, 2010, at 2:50 PM, David Winsemius wrote: On Nov 11, 2010, at 2:09 PM, Emily wrote: I'm having the same problem (???: from a three-year-old posting for which you didn't copy any context.) and was wondering whether you ever found a solution? It gives me the error "Error in Survr(id, time, event) : Data doesn't match. Every subject must have a censored time" even though all my subjects are right-censored, and to be sure, I've even used the addCenTime function. Any input appreciated!

Your data has a lot of 0 events at the end of calendar years. That does not seem to be the expected format for the Survr records. It appears to define an invalid record as one where the only censoring event is at the time of the ^valid^ last observation. Here's the first line in Survr that is throwing the error:

if (length(unique(id)) != length(event[event == 0])) { stop("Data doesn't match. Every subject must have a censored time") }

I suspect you need to collapse your single-year entries with 0 events into multiple-year entries with an event. -- David.
Here's my sample data:

id=c(rep(1,4),rep(2,4),rep(3,4),rep(4,5))
start=c("1996-01-01","1997-01-01","1998-01-01","1998-03-15","1996-01-01","1996-04-15","1997-01-01","1998-01-01","1996-01-01","1997-01-01","1998-01-01","1998-09-30","1996-01-01","1997-01-01","1997-12-15","1998-01-01","1998-06-14")
stop=c("1997-01-01","1998-01-01","1998-03-15","1999-01-01","1996-04-15","1997-01-01","1998-01-01","1999-01-01","1997-01-01","1998-01-01","1998-09-30","1999-01-01","1997-01-01","1997-12-15","1998-01-01","1998-06-14","1999-01-01")
time=c(366,365,73,292,105,261,365,365,366,365,272,93,366,348,17,164,201)
event=c(0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,1,0)
enum=c(rep(seq(1,4,1),4),5)
var=c(21312,21869,22441,22441,3105,3105,3086,3075,130610,133147,135692,135692,11686,11976,11976,12251,12251)
data=data.frame(id,start,stop,time,event,enum,var)
dataOK=addCenTime(data)
m <- gcmrec(Survr(id,time,event)~var, data=dataOK)
[R] Re : interpretation of coefficients in survreg AND obtaining the hazard function
Dear Prof Therneau, thank you for this information: this is going to be most useful for what I want to do. I will look into the AFT model. Yours, David Biau.

From: Terry Therneau thern...@mayo.edu Cc: r-help@r-project.org Sent: Mon 15 November 2010, 15:33:23 Subject: Re: interpretation of coefficients in survreg AND obtaining the hazard function

1. The Weibull is the only distribution that can be written in both a proportional hazards form and an accelerated failure time (AFT) form. Survreg uses the latter. In an AFT model, we model the time to failure. Positive coefficients are good (longer time to death). In a PH model, we model the death rate. Positive coefficients are bad (higher death rate). You are not the first to be confused by the change in sign between the two models.

2. There are about 5 different ways to parameterize a Weibull distribution; 1-4 appear in various texts, and the AFT form is #5. This is a second common issue with survreg that strikes only the more sophisticated users: to understand the output they look up the Weibull in a textbook, and become even more confused! Kalbfleisch and Prentice is a good reference for the AFT form. The manual page for psurvreg has some information on this, as does the very end of ?survreg. The psurvreg page also has an example of how to extract the hazard function for a Weibull fit.

----- Begin included message -----

Dear R help list, I am modeling some survival data with coxph and survreg (dist='weibull') using package survival. I have 2 problems: 1) I do not understand how to interpret the regression coefficients in the survreg output, and it is not clear to me from ?survreg.objects how to.
Here is an example of the code that points out my problem: - data is stc1 - the factor is dichotomous with 'low' and 'high' categories

slr <- Surv(stc1$ti_lr, stc1$ev_lr==1)
mca <- coxph(slr~as.factor(grade2=='high'), data=stc1)
mcb <- coxph(slr~as.factor(grade2), data=stc1)
mwa <- survreg(slr~as.factor(grade2=='high'), data=stc1, dist='weibull', scale=0)
mwb <- survreg(slr~as.factor(grade2), data=stc1, dist='weibull', scale=0)

summary(mca)$coef
                                     coef exp(coef)  se(coef)         z  Pr(>|z|)
as.factor(grade2 == "high")TRUE 0.2416562  1.273356 0.2456232 0.9838494 0.3251896

summary(mcb)$coef
                           coef exp(coef)  se(coef)          z  Pr(>|z|)
as.factor(grade2)low -0.2416562 0.7853261 0.2456232 -0.9838494 0.3251896

summary(mwa)$coef
                    (Intercept) as.factor(grade2 == "high")TRUE
                      7.9068380                      -0.4035245

summary(mwb)$coef
         (Intercept) as.factor(grade2)low
           7.5033135            0.4035245

No problem with the interpretation of the coefs in the Cox model. However, I do not understand why a) the coefficients in the survreg model are the opposite (negative when the other is positive) of what I have in the Cox model? Are these not the log(HR) given the categories of these variables? b) How come the intercept coefficient changes (the scale parameter does not change)?

2) My second question relates to the first. a) Given a model from survreg, say mwa above, how should I extract the base hazard and the hazard of each patient given a set of predictors? With the hazard function for the ith individual in the study given by h_i(t) = exp(\beta'x_i)*\lambda*\gamma*t^{\gamma-1}, it doesn't look to me like predict(mwa, type='linear') is \beta'x_i. b) Since I need the intercept coefficient from the model to obtain the scale parameter, and thus the base hazard function as defined in Collett (h_0(t)=\lambda*\gamma*t^{\gamma-1}), I am concerned that this intercept coefficient changes depending on the reference level of the factor entered in the model. The change is very important when I have more than one predictor in the model.
Any help would be greatly appreciated, David Biau.
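As a concrete illustration of Prof Therneau's point 1, here is a small sketch (using the survival package's built-in lung data rather than the poster's stc1, so the numbers are illustrative): for the Weibull, the AFT coefficient divided by the scale and negated approximately recovers the Cox PH coefficient, including the sign flip.

```r
library(survival)

## AFT fit: positive coefficients mean longer survival
fit_aft <- survreg(Surv(time, status) ~ sex, data = lung, dist = "weibull")

## PH fit: positive coefficients mean higher hazard
fit_ph <- coxph(Surv(time, status) ~ sex, data = lung)

## For the Weibull, log(HR) = -beta_AFT / scale: note the sign flip
-coef(fit_aft)["sex"] / fit_aft$scale
coef(fit_ph)   # similar magnitude and sign to the rescaled AFT estimate
```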
Re: [R] Force evaluation of variable when calling partialPlot
RE: the following original question:

I'm using the randomForest package and would like to generate partial dependence plots, one after another, for a variety of variables:

m <- randomForest( s, ... )
varnames <- c( "var1", "var2", "var3", "var4" ) # var1..4 are all in data frame s
for( v in varnames ) {
  partialPlot( x=m, pred.data=s, x.var=v )
}

... but this doesn't work, with partialPlot complaining that it can't find the variable v.

I'm having a very similar problem using the partialPlot function. A simplified version of my code looks like this:

data.in <- paste(basedir, "/sw_climate_dataframe_", root.name, ".csv", sep="")
data <- read.csv(data.in)
vars <- c("sm1","precip.spring","tmax.fall","precip.fall") # selected variables in data frame
xdata <- as.data.frame(data[,vars])
ydata <- data[,5]
ntree <- 2000
rf.pdplots <- function() {
  sel.rf <- randomForest(xdata, ydata, ntree=ntree, keep.forest=TRUE)
  par(family="sans", mfrow=c(2,2), mar=c(4,3,1,2), oma=c(0,3,0,0), mgp=c(2,1,0))
  for (i in 1:length(vars)) {
    print(vars[i])
    partialPlot(sel.rf, xdata, vars[i], which.class=1, xlab=vars[i], main="")
    mtext("(Logit of probability of high severity)/2", side=2, line=1, outer=T)
  }
}
rf.pdplots()

When I run this code, with partialPlot embedded in a function, I get the following error:

Error in eval(expr, envir, enclos) : object 'i' not found

If I just take the code inside the function and run it (not embedded in the function), it runs just fine. Is there some reason why partialPlot doesn't like to be called from inside a function? Other things I've tried/noticed: 1. If I comment out the line with the partialPlot call (and the next line with mtext), the function runs as expected and prints the variable names one at a time. 2. If the variable i is defined as a number (e.g., 4) in the global environment, then the function will run, the names print out one at a time, and four plots are created.
HOWEVER, the plots are all for the last (4th) variable, BUT the x labels actually are different on each plot (i.e., the xlab is actually looping through the four values in vars). Can anyone help me make sense of this? Thanks. -Greg
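One possible workaround (my guess, based on partialPlot() deparsing its x.var argument with substitute(); this is not from the thread): construct the call with do.call(), so that x.var is evaluated to a plain character string before partialPlot() captures it. Object names here follow Greg's code above.

```r
## Inside rf.pdplots(), replace the partialPlot line with a do.call():
## the list elements are evaluated first, so x.var arrives as the
## character value (e.g. "sm1") rather than the unevaluated symbol.
for (i in seq_along(vars)) {
  do.call(partialPlot,
          list(sel.rf, pred.data = xdata, x.var = vars[i],
               which.class = 1, xlab = vars[i], main = ""))
}
```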
Re: [R] Problem in installing and starting Rattle
On 16 November 2010 02:40, Feng Mai maif...@gmail.com wrote: I also have a problem trying to start rattle. Windows 7 32-bit, R 2.12.0. When I try library(rattle) I get an error message: "The procedure entry point deflateSetHeader could not be located in the dynamic link library zlib1.dll". I hit OK and it prompts me to install GTK+ again. I tried to uninstall GTK+ first and delete all related files as Dieter suggested, but it still won't work :(

Yes, this is another issue that arose with R 2.12.0 and was discussed on the rattle-users mailing list (http://groups.google.com/group/rattle-users). Be sure to use the newest version of Rattle (currently 2.5.47), where this has been fixed. I've not worked out why, but if the XML package is loaded before the RGtk2 package we see this error, but not if RGtk2 is loaded first. Rattle now loads RGtk2 first. Regards, Graham
[R] Defining functions inside loops
Hello, I was trying to define a set of functions inside a loop, with the loop index working as a parameter for each function. Below I post a simpler example, to illustrate what I was intending:

f <- list()
for (i in 1:10){
  f[[i]] <- function(t){
    f[[i]] <- t^2 + i
  }
}
rm(i)

With that, I was expecting that f[[1]] would be a function defined by t^2+1, f[[2]] by t^2+2 and so on. However, the index i somehow doesn't get into the function definition on each loop; that is, the functions f[[1]] through f[[10]] are all defined by t^2+i. Thus, if I remove the object i from the workspace, I get an error when evaluating these functions. Otherwise, if I don't remove the object i, it ends the loop with value equal to 10 and then f[[1]](t)=f[[2]](t)=...=f[[10]](t)=t^2+10. I am aware that I could simply put

f <- function(u, i){
  f <- t^2 + i
}

but that's really not what I want. Any help would be appreciated. Thanks in advance, Eduardo Horta
Re: [R] Need help with pointLabels()
What package is pointLabel (or is it pointLabels) in? Giving a reproducible example includes stating packages other than the standard ones. What you are trying to do is not simple for general cases; some tools work better on some datasets, but others work better on other datasets. Some other tools that may help (but you will probably need to do some hand tweaking after using them, or base your solution on these) could be:

thigmophobe.labels in the plotrix package
spread.labels in the plotrix package
TkIdentify in the TeachingDemos package
spread.labs in the TeachingDemos package

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Craig Starger Sent: Monday, November 15, 2010 12:38 PM To: r-help@r-project.org Subject: [R] Need help with pointLabels()

Hello R-list, I am plotting a weighted linear regression in R. The points on my chart are also scaled to sample size, so some points are large, some are small. I have figured out everything I need using the plot() function: how to plot the points, scale them by sample size, weight the linear regression by sample size, plot that line, and plot the labels for the points. However, although the pointLabel() function assures that the labels do not overlap with each other, they still overlap with the larger points, and sometimes run off the side of the chart too. I have tried saving the plot as EPS, PDF, SVG and opening them in GIMP to try to move the text around, but GIMP does not recognize the text as text, so I cannot select the labels and move them off the points. I have tried various offsets too but that does not work either. So I need to either (a) print the labels on the plot so that they do not overlap the points or (b) be able to move the text around in the resulting image file. Any advice? Here is the code I am using.
BTW, I have also tried ggplot2 but it has no function (that I am aware of) like pointLabel() to avoid label overlap. Please feel free to email me directly.

postscript(file="fig_a.eps")
plot(x, y, xlab="X-axis", ylab="Y-axis", cex=0.02*sample_size, pch=21)
abline(lm(y~x, weights=sample_size))
pointLabel(x, y, labels=Category, cex=1, doPlot=TRUE, offset=2.5)
dev.off()

Thank you, -craig.star...@gmail.com
Re: [R] Non-positive definite cross-covariance matrices
What made you think that a cross-covariance matrix should be positive definite? It does not even need to be a square matrix, or symmetric. Giovanni Petris

On Mon, 2010-11-15 at 12:58 -0500, Jeff Bassett wrote: I am creating covariance matrices from sets of points, and I am having frequent problems where I create matrices that are non-positive definite. I've started using the corpcor package, which was specifically designed to address these types of problems. It has solved many of my problems, but I still have one left. One of the matrices I need to calculate is a cross-covariance matrix. In other words, I need to calculate cov(A, B), where A and B are each a matrix defining a set of points. The corpcor package does not seem to be able to perform this operation. Can anyone suggest a way to create cross-covariance matrices that are guaranteed (or at least likely) to be positive definite, either using corpcor or another package? I'm using R 2.8.1 and corpcor 1.5.2 on Mac OS X 10.5.8. - Jeff
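A short illustration of Giovanni's point (my own example): cov(A, B) is p x q when A has p columns and B has q, so symmetry, let alone positive definiteness, need not even be defined.

```r
## Cross-covariance of 3 variables against 2 variables (same number of
## observations in each) is a 3 x 2 rectangular matrix.
set.seed(1)
A <- matrix(rnorm(300), ncol = 3)  # 100 observations of 3 variables
B <- matrix(rnorm(200), ncol = 2)  # 100 observations of 2 variables
dim(cov(A, B))                     # 3 2: rectangular, so not symmetric
```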
Re: [R] Defining functions inside loops
This is a side effect of the lazy evaluation done in functions. Look at the help page for the force function for more details on how to force evaluation and solve your problem.

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Eduardo de Oliveira Horta Sent: Monday, November 15, 2010 1:50 PM To: r-help@r-project.org Subject: [R] Defining functions inside loops

Hello, I was trying to define a set of functions inside a loop, with the loop index working as a parameter for each function. Below I post a simpler example, to illustrate what I was intending:

f <- list()
for (i in 1:10){
  f[[i]] <- function(t){
    f[[i]] <- t^2 + i
  }
}
rm(i)

With that, I was expecting that f[[1]] would be a function defined by t^2+1, f[[2]] by t^2+2 and so on. However, the index i somehow doesn't get into the function definition on each loop; that is, the functions f[[1]] through f[[10]] are all defined by t^2+i. Thus, if I remove the object i from the workspace, I get an error when evaluating these functions. Otherwise, if I don't remove the object i, it ends the loop with value equal to 10 and then f[[1]](t)=f[[2]](t)=...=f[[10]](t)=t^2+10. I am aware that I could simply put f <- function(u,i){ f <- t^2+i } but that's really not what I want. Any help would be appreciated. Thanks in advance, Eduardo Horta
Re: [R] Need help with pointLabels()
Sorry, pointLabel() is in the package maptools: http://rgm2.lab.nig.ac.jp/RGM2/R_man-2.9.0/library/maptools/man/pointLabel.html

Thank you for the tips on the other packages, I will give them a try. -C

On Mon, Nov 15, 2010 at 3:54 PM, Greg Snow greg.s...@imail.org wrote: What package is pointLabel (or is it pointLabels) in? Giving a reproducible example includes stating packages other than the standard ones. What you are trying to do is not simple for general cases; some tools work better on some datasets, but others work better on other datasets. Some other tools that may help (but you will probably need to do some hand tweaking after using them, or base your solution on these) could be:

thigmophobe.labels in the plotrix package
spread.labels in the plotrix package
TkIdentify in the TeachingDemos package
spread.labs in the TeachingDemos package

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Craig Starger Sent: Monday, November 15, 2010 12:38 PM To: r-help@r-project.org Subject: [R] Need help with pointLabels()

Hello R-list, I am plotting a weighted linear regression in R. The points on my chart are also scaled to sample size, so some points are large, some are small. I have figured out everything I need using the plot() function: how to plot the points, scale them by sample size, weight the linear regression by sample size, plot that line, and plot the labels for the points. However, although the pointLabel() function assures that the labels do not overlap with each other, they still overlap with the larger points, and sometimes run off the side of the chart too. I have tried saving the plot as EPS, PDF, SVG and opening them in GIMP to try to move the text around, but GIMP does not recognize the text as text, so I cannot select the labels and move them off the points.
I have tried various offsets too but that does not work either. So I need to either (a) print the labels on the plot so that they do not overlap the points or (b) be able to move the text around in the resulting image file. Any advice? Here is the code I am using. BTW, I have also tried ggplot2 but it has no function (that I am aware of) like pointLabels() to avoid label overlap. Please feel free to email me directly. postscript(file=fig_a.eps); plot(x, y, xlab=X-axis, ylab=Y-axis, cex=0.02*sample_size, pch=21); abline(lm(y~x, weight=sample_size)) pointLabel(x, y, labels=Category, cex=1, doPlot=TRUE, offset=2.5); dev.off() Thank you, -craig.star...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
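One of the tools Greg mentions can be dropped into Craig's code almost unchanged. A sketch only, assuming x, y, sample_size, and Category exist as in the original post and that plotrix is installed: thigmophobe.labels() places each label on the side of its point that faces away from the nearest neighbouring point, which often avoids both label-label and label-point overlap.

```r
## Sketch: variable names follow the original post (assumed to exist)
library(plotrix)

postscript(file = "fig_a.eps")
plot(x, y, xlab = "X-axis", ylab = "Y-axis",
     cex = 0.02 * sample_size, pch = 21)
abline(lm(y ~ x, weights = sample_size))
## Each label goes on the side of its point away from the nearest neighbour
thigmophobe.labels(x, y, labels = Category, cex = 1)
dev.off()
```

Hand tweaking may still be needed for very large points, as Greg warns.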
Re: [R] Defining functions inside loops
You could make f[[i]] be function(t) t^2 + i for i in 1:10 with

  f <- lapply(1:10, function(i) local({ force(i); function(x) x^2 + i }))

After that we get the correct results:

  f[[7]](100:103)
  [1] 10007 10208 10411 10616

but looking at the function doesn't immediately tell you what 'i' is in the function:

  f[[7]]
  function (x)
  x^2 + i
  <environment: 0x19d7458>

You can find it in f[[7]]'s environment:

  get("i", envir = environment(f[[7]]))
  [1] 7

The call to force() in the call to local() is not necessary in this case, although it can help in other situations.

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

-----Original Message----- From: Eduardo de Oliveira Horta Sent: Monday, November 15, 2010 12:50 PM To: r-help@r-project.org Subject: [R] Defining functions inside loops

Hello, I was trying to define a set of functions inside a loop, with the loop index working as a parameter for each function. Below I post a simpler example to illustrate what I was intending:

  f <- list()
  for (i in 1:10){
    f[[i]] <- function(t){
      t^2 + i
    }
  }
  rm(i)

With that, I was expecting that f[[1]] would be a function defined by t^2+1, f[[2]] by t^2+2 and so on. However, the index i somehow doesn't get into each function's definition; that is, the functions f[[1]] through f[[10]] are all defined by t^2+i. Thus, if I remove the object i from the workspace, I get an error when evaluating these functions. Otherwise, if I don't remove the object i, the loop ends with i equal to 10, and then f[[1]](t)=f[[2]](t)=...=f[[10]](t)=t^2+10. I am aware that I could simply put

  f <- function(t, i) t^2 + i

but that's really not what I want. Any help would be appreciated.

Thanks in advance, Eduardo Horta
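The lapply()/local() idiom from Bill's reply can be condensed into a minimal self-contained sketch. Each call to local() creates a fresh environment holding its own copy of i, so every function captures a different value instead of the shared loop index:

```r
## One function per i, each closing over its own copy of i
f <- lapply(1:10, function(i) local({ force(i); function(x) x^2 + i }))

f[[7]](100:103)  # 10007 10208 10411 10616
f[[1]](2)        # 5  (2^2 + 1)
```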
Re: [R] Help with Hist
Hi Ivan / Dieter, Thanks for your assistance; I will try the suggestions. The plotrix solution seems the more elegant one, except that unless I'm missing something I will end up with a value-based graph rather than a density-based one, which is what I really wanted. It seems the only solution will be to subset the data, as has been suggested. Regards Steve

On 2010/11/15 06:45 PM, Ivan Calandra wrote: Well, another possibility would be to edit the plot so that you cut out the empty part (between 300 and 2000). There might be some function that can do it, maybe the plotrix::gap.barplot() that Dieter already told you about.

On 11/15/2010 16:22, sbsid...@mweb.co.za wrote: Thanks. What you have suggested of course works, but I am trying to reduce the 'ugliness'. Anybody got any other ideas? Regards Steve Sent from my BlackBerry® wireless device

-----Original Message----- From: Ivan Calandra ivan.calan...@uni-hamburg.de Date: Mon, 15 Nov 2010 16:08:47 Subject: Re: [R] Help with Hist

Hi, I think you should also give the upper extreme:

  x <- c(rnorm(80) + 10, 101:110, 2001:2010)
  hist(x, breaks = c(0, 20, 40, 60, 80, 100, 200, 500))
  Error in hist.default(x, breaks = c(0, 20, 40, 60, 80, 100, 200, 500)) :
    some 'x' not counted; maybe 'breaks' do not span range of 'x'
  hist(x, breaks = c(0, 20, 40, 60, 80, 100, 200, 500, 2100))
  ## which looks horrible, but works; up to you how to cut it

HTH, Ivan

On 11/15/2010 15:53, Steve Sidney wrote: Dear list, I am trying to re-scale a histogram using hist() but cannot seem to create a reduced scale where the upper values are not plotted. What I have is about 100 values, of which 80 or so are between 0 and 40, one or two in the hundreds, and an outlier around 2000. What I would like to do is create an x-scale that shows 5 bins between 0-100 and then 3-4 bins between 100 and 2000, but I don't need any resolution on the values above 100. If I use breaks = c(0, 20, 40, 60, 80, 100, 200, 500), R gives me an error saying that there are values not included, which of course I know, but I wish to ignore them. It seems that I am missing something quite simple. Any help would be appreciated. Regards Steve
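The subsetting route Steve settles on can be sketched like this, reusing Ivan's simulated data; the title records how many values were dropped so the plot is not silently misleading:

```r
## Simulated data in the spirit of Ivan's example
x <- c(rnorm(80) + 10, 101:110, 2001:2010)

## Plot only the values at or below 100; say how many were left out
hist(x[x <= 100], breaks = c(0, 20, 40, 60, 80, 100),
     main = sprintf("%d of %d values shown (rest > 100)",
                    sum(x <= 100), length(x)))
```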
Re: [R] Need help with pointLabels()
Hi Craig, As Greg pointed out, choosing optimal locations for labels is tricky (side note for anyone reading: pointLabel() comes from the maptools package). Inferring from your code, the labels you are plotting are the Category each point belongs to, which suggests there may not be a huge number of unique values. Assuming you have a reasonably small number of unique labels (say 10), what about colouring the points and adding a legend? This would be straightforward to code and completely sidesteps the label-placement issue. Of course, this can become quite clunky with numerous levels. You could do something similar with point shape, and you could even mix shape and colour to convey more information in the same space. Cheers, Josh

On Mon, Nov 15, 2010 at 11:37 AM, Craig Starger craig.star...@gmail.com wrote: [original question quoted in full above]

-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
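Josh's colour-plus-legend idea can be sketched as follows, again assuming x, y, sample_size, and Category exist as in the original post:

```r
## Colour points by Category instead of labelling each one
cats <- factor(Category)
cols <- seq_along(levels(cats))      # one palette index per level

plot(x, y, xlab = "X-axis", ylab = "Y-axis",
     cex = 0.02 * sample_size, pch = 21, bg = cols[cats])
abline(lm(y ~ x, weights = sample_size))
legend("topleft", legend = levels(cats), pch = 21, pt.bg = cols)
```

With more than a handful of levels, adding pch as a second channel (or dropping back to labels for a few key points) keeps the legend readable.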
[R] indexing lists
Hi List, I'm trying to work out how to use which(), or another function, to find the top-level index of a list item based on a condition. An example will clarify my question.

  a <- list(c(1,2), c(3,4))
  a
  [[1]]
  [1] 1 2
  [[2]]
  [1] 3 4

I want to find the top-level index of c(1,2), which should return 1, since

  a[[1]]
  [1] 1 2

I can't seem to work out the syntax. I've tried

  which(a == c(1,2))

and an error about coercing to double is returned. I can find the index of elements within a particular item with which(a[[1]] == c(1,2)) or which(a[[1]] == 1) etc., which return [1] 1 2 and [1] 1 respectively, as they should. Any thoughts? C
Re: [R] Defining functions inside loops
Thanks a lot for your readiness! Problem (apparently) solved! Best regards, Eduardo Horta

On Mon, Nov 15, 2010 at 7:10 PM, William Dunlap wdun...@tibco.com wrote: [Bill's lapply()/local() solution, quoted in full above]
Re: [R] indexing lists
Chris, Well, the 'answer' could be:

  which(sapply(a, function(x) all(x == c(1, 2))))

But I wonder how these elements of 'a' in your actual application come to be. If you're constructing them, you can give the elements of the list names, and then it doesn't matter what numerical index they have; you can just reference them by name:

  a <- list(name1 = 1:2, name2 = 3:4)
  a
  a <- c(anothername = list(9:10), a)
  a
  a$name1

Chris Carleton wrote: [original question quoted above]
Re: [R] indexing lists
Hi Chris, Does this do what you're after? It just compares each element of a (i.e., a[[1]] and a[[2]]) to c(1, 2) and determines whether they are identical:

  which(sapply(a, identical, y = c(1, 2)))

There were too many 1s floating around for me to figure out whether you wanted to find elements of a that matched the entire vector or subelements of a that matched elements of the vector (if that makes any sense). HTH, Josh

On Mon, Nov 15, 2010 at 1:24 PM, Chris Carleton w_chris_carle...@hotmail.com wrote: [original question quoted above]

-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
Re: [R] Defining functions inside loops
On 15/11/2010 4:10 PM, William Dunlap wrote: [Bill's lapply()/local() solution, quoted in full above]

I thought it would be clearer to use substitute. My first guess was

  f <- list()
  for (i in 1:10)
    f[[i]] <- substitute(function(t) t^2 + increment, list(increment = i))

but this doesn't work; f[[i]] ends up as an expression evaluating to the function we want. So it looks okay:

  f[[3]]
  function(t) t^2 + 3L

but it doesn't work:

  f[[3]](1)
  Error: attempt to apply non-function

So I tried to eval it:

  eval(f[[3]])
  function(t) t^2 + increment

Now that's a puzzle! Why did increment come back? But it works:

  eval(f[[3]])(1)
  [1] 4

(I do actually know the answer to the puzzle, but it took me a while to figure out. Spoiler below.) So here's what I'd recommend as an answer to the original question:

  save <- options(keep.source = FALSE)
  f <- list()
  for (i in 1:10)
    f[[i]] <- eval(substitute(function(t) t^2 + increment, list(increment = i)))
  options(save)

Then things look just right:

  f[[3]]
  function (t)
  t^2 + 3L
  f[[3]](4)
  [1] 19
  f[[3]](1)
  [1] 4

Of course, I didn't quite meet my objective of coming up with code that looks simpler than your lapply() suggestion. But at least we end up with functions with simple environments that print properly.

Duncan Murdoch

-----Original Message----- From: Eduardo de Oliveira Horta Sent: Monday, November 15, 2010 12:50 PM Subject: [R] Defining functions inside loops [original question quoted above]
[R] How to plot effect of x1 while controlling for x2
Hello R-helpers, Please see a self-contained example below, in which I attempt to plot the effect of x1 on y while controlling for x2. Is there a function that does the same thing without having to specify that x2 should be held at its mean value? It works fine for this simple example, but it might be cumbersome if the model were more complex (e.g., lots of x variables, and/or interactions). Many thanks, Mark

  # make some random data
  x1 <- rnorm(100)
  x2 <- rnorm(100, 2, 1)
  y <- 0.75*x1 + 0.35*x2

  # fit a model
  model1 <- lm(y ~ x1 + x2)

  # predict the effect of x1 on y, while controlling for x2
  xv1 <- seq(min(x1), max(x1), 0.1)
  yhat_x1 <- predict(model1, list(x1 = xv1, x2 = rep(mean(x2), length(xv1))),
                     type = "response")

  # plot the predicted values
  plot(y ~ x1, xlim = c(min(x1), max(x1)), ylim = c(min(y), max(y)))
  lines(xv1, yhat_x1)
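Two possibilities worth trying, sketched under the assumption that model1 has been fitted as above: base R's termplot(), which plots each term's contribution while the other terms are held fixed automatically, and (if installed) the effects package, which computes adjusted predictions for a focal predictor:

```r
## termplot() ships with base R (package stats); one partial plot per term
termplot(model1, terms = "x1", partial.resid = TRUE)

## The effects package does the averaging/holding-constant for you
## (assumes the package is installed)
# library(effects)
# plot(effect("x1", model1))
```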
[R] Executing Command on Multiple R Objects
Hello Everyone - I want to print a number of results from lme function objects out to a txt file. How could I do this more efficiently than what you see here?

  out2 <- capture.output(summary(mod2a))
  out3 <- capture.output(summary(mod3))
  out4 <- capture.output(summary(mod5))
  out5 <- capture.output(summary(mod6))
  out6 <- capture.output(summary(mod7))
  cat(out2, file = "out.txt", sep = "\n", append = TRUE)
  cat(out3, file = "out.txt", sep = "\n", append = TRUE)
  cat(out4, file = "out.txt", sep = "\n", append = TRUE)
  cat(out5, file = "out.txt", sep = "\n", append = TRUE)
  cat(out6, file = "out.txt", sep = "\n", append = TRUE)
  cat(third, file = "out.txt", sep = "\n", append = TRUE)

Here's an example of what I tried, but it didn't work:

  for (i in ls(pat = "mod")) {
    out <- capture.output(summary[[i]])
    cat(out, file = "results_paired.txt", sep = "\n", append = TRUE)
  }

-- View this message in context: http://r.789695.n4.nabble.com/Executing-Command-on-Multiple-R-Objects-tp3043871p3043871.html Sent from the R help mailing list archive at Nabble.com.
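The attempted loop has two fixable problems: summary[[i]] subsets the summary function instead of calling it, and i is a character name, so the object must first be fetched with get(). A sketch, assuming the models live in the global environment with names starting with "mod":

```r
## Append each model's summary to the file, labelled with the object's name
for (nm in ls(pattern = "^mod")) {
  out <- capture.output(summary(get(nm)))
  cat("====", nm, "====", file = "out.txt", sep = "\n", append = TRUE)
  cat(out, file = "out.txt", sep = "\n", append = TRUE)
}
```

Keeping the models in a named list instead of as loose workspace objects would make this simpler still (lapply over the list, no ls()/get() needed).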
Re: [R] indexing lists
On Nov 15, 2010, at 4:24 PM, Chris Carleton wrote: [original question quoted above, ending:] I've tried which(a == c(1,2)) ...

It's a bit more involved than that (since which() expects a vector of logicals and mapply() expects a set of list arguments). I needed to send mapply() a MoreArgs list that would remain constant from test to test when using identical:

  which(mapply(identical, a, MoreArgs = list(c(1, 2))))
  [1] 1

(Admittedly very similar to Iverson's solution, but it is more readable to my eyes. On the other hand, my method may stumble on more complex objects with attributes. You should read the identical help page in any case.)

... and an error about coercing to double is returned. I can find the index of elements of a particular item by which(a[[1]]==c(1,2)) or which(a[[1]]==1) ...

But neither of those tests the overall equality of a[[1]]'s contents with c(1,2), which appears to be your goal. Iverson's all() and my identical() accomplish that goal.

-- David Winsemius, MD West Hartford, CT
[R] hclust, does order of data matter?
Hello, I am using the hclust function to cluster some data. I have two separate files with the same data; the only difference is the order of the data in the file. For some reason, when I run the two files through the hclust function, I get two completely different results. Does anyone know why this is happening? Does the order of the data matter? Thanks, RC

-- View this message in context: http://r.789695.n4.nabble.com/hclust-does-order-of-data-matter-tp3043896p3043896.html Sent from the R help mailing list archive at Nabble.com.
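When distances are tied, hclust() can merge in a different order for differently ordered input, so dendrograms may look different even when the underlying clusterings agree. A quick way to check whether two runs really disagree, beyond tie-induced relabelling, is to compare cutree() memberships after undoing the permutation (a sketch with simulated data standing in for the two files):

```r
set.seed(1)
x <- matrix(rnorm(20), nrow = 10)   # stand-in for the real data
perm <- sample(nrow(x))             # second "file": same rows, shuffled

h1 <- cutree(hclust(dist(x)),         k = 3)
h2 <- cutree(hclust(dist(x[perm, ])), k = 3)

## If the clusterings agree up to label names, this cross-table has
## exactly one non-zero cell per row and per column
table(h1, h2[order(perm)])
```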