Re: [R] Serverless databases in R

2010-04-19 Thread Barry Rowlingson
On Sun, Apr 18, 2010 at 11:30 PM, kMan kchambe...@gmail.com wrote:
> It was my understanding that .Rdata files were not very portable, and do not
> natively handle queries. Otherwise we'd all just use .RData files instead of
> farming the work out to SQL drivers & external libraries, and colleagues who
> use, e.g. SAS or SPSS would also have no trouble with them.

 The "platform" in "cross-platform" to me generally means the
operating system on which a program is running - and .Rdata files are
perfectly portable between versions of R on Linux, MacOSX, Windows,
Solaris etc. You didn't mention portability to other statistical
packages. You also didn't mention needing SQL, or what you wanted to
do with your databases. I figured I'd just mention .Rdata files for
completeness!
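
To illustrate the point about .RData files as a simple serverless store, here is a minimal sketch using only base R. The object name, file name and values are made up for illustration; the same file can be loaded by R on another operating system.

```r
# Toy data frame; object and file names are illustrative only.
scores <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

rdata_path <- file.path(tempdir(), "scores.RData")
save(scores, file = rdata_path)

# Load into a fresh environment so nothing in the workspace is overwritten.
restored <- new.env()
loaded_names <- load(rdata_path, envir = restored)

# load() returns the names of the objects it restored.
print(loaded_names)
print(identical(restored$scores, scores))
```

Loading into a separate environment is a design choice here: it makes it easy to see exactly which objects came out of the file.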

 There's also RJDBC and RODBC which can interface to anything with a
JDBC or ODBC interface on your system.

 A .RData file could be considered as a serverless NoSQL database.
There's a GSOC proposal to investigate interfaces to NoSQL databases
and some info here:

http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2010:nosql_interface

 Isn't it odd that the open-source R community has developed functions
for reading in proprietary SAS and SPSS format files, but (AFAIK) the
commercial sector doesn't seem to support reading data from
open-sourced and open-specced R .Rdata files?

Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Scanning only specific columns into R from a VERY large file

2010-04-19 Thread Rubén Roa
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
behalf of Josh B
Sent: Saturday, 17 April 2010 0:12
To: R Help
Subject: [R] Scanning only specific columns into R from a VERY large file

Hi,

I turn to you, the R Sages, once again for help. You've never let me down!

(1) Please make the following toy files:

x <- read.table(textConnection("var.1 var.2 var.3 var.1000
indv.1 1 5 9 7
indv.21 2 9 3 8"), header = TRUE)

y <- read.table(textConnection("var.3 var.1000"), header = TRUE)

write.csv(x, file = "x.csv")
write.csv(y, file = "y.csv")

(2) Pretend you are starting with the files x.csv and y.csv. They come from 
another source -- an online database. Pretend that these files are much, much, 
much larger. Specifically: 
(a) Pretend that x.csv contains 1000 columns by 210,000 rows. 
(b) y.csv contains just header titles. Pretend that there are 90 header 
titles in y.csv in total. These header titles are a subset of the header 
titles in x.csv.

(3) What I want to do is scan (or import, or whatever the appropriate word is) 
only a subset of the columns from x.csv into R. Specifically, I only want 
to scan the columns of data from x.csv into R that are listed in the file 
y.csv. I still want to scan in all 210,000 rows from x.csv, but only for the 
aforementioned columns listed in y.csv.

Can you guys recommend a strategy for me? I think I need to use the scan 
command, based on the hugeness of x.csv, but I don't know what exactly to do. 
Specific code that gets the job done would be the most useful. 

Thank you very much in advance!
Josh

---
Try with something like

do.call(cbind,scan(file="yourfile.csv",what=list(NULL,NULL,...,0,NULL,0,NULL,NULL,...,NULL),flush=TRUE))
 

You have to work out how to build the list for the 'what' argument from the 
headers in 'y'. In the above, the only columns read are those indicated by a '0'; 
fields whose entry is NULL are skipped.
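
A minimal sketch of the same idea, using read.csv() with colClasses instead of scan() (a swapped-in but closely related technique: a "NULL" class skips a column). The toy files and the "id" header below are assumptions made only so the example is self-contained.

```r
# Toy stand-ins for x.csv and y.csv (contents are illustrative only).
x_path <- tempfile(fileext = ".csv")
y_path <- tempfile(fileext = ".csv")
writeLines(c("id,var.1,var.2,var.3,var.1000",
             "indv.1,1,5,9,7",
             "indv.2,2,9,3,8"), x_path)
writeLines("var.3,var.1000", y_path)

# Read only the header line of each file.
x_names <- scan(x_path, what = "", sep = ",", nlines = 1, quiet = TRUE)
y_names <- scan(y_path, what = "", sep = ",", nlines = 1, quiet = TRUE)

# The string "NULL" tells read.csv() to skip that column entirely.
col_classes <- ifelse(x_names %in% y_names, "numeric", "NULL")

subset_x <- read.csv(x_path, colClasses = col_classes)
print(subset_x)
```

Because unwanted columns are never stored, memory use is driven by the 90 wanted columns rather than all 1000.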

HTH

Ruben 



 

Dr. Rubén Roa-Ureta
AZTI - Tecnalia / Marine Research Unit
Txatxarramendi Ugartea z/g
48395 Sukarrieta (Bizkaia)
SPAIN



[R] Odp: multiple variables pointing to single dataframe?

2010-04-19 Thread Petr PIKAL
Hi

r-help-boun...@r-project.org wrote on 16.04.2010 16:15:40:

> Hi, I have a need to have 2 variables point to the same dataframe (d1), I

What does it mean to "point to" a data frame? It seems to me that this is 
something from C++.

You can reference a data frame by $ or by square brackets with as many 
variables as you want.

See

?"["

regards
Petr
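
For readers wondering what happens on d2 <- d1, here is a small sketch in base R. It shows that assignment gives an independent value once either object is modified, and that an environment can be used when genuinely shared state is wanted. The names store and alias are made up for illustration.

```r
d1 <- data.frame(a = 1:3)
d2 <- d1             # no deep copy is made at this point
d2$a[1] <- 100L      # modifying d2 leaves d1 untouched
print(d1$a[1])
print(d2$a[1])

# Environments are not copied on assignment, so two names can share one
# data frame by both referring to the same environment.
store <- new.env()
store$df <- data.frame(a = 1:3)
alias <- store
alias$df$a[1] <- 100L
print(store$df$a[1])
```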


> don't want to simply copy the dataframe (d2 <- d1), as my understanding is
> that this will create a second dataframe. Any suggestions on best practice
> here?
 
 Thank You,
 
 //
 // Alex Bryant
 // Software Developer
 // Integrated Clinical Systems, Inc.
 // 908-996-7208
 
 
 


[R] Truncated Normal Distribution and Truncated Pareto distribution

2010-04-19 Thread Julia Cains
Dear R helpers,

I have a bimodal dataset dealing with loss amounts. I have divided this dataset 
into two, with the bounds for the first dataset, i.e. dataset-A, being $5,000 to 
$100,000, while dataset-B deals with the losses exceeding $100,000, i.e. 
dataset-B is left truncated. 

I need to fit a truncated normal distribution to the first dataset, which has a 
lower bound of 5,000 and an upper bound of 100,000, and a truncated Pareto 
distribution to the losses exceeding $100,000.

Is there any package in R which will guide me to fit these two distributions 
and also give Kolmogorov-Smirnov (KS) and Anderson-Darling test results?

Please guide

Julia







Only a man of Worth sees Worth in other men






  


Re: [R] Truncated Normal Distribution and Truncated Pareto distribution

2010-04-19 Thread yves croissant
The truncreg package fits the truncated normal model.





Re: [R] Truncated Normal Distribution and Truncated Pareto distribution

2010-04-19 Thread Rubén Roa
See

library(MASS)
?fitdistr

You can define your customized truncated density as a function and pass it as 
the densfun argument of fitdistr.
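
A minimal sketch of that suggestion for the truncated normal case. The functions dtnorm and ptnorm, the simulated data and the choice to work in thousands of dollars are all assumptions made for illustration; they are not part of MASS. Note also that a KS p-value computed with estimated parameters is only approximate.

```r
library(MASS)  # provides fitdistr()

# Density of a normal distribution truncated to [a, b].
# Amounts are in thousands of dollars to keep the optimiser well scaled.
dtnorm <- function(x, mean, sd, a = 5, b = 100) {
  mass <- pnorm(b, mean, sd) - pnorm(a, mean, sd)
  ifelse(x >= a & x <= b, dnorm(x, mean, sd) / mass, 0)
}

# Matching distribution function, needed for the KS test.
ptnorm <- function(q, mean, sd, a = 5, b = 100) {
  p <- (pnorm(q, mean, sd) - pnorm(a, mean, sd)) /
       (pnorm(b, mean, sd) - pnorm(a, mean, sd))
  pmin(pmax(p, 0), 1)
}

# Simulated stand-in for the first dataset (not real loss data).
set.seed(1)
draws  <- rnorm(5000, mean = 40, sd = 20)
losses <- draws[draws >= 5 & draws <= 100]

# Maximum likelihood fit with the custom density.
fit <- fitdistr(losses, dtnorm,
                start = list(mean = mean(losses), sd = sd(losses)))
est <- fit$estimate
print(est)

# Kolmogorov-Smirnov test against the fitted truncated normal.
ks <- ks.test(losses, ptnorm, mean = est[["mean"]], sd = est[["sd"]])
print(ks$p.value)
```

The same pattern should carry over to a truncated Pareto by swapping in the corresponding density and distribution functions.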

See also
http://www.mail-archive.com/r-h...@stat.math.ethz.ch/msg34540.html
http://www.mail-archive.com/r-h...@stat.math.ethz.ch/msg34548.html

HTH


 

Dr. Rubén Roa-Ureta
AZTI - Tecnalia / Marine Research Unit
Txatxarramendi Ugartea z/g
48395 Sukarrieta (Bizkaia)
SPAIN








[R] R's unfortunate treatment of X11 failures

2010-04-19 Thread Adam D. I. Kramer

Hi,

I use R in a terminal environment on a Linux box, using X11 for
graphics, often tunnelled to the terminal I'm using.  When an internet
connection dies (say, if the ssh connection dies when forwarding), R usually
gives a scary warning like "Error: X11 fatal IO error: please save work and
shut down R." That's all well and good.

Then, if I ignore it and try to plot() something, R dies quite
ungracefully, producing (e.g.) something like this:

R: ../../src/xcb_io.c:385: _XAllocID: Assertion `ret != inval_id' failed.
Aborted

...and returning me to the shell. If I have not saved, I have lost my work.

Clearly, this is my fault--I ignored the warning. However, I just had the
above happen to me without my being warned...I lost some data, but not a
lot.  I am working on creating a reproducible case currently.

That said, in the meantime, I would suggest a broader fix. I'm sure we can
agree that R aborting due to a failed assertion like this is pretty
unfortunate.  In the case of X11 failures like this, however, there is an
alternative: print an error message and take preventative action, e.g.

"Error: X11 fatal IO error: X11 device disabled." ...and then call dev.off().

Indeed, if I just run dev.off() when I see this error, nothing about R seems
corrupt at all.  I can start a new X11 window with X11("localhost:10"), and
things (such as plotting) function fine.

Cordially,
Adam



[R] BRugs

2010-04-19 Thread flipha23

Hi. I am new here, and I am writing this Winbugs code with BRugs.
n=length(bi.bmi)
Lagegp=13
Lgen=2
Lrace=5
Lstra=15
Lpsu=2
#model gen x race
bi.bmi.model=function(){
# likelihood
for (i in 1:n){
bi.bmi[i]~ dbern(p[i])
logit(p[i]) <- a0 + a1[agegp[i]] + a2[gen[i]] + a3[race[i]] +
   a12[agegp[i], gen[i]] +
   gam[stra[i]] + u[psu[i],stra[i]] }
# constraints for a1, a2, a3, a12
a1[1] <- 0.0
a2[1] <- 0.0
a3[1] <- 0.0
a12[1,1] <- 0.0
#
for(k in 2:Lgen){ a12[1,k] <- 0.0}
for(j in 2:13){ a12[j,1] <- 0.0}

# priors
a0~ dnorm(0.0, 1.0E-4)
for(i in 2:13){a1[i]~dnorm(0.0, 1.0E-4)}
for(j in 2:Lgen){ a2[j]~ dnorm(0.0, 1.0E-4)}
for(k in 2:Lrace){ a3[k]~ dnorm(0.0, 1.0E-4)}

for(i in 2:Lagegp){
for(j in 2:Lgen){
a12[i,j]~ dnorm(0.0, 1.0E-4)
}}
for(i in 1:Lstra){gam[i]~dunif(0, 1000)}
for( i in 1:Lpsu){
for(j in 1:Lstra){
u[i,j]~ dnorm(0.0, tau.u)
}}
tau.u <- pow(sigma.u, -2)
sigma.u~ dunif(0.0,100)
}
 
library(BRugs)
writeModel(bi.bmi.model, con='bi.bmi.model2.txt')
bi.bmi.model.data=list('n', 'Lagegp','Lgen', 'Lrace', 'Lstra', 'Lpsu',
 'stra', 'psu','bi.bmi','agegp', 'gen', 'race')
bi.bmi.model.init=function(){
list( sigma.u=runif(1),
a0 <- rnorm(1), 
a1 <- c(NA,rep(0, 12)),
a2 <- c(NA, rep(0, Lgen-1)),
a3 <- c(NA, rep(0, Lrace-1)),
a12 <- matrix( c(rep(NA, 13), NA,rep(0, 12)), ncol=2), 

gam <- rep(1,Lstra), u <- matrix(rep(0, 30), nrow=2))
}
bi.bmi.model.parameters=c( 'a0', 'a1', 'a2', 'a3',  'a12')
bi.bmi.model.bugs=BRugsFit(modelFile='bi.bmi.model2.txt',
   data=bi.bmi.model.data,
   inits=bi.bmi.model.init,
   numChains=1, 
   para=bi.bmi.model.parameters,
   nBurnin=20, nIter=40)

When I run this I get this message.
model is syntactically correct
data loaded
array index is greater than array upper bound for a1
[1] C:\\DOCUME~1\\Owner\\LOCALS~1\\Temp\\RtmpNvSdyb/inits1.txt
Initializing chain 1: model must be compiled before initial values loaded
model must be initialized before updating
model must be initialized before DIC an be monitored
Error in samplesSet(parametersToSave) : 
  model must be initialized before monitors used

I checked the main effects alone, and that works fine, so I don't really
understand why it's saying "array index is greater than array upper bound
for a1".
I would greatly appreciate any help with this.
Thanks.



[R] Kaplan-Meier survfit problem

2010-04-19 Thread ericyujin99

When I try to run the code from library(survival) of library(ISwR),

the following code 

survfit(Surv(days,status==1))

which should produce Kaplan-Meier estimates, gives the following error:

Error in survfit(Surv(days, status == 1)) : 
  Survfit requires a formula or a coxph fit as the first argument

How can this be done in R 2.10?


Re: [R] Kaplan-Meier survfit problem

2010-04-19 Thread Dimitris Rizopoulos

you need:

survfit(Surv(days, status == 1) ~ 1)
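
A small self-contained sketch of the formula form, using made-up data (the values in toy are illustrative, not from ISwR). The "~ 1" on the right-hand side asks for a single overall Kaplan-Meier curve with no grouping variable.

```r
library(survival)

# Hypothetical follow-up times (days) and event indicator (1 = event, 0 = censored).
toy <- data.frame(days   = c(5, 8, 12, 20, 25, 31),
                  status = c(1, 0, 1, 1, 0, 1))

# Kaplan-Meier estimate for the whole sample.
km <- survfit(Surv(days, status == 1) ~ 1, data = toy)
print(summary(km))
```

Replacing 1 with a factor (for example a treatment group) would give one curve per level.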


I hope it helps.

Best,
Dimitris




--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014



Re: [R] xtabs() of proportions, and naming a dimension (not a row)

2010-04-19 Thread Jeff Brown

Thanks a lot, David and Dennis!  Also, your suggestions for how I could have
better stated my question are duly noted, and appreciated.


[R] Natural cubic splines produced by smooth.Pspline and predict function in the package pspline

2010-04-19 Thread Szymon Marszalek
Hello,

I am using R and the smooth.Pspline function in the pspline package to
smooth some data by using natural cubic splines. After fitting a
sufficiently smooth spline using the following call:

(ps=smooth.Pspline(x,y,norder=2,spar=0.8,method=1))

[the values of x are age in years from 1 to 100]

I tried to check that R in fact had fitted a natural cubic spline by
checking that the resulting spline was LINEAR outside the knots. I did this
by plotting the predicted values from the spline fitting in the following
way:

plot(predict(ps,c(seq(100,150,1))))

Unfortunately, the trend beyond the region of knots (i.e. over x values of
100) was far from linear - it was some sort of exponentially increasing
trend. My understanding of natural cubic splines (from Green and Silverman -
Nonparametric regression and generalized linear models, 1994) is that a
natural cubic spline is a series of cubic polynomials joined at a set of
knots in such a way that first and second derivatives are equal at all
knots. Furthermore, a natural cubic spline has a knot at every data point,
and is LINEAR on the range outside of its knots.
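
As a point of comparison, the linearity property can be checked numerically with base R's splinefun(x, y, method = "natural"). This is an interpolating natural cubic spline, not the smoother from the pspline package, and the data below are invented for illustration. If the curve is linear beyond the last knot, its second differences on an evenly spaced grid there should be numerically zero.

```r
# Made-up data on the same age range as in the question.
x <- 1:100
y <- log(x) + sin(x / 10)

# Interpolating natural cubic spline from base R (stats package).
f <- splinefun(x, y, method = "natural")

# Evaluate beyond the last knot and take second differences.
outside     <- seq(100, 150, by = 1)
second_diff <- diff(f(outside), differences = 2)
print(max(abs(second_diff)))
```

Running the same second-difference check on the values returned by predict() for the fitted object would show directly whether that fit extrapolates linearly.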

This leads me to ask what smooth.Pspline actually does to
the data, and whether it really fits a natural cubic spline as stated in
the help file. Or does smooth.Pspline work as I expect, and I am using
predict incorrectly?

I thank you for your time.
Szymon



[R] dataframe

2010-04-19 Thread n.via...@libero.it
Hi all,
I'm trying to load a CSV file in which all the variables must be of type 
numeric. The object is a data frame. When I load the file, what I get is a data 
frame in which the variables are of type factor. How can I get variables of type 
numeric?
Thanks all



Re: [R] Unwanted boxes in legend

2010-04-19 Thread Steve Murray

Dear all,

Thanks for the response, however I'm getting the following error message when I 
execute the legend command using the 'border' argument:

Error in legend(10, par("usr")[4], c("A", "B",  : 
  unused argument(s) (border = FALSE)


Is anyone aware of any alternative means of switching off boxes around all but 
one of the elements in a legend?

Many thanks for any input,

Steve



 Date: Thu, 15 Apr 2010 12:13:40 -0600
 From: ehl...@ucalgary.ca
 To: smurray...@hotmail.com
 CC: r-help@r-project.org
 Subject: Re: [R] Unwanted boxes in legend

 On 2010-04-15 11:10, Steve Murray wrote:

 Dear all,

 I am using the following code to generate a legend in my plot (consisting of 
 both bars and points), but end up with boxes around my points:

 legend(10, par("usr")[4], c("A", "B", "C", "D"), fill=c(NA, NA, "grey28", 
 NA), pch=c(16,4,NA,18), col=c("red","blue","grey28","yellow"), lty=FALSE, 
 bty="n", horiz=FALSE)

 I want a box around the third element of the legend (to represent the bar 
 'fill' colour), but not for the others, where points are shown instead.

 What am I doing wrong above and how do I correct it?

 Add the 'border' argument:

 either

 border = FALSE # in which case no box is drawn for any element

 or

 border = c(NA, NA, "black", NA)

 -Peter Ehlers


 Many thanks,

 Steve





 --
 Peter Ehlers
 University of Calgary
  



Re: [R] BRugs

2010-04-19 Thread Bob O'Hara
On 19 April 2010 06:04, flipha23 neungsoo...@gmail.com wrote:


 Hi. I am new here, and I am writing this Winbugs code with BRugs.


snip

When I run this I get this message.
 model is syntactically correct
 data loaded
 array index is greater than array upper bound for a1
 [1] C:\\DOCUME~1\\Owner\\LOCALS~1\\Temp\\RtmpNvSdyb/inits1.txt
 Initializing chain 1: model must be compiled before initial values loaded
 model must be initialized before updating
 model must be initialized before DIC an be monitored
 Error in samplesSet(parametersToSave) :
  model must be initialized before monitors used

 I checked the main effects alone, and it works fine, so I don't really
 understand why it's saying, array index is greater than array upper bound
 for a1.
 Anyone who could help me with this would be greatly appreciated.
 Thanks.

 I wonder if it meant "array index is greater than array upper bound
for a12", and the 2 got chopped off. Anyway, try to compile without your
arrays (u and a12), and see which one causes the trap. After that, you might
have to do some digging to find the problem. Sometimes you have to check the
data files that are saved, to make sure they contain what you want.
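
One way to do that digging from the R side is to compare each index vector with the dimension declared for it before calling BRugs. The sketch below uses invented toy vectors (one value is deliberately out of range); in the real case the vectors would be the poster's agegp, gen and race data.

```r
# Hypothetical toy data standing in for the poster's index variables.
Lagegp <- 13; Lgen <- 2; Lrace <- 5
agegp <- c(1, 4, 13, 14)   # 14 exceeds Lagegp on purpose
gen   <- c(1, 2, 2, 1)
race  <- c(1, 5, 3, 2)

limits  <- c(agegp = Lagegp, gen = Lgen, race = Lrace)
indices <- list(agegp = agegp, gen = gen, race = race)

# TRUE where an index vector steps outside 1..limit.
out_of_bounds <- mapply(function(idx, lim) any(idx < 1 | idx > lim),
                        indices, limits[names(indices)])
print(out_of_bounds)
```

An index flagged here would be a natural candidate for an "array index is greater than array upper bound" message at compile time.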

BTW, you have some loops written as 2:13. It might be better to write them
as 2:Lagegp, just for consistency.

Bob

-- 
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40216
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Google Wave: rni@googlewave.com
Journal of Negative Results - EEB: www.jnr-eeb.org



Re: [R] Serverless databases in R

2010-04-19 Thread kMan
Interesting info, Barry. Thank you.

Sorry about the vagueness. I was concerned about restricting responses by
imposing SQL terminology on options I've overlooked. I got .RData and
workspace images mixed up. I do not know about the private sector's
willingness to read open-source, I was guessing (and hoping to be corrected
otherwise) that they might offer importing from common dbm/rdbms. 

So RSQLite and RH2 provide functionality for serverless dbs, and Gabor's
sqldf sounds like a sweet way to interact with those. The NoSQL
(non-relational dbm) projects sound interesting & permit queries. A case
can be made for .RData files to be NoSQL and queryless serverless databases.
Perhaps I could use sqldf to pull off queries on .RData files? I thought
ODBC was Windows-specific? Should XML be in the NoSQL project list, or is
it considered too far along?

Sincerely,
KeithC.



[R] Equivalent to Python os.walk?

2010-04-19 Thread Albert-Jan Roskam
Hi,
 
I would like to recursively loop through all subfolders of a directory and do 
stuff with certain file types in those dirs. Is there a package/function that 
could do this? So it's more than Sys.glob. I'm looking for an equivalent of 
Python's os.walk *) and I don't want to reinvent the wheel.
 
Thank you.

Cheers!!
Albert-Jan
 
*) http://www.saltycrane.com/blog/2007/03/python-oswalk-example/
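
A minimal sketch of one base R approach: list.files() with recursive = TRUE walks all subfolders, and pattern filters by file type. The directory layout below is created in a temporary folder purely for illustration.

```r
# Build a small throwaway directory tree (names are illustrative only).
root <- file.path(tempdir(), "walk_demo")
dir.create(file.path(root, "a", "b"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, "top.csv"),
            file.path(root, "a", "mid.txt"),
            file.path(root, "a", "b", "deep.csv"))

# Recursively collect every .csv file below root, with full paths.
csv_files <- list.files(root, pattern = "\\.csv$",
                        recursive = TRUE, full.names = TRUE)

# "Do stuff" with each file; here we just print its name and folder.
for (f in csv_files) {
  cat(basename(f), "in", dirname(f), "\n")
}
```

Unlike os.walk, this returns one flat vector of paths rather than a (folder, subfolders, files) triple per directory; dirname() recovers the folder when it is needed.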

~~
All right, but apart from the sanitation, the medicine, education, wine, public 
order, irrigation, roads, a fresh water system, and public health, what have 
the Romans ever done for us?
~~


  


[R] Formatting data, adding column names, use reshape, a newbie question

2010-04-19 Thread Paul Rigor (ucla)
Hi all,
I'm an R novice.

I have data that's already formatted as "molten", which reshape should be able
to work with. For example, the following was read in with
read.csv(filename, sep=" ", header=FALSE)

  V1   V2 V3 V4 V5
1originalbookbook.source1.txt3289004943039.525
2originalbookbook.source1.txt3289004943057.952


I would like to add column names so I can use reshape's cast method.

How do I go about that?

Thanks,
Paul
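
A minimal sketch of assigning column names in base R. The two data rows and the five column names are hypothetical, since the original values are not fully recoverable from the post; only the mechanism (names(df) <- ...) is the point.

```r
# Hypothetical space-separated data with no header row.
raw_text <- "original book book.source1.txt 1 39.525
original book book.source1.txt 2 57.952"

molten <- read.table(text = raw_text, header = FALSE, sep = " ",
                     stringsAsFactors = FALSE)

# Replace the default V1..V5 with meaningful (made-up) names.
names(molten) <- c("version", "type", "source", "id", "value")
print(molten)
```

The names could equally be supplied at read time through the col.names argument of read.table() or read.csv().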



[R] Odp: dataframe

2010-04-19 Thread Petr PIKAL
Hi


r-help-boun...@r-project.org wrote on 19.04.2010 10:12:59:

> Hi all,
> I'm trying to load a csv file in which all the variables must be of type
> number. The object is a dataframe. When i load the file what i get is a
> dataframe

You probably have non-numeric data in your original CSV. Either you can 
correct it before reading it, or you could try to force the colClasses 
argument of whichever read.* function you used (you did not say which).

Another option is to remove the non-numeric items and change the factors to 
numeric; 

see
?as.numeric and ?as.character
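
Both options can be sketched with a tiny made-up CSV in which one stray "n/a" entry turns a column into a factor. The text and the "n/a" marker are assumptions for illustration.

```r
# Toy CSV: column b contains a non-numeric entry.
csv_text <- "a,b\n1,2.5\n2,n/a\n3,4.0"

# Reading with factors enabled reproduces the situation in the question.
df <- read.csv(text = csv_text, stringsAsFactors = TRUE)
print(class(df$b))

# Convert via character first; as.numeric() on a factor alone would
# return the internal level codes, not the original numbers.
b_num <- suppressWarnings(as.numeric(as.character(df$b)))
print(b_num)

# Alternatively, declare the missing-value marker and the classes up front.
df2 <- read.csv(text = csv_text, na.strings = "n/a", colClasses = "numeric")
print(sapply(df2, class))
```

The second form fails loudly if any other non-numeric value is present, which can be a useful check on the file.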

Regards
Petr

> in which the variables are of type factor. How can I get variables of type
> number???
> Thanks all


Re: [R] how to use Excel VBA's Shell() to call and execute R file

2010-04-19 Thread Guy Green

Hi KZ,

I don't think that I can answer what I think is the precise question - how
to run the R file from VBA but without using RExcel.

However with RExcel installed, I have found it very straightforward to run R
code from within VBA (thanks Erich Neuwirth - RExcel is great).  The VBA
code is:

RInterface.StartRServer
RInterface.RunRFile "C:\Excel_R_script.txt"
RInterface.StopRServer

That is all that it takes.  Excel_R_script.txt is just standard R code, in
a text file.  Is there a reason you don't want to use RExcel?

Guy


Re: [R] glmer with non integer weights

2010-04-19 Thread Kay Cichini

hi emmanuel,

thanks a lot for your extensive answer.
do you think using the asin(sqrt()) transformation can be justified for
publishing purposes, or do i have to expect criticism?

naively i excluded that possibility because of violated anova assumptions,
but if i understood you correctly, the finite range rather poses a problem here.

why is it an advantage in this special case? 

greetings,
kay



[R] comparing attitudes of 2 groups / likert scales?

2010-04-19 Thread Mona_m

Hi,

I have just found this forum, and it looks like a great place to get some
help (I hope)
For my dissertation, which is due way too soon, I am doing a survey,
comparing attitudes of 2 independent groups, with 5-point Likert-scale questions.
Basically I want to show if they have similar or different attitudes. I am
testing 4 hypotheses, and have in total about 20 questions. 

I have to say my statistic skills are very basic and very rusty, we had some
lectures two years ago, where we were introduced to R. I looked through my
notes, and back then we did a one sample t-test to analyse likert type
questions. I believe I would need to do a 2 sample unpaired t-test.  It
would be great if someone could give me some feedback if this test is the
most suitable one for my purpose, and maybe could explain to me what’s the
easiest way to do this in R?

You would help me loads!! 
Many thanks in advance
Mona
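
For reference, a minimal sketch of how an unpaired two-sample comparison can be run in base R. The responses are simulated, not survey data, and whether a t-test or a rank-based test is more suitable for a given Likert item is a judgment the sketch does not settle; wilcox.test() is shown only as a commonly used alternative for ordinal responses.

```r
# Simulated answers (1 to 5) to one question from two independent groups.
set.seed(42)
group_a <- sample(1:5, 30, replace = TRUE, prob = c(0.10, 0.20, 0.30, 0.25, 0.15))
group_b <- sample(1:5, 30, replace = TRUE, prob = c(0.25, 0.30, 0.25, 0.10, 0.10))

# Unpaired two-sample t-test (Welch version, R's default).
tt <- t.test(group_a, group_b)
print(tt)

# Rank-based alternative; exact = FALSE because Likert data contain many ties.
wt <- wilcox.test(group_a, group_b, exact = FALSE)
print(wt)
```

With about 20 questions, the same call would be repeated per question (or per combined scale), keeping multiple testing in mind.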



Re: [R] Serverless databases in R

2010-04-19 Thread Frank E Harrell Jr

Barry Rowlingson wrote:

On Sun, Apr 18, 2010 at 11:30 PM, kMan kchambe...@gmail.com wrote:

It was my understanding that .Rdata files were not very portable, and do not
natively handle queries. Otherwise we'd all just use .RData files instead of
farming the work out to SQL drivers & external libraries, and colleagues who
use, e.g. SAS or SPSS would also have no trouble with them.


 The platform in cross-platform to me generally means the
operating system on which a program is running - and .Rdata files are
perfectly portable between R on Linux, MacOSX, Windows, Solaris etc
versions. You didn't mention portability to other statistical
packages. You also didn't mention needing SQL, or what you wanted to
do with your databases. I figured I'd just mention .Rdata files for
completeness!

 There's also RJDBC and RODBC which can interface to anything with a
JDBC or ODBC interface on your system.

 A .RData file could be considered as a serverless NoSQL database.
There's a GSOC proposal to investigate interfaces to NoSQL databases
and some info here:

http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2010:nosql_interface

 Isn't it odd that the open-source R community has developed functions
for reading in proprietary SAS and SPSS format files, but (AFAIK) the
commercial sector doesn't seem to support reading data from
open-sourced and open-specced R .Rdata files?

Barry


Hi Barry,

Stat Transfer can read and write R binary data frames (.rda files).

Frank
--
Frank E Harrell Jr   Professor and ChairmanSchool of Medicine
 Department of Biostatistics   Vanderbilt University



[R] logit() etc {was Re: glmer with non integer weights}

2010-04-19 Thread Martin Maechler
 EC == Emmanuel Charpentier charp...@bacbuc.dyndns.org
 on Sun, 18 Apr 2010 11:29:29 +0200 writes:

EC On Friday 16 April 2010 at 00:15 -0800, Kay Cichini
EC wrote:
 thanks thierry,
 
 i considered this transformations already, but variance
 is not stabilized and/or normality is neither achieved.
 i guess i'll have to look out for non-parametrics?

EC Or (maybe) a model based on a non-Gaussian likelihood ?
EC A beta distribution comes to mind, either fitted by
EC maximum likelihood or (if relevant prior information is
EC available) in a Bayesian framework ?

EC But beware : you have a not-so-small problem ...

EC Your data have zeroes and ones, which, if you have no
EC information on a sample size, are sharp zeroes and
EC ones, and are therefore theoretically bound to
EC infinite linear predictors (in plain English : bloody
EC unlikely). These values make a fixed effect analysis
EC impossible : these points at infinity will make
EC regression essentially impossible. Consider :

 logit <- function(x) log(x/(1-x))
 ilogit <- function(x) 1/(1+exp(-x))

Hmmm,  and some CRAN packages even define these ..

Now, please,  the help page   ?Logistic
has contained for a long time now

  Note:
  
   ‘qlogis(p)’ is the same as the well known ‘_logit_’ function,
   logit(p) = log(p/(1-p)), and ‘plogis(x)’ has consequently been
   called the ‘inverse logit’.

So please note, and do use qlogis() and plogis() instead of
logit() and ilogit() ... 
or if you really really must (e.g. for didactical reasons), use

logit <- qlogis

Using the logistic functions directly may also remind you or
your user that sometimes it will be advantageous to use the
'log.p=TRUE' or 'lower.tail=FALSE' "coordinate systems".
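
A minimal sketch of the point above, using only the base R functions
qlogis() and plogis(); the probabilities and the value 40 are arbitrary
illustration values, not taken from the thread. It shows the aliasing
suggested above and one case where the log.p / lower.tail arguments keep
precision that the naive formulas lose:

```r
# Aliases, if the familiar names are really wanted
logit  <- qlogis
ilogit <- plogis

p <- c(0.1, 0.5, 0.9)
eta <- logit(p)                  # same as log(p / (1 - p))
back <- ilogit(eta)              # recovers p

# Far in the upper tail the naive complement underflows to exactly 0,
# while lower.tail = FALSE keeps the tiny upper-tail probability.
naive_upper  <- 1 - plogis(40)
stable_upper <- plogis(40, lower.tail = FALSE)

# Similarly, log(plogis(40)) rounds to 0, whereas log.p = TRUE
# returns a small negative number.
naive_log  <- log(plogis(40))
stable_log <- plogis(40, log.p = TRUE)

print(c(naive_upper = naive_upper, stable_upper = stable_upper))
print(c(naive_log = naive_log, stable_log = stable_log))
```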

Martin


  [...]
  [...]



Re: [R] Serverless databases in R

2010-04-19 Thread Prof Brian Ripley

On Mon, 19 Apr 2010, Frank E Harrell Jr wrote:


Barry Rowlingson wrote:

On Sun, Apr 18, 2010 at 11:30 PM, kMan kchambe...@gmail.com wrote:
It was my understanding that .Rdata files were not very portable, and do not
natively handle queries. Otherwise we'd all just use .RData files instead of
farming the work out to SQL drivers & external libraries, and colleagues who
use, e.g. SAS or SPSS would also have no trouble with them.


 The platform in cross-platform to me generally means the
operating system on which a program is running - and .Rdata files are
perfectly portable between R on Linux, MacOSX, Windows, Solaris etc
versions. You didn't mention portability to other statistical
packages. You also didn't mention needing SQL, or what you wanted to
do with your databases. I figured I'd just mention .Rdata files for
completeness!

 There's also RJDBC and RODBC which can interface to anything with a
JDBC or ODBC interface on your system.

 A .RData file could be considered as a serverless NoSQL database.
There's a GSOC proposal to investigate interfaces to NoSQL databases
and some info here:

http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2010:nosql_interface

 Isn't it odd that the open-source R community has developed functions
for reading in proprietary SAS and SPSS format files, but (AFAIK) the
commercial sector doesn't seem to support reading data from
open-sourced and open-specced R .Rdata files?

Barry


Hi Barry,

Stat Transfer can read and write R binary data frames (.rda files).


Yes, but that is a considerable restriction (and other programs can do 
similar things).  I suspect it means 'data frames with columns from a 
prespecified small set of types' saved in an RDA2 gzipped binary xdr 
format.


BTW, .rda and .RData are simply convenient file extensions: the first 
is more convenient in the Windows world.  They are from one of a 
collection of many different formats, identified by the file 'magic' 
headers.
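
As a rough illustration of the 'magic' header remark (a sketch only; the
data frame `d` and the temporary file are made up for the example), one
can save an object in the version 2 format with the default gzip
compression and peek at the first bytes of the decompressed stream. The
extension plays no role in how the file is recognised:

```r
# Hypothetical example data; any R object would do
d <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

f <- tempfile(fileext = ".rda")   # ".RData" would work just as well
save(d, file = f, version = 2)    # gzip-compressed XDR format, version 2

# gzfile() transparently decompresses; read the first five bytes
con <- gzfile(f, "rb")
magic <- rawToChar(readBin(con, what = "raw", n = 5L))
close(con)
print(magic)

# Round trip into a fresh environment to confirm the content survives
e <- new.env()
load(f, envir = e)
print(all.equal(e$d, d))
```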


I am not so sure about 'open-specced R .Rdata files'.  In so far as 
there is a spec, I wrote it in 'R Internals' and it is not a full 
spec.  Mainly because many of the details are only relevant to R 
itself, such as how you read environments and some of the details of 
the object headers.


Had the RDA formats been written with the intent that they would be 
used other than to read all the objects they contain into R, they 
would have been structured differently with a lot more metadata. 
That has been noted for RDA3, but introducing such a format would be a 
major step and is not imminent.


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



[R] Writing methods for existing generic function

2010-04-19 Thread Viechtbauer Wolfgang (STAT)
Dear All,

Suppose I want to write a method for the generic function confint():

> args(confint)
function (object, parm, level = 0.95, ...)

So, it looks like the second and third argument have been predefined in the 
generic function. Suppose one or several of the predefined arguments don't 
apply or fit (in some sense) with the design of the rest of the package. What 
should one do? I see several options:

1) Write a new generic function.
2) Rewrite the package (if possible) so that the arguments of the existing 
generic function do apply (in some sense).
3) Write a method for the existing generic function, adding the predefined 
arguments to the method call, but just ignore some/all of them in whatever is 
being done inside of the method function.

Are there any other options? Is there some recommended practice for this?
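
For what it is worth, option 3 might look roughly like the sketch below.
The class name "myfit", the constructor new_myfit() and the normal-theory
interval are all hypothetical stand-ins for whatever the package really
does; the only point is that the method keeps the generic's signature and
says so when 'parm' is supplied but unused:

```r
# Hypothetical class holding estimates and standard errors
new_myfit <- function(estimate, se) {
  structure(list(estimate = estimate, se = se), class = "myfit")
}

# Method keeps the arguments of the generic: (object, parm, level, ...)
confint.myfit <- function(object, parm, level = 0.95, ...) {
  if (!missing(parm)) {
    warning("'parm' is ignored by confint.myfit()")
  }
  z <- qnorm(1 - (1 - level) / 2)
  cbind(lower = object$estimate - z * object$se,
        upper = object$estimate + z * object$se)
}

fit <- new_myfit(estimate = c(a = 1.2, b = -0.4), se = c(a = 0.3, b = 0.1))
ci95 <- confint(fit)
ci90 <- confint(fit, level = 0.90)
print(ci95)
```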

Thanks in advance for any feedback!

Best,

--
Wolfgang Viechtbauerhttp://www.wvbauer.com/
Department of Methodology and StatisticsTel: +31 (43) 388-2277
School for Public Health and Primary Care   Office Location:
Maastricht University, P.O. Box 616 Room B2.01 (second floor)
6200 MD Maastricht, The Netherlands Debyeplein 1 (Randwyck)



[R] merge

2010-04-19 Thread n.via...@libero.it
I have a problem with the merge function.
I have to merge two big data frames which look like the following example. The
problem is that I get duplicated columns.

First data frame:

CODPROD   N1   N3   N4
     23    3   55    4
     24    5   67   36
     25    3   73   24

Second data frame:

CODPROD   N1   N2
     30   34   45
     45    0   78
     65    0   56

The result that I get is like:

CODPROD   N1   N2   N3   N4   N1.1
     23    3   NA   55    4      3
     24    5   NA   67   36      0
     25    3   NA   73   24      0
     30   34   45   NA   NA      0
     45    0   78   NA   NA      0
     65    0   56   NA   NA      0

So N1.1 is a duplication of N1. I think I could solve the problem by
specifying the shared columns, but I have a lot of columns which have the same
names in the two data frames, so I think it's not the right way to solve it.

Does anyone know how to avoid the duplication?
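
One possible approach, sketched here on the small example tables only
(the names df1 and df2 are made up): when 'by' is left out, merge() joins
on every column name the two data frames share, so a shared column such
as N1 is used as a key instead of being duplicated, and there is no need
to list the shared columns by hand. Whether joining on all shared columns
is the intended behaviour for the real data is an assumption.

```r
df1 <- data.frame(CODPROD = c(23, 24, 25),
                  N1 = c(3, 5, 3),
                  N3 = c(55, 67, 73),
                  N4 = c(4, 36, 24))
df2 <- data.frame(CODPROD = c(30, 45, 65),
                  N1 = c(34, 0, 0),
                  N2 = c(45, 78, 56))

# Columns present in both data frames
shared <- intersect(names(df1), names(df2))

# all = TRUE keeps rows that appear in only one of the two data frames
merged <- merge(df1, df2, by = shared, all = TRUE)
print(merged)
```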



Re: [R] Equivalent to Python os.walk?

2010-04-19 Thread jim holtman
You can use 'list.files(..., recursive=TRUE)' to get a list of the file
names and then process them sequentially.  It all depends on the type of
processing that you want to do.  You can also write a recursive function to
do the same thing; it only takes a couple of lines of code.
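
A small sketch of the list.files() route. The directory layout below is
created in a temporary folder purely for illustration, and the ".csv"
filter stands in for whatever file types are of interest:

```r
# Build a throwaway directory tree to walk
root <- file.path(tempdir(), "walk_demo")
dir.create(file.path(root, "a", "b"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, "top.csv"),
            file.path(root, "a", "mid.txt"),
            file.path(root, "a", "b", "deep.csv"))

# All .csv files below root, at any depth
csv_files <- list.files(root, pattern = "\\.csv$", recursive = TRUE,
                        full.names = TRUE)

# Process them one at a time
for (f in csv_files) {
  cat("processing", basename(f), "in", dirname(f), "\n")
}

# Grouping by directory gives something close to os.walk's (dir, files) pairs
by_dir <- split(basename(csv_files), dirname(csv_files))
```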

On Mon, Apr 19, 2010 at 5:25 AM, Albert-Jan Roskam fo...@yahoo.com wrote:

 Hi,

 I would like to recursively loop through al subfolders of a directory and
 do stuff with certain file types in those dirs. Is there a package/function
 that could do this? So it's more than Sys.glob. I'm looking for equivalent
 of Python's os.walk *) and I don't want to reinvent the wheel.

 Thank you.

 Cheers!!
 Albert-Jan

 *) http://www.saltycrane.com/blog/2007/03/python-oswalk-example/

 ~~
 All right, but apart from the sanitation, the medicine, education, wine,
 public order, irrigation, roads, a fresh water system, and public health,
 what have the Romans ever done for us?
 ~~







-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Interacting with dendrogram plots, locator() or click()

2010-04-19 Thread David J. States
The two functions below check whether the current graphics device appears to
contain a plot of the given dendrogram and, if so, return the subtree when a
user clicks on a node.  Note: the function name identify.dendrogram lets S3
dispatch on the dendrogram class pick up this function for identify(dnd).  If
someone wants to incorporate this into the public dendrogram distribution,
please feel free to do so.

David

Example:

# plot a dendrogram of US arrest data
# and print the subtree that a user selects

dnd = as.dendrogram(hclust(dist(USArrests)))
plot(dnd, horiz=TRUE)
dnd2 = identify(dnd)
str(dnd2)

identify.dendrogram = function(dnd) {
    #
    # Return the dendrogram corresponding to the node a user clicks on.
    #

    #
    # First verify that it is a dendrogram plot corresponding to the
    # input dendrogram, and determine if horizontal or vertical
    #
    n = attributes(dnd)$members
    h = attributes(dnd)$height
    usr = par()$usr
    dx = usr[1] + usr[2]
    dy = usr[3] + usr[4]
    ok = FALSE
    horiz = FALSE
    if (abs(n + 1 - dx) < 0.001 && abs((h - dy)/dy) < 0.001) {
        ok = TRUE
        horiz = FALSE
    } else {
        if (abs(n + 1 - dy) < 0.001 && abs((h - dx)/dx) < 0.001) {
            ok = TRUE
            horiz = TRUE
        }
    }
    #
    # If the plot matches, call locator() for user input and match the node
    #
    if (ok) {
        crd = locator(1)
        if (is.null(crd)) {
            return(NULL)
        } else {
            return(find.node(dnd, 1, crd, horiz))
        }
    } else {
        warning("plot that does not correspond to the dendrogram in the call.")
        return(NULL)
    }
}

find.node = function(dnd, offset, crd, horiz) {
    #
    # find a node in a dendrogram matching the coordinates in crd
    # horiz is the plot orientation, see plot(dendrogram)
    #
    # First see if this node matches the coordinates
    #
    h = attributes(dnd)$height
    n = attributes(dnd)$members
    usr = par()$usr

    ok.x = FALSE
    ok.y = FALSE

    if (horiz) {
        ok.x = (abs(crd$x - h) / (usr[1] - usr[2])) < 0.05
        ok.y = round(crd$y,0) >= offset && round(crd$y,0) <= offset + n - 1
    } else {
        ok.y = (abs(crd$y - h) / (usr[4] - usr[3])) < 0.05
        ok.x = round(crd$x,0) >= offset && round(crd$x,0) <= offset + n - 1
    }
    if (ok.x && ok.y) {
        attr = attributes(dnd)
        attr$offset = offset
        attributes(dnd) = attr

        return(dnd)
    }
    #
    # No, so see if there are children of this node that match
    #
    if (!is.leaf(dnd)) {
        nc = length(dnd)
        child.offset = offset
        for (i in 1:nc) {
            #
            # Return the match subtree or descend further if no match
            #
            ret = find.node(dnd[[i]], child.offset, crd, horiz)

            if (!is.null(ret)) {
                return(ret)
            } else {
                child.offset = child.offset + attributes(dnd[[i]])$members
            }
        }
    }
    #
    # None of the children matched so return NULL
    #
    return(NULL)
}


David J. States, M.D. Ph.D
University of Texas Health Science Center at Houston

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of David J. States
Sent: Saturday, April 17, 2010 8:35 AM
To: r-help@R-project.org
Subject: [R] Interacting with dendrogram plots, locator() or click()

I would like to explore dendrogam plots interactively.  For example, click on a 
node and return information about all of the children of that node.

Is there a high level wrapper for locator() or click() that will return the 
nearest dendrogram node on a plot?

If not, is there a way to obtain the [x,y] coordinates of all the nodes on a 
plot?

Thanks,

David

David J. States, M.D., Ph.D.
Professor of Health Information Science
School of Health Information Sciences
Brown Foundation Institute of Molecular Medicine
University of Texas Health Science Center at Houston

Sarofim Research Building Room 437C
1825 Pressler St.
Houston, TX   77030

Telephone: 713 500 3845
email: david.j.sta...@uth.tmc.edu
URL:  http://www.stateslab.org



Re: [R] comparing attitudes of 2 groups / likert scales?

2010-04-19 Thread Dieter Menne


Mona_m wrote:
 
 For my dissertation, which is due way too soon, I am doing a survey,
 comparing attitudes of 2 independent groups, with 5 scale likert
 questions.
 Basically I want to show if they have similar or different attitudes. I am
 testing 4 hypotheses, and have in total about 20 questions. 
 
 

Using an unpaired (in your case) t-test on a Likert scale is a bit risky,
because the Gaussian assumption might be severely violated. It might be OK
if your data are reasonably centered around "moderate", but frequently we
have responses where all but one subject replied with "very good". If you
can create a sum of scores, these are frequently more suitable for being
analyzed by some quasi-continuous method, and using a non-parametric
Wilcoxon test might avoid reviewer comments in some areas of research.

If you only have a few levels, something like polr (in MASS) and the plots
created from it by Fox/Anderson might be an alternative (google for
Fox/polytomous effects). These results are more difficult to interpret, and I
have seen cases where papers using this were rejected in medical journals
("why don't you use Wilcoxon?").

Overall, when you have more complex cross-over designs with additional
crossed variables, violating Gaussian assumptions seems to me to be the
lesser evil compared to violating independence. Assess independence, equal
variance and normality, in that order (van Belle, Statistical Rules of
Thumb). I remember Douglas Bates mumbling something along the same lines,
but he mentioned a 10 level scale.
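
To make the two suggestions concrete, here is a minimal sketch on made-up
data (the group labels, the answer labels and the counts are invented for
illustration; a real survey would supply its own). It runs a Wilcoxon
rank-sum test on one 5-point item and, as the alternative, a proportional
odds model with polr() from MASS:

```r
library(MASS)   # for polr()

# Invented answers to one 5-point item, 40 respondents per group
labels5 <- c("very poor", "poor", "moderate", "good", "very good")
survey <- data.frame(
  group = factor(rep(c("A", "B"), each = 40)),
  score = c(rep(1:5, times = c(4, 8, 16, 8, 4)),     # group A
            rep(1:5, times = c(2, 4, 10, 14, 10)))   # group B
)

# Non-parametric comparison of the two independent groups;
# exact = FALSE because Likert data are full of ties
wt <- wilcox.test(score ~ group, data = survey, exact = FALSE)
print(wt)

# Proportional odds (ordered logit) model on the ordered factor
survey$answer <- factor(survey$score, levels = 1:5, labels = labels5,
                        ordered = TRUE)
fit <- polr(answer ~ group, data = survey, Hess = TRUE)
print(summary(fit))
```

With about 20 questions tested this way, some adjustment for multiple
comparisons (for example p.adjust()) would probably be worth considering.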

Dieter

-- 
View this message in context: 
http://n4.nabble.com/comparing-attitudes-of-2-groups-likert-scales-tp2015738p2015812.html
Sent from the R help mailing list archive at Nabble.com.



[R] Follow up on installing formatR...

2010-04-19 Thread Brian Lunergan
Good morning folks:

Made a second go at installing this and succeeded, but with some strange
behaviours along the way. First the system back story.

Using up to date edition Ubuntu 8.04

Using up to date edition of R 2.10.1

Using the Toronto, Ontario repository to draw from.

The first attempt ended with two of the four dependencies installed, so
only the RGtk-related items were needed this time. I went into Synaptic and
found r-cran-rgtk2. That sounded likely, so I installed it successfully.
Using a root terminal session I went back into R and ran
install.packages("formatR"). It pulled in two dependencies plus the main
choice. One was rejected and two were kept when it came time to install, but
formatR still installed successfully, or so it seemed when I ran
library(formatR). Go figure. A copy of the message run follows. Any comments
or suggestions will be of interest.

> install.packages("formatR")
Warning in install.packages("formatR") :
  argument 'lib' is missing: using '/usr/local/lib/R/site-library'
--- Please select a CRAN mirror for use in this session ---
Loading Tcl/Tk interface ... done
also installing the dependencies ‘RGtk2’, ‘gWidgetsRGtk2’

trying URL 'http://probability.ca/cran/src/contrib/RGtk2_2.12.18.tar.gz'
Content type 'application/x-gzip' length 2206504 bytes (2.1 Mb)
opened URL
==
downloaded 2.1 Mb

trying URL 'http://probability.ca/cran/src/contrib/gWidgetsRGtk2_0.0-64.tar.gz'
Content type 'application/x-gzip' length 138192 bytes (134 Kb)
opened URL
==
downloaded 134 Kb

trying URL 'http://probability.ca/cran/src/contrib/formatR_0.1-3.tar.gz'
Content type 'application/x-gzip' length 2672 bytes
opened URL
==
downloaded 2672 bytes

* installing *source* package ‘RGtk2’ ...
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for LIBGLADE... no
configure: WARNING: libglade not found
checking for INTROSPECTION... no
checking for GTK... no
configure: error: GTK version 2.8.0 required
ERROR: configuration failed for package ‘RGtk2’
* removing ‘/usr/local/lib/R/site-library/RGtk2’
* installing *source* package ‘gWidgetsRGtk2’ ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices ...
* DONE (gWidgetsRGtk2)
* installing *source* package ‘formatR’ ...
** R
** preparing package for lazy loading
Loading required package: gWidgets
Loading required package: MASS
** help
*** installing help indices
** building package indices ...
* DONE (formatR)

The downloaded packages are in
‘/tmp/RtmpNL20Be/downloaded_packages’
Warning message:
In install.packages("formatR") :
  installation of package 'RGtk2' had non-zero exit status


-- 
Brian Lunergan
Nepean, Ontario
Canada



[R] fit a deterministic function to observed data

2010-04-19 Thread vincent laperriere
Hi all,

I am not a mathematician and I am trying to fit a function which could fit my 
observed data.
Which function should I use and how could I fit it to data in R?
Below are the data:
x <- c(0, 9, 17, 24, 28, 30)
y <- c(500, 480, 420, 300, 160, 5)

I use R for Mac OS, version 2.10-1 2009-08-24
Thank you for your help.

Vincent.


  


Re: [R] Comparing data frames

2010-04-19 Thread Laura Ferrero-Miliani
Thank you all for your help and suggestions.

L

On Sun, Apr 18, 2010 at 8:51 PM, Tal Galili tal.gal...@gmail.com wrote:
 Would:
 ?merge
 Work for you ?


 Contact
 Details:---
 Contact me: tal.gal...@gmail.com |  972-52-7275845
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
 www.r-statistics.com (English)
 --




 On Sun, Apr 18, 2010 at 7:30 PM, Laura Ferrero-Miliani laur...@gmail.com
 wrote:

 Dear very helpful friends,

 It is Sunday, there is no air traffic in Europe, what better to do
 than try and learn me some more R.
 I have the following example:

 owner <- c(1:4)
 animal <- c("cat", "dog", "cat", "dog")
 char.1 <- c("fluffy", "playful", "mean", "stupid")
 food <- c("cat food", "left-overs", "cat food", "dog food")
 char.2 <- c("lazy", "destructive", "antisocial", "goofy")
 color <- c("white", "brown", "black", "black")
 char.3 <- c("fat", "tiny", "evil", "big")
 age <- c(16, 2, 5, 10)

 pet.data <- data.frame(owner, animal, char.1, food, char.2,
                        color, char.3, age)

 animal <- c("cat", "dog")
 v1 <- c("fluffy", "big")
 v2 <- c("fat", "stupid")

 pet.key <- data.frame(animal, v1, v2)


 Now I would like to compare my pet.key to my pet.data and add a
 variable to pet.data with that result.
 So for each cat in pet.data, char.1, char.2 and char.3 should
 contain "fluffy" AND "fat" for a complete match, or "fluffy" OR "fat"
 for a partial match, and so on.

 I don't know where to start. I *think* I should be using %in%, but I
 don't know how to build the expression so it works (so far I have
 tried and have gotten from lists to a weird array as a result!)

 btw, what if my data and my key contains repeated values e.g.

 animal v1      v2
 cat      fluffy fluffy


 Any suggestions?
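
 One way this could be approached, as a sketch only: look up each pet's
 key row with match(), then test whether each key value occurs anywhere
 among that row's char.* columns. The column name "match" and the labels
 "complete" / "partial" / "none" are made up here, and only the columns
 needed for the comparison are rebuilt:

```r
pet.data <- data.frame(
  owner  = 1:4,
  animal = c("cat", "dog", "cat", "dog"),
  char.1 = c("fluffy", "playful", "mean", "stupid"),
  char.2 = c("lazy", "destructive", "antisocial", "goofy"),
  char.3 = c("fat", "tiny", "evil", "big"),
  stringsAsFactors = FALSE
)
pet.key <- data.frame(
  animal = c("cat", "dog"),
  v1 = c("fluffy", "big"),
  v2 = c("fat", "stupid"),
  stringsAsFactors = FALSE
)

# Row of pet.key that applies to each row of pet.data
k <- match(pet.data$animal, pet.key$animal)

# Character matrix of the three trait columns, one row per pet
chars <- as.matrix(pet.data[, c("char.1", "char.2", "char.3")])

# A vector of length nrow(chars) is recycled down the columns, so every
# cell in row i is compared with that row's own key value
hit1 <- rowSums(chars == pet.key$v1[k]) > 0
hit2 <- rowSums(chars == pet.key$v2[k]) > 0

pet.data$match <- ifelse(hit1 & hit2, "complete",
                         ifelse(hit1 | hit2, "partial", "none"))
print(pet.data[, c("owner", "animal", "match")])
```

 A key row with repeated values (e.g. v1 and v2 both "fluffy") would
 count as complete whenever "fluffy" appears once; whether that is the
 desired rule is left open.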


 Thanks in advance,

 Laura



Re: [R] help

2010-04-19 Thread anderson nuel
Hi,

Thank you for your help.

In Matlab, the function 'combvec' takes any number of inputs, so you can
pass more than two matrices.


The help of this function 'combvec' is like this on Matlab:

 help combvec

 COMBVEC Create all combinations of vectors.

   Syntax

 combvec(a1,a2,...)

   Description

 COMBVEC(A1,A2,...) takes any number of inputs,
   A1 - Matrix of N1 (column) vectors.
   A2 - Matrix of N2 (column) vectors.
 and returns a matrix of (N1*N2*...) column vectors, where the columns
 consist of all possibilities of A2 vectors, appended to
 A1 vectors, etc.

   Example

 a1 = [1 2 3; 4 5 6];
 a2 = [7 8; 9 10];
 a3 = combvec(a1,a2)
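
For reference, a rough R sketch of how the two-matrix case can be folded
over any number of inputs with Reduce(). The names combvec2 and
combvec_all are just illustrative, and the column ordering for three or
more inputs (first input varying fastest) is an assumption about what
Matlab does rather than something checked against it:

```r
# All pairings of the columns of m1 with the columns of m2
combvec2 <- function(m1, m2) {
  m1 <- as.matrix(m1)   # a plain vector becomes a single column
  m2 <- as.matrix(m2)
  c1 <- ncol(m1)
  c2 <- ncol(m2)
  rbind(m1[, rep(seq_len(c1), times = c2), drop = FALSE],
        m2[, rep(seq_len(c2), each = c1), drop = FALSE])
}

# Fold the pairwise version over any number of matrices
combvec_all <- function(...) Reduce(combvec2, list(...))

a1 <- matrix(1:6, nrow = 2, byrow = TRUE)
a2 <- matrix(7:10, nrow = 2, byrow = TRUE)
a3 <- matrix(c(11, 12), nrow = 1)

two <- combvec_all(a1, a2)
three <- combvec_all(a1, a2, a3)
print(two)
print(dim(three))   # 2 + 2 + 1 rows, 3 * 2 * 2 columns
```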

2010/4/19 Dennis Murphy djmu...@gmail.com

 Hi:

 This is a simplistic version of combvec that works for two input matrices;
 I don't
 have Matlab, and I don't understand how the function generalizes to more
 than
 two input matrices, so this is the best I can offer, for what it's worth...

 combvec2 <- function(m1, m2) {
     c1 <- ncol(m1)
     c2 <- ncol(m2)
     k1 <- kronecker(matrix(rep(1, c2), nrow = 1), m1)
     k2 <- kronecker(m2, matrix(rep(1, c1), nrow = 1))
     rbind(k1, k2)
 }

 > a1 <- matrix(1:6, nrow = 2, byrow = TRUE)
 > a1
      [,1] [,2] [,3]
 [1,]    1    2    3
 [2,]    4    5    6
 > a2 <- matrix(7:10, nrow = 2, byrow = TRUE)

 > combvec2(a1, a2)
      [,1] [,2] [,3] [,4] [,5] [,6]
 [1,]    1    2    3    1    2    3
 [2,]    4    5    6    4    5    6
 [3,]    7    7    7    8    8    8
 [4,]    9    9    9   10   10   10

 HTH,
 Dennis

 On Sun, Apr 18, 2010 at 3:00 AM, anderson nuel anderson@gmail.comwrote:

 Hello,

  I would like to create all combinations of vectors. In Matlab I found the
 function 'combvec', which creates all combinations of vectors.

 Please could you help me find the function corresponding to 'combvec'?

 For example:

 In Matlab

  a1 = [1 2 3; 4 5 6]

 a1 =

      1     2     3
      4     5     6

  a2 = [7 8; 9 10]

 a2 =

      7     8
      9    10

  a3 = combvec(a1,a2)

 a3 =

      1     2     3     1     2     3
      4     5     6     4     5     6
      7     7     7     8     8     8
      9     9     9    10    10    10

 Best Regards



Re: [R] help

2010-04-19 Thread anderson nuel
Hi,

 Thank you for your help.


I tried your function 'combvec2', but it gives me an error: Error in
rep(1, c2) : invalid 'times' argument


 In Matlab, the function 'combvec' takes any number of inputs, so you can
 pass more than two matrices.


 The help of this function 'combvec' is like this on Matlab:

  help combvec

  COMBVEC Create all combinations of vectors.

Syntax

  combvec(a1,a2,...)

Description

  COMBVEC(A1,A2,...) takes any number of inputs,
A1 - Matrix of N1 (column) vectors.
A2 - Matrix of N2 (column) vectors.
  and returns a matrix of (N1*N2*...) column vectors, where the columns
  consist of all possibilities of A2 vectors, appended to
  A1 vectors, etc.

Example

  a1 = [1 2 3; 4 5 6];
  a2 = [7 8; 9 10];
  a3 = combvec(a1,a2)

 2010/4/19 Dennis Murphy djmu...@gmail.com

 Hi:

 This is a simplistic version of combvec that works for two input matrices;
 I don't
 have Matlab, and I don't understand how the function generalizes to more
 than
 two input matrices, so this is the best I can offer, for what it's
 worth...

 combvec2 <- function(m1, m2) {
     c1 <- ncol(m1)
     c2 <- ncol(m2)
     k1 <- kronecker(matrix(rep(1, c2), nrow = 1), m1)
     k2 <- kronecker(m2, matrix(rep(1, c1), nrow = 1))
     rbind(k1, k2)
 }

 > a1 <- matrix(1:6, nrow = 2, byrow = TRUE)
 > a1
      [,1] [,2] [,3]
 [1,]    1    2    3
 [2,]    4    5    6
 > a2 <- matrix(7:10, nrow = 2, byrow = TRUE)

 > combvec2(a1, a2)
      [,1] [,2] [,3] [,4] [,5] [,6]
 [1,]    1    2    3    1    2    3
 [2,]    4    5    6    4    5    6
 [3,]    7    7    7    8    8    8
 [4,]    9    9    9   10   10   10

 HTH,
 Dennis

 On Sun, Apr 18, 2010 at 3:00 AM, anderson nuel anderson@gmail.comwrote:

  Hello,

  I would like to create all combinations of vectors. In Matlab I found the
 function 'combvec', which creates all combinations of vectors.

 Please could you help me find the function corresponding to 'combvec'?

 For example:

 In Matlab

  a1 = [1 2 3; 4 5 6]

 a1 =

      1     2     3
      4     5     6

  a2 = [7 8; 9 10]

 a2 =

      7     8
      9    10

  a3 = combvec(a1,a2)

 a3 =

      1     2     3     1     2     3
      4     5     6     4     5     6
      7     7     7     8     8     8
      9     9     9    10    10    10

 Best Regards



[R] ecdf

2010-04-19 Thread Downey, Patrick
Hello,

I'd like to plot an empirical cumulative distribution function, except
instead of the fraction of values < x, I'd like the fraction of values > x.


I think this can be done using the ecdf function in {Hmisc}. I installed
the package and loaded it. However, when following the example given in the
documentation, I get an error:

x <- rnorm(100)
ecdf(x,what='1-F')
Error in ecdf(x, what = "1-F") : unused argument(s) (what = "1-F")

I believe that this is because R is attempting to access the ecdf function
in base R, which does not have the 'what' option. Am I correct, and if so,
how can I change that?

Note: I also tried to do it myself without the {Hmisc} ecdf function, and
couldn't figure out a way.

x2 <- 1-ecdf(x)

doesn't work, and neither does

x2 <- rep(0,times=100)
for(i in 1:100){
  x2[i] <- 1-ecdf(x)[i]
}

Both result in errors.

Thanks in advance for any suggestions you can offer.

-Mitch



Re: [R] ecdf

2010-04-19 Thread ONKELINX, Thierry
R is case sensitive. ecdf() is in the stats package, Ecdf() is in Hmisc.
So you want Ecdf(x,what='1-F')
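
If you would rather stay with the stats version, a minimal sketch (with
simulated data standing in for yours) is below. ecdf() returns a
function, which is why arithmetic or indexing on ecdf(x) itself fails;
the function has to be evaluated at some points first:

```r
set.seed(42)
x <- rnorm(100)

Fn <- ecdf(x)        # Fn(q) is the fraction of values <= q
xs <- sort(x)
surv <- 1 - Fn(xs)   # fraction of values > each sorted observation

plot(xs, surv, type = "s",
     xlab = "x", ylab = "Fraction of values > x")
```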

Thierry


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
  

 -Original Message-
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org] On Behalf Of Downey, Patrick
 Sent: Monday, 19 April 2010 15:04
 To: R help
 Subject: [R] ecdf
 
 Hello,
 
 I'd like to plot an empirical cumulative distribution
 function, except instead of the fraction of values < x, I'd
 like the fraction of values > x.


 I think this can be done using the ecdf function in {Hmisc}.
 I installed the package and loaded it. However, when
 following the example given in the documentation, I get an error:

 x <- rnorm(100)
 ecdf(x,what='1-F')
 Error in ecdf(x, what = "1-F") : unused argument(s) (what = "1-F")

 I believe that this is because R is attempting to access the
 ecdf function in base R, which does not have the 'what' option.
 Am I correct, and if so, how can I change that?

 Note: I also tried to do it myself without the {Hmisc} ecdf
 function, and couldn't figure out a way.

 x2 <- 1-ecdf(x)

 doesn't work, and neither does

 x2 <- rep(0,times=100)
 for(i in 1:100){
   x2[i] <- 1-ecdf(x)[i]
 }
 
 Both result in errors.
 
 Thanks in advance for any suggestions you can offer.
 
 -Mitch
 
 



Re: [R] fit a deterministic function to observed data

2010-04-19 Thread Gabor Grothendieck
Plotting y vs. x:

plot(y ~ x)

the graph seems to be flattening out at x = 0 at a level of around y =
500, so let's look at:

plot(500-y ~ x)

This curve is moving up rapidly, so let's take the log to flatten it out:

plot(log(500-y) ~ x)

That looks quite linear, so log(500-y) = A + B * x and solving:

y = 500 - exp(A + B*x)

The 500 was just a ballpark, so let's make that a parameter too:

y = C - exp(A + B*x) = C - exp(A) * exp(B*x) = C + D * exp(B*x)

where we have replaced -exp(A) with D.

Fitting this gives:

 fm <- nls(y ~ cbind(1, exp(B * x)), start = c(B = 1), alg = "plinear"); fm
Nonlinear regression model
  model:  y ~ cbind(1, exp(B * x))
   data:  parent.frame()
       B    .lin1    .lin2
  0.1513 498.9519  -5.1644
 residual sum-of-squares: 627.6

Number of iterations to convergence: 6
Achieved convergence tolerance: 6.192e-06

 # graphing
 plot(y ~ x, pch = 20, col = "red")
 lines(fitted(fm) ~ x)
 title("y = 498.9519 - 5.1644 * exp(0.1513 * x)")
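
The fit above can be written as one self-contained sketch. It uses the x and y
values from the original question quoted further down and the same nls() call;
the names co, xx and yy are only illustrative, and the exact estimates printed
may differ slightly between R versions.

```r
# Data from the original question in this thread
x <- c(0, 9, 17, 24, 28, 30)
y <- c(500, 480, 420, 300, 160, 5)

# Model: y = C + D * exp(B * x).
# With algorithm = "plinear" only the nonlinear parameter B needs a start
# value; the linear coefficients C and D are estimated conditionally on B
# and are reported as .lin1 and .lin2.
fm <- nls(y ~ cbind(1, exp(B * x)), start = c(B = 1), algorithm = "plinear")
co <- coef(fm)
print(co)

# Fitted curve on a finer grid, e.g. for lines(xx, yy) on top of plot(y ~ x)
xx <- seq(min(x), max(x), length.out = 101)
yy <- co[[".lin1"]] + co[[".lin2"]] * exp(co[["B"]] * xx)
```

Treating C and D as conditionally linear means no starting guesses are needed
for them, which is presumably why the plinear algorithm was chosen here.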


On Mon, Apr 19, 2010 at 8:42 AM, vincent laperriere
vincent_laperri...@yahoo.fr wrote:
 Hi all,

 I am not a mathematician and I am trying to fit a function which could fit my 
 observed data.
 Which function should I use and how could I fit it to data in R?
 Below are the data:
 x <- c(0, 9, 17, 24, 28, 30)
 y <- c(500, 480, 420, 300, 160, 5)

 I use R for Mac OS, version 2.10-1 2009-08-24
 Thank you for your help.

 Vincent.





Re: [R] ecdf

2010-04-19 Thread Downey, Patrick
Hi Thierry,

That worked perfectly. Thanks for the suggestion.

For reference, in the documentation, it never lists {Hmisc}'s function as
starting with E instead of e. I don't know who's in charge of
documentation, but that should probably be corrected.

Thanks again.

-Mitch

-Original Message-
From: ONKELINX, Thierry [mailto:thierry.onkel...@inbo.be] 
Sent: Monday, April 19, 2010 9:08 AM
To: Downey, Patrick; R help
Subject: RE: [R] ecdf

R is case sensitive. ecdf() is in the stats package, Ecdf() is in Hmisc.
So you want Ecdf(x,what='1-F')

Thierry
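
As a rough illustration of why the base R attempts in the original question
failed: stats::ecdf() returns a function rather than a vector, so it has to be
evaluated before subtracting from 1. The sketch below uses only base R; the
names Fn and surv and the four data values are made up for the example.

```r
x <- c(1, 2, 3, 4)

Fn <- ecdf(x)      # stats::ecdf() returns a step function, not a vector

# 1 - ecdf(x) fails because arithmetic on a function is not defined;
# evaluate the function first, then take the complement.
surv <- 1 - Fn(x)  # fraction of values strictly greater than each x
print(surv)

# With Hmisc installed, the call from the reply would be (not run here):
# Hmisc::Ecdf(x, what = "1-F")
```

Something like plot(sort(x), 1 - Fn(sort(x)), type = "s") would then draw the
complementary curve without Hmisc.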


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be



[R] Extracting the coefficients of each local polynomial from loess()

2010-04-19 Thread Tal Galili
Hello dear R users and Prof. Brian Ripley,

I am searching for a way to extract the estimated coefficients of each local
polynomial at a given x from loess().

After searching and asking at
http://stackoverflow.com/questions/2666799/extracting-the-fitted-terms-in-the-local-polynomial-function-of-a-loess-in-r-n,
I found this thread:
http://tolstoy.newcastle.edu.au/R/e2/help/07/01/8221.html

From it I understand that the only way to do this is by opening up the
functions that are behind loess(), and that it is not recommended for
someone who doesn't know what he is doing.
I don't know what I am doing here; if someone else does, and is willing to
help, I believe I (and others) would benefit from it.
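
For illustration only, the idea of a local polynomial fit can be sketched by
hand with weighted least squares. This is not the code behind loess(): the
function name local_poly_coef, the neighbourhood size ceiling(span * n) and the
plain tricube weights are assumptions, and loess() itself (with its
interpolation and optional robustness iterations) may give different numbers.

```r
# Hypothetical helper: coefficients of one local polynomial fitted around x0
local_poly_coef <- function(x, y, x0, span = 0.75, degree = 2) {
  n <- length(x)
  q <- ceiling(span * n)                    # assumed number of neighbours
  d <- abs(x - x0)
  h <- sort(d)[q]                           # distance to the q-th nearest point
  w <- ifelse(d < h, (1 - (d / h)^3)^3, 0)  # tricube weights, 0 outside window
  xc <- x - x0                              # centre: intercept = fit at x0
  fit <- lm(y ~ poly(xc, degree, raw = TRUE), weights = w)
  coef(fit)
}

# Example on the built-in cars data
co <- local_poly_coef(cars$speed, cars$dist, x0 = 15)
print(co)
```

Because x is centred at x0, the first coefficient is the local fitted value at
x0 and the remaining ones describe the local slope and curvature.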


Thanks,
Tal



Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--



Re: [R] merge

2010-04-19 Thread Sarah Goslee
What do you want to get?

And what exactly did you do?

Your question isn't very clear.

Sarah

On Mon, Apr 19, 2010 at 7:59 AM, n.via...@libero.it n.via...@libero.it wrote:
 I have a problem with the merge function.
 I have to merge two big data frames which look like the following example. The
 problem is that I get duplicated rows.

 CODPROD   N1   N3   N4
 23         3   55    4
 24         5   67   36
 25         3   73   24

 second data frame

 CODPROD   N1   N2
 30        34   45
 45         0   78
 65         0   56

 The result that I get looks like:

 CODPROD   N1   N2   N3   N4   N1.1
 23         3   NA   55    4      3
 24         5   NA   67   36      0
 25         3   NA   73   24      0
 30        34   45   NA   NA      0
 45         0   78   NA   NA      0
 65         0   56   NA   NA      0

 So N1.1 is a duplication of N1. I think I could solve the problem by
 specifying the same columns, but I have a lot of columns which have the same
 names in the two data frames, so I think it's not the right way to solve it.

 Does anyone know how to avoid the duplication?



-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] smart way to turn a vector into a matrix

2010-04-19 Thread anna

Hi Erik, what if I do some manipulations on the new list created with split
and I want to come back to its initial form?  I saw there is the unsplit
function, but I get errors. As the f argument I use the same f I used to
split, as in the help page, but I get the following error:
Error in x[i] <- value[[j]] : replacement has length zero


-
Anna Lippel
-- 
View this message in context: 
http://n4.nabble.com/smart-way-to-turn-a-vector-into-a-matrix-tp1692671p2015952.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] multiple variables pointing to single dataframe?

2010-04-19 Thread Alex Bryant
Hi, for example:

 x <- Orange
 x2 <- x
 x[1,]$age <- 50
 x2[1,]
  Tree age circumference
1    1 118            30

I would like a way for x2 to also reference the modified x data frame without
having to reassign x2 <- x each time x is modified.

Thanks,
Alex

-Original Message-
From: Petr PIKAL [mailto:petr.pi...@precheza.cz] 
Sent: Monday, April 19, 2010 3:18 AM
To: Alex Bryant
Cc: r-help@r-project.org
Subject: Odp: [R] multiple variables pointing to single dataframe?

Hi

r-help-boun...@r-project.org wrote on 16.04.2010 16:15:40:

 Hi,  I have a need to have 2 variables point to the same dataframe (d1), I

What does it mean to point to a data frame? It seems to me that it is
something from C++.

You can reference a data frame by $ or by square brackets with as many
variables as you want.

see

?"["

regards
Petr


 don't want to simply copy the dataframe ( d2 <- d1 ) as my understanding is
 that this will create a second dataframe.  Any suggestions on best practice
 here?
 
 Thank You,
 
 //
 // Alex Bryant
 // Software Developer
 // Integrated Clinical Systems, Inc.
 // 908-996-7208
 
 
 


Re: [R] comparing attitudes of 2 groups / likert scales?

2010-04-19 Thread Tal Galili
Good luck in your work,

The simple solution would be to run an unpaired Wilcoxon test on each of the 20
questions (the way Dieter suggested).
In which case, make sure to adjust for multiple comparisons.  Read about it,
and see:
?p.adjust
If you have some questions you can merge (by a simple mean of them), it will
probably do you good (using PCA might be an option, but it could also be
overkill for you).
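
A minimal sketch of that suggestion, assuming the answers of the two groups are
stored as matrices with one column per question. The data here are simulated
and the names g1, g2, p_raw and p_adj are invented for the example; the choice
of the Holm method is also just one reasonable option.

```r
set.seed(1)
n_q <- 20
# Simulated 5-point Likert answers: 30 respondents per group (made-up data)
g1 <- matrix(sample(1:5, 30 * n_q, replace = TRUE), ncol = n_q)
g2 <- matrix(sample(1:5, 30 * n_q, replace = TRUE), ncol = n_q)

# One unpaired Wilcoxon rank-sum test per question.
# exact = FALSE uses the normal approximation, since Likert data have ties.
p_raw <- sapply(seq_len(n_q), function(j)
  wilcox.test(g1[, j], g2[, j], exact = FALSE)$p.value)

# Adjust the 20 p-values for multiple comparisons
p_adj <- p.adjust(p_raw, method = "holm")
print(round(p_adj, 3))
```

Questions whose adjusted p-value stays below the chosen level are the ones
where the two groups appear to differ.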


You might also be interested in plotting your data; here is a nice simple
hack on how to display the correlation scatter-plot matrix for your data:
http://www.r-statistics.com/2010/04/correlation-scatter-plot-matrix-for-ordered-categorical-data/


And Dieter, thanks for a great quote: "Assess independence, equal variance
and normality - in that order" (van Belle, Statistical Rules of Thumb).


Tal


Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--




On Mon, Apr 19, 2010 at 2:07 PM, Mona_m purplem...@blueyonder.co.uk wrote:


 Hi,

 I have just found this forum, and it looks like a great place to get some
 help (I hope)
 For my dissertation, which is due way too soon, I am doing a survey,
 comparing attitudes of 2 independent groups, with 5 scale likert questions.
 Basically I want to show if they have similar or different attitudes. I am
 testing 4 hypotheses, and have in total about 20 questions.

 I have to say my statistic skills are very basic and very rusty, we had
 some
 lectures two years ago, where we were introduced to R. I looked through my
 notes, and back then we did a one sample t-test to analyse likert type
 questions. I believe I would need to do a 2 sample unpaired t-test.  It
 would be great if someone could give me some feedback if this test is the
 most suitable one for my purpose, and maybe could explain to me what’s the
 easiest way to do this in R?

 You would help me loads!!
 Many thanks in advance
 Mona

 --
 View this message in context:
 http://n4.nabble.com/comparing-attitudes-of-2-groups-likert-scales-tp2015738p2015738.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] Unwanted boxes in legend

2010-04-19 Thread Thomas Stewart
Try border=c(0,0,1,0).
-tgs
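
A small sketch of how that suggestion might look in a full call, assuming an R
version whose legend() accepts the border argument (the error quoted below
suggests the poster's version did not). The labels and colours follow the
original question; the plot itself and the "topleft" position are placeholders.

```r
pdf(NULL)   # null graphics device so the sketch runs without a display
plot(1:10, type = "n")

res <- legend("topleft",
              legend = c("A", "B", "C", "D"),
              fill   = c(NA, NA, "grey28", NA),  # only the third entry is a bar
              border = c(0, 0, 1, 0),            # 0 = background, 1 = black
              pch    = c(16, 4, NA, 18),
              col    = c("red", "blue", "grey28", "yellow"),
              bty    = "n")

invisible(dev.off())
```

Only the third legend entry gets a visible box, because the other boxes are
drawn in the background colour.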

On Mon, Apr 19, 2010 at 4:21 AM, Steve Murray smurray...@hotmail.com wrote:


 Dear all,

 Thanks for the response, however I'm getting the following error message
 when I execute the legend command using the 'border' argument:

 Error in legend(10, par("usr")[4], c("A", "B",  :
   unused argument(s) (border = FALSE)


 Is anyone aware of any alternative means of switching off boxes around all
 but one of the elements in a legend?

 Many thanks for any input,

 Steve


 
  Date: Thu, 15 Apr 2010 12:13:40 -0600
  From: ehl...@ucalgary.ca
  To: smurray...@hotmail.com
  CC: r-help@r-project.org
  Subject: Re: [R] Unwanted boxes in legend
 
  On 2010-04-15 11:10, Steve Murray wrote:
 
  Dear all,
 
  I am using the following code to generate a legend in my plot
 (consisting of both bars and points), but end up with boxes around my
 points:
 
  legend(10, par("usr")[4], c("A", "B", "C", "D"), fill=c(NA, NA, "grey28",
 NA), pch=c(16,4,NA,18), col=c("red","blue","grey28","yellow"), lty=FALSE,
 bty="n", horiz=FALSE)
 
  I want a box around the third element of the legend (to represent the
 bar 'fill' colour), but not for the others, where points are shown instead.
 
  What am I doing wrong above and how do I correct it?
 
  Add the 'border' argument:
 
  either
 
  border = FALSE # in which case no box is drawn for any element
 
  or
 
  border = c(NA, NA, "black", NA)
 
  -Peter Ehlers
 
 
  Many thanks,
 
  Steve
 
 
 
 
 
  --
  Peter Ehlers
  University of Calgary




Re: [R] merge

2010-04-19 Thread Pete B

Maybe this is what you are looking for

lines1 <- "CODPROD N1 N3 N4
23 3 55 4
24 5 67 36
25 3 73 24"

df1 <- read.table(textConnection(lines1), header=TRUE)

lines2 <- "CODPROD N1 N2
30 34 45
45 0 78
65 0 56"

df2 <- read.table(textConnection(lines2), header=TRUE)

# merge on every column name the two data frames share, keeping all rows
merge(df1, df2, by = intersect(names(df1), names(df2)), all=TRUE)

HTH

Pete
-- 
View this message in context: http://n4.nabble.com/merge-tp2015796p2015966.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] multiple variables pointing to single dataframe?

2010-04-19 Thread Steve Lianoglou
Hi,

On Mon, Apr 19, 2010 at 10:15 AM, Alex Bryant abry...@i-review.com wrote:
 Hi, for example:

 x <- Orange
 x2 <- x
 x[1,]$age <- 50
 x2[1,]
  Tree age circumference
 1    1 118            30

 I would like a way for x2 to also reference the modified x data frame without
 having to reassign x2 <- x each time x is modified.

You can't *really* do this in R, but I believe you can rig up a
workaround using environments (if you really have to).

This SO thread with links is *somewhat* related to what you're asking.
Perhaps you'll find what you're looking for there:

http://stackoverflow.com/questions/2603184/r-pass-by-reference
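
One way the environment workaround could look is sketched below. Environments
are not copied on assignment, so two names can refer to the same one; the names
shared and alias are invented for the example, and the data frame is stored
inside the environment rather than bound to a plain variable.

```r
shared <- new.env()
shared$df <- Orange          # keep the data frame inside an environment

alias <- shared              # copies the reference, not the data frame

shared$df$age[1] <- 50       # modify through one name ...
print(alias$df[1, ])         # ... and the change is visible through the other
```

The cost is that the data frame has to be accessed as shared$df (or alias$df)
everywhere instead of through a plain variable.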

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] merge

2010-04-19 Thread Petr PIKAL
Hi

If the columns have the same name but different values in them, then you
must either decide yourself which one to keep or keep both. If
they have the same name and the same values, you could select only those whose
names do not match.

names(data1) %in% names(data2)

shows which names match, and you can get rid of them in one of your
data frames before the merge.

Something like this (untested)

data1[, c(1, which(!(names(data1) %in% names(data2))))]
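
A small worked sketch of that idea, assuming the first column is the merge key
and the shared columns really do hold the same values in both data frames. The
data and the names data1, data2, keep and merged are made up for illustration.

```r
data1 <- data.frame(id = 1:3, grp = c("a", "b", "c"), n3 = c(55, 67, 73))
data2 <- data.frame(id = 1:3, grp = c("a", "b", "c"), n2 = c(45, 78, 56))

# Keep the key (column 1) plus the columns of data1 that do not occur in data2
keep <- c(1, which(!(names(data1) %in% names(data2))))

# grp now comes only from data2, so no grp.x / grp.y duplicates appear
merged <- merge(data1[, keep], data2, by = "id")
print(merged)
```

If the shared columns can differ between the two data frames, dropping one copy
loses information, so this only fits the "same name and same values" case.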

Regards
Petr





Re: [R] multiple variables pointing to single dataframe?

2010-04-19 Thread Gabor Grothendieck
If you only need to retrieve x by referring to x2 and you don't have
to modify x via x2 then this works:

 x <- Orange
 makeActiveBinding("x2", function() x, .GlobalEnv)
 x$age <- 50
 head(x2)
  Tree age circumference
1    1  50            30
2    1  50            58
3    1  50            87
4    1  50           115
5    1  50           120
6    1  50           142


On Mon, Apr 19, 2010 at 10:15 AM, Alex Bryant abry...@i-review.com wrote:
 Hi, for example:

 x <- Orange
 x2 <- x
 x[1,]$age <- 50
 x2[1,]
  Tree age circumference
 1    1 118            30

 I would like a way for x2 to also reference the modified x data frame without
 having to reassign x2 <- x each time x is modified.

 Thanks,
 Alex



Re: [R] ecdf

2010-04-19 Thread David Winsemius


On Apr 19, 2010, at 9:12 AM, Downey, Patrick wrote:

 Hi Thierry,

 That worked perfectly. Thanks for the suggestion.

 For reference, in the documentation, it

What is "it"?

 never lists {Hmisc}'s function as
 starting with E instead of e.

Every instance of the documentation in the R help system using ??ecdf
that I can find for Hmisc's version has it properly capitalized. I
have Hmisc_3.7-0.

 I don't know who's in charge of
 documentation

It would be the package maintainer if there were a problem.

 , but that

What is "that"?

--
David

 should probably be corrected.



David Winsemius, MD
West Hartford, CT



Re: [R] ecdf

2010-04-19 Thread David Winsemius



The OP wrote me privately to say that the errant documentation was at:

http://lib.stat.cmu.edu/S/Harrell/help/Hmisc/html/ecdf.html

That is a rather old bit of information. It dates back to a time when
Frank's address was at the University of Virginia. In 2003 he
moved to Vanderbilt, so that page dates from some year prior to 2003.
(And it may not currently be under the control of anyone given its
quasi-archival status.)


--
David.



David Winsemius, MD
West Hartford, CT



[R] Using split and then unsplit

2010-04-19 Thread anna

Hello everyone,
I use the split function, splitting by a factor f, on a data frame with 3
columns and more than 100 000 rows. Once it's split I have a list of data
frames, each still with 3 columns and n rows. I manipulate those list elements
and get a list of data frames still with 3 columns but fewer rows. So when I
unsplit it, I get an error, as I use the same factor I used to split
(f in the split help page), I guess because the number of rows changed. Do I
need to create a new f to be able to unsplit, or is there another
way to unsplit those data frames and rbind them? Thank you!

-
Anna Lippel
-- 
View this message in context: 
http://n4.nabble.com/Using-split-and-then-unsplit-tp2016071p2016071.html
Sent from the R help mailing list archive at Nabble.com.



[R] Grouping rows of data by day

2010-04-19 Thread jennyed

Hi all,

I have a set of data in hourly time steps, with each row identified as

time     data column1   data column2
1
1.042
1.083
1.125
1.167
1.208
1.25
... and so on (the time column is in fractions of a day)

I want to be able to group the data by day. I managed to do this using:

Day1H = hourlydata[c(1:24),]

but I'd like to be able to create groups for each day without doing this
manually for each set of 24 rows. 
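
One way this could be done, sketched on made-up data: since the time column is in fractions of a day, floor(time) gives a day number that can be used to split or aggregate. The column names col1, col2 and day are assumptions for illustration.

```r
# Made-up hourly data covering two days; time is in fractions of a day.
set.seed(1)
hourlydata <- data.frame(time = round(1 + (0:47) / 24, 3),
                         col1 = rnorm(48),
                         col2 = rnorm(48))

# The integer part of the time is the day number.
hourlydata$day <- floor(hourlydata$time)

# A list with one data frame per day (what Day1H was, for every day at once)
by_day <- split(hourlydata, hourlydata$day)

# Or summarise directly, e.g. daily means of both data columns
daily_mean <- aggregate(cbind(col1, col2) ~ day, data = hourlydata, FUN = mean)
```

If the records are known to be complete, a row-based index such as (seq_len(nrow(hourlydata)) - 1) %/% 24 + 1 would give the same grouping without relying on the time values.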

Any suggestions greatly appreciated

Thanks


-- 
View this message in context: 
http://n4.nabble.com/Grouping-rows-of-data-by-day-tp2016063p2016063.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Inline Package: void vs return type functions

2010-04-19 Thread satu

Many Thanks for your help

Best,

Sergio
-- 
View this message in context: 
http://n4.nabble.com/Inline-Package-void-vs-return-type-functions-tp1838423p2015898.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] comparing attitudes of 2 groups / likert scales?

2010-04-19 Thread David Winsemius


On Apr 19, 2010, at 10:08 AM, Tal Galili wrote:


 Good luck in your work,

 The simple solution would be to run many non-paired wilcox on all the 20
 questions (the way Dieter suggested).
 In which case, make sure to adjust for multiple comparisons.  Read about it,
 and see:
 ?p.adjust
 If you have some questions you can merge (by a simple mean of them), it will
 probably do you good (using PCA might be an option, but it could also be an
 over kill for you).
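
The suggestion above can be sketched roughly as follows, on simulated 5-point answers; the data, the group labels and the choice of Holm's method are assumptions for illustration, not part of the original advice.

```r
# Simulated survey: 60 respondents, 2 independent groups, 20 Likert items (1-5).
set.seed(1)
n_q <- 20
group <- factor(rep(c("A", "B"), each = 30))
answers <- as.data.frame(matrix(sample(1:5, 60 * n_q, replace = TRUE),
                                nrow = 60))

# One unpaired Wilcoxon rank-sum test per question.
# exact = FALSE because Likert data are full of ties.
p_raw <- sapply(answers, function(q) {
  wilcox.test(q ~ group, exact = FALSE)$p.value
})

# Adjust the 20 p-values for multiple comparisons.
p_adj <- p.adjust(p_raw, method = "holm")
round(cbind(raw = p_raw, holm = p_adj), 3)
```

Other methods accepted by p.adjust (for example "BH") can be swapped in through the method argument.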


 You might also be interested in plotting your data, here is a nice simple
 hack on how to display the Correlation scatter-plot matrix for your data:

 http://www.r-statistics.com/2010/04/correlation-scatter-plot-matrix-for-ordered-categorical-data/

 And Dieter, thanks for a great quote: "Assess independence, equal variance
 and normality - in that order" (van Bell, Statistical rules of thumb).


If you are thinking of using that quote, you might want to check the  
spelling of his name. My memory is van Belle.


--
David.




Tal


Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il  
(Hebrew) |

www.r-statistics.com (English)
--




On Mon, Apr 19, 2010 at 2:07 PM, Mona_m  
purplem...@blueyonder.co.uk wrote:




 Hi,

 I have just found this forum, and it looks like a great place to get some
 help (I hope)
 For my dissertation, which is due way too soon, I am doing a survey,
 comparing attitudes of 2 independent groups, with 5 scale likert questions.
 Basically I want to show if they have similar or different attitudes. I am
 testing 4 hypotheses, and have in total about 20 questions.

 I have to say my statistic skills are very basic and very rusty, we had some
 lectures two years ago, where we were introduced to R. I looked through my
 notes, and back then we did a one sample t-test to analyse likert type
 questions. I believe I would need to do a 2 sample unpaired t-test.  It
 would be great if someone could give me some feedback if this test is the
 most suitable one for my purpose, and maybe could explain to me what’s the
 easiest way to do this in R?

 You would help me loads!!
 Many thanks in advance
 Mona

--
View this message in context:
http://n4.nabble.com/comparing-attitudes-of-2-groups-likert-scales-tp2015738p2015738.html
Sent from the R help mailing list archive at Nabble.com.



David Winsemius, MD
West Hartford, CT



Re: [R] Using split and then unsplit

2010-04-19 Thread anna

here is an alternative that I just found to join my data frames with rbind:
result <- do.call(rbind, myList)
It worked perfectly but I still don't understand why unsplit wouldn't
work...


-
Anna Lippel
-- 
View this message in context: 
http://n4.nabble.com/Using-split-and-then-unsplit-tp2016071p2016081.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] comparing attitudes of 2 groups / likert scales?

2010-04-19 Thread Dieter Menne


David Winsemius wrote:
 
 
 If you are thinking of using that quote, you might want to check the  
 spelling of his name. My memory is van Belle.
 
 

Sorry, I thought I had corrected that before mailing.

@BOOK{vanBelle2002,
  title = {Statistical rules of thumb},
  publisher = {Wiley series in probability and statistics},
  year = {2002},
  author = {Gerald van Belle}
}


-- 
View this message in context: 
http://n4.nabble.com/comparing-attitudes-of-2-groups-likert-scales-tp2015738p2016083.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Formatting data, adding column names, use reshape, a newbie question

2010-04-19 Thread hadley wickham
On Mon, Apr 19, 2010 at 5:13 AM, Paul Rigor (ucla) pr...@ucla.edu wrote:
 Hi all,
 I'm an R novice.

 I have data that's already formatted as molten that reshape should be able
 to work with. For example, the following was read in with
 read.csv(filename,sep= , header=FALSE)

       V1          V2      V3                  V4           V5
  1    original    book    book.source1.txt    328900494    3039.525
  2    original    book    book.source1.txt    328900494    3057.952


 I would like add column names so I can use reshape's cast method.

names(df) <- c("col1", "col2", ..., "col5") ?
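
A small worked version of that one-liner, using the two rows shown in the question. The column names chosen here (type, category, source, id, value) are hypothetical placeholders.

```r
# Read the two sample rows without a header; columns arrive as V1..V5.
df <- read.table(text = "original book book.source1.txt 328900494 3039.525
original book book.source1.txt 328900494 3057.952",
                 header = FALSE, stringsAsFactors = FALSE)

# Replace the default names with descriptive ones (names are illustrative).
names(df) <- c("type", "category", "source", "id", "value")
str(df)
```

The same names could also be supplied at read time through the col.names argument of read.table or read.csv.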

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Using split and then unsplit

2010-04-19 Thread David Winsemius


On Apr 19, 2010, at 11:06 AM, anna wrote:



 Hello everyone,
 I use the split function splitting with the f function on a 3 columns and
 more than 100 000 rows data frame. Once it's split I have a list of data
 frames still with 3 columns and n rows. I manipulate those list elements and
 get a list of data frames still with 3 columns but less rows. So when I
 unsplit it, I get an error as I use the same factor function I used to split
 ( f in the help split page) I guess because the number of rows changed. Do I
 need to create a new f function to be able to unsplit or is there another
 way to unsplit those data frames and rbind them? Thank you!


You may get success with:

do.call(rbind, splitted)

The other task to set for yourself is to read the Posting Guide again  
and create test cases when posting to r-help.
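
As a minimal test case for this thread, assuming a toy data frame (the names df, pieces and f_new are illustrative): unsplit() puts rows back according to the grouping factor, so the factor has to describe the rows that are left, while do.call(rbind, ...) needs no factor at all.

```r
# Toy data: 3 columns, 3 groups of 4 rows each.
df <- data.frame(g = rep(c("a", "b", "c"), each = 4), x = 1:12, y = 12:1)
pieces <- split(df, df$g)

# Drop rows inside each group, so group sizes no longer match df$g.
pieces <- lapply(pieces, function(d) d[d$x %% 2 == 0, ])

# Option 1: stack the pieces; no grouping factor is needed.
result <- do.call(rbind, pieces)

# Option 2: unsplit() with a factor rebuilt to match the *remaining* rows.
f_new <- rep(names(pieces), sapply(pieces, nrow))
result2 <- unsplit(pieces, f_new)
```

With the original df$g (length 12) as the factor, unsplit() would be asked to fill 12 row positions from only 6 rows, which is where it can be expected to fail.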


--
David Winsemius, MD
West Hartford, CT



Re: [R] Using split and then unsplit

2010-04-19 Thread anna

Hi David, do.call worked perfectly, but do you have an idea why unsplit
wouldn't work in that case? Is that because the number of rows changed? When
the number didn't change, unsplit worked.

-
Anna Lippel
-- 
View this message in context: 
http://n4.nabble.com/Using-split-and-then-unsplit-tp2016071p2016116.html
Sent from the R help mailing list archive at Nabble.com.



[R] stupid regexp question

2010-04-19 Thread servet ahmet çizmeli
Hello,

I have a stupid regexp question. I have a large data frame of strings. I would
like to convert all occurrences of:

W.m^{-2}

to

W/m2

I ran the following test:

gsub(glob2rx("W.m^{-2}"), "W/m2", "W.m^{-2}")

but it does not seem to work. I don't know how to do it otherwise, as I could
never learn how to deal with the special characters (like .^{}) in regexps.

Thanks in advance for your kind help
servet



[R] Drawing a line with misc3d

2010-04-19 Thread Christophe Genolini

Hi the list,

I would like to draw some lines with misc3d. I find a lot of tools to 
draw surfaces, but nothing for simple line... Is it possible?
Note that I know that it is possible to draw lines with rgl (using 
lines3d), but I need to do it with misc3d to export the drawing in .asy 
format.

Any solution?

Christophe



Re: [R] var.test

2010-04-19 Thread David Winsemius


On Apr 18, 2010, at 4:55 PM, anon anon wrote:


 Hello,

 I'm using var.test to do a simple F-test for equality of variances. I think
 I'm missing something small here:

 > m <- rnorm(10,sd=1)
 > n <- rnorm(5,sd=1)
 > var.test(m,n)

    F test to compare two variances

 data:  m and n
 F = 13.7438, num df = 9, denom df = 4, p-value = 0.02256
 alternative hypothesis: true ratio of variances is not equal to 1
 95 percent confidence interval:
   1.543430 64.844094
 sample estimates:
 ratio of variances
           13.74375

 > qf(.0250,9,4)*var(m)/var(n)
 [1] 2.912997   <- correct degrees of freedom (I think!) and does not match
                   var.test lower bound
 > qf(.0250,4,9)*var(m)/var(n)
 [1] 1.543430   <- matches var.test lower bound with degrees of freedom
                   reversed


The var.test code is available for inspection:

getAnywhere(var.test.default)

It can be seen to use the ratio of the estimate to the theoretic qf  
value. Was there a reason you decided to use the product?


BETA <- (1 - conf.level)/2
CINT <- c(ESTIMATE/qf(1 - BETA, DF.x, DF.y), ESTIMATE/qf(BETA,
    DF.x, DF.y))






 It seems that the F-test in var.test is getting the degrees of freedom mixed
 up. Outside calculators seem to agree with the qf function.


I would think that inverting the estimate should reverse the correct  
order for the degrees of freedom, but it is not clear that your choice  
for the CI calculation is the correct one.
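
The two calculations can be reconciled with a short check, sketched here on freshly simulated data (so the numbers will differ from those in the question). It relies on the standard identity 1/qf(1 - p, a, b) == qf(p, b, a): dividing the estimate by an upper quantile with (9, 4) degrees of freedom is the same as multiplying it by a lower quantile with the degrees of freedom swapped.

```r
set.seed(1)
m <- rnorm(10, sd = 1)
n <- rnorm(5, sd = 1)

est <- var(m) / var(n)                 # ratio of variances, df = (9, 4)
ci  <- as.numeric(var.test(m, n)$conf.int)

# Lower bound as computed inside var.test: estimate / upper quantile
lower_by_ratio <- est / qf(0.975, 9, 4)

# The same number written as a product, with the df reversed
lower_by_product <- est * qf(0.025, 4, 9)

c(var.test = ci[1], ratio = lower_by_ratio, product = lower_by_product)
```

All three values agree, which suggests the degrees of freedom in var.test are not mixed up; the reversal comes from turning a division into a multiplication.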




So, am I misunderstanding something?


--

David Winsemius, MD
West Hartford, CT



[R] Tinn-R

2010-04-19 Thread Robert Ruser
Hello,
I want to use the free distribution of R (R REvolution 3.2) and Tinn-R
editor as well. Unfortunately they don't cooperate. In Tinn-R
commands: send selection, send line etc. don't work. Do you have any
idea how to resolve this problem?

Best,
Robert



Re: [R] stupid regexp question

2010-04-19 Thread David Winsemius


On Apr 19, 2010, at 11:39 AM, servet ahmet çizmeli wrote:


 Hello,

 I have a stupid regexp question. I have a large data frame of strings. I would
 like to convert all occurences of :

 W.m^{-2}

 to

 W/m2

 I make the following test :

 gsub(glob2rx("W.m^{-2}"), "W/m2", "W.m^{-2}")


Two problems I see. There is no reason I can see to wrap the pattern  
in glob2rx, and you need to double-back-slash the specials when they  
appear in the pattern:


> gsub("W.m\\^\\{-2\\}", "W/m2", "W.m^{-2}")
[1] "W/m2"

Seems successful on that limited test.
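
A slightly stricter variant, offered as a sketch: in the pattern above the "." is still a regex wildcard, so it would also match strings such as "Wxm^{-2}". Escaping the dot as well restricts the match to a literal period. The test strings are made up.

```r
# Escape every regex special character, including the dot.
pattern <- "W\\.m\\^\\{-2\\}"

cleaned <- gsub(pattern, "W/m2", c("flux in W.m^{-2}", "Wxm^{-2}"))
cleaned
# The first string is converted; the second is left alone because
# "x" is not a literal period.
```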



--

David Winsemius, MD
West Hartford, CT



Re: [R] stupid regexp question

2010-04-19 Thread Gabor Grothendieck
Use fixed = TRUE to turn off interpretation of special characters:

gsub("W.m^{-2}", "W/m2", "abc W.m^{-2} xyz", fixed = TRUE)
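
Since the question mentions a large data frame of strings, the same call can be applied to every column. This is a sketch that assumes all columns are character vectors; the data frame units_df and its contents are invented for illustration.

```r
# Invented example data: every column holds strings.
units_df <- data.frame(unit  = c("W.m^{-2}", "K"),
                       label = c("flux (W.m^{-2})", "none"),
                       stringsAsFactors = FALSE)

# Apply the literal (fixed = TRUE) replacement column by column.
# Assigning into units_df[] keeps the data frame structure.
units_df[] <- lapply(units_df, function(col) {
  gsub("W.m^{-2}", "W/m2", col, fixed = TRUE)
})
units_df
```

If some columns are factors or numbers, they would need to be skipped or converted first, for example by testing is.character(col) inside the function.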


2010/4/19 servet ahmet çizmeli sa.cizm...@usherbrooke.ca:
 Hello,

 I have a stupid regexp question. I have a large data frame of strings. I would
 like to convert all occurences of :

 W.m^{-2}

 to

 W/m2

 I make the following test :

 gsub(glob2rx("W.m^{-2}"), "W/m2", "W.m^{-2}")

 but it does not seem to work. I don't know how to do it otherwise as I could
 never learn how to deal with the special characters (like .^{}) in regexps.

 Thanks from advance for your kindly help
 servet



Re: [R] BRugs

2010-04-19 Thread N S Ha

Thanks for the reply Bob, but it still does not work, you see. I ran this
model, just with the main effects and it ran fine.

n=length(bi.bmi)

Lgen=2
Lrace=5
Lagegp=13
Lstra=15
Lpsu=2

bi.bmi.model=function(){
  # likelihood
  for(i in 1:n){
    bi.bmi[i] ~ dbern(p[i])
    # the trailing "+" keeps this a single statement across the line break
    logit(p[i]) <- a0 + a1[agegp[i]] + a2[gen[i]] + a3[race[i]] +
                   g[stra[i]] + u[psu[i],stra[i]]
  }
  # constraints for a1, a2, a3
  a1[1] <- 0.0
  a2[1] <- 0.0
  a3[1] <- 0.0
  # priors
  a0 ~ dnorm(0.0, 1.0E-4)
  for(j in 2:Lagegp){ a1[j] ~ dnorm(0.0, 1.0E-4) }
  for(j in 2:Lgen){ a2[j] ~ dnorm(0.0, 1.0E-4) }
  for(k in 2:Lrace){ a3[k] ~ dnorm(0.0, 1.0E-4) }

  for(l in 1:Lstra){
    g[l] ~ dunif(0, 100)
  }
  for(m in 1:Lpsu){
    for(l in 1:Lstra){
      u[m,l] ~ dnorm(0.0, tau.u)
    }
  }
  tau.u <- pow(sigma.u, -2)
  sigma.u ~ dunif(0.0,100)
}
 
library(BRugs)
writeModel(bi.bmi.model, con='bi.bmi.model.txt')
model.data=list( 'n','Lagegp', 'Lgen', 'Lrace', 'Lstra', 'Lpsu',
                 'bi.bmi','agegp', 'gen', 'race','stra', 'psu')
model.init=function(){
list( sigma.u=runif(1),
a0=rnorm(1), a1=c(NA, rep(0,12)),
a2=c(NA, rep(0, 1)),
a3=c(NA, rep(0, 4)), 
g=rep(0,Lstra), u=matrix(rep(0, 30), nrow=2))
}
model.parameters=c( 'a0', 'a1', 'a2', 'a3')
model.bugs=BRugsFit(modelFile='bi.bmi.model.txt',
   data=model.data,
   inits=model.init,
   numChains=1, 
   para=model.parameters,
   nBurnin=50, nIter=100)

This is just with the main effects, and this does not give me any problems,
and I also ran the following model with interaction term between gen and
race, and it also ran fine.
for (i in 1:n){
  bi.bmi[i] ~ dbern(p[i])
  # the trailing "+" keeps this a single statement across the line breaks
  logit(p[i]) <- a0 + a1[agegp[i]] + a2[gen[i]] + a3[race[i]] +
                 a23[gen[i], race[i]] +
                 gam[stra[i]] + u[psu[i],stra[i]]
}
# constraints for a2, a3, a12 and a13
a1[1] <- 0.0
a2[1] <- 0.0
a3[1] <- 0.0
a23[1,1] <- 0.0

#gen x race
for(j in 2:Lrace){ a23[1,j] <- 0.0 }
for(k in 2:Lgen){ a23[k,1] <- 0.0 }
# priors
a0 ~ dnorm(0.0, 1.0E-4)
for(i in 2:Lagegp){ a1[i] ~ dnorm(0.0, 1.0E-4) }
for(i in 2:Lgen){ a2[i] ~ dnorm(0.0, 1.0E-4) }
for(i in 2:Lrace){ a3[i] ~ dnorm(0.0, 1.0E-4) }
for(i in 2:Lgen){
  for(j in 2:Lrace){
    a23[i,j] ~ dnorm(0.0, 1.0E-4)
  }
}
for(i in 1:Lstra){
  gam[i] ~ dunif(0, 1000)
}
for(i in 1:Lpsu){
  for(j in 1:Lstra){
    u[i,j] ~ dnorm(0.0, tau.u)
  }
}
tau.u <- pow(sigma.u, -2)
sigma.u ~ dunif(0.0,100)
}

So, the error happens only when I try to plug in interaction with the agegp.
I still don't know how to correct it.
Thanks

-- 
View this message in context: http://n4.nabble.com/BRugs-tp2015395p2016164.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] getting column's main

2010-04-19 Thread Jannis

Try

col=max.col(data)


This should give you the index of the column with the max value. To get 
to the final result, combine this with


names(dataframe)[col]


to get the name of the column with the maximum value.
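
Put together on a toy table, this might look as follows. The production figures are invented, and the sketch assumes (as in the question) that the first column holds country names, which has to be dropped before calling max.col because it expects a numeric matrix.

```r
# Invented figures; first column is the country, the rest are years.
potatoes <- data.frame(country = c("Lithuania", "Latvia"),
                       `1998` = c(10, 5),
                       `1999` = c(30, 7),
                       `2000` = c(20, 9),
                       check.names = FALSE, stringsAsFactors = FALSE)

years <- potatoes[, -1]                       # numeric columns only
idx <- max.col(as.matrix(years), ties.method = "first")

# Column name (year) of the maximum in each row
peak <- data.frame(country = potatoes$country,
                   peak_year = names(years)[idx],
                   stringsAsFactors = FALSE)
peak
```

The ties.method argument matters when a maximum occurs in more than one year; the default picks one of the tied columns at random.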


HTH
Jannis

AuriDUL wrote:

Hello.

I have data of potatoes production in EU during 1998-2009 from EuroStat
where the first column consists of the names of EU countries, the following
columns consists of appropriate data in each year.

Let's say, I investigate Lithuania. For example, I have a row containing
potatoes production in 1998, 1999, ..., 2009 in Lithuania. I can easily find
the maximum value in this row but...

...but

How could I print the year (the name of the column) where that maximum value
of potatoes production in Lithuania exists? [if it's even possible.]

I only have a code which can print a number of the column where that maximum
value of Lithuania's potatoes production during 1998-2009 is.

Thanks in advance.





Re: [R] Tinn-R

2010-04-19 Thread Tal Galili
Consider trying
notepad++
with
NppToR

That's what I use (it also works with the REvolution distribution)



Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--




On Mon, Apr 19, 2010 at 6:50 PM, Robert Ruser robert.ru...@gmail.com wrote:

 Hello,
 I want to use the free distribution of R (R REvolution 3.2) and Tinn-R
 editor as well. Unfortunately they don't cooperate. In Tinn-R
 commands: send selection, send line etc. don't work. Do you have any
 idea how to resolve this problem?

 Best,
 Robert



Re: [R] Follow up on installing formatR...

2010-04-19 Thread Michael Lawrence
On Mon, Apr 19, 2010 at 5:28 AM, Brian Lunergan ff...@ncf.ca wrote:

 Good morning folks:

 Made a second go at installing this and succeeded, but with some strange
 behaviours along the way. First the system back story.


My only guess is that installing RGtk2 from source is only possible if you
have the gtk-devel package installed, i.e., the one with the headers.
Otherwise, you'll need the r-cran-rgtk2 binary.

Michael

 Using up to date edition Ubuntu 8.04

 Using up to date edition of R 2.10.1

 Using the Toronto, Ontario repository to draw from.

 After the first attempt ended up with two of the four dependencies
 installed so only the Rgtk related items would be needed this time. Went
 into synaptic and found r-cran-rgtk2. Okay. Sounded likely so I installed
 it successfully.

 Using a root terminal session I went back into R and ran
 install.packages("formatR"). Pulled in two dependencies plus the main
 choice. Rejected one and kept two when it came time to install but still
 installed successfully, or so it seemed when I ran library(formatR). Go
 figure. A copy of the message run follows. Any comments or suggestions will
 be of interest.

  > install.packages("formatR")
 Warning in install.packages("formatR") :
  argument 'lib' is missing: using '/usr/local/lib/R/site-library'
 --- Please select a CRAN mirror for use in this session ---
 Loading Tcl/Tk interface ... done
 also installing the dependencies ‘RGtk2’, ‘gWidgetsRGtk2’

 trying URL 'http://probability.ca/cran/src/contrib/RGtk2_2.12.18.tar.gz'
 Content type 'application/x-gzip' length 2206504 bytes (2.1 Mb)
 opened URL
 ==
 downloaded 2.1 Mb

 trying URL 'http://probability.ca/cran/src/contrib/gWidgetsRGtk2_0.0-64.tar.gz'
 Content type 'application/x-gzip' length 138192 bytes (134 Kb)
 opened URL
 ==
 downloaded 134 Kb

 trying URL 'http://probability.ca/cran/src/contrib/formatR_0.1-3.tar.gz'
 Content type 'application/x-gzip' length 2672 bytes
 opened URL
 ==
 downloaded 2672 bytes

 * installing *source* package ‘RGtk2’ ...
 checking for pkg-config... /usr/bin/pkg-config
 checking pkg-config is at least version 0.9.0... yes
 checking for LIBGLADE... no
 configure: WARNING: libglade not found
 checking for INTROSPECTION... no
 checking for GTK... no
 configure: error: GTK version 2.8.0 required
 ERROR: configuration failed for package ‘RGtk2’
 * removing ‘/usr/local/lib/R/site-library/RGtk2’
 * installing *source* package ‘gWidgetsRGtk2’ ...
 ** R
 ** inst
 ** preparing package for lazy loading
 ** help
 *** installing help indices
 ** building package indices ...
 * DONE (gWidgetsRGtk2)
 * installing *source* package ‘formatR’ ...
 ** R
 ** preparing package for lazy loading
 Loading required package: gWidgets
 Loading required package: MASS
 ** help
 *** installing help indices
 ** building package indices ...
 * DONE (formatR)

 The downloaded packages are in
‘/tmp/RtmpNL20Be/downloaded_packages’
 Warning message:
 In install.packages("formatR") :
  installation of package 'RGtk2' had non-zero exit status
 

 --
 Brian Lunergan
 Nepean, Ontario
 Canada



[R] How to make a boxplot with exclusion of certain groups

2010-04-19 Thread Josef . Kardos
This seems like a simple thing, but I have been stuck for some time.  My 
data has 2 columns.  Column 1 is the value, and column 2 is the Site where 
data was collected.  Column 2 contains 7 different Sites (i.e. groups).  I 
am only interested in showing 3 groups on a single boxplot. 

I have tried various methods of subsetting the data, in order to only have 
the 3 groups in my subset.  However even after doing this, all 7 groups 
carry forward, so that when I make a boxplot of my subsetted data, all 7 
groups still appear in the x-axis labels; all 7 groups also appear in the 
boxplot summary (i.e. the values returned with boxplot (…plot=FALSE)  ) . 
Even if I delete the unwanted groups from the ‘levels’ of Column 2, they 
still appear on the plot, and in the boxplot summary statistics.

There are various tricks I can do with the boxplot summary statistics to 
correct for this, but they get complicated when I want to change the 
algorithm for calculating outliers and their corresponding groups. Rather 
than do all these tricks, it seems much simpler to fully exclude the 
unwanted groups from the beginning.  But this doesn’t appear to work

Any ideas?



[R] Identifying names of matrix columns shared by many matrices

2010-04-19 Thread Marshall Feldman
Greetings R-Geniuses,

What is the most efficient way to handle the problem described below?

Thanks
 Marsh Feldman


Problem description:

Each U.S. state has its own matrix. The rows are dates, the columns are 
industries, and each cell contains total statewide employment at the 
given time and industry. There is a similar matrix for the U.S. as a 
whole. Due to disclosure rules and other limitations, one or more 
industries may be missing from any given matrix (including the national 
one), but industries missing from one matrix are sometimes not missing 
from others. Industry numbers are treated as factors commonly used as 
column names.

I want to do three things:

   1. For any given set of states, find the set of industries present in
      all of them and use this to select this subset of industries from
      each state's matrix.
   2. For any given set of states, find the set of industries present in
      any of the states.
   3. Given that one or more cells in the table may be NA, identify
      those industries that are present in all states and have no values
      equal to NA.
I can do this using for() statements and %in%, but is there a more 
efficient way? Your thoughts?
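
One loop-free possibility, sketched under the assumption that the state matrices are kept in a named list with industry codes as column names. The list state_mats, the state abbreviations and the industry codes are all invented for illustration.

```r
dates <- c("2009-01", "2009-02")
state_mats <- list(
  RI = matrix(c(1, 2, 3, 4, 5, NA), nrow = 2,
              dimnames = list(dates, c("311", "312", "313"))),
  MA = matrix(1:4, nrow = 2,
              dimnames = list(dates, c("311", "313"))),
  CT = matrix(1:6, nrow = 2,
              dimnames = list(dates, c("311", "313", "314")))
)

# 1. Industries present in all states, and each matrix cut down to them
common  <- Reduce(intersect, lapply(state_mats, colnames))
subsets <- lapply(state_mats, function(m) m[, common, drop = FALSE])

# 2. Industries present in any state
any_ind <- Reduce(union, lapply(state_mats, colnames))

# 3. Common industries with no NA in any state
has_na   <- sapply(common, function(ind) {
  any(sapply(subsets, function(m) anyNA(m[, ind])))
})
complete <- common[!has_na]
```

To work on a chosen set of states, the list can be indexed first, for example state_mats[c("RI", "MA")], before applying the same three steps.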



[R] plotting RR, 95% CI as table and figure in same plot

2010-04-19 Thread David Atkins


Hi all--

I am in the process of helping colleagues write up a ms in which we fit 
zero-inflated Poisson models.  I would prefer plotting the rate ratios 
and 95% CI (as I've found Gelman and others convincing about plotting 
tables...), but our journals usually like the numbers themselves.


Thus, I'm looking at a recent JAMA article in which both numbers and 
dotplot of RR and 95% CI are presented and wondering about best way to 
do this in R.


Essentially, the plot has 3 columns: variable names, RR and 95% CI, and 
dotplot of the same.


Using the bioChemists data in the pscl package and errbar function in 
Hmisc package, the code below is in the right direction... but still 
pretty ugly.


Wondering if folks would have alternative suggestions about how to go 
about this, or pointers on cleaning up the code below (eg, I know there 
are many functions for plotting errbars/CI).


[And, obviously, there are somethings that would be straightforward to 
clean-up such as supplying better variable names, etc., just wanted to 
see if there were better overall suggestions before getting too far on 
this route.]


Thanks in advance.

cheers, Dave

library(Hmisc)
library(pscl)   

## data
data(bioChemists, package = "pscl")
fm_pois <- glm(art ~ ., data = bioChemists, family = poisson)
summary(fm_pois)

### pull out rate-ratios and 95% CI
rr <- exp(cbind(coef(fm_pois), confint(fm_pois)))
rr
### round to 2 decimal places
rr <- round(rr, 2)

### plot
par(mfrow=c(1,3))
plot(0, type = "n", xlim=c(0,2), ylim=c(1,6),
    axes = FALSE, ylab=NULL, xlab=NULL)
text(row.names(rr), x = 1, y = 1:6)

plot(0, type = "n", xlim=c(0,2), ylim=c(1,6),
    axes = FALSE, ylab=NULL, xlab=NULL)
text(paste(rr[,1], " [", rr[,2], ", ", rr[,3], "]", sep = ""), x = 1, y = 1:6)


errbar(x = factor(row.names(rr)),
    y = rr[,1], yplus = rr[,3],
    yminus = rr[,2])
abline(v = 1, lty =2)
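
One small clean-up that may help with the table column, shown on made-up numbers rather than the fitted model: round() followed by paste() drops trailing zeros (1.20 prints as 1.2), whereas sprintf() keeps a fixed number of decimals. The matrix rr_demo and its row names are hypothetical.

```r
# Made-up rate ratios with lower and upper 95% limits.
rr_demo <- rbind(c(1.36, 1.20, 1.55),
                 c(0.80, 0.71, 0.90))
rownames(rr_demo) <- c("var1", "var2")

# Fixed two-decimal labels of the form "RR [lower, upper]"
ci_labels <- sprintf("%.2f [%.2f, %.2f]",
                     rr_demo[, 1], rr_demo[, 2], rr_demo[, 3])
ci_labels
```

These labels could then be passed to text() in place of the paste() call.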

--
Dave Atkins, PhD
Research Associate Professor
Department of Psychiatry and Behavioral Science
University of Washington
datk...@u.washington.edu

Center for the Study of Health and Risk Behaviors (CSHRB)   
1100 NE 45th Street, Suite 300  
Seattle, WA  98105  
206-616-3879
http://depts.washington.edu/cshrb/
(Mon-Wed)   

Center for Healthcare Improvement, for Addictions, Mental Illness,
  Medically Vulnerable Populations (CHAMMP)
325 9th Avenue, 2HH-15
Box 359911
Seattle, WA 98104?
206-897-4210
http://www.chammp.org
(Thurs)



Re: [R] How to make a boxplot with exclusion of certain groups

2010-04-19 Thread Chuck Cleland
On 4/19/2010 12:21 PM, josef.kar...@phila.gov wrote:
 This seems like a simple thing, but I have been stuck for some time.  My 
 data has 2 columns.  Column 1 is the value, and column 2 is the Site where 
 data was collected.  Column 2 contains 7 different Sites (i.e. groups).  I 
 am only interested in showing 3 groups on a single boxplot. 
 
 I have tried various methods of subsetting the data, in order to only have 
 the 3 groups in my subset.  However even after doing this, all 7 groups 
 carry forward, so that when I make a boxplot of my subsetted data, all 7 
 groups still appear in the x-axis labels; all 7 groups also appear in the 
 boxplot summary (i.e. the values returned with boxplot (…plot=FALSE)  ) . 
 Even if I delete the unwanted groups from the ‘levels’ of Column 2, they 
 still appear on the plot, and in the boxplot summary statistics.
 
 There are various tricks I can do with the boxplot summary statistics to 
 correct for this, but they get complicated when I want to change the 
 algorithm for calculating outliers and their corresponding groups. Rather 
 than do all these tricks, it seems much simpler to fully exclude the 
 unwanted groups from the beginning.  But this doesn’t appear to work
 
 Any ideas?

library(gdata) # for drop.levels()

DF <- data.frame(site = rep(LETTERS[1:7], each=20), y = runif(7*20))

boxplot(y ~ drop.levels(site), data=subset(DF, site %in% c('A','D','F'),
drop=TRUE))
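
For readers on a current version of R, a similar result should be obtainable without gdata, since base R now provides droplevels() (it was not yet available in R 2.10.x). A sketch on the same kind of simulated data; the object names sub and bp are illustrative.

```r
set.seed(1)
DF <- data.frame(site = factor(rep(LETTERS[1:7], each = 20)),
                 y = runif(7 * 20))

# Keep three sites, then discard the unused factor levels.
sub <- droplevels(subset(DF, site %in% c("A", "D", "F")))

# Only the three remaining groups appear in the statistics and on the axis.
bp <- boxplot(y ~ site, data = sub, plot = FALSE)
bp$names
```

Calling boxplot(y ~ site, data = sub) without plot = FALSE draws the three-group plot.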


-- 
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894



Re: [R] plotting RR, 95% CI as table and figure in same plot

2010-04-19 Thread Thomas Lumley



You could try the forestplot() function in rmeta, or the original grid code on 
which it is based,
  http://www.stat.auckland.ac.nz/~paul/RGraphics/chapter1.html

-thomas

On Mon, 19 Apr 2010, David Atkins wrote:



Hi all--

I am in the process of helping colleagues write up a ms in which we fit 
zero-inflated Poisson models.  I would prefer plotting the rate ratios and 
95% CI (as I've found Gelman and others convincing about plotting tables...), 
but our journals usually like the numbers themselves.


Thus, I'm looking at a recent JAMA article in which both numbers and dotplot 
of RR and 95% CI are presented and wondering about best way to do this in R.


Essentially, the plot has 3 columns: variable names, RR and 95% CI, and 
dotplot of the same.


Using the bioChemists data in the pscl package and errbar function in Hmisc 
package, the code below is in the right direction... but still pretty ugly.


Wondering if folks would have alternative suggestions about how to go about 
this, or pointers on cleaning up the code below (eg, I know there are many 
functions for plotting errbars/CI).


[And, obviously, there are some things that would be straightforward to 
clean up, such as supplying better variable names, etc.; I just wanted to see 
if there were better overall suggestions before getting too far on this route.]


Thanks in advance.

cheers, Dave

library(Hmisc)
library(pscl)

## data
data(bioChemists, package = "pscl")
fm_pois <- glm(art ~ ., data = bioChemists, family = poisson)
summary(fm_pois)

### pull out rate-ratios and 95% CI
rr <- exp(cbind(coef(fm_pois), confint(fm_pois)))
rr
### round to 2 decimal places
rr <- round(rr, 2)

### plot
par(mfrow=c(1,3))
plot(0, type = "n", xlim=c(0,2), ylim=c(1,6),
axes = FALSE, ylab=NULL, xlab=NULL)
text(row.names(rr), x = 1, y = 1:6)

plot(0, type = "n", xlim=c(0,2), ylim=c(1,6),
axes = FALSE, ylab=NULL, xlab=NULL)
text(paste(rr[,1], " [", rr[,2], ", ", rr[,3], "]", sep = ""), x = 1, y = 1:6)


errbar(x = factor(row.names(rr)),
y = rr[,1], yplus = rr[,3],
yminus = rr[,2])
abline(v = 1, lty =2)

--
Dave Atkins, PhD
Research Associate Professor
Department of Psychiatry and Behavioral Science
University of Washington
datk...@u.washington.edu

Center for the Study of Health and Risk Behaviors (CSHRB)   
1100 NE 45th Street, Suite 300  
Seattle, WA  98105  
206-616-3879
http://depts.washington.edu/cshrb/
(Mon-Wed)   

Center for Healthcare Improvement, for Addictions, Mental Illness,
 Medically Vulnerable Populations (CHAMMP)
325 9th Avenue, 2HH-15
Box 359911
Seattle, WA 98104?
206-897-4210
http://www.chammp.org
(Thurs)




Thomas Lumley                   Assoc. Professor, Biostatistics
tlum...@u.washington.edu        University of Washington, Seattle



Re: [R] dataframe

2010-04-19 Thread Jeff Brown

http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f
-- 
View this message in context: 
http://n4.nabble.com/dataframe-tp2015650p2016230.html
Sent from the R help mailing list archive at Nabble.com.



[R] How to pass a list of parameters into a function

2010-04-19 Thread Gene Leynes
Does anyone know how to pass a list of parameters into a function?


for example:

somefun=function(x1,x2,x3,x4,x5,x6,x7,x8,x9){
ans=x1+x2+x3+x4+x5+x6+x7+x8+x9
return(ans)
}

somefun(1,2,3,4,5,6,7,8,9)

# I would like this to work:
temp=c(x3=3,x4=4,x5=5,x6=6,x7=7,x8=8,x9=9)
somefun(x1=1,x2=2,temp)

# OR I would like this to work:
temp=list(x3=3,x4=4,x5=5,x6=6,x7=7,x8=8,x9=9)
somefun(x1=1,x2=2,temp)



Re: [R] BRugs

2010-04-19 Thread Uwe Ligges
Perhaps a better idea is to ask on a BUGS mailing list. BRugs is just an 
interface to OpenBUGS and is not involved in handling the BUGS language.


I'd also suggest that you start by trying your problem without BRugs, in 
OpenBUGS directly, in order to avoid confusion caused by the interface.


Best wishes,
Uwe Ligges

On 19.04.2010 18:04, N S Ha wrote:


Thanks for the reply Bob, but it still does not work, you see. I ran this
model, just with the main effects and it ran fine.

n=length(bi.bmi)

Lgen=2
Lrace=5
Lagegp=13
Lstra=15
Lpsu=2

bi.bmi.model=function(){
# likelihood
for(i in 1:n){
bi.bmi[i]~ dbern(p[i])
logit(p[i]) <- a0 + a1[agegp[i]]+a2[gen[i]]+a3[race[i]] +
g[stra[i]]+ u[psu[i],stra[i]]
}
# constraints for a1, a2, a3
a1[1] <- 0.0
a2[1] <- 0.0
a3[1] <- 0.0
# priors
a0~ dnorm(0.0, 1.0E-4)
for(j in 2:Lagegp){a1[j]~ dnorm(0.0, 1.0E-4)}
for(j in 2:Lgen){ a2[j]~ dnorm(0.0, 1.0E-4)}
for(k in 2:Lrace){ a3[k]~ dnorm(0.0, 1.0E-4)}

for(l in 1:Lstra){
g[l]~dunif(0, 100)
}
for( m in 1:Lpsu){
for(l in 1:Lstra){
u[m,l]~ dnorm(0.0, tau.u)
}}
tau.u <- pow(sigma.u, -2)
sigma.u~ dunif(0.0,100)
}

library(BRugs)
writeModel(bi.bmi.model, con='bi.bmi.model.txt')
model.data=list( 'n','Lagegp', 'Lgen', 'Lrace', 'Lstra', 'Lpsu',

'bi.bmi','agegp', 'gen', 'race','stra', 'psu')
model.init=function(){
list( sigma.u=runif(1),
a0=rnorm(1), a1=c(NA, rep(0,12)),
a2=c(NA, rep(0, 1)),
a3=c(NA, rep(0, 4)),
g=rep(0,Lstra), u=matrix(rep(0, 30), nrow=2))
}
model.parameters=c( 'a0', 'a1', 'a2', 'a3')
model.bugs=BRugsFit(modelFile='bi.bmi.model.txt',
   data=model.data,
   inits=model.init,
   numChains=1,
   para=model.parameters,
   nBurnin=50, nIter=100)

This is just with the main effects, and this does not give me any problems,
and I also ran the following model with interaction term between gen and
race, and it also ran fine.
for (i in 1:n){
bi.bmi[i]~ dbern(p[i])
logit(p[i]) <- a0 + a1[agegp[i]]+a2[gen[i]]+a3[race[i]] +
   a23[gen[i], race[i]] +
   gam[stra[i]]+ u[psu[i],stra[i]]
}
# constraints for a2, a3, a12 and a13
a1[1] <- 0.0
a2[1] <- 0.0
a3[1] <- 0.0
a23[1,1] <- 0.0

#gen x race
for(j in 2:Lrace){ a23[1,j] <- 0.0}
for(k in 2:Lgen){ a23[k,1] <- 0.0}
# priors
a0~ dnorm(0.0, 1.0E-4)
for(i in 2:Lagegp){a1[i]~dnorm(0.0, 1.0E-4)}
for(i in 2:Lgen){ a2[i]~ dnorm(0.0, 1.0E-4)}
for(i in 2:Lrace){ a3[i]~ dnorm(0.0, 1.0E-4)}
for(i in 2:Lgen){
for(j in 2:Lrace){
a23[i,j]~ dnorm(0.0, 1.0E-4)
}}
for(i in 1:Lstra){
gam[i]~dunif(0, 1000)
}
for( i in 1:Lpsu){
for(j in 1:Lstra){
u[i,j]~ dnorm(0.0, tau.u)
}}
tau.u <- pow(sigma.u, -2)
sigma.u~ dunif(0.0,100)
}

So, the error happens only when I try to plug in interaction with the agegp.
I still don't know how to correct it.
Thanks





Re: [R] S4-based package failure: setGeneric example#1

2010-04-19 Thread Daniel Murphy
Never mind, I figured out my problem after looking at the packS4 package.

I didn't realize that package.skeleton would build invalid R code. The
warning hints at that, but I interpreted the warning to mean that the
package.skeleton-generated code should be *edited*, and none of my edits
gave me a workable package. Now I realize that the S4-driven code from
package.skeleton can simply be *replaced by the original code*, which is
what packS4 does.

-D

I am a newbie package builder who successfully built a "Hello world"
package but am now having trouble building a package with S4
functionality. I thought I would start by building a package consisting
of just the first example under the setGeneric help page in a fresh
2.10.0 (windows) console (methods loaded at startup). The example:

## create a new generic function, with a default method
props <- function(object) attributes(object)
setGeneric("props")

I executed those commands, then

package.skeleton("props", list="props", namespace=TRUE)

I edited the help files, put "Depends: methods" into the DESCRIPTION file,
and put "export(props)" as the only line in the NAMESPACE file.



Re: [R] How to pass a list of parameters into a function

2010-04-19 Thread Henrique Dallazuanna
Try this:

do.call(somefun, c(x1 = 1, x2 = 2, as.list(temp)))
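
For reference, here is a self-contained sketch of the do.call() idea, covering both the named-vector and the list form of temp from the question. The names temp_vec, temp_list, res_vec and res_list are only for illustration; do.call() simply calls the function with the elements of the list as its (named) arguments.

```r
somefun <- function(x1, x2, x3, x4, x5, x6, x7, x8, x9) {
  x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9
}

# the two forms of 'temp' from the question
temp_vec  <- c(x3 = 3, x4 = 4, x5 = 5, x6 = 6, x7 = 7, x8 = 8, x9 = 9)
temp_list <- list(x3 = 3, x4 = 4, x5 = 5, x6 = 6, x7 = 7, x8 = 8, x9 = 9)

# build one list of named arguments, then hand it to do.call()
res_vec  <- do.call(somefun, c(list(x1 = 1, x2 = 2), as.list(temp_vec)))
res_list <- do.call(somefun, c(list(x1 = 1, x2 = 2), temp_list))

res_vec
res_list
```

Both calls should give the same result as somefun(1,2,3,4,5,6,7,8,9), because the arguments are matched by name.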


On Mon, Apr 19, 2010 at 1:58 PM, Gene Leynes gleyne...@gmail.com wrote:

 Does anyone know how to pass a list of parameters into a function?


 for example:

 somefun=function(x1,x2,x3,x4,x5,x6,x7,x8,x9){
ans=x1+x2+x3+x4+x5+x6+x7+x8+x9
return(ans)
 }

 somefun(1,2,3,4,5,6,7,8,9)

 # I would like this to work:
 temp=c(x3=3,x4=4,x5=5,x6=6,x7=7,x8=8,x9=9)
 somefun(x1=1,x2=2,temp)

 # OR I would like this to work:
 temp=list(x3=3,x4=4,x5=5,x6=6,x7=7,x8=8,x9=9)
 somefun(x1=1,x2=2,temp)





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] How to pass a list of parameters into a function

2010-04-19 Thread Barry Rowlingson
On Mon, Apr 19, 2010 at 5:58 PM, Gene Leynes gleyne...@gmail.com wrote:
 Does anyone know how to pass a list of parameters into a function?


 for example:

 somefun=function(x1,x2,x3,x4,x5,x6,x7,x8,x9){
    ans=x1+x2+x3+x4+x5+x6+x7+x8+x9
    return(ans)
 }

 somefun(1,2,3,4,5,6,7,8,9)

 # I would like this to work:
 temp=c(x3=3,x4=4,x5=5,x6=6,x7=7,x8=8,x9=9)
 somefun(x1=1,x2=2,temp)

 # OR I would like this to work:
 temp=list(x3=3,x4=4,x5=5,x6=6,x7=7,x8=8,x9=9)
 somefun(x1=1,x2=2,temp)

 Why?

 These are the kind of things that should only be done by people who
know how to do them and hence know not to do them. A bit like the
definition of a gentleman being someone who knows how to play the
bagpipes but chooses not to.

 You can do this, but it requires all sorts of fiddling of the
argument lists which would make the functions very messy, and most
probably unlike any other R functions the user has encountered.

 But... for starters you might want to look at the '...' argument
which gives some flexibility in argument handling, something like:

 foo = function(...){
   return(list(...))
}

 then try foo(x1=1,x2=2) and so on. See what you get back. Then work
out how to add them all up, check they are all x1 to x9 and so on. And
recursively unwrap your 'temp' variable by testing if it is atomic or
not.
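
 One possible way to follow that recipe, as a rough sketch only: collect everything with '...', flatten it with unlist() (which unwraps nested lists and named vectors recursively), and add it up. The name add_all is made up for this illustration, and the sketch skips the check that the names really are x1 to x9.

```r
add_all <- function(...) {
  # list(...) captures all arguments; unlist() flattens nested lists/vectors
  args <- unlist(list(...))
  sum(args)
}

temp <- list(x3 = 3, x4 = 4, x5 = 5, x6 = 6, x7 = 7, x8 = 8, x9 = 9)

total_list <- add_all(x1 = 1, x2 = 2, temp)
total_vec  <- add_all(x1 = 1, x2 = 2,
                      c(x3 = 3, x4 = 4, x5 = 5, x6 = 6, x7 = 7, x8 = 8, x9 = 9))
total_list
total_vec
```

 This only works because the function body treats all arguments alike; a function that needs x3 as a separate named argument would still need do.call() or explicit unpacking.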

 I'm off to play the bagpipes now.

Barry



Re: [R] Grouping rows of data by day

2010-04-19 Thread Henrique Dallazuanna
Try this:

aggregate(DF[c('data1','data2')], list(gsub('\\..*', '', DF$time)), FUN =
sum)
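
A variant of the same idea, sketched under the assumption that the time column is numeric (fractions of a day, as described in the question): floor() gives the day number directly, which keeps the groups numeric so that day 10 sorts after day 2. The data frame below is invented for the illustration; the column names data1 and data2 follow the call above.

```r
# invented hourly data covering two full days (48 rows)
hourlydata <- data.frame(
  time  = round(1 + (0:47) / 24, 3),  # 1, 1.042, 1.083, ...
  data1 = rep(1, 48),
  data2 = rep(2, 48)
)

# integer part of the time stamp = day number
hourlydata$day <- floor(hourlydata$time)

# one row per day, summing each data column
daily <- aggregate(hourlydata[c("data1", "data2")],
                   by = list(day = hourlydata$day), FUN = sum)
daily
```

Replacing FUN = sum with mean, max, etc. gives other daily summaries; split(hourlydata, hourlydata$day) would instead return one data frame per day.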

On Mon, Apr 19, 2010 at 12:00 PM, jennyed jen.wri...@ed.ac.uk wrote:


 Hi all,

 I have a set of data in hourly time steps with each row identified as
 time  data column1 data column2
 1  
 1.042
 1.083
 1.125
 1.167
 1.208
 1.25 .and so on (the
 time column is in fractions of a day)

 I want to be able to group the data by day. I managed to do this using:

 Day1H = hourlydata[c(1:24),]

 but I'd like to be able to create groups for each day without doing this
 manually for each set of 24 rows.

 Any suggestions greatly appreciated

 Thanks


 --
 View this message in context:
 http://n4.nabble.com/Grouping-rows-of-data-by-day-tp2016063p2016063.html
 Sent from the R help mailing list archive at Nabble.com.





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] Type-I v/s Type-III Sum-Of-Squares in ANOVA

2010-04-19 Thread Daniel Wollschlaeger
* On Mo, 1. Mar 2010, Ista Zahn wrote:

 I've posted a short explanation about this at
 http://yourpsyche.org/miscellaneous that you might find helpful. I'm a

As someone who's also struggled with the type X sum of squares topic, I
like the idea to completely walk through a numerical example and see what
happens. I'd like to extend this a bit, and cover the following aspects:

- how are the model comparisons underlying the SS types calculated?
- do the compared models obey the marginality principle?
- what are the orthogonal projections defining the model comparisons?
- are the projections invariant to the type of contrast codes?
- are the hypotheses formulated using empirical cell sizes?
  (are the effect estimates using weighted or unweighted marginal means)?
- how can (some of) the SS be calculated without matrix math?

Below you'll find the code for SS type III using the 2x2 example from
Maxwell and Delaney. For SS type I, II, and III using the 3x3 example in
M&D, please see http://www.uni-kiel.de/psychologie/dwoll/r/doc/ssTypes.r

The model projections for SS type III corresponding to models that violate
the marginality principle are not invariant to the coding scheme. If,
e.g., Pai is the projection for the model including main effect A and
interaction A:B, Pai will be different for non sum-to-zero and sum-to-zero
codes. This seems to mean that SS type III for main effects compare
different models when using different contrasts codes. Which leads to the
question what hypotheses these models actually imply. I'd be grateful if
someone could provide any pointers on where to read up on that.
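
The dependence on the coding scheme can be seen numerically with a small sketch. It assumes the 2x2 Maxwell and Delaney data used in this post; the helper ssA is made up for the illustration. It drops the design-matrix column for A while keeping the columns for B and A:B, and returns the resulting increase in the residual sum of squares.

```r
g11 <- c(24, 26, 25, 24, 27, 24, 27, 23)
g12 <- c(15, 17, 20, 16)
g21 <- c(25, 29, 27)
g22 <- c(19, 18, 21, 20, 21, 22, 19)
Y <- c(g11, g12, g21, g22)
A <- factor(rep(1:2, c(8 + 4, 3 + 7)), labels = c("f", "m"))
B <- factor(rep(rep(1:2, 2), c(8, 4, 3, 7)), labels = c("deg", "noDeg"))

# SS for A when B and A:B stay in the model, under a given contrast coding
ssA <- function(contr) {
  X <- model.matrix(~ A * B, contrasts.arg = list(A = contr, B = contr))
  full <- lm(Y ~ X[, 2] + X[, 3] + X[, 4])  # columns for A, B, A:B
  noA  <- lm(Y ~ X[, 3] + X[, 4])           # column for A dropped
  sum(residuals(noA)^2) - sum(residuals(full)^2)
}

ss_treatment <- ssA("contr.treatment")  # non sum-to-zero (dummy) coding
ss_sum       <- ssA("contr.sum")        # sum-to-zero (effect) coding
c(treatment = ss_treatment, sum = ss_sum)
```

With dummy coding the dropped column corresponds to the A difference within the reference level of B only, whereas with sum-to-zero coding it corresponds to the difference of unweighted marginal means, so the two values should not agree for these data.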

I hope this post is not too long! Best, Daniel

# -----
# 2x2 unbalanced design: data from Maxwell & Delaney 2004 p322
P   <- 2  # two groups in factor A (female / male)
Q   <- 2  # two groups in factor B (college degree / no degree)
g11 <- c(24, 26, 25, 24, 27, 24, 27, 23)
g12 <- c(15, 17, 20, 16)
g21 <- c(25, 29, 27)
g22 <- c(19, 18, 21, 20, 21, 22, 19)
Y   <- c(g11, g12, g21, g22)  # salary in 100$
A   <- factor(rep(1:P, c(8+4, 3+7)), labels=c("f", "m"))
B   <- factor(rep(rep(1:Q, P), c(8,4, 3,7)), labels=c("deg", "noDeg"))

# -----
# utility function getInf2x2 (run with different contrasts settings)
# fit all relevant regression models for 2x2 between-subjects design
# output: * residual sum of squares for each model and their df
#         * orthogonal projection on subspace as defined
#           by the design matrix of each model
getInf2x2 <- function() {
  X <- model.matrix(lm(Y ~ A + B + A:B))  # ANOVA design matrix

  # indicator variables for factors from design matrix
  idA <- X[ , 2]   # factor A
  idB <- X[ , 3]   # factor B
  idI <- X[ , 4]   # interaction A:B

  # fit each relevant regression model
  mod1   <- lm(Y ~ 1)                 # no effect
  modA   <- lm(Y ~ idA)               # factor A
  modB   <- lm(Y ~       idB)         # factor B
  modAB  <- lm(Y ~ idA + idB)         # factors A, B
  modAI  <- lm(Y ~ idA       + idI)   # factor A, interaction A:B
  modBI  <- lm(Y ~       idB + idI)   # factor B, interaction A:B
  modABI <- lm(Y ~ idA + idB + idI)   # full model A, B, A:B

  # RSS for each regression model from lm()
  rss1   <- sum(residuals(mod1)^2)    # no effect, i.e., total SS
  rssA   <- sum(residuals(modA)^2)    # factor A
  rssB   <- sum(residuals(modB)^2)    # factor B
  rssAB  <- sum(residuals(modAB)^2)   # factors A, B
  rssAI  <- sum(residuals(modAI)^2)   # factor A, A:B
  rssBI  <- sum(residuals(modBI)^2)   # factor B, A:B
  rssABI <- sum(residuals(modABI)^2)  # full model A, B, A:B

  # degrees of freedom for RSS for each model
  N     <- length(Y)  # total N
  df1   <- N - (0+1)  # no effect:            0 predictors + mean
  dfA   <- N - (1+1)  # factor A:             1 predictor  + mean
  dfB   <- N - (1+1)  # factor B:             1 predictor  + mean
  dfAB  <- N - (2+1)  # factors A, B:         2 predictors + mean
  dfAI  <- N - (2+1)  # factor A, A:B:        2 predictors + mean
  dfBI  <- N - (2+1)  # factor B, A:B:        2 predictors + mean
  dfABI <- N - (3+1)  # full model A, B, A:B: 3 predictors + mean

  # -----
  # alternatively: get RSS for each model and their df manually
  # based on geometric interpretation
  # design matrix for each model
  one  <- rep(1, nrow(X))            # column of 1s
  X1   <- cbind(one)                 # no effect
  Xa   <- cbind(one, idA)            # factor A
  Xb   <- cbind(one,      idB)       # factor B
  Xab  <- cbind(one, idA, idB)       # factors A, B
  Xai  <- cbind(one, idA,      idI)  # factor A, interaction A:B
  Xbi  <- cbind(one,      idB, idI)  # factor B, interaction A:B
  Xabi <- cbind(one, idA, idB, idI)  # full model A, B, A:B

  # orthogonal projections P on subspace given by the design matrix
  # of each model: P*y = y^hat are the model predictions
  P1   <- X1   %*% solve(t(X1)  


[R] help in output file

2010-04-19 Thread Changbin Du
Hi, dear R community,

I am using the following code to grow and plot a tree:

# Classification Tree with rpart
library(rpart)

pdf(file="/home/cdu/changbin/dimer_tree.pdf")

# grow tree
fit.dimer <- rpart(outcome ~ ., method="class", data=p.dimer[,2:402])

plotcp(fit.dimer) # visualize cross-validation results


# plot tree
plot(fit.dimer, uniform=TRUE, main="Classification Tree for AA.dimer")
text(fit.dimer, use.n=TRUE, all=TRUE, cex=.8)


dev.off()

But when I open the pdf file, the right side of the tree is not shown,
and part of the bottom of the tree is also missing. How can I deal with
this problem?

Thanks!

-- 
Sincerely,
Changbin
--



[R] nls for piecewise linear regression not converging to least square

2010-04-19 Thread Karen Chang Liu
Hi R experts,

I'm trying to use nls() for a piecewise linear regression with the first
slope constrained to 0. There are 10 data points and when it does converge
the second slope is almost always over estimated for some reason. I have
many sets of these 10-point datasets that I need to do. The following
segment of code is an example, and sorry for the overly precise numbers,
they are just copied from real data.

y1 <- c(2.37700445, 1.76209775, 0.09795576, 2.21834963, 6.62262243,
15.70471269,  21.92956392, 36.39401717, 32.43620195, 44.77442277)
x1 <- c(24.6, 28.9, 33.2, 37.6, 42.0, 46.4, 50.9, 55.3, 59.8, 64.3)

dat <- data.frame(x1,y1)
nlmod <- nls(y1 ~ ifelse(x1 < xint+(yint/slp), yint, yint +
(x1-(xint+(yint/slp)))*slp),
data=dat, control=list(minFactor=1e-5,maxiter=500,warnOnly=T),
start=list(xint=39.27464924, yint=0.09795576, slp=2.15061064),
na.action=na.omit, trace=T)

##plotting the function
plot(dat$x1,dat$y1)
segments(x0=0, x1=coef(nlmod)[1]+coef(nlmod)[2]*coef(nlmod)[3],
y0=coef(nlmod)[2], y1=coef(nlmod)[2])
segments(x0=coef(nlmod)[1]+coef(nlmod)[2]*coef(nlmod)[3],x1=80,
y0=coef(nlmod)[2], y1=80*coef(nlmod)[3]+coef(nlmod)[2])

As you can see from the plot, the line is above all data points on the
second segment. This seems to be the case for different datasets. I'm
wondering if anyone can help me understand why this happens. Is this because
there are too few data points or is it because the likelihood function is
just not smooth enough?

Karen



[R] How to set proxy settings for R

2010-04-19 Thread danda

Dear All,

I would like to run R on my computer (running Windows XP) at work, but the
proxy restrictions of the university don't let me download packages or
connect to a CRAN mirror. I usually get this message:

 chooseCRANmirror()
Warning message:
In open.connection(con, "r") :
  unable to connect to 'cran.r-project.org' on port 80.

Do you know if there is a way to set proxy settings for using R normally?
Thanks a lot
-- 
View this message in context: 
http://n4.nabble.com/How-to-set-proxy-settings-for-R-tp2016158p2016158.html
Sent from the R help mailing list archive at Nabble.com.



[R] nls minimum factor error

2010-04-19 Thread Karen Chang Liu
Hi,

I have a small dataset that I'm fitting a segmented regression using nls on.
I get a step below minimum factor error, which I presume is because residual
sum of square is still not small enough when steps in the parameter space
is already below specified/default value. However, when I look at the trace,
the convergence seems to have been reached. I initially thought I might have
reached the parameter space boundary, but these converging parameter values
are by no means near boundary, they are quite in the middle. Could someone
help me understand or throw out some possibilities?

##Here's a sample dataset and code.
y2 <- c(2.404529, 1.625661, 1.013981, 3.810921, 10.023745, 10.990817,
10.740636, 11.246827,17.022761, 21.430386)
x2 <- c(25.0, 29.3, 33.8, 38.3, 42.8, 47.2, 51.6, 55.8, 60.4, 64.9)
dat <- data.frame(x2,y2)
nlmod <- nls(y2 ~ ifelse(x2 < xint+(yint/slp), yint, yint +
(x2-(xint+(yint/slp)))*slp),
data=dat, control=list(minFactor=1e-5,maxiter=500,warnOnly=T),
start=list(xint=40.49782, yint=1.013981, slp=0.8547828),
na.action=na.omit, trace=T)

##plotting the function
plot(dat$x2,dat$y2)
segments(x0=0, x1=coef(nlmod)[1]+coef(nlmod)[2]*coef(nlmod)[3],
y0=coef(nlmod)[2], y1=coef(nlmod)[2])
segments(x0=coef(nlmod)[1]+coef(nlmod)[2]*coef(nlmod)[3],x1=80,
y0=coef(nlmod)[2], y1=80*coef(nlmod)[3]+coef(nlmod)[2])

Karen



Re: [R] How to set proxy settings for R

2010-04-19 Thread Henrique Dallazuanna
Try this:

setInternet2()
chooseCRANmirror()
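
setInternet2() is specific to Windows builds of R. Another route, described under "Setting Proxies" in ?download.file, is to set the http_proxy environment variable before connecting. A minimal sketch follows; the proxy host and port are invented placeholders, so the real values would have to come from the local IT department.

```r
# Hypothetical proxy address: replace host and port with your site's values
Sys.setenv(http_proxy = "http://proxy.example.edu:8080")

# check what R will now use
Sys.getenv("http_proxy")

# afterwards, try again, e.g.:
# chooseCRANmirror()
# install.packages("somePackage")
```

The variable can also be set outside R (for example in the system environment or in an .Renviron file) so that it applies to every session.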




On Mon, Apr 19, 2010 at 1:00 PM, danda gal...@tcd.ie wrote:


 Dear All,

 I would like to run R on my computer (with win xp on it) at work bu the
 proxy restrictions of the university don't let me download the packages or
 to connect to a cran mirror, I usually get this message:

  chooseCRANmirror()
 Warning message:
 In open.connection(con, r) :
  unable to connect to 'cran.r-project.org' on port 80.

 Do you know if there is a way to set proxy settings for using R normally?
 Thanks a lot
 --
 View this message in context:
 http://n4.nabble.com/How-to-set-proxy-settings-for-R-tp2016158p2016158.html
 Sent from the R help mailing list archive at Nabble.com.





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] glmer with non integer weights

2010-04-19 Thread Emmanuel Charpentier
On Monday 19 April 2010 at 03:00 -0800, Kay Cichini wrote:
 hi emmanuel,
 
 thanks a lot for your extensive answer.
 do you think using the asin(sqrt()) transf. can be justified for publishing
 purposes, or do i have to expect criticism?

Hmmm ... depends on your reviewers. But if a half-asleep dental surgeon
caught that after a bout of insomnia, you might expect that a fully
caffeinated reviewer will. Add Murphy's law to the mix and ... boom!

 naively i excluded that possibility, because of violated anova-assumptions,
 but if i got you right, the finite range rather poses a problem here.

No. Your problem is that you model a probability as a smooth (linear)
finite function of finite variables. Under those assumptions, you can't
get a *certitude* (probability 0 or 1). Your model is *intrinsically*
inconsistent with your data.

In other words, I'm unable to believe both your model (linear
whatchamacallit regression) and your data (which include certainties)
*simultaneously*.

I'd reconsider your 0 or 1, as meaning *censored* quantities (i. e. no
farther than some epsilon from 0 or 1), with *hard* data (i. e. not a
cooked-up estimate such as the ones i used) to estimate epsilon. There
are *lots* of ways to fit models with censored dependent variables.

 why is it in this special case an advantage? 

It's bloody hell *not* a specific advantage: if you want to fit a
linear model to a probability, you *need* some function mapping R to
the open ]0 1[ (i.e. all reals strictly greater than 0 and strictly
less than 1; I think that's denoted (0 1) in English/American usage).
Asin(sqrt()) does that.

However, (asin(sqrt()))^-1 has a big problem (mapping back [0 1] i. e.
*including* 0 and 1, *not* (0 1), to R) which *hides* the (IMHO bigger)
problem of the inadequacy of your model to your data ! In other words,
it lets you shoot yourself in the foot after a nice sciatic nerve
articaïne block making the operation painless (but still harmful). On
the other hand, logit (or, as pointed by Martin Maechler, qlogis), is
kind enough to choke on this (i. e. returning back Inf values, which
will make the regression program choke).
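
A quick way to see the difference at the boundaries, as a small illustration with made-up proportions (the vector p is not from the data under discussion):

```r
p <- c(0, 0.25, 1)   # made-up proportions, including both boundaries

a <- asin(sqrt(p))   # stays finite at 0 and 1
l <- qlogis(p)       # logit: -Inf at 0, Inf at 1

a
l
is.finite(l)
```

The infinite values are what make a regression on the logit scale fail loudly, whereas the arcsine square root silently accepts the exact 0s and 1s.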

So please quench my thirst : what exactly is MH.Index supposed to be ?
How is it measured, estimated, guessed or divined ?

HTH,

Emmanuel Charpentier



Re: [R] comparing attitudes of 2 groups / likert scales?

2010-04-19 Thread Mona_m

thanks a lot for your help!! I better get on with reading / working now!
-- 
View this message in context: 
http://n4.nabble.com/comparing-attitudes-of-2-groups-likert-scales-tp2015738p2016398.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Problems with labels and scaling in star diagrams

2010-04-19 Thread Greg Snow
For number 2, do the scaling yourself so that all values are between 0 and 1, 
then use scale=FALSE in the call to stars.

For number 3 try stardata[1,,drop=FALSE]
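
A rough sketch of both suggestions, assuming the six columns of the posted data are named NS, HE, EB, CW, RW and PW (the header is run together in the archive, so that split is a guess). The names row_max, scaled and one_row are only for illustration.

```r
stardata <- data.frame(
  NS = c(0, 0, 0),
  HE = c(0, 0.006, 0),
  EB = c(0, 0, 0.011),
  CW = c(0.042, 0.013, 0),
  RW = c(0.006, 0.005, 0),
  PW = c(0, 0, 0)
)

# number 2: scale each row by its own maximum (every row here has max > 0)
row_max <- apply(stardata, 1, max)
scaled  <- sweep(as.matrix(stardata), 1, row_max, "/")

# number 3: drop = FALSE keeps a single row as a 1 x 6 matrix, not a vector
one_row <- scaled[1, , drop = FALSE]
dim(one_row)

# then plot with the built-in column scaling switched off, e.g.:
# stars(scaled, scale = FALSE, draw.segments = TRUE)
```

A row whose maximum is 0 would produce NaN after the division, so such rows would need to be handled separately.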

Don't have a good suggestion for 1 (though you could look at the code to see 
where the legend is plotted and move that code to the regular stars for your 
own custom copy of the function).

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Euan Reavie
 Sent: Friday, April 16, 2010 7:33 PM
 To: r-help@r-project.org
 Subject: [R] Problems with labels and scaling in star diagrams
 
 I have the following small dataset:
 
  stardata
   NS    HE    EB    CW    RW PW
 1  0 0.000 0.000 0.042 0.006  0
 2  0 0.006 0.000 0.013 0.005  0
 3  0 0.000 0.011 0.000 0.000  0
 
 I have plotted the star diagrams as follows:
 
 stars(stardata,
   key.labels = dimnames(stardata)[[2]],
   labels = NULL,
   key.loc = NULL,
   draw.segments=TRUE,
   col.segments="gray",
   lty="blank")
 
 I am having three problems. I welcome solutions for any or all of
 them, or recommendations for a specific package or function that would
 be better suited to my needs.
 
 1. How do I add labels (dimnames(stardata)[[2]]) to all of the
 segments, the same way they are added to the segments of the key
 (which I haven't included)? The hard way would be to use the text
 function with six coordinates for the variables, but surely there's an
 easier way?
 
 2. It took me a while to realize that each segment is scaled based on
 the maximum for that variable (column). I would like to treat each row
 independently so that all segments are scaled based on the maximum for
 that row. Possible?
 
 3. I figured one way around problem 2 would be to reduce the dataset
 to one row (e.g. stardata[1,]) but that gives me an error "incorrect
 number of dimensions" when I try to generate the diagram. So, I seem
 to be unable to plot a single star diagram; it must be two or more.
 
 Many thanks - Euan.
 Euan Reavie, University of Minnesota Duluth.
 



[R] densCols: what are the computed densities and how to create a legend

2010-04-19 Thread Kate Zinszer
Hi,

I'm using the densCols function for a scatterplot and cannot figure out 1) how 
to extract the computed densities, and 2) how to create a legend that 
represents the upper and lower ranges of the densities.

For example:

movers.den <- densCols(move$x, move$y)
table(movers.den)
#08306B #083775 #083B7C #083D7E #3989C1 #3F8FC4 
 28   22   101   25  4   5 
#4392C6 #65AAD3 #69ACD5 #6CAED6 #77B4D8 #98C6DF 
 146  8   43   
9 
plot(move, col=movers.den, pch=20, ylab="y coordinate movement (meters)",
     xlab="x coordinate movement (meters)")
abline(h=0, v=0, col = "grey", lty=2)
#legend??

Any help would be appreciated!
Kate


Re: [R] nls for piecewise linear regression not converging to least square

2010-04-19 Thread Gabor Grothendieck
Try reparameterizing:

nlmod2 <- nls(y2 ~ pmax(1/p, (x2 - xint)), data = dat,
 start = list(xint = 40.49782, p = 1), trace = TRUE, alg = "plinear")
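
A sketch of how the reparameterized formula maps back to a flat-then-rising line. With the "plinear" algorithm the right-hand side is multiplied by a single linear coefficient (reported as .lin), so the plateau is .lin/p, the slope is .lin, and the break sits at xint + 1/p. The parameter values below are illustrative only, not the result of a fit, and the names `basis`, `brk`, `plateau` and `slope` are assumptions.

```r
# x values from the "nls minimum factor error" post
x2 <- c(25.0, 29.3, 33.8, 38.3, 42.8, 47.2, 51.6, 55.8, 60.4, 64.9)

# Illustrative parameter values only (not fitted)
xint <- 40.49782
p    <- 1
lin  <- 0.8547828                  # plays the role of the .lin coefficient

basis <- pmax(1 / p, x2 - xint)    # right-hand side of the formula
fit   <- lin * basis               # plinear scales the basis by .lin

brk     <- xint + 1 / p            # x value where the flat part ends
plateau <- lin / p                 # fitted value on the flat part
slope   <- lin                     # slope of the rising part
```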

On Mon, Apr 19, 2010 at 11:32 AM, Karen Chang Liu kare...@uw.edu wrote:
 Hi R experts,

 I'm trying to use nls() for a piecewise linear regression with the first
 slope constrained to 0. There are 10 data points and when it does converge
 the second slope is almost always over estimated for some reason. I have
 many sets of these 10-point datasets that I need to do. The following
 segment of code is an example, and sorry for the overly precise numbers,
 they are just copied from real data.

 y1 <- c(2.37700445, 1.76209775, 0.09795576, 2.21834963, 6.62262243,
 15.70471269,  21.92956392, 36.39401717, 32.43620195, 44.77442277)
 x1 <- c(24.6, 28.9, 33.2, 37.6, 42.0, 46.4, 50.9, 55.3, 59.8, 64.3)

 dat <- data.frame(x1,y1)
 nlmod <- nls(y1 ~ ifelse(x1 < xint+(yint/slp), yint, yint +
 (x1-(xint+(yint/slp)))*slp),
            data=dat, control=list(minFactor=1e-5,maxiter=500,warnOnly=T),
            start=list(xint=39.27464924, yint=0.09795576, slp=2.15061064),
            na.action=na.omit, trace=T)

 ##plotting the function
 plot(dat$x1,dat$y1)
 segments(x0=0, x1=coef(nlmod)[1]+coef(nlmod)[2]*coef(nlmod)[3],
            y0=coef(nlmod)[2], y1=coef(nlmod)[2])
 segments(x0=coef(nlmod)[1]+coef(nlmod)[2]*coef(nlmod)[3],x1=80,
            y0=coef(nlmod)[2], y1=80*coef(nlmod)[3]+coef(nlmod)[2])

 As you can see from the plot, the line is above all data points on the
 second segment. This seems to be the case for different datasets. I'm
 wondering if anyone can help me understand why this happens. Is this because
 there are too few data points or is it because the likelihood function is
 just not smooth enough?

 Karen

        [[alternative HTML version deleted]]





[R] nls for piecewise linear regression not converging to least square

2010-04-19 Thread Karen Liu

Hi R experts,

I'm trying to use nls() for a piecewise linear regression with the first
slope constrained to 0. There are 10 data points, and when it does converge
the second slope is almost always overestimated for some reason. I have
many of these 10-point datasets that I need to fit. The following segment
of code is an example; sorry for the overly precise numbers, they are just
copied from real data.


y1 <- c(2.37700445, 1.76209775, 0.09795576, 2.21834963, 6.62262243,
        15.70471269, 21.92956392, 36.39401717, 32.43620195, 44.77442277)
x1 <- c(24.6, 28.9, 33.2, 37.6, 42.0, 46.4, 50.9, 55.3, 59.8, 64.3)

dat <- data.frame(x1,y1)

nlmod <- nls(y1 ~ ifelse(x1 < xint+(yint/slp), yint, yint +
             (x1-(xint+(yint/slp)))*slp),
             data=dat,
             control=list(minFactor=1e-5,maxiter=500,warnOnly=T),
             start=list(xint=39.27464924, yint=0.09795576, slp=2.15061064),
             na.action=na.omit, trace=T)

##plotting the function
plot(dat$x1,dat$y1)
segments(x0=0, x1=coef(nlmod)[1]+coef(nlmod)[2]*coef(nlmod)[3],
         y0=coef(nlmod)[2], y1=coef(nlmod)[2])
segments(x0=coef(nlmod)[1]+coef(nlmod)[2]*coef(nlmod)[3],x1=80,
         y0=coef(nlmod)[2], y1=80*coef(nlmod)[3]+coef(nlmod)[2])

As you can see from the plot, the line is above all data points on the
second segment. This seems to be the case for different datasets. I'm
wondering if anyone can help me understand why this happens. Is this
because there are too few data points or is it because the likelihood
function is just not smooth enough?
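
A sketch, not part of the original post, of drawing the model curve by evaluating the model formula itself. In the nls() formula the breakpoint is xint + yint/slp, whereas the segments() calls use coef(nlmod)[1]+coef(nlmod)[2]*coef(nlmod)[3], that is xint + yint*slp, so the drawn line need not coincide with the fitted model. The helper name `hockey` and the parameter values (the starting values from the post, not a fit) are assumptions for illustration.

```r
# Piecewise model from the post: flat at yint, then rising with slope slp
hockey <- function(x, xint, yint, slp) {
  brk <- xint + yint / slp                 # breakpoint implied by the formula
  ifelse(x < brk, yint, yint + (x - brk) * slp)
}

y1 <- c(2.37700445, 1.76209775, 0.09795576, 2.21834963, 6.62262243,
        15.70471269, 21.92956392, 36.39401717, 32.43620195, 44.77442277)
x1 <- c(24.6, 28.9, 33.2, 37.6, 42.0, 46.4, 50.9, 55.3, 59.8, 64.3)

# Illustrative parameters: the starting values, not fitted coefficients
p  <- list(xint = 39.27464924, yint = 0.09795576, slp = 2.15061064)
xs <- seq(0, 80, length.out = 200)

plot(x1, y1)
lines(xs, hockey(xs, p$xint, p$yint, p$slp))   # curve follows the model exactly
```

With a fitted object, the same helper could be called with coef(nlmod) in place of the illustrative values.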


Karen 


[R] nls minimum factor error

2010-04-19 Thread Karen Liu

Hi,

I have a small dataset to which I'm fitting a segmented regression using
nls. I get a "step below minimum factor" error, which I presume is because
the residual sum of squares is still not small enough when the step in the
parameter space is already below the specified/default value. However, when
I look at the trace, convergence seems to have been reached. I initially
thought I might have reached the parameter space boundary, but the
converging parameter values are by no means near a boundary; they are quite
in the middle. Could someone help me understand, or throw out some
possibilities?


##Here's a sample dataset and code.
y2 <- c(2.404529, 1.625661, 1.013981, 3.810921, 10.023745, 10.990817,
        10.740636, 11.246827, 17.022761, 21.430386)
x2 <- c(25.0, 29.3, 33.8, 38.3, 42.8, 47.2, 51.6, 55.8, 60.4, 64.9)

dat <- data.frame(x2,y2)
nlmod <- nls(y2 ~ ifelse(x2 < xint+(yint/slp), yint,
             yint + (x2-(xint+(yint/slp)))*slp),
             data=dat, control=list(minFactor=1e-5,maxiter=500,warnOnly=T),
             start=list(xint=40.49782, yint=1.013981, slp=0.8547828),
             na.action=na.omit, trace=T)

##plotting the function
plot(dat$x2,dat$y2)
segments(x0=0, x1=coef(nlmod)[1]+coef(nlmod)[2]*coef(nlmod)[3],
         y0=coef(nlmod)[2], y1=coef(nlmod)[2])
segments(x0=coef(nlmod)[1]+coef(nlmod)[2]*coef(nlmod)[3],x1=80,
         y0=coef(nlmod)[2], y1=80*coef(nlmod)[3]+coef(nlmod)[2])

Karen 


[R] Huge data sets and RAM problems

2010-04-19 Thread Stella Pachidi
Dear all,

This is the first time I am sending mail to the mailing list, so I
hope I do not make a mistake...

For the last few months I have been working on my MSc thesis project,
applying data mining techniques to the user logs of a
software-as-a-service application. The main problem I am experiencing
is how to process the huge amount of data. More specifically:

I am using R 2.10.1 on a laptop with 32-bit Windows 7, 2GB RAM
and a 2GHz Intel Core Duo CPU.

The user log data come from a Crystal Reports query (.rpt file), which
I transform with some Java code into a tab-separated file.

Although everything runs with a small subset of my data,
when I increase the data set I get several problems:

The first problem is with read.delim(). When I try to read
a large amount of data (over 2,400,000 rows with 18 attributes per
row), it doesn't seem to transform the whole table into a data frame. In
particular, the data frame returned has 1,220,987 rows.
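
A sketch of one diagnostic for a short read. read.delim() treats the double quote as a quoting character by default, so a stray quote inside a text field can swallow the following lines into one field. Whether that is what happens here is an assumption, since the data are not shown. The small file below is hypothetical; it compares the number of data lines with the rows obtained when quoting is switched off.

```r
# Hypothetical tab-separated log with a stray double quote in a text field
f <- tempfile(fileext = ".txt")
writeLines(c("id\ttext",
             "1\t\"unterminated",
             "2\tplain",
             "3\tmore text",
             "4\tlast"), f)

n.lines  <- length(readLines(f)) - 1L          # data lines, header excluded
log.dat  <- read.delim(f, quote = "", stringsAsFactors = FALSE)
n.fields <- count.fields(f, sep = "\t", quote = "")  # fields on every line

nrow(log.dat) == n.lines     # TRUE when no rows were merged
table(n.fields)              # lines with an unexpected field count stand out
```

For a file with millions of lines, readLines() on the whole file may itself be too heavy; count.fields() alone already returns one entry per line.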

Furthermore, one of the data attributes is a date-time, and when I try to
split this column into two columns (one with the date and one with the
time), the result is quite strange: the two new columns
appear to have more rows than the data frame:

applicLog.dat <- read.delim("file.txt")
#Process the syscreated column (Date time --> Date + time)
copyDate <- applicLog.dat[["ï..syscreated"]]
copyDate <- as.character(copyDate)
splitDate <- strsplit(copyDate, " ")
splitDate <- unlist(splitDate)
splitDateIndex <- c(1:length(splitDate))
sysCreatedDate <- splitDate[splitDateIndex %% 2 == 1]
sysCreatedTime <- splitDate[splitDateIndex %% 2 == 0]
sysCreatedDate <- strptime(sysCreatedDate, format="%Y-%m-%d")
op <- options(digits.secs = 3)
sysCreatedTime <- strptime(sysCreatedTime, format ="%H:%M:%OS")
applicLog.dat[["ï..syscreated"]] <- NULL
applicLog.dat <- cbind(sysCreatedDate,sysCreatedTime,applicLog.dat)

Then I get the error: Error in data.frame(..., check.names = FALSE) :
  arguments imply differing number of rows: 1221063, 1221062, 1220987
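
A sketch of an alternative split that keeps one result per input row. The odd/even indexing above assumes every entry contains exactly one space; an empty or malformed entry shifts everything after it, which would be consistent with the differing row counts in the error (an assumption, since the data are not shown). Parsing the whole stamp once avoids that. The sample values and the "UTC" time zone are hypothetical.

```r
# Hypothetical syscreated values, including a missing entry
stamp <- c("2010-04-19 09:29:48.123", "2010-04-19 11:32:00.000", NA)

op <- options(digits.secs = 3)
# One parsed value per input element; unparseable entries become NA
parsed <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

sysCreatedDate <- as.Date(parsed)              # date part
sysCreatedTime <- format(parsed, "%H:%M:%OS3") # time part as text
options(op)

length(sysCreatedDate) == length(stamp)        # lengths always match the input
```

Because both vectors have the same length as the input column, they can be bound to the data frame without the length mismatch.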


Finally, another problem I have is when I perform association mining
on the data set using the package arules: I turn the data frame into
a transactions table and then run the apriori algorithm. When I set the
support low enough to find the rules I need, the vector of
rules becomes too big and I run into memory problems such as:
Error: cannot allocate vector of size 923.1 Mb
In addition: Warning messages:
1: In items(x) : Reached total allocation of 153Mb: see help(memory.size)

Could you please help me with how I could allocate more RAM? Or do
you think there is a way to process the data from a file on disk
instead of loading everything into RAM? Do you know how I could
manage to read my whole data set?

I would really appreciate your help.

Kind regards,
Stella Pachidi

PS: Do you know any text editor that can read huge .txt files?





--
Stella Pachidi
Master in Business Informatics student
Utrecht University



Re: [R] plotting RR, 95% CI as table and figure in same plot

2010-04-19 Thread Christos Argyropoulos

ggplot2 should work (resize to get the plot to the dimensions you need for the 
paper)

 

library(Hmisc)
library(pscl) 
library(ggplot2)

## data
data(bioChemists, package = "pscl")
fm_pois <- glm(art ~ ., data = bioChemists, family = poisson)
summary(fm_pois)

### pull out rate-ratios and 95% CI
rr <- exp(cbind(coef(fm_pois), confint(fm_pois)))
rr
### round to 2 decimal places
rr <- as.data.frame(round(rr, 2))

colnames(rr) <- c("y","ymin","ymax")
rr$labl <- rownames(rr) ## Change this to meaningful labels
rr$x <- 1:length(rownames(rr))
gpl <- ggplot(rr,aes(x,y,ymin=ymin,ymax=ymax))
gpl+geom_point()+geom_linerange()+
 geom_hline(aes(yintercept=1),
 linetype="dashed",size=0.5)+
 geom_text(aes(x,y=0.3,label=y,hjust=0),size=3)+
 geom_text(aes(x,y=0.0,label=labl),size=3)+
 geom_text(aes(x,y=0.5,label=paste("[",ymin,",",ymax,"]",sep="")
 ,hjust=0.0),size=3)+
 ylab("Relative Risk")+xlab("")+
 coord_cartesian(ylim=c(-1,1.7))+
 coord_cartesian(xlim=c(0.85,6.15))+
 scale_x_continuous(breaks=NA)+
 scale_y_continuous(breaks=seq(0.8,1.6,.1))+
 opts(
  panel.grid.major = theme_blank(),
  panel.grid.minor=theme_blank(),
  title="",
  panel.background = theme_rect(fill=NA,colour=NA)
 )+
 coord_flip()+
geom_text(aes(x=6.3,y=0.35,label="RR"),size=4)+
geom_text(aes(x=6.3,y=0.60,label="95% CI"),size=4)

 

 

Christos


 
 Date: Mon, 19 Apr 2010 09:29:48 -0700
 From: datk...@u.washington.edu
 To: r-help@r-project.org
 Subject: [R] plotting RR, 95% CI as table and figure in same plot
 
 
 Hi all--
 
 I am in the process of helping colleagues write up a ms in which we fit 
 zero-inflated Poisson models. I would prefer plotting the rate ratios 
 and 95% CI (as I've found Gelman and others convincing about plotting 
 tables...), but our journals usually like the numbers themselves.
 
 Thus, I'm looking at a recent JAMA article in which both numbers and 
 dotplot of RR and 95% CI are presented and wondering about best way to 
 do this in R.
 
 Essentially, the plot has 3 columns: variable names, RR and 95% CI, and 
 dotplot of the same.
 
 Using the bioChemists data in the pscl package and errbar function in 
 Hmisc package, the code below is in the right direction... but still 
 pretty ugly.
 
 Wondering if folks would have alternative suggestions about how to go 
 about this, or pointers on cleaning up the code below (eg, I know there 
 are many functions for plotting errbars/CI).
 
 [And, obviously, there are somethings that would be straightforward to 
 clean-up such as supplying better variable names, etc., just wanted to 
 see if there were better overall suggestions before getting too far on 
 this route.]
 
 Thanks in advance.
 
 cheers, Dave
 
 library(Hmisc)
 library(pscl) 
 
 ## data
 data(bioChemists, package = "pscl")
 fm_pois <- glm(art ~ ., data = bioChemists, family = poisson)
 summary(fm_pois)
 
 ### pull out rate-ratios and 95% CI
 rr <- exp(cbind(coef(fm_pois), confint(fm_pois)))
 rr
 ### round to 2 decimal places
 rr <- round(rr, 2)
 
 ### plot
 par(mfrow=c(1,3))
 plot(0, type = "n", xlim=c(0,2), ylim=c(1,6),
 axes = FALSE, ylab=NULL, xlab=NULL)
 text(row.names(rr), x = 1, y = 1:6)
 
 plot(0, type = "n", xlim=c(0,2), ylim=c(1,6),
 axes = FALSE, ylab=NULL, xlab=NULL)
 text(paste(rr[,1], " [", rr[,2], ", ", rr[,3], "]", sep = ""), x = 1,
 y = 1:6)
 
 errbar(x = factor(row.names(rr)),
 y = rr[,1], yplus = rr[,3],
 yminus = rr[,2])
 abline(v = 1, lty =2) 
 
 -- 
 Dave Atkins, PhD
 Research Associate Professor
 Department of Psychiatry and Behavioral Science
 University of Washington
 datk...@u.washington.edu
 
 Center for the Study of Health and Risk Behaviors (CSHRB) 
 1100 NE 45th Street, Suite 300 
 Seattle, WA 98105 
 206-616-3879 
 http://depts.washington.edu/cshrb/
 (Mon-Wed) 
 
 Center for Healthcare Improvement, for Addictions, Mental Illness,
 Medically Vulnerable Populations (CHAMMP)
 325 9th Avenue, 2HH-15
 Box 359911
 Seattle, WA 98104?
 206-897-4210
 http://www.chammp.org
 (Thurs)
 


Re: [R] Tinn-R

2010-04-19 Thread Robert Ruser
Thank you very much - it really works. Maybe it's not so useful as
Tinn-R but is sufficient.



2010/4/19 Tal Galili tal.gal...@gmail.com:
 Consider trying
 notepad++
 with
 NppToR
 That's what I use (it also works with the REvolution distribution)


 Contact
 Details:---
 Contact me: tal.gal...@gmail.com |  972-52-7275845
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
 www.r-statistics.com (English)
 --




 On Mon, Apr 19, 2010 at 6:50 PM, Robert Ruser robert.ru...@gmail.com
 wrote:

 Hello,
 I want to use the free distribution of R (R REvolution 3.2) and Tinn-R
 editor as well. Unfortunately they don't cooperate. In Tinn-R
 commands: send selection, send line etc. don't work. Do you have any
 idea how to resolve this problem?

 Best,
 Robert






Re: [R] How to set proxy settings for R

2010-04-19 Thread Pete B

Also, the FAQ suggests using the alternative internet2.dll  by starting R
with the flag --internet2

If you start R from a desktop icon, you can add the --internet2 flag to the
target line (right click, properties), e.g. "C:\Program
Files\R\R-2.8.1\bin\Rgui.exe" --internet2

see
http://cran.r-project.org/bin/windows/rw-FAQ.html#The-Internet-download-functions-fail_002e
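
As a complement to the flag, a minimal sketch of naming a proxy from within R through the http_proxy environment variable, which ?download.file describes for the default download method. The proxy address below is hypothetical, and whether this is sufficient depends on the network (for example, on whether the proxy requires authentication).

```r
# Hypothetical proxy address; replace with the one used on your network
Sys.setenv(http_proxy = "http://proxy.example.com:8080/")

# Confirm the setting is visible to the current R session
Sys.getenv("http_proxy")
```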

HTH 

Pete





Re: [R] plotting RR, 95% CI as table and figure in same plot

2010-04-19 Thread David Atkins


Thanks to Thomas and Christos for helpful suggestions.

The forestplot (in package rmeta) suggestion seems to work fairly well 
for me, though does require a bit of fiddling (no complaints, obviously 
using it for a different purpose than it was written).


Below is an example using a slightly hacked version of forestplot (and 
also using a ZIP model).


[BTW, my hacks were to adjust the code so I could set the line weights 
and to use circles as opposed to boxes and set the radii.]


cheers, Dave


## data
library(pscl)
data(bioChemists, package = "pscl")
fm_zip <- zeroinfl(art ~ ., data = bioChemists)
summary(fm_zip)

### pull out rate-ratios and 95% CI
rr <- exp(cbind(coef(fm_zip), confint(fm_zip)))
rr

### round to 2 decimal places
rr <- format(rr, digits=2)


### Alternative: forestplot() from rmeta package

library(rmeta)

preds <- c("Intercept","Women","Married","Kids","PhD","Mentor")
tab.txt <- rbind(c("Predictor","RR [95% CI]"), c("", ""),
                 cbind(preds, paste(rr[1:6,1], " [", rr[1:6,2], ", ",
                                    rr[1:6,3], "]", sep = "")),
                 c("", ""),
                 c("Predictor","OR [95% CI]"), c("", ""),
                 cbind(preds, paste(rr[7:12,1], " [", rr[7:12,2], ", ",
                                    rr[7:12,3], "]", sep = "")))
tab.txt

dat.txt <- rbind(c(NA,NA,NA), c(NA,NA,NA), rr[1:6,],
                 c(NA,NA,NA), c(NA,NA,NA), c(NA,NA,NA),
                 rr[7:12,])
dat.txt
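
A small sketch of keeping the numbers and the printed labels separate: format() returns character strings, so one option is to leave the estimates numeric for plotting and build the "RR [lower, upper]" labels with sprintf(). The matrix below holds hypothetical values, not the model output, and the names `rr.num`, `rr.lab` and `tab.demo` are assumptions.

```r
# Hypothetical estimates; in the post these come from exp(cbind(coef(), confint()))
rr.num <- rbind(Women  = c(0.80, 0.72, 0.89),
                Mentor = c(1.03, 1.02, 1.03))
colnames(rr.num) <- c("rr", "lower", "upper")

# Character labels with two decimals; rr.num itself stays numeric
rr.lab <- sprintf("%.2f [%.2f, %.2f]",
                  rr.num[, "rr"], rr.num[, "lower"], rr.num[, "upper"])

# Two-column label table in the same layout as tab.txt
tab.demo <- cbind(rownames(rr.num), rr.lab)
tab.demo
```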

### NOTE: slightly hacked version of forestplot from rmeta
forestplot2(labeltext = tab.txt,
  mean = dat.txt[,1], lower = dat.txt[,2], upper = dat.txt[,3],
  zero=1,
  is.summary=c(TRUE,rep(FALSE,8),TRUE,rep(FALSE,8)),
  xlog=FALSE,
  graphwidth = unit(3, "inches"), lwd= 3, rad = 0.3)



### Functions 



forestplot2 <-
function (labeltext, mean, lower, upper, align = NULL, is.summary = FALSE,
    clip = c(-Inf, Inf), xlab = "", zero = 0, graphwidth = unit(2,
        "inches"), col = meta.colors(), xlog = FALSE, xticks = NULL,
    boxsize = NULL, lwd = 1, rad = 0.1, ...)
{
    require("grid") || stop("`grid' package not found")
    require("rmeta") || stop("`rmeta' package not found")
    drawNormalCI <- function(LL, OR, UL, size) {
        size = 0.75 * size
        clipupper <- convertX(unit(UL, "native"), "npc",
            valueOnly = TRUE) > 1
        cliplower <- convertX(unit(LL, "native"), "npc",
            valueOnly = TRUE) < 0
        box <- convertX(unit(OR, "native"), "npc", valueOnly = TRUE)
        clipbox <- box < 0 || box > 1
        if (clipupper || cliplower) {
            ends <- "both"
            lims <- unit(c(0, 1), c("npc", "npc"))
            if (!clipupper) {
                ends <- "first"
                lims <- unit(c(0, UL), c("npc", "native"))
            }
            if (!cliplower) {
                ends <- "last"
                lims <- unit(c(LL, 1), c("native", "npc"))
            }
            grid.lines(x = lims, y = 0.5, arrow = arrow(ends = ends,
                length = unit(0.05, "inches")), gp = gpar(col = col$lines))
            if (!clipbox)
                grid.rect(x = unit(OR, "native"), width = unit(size,
                  "snpc"), height = unit(size, "snpc"),
                  gp = gpar(fill = col$box, col = col$box))
        }
        else {
            grid.lines(x = unit(c(LL, UL), "native"), y = 0.5,
                gp = gpar(col = 1, lwd = lwd))
            grid.circle(x = unit(OR, "native"),
                #width = unit(size, "snpc"),
                #height = unit(size, "snpc"),
                r = rad,
                gp = gpar(fill = col$box,
                    col = col$box))
            if ((convertX(unit(OR, "native") + unit(0.5 * size,
                "lines"), "native", valueOnly = TRUE) > UL) &&
                (convertX(unit(OR, "native") - unit(0.5 * size,
                  "lines"), "native", valueOnly = TRUE) < LL))
                grid.lines(x = unit(c(LL, UL), "native"), y = 0.5,
                  gp = gpar(col = col$lines))
        }
    }
    drawSummaryCI <- function(LL, OR, UL, size) {
        grid.polygon(x = unit(c(LL, OR, UL, OR), "native"), y = unit(0.5 +
            c(0, 0.5 * size, 0, -0.5 * size), "npc"),
            gp = gpar(fill = col$summary, col = col$summary))
    }
    plot.new()
    widthcolumn <- !apply(is.na(labeltext), 1, any)
    nc <- NCOL(labeltext)
    labels <- vector("list", nc)
    if (is.null(align))
        align <- c("l", rep("r", nc - 1))
    else align <- rep(align, length = nc)
    nr <- NROW(labeltext)
    is.summary <- rep(is.summary, length = nr)
    for (j in 1:nc) {
        labels[[j]] <- vector("list", nr)
        for (i in 1:nr) {
if (is.na(labeltext[i, 
