Re: [R] accessing and preserving list names in lapply

2009-02-27 Thread Romain Francois

Hi,

This might be the trick you are looking for:
http://tolstoy.newcastle.edu.au/R/e4/help/08/04/8720.html

Romain

Alexy Khrabrov wrote:

res <- lapply(1:length(L), do.one)


Actually, I do

res <- lapply(1:length(L), function(x) do.one(L[x]))

-- this is the price of needing the element's name, so I have to both 
make do.one extract the name and the meat separately inside, and 
lapply becomes ugly.  Yet the obvious alternatives -- extracting the 
names separately, attaching them back into list elements, etc., -- are 
even uglier.  Something pretty? :)
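One common idiom for this (a sketch with a toy list; names, values, and the stand-in computation are invented): iterate over indices but subset with single brackets, since `L[i]` keeps the element's name while `L[[i]]` drops it.

```r
# Sketch: single-bracket subsetting L[i] yields a one-element *named*
# list, so the worker function can still read names(ldf).
L <- list(a = 1:3, b = 4:6)
res <- lapply(seq_along(L), function(i) {
  ldf <- L[i]            # named one-element list, unlike L[[i]]
  sum(ldf[[1]])          # stand-in for the real computation
})
names(res) <- names(L)   # reattach the names to the result
```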


Cheers,
Alexy

--
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem with RBloomberg (not the usual one)

2009-02-27 Thread Sergey Goriatchev
Hello, everyone!

I have a problem with RBloomberg, and it is not the usual "no
administrator rights" problem.

I have R 2.7.2, RBloomberg 0.1-10, RDCOMclient 0.92-0

RDCOMClient, chron, zoo, stats: these packages load OK.

Then, trying to connect, I get the following error message:


 conn <- blpConnect(show.days="week", na.action="previous.days",
periodicity="daily")
Warning messages:
1: In getCOMInstance(name, force = TRUE, silent = TRUE) :
  Couldn't get clsid from the string
2: In blpConnect(show.days = "week", na.action = "previous.days",
periodicity = "daily") :
  Seems like this is not a Bloomberg Workstation:  Error : Invalid class string

Anyone encountered this problem?
What is wrong and how can I solve it?

Online, I found just one instance of this problem discussed, and it
was in Chinese:

http://cos.name/bbs/read.php?tid=12821&fpage=3

Thank you for your help!

Sergey



[R] combining identify() and locator()

2009-02-27 Thread Brian Bolt

Hi,
I am wondering if there might be a way to combine the two functions  
identify() and locator() such that if I use identify() and then click  
on a point outside the set tolerance, the x,y coordinates are returned  
as in locator().  Does anyone know of a way to do this?

Thanks in advance for any help
-brian



Re: [R] Installing different versions of R simultaneously on Linux

2009-02-27 Thread Berwin A Turlach
G'day Rainer,

On Fri, 27 Feb 2009 09:34:11 +0200
Rainer M Krug r.m.k...@gmail.com wrote:

 I want to install some versions of R simultaneously from source on a
 computer (running Linux). [...]

What flavour of Linux are we talking about?

 If it is not, how is it possible to have several versions of R on one
 computer, or is the only way to compile them and then call R in the
 directory of the version where it was compiled (~/R-2.7.2/bin/R)?

For Debian based machines (I first used Debian, nowadays Kubuntu), I
got into the following habit:

1) Unpack the R sources in /opt/src
2) Enter /opt/src/R-x.y.z and run configure with
   --prefix=/opt/R/R-x.y.z (and other options) 
3) Build R with checks and documentation from source and install.
4) Run in /opt/src a script that uses update-alternatives --install to
   install the new version and creates a link from /opt/R/R-x.y.z/bin/R
   to /opt/bin/R-x.y.z

I have /opt/bin in my PATH, thus I can call any R version explicitly by
R-x.y.z.

Typing R alone will usually start the most recently installed
version (as this will have the highest priority), but I can configure
that via sudo update-alternatives --config R; i.e., I can make R run
a particular version.  Since the update-alternatives step above also
registers all the *.info files and man pages, I will also access the
documentation of that particular R version (e.g., C-h i in emacs will
give me access to the info version of the manuals of the version of R
which is run by the R command).

Over time, typically when the Linux system is upgraded, libraries on
which old R-x.y.z binaries relied vanish.  At that point I usually
delete /opt/R/R-x.y.z and remove that version from the available
alternatives.
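Berwin's steps 1-4 might look roughly like this as a script (a sketch only; the version number, link locations, and the alternatives priority value are illustrative, not taken from the original post):

```shell
VER=2.8.0                                   # example version
cd /opt/src
tar xzf R-$VER.tar.gz                       # step 1: unpack the sources
cd R-$VER
./configure --prefix=/opt/R/R-$VER          # step 2: per-version prefix
make && make check && make install          # step 3: build, check, install
ln -s /opt/R/R-$VER/bin/R /opt/bin/R-$VER   # step 4: versioned link
sudo update-alternatives --install \
    /usr/local/bin/R R /opt/bin/R-$VER 280  # higher number = higher priority
```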

HTH.  Let me know if you need more details.

Cheers,

Berwin



Re: [R] survival::survfit,plot.survfit

2009-02-27 Thread Heinz Tuechler

At 15:28 26.02.2009, Terry Therneau wrote:

 plot(survfit(fit)) should plot the survival-function for x=0 or
 equivalently beta'=0. This curve is independent of any covariates.

  This is not correct.  It plots the curve for a hypothetical subject
with x = the mean of each covariate.


Does this mean the curve corresponds to the one you would get based
on the baseline hazard?


Heinz


  This is NOT the average survival of the data set.  Imagine a cohort made up
of 60 year old men and their 10 year old grandsons: the expected survival of
this cohort does not look like that of a 35 year old male.

Terry T
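A small illustration of the point (a sketch using the survival package's lung data; in R versions of that era, survfit on a coxph fit defaulted to a pseudo-subject at the covariate means, as Terry describes):

```r
library(survival)
fit <- coxph(Surv(time, status) ~ age, data = lung)
# Curve for a hypothetical subject with age = mean(lung$age),
# NOT the average survival curve of the cohort:
plot(survfit(fit))
```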



Re: [R] combining identify() and locator()

2009-02-27 Thread Barry Rowlingson
2009/2/27 Brian Bolt bb...@kalypsys.com:
 Hi,
 I am wondering if there might be a way to combine the two functions
 identify() and locator() such that if I use identify() and then click on a
 point outside the set tolerance, the x,y coordinates are returned as in
 locator().  Does anyone know of a way to do this?
 Thanks in advance for any help

 Since identify will only return the indexes of selected points, and
it only takes on-screen clicks for coordinates, you'll have to
leverage locator and duplicate some of the identify work. So call
locator(1), then compute the distances to your points, and if any are
below your tolerance mark them using text(), otherwise keep the
coordinates of the click.

 You can use dist() to compute a distance matrix, but if you want to
totally replicate identify's tolerance behaviour I think you'll have
to convert from your data coordinates to device coordinates. The
grconvertX() and grconvertY() functions look like they'll do that for you.

 Okay, that's the flatpack delivered, I think you've got all the
parts, some assembly required!
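Assembled very roughly, Barry's recipe could look like this (a sketch; it measures distance in user coordinates rather than converting to device coordinates, so the tolerance will not behave exactly like identify()'s):

```r
pick <- function(x, y, tol) {
  p <- locator(1)                         # wait for one click
  # (a right-click / no click returns NULL; unhandled in this sketch)
  d <- sqrt((x - p$x)^2 + (y - p$y)^2)    # distances to all points
  i <- which.min(d)
  if (d[i] < tol) {
    text(x[i], y[i], labels = i, pos = 3) # label the point, like identify()
    i                                     # return its index
  } else {
    p                                     # return the click's coordinates
  }
}
```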

Barry



Re: [R] Download daily weather data

2009-02-27 Thread Pfaff, Bernhard Dr.
Dear Thomas,

more for the sake of completeness and as an alternative to R: there are GRIB
data sets [1] available (some for free), and there is the GPL software GrADS
[2]. Because the GRIB format is well documented, it should be possible to get it
into R easily and make up your own plots/weather analyses. I do not know, and
have not checked, whether somebody has already done so.

I use this information and these tools, among others, during longer off-shore
sailing trips.

Best,
Bernhard 

[1] http://www.grib.us/
[2] http://www.iges.org/grads/

-----Original Message-----
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Scillieri, John
Sent: Thursday, 26 February 2009 22:58
To: 'James Muller'; 'r-help@r-project.org'
Subject: Re: [R] Download daily weather data

Looks like you can sign up to get XML feed data from Weather.com

http://www.weather.com/services/xmloap.html

Hope it works out!

-Original Message-
From: r-help-boun...@r-project.org 
[mailto:r-help-boun...@r-project.org] On Behalf Of James Muller
Sent: Thursday, February 26, 2009 3:57 PM
To: r-help@r-project.org
Subject: Re: [R] Download daily weather data

Thomas,

Have a look at the source code for the webpage (ctrl-u in firefox,
don't know in internet explorer, etc.). That is what you'd have to
parse in order to get the forecast from this page. Typically when I
parse webpages such as this I use regular expressions to do so (and I
would never downplay the usefulness of regular expressions, but they
take a little getting used to). There are two parts to the task: find
patterns that allow you to pull out the datum/data you're after; and
then write a program to pull it/them out. Also, of course, download
the webpage (but that's no issue).
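A bare-bones sketch of that approach (the URL and the pattern below are placeholders, not the real NOAA page or its markup):

```r
url  <- "http://example.org/forecast.html"      # placeholder URL
page <- paste(readLines(url), collapse = "\n")  # download the page
# pull out anything that looks like a Fahrenheit temperature, e.g. "42 F":
m <- gregexpr("-?[0-9]+ ?F", page)
temps <- regmatches(page, m)[[1]]
```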

I bet you'd be able to find a comma separated value (CSV) file
containing the weather report somewhere, which would probably involve
a little less labor in order to produce your automatic wardrobe
advice.

James



On Thu, Feb 26, 2009 at 3:47 PM, Thomas Levine 
thomas.lev...@gmail.com wrote:
 I'm writing a program that will tell me whether I should wear a coat,
 so I'd like to be able to download daily weather forecasts and daily
 reports of recent past weather conditions.

 The NOAA has very promising tabular forecasts
 
(http://forecast.weather.gov/MapClick.php?CityName=Ithaca&state=NY&site=BGM&textField1=42.4422&textField2=-76.5002&e=0&FcstType=digital),
 but I can't figure out how to import them.

 Someone must have needed to do this before. Suggestions?

 Thomas Levine!




[R] Axis-question

2009-02-27 Thread Antje

Hi there,

I was wondering whether it's possible to generate an axis with groups (like in 
Excel).


So that you can have something like this as x-axis (for example for the 
levelplot-method of the lattice package):


---
| X1 | X2 | X3 | X1 | X2 | X3 | X1 | ...
| group1   | group2   | group3  ...
..
..
..

I hope you understand what I'm looking for.
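One base-graphics way to fake such a two-level axis (a sketch, not a lattice/levelplot solution; the positions and labels are invented):

```r
x <- 1:6
plot(x, rnorm(6), xaxt = "n", xlab = "")
axis(1, at = 1:6, labels = rep(c("X1", "X2", "X3"), 2))  # inner level
axis(1, at = c(2, 5), labels = c("group1", "group2"),
     line = 2, tick = FALSE)                             # group level, lower
```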



Re: [R] accessing and preserving list names in lapply

2009-02-27 Thread baptiste auguie

Hi,

Perhaps Hadley's plyr package can help,


library(plyr)
temp <- list(x=2, y=3, x=4)
llply(temp, function(x) x^2 )

$x
[1] 4

$y
[1] 9

$x
[1] 16



baptiste

On 27 Feb 2009, at 03:07, Alexy Khrabrov wrote:


Sometimes I'm iterating over a list where names are keys into another
data structure, e.g. a related list.  Then I can't use lapply as it
does [[]] and loses the name.  Then I do something like this:

do.one <- function(ldf) { # list-dataframe item
  key <- names(ldf)
  meat <- ldf[[1]]
  mydf <- some.df[[key]] # related data structure
  r.df <- cbind(meat, new.column=computed)
  r <- list(xxx=r.df)
  names(r) <- key
  r
}

then if I operate on the list L of those ldf's not as lapply(L,...),  
but


res <- lapply(1:length(L), do.one)

Can this procedure be simplified so that names are preserved?
Specifically, can the xxx=..., names(r) <- key part be eliminated -- how
can we have a variable on the left-hand side of list(lhs=value)?
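For that last part -- putting a variable on the left-hand side of a list name -- setNames() is the usual answer; a sketch with invented values:

```r
key <- "a"                 # name held in a variable
val <- 1:3
setNames(list(val), key)   # one-element list whose name is the value of key
```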

Cheers,
Alexy



_

Baptiste Auguié

School of Physics
University of Exeter
Stocker Road,
Exeter, Devon,
EX4 4QL, UK

Phone: +44 1392 264187

http://newton.ex.ac.uk/research/emag





[R] Balanced design, differences in results using anova and lmer/anova

2009-02-27 Thread Lars Kunert
Hi, I am trying to do an analysis of variance for an unbalanced design.
As a toy example, I use a dataset presented by K. Hinkelmann and O.
Kempthorne in Design and Analysis of Experiments (pp. 353-356).
This example is very similar to my own dataset, with one difference: it
is balanced.
Thus it is possible to do an analysis using both (1) anova and (2) lmer.
Furthermore, I can compare my results with the results presented in the
book (the book uses SAS).

In short:
 using anova, I can reproduce the results presented in the book;
 using lmer, I fail to reproduce the results.
However, for my real analysis I need lmer -- what am I doing wrong?

The example uses a randomized complete block design (RCBD) with a
nested blocking structure and subsampling.

response:
  height (of some trees)
covariates:
  HSF (type of the trees)
nested covariates:
  loc (location)
  block  (block is nested in location)

# the data (file: pine.txt) looks like this:

loc block HSF height
1   1     1   210
1   1     1   221
1   1     2   252
1   1     2   260
1   1     3   197
1   1     3   190
1   2     1   222
1   2     1   214
1   2     2   265
1   2     2   271
1   2     3   201
1   2     3   210
1   3     1   220
1   3     1   225
1   3     2   271
1   3     2   277
1   3     3   205
1   3     3   204
1   4     1   224
1   4     1   231
1   4     2   270
1   4     2   283
1   4     3   211
1   4     3   216
2   1     1   178
2   1     1   175
2   1     2   191
2   1     2   193
2   1     3   182
2   1     3   179
2   2     1   180
2   2     1   184
2   2     2   198
2   2     2   201
2   2     3   183
2   2     3   190
2   3     1   189
2   3     1   183
2   3     2   200
2   3     2   195
2   3     3   197
2   3     3   205
2   4     1   184
2   4     1   192
2   4     2   197
2   4     2   204
2   4     3   192
2   4     3   190

#
# then I load the data
#
read.data = function()
{
d = read.table( "pines.txt", header=TRUE )

d$loc   = as.factor( d$loc   )
d$block.tmp = as.factor( d$block )
d$block = ( d$loc:d$block.tmp )[drop=TRUE]  # lme4 does not support
implicit nesting

d$HSF   = as.factor( d$HSF )

return( d )
}

d = read.data()


#
# using anova.
#
m.aov = aov( height ~ HSF*loc + Error(loc/block + HSF:loc/block), data=d )
summary( m.aov )

#
# I get:
#
Error: loc
Df Sum Sq Mean Sq
loc  1  20336   20336

Error: loc:block
  Df  Sum Sq Mean Sq F value Pr(>F)
Residuals  6 1462.33  243.72

Error: loc:HSF
Df  Sum Sq Mean Sq
HSF  2 12170.7  6085.3
HSF:loc  2  6511.2  3255.6

Error: loc:block:HSF
  Df  Sum Sq Mean Sq F value Pr(>F)
Residuals 12 301.167  25.097

Error: Within
  Df Sum Sq Mean Sq F value Pr(>F)
Residuals 24 529.00   22.04

#
# which is what I expected; however, using lmer
#
m.lmer = lmer( height ~ HSF*loc + HSF*(loc|block), data=d )
anova( m.lmer )

#
# I get:
#
Analysis of Variance Table
Df  Sum Sq Mean Sq
HSF  2 12170.7  6085.3
loc  1  1924.6  1924.6
HSF:loc  2  6511.2  3255.6

#
# which is, at least, not what I expected...
#
Thanks for your help, Lars



Re: [R] Using package ROCR

2009-02-27 Thread wiener30

Just an update concerning an error message when using the ROCR package.

Error in as.double(y) : 
  cannot coerce type 'S4' to vector of type 'double' 

I changed the sequence of loading the packages and the problem is
gone:
library(ROCR)
library(randomForest)

The loading sequence that caused an error was:
library(randomForest)
library(ROCR)

Maybe this info could be useful for somebody else who is getting the same
error.




wiener30 wrote:
 
 Thank you very much for the response!
 
 The plot(1,1) helped to resolve the first problem.
 But I am still getting a second error message when running demo(ROCR)
 
 Error in as.double(y) : 
   cannot coerce type 'S4' to vector of type 'double'
 
 It seems it has something to do with compatibility of S4 objects.
 
 My versions of R and ROCR package are the same as you listed.
 But it seems something else is missing in my installation.
 
 
 William Doane wrote:
 
 
 Responding to question 1... it seems the demo assumes you already have a
 plot window open.
 
   library(ROCR)
   plot(1,1)
   demo(ROCR)
 
 seems to work.
 
 For question 2, my environment produces the expected results... plot
 doesn't generate an error:
   * R 2.8.1 GUI 1.27 Tiger build 32-bit (5301)
   * OS X 10.5.6
   * ROCR 1.0-2
 
 -Wil
 
 
 
 wiener30 wrote:
 
 I am trying to use package ROCR to analyze classification accuracy,
 unfortunately there are some problems right at the beginning.
 
 Question 1) 
 When I try to run demo I am getting the following error message
 library(ROCR)
 demo(ROCR)
 if (dev.cur() <= 1)  [TRUNCATED] 
 Error in get(getOption("device")) : wrong first argument
 When I issue the command
 dev.cur() 
 it returns
 null device 
   1
 It seems something is wrong with my R-environment ?
 Could somebody provide a hint, what is wrong.
 
 Question 2)
 When I run an example commands from the manual
 library(ROCR)
 data(ROCR.simple)
 pred <- prediction( ROCR.simple$predictions, ROCR.simple$labels )
 perf <- performance( pred, "tpr", "fpr" )
 plot( perf )
 
 the plot command issues the following error message
 Error in as.double(y) : 
   cannot coerce type 'S4' to vector of type 'double'
 
 How this could be fixed ?
 
 Thanks for the support
 
 
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Using-package-ROCR-tp22198213p22242023.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] bottom legends in ggplot2 ?

2009-02-27 Thread ONKELINX, Thierry
I would think that the lines below should work but they give an error.
Hadley, can you clarify this?

Cheers,

Thierry

 library(ggplot2)
 qplot(mpg, wt, data=mtcars, colour=cyl) + opts(legend.position =
"bottom")
Error in grid.Call.graphics(L_setviewport, pvp, TRUE) : 
  Non-finite location and/or size for viewport
 ggplot(mtcars, aes(x = mpg, y = wt, colour = cyl)) + geom_point() +
opts(legend.position = "bottom")
Error in grid.Call.graphics(L_setviewport, pvp, TRUE) : 
  Non-finite location and/or size for viewport
 sessionInfo()
R version 2.8.1 (2008-12-22) 
i386-pc-mingw32 

locale:
LC_COLLATE=Dutch_Belgium.1252;LC_CTYPE=Dutch_Belgium.1252;LC_MONETARY=Du
tch_Belgium.1252;LC_NUMERIC=C;LC_TIME=Dutch_Belgium.1252

attached base packages:
[1] grid  stats graphics  grDevices datasets  utils methods

[8] base 

other attached packages:
[1] ggplot2_0.8.1 reshape_0.8.2 plyr_0.1.5proto_0.3-8  
 




ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
thierry.onkel...@inbo.be 
www.inbo.be 

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Avram Aelony
Sent: Thursday, 26 February 2009 20:34
To: r-h...@stat.math.ethz.ch
Subject: [R] bottom legends in ggplot2 ?


Has anyone had success with producing legends to a qplot graph such that
the legend is placed at the bottom, under the abscissa, rather than to the
right-hand side?

The following doesn't move the legend:
   library(ggplot2)
   qplot(mpg, wt, data=mtcars, colour=cyl,
gpar(legend.position="bottom") )


I am using ggplot2_0.8.2.

Thanks in advance,

Avram





Re: [R] gplot problems with faceting

2009-02-27 Thread ONKELINX, Thierry
Dear Pascal,

I think you need to define the facets as
facets = ~ Par
instead of
facets = Par ~ .

The Par ~ . syntax can be used with facet_grid and not with facet_wrap.
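With a toy data frame (names invented for illustration), the suggested form would be:

```r
library(ggplot2)
d <- data.frame(x = rnorm(30), y = rnorm(30),
                Par = rep(c("p1", "p2", "p3"), each = 10))
# facet_wrap takes a one-sided formula:
qplot(x, y, data = d) + facet_wrap(~ Par, scales = "free_x")
```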

HTH,

Thierry



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology 
and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
thierry.onkel...@inbo.be 
www.inbo.be 

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of BOISSON, Pascal
Sent: Thursday, 26 February 2009 17:08
To: r-help@r-project.org
Subject: [R] gplot problems with faceting

Dear R-Listers,

I am very confused by what seems to be a misuse of the faceting options with 
the qplot function, and I hope you might help me with this.

z contains various simulation results from simulations with different sets of 
parameters.
I melted my data to get the following data.frame structure:

 str(z)
'data.frame':   12383 obs. of  5 variables:
 $ vID  : num  1 2 3 4 5 6 7 8 9 10 ...
 $ Var  : Factor w/ 61 levels .t,.ASU_1.Biofilm_C,..: 1 1 1 1 1 1 
 $ Var.Value: num  317 318 319 320 319 ...
 $ Par  : Factor w/ 7 levels .Biostyr0d.t_K,..: 1 1 1 1 1 1 1 1 
 $ Par.Value: num  5 5 5 5 5 5 5 5 5 5 ...

I would like to plot, for each couple (Parameter(i), Variable(j)),
Variable(j).Value = f(Parameter(i).Value).
I would like to do it stepwise and have one set of graphs per Variable.
So I subset z based on a single variable name, e.g. .ASU_1.Biofilm_C.

Then I try the following, but I get an error message :

 qp <- qplot(Par.Value, Var.Value, data = z[z$Var==v,], ylab=v, 
 geom=c("point","smooth"), method="lm")
 qp <- qp + facet_wrap( facets = Par ~ ., scales = "free_x", ncol=length(vPar))
 qp
Error in `[.data.frame`(plot$data, , setdiff(cond, names(df)), drop = FALSE) 
: 
  undefined columns selected

I can get this working by changing the facets argument to Par ~ Var, and it 
does what I want, 
but that is not satisfying, and I am confused by this error message.
The same error occurs when I use the full data frame, 
or when I try other mappings like colour = Par.

Any idea of what I am doing wrong?

Best regards
Pascal Boisson





[R] rounding problem

2009-02-27 Thread Peterko

hi, I am creating some variables from the same data, but somewhere the
rounding differs.
look:
 P = abs(fft(d.zlato)/480)^2 
 hladane= sort(P,decreasing=T)[1:10]/480 
  
 pozicia=c(0,0,0,0,0) 
 for (j in 1:5){ for (i in 2:239){
  if (P[i]/480==hladane[2*j-1]){pozicia[j]=i-1}}}
 period=479/pozicia  

 P[2]/334 
 [1] 0.0001279107 
  hladane[1]
 [1] 0.0001279107
  P[2]/334==hladane[1]
 [1] FALSE
 abs(P[2]/334 - hladane[1]) < 0.001
 [1] TRUE

Is it possible to avoid this?
I know in this example I can use 2x if to eliminate this rounding issue, but I
need to fix it in general.
-- 
View this message in context: 
http://www.nabble.com/rounding-problem-tp22243179p22243179.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Installing different versions of R simultaneously on Linux

2009-02-27 Thread Prof Brian Ripley

This is really an R-devel question.

On Fri, 27 Feb 2009, Rainer M Krug wrote:


Hi

I want to install some versions of R simultaneously from source on a
computer (running Linux). Some programs have an option to specify a
suffix for the executable (eg R would become R-2.7.2 when the suffix
is specified as -2.7.2). I did not find this option for R - did I
overlook it?

If it is not, how is it possible to have several versions of R on one
computer, or is the only way to compile them and then call R in the
directory of the version where it was compiled (~/R-2.7.2/bin/R)?

If this is the case, would it be possible to add this option to
specify the suffix for the executables?


'R' is not an executable, but a shell script.

You can use 'prefix' to install R anywhere, or other variables for 
more precise control (see the R-admin manual).  For example, we use 
rhome to have R 2.8.x under /usr/local/lib64/R-2.8 etc.


And you can rename $prefix/bin/R to, say, R-2.7.2, or link 
R_HOME/bin/R to anywhere in your path, under any name you choose.




Thanks

Rainer
--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Faculty of Science
Natural Sciences Building
Private Bag X1
University of Stellenbosch
Matieland 7602
South Africa


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] rounding problem

2009-02-27 Thread baptiste auguie

Hi,

you probably want to use ?all.equal instead of ==

I couldn't run your example, though
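A self-contained illustration of the == vs. all.equal difference (a toy example, unrelated to the original data):

```r
x <- sqrt(2)^2           # mathematically 2, but not exactly 2 in floating point
x == 2                   # typically FALSE: exact bit-for-bit comparison
isTRUE(all.equal(x, 2))  # TRUE: comparison within a numeric tolerance
```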

Hope this helps,

baptiste

On 27 Feb 2009, at 10:32, Peterko wrote:



hi i am creating some variables from same data, but somewhere is  
different

rouding.
look:
P = abs(fft(d.zlato)/480)^2
hladane= sort(P,decreasing=T)[1:10]/480

pozicia=c(0,0,0,0,0)
for (j in 1:5){ for (i in 2:239){
 if (P[i]/480==hladane[2*j-1]){pozicia[j]=i-1}}}
period=479/pozicia


P[2]/334

[1] 0.0001279107

hladane[1]

[1] 0.0001279107

P[2]/334==hladane[1]

[1] FALSE

abs(P[2]/334 - hladane[1]) < 0.001

[1] TRUE

It is possible to avoid it ?
I know in this exam i can use 2x if to eliminate this rouding, but i  
need to

fix it in general.
--
View this message in context: 
http://www.nabble.com/rounding-problem-tp22243179p22243179.html
Sent from the R help mailing list archive at Nabble.com.



_

Baptiste Auguié

School of Physics
University of Exeter
Stocker Road,
Exeter, Devon,
EX4 4QL, UK

Phone: +44 1392 264187

http://newton.ex.ac.uk/research/emag



Re: [R] Installing different versions of R simultaneously on Linux

2009-02-27 Thread Wacek Kusnierczyk
Prof Brian Ripley wrote:
 This is really an R-devel question.

 On Fri, 27 Feb 2009, Rainer M Krug wrote:

 Hi

 I want to install some versions of R simultaneously from source on a
 computer (running Linux). Some programs have an option to specify a
 suffix for the executable (eg R would become R-2.7.2 when the suffix
 is specified as -2.7.2). I did not find this option for R - did I
 overlook it?

 If it is not, how is it possible to have several versions of R on one
 computer, or is the only way to compile them and then call R in the
 directory of the version where it was compiled (~/R-2.7.2/bin/R)?

 If this is the case, would it be possible to add this option to
 specify the suffix for the executables?

 'R' is not an executable, but a shell script.

depending on what is meant by 'executable'.

Files that contain instructions for an interpreter
http://en.wikipedia.org/wiki/Interpreter_%28computing%29 or virtual
machine http://en.wikipedia.org/wiki/Virtual_machine may be considered
executables [1]
The term might also be, but generally isn't, applied to scripts
http://foldoc.org/index.cgi?scripts which are interpreted by a command
line interpreter
http://foldoc.org/index.cgi?command+line+interpreter. [2]

try also

file `which R`

which is likely, system-dependently, to say that it's *executable*
(independently of the access mode).

vQ


[1] http://en.wikipedia.org/wiki/Executable
[2] http://foldoc.org/index.cgi?query=executableaction=Search


vQ



Re: [R] rounding problem

2009-02-27 Thread Peterko

all.equal is what I need; many thanks for your help.

baptiste auguie-2 wrote:
 
 Hi,
 
 you probably want to use ?all.equal instead of ==
 
 I couldn't run your example, though
 
 Hope this helps,
 
 baptiste
 
 On 27 Feb 2009, at 10:32, Peterko wrote:
 

 hi i am creating some variables from same data, but somewhere is  
 different
 rouding.
 look:
 P = abs(fft(d.zlato)/480)^2
 hladane= sort(P,decreasing=T)[1:10]/480

 pozicia=c(0,0,0,0,0)
 for (j in 1:5){ for (i in 2:239){
  if (P[i]/480==hladane[2*j-1]){pozicia[j]=i-1}}}
 period=479/pozicia

 P[2]/334
 [1] 0.0001279107
 hladane[1]
 [1] 0.0001279107
 P[2]/334==hladane[1]
 [1] FALSE
 abs(P[2]/334 - hladane[1]) < 0.001
 [1] TRUE

 It is possible to avoid it ?
 I know in this exam i can use 2x if to eliminate this rouding, but i  
 need to
 fix it in general.
 --
 View this message in context:
 http://www.nabble.com/rounding-problem-tp22243179p22243179.html
 Sent from the R help mailing list archive at Nabble.com.

 
 _
 
 Baptiste Auguié
 
 School of Physics
 University of Exeter
 Stocker Road,
 Exeter, Devon,
 EX4 4QL, UK
 
 Phone: +44 1392 264187
 
 http://newton.ex.ac.uk/research/emag
 
 
 

-- 
View this message in context: 
http://www.nabble.com/rounding-problem-tp22243179p22243567.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Gerard M. Keogh
Frank,

I can't see the code you mention - Web marshal at work - but I don't think
you should be too quick to run down SAS - it's a powerful and flexible
language, but unfortunately very expensive.

Your example mentions doing a vector product in the macro language - this
only suggests to me that the people writing the code need a crash course
in SAS/IML (the matrix language). SAS is designed to work on records and so
is inappropriate for matrices - macros are only an efficient code-copying
device. Doing matrix computations this way is pretty mad, and the
code would be impossible, never mind the memory problems.
SAS recognises that, but a lot of SAS users remain unfamiliar with IML.

In IML by contrast there are inner, cross and outer products and a raft of
other useful methods for matrix work that R users would be familiar with.
OLS for example is one line:

b = solve(X`X, X`y) ;
rss = sqrt(ssq(y - X*b)) ;

And to give you a flavour of IML's capabilities I implemented a SAS version
of the MARS program in it about 6 or 7 years ago.
BTW SPSS also has a matrix language.

Gerard



   
  From: Frank E Harrell Jr f.harr...@vanderbilt.edu
        (sent by r-help-boun...@r-project.org)
  To:   R list r-h...@stat.math.ethz.ch
  Subject: [R] Inefficiency of SAS Programming
  Date: 26/02/2009 22:57




If anyone wants to see a prime example of how inefficient it is to
program in SAS, take a look at the SAS programs provided by the US
Agency for Healthcare Research and Quality for risk adjusting and
reporting for hospital outcomes at
http://www.qualityindicators.ahrq.gov/software.htm .  The PSSASP3.SAS
program is a prime example.  Look at how you do a vector product in the
SAS macro language to evaluate predictions from a logistic regression
model.  I estimate that using R would easily cut the programming time of
this set of programs by a factor of 4.

Frank
--
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University




**
The information transmitted is intended only for the person or entity to which 
it is addressed and may contain confidential and/or privileged material. Any 
review, retransmission, dissemination or other use of, or taking of any action 
in reliance upon, this information by persons or entities other than the 
intended recipient is prohibited. If you received this in error, please contact 
the sender and delete the material from any computer.  It is the policy of the 
Department of Justice, Equality and Law Reform and the Agencies and Offices 
using its IT services to disallow the sending of offensive material.
Should you consider that the material contained in this message is offensive 
you should contact the sender immediately and also mailminder[at]justice.ie.


Re: [R] bottom legends in ggplot2 ?

2009-02-27 Thread hadley wickham
Yes, this is a known bug which will (hopefully) be addressed in the
next release.
Hadley

On Fri, Feb 27, 2009 at 4:15 AM, ONKELINX, Thierry
thierry.onkel...@inbo.be wrote:
 I would think that the lines below should work but they give an error.
 Hadley, can you clarify this?

 Cheers,

 Thierry

 library(ggplot2)
 qplot(mpg, wt, data=mtcars, colour=cyl) + opts(legend.position =
 "bottom")
 Error in grid.Call.graphics(L_setviewport, pvp, TRUE) :
  Non-finite location and/or size for viewport
 ggplot(mtcars, aes(x = mpg, y = wt, colour = cyl)) + geom_point() +
 opts(legend.position = "bottom")
 Error in grid.Call.graphics(L_setviewport, pvp, TRUE) :
  Non-finite location and/or size for viewport
 sessionInfo()
 R version 2.8.1 (2008-12-22)
 i386-pc-mingw32

 locale:
 LC_COLLATE=Dutch_Belgium.1252;LC_CTYPE=Dutch_Belgium.1252;LC_MONETARY=Du
 tch_Belgium.1252;LC_NUMERIC=C;LC_TIME=Dutch_Belgium.1252

 attached base packages:
 [1] grid      stats     graphics  grDevices datasets  utils     methods

 [8] base

 other attached packages:
 [1] ggplot2_0.8.1 reshape_0.8.2 plyr_0.1.5    proto_0.3-8



 
 
 ir. Thierry Onkelinx
 Instituut voor natuur- en bosonderzoek / Research Institute for Nature
 and Forest
 Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
 methodology and quality assurance
 Gaverstraat 4
 9500 Geraardsbergen
 Belgium
 tel. + 32 54/436 185
 thierry.onkel...@inbo.be
 www.inbo.be

 To call in the statistician after the experiment is done may be no more
 than asking him to perform a post-mortem examination: he may be able to
 say what the experiment died of.
 ~ Sir Ronald Aylmer Fisher

 The plural of anecdote is not data.
 ~ Roger Brinner

 The combination of some data and an aching desire for an answer does not
 ensure that a reasonable answer can be extracted from a given body of
 data.
 ~ John Tukey

 -Oorspronkelijk bericht-
 Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 Namens Avram Aelony
 Verzonden: donderdag 26 februari 2009 20:34
 Aan: r-h...@stat.math.ethz.ch
 Onderwerp: [R] bottom legends in ggplot2 ?


 Has anyone had success with producing legends to a qplot graph such that
 the legend is placed at the bottom, under the abscissa, rather than to the
 right-hand side?

 The following doesn't move the legend:
       library(ggplot2)
       qplot(mpg, wt, data=mtcars, colour=cyl,
 gpar(legend.position="bottom") )


 I am using ggplot2_0.8.2.

 Thanks in advance,

 Avram


 Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer
 en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd 
 is
 door een geldig ondertekend document. The views expressed in  this message
 and any annex are purely those of the writer and may not be regarded as 
 stating
 an official position of INBO, as long as the message is not confirmed by a 
 duly
 signed document.




-- 
http://had.co.nz/



Re: [R] survival::predict.coxph

2009-02-27 Thread Bernhard Reinhardt

Hello Terry,

it's really great to receive some feedback from a pro. I'm not sure if 
I've got the point right:
You suppose that the cox-model isn't good at forecasting an expected 
survival time because of the issues with predicting the 
survival function at the right tail, and one should rather use parametric 
models like an exponential model? Or what do you mean by a smooth 
parametric estimate?
Anyway, I just ordered your book at the library. Hopefully I'll get some 
more insights from reading it.


Maybe I should point out why I even tried to do such forecasts.

Following the article "Quantifying climate-related risks and 
uncertainties using Cox regression models" by Maia and Meinke, I try to 
deduce winter precipitation from lagged Sea-Surface Temperatures (SSTs).
So precipitation is my survival time and the SST observations at 
different lags are my covariates.
The sample size is only 55 and I've got 11 covariates (lag = 0 months to 
lag = 10 months) to choose from.
My first goal is to identify the optimal time-lag(s) between the 
SST-anomaly observation and the precipitation observation.

Expectation was that the lag should be some months.

I thought a cox-model would easily provide such a selection. At first I 
used the covariates individually. Coefficients for lags between 0 and 5 
months were all quite big and then decreasing from 6 to 10 months. So I 
think 5 months could be the lag of the process and high persistence of 
the SST accounts for the big coefficients for 0-4 months.


As the next step I used all 11 covariates at once. I hoped to gain 
similar results. Instead the sign of the coefficients randomly jumps 
from plus to minus and the magnitude as well is randomly distributed.


I also tried using sets of three covariates, e.g. with lags 4, 5, 6. But 
even then the sign of the coefficients varies.


So my thought was that maybe I overfitted the model. But in fact I did 
not find any literature on whether that's even possible. As far as my limited 
knowledge goes, an overfitted model should reproduce the training period 
very well but other periods very poorly. So I first tried to reproduce the 
training period. But so far with no success - whether using 11 
covariates or just 1.


Regards

Bernhard R.

Terry Therneau wrote:

You are mostly correct.
Because of the censoring issue, there is no good estimate of the mean survival 
time.  The survival curve either does not go to zero, or gets very noisy near 
the right hand tail (large standard error); a smooth parametric estimate is what 
is really needed to deal with this.
  For this reason the mean survival, though computed (but see the 
survfit.print.mean option, help(print.survfit)) is not highly regarded.  It is 
not an option in predict.coxph.
  
  	Terry T.
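Terry's remark about the printed (restricted) mean can be sketched with the survival package, which ships with R; the lung data and the covariate values here are illustrative, not taken from the thread:

```r
library(survival)

# Cox model on the bundled lung data
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Predicted survival curve for one hypothetical subject
sf <- survfit(fit, newdata = data.frame(age = 60, sex = 1))

# The restricted mean is printed on request; see ?print.survfit
print(sf, print.rmean = TRUE)
```

The restricted mean truncates the integral of the survival curve at the last event time, which is exactly the censoring caveat Terry describes.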


 begin included message --
 
Hi,


if I got it right then the survival-time we expect for a subject is the 
integral over the specific survival-function of the subject from 0 to t_max.


If I have a trained cox-model and want to make a prediction of the 
survival-time for a new subject I could use
survfit(coxmodel, newdata=newSubject) to estimate a new 
survival-function which I have to integrate thereafter.


Actually I thought predict(coxmodel, newSubject) would do this for me, 
but I'm confused about which type I have to declare. If I understand the 
little pieces of documentation right, then none of the available types is 
exactly the predicted survival time.
I think I have to use the mean survival time of the baseline function 
times exp(the result of type "linear predictor").


Am I right?






Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Ajay ohri
I would like to know if we can create a package in which R functions are
renamed closer to SAS language. Doing so would help people familiar with SAS
to take to R straight away for their work, thus decreasing the threshold for
acceptance - and then get into deeper understanding later.
Since it is a package, it would be optional, only for people wanting to try
out R coming from SAS. Do we have such a package right now? It would basically
mask R functions with the equivalent function from another language, just for
user ease / beginners.

for example

creating function for means

procmeans <- function(x, y) {
  summary(subset(x, select = c(x, y)))
}

creating function for importing csv

procimport <- function(x, y) {
  read.csv(textConnection(x), row.names = y, na.strings = "")
}


creating function fo describing data

procunivariate <- function(x) {
  summary(x)
}

regards,

ajay

www.decisionstats.com

On Fri, Feb 27, 2009 at 4:27 AM, Frank E Harrell Jr 
f.harr...@vanderbilt.edu wrote:

 If anyone wants to see a prime example of how inefficient it is to program
 in SAS, take a look at the SAS programs provided by the US Agency for
 Healthcare Research and Quality for risk adjusting and reporting for
 hospital outcomes at http://www.qualityindicators.ahrq.gov/software.htm .
  The PSSASP3.SAS program is a prime example.  Look at how you do a vector
 product in the SAS macro language to evaluate predictions from a logistic
 regression model.  I estimate that using R would easily cut the programming
 time of this set of programs by a factor of 4.

 Frank
 --
 Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University






Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Barry Rowlingson
2009/2/27 Peter Dalgaard p.dalga...@biostat.ku.dk:

 Presumably, something like

     IF N. =  1 THEN SUB_N = 1;
     ELSE IF N. < 5 THEN SUB_N = N.-1;
     ELSE IF N. < 16 THEN SUB_N = N.-2;
     ELSE SUB_N = N.-3;

 would work, provided that 2, 5, 16 are impossible values. Problem is that it
 actually makes the code harder to grasp, so experienced SAS programmers go
 for the dumb but readable code like the above.

 I'm not sure which is easier to grasp. When I first saw the original
version I thought it was an odd way of doing SUB_N = N.. Only then
did I have a closer look and spot the missing 2, 5, and 16. A comment
would have been very enlightening. But there was nothing relevant.

 In R, the cleanest I can think of is

 subn <- match(n, setdiff(1:19, c(2,5,16)))

 or maybe just

 subn <- match(n, c(1, 3:4, 6:15, 17:19))

 although

 subn <- factor(n, levels = c(1, 3:4, 6:15, 17:19))

 might be what is really wanted

 I think the important thing with any programming is to make sure what
you want is expressed in words somewhere. If not in the code, then in
the comments. And operations like this should be abstracted into
functions.
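The match() approach above can be checked directly (an editorial sketch, not from the thread):

```r
# Codes 2, 5 and 16 are impossible; match() compresses the remaining ones
allowed <- setdiff(1:19, c(2, 5, 16))
n <- c(1, 3, 6, 17)
subn <- match(n, allowed)
subn  # 1 2 4 14 -- the same mapping as the SAS IF/ELSE ladder
```

Here n = 17 maps to 14 (i.e. n - 3), exactly as the last ELSE branch computes.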

  All the examples of SAS code I've seen seem to fall into the old
practices of writing great long 'scripts', with minimal code-reuse and
encapsulation of useful functionality. If these SAS scripts are then
given to new SAS programmers then the chances are they will follow
these bad practices. Show them well-written R code (or C, or Python)
and maybe they can implement those good practices into their SAS work.
Assuming SAS can do that. I'm not sure.


Barry



Re: [R] Installing different versions of R simultaneously on Linux

2009-02-27 Thread Rainer M Krug
On Fri, Feb 27, 2009 at 12:37 PM, Prof Brian Ripley
rip...@stats.ox.ac.uk wrote:
 This is really an R-devel question.

sorry about the wrong list.


 On Fri, 27 Feb 2009, Rainer M Krug wrote:

 Hi

 I want to install some versions of R simultaneously from source on a
 computer (running Linux). Some programs have an option to specify a
 suffix for the executable (eg R would become R-2.7.2 when the suffix
 is specified as -2.7.2). I did not find this option for R - did I
 overlook it?

 If it is not, how is it possible to have several versions of R on one
 computer, or is the only way to compile them and then call R in the
 directory of the version where it was compiled (~/R-2.7.2/bin/R)?

  If this is the case, would it be possible to add this option to
  specify the suffix for the executables?

 'R' is not an executable, but a shell script.

 You can use 'prefix' to install R anywhere, or other variables for more
 precise control (see the R-admin manual).  For example, we use rhome to have
 R 2.8.x under /usr/local/lib64/R-2.8 etc.

 And you can rename $prefix/bin/R to, say, R-2.7.2, or link R_HOME/bin/R to
 anywhere in yout path, under any name you choose.

OK - so the procedure will be: if I want to install R 2.7.2 without
impacting my existing installation of R (which is managed by a package
manager), I use
 ./configure --prefix=/usr/R-2.7.2
 make
 sudo make install
 ln -s /usr/R-2.7.2/bin/R /usr/bin/R-2.7.2

and when I use
 R-2.7.2

it will start R 2.7.2

I can continue with as many installed versions as I want.

Thanks a lot,
that was what I was looking for

Rainer



 Thanks

 Rainer
 --
 Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
 Biology, UCT), Dipl. Phys. (Germany)

 Centre of Excellence for Invasion Biology
 Faculty of Science
 Natural Sciences Building
 Private Bag X1
 University of Stellenbosch
 Matieland 7602
 South Africa

 --
 Brian D. Ripley,                  rip...@stats.ox.ac.uk
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford,             Tel:  +44 1865 272861 (self)
 1 South Parks Road,                     +44 1865 272866 (PA)
 Oxford OX1 3TG, UK                Fax:  +44 1865 272595




-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Faculty of Science
Natural Sciences Building
Private Bag X1
University of Stellenbosch
Matieland 7602
South Africa



Re: [R] Installing different versions of R simultaneously on Linux

2009-02-27 Thread Berwin A Turlach
G'day Rainer,

On Fri, 27 Feb 2009 10:53:12 +0200
Rainer M Krug r.m.k...@gmail.com wrote:

  What flavour of Linux are we talking about?
 
 Sorry - I am running SuSE on the machine where I need it.

Sorry, I am not familiar with that flavour; before switching to Debian
(and Debian based distributions), I was using RedHat.  And before that
Slackware.

  4) Run in /opt/src a script that uses update-alternative install
  to install the new version and creates a link
  from /opt/R/R-x.y.z/bin/R to /opt/bin/R-x.y.z
 
 How do I do this? I usually call sudo make install. Do I have to use
 update-alternative --install R-2.7.1 R 2 if I want to have R-2.7.1
 aqs the second priority installed?

I do the make install step manually, the script just alerts the
system that another alternative for the R command was installed.

If memory serves correctly, the alternatives mechanism was developed
by Debian and adopted by RedHat (or the other way round).  I am not
sure whether SuSE has adopted this, or a similar system.

Essentially, a command, say foo, for which several alternatives
exist is installed on the system in, say, /usr/bin/, as a link
to /etc/alternatives/foo, and /etc/alternatives/foo is a link to the
actual program that is called.  

E.g. on my machine I have

ber...@berwin-nus1:~$ update-alternatives --list wish
/usr/bin/wish8.5
/usr/bin/wish8.4

which tells me that wish 8.5 and wish8.4 are installed and I could call
them explicitly.  /usr/bin/wish is a link to /etc/alternatives/wish
and /etc/alternatives/wish will point to either of these two programs
(depending on what the system admin decided should be the default, i.e.
should be used if a user just types 'wish').  

A command like update-alternatives --config wish allows to configure
whether wish should mean wish8.5 or wish8.4.  And all that is
necessary is to change the link in /etc/alternatives/wish to point at
the desired program.
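For example, registering two source-built R versions might look like this (the paths and priorities are hypothetical; run as root on a system that has the alternatives mechanism):

```shell
# Register both builds; the higher priority (20) becomes the automatic default
update-alternatives --install /usr/local/bin/R R /opt/R/R-2.8.1/bin/R 20
update-alternatives --install /usr/local/bin/R R /opt/R/R-2.7.2/bin/R 10

update-alternatives --list R      # list the registered builds
update-alternatives --config R    # interactively pick which one 'R' runs
```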

 That is what I need - but I can't find update-alternatives in SuSE

As I said, I do not know whether SuSE offers this alternatives system
or a similar system.  If it does, perhaps it is just a matter of
installing some additional packages?  If it offers a different but
similar system, then you would have to ask on a SuSE list how that
system is maintained and configured.

On my machine I would say apt-file search update-alternatives to find
out which package provides that command and to install that package if
it is not yet installed.  I am afraid I do not know what the equivalent
command on SuSE is.

  Typing R alone will usually start the most recently installed
  version (as this will have the highest priority) but I can configure
  that via sudo update-alternatives --config R.  I.e., I can make R
  run a particular version.  Since the update-alternatives step
  above also registers all the *.info files and man pages, I will
  also access the documentation of that particular R version (e.g.,
  C-h i in emacs will give me access to the info version of the
  manuals of the version of R which is run by the R command).
 
 Exactly what I would like to have.

Well, if you ever use a system that has the alternatives set up and the
update-alternatives command, I am happy to share my script with you. 

Cheers,

Berwin



Re: [R] Installing different versions of R simultaneously on Linux

2009-02-27 Thread Rainer M Krug
On Fri, Feb 27, 2009 at 1:49 PM, Berwin A Turlach
ber...@maths.uwa.edu.au wrote:
 G'day Rainer,

 On Fri, 27 Feb 2009 10:53:12 +0200
 Rainer M Krug r.m.k...@gmail.com wrote:

  What flavour of Linux are we talking about?

 Sorry - I am running SuSE on the machine where I need it.

 Sorry, I am not familiar with that flavour; before switching to Debian
 (and Debian based distributions), I was using RedHat.  And before that
 Slackware.

  4) Run in /opt/src a script that uses update-alternative install
  to install the new version and creates a link
  from /opt/R/R-x.y.z/bin/R to /opt/bin/R-x.y.z

 How do I do this? I usually call sudo make install. Do I have to use
 update-alternative --install R-2.7.1 R 2 if I want to have R-2.7.1
 aqs the second priority installed?

 I do the make install step manually, the script just alerts the
 system that another alternative for the R command was installed.

 If memory serves correctly, the alternatives mechanism was developed
 by Debian and adopted by RedHat (or the other way round).  I am not
 sure whether SuSE has adopted this, or a similar system.

 Essentially, for a command, say foo, for which several alternatives
 exists, is installed on the system in, say /usr/bin/, as a link
 to /etc/alternatives/foo and /etc/alternatives/foo is a link to the
 actual program that is called.

 E.g. on my machine I have

 ber...@berwin-nus1:~$ update-alternatives --list wish
 /usr/bin/wish8.5
 /usr/bin/wish8.4

 which tells me that wish 8.5 and wish8.4 are installed and I could call
 them explicitly.  /usr/bin/wish is a link to /etc/alternatives/wish
 and /etc/alternatives/wish will point to either of these two programs
 (depending on what the system admin decided should be the default, i.e.
 should be used if a user just types 'wish').

 A command like update-alternatives --config wish allows to configure
 whether wish should mean wish8.5 or wish8.4.  And all that is
 necessary is to change the link in /etc/alternatives/wish to point at
 the desired program.

 That is what I need - but I can't find update-alternatives in SuSE

 As I said, I do not know whether SuSE offers this alternatives system
 or a similar system.  If it does, perhaps it is just a matter of
 installing some additional packages?  If it offers a different, but
 similar system, then you would have to ask on a SuSE list on that
 system is maintained and configured.

 On my machine I would say apt-file search update-alternatives to find
 out which package provides that command and to install that package if
 it is not yet installed.  I am afraid I do not know what the equivalent
 command on SuSE is.

  Typing R alone will usually start the most recently installed
  version (as this will have the highest priority) but I can configure
  that via sudo update-alternatives --config R.  I.e., I can make R
  run a particular version.  Since the update-alternatives step
  above also registers all the *.info files and man pages, I will
  also access the documentation of that particular R version (e.g.,
  C-h i in emacs will give me access to the info version of the
  manuals of the version of R which is run by the R command).

 Exactly what I would like to have.

 Well, if you ever use a system that has the alternatives set up and the
 update-alternatives command, I am happy to share my script with you.

Thanks a lot for the offer - that would be great. I will set it up the
same way on my PC with Xubuntu.

Cheers

Rainer


 Cheers,

        Berwin





-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Faculty of Science
Natural Sciences Building
Private Bag X1
University of Stellenbosch
Matieland 7602
South Africa



[R] Advice on graphics to design circle with density-shaded sectors

2009-02-27 Thread John Poulsen

Hello,

I am looking for some general advice on which graphics package to use to 
make a figure demonstrating my experimental design.


I want to design a circle with 7 sectors inside.  Then I will want to 
shade the sectors depending on densities of observations in the 
sectors.  I will also want to draw horizontal lines at increments along 
the sectors to demonstrate different distances out to the end of the sector.


Given this sparse description, does anyone have advice on what package 
or functions to use in R?


Thanks for your help,
John
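One base-graphics sketch (an editorial addition, not from the thread; the densities are simulated): draw the circle outline, then shade each of 7 sectors with polygon(), darker meaning denser.

```r
pdf(NULL)  # off-screen device so the sketch runs anywhere

n <- 7
dens <- runif(n)  # stand-in for observed densities, scaled to [0, 1]
theta <- seq(0, 2 * pi, length.out = 200)

# Circle outline with equal aspect ratio
plot(cos(theta), sin(theta), type = "l", asp = 1,
     axes = FALSE, xlab = "", ylab = "")

# One shaded wedge per sector
for (i in seq_len(n)) {
  a <- seq((i - 1) * 2 * pi / n, i * 2 * pi / n, length.out = 50)
  polygon(c(0, cos(a)), c(0, sin(a)), col = gray(1 - dens[i]))
}

invisible(dev.off())
```

Distance increments within a sector can then be added with lines() or segments() at the desired radii.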



[R] How to get input-data of ROCR

2009-02-27 Thread bioshaw
Hi,
I have a problem while using the ROCR package in R.
I understand the three main commands, but I can't understand the input format,
including ROCR.hiv, ROCR.simple and ROCR.xval (actually, not only the format
but also how to obtain such data).
##
vectors (scores: numeric; labels: 0 or 1)
multiple runs (cross-validation, bootstrapping, ...)

What are the scores?
I use randomForest on Windows XP, but I can't obtain such data.
Would you please give me some details about the data?
It would be even better if you could show me an example.
version: R 2.8.0, ROCR 1.0-2, randomForest 4.5-28
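An editorial sketch of the expected input shape (the data are made up): "scores" are continuous classifier outputs, e.g. class probabilities from predict(rf, type = "prob"), paired with 0/1 labels.

```r
set.seed(42)
labels <- rbinom(100, 1, 0.5)            # true 0/1 classes
scores <- labels + rnorm(100, sd = 0.7)  # continuous confidence values
str(list(predictions = scores, labels = labels))  # same layout as ROCR.simple

# With ROCR installed, this pair feeds the three main commands:
# pred <- ROCR::prediction(scores, labels)
# perf <- ROCR::performance(pred, "tpr", "fpr")
# plot(perf)
```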
Best Wishes
Jiamin Shaw
2009.2.28




[R] add absolute value to bars in barplot

2009-02-27 Thread soeren . vogel

Hello,

barplot(twcons.area,
  beside=T, col=c("green4", "blue", "red3", "gray"),
  xlab="estate",
  ylab="number of persons", ylim=c(0, 110),
  legend.text=c("treated", "mix", "untreated", NA))

produces a barplot very fine. In addition, I'd like to get the bars'  
absolute values on the top of the bars. How can I produce this in an  
easy way?


Thanks

Sören

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] help with correct use of function lsfit

2009-02-27 Thread mauede
For the purpose of fitting a 2nd-order polynomial (a + b*x + c*x^2) to the 
chunk of signal falling in a window of 17 consecutive samples,
I wrote the following very crude script. Since I have no previous experience of 
using least-squares fitting in R, I would appreciate 
your supervision and suggestions.
I guess the returned coefficients of the polynomial are: 
a = -1.3191398 
b = 0.1233055 
c = 0.9297401 


Thank you very much in advance,
Maura

##
## Main

tms <- t(read.table("signal877cycle1.txt"))
J <- ilogb(length(tms), base=2) + 1
y <- c(tms, rep(0, 2^J - length(tms)))
y.win <- tms.ext[1:17]
ls.mat <- matrix(nrow=length(y.win), ncol=3, byrow=TRUE)
dt <- 0.033
ls.mat[,1] <- 1
ls.mat[,2] <- seq(0, dt*(length(y.win)-1), dt)
ls.mat[,3] <- ls.mat[,2]^2
#

 tms <- t(read.table("signal877cycle1.txt"))
 J <- ilogb(length(tms), base=2) + 1
 y <- c(tms, rep(0, 2^J - length(tms)))
 y.win <- tms.ext[1:17]
 ls.mat <- matrix(nrow=length(y.win), ncol=3, byrow=TRUE)
 dt <- 0.033
 ls.mat[,1] <- 1
 ls.mat[,2] <- seq(0, dt*(length(y.win)-1), dt)
 ls.mat[,3] <- ls.mat[,2]^2
 y
  [1] -1.29882462 -1.29816465 -1.29175902 -1.33508315 -1.31905086 -1.30246447 
-1.25496640 -1.25858566 -1.19862868
 [10] -1.16985809 -1.15755035 -1.15627040 -1.10929231 -1.09324296 -1.07202676 
-1.03543530 -1.00609649 -0.96931799
 [19] -0.96014189 -0.93879923 -0.89472101 -0.86568807 -0.86394226 -0.83804684 
-0.79226517 -0.74804696 -0.69506558
 [28] -0.63984135 -0.57677266 -0.52376371 -0.48793752 -0.44261935 -0.37505621 
-0.30538492 -0.19309771 -0.07859412
 [37] -0.01879655  0.04247391  0.09565881  0.17329566  0.29132263  0.38380712  
0.45016443  0.50107765  0.57413940
 [46]  0.68835476  0.78369090  0.83756871  0.87753415  0.92834503  0.99560230  
1.08055356  1.17121517  1.22967280
 [55]  1.25791166  1.28749046  1.31672692  1.33188866  1.35420775  1.37356226  
1.38792638  1.40398573  1.41558702
 [64]  1.39204622  1.39848595  1.39902593  1.40604565  1.42092504  1.41436531  
1.3843  1.36012986  1.32950875
 [73]  1.26507137  1.25315597  1.18249472  1.08857029  0.98782261  0.90470599  
0.83081192  0.77709116  0.65228917
 [82]  0.51844166  0.44530462  0.39562664  0.30153281  0.17979539  0.09895985  
0.04306094 -0.03937571 -0.14150334
 [91] -0.25936679 -0.31480454 -0.38806157 -0.47389691 -0.50785671 -0.58179371 
-0.67538285 -0.74246719 -0.78380551
[100] -0.83894328 -0.86450224 -0.90614055 -0.93751928 -0.99679687 -1.03205956 
-1.06616465 -1.06651404 -1.14997066
[109] -1.18338930 -1.21335809 -1.20208854 -1.22370767 -1.23488486 -1.25112655 
-1.26942581 -1.26792234 -1.28838504
[118] -1.28799329 -1.27326566 -1.28502518  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000
[127]  0.00000000  0.00000000
 y.win
 [1] -1.298825 -1.298165 -1.291759 -1.335083 -1.319051 -1.302464 -1.254966 
-1.258586 -1.198629 -1.169858 -1.157550
[12] -1.156270 -1.109292 -1.093243 -1.072027 -1.035435 -1.006096
 ls.mat
  [,1]  [,2] [,3]
 [1,]1 0.000 0.00
 [2,]1 0.033 0.001089
 [3,]1 0.066 0.004356
 [4,]1 0.099 0.009801
 [5,]1 0.132 0.017424
 [6,]1 0.165 0.027225
 [7,]1 0.198 0.039204
 [8,]1 0.231 0.053361
 [9,]1 0.264 0.069696
[10,]1 0.297 0.088209
[11,]1 0.330 0.108900
[12,]1 0.363 0.131769
[13,]1 0.396 0.156816
[14,]1 0.429 0.184041
[15,]1 0.462 0.213444
[16,]1 0.495 0.245025
[17,]1 0.528 0.278784
 lsfit(x, y, wt = NULL, intercept = TRUE, tolerance = 1e-07, yname = NULL)
 lsfit(ls.mat, y.win, wt = NULL, intercept = TRUE, tolerance = 1e-07,
   yname = NULL)
$coefficients
 Intercept X1 X2 X3 
-1.3191398  0.1233055  0.9297401  0.000 

$residuals
 [1]  0.020315146  0.015893550  0.015192628 -0.037263015 -0.032387216 
-0.028982296  0.003309337 -0.017541342
 [9]  0.023159250  0.030648485  0.019649885 -0.004401476  0.015220334  
0.001888425 -0.008301609 -0.005141358
[17] -0.011258729

$intercept
[1] TRUE

$qr
$qt
 [1]  4.937370523  0.409411205 -0.089144866 -0.041892736 -0.035696706 
-0.031176843  0.002024443 -0.018121872
 [9]  0.023077794  0.030860815  0.019950712 -0.004217443  0.015082286  
0.001223006 -0.009699688 -0.007477386
[17] -0.014737995

$qr
   Intercept  X2  X3X1
 [1,] -4.1231056 -1.08849989 -0.39512546 -4.123106e+00
 [2,]  0.2425356  0.66656733  0.35194755  1.558035e-17
 [3,]  0.2425356  0.21973588 -0.09588149  1.787189e-17
 [4,]  0.2425356  0.17022850 -0.10350966 -2.990539e-17
 [5,]  0.2425356  0.12072112 -0.19811319  2.906411e-01
 [6,]  0.2425356  0.07121375 -0.27000118  2.654896e-01
 [7,]  0.2425356  0.02170637 -0.31917362  2.457966e-01
 [8,]  0.2425356 -0.02780101 -0.34563052  2.315620e-01
 [9,]  0.2425356 -0.07730838 -0.34937188  2.227859e-01
[10,]  0.2425356 -0.12681576 -0.33039769  2.194681e-01
[11,]  0.2425356 -0.17632314 -0.28870796  2.216089e-01
[12,]  0.2425356 -0.22583052 -0.22430269  2.292080e-01
[13,]  0.2425356 -0.27533789 

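[Editorial aside, not from the thread: the same quadratic fit can be obtained without hand-building the design matrix. Note also that passing a column of 1s in ls.mat while intercept = TRUE duplicates the intercept, which is why the output above shows a degenerate fourth coefficient (X3 = 0). A sketch, assuming y.win and the 0.033 s sampling step from the post:]

```r
t <- seq(0, by = 0.033, length.out = length(y.win))
fit <- lm(y.win ~ t + I(t^2))   # fits a + b*t + c*t^2
coef(fit)                       # compare with lsfit's Intercept, X1, X2
```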
Re: [R] Installing different versions of R simultaneously on Linux

2009-02-27 Thread Berwin A Turlach
G'day Rainer,

On Fri, 27 Feb 2009 14:06:20 +0200
Rainer M Krug r.m.k...@gmail.com wrote:

 Thanks a lot for the offer - that would be great. I will set it up the
 same way on my PC with Xubuntu.

Script is attached.  Ignore the comments at the beginning; they are
there just to remind me what ./configure line I usually use, possible
variations, and whether to edit config.site or work with environment
variables.

After the make install step, I edit the variables VERSION and PRIORITY
in this file and then run the script as root.  Note that VERSION should
be the same number as the one specified in the ./configure line.  

As long as the configuration of a command is set to 'auto', the
alternative with the highest priority is used.  So make sure that the
newest version of R has the highest priority; I usually set the priority
to xyz for R-x.y.z (and keep my fingers crossed that there will never
be a release with either y or z larger than 9, otherwise I will have
to refine my scheme).

To use this on a new machine, you have to create /opt/info, 
/opt/man/man1 and /opt/bin before running the script the first time
(IIRC).  It also helps to copy /opt/R/R-$VERSION/share/info/dir
to /opt/info/dir so that emacs will include the info files in the list
that you get with C-h i (this has to be done only once, the dir file
does not seem to change between R versions).

Prior to 2.5.0 the man and info files were installed in R-$VERSION/man
and R-$VERSION/info instead of R-$VERSION/share/man and
R-$VERSION/share/info, respectively.  I have a separate script for those
versions (but don't install such old versions anymore).  How far do you
want to go back?  Also, much earlier, if memory serves correctly,
R-exts.info came in 2 parts instead of 3; but I don't seem to have my
script from that time anymore.

I think that's all.  Let me know if you run into troubles or need more
help.

Cheers,

Berwin
#!/bin/bash

##Configure with the following options:
##
## ./configure --prefix=/opt/R/R-2.8.1 --with-blas --with-lapack 
--enable-R-shlib r_arch=32
##
## other possible options:
## r_arch=32 and r_arch=64
## --enable-R-shlib
##
## export JAVA_HOME=/where/is/sun/java (/usr/lib/jvm/java-1.6-sun)
## above not necessary, use config.site instead.
##
##Then as root:
## VERSION=devel
## PRIORITY=100
VERSION=2.8.1
PRIORITY=281

update-alternatives --install /opt/bin/R R /opt/R/R-$VERSION/bin/R $PRIORITY \
  --slave /opt/man/man1/R.1 R.1 /opt/R/R-$VERSION/share/man/man1/R.1 \
  --slave /opt/info/R-FAQ.info.gz R-FAQ.info 
/opt/R/R-$VERSION/share/info/R-FAQ.info.gz \
  --slave /opt/info/R-admin.info.gz R-admin.info 
/opt/R/R-$VERSION/share/info/R-admin.info.gz \
  --slave /opt/info/R-data.info.gz R-data.info 
/opt/R/R-$VERSION/share/info/R-data.info.gz \
  --slave /opt/info/R-exts.info.gz R-exts.info 
/opt/R/R-$VERSION/share/info/R-exts.info.gz \
  --slave /opt/info/R-exts.info-1.gz R-exts.info-1 
/opt/R/R-$VERSION/share/info/R-exts.info-1.gz \
  --slave /opt/info/R-exts.info-2.gz R-exts.info-2 
/opt/R/R-$VERSION/share/info/R-exts.info-2.gz \
  --slave /opt/info/R-intro.info.gz R-intro.info 
/opt/R/R-$VERSION/share/info/R-intro.info.gz \
  --slave /opt/info/R-lang.info.gz R-lang.info 
/opt/R/R-$VERSION/share/info/R-lang.info.gz \
  --slave /opt/info/R-ints.info.gz R-ints.info 
/opt/R/R-$VERSION/share/info/R-ints.info.gz

ln -sf /opt/R/R-$VERSION/bin/R /opt/bin/R-$VERSION 
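[Editorial note, not part of the original script: once versions are registered this way, the active one can be inspected or switched with the standard update-alternatives commands.]

```shell
update-alternatives --display R   # list registered versions and priorities
update-alternatives --config R    # interactively pick a version
update-alternatives --auto R      # return to the highest-priority version
```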
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Sweave doesn't do csv.get()

2009-02-27 Thread christiaan pauw
Hi Everybody
I use R2.8.0 on Mac OS X. I set up LyX 1.6.1 to use Sweave today. I can
compile the test file I found on CRAN (
http://cran.r-project.org/contrib/extra/lyx/) without a problem and the
output looks very nice. In the test file the following R code is used.

<<myFirstChunkInLyX>>=
xObs <- 100; xMean <- 10; xVar <- 9
x <- rnorm(n=xObs, mean=xMean, sd=sqrt(xVar))
mean(x)
@

that should be the same as:

xObs <- 100
xMean <- 10
xVar <- 9
x <- rnorm(n=xObs, mean=xMean, sd=sqrt(xVar))
mean(x)

in the R console.

My problem is that I want to import data to use in my report. In the R
source I currently use to analyse my data, I import it through csv.get(). I
have found that I cannot use csv.get(), or write.csv() for that matter. I
don't seem to be able to use load() to get a .rda file either.
Is this issue related to LyX, LaTeX or R?

Thanks in advance
Christiaan

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] add absolute value to bars in barplot

2009-02-27 Thread Philipp Pagel
On Fri, Feb 27, 2009 at 01:32:45PM +0100, soeren.vo...@eawag.ch wrote:
 barplot(twcons.area,
   beside=TRUE, col=c("green4", "blue", "red3", "gray"),
   xlab="estate",
   ylab="number of persons", ylim=c(0, 110),
   legend.text=c("treated", "mix", "untreated", NA))

 produces a barplot very fine. In addition, I'd like to get the bars'  
 absolute values on the top of the bars. How can I produce this in an  
 easy way?

barplot() returns a vector of midpoints so you can use text() to add the
annotation. There is an example in the manual page of barplot:

mp <- barplot(VADeaths)
tot <- colMeans(VADeaths)
text(mp, tot + 3, format(tot), xpd = TRUE, col = "blue")
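[Editorial addition: the same approach carries over to a beside = TRUE barplot like the one in the question, since the returned matrix of midpoints lines up with the matrix of heights. A sketch with made-up heights, as twcons.area is not shown in the post:]

```r
# made-up counts: 4 series x 2 groups, standing in for twcons.area
h  <- matrix(c(30, 50, 20, 10, 60, 25, 40, 5), nrow = 4)
mp <- barplot(h, beside = TRUE, ylim = c(0, 110))
text(mp, h + 3, labels = h, xpd = TRUE)  # absolute value above each bar
```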

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Axis-question

2009-02-27 Thread Antje

solved by grouping... (see my next mail)

Antje schrieb:

Hi there,

I was wondering whether it's possible to generate an axis with groups 
(like in Excel).


So that you can have something like this as x-axis (for example for the 
levelplot-method of the lattice package):


---
| X1 | X2 | X3 | X1 | X2 | X3 | X1 | ...
| group1   | group2   | group3  ...
..
..
..

I hope you understand what I'm looking for?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Frank E Harrell Jr

Wensui Liu wrote:

Thanks for pointing me to the SAS code, Dr Harrell
After reading the code, I have to say that the inefficiency is not
related to the SAS language itself but to the SAS programmer. An experienced
SAS programmer won't use so much hard-coding, which is ad hoc and difficult
to maintain.
I agree with you that in the SAS code it is a little too much to
evaluate predictions; such a complex data step could actually be replaced
by simpler IML code.


Agreed that the SAS code could have been much better.  I programmed in 
SAS for 23 years and would have done it much differently.  But you will 
find that the most elegant SAS program re-write will still be a far cry 
from the elegance of R.


Frank



On Thu, Feb 26, 2009 at 5:57 PM, Frank E Harrell Jr
f.harr...@vanderbilt.edu wrote:

If anyone wants to see a prime example of how inefficient it is to program
in SAS, take a look at the SAS programs provided by the US Agency for
Healthcare Research and Quality for risk adjusting and reporting for
hospital outcomes at http://www.qualityindicators.ahrq.gov/software.htm .
 The PSSASP3.SAS program is a prime example.  Look at how you do a vector
product in the SAS macro language to evaluate predictions from a logistic
regression model.  I estimate that using R would easily cut the programming
time of this set of programs by a factor of 4.

Frank
--
Frank E Harrell Jr   Professor and Chair   School of Medicine
Department of Biostatistics   Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.








--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] levelplot help needed

2009-02-27 Thread Antje

Hi there,

I'm looking for someone who can give me some hints on how to make a nice 
levelplot. As an example, I have the following code:


# create some example data
# --
xl <- 4
yl <- 10

my.data <- sapply(1:xl, FUN = function(x) { rnorm(yl, mean = x) })

x_label <- rep(c("X Label 1", "X Label 2", "X Label 3", "X Label 4"), each = yl)
y_label <- rep(paste("Y Label ", 1:yl, sep=""), xl)

df <- data.frame(x_label = factor(x_label), y_label = factor(y_label),
                 values = as.vector(my.data))


df1 <- data.frame(df, group = rep("Group 1", xl*yl))
df2 <- data.frame(df, group = rep("Group 2", xl*yl))
df3 <- data.frame(df, group = rep("Group 3", xl*yl))

mdf <- rbind(df1,df2,df3)

# plot
# --

graph <- levelplot(mdf$values ~ mdf$x_label * mdf$y_label | mdf$group,
    aspect = "xy", layout = c(3,1),
    scales = list(x = list(labels = substr(levels(factor(mdf$x_label)), 0, 5),
                           rot = 45)))

print(graph)

# --


(I need to use these strange x-labels, because in my real data the values of the 
x-labels are too long and I just want to display the first 10 characters as the label.)


My questions:

* I'd like to start with Y Label 1 in the upper row (that's a more general 
issue: how can I influence the order of x, y, and the groups?)

* I'd like to put the groups at the bottom

Can anybody give me some help?

Antje
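[Editorial note, not from the thread: one partial answer to the first question is that axis and panel order in lattice follow the order of the underlying factor levels, so reversing the levels of y_label puts "Y Label 1" in the top row. A sketch, reusing the mdf built above:]

```r
# reversing the factor levels draws "Y Label 1" at the top of each panel
mdf$y_label <- factor(mdf$y_label, levels = rev(levels(mdf$y_label)))
graph <- levelplot(values ~ x_label * y_label | group, data = mdf,
                   aspect = "xy", layout = c(3, 1),
                   scales = list(x = list(rot = 45)))
print(graph)
```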

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Frank E Harrell Jr

Ajay ohri wrote:
Sometimes, for the sake of simplicity, SAS code is created like that. 
One can use the concatenate function and drag-and-drop in a simple 
Excel sheet to create elaborate SAS code like the one mentioned, in 
hardly any time at all.


A system that requires Excel for its success is not a complete system.



There are multiple ways to do this in SAS , much better and similarly in 
R


There are many areas where SAS programmers would find R a bit less 
useful --- for example:


the equivalent of PROC LOGISTIC for creating a logistic model.


Really?  Try this in SAS:

library(Design)
f <- lrm(death ~ rcs(age,5)*sex)
anova(f)     # get test of nonlinearity of interactions among other things
nomogram(f)  # depict model graphically

The restricted cubic spline in age, i.e., assuming the age relationship 
is smooth but not much else, is very easy to code in R.  There are many 
other automatic transformations available.  The lack of generality of 
the SAS language makes many SAS users assume linearity far more often 
than R users do.


Also note that PROC LOGISTIC, without invocation of a special option, 
would make the user believe that older subjects have lower chances of 
dying, as SAS by default takes the event being predicted to be death=0.


Frank




On Fri, Feb 27, 2009 at 10:21 AM, Wensui Liu liuwen...@gmail.com 
mailto:liuwen...@gmail.com wrote:


Thanks for pointing me to the SAS code, Dr Harrell
After reading the code, I have to say that the inefficiency is not
related to the SAS language itself but to the SAS programmer. An experienced
SAS programmer won't use so much hard-coding, which is ad hoc and difficult
to maintain.
I agree with you that in the SAS code it is a little too much to
evaluate predictions; such a complex data step could actually be replaced
by simpler IML code.

On Thu, Feb 26, 2009 at 5:57 PM, Frank E Harrell Jr
f.harr...@vanderbilt.edu mailto:f.harr...@vanderbilt.edu wrote:
  If anyone wants to see a prime example of how inefficient it is
to program
  in SAS, take a look at the SAS programs provided by the US Agency for
  Healthcare Research and Quality for risk adjusting and reporting for
  hospital outcomes at
http://www.qualityindicators.ahrq.gov/software.htm .
   The PSSASP3.SAS program is a prime example.  Look at how you do
a vector
  product in the SAS macro language to evaluate predictions from a
logistic
  regression model.  I estimate that using R would easily cut the
programming
  time of this set of programs by a factor of 4.
 
  Frank
  --
  Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt
University
 
  __
  R-help@r-project.org mailto:R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 



--
===
WenSui Liu
Acquisition Risk, Chase
Blog   : statcompute.spaces.live.com
http://statcompute.spaces.live.com

I can calculate the motion of heavenly bodies, but not the madness
of people.”
--  Isaac Newton
===

__
R-help@r-project.org mailto:R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Frank E Harrell Jr

Gerard M. Keogh wrote:

Frank,

I can't see the code you mention - Web marshall at work - but I don't think
you should be too quick to run down SAS - it's a powerful and flexible
language but unfortunately very expensive.

Your example mentions doing a vector product in the macro language - this
only suggests to me that the people writing the code need a crash course
in SAS/IML (the matrix language). SAS is designed to work on records and so
is inappropriate for matrices - macros are only an efficient code-copying
device. Doing matrix computations in this way is pretty mad and the
code would be impossible, never mind the memory problems.
SAS recognise that, but a lot of SAS users remain unfamiliar with IML.

In IML by contrast there are inner, cross and outer products and a raft of
other useful methods for matrix work that R users would be familiar with.
OLS for example is one line:

b = solve(X`X, X`y) ;
rss = sqrt(ssq(y - X*b)) ;

And to give you a flavour of IML's capabilities I implemented a SAS version
of the MARS program in it about 6 or 7 years ago.
BTW SPSS also has a matrix language.

Gerard


But try this:

PROC IML;
... some custom user code ...
... loop over j=1 to 10 ...
...   PROC GENMOD, output results back to IML
...

IML is only a partial solution since it is not integrated with the PROC 
step.


Frank





   
From: Frank E Harrell Jr <f.harr...@vanderbilt.edu>
Sent by: r-help-boun...@r-project.org
To: R list <r-h...@stat.math.ethz.ch>
Subject: [R] Inefficiency of SAS Programming
Date: 26/02/2009 22:57





If anyone wants to see a prime example of how inefficient it is to
program in SAS, take a look at the SAS programs provided by the US
Agency for Healthcare Research and Quality for risk adjusting and
reporting for hospital outcomes at
http://www.qualityindicators.ahrq.gov/software.htm .  The PSSASP3.SAS
program is a prime example.  Look at how you do a vector product in the
SAS macro language to evaluate predictions from a logistic regression
model.  I estimate that using R would easily cut the programming time of
this set of programs by a factor of 4.

Frank
--
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



**
The information transmitted is intended only for the p...{{dropped:15}}


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Frank E Harrell Jr

Ajay ohri wrote:


I would like to know if we can create a package in which R functions are 
renamed closer to the SAS language. Doing so will help people familiar with 
SAS to take to R straight away for their work, thus decreasing the threshold 
for acceptance - and then get into deeper understanding later.


Since it is a package, it would be optional, only for people wanting to 
try out R from SAS. Do we have such a package right now? It basically 
masks R functions as the equivalent function in another language, just 
for user ease / beginners.


for example

creating function for means 


procmeans <- function(x, y) {
  summary(subset(x, select = c(x, y)))
}

creating function for importing csv

procimport <- function(x, y) {
  read.csv(textConnection(x), row.names = y, na.strings = "")
}

creating function for describing data

procunivariate <- function(x) {
  summary(x)
}

regards,

ajay


Ajay,

This will generate major confusion among users of all types and be hard 
to maintain.  A better approach is to get Bob Muenchen's excellent book 
and keep it nearby.


Frank



www.decisionstats.com http://www.decisionstats.com

On Fri, Feb 27, 2009 at 4:27 AM, Frank E Harrell Jr 
f.harr...@vanderbilt.edu mailto:f.harr...@vanderbilt.edu wrote:


If anyone wants to see a prime example of how inefficient it is to
program in SAS, take a look at the SAS programs provided by the US
Agency for Healthcare Research and Quality for risk adjusting and
reporting for hospital outcomes at
http://www.qualityindicators.ahrq.gov/software.htm .  The
PSSASP3.SAS program is a prime example.  Look at how you do a vector
product in the SAS macro language to evaluate predictions from a
logistic regression model.  I estimate that using R would easily cut
the programming time of this set of programs by a factor of 4.

Frank
-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine

Department of Biostatistics   Vanderbilt University

__
R-help@r-project.org mailto:R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Ordinal Mantel-Haenszel type inference

2009-02-27 Thread Jourdan Gold
Hello,

I am searching for an R package that does an extension of the Mantel-Haenszel 
test for ordinal data as described in Liu and Agresti (1996), A Mantel-Haenszel 
type inference for cumulative odds ratios, in Biometrics. I see packages such 
as Epi that perform it for binary data and derive a variance for it using the 
Robbins and Breslow variance method, as well as another package that derives it 
for nominal variables but does not provide a variance or confidence limit. 

Does a package exist that does this? I have searched the list archives and 
can't seem to find such a package, but I could be missing something.  Thank you.


yours sincerely,


Jourdan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how can I compare two vector by a factor

2009-02-27 Thread Xin Shi
Hi, I used wilcox.test to carry out a Mann-Whitney test with paired=FALSE. 
However, I want to see the comparison of two variables, e.g. pre and post, 
grouped by treatment.

Does anyone have experience with this?

Thanks!

Xin
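[Editorial sketch, not from the thread: one common approach is to split the data by the grouping factor and run the test within each group. The data frame and column names below are made up:]

```r
set.seed(1)  # made-up example data with the shape described in the question
dat <- data.frame(pre  = rnorm(20, mean = 10),
                  post = rnorm(20, mean = 11),
                  treatment = rep(c("A", "B"), each = 10))
# one unpaired Wilcoxon (Mann-Whitney) test of pre vs post per treatment group
by(dat, dat$treatment,
   function(d) wilcox.test(d$pre, d$post, paired = FALSE))
```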


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Singularity in a regression?

2009-02-27 Thread Alex Roy
If collinearity exists, one of the solutions is a regularized version of
regression. There are different types of regularization methods, like ridge,
LASSO, elastic net etc. For example, in the MASS package you can get ridge
regression.

Alex
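[Editorial sketch, not from the thread: ridge regression via MASS on deliberately collinear made-up data might look like this.]

```r
library(MASS)
set.seed(1)
x1 <- rnorm(50)
x2 <- x1 + rnorm(50, sd = 0.01)  # nearly collinear with x1
y  <- x1 + x2 + rnorm(50)
# fit over a grid of ridge penalties lambda
fit <- lm.ridge(y ~ x1 + x2, lambda = seq(0, 10, by = 0.5))
select(fit)  # reports lambda suggested by GCV / HKB / L-W criteria
```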


On Thu, Feb 26, 2009 at 1:58 PM, Bob Gotwals gotw...@ncssm.edu wrote:

 R friends,

 In a matrix of 1s and 0s, I'm getting a singularity error.  Any helpful
 ideas?

 lm(formula = activity ~ metaF + metaCl + metaBr + metaI + metaMe +
paraF + paraCl + paraBr + paraI + paraMe)

 Residuals:
         Min         1Q     Median         3Q        Max
  -4.573e-01 -7.884e-02  3.469e-17  6.616e-02  2.427e-01

  Coefficients: (1 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)
  (Intercept)   7.9173     0.1129  70.135  < 2e-16 ***
  metaF        -0.3973     0.2339  -1.698 0.115172
  metaCl            NA         NA      NA       NA
  metaBr        0.3454     0.1149   3.007 0.010929 *
  metaI         0.4827     0.2339   2.063 0.061404 .
  metaMe        0.3654     0.1149   3.181 0.007909 **
  paraF         0.7675     0.1449   5.298 0.000189 ***
  paraCl        0.3400     0.1449   2.347 0.036925 *
  paraBr        1.0200     0.1449   7.040 1.36e-05 ***
  paraI         1.3327     0.2339   5.697 9.96e-05 ***
  paraMe        1.2191     0.1573   7.751 5.19e-06 ***
  ---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Residual standard error: 0.2049 on 12 degrees of freedom
 Multiple R-squared: 0.9257, Adjusted R-squared: 0.8699
 F-statistic: 16.61 on 9 and 12 DF,  p-value: 1.811e-05

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about 3-d plot

2009-02-27 Thread Tony Breyal
Hi Deepankar
The code on the following page looks kind of cool, and also seems to
produce something of the type of graph you are after perhaps:

https://r-forge.r-project.org/plugins/scmsvn/viewcvs.php/pkg/rgl/demo/regression.r?rev=702root=rglsortby=dateview=auto

[below is a copy of the code...]


library(rgl)

# demo: regression
# author: Daniel Adler
# $Id$

rgl.demo.regression <- function(n=100, xa=3, za=8, xb=0.02,
                                zb=0.01, xlim=c(0,100), zlim=c(0,100)) {

  rgl.clear("all")
  rgl.bg(sphere = TRUE, color = c("black", "green"), lit = FALSE,
         size=2, alpha=0.2, back = "lines")
  rgl.light()
  rgl.bbox()

  x  <- runif(n, min=xlim[1], max=xlim[2])
  z  <- runif(n, min=zlim[1], max=zlim[2])
  ex <- rnorm(n, sd=3)
  ez <- rnorm(n, sd=2)
  esty <- (xa+xb*x) * (za+zb*z) + ex + ez

  rgl.spheres(x, esty, z, color="gray", radius=1, specular="green",
              texture=system.file("textures/bump_dust.png", package="rgl"),
              texmipmap=TRUE, texminfilter="linear.mipmap.linear")

  regx <- seq(xlim[1], xlim[2], len=100)
  regz <- seq(zlim[1], zlim[2], len=100)
  regy <- (xa+regx*xb) %*% t(za+regz*zb)

  rgl.surface(regx, regz, regy, color="blue", alpha=0.5, shininess=128)

  lx <- c(xlim[1], xlim[2], xlim[2], xlim[1])
  lz <- c(zlim[1], zlim[1], zlim[2], zlim[2])
  f <- function(x, z) { return( (xa+x*xb) * t(za+z*zb) ) }
  ly <- f(lx, lz)

  rgl.quads(lx, ly, lz, color="red", size=5,
            front="lines", back="lines", lit=FALSE)
}

rgl.open()
rgl.demo.regression()
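[Editorial addition: a more minimal sketch of exactly what the original question asked for, i.e. points clustered around a plane through the origin, using rgl's higher-level functions:]

```r
library(rgl)
# simulate points tightly clustered around the plane y = x + z
x <- runif(200, 0, 100)
z <- runif(200, 0, 100)
y <- x + z + rnorm(200, 0, 10)
plot3d(x, y, z, col = "blue")
# overlay the plane x - y + z = 0, i.e. y = x + z, semi-transparent
planes3d(1, -1, 1, 0, alpha = 0.3)
```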



On Feb 27, 5:28 am, Dipankar Basu basu...@gmail.com wrote:
 Hi R Users,

 I have produced a simulated scatter plot of y versus x tightly clustered
 around the 45 degree line through the origin with the following code:

  x <- seq(1,100)
  y <- x + rnorm(100,0,10)
  plot(x, y, col="blue")
  abline(0,1)

 Is there some way to generate a 3-dimensional analogue of this? Can I get a
 similar simulated scatter plot of points in 3 dimensions where the points
 are clustered around a plane through the origin where the plane in question
 is the 3-dimensional analogue of the 45 degree line through the origin?

 Deepankar

         [[alternative HTML version deleted]]

 __
 r-h...@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sweave doesn't do csv.get()

2009-02-27 Thread Frank E Harrell Jr

christiaan pauw wrote:

Hi Everybody
I use R2.8.0 on Mac OS X. I set up LyX 1.6.1 to use Sweave today. I can
compile the test file I found on CRAN (
http://cran.r-project.org/contrib/extra/lyx/) without a problem and the
output looks very nice. In the test file the following R code is used.

<<myFirstChunkInLyX>>=
xObs <- 100; xMean <- 10; xVar <- 9
x <- rnorm(n=xObs, mean=xMean, sd=sqrt(xVar))
mean(x)
@

that should be the same as:

xObs <- 100
xMean <- 10
xVar <- 9
x <- rnorm(n=xObs, mean=xMean, sd=sqrt(xVar))
mean(x)

in the R console.

My problem is that I want to import data to use in my report. In the R
source I currently use to analyse my data, I import it through csv.get(). I
have found that I cannot use csv.get(), or write.csv() for that matter. I
don't seem to be able to use load() to get a .rda file either.

Is this issue related to LyX, LaTeX or R?

Thanks in advance
Christiaan


I didn't see the library(Hmisc) statement in your code that would give 
you access to csv.get.  This should be unrelated to lyx, Sweave, etc.

Frank
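In other words, the Sweave chunk just needs to attach Hmisc before calling csv.get(); a minimal sketch (the chunk name and file name are hypothetical):

```r
<<readData>>=
library(Hmisc)                   # csv.get() lives in Hmisc, not base R
mydata <- csv.get("mydata.csv")  # hypothetical file name
summary(mydata)
@
```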







--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Gerard M. Keogh
Yes Frank, I accept your point but nevertheless IML is the proper place for
matrix work in SAS - mixing macro-level logic and computation is another
question - R is certainly more seamless in this respect.

Gerard


   
From:    Frank E Harrell Jr <f.harr...@vanderbilt.edu>
To:      Gerard M. Keogh <gmke...@justice.ie>
Date:    27/02/2009 13:55
Cc:      R list <r-h...@stat.math.ethz.ch>, r-help-boun...@r-project.org
Subject: Re: [R] Inefficiency of SAS Programming




Gerard M. Keogh wrote:
 Frank,

 I can't see the code you mention - Web marshall at work - but I don't
think
 you should be too quick to run down SAS - it's a powerful and flexible
 language but unfortunately very expensive.

 Your example mentions doing a vector product in the macro language - this
 only suggest to me that those people writing the code need a crash course
 in SAS/IML (the matrix language). SAS is designed to work on records and
so
 is inappropriate for matrices - macros are only an efficient code
 copying device. Doing matrix computations in this way is pretty mad and
the
 code would be impossible never mind the memory problems.
 SAS recognise that but a lot of SAS users remain unfamiliar with IML.

 In IML by contrast there are inner, cross and outer products and a raft
of
 other useful methods for matrix work that R users would be familiar with.
 OLS for example is one line:

 b = solve(X`X, X`y) ;
 rss = sqrt(ssq(y - X*b)) ;

 And to give you a flavour of IML's capabilities I implemented a SAS
version
 of the MARS program in it about 6 or 7 years ago.
 BTW SPSS also has a matrix language.

 Gerard

But try this:

PROC IML;
... some custom user code ...
... loop over j=1 to 10 ...
...   PROC GENMOD, output results back to IML
...

IML is only a partial solution since it is not integrated with the PROC
step.

Frank






From:    Frank E Harrell Jr <f.harr...@vanderbilt.edu>
Sent by: r-help-boun...@r-project.org
To:      R list <r-h...@stat.math.ethz.ch>
Date:    26/02/2009 22:57
Subject: [R] Inefficiency of SAS Programming

 If anyone wants to see a prime example of how inefficient it is to
 program in SAS, take a look at the SAS programs provided by the US
 Agency for Healthcare Research and Quality for risk adjusting and
 reporting for hospital outcomes at
 http://www.qualityindicators.ahrq.gov/software.htm .  The PSSASP3.SAS
 program is a prime example.  Look at how you do a vector product in the
 SAS macro language to evaluate predictions from a logistic regression
 model.  I estimate that using R would easily cut the programming time of
 this set of programs by a factor of 4.

 Frank
 --
 Frank E Harrell Jr   Professor and Chair   School of Medicine
   Department of Biostatistics   Vanderbilt University






[R] Will ctv package work on ubuntu?

2009-02-27 Thread Brian Lunergan
Hi ho:

I had used the ctv package on a Windows setup of R and I was wondering
about Ubuntu. Certainly under Windows it has an easy time of it because
there is only one library folder to scan for existing packages. Would
its install.views and update.views functions work in Ubuntu where the
packages are split up between the library established by R-cran
downloads from synaptic and the default library used by 'conventional'
downloads using install.packages?

If it can't handle that distinction between a Windows and a Linux
situation, is it a package I should remove for now?
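For what it's worth, install.views() ultimately delegates to install.packages(), which writes into the first writable directory on .libPaths() — and R scans every directory on that path when checking what is already installed. A quick way to check what ctv will see and where it will write (the view name is just an example, and passing lib= through assumes the extra arguments reach install.packages()):

```r
library(ctv)
.libPaths()   # R scans all of these libraries for installed packages

## example view name; lib= pins the target library explicitly
install.views("Econometrics", lib = .libPaths()[1])
```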

Regards...
-- 
Brian Lunergan
Nepean, Ontario
Canada



Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Marc Schwartz
on 02/27/2009 07:57 AM Frank E Harrell Jr wrote:
 Ajay ohri wrote:

 I would like to know if we can create a package in which R functions
 are renamed closer to SAS language. Doing so will help people familiar
 with SAS to straight away take to R for their work, thus decreasing the
 threshold for acceptance - and then get into deeper understanding later.

 since it is a package it would be optional only for people wanting to
 try out R from SAS.. Do we have such a package right now..it basically
 masks R functions to the equivalent function in another language just
 for user ease /beginners

 for example

 creating function for means
  procmeans <- function(x,y)
 + {
 summary (
 subset(x,select=c(x,y))
 +
 )

 creating function for importing csv

 procimport <- function(x,y)
 + {
 read.csv(
 textConnection(x),row.names=y,na.strings=  
 +
 )


 creating function fo describing data

 procunivariate <- function(x)
 + {
 summary(x)
 +
 )

 regards,

 ajay
 
 Ajay,
 
 This will generate major confusion among users of all types and be hard
 to maintain.  A better approach is to get Bob Muenchen's excellent book
 and keep it nearby.
 
 Frank

I whole heartedly agree with Frank here. It may be one thing to have a
translation process in place based upon some form of logical mapping
between the two languages (as Bob's book provides). But is another thing
entirely to actually start writing functions that provide wrappers
modeled on SAS based PROCs.

If you do this, then you only serve to obfuscate the fundamental
philosophical and functional differences between the two languages and
doom a new useR to missing all of R's benefits. They will continue to
try to figure out how to use R based upon their SAS intuition rather
than developing a new set of coding and even statistical paradigms.

Having been through the SAS to S/R transition myself, having used SAS
for much of the 90's and now having used R for over 7 years, I can speak
from personal experience and state that the only way to achieve the
requisite proficiency with R is immersion therapy.

Regards,

Marc Schwartz



Re: [R] Download daily weather data

2009-02-27 Thread Thomas Levine
Geonames unfortunately doesn't have weather forecasts. This is a problem.

GRIB looks better. There is an interface between GRIB and R.

On Fri, Feb 27, 2009 at 4:14 AM, Pfaff, Bernhard Dr.
bernhard_pf...@fra.invesco.com wrote:
 Dear Thomas,

 more for the sake of completeness and as an alternative to R. There are GRIB 
 data [1] sets available (some for free) and there is the GPL software Grads 
 [2]. Because the Grib-Format is well documented it should be possible to get 
 it into R easily and make up your own plots/weather analyis. I do not know 
 and have not checked if somebody has already done so.

 I use this information/tools aside of others during longer-dated off-shore 
 sailing.

 Best,
 Bernhard

 [1] http://www.grib.us/
 [2] http://www.iges.org/grads/

-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Scillieri, John
Sent: Thursday, 26 February 2009 22:58
To: 'James Muller'; 'r-help@r-project.org'
Subject: Re: [R] Download daily weather data

Looks like you can sign up to get XML feed data from Weather.com

http://www.weather.com/services/xmloap.html

Hope it works out!

-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of James Muller
Sent: Thursday, February 26, 2009 3:57 PM
To: r-help@r-project.org
Subject: Re: [R] Download daily weather data

Thomas,

Have a look at the source code for the webpage (ctrl-u in firefox,
don't know in internet explorer, etc.). That is what you'd have to
parse in order to get the forecast from this page. Typically when I
parse webpages such as this I use regular expressions to do so (and I
would never downplay the usefulness of regular expressions, but they
take a little getting used to). There are two parts to the task: find
patterns that allow you to pull out the datum/data you're after; and
then write a program to pull it/them out. Also, of course, download
the webpage (but that's no issue).

I bet you'd be able to find a comma separated value (CSV) file
containing the weather report somewhere, which would probably involve
a little less labor in order to produce your automatic wardrobe
advice.

James
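The download-and-grep approach James describes can be sketched in a few lines (the pattern is illustrative and has not been tested against the live page):

```r
## download the raw HTML of the NOAA tabular forecast and grep it
url <- "http://forecast.weather.gov/MapClick.php?CityName=Ithaca&state=NY&site=BGM&textField1=42.4422&textField2=-76.5002&e=0&FcstType=digital"
page <- readLines(url)

## keep only the lines mentioning temperature; refine the regex from here
hits <- grep("Temperature", page, value = TRUE)
```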



On Thu, Feb 26, 2009 at 3:47 PM, Thomas Levine
thomas.lev...@gmail.com wrote:
 I'm writing a program that will tell me whether I should wear a coat,
 so I'd like to be able to download daily weather forecasts and daily
 reports of recent past weather conditions.

 The NOAA has very promising tabular forecasts

(http://forecast.weather.gov/MapClick.php?CityName=Ithaca&state=NY&site=BGM&textField1=42.4422&textField2=-76.5002&e=0&FcstType=digital),
 but I can't figure out how to import them.

 Someone must have needed to do this before. Suggestions?

 Thomas Levine!










Re: [R] Sweave doesn't do csv.get()

2009-02-27 Thread christiaan pauw
It works now.
Your help is much appreciated
Christiaan

2009/2/27 Frank E Harrell Jr f.harr...@vanderbilt.edu

 christiaan pauw wrote:

 Hi Everybody
 I use R2.8.0 on Mac OS X. I set up LyX 1.6.1 to use Sweave today. I can
 compile the test file I found on CRAN (
 http://cran.r-project.org/contrib/extra/lyx/) without a problem and the
 output looks very nice. In the test file the following R code is used.

 myFirstChunkInLyX=
 xObs - 100; xMean - 10; xVar - 9
 x - rnorm(n=xObs, mean=xMean, sd=sqrt(xVar))
 mean(x)
 @

 that should be the same as:

 xObs - 100
 xMean - 10
 xVar - 9
 x - rnorm(n=xObs, mean=xMean, sd=sqrt(xVar))
 mean(x)

 in the R console.

 My problem is that I want to import data to use in my report. In the R
 source I currently use to analyse my data I import it through csv.get(). I
 have found that I cannot use csv.get() or write.csv() or that matter. I
 don't seem to be able to use load() to get a .rda file in either

 Is this issue related to LyX, LaTeX or R?

 Thanks in advance
 Christiaan


 I didn't see the library(Hmisc) statement in your code that would give you
 access to csv.get.  This should be unrelated to lyx, Sweave, etc.
 Frank





 --
 Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University




Re: [R] Download daily weather data

2009-02-27 Thread James Muller
Can I just say, it's great to see the R community really come out in
support of such a noble and worthy cause as this :).

Downfall of civilization, all that. Not here, no!

James



On Thu, Feb 26, 2009 at 3:47 PM, Thomas Levine thomas.lev...@gmail.com wrote:
 I'm writing a program that will tell me whether I should wear a coat,
 so I'd like to be able to download daily weather forecasts and daily
 reports of recent past weather conditions.

 The NOAA has very promising tabular forecasts
 (http://forecast.weather.gov/MapClick.php?CityName=Ithaca&state=NY&site=BGM&textField1=42.4422&textField2=-76.5002&e=0&FcstType=digital),
 but I can't figure out how to import them.

 Someone must have needed to do this before. Suggestions?

 Thomas Levine!





Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread spam me
I've actually used AHRQ's software to create Inpatient Quality Indicator
reports.  I can confirm pretty much what we already know; it is inefficient.
Running on about 1.8 - 2 million cases, it would take just about a whole day
to run the entire process from start to finish.  That isn't all processing
time and includes some time for the analyst to check results between
substeps, but I still knew that my day was full when I was working on IQI
reports.



To be fair though, there are a lot of other factors (beside efficiency
considerations) that go into AHRQ's program design.  First, there are a lot
of changes to that software every year.  In some cases it is easier and less
error prone to hardcode a few points in the data so that it is blatantly
obvious what to change next year should another analyst need to do so.  Second,
the organizations that use this software often require transparency and may
not have high level programmers on staff.  Writing code so that it is
accessible, editable, and interpretable by intermediate level programmers or
analysts is a plus.  Third, given that IQI reports are often produced on a
yearly basis, there's no real need to sacrifice clarity, etc. for efficiency
- you're only doing this process once a year.



There are other points that could be made, but the main idea is I don't
think it's fair to hold this software up, out of context, as an example of
SAS's (or even AHRQs) inefficiencies.  I agree that SAS syntax is nowhere
near as elegant or as powerful as R from a programming standpoint, that's
why after 7 years of using SAS I switched to R.  But comparing the two at
that level is like racing a Ferrari against a Bentley to see which is the
better car.



Re: [R] ftp fetch using RCurl?

2009-02-27 Thread CHD850

I am using RCurl, version 0.9-4, under Windows. I cannot find the function
getURLContent(). Has it been renamed? Or is it in a different version?

Also, in the reference manual for the RCurl package on CRAN, I found a
function getBinaryURL() documented, but it cannot be found in the package
either.





 
 I would use something like
 
content = getURLContent("ftp://./foo.zip")
 
attributes(content) = NULL
 
writeBin(content, "/tmp/foo.zip")
 
 and that should be sufficient.
 
 (You have to strip the attributes or writeBin() complains.)
 
 
 

-- 
View this message in context: 
http://www.nabble.com/ftp-fetch-using-RCurl--tp8067p22247131.html
Sent from the R help mailing list archive at Nabble.com.



[R] cross tabulation: convert frequencies to percentages

2009-02-27 Thread soeren . vogel

Hello,

might be rather easy for R pros, but I've been searching and hit a dead
end ...


twsource.area <- table(twsource, area, useNA="ifany")

gives me a nice cross tabulation of frequencies of two factors, but  
now I want to convert to percentages of those absolute values. In
addition I'd like an extra column and an extra row with absolute sums.  
I know, Excel or the likes will produce it more easily, but how would  
the procedure look like in R?
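For the record, base R has both pieces: prop.table() converts a table to proportions and addmargins() appends the row/column sums. A small sketch with made-up factors:

```r
## toy factors standing in for twsource and area
twsource <- factor(c("a","a","b","b","b"))
area     <- factor(c("x","y","x","x","y"))
tab <- table(twsource, area)

round(100 * prop.table(tab), 1)  # percentages of the grand total
addmargins(tab)                  # absolute counts plus a Sum row and column
```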


Thanks,

Sören



Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Thomas Levine
I had enrolled in a statistics course this semester, but after the
first class, I dropped it because it uses SAS. This thread makes me
quite glad.

Tom!

On Fri, Feb 27, 2009 at 8:48 AM, Frank E Harrell Jr
f.harr...@vanderbilt.edu wrote:
 Wensui Liu wrote:

 Thanks for pointing me to the SAS code, Dr Harrell
 After reading codes, I have to say that the inefficiency is not
 related to SAS language itself but the SAS programmer. An experienced
 SAS programmer won't use much of hard-coding, very adhoc and difficult
 to maintain.
 I agree with you that in the SAS code, it is a little too much to
 evaluate predictions. such complex data step actually can be replaced
 by simpler iml code.

 Agreed that the SAS code could have been much better.  I programmed in SAS
 for 23 years and would have done it much differently.  But you will find
 that the most elegant SAS program re-write will still be a far cry from the
 elegance of R.

 Frank


 On Thu, Feb 26, 2009 at 5:57 PM, Frank E Harrell Jr
 f.harr...@vanderbilt.edu wrote:

 If anyone wants to see a prime example of how inefficient it is to
 program
 in SAS, take a look at the SAS programs provided by the US Agency for
 Healthcare Research and Quality for risk adjusting and reporting for
 hospital outcomes at http://www.qualityindicators.ahrq.gov/software.htm .
  The PSSASP3.SAS program is a prime example.  Look at how you do a vector
 product in the SAS macro language to evaluate predictions from a logistic
 regression model.  I estimate that using R would easily cut the
 programming
 time of this set of programs by a factor of 4.

 Frank
 --
 Frank E Harrell Jr   Professor and Chair           School of Medicine
                    Department of Biostatistics   Vanderbilt University







 --
 Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University





Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Ajay ohri
Immersion therapy can be done at a later stage after the newly
baptized R  corporate
user is happy with the fact that he can do most of his legacy code in R
easily now .
 I have been treading water in the immersion for over a year now.

 Most SAS consultants and corporate users are eager to try out R ..but they
are scared of immersion especially in these cut back times  ...so this could
be a middle step...let me go ahead and create the wrapper SAS package as
middleware between R and SAS ..

and we will let the invisible hands of  free market decide :))

regards,

ajay

www.decisionstats.com

I am not a Marxist.
Karl Marx http://www.brainyquote.com/quotes/quotes/k/karlmarx131048.html

On Fri, Feb 27, 2009 at 8:01 PM, Marc Schwartz marc_schwa...@comcast.netwrote:

 on 02/27/2009 07:57 AM Frank E Harrell Jr wrote:
  Ajay ohri wrote:
 
  I would like to know if we can create a package in which r functions
  are renamed closer to sas language.doing so will help people familiar
  to SAS to straight away take to R for their work,thus decreasing the
  threshold for acceptance - and then get into deeper understanding later.
 
  since it is a package it would be optional only for people wanting to
  try out R from SAS.. Do we have such a package right now..it basically
  masks R functions to the equivalent function in another language just
  for user ease /beginners
 
  for example
 
  creating function for means
   procmeans <- function(x,y)
  + {
  summary (
  subset(x,select=c(x,y))
  +
  )
 
  creating function for importing csv
 
  procimport <- function(x,y)
  + {
  read.csv(
  textConnection(x),row.names=y,na.strings=  
  +
  )
 
 
  creating function fo describing data
 
  procunivariate <- function(x)
  + {
  summary(x)
  +
  )
 
  regards,
 
  ajay
 
  Ajay,
 
  This will generate major confusion among users of all types and be hard
  to maintain.  A better approach is to get Bob Muenchen's excellent book
  and keep it nearby.
 
  Frank

 I whole heartedly agree with Frank here. It may be one thing to have a
 translation process in place based upon some form of logical mapping
 between the two languages (as Bob's book provides). But is another thing
 entirely to actually start writing functions that provide wrappers
 modeled on SAS based PROCs.

 If you do this, then you only serve to obfuscate the fundamental
 philosophical and functional differences between the two languages and
 doom a new useR to missing all of R's benefits. They will continue to
 try to figure out how to use R based upon their SAS intuition rather
 than developing a new set of coding and even statistical paradigms.

 Having been through the SAS to S/R transition myself, having used SAS
 for much of the 90's and now having used R for over 7 years, I can speak
 from personal experience and state that the only way to achieve the
 requisite proficiency with R is immersion therapy.

 Regards,

 Marc Schwartz




[R] Making tapply code more efficient

2009-02-27 Thread Doran, Harold
Previously, I posed the question pasted down below to the list and
received some very helpful responses. While the code suggestions
provided in response indeed work, they seem to only work with *very*
small data sets and so I wanted to follow up and see if anyone had ideas
for better efficiency. I was quite embarrassed on this as our SAS
programmers cranked out programs that did this in the blink of an eye
(with a few variables), but R was spinning for days on my Ubuntu machine
and ultimately I saw a message that R was killed.

The data I am working with has 800967 total rows and 31 total columns.
The ID variable I use as the index variable in tapply() has 326397
unique cases.

 length(unique(qq$student_unique_id))
[1] 326397

To give a sense of what my data look like and the actual problem,
consider the following:

qq <- data.frame(student_unique_id = factor(c(1,1,2,2,2)),
teacher_unique_id = factor(c(10,10,20,20,25)))

This is a student achievement database where students occupy multiple
rows in the data and the variable teacher_unique_id denotes the class
the student was in. What I am doing is looking to see if the teacher is
the same for each instance of the unique student ID. So, if I implement
the following:

same <- function(x) length( unique(x) ) == 1
results <- data.frame(
freq = tapply(qq$student_unique_id, qq$student_unique_id,
length),
tch = tapply(qq$teacher_unique_id, qq$student_unique_id, same)
)

I get the following results. I can see that student 1 appears in the
data twice and the teacher is always the same. However, student 2
appears three times and the teacher is not always the same.

 results
  freq   tch
12  TRUE
23 FALSE

Now, implementing this same procedure to a large data set with the
characteristics described above seems to be problematic in this
implementation. 

Does anyone have reactions on how this could be more efficient such that
it can run with large data as I described?

Harold
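One way to avoid calling a function once per student is to reduce the data to the unique (student, teacher) pairs first and count with table(), which is vectorized end to end. A sketch on the toy data above (assuming both ID columns are factors, so the two table() calls order students identically):

```r
## one row per distinct (student, teacher) pair
u <- unique(qq[, c("student_unique_id", "teacher_unique_id")])

results <- data.frame(
    freq = as.vector(table(qq$student_unique_id)),     # rows per student
    tch  = as.vector(table(u$student_unique_id) == 1)  # TRUE if one teacher only
)
```

On the five-row example this reproduces freq = 2, 3 and tch = TRUE, FALSE without any per-group function calls.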

 sessionInfo()
R version 2.8.1 (2008-12-22)
x86_64-pc-linux-gnu 

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.U
TF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=
C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATI
ON=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base




# Original question posted on 1/13/09
Suppose I have a dataframe as follows:

dat <- data.frame(id = c(1,1,2,2,2), var1 = c(10,10,20,20,25), var2 =
c('foo', 'foo', 'foo', 'foobar', 'foo'))

Now, if I were to subset by id, such as:

 subset(dat, id==1)
  id var1 var2
1  1   10  foo
2  1   10  foo

I can see that the elements in var1 are exactly the same and the
elements in var2 are exactly the same. However,

 subset(dat, id==2)
  id var1   var2
3  2   20foo
4  2   20 foobar
5  2   25foo

Shows the elements are not the same for either variable in this
instance. So, what I am looking to create is a data frame that would be
like this

id  freqvar1var2
1   2   TRUETRUE   
2   3   FALSE   FALSE

Where freq is the number of times the ID is repeated in the dataframe. A
TRUE appears in the cell if all elements in the column are the same for
the ID and FALSE otherwise. It is insignificant which values differ for
my problem.

The way I am thinking about tackling this is to loop through the ID
variable and compare the values in the various columns of the dataframe.
The problem I am encountering is that I don't think all.equal or
identical are the right functions in this case.

So, say I was wanting to compare the elements of var1 for id ==1. I
would have

x <- c(10,10)

Of course, the following works

 all.equal(x[1], x[2])
[1] TRUE

As would a similar call to identical. However, what if I only have a
vector of values (or if the column consists of names) that I want to
assess for equality when I am trying to automate a process over
thousands of cases? As in the example above, the vector may contain only
two values or it may contain many more. The number of values in the
vector differ by id.

Any thoughts?

Harold



[R] Setting initial starting conditions in scripts

2009-02-27 Thread Steve_Friedman

Hello,

I'm writing a variety of  R scripts and want to code the loadhistory and
workspace from within the script.  I found the loadhistory function but do
not see a comparable function for load workspace.  Is there one ?

Working with R 2.8.1 (2008-12-22) on a windows platform.
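A saved workspace is just an .RData file, so load() is the counterpart of loadhistory(). A minimal sketch, assuming the default file names in the current working directory:

```r
loadhistory(".Rhistory")  # restore the command history
load(".RData")            # restore the saved workspace image
```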

Thanks for any and all suggestions.

Steve

Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034

steve_fried...@nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147



[R] Adjusting confidence intervals for paired t-tests of multiple endpoints

2009-02-27 Thread Erich Studerus
 

Dear R-users,

 

In a randomized placebo-controlled within-subject design, subjects recieved
a psycho-active drug and placebo. Subjects filled out a questionnaire
containing 15 scales on four different time points after drug
administration. In order to detect drug effects on each time point, I
compared scale values between placebo and drug for all time conditions and
scales, which sums up to 4*15=60 comparisons.

 

I have summarized the results in a data.frame with columns for t test
results including confidence intervals and mean-differences:

 

df1 <- data.frame(trt=gl(2,35),matrix(rnorm(4200),70,60))

 

df2 <- as.data.frame(matrix(NA,60,6))

names(df2) <- c('t','df','p','lower','upper','mean.diff')

for (i in 1:60) {df2[i,1:6] <- as.numeric(

unlist(t.test(df1[,i+1]~df1$trt,paired=T))[1:6])}

 

Now, I want to adjust the confidence intervals for multiple comparisons.

 

For a Bonferroni-adjustment, I did the following:

 

df2$std.error.of.diff <- df2$mean.diff/df2$t

ci <- qt(p=1-(0.05/nrow(df2)),df=df2$df)*df2$std.error.of.diff

ci.bonf <- data.frame(lower=df2$mean.diff-ci,upper=df2$mean.diff+ci)

 

I hope this is the correct method. However, I think the
Bonferroni adjustment would be much too conservative. I need a less
conservative approach, perhaps something like Holm's method, which I can
easily apply to the p-values with p.adjust(df2$p, method='holm'). Is there a
package which can do this for the confidence intervals, or could someone
provide a simple script to calculate them?
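
On the p-value side at least, p.adjust() shows how much less conservative
Holm is than Bonferroni; a sketch with made-up p-values (note that Holm,
being stepwise, has no direct confidence-interval analogue, which is why the
interval question is harder):

```r
p <- c(0.001, 0.010, 0.030, 0.040)   # illustrative raw p-values

# Holm's step-down adjustment: uniformly <= Bonferroni
p.adjust(p, method = "holm")         # 0.004 0.030 0.060 0.060

p.adjust(p, method = "bonferroni")   # 0.004 0.040 0.120 0.160
```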

 

Thanks a lot!

 

Erich


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Changing Ylab and scale in hclust plots

2009-02-27 Thread Steve_Friedman

Hello,

Running R 2.8.1 (2008-12-22) on Windows.

I am running a series (25) of clustering procedures using the hclust function
and would like each of the plots to have the same y-axis label and scale.
Is there a procedure to change the scale on these plots?  Or is there an
alternative clustering function that gives me broader control?

Here is my very simple code:

par(mfrow=c(2,1))

NSM5172004 <- read.csv("H:\\HRH-Data_Files\\FrequencyScenarios\\NMS.csv",
header=TRUE, sep=",")
NMS <- NSM5172004[-(1)]
NMS.dist <- dist(NMS)
plot(hclust(NMS.dist, method = "ward"), xlab="", labels=NMS$Year, main =
"Cape Sable Seaside Sparrow", sub = "Hydro Scenario NMS5172004")


ECB2_65_01 <-
read.csv("H:\\HRH-Data_Files\\FrequencyScenarios\\ECB2_65_01.csv",
header=TRUE, sep=",")
ECB2 <- ECB2_65_01[-(1)]
ECB2.dist <- dist(ECB2)
plot(hclust(ECB2.dist, method="ward"), xlab="", labels=ECB2$Year, main="Cape
Sable Seaside Sparrow", sub="Hydro Scenario ECB2_65-01")
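
One possible route to a fixed scale, sketched here with the built-in
USArrests data standing in for the site-specific CSV files: plot.hclust()
ignores ylim, but plot.dendrogram() accepts it, so convert each tree with
as.dendrogram() first.

```r
# "ward.D" is the current name of the method called "ward" in R 2.8.1
hc <- hclust(dist(USArrests[1:20, ]), method = "ward.D")

# as.dendrogram() lets plot() honour ylim, so every panel can share a scale;
# the limits here are illustrative
plot(as.dendrogram(hc), ylim = c(0, 600), ylab = "Height",
     main = "Cluster dendrogram with a fixed y scale")
```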

Thanks

Steve

Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034

steve_fried...@nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cross tabulation: convert frequencies to percentages

2009-02-27 Thread Marc Schwartz
on 02/27/2009 08:43 AM soeren.vo...@eawag.ch wrote:
 Hello,
 
 might be rather easy for R pros, but I've been searching to the dead end
 to ...
 
twsource.area <- table(twsource, area, useNA="ifany")
 
 gives me a nice cross tabulation of frequencies of two factors, but now
 I want to convert to percentages of those absolute values. In addition
 I'd like an extra column and an extra row with absolute sums. I know,
 Excel or the likes will produce it more easily, but how would the
 procedure look like in R?

See ?prop.table which is referenced in the See Also section of ?table.

This will give you proportions, so if you want percentages, just
multiply by 100.

To add row and column totals, see ?addmargins which is also in the See
Also for ?table

TAB <- table(state.division, state.region)

> TAB
                    state.region
state.division       Northeast South North Central West
  New England                6     0             0    0
  Middle Atlantic            3     0             0    0
  South Atlantic             0     8             0    0
  East South Central         0     4             0    0
  West South Central         0     4             0    0
  East North Central         0     0             5    0
  West North Central         0     0             7    0
  Mountain                   0     0             0    8
  Pacific                    0     0             0    5

# Overall table proportions

> prop.table(TAB)
                    state.region
state.division       Northeast South North Central West
  New England             0.12  0.00          0.00 0.00
  Middle Atlantic         0.06  0.00          0.00 0.00
  South Atlantic          0.00  0.16          0.00 0.00
  East South Central      0.00  0.08          0.00 0.00
  West South Central      0.00  0.08          0.00 0.00
  East North Central      0.00  0.00          0.10 0.00
  West North Central      0.00  0.00          0.14 0.00
  Mountain                0.00  0.00          0.00 0.16
  Pacific                 0.00  0.00          0.00 0.10


# Column proportions

> prop.table(TAB, 2)
                    state.region
state.division       Northeast     South North Central      West
  New England        0.6666667 0.0000000     0.0000000 0.0000000
  Middle Atlantic    0.3333333 0.0000000     0.0000000 0.0000000
  South Atlantic     0.0000000 0.5000000     0.0000000 0.0000000
  East South Central 0.0000000 0.2500000     0.0000000 0.0000000
  West South Central 0.0000000 0.2500000     0.0000000 0.0000000
  East North Central 0.0000000 0.0000000     0.4166667 0.0000000
  West North Central 0.0000000 0.0000000     0.5833333 0.0000000
  Mountain           0.0000000 0.0000000     0.0000000 0.6153846
  Pacific            0.0000000 0.0000000     0.0000000 0.3846154



> addmargins(TAB)
                    state.region
state.division       Northeast South North Central West Sum
  New England                6     0             0    0   6
  Middle Atlantic            3     0             0    0   3
  South Atlantic             0     8             0    0   8
  East South Central         0     4             0    0   4
  West South Central         0     4             0    0   4
  East North Central         0     0             5    0   5
  West North Central         0     0             7    0   7
  Mountain                   0     0             0    8   8
  Pacific                    0     0             0    5   5
  Sum                        9    16            12   13  50
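
Combining the two suggestions, percentages with row and column totals can be
had in one expression:

```r
TAB <- table(state.division, state.region)

# proportions * 100 gives percentages; addmargins() appends the totals
round(addmargins(100 * prop.table(TAB)), 1)
```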



HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Frank E Harrell Jr

Ajay ohri wrote:
Immersion therapy can be done at a later stage after the 
newly baptized R  corporate user is happy with the fact that he can do 
most of his legacy code in R easily now .


 I have been treading water in the immersion for over a year now.


 Most SAS consultants and corporate users are eager to try out R ..but 
they are scared of immersion especially in these cut back times  ...so 
this could be a middle step...let me go ahead and create the wrapper SAS 
package as a middle ware between r and sas ..


and we will let the invisible hands of  free market decide :))


This is futile and will make it more difficult for other R users to help 
you in the future.  As Marc said this is really a bad idea and will 
backfire.


Frank




regards,

ajay

www.decisionstats.com http://www.decisionstats.com

I am not a Marxist. 
Karl Marx http://www.brainyquote.com/quotes/quotes/k/karlmarx131048.html 

On Fri, Feb 27, 2009 at 8:01 PM, Marc Schwartz 
marc_schwa...@comcast.net mailto:marc_schwa...@comcast.net wrote:


on 02/27/2009 07:57 AM Frank E Harrell Jr wrote:
  Ajay ohri wrote:
 
  I would like to know if we can create a package in which r functions
  are renamed closer to sas language.doing so will help people
familiar
  to SAS to straight away take to R for their work,thus decreasing the
  threshold for acceptance - and then get into deeper
understanding later.
 
  since it is a package it would be optional only for people
wanting to
  try out R from SAS.. Do we have such a package right now..it
basically
  masks R functions to the equivalent function in another language
just
  for user ease /beginners
 
  for example
 
  creating function for means
   procmeans <- function(x,y)
  + {
  summary (
  subset(x,select=c(x,y))
  +
  )
 
  creating function for importing csv
 
  procimport <- function(x,y)
  + {
  read.csv(
  textConnection(x),row.names=y,na.strings=  
  +
  )
 
 
  creating function fo describing data
 
  procunivariate <- function(x)
  + {
  summary(x)
  +
  )
 
  regards,
 
  ajay
 
  Ajay,
 
  This will generate major confusion among users of all types and
be hard
  to maintain.  A better approach is to get Bob Muenchen's
excellent book
  and keep it nearby.
 
  Frank

I whole heartedly agree with Frank here. It may be one thing to have a
translation process in place based upon some form of logical mapping
between the two languages (as Bob's book provides). But is another thing
entirely to actually start writing functions that provide wrappers
modeled on SAS based PROCs.

If you do this, then you only serve to obfuscate the fundamental
philosophical and functional differences between the two languages and
doom a new useR to missing all of R's benefits. They will continue to
try to figure out how to use R based upon their SAS intuition rather
than developing a new set of coding and even statistical paradigms.

Having been through the SAS to S/R transition myself, having used SAS
for much of the 90's and now having used R for over 7 years, I can speak
from personal experience and state that the only way to achieve the
requisite proficiency with R is immersion therapy.

Regards,

Marc Schwartz





--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Making tapply code more efficient

2009-02-27 Thread ONKELINX, Thierry
Hi Harold,

What about this? You one have to make the crosstabulation once.

> qq <- data.frame(student = factor(c(1,1,2,2,2)), teacher =
factor(c(10,10,20,20,25)))
> tab <- table(qq$student, qq$teacher)
> data.frame(Student = rownames(tab), Freq = rowSums(tab), tch =
rowSums(tab > 0) == 1)
  Student Freq   tch
1       1    2  TRUE
2       2    3 FALSE
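
For data at the scale Harold describes, an alternative that avoids per-group
function calls entirely is to count distinct (student, teacher) pairs with
duplicated(); a sketch on the toy data, with the column names shortened:

```r
qq <- data.frame(student = factor(c(1, 1, 2, 2, 2)),
                 teacher = factor(c(10, 10, 20, 20, 25)))

# rows whose (student, teacher) combination has not been seen before
first <- !duplicated(qq[c("student", "teacher")])

# freq: rows per student; tch: TRUE if the student has exactly one teacher
res <- data.frame(freq = as.vector(table(qq$student)),
                  tch  = as.vector(table(qq$student[first])) == 1,
                  row.names = levels(qq$student))
```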

HTH,

Thierry




ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
thierry.onkel...@inbo.be 
www.inbo.be 

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-Oorspronkelijk bericht-
Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
Namens Doran, Harold
Verzonden: vrijdag 27 februari 2009 15:47
Aan: r-help@r-project.org
Onderwerp: [R] Making tapply code more efficient

Previously, I posed the question pasted down below to the list and
received some very helpful responses. While the code suggestions
provided in response indeed work, they seem to only work with *very*
small data sets and so I wanted to follow up and see if anyone had ideas
for better efficiency. I was quite embarrased on this as our SAS
programmers cranked out programs that did this in the blink of an eye
(with a few variables), but R was spinning for days on my Ubuntu machine
and ultimately I saw a message that R was killed.

The data I am working with has 800967 total rows and 31 total columns.
The ID variable I use as the index variable in tapply() has 326397
unique cases.

> length(unique(qq$student_unique_id))
[1] 326397

To give a sense of what my data look like and the actual problem,
consider the following:

qq <- data.frame(student_unique_id = factor(c(1,1,2,2,2)),
teacher_unique_id = factor(c(10,10,20,20,25)))

This is a student achievement database where students occupy multiple
rows in the data and the variable teacher_unique_id denotes the class
the student was in. What I am doing is looking to see if the teacher is
the same for each instance of the unique student ID. So, if I implement
the following:

same <- function(x) length( unique(x) ) == 1
results <- data.frame(
        freq = tapply(qq$student_unique_id, qq$student_unique_id,
length),
        tch = tapply(qq$teacher_unique_id, qq$student_unique_id, same)
)

I get the following results. I can see that student 1 appears in the
data twice and the teacher is always the same. However, student 2
appears three times and the teacher is not always the same.

> results
  freq   tch
1    2  TRUE
2    3 FALSE

Now, implementing this same procedure to a large data set with the
characteristics described above seems to be problematic in this
implementation. 

Does anyone have reactions on how this could be more efficient such that
it can run with large data as I described?

Harold

> sessionInfo()
R version 2.8.1 (2008-12-22)
x86_64-pc-linux-gnu 

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.U
TF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=
C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATI
ON=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base




# Original question posted on 1/13/09
Suppose I have a dataframe as follows:

dat <- data.frame(id = c(1,1,2,2,2), var1 = c(10,10,20,20,25), var2 =
c('foo', 'foo', 'foo', 'foobar', 'foo'))

Now, if I were to subset by id, such as:

> subset(dat, id==1)
  id var1 var2
1  1   10  foo
2  1   10  foo

I can see that the elements in var1 are exactly the same and the
elements in var2 are exactly the same. However,

> subset(dat, id==2)
  id var1   var2
3  2   20foo
4  2   20 foobar
5  2   25foo

Shows the elements are not the same for either variable in this
instance. So, what I am looking to create is a data frame that would be
like this

id  freq  var1   var2
1   2     TRUE   TRUE
2   3     FALSE  FALSE

Where freq is the number of times the ID is repeated in the dataframe. A
TRUE appears in the cell if all elements in the column are the same for
the ID and FALSE otherwise. It is insignificant which values differ for
my problem.

The way I am thinking about tackling this is to loop through the ID
variable and compare the values in the various columns of the dataframe.
The problem I am encountering is that I don't think all.equal or
identical are the right functions in 

Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Terry Therneau
Three comments

 I actually think you can write worse code in R than in SAS: more tools = more 
scope for innovatively bad ideas.  The ability to write bad code should not 
damn 
a language.  
 
  I found almost all of the improvements to the multi-line SAS recode to be 
regressions, both the SAS and the S suggestions. 
a. Everyone, even those of you with no SAS background whatsoever, 
immediately 
understood the code.  Most of the replacements are obscure.  Compilers are very 
good these days and computers are fast, fewer typed characters != better.
b. If I were writing the S code for such an application, it would look much 
the same.  I worked as a programmer in medical research for several years, and 
one of the things that moved me on to graduate studies in statistics was the 
realization that doing my best work meant being as UN-clever as possible in my 
code.  

  Frank's comments imply that he was reading SAS macro code at the moment of 
peak frustration.  And if you want to criticise SAS code, this is the place to 
look.  SAS macro started out as some simple expansions, then got added on to, 
then added on again, and again, and   with no overall blueprint.  It is 
much 
like the farmhouse of some neighbors of mine growing up: 4 different expansions 
in 4 eras, and no overall guiding plan.  The interior layout was interesting 
to say the least. I was once a bona fide SAS 'wizard' (and Frank was much 
better 
than me), and I can't read the stuff without grinding my teeth.
  S was once headed down the same road. One of the best things ever with the 
language was documented in the blue book The New S Language, where Becker et 
al had the wisdom to scrap the macro processor.  
 
Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Ordinal Mantel-Haenszel type inference

2009-02-27 Thread David Winsemius
I suspect that what you need will be in S-PLUS (and R) Manual to  
Accompany Agresti’s Categorical Data Analysis (2002) 2nd edition by  
Laura A. Thompson, 2007 which I have always been able to find with a  
Google search. Yep, it's still there:


https://home.comcast.net/~lthompson221/Splusdiscrete2.pdf

Its Chapter 7, Logit Models for Multinomial Responses  discusses  
various cumulative logit models.


The polr function (proportional odds logistic regression) in MASS will  
return the regression equivalent of what you are asking for. Thompson  
says the lrm in the Design library will also do it, by which she  
really means that the lrm in the Design package by Harrell will do it.  
The link she offers is outdated and it doesn't really matter for  
obtaining the Hmisc/Design packages, since they are on CRAN, but  
online available documentation is currently at:


http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/StatComp

She then also mentions lcr (library ordinal) and nordr (library  
gnlm). Later in the chapter she illustrates the use of the vglm  
function in the VGAM package.
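
For reference, the polr() route looks like this on the housing data shipped
with MASS (this is the proportional-odds regression analogue, not the Liu
and Agresti test itself):

```r
library(MASS)

# cumulative-logit (proportional odds) model; the exponentiated
# coefficients are cumulative odds ratios
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
exp(coef(fit))
```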


--
David Winsemius

On Feb 27, 2009, at 9:04 AM, Jourdan Gold wrote:


Hello,

I am searching for an R-Package that does an exentsion of the Mantel- 
Haenszel test for ordinal data as described in Liu and Agresti  
(1996) A Mantel-Haenszel type inference for cummulative odds  
ratios. in Biometrics. I see packages such as Epi that perform it  
for binary data and derives a varaince for it using the Robbins and  
Breslow variance method. As well as another pacakge that derives it  
for nominal variables but does not provide a variance or confidence  
limit.


Does a package exist that does this? I have searched the list  
archives and can't seem to see such a package but I could be missing  
something.  thank you.



yours sincerely,


Jourdan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Frank E Harrell Jr

Terry Therneau wrote:

Three comments

 I actually think you can write worse code in R than in SAS: more tools = more 
scope for innovatively bad ideas.  The ability to write bad code should not damn 
a language.  
 
  I found almost all of the improvements to the multi-line SAS recode to be 
regressions, both the SAS and the S suggestions. 
a. Everyone, even those of you with no SAS background whatsoever, immediately 
understood the code.  Most of the replacements are obscure.  Compilers are very 
good these days and computers are fast, fewer typed characters != better.
b. If I were writing the S code for such an application, it would look much 
the same.  I worked as a programmer in medical research for several years, and 
one of the things that moved me on to graduate studies in statistics was the 
realization that doing my best work meant being as UN-clever as possible in my 
code.  


If I were writing S code for this it would be dramatically different.  I 
would try to be efficient and elegant but would need to remember to be a 
teacher at the same time.  For example this kind of recode is super 
efficient and quick to program but would need good comments or a 
handbook to all of my code:  c(cat=1, dog=2, giraffe=3)[animal]
But I think the code is quite intuitive once you have used that 
construct once.
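
Spelled out, the construct indexes a named vector by the character values
themselves; a small sketch:

```r
animal <- c("dog", "cat", "giraffe", "dog")

# each element of 'animal' looks up the entry with the matching name
code <- c(cat = 1, dog = 2, giraffe = 3)[animal]
unname(code)   # 2 1 3 2
```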


There is also a lot of factoring of code that could be done, as others have 
pointed out.



  Frank's comments imply that he was reading SAS macro code at the moment of 
peak frustration.  And if you want to criticise SAS code, this is the place to 
look.  SAS macro started out as some simple expansions, then got added on to, 
then added on again, and again, and   with no overall blueprint.  It is much 
like the farmhouse of some neighbors of mine growing up: 4 different expansions 
in 4 eras, and no overall guiding plan.  The interior layout was interesting 
to say the least. I was once a bona fide SAS 'wizard' (and Frank was much better 
than me), and I can't read the stuff without grinding my teeth.
  S was once headed down the same road. One of the best things ever with the 
language was documented in the blue book The New S Language, where Becker et 
al had the wisdom to scrap the macro processor.  


Well put.  I am amazed there wasn't a revolt among SAS users 
decades ago.  The S approach is also easier to debug one line at a time.


Cheers,
Frank

 
  	Terry Therneau






--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread John Sorkin
Terry's remarks (see below) are well received; however, I take issue with one 
part of his comments. As a long-time programmer (in both statistical 
programming languages and traditional programming languages), I miss the 
ability to write native-language macros in R. While macros can make for 
difficult-to-read code, when used properly they can also make for flexible 
code that, if properly written (including good documentation, which should be 
a part of any code), can be easy to read.

Finally, everyone must remember that SAS code can be difficult to understand or 
inefficient just as R code can be difficult to understand or inefficient. 
In the end, both programming systems have their advantages and disadvantage. No 
programming language is perfect. It is not fair, nor correct to damn one or the 
other. Accept the fact that some things are more easily and more clearly done 
in one language, other things are more clearly and more easily done in another 
language.  Let's move on to more important issues, viz. improving R so it is as 
good as it possibly can be.
John  

  

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

 Terry Therneau thern...@mayo.edu 2/27/2009 10:23 AM 
Three comments

 I actually think you can write worse code in R than in SAS: more tools = more 
scope for innovatively bad ideas.  The ability to write bad code should not 
damn 
a language.  
 
  I found almost all of the improvements to the multi-line SAS recode to be 
regressions, both the SAS and the S suggestions. 
a. Everyone, even those of you with no SAS background whatsoever, 
immediately 
understood the code.  Most of the replacements are obscure.  Compilers are very 
good these days and computers are fast, fewer typed characters != better.
b. If I were writing the S code for such an application, it would look much 
the same.  I worked as a programmer in medical research for several years, and 
one of the things that moved me on to graduate studies in statistics was the 
realization that doing my best work meant being as UN-clever as possible in my 
code.  

  Frank's comments imply that he was reading SAS macro code at the moment of 
peak frustration.  And if you want to criticise SAS code, this is the place to 
look.  SAS macro started out as some simple expansions, then got added on to, 
then added on again, and again, and   with no overall blueprint.  It is 
much 
like the farmhouse of some neighbors of mine growing up: 4 different expansions 
in 4 eras, and no overall guiding plan.  The interior layout was interesting 
to say the least. I was once a bona fide SAS 'wizard' (and Frank was much 
better 
than me), and I can't read the stuff without grinding my teeth.
  S was once headed down the same road. One of the best things ever with the 
language was documented in the blue book The New S Language, where Becker et 
al had the wisdom to scrap the macro processor.  
 
Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html 
and provide commented, minimal, self-contained, reproducible code.

Confidentiality Statement:
This email message, including any attachments, is for th...{{dropped:6}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R crash on Mac

2009-02-27 Thread Adelchi Azzalini

If I define this function

R> ask <- function (message = "Type in datum") 
       eval(parse(prompt = paste(message, ": ", sep = "")))

the following is produced as expected on a Linux/debian machine

R> ask("input")
input: 3
[1] 3
R> ask("input")
input: 3:6
[1] 3 4 5 6
R> ask("input")
input: c(3,6)
[1] 3 6

If I run exactly the same on a Mac (OS X 10.5.6), it still works 
provided R is run in a Terminal window. 

The outcome changes if R is run in its own window, started by clicking 
on its icon; the first two examples are still Ok, the third one produces:


 *** caught segfault ***
 address 0x4628c854, cause 'memory not mapped'
 

R> sessionInfo()  # before crash!
R version 2.8.1 (2008-12-22) 
i386-apple-darwin8.11.1 

locale:
en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats utils datasets  grDevices graphics  methods   base
R> R.version
               _
platform       i386-apple-darwin8.11.1
arch           i386
os             darwin8.11.1
system         i386, darwin8.11.1
status
major          2
minor          8.1
year           2008
month          12
day            22
svn rev        47281
language       R
version.string R version 2.8.1 (2008-12-22)


-- 
Adelchi Azzalini  azzal...@stat.unipd.it
Dipart.Scienze Statistiche, Università di Padova, Italia
tel. +39 049 8274147,  http://azzalini.stat.unipd.it/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] formula formatting/grammar for regression

2009-02-27 Thread Brigid Mooney
Hi all,

I am doing some basic regression analysis, and am getting a bit
confused on how to enter non-polynomial formulas to be used.

For example, consider that I want to find A and r such that the
formula y = A*exp(r*x) provides the the best fit to the line y=x on
the interval [0,50].

I can set:
xpts <- seq(0, 50, by=0.1)
ypts <- seq(0, 50, by=0.1)

I know I can find a fitted polynomial of a given degree using
lm(ypts ~ poly(xpts, degree=5, raw=TRUE))

But am confused on what the formula should be for trying to find a fit
to y = A*exp(r*x).

If anyone knows of a resource that describes the grammar behind
assembling these formulas, I would really appreciate being pointed in
that direction as I can't seem to find much beyond basic polynomials.

Thanks for the help!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] formula formatting/grammar for regression

2009-02-27 Thread Dieter Menne
Brigid Mooney bkmooney at gmail.com writes:

 I am doing some basic regression analysis, and am getting a bit
 confused on how to enter non-polynomial formulas to be used.
..
 But am confused on what the formula should be for trying to find a fit
 to y = A*exp(r*x).

If this example is just a placeholder for something more complex than poly,
you should check the function nls(), which works for non-linear models.

However, if you really only want to solve this particular problem, taking
the log of your data and fitting the log of the above function with lm() is
the easiest way out. The results can differ somewhat from the non-linear fit,
depending on the noise, because in one case the errors are weighted on the
log scale and in the other on the linear scale.
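
A sketch of both routes on simulated data (the constants A = 2 and r = 0.05,
and the noise level, are chosen for illustration):

```r
set.seed(1)
x <- seq(0, 50, by = 0.5)
y <- 2 * exp(0.05 * x) * exp(rnorm(length(x), sd = 0.02))  # A = 2, r = 0.05

# direct nonlinear least squares on y = A * exp(r * x)
fit.nls <- nls(y ~ A * exp(r * x), start = list(A = 1, r = 0.04))
coef(fit.nls)

# log-linear alternative: log(y) = log(A) + r * x, fitted with lm()
fit.lm <- lm(log(y) ~ x)
c(A = exp(unname(coef(fit.lm)[1])), r = unname(coef(fit.lm)[2]))
```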

Dieter

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] Package DAKS for knowledge space theory, on CRAN now

2009-02-27 Thread Ali Uenlue
Version 1.0-0 of DAKS (Data Analysis and Knowledge Spaces) has been  
released to CRAN.

Knowledge space theory is a recent psychometric test theory based on  
combinatorial mathematical structures (order and lattice theory).  
Solvability dependencies between dichotomous test items play an  
important role in knowledge space theory. Utilizing hypothesized  
dependencies between items, knowledge space theory has been  
successfully applied for the computerized, adaptive assessment and  
training of knowledge.

The package DAKS implements inductive item tree analysis methods for  
deriving surmise relations from binary data. It provides functions for  
computing population and estimated asymptotic variances of the used  
fit measures, and for switching between test item and knowledge state  
representations.  Other features are a Hasse diagram drawing device, a  
data simulation tool based on a finite mixture latent variable model,  
and a function for computing response pattern and knowledge state  
frequencies.

Best regards,
Anatol Sargin
Ali Uenlue
--
Department of Computer-Oriented Statistics and Data Analysis
Institute of Mathematics
University of Augsburg
http://stats.math.uni-augsburg.de/


[[alternative HTML version deleted]]

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using package ROCR

2009-02-27 Thread Uwe Ligges
For question 1: Can you please report to the package maintainer (well, I 
am CCing Tobias now) who will certainly be happy to improve the package 
(particularly the demo behaviour).


For question 2 (and your latest message):
does not happen for me. Which versions are you using, i.e. have you 
updated to the most recent ones? In any case, using Namespaces is 
another thing that might be worth considering for Tobias as the ROCR 
maintainer.


Tobias, a last point for you: your package has given WARNINGs in the checks 
for ages now; can you please fix that as well.


Thank you,
Uwe Ligges



wiener30 wrote:

Just an update concerning an error message in using ROCR package.

Error in as.double(y) : 
  cannot coerce type 'S4' to vector of type 'double' 


I have changed the order in which the packages are loaded and the problem is
gone:
library(ROCR)
library(randomForest)

The loading sequence that caused an error was:
library(randomForest)
library(ROCR)

May be this info could be useful for somebody else who is getting the same
error.




wiener30 wrote:

Thank you very much for the response!

The plot(1,1) helped to resolve the first problem.
But I am still getting a second error message when running demo(ROCR)

Error in as.double(y) : 
  cannot coerce type 'S4' to vector of type 'double'


It seems it has something to do with compatibility of S4 objects.

My versions of R and ROCR package are the same as you listed.
But it seems something other is missing in my installation.


William Doane wrote:


Responding to question 1... it seems the demo assumes you already have a
plot window open.

  library(ROCR)
  plot(1,1)
  demo(ROCR)

seems to work.

For question 2, my environment produces the expected results... plot
doesn't generate an error:
  * R 2.8.1 GUI 1.27 Tiger build 32-bit (5301)
  * OS X 10.5.6
  * ROCR 1.0-2

-Wil



wiener30 wrote:

I am trying to use package ROCR to analyze classification accuracy,
unfortunately there are some problems right at the beginning.

Question 1) 
When I try to run demo I am getting the following error message

library(ROCR)
demo(ROCR)
if(dev.cur() <= 1)  [TRUNCATED] 

Error in get(getOption("device")) : wrong first argument
When I issue the command
dev.cur() 

it returns
null device 
  1

It seems something is wrong with my R-environment ?
Could somebody provide a hint, what is wrong.

Question 2)
When I run an example commands from the manual
library(ROCR)
data(ROCR.simple)
pred <- prediction( ROCR.simple$predictions, ROCR.simple$labels )
perf <- performance( pred, "tpr", "fpr" )
plot( perf )

the plot command issues the following error message
Error in as.double(y) : 
  cannot coerce type 'S4' to vector of type 'double'


How this could be fixed ?

Thanks for the support










__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] levelplot help needed

2009-02-27 Thread Sundar Dorai-Raj
To reorder the y-labels, simply reorder the factor levels:

df <- data.frame(x_label = factor(x_label),
                 y_label = factor(y_label, rev(y_label)),
                 values = as.vector(my.data))

Not sure about putting the strips at the bottom. A quick scan of
?xyplot and ?strip.default suggests that this is not possible, but I'm
sure Deepayan will correct me if I'm wrong (he often does).

--sundar

On Fri, Feb 27, 2009 at 5:51 AM, Antje niederlein-rs...@yahoo.de wrote:
 Hi there,

 I'm looking for someone who can give me some hints how to make a nice
 levelplot. As an example, I have the following code:

 # create some example data
 # --
 xl <- 4
 yl <- 10

 my.data <- sapply(1:xl, FUN = function(x) { rnorm( yl, mean = x) })

 x_label <- rep(c("X Label 1", "X Label 2", "X Label 3", "X Label 4"), each =
 yl)
 y_label <- rep(paste("Y Label ", 1:yl, sep=""), xl)

 df <- data.frame(x_label = factor(x_label), y_label = factor(y_label),
                  values = as.vector(my.data))

 df1 <- data.frame(df, group = rep("Group 1", xl*yl))
 df2 <- data.frame(df, group = rep("Group 2", xl*yl))
 df3 <- data.frame(df, group = rep("Group 3", xl*yl))

 mdf <- rbind(df1, df2, df3)

 # plot
 # --

 graph <- levelplot(mdf$values ~ mdf$x_label * mdf$y_label | mdf$group,
                    aspect = "xy", layout = c(3, 1),
                    scales = list(x = list(labels =
                      substr(levels(factor(mdf$x_label)), 0, 5), rot = 45)))
 print(graph)

 # --


 (I need to put these strange x-labels, because in my real data the values of
 the x-labels are too long and I just want to display the first 10 characters
 as label)

 My questions:

 * I'd like to start with "Y Label 1" in the upper row (that's a more general
 issue, how can I have influence on the order of x,y, and groups?)
 * I'd like to put the groups at the bottom

 Can anybody give me some help?

 Antje



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] combining identify() and locator()

2009-02-27 Thread Brian Bolt
awesome.  Thank you very much for the quick response. I think this is  
exactly what I was looking for.

-Brian

On Feb 27, 2009, at 1:10 AM, Barry Rowlingson wrote:


2009/2/27 Brian Bolt bb...@kalypsys.com:

Hi,
I am wondering if there might be a way to combine the two functions
identify() and locator() such that if I use identify() and then  
click on a
point outside the set tolerance, the x,y coordinates are returned  
as in

locator().  Does anyone know of a way to do this?
Thanks in advance for any help


Since identify will only return the indexes of selected points, and
it only takes on-screen clicks for coordinates, you'll have to
leverage locator and duplicate some of the identify work. So call
locator(1), then compute the distances to your points, and if any are
below your tolerance mark them using text(), otherwise keep the
coordinates of the click.

You can use dist() to compute a distance matrix, but if you want to
totally replicate identify's tolerance behaviour I think you'll have
to convert from your data coordinates to device coordinates. The
grconvertX and Y functions look like they'll do that for you.

Okay, that's the flatpack delivered, I think you've got all the
parts, some assembly required!

Barry


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] levelplot help needed

2009-02-27 Thread David Winsemius

Try using the alternating=FALSE option.
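
In lattice, alternating is one of the entries of the scales list. A minimal, 
self-contained sketch (the mdf built here is only a stand-in for the one in 
the original post):

```r
library(lattice)

# stand-in for the mdf constructed in the original post
mdf <- expand.grid(x_label = factor(paste("X Label", 1:4)),
                   y_label = factor(paste("Y Label", 1:10)),
                   group   = factor(paste("Group", 1:3)))
mdf$values <- rnorm(nrow(mdf), mean = as.numeric(mdf$x_label))

# alternating = FALSE keeps the axis labels on one side of each panel
graph <- levelplot(values ~ x_label * y_label | group, data = mdf,
                   layout = c(3, 1),
                   scales = list(alternating = FALSE, x = list(rot = 45)))
print(graph)
```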

--  
David Winsemius

On Feb 27, 2009, at 12:07 PM, Sundar Dorai-Raj wrote:


To reorder the y-labels, simply reorder the factor levels:

df <- data.frame(x_label = factor(x_label),
                 y_label = factor(y_label, rev(y_label)),
                 values = as.vector(my.data))

Not sure about putting the strips at the bottom. A quick scan of
?xyplot and ?strip.default suggests that this is not possible, but I'm
sure Deepayan will correct me if I'm wrong (he often does).

--sundar

On Fri, Feb 27, 2009 at 5:51 AM, Antje niederlein-rs...@yahoo.de  
wrote:

Hi there,

I'm looking for someone who can give me some hints how to make a nice
levelplot. As an example, I have the following code:

# create some example data
# --
xl <- 4
yl <- 10

my.data <- sapply(1:xl, FUN = function(x) { rnorm( yl, mean = x) })

x_label <- rep(c("X Label 1", "X Label 2", "X Label 3", "X Label 4"), each =
yl)
y_label <- rep(paste("Y Label ", 1:yl, sep=""), xl)

df <- data.frame(x_label = factor(x_label), y_label = factor(y_label),
                 values = as.vector(my.data))

df1 <- data.frame(df, group = rep("Group 1", xl*yl))
df2 <- data.frame(df, group = rep("Group 2", xl*yl))
df3 <- data.frame(df, group = rep("Group 3", xl*yl))

mdf <- rbind(df1, df2, df3)

# plot
# --

graph <- levelplot(mdf$values ~ mdf$x_label * mdf$y_label | mdf$group,
                   aspect = "xy", layout = c(3, 1),
                   scales = list(x = list(labels =
                     substr(levels(factor(mdf$x_label)), 0, 5), rot = 45)))
print(graph)

# --


(I need to put these strange x-labels, because in my real data the values of
the x-labels are too long and I just want to display the first 10 characters
as label)

My questions:

* I'd like to start with "Y Label 1" in the upper row (that's a more general
issue, how can I have influence on the order of x, y, and groups?)
* I'd like to put the groups at the bottom

Can anybody give me some help?

Antje






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Re : Have a function like the _n_ in R ? (Automatic count function )

2009-02-27 Thread Johannes Hüsing
If you are in the context of a data frame (which is closest to the concept
of a data set in SAS), then 1:nrow(df) is closest to what you are looking
for.

For instance:

data(iris)
.n. <- 1:nrow(iris)

You may notice that this number is not very idiomatic in R.

If you have something like:

if(_N_ > 50) then output;

in R you can simply put

iris[-(1:50),]

without using an explicit counter variable.

In the context of a matrix, the row() and col() functions may do what
you want.
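
For instance, a small sketch of row() and col() on a toy matrix:

```r
m <- matrix(1:6, nrow = 2)
row(m)              # row index of each cell
col(m)              # column index of each cell
m[row(m) > col(m)]  # cells strictly below the diagonal: returns 2
```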



Am 25.02.2009 um 15:34 schrieb justin bem:

R is more flexible than SAS. You have several looping constructs, e.g. 
for, while, repeat. You also have the dim and length functions to get 
objects' dimensions.


i <- 0
dat <- matrix(c(1, runif(1), .Random.seed[1]), nr = 1)
repeat{
    i <- i + 1
    dat <- rbind(dat, matrix(c(1 + i, runif(1), .Random.seed[1]), nr = 1))
    if (i == 4) break
}

colnames(dat) <- c("counter", "x", "seed")
dat

 Justin BEM
BP 1917 Yaoundé
Tél (237) 99597295
(237) 22040246





From: Nash morri...@ibms.sinica.edu.tw
To: r-help r-help@r-project.org
Sent: Wednesday, 25 February 2009, 13:25:18
Subject: [R] Have a function like the _n_ in R ? (Automatic count 
function )



Is there a counter function in R?

if we use the software SAS

/*** SAS Code **/
data tmp(drop= i);
retain seed x 0;
do i = 1 to 5;
    call ranuni(seed,x);
    output;
end;
run;

data new;
counter=_n_;  * this keyword _n_ ;
set tmp;
run;

/*
_n_ (Automatic variables)
are created automatically by the DATA step or by DATA step statements.
*/

/*** Output 
counter        seed            x
1    584043288            0.27197
2    935902963            0.43581
3    301879523            0.14057
4    753212598            0.35074
5    1607264573    0.74844

/

Is there a function like the _n_ in R?


--
Nash - morri...@ibms.sinica.edu.tw





[[alternative HTML version deleted]]



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Filtering a dataset's columns by another dataset's column names

2009-02-27 Thread Josh B
Hello all,

I hope some of you can come to my rescue, yet again.

I have two genetic datasets, and I want one of the datasets to have only the 
columns that are in common with the other dataset. 
Here is a toy example (my real datasets have hundreds of columns):

Dataset 1:

Individual  SNP1  SNP2  SNP3  SNP4  SNP5
1           A     G     T     C     A
2           T     C     A     G     T
3           A     C     T     C     A

Dataset 2:

Individual  SNP1  SNP3  SNP5  SNP6  SNP7
4           A     T     T     G     C
5           T     A     A     G     G
6           A     A     T     C     G

I want Dataset1 to have only columns that are also represented in Dataset 2, 
i.e., I want to generate a new Dataset 3 that looks like this:

Individual  SNP1  SNP3  SNP5
1           A     T     A
2           T     A     T
3           A     T     A

Does anyone know how I could do this? Keep in mind that this is not a simple 
merge, as in the merge function.

Thanks very much for your help everyone.
Josh B.



  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Filtering a dataset's columns by another dataset's column names

2009-02-27 Thread Rowe, Brian Lee Yung (Portfolio Analytics)
Try this:

d1[,intersect(names(d1),names(d2))]

HTH, Brian

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Josh B
Sent: Friday, February 27, 2009 12:28 PM
To: R Help
Subject: [R] Filtering a dataset's columns by another dataset's column
names


Hello all,

I hope some of you can come to my rescue, yet again.

I have two genetic datasets, and I want one of the datasets to have only
the columns that are in common with the other dataset. 
Here is a toy example (my real datasets have hundreds of columns):

Dataset 1:

Individual  SNP1  SNP2  SNP3  SNP4  SNP5
1           A     G     T     C     A
2           T     C     A     G     T
3           A     C     T     C     A

Dataset 2:

Individual  SNP1  SNP3  SNP5  SNP6  SNP7
4           A     T     T     G     C
5           T     A     A     G     G
6           A     A     T     C     G

I want Dataset1 to have only columns that are also represented in
Dataset 2, i.e., I want to generate a new Dataset 3 that looks like
this:

Individual  SNP1  SNP3  SNP5
1           A     T     A
2           T     A     T
3           A     T     A

Does anyone know how I could do this? Keep in mind that this is not a
simple merge, as in the merge function.

Thanks very much for your help everyone.
Josh B.



  
[[alternative HTML version deleted]]



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Filtering a dataset's columns by another dataset's column names

2009-02-27 Thread Marc Schwartz
on 02/27/2009 11:27 AM Josh B wrote:
 Hello all,
 
 I hope some of you can come to my rescue, yet again.
 
 I have two genetic datasets, and I want one of the datasets to have only the 
 columns that are in common with the other dataset. 
 Here is a toy example (my real datasets have hundreds of columns):
 
 Dataset 1:
 
 Individual  SNP1  SNP2  SNP3  SNP4  SNP5
 1           A     G     T     C     A
 2           T     C     A     G     T
 3           A     C     T     C     A

 Dataset 2:

 Individual  SNP1  SNP3  SNP5  SNP6  SNP7
 4           A     T     T     G     C
 5           T     A     A     G     G
 6           A     A     T     C     G

 I want Dataset1 to have only columns that are also represented in Dataset 2, 
 i.e., I want to generate a new Dataset 3 that looks like this:

 Individual  SNP1  SNP3  SNP5
 1           A     T     A
 2           T     A     T
 3           A     T     A
 
 Does anyone know how I could do this? Keep in mind that this is not a simple 
 merge, as in the merge function.
 
 Thanks very much for your help everyone.
 Josh B.

Same.Cols <- intersect(names(DF1), names(DF2))

> Same.Cols
[1] "Individual" "SNP1"       "SNP3"       "SNP5"

> rbind(DF1[, Same.Cols], DF2[, Same.Cols])
  Individual SNP1 SNP3 SNP5
1          1    A    T    A
2          2    T    A    T
3          3    A    T    A
4          4    A    T    T
5          5    T    A    A
6          6    A    A    T


See ?intersect, which gives you the common column names, which you can
then use in rbind().

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Filtering a dataset's columns by another dataset's column names

2009-02-27 Thread Jorge Ivan Velez
Dear Josh,
Try this:

dataset1[,colnames(dataset1) %in% colnames(dataset2)]

Take a look at ?colnames and ?%in% for more information.

HTH,

Jorge


On Fri, Feb 27, 2009 at 12:27 PM, Josh B josh...@yahoo.com wrote:

 Hello all,

 I hope some of you can come to my rescue, yet again.

 I have two genetic datasets, and I want one of the datasets to have only
 the columns that are in common with the other dataset.
 Here is a toy example (my real datasets have hundreds of columns):

 Dataset 1:

 Individual  SNP1  SNP2  SNP3  SNP4  SNP5
 1           A     G     T     C     A
 2           T     C     A     G     T
 3           A     C     T     C     A

 Dataset 2:

 Individual  SNP1  SNP3  SNP5  SNP6  SNP7
 4           A     T     T     G     C
 5           T     A     A     G     G
 6           A     A     T     C     G

 I want Dataset1 to have only columns that are also represented in Dataset
 2, i.e., I want to generate a new Dataset 3 that looks like this:

 Individual  SNP1  SNP3  SNP5
 1           A     T     A
 2           T     A     T
 3           A     T     A

 Does anyone know how I could do this? Keep in mind that this is not a
 simple merge, as in the merge function.

 Thanks very much for your help everyone.
 Josh B.




[[alternative HTML version deleted]]



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Filtering a dataset's columns by another dataset's column names

2009-02-27 Thread David Winsemius
So you want the data that is in Dataset 1 but only the column names  
that are also in Dataset 2:


How about:

> subset(DS1, select = names(DS1) %in% names(DS2))

> DS1 <- read.table(textConnection("Individual SNP1 SNP2 SNP3 SNP4 SNP5
+ 1 A G T C A
+ 2 T C A G T
+ 3 A C T C A"), header = TRUE)
> DS2 <- read.table(textConnection("Individual SNP1 SNP3 SNP5 SNP6 SNP7
+ 4 A T T G C
+ 5 T A A G G
+ 6 A A T C G"), header = TRUE)

> subset(DS1, select = names(DS1) %in% names(DS2))
  Individual SNP1 SNP3 SNP5
1          1    A    T    A
2          2    T    A    T
3          3    A    T    A

Tested!
--
David Winsemius
Heritage Labs

On Feb 27, 2009, at 12:27 PM, Josh B wrote:


Hello all,

I hope some of you can come to my rescue, yet again.

I have two genetic datasets, and I want one of the datasets to have  
only the columns that are in common with the other dataset.

Here is a toy example (my real datasets have hundreds of columns):

Dataset 1:

Individual  SNP1  SNP2  SNP3  SNP4  SNP5
1           A     G     T     C     A
2           T     C     A     G     T
3           A     C     T     C     A

Dataset 2:

Individual  SNP1  SNP3  SNP5  SNP6  SNP7
4           A     T     T     G     C
5           T     A     A     G     G
6           A     A     T     C     G

I want Dataset1 to have only columns that are also represented in  
Dataset 2, i.e., I want to generate a new Dataset 3 that looks like  
this:

Individual  SNP1  SNP3  SNP5
1           A     T     A
2           T     A     T
3           A     T     A

Does anyone know how I could do this? Keep in mind that this is not  
a simple merge, as in the merge function.


Thanks very much for your help everyone.
Josh B.




[[alternative HTML version deleted]]



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] combining identify() and locator()

2009-02-27 Thread Barry Rowlingson
2009/2/27 Brian Bolt bb...@kalypsys.com:
 awesome.  Thank you very much for the quick response. I think this is
 exactly what I was looking for.

 Here's a basic framework:

 `idloc` <-
  function(xy, n = 1, tol = 0.25){

    tol2 = tol^2

    icoords = cbind(grconvertX(xy[,1], to = "inches"),
                    grconvertY(xy[,2], to = "inches"))
    hit = c()
    missed = matrix(ncol = 2, nrow = 0)
    for(i in 1:n){
      ptU = locator(1)
      pt = c(grconvertX(ptU$x, to = "inches"), grconvertY(ptU$y, to = "inches"))

      d2 = (icoords[,1] - pt[1])^2 + (icoords[,2] - pt[2])^2
      if (any(d2 < tol2)){
        print("clicked")
        hit = c(hit, (1:dim(xy)[1])[d2 < tol2])
      }else{
        print("missed")
        missed = rbind(missed, c(ptU$x, ptU$y))
      }

    }
    return(list(hit = hit, missed = missed))

  }

Test:

 xy = cbind(1:10,runif(10))
 plot(xy)
 idloc(xy,10)

 now click ten times, on points or off points. You get back:

$hit
[1]  4  6  7 10

$missed
 [,1]  [,2]
[1,] 5.698940 0.6835392
[2,] 6.216171 0.6144229
[3,] 5.877982 0.5752569
[4,] 6.773190 0.2895761
[5,] 7.210847 0.3126149
[6,] 9.239985 0.5614337

 - $hit is the indices of the points you hit (in order, including
duplicates) and $missed are the coordinates of the misses.

 It crashes out if you hit the middle button for the locator, but that
should be easy enough to fix up. It doesn't label hit points, but
that's also easy enough to do.

Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Filtering a dataset's columns by another dataset's column names

2009-02-27 Thread Daniel Malter
Hi Josh B,

this looks like homework to me. Please obey the posting rules, i.e., provide
self-contained code/examples and show the point at which you are
stuck.

To solve your problem, you need the which() and names() functions as well
as the %in% operator. It is then easy to rbind() the two datasets once you
have figured out what the common column names are. Please try on your own
first and report back if and where you are stuck along with the
self-contained code. If this is indeed homework, please ask your professor
or teacher.

Example for two simulated datasets:

x=rnorm(30)
dim(x)=c(5,6)
x=data.frame(x)
names(x)=c("a","b","c","x","y","z")

y=rnorm(30)
dim(y)=c(5,6)
y=data.frame(y)
names(y)=c("a","b","d","v","w","x")
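
With these two simulated data frames, the pattern then looks roughly like
this (one possible solution, using names() and %in% as described above):

```r
# recreating the two simulated data frames from above
x <- data.frame(matrix(rnorm(30), 5, 6)); names(x) <- c("a","b","c","x","y","z")
y <- data.frame(matrix(rnorm(30), 5, 6)); names(y) <- c("a","b","d","v","w","x")

common <- names(x)[names(x) %in% names(y)]   # same as intersect(names(x), names(y))
common                                       # "a" "b" "x"
rbind(x[, common], y[, common])              # stack the shared columns
```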

Daniel


-
cuncta stricte discussurus
-

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Josh B
Sent: Friday, February 27, 2009 12:28 PM
To: R Help
Subject: [R] Filtering a dataset's columns by another dataset's column names

Hello all,

I hope some of you can come to my rescue, yet again.

I have two genetic datasets, and I want one of the datasets to have only the
columns that are in common with the other dataset. 
Here is a toy example (my real datasets have hundreds of columns):

Dataset 1:

Individual  SNP1  SNP2  SNP3  SNP4  SNP5
1           A     G     T     C     A
2           T     C     A     G     T
3           A     C     T     C     A

Dataset 2:

Individual  SNP1  SNP3  SNP5  SNP6  SNP7
4           A     T     T     G     C
5           T     A     A     G     G
6           A     A     T     C     G

I want Dataset1 to have only columns that are also represented in Dataset 2,
i.e., I want to generate a new Dataset 3 that looks like this:

Individual  SNP1  SNP3  SNP5
1           A     T     A
2           T     A     T
3           A     T     A

Does anyone know how I could do this? Keep in mind that this is not a simple
merge, as in the merge function.

Thanks very much for your help everyone.
Josh B.



  
[[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Frank E Harrell Jr

spam me wrote:

I've actually used AHRQ's software to create Inpatient Quality Indicator
reports.  I can confirm pretty much what we already know; it is inefficient.
Running on about 1.8 - 2 million cases, it would take just about a whole day
to run the entire process from start to finish.  That isn't all processing
time and includes some time for the analyst to check results between
substeps, but I still knew that my day was full when I was working on IQI
reports.



To be fair though, there are a lot of other factors (beside efficiency
considerations) that go into AHRQ's program design.  First, there are a lot
of changes to that software every year.  In some cases it is easier and less
error prone to hardcode a few points in the data so that it is blatantly
obvious what to change next year should another analyst need to do so.  Second,
the organizations that use this software often require transparency and may
not have high level programmers on staff.  Writing code so that it is
accessible, editable, and interpretable by intermediate level programmers or
analysts is a plus.  Third, given that IQI reports are often produced on a
yearly basis, there's no real need to sacrifice clarity, etc. for efficiency
- you're only doing this process once a year.



There are other points that could be made, but the main idea is I don't
think it's fair to hold this software up, out of context, as an example of
SAS's (or even AHRQ's) inefficiencies.  I agree that SAS syntax is nowhere
near as elegant or as powerful as R from a programming standpoint, that's
why after 7 years of using SAS I switched to R.  But comparing the two at
that level is like a racing a Ferrari and a Bentley to see which is the
better car.


Dear Anonymous,

Nice points.  I would just add that it would be better if 
government-sponsored projects would result in software that could be run 
without expensive licenses.


Thanks
Frank



[[alternative HTML version deleted]]





--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Frank E Harrell Jr

John Sorkin wrote:

Terry's remarks (see below) are well received; however, I take issue with one part of his comments. 
As a long time programmer (in both statistical programming languages and 
traditional programming languages), I miss the ability to write native-language macros in R. 
While macros can make for difficult-to-read code, when used properly, they can also make flexible 
code that, if properly written (including good documentation, which should be a part of any code), 
can be easy to read.

Finally, everyone must remember that SAS code can be difficult to understand or 
inefficient just as R code can be difficult to understand or inefficient. 
In the end, both programming systems have their advantages and disadvantage. No programming 
language is perfect. It is not fair, nor correct to damn one or the other. Accept the fact that 
some things are more easily and more clearly done in one language, other things are more clearly 
and more easily done in another language.  Let's move on to more important issues, viz. improving R 
so it is as good as it possibly can be.
John  


Nice points John.  My only response is that I learned SAS in 1969 and 
used it intensively until 1991.  I wrote some of the first 
user-contributed SAS procedures (PROCs PCTL, GRAPH, DATACHK, LOGIST, 
PHGLM) and wrote extensively in the macro language.  After using S-Plus 
for only one month my productivity was far ahead of my productivity 
using SAS.


Frank



  


John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)


Terry Therneau thern...@mayo.edu 2/27/2009 10:23 AM 

Three comments

 I actually think you can write worse code in R than in SAS: more tools = more 
scope for innovatively bad ideas.  The ability to write bad code should not damn 
a language.  
 
  I found almost all of the improvements to the multi-line SAS recode to be 
regressions, both the SAS and the S suggestions. 
a. Everyone, even those of you with no SAS backround whatsoever, immediately 
understood the code.  Most of the replacements are obscure.  Compilers are very 
good these days and computers are fast, fewer typed characters != better.
b. If I were writing the S code for such an application, it would look much 
the same.  I worked as a programmer in medical research for several years, and 
one of the things that moved me on to graduate studies in statistics was the 
realization that doing my best work meant being as UN-clever as possible in my 
code.  

  Frank's comments imply that he was reading SAS macro code at the moment of 
peak frustration.  And if you want to criticise SAS code, this is the place to 
look.  SAS macro started out as some simple expansions, then got added on to, 
then added on again, and again, and   with no overall blueprint.  It is much 
like the farmhouse of some neighbors of mine growing up: 4 different expansions 
in 4 eras, and no overall guiding plan.  The interior layout was interesting 
to say the least. I was once a bona fide SAS 'wizard' (and Frank was much better 
than me), and I can't read the stuff without grinding my teeth.
  S was once headed down the same road. One of the best things ever with the 
language was documented in the blue book The New S Language, where Becker et 
al had the wisdom to scrap the macro processor.  
 
  	Terry Therneau




Confidentiality Statement:
This email message, including any attachments, is for ...{{dropped:14}}


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] help with projection pursuit

2009-02-27 Thread Olivier MARTIN

Hi all,

I have some difficulties with the function ppr for projection pursuit 
regression.
I obtained the results for a projection pursuit regression and now I 
would like to

compute some predictions for new data.

I tried the function predict in the following way predict(res.ppr, 
newdata) but it seems
that it is not right. The data rock is given for illustration of the 
function ppr.


attach(rock)

rock.ppr <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock, nterms = 2, 
max.terms = 5)


So suppose I want to make a prediction for the point
area1=10,peri1=3 and shape=2. I tried
the command predict(rock.ppr, c(10,3,2))  but it returns
an error message.
So, could you indicate to me the right way to make this prediction?
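
For reference, predict methods for fitted models expect newdata to be a data
frame whose column names match the variables in the model formula. A sketch
along the lines of the ?ppr example (where area1 and peri1 are scaled
versions of area and peri):

```r
# as in the ?ppr example, build the scaled variables first
rock1 <- within(rock, { area1 <- area/10000; peri1 <- peri/10000 })
rock.ppr <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock1,
                nterms = 2, max.terms = 5)

# predict() wants newdata as a data frame with the formula's variable names
newpoint <- data.frame(area1 = 10, peri1 = 3, shape = 2)
predict(rock.ppr, newpoint)
```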

Thanks for your help.
Olivier.


--

-
Martin Olivier
INRA - Unité Biostatistique  Processus Spatiaux
Domaine St Paul, Site Agroparc
84914 Avignon Cedex 9, France
Tel : 04 32 72 21 57
Fax : 04 32 72 21 82

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help: locfit (local logistic regression)

2009-02-27 Thread Sharai Gomez
Hi,

I am running a local logistic regression using locfit. Now, I want to choose
the bandwidth using cross-validation. I don't know if there is an additional
command to do so or if I can do it within locfit itself. I would appreciate any
help about this matter. Thank you.

Regards,

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] add absolute value to bars in barplot

2009-02-27 Thread Greg Snow
Note that putting numbers near the top of the bars (either inside or outside) 
tends to create 'fuzzy' tops to the bars that make it harder for the viewer to 
quickly interpret the graph.  If the numbers are important, put them in a 
table.  If you really need to have the numbers and graph together then look at 
alternatives (some type of combined table/graph) or put the numbers in a margin 
of the graph where they will not distract from the graph itself.
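For readers who nonetheless want the labels, a common base-graphics idiom (my own sketch, not part of Greg's reply) uses the bar midpoints that barplot() returns:

```r
# Hypothetical data; barplot() returns the x positions of the bar centres.
heights <- c(23, 45, 12, 67)
mids <- barplot(heights, ylim = c(0, max(heights) * 1.15))
# pos = 3 places each label just above its bar.
text(mids, heights, labels = heights, pos = 3)
```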

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of soeren.vo...@eawag.ch
 Sent: Friday, February 27, 2009 5:33 AM
 To: r-help@r-project.org
 Subject: [R] add absolute value to bars in barplot
 
 Hello,
 
 barplot(twcons.area,
    beside=T, col=c("green4", "blue", "red3", "gray"),
    xlab="estate",
    ylab="number of persons", ylim=c(0, 110),
    legend.text=c("treated", "mix", "untreated", NA))
 
 produces a barplot very fine. In addition, I'd like to get the bars'
 absolute values on the top of the bars. How can I produce this in an
 easy way?
 
 Thanks
 
 Sören
 


[R] [R-pkgs] mefa 3.0-0

2009-02-27 Thread Peter Solymos
Dear R Community,

I am pleased to announce that a new version of the mefa R package is
available at the CRAN.

mefa is a package for multivariate data handling in ecology and
biogeography. It provides object classes to represent the data coded
by samples, taxa and segments (i.e., subpopulations, repeated
measures). It supports easy processing of the data along with
relational data tables for samples and taxa. An object of class mefa
is a project specific compendium of the dataset and can be easily used
in further analyses. Methods are provided for extraction, aggregation,
conversion, plotting, summary and reporting of mefa objects. Reports
can be generated in plain text or LaTeX.

The current version has been published in JSS (
http://www.jstatsoft.org/v29/i08 ). The paper presents worked examples
on a variety of ecological analyses.

Best wishes,

Péter

Péter Sólymos, PhD
Postdoctoral Fellow
Department of Mathematical and Statistical Sciences
University of Alberta
Edmonton, Alberta, T6G 2G1
Canada
email <- paste("solymos", "ualberta.ca", sep = "@")

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages



[R] testing two-factor anova effects using model comparison approach with lm() and anova()

2009-02-27 Thread Paul Gribble
I wonder if someone could explain the behavior of the anova() and lm()
functions in the following situation:

I have a standard 3x2 factorial design, factorA has 3 levels, factorB has 2
levels, they are fully crossed. I have a dependent variable DV.

Of course I can do the following to get the usual anova table:

 anova(lm(DV~factorA+factorB+factorA:factorB))
Analysis of Variance Table

Response: DV
                Df  Sum Sq Mean Sq F value   Pr(>F)
factorA          2  7.4667  3.7333  4.9778 0.015546 *
factorB          1  2.1333  2.1333  2.8444 0.104648
factorA:factorB  2  9.8667  4.9333  6.5778 0.005275 **
Residuals       24 18.0000  0.7500

This is perfectly satisfactory for my situation, but as a pedagogical
exercise, I wanted to demonstrate the model comparison approach to analysis
of variance by using anova() to compare a full model that contains all
effects, to restricted models that contain all effects save for the effect
of interest.

The test of the interaction effect seems to be as I expected:

 fullmodel <- lm(DV~factorA+factorB+factorA:factorB)
 restmodel <- lm(DV~factorA+factorB)
 anova(fullmodel,restmodel)
Analysis of Variance Table

Model 1: DV ~ factorA + factorB + factorA:factorB
Model 2: DV ~ factorA + factorB
  Res.Df     RSS Df Sum of Sq      F   Pr(>F)
1     24 18.0000
2     26 27.8667 -2   -9.8667 6.5778 0.005275 **

As you can see the value of F (6.5778) is the same as in the anova table
above. All is well.

However, if I try to test a main effect, e.g. factorA, by testing the full
model against a restricted model that doesn't contain the main effect
factorA, I get something strange:

 restmodel <- lm(DV~factorB+factorA:factorB)
 anova(fullmodel,restmodel)
Analysis of Variance Table

Model 1: DV ~ factorA + factorB + factorA:factorB
Model 2: DV ~ factorB + factorA:factorB
  Res.Df RSS Df Sum of Sq F Pr(>F)
1     24  18
2     24  18  0         0

upon inspection of each model I see that the Residuals are identical, which
is not what I was expecting:

 anova(fullmodel)
Analysis of Variance Table

Response: DV
                Df  Sum Sq Mean Sq F value   Pr(>F)
factorA          2  7.4667  3.7333  4.9778 0.015546 *
factorB          1  2.1333  2.1333  2.8444 0.104648
factorA:factorB  2  9.8667  4.9333  6.5778 0.005275 **
Residuals       24 18.0000  0.7500

This looks fine, but then the restricted model is where things are not as I
expected:

 anova(restmodel)
Analysis of Variance Table

Response: DV
                Df  Sum Sq Mean Sq F value   Pr(>F)
factorB          1  2.1333  2.1333  2.8444 0.104648
factorB:factorA  4 17.3333  4.3333  5.7778 0.002104 **
Residuals       24 18.0000  0.7500

I was expecting the Residuals in the restricted model (the one not
containing main effect of factorA) to be larger than in the full model
containing all three effects. In other words, the variance accounted for by
the main effect factorA should be added to the Residuals. Instead, it looks
like the variance accounted for by the main effect of factorA is being
soaked up by the factorA:factorB interaction term. Strangely, the degrees of
freedom are also affected.

I must be misunderstanding something here. Can someone point out what is
happening?

Thanks,

-Paul

-- 
Paul L. Gribble, Ph.D.
Associate Professor
Dept. Psychology
The University of Western Ontario
London, Ontario
Canada N6A 5C2
Tel. +1 519 661 2111 x82237
Fax. +1 519 661 3961
pgrib...@uwo.ca
http://gribblelab.org



Re: [R] formula formatting/grammar for regression

2009-02-27 Thread BKMooney

This is just (or should be) just a simple example of what I would like to
extend to further regression - which is why I was looking for a resource on
the grammar.  

If I try:
lm(ypts ~ exp(xpts)), I only get an intercept and one coefficient, and I am
not sure where that coefficient should go (i.e., is it A or r in
the formula y = A*exp(r*x)?)

Also, when I tried to use nls, I get an error:  
nls(ypts ~ exp(xpts))
Error in getInitial.default(func, data, mCall = as.list(match.call(func,  : 
  no 'getInitial' method found for function objects

If someone could please point out what I am doing wrong, or point me to a
good resource on this, I would greatly appreciate it.  

Thanks!


Dieter Menne wrote:
 
 Brigid Mooney bkmooney at gmail.com writes:
 
 I am doing some basic regression analysis, and am getting a bit
 confused on how to enter non-polynomial formulas to be used.
 ..
 But am confused on what the formula should be for trying to find a fit
 to y = A*exp(r*x).
 
 If this example is just a placeholder for more complex than poly,
 you should check function nls which works for non-linear functions.
 
 However, if you really want to solve only this problem, taking the 
 log of your data and fitting the log of the above function with lm()
 is the easiest way out. Results can be a bit different from the
 nonlinear case depending on noise, because in one case the weights
 are log-weighted, in the other linear.
 
 Dieter
 
 
 

-- 
View this message in context: 
http://www.nabble.com/formula-formatting-grammar-for-regression-tp22249014p22251094.html
Sent from the R help mailing list archive at Nabble.com.
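The two routes Dieter describes, linearizing with lm() versus nls(), can be sketched like this (my own toy data; note that a bare `nls(ypts ~ exp(xpts))` fails precisely because the formula contains no parameters to estimate):

```r
# Fitting y = A * exp(r * x) two ways, on simulated data.
set.seed(42)
xpts <- seq(0, 2, length = 50)
ypts <- 3 * exp(1.5 * xpts) * exp(rnorm(50, sd = 0.05))

# 1) Linearize: log(y) = log(A) + r * x, so lm() returns log(A) and r.
lfit  <- lm(log(ypts) ~ xpts)
A.hat <- exp(coef(lfit)[[1]])   # intercept is log(A)
r.hat <- coef(lfit)[[2]]        # slope is r

# 2) Nonlinear least squares, with named parameters and explicit
#    starting values seeded from the linear fit.
nfit <- nls(ypts ~ A * exp(r * xpts), start = list(A = A.hat, r = r.hat))
coef(nfit)
```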



Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Ajay ohri
A further example of software pricing dynamics

is the complete lack of awareness of WPS, a UK-based software package which
is basically a Base SAS clone with all the features of SAS (code read/write
and data read/write), priced at only $660 per desktop and $1,400 for a
server license: very, very cheap compared to Base SAS. And it has a Bridge
to R for higher-level statistics...

You would think a corporate user would have no hesitation switching to a
clone priced at 10%...

yet there are hardly any takers for it..in the federal government...
:))

people worried about their government's spending should use the new website
http://www.recovery.gov/?q=content/contact

it is supposed to chronicle this and it would be a good test and control for
the Web 2.0 initiatives..

On Fri, Feb 27, 2009 at 11:18 PM, Frank E Harrell Jr 
f.harr...@vanderbilt.edu wrote:

 spam me wrote:

 I've actually used AHRQ's software to create Inpatient Quality Indicator
 reports.  I can confirm pretty much what we already know; it is
 inefficient.
 Running on about 1.8 - 2 million cases, it would take just about a whole
 day
 to run the entire process from start to finish.  That isn't all processing
 time and includes some time for the analyst to check results between
 substeps, but I still knew that my day was full when I was working on IQI
 reports.



 To be fair though, there are a lot of other factors (beside efficiency
 considerations) that go into AHRQ's program design.  First, there are a
 lot
 of changes to that software every year.  In some cases it is easier and
 less
 error prone to hardcode a few points in the data so that it is blatantly
 obvious what to change next year should another analyst need to do so.
  Second,
 the organizations that use this software often require transparency and
 may
 not have high level programmers on staff.  Writing code so that it is
 accessible, editable, and interpretable by intermediate level programmers
 or
 analysts is a plus.  Third, given that IQI reports are often produced on a
 yearly basis, there's no real need to sacrifice clarity, etc. for
 efficiency
 - you're only doing this process once a year.



 There are other points that could be made, but the main idea is I don't
 think it's fair to hold this software up, out of context, as an example of
 SAS's (or even AHRQs) inefficiencies.  I agree that SAS syntax is nowhere
 near as elegant or as powerful as R from a programming standpoint, that's
 why after 7 years of using SAS I switched to R.  But comparing the two at
 that level is like a racing a Ferrari and a Bentley to see which is the
 better car.


 Dear Anonymous,

 Nice points.  I would just add that it would be better if
 government-sponsored projects would result in software that could be run
 without expensive licenses.

 Thanks
 Frank





 --
 Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University





Re: [R] help with projection pursuit

2009-02-27 Thread David Winsemius
In my experience (and per the help pages now that I look) the predict  
functions need named arguments that match up with the column names in  
the model and generally this needs to be supplied as a dataframe or a  
list.


(note: at least on my machine the rock dataframe does *not* have the  
names you offered)


predict(rock.ppr, list(area=10, peri= 3, shape=2))   # or...
predict(rock.ppr, data.frame(area=10, peri= 3, shape=2))

 predict(rock.ppr, list(area=10, peri= 3, shape=2))
   1
7.118094

--
David Winsemius

On Feb 27, 2009, at 10:09 AM, Olivier MARTIN wrote:


Hi all,

I have some difficulties with the function ppr for projection  
pursuit regression.
I obtained the results for a projection pursuit regression and now I  
would like to

compute some predictions for new data.

I tried the function predict in the following way predict(res.ppr,  
newdata) but it seems
that it is not right. The data rock is given for illustration of the  
function ppr.


attach(rock)

rock.ppr <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock,  
nterms = 2, max.terms = 5)



So suppose I want to make a prediction for the point
area1=10,peri1=3 and shape=2. I tried
the command predict(rock.ppr, c(10,3,2))  but it returns
an error message.
So, could you  indicate me the right way for this prediction?

Thanks for your help.
Olivier.


--

-
Martin Olivier
INRA - Unité Biostatistique & Processus Spatiaux
Domaine St Paul, Site Agroparc
84914 Avignon Cedex 9, France
Tel : 04 32 72 21 57
Fax : 04 32 72 21 82



Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread John Sorkin
Frank,
A programming language's efficiency is a function of several things, including 
what you are trying to program. Without using SAS PROC IML, I have found that 
it is more efficient to code algorithms (e.g. a least squares linear 
regression) using R than SAS; we all know that matrix notation leads to more 
compact syntax than non-matrix notation, and R implements matrix notation. On 
the other hand, searching, sub-setting, merging, etc. can at times be coded 
more efficiently, more easily, and in a more easily understood fashion in SAS. 
I am sure there are people who use SAS to set up their datasets and then use R 
when they are developing an algorithm. 

Just as French may be a better language to express love, Italian a better 
language in which to write opera, and English the most efficient language for 
communication (at least for the last 50 years), so too do both R and SAS have a 
place in the larger world.
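As a quick sketch of John's matrix-notation point (my own illustration on the built-in cars data, not from the thread), ordinary least squares is a few tokens in R's matrix syntax:

```r
# OLS in matrix form: beta = (X'X)^(-1) X'y.
X <- cbind(1, cars$speed)   # design matrix with an intercept column
y <- cars$dist
beta <- solve(crossprod(X), crossprod(X, y))
beta
# Agrees with the built-in modelling function:
coef(lm(dist ~ speed, data = cars))
```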
John 

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

 Frank E Harrell Jr f.harr...@vanderbilt.edu 2/27/2009 12:52 PM 
John Sorkin wrote:
 Terry's remarks (see below) are well received however, I take issue with one 
 part of his comments. As a long time programmer (in both statistical 
 programming languages and traditional programming languages), I miss the 
 ability to write native-languages in R. While macros can make for difficult 
 to read code, when used properly, they can also make flexible code that, if 
 properly written (including good documentation, which should be a part of any 
 code) can be easy to read.
 
 Finally, everyone must remember that SAS code can be difficult to understand 
 or inefficient just as R code can be difficult to understand or 
 inefficient. In the end, both programming systems have their advantages and 
 disadvantage. No programming language is perfect. It is not fair, nor correct 
 to damn one or the other. Accept the fact that some things are more easily 
 and more clearly done in one language, other things are more clearly and more 
 easily done in another language.  Let's move on to more important issues, 
 viz. improving R so it is as good as it possibly can be.
 John  

Nice points John.  My only response is that I learned SAS in 1969 and 
used it intensively until 1991.  I wrote some of the first 
user-contributed SAS procedures (PROCs PCTL, GRAPH, DATACHK, LOGIST, 
PHGLM) and wrote extensively in the macro language.  After using S-Plus 
for only one month my productivity was far ahead of my productivity 
using SAS.

Frank

 
   
 
 John David Sorkin M.D., Ph.D.
 Chief, Biostatistics and Informatics
 University of Maryland School of Medicine Division of Gerontology
 Baltimore VA Medical Center
 10 North Greene Street
 GRECC (BT/18/GR)
 Baltimore, MD 21201-1524
 (Phone) 410-605-7119
 (Fax) 410-605-7913 (Please call phone number above prior to faxing)
 
 Terry Therneau thern...@mayo.edu 2/27/2009 10:23 AM 
 Three comments
 
  I actually think you can write worse code in R than in SAS: more tools = more 
scope for innovatively bad ideas.  The ability to write bad code should not damn 
a language.  
  
   I found almost all of the improvements to the multi-line SAS recode to be 
 regressions, both the SAS and the S suggestions. 
 a. Everyone, even those of you with no SAS backround whatsoever, 
 immediately 
 understood the code.  Most of the replacements are obscure.  Compilers are 
 very 
 good these days and computers are fast, fewer typed characters != better.
 b. If I were writing the S code for such an application, it would look 
 much 
 the same.  I worked as a programmer in medical research for several years, 
 and 
 one of the things that moved me on to graduate studies in statistics was the 
 realization that doing my best work meant being as UN-clever as possible in 
 my 
 code.  
 
   Frank's comments imply that he was reading SAS macro code at the moment of 
 peak frustration.  And if you want to criticise SAS code, this is the place 
 to 
 look.  SAS macro started out as some simple expansions, then got added on to, 
then added on again, and again, and ... with no overall blueprint.  It is 
 much 
 like the farmhouse of some neighbors of mine growing up: 4 different 
 expansions 
 in 4 eras, and no overall guiding plan.  The interior layout was 
 interesting 
 to say the least. I was once a bona fide SAS 'wizard' (and Frank was much 
 better 
 than me), and I can't read the stuff without grinding my teeth.
   S was once headed down the same road. One of the best things ever with the 
 language was documented in the blue book The New S Language, where Becker 
 et 
 al had the wisdom to scrap the macro processor.  
  
   Terry Therneau

Re: [R] testing two-factor anova effects using model comparison approach with lm() and anova()

2009-02-27 Thread Greg Snow
Notice the degrees of freedom as well in the different models.  

With factors A and B, the 2 models:

A + B + A:B 

And 

A + A:B

Are actually the same overall model, just different parameterizations (you can 
also see this by using x=TRUE in the call to lm and looking at the x matrix 
used).

Testing if the main effect A should be in the model given that the interaction 
is in the model does not make sense in most cases, therefore the notation gives 
a different parameterization rather than the generally uninteresting test. 
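Greg's equivalence can be verified directly; a small sketch with simulated data (not Paul's actual data) shows the two formulas give identical fits:

```r
# A 3x2 fully crossed design with 5 replicates per cell.
set.seed(1)
d <- data.frame(A = gl(3, 10), B = gl(2, 5, 30), DV = rnorm(30))
full <- lm(DV ~ A + B + A:B, data = d)
rest <- lm(DV ~ B + A:B, data = d)
# Same column space, so identical fitted values and residual df;
# only the parameterization (the coefficients) differs.
all.equal(fitted(full), fitted(rest))    # TRUE
c(df.residual(full), df.residual(rest))  # both 24
```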

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Paul Gribble
 Sent: Friday, February 27, 2009 11:01 AM
 To: r-help@r-project.org
 Subject: [R] testing two-factor anova effects using model comparison
 approach with lm() and anova()
 
 I wonder if someone could explain the behavior of the anova() and lm()
 functions in the following situation:
 
 I have a standard 3x2 factorial design, factorA has 3 levels, factorB
 has 2
 levels, they are fully crossed. I have a dependent variable DV.
 
 Of course I can do the following to get the usual anova table:
 
  anova(lm(DV~factorA+factorB+factorA:factorB))
 Analysis of Variance Table
 
 Response: DV
                 Df  Sum Sq Mean Sq F value   Pr(>F)
 factorA          2  7.4667  3.7333  4.9778 0.015546 *
 factorB          1  2.1333  2.1333  2.8444 0.104648
 factorA:factorB  2  9.8667  4.9333  6.5778 0.005275 **
 Residuals       24 18.0000  0.7500
 
 This is perfectly satisfactory for my situation, but as a pedagogical
 exercise, I wanted to demonstrate the model comparison approach to
 analysis
 of variance by using anova() to compare a full model that contains all
 effects, to restricted models that contain all effects save for the
 effect
 of interest.
 
 The test of the interaction effect seems to be as I expected:
 
  fullmodel <- lm(DV~factorA+factorB+factorA:factorB)
  restmodel <- lm(DV~factorA+factorB)
  anova(fullmodel,restmodel)
 Analysis of Variance Table
 
 Model 1: DV ~ factorA + factorB + factorA:factorB
 Model 2: DV ~ factorA + factorB
    Res.Df     RSS Df Sum of Sq      F   Pr(>F)
  1     24 18.0000
  2     26 27.8667 -2   -9.8667 6.5778 0.005275 **
 
 As you can see the value of F (6.5778) is the same as in the anova
 table
 above. All is well.
 
 However, if I try to test a main effect, e.g. factorA, by testing the
 full
 model against a restricted model that doesn't contain the main effect
 factorA, I get something strange:
 
  restmodel <- lm(DV~factorB+factorA:factorB)
  anova(fullmodel,restmodel)
 Analysis of Variance Table
 
 Model 1: DV ~ factorA + factorB + factorA:factorB
 Model 2: DV ~ factorB + factorA:factorB
    Res.Df RSS Df Sum of Sq F Pr(>F)
  1     24  18
  2     24  18  0         0
 
 upon inspection of each model I see that the Residuals are identical,
 which
 is not what I was expecting:
 
  anova(fullmodel)
 Analysis of Variance Table
 
 Response: DV
                 Df  Sum Sq Mean Sq F value   Pr(>F)
 factorA          2  7.4667  3.7333  4.9778 0.015546 *
 factorB          1  2.1333  2.1333  2.8444 0.104648
 factorA:factorB  2  9.8667  4.9333  6.5778 0.005275 **
 Residuals       24 18.0000  0.7500
 
 This looks fine, but then the restricted model is where things are not
 as I
 expected:
 
  anova(restmodel)
 Analysis of Variance Table
 
 Response: DV
                 Df  Sum Sq Mean Sq F value   Pr(>F)
 factorB          1  2.1333  2.1333  2.8444 0.104648
 factorB:factorA  4 17.3333  4.3333  5.7778 0.002104 **
 Residuals       24 18.0000  0.7500
 
 I was expecting the Residuals in the restricted model (the one not
 containing main effect of factorA) to be larger than in the full model
 containing all three effects. In other words, the variance accounted
 for by
 the main effect factorA should be added to the Residuals. Instead, it
 looks
 like the variance accounted for by the main effect of factorA is being
 soaked up by the factorA:factorB interaction term. Strangely, the
 degrees of
 freedom are also affected.
 
 I must be misunderstanding something here. Can someone point out what
 is
 happening?
 
 Thanks,
 
 -Paul
 
 --
 Paul L. Gribble, Ph.D.
 Associate Professor
 Dept. Psychology
 The University of Western Ontario
 London, Ontario
 Canada N6A 5C2
 Tel. +1 519 661 2111 x82237
 Fax. +1 519 661 3961
 pgrib...@uwo.ca
 http://gribblelab.org
 

Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Bryan
My apologies, this obviously doubles as my "for registration purposes"
account and so I don't often send from it - I was not intentionally being so
secretive : )

At any rate, I completely agree, but of course it's a reciprocal
relationship.  The software is written in SAS because that's what the
organizations use, the organizations use SAS because that's what the
programs are written in...  For better or worse, SAS's integration in big
bureaucracies is the main thing that keeps it competitive in the marketplace
and viable.  There aren't a lot of other contexts in which their pricing
structure would work.

Bryan

On Fri, Feb 27, 2009 at 12:48 PM, Frank E Harrell Jr 
f.harr...@vanderbilt.edu wrote:

 spam me wrote:

 I've actually used AHRQ's software to create Inpatient Quality Indicator
 reports.  I can confirm pretty much what we already know; it is
 inefficient.
 Running on about 1.8 - 2 million cases, it would take just about a whole
 day
 to run the entire process from start to finish.  That isn't all processing
 time and includes some time for the analyst to check results between
 substeps, but I still knew that my day was full when I was working on IQI
 reports.



 To be fair though, there are a lot of other factors (beside efficiency
 considerations) that go into AHRQ's program design.  First, there are a
 lot
 of changes to that software every year.  In some cases it is easier and
 less
 error prone to hardcode a few points in the data so that it is blatantly
 obvious what to change next year should another analyst need to do so.
  Second,
 the organizations that use this software often require transparency and
 may
 not have high level programmers on staff.  Writing code so that it is
 accessible, editable, and interpretable by intermediate level programmers
 or
 analysts is a plus.  Third, given that IQI reports are often produced on a
 yearly basis, there's no real need to sacrifice clarity, etc. for
 efficiency
 - you're only doing this process once a year.



 There are other points that could be made, but the main idea is I don't
 think it's fair to hold this software up, out of context, as an example of
 SAS's (or even AHRQs) inefficiencies.  I agree that SAS syntax is nowhere
 near as elegant or as powerful as R from a programming standpoint, that's
 why after 7 years of using SAS I switched to R.  But comparing the two at
 that level is like a racing a Ferrari and a Bentley to see which is the
 better car.


 Dear Anonymous,

 Nice points.  I would just add that it would be better if
 government-sponsored projects would result in software that could be run
 without expensive licenses.

 Thanks
 Frank





 --
 Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University




Re: [R] Inefficiency of SAS Programming

2009-02-27 Thread Chu, Roy
Also because no one wants to put their neck out on a chopping block to
suggest R without technical support and the like.  If you use SAS,
there's a cascade of blame available, but it's not immediately
available for R.

On Fri, Feb 27, 2009 at 10:36 AM, Bryan thespamho...@gmail.com wrote:
 My apologies, this obviously doubles as my for registration purposes
 account and so I don't often send from it - I was not intentionally being so
 secretive : )

 At any rate, I completely agree, but of course it's a reciprocal
 relationship.  The software is written in SAS because that's what the
 organizations use, the organizations use SAS because that's what the
 programs are written in...  For better or worse, SAS's integration in big
 bureaucracies is the main thing that keeps it competitive in the marketplace
 and viable.  There aren't a lot of other contexts in which their pricing
 structure would work.

 Bryan

 On Fri, Feb 27, 2009 at 12:48 PM, Frank E Harrell Jr 
 f.harr...@vanderbilt.edu wrote:

 spam me wrote:

 I've actually used AHRQ's software to create Inpatient Quality Indicator
 reports.  I can confirm pretty much what we already know; it is
 inefficient.
 Running on about 1.8 - 2 million cases, it would take just about a whole
 day
 to run the entire process from start to finish.  That isn't all processing
 time and includes some time for the analyst to check results between
 substeps, but I still knew that my day was full when I was working on IQI
 reports.



 To be fair though, there are a lot of other factors (beside efficiency
 considerations) that go into AHRQ's program design.  First, there are a
 lot
 of changes to that software every year.  In some cases it is easier and
 less
 error prone to hardcode a few points in the data so that it is blatantly
 obvious what to change next year should another analyst need to do so.
  Second,
 the organizations that use this software often require transparency and
 may
 not have high level programmers on staff.  Writing code so that it is
 accessible, editable, and interpretable by intermediate level programmers
 or
 analysts is a plus.  Third, given that IQI reports are often produced on a
 yearly basis, there's no real need to sacrifice clarity, etc. for
 efficiency
 - you're only doing this process once a year.



 There are other points that could be made, but the main idea is I don't
 think it's fair to hold this software up, out of context, as an example of
 SAS's (or even AHRQs) inefficiencies.  I agree that SAS syntax is nowhere
 near as elegant or as powerful as R from a programming standpoint, that's
 why after 7 years of using SAS I switched to R.  But comparing the two at
 that level is like a racing a Ferrari and a Bentley to see which is the
 better car.


 Dear Anonymous,

 Nice points.  I would just add that it would be better if
 government-sponsored projects would result in software that could be run
 without expensive licenses.

 Thanks
 Frank





 --
 Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University






[R] select Intercept coefficients only

2009-02-27 Thread choonhong ang
Hi friends,

Is there a function to select the intercept coefficient only?

When I use coefficients() it shows me all the coefficients, but I only want
a specific coefficient.




Re: [R] Making tapply code more efficient

2009-02-27 Thread jim holtman
On something the size of your data it took about 30 seconds to
determine the number of unique teachers per student.

 x <- cbind(sample(326397, 800967, TRUE), sample(20, 800967, TRUE))
 # split the data so you have the teachers per student
 system.time(t.s <- split(x[,2], x[,1]))
   user  system elapsed
   0.92    0.01    0.94
 t.s[1:7]  # sample data
$`1`
[1] 16

$`2`
[1] 3

$`3`
[1] 1

$`4`
[1] 17

$`6`
[1]  9  9 19

$`7`
[1] 20

$`9`
[1]  3 16 16 10  8 17

 # count number of unique teachers per student
 system.time(t.a <- sapply(t.s, function(x) length(unique(x))))
   user  system elapsed
  20.17    0.10   20.26



 t.a[1:10]
 1  2  3  4  6  7  9 10 11 12
 1  1  1  1  2  1  5  1  1  1
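[The per-group sapply above is the slow step. As a hedged sketch of an alternative (my own, not from the thread): deduplicating the (student, teacher) pairs first and then tabulating avoids calling a function once per student. It is shown here on Harold's small example data, reusing his variable names.]

```r
# Sketch (not from the thread): count distinct teachers per student by
# dropping duplicated (student, teacher) pairs, then tabulating once.
qq <- data.frame(student_unique_id = factor(c(1, 1, 2, 2, 2)),
                 teacher_unique_id = factor(c(10, 10, 20, 20, 25)))
pairs <- unique(qq[c("student_unique_id", "teacher_unique_id")])
n.teachers <- table(pairs$student_unique_id)  # distinct teachers per student
same <- n.teachers == 1                       # TRUE if always the same teacher
```

[On the full 800967-row data this replaces 326397 per-group function calls with one `unique()` and one `table()` pass, which should be considerably faster.]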


On Fri, Feb 27, 2009 at 9:46 AM, Doran, Harold hdo...@air.org wrote:
 Previously, I posed the question pasted down below to the list and
 received some very helpful responses. While the code suggestions
 provided in response indeed work, they seem to only work with *very*
 small data sets and so I wanted to follow up and see if anyone had ideas
 for better efficiency. I was quite embarrased on this as our SAS
 programmers cranked out programs that did this in the blink of an eye
 (with a few variables), but R was spinning for days on my Ubuntu machine
 and ultimately I saw a message that R was killed.

 The data I am working with has 800967 total rows and 31 total columns.
 The ID variable I use as the index variable in tapply() has 326397
 unique cases.

 length(unique(qq$student_unique_id))
 [1] 326397

 To give a sense of what my data look like and the actual problem,
 consider the following:

 qq <- data.frame(student_unique_id = factor(c(1,1,2,2,2)),
 teacher_unique_id = factor(c(10,10,20,20,25)))

 This is a student achievement database where students occupy multiple
 rows in the data and the variable teacher_unique_id denotes the class
 the student was in. What I am doing is looking to see if the teacher is
 the same for each instance of the unique student ID. So, if I implement
 the following:

 same <- function(x) length( unique(x) ) == 1
 results <- data.frame(
        freq = tapply(qq$student_unique_id, qq$student_unique_id, length),
        tch = tapply(qq$teacher_unique_id, qq$student_unique_id, same)
 )

 I get the following results. I can see that student 1 appears in the
 data twice and the teacher is always the same. However, student 2
 appears three times and the teacher is not always the same.

 results
  freq   tch
 1    2  TRUE
 2    3 FALSE

 Now, implementing this same procedure to a large data set with the
 characteristics described above seems to be problematic in this
 implementation.

 Does anyone have reactions on how this could be more efficient such that
 it can run with large data as I described?

 Harold

 sessionInfo()
 R version 2.8.1 (2008-12-22)
 x86_64-pc-linux-gnu

 locale:
 LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.U
 TF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=
 C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATI
 ON=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base




 # Original question posted on 1/13/09
 Suppose I have a dataframe as follows:

 dat <- data.frame(id = c(1,1,2,2,2), var1 = c(10,10,20,20,25), var2 =
 c('foo', 'foo', 'foo', 'foobar', 'foo'))

 Now, if I were to subset by id, such as:

 subset(dat, id==1)
  id var1 var2
 1  1   10  foo
 2  1   10  foo

 I can see that the elements in var1 are exactly the same and the
 elements in var2 are exactly the same. However,

 subset(dat, id==2)
  id var1   var2
 3  2   20    foo
 4  2   20 foobar
 5  2   25    foo

 Shows the elements are not the same for either variable in this
 instance. So, what I am looking to create is a data frame that would be
 like this

 id      freq    var1    var2
 1       2       TRUE    TRUE
 2       3       FALSE   FALSE

 Where freq is the number of times the ID is repeated in the dataframe. A
 TRUE appears in the cell if all elements in the column are the same for
 the ID and FALSE otherwise. It is insignificant which values differ for
 my problem.

 The way I am thinking about tackling this is to loop through the ID
 variable and compare the values in the various columns of the dataframe.
 The problem I am encountering is that I don't think all.equal or
 identical are the right functions in this case.

 So, say I was wanting to compare the elements of var1 for id ==1. I
 would have

 x <- c(10,10)

 Of course, the following works

 all.equal(x[1], x[2])
 [1] TRUE

 As would a similar call to identical. However, what if I only have a
 vector of values (or if the column consists of names) that I want to
 assess for equality when I am trying to automate a process over
 thousands of cases? As in the example above, the vector may contain only
 two values or it may contain many more. The number of values in the
 vector differ by id.

 Any thoughts?

 Harold

 

Re: [R] select Intercept coefficients only

2009-02-27 Thread Uwe Ligges



choonhong ang wrote:

Hi friends,

Is there a function to select the intercept coefficient only?

When I use coefficients() it shows me all the coefficients, but I only want
a specific coefficient.


What about indexing, e.g. as in:

coefficients(some_lm_object)["(Intercept)"]

Uwe Ligges
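
[A minimal runnable illustration of Uwe's suggestion (my own example, using the `cars` dataset that ships with R):]

```r
# Minimal sketch: pull single named coefficients out of a fitted model.
fit <- lm(dist ~ speed, data = cars)             # 'cars' is built into R
intercept <- coefficients(fit)[["(Intercept)"]]  # the intercept alone
slope <- coefficients(fit)[["speed"]]            # any other term, by name
```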








