Re: [R] Pairwise T-Tests and Dunnett's Test (possibly using multcomp)

2011-03-01 Thread Joshua Wiley
Hi Paul,

Changing the factor levels will work (as you saw).  In this case, you
could also edit the contrast matrix.

## look at default contrasts
contrasts(gad$dosegrp)
model1 <- lm(hama ~ dosegrp, data = gad)
summary(model1)

## choose group 3 as base (comparison)
contrasts(gad$dosegrp) <- contr.treatment(n = 3, base = 3)
model1 <- lm(hama ~ dosegrp, data = gad)
summary(model1)

If you have MASS 4th ed., I believe a discussion of contrast matrices
starts on page 146.

As far as improvements, in general, I think it would be preferable to
let R's method dispatch system choose the method for summary rather
than specifying summary.aov() yourself.  You might also consider more
spaces (e.g., between arguments).

Cheers,

Josh

On Tue, Mar 1, 2011 at 3:45 PM, Paul Miller  wrote:
> Hello Everyone,
>
> Figured out one part of the code. Setting the reference level for a factor is 
> accomplished using the relevel function (pg. 383 of MASS; pg. 70 of Data 
> Manipulation with R):
>
> gad$dosegrp <- relevel(gad$dosegrp,3)
>
> This works very well. Much better than using a format in SAS procedures that 
> don't allow the "ref=" option for instance.
>
> Does anyone have suggestions about how to improve other aspects of my code?
>
> Thanks,
>
> Paul

> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



Re: [R] Difference in numeric Dates between Excel and R

2011-03-01 Thread Prof Brian Ripley

On Wed, 2 Mar 2011, Luis Felipe Parra wrote:


Hello. I am using some dates I read from Excel in R. I know the Excel origin
is supposed to be 1900-1-1. But when I used as.Date with origin="1900-1-1", the
dates that R reported were two days ahead of the ones I read from
Excel. I noticed the following when I did this in R:


as.Date("2011-3-4")-as.Date("1900-1-1")

Time difference of 40604 days

but if I do the same operation in Excel the answer is 40605. Does anybody
know what can be going on?


We cannot know: you say a difference of 2 and report 1!

As the examples in ?as.Date say:

 ## Excel is said to use 1900-01-01 as day 1 (Windows default) or
 ## 1904-01-01 as day 0 (Mac default), but this is complicated by Excel
 ## thinking 1900 was a leap year.
 ## So for recent dates from Windows Excel
 as.Date(35981, origin="1899-12-30") # 1998-07-05
 ## and Mac Excel
 as.Date(34519, origin="1904-01-01") # 1998-07-05

So the origin you used is off by 2 days: one for the origin being day 
1 and one for Windows Excel's ignorance of the calendar.


Note too that these are *default*: they can be changed in Excel.
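For example (untested, but consistent with the help-page origins quoted above):

```r
## Windows Excel's serial number for 2011-03-04, reconstructed in R
serial <- as.numeric(as.Date("2011-3-4") - as.Date("1899-12-30"))
serial                                  # 40606
as.Date(serial, origin = "1899-12-30")  # back to "2011-03-04"
## Excel's reported difference 40605 is serial(2011-03-04) - serial(1900-01-01)
## = 40606 - 1: subtracting two serials cancels the day-1 offset, so only
## the phantom 1900-02-29 remains as the discrepancy with R's 40604.
```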


Thank you
Felipe Parra




PLEASE do try to do your own homework (and not send HTML), as we 
requested there.  It is galling that you ask here about bugs in Excel, 
bugs that are even documented in R's help.  In future, please use the 
Microsoft help you paid for with Excel if it disagrees with R.


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



[R] Pairwise T-Tests and Dunnett's Test (possibly using multcomp)

2011-03-01 Thread Paul Miller
Hello Everyone,
 
Figured out one part of the code. Setting the reference level for a factor is 
accomplished using the relevel function (pg. 383 of MASS; pg. 70 of Data 
Manipulation with R):
 
gad$dosegrp <- relevel(gad$dosegrp,3)
 
This works very well. Much better than using a format in SAS procedures that 
don't allow the "ref=" option for instance.
 
Does anyone have suggestions about how to improve other aspects of my code?
 
Thanks,
 
Paul
 





[R] a question on sqldf's handling of missing value and factor

2011-03-01 Thread xin wei
Dear subscribers:

I am using the following code to read a large number of big text files:
library(sqldf)
tempd <- file()
tempdx <- sqldf("select * from tempd", dbname = tempfile(), file.format =
list(header = T, sep="\t", row.names = F))

The problem is: all my numerical variables become factors (maybe because
these columns all contain missing values). It would be quite cumbersome to
convert them to numeric variables using as.numeric one by one. Does anyone
know how to set up sqldf so that it automatically reads a numeric
column with missing rows as truly numeric instead of as a factor?
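Pending a proper sqldf-side fix, one generic workaround (untested; `fix_types`
is a made-up helper name, nothing sqldf-specific) is to re-convert the factor
columns afterwards with type.convert():

```r
## Re-convert each factor column: numeric-looking columns become numeric
## (missing entries map to NA); genuinely non-numeric columns come back
## as character rather than factor
fix_types <- function(d) {
  d[] <- lapply(d, function(col) {
    if (is.factor(col)) type.convert(as.character(col), as.is = TRUE) else col
  })
  d
}
## tempdx <- fix_types(tempdx)
```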

many thanks




Re: [R] getting attributes of list without the "names".

2011-03-01 Thread Jeroen Ooms
Thank you, this is very helpful. I am a little confused now about the
structure of a list, though. If the names of the list elements are truly an
attribute that is stored in another list, wouldn't this lead to an infinitely
recursing object?

How could I iterate over the full object tree without getting into infinite
recursion, if every list's attributes are another list with at least a 'names'
attribute?





On Tue, Mar 1, 2011 at 2:30 AM, Duncan Murdoch wrote:

> On 11-02-28 11:17 PM, Jeroen Ooms wrote:
>
>> I am trying to encode arbitrary S3 objects by recursively looping over the
>> object and all its attributes. However, there is an unfortunate feature of
>> the attributes() function that is causing trouble. From the manual for
>> ?attributes:
>>
>> The names of a pairlist are not stored as attributes, but are reported as
>> if
>> they were (and can be set by the replacement method for attributes).
>>
>> Now because of this, my program ends up in infinite recursion, because it
>> will try to encode
>> attributes(attributes(attributes(attributes(list(foo=123))))) etc.  I can't
>> remove the 'names' attribute, because this will actually affect the list
>> structure. And even when I do:
>>
>> attributes(attributes(obj)[names(attributes(obj)) != "names"])
>>
>> This will keep giving me a named list. Is there any way I can get the
>> attributes() of a list without it reporting the names of a list as
>> attributes? I.e it should hold that:
>>
>> atr1 <- attributes(list(foo="bar"))
>> atr2 <- attributes(list())
>> identical(atr1, atr2)
>>
>
> The names of a list (a generic vector) are attributes, just like the names
> of other vectors.  The documentation is talking about pairlists, a mostly
> internal structure, used for example to store parts of expressions.  So your
> premise might be wrong about the cause of the recursion...
>
> But assuming you really want to see all attributes except names.  Then just
> write your own version:
>
> nonameattributes <- function(obj) {
>  result <- attributes(obj)
>  if (!is.null(result$names))
>result$names <- NULL
>
>  # This removes the empty names of the result if there were no other
>  # attributes.  It's optional, but you said you wanted
>  # identical(atr1, atr2)
>
>  if (!length(result))
>names(result) <- NULL
>
>  result
> }
>
> You can make the conditional more complicated, only making the change for
> pairlists, etc., using tests on typeof(obj) or other tests.
>
> Duncan Murdoch
>
>
>
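For example (untested), the helper can be checked directly against the
identity requested in the original question:

```r
## using nonameattributes() as defined above
atr1 <- nonameattributes(list(foo = "bar"))
atr2 <- nonameattributes(list())
identical(atr1, atr2)
```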




Re: [R] accessing variables inside a function, inside a loop

2011-03-01 Thread Joshua Wiley
Hi Alan,

Other more knowledgeable people may have better opinions on this than
I do.  Manipulating language and call objects is seriously stretching
my skills.

In any case, two ways come to mind, both of them sufficiently
cumbersome I would seriously question the value (btw, this is a
completely different question, right?).  To borrow from Barry
Rowlingson, I'd like to prefix all these 'solutions' with "Here's how
to do it, but don't actually do it."  The first option would be to
manually construct a call to glm() and then evaluate it.  That is,
rather than pass "frm" to the formula argument, construct a text
string of the entire glm call.  Something like:

#
foo <- function(y) {
  paste("glm(formula = ", y, " ~ hp * wt, data = mtcars)", sep = '')
}
foo("mpg")
eval(parse(text = foo("mpg")))
#

The other thought would be to just update the part of the model object
containing the call (I actually like this better than my first option,
though I'm still not fond of it).  Assuming you do not actually need
the entire call, you could easily use the formula.  Here's an example:

#
xReg <- function(y) {
  frm <- eval(substitute(p ~ hp * wt, list(p = as.name(y))))
  mod <- glm(frm, data = mtcars)
  mod$call <- frm
  return(mod)
}

xReg("mpg")
#
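An additional aside (untested sketch of my own, not something Alan asked for):
constructing the fit with do.call() also makes the printed call show the
expanded formula, because do.call() evaluates its arguments before building
the call:

```r
## the recorded call contains the actual formula, not the symbol `frm`
xReg3 <- function(y) {
  frm <- eval(substitute(p ~ hp * wt, list(p = as.name(y))))
  do.call("glm", list(formula = frm, data = quote(mtcars)))
}
xReg3("mpg")
## Call:  glm(formula = mpg ~ hp * wt, data = mtcars)
```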


If you want to know what the formula used for a model is, my
suggestion would be to simply have your function xReg() return both
the model object AND the formula you used (i.e., "frm").  Here is an
example:

#
## *my* preference

xReg <- function(y) {
  frm <- eval(substitute(p ~ hp * wt, list(p = as.name(y))))
  mod <- glm(frm, data = mtcars)
  output <- list(formula = frm, model = mod)
  attributes(output$formula) <- NULL
  return(output)
}

xReg("mpg")
#

Side note, Dr. Bates (author of lme4, genius, and a nice, helpful
person to boot) taught me how to use substitute() for something I
tried once on the ggplot2 list.

Cheers,

Josh

On Tue, Mar 1, 2011 at 7:04 PM, zListserv  wrote:
> Joshua
>
> Great solution.  Taking off on your code, the following works but does not 
> display the names of the variables in the formula.  Any suggestions about how 
> to modify the function so that it displays the correct formula (e.g., 
> "glm(formula = y1 ~ x1 * x2, data = dat)" instead of "glm(formula = frm, data 
> = dat)")?
>
> R> x = runif(2000)
> R> y = runif(2000)
> R> z = runif(2000)
> R>
> R> y1 = y * x
> R> y2 = y * sqrt(x)
> R>
> R> x1 = y1 / y2 + z
> R> x2 = y2 / y1 * z + z
> R>
> R> dat = data.frame(y1,y2,x1,x2)
> R>
> R> xReg = function(y) {
> +
> +       frm = eval(substitute(p ~ x1 * x2, list(p = as.name(y))))
> +       mod = glm(frm, data=dat)
> +       }
> R>
> R> lapply(names(dat[,1:2]), xReg)
> [[1]]
>
> Call:  glm(formula = frm, data = dat)
>
> Coefficients:
> (Intercept)           x1           x2        x1:x2
> -0.1882452    0.4932059    0.0667401   -0.1310084
>
> Degrees of Freedom: 1999 Total (i.e. Null);  1996 Residual
> Null Deviance:      99.15032
> Residual Deviance: 67.71775     AIC: -1085.354
>
> [[2]]
>
> Call:  glm(formula = frm, data = dat)
>
> Coefficients:
> (Intercept)            x1            x2         x1:x2
> -0.005464627   0.386937367   0.037363416  -0.094136334
>
> Degrees of Freedom: 1999 Total (i.e. Null);  1996 Residual
> Null Deviance:      112.7078
> Residual Deviance: 90.24796     AIC: -510.9287
>
> ---
>
> Thanks,
>
> Alan
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



[R] the features of the truth

2011-03-01 Thread Alexy Khrabrov
This is really a statistics problem, so I wonder which R packages can be 
employed best to solve and visualize it.

I run a lot of simulations to approach the truth.  The truth is a result of 
very complex computations, and is a real number.  The closer it is to 0, the 
truthier it is.

Each simulation has a set of features, some of which are not available for all 
simulations.  Some of the features are numeric (week), some boolean (utility), 
while others are factors.

Each simulation has the final value, the dm column in the data frame.  The 
names of the simulations are rownames of the data frame, and feature names are 
the column names.  Here's the dataframe:

http://dl.dropbox.com/u/9300701/Data/sf.dm.pos.r

You read it in R with

sf <- read.table("sf.dm.pos.r")

Seeking the truth questions:

-- What kinds of GLM and other models can we run to determine which features 
are most contributing to the truth, i.e. making dm closer to 0?

-- What kind of clustering can emphasize the most contributing features?

-- What kind of visualizations can be used to make it clear which features 
affect the truth the most, and in which combinations?  What kind of color 
visualizations are there to make the truth even clearer?
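For example, one minimal (untested) starting point, taking |dm| as the thing
to be explained (absdm is a made-up column name, and this assumes sf has been
read in as above):

```r
## model how far dm sits from 0 as a function of all the other features
sf$absdm <- abs(sf$dm)
fit <- lm(absdm ~ . - dm, data = sf)
summary(fit)  # negative coefficients flag features associated with dm nearer 0
```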

Cheers,
Alexy




Re: [R] merge in data.tables -- "non-visible"

2011-03-01 Thread Steve Lianoglou
Hi Ted,

On Tue, Mar 1, 2011 at 9:45 PM, Ted Rosenbaum  wrote:
> Hi,
> I am trying to use the merge command in the data.tables package.
> However, when I run the command I am not sure if it is running the merge
> command from the base package or the merge command from data.tables.
> When I run "methods(generic.function="merge")' it informs me that
> 'merge.data.table" is "non-visible".
> I am just trying to run the merge command on two data tables using the
> index, is there anything else that I need to do (my googling has simply left
> me uncertain about how to get this to work).
> Thanks for your help!

Assuming everything is "normal", I'm going to bet the merge.data.table
function is the one that is being used.

Assuming you are using version <= 1.5.3, though, an easy way to check
is to see if the result of the merge ignores the `suffixes` argument.
The behavior of merge is being changed for the next version, but this
"feature" is an easy way for you to check which merge function is
being used in the current version ;-)
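For what it's worth, "non-visible" just means the method is not exported from
the package's namespace; dispatch still finds it. To inspect it directly
(untested):

```r
library(data.table)
## getS3method() retrieves S3 methods even when they are not exported
getS3method("merge", "data.table")
## or equivalently: data.table:::merge.data.table
```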

Hope that helps,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] problems with playwith

2011-03-01 Thread Michael Lawrence
Do you get an error message in a dialog box? My guess is that you just need
to update your GTK+.

On Tue, Mar 1, 2011 at 8:58 AM, R Heberto Ghezzo, Dr <
heberto.ghe...@mcgill.ca> wrote:

> hello, i tried to run playwith but :
>
> > library(playwith)
> Loading required package: lattice
> Loading required package: cairoDevice
> Loading required package: gWidgetsRGtk2
> Loading required package: gWidgets
> Error in inDL(x, as.logical(local), as.logical(now), ...) :
>  unable to load shared object 'H:/R/cran/RGtk2/libs/i386/RGtk2.dll':
>  LoadLibrary failure:  The specified procedure could not be found.
> Failed to load RGtk2 dynamic library, attempting to install it.
> Learn more about GTK+ at http://www.gtk.org
> If the package still does not load, please ensure that GTK+ is installed
> and that it is on your PATH environment variable
> IN ANY CASE, RESTART R BEFORE TRYING TO LOAD THE PACKAGE AGAIN
> Error : .onAttach failed in attachNamespace() for 'gWidgetsRGtk2', details:
>  call: .Call(name, ..., PACKAGE = PACKAGE)
>  error: C symbol name "S_gtk_icon_factory_new" not in DLL for package
> "RGtk2"
> Error: package 'gWidgetsRGtk2' could not be loaded
> >
> > Sys.getenv("PATH")
>
>
>
>
>PATH
> "H:\\R/GTK/bin;H:\\R/GTK/lib;H:\\R/ImageMagick;C:\\windows\\system32;C:\\windows;C:\\windows\\System32\\Wbem;C:\\windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Program
> Files\\Common Files\\Ulead Systems\\MPEG;C:\\Program
> Files\\QuickTime\\QTSystem\\;H:\\R\\GTK\\GTK2-Runtime\\bin;H:\\PortableUSB/PortableApps/MikeTex/miktex/bin"
> >
> the packages (lattice, cairoDevice, gWidgetsRGtk2, gWidgets, RGtk2, playwith)
> were reinstalled
> program GTK was reinstalled.
> using R-2-12-2 on Windows 7
> Can anybody suggest a solution?
> thanks
>
> R.Heberto Ghezzo Ph.D.
> Montreal - Canada
>




Re: [R] accessing variables inside a function, inside a loop

2011-03-01 Thread zListserv
Joshua

Great solution.  Taking off on your code, the following works but does not 
display the names of the variables in the formula.  Any suggestions about how 
to modify the function so that it displays the correct formula (e.g., 
"glm(formula = y1 ~ x1 * x2, data = dat)" instead of "glm(formula = frm, data = 
dat)")?

R> x = runif(2000)
R> y = runif(2000)
R> z = runif(2000)
R> 
R> y1 = y * x
R> y2 = y * sqrt(x)
R> 
R> x1 = y1 / y2 + z
R> x2 = y2 / y1 * z + z
R> 
R> dat = data.frame(y1,y2,x1,x2)
R>
R> xReg = function(y) {
+ 
+   frm = eval(substitute(p ~ x1 * x2, list(p = as.name(y))))
+   mod = glm(frm, data=dat)
+   }
R> 
R> lapply(names(dat[,1:2]), xReg)
[[1]]

Call:  glm(formula = frm, data = dat)

Coefficients:
(Intercept)          x1          x2       x1:x2  
 -0.1882452   0.4932059   0.0667401  -0.1310084  

Degrees of Freedom: 1999 Total (i.e. Null);  1996 Residual
Null Deviance:  99.15032 
Residual Deviance: 67.71775 AIC: -1085.354 

[[2]]

Call:  glm(formula = frm, data = dat)

Coefficients:
 (Intercept)            x1            x2         x1:x2  
-0.005464627   0.386937367   0.037363416  -0.094136334  

Degrees of Freedom: 1999 Total (i.e. Null);  1996 Residual
Null Deviance:  112.7078 
Residual Deviance: 90.24796 AIC: -510.9287 

---

Thanks,

Alan


[R] merge in data.tables -- "non-visible"

2011-03-01 Thread Ted Rosenbaum
Hi,
I am trying to use the merge command in the data.table package.
However, when I run the command I am not sure whether it is running the merge
command from the base package or the merge command from data.table.
When I run methods(generic.function = "merge") it informs me that
merge.data.table is "non-visible".
I am just trying to run the merge command on two data tables using the
index; is there anything else that I need to do (my googling has simply left
me uncertain about how to get this to work).
Thanks for your help!

Ted Rosenbaum
Graduate Student
Department of Economics
Yale University




[R] Plotting a 3D histogram

2011-03-01 Thread Robert A'gata
Hi - I am wondering if there is any package that does plotting of
joint histogram between 2 variables, i.e. f(x,y). I found rgl but it
seems not so intuitive to use. I'm wondering if there is any
alternative. Thank you.
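For example, a quick base-graphics sketch with no extra packages (untested;
the number of bins is arbitrary): bin both variables with cut() and draw the
joint counts with persp():

```r
set.seed(1)                            # example data
x <- rnorm(1000); y <- rnorm(1000)
z <- table(cut(x, 20), cut(y, 20))     # joint counts f(x, y)
persp(unclass(z), theta = 30, phi = 25,
      xlab = "x", ylab = "y", zlab = "count")
```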

Robert



Re: [R] Difference in numeric Dates between Excel and R

2011-03-01 Thread David Scott

On 2/03/2011 12:31 p.m., Nordlund, Dan (DSHS/RDA) wrote:

-Original Message- From: r-help-boun...@r-project.org
[mailto:r-help-bounces@r- project.org] On Behalf Of Luis Felipe
Parra Sent: Tuesday, March 01, 2011 3:07 PM To: r-help Subject: [R]
Difference in numeric Dates between Excel and R

Hello. I am using some dates I read from Excel in R. I know the
Excel origin is supposed to be 1900-1-1. But when I used as.Date
with origin="1900-1-1", the dates that R reported were two days
ahead of the ones I read from Excel. I noticed the following when
I did this in R:


as.Date("2011-3-4")-as.Date("1900-1-1")

Time difference of 40604 days

but if I do the same operation in Excel the answer is 40605. Does
anybody know what can be going on?



I think so.  It is a known problem that Excel thinks 1900 was a leap
year, but it was not.  So Excel counts an extra day (for nonexistent
Feb 29, 1900).  In addition,  Excel considers "1900-01-01" as day 1,
not day 0.

Hope this is helpful,

Dan


An explanation which seems reasonably authoritative is given here:
http://www.cpearson.com/excel/datetime.htm


David Scott


Daniel J. Nordlund Washington State Department of Social and Health
Services Planning, Performance, and Accountability Research and Data
Analysis Division Olympia, WA 98504-5204





--
_
David Scott Department of Statistics
The University of Auckland, PB 92019
Auckland 1142, NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email:  d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018

Director of Consulting, Department of Statistics



[R] does rpy support R 2.12.2

2011-03-01 Thread Pete Shepard
Hi,

I am getting the following error when I try to run "import rpy" from the
Python IDE:

Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/lib/python2.6/dist-packages/rpy.py", line 134, in 
""" % RVERSION)
RuntimeError: No module named _rpy2122

  RPy module can not be imported. Please check if your rpy
  installation supports R 2.12.2. If you have multiple R versions
  installed, you may need to set RHOME before importing rpy. For
  example:

  >>> from rpy_options import set_options
  >>> set_options(RHOME='c:/progra~1/r/rw2011/')
  >>> from rpy import *


I am wondering if rpy supports R 2.12.2?

Thanks




Re: [R] Difference in numeric Dates between Excel and R

2011-03-01 Thread Nordlund, Dan (DSHS/RDA)
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of Luis Felipe Parra
> Sent: Tuesday, March 01, 2011 3:07 PM
> To: r-help
> Subject: [R] Difference in numeric Dates between Excel and R
> 
> Hello. I am using some dates I read from Excel in R. I know the Excel
> origin is supposed to be 1900-1-1. But when I used as.Date with
> origin="1900-1-1", the dates that R reported were two days ahead of
> the ones I read from Excel. I noticed the following when I did this in R:
> 
> > as.Date("2011-3-4")-as.Date("1900-1-1")
> Time difference of 40604 days
> 
> but if I do the same operation in Excel the answer is 40605. Does
> anybody
> know what can be going on?
> 

I think so.  It is a known problem that Excel thinks 1900 was a leap year, but 
it was not.  So Excel counts an extra day (for nonexistent Feb 29, 1900).  In 
addition,  Excel considers "1900-01-01" as day 1, not day 0.

Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204




Re: [R] glht() used with coxph()

2011-03-01 Thread array chip
In case anyone is interested, I figured out that when strata() is used, I have 
to specify the comparison matrix manually:

> (fit<-coxph(Surv(stop, status>0)~treatment+strata(enum),bladder1))
> coef(fit)
treatmentpyridoxine   treatmentthiotepa 
          0.1877925          -0.2097894 
> glht(fit,linfct=mcp(treatment='Tukey'))
Error in glht.matrix(model = list(coefficients = c(0.187792527684977,  : 
  ‘ncol(linfct)’ is not equal to ‘length(coef(model))’

> K <- rbind('pyridoxine - placebo'  = c(1, 0),
+            'thiotepa - placebo'    = c(0, 1),
+            'thiotepa - pyridoxine' = c(-1, 1))
> glht(fit,linfct=K)
 General Linear Hypotheses

Linear Hypotheses:
                           Estimate
pyridoxine - placebo == 0    0.1878
thiotepa - placebo == 0     -0.2098
thiotepa - pyridoxine == 0  -0.3976
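A quick check (untested) that the hand-built rows of K reproduce these
estimates from the printed coefficients:

```r
## coef(fit) is (pyridoxine, thiotepa), with placebo as the baseline
b <- c(treatmentpyridoxine = 0.1877925, treatmentthiotepa = -0.2097894)
c('pyridoxine - placebo'  = sum(c( 1, 0) * b),
  'thiotepa - placebo'    = sum(c( 0, 1) * b),
  'thiotepa - pyridoxine' = sum(c(-1, 1) * b))
##  0.1877925 -0.2097894 -0.3975819
```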

John






To: r-help@r-project.org
Sent: Tue, March 1, 2011 11:38:22 AM
Subject: [R] glht() used with coxph()

Hi, I am experimenting with using glht() from multcomp package together with 
coxph(), and glad to find that glht() can work on coph object, for example:

> (fit<-coxph(Surv(stop, status>0)~treatment,bladder1))
coxph(formula = Surv(stop, status > 0) ~ treatment, data = bladder1)


                      coef exp(coef) se(coef)      z    p
treatmentpyridoxine -0.063     0.939    0.161 -0.391 0.70
treatmentthiotepa   -0.159     0.853    0.168 -0.947 0.34

Likelihood ratio test=0.91  on 2 df, p=0.635  n= 294 

> glht(fit,linfct=mcp(treatment='Tukey'))

 General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Linear Hypotheses:
   Estimate
pyridoxine - placebo == 0  -0.06303
thiotepa - placebo == 0-0.15885
thiotepa - pyridoxine == 0 -0.09582

However, once I added a strata term in the formula of coxph(), glht()
no longer works:

> (fit<-coxph(Surv(stop, status>0)~treatment+strata(enum),bladder1))
coxph(formula = Surv(stop, status > 0) ~ treatment + strata(enum), 
data = bladder1)


                      coef exp(coef) se(coef)     z    p
treatmentpyridoxine  0.188      1.21    0.170  1.11 0.27
treatmentthiotepa   -0.210      0.81    0.172 -1.22 0.22

Likelihood ratio test=4.39  on 2 df, p=0.111  n= 294 

> glht(fit,linfct=mcp(treatment='Tukey'))
Error in glht.matrix(model = list(coefficients = c(0.187792527684977,  : 
  ‘ncol(linfct)’ is not equal to ‘length(coef(model))’

Can anyone suggest why strata would make coxph object ineligible for glht()? Or 
how to make it work?

Thanks

John


  


  



Re: [R] inefficient ifelse() ?

2011-03-01 Thread William Dunlap
Try using [<- more, instead of ifelse().  I rarely find
myself really using both of the calls to [<- that ifelse
makes.  E.g., I use
   x[x==999] <- NA
instead of
   x <- ifelse(x==999, NA, x)

But if you find yourself using ifelse in a certain way often,
try writing a function that only allows that case.  E.g.,
   transform2 <- function(x, test, ifTrueFunction, ifFalseFunction)
   {
       stopifnot(is.logical(test), length(x) == length(test),
                 is.function(ifTrueFunction), is.function(ifFalseFunction))
   retval <- x # assume output is of same type as input
   retval[test] <- ifTrueFunction(x[test])
   retval[!test] <- ifFalseFunction(x[!test])
   retval
   }
   transform2(x, x<=0, f, g) 

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -Original Message-
> From: ivo...@gmail.com [mailto:ivo...@gmail.com] On Behalf Of 
> ivo welch
> Sent: Tuesday, March 01, 2011 2:20 PM
> To: William Dunlap
> Cc: r-help
> Subject: Re: [R] inefficient ifelse() ?
> 
> yikes.  you are asking me too much.
> 
> thanks everybody for the information.  I learned something new.
> 
> my suggestion would be for the much smarter language designers (than
> I) to offer us more or less blissfully ignorant users another
> vector-related construct in R.  It could perhaps be named %if% %else%,
> analogous to if else (with naming inspired by %in%, and with
> evaluation only of relevant parts [just as if else for scalars]), with
> different outcomes in some cases, but with the advantage of typically
> evaluating only half as many conditions as the ifelse() vector
> construct.  %if% %else% may work only in a subset of cases, but when
> it does work, it would be nice to have.  it would probably be my first
> "goto" function, with ifelse() use only as a fallback.
> 
> of course, I now know how to fix my specific issue.  I was just
> surprised that my first choice, ifelse(), was not as optimized as I
> had thought.
> 
> best,
> 
> /iaw
> 
> 
> On Tue, Mar 1, 2011 at 5:13 PM, William Dunlap 
>  wrote:
> > An ifelse-like function that only evaluated
> > what was needed would be fine, but it would
> > have to be different from ifelse itself.  The
> > trick is to come up with a good parameterization.
> >
> > E.g., how would it deal with things like
> >   ifelse(is.na(x), mean(x, na.rm=TRUE), x)
> > or
> >   ifelse(x>1, log(x), runif(length(x),-1,0))
> > or
> >   ifelse(x>1, log(x), -seq_along(x))
> > Would it reject such things?  Deciding that the
> > x in mean(x,na.rm=TRUE) should be replaced by
> > x[is.na(x)] would be wrong.  Deciding that
> > runif(length(x)) should be replaced by runif(sum(x>1))
> > seems a bit much to expect.  Replacing seq_along(x) with
> > seq_len(sum(x>1)) is wrong.  It would be better to
> > parameterize the new function so it wouldn't have to
> > think about those cases.
> >
> > Would you want it to depend only on a logical
> > vector or perhaps also on a factor (a vectorized
> > switch/case function)?
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >> -Original Message-
> >> From: r-help-boun...@r-project.org
> >> [mailto:r-help-boun...@r-project.org] On Behalf Of ivo welch
> >> Sent: Tuesday, March 01, 2011 12:36 PM
> >> To: Henrique Dallazuanna
> >> Cc: r-help
> >> Subject: Re: [R] inefficient ifelse() ?
> >>
> >> thanks, Henrique.  did you mean
> >>
> >>     as.vector(t(mapply(function(x, f)f(x), split(t, ((t %% 2)==0)),
> >> list(f, g))))   ?
> >>
> >> otherwise, you get a matrix.
> >>
> >> its a good solution, but unfortunately I don't think this 
> can be used
> >> to redefine ifelse(cond,ift,iff) in a way that is transparent.  the
> >> ift and iff functions will always be evaluated before the function
> >> call happens, even with lazy evaluation.  :-(
> >>
> >> I still think that it makes sense to have a smarter 
> vectorized %if% in
> >> a vectorized language like R.  just my 5 cents.
> >>
> >> /iaw
> >>
> >> 
> >> Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
> >>
> >>
> >>
> >>
> >>
> >> On Tue, Mar 1, 2011 at 2:33 PM, Henrique Dallazuanna
> >>  wrote:
> >> > Try this:
> >> >
> >> > mapply(function(x, f)f(x), split(t, t %% 2), list(g, f))
> >> >
> >> > On Tue, Mar 1, 2011 at 4:19 PM, ivo welch 
>  wrote:
> >> >>
> >> >> dear R experts---
> >> >>
> >> >>  t <- 1:30
> >> >>  f <- function(t) { cat("f for", t, "\n"); return(2*t) }
> >> >>  g <- function(t) { cat("g for", t, "\n"); return(3*t) }
> >> >>  s <- ifelse( t%%2==0, g(t), f(t))
> >> >>
> >> >> shows that the ifelse function actually evaluates both f()
> >> and g() for
> >> >> all values first, and presumably then just picks left or
> >> right results
> >> >> based on t%%2.  uggh... wouldn't it make more sense to
> >> evaluate only
> >> >> the relevant parts of each vector and then reassemble them?
> >> >>
> >> >> /iaw
> >> >> 
> >> >> Ivo Welch
> >> >>
> >> >> __
> >> >> R-help@r-project.org mailing list
> >> >> https://stat.

Re: [R] How to prove the MLE estimators are normal distributed?

2011-03-01 Thread Bill.Venables
This is a purely statistical question and you should try asking it on some 
statistics list.  

This is for help with using R, mostly for data analysis and graphics.  A glance 
at the posting guide (see the footnote below) might be a good idea.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Ning Cheng
Sent: Wednesday, 2 March 2011 8:21 AM
To: r-help@r-project.org
Subject: [R] How to prove the MLE estimators are normal distributed?

Dear List,
I'm now working on MLE and OLS estimators. I just noticed that the
textbook argues they are jointly normally distributed. But how does one
prove this conclusion?

Thanks for your time in advance!

Best,
Ning

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[R] Difference in numeric Dates between Excel and R

2011-03-01 Thread Luis Felipe Parra
Hello. I am using in R some dates I read from Excel. I know the Excel origin
is supposed to be 1900-01-01. But when I used as.Date() with origin = "1900-1-1",
the dates that R reported were two days ahead of the ones I read from
Excel. I noticed that when I did the following in R:

> as.Date("2011-3-4")-as.Date("1900-1-1")
Time difference of 40604 days

but if I do the same operation in Excel the answer is 40605. Does anybody
know what can be going on?

Thank you
Felipe Parra
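A short sketch of the usual fix. The origin value below reflects Excel's well-known 1900 date system quirks (day 1 is 1900-01-01, and 1900 is wrongly treated as a leap year); the serial number 40606 is derived from the 40604-day difference computed above plus the two-day offset:

```r
# Excel's 1900 date system calls 1900-01-01 day 1 and wrongly counts a
# 1900 leap day, so R conversions conventionally use origin "1899-12-30".
# 40606 is the Excel serial implied for 2011-03-04 by the arithmetic above.
as.Date(40606, origin = "1899-12-30")
```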

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error in saved .csv

2011-03-01 Thread Don McKenzie
If you have ONE data frame that you want to export to excel (I believe that was 
the original request), you probably
don't need to change any of the default arguments to write.csv(), except 
"row.names", which will give you an extra column.

On Mar 1, 2011, at 2:50 PM, Tamas Barjak wrote:

> Yes, the format is incorrect. I have already tried the write.table, but it
> didn't work.
> 
> 
> 2011/3/1 jim holtman 
> 
>> I am not sure what you are saying your problem is?  Is the format
>> incorrect?  BTW, notice that write.csv does not have a 'sep'
>> parameter.  Maybe you should be using write.table.
>> 
>> On Tue, Mar 1, 2011 at 4:36 PM, Tamas Barjak 
>> wrote:
>>> Help me please!
>>> 
>>> I would like to save a data table:
>>> 
>>> write.csv(random.t1, "place", dec=",", append = T, quote = FALSE, sep = "
>> ",
>>> qmethod = "double", eol = "\n", row.names=F)
>>> 
>>> It's OK!
>>> 
>>> But the rows of file
>>> 
>>> 1,1,21042,-4084.87179487179,2457.66483516483,-582.275562799881
>>> 2,2,23846,-6383.86480186479,-3409.98451548449,-3569.72145340269
>>> and no
>>> 
>>> 1
>>> 21042 - 4084.87179487179 2457.66483516483
>>> Not proportional...
>>> 
>>> What's the problem???
>>> 
>>> Thanks!
>>> 
>>>   [[alternative HTML version deleted]]
>>> 
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>> 
>> 
>> 
>> --
>> Jim Holtman
>> Data Munger Guru
>> 
>> What is the problem that you are trying to solve?
>> 
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




Don McKenzie
Research Ecologist
Pacific Wildland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Forest Resources and CSES Climate Impacts Group
University of Washington

phone: 206-732-7824
cell: 206-321-5966
d...@uw.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Stepwise Criterion

2011-03-01 Thread Mano Gabor Toth
Thanks a lot for your last letter, you're right, I wasn't clear enough.
I've already tried AIC before, but I thought that comparing models based on
this criterion would be applicable if I had good models and I wanted to find
the best fitting one. However, I only have poor models with high AIC, and I
thought that it would make sense to also look for models with relatively
good fit. So probably I should try to take into account the significance
level of the residual deviance (thanks for the correction) somehow. I may be
wrong, but simply minimising the deviance would not necessarily maximise the
significance level as it also depends on the degrees of freedom, which
varies with each model (with different variables and interaction terms). And
yes, simply maximising the significance level would make the model very
complex, so some penalty is necessary for additional variables. All in all,
I'm not really sure how to balance AIC and model fit in this context.
Thanks again for your comments.

Best,

Mano Gabor TOTH


On 1 March 2011 23:39,  wrote:

> The "probability OF the residual deviance" is zero.  The significance level
> for the residual deviance according to its asymptotic Chi-squared
> distribution is a possible criterion, but a silly one.  If you want to
> minimise that, just fit no variables at all.  That's the best you can do. If
> you want to maximise it, just minimise the deviance itself, which means
> include all possible variables in the regression, together with as many
> interactions as you can as well.  (Incidentally, R doesn't have restrictions on
> how many interaction terms it can handle; those are imposed by your
> computer.)
>
> I suggest you think again about what criterion you really want to use.
>  Somehow you need to balance fit in the training sample against some
> complexity measure.  AIC and BIC are commonly used criteria, but not the
> only ones.  I suggest you start with these and see if either does the kind
> of job you want.
>
> Stepwise regression with interaction terms can be a bit tricky if you want
> to impose the marginality constraints, but that is a bigger issue.
>
> Bill Venables.
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Mano Gabor Toth
> Sent: Wednesday, 2 March 2011 6:44 AM
> To: r-help@r-project.org
> Subject: [R] Logistic Stepwise Criterion
>
> Dear R-help members,
>
> I'd like to run a binomial logistic stepwise regression with ten
> explanatory
> variables and as many interaction terms as R can handle. I'll come up with
> the right R command sooner or later, but my real question is whether and
> how
> the criterion for the evaluation of the different models can be set to be
> the probability of the residual deviance in the Chi-Square distribution
> (which would be more informative of overall model fit than AIC).
>
> Thanks in advance for all your help.
>
> Kind regards,
>
> Mano Gabor TOTH
> MA Political Science
> Central European University
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] splitting and stacking matrices

2011-03-01 Thread Gabor Grothendieck
On Tue, Mar 1, 2011 at 4:55 PM, Darcy Webber  wrote:
> Dear R users,
>
> I am having some difficulty arranging some matrices and wondered if
> anyone could help out. As an example, consider the following matrix:
>
> a <- matrix(1:32, nrow = 4, ncol = 8)
> a
>     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
> [1,]    1    5    9   13   17   21   25   29
> [2,]    2    6   10   14   18   22   26   30
> [3,]    3    7   11   15   19   23   27   31
> [4,]    4    8   12   16   20   24   28   32
>
> I would like it to look like the following matrix:
>
>     [,1] [,2] [,3] [,4]
> [1,]    1    5    9   13
> [2,]    2    6   10   14
> [3,]    3    7   11   15
> [4,]    4    8   12   16
> [5,]  17   21   25   29
> [6,]  18   22   26   30
> [7,]  19   23   27   31
> [8,]  20   24   28   32
>
> I can achieve this using the following:
>
> a1 <- a[, 1:4]
> a2 <- a[, 5:8]
> b <- rbind(a1, a2)
>
> However, my initial matrix often has a variable number of columns (in
> multiples of 4), and I still want to split the columns into blocks of 4
> and stack these. I have considered working out how many blocks the
> matrix must be split into using: no.blocks <- ncol(a)/4. My problem is
> then implementing this information to actually split the matrix up and
> then stack it. Any guidance on this would be much appreciated.
>
> Regards
> Darcy Webber
>

Try converting to a 3d array, swapping the last two dimensions and
reconstituting it as a matrix:

> matrix(aperm(array(a, c(4, 4, 2)), c(1, 3, 2)), nc = 4)
     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16
[5,]   17   21   25   29
[6,]   18   22   26   30
[7,]   19   23   27   31
[8,]   20   24   28   32
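The aperm() idea generalizes directly to a variable number of columns; a minimal sketch (the function name split_stack is made up for illustration):

```r
# Stack consecutive groups of 'block' columns on top of each other,
# assuming ncol(m) is a multiple of 'block'.
split_stack <- function(m, block = 4) {
  stopifnot(ncol(m) %% block == 0)
  k   <- ncol(m) / block                    # number of blocks
  arr <- array(m, c(nrow(m), block, k))     # one slab per block of columns
  matrix(aperm(arr, c(1, 3, 2)), ncol = block)
}

a <- matrix(1:32, nrow = 4, ncol = 8)
split_stack(a)    # same 8 x 4 result as above
```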



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error in saved .csv

2011-03-01 Thread Tamas Barjak
Yes, the format is incorrect. I have already tried the write.table, but it
didn't work.


2011/3/1 jim holtman 

> I am not sure what you are saying your problem is?  Is the format
> incorrect?  BTW, notice that write.csv does not have a 'sep'
> parameter.  Maybe you should be using write.table.
>
> On Tue, Mar 1, 2011 at 4:36 PM, Tamas Barjak 
> wrote:
> > Help me please!
> >
> > I would like to save a data table:
> >
> > write.csv(random.t1, "place", dec=",", append = T, quote = FALSE, sep = "
> ",
> > qmethod = "double", eol = "\n", row.names=F)
> >
> > It's OK!
> >
> > But the rows of file
> >
> >  1,1,21042,-4084.87179487179,2457.66483516483,-582.275562799881
> > 2,2,23846,-6383.86480186479,-3409.98451548449,-3569.72145340269
> > and no
> >
> > 1
> > 21042 - 4084.87179487179 2457.66483516483
> > Not proportional...
> >
> > What's the problem???
> >
> > Thanks!
> >
> >[[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to prove the MLE estimators are normal distributed?

2011-03-01 Thread Ning Cheng
Dear List,
I'm now working on MLE and OLS estimators. I just noticed that the
textbook argues they are jointly normally distributed. But how does one
prove this conclusion?

Thanks for your time in advance!

Best,
Ning

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Stepwise Criterion

2011-03-01 Thread Bill.Venables
The "probability OF the residual deviance" is zero.  The significance level for 
the residual deviance according to its asymptotic Chi-squared distribution is a 
possible criterion, but a silly one.  If you want to minimise that, just fit no 
variables at all.  That's the best you can do. If you want to maximise it, just 
minimise the deviance itself, which means include all possible variables in the 
regression, together with as many interactions as you can as well.  (Incidentally,
R doesn't have restrictions on how many interaction terms it can handle; those
are imposed by your computer.)

I suggest you think again about what criterion you really want to use.  Somehow 
you need to balance fit in the training sample against some complexity measure. 
 AIC and BIC are commonly used criteria, but not the only ones.  I suggest you 
start with these and see if either does the kind of job you want.

Stepwise regression with interaction terms can be a bit tricky if you want to 
impose the marginality constraints, but that is a bigger issue.

Bill Venables. 
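As a hedged illustration of the AIC/BIC route suggested above, step() on a logistic fit with interactions; the data frame, sample size, and variable names are invented purely for the example:

```r
# Stepwise selection on a binomial GLM: step() minimises AIC by default,
# and k = log(n) turns the penalty into BIC.
set.seed(1)
d <- data.frame(y  = rbinom(100, 1, 0.5),
                x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
full    <- glm(y ~ (x1 + x2 + x3)^2, family = binomial, data = d)
fit_aic <- step(full, trace = 0)                    # AIC criterion
fit_bic <- step(full, trace = 0, k = log(nrow(d)))  # BIC criterion
```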

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Mano Gabor Toth
Sent: Wednesday, 2 March 2011 6:44 AM
To: r-help@r-project.org
Subject: [R] Logistic Stepwise Criterion

Dear R-help members,

I'd like to run a binomial logistic stepwise regression with ten explanatory
variables and as many interaction terms as R can handle. I'll come up with
the right R command sooner or later, but my real question is whether and how
the criterion for the evaluation of the different models can be set to be
the probability of the residual deviance in the Chi-Square distribution
(which would be more informative of overall model fit than AIC).

Thanks in advance for all your help.

Kind regards,

Mano Gabor TOTH
MA Political Science
Central European University

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] error in saved .csv

2011-03-01 Thread Peter Langfelder
On Tue, Mar 1, 2011 at 1:36 PM, Tamas Barjak  wrote:
> Help me please!
>
> I would like to save a data table:
>
> write.csv(random.t1, "place", dec=",", append = T, quote = FALSE, sep = " ",
> qmethod = "double", eol = "\n", row.names=F)
>
> It's OK!
>
> But the rows of file

>  1,1,21042,-4084.87179487179,2457.66483516483,-582.275562799881
> 2,2,23846,-6383.86480186479,-3409.98451548449,-3569.72145340269
> and no
>
> 1
> 21042 -         4084.87179487179 2457.66483516483
> Not proportional...
>
> What's the problem???

If I understand you correctly, you want a text file where the
separator is a white space. You cannot get that with write.csv - you
have to use write.table(). The function write.csv does not allow you
to change the sep argument. Here's what the help file says:

 ‘write.csv’ and ‘write.csv2’ provide convenience wrappers for
 writing CSV files.  They set ‘sep’, ‘dec’ and ‘qmethod’, and
 ‘col.names’ to ‘NA’ if ‘row.names = TRUE’ and ‘TRUE’ otherwise.

 ‘write.csv’ uses ‘"."’ for the decimal point and a comma for the
 separator.

 ‘write.csv2’ uses a comma for the decimal point and a semicolon
 for the separator, the Excel convention for CSV files in some
 Western European locales.

 These wrappers are deliberately inflexible: they are designed to
 ensure that the correct conventions are used to write a valid
 file.  Attempts to change ‘append’, ‘col.names’, ‘sep’, ‘dec’ or
 ‘qmethod’ are ignored, with a warning.
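A minimal sketch of the write.table() route the help file points to (the data frame and file name are illustrative):

```r
# write.csv ignores 'sep'; write.table honours it, so use write.table
# for a space-separated file with comma decimals.
df  <- data.frame(id = 1:2, x = c(-4084.87, -6383.86))
tmp <- tempfile(fileext = ".txt")
write.table(df, tmp, sep = " ", dec = ",", quote = FALSE, row.names = FALSE)
readLines(tmp)
```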

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error in saved .csv

2011-03-01 Thread jim holtman
I am not sure what you are saying your problem is?  Is the format
incorrect?  BTW, notice that write.csv does not have a 'sep'
parameter.  Maybe you should be using write.table.

On Tue, Mar 1, 2011 at 4:36 PM, Tamas Barjak  wrote:
> Help me please!
>
> > I would like to save a data table:
>
> write.csv(random.t1, "place", dec=",", append = T, quote = FALSE, sep = " ",
> qmethod = "double", eol = "\n", row.names=F)
>
> It's OK!
>
> But the rows of file
>
>  1,1,21042,-4084.87179487179,2457.66483516483,-582.275562799881
> 2,2,23846,-6383.86480186479,-3409.98451548449,-3569.72145340269
> and no
>
> 1
> 21042 -         4084.87179487179 2457.66483516483
> Not proportional...
>
> What's the problem???
>
> Thanks!
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] inefficient ifelse() ?

2011-03-01 Thread ivo welch
yikes.  you are asking me too much.

thanks everybody for the information.  I learned something new.

my suggestion would be for the much smarter language designers (than
I) to offer us more or less blissfully ignorant users another
vector-related construct in R.  It could perhaps be named %if% %else%,
analogous to if else (with naming inspired by %in%, and with
evaluation only of relevant parts [just as if else for scalars]), with
different outcomes in some cases, but with the advantage of typically
evaluating only half as many conditions as the ifelse() vector
construct.  %if% %else% may work only in a subset of cases, but when
it does work, it would be nice to have.  it would probably be my first
"goto" function, with ifelse() use only as a fallback.

of course, I now know how to fix my specific issue.  I was just
surprised that my first choice, ifelse(), was not as optimized as I
had thought.

best,

/iaw


On Tue, Mar 1, 2011 at 5:13 PM, William Dunlap  wrote:
> An ifelse-like function that only evaluated
> what was needed would be fine, but it would
> have to be different from ifelse itself.  The
> trick is to come up with a good parameterization.
>
> E.g., how would it deal with things like
>   ifelse(is.na(x), mean(x, na.rm=TRUE), x)
> or
>   ifelse(x>1, log(x), runif(length(x),-1,0))
> or
>   ifelse(x>1, log(x), -seq_along(x))
> Would it reject such things?  Deciding that the
> x in mean(x,na.rm=TRUE) should be replaced by
> x[is.na(x)] would be wrong.  Deciding that
> runif(length(x)) should be replaced by runif(sum(x>1))
> seems a bit much to expect.  Replacing seq_along(x) with
> seq_len(sum(x>1)) is wrong.  It would be better to
> parameterize the new function so it wouldn't have to
> think about those cases.
>
> Would you want it to depend only on a logical
> vector or perhaps also on a factor (a vectorized
> switch/case function)?
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> -Original Message-
>> From: r-help-boun...@r-project.org
>> [mailto:r-help-boun...@r-project.org] On Behalf Of ivo welch
>> Sent: Tuesday, March 01, 2011 12:36 PM
>> To: Henrique Dallazuanna
>> Cc: r-help
>> Subject: Re: [R] inefficient ifelse() ?
>>
>> thanks, Henrique.  did you mean
>>
>>     as.vector(t(mapply(function(x, f)f(x), split(t, ((t %% 2)==0)),
>> list(f, g))))   ?
>>
>> otherwise, you get a matrix.
>>
>> its a good solution, but unfortunately I don't think this can be used
>> to redefine ifelse(cond,ift,iff) in a way that is transparent.  the
>> ift and iff functions will always be evaluated before the function
>> call happens, even with lazy evaluation.  :-(
>>
>> I still think that it makes sense to have a smarter vectorized %if% in
>> a vectorized language like R.  just my 5 cents.
>>
>> /iaw
>>
>> 
>> Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
>>
>>
>>
>>
>>
>> On Tue, Mar 1, 2011 at 2:33 PM, Henrique Dallazuanna
>>  wrote:
>> > Try this:
>> >
>> > mapply(function(x, f)f(x), split(t, t %% 2), list(g, f))
>> >
>> > On Tue, Mar 1, 2011 at 4:19 PM, ivo welch  wrote:
>> >>
>> >> dear R experts---
>> >>
>> >>  t <- 1:30
>> >>  f <- function(t) { cat("f for", t, "\n"); return(2*t) }
>> >>  g <- function(t) { cat("g for", t, "\n"); return(3*t) }
>> >>  s <- ifelse( t%%2==0, g(t), f(t))
>> >>
>> >> shows that the ifelse function actually evaluates both f()
>> and g() for
>> >> all values first, and presumably then just picks left or
>> right results
>> >> based on t%%2.  uggh... wouldn't it make more sense to
>> evaluate only
>> >> the relevant parts of each vector and then reassemble them?
>> >>
>> >> /iaw
>> >> 
>> >> Ivo Welch
>> >>
>> >> __
>> >> R-help@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >
>> >
>> >
>> > --
>> > Henrique Dallazuanna
>> > Curitiba-Paraná-Brasil
>> > 25° 25' 40" S 49° 16' 22" O
>> >
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] inefficient ifelse() ?

2011-03-01 Thread Thomas Lumley
On Wed, Mar 2, 2011 at 9:36 AM, ivo welch  wrote:
> thanks, Henrique.  did you mean
>
>    as.vector(t(mapply(function(x, f)f(x), split(t, ((t %% 2)==0)),
> list(f, g))))   ?
>
> otherwise, you get a matrix.
>
> its a good solution, but unfortunately I don't think this can be used
> to redefine ifelse(cond,ift,iff) in a way that is transparent.  the
> ift and iff functions will always be evaluated before the function
> call happens, even with lazy evaluation.  :-(
>
> I still think that it makes sense to have a smarter vectorized %if% in
> a vectorized language like R.  just my 5 cents.
>

Ivo,

There is no guarantee in general that f(x[c(3,5,7)]) is the same as f(x)[c(3,5,7)]


  -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] inefficient ifelse() ?

2011-03-01 Thread William Dunlap
An ifelse-like function that only evaluated
what was needed would be fine, but it would
have to be different from ifelse itself.  The
trick is to come up with a good parameterization.

E.g., how would it deal with things like
   ifelse(is.na(x), mean(x, na.rm=TRUE), x)
or
   ifelse(x>1, log(x), runif(length(x),-1,0)) 
or
   ifelse(x>1, log(x), -seq_along(x))
Would it reject such things?  Deciding that the
x in mean(x,na.rm=TRUE) should be replaced by
x[is.na(x)] would be wrong.  Deciding that
runif(length(x)) should be replaced by runif(sum(x>1))
seems a bit much to expect.  Replacing seq_along(x) with
seq_len(sum(x>1)) is wrong.  It would be better to
parameterize the new function so it wouldn't have to
think about those cases.

Would you want it to depend only on a logical
vector or perhaps also on a factor (a vectorized
switch/case function)?
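One parameterization that sidesteps those cases is to make the caller index explicitly, evaluating each branch only on the elements it applies to; a sketch using the thread's f/g example (safe only when the branch functions are elementwise):

```r
# Evaluate g() on the even elements and f() on the odd ones, then
# reassemble by logical index instead of calling ifelse().
t <- 1:30
f <- function(t) 2 * t
g <- function(t) 3 * t
cond <- t %% 2 == 0
s <- numeric(length(t))
s[cond]  <- g(t[cond])
s[!cond] <- f(t[!cond])
```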

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of ivo welch
> Sent: Tuesday, March 01, 2011 12:36 PM
> To: Henrique Dallazuanna
> Cc: r-help
> Subject: Re: [R] inefficient ifelse() ?
> 
> thanks, Henrique.  did you mean
> 
> as.vector(t(mapply(function(x, f)f(x), split(t, ((t %% 2)==0)),
> list(f, g))))   ?
> 
> otherwise, you get a matrix.
> 
> its a good solution, but unfortunately I don't think this can be used
> to redefine ifelse(cond,ift,iff) in a way that is transparent.  the
> ift and iff functions will always be evaluated before the function
> call happens, even with lazy evaluation.  :-(
> 
> I still think that it makes sense to have a smarter vectorized %if% in
> a vectorized language like R.  just my 5 cents.
> 
> /iaw
> 
> 
> Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
> 
> 
> 
> 
> 
> On Tue, Mar 1, 2011 at 2:33 PM, Henrique Dallazuanna 
>  wrote:
> > Try this:
> >
> > mapply(function(x, f)f(x), split(t, t %% 2), list(g, f))
> >
> > On Tue, Mar 1, 2011 at 4:19 PM, ivo welch  wrote:
> >>
> >> dear R experts---
> >>
> >>  t <- 1:30
> >>  f <- function(t) { cat("f for", t, "\n"); return(2*t) }
> >>  g <- function(t) { cat("g for", t, "\n"); return(3*t) }
> >>  s <- ifelse( t%%2==0, g(t), f(t))
> >>
> >> shows that the ifelse function actually evaluates both f() 
> and g() for
> >> all values first, and presumably then just picks left or 
> right results
> >> based on t%%2.  uggh... wouldn't it make more sense to 
> evaluate only
> >> the relevant parts of each vector and then reassemble them?
> >>
> >> /iaw
> >> 
> >> Ivo Welch
> >>
> >> __
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> > --
> > Henrique Dallazuanna
> > Curitiba-Paraná-Brasil
> > 25° 25' 40" S 49° 16' 22" O
> >
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] splitting and stacking matrices

2011-03-01 Thread Darcy Webber
Dear R users,

I am having some difficulty arranging some matrices and wondered if
anyone could help out. As an example, consider the following matrix:

a <- matrix(1:32, nrow = 4, ncol = 8)
a
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    5    9   13   17   21   25   29
[2,]    2    6   10   14   18   22   26   30
[3,]    3    7   11   15   19   23   27   31
[4,]    4    8   12   16   20   24   28   32

I would like it to look like the following matrix:

     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16
[5,]   17   21   25   29
[6,]   18   22   26   30
[7,]   19   23   27   31
[8,]   20   24   28   32

I can achieve this using the following:

a1 <- a[, 1:4]
a2 <- a[, 5:8]
b <- rbind(a1, a2)

However, my initial matrix often has a variable number of columns (in
multiples of 4), and I still want to split the columns into blocks of 4
and stack these. I have considered working out how many blocks the
matrix must be split into using: no.blocks <- ncol(a)/4. My problem is
then implementing this information to actually split the matrix up and
then stack it. Any guidance on this would be much appreciated.

Regards
Darcy Webber

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to understand output from R's polr function (ordered logistic regression)?

2011-03-01 Thread Dan Frankowski
Also posted as
http://stats.stackexchange.com/questions/7720/how-to-understand-output-from-rs-polr-function-ordered-logistic-regression
.

Also, I read section 7.3 of "Modern Applied Statistics with S" by Venables
and Ripley (who wrote polr?), and I can still not answer many of these
questions.

On Tue, Mar 1, 2011 at 3:25 PM, Dan Frankowski  wrote:

> I am new to R, ordered logistic regression, and polr.
>
> The "Examples" section at the bottom of the help page for polr (that
> fits a logistic or probit regression model to an ordered factor
> response) shows
>
> options(contrasts = c("contr.treatment", "contr.poly"))
>
> house.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
>
> pr <- profile(house.plr)
>
> plot(pr)
> pairs(pr)
>
>
>    - What information does pr contain? The help page on profile is
>      generic, and gives no guidance for polr.
>
>    - What is plot(pr) showing? I see six graphs. Each has an X axis that is
>      numeric, although the label is an indicator variable (looks like an
>      input variable that is an indicator for an ordinal value). Then the Y
>      axis is "tau" which is completely unexplained.
>
>    - What is pairs(pr) showing? It looks like a plot for each pair of input
>      variables, but again I see no explanation of the X or Y axes.
>
>    - How can one understand if the model gave a good fit?
>      summary(house.plr) shows Residual Deviance 3479.149 and AIC (Akaike
>      Information Criterion?) of 3495.149. Is that good? In the case those
>      are only useful as relative measures (i.e. to compare to another model
>      fit), what is a good absolute measure? Is the residual deviance
>      approximately chi-squared distributed? Can one use "% correctly
>      predicted" on the original data or some cross-validation? What is the
>      easiest way to do that?
>
>    - How does one apply and interpret anova on this model? The docs say
>      "There are methods for the standard model-fitting functions, including
>      predict, summary, vcov, anova." However, running anova(house.plr)
>      results in "anova is not implemented for a single 'polr' object".
>
>    - How does one interpret the t values for each coefficient? Unlike some
>      model fits, there are no P values here.
>
> I realize this is a lot of questions, but it makes sense to me to ask as
> one bundle ("how do I use this thing?") rather than 7 different questions.
> Any information appreciated.
>




[R] How to understand output from R's polr function (ordered logistic regression)?

2011-03-01 Thread Dan Frankowski
I am new to R, ordered logistic regression, and polr.

The "Examples" section at the bottom of the help page for polr (which
fits a logistic or probit regression model to an ordered factor
response) shows

options(contrasts = c("contr.treatment", "contr.poly"))
house.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
pr <- profile(house.plr)
plot(pr)
pairs(pr)

   - What information does pr contain? The help page on profile is
     generic, and gives no guidance for polr.

   - What is plot(pr) showing? I see six graphs. Each has an X axis that is
     numeric, although the label is an indicator variable (looks like an input
     variable that is an indicator for an ordinal value). Then the Y axis is
     "tau", which is completely unexplained.

   - What is pairs(pr) showing? It looks like a plot for each pair of input
     variables, but again I see no explanation of the X or Y axes.

   - How can one understand if the model gave a good fit? summary(house.plr)
     shows Residual Deviance 3479.149 and AIC (Akaike Information Criterion?)
     of 3495.149. Is that good? In the case those are only useful as relative
     measures (i.e. to compare to another model fit), what is a good absolute
     measure? Is the residual deviance approximately chi-squared distributed?
     Can one use "% correctly predicted" on the original data or some
     cross-validation? What is the easiest way to do that?

   - How does one apply and interpret anova on this model? The docs say "There
     are methods for the standard model-fitting functions, including predict,
     summary, vcov, anova." However, running anova(house.plr) results in
     "anova is not implemented for a single 'polr' object".

   - How does one interpret the t values for each coefficient? Unlike some
     model fits, there are no P values here.

I realize this is a lot of questions, but it makes sense to me to ask as one
bundle ("how do I use this thing?") rather than 7 different questions. Any
information appreciated.
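On the last question, a common large-sample approximation (not from this thread, and only a sketch) is to treat the t values as standard normal and derive two-sided p-values from them:

```r
## Sketch: approximate p-values for polr coefficients, assuming the
## t values can be treated as standard normal (large-sample heuristic).
library(MASS)
house.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
ctable <- coef(summary(house.plr))
pvals <- 2 * pnorm(-abs(ctable[, "t value"]))
cbind(ctable, "p value" = round(pvals, 4))
```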




Re: [R] Lattice: useOuterStrips and axes

2011-03-01 Thread Jim Price
Thank you, that's exactly what I needed.

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Lattice-useOuterStrips-and-axes-tp3330338p3330613.html
Sent from the R help mailing list archive at Nabble.com.



[R] Pairwise T-Tests and Dunnett's Test (possibly using multcomp)

2011-03-01 Thread Paul Miller
Hello Everyone,
 
I've been learning to use R in my spare time over the past several months. I've 
read about 7-8 books on the subject. Lately I've been testing what I've learned 
by trying to replicate the analyses from some of my SAS books. This helps me 
make sure I know how to use R properly and also helps me to understand how the 
two programs are similar and different.
 
Below is an attempt to replicate some SAS analyses. This was a success in that 
I was able to reproduce the results using R. I have a feeling though that it 
may not represent an ideal approach. So I was hoping a few people might be 
willing to look at what I've done and to suggest improvements or alternatives.
 
One thing I'm struggling with is setting the reference level. I inserted a 
comment about this in the R code.
 
I've also pasted in the original SAS code in case some people are also SAS 
users and might find this helpful or interesting.
 
Thanks,
 
Paul
 
  
R Code:
 
##
## Chapter 6: One-Way ANOVA
##
 
## Example 6.1: One-Way ANOVA using HAM-A Scores in GAD ##
 
connection <- textConnection("
101 LO 21  104 LO 18
106 LO 19  110 LO  .
112 LO 28  116 LO 22
120 LO 30  121 LO 27
124 LO 28  125 LO 19
130 LO 23  136 LO 22
137 LO 20  141 LO 19
143 LO 26  148 LO 35
152 LO  .  103 HI 16
105 HI 21  109 HI 31
111 HI 25  113 HI 23
119 HI 25  123 HI 18
127 HI 20  128 HI 18
131 HI 16  135 HI 24
138 HI 22  140 HI 21
142 HI 16  146 HI 33
150 HI 21  151 HI 17
102 PB 22  107 PB 26
108 PB 29  114 PB 19
115 PB  .  117 PB 33
118 PB 37  122 PB 25
126 PB 28  129 PB 26
132 PB  .  133 PB 31
134 PB 27  139 PB 30
144 PB 25  145 PB 22
147 PB 36  149 PB 32
")
 
gad <- data.frame(scan(connection,
   list(patno=0,dosegrp="",hama=0),na.strings="."))
 
gad
 
## Summary Statistics ##
 
summaryfn <- function(x) data.frame(
   mean=mean(x,na.rm=T),std.dev=sd(x,na.rm=T),n=sum(!is.na(x)))
 
by(gad$hama,gad$dosegrp,summaryfn)
 
## ANOVA ##
 
model1 <- aov(hama~dosegrp,data=gad)
summary.aov(model1)
 
## Multiple Comparisons ##
 
library(multcomp)
 
## Pairwise T-Tests ##
 
cht <- glht(model1, linfct = mcp(dosegrp = "Tukey"))
summary(cht, test = univariate())
 
## Dunnett's Test ##
 
#Not sure how to set the reference group to placebo
#cht <- glht(model1, linfct = mcp(dosegrp = "Dunnett"))
#summary(cht, test = univariate())
 
model2 <- aov(hama ~ dosegrp - 1, data = gad)
K <- rbind(c(1, 0, -1), c(0, 1, -1))
rownames(K) <- c("HI vs PL", "LO vs PL")
colnames(K) <- names(coef(model2))
summary(glht(model2, linfct = K))
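## Regarding the commented-out Dunnett code above, one possible fix (a
## sketch, assuming glht's Dunnett contrasts compare against the first
## factor level) is to relevel the factor so placebo comes first:

```r
## Sketch: make PB the reference level, then Dunnett contrasts compare
## each dose group against placebo (assumes gad as defined above).
gad$dosegrp <- relevel(gad$dosegrp, ref = "PB")
model1b <- aov(hama ~ dosegrp, data = gad)
chd <- glht(model1b, linfct = mcp(dosegrp = "Dunnett"))
summary(chd, test = univariate())
```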
 
## Treatment vs. Placebo ##
 
#SAS produces an F-test but this is equivalent
 
K <- rbind(c(0.5, 0.5, -1))
rownames(K) <- c("TR vs PL")
colnames(K) <- names(coef(model2))
summary(glht(model2, linfct = K))

SAS code:
 
*---*;
** Chapter 6: One-Way ANOVA;
 
/* Example 6.1: HAM-A Scores in GAD */
 
DATA GAD;
    INPUT PATNO DOSEGRP $ HAMA @@;
    DATALINES;
101 LO 21  104 LO 18
106 LO 19  110 LO  .
112 LO 28  116 LO 22
120 LO 30  121 LO 27
124 LO 28  125 LO 19
130 LO 23  136 LO 22
137 LO 20  141 LO 19
143 LO 26  148 LO 35
152 LO  .  103 HI 16
105 HI 21  109 HI 31
111 HI 25  113 HI 23
119 HI 25  123 HI 18
127 HI 20  128 HI 18
131 HI 16  135 HI 24
138 HI 22  140 HI 21
142 HI 16  146 HI 33
150 HI 21  151 HI 17
102 PB 22  107 PB 26
108 PB 29  114 PB 19
115 PB  .  117 PB 33
118 PB 37  122 PB 25
126 PB 28  129 PB 26
132 PB  .  133 PB 31
134 PB 27  139 PB 30
144 PB 25  145 PB 22
147 PB 36  149 PB 32
;
 
PROC SORT DATA = GAD; BY DOSEGRP;
PROC MEANS MEAN STD N DATA = GAD;
    BY DOSEGRP;
    VAR HAMA;
    TITLE1 'One-Way ANOVA';
    TITLE2 'EXAMPLE 6.1: HAM-A Scores in GAD';
RUN;
 
PROC GLM DATA = GAD;
    CLASS DOSEGRP;
    MODEL HAMA = DOSEGRP;
    MEANS DOSEGRP/T DUNNETT('PB');
    CONTRAST 'ACTIVE vs. PLACEBO' DOSEGRP 0.5 0.5 -1;
RUN;





[R] Logistic Stepwise Criterion

2011-03-01 Thread Mano Gabor Toth
Dear R-help members,

I'd like to run a binomial logistic stepwise regression with ten explanatory
variables and as many interaction terms as R can handle. I'll come up with
the right R command sooner or later, but my real question is whether and how
the criterion for the evaluation of the different models can be set to be
the probability of the residual deviance in the Chi-Square distribution
(which would be more informative of overall model fit than AIC).

Thanks in advance for all your help.

Kind regards,

Mano Gabor TOTH
MA Political Science
Central European University
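A note on this: step() only optimizes AIC-family criteria (its k argument gives BIC with k = log(n)), so a residual-deviance chi-square criterion is not directly available there. Single-term chi-square tests on the deviance can be obtained with drop1()/add1(); a sketch with made-up variable names:

```r
## Sketch only; dat, y, and x1..x4 are hypothetical placeholders.
## step() minimizes AIC-type criteria; for deviance-based chi-square
## tests, inspect terms manually with drop1()/add1().
fit <- glm(y ~ (x1 + x2 + x3)^2, family = binomial, data = dat)
drop1(fit, test = "Chisq")                   # LR chi-square for each droppable term
add1(fit, scope = ~ . + x4, test = "Chisq")  # chi-square for candidate additions
```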




Re: [R] Export R dataframes to excel

2011-03-01 Thread maxsilva
Thanks, but I'm looking for a more direct solution... My problem is very simple:
I have a dataframe and I want to create a standard Excel spreadsheet. My
dataframe could be something like this

id   sex weight
1 M   5'8
2 F6'2
3 F5'5
4 M   5'7
5 F6'3
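If a file that Excel simply opens is enough, one direct route (a sketch, reconstructing the example data above) is a plain CSV:

```r
## Sketch: Excel opens .csv files directly; no Perl/Python needed.
df <- data.frame(id = 1:5,
                 sex = c("M", "F", "F", "M", "F"),
                 weight = c("5'8", "6'2", "5'5", "5'7", "6'3"))
write.csv(df, "mydata.csv", row.names = FALSE)
```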



-- 
View this message in context: 
http://r.789695.n4.nabble.com/Export-R-dataframes-to-excel-tp3330399p3330491.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Regression with many independent variables

2011-03-01 Thread Matthew Douglas
Hi Greg,

Thanks for the help, it works perfectly. To answer your question,
there are 339 independent variables but only 10 will be used at one
time. So at any given line of the data set there will be 10 non-zero
entries for the independent variables and the rest will be zeros.

One more question:

1. I still want to find a way to look at the interactions of the
independent variables.

the regression would look like this:

y = b12*X1*X2 + b23*X2*X3 + ... + b(k-1,k)*X(k-1)*Xk

so I think the regression in R would look like this:

lm(MARGIN, P235:P236+P236:P237+,weights = Poss, data = adj0708),

my problem is that since I have technically 339 independent variables,
when I do this regression I would have 339 choose 2 = approx. 57,000
independent variables (the vast majority will be 0s though), so I don't
want to have to write all of these out. Is there a way to do this
quickly in R?

Also, a curious question that I can't seem to find an answer to online:
is there a more efficient alternative to lm() that is better suited to
very sparse data sets like mine?

Thanks,
Matt


On Mon, Feb 28, 2011 at 4:30 PM, Greg Snow  wrote:
> Don't put the name of the dataset in the formula, use the data argument to lm 
> to provide that.  A single period (".") on the right hand side of the formula 
> will represent all the columns in the data set that are not on the left hand 
> side (you can then use "-" to remove any other columns that you don't want 
> included on the RHS).
>
> For example:
>
>> lm(Sepal.Width ~ . - Sepal.Length, data=iris)
>
> Call:
> lm(formula = Sepal.Width ~ . - Sepal.Length, data = iris)
>
> Coefficients:
>      (Intercept)       Petal.Length        Petal.Width  Speciesversicolor
>           3.0485             0.1547             0.6234            -1.7641
>  Speciesvirginica
>          -2.1964
>
>
> But, are you sure that a regression model with 339 predictors will be 
> meaningful?
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
>
>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
>> project.org] On Behalf Of Matthew Douglas
>> Sent: Monday, February 28, 2011 1:32 PM
>> To: r-help@r-project.org
>> Subject: [R] Regression with many independent variables
>>
>> Hi,
>>
>> I am trying use lm() on some data, the code works fine but I would
>> like to use a more efficient way to do this.
>>
>> The data looks like this (the data is very sparse with a few 1s, -1s
>> and the rest 0s):
>>
>> > head(adj0708)
>>       MARGIN Poss P235 P247 P703 P218 P430 P489 P83 P307 P337
>> 1   64.28571   29    0    0    0    0    0    0   0    0    0    0
>> 0    0    0
>> 2 -100.0    6    0    0    0    0    0    0   0    1    0    0
>> 0    0    0
>> 3  100.0    4    0    0    0    0    0    0   0    1    0    0
>> 0    0    0
>> 4  -33.3    7    0    0    0    0    0    0   0    0    0    0
>> 0    0    0
>> 5  200.0    2    0    0    0    0    0    0   0    0    0    0
>> -1    0    0
>> 6  -83.3   12    0    -1    0    0    0    0   0    0    0    0
>> 0    0    0
>>
>> adj0708 is actually a 35657x341 data set. Each column after "Poss" is
>> an independent variable, the dependent variable is "MARGIN" and it is
>> weighted by "Poss"
>>
>>
>> The regression is below:
>> fit.adj0708 <- lm( adj0708$MARGIN~adj0708$P235 + adj0708$P247 +
>> adj0708$P703 + adj0708$P430 + adj0708$P489 + adj0708$P218 +
>> adj0708$P605 + adj0708$P337 +  +
>> adj0708$P510,weights=adj0708$Poss)
>>
>> I have two questions:
>>
>> 1. Is there a way to to condense how I write the independent variables
>> in the lm(), instead of having such a long line of code (I have 339
>> independent variables to be exact)?
>> 2. I would like to pair the data to look at a regression of the
>> interactions between two independent variables. I think it would look
>> something like this
>> fit.adj0708 <- lm( adj0708$MARGIN~adj0708$P235:adj0708$P247 +
>> adj0708$P703:adj0708$P430 + adj0708$P489:adj0708$P218 +
>> adj0708$P605:adj0708$P337 + ,weights=adj0708$Poss)
>> but there will be 339 Choose 2 combinations, so a lot of independent
>> variables! Is there a more efficient way of writing this code. Is
>> there a way I can do this?
>>
>> Thanks,
>> Matt
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-
>> guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>



[R] [R-pkgs] Major update to rms package

2011-03-01 Thread Frank Harrell
A new version of rms is now available on CRAN for Linux and Windows (Mac  
will probably be available very soon). Largest changes include latex  
methods for validate.* and adding the capability to force a subset of  
variables to be included in all backwards stepdown models (single model or  
validation by resampling).


Recent updates:

   * In survplot.rms, fixed bug (curves were undefined if conf='bands' and  
labelc was FALSE)

   * In survfit.cph, fixed bug by which n wasn't always defined
   * In cph, put survival::: on exact fit call
   * Quit ignoring zlim argument in bplot; added xlabrot argument
   * Added caption argument for latex.anova.rms
   * Changed predab to not print summaries of variables selected if bw=TRUE
   * Changed predab to pass force argument to fastbw
   * fastbw: implemented force argument
   * Added force argument to validate.lrm, validate.bj, calibrate.default,  
calibrate.cph, calibrate.psm, validate.bj, validate.cph, validate.ols
   * print.validate: added B argument to limit how many resamples are  
printed summarizing variables selected if BW=TRUE

   * print.calibrate, print.calibrate.default: added B argument
   * Added latex method for results produced by validate functions
   * Fixed survest.cph to convert summary.survfit std.err to log S(t) scale
   * Fixed val.surv by pulling surv object from survest result
   * Clarified in predict.lrm help file that doesn't always use the first  
intercept
   * lrm.fit, lrm: linear predictor stored in fit object now uses first  
intercept and not middle one (NOT DOWNWARD COMPATIBLE but makes predict  
work when using stored linear.predictors)

   * Fixed argument consistency with validate methods


More information is at http://biostat.mc.vanderbilt.edu/Rrms

--
Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics,
School of Medicine, Vanderbilt University

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages



Re: [R] inefficient ifelse() ?

2011-03-01 Thread Dennis Murphy
Hi:

As far as I can see, the problem has to do with the way you wrote f and g:

> f(2)
f for 2
[1] 4
> g(5)
g for 5
[1] 15

You output an unsaved character string with a numeric result, but the
character string is not part of the return object since it is neither saved
as a name or an attribute. If you save the output to an object, it returns
the numeric outcome:
> s <- f(2)
> s
[1] 4
> names(s)
NULL
> attributes(s)
NULL

However,
> f <- function(x) 2^x
> g <- function(x) 3^x
> ifelse(t %% 2 == 0, f(t), g(t))
 [1] 3.000000e+00 4.000000e+00 2.700000e+01 1.600000e+01 2.430000e+02
 [6] 6.400000e+01 2.187000e+03 2.560000e+02 1.968300e+04 1.024000e+03
[11] 1.771470e+05 4.096000e+03 1.594323e+06 1.638400e+04 1.434891e+07
[16] 6.553600e+04 1.291402e+08 2.621440e+05 1.162261e+09 1.048576e+06
[21] 1.046035e+10 4.194304e+06 9.414318e+10 1.677722e+07 8.472886e+11
[26] 6.710886e+07 7.625597e+12 2.684355e+08 6.863038e+13 1.073742e+09

is the same as
> ifelse(t %% 2 == 0, 2^t, 3^t)
 [1] 3.000000e+00 4.000000e+00 2.700000e+01 1.600000e+01 2.430000e+02
 [6] 6.400000e+01 2.187000e+03 2.560000e+02 1.968300e+04 1.024000e+03
[11] 1.771470e+05 4.096000e+03 1.594323e+06 1.638400e+04 1.434891e+07
[16] 6.553600e+04 1.291402e+08 2.621440e+05 1.162261e+09 1.048576e+06
[21] 1.046035e+10 4.194304e+06 9.414318e+10 1.677722e+07 8.472886e+11
[26] 6.710886e+07 7.625597e+12 2.684355e+08 6.863038e+13 1.073742e+09

which is the vector result you apparently wanted. If you also want to output
the character string with the value, you could either return a list or
rewrite the print method of ifelse() to accommodate it.
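A further alternative that evaluates each function only where it is needed is subset assignment (a sketch using the poster's setup):

```r
t <- 1:30
f <- function(x) 2^x
g <- function(x) 3^x
out <- numeric(length(t))
even <- t %% 2 == 0
out[even]  <- f(t[even])    # f sees only the even positions
out[!even] <- g(t[!even])   # g sees only the odd positions
```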

HTH,
Dennis


On Tue, Mar 1, 2011 at 12:36 PM, ivo welch  wrote:

> thanks, Henrique.  did you mean
>
>    as.vector(t(mapply(function(x, f) f(x), split(t, ((t %% 2)==0)),
>               list(f, g)))) ?
>
> otherwise, you get a matrix.
>
> its a good solution, but unfortunately I don't think this can be used
> to redefine ifelse(cond,ift,iff) in a way that is transparent.  the
> ift and iff functions will always be evaluated before the function
> call happens, even with lazy evaluation.  :-(
>
> I still think that it makes sense to have a smarter vectorized %if% in
> a vectorized language like R.  just my 5 cents.
>
> /iaw
>
> 
> Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
>
>
>
>
>
> On Tue, Mar 1, 2011 at 2:33 PM, Henrique Dallazuanna 
> wrote:
> > Try this:
> >
> > mapply(function(x, f)f(x), split(t, t %% 2), list(g, f))
> >
> > On Tue, Mar 1, 2011 at 4:19 PM, ivo welch  wrote:
> >>
> >> dear R experts---
> >>
> >>  t <- 1:30
> >>  f <- function(t) { cat("f for", t, "\n"); return(2*t) }
> >>  g <- function(t) { cat("g for", t, "\n"); return(3*t) }
> >>  s <- ifelse( t%%2==0, g(t), f(t))
> >>
> >> shows that the ifelse function actually evaluates both f() and g() for
> >> all values first, and presumably then just picks left or right results
> >> based on t%%2.  uggh... wouldn't it make more sense to evaluate only
> >> the relevant parts of each vector and then reassemble them?
> >>
> >> /iaw
> >> 
> >> Ivo Welch
> >>
> >> __
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> > --
> > Henrique Dallazuanna
> > Curitiba-Paraná-Brasil
> > 25° 25' 40" S 49° 16' 22" O
> >
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>




[R] error in saved .csv

2011-03-01 Thread Tamas Barjak
Help me please!

I would like to save a data table:

write.csv(random.t1, "place", dec=",", append = T, quote = FALSE, sep = " ",
qmethod = "double", eol = "\n", row.names=F)

It's OK!

But the rows of the file look like

1,1,21042,-4084.87179487179,2457.66483516483,-582.275562799881
2,2,23846,-6383.86480186479,-3409.98451548449,-3569.72145340269

and not like

1 21042 -4084.87179487179 2457.66483516483

The columns are not proportional (the separator I asked for was not used).

What's the problem?

Thanks!
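For what it's worth, write.csv() deliberately ignores the sep, dec, and append arguments (it fixes sep = "," and dec = "."), which would explain the comma-separated output. A sketch of a call that honours them:

```r
## Sketch: write.table honours sep/dec; the write.csv wrapper does not.
write.table(random.t1, "place.txt", sep = " ", dec = ",",
            quote = FALSE, row.names = FALSE)
```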




Re: [R] Export R dataframes to excel

2011-03-01 Thread Iain Gallagher
This appeared today on the r-bloggers site and might be useful for you.

http://www.r-bloggers.com/release-of-xlconnect-0-1-3/
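From the XLConnect side, the call might look roughly like this (a sketch; consult the package documentation for the exact interface):

```r
library(XLConnect)  # assumes XLConnect is installed; df is your data frame
## Sketch: write a data frame to a sheet of a new .xlsx workbook
writeWorksheetToFile("mydata.xlsx", data = df, sheet = "data")
```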

cheers

i

--- On Tue, 1/3/11, Steve Taylor  wrote:

> From: Steve Taylor 
> Subject: Re: [R] Export R dataframes to excel
> To: r-help@r-project.org, "maxsilva" 
> Date: Tuesday, 1 March, 2011, 20:15
> You can copy it with the following
> function and then paste into Excel...
>  
> copy = function (df, buffer.kb=256) {
>   write.table(df, file=paste("clipboard-", buffer.kb, sep=""),
>               sep="\t", na='', quote=FALSE, row.names=FALSE)
> }
> 
> 
> >>> 
> 
> From: maxsilva 
> To:
> Date: 2/Mar/2011 8:50a
> Subject: [R] Export R dataframes to excel
> 
> I'm trying to do this in several ways but havent had any
> result. Im asked to
> install python, or perl etc. Can anybody suggest a
> direct, easy and
> understandable way?  Every help would be appreciated.
> 
> 
> Thx.
> 
> -- 
> View this message in context: 
> http://r.789695.n4.nabble.com/Export-R-dataframes-to-excel-tp3330399p3330399.html
> 
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org
> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help 
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
> 
>     [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org
> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
>



Re: [R] Regression with many independent variables

2011-03-01 Thread Greg Snow
You can use ^2 to get all 2 way interactions and ^3 to get all 3 way 
interactions, e.g.:

lm(Sepal.Width ~ (. - Sepal.Length)^2, data=iris)

The lm.fit function is what actually does the fitting, so you could go directly 
there, but then you lose the benefits of using . and ^.  The Matrix package has 
ways of dealing with sparse matrices, but I don't know if that would help 
here or not.

You could also just create x'x and x'y matrices directly since the variables 
are 0/1, then use solve.  A lot depends on what you are doing and what questions 
you are trying to answer.
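Tying this back to the original data, the pairwise-interaction fit might be written as follows (a sketch assuming the adj0708 layout described earlier in the thread):

```r
## Sketch: main effects plus all two-way interactions of every column
## except the response (MARGIN) and the weight column (Poss).
fit2 <- lm(MARGIN ~ (. - Poss)^2, data = adj0708, weights = Poss)
```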

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: Matthew Douglas [mailto:matt.dougla...@gmail.com]
> Sent: Tuesday, March 01, 2011 1:09 PM
> To: Greg Snow
> Cc: r-help@r-project.org
> Subject: Re: [R] Regression with many independent variables
> 
> Hi Greg,
> 
> Thanks for the help, it works perfectly. To answer your question,
> there are 339 independent variables but only 10 will be used at one
> time . So at any given line of the data set there will be 10 non zero
> entries for the independent variables and the rest will be zeros.
> 
> One more question:
> 
> 1. I still want to find a way to look at the interactions of the
> independent variables.
> 
> the regression would look like this:
> 
> y = b12*X1*X2 + b23*X2*X3 + ... + b(k-1,k)*X(k-1)*Xk
> 
> so I think the regression in R would look like this:
> 
> lm(MARGIN, P235:P236+P236:P237+,weights = Poss, data = adj0708),
> 
> my problem is that since I have technically 339 independent variables,
> when I do this regression I would have 339 Choose 2 = approx 57000
> independent variables (a vast majority will be 0s though) so I dont
> want to have to write all of these out. Is there a way to do this
> quickly in R?
> 
> Also just a curious question that I cant seem to find to online:
> is there a more efficient model other than lm() that is better for
> very sparse data sets like mine?
> 
> Thanks,
> Matt
> 
> 
> On Mon, Feb 28, 2011 at 4:30 PM, Greg Snow  wrote:
> > Don't put the name of the dataset in the formula, use the data
> argument to lm to provide that.  A single period (".") on the right
> hand side of the formula will represent all the columns in the data set
> that are not on the left hand side (you can then use "-" to remove any
> other columns that you don't want included on the RHS).
> >
> > For example:
> >
> >> lm(Sepal.Width ~ . - Sepal.Length, data=iris)
> >
> > Call:
> > lm(formula = Sepal.Width ~ . - Sepal.Length, data = iris)
> >
> > Coefficients:
> >      (Intercept)       Petal.Length        Petal.Width
>  Speciesversicolor
> >           3.0485             0.1547             0.6234            -
> 1.7641
> >  Speciesvirginica
> >          -2.1964
> >
> >
> > But, are you sure that a regression model with 339 predictors will be
> meaningful?
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.s...@imail.org
> > 801.408.8111
> >
> >
> >> -Original Message-
> >> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> >> project.org] On Behalf Of Matthew Douglas
> >> Sent: Monday, February 28, 2011 1:32 PM
> >> To: r-help@r-project.org
> >> Subject: [R] Regression with many independent variables
> >>
> >> Hi,
> >>
> >> I am trying use lm() on some data, the code works fine but I would
> >> like to use a more efficient way to do this.
> >>
> >> The data looks like this (the data is very sparse with a few 1s, -1s
> >> and the rest 0s):
> >>
> >> > head(adj0708)
> >>       MARGIN Poss P235 P247 P703 P218 P430 P489 P83 P307 P337
> >> 1   64.28571   29    0    0    0    0    0    0   0    0    0    0
> >> 0    0    0
> >> 2 -100.0    6    0    0    0    0    0    0   0    1    0    0
> >> 0    0    0
> >> 3  100.0    4    0    0    0    0    0    0   0    1    0    0
> >> 0    0    0
> >> 4  -33.3    7    0    0    0    0    0    0   0    0    0    0
> >> 0    0    0
> >> 5  200.0    2    0    0    0    0    0    0   0    0    0    0
> >> -1    0    0
> >> 6  -83.3   12    0    -1    0    0    0    0   0    0    0    0
> >> 0    0    0
> >>
> >> adj0708 is actually a 35657x341 data set. Each column after "Poss"
> is
> >> an independent variable, the dependent variable is "MARGIN" and it
> is
> >> weighted by "Poss"
> >>
> >>
> >> The regression is below:
> >> fit.adj0708 <- lm( adj0708$MARGIN~adj0708$P235 + adj0708$P247 +
> >> adj0708$P703 + adj0708$P430 + adj0708$P489 + adj0708$P218 +
> >> adj0708$P605 + adj0708$P337 +  +
> >> adj0708$P510,weights=adj0708$Poss)
> >>
> >> I have two questions:
> >>
> >> 1. Is there a way to to condense how I write the independent
> variables
> >> in the lm(), instead of having such a long line of code (I have 339
> >> independent variables to be exact)?
> >> 2. I would like to pair the data to look at a regression of the
> >> interactions between two independent variables.

Re: [R] expression help

2011-03-01 Thread Dennis Murphy
Hi:

1. expression() in plotmath ignores control characters such as \n.
2. The workaround, discussed a couple of times on this list (hence in the
archives), is to use the atop function, so try something like

plot(0:1,0:1,xaxt="n")
axis(side=1,at=.3,expression(paste("IFN-", gamma, "\n", "TNF-", alpha)))
axis(side=1,at=.6,expression(atop("a label", "2nd line")))

It appears you'll have to make some vertical adjustments, but that's the
basic workaround.
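Applied to the original axis labels, the workaround might look like this (a sketch):

```r
plot(0:1, 0:1, xaxt = "n")
## atop() stacks its two arguments; "*" juxtaposes strings and symbols
axis(side = 1, at = .3,
     labels = expression(atop("IFN-" * gamma, "TNF-" * alpha)))
axis(side = 1, at = .6, labels = expression(atop("a label", "2nd line")))
```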

HTH,
Dennis

On Tue, Mar 1, 2011 at 11:32 AM, Daryl Morris  wrote:

> Hello,
>
> I am trying to write math-type on a plot.  Due to space limitations on the
> plot, I want 2 short expressions written on top of each other.  It is
> certainly possible to write them in two separate calls, but that involves
> fine-tuning locations and a lot of trial and error (and I'm trying to write
> a general purpose function).
>
> Here's where I've gotten to:
> plot(0:1,0:1,xaxt="n")
> axis(side=1,at=.3,expression(paste("IFN-", gamma, "\n", "TNF-", alpha)))
> axis(side=1,at=.6,"a label\n2nd line")
>
> What I am trying to do is illustrated by the the non-expression axis label
> ("a label\n2nd line").  The "\n" forces a new line when I'm not using
> expressions, but doesn't work for my real example.
>
> I have googled for general documentation on expressions, but the only thing
> I've been able to find is the help for plotmath(), so if someone can point
> me to more complete documentation that would also be helpful.
>
> thanks, Daryl
> SCHARP, FHCRC, UW Biostatistics
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>




Re: [R] Speed up sum of outer products?

2011-03-01 Thread Dennis Murphy
...and this is where we cue the informative article on least squares
calculations in R by Doug Bates:

http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf

HTH,
Dennis

On Tue, Mar 1, 2011 at 10:52 AM, AjayT  wrote:

> Hey thanks a lot guys!!! That really speeds things up!!! I didn't know %*%
> and crossprod could operate on matrices. I think you've saved me hours of
> calculation time. Thanks again.
>
> system.time({C=matrix(0,50,50);for(i in 1:n)C = C + (X[i,] %o% X[i,])})
>    user  system elapsed
>    0.45    0.00    0.90
> system.time({C1 = t(X) %*% X})
>    user  system elapsed
>    0.02    0.00    0.05
> system.time({C2 = crossprod(X)})
>    user  system elapsed
>    0.02    0.00    0.02
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Speed-up-sum-of-outer-products-tp3330160p3330378.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>




Re: [R] inefficient ifelse() ?

2011-03-01 Thread ivo welch
thanks, Henrique.  did you mean

as.vector(t(mapply(function(x, f) f(x), split(t, ((t %% 2)==0)),
           list(f, g)))) ?

otherwise, you get a matrix.

it's a good solution, but unfortunately I don't think this can be used
to redefine ifelse(cond,ift,iff) in a way that is transparent.  the
ift and iff functions will always be evaluated before the function
call happens, even with lazy evaluation.  :-(

I still think that it makes sense to have a smarter vectorized %if% in
a vectorized language like R.  just my 5 cents.
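
One way to get branch-wise evaluation today is to apply each function only to the elements its branch selects. A minimal sketch (`ifelse2` is a made-up helper name, not part of base R):

```r
# Sketch: evaluate each branch function only on the elements it applies to,
# then reassemble the results in the original order.
ifelse2 <- function(cond, yes_fun, no_fun, x) {
  out <- numeric(length(x))
  out[cond]  <- yes_fun(x[cond])   # yes_fun never sees the FALSE elements
  out[!cond] <- no_fun(x[!cond])   # no_fun never sees the TRUE elements
  out
}

t <- 1:30
f <- function(t) 2 * t
g <- function(t) 3 * t
s <- ifelse2(t %% 2 == 0, g, f, t)   # g sees only even t, f only odd t
```

Unlike ifelse(), each function here is called once on a subset, so side effects (or expensive computations) happen only for the relevant elements.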

/iaw


Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)





On Tue, Mar 1, 2011 at 2:33 PM, Henrique Dallazuanna  wrote:
> Try this:
>
> mapply(function(x, f)f(x), split(t, t %% 2), list(g, f))
>
> On Tue, Mar 1, 2011 at 4:19 PM, ivo welch  wrote:
>>
>> dear R experts---
>>
>>  t <- 1:30
>>  f <- function(t) { cat("f for", t, "\n"); return(2*t) }
>>  g <- function(t) { cat("g for", t, "\n"); return(3*t) }
>>  s <- ifelse( t%%2==0, g(t), f(t))
>>
>> shows that the ifelse function actually evaluates both f() and g() for
>> all values first, and presumably then just picks left or right results
>> based on t%%2.  uggh... wouldn't it make more sense to evaluate only
>> the relevant parts of each vector and then reassemble them?
>>
>> /iaw
>> 
>> Ivo Welch
>>
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>



Re: [R] Lattice: useOuterStrips and axes

2011-03-01 Thread Peter Ehlers

On 2011-03-01 10:29, Jim Price wrote:

Consider the following:


library(lattice)
library(latticeExtra)

temp <- expand.grid(
  subject = factor(paste('Subject', 1:3)),
  var = factor(paste('Variable', 1:3)),
  time = 1:10
)
temp$resp <- rnorm(nrow(temp), 10 * as.numeric(temp$var), 1)

ylimits <- by(temp$resp, temp$var, function(x) range(pretty(x)))

useOuterStrips(xyplot(
  resp ~ time | subject * var,
  data = temp,
  as.table = TRUE,
  scales = list(
    alternating = 1, tck = c(1, 0),
    y = list(relation = 'free', rot = 0, limits = rep(ylimits, each = 3))
  )
))


This is a matrix of variables on subjects, where it makes sense to have
panel-specific y-axes because of the differing variable ranges. In fact, it
makes sense to have row-specific y-axes, because of the similarity of
intra-variable inter-subject response. However the graphic as presented
gives per-panel y-axis annotation - I'd like to drop the 2nd and 3rd columns
of y-axes labels, so the panels become a contiguous array.

Is this possible?


Looks like you want the combineLimits() function in latticeExtra.
Specifically,

 p <- [your code]
 combineLimits(p, margin.y = 1, extend = FALSE)

Peter Ehlers
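
Putting that suggestion together with the original code gives a self-contained sketch (assumes the latticeExtra package is installed):

```r
library(lattice)
library(latticeExtra)

# Rebuild the example data from the original post
temp <- expand.grid(
  subject = factor(paste('Subject', 1:3)),
  var = factor(paste('Variable', 1:3)),
  time = 1:10
)
temp$resp <- rnorm(nrow(temp), 10 * as.numeric(temp$var), 1)

p <- useOuterStrips(xyplot(
  resp ~ time | subject * var,
  data = temp,
  as.table = TRUE,
  scales = list(alternating = 1, tck = c(1, 0),
                y = list(relation = 'free', rot = 0))
))

# Collapse the free y-limits within each row into one shared axis,
# dropping the redundant axis labels in the 2nd and 3rd columns
combineLimits(p, margin.y = 1, extend = FALSE)
```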



Thanks,
Jim Price.
Cardiome Pharma. Corp.







Re: [R] Export R dataframes to excel

2011-03-01 Thread Don McKenzie

Or

?write.csv

which excel will import

On 1-Mar-11, at 12:17 PM, Berend Hasselman wrote:



maxsilva wrote:


Thx, but I'm looking for a more direct solution... my problem is very
simple: I have a dataframe and I want to create a standard Excel
spreadsheet. My dataframe could be something like this



More or less the same question  was answered several hours ago.
See  http://rwiki.sciviews.org/doku.php?id=tips:data- 
io:ms_windows&s=excel

as Gabor Grothendieck suggested.

/Berend

--
View this message in context: http://r.789695.n4.nabble.com/Export- 
R-dataframes-to-excel-tp3330399p3330518.html

Sent from the R help mailing list archive at Nabble.com.



Why does the universe go to all the bother of existing?
-- Stephen Hawking

#define QUESTION ((bb) || !(bb))
-- William Shakespeare



Don McKenzie, Research Ecologist
Pacific WIldland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Forest Resources, College of the Environment
CSES Climate Impacts Group
University of Washington

desk: 206-732-7824
cell: 206-321-5966
d...@uw.edu
donaldmcken...@fs.fed.us



Re: [R] Export R dataframes to excel

2011-03-01 Thread Berend Hasselman

maxsilva wrote:
> 
> Thx, but I'm looking for a more direct solution... my problem is very
> simple: I have a dataframe and I want to create a standard Excel
> spreadsheet. My dataframe could be something like this
> 

More or less the same question  was answered several hours ago.
See  http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=excel
as Gabor Grothendieck suggested.

/Berend

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Export-R-dataframes-to-excel-tp3330399p3330518.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Export R dataframes to excel

2011-03-01 Thread Steve Taylor
You can copy it with the following function and then paste into Excel...
 
copy <- function(df, buffer.kb = 256) {
  # Windows only: a file name of "clipboard-<n>" writes to the clipboard
  # with an enlarged buffer of n KB (the default 32 KB is often too small)
  write.table(df, file = paste("clipboard-", buffer.kb, sep = ""),
              sep = "\t", na = "", quote = FALSE, row.names = FALSE)
}


>>> 

From: maxsilva 
To:
Date: 2/Mar/2011 8:50a
Subject: [R] Export R dataframes to excel

I'm trying to do this in several ways but haven't had any result. I'm asked to
install Python, or Perl, etc. Can anybody suggest a direct, easy and
understandable way? Any help would be appreciated.

Thx.

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Export-R-dataframes-to-excel-tp3330399p3330399.html
 
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Data type problem when extract data from SQLite to R by using RSQLite

2011-03-01 Thread Seth Falcon
On Tue, Mar 1, 2011 at 10:06 AM, chen jia  wrote:
> Hi Seth,
>
> Thanks so much for identifying the problem and explaining everything.
> I think the first solution that you suggest--make sure the schema has
> well defined types--would work the best for me. But, I have one
> question about how to implement it, which is more about sqlite itself.
>
> First, I found out that the columns that don't have the expected data
> types in the table annual_data3 are created by aggregate functions in
> a separate table. These columns are later combined with other columns
> that do.
>
> I read the link that you provide,
> http://www.sqlite.org/datatype3.html. One paragraph says "When
> grouping values with the GROUP BY clause values with different storage
> classes are considered distinct, except for INTEGER and REAL values
> which are considered equal if they are numerically equal. No
> affinities are applied to any values as the result of a GROUP by
> clause."
>
> If I understand it correctly, the columns created by aggregate
> functions with a GROUP by clause do not have any expected data types.
>
> My solution is to use CREATE TABLE clause to declare the expected
> datatype and then insert the values of columns created by the
> aggregate functions with the GROUP by clause. However, this solution
> requires a CREATE TABLE cause every time the aggregate function and
> the GROUP by clause is used.
>
> My question is: Is this the best way to make sure that the columns as
> a result of a GROUP by clause have the expected data types? Thanks.

That might be a good question to post to the SQLite user's list :-)

I don't have an answer off the top of my head.  My reading of the
SQLite docs would lead me to expect that a GROUP BY clause would not
change/remove type if the column being grouped contains all the same
declared type affinity.

+ seth
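
Jia's CREATE-TABLE-then-INSERT idea can be sketched from R, using the current DBI API (a sketch under assumptions: the RSQLite package is installed, and the table/column names here are invented for illustration):

```r
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "raw", data.frame(grp = c(1, 1, 2), val = c(1.5, 2.5, 4)))

# Declare the column types up front so the aggregate results keep a
# declared type affinity that RSQLite can fall back on for NULLs
dbExecute(con, "CREATE TABLE agg (grp INT, mean_val REAL)")
dbExecute(con, "INSERT INTO agg SELECT grp, AVG(val) FROM raw GROUP BY grp")

res <- dbGetQuery(con, "SELECT * FROM agg ORDER BY grp")
dbDisconnect(con)
res   # mean_val comes back as a numeric column
```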

>
> Best,
> Jia
>
> On Tue, Mar 1, 2011 at 1:16 AM, Seth Falcon  wrote:
>> Hi Jia,
>>
>> On Mon, Feb 28, 2011 at 6:57 PM, chen jia  wrote:
>>> The .schema of table annual_data3 is
>>> sqlite> .schema annual_data3
>>> CREATE TABLE "annual_data3"(
>>>  PERMNO INT,
>>>  DATE INT,
>>>  CUSIP TEXT,
>>>  EXCHCD INT,
>>>  SICCD INT,
>>>  SHROUT INT,
>>>  PRC REAL,
>>>  RET REAL,
>>>  ...
>>>  pret_var,
>>>  pRET_sd,
>>>  nmret,
>>>  pya_var,
>>
>> [snip]
>>
>> Is there a reason that you've told SQLite the expected data type for
>> only some of the columns?
>>
>>> Interestingly, I find that the problem I reported does not for columns
>>> labeled real in the schema info. For example, the type of column RET
>>> never changes no matter what the first observation is.
>>
>> Yes, that is expected and I think it is the solution to your problem:
>> setup your schema so that all columns have a declared type.  For some
>> details on SQLite's type system see
>> http://www.sqlite.org/datatype3.html.
>>
>> RSQLite currently maps NA values to NULL in the database.  Pulling
>> data out of a SELECT query, RSQLite uses the sqlite3_column_type
>> SQLite API to determine the data type and map it to an R type.  If
>> NULL is encountered, then the schema is inspected using
>> sqlite3_column_decltype to attempt to obtain a type.  If that fails,
>> the data is mapped to a character vector at the R level.  The type
>> selection is done once after the first row has been fetched.
>>
>> To work around this you can:
>>
>> - make sure your schema has well defined
>>  types (which will help SQLite perform its operations);
>>
>> - check whether the returned column has the expected type and convert
>>  if needed at the R level.
>>
>> - remove NA/NULL values from the db or decide on a different way of
>>  encoding them (e.g you might be able to use -1 in the db in some
>>  situation to indicate missing).  Your R code would then need to map
>>  these to proper NA.
>>
>> Hope that helps.
>>
>> + seth
>>
>>
>>
>> --
>> Seth Falcon | @sfalcon | http://userprimary.net/
>>
>
>
>
> --
> 700 Fisher Hall
> 2100 Neil Ave.
> Columbus, Ohio  43210
> http://www.fisher.osu.edu/~chen_1002/
>



-- 
Seth Falcon | @sfalcon | http://userprimary.net/



Re: [R] Export R dataframes to excel

2011-03-01 Thread KATSCHKE, ADRIAN CIV DFAS
write.table() using sep="," and a .csv file extension works well for pulling the 
data directly into Excel.

?write.table 

Without more detail as to the problem, it is difficult to give a more specific 
answer.
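
A minimal sketch of that approach (the data frame and file name are arbitrary examples):

```r
# Write a data frame as comma-separated values; Excel opens .csv directly
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
write.csv(df, file = "mydata.csv", row.names = FALSE)

# equivalently, spelled out with write.table()
write.table(df, file = "mydata.csv", sep = ",", row.names = FALSE)
```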

Adrian 
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of maxsilva
Sent: Tuesday, March 01, 2011 2:11 PM
To: r-help@r-project.org
Subject: [R] Export R dataframes to excel

I'm trying to do this in several ways but haven't had any result. I'm asked to
install Python, or Perl, etc. Can anybody suggest a direct, easy and
understandable way? Any help would be appreciated.

Thx.

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Export-R-dataframes-to-excel-tp3330399p3330399.html
Sent from the R help mailing list archive at Nabble.com.



[R] odbcConnectExcel2007 creates corrupted files

2011-03-01 Thread Mark Lyman
I tried creating a .xlsx file using odbcConnectExcel2007 and adding a 
worksheet with sqlSave. This seems to work, I am even able to query the 
worksheet, but when I try opening the file in Excel I get the following 
message: "Excel cannot open the file 'test.xlsx' because the file format or 
file extension is not valid. Verify that the file has not been corrupted and 
that the file extension matches the format of the file." Is this a known 
issue? Or just user error? The RODBC manual seemed to indicate that sqlSave 
worked fine with Excel, however it did not mention Excel 2007.

I am running Excel 2007 and R 2.12.1 on Windows XP. Below is my example code. 

$ library(RODBC)
$ 
$ # This doesn't work
$ # Connect to an previously non-existent Excel file
$ out <- odbcConnectExcel2007("test.xlsx", readOnly=FALSE)
$ test <- data.frame(x=1:10, y=rnorm(10))
$ sqlSave(out, test)
$ sqlTables(out)
                                                    TABLE_CAT TABLE_SCHEM TABLE_NAME   TABLE_TYPE REMARKS
1 C:\\Documents and Settings\\G69974\\My Documents\\test.xlsx                  test$ SYSTEM TABLE
2 C:\\Documents and Settings\\G69974\\My Documents\\test.xlsx                   test        TABLE
$ sqlFetch(out, "test")
x  y
1   1  0.5832882
2   2  0.4387569
3   3 -0.6444048
4   4 -1.0013450
5   5  1.0324718
6   6 -0.7844128
7   7 -1.6789266
8   8  0.1402672
9   9  0.8650061
10 10 -0.0420201
$ close(out)
$ # Opening test.xlsx now fails
$ 
$ # This works
$ out <- odbcConnectExcel("test.xls", readOnly=FALSE)
$ test <- data.frame(x=1:10, y=rnorm(10))
$ sqlSave(out, test)
$ sqlTables(out)
                                               TABLE_CAT TABLE_SCHEM TABLE_NAME   TABLE_TYPE REMARKS
1 C:\\Documents and Settings\\G69974\\My Documents\\test                  test$ SYSTEM TABLE
2 C:\\Documents and Settings\\G69974\\My Documents\\test                   test        TABLE
$ sqlFetch(out, "test")
x  y
1   1  0.5955787
2   2  1.0517528
3   3  0.3884892
4   4 -2.1408813
5   5 -0.7081686
6   6  0.1511828
7   7  2.0560555
8   8 -0.5801912
9   9 -0.6988058
10 10 -0.1237739
$ close(out)
$ # Opening test.xls now works



[R] expression help

2011-03-01 Thread Daryl Morris

Hello,

I am trying to write math-type on a plot.  Due to space limitations on 
the plot, I want 2 short expressions written on top of each other.  It 
is certainly possible to write them in two separate calls, but that 
involves fine-tuning locations and a lot of trial and error (and I'm 
trying to write a general purpose function).


Here's where I've gotten to:
plot(0:1,0:1,xaxt="n")
axis(side=1,at=.3,expression(paste("IFN-", gamma, "\n", "TNF-", alpha)))
axis(side=1,at=.6,"a label\n2nd line")

What I am trying to do is illustrated by the the non-expression axis 
label ("a label\n2nd line").  The "\n" forces a new line when I'm not 
using expressions, but doesn't work for my real example.


I have googled for general documentation on expressions, but the only 
thing I've been able to find is the help for plotmath(), so if someone 
can point me to more complete documentation that would also be helpful.


thanks, Daryl
SCHARP, FHCRC, UW Biostatistics
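
One standard plotmath idiom for stacking two lines (since "\n" is ignored inside expressions) is atop(); a sketch, not tested against the real labels:

```r
plot(0:1, 0:1, xaxt = "n")
# atop(a, b) draws expression a directly above expression b
axis(side = 1, at = 0.3,
     labels = expression(atop("IFN-" * gamma, "TNF-" * alpha)))
axis(side = 1, at = 0.6, labels = "a label\n2nd line")
```

The `*` operator juxtaposes the quoted text and the Greek symbol without extra space; see ?plotmath for the full grammar.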



[R] Export R dataframes to excel

2011-03-01 Thread maxsilva
I'm trying to do this in several ways but haven't had any result. I'm asked to
install Python, or Perl, etc. Can anybody suggest a direct, easy and
understandable way? Any help would be appreciated.

Thx.

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Export-R-dataframes-to-excel-tp3330399p3330399.html
Sent from the R help mailing list archive at Nabble.com.



[R] glht() used with coxph()

2011-03-01 Thread array chip
Hi, I am experimenting with using glht() from the multcomp package together with 
coxph(), and am glad to find that glht() can work on a coxph object, for example:

> (fit<-coxph(Surv(stop, status>0)~treatment,bladder1))
coxph(formula = Surv(stop, status > 0) ~ treatment, data = bladder1)


  coef exp(coef) se(coef)  zp
treatmentpyridoxine -0.063 0.9390.161 -0.391 0.70
treatmentthiotepa   -0.159 0.8530.168 -0.947 0.34

Likelihood ratio test=0.91  on 2 df, p=0.635  n= 294 

> glht(fit,linfct=mcp(treatment='Tukey'))

 General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Linear Hypotheses:
   Estimate
pyridoxine - placebo == 0  -0.06303
thiotepa - placebo == 0-0.15885
thiotepa - pyridoxine == 0 -0.09582

However, once I added a strata term in the formula of coxph(), then glht() 
can't 
work anymore:

> (fit<-coxph(Surv(stop, status>0)~treatment+strata(enum),bladder1))
coxph(formula = Surv(stop, status > 0) ~ treatment + strata(enum), 
data = bladder1)


  coef exp(coef) se(coef) zp
treatmentpyridoxine  0.188  1.210.170  1.11 0.27
treatmentthiotepa   -0.210  0.810.172 -1.22 0.22

Likelihood ratio test=4.39  on 2 df, p=0.111  n= 294 

> glht(fit,linfct=mcp(treatment='Tukey'))
Error in glht.matrix(model = list(coefficients = c(0.187792527684977,  : 
  ‘ncol(linfct)’ is not equal to ‘length(coef(model))’

Can anyone suggest why strata would make coxph object ineligible for glht()? Or 
how to make it work?

Thanks

John
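
Until the mcp()/strata interaction is resolved, one possible workaround is to hand glht() a contrast matrix built directly against coef(fit), whose column count then matches by construction. A sketch (assumes the survival and multcomp packages; not verified against multcomp's internals):

```r
library(survival)
library(multcomp)

fit <- coxph(Surv(stop, status > 0) ~ treatment + strata(enum), bladder1)

# Tukey contrasts written directly on the two treatment coefficients
# (coef order: pyridoxine, thiotepa; placebo is the reference level)
K <- rbind("pyridoxine - placebo"  = c(1,  0),
           "thiotepa - placebo"    = c(0,  1),
           "thiotepa - pyridoxine" = c(-1, 1))
colnames(K) <- names(coef(fit))

summary(glht(fit, linfct = K))
```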


  


Re: [R] inefficient ifelse() ?

2011-03-01 Thread Henrique Dallazuanna
Try this:

mapply(function(x, f)f(x), split(t, t %% 2), list(g, f))

On Tue, Mar 1, 2011 at 4:19 PM, ivo welch  wrote:

> dear R experts---
>
>  t <- 1:30
>  f <- function(t) { cat("f for", t, "\n"); return(2*t) }
>  g <- function(t) { cat("g for", t, "\n"); return(3*t) }
>  s <- ifelse( t%%2==0, g(t), f(t))
>
> shows that the ifelse function actually evaluates both f() and g() for
> all values first, and presumably then just picks left or right results
> based on t%%2.  uggh... wouldn't it make more sense to evaluate only
> the relevant parts of each vector and then reassemble them?
>
> /iaw
> 
> Ivo Welch
>
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



[R] inefficient ifelse() ?

2011-03-01 Thread ivo welch
dear R experts---

  t <- 1:30
  f <- function(t) { cat("f for", t, "\n"); return(2*t) }
  g <- function(t) { cat("g for", t, "\n"); return(3*t) }
  s <- ifelse( t%%2==0, g(t), f(t))

shows that the ifelse function actually evaluates both f() and g() for
all values first, and presumably then just picks left or right results
based on t%%2.  uggh... wouldn't it make more sense to evaluate only
the relevant parts of each vector and then reassemble them?

/iaw

Ivo Welch



Re: [R] bootstrap resampling - simplified

2011-03-01 Thread Dennis Murphy
Hi:

On Tue, Mar 1, 2011 at 8:22 AM, Bodnar Laszlo EB_HU <
laszlo.bod...@erstebank.hu> wrote:

> Hello there,
>
> I have a problem concerning bootstrapping in R - especially focusing on the
> resampling part of it. I try to sum it up in a simplified way so that I
> would not confuse anybody.
>
> I have a small database consisting of 20 observations (basically numbers
> from 1 to 20, I mean: 1, 2, 3, 4, 5, ... 18, 19, 20).
>

To check on the probability of this event happening, I ran the following:
bootmat <- matrix(sample(1:20, 20 * 10000, replace = TRUE), nrow = 10000)
sum(apply(bootmat, 1, function(x) any(table(x) >= 5)))
[1] 492

It's about 0.05. A Q&D (quick and dirty) 'solution' would be to oversample by
at least 5% (let's do 10% just to be on the safe side) and then pick out the
first B of these. In the above example, we could do 11000 samples instead, and
pick out the first 10000 that meet the criterion:

bootmat <- matrix(sample(1:20, 20 * 11000, replace = TRUE), nrow = 11000)
badsamps <- apply(bootmat, 1, function(x) any(tabulate(x) >= 5))
bootfin <- bootmat[!badsamps, ][1:10000, ]

Time:
   user  system elapsed
   0.28    0.00    0.28

(Note 1: Using table instead of tabulate took 4.22 seconds on my machine -
tabulate is much faster.)
(Note 2: In the call above, there were 539 bad samples, so the 5% ballpark
estimate seems plausible.)

This is a simple application of the accept-reject criterion. I don't know
how large 'many' is to you, but 10,000 seems to be a reasonable starting
point. I ran it again for 1,000,000 such samples, and the completion time
was
   user  system elapsed
  36.74    0.31   37.15
so the processing time is of an order a bit larger than linear. If your
simulations are of this magnitude and are to be run repeatedly, you probably
need to write a function to improve the speed and to get rid of the waste
produced by a rejection sampling approach. If this is a one-off deal,
perhaps the above is sufficient.

HTH,
Dennis

> I would like to resample this database many times for the bootstrap process
> with the following conditions. Firstly, every resampled database should also
> include 20 observations. Secondly, when selecting a number from the
> above-mentioned 20 numbers, you can do this selection with replacement. The
> difficult part comes now: one number can be selected only maximum 5 times.
> In order to make this clear I show you a couple of examples. So the
> resampled databases might be like the following ones:
>
> (1st database)  1,2,1,2,1,2,1,2,1,2,3,3,3,3,3,4,4,4,4,4
> 4 different numbers are chosen (1, 2, 3, 4), each selected - for the
> maximum possible - 5 times.
>
> (2nd database)  1,8,8,6,8,8,8,2,3,4,5,6,6,6,6,7,19,1,1,1
> Two numbers - 8 and 6 - selected 5 times (the maximum possible times),
> number 1 selected 4 times, the others selected less than 4 times.
>
> (3rd database)  1,1,2,2,3,3,4,4,9,9,9,10,10,13,10,9,3,9,2,1
> Number 9 chosen for the maximum possible 5 times, number 10, 3, 2, 1 chosen
> for 3 times, number 4 selected twice and number 13 selected only once.
>
> ...
>
> Anybody knows how to implement my "tricky" condition into one of the R
> functions - that one number can be selected only 5 times at most? Are 'boot'
> and 'bootstrap' packages capable of managing this? I guess they are, I just
> couldn't figure it out yet...
>
> Thanks very much! Best regards,
> Laszlo Bodnar
>
>
>
> 
>
>
>
>


Re: [R] Finding pairs with least magnitude difference from mean

2011-03-01 Thread rex.dwyer
No, that's not what I meant, but maybe I didn't understand the question.
What I suggested would involve sorting y, not x: "sort the *distances*".
If you want to minimize the sd of a subset of numbers, you sort the numbers and 
find a subset that is clumped together.
If the numbers are a function of pairs, you compute the function for all pairs 
of numbers, and find a subset that's clumped together.
Anyway, it's an idea, not a theorem, so proof is left as an exercise for the 
esteemed reader.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Hans W Borchers
Sent: Monday, February 28, 2011 2:17 PM
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] Finding pairs with least magnitude difference from mean

  syngenta.com> writes:

> James,
> It seems the 2*mean(x) term is irrelevant if you are seeking to
> minimize sd. Then you want to sort the distances from smallest to
> largest. Then it seems clear that your five values will be adjacent in
> the list, since if you have a set of five adjacent values, exchanging
> any of them for one further away in the list will increase the sd. The
> only problem I see with this is that you can't use a number more than
> once. In any case, you need to compute the best five pairs beginning
> at position i in the sorted list, for 1<=i<=choose(n,2), then take the
> max over all i.
> There no R in my answer such as you'd notice, but I hope it helps just
> the same.
> Rex

You probably mean something like the following:

x <- rnorm(10)
y <- outer(x, x, "+") - (2 * mean(x))

o <- order(x)
sd(c(y[o[1],o[10]], y[o[2],o[9]], y[o[3],o[8]], y[o[4],o[7]], y[o[5],o[6]]))

This seems reasonable, though you would have to supply a more stringent
argument. I did two tests and it works alright.

--Hans Werner







Re: [R] Speed up sum of outer products?

2011-03-01 Thread AjayT
Hey thanks a lot guys!!! That really speeds things up!!! I didn't know %*%
and crossprod could operate on matrices. I think you've saved me hours of
calculation time. Thanks again.

> system.time({C = matrix(0, 50, 50); for (i in 1:n) C = C + (X[i,] %o% X[i,])})
   user  system elapsed
   0.45    0.00    0.90
> system.time({C1 = t(X) %*% X})
   user  system elapsed
   0.02    0.00    0.05
> system.time({C2 = crossprod(X)})
   user  system elapsed
   0.02    0.00    0.02

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Speed-up-sum-of-outer-products-tp3330160p3330378.html
Sent from the R help mailing list archive at Nabble.com.



[R] Lattice: useOuterStrips and axes

2011-03-01 Thread Jim Price
Consider the following:


library(lattice)
library(latticeExtra)

temp <- expand.grid(
subject = factor(paste('Subject', 1:3)),
var = factor(paste('Variable', 1:3)),
time = 1:10
)
temp$resp <- rnorm(nrow(temp), 10 * as.numeric(temp$var), 1)

ylimits <- by(temp$resp, temp$var, function(x) range(pretty(x)))

useOuterStrips(xyplot(
  resp ~ time | subject * var,
  data = temp,
  as.table = TRUE,
  scales = list(
    alternating = 1, tck = c(1, 0),
    y = list(relation = 'free', rot = 0, limits = rep(ylimits, each = 3))
  )
))


This is a matrix of variables on subjects, where it makes sense to have
panel-specific y-axes because of the differing variable ranges. In fact, it
makes sense to have row-specific y-axes, because of the similarity of
intra-variable inter-subject response. However the graphic as presented
gives per-panel y-axis annotation - I'd like to drop the 2nd and 3rd columns
of y-axes labels, so the panels become a contiguous array.

Is this possible?

Thanks,
Jim Price.
Cardiome Pharma. Corp.



-- 
View this message in context: 
http://r.789695.n4.nabble.com/Lattice-useOuterStrips-and-axes-tp3330338p3330338.html
Sent from the R help mailing list archive at Nabble.com.



[R] more boa plots questions

2011-03-01 Thread emj83
I have MCMC output chains A and B for example, I want to produce trace plots
for them using the boa command line...

#loads boa
boa.init()
#reads in chains
boa.chain.add(boa.importMatrix('A'), 'A')
boa.chain.add(boa.importMatrix('B'), 'B')
#plot trace plot
problems arise here!

I know I can get trace plots using boa.plot('trace') but this plots the
parameter chains on the same plot- I want separate plots using
boa.plot.trace()

#from the manual..
boa.plot.trace(lnames, pname, annotate = boa.par("legend"))
lnames: Character vector giving the name of the desired MCMC sequence
   in the working session list of sequences.
pname: Character string giving the name of the parameters to be
  plotted.
annotate: Logical value indicating that a legend be included in the
  plot.

I tried boa.plot.trace(B) and boa.plot.trace(B,"B") but both do not give me
a trace plot for chain B and print FALSE. I am obviously misinterpreting
lnames and pnames.

Can anyone help please?
Thanks in advance Emma





-- 
View this message in context: 
http://r.789695.n4.nabble.com/more-boa-plots-questions-tp3330312p3330312.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] bootstrap resampling question

2011-03-01 Thread Jonathan P Daily
I'm not sure that is equivalent to sampling with replacement, since if the 
first "draw" is 1, then the probability that the next draw will be 1 is 
4/99 instead of the 1/20 it would be in sampling with replacement. I 
think the way to do this would be what Greg suggested - something like:

bigsamp <- sample(1:20, 100, T)
idx <- sort(unlist(sapply(1:20, function(x) which(bigsamp == 
x)[1:5])))[1:20]
samp <- bigsamp[idx]
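
As a quick sanity check of the snippet above (a sketch only; the seed is arbitrary), one can verify that the capped draw behaves as intended:

```r
set.seed(1)  # arbitrary seed, for reproducibility only
bigsamp <- sample(1:20, 100, replace = TRUE)
# keep at most the first five occurrences of each value, then take the
# 20 earliest surviving draws
idx <- sort(unlist(sapply(1:20, function(x) which(bigsamp == x)[1:5])))[1:20]
samp <- bigsamp[idx]

length(samp)           # 20 observations
max(table(samp)) <= 5  # TRUE: no value selected more than five times
```

In the (very unlikely) event that fewer than 20 indices survive the cap, idx would contain NAs, so a defensive version might draw a larger bigsamp first.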

--
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when it's empty? Does the room,
 the thing itself have purpose? Or do we, what's the word... imbue it."
 - Jubal Early, Firefly

r-help-boun...@r-project.org wrote on 03/01/2011 09:37:31 AM:

> [image removed] 
> 
> Re: [R] bootstrap resampling question
> 
> Giovanni Petris 
> 
> to:
> 
> Bodnar Laszlo EB_HU
> 
> 03/01/2011 11:58 AM
> 
> Sent by:
> 
> r-help-boun...@r-project.org
> 
> Cc:
> 
> "'r-help@r-project.org'"
> 
> A simple way of sampling with replacement from 1:20, with the additional
> constraint that each number can be selected at most five times is
> 
> > sample(rep(1:20, 5), 20)
> 
> HTH,
> Giovanni
> 
> On Tue, 2011-03-01 at 11:30 +0100, Bodnar Laszlo EB_HU wrote:
> > Hello there,
> > 
> > I have a problem concerning bootstrapping in R - especially 
> focusing on the resampling part of it. I try to sum it up in a 
> simplified way so that I would not confuse anybody.
> > 
> > I have a small database consisting of 20 observations (basically 
> numbers from 1 to 20, I mean: 1, 2, 3, 4, 5, ... 18, 19, 20).
> > 
> > I would like to resample this database many times for the 
> bootstrap process with the following two conditions. The resampled 
> databases should also have 20 observations and you can select each 
> of the previously mentioned 20 numbers with replacement. I guess it 
> is obvious so far. Now the more difficult second condition is that 
> one number can be selected only maximum 5 times. In order to make 
> this clear I try to show you an example. So there can be resampled 
> databases like the following ones:
> > 
> > (1st database)  1,2,1,2,1,2,1,2,1,2,3,3,3,3,3,4,4,4,4,4
> > (4 different numbers are chosen, each selected 5 times)
> > 
> > (2nd database)  1,8,8,6,8,8,8,2,3,4,5,6,6,6,6,7,19,1,1,1
> > (Two numbers - 8 and 6 - selected 5 times, number "1" selected 
> four times, the others selected less than 4 times)
> > 
> > My very first guess that came to my mind whilst thinking about the
> problem was the sample function where there are settings like 
> replace=TRUE and prob=... where you can create a probability vector 
> i.e. how much should be the probability of selecting a number. So I 
> tried to calculate probabilities first. I thought the problem can 
> basically described as a k-combination with repetitions. 
> Unfortunately the only thing I could calculate so far is the total 
> number of all possible selections which amounts to 137 846 527 049.
> > 
> > Anybody knows how to implement my second "tricky" condition into 
> one of the R functions? Are 'boot' and 'bootstrap' packages capable 
> of managing this? I guess they are, I just couldn't figure it out yet...
> > 
> > Thanks very much! Best regards,
> > Laszlo Bodnar
> > 
> > 
> 

> > 
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> -- 
> 
> Giovanni Petris  
> Associate Professor
> Department of Mathematical Sciences
> University of Arkansas - Fayetteville, AR 72701
> Ph: (479) 575-6324, 575-8630 (fax)
> http://definetti.uark.edu/~gpetris/
> 
> _

Re: [R] boa library and plots

2011-03-01 Thread emj83
Many thanks for your response, and I am sorry I did not post correctly.

I have found dev.copy2eps() useful.

Emma

-- 
View this message in context: 
http://r.789695.n4.nabble.com/boa-library-and-plots-tp3322508p3330299.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Does POSIXlt extract date components properly?

2011-03-01 Thread Prof Brian Ripley

On Tue, 1 Mar 2011, Seth W Bigelow wrote:


I would like to use POSIX classes to store dates and extract components of
dates. Following the example in Spector ("Data Manipulation with R"), I
create a date


mydate = as.POSIXlt('2005-4-19 7:01:00')


I then successfully extract the day with the command


mydate$day

[1] 19

But when I try to extract the month

> mydate$mon
[1] 3

it returns the wrong month. And mydate$year is off by about 2,000 years.
Am I doing something wrong?


Not reading the documentation (nor the posting guide). 
?DateTimeClasses says


 ‘mon’ 0-11: months after the first of the year.

 ‘year’ years since 1900.

That is the POSIX standard ... you could also have looked there.


Dr. Seth  W. Bigelow
Biologist, USDA-FS Pacific Southwest Research Station
1731 Research Park Drive, Davis California
sbige...@fs.fed.us /  ph. 530 759 1718
[[alternative HTML version deleted]]


Please note what the posting guide said about that!


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Adjusting values via vectorization

2011-03-01 Thread Albert Vernon Smith
I'm adjusting values in a vector based on a couple of matrices.  One matrix 
specifies the row to be taken from the adjustment matrix, while the matching 
column is used directly.  I have an approach that works, but I suspect there 
is an approach using vectorization. 

Here is code with my solution:

--
nids <- 10
npredictors <- 2
ncol <- 4
values <- sample(c(-1,0,1),nids,replace=TRUE)
input <- matrix(sample(1:ncol,nids*npredictors,replace=TRUE),nrow=nids)
values.adjust <- matrix(rnorm(ncol*npredictors),ncol=npredictors)

for(i in 1:nids){
  for(j in 1:npredictors){
values[i] <- values[i] + values.adjust[input[i,j],j]
  }
}
--

I'm using this as an example to hopefully understand R syntax w.r.t. 
vectorization better.  Is there a way to replace my for loops?
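
One possible vectorized replacement (a sketch, not necessarily the only way) uses R's two-column matrix indexing to pull out all the adjustments at once, then sums them per row:

```r
nids <- 10
npredictors <- 2
ncol <- 4
set.seed(42)  # arbitrary seed so both versions see the same data
values <- sample(c(-1, 0, 1), nids, replace = TRUE)
input <- matrix(sample(1:ncol, nids * npredictors, replace = TRUE), nrow = nids)
values.adjust <- matrix(rnorm(ncol * npredictors), ncol = npredictors)

# loop version, as in the post
loop <- values
for (i in 1:nids) {
  for (j in 1:npredictors) {
    loop[i] <- loop[i] + values.adjust[input[i, j], j]
  }
}

# vectorized: values.adjust[cbind(row, col)] extracts one element per
# (row, col) pair; reshape to nids x npredictors and sum across predictors
adj <- matrix(values.adjust[cbind(as.vector(input),
                                  rep(1:npredictors, each = nids))],
              nrow = nids)
vectorized <- values + rowSums(adj)

all.equal(loop, vectorized)  # TRUE
```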

Thanks,
-albert
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data type problem when extract data from SQLite to R by using RSQLite

2011-03-01 Thread chen jia
Hi Seth,

Thanks so much for identifying the problem and explaining everything.
I think the first solution that you suggest--make sure the schema has
well defined types--would work the best for me. But, I have one
question about how to implement it, which is more about sqlite itself.

First, I found out that the columns that don't have the expected data
types in the table annual_data3 are created by aggregate functions in
a separate table. These columns are later combined with other columns
that do.

I read the link that you provide,
http://www.sqlite.org/datatype3.html. One paragraph says "When
grouping values with the GROUP BY clause values with different storage
classes are considered distinct, except for INTEGER and REAL values
which are considered equal if they are numerically equal. No
affinities are applied to any values as the result of a GROUP by
clause."

If I understand it correctly, the columns created by aggregate
functions with a GROUP by clause do not have any expected data types.

My solution is to use a CREATE TABLE statement to declare the expected
data types and then insert the values of the columns created by the
aggregate functions with the GROUP BY clause. However, this solution
requires a CREATE TABLE statement every time an aggregate function and
the GROUP BY clause are used.

My question is: Is this the best way to make sure that the columns
resulting from a GROUP BY clause have the expected data types? Thanks.
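
For what it's worth, a minimal sketch of that approach driven from R (the table and column names here are hypothetical; it uses the current DBI interface, dbExecute, where 2011-era RSQLite would run the same statements through dbGetQuery):

```r
library(DBI)  # assumes the DBI and RSQLite packages are installed
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "raw",
             data.frame(grp = c("a", "a", "b"), x = c(1.5, 2.5, NA)))

# declare the result columns with explicit types, then fill them from
# the aggregate query so the declared REAL affinity applies
dbExecute(con, "CREATE TABLE agg (grp TEXT, mean_x REAL)")
dbExecute(con, "INSERT INTO agg SELECT grp, AVG(x) FROM raw GROUP BY grp")
res <- dbGetQuery(con, "SELECT * FROM agg")
dbDisconnect(con)
```

With the declared type in place, mean_x comes back numeric even when the first row of the result is NULL.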

Best,
Jia

On Tue, Mar 1, 2011 at 1:16 AM, Seth Falcon  wrote:
> Hi Jia,
>
> On Mon, Feb 28, 2011 at 6:57 PM, chen jia  wrote:
>> The .schema of table annual_data3 is
>> sqlite> .schema annual_data3
>> CREATE TABLE "annual_data3"(
>>  PERMNO INT,
>>  DATE INT,
>>  CUSIP TEXT,
>>  EXCHCD INT,
>>  SICCD INT,
>>  SHROUT INT,
>>  PRC REAL,
>>  RET REAL,
>>  ...
>>  pret_var,
>>  pRET_sd,
>>  nmret,
>>  pya_var,
>
> [snip]
>
> Is there a reason that you've told SQLite the expected data type for
> only some of the columns?
>
>> Interestingly, I find that the problem I reported does not for columns
>> labeled real in the schema info. For example, the type of column RET
>> never changes no matter what the first observation is.
>
> Yes, that is expected and I think it is the solution to your problem:
> setup your schema so that all columns have a declared type.  For some
> details on SQLite's type system see
> http://www.sqlite.org/datatype3.html.
>
> RSQLite currently maps NA values to NULL in the database.  Pulling
> data out of a SELECT query, RSQLite uses the sqlite3_column_type
> SQLite API to determine the data type and map it to an R type.  If
> NULL is encountered, then the schema is inspected using
> sqlite3_column_decltype to attempt to obtain a type.  If that fails,
> the data is mapped to a character vector at the R level.  The type
> selection is done once after the first row has been fetched.
>
> To work around this you can:
>
> - make sure your schema has well defined
>  types (which will help SQLite perform its operations);
>
> - check whether the returned column has the expected type and convert
>  if needed at the R level.
>
> - remove NA/NULL values from the db or decide on a different way of
>  encoding them (e.g you might be able to use -1 in the db in some
>  situation to indicate missing).  Your R code would then need to map
>  these to proper NA.
>
> Hope that helps.
>
> + seth
>
>
>
> --
> Seth Falcon | @sfalcon | http://userprimary.net/
>



-- 
700 Fisher Hall
2100 Neil Ave.
Columbus, Ohio  43210
http://www.fisher.osu.edu/~chen_1002/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SetInternet2, RCurl and proxy

2011-03-01 Thread Manta
It says the function does not exist. The version is around 2.8; I can't check
right now. Is it because it's an older version? If so, is there another way
to do it?

-- 
View this message in context: 
http://r.789695.n4.nabble.com/SetInternet2-RCurl-and-proxy-tp3248576p3330244.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nls not solving

2011-03-01 Thread Peter Ehlers

On 2011-03-01 06:38, Schatzi wrote:

Here is a reply by Bert:
Yes you're right (I should have taken off my glasses and looked closer).
However, the argument is essentially the same:

Suppose you have a solution with a,b,k,l. Then for any positive c, [a+b-bc]
+ [bc] - (bc)*exp(kl')exp(-kx) is also a solution, where l'
= l - log(c)/k  .

Cheers,
Bert

(Feel free to post this correction if you like)


This is from me:
The problem with dropping the "l" parameter is that it is supposed to
account for the lag component. This equation was published in the literature
and has been solved in SAS. When I put it in Excel, it solves, but not
very well, as it comes to a different solution each time I change
the starting values. As such, I'm not sure how SAS solves for it and I'm not
sure what I should do about the equation. Maybe I should just drop the
parameter "a." Thanks for the help.


When you say 'published in the literature' you should
provide a reference; you may be misinterpreting what's
published.

If SAS provides a 'solution', then there's an added
assumption being made (perhaps 'l' is being fixed?).
What Excel does is of little interest.

'Dropping' the parameter 'a' is equivalent to setting a=0.
You could also set, say, a = -10 or l = 50, or ...
The point is that, as Bert says, the model is
nonidentifiable.

Peter Ehlers

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Speed up sum of outer products?

2011-03-01 Thread Doran, Harold
Isn't the following the canonical (R-ish) way of doing this:

X = matrix(rnorm(1000*50),1000,50)
system.time({C1 = t(X) %*% X}) # Phil's example

C2 <- crossprod(X) # use crossprod instead

> all.equal(C1,C2)
[1] TRUE
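
For completeness, a small sketch comparing all three formulations from this thread (the loop of outer products, the explicit transpose, and crossprod); all agree up to floating-point error:

```r
set.seed(1)
X <- matrix(rnorm(1000 * 50), 1000, 50)

# loop of outer products (the slow original)
C0 <- matrix(0, 50, 50)
for (i in 1:nrow(X)) C0 <- C0 + X[i, ] %o% X[i, ]

C1 <- t(X) %*% X    # explicit transpose, then multiply
C2 <- crossprod(X)  # one optimized call, no intermediate copy of t(X)

all.equal(C0, C1)  # TRUE, up to numerical tolerance
all.equal(C1, C2)  # TRUE
```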

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Phil Spector
> Sent: Tuesday, March 01, 2011 12:31 PM
> To: AjayT
> Cc: r-help@r-project.org
> Subject: Re: [R] Speed up sum of outer products?
> 
> What you're doing is breaking up the calculation of X'X
> into n steps.   I'm not sure what you mean by "very slow":
> 
> > X = matrix(rnorm(1000*50),1000,50)
> > n = 1000
> > system.time({C=matrix(0,50,50);for(i in 1:n)C = C + (X[i,] %o% X[i,])})
> user  system elapsed
>0.096   0.008   0.104
> 
> Of course, you could just do the calculation directly:
> 
> > system.time({C1 = t(X) %*% X})
> user  system elapsed
>0.008   0.000   0.007
> > all.equal(C,C1)
> [1] TRUE
> 
> 
>   - Phil Spector
>Statistical Computing Facility
>Department of Statistics
>UC Berkeley
>spec...@stat.berkeley.edu
> 
> 
> 
> On Tue, 1 Mar 2011, AjayT wrote:
> 
> > Hi, I'm new to R and stats, and I'm trying to speed up the following sum,
> >
> > for (i in 1:n){
> > C = C + (X[i,] %o% X[i,])   # the sum of outer products - this is very
> slow
> > according to Rprof()
> > }
> >
> > where X is a data matrix (nrows=1000 X ncols=50), and n=1000. The sum has to
> > be calculated over 10,000 times for different X.
> >
> > I think it is similar to estimating a covariance matrix for demeaned data
> > X. I tried using cov, but got different answers, and it wasn't much quicker.
> >
> > Any help gratefully appreciated,
> >
> > --
> > View this message in context: http://r.789695.n4.nabble.com/Speed-up-sum-of-
> outer-products-tp3330160p3330160.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Speed up sum of outer products?

2011-03-01 Thread Phil Spector
What you're doing is breaking up the calculation of X'X 
into n steps.   I'm not sure what you mean by "very slow":



X = matrix(rnorm(1000*50),1000,50)
n = 1000
system.time({C=matrix(0,50,50);for(i in 1:n)C = C + (X[i,] %o% X[i,])})

   user  system elapsed
  0.096   0.008   0.104

Of course, you could just do the calculation directly:


system.time({C1 = t(X) %*% X})

   user  system elapsed
  0.008   0.000   0.007 

all.equal(C,C1)

[1] TRUE


- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu



On Tue, 1 Mar 2011, AjayT wrote:


Hi, I'm new to R and stats, and I'm trying to speed up the following sum,

for (i in 1:n){
C = C + (X[i,] %o% X[i,])   # the sum of outer products - this is very 
slow
according to Rprof()
}

where X is a data matrix (nrows=1000 X ncols=50), and n=1000. The sum has to
be calculated over 10,000 times for different X.

I think it is similar to estimating a covariance matrix for demeaned data
X. I tried using cov, but got different answers, and it wasn't much quicker.

Any help gratefully appreciated,

--
View this message in context: 
http://r.789695.n4.nabble.com/Speed-up-sum-of-outer-products-tp3330160p3330160.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SetInternet2, RCurl and proxy

2011-03-01 Thread Uwe Ligges



On 01.03.2011 12:00, Manta wrote:

Dear all,

I am facing a problem. I am trying to install packages using a proxy, but I
am not able to call the setInternet2 function, either with the small or
capital s. What package do I have to call then? And, could there be a reason
why this does not function?


What does the error message say? Is this a recent version of R?

Uwe Ligges



Thanks,
Marco



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Does POSIXlt extract date components properly?

2011-03-01 Thread Luke Miller
Month counts from 0 in POSIXlt objects, so that April is month 3 in your
example, January being month 0.

Year counts from 1900 in POSIXlt objects, so that 2005 should return as 105
in your example.

All of the other fields in POSIXlt return the values you would expect a
priori. Keep an eye on daylight saving time adjustments if they apply in
your time zone.
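
Concretely, for the example in the original post:

```r
mydate <- as.POSIXlt("2005-04-19 07:01:00")
mydate$mon + 1      # 4: April ($mon counts months from 0)
mydate$year + 1900  # 2005 ($year counts years from 1900)
mydate$mday         # 19: day of month (the component is $mday)
```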

On Tue, Mar 1, 2011 at 12:01 PM, Seth W Bigelow  wrote:

> I would like to use POSIX classes to store dates and extract components of
> dates. Following the example in Spector ("Data Manipulation with R"), I
> create a date
>
> > mydate = as.POSIXlt('2005-4-19 7:01:00')
>
> I then successfully extract the day with the command
>
> > mydate$day
> [1] 19
>
> But when I try to extract the month
>
>  > mydate$mon
> [1] 3
>
> it returns the wrong month. And mydate$year is off by about 2,000 years.
> Am I doing something wrong?
>
> Dr. Seth  W. Bigelow
> Biologist, USDA-FS Pacific Southwest Research Station
> 1731 Research Park Drive, Davis California
> sbige...@fs.fed.us /  ph. 530 759 1718
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
___
Luke Miller
Postdoctoral Researcher
Marine Science Center
Northeastern University
Nahant, MA

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is there any Command showing correlation of all variables in a dataset?

2011-03-01 Thread Liviu Andronic
On Tue, Mar 1, 2011 at 11:41 AM, JoonGi  wrote:
>
> Thanks in advance.
>
> I want to derive correlations of variables in a dataset
>
> Specifically
>
> library(Ecdat)
> data(Housing)
> attach(Housing)
> cor(lotsize, bathrooms)
>
> this code results only the correlationship between two variables.
> But I want to examine all the combinations of variables in this dataset.
> And I will finally make a table in Latex.
>
> How can I test correlations for all combinations of variables?
> with one simple command?
>
See Rcmdr for an example.
Liviu


>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Is-there-any-Command-showing-correlation-of-all-variables-in-a-dataset-tp3329599p3329599.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problems with playwith

2011-03-01 Thread Liviu Andronic
On Tue, Mar 1, 2011 at 5:58 PM, R Heberto Ghezzo, Dr
 wrote:
> hello, i tried to run playwith but :
>
>> library(playwith)
> Loading required package: lattice
> Loading required package: cairoDevice
> Loading required package: gWidgetsRGtk2
> Loading required package: gWidgets
> Error in inDL(x, as.logical(local), as.logical(now), ...) :
>  unable to load shared object 'H:/R/cran/RGtk2/libs/i386/RGtk2.dll':
>  LoadLibrary failure:  The specified procedure could not be found.
> Failed to load RGtk2 dynamic library, attempting to install it.
>
Did you install RGtk2? [1]
Liviu

[1] https://code.google.com/p/playwith/


> Learn more about GTK+ at http://www.gtk.org
> If the package still does not load, please ensure that GTK+ is installed and 
> that it is on your PATH environment variable
> IN ANY CASE, RESTART R BEFORE TRYING TO LOAD THE PACKAGE AGAIN
> Error : .onAttach failed in attachNamespace() for 'gWidgetsRGtk2', details:
>  call: .Call(name, ..., PACKAGE = PACKAGE)
>  error: C symbol name "S_gtk_icon_factory_new" not in DLL for package "RGtk2"
> Error: package 'gWidgetsRGtk2' could not be loaded
>>
>> Sys.getenv("PATH")
>                                                                               
>                                                                               
>                                                                               
>                                                                               
>            PATH
> "H:\\R/GTK/bin;H:\\R/GTK/lib;H:\\R/ImageMagick;C:\\windows\\system32;C:\\windows;C:\\windows\\System32\\Wbem;C:\\windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Program
>  Files\\Common Files\\Ulead Systems\\MPEG;C:\\Program 
> Files\\QuickTime\\QTSystem\\;H:\\R\\GTK\\GTK2-Runtime\\bin;H:\\PortableUSB/PortableApps/MikeTex/miktex/bin"
>>
> packages(lattice, cairoDevice, gWidgetsRGtk2, gWidgets, RGtk2, playwith) were 
> reinstalled
> program GTK was reinstalled.
> using R-2-12-2 on Windows 7
> Can anybody suggest a solution?
> thanks
>
> R.Heberto Ghezzo Ph.D.
> Montreal - Canada
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is there any Command showing correlation of all variables in a dataset?

2011-03-01 Thread Scott Chamberlain
The cor.prob function gives a matrix of correlation coefficients and p-values together:


### Function for calculating a correlation matrix: correlations below
### the diagonal, p-values above the diagonal
cor.prob <- function(X, dfr = nrow(X) - 2) {
  R <- cor(X)
  above <- row(R) < col(R)
  r2 <- R[above]^2
  Fstat <- r2 * dfr / (1 - r2)
  R[above] <- 1 - pf(Fstat, 1, dfr)
  R
}
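
A quick usage sketch (repeating the function so the block is self-contained, and using a built-in data set as hypothetical input):

```r
# cor.prob as defined above
cor.prob <- function(X, dfr = nrow(X) - 2) {
  R <- cor(X)
  above <- row(R) < col(R)
  r2 <- R[above]^2
  Fstat <- r2 * dfr / (1 - r2)
  R[above] <- 1 - pf(Fstat, 1, dfr)
  R
}

# correlations below the diagonal, p-values above it
round(cor.prob(mtcars[, c("mpg", "disp", "hp")]), 4)
```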


Scott 
On Tuesday, March 1, 2011 at 9:54 AM, rex.dw...@syngenta.com wrote:
?cor answers that question. If Housing is a dataframe, cor(Housing) should do 
it. Surprisingly, ??correlation doesn't point you to ?cor.
> 
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of JoonGi
> Sent: Tuesday, March 01, 2011 5:41 AM
> To: r-help@r-project.org
> Subject: [R] Is there any Command showing correlation of all variables in a 
> dataset?
> 
> 
> Thanks in advance.
> 
> I want to derive correlations of variables in a dataset
> 
> Specifically
> 
> library(Ecdat)
> data(Housing)
> attach(Housing)
> cor(lotsize, bathrooms)
> 
> this code results only the correlationship between two variables.
> But I want to examine all the combinations of variables in this dataset.
> And I will finally make a table in Latex.
> 
> How can I test correlations for all combinations of variables?
> with one simple command?
> 
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Is-there-any-Command-showing-correlation-of-all-variables-in-a-dataset-tp3329599p3329599.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mlogit.data

2011-03-01 Thread Mark Difford
My previous posting seems to have got mangled. This reposts it.

On Mar 01, 2011; 03:32pm gmacfarlane wrote:

>> workdata.csv
>> The code I posted is exactly what I am running. What you need is this
>> data. Here is the code again. 
> hbwmode<-mlogit.data("worktrips.csv", shape="long", choice="CHOSEN",
> alt.var="ALTNUM") 
> hbwmode<-mlogit.data(hbwtrips, shape="long", choice="CHOSEN",
> alt.var="ALTNUM")

You still have not done what the posting guide asks for but have expected me
(or someone else) to scrutinize a large unknown data set (22003 rows). 

Fortunately there are other routes. Had you studied Yves Croissant's
examples (?mlogit.data), which do work, you would have seen that your input
or "raw" data have to have a particular format for mlogit.data to work.

In particular, the "alt.var" ("mode" in the TravelMode data set and "ALTNUM"
in your data set) has to go through all its levels in sequence. Yours don't
(your variable has 6 levels but sometimes runs from 1 to 5, sometimes from 2
to 6, and so on). Within each run there must be only one choice.

##
> library(mlogit)
> data("TravelMode", package = "AER")
> head(TravelMode, n= 20)
   individual  mode choice wait vcost travel gcost income size
1           1   air     no   69    59    100    70     35    1
2           1 train     no   34    31    372    71     35    1
3           1   bus     no   35    25    417    70     35    1
4           1   car    yes    0    10    180    30     35    1
5           2   air     no   64    58     68    68     30    2
6           2 train     no   44    31    354    84     30    2
7           2   bus     no   53    25    399    85     30    2
8           2   car    yes    0    11    255    50     30    2
9           3   air     no   69   115    125   129     40    1
10          3 train     no   34    98    892   195     40    1
11          3   bus     no   35    53    882   149     40    1
12          3   car    yes    0    23    720   101     40    1
13          4   air     no   64    49     68    59     70    3
14          4 train     no   44    26    354    79     70    3
15          4   bus     no   53    21    399    81     70    3
16          4   car    yes    0     5    180    32     70    3
17          5   air     no   64    60    144    82     45    2
18          5 train     no   44    32    404    93     45    2
19          5   bus     no   53    26    449    94     45    2
20          5   car    yes    0     8    600    99     45    2

When we look at just the relevant part of your data we have the following:

> hbwtrips<-read.csv("E:/Downloads/workdata.csv", header=TRUE, sep=",",
> dec=".", row.names=NULL)
> head(hbwtrips[, c(2:11)], n=25)
   HHID PERID CASE ALTNUM NUMALTS CHOSEN  IVTT  OVTT  TVTT   COST
1     2     1    1      1       5      1 13.38  2.00 15.38  70.63
2     2     1    1      2       5      0 18.38  2.00 20.38  35.32
3     2     1    1      3       5      0 20.38  2.00 22.38  20.18
4     2     1    1      4       5      0 25.90 15.20 41.10 115.64
5     2     1    1      5       5      0 40.50  2.00 42.50   0.00
6     3     1    2      1       5      0 29.92 10.00 39.92 390.81
7     3     1    2      2       5      0 34.92 10.00 44.92 195.40
8     3     1    2      3       5      0 21.92 10.00 31.92  97.97
9     3     1    2      4       5      1 22.96 14.20 37.16 185.00
10    3     1    2      5       5      0 58.95 10.00 68.95   0.00
11    5     1    3      1       4      1  8.60  6.00 14.60  37.76
12    5     1    3      2       4      0 13.60  6.00 19.60  18.88
13    5     1    3      3       4      0 15.60  6.00 21.60  10.79
14    5     1    3      4       4      0 16.87 21.40 38.27 105.00
15    6     1    4      1       4      0 30.60  8.50 39.10 417.32
16    6     1    4      2       4      0 35.70  8.50 44.20 208.66
17    6     1    4      3       4      0 22.70  8.50 31.20 105.54
18    6     1    4      4       4      1 24.27  9.00 33.27 193.49
19    8     2    5      2       4      1 23.04  3.00 26.04  29.95
20    8     2    5      3       4      0 25.04  3.00 28.04  17.12
21    8     2    5      4       4      0 25.04 23.50 48.54 100.00
22    8     2    5      5       4      0 34.35  3.00 37.35   0.00
23    8     3    6      2       5      0 11.14  3.50 14.64  14.00
24    8     3    6      3       5      0 13.14  3.50 16.64   8.00
25    8     3    6      4       5      1  3.95 16.24 20.19 100.00

To show you that this is so we will mock up two variables that have the
characteristics described above and use them to execute the function.

##
hbwtrips$CHOICEN <- rep(c(rep(0,10),1), 2003)
hbwtrips$ALTNUMTest <- gl(11,1,22033, labels=LETTERS[1:11])
hbwtrips[1:30, c(1:11,44,45)]
hbwmode <- mlogit.data(hbwtrips, varying=c(8:11), shape="long",
choice="CHOICEN", 
  alt.var="ALTNUMTest")

Hope that helps,

Regards, Mark.

-- 
View this message in context: 
http://r.789695.n4.nabble.com/mlogit-data-tp3328739p3330148.html
Sent from the R help mailing list archive at Nabble.com.

[R] Speed up sum of outer products?

2011-03-01 Thread AjayT
Hi, I'm new to R and stats, and I'm trying to speed up the following sum,

for (i in 1:n){
C = C + (X[i,] %o% X[i,])   # the sum of outer products - this is very slow according to Rprof()
}

where X is a data matrix (nrows=1000 X ncols=50), and n=1000. The sum has to
be calculated over 10,000 times for different X. 

I think it is similar to estimating a covariance matrix for demeaned data
X. I tried using cov, but got different answers, and it wasn't much quicker.
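For what it's worth, the whole loop collapses to a single crossprod() call, since the sum of the outer products X[i,] %o% X[i,] over all rows is exactly t(X) %*% X. A small sketch with random data of the stated dimensions:

```r
set.seed(1)
X <- matrix(rnorm(1000 * 50), nrow = 1000, ncol = 50)
n <- nrow(X)

## loop version: sum of outer products, as in the question
C_loop <- matrix(0, ncol(X), ncol(X))
for (i in 1:n) {
  C_loop <- C_loop + (X[i, ] %o% X[i, ])
}

## vectorized version: t(X) %*% X in one optimized call
C_fast <- crossprod(X)

all.equal(C_loop, C_fast)  # TRUE (up to floating-point error)
```

cov() gives different numbers because it first demeans each column and divides by n - 1, which would explain the "different answers" you saw.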

Any help gratefully appreciated, 

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Speed-up-sum-of-outer-products-tp3330160p3330160.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Does POSIXlt extract date components properly?

2011-03-01 Thread Seth W Bigelow
I would like to use POSIX classes to store dates and extract components of 
dates. Following the example in Spector ("Data Manipulation in R"), I 
create a date

> mydate = as.POSIXlt('2005-4-19 7:01:00')

I then successfully extract the day with the command

> mydate$day
[1] 19

But when I try to extract the month

 > mydate$mon
[1] 3

it returns the wrong month. And mydate$year is off by about 2,000 years. 
Am I doing something wrong?
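For reference: nothing is wrong here, this is documented POSIXlt behaviour (see ?DateTimeClasses). Months are stored 0-based (0 = January), years are stored as an offset from 1900, and the day-of-month field is actually called mday:

```r
mydate <- as.POSIXlt("2005-4-19 7:01:00")

mydate$mday         # 19   (day of month; the field is 'mday')
mydate$mon  + 1     # 4    (months are stored 0-based: 0 = January)
mydate$year + 1900  # 2005 (years are stored as an offset from 1900)

## or let format() do the arithmetic for you:
format(mydate, "%d %m %Y")  # "19 04 2005"
```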

Dr. Seth  W. Bigelow
Biologist, USDA-FS Pacific Southwest Research Station
1731 Research Park Drive, Davis California
sbige...@fs.fed.us /  ph. 530 759 1718
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problems with playwith

2011-03-01 Thread R Heberto Ghezzo, Dr
hello, i tried to run playwith but :

> library(playwith)
Loading required package: lattice
Loading required package: cairoDevice
Loading required package: gWidgetsRGtk2
Loading required package: gWidgets
Error in inDL(x, as.logical(local), as.logical(now), ...) : 
  unable to load shared object 'H:/R/cran/RGtk2/libs/i386/RGtk2.dll':
  LoadLibrary failure:  The specified procedure could not be found.
Failed to load RGtk2 dynamic library, attempting to install it.
Learn more about GTK+ at http://www.gtk.org
If the package still does not load, please ensure that GTK+ is installed and 
that it is on your PATH environment variable
IN ANY CASE, RESTART R BEFORE TRYING TO LOAD THE PACKAGE AGAIN
Error : .onAttach failed in attachNamespace() for 'gWidgetsRGtk2', details:
  call: .Call(name, ..., PACKAGE = PACKAGE)
  error: C symbol name "S_gtk_icon_factory_new" not in DLL for package "RGtk2"
Error: package 'gWidgetsRGtk2' could not be loaded
> 
> Sys.getenv("PATH")




PATH 
"H:\\R/GTK/bin;H:\\R/GTK/lib;H:\\R/ImageMagick;C:\\windows\\system32;C:\\windows;C:\\windows\\System32\\Wbem;C:\\windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Program
 Files\\Common Files\\Ulead Systems\\MPEG;C:\\Program 
Files\\QuickTime\\QTSystem\\;H:\\R\\GTK\\GTK2-Runtime\\bin;H:\\PortableUSB/PortableApps/MikeTex/miktex/bin"
 
> 
packages(lattice, cairoDevice, gWidgetsRGtk2, gWidgets, RGtk2, playwith) were 
reinstalled
program GTK was reinstalled.
using R-2-12-2 on Windows 7
Can anybody suggest a solution?
thanks

R.Heberto Ghezzo Ph.D.
Montreal - Canada
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] bootstrap resampling question

2011-03-01 Thread Giovanni Petris
A simple way of sampling with replacement from 1:20, with the additional
constraint that each number can be selected at most five times is

> sample(rep(1:20, 5), 20)

HTH,
Giovanni

On Tue, 2011-03-01 at 11:30 +0100, Bodnar Laszlo EB_HU wrote:
> Hello there,
> 
> I have a problem concerning bootstrapping in R - especially focusing on the 
> resampling part of it. I try to sum it up in a simplified way so that I would 
> not confuse anybody.
> 
> I have a small database consisting of 20 observations (basically numbers from 
> 1 to 20, I mean: 1, 2, 3, 4, 5, ... 18, 19, 20).
> 
> I would like to resample this database many times for the bootstrap process 
> with the following two conditions. The resampled databases should also have 
> 20 observations and you can select each of the previously mentioned 20 
> numbers with replacement. I guess it is obvious so far. Now the more 
> difficult second condition is that one number can be selected only maximum 5 
> times. In order to make this clear I try to show you an example. So there can 
> be resampled databases like the following ones:
> 
> (1st database)  1,2,1,2,1,2,1,2,1,2,3,3,3,3,3,4,4,4,4,4
> (4 different numbers are chosen, each selected 5 times)
> 
> (2nd database)  1,8,8,6,8,8,8,2,3,4,5,6,6,6,6,7,19,1,1,1
> (Two numbers - 8 and 6 - selected 5 times, number "1" selected four times, 
> the others selected less than 4 times)
> 
> My very first guess that came to my mind whilst thinking about the problem 
> was the sample function where there are settings like replace=TRUE and 
> prob=... where you can create a probability vector i.e. how much should be 
> the probability of selecting a number. So I tried to calculate probabilities 
> first. I thought the problem can basically described as a k-combination with 
> repetitions. Unfortunately the only thing I could calculate so far is the 
> total number of all possible selections which amounts to 137 846 527 049.
> 
> Anybody knows how to implement my second "tricky" condition into one of the R 
> functions? Are 'boot' and 'bootstrap' packages capable of managing this? I 
> guess they are, I just couldn't figure it out yet...
> 
> Thanks very much! Best regards,
> Laszlo Bodnar
> 
> 
> [Translated from Hungarian:] This e-mail and all attached files may contain 
> confidential and/or legally, professionally or otherwise protected 
> information. If you are not the addressee of this letter, any disclosure, 
> reproduction, copying or other distribution or use of its contents is 
> strictly prohibited. If you received this message in error, please notify 
> the sender immediately. Erste Bank Hungary Zrt. (EBH) accepts no liability 
> for the complete and accurate delivery of the information to the 
> addressee(s), nor for any delay, for errors caused by connection failures, 
> or for damages arising from the use of the information or from its 
> unreliability.
> 
> Senders and recipients of messages outside EBH acknowledge and consent that 
> other bank employees may also access the messages in order to ensure EBH's 
> continuous operation.
> 
> 
> This e-mail and any attached files are confidential and/...{{dropped:19}}
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 

Giovanni Petris  
Associate Professor
Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701
Ph: (479) 575-6324, 575-8630 (fax)
http://definetti.uark.edu/~gpetris/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] bootstrap resampling question

2011-03-01 Thread Greg Snow
Here are a couple of thoughts.

If you want to use the boot package then the statistic function you give it 
just receives the bootstrapped indexes, you could test the indexes for your 
condition of not more than 5 of each and if it fails return an NA instead of 
computing the statistic.  Then in your output just remove the NAs (you should 
increase the total number of samples tried so that you have a reasonable number 
after deletion).

If you want to do it by hand, just use the rep function to create 5 replicates 
of your data, then sample from that without replacement.  You can get up to 5 
copies of each value (from the hand replication), but no more since it samples 
without replacement.
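The second (rep-then-sample) suggestion can be sketched in a couple of lines; boot_constrained is just a made-up wrapper name for illustration:

```r
set.seed(42)
x <- 1:20  # the original sample

## replicate each value 5 times, then sample *without* replacement:
## no value can possibly appear more than 5 times in the resample
resample <- sample(rep(x, 5), length(x))
table(resample)       # every count is <= 5

## wrapped up for many bootstrap replicates (one resample per column)
boot_constrained <- function(x, B, kmax = 5) {
  replicate(B, sample(rep(x, kmax), length(x)))
}
draws <- boot_constrained(1:20, B = 1000)
max(apply(draws, 2, function(d) max(table(d))))  # never exceeds 5
```

Note that the rep-then-sample scheme does not have exactly the same sampling distribution as independent draws with rejection of over-represented values, but it guarantees the constraint by construction.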

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of Bodnar Laszlo EB_HU
> Sent: Tuesday, March 01, 2011 3:31 AM
> To: 'r-help@r-project.org'
> Subject: [R] bootstrap resampling question
> 
> Hello there,
> 
> 
> 
> I have a problem concerning bootstrapping in R - especially focusing on
> the resampling part of it. I try to sum it up in a simplified way so
> that I would not confuse anybody.
> 
> 
> 
> I have a small database consisting of 20 observations (basically
> numbers from 1 to 20, I mean: 1, 2, 3, 4, 5, ... 18, 19, 20).
> 
> 
> 
> I would like to resample this database many times for the bootstrap
> process with the following two conditions. The resampled databases
> should also have 20 observations and you can select each of the
> previously mentioned 20 numbers with replacement. I guess it is obvious
> so far. Now the more difficult second condition is that one number can
> be selected only maximum 5 times. In order to make this clear I try to
> show you an example. So there can be resampled databases like the
> following ones:
> 
> 
> 
> (1st database)  1,2,1,2,1,2,1,2,1,2,3,3,3,3,3,4,4,4,4,4
> 
> (4 different numbers are chosen, each selected 5 times)
> 
> 
> 
> (2nd database)  1,8,8,6,8,8,8,2,3,4,5,6,6,6,6,7,19,1,1,1
> 
> (Two numbers - 8 and 6 - selected 5 times, number "1" selected four
> times, the others selected less than 4 times)
> 
> 
> 
> My very first guess that came to my mind whilst thinking about the
> problem was the sample function where there are settings like
> replace=TRUE and prob=... where you can create a probability vector
> i.e. how much should be the probability of selecting a number. So I
> tried to calculate probabilities first. I thought the problem can
> basically described as a k-combination with repetitions. Unfortunately
> the only thing I could calculate so far is the total number of all
> possible selections which amounts to 137 846 527 049.
> 
> 
> 
> Anybody knows how to implement my second "tricky" condition into one of
> the R functions? Are 'boot' and 'bootstrap' packages capable of
> managing this? I guess they are, I just couldn't figure it out yet...
> 
> 
> 
> Thanks very much! Best regards,
> 
> Laszlo Bodnar
> 
> 
> 
> ___
> _
> 
> 
> 
> 
> 
> 
> This e-mail and any attached files are confidential
> and/...{{dropped:19}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem on flexmix when trying to apply signature developed in one model to a new sample

2011-03-01 Thread Jon Toledo


Problem on flexmix when trying to apply a signature developed in one model to 
a new sample.

Dear R Users, R Core Team,

I have a problem when trying to know the classification of the tested cases 
using two variables with the function of flexmix:

After importing the database and creating a matrix:

BM <- cbind(Data$var1, Data$var2)

I see that the best model has 2 groups and use:

ex2 <- flexmix(BM~1, k=2, model=FLXMCmvnorm(diagonal=FALSE))
print(ex2)
plotEll(ex2, BM)

Then I want to test to which group one subset of patients belongs, so I 
import a smaller sample of the previous data:

BM2 <- data.frame(Data2$var1, Data2$var2)

However, when I test, the results I get are from the complete training sample 
I used in ex2 and not from the new sample BM2.

ProbMCI <- posterior(ex2, BM2)

And if I do the following I get double the number of entered cases (I think 
because I entered 2 variables):

BM2 <- cbind(Data2$var1, Data2$var2)
p <- posterior(ex2)[BMMCI,]
max.col(p)

(The same with clusters(ex2)[BM2])

In the future I would like to test the result of this mixture also in new 
samples.

Thank you in advance
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] bootstrap resampling - simplified

2011-03-01 Thread Bodnar Laszlo EB_HU
Hello there,

I have a problem concerning bootstrapping in R - especially focusing on the 
resampling part of it. I try to sum it up in a simplified way so that I would 
not confuse anybody.

I have a small database consisting of 20 observations (basically numbers from 1 
to 20, I mean: 1, 2, 3, 4, 5, ... 18, 19, 20).

I would like to resample this database many times for the bootstrap process 
with the following conditions. Firstly, every resampled database should also 
include 20 observations. Secondly, when selecting a number from the 
above-mentioned 20 numbers, you can do this selection with replacement. The 
difficult part comes now: one number can be selected only maximum 5 times. In 
order to make this clear I show you a couple of examples. So the resampled 
databases might be like the following ones:

(1st database)  1,2,1,2,1,2,1,2,1,2,3,3,3,3,3,4,4,4,4,4
4 different numbers are chosen (1, 2, 3, 4), each selected - for the maximum 
possible - 5 times.

(2nd database)  1,8,8,6,8,8,8,2,3,4,5,6,6,6,6,7,19,1,1,1
Two numbers - 8 and 6 - selected 5 times (the maximum possible times), number 1 
selected 4 times, the others selected less than 4 times.

(3rd database)  1,1,2,2,3,3,4,4,9,9,9,10,10,13,10,9,3,9,2,1
Number 9 chosen for the maximum possible 5 times, number 10, 3, 2, 1 chosen for 
3 times, number 4 selected twice and number 13 selected only once.

...

Anybody knows how to implement my "tricky" condition into one of the R 
functions - that one number can be selected only 5 times at most? Are 'boot' 
and 'bootstrap' packages capable of managing this? I guess they are, I just 
couldn't figure it out yet...

Thanks very much! Best regards,
Laszlo Bodnar





This e-mail and any attached files are confidential and/...{{dropped:19}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is there any Command showing correlation of all variables in a dataset?

2011-03-01 Thread rex.dwyer
?cor  answers that question.  If Housing is a dataframe, cor(Housing) should do 
it.  Surprisingly, ??correlation doesn't point you to ?cor.
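One caveat: cor() wants numeric input, and Housing (from the Ecdat package) mixes numeric and factor columns, so it can help to subset to the numeric columns first. A self-contained sketch with a small mock frame standing in for Housing:

```r
set.seed(1)
## mock data frame standing in for Ecdat's Housing
d <- data.frame(price     = rnorm(50, 60000, 10000),
                lotsize   = rnorm(50, 5000, 1500),
                bathrooms = sample(1:3, 50, replace = TRUE),
                driveway  = factor(sample(c("yes", "no"), 50, replace = TRUE)))

num <- sapply(d, is.numeric)   # drop the factor columns
round(cor(d[, num]), 3)        # all pairwise correlations in one matrix
```

The resulting matrix can then be handed to e.g. xtable() from the xtable package to produce the LaTeX table.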

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of JoonGi
Sent: Tuesday, March 01, 2011 5:41 AM
To: r-help@r-project.org
Subject: [R] Is there any Command showing correlation of all variables in a 
dataset?


Thanks in advance.

I want to derive correlations of variables in a dataset

Specifically

library(Ecdat)
data(Housing)
attach(Housing)
cor(lotsize, bathrooms)

this code results only the correlationship between two variables.
But I want to examine all the combinations of variables in this dataset.
And I will finally make a table in Latex.

How can I test correlations for all combinations of variables?
with one simple command?


--
View this message in context: 
http://r.789695.n4.nabble.com/Is-there-any-Command-showing-correlation-of-all-variables-in-a-dataset-tp3329599p3329599.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




message may contain confidential information. If you are not the designated 
recipient, please notify the sender immediately, and delete the original and 
any copies. Any use of the message by you is prohibited. 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Quantreg model error and goodness of fit

2011-03-01 Thread sadz a
Hi All,

I am using the package quantreg to create a 'model' that I can then use to
predict the response variable (volume) from a larger set of explanatory
variables (environmental factors):

i.e.:
# model
> fit <- rqss(volume ~ qss(env.factor1, lambda = 1) + qss(env.factor2, lambda = 1),
              tau = 0.9)
> summary(fit)
# predict volume from new environmental factors for a larger area where I do
# not know the volume
> predi <- predict(fit, new, interval = "none", level = 0.9)

However I am getting the following error message:
>Error in predict.qss1(qss, newdata = newd, ...) :
>no extrapolation allowed in predict.qss

Is there a way around this?

and also, with the initial model, is there a way to test goodness of fit? If
so, how? summary(fit) only tells me whether the model is significant, not how
good it is (with a linear regression you get an R-squared value, which tells
you how good the model is).
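On the extrapolation error: predict for rqss/qss fits refuses to evaluate outside the range of the covariates it was fitted on, so one workaround (leaving aside whether it is statistically sensible) is to clamp the new covariate values into the training range before predicting. A base-R sketch with mock data; env.factor1, env.factor2, and fit are the names from the question:

```r
set.seed(1)
## mock stand-ins for the training data and the larger prediction area
train <- data.frame(env.factor1 = runif(100, 0, 10),
                    env.factor2 = runif(100, 5, 20))
new   <- data.frame(env.factor1 = runif(30, -2, 12),   # partly out of range
                    env.factor2 = runif(30, 4, 22))

## clamp every new value into the observed training range,
## column by column, so no extrapolation is requested
clamp <- function(new_col, train_col)
  pmin(pmax(new_col, min(train_col)), max(train_col))
new_clipped <- as.data.frame(Map(clamp, new, train))

## predi <- predict(fit, new_clipped)   # should no longer trigger the error
```

Alternatively, you could simply drop the out-of-range rows rather than clamping them.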

Thank you
All help is appreciated,
If you would like anything clarified please contact me,
Kitty

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nls not solving

2011-03-01 Thread Schatzi
Here is a reply by Bart:
Yes you're right (I should have taken off my glasses and looked closer).
However, the argument is essentially the same:

Suppose you have a solution with a,b,k,l. Then for any positive c, [a+b-bc]
+ [bc] + (bc) *exp(kl')exp(-kx) is also a solution, where l'
= l - log(c)/k  .

Cheers,
Bert

(Feel free to post this correction if you like)


This is from me:
The problem with dropping the "l" parameter is that it is supposed to
account for the lag component. This equation was published in the literature
and has been solved in SAS. When I put it in Excel, it solves, but not very
well, as it comes to a different solution each time I change the starting
values. As such, I'm not sure how SAS solves for it, and I'm not sure what I
should do about the equation. Maybe I should just drop the parameter "a."
Thanks for the help.

-- 
View this message in context: 
http://r.789695.n4.nabble.com/nls-not-solving-tp3328647p3329936.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Robust variance estimation with rq (failure of the bootstrap?)

2011-03-01 Thread Matt Shotwell
Jim,

Thanks for pointing me to this article. The authors argue that the 
bootstrap intervals for a robust estimator may not be as robust as the 
estimator. In this context, robustness is measured by the breakdown 
point, which is supposed to measure robustness to outliers. Even so, the
authors found that the upper bound of a quantile bootstrap interval for 
the sample median was nearly as robust as the sample median. That brings
some comfort in using quantile bootstrap intervals in quantile
regression.

Does the sandwich estimator assume that errors are independent? And a 
related question: Does the rq function allow the user to specify 
clusters/grouping among the observations?

Best,
Matt
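On the clustering question: whether or not rq() itself accepts a cluster argument, a patient-level (block) bootstrap can be coded directly. A base-R sketch with a made-up statistic (the median); the coefficients of an rq() fit could be plugged into statistic in the same way:

```r
set.seed(1)
## mock repeated-measures data: 124 patients, 3 visits each
d <- data.frame(id = rep(1:124, each = 3),
                y  = rnorm(124 * 3))

## cluster bootstrap: resample patient IDs with replacement and keep
## *all* rows of each selected patient, preserving within-patient correlation
cluster_boot <- function(data, id, statistic, B = 200) {
  ids <- unique(data[[id]])
  replicate(B, {
    take <- sample(ids, length(ids), replace = TRUE)
    res  <- do.call(rbind, lapply(take, function(i) data[data[[id]] == i, ]))
    statistic(res)
  })
}

## e.g. bootstrap the median of y; an rq() fit could be substituted here
est <- cluster_boot(d, "id", function(res) median(res$y))
sd(est)   # cluster-bootstrap standard error
```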

On Tue, 2011-03-01 at 05:35 -0600, James Shaw wrote:
> Matt:
> 
> Thanks for your prompt reply.
> 
> The disparity between the bootstrap and sandwich variance estimates
> derived when modeling the highly skewed outcome suggest that either
> (A) the empirical robust variance estimator is underestimating the
> variance or (B) the bootstrap is breaking down.  The bootstrap
> variance estimate of a robust location estimate is not necessarily
> robust, see Statistics & Probability Letters 50 (2000) 49-53.  Since
> submitting my earlier post, I have noticed that the the robust kernel
> variance estimate is similar to the bootstrap estimate.  Under what
> conditions would one expect Koenker and Machado's sandwich variance
> estimator, which uses a local estimate of the sparsity, to fail?
> 
> --
> Jim
> 
> 
> 
> On Mon, Feb 28, 2011 at 8:59 PM, Matt Shotwell  wrote:
> > Jim,
> >
> > If repeated measurements on patients are correlated, then resampling all
> > measurements independently induces an incorrect sampling distribution
> > (=> incorrect variance) on a statistic of these data. One solution, as
> > you mention, is the block or cluster bootstrap, which preserves the
> > correlation among repeated observations in resamples. I don't
> > immediately see why the cluster bootstrap is unsuitable.
> >
> > Beyond this, I would be concerned about *any* variance estimates that
> > are blind to correlated observations.
> >
> > The bootstrap variance estimate may be larger than the asymptotic
> > variance estimate, but that alone isn't evidence to favor one over the
> > other.
> >
> > Also, I can't justify (to myself) why skew would hamper the quality of
> > bootstrap variance estimates. I wonder how it affects the sandwich
> > variance estimate...
> >
> > Best,
> > Matt
> >
> > On Mon, 2011-02-28 at 17:50 -0600, James Shaw wrote:
> >> I am fitting quantile regression models using data collected from a
> >> sample of 124 patients.  When modeling cross-sectional associations, I
> >> have noticed that nonparametric bootstrap estimates of the variances
> >> of parameter estimates are much greater in magnitude than the
> >> empirical Huber estimates derived using summary.rq's "nid" option.
> >> The outcome variable is severely skewed, and I am afraid that this may
> >> be affecting the consistency of the bootstrap variance estimates.  I
> >> have read that the m out of n bootstrap can be used to overcome this
> >> problem.  However, this procedure requires both the original sample
> >> (n) and the subsample (m) sizes to be large.  The version implemented
> >> in rq.boot does not appear to provide any improvement over the naive
> >> bootstrap.  Ultimately, I am interested in using median regression to
> >> model changes in the outcome variable over time.  Summary.rq's robust
> >> variance estimator is not applicable to repeated-measures data.  I
> >> question whether the block (cluster) bootstrap variance estimator,
> >> which can accommodate intraclass correlation, would perform well.  Can
> >> anyone suggest alternatives for variance estimation in this situation?
> >> Regards,
> >>
> >> Jim
> >>
> >>
> >> James W. Shaw, Ph.D., Pharm.D., M.P.H.
> >> Assistant Professor
> >> Department of Pharmacy Administration
> >> College of Pharmacy
> >> University of Illinois at Chicago
> >> 833 South Wood Street, M/C 871, Room 266
> >> Chicago, IL 60612
> >> Tel.: 312-355-5666
> >> Fax: 312-996-0868
> >> Mobile Tel.: 215-852-3045
> >>
> >> __
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide 
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> 
> 
> 
> -- 
> James W. Shaw, Ph.D., Pharm.D., M.P.H.
> Assistant Professor
> Department of Pharmacy Administration
> College of Pharmacy
> University of Illinois at Chicago
> 833 South Wood Street, M/C 871, Room 266
> Chicago, IL 60612
> Tel.: 312-355-5666
> Fax: 312-996-0868
> Mobile Tel.: 215-852-3045
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Re: [R] High standard error

2011-03-01 Thread John Kane
Sure why not?

You do realize, do you not, that no one has the slightest idea of what you are 
doing?

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html 
and provide commented, minimal, self-contained, reproducible code.




--- On Tue, 3/1/11, danielepippo  wrote:

> From: danielepippo 
> Subject: [R] High standard error
> To: r-help@r-project.org
> Received: Tuesday, March 1, 2011, 9:07 AM
> Hi to everyone,
>     
>      if the estimate of the parameter is 0.196 and its standard
> error is 0.426, can I say that this parameter is not
> significant for the model?
> 
> 
> Thank you very much
> 
> Pippo
> 
> -- 
> View this message in context: 
> http://r.789695.n4.nabble.com/High-standard-error-tp3329903p3329903.html
> Sent from the R help mailing list archive at Nabble.com.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] High standard error

2011-03-01 Thread danielepippo
Hi to everyone,

 if the estimate of the parameter is 0.196 and its standard
error is 0.426, can I say that this parameter is not significant for the
model?
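For reference, assuming an approximately normal (Wald) test is appropriate for this model (which depends on the model), the implied test statistic can be computed directly:

```r
est <- 0.196   # parameter estimate
se  <- 0.426   # its standard error

z <- est / se              # Wald statistic, about 0.46
p <- 2 * pnorm(-abs(z))    # two-sided p-value, about 0.65

c(z = z, p = p)
```

Since |z| is far below the usual 1.96 cutoff, the parameter would not be judged significant at the 5% level under this approximation.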


Thank you very much

Pippo

-- 
View this message in context: 
http://r.789695.n4.nabble.com/High-standard-error-tp3329903p3329903.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to Save R library data into xls or dta format

2011-03-01 Thread Gabor Grothendieck
On Tue, Mar 1, 2011 at 4:27 AM, JoonGi  wrote:
>
> Thanks in advance.
>
> I'm having a trouble with data saving.
>
> I want to run the same data which is in Ecdat library at different statistic
> programs(excel, stata and matlab)
>
> The data I want to use is
>
> library(Ecdat)
> data(Housing)
>
> and I want to extract this data our of R as *.dta *.xls formats.
> So, my first try was to open the data in R window and drag and paste to
> excel or notepad.
>

For excel see:
http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=excel
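For Stata and Matlab, a plain CSV (base R) or the recommended 'foreign' package will usually do. Housing is mocked here with a few rows so the snippet is self-contained; with Ecdat installed you would use the real data frame instead:

```r
## mock stand-in for Ecdat's Housing data frame
Housing <- data.frame(price   = c(42000, 38500, 49500),
                      lotsize = c(5850, 4000, 3060))

## CSV: readable by Excel, Stata, and Matlab alike (base R, no packages)
write.csv(Housing, file = "Housing.csv", row.names = FALSE)

## native Stata format via the recommended 'foreign' package:
## library(foreign); write.dta(Housing, "Housing.dta")

file.exists("Housing.csv")
```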


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] what does the "S.D." returned by {Hmisc} rcorr.cens measure?

2011-03-01 Thread Frank Harrell
Vikki,

The formula you used for std. error of C is not correct.  C is not a simple
per-observation proportion.

SD in the output is the standard error of Dxy.  Dxy = 2(C - .5).  Backsolve
for std err of C.  

Variation in Dxy or C comes from the usual source: sampling variability. 
You can also see this by sampling from the original dataset (a la
bootstrap).
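Concretely, for the output shown above:

```r
Dxy_se <- 0.0462        # the "S.D." column reported by rcorr.cens
## Dxy = 2*(C - 0.5) is a linear transform of C, so se(C) = se(Dxy) / 2
C_se <- Dxy_se / 2
C_se                    # 0.0231
```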

Frank


vikkiyft wrote:
> 
> Dear R-help,
> 
> This is an example in the {Hmisc} help manual in the section of
> rcorr.cens:
> 
>> set.seed(1)
>> x <- round(rnorm(200))
>> y <- rnorm(200)
>> round(rcorr.cens(x, y, outx=F),4)
>C IndexDxy   S.D.  nmissing
> uncensored Relevant Pairs Concordant  Uncertain 
> 0.4831-0.0338 0.0462   200. 0.
>   
> 200. 39800. 19228. 0.
> 
> That S.D. confuses me!!
> It is obviously not the standard deviation of x or y.. but there is only
> one realization of the c-index or Dxy for this sample dataset, where does
> the variation come from..??  if I use the conventional formula for
> calculating the standard deviation of proportions: sqrt((C Index)*(1-C
> Index)/n), I get 0.0353 instead of 0.0462..
> 
> Any advice is appreciated.
> 
> 
> Vikki
> 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/what-does-the-S-D-returned-by-Hmisc-rcorr-cens-measure-tp3329609p3329899.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Explained variance for ICA

2011-03-01 Thread rex.dwyer
You determine the variance explained by *any* unit vector by taking its inner 
product with the data points, then finding the variance of the results.  In the 
case of FastICA, the variance explained by the ICs collectively is exactly the 
same as the variance explained by the principal components (collectively) from 
which they are derived.
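A small base-R illustration of that rule, with random data standing in for the microarray matrix:

```r
set.seed(1)
X <- matrix(rnorm(200 * 5), 200, 5)   # mock data, rows = observations

## variance "explained" by an arbitrary direction: project the (centered)
## data onto a unit vector and take the variance of the projections
v  <- rnorm(5)
v  <- v / sqrt(sum(v^2))              # normalize to unit length
Xc <- scale(X, center = TRUE, scale = FALSE)
var_v <- drop(var(Xc %*% v))
var_v

## sanity check: summed over an orthonormal basis (e.g. the principal
## axes), these projection variances recover the total variance
total <- sum(diag(cov(Xc)))
pc    <- prcomp(Xc)
all.equal(sum(pc$sdev^2), total)      # TRUE
```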
HTH
Rex

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Pavel Goldstein
Sent: Tuesday, March 01, 2011 1:24 AM
To: r-help@r-project.org
Subject: [R] Explained variance for ICA

Hello,
I think to use FastICA package for microarray data clusterization,
but one question stops me:  can I know how much variance explain each
component (or all components together) ?
I will be very thankful for the help.

Thanks,

Pavel

[[alternative HTML version deleted]]







Re: [R] unicode&pdf font problem RESOLVED

2011-03-01 Thread Ben Madin
Just to add to this (I've been looking through the archive) problem with
displaying Unicode fonts in PDF documents in R:

If you can use the Cairo package to create PDFs on a Mac, it seems quite happy
to push Unicode characters through (whether a character will display is
probably still font-family dependent):

probstring <- c(' \u2264 0.2', ' \u2264 0.4', ' \u2264 0.6', ' \u2264 0.8', ' \u2264 1.0')
Cairo(type = 'pdf', file = 'outputs/demo.pdf', width = 9, height = 12, units = 'in', bg = 'transparent')
plot(1:5, 1:5, type = 'n')
text(1:5, 1:5, probstring)
dev.off()

?Cairo suggests encoding is ignored if you do try to set it.

cheers

Ben



On 14/01/2011, at 7:00 PM, r-help-requ...@r-project.org wrote:

> Date: Thu, 13 Jan 2011 10:47:09 -0500
> From: David Winsemius 
> To: Sascha Vieweg 
> Cc: r-help@r-project.org
> Subject: Re: [R] unicode&pdf font problem RESOLVED
> Message-ID: <74fa099f-4ce5-45c7-a05a-4a1de6c87...@comcast.net>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed; delsp=yes
> 
> 
> On Jan 13, 2011, at 10:41 AM, Sascha Vieweg wrote:
> 
>> I have many German umlauts in my data sets and code them UTF-8. When  
>> it comes to plotting on pdf, I figured out that "CP1257" is a good  
>> choice to output Umlauts. I have no experiences with "CP1250", but  
>> maybe this small hint helps:
>> 
>> pdf(file=paste(sharepath, "/filename.pdf", sep=""), 9, 6, pointsize  
>> = 11, family = "Helvetica", encoding = "CP1257")
> 
> Just an FYI for the archives, that encoding fails with  
> pdf(encoding="CP1257") on a Mac when printing that target umlaut.
> 
> David.
>> 
>> *S*
>> 
>> On 11-01-13 16:17, tde...@cogpsyphy.hu wrote:
>> 
>>> Date: Thu, 13 Jan 2011 16:17:04 +0100 (CET)
>>> From: tde...@cogpsyphy.hu
>>> To: David Winsemius 
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] unicode&pdf font problem RESOLVED
>>> 
>>> Dear David,
>>> 
>>> Thank you for your efforts. Inspired by your remarks, I started a new
>>> google-search and found this:
>>> http://stackoverflow.com/questions/3434349/sweave-not-printing-localized-characters
>>> 
>>> SO HERE COMES THE SOLUTION (it works on both OSs):
>>> 
>>> pdf.options(encoding = "CP1250")
>>> pdf()
>>> plot(1,type="n")
>>> text(1,1,"\U0171")
>>> dev.off()
>>> 
>>> CP1250 should work for all Central-European languages:
>>> http://en.wikipedia.org/wiki/Windows-1250
>>> 
>>> 
>>> Thank you again,
>>> Denes
>>> 
>>> 
>>> 
 
 On Jan 13, 2011, at 7:01 AM, tde...@cogpsyphy.hu wrote:
 
> 
> Hi!
> 
> Sorry for the missing specs, here they are:
>> version
> _
> platform   i386-pc-mingw32
> arch   i386
> os mingw32
> system i386, mingw32
> status
> major  2
> minor  12.1
> year   2010
> month  12
> day16
> svn rev53855
> language   R
> version.string R version 2.12.1 (2010-12-16)
> 
> OS: Windows 7 (English version, 32 bit)
> 
> 
 
 You are after what Adobe calls: udblacute; 0171.  It is recognized  
 in
 the list of adobe glyphs:
> str(tools::Adobe_glyphs[371, ])
 'data.frame':  1 obs. of  2 variables:
 $ adobe  : chr "udblacute"
 $ unicode: chr "0171"
 
 Consulted the help pages
 points {graphics}
 postscript {grDevices}
 pdf {grDevices}
 charsets {tools}
 postscriptFonts {grDevices}
 
 I have tried a variety of the pdfFonts installed on my Mac without
 success. You can perhaps make a list of fonts on your machines with
 names(pdfFonts()). Perhaps the range of fonts and the glyphs they
 contain is different on your machines. I get consistently warning
 messages saying there is a conversion failure:
 
> pdf("trial.pdf", family="Helvetica")
 # also tried with font="Helvetica" but I think that is erroneous
> plot(1,type="n")
> text(1,1,"print \U0170\U0171")
 Warning messages:
 1: In text.default(1, 1, "print ") :
  conversion failure on 'print ' in 'mbcsToSbcs': dot  
 substituted
 for 
 2: In text.default(1, 1, "print ") :
  conversion failure on 'print ' in 'mbcsToSbcs': dot  
 substituted
 for 
 3: In text.default(1, 1, "print ") :
  conversion failure on 'print ' in 'mbcsToSbcs': dot  
 substituted
 for 
 4: In text.default(1, 1, "print ") :
  conversion failure on 'print ' in 'mbcsToSbcs': dot  
 substituted
 for 
 5: In text.default(1, 1, "print ") :
  font metrics unknown for Unicode character U+0170
 6: In text.default(1, 1, "print ") :
  font metrics unknown for Unicode character U+0171
 7: In text.default(1, 1, "print ") :
  conversion failure on 'print ' in 'mbcsToSbcs': dot  
 substituted
 for 
 8: In text.default(1, 1, "print ") :
  conversion failure on 'print ' in 'mbcsToSb

Re: [R] mlogit.data

2011-03-01 Thread gmacfarlane
http://r.789695.n4.nabble.com/file/n3329821/workdata.csv workdata.csv 

The code I posted is exactly what I am running. What you need is this data.
Here is the code again.

> hbwmode<-mlogit.data("worktrips.csv", shape="long", choice="CHOSEN",
> alt.var="ALTNUM")
> hbwmode<-mlogit.data(hbwtrips, shape="long", choice="CHOSEN",
> alt.var="ALTNUM")



Re: [R] Components of variance with lme

2011-03-01 Thread Scott Chamberlain
Use VarCorr(), then calculate the percentage of variance for each component from its output.
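
As a minimal sketch of that recipe, using the Orthodont data shipped with nlme (a hedged example, not John's actual model):

```r
library(nlme)
# random-intercept model: variance components are Subject and Residual
fm <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)
vc <- VarCorr(fm)                      # character matrix with Variance, StdDev columns
vars <- as.numeric(vc[, "Variance"])   # intercept and residual variance components
pct <- 100 * vars / sum(vars)          # percentage of total variance per component
names(pct) <- rownames(vc)
pct
```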
On Tuesday, March 1, 2011 at 6:55 AM, John Sorkin wrote: 
> R 2.10
> Windows Vista
> 
> Is it possible to run a variance-components analysis using lme? I looked at 
> Pinheiro and Bates' book and don't see code that will perform these analyses. 
> If the analyses can not be done using lme, what package might I try?
> Thanks,
> John
> 
> 
> John David Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
> 
> Confidentiality Statement:
> This email message, including any attachments, is for ...{{dropped:12}}



Re: [R] nested case-control study

2011-03-01 Thread Terry Therneau
1. Using offset(logweight) in coxph is the same as using an "offset
logweight;" statement in SAS, and neither is the same as case weights.

2. For a nested case-control study, which is what you said you have, the
"strata" term controls who is in which risk set.  No trickery with start/stop
times is needed; it does no harm, but it is not needed.
  In a case-cohort design, the start = stop - epsilon trick is one way to
set risk sets up properly.  But that's not the design you gave for your
study.
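
Point 2 can be sketched with clogit() from the survival package, which fits the conditional logistic model for matched sets via Cox machinery; the data here are hypothetical (variable names `set_id`, `case`, `exposure` are invented for illustration):

```r
library(survival)
set.seed(2)
# hypothetical 1:2 matched nested case-control data:
# 'set_id' identifies each matched set (1 case + 2 controls)
ncc <- data.frame(
  set_id   = rep(1:50, each = 3),
  case     = rep(c(1, 0, 0), times = 50),
  exposure = rnorm(150)
)
# strata(set_id) defines the risk sets; no start/stop times required
fit <- clogit(case ~ exposure + strata(set_id), data = ncc)
summary(fit)
```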

Terry T.



On Mon, 2011-02-28 at 17:02 -0800, array chip wrote:
> Terry, thanks very much! 
> 
> Professor Langholz used a SAS software trick to estimate absolute risk
> by creating a fake variable "entry_time" that is 0.001 less than the
> variable "exit_time" (i.e. time to event), and then use both variables
> in Phreg. Is this equivalent to your creating a dummy survival with
> time=1?
> 
> Another question is, is using offset(logweight) inside the formula of
> coxph() the same as using weight=logweight argument in coxph(),
> because my understanding of Professor Langholz's approach for nested
> case-control study is weighted regression?
> 
> Thank you very much for the help.
> 
> John



[R] Components of variance with lme

2011-03-01 Thread John Sorkin
R 2.10
Windows Vista

Is it possible to run a variance-components analysis using lme? I looked at 
Pinheiro and Bates' book and don't see code that will perform these analyses. 
If the analyses can not be done using lme, what package might I try?
Thanks,
John


John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Confidentiality Statement:
This email message, including any attachments, is for th...{{dropped:6}}



Re: [R] Simulation

2011-03-01 Thread Mike Marchywka






> Date: Mon, 28 Feb 2011 19:18:18 -0800
> From: kadodamb...@hotmail.com
> To: r-help@r-project.org
> Subject: [R] Simulation
>
> I tried looking for help but I couldn't locate the exact solution.
> I have data that has several variables. I want to do several sample
> simulations using only two of the variables (eg: say you have data between
> people and properties owned. You only want to check how many in the samples
> will come up with bicycles) to estimate probabilities and that sort of
> thing.
> Now, I can only do a simulation in terms of this code: sample(1:10, size =
> 15, replace = TRUE).
> I do not know how to select specific variables only.
> I'll appreciate the help



This is probably not the best R, but you can do something like either of these.
Note that this is just the easiest adaptation of code I already had,
and it can be adjusted to your needs; I usually use runif instead of sample, for example.
The first example is probably much less efficient than the second.


df <- data.frame(a = .1 * rnorm(100), b = (1:100)/100, c = (1:100)/100 + .1 * rnorm(100))

res <- 1:100; for (i in 1:100) { res[i] <- cor(df[which(runif(100) > .9), ])[1, 3] }
res
hist(res)
res <- 1:100; for (i in 1:100) { wh <- which(runif(100) > .9); res[i] <- cor(df$a[wh], df$c[wh]) }
res
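
For the original question of resampling while keeping only two variables, a minimal base-R sketch; the data frame `dat` and its columns `person` and `property` are invented here for illustration:

```r
# hypothetical example data: people and the property they own
set.seed(3)
dat <- data.frame(
  person   = paste0("p", 1:10),
  property = sample(c("bicycle", "car", "house"), 10, replace = TRUE)
)
# draw a simulated sample of 15 rows (with replacement), keeping two columns
idx <- sample(nrow(dat), size = 15, replace = TRUE)
samp <- dat[idx, c("person", "property")]
# estimate the probability of "bicycle" from the simulated sample
p <- mean(samp$property == "bicycle")
p
```

Repeating the idx/samp step inside a loop, as in the examples above, gives a simulation-based distribution of the estimate.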





>
  

