Re: [R] dataframe: visualization as tiles(?)

2004-04-13 Thread Jason Turner
Whoops - didn't get what you meant

?mosaicplot

is your friend
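A minimal sketch of what that looks like for data shaped like the question. The data are assumed: 20 random observations with x in {0, 1} and y in {-1, 0, 1}. mosaicplot() splits the plot first by x and then by y, so each tile's area is proportional to its count:

```r
## fake data shaped like the question; factor levels are fixed so that
## every x/y combination appears in the table even if its count is zero
set.seed(42)
zz <- data.frame(x = factor(sample(0:1, 20, replace = TRUE), levels = 0:1),
                 y = factor(sample(-1:1, 20, replace = TRUE), levels = -1:1))

tab <- table(zz)   # 2 x 3 table of counts (rows: x, columns: y)
print(tab)

## split by x, then by y within each x; tile area is proportional to count
mosaicplot(tab, main = "Counts of y within x", color = TRUE)
```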

Cheers
Jason

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] dataframe: visualization as tiles(?)

2004-04-13 Thread Jason Turner
>
> Dear R users,
>
> I remember seeing somewhere a method of visualizing a set of
> observations on two variables x and y in the following way

Is this what you want?

> ## fake data
> zz <- data.frame(x = sample(0:1, 20, replace = TRUE), y = sample(-1:1, 20, replace = TRUE))
> zz

> ## tabulate it
> zz.tab <- data.frame(table(zz))
> zz.tab
> library(lattice)
> barchart(y ~ Freq | x, data=zz.tab)

Cheers

Jason



[R] dataframe: visualization as tiles(?)

2004-04-13 Thread Itay Furman

Dear R users,

I remember seeing somewhere a method of visualizing a set of 
observations on two variables x and y in the following way

         x=0   x=1

        |---| |---|
  y=-1  |   | |   |
        |---| |   |
              |   |
        |---| |   |
        |   | |---|
  y=0   |   |
        |   | |---|
        |---| |   |
              |---|
        |---|
  y=1   |   | |---|
        |---| |---|

where x = 0 or 1; y = -1, 0, 1. The 'tile' area represents 
the count of observations with corresponding x and y values.

Now I don't remember the name of the function that
supports such plots.

I tried help.search("*tile*") and skimmed the documentation of
the 'lattice' package. Neither seems to be what I remembered.

Please send me pointers.

Thanks in advance
Itay



Re: [R] How does nlm work?

2004-04-13 Thread Jason Turner
> 3) I have never heard of these step selection methods (line search, dogleg and
> optimal step). I would like to learn something about them, and would
> appreciate references on the subject.

IIRC, you'll find them here:

Nonlinear Regression Analysis and Its Applications
Douglas M. Bates and Donald G. Watts
John Wiley & Sons Inc.,
1988
ISBN: 0471816434

Cheers
Jason



[R] How does nlm work?

2004-04-13 Thread Frederico Zanqueta Poleto
Dear R users,

I have looked at the reference
Schnabel, R. B., Koontz, J. E. and Weiss, B. E. (1985) A modular
system of algorithms for unconstrained minimization. _ACM Trans.
Math. Software_, *11*, 419-440.
cited in the nlm help page.
This article says that the algorithm permits a choice of step selection
(line search, dogleg or optimal step), an analytic or finite-difference
gradient, and an analytic, finite-difference or BFGS Hessian approximation.

Looking back at the nlm help page, it says that:
a) it does only the line search step selection;
b) the user can optionally supply the gradient and the Hessian as
attributes.

My questions are:
1) When I do not supply the Hessian, does the function use a finite-difference
or a BFGS approximation? (Is it possible to select one or the other?)

2) I have already used the option to supply the gradient, but I don't
know how to supply the Hessian. Does anybody have an example?

3) I have never heard of these step selection methods (line search, dogleg and
optimal step). I would like to learn something about them, and would
appreciate references on the subject.
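A minimal sketch for question 2. It relies on the behaviour described on the nlm help page: if the objective's return value carries "gradient" and "hessian" attributes, nlm() uses them instead of numerical derivatives. The objective sum(exp(p) - p) is made up for illustration. It is convex with its minimum at p = 0, so its exact derivatives are easy to write down:

```r
## made-up objective: f(p) = sum(exp(p) - p), minimised at p = 0;
## the exact gradient and Hessian are attached as attributes
f <- function(p) {
  val <- sum(exp(p) - p)
  attr(val, "gradient") <- exp(p) - 1
  attr(val, "hessian")  <- diag(exp(p), nrow = length(p))
  val
}

res <- nlm(f, p = c(1, -1))
res$estimate     # should be close to c(0, 0)
res$iterations
```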

Sincerely,

--
Frederico Zanqueta Poleto
[EMAIL PROTECTED]
--
"An approximate answer to the right problem is worth a good deal more than an exact answer 
to an approximate problem." J. W. Tukey


[R] Makefile for installing all available packages

2004-04-13 Thread Warnes, Gregory R

Below is a makefile I wrote to download and install all available R packages
from the CRAN and BioConductor package repositories.  

The primary advantage of using this makefile instead of R's built-in
install.packages() and update.packages() is the creation of a separate
installation log for every package.   Further, if make is invoked with '-k',
failure to install a single package will not derail the installation of
other packages.

I hope that this script may be useful to other folks.  

-Greg

# Download and install all available R packages from the CRAN and
# Bioconductor package repositories
#
RCMD ?= R-1.9.0
WGET ?= wget -N -nd -r -l 1 -A gz -nv

PACKAGE_FILES = $(wildcard *.gz)
PACKAGE_LOGS  = $(addsuffix .log, $(basename $(basename $(PACKAGE_FILES))))

default: cran bioconductor install

cran:
	$(WGET) "http://cran.r-project.org/src/contrib/PACKAGES.html"

bioconductor: bioCmain bioCcontrib bioCdata

bioCmain:
	$(WGET) "http://www.bioconductor.org/repository/release1.3/package/html/index.html"

bioCcontrib:
	$(WGET) "http://www.bioconductor.org/contrib/index.html"

bioCdata:
	$(WGET) "http://www.bioconductor.org/data/metaData.html"

install: $(PACKAGE_LOGS)

%.log: %.tar.gz
	$(RCMD) INSTALL $< > [EMAIL PROTECTED] 2>&1
	mv [EMAIL PROTECTED] $@





[R] un-expected return by fdim

2004-04-13 Thread Fred J.
Browse[1]> Lframe
   v  v  v  v  v  v v v
1  8  7  6  5  4  3 2 1
2  9  8  7  6  5  4 3 2
3 10  9  8  7  6  5 4 3
4 11 10  9  8  7  6 5 4
5 12 11 10  9  8  7 6 5
6 13 12 11 10  9  8 7 6
7 14 13 12 11 10  9 8 7
8 15 14 13 12 11 10 9 8
Browse[1]> fdim(Lframe,q=2)
Error in slopeopt(AllPoints, Alpha) : Object "LineP"
not found

Thanks for any feedback.



RE: [R] Non-homogeneity of variance - decreasing variance

2004-04-13 Thread John Fox
Dear Simon,

I'm not sure that I follow this entirely, but if error variance decreases
with the level of the response, you could try raising the response to a
power greater than 1. Of course, the response has to be non-negative. You
might take a look at the spread.level.plot function in the car package,
which will produce a suggested transformation when applied to an lm object.
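Here is a rough base-R sketch of the idea behind a spread-level plot. It is not car's implementation, just the same rule of thumb applied to simulated data (the data are assumed, with error SD proportional to 1/mean). You regress log |studentized residual| on log(fitted value). If the slope is b, the suggested power for the response is 1 - b, which is greater than 1 when the spread falls as the level rises:

```r
## hypothetical data: error SD proportional to 1/mean, so spread falls with level
set.seed(1)
n  <- 500
x  <- runif(n, 1, 10)
mu <- 10 + 2 * x
y  <- mu + rnorm(n, sd = 30 / mu)

fit <- lm(y ~ x)

## spread-level regression: log |studentized residual| on log(fitted value)
sl    <- lm(log(abs(rstudent(fit))) ~ log(fitted(fit)))
slope <- unname(coef(sl)[2])

## rule of thumb: suggested power for the response is 1 - slope
power <- 1 - slope
print(power)

## refit on the transformed response (y is positive here)
fit2 <- lm(I(y^power) ~ x)
```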

I hope that this helps,
 John 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Simon Chamaillé
> Sent: Tuesday, April 13, 2004 12:36 PM
> To: [EMAIL PROTECTED]
> Subject: [R] Non-homogeneity of variance - decreasing variance
> 
> Hello all,
> I'm running a very simple regression but face a problem of
> non-homogeneity of variance, with the variance decreasing
> as the mean increases... I do not know how to deal with that.
> The relationship doesn't seem to be strong, but it's the
> first time I've seen something like that, and I would like to know
> what to do if one day it becomes stronger. Just for fun I tested
> some transformations but was not able to get a better
> model. I do not know if it helps, but my predictor
> variable has a gamma/Poisson-like, zero-rich
> distribution (continuous, of course) and is highly overdispersed.
> If anyone knows how to deal with decreasing variance, I would
> appreciate any advice. (I tried to model a negative
> variance-mean relationship in a new
> quasi- family, but this was prohibited: only constant, mu, mu^x
> (and mu(1-mu) for
> binomial) were allowed.) I've definitely reached the border
> of the statistical black box for me.
> thanks
> simon
>



Re: [R] Matrix question

2004-04-13 Thread apjaworski





Gideon,

Eigenvectors are normalized to unit length.  The first eigenvector
calculated by R is equal (ignoring the signs of course) to your stable
distribution vector divided by its length.

Andy

__
Andy Jaworski
518-1-01
Process Laboratory
3M Corporate Research Laboratory
-
E-mail: [EMAIL PROTECTED]
Tel:  (651) 733-6092
Fax:  (651) 736-3122


From:     GIDEON WASSERBERG <[EMAIL PROTECTED]>
Sent by:  [EMAIL PROTECTED]
Date:     04/13/2004 18:28
To:       "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
cc:
Subject:  [R] Matrix question




Dear Friends

I am doing a simple matrix analysis in R to calculate the eigenvalues and
eigenvectors of the matrix below, and comparing the results to those
obtained from a projection (using Excel)

THE MATRIX:
> c
     [,1] [,2] [,3]
[1,]  0.0  2.0    2
[2,]  0.8  0.0    0
[3,]  0.0  0.8    0


The dominant eigenvalue comes out comparable to that calculated
numerically, but the eigenvectors do not (see below)!


EIGENVALUES (calculated by R):

> eigen(c)
$values
[1]  1.5564082+0.00i -0.7782041+0.465623i -0.7782041-0.465623i

EIGENVALUE numerically calculated: 1.556408145


EIGENVECTORS (calculated by R):
$vectors
  [,1]  [,2]  [,3]
[1,] -0.8658084+0i  0.6476861+0.000i  0.6476861+0.000i
[2,] -0.4450290+0i -0.4902997-0.2933611i -0.4902997+0.2933611i
[3,] -0.2287467+0i  0.2382837+0.4441499i  0.2382837-0.4441499i

Stable age distribution (calculated numerically):

0.562365145
0.289057934
0.148576921


My questions are:
1. Both the eigenvalues and the eigenvectors have some imaginary values (i).
How should I relate to that information?
2. More importantly, I presume the first eigenvector column [,1] should
correspond to the dominant eigenvalue. How come, then, it comes out different
from the one calculated numerically? Is there some conversion I should do?

Many thanks

Gideon


Gideon Wasserberg (Ph.D.)
Wildlife research unit,
Department of wildlife ecology,
University of Wisconsin
218 Russell labs, 1630 Linden dr.,
Madison, Wisconsin 53706, USA.
Tel.:608 265 2130, Fax: 608 262 6099



Re: [R] Matrix question

2004-04-13 Thread Thomas Lumley
On Tue, 13 Apr 2004, GIDEON WASSERBERG wrote:

> Dear Friends
>
> I am doing a simple matrix analysis in R to calculate the eigenvalues and
> eigenvectors of the matrix below, and comparing the results to
> those obtained from a projection (using Excel)
>
> THE MATRIX:
> > c
>      [,1] [,2] [,3]
> [1,]  0.0  2.0    2
> [2,]  0.8  0.0    0
> [3,]  0.0  0.8    0
>
>
> The dominant eigenvalue comes out comparable to that calculated
> numerically, but the eigenvectors do not (see below)!

Yes, they do.

Your dominant eigenvector is -0.6495461 times the R dominant eigenvector,
and eigenvectors are defined only up to direction. You probably want to
rescale the eigenvector so that its entries sum to 1.
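A short sketch of both points. It uses the Leslie matrix implied by the printed output, named A here because calling it c masks the c() function. eigen() returns a unit-length eigenvector. Dividing it by its sum gives the stable age distribution, and the arbitrary sign cancels out:

```r
## Leslie matrix implied by the output above (called A rather than c,
## since c would mask the c() function)
A <- matrix(c(0.0, 2.0, 2,
              0.8, 0.0, 0,
              0.0, 0.8, 0), nrow = 3, byrow = TRUE)

e <- eigen(A)
i <- which.max(Mod(e$values))        # dominant eigenvalue
lambda <- Re(e$values[i])            # its imaginary part is zero
v      <- Re(e$vectors[, i])         # unit length, sign arbitrary

sqrt(sum(v^2))                       # 1: R normalizes to unit length
stable <- v / sum(v)                 # rescale so the entries sum to 1
print(lambda)
print(stable)   # compare with the stable age distribution from the projection
```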

>
> EIGENVALUES (calculated by R):
>
> > eigen(c)
> $values
> [1]  1.5564082+0.00i -0.7782041+0.465623i -0.7782041-0.465623i
>
> EIGENVALUE numerically calculated: 1.556408145
>
>
> EIGENVECTORS (calculated by R):
> $vectors
>   [,1]  [,2]  [,3]
> [1,] -0.8658084+0i  0.6476861+0.000i  0.6476861+0.000i
> [2,] -0.4450290+0i -0.4902997-0.2933611i -0.4902997+0.2933611i
> [3,] -0.2287467+0i  0.2382837+0.4441499i  0.2382837-0.4441499i
>
> Stable age distribution (calculated numerically):
>
> 0.562365145
> 0.289057934
> 0.148576921
>
>
> My questions are: 1. Both eigenvalue and eigenvectors are associated
> with some imaginary value (i). How should I relate to that information?

The first eigenvalue has zero imaginary component, as does its
eigenvector, so you may not need to relate to it.


-thomas

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



Re: [R] "diff"^-1

2004-04-13 Thread michele lux
Yes, Kjetil (and others), my question wasn't clear, but
what I was looking for was just diffinv!
Thanks, Michele



RE: [R] R 1.9.0 and cursors

2004-04-13 Thread Liaw, Andy
Even if this is not covered in the FAQ, it has been asked on R-help often
enough that it qualifies to be in the FAQ. Make sure you have the readline
libraries & headers. The last output of `configure' should show `readline'
if the configure script finds that this capability works.

Andy

> From: Paolo Sirabella
> 
> Hi all,
> I have successfully compiled and installed the latest devel version of
> R (ver. 1.9.0) on Linux Mandrake 9.2. Everything seems to work, but
> now I cannot use the cursor keys to scroll back through the command
> history (when I press a cursor key, characters such as ^[[A or ^[[B
> are shown instead).
> 
> Hints and suggestions are welcome.
> 
> Thanks.
> 
> Paolo
> -- 
> -
> Paolo Sirabella, PhD
> University of Rome "La Sapienza"
> Dept. of Human Physiology and Pharmacology
> Building of Human Physiology
> P.le Aldo Moro, 5 - 00185 - Roma - Italy
> 
> Res Non Verba
> 



[R] R 1.9.0 and cursors

2004-04-13 Thread Paolo Sirabella
Hi all,
I have successfully compiled and installed the latest devel version of R
(ver. 1.9.0) on Linux Mandrake 9.2. Everything seems to work, but now I cannot
use the cursor keys to scroll back through the command history (when I press
a cursor key, characters such as ^[[A or ^[[B are shown instead).

Hints and suggestions are welcome.

Thanks.

Paolo
-- 
-
Paolo Sirabella, PhD
University of Rome "La Sapienza"
Dept. of Human Physiology and Pharmacology
Building of Human Physiology
P.le Aldo Moro, 5 - 00185 - Roma - Italy

Res Non Verba



[R] Reverse dendrogram plot in R

2004-04-13 Thread Mark Wall
Is there a way to completely reverse a dendrogram plot?

So, for dendrogram D, 
I want to generate a mirror image of plot(D).

This works if one plots an hclust object in reverse order, but I need
this to work as a dendrogram, since the dendrogram is plotted in a more
complicated layout.
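One possibility assumes a version of R whose stats package provides rev() for dendrograms (rev.dendrogram). It reverses the branch order at every node, so plotting the result gives the left-right mirror image of plot(D), and the result is still a dendrogram object. The example data (the first ten rows of USArrests) are only for illustration:

```r
## example data only: a dendrogram built from the first ten rows of USArrests
hc   <- hclust(dist(USArrests[1:10, ]))
D    <- as.dendrogram(hc)
Drev <- rev(D)   # branch order reversed at every node

op <- par(mfrow = c(1, 2))
plot(D,    main = "plot(D)")
plot(Drev, main = "plot(rev(D))")
par(op)
```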

Thank you,
Mark Wall



[R] Matrix question

2004-04-13 Thread GIDEON WASSERBERG
Dear Friends

I am doing a simple matrix analysis in R to calculate the eigenvalues and eigenvectors
of the matrix below, and comparing the results to those obtained from a projection
(using Excel)

THE MATRIX:
> c
     [,1] [,2] [,3]
[1,]  0.0  2.0    2
[2,]  0.8  0.0    0
[3,]  0.0  0.8    0


The dominant eigenvalue comes out comparable to that calculated numerically, but the
eigenvectors do not (see below)!


EIGENVALUES (calculated by R):

> eigen(c)
$values
[1]  1.5564082+0.00i -0.7782041+0.465623i -0.7782041-0.465623i

EIGENVALUE numerically calculated: 1.556408145


EIGENVECTORS (calculated by R):
$vectors
  [,1]  [,2]  [,3]
[1,] -0.8658084+0i  0.6476861+0.000i  0.6476861+0.000i
[2,] -0.4450290+0i -0.4902997-0.2933611i -0.4902997+0.2933611i
[3,] -0.2287467+0i  0.2382837+0.4441499i  0.2382837-0.4441499i

Stable age distribution (calculated numerically):

0.562365145
0.289057934
0.148576921


My questions are:
1. Both the eigenvalues and the eigenvectors have some imaginary values (i). How
should I relate to that information?
2. More importantly, I presume the first eigenvector column [,1] should correspond
to the dominant eigenvalue. How come, then, it comes out different from the one
calculated numerically? Is there some conversion I should do?

Many thanks

Gideon


Gideon Wasserberg (Ph.D.)
Wildlife research unit,
Department of wildlife ecology,
University of Wisconsin
218 Russell labs, 1630 Linden dr.,
Madison, Wisconsin 53706, USA.
Tel.:608 265 2130, Fax: 608 262 6099



Re: [R] Non-homogeneity of variance - decreasing variance

2004-04-13 Thread kjetil
On 13 Apr 2004 at 19:36, Simon Chamaillé wrote:

You could try gls in package nlme, where you can estimate the
parameters of variance functions. If you need a generalized linear
model, you could have a look at glmmPQL in MASS, but I don't know if
that accepts models without random effects.
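A minimal sketch of the gls() route on simulated data (assumed: error SD proportional to 1/mean). varPower(form = ~ fitted(.)) models the residual SD as sigma * |fitted value|^delta. A negative estimate of delta means the variance decreases as the mean increases:

```r
library(nlme)

## hypothetical data: error SD proportional to 1/mean (true delta = -1)
set.seed(1)
n <- 500
d <- data.frame(x = runif(n, 1, 10))
d$y <- (10 + 2 * d$x) + rnorm(n, sd = 30 / (10 + 2 * d$x))

## residual SD modelled as sigma * |fitted value|^delta
fit <- gls(y ~ x, data = d,
           weights = varPower(form = ~ fitted(.)))

delta <- coef(fit$modelStruct$varStruct, unconstrained = FALSE)
print(delta)   # a negative value means variance falls as the mean rises
```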

Kjetil Halvorsen

> Hello all,
> I'm running a very simple regression but face a problem of
> non-homogeneity of variance, with the variance decreasing as the
> mean increases... I do not know how to deal with that. The
> relationship doesn't seem to be strong, but it's the first time I've seen
> something like that, and I would like to know what to do if one day it
> becomes stronger. Just for fun I tested some transformations but was
> not able to get a better model. I do not know if it helps, but my
> predictor variable has a gamma/Poisson-like, zero-rich
> distribution (continuous, of course) and is highly overdispersed. If anyone
> knows how to deal with decreasing variance, I would appreciate any advice.
> (I tried to model a negative variance-mean relationship in a new quasi-
> family, but this was prohibited: only constant, mu, mu^x (and mu(1-mu) for
> binomial) were allowed.) I've definitely reached the border of the
> statistical black box for me. thanks simon



Re: [R] lattice problem in R-1.9.0

2004-04-13 Thread Sundar Dorai-Raj


Deepayan Sarkar wrote:

On Tuesday 13 April 2004 12:51, Sundar Dorai-Raj wrote:

Hi all,
  I just installed R-1.9.0 on Windows 2000 from binaries. Yesterday,
on R-1.8.1 I ran a script that looked like:
library(lattice)
tmp <- expand.grid(A = 1:3, B = letters[1:2])
tmp$z <- runif(NROW(tmp))
trellis.device(png, file = "x1081.png", theme = col.whitebg)
xyplot(z ~ A | B, data = tmp,
       panel = function(x, y, i) {
         panel.xyplot(x, y)
         ltext(1, 0.95, paste("i =", i), adj = 0)
       },
       ylim = c(0, 1),
       i = 10)
dev.off()
In R-1.9.0, the same script gives the following error message:

Error in trellis.skeleton(cond = structure(list(B =
structure(as.integer(c(1,  :
Invalid value of index.cond
 ^^



I've tracked it down to including the argument "i" to the panel
function. If I change the argument to
xyplot(z ~ A | B, data = tmp,
       panel = function(x, y, I) {
         panel.xyplot(x, y)
         ltext(1, 0.95, paste("i =", I), adj = 0)
       },
       ylim = c(0, 1),
       I = 10)
all is copacetic. There is no argument in xyplot that starts with "i"
so I don't know where the partial matching is occurring.


Actually, in R 1.9.0, xyplot() does have a new argument that starts with 
i, namely 'index.cond' (as indicated by the error message above). This 
(along with many other arguments) doesn't show up in args(xyplot) 
because of the way arguments common to high-level lattice functions are 
handled by common code (they are formally part of ...); but it is 
documented in ?xyplot.

Deepayan

Sorry, should have caught that. As you suspected all I did was 
args(xyplot). I wasn't expecting a new argument.

Thanks for the quick reply.

--sundar



Re: [R] "diff"^-1

2004-04-13 Thread Roger D. Peng
Whoops, I was thinking of something totally different.  Apologies.

I think you might want diffinv().

-roger

Roger D. Peng wrote:
What do you mean by "opposite"?  Have you looked at patch?

-roger

michele lux wrote:

Hallo all
Does anybody know whether there is a command that does the
opposite of what the "diff" command does?
Or do I have to write the code myself?
thanks Michele




Re: [R] Complex sample variances

2004-04-13 Thread Thomas Lumley
On Tue, 13 Apr 2004, Fred Rohde wrote:

> Looked through the publication, "Statistical Methods and Mathematical
> Algorithms Used in Sudaan" (Shah, et al, 1993) but the only reference to
> variances on quantiles is a 1991 presentation by David Binder.  Googled
> the title and got this link.
>
> http://www.amstat.org/sections/srms/Proceedings/papers/1991_005.pdf
>

Ok.  I see.

I wouldn't have called this a Taylor series method, and I notice that
Binder agrees with me.  They are doing interval estimation by inverting a
score test, which is an interval estimation method I want to add more
generally in R.  It works much better than Wald tests for a number of
quasilikelihood/estimating function estimators in ordinary model-based
analysis, too.

Taylor series methods have trouble with quantiles because the estimating
function isn't differentiable.  Asymptotic normality still applies, but
the asymptotic standard error depends on the density of the variable at
the quantile, and the asymptotic approximation is not as good as usual.
Even the bootstrap needs larger sample sizes for quantiles than for many
statistics.
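For the simplest case of independent, identically distributed observations (not the complex-survey setting discussed here), the standard asymptotic result behind this remark is as follows. Let $f$ be the density and $q_p$ the $p$-th quantile:

```latex
\sqrt{n}\,\bigl(\hat q_p - q_p\bigr) \xrightarrow{d}
  N\!\left(0,\ \frac{p(1-p)}{f(q_p)^2}\right),
\qquad
\operatorname{se}(\hat q_p) \approx \frac{1}{f(q_p)}\sqrt{\frac{p(1-p)}{n}} .
```

A standard error for a quantile therefore needs an estimate of the density at the quantile itself.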


-thomas



Re: [R] Need advice on using R with large datasets

2004-04-13 Thread Peter Dalgaard
"Liaw, Andy" <[EMAIL PROTECTED]> writes:

> On a dual Opteron 244 with 16GB ram, and 
> 
> [EMAIL PROTECTED]:cb1]% free
>              total       used       free     shared    buffers     cached
> Mem:      16278648   14552676    1725972          0     229420    3691824
> -/+ buffers/cache:   10631432    5647216
> Swap:      2096472      13428    2083044
> 
> ... using freshly compiled R-1.9.0:
> 
> > system.time(x <- numeric(1e9))
> [1]  3.60  8.09 15.11  0.00  0.00
> > object.size(x)/1024^3
> [1] 7.45058


Well,

> system.time(mean(x))
[1]   15.80   20.94 1323.01    0.00    0.00
> object.size(x)/1024^3
[1] 7.45058

I suppose I just have to look forward to RAM prices dropping...
(Actually, the OS should be able to do better. It should be able to read
the data from disk at about 20s/GB.)

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



Re: [R] "diff"^-1

2004-04-13 Thread kjetil
On 13 Apr 2004 at 18:58, michele lux wrote:

> Hallo all
> Does anybody know whether there is a command that does the
> opposite of what the "diff" command does?
> Or do I have to write the code myself?
> thanks Michele
> 

As other responses have shown, your question could have been clearer!

?diffinv

Note that this is not really an inverse, as the following shows:

> diffinv(diff(1:10))
 [1] 0 1 2 3 4 5 6 7 8 9

If you know the first element of your original series, you can do:

> diffinv(diff(3:13), xi=3)
 [1]  3  4  5  6  7  8  9 10 11 12 13

Kjetil Halvorsen




Re: [R] Opposite of 'diff'? [was: (no subject)]

2004-04-13 Thread Duncan Murdoch
On Tue, 13 Apr 2004 10:58:35 -0700, Spencer Graves
<[EMAIL PROTECTED]> wrote :

>What is "patch"?  I don't find it in R 1.8.1.  However, ?"diff" mentions 
>"diffinv";  that and "cumsum" perform as follows: 

"diff" is a Unix command to calculate differences between files.
"patch" is a Unix command to apply such differences to a file.

I'm pretty sure I misunderstood the question; "cumsum" is probably the
right answer.
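A small sketch of the two equivalent ways to undo diff() when the starting value is known. The vector x is made up for illustration, and the lagged case assumes diffinv()'s xi argument holds the first 'lag' values:

```r
x <- c(5, 3, 8, 10, 7)   # made-up series
d <- diff(x)             # first differences: -2 5 2 -3

back1 <- diffinv(d, xi = x[1])          # integrate from the known first value
back2 <- c(x[1], x[1] + cumsum(d))      # the same thing with cumsum()

## lag-2 differences invert the same way, given the first two values
d2    <- diff(x, lag = 2)
back3 <- diffinv(d2, lag = 2, xi = x[1:2])

all.equal(back1, x)
all.equal(back2, x)
all.equal(back3, x)
```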

Duncan Murdoch



Re: [R] Opposite of 'diff'? [was: (no subject)]

2004-04-13 Thread Peter Dalgaard
Spencer Graves <[EMAIL PROTECTED]> writes:

> What is "patch"?  I don't find it in R 1.8.1.  However, ?"diff"
> mentions "diffinv";  that and "cumsum" perform as follows:

diff is a Unix command for comparing files. Using output from diff to
patch a file is done by a program called ... well you guessed it (an
acronym for "please apply this clever hack" according to legend).

I think that your guess, that it was the R function that was intended,
was a better one, though.


-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



Re: [R] lattice problem in R-1.9.0

2004-04-13 Thread Deepayan Sarkar
On Tuesday 13 April 2004 12:51, Sundar Dorai-Raj wrote:
> Hi all,
>I just installed R-1.9.0 on Windows 2000 from binaries. Yesterday,
> on R-1.8.1 I ran a script that looked like:
>
> library(lattice)
> tmp <- expand.grid(A = 1:3, B = letters[1:2])
> tmp$z <- runif(NROW(tmp))
> trellis.device(png, file = "x1081.png", theme = col.whitebg)
> xyplot(z ~ A | B, data = tmp,
> panel = function(x, y, i) {
>   panel.xyplot(x, y)
>   ltext(1, 0.95, paste("i =", i), adj = 0)
> },
> ylim = c(0, 1),
> i = 10)
> dev.off()
>
> In R-1.9.0, the same script gives the following error message:
>
> Error in trellis.skeleton(cond = structure(list(B =
> structure(as.integer(c(1,  :
>   Invalid value of index.cond
 ^^


> I've tracked it down to including the argument "i" to the panel
> function. If I change the argument to
>
> xyplot(z ~ A | B, data = tmp,
> panel = function(x, y, I) {
>   panel.xyplot(x, y)
>   ltext(1, 0.95, paste("i =", I), adj = 0)
> },
> ylim = c(0, 1),
> I = 10)
>
> all is copacetic. There is no argument in xyplot that starts with "i"
> so I don't know where the partial matching is occurring.

Actually, in R 1.9.0, xyplot() does have a new argument that starts with 
i, namely 'index.cond' (as indicated by the error message above). This 
(along with many other arguments) doesn't show up in args(xyplot) 
because of the way arguments common to high-level lattice functions are 
handled by common code (they are formally part of ...); but it is 
documented in ?xyplot.
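A small sketch of the mechanism. The functions inner() and outer_fn() are hypothetical stand-ins for lattice's internals. A named argument passed through ... is partially matched against the formals that come before ... in the receiving function. Formals placed after ... are only matched exactly:

```r
## hypothetical stand-ins: inner() plays the role of the lattice code that
## receives the arguments; 'index.cond' comes before '...' in its formals
inner    <- function(x, index.cond = NULL, ...) index.cond
outer_fn <- function(x, ...) inner(x, ...)

outer_fn(1, i = 10)    # 10: 'i' is partially matched to 'index.cond'

## formals placed after '...' can only be matched exactly
inner2    <- function(x, ..., index.cond = NULL) index.cond
outer_fn2 <- function(x, ...) inner2(x, ...)

outer_fn2(1, i = 10)   # NULL: 'i' stays in '...'
```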

Deepayan



Re: [R] Need advice on using R with large datasets

2004-04-13 Thread Paul Gilbert
Liaw, Andy wrote:

I was under the impression that R has been run on 64-bit Solaris (and other
64-bit Unices) for quite a while (as 64-bit app).  

Yes, on Solaris it has worked for quite a while. I don't use it a lot, 
but have one problem that I have been running from time to time for a 
few years.  There are two "issues" that I know about.

1/  Some extra capabilities (like png, I think) also need to be compiled
as 64-bit apps, and in some cases this is a non-trivial effort (on
Solaris, for someone like me who does not do that kind of thing often).
For this reason I have both a 32-bit version for regular use and a
64-bit version for special problems.

2/  Some R functions make copies of the data sets used and attach them
to the result. For small data sets that can be very useful. If the
result is then used as an argument to another function, there are very
quickly multiple copies. If the data set is large, one quickly ends up
making heavy use of swap, and the processing is very slow. This is not
just a 64-bit problem, but with a 32-bit architecture it is hard to work
on a data set big enough for this to become an issue. In some cases
performance can be improved a lot by hacking the code so that it does not
attach the dataset to the result (with some risk that functions
using the result get broken).
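lm() is one concrete instance of point 2, chosen here only as an example (other functions behave differently). It keeps a copy of the model frame in its result unless model = FALSE is given, and object.size() shows the difference:

```r
## illustrative data; only the relative sizes matter
set.seed(1)
n <- 1e5
d <- data.frame(x = rnorm(n), y = rnorm(n))

fit_full  <- lm(y ~ x, data = d)                 # stores the model frame in $model
fit_small <- lm(y ~ x, data = d, model = FALSE)  # does not

print(object.size(fit_full),  units = "Mb")
print(object.size(fit_small), units = "Mb")
```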

Paul Gilbert

We've been running 64-bit
R on amd64 for a few months (and had quite a few opportunities to get the R
processes using over 8GB of RAM). Not much of a problem as far as I can see...
Best,
Andy
 

From: Roger D. Peng

As far as I know, R does compile on AMD Opterons and runs as a 
64-bit application.  So it can store objects larger than 4GB. 
However, I don't think R gets tested very often on 64-bit 
machines with such large objects so there may be yet undiscovered 
bugs.
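One quick way to check whether a given R build is 64-bit is the pointer size recorded in .Machine:

```r
## size of a C pointer in this R build, in bytes: 8 on 64-bit, 4 on 32-bit
.Machine$sizeof.pointer
is_64bit <- .Machine$sizeof.pointer == 8
```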

-roger

Sunny Ho wrote:

   

Hello everyone,

I would like to get some advice on using R with some
really large datasets.

I'm using R 1.8.1 on RH9 Linux for research involving a lot of
numerical data. The datasets total around 200Mb (as shown by
memory.size). During my data manipulation, system memory
usage grew to 1.5Gb, and this caused a lot of swapping
on my 1Gb PC. This is just a small-scale
experiment; the full-scale one will use data 30 times as
large (on a 4Gb machine). I can see that I'll need to deal
with the memory usage problem very soon.

I notice that R keeps all datasets in memory at all times.
I wonder whether there is any way to instruct R to push some
of the less frequently used data tables out of main memory,
so as to free up memory for those that are actively in use.
It would be even better if R could keep in memory only the part
of a table that is currently needed. Using save & load could
help, but I wonder whether R is intelligent enough to do
this by itself, so that I don't need to keep track of memory usage
at all times.

Another thought is to use a 64-bit machine (AMD64). I find
there is a pre-compiled R for Fedora Linux on AMD64. Does anyone
know whether this version of R runs as 64-bit? If so,
will R be able to go beyond the 32-bit 4Gb memory limit?

Also, from the manual, I find that the RPgSQL package (for
PostgreSQL databases) supports a feature called "proxy data frame".
Does anyone have experience with this? Can a "proxy data frame"
handle memory efficiently for very large datasets? Say, if I
have a 6Gb database table defined as a proxy data frame, will
R & RPgSQL be able to handle it with just 4Gb of memory?

Any comments will be useful. Many thanks.

Sunny Ho
(Hong Kong University of Science & Technology)


Re: [R] Opposite of 'diff'? [was: (no subject)]

2004-04-13 Thread Spencer Graves
What is "patch"?  I don't find it in R 1.8.1.  However, ?"diff" mentions
"diffinv"; that and "cumsum" perform as follows:

> cumsum(diff(1:11))
[1]  1  2  3  4  5  6  7  8  9 10
> diffinv(diff(1:11))
[1]  0  1  2  3  4  5  6  7  8  9 10
spencer graves
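A minimal sketch of the same idea on hypothetical data: cumsum() undoes diff() only up to the starting value, which has to be added back. diffinv() takes that starting value through its xi argument.

```r
x <- c(3, 7, 4, 10, 12)        # hypothetical example data
d <- diff(x)                   # 4 -3 6 2

# cumsum() recovers x only up to its starting value,
# so put x[1] back in front and add it to the running sums.
x_rec <- c(x[1], x[1] + cumsum(d))

# diffinv() does the same in one call; xi is the starting value.
x_inv <- diffinv(d, xi = x[1])

all.equal(x_rec, x)            # TRUE
all.equal(x_inv, x)            # TRUE
```

Without xi, diffinv() starts from 0, which is why the output above begins with 0.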

Duncan Murdoch wrote:

On Tue, 13 Apr 2004 18:57:59 +0200 (CEST), michele lux
<[EMAIL PROTECTED]> wrote :
 

Hello all,
does anybody know whether there is a command that does the
opposite of what the "diff" command does?
Or do I have to write code?
   

Sounds like "patch" is what you want.  

Duncan Murdoch



[R] lattice problem in R-1.9.0

2004-04-13 Thread Sundar Dorai-Raj
Hi all,
  I just installed R-1.9.0 on Windows 2000 from binaries. Yesterday, on 
R-1.8.1 I ran a script that looked like:

library(lattice)
tmp <- expand.grid(A = 1:3, B = letters[1:2])
tmp$z <- runif(NROW(tmp))
trellis.device(png, file = "x1081.png", theme = col.whitebg)
xyplot(z ~ A | B, data = tmp,
       panel = function(x, y, i) {
         panel.xyplot(x, y)
         ltext(1, 0.95, paste("i =", i), adj = 0)
       },
       ylim = c(0, 1),
       i = 10)
dev.off()

In R-1.9.0, the same script gives the following error message:

Error in trellis.skeleton(cond = structure(list(B = 
structure(as.integer(c(1,  :
	Invalid value of index.cond

I've tracked it down to including the argument "i" to the panel 
function. If I change the argument to

xyplot(z ~ A | B, data = tmp,
       panel = function(x, y, I) {
         panel.xyplot(x, y)
         ltext(1, 0.95, paste("i =", I), adj = 0)
       },
       ylim = c(0, 1),
       I = 10)

all is copacetic. There is no argument in xyplot that starts with "i", so
I don't know where the partial matching is occurring.
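One plausible mechanism, offered as an assumption and not confirmed in this thread, is R's partial argument matching. Extra arguments are passed on through `...`. If an internal function such as trellis.skeleton() (named in the error) has a formal argument beginning with "i" placed before its own `...`, then `i = 10` can partially match it. The index.cond named in the error message would be such an argument. The two hypothetical functions below show the rule: partial matching applies only to formals listed before `...`.

```r
# Two hypothetical functions that differ only in where index.cond sits.
f_before <- function(x, index.cond = NULL, ...) index.cond
f_after  <- function(x, ..., index.cond = NULL) index.cond

f_before(1, i = 10)   # 10: 'i' partially matches 'index.cond'
f_after(1, i = 10)    # NULL: after '...' only exact names match, so 'i' goes into '...'
```

This would also explain why the capitalised `I` works: it partially matches no lower-case argument name.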

Thanks,
Sundar


Re: [R] (no subject)

2004-04-13 Thread Duncan Murdoch
On Tue, 13 Apr 2004 18:57:59 +0200 (CEST), michele lux
<[EMAIL PROTECTED]> wrote :

>Hello all,
>does anybody know whether there is a command that does the
>opposite of what the "diff" command does?
>Or do I have to write code?

Sounds like "patch" is what you want.  

Duncan Murdoch



[R] Non-homogeneity of variance - decreasing variance

2004-04-13 Thread Simon Chamaillé
Hello all,
I'm running a very simple regression but face a problem of non-homogeneity of
variance: the variance decreases as the mean increases, and I do not know how
to deal with that.
The relationship doesn't seem to be strong, but it's the first time I've seen
something like that, and I would like to know what to do if one day it becomes
stronger. Just for fun I tested some transformations, but I was not able to get
a better model. In case it helps, my predictor variable has a gamma/Poisson-like,
zero-rich distribution (continuous, of course) and is highly overdispersed.
If anyone knows how to deal with decreasing variance, I would appreciate any
advice. (I tried to model a negative variance-mean relationship in a new
quasi family, but that was rejected: only constant, mu, mu^x (and mu(1-mu) for
binomial) were allowed.) I've definitely reached the edge of the statistical
black box for me.
thanks
simon



Re: [R] "diff"^-1

2004-04-13 Thread Roger D. Peng
What do you mean by "opposite"?  Have you looked at patch?

-roger

michele lux wrote:
Hello all,
does anybody know whether there is a command that does the
opposite of what the "diff" command does?
Or do I have to write code?
Thanks, Michele


Re: [R] "diff"^-1

2004-04-13 Thread Spencer Graves
?cumsum

michele lux wrote:

Hello all,
does anybody know whether there is a command that does the
opposite of what the "diff" command does?
Or do I have to write code?
Thanks, Michele


[R] "diff"^-1

2004-04-13 Thread michele lux
Hello all,
does anybody know whether there is a command that does the
opposite of what the "diff" command does?
Or do I have to write code?
Thanks, Michele



[R] (no subject)

2004-04-13 Thread michele lux
Hello all,
does anybody know whether there is a command that does the
opposite of what the "diff" command does?
Or do I have to write code?
Thanks, Michele



RE: [R] Need advice on using R with large datasets

2004-04-13 Thread Liaw, Andy
On a dual Opteron 244 with 16GB RAM, and

[EMAIL PROTECTED]:cb1]% free
             total       used       free     shared    buffers     cached
Mem:      16278648   14552676    1725972          0     229420    3691824
-/+ buffers/cache:   10631432    5647216
Swap:      2096472      13428    2083044

... using freshly compiled R-1.9.0:

> system.time(x <- numeric(1e9))
[1]  3.60  8.09 15.11  0.00  0.00
> object.size(x)/1024^3
[1] 7.45058

Andy

> From: Peter Dalgaard
> 
> "Roger D. Peng" <[EMAIL PROTECTED]> writes:
> 
> > I've been running R on 64-bit SuSE Linux on Opterons for a few months
> > now and it certainly runs fine in what I would call standard
> > situations.  In particular there seems to be no problem with
> > workspaces > 4GB.  But I seldom handle single objects (like matrices,
> > vectors) that are > 4GB.  The only exception is lists, but I think
> > those are okay since they are composed of various sub-objects (like
> > Peter mentioned).
>
> I just tried, and x <- numeric(1e9) (~8GB) doesn't appear to be a
> problem, except that it takes "forever" since the machine in question
> has only 1GB of memory, and numeric() zero fills the allocated
> block...
>
> --
>    O__   Peter Dalgaard Blegdamsvej 3
>   c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
>  (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
> ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907


Re: [R] Need advice on using R with large datasets

2004-04-13 Thread Peter Dalgaard
"Roger D. Peng" <[EMAIL PROTECTED]> writes:

> I've been running R on 64-bit SuSE Linux on Opterons for a few months
> now and it certainly runs fine in what I would call standard
> situations.  In particular there seems to be no problem with
> workspaces > 4GB.  But I seldom handle single objects (like matrices,
> vectors) that are > 4GB.  The only exception is lists, but I think
> those are okay since they are composed of various sub-objects (like
> Peter mentioned).

I just tried, and x <- numeric(1e9) (~8GB) doesn't appear to be a
problem, except that it takes "forever" since the machine in question
has only 1GB of memory, and numeric() zero fills the allocated
block...
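The ~8GB figure follows from 8 bytes per double. Here is a quick check of the arithmetic. It also shows how far short a 32-bit address space (2^32 bytes) falls.

```r
len <- 1e9                  # length used in the thread: numeric(1e9)
bytes <- len * 8            # each double occupies 8 bytes
bytes / 1024^3              # about 7.45 (GiB), in line with object.size(x)/1024^3

# A 32-bit address space is 2^32 bytes, so even in theory it could hold
# at most 2^32 / 8 = 2^29 doubles, before R or the OS use any of it.
2^32 / 8                    # 536870912
```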

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



Re: [R] randomForest: more than one variable needed?

2004-04-13 Thread Torsten Hothorn
On Tue, 13 Apr 2004, Hui Han wrote:

> Hi,
>
> I am doing feature selection for my dataset. The following is
> the extreme case where only one feature is left, but I got
> the error below. So my question is: do I have to use
> more than one feature?
>
> > sample.subset
>   udomain.edu hpclass
> 1        -1.0     not
> 2        -1.0     not
> 3        -0.2     not
> 4         1.0      hp
> 5         1.0      hp
> > randomForest(hpclass ~., data=sample.subset, importance=TRUE);
> Error in if (n == 0) stop("data (x) has 0 rows") :
> argument is of length zero
>

No idea about the error message, but there is no need for feature
selection before using random forests; give it a try without
preselecting variables.

best

Torsten

> Best regards,
> Hui Han
> Department of Computer Science and Engineering,
> The Pennsylvania State University
> University Park, PA,16802
> email: [EMAIL PROTECTED]
> homepage: http://www.cse.psu.edu/~hhan
>


RE: [R] randomForest: more than one variable needed?

2004-04-13 Thread Liaw, Andy
With only one `x' variable, RF will be identical to bagging.

This looks like a bug.  I will check it out.
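The bagging remark can be checked against randomForest()'s documented default for classification: mtry = floor(sqrt(p)) for p predictors. A forest in which every split may consider all p predictors is bagging. The sketch below uses plain arithmetic and does not call randomForest. It shows that this happens only when p = 1.

```r
# Default mtry for classification, as documented for randomForest():
# floor(sqrt(p)), where p is the number of predictors.
default_mtry <- function(p) floor(sqrt(p))

p <- 1:10
data.frame(p = p,
           mtry = default_mtry(p),
           all_predictors = default_mtry(p) == p)  # TRUE only for p = 1
```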

Andy

> From: Hui Han
> 
> I agree with you that this extreme-case sample has little practical
> meaning. I am just curious about the "grammar" (syntax) of
> randomForest.
> 
> Thanks.
> Hui
> 
> On Tue, Apr 13, 2004 at 05:29:06PM +0200, Philippe Grosjean wrote:
> > I don't see much point in using random forest with only one predictor variable!
> > Recall that a random forest grows trees with a random subset of variables "in
> > competition" for splitting each node of the trees in the forest... How do you
> > make such a random subset with only one predictor variable? There is no
> > point here!
> > 
> > Philippe Grosjean
> > 
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] Behalf Of Hui Han
> > Sent: Tuesday, 13 April, 2004 17:16
> > To: [EMAIL PROTECTED]
> > Subject: [R] randomForest: more than one variable needed?
> > 
> > 
> > Hi,
> > 
> > I am doing feature selection for my dataset. The following is
> > the extreme case where only one feature is left, but I got
> > the error below. So my question is: do I have to use
> > more than one feature?
> >
> > > sample.subset
> >   udomain.edu hpclass
> > 1        -1.0     not
> > 2        -1.0     not
> > 3        -0.2     not
> > 4         1.0      hp
> > 5         1.0      hp
> > > randomForest(hpclass ~., data=sample.subset, importance=TRUE);
> > Error in if (n == 0) stop("data (x) has 0 rows") :
> > argument is of length zero
> > 
> > Best regards,
> > Hui Han
> > Department of Computer Science and Engineering,
> > The Pennsylvania State University
> > University Park, PA,16802
> > email: [EMAIL PROTECTED]
> > homepage: http://www.cse.psu.edu/~hhan
> > 
> 
> 
> Hui Han
> Department of Computer Science and Engineering,
> The Pennsylvania State University 
> University Park, PA,16802
> email: [EMAIL PROTECTED]
> homepage: http://www.cse.psu.edu/~hhan
> 


Re: [R] Need advice on using R with large datasets

2004-04-13 Thread Roger D. Peng
I've been running R on 64-bit SuSE Linux on Opterons for a few 
months now and it certainly runs fine in what I would call 
standard situations.  In particular there seems to be no problem 
with workspaces > 4GB.  But I seldom handle single objects (like 
matrices, vectors) that are > 4GB.  The only exception is lists, 
but I think those are okay since they are composed of various 
sub-objects (like Peter mentioned).

-roger

Liaw, Andy wrote:
I was under the impression that R has been run on 64-bit Solaris (and other
64-bit Unices) for quite a while (as 64-bit app).  We've been running 64-bit
R on amd64 for a few months (and had quite a few opportunities to get the R
processes using over 8GB of RAM).  Not much problem as far as I can see...
Best,
Andy

From: Roger D. Peng

As far as I know, R does compile on AMD Opterons and runs as a 
64-bit application.  So it can store objects larger than 4GB. 
However, I don't think R gets tested very often on 64-bit 
machines with such large objects so there may be yet undiscovered 
bugs.

-roger

Sunny Ho wrote:


Hello everyone,

I would like to get some advice on using R with some
really large datasets.

I'm using R 1.8.1 on RH9 Linux for research with a lot of
numerical data. The datasets total around 200MB (as shown by
memory.size). During my data manipulation, system memory
usage grew to 1.5GB, and this caused a lot of swapping
on my 1GB PC. This is just a small-scale experiment; the
full-scale one will use data 30 times as large (on a 4GB
machine). I can see that I'll need to deal with memory
usage problems very soon.

I notice that R keeps all datasets in memory at all times.
I wonder whether there is any way to instruct R to push some
of the less-frequently-used data tables out of main memory,
so as to free up memory for those that are actively in use.
It would be even better if R could keep only part of a table
in memory, loading that part only when it is needed. Using
save & load could help, but I wonder whether R is intelligent
enough to do this by itself, so that I don't need to keep
track of memory usage at all times.

Another thought is to use a 64-bit machine (AMD64). I find
there is a pre-compiled R for Fedora Linux on AMD64. Does
anyone know whether this version of R runs as 64-bit? If so,
will R be able to go beyond the 32-bit 4GB memory limit?

Also, from the manual, I find that the RPgSQL package (for
the PostgreSQL database) supports a "proxy data frame" feature.
Does anyone have experience with this? Can a "proxy data frame"
handle memory efficiently for very large datasets? Say I
have a 6GB database table defined as a proxy data frame: will
R & RPgSQL be able to handle it with just 4GB of memory?

Any comments will be useful. Many thanks.

Sunny Ho
(Hong Kong University of Science & Technology)
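The save & load approach Sunny Ho mentions can be done by hand. Below is a minimal sketch that uses a hypothetical table `big` and a temporary file. It writes the object to disk, removes it from the workspace, lets gc() reclaim the memory, and brings it back with load() when needed.

```r
# Hypothetical large table; substitute your own object.
big <- data.frame(id = 1:1e5, value = rnorm(1e5))
rdata_file <- file.path(tempdir(), "big.RData")

save(big, file = rdata_file)   # write the object to disk
rm(big)                        # drop it from the workspace
invisible(gc())                # let R reclaim the memory

## ... work with other objects here ...

load(rdata_file)               # restore it when it is needed again
nrow(big)                      # the table is back
```

The bookkeeping stays with the user: nothing here decides automatically which objects are "less frequently used".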


Re: [R] randomForest: more than one variable needed?

2004-04-13 Thread Hui Han
I agree with you that this extreme-case sample has little practical
meaning. I am just curious about the "grammar" (syntax) of
randomForest.

Thanks.
Hui

On Tue, Apr 13, 2004 at 05:29:06PM +0200, Philippe Grosjean wrote:
> I don't see much point in using random forest with only one predictor variable!
> Recall that a random forest grows trees with a random subset of variables "in
> competition" for splitting each node of the trees in the forest... How do you
> make such a random subset with only one predictor variable? There is no
> point here!
> 
> Philippe Grosjean
> 
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Hui Han
> Sent: Tuesday, 13 April, 2004 17:16
> To: [EMAIL PROTECTED]
> Subject: [R] randomForest: more than one variable needed?
> 
> 
> Hi,
> 
> I am doing feature selection for my dataset. The following is
> the extreme case where only one feature is left, but I got
> the error below. So my question is: do I have to use
> more than one feature?
>
> > sample.subset
>   udomain.edu hpclass
> 1        -1.0     not
> 2        -1.0     not
> 3        -0.2     not
> 4         1.0      hp
> 5         1.0      hp
> > randomForest(hpclass ~., data=sample.subset, importance=TRUE);
> Error in if (n == 0) stop("data (x) has 0 rows") :
> argument is of length zero
> 
> Best regards,
> Hui Han
> Department of Computer Science and Engineering,
> The Pennsylvania State University
> University Park, PA,16802
> email: [EMAIL PROTECTED]
> homepage: http://www.cse.psu.edu/~hhan
> 


Hui Han
Department of Computer Science and Engineering,
The Pennsylvania State University 
University Park, PA,16802
email: [EMAIL PROTECTED]
homepage: http://www.cse.psu.edu/~hhan



RE: [R] randomForest: more than one variable needed?

2004-04-13 Thread Philippe Grosjean
I don't see much point in using random forest with only one predictor variable!
Recall that a random forest grows trees with a random subset of variables "in
competition" for splitting each node of the trees in the forest... How do you
make such a random subset with only one predictor variable? There is no
point here!

Philippe Grosjean

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Hui Han
Sent: Tuesday, 13 April, 2004 17:16
To: [EMAIL PROTECTED]
Subject: [R] randomForest: more than one variable needed?


Hi,

I am doing feature selection for my dataset. The following is
the extreme case where only one feature is left, but I got
the error below. So my question is: do I have to use
more than one feature?

> sample.subset
  udomain.edu hpclass
1        -1.0     not
2        -1.0     not
3        -0.2     not
4         1.0      hp
5         1.0      hp
> randomForest(hpclass ~., data=sample.subset, importance=TRUE);
Error in if (n == 0) stop("data (x) has 0 rows") :
argument is of length zero

Best regards,
Hui Han
Department of Computer Science and Engineering,
The Pennsylvania State University
University Park, PA,16802
email: [EMAIL PROTECTED]
homepage: http://www.cse.psu.edu/~hhan



Re: [R] fractal calculation using fdim

2004-04-13 Thread Fred J.
> Is that how you got your data or are
> you using real data?

I am using some synthetic data I happened to have.

> argument X is to be a dataframe
> not a matrix  (mat??).  Could that be giving you
> problems?  Do you get
> better results with as.data.frame(mat)?

No, even with a data frame it gives the same error.



[R] randomForest: more than one variable needed?

2004-04-13 Thread Hui Han
Hi,

I am doing feature selection for my dataset. The following is
the extreme case where only one feature is left, but I got
the error below. So my question is: do I have to use
more than one feature?

> sample.subset
  udomain.edu hpclass
1        -1.0     not
2        -1.0     not
3        -0.2     not
4         1.0      hp
5         1.0      hp
> randomForest(hpclass ~., data=sample.subset, importance=TRUE);
Error in if (n == 0) stop("data (x) has 0 rows") :
argument is of length zero

Best regards,
Hui Han
Department of Computer Science and Engineering,
The Pennsylvania State University 
University Park, PA,16802
email: [EMAIL PROTECTED]
homepage: http://www.cse.psu.edu/~hhan



Re: [R] Need advice on using R with large datasets

2004-04-13 Thread Peter Dalgaard
"Roger D. Peng" <[EMAIL PROTECTED]> writes:

> As far as I know, R does compile on AMD Opterons and runs as a 64-bit
> application.  So it can store objects larger than 4GB. However, I
> don't think R gets tested very often on 64-bit machines with such
> large objects so there may be yet undiscovered bugs.

There are a few such machines around among R users, and R seems to
work OK on them. One slight gotcha is that the Fortran numeric
libraries (Lapack, ATLAS) tend to use integer indexing, which might
overflow for large objects. Things like data frames which consist of
multiple subobjects might be less sensitive to this. 
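Here is a quick illustration of the overflow described above. R's own 32-bit integers stand in for the Fortran ones; LAPACK and ATLAS internals are not shown. A hypothetical 50000 x 50000 matrix has more cells than a signed 32-bit index can count.

```r
.Machine$integer.max            # 2147483647, i.e. 2^31 - 1
nr <- 50000L                    # hypothetical dimension of a square matrix
nr * nr                         # NA, with a warning: integer overflow
as.numeric(nr) * nr             # 2.5e+09 when computed in double precision
as.numeric(nr) * nr > .Machine$integer.max   # TRUE: too many cells for a 32-bit index
```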

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



RE: [R] Need advice on using R with large datasets

2004-04-13 Thread Liaw, Andy
I was under the impression that R has been run on 64-bit Solaris (and other
64-bit Unices) for quite a while (as 64-bit app).  We've been running 64-bit
R on amd64 for a few months (and had quite a few opportunities to get the R
processes using over 8GB of RAM).  Not much problem as far as I can see...

Best,
Andy

> From: Roger D. Peng
> 
> As far as I know, R does compile on AMD Opterons and runs as a 
> 64-bit application.  So it can store objects larger than 4GB. 
> However, I don't think R gets tested very often on 64-bit 
> machines with such large objects so there may be yet undiscovered 
> bugs.
> 
> -roger
> 
> Sunny Ho wrote:
> 
> > Hello everyone,
> >
> > I would like to get some advice on using R with some
> > really large datasets.
> >
> > I'm using R 1.8.1 on RH9 Linux for research with a lot of
> > numerical data. The datasets total around 200MB (as shown by
> > memory.size). During my data manipulation, system memory
> > usage grew to 1.5GB, and this caused a lot of swapping
> > on my 1GB PC. This is just a small-scale experiment; the
> > full-scale one will use data 30 times as large (on a 4GB
> > machine). I can see that I'll need to deal with memory
> > usage problems very soon.
> >
> > I notice that R keeps all datasets in memory at all times.
> > I wonder whether there is any way to instruct R to push some
> > of the less-frequently-used data tables out of main memory,
> > so as to free up memory for those that are actively in use.
> > It would be even better if R could keep only part of a table
> > in memory, loading that part only when it is needed. Using
> > save & load could help, but I wonder whether R is intelligent
> > enough to do this by itself, so that I don't need to keep
> > track of memory usage at all times.
> >
> > Another thought is to use a 64-bit machine (AMD64). I find
> > there is a pre-compiled R for Fedora Linux on AMD64. Does
> > anyone know whether this version of R runs as 64-bit? If so,
> > will R be able to go beyond the 32-bit 4GB memory limit?
> >
> > Also, from the manual, I find that the RPgSQL package (for
> > the PostgreSQL database) supports a "proxy data frame" feature.
> > Does anyone have experience with this? Can a "proxy data frame"
> > handle memory efficiently for very large datasets? Say I
> > have a 6GB database table defined as a proxy data frame: will
> > R & RPgSQL be able to handle it with just 4GB of memory?
> >
> > Any comments will be useful. Many thanks.
> >
> > Sunny Ho
> > (Hong Kong University of Science & Technology)


[R] R apache and PHP

2004-04-13 Thread Marcello Verona
I've developed a web application in PHP and R.

My script is:

<?php

...
exec("R CMD BATCH --silent /home/marcello/R_in/myfile.bat /home/marcello/R_out/myfile.out");

...

?>

This script runs R in batch mode and writes myfile.out.

On Win2000 a similar script works, but on Linux I have a problem.

I suppose it is a permission problem, because the same script runs fine from
the shell and in the Zend debugger (my IDE for PHP).
In that case the owner is "marcello"; if I run the script from the browser,
the owner is "apache".

I've changed the ownership of the R directory and bin to the apache
user, but it does not work.

If I run
exec("ls > mydir.txt"); it works (so it is not a general PHP problem!).
Can someone help me?

Thanks
(and excuse me for my poor English)
Marcello Verona



[R] R , apache and PHP

2004-04-13 Thread Marcello Verona
I've developed a web application in PHP and R.

My script is:

<?php

...
exec("R CMD BATCH --silent /home/marcello/R_in/myfile.bat /home/marcello/R_out/myfile.out");

...

?>

This script runs R in batch mode and writes myfile.out.

On Win2000 a similar script works, but on Linux I have a problem.

I suppose it is a permission problem, because the same script runs fine from
the shell and in the Zend debugger (my IDE for PHP).
In that case the owner is "marcello"; if I run the script from the browser,
the owner is "apache".

I've changed the ownership of the R directory and bin to the apache
user, but it does not work.

If I run
exec("ls > mydir.txt"); it works (so it is not a general PHP problem!).
Can someone help me?

Thanks
(and excuse me for my poor English)
Marcello Verona



Re: [R] Need advice on using R with large datasets

2004-04-13 Thread Thomas Lumley
On Tue, 13 Apr 2004, Roger D. Peng wrote:

> As far as I know, R does compile on AMD Opterons and runs as a
> 64-bit application.  So it can store objects larger than 4GB.
> However, I don't think R gets tested very often on 64-bit
> machines with such large objects so there may be yet undiscovered
> bugs.

Using more than 4Gb memory is reasonably tested now.  Single objects of
that size may not be -- I think you still can't have a vector whose
length() is more than 2^31, for example.
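The length limit can be put in numbers. This assumes the 2^31 - 1 maximum length described here; R 3.0.0 later added "long vectors" on 64-bit builds. Under that limit the largest double vector is just under 16 GiB, so a vector of about 8 GB such as numeric(1e9) still fits.

```r
max_len <- .Machine$integer.max   # 2147483647
max_len == 2^31 - 1               # TRUE
max_len * 8 / 1024^3              # just under 16 (GiB) for a double vector
1e9 < max_len                     # TRUE: numeric(1e9) is within the limit
```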

-thomas


>
> -roger
>
> Sunny Ho wrote:
>
> > Hello everyone,
> >
> > I would like to get some advice on using R with some really large datasets.
> >
> > I'm using R 1.8.1 on RH9 Linux for research with a lot of numerical data. The
> > datasets total around 200MB (as shown by memory.size). During my data
> > manipulation, system memory usage grew to 1.5GB, and this caused a lot of
> > swapping on my 1GB PC. This is just a small-scale experiment; the full-scale
> > one will use data 30 times as large (on a 4GB machine). I can see that I'll
> > need to deal with memory usage problems very soon.
> >
> > I notice that R keeps all datasets in memory at all times. I wonder whether there
> > is any way to instruct R to push some of the less-frequently-used data tables out
> > of main memory, so as to free up memory for those that are actively in use. It
> > would be even better if R could keep only part of a table in memory, loading that
> > part only when it is needed. Using save & load could help, but I wonder whether
> > R is intelligent enough to do this by itself, so that I don't need to keep track
> > of memory usage at all times.
> >
> > Another thought is to use a 64-bit machine (AMD64). I find there is a
> > pre-compiled R for Fedora Linux on AMD64. Does anyone know whether this version
> > of R runs as 64-bit? If so, will R be able to go beyond the 32-bit 4GB memory
> > limit?
> >
> > Also, from the manual, I find that the RPgSQL package (for the PostgreSQL
> > database) supports a "proxy data frame" feature. Does anyone have experience
> > with this? Can a "proxy data frame" handle memory efficiently for very large
> > datasets? Say I have a 6GB database table defined as a proxy data frame: will
> > R & RPgSQL be able to handle it with just 4GB of memory?
> >
> > Any comments will be useful. Many thanks.
> >
> > Sunny Ho
> > (Hong Kong University of Science & Technology)

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



[R] Re: model-based clustering

2004-04-13 Thread Christian Hennig
Dear Talita,

you may start with
library(mclust)
?EMclust
example(EMclust)
# The example is with Iris data.

To understand what you're doing, read a paper from the web page
of the developers, cited on the help page.

Best,
Christian

PS: What a "good" or "optimal" clustering is, is by no means well defined.
Some cluster algorithms will find 2 or more than 3 clusters on Iris,
and that's not necessarily an argument against these algorithms.
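To illustrate the idea behind model-based clustering, here is a minimal base-R sketch of the EM algorithm for a two-component, one-dimensional Gaussian mixture, applied to iris$Petal.Length. It is only an illustration under simplifying assumptions: mclust itself works with multivariate data, several covariance structures and model choice by BIC, and none of that is reproduced here. The function name em_gauss2() is hypothetical.

```r
## Illustration only: EM for a two-component univariate Gaussian mixture.
## This is NOT mclust's algorithm; it just shows the E-step / M-step idea.
em_gauss2 <- function(y, max_iter = 500L, tol = 1e-8) {
  ## Crude starting values: split the data at its median.
  lower <- y <= median(y)
  mu    <- c(mean(y[lower]), mean(y[!lower]))
  sigma <- c(sd(y[lower]), sd(y[!lower]))
  prop  <- c(0.5, 0.5)
  old_ll <- -Inf
  for (it in seq_len(max_iter)) {
    ## E-step: weighted component densities and membership probabilities.
    d1 <- prop[1] * dnorm(y, mu[1], sigma[1])
    d2 <- prop[2] * dnorm(y, mu[2], sigma[2])
    total <- d1 + d2
    z  <- d2 / total                 # P(component 2 | y)
    ll <- sum(log(total))            # log-likelihood at the current parameters
    ## M-step: weighted proportions, means and standard deviations.
    prop  <- c(1 - mean(z), mean(z))
    mu    <- c(sum((1 - z) * y) / sum(1 - z), sum(z * y) / sum(z))
    sigma <- sqrt(c(sum((1 - z) * (y - mu[1])^2) / sum(1 - z),
                    sum(z * (y - mu[2])^2) / sum(z)))
    if (abs(ll - old_ll) < tol) break
    old_ll <- ll
  }
  list(prop = prop, mean = mu, sd = sigma, loglik = ll,
       cluster = ifelse(z > 0.5, 2L, 1L), iterations = it)
}

fit <- em_gauss2(iris$Petal.Length)
table(fit$cluster, iris$Species)
```

With only one variable and two components this cannot be expected to separate all three species; it simply shows how the mixture fit assigns observations to groups.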
 
On Tue, 13 Apr 2004, Talita Leite wrote:

> Hello again,
> 
> Let me explain this better. I've been working on clustering methods 
> this year and now I'm starting (or trying to start) with model-based clustering. 
> I've been searching for help to understand how the functions work and I 
> found some. The problem is that I don't know the steps to follow. For 
> example, working with the IRIS data set: what steps do I have to follow and 
> what functions do I have to use to get a good clustering, i.e. to find the three 
> groups in that case?
> 
> Thanks,
> 
> Talita
> 
> 

***
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/
###
I recommend www.boag-online.de



RE: [R] model-based clustering

2004-04-13 Thread Liaw, Andy
In the time you spent posting numerous uninformative messages, you could
have done yourself a great favor by following the posting guide mentioned
in the footer.

Try:

install.packages("mclust")
library(mclust)
?mclust
example(mclust)

Andy

> From: Talita Leite
> 
> Hello again,
> 
> Let me explain this better. I've been working on clustering
> methods this year and now I'm starting (or trying to start) with
> model-based clustering. I've been searching for help to understand
> how the functions work and I found some. The problem is that I don't
> know the steps to follow. For example, working with the IRIS data set:
> what steps do I have to follow and what functions do I have to use to
> get a good clustering, i.e. to find the three groups in that case?
> 
> Thanks,
> 
> Talita
> 
> __
> [EMAIL PROTECTED] mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
> 





Re: [R] Complex sample variances

2004-04-13 Thread Thomas Lumley
On Mon, 12 Apr 2004, Fred Rohde wrote:

> Thanks.  I'll update the survey package.  Sudaan does the standard
> errors on quantiles using Taylor series.  If I can hunt down the formula
> it uses, could you add that to svyquantile?

If I can bring myself to believe it.  Computing standard errors for the
normal approximation to the median is not easy even in simple random
samples.

-thomas


> Fred
>
> Thomas Lumley <[EMAIL PROTECTED]> wrote:
> On Mon, 12 Apr 2004, Fred Rohde wrote:
>
> > Hello,
> > Is there a way to get complex sample variances in the survey package on
> > summary statistics other than means? If not, can they be added to a
> > future version? It would be be great to have them on totals, quantiles,
> > ratios, and tables (eg row percent, columns percent, etc).
> >
>
> svytotal() and svyratio() will do this for totals and ratios if you have a
> new enough version. At the moment the easiest way to get row or column
> percentages is to think of them as ratios of means of binary
> variables and use svyratio().
>
> Quantiles are more difficult, since neither Taylor series nor jackknife
> approaches work.
>
> -thomas
>
>

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle
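The svyratio() suggestion quoted above treats a row percentage as a ratio of weighted means of two binary variables. The base-R sketch below shows that idea with a Taylor-series (linearization) standard error. It assumes a single-stage, with-replacement design with no strata or clusters, it is not the survey package's code, and the helper name ratio_linearized() and the data are made up for illustration.

```r
## Sketch only: weighted ratio estimate with a linearization standard error,
## assuming a single-stage, with-replacement design (no strata, no clusters).
ratio_linearized <- function(y, x, w) {
  n  <- length(y)
  tx <- sum(w * x)
  r  <- sum(w * y) / tx
  u  <- w * (y - r * x) / tx             # linearized values; they sum to zero
  v  <- n / (n - 1) * sum((u - mean(u))^2)
  c(ratio = r, se = sqrt(v))
}

## Row percentage as a ratio of means of binary variables (made-up data):
## y = 1 if the unit is in the row AND the column, x = 1 if it is in the row.
in_row <- c(1, 1, 0, 1, 1, 0, 1, 1)
in_col <- c(1, 0, 1, 1, 0, 0, 1, 1)
wts    <- c(2, 1, 3, 2, 2, 1, 1, 3)
ratio_linearized(y = in_row * in_col, x = in_row, w = wts)
```

With all weights and all x equal to 1, the standard error reduces to sd(y)/sqrt(n), which is a quick sanity check on the formula.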



[R] Execute function at startup

2004-04-13 Thread Erich Neuwirth
It would be convenient to have something like
Rgui runfirst="myfunction()"
in Windows.
The reason:
AFAIK Rgui does not accept piped input
(Rgui < myfile.R does not seem to work).
A solution could be to put a few functions in Rprofile and then
give the name of one of these functions to be executed at startup as
a command line parameter to Rgui.
Can something like this be done?
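One possible workaround, sketched under assumptions: as far as we know, R for Windows accepts NAME=value arguments on the Rgui command line and sets them as environment variables (please check the rw-FAQ for your version). A .Rprofile could then read a function name from such a variable and call it. The variable name R_STARTUP_FUN and the helper run_startup_function() are hypothetical, not part of R.

```r
## Sketch for a user .Rprofile. R_STARTUP_FUN is a hypothetical environment
## variable chosen for this example; R itself does not define it.
run_startup_function <- function(var = "R_STARTUP_FUN", envir = globalenv()) {
  fname <- Sys.getenv(var)                   # "" when the variable is not set
  if (nzchar(fname) && exists(fname, envir = envir, mode = "function")) {
    f <- get(fname, envir = envir, mode = "function")
    f()                                      # call it with no arguments
    return(invisible(TRUE))
  }
  invisible(FALSE)
}

## In .Rprofile one would then define the candidate functions and add
## .First <- function() run_startup_function()
## and start R with something like:  Rgui R_STARTUP_FUN=myfunction
```

As we read ?Startup, .First() runs before the default packages are attached, so a function started this way may need its own library() calls.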

--
Erich Neuwirth, Computer Supported Didactics Working Group
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-38624 Fax: +43-1-4277-9386


[R] model-based clustering

2004-04-13 Thread Talita Leite
Hello again,

Let me explain this better. I've been working on clustering methods 
this year and now I'm starting (or trying to start) with model-based clustering. 
I've been searching for help to understand how the functions work and I 
found some. The problem is that I don't know the steps to follow. For 
example, working with the IRIS data set: what steps do I have to follow and 
what functions do I have to use to get a good clustering, i.e. to find the three 
groups in that case?

Thanks,

Talita



Re: [R] model-based clustering

2004-04-13 Thread Christian Hennig
Dear Talita,

no help is possible unless you tell us exactly what you want to do
and what exactly your difficulties are.

Best,
Christian

On Tue, 13 Apr 2004, Talita Leite wrote:

> Hello,
> 
> I'm trying to use the model-based clustering functions but I'm having some 
> difficulties. Could anybody help me make a good analysis of a data 
> set using these functions?
> 
> 

***
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/
###
I recommend www.boag-online.de



Re: [R] Need advice on using R with large datasets

2004-04-13 Thread Roger D. Peng
As far as I know, R does compile on AMD Opterons and runs as a 
64-bit application.  So it can store objects larger than 4GB. 
However, I don't think R gets tested very often on 64-bit 
machines with such large objects so there may be yet undiscovered 
bugs.

-roger
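A quick way to check whether a given R binary is a 64-bit build, written for current R versions (we have not checked whether this field exists in 1.8/1.9): .Machine$sizeof.pointer is 8 on a 64-bit build and 4 on a 32-bit one. The helper names below are hypothetical. Note also that, even on 64-bit builds, a single vector was limited to 2^31 - 1 elements until R 3.0.0 introduced long vectors.

```r
## .Machine$sizeof.pointer is 8 on a 64-bit build of R and 4 on a 32-bit build.
pointer_bits   <- function() 8L * .Machine$sizeof.pointer
is_64bit_build <- function() pointer_bits() == 64L

cat("This R build uses", pointer_bits(), "bit pointers\n")
```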

Sunny Ho wrote:

> Hello everyone,
>
> I would like to get some advice on using R with some really large datasets.
>
> I'm using R 1.8.1 on RH9 Linux for research with a lot of numerical data. The datasets
> total around 200 MB (shown by memory.size). During my data manipulation, the system
> memory usage grew to 1.5 GB, and this caused a lot of swapping on my 1 GB PC. This is
> just a small-scale experiment; the full-scale one will use data 30 times as large (on
> a 4 GB machine). I can see that I'll need to deal with the memory usage problem very soon.
>
> I notice that R keeps all datasets in memory at all times. I wonder whether there is any
> way to instruct R to push some of the less frequently used data tables out of main
> memory, so as to free up memory for those that are actively in use. It would be even
> better if R could keep in memory only the part of a table that is currently needed.
> Using save & load could help, but I wonder whether R is intelligent enough to do this
> by itself, so that I don't need to keep track of memory usage at all times.
>
> Another thought is to use a 64-bit machine (AMD64). I find there is a pre-compiled R for
> Fedora Linux on AMD64. Does anyone know whether this version of R runs as 64-bit? If so,
> will R be able to go beyond the 32-bit 4 GB memory limit?
>
> Also, from the manual, I find that the RPgSQL package (for PostgreSQL databases) supports
> a feature called "proxy data frame". Does anyone have experience with this? Can a "proxy
> data frame" handle memory efficiently for very large datasets? Say, if I have a 6 GB
> database table defined as a proxy data frame, will R & RPgSQL be able to handle it with
> just 4 GB of memory?
>
> Any comments will be useful. Many thanks.
>
> Sunny Ho
> (Hong Kong University of Science & Technology)


[R] model-based clustering

2004-04-13 Thread Talita Leite
Hello,

I'm trying to use the model-based clustering functions but I'm having some 
difficulties. Could anybody help me make a good analysis of a data 
set using these functions?



Re: [R] Complex sample variances

2004-04-13 Thread Fred Rohde
Looked through the publication "Statistical Methods and Mathematical Algorithms Used 
in Sudaan" (Shah et al., 1993), but the only reference to variances of quantiles is a 
1991 presentation by David Binder. Googled the title and got this link:

http://www.amstat.org/sections/srms/Proceedings/papers/1991_005.pdf

Point estimation (section 1 of this reference) is already implemented in R; variance 
estimation for quantiles is presented in the last part of section 3. Can you make 
sense of it? It's beyond me.
 
Fred

Fred Rohde <[EMAIL PROTECTED]> wrote:
Thanks. I'll update the survey package. Sudaan does the standard errors on quantiles 
using Taylor series. If I can hunt down the formula it uses, could you add that to 
svyquantile?

Fred

Thomas Lumley wrote:
On Mon, 12 Apr 2004, Fred Rohde wrote:

> Hello,
> Is there a way to get complex sample variances in the survey package on
> summary statistics other than means? If not, can they be added to a
> future version? It would be great to have them on totals, quantiles,
> ratios, and tables (e.g. row percents, column percents, etc.).
>

svytotal() and svyratio() will do this for totals and ratios if you have a
new enough version. At the moment the easiest way to get row or column
percentages is to think of them as ratios of means of binary
variables and use svyratio().

Quantiles are more difficult, since neither Taylor series nor jackknife
approaches work.

-thomas




[R] model-based clustering

2004-04-13 Thread Talita Leite
Hello,

I'm trying to use the model-based clustering functions R provides, but I'm 
having some difficulties. Could anybody help me with how to make a good 
analysis of a data set with these functions?



[R] Need advice on using R with large datasets

2004-04-13 Thread Sunny Ho
Hello everyone,

I would like to get some advice on using R with some really large datasets.

I'm using R 1.8.1 on RH9 Linux for research with a lot of numerical data. The datasets 
total around 200 MB (shown by memory.size). During my data manipulation, the system 
memory usage grew to 1.5 GB, and this caused a lot of swapping on my 1 GB PC. 
This is just a small-scale experiment; the full-scale one will use data 30 times 
as large (on a 4 GB machine). I can see that I'll need to deal with the memory usage 
problem very soon.

I notice that R keeps all datasets in memory at all times. I wonder whether there is 
any way to instruct R to push some of the less frequently used data tables out of main 
memory, so as to free up memory for those that are actively in use. It would be even 
better if R could keep in memory only the part of a table that is currently needed. 
Using save & load could help, but I wonder whether R is intelligent enough to do 
this by itself, so that I don't need to keep track of memory usage at all times.

Another thought is to use a 64-bit machine (AMD64). I find there is a pre-compiled R 
for Fedora Linux on AMD64. Does anyone know whether this version of R runs as 64-bit? If 
so, will R be able to go beyond the 32-bit 4 GB memory limit?

Also, from the manual, I find that the RPgSQL package (for PostgreSQL databases) 
supports a feature called "proxy data frame". Does anyone have experience with this? Can 
a "proxy data frame" handle memory efficiently for very large datasets? Say, if I have a 
6 GB database table defined as a proxy data frame, will R & RPgSQL be able to handle it 
with just 4 GB of memory?

Any comments will be useful. Many thanks.

Sunny Ho
(Hong Kong University of Science & Technology)
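One common way to keep only part of a table in memory is to read a file in chunks through an open connection and accumulate summaries as you go. The base-R sketch below (written for current R versions) does this for column sums; it assumes a whitespace-delimited file with one header line and only numeric columns, and the helper name chunked_column_sums() is hypothetical.

```r
## Sketch: column sums of a large whitespace-delimited numeric file, computed
## chunk by chunk so that only `chunk_size` rows are in memory at any time.
## Assumptions: one header line, all remaining columns numeric.
chunked_column_sums <- function(path, chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- scan(con, what = "", nlines = 1, quiet = TRUE)
  totals <- numeric(length(header))
  names(totals) <- header
  repeat {
    ## read.table() signals an error once the connection has no lines left;
    ## here any error is treated as end of input (a simplification).
    chunk <- tryCatch(read.table(con, nrows = chunk_size, col.names = header),
                      error = function(e) NULL)
    if (is.null(chunk) || nrow(chunk) == 0L) break
    totals <- totals + colSums(chunk)
    if (nrow(chunk) < chunk_size) break      # last, partial chunk
  }
  totals
}
```

For example, chunked_column_sums("big.txt", 50000L) (file name hypothetical) never holds more than 50000 rows at once; the same loop can update means, counts or cross-tabulations instead of sums.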



class of seq(length=n) values (was Re: [R] Zero Index Origin?)

2004-04-13 Thread Matthias Burger
Dear Brian Ripley,

following your advice (very much appreciated) to use seq(length=n) instead of 
1:n in loop constructions
(which otherwise usually need the not-so-elegant guard if (n > 0)),
I was somewhat surprised to find (in R 1.8.1 & R 1.9.0 beta (2004-03-29) on 
Debian 3.0,
and using the methods package) that

carrel>a <- seq(1,5)
carrel>class(a)
[1] "integer"
carrel>class(seq(along=a))
[1] "integer"
but
carrel>class(seq(length=length(a)))
[1] "numeric"
and from ?seq
...
Value:
 The result is of 'mode' '"integer"' if 'from' is (numerically
 equal to an) integer and 'by' is not specified.
...
it is not obvious (to me, of course) that if length is specified and from is
omitted, a numeric sequence is returned.

This might only matter in conjunction with S4 class slot assignments where the 
integer class requirement is important.
OTOH a
for (i in as.integer(seq(length=length(a)))) { ... }
is not so elegant either.

Could you please enlighten me as to why this behaviour was chosen and if there 
is any more elegant way than using

as.integer(seq(length=length(a)))
to get the desired result (given that I have to use a loop in the first place).
Regards,

  Matthias



Prof Brian Ripley wrote:
Much of R is itself written in R, so you cannot possibly change something 
as fundamental as this.  Further, index 0 has a special meaning that you 
would lose if R had 0-based indexing.

However, the R thinking is to work with whole objects (vectors, arrays, 
lists ...) and you rather rarely need to know what numbers are in an index 
vector.  There are usages such as 1:n, and those are quite often wrong: 
they should be seq(length=n) or seq(along=x) or some such, since n might 
be zero.  If you are writing code that works with single elements, you are 
probably a lot better off writing C code to link into R (and C is 
0-based ...).
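The point about 1:n when n might be zero can be made concrete. The sketch below counts loop iterations; it uses seq_len() and seq_along(), which are available in current R versions (we have not checked exactly when they were introduced) and which return integer vectors. The helper name loop_count() is hypothetical.

```r
## How many times does a for loop run over 1:n versus seq_len(n)?
loop_count <- function(n, safe = TRUE) {
  idx <- if (safe) seq_len(n) else 1:n
  count <- 0L
  for (i in idx) count <- count + 1L
  count
}

loop_count(0, safe = FALSE)    # 2: 1:0 is c(1, 0)
loop_count(0)                  # 0: seq_len(0) is integer(0)
class(seq_len(5))              # "integer"
class(seq_along(c("a", "b")))  # "integer"
```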

On Wed, 31 Mar 2004, Bob Cain wrote:


I'm very new to R and utterly blown away by not only the 
language but the unbelievable set of packages and the 
documentation and the documentation standards and...

I was an early APL user and never lost my love for it and in 
R I find most of the essential things I loved about APL 
except for one thing.  At this early stage of my learning I 
can't yet determine if there is a way to effect what in APL 
was zero index origin, the ordinality of indexes starts with 
0 instead of 1.  Is it possible to effect that in R without 
a lot of difficulty?

I come here today from the world of DSP research and 
development where Matlab has a near hegemony.  I see no 
reason whatsoever that R couldn't replace it with a _far_ 
better and _far_ less idiosyncratic framework.  I'd be 
interested in working on a Matlab equivalent DSP package for 
R (if that isn't being done by someone) and one of the 
things most criticized about Matlab from the standpoint of 
the DSP programmer is its insistence on 1 origin indexing. 
Any feedback greatly appreciated.

Thanks,

Bob



--
Matthias Burger
Bioinformatics R&D
Epigenomics AG  www.epigenomics.com
Kleine Präsidentenstraße 1  fax:   +49-30-24345-555
10178 Berlin Germanyphone: +49-30-24345-0


Re: [R] from .csv file to a pca plot

2004-04-13 Thread Petr Pikal
Hello

On 13 Apr 2004 at 13:25, Dansen, Ing. M.C. wrote:

> Hi,
> I'm just a beginner, who has just encountered a problem!
> 
> 1. - I wanted to load a csv file with names in the rows (1st column)
> and numbers in the 2nd to 10th columns. The file contains
> names in the headers.
> 
>    - I used:  a <- as.matrix(read.table("filename", sep=",",
> row.names=1, header=TRUE))

maybe read.csv("filename") will do the same without need for other 
specifications.

Why do you convert it to matrix? What is wrong with data frame?

> 
> Question: 1 - I would like to select the first four columns
a[, 1:4]

>   2 - and execute a pca(plot) from the mva package on
> those four columns

go through examples in mva

>   3 - how can I set the data type eg(string, integer,
> double) separate for each column

During loading you can use the colClasses argument to read.csv or
read.table.

Or you can change a column's type afterwards with an as.xxx() function
(as.integer(), as.numeric(), as.character(), ...).

> 
> Can anyone help me out, Help!
> 
> Thanks in advance,
> 
> Marinus

Cheers
Petr
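A minimal sketch of the whole path from a CSV file to a PCA, assuming the first column holds row names and the remaining columns are numeric. In R 1.9.0 and later, prcomp() lives in package stats (mva was merged into stats in that release). The helper name pca_first_four() is hypothetical.

```r
## Sketch: read a CSV whose first column holds row names, then run a PCA
## on the first four data columns.
pca_first_four <- function(path) {
  d <- read.csv(path, row.names = 1)   # header = TRUE and sep = "," are defaults
  x <- d[, 1:4]                        # columns 1 to 4 (d[1:4, ] would be rows)
  prcomp(x, scale. = TRUE)             # standardise the columns first
}
```

Usage would be along the lines of pc <- pca_first_four("filename.csv"), then summary(pc) for the variance explained and biplot(pc) for the plot.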



Petr Pikal
[EMAIL PROTECTED]



Re: [R] par() in .Rprofile

2004-04-13 Thread Detlef Steuer

Hi,

it's in the release notes from Peter:

>Users may notice that code in .Rprofile is run with only the
>new base loaded and so functions may now not be found. For
>example, ps.options(horizontal = TRUE) should be preceded by
>library(graphics) or called as graphics::ps.options or,
>better, set as a hook -- see ?setHook.

detlef


On Tue, 13 Apr 2004 13:39:20 +0200
"Petr Pikal" <[EMAIL PROTECTED]> wrote:

> Dear all
> 
> I installed the new version (from binaries) and I noticed that 
> 
> par(bg="white")
> 
> which I have in my .Rprofile causes an error message on startup.
> But if I issue this command immediately after startup, everything works as 
> expected. I did not see any note in the CHANGES file or elsewhere. 

-- 
Detlef Steuer --- http://fawn.unibw-hamburg.de/steuer.html
* Encrypted mail preferred *

"The ruling ideas are the ideas of the rulers."
--- K. Marx



Re: [R] Question

2004-04-13 Thread Petr Pikal
Hi


On 13 Apr 2004 at 14:51, Ivan Yegorov wrote:

> I use R for Windows. I got the error "Can not allocate 100 Mb for vector".
> Does R use only physical memory, or can it use virtual memory?
> If it can, what should I do? Thanks in advance.

Starting R with the

--max-mem-size=550M

option can help.

The figure depends on how much memory you have.

Newer versions make better use of memory resources.

Cheers
Petr

> 

Petr Pikal
[EMAIL PROTECTED]



[R] par() in .Rprofile

2004-04-13 Thread Petr Pikal
Dear all

I installed the new version (from binaries) and I noticed that 

par(bg="white")

which I have in my .Rprofile causes an error message on startup.
But if I issue this command immediately after startup, everything works as 
expected. I did not see any note in the CHANGES file or elsewhere. 

Should I specify the white background in .Rprofile differently?
Or is there some other recommended way to set up a white background on startup?
Everything was OK in version 1.8.1.

I am using Windows 2000.


Startup example

R : Copyright 2004, The R Foundation for Statistical Computing
Version 1.9.0  (2004-04-12), ISBN 3-900051-00-3



Attaching package 'fun':

The following object(s) are masked from package:base :

 interaction 

Error: couldn't find function "par"
[Previously saved workspace restored]

> par("bg")
[1] "transparent"
> par(bg="white")
> par("bg")
[1] "white"
>

my .Rprofile

library(fun)
par(bg="white")
RNGkind("Mersenne-Twister", "Inversion")
data(stand)

Thank you

Best regards.

Petr Pikal
[EMAIL PROTECTED]



[R] from .csv file to a pca plot

2004-04-13 Thread Dansen, Ing. M.C.
Hi,
I'm just a beginner, who has just encountered a problem!

1. - I wanted to load a csv file with names in the rows (1st column)
and numbers in the 2nd to 10th columns. The file contains
names in the headers.

   - I used:  a <- as.matrix(read.table("filename", sep=",",
row.names=1, header=TRUE))

Questions:
  1 - I would like to select the first four columns
  2 - and run a PCA (plot) from the mva package on
      those four columns
  3 - how can I set the data type (e.g. string, integer,
      double) separately for each column?

Can anyone help me out? Help!

Thanks in advance,

Marinus

 

 
 



[R] mts

2004-04-13 Thread Santosh Kumar
Hi!

I am new to R. I need your help.

I have time series for fifteen variables in a data file. I would like
to plot them in R on separate ps pages, not all on the same ps page.

I was reading about mts, but I could not figure out how to do it.

Can anyone help me out?

with regards;
Santosh
--
Santosh Kumar URL http://www.igidr.ac.in/~santosh/
PhD Student
Indira Gandhi Institute of Development Research
Gen A. K. Vaidya Marg
Goregaon ( East )
Mumbai pin 400065 India
Phone 28400919
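One way to get one PostScript page (file) per series is the onefile = FALSE setting of postscript(): each new page goes to its own file, numbered through a %d-style format in the file name. The sketch below assumes the data are already a multivariate time series (or a matrix/data frame that as.ts() can convert); the helper name plot_series_to_ps() and the file-name pattern are hypothetical.

```r
## Sketch: one PostScript file per column of a multivariate time series.
## With onefile = FALSE, postscript() starts a new file for every page,
## numbering the files through the %02d in the file name.
plot_series_to_ps <- function(x, dir = tempdir()) {
  x  <- as.ts(x)
  nm <- colnames(x)
  if (is.null(nm)) nm <- paste("Series", seq_len(ncol(x)))
  postscript(file.path(dir, "series%02d.ps"), onefile = FALSE,
             horizontal = FALSE)
  on.exit(dev.off())                   # close the device even on error
  for (i in seq_len(ncol(x))) plot(x[, i], main = nm[i], ylab = nm[i])
  invisible(file.path(dir, sprintf("series%02d.ps", seq_len(ncol(x)))))
}
```

If a single multi-page file is acceptable instead, keeping the default onefile = TRUE and plotting in the same loop puts each series on its own page of one file.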



[R] Question

2004-04-13 Thread "Ivan Yegorov"
I use R for Windows. I got the error "Can not allocate 100 Mb for vector".
Does R use only physical memory, or can it use virtual memory? If it can,
what should I do? Thanks in advance.



Re: [R] R 1.9.0 is released

2004-04-13 Thread Duncan Murdoch
On 12 Apr 2004 14:05:25 +0200, Peter Dalgaard
<[EMAIL PROTECTED]> wrote :

>I've rolled up R-1.9.0.tgz a short while ago. This is a new version
>with a number of new features, most notably a substantial
>reorganization of the standard packages, a major update of the grid
>package, and the fact that underscore can now be used as a regular
>character in variable names.

I've just uploaded the Windows build.  It should appear on CRAN and
the mirrors by tomorrow.

The main Windows-specific changes are the following:

 - A "stay on top" option for windows.
 - Rcmd can now be written R CMD, as on Unix.
 - Tony Plate's "Paste commands only" to paste the commands from a
copied block of output
 
There are many other changes and bug fixes.  Here's an extract from
the CHANGES file:

rw1090
======

Both Rterm and Rgui now give usage information via the --help or -h
command-line flag.

There is now a "Misc|Break to debugger" menu option, enabled 
when a debugger is detected (somewhat fallibly), or infallibly by the
"--debug" command line option.  This will cause a trap to an external
debugger, e.g. for running Rgui under gdb.  If the menu item is
selected when not running under a debugger R is likely to crash.
If the "--debug" option is used, R will break to the debugger during
command line processing, allowing the startup process to be debugged.

Added "stay" argument to bringToTop(), to allow the user to specify
that a window should stay on top of other windows.  Also added "stay
on top" item to the popup menus.  All of these require R to be running
in SDI mode ("Rgui --sdi" or via the settings in file `Rconsole').

Changed windows() so that new windows fit within the MDI client
area.

Added winMenuNames() and winMenuItems() functions to query user menus.

Added menu items for www.r-project.org and CRAN on the help menu. 
(Wishlist PR#6492)

Added "R" command to be similar to Unix invocation of scripts, e.g.
"R CMD INSTALL" is the same as "Rcmd INSTALL".  Rcmd still exists for 
backwards compatibility (and to avoid conflicts over the name `R').
All of R, R CMD and Rcmd now accept --help.

Rcmd Rd2dvi can now be specified as such rather than as Rcmd
Rd2dvi.sh.

Added "Paste commands only" to edit and popup menus in the Rgui
console.  This allows copying of a block of output, but pasting only
the commands back to the console for re-execution.  (Code contributed
by Tony Plate.)


Installation
------------

Parallel make (make -j2, say) can be used, but only usefully on
dual-processor (or perhaps hyperthreaded) hosts with at least 384Mb of
memory.

Installing now sorts in the C locale to ensure that a consistent sort
order is used.  (Some aspects of sorting used to be done in the locale
of the host machine, but Perl and the cygwin-based tools used the
ASCII collation order.)

The long-untested support for making Windows .hlp files has been
withdrawn.

There is support for using K. Goto's fast BLAS.  On a 2.6 GHz P4 with
1 GB RAM, and with A a 1000 x 1000 matrix, we had the following timings:

             R BLAS   ATLAS   Goto
A %*% A         3.7    0.65   0.56
svd(A)         16.2    7.77   6.83

Note that using a fast BLAS is much less effective for smaller
matrices as are more common in statistical applications.
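To compare BLAS builds on your own machine, timings like those in the table can be collected with system.time(). This is a sketch only; the helper name time_blas() is hypothetical, and the absolute figures depend on the CPU and on which BLAS this R is linked against.

```r
## Time a dense matrix product and an SVD on an n x n random matrix.
time_blas <- function(n = 1000L) {
  A <- matrix(rnorm(n * n), n, n)
  c(matmul = system.time(A %*% A)[["elapsed"]],
    svd    = system.time(svd(A))[["elapsed"]])
}

time_blas(500L)   # the CHANGES timings above used n = 1000
```

Running it with several values of n also illustrates the note above: the gain from a fast BLAS shrinks as the matrices get smaller.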

Faster assembler code for exponentiation is used.

Cross-building of R itself now works again.  (It had been broken since
1.8.0.)


Building/installing packages
----------------------------

R CMD INSTALL/build/check map path names with spaces in them to their
short forms.

R CMD INSTALL now supports versioned install via
--with-package-versions.

Installing (binary) package bundles now checks the MD5 sums and
reports success, just as for packages.

Added "* DONE" to the end of INSTALL logs so --install option to CHECK
will work. (This is a repository maintainer option; see 
src/scripts/check.in for docs).


Internal changes
----------------

The fast bmp/png/jpeg code introduced in R 1.8.0 is used even for
256-color displays (as we have now been able to test it on such).

R's internal malloc etc are now remapped to Rm_malloc etc and only
used in allocating memory for R objects, the Wilcoxon tests and a few
other memory-intensive applications.

Improved malloc routines from the current version of Doug Lea's malloc
(as suggested by David Teller) should enable large memory areas to be
used more effectively, in particular those over 2Gb where OS support
has been enabled.  The initially requested memory is no longer
reserved, but as this malloc is able to work with non-contiguous
memory chunks that should not matter.

The installer uses LZMA compression, so Inno Setup >= 4.1.5 is
required.

Version 1.2.5 of libpng is now used in binary builds.


Bug fixes
---------

Fixed list.files() to properly handle paths like "C:", etc.

Fixed unlink() to accept empty file list for Unix consistency.

Fixed handling of whitespace in Rd2dvi.sh processing of DESCRIPTION
files.

Fixed handling of "--max-mem-size" syntax error on command line.

In RGui, ^T would n