Re: [R] R via ssh login on OS X?

2004-06-29 Thread James Howison
On Jun 29, 2004, at 1:22 AM, Ulises Mora Alvarez wrote:
Hi!
If you are trying to log in from another Mac to the G5 there are some
details to bear in mind, though. If you are indeed trying from a Mac, I'd
suggest launching your local X server and then, from an xterm, 'ssh -X ...'
to the G5. Of course, if the sshd on the G5 is configured so that its
/etc/sshd_config says 'X11Forwarding no', you won't be able to use the X11
device for graphics; but you can search for a solution in the list archives.
That's good thinking.  That hadn't occurred to me and would be great
for the graphical stuff.  Goes to show that X Windows has the right idea
about networked graphics, while Aqua is hopeless in that regard.

I don't think that this problem happens in R-1.9.1, because if I ssh
into my laptop from a remote box as a non-logged-in user, R behaves
perfectly on the command line.  Or maybe the install on the G5 is
fubared.

Happily I have managed to solve my immediate problem on the G5 by
compiling a copy of R in my home directory.  This wasn't easy, primarily
because I didn't have f2c installed (and because I don't have root I
couldn't put it in the normal place).  I'm going to describe how I did it
in case it is handy for others (frankly I hope others don't have to go
through this ;)

I grabbed the f2c code from 
ftp://netlib.bell-labs.com:21/netlib/f2c/src.tar
and libf2c from http://www.netlib.org/f2c/libf2c.zip

Both built ok.  I moved the f2c executable, f2c.h and libf2c.a into
~/f2c.  Don't forget to run ranlib over libf2c.a.

I set the environment variables (remembering to do this before running
configure):

LDFLAGS=-L$HOME/f2c/
CPPFLAGS=-I$HOME/f2c/

(for some reason --includedir just didn't seem to work ...)
then did
./configure --prefix=$HOME/Rinstall/ --enable-R-framework=no 
--with-x=no --with-lapack=no

and then
make
This basically worked, but for some reason LAPACK was still trying to
build and that was failing, so I deleted it from the appropriate
makefile and the rest of the compile went fine.  The LAPACK confusion
stopped some of the recommended packages from building, but I didn't need
those (just sna, which built fine from CRAN).

I didn't do the actual install but am just using the full path to R.  
It is working fine in command-line mode now and the calculations are 
running as I type.

I didn't test this, but I did also read that people are able to get
around the need for graphical launching access by using OS X fast user
switching.

Thanks everyone!
--J
Good luck.
On Mon, 28 Jun 2004, Paul Roebuck wrote:
On Mon, 28 Jun 2004, James Howison wrote:
I have an ssh only login to a G5 on which I am hoping to run some
analyses.  The situation is complicated by the fact that the 
computer's
owner is away for the summer (and thus also only has shell login).

R is installed and there is a symlink to /usr/local/bin/R but when I
try to launch it I get:
[EMAIL PROTECTED] R
kCGErrorRangeCheck : Window Server communications from outside of
session allowed for root and console user only
INIT_Processeses(), could not establish the default connection to the
WindowServer.Abort trap
I thought, ah ha, I need to tell it not to use the GUI, but to no
avail:

[EMAIL PROTECTED] R --gui=none
kCGErrorRangeCheck : Window Server communications from outside of
session allowed for root and console user only
INIT_Processeses(), could not establish the default connection to the
WindowServer.Abort trap
I'm embarrassed to say that I'm writing to the list without having the
latest version installed---because I can't install it at the moment.  I
am using R 1.8.1.  I have tried to compile the latest from source but
there is no F77 compiler.  I thought I'd ask around before going down
the 'put local dependencies in the home folder to compile' route
(any hints on doing that would be great though) ...

Can other people get R command-line mode to work when logged in remotely
via ssh?  Any hints?
Is this something that is fixed in more recent versions?

I think I can see one other route: getting the computer's owner to
install fink and their version remotely ... but I'm open to all 'don't
bother the professor when he's on holiday' options ...
I suffered similarly attempting to run R via CGI; I never found
a workaround for remote access (also running 1.8.1 with Panther).
Seemed to have something to do with running an application requiring
access to graphics but not being the user currently owning the dock.
I did not determine if the limitation was due to R implementation or
operating system software.
F77 not necessary; use 'f2c' instead. But don't bother with Fink since
it's not necessary to build it. No 'sudo' access either? Is the user
still logged in (screenlocked) or are you just lacking administrative
access?
--
SIGSIG -- signature too long (core dumped)
__
[EMAIL PROTECTED] mailing list

Re: [R] R via ssh login on OS X?

2004-06-29 Thread Prof Brian Ripley
Did you look at the notes on MacOS X in the R-admin manual (as the INSTALL 
file asks)?  That would have told you why lapack failed, and I think you 
should redo your build following the advice there.

On Tue, 29 Jun 2004, James Howison wrote:

[...]

 then did
 
 ./configure --prefix=$HOME/Rinstall/ --enable-R-framework=no 
 --with-x=no --with-lapack=no

Note 

   --with-blas='-framework vecLib' --with-lapack 

is `strongly recommended', and on some versions of MacOS X `appears to be
the only way to build R'.

 and then
 
 make
 
 This basically worked but for some reason lapack was still trying to 
 build and that was failing, so I deleted it from the appropriate 
 makefile and the rest of the compile went fine.  The lapack confusion 
 stopped some of the recommended modules from building but I didn't need 
 those (just sna which built fine from CRAN).

[...]

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,     Tel:  +44 1865 272861 (self)
1 South Parks Road,             +44 1865 272866 (PA)
Oxford OX1 3TG, UK        Fax:  +44 1865 272595

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] camberra distance?

2004-06-29 Thread Wolski
Hi!

It's not an R-specific question, but I had no idea where else to ask.

Does anyone know the original reference for the CAMBERA  DISTANCE?

Eryk.

Ps.:
I knew that it's an off-topic question (sorry).
Can anyone recommend a mailing list where such questions are on topic?



RE: [R] camberra distance?

2004-06-29 Thread Wolski
Thanks Mark.

Yes, I mean Canberra.

Searching Google for canberra/camberra, I observed the following.
Searching for caMberra, you will find a paper from 1997 where they write camberra
instead of canberra dissimilarity for the measure defined as sum(|x_i - y_i| / |x_i +
y_i|).  Meanwhile there are plenty of articles on the net which reference this paper
from 1997 and write caMberra instead of canberra.
Maybe that is because it is much harder to find an article about the canberra distance
using Google (because of the city).
A quite persuasive argument for using distinctive names and for publishing papers in
journals which are free, online and can be searched by Google.
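For the record, the measure as quoted can be transcribed directly into code. A minimal illustrative sketch (in Python rather than R, since the point is language-independent; the function name is mine, not from the thread):

```python
# Dissimilarity as written in the post: sum(|x_i - y_i| / |x_i + y_i|).
# Note: check ?dist in R before relying on this exact form -- R's
# method = "canberra" is documented there and may use a different denominator.
def canberra(x, y):
    """Canberra-style dissimilarity between two equal-length sequences."""
    return sum(abs(a - b) / abs(a + b) for a, b in zip(x, y) if a + b != 0)

print(canberra([1, 2], [3, 4]))  # |1-3|/|1+3| + |2-4|/|2+4| = 0.5 + 1/3
```

Terms where x_i + y_i = 0 are skipped here, which is one common convention; implementations differ on that edge case.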


Eryk


*** REPLY SEPARATOR  ***

On 29.06.2004 at 15:51 [EMAIL PROTECTED] wrote:

maybe you mean 'Canberra'? If so, it might have come from work at CSIRO
in Canberra back in the 60s/70s. Look for Lance & Williams 1967,
possibly. Aust. Comput. J. 1, 15-20


Mark Palmer
Environmetrics Monitoring for Management   
CSIRO Mathematical and Information Sciences
Private bag 5, Wembley, Western Australia, 6913
Phone  61-8-9333-6293
Mobile  0427-50-2353
Fax:   61-8-9333-6121
Email: [EMAIL PROTECTED] 
URL:   www.cmis.csiro.au/envir



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Wolski
Sent: Tuesday, 29 June 2004 3:45 PM
To: R Help Mailing List
Subject: [R] camberra distance?


Hi!

It's not an R-specific question, but I had no idea where else to ask.

Does anyone know the original reference for the CAMBERA  DISTANCE?

Eryk.

Ps.:
I knew that it's an off-topic question (sorry).
Can anyone recommend a mailing list where such questions are on topic?



Re: [R] strucchange-esque inference for glms ?

2004-06-29 Thread Achim Zeileis
Alexis:

 according to the strucchange package .pdf, all procedures in this
 package are concerned with testing or assessing deviations from
 stability in the classical linear regression model.

 i'd like to test/assess deviations from stability in the Poisson
 model.
 
 is there a way to modify the strucchange package to suit my purposes,
 or should I be using another package, or is this a tough nut to
 crack? :)

As of version 1.2-0 strucchange supports tests for parameter
instability in much more general models including GLMs. A simple example
would be

R> library(strucchange)
R> data(BostonHomicide)
R> mcus <- gefp(homicides ~ population, family = poisson, fit = glm,
     data = BostonHomicide, vcov = kernHAC)
R> plot(mcus)
R> sctest(mcus)

See our technical report "Generalized M-fluctuation Tests for Parameter
Instability" (linked from my web page) for the theory behind it.

 my application is detecting the onset of a flu outbreak as new daily
 data trickles in from each morning from local hospitals.  seems to me
 like the same sort of inferential goal that strucchange refers to as
 monitoring of structural change.

In principle the theory established in the report above could also be
applied to monitoring, but I have neither worked out the theory nor
implemented a function which could handle monitoring in GLMs.  But you
can contact me off-list if you are interested in this.

Best,
Achim



[R] Several PCA questions...

2004-06-29 Thread Dan Bolser

Hi, I am doing PCA on several columns of data in a data.frame.

I am interested in particular rows of data which may have a particular
combination of 'types' of column values (without any pre-conception of
what they may be).

I do the following...

# My data table.
allDat <- read.table("big_select_thresh_5", header=1)

# Where some rows look like this...
# PDB SUNID1  SUNID2  AA  CH  IPCAPCA IBB BB
# 3sdh14984   14985   6   10  24  24  93  116
# 3hbi14986   14987   6   10  20  22  94  117
# 4sdh14988   14989   6   10  20  20  104 122

# NB First three columns = row ID, last 6 = variables

attach(allDat)

# My columns of interest (variables).
part <- data.frame(AA,CH,IPCA,PCA,IBB,BB)

pc <- princomp(part)

plot(pc)

The above plot shows that 95% of the variance is due to the first
'Component' (which I assume is AA).

I.e., all the variables behave in much the same way.

I then did ...


biplot(pc)

Which showed some outliers with a numeric ID - How do I get back my old 3
part ID used in allDat?

In the above plot I saw all the variables (correctly named) pointing in
more or less the same direction (as shown by the variance). I then did the
following...

postscript(file="test.ps", paper="a4")

biplot(pc)

dev.off()

However, looking at test.ps shows that the arrows are missing (using
ggv)... Hmmm, they come back when I pstoimg then xv... never mind.


Finally, I would like to make a contour plot of the above biplot. Is this
possible? (Or even a good way to present the data?)

Thanks very much for any feedback, 

Dan.



Re: [R] camberra distance?

2004-06-29 Thread Dan Bolser
On Tue, 29 Jun 2004, Wolski wrote:

Hi!

Its not an R specific question but had no idea where to ask elsewhere.

Does anyone know the orginal reference to the CAMBERA  DISTANCE?

Eryk.

Ps.:
I knew that its an out of topic question (sorry).
Can anyone reccomend a mailing list where such questions are in topic?

sci.stat.consult (applied statistics and consulting) and 
sci.stat.math (mathematical stat and probability)

Both news groups.





Re: [R] A strange question on probability

2004-06-29 Thread Spencer Graves
Does the following do what you want?

rseq <- function(n=1, length.=2){
 s1 <- sample(x=length., size=n, replace=TRUE)
 s2 <- sample(x=length., size=n, replace=TRUE)
 ranseq <- array(0, dim=c(n, length.))
 for(i in 1:n)
   ranseq[i, s1[i]:s2[i]] <- 1
 ranseq
}
> set.seed(1)
> rseq(9, 5)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    0    0    0
[2,]    0    1    0    0    0
[3,]    1    1    1    0    0
[4,]    0    0    0    1    1
[5,]    0    1    0    0    0
[6,]    0    0    0    1    1
[7,]    0    0    1    1    1
[8,]    0    0    0    1    0
[9,]    0    0    0    1    1

 hope this helps.  spencer graves
Jim Lemon wrote:
On Tuesday 29 June 2004 01:48 pm, Steve S wrote:
 

Dear All,
I wonder if there is a probability distribution where you can specify when
a certain event starts and finishes within a fixed period. For example, I might
specify the number of periods to be 5, and a random vector from this
distribution might give me:
0 1 1 1 0
where the 1s are always adjacent to each other.
This can never happen: 0 0 1 0 1, for example.

Well, I'll have a go. Let's call it the start-finish distribution. We have
n (the period) and d (the duration). As there must be an 'off' observation
(otherwise we don't know the duration), it's fairly easy to enumerate the
outcomes for a given period:

d     start (s)   finish (f)   count
1     1:(n-1)     2:n          n-1
2     1:(n-2)     3:n          n-2
...
n-1   1           n            1

Assuming that all outcomes are equally likely, the total number of outcomes
is:

n(n-1)/2

thus the probability of a given d occurring is:

P[d|n] = 2(n-d) / (n(n-1))

The probabilities of s and f over all d run in opposite directions over the
values k in 1:n:

P[s=k|n] = 2(n-k) / (n(n-1))
P[f=k|n] = 2(k-1) / (n(n-1))

giving, I think, a monotonic function for s and f.
My apology for this strange question!

My apology if this is no use at all.
Jim
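Jim's enumeration is easy to check by brute force. An illustrative sketch (in Python rather than R, purely to verify the counting; the `outcomes` helper is mine, not from the thread):

```python
# Brute-force check of the 'start-finish' enumeration: a contiguous run of
# d ones (1 <= d <= n-1) starting at s must leave at least one trailing
# 'off' observation, so s ranges over 1..n-d.
from fractions import Fraction

def outcomes(n):
    """All (start, duration) pairs for a period of length n."""
    return [(s, d) for d in range(1, n) for s in range(1, n - d + 1)]

n = 5
runs = outcomes(n)
total = len(runs)
print(total)  # n(n-1)/2 = 10 for n = 5

# P[d|n] = 2(n-d)/(n(n-1)), assuming all outcomes are equally likely
for d in range(1, n):
    count_d = sum(1 for s, dd in runs if dd == d)
    assert Fraction(count_d, total) == Fraction(2 * (n - d), n * (n - 1))
print("counts match P[d|n] = 2(n-d)/(n(n-1))")
```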


Re: [R] Several PCA questions...

2004-06-29 Thread Jonathan Baron
On 06/29/04 11:04, Dan Bolser wrote:

Hi, I am doing PCA on several columns of data in a data.frame.

I am interested in particular rows of data which may have a particular
combination of 'types' of column values (without any pre-conception of
what they may be).

I do the following...

# My data table.
allDat <- read.table("big_select_thresh_5", header=1)

# Where some rows look like this...
# PDB SUNID1  SUNID2  AA  CH  IPCAPCA IBB BB
# 3sdh14984   14985   6   10  24  24  93  116
# 3hbi14986   14987   6   10  20  22  94  117
# 4sdh14988   14989   6   10  20  20  104 122

# NB First three columns = row ID, last 6 = variables

attach(allDat)

# My columns of interest (variables).
part <- data.frame(AA,CH,IPCA,PCA,IBB,BB)

pc <- princomp(part)

plot(pc)

The above plot shows that 95% of the variance is due to the first
'Component' (which I assume is AA).

No.  It is the first principal component, which is some linear
combination of all the variables.  Try loadings(pc).  It sounds
like you need to read up on principal component analysis.

i.e. All the variables behave in quite much the same way.

I then did ...


biplot(pc)

Which showed some outliers with a numeric ID - How do I get back my old 3
part ID used in allDat?

The numeric ID is taken from the row names of pc.  So, if the IDs
in question are 3 and 5, then allDat[c(3,5),] should work.

In the above plot I saw all the variables (correctly named) pointing in
more or less the same direction (as shown by the variance). I then did the
following...

postscript(file="test.ps", paper="a4")

biplot(pc)

dev.off()

However, looking at test.ps shows that the arrows are missing (using
ggv)... Hmmm, they come back when I pstoimg then xv... never mind.

I get red arrows for the components in both the original graph
and the ps output (R 1.9.1, Fedora Core 2).  This may be a
platform-specific problem or one specific to ggv.  I have neither
ggv nor pstoimg.  (But xv and gv both work.)

Finally, I would like to make a contour plot of the above biplot, is this
possible? (or even a good way to present the data?

No idea how to do this or why you would want it.

Jon
-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page:      http://www.sas.upenn.edu/~baron
R search page:  http://finzi.psych.upenn.edu/



Re: [R] Several PCA questions...

2004-06-29 Thread Prof Brian Ripley
On Tue, 29 Jun 2004, Dan Bolser wrote:

 Hi, I am doing PCA on several columns of data in a data.frame.
 
 I am interested in particular rows of data which may have a particular
 combination of 'types' of column values (without any pre-conception of
 what they may be).
 
 I do the following...
 
 # My data table.
 allDat <- read.table("big_select_thresh_5", header=1)
 
 # Where some rows look like this...
 # PDB SUNID1  SUNID2  AA  CH  IPCAPCA IBB BB
 # 3sdh14984   14985   6   10  24  24  93  116
 # 3hbi14986   14987   6   10  20  22  94  117
 # 4sdh14988   14989   6   10  20  20  104 122
 
 # NB First three columns = row ID, last 6 = variables
 
 attach(allDat)
 
 # My columns of interest (variables).
 part <- data.frame(AA,CH,IPCA,PCA,IBB,BB)
 
 pc <- princomp(part)

Do you really want an unscaled PCA on that data set?  Looks unlikely (but 
then two of the columns are constant in the sample, which is also 
worrying).

 plot(pc)
 
 The above plot shows that 95% of the variance is due to the first
 'Component' (which I assume is AA).

No, it is the first (principal) component.  You did ask for PCA!

 i.e. All the variables behave in quite much the same way.

Or you failed to scale the data so one dominates.
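The scaling point is easy to demonstrate numerically. A sketch (in Python with numpy rather than R, purely illustrative; the data below are invented for the demonstration):

```python
# With unscaled data, one high-variance column dominates the first
# principal component; scaling each column to unit variance removes
# that artefact, which is what 'cor = TRUE' achieves in princomp.
import numpy as np

rng = np.random.default_rng(0)
small = rng.normal(size=(500, 3))         # three comparable variables
big = 100.0 * rng.normal(size=(500, 1))   # one variable on a huge scale
X = np.hstack([big, small])

def explained_first(M):
    """Share of total variance captured by the first principal component."""
    vals = np.linalg.eigvalsh(np.cov(M, rowvar=False))
    return vals[-1] / vals.sum()          # eigvalsh returns ascending order

print(explained_first(X))                  # close to 1: 'big' dominates
Z = (X - X.mean(axis=0)) / X.std(axis=0)   # columns standardised
print(explained_first(Z))                  # far more balanced
```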

 I then did ...
 
 
 biplot(pc)
 
 Which showed some outliers with a numeric ID - How do I get back my old 3
 part ID used in allDat?

Set row names on your data frame.  Like almost all of R, it is the row 
names of a data frame that are used for labelling, and you did not give 
any so you got numbers.

 In the above plot I saw all the variables (correctly named) pointing in
 more or less the same direction (as shown by the variance). I then did the
 following...
 
 postscript(file="test.ps", paper="a4")
 
 biplot(pc)
 
 dev.off()
 
 However, looking at test.ps shows that the arrows are missing (using
 ggv)... Hmmm, they come back when I pstoimg then xv... never mind.

So ggv is unreliable, perhaps cannot cope with colours?

 Finally, I would like to make a contour plot of the above biplot, is this
 possible? (or even a good way to present the data?

What do you propose to represent by the contours?  Biplots have a 
well-defined interpretation in terms of distances and angles.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,     Tel:  +44 1865 272861 (self)
1 South Parks Road,             +44 1865 272866 (PA)
Oxford OX1 3TG, UK        Fax:  +44 1865 272595



Re: [R] camberra distance?

2004-06-29 Thread Prof Brian Ripley
You may find it easier to search for `canberra distance', if that is
really what you intend (and your subject line and text differ in spelling
anyway).  See ?dist, and the Google results for `cambera distance', which both
show that this is a fairly common misspelling of the capital of Australia
and suggest the correct spelling.

On Tue, 29 Jun 2004, Dan Bolser wrote:

 On Tue, 29 Jun 2004, Wolski wrote:
 
 Hi!
 
 Its not an R specific question but had no idea where to ask elsewhere.
 
 Does anyone know the orginal reference to the CAMBERA  DISTANCE?
 
 Eryk.
 
 Ps.:
 I knew that its an out of topic question (sorry).
 Can anyone reccomend a mailing list where such questions are in topic?
 
 sci.stat.consult (applied statistics and consulting) and 
 sci.stat.math (mathematical stat and probability)
 
 Both news groups.

I do think readers of such groups (and this one) expect Google etc to be 
used first.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,     Tel:  +44 1865 272861 (self)
1 South Parks Road,             +44 1865 272866 (PA)
Oxford OX1 3TG, UK        Fax:  +44 1865 272595



Re: [R] R via ssh login on OS X?

2004-06-29 Thread roger koenker
My office G5 running R-devel has no problem with remote logins, either
mine or my students', so I doubt there is something fatally flawed in
either the OS or R.  X11Forwarding is off by default,
so this does need to be changed, I believe.  I might add, just for a
moment of schadenfreude, that Stata's Mac version does seem to
make it impossible to run remotely, even though their other unix
versions are happy to do so.
url:    www.econ.uiuc.edu/~roger    Roger Koenker
email:  [EMAIL PROTECTED]           Department of Economics
vox:    217-333-4558                University of Illinois
fax:    217-244-6678                Champaign, IL 61820
On Jun 29, 2004, at 2:04 AM, James Howison wrote:
On Jun 29, 2004, at 1:22 AM, Ulises Mora Alvarez wrote:
Hi!
If you are trying to log in from another Mac to the G5 there are some
details to bear in mind, though. If you are indeed trying from a Mac, 
I'd
suggest you to launch your local X server; then, from an xterm 'ssh 
-X...'
to the G5. Of course, if the sshd on the G5 is configured so that its
/etc/sshd_config says 'X11Forwarding no' you'll be not able to use 
the X11
device for graphics; but you can search for a solution on the list
files.
That's good thinking.  That hadn't occurred to me and would be great 
for the graphical stuff.  Goes to show that Xwindows has the right 
idea for networked graphics while aqua is hopeless in that regard.

I don't think that this problem happens in R-1.9.1 because if I ssh 
into my laptop from a remote box as a non-logged-in user R behaves 
perfectly on the commandline.  Or maybe the install on the G5 is 
fubared.

Happily I have managed to solve my immediate problem on the G5 by 
compiling a copy of R in my home directory.  This wasn't the easiest 
primarily because I didn't have f2c installed (and because I don't 
have root I couldn't put it in the normal place).  I'm going to say 
how I did it in case this is handy for others (frankly I hope others 
don't have to go through this ;)

I grabbed the f2c code from 
ftp://netlib.bell-labs.com:21/netlib/f2c/src.tar
and libf2c from http://www.netlib.org/f2c/libf2c.zip

Both built ok.  I moved the f2c executable, f2c.h and libf2c.a into 
~/f2c.  Don't forget to run ranlib over libf2c

I set the environment variables:
LDFLAGS=-L$HOME/f2c/
CPPFLAGS=-I$HOME/f2c/   (for some reason the --includedir just didn't 
seem to work ...)

(had to remember to do this before configure)
then did
./configure --prefix=$HOME/Rinstall/ --enable-R-framework=no 
--with-x=no --with-lapack=no

and then
make
This basically worked but for some reason lapack was still trying to 
build and that was failing, so I deleted it from the appropriate 
makefile and the rest of the compile went fine.  The lapack confusion 
stopped some of the recommended modules from building but I didn't 
need those (just sna which built fine from CRAN).

I didn't do the actual install but am just using the full path to R.  
It is working fine in command-line mode now and the calculations are 
running as I type.

I didn't test this, but I did also read that people are able to get
around the need for graphical launching access by using OS X fast user
switching.

Thanks everyone!
--J
Good luck.
On Mon, 28 Jun 2004, Paul Roebuck wrote:
On Mon, 28 Jun 2004, James Howison wrote:
I have an ssh only login to a G5 on which I am hoping to run some
analyses.  The situation is complicated by the fact that the 
computer's
owner is away for the summer (and thus also only has shell login).

R is installed and there is a symlink to /usr/local/bin/R but when I
try to launch it I get:
[EMAIL PROTECTED] R
kCGErrorRangeCheck : Window Server communications from outside of
session allowed for root and console user only
INIT_Processeses(), could not establish the default connection to 
the
WindowServer.Abort trap

I thought, ah ha, I need to tell it not to use the GUI, but to no
avail:

[EMAIL PROTECTED] R --gui=none
kCGErrorRangeCheck : Window Server communications from outside of
session allowed for root and console user only
INIT_Processeses(), could not establish the default connection to 
the
WindowServer.Abort trap

I'm embarrassed to say that I'm writing to the list without having 
the
latest version installed---because I can't install it at the 
moment.  I
am using R 1.8.1.  I have tried to compile the latest from source 
but
there is no F77 compiler. I thought I'd ask around before going down
the put local dependencies in the home folder to compile this 
route
(any hints on doing that would be great though) ...

Can other people get R command-line to work with logged in remotely 
via
ssh?  Any hints?
Is this something that is fixed in more recent versions?

I think I can see one other route:  getting the computer's owner to
install fink and their version remotely ... but I'm open to all 
don't
bother the professor when he's on holiday options ...
I suffered similarly attempting to run R via 

Re: [R] Several PCA questions...

2004-06-29 Thread Dan Bolser

Thanks Jonathan and Brian for the advice; on all but the last point I will
do more background reading. To come back to the last point...


 Finally, I would like to make a contour plot of the above biplot, is this
 possible? (or even a good way to present the data?


Brian said:

What do you propose to represent by the contours?  Biplots have a 
well-defined interpretation in terms of distances and angles.

Jonathan said:

No idea how to do this or why you would want it.



Basically I would like to make a 2D smoothed density, represented as a
contour plot. I would like to do this as a crude visual clustering of my
data points.

I.e., instead of representing data by the row labels in the biplot, I would
like to see just a single dot for each data point. Then I would like to
see only the density of these points in 2D (hence contours).

For example...

x <- rnorm(1000)
y <- rnorm(1000)

plot(x,y)

library(MASS)

z <- kde2d(x,y)

contour(z)

I imagine the above in the context of my biplot, and I would like to see
peaks which represent clusters of datapoints in 2d. However, I don't know
how to get x,y coords from the pc object or the biplot function.
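For what it's worth, the point coordinates a biplot draws are essentially the first two principal-component scores (pc$scores[, 1:2] for a princomp fit, up to the scaling biplot applies), so those could be fed to kde2d/contour directly. The same projection, sketched illustratively in Python with numpy (the data here are invented stand-ins):

```python
# PCA scores by hand: centre the data and project onto the leading
# eigenvectors of the covariance matrix; scores[:, :2] are the 2-D
# coordinates one would smooth with a kernel density estimate.
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 6))          # stand-in for the 6 columns of 'part'
Xc = X - X.mean(axis=0)                # princomp centres the data
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]      # eigh returns ascending order
scores = Xc @ eigvecs[:, order]        # analogue of pc$scores

xy = scores[:, :2]                     # coordinates for the density/contours
print(xy.shape)                        # (300, 2)
```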


Thanks again for the other tips, I need to read more! I will just
throw one final question out there (perhaps to further highlight my
ignorance).

I thought that a significant factor in my data was the relative magnitude
of the different variables, so I thought about making a new variable for
each distinct pair of existing variables, and setting that new (pairwise)
variable to 1 or 0 depending on the relative magnitude of the two
component variables. Then I do PCA (or some other clustering (or a simple
grouping)) on this new set of variables, and hey presto, I have the
classes of my data points. Just an idea. Any good?

Cheers,
Dan.









[R] discrete hazard rate analysis

2004-06-29 Thread Joao Pedro W. de Azevedo
Dear R users,

I have more of a statistical/econometric question than straight R one.

I have a data set with the discrete hazard rates of small-firm survival in
400 counties over a period of 9 years. This data was generated using census
information from the VAT registration number of each one of these businesses.

I would like to analyze the effect of regional factors (deprivation index,
real wages, average schooling, population density, etc) on the variation of
these hazard rates across counties over time.

I've done a search in the economic literature on firm survival and regional
economics, but I could not find any reference that resembles the type
of data or the problem that I would like to explore. I would like to know if
anyone on the list has any suggestion of references that I might have missed
in economics, or if people in any other fields know of references to work
on data that might resemble this one (I don't know, but maybe epidemiology
might have regional-level data that looks at similar issues).

Of course I would also like to know which R commands could assist me on this
analysis.

Any suggestions will be much appreciated.

All the very best,

JP


County Region time 1993 1994 1995 1996 1997 1998 1999 2000 2001
a South 6 months 95.0 95.1 95.5 95.7 96.8 96.8 96.9 95.9 98.0
a South 12 months 87.1 87.1 89.7 89.6 92.4 91.7 92.6 90.2 93.9
a South 18 months 79.5 79.8 83.3 83.0 86.4 86.7 85.8 84.7 
a South 24 months 73.3 73.1 77.8 78.0 80.6 81.0 79.6 79.2 
a South 30 months 68.0 67.3 72.8 72.3 75.8 75.1 74.1 
a South 36 months 63.7 62.9 69.0 67.1 70.8 68.7 68.5 
a South 42 months 59.3 59.1 65.4 62.6 65.8 64.2 
a South 48 months 56.1 56.2 61.6 59.1 61.2 59.6 

b South 6 months 94.2 96.0 96.3 96.7 96.5 97.0 97.0 96.1 97.1
b South 12 months 87.2 89.1 90.6 90.5 91.4 91.8 92.1 91.3 92.8
b South 18 months 79.9 82.0 84.2 84.5 85.8 85.8 86.3 86.2 
b South 24 months 73.9 75.9 78.1 79.0 80.5 79.8 80.8 80.4 
b South 30 months 68.2 70.0 73.2 74.2 75.6 74.3 75.8 
b South 36 months 64.0 65.4 69.0 70.3 70.4 69.6 71.0 
b South 42 months 60.2 60.8 65.4 66.0 66.1 64.9 
b South 48 months 56.6 57.5 62.0 61.7 61.7 61.1 

c South 6 months 93.2 94.0 94.6 95.6 95.7 95.8 95.9 96.6 97.2
c South 12 months 84.5 85.8 87.8 89.1 89.6 89.8 90.8 91.6 92.7
c South 18 months 77.2 78.9 80.7 83.3 84.1 83.8 84.1 86.7 
c South 24 months 69.8 72.8 75.1 77.2 78.1 78.0 78.7 80.7 
c South 30 months 63.8 66.3 69.3 71.9 72.9 73.0 73.3 
c South 36 months 59.4 61.7 65.3 67.8 68.5 68.2 68.3 
c South 42 months 55.8 57.3 60.9 63.7 64.0 63.6 
c South 48 months 52.4 53.7 57.6 60.3 59.9 59.0



Re: [R] R client connection OLAP cube (SQL Analysis Services / PivotTable Service)

2004-06-29 Thread David James
Dear Olivier,

I believe your best bet may be to connect to the database through
some kind of R-COM connection (either Thomas Baier and
Erich Neuwirth's R-(D)COM in CRAN or Duncan Temple Lang's at
http://www.omegahat.org/RDCOMClient).  For instance, the 
ADO MD (ActiveX Data Objects Multi-dimensional) COM object/library
allows you to connect to the OLAP database and query multiple cubes,
their axes and hierarchies, etc.  See the Microsoft Developer Network
(MSDN) for the gory details.

Hope this helps,

--
David

Olivier Collignon wrote:
 I have been doing data analysis/modeling in R, connecting to SQL databases 
 with RODBC (winXP client with R1.9.0 and win2k SQL server 2000).
 
 I am now trying to leverage some of the OLAP features to keep the data 
 intensive tasks on the DB server side and only keep the analytical tasks 
 within R (optimize use of memory). Is there any package that would allow to 
 connect to OLAP cubes (as a client to SQL Analysis Services PivotTable 
 Service) from an R session (similar to the RODBC package)?
 
 Can this be done directly with special connection parameters (from R) 
 through the RODBC package or do I have to setup an intermediary XL table 
 (pivottable linked to the OLAP cube) and then connect to the XL data source 
 from R?
 
 I would appreciate any reference / pointer to OLAP/R configuration 
 instructions. Thanks.
 
 Olivier Collignon
 



[R] gambling problem

2004-06-29 Thread allan clark
Hi all

i have an interesting project that i have been working on. i intended to
set this as a first year programming problem but then changed my mind
since i thought that it might be too difficult for them to program.

 the problem is as follows:

 You have been approached by a local casino in order to
 investigate the performance of one of their slot machines.

 The slot machine consists of three independently operating
 reels on which one of six different symbols can appear. The
 symbols are hearts (H), diamonds (D), spades (S), clubs (C) a
 joker (J) and a castle (Ca). People bet 1 unit at a time in
 order to play the game and are paid out according to the
 arrangement of the three reel symbols.

 Each reel consists of a number of different tiles. For example, the
 first reel contains 40 tiles. The second has 40 tiles and the third
 has 50 tiles. Each time the game is played each of the reels spin
 such that 1 of the 40 tiles (for reel 1 and similarly for the other
 reels) will appear. The number of tiles that occur on each of the
 reels are shown below: (i havn't included these but they are in the
 code below: ie)

   * I've written three functions that will solve the above problem. the
 code is attached below. the code is very fast but i would like to
 improve the speed by not utilizing loops. is that possible?
   * another question? in the function called GAMBLING, i use the
 following :
 a <- COUNTER(reelpic,nreels,countcombs,payoffcombs,payoff,bet)
  countcombs <- a$countcombs
  payoff <- a$payoff
 payoffvec[i] <- payoff
 in order to count the number of times each of the payoff symbol
 combinations occurs (i.e. H, HH, HHH, D, DD, DDD,
 ... J, JJ, JJJ, Ca, CaCa, CaCaCa). countcombs is a vector that
 contains the counts of the various payoff symbol combinations. The
 function COUNTER calculates these values (basically just adding
 one whenever a particular combination occurs) and returns them as
 part of a list. Is there a way of declaring PUBLIC
 VARIABLES (as allowed by VBA) that allows one to read the value of
 variables calculated in different functions without
 using the method that I used, i.e. reading
 countcombs <- a$countcombs  after calling
 a <- COUNTER(reelpic,nreels,countcombs,payoffcombs,payoff,bet)

   * Another question: in the function RUNDIST i used the following 2
 lines
  zrung <- GAMBLING(nsim=150)
  z[p] <- cumsum(zrung$payoffvec)[150]
 Initially the second of these lines was set as
  z[p] <- cumsum(zrung$payoffvec)[nsim]
 which caused an error. why does this happen?


Sorry for the extremely long email but any help would be much
appreciated.

regards
Allan

the following functions are attached:

   * GAMBLING- this function simulates the basic game as stated above
   * COUNTER - this function calculates the number of times each of the
 various payoff combinations occur
   * FORMATCOMBSPDF - this function creates a table of the simulated pdf
 of the payoff combinations
   * RUNDIST - allows one to generate a distribution of a gamblers total
 payoff after playing the game 150 times.



#GAMBLING- this function simulates the basic game as stated above

GAMBLING <- function(nsim)
{
# denote hearts=1, diamonds=2, spades=3, clubs=4, joker=5, castle=6
time1 <- Sys.time()

# the upper level of the pdf
uplimit1 <- c(0.14,0.24,0.30,0.40,0.50,1)
uplimit2 <- c(0.14,0.30,0.44,0.50,0.56,1)
uplimit3 <- c(0.16,0.30,0.36,0.50,0.56,1)
payoffcombs <- matrix(c(2,6,34,2,8,48,2,13,211,2,21,127,2,19,296,0,0,0),nrow=18,ncol=1)

bet <- 1

t <- matrix(data=0,nrow=length(uplimit1),ncol=3)
reelpic <- matrix(data=0,nrow=1,ncol=3)
countcombs <- matrix(data=0,nrow=length(payoffcombs),ncol=1)
payoffvec <- matrix(data=0,nrow=nsim,ncol=1)

nreels <- 3
payoff <- 0

uplimit <- matrix(c(uplimit1,uplimit2,uplimit3),ncol=length(uplimit1),nrow=nreels,byrow=TRUE)

# the loop over the simulation counter
for (i in 1:nsim)
{
 # the loop over the reels
 for (j in 1:nreels)
 {
  unif <- runif(n=1, min=0, max=1)
  #print(unif)
  # the loop over the prob categories
  for (k in 1:length(uplimit1))
  {
   if (unif <= uplimit[j,k])
   {
# counts up the number of times we get
# each of the symbols
t[k,j] <- t[k,j]+1

# the reel picture generated
reelpic[1,j] <- k
break
   }# endif
  }# next k
 }# next j
 a <- COUNTER(reelpic,nreels,countcombs,payoffcombs,payoff,bet)
 countcombs <- a$countcombs
 payoff <- a$payoff
 payoffvec[i] <- payoff
}# next i

totals <- apply(t,2,sum)
pdf <- sweep(t,2,totals,"/")
combspdf <- sweep(countcombs,1,nsim,"/")
b <- FORMATCOMBSPDF(combspdf)

time2 <- Sys.time()
timer <- time2-time1

aa <- paste("THE OUTPUT LIST: $COMBSPDF",
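On the question of avoiding the loops: the per-spin runif/if cascade above is an inverse-CDF lookup, which can be done for all spins of a reel at once. A minimal sketch, using the reel-1 cumulative limits from the post (the symbol probabilities derived from them are the only assumption):

```r
set.seed(1)
nsim <- 1000

# cumulative upper limits for reel 1, as in the post
uplimit1 <- c(0.14, 0.24, 0.30, 0.40, 0.50, 1)

# symbol probabilities implied by the cumulative limits
p1 <- diff(c(0, uplimit1))

# draw every spin of reel 1 in one call instead of looping
spins <- sample(1:6, nsim, replace = TRUE, prob = p1)

# equivalent inverse-CDF lookup on uniforms, matching the if cascade
spins2 <- findInterval(runif(nsim), c(0, uplimit1), rightmost.closed = TRUE)

# the per-symbol counts that the t matrix accumulates
counts <- tabulate(spins, nbins = 6)
```

The same one-call-per-reel idea removes both inner loops; only the COUNTER bookkeeping would still need per-spin work.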

[R] binding rows from different matrices

2004-06-29 Thread Stephane DRAY
Hello list,
I have 3 matrices with same dimension :
 veca=matrix(1:25,5,5)
 vecb=matrix(letters[1:25],5,5)
 vecc=matrix(LETTERS[1:25],5,5)
I would like to obtain a new matrix composed by alternating rows of these 
different matrices (row 1 of mat 1, row 1 of mat 2, row 1 of mat 3, row 2 
of mat 1.)

I have found a solution to do it but it is not very pretty and I wonder if 
I can do it in an other way (perhaps with apply ) ?

 res=matrix(0,1,5)
 for(i in 1:5)
+ res=rbind(res,veca[i,],vecb[i,],vecc[i,])
 res=res[-1,]
 res
  [,1] [,2] [,3] [,4] [,5]
 [1,] 1  6  11 16 21
 [2,] a  f  k  p  u
 [3,] A  F  K  P  U
 [4,] 2  7  12 17 22
 [5,] b  g  l  q  v
 [6,] B  G  L  Q  V
 [7,] 3  8  13 18 23
 [8,] c  h  m  r  w
 [9,] C  H  M  R  W
[10,] 4  9  14 19 24
[11,] d  i  n  s  x
[12,] D  I  N  S  X
[13,] 5  10 15 20 25
[14,] e  j  o  t  y
[15,] E  J  O  T  Y

Thanks in advance !
Stéphane DRAY
-- 

Département des Sciences Biologiques
Université de Montréal, C.P. 6128, succursale centre-ville
Montréal, Québec H3C 3J7, Canada
Tel : 514 343 6111 poste 1233
E-mail : [EMAIL PROTECTED]
-- 

Web  http://www.steph280.freesurf.fr/


[R] give PAM my own medoids

2004-06-29 Thread Isabel Brito
Hello,
When using PAM (partitioning around medoids), I would like to skip the 
build-step and give the fonction my own medoids.

Do you know if it is possible, and how ?
Thank you very much.
Isabel


[R] [ cor(x, y, use = "all.obs", method = c("spearman")) ]

2004-06-29 Thread Sebastien Moretti
Hello
I would like to know how cor() handles ranks when some values are tied (ex aequo).
Does it use the Spearman correlation coefficient with the tie correction in the formula?
Thanks

-- 
Sebastien MORETTI
Linux User - #327894
CNRS - IGS
31 chemin Joseph Aiguier
13402 Marseille cedex 20, FRANCE
tel. +33 (0)4 91 16 44 55



Re: [R] give PAM my own medoids

2004-06-29 Thread Martin Maechler
Bonjour Isabel,

 Isabel == Isabel Brito [EMAIL PROTECTED]
 on Tue, 29 Jun 2004 17:06:12 +0200 writes:

Isabel Hello,

Isabel When using PAM (partitioning around medoids), I
Isabel would like to skip the build-step and give the
Isabel fonction my own medoids.

Isabel Do you know if it is possible, and how ?

unfortunately, it's not yet possible, but --- believe it or not ---
this has been on my TODO list for 'cluster' (the package) for a
while now -- and your wish definitely raises the priority!
I want to do some checking for user input errors there, but that
is definitely not too much work to do...

- do nag me about it at least once a month till it's done.. ;-)

Isabel Thank you very much.

You're welcome,
Martin Maechler, Seminar fuer Statistik ETH Zurich



[R] Different behaviour of unique(), R vs. Splus.

2004-06-29 Thread Rolf Turner
Apologies for the cross-posting, but I thought this snippet of info
might be vaguely interesting to both lists.

I did a ***brief*** search to see if this issue had previously been
discussed and found nothing.  So I thought I'd tell the list about a
difference in behaviour between unique() in R and unique() in Splus
which bit me just now.

I was trying to convert a package from Splus to R and got nonsense
answers in R.  Turned out that within the bowels of the package I was
doing something like

u <- unique(y)

where y was a matrix of integer values.  In Splus this gives a
(short) vector of unique values.  In R it gives a matrix of the same
dimensionality as y, except that any duplicated rows are eliminated.

(This looks like being very useful --- once you know about it.  And
it was probably mentioned in the R release notes at one time, but, as
Dr. Hook says, ``I was stoned and I missed it.'')

E.g.
set.seed(42)
m <- matrix(sample(1:5,20,TRUE),5,4)
u <- unique(m)

In R ``u'' is identical to ``m''; in Splus ``u'' is a vector (of
length 5).

To get what I want in R I simply need to do

u <- unique(as.vector(y))

Simple, once you know.  Took me a devil of a long time to track down
what was going wrong, but!

cheers,

Rolf Turner



Re: [R] Boosting

2004-06-29 Thread Martin Maechler

 ORIORDAN == ORIORDAN EDMOND [EMAIL PROTECTED]
 on Mon, 28 Jun 2004 10:23:24 -0400 writes:

ORIORDAN Hi Does anybody have a package/code for Real
ORIORDAN Adaboost that works in R?  

Did you try the 'gbm' package from CRAN?

ORIORDAN very large binary data set Any help greatly
ORIORDAN appreciated cheers ed



[R] Quantile Regression in R

2004-06-29 Thread Ali Hirsa
I recently learned about quantile regression in R.
I am trying to study two time series (attached) using quantile regression in R.
I wrote the following code and do not know how to interpret the fitted lines.
   
What kind of information can I get from them? Correlation for quantiles, 
conditional probabilties (i.e. P(X in Quantile i | Y in Quantile i)) , and etc.
Many thanks in advance for any help.

Best,
Ali

library(quantreg)
#help.start()

Data <- read.table("RESvsMOVE2.dat")
#
x <- Data[,2]
y <- Data[,1]

par(mfrow=c(2,2))

qqnorm(x, main="MOVE Norm Q-Q Plot", xlab="Normal Quantiles", ylab="MOVE Quantiles")
qqline(x)

qqnorm(y, main="Residuals Norm Q-Q Plot", xlab="Normal Quantiles", ylab="Residuals Quantiles")
qqline(y)

plot(x, y, xlab="MOVE", ylab="Residuals", cex=.5)

xx <- seq(min(x), max(x), .5)

# Just a linear regression
g <- coef(lm(y~x))
yy <- (g[1]+g[2]*(xx))
lines(xx, yy, col="yellow")

taus <- c(.05,.1,.25,.5,.75,.9,.95)

for(tau in taus){
f <- coef(rq(y~x, tau=tau, method="pfn"))
yy <- (f[1]+f[2]*(xx))
if (tau == .05){
 lines(xx, yy, col="red")
}
if (tau == .95){
 lines(xx, yy, col="green")
}
if (tau != .05 && tau != .95){
 lines(xx, yy, col="blue")
}
}




__



Re: [R] binding rows from different matrices

2004-06-29 Thread james . holtman




Try:

  veca=matrix(1:25,5,5)
  vecb=matrix(letters[1:25],5,5)
  vecc=matrix(LETTERS[1:25],5,5)
 x.1 <- lapply(1:5, function(x) rbind(veca[x,], vecb[x,], vecc[x,]))
 do.call('rbind', x.1)
  [,1] [,2] [,3] [,4] [,5]
 [1,] 1  6  11 16 21
 [2,] a  f  k  p  u
 [3,] A  F  K  P  U
 [4,] 2  7  12 17 22
 [5,] b  g  l  q  v
 [6,] B  G  L  Q  V
 [7,] 3  8  13 18 23
 [8,] c  h  m  r  w
 [9,] C  H  M  R  W
[10,] 4  9  14 19 24
[11,] d  i  n  s  x
[12,] D  I  N  S  X
[13,] 5  10 15 20 25
[14,] e  j  o  t  y
[15,] E  J  O  T  Y

__
James HoltmanWhat is the problem you are trying to solve?
Executive Technical Consultant  --  Office of Technology, Convergys
[EMAIL PROTECTED]
+1 (513) 723-2929


   

  Stephane DRAY [EMAIL PROTECTED]
  Sent by: [EMAIL PROTECTED]
  06/29/2004 11:00
  To: [EMAIL PROTECTED]
  cc:
  Subject: [R] binding rows from different matrices





Hello list,
I have 3 matrices with same dimension :
  veca=matrix(1:25,5,5)
  vecb=matrix(letters[1:25],5,5)
  vecc=matrix(LETTERS[1:25],5,5)

I would like to obtain a new matrix composed by alternating rows of these
different matrices (row 1 of mat 1, row 1 of mat 2, row 1 of mat 3, row 2
of mat 1.)

I have found a solution to do it but it is not very pretty and I wonder if
I can do it in an other way (perhaps with apply ) ?

  res=matrix(0,1,5)
  for(i in 1:5)
+ res=rbind(res,veca[i,],vecb[i,],vecc[i,])
  res=res[-1,]
  res
   [,1] [,2] [,3] [,4] [,5]
  [1,] 1  6  11 16 21
  [2,] a  f  k  p  u
  [3,] A  F  K  P  U
  [4,] 2  7  12 17 22
  [5,] b  g  l  q  v
  [6,] B  G  L  Q  V
  [7,] 3  8  13 18 23
  [8,] c  h  m  r  w
  [9,] C  H  M  R  W
[10,] 4  9  14 19 24
[11,] d  i  n  s  x
[12,] D  I  N  S  X
[13,] 5  10 15 20 25
[14,] e  j  o  t  y
[15,] E  J  O  T  Y
 

Thanks in advance !

Stéphane DRAY
--


Département des Sciences Biologiques
Université de Montréal, C.P. 6128, succursale centre-ville
Montréal, Québec H3C 3J7, Canada

Tel : 514 343 6111 poste 1233
E-mail : [EMAIL PROTECTED]
--


Web
http://www.steph280.freesurf.fr/



Re: [R] binding rows from different matrices

2004-06-29 Thread Prof Brian Ripley
You can almost always index in such problems: here is one way.

rbind(veca,vecb,vecc)[matrix(1:15, 3, byrow=T), ]

Take it apart to see how it works, if it is not immediately obvious.
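For what it's worth, the trick works because a matrix used as a row subscript is flattened column-wise: matrix(1:15, 3, byrow=T) yields the row order 1, 6, 11, 2, 7, 12, ..., which interleaves the three stacked blocks. A small check (note that rbind coerces everything to character here):

```r
veca <- matrix(1:25, 5, 5)
vecb <- matrix(letters[1:25], 5, 5)
vecc <- matrix(LETTERS[1:25], 5, 5)

idx <- matrix(1:15, 3, byrow = TRUE)
# column-wise flattening gives the interleaved row order
ord <- as.vector(idx)

res <- rbind(veca, vecb, vecc)[idx, ]
```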

On Tue, 29 Jun 2004, Stephane DRAY wrote:

 Hello list,
 I have 3 matrices with same dimension :
   veca=matrix(1:25,5,5)
   vecb=matrix(letters[1:25],5,5)
   vecc=matrix(LETTERS[1:25],5,5)
 
 I would like to obtain a new matrix composed by alternating rows of these 
 different matrices (row 1 of mat 1, row 1 of mat 2, row 1 of mat 3, row 2 
 of mat 1.)
 
 I have found a solution to do it but it is not very pretty and I wonder if 
 I can do it in an other way (perhaps with apply ) ?
 
   res=matrix(0,1,5)
   for(i in 1:5)
 + res=rbind(res,veca[i,],vecb[i,],vecc[i,])
   res=res[-1,]
   res
[,1] [,2] [,3] [,4] [,5]
   [1,] 1  6  11 16 21
   [2,] a  f  k  p  u
   [3,] A  F  K  P  U
   [4,] 2  7  12 17 22
   [5,] b  g  l  q  v
   [6,] B  G  L  Q  V
   [7,] 3  8  13 18 23
   [8,] c  h  m  r  w
   [9,] C  H  M  R  W
 [10,] 4  9  14 19 24
 [11,] d  i  n  s  x
 [12,] D  I  N  S  X
 [13,] 5  10 15 20 25
 [14,] e  j  o  t  y
 [15,] E  J  O  T  Y

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] Quantile Regression in R

2004-06-29 Thread roger koenker
The short answer to your question is that  quantile regression
estimates are estimating linear conditional quantile functions,
just like lm() is used to estimate conditional mean functions.
A longer answer would inevitably involve unpleasant suggestions
that you should follow the posting guide:
	a.)  send questions about packages to the maintainer, not R-help
	b.)  not attach datasets in modes that are stripped by R-help
	c.)  make a token effort to read the documentation and related 
literature


url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Jun 29, 2004, at 10:26 AM, Ali Hirsa wrote:
I recently learn about Quantile Regression in R.
I am trying to study two time series (attached) by Quantile Regression 
in R.
I wrote the following code and do not know how to interpret the lines.

What kind of information can I get from them? Correlation for 
quantiles,
conditional probabilties (i.e. P(X in Quantile i | Y in Quantile i)) , 
and etc.
Many thanks in advance for any help.

Best,
Ali
library(quantreg)
#help.start()
Data <- read.table("RESvsMOVE2.dat")
#
x <- Data[,2]
y <- Data[,1]
par(mfrow=c(2,2))
qqnorm(x, main="MOVE Norm Q-Q Plot", xlab="Normal Quantiles", ylab="MOVE Quantiles")
qqline(x)

qqnorm(y, main="Residuals Norm Q-Q Plot", xlab="Normal Quantiles", ylab="Residuals Quantiles")
qqline(y)

plot(x, y, xlab="MOVE", ylab="Residuals", cex=.5)
xx <- seq(min(x), max(x), .5)
# Just a linear regression
g <- coef(lm(y~x))
yy <- (g[1]+g[2]*(xx))
lines(xx, yy, col="yellow")
taus <- c(.05,.1,.25,.5,.75,.9,.95)
for(tau in taus){
f <- coef(rq(y~x, tau=tau, method="pfn"))
yy <- (f[1]+f[2]*(xx))
if (tau == .05){
 lines(xx, yy, col="red")
}
if (tau == .95){
 lines(xx, yy, col="green")
}
if (tau != .05 && tau != .95){
 lines(xx, yy, col="blue")
}
}




Re: [R] alternate rank method

2004-06-29 Thread Martin Maechler
 Torsten == Torsten Hothorn [EMAIL PROTECTED]
 on Mon, 28 Jun 2004 10:59:26 +0200 (CEST) writes:

Torsten On Fri, 25 Jun 2004, Douglas Grove wrote:

 I should have specified an additional constraint:
 
 I'm going to need to use this repeatedly on large vectors
 (length 10^6), so something efficient is needed.
 

Torsten give function `irank' in package `exactRankTests' a
Torsten try.

As an answer to Torsten (who got it already orally) and Gabor's
original tricky suggestions:

I strongly believe this should happen in the same C code on
which R's base rank() function works and already implements the
*averaging* of ties.
Doing the analog of changing average(..) to min(..) or max(..)
shouldn't be hard and certainly will be more efficient than the
workarounds posted here.

Patches welcome...
since otherwise I'm not sure I'll get there in time.

Martin
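Until such an option exists at C level, a loop-free workaround in plain R (a sketch; both ranks come from one sort each, so it is O(n log n) and should cope with length-10^6 vectors): the min-rank of a value is the position of its first occurrence in the sorted vector, and the max-rank follows from the descending sort.

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6, 5)
n <- length(x)

# min-rank: 1 + number of strictly smaller elements
rmin <- match(x, sort(x))

# max-rank: number of elements less than or equal
rmax <- n + 1L - match(x, rev(sort(x)))
```

For the tied 5s above, rank(x) gives 6.5 while rmin gives 6 and rmax gives 7, which is the min/max analogue of the built-in averaging.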



Re: [R] binding rows from different matrices

2004-06-29 Thread Peter Dalgaard
Prof Brian Ripley [EMAIL PROTECTED] writes:

 You can almost always index in such problems: here is one way.
 
 rbind(veca,vecb,vecc)[matrix(1:15, 3, byrow=T), ]
 
 Take it apart of see how it works, if it is not immediately obvious.

Or, a little longer, but perhaps more intuitive:

 matrix(aperm(array(c(veca,vecb,vecc),c(5,5,3)),c(3,1,2)),15)

I.e., convert to array, do generalized transpose, convert back to
matrix. Not that I got the index calculations right on first try

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



[R] Different behaviour of unique(), R vs. Splus.

2004-06-29 Thread J W



Re: [R] alternate rank method

2004-06-29 Thread Douglas Grove
I agree.  These are obvious extensions to the options provided
now by rank.  I didn't suggest this as I am not a contributor and
don't feel comfortable asking others to do more work :)

Thanks,
Doug


On Tue, 29 Jun 2004, Martin Maechler wrote:

  Torsten == Torsten Hothorn [EMAIL PROTECTED]
  on Mon, 28 Jun 2004 10:59:26 +0200 (CEST) writes:
 
 Torsten On Fri, 25 Jun 2004, Douglas Grove wrote:
 
  I should have specified an additional constraint:
  
  I'm going to need to use this repeatedly on large vectors
  (length 10^6), so something efficient is needed.
  
 
 Torsten give function `irank' in package `exactRankTests' a
 Torsten try.
 
 As an answer to Torsten (who got it already orally) and Gabor's
 original tricky suggestions:
 
 I strongly believe this should happen in the same C code on
 which R's base rank() function works and already implements the
 *averaging* of ties.
 Doing the analog of changing average(..) to min(..) or max(..)
 shouldn't be hard and certainly will be more efficient than the
 workarounds posted here.
 
 Patches welcome...
 since otherwise I'm not sure I'll get there in time.
 
 Martin






Re: [R] Several PCA questions...

2004-06-29 Thread Dan Bolser

Perhaps this question is less dumb... (in context below...)


On Tue, 29 Jun 2004, Prof Brian Ripley wrote:

On Tue, 29 Jun 2004, Dan Bolser wrote:

 Hi, I am doing PCA on several columns of data in a data.frame.
 
 I am interested in particular rows of data which may have a particular
 combination of 'types' of column values (without any pre-conception of
 what they may be).
 
 I do the following...
 
 # My data table.
 allDat <- read.table("big_select_thresh_5", header=1)
 
 # Where some rows look like this...
 # PDB SUNID1  SUNID2  AA  CH  IPCAPCA IBB BB
 # 3sdh14984   14985   6   10  24  24  93  116
 # 3hbi14986   14987   6   10  20  22  94  117
 # 4sdh14988   14989   6   10  20  20  104 122
 
 # NB First three columns = row ID, last 6 = variables
 
 attach(allDat)
 
 # My columns of interest (variables).
 part <- data.frame(AA,CH,IPCA,PCA,IBB,BB)
 
 pc <- princomp(part)

Do you really want an unscaled PCA on that data set?  Looks unlikely (but 
then two of the columns are constant in the sample, which is also 
worrying).


That is just sample bias. By unscaled I assume you mean something like
normalized?


 plot(pc)
 
 The above plot shows that 95% of the variance is due to the first
 'Component' (which I assume is AA).

No, it is the first (principal) component.  You did ask for PCA!

 i.e. All the variables behave in quite much the same way.

Or you failed to scale the data so one dominates.

Yes.

I added the following to the above


x <- colMeans(part)
partNorm <- part/x
pc1 <- princomp(partNorm)

plot(pc1)

biplot(pc1)

Which shows two major components, and possibly a third.

What I want to know is that given my data is not uniformly distributed, is
my normalization valid?

I know I should find this out via further investigation of PCA, but in
general if my variables have a very skewed distribution (possibly without
a theoretically definable mean) should I attempt to use any standard
clustering technique?

I guess I should log transform my data.

Cheers,
Dan.
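For reference, dividing by column means only rescales the units; standardizing to unit variance is what keeps one high-variance column from dominating. A sketch on toy data (the real 'part' data frame from the post is not reproduced here; the columns below are made up), using either scale() or princomp's cor argument -- noting that cor=TRUE fails if any column is constant, as two were in the posted sample:

```r
set.seed(42)
# toy stand-in for the 'part' data frame, with very different scales
part <- data.frame(AA = rnorm(50, 100, 20),
                   CH = rnorm(50, 10, 1),
                   BB = rnorm(50, 0.5, 0.1))

pc_raw <- princomp(part)             # unscaled: AA dominates
pc_cor <- princomp(part, cor = TRUE) # PCA on the correlation matrix
pc_std <- princomp(scale(part))      # equivalent: standardize first
```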





 I then did ...
 
 
 biplot(pc)
 
 Which showed some outliers with a numeric ID - How do I get back my old 3
 part ID used in allDat?

Set row names on your data frame.  Like almost all of R, it is the row 
names of a data frame that are used for labelling, and you did not give 
any so you got numbers.

 In the above plot I saw all the variables (correctly named) pointing in
 more or less the same direction (as shown by the variance). I then did the
 following...
 
 postscript(file="test.ps", paper="a4")
 
 biplot(pc)
 
 dev.off()
 
 However, looking at test.ps shows that the arrows are missing (using
 ggv)... Hmmm, they come back when I pstoimg then xv... never mind.

So ggv is unreliable, perhaps cannot cope with colours?

 Finally, I would like to make a contour plot of the above biplot, is this
 possible? (or even a good way to present the data?

What do you propose to represent by the contours?  Biplots have a 
well-defined interpretation in terms of distances and angles.





[R] PAM clustering: using my own dissimilarity matrix

2004-06-29 Thread Hans Körber
Hello,
I would like to use my own dissimilarity matrix in a PAM clustering with 
method pam (cluster package) instead of a dissimilarity matrix created 
by daisy.

I read data from a file containing the dissimilarity values using 
read.csv. This creates a matrix (alternatively: an array or vector) 
which is not accepted by pam: A call

   p <- pam(d, k=2, diss=TRUE)
yields an error message Error in pam(d, k = 2, diss = TRUE) : x is not 
of class dissimilarity and can not be converted to this class. How can 
I convert the matrix d into a dissimilarity matrix suitable for pam?

I'm aware of a response by Friedrich Leisch to a similar question posed 
by Jose Quesada (quoted below). But as I understood the answer, the 
dissimilarity matrix there is calculated on the basis of (random) data.

Thank you in advance.
Hans
__
/ On Tue, 09 Jan 2001 15:42:30 -0700, /
/ Jose Quesada (JQ) wrote: /
/  Hi, /
/  I'm trying to use a similarity matrix (triangular) as input for 
pam() or /
/  fanny() clustering algorithms. /
/  The problem is that this algorithms can only accept a dissimilarity /
/  matrix, normally generated by daisy(). /

/  However, daisy only accept 'data matrix or dataframe. Dissimilarities /
/  will be computed between the rows of x'. /
/  Is there any way to say to that your data are already a similarity /
/  matrix (triangular)? /
/  In Kaufman and Rousseeuw's FORTRAN implementation (1990), they 
showed an /
/  option like this one: /

/  Maybe you already have correlations coefficients between variables. /
/  Your input data constist on a lower triangular matrix of pairwise /
/  correlations. You wish to calculate dissimilarities between the /
/  variables. /
/  But I couldn't find this alternative in the R implementation. /
/  I can not use foo - as.dist(foo), neither daisy(foo...) because /
/  Dissimilarities will be computed between the rows of x, and this is /
/  not /
/  what I mean. /
/  You can easily transform your similarities into dissimilarities like /
/  this (also recommended in Kaufman and Rousseeuw ,1990): /
/  foo <- (1 - abs(foo)) # where foo are similarities /
/  But then pam() will complain like this: /
/   x is not of class dissimilarity and can not be converted to this /
/  class. /
/  Can anyone help me? I also appreciate any advice about other 
clustering /
/  algorithms that can accept this type of input. /

Hmm, I don't understand your problem, because proceeding as the docs
describe it works for me ...
If foo is a similarity matrix (with 1 meaning identical objects), then
bar <- as.dist(1 - abs(foo))
fanny(bar, ...)
works for me:
## create a random 12x12 similarity matrix, make it symmetric and set the
## diagonal to 1
/ x <- matrix(runif(144), nc=12) /
/ x <- x+t(x) /
/ diag(x) <- 1 /
## now proceed as described in the docs
/ y <- as.dist(1-x) /
/ fanny(y, 3) /
iterations objective
42.00 3.303235
Membership coefficients:
   [,1] [,2] [,3]
1 0.333 0.333 0.333
2 0.333 0.333 0.333
3 0.334 0.333 0.333
4 0.333 0.333 0.333
...


[R] removing NA from an its object

2004-06-29 Thread Laura Holt
Hi again!
I have the following its object:
class(x1)
[1] its
attr(,package)
[1] its
x1
FTSE DAX
2004-06-07 4491.6 4017.81
2004-06-08 4504.8 4018.95
2004-06-09 4489.5 3997.76
2004-06-10 4486.1 4021.64
2004-06-11 4484.0 4014.56
2004-06-14 4433.2 3948.65
2004-06-15 4458.6 3987.30
2004-06-16 4491.1 4003.24
2004-06-17 4493.3 3985.46
2004-06-18 4505.8 3999.79
2004-06-21 4502.2 3989.31
2004-06-22 NA 3928.39
2004-06-23 NA 3945.10
2004-06-24 NA 4007.05
2004-06-25 NA 4013.35
2004-06-28 NA 4069.35
I want to create an its object with no NAs; that is, if there is an NA in 
any column, strike the entire row.

I did the following:
x2 <- its(na.omit(x1))
x2
FTSE DAX
2004-06-07 4491.6 4017.81
2004-06-08 4504.8 4018.95
2004-06-09 4489.5 3997.76
2004-06-10 4486.1 4021.64
2004-06-11 4484.0 4014.56
2004-06-14 4433.2 3948.65
2004-06-15 4458.6 3987.30
2004-06-16 4491.1 4003.24
2004-06-17 4493.3 3985.46
2004-06-18 4505.8 3999.79
2004-06-21 4502.2 3989.31
attr(,na.action)
2004-06-22 2004-06-23 2004-06-24 2004-06-25 2004-06-28
   12 13 14 15 16
attr(,class)
[1] omit
class(x2) <- "its"
My question:  is this the best way to accomplish the goal, please?  I tried 
apply with all and is.na but I got strange results.
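An alternative sketch that filters the rows directly is complete.cases, shown
here on a plain matrix (for an its object, subscripting with the logical row
index would be the hypothetical equivalent, assuming the class survives the
subscript):

```r
x1 <- cbind(FTSE = c(4491.6, 4504.8, NA, NA),
            DAX  = c(4017.81, 4018.95, 3928.39, 3945.10))

# TRUE for rows with no NA in any column
ok <- complete.cases(x1)
x2 <- x1[ok, ]
```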

Thanks.
R Version 1.9.1
Sincerely,
Laura
mailto: [EMAIL PROTECTED]


[R] naive question

2004-06-29 Thread Igor Rivin

I have a 100Mb comma-separated file, and R takes several minutes to read it
(via read.table()). This is R 1.9.0 on a linux box with a couple gigabytes of
RAM. I am conjecturing that R is gc-ing, so maybe there is some command-line
arg I can give it to convince it that I have a lot of space, or?!

Thanks!

Igor
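For the archive: most of read.table's time on a large file goes into guessing
column types and growing buffers rather than garbage collection, so the usual
remedies are to declare colClasses, pass nrows, and disable comment scanning.
A sketch on a small generated file (the real 100 Mb file and its column types
are assumptions here):

```r
tf <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:1000, b = runif(1000)), tf, row.names = FALSE)

# declaring the types up front avoids the type-guessing pass
dat <- read.table(tf, header = TRUE, sep = ",",
                  colClasses = c("integer", "numeric"),
                  nrows = 1000, comment.char = "")
unlink(tf)
```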



Re: [R] PAM clustering: using my own dissimilarity matrix

2004-06-29 Thread Wolski
Hi!

If your x is a symmetric matrix containing the distances, then cast it to a dist
object using as.dist.
?as.dist.
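A minimal sketch of that round trip, with a made-up dissimilarity matrix standing in for the one read from file via read.csv:

```r
library(cluster)
set.seed(7)

# stand-in for a symmetric dissimilarity matrix read from a file
x <- as.matrix(dist(matrix(rnorm(20), 10, 2)))

d <- as.dist(x)                 # cast the plain matrix to class "dist"
p <- pam(d, k = 2, diss = TRUE) # now accepted by pam
```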

Sincerely
Eryk

*** REPLY SEPARATOR  ***

On 29.06.2004 at 18:28 Hans Körber wrote:

Hello,

I would like to use my own dissimilarity matrix in a PAM clustering with 
method pam (cluster package) instead of a dissimilarity matrix created 
by daisy.

I read data from a file containing the dissimilarity values using 
read.csv. This creates a matrix (alternatively: an array or vector) 
which is not accepted by pam: A call

p <- pam(d, k=2, diss=TRUE)

yields the error message "Error in pam(d, k = 2, diss = TRUE) : x is not 
of class dissimilarity and can not be converted to this class."  How can 
I convert the matrix d into a dissimilarity matrix suitable for pam?

I'm aware of a response by Friedrich Leisch to a similar question posed 
by Jose Quesada (quoted below). But as I understood the answer, the 
dissimilarity matrix there is calculated on the basis of (random) data.

Thank you in advance.
Hans

__

On Tue, 09 Jan 2001 15:42:30 -0700, Jose Quesada (JQ) wrote:

  Hi,
  I'm trying to use a similarity matrix (triangular) as input for pam() or
  fanny() clustering algorithms.
  The problem is that these algorithms can only accept a dissimilarity
  matrix, normally generated by daisy().

  However, daisy only accepts a 'data matrix or dataframe. Dissimilarities
  will be computed between the rows of x'.
  Is there any way to say that your data are already a similarity
  matrix (triangular)?
  In Kaufman and Rousseeuw's FORTRAN implementation (1990), they showed an
  option like this one:

    Maybe you already have correlation coefficients between variables.
    Your input data consist of a lower triangular matrix of pairwise
    correlations. You wish to calculate dissimilarities between the
    variables.

  But I couldn't find this alternative in the R implementation.

  I cannot use foo <- as.dist(foo), nor daisy(foo, ...), because
  'Dissimilarities will be computed between the rows of x', and this is
  not what I mean.

  You can easily transform your similarities into dissimilarities like
  this (also recommended in Kaufman and Rousseeuw, 1990):

    foo <- (1 - abs(foo))  # where foo are similarities

  But then pam() will complain like this:

    x is not of class dissimilarity and can not be converted to this
    class.

  Can anyone help me? I also appreciate any advice about other clustering
  algorithms that can accept this type of input.

Hmm, I don't understand your problem, because proceeding as the docs
describe it works for me ...

If foo is a similarity matrix (with 1 meaning identical objects), then

bar <- as.dist(1 - abs(foo))
fanny(bar, ...)

works for me:

## create a random 12x12 similarity matrix, make it symmetric and set the
## diagonal to 1
x <- matrix(runif(144), nc=12)
x <- x + t(x)
diag(x) <- 1

## now proceed as described in the docs
y <- as.dist(1 - x)
fanny(y, 3)
iterations objective
 42.00 3.303235
Membership coefficients:
[,1] [,2] [,3]
1 0.333 0.333 0.333
2 0.333 0.333 0.333
3 0.334 0.333 0.333
4 0.333 0.333 0.333
...

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



Re: [R] Several PCA questions...

2004-06-29 Thread Prof Brian Ripley
See `cor' in ?princomp, and its references.  I meant `scale' as in ?scale.
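A small sketch of the `cor'/`scale' point (toy data; the two columns are invented so that one has a much larger variance than the other):

```r
# Unscaled PCA is dominated by the high-variance column; cor = TRUE works on
# the correlation matrix, equivalent to standardizing the columns with scale().
set.seed(1)
part <- data.frame(a = rnorm(50), b = 1000 * rnorm(50))
pc.raw    <- princomp(part)              # unscaled: column b dominates
pc.scaled <- princomp(part, cor = TRUE)  # scaled: columns weighted equally
pc.raw$sdev^2 / sum(pc.raw$sdev^2)       # first component carries ~all variance
pc.scaled$sdev^2 / sum(pc.scaled$sdev^2) # variance spread across components
```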


On Tue, 29 Jun 2004, Dan Bolser wrote:

 
 Perhaps this question is less dumb... (in context below...)
 
 
 On Tue, 29 Jun 2004, Prof Brian Ripley wrote:
 
 On Tue, 29 Jun 2004, Dan Bolser wrote:
 
  Hi, I am doing PCA on several columns of data in a data.frame.
  
  I am interested in particular rows of data which may have a particular
  combination of 'types' of column values (without any pre-conception of
  what they may be).
  
  I do the following...
  
  # My data table.
  allDat <- read.table("big_select_thresh_5", header=1)
  
  # Where some rows look like this...
  # PDB SUNID1  SUNID2  AA  CH  IPCAPCA IBB BB
  # 3sdh14984   14985   6   10  24  24  93  116
  # 3hbi14986   14987   6   10  20  22  94  117
  # 4sdh14988   14989   6   10  20  20  104 122
  
  # NB First three columns = row ID, last 6 = variables
  
  attach(allDat)
  
  # My columns of interest (variables).
  part <- data.frame(AA, CH, IPCA, PCA, IBB, BB)
  
  pc <- princomp(part)
 
 Do you really want an unscaled PCA on that data set?  Looks unlikely (but 
 then two of the columns are constant in the sample, which is also 
 worrying).
 
 
 That is just sample bias. By unscaled I assume you mean something like
 normalized?
 
 
  plot(pc)
  
  The above plot shows that 95% of the variance is due to the first
  'Component' (which I assume is AA).
 
 No, it is the first (principal) component.  You did ask for PCA!
 
  i.e. All the variables behave in much the same way.
 
 Or you failed to scale the data so one dominates.
 
 Yes.
 
 I added the following to the above
 
 
 x <- colMeans(part)
 partNorm <- part/x
 pc1 <- princomp(partNorm)
 
 plot(pc1)
 
 biplot(pc1)
 
 Which shows two major components, and possibly a third.
 
 What I want to know is that given my data is not uniformly distributed, is
 my normalization valid?
 
 I know I should find this out via further investigation of PCA, but in
 general if my variables have a very skewed distribution (possibly without
 a theoretically definable mean) should I attempt to use any standard
 clustering technique?
 
 I guess I should log transform my data.
 
 Cheers,
 Dan.
 
 
 
 
 
  I then did ...
  
  
  biplot(pc)
  
  Which showed some outliers with a numeric ID - How do I get back my old 3
  part ID used in allDat?
 
 Set row names on your data frame.  Like almost all of R, it is the row 
 names of a data frame that are used for labelling, and you did not give 
 any so you got numbers.
 
  In the above plot I saw all the variables (correctly named) pointing in
  more or less the same direction (as shown by the variance). I then did the
  following...
  
  postscript(file="test.ps", paper="a4")
  
  biplot(pc)
  
  dev.off()
  
  However, looking at test.ps shows that the arrows are missing (using
  ggv)... Hmmm, they come back when I pstoimg then xv... never mind.
 
 So ggv is unreliable, perhaps cannot cope with colours?
 
  Finally, I would like to make a contour plot of the above biplot, is this
  possible? (or even a good way to present the data?
 
 What do you propose to represent by the contours?  Biplots have a 
 well-defined interpretation in terms of distances and angles.
 
 
 
 

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK              Fax:  +44 1865 272595

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Jens Praestgaard/Hgsi is out of the office.

2004-06-29 Thread Jens_Praestgaard
I will be out of the office starting  06/28/2004 and will not return until
06/30/2004.

Jens Praestgaard is  out of the office until June 30 and  will respond to
your message when he returns.
Thank you

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] removing NA from an its object

2004-06-29 Thread kevin bartz
What did you try with apply? It seems to work for me. I did

x2[!apply(is.na(x2), 1, any),]

and got the desired results.
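
The same idea as a runnable sketch on a plain matrix (the its time-series class is not needed to see the logic; the numbers are invented):

```r
x2 <- matrix(c(1,  2,
               NA, 3,
               4,  5), nrow = 3, byrow = TRUE)
keep <- !apply(is.na(x2), 1, any)  # TRUE for rows containing no NA
x2[keep, , drop = FALSE]           # rows 1 and 3 survive
```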

Kevin

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Laura Holt
Sent: Tuesday, June 29, 2004 9:26 AM
To: [EMAIL PROTECTED]
Subject: [R] removing NA from an its object

Hi again!

I have the following its object:
> class(x1)
[1] "its"
attr(,"package")
[1] "its"
> x1
 FTSE DAX
2004-06-07 4491.6 4017.81
2004-06-08 4504.8 4018.95
2004-06-09 4489.5 3997.76
2004-06-10 4486.1 4021.64
2004-06-11 4484.0 4014.56
2004-06-14 4433.2 3948.65
2004-06-15 4458.6 3987.30
2004-06-16 4491.1 4003.24
2004-06-17 4493.3 3985.46
2004-06-18 4505.8 3999.79
2004-06-21 4502.2 3989.31
2004-06-22 NA 3928.39
2004-06-23 NA 3945.10
2004-06-24 NA 4007.05
2004-06-25 NA 4013.35
2004-06-28 NA 4069.35

I want to create an its object with no NAs; that is, if there is an NA in 
any column, strike the entire row.


I did the following:
> x2 <- its(na.omit(x1))
> x2
 FTSE DAX
2004-06-07 4491.6 4017.81
2004-06-08 4504.8 4018.95
2004-06-09 4489.5 3997.76
2004-06-10 4486.1 4021.64
2004-06-11 4484.0 4014.56
2004-06-14 4433.2 3948.65
2004-06-15 4458.6 3987.30
2004-06-16 4491.1 4003.24
2004-06-17 4493.3 3985.46
2004-06-18 4505.8 3999.79
2004-06-21 4502.2 3989.31
attr(,"na.action")
2004-06-22 2004-06-23 2004-06-24 2004-06-25 2004-06-28
12 13 14 15 16
attr(,"class")
[1] "omit"
> class(x2) <- "its"


My question:  is this the best way to accomplish the goal, please?  I tried 
apply with all() and is.na() but I got strange results.

Thanks.
R Version 1.9.1
Sincerely,
Laura
mailto: [EMAIL PROTECTED]



__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html



Re: [R] R via ssh login on OS X?

2004-06-29 Thread James Howison
On Jun 29, 2004, at 3:43 AM, Prof Brian Ripley wrote:
Did you look at the notes on MacOS X in the R-admin manual (as the 
INSTALL
file asks)?  That would have told you why lapack failed, and I think 
you
should redo your build following the advice there.
Clearly I didn't read closely enough.  Thanks for the reminder.  The 
build and check completed successfully as a fully non-root build  with 
this sequence:

Compile f2c and libf2c and put f2c, f2c.h and libf2c.a in $HOME/f2c.  
Run ranlib on libf2c.a

mkdir $HOME/f2c
mkdir ~/Rinstall
mv R-1.9.1.tgz Rinstall/
PATH=$PATH:$HOME/f2c  (So that configure can find the f2c executable)
export LDFLAGS=-L$HOME/f2c/
export CPPFLAGS=-I$HOME/f2c/
./configure --prefix=$HOME/Rinstall/  --with-blas='-framework vecLib' 
--with-lapack

make
make check
make install
are all successful.
Built in this way it has no problems with remote login on OS X.  As I 
said, I haven't found problems with remote log-in at all with 1.9.1.

Thanks for the help everyone.
--J
On Tue, 29 Jun 2004, James Howison wrote:
[...]
then did
./configure --prefix=$HOME/Rinstall/ --enable-R-framework=no
--with-x=no --with-lapack=no
Note
   --with-blas='-framework vecLib' --with-lapack
is `strongly recommended', and on some versions of MacOS X `appears to 
be
the only way to build R'.

and then
make
This basically worked but for some reason lapack was still trying to
build and that was failing, so I deleted it from the appropriate
makefile and the rest of the compile went fine.  The lapack confusion
stopped some of the recommended modules from building but I didn't 
need
those (just sna which built fine from CRAN).
[...]
--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK              Fax:  +44 1865 272595

--James
+1 315 395 4056
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] RE: [S] Different behaviour of unique(), R vs. Splus.

2004-06-29 Thread Liaw, Andy
The source of the incompatibility:

In S-PLUS 6.2:

> methods(unique)
              splus          splus        menu               splus 
  unique.data.frame unique.default unique.name unique.rowcol.names


In R-1.9.1:

> methods(unique)
[1] unique.array      unique.data.frame unique.default    unique.matrix


Unless there's some sort of coordination (or even just separate effort) on
either/both R Core and Insightful developers to make sure there's agreement
on what methods to provide in the base code, such problems can only get
worse, not better, I guess.

Best,
Andy


 From: Rolf Turner
 
 Apologies for the cross-posting, but I thought this snippet of info
 might be vaguely interesting to both lists.
 
 I did a ***brief*** search to see if this issue had previously been
 discussed and found nothing.  So I thought I'd tell the list about a
 difference in behaviour between unique() in R and unique() in Splus
 which bit me just now.
 
 I was trying to convert a package from Splus to R and got nonsense
 answers in R.  Turned out that within the bowels of the package I was
 doing something like
 
   u <- unique(y)
 
 where y was a matrix of integer values.  In Splus this gives a
 (short) vector of unique values.  In R it gives a matrix of the same
 dimensionality as y, except that any duplicated rows are eliminated.
 
 (This looks like being very useful --- once you know about it.  And
 it was probably mentioned in the R release notes at one time, but, as
 Dr. Hook says, ``I was stoned and I missed it.'')
 
 E.g.
   set.seed(42)
   m <- matrix(sample(1:5, 20, TRUE), 5, 4)
   u <- unique(m)
 
 In R ``u'' is identical to ``m''; in Splus ``u'' is a vector (of
 length 5).
 
 To get what I want in R I simply need to do
 
   u <- unique(as.vector(y))
 
 Simple, once you know.  Took me a devil of a long time to track down
 what was going wrong, but!
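
Rolf's example, runnable as-is in R (the exact row values depend on the RNG version, but the returned shapes illustrate the difference):

```r
set.seed(42)
m <- matrix(sample(1:5, 20, replace = TRUE), 5, 4)
unique(m)             # in R: a matrix with duplicated *rows* removed
unique(as.vector(m))  # the distinct *values*, matching the S-PLUS behaviour
```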
 
   cheers,
 
   Rolf Turner
 
 This message was distributed by [EMAIL PROTECTED]  To
 ...(s-news.. clipped)...

 


__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] anti-R vitriol

2004-06-29 Thread Barry Rowlingson
A colleague is receiving some data from another person. That person 
reads the data in SAS and it takes 30s and uses 64k RAM. That person 
then tries to read the data in R and it takes 10 minutes and uses a 
gigabyte of RAM. Person then goes on to say:

  It's not that I think SAS is such great software,
  it's not.  But I really hate badly designed
  software.  R is designed by committee.  Worse,
  it's designed by a committee of statisticians.
  They tend to confuse numerical analysis with
  computer science and don't have any idea about
  software development at all.  The result is R.
  I do hope [your colleague] won't have to waste time doing
  [this analysis] in an outdated and poorly designed piece
  of software like R.
Would any of the committee like to respond to this? Or shall we just 
slap our collective forehead and wonder how someone could get such a view?


Barry
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Goodness of fit test for estimated distribution

2004-06-29 Thread Christian Hennig
Hi,

is there any method for goodness of fit testing of an (as general as
possible) univariate distribution with parameters estimated, for normal, 
exponential, gamma distributions, say (e.g. the corrected p-values for 
the Kolmogorov-Smirnov or Chi-squared with corresponding ML estimation
method)? 
It seems that neither ks.test nor chisq.test handles estimated parameters.
I am aware of function goodfit in package vcd, which seems to do it for some
discrete distributions.

Thank you for help,
Christian 


***
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/
###
ich empfehle www.boag-online.de

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] binding rows from different matrices

2004-06-29 Thread Giovanni Petris

Still another variation on the same theme:

> matrix(t(cbind(veca, vecb, vecc)), nc=5, byrow=T)

Giovanni

 Date: Tue, 29 Jun 2004 17:58:32 +0200
 From: Peter Dalgaard [EMAIL PROTECTED]
 Sender: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED], Stephane DRAY [EMAIL PROTECTED]
 Precedence: list
 User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2
 Lines: 20
 
 Prof Brian Ripley [EMAIL PROTECTED] writes:
 
  You can almost always index in such problems: here is one way.
  
  rbind(veca,vecb,vecc)[matrix(1:15, 3, byrow=T), ]
  
  Take it apart of see how it works, if it is not immediately obvious.
 
 Or, a little longer, but perhaps more intuitive:
 
  matrix(aperm(array(c(veca,vecb,vecc),c(5,5,3)),c(3,1,2)),15)
 
 I.e., convert to array, do generalized transpose, convert back to
 matrix. Not that I got the index calculations right on first try
 
 -- 
O__   Peter Dalgaard Blegdamsvej 3  
   c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
  (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
 
 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 
 


-- 

 __
[  ]
[ Giovanni Petris [EMAIL PROTECTED] ]
[ Department of Mathematical Sciences  ]
[ University of Arkansas - Fayetteville, AR 72701  ]
[ Ph: (479) 575-6324, 575-8630 (fax)   ]
[ http://definetti.uark.edu/~gpetris/  ]
[__]

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] anti-R vitriol

2004-06-29 Thread Berton Gunter
My reaction, as a mere individual user: Of course, one cannot have any idea
what's really going on, so a rational reply to the rant is impossible. But, as
this list repeatedly demonstrates (and as we all have probably experienced), it
is possible to do things foolishly in any software.

Worth noting: John Chambers, the designer of the S language (of which R is an
implementation) won  an ACM computing award (readers -- please correct details of
this citation) for his achievement; so apparently the professional computing
community disagreed with the sentiments expressed in the rant.

Cheers,

--

Bert Gunter

Non-Clinical Biostatistics
Genentech
MS: 240B
Phone: 650-467-7374


The business of the statistician is to catalyze the scientific learning
process.

 -- George E.P. Box

Barry Rowlingson wrote:

 A colleague is receiving some data from another person. That person
 reads the data in SAS and it takes 30s and uses 64k RAM. That person
 then tries to read the data in R and it takes 10 minutes and uses a
 gigabyte of RAM. Person then goes on to say:

It's not that I think SAS is such great software,
it's not.  But I really hate badly designed
software.  R is designed by committee.  Worse,
it's designed by a committee of statisticians.
They tend to confuse numerical analysis with
computer science and don't have any idea about
software development at all.  The result is R.

I do hope [your colleague] won't have to waste time doing
[this analysis] in an outdated and poorly designed piece
of software like R.

 Would any of the committee like to respond to this? Or shall we just
 slap our collective forehead and wonder how someone could get such a view?

 Barry

 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] anti-R vitriol

2004-06-29 Thread Roger D. Peng
I'm not too concerned about your colleague's view of R.  S/He 
doesn't have to like it, and I don't think anyone actually believes 
that R is designed to make *everyone* happy.  For me, R does about 99% 
of the things I need to do, but sadly, when I need to order a pizza, I 
still have to pick up the telephone.

What worries me more is that your colleague seems to have lost sight 
of the fact that just about all software development involves 
tradeoffs.  Although I've never used SAS, I've used other stat 
packages and it's clear that all of them (including R) have traded in 
some things to get out other things.  An example is R's potentially 
large memory usage, which, one might argue, trades in analyses of very 
large datasets but gets out a very powerful and elegant programming 
language.

Rather than use absolutes, I'd encourage your colleague to be more 
specific.  Rather than say things like "R is poorly designed," I'd 
like to hear "R is poorly designed for [fill in the blank]."  Then we 
can get a better handle on the world in which s/he lives.

-roger
Barry Rowlingson wrote:
A colleague is receiving some data from another person. That person 
reads the data in SAS and it takes 30s and uses 64k RAM. That person 
then tries to read the data in R and it takes 10 minutes and uses a 
gigabyte of RAM. Person then goes on to say:

  It's not that I think SAS is such great software,
  it's not.  But I really hate badly designed
  software.  R is designed by committee.  Worse,
  it's designed by a committee of statisticians.
  They tend to confuse numerical analysis with
  computer science and don't have any idea about
  software development at all.  The result is R.
  I do hope [your colleague] won't have to waste time doing
  [this analysis] in an outdated and poorly designed piece
  of software like R.
Would any of the committee like to respond to this? Or shall we just 
slap our collective forehead and wonder how someone could get such a view?


Barry
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

--
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] anti-R vitriol

2004-06-29 Thread Liaw, Andy
 From: Barry Rowlingson
 
 A colleague is receiving some data from another person. That person 
 reads the data in SAS and it takes 30s and uses 64k RAM. That person 
 then tries to read the data in R and it takes 10 minutes and uses a 
 gigabyte of RAM. Person then goes on to say:
 
It's not that I think SAS is such great software,
it's not.  But I really hate badly designed
software.  R is designed by committee.  Worse,
it's designed by a committee of statisticians.
They tend to confuse numerical analysis with
computer science and don't have any idea about
software development at all.  The result is R.
 
I do hope [your colleague] won't have to waste time doing
[this analysis] in an outdated and poorly designed piece
of software like R.
 
 Would any of the committee like to respond to this? Or 
 shall we just 
 slap our collective forehead and wonder how someone could get 
 such a view?
 
 Barry
 

My $0.02:

R, being a flexible programming language, has an amazing ability to cope
with people's laziness/ignorance/inelegance, but it comes at a (sometimes
hefty) price.  While there are no specifics on the situation leading to the
person's comments, here's one (not as extreme) example that I happened to come
across today:

> system.time(spam <- read.table("data_dmc2003_train.txt", 
+ header=T, 
+ colClasses=c(rep("numeric", 833), 
+              "character")))
[1] 15.92  0.09 16.80    NA    NA
> system.time(spam <- read.table("data_dmc2003_train.txt", header=T))
[1] 187.29   0.60 200.19     NA     NA

My SAS ability is rather severely limited, but AFAIK, one needs to specify
_all_ variables to be read into a dataset in order to read in the data in
SAS.  If one has that information, R can be very efficient as well.  Without
that information, one gets nothing in SAS, or one can just let R do the hard work.
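
A toy sketch of the colClasses idea (a temporary three-column file stands in for the real 834-column dataset; on data this small there is no visible speedup, only the shape of the call):

```r
tf <- tempfile(fileext = ".txt")
write.table(data.frame(a = 1:3, b = c(0.5, 1.5, 2.5), id = c("x", "y", "z")),
            tf, row.names = FALSE)
# Telling read.table the column types up front avoids its type-guessing work:
spam <- read.table(tf, header = TRUE,
                   colClasses = c("integer", "numeric", "character"))
str(spam)
unlink(tf)
```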

Best,
Andy

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Goodness of fit test for estimated distribution

2004-06-29 Thread roger koenker
In full generality this is a quite difficult problem as discussed in
Durbin's (1973) SIAM monograph.  An elegant general approach
is provided by Khmaladze
@article{Khma:Arie:1981,
    author  = {Khmaladze, E. V.},
    title   = {Martingale approach in the theory of goodness-of-fit tests},
    year    = {1981},
    journal = {Theory of Probability and its Applications (Transl of
               Teorija Verojatnostei i ee Primenenija)},
    volume  = {26},
    pages   = {240--257}
}

but I don't think that there is a general implementation of the 
approach for R, or
any other software environment, for that matter.

url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820
On Jun 29, 2004, at 1:08 PM, Christian Hennig wrote:
Hi,
is there any method for goodness of fit testing of an (as general as
possible) univariate distribution with parameters estimated, for 
normal,
exponential, gamma distributions, say (e.g. the corrected p-values for
the Kolmogorov-Smirnov or Chi-squared with corresponding ML estimation
method)?
It seems that neither ks.test nor chisq.test handle estimated 
parameters.
I am aware of function goodfit in package vcd, which seems to it for 
some
discrete distributions.

Thank you for help,
Christian
***
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/
###
ich empfehle www.boag-online.de
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html


[R] job opening in Merck Research Labs, NJ

2004-06-29 Thread Liaw, Andy
Apology for the cross-post...  Andy

==

Job description:  Computational statistician/medical image analyst

The Biometrics Research Department at Merck Research Laboratories, Merck 
Co., Inc. in Rahway, NJ is seeking a highly motivated statistician/data
analyst to work in its basic research and drug discovery area.  The
applicant should have broad expertise in image processing, statistics, and
computer science, with substantial experience in medical imaging analysis
including design of experiments, image registration and segmentation,
statistical analysis, and pattern recognition.  The position will initially
involve providing statistical, mathematical, and software development
support for MRI and ultrasound imaging teams in preclinical research (i.e.,
animal studies, not human).  Merck has its own facilities for CT, PET, MRI,
and ultrasound imaging. We are looking for a Ph.D. with a background and/or
post-doctoral experience in at least one of the following fields:
Statistics, Electrical/Computer or Biomedical Engineering, Computer Science,
Applied Mathematics, or Physics.  Advanced computer programming skills
(including, but not limited to Matlab, C/C++, Visual Basic, SQL, IDL, or
PV-WAVE), and good communication skills are essential, as is familiarity
with  statistical software  like R and  Splus.  The position may also
involve general statistical consulting and training.  An ability to lead
statistical analysis efforts within a multidisciplinary team is required.
Strong candidates will also have interests and experience in computer
vision, machine learning/data mining, and/or signal processing. 

Our dedication to delivering quality medicines in innovative ways and our
commitment to bringing out the best in our people are just some of the
reasons why we're ranked among Fortune magazine's 100 Best Companies to
Work for in America.  We offer a competitive salary, an oustanding benefits
package, and a professional work environment with a company known for
scientific excellence.  To apply, please forward your CV or resume and cover
letter to

ATTENTION: Open Position
Vladimir Svetnik, Ph.D.
Biometrics Research Dept.
Merck Research Laboratories, RY33-300
126 E. Lincoln Avenue
Rahway, NJ 07065-0900
[EMAIL PROTECTED]

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Goodness of fit test for estimated distribution

2004-06-29 Thread Spencer Graves
 What about Monte Carlo?  I recently produced (with help from 
contributors to this list) qq plots for certain complicated mixtures of 
distributions.  To evaluate goodness of fit, I produced Monte Carlo 
confidence intervals from 401 simulated qq plots and took the 11th and 
391st of them for each quantile.  {quantile(1:401, c(.025, .975)) = 
c(11, 391)}.  Something like this could be done to obtain a significance 
level for ks.test, for example. 

 This may not be as satisfying for some purposes as a clean, 
theoretical result, but it produced useful answers without busting the 
project budget too badly. 
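
A sketch of the Monte Carlo idea applied to ks.test with estimated parameters (a normal fit; the data and B are invented here, and B would be larger in practice):

```r
set.seed(1)
x <- rnorm(100, mean = 5, sd = 2)
# KS statistic against a normal whose parameters are estimated from the sample:
ks.stat <- function(y) ks.test(y, "pnorm", mean(y), sd(y))$statistic
obs <- ks.stat(x)
# Recompute the statistic on B samples simulated from the fitted model,
# re-estimating the parameters each time, to calibrate the null distribution:
B <- 200
sim <- replicate(B, ks.stat(rnorm(length(x), mean(x), sd(x))))
p.boot <- mean(sim >= obs)  # Monte Carlo p-value
p.boot
```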

 hope this helps. 
 spencer graves

roger koenker wrote:
In full generality this is a quite difficult problem as discussed in
Durbin's (1973) SIAM monograph.  An elegant general approach
is provided by Khmaladze
@article{Khma:Arie:1981,
author = {Khmaladze, E. V.},
title = {Martingale approach in the theory of goodness-of-fit tests},
year = {1981},
journal = {Theory of Probability and its Applications (Transl of 
Teorija Verojatnostei i ee Primenenija)},
volume = {26},
pages = {240--257}
}

but I don't think that there is a general implementation of the 
approach for R, or
any other software environment, for that matter.

url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820
On Jun 29, 2004, at 1:08 PM, Christian Hennig wrote:
Hi,
is there any method for goodness of fit testing of an (as general as
possible) univariate distribution with parameters estimated, for normal,
exponential, gamma distributions, say (e.g. the corrected p-values for
the Kolmogorov-Smirnov or Chi-squared with corresponding ML estimation
method)?
It seems that neither ks.test nor chisq.test handle estimated 
parameters.
I am aware of function goodfit in package vcd, which seems to it for 
some
discrete distributions.

Thank you for help,
Christian
***
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/
###
ich empfehle www.boag-online.de
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html


Re: [R] anti-R vitriol

2004-06-29 Thread Gabor Grothendieck

Barry Rowlingson B.Rowlingson at lancaster.ac.uk writes:

: A colleague is receiving some data from another person. That person 
: reads the data in SAS and it takes 30s and uses 64k RAM. That person 
: then tries to read the data in R and it takes 10 minutes and uses a 
: gigabyte of RAM. Person then goes on to say:
: 
:It's not that I think SAS is such great software,
:it's not.  But I really hate badly designed
:software.  R is designed by committee.  Worse,
:it's designed by a committee of statisticians.
:They tend to confuse numerical analysis with
:computer science and don't have any idea about
:software development at all.  The result is R.
: 
:I do hope [your colleague] won't have to waste time doing
:[this analysis] in an outdated and poorly designed piece
:of software like R.
: 
: Would any of the committee like to respond to this? Or shall we just 
: slap our collective forehead and wonder how someone could get such a view?

Does he have to repeatedly read in different large datasets, or is this 
just a one-time requirement?  In the latter case, he could read in the 
data, save it (using the save command), and then just load it (using
the load command) in subsequent sessions.  He would only have to wait 
10 minutes the first time.  If he has that much data it's probably a 
large project, and a one-time hit of 10 minutes versus several days, 
weeks or months of work seems negligible.
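[Editor's note: Gabor's read-once-then-cache suggestion looks roughly like this in practice. This is a minimal sketch; the file name and object name are hypothetical.]

```r
## First session: pay the slow ASCII parse once, then cache the
## parsed data frame in R's binary format.
big <- read.csv("bigfile.csv")       # slow: minutes for a large file
save(big, file = "bigfile.RData")    # fast binary dump

## Later sessions: load() restores the object 'big' in seconds.
load("bigfile.RData")
```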



[R] fl_show_fselector

2004-06-29 Thread Dimas Martnez Morera
Hi everybody, 

I'm new to xforms and I'm trying to use fl_show_fselector. In fact I did it 
without any problem. But now I'm getting a segmentation fault at this line of 
code:

filename = fl_show_fselector("Select file to open", ".", "*.off", "");

the message is:

In SetFont [fonts.c 224] Bad FontStyle request 0: 
Segmentation fault (core dumped)


I'm wondering what is happening here. Can anybody help me?

Thanks a lot,
 Dimas



[R] [R-pkgs] MNP

2004-06-29 Thread Kosuke Imai
We would like to announce the release of our software, which is now 
available through CRAN.

MNP: R Package for Fitting the Multinomial Probit Models

Abstract:
MNP is a publicly available R package that fits the Bayesian multinomial
probit models via Markov chain Monte Carlo. Along with the standard
multinomial probit model, it can also fit models with different choice
sets for each observation, and complete or partial ordering of all the
available alternatives. The computation is based on the efficient marginal
data augmentation algorithm developed by Imai and van Dyk (2004),
``A Bayesian Analysis of the Multinomial Probit Model Using the Data
Augmentation,'' Journal of Econometrics, forthcoming.

Kosuke Imai, Department of Politics, Princeton University
Jordan R. Vance, Department of Computer Science, Princeton University
David A. van Dyk, Department of Statistics, University of California, Irvine

___
R-packages mailing list
[EMAIL PROTECTED]
https://www.stat.math.ethz.ch/mailman/listinfo/r-packages



Re: [R] naive question

2004-06-29 Thread Prof Brian Ripley
There are hints in the R Data Import/Export Manual.  Just checking: you 
_have_ read it?

On Tue, 29 Jun 2004, Igor Rivin wrote:

 
 I have a 100Mb comma-separated file, and R takes several minutes to read it
 (via read.table()). This is R 1.9.0 on a linux box with a couple gigabytes of
 RAM. I am conjecturing that R is gc-ing, so maybe there is some command-line
 arg I can give it to convince it that I have a lot of space, or?!

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK            Fax:  +44 1865 272595



[R] Re: [S] Different behaviour of unique(), R vs. Splus.

2004-06-29 Thread Prof Brian Ripley
On Tue, 29 Jun 2004, Liaw, Andy wrote:

 The source of the incompatibility:
 
 In S-PLUS 6.2:
 
  methods(unique)
splussplus  menu splus 
  unique.data.frame unique.default unique.name unique.rowcol.names
 
 
 In R-1.9.1:
 
  methods(unique)
 [1] unique.array  unique.data.frame unique.defaultunique.matrix
 
 
 Unless there's some sort of coordination (or even just separate effort) on
 either/both R Core and Insightful developers to make sure there's agreement
 on what methods to provide in the base code, such problem can only get
 worse, not better, I guess.

There are plans to that effect, but R moves much faster than a commercial 
product such as S-PLUS.

It seems to me a bad idea that unique (or foo) does different things for 
matrices and data frames, for as we see frequently, many users do not 
distinguish between them.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK            Fax:  +44 1865 272595



Re: [R] naive question

2004-06-29 Thread Igor Rivin

I did read the Import/Export document. It is true that replacing
read.table with read.csv and setting comment.char="" speeds
things up some (a factor of two?) -- but this is very far from acceptable
performance, being some two orders of magnitude worse than SAS (the I/O of
which is, in turn, much worse than that of the unix utilities awk, sort,
and so on).  Setting colClasses is suggested (and has been suggested by
some in response to my question), but for a frame with some 60 columns,
this is a major nuisance.



[OT] Ordering pizza [was Re: [R] anti-R vitriol]

2004-06-29 Thread Douglas Bates
Roger D. Peng wrote:
I'm not too concerned about your colleague's view about R.  S/He doesn't 
have to like it, and I don't think anyone actually believes that R is 
designed to make *everyone* happy.  For me, R does about 99% of the 
things I need to do, but sadly, when I need to order a pizza, I still 
have to pick up the telephone.
There are several chains of pizzerias in the U.S. that provide for 
Internet-based ordering (e.g. www.papajohnsonline.com) so, with the 
Internet modules in R, it's only a matter of time before you will have a 
pizza-ordering function available.



Re: [OT] Ordering pizza [was Re: [R] anti-R vitriol]

2004-06-29 Thread Rolf Turner

Dang!  You're making me hungry!

cheers,

Rolf Turner



Re: [OT] Ordering pizza [was Re: [R] anti-R vitriol]

2004-06-29 Thread Prof Brian Ripley
On Tue, 29 Jun 2004, Douglas Bates wrote:

 Roger D. Peng wrote:
 
  I'm not too concerned about your colleague's view about R.  S/He doesn't 
  have to like it, and I don't think anyone actually believes that R is 
  designed to make *everyone* happy.  For me, R does about 99% of the 
  things I need to do, but sadly, when I need to order a pizza, I still 
  have to pick up the telephone.
 
 There are several chains of pizzerias in the U.S. that provide for 
 Internet-based ordering (e.g. www.papajohnsonline.com) so, with the 
 Internet modules in R, it's only a matter of time before you will have a 
 pizza-ordering function available.

Indeed, the GraphApp toolkit (used for the RGui interface under R for
Windows, but Guido forgot to include it) provides one (for use in Sydney,
Australia, we presume as that is where the GraphApp author hails from).
Alternatively, a Padovian has no need of ordering pizzas with both home 
and neighbourhood restaurants 

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK            Fax:  +44 1865 272595



[R] Is there a function for Principal Surface?

2004-06-29 Thread Fred
Dear All
 
I know there are functions in R packages (pcurve, princurve) to estimate
the principal curve given a set of data.
 
Now, I am wondering if there are some functions for estimating the 
Principal Surface?
 
Please give me a hint if you know some function or software (not limited
to R) to be used for Principal Surface.
 
Thanks for your help.
 
Fred
 

[[alternative HTML version deleted]]



[R] abline and its objects

2004-06-29 Thread Laura Holt
Hi R People:
Is there a way to put an abline line for its objects on a plot, please?
I have an its object, ibm2, which runs from the January 2 through May 28.
ibm2
 ibm
2004-01-02  91.55
2004-01-05  93.05
2004-01-06  93.06
2004-01-07  92.78
2004-01-08  93.04
2004-01-09  91.21
2004-01-12  91.55
2004-01-13  89.70
2004-01-14  90.31
2004-01-15  94.02
.
.
.
I plot the data.  No Problem.
Now I extract the first day of the month in this fashion.
zi <- extractIts(ibm2, weekday=T, find="first", period="month")
zi
ibm
2004-01-02 91.55
2004-02-02 99.39
2004-03-01 97.04
2004-04-01 92.37
2004-05-03 88.02

Still ok.
I would like to put a vertical line at each of the zi values.
abline(v=zi, type="h", col=2)
lines(zi, type="h", col=2)
Nothing happens.
I tried creating another its object with NA in all but the zi places.  Then 
I used lines(test1)

Still nothing happened.
Any suggestions would be much appreciated.
R Version 1.9.1
Sincerely,
Laura H
mailto: [EMAIL PROTECTED]


RE: [R] abline and its objects

2004-06-29 Thread Kevin Bartz
The problem is that you've instructed R to place lines at the values 91.55,
99.39, 97.04, 92.37 and 88.02, but these values do not correspond to the
user coordinates of the x-axis (which you've specified to be dates).

Luckily, the dates where you need lines are in the rownames of zi. You do
need to convert them to your user coordinates--and that depends on how plot
decides to specify your user coordinates, which hinges on the range of your
full data set (you clipped it).

What's on your x-axis? Do par("usr") when you have one of the plots open and
tell me what R says.
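[Editor's note: concretely, something along these lines may work. This is a sketch, not tested against the its class; it assumes the row names of zi parse as dates and that the plot's x-axis is in the matching numeric units -- check par("usr") first.]

```r
## Convert the row dates of 'zi' to the numeric x-coordinates the
## plot used, then draw vertical lines at those positions.
d <- as.POSIXct(rownames(zi))   # or as.Date(), depending on the axis units
abline(v = as.numeric(d), col = 2)
```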

Kevin

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Laura Holt
Sent: Tuesday, June 29, 2004 2:06 PM
To: [EMAIL PROTECTED]
Subject: [R] abline and its objects

Hi R People:

Is there a way to put an abline line for its objects on a plot, please?

I have an its object, ibm2, which runs from the January 2 through May 28.

ibm2
  ibm
2004-01-02  91.55
2004-01-05  93.05
2004-01-06  93.06
2004-01-07  92.78
2004-01-08  93.04
2004-01-09  91.21
2004-01-12  91.55
2004-01-13  89.70
2004-01-14  90.31
2004-01-15  94.02
.
.
.
I plot the data.  No Problem.
Now I extract the first day of the month in this fashion.
zi <- extractIts(ibm2, weekday=T, find="first", period="month")
zi
 ibm
2004-01-02 91.55
2004-02-02 99.39
2004-03-01 97.04
2004-04-01 92.37
2004-05-03 88.02


Still ok.
I would like to put a vertical line at each of the zi values.

abline(v=zi, type="h", col=2)
lines(zi, type="h", col=2)

Nothing happens.

I tried creating another its object with NA in all but the zi places.  Then 
I used lines(test1)

Still nothing happened.

Any suggestions would be much appreciated.
R Version 1.9.1
Sincerely,
Laura H
mailto: [EMAIL PROTECTED]




RE: [R] naive question

2004-06-29 Thread Liaw, Andy
 From: [EMAIL PROTECTED]
 
 I did read the Import/Export document. It is true that replacing
 the read.table by read.csv and setting the commentChar= speeds
 things up some (a factor of two?) -- this is very far from 
 acceptable performance,
 being some two orders of magnitude worse than SAS (the IO of 
 which is, in turn, much worse
 than that of the unix utilities (awk, sort, and so on))   . 
 Setting colClasses is suggested
 (and has been suggested by some in response to my question), but for 
 a frame with some 60 columns, this is a major nuisance.
 
 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html

Please don't make _your_ nuisance into others'.  Do read the posting guide
as suggested above.  You have not provided any info for anyone to give you
any useful advice beyond those you said you received.

R is not all things to all people.  If you are so annoyed, why not use
SAS/awk/sort and so on?

[For my own education:  How do you read the file into SAS without specifying
column names and types?]

Andy



[R] Issue with ROracle and results not being returned

2004-06-29 Thread Coburn Watson
Hello,

I am using ROracle 5.5 with R 1.8.1. I am connecting to an Oracle database,
issuing a query and attempting to fetch data from the result set.  I can see my
session in the Oracle database as well as the SQL which was executed (including
the number of blocks hit, etc).  I have also verified that the SQL returns a
valid result set from sqlplus.

Below is a sample trace of my session:

> library(ROracle)
> drv <- dbDriver("Oracle")
> con <- dbConnect(drv, "perf/[EMAIL PROTECTED]")
> rs1 <- dbSendQuery(con, statement = paste("SELECT distinct api FROM et_log_data",
+   "order by api"))
> df <- fetch(rs1, n = -1)
> summary(rs1, verbose = TRUE)
OraResult:(25657,0,3)
  Statement: SELECT distinct api FROM et_log_data order by api
  Has completed? yes
  Affected rows: 0
  Rows fetched: -1
  Fields:
  nameSclass type len precision scale isVarLength nullOK
1  API character VARCHAR2  50 0 0TRUE  FALSE
> summary(df, verbose = TRUE)
 API
 Length:0
 Class :character
 Mode  :character
> df
[1] API
0 rows (or 0-length row.names)
> q()

Any ideas on why the data cannot be retrieved into the df object?   Please remove 
_nospam from email address to email me directly.  
Any help would be appreciated.

Coburn

(sample output from query via sqlplus)
SQL> SELECT distinct api FROM et_log_data order by api
  2  ;

API
--
ADD_ACCOUNT
ADD_COMMENT
ADD_DLR_CH_SUB
..
 54 total rows returned



Re: [R] rgl installation problems

2004-06-29 Thread E GCP
Thanks for your replies. I do not HTML-ize my mail, but free email accounts 
do that and there is not a switch to turn it off.  I apologize in advance.

I installed R from the redhat package provided by Martyn Plummer. It 
installed fine and without problems. I can use R and have installed and used 
other packages within R without any problems whatsoever. I do not think the 
problem is with R or its installation.   I do think there is a problem with 
the installation of rgl_0.64-13.tar.gz on RedHat 9 (linux).  So, if there is 
anybody out there who has successfully installed rgl_0.64-13.tar.gz on RedHat 
9, I would like to know how.

Thanks so much,
Enrique
From: Peter Dalgaard [EMAIL PROTECTED]
To: Prof Brian Ripley [EMAIL PROTECTED]
CC: E GCP [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: [R] rgl installation problems
Date: 28 Jun 2004 20:05:34 +0200
Prof Brian Ripley [EMAIL PROTECTED] writes:
On Mon, 28 Jun 2004, E GCP wrote:
Thanks for your quick replies, and excuse my naivete, but how do I fix 
the
problem, so the rgl package installs?
We have no idea what is wrong on your system -- all we can tell is that
you have done something wrong but we were not sitting at your shoulder
when you did.  Perhaps you should try re-building R from scratch, paying
close attention to any messages?
Meanwhile, try to follow the posting guide and not HTML-ize your mail.
Another thing to try is to install Martyn's RPM (for FC1) instead of
what is there now. Seems to get things right for me on RH8.
The demo is amazingly smooth even on this ancient machine, BTW.


Re: [R] Sorting elements in a data.frame

2004-06-29 Thread Patrick Connolly
On Wed, 23-Jun-2004 at 07:29PM +0100, Dan Bolser wrote:

| 
| Hi,
| 
| I have data like this
| 
| print(x)
| 
| ID   VAL1VAL2
| 12   6
| 24   9
| 345  12
| 499  44
| 
| What I would like is data like this...
| 
| ID   VAL1VAL2
| 12   6
| 24   9
| 312  45
| 444  99
| 
| 
| So that my analysis of the ratio VAL2/VAL1 is somehow uniform.

By uniform, I'm guessing you want them to be >= 1.

If z is a vector of VAL2/VAL1 values, you can make them all >= 1 this way:

z[z < 1] <- z[z < 1]^-1

Depending on just how you want to use them, there could be better ways
but I've done enough guessing for now.

HTH

-- 
Patrick Connolly
HortResearch
Mt Albert
Auckland
New Zealand 
Ph: +64-9 815 4200 x 7188
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~
I have the world`s largest collection of seashells. I keep it on all
the beaches of the world ... Perhaps you`ve seen it.  ---Steven Wright 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~



Re: [R] naive question

2004-06-29 Thread Marc R. Feldesman
At 01:22 PM 6/29/2004, Igor Rivin wrote:

I did read the Import/Export document. It is true that replacing
the read.table by read.csv and setting the commentChar= speeds
things up some (a factor of two?) -- this is very far from acceptable
performance,
being some two orders of magnitude worse than SAS (the IO of which is, in
turn, much worse
than that of the unix utilities (awk, sort, and so on))   . Setting
colClasses is suggested
(and has been suggested by some in response to my question), but for
a frame with some 60 columns, this is a major nuisance.

Feel free to contribute to the project.   Whining and complaining won't get 
you anywhere.  SAS *is* faster at I/O.  So what?


Dr. Marc R. Feldesman
Professor and Chairman Emeritus
Anthropology Department - Portland State University
email:  [EMAIL PROTECTED]
email:  [EMAIL PROTECTED]
fax:503-725-3905
Don't knock on my door if you don't know my Rottweiler's name  Warren Zevon
Its midnight and I'm not famous yet  Jimmy Buffett


[R] nls fitting problems (singularity)

2004-06-29 Thread Karl Knoblick

Hallo!

I have a problem with fitting data with nls. The first
example with y1 (data frame df1) shows an error, the
second works fine.

Is there a possibility to get a fit? (E.g. JMP can also fit
data that I cannot manage to fit with R.)  Sometimes I also
got a "singularity" error with other starting
parameters.

# x-values
x <- c(-1,5,8,11,13,15,16,17,18,19,21,22)
# y1-values (first data set)
y1 <- c(-55,-22,-13,-11,-9.7,-1.4,-0.22,5.3,8.5,10,14,20)
# y2-values (second data set)
y2 <- c(-92,-42,-15,1.3,2.7,8.7,9.7,13,11,19,18,22)

# data frames
df1 <- data.frame(x=x, y=y1)
df2 <- data.frame(x=x, y=y2)

# start list for parameters
sl <- list(d=0, b=10, c1=90, c2=20)

# y1-Analysis - Result: Error in ... "singular gradient"
nls(y ~ d + (x-b)*c1*(x-b > 0) + (x-b)*c2*(x-b <= 0), data=df1,
start=sl)
# y2-Analysis - Result: working...
nls(y ~ d + (x-b)*c1*(x-b > 0) + (x-b)*c2*(x-b <= 0), data=df2,
start=sl)

# plots to look at data
par(mfrow=c(1,2))
plot(df1$x,df1$y)
plot(df2$x,df2$y)

Perhaps there is another fitting routine? Can anybody
help?
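[Editor's note: one thing that may be worth trying, as a sketch untested on these data: since d, c1 and c2 enter the model linearly, nls's "plinear" algorithm needs a starting value only for the nonlinear parameter b, which is often more robust than supplying a full start list.]

```r
# Broken-stick model with conditionally linear parameters: the
# columns of cbind() are the regressors whose coefficients
# (d, c1, c2) nls estimates linearly; only b is iterated.
x  <- c(-1,5,8,11,13,15,16,17,18,19,21,22)
y1 <- c(-55,-22,-13,-11,-9.7,-1.4,-0.22,5.3,8.5,10,14,20)
nls(y1 ~ cbind(1, (x-b)*(x > b), (x-b)*(x <= b)),
    start = list(b = 10), algorithm = "plinear")
```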

Best wishes,
Karl








RE: [R] naive question

2004-06-29 Thread Vadim Ogranovich
R's IO is indeed 20 - 50 times slower than that of equivalent C code no
matter what you do, which has been a pain for some of us. It does
however help to read the Import/Export tips, as without them the ratio gets
much worse. As Gabor G. suggested in another mail, if you use the file
repeatedly you can convert it into the internal format: read.table once into
R and save using save()... This is much faster.

In my experience R is not so good at large data sets, where large is
roughly 10% of your RAM.



Re: [R] anti-R vitriol

2004-06-29 Thread A.J. Rossini
Barry Rowlingson [EMAIL PROTECTED] writes:

It's not that I think SAS is such great software,
it's not.  But I really hate badly designed
software.  R is designed by committee.  Worse,
it's designed by a committee of statisticians.
They tend to confuse numerical analysis with
computer science and don't have any idea about
software development at all.  The result is R.

They'd probably prefer computer scientists and numerical analysts who
confuse data munging with statistical data analysis, a common problem
in mixed departments...

best,
-tony

-- 
[EMAIL PROTECTED]http://www.analytics.washington.edu/ 
Biomedical and Health Informatics   University of Washington
Biostatistics, SCHARP/HVTN  Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email




Re: [R] naive question

2004-06-29 Thread Igor Rivin

I was not particularly annoyed, just disappointed, since R seems like
a much better thing than SAS in general, and doing everything with a combination
of hand-rolled tools is too much work. However, I do need to work with very large data 
sets, and if it takes 20 minutes to read them in, I have to explore other
options (one of which might be S-PLUS, which claims scalability as a major,
er, PLUS over R).



Re: [R] naive question

2004-06-29 Thread rivin
 At 01:22 PM 6/29/2004, Igor Rivin wrote:
  
  I did read the Import/Export document. It is true that replacing the
 read.table by read.csv and setting the commentChar= speeds things up
 some (a factor of two?) -- this is very far from acceptable
 performance,
  being some two orders of magnitude worse than SAS (the IO of which is,
 in turn, much worse
  than that of the unix utilities (awk, sort, and so on))   . Setting
 colClasses is suggested
  (and has been suggested by some in response to my question), but for a
 frame with some 60 columns, this is a major nuisance.
  
 Feel free to contribute to the project.   Whining and complaining won't
 get  you anywhere.  SAS *is* faster at I/O.  So what?


Sigh. Why are you being defensive? If you read my message, you will see
that what it comes down to is: I tried what to me are obvious things (some
of which only became obvious after getting advice from people on this
forum), and I cannot get the system to perform acceptably. The "whining
and complaining" is actually an attempt to figure out whether I am missing
something else obvious, because I find it hard to believe that I am the
first one to face this [I know I am not, actually, because a gentleman
emailed me a response to the effect that he had to break up his similarly
large file into several pieces to get acceptable performance -- I would
say that this solution is rather hard on the user]; on this list's archive
there was a posting back in '98 asking basically the same question as
mine. As for contributing to the project, perhaps getting a response of
the form "this is slow because we are trying to achieve this, and that,
and the third thing, and this is the best compromise we seem to have come
up with" might be more encouraging than "So what?"


  Igor



Re: [R] Several PCA questions...

2004-06-29 Thread Dan Bolser

I have the following problem

cov(allDat, method='kendall')

Where allDat is an 11,000 by 6 data.frame.

Will the above ever finish on my home computer?



Re: [R] naive question

2004-06-29 Thread Douglas Bates
Igor Rivin wrote:
I was not particularly annoyed, just disappointed, since R seems like
a much better thing than SAS in general, and doing everything with a combination
of hand-rolled tools is too much work. However, I do need to work with very large data sets, and if it takes 20 minutes to read them in, I have to explore other
options (one of which might be S-PLUS, which claims scalability as a major 
, er, PLUS over R).

If you are routinely working with very large data sets it would be 
worthwhile learning to use a relational database (PostgreSQL, MySQL, 
even Access) to store the data and then access it from R with RODBC or 
one of the specialized database packages.

R is slow reading ASCII files because it is assembling the meta-data on 
the fly and it is continually checking the types of the variables being 
read.  If you know all this information and build it into your table 
definitions, reading the data will be much faster.

A disadvantage of this approach is the need to learn yet another 
language and system.  I was going to do an example but found I could not 
because I left all my SQL books at home (I'm travelling at the moment) 
and I couldn't remember the particular commands for loading a table from 
an ASCII file.
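[Editor's note: for what it's worth, the missing step looks roughly like this. A sketch under assumptions: a MySQL server, a hypothetical table bigdata, and an ODBC DSN named "mydsn"; adapt names and column types to the real data.]

```r
## One-time load on the database side (MySQL syntax):
##   CREATE TABLE bigdata (id INT, val DOUBLE, name VARCHAR(32));
##   LOAD DATA LOCAL INFILE 'bigfile.csv' INTO TABLE bigdata
##     FIELDS TERMINATED BY ',' IGNORE 1 LINES;

## Then from R, pull only the rows and columns you need via RODBC:
library(RODBC)
ch  <- odbcConnect("mydsn")
sub <- sqlQuery(ch, "SELECT id, val FROM bigdata WHERE val > 0")
odbcClose(ch)
```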



RE: [R] naive question

2004-06-29 Thread Peter Wilkinson
I am working with data sets that have 2 matrices of 300 columns by 19,000 
rows, and I manage to get the data loaded in a reasonable amount of time. 
Once it's in, I save the workspace and load from there. Once I start doing 
some work on the data, I am taking up about 600 MB of RAM out of the 1 
GB I have in the computer. I will soon upgrade to 2 GB because I will have 
to work with an even larger data matrix soon.

I must say that the speed of R, given what I have been doing, is 
acceptable.

Peter

At 07:59 PM 6/29/2004, Vadim Ogranovich wrote:
 R's IO is indeed 20 - 50 times slower than that of equivalent C code no
matter what you do, which has been a pain for some of us. It does
however help read the Import/Export tips as w/o them the ratio gets much
worse. As Gabor G. suggested in another mail, if you use the file
repeatedly you can convert it into internal format: read.table once into
R and save using save()... This is much faster.
In my experience R is not so good at large data sets, where large is
roughly 10% of your RAM.


Re: [R] naive question

2004-06-29 Thread Duncan Murdoch
On Tue, 29 Jun 2004 16:59:58 -0700, Vadim Ogranovich
[EMAIL PROTECTED] wrote:

 R's IO is indeed 20 - 50 times slower than that of equivalent C code no
matter what you do, which has been a pain for some of us. 

Things like this shouldn't be a pain for long.  If C code works well,
why not use C?  It wouldn't be hard to write two C functions that 
1. counted the lines and 2. read them into preallocated vectors. 

Doing it this way you could use .C, you don't need to learn the
intricacies of .Call, and it should be about half the speed (since it
takes two passes) of fast C code, i.e. 10-25 times faster than the
read.* functions.

Then, if you felt really ambitious, you could write it in a way that
others could use, put it in a package, and suddenly R would have I/O
10-25 times faster than it does now.  You wouldn't try to make it as
flexible as current R code, but for reading these huge files people
are talking about, it would be worthwhile to go through a few extra
setup steps.  

Duncan Murdoch



Re: [R] naive question

2004-06-29 Thread rivin
 We need more details about your problem to provide any useful
 help.  Are all the variables numeric?  Are they all completely
 different?  Is it possible to use `colClasses'?

It is possible, but very inconvenient. There are mostly numeric columns,
but some integer categories, and some string names. The total number is
high, so doing this by hand would take several minutes as well, so a
different solution is preferable. I did use as.is=TRUE, but that did not
seem to make a huge difference.
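[Editor's note: one way to avoid typing 60 classes by hand (a sketch; the file name is hypothetical) is to let read.csv guess the types from a few rows and then reuse the guess for the full read.]

```r
## Read a handful of rows, capture the inferred column classes,
## then pass them to the full read so read.csv does not have to
## re-check types for every field in every row.
head5   <- read.csv("bigfile.csv", nrows = 5)
classes <- sapply(head5, class)
big     <- read.csv("bigfile.csv", colClasses = classes)
```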
 Also, having a couple of gigabytes of RAM is not necessarily
 useful if you're on a 32-bit OS since the total process size is
 usually limited to be less than ~3GB.

True. top shows that the maximal memory usage for the process is about
700MB, so process size was not a limitation (but had I 512Mb, the
thrashing would have killed me...)

 Believe it or not, complaints like these are not that common.
 1998 was a long time ago!

Alas...


 -roger

 Igor Rivin wrote:

 I have a 100Mb comma-separated file, and R takes several minutes to
 read it (via read.table()). This is R 1.9.0 on a linux box with a
 couple gigabytes of RAM. I am conjecturing that R is gc-ing, so maybe
 there is some command-line arg I can give it to convince it that I
 have a lot of space, or?!

 Thanks!

  Igor



 --
 Roger D. Peng
 http://www.biostat.jhsph.edu/~rpeng/



Re: [R] naive question

2004-06-29 Thread Peter Wilkinson

Also, having a couple of gigabytes of RAM is not necessarily useful if 
you're on a 32-bit OS since the total process size is usually limited to 
be less than ~3GB.
Well, 2^32 gives you more like 4GB; how much of that can be given to a
process? My highest workspace reached 1.2GB. I will add another gig
... or 2.

I am assuming that R can address more than 2GB of memory; does anybody know
if R has some other limitation that might be lower than the OS's?
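For what it's worth, R itself can report how much memory individual objects and the workspace are using; a small sketch with base functions (the exact sizes printed depend on platform and R version):

```r
x <- numeric(1e6)   # one million doubles, roughly 8 MB
object.size(x)      # memory used by this one object, in bytes
gc()                # garbage-collect and report current memory usage
# See ?Memory for the memory limits R itself imposes and how to raise them
```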

Peter


Re: [R] naive question

2004-06-29 Thread rivin
 Igor Rivin wrote:

 I was not particularly annoyed, just disappointed, since R seems like
 a much better thing than SAS in general, and doing everything with a
 combination of hand-rolled tools is too much work. However, I do need
 to work with very large data sets, and if it takes 20 minutes to read
 them in, I have to explore other options (one of which might be
 S-PLUS, which claims scalability as a major, er, PLUS over R).


 If you are routinely working with very large data sets it would be
 worthwhile learning to use a relational database (PostgreSQL, MySQL,
 even Access) to store the data and then access it from R with RODBC or
 one of the specialized database packages.
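A minimal sketch of that workflow with RODBC (the DSN name "mydb" and table name "bigtable" are placeholders; the details depend on your database and ODBC driver):

```r
library(RODBC)             # assumes an ODBC driver and DSN are already configured
ch <- odbcConnect("mydb")  # "mydb" is a hypothetical data source name
# Let the database do the subsetting; pull only what is needed into R:
small <- sqlQuery(ch, "SELECT * FROM bigtable WHERE year = 2004")
odbcClose(ch)
```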

I was thinking about that, but I had thought that this would help for
reading small pieces of the data (since subsetting would happen on the db
side), but not so much for reading big chunks. But it's certainly worth a
try.

 R is slow reading ASCII files because it is assembling the meta-data on
 the fly and it is continually checking the types of the variables being
 read.  If you know all this information and build it into your table
 definitions, reading the data will be much faster.

What do you mean by meta-data? Anyway, I agree that this would slow it
down, but I would suspect that even so there is a bit of room for
improvement, since five minutes for 12 million tokens comes out to about
40,000/second, which is really pretty bad on a 2-3 GHz machine...

 A disadvantage of this approach is the need to learn yet another
 language and system.  I was going to do an example but found I could not
  because I left all my SQL books at home (I'm travelling at the moment)
 and I couldn't remember the particular commands for loading a table from
  an ASCII file.

Well, I will look into it (among other possibilities).



Re: [R] naive question

2004-06-29 Thread Patrick Connolly
On Tue, 29-Jun-2004 at 10:31PM -0400, [EMAIL PROTECTED] wrote:

|  We need more details about your problem to provide any useful
|  help.  Are all the variables numeric?  Are they all completely
|  different?  Is it possible to use `colClasses'?
| 
| It is possible, but very inconvenient. There are mostly numeric columns,
| but some integer categories, and some string names. The total number is
| high, so doing this by hand would take several minutes as well, so a
| different solution is preferable. I did use as.is=TRUE, but that did not

For the lazy typist, here's an idea:

Make a small subset of the datafile (say the first 20 rows) and read
that in with read.table.

X <- read.table("blah.txt", header = TRUE, ...)

Xclasses <- sapply(X, class)

Now we have a nice long vector that you can use in your colClasses
argument with the whole data.  Even if it needs a bit of editing, it
will save you typing in all those numeric strings.
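Put together, the whole trick looks something like this ("blah.txt" stands in for the real file; the nrows argument keeps the probe read small):

```r
# Probe read: only the first 20 data rows, just to learn the column types
small    <- read.table("blah.txt", header = TRUE, nrows = 20)
Xclasses <- sapply(small, class)

# Full read: with colClasses known, read.table skips the type guessing
X <- read.table("blah.txt", header = TRUE, colClasses = Xclasses)
```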

HTH

-- 
Patrick Connolly
HortResearch
Mt Albert
Auckland
New Zealand 
Ph: +64-9 815 4200 x 7188
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~
I have the world`s largest collection of seashells. I keep it on all
the beaches of the world ... Perhaps you`ve seen it.  ---Steven Wright 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~



Re: [R] naive question

2004-06-29 Thread rivin
 On Tue, 29-Jun-2004 at 10:31PM -0400, [EMAIL PROTECTED] wrote:

 |  We need more details about your problem to provide any useful
 |  help.  Are all the variables numeric?  Are they all completely
 |  different?  Is it possible to use `colClasses'?
 |
 | It is possible, but very inconvenient. There are mostly numeric
 | columns, but some integer categories, and some string names. The
 | total number is high, so doing this by hand would take several
 | minutes as well, so a different solution is preferable. I did use
 | as.is=TRUE, but that did not

 For the lazy typist, here's an idea:

 Make a small subset of the datafile (say the first 20 rows) and read
 that in with read.table.

 X <- read.table("blah.txt", header = TRUE, ...)

 Xclasses <- sapply(X, class)

 Now we have a nice long vector that you can use in your colClasses
 argument with the whole data.  Even if it needs a bit of editing, it
 will save you typing in all those numeric strings.

 HTH

Aha! That could be the right trick. I will try it and see how it works...

  Thanks,

  Igor



[R] MacOS X binaries won't install

2004-06-29 Thread Ruben Solis
I've tried installing the MacOS X binaries for R available at:
http://www.bioconductor.org/CRAN/
I'm running MacOS X version 10.2.8.
I get a message indicating the installation is successful, but when I 
double-click on the R icon that shows up in my Applications folder, the 
application seems to try to open but closes immediately.

I looked for  /Library/Frameworks/R.framework (by typing ls 
/Library/Frameworks) and it does not appear.  A global search for 
R.framework yields no results, so it seems that the installation is not 
working. (I was going to try command line execution.)

Any help would be appreciated.  Thanks! - RSS
Ruben S. Solis


[R] gap() in SAGx package - How to handle this situation?

2004-06-29 Thread Taemyong Choi
gap(data,cluster)

Arguments:  data -  The data matrix
 cluster -  a vector describing the cluster memberships

 

When I use gap():

With a 300 * 40 data matrix it worked, but with a 40 * 300 data matrix
it does not work.

The error message is:

- Error in data %*% t(veigen) : non-conformable arguments

 

Gap Function

function (data = swiss, class = g, B = 500)
{
    library(mva)
    if (min(table(class)) == 1)
        stop("Singleton clusters not allowed")
    data <- as.matrix(data)
    temp1 <- log(sum(by(data, factor(class), intern <- function(x)
        sum(dist(x)/ncol(x))/2)))
    veigen <- svd(data)$v
    x1 <- data %*% t(veigen)    # <- Error Message
    ...

Example:

In the gap function, when we compute the singular value decomposition
of the data matrix X, X = U D V', where X has dimension 30*400, we
expect the following results: U: 30*30, D: 30*400, V: 400*400.

But the gap function gets U = 30*30, D = 30*400, V = 400*30.
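The mismatch is easy to reproduce: for an n * p matrix with n < p, R's svd() returns the "thin" decomposition, so $v has only n columns. A small check (the last line only verifies which product conforms dimensionally; it is not offered as a fix to the SAGx code):

```r
set.seed(1)
X <- matrix(rnorm(30 * 400), nrow = 30)   # 30 x 400, n < p
s <- svd(X)
dim(s$u)         # 30 x 30
length(s$d)      # 30 singular values
dim(s$v)         # 400 x 30 -- the "thin" V, not 400 x 400
# X %*% t(s$v) would be (30 x 400) %*% (30 x 400): non-conformable
dim(X %*% s$v)   # 30 x 30: this product does conform
```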

How to handle this error message? 

Thanks in advance.

Best regards.

 




[R] funny plotting

2004-06-29 Thread Karla Meurk
Hi, I just wanted to plot a boxplot with a nice curve going through it.
I thought this would be a simple task, but for some reason I can't get
the two graphs on the same page accurately. Enclosed is the code showing
the two plots separately and together.  I would have thought it should
work if I could use boxplot() and then an overlaid plot(), but it won't
accept the argument add=TRUE (which has worked for me in the past).

Thanks
Carla
P.S. please excuse the clumsy code!
#Section 2 Data Set particle
dial <- rbind(-1, -1, 0, 0, 0, 0, 1, 1, 1)
counts <- rbind(2, 3, 6, 7, 8, 9, 10, 12, 15)
particle <- as.data.frame(cbind(dial, counts), row.names = NULL)
names(particle) <- c("dial", "counts")
attach(particle)
pois.particle <- glm(counts ~ dial, family = poisson)
x <- seq(-2, 2, length = 20)
y <- predict(pois.particle, data.frame(dial = x), type = "response")

#Overlaying plots
x11()
boxplot(counts ~ dial,
        main = "Boxplot of counts for dial setting and poisson fit",
        ylim = c(0, 25))
lines(x, y)

#The separate plots
x11()
boxplot(counts ~ dial, ylim = c(0, 25))
x11()
plot(x, y, ylim = c(0, 25), type = "l")


RE: [R] funny plotting

2004-06-29 Thread Austin, Matt
Try something like

plot(x, y)
box.dat <- boxplot(x = split(counts, dial), plot = FALSE)
bxp(box.dat, add = TRUE, at = c(-1, 0, 1), show.names = FALSE)
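A self-contained version of that suggestion, rebuilding the data from the original post (at = c(-1, 0, 1) places the boxes at the actual dial values, which is why the fitted Poisson curve then lines up with them):

```r
dial   <- c(-1, -1, 0, 0, 0, 0, 1, 1, 1)
counts <- c(2, 3, 6, 7, 8, 9, 10, 12, 15)

fit <- glm(counts ~ dial, family = poisson)
x <- seq(-2, 2, length = 20)
y <- predict(fit, data.frame(dial = x), type = "response")

# Curve first (this sets up the user coordinates), then boxes at dial = -1, 0, 1
plot(x, y, type = "l", ylim = c(0, 25), xlab = "dial", ylab = "counts")
box.dat <- boxplot(split(counts, dial), plot = FALSE)
bxp(box.dat, add = TRUE, at = c(-1, 0, 1), show.names = FALSE)
```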
