Re: [R] R via ssh login on OS X?
On Jun 29, 2004, at 1:22 AM, Ulises Mora Alvarez wrote:

> Hi!
>
> If you are trying to log in from another Mac to the G5 there are some details to bear in mind, though. If you are indeed trying from a Mac, I'd suggest you launch your local X server; then, from an xterm, 'ssh -X...' to the G5. Of course, if the sshd on the G5 is configured so that its /etc/sshd_config says 'X11Forwarding no' you will not be able to use the X11 device for graphics; but you can search for a solution in the list archives.

That's good thinking. That hadn't occurred to me and would be great for the graphical stuff. Goes to show that X Windows has the right idea for networked graphics while Aqua is hopeless in that regard.

I don't think that this problem happens in R-1.9.1, because if I ssh into my laptop from a remote box as a non-logged-in user, R behaves perfectly on the command line. Or maybe the install on the G5 is fubared.

Happily I have managed to solve my immediate problem on the G5 by compiling a copy of R in my home directory. This wasn't the easiest, primarily because I didn't have f2c installed (and because I don't have root I couldn't put it in the normal place).

I'm going to say how I did it in case this is handy for others (frankly I hope others don't have to go through this ;)

I grabbed the f2c code from
ftp://netlib.bell-labs.com:21/netlib/f2c/src.tar
and libf2c from
http://www.netlib.org/f2c/libf2c.zip

Both built ok. I moved the f2c executable, f2c.h and libf2c.a into ~/f2c. Don't forget to run ranlib over libf2c.

I set the environment variables:

    LDFLAGS=-L$HOME/f2c/
    CPPFLAGS=-I$HOME/f2c/

(for some reason the --includedir just didn't seem to work ...)
(had to remember to do this before configure)

then did

    ./configure --prefix=$HOME/Rinstall/ --enable-R-framework=no --with-x=no --with-lapack=no

and then

    make

This basically worked, but for some reason lapack was still trying to build and that was failing, so I deleted it from the appropriate makefile and the rest of the compile went fine. The lapack confusion stopped some of the recommended modules from building, but I didn't need those (just sna, which built fine from CRAN).

I didn't do the actual install but am just using the full path to R. It is working fine in command-line mode now and the calculations are running as I type.

I didn't test this, but I did also read that people are able to get around the need for graphical launching access by using OS X fast user switching.

Thanks everyone!

--J

> Good luck.
>
> On Mon, 28 Jun 2004, Paul Roebuck wrote:
>
>> On Mon, 28 Jun 2004, James Howison wrote:
>>
>>> I have an ssh only login to a G5 on which I am hoping to run some analyses. The situation is complicated by the fact that the computer's owner is away for the summer (and thus also only has shell login).
>>>
>>> R is installed and there is a symlink to /usr/local/bin/R, but when I try to launch it I get:
>>>
>>> [EMAIL PROTECTED] R
>>> kCGErrorRangeCheck : Window Server communications from outside of session allowed for root and console user only
>>> INIT_Processeses(), could not establish the default connection to the WindowServer.Abort trap
>>>
>>> I thought, ah ha, I need to tell it not to use the GUI, but to no avail:
>>>
>>> [EMAIL PROTECTED] R --gui=none
>>> kCGErrorRangeCheck : Window Server communications from outside of session allowed for root and console user only
>>> INIT_Processeses(), could not establish the default connection to the WindowServer.Abort trap
>>>
>>> I'm embarrassed to say that I'm writing to the list without having the latest version installed---because I can't install it at the moment. I am using R 1.8.1. I have tried to compile the latest from source but there is no F77 compiler. I thought I'd ask around before going down the "put local dependencies in the home folder to compile this" route (any hints on doing that would be great though) ...
>>>
>>> Can other people get R command-line to work when logged in remotely via ssh? Any hints? Is this something that is fixed in more recent versions?
>>>
>>> I think I can see one other route: getting the computer's owner to install fink and their version remotely ... but I'm open to all "don't bother the professor when he's on holiday" options ...
>>
>> I suffered similarly attempting to run R via CGI; I never found a workaround for remote access (also running 1.8.1 with Panther). Seemed to have something to do with running an application requiring access to graphics but not being the user currently owning the dock. I did not determine if the limitation was due to R implementation or operating system software.
>>
>> F77 not necessary; use 'f2c' instead. But don't bother with Fink since it's not necessary to build it. No 'sudo' access either? Is the user still logged in (screenlocked) or are you just lacking administrative access?
>>
>> --
>> SIGSIG -- signature too long (core dumped)
>>
>> __ [EMAIL PROTECTED] mailing list
Re: [R] R via ssh login on OS X?
Did you look at the notes on MacOS X in the R-admin manual (as the INSTALL file asks)? That would have told you why lapack failed, and I think you should redo your build following the advice there.

On Tue, 29 Jun 2004, James Howison wrote:

[...]

> then did
>
> ./configure --prefix=$HOME/Rinstall/ --enable-R-framework=no --with-x=no --with-lapack=no

Note

    --with-blas='-framework vecLib' --with-lapack

is `strongly recommended', and on some versions of MacOS X `appears to be the only way to build R'.

> and then make
>
> This basically worked but for some reason lapack was still trying to build and that was failing, so I deleted it from the appropriate makefile and the rest of the compile went fine. The lapack confusion stopped some of the recommended modules from building but I didn't need those (just sna which built fine from CRAN).

[...]

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] camberra distance?
Hi!

It's not an R-specific question, but I had no idea where else to ask. Does anyone know the original reference for the CAMBERA DISTANCE?

Eryk.

PS: I know that it's an off-topic question (sorry). Can anyone recommend a mailing list where such questions are on topic?
RE: [R] camberra distance?
Thanks Mark. Yes, I mean canberra.

Searching Google for both canberra and camberra, I observed the following. Searching for caMberra you will find a paper from 1997 in which they write camberra instead of canberra dissimilarity for the measure defined as sum(|x_i - y_i| / |x_i + y_i|). Meanwhile there are plenty of articles on the net which reference this 1997 paper and write caMberra instead of canberra, maybe because it is much harder to find an article about canberra distance using Google (because of the city).

A rather strong argument for using distinctive names and for publishing papers in journals which are free, online and searchable by Google.

Eryk

*** REPLY SEPARATOR ***

On 29.06.2004 at 15:51 [EMAIL PROTECTED] wrote:

> Maybe you mean 'Canberra'? If so, it might have come from work at CSIRO in Canberra back in the 60's/70's. Look for Lance & Williams 1967, possibly: Aust. Comput. J. 1, 15-20.
>
> Mark Palmer
> Environmetrics Monitoring for Management
> CSIRO Mathematical and Information Sciences
> Private bag 5, Wembley, Western Australia, 6913
> Phone 61-8-9333-6293 Mobile 0427-50-2353 Fax: 61-8-9333-6121
> Email: [EMAIL PROTECTED] URL: www.cmis.csiro.au/envir
>
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Wolski
> Sent: Tuesday, 29 June 2004 3:45 PM
> To: R Help Mailing List
> Subject: [R] camberra distance?
>
>> Hi!
>>
>> It's not an R-specific question, but I had no idea where else to ask. Does anyone know the original reference for the CAMBERA DISTANCE?
>>
>> Eryk.
>>
>> PS: I know that it's an off-topic question (sorry). Can anyone recommend a mailing list where such questions are on topic?
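For readers who want to check the definition quoted in this thread against R itself, here is a minimal sketch. The helper name `canberra` and the example vectors are made up for illustration; `dist(..., method = "canberra")` is the built-in from the stats package (see ?dist). The sketch assumes non-negative data with no pair of values that are both zero, which is the case where the hand-written sum and the built-in should agree.

```r
# Hypothetical helper: the dissimilarity as written in the thread,
# sum(|x_i - y_i| / |x_i + y_i|). Assumes x_i + y_i is never zero.
canberra <- function(x, y) {
  sum(abs(x - y) / abs(x + y))
}

# Made-up, non-negative example vectors
x <- c(1, 2, 3)
y <- c(2, 2, 5)

by_hand  <- canberra(x, y)
built_in <- as.numeric(dist(rbind(x, y), method = "canberra"))

print(c(by_hand = by_hand, built_in = built_in))
```

For data with mixed signs or with zero/zero pairs the two may differ, so ?dist is the place to check the exact convention of the installed R version.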
Re: [R] strucchange-esque inference for glms ?
Alexis:

> according to the strucchange package .pdf, "all procedures in this package are concerned with testing or assessing deviations from stability in the classical linear regression model."
>
> i'd like to test/assess deviations from stability in the Poisson model.
>
> is there a way to modify the strucchange package to suit my purposes, or should i be using another package, or is this a tough nut to crack? :)

As of version 1.2-0 strucchange supports tests for parameter instability in much more general models including GLMs. A simple example would be

    R> library(strucchange)
    R> data(BostonHomicide)
    R> mcus <- gefp(homicides ~ population, family = poisson, fit = glm,
                    data = BostonHomicide, vcov = kernHAC)
    R> plot(mcus)
    R> sctest(mcus)

See our technical report "Generalized M-fluctuation tests for Parameter Instability" (linked from my web page) for the theory behind it.

> my application is detecting the onset of a flu outbreak as new daily data trickles in each morning from local hospitals. seems to me like the same sort of inferential goal that strucchange refers to as "monitoring of structural change."

In principle the theory established in the report above could also be applied to monitoring, but I have neither worked the theory out nor implemented a function which could handle monitoring in GLMs. But you can contact me off-list if you are interested in this.

Best,
Achim
[R] Several PCA questions...
Hi, I am doing PCA on several columns of data in a data.frame.

I am interested in particular rows of data which may have a particular combination of 'types' of column values (without any pre-conception of what they may be).

I do the following...

    # My data table.
    allDat <- read.table("big_select_thresh_5", header=1)

    # Where some rows look like this...
    # PDB   SUNID1  SUNID2  AA  CH  IPCA  PCA  IBB  BB
    # 3sdh  14984   14985   6   10  24    24   93   116
    # 3hbi  14986   14987   6   10  20    22   94   117
    # 4sdh  14988   14989   6   10  20    20   104  122
    # NB First three columns = row ID, last 6 = variables

    attach(allDat)

    # My columns of interest (variables).
    part <- data.frame(AA,CH,IPCA,PCA,IBB,BB)

    pc <- princomp(part)

    plot(pc)

The above plot shows that 95% of the variance is due to the first 'Component' (which I assume is AA).

i.e. All the variables behave in pretty much the same way.

I then did ...

    biplot(pc)

Which showed some outliers with a numeric ID - How do I get back my old 3 part ID used in allDat?

In the above plot I saw all the variables (correctly named) pointing in more or less the same direction (as shown by the variance). I then did the following...

    postscript(file="test.ps", paper="a4")
    biplot(pc)
    dev.off()

However, looking at test.ps shows that the arrows are missing (using ggv)... Hmmm, they come back when I pstoimg then xv... never mind.

Finally, I would like to make a contour plot of the above biplot, is this possible? (or even a good way to present the data?)

Thanks very much for any feedback,
Dan.
Re: [R] camberra distance?
On Tue, 29 Jun 2004, Wolski wrote:

> Hi!
>
> It's not an R-specific question, but I had no idea where else to ask. Does anyone know the original reference for the CAMBERA DISTANCE?
>
> Eryk.
>
> PS: I know that it's an off-topic question (sorry). Can anyone recommend a mailing list where such questions are on topic?

sci.stat.consult (applied statistics and consulting) and sci.stat.math (mathematical stat and probability).

Both are newsgroups.
Re: [R] A strange question on probability
Does the following do what you want:

    rseq <- function(n=1, length.=2){
      s1 <- sample(x=length., size=n, replace=TRUE)
      s2 <- sample(x=length., size=n, replace=TRUE)
      ranseq <- array(0, dim=c(n, length.))
      for(i in 1:n)
        ranseq[i, s1[i]:s2[i]] <- 1
      ranseq
    }

    set.seed(1)
    rseq(9, 5)
          [,1] [,2] [,3] [,4] [,5]
     [1,]    1    1    0    0    0
     [2,]    0    1    0    0    0
     [3,]    1    1    1    0    0
     [4,]    0    0    0    1    1
     [5,]    0    1    0    0    0
     [6,]    0    0    0    1    1
     [7,]    0    0    1    1    1
     [8,]    0    0    0    1    0
     [9,]    0    0    0    1    1

hope this helps.
spencer graves

Jim Lemon wrote:

> On Tuesday 29 June 2004 01:48 pm, Steve S wrote:
>
>> Dear All,
>>
>> I wonder if there is a probability distribution where you can specify when a certain event starts and finishes within a fixed period? For example I might specify the number of periods to be 5, and a random vector from this distribution might give me:
>>
>> 0 1 1 1 0
>>
>> where the 1s are always adjacent to each other?
>>
>> This can never happen: 0 0 1 0 1 for example.
>
> Well, I'll have a go. Let's call it the start-finish distribution. We have n (number of periods) and d (duration). As there must be an off observation (otherwise we don't know the duration), it's fairly easy to enumerate the outcomes for a given n:
>
> d     start(s)   finish(f)   count
> 1     1:(n-1)    2:n         n-1
> 2     1:(n-2)    3:n         n-2
> ...
> n-1   1          n           1
>
> Assuming that all outcomes are equally likely, the total number of outcomes is:
>
> n(n-1)/2
>
> thus the probability of a given d occurring is:
>
> P[d|n] = 2(n-d)/(n(n-1))
>
> The probabilities of s and f over all d mirror each other over the values k in 1:n. Since n-k outcomes start at k and k-1 outcomes finish at k,
>
> P[s=k|n] = 2(n-k)/(n(n-1))
> P[f=k|n] = 2(k-1)/(n(n-1))
>
> giving, I think, a monotonic function for s and f.
>
>> My apology for this strange question!
>
> My apology if this is no use at all.
>
> Jim
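The counting argument in the quoted reply can be checked by brute force. The sketch below is an illustration only: it assumes, as the reply does, that an outcome is a (start, finish) pair with start < finish and that all such pairs are equally likely; the variable names are made up here. It enumerates the pairs for n = 5 and compares the enumerated distribution of the duration d with 2(n-d)/(n(n-1)).

```r
# Enumerate every (start, finish) pair with start < finish for n periods
n <- 5
outcomes <- subset(expand.grid(s = 1:n, f = 1:n), s < f)
outcomes$d <- outcomes$f - outcomes$s

# Distribution of the duration under equally likely outcomes
p_d <- table(outcomes$d) / nrow(outcomes)

# Closed form from the thread: P[d | n] = 2(n - d) / (n(n - 1))
d <- 1:(n - 1)
p_d_formula <- 2 * (n - d) / (n * (n - 1))

print(rbind(enumerated = as.numeric(p_d), formula = p_d_formula))

# Marginal distributions of the start and finish positions
print(table(outcomes$s) / nrow(outcomes))
print(table(outcomes$f) / nrow(outcomes))
```

Changing `n` re-runs the same check for other period lengths; the two tabulated marginals each sum to one.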
Re: [R] Several PCA questions...
On 06/29/04 11:04, Dan Bolser wrote:

> Hi, I am doing PCA on several columns of data in a data.frame.
>
> I am interested in particular rows of data which may have a particular combination of 'types' of column values (without any pre-conception of what they may be).
>
> I do the following...
>
> # My data table.
> allDat <- read.table("big_select_thresh_5", header=1)
>
> # Where some rows look like this...
> # PDB   SUNID1  SUNID2  AA  CH  IPCA  PCA  IBB  BB
> # 3sdh  14984   14985   6   10  24    24   93   116
> # 3hbi  14986   14987   6   10  20    22   94   117
> # 4sdh  14988   14989   6   10  20    20   104  122
> # NB First three columns = row ID, last 6 = variables
>
> attach(allDat)
>
> # My columns of interest (variables).
> part <- data.frame(AA,CH,IPCA,PCA,IBB,BB)
>
> pc <- princomp(part)
>
> plot(pc)
>
> The above plot shows that 95% of the variance is due to the first 'Component' (which I assume is AA).

No. It is the first principal component, which is some linear combination of all the variables. Try loadings(pc). It sounds like you need to read up on principal component analysis.

> i.e. All the variables behave in pretty much the same way.
>
> I then did ...
>
> biplot(pc)
>
> Which showed some outliers with a numeric ID - How do I get back my old 3 part ID used in allDat?

The numeric ID is taken from the row names of pc. So, if the IDs in question are 3 and 5, then

    allDat[c(3,5),]

should work.

> In the above plot I saw all the variables (correctly named) pointing in more or less the same direction (as shown by the variance). I then did the following...
>
> postscript(file="test.ps", paper="a4")
> biplot(pc)
> dev.off()
>
> However, looking at test.ps shows that the arrows are missing (using ggv)... Hmmm, they come back when I pstoimg then xv... never mind.

I get red arrows for the components in both the original graph and the ps output (R 1.9.1, Fedora Core 2). This may be a platform-specific problem or one specific to ggv. I have neither ggv nor pstoimg. (But xv and gv both work.)

> Finally, I would like to make a contour plot of the above biplot, is this possible? (or even a good way to present the data?)

No idea how to do this or why you would want it.

Jon
--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
R search page: http://finzi.psych.upenn.edu/
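The point about loadings can be seen on any data set. The sketch below uses the built-in `USArrests` data purely as a stand-in (the poster's file is not available, so nothing here reproduces the original analysis). It prints the loadings, which show each component as a weighted combination of all variables, then the share of variance per component, and finally looks up rows by the position numbers a biplot would show.

```r
# Built-in data set used purely for illustration (not the poster's data)
pc <- princomp(USArrests)

# Each column of the loadings matrix holds the weights that define one
# component as a linear combination of all the original variables.
load_mat <- unclass(loadings(pc))
print(round(load_mat, 3))

# Proportion of variance explained by each component
prop_var <- pc$sdev^2 / sum(pc$sdev^2)
print(round(prop_var, 3))

# Rows referred to by number can be looked up by position
print(USArrests[c(3, 5), ])
```

A first component that carries nearly all the variance in an unscaled analysis often just reflects the variable with the largest spread, which is why looking at the loadings is worthwhile before interpreting it.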
Re: [R] Several PCA questions...
On Tue, 29 Jun 2004, Dan Bolser wrote:

> Hi, I am doing PCA on several columns of data in a data.frame.
>
> I am interested in particular rows of data which may have a particular combination of 'types' of column values (without any pre-conception of what they may be).
>
> I do the following...
>
> # My data table.
> allDat <- read.table("big_select_thresh_5", header=1)
>
> # Where some rows look like this...
> # PDB   SUNID1  SUNID2  AA  CH  IPCA  PCA  IBB  BB
> # 3sdh  14984   14985   6   10  24    24   93   116
> # 3hbi  14986   14987   6   10  20    22   94   117
> # 4sdh  14988   14989   6   10  20    20   104  122
> # NB First three columns = row ID, last 6 = variables
>
> attach(allDat)
>
> # My columns of interest (variables).
> part <- data.frame(AA,CH,IPCA,PCA,IBB,BB)
>
> pc <- princomp(part)

Do you really want an unscaled PCA on that data set? Looks unlikely (but then two of the columns are constant in the sample, which is also worrying).

> plot(pc)
>
> The above plot shows that 95% of the variance is due to the first 'Component' (which I assume is AA).

No, it is the first (principal) component. You did ask for PCA!

> i.e. All the variables behave in pretty much the same way.

Or you failed to scale the data so one dominates.

> I then did ...
>
> biplot(pc)
>
> Which showed some outliers with a numeric ID - How do I get back my old 3 part ID used in allDat?

Set row names on your data frame. Like almost all of R, it is the row names of a data frame that are used for labelling, and you did not give any so you got numbers.

> In the above plot I saw all the variables (correctly named) pointing in more or less the same direction (as shown by the variance). I then did the following...
>
> postscript(file="test.ps", paper="a4")
> biplot(pc)
> dev.off()
>
> However, looking at test.ps shows that the arrows are missing (using ggv)... Hmmm, they come back when I pstoimg then xv... never mind.

So ggv is unreliable, perhaps cannot cope with colours?

> Finally, I would like to make a contour plot of the above biplot, is this possible? (or even a good way to present the data?)

What do you propose to represent by the contours? Biplots have a well-defined interpretation in terms of distances and angles.

--
Brian D. Ripley
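Both suggestions in this reply, scaling the variables and setting row names, are easy to try out. The sketch below is illustrative only: `USArrests` stands in for the poster's table, the small `ids` data frame copies two ID rows from the post, and the helper `share` and the underscore-joined label format are choices made here, not anything from the thread.

```r
# Illustrative data only: USArrests stands in for the poster's table
dat <- USArrests

unscaled <- princomp(dat)              # covariance matrix: large-variance columns dominate
scaled   <- princomp(dat, cor = TRUE)  # correlation matrix: variables standardised

# Hypothetical helper: share of total variance carried by the first component
share <- function(p) unname(p$sdev[1]^2 / sum(p$sdev^2))
print(c(unscaled = share(unscaled), scaled = share(scaled)))

# biplot() labels points with the row names of the data. A three-part ID
# could be pasted together from the ID columns, for example:
ids <- data.frame(PDB = c("3sdh", "3hbi"),
                  SUNID1 = c(14984, 14986),
                  SUNID2 = c(14985, 14987))
labels <- paste(ids$PDB, ids$SUNID1, ids$SUNID2, sep = "_")
print(labels)
# With real data one would then do: rownames(part) <- labels
```

Note that, according to ?princomp, `cor = TRUE` can only be used when no variable is constant, so constant columns would have to be dropped first.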
Re: [R] camberra distance?
You may find it easier to search for `canberra distance', if that is really what you intend (and your subject line and text differ in spelling anyway). See ?dist, and Google results for `cambera distance', which both show that this is a fairly common misspelling of the capital of Australia and suggest the correct spelling.

On Tue, 29 Jun 2004, Dan Bolser wrote:

> On Tue, 29 Jun 2004, Wolski wrote:
>
>> Hi!
>>
>> It's not an R-specific question, but I had no idea where else to ask. Does anyone know the original reference for the CAMBERA DISTANCE?
>>
>> Eryk.
>>
>> PS: I know that it's an off-topic question (sorry). Can anyone recommend a mailing list where such questions are on topic?
>
> sci.stat.consult (applied statistics and consulting) and sci.stat.math (mathematical stat and probability)
>
> Both news groups.

I do think readers of such groups (and this one) expect Google etc. to be used first.

--
Brian D. Ripley
Re: [R] R via ssh login on OS X?
My office G5 running R-devel has no problem with remote logins, either mine or my students', so I doubt there is something fatally flawed in either the OS or R that is a problem. X11Forwarding is off by default so this does need to be changed, I believe. I might add, just for a moment of schadenfreude, that Stata's Mac version does seem to make it impossible to run remotely even though their other unix versions are happy to do so.

url:   www.econ.uiuc.edu/~roger    Roger Koenker
email  [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820

On Jun 29, 2004, at 2:04 AM, James Howison wrote:

> On Jun 29, 2004, at 1:22 AM, Ulises Mora Alvarez wrote:
>
>> Hi!
>>
>> If you are trying to log in from another Mac to the G5 there are some details to bear in mind, though. If you are indeed trying from a Mac, I'd suggest you launch your local X server; then, from an xterm, 'ssh -X...' to the G5. Of course, if the sshd on the G5 is configured so that its /etc/sshd_config says 'X11Forwarding no' you will not be able to use the X11 device for graphics; but you can search for a solution in the list archives.
>
> That's good thinking. That hadn't occurred to me and would be great for the graphical stuff. Goes to show that X Windows has the right idea for networked graphics while Aqua is hopeless in that regard.
>
> I don't think that this problem happens in R-1.9.1, because if I ssh into my laptop from a remote box as a non-logged-in user, R behaves perfectly on the command line. Or maybe the install on the G5 is fubared.
>
> Happily I have managed to solve my immediate problem on the G5 by compiling a copy of R in my home directory. This wasn't the easiest, primarily because I didn't have f2c installed (and because I don't have root I couldn't put it in the normal place).
Re: [R] Several PCA questions...
Thanks Jonathan and Brian for the advice; for all but the last point I will do more background reading. To come back to the last point...

> Finally, I would like to make a contour plot of the above biplot, is this possible? (or even a good way to present the data?)

Brian said:
> What do you propose to represent by the contours? Biplots have a well-defined interpretation in terms of distances and angles.

Jonathan said:
> No idea how to do this or why you would want it.

Basically I would like to make a 2d smoothed density, represented as a contour plot. I would like to do this as a crude visual clustering of my data points, i.e. instead of representing data by the row labels in the biplot, I would like to see just a single dot for each data point. Then I would like to see only the density of these points in 2d (hence contours).

For example...

    x <- rnorm(1000)
    y <- rnorm(1000)
    plot(x,y)

    library(MASS)
    z <- kde2d(x,y)
    contour(z)

I imagine the above in the context of my biplot, and I would like to see peaks which represent clusters of data points in 2d. However, I don't know how to get x,y coords from the pc object or the biplot function.

Thanks again for the other tips, I need to read more!

I will just throw one final question out there (perhaps to further highlight my ignorance). I thought that a significant factor in my data was the relative magnitude of the different variables, so I thought about making a new variable for each distinct pair of existing variables, and setting that new (pairwise) variable to 1 or 0 depending on the relative magnitude of the two component variables. Then I do PCA (or some other clustering (or a simple grouping)) on this new set of variables, and hey presto, I have the classes of my data points. Just an idea. Any good?

Cheers,
Dan.
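One way to get x,y coordinates, sketched here under assumptions: a `princomp` fit stores the observations' coordinates on the components in its `scores` element, so the first two columns can be passed to `kde2d` from MASS. The built-in `USArrests` data stands in for the poster's table, and the grid size `n = 50` is an arbitrary choice. Note that `biplot()` rescales its axes, so these raw scores will not line up exactly with the biplot coordinates.

```r
library(MASS)  # for kde2d(); MASS ships with standard R distributions

# Illustrative data only: USArrests stands in for the poster's table
pc <- princomp(USArrests, cor = TRUE)

# Coordinates of each observation on the first two components
xy <- pc$scores[, 1:2]

# Smoothed 2-d density of the points in that plane
dens <- kde2d(xy[, 1], xy[, 2], n = 50)

# One dot per observation with density contours drawn on top
plot(xy, pch = 20, xlab = "Comp.1", ylab = "Comp.2")
contour(dens, add = TRUE)
```

Peaks in the contours then mark regions where observations concentrate, which is only a rough visual guide and not a substitute for a proper clustering method.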
[R] discrete hazard rate analysis
Dear R users,

I have more of a statistical/econometric question than a straight R one. I have a data set with the discrete hazard rates of small-firm survival in 400 counties over a period of 9 years. The data were generated using census information from the VAT registration number of each of these businesses.

I would like to analyze the effect of regional factors (deprivation index, real wages, average schooling, population density, etc.) on the variation of these hazard rates across counties over time.

I've done a search of the economic literature on firm survival and regional economics, but I could not find any reference that resembles this type of data or the problem that I would like to explore.

I would like to know if anyone on the list can suggest references that I might have missed in economics, or if people in other fields know of work on data that resemble these (I don't know, but maybe epidemiology has regional-level data that raise similar issues).

Of course I would also like to know which R commands could assist me in this analysis. Any suggestions will be much appreciated.
All the very best,
JP

County  Region  time       1993  1994  1995  1996  1997  1998  1999  2000  2001
a       South    6 months  95.0  95.1  95.5  95.7  96.8  96.8  96.9  95.9  98.0
a       South   12 months  87.1  87.1  89.7  89.6  92.4  91.7  92.6  90.2  93.9
a       South   18 months  79.5  79.8  83.3  83.0  86.4  86.7  85.8  84.7
a       South   24 months  73.3  73.1  77.8  78.0  80.6  81.0  79.6  79.2
a       South   30 months  68.0  67.3  72.8  72.3  75.8  75.1  74.1
a       South   36 months  63.7  62.9  69.0  67.1  70.8  68.7  68.5
a       South   42 months  59.3  59.1  65.4  62.6  65.8  64.2
a       South   48 months  56.1  56.2  61.6  59.1  61.2  59.6
b       South    6 months  94.2  96.0  96.3  96.7  96.5  97.0  97.0  96.1  97.1
b       South   12 months  87.2  89.1  90.6  90.5  91.4  91.8  92.1  91.3  92.8
b       South   18 months  79.9  82.0  84.2  84.5  85.8  85.8  86.3  86.2
b       South   24 months  73.9  75.9  78.1  79.0  80.5  79.8  80.8  80.4
b       South   30 months  68.2  70.0  73.2  74.2  75.6  74.3  75.8
b       South   36 months  64.0  65.4  69.0  70.3  70.4  69.6  71.0
b       South   42 months  60.2  60.8  65.4  66.0  66.1  64.9
b       South   48 months  56.6  57.5  62.0  61.7  61.7  61.1
c       South    6 months  93.2  94.0  94.6  95.6  95.7  95.8  95.9  96.6  97.2
c       South   12 months  84.5  85.8  87.8  89.1  89.6  89.8  90.8  91.6  92.7
c       South   18 months  77.2  78.9  80.7  83.3  84.1  83.8  84.1  86.7
c       South   24 months  69.8  72.8  75.1  77.2  78.1  78.0  78.7  80.7
c       South   30 months  63.8  66.3  69.3  71.9  72.9  73.0  73.3
c       South   36 months  59.4  61.7  65.3  67.8  68.5  68.2  68.3
c       South   42 months  55.8  57.3  60.9  63.7  64.0  63.6
c       South   48 months  52.4  53.7  57.6  60.3  59.9  59.0
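As a possible first step in R, the tabulated figures can be turned into interval-specific hazards. This sketch rests on an assumption that the post does not confirm: that each value is the percentage of firms registered in the column year that are still trading at the given age. Under that reading the discrete hazard for an interval is h_t = 1 - S_t / S_{t-1}. The variable names are made up for illustration, and only county a's 1993 column is used.

```r
# County "a", 1993 column, copied from the posted table.
# ASSUMPTION: values are the percentage of firms still trading at each age.
months   <- seq(6, 48, by = 6)
surv_pct <- c(95.0, 87.1, 79.5, 73.3, 68.0, 63.7, 59.3, 56.1)

# Survival at age 0 is 100% by definition
S <- c(100, surv_pct) / 100

# Discrete hazard for each 6-month interval: h_t = 1 - S_t / S_{t-1}
hazard <- 1 - S[-1] / S[-length(S)]

print(data.frame(months, survival = surv_pct, hazard = round(hazard, 4)))
```

Repeating this per county and cohort would give a long-format data set of interval hazards that could then be related to the regional covariates, for example with a binomial-family model; which model is appropriate is a substantive question this sketch does not settle.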
Re: [R] R client connection OLAP cube (SQL Analysis Services / PivotTable Service)
Dear Olivier,

I believe your best bet may be to connect to the database through some kind of R-COM connection (either Thomas Baier and Erich Neuwirth's R-(D)COM in CRAN or Duncan Temple Lang's at http://www.omegahat.org/RDCOMClient). For instance, the ADO MD (ActiveX Data Objects Multi-dimensional) COM object/library allows you to connect to the OLAP database and query multiple cubes, their axes and hierarchies, etc. See the Microsoft Developer Network (MSDN) for the gory details.

Hope this helps,

-- David

Olivier Collignon wrote:

> I have been doing data analysis/modeling in R, connecting to SQL databases with RODBC (winXP client with R1.9.0 and win2k SQL server 2000).
>
> I am now trying to leverage some of the OLAP features to keep the data intensive tasks on the DB server side and only keep the analytical tasks within R (optimize use of memory). Is there any package that would allow connecting to OLAP cubes (as a client to SQL Analysis Services PivotTable Service) from an R session (similar to the RODBC package)?
>
> Can this be done directly with special connection parameters (from R) through the RODBC package, or do I have to set up an intermediary XL table (pivottable linked to the OLAP cube) and then connect to the XL data source from R?
>
> I would appreciate any reference / pointer to OLAP/R configuration instructions.
>
> Thanks.
>
> Olivier Collignon
[R] gambling problem
Hi all,

I have an interesting project that I have been working on. I intended to set this as a first-year programming problem but then changed my mind, since I thought that it might be too difficult for them to program. The problem is as follows:

You have been approached by a local casino in order to investigate the performance of one of their slot machines. The slot machine consists of three independently operating reels on which one of six different symbols can appear. The symbols are hearts (H), diamonds (D), spades (S), clubs (C), a joker (J) and a castle (Ca). People bet 1 unit at a time in order to play the game and are paid out according to the arrangement of the three reel symbols. Each reel consists of a number of different tiles. For example, the first reel contains 40 tiles. The second has 40 tiles and the third has 50 tiles. Each time the game is played, each of the reels spins so that 1 of the 40 tiles (for reel 1, and similarly for the other reels) will appear. The numbers of tiles on each of the reels are shown below: (I haven't included these here, but they are in the code below.)

* I've written three functions that will solve the above problem. The code is attached below. The code is very fast, but I would like to improve the speed by not using loops. Is that possible?

* Another question: in the function called GAMBLING, I use the following:

    a <- COUNTER(reelpic,nreels,countcombs,payoffcombs,payoff,bet)
    countcombs <- a$countcombs
    payoff <- a$payoff
    payoffvec[i] <- payoff

  in order to count the number of times we get each of the payoff symbol combinations (i.e. H, HH, HHH, D, DD, DDD, ... J, JJ, JJJ, Ca, CaCa, CaCaCa). countcombs is a vector that contains the counts of the various payoff symbol combinations. The function COUNTER calculates these values (i.e. it basically just adds one if a particular combination occurs) and returns them as part of a list object.

  Is there a way of declaring PUBLIC VARIABLES (as allowed by VBA) that allows one to know the value of different variables as calculated in different functions without using the method that I used, i.e. using the value countcombs <- a$countcombs after calling a <- COUNTER(reelpic,nreels,countcombs,payoffcombs,payoff,bet)?

* Another question: in the function RUNDIST I used the following 2 lines:

    zrung <- GAMBLING(nsim=150)
    z[p] <- cumsum(zrung$payoffvec)[150]

  Initially the second of these lines was set as

    z[p] <- cumsum(zrung$payoffvec)[nsim]

  which caused an error. Why does this happen?

Sorry for the extremely long email, but any help would be much appreciated.

regards
Allan

The following functions are attached:

* GAMBLING - this function simulates the basic game as stated above
* COUNTER - this function calculates the number of times each of the various payoff combinations occurs
* FORMATCOMBSPDF - this function creates a table of the simulated pdf of the payoff combinations
* RUNDIST - allows one to generate a distribution of a gambler's total payoff after playing the game 150 times.
#GAMBLING - this function simulates the basic game as stated above
GAMBLING <- function(nsim)
{
  #denote hearts = 1, diamonds = 2, spades = 3, clubs = 4, joker = 5, castle = 6
  time1 <- Sys.time()

  #the upper level of the pdf
  uplimit1 <- c(0.14,0.24,0.30,0.40,0.50,1)
  uplimit2 <- c(0.14,0.30,0.44,0.50,0.56,1)
  uplimit3 <- c(0.16,0.30,0.36,0.50,0.56,1)

  payoffcombs <- matrix(c(2,6,34,2,8,48,2,13,211,2,21,127,2,19,296,0,0,0),nrow=18,ncol=1)
  bet <- 1
  t <- matrix(data=0,nrow=length(uplimit1),ncol=3)
  reelpic <- matrix(data=0,nrow=1,ncol=3)
  countcombs <- matrix(data=0,nrow=length(payoffcombs),ncol=1)
  payoffvec <- matrix(data=0,nrow=nsim,ncol=1)
  nreels <- 3
  payoff <- 0
  uplimit <- matrix(c(uplimit1,uplimit2,uplimit3),ncol=length(uplimit1),nrow=nreels,byrow=TRUE)

  #the loop over the simulation counter
  for (i in 1:nsim)
  {
    #the loop over the reels
    for (j in 1:nreels)
    {
      unif <- runif(n=1, min=0, max=1)
      #print(unif)

      #the loop over the probability categories
      for (k in 1:length(uplimit1))
      {
        if (unif <= uplimit[j,k])
        {
          #counts up the number of times we get each of the symbols
          t[k,j] = t[k,j]+1
          #the reel picture generated
          reelpic[1,j] <- k
          break
        }# endif
      }# next k
    }# next j

    a <- COUNTER(reelpic,nreels,countcombs,payoffcombs,payoff,bet)
    countcombs <- a$countcombs
    payoff <- a$payoff
    payoffvec[i] <- payoff
  }# next i

  totals <- apply(t,2,sum)
  pdf <- sweep(t,2,totals,"/")
  combspdf <- sweep(countcombs,1,nsim,"/")
  b <- FORMATCOMBSPDF(combspdf)
  time2 <- Sys.time()
  timer <- time2-time1
  aa <- paste("THE OUTPUT LIST: $COMBSPDF,
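On the question of avoiding loops, one possible approach is sketched below. It is an illustration under assumptions, not a rewrite of the posted functions: all spins for a reel are drawn in a single call to sample(), with per-symbol probabilities recovered from the cumulative limits used in GAMBLING. The names probs and spin_reels are invented for this sketch, and the payoff counting done by COUNTER is not covered.

```r
# Cumulative upper limits for the three reels, copied from GAMBLING.
uplimit <- rbind(c(0.14, 0.24, 0.30, 0.40, 0.50, 1),
                 c(0.14, 0.30, 0.44, 0.50, 0.56, 1),
                 c(0.16, 0.30, 0.36, 0.50, 0.56, 1))

# Per-symbol probabilities: differences of the cumulative limits (one row per reel).
probs <- t(apply(uplimit, 1, function(u) diff(c(0, u))))

# Hypothetical helper: returns an nsim x nreels matrix of symbol codes
# (1 = hearts ... 6 = castle). Intended for nsim > 1.
spin_reels <- function(nsim, probs) {
  sapply(seq_len(nrow(probs)), function(j)
    sample(ncol(probs), size = nsim, replace = TRUE, prob = probs[j, ]))
}

set.seed(1)
reelpic <- spin_reels(150, probs)

# Symbol counts per reel, the analogue of the matrix t in GAMBLING.
counts <- apply(reelpic, 2, tabulate, nbins = 6)
```

Each row of reelpic is one play of the game, so the payoff lookup could then be applied row by row or, with some care, through table-based indexing.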
[R] binding rows from different matrices
Hello list,

I have 3 matrices with the same dimensions:

veca=matrix(1:25,5,5)
vecb=matrix(letters[1:25],5,5)
vecc=matrix(LETTERS[1:25],5,5)

I would like to obtain a new matrix composed of alternating rows of these different matrices (row 1 of mat 1, row 1 of mat 2, row 1 of mat 3, row 2 of mat 1, ...). I have found a solution, but it is not very pretty, and I wonder if I can do it another way (perhaps with apply)?

res=matrix(0,1,5)
for(i in 1:5)
  res=rbind(res,veca[i,],vecb[i,],vecc[i,])
res=res[-1,]
res

      [,1] [,2] [,3] [,4] [,5]
 [1,] 1    6    11   16   21
 [2,] a    f    k    p    u
 [3,] A    F    K    P    U
 [4,] 2    7    12   17   22
 [5,] b    g    l    q    v
 [6,] B    G    L    Q    V
 [7,] 3    8    13   18   23
 [8,] c    h    m    r    w
 [9,] C    H    M    R    W
[10,] 4    9    14   19   24
[11,] d    i    n    s    x
[12,] D    I    N    S    X
[13,] 5    10   15   20   25
[14,] e    j    o    t    y
[15,] E    J    O    T    Y

Thanks in advance!

Stéphane DRAY
--
Département des Sciences Biologiques
Université de Montréal, C.P. 6128, succursale centre-ville
Montréal, Québec H3C 3J7, Canada
Tel : 514 343 6111 poste 1233
E-mail : [EMAIL PROTECTED]
--
Web http://www.steph280.freesurf.fr/
[R] give PAM my own medoids
Hello,

When using PAM (partitioning around medoids), I would like to skip the build step and give the function my own medoids. Do you know if it is possible, and how?

Thank you very much.
Isabel
[R] [ cor(x, y, use = "all.obs", method = c("spearman")) ]
Hello,

I would like to know how cor() handles ranks when some ranks are tied (ex aequo). Does it use the Spearman correlation coefficient with the tie-corrected formula?

Thanks
--
Sebastien MORETTI
Linux User - #327894
CNRS - IGS
31 chemin Joseph Aiguier
13402 Marseille cedex 20, FRANCE
tel. +33 (0)4 91 16 44 55
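A quick way to check how ties are treated, offered as a sketch rather than as a statement about the internals: in current versions of R, cor(..., method = "spearman") agrees with the Pearson correlation of the midranks returned by rank(), where tied values share the average rank. The data and the variable names below are invented for the illustration.

```r
x <- c(1, 2, 2, 3, 5, 5, 5, 8)
y <- c(2, 1, 4, 4, 6, 7, 7, 9)

# rank() gives tied values the average of the ranks they span (midranks).
rx <- rank(x)
ry <- rank(y)

rho_builtin <- cor(x, y, method = "spearman")
rho_midrank <- cor(rx, ry)   # Pearson correlation of the midranks

# The uncorrected textbook shortcut, exact only when there are no ties.
n <- length(x)
rho_shortcut <- 1 - 6 * sum((rx - ry)^2) / (n * (n^2 - 1))
```

Here rho_builtin and rho_midrank coincide, while rho_shortcut differs slightly because of the ties, which suggests that the result is the tie-aware one.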
Re: [R] give PAM my own medoids
Bonjour Isabel,

>>>>> "Isabel" == Isabel Brito [EMAIL PROTECTED]
>>>>>     on Tue, 29 Jun 2004 17:06:12 +0200 writes:

    Isabel> Hello,
    Isabel> When using PAM (partitioning around medoids), I
    Isabel> would like to skip the build-step and give the
    Isabel> fonction my own medoids.

    Isabel> Do you know if it is possible, and how ?

Unfortunately, it's not yet possible, but (believe it or not) this has been on my TODO list for 'cluster' (the package) for a while now, and your wish definitely raises the priority!

I want to do some checking for user input errors there, but this is definitely not so much work to do... Do nag me about it at least once a month till it's done.. ;-)

    Isabel> Thank you very much.

You're welcome,
Martin Maechler, Seminar fuer Statistik ETH Zurich
[R] Different behaviour of unique(), R vs. Splus.
Apologies for the cross-posting, but I thought this snippet of info might be vaguely interesting to both lists.

I did a ***brief*** search to see if this issue had previously been discussed and found nothing. So I thought I'd tell the list about a difference in behaviour between unique() in R and unique() in Splus which bit me just now.

I was trying to convert a package from Splus to R and got nonsense answers in R. Turned out that within the bowels of the package I was doing something like

    u <- unique(y)

where y was a matrix of integer values. In Splus this gives a (short) vector of unique values. In R it gives a matrix of the same dimensionality as y, except that any duplicated rows are eliminated. (This looks like being very useful --- once you know about it. And it was probably mentioned in the R release notes at one time, but, as Dr. Hook says, ``I was stoned and I missed it.'')

E.g.

    set.seed(42)
    m <- matrix(sample(1:5,20,TRUE),5,4)
    u <- unique(m)

In R ``u'' is identical to ``m''; in Splus ``u'' is a vector (of length 5).

To get what I want in R I simply need to do

    u <- unique(as.vector(y))

Simple, once you know. Took me a devil of a long time to track down what was going wrong, but!

cheers,
Rolf Turner
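The difference described above can be reproduced on a small, deterministic example. This is only a sketch with an invented matrix; it shows R's behaviour and the as.vector() workaround, and it takes the S-PLUS behaviour from the description in the post rather than from a test.

```r
# A small integer matrix in which rows 1 and 3 are identical.
m <- matrix(c(1L, 2L, 1L, 4L,
              3L, 5L, 3L, 5L), nrow = 4)

u_rows   <- unique(m)             # matrix method in R: drops duplicated rows
u_values <- unique(as.vector(m))  # distinct values, ignoring the matrix shape
```

u_rows keeps three of the four rows, whereas u_values is the plain vector of distinct entries in order of first appearance.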
Re: [R] Boosting
>>>>> "ORIORDAN" == ORIORDAN EDMOND [EMAIL PROTECTED]
>>>>>     on Mon, 28 Jun 2004 10:23:24 -0400 writes:

    ORIORDAN> Hi Does anybody have a package/code for Real
    ORIORDAN> Adaboost that works in R?

Did you try the 'gbm' package from CRAN?

    ORIORDAN> very large binary data set Any help greatly
    ORIORDAN> appreciated cheers ed
[R] Quantile Regression in R
I recently learned about Quantile Regression in R. I am trying to study two time series (attached) by Quantile Regression in R. I wrote the following code and do not know how to interpret the lines. What kind of information can I get from them? Correlation for quantiles, conditional probabilities (i.e. P(X in Quantile i | Y in Quantile i)), etc.

Many thanks in advance for any help.

Best,
Ali

library(quantreg)
#help.start()
Data <- read.table("RESvsMOVE2.dat")
x <- Data[,2]
y <- Data[,1]
par(mfrow=c(2,2))
qqnorm(x,main="MOVE Norm Q-Q Plot", xlab="Normal Quantiles",ylab = "MOVE Quantiles")
qqline(x)
qqnorm(y,main="Residuals Norm Q-Q Plot", xlab="Normal Quantiles",ylab = "Residuals Quantiles")
qqline(y)
plot(x,y,xlab="MOVE",ylab="Residuals",cex=.5)
xx <- seq(min(x),max(x),.5)
# Just a linear regression
g <- coef(lm(y~x))
yy <- (g[1]+g[2]*(xx))
lines(xx,yy,col="yellow")
taus <- c(.05,.1,.25,.5,.75,.9,.95)
for(tau in taus){
  f <- coef(rq(y~x,tau=tau,method="pfn"))
  yy <- (f[1]+f[2]*(xx))
  if (tau == .05){
    lines(xx,yy,col="red")
  }
  if (tau == .95){
    lines(xx,yy,col="green")
  }
  if (tau != .05 && tau != .95){
    lines(xx,yy,col="blue")
  }
}
Re: [R] binding rows from different matrices
Try:

veca=matrix(1:25,5,5)
vecb=matrix(letters[1:25],5,5)
vecc=matrix(LETTERS[1:25],5,5)
x.1 <- lapply(1:5,function(x)rbind(veca[x,],vecb[x,],vecc[x,]))
do.call('rbind',x.1)

      [,1] [,2] [,3] [,4] [,5]
 [1,] 1    6    11   16   21
 [2,] a    f    k    p    u
 [3,] A    F    K    P    U
 [4,] 2    7    12   17   22
 [5,] b    g    l    q    v
 [6,] B    G    L    Q    V
 [7,] 3    8    13   18   23
 [8,] c    h    m    r    w
 [9,] C    H    M    R    W
[10,] 4    9    14   19   24
[11,] d    i    n    s    x
[12,] D    I    N    S    X
[13,] 5    10   15   20   25
[14,] e    j    o    t    y
[15,] E    J    O    T    Y

__
James Holtman                "What is the problem you are trying to solve?"
Executive Technical Consultant -- Office of Technology, Convergys
[EMAIL PROTECTED]
+1 (513) 723-2929
Re: [R] binding rows from different matrices
You can almost always index in such problems: here is one way.

rbind(veca,vecb,vecc)[matrix(1:15, 3, byrow=T), ]

Take it apart to see how it works, if it is not immediately obvious.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
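Taking the indexing one-liner apart, as suggested, might look like the following sketch. The intermediate names stacked and idx are introduced here only for illustration.

```r
veca <- matrix(1:25, 5, 5)
vecb <- matrix(letters[1:25], 5, 5)
vecc <- matrix(LETTERS[1:25], 5, 5)

stacked <- rbind(veca, vecb, vecc)     # rows 1-5 veca, 6-10 vecb, 11-15 vecc
idx <- matrix(1:15, 3, byrow = TRUE)   # row m holds the row numbers of matrix m
as.vector(idx)                         # column-major read-out: 1 6 11 2 7 12 ...
res <- stacked[idx, ]                  # idx is used as a plain vector of row numbers
```

Because a matrix used as a row index is read column by column, the three source matrices are visited in turn for each original row, which produces the interleaving.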
Re: [R] Quantile Regression in R
The short answer to your question is that quantile regression estimates are estimating linear conditional quantile functions, just like lm() is used to estimate conditional mean functions.

A longer answer would inevitably involve unpleasant suggestions that you should follow the posting guide:

a.) send questions about packages to the maintainer, not R-help
b.) not attach datasets in modes that are stripped by R-help
c.) make a token effort to read the documentation and related literature

url:   www.econ.uiuc.edu/~roger      Roger Koenker
email  [EMAIL PROTECTED]             Department of Economics
vox:   217-333-4558                  University of Illinois
fax:   217-244-6678                  Champaign, IL 61820
Re: [R] alternate rank method
>>>>> "Torsten" == Torsten Hothorn [EMAIL PROTECTED]
>>>>>     on Mon, 28 Jun 2004 10:59:26 +0200 (CEST) writes:

    Torsten> On Fri, 25 Jun 2004, Douglas Grove wrote:
    >> I should have specified an additional constraint:
    >> I'm going to need to use this repeatedly on large vectors
    >> (length 10^6), so something efficient is needed.

    Torsten> give function `irank' in package `exactRankTests' a
    Torsten> try.

As an answer to Torsten (who got it already orally) and Gabor's original tricky suggestions:

I strongly believe this should happen in the same C code on which R's base rank() function works and already implements the *averaging* of ties. Doing the analog of changing average(..) to min(..) or max(..) shouldn't be hard and certainly will be more efficient than the workarounds posted here.

Patches welcome... since otherwise I'm not sure I'll get there in time.

Martin
Re: [R] binding rows from different matrices
Prof Brian Ripley [EMAIL PROTECTED] writes:

> You can almost always index in such problems: here is one way.
>
> rbind(veca,vecb,vecc)[matrix(1:15, 3, byrow=T), ]
>
> Take it apart to see how it works, if it is not immediately obvious.

Or, a little longer, but perhaps more intuitive:

matrix(aperm(array(c(veca,vecb,vecc),c(5,5,3)),c(3,1,2)),15)

I.e., convert to array, do generalized transpose, convert back to matrix. Not that I got the index calculations right on first try

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED])                             FAX: (+45) 35327907
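The array route can be unpacked step by step as well. The sketch below only spells out the one-liner; the names a, b and res are introduced for the illustration.

```r
veca <- matrix(1:25, 5, 5)
vecb <- matrix(letters[1:25], 5, 5)
vecc <- matrix(LETTERS[1:25], 5, 5)

a <- array(c(veca, vecb, vecc), c(5, 5, 3))  # a[i, j, m]: row i, column j of matrix m
b <- aperm(a, c(3, 1, 2))                    # b[m, i, j]: the matrix index now varies fastest
res <- matrix(b, 15)                         # 15 x 5: rows run over (m, i) with m fastest
```

Since R stores arrays in column-major order, flattening b into 15 rows lists matrix 1, 2, 3 for row 1, then matrix 1, 2, 3 for row 2, and so on.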
Re: [R] alternate rank method
I agree. These are obvious extensions to the options provided now by rank. I didn't suggest this as I am not a contributor and don't feel comfortable asking others to do more work :)

Thanks,
Doug
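For readers on a current version of R: rank() now takes a ties.method argument, which covers the min and max variants discussed in this thread. The sketch below assumes such a version; the argument was not available in the releases the posters were using.

```r
x <- c(10, 20, 20, 30, 20)

r_avg <- rank(x)                        # default: ties share the average rank
r_min <- rank(x, ties.method = "min")   # ties all get the lowest rank of their group
r_max <- rank(x, ties.method = "max")   # ties all get the highest rank of their group
```

For the three tied values, the average, minimum and maximum ranks are 3, 2 and 4 respectively.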
Re: [R] Several PCA questions...
Perhaps this question is less dumb... (in context below...)

On Tue, 29 Jun 2004, Prof Brian Ripley wrote:

> On Tue, 29 Jun 2004, Dan Bolser wrote:
>
> > Hi, I am doing PCA on several columns of data in a data.frame.
> >
> > I am interested in particular rows of data which may have a particular
> > combination of 'types' of column values (without any pre-conception of
> > what they may be).
> >
> > I do the following...
> >
> > # My data table.
> > allDat <- read.table("big_select_thresh_5", header=1)
> >
> > # Where some rows look like this...
> > # PDB   SUNID1  SUNID2  AA  CH  IPCA  PCA  IBB  BB
> > # 3sdh  14984   14985   6   10  24    24   93   116
> > # 3hbi  14986   14987   6   10  20    22   94   117
> > # 4sdh  14988   14989   6   10  20    20   104  122
> >
> > # NB First three columns = row ID, last 6 = variables
> >
> > attach(allDat)
> >
> > # My columns of interest (variables).
> > part <- data.frame(AA,CH,IPCA,PCA,IBB,BB)
> >
> > pc <- princomp(part)
>
> Do you really want an unscaled PCA on that data set? Looks unlikely (but
> then two of the columns are constant in the sample, which is also
> worrying).

That is just sample bias. By unscaled I assume you mean something like normalized?

> > plot(pc)
> >
> > The above plot shows that 95% of the variance is due to the first
> > 'Component' (which I assume is AA).
>
> No, it is the first (principal) component. You did ask for PCA!
>
> > i.e. All the variables behave in quite much the same way.
>
> Or you failed to scale the data so one dominates.

Yes. I added the following to the above

x <- colMeans(part)
partNorm <- part/x
pc1 <- princomp(partNorm)
plot(pc1)
biplot(pc1)

Which shows two major components, and possibly a third.

What I want to know is that given my data is not uniformly distributed, is my normalization valid? I know I should find this out via further investigation of PCA, but in general if my variables have a very skewed distribution (possibly without a theoretically definable mean) should I attempt to use any standard clustering technique?

I guess I should log transform my data.

Cheers,
Dan.

> > I then did ...
> >
> > biplot(pc)
> >
> > Which showed some outliers with a numeric ID - How do I get back my old 3
> > part ID used in allDat?
>
> Set row names on your data frame. Like almost all of R, it is the row
> names of a data frame that are used for labelling, and you did not give
> any so you got numbers.
>
> > In the above plot I saw all the variables (correctly named) pointing in
> > more or less the same direction (as shown by the variance). I then did the
> > following...
> >
> > postscript(file="test.ps",paper="a4")
> > biplot(pc)
> > dev.off()
> >
> > However, looking at test.ps shows that the arrows are missing (using
> > ggv)... Hmmm, they come back when I pstoimg then xv... never mind.
>
> So ggv is unreliable, perhaps cannot cope with colours?
>
> > Finally, I would like to make a contour plot of the above biplot, is this
> > possible? (or even a good way to present the data?
>
> What do you propose to represent by the contours? Biplots have a
> well-defined interpretation in terms of distances and angles.
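The effect of scaling discussed in this exchange can be seen on a data set that ships with R. The sketch below uses USArrests as a stand-in for the poster's table (an assumption made only so that the example is self-contained); princomp(), its cor argument, scale() and colMeans() are standard R.

```r
# USArrests ships with R; its columns have very different variances.
pc_raw    <- princomp(USArrests)              # covariance-based (unscaled)
pc_scaled <- princomp(USArrests, cor = TRUE)  # correlation-based (scaled)

# Share of total variance carried by the first component in each case.
share_raw    <- pc_raw$sdev[1]^2 / sum(pc_raw$sdev^2)
share_scaled <- pc_scaled$sdev[1]^2 / sum(pc_scaled$sdev^2)

# One way to divide each column by its own mean, if that normalisation is wanted.
norm_by_mean <- scale(USArrests, center = FALSE, scale = colMeans(USArrests))

# Row names of the data frame carry through to the scores; biplot(pc_scaled)
# would use them as point labels.
labels <- rownames(pc_scaled$scores)
```

In the unscaled fit the first component is dominated by the variable with the largest variance, which is the situation described above as "one dominates"; the correlation-based fit spreads the variance more evenly.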
[R] PAM clustering: using my own dissimilarity matrix
Hello,

I would like to use my own dissimilarity matrix in a PAM clustering with method pam (cluster package) instead of a dissimilarity matrix created by daisy.

I read data from a file containing the dissimilarity values using read.csv. This creates a matrix (alternatively: an array or vector) which is not accepted by pam. A call

p <- pam(d,k=2,diss=TRUE)

yields an error message:

Error in pam(d, k = 2, diss = TRUE) : x is not of class dissimilarity and can not be converted to this class.

How can I convert the matrix d into a dissimilarity matrix suitable for pam?

I'm aware of a response by Friedrich Leisch to a similar question posed by Jose Quesada (quoted below). But as I understood the answer, the dissimilarity matrix there is calculated on the basis of (random) data.

Thank you in advance.
Hans

__

> On Tue, 09 Jan 2001 15:42:30 -0700,
> Jose Quesada (JQ) wrote:

> > Hi,
> > I'm trying to use a similarity matrix (triangular) as input for pam() or
> > fanny() clustering algorithms.
> > The problem is that this algorithms can only accept a dissimilarity
> > matrix, normally generated by daisy().
> > However, daisy only accept 'data matrix or dataframe. Dissimilarities
> > will be computed between the rows of x'.
> > Is there any way to say to that your data are already a similarity
> > matrix (triangular)?
> > In Kaufman and Rousseeuw's FORTRAN implementation (1990), they showed an
> > option like this one:
> > "Maybe you already have correlations coefficients between variables.
> > Your input data constist on a lower triangular matrix of pairwise
> > correlations. You wish to calculate dissimilarities between the
> > variables."
> > But I couldn't find this alternative in the R implementation.
> > I can not use foo <- as.dist(foo), neither daisy(foo...) because
> > "Dissimilarities will be computed between the rows of x", and this is
> > not what I mean.
> > You can easily transform your similarities into dissimilarities like
> > this (also recommended in Kaufman and Rousseeuw, 1990):
> > foo <- (1 - abs(foo)) # where foo are similarities
> > But then pam() will complain like this:
> > "x is not of class dissimilarity and can not be converted to this
> > class."
> > Can anyone help me? I also appreciate any advice about other clustering
> > algorithms that can accept this type of input.

Hmm, I don't understand your problem, because proceeding as the docs describe it works for me ...

If foo is a similarity matrix (with 1 meaning identical objects), then

bar <- as.dist(1 - abs(foo))
fanny(bar, ...)

works for me:

## create a random 12x12 similarity matrix, make it symmetric and set the
## diagonal to 1
> x <- matrix(runif(144), nc=12)
> x <- x+t(x)
> diag(x) <- 1

## now proceed as described in the docs
> y <- as.dist(1-x)
> fanny(y, 3)
iterations  objective
     42.00   3.303235
Membership coefficients:
   [,1]  [,2]  [,3]
1 0.333 0.333 0.333
2 0.333 0.333 0.333
3 0.334 0.333 0.333
4 0.333 0.333 0.333
...
[R] removing NA from an its object
Hi again!

I have the following its object:

class(x1)
[1] "its"
attr(,"package")
[1] "its"

x1
             FTSE     DAX
2004-06-07 4491.6 4017.81
2004-06-08 4504.8 4018.95
2004-06-09 4489.5 3997.76
2004-06-10 4486.1 4021.64
2004-06-11 4484.0 4014.56
2004-06-14 4433.2 3948.65
2004-06-15 4458.6 3987.30
2004-06-16 4491.1 4003.24
2004-06-17 4493.3 3985.46
2004-06-18 4505.8 3999.79
2004-06-21 4502.2 3989.31
2004-06-22     NA 3928.39
2004-06-23     NA 3945.10
2004-06-24     NA 4007.05
2004-06-25     NA 4013.35
2004-06-28     NA 4069.35

I want to create an its object with no NAs; that is, if there is an NA in any column, strike the entire row. I did the following:

x2 <- its(na.omit(x1))
x2
             FTSE     DAX
2004-06-07 4491.6 4017.81
2004-06-08 4504.8 4018.95
2004-06-09 4489.5 3997.76
2004-06-10 4486.1 4021.64
2004-06-11 4484.0 4014.56
2004-06-14 4433.2 3948.65
2004-06-15 4458.6 3987.30
2004-06-16 4491.1 4003.24
2004-06-17 4493.3 3985.46
2004-06-18 4505.8 3999.79
2004-06-21 4502.2 3989.31
attr(,"na.action")
2004-06-22 2004-06-23 2004-06-24 2004-06-25 2004-06-28
        12         13         14         15         16
attr(,"class")
[1] "omit"

class(x2) <- "its"

My question: is this the best way to accomplish the goal, please? I tried apply with all and is.na but I got strange results.

Thanks.

R Version 1.9.1

Sincerely,
Laura
mailto: [EMAIL PROTECTED]
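One alternative to na.omit() is to build a logical row filter with complete.cases(). The sketch below uses a plain numeric matrix with a few of the values from the post as a stand-in; whether row subsetting keeps the class of an its object has not been checked here, so treat the last step as an assumption to verify on the real data.

```r
# Plain numeric matrix standing in for the its object (values from the post).
x <- cbind(FTSE = c(4505.8, 4502.2, NA, NA),
           DAX  = c(3999.79, 3989.31, 3928.39, 3945.10))
rownames(x) <- c("2004-06-18", "2004-06-21", "2004-06-22", "2004-06-23")

keep <- complete.cases(x)            # TRUE for rows without any NA
x_clean <- x[keep, , drop = FALSE]   # keeps the row names (dates) of the retained rows

# The apply()/is.na() route: a row is dropped if any of its entries is NA.
keep2 <- !apply(is.na(x), 1, any)
```

Note that the apply() version needs any() over the rows of is.na(x), negated; using all() would only drop rows in which every column is missing.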
[R] naive question
I have a 100Mb comma-separated file, and R takes several minutes to read it (via read.table()). This is R 1.9.0 on a linux box with a couple gigabytes of RAM. I am conjecturing that R is gc-ing, so maybe there is some command-line arg I can give it to convince it that I have a lot of space, or?!

Thanks!
Igor
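Independently of any memory settings, read.table() usually gets faster when it is told the column types and the number of rows in advance, and when comment scanning is switched off. The sketch below writes a small temporary file so that it is self-contained; the column layout is invented and would have to be adapted to the real file.

```r
# Write a small comma-separated file so the sketch is self-contained.
tmp <- tempfile(fileext = ".csv")
write.table(data.frame(id = 1:1000,
                       value = rnorm(1000),
                       grp = sample(letters, 1000, TRUE)),
            tmp, sep = ",", row.names = FALSE, quote = FALSE)

dat <- read.table(tmp, header = TRUE, sep = ",",
                  colClasses = c("integer", "numeric", "character"),  # skip type guessing
                  nrows = 1000,        # known (or slightly overestimated) row count
                  comment.char = "")   # do not scan for comment characters
unlink(tmp)
```

For purely numeric files, scan() is another option worth timing against read.table().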
Re: [R] PAM clustering: using my own dissimilarity matrix
Hi!

If your x is a symmetric matrix containing the distances, then cast it to a dist object using as.dist. See ?as.dist.

Sincerely
Eryk
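The as.dist() suggestion could be sketched as follows. The dissimilarity values are invented and are read from an inline string instead of a file so that the example is self-contained; the pam() call assumes that the cluster package is installed and is skipped otherwise.

```r
# Stand-in for a file of dissimilarities; the values are invented for illustration.
csv_text <- "0,2,6,10
2,0,5,9
6,5,0,4
10,9,4,0"
m <- as.matrix(read.csv(text = csv_text, header = FALSE))

d <- as.dist(m)   # uses the lower triangle; the result has class "dist"

# pam() accepts a "dist" object directly (cluster package assumed installed).
if (requireNamespace("cluster", quietly = TRUE)) {
  p <- cluster::pam(d, k = 2)
  p$clustering
}
```

The key step is converting the data frame returned by read.csv() to a numeric matrix before calling as.dist(); the matrix is expected to be square and symmetric with zeros on the diagonal.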
Re: [R] Several PCA questions...
See `cor' in ?princomp, and its references. I meant `scale' as in ?scale.

On Tue, 29 Jun 2004, Dan Bolser wrote:

> Perhaps this question is less dumb... (in context below...)
>
> On Tue, 29 Jun 2004, Prof Brian Ripley wrote:
>
> > On Tue, 29 Jun 2004, Dan Bolser wrote:
> >
> > > Hi, I am doing PCA on several columns of data in a data.frame. I am interested in particular rows of data which may have a particular combination of 'types' of column values (without any pre-conception of what they may be). I do the following...
> > >
> > >   # My data table.
> > >   allDat <- read.table("big_select_thresh_5", header=1)
> > >
> > >   # Where some rows look like this...
> > >   # PDB  SUNID1 SUNID2 AA CH IPCA PCA IBB  BB
> > >   # 3sdh 14984  14985   6 10   24  24  93 116
> > >   # 3hbi 14986  14987   6 10   20  22  94 117
> > >   # 4sdh 14988  14989   6 10   20  20 104 122
> > >   # NB First three columns = row ID, last 6 = variables
> > >
> > >   attach(allDat)
> > >
> > >   # My columns of interest (variables).
> > >   part <- data.frame(AA,CH,IPCA,PCA,IBB,BB)
> > >   pc <- princomp(part)
> >
> > Do you really want an unscaled PCA on that data set? Looks unlikely (but then two of the columns are constant in the sample, which is also worrying).
>
> That is just sample bias. By unscaled I assume you mean something like normalized?
>
> > >   plot(pc)
> > >
> > > The above plot shows that 95% of the variance is due to the first 'Component' (which I assume is AA).
> >
> > No, it is the first (principal) component. You did ask for PCA!
> >
> > > i.e. All the variables behave in quite much the same way.
> >
> > Or you failed to scale the data so one dominates.
>
> Yes. I added the following to the above
>
>   x <- colMeans(part)
>   partNorm <- part/x
>   pc1 <- princomp(partNorm)
>   plot(pc1)
>   biplot(pc1)
>
> Which shows two major components, and possibly a third. What I want to know is that given my data is not uniformly distributed, is my normalization valid? I know I should find this out via further investigation of PCA, but in general if my variables have a very skewed distribution (possibly without a theoretically definable mean) should I attempt to use any standard clustering technique? I guess I should log transform my data.
>
> Cheers, Dan.
>
> > > I then did ...
> > >
> > >   biplot(pc)
> > >
> > > Which showed some outliers with a numeric ID - How do I get back my old 3 part ID used in allDat?
> >
> > Set row names on your data frame. Like almost all of R, it is the row names of a data frame that are used for labelling, and you did not give any so you got numbers.
> >
> > > In the above plot I saw all the variables (correctly named) pointing in more or less the same direction (as shown by the variance). I then did the following...
> > >
> > >   postscript(file="test.ps",paper="a4")
> > >   biplot(pc)
> > >   dev.off()
> > >
> > > However, looking at test.ps shows that the arrows are missing (using ggv)... Hmmm, they come back when I pstoimg then xv... never mind.
> >
> > So ggv is unreliable, perhaps cannot cope with colours?
> >
> > > Finally, I would like to make a contour plot of the above biplot, is this possible? (or even a good way to present the data?)
> >
> > What do you propose to represent by the contours? Biplots have a well-defined interpretation in terms of distances and angles.

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
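A small sketch of the two points made in this thread: the `cor' argument of princomp() for a scaled PCA, and row names for labelling. The data frame and its row IDs below are invented for illustration and are not Dan's data.

```r
# Hypothetical data: two variables on very different scales.
part <- data.frame(AA = c(5, 6, 7, 6, 5, 6, 7, 6),
                   BB = c(93, 116, 104, 122, 94, 117, 100, 130))
# Row names are what biplot() uses as point labels (IDs here are made up).
rownames(part) <- paste("id", 1:8, sep = ".")

pc_raw <- princomp(part)              # covariance-based: the large-variance column dominates
pc_cor <- princomp(part, cor = TRUE)  # correlation-based: each variable is standardised

# Proportion of variance carried by each component
prop_raw <- pc_raw$sdev^2 / sum(pc_raw$sdev^2)
prop_cor <- pc_cor$sdev^2 / sum(pc_cor$sdev^2)
```

With unscaled data almost all of the variance lands on the first component simply because BB has the larger spread; with cor = TRUE the split reflects the correlation structure instead. biplot(pc_cor) would then label the points with the row names.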
[R] Jens Praestgaard/Hgsi is out of the office.
I will be out of the office starting 06/28/2004 and will not return until 06/30/2004. Jens Praestgaard is out of the office until June 30 and will respond to your message when he returns. Thank you
RE: [R] removing NA from an its object
What did you try with apply? It seems to work for me. I did

  x2[!apply(is.na(x2), 1, any),]

and got the desired results.

Kevin

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Laura Holt
Sent: Tuesday, June 29, 2004 9:26 AM
To: [EMAIL PROTECTED]
Subject: [R] removing NA from an its object

Hi again!

I have the following its object:

  > class(x1)
  [1] "its"
  attr(,"package")
  [1] "its"
  > x1
               FTSE     DAX
  2004-06-07 4491.6 4017.81
  2004-06-08 4504.8 4018.95
  2004-06-09 4489.5 3997.76
  2004-06-10 4486.1 4021.64
  2004-06-11 4484.0 4014.56
  2004-06-14 4433.2 3948.65
  2004-06-15 4458.6 3987.30
  2004-06-16 4491.1 4003.24
  2004-06-17 4493.3 3985.46
  2004-06-18 4505.8 3999.79
  2004-06-21 4502.2 3989.31
  2004-06-22     NA 3928.39
  2004-06-23     NA 3945.10
  2004-06-24     NA 4007.05
  2004-06-25     NA 4013.35
  2004-06-28     NA 4069.35

I want to create an its object with no NAs; that is, if there is an NA in any column, strike the entire row. I did the following:

  > x2 <- its(na.omit(x1))
  > x2
               FTSE     DAX
  2004-06-07 4491.6 4017.81
  2004-06-08 4504.8 4018.95
  2004-06-09 4489.5 3997.76
  2004-06-10 4486.1 4021.64
  2004-06-11 4484.0 4014.56
  2004-06-14 4433.2 3948.65
  2004-06-15 4458.6 3987.30
  2004-06-16 4491.1 4003.24
  2004-06-17 4493.3 3985.46
  2004-06-18 4505.8 3999.79
  2004-06-21 4502.2 3989.31
  attr(,"na.action")
  2004-06-22 2004-06-23 2004-06-24 2004-06-25 2004-06-28
          12         13         14         15         16
  attr(,"class")
  [1] "omit"
  > class(x2) <- "its"

My question: is this the best way to accomplish the goal, please? I tried apply with all and is.na but I got strange results.

Thanks.
R Version 1.9.1

Sincerely,
Laura
mailto: [EMAIL PROTECTED]
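For reference, a self-contained sketch of the row filter Kevin describes, run on a plain numeric matrix rather than an its object (whether the its class survives the subsetting is not checked here). The values are a four-row excerpt of the table above; complete.cases() is shown as an equivalent alternative.

```r
# Plain matrix stand-in for the its object (excerpt of the data above).
x <- cbind(FTSE = c(4491.6, 4504.8, NA, NA),
           DAX  = c(4017.81, 4018.95, 3928.39, 3945.10))
rownames(x) <- c("2004-06-07", "2004-06-08", "2004-06-22", "2004-06-23")

keep    <- !apply(is.na(x), 1, any)           # TRUE for rows without any NA
x_apply <- x[keep, , drop = FALSE]            # Kevin's approach
x_cc    <- x[complete.cases(x), , drop = FALSE]  # same rows via complete.cases()
```

Note that the test is any(), not all(): using all() would only drop rows in which every column is NA, which may explain the "strange results".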
Re: [R] R via ssh login on OS X?
On Jun 29, 2004, at 3:43 AM, Prof Brian Ripley wrote:

> Did you look at the notes on MacOS X in the R-admin manual (as the INSTALL file asks)? That would have told you why lapack failed, and I think you should redo your build following the advice there.

Clearly I didn't read closely enough. Thanks for the reminder. The build and check completed successfully as a fully non-root build with this sequence:

Compile f2c and libf2c and put f2c, f2c.h and libf2c.a in $HOME/f2c. Run ranlib on libf2c.a.

  mkdir $HOME/f2c
  mkdir ~/Rinstall
  mv R-1.9.1.tgz Rinstall/
  PATH=$PATH:$HOME/f2c      (so that configure can find the f2c executable)
  export LDFLAGS=-L$HOME/f2c/
  export CPPFLAGS=-I$HOME/f2c/
  ./configure --prefix=$HOME/Rinstall/ --with-blas='-framework vecLib' --with-lapack
  make
  make check
  make install

are all successful. Built in this way it has no problems with remote login on OS X. As I said, I haven't found problems with remote log-in at all with 1.9.1.

Thanks for the help everyone.
--J

> On Tue, 29 Jun 2004, James Howison wrote:
>
> > [...] then did
> >   ./configure --prefix=$HOME/Rinstall/ --enable-R-framework=no --with-x=no --with-lapack=no
>
> Note
>   --with-blas='-framework vecLib' --with-lapack
> is `strongly recommended', and on some versions of MacOS X `appears to be the only way to build R'.
>
> > and then make. This basically worked but for some reason lapack was still trying to build and that was failing, so I deleted it from the appropriate makefile and the rest of the compile went fine. The lapack confusion stopped some of the recommended modules from building but I didn't need those (just sna which built fine from CRAN). [...]
>
> --
> Brian D. Ripley,                  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595

--James
+1 315 395 4056
[R] RE: [S] Different behaviour of unique(), R vs. Splus.
The source of the incompatibility:

In S-PLUS 6.2:

  > methods(unique)
                splus            splus          menu                 splus
  "unique.data.frame" "unique.default" "unique.name" "unique.rowcol.names"

In R-1.9.1:

  > methods(unique)
  [1] unique.array      unique.data.frame unique.default    unique.matrix

Unless there's some sort of coordination (or even just separate effort) on either/both R Core and Insightful developers to make sure there's agreement on what methods to provide in the base code, such problems can only get worse, not better, I guess.

Best,
Andy

> From: Rolf Turner
>
> Apologies for the cross-posting, but I thought this snippet of info might be vaguely interesting to both lists. I did a ***brief*** search to see if this issue had previously been discussed and found nothing. So I thought I'd tell the list about a difference in behaviour between unique() in R and unique() in Splus which bit me just now.
>
> I was trying to convert a package from Splus to R and got nonsense answers in R. Turned out that within the bowels of the package I was doing something like
>
>   u <- unique(y)
>
> where y was a matrix of integer values. In Splus this gives a (short) vector of unique values. In R it gives a matrix of the same dimensionality as y, except that any duplicated rows are eliminated. (This looks like being very useful --- once you know about it. And it was probably mentioned in the R release notes at one time, but, as Dr. Hook says, ``I was stoned and I missed it.'')
>
> E.g.
>
>   set.seed(42)
>   m <- matrix(sample(1:5,20,TRUE),5,4)
>   u <- unique(m)
>
> In R ``u'' is identical to ``m''; in Splus ``u'' is a vector (of length 5). To get what I want in R I simply need to do
>
>   u <- unique(as.vector(y))
>
> Simple, once you know. Took me a devil of a long time to track down what was going wrong, but!
>
> cheers,
> Rolf Turner
>
> This message was distributed by [EMAIL PROTECTED] To ...(s-news.. clipped)...
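A deterministic illustration of the R behaviour Rolf describes, using a tiny hand-made matrix instead of a random one so the result does not depend on the random number generator. Only the R side is shown; the S-PLUS behaviour is as reported above and is not reproduced here.

```r
# Small matrix with one duplicated row.
m <- rbind(c(1, 2),
           c(1, 2),
           c(3, 4))

u_rows <- unique(m)             # matrix method: duplicated rows are dropped
u_vals <- unique(as.vector(m))  # distinct values, in column-major order of first appearance
```

So unique(m) stays a matrix (here 2 x 2), while wrapping the argument in as.vector() gives the flat vector of distinct values that the ported code expected.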
[R] anti-R vitriol
A colleague is receiving some data from another person. That person reads the data in SAS and it takes 30s and uses 64k RAM. That person then tries to read the data in R and it takes 10 minutes and uses a gigabyte of RAM. Person then goes on to say:

   It's not that I think SAS is such great software, it's not. But I really hate badly designed software. R is designed by committee. Worse, it's designed by a committee of statisticians. They tend to confuse numerical analysis with computer science and don't have any idea about software development at all. The result is R.

   I do hope [your colleague] won't have to waste time doing [this analysis] in an outdated and poorly designed piece of software like R.

Would any of the committee like to respond to this? Or shall we just slap our collective forehead and wonder how someone could get such a view?

Barry
[R] Goodness of fit test for estimated distribution
Hi,

is there any method for goodness of fit testing of an (as general as possible) univariate distribution with parameters estimated, for normal, exponential, gamma distributions, say (e.g. the corrected p-values for the Kolmogorov-Smirnov or Chi-squared with corresponding ML estimation method)? It seems that neither ks.test nor chisq.test handles estimated parameters. I am aware of function goodfit in package vcd, which seems to do it for some discrete distributions.

Thank you for help,
Christian

***
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/
###
ich empfehle www.boag-online.de
Re: [R] binding rows from different matrices
Still another variation on the same theme:

  matrix(t(cbind(veca,vecb,vecc)),nc=5,byrow=T)

Giovanni

> Date: Tue, 29 Jun 2004 17:58:32 +0200
> From: Peter Dalgaard [EMAIL PROTECTED]
>
> Prof Brian Ripley [EMAIL PROTECTED] writes:
>
> > You can almost always index in such problems: here is one way.
> >
> >   rbind(veca,vecb,vecc)[matrix(1:15, 3, byrow=T), ]
> >
> > Take it apart to see how it works, if it is not immediately obvious.
>
> Or, a little longer, but perhaps more intuitive:
>
>   matrix(aperm(array(c(veca,vecb,vecc),c(5,5,3)),c(3,1,2)),15)
>
> I.e., convert to array, do generalized transpose, convert back to matrix. Not that I got the index calculations right on first try
>
> --
>    O__       Peter Dalgaard             Blegdamsvej 3
>   c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
>  (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
> ~~ - ([EMAIL PROTECTED])                             FAX: (+45) 35327907

--
Giovanni Petris [EMAIL PROTECTED]
Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701
Ph: (479) 575-6324, 575-8630 (fax)
http://definetti.uark.edu/~gpetris/
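To see that the three one-liners in this thread agree, here is a small check. It assumes, as the c(5,5,3) in Peter Dalgaard's version suggests, that veca, vecb and vecc are 5 x 5 matrices whose rows are to be interleaved (row 1 of each, then row 2 of each, and so on); the contents below are made up.

```r
# Hypothetical 5 x 5 inputs, distinguishable by their hundreds digit.
veca <- matrix(1:25, 5, 5)
vecb <- veca + 100L
vecc <- veca + 200L

# Ripley: stack the matrices, then pick rows in interleaved order 1, 6, 11, 2, 7, 12, ...
r1 <- rbind(veca, vecb, vecc)[matrix(1:15, 3, byrow = TRUE), ]

# Petris: bind side by side, transpose, and refill five values per row
r2 <- matrix(t(cbind(veca, vecb, vecc)), ncol = 5, byrow = TRUE)

# Dalgaard: 3-d array, generalized transpose, back to a 15-row matrix
r3 <- matrix(aperm(array(c(veca, vecb, vecc), c(5, 5, 3)), c(3, 1, 2)), 15)
```

Printing r1[1:3, ] shows the first rows of veca, vecb and vecc in turn, which is a quick way to take the indexing apart.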
Re: [R] anti-R vitriol
My reaction, as a mere individual user:

Of course, one cannot have any idea what's really going on, so a rational reply to the rant is impossible. But, as this list repeatedly demonstrates (and as we all have probably experienced), it is possible to do things foolishly in any software.

Worth noting: John Chambers, the designer of the S language (of which R is an implementation), won an ACM computing award (readers -- please correct details of this citation) for his achievement; so apparently the professional computing community disagreed with the sentiments expressed in the rant.

Cheers,
--
Bert Gunter
Non-Clinical Biostatistics
Genentech
MS: 240B
Phone: 650-467-7374

"The business of the statistician is to catalyze the scientific learning process." -- George E.P. Box

Barry Rowlingson wrote:

> A colleague is receiving some data from another person. That person reads the data in SAS and it takes 30s and uses 64k RAM. That person then tries to read the data in R and it takes 10 minutes and uses a gigabyte of RAM. [...]
Re: [R] anti-R vitriol
I'm not too concerned about your colleague's view about R. S/He doesn't have to like it, and I don't think anyone actually believes that R is designed to make *everyone* happy. For me, R does about 99% of the things I need to do, but sadly, when I need to order a pizza, I still have to pick up the telephone.

What worries me more is that your colleague seems to have lost sight of the fact that just about all software development involves tradeoffs. Although I've never used SAS, I've used other stat packages and it's clear that all of them (including R) have traded in some things to get out other things. An example is R's potentially large memory usage, which, one might argue, trades in analyses of very large datasets but gets out a very powerful and elegant programming language.

Rather than use absolutes, I'd encourage your colleague to be more specific. Rather than say things like "R is poorly designed", I'd like to hear "R is poorly designed for [fill in the blank]". Then we can get a better handle on the world in which s/he lives.

-roger

Barry Rowlingson wrote:

> A colleague is receiving some data from another person. That person reads the data in SAS and it takes 30s and uses 64k RAM. That person then tries to read the data in R and it takes 10 minutes and uses a gigabyte of RAM. [...]

--
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/
RE: [R] anti-R vitriol
> From: Barry Rowlingson
>
> A colleague is receiving some data from another person. That person reads the data in SAS and it takes 30s and uses 64k RAM. That person then tries to read the data in R and it takes 10 minutes and uses a gigabyte of RAM. [...]
>
> Would any of the committee like to respond to this? Or shall we just slap our collective forehead and wonder how someone could get such a view?
>
> Barry

My $0.02:

R, being a flexible programming language, has an amazing ability to cope with people's laziness/ignorance/inelegance, but it comes at a (sometimes hefty) price. While there are no specifics on the situation leading to the person's comments, here's one (not as extreme) example that I happened to come across today:

  > system.time(spam <- read.table("data_dmc2003_train.txt",
  +                                header=T,
  +                                colClasses=c(rep("numeric", 833),
  +                                             "character")))
  [1] 15.92  0.09 16.80    NA    NA
  > system.time(spam <- read.table("data_dmc2003_train.txt", header=T))
  [1] 187.29   0.60 200.19     NA     NA

My SAS ability is rather severely limited, but AFAIK, one needs to specify _all_ variables to be read into a dataset in order to read in the data in SAS. If one has that information, R can be very efficient as well. Without that information, one gets nothing in SAS, or one can just let R do the hard work.

Best,
Andy
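A runnable miniature of the colClasses idea in Andy's timing example. The file is a three-row temporary file written on the spot (not the data_dmc2003_train.txt file from the post), so it shows only the mechanics, not the speed-up.

```r
# Write a tiny whitespace-separated file to a temporary location.
tmp <- tempfile(fileext = ".txt")
df <- data.frame(a = c(1.5, 2.5, 3.5), b = c(10, 20, 30), id = c("x", "y", "z"))
write.table(df, tmp, row.names = FALSE, quote = FALSE)

# Declaring the column types up front spares read.table() from guessing them.
spam <- read.table(tmp, header = TRUE,
                   colClasses = c(rep("numeric", 2), "character"))
unlink(tmp)
```

On a file with hundreds of columns the same call pattern applies, with rep() keeping the class vector short, as in the example above.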
Re: [R] Goodness of fit test for estimated distribution
In full generality this is a quite difficult problem as discussed in Durbin's (1973) SIAM monograph. An elegant general approach is provided by Khmaladze

  @article{Khma:Arie:1981,
    author  = {Khmaladze, E. V.},
    title   = {Martingale approach in the theory of goodness-of-fit tests},
    year    = {1981},
    journal = {Theory of Probability and its Applications (Transl of Teorija Verojatnostei i ee Primenenija)},
    volume  = {26},
    pages   = {240--257}
  }

but I don't think that there is a general implementation of the approach for R, or any other software environment, for that matter.

url:    www.econ.uiuc.edu/~roger    Roger Koenker
email   [EMAIL PROTECTED]           Department of Economics
vox:    217-333-4558                University of Illinois
fax:    217-244-6678                Champaign, IL 61820

On Jun 29, 2004, at 1:08 PM, Christian Hennig wrote:

> Hi, is there any method for goodness of fit testing of an (as general as possible) univariate distribution with parameters estimated, for normal, exponential, gamma distributions, say (e.g. the corrected p-values for the Kolmogorov-Smirnov or Chi-squared with corresponding ML estimation method)? [...]
[R] job opening in Merck Research Labs, NJ
Apology for the cross-post...

Andy

==

Job description: Computational statistician/medical image analyst

The Biometrics Research Department at Merck Research Laboratories, Merck & Co., Inc. in Rahway, NJ is seeking a highly motivated statistician/data analyst to work in its basic research and drug discovery area. The applicant should have broad expertise in image processing, statistics, and computer science, with substantial experience in medical imaging analysis including design of experiments, image registration and segmentation, statistical analysis, and pattern recognition.

The position will initially involve providing statistical, mathematical, and software development support for MRI and ultrasound imaging teams in preclinical research (i.e., animal studies, not human). Merck has its own facilities for CT, PET, MRI, and ultrasound imaging.

We are looking for a Ph.D. with a background and/or post-doctoral experience in at least one of the following fields: Statistics, Electrical/Computer or Biomedical Engineering, Computer Science, Applied Mathematics, or Physics. Advanced computer programming skills (including, but not limited to Matlab, C/C++, Visual Basic, SQL, IDL, or PV-WAVE), and good communication skills are essential, as is familiarity with statistical software like R and Splus. The position may also involve general statistical consulting and training. An ability to lead statistical analysis efforts within a multidisciplinary team is required. Strong candidates will also have interests and experience in computer vision, machine learning/data mining, and/or signal processing.

Our dedication to delivering quality medicines in innovative ways and our commitment to bringing out the best in our people are just some of the reasons why we're ranked among Fortune magazine's 100 Best Companies to Work for in America. We offer a competitive salary, an outstanding benefits package, and a professional work environment with a company known for scientific excellence.

To apply, please forward your CV or resume and cover letter to

  ATTENTION: Open Position
  Vladimir Svetnik, Ph.D.
  Biometrics Research Dept.
  Merck Research Laboratories, RY33-300
  126 E. Lincoln Avenue
  Rahway, NJ 07065-0900
  [EMAIL PROTECTED]
Re: [R] Goodness of fit test for estimated distribution
What about Monte Carlo?

I recently produced (with help from contributors to this list) qq plots for certain complicated mixtures of distributions. To evaluate goodness of fit, I produced Monte Carlo confidence intervals from 401 simulated qq plots and took the 11th and 391st of them for each quantile. {quantile(1:401, c(.025, .975)) = c(11, 391)}. Something like this could be done to obtain a significance level for ks.test, for example. This may not be as satisfying for some purposes as a clean, theoretical result, but it produced useful answers without busting the project budget too badly.

hope this helps.
spencer graves

roger koenker wrote:

> In full generality this is a quite difficult problem as discussed in Durbin's (1973) SIAM monograph. An elegant general approach is provided by Khmaladze [...] but I don't think that there is a general implementation of the approach for R, or any other software environment, for that matter.
>
> On Jun 29, 2004, at 1:08 PM, Christian Hennig wrote:
>
> > Hi, is there any method for goodness of fit testing of an (as general as possible) univariate distribution with parameters estimated, for normal, exponential, gamma distributions, say [...]
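One way the Monte Carlo suggestion could be applied to ks.test is sketched below for the normal case. This is an illustrative parametric-bootstrap scheme, not the procedure Spencer used for his qq plots; the function name ks_mc_normal and the choice B = 400 are assumptions, and sd() is used for simplicity although it is not exactly the ML estimate.

```r
# Monte Carlo p-value for the KS statistic when mean and sd are estimated from the data.
# Hypothetical helper, for illustration only.
ks_mc_normal <- function(x, B = 400) {
  n <- length(x)
  # KS distance between a sample and the normal fitted to that same sample
  ks_stat <- function(y) {
    unname(ks.test(y, "pnorm", mean(y), sd(y))$statistic)
  }
  observed <- ks_stat(x)
  # Simulate from the fitted normal and re-estimate the parameters each time,
  # so the reference distribution accounts for the estimation step.
  simulated <- replicate(B, ks_stat(rnorm(n, mean(x), sd(x))))
  (1 + sum(simulated >= observed)) / (B + 1)
}

set.seed(2004)
p_norm <- ks_mc_normal(rnorm(50, mean = 10, sd = 2))
```

The same pattern should carry over to exponential or gamma fits by swapping the estimation step and the simulator, at the cost of B refits.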
Re: [R] anti-R vitriol
Barry Rowlingson <B.Rowlingson at lancaster.ac.uk> writes:

: A colleague is receiving some data from another person. That person
: reads the data in SAS and it takes 30s and uses 64k RAM. That person
: then tries to read the data in R and it takes 10 minutes and uses a
: gigabyte of RAM. Person then goes on to say:
:
:    It's not that I think SAS is such great software,
:    it's not. But I really hate badly designed
:    software. R is designed by committee. Worse,
:    it's designed by a committee of statisticians.
:    They tend to confuse numerical analysis with
:    computer science and don't have any idea about
:    software development at all. The result is R.
:
:    I do hope [your colleague] won't have to waste time doing
:    [this analysis] in an outdated and poorly designed piece
:    of software like R.
:
: Would any of the committee like to respond to this? Or shall we just
: slap our collective forehead and wonder how someone could get such a view?

Does he have to repeatedly read in different large datasets or is this just a one time requirement? In the latter case, he could read in the data, save it (using the save command), and then just load it (using the load command) in subsequent sessions. He would only have to wait 10 minutes the first time. If he has that much data it's probably a large project, and a one time hit of 10 minutes versus several days, weeks or months of work seems negligible.
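The save/load round trip suggested here looks roughly like this. The small data frame stands in for the result of the slow import, and the temporary file name is only for illustration; in practice one would save next to the project.

```r
# Stand-in for the data frame produced by the slow 10-minute read.
big <- data.frame(id = 1:3, value = c(0.1, 0.2, 0.3))

# First session: store the object in R's binary format.
f <- tempfile(fileext = ".RData")
save(big, file = f)

# Later session: restore it under the same name without re-parsing the text file.
rm(big)
load(f)
unlink(f)
```

load() recreates the object under its original name, so later sessions can start from load() alone.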
[R] fl_show_fselector
Hi everybody,

I am new to xforms and I'm trying to use fl_show_fselector. In fact I did it without any problem. But now I'm getting a segmentation fault in this line of code:

  filename = fl_show_fselector("Select file to open", ".", "*.off", "");

The message is:

  In SetFont [fonts.c 224] Bad FontStyle request 0:
  Segmentation fault (core dumped)

I'm wondering what is happening here. Can anybody help me?

Thanks a lot,
Dimas
[R] [R-pkgs] MNP
We would like to announce the release of our software, which is now available through CRAN.

MNP: R Package for Fitting the Multinomial Probit Models

Abstract: MNP is a publicly available R package that fits the Bayesian multinomial probit models via Markov chain Monte Carlo. Along with the standard multinomial probit model, it can also fit models with different choice sets for each observation, and complete or partial ordering of all the available alternatives. The computation is based on the efficient marginal data augmentation algorithm that is developed by Imai and van Dyk (2004) ``A Bayesian Analysis of the Multinomial Probit Model Using the Data Augmentation,'' Journal of Econometrics, Forthcoming.

Kosuke Imai, Department of Politics, Princeton University
Jordan R. Vance, Department of Computer Science, Princeton University
David A. van Dyk, Department of Statistics, University of California, Irvine

___
R-packages mailing list
[EMAIL PROTECTED]
https://www.stat.math.ethz.ch/mailman/listinfo/r-packages
Re: [R] naive question
There are hints in the R Data Import/Export Manual. Just checking: you _have_ read it?

On Tue, 29 Jun 2004, Igor Rivin wrote:

> I have a 100Mb comma-separated file, and R takes several minutes to read it (via read.table()). This is R 1.9.0 on a linux box with a couple gigabytes of RAM. I am conjecturing that R is gc-ing, so maybe there is some command-line arg I can give it to convince it that I have a lot of space, or?!

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
[R] Re: [S] Different behaviour of unique(), R vs. Splus.
On Tue, 29 Jun 2004, Liaw, Andy wrote:

> The source of the incompatibility: [...]
>
> Unless there's some sort of coordination (or even just separate effort) on either/both R Core and Insightful developers to make sure there's agreement on what methods to provide in the base code, such problems can only get worse, not better, I guess.

There are plans to that effect, but R moves much faster than a commercial product such as S-PLUS.

It seems to me a bad idea that unique (or foo) does different things for matrices and data frames, for as we see frequently, many users do not distinguish between them.

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Re: [R] naive question
I did read the Import/Export document. It is true that replacing read.table by read.csv and setting comment.char="" speeds things up some (a factor of two?) -- this is very far from acceptable performance, being some two orders of magnitude worse than SAS (the IO of which is, in turn, much worse than that of the unix utilities (awk, sort, and so on)). Setting colClasses is suggested (and has been suggested by some in response to my question), but for a frame with some 60 columns, this is a major nuisance.
[OT] Ordering pizza [was Re: [R] anti-R vitriol]
Roger D. Peng wrote:

> I'm not too concerned about your colleague's view about R. S/He doesn't have to like it, and I don't think anyone actually believes that R is designed to make *everyone* happy. For me, R does about 99% of the things I need to do, but sadly, when I need to order a pizza, I still have to pick up the telephone.

There are several chains of pizzerias in the U.S. that provide for Internet-based ordering (e.g. www.papajohnsonline.com) so, with the Internet modules in R, it's only a matter of time before you will have a pizza-ordering function available.
Re: [OT] Ordering pizza [was Re: [R] anti-R vitriol]
Dang! You're making me hungry!

cheers,

Rolf Turner
Re: [OT] Ordering pizza [was Re: [R] anti-R vitriol]
On Tue, 29 Jun 2004, Douglas Bates wrote:

> Roger D. Peng wrote:
>
> > I'm not too concerned about your colleague's view about R. S/He doesn't have to like it, and I don't think anyone actually believes that R is designed to make *everyone* happy. For me, R does about 99% of the things I need to do, but sadly, when I need to order a pizza, I still have to pick up the telephone.
>
> There are several chains of pizzerias in the U.S. that provide for Internet-based ordering (e.g. www.papajohnsonline.com) so, with the Internet modules in R, it's only a matter of time before you will have a pizza-ordering function available.

Indeed, the GraphApp toolkit (used for the RGui interface under R for Windows, but Guido forgot to include it) provides one (for use in Sydney, Australia, we presume as that is where the GraphApp author hails from).

Alternatively, a Padovian has no need of ordering pizzas with both home and neighbourhood restaurants

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
[R] Is there a function for Principal Surface?
Dear All,

I know there are functions (PCURVE, PRINCURVE) in R packages to estimate the principal curve for a given set of data. Now I am wondering whether there are functions for estimating the principal surface. Please give me a hint if you know of a function or software (not limited to R) that can be used for principal surfaces.

Thanks for your help.

Fred
[R] abline and its objects
Hi R People:

Is there a way to put an abline line for its objects on a plot, please?

I have an its object, ibm2, which runs from January 2 through May 28.

  > ibm2
               ibm
  2004-01-02 91.55
  2004-01-05 93.05
  2004-01-06 93.06
  2004-01-07 92.78
  2004-01-08 93.04
  2004-01-09 91.21
  2004-01-12 91.55
  2004-01-13 89.70
  2004-01-14 90.31
  2004-01-15 94.02
  . . .

I plot the data. No problem.

Now I extract the first day of the month in this fashion.

  > zi <- extractIts(ibm2, weekday=T, find="first", period="month")
  > zi
               ibm
  2004-01-02 91.55
  2004-02-02 99.39
  2004-03-01 97.04
  2004-04-01 92.37
  2004-05-03 88.02

Still ok. I would like to put a vertical line at each of the zi values.

  > abline(v=zi, type="h", col=2)
  > lines(zi, type="h", col=2)

Nothing happens.

I tried creating another its object with NA in all but the zi places. Then I used

  > lines(test1)

Still nothing happened.

Any suggestions would be much appreciated.

R Version 1.9.1

Sincerely,
Laura H
mailto: [EMAIL PROTECTED]
RE: [R] abline and its objects
The problem is that you've instructed R to place lines at the values 91.55, 99.39, 97.04, 92.37 and 88.02, but these values do not correspond to the user coordinates of the x-axis (which you've specified to be dates).

Luckily, the dates where you need lines are in the rownames of zi. You do need to convert them to your user coordinates, and that depends on how plot decides to specify your user coordinates, which hinges on the range of your full data set (you clipped it). What's on your x-axis? Do par("usr") when you have one of the plots open and tell me what R says.

Kevin

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Laura Holt
Sent: Tuesday, June 29, 2004 2:06 PM
To: [EMAIL PROTECTED]
Subject: [R] abline and its objects
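To illustrate Kevin's point about user coordinates, here is a minimal sketch in base R. It does not use the its package at all: the series is made up (the prices are random numbers, not the IBM data), the x-axis is a plain Date vector, and the names days, price and first_days are only for this illustration. Whether an its plot uses the same x-axis units as a Date plot is an assumption that par("usr") would confirm or refute.

```r
# Made-up weekday series covering 2004-01-02 to 2004-05-28 (not the real IBM data)
days <- seq(as.Date("2004-01-02"), as.Date("2004-05-28"), by = "day")
wd   <- as.POSIXlt(days)$wday            # 0 = Sunday, 6 = Saturday
days <- days[!(wd %in% c(0, 6))]         # keep weekdays only
set.seed(1)
price <- 90 + cumsum(rnorm(length(days)))

# First available day of each month (days is sorted, so take the first
# occurrence of every year-month label)
first_days <- days[!duplicated(format(days, "%Y-%m"))]

plot(days, price, type = "l", xlab = "", ylab = "price")

# With a Date x-axis the user coordinates are days since 1970-01-01,
# so the vertical lines must be given on that scale, not as prices
abline(v = as.numeric(first_days), col = 2)
```

The key step is that v= receives positions on the x-axis scale (the dates), while the original call passed the prices stored in zi.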
RE: [R] naive question
From: [EMAIL PROTECTED]

> I did read the Import/Export document. It is true that replacing the read.table by read.csv and setting the commentChar= speeds things up some (a factor of two?) -- this is very far from acceptable performance, being some two orders of magnitude worse than SAS (the IO of which is, in turn, much worse than that of the unix utilities (awk, sort, and so on)) . Setting colClasses is suggested (and has been suggested by some in response to my question), but for a frame with some 60 columns, this is a major nuisance.

Please don't make _your_ nuisance into others'. Do read the posting guide as suggested above. You have not provided any info for anyone to give you any useful advice beyond those you said you received.

R is not all things to all people. If you are so annoyed, why not use SAS/awk/sort and so on?

[For my own education: How do you read the file into SAS without specifying column names and types?]

Andy
[R] Issue with ROracle and results not being returned
Hello,

I am using ROracle 5.5 with R 1.8.1. I am connecting to an Oracle database, issuing a query and attempting to fetch data from the result set. I can see my session in the Oracle database as well as the SQL which was executed (including number of blocks hit, etc). I have also verified that the SQL returns a valid result set from sqlplus.

Below is a sample trace of my session:

  > library(ROracle)
  > drv <- dbDriver("Oracle")
  > con <- dbConnect(drv, "perf/[EMAIL PROTECTED]")
  > rs1 <- dbSendQuery(con, statement = paste("SELECT distinct api FROM et_log_data order by api"))
  > df <- fetch(rs1, n=-1)
  > summary(rs1, verbose=T)
  OraResult:(25657,0,3)
    Statement: SELECT distinct api FROM et_log_data order by api
    Has completed? yes
    Affected rows: 0
    Rows fetched: -1
    Fields:
      name    Sclass     type len precision scale isVarLength nullOK
    1  API character VARCHAR2  50         0     0        TRUE  FALSE
  > summary(df, verbose=T)
       API
   Length:0
   Class :character
   Mode  :character
  > df
  [1] API
  <0 rows> (or 0-length row.names)
  > q()

Any ideas on why the data cannot be retrieved into the df object? Please remove _nospam from email address to email me directly.

Any help would be appreciated.

Coburn

(sample output from query via sqlplus)

  SQL> SELECT distinct api FROM et_log_data order by api
    2  ;

  API
  --
  ADD_ACCOUNT
  ADD_COMMENT
  ADD_DLR_CH_SUB
  ..
  54 total rows returned
Re: [R] rgl installation problems
Thanks for your replies. I do not HTML-ize my mail, but free email accounts do that and there is not a switch to turn it off. I apologize in advance.

I installed R from the redhat package provided by Martyn Plummer. It installed fine and without problems. I can use R and have installed and used other packages within R without any problems whatsoever. I do not think the problem is with R or its installation. I do think there is a problem with the installation of rgl_0.64-13.tar.gz on RedHat 9 (linux). So, if there is anybody out there who has installed successfully rgl_0.64-13.tar.gz on RedHat 9, I would like to know how.

Thanks so much,
Enrique

From: Peter Dalgaard [EMAIL PROTECTED]
To: Prof Brian Ripley [EMAIL PROTECTED]
CC: E GCP [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: [R] rgl installation problems
Date: 28 Jun 2004 20:05:34 +0200

> Prof Brian Ripley [EMAIL PROTECTED] writes:
>
> > On Mon, 28 Jun 2004, E GCP wrote:
> >
> > > Thanks for your quick replies, and excuse my naivete, but how do I fix the problem, so the rgl package installs?
> >
> > We have no idea what is wrong on your system -- all we can tell is that you have done something wrong but we were not sitting at your shoulder when you did. Perhaps you should try re-building R from scratch, paying close attention to any messages?
> >
> > Meanwhile, try to follow the posting guide and not HTML-ize your mail.
>
> Another thing to try is to install Martyn's RPM (for FC1) instead of what is there now. Seems to get things right for me on RH8. The demo is amazingly smooth even on this ancient machine, BTW.
Re: [R] Sorting elements in a data.frame
On Wed, 23-Jun-2004 at 07:29PM +0100, Dan Bolser wrote:

|
| Hi,
|
| I have data like this
|
| print(x)
|
| ID  VAL1  VAL2
| 1      2     6
| 2      4     9
| 3     45    12
| 4     99    44
|
| What I would like is data like this...
|
| ID  VAL1  VAL2
| 1      2     6
| 2      4     9
| 3     12    45
| 4     44    99
|
|
| So that my analysis of the ratio VAL2/VAL1 is somehow uniform.

By uniform, I'm guessing you want them to be >= 1.

If z is a vector of VAL2/VAL1 values, you can make them all >= 1 this way.

  z[z < 1] <- z[z < 1]^-1

Depending on just how you want to use them, there could be better ways but I've done enough guessing for now.

HTH

--
Patrick Connolly
HortResearch
Mt Albert
Auckland
New Zealand
Ph: +64-9 815 4200 x 7188
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~
I have the world`s largest collection of seashells. I keep it on all the beaches of the world ... Perhaps you`ve seen it.  ---Steven Wright
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~
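As a small worked version of the above, the sketch below rebuilds Dan's four rows and shows two routes to the same result: swapping the two columns row by row with pmin() and pmax() (which gives the table he asked for), and inverting the ratios that are below 1 as Patrick suggests. The variable names sorted and z are only for this illustration, and reading "uniform" as "ratio at least 1" is Patrick's guess, not something Dan confirmed.

```r
# The four rows from the question
x <- data.frame(ID = 1:4,
                VAL1 = c(2, 4, 45, 99),
                VAL2 = c(6, 9, 12, 44))

# Route 1: reorder within each row so that VAL1 <= VAL2
sorted <- data.frame(ID   = x$ID,
                     VAL1 = pmin(x$VAL1, x$VAL2),
                     VAL2 = pmax(x$VAL1, x$VAL2))

# Route 2: compute the ratio and invert the values below 1
z <- x$VAL2 / x$VAL1
z[z < 1] <- z[z < 1]^-1

# Both routes give the same ratios
all.equal(z, sorted$VAL2 / sorted$VAL1)
```

Route 1 keeps the reordered values available for other uses; route 2 is enough if only the ratio is needed.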
Re: [R] naive question
At 01:22 PM 6/29/2004, Igor Rivin wrote:

> I did read the Import/Export document. It is true that replacing the read.table by read.csv and setting the commentChar= speeds things up some (a factor of two?) -- this is very far from acceptable performance, being some two orders of magnitude worse than SAS (the IO of which is, in turn, much worse than that of the unix utilities (awk, sort, and so on)) . Setting colClasses is suggested (and has been suggested by some in response to my question), but for a frame with some 60 columns, this is a major nuisance.

Feel free to contribute to the project. Whining and complaining won't get you anywhere. SAS *is* faster at I/O. So what?

Dr. Marc R. Feldesman
Professor and Chairman Emeritus
Anthropology Department - Portland State University
email: [EMAIL PROTECTED]
email: [EMAIL PROTECTED]
fax: 503-725-3905

"Don't knock on my door if you don't know my Rottweiler's name"  Warren Zevon
"Its midnight and I'm not famous yet"  Jimmy Buffett
[R] nls fitting problems (singularity)
Hello!

I have a problem with fitting data with nls. The first example with y1 (data frame df1) shows an error, the second works fine.

Is there a possibility to get a fit (e.g. JMP can also fit data that I cannot manage to fit with R)? Sometimes I also get an error about a singularity with the starting parameters.

  # x-values
  x <- c(-1,5,8,11,13,15,16,17,18,19,21,22)
  # y1-values (first data set)
  y1 = c(-55,-22,-13,-11,-9.7,-1.4,-0.22,5.3,8.5,10,14,20)
  # y2-values (second data set)
  y2 = c(-92,-42,-15,1.3,2.7,8.7,9.7,13,11,19,18,22)

  # data frames
  df1 <- data.frame(x=x, y=y1)
  df2 <- data.frame(x=x, y=y2)

  # start list for parameters
  sl <- list(d=0, b=10, c1=90, c2=20)

  # y1-Analysis - Result: Error in ... singular gradient
  nls(y~d+(x-b)*c1*(x-b<0)+(x-b)*c2*(x-b>=0), data=df1, start=sl)
  # y2-Analysis - Result: working...
  nls(y~d+(x-b)*c1*(x-b<0)+(x-b)*c2*(x-b>=0), data=df2, start=sl)

  # plots to look at data
  par(mfrow=c(1,2))
  plot(df1$x,df1$y)
  plot(df2$x,df2$y)

Perhaps there is another fitting routine? Can anybody help?

Best wishes,
Karl
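One possible workaround, offered only as a sketch and not as something suggested in this thread: the model above is linear in d, c1 and c2 once the breakpoint b is fixed, so b can be chosen by a grid search while lm() fits the other three parameters. The function name rss_at, the grid b_grid and its limits (0 to 21, inside the range of x) are assumptions made for this illustration; this is a different technique from nls and it does not explain why nls reports a singular gradient for y1.

```r
# Data from the post (first data set)
x  <- c(-1, 5, 8, 11, 13, 15, 16, 17, 18, 19, 21, 22)
y1 <- c(-55, -22, -13, -11, -9.7, -1.4, -0.22, 5.3, 8.5, 10, 14, 20)

# Residual sum of squares of the broken-line model for a fixed breakpoint b.
# pmin(x - b, 0) equals (x - b) * (x - b < 0) and
# pmax(x - b, 0) equals (x - b) * (x - b >= 0), as in the nls formula.
rss_at <- function(b, x, y) {
  fit <- lm(y ~ pmin(x - b, 0) + pmax(x - b, 0))
  sum(resid(fit)^2)
}

# Grid search over candidate breakpoints (hypothetical grid)
b_grid <- seq(0, 21, by = 0.1)
rss    <- sapply(b_grid, rss_at, x = x, y = y1)
b_hat  <- b_grid[which.min(rss)]

# Refit at the selected breakpoint; the coefficients are d, c1, c2 in that order
fit <- lm(y1 ~ pmin(x - b_hat, 0) + pmax(x - b_hat, 0))
b_hat
coef(fit)
```

The values b_hat and coef(fit) could also be passed to nls as starting values, on the assumption that a start closer to the optimum behaves better than b=10, c1=90, c2=20.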
RE: [R] naive question
R's IO is indeed 20 - 50 times slower than that of equivalent C code no matter what you do, which has been a pain for some of us. It does however help to read the Import/Export tips, as without them the ratio gets much worse. As Gabor G. suggested in another mail, if you use the file repeatedly you can convert it into internal format: read.table once into R and save using save()... This is much faster.

In my experience R is not so good at large data sets, where large is roughly 10% of your RAM.
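The read-once-then-save() idea can be sketched as follows. The file names are temporary files and the data frame is a small made-up stand-in for the real 100Mb file, so the sketch shows the workflow only and says nothing about actual timings.

```r
# Stand-in for a large CSV file (made-up data, temporary paths)
csv_file <- tempfile(fileext = ".csv")
rda_file <- tempfile(fileext = ".RData")
big <- data.frame(id    = 1:1000,
                  value = rnorm(1000),
                  label = sample(letters, 1000, replace = TRUE),
                  stringsAsFactors = FALSE)
write.csv(big, csv_file, row.names = FALSE)

# Slow step, done once: parse the text file
dat <- read.csv(csv_file, stringsAsFactors = FALSE)

# Store the parsed object in R's internal binary format
save(dat, file = rda_file)

# In later sessions: skip the parsing and restore the object directly
rm(dat)
load(rda_file)   # recreates `dat` in the workspace
str(dat)
```

load() restores the object under the name it was saved with, so later scripts can rely on dat being present after the call.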
Re: [R] anti-R vitriol
Barry Rowlingson [EMAIL PROTECTED] writes:

> It's not that I think SAS is such great software, it's not. But I really hate badly designed software. R is designed by committee. Worse, it's designed by a committee of statisticians. They tend to confuse numerical analysis with computer science and don't have any idea about software development at all. The result is R.

They'd probably prefer computer scientists and numerical analysts who confuse data munging with statistical data analysis, a common problem in mixed departments...

best,
-tony

--
[EMAIL PROTECTED]    http://www.analytics.washington.edu/
Biomedical and Health Informatics   University of Washington
Biostatistics, SCHARP/HVTN          Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email
Re: [R] naive question
I was not particularly annoyed, just disappointed, since R seems like a much better thing than SAS in general, and doing everything with a combination of hand-rolled tools is too much work. However, I do need to work with very large data sets, and if it takes 20 minutes to read them in, I have to explore other options (one of which might be S-PLUS, which claims scalability as a major , er, PLUS over R).
Re: [R] naive question
> At 01:22 PM 6/29/2004, Igor Rivin wrote:
>
> > I did read the Import/Export document. It is true that replacing the read.table by read.csv and setting the commentChar= speeds things up some (a factor of two?) -- this is very far from acceptable performance, being some two orders of magnitude worse than SAS (the IO of which is, in turn, much worse than that of the unix utilities (awk, sort, and so on)) . Setting colClasses is suggested (and has been suggested by some in response to my question), but for a frame with some 60 columns, this is a major nuisance.
>
> Feel free to contribute to the project. Whining and complaining won't get you anywhere. SAS *is* faster at I/O. So what?

Sigh. Why are you being defensive? If you read my message, you will see that what it comes down to is: I tried what to me are obvious things (some of which only become obvious after getting advice from people on this forum), and I cannot get the system to perform acceptably. The "whining and complaining" is actually an attempt to figure out whether I am missing something else obvious, because I find it hard to believe that I am the first one to face this. [I know I am not, actually, because a gentleman emailed me a response to the effect that he had to break up his similarly large file into several pieces to get acceptable performance -- I would say that this solution is rather hard on the user; and in this list's archive there was a posting back in '98, asking basically the same question as mine.]

As for contributing to the project, perhaps getting a response of the form "this is slow because we are trying to achieve this, and that, and the third thing, and this is the best compromise we seem to have come up with" might be more encouraging than "So what?"

Igor
Re: [R] Several PCA questions...
I have the following problem:

  cov(allDat, method='kendall')

where allDat is an 11,000 by 6 data.frame. Will the above ever finish on my home computer?
Re: [R] naive question
Igor Rivin wrote:

> I was not particularly annoyed, just disappointed, since R seems like a much better thing than SAS in general, and doing everything with a combination of hand-rolled tools is too much work. However, I do need to work with very large data sets, and if it takes 20 minutes to read them in, I have to explore other options (one of which might be S-PLUS, which claims scalability as a major , er, PLUS over R).

If you are routinely working with very large data sets it would be worthwhile learning to use a relational database (PostgreSQL, MySQL, even Access) to store the data and then access it from R with RODBC or one of the specialized database packages.

R is slow reading ASCII files because it is assembling the meta-data on the fly and it is continually checking the types of the variables being read. If you know all this information and build it into your table definitions, reading the data will be much faster.

A disadvantage of this approach is the need to learn yet another language and system. I was going to do an example but found I could not because I left all my SQL books at home (I'm travelling at the moment) and I couldn't remember the particular commands for loading a table from an ASCII file.
RE: [R] naive question
I am working with data sets that have 2 matrices of 300 columns by 19,000 rows, and I manage to get the data loaded in a reasonable amount of time. Once it's in, I save the workspace and load from there. Once I start doing some work on the data, I am taking up about 600 MB of RAM out of the 1 GB I have in the computer. I will soon upgrade to 2 GB because I will have to work with an even larger data matrix soon.

I must say that the speed of R, given what I have been doing, is acceptable.

Peter

At 07:59 PM 6/29/2004, Vadim Ogranovich wrote:

> R's IO is indeed 20 - 50 times slower than that of equivalent C code no matter what you do, which has been a pain for some of us. It does however help read the Import/Export tips as w/o them the ratio gets much worse. As Gabor G. suggested in another mail, if you use the file repeatedly you can convert it into internal format: read.table once into R and save using save()... This is much faster.
>
> In my experience R is not so good at large data sets, where large is roughly 10% of your RAM.
Re: [R] naive question
On Tue, 29 Jun 2004 16:59:58 -0700, Vadim Ogranovich [EMAIL PROTECTED] wrote:

> R's IO is indeed 20 - 50 times slower than that of equivalent C code no matter what you do, which has been a pain for some of us.

Things like this shouldn't be a pain for long. If C code works well, why not use C? It wouldn't be hard to write two C functions that

 1. counted the lines and
 2. read them into preallocated vectors.

Doing it this way you could use .C, you don't need to learn the intricacies of .Call, and it should be about half the speed (since it takes two passes) of fast C code, i.e. 10-25 times faster than the read.* functions.

Then, if you felt really ambitious, you could write it in a way that others could use, put it in a package, and suddenly R would have I/O 10-25 times faster than it does now. You wouldn't try to make it as flexible as current R code, but for reading these huge files people are talking about, it would be worthwhile to go through a few extra setup steps.

Duncan Murdoch
Re: [R] naive question
> We need more details about your problem to provide any useful help. Are all the variables numeric? Are they all completely different? Is it possible to use `colClasses'?

It is possible, but very inconvenient. There are mostly numeric columns, but some integer categories, and some string names. The total number is high, so doing this by hand would take several minutes as well, so a different solution is preferable. I did use as.is=TRUE, but that did not seem to make a huge difference.

> Also, having a couple of gigabytes of RAM is not necessarily useful if you're on a 32-bit OS since the total process size is usually limited to be less than ~3GB.

True. top shows that the maximal memory usage for the process is about 700MB, so process size was not a limitation (but had I 512Mb, the thrashing would have killed me...)

> Believe it or not, complaints like these are not that common. 1998 was a long time ago!

Alas...

> -roger
>
> Igor Rivin wrote:
>
> > I have a 100Mb comma-separated file, and R takes several minutes to read it (via read.table()). This is R 1.9.0 on a linux box with a couple gigabytes of RAM. I am conjecturing that R is gc-ing, so maybe there is some command-line arg I can give it to convince it that I have a lot of space, or?!
> >
> > Thanks!
> >
> > Igor
>
> --
> Roger D. Peng
> http://www.biostat.jhsph.edu/~rpeng/
Re: [R] naive question
> Also, having a couple of gigabytes of RAM is not necessarily useful if you're on a 32-bit OS since the total process size is usually limited to be less than ~3GB.

Well, 2^32 gives you more like 4 GB; how much of that can be given to a process? My highest workspace reached 1.2 Gig. I will add another Gig ... or 2.

I am assuming that R can address more than 2 Gig of memory. Does anybody know if R has some other limitation that might be lower than the OS?

Peter
Re: [R] naive question
> Igor Rivin wrote:
>
> > I was not particularly annoyed, just disappointed, since R seems like a much better thing than SAS in general, and doing everything with a combination of hand-rolled tools is too much work. However, I do need to work with very large data sets, and if it takes 20 minutes to read them in, I have to explore other options (one of which might be S-PLUS, which claims scalability as a major , er, PLUS over R).
>
> If you are routinely working with very large data sets it would be worthwhile learning to use a relational database (PostgreSQL, MySQL, even Access) to store the data and then access it from R with RODBC or one of the specialized database packages.

I was thinking about that, but I had thought that this would help for reading small pieces of the data (since subsetting would happen on the db side), but not so much for reading big chunks. But it's certainly worth a try.

> R is slow reading ASCII files because it is assembling the meta-data on the fly and it is continually checking the types of the variables being read. If you know all this information and build it into your table definitions, reading the data will be much faster.

What do you mean by meta-data? Anyway, I agree that this would slow it down, but I would suspect that even so there is a bit of room for improvement, since five minutes for 12 million tokens comes out to about 40,000 tokens/second, which is really pretty bad on a 2-3 GHz machine...

> A disadvantage of this approach is the need to learn yet another language and system. I was going to do an example but found I could not because I left all my SQL books at home (I'm travelling at the moment) and I couldn't remember the particular commands for loading a table from an ASCII file.

Well, I will look into it (among other possibilities).
Re: [R] naive question
On Tue, 29-Jun-2004 at 10:31PM -0400, [EMAIL PROTECTED] wrote:

| We need more details about your problem to provide any useful
| help. Are all the variables numeric? Are they all completely
| different? Is it possible to use `colClasses'?
|
| It is possible, but very inconvenient. There are mostly numeric columns,
| but some integer categories, and some string names. The total number is
| high, so doing this by hand would take several minutes as well, so a
| different solution is preferable. I did use as.is=TRUE, but that did not

For the lazy typist, here's an idea:

Make a small subset of the datafile (say the first 20 rows) and read that in with read.table.

  X <- read.table("blah.txt", header = TRUE)
  Xclasses <- sapply(X, class)

Now we have a nice long vector that you can use in your colClasses argument with the whole data. Even if it needs a bit of editing, it will save you typing in all those "numeric" strings.

HTH

--
Patrick Connolly
HortResearch
Mt Albert
Auckland
New Zealand
Ph: +64-9 815 4200 x 7188
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~
I have the world`s largest collection of seashells. I keep it on all the beaches of the world ... Perhaps you`ve seen it.  ---Steven Wright
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~
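Patrick's trick can be tried end to end with the sketch below. It uses the nrows argument of read.csv to read only the first 20 rows instead of preparing a separate subset file, which is a small variation on what he describes; the file is a temporary made-up CSV and the names head_rows and classes are only for this illustration.

```r
# Made-up CSV standing in for the real file (temporary path)
csv_file <- tempfile(fileext = ".csv")
full <- data.frame(id    = 1:500,
                   score = runif(500),
                   name  = sample(c("a", "b", "c"), 500, replace = TRUE),
                   stringsAsFactors = FALSE)
write.csv(full, csv_file, row.names = FALSE)

# Step 1: read only the first 20 rows and record the class of each column
head_rows <- read.csv(csv_file, nrows = 20, stringsAsFactors = FALSE)
classes   <- sapply(head_rows, class)
classes

# Step 2: read the whole file with the classes supplied up front,
# so read.csv does not have to guess the type of every column
dat <- read.csv(csv_file, colClasses = classes, comment.char = "")
str(dat)
```

One caveat, in line with Patrick's "a bit of editing": a column that looks like integer in the first 20 rows but contains decimals or text further down will make the full read fail, so such entries of classes need to be changed by hand (for example to "numeric" or "character").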
Re: [R] naive question
> On Tue, 29-Jun-2004 at 10:31PM -0400, [EMAIL PROTECTED] wrote:
>
> | We need more details about your problem to provide any useful
> | help. Are all the variables numeric? Are they all completely
> | different? Is it possible to use `colClasses'?
> |
> | It is possible, but very inconvenient. There are mostly numeric columns,
> | but some integer categories, and some string names. The total number is
> | high, so doing this by hand would take several minutes as well, so a
> | different solution is preferable. I did use as.is=TRUE, but that did not
>
> For the lazy typist, here's an idea:
>
> Make a small subset of the datafile (say the first 20 rows) and read that in with read.table.
>
>   X <- read.table("blah.txt", header = TRUE)
>   Xclasses <- sapply(X, class)
>
> Now we have a nice long vector that you can use in your colClasses argument with the whole data. Even if it needs a bit of editing, it will save you typing in all those "numeric" strings.
>
> HTH

Aha! That could be the right trick. I will try it and see how it works...

Thanks,

Igor
[R] MacOS X binaries won't install
I've tried installing the MacOS X binaries for R available at: http://www.bioconductor.org/CRAN/

I'm running MacOS X version 10.2.8. I get a message indicating the installation is successful, but when I double-click on the R icon that shows up in my Applications folder, the application seems to try to open but closes immediately.

I looked for /Library/Frameworks/R.framework (by typing ls /Library/Frameworks) and it does not appear. A global search for R.framework yields no results, so it seems that the installation is not working. (I was going to try command line execution.)

Any help would be appreciated. Thanks!

- RSS

Ruben S. Solis
[R] gap() in SAGx package - How to handle this situation?
gap(data, cluster)

Arguments:
  data    - the data matrix
  cluster - a vector describing the cluster memberships

When I use gap() with a 300 * 40 data matrix, it works. With a 40 * 300 data matrix, it does not work. This is the error message:

  Error in data %*% t(veigen) : non-conformable arguments

The gap function:

  function (data = swiss, class = g, B = 500)
  {
      library(mva)
      if (min(table(class)) == 1)
          stop("Singleton clusters not allowed")
      data <- as.matrix(data)
      temp1 <- log(sum(by(data, factor(class), intern <- function(x) sum(dist(x)/ncol(x))/2)))
      veigen <- svd(data)$v
      x1 <- data %*% t(veigen)    # <- the error occurs here
      ...

Example: when the gap function computes the singular value decomposition X = UDV, where X is a data matrix of dimension 30*400, we expect the following results: U: 30*30, D: 30*400, V: 400*400. But the gap function creates U = 30*30, D = 30*400, V = 400*30.

How should we handle this error message? Thanks in advance.

Best regards.
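The dimensions in question can be checked directly with base R's svd(), as in the sketch below. The matrix is random made-up data of the 30 * 400 shape mentioned in the post, and the object names (s, ok, err, s_full) are only for this illustration. The sketch shows why the product is non-conformable; whether changing the call inside gap() is statistically appropriate is a separate question that it does not answer.

```r
set.seed(42)
X <- matrix(rnorm(30 * 400), nrow = 30, ncol = 400)   # made-up wide matrix

# By default svd() returns the "thin" decomposition for an n x p matrix:
# u is n x min(n, p), d has length min(n, p), v is p x min(n, p)
s <- svd(X)
dim(s$u)      # 30 30
length(s$d)   # 30
dim(s$v)      # 400 30

# X is 30 x 400 and s$v is 400 x 30, so X %*% s$v is conformable ...
ok <- X %*% s$v
dim(ok)       # 30 30

# ... but t(s$v) is 30 x 400, and (30 x 400) %*% (30 x 400) is not
err <- try(X %*% t(s$v), silent = TRUE)
inherits(err, "try-error")   # TRUE

# Asking for all right singular vectors gives the 400 x 400 matrix V
s_full <- svd(X, nu = 30, nv = 400)
dim(s_full$v) # 400 400
```

For a tall matrix such as 300 * 40, v is 40 * 40, so t(v) has the same shape as v and the product in gap() happens to be conformable, which matches the behaviour described in the post.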
[R] funny plotting
Hi,

I just wanted to plot a boxplot with a nice curve going through it. I thought this would be a simple task, but for some reason I can't get the two graphs on the same page accurately. Enclosed is the code showing the two plots separately and together. I would have thought it should work if I could use boxplot() and then overlay plot(), but it won't allow the argument add=TRUE (which has worked for me in the past).

Thanks

Carla

P.S. please excuse the clumsy code!

#Section 2 Data Set particle
dial <- rbind(-1, -1, 0, 0, 0, 0, 1, 1, 1)
counts <- rbind(2, 3, 6, 7, 8, 9, 10, 12, 15)
particle <- as.data.frame(cbind(dial,counts),row.names=NULL)
names(particle) <- c("dial","counts")
attach(particle)
pois.particle <- glm(counts~dial,family=poisson)
x <- seq(-2,2,length=20)
y <- predict(pois.particle,data.frame(dial=x),type="response")

#Overlaying plots
x11()
boxplot(counts~dial,main="Boxplot of counts for dial setting and poisson fit",ylim=c(0,25))
lines(x,y)

#The separate plots
x11()
boxplot(counts~dial,ylim=c(0,25))
x11()
plot(x,y,ylim=c(0,25),type="l")
RE: [R] funny plotting
Try something like

plot(x,y)
box.dat <- boxplot(x=split(counts, dial), plot=FALSE)
bxp(box.dat, add=TRUE, at=c(-1, 0, 1), show.names=FALSE)

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Karla Meurk
Sent: Tuesday, June 29, 2004 21:46 PM
To: [EMAIL PROTECTED]
Subject: [R] funny plotting
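For reference, the suggestion above can be put together with the data from the original post into one self-contained script. This is a sketch, not a tested recipe from either poster: it assumes the curve and the boxes were misaligned because boxplot() places its boxes at positions 1, 2, 3 while the fitted curve is drawn on the dial scale from -2 to 2. The plain vectors, the axis labels and the name `fit` are choices made here for illustration.

```r
# Data from the original post, as plain vectors instead of rbind() columns.
dial   <- c(-1, -1, 0, 0, 0, 0, 1, 1, 1)
counts <- c(2, 3, 6, 7, 8, 9, 10, 12, 15)

# Poisson regression of counts on dial, and the fitted mean on a grid.
fit <- glm(counts ~ dial, family = poisson)
x <- seq(-2, 2, length.out = 20)
y <- predict(fit, newdata = data.frame(dial = x), type = "response")

# Draw the curve first so that the x axis is on the dial scale.
plot(x, y, type = "l", ylim = c(0, 25), xlab = "dial", ylab = "counts")

# Compute the box statistics without plotting, then add the boxes at the
# actual dial values rather than at the default positions 1, 2, 3.
box.dat <- boxplot(split(counts, dial), plot = FALSE)
bxp(box.dat, add = TRUE, at = c(-1, 0, 1), show.names = FALSE)
```

Drawing the curve first and adding the boxes afterwards keeps a single coordinate system for both, which is why the `at` argument can use the dial values directly.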