[R] fractional ranks
Hi, is there a function to calculate fractional ranks? Thanks
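[Editor's sketch, assuming "fractional ranks" means either ties averaged to fractional values or ranks rescaled to (0, 1]; both are available from base R's rank():]

    x <- c(10, 20, 20, 30)

    # rank() averages ties by default, giving fractional ranks such as 2.5
    rank(x)              # 1.0 2.5 2.5 4.0

    # relative (fractional) ranks on (0, 1]: rank divided by the number of values
    rank(x) / length(x)  # 0.250 0.625 0.625 1.000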
[R] R: Securities earning covariance
Thank you for your very fast response. I just tried to use the zoo package, after having read the vignettes, but I get this error message:

    Warning messages:
    1: In x$DAY : $ operator is invalid for atomic vectors, returning NULL
    2: In x$EARNINGS : $ operator is invalid for atomic vectors, returning NULL
    3: In x$DAY : $ operator is invalid for atomic vectors, returning NULL
    4: In x$EARNINGS : $ operator is invalid for atomic vectors, returning NULL
    5: In x$DAY : $ operator is invalid for atomic vectors, returning NULL
    6: In x$EARNINGS : $ operator is invalid for atomic vectors, returning NULL

Am I missing something? Thank you again

Angelo Linardi

-----Original Message-----
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 5 June 2008 17:55
To: LINARDI ANGELO
Cc: r-help@r-project.org
Subject: Re: [R] Securities earning covariance

Check out the three vignettes (i.e. pdf documents in the zoo package). e.g.

    Lines <- "SEC_ID DAY EARNING
    IT001 20070101 5.467
    IT001 20070102 5.456
    IT001 20070103 4.954
    IT001 20070104 3.456
    IT002 20070101 1.456
    IT002 20070102 1.345
    IT002 20070103 1.233
    IT003 20070101 0.345
    IT003 20070102 0.367
    IT003 20070103 0.319"

    DF <- read.table(textConnection(Lines), header = TRUE)
    DFs <- split(DF, DF$SEC_ID)
    library(zoo)
    f <- function(DF.) zoo(DF.$EARNING, as.Date(format(DF.$DAY), "%Y%m%d"))
    z <- do.call(merge, lapply(DFs, f))
    cov(z)  # uses n-1

On Thu, Jun 5, 2008 at 11:41 AM, [EMAIL PROTECTED] wrote:

Good morning, I am a new R user and I am trying to learn how to use it. I am trying to solve this problem. I have a data frame df of daily securities earnings (for a year) as follows:

    SEC_ID DAY      EARNING
    IT001  20070101 5.467
    IT001  20070102 5.456
    IT001  20070103 4.954
    IT001  20070104 3.456
    ..
    IT002  20070101 1.456
    IT002  20070102 1.345
    IT002  20070103 1.233
    ..
    IT003  20070101 0.345
    IT003  20070102 0.367
    IT003  20070103 0.319
    ..

And so on: about 800 different SEC_ID and about 18 rows. I have to calculate the covariance for each couple of securities x and y according to the formula:

    Cov(x,y) = (sum[(x-x')*(y-y')]/N)/(sx*sy)

where x' and y' are the means of the securities' earnings in the year, N is the number of observations, and sx and sy are the standard deviations of x and y. To do this I could build a data frame df2 like this:

    DAY      SEC_ID.x SEC_ID.y EARNING.x EARNING.y x' y' sx sy
    20070101 IT001    IT002    5.467     1.456     a  b  aa bb
    20070101 IT001    IT003    5.467     0.345     a  c  aa cc
    20070101 IT002    IT003    1.456     0.345     b  c  bb cc
    20070102 IT001    IT002    5.456     1.345     a  b  aa bb
    20070102 IT001    IT003    5.456     0.367     a  c  aa cc
    20070102 IT002    IT003    1.345     0.367     b  c  bb cc
    ...

(merging df with itself with the condition SEC_ID.x < SEC_ID.y) and then easily calculate the formula; but the dimensions are too big (the process stops with an out-of-memory message). Besides partitioning the input and using a loop, are there any smarter solutions (eventually using split and other ways of subgroup merging) to solve the problem? Are there any shortcuts using statistical built-in functions (e.g. cov, vcov)?

Thank you in advance
Angelo Linardi
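[Editor's note: dividing by sx*sy, as the posted formula does, gives the correlation rather than the covariance, so once z exists the matching results come from cor(), or from rescaling the covariance matrix. A minimal sketch, assuming the zoo object z from Gabor's example:]

    # correlation matrix directly (this is what the posted formula computes)
    cor(z, use = "pairwise.complete.obs")

    # or rescale a covariance matrix into a correlation matrix
    cov2cor(cov(z, use = "pairwise.complete.obs"))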
[R] R: Securities earning covariance
It works perfectly, thank you so much. Now I will try to put the results into a suitable form (a data frame like this):

    SEC_ID.x SEC_ID.y EARN_COV

Thank you again
Angelo Linardi

-----Original Message-----
From: Patrick Burns [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 5 June 2008 18:11
To: LINARDI ANGELO
Cc: r-help@r-project.org
Subject: Re: [R] Securities earning covariance

I would start by creating a matrix that holds the returns, with rows being the dates and columns being the securities. You can do this by something along the lines of:

    days <- as.character(df[, 'DAY'])
    sec  <- as.character(df[, 'SEC_ID'])
    earningmat <- array(NA, c(length(unique(days)), length(unique(sec))),
                        list(sort(unique(days)), unique(sec)))
    submat <- cbind(match(days, rownames(earningmat)),
                    match(sec, colnames(earningmat)))
    earningmat[submat] <- as.numeric(as.character(df[, 'EARNING']))

Notice that the 'as.numeric(as.character())' in the last line may not be needed -- but if it is needed, it is needed in a big way. If the 'EARNING' column is a factor (because there was at least one item that didn't appear to be numeric when it was read in), then skipping the 'as.numeric(as.character())' call will put the codes for the factor into the matrix. It will be numeric as you expect, but complete garbage.

The trick with 'submat' is explained in any complete description of subscripting -- the subscripting section of Chapter 1 of S Poetry, for instance.

Once you have a suitable matrix, then you can use 'var' or some other function to get the variance matrix. Depending on where you are going, a factor model variance may be better. You can get 'factor.model.stat' from the public domain area of the Burns Statistics website. This is especially useful if there are missing values in your matrix.

Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and A Guide for the Unwilling S User)

[EMAIL PROTECTED] wrote:

Good morning, I am a new R user and I am trying to learn how to use it. [rest of the original question snipped; it is quoted in full in the previous message]
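[Editor's sketch of the reshaping step Angelo mentions above: to get from the variance matrix back to the three-column data frame, one route is via as.table(). This assumes the earningmat matrix from Patrick's reply:]

    covmat <- var(earningmat, use = "pairwise.complete.obs")

    # melt the matrix into (row, column, value) triples
    longform <- as.data.frame(as.table(covmat))
    names(longform) <- c("SEC_ID.x", "SEC_ID.y", "EARN_COV")

    # keep each unordered pair of securities only once
    longform <- longform[as.integer(longform$SEC_ID.x) <
                         as.integer(longform$SEC_ID.y), ]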
Re: [R] Getting R and x11 to work
On Fri, 6 Jun 2008, Rick Bilonick wrote:

I'm using Suse Linux Enterprise Desktop 10.2 (SP2) on an HP 2133 (x86) mini-notebook. (There apparently are a LOT of bugs in 10.1!) I downloaded R-base from the openSuse 10.2 repository and was (finally) able to install it (after installing blas and gcc-fortran). I can start an R session and do computations. When I try to do any graphics using x11, I get the message:

    unable to load shared library '/usr/lib/R/modules//R_X11.so':
    /usr/lib/R/modules//R_X11.so: undefined symbol: cairo_image_surface_get_data

Does anyone have an idea on how to fix this?

Yes, your binary version of R is incompatible with the version of cairo you have installed (if you have one). Really the RPM should have checked that, so please report it to the maintainer.

Short-term fix: set X11.options(type = "Xlib") in the session, or in .Rprofile via

    setHook(packageEvent("grDevices", "onLoad"),
            function(...) grDevices::X11.options(type = "Xlib"))

Longer-term fix: install or update cairo >= 1.2 (and preferably >= 1.4).

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
[R] Multiple comment.char under read.table
Hi all,

Suppose I want to read a text file with read.table. It contains lines to be skipped that begin with ! and ^. Is there a way to include these two values in the read.table call? I tried this, but it doesn't seem to work:

    dat <- read.table("mydata.txt", comment.char = c("!", "^"),
                      na.strings = "null", sep = "\t")

Please advise.

--
Gundala Viswanath
Re: [R] power of a multiway ANOVA
thank you ! :)

2008/6/5 Rolf Turner [EMAIL PROTECTED]:

On 6/06/2008, at 1:08 AM, biologeeks wrote:

Dear all, in the package pwr there is the function power.anova.test, which permits obtaining the power for a one-way ANOVA... but I'm looking for a way to compute the power of a multiway ANOVA (i.e. find 1-beta). Is it possible? Do you have some ideas?

The cumulative F distribution function pf in R allows for a non-centrality parameter, so you can calculate the power of any F-test, against any properly specified alternative hypothesis, if you know what you are doing. (And if you don't know what you're doing, don't do it.)

cheers,

Rolf Turner
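[Editor's sketch of Rolf's suggestion, with made-up degrees of freedom and noncentrality: the power of an F-test at level alpha is the probability that a noncentral F exceeds the central critical value.]

    # hypothetical multiway-ANOVA effect: df1 = 3, df2 = 36,
    # noncentrality lambda -- here simply assumed, in practice derived
    # from the effect size and sample size
    alpha  <- 0.05
    df1    <- 3
    df2    <- 36
    lambda <- 10

    Fcrit <- qf(1 - alpha, df1, df2)               # central critical value
    power <- 1 - pf(Fcrit, df1, df2, ncp = lambda) # P(reject | alternative)
    power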
[R] How can I display a characters table ?
I would like to generate graphics text. I have a 67x2 table with a 5-character string in column 1 and a 2-character string in column 2. Is it possible to make such a table appear in a graphics window or a message-box pop-up window? Thank you so much.

--
Maura E.M
Re: [R] (baseline) logistic regression + gof functions?
Hi,

I'm not sure why you think glm doesn't provide goodness of fit tests. Have a look at anova.glm and summary.glm. All the functions you mention can deal with multiple predictors. multinom deals with non-binary data. lrm will deal with ordinal data as well as binary. polr (in the MASS package) will also do ordinal logistic regression.

David

On Thu, Jun 5, 2008 at 10:33 PM, Wim Bertels [EMAIL PROTECTED] wrote:

Hallo, which function can I use to do (baseline) logistic regression + goodness of fit tests? So far I found:

# logistic on binary data: lrm combined with resid(model, 'gof')
# logistic on binary data: glm, with no gof-test
# baseline logit on binary data: multinom, with no gof-test
(# also, what if the data are not binary and there is more than one predictor in the model?)

Hints? Suggestions? Other functions that might help?

mvg,
Wim
[R] modifying tbrm function
Hi,

I don't have much experience writing functions and would like to modify the simple tbrm() function from package dplR in order to save the weights that it produces. I have tried using the superassignment operator as explained in the R intro, but is this the right way to save a variable from within a function? This is my code:

    mytukey <- function(x, C = 9)
    {
        wt <- rep(0, length(x))
        x.med <- median(x)
        S.star <- median(abs(x - x.med))
        w0 <- (x - x.med)/(C * S.star + 1e-06)
        lt0.flag <- abs(w0) <= 1
        wt[lt0.flag] <- ((1 - w0^2)^2)[lt0.flag]
        t.bi.m <- sum(wt * x)/sum(wt)
        myweights <<- wt  # this is my added line
        t.bi.m
    }

Thanks,
D.
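[Editor's note: superassignment works, but it writes into the global workspace as a side effect, which is easy to get wrong. A more conventional sketch is to return both results from the function:]

    # same computation, but returning the value and the weights together
    mytukey2 <- function(x, C = 9) {
        wt <- rep(0, length(x))
        x.med <- median(x)
        S.star <- median(abs(x - x.med))
        w0 <- (x - x.med) / (C * S.star + 1e-06)
        keep <- abs(w0) <= 1
        wt[keep] <- ((1 - w0^2)^2)[keep]
        list(value = sum(wt * x) / sum(wt), weights = wt)
    }

    res <- mytukey2(rnorm(20))
    res$value    # the biweight mean
    res$weights  # the saved weights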
[R] Lattice: key does not accept German umlaute
    library(lattice)

    ## works as expected
    xyplot(1~1, key = list(text = list(c("Maenner"))))

    ## works as expected
    xyplot(1~1, key = list(text = list(c("Maenner"))), xlab = "M\344nner")

    ## gives an error
    xyplot(1~1, key = list(text = list(c("M\344nner"))))

Is this a bug?

TIA,

Bernd
[R] which question
Hello list,

I was trying to select a column of a data frame using the which command. I was actually selecting the rows of the data frame using which, and then displaying a certain column of them. The command that I was using is:

    sequence <- mydata[which(human[,3] %in% genes.sam.names), 9]

In the above command, mydata is my data frame and 9 is the column which I want to display. The rest are just other variables that I use. The which command is supposed to retrieve the rows of interest. The rows are well retrieved; however, if for a certain row column 9 is NA, the respective element of column 10 is displayed. How can I fix that?

Thank you very much,
Eleni
Re: [R] label outliers in geom_boxplot (ggplot2)
hadley wickham wrote:

2008/5/27 Mihalicza Péter [EMAIL PROTECTED]:

Dear List and Hadley,

I would like to have a boxplot with ggplot2 and have the outlier values labelled with their name attribute. So I did

    library(ggplot2)
    dat <- data.frame(num = rep(1, 20), val = c(runif(18), 3, 3.5),
                      name = letters[1:20])
    p <- ggplot(dat, aes(y = val, x = num)) +
         geom_boxplot(outlier.size = 4, outlier.colour = "green")
    p + geom_text(label = dat$name)

But this -- of course -- labels all the data points. So I searched high and low to find a way to label only the outliers, but I couldn't find any solution. Probably my keywords were inappropriate, but I looked at the ggplot website and the book also. So I did this:

    boxout <- boxplot(dat$val)$out
    outname <- as.character(dat$name)
    outname[(dat$val %in% boxout) == FALSE] <- "\n"
    p + geom_text(label = outname)

This works, but seems like a hack to me. Is there an obvious solution that I am missing?

I don't think so. This type of problem (where you need to independently access the statistics generated by ggplot) does come up fairly often, but I don't have any particularly good solution for it.

It's too obvious, so I am positive that there is a good reason for not doing this, but still: why is it not possible to have an outlier output in stat_boxplot that can be used in geom_text()? Something like this, with upper:

    dat <- data.frame(num = rep(1, 20), val = c(runif(18), 3, 3.5),
                      name = letters[1:20])
    ggplot(dat, aes(y = val, x = num)) +
        stat_boxplot(outlier.size = 4, outlier.colour = "green") +
        geom_text(aes(y = ..upper..), label = "This is upper hinge")

Unfortunately, this does not work and gives the error message:

    Error in eval(expr, envir, enclos) : object "upper" not found

Is it because you can only use stat outputs within the stat statements? Could it be possible to make them available outside the statements too?

P.S. Sorry for taking so long to respond, I've been at my sister's wedding in New Zealand

Thanks for the answer and happy marriage to your sister!

Peter
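[Editor's sketch of one way to label only the outliers without padding the label vector: give geom_text its own data, computed with boxplot.stats(). Written against current ggplot2, using the dat frame from above:]

    library(ggplot2)
    dat <- data.frame(num = rep(1, 20), val = c(runif(18), 3, 3.5),
                      name = letters[1:20])

    # rows flagged as outliers by the usual 1.5*IQR boxplot rule
    out <- subset(dat, val %in% boxplot.stats(dat$val)$out)

    ggplot(dat, aes(x = num, y = val)) +
        geom_boxplot() +
        geom_text(data = out, aes(label = name), hjust = -0.5)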
Re: [R] (baseline) logistic regression + gof functions?
Thanks for the quick reply David; so far this sums up to:

# logistic on binary data: lrm combined with resid(model, 'gof')
# logistic on raw binary data: glm, with gof using anova.glm (I think that anova.glm only makes sense on grouped binary data, not on raw binary data...) (so what is the gof for raw binary data and glm?)
# baseline logit on raw multicategory data: multinom, with no gof-test (the only reference I found was by GOEMAN Jelle J. and LE CESSIE Saskia, e.g. https://openaccess.leidenuniv.nl/bitstream/1887/4324/22/04.pdf but there is no R implementation?)
(# also, what if the data are not binary and there is more than one predictor in the model?)
# what if the grouped data are very unbalanced and might have a lot of empty cell counts?

Hints? Suggestions? Other functions that might help?

e.g.: by grouped data I mean (example for binomial):

    Var1 (with 3 levels)
        L1 L2 L3
    T    1  1  2
    F    0  2  0

by raw data I mean:

    Bin Var1
    T   L1
    T   L2
    T   L3
    T   L3
    F   L2
    F   L2

mvg,
Wim
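[Editor's sketch: for grouped binomial data, one common glm goodness-of-fit check is to compare the residual deviance to a chi-squared distribution on the residual degrees of freedom. This is only valid when the group counts are reasonably large, which is exactly why it does not apply to raw 0/1 rows. Made-up grouped data:]

    # hypothetical grouped binomial data: successes/failures per x level
    d <- data.frame(x    = 1:5,
                    succ = c(2, 5, 9, 14, 18),
                    fail = c(18, 15, 11, 6, 2))

    fit <- glm(cbind(succ, fail) ~ x, family = binomial, data = d)

    # deviance goodness-of-fit test (grouped data only, not raw binary data)
    pchisq(deviance(fit), df.residual(fit), lower.tail = FALSE)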
[R] bartlett-test
I have transformed my data to a data frame named df with only column names (no row names). Each column represents one sample with 3 observations (indeed nrow(df) = 3 and ncol(df) = 92). In order to check homoskedasticity of variance across my 92 samples I do:

    bartlett.test(df)

It works and gives me a result, but I'm afraid of getting a false result, knowing that a call of this function requires a vector of data x and a factor g, and that g is omitted when x is a list of vectors. Is the call that I make correct? Is a data frame in my case considered like a list of vectors?
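[Editor's note: a data frame is a list of its columns, so bartlett.test(df) uses the list method with each column as one group (see ?bartlett.test: g is ignored when x is a list). An equivalent, more explicit sketch using base R's stack() to build the vector-plus-factor form:]

    # toy stand-in for the 3 x 92 data frame
    df <- as.data.frame(matrix(rnorm(3 * 92), nrow = 3))

    # same test, list form: each column is one group
    bartlett.test(df)

    # explicit vector + grouping-factor form
    long <- stack(df)   # columns: values, ind (the source column as a factor)
    bartlett.test(values ~ ind, data = long)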
[R] Agreggating data using external aggregation rules
Dear R experts,

I am currently facing a tricky problem which I have read a lot about in the various R mailing lists without finding exactly what I need. I have a big data frame DF (about 2,000,000 rows) with 7 columns being variables and 1 being a measure (using reshape package nomenclature). There are no duplicates in it. For each of the variables I have some rules to apply, COD_IN being the value of the variable in DF and COD_OUT the one to be transformed to; once the new codes are in DF, I have to aggregate the new DF (for example summing the measure). Usually the total transformation (merge + aggregate) really decreases the number of lines in the data frame, but sometimes it can grow depending on the rule. Just to give an idea, the first rule in v1 maps 820 different values into 7. Using SQL and a database this can be done in a very straightforward way (for example on the variable v1):

    Select COD_OUT, v2, v3, v4, v5, v6, v7, sum(measure)
    From DF, RULE_v1
    Where v1 = COD_IN
    Group by COD_OUT, v2, v3, v4, v5, v6, v7

So the first choice would be using a database; the second one would be splitting the data frame and then joining the results. Is there any other way to do the merge + aggregate that avoids the blow-up caused by the merge?

Thank you in advance
Angelo Linardi
Re: [R] Lattice: key does not accept German umlaute
Bernd Weiss wrote:

| library(lattice)
|
| ## works as expected
| xyplot(1~1, key = list(text = list(c("Maenner"))))
|
| ## works as expected
| xyplot(1~1, key = list(text = list(c("Maenner"))), xlab = "M\344nner")
|
| ## gives an error
| xyplot(1~1, key = list(text = list(c("M\344nner"))))
|
| Is this a bug?

Sorry, I forgot to mention my sessionInfo():

    R version 2.7.0 (2008-04-22)
    i386-pc-mingw32

    locale:
    LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] lattice_0.17-8

    loaded via a namespace (and not attached):
    [1] grid_2.7.0 tools_2.7.0

Bernd
Re: [R] which question
Eleni Christodoulou elenichri at gmail.com writes:

I was trying to select a column of a data frame using the which command. I was actually selecting the rows of the data frame using which, and then displaying a certain column of them. The command that I was using is:

    sequence <- mydata[which(human[,3] %in% genes.sam.names), 9]

Please provide a running example; the description of mydata is difficult to read.

Dieter
Re: [R] vector comparison
Karin Lagesen wrote:

I know this is fairly basic, but I must have somehow missed it in the manuals. I have two vectors, often of unequal length. I would like to compare them for identity. Order of elements does not matter, but they should contain the same. I.e., I want this kind of comparison:

    > if (1 == 1) show("yes") else show("blah")
    [1] "yes"
    > if (1 == 2) show("yes") else show("blah")
    [1] "blah"

only replacing the numbers with, for instance, the vectors

    a <- c("a")
    b <- c("b", "c")
    c <- c("c", "b")

Now, I realize I only get a warning when comparing things, but this to me means that I am not doing it correctly:

    > if (a == a) show("yes") else show("blah")
    [1] "yes"
    > if (a == b) show("yes") else show("blah")
    [1] "blah"
    Warning message:
    In if (a == b) show("yes") else show("blah") :
      the condition has length > 1 and only the first element will be used
    > if (b == c) show("yes") else show("blah")
    [1] "blah"
    Warning message:
    In if (b == c) show("yes") else show("blah") :
      the condition has length > 1 and only the first element will be used

I have also tried the %in% comparator, but that one throws warnings too:

    > if (b %in% c) show("yes") else show("blah")
    [1] "yes"
    Warning message:
    In if (b %in% c) show("yes") else show("blah") :
      the condition has length > 1 and only the first element will be used
    > if (c %in% c) show("yes") else show("blah")
    [1] "yes"
    Warning message:
    In if (c %in% c) show("yes") else show("blah") :
      the condition has length > 1 and only the first element will be used

So, how is this really supposed to be done?

Hi Karin,

My interpretation of your question is that you want to test whether two vectors contain the same elements, whether or not the order of those elements is the same. I'll first assume that the vectors must only have elements from the same _set_ and it doesn't matter if they have different lengths:

    if(length(unique(a)) == length(unique(b))) {
      if(all(sort(unique(a)) == sort(unique(b)))) cat("Yes\n") else cat("No\n")
    } else cat("No\n")

However, if the lengths must be the same, but the order may be different:

    if(length(a) == length(b)) {
      if(all(sort(a) == sort(b))) cat("Yes\n") else cat("No\n")
    } else cat("No\n")

The latter test ensures that if there are repeated elements, the number of repeats of each element is the same in each vector.

Jim
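[Editor's note: base R also has a set operation that does exactly the first comparison: setequal() tests whether two vectors contain the same set of elements, ignoring order and duplicates. A quick sketch with Karin's vectors:]

    a <- c("a")
    b <- c("b", "c")
    c <- c("c", "b")

    setequal(b, c)   # TRUE: same elements, order ignored
    setequal(a, b)   # FALSE

    # if duplicate counts must match too, compare sorted equal-length vectors
    length(b) == length(c) && all(sort(b) == sort(c))   # TRUE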
Re: [R] Y values below the X plot
jpardila wrote:

Dear List,

I am creating a plot and I want to insert the tabular data below the X axis. I mean, for every value of X I want to show the value in Y as a table below the plot. I think the attached image gives an idea of what I mean by this. Below is the code I am using now... but as you see the Y values don't have the right location. Maybe I should insert them as a table? Any ideas on that? This should be easy to do but I don't have much experience in R.

Many thanks in advance,
JP

http://www.nabble.com/file/p17670311/legend.jpg

    img1 <- c(-5.28191709, -5.364480081, -4.829456677, -5.325101503,
              -5.212952356, -5.181171896, -5.211122693, -5.153677663,
              -5.292961077, -5.151612394, -5.056544559, -5.151457115,
              -5.332984571, -5.325259917, -5.523870109, -5.429800485,
              -5.436455325)
    img2 <- c(-5.55, -5.56, -5.72, -5.57, -5.34, -5.18, -5.18, -5.36, -5.46,
              -5.32, -5.29, -5.37, -5.42, -5.45, -5.75, -5.75, -5.77)
    angle <- 26:42
    plot(img1 ~ angle, type = "o", xlab = "Incident angle", ylab = "sigma",
         ylim = c(-8, -2), lwd = 2, col = 8, pch = 19, cex = 1, axes = FALSE)
    lines(img2 ~ angle, lwd = 2, type = "o", col = 1, pch = 19, cex = 1)
    legend(38, -2, format(img1, digits = 2), cex = 0.8)
    legend(40, -2, format(img2, digits = 2), cex = 0.8)
    legend(26, -2, c("Image 1", "Image 2"), cex = 0.8, lwd = 2,
           col = c(8, 1), pch = 19, lty = 1:2, bty = "n")
    abline(h = -1:-8, v = 25:45, col = "lightgray", lty = 3)
    axis(1, at = 2 * 0:22)
    axis(2, at = -8:-2)

---

Hi JP,

I thought I could do this with addtable2plot, but I hadn't coded a column spacing into it (maybe next version). However, this is almost what you want, and I'm sure you can work out how to add the lines:

    plot(img1 ~ angle, type = "o", xlab = "Incident angle", ylab = "sigma",
         ylim = c(-8, -2), lwd = 2, col = 8, pch = 19, cex = 1, axes = FALSE)
    box()
    lines(img2 ~ angle, lwd = 2, type = "o", col = 1, pch = 19, cex = 1)
    tablerownames <- "Angle\nImage1\nImage2"
    mtext(c(tablerownames,
            paste(angle, round(img1, 2), round(img2, 2), sep = "\n")),
          1, line = 1, at = c(24.7, angle), cex = 0.5)

Jim
Re: [R] Lattice: key does not accept German umlaute
Well, you failed to give the 'at a minimum' information asked for in the posting guide, and \344 is locale-specific. I see 'MingW32' below, so I will guess this is German-language Windows. We don't know what the error was, either.

It works correctly for me in CP1252 with R-patched, and gives an error in 2.7.0 (and works in 2.6.2). I think it was fixed as a side effect of

    o Rare string width calculations in package grid were not
      interpreting the string encoding correctly.

although it is not the same problem that NEWS item refers to.

My error message in 2.7.0 was

    Error in grid.Call.graphics(L_setviewport, pvp, TRUE) :
      invalid input 'Männer' in 'utf8towcs'

which is what makes me think this was to do with sizing the viewport.

So please update to R-patched and try again.

On Fri, 6 Jun 2008, Bernd Weiss wrote:

[the xyplot examples are quoted in full earlier in this thread]

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] Lattice: key does not accept German umlaute
Bernd Weiss bernd.weiss at uni-koeln.de writes:

library(lattice)
## gives an error
xyplot(1~1, key = list(text = list(c("M\344nner"))))

Is this a bug?

You forgot to mention your version; assuming 2.7.0 unpatched. Corrected by Brian Ripley in the developer version (and probably also in patched): http://finzi.psych.upenn.edu/R/Rhelp02a/archive/129251.html

Dieter
[R] Merging two dataframes
Hi All,

Newbie question for you all, but I have been looking at the archives and the help material to get a rough idea of what I want to do.

I would like to merge two dataframes together based on a keyed variable in one dataframe linking to the other dataframe. Only some of the cases will match, but I would like to keep the others as well. My dataframes have 67 and 28 cases respectively, and I would like to end up with one file 67 cases long (all 28 are matched cases).

I can use the merge command to merge two datasets together, but I still get some odd results. I'm using the code below:

    ETC      <- read.csv(file = "CSV_Data2.csv", header = TRUE, sep = ",")
    SURVEY   <- read.csv(file = "survey.csv", header = TRUE, sep = ",")
    FullData <- merge(ETC, SURVEY, by.SURVEY = "uid", by.ETC = "ord")

The merged file seems to have 1800 cases, while the ETC data file only has 67 and the SURVEY file only has 28. (Reading the help, it looks as if it merges 1 case with all cases in the other file, which is not what I want.) The matching variable fields are the 'ord' field and the 'uid' field.

Can anyone advise please?

--
Michael Pearmain
Re: [R] Lattice: key does not accept German umlaute
Prof Brian Ripley wrote:

[...]

| It works correctly for me in CP1252 with R-patched, and gives an error
| in 2.7.0 (and works in 2.6.2). [...]
| So please update to R-patched and try again.

That's it! Thanks for your help.

Bernd
[R] boxplot changes fontsize of labels
Hi all!

So far I have learned some R, but finalizing my plots so they look publishable seems not to be possible. I set up some boxplots. Everything works well, but when I put more than two of them in one plot, the labels of the axes appear smaller than the normal font size:

    x <- rnorm(30)
    y <- rnorm(30)
    par(mfrow = c(1, 4))
    boxplot(x, y, names = c("horray", "hurra"))
    mtext("Jubel", side = 1, line = 2)

In case I take one or two boxplots this does not happen:

    par(mfrow = c(1, 2))
    boxplot(x, y, names = c("horray", "hurra"))
    mtext("Jubel", side = 1, line = 2)

The cex.axis seems not to be changed, as setting it to 1.0 doesn't change the behaviour. If cex.axis = 1.3 in the first example, the font size used by boxplot and by mtext is about the same. But as I use a function to draw quite a few of these plots, this hack is not a proper solution. I couldn't find anything about this behaviour in the documentation or on the internet. Can anybody explain? All hints are appreciated.

Thanks,
S. Merz
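[Editor's note: this appears to be the documented behaviour of par(): in a layout with exactly two rows and columns the base cex is reduced to 0.83, and with three or more rows or columns to 0.66, which can leave the axis annotation and a later mtext() call at different sizes. A sketch of one workaround, resetting the base cex after choosing the layout:]

    x <- rnorm(30)
    y <- rnorm(30)

    par(mfrow = c(1, 4))
    par(cex = 1)   # undo the automatic 0.66 reduction triggered by mfrow
    boxplot(x, y, names = c("horray", "hurra"))
    mtext("Jubel", side = 1, line = 2)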
[R] simple data question
If I wanted to use a name for a column with two words, say Dick Cheney and George Bush, can I put these in quotes ("Dick Cheney" and "George Bush") to get both read.table and read.zoo to recognize them when reading into R?

thanks

Stephen

--
Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals.
-K. Mullis
Re: [R] Multiple comment.char under read.table
According to the help file, comment.char only takes one character, so you'll have to do some 'magic' :)

I'd suggest first running mydata through sed, replacing one of the comment chars with the other, then running read.table with the one comment char that remains:

    sed -e 's/^\^/!/' mydata.txt > mydata2.txt

Alternatively, you could do read.table twice, once with ! and once with ^, and then pull out all the common rows from the two results.

on 06/06/2008 03:47 AM Gundala Viswanath said the following:

Hi all, suppose I want to read a text file with read.table. It contains lines to be skipped that begin with ! and ^. [...]
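[Editor's sketch of a third route that stays inside R: filter the offending lines before parsing, by reading the raw lines, dropping those that start with either comment character, and feeding the rest to read.table(). The file name is taken from the question:]

    lines <- readLines("mydata.txt")

    # drop lines whose first character is ! or ^
    keep <- !grepl("^[!^]", lines)

    dat <- read.table(textConnection(lines[keep]),
                      na.strings = "null", sep = "\t")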
Re: [R] simple data question
Should work -- you don't even have to put them in quotes, if your field separator is not a space. Why don't you just try it and see what comes out? :)

on 06/06/2008 08:43 AM stephen sefick said the following:

If I wanted to use a name for a column with two words, say Dick Cheney and George Bush, can I put these in quotes to get them to read into R using both read.table and read.zoo? thanks Stephen
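[Editor's note: one detail worth knowing here is that by default read.table() runs column names through make.names(), which turns "Dick Cheney" into Dick.Cheney. Passing check.names = FALSE keeps the space, at the cost of needing backticks or [[ ]] to access the column. A sketch, with a hypothetical file name:]

    dat <- read.table("politicians.txt", header = TRUE, sep = "\t",
                      check.names = FALSE)
    names(dat)            # e.g. "Dick Cheney" "George Bush"
    dat[["Dick Cheney"]]  # or dat$`Dick Cheney`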
Re: [R] R: Securities earning covariance
Update your version of zoo to the latest one.

On Fri, Jun 6, 2008 at 3:18 AM, [EMAIL PROTECTED] wrote:

Thank you for your very fast response. I just tried to use the zoo package, after having read the vignettes, but I get this error message: "$ operator is invalid for atomic vectors, returning NULL" (repeated for x$DAY and x$EARNINGS). Am I missing something? Thank you again. Angelo Linardi

[rest of the exchange snipped; it is quoted in full in the first message of this thread]
Re: [R] Merging two dataframes
try this:

    FullData <- merge(ETC, SURVEY, by.x = "ord", by.y = "uid",
                      all.x = TRUE, all.y = FALSE)

on 06/06/2008 07:30 AM Michael Pearmain said the following:

Hi All, Newbie question for you all. I would like to merge two dataframes together based on a keyed variable in one dataframe linking to the other dataframe. Only some of the cases will match, but I would like to keep the others as well. My dataframes have 67 and 28 cases respectively, and I would like to end up with one file 67 cases long (all 28 are matched cases). [...]
[R] request: a class having max frequency
Dear R users,

I have a very basic question. I tried but could not find the required result. Using

    dat <- pima
    f <- table(dat[,9])
    f
      0   1
    500 268

I want to find the class (here 0) having maximum frequency, i.e. 500. I used which.max(f), which gives

    0
    1

How can I get only the 0? Thanks and best regards

Muhammad Azam
Ph.D. Student
Department of Medical Statistics, Informatics and Health Economics
University of Innsbruck, Austria
Re: [R] request: a class having max frequency
On 6/6/2008 9:14 AM, Muhammad Azam wrote:

[the question is quoted above]

    > table(iris$Species)
        setosa versicolor  virginica
            50         50         50

    > which.max(table(iris$Species))
    setosa
         1

    > names(which.max(table(iris$Species)))
    [1] "setosa"

--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
Re: [R] request: a class having max frequency
The 0 is the name of the item and the 1 is the index in f of the maximum class (since f is a table, and the first element of the table is the maximum, which.max returns a 1). So, if you just want to know which class is maximum you can say

    names(which.max(f))

Michael Conklin
Chief Methodologist - Advanced Analytics
MarketTools, Inc.
6465 Wayzata Blvd. Suite 170, Minneapolis, MN 55426
Tel: 952.417.4719 | Mobile: 612.201.8978
[EMAIL PROTECTED]
http://www.markettools.com
Re: [R] request: a class having max frequency
    names(f)[which.max(f)]

on 06/06/2008 09:14 AM Muhammad Azam said the following:

[the question is quoted above]
Re: [R] Problem in executing R on server
Run the sessionInfo() command in R, as the posting guide requests!

Jason Lee wrote:

Hi, I am not too sure it's what you meant. Below is the closest data for each session from top:

    PID   USER  PR NI VIRT RES  SHR  S %CPU %MEM TIME+   COMMAND
    26792 jason 25  0 283m 199m 2620 R  100  0.6 0:00.38 R

The numbers changed as the processes are running. I am actually sharing the server with a few other people. I don't think this is a problem. And, for my own PC:

    PID   USER  PR NI VIRT RES  SHR  S %CPU %MEM TIME+   COMMAND
    6192  jason 25  0 157m 148m 2888 R  100 14.8 1081:21 R

On Fri, Jun 6, 2008 at 12:46 PM, Erik Iverson [EMAIL PROTECTED] wrote:

And what is your sessionInfo() in each case!

Jason Lee wrote:

Hi, I query free -m. On my server it is:

          total  used  free  shared  buffers  cached
    Mem:  32190  8758 23431       0      742    2156

And on my PC:

          total  used  free  shared  buffers  cached
    Mem:   1002   986    16       0      132     255

On the server, the above figure is after I exited R. It seems that there is still a lot of free memory available, if I am not wrong.

On Fri, Jun 6, 2008 at 12:29 PM, Erik Iverson [EMAIL PROTECTED] wrote:

How much RAM is installed in your Sun Solaris server? How much RAM is installed on your PC?

Jason Lee wrote:

Hi, I am actually trying to do some matrix multiplications of large datasets of 3000 columns and 150 rows. And I am running R version 2.7.0. I tried setting

    R --min-vsize=10M --max-vsize=100M --min-nsize=500k --max-nsize=1000M

Yet I still get:

    Error: cannot allocate vector of size 17.7 Mb

I am running on a Sun Solaris server. Please advise. Thanks.

On Fri, Jun 6, 2008 at 11:50 AM, Erik Iverson [EMAIL PROTECTED] wrote:

Jason Lee wrote:

Hi R-listers, I have a problem executing R on a server. It returns "Error: cannot allocate vector of size 15.8 Mb" each time I execute R on the server, but it doesn't give me any problem on my own PC (except that it runs extremely slowly). Any pointers on this? I tried to read the FAQ on this issue in the archive, but it seems there is no one solution to this.

And that is because there is no one cause of this issue. I might guess your 'server' has less memory than your 'PC', but you didn't say anything about your respective setups, or what you are even trying to do with R.

I tried to simplify my code but it seems the problem is still the same. Please advise. Thanks.
Re: [R] request: a class having max frequency
On 6/6/2008 9:18 AM, Chuck Cleland wrote:

[previous answer quoted above]

If, as above, more than one category frequency is at the maximum, you might want something like this:

    > x <- table(iris$Species)
    > which(x == max(x))
        setosa versicolor  virginica
             1          2          3
    > names(which(x == max(x)))
    [1] "setosa"     "versicolor" "virginica"

--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
[R] Manipulating DataSets
Hello R-users,

I have a very simple problem I wanted to solve. I have a large dataset as such:

      Lag X.Symbol  Time     TickType ReferenceNumber Price  Size X.Symbol.1 Time.1   TickType.1 ReferenceNumber.1
    1 ES  3:ESZ7.GB 08:30:00 B        74390987        151075 44   3:ESZ7.GB  08:30:00 A          74390988
    2 ES  3:YMZ7.EC 08:30:00 B        74390993        13686  17   3:YMZ7.EC  08:30:00 A          74390994
    3 YM  3:ESZ7.GB 08:30:00 B        74391135        151075 49   3:ESZ7.GB  08:30:00 A          74391136
    4 YM  3:YMZ7.EC 08:30:00 B        74390998        13686  17   3:YMZ7.EC  08:30:00 A          74390999
    5 YM  3:ESZ7.GB 08:30:00 B        74391135        151075 49   3:ESZ7.GB  08:30:00 A          74391136
    6 YM  3:YMZ7.EC 08:30:00 B        74391000        13686  14   3:YMZ7.EC  08:30:00 A          74391001

      Price.1 Size.1 LeadTime MidPoint Spread
    1 151100  22     08:30:00 151087.5 25
    2 13688   27     08:30:00 13687.0  2
    3 151100  22     08:30:00 151087.5 25
    4 13688   27     08:30:00 13687.0  2
    5 151100  22     08:30:00 151087.5 25
    6 13688   27     08:30:00 13687.0  2

All I wanted to do was take log(MidPoint[2]) - log(MidPoint[1]) for a symbol such as 3:ESZ7.GB. So the first one would be log(151087.5) - log(151087.5). I wanted to do this throughout the dataset and add the result as another column. I would appreciate any help.

Regards,
Neil Gupta
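[Editor's sketch of one base-R way to do this, assuming the data frame is called dat and using the column names shown above: group the rows by symbol with ave() and take first differences of the log mid-points. The first row of each symbol gets NA, since it has no predecessor.]

    # log return of the mid-point, computed within each symbol
    dat$LogMidDiff <- ave(log(dat$MidPoint), dat$X.Symbol.1,
                          FUN = function(p) c(NA, diff(p)))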
Re: [R] Agreggating data using external aggregation rules
Use aggregate() for aggregation and use indexing or subset() for selection. Alternately try the sqldf package, http://sqldf.googlecode.com, which allows one to perform SQL operations on data frames.

On Fri, Jun 6, 2008 at 6:12 AM, [EMAIL PROTECTED] wrote:

Dear R experts, I am currently facing a tricky problem which I have read a lot about in the various R mailing lists without finding exactly what I need. [rest of the question snipped; it is quoted in full in the original message above]
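[Editor's sketch of the SQL from the question in base R, assuming a rule table RULE_v1 with columns COD_IN and COD_OUT (names taken from the post): merge() maps v1 to its new code, and aggregate() then sums the measure within each combination of the remaining variables.]

    # map v1 -> COD_OUT, then sum measure within the new grouping
    DF2 <- merge(DF, RULE_v1, by.x = "v1", by.y = "COD_IN")

    agg <- aggregate(DF2$measure,
                     by = list(COD_OUT = DF2$COD_OUT,
                               v2 = DF2$v2, v3 = DF2$v3, v4 = DF2$v4,
                               v5 = DF2$v5, v6 = DF2$v6, v7 = DF2$v7),
                     FUN = sum)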
[R] Subsetting to unique values
I want to take the first row of each unique ID value from a data frame. For instance

    ddTable <- data.frame(Id = c(1, 1, 2, 2),
                          name = c("Paul", "Joe", "Bob", "Larry"))

I want a dataset that is

    Id name
     1 Paul
     2 Bob

unique(ddTable) will give me all 4 rows, and unique(ddTable$Id) will give me c(1, 2), but not accompanied by the name column.
Re: [R] How can I display a character table?
Dear Maura, try the function textplot from the package gplots. You can say textplot(yourmatrix) and get a plot of a character matrix.

On Fri, 6 Jun 2008, Maura E Monville wrote: I would like to generate a graphics text. I have a 67x2 table with a 5-character string in column 1 and a 2-character string in column 2. Is it possible to make such a table appear in a graphics window or a message-box pop-up window? Thank you so much. -- Maura E.M
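A small worked example of the suggestion (the matrix here is made-up data standing in for the 67x2 table):

    library(gplots)
    m <- cbind(code = c("alpha", "bravo", "gamma"),
               id   = c("A1", "B2", "C3"))
    textplot(m)   # draws the character matrix on the current graphics device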
Re: [R] Subsetting to unique values
On 6/6/2008 9:35 AM, Emslie, Paul [Ctr] wrote: [question snipped; see the original message above]

    ddTable <- data.frame(Id = c(1, 1, 2, 2),
                          name = c("Paul", "Joe", "Bob", "Larry"))

    !duplicated(ddTable$Id)
    [1]  TRUE FALSE  TRUE FALSE

    ddTable[!duplicated(ddTable$Id), ]
      Id name
    1  1 Paul
    3  2 Bob

?duplicated

--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor, New York, NY 10010
tel: (212) 845-4495 (Tu, Th)  tel: (732) 512-0171 (M, W, F)  fax: (917) 438-0894
Re: [R] simple data question
Good point. Thanks.

On Fri, Jun 6, 2008 at 9:05 AM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: should work - you don't even have to put them in quotes, if your field separator is not a space. why don't you just try it and see what comes out? :)

on 06/06/2008 08:43 AM stephen sefick said the following: if I wanted to use a name for a column with two words, say Dick Cheney and George Bush, can I put these in quotes ("Dick Cheney" and "George Bush") to get them to be read into R, with both read.table and read.zoo recognizing them? thanks, Stephen

-- Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
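A quick illustration of the point (invented data, not from the thread): with a non-space separator, quoted multi-word headers survive read.table if name mangling is turned off with check.names = FALSE.

    Lines <- '"Dick Cheney","George Bush"
    1,2
    3,4'
    d <- read.table(textConnection(Lines), header = TRUE, sep = ",",
                    check.names = FALSE)
    names(d)            # "Dick Cheney" "George Bush"
    d[["Dick Cheney"]]  # spaces in names need [[ ]] or backticks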
Re: [R] Subsetting to unique values
I don't have R on this machine, but will this work?

    myrows <- match(unique(ddTable$Id), ddTable$Id)
    unis <- ddTable[myrows, ]

--- On Fri, 6/6/08, Emslie, Paul [Ctr] [EMAIL PROTECTED] wrote: From: Emslie, Paul [Ctr] Subject: [R] Subsetting to unique values To: r-help@r-project.org Received: Friday, June 6, 2008, 9:35 AM [question snipped; see the original message above]
Re: [R] which question
An example is:

    symbol <- human[which(human[,3] %in% genes.sam.names), 8]

The data human and genes.sam.names are attached. The result of the above command is:

    symbol
     [1] CCL18                  MARCO                  SYT13
     [4] FOXC1                  CDH3
     [7] CA12                   CELSR1                 NM_018440
    [10] MICROTUBULE-ASSOCIATED NM_015529              ESR1
    [13] PHGDH                  GABRP                  LGMN
    [16] MMP9                   BMP7                   KLF5
    [19] RIPK2                  GATA3                  NM_032023
    [22] TRIM2                  CCND1                  MMP12
    [25] LDHB                   AF493978               SOD2
    [28] SOD2                   SOD2                   NME5
    [31] STC2                   RBP1                   ROPN1
    [34] RDH10                  KRTHB1                 SLPI
    [37] BBOX1                  FOXA1                  NM_005669
    [40] MCCC2                  CHI3L1                 GSTM3
    [43] LPIN1                  DSC2                   FADS2
    [46] ELF5                   CYP1B1                 LMO4
    [49] AL035297               NM_152398              AB018342
    [52] PIK3R1                 NFKBIE                 MLZE
    [55] NFIB                   NM_052997              NM_006023
    [58] CPB1                   CXCL13                 CBR3
    [61] NM_017527              FABP7                  DACH
    [64] IFI27                  ACOX2                  CXCL11
    [67] UGP2                   CLDN4                  M12740
    [70] IGKC                   IGKC                   CLECSF12
    [73] AY069977               HOXB2                  SOX11
    [76]                        NM_017422              TLR2
    [79] CKS1B                  BC017946               APOBEC3B
    [82]                        HLA-DRB1               HLA-DQB1
    [85]                        CCL13                  C4orf7
    [88]                        NM_173552
    21345 Levels: (2 (32 (55.11 (AIB-1) (ALU (CAK1) (CAP4) (CASPASE ... ZYX

As you can see, apart from the gene symbols, which are the required thing, RefSeq IDs are also retrieved... Thanks a lot, Eleni

On Fri, Jun 6, 2008 at 1:23 PM, Dieter Menne [EMAIL PROTECTED] wrote: Eleni Christodoulou elenichri at gmail.com writes: I was trying to select a column of a data frame using the which command. I was actually selecting the rows of the data frame using which, and then displayed a certain column of it. The command that I was using is:

    sequence <- mydata[which(human[,3] %in% genes.sam.names), 9]

Please provide a running example. The mydata are difficult to read. Dieter
Re: [R] Subsetting to unique values
Emslie, Paul [Ctr] emsliep at atac.mil writes: [question snipped; see the original message above]

    ddTable[-which(duplicated(ddTable$Id)), ]

HTH, Adrian
Re: [R] Java to R interface
Try and make sure that R is in your Windows Path variable. I got your message when I first did this, but when I did the above it then worked...
[R] Startup speed for a lengthy script
Colleagues, several days ago I wrote to the list about a lengthy delay in the startup of a script. I will start with a brief summary of that email. I have a 10,000 line script of which the final 3000 lines constitute a function. The script contains time-markers (cat(date())) so that I can determine how fast it was read. When I invoke the script from the OS (R --slave < Script.R; similar performance with R 2.6.1 or 2.7.0 on Mac / Linux / Windows), the first 7000 lines were read in 5 seconds, then it took 2 minutes to read the remaining 3000 lines. I inquired as to the cause of the lengthy reading of the final 3000 lines.

Subsequently, I whittled the 3000 lines down to ~1000 (moving 2000 lines to smaller functions). Now the first 9000 lines still read in ~6 seconds and the final 1000 lines in ~15 seconds. Better, but not ideal.

However, I just encountered a new situation that I don't understand. The R code is now embedded in a graphical interface built with REALbasic. When I invoke the script in that environment, the first 9000 lines take the usual 6 seconds. But, to my surprise, the final 1000 lines take 2 seconds! There is one major difference in the implementation: with the GUI, the commands are pushed, i.e., the GUI opens R, then sends a continuous stream of code. Does anyone have any idea as to why the delay should be so different in the two settings?

Dennis

Dennis Fisher MD
P (The P Less Than Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-415-564-2220
www.PLessThan.com
Re: [R] which question
I didn't get any attached data, but my suspicion here is that you have somehow got RefSeq IDs in column 8 of human, as well as the gene symbols. Did you read this data in from a text file?

Eleni Christodoulou wrote: [quoted output snipped; see the previous message]

--
Richard D. Pearson  [EMAIL PROTECTED]
School of Computer Science,  http://www.cs.man.ac.uk/~pearsonr
University of Manchester, Oxford Road, Manchester M13 9PL, UK.
Tel: +44 161 275 6178  Mob: +44 7971 221181  Fax: +44 161 275 6204
Re: [R] Merging two dataframes
cool. :) yea, the argument names are by.x and by.y, so your by.SURVEY and by.ETC were ignored in the black hole of arguments passed to other methods.

on 06/06/2008 09:11 AM Michael Pearmain said the following: Thanks, works perfectly. Was the problem due to me putting by.SURVEY and by.ETC rather than by.y and by.x? I think when i was playing around i tried the all. command in that setup as well. Mike

On Fri, Jun 6, 2008 at 2:07 PM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: try this:

    FullData <- merge(ETC, SURVEY, by.x = "ord", by.y = "uid",
                      all.x = TRUE, all.y = FALSE)

on 06/06/2008 07:30 AM Michael Pearmain said the following: Hi All, Newbie question for you all, but i have been looking at the archives and the help stuff to get a rough idea of what i want to do. I would like to merge two dataframes together based on a keyed variable in one dataframe linking to the other dataframe. Only some of the cases will match, but i would like to keep the others as well. My dataframes have 67 and 28 cases respectively and i would like to end up with one file 67 cases long (all 28 are matched cases). I can use the merge command to merge two datasets together, but i still get some odd results; i'm using the code below:

    ETC <- read.csv(file = "CSV_Data2.csv", header = TRUE, sep = ",")
    SURVEY <- read.csv(file = "survey.csv", header = TRUE, sep = ",")
    FullData <- merge(ETC, SURVEY, by.SURVEY = "uid", by.ETC = "ord")

The merged file seems to have 1800 cases, while the ETC data file only has 67 and the SURVEY file only has 28. (Reading the help it looks as if it merges 1 case with all cases in the other file, which is not what i want.) The matching variable fields are the 'ord' field and the 'uid' field. Can anyone advise please?

--
Michael Pearmain
Senior Statistical Analyst
1st Floor, 180 Great Portland St. London W1W 5QZ
Doubleclick is a part of the Google group of companies
[R] fit.variogram sgeostat error
Hi, when I run the next line it works fine:

    fit.spherical(var, 0, 2.6, 250, type = 'c', iterations = 10,
                  tolerance = 1e-06, echo = FALSE, plot.it = TRUE,
                  weighted = TRUE, delta = 0.1, verbose = TRUE)

But when I use the next one, it gives an error:

    fit.variogram('spherical', var, nugget = 0, sill = 2.6, range = 250,
                  plot.it = TRUE, iterations = 0)

This is the error:

    Error in fit.variogram('spherical', var, nugget = 0, sill = 2.6, range = 250, :
      unused argument(s) (nugget = 0, sill = 2.6, range = 250, plot.it = TRUE, iterations = 0)

Any suggestions? Alexys H
[R] lsmeans
Hello, I have the following function call:

    lme(fixed = Error ~ Temperature * Tumour, random = ~1 | ID, data = error_DB)

which returns an lme object. I am interested in carrying out some kind of lsmeans computation on the returned fit, but I cannot find any function to do this in R. I've seen the effect() function, but it does not work with lme objects. Any idea?

Best, Dani

--
Daniel Valverde Saubí
Grup de Biologia Molecular de Llevats, Facultat de Veterinària de la Universitat Autònoma de Barcelona, Edifici V, Campus UAB, 08193 Cerdanyola del Vallès, SPAIN
Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN)
Grup d'Aplicacions Biomèdiques de la RMN, Facultat de Biociències, Universitat Autònoma de Barcelona, Edifici Cs, Campus UAB, 08193 Cerdanyola del Vallès, SPAIN
+34 93 5814126
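One possible route, offered here as a hedged sketch rather than a reply from the thread: the multcomp package can compute simultaneous comparisons of factor levels from an lme fit, which covers part of what lsmeans does elsewhere. It assumes Temperature is a factor, and an additive model is used below to keep the main-effect contrasts well-defined (with the Temperature * Tumour interaction, level comparisons need more care).

    library(nlme)
    library(multcomp)
    fit <- lme(Error ~ Temperature + Tumour, random = ~1 | ID, data = error_DB)
    # pairwise (Tukey-style) comparisons of the Temperature levels
    summary(glht(fit, linfct = mcp(Temperature = "Tukey")))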
Re: [R] Improving data processing efficiency
Anybody have any thoughts on this? Please? :)

on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:

Hi everyone! I have a question about data processing efficiency. My data are as follows: I have a data set on quarterly institutional ownership of equities; some of them have had recent IPOs, some have not (I have a binary flag set). The total dataset size is 700k+ rows.

My goal is this: for every quarter since issue for each IPO, I need to find a matched firm in the same industry, close in market cap. So, e.g., for firm X, which had an IPO, I need to find a matched non-issuing firm in quarter 1 since IPO, then a (possibly different) non-issuing firm in quarter 2 since IPO, etc. Repeat for each issuing firm (there are about 8300 of these).

Thus it seems to me that I need to be doing a lot of data selection and subsetting, and looping (yikes!), but the result appears to be highly inefficient and takes ages (well, many hours). What I am doing, in pseudocode, is this:

1. for each quarter of data, get out all the IPOs and all the eligible non-issuing firms.
2. for each IPO in a quarter, grab all the non-issuers in the same industry, sort them by size, and finally grab the matching firm closest in size (the exact procedure is to grab the closest bigger firm if one exists, and just the biggest available if all are smaller)
3. assign the matched firm-observation the same quarters-since-issue as the IPO being matched
4. rbind them all into the matching dataset.

The function I currently have is pasted below, for your reference. Is there any way to make it produce the same result but much faster? Specifically, I am guessing eliminating some loops would be very good, but I don't see how, since I need to do some fancy footwork for each IPO in each quarter to find the matching firm. I'll be doing a few things similar to this, so it's somewhat important to up the efficiency of this. Maybe some of you R-fu masters can clue me in? :) I would appreciate any help, tips, tricks, tweaks, you name it! :)

== my function below ===

fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata, quarters_since_issue = 40) {

    # rbind for matrix is cheaper, so typecast the result to matrix
    result = matrix(nrow = 0, ncol = ncol(tfdata))
    colnames = names(tfdata)
    quarterends = sort(unique(tfdata$DATE))

    for (aquarter in quarterends) {
        tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
        tfdata_quarter_fitting_nonissuers = tfdata_quarter[
            (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
            (tfdata_quarter$IPO.Flag == 0), ]
        tfdata_quarter_ipoissuers = tfdata_quarter[tfdata_quarter$IPO.Flag == 1, ]

        for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
            arow = tfdata_quarter_ipoissuers[i, ]
            industrypeers = tfdata_quarter_fitting_nonissuers[
                tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
            industrypeers = industrypeers[order(industrypeers$Market.Cap.13f), ]
            if (nrow(industrypeers) > 0) {
                if (nrow(industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ]) > 0) {
                    # closest bigger peer, if one exists
                    bestpeer = industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ][1, ]
                } else {
                    # all peers are smaller: take the biggest
                    bestpeer = industrypeers[nrow(industrypeers), ]
                }
                bestpeer$Quarters.Since.IPO.Issue = arow$Quarters.Since.IPO.Issue
                # tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO == bestpeer$PERMNO] = 1
                result = rbind(result, as.matrix(bestpeer))
            }
        }
        # result = rbind(result, tfdata_quarter)
        print(aquarter)
    }

    result = as.data.frame(result)
    names(result) = colnames
    return(result)
}

= end of my function =
Re: [R] label outliers in geom_boxplot (ggplot2)
It's too obvious, so I am positive that there is a good reason for not doing this, but still: why is it not possible to have an outlier output in stat_boxplot that can be used by geom_text()? Something like this, with upper:

    dat = data.frame(num = rep(1, 20), val = c(runif(18), 3, 3.5), name = letters[1:20])
    ggplot(dat, aes(y = val, x = num)) +
      stat_boxplot(outlier.size = 4, outlier.colour = "green") +
      geom_text(aes(y = ..upper..), label = "This is upper hinge")

Unfortunately, this does not work and gives the error message:

    Error in eval(expr, envir, enclos) : object 'upper' not found

Is it because you can only use stat outputs within the stat statements? Could it be possible to make them available outside the statements too?

You can generally, but it won't work here. The problem is that you want a different y aesthetic for the statistic (val) than you do for the geom (upper) and there's no way to get around that with the current design of ggplot2.

Hadley
--
http://had.co.nz/
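A workaround sketch (an editorial addition, not Hadley's suggestion): compute the hinge outside ggplot2 with boxplot.stats() and feed it to the text layer as its own data.

    library(ggplot2)
    dat <- data.frame(num = rep(1, 20), val = c(runif(18), 3, 3.5))
    upper <- boxplot.stats(dat$val)$stats[4]   # the upper hinge
    ggplot(dat, aes(x = num, y = val)) +
      geom_boxplot() +
      geom_text(data = data.frame(num = 1, val = upper),
                label = "upper hinge", vjust = -0.5)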
Re: [R] Improving data processing efficiency
One thing that is likely to speed the code significantly is if you create 'result' to be its final size and then subscript into it. Something like:

    result[i, ] <- bestpeer

(though I'm not sure if 'i' is the proper index).

Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and A Guide for the Unwilling S User)

Daniel Folkinshteyn wrote: [original question and function snipped; see the first message in this thread]
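To make the advice concrete, a minimal sketch of the preallocate-and-fill pattern (not from the thread; the row bound and counter name are illustrative assumptions):

    # growing with rbind() copies the whole matrix on every call;
    # preallocating once and filling by index avoids that
    n_max  <- 100000                                  # assumed upper bound on matched rows
    result <- matrix(NA, nrow = n_max, ncol = ncol(tfdata))
    k <- 0
    # inside the loops, instead of result = rbind(result, as.matrix(bestpeer)):
    #     k <- k + 1
    #     result[k, ] <- as.matrix(bestpeer)
    # and afterwards drop the unused rows:
    #     result <- result[seq_len(k), , drop = FALSE]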
Re: [R] Improving data processing efficiency
Try reading the posting guide before posting.

On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: [quoted message and function snipped; see the first message in this thread]
[R] Store filename
Hi list, is it possible to save the name of a file automatically when reading it using read.table() or some other function? My aim is to then create an output table whose name is the name of the original table with a suffix like _out. Example:

    mydata = read.table("Run224_v2_060308.txt", sep = "\t", header = TRUE)

    ## store name?
    myfile = the_name_of_the_file

    ## do analysis of the data and store it in a data.frame myoutput

    ## write output in tab format
    write.table(myoutput, c(myfile, "_out.txt"), sep = "\t")

The name of the new file would be "Run224_v2_060308_out.txt". Thanks in advance, David
[R] where to download BRugs?
Hi all, does anyone know where to download the BRugs package? I did not find it on the r-project website. Thanks. NL
Re: [R] Improving data processing efficiency
I did! What did I miss?

on 06/06/2008 11:45 AM Gabor Grothendieck said the following: Try reading the posting guide before posting. [rest of the quoted thread snipped]
[R] How to force two regression coefficients to be equal but opposite in sign?
Is there a way to set up a regression in R that forces two coefficients to be equal but opposite in sign?

I'm trying to set up a model where a subject appears in a pair of environments where a measurement X is made. There are a total of 5 environments, one of which is a baseline. But each observation is for a subject in only two of them, and not all subjects will appear in each environment. Each of the environments has an effect on the variable X. I want to measure the relative effects of each environment E on X with a model:

    Xj = Xi * Ei / Ej

Ei of the baseline environment is set equal to 1. With a log transform, a linear-looking regression can be written as:

    log(Xj) = log(Xi) + log(Ei) - log(Ej)

My data look like:

    #  E1   X1  E2   X2
    1   A  .20   B  .25

What I've tried in R:

    env <- c("A", "B", "C", "D", "E")
    # Note: data is made up just for this example
    df <- data.frame(
      X1 = c(.20,.10,.40,.05,.10,.24,.30,.70,.48,.22,.87,.29,.24,.19,.92),
      X2 = c(.25,.12,.45,.01,.19,.50,.30,.40,.50,.40,.68,.30,.16,.02,.70),
      E1 = c("A","A","A","B","B","B","C","C","C","D","D","D","E","E","E"),
      E2 = c("B","C","D","A","D","E","A","B","E","B","C","E","A","B","C")
    )
    model <- lm(log(X2) ~ log(X1) + E1 + E2, data = df)
    summary(model)

    Call: lm(formula = log(X2) ~ log(X1) + E1 + E2, data = df)

    Residuals:
          1       2       3       4       5       6       7       8
     0.3240  0.2621 -0.5861 -1.0283  0.5861  0.4422  0.3831 -0.2608
          9      10      11      12      13      14      15
    -0.1222  0.9002 -0.5802 -0.3200  0.6452 -0.9634  0.3182

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.54563    1.71558   0.318    0.763
    log(X1)      1.29745    0.57295   2.265    0.073 .
    E1B         -0.23571    0.95738  -0.246    0.815
    E1C         -0.57057    1.20490  -0.474    0.656
    E1D         -0.22988    0.98274  -0.234    0.824
    E1E         -1.17181    1.02918  -1.139    0.306
    E2B         -0.16775    0.87803  -0.191    0.856
    E2C          0.05952    1.12779   0.053    0.960
    E2D          0.43077    1.19485   0.361    0.733
    E2E          0.40633    0.98289   0.413    0.696
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.004 on 5 degrees of freedom
    Multiple R-squared: 0.7622, Adjusted R-squared: 0.3343
    F-statistic: 1.781 on 9 and 5 DF, p-value: 0.2721

What I need to do is force the corresponding environment coefficients to be equal in absolute value but opposite in sign. That is:

    E1B = -E2B
    E1C = -E2C
    E1D = -E2D
    E1E = -E2E

In essence, E1 and E2 are the same variable, but can play two different roles in the model depending on whether it's the first part of the observation or the second part.

I searched the archive, and the closest thing I found to my situation was: http://tolstoy.newcastle.edu.au/R/e4/help/08/03/6773.html But the response to that thread didn't seem to be applicable to my situation. Any pointers would be appreciated.

Thanks, Keith
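One standard way to impose this constraint (an editorial sketch, not a reply from the thread): replace the two factors by signed indicator differences, so a single coefficient enters with +1 when the level appears as E1 and -1 when it appears as E2, exactly matching log(Ei) - log(Ej) above.

    # signed indicators enforce coef(E1 = l) == -coef(E2 = l); baseline "A" is dropped
    for (l in c("B", "C", "D", "E")) {
      df[[paste("D", l, sep = "")]] <- (df$E1 == l) - (df$E2 == l)
    }
    model2 <- lm(log(X2) ~ log(X1) + DB + DC + DD + DE, data = df)
    summary(model2)   # coef of DB estimates log(E_B), entering +1 for E1 and -1 for E2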
Re: [R] Improving data processing efficiency
It's summarized in the last line appended to r-help messages. Note "reproducible" and "minimal".

On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: I did! What did I miss? [rest of the quoted thread snipped]
Re: [R] Improving data processing efficiency
That is the last line of every message to r-help.

On Fri, Jun 6, 2008 at 12:05 PM, Gabor Grothendieck [EMAIL PROTECTED] wrote: It's summarized in the last line appended to r-help messages. Note "reproducible" and "minimal". [rest of the quoted thread snipped]
Re: [R] Store filename
well, where are you getting the filename in the first place? are you looping over a list of filenames that comes from somewhere? generally, for concatenating strings, look at function 'paste':

    write.table(myoutput, paste(myfile, "_out.txt", sep = ""), sep = "\t")

on 06/06/2008 11:51 AM DAVID ARTETA GARCIA said the following: [question snipped; see the original message above]
Re: [R] Store filename
You can write your own function, something about like this:

    read.table2 <- function(file, ...) {
      x <- read.table(file, ...)
      attributes(x)[["file_name"]] <- file
      return(x)
    }

    mydata <- read.table2("Run224_v2_060308.txt", sep = "\t", header = TRUE)
    myfile <- attr(mydata, "file_name")

On Fri, Jun 6, 2008 at 12:51 PM, DAVID ARTETA GARCIA [EMAIL PROTECTED] wrote: [question snipped; see the original message above]

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
[R] fit.contrast error
Hello, I am trying to perform a fit.contrast() on an lme object with this code:

    attach(error_DB)
    model_temperature <- lme(Error ~ Temperature, data = error_DB, random = ~1 | ID)
    summary(model_temperature)
    fit.contrast(model_temperature, "Temperature", c(-1, 1), conf.int = 0.95)
    detach(error_DB)

but I got this error:

    Error in `contrasts<-`(`*tmp*`, value = c(-0.5, 0.5)) :
      contrasts apply only to factors

My database is a data frame, very similar to that of the Orthodont data. Could anyone give me some advice on how to solve the problem?

Best, Dani

--
Daniel Valverde Saubí
Grup de Biologia Molecular de Llevats, Facultat de Veterinària de la Universitat Autònoma de Barcelona, Edifici V, Campus UAB, 08193 Cerdanyola del Vallès, SPAIN
Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN)
Grup d'Aplicacions Biomèdiques de la RMN, Facultat de Biociències, Universitat Autònoma de Barcelona, Edifici Cs, Campus UAB, 08193 Cerdanyola del Vallès, SPAIN
+34 93 5814126
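The error text points at a likely cause (an editorial note, not a reply from the thread): contrasts can only be set on factors, so Temperature has probably been read in as numeric. A hedged sketch of the fix:

    library(nlme)
    library(gmodels)
    # assumption: Temperature arrived as numeric; convert it before fitting
    error_DB$Temperature <- factor(error_DB$Temperature)
    model_temperature <- lme(Error ~ Temperature, data = error_DB, random = ~1 | ID)
    fit.contrast(model_temperature, "Temperature", c(-1, 1), conf.int = 0.95)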
Re: [R] where to download BRugs?
Dear NL,

BRugs is available from the CRAN extras repository hosted by Brian Ripley.

    install.packages("BRugs")

should install it as before (for R-2.7.x), if you have not changed the list of default repositories.

Best wishes, Uwe Ligges

Nanye Long wrote: [question snipped; see the original message above]
Re: [R] choosing an appropriate linear model
Perhaps this was too big a question, so I'll ask something shorter: I have fit a linear model, and want to use its prediction intervals to calculate the sum of many individual predictions.

1) Some of the lower prediction intervals are negative, which is non-sensical. Should I just set all negative predictions to zero, or is there another way to require non-negative predictions only?

2) I am interested in the sum of many predictions based on the lm. How can I calculate the 95% prediction interval for the sum? Should I calculate a root mean square of the individual errors, or use a bootstrap method, or something else?

ps. the data is attached to the end of this email.

On Thu, Jun 5, 2008 at 6:25 PM, Levi Waldron [EMAIL PROTECTED] wrote: I am trying to model the observed leaching of wood preservative chemicals from treated wood during an outdoor experiment where leaching is caused by rainfall events. For each rainfall event, the amount of rainfall was recorded as well as the amount of preservative chemical leached. A number of climatic variables were measured, but the most important is the amount of rainfall. I have tried a simple linear model, with zero intercept because zero rainfall cannot cause any leaching (the leachdata dataframe is attached to this email). The diagnostics show clearly non-normally distributed residuals with a simple linear regression, and I am trying to figure out what to do about it (see the attached diagnostics.png). This dataset contains measurements from 57 rainfall events on three replicate samples, for a total of 171 measurements. Part of the problem is that physically, the leaching values can only be positive, so for the smaller rainfall amounts the residuals are all positive. If I allow an intercept then it is significantly positive, possibly since the researcher wouldn't have collected measurements for very small rain events, but in terms of the model it doesn't make sense physically to have a positive intercept, particularly since lab experiments have shown that a certain amount of rain exposure is required to wet the wood before leaching begins. I can get more normally distributed residuals by log-transforming the response, or by using the optimal Box-Cox transformation of lambda = 0.21, which produces nicer-looking residuals but unsatisfactory prediction, which is the main goal of the model (also attached). Any advice on how to create a better predictive model? I presume it has something to do with glm, especially since I have repeated rainfalls on replicate samples, but any advice on the approach to take would be much appreciated. The code I used to produce the attached plots is included below.

    leach.lm <- lm(leachate ~ rainmm - 1, data = leachdata)

    png("diagnostics.png", height = 1200, width = 700)
    par(mfrow = c(3, 2))
    plot(leachate ~ rainmm, data = leachdata, main = "Data and fitted line")
    abline(leach.lm)
    plot(predict(leach.lm) ~ leachdata$leachate,
         main = "predicted vs. observed leaching amount",
         xlim = c(0, 12), ylim = c(0, 12),
         xlab = "observed leaching", ylab = "predicted leaching")
    abline(a = 0, b = 1)
    plot(leach.lm)
    dev.off()

    library(MASS)
    boxcox(leach.lm, plotit = TRUE, lambda = seq(0, 0.4, by = 0.01))

    boxtran <- function(y, lambda, inverse = FALSE) {
      if (inverse) return((lambda * y + 1)^(1/lambda))
      else return((y^lambda - 1)/lambda)
    }

    png("boxcox-diagnostics.png", height = 1200, width = 700)
    par(mfrow = c(3, 2))
    logleach.lm <- lm(boxtran(leachate, 0.21) ~ rainmm - 1, data = leachdata)
    plot(leachate ~ rainmm, data = leachdata, main = "Data and fitted line")
    x <- leachdata$rainmm
    y <- boxtran(predict(logleach.lm), 0.21, TRUE)
    xy <- cbind(x, y)[order(x), ]
    lines(xy)
    plot(y ~ leachdata$leachate, xlim = c(0, 12), ylim = c(0, 12),
         main = "predicted vs. observed leaching amount",
         xlab = "observed leaching", ylab = "predicted leaching")
    abline(a = 0, b = 1)
    plot(logleach.lm)
    dev.off()

    `leachdata` <- structure(list(rainmm = c(19.68, 36.168, 18.632, 2.74, 0.822,
    9.864, 7.124, 29.592, 4.384, 11.508, 1.37, 3.288, 9.042, 2.74,
    18.906, 4.932, 0.274, 3.836, 1.918, 4.384, 16.714, 5.754, 12.604,
    2.466, 13.014, 2.74, 14.796, 5.754, 4.93, 5.21, 0.548, 1.644,
    3.014, 6.028, 18.358, 1.918, 3.014, 18.358, 0.274, 1.918, 54.2,
    43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13, 7.124, 10.96,
    7.672, 3.562, 3.288, 6.02, 17.54, 19.68, 36.168, 18.632, 2.74,
    0.822, 9.864, 7.124, 29.592, 4.384, 11.508, 1.37, 3.288, 9.042,
    2.74, 18.906, 4.932, 0.274, 3.836, 1.918, 4.384, 16.714, 5.754,
    12.604, 2.466, 13.014, 2.74, 14.796, 5.754, 4.93, 5.21, 0.548,
    1.644, 3.014, 6.028, 18.358, 1.918, 3.014, 18.358, 0.274, 1.918,
    54.2, 43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13, 7.124,
    10.96, 7.672, 3.562, 3.288, 6.02, 17.54, 19.68, 36.168, 18.632,
    2.74, 0.822, 9.864, 7.124, 29.592, 4.384, 11.508, 1.37, 3.288,
    9.042, 2.74, 18.906, 4.932, 0.274, 3.836, 1.918, 4.384, 16.714,
    5.754, 12.604, 2.466, 13.014, 2.74, 14.796, 5.754, 4.93, 5.21,
    0.548, 1.644, 3.014, 6.028, 18.358, 1.918, 3.014, 18.358, 0.274,
    1.918, 54.2, 43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13,
    7.124, 10.96, 7.672,
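On question 2, a minimal sketch of one textbook route (an editorial addition, not from the thread; it assumes independent normal errors, which the diagnostics above already call into question): for the no-intercept fit, the prediction variance of the sum of new observations combines the slope uncertainty and one residual variance per observation, Var = (sum of x)^2 * Var(slope) + n * sigma^2.

    # 95% prediction interval for the SUM of new observations from leach.lm,
    # at rainfall values xnew (here reusing the observed values as a stand-in)
    xnew <- leachdata$rainmm
    tot  <- sum(xnew) * coef(leach.lm)        # point prediction of the sum
    s2   <- summary(leach.lm)$sigma^2         # residual variance
    vb   <- vcov(leach.lm)[1, 1]              # variance of the slope estimate
    se   <- sqrt(sum(xnew)^2 * vb + length(xnew) * s2)
    tot + c(-1, 1) * qt(0.975, df.residual(leach.lm)) * se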
[R] reorder breaking by half
Hi, I want to reorder the colors given by rainbow(7) so that the last four move to the front and the first three to the end. For example: ci <- rainbow(7) ; ci [1] "#FF0000FF" "#FFDB00FF" "#49FF00FF" "#00FF92FF" "#0092FFFF" "#4900FFFF" [7] "#FF00DBFF" I would like "#FF0000FF" "#FFDB00FF" "#49FF00FF" to be at the end of ci, and the rest to be at the beginning. How can I do that? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rmeta package: metaplot or forestplot of meta-analysis under DSL (random) model
The package has a plot() method for random-effects meta-analyses as well, either those produced by meta.DSL or meta.summaries. There are examples on the help page for meta.DSL. -thomas On Tue, 27 May 2008, Shi, Jiajun [BSD] - KNP wrote: Dear all, I could not draw a forest plot for meta-analysis under random models using the rmeta package. The rmeta package has a default function for the MH (fixed-effect) model. Has the rmeta package been updated with such a function? Or has someone revised it and kept the code private? I would appreciate it if you could provide some information on this question. Thanks, Andrew This email is intended only for the use of the individua...{{dropped:12}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
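For reference, a minimal sketch along the lines of the examples on the meta.DSL help page (using the cochrane data shipped with rmeta; untested here, so treat the argument order as an assumption to check against the help page):

library(rmeta)
data(cochrane)
## random-effects (DerSimonian-Laird) meta-analysis, then its plot() method
m <- meta.DSL(n.trt, n.ctrl, ev.trt, ev.ctrl, names = name, data = cochrane)
summary(m)
plot(m)   # forest-style plot for the random-effects fit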
[R] Problem with subset
Hi, I am new to R and i am looking for a way to extract a subset from a vector. I have a vector of numbers oscillating around zero (a decreasing autocorrelation function) and i would like to extract only the first positive part of the function (from zero lag to the lag where the function inverts its sign for the first time). I have tried subset(myvector, myvector > 0) but this obviously extracts all the positive intervals, not only the first one. Is there a logical statement i can use in subset? I prefer not to use an if statement that would probably slow down the code. Thanks a lot, Luca * dr. Luca Mortarini [EMAIL PROTECTED] Università del Piemonte Orientale Dipartimento di Scienze e Tecnologie Avanzate __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Manipulating DataSets
On Fri, 6 Jun 2008, Neil Gupta wrote: Hello R-users, I have a very simple problem I wanted to solve. I have a large dataset as such:

Lag X.Symbol Time TickType ReferenceNumber Price Size X.Symbol.1 Time.1 TickType.1 ReferenceNumber.1
1 ES 3:ESZ7.GB 08:30:00 B 74390987 151075 44 3:ESZ7.GB 08:30:00 A 74390988
2 ES 3:YMZ7.EC 08:30:00 B 74390993 13686 17 3:YMZ7.EC 08:30:00 A 74390994
3 YM 3:ESZ7.GB 08:30:00 B 74391135 151075 49 3:ESZ7.GB 08:30:00 A 74391136
4 YM 3:YMZ7.EC 08:30:00 B 74390998 13686 17 3:YMZ7.EC 08:30:00 A 74390999
5 YM 3:ESZ7.GB 08:30:00 B 74391135 151075 49 3:ESZ7.GB 08:30:00 A 74391136
6 YM 3:YMZ7.EC 08:30:00 B 74391000 13686 14 3:YMZ7.EC 08:30:00 A 74391001

Price.1 Size.1 LeadTime MidPoint Spread
1 151100 22 08:30:00 151087.5 25
2 13688 27 08:30:00 13687.0 2
3 151100 22 08:30:00 151087.5 25
4 13688 27 08:30:00 13687.0 2
5 151100 22 08:30:00 151087.5 25
6 13688 27 08:30:00 13687.0 2

All I wanted to do was take the log(MidPoint[2]) - log(MidPoint[1]) for a symbol 3:ESZ7.GB. So the first one would be log(151087.5) - log(151087.5). I wanted to do this throughout the data set and add that in another column. I would appreciate any help. See example(split). Note the ### data frame variation, which should serve as a template for your problem. HTH, Chuck Regards, Neil Gupta [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
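If the goal is only the within-symbol change in log mid-points, ave() is a compact alternative to split() — a sketch, assuming the data frame is called DF and rows are time-ordered within each symbol:

## First row of each symbol gets NA; later rows get log(MidPoint[t]) - log(MidPoint[t-1]).
DF$dlogmid <- ave(DF$MidPoint, DF$X.Symbol.1,
                  FUN = function(x) c(NA, diff(log(x))))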
Re: [R] Improving data processing efficiency
i thought since the function code (which i provided in full) was pretty short, it would be reasonably easy to just read the code and see what it's doing. but ok, so... i am attaching a zip file, with a small sample of the data set (tab delimited), and the function code, in a zip file (posting guidelines claim that some archive formats are allowed, i assume zip is one of them... would appreciate your comments! :) on 06/06/2008 12:05 PM Gabor Grothendieck said the following: Its summarized in the last line to r-help. Note reproducible and minimal. On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: i did! what did i miss? on 06/06/2008 11:45 AM Gabor Grothendieck said the following: Try reading the posting guide before posting. On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: Anybody have any thoughts on this? Please? :) on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following: Hi everyone! I have a question about data processing efficiency. My data are as follows: I have a data set on quarterly institutional ownership of equities; some of them have had recent IPOs, some have not (I have a binary flag set). The total dataset size is 700k+ rows. My goal is this: For every quarter since issue for each IPO, I need to find a matched firm in the same industry, and close in market cap. So, e.g., for firm X, which had an IPO, i need to find a matched non-issuing firm in quarter 1 since IPO, then a (possibly different) non-issuing firm in quarter 2 since IPO, etc. Repeat for each issuing firm (there are about 8300 of these). Thus it seems to me that I need to be doing a lot of data selection and subsetting, and looping (yikes!), but the result appears to be highly inefficient and takes ages (well, many hours). What I am doing, in pseudocode, is this: 1. for each quarter of data, getting out all the IPOs and all the eligible non-issuing firms. 2. for each IPO in a quarter, grab all the non-issuers in the same industry, sort them by size, and finally grab a matching firm closest in size (the exact procedure is to grab the closest bigger firm if one exists, and just the biggest available if all are smaller) 3. assign the matched firm-observation the same quarters since issue as the IPO being matched 4. rbind them all into the matching dataset. The function I currently have is pasted below, for your reference. Is there any way to make it produce the same result but much faster? Specifically, I am guessing eliminating some loops would be very good, but I don't see how, since I need to do some fancy footwork for each IPO in each quarter to find the matching firm. I'll be doing a few things similar to this, so it's somewhat important to up the efficiency of this. Maybe some of you R-fu masters can clue me in? :) I would appreciate any help, tips, tricks, tweaks, you name it! 
:) == my function below ===

fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata, quarters_since_issue = 40) {
    result = matrix(nrow = 0, ncol = ncol(tfdata)) # rbind for matrix is cheaper, so typecast the result to matrix
    colnames = names(tfdata)
    quarterends = sort(unique(tfdata$DATE))
    for (aquarter in quarterends) {
        tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
        tfdata_quarter_fitting_nonissuers = tfdata_quarter[
            (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
            (tfdata_quarter$IPO.Flag == 0), ]
        tfdata_quarter_ipoissuers = tfdata_quarter[tfdata_quarter$IPO.Flag == 1, ]
        for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
            arow = tfdata_quarter_ipoissuers[i, ]
            industrypeers = tfdata_quarter_fitting_nonissuers[
                tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
            industrypeers = industrypeers[order(industrypeers$Market.Cap.13f), ]
            if (nrow(industrypeers) > 0) {
                if (nrow(industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ]) > 0) {
                    bestpeer = industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ][1, ]
                } else {
                    bestpeer = industrypeers[nrow(industrypeers), ]
                }
                bestpeer$Quarters.Since.IPO.Issue = arow$Quarters.Since.IPO.Issue
                # tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO == bestpeer$PERMNO] = 1
                result = rbind(result, as.matrix(bestpeer))
            }
        }
        # result = rbind(result, tfdata_quarter)
        print(aquarter)
    }
    result = as.data.frame(result)
    names(result) = colnames
    return(result)
}

= end of my function =

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __
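Not from the thread — a sketch of one way to shrink the inner peer search, assuming each industry's non-issuer market caps have been sorted ascending once per quarter (pick.peer is a hypothetical helper):

## Sketch: index of the closest bigger peer in a sorted cap vector,
## falling back to the biggest available, per the matching rule above.
pick.peer <- function(caps.sorted, target) {
    idx <- which(caps.sorted >= target)[1]   # first peer at least as big
    if (is.na(idx)) length(caps.sorted) else idx
}
## e.g. pick.peer(c(1, 5, 9), 4) returns 2; pick.peer(c(1, 5, 9), 20) returns 3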
Re: [R] lsmeans
Dear Dani, I intend at some point to extend the effects package to linear and generalized linear mixed-effects models, probably using lmer() rather than lme(), but as you discovered, it doesn't handle these models now. It wouldn't be hard, however, to do the computations yourself, using the coefficient vector for the fixed effects and a suitably constructed model-matrix to compute the effects; you could also get standard errors by using the covariance matrix for the fixed effects. I hope this helps, John On Fri, 06 Jun 2008 17:05:58 +0200 Dani Valverde [EMAIL PROTECTED] wrote: Hello, I have the next function call: lme(fixed=Error ~ Temperature * Tumour ,random = ~1|ID, data=error_DB) which returns an lme object. I am interested on carrying out some kind of lsmeans on the data returned, but I cannot find any function to do this in R. I'have seen the effect() function, but it does not work with lme objects. Any idea? Best, Dani -- Daniel Valverde Saubí Grup de Biologia Molecular de Llevats Facultat de Veterinària de la Universitat Autònoma de Barcelona Edifici V, Campus UAB 08193 Cerdanyola del Vallès- SPAIN Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN) Grup d'Aplicacions Biomèdiques de la RMN Facultat de Biociències Universitat Autònoma de Barcelona Edifici Cs, Campus UAB 08193 Cerdanyola del Vallès- SPAIN +34 93 5814126 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. John Fox, Professor Department of Sociology McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
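A rough sketch of the computation John describes ('mod' standing in for the lme fit and 'newgrid' for a hypothetical data frame of covariate combinations of interest):

library(nlme)
X   <- model.matrix(~ Temperature * Tumour, data = newgrid)  # fixed-effects design
eff <- drop(X %*% fixef(mod))                                # estimated means
se  <- sqrt(diag(X %*% vcov(mod) %*% t(X)))                  # their standard errors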
Re: [R] Improving data processing efficiency
thanks for the tip! i'll try that and see how big of a difference that makes... if i am not sure what exactly the size will be, am i better off making it larger, and then later stripping off the blank rows, or making it smaller, and appending the missing rows? on 06/06/2008 11:44 AM Patrick Burns said the following: One thing that is likely to speed the code significantly is if you create 'result' to be its final size and then subscript into it. Something like: result[i, ] <- bestpeer (though I'm not sure if 'i' is the proper index). Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User) Daniel Folkinshteyn wrote: Anybody have any thoughts on this? Please? :) on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following: [original post and function quoted in full; elided here as they repeat the earlier message verbatim] __ R-help@r-project.org mailing list
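A sketch of Patrick's suggestion applied to the function in the earlier message ('n.max' is a hypothetical upper bound, e.g. the number of IPO rows in tfdata):

result <- as.data.frame(matrix(NA, nrow = n.max, ncol = ncol(tfdata)))
names(result) <- names(tfdata)
k <- 0
## inside the loops, instead of result = rbind(result, as.matrix(bestpeer)):
##     k <- k + 1
##     result[k, ] <- bestpeer
## after the loops, drop the unused rows:
result <- result[seq_len(k), ]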
Re: [R] reorder breaking by half
ci = rainbow(7)[c(4:7, 1:3)] on 06/06/2008 01:02 PM avilella said the following: Hi, I want to reorder the colors given by rainbow(7) so that the last four move to the front and the first three to the end. For example: ci <- rainbow(7) ; ci [1] "#FF0000FF" "#FFDB00FF" "#49FF00FF" "#00FF92FF" "#0092FFFF" "#4900FFFF" [7] "#FF00DBFF" I would like "#FF0000FF" "#FFDB00FF" "#49FF00FF" to be at the end of ci, and the rest to be at the beginning. How can I do that? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
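The same reshuffle without hard-coding the indices (a sketch):

ci   <- rainbow(7)
half <- floor(length(ci) / 2)                    # 3 for length 7
ci   <- c(ci[(half + 1):length(ci)], ci[1:half]) # last four first, first three last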
Re: [R] Improving data processing efficiency
I think the posting guide may not be clear enough and have suggested that it be clarified. Hopefully this better communicates what is required and why in a shorter amount of space: https://stat.ethz.ch/pipermail/r-devel/2008-June/049891.html On Fri, Jun 6, 2008 at 1:25 PM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: [earlier exchange, original post, and function quoted in full; elided here as they repeat the messages above verbatim]
Re: [R] Improving data processing efficiency
just in case, uploaded it to the server, you can get the zip file i mentioned here: http://astro.temple.edu/~dfolkins/helplistfiles.zip on 06/06/2008 01:25 PM Daniel Folkinshteyn said the following: [earlier exchange, original post, and function quoted in full; elided here as they repeat the messages above verbatim] __ R-help@r-project.org mailing list
Re: [R] How to force two regression coefficients to be equal but opposite in sign?
One simple way is to do something like: fit <- lm(y ~ I(x1 - x2) + x3, data = mydata) The first coefficient (after the intercept) will be the slope for x1; the slope for x2 will be the negative of that. This model is nested in the fuller model with x1 and x2 fit separately, and you can therefore test for differences. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Woolner, Keith Sent: Friday, June 06, 2008 10:07 AM To: r-help@r-project.org Subject: [R] How to force two regression coefficients to be equal but opposite in sign? Is there a way to set up a regression in R that forces two coefficients to be equal but opposite in sign? I'm trying to set up a model where a subject appears in a pair of environments where a measurement X is made. There are a total of 5 environments, one of which is a baseline. But each observation is for a subject in only two of them, and not all subjects will appear in each environment. Each of the environments has an effect on the variable X. I want to measure the relative effects of each environment E on X with a model: Xj = Xi * Ei / Ej Ei of the baseline environment is set equal to 1. With a log transform, a linear-looking regression can be written as: log(Xj) = log(Xi) + log(Ei) - log(Ej) My data look like:

# E1 X1  E2 X2
1 A .20  B .25

What I've tried in R:

env <- c("A", "B", "C", "D", "E")
# Note: data is made up just for this example
df <- data.frame(
    X1 = c(.20,.10,.40,.05,.10,.24,.30,.70,.48,.22,.87,.29,.24,.19,.92),
    X2 = c(.25,.12,.45,.01,.19,.50,.30,.40,.50,.40,.68,.30,.16,.02,.70),
    E1 = c("A","A","A","B","B","B","C","C","C","D","D","D","E","E","E"),
    E2 = c("B","C","D","A","D","E","A","B","E","B","C","E","A","B","C")
)
model <- lm(log(X2) ~ log(X1) + E1 + E2, data = df)
summary(model)

Call: lm(formula = log(X2) ~ log(X1) + E1 + E2, data = df) Residuals: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0.3240 0.2621 -0.5861 -1.0283 0.5861 0.4422 0.3831 -0.2608 -0.1222 0.9002 -0.5802 -0.3200 0.6452 -0.9634 0.3182

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.54563    1.71558   0.318    0.763
log(X1)      1.29745    0.57295   2.265    0.073 .
E1B         -0.23571    0.95738  -0.246    0.815
E1C         -0.57057    1.20490  -0.474    0.656
E1D         -0.22988    0.98274  -0.234    0.824
E1E         -1.17181    1.02918  -1.139    0.306
E2B         -0.16775    0.87803  -0.191    0.856
E2C          0.05952    1.12779   0.053    0.960
E2D          0.43077    1.19485   0.361    0.733
E2E          0.40633    0.98289   0.413    0.696
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.004 on 5 degrees of freedom Multiple R-squared: 0.7622, Adjusted R-squared: 0.3343 F-statistic: 1.781 on 9 and 5 DF, p-value: 0.2721 What I need to do is force the corresponding environment coefficients to be equal in absolute value, but opposite in sign. That is: E1B = -E2B, E1C = -E2C, E1D = -E2D, E1E = -E2E. In essence, E1 and E2 are the same variable, but can play two different roles in the model depending on whether it's the first part of the observation or the second part. I searched the archive, and the closest thing I found to my situation was: http://tolstoy.newcastle.edu.au/R/e4/help/08/03/6773.html But the response to that thread didn't seem to be applicable to my situation. Any pointers would be appreciated. Thanks, Keith [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
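For the factor-valued environments in Keith's example, one way to apply the same idea — a sketch, not from the thread — is a signed incidence matrix, so each non-baseline environment gets a single coefficient that enters with +1 when it is E1 and -1 when it is E2:

lev <- c("B", "C", "D", "E")                          # "A" is the baseline
Z   <- sapply(lev, function(l) (df$E1 == l) - (df$E2 == l))
model2 <- lm(log(X2) ~ log(X1) + Z, data = df)        # one coefficient per environment
summary(model2)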
Re: [R] where to download BRugs?
On Fri, 6 Jun 2008, Nanye Long wrote: Hi all, Does anyone know where to download the BRugs package? I did not find it on the r-project website. Thanks. It is Windows-only, and you download it from 'CRAN (extras)', which is part of the default repository set on Windows versions of R. So install.packages("BRugs") is all that is needed, unless you changed something to stop it working. (It is only available for R >= 2.6.0.) -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Improving data processing efficiency
Ok, sorry about the zip, then. :) Thanks for taking the trouble to clue me in as to the best posting procedure! well, here's a dput-ed version of the small data subset you can use for testing. below that, an updated version of the function, with extra explanatory comments, and producing an extra column showing exactly what is matched to what. to test, just run the function, with the dataset as sole argument. Thanks again; i'd appreciate any input on this. === begin dataset dput representation === structure(list(PERMNO = c(10001L, 10001L, 10298L, 10298L, 10484L, 10484L, 10515L, 10515L, 10634L, 10634L, ...), DATE = c(19900331, 19900630, 19900630, 19900331, ...), Shares.Owned = c(50100, 50100, 25, 293500, ...), ... [the remainder of the dput output, and the updated function that followed it, are truncated in the archive]
Re: [R] Subsetting to unique values
The interesting thing about R is that there are several ways to skin the cat; here is yet another solution: do.call(rbind, by(ddTable, ddTable$Id, function(z) z[1, , drop = FALSE])) Id name 1 1 Paul 2 2 Bob On Fri, Jun 6, 2008 at 9:35 AM, Emslie, Paul [Ctr] [EMAIL PROTECTED] wrote: I want to take the first row of each unique Id value from a data frame. For instance ddTable <- data.frame(Id = c(1, 1, 2, 2), name = c("Paul", "Joe", "Bob", "Larry")) I want a dataset that is Id name 1 Paul 2 Bob unique(ddTable) will give me all 4 rows, and unique(ddTable$Id) will give me c(1,2), but not accompanied by the name column. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
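And one more for the collection, assuming (as in the example) that the first occurrence of each Id is the row wanted:

ddTable[!duplicated(ddTable$Id), ]   # keeps the first row per Id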
Re: [R] ggplot questions
Thanx Thierry, Suggestion #1 had no effect. I have been playing with variants on #2 along the way. DaveT. -Original Message- From: ONKELINX, Thierry [mailto:[EMAIL PROTECTED] Sent: June 6, 2008 04:02 AM To: Thompson, David (MNR); hadley wickham Cc: r-help@r-project.org Subject: RE: [R] ggplot questions David, 1. Try scale_x_continuous(lim = c(0, 360)) + scale_y_continuous(lim = c(0, 16)) 2. You could set the colour of the gridlines equal to the backgroup colour with ggopt HTH, Thierry __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Improving data processing efficiency
That is going to be situation dependent, but if you have a reasonable upper bound, then that will be much easier and not far from optimal. If you pick the possibly too small route, then increasing the size in largish junks is much better than adding a row at a time. Pat Daniel Folkinshteyn wrote: thanks for the tip! i'll try that and see how big of a difference that makes... if i am not sure what exactly the size will be, am i better off making it larger, and then later stripping off the blank rows, or making it smaller, and appending the missing rows? on 06/06/2008 11:44 AM Patrick Burns said the following: One thing that is likely to speed the code significantly is if you create 'result' to be its final size and then subscript into it. Something like: result[i, ] <- bestpeer (though I'm not sure if 'i' is the proper index). Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User) [original post and function quoted in full; elided here as they repeat earlier messages verbatim]
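A sketch of the grow-in-large-chunks variant ('chunk' is a hypothetical chunk size; 'k' counts filled rows as in the preallocation sketch above):

if (k == nrow(result)) {   # out of room: add a chunk of empty rows at once
    extra <- as.data.frame(matrix(NA, nrow = chunk, ncol = ncol(result)))
    names(extra) <- names(result)
    result <- rbind(result, extra)
}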
Re: [R] Problem with subset
On Fri, 6 Jun 2008, Luca Mortarini wrote: [question quoted in full; see the original post above] For vector subsets you probably want "[". Because, from help("["): For ordinary vectors, the result is simply x[subset & !is.na(subset)]. But see ?rle. Something like myvector[ 1 : rle( myvector > 0 )$lengths[ 1 ] ] should work. HTH, Chuck Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
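A quick check of the rle() idea on a made-up oscillating series (a sketch):

acfv <- c(1.0, 0.8, 0.5, 0.2, -0.1, 0.3, -0.2)
acfv[1:rle(acfv > 0)$lengths[1]]   # 1.0 0.8 0.5 0.2 — the first positive run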
Re: [R] ggplot questions
Does the difference have something to do with ggplot() using ranges derived from the data? When I modify my original 'test' dataframe with two extra rows as defined below, I get expected results in both versions. Order shouldn't matter - and if it's making a difference, that's a bug. But I'm still not completely sure what you're expecting. This highlights my next question (warned you ;-) ), I have been unsuccessful in trying to define fixed plotting ranges to generate a 'template' graphic that I may reuse with successive 'overstory plot' data sets. I have used '+ xlim(0, 360) + ylim(0, 16)' but, this seems to not have any effect on the final plot layout. Could you please produce a small reproducible example that demonstrates this? It may well be a bug. Hadley -- http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Improving data processing efficiency
Cool, I do have an upper bound, so I'll try it and see how much of a speed boost it gives me. Thanks for the suggestion! on 06/06/2008 02:03 PM Patrick Burns said the following: That is going to be situation dependent, but if you have a reasonable upper bound, then that will be much easier and not far from optimal. If you pick the possibly too small route, then increasing the size in largish junks is much better than adding a row at a time. Pat [rest of the quoted exchange, original post, and function elided; they repeat earlier messages verbatim]
Re: [R] boxplot changes fontsize of labels
Please read the help for par(mfrow)! AFAICS this is nothing to do with boxplot(). In a layout with exactly two rows and columns the base value of 'cex' is reduced by a factor of 0.83; if there are three or more of either rows or columns, the reduction factor is 0.66. See also the 'consider the alternatives' note in that entry. On Fri, 6 Jun 2008, Sebastian Merz wrote: Hi all! So far I have learned some R, but finalizing my plots so they look publishable seems not to be possible. I set up some boxplots. Everything works well, but when I put more than two of them in one plot the labels of the axes appear smaller than the normal font size. x <- rnorm(30) ; y <- rnorm(30) ; par(mfrow=c(1,4)) ; boxplot(x, y, names=c("horray", "hurra")) ; mtext("Jubel", side=1, line=2) In case I take one or two boxplots this does not happen: par(mfrow=c(1,2)) ; boxplot(x, y, names=c("horray", "hurra")) ; mtext("Jubel", side=1, line=2) The cex.axis seems not to be changed, as setting it to 1.0 doesn't change the behaviour. If cex.axis=1.3 in the first example, the font size used by boxplot and by mtext is about the same. But as I use a function to draw quite a few of these plots, this hack is not a proper solution. I couldn't find anything about this behaviour in the documentation or on the net. Can anybody explain? All hints are appreciated. Thanks, S. Merz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
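A sketch of the workaround implied by that help entry — reset 'cex' after mfrow shrinks it (values follow the example in the question):

x <- rnorm(30); y <- rnorm(30)
par(mfrow = c(1, 4))
par(cex = 1)   # undo the 0.66 reduction a 1x4 layout triggers
boxplot(x, y, names = c("horray", "hurra"))
mtext("Jubel", side = 1, line = 2)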
Re: [R] Improving data processing efficiency
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Burns Sent: Friday, June 06, 2008 12:04 PM To: Daniel Folkinshteyn Cc: r-help@r-project.org Subject: Re: [R] Improving data processing efficiency That is going to be situation dependent, but if you have a reasonable upper bound, then that will be much easier and not far from optimal. If you pick the possibly too small route, then increasing the size in largish junks is much better than adding a row at a time. Pat, I am unfamiliar with the use of the word junk as a unit of measure for data objects. I figure there are a few different possibilities: 1. You are using the term intentionally meaning that you suggest he increases the size in terms of old cars and broken pianos rather than used up pens and broken pencils. 2. This was a Freudian slip based on your opinion of some datasets you have seen. 3. Somewhere between your mind and the final product jumps/chunks became junks (possibly a microsoft correction, or just typing too fast combined with number 2). 4. junks is an official measure of data/object size that I need to learn more about (the history of the term possibly being related to 2 and 3 above). Please let it be #4, I would love to be able to tell some clients that I have received a junk of data from them. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Improving data processing efficiency
On Fri, Jun 6, 2008 at 2:28 PM, Greg Snow [EMAIL PROTECTED] wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Burns Sent: Friday, June 06, 2008 12:04 PM To: Daniel Folkinshteyn Cc: r-help@r-project.org Subject: Re: [R] Improving data processing efficiency That is going to be situation dependent, but if you have a reasonable upper bound, then that will be much easier and not far from optimal. If you pick the possibly too small route, then increasing the size in largish junks is much better than adding a row at a time. Pat, I am unfamiliar with the use of the word junk as a unit of measure for data objects. I figure there are a few different possibilities: 1. You are using the term intentionally meaning that you suggest he increases the size in terms of old cars and broken pianos rather than used up pens and broken pencils. 2. This was a Freudian slip based on your opinion of some datasets you have seen. 3. Somewhere between your mind and the final product jumps/chunks became junks (possibly a microsoft correction, or just typing too fast combined with number 2). 4. junks is an official measure of data/object size that I need to learn more about (the history of the term possibly being related to 2 and 3 above). 5. Chinese sailing vessel. http://en.wikipedia.org/wiki/Junk_(ship) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Random Forest
Hello, does there exist a package for multivariate random forests, namely for multivariate response data? It seems to be impossible with the randomForest function, and I did not find any information about this in the help pages... Thank you for your help Bertrand __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] mean
Hi, I have a simple question. If I have a table and I want to compute the mean for each row, how can I do it? E.g.:

  c1 c2 c3 mean
1 12 13 14 ??
2 15 24 10 ??
...

Thanks, Marco __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
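One way (a sketch): rowMeans() over the numeric columns.

tab <- data.frame(c1 = c(12, 15), c2 = c(13, 24), c3 = c(14, 10))
tab$mean <- rowMeans(tab[, c("c1", "c2", "c3")])
tab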
Re: [R] Java to R interface
The path to R/bin is in the Windows PATH variable. Yet I get this error. On Jun 6, 10:37 am, Dumblauskas, Jerry [EMAIL PROTECTED] wrote: Try and make sure that R is in your Windows Path variable. I got your message when I first did this, but when I did the above it then worked... [[alternative HTML version deleted]] __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R (D)COM Server not working on windows domain account
I have installed R (D)COM on a (Windows) machine that is part of a Windows domain. If I run the test file in a local (log into this machine) administrative account, it works fine. If I run the test file on a domain account with administrative rights, it will not connect to the server, even if I change the account type from roaming to local. Anyone have any ideas?

Thanks, Gregg
Re: [R] Improving data processing efficiency
My guess is that number 2 is closest to the mark. Typing too fast is unfortunately not one of my habitual attributes.

Gabor Grothendieck wrote:
> [thread snipped; see the messages above]
> 5. Chinese sailing vessel. http://en.wikipedia.org/wiki/Junk_(ship)
[R] R + Linux
Dear all,

I'm planning to install Linux on my computer to run R (I'm bored of W..XP). However, I haven't used Linux before and I would appreciate, if possible, suggestions/comments about which would be the best option to install, say Fedora, Ubuntu or OpenSuse, which to my impression are the most popular ones (at least on the R-help lists). The computer is a PC desktop with 4GB RAM and an Intel Quad-Core Xeon processor and will be used only to run R.

Thanks, Steven
Re: [R] Improving data processing efficiency
-Original Message-
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
Sent: Friday, June 06, 2008 12:33 PM
To: Greg Snow
Cc: Patrick Burns; Daniel Folkinshteyn; r-help@r-project.org
Subject: Re: [R] Improving data processing efficiency

> [earlier thread snipped; see the messages above]
> 5. Chinese sailing vessel. http://en.wikipedia.org/wiki/Junk_(ship)

Thanks for expanding my vocabulary (hmm, how am I going to use that word in context today?). So, if 5 is the case, then Pat's original statement can be reworded as:

"If you pick the possibly too small route, then increasing the size in largish Chinese sailing vessels is much better than adding a row boat at a time."

While that is probably true, I am not sure what that would mean in terms of the original data processing question.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
Re: [R] mean
TABLE <- matrix(data = c(12, 13, 14, 15, 24, 10), byrow = TRUE, nrow = 2, ncol = 3)
TABLE
     [,1] [,2] [,3]
[1,]   12   13   14
[2,]   15   24   10
apply(TABLE, 1, mean)
[1] 13.0 16.3

Chunhao

Quoting Marco Chiapello [EMAIL PROTECTED]:
> [question snipped; see above]
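For completeness: base R also provides a vectorized rowMeans(), which avoids the per-row function calls of apply() and is the idiomatic choice for this task.

TABLE <- matrix(c(12, 13, 14, 15, 24, 10), byrow = TRUE, nrow = 2)
rowMeans(TABLE)                        # 13.00000 16.33333
cbind(TABLE, mean = rowMeans(TABLE))   # append the means as a fourth column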
Re: [R] ggplot questions
OK,

The original ggplot() construct (below) on the following two dataframes (test1, test2) generates different outputs, which I have attached. The output I expect is that shown in test2.png. My expectation is that, having set the plotting limits with 'scale_x_continuous(lim = c(0, 360)) + scale_y_continuous(lim = c(0, 16))', both data sets should produce the same output except for the 'o' at plot center and the 'N' at the top. The only difference between the two dataframes is the inclusion of the first two rows in test2, with the rplt column changed to character:

> test2[1:2, ]
  oplt rplt  az dist
1    0    o   0    0
2    0    N 360   16

Ahhh, wait a second! In composing this message I may have found the problem. It appears that including the 'scale_x_continuous()' component twice in my original version was causing (?) the erratic behaviour. And I have confirmed that the ordering of the layer, scale* and coord* components does not affect the output. However, I'm still getting more x-breaks than requested, with radial lines corresponding to 45, 135, 225, 315 degrees (NE, SE, SW, NW). Still open to suggestions on that.

# new version working with both dataframes
ggplot() +
  coord_polar() +
  layer(data = test1,
        mapping = aes(x = az, y = dist, label = rplt),
        geom = "text") +
  scale_x_continuous(lim = c(0, 360), breaks = c(90, 180, 270, 360),
                     labels = c('E', 'S', 'W', 'N')) +
  scale_y_continuous(lim = c(0, 16), breaks = c(0, 4, 8, 12, 16),
                     labels = c('centre', '4m', '8m', '12m', '16m'))

# original version NOT WORKING with test1
ggplot() +
  coord_polar() +
  scale_x_continuous(lim = c(0, 360)) +
  scale_y_continuous(lim = c(0, 16)) +
  layer(data = test,
        mapping = aes(x = az, y = dist, label = rplt),
        geom = "text") +
  scale_x_continuous(breaks = c(90, 180, 270, 360),
                     labels = c('90', '180', '270', '360'))

# data generating test1.png
test1 <- structure(list(
    oplt = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    rplt = 1:10,
    az = c(57L, 94L, 96L, 152L, 182L, 185L, 227L, 264L, 332L, 354L),
    dist = c(4.09, 2.8, 7.08, 7.09, 3.28, 7.85, 6.12, 1.97, 7.68, 7.9)),
  .Names = c("oplt", "rplt", "az", "dist"),
  row.names = c(NA, 10L), class = "data.frame")

# data generating test2.png
test2 <- structure(list(
    oplt = c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
    rplt = c("o", "N", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"),
    az = c(0, 360, 57, 94, 96, 152, 182, 185, 227, 264, 332, 354),
    dist = c(0, 16, 4.09, 2.8, 7.08, 7.09, 3.28, 7.85, 6.12, 1.97, 7.68, 7.9)),
  .Names = c("oplt", "rplt", "az", "dist"),
  row.names = c(NA, 12L), class = "data.frame")

Many, many thanks for your patience and perseverance on this one Hadley,
DaveT.

-Original Message-
From: hadley wickham [mailto:[EMAIL PROTECTED]
Sent: June 6, 2008 02:06 PM
To: Thompson, David (MNR)
Cc: r-help@r-project.org
Subject: Re: [R] ggplot questions

> Does the difference have something to do with ggplot() using ranges
> derived from the data? When I modify my original 'test' dataframe with
> two extra rows as defined below, I get expected results in both versions.

Order shouldn't matter, and if it's making a difference, that's a bug. But I'm still not completely sure what you're expecting.

> This highlights my next question (warned you ;-) ). I have been
> unsuccessful in trying to define fixed plotting ranges to generate a
> 'template' graphic that I may reuse with successive 'overstory plot'
> data sets. I have used '+ xlim(0, 360) + ylim(0, 16)' but this seems
> to have no effect on the final plot layout.

Could you please produce a small reproducible example that demonstrates this? It may well be a bug.
Hadley
--
http://had.co.nz/

[attachments: test1.png, test2.png]
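On the leftover radial lines at 45, 135, 225 and 315 degrees: those look like minor grid lines rather than extra breaks. In later versions of ggplot2 (an assumption; this may not apply to the release used in this thread) they can be suppressed explicitly:

scale_x_continuous(limits = c(0, 360),
                   breaks = c(90, 180, 270, 360),
                   labels = c('E', 'S', 'W', 'N'),
                   minor_breaks = NULL)   # draw no minor grid lines between breaks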