Re: [R] specifications windows pc
Good morning Ruud,

What sort of tasks are you going to be doing in R? Some tasks will be faster on a single-core "extreme" type processor, and other tasks can benefit from a multi-core processor (which runs at slower clock speeds than an extreme single-core). If you're working with large matrices, then an optimized BLAS can help. Do the problems you'll be working on require more than 1500 MB of RAM? If so, then you should consider a 64-bit Linux on a 64-bit CPU. The more performance you're looking for, the more work you have to do to get it!

As an aside, I don't know whether AMD or Intel processors are faster, clock-speed for clock-speed or bang-for-buck, when doing R-ish tasks (int / float etc.).

Kind Regards,
Sean

R.H. Koning wrote:

Hello, I am about to order a new workstation at my university that will be used for R (and other research related tasks). I would appreciate any feedback on the specifications of a very fast machine. The machine should run Windows (XP probably better than Vista). Which chip, memory size and specification, etc. should I be looking for?

Thanks, Ruud

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

-- View this message in context: http://www.nabble.com/specifications-windows-pc-tp20730325p20733228.html Sent from the R help mailing list archive at Nabble.com.
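One rough way to compare candidate machines (or BLAS builds) for matrix-heavy work is to time a dense matrix product on each. The sketch below is only an illustration: the matrix size n is an arbitrary assumption, and the elapsed time depends entirely on the hardware and on the BLAS that R is linked against.

```r
# Time a dense cross-product; run the same script on each candidate machine.
set.seed(1)
n <- 500                                 # arbitrary size, chosen for illustration
A <- matrix(rnorm(n * n), nrow = n, ncol = n)

timing <- system.time(B <- crossprod(A))  # B = t(A) %*% A, handled by the BLAS
print(timing["elapsed"])
```

Increasing n makes the differences between a reference BLAS and an optimized one easier to see.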
Re: [R] How to create a string containing '\/' to be used with SED?
Good morning,

You do not need to quote a forward slash / in R, but you do need to quote a backslash when you're inputting it... so to get a string which actually contains blah\/blah you need to type "blah\\/blah".

http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-does-backslash-behave-strangely-inside-strings_003f

Unless this is a very, very big file you shouldn't need to go out to sed, as gsub() should work adequately, and it will probably be quicker and cleaner. So something along the lines of the following (UNTESTED!!! since I don't have a reproducible example):

    tmp1 <- readLines(configurationFile)
    tmp1 <- gsub("^instance .*", paste("instance = ", data$instancePath, "/", data$newInstance, sep = ""), tmp1)

I'm working on 50 MB text files and doing all sorts of manipulations, and I do it all inside R under Windows XP. Reading a 50 MB text file across the 100 Mb network and doing a gsub() on most lines takes an elapsed 16 seconds on this office desktop.

hth...

Regards,
Sean

ikarus wrote:

Hi guys, I've been struggling to find a solution to the following issue: I need to change strings in .ini files that are given as input to a program whose output is processed by R. The strings to be changed look like:

    instance = /home/TSPFiles/TSPLIB/berlin52.tsp

I normally use sed for this kind of thing. So, inside R I'd like to write something like:

    command <- paste("sed -i 's/^instance .*/instance = ", data$instancePath, data$newInstance, "/' ", configurationFile, sep = "")
    system(command)

This will overwrite the line starting with "instance" using "instance = the_new_instance". In the example I gave, data$instancePath = "/home/TSPFiles/TSPLIB/" and data$newInstance = "berlin52.tsp". The problem is that I need to pass the above path string to sed in the form:

    \/home\/TSPFiles\/TSPLIB\/

However, I couldn't find a way to create such a string in R. I tried in several different ways, but it always complains saying that '\/' is an unrecognized escape! Any suggestion? Thanks!
-- View this message in context: http://www.nabble.com/How-to-create-a-string-containing-%27%5C-%27-to-be-used-with-SED--tp20694319p20696613.html
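The gsub() suggestion above can be tried on a small in-memory example. This is a minimal sketch under stated assumptions: the contents of cfg, the variable names and the temporary output file are hypothetical stand-ins for the real .ini file and for configurationFile, and sub() is used because each line is changed at most once.

```r
# Hypothetical config contents; a real file would be read with readLines().
cfg <- c("seed = 42",
         "instance = /home/TSPFiles/TSPLIB/old.tsp",
         "runs = 10")

instancePath <- "/home/TSPFiles/TSPLIB/"
newInstance  <- "berlin52.tsp"

# The forward slashes need no escaping because nothing is passed to sed.
new_line <- paste("instance = ", instancePath, newInstance, sep = "")
cfg2 <- sub("^instance .*", new_line, cfg)

# Write the result to a temporary file (stand-in for configurationFile).
out <- tempfile(fileext = ".ini")
writeLines(cfg2, out)
print(readLines(out))
```

Only the line starting with "instance " changes; the other lines pass through untouched. Note that a backslash in the replacement string would still be special to sub(), so paths containing backslashes would need extra care.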
Re: [R] memory limit
Good afternoon,

The short answer is yes; the long answer is that it depends.

It all depends on what you want to do with the data. I'm working with dataframes of a couple of million lines on this plain desktop machine, and for my purposes it works fine. I read in text files, manipulate them, convert them into dataframes, and do some basic descriptive stats and tests on them, a couple of columns at a time, all quick and simple in R. There are some libraries which are set up to handle very large datasets, e.g. biglm [1].

If you're using algorithms which require vast quantities of memory, then as the previous emails in this thread suggest, you might need R running on 64-bit. If you're working with a problem which is embarrassingly parallel [2], then there are a variety of solutions. If you're in between, then the solutions are much more data dependent.

The flip question: how long would it take you to get up and running with the functionality you require (tried and tested in R) if you're going to be re-working things in C++? I suggest that you have a look at R, possibly using a subset of your full set to start with. You'll be amazed how quickly you can get up and running.

As suggested at the start of this email... it depends...

Best Regards,
Sean O'Riordain
Dublin

[1] http://cran.r-project.org/web/packages/biglm/index.html
[2] http://en.wikipedia.org/wiki/Embarrassingly_parallel

iwalters wrote:

I'm currently working with very large datasets that consist of 1,000,000+ rows. Is it at all possible to use R for datasets this size, or should I rather consider C++/Java?

-- View this message in context: http://www.nabble.com/increasing-memory-limit-in-Windows-Server-2008-64-bit-tp20675880p20700590.html
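A back-of-the-envelope check of whether a dataset will fit in memory is to count 8 bytes per numeric value. The sketch below assumes a hypothetical shape of one million rows by ten numeric columns, and then verifies the rule of thumb against object.size() on a smaller data frame; real data with character or factor columns will behave differently.

```r
# Hypothetical dataset shape: 1e6 rows, 10 numeric (double) columns.
rows <- 1e6
cols <- 10
est_mb <- rows * cols * 8 / 2^20        # 8 bytes per double, converted to MB
cat("Estimated size:", round(est_mb, 1), "MB\n")

# Check the 8-bytes-per-value rule on a smaller data frame.
small    <- as.data.frame(matrix(rnorm(1e5 * 3), ncol = 3))
actual   <- as.numeric(object.size(small))
expected <- 1e5 * 3 * 8
cat("Actual / expected bytes:", actual / expected, "\n")
```

Operations on the data typically need working copies, so the practical requirement is usually some multiple of this estimate.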
Re: [R] How to create a string containing '\/' to be used with SED?
What is the error message?

I can say

    fred <- "blah1\\/blah2\\/blah3"

and then the string looks like...

    cat("#", fred, '#\n', sep='')
    #blah1\/blah2\/blah3#

If you just ask R to print it, then it looks like...

    fred
    [1] "blah1\\/blah2\\/blah3"

When you're playing with strings and regular expressions, it's vital to understand the backslash quoting mechanism...

Best regards,
Sean

ikarus wrote:

I still can't create a string with \/ inside (e.g., a <- "..\\/path\\/file" doesn't work, R complains and the \\ are removed), ... snip

-- View this message in context: http://www.nabble.com/How-to-create-a-string-containing-%27%5C-%27-to-be-used-with-SED--tp20694319p20713699.html
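The difference between what a string contains and how print() displays it can be checked with nchar(). The sketch below also shows one way to build the sed-style escaped path from the earlier message in this thread; the variable names path and escaped are illustrative.

```r
fred <- "blah1\\/blah2\\/blah3"

print(nchar(fred))   # 19: each "\\" typed in the source is ONE stored backslash
print(fred)          # print() re-escapes, so backslashes appear doubled
cat(fred, "\n")      # cat() writes the raw characters

# Escaping every "/" in a path for use inside a sed expression.
# In a regex replacement, "\\\\" (typed) stands for one literal backslash.
path    <- "/home/TSPFiles/TSPLIB/"
escaped <- gsub("/", "\\\\/", path)
cat(escaped, "\n")
```

The path has four slashes, so the escaped string is exactly four characters longer than the original.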
Re: [R] exponential of a matrix
Good morning,

Try expm() in the Matrix package by Douglas Bates and Martin Maechler:

http://www.stats.bris.ac.uk/R/web/packages/Matrix/index.html

Note that there is a revised version of that paper: Cleve Moler and Charles Van Loan (2003) Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Review 45, 1, 3–49.

Regards,
Sean O'Riordain

Peter Dalgaard wrote:

Terry Therneau wrote:

Is the matrix exponential available in some package?

Multiple. At least Matrix and msm. One of Jim Lindsey's too, but I think that's one of the more dubious ones.

The canonical reference is "Nineteen dubious ways to take the exponential of a matrix". (Love that title)

Yes, it's a classic. As I recall it, the paper misses one point, though: you often want a fast way of computing exp(tQ) (or exp(tQ)%*%v or u%*%exp(tQ)) for multiple values of t, and it is mainly about finding exp(Q) as accurately as possible.

Terry T.

-- Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen

-- View this message in context: http://www.nabble.com/exponential-of-a-matrix-tp20449726p20454590.html
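On the point about needing exp(tQ) for many values of t: when Q is diagonalizable, one eigendecomposition can be reused for every t. The sketch below uses base R only and a hypothetical 2-state generator matrix; it is an illustration of that idea, not the algorithm used by expm() in Matrix or msm, and it can be inaccurate or fail outright when Q is defective or its eigenvector matrix is badly conditioned.

```r
# Hypothetical generator matrix of a 2-state Markov process (rows sum to 0).
Q <- matrix(c(-1,  1,
               2, -2), nrow = 2, byrow = TRUE)

# Decompose once: Q = V diag(lambda) V^{-1}
e    <- eigen(Q)
V    <- e$vectors
Vinv <- solve(V)

# exp(tQ) = V diag(exp(t * lambda)) V^{-1}, cheap to evaluate for each t
expm_eig <- function(t) Re(V %*% diag(exp(t * e$values)) %*% Vinv)

P <- lapply(c(0.1, 1, 10), expm_eig)
print(P[[2]])   # transition probabilities at t = 1
```

For a generator matrix each exp(tQ) should have rows summing to 1, which gives a quick sanity check on the result.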
Re: [R] Pros and Cons of R
Neil Shephard wrote:

Another pro to consider is the cost: you can obtain R for free, whereas SAS/S-Plus/Stata all have licenses of some sort that require purchasing.

Neil

Which has the side effect of *not* restricting how many machines are available for use, or where. For example, I was running a big process a couple of different times with different scenarios, so I just fired up a few unused machines and had them all running in parallel for the afternoon. There were no installation issues, as I was able to run it off the network drive (Windows, as it happens). If I was licence restricted this would not have been possible. Similarly, I can do analyses at home on any machine, or even if I'm visiting somewhere else!

Regards
Sean

-- View this message in context: http://www.nabble.com/Pros-and-Cons-of-R-tp17407521p17424335.html
Re: [R] How to plot wind direction and strength field
Jenny,

Have a look at the R Newsletter Volume 3/2, October 2003.

Regards,
Sean

Jenny Barnes wrote:

Dear R-help community,

I have searched through the archives and not been able to find any advice on how to plot a wind field with one arrow per grid square, with the arrow pointing in the direction of the wind and its size proportional to the wind strength. I have the wind speed data in arrays of [lon,lat,uwind] and [lon,lat,vwind], so it is broken down into u and v components. How do I plot it though?!?! Any suggestions very welcome indeed - I seem to have hit a brick wall.

All the best,
Jenny

Jennifer Barnes
PhD student: long range drought prediction
Climate Extremes Group
Department of Space and Climate Physics
University College London
Holmbury St Mary
Dorking, Surrey, RH5 6NT
Web: http://climate.mssl.ucl.ac.uk

-- View this message in context: http://www.nabble.com/generic-question--%3E-Genomics-with-R-tp16954827p16958167.html
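For readers without the newsletter to hand, a wind field of this kind can be drawn with arrows() from base graphics. This is a minimal sketch and not necessarily the approach described in the newsletter: the grid, the random u and v values and the scaling factor are all made up for illustration, and the data are held in a long-format data frame rather than the [lon,lat,uwind] arrays mentioned in the question.

```r
# Hypothetical 5 x 4 grid with made-up wind components.
lon  <- seq(-10, 10, by = 5)
lat  <- seq(40, 55, by = 5)
grid <- expand.grid(lon = lon, lat = lat)
set.seed(1)
grid$u <- runif(nrow(grid), 1, 5)     # eastward component
grid$v <- runif(nrow(grid), -3, 3)    # northward component

# Scale so the strongest wind spans 80% of one 5-degree grid cell.
speed <- sqrt(grid$u^2 + grid$v^2)
scale <- 0.8 * 5 / max(speed)
x1 <- grid$lon + scale * grid$u
y1 <- grid$lat + scale * grid$v

# asp = 1 keeps arrow directions undistorted on the page.
plot(grid$lon, grid$lat, type = "n", asp = 1,
     xlim = range(c(grid$lon, x1)), ylim = range(c(grid$lat, y1)),
     xlab = "Longitude", ylab = "Latitude")
arrows(grid$lon, grid$lat, x1, y1, length = 0.05)
```

Each arrow starts at its grid point, points along (u, v), and has a length proportional to the wind speed.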
Re: [R] select rows from data based on a vector of char strings
or using the %in% operator...

    ?"%in%"
    data[data$label %in% flist, ]

regards,
Sean

Applejus wrote:

Hi, You are right, the == doesn't work, but there's a workaround using regular expressions:

    flist <- "fun|food"
    grep(flist, data$label)

will give you the vector [2 4], which are the numbers of the rows of interest!

Dirkheld wrote:

Hi, I have loaded a dataset in R:

    data =
    label  freq1  freq2
    news   54     35
    fun    37     21
    milk   19     7
    food   3      3
    etc

And I have a vector

    flist <- c("fun", "food")

Now I want to use the vector 'flist' for selecting these values from 'data' so that I get the following dataset:

    label  freq1  freq2
    fun    37     21
    food   3      3

When I do 'data$label==flist[1]' I get 'F T F F', so it works for one item in the char vector flist. But when I do 'data$label==flist' I get 'F F F F' while I expected 'F T F T'. It seems that I can't perform this action with a vector of char strings? Is there another way to do so?

-- View this message in context: http://www.nabble.com/select-rows-from-data-based-on-a-vector-of-char-strings-tp16832735p16848199.html
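The thread's example can be reproduced end to end. The sketch below rebuilds the four rows shown in the question (the trailing "etc" rows are omitted) and selects with %in%; the name selected is illustrative. The reason == disappoints is that it compares element by element, recycling the shorter vector, rather than testing membership.

```r
data <- data.frame(label = c("news", "fun", "milk", "food"),
                   freq1 = c(54, 37, 19, 3),
                   freq2 = c(35, 21, 7, 3),
                   stringsAsFactors = FALSE)
flist <- c("fun", "food")

# == pairs up elements: news/fun, fun/food, milk/fun, food/food
print(data$label == flist)

# %in% asks, for each label, whether it occurs anywhere in flist
print(data$label %in% flist)

selected <- data[data$label %in% flist, ]
print(selected)
```

Unlike the grep() workaround, %in% matches whole strings only, so a label such as "funfair" would not be picked up by accident.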
Re: [R] Documentation General Comments
Good morning,

Firstly I'd like to say that I'm a huge fan of R and I think it's a great system.

Part of the problem in searching for information is knowing what buzzwords / keywords to use. I was recently caught out like this, as I didn't see my problem as a cumulative sum (keyword = cumsum), only as referencing one line of a dataframe from another. Academic papers and certain webpages add special classification keywords to the text of a page to help. Searching is a general problem, not just within R - ask any archivist or librarian!

A partial solution is to have disambiguation pages, e.g. http://en.wikipedia.org/wiki/Comma Is it reasonable to have help pages with no specific R / package item behind them, only a See Also section? Does somebody have access to the most frequent RSiteSearch() terms?

It would probably help to increase the number of See Also details. For example, when I run into a problem the first thing I do is try to recreate it as a reproducible toy problem which I could send to this list (which, incidentally, is actually a great way of figuring out a solution to the problem without having to bother the list!!!). To do this I invariably want to generate some random numbers, and I can never remember the names runif() or rnorm(), so I say help.search("random"), which doesn't actually reference either runif() or rnorm() directly, so I look at ?RNG, which leads me to rnorm(). Already knowing that this is what I'm looking for, I'm OK, but if somebody didn't already know this it is not obvious.

I appreciate that there is always a difficult balance when writing documentation between having enough and too much. Just looking at the core documentation for R-2.6.2 (and ignoring the many, many additional packages), the introduction to R is 100 pages of PDF and the reference manual runs to 1,576 pages of PDF.
Adding more information, as many of us want, would make the reference manual even more unwieldy and far too big to print out to peruse, which gives rise to a market for books which take over where the introduction manual leaves off...

Part of the difficulty that we encounter is that sometimes our difficulties are pure R, and other times the difficulty is statistical or mathematical. More often than not the problem is between the two, and frequently those of us asking the question don't actually know where on the spectrum it is...

Q: Could there be ways other than submitting a bug / patch to help improve R?
Q: Should this discussion be on r-devel or r-help?

Best Regards,
Sean

The root of the problem is that R is a voluntary/cooperative project and those who develop and maintain R are (generously) contributing their time and probably have little-to-no time left over to devote to the improvement of the documentation.

snip...

This is why the documentation tends to be opaque in the first place. The people who build R are so clever and understand so much that they cannot put themselves in the shoes of those of us who are not so blessed with intelligence and erudition. So they (often) write terse cryptic instructions which (often) depend on background knowledge that many of us lack. That background knowledge can of course be found ***if you know where to look*** --- or even if you don't, given that you are prepared to put in sufficient time and effort searching ***and*** are clever at searching. It's that last requirement that leaves *me* out in the cold.

snip...

-- View this message in context: http://www.nabble.com/Documentation-General-Comments-tp16821085p16833353.html
[R] running balance down a dataframe referring back to previous row
Good morning,

I've searched high and low and I've tried many different ways of doing this, but I can't seem to get it to work. I'm looking for a way of vectorising a running balance; i.e. the value in the first row of the dataframe is zero, and following rows add to this running balance. This is easy to write in a loop, but I can't seem to get it working in vectorised code. Hopefully the example below will explain what I'm trying to do...

    # create a dummy dataframe
    txns <- data.frame(LETTERS)
    set.seed(123)

    # randomly specify debit / credit columns
    txns$drcr <- sample(c('d','c'), nrow(txns), replace=TRUE)
    txns$dr <- 0
    txns$cr <- 0

    # give values to the debits / credits...
    txns[txns$drcr == 'd', 'dr'] <- runif(nrow(txns[txns$drcr == 'd',]), min=0, max=1)
    txns[txns$drcr == 'c', 'cr'] <- runif(nrow(txns[txns$drcr == 'c',]), min=0, max=1)

    # reset the initial dr/cr value to zero...
    txns[1,'dr'] <- 0
    txns[1,'cr'] <- 0

    # initialize the entire running balance column to zero
    txns$rbal <- 0

    # set up a row index starting at row 2 so that we only operate on these rows...
    r0 <- c(2:nrow(txns))

    # set up a row index offset by 1 so that we can access the running balance
    # from the previous line...
    r1 <- c(2:nrow(txns)) - 1

    # calculate the running balance using vectorized code
    # unfortunately this doesn't work...
    txns[r0,'rbal'] <- txns[r1,'rbal'] + txns[r0,'dr'] - txns[r0,'cr']

    # calculate the running balance using a loop
    txns$running.bal <- 0
    for (i in (2:nrow(txns))) {
      txns[i,'running.bal'] <- txns[(i-1), 'running.bal'] + txns[i, 'dr'] - txns[i, 'cr']
    }
    txns

    # I was hoping that rbal and running.bal would be the same... evidently not...

I've even tried --vanilla...

Is there a specified order in which vectorized dataframe calculations are carried out? Top to bottom or unspecified? Does it work off a copy and then replace the old column? Do I just have to use a loop for this?
    platform       i386-pc-mingw32
    arch           i386
    os             mingw32
    system         i386, mingw32
    status
    major          2
    minor          6.2
    year           2008
    month          02
    day            08
    svn rev        44383
    language       R
    version.string R version 2.6.2 (2008-02-08)

Many thanks in advance,

Best regards,
Sean O'Riordain

-- View this message in context: http://www.nabble.com/running-balance-down-a-dataframe-referring-back-to-previous-row-tp16142263p16142263.html
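The vectorised attempt in this question fails because the whole right-hand side is evaluated before anything is assigned, so every row sees the old rbal column, which is all zeros. The keyword named in the "Documentation General Comments" message, cumsum(), gives the running balance directly. The sketch below uses a simplified stand-in for the txns data frame (every row gets both a debit and a credit value), so its numbers will not match the original example.

```r
# Simplified stand-in for the txns data frame from the question.
set.seed(123)
txns <- data.frame(id = LETTERS)
txns$dr <- runif(nrow(txns))
txns$cr <- runif(nrow(txns))
txns[1, c("dr", "cr")] <- 0          # the first row carries a zero balance

# Reference result: the explicit loop from the question
txns$running.bal <- 0
for (i in 2:nrow(txns)) {
  txns[i, "running.bal"] <- txns[i - 1, "running.bal"] +
    txns[i, "dr"] - txns[i, "cr"]
}

# Vectorised: the balance is the cumulative sum of net movements
txns$rbal <- cumsum(txns$dr - txns$cr)

print(all.equal(txns$rbal, txns$running.bal))
```

all.equal() is used rather than == because the two calculations can differ by floating-point rounding.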
Re: [R] spliting strings ...
Good afternoon Monica,

Relying on regular expressions, substitute nothing for everything from the first space to the end of the line (the end of the line being matched by the dollar sign):

    str1 <- sub(" .*$", "", str)

Regards,
Sean

Monica Pisica wrote:

Hi everyone, I have a vector of strings, each string made up of a different number of words. I want to get a new vector which has only the first word of each string in the first vector. I came up with this:

    str <- c('aaa bbb', 'cc', 'd eee aa', 'mmm o n')
    str1 <- rep(1, length(str))
    for (i in 1:length(str)) {
      str1[i] <- strsplit(str, " ")[[i]][1]
    }
    str1
    'aaa' 'cc' 'd' 'mmm'

Now, is there any way to do this more simply?

Thanks,
Monica

-- View this message in context: http://www.nabble.com/spliting-strings-...-tp14316255p14316361.html
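Both the regular-expression answer and a loop-free version of the strsplit() approach can be checked on the vector from the question. The names first_sub and first_split are illustrative only.

```r
str <- c("aaa bbb", "cc", "d eee aa", "mmm o n")

# Regular expression: drop everything from the first space onwards
first_sub <- sub(" .*$", "", str)

# Alternative: split on spaces and keep the first piece of each element
first_split <- sapply(strsplit(str, " ", fixed = TRUE), `[`, 1)

print(first_sub)
print(first_split)
```

Strings with no space, such as "cc", are returned unchanged by both versions.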
Re: [R] Re placing values job
FYI, on my machine match() runs *much* faster...

    t0 <- Sys.time(); for (i in 1:reps) { match(Y,X) }; print(Sys.time() - t0)
    Time difference of 0.1570001 secs

    t0 <- Sys.time(); for (i in 1:reps) { sapply(Y,function(Y){which(Y==X)}) }; print(Sys.time() - t0)
    Time difference of 6.093 secs

    6.09/.157
    [1] 38.78981

Regards,
Sean

Peter Dalgaard wrote:

Ingmar Visser wrote:

does this do what you want?

    sapply(y,function(y){which(y==x)})

Maybe, but match(Y,X) would be more to the point.

hth, Ingmar

On 28 Nov 2007, at 15:53, Serguei Kaniovski wrote:

Hello, I have two vectors of different lengths which contain the same set of values:

    X <- c(2,6,1,7,4,3,5)
    Y <- c(1,1,6,4,6,1,4,1,2,3,6,6,1,2,4,4,5,4,1,7,6,6,4,4,7,1,2)

How can I replace the values in Y with the index (!) of the corresponding values in X? So 2 appears in X in the first coordinate, so all 2's in Y should be replaced by 1, etc.

Thank you for your help,
Serguei

Austrian Institute of Economic Research (WIFO)
http://www.wifo.ac.at/Serguei.Kaniovski

Ingmar Visser
Department of Psychology, University of Amsterdam

-- Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen

-- View this message in context: http://www.nabble.com/Replacing-values-job-tf4889131.html#a14021232
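The comparison in this thread can be rerun with the vectors from the original question. In the sketch below the value of reps is an assumption, since the message does not say what was used, and the timings will vary by machine; system.time() replaces the Sys.time() arithmetic purely for convenience.

```r
X <- c(2, 6, 1, 7, 4, 3, 5)
Y <- c(1, 1, 6, 4, 6, 1, 4, 1, 2, 3, 6, 6, 1, 2,
       4, 4, 5, 4, 1, 7, 6, 6, 4, 4, 7, 1, 2)

# Both approaches give the position in X of each value of Y
idx_match <- match(Y, X)
idx_which <- sapply(Y, function(y) which(y == X))
print(idx_match)

# Timing; reps is hypothetical (the original value is not given)
reps <- 1000
t_match <- system.time(for (i in seq_len(reps)) match(Y, X))
t_which <- system.time(for (i in seq_len(reps)) sapply(Y, function(y) which(y == X)))
print(rbind(match = t_match["elapsed"], sapply_which = t_which["elapsed"]))
```

Beyond speed, match() also behaves more predictably: it returns NA for a value missing from X, whereas the sapply() version would return a list containing zero-length elements.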