Re: [R] Issue with gc() on Ubuntu 20.04
On 27-08-2023 21:02, Ivan Krylov wrote:
> On Sun, 27 Aug 2023 19:54:23 +0100 John Logsdon wrote:
>> Not so, although it did lower the gc() time to 95.84%. This was on a
>> 16-core Threadripper 1950X box, so I was intending to use library
>> parallel, but I tried it on my lowly Windows box that is years old and
>> got it down to 88.07%.
>
> Does the Windows box have the same version of R on it?

Yes, they are both 4.3.1.

>> The only thing I can think of is that there are quite a lot of cases
>> where a function is generated on the fly, as in:
>> eval(parse(t=paste("dprob <- function(x,l,s){",dist.functions[2,][dist.functions[1,]==distn],"(x,l,s)}",sep="")))
>
> This isn't very idiomatic. If you need dprob to call the function named
> in dist.functions[2,][dist.functions[1,]==distn], wouldn't it be easier
> for R to assign that function straight to dprob?
>
> dprob <- get(dist.functions[2,][dist.functions[1,]==distn])
>
> This way, you avoid the need to parse the code, which is typically not
> the fastest part of a programming language. (Generally, in R and other
> programming languages with recursive data structures, storing variable
> names in other variables is not very efficient. Why not put functions
> directly into a list?)

Agreed, but this statement and other similar ones are only assigned once,
in an outer loop.

> Rprof() samples the whole call stack. Can you find out which functions
> result in a call to gc()? I haven't experimented with a wide sample of
> R code, but I don't usually encounter gc() as a major entry in my
> Rprof() outputs.

From the first table, removing all the system functions, it suggests that
the function do.combx() is mainly guilty. I have recoded it and gc() no
longer appears - as it shouldn't with it switched off! One difference was
that the new code used the built-in combn function while the old code
used gtools::combinations. I need gtools::permutations elsewhere, but
that is not time critical.

Thanks Ivan for making me think!

--
John Logsdon
Quantex Research Ltd
m:+447717758675/h:+441614454951
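A minimal sketch of the list-of-functions idea Ivan mentions (the entries
in dist.functions, the value of distn and the two distributions here are
illustrative only, not taken from the original program):

dist.functions <- list(
  weibull = function(x, l, s) dweibull(x, shape = s, scale = l),
  lnorm   = function(x, l, s) dlnorm(x, meanlog = l, sdlog = s)
)
distn <- "weibull"
dprob <- dist.functions[[distn]]   # plain list lookup, nothing to parse()
dprob(1.5, 2, 1.3)

Indexing the list once per outer iteration does the same job as the
eval(parse()) construction, without building and parsing a string.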
[R] Issue with gc() on Ubuntu 20.04
Folks

I have come across an issue with gc() hogging the processor according to
Rprof.

Platform is Ubuntu 20.04, all up to date, R version 4.3.1; libraries:
survival, MASS, gtools and openxlsx.

With default gc.auto options, the profiler reports the garbage collector
with self.pct 99.39%. So I have tried switching it off using
options(gc.auto=Inf) in the R session before running my program using
source(). This lowered self.pct to 99.36%. Not much there.

After some pondering, I added an options(gc.auto=Inf) at the beginning of
each function, not resetting it at exit, but expecting the offending
function(s) to plead guilty. Not so, although it did lower the gc() time
to 95.84%. This was on a 16-core Threadripper 1950X box, so I was
intending to use library parallel, but I tried it on my lowly Windows box
that is years old and got it down to 88.07%.

The only thing I can think of is that there are quite a lot of cases
where a function is generated on the fly, as in:

eval(parse(t=paste("dprob <- function(x,l,s){",dist.functions[2,][dist.functions[1,]==distn],"(x,l,s)}",sep="")))

I haven't added the options to any of these.

The highest time used by any of my functions is 0.05% - the rest is
dominated by gc(). There may not be much point in parallelising the code
until I can reduce the garbage collection.

I am not short of memory and would like to disable it fully, but despite
adding the option to all routines, I haven't managed to do this yet. Can
anyone advise me? And why is the Linux version so much worse than
Windows?

TIA

--
John Logsdon
Quantex Research Ltd
m:+447717758675/h:+441614454951
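For reference, a small self-contained sketch of this kind of profiling
run (the toy workload is illustrative only, not the program discussed
above). With gc.profiling = TRUE, Rprof() marks samples taken inside the
garbage collector, so "<GC>" appears as its own entry in summaryRprof():

Rprof("prof.out", gc.profiling = TRUE)   # mark samples taken during GC
res <- replicate(200, {
  x <- rnorm(1e5)                        # enough allocation to trigger collections
  quantile(x, c(0.25, 0.75))
})
Rprof(NULL)
head(summaryRprof("prof.out")$by.self)   # look for the "<GC>" row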
Re: [R] apply and cousins
Thanks Jim and others (and sorry Jim - an early version of this slipped
into your inbox :)). Apologies for not giving some concrete code - I was
trying to explain in words.

What I need to do is to fit a simple linear model to successive sections
of a long matrix. So far, the best solution I have come up with uses
apply twice.

Generate some data in a 10*3 matrix:

N = 10
Z = cbind(1:N, cumsum(rnorm(N,1,0.01)), rnorm(N,1.2,0.1))
# first column is an index, the second a monotonically increasing value
# representing time, and the third the measurements I want to process

Then write a function dVals1:

dVals1 = function(Y,DD,dT){ which.min((Y[2]-dT) > DD[,2]) }

which identifies the first row where the time is greater than the current
time - dT. So to identify the start of the data (say) 10 time units
before each row, we use apply and prepend this as a column to the array
for later use:

ZZ = cbind(apply(Z,1,dVals1,Z,10), Z)

There may be some cases, particularly at the start, where later values
are extracted because the minimum returned by which.min is 1. I now have
start and finish pointers for each position, so I can proceed to fit a
simple linear model with the following function:

dVals2 = function(D2,DD){
  if((D2[2]-D2[1]) < 10){ return(rep(0,2)) }   # reject short windows
  DX = DD[D2[1]:D2[2],]
  # measurement ~ time; the columns of ZZ are start, index, time, value
  Res = as.vector(lm(DX[,4] ~ DX[,3])$coefficients)
  return(Res)
}

which returns two zeros if the window has fewer than 10 rows, otherwise
the intercept and slope calculated over the specified range. Applying
this to the whole data by:

t(apply(ZZ,1,dVals2,DD=ZZ))

does the job, I think, returning the results as an N * 2 matrix.

> Hi John,
> With due respect to the other respondents, here is something that might
> help:
>
> # get a vector of values
> foo <- rnorm(100)
> # get a vector of increasing indices (aka your "recent" values)
> bar <- sort(sample(1:100, 40))
> # write a function to "clump" the adjacent index values
> clump_adj_int <- function(x) {
>   index_list <- list(x[1])
>   list_index <- 1
>   for(i in 2:length(x)) {
>     if(x[i] == x[i-1] + 1)
>       index_list[[list_index]] <- c(index_list[[list_index]], x[i])
>     else {
>       list_index <- list_index + 1
>       index_list[[list_index]] <- x[i]
>     }
>   }
>   return(index_list)
> }
> index_clumps <- clump_adj_int(bar)
> # write another function to sum the values
> sum_subsets <- function(indices, vector)
>   return(sum(vector[indices], na.rm=TRUE))
> # now "apply" the function to the list of indices
> lapply(index_clumps, sum_subsets, foo)
>
> Jim
>
> On Thu, Jun 9, 2016 at 2:41 AM, John Logsdon
> <j.logs...@quantex-research.com> wrote:
>> Folks
>>
>> Is there any way to get the row index into apply as a variable?
>>
>> I want a function to do some sums on a small subset of some very long
>> vectors, rolling through the whole vectors.
>>
>> apply(X, 1, function(row) {do something}, other arguments)
>>
>> seems to be the way to do it.
>>
>> The subset I want is the most recent set of measurements only -
>> perhaps a couple of hundred out of millions - but I can't see how to
>> index each value. The ultimate output should be a matrix of results
>> the length of the input vector. But to do the sum I need to access
>> the current row number.
>>
>> It is easy in a loop but that will take ages. Is there any vectorised
>> apply-like solution to this?
>>
>> Or does apply etc. only operate on each row at a time, independently
>> of other rows?
>>
>> Best wishes
>>
>> John
>>
>> John Logsdon
>> Quantex Research Ltd
>> +44 161 445 4951/+44 7717758675

Best wishes

John

John Logsdon
Quantex Research Ltd
+44 161 445 4951/+44 7717758675
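As a side note not taken from the thread: the window-start lookup done by
dVals1 can also be computed in a single vectorised call with
findInterval(), avoiding the first apply() entirely (left.open = TRUE is
assumed here to reproduce the ">=" behaviour of the which.min comparison):

N <- 10
Z <- cbind(1:N, cumsum(rnorm(N, 1, 0.01)), rnorm(N, 1.2, 0.1))
# for each row, the first row whose time is >= (current time - 10)
start <- findInterval(Z[, 2] - 10, Z[, 2], left.open = TRUE) + 1
ZZ <- cbind(start, Z)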
[R] apply and cousins
Folks

Is there any way to get the row index into apply as a variable?

I want a function to do some sums on a small subset of some very long
vectors, rolling through the whole vectors.

apply(X, 1, function(row) {do something}, other arguments)

seems to be the way to do it.

The subset I want is the most recent set of measurements only - perhaps a
couple of hundred out of millions - but I can't see how to index each
value. The ultimate output should be a matrix of results the length of
the input vector. But to do the sum I need to access the current row
number.

It is easy in a loop but that will take ages. Is there any vectorised
apply-like solution to this?

Or does apply etc. only operate on each row at a time, independently of
other rows?

Best wishes

John

John Logsdon
Quantex Research Ltd
+44 161 445 4951/+44 7717758675
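One common answer, sketched here with illustrative toy data and window
size rather than anything from the post: iterate over row indices instead
of rows, so the current row number is available inside the function.

X <- cbind(time = cumsum(runif(1000)), value = rnorm(1000))
res <- t(vapply(seq_len(nrow(X)), function(i) {
  win <- X[max(1, i - 200):i, , drop = FALSE]   # the most recent ~200 rows
  c(row = i, winmean = mean(win[, "value"]))
}, numeric(2)))
# res is a 1000 x 2 matrix: the row index and a statistic over its window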
[R] Vectorised operations
Folks

I have some very long vectors - typically 1 million long - which are
indexed by another vector of the same length, with values from 1 to a few
thousand, so each sub-part of the vector may be a few hundred values
long.

I want to calculate the cumulative maximum of each sub-part of the main
vector, by the index, in an efficient manner. This can obviously be done
in a loop, but the whole calculation is embedded within many other
calculations, which would make everything very slow indeed. All the other
sums are vectorised already.

For example,

A = c(1,2,1, -3,5,6,7,4, 6,3,7,6,9, ...)
i = c(1,1,1, 2,2,2,2,2, 3,3,3,3,3, ...)

where A has three levels that are not the same, but the levels themselves
are all monotonic non-decreasing. I want the answer to be a vector of the
same length:

R = c(1,2,2, -3,5,6,7,7, 6,6,7,7,9, ...)

If I could reset the cumulative maximum to -1e6 (eg) at each change of
index, a simple cummax would do, but I can't see how to do this.

The best way I have found so far is to use the aggregate command:

as.vector(unlist(aggregate(A, list(i), cummax)[[2]]))

but rarely this fails, returning a shorter vector than expected, and it
seems rather ugly, converting to and from lists, which may well be an
unnecessary overhead.

I have been trying other approaches using apply() methods, but either it
can't be done using them or I can't get my head round them!

Any ideas?

Best wishes

John

John Logsdon
Quantex Research Ltd
+44 161 445 4951/+44 7717758675
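For what it's worth, a sketch of one vectorised approach (not from the
original post): ave() applies a function within each index group and
returns a vector in the original length and order, so the groupwise
cumulative maximum is a one-liner.

A <- c(1, 2, 1,  -3, 5, 6, 7, 4,  6, 3, 7, 6, 9)
i <- c(1, 1, 1,   2, 2, 2, 2, 2,  3, 3, 3, 3, 3)
R <- ave(A, i, FUN = cummax)   # cummax restarted at each change of index
R
#  1  2  2 -3  5  6  7  7  6  6  7  7  9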
Re: [R] R project and the TPP
Folks

TPP, and in a European context TTIP, are very dangerous not only to open
source software but to any public service, and no satisfactory response
has been forthcoming. There are ways of circumventing it, I guess, or
opposing it (maybe using ISPs in China, Russia or North Korea???).

The issue really should be reversed - how much open source coding has
found its way into closed source software? We do not know, because
proprietary coding is secret, and hence insecure. Perhaps a court could
rule that all software should be available for inspection by independent
experts. This possibility may be sufficient to shut TI etc up.

But this seems to have been put together in total secrecy and undermines
pretty nearly every 'freedom' people have fought for since at least King
John and probably others (not that English peasants enjoyed too much
freedom after 1215, as it was the barons who got it all!)

I really do not understand why legislators have done this unless
corruption has become so pervasive that there are no longer any good guys
and girls around (well, maybe Bernie Sanders and Jeremy Corbyn excepted,
but their chance of power is pretty slim at the moment despite Iowa).

In the UK we have a referendum on EU membership, which under ordinary
circumstances I would automatically support as very much a pro-Europe
person. But if TTIP is implemented, I don't know which way to vote. Of
course it is a total sham anyway, so maybe a bloody nose for the
legislators would not be a bad idea. And looking at the way the EU has
treated Greece, Cyprus, Ireland and Portugal, I don't hold out much hope
for an epiphany.

Anyway this is a bit OT. :)

> On 2/4/2016 6:59 PM, David Winsemius wrote:
>>> On Feb 4, 2016, at 3:15 PM, Rolf Turner <r.tur...@auckland.ac.nz>
>>> wrote:
>>>
>>> Quite a while ago I went to a talk (I think it may have been at an
>>> NZSA conference) given by the great Ross Ihaka. I forget the details,
>>> but my vague recollection was that it involved a technique for
>>> automatic choice of some sort of smoothing parameter involved in a
>>> graphical display.
>>
>> Identifying discontinuities:
>>
>> https://www.stat.auckland.ac.nz/~ihaka/downloads/Curves.pdf
>>
>> http://www.google.com/patents/US6704013
>>
>> TI can now own analytic geometry if they file enough patents.
>
> And TI could therefore under TPP demand that any Internet Service
> Provider remove any R content (or R-generated content) that they
> claimed (correctly or otherwise) infringed on their intellectual
> property, without a court order, and with common citizens having only
> slightly more ability to seek redress than the British peasants had
> when their nobility got King John of England to sign the Magna Carta
> on 15 June 1215?
>
> And, of course, this is only one concrete example.
>
> More relevant, TPP might prohibit any government from promoting the
> use of open-source software, because it could deprive a for-profit
> company of income, and they could therefore sue for lost profit under
> the Investor-State Dispute Settlement (ISDS) provisions of the TPP or
> other "free trade" agreements like NAFTA. This is hardly far fetched:
> last Dec. 21, the U.S. Congress decided that consumers in the U.S. did
> not have the right to know the origins of the meat they buy under
> NAFTA (Scott Smith, "Congress repeals country of origin labeling for
> meat", United Press International, Dec. 21, 2015 at 10:12 AM,
> http://www.upi.com/Top_News/US/2015/12/21/Congress-repeals-country-of-origin-labeling-for-meat/3241450709277/).
>
> Spencer Graves

Best wishes

John

John Logsdon
Quantex Research Ltd
+44 161 445 4951/+44 7717758675