Re: [R] Issue with gc() on Ubuntu 20.04

2023-08-27 Thread John Logsdon

On 27-08-2023 21:02, Ivan Krylov wrote:
> On Sun, 27 Aug 2023 19:54:23 +0100
> John Logsdon wrote:
>
>> Not so although it did lower the gc() time to 95.84%.
>>
>> This was on a 16 core Threadripper 1950X box so I was intending to
>> use library parallel but I tried it on my lowly windows box that is
>> years old and got it down to 88.07%.
>
> Does the Windows box have the same version of R on it?



Yes, they are both 4.3.1


>> The only thing I can think of is that there are quite a lot of cases
>> where a function is generated on the fly as in:
>>
>> eval(parse(t=paste("dprob <-
>> function(x,l,s){",dist.functions[2,][dist.functions[1,]==distn],"(x,l,s)}",sep="")))
>
> This isn't very idiomatic. If you need dprob to call the function named
> in dist.functions[2,][dist.functions[1,]==distn], wouldn't it be easier
> for R to assign that function straight to dprob?
>
> dprob <- get(dist.functions[2,][dist.functions[1,]==distn])
>
> This way, you avoid the need to parse the code, which is typically not
> the fastest part of a programming language.
>
> (Generally in R and other programming languages with recursive data
> structures, storing variable names in other variables is not very
> efficient. Why not put functions directly into a list?)
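
The list-of-functions idea might look roughly like the sketch below. The
table dens and its two entries are made up for illustration (the real
contents of dist.functions are not shown in this thread); the only point is
that the lookup returns a function directly, with no parse() or get().

```r
# Hypothetical lookup table: distribution name -> density taking (x, l, s).
# The names and entries are assumptions, not the contents of dist.functions.
dens <- list(
  normal   = function(x, l, s) dnorm(x, mean = l, sd = s),
  logistic = function(x, l, s) dlogis(x, location = l, scale = s)
)

distn <- "normal"
dprob <- dens[[distn]]   # plain list lookup, nothing is parsed or evaluated

dprob(0, 0, 1)           # same value as dnorm(0)
```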



Agreed but this statement and other similar ones are only assigned once 
in an outer loop.



> Rprof() samples the whole call stack. Can you find out which functions
> result in a call to gc()? I haven't experimented with a wide sample of
> R code, but I don't usually encounter gc() as a major entry in my
> Rprof() outputs.
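
One possible way to look for the callers is to read the raw profile and keep
only the sampled stacks that contain gc. This is a sketch, not a recipe: the
function work() is a made-up stand-in that calls gc() explicitly, and the
parsing assumes the default Rprof() output, where each line after the header
is one call stack of quoted function names.

```r
prof_file <- tempfile(fileext = ".out")

# Hypothetical workload that calls gc() explicitly on every iteration.
work <- function() {
  for (k in 1:20) {
    x <- numeric(1e5)
    gc()
  }
  invisible(NULL)
}

Rprof(prof_file, interval = 0.01)
work()
Rprof(NULL)

stacks    <- readLines(prof_file)[-1]          # drop the header line
gc_stacks <- grep('"gc"', stacks, fixed = TRUE, value = TRUE)

# Count how often each function appears in a stack that also contains gc().
funs    <- as.character(unlist(strsplit(gsub('"', "", gc_stacks), " ")))
callers <- sort(table(funs), decreasing = TRUE)
head(callers)
```

Functions that appear in nearly every gc stack are the ones worth
inspecting first.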


From the first table, removing all the system functions, it suggests 
that the function do.combx() is mainly guilty.  I have recoded that and 
gc() no longer appears - as it shouldn't with it switched off!  One 
difference was that the new code used the built in combn function while 
the old code used gtools::combinations.  I need gtools::permutations 
elsewhere but that is not time critical.
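
For reference, a small sketch of the base R call, without gtools so that it
runs anywhere. combn() returns one combination per column, so transposing
gives one combination per row. The values of n and r are made up, and the
FUN variant is only a suggestion for cases where the full matrix is not
needed.

```r
n <- 5
r <- 2

cmb <- t(combn(n, r))   # one combination per row, choose(n, r) rows
dim(cmb)

# If only a summary of each combination is needed, FUN avoids building
# the full matrix of combinations.
sums <- combn(n, r, FUN = sum)
```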


Thanks Ivan for making me think!

--
John Logsdon
Quantex Research Ltd
m:+447717758675/h:+441614454951

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Issue with gc() on Ubuntu 20.04

2023-08-27 Thread John Logsdon

Folks

I have come across an issue with gc() hogging the processor according to 
Rprof.


Platform is Ubuntu 20.04 all up to date
R version 4.3.1
libraries: survival, MASS, gtools and openxlsx.

With default gc.auto options, the profiler notes the garbage collector 
as self.pct 99.39%.


So I have tried switching it off using options(gc.auto=Inf) in the R 
session before running my program using source().


This lowered self.pct to 99.36%.  Not much there.

After some pondering, I added an options(gc.auto=Inf) at the beginning 
of each function, not resetting it at exit, but expecting the offending 
function(s) to plead guilty.


Not so although it did lower the gc() time to 95.84%.

This was on a 16 core Threadripper 1950X box so I was intending to use 
library parallel but I tried it on my lowly windows box that is years 
old and got it down to 88.07%.


The only thing I can think of is that there are quite a lot of cases 
where a function is generated on the fly as in:


eval(parse(t=paste("dprob <- 
function(x,l,s){",dist.functions[2,][dist.functions[1,]==distn],"(x,l,s)}",sep="")))


I haven't added the options to any of these.

The highest time used by any of my functions is 0.05% - the rest is 
dominated by gc().


There may not be much point in parallelising the code until I can reduce 
the garbage collection.


I am not short of memory and would like to disable it fully but despite 
adding to all routines, I haven't managed to do this yet.
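
One way to cross-check the profiler figure is gc.time(), which reports the
cumulative time R has spent in garbage collection. The sketch below is only
illustrative: the workload is a made-up stand-in, not the program discussed
here. Note also that options() accepts arbitrary names without complaint, so
options(gc.auto=Inf) running without error does not by itself show that the
setting is recognised.

```r
before <- gc.time()

# Hypothetical stand-in workload that allocates a fair amount of memory.
total <- system.time({
  x <- lapply(1:200, function(k) cumsum(rnorm(1e4)))
})

after <- gc.time()

# Element 3 is elapsed time spent in garbage collection.
gc_elapsed <- unname((after - before)[3])
c(gc = gc_elapsed, total = unname(total["elapsed"]))
```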


Can anyone advise me?

And why is the Linux version so much worse than Windows?

TIA

--
John Logsdon
Quantex Research Ltd
m:+447717758675/h:+441614454951



Re: [R] apply and cousins

2016-06-09 Thread John Logsdon
Thanks Jim and others (and sorry Jim - an early version of this slipped
into your inbox :))

Apologies for not giving some concrete code - I was trying to explain in
words.

What I need to do is to fit a simple linear model to successive sections
of a long matrix.

So far, the best solution I have come up with uses apply twice:

Generate some data in a 10*3 matrix:

N = 10
Z = cbind(1:N,cumsum(rnorm(N,1,0.01)),rnorm(N,1.2,0.1))

where the first column is an index, the second a monotonic increasing
value representing time and the third just the measurements I want to
process.

Then write a function dVals1:

dVals1 = function(Y,DD,dT){which.min((Y[2] - dT) > DD[,2])}

which will identify the first row where the time is not less than the
current time - dT.

So to identify the start of the data (say) 10 units before for each row,
we use apply and prepend this as a column to the array for later use:

ZZ = cbind(apply(Z,1,dVals1,Z,10),Z)

There may be some cases, particularly at the start, where later values are
extracted because the minimum returned by which.min is 1.
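
Because the times are increasing, the same start pointers can probably be
found without apply. The sketch below uses findInterval() with left.open =
TRUE (available in reasonably recent versions of R) and checks it against
the which.min approach. The names tt, start_apply and start_fi are made up
for this example.

```r
set.seed(42)
N  <- 1000
tt <- cumsum(rnorm(N, 1, 0.01))   # increasing times, like column 2 of Z
dT <- 10

# Row-by-row version, same logic as dVals1.
start_apply <- sapply(tt, function(ti) which.min((ti - dT) > tt))

# Vectorised version: count the times strictly below each threshold,
# then step to the next row.
start_fi <- findInterval(tt - dT, tt, left.open = TRUE) + 1L

stopifnot(all(start_apply == start_fi))
```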

I now have start and finish pointers for each position so can proceed to
fit a simple linear model with the following function:

dVals2=function(D2,DD){
  if((D2[2]-D2[1])<10){return(rep(0,2))} # reject short examples
  DX=DD[D2[1]:D2[2],]
  Res=as.vector(lm(DX[,3]~DX[,2])$coefficients)
  return(Res)
}

which returns two zeros if there are fewer than 10 values; otherwise it
returns the intercept and slope calculated over the specified range.

Applying this to the whole data by:

t(apply(ZZ,1,dVals2,DD=Z))

(passing Z as DD so that columns 2 and 3 inside dVals2 are the time and the
measurements) does the job, I think, returning the results as an N * 2 matrix.
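
For long series, the second pass can in principle be replaced by cumulative
sums, using the closed-form least-squares slope and intercept. The following
is a sketch under assumptions, not the method used above: the names wsum, s
and e are made up, the minimum window size is not enforced, and subtracting
large cumulative sums can lose precision on very long or badly scaled data.

```r
set.seed(1)
N <- 200
x <- cumsum(rnorm(N, 1, 0.01))   # time
y <- rnorm(N, 1.2, 0.1)          # measurements

e <- 1:N                                              # window end (current row)
s <- sapply(x, function(ti) which.min((ti - 10) > x)) # window start, as in dVals1

# Sum of v over rows s[i]..e[i] for every i, via one cumulative sum.
wsum <- function(v) {
  cv <- c(0, cumsum(v))
  cv[e + 1] - cv[s]
}

n   <- e - s + 1
sx  <- wsum(x)
sy  <- wsum(y)
sxx <- wsum(x * x)
sxy <- wsum(x * y)

slope     <- (n * sxy - sx * sy) / (n * sxx - sx^2)   # NaN when n == 1
intercept <- (sy - slope * sx) / n

# Spot check one window against lm().
i   <- 150
fit <- unname(coef(lm(y[s[i]:e[i]] ~ x[s[i]:e[i]])))
all.equal(c(intercept[i], slope[i]), fit)
```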

> Hi John,
> With due respect to the other respondents, here is something that might
> help:
>
> # get a vector of values
> foo<-rnorm(100)
> # get a vector of increasing indices (aka your "recent" values)
> bar<-sort(sample(1:100,40))
> # write a function to "clump" the adjacent index values
> clump_adj_int<-function(x) {
>  index_list<-list(x[1])
>  list_index<-1
>  for(i in 2:length(x)) {
>   if(x[i]==x[i-1]+1)
>    index_list[[list_index]]<-c(index_list[[list_index]],x[i])
>   else {
>    list_index<-list_index+1
>    index_list[[list_index]]<-x[i]
>   }
>  }
>  return(index_list)
> }
> index_clumps<-clump_adj_int(bar)
> # write another function to sum the values
> sum_subsets<-function(indices,vector)
> return(sum(vector[indices],na.rm=TRUE))
> # now "apply" the function to the list of indices
> lapply(index_clumps,sum_subsets,foo)
>
> Jim
>
>
> On Thu, Jun 9, 2016 at 2:41 AM, John Logsdon
> <j.logs...@quantex-research.com> wrote:
>> Folks
>>
>> Is there any way to get the row index into apply as a variable?
>>
>> I want a function to do some sums on a small subset of some very long
>> vectors, rolling through the whole vectors.
>>
>> apply(X,1,function {do something}, other arguments)
>>
>> seems to be the way to do it.
>>
>> The subset I want is the most recent set of measurements only - perhaps a
>> couple of hundred out of millions - but I can't see how to index each
>> value.  The ultimate output should be a matrix of results the length of
>> the input vector.  But to do the sum I need to access the current row
>> number.
>>
>> It is easy in a loop but that will take ages. Is there any vectorised
>> apply-like solution to this?
>>
>> Or does apply etc only operate on each row at a time, independently of
>> other rows?
>>
>>
>> Best wishes
>>
>> John
>>
>> John Logsdon
>> Quantex Research Ltd
>> +44 161 445 4951/+44 7717758675
>>
>


Best wishes

John

John Logsdon
Quantex Research Ltd
+44 161 445 4951/+44 7717758675





[R] apply and cousins

2016-06-08 Thread John Logsdon
Folks

Is there any way to get the row index into apply as a variable?

I want a function to do some sums on a small subset of some very long
vectors, rolling through the whole vectors.

apply(X,1,function {do something}, other arguments)

seems to be the way to do it.

The subset I want is the most recent set of measurements only - perhaps a
couple of hundred out of millions - but I can't see how to index each
value.  The ultimate output should be a matrix of results the length of
the input vector.  But to do the sum I need to access the current row
number.

It is easy in a loop but that will take ages. Is there any vectorised
apply-like solution to this?

Or does apply etc only operate on each row at a time, independently of
other rows?
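
A common way to make the row number available is to iterate over the row
indices rather than over the rows themselves. The sketch below assumes a
hypothetical window length w and a small made-up matrix X; because the whole
matrix is visible inside the function, each step can look at any rows it
needs.

```r
X <- matrix(as.numeric(1:12), nrow = 4)   # made-up data, 4 rows by 3 columns
w <- 2                                    # hypothetical window length

res <- vapply(seq_len(nrow(X)), function(i) {
  rows <- max(1, i - w + 1):i             # the most recent w rows up to row i
  sum(X[rows, 2])
}, numeric(1))

res
```

This is still a loop under the hood, so for millions of rows a cumulative-sum
formulation is likely to be faster where the calculation allows it.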


Best wishes

John

John Logsdon
Quantex Research Ltd
+44 161 445 4951/+44 7717758675



[R] Vectorised operations

2016-05-18 Thread John Logsdon
Folks

I have some very long vectors - typically 1 million long - which are
indexed by another vector, same length, with values from 1 to a few
thousand, so each sub part of the vector may be a few hundred values long.

I want to calculate the cumulative maximum of each sub part of the main
vector by the index in an efficient manner.  This can obviously be done in
a loop but the whole calculation is embedded within many other
calculations which would make everything very slow indeed.  All the other
sums are vectorised already.

For example,

A=c(1,2,1,  -3,5,6,7,4,  6,3,7,6,9, ...)
i=c(1,1,1,   2,2,2,2,2,  3,3,3,3,3, ...)

where the three sub parts of A shown are not all the same length, and the
index levels themselves are monotonic non-decreasing.

The answer should be a vector of the same length:

R=c(1,2,2,  -3,5,6,7,7,  6,6,7,7,9, ...)

If I could reset the cumulative maximum to -1e6 (eg) at each change of
index, a simple cummax would do but I can't see how to do this.
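
Since the index is non-decreasing, one way to get the effect of a reset is
to shift each group by a large enough offset before a single cummax, then
shift back. This is a sketch on the complete values shown in the example
only; K is a made-up name, and with non-integer data the shift may introduce
small rounding errors.

```r
A <- c(1, 2, 1,  -3, 5, 6, 7, 4,  6, 3, 7, 6, 9)
i <- c(1, 1, 1,   2, 2, 2, 2, 2,  3, 3, 3, 3, 3)

K <- diff(range(A)) + 1        # offset larger than the spread of A
R <- cummax(A + i * K) - i * K # every later group sits above all earlier ones

R
```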

The best way I have found so far is to use the aggregate command:

as.vector(unlist(aggregate(A,list(i),cummax)[[2]]))

but occasionally this fails, returning a shorter vector than expected, and
it seems rather ugly, converting to and from lists, which may well be an
unnecessary overhead.
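
A base R alternative that keeps the result aligned with the input is ave(),
which applies a function within each group and returns a vector of the same
length. The sketch below uses only the complete values from the example.

```r
A <- c(1, 2, 1,  -3, 5, 6, 7, 4,  6, 3, 7, 6, 9)
i <- c(1, 1, 1,   2, 2, 2, 2, 2,  3, 3, 3, 3, 3)

R <- ave(A, i, FUN = cummax)   # cumulative maximum within each level of i
R
```

Note that FUN has to be passed by name, because unnamed arguments after the
first are treated as grouping variables.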

I have been trying other approaches using apply() methods but either it
can't be done using them or I can't get my head round them!

Any ideas?

Best wishes

John

John Logsdon
Quantex Research Ltd
+44 161 445 4951/+44 7717758675



Re: [R] R project and the TPP

2016-02-05 Thread John Logsdon
Folks

TPP, and in a European context, TTIP are very dangerous not only to open
source software but to any public service and no satisfactory response has
been forthcoming.

There are ways of circumventing it I guess or opposing it (maybe using
ISPs in China, Russia or North Korea???).  The issue really should be
reversed - how much open source coding has found its way into closed
source software?  We do not know because proprietary coding is secret, and
hence insecure.  Perhaps a court could rule that all software should be
available for inspection by independent experts.  This possibility may be
sufficient to shut TI etc up.

But this seems to have been put together in total secrecy and undermines
pretty nearly every 'freedom' people have fought for since at least King
John and probably others (not that English peasants enjoyed too much
freedom after 1215 as it was the barons who got it all!)

I really do not understand why legislators have done this unless
corruption has become so pervasive that there are no longer any good guys
and girls around (well, maybe Bernie Sanders and Jeremy Corbyn excepted
but their chance of power is pretty slim at the moment despite Iowa).

In the UK we have a referendum on EU membership which under ordinary
circumstances I would automatically support as very much a pro-Europe
person.  But if TTIP is implemented, I don't know which way to vote.  Of
course it is a total sham anyway, so maybe a bloody nose for the
legislators would not be a bad idea.  And looking at the way the EU has
treated Greece, Cyprus, Ireland, Portugal, I don't hold out much hope for
an epiphany.

Anyway this is a bit OT. :)


>
>
> On 2/4/2016 6:59 PM, David Winsemius wrote:
>>> On Feb 4, 2016, at 3:15 PM, Rolf Turner <r.tur...@auckland.ac.nz>
>>> wrote:
>>>
>>>
>>>
>>> Quite a while ago I went to talk (I think it may have been at an NZSA
>>> conference) given by the great Ross Ihaka.  I forget the details but my
>>> vague recollection was that it involved a technique for automatic
>>> choice of some sort of smoothing parameter involved in a graphical
>>> display.
>> Identifying discontinuities:
>>
>> https://www.stat.auckland.ac.nz/~ihaka/downloads/Curves.pdf
>>
>> http://www.google.com/patents/US6704013
>>
>> TI can now own analytic geometry if they file enough patents.
>
>And TI could therefore under TPP demand that any Internet Service
> Provider remove any R content (or R generated content) that they claimed
> (correctly or otherwise) infringed on their intellectual property,
> without a court order, and with common citizens having only slightly
> more ability to seek redress than the British peasants had when their
> nobility got King John of England to sign the Magna Carta on 15 June 1215?
>
>
>And, of course, this is only one concrete example.
>
>
>More relevant, TPP might prohibit any government from promoting
> the use of open-source software, because it could deprive a for-profit
> company of income, and they could therefore sue for lost profit under
> the Investor-State Dispute Settlement (ISDS) provisions of
> the TPP or other "free trade" agreements like NAFTA.  This is hardly far
> fetched:  Last Dec. 21, the U.S. Congress decided that consumers in the
> U.S. did not have the right to know the origins of the meat they buy
> under NAFTA (Scott Smith, "Congress repeals country of origin labeling
> for meat", United Press International, Dec. 21, 2015 at 10:12 AM,
> http://www.upi.com/Top_News/US/2015/12/21/Congress-repeals-country-of-origin-labeling-for-meat/3241450709277/).
>
>
>
>Spencer Graves
>
>


Best wishes

John

John Logsdon
Quantex Research Ltd
+44 161 445 4951/+44 7717758675
