Hello again,
turns out that I was able to nail down the problem to this :
Prior to the profiling, I have several packages and a lot of functions to
load, including the C code for the rankCR function in R.
If I load first the libraries and next the C functions
library(Hmisc)
library(copula)
library(igraph)
library(ape)
library(phytools)
library(pcaPP)
if(OS=="Darwin") {
dyn.load("NAC_package/JKendall.so")
dyn.load("NAC_package/Npsi.so")
dyn.load("NAC_package/rankC.so")
} else {
dyn.load("NAC_package/JKendall.dll")
dyn.load("NAC_package/Npsi.dll")
dyn.load("NAC_package/rankC.dll")
}
No problem occurs at all.
But if load the libraries after the C functions prior to the profiling,
like this
if(OS=="Darwin") {
dyn.load("NAC_package/JKendall.so")
dyn.load("NAC_package/Npsi.so")
dyn.load("NAC_package/rankC.so")
} else {
dyn.load("NAC_package/JKendall.dll")
dyn.load("NAC_package/Npsi.dll")
dyn.load("NAC_package/rankC.dll")
}
library(Hmisc)
library(copula)
library(igraph)
library(ape)
library(phytools)
library(pcaPP)
the problem with rankC occurs.
If I try to reload in the same R session the C functions and the
librairies, the problem disappears permanently. The reason is that the C
functions are reloaded once more, but the already loaded libraries are not
and thus the last thing loaded is the C functions, exactly as in the
previous described case.
Any thoughts why I should load the libraries first? I already tried to
recompile the C functions with different names to be sure there are not
somehow erased by the C functions in the libraries, but it didn't help.
On Thu, Jan 30, 2014 at 8:46 AM, Nathan Uyttendaele <[email protected]>wrote:
> Hello Jim, thanks for the reply!
>
> What I'm trying to do:
>
> I have a small function that makes use of many other functions, such as
> the one I called "estimate.NAC.structure.of()"
> I'm trying to make run everything faster. Thus I'm using Rprof to perform
> a line by line profiling to help me decide what lines of code I should
> replace. Mostly it's this estimate.NACstructure.of function that requires
> time. I could copy paste this function, but it has more than a 1000 lines
> of code...
>
> The line #33 and what's around is
>
> rankCR=function(a, b, c) {
> vvv=c(-1, -1, -1)
> dens=.C("rankC", as.integer(sample.size), as.double(vvv), as.double(a),
> as.double(b),as.double(c),
> result=as.double(rep(-1, sample.size*3)), ties=as.double(rep(-1,
> sample.size*3))) ################ line 33
> return(list(matrix(dens[["result"]], ncol=3, nrow=sample.size,
> byrow=TRUE), as.numeric(dens[["ties"]])))
> }
>
> This rankCR function is then used in a loop:
>
> pre.r <- rankCR(pseudo.samples[[a]], pseudo.samples[[b]],
> pseudo.samples[[c]])
>
> where pseudo.samples[[whatever]] is just a vector.
>
>
> And so the problem is that this rankCR function eats up more than 50% of
> the execution time according to the first profiling (more precisely, line
> 33 is eating up more than 50% of the execution time), then does not even
> show up in the top 30 in the second profiling... thus what I have to
> improve to make the code run faster is getting not that clear.
>
>
>
> On Wed, Jan 29, 2014 at 10:02 PM, jim holtman <[email protected]> wrote:
>
>> You should at least post the script so that we see what line 33 is.
>> For example, was it an input statement so that on the second time you
>> ran the data was cached in memory? Did you remove all the objects and
>> do a gc() to clean up memory before trying again (maybe there was some
>> data hanging around that helped out). It is not unknown to get
>> different results if you are having to access, especially, external
>> data.
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>>
>> On Wed, Jan 29, 2014 at 8:29 AM, Nathan Uyttendaele <[email protected]>
>> wrote:
>> > Hello,
>> >
>> > when I run this code in a brand new R session
>> >
>> > --------------------------------------------------
>> > ### loading of libraries and other functions
>> >
>> > Rprof("profiling.out")
>> > start.time=proc.time()[3]
>> > for(i in 1:50) {
>> > main.function()
>> > }
>> > end.time=proc.time()[3]
>> > Rprof()
>> > ---------------------------------------------------
>> >
>> > I get as result of the profiling
>> >
>> > estimateNACstructureof.R#33 56.6900000
>> >
>> > estimateNACstructureof.R#93 5.2200000
>> >
>> > estimateNACstructureof.R#87 4.9700000
>> >
>> > estimateNACstructureof.R#78 1.9400000
>> >
>> > oldFriedman.R#39 1.2600000
>> >
>> > estimateNACstructureof.R#81 1.2100000
>> >
>> > estimateNACstructureof.R#88 1.2000000
>> >
>> > estimateNACstructureof.R#657 0.9600000
>> >
>> > todecompose.R#34 0.6900000
>> >
>> > estimateNACstructureof.R#658 0.6600000
>> >
>> > estimateNACstructureof.R#264 0.6500000
>> >
>> > estimateNACstructureof.R#923 0.5900000
>> >
>> > aresameNACstructures.R#4 0.5600000
>> >
>> > estimateNACstructureof.R#1059 0.4800000
>> >
>> > estimateNACstructureof.R#1064 0.4700000
>> >
>> > estimateNACstructureof.R#22 0.4600000
>> >
>> > estimateNACstructureof.R#659 0.4500000
>> >
>> > nicenodenames.R#22 0.4400000
>> >
>> > estimateNACstructureof.R#1029 0.4200000
>> >
>> > todecompose.R#119 0.4200000
>> >
>> > estimateNACstructureof.R#1061 0.4000000
>> >
>> > estimateNACstructureof.R#654 0.3800000
>> >
>> > todecompose.R#12 0.3800000
>> >
>> > estimateNACstructureof.R#85 0.3700000
>> >
>> > estimateNACstructureof.R#89 0.3700000
>> >
>> > todecompose.R#19 0.3500000
>> >
>> > todecompose.R#160 0.3500000
>> >
>> > estimateNACstructureof.R#1060 0.3300000
>> >
>> > estimateNACstructureof.R#83 0.3100000
>> >
>> > todecompose.R#144 0.3100000
>> >
>> > sampling.time 207.1400000
>> >
>> > total.elapsed.time 235.9440000
>> >
>> > total.elapsed.time/samping.time*100 87.79202
>> >
>> >
>> > Except for the 3 last rows, what we have is the self.pct value in
>> > decreasing order, by line.
>> >
>> >
>> > What I do not understand is that if am to run the same code all over
>> again,
>> > I now get
>> >
>> >
>> > estimateNACstructureof.R#852 4.34000
>> >
>> > estimateNACstructureof.R#923 3.94000
>> >
>> > estimateNACstructureof.R#87 3.38000
>> >
>> > estimateNACstructureof.R#93 3.31000
>> >
>> > estimateNACstructureof.R#859 2.70000
>> >
>> > estimateNACstructureof.R#1029 2.27000
>> >
>> > estimateNACstructureof.R#657 1.65000
>> >
>> > estimateNACstructureof.R#933 1.51000
>> >
>> > estimateNACstructureof.R#78 1.49000
>> >
>> > estimateNACstructureof.R#721 1.49000
>> >
>> > todecompose.R#34 1.49000
>> >
>> > estimateNACstructureof.R#658 1.39000
>> >
>> > estimateNACstructureof.R#1064 1.26000
>> >
>> > estimateNACstructureof.R#1059 1.25000
>> >
>> > estimateNACstructureof.R#659 1.21000
>> >
>> > estimateNACstructureof.R#1061 1.20000
>> >
>> > aresameNACstructures.R#4 1.17000
>> >
>> > estimateNACstructureof.R#1060 1.09000
>> >
>> > todecompose.R#12 1.00000
>> >
>> > todecompose.R#145 0.98000
>> >
>> > estimateNACstructureof.R#723 0.97000
>> >
>> > getNACadjacencyof.R#10 0.97000
>> >
>> > estimateNACstructureof.R#709 0.96000
>> >
>> > oldFriedman.R#39 0.90000
>> >
>> > estimateNACstructureof.R#81 0.89000
>> >
>> > estimateNACstructureof.R#88 0.88000
>> >
>> > estimateNACstructureof.R#708 0.88000
>> >
>> > todecompose.R#119 0.86000
>> >
>> > todecompose.R#160 0.84000
>> >
>> > estimateNACstructureof.R#715 0.83000
>> >
>> > sampling.time 198.38000
>> >
>> > total.elapsed.time 203.56800
>> >
>> > total.elapsed.time /samping.time*100 97.45147
>> >
>> >
>> >
>> > And the line 33 of estimateNACstructureof, which previously was observed
>> > 56.69% of the sampling time is now... gone ! If I profile he code in the
>> > same R session again and again, line 33 of estimateNACstructureof seems
>> > never to appear again.
>> >
>> >
>> > In fact the line 33 of estimateNACstructureof appears only when Rprof is
>> > run for the first time in the R session.
>> >
>> >
>> > Note: this is maybe relevant, the line 33 of estimateNACstructureof
>> refers
>> > to C function.
>> >
>> >
>> > Any thoughts of what I get this peculiar result?
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > [email protected] mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>
>
[[alternative HTML version deleted]]
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.