Re: [R] Quicker way of combining vectors into a data.frame

2006-12-01 Thread Gavin Simpson
[ Resending to the list as I fell foul of the too many recipients rule ]

On Thu, 2006-11-30 at 11:34 -0600, Marc Schwartz wrote:

Thanks to Marc, Prof. Ripley, Sebastian and Sebastian (Luque - offline)
for your comments and suggestions.

I noticed that two of the vectors were named and so I removed the names
(names(vec) <- NULL) and that pushed the execution time for the function
from c. 40 seconds to c. 115 seconds and all the time was taken within
the data.frame(...) call. So having names *on* some of the vectors
seemed to help things along, which was the opposite of what I had
expected.

If I use the cbind method of Marc, then the execution time for the
function drops to c. 1 second (most of which is in the calculation of
one of the vectors). So I guess I can work round this now.

What I find interesting is that:

test.dat <- rnorm(4471)
> system.time(z <- data.frame(col1 = test.dat, col2 = test.dat, col3 = test.dat,
+ col4 = test.dat, col5 = test.dat, col6 = test.dat, col7 = test.dat,
+ col8 = test.dat, col9 = test.dat, col10 = test.dat))
[1] 0.008 0.000 0.007 0.000 0.000

Whereas doing exactly the same thing with different data in the function
gives the following timings:

system.time(fab <- data.frame(lc.ratio, Q,
+  fNupt,
+  rho.n, rho.s,
+  net.Nimm,
+  net.Nden,
+  CLminN,
+  CLmaxN,
+  CLmaxS))
[1] 173.415   0.260 192.192   0.000   0.000

Most of that was without a change in memory, but towards the end for c.
5 seconds memory use by R increased by 200-300 MB.

and...

> system.time(fab <- data.frame(lc.ratio = lc.ratio, Q = Q,
+  fNupt = fNupt,
+  rho.n = rho.n, rho.s = rho.s,
+  net.Nimm = net.Nimm,
+  net.Nden = net.Nden,
+  CLminN = CLminN,
+  CLmaxN = CLmaxN,
+  CLmaxS = CLmaxS))
[1]  99.966   0.140 114.091   0.000   0.000

Again with a slight increase in memory usage in last 5 seconds. So now,
having stripped the names of two of the vectors (so now all are
un-named), the un-named version of the data.frame call is almost twice
as slow as the named data.frame call.

If I leave the names on the two vectors that had them, I get the
following timings for those same calls

> system.time(fab <- data.frame(lc.ratio, Q,
+  fNupt,
+  rho.n, rho.s,
+  net.Nimm,
+  net.Nden,
+  CLminN,
+  CLmaxN,
+  CLmaxS))
[1]  96.234   0.244 101.706   0.000   0.000

> system.time(fab <- data.frame(lc.ratio = lc.ratio, Q = Q,
+  fNupt = fNupt,
+  rho.n = rho.n, rho.s = rho.s,
+  net.Nimm = net.Nimm,
+  net.Nden = net.Nden,
+  CLminN = CLminN,
+  CLmaxN = CLmaxN,
+  CLmaxS = CLmaxS))
[1] 13.597  0.088 15.868  0.000  0.000

So having the 2 named vectors and using the named version of the
data.frame call is the fastest combination.

This is all done within the debugger at the time when I would be
generating fab, and if I do,

system.time(z <- data.frame(col1 = test.dat, col2 = test.dat, col3 = test.dat,
+ col4 = test.dat, col5 = test.dat, col6 = test.dat, col7 = test.dat,
+ col8 = test.dat, col9 = test.dat, col10 = test.dat))
[1] 0.008 0.000 0.007 0.000 0.000

(as above) at this point in the debugger it is exceedingly quick.

I just don't understand what is going on with data.frame.

I have yet to try Prof. Ripley's suggestion of being a bit naughty with
R - I'll see if that is any quicker.

Once again, thanks to you all for your suggestions.

All the best,

G

> Gavin,
>
> One more note, which is that even timing the direct data frame creation
> on my system with colnames, again using the same 10 numeric columns, I
> get:
>
> > system.time(DF1 <- data.frame(lc.ratio = Col1, Q = Col2, fNupt = Col3,
>                 rho.n = Col4, rho.s = Col5,
>                 net.Nimm = Col6, net.Nden = Col7,
>                 CLminN = Col8, CLmaxN = Col9,
>                 CLmaxS = Col10))
> [1] 0.012 0.000 0.028 0.000 0.000
>
> > str(DF1)
> 'data.frame':   4471 obs. of  10 variables:
>  $ lc.ratio: num   0.1423  0.1873 -1.8129  0.0255 -1.7650 ...
>  $ Q       : num   0.8340 -0.2387 -0.0864 -1.1184 -0.3368 ...
>  $ fNupt   : num  -0.1718 -0.0549  1.5194 -1.6127 -1.2019 ...
>  $ rho.n   : num  -0.740  0.240  0.522 -1.492  1.003 ...
>  $ rho.s   : num  -0.2363 -1.6248 -0.3045  0.0294  0.1240 ...
>  $ net.Nimm: num  -0.774  0.947 -1.098  0.809  1.216 ...
>  $ net.Nden: num  -0.198 -0.135 -0.300 -0.618 -0.784 ...
>  $ CLminN  : num   0.924 -3.265  0.211  0.813  0.262 ...
>  $ CLmaxN  : num   0.3212 -0.0502 -0.9978  0.9005 -1.6535 ...
>  $ CLmaxS  : num

Re: [R] Quicker way of combining vectors into a data.frame

2006-12-01 Thread Peter Dalgaard
Gavin Simpson wrote:
> I just don't understand what is going on with data.frame.

I think there is something about the data you're not telling us...

Could you e.g. do something like

str(data.frame(lc.ratio, Q,
  fNupt,
  rho.n, rho.s,
  net.Nimm,
  net.Nden,
  CLminN,
  CLmaxN,
  CLmaxS))


and

str(list(lc.ratio, Q,
  fNupt,
  rho.n, rho.s,
  net.Nimm,
  net.Nden,
  CLminN,
  CLmaxN,
  CLmaxS))





-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Quicker way of combining vectors into a data.frame

2006-12-01 Thread Gavin Simpson
On Fri, 2006-12-01 at 12:13 +0100, Peter Dalgaard wrote:
> Gavin Simpson wrote:
<snip />
> > I just don't understand what is going on with data.frame.
>
> I think there is something about the data you're not telling us...

Yes, that I was doing something very, very silly that I thought would
work (produce a vector CLmaxN of the required length), but was in fact
blowing out to a huge named list. It was this that was causing the
massive increase in computation time in data.frame over cbind.
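
For readers who hit the same symptom, here is a minimal sketch of how a list slipping in among the vectors changes what data.frame() builds. It assumes, hypothetically, that the offending object is a named list of length-one elements; the actual structure of CLmaxN is not shown in this thread, and the names good, bad, df_bad and df_ok are made up for illustration.

```r
set.seed(1)
n <- 5                                  # tiny stand-in for the 4471 values in the thread
good <- rnorm(n)                        # an ordinary numeric vector
bad  <- as.list(rnorm(n))               # hypothetical: a list of n length-one elements
names(bad) <- paste0("v", seq_len(n))

is.atomic(good)   # TRUE
is.atomic(bad)    # FALSE; str(bad) would report a list, not a num vector

# Each list element is treated as a separate column, so the result has
# 1 + n columns instead of 2.
df_bad <- data.frame(x = good, bad)
dim(df_bad)

# Flattening the list first gives an ordinary two-column data frame.
df_ok <- data.frame(x = good, y = unlist(bad, use.names = FALSE))
dim(df_ok)
```

With thousands of elements, the list version asks data.frame() to create, name and check thousands of columns, which is one plausible reason for timings like those reported earlier in the thread.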

After correcting my mistake, timings for data.frame are:

system.time(fab <- data.frame(lc.ratio, Q,
+  fNupt,
+  rho.n, rho.s,
+  net.Nimm,
+  net.Nden,
+  CLminN,
+  CLmaxN,
+  CLmaxS))
[1] 0.012 0.000 0.011 0.000 0.000
Browse[1]> system.time(fab <- data.frame(lc.ratio = lc.ratio, Q = Q,
+  fNupt = fNupt,
+  rho.n = rho.n, rho.s = rho.s,
+  net.Nimm = net.Nimm,
+  net.Nden = net.Nden,
+  CLminN = CLminN,
+  CLmaxN = CLmaxN,
+  CLmaxS = CLmaxS))
[1] 0.008 0.000 0.018 0.000 0.000

One vector has names for some reason; removing them brings the un-named
data.frame version down to the named version's timing and makes no
difference to the named version:

Browse[1]> names(CLmaxS) <- NULL
Browse[1]> system.time(fab <- data.frame(lc.ratio, Q,
+  fNupt,
+  rho.n, rho.s,
+  net.Nimm,
+  net.Nden,
+  CLminN,
+  CLmaxN,
+  CLmaxS))
[1] 0.008 0.000 0.016 0.000 0.000
Browse[1]> system.time(fab <- data.frame(lc.ratio = lc.ratio, Q = Q,
+  fNupt = fNupt,
+  rho.n = rho.n, rho.s = rho.s,
+  net.Nimm = net.Nimm,
+  net.Nden = net.Nden,
+  CLminN = CLminN,
+  CLmaxN = CLmaxN,
+  CLmaxS = CLmaxS))
[1] 0.008 0.000 0.009 0.000 0.000

Apologies to the list for bothering you all with my stupidity and thank
you again to everyone who replied - I knew it was I who was doing
something wrong, but couldn't see it and thanks to your comments,
suggestions and queries I was able to work out what that was.

All the best,

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC & ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] Quicker way of combining vectors into a data.frame

2006-12-01 Thread Marc Schwartz
On Thu, 2006-11-30 at 19:26 +0000, Gavin Simpson wrote:
 
> I just don't understand what is going on with data.frame.
>
> I have yet to try Prof. Ripley's suggestion of being a bit naughty with
> R - I'll see if that is any quicker.
>
> Once again, thanks to you all for your suggestions.

Gavin,

Can you post the results of:

  str(fab)

and

  str(lc.ratio)
  str(Q)
  str(fNupt)
  str(rho.n)
  str(rho.s)
  str(net.Nimm)
  str(net.Nden)
  str(CLminN)
  str(CLmaxN)
  str(CLmaxS)

This is taking way too long. There is either something about one or more
of these objects that is more complex than just being simple vectors, or
there is something corrupt in your R session/environment.
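
A compact way to run that kind of check over all the objects at once might look like the sketch below. The helper name check_columns and the example objects are hypothetical and not from the thread; the idea is simply to tabulate class, atomicity, length and the presence of names for each candidate column.

```r
# Hypothetical helper: summarise a named list of candidate columns.
check_columns <- function(cols) {
  stopifnot(is.list(cols), !is.null(names(cols)))
  data.frame(
    name      = names(cols),
    class     = vapply(cols, function(x) class(x)[1L], character(1)),
    atomic    = vapply(cols, is.atomic, logical(1)),
    length    = lengths(cols),
    has_names = vapply(cols, function(x) !is.null(names(x)), logical(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up example objects standing in for the vectors in the function.
report <- check_columns(list(
  ok      = rnorm(4),
  named   = c(a = 1, b = 2, c = 3, d = 4),
  suspect = as.list(1:4)
))

# Anything that is not atomic deserves a closer look with str().
report[!report$atomic, ]
```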

You might want to try running a new and clean R session using:

  R --vanilla

and then re-run your code to see if that changes anything.  If so, it
suggests that my latter idea may be in play.

HTH,

Marc



Re: [R] Quicker way of combining vectors into a data.frame

2006-11-30 Thread Marc Schwartz
On Thu, 2006-11-30 at 17:00 +0000, Gavin Simpson wrote:
> Hi,
>
> In a function, I compute 10 (un-named) vectors of reasonable length
> (4471 in the particular example I have to hand) that I want to combine
> into a data frame object, that the function will return.
>
> This is very slow, so *I'm* doing something wrong if I want it to be
> quick and efficient, though I'm not sure what the best way to do this
> would be.
>
> I know it is the combining into data frame bit that is slow, because
> I've Rprof'ed it:
>
> $by.self
>                       self.time self.pct total.time total.pct
> names<-.default           16.58     52.8      16.58      52.8
> unlist                     7.22     23.0       7.26      23.1
> data.frame                 1.72      5.5      29.38      93.6
> duplicated.default         1.66      5.3       1.66       5.3
> +                          1.20      3.8       1.20       3.8
> list                       0.40      1.3       0.40       1.3
> as.data.frame.numeric      0.28      0.9       3.32      10.6
> apply                      0.26      0.8       1.70       5.4
> pmatch                     0.22      0.7       0.22       0.7
> paste                      0.20      0.6       0.90       2.9
> deparse                    0.14      0.4       0.70       2.2
> eval                       0.12      0.4      31.28      99.7
> names<-                    0.12      0.4      16.70      53.2
> FUN                        0.12      0.4       1.32       4.2
> names                      0.12      0.4       0.14       0.4
> as.list.default            0.12      0.4       0.12       0.4
> duplicated                 0.10      0.3       1.76       5.6
> gc                         0.10      0.3       0.10       0.3
 
> And I stepped through it under debug() and all the calculations before
> are quick, and then this bit takes a little over 20 seconds to complete
>
>  fab <- data.frame(lc.ratio = lc.ratio, Q = Q,
>                    fNupt = fNupt,
>                    rho.n = rho.n, rho.s = rho.s,
>                    net.Nimm = net.Nimm,
>                    net.Nden = net.Nden,
>                    CLminN = CLminN,
>                    CLmaxN = CLmaxN,
>                    CLmaxS = CLmaxS)
>
> I can get it down to c. 5 seconds if I do (not Rprof'ed):
>
>  fab <- data.frame(lc.ratio, Q,
>                    fNupt,
>                    rho.n, rho.s,
>                    net.Nimm,
>                    net.Nden,
>                    CLminN,
>                    CLmaxN,
>                    CLmaxS)
>
> But this still seems quite a long time, so I'm thinking that there must
> be a quicker way of doing what I want (end up with a data.frame with the 10
> vectors in it).
>
> Can anyone enlighten me?

I infer from the above that the 10 columns are all numeric, as there
seems to be time spent in the column naming process (the lack of
which speeds up your second example), as well as in
as.data.frame.numeric() and related activities.

It is not clear, if this is correct, why you want a dataframe as opposed
to a numeric matrix, but in either case:

If we have 10 vectors, named Colx, where x is 1:10 and each vector is:

> str(Col1)
 num [1:4471]  0.1423  0.1873 -1.8129  0.0255 -1.7650 ...

Then:

> system.time(Mat <- cbind(Col1, Col2, Col3, Col4, Col5, Col6, Col7,
                           Col8, Col9, Col10))
[1] 0.002 0.000 0.001 0.000 0.000


Or:

> system.time(DF <- as.data.frame(cbind(Col1, Col2, Col3, Col4, Col5,
                Col6, Col7, Col8, Col9, Col10)))
[1] 0.005 0.000 0.005 0.000 0.000


You can then add colnames() subsequent to the cbind()ing:

> system.time(colnames(Mat) <- c("lc.ratio", "Q", "fNupt", "rho.n",
                "rho.s", "net.Nimm", "net.Nden",
                "CLminN", "CLmaxN", "CLmaxS"))
[1] 0.002 0.000 0.001 0.000 0.000
 

> system.time(colnames(DF) <- c("lc.ratio", "Q", "fNupt", "rho.n",
                "rho.s", "net.Nimm", "net.Nden",
                "CLminN", "CLmaxN", "CLmaxS"))
[1] 0.011 0.000 0.020 0.000 0.000



> str(Mat)
 num [1:4471, 1:10]  0.1423  0.1873 -1.8129  0.0255 -1.7650 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:10] "lc.ratio" "Q" "fNupt" "rho.n" ...

> str(DF)
'data.frame':   4471 obs. of  10 variables:
 $ lc.ratio: num   0.1423  0.1873 -1.8129  0.0255 -1.7650 ...
 $ Q   : num   0.8340 -0.2387 -0.0864 -1.1184 -0.3368 ...
 $ fNupt   : num  -0.1718 -0.0549  1.5194 -1.6127 -1.2019 ...
 $ rho.n   : num  -0.740  0.240  0.522 -1.492  1.003 ...
 $ rho.s   : num  -0.2363 -1.6248 -0.3045  0.0294  0.1240 ...
 $ net.Nimm: num  -0.774  0.947 -1.098  0.809  1.216 ...
 $ net.Nden: num  -0.198 -0.135 -0.300 -0.618 -0.784 ...
 $ CLminN  : num   0.924 -3.265  0.211  0.813  0.262 ...
 $ CLmaxN  : num   0.3212 -0.0502 -0.9978  0.9005 -1.6535 ...
 $ CLmaxS  : num  -0.520  0.278 -0.546 -0.925  1.507 ...
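
One caveat with the cbind() route, which is why the all-numeric assumption matters: cbind() builds a matrix, and a matrix holds a single type, so mixed inputs are coerced. The short sketch below uses made-up vectors (num, chr) purely to illustrate that behaviour.

```r
num <- c(1.5, 2.5, 3.5)
chr <- c("a", "b", "c")

# cbind() returns a character matrix here, so the numbers become strings.
m <- cbind(num, chr)
typeof(m)

# data.frame() keeps each column's own type.
df <- data.frame(num, chr)
sapply(df, class)
```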


HTH,

Marc Schwartz


Re: [R] Quicker way of combining vectors into a data.frame

2006-11-30 Thread Prof Brian Ripley
If you are prepared to give up most of the sanity checks, see this at the 
bottom of read.table:

 ## this is extremely underhanded
 ## we should use the constructor function ...
 ## don't try this at home kids

 class(data) <- "data.frame"
 row.names(data) <- row.names
 data

So create a (named?) list with your vectors in it, assign class
"data.frame" and then row.names(data) <- NULL
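
A minimal sketch of that shortcut, under the assumption that every element of the list is an atomic vector of the same length. The helper name fast_df is hypothetical, and the row names are set directly as an attribute rather than through row.names<-, so this is a variation on the idea rather than the code in read.table. It skips most of the checks data.frame() performs, so the caveat above applies in full.

```r
# Hypothetical helper: build a data frame from a named list of equal-length
# atomic vectors without calling data.frame().
fast_df <- function(cols) {
  stopifnot(is.list(cols), length(cols) > 0L, !is.null(names(cols)))
  n <- length(cols[[1L]])
  stopifnot(all(vapply(cols, is.atomic, logical(1))),
            all(lengths(cols) == n))
  attr(cols, "row.names") <- seq_len(n)   # automatic row names 1..n
  class(cols) <- "data.frame"
  cols
}

x <- rnorm(10)
fab <- fast_df(list(a = x, b = x * 2))
str(fab)
```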

On Thu, 30 Nov 2006, Gavin Simpson wrote:

> > version
>                _
> platform       i686-pc-linux-gnu
> arch           i686
> os             linux-gnu
> system         i686, linux-gnu
> status         Patched
> major          2
> minor          4.0
> year           2006
> month          10
> day            03
> svn rev        39576
> language       R
> version.string R version 2.4.0 Patched (2006-10-03 r39576)
>
> > sessionInfo()
> R version 2.4.0 Patched (2006-10-03 r39576)
> i686-pc-linux-gnu
>
> locale:
> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] methods   stats     graphics  grDevices utils     datasets
> [7] base
>
> Thanks in advance,
>
> G


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] Quicker way of combining vectors into a data.frame

2006-11-30 Thread Sebastian Weber
Hi!

I don't know for sure - and I have not tried it yet, but how about
allocating a matrix which will hold all stuff, then put all vectors in
it and at last assign some dimnames to it:

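data <- matrix(0, ncol=5, nrow=length(vec1))
data[,1] <- vec1   # one vector per column
...
# row names left as NULL; the second element holds the column names
dimnames(data) <- list(NULL, c("1","2","3","4","5"))

as.data.frame(data)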

I forgot, I of course assume all of your vectors to be numeric ...

Hope that helps!

Greetings,

Sebastian


Re: [R] Quicker way of combining vectors into a data.frame

2006-11-30 Thread Marc Schwartz
Gavin,

One more note, which is that even timing the direct data frame creation
on my system with colnames, again using the same 10 numeric columns, I
get:

> system.time(DF1 <- data.frame(lc.ratio = Col1, Q = Col2, fNupt = Col3,
                rho.n = Col4, rho.s = Col5,
                net.Nimm = Col6, net.Nden = Col7,
                CLminN = Col8, CLmaxN = Col9,
                CLmaxS = Col10))
[1] 0.012 0.000 0.028 0.000 0.000


> str(DF1)
'data.frame':   4471 obs. of  10 variables:
 $ lc.ratio: num   0.1423  0.1873 -1.8129  0.0255 -1.7650 ...
 $ Q   : num   0.8340 -0.2387 -0.0864 -1.1184 -0.3368 ...
 $ fNupt   : num  -0.1718 -0.0549  1.5194 -1.6127 -1.2019 ...
 $ rho.n   : num  -0.740  0.240  0.522 -1.492  1.003 ...
 $ rho.s   : num  -0.2363 -1.6248 -0.3045  0.0294  0.1240 ...
 $ net.Nimm: num  -0.774  0.947 -1.098  0.809  1.216 ...
 $ net.Nden: num  -0.198 -0.135 -0.300 -0.618 -0.784 ...
 $ CLminN  : num   0.924 -3.265  0.211  0.813  0.262 ...
 $ CLmaxN  : num   0.3212 -0.0502 -0.9978  0.9005 -1.6535 ...
 $ CLmaxS  : num  -0.520  0.278 -0.546 -0.925  1.507 ...




So there is something else going on, either with your code or some other
conflict, unless my assumptions about your data are incorrect.

HTH,

Marc
