>>>>> "BD" == Berry, David <d...@noc.ac.uk> writes:

BD> All variables are reals other than id, which is varchar(10), and
BD> date, which is a timestamp. Approximately 1.5 million rows are
BD> returned by the query, and it takes on the order of 10 seconds to
BD> execute using psql (the command line client for Postgres) and a
BD> similar time using pgAdmin 3. In R it takes several minutes to run,
BD> and I'm unsure where the bottleneck is occurring.

You may want to test progressively smaller chunks of the data to see how
quickly R slows down as compared to psql on that query.
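Something along these lines would show how the elapsed time scales with
the number of rows fetched.  This is an untested sketch: it assumes an
open RPostgreSQL connection `con`, and the table name `obs` is a
placeholder -- substitute your actual SELECT.

library(RPostgreSQL)

## Time the same query over increasing row counts to see whether the
## R-side cost grows roughly linearly or much worse than that.
for (n in c(1e4, 1e5, 5e5, 1.5e6)) {
    sql <- paste("SELECT * FROM obs LIMIT", format(n, scientific = FALSE))
    t <- system.time(d <- dbGetQuery(con, sql))
    cat(sprintf("%8d rows: %6.1f s elapsed\n", nrow(d), t["elapsed"]))
}

If the per-row time climbs sharply as n grows, the problem is on the R
side rather than in Postgres.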

My first guess is that something is allocating and re-allocating RAM in
a quadratic (or worse) fashion.
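To make that concern concrete, here is a toy comparison in plain R
(nothing to do with the driver itself) of growing a vector element by
element versus preallocating it; the first pattern re-allocates on every
append and scales quadratically:

n <- 5e4
system.time({                       # grow-by-append: O(n^2) copying
    x <- numeric(0)
    for (i in seq_len(n)) x <- c(x, i)
})
system.time({                       # preallocated: linear
    y <- numeric(n)
    for (i in seq_len(n)) y[i] <- i
})

If RS-PostgreSQL (or code above it) builds the result with the first
pattern, 1.5 million rows would hurt.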

I don't know whether OS X has anything equivalent, but you could test on
the Linux box using oprofile (http://oprofile.sourceforge.net; SuSE
should have an rpm for it and kernel support compiled in) to confirm
where the time is spent.

It is /possible/ that the (sql)NULL->(r)NA logic in RS-PostgreSQL.c is
slow (relatively speaking), but it is necessary.  Nothing else jumps
out as a possible choke point.
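One rough way to check that (again an untested sketch; `con` is an open
connection, and the table and column names are placeholders for yours)
is to time the same column with and without NULLs reaching the driver,
using COALESCE so the row count stays identical:

## Compare driver-side cost with and without NULLs in the result set.
t_raw  <- system.time(dbGetQuery(con, "SELECT sst FROM obs"))
t_nona <- system.time(dbGetQuery(con,
              "SELECT COALESCE(sst, -999) AS sst FROM obs"))
rbind(with_nulls = t_raw, without_nulls = t_nona)

A large gap between the two timings would point at the NULL handling; a
small one rules it out.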

Oprofile (or the equivalent) would best answer the question.

-JimC
-- 
James Cloos <cl...@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6
