Hi,
I am (ab)using a PostgreSQL database (with PostGIS extension) in a
large data processing job - each day, I load several GB of data, run a
lot of analyses on it, and then throw everything away again. Loading,
running, and dumping the results takes about 18 hours every day.
The job involves a lot of index building and sorting, and is run on a
64-bit machine with 96 GB of RAM.
Naturally I would like the system to use as much RAM as possible before
resorting to disk-based operations, but no amount of
maintenance_work_mem setting seems to make it do my bidding.
I'm using PostgreSQL 8.3 but would be willing and able to upgrade to any
later version.
Some googling has unearthed the issue - which is likely known to all of
you, just repeating it to prove I've done my homework - that tuplesort.c
always tries to double its memory allocation, and will refuse to do so
if that results in an allocation greater than MaxAllocSize:
if ((Size) (state->memtupsize * 2) >= MaxAllocSize / sizeof(SortTuple))
return false;
And MaxAllocSize is hardcoded to 1 GB in memutils.h.
(All this based on Postgres 9.1alpha source - I didn't want to bring
something up that has been fixed already.)
Now I assume that there are reasons that you're doing this. memutils.h
has the (for me) cryptic comment about MaxAllocSize: "XXX This is
deliberately chosen to correspond to the limiting size of varlena
objects under TOAST. See VARATT_MASK_SIZE in postgres.h.", but
VARATT_MASK_SIZE has zero other occurences in the source code.
If I were to either (a) increase MaxAllocSize to, say, 48 GB instead of
1 GB, or (b) hack tuplesort.c to ignore MaxAllocSize, just for my local
setup - would that likely be viable in my situation, or would I break
countless things?
I can afford some experimentation; as I said, I'm throwing away the
database every day anyway. I just thought I'd solicit your advice before
I do anything super stupid. - If I can use my setup to somehow
contribute to further PostgreSQL development by trying out some things,
I'll be more than happy to do so. I do C/C++ but apart from building
packages for several platforms, I haven't worked with the PostgreSQL
source code.
Of course the cop-out solution would be to just create a huge RAM disk
and instruct PostgreSQL to use that for disk-based sorting. I'll do that
if all of you say "OMG don't touch MaxAllocSize" ;)
Bye
Frederik
--
Frederik Ramm ## eMail frede...@remote.org ## N49°00'09" E008°23'33"
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers