On Tue, Dec 10, 2013 at 8:23 AM, Herouth Maoz <hero...@unicell.co.il> wrote:
> > Hello. > > I have one particular table with very specialized use. I am sending > messages to some partner. The partner processes them asynchronously, and > then returns the status report to me. The table is used to store a > serialized version of the message object, together with a few identifiers, > expiry date, and a reference from the partner. When the status report comes > back from the partner, we: > > > 1. Select the record using the partner's reference number > 2. reconstruct and queue the object. > 3. Delete the record from database using the serial primary key ID. > > Where do you "queue" the object? Back into a different table within the same database? Why not use "DELETE ...RETURNING"? > > Every once in a while we run a garbage collection process which selects > expired messages, reconstructs and queues them for processing, and deletes > the record using the primary key. > > This works wonderfully as long as the table remains small - a few thousand > records waiting for their status report, and that's it. The table is set to > have frequent auto-anylize runs. > You might want to turn > > The problem starts when our partner has some glitch, under high load, and > fails to send back a few hundred thousand reports. In that case, the table > grows to a few hundred records, and they are not deleted until they hit > their expiry date, at which point the "garbage collector" takes care of > them and everything goes back to normal. When it contains hundreds of > thousands of records, performance deteriorates considerably. > There is no inherent reason the performance needs to degrade. Can you give the specific queries that perform worse? > I am trying to figure out a solution that will keep the system working > well even when there is a large number of records in the table. At first I > thought of partitioning the table on the partner's reference field. But the > DELETE would actually slow down if I do this, right? > Not necessarily, but partitioning should be your last resort not your first resort, and I don't see any reason it would be needed here. Indexes: > "transient_pkey" PRIMARY KEY, btree (id) > "transient_msisdn_ref" btree (msisdn, smsc_reference) > "transient_sm_vp" btree (validity) > What is the partner reference? If it is smsc_reference, then you probably need a index in which that is the lead (or only) column. Cheers, Jeff