On Tue, Dec 10, 2013 at 8:23 AM, Herouth Maoz <hero...@unicell.co.il> wrote:

>
> Hello.
>
> I have one particular table with very specialized use. I am sending
> messages to some partner. The partner processes them asynchronously, and
> then returns the status report to me. The table is used to store a
> serialized version of the message object, together with a few identifiers,
> expiry date, and a reference from the partner. When the status report comes
> back from the partner, we:
>
>
>    1. Select the record using the partner's reference number
>    2. reconstruct and queue the object.
>    3. Delete the record from database using the serial primary key ID.
>
>
Where do you "queue" the object?  Back into a different table within the
same database?  Why not use "DELETE ...RETURNING"?


>
> Every once in a while we run a garbage collection process which selects
> expired messages, reconstructs and queues them for processing, and deletes
> the record using the primary key.
>
> This works wonderfully as long as the table remains small - a few thousand
> records waiting for their status report, and that's it. The table is set to
> have frequent auto-anylize runs.
>

You might want to turn


>
> The problem starts when our partner has some glitch, under high load, and
> fails to send back a few hundred thousand reports. In that case, the table
> grows to a few hundred records, and they are not deleted until they hit
> their expiry date, at which point the "garbage collector" takes care of
> them and everything goes back to normal. When it contains hundreds of
> thousands of records, performance deteriorates considerably.
>

There is no inherent reason the performance needs to degrade.  Can you give
the specific queries that perform worse?



> I am trying to figure out a solution that will keep the system working
> well even when there is a large number of records in the table. At first I
> thought of partitioning the table on the partner's reference field. But the
> DELETE would actually slow down if I do this, right?
>

Not necessarily, but partitioning should be your last resort not your first
resort, and I don't see any reason it would be needed here.

Indexes:
>     "transient_pkey" PRIMARY KEY, btree (id)
>     "transient_msisdn_ref" btree (msisdn, smsc_reference)
>     "transient_sm_vp" btree (validity)
>

What is the partner reference?  If it is smsc_reference, then you probably
need a index in which that is the lead (or only) column.

Cheers,

Jeff

Reply via email to