Re: Insert into on conflict, data size upto 3 billion records

Ron Sat, 13 Feb 2021 12:04:37 -0800

On 2/12/21 12:46 PM, Karthik Kumar Kondamudi wrote:

Hi,
I'm looking for suggestions on how I can improve the performance of the below merge statement. We have a batch process that loads the data into the _batch tables using Postgres, and the task is to update the main target tables if the record exists, else insert it. Sometimes these batch tables could go up to 5 billion records. Here is the current scenario:
|target_table_main| has 700,070,247 records and is hash partitioned into 50 chunks. It has an index on |logical_ts|, and the batch table has 2,715,020,546 records, close to 3 billion. I'm dealing with a huge set of data, so I'm looking to do this in the most efficient way.

Many times, I have drastically sped up batch processing by #1 partitioning on the same field as an index, and #2 pre-sorting the input data by that field.

That way, you get excellent "locality of data" (meaning lots of writes to the same hot bits of cache, which later get asynchronously flushed to disk). Unfortunately for your situation, the purpose of hash partitioning is to /reduce/ locality of data. (Sometimes that's useful, but *not* when processing batches.)
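
To make the locality argument concrete, here is a rough toy simulation in Python. It is only a sketch under stated assumptions, not a model of how Postgres actually manages its buffers: rows are assumed to live on fixed-size "pages" laid out in key order, and a small LRU cache stands in for the hot bits of cache. The function name count_cache_misses and the parameters rows_per_page and cache_pages are made up for illustration.

```python
import random
from collections import OrderedDict


def count_cache_misses(keys, rows_per_page=100, cache_pages=8):
    """Count misses in a tiny LRU page cache when rows are touched in `keys` order.

    Assumption: a row with key k lives on page k // rows_per_page, i.e. the
    data is laid out in key order (hypothetical layout for illustration).
    """
    cache = OrderedDict()  # page number -> None, ordered from oldest to newest use
    misses = 0
    for key in keys:
        page = key // rows_per_page
        if page in cache:
            cache.move_to_end(page)  # mark the page as most recently used
        else:
            misses += 1
            cache[page] = None
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return misses


# A made-up batch of 20,000 keys spread over 1,000 pages.
rng = random.Random(42)
batch = [rng.randrange(100_000) for _ in range(20_000)]

unsorted_misses = count_cache_misses(batch)
sorted_misses = count_cache_misses(sorted(batch))

print("misses, input in arrival order:", unsorted_misses)
print("misses, input pre-sorted by key:", sorted_misses)
```

With the pre-sorted input, each page is brought into the cache once and then reused until the batch moves on, so the miss count equals the number of distinct pages touched. In arrival order, the same pages keep getting evicted and reloaded.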


--
Angular momentum makes the world go 'round.
