Re: Suggestion for memory parameters

2024-09-30 Thread yudhi s
On Mon, Sep 30, 2024 at 8:46 PM Philip Semanchuk <
phi...@americanefficient.com> wrote:

>
>
> > On Sep 26, 2024, at 7:03 AM, yudhi s 
> wrote:
> >
> > In a RDS postgres ...
>
> >  Is it fine to let it use "FreeLocalStorage" unless it goes till zero?
>
> Hi Yudhi,
> FreeLocalStorage and some of the other things you ask about are specific
> to AWS RDS, so you might have better luck getting answers on an
> RDS-specific mailing list. We also use RDS-hosted Postgres and so I
> completely understand how Postgres and RDS are intertwined.
>
> We have had runaway queries exhaust FreeLocalStorage. It has been quite a
> while since that happened, so my memories are hazy, but I’m pretty sure
> that when we used all of FreeLocalStorage, the result was that Postgres
> restarted. It might be equivalent to using all memory and disk space on a
> standalone system. Once there’s no storage left, behavior is unpredictable
> but we can’t be surprised if things crash. Usually our runaway queries got
> killed before FreeLocalStorage filled up, but not always.
>
> I second Veem’s suggestion to set work_mem on a per-session basis. Also
> note that the doc for work_mem says, “the total memory used could be many
> times the value of work_mem; it is necessary to keep this fact in mind when
> choosing the value."
>
>
> https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-WORK-MEM
>
>
Thank you.

When I execute the query with EXPLAIN (ANALYZE, BUFFERS), I see the section
below in the plan with "Sort Method" information in three places,
each showing ~75MB, which combined comes to <250MB. So does that
mean it's enough to set work_mem to ~250MB for these queries before
they start?

 But somehow this query finishes in a few seconds when I execute it
using EXPLAIN (ANALYZE, BUFFERS), while if I run it without EXPLAIN it
runs for ~10 minutes+. My expectation was that EXPLAIN ANALYZE
actually executes the query fully. Is my understanding correct, and
are the disk-spill stats I am seeing accurate enough to go by?


Limit  (cost=557514.75..592517.20 rows=30 width=1430) (actual
time=2269.939..2541.527 rows=30 loops=1)
  Buffers: shared hit=886206, temp read=38263 written=56947
  I/O Timings: temp read=70.040 write=660.073
  ->  Gather Merge  (cost=557514.75..643393.02 rows=736048 width=1430)
(actual time=2269.938..2513.748 rows=30 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=886206, temp read=38263 written=56947
I/O Timings: temp read=70.040 write=660.073
->  Sort  (cost=556514.73..557434.79 rows=368024 width=1430)
(actual time=2227.392..2279.389 rows=100135 loops=3)
  Sort Key: column1, column2
  Sort Method: external merge  Disk: 77352kB
  Buffers: shared hit=886206, temp read=38263 written=56947
  I/O Timings: temp read=70.040 write=660.073
  Worker 0:  Sort Method: external merge  Disk: 75592kB
  Worker 1:  Sort Method: external merge  Disk: 74440kB
  ->  Parallel Append  (cost=0.00..64614.94 rows=368024
width=1430) (actual time=0.406..570.105 rows=299204 loops=3)
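
For reference, this is the kind of per-session setting I had in mind based on that estimate (a sketch; the 256MB figure is just taken from the sort sizes above, and the actual query is elided):

-- scoped to the whole session
SET work_mem = '256MB';
-- ... run the sort-heavy SELECT here ...
RESET work_mem;

-- or scoped to a single transaction
BEGIN;
SET LOCAL work_mem = '256MB';
-- ... run the sort-heavy SELECT here ...
COMMIT;

Also, since each parallel worker in the plan above does its own sort, I assume the peak usage for this one query could be roughly (workers + leader) x work_mem.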


Context variable in application and trigger code

2024-09-28 Thread yudhi s
Hi,
We have a trigger function called from a trigger which executes before
inserting rows into a table (say TAB1). This trigger function does some
conversion of a code to a description and persists the description in the
respective columns of the table. We want to keep this trigger as light as
possible, as there will be hundreds of millions of rows inserted into this
table from multiple input sources, and the conversion should only happen for
the inserts which come from a specific input data stream.

There are 4-5 different input sources which will ingest data into this
table (some use a file-based processing framework and some use
other streaming technology). Say, for example, we want this description to
only be fetched for input source 1, but not the others. We don't have any
column persisted in the table that identifies the input data stream, so the
code-to-description decoding can't simply be made conditional on a column value.

Are there any techniques by which we can set some parameter or
session context variable at the application level to identify the input
data source, which can then be checked at the very start of the trigger
function code and thus avoid querying the "CODE" table every time the
trigger executes?


Existing sample trigger code:-

CREATE OR REPLACE FUNCTION code_to_desc()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
DECLARE
    code_description code.code%TYPE;
BEGIN
    SELECT currency_description_text
      INTO code_description
      FROM code
     WHERE code = NEW.code_input;

    IF FOUND THEN
        NEW.code_input := code_description;
    END IF;

    RETURN NEW;
END;
$function$;
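
One technique I was thinking of (a sketch; myapp.input_source is just an illustrative name for a custom parameter): the application sets a session-level variable after connecting, and the trigger reads it with current_setting() before doing the lookup.

-- in the application session (or per transaction):
SET myapp.input_source = 'SOURCE_1';
-- from a driver, the transaction-local equivalent would be:
-- SELECT set_config('myapp.input_source', 'SOURCE_1', true);

CREATE OR REPLACE FUNCTION code_to_desc()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
DECLARE
    code_description code.code%TYPE;
BEGIN
    -- current_setting(..., true) returns NULL instead of erroring when unset
    IF current_setting('myapp.input_source', true) IS DISTINCT FROM 'SOURCE_1' THEN
        RETURN NEW;   -- not the relevant input stream, skip the lookup entirely
    END IF;

    SELECT currency_description_text
      INTO code_description
      FROM code
     WHERE code = NEW.code_input;

    IF FOUND THEN
        NEW.code_input := code_description;
    END IF;

    RETURN NEW;
END;
$function$;

Would this kind of check at the top of the trigger be a reasonable way to avoid the CODE lookup for the other input sources?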

Regards
Yudhi


Re: Suggestion for memory parameters

2024-09-26 Thread yudhi s
On Fri, Sep 27, 2024 at 9:11 AM veem v  wrote:

>
> On Thu, 26 Sept 2024 at 16:33, yudhi s 
> wrote:
>
>> Hello All,
>>
>> In a RDS postgres we are seeing some select queries when running and
>> doing sorting on 50 million rows(as its having order by clause in it) , the
>> significant portion of wait event is showing as "IO:BufFileWrite" and it
>> runs for ~20minutes+.
>>
>> Going through the document in the link below, it states we should monitor
>> the "FreeLocalStorage" metric and when monitoring that, I see it showing up
>> to ~535GB as the max limit and when these queries run this goes down till
>> 100GB. Note-  (it's a R7g8xl instance)
>>
>>
>> https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/apg-waits.iobuffile.html
>>
>> We were thinking of bumping up the work_mem to a higher value in database
>> level , which is currently having size 4MB default. But we will also have
>> ~100 sessions running at same time and majority were from other
>> applications which execute other single row "insert" queries and I hope
>> that will not need high "work_mem" . And setting it at database level will
>> consume 100 times that set work_mem value. So how to handle this situation?
>>  Or
>>  Is it fine to let it use "FreeLocalStorage" unless it goes till zero?
>>
>> Also I am confused between the local storage (which is showing as 535GB)
>> vs the memory/RAM which is 256GB for this instance class with ~128TB max
>> storage space restriction, how are these storage different, (mainly the
>> 535GB space which it's showing vs the 128TB storage space restriction)?
>> Appreciate your guidance.
>>
>> select query looks something as below with no Joins but just single table
>> fetch:-
>>
>> Select
>> from 
>> where
>> order by column1, column2 LIMIT $b1 OFFSET $B2 ;
>>
>>
>>
> My 2 cents
> I think you should set work_mem at the specific session level, if your
> sorting queries come only from a specific handful of sessions, because
> setting it up at the database level will eat up most of your RAM (which you said
> is 256GB) and you said 100+ sessions get spawned at any point in time.
>


Thank you.
When I checked pg_stat_statements for this query and divided
temp_blks_read + temp_blks_written by the "calls", it came to ~1 million
blocks, which means ~7GB. So does that mean ~7GB of work_mem should be
allocated for this query?
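
This is roughly how I derived that number (a sketch; temp_blks_read/temp_blks_written are 8kB blocks, so blocks x 8192 gives bytes):

SELECT queryid,
       calls,
       temp_blks_read,
       temp_blks_written,
       pg_size_pretty((temp_blks_read + temp_blks_written) / calls * 8192) AS temp_per_call
FROM   pg_stat_statements
WHERE  query ILIKE '%order by column1, column2%'
ORDER  BY temp_blks_written DESC;

Though I understand the on-disk spill size may understate what the same sort would need to stay fully in memory.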


Suggestion for memory parameters

2024-09-26 Thread yudhi s
Hello All,

In an RDS Postgres database we are seeing some select queries that, when
running and sorting 50 million rows (there is an ORDER BY clause), spend a
significant portion of their wait time on "IO:BufFileWrite", and they run
for ~20 minutes+.

Going through the document in the link below, it states we should monitor
the "FreeLocalStorage" metric. When monitoring that, I see it showing up
to ~535GB as the max limit, and when these queries run it goes down to
~100GB. Note: it's an R7g.8xl instance.

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/apg-waits.iobuffile.html

We were thinking of bumping up work_mem to a higher value at the database
level (it is currently at the 4MB default). But we will also have
~100 sessions running at the same time, and the majority are from other
applications which execute single-row "insert" queries that should not need a
high "work_mem". Setting it at the database level could consume 100 times
the configured work_mem value. So how should we handle this situation?
 Or
 Is it fine to let it use "FreeLocalStorage" until it goes to zero?

Also, I am confused about the local storage (which shows as 535GB) vs
the memory/RAM, which is 256GB for this instance class with a ~128TB max
storage space restriction. How are these kinds of storage different (mainly
the 535GB space it's showing vs the 128TB storage space restriction)?
Appreciate your guidance.

The select query looks something like the below, with no joins, just a
single-table fetch:

Select
from 
where
order by column1, column2 LIMIT $b1 OFFSET $B2 ;

Regards
Yudhi


Re: Manual query vs trigger during data load

2024-09-14 Thread yudhi s
On Sat, Sep 14, 2024 at 4:17 PM Peter J. Holzer  wrote:

> On 2024-09-14 00:54:49 +0530, yudhi s wrote:
> > As "thiemo" mentioned , it can be done as below method, but if we have
> > multiple lookup tables to be populated for multiple columns , then , how
> can
> > the INSERT query be tweaked to cater the need here?
>
> Just use a join:
> insert into target(val1, val2, val3, val4)
> select :param1, cfgA.substA, :param3, cfgB.substB
> from cfgA, cfgB
> where cfgA.keyA = :param2 and cfgB.keyB = :param4
>
> Or use a CTE per lookup which might be more readable:
>
> with cA as ( select substA from cfgA where keyA = :param2 ),
>  cB as ( select substB from cfgB where keyB = :param4 )
> insert into target(val1, val2, val3, val4)
> select :param1, cA.substA, :param3, cB.substB
> from cA, cB
>
>
Thank you. I will try these options.
Also, we are trying to do something as below, which keeps each lookup tied to
the specific target column and thus looks simpler than putting those
reference tables in the FROM clause, which may cause some confusion when
reading the code, and I am not sure whether it could cause a cartesian
product. Please correct me if I'm wrong.

INSERT INTO tab_part1 (column1, column2, column3, column4, column5,
part_date)
VALUES ( :v_col1, (SELECT lookup_value FROM reference_tab1 WHERE lookup_key
= :v_col2), :v_col3, :v_col4, :v_col5, CURRENT_DATE );
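
And for the case of multiple lookup columns, I was thinking the same pattern would just repeat the scalar subquery per column, something like the sketch below (reference_tab2 and :v_col4 as a second lookup are hypothetical here, only for illustration):

INSERT INTO tab_part1 (column1, column2, column3, column4, column5, part_date)
VALUES ( :v_col1,
         (SELECT lookup_value FROM reference_tab1 WHERE lookup_key = :v_col2),
         :v_col3,
         (SELECT lookup_value FROM reference_tab2 WHERE lookup_key = :v_col4),
         :v_col5,
         CURRENT_DATE );

One difference from the join/CTE form: if a lookup key is missing, the scalar subquery just puts NULL in that column, whereas the join form would skip the insert altogether.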


Re: update faster way

2024-09-14 Thread yudhi s
On Sat, Sep 14, 2024 at 4:55 PM Peter J. Holzer  wrote:

>
> Which in turn means that you want as little overhead as possible per
> batch which means finding those 5000 rows should be quick. Which brings
> us back to Igor's question: Do you have any indexes in place which speed
> up finding those 5000 rows (the primary key almost certainly won't help
> with that). EXPLAIN (ANALYZE) (as suggested by Laurenz) will certainly
> help answering that question.
>
> > And also those rows will not collide with each other. So do you think
> > that approach can anyway cause locking issues?
>
> No, I don't think so. With a batch size that small I wouldn't expect
> problems even on the live partition. But of course many busy parallel
> sessions will put additional load on the system which may or may not be
> noticeable by users (you might saturate the disks writing WAL entries
> for example, which would slow down other sessions trying to commit).
>
>
> > Regarding batch update with batch size of 1000, do we have any method
> exists in
> > postgres (say like forall statement in Oracle) which will do the batch
> dml. Can
> > you please guide me here, how we can do it in postgres.
>
> Postgres offers several server side languages. As an Oracle admin you
> will probably find PL/pgSQL most familiar. But you could also use Perl
> or Python or several others. And of course you could use any
> programming/scripting language you like on the client side.
>
>
When you said "(the primary key almost certainly won't help with that)", I
am trying to understand why that is so.
I was thinking of using that column as an incrementing filter and driving
the eligible rows based on that filter. If it had been a sequence I think it
would have helped, but in this case it's a UUID, so I may not be able to do
the batch DML using that as the filter criteria. In that case, will it be
fine to drive the update based on ctid, something like below? Each session
will take a range of five days of data, or five partitions, and will execute
a query something like below, which will update in batches of 10K and then
commit. Is this fine? Or is there some better way of doing batch DML in
Postgres PL/pgSQL?

DO $$
DECLARE
    l_rowid_array ctid[];
    l_ctid        ctid;
    l_array_size  INT := 10000;
    l_processed   INT := 0;
BEGIN

    FOR l_cnt IN 0..(SELECT COUNT(*) FROM part_tab
                     WHERE part_date > '1-sep-2024' AND part_date < '5-sep-2024'
                    ) / l_array_size LOOP
        l_rowid_array := ARRAY(
            SELECT ctid
            FROM part_tab
            WHERE part_date > '1-sep-2024' AND part_date < '5-sep-2024'
            LIMIT l_array_size OFFSET l_cnt * l_array_size
        );

        FOREACH l_ctid IN ARRAY l_rowid_array LOOP
            UPDATE part_tab
               SET column1 = reftab.code
              FROM reference_tab reftab
             WHERE part_tab.column1 = reftab.column1
               AND part_tab.ctid = l_ctid;
            l_processed := l_processed + 1;
        END LOOP;

        COMMIT;
    END LOOP;

END $$;
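
Or alternatively, something like the below sketch, which updates each collected batch in a single statement instead of row by row and avoids the OFFSET scan (same assumptions about part_tab/reference_tab; it relies on already-converted rows no longer matching reference_tab.column1, so each pass only picks up unconverted rows):

DO $$
DECLARE
    l_rowid_array ctid[];
    l_batch_size  INT := 10000;
BEGIN
    LOOP
        -- collect the next batch of rows that still need conversion
        SELECT array_agg(t.ctid)
          INTO l_rowid_array
          FROM (SELECT p.ctid
                  FROM part_tab p
                  JOIN reference_tab r ON r.column1 = p.column1
                 WHERE p.part_date > DATE '2024-09-01'
                   AND p.part_date < DATE '2024-09-05'
                 LIMIT l_batch_size) t;

        EXIT WHEN l_rowid_array IS NULL;

        -- one UPDATE per batch instead of one per row
        UPDATE part_tab p
           SET column1 = r.code
          FROM reference_tab r
         WHERE r.column1 = p.column1
           AND p.ctid = ANY (l_rowid_array);

        COMMIT;   -- needs to run outside an explicit transaction block
    END LOOP;
END $$;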


Re: update faster way

2024-09-14 Thread yudhi s
On Sat, 14 Sept, 2024, 1:09 pm Laurenz Albe, 
wrote:

> On Sat, 2024-09-14 at 08:43 +0530, yudhi s wrote:
> > We have to update a column value(from numbers like '123' to codes like
> 'abc'
> > by looking into a reference table data) in a partitioned table with
> billions
> > of rows in it, with each partition having 100's millions rows. As we
> tested
>
> > for ~30million rows it's taking ~20minutes to update. So if we go by this
> > calculation, it's going to take days for updating all the values. So my
> > question is
> >
> > 1) If there is any inbuilt way of running the update query in parallel
> >(e.g. using parallel hints etc) to make it run faster?
> > 2) should we run each individual partition in a separate session (e.g.
> five
> >partitions will have the updates done at same time from 5 different
> >sessions)? And will it have any locking effect or we can just start
> the
> >sessions and let them run without impacting our live transactions?
>
> Option 1 doesn't exist.
> Option 2 is possible, and you can even have more than one session working
> on a single partition.
>
> However, the strain on your system's resources and particularly the row
> locks will impair normal database work.
>
> Essentially, you can either take an extended down time or perform the
> updates
> in very small chunks with a very low "lock_timeout" over a very long period
> of time.  If any of the batches fails because of locking conflicts, it has
> to be retried.
>
> Investigate with EXPLAIN (ANALYZE) why the updates take that long.  It
> could
> be a lame disk, tons of (unnecessary?) indexes or triggers, but it might as
> well be the join with the lookup table, so perhaps there is room for
> improvement (more "work_mem" for a hash join?)
>


Thank you so much Laurenz.

We mostly have inserts/updates happening on the current-day/live partition.
Considering that, if we run batch updates (with a batch size of 1000)
from five different sessions in parallel on different historical partitions,
at any time they will lock 5000 rows and then commit, and those rows
will not collide with each other. So do you think that approach can still
cause locking issues? We will ensure the update of the live partition occurs
when we have the least activity, so we will not need extended downtime.
Please correct me if wrong.

I have never used lock_timeout though; in the above case do we need it?

Regarding batch updates with a batch size of 1000, does any method exist
in Postgres (like the FORALL statement in Oracle) which will do the batch
DML? Can you please guide me on how we can do it in Postgres.

And yes, we will need to see what happens in the update using explain
analyze. I was also trying to see if we can run explain analyze without
persisting the actual update.
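
One option that does seem workable is to wrap the EXPLAIN (ANALYZE) in a transaction and roll it back (a sketch using the same update; it still does all the work, takes the row locks and writes WAL while it runs, so it is something for a test copy or an off-peak window rather than a free dry run):

BEGIN;

EXPLAIN (ANALYZE, BUFFERS)
UPDATE tab_part1
   SET column1 = reftab.code
  FROM reference_tab reftab
 WHERE tab_part1.column1 = reftab.column1;

ROLLBACK;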






Re: update faster way

2024-09-13 Thread yudhi s
> Do you have any indexes?
> If not - you should, if yes - what are they?
Yes, we have a primary key on this table, which is on a UUID type column,
and we also have other indexes on other timestamp columns. But how is this
going to help, as we are going to update almost all the rows in the table?


update faster way

2024-09-13 Thread yudhi s
Hello,
We have to update a column value (from numbers like '123' to codes like
'abc', by looking them up in a reference table) in a partitioned table with
billions of rows in it, each partition having hundreds of millions of rows.
As we tested, for ~30 million rows it's taking ~20 minutes to update, so if
we go by this calculation it's going to take days to update all the values.
So my questions are

1) Is there any inbuilt way of running the update query in parallel
(e.g. using parallel hints etc.) to make it run faster?
2) Should we run each individual partition in a separate session (e.g. five
partitions will have the updates done at the same time from 5 different
sessions)? And will it have any locking effect, or can we just start the
sessions and let them run without impacting our live transactions?

UPDATE tab_part1
SET column1 = reftab.code
FROM reference_tab reftab
WHERE tab_part1.column1 = reftab.column1;

Regards
Yudhi


Re: Manual query vs trigger during data load

2024-09-13 Thread yudhi s
On Fri, Sep 13, 2024 at 8:27 PM Adrian Klaver 
wrote:

> On 9/13/24 07:50, Adrian Klaver wrote:
> > On 9/13/24 02:58, Juan Rodrigo Alejandro Burgos Mella wrote:
> >> Hello, I find it unlikely that the trigger will work properly, since
> >> the reserved fields of the OLD subset have no value in an INSERT
> >
> > I'm not seeing that the OP is asking for OLD.* values, they are just
> > looking to include the result of a lookup on another table in the INSERT.
>
> My mistake I see the OLD reference now.
>
> >
>

My mistake. The trigger was supposed to use "new.col2", fetch the
corresponding lookup value from the lookup table, and insert that value into
the target table.

Now my question was: in such a situation the trigger will work fine, but is
that the optimal way of doing it? Or should we rewrite the query in some way
such that the lookup table can be queried along with the INSERT in one shot,
with a single DB call? And is it true that a trigger on the target table
will suppress the batch insert and make it row by row, even if we call it in
a batch fashion?
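
For clarity, the corrected shape I had in mind is something like the below sketch (assuming the description column in reference_tab1 is called lookup_value, as in the later example, and including the trigger definition itself, which was not shown earlier):

CREATE OR REPLACE FUNCTION trg_func_populate_column2()
RETURNS trigger
LANGUAGE plpgsql
AS $$
DECLARE
    v_desc tab_part1.column2%TYPE;
BEGIN
    -- use NEW (not OLD) in a BEFORE INSERT trigger
    SELECT r.lookup_value
      INTO v_desc
      FROM reference_tab1 r
     WHERE r.lookup_key = NEW.column2;

    IF FOUND THEN
        NEW.column2 := v_desc;
    END IF;

    RETURN NEW;
END;
$$;

CREATE TRIGGER trg_populate_column2
BEFORE INSERT ON tab_part1
FOR EACH ROW
EXECUTE FUNCTION trg_func_populate_column2();

My understanding is that a FOR EACH ROW trigger like this fires once per inserted row, but the multi-row/batch INSERT itself still remains a single statement.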

As "thiemo" mentioned , it can be done as below method, but if we have
multiple lookup tables to be populated for multiple columns , then , how
can the INSERT query be tweaked to cater the need here? And I understand ,
the lookup table can be cached in Java and refreshed at a certain point in
time, but I was trying to understand if this can be doable by directly
querying the database, considering the lookup tables are having large data
sets in them.

Insert into tab1 (val1, val2)
Select valA, valB
From tab2
Where valC = :param1


Manual query vs trigger during data load

2024-09-13 Thread yudhi s
Hello All,

We have a table which is going to be inserted with hundreds of millions of
rows each day. We now have a requirement to do some transformation/lookup
logic on top of a few of the input bind values while inserting the data. So
I wanted to understand: is it possible to do it along with the INSERT query,
or is it better to have a trigger created for the same?

For e.g., below is the current insert query used in the Java code. We want
to fetch the value for "column2" from a lookup table rather than directly
inserting it as it comes from the customer side. So I am thinking of a
trigger like the one below. But at the same time I also want to compare the
performance of the normal way of doing the lookup vs having it performed
using triggers.

So one way I am thinking of is first fetching the value of "column2" from
reference_tab1 using a separate "select query" in the Java code itself, and
then passing that to the below insert query, but I think that will increase
the response time as it will be a separate DB call.

1) So, is there a way I can do it directly using the single INSERT query
itself, without an additional SELECT query? And then I will try to compare
that with the trigger-based approach.
2) Additionally, will this decision impact a batch insert approach? I.e., in
the case of a trigger, will the batch insert fail because the trigger will
force it to run row by row?

INSERT INTO tab_part1 (column1, column2, column3, column4, column5,
part_date)
VALUES (:v_col1, :v_col2, :v_col3, :v_col4, :v_col5,
CURRENT_DATE);

CREATE OR REPLACE FUNCTION trg_func_populate_column2() RETURNS TRIGGER AS $$
BEGIN
-- Fetch reference value and populate column2
NEW.column2 := (SELECT lookup_key FROM reference_tab1 WHERE lookup_key
= old.column2);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Regards
Yudhi


Re: question on audit columns

2024-09-04 Thread yudhi s
On Wed, Sep 4, 2024 at 6:29 PM Muhammad Usman Khan 
wrote:

> Hi,
>
> In your scenario, triggers can add some overhead since they require extra
> processing after each update operation. Considering the size of your table
> and the high transaction volume, you need to observe that this might
> significantly affect performance.
>
>
>
>
> On Wed, 4 Sept 2024 at 17:50, yudhi s  wrote:
>
>> Hello,
>> In postgres database , we have all the tables with audit columns like
>> created_by_user, created_timestamp,updated_by_user, updated_timestamp. So
>> we have these fields that were supposed to be populated by the time at
>> which the insert/update operation happened on the database but not at the
>> application level. So we are planning to populate the created_by_user,
>> created_timestamp columns by setting a default value of "current_timestamp"
>> and "current_user" for the two columns,  but no such this is available to
>> populate while we do the update of the row, so the only option seems to be
>> through a trigger.
>>
>> So wanted to check with the experts here  ,considering the table will be
>> DML heavy table (300M+ transactions will be inserted daily), is it okay to
>> have the trigger for this table for populating all the audit columns or
>> should we keep default for  created_by_user, created_timestamp and just
>> trigger for the update related two audit column? Basically wanted to see,
>> if the default value does the same thing as a trigger or it does something
>> more optimally than trigger?
>>
>> Regards
>> Yudhi
>>
>
Thank you so much. So do you mean to say that we should add default values
of current_timestamp and current_user for create_timestamp and
create_user_id, but for update_user_id and update_timestamp we can ask the
application to update the values manually whenever it executes an update
statement on the rows?
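
I.e. something like the below sketch (the orders table and its column names here are just illustrative):

CREATE TABLE orders (
    id                bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    payload           text,
    created_by_user   text        NOT NULL DEFAULT current_user,
    created_timestamp timestamptz NOT NULL DEFAULT current_timestamp,
    updated_by_user   text,
    updated_timestamp timestamptz
);

-- the defaults cover INSERT; for UPDATE either the application sets the two
-- updated_* columns explicitly, or a BEFORE UPDATE trigger does it:
CREATE OR REPLACE FUNCTION set_update_audit()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    NEW.updated_by_user   := current_user;
    NEW.updated_timestamp := current_timestamp;
    RETURN NEW;
END;
$$;

CREATE TRIGGER trg_set_update_audit
BEFORE UPDATE ON orders
FOR EACH ROW
EXECUTE FUNCTION set_update_audit();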


question on audit columns

2024-09-04 Thread yudhi s
Hello,
In a Postgres database, we have all the tables with audit columns like
created_by_user, created_timestamp, updated_by_user, updated_timestamp.
These fields are supposed to be populated with the time at which the
insert/update operation happened in the database, not at the application
level. So we are planning to populate the created_by_user and
created_timestamp columns by setting default values of "current_timestamp"
and "current_user" for the two columns, but nothing similar is available to
populate the update columns while we update a row, so the only option seems
to be through a trigger.

So I wanted to check with the experts here: considering the table will be a
DML-heavy table (300M+ transactions will be inserted daily), is it okay to
have a trigger on this table populating all the audit columns, or should we
keep defaults for created_by_user and created_timestamp and a trigger only
for the two update-related audit columns? Basically I wanted to see whether
a default value does the same thing as a trigger, or whether it does it more
optimally than a trigger.

Regards
Yudhi


Re: Insert works but fails for merge

2024-08-10 Thread yudhi s
Apologies for the confusion. The other column in the composite unique key is
txn_timestamp, which is also the partition key.

But we can't use both in the ON clause because of certain business
requirements; we realized it late. That's why we are unable to use
"ON CONFLICT".

On Sun, 11 Aug, 2024, 2:57 am Adrian Klaver, 
wrote:

> On 8/10/24 13:23, yudhi s wrote:
> >
> >
> > On Sat, Aug 10, 2024 at 8:22 PM Adrian Klaver <adrian.kla...@aklaver.com> wrote:
> >
> >
> >
> > Why not use INSERT ... ON CONFLICT instead of MERGE?
> >
> >  >
> >  > MERGE INTO tab1 AS target
> >  > USING (VALUES ('5efd4c91-ef93-4477-840c-a723ae212d99', 123,
> >  > '2024-08-09T11:33:49.402585600Z','2024-08-09T11:33:49.402585600Z')) AS
> >  > source(id, mid,txn_timestamp, cre_ts)
> >  > ON target.id = source.id
> >  > WHEN MATCHED THEN
> >  > UPDATE SET mid  = source.mid
> >  > WHEN NOT MATCHED THEN
> >  > INSERT (id, mid, txn_timestamp, cre_ts)
> >  >  VALUES (source.id, source.mid, source.txn_timestamp, source.cre_ts);
> >
> >
> >
> > Actually , as per the business logic , we need to merge on a column
> > which is not unique or having any unique index on it. It's the leading
> > column of a composite unique key though. And in such scenarios the
> > "INSERT ON CONFLICT" will give an error. So we are opting for a
> > merge statement here, which will work fine with the column being
> > having duplicate values in it.
>
>
> Alright it's official I am confused.
>
> You started with:
>
> WITH source_data (id, mid, txn_timestamp, cre_ts) AS (
>  VALUES ('5efd4c91-ef93-4477-840c-a723ae212d66', 123,
> '2024-08-09T11:33:49.402585600Z', '2024-08-09T11:33:49.402585600Z')
> )
> INSERT INTO tab1 (id, mid, txn_timestamp, cre_ts)
> SELECT id, mid, txn_timestamp, cre_ts
> FROM source_data
> ON CONFLICT (id) DO UPDATE
> > SET mid = EXCLUDED.mid,
>  txn_timestamp = EXCLUDED.txn_timestamp,
>  cre_ts = EXCLUDED.cre_ts;
>
> That implied that id was unique in of itself. As side note you called it
> a merge, which it is not as in MERGE. At this point I got off track
> thinking of MERGE.
>
> Then you went to the below which is a merge:
>
> MERGE INTO tab1 AS target
> USING (VALUES ('5efd4c91-ef93-4477-840c-a723ae212d99', 123,
> '2024-08-09T11:33:49.402585600Z','2024-08-09T11:33:49.402585600Z')) AS
> source(id, mid,txn_timestamp, cre_ts)
> ON target.id = source.id
> WHEN MATCHED THEN
> UPDATE SET mid  = source.mid
> WHEN NOT MATCHED THEN
> INSERT (id, mid, txn_timestamp, cre_ts)
>  VALUES (source.id,source.mid,  source.txn_timestamp, source.cre_ts);
>
> The question I have now is if id is part of a composite UNIQUE index on
> this:
>
>   CREATE TABLE tab1 (
>  id varchar(100) ,
>  mid INT,
>  txn_timestamp TIMESTAMPTZ NOT NULL,
>  cre_ts TIMESTAMPTZ NOT NULL
> ) PARTITION BY RANGE (txn_timestamp);
>
> Then what is the other column in the UNIQUE index?
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: Insert works but fails for merge

2024-08-10 Thread yudhi s
On Sat, Aug 10, 2024 at 8:22 PM Adrian Klaver 
wrote:

>
>
> Why not use INSERT ... ON CONFLICT instead of MERGE?
>
> >
> > MERGE INTO tab1 AS target
> > USING (VALUES ('5efd4c91-ef93-4477-840c-a723ae212d99', 123,
> > '2024-08-09T11:33:49.402585600Z','2024-08-09T11:33:49.402585600Z')) AS
> > source(id, mid,txn_timestamp, cre_ts)
> > ON target.id  = source.id 
> > WHEN MATCHED THEN
> > UPDATE SET mid  = source.mid
> > WHEN NOT MATCHED THEN
> > INSERT (id, mid, txn_timestamp, cre_ts)
> >  VALUES (source.id ,source.mid,
> >   source.txn_timestamp, source.cre_ts);
>
>
>
Actually, as per the business logic, we need to merge on a column which is
not unique and does not have any unique index on it. It's the leading column
of a composite unique key though. In such scenarios "INSERT ON CONFLICT"
will give an error, so we are opting for a MERGE statement here, which will
work fine with the column having duplicate values in it.


Re: Insert works but fails for merge

2024-08-10 Thread yudhi s
On Sat, Aug 10, 2024 at 2:56 AM Adrian Klaver 
wrote:

> On 8/9/24 14:13, yudhi s wrote:
> > Hello,
> > It's version 15.4 postgres. Where we have an insert working fine, but
> > then a similar insert with the same 'timestamp' value, when trying to be
> > executed through merge , it fails stating "You will need to rewrite or
> > cast the expression.". Why so?
> >
> > *Example:-*
> > https://dbfiddle.uk/j5S7br-q <https://dbfiddle.uk/j5S7br-q>*
> > *
> >
> > CREATE TABLE tab1 (
> >  id varchar(100) ,
> >  mid INT,
> >  txn_timestamp TIMESTAMPTZ NOT NULL,
> >  cre_ts TIMESTAMPTZ NOT NULL
> > ) PARTITION BY RANGE (txn_timestamp);
> >
> > CREATE TABLE tab1_2024_08_09 PARTITION OF tab1
> >  FOR VALUES FROM ('2024-08-09 00:00:00') TO ('2024-08-10 00:00:00');
> >
> > -- Below insert works fine
> > INSERT INTO tab1
> >  (id, mid, txn_timestamp, cre_ts)
> > VALUES
> >  ('5efd4c91-ef93-4477-840c-a723ae212d84', 123,
> > '2024-08-09T11:33:49.402585600Z', '2024-08-09T11:33:49.402585600Z');
> >
> > -- Below merge , which trying to insert similar row but failing
> >
> > WITH source_data (id, mid, txn_timestamp, cre_ts) AS (
> >  VALUES ('5efd4c91-ef93-4477-840c-a723ae212d66', 123,
> > '2024-08-09T11:33:49.402585600Z', '2024-08-09T11:33:49.402585600Z')
> > )
> > INSERT INTO tab1 (id, mid, txn_timestamp, cre_ts)
> > SELECT id, mid, txn_timestamp, cre_ts
> > FROM source_data
> > ON CONFLICT (id) DO UPDATE
> > SET mid = EXCLUDED.mid,
> >  txn_timestamp = EXCLUDED.txn_timestamp,
> >  cre_ts = EXCLUDED.cre_ts;
> >
> > ERROR: column "txn_timestamp" is of type timestamp with time zone but
> > expression is of type text
> > LINE 24: SELECT id, mid, txn_timestamp, cre_ts
> > HINT: You will need to rewrite or cast the expression.
>
>
> VALUES:
>
> https://www.postgresql.org/docs/current/sql-values.html
>
> "When VALUES is used in INSERT, the values are all automatically coerced
> to the data type of the corresponding destination column. When it's used
> in other contexts, it might be necessary to specify the correct data
> type. If the entries are all quoted literal constants, coercing the
> first is sufficient to determine the assumed type for all:
>
> SELECT * FROM machines
> WHERE ip_address IN (VALUES('192.168.0.1'::inet), ('192.168.0.10'),
> ('192.168.1.43'));
> "
>
> The VALUES is not directly attached to the INSERT, you will need to do
> explicit casts:
>
> VALUES ('5efd4c91-ef93-4477-840c-a723ae212d66', 123,
> '2024-08-09T11:33:49.402585600Z'::timestamptz,
> '2024-08-09T11:33:49.402585600Z'::timestamptz)
>
>
>
Thank You Adrian and David.

Even converting the merge to avoid the WITH clause/CTE, as below, still
makes it fail with the same error. So it seems only a direct "insert into
... values" query gets auto converted/cast, but not the other query forms.

In our case, we were using this merge query in application code (in Java)
as a framework to dynamically take these values as bind values and merge the
input data/messages. But it seems we now have to cast each and every field
which we get from the incoming message to make this merge work correctly.
I am wondering if the only way now is to get the data types from
information_schema.columns and then use casts to write the values of the
merge query dynamically converted for each of the fields in the application
code. Please correct me if my understanding is wrong.

MERGE INTO tab1 AS target
USING (VALUES ('5efd4c91-ef93-4477-840c-a723ae212d99', 123,
'2024-08-09T11:33:49.402585600Z','2024-08-09T11:33:49.402585600Z')) AS
source(id, mid,txn_timestamp, cre_ts)
ON target.id = source.id
WHEN MATCHED THEN
UPDATE SET mid  = source.mid
WHEN NOT MATCHED THEN
INSERT (id, mid, txn_timestamp, cre_ts)
VALUES (source.id,source.mid,  source.txn_timestamp, source.cre_ts);
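
In other words, the version that does seem to work is the below sketch, with the values in the USING clause explicitly cast to the target column types (and I assume with JDBC bind values the same could be done by casting the placeholders themselves, e.g. ?::int, ?::timestamptz, instead of deriving types from information_schema):

MERGE INTO tab1 AS target
USING (VALUES ('5efd4c91-ef93-4477-840c-a723ae212d99'::varchar,
               123::int,
               '2024-08-09T11:33:49.402585600Z'::timestamptz,
               '2024-08-09T11:33:49.402585600Z'::timestamptz)) AS
source(id, mid, txn_timestamp, cre_ts)
ON target.id = source.id
WHEN MATCHED THEN
UPDATE SET mid = source.mid
WHEN NOT MATCHED THEN
INSERT (id, mid, txn_timestamp, cre_ts)
VALUES (source.id, source.mid, source.txn_timestamp, source.cre_ts);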


Insert works but fails for merge

2024-08-09 Thread yudhi s
Hello,
It's Postgres version 15.4. We have an insert working fine, but then a
similar insert with the same 'timestamp' value, when executed through a
merge, fails stating "You will need to rewrite or cast the expression.".
Why so?

*Example:-*
https://dbfiddle.uk/j5S7br-q

 CREATE TABLE tab1 (
id varchar(100) ,
mid INT,
txn_timestamp TIMESTAMPTZ NOT NULL,
cre_ts TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (txn_timestamp);

CREATE TABLE tab1_2024_08_09 PARTITION OF tab1
FOR VALUES FROM ('2024-08-09 00:00:00') TO ('2024-08-10 00:00:00');

-- Below insert works fine
INSERT INTO tab1
(id, mid, txn_timestamp, cre_ts)
VALUES
('5efd4c91-ef93-4477-840c-a723ae212d84', 123,
'2024-08-09T11:33:49.402585600Z', '2024-08-09T11:33:49.402585600Z');

-- Below merge , which trying to insert similar row but failing

WITH source_data (id, mid, txn_timestamp, cre_ts) AS (
VALUES ('5efd4c91-ef93-4477-840c-a723ae212d66', 123,
'2024-08-09T11:33:49.402585600Z', '2024-08-09T11:33:49.402585600Z')
)
INSERT INTO tab1 (id, mid, txn_timestamp, cre_ts)
SELECT id, mid, txn_timestamp, cre_ts
FROM source_data
ON CONFLICT (id) DO UPDATE
SET mid = EXCLUDED.mid,
txn_timestamp = EXCLUDED.txn_timestamp,
cre_ts = EXCLUDED.cre_ts;

ERROR: column "txn_timestamp" is of type timestamp with time zone but
expression is of type text
LINE 24: SELECT id, mid, txn_timestamp, cre_ts
HINT: You will need to rewrite or cast the expression.


Re: Standard of data storage and transformation

2024-08-07 Thread yudhi s
On Wed, Aug 7, 2024 at 3:13 AM Ron Johnson  wrote:

> On Tue, Aug 6, 2024 at 5:07 PM yudhi s 
> wrote:
>
>> Hi All,
>> We are having a use case in which we are having transaction data for
>> multiple customers in one of the Postgres databases(version 15.4) and we
>> are consuming it from multiple sources(batch file processing, kafka event
>> processing etc). It's currently stored in normalized form postgres
>> with constraints, indexes, partitions defined. This postgres database is
>> holding the transaction data for around a month or so. There are use cases
>> of running online transaction search reports which will be mostly real time
>> reporting and also some daily transaction batch reports based on customers
>> and also month end reports for customers. In target state it will hold
>> Approx. ~400 million transactions/day which can be billions of rows across
>> multiple related parent/child tables.
>>
>> There is another requirement to send these customer transaction data to
>> an olap system which is in a snowflake database and there it will be
>> persisted for many years. The lag between the data in postgres/oltp and in
>> snowflake will be ~1hr. And any reporting api can query postgres for <1
>> month worth of transaction data and if it needs to scan for >1month worth
>> of transaction data, it will point to the snowflake database.
>>
>> Now the question which we are wondering is , should we send the data as
>> is in normalized table form to snowflake and then there we
>> transform/flatten the data to support the reporting use case or should we
>> first flatten or transform the data in postgres itself and make it as
>> another structure( for example creating materialized views on top of base
>> table) and only then move that data to the snowflake? What is the
>> appropriate standard and downside if we do anything different.
>>
>
> Some thoughts:
> 0) How big are the records?
> 1) Materialized views add disk space overhead.
> 2) Materialized views are for when you query the same static over and over
> again.
> 3) IIUC, you'll be moving the data from PG to Snowflake just once.
> 4) Writing an MV to disk and then reading it only once doubles the IO
> requirements.
> 5) Thus, my first thought would be to extract the data from PG using a
> denormalizing "plain" view.
> 5a) If you can't make that Fast Enough, then obviously you must pull the
> normalized data from PG and denorm it elsewhere.  You know your situation
> better than us.
> 6) Indices will be critical: not too many, but not too few.
> 7) Obviously consider partitioning, but note that too many partitions can
> make query planning MUCH slower.
> 7a) 31 days cut into hours means 744 partitions.  That's a LOT.
> 7b) Partitioning every THREE hours means only 248 child tables.  A lot,
> but much more manageable.
> 7c) That might well kill reporting performance, though, if it's for
> example one customer across many partitions.
> 8) You (hopefully) know what kind of queries will be run.  Maybe partition
> by customer (or whatever) range and THEN by an hour range.
> 8a) You'd have to simultaneously run multiple extract jobs (one for each
> "customer" range), but that might not be too onerous, since then each hour
> partition would be smaller.
> 9) Testing.  Nothing beats full-scale testing.
>
>
>
The table has ~100+ columns, and the denormalized or flattened table we are
planning to create will mostly have a large number of columns as well, as it
will be based on the columns of multiple tables joined together. However,
the plan was to have the cooked data ready so as not to do the additional
work in the target or downstream. So I was thinking of a physical model for
persisting the transformed data (using an MV) rather than using a simple
view, which would have performance overhead.

Because what is happening is, after the data moves to Snowflake, people try
to create their own versions of the transformed table on top of these
normalized tables (which we call refiners) and then query those from the UI
or for reporting. And some people say we should avoid those downstream
refiners and should do it here in the source/Postgres. Also the plan was to
move the data from Postgres once every hour.


Standard of data storage and transformation

2024-08-06 Thread yudhi s
Hi All,
We have a use case in which we have transaction data for multiple customers
in one of our Postgres databases (version 15.4), and we consume it from
multiple sources (batch file processing, Kafka event processing etc). It's
currently stored in normalized form in Postgres, with constraints, indexes
and partitions defined. This Postgres database holds the transaction data
for around a month or so. There are use cases for running online transaction
search reports, which will be mostly real-time reporting, some daily
transaction batch reports based on customers, and also month-end reports for
customers. In the target state it will hold approx. ~400 million
transactions/day, which can be billions of rows across multiple related
parent/child tables.

There is another requirement to send this customer transaction data to an
OLAP system, which is a Snowflake database, where it will be persisted for
many years. The lag between the data in Postgres/OLTP and in Snowflake will
be ~1hr. Any reporting API can query Postgres for <1 month worth of
transaction data, and if it needs to scan >1 month worth of transaction
data, it will point to the Snowflake database.

Now the question we are wondering about is: should we send the data as-is,
in normalized table form, to Snowflake and then transform/flatten the data
there to support the reporting use case, or should we first flatten or
transform the data in Postgres itself into another structure (for example
creating materialized views on top of the base tables) and only then move
that data to Snowflake? What is the appropriate standard, and what is the
downside if we do anything different?

Regards
Yudhi


Re: Partition boundary messed up

2024-07-27 Thread yudhi s
If you were thinking of some ALTER command which would just alter the
boundaries of the incorrect partitions and make them correct, I don't think
any such thing exists. You may have to create a whole new table, run the
partman.create_parent block with the UTC time zone set so that all the
partitions are created with correct boundaries, and then move the data into
it from the existing table.
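
I.e. something along the lines of the below sketch (tab1_new is just an illustrative name for the replacement table, and it is assumed to already exist as a table partitioned by range on the control column, mirroring your original script):

SET timezone = 'UTC';

SELECT partman.create_parent(
   p_parent_table := 'schema1.tab1_new',
   p_control := 'TRAN_DATE',
   p_type := 'native',
   p_interval := '1 day',
   p_premake := 90,
   p_start_partition => '2024-02-15 00:00:00'
);

-- then move the data across from the wrongly-bounded table
INSERT INTO schema1.tab1_new SELECT * FROM schema1.tab1;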

On Thu, Jul 25, 2024 at 12:55 AM Lok P  wrote:

> Hello All,
> We normally operate on UTC timezone so we normally create partitions in
> UTC timezone so that each day partition starts from today's midnight UTC to
> next day's midnight UTC. The script looks something like below. And also
> that way reference partition tables are also created in a similar way. Say
> for example the 29th august partition in parent partition reference to 29th
> august child partition as FK. This was working fine even for partition
> maintenance(mainly while dropping historical partitions) too without any
> issue.
>
> For one of the newly introduced partition tables, by mistake somebody
> created the initial partition definition without setting the timezone to
> UTC and all the partitions created are in different time zones. And as this
> table refers to another partition table (which is the parent and have
> partitions created in UTC timezone) the rows are spanning across two
> partitions and it's breaking the partition maintenance process while we try
> to drop the historical partition.
>
> Now the issue is that the newly introduced table already has billions of
> rows pumped into it spanning across 40 partitions. So is there an easy way
> to just alter the partition boundary to make it UTC midnight to midnight
> range?
> or
> The only way is to create a new table from scratch with the correct
> partition boundary in UTC timezone and then move the data and then create
> foreign key on that(which I believe is going to take a lot of time too)?
>
> Another thing we noticed, it shows initial partitions having boundaries in
> NON UTC (which we understand because of the missing timezone syntax) but
> then suddenly the subsequent partitions are getting created UTC too, not
> sure how it happened. And I believe it will create issues while rows come
> into the database which falls in between these ranges? Wondering if there
> is any easy way to correct this mess now? Note- This is postgres version
> 15+.
>
> Below partition creation script we use:-
> set timezone='UTC';
> SELECT partman.create_parent(
>p_parent_table := 'schema1.tab1',
>p_control := 'TRAN_DATE',
>p_type := 'native',
>p_interval := '1 day',
>p_premake := 90,
>p_start_partition => '2024-02-15 00:00:00'
> );
> UPDATE partman.part_config SET infinite_time_partitions = 'true' ,
> premake=20 WHERE parent_table = 'schema1.tab1';
> CALL partman.run_maintenance_proc();
>
> Below details i fetched from  pg_class for the table which is messed up:-
>
> Partition_name   Partition Expressions
> TAB1_p2024_08_29 FOR VALUES FROM ('2024-08-29 00:00:00+05:30') TO
> ('2024-08-30 00:00:00+05:30')
> TAB1_p2024_08_30 FOR VALUES FROM ('2024-08-30 00:00:00+05:30') TO
> ('2024-08-31 00:00:00+05:30')
> TAB1_p2024_08_31 FOR VALUES FROM ('2024-08-31 00:00:00+05:30') TO
> ('2024-09-01 00:00:00+05:30')
> TAB1_p2024_09_01 FOR VALUES FROM ('2024-09-01 00:00:00+05:30') TO
> ('2024-09-02 00:00:00+05:30')
>
> TAB1_p2024_09_02 FOR VALUES FROM ('2024-09-02 00:00:00+05:30') TO
> ('2024-09-03 00:00:00+05:30')
> TAB1_p2024_09_03 FOR VALUES FROM ('2024-09-03 05:30:00+05:30') TO
> ('2024-09-04 05:30:00+05:30')
> TAB1_p2024_09_04 FOR VALUES FROM
> ('2024-09-04 05:30:00+05:30') TO ('2024-09-05 05:30:00+05:30')
> TAB1_p2024_09_05 FOR VALUES FROM ('2024-09-05 05:30:00+05:30') TO
> ('2024-09-06 05:30:00+05:30')
> TAB1_p2024_09_06 FOR VALUES FROM ('2024-09-06 05:30:00+05:30') TO
> ('2024-09-07 05:30:00+05:30')
>
> Regards
> Lok
>


Re: Question on partman extension while relation exist

2024-07-02 Thread yudhi s
On Tue, 2 Jul, 2024, 12:43 pm Muhammad Ikram,  wrote:

> Hi Yudhi,
>
> I think disabling foreign keys before maintenance will help.
>
> --
> Muhammad Ikram
>

Do you mean to say we should call the parent table first for maintenance,
followed by the child, and first remove all the foreign keys which point to
the parent table partition that is going to be dropped by pg_partman?

As drop/create partition is being called from within pg_partman without our
intervention, where should we put this drop-foreign-key code? Do you mean
having it in another event trigger which will fire before the drop?


Question on partman extension while relation exist

2024-07-01 Thread yudhi s
Hello All,
In Postgres we are seeing issues during automatic partition maintenance
using the pg_partman extension. Basically, it automatically creates one new
partition and drops one historical partition each day, based on the
retention period set in part_config. We just call it like
partman.run_maintenance_proc('table_name');

While there exist foreign key relationships between the partitioned tables,
dropping the parent partitions in particular takes a lot of time, as it
validates every record of the child table's partitions and also holds locks
longer. Ideally it should only need to check the corresponding child
partition, but it's not doing that because the foreign key is defined at the
table level rather than the partition level. So we are planning to create
the foreign keys at the partition level rather than at the table level.

And we were thinking of doing it dynamically by having an "event trigger"
which fires on the "create" statement, i.e. when the "create new partition"
statement is triggered by pg_partman. It will then also create the foreign
key constraint on the new child partition, referring to the respective
parent partition, at the same time, so that things stay automated.

But now we are stuck in one scenario. Say, for example, we execute
pg_partman for the parent table first: it will create the new partition
independently, which is fine, but when it tries to drop the historical
partition it will complain because the referencing child partition still
exists.

On the other hand,

If we run pg_partman for the child table first, then it will drop the
historical child partition without any issue; however, it will throw an
error while creating the foreign key, because the respective parent
partition has not yet been created.

Need advice on how we should handle this scenario. Basically, in which order
should we call partman.run_maintenance_proc for the parent and child tables?


Re: Design for dashboard query

2024-06-15 Thread yudhi s
>
>
> On Sat, 15 Jun 2024 at 6:54 PM, sud  wrote:
>
>> Hello All,
>>
>> Its postgres version 15.4. We are having a requirement in which
>> aggregated information for all the users has to be displayed on the UI
>> screen. It should show that information on the screen. So basically, it
>> would be scanning the full table data which is billions of rows across many
>> months and then join with other master tables and aggregate those and then
>> display the results based on the input "user id" filter.
>>
>> In such a scenario we are thinking of using a materialized view on top of
>> the base tables which will store the base information and refresh those
>> periodically to show the data based on the input user id. However i am
>> seeing , postgres not supporting incremental refresh of materialized view
>> and full refresh can take longer. So , do we have any other option
>> available? Additionally , It should not impact or block the online users
>> querying the same materialized view when the refresh is happening.
>>
>>
>>
Yes, incremental refresh is not available by default, but you can do the
refresh using the CONCURRENTLY keyword, which I believe is online and
won't block your ongoing queries, if any.
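
For example (a sketch; mv and user_id are just illustrative names here), noting that CONCURRENTLY requires a unique index on the materialized view:

-- one-time: CONCURRENTLY needs a unique index covering all rows
CREATE UNIQUE INDEX ON mv (user_id);

-- periodic refresh that does not block readers of mv
REFRESH MATERIALIZED VIEW CONCURRENTLY mv;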

And there are some other ways, like creating another view (say mv_new) with
exactly the same definition, refreshing that at whatever interval you want,
and then switching it with the original one using the rename command:


ALTER MATERIALIZED VIEW mv RENAME TO mv_old;
ALTER MATERIALIZED VIEW mv_new RENAME TO mv;
DROP MATERIALIZED VIEW mv_old;

Or

have a log table created (say transaction_log) and populate it using
triggers with each delta insert/update/delete on the transaction table. Then
schedule a cron job which periodically applies the rows from the
transaction_log table to a regular table that stands in for the materialized
view (a materialized view itself can't be written to directly). This will
achieve your incremental refresh.


Re: Long running query causing XID limit breach

2024-06-13 Thread yudhi s
On Sat, Jun 8, 2024 at 2:51 PM sud  wrote:

>
> Thank You so much Laurenz and Yudhi.
>
> Yes its RDS and as you mentioned there does exist a space limitation of
> ~64TB but as Laurenz mentioned the only time the second standby may crash
> would be probably because of  the storage space saturation and thus we need
> to have appropriate monitoring in place to find this and get alerted
> beforehand. And also a monitoring to see how much WAL gets generated per
> hour/day to get an idea of the usage. I am not sure how to do it , but will
> check on this.
>


Not exactly related, but just for our information: while going through the
"Aurora Postgres" database docs on concepts similar to those being discussed
here, I found some interesting stuff.

https://aws.amazon.com/blogs/database/manage-long-running-read-queries-on-amazon-aurora-postgresql-compatible-edition/




Cancel the conflicting query on the reader node if the conflict lasts
longer than max_standby_streaming_delay (maximum 30 seconds). This is
different from Amazon RDS or self-managed PostgreSQL. With Amazon RDS or
self-managed PostgreSQL, the instance has its own physical copy of the
database, and you’re able to set the parameter max_standby_streaming_delay
as high as you want to prevent query cancellation. If the conflicting query
can’t cancel in time, or if multiple long-running queries are causing the
replication lag to go beyond 60 seconds, Aurora restarts the reader node to
ensure it’s not lagging far behind the primary node.

So if I get it correctly, it means that even if hot_standby_feedback is set
to OFF, the constraints of max_standby_streaming_delay (maximum 30 seconds)
and the 60-second replication lag limit still apply, and thus Aurora may
cancel long-running queries or restart reader nodes to maintain
synchronization even if a query runs for just over 60 seconds. So it's
really odd, but does that mean there is no way to guarantee that a query can
run for more than 60 seconds on a read replica in Aurora Postgres?


Re: Question on pg_cron

2024-06-08 Thread yudhi s
On Sat, 8 Jun, 2024, 9:53 pm Ron Johnson,  wrote:

> On Sat, Jun 8, 2024 at 5:31 AM yudhi s 
> wrote:
>
>> Hello All,
>>
>> We have around 10 different partition tables for which the partition
>> maintenance is done using pg_partman extension. These tables have foreign
> >> key dependency between them. We just called partman.run_maintenance_proc()
>> through pg_cron without any parameters and it was working fine. So we can
>> see only one entry in the cron.job table. And it runs daily once.
>>
>> It was all working fine and we were seeing the historical partition being
>> dropped and new partitions being created without any issue. But suddenly we
>> started seeing, its getting failed with error "ERROR: can not drop
>> schema1.tab1_part_p2023_12_01 because other objects depend on it"
>>
>
> Have you changed version lately of PG, pg_cron or pg_partman?  Or maybe
> what pg_cron or pg_partman depends on?
>

No version change, but we updated the part_config to set premake from 30 to
60 for all the tables. But not sure how that impacted this behavior.

However, do you think we should better control the order of execution rather
than letting pg_partman decide on its own, considering there is no parameter
for this ordering in part_config? And in that case, which of the two
approaches I mentioned should we use? Or are there other strategies we
should follow? Please advise.



Question on pg_cron

2024-06-08 Thread yudhi s
Hello All,

We have around 10 different partition tables for which the partition
maintenance is done using the pg_partman extension. These tables have
foreign key dependencies between them. We just called
partman.run_maintenance_proc() through pg_cron without any parameters and it
was working fine, so we can see only one entry in the cron.job table, and it
runs once daily.

It was all working fine and we were seeing the historical partitions being
dropped and new partitions being created without any issue. But suddenly we
started seeing it fail with the error "ERROR: can not drop
schema1.tab1_part_p2023_12_01 because other objects depend on it".

Then we realized it may be the case that it's trying to run the partition
maintenance for the parent partitioned table before the child table. We are
not sure how pg_partman handles the ordering of tables while doing partition
maintenance, as we don't see any parameter to drive the ordering of the
partition maintenance in part_config, and we were under the impression that
pg_partman would take care of the sequence of partition maintenance
automatically.

So want to understand if anybody encountered such issues?

And to handle the above issue, we are planning to call the partition
maintenance for each TABLE by passing the table name explicitly to
run_maintenance_proc(), something as below. Is this advisable?

SELECT cron.unschedule('run_maintenance_proc');
SELECT cron.schedule(
'daily_partition_maintenance',
'0 2 * * *',
-- distinct dollar-quote tags so the inner DO block does not terminate the
-- outer string passed to cron.schedule
$job$
DO $do$
BEGIN
-- Run maintenance for child tables first
PERFORM partman.run_maintenance_proc('schema1.child_table1');
PERFORM partman.run_maintenance_proc('schema1.child_table2');
-- Add more child tables as needed

-- Run maintenance for parent tables next
PERFORM partman.run_maintenance_proc('schema1.parent_table1');
PERFORM partman.run_maintenance_proc('schema1.parent_table2');
-- Add more parent tables as needed
END;
$do$;
$job$
);



Or else
create a function like below and then call/schedule it through pg_cron

CREATE OR REPLACE FUNCTION run_partition_maintenance()
RETURNS void AS $$
BEGIN
-- Run maintenance for child tables first
PERFORM partman.run_maintenance_proc('schema1.child_table1');
PERFORM partman.run_maintenance_proc('schema1.child_table2');
-- Add more child tables as needed

-- Run maintenance for parent tables next
PERFORM partman.run_maintenance_proc('schema1.parent_table1');
PERFORM partman.run_maintenance_proc('schema1.parent_table2');
-- Add more parent tables as needed
END;
$$ LANGUAGE plpgsql;

SELECT cron.unschedule('run_maintenance_proc');

SELECT cron.schedule(
'daily_partition_maintenance',
'0 2 * * *',
'CALL run_partition_maintenance()'
);


Re: Long running query causing XID limit breach

2024-06-05 Thread yudhi s
On Wed, Jun 5, 2024 at 3:52 PM Laurenz Albe 
wrote:

>
> There should never be a restart unless you perform one or the standby
> crashes.
> If you mean that you want to avoid a crash caused by a full disk on the
> standby,
> the answer is probably "no".  Make sure that you have enough disk space and
> use monitoring.
>
> Yours,
> Laurenz Albe
>

Is this because the OP initially mentioned it's RDS Postgres, and in that
case there is a storage space restriction of 64TB (128TB in the case of
Aurora Postgres)? I believe this storage space combines data + WAL, so in
that case, as you mentioned, appropriate monitoring needs to be put in
place. Otherwise, in the worst-case scenario, if the storage consumption
hits that hard limit, will there be an instance restart or crash?


Re: Long running query causing XID limit breach

2024-05-28 Thread yudhi s
On Mon, May 27, 2024 at 2:50 PM Laurenz Albe 
wrote:

>
> > > But again for HA , in case primary down we should not be in big lag for the standby
> > > and thus we want the standby also with minimal lag. And as you mentioned there will
> > > never be incorrect results but at max it will be query cancellation, so I was thinking,
> > > if it's fine to just keep the "hot_standby_feedback" as OFF and let the
> > > max_standby_streaming_delay set as it is like 14 sec. Let me know your thoughts.
> >
> You cannot have it.
> Let me repeat: you cannot have it.
>
> The only way you can have no delay in replication AND no canceled queries
> is
> if you use two different standby servers with different settings for
> "max_standby_streaming_delay".  One of the server is for HA, the other for
> your long-running queries.
>
When you suggest having different max_standby_streaming_delay values for the
first replica (say 10 sec, for high availability) and the second replica
(say -1, for long-running queries), do you also suggest keeping
"hot_standby_feedback" OFF on all three instances, i.e. the master and both
replicas?

Also, the OP has added a few other parameters as below; do you think these
are needed? I think the master and the first replica should have the same
setup, because in case of any disaster to the master the first replica
should be able to take its place.

Parameter                              Master/Primary   First Replica (HA)   Second Replica (Reporting)
hot_standby_feedback                   ON               ON                   OFF
max_standby_streaming_delay            10 sec           10 sec               -1 (infinite)
statement_timeout                      2hrs             2hrs                 none (infinite)
idle_in_transaction_session_timeout    10 minutes       10 minutes           none (infinite)
autovacuum_freeze_max_age              100M             100M                 100M
log_autovacuum_min_duration            0                0                    0
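
For reference, applying the two standby-side parameters would look roughly
like the below on a self-managed replica; on RDS the same values would go
into each replica's parameter group instead (just a sketch):

-- HA replica (both parameters only take effect on a standby)
ALTER SYSTEM SET hot_standby_feedback = on;
ALTER SYSTEM SET max_standby_streaming_delay = '10s';
SELECT pg_reload_conf();

-- Reporting replica
ALTER SYSTEM SET hot_standby_feedback = off;
ALTER SYSTEM SET max_standby_streaming_delay = -1;
SELECT pg_reload_conf();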


Re: Long running query causing XID limit breach

2024-05-25 Thread yudhi s
On Fri, May 24, 2024 at 10:34 AM sud  wrote:

> I am trying to understand these two parameters, and each time it gets a bit
> confusing to me whether they complement or conflict with each other.
>
> Say for example, if we set hot_standby_feedback to ON (which is currently
> set to ON by default, by the way), it will make the primary hold back
> cleanup until the query on the standby completes, and that can cause a big
> bump in XID age in scenarios where the query on the standby runs for days
> (as happens in our current scenario). So we were thinking of setting it to
> OFF, to avoid the transaction ID wraparound issue.
>
> But as you also mentioned, setting "max_standby_streaming_delay" to -1
> (currently 14 seconds in our case) makes the standby wait indefinitely
> until the query completes, and it won't apply the WAL that would remove the
> row versions the standby query is reading. But won't this same behaviour
> also happen while we have hot_standby_feedback set to "ON"?
>
> But again, for HA, if the primary goes down we should not have a big lag on
> the standby, so we want the standby to have minimal lag too. And as you
> mentioned, there will never be incorrect results but at most query
> cancellation, so I was thinking it might be fine to just keep
> "hot_standby_feedback" OFF and leave max_standby_streaming_delay as it is,
> at 14 sec. Let me know your thoughts.
>
> Basically, the combinations I am confused between are:
>
> hot_standby_feedback ON and max_standby_streaming_delay = -1
> or
> hot_standby_feedback OFF and max_standby_streaming_delay = -1
> or
> hot_standby_feedback ON and max_standby_streaming_delay = 14 sec
> or
> hot_standby_feedback OFF and max_standby_streaming_delay = 14 sec
>

As per my understanding, the behaviour of each combination would be as below;
others may comment.



*hot_standby_feedback ON and max_standby_streaming_delay = -1:*
Ensures that long-running queries on the standby are not interrupted. The
primary holds back vacuum cleanup indefinitely so that rows needed by standby
queries are not removed. But this can lead to bloat and a held-back XID
horizon on the primary, potentially causing transaction ID wraparound issues.


*hot_standby_feedback OFF and max_standby_streaming_delay = -1:*
Ensures long-running queries on the standby are not interrupted. No feedback
is sent to the primary, so vacuum there is not held back and the wraparound
risk is reduced. But the standby may fall significantly behind the primary
while WAL replay waits, resulting in high replication lag.


*hot_standby_feedback ON and max_standby_streaming_delay = 14 seconds:*
The primary avoids vacuuming rows still needed by standby queries, reducing
query cancellations on the standby, and replication lag from conflicts is
limited to about 14 seconds. But long-running queries on the standby can
still be canceled when a conflict does occur after 14 seconds, and the
primary still pays the bloat / held-back-horizon cost.


*hot_standby_feedback OFF and max_standby_streaming_delay = 14 seconds:*
Limits replication lag from conflicts to 14 seconds and does not hold back
vacuum on the primary. But standby queries that conflict with replay for more
than 14 seconds are canceled, so long-running queries are the most likely to
be canceled with this combination.
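
Whichever combination is chosen, the effect can be watched with something
like the below (a rough sketch):

-- Replication lag and the xmin each standby is holding back via feedback
SELECT application_name, replay_lag, backend_xmin
FROM pg_stat_replication;

-- How close each database is to transaction ID wraparound
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;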


Re: Long running query causing XID limit breach

2024-05-23 Thread yudhi s
On Thu, May 23, 2024 at 11:42 AM sud  wrote:

>
>> Calculation Rationale
>> Daily XID Usage: Approximately 4 billion rows per day implies high XID
>> consumption.
>> Buffer Time: At 1 billion XIDs, you would still have 1 billion XIDs
>> remaining, giving you roughly 12 hours to address the issue if your system
>> consumes 200 million XIDs per hour.
>>
>>
>>
>
The OP mentioned that the number of business transactions is around 500
million per day, but the rows inserted across the many tables total ~4-5
billion per day. So doesn't that mean the XID consumption depends on the
number of database transactions rather than the number of rows? Say, for
example, ~4 billion rows are loaded with a batch size of ~1000 using bulk
load: that is ~4 million transactions, so it should use ~4 million XIDs, not
4 billion. Thus processing the data in batches rather than row by row
minimizes the XID usage. Correct?
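
For example, a quick check like the below (some_tab is just a hypothetical
table) shows that one batched statement consumes a single XID no matter how
many rows it writes:

BEGIN;
SELECT txid_current();                         -- XID assigned to this transaction
INSERT INTO some_tab (val)
SELECT g FROM generate_series(1, 1000) AS g;   -- 1000 rows, still the same transaction
SELECT txid_current();                         -- same value as above
COMMIT;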


Re: Question on roles and privileges

2024-05-09 Thread yudhi s
On Fri, May 10, 2024 at 11:31 AM Lok P  wrote:

> For the initial installation the extensions may need superuser privileges.
>
>>
>>
Thank you. Yes, I got it. The initial installation of the extensions will
need superuser privilege. But once that is done, for day-to-day use do these
extensions need any write/execute privileges, or are read-only privileges
enough? And is there any ready-made role through which all of these are
catered for?


Question on roles and privileges

2024-05-09 Thread yudhi s
Hello All,
We want to keep minimal privileges for users based on their roles and
responsibilities. We have one user group who will be working on
analyzing/debugging performance issues in the databases. Basically this group
will be operating on extensions like apg_plan_management, pg_hint_plan,
auto_explain, plprofiler and pg_repack. These extensions will already be
installed for the group; they just need to use them appropriately. For
example, pg_hint_plan will not need any write privilege because the user just
has to put the hint in the query and run it to see any performance variation.

So, what minimal privileges will each of these extensions need to work for
this performance group? Will any of them need write privileges, or can all
the work be performed with read-only roles/privileges only?

And I understand the pg_monitor role wraps up most of the key read-only
privileges needed to work on performance issues, and is itself read-only. So
I wanted to check with the experts here whether that is true and whether the
pg_monitor role will suffice for all the above work.
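
To make the question concrete, what I have in mind is roughly the below (role
and object names are hypothetical):

-- Read-only monitoring access for the performance group
CREATE ROLE perf_role NOLOGIN;
GRANT pg_monitor TO perf_role;
GRANT perf_role TO some_analyst_user;

-- pg_hint_plan needs no extra privilege at query time; the hint rides in the query text
/*+ SeqScan(t) */
SELECT count(*) FROM some_table t;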

Regards
Yudhi


Re: How you make efficient design for CDC and book marking

2024-04-25 Thread yudhi s
If you are worried about the audit trigger and the additional audit table to
hold all the deleted rows, then you could add a flag column (say
deleted_flag) to your base table. When a delete happens, your code just
updates that flag instead of physically deleting the record from the base
table, which avoids the performance impact of a row-level trigger on the base
tables. It will make the main table bulkier, but it has less impact on the
incoming data load to the base/main tables.
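
A minimal sketch of what I mean (base_table, id and the bookmark value are
hypothetical):

-- Soft-delete marker instead of a physical delete
ALTER TABLE base_table ADD COLUMN deleted_flag boolean NOT NULL DEFAULT false;
ALTER TABLE base_table ADD COLUMN deleted_at timestamptz;

-- The application "delete" becomes an update of the flag
UPDATE base_table
SET deleted_flag = true,
    deleted_at = now()
WHERE id = 12345;

-- The CDC extract can then bookmark deletes the same way as inserts/updates
SELECT id
FROM base_table
WHERE deleted_flag = true
  AND deleted_at > '2024-04-25 00:00:00+00';   -- last bookmark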

On Fri, Apr 26, 2024 at 1:56 AM Lok P  wrote:

> Hello,
> My understanding is that the replication tools normally rely on the
> database transaction logs to find the CDC/delta
> changes(Insert/Update/Delete) for tables and then move those delta changes
> to the target system/databases. Whatever may be the source database (It
> might be open source postgres or aws RDS). And never done though, but i am
> assuming , manually scanning the DB logs must not be easy ones and also
> might not be given access to DB logs because of security reasons too. Hope
> my understanding is correct here.
>
> Thus, in absence of such replication tool(may be because of the additional
> cost associated etc) if someone wants to find the delta changes
> (insert/update/delete) in a database as efficiently as possible and move
> those to the target database in a continuous data streaming setup, I can
> only think of below option
>
> i.e maintaining audit columns like create_timestamp/update_timestamp
> columns in every source table so that they can be utilized to get the CDC
> for Insert and Update statements and also for bookmarking. But to find the
> delta for the deletes , there is not much option but to have row level
> triggers created on the base table which will populate another audit table
> with the deleted rows, and this is going to crawl if we get a lot of
> deletes(in millions) on the source tables.
>
> Want to know from experts, if there exists any other way to have these
> manual CDC and book marking more efficient for such continuous delta data
> movement scenarios?
>
> Regards
> Lok
>


Re: error in trigger creation

2024-04-21 Thread yudhi s
On Mon, 22 Apr, 2024, 1:34 am Ron Johnson,  wrote:

> On Sun, Apr 21, 2024 at 2:58 PM yudhi s 
> wrote:
>
>> the partition drop from parent is taking longer as it scans all the
>> partitions of the child table
>>
>
> Does the relevant supporting index exist on the child table?
>

Yes, all the child tables have their foreign keys indexed.

I don't want to divert the main topic (trigger creation), but I will provide
an exact test case showing how it takes an exclusive lock and runs longer as
it scans all child partitions (which should be using the indexes); as the
number of partitions increases, the time to drop a partition increases.

>


Re: error in trigger creation

2024-04-21 Thread yudhi s
On Mon, Apr 22, 2024 at 12:02 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

>
> I suggest you share a script that demonstrates exactly what you are trying
> to accomplish.  Which event triggers you need to create from the
> application and what the functions those triggers call do.
>
>
We are using pg_partman for automatic partition maintenance. However, as we
have foreign keys created on the tables, dropping a partition from the parent
takes longer because it scans all the partitions of the child table, and it
also locks the full child table for that duration (even SELECT queries are
not allowed during that period). So we are thinking of creating the foreign
keys on the individual partitions rather than on the tables, but there is no
direct option for that through pg_partman.

So the plan is to first create the table without any foreign keys, create the
partitions using pg_partman, and then create the event trigger below, which
will add the foreign key to every new partition as it is created. We plan to
create such an event trigger for every partitioned child table that has FKs.

CREATE OR REPLACE FUNCTION add_partition_foreign_key()
RETURNS event_trigger
LANGUAGE plpgsql
AS $$
DECLARE
    obj record;
    partition_table TEXT;
    parent_table    TEXT;
    date_suffix     TEXT;
BEGIN
    -- Event triggers have no NEW/TG_TABLE_NAME; the objects just created are
    -- read from pg_event_trigger_ddl_commands() instead.
    FOR obj IN
        SELECT * FROM pg_event_trigger_ddl_commands()
        WHERE command_tag = 'CREATE TABLE'
    LOOP
        partition_table := obj.object_identity;   -- e.g. schema1.child_table1_p2024_04_21
        -- Assumption: the matching parent partition is derived from a trailing
        -- YYYY_MM_DD-style suffix in the new partition's name.
        date_suffix := substring(partition_table from '(\d{4}_\d{2}_\d{2})$');
        CONTINUE WHEN date_suffix IS NULL;         -- not a date-suffixed partition; skip it
        parent_table := 'parent_table_' || date_suffix;

        EXECUTE format(
            'ALTER TABLE %s ADD CONSTRAINT %I FOREIGN KEY (partition_key, id) '
            'REFERENCES %s (partition_key, id)',
            partition_table,
            'fk_' || replace(partition_table, '.', '_'),
            parent_table);
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER add_partition_foreign_key_trigger
ON ddl_command_end
WHEN TAG IN ('CREATE TABLE')
EXECUTE FUNCTION add_partition_foreign_key();


Re: error in trigger creation

2024-04-21 Thread yudhi s
On Sun, Apr 21, 2024 at 8:13 PM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > On Sunday, April 21, 2024, yudhi s  wrote:
> >> Are you saying something like below, in which we first create the
> >> function from super user and then execute the grant? But doesn't that
> mean,
> >> each time we want to create a new event trigger we have to be again
> >> dependent on the "super user" to modify the security definer function?
>
> > Dynamic SQL.  See “execute” in plpgsql.
>
> You might as well just give that user superuser and be done with it.
> It's foolish to imagine that you have any shred of security left
> if you're letting a user that's not 100.00% trusted write event
> triggers.  (Much less execute any SQL command whatsoever, which
> is what it sounds like David is suggesting you create a function
> to do.)
>
>
So do you mean we should not create the event trigger through a "security
definer" function, but rather have the superuser do it each time we have to
create an event trigger?

Actually, I am not very aware of the security aspects, but is it fine to give
superuser privilege to the application user (say app_user), from which
scripts/procedures are normally executed by the application, given that no
individual person can log in as that user?

Additionally, in other databases triggers are governed by specific privileges
(for example, Oracle's "create trigger" privilege). That does not need any
superuser, and we had many applications in which the application user (used
for app-to-app login) held such privileges, similar to the "create table"
privilege that comes by default with owning the schema. So I was wondering
whether an "event trigger" can pose some additional threat, and that is why
no privilege like "create trigger" exists in Postgres and it should be
treated cautiously?


Re: error in trigger creation

2024-04-21 Thread yudhi s
On Sun, Apr 21, 2024 at 7:55 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Sunday, April 21, 2024, yudhi s  wrote:
>
>> On Sun, Apr 21, 2024 at 1:55 PM David G. Johnston <
>> david.g.johns...@gmail.com> wrote:
>>
>>> On Sunday, April 21, 2024, yudhi s  wrote:
>>>
>>>> so that it will be able to assign the privilege, so we will be able to
>>>> create the event trigger without need to run the event trigger script from
>>>> super user itself?
>>>>
>>>
>>> Write a security-definer function owned by superuser and grant app_user
>>> permission to execute it.
>>>
>>> David J.
>>>
>>>
>>
>> Thank You David.
>>
>>  Are you saying something like below, in which we first create the
>> function from super user and then execute the grant? But doesn't that mean,
>> each time we want to create a new event trigger we have to be again
>> dependent on the "super user" to modify the security definer function?
>>
>
> Dynamic SQL.  See “execute” in plpgsql.
>
> David J.
>
>

Even if we create the event trigger through a "security definer" function
that embeds the "CREATE EVENT TRIGGER" in its body using dynamic SQL
(something as below), then in future, if we need to create another event
trigger, we have to update and recreate the function, and for that we will
again need the superuser. Is my understanding correct here?
Or
will it only need the superuser to create the function the first time, and
after that the user who has been granted EXECUTE (say app_user) will be able
to modify and recreate the function body?

CREATE OR REPLACE FUNCTION create_event_trigger_func()
  RETURNS void
  LANGUAGE plpgsql
  SECURITY DEFINER
  AS $$
  BEGIN
    EXECUTE 'CREATE EVENT TRIGGER event_trigger_name ON ddl_command_end ...';
  END;
$$;

GRANT EXECUTE ON FUNCTION create_event_trigger_func() TO app_user;


Re: error in trigger creation

2024-04-21 Thread yudhi s
On Sun, Apr 21, 2024 at 1:55 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Sunday, April 21, 2024, yudhi s  wrote:
>
>> so that it will be able to assign the privilege, so we will be able to
>> create the event trigger without need to run the event trigger script from
>> super user itself?
>>
>
> Write a security-definer function owned by superuser and grant app_user
> permission to execute it.
>
> David J.
>
>

Thank You David.

Are you saying something like below, where we first create the function as
the superuser and then execute the grant? But doesn't that mean that each
time we want to create a new event trigger we are again dependent on the
superuser to modify the security definer function?

CREATE OR REPLACE FUNCTION create_event_trigger_func()
  RETURNS void
  LANGUAGE plpgsql
  SECURITY DEFINER
  AS $$
  BEGIN
    EXECUTE 'CREATE EVENT TRIGGER event_trigger_name ON ddl_command_end ...';
  END;
$$;

GRANT EXECUTE ON FUNCTION create_event_trigger_func() TO app_user;


error in trigger creation

2024-04-21 Thread yudhi s
Hi All,
We are seeing privilege issues while creating event triggers. It says the
user "*must be a superuser to create an event trigger*".

So my question is: we have an application user "app_user" that is responsible
for creating database objects in schema "app_schema", and all scripts are
executed in the database through "app_user" only. Is there any way to grant
the necessary privilege to app_user so that we can create event triggers
without having to run the event trigger script as the superuser itself? This
would ensure that we do not need the superuser every time we want to run
scripts that create an event trigger for our application schema.
Something like "*grant create event trigger on schema app_schema to
app_user*"?
Regards
Yudhi


Re: Controlling resource utilization

2024-04-17 Thread yudhi s
On Wed, 17 Apr, 2024, 12:40 pm ,  wrote:

>
>
> --
>
> From: "Juan Rodrigo Alejandro Burgos Mella"
> To: "yudhi s"
> Cc: "pgsql-general"
> Sent: Tuesday, 16 April 2024 22:29:35
> Subject: Re: Controlling resource utilization
>
> ALTER ROLE  SET statement_timeout = '';
>
> Regards
> Gilles
>
>

Thank you so much. That helps.

The statement succeeded for the user when I executed it, so I believe it is
working.

But to verify it immediately, without manually running queries and waiting
for them to be auto-killed, is there any system catalog we can check to see
whether the setting is effective? I don't see any such column in pg_user or
pg_roles that shows the statement_timeout.

And is there a way to put similar caps/restrictions on other DB resources
like CPU, memory and I/O at a specific user/role level?
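
On the first point, if my understanding is right that ALTER ROLE ... SET
lands in the pg_db_role_setting catalog, something like the below should show
whether the timeout got attached to the role (a sketch):

SELECT r.rolname, d.datname, s.setconfig
FROM pg_db_role_setting s
LEFT JOIN pg_roles r ON r.oid = s.setrole
LEFT JOIN pg_database d ON d.oid = s.setdatabase;

-- or simply, from a session logged in as that user:
SHOW statement_timeout;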

>


Re: Controlling resource utilization

2024-04-16 Thread yudhi s
On Wed, 17 Apr, 2024, 1:32 am Juan Rodrigo Alejandro Burgos Mella, <
rodrigoburgosme...@gmail.com> wrote:

> Yes sir
>
> SET statement_timeout TO ''
>
> Atte
> JRBM
>
> On Tue, 16 Apr 2024 at 14:46, yudhi s () wrote:
>
>> Hi ,
>> We want to have controls around the DB resource utilization by the adhoc
>> user queries, so that it won't impact the application queries negatively.
>> Its RDS postgresql database version 15.4.
>>
>> Saw one parameter as statement_timeout which restricts the queries to not
>> run after a certain time duration and queries will be automatically
>> killed/cancelled. However, I don't see any other options to set this at
>> user level, rather it's getting set for all or at session level. So I want
>> to know if there exists, anyway to control the database resource
>> utilization specific to users?
>>
>> Regards
>> Yudhi
>>
>

This will set the timeout at the session level. However, we want to
understand whether it can be done at the user/role level, so that any such ad
hoc user queries are automatically cancelled after the set time.

>
>>


Controlling resource utilization

2024-04-16 Thread yudhi s
Hi,
We want to have controls around DB resource utilization by ad hoc user
queries, so that they don't impact the application queries negatively. It is
an RDS PostgreSQL database, version 15.4.

I saw the statement_timeout parameter, which stops queries from running
beyond a certain duration; such queries are automatically cancelled. However,
I don't see an option to set this at the user level; it gets set globally or
at the session level. So I want to know whether there is any way to control
database resource utilization for specific users?

Regards
Yudhi


Re: Issue with date/timezone conversion function

2024-04-09 Thread yudhi s
Below should work:

date_trunc('hour', timestamp_column AT TIME ZONE 'America/New_York')
  + (date_part('minute', timestamp_column AT TIME ZONE 'America/New_York')::int / 15)
    * interval '15 min'
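
Used inside the counting query it would look roughly like below
(transaction_tab is a placeholder for the actual table name):

SELECT date_trunc('hour', create_timestamp AT TIME ZONE 'America/New_York')
       + (date_part('minute', create_timestamp AT TIME ZONE 'America/New_York')::int / 15)
         * interval '15 min'  AS bucket_15min,
       count(*)               AS txn_count
FROM transaction_tab
WHERE create_timestamp >= CURRENT_DATE - 10
GROUP BY 1
ORDER BY 1;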

On Tue, Apr 9, 2024 at 11:54 PM Lok P  wrote:

>
> On Tue, Apr 9, 2024 at 10:33 PM Tom Lane  wrote:
>
>> Lok P  writes:
>> > These tables are INSERT only tables and the data in the create_timestamp
>> > column is populated using the now() function from the application, which
>> > means it will always be incremental, and the historical day transaction
>> > count is going to be the same. However surprisingly the counts are
>> changing
>> > each day when the user fetches the result using the below query. So my
>> > question was , if there is any issue with the way we are fetching the
>> data
>> > and it's making some date/time shift which is why the transaction count
>> > looks to be changing even for the past days data?
>>
>> Well, your cutoff time "CURRENT_TIMESTAMP - INTERVAL '10 day'" is
>> constantly moving, so that'd account for shifts in what's perceived
>> to belong to the oldest day.  Maybe you want "CURRENT_DATE - 10"
>> instead?
>>
>> > And also somehow this
>> > conversion function "DATE_TRUNC('hour', create_timestamp AT TIME ZONE
>> > 'EST')" is showing time in CST but not EST, why so?
>>
>> 'EST' is going to rotate to UTC-5, but that's probably not what
>> you want in the summer.  I'd suggest AT TIME ZONE 'America/New_York'
>> or the like.  See
>>
>>
>> https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES
>>
>> regards, tom lane
>>
>
>
>  Thank you so much. You are correct. The AT TIME ZONE 'America/New_York'
> is giving correct EST time conversion.
>
> But I think regarding why it looks to be shifting i.e. the same time
> duration appears to be holding a different count of transactions while the
> base table is not getting updated/inserted/deleted for its historical
> create_timestamps, I suspect the below conversion part.
>
> The task is to count each ~15minutes duration transaction and publish in
> ordered fashion i.e. something as below, but the way it's been written
> seems wrong. It's an existing script. It first gets the date component with
> truncated hour and then adds the time component to it to make it ~15minutes
> interval. Can it be written in some simple way?
>
> 9-apr-2024 14:00 12340
> 9-apr-2024 14:15 12312
> 9-apr-2024 14:30 12323
> 9-apr-2024 14:45 12304
>
> DATE_TRUNC('hour', create_timestamp AT TIME ZONE 'America/New_York') +
> (EXTRACT(MINUTE FROM create_timestamp AT TIME ZONE 'America/New_York')
> / 15 * 15) * INTERVAL '15 minute'
>


Re: Moving delta data faster

2024-04-09 Thread yudhi s
On Sun, Apr 7, 2024 at 2:25 AM Adrian Klaver 
wrote:

>
> I have no idea how this works in the code, but my suspicion is it is due
> to the following:
>
> https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT
>
> "The optional ON CONFLICT clause specifies an alternative action to
> raising a unique violation or exclusion constraint violation error. For
> each individual row proposed for insertion, either the insertion
> proceeds, or, if an arbiter constraint or index specified by
> conflict_target is violated, the alternative conflict_action is taken.
> ON CONFLICT DO NOTHING simply avoids inserting a row as its alternative
> action. ON CONFLICT DO UPDATE updates the existing row that conflicts
> with the row proposed for insertion as its alternative action."
>
> vs this:
>
> "First, the MERGE command performs a join from data_source to
> target_table_name producing zero or more candidate change rows. For each
> candidate change row, the status of MATCHED or NOT MATCHED is set just
> once, after which WHEN clauses are evaluated in the order specified. For
> each candidate change row, the first clause to evaluate as true is
> executed. No more than one WHEN clause is executed for any candidate
> change row."
>
> Where ON CONFLICT attempts the INSERT then on failure does the UPDATE
> for the ON CONFLICT DO UPDATE case. MERGE on the hand evaluates based on
> the join condition(ON tbl1.fld =tbl2.fld) and then based on MATCH/NOT
> MATCHED takes the appropriate action for the first WHEN match. In other
> words it goes directly to the appropriate action.
>
>
Thank you Adrian. I think you are spot on about the cause of the upsert being
slower than MERGE. Below are the explain plans I captured for both
operations. Even the planning time is smaller for the UPSERT, but because it
operates on the constraint, the UPDATE has to wait for each row of the INSERT
to fail on the PK before it can act, and thus it takes longer. MERGE is
evaluated on the join, i.e. it can directly get the set of rows that have to
be updated rather than waiting for the INSERT to fail on the PK constraint.

***** Explain plan for UPSERT *****

Insert on public.target_tab (cost=0.00..17353.00 rows=0 width=0) (actual
time=19957.569..19957.570 rows=0 loops=1)
Conflict Resolution: UPDATE
Conflict Arbiter Indexes: target_tab_pkey
Tuples Inserted: 50
Conflicting Tuples: 50
Buffers: shared hit=8545735 written=10094
-> Seq Scan on public.source_tab (cost=0.00..17353.00 rows=100
width=29) (actual time=0.006..208.306 rows=100 loops=1)
Output: source_tab.id, source_tab.column1, source_tab.column2
Buffers: shared hit=7353
Settings: effective_cache_size = '10475192kB', maintenance_io_concurrency =
'1', max_parallel_workers = '32', max_parallel_workers_per_gather = '4',
search_path = 'public, public, "$user"', temp_buffers = '16MB', work_mem =
'8MB'
Query Identifier: -1356019529835809419
Planning:
Buffers: shared hit=41
Planning Time: 0.199 ms
Execution Time: 19959.261 ms

***** Explain plan for Merge *****

Merge on public.target_tab t (cost=17368.00..53460.01 rows=0 width=0)
(actual time=14209.966..14209.968 rows=0 loops=1)
Tuples: inserted=50 updated=50
Buffers: shared hit=5040097 written=10460, temp read=4143 written=4143
I/O Timings: temp read=26.746 write=68.596
-> Hash Left Join (cost=17368.00..53460.01 rows=100 width=35) (actual
time=179.233..1332.264 rows=100 loops=1)
Output: t.ctid, s.column1, s.column2, s.id
Inner Unique: true
Hash Cond: (s.id = t.id)
Buffers: shared hit=11029, temp read=4143 written=4143
I/O Timings: temp read=26.746 write=68.596
-> Seq Scan on public.source_tab s (cost=0.00..17353.00 rows=100
width=29) (actual time=0.008..268.506 rows=100 loops=1)
Output: s.column1, s.column2, s.id
Buffers: shared hit=7353
-> Hash (cost=8676.00..8676.00 rows=50 width=10) (actual
time=178.101..178.102 rows=50 loops=1)
Output: t.ctid, t.id
Buckets: 524288 Batches: 2 Memory Usage: 14824kB
Buffers: shared hit=3676, temp written=977
I/O Timings: temp write=5.904
-> Seq Scan on public.target_tab t (cost=0.00..8676.00 rows=50
width=10) (actual time=0.007..66.441 rows=50 loops=1)
Output: t.ctid, t.id
Buffers: shared hit=3676
Settings: effective_cache_size = '10475192kB', maintenance_io_concurrency =
'1', max_parallel_workers = '32', max_parallel_workers_per_gather = '4',
search_path = 'public, public, "$user"', temp_buffers = '16MB', work_mem =
'8MB'
Query Identifier: -2297080081674771467
Planning:
Buffers: shared hit=85
Planning Time: 0.466 ms
Execution Time: 14212.061 ms


Re: Moving delta data faster

2024-04-06 Thread yudhi s
On Sat, Apr 6, 2024 at 10:25 PM Adrian Klaver 
wrote:

>
> Your original problem description was:
>
> "Then subsequently these rows will be inserted/updated based on the
> delta number of rows that got inserted/updated in the source database.
> In some cases these changed data can flow multiple times per day to the
> downstream i.e. postgres database and in other cases once daily."
>
> If the above is not a hard rule, then yes up to some point just
> replacing the data in mass would be the simplest/fastest method. You
> could cut a step out by doing something like TRUNCATE target_tab and
> then COPY target_tab FROM 'source.csv' bypassing the INSERT INTO
> source_tab.
>
Yes, I actually didn't realize that TRUNCATE is transactional in Postgres. In
other databases like Oracle it means downtime for read queries on the target
table, as the data vanishes from the table after the truncate (until the
reload completes) and the operation auto-commits. Thanks Veem for sharing
that option.

I also think truncate+load will be faster if the delta is large, but if it is
a handful of rows, say <5% of the table, then upsert/merge will perform
better. The downside of the truncate option is that it requires exporting all
of the data from the source to the S3 file, which may take longer than
bringing just the delta records. Correct me if I'm wrong.

However, I am still not able to understand why the upsert is less performant
than MERGE; could you throw some light on this please?


Re: Moving delta data faster

2024-04-06 Thread yudhi s
Thank you Adrian, Greg and Veem.

I tried writing a small routine to see how the performance differs across
these four approaches: upsert vs traditional update+insert vs MERGE vs
truncate+load.

Initially I thought upsert would perform the same as MERGE, as the logic
looks similar, but it seems to be the worst performer of all; I don't yet
know the reason why. Truncate+load seems to be the best performer. I hope I
am doing it correctly; please correct me if I'm wrong.

UPSERT approach execution time: 00:00:20.921343
UPSERT approach rows: 100

insert/update approach execution time: 00:00:15.53612
insert/update approach update rows: 50
insert/update approach insert rows: 50

MERGE approach execution time: 00:00:14.884623
MERGE approach rows: 100

truncate load approach execution time: 00:00:07.428826
truncate load rows: 100

********* Routine *********

***** UPSERT Testcase *****
drop table source_tab;
drop table target_tab;

CREATE TABLE source_tab (
id SERIAL PRIMARY KEY,
column1 VARCHAR(100),
column2 VARCHAR(100)
);

-- Create target table
CREATE TABLE target_tab (
id SERIAL PRIMARY KEY,
column1 VARCHAR(100),
column2 VARCHAR(100)
);

INSERT INTO source_tab (column1, column2)
SELECT
'Value ' || i,
'Value ' || (i * 2)
FROM generate_series(1, 100) AS i;

INSERT INTO target_tab (column1, column2)
SELECT
'Value ' || i,
'Value ' || (i * 2)
FROM generate_series(1, 50) AS i;

DO $$
DECLARE
start_time timestamp;
end_time timestamp;
rows_inserted integer:=0;
rows_updated integer:=0;
rows_upserted integer:=0;
rows_merged integer:=0;
BEGIN
-- Measure performance of UPSERT
start_time := clock_timestamp();
INSERT INTO target_tab (id, column1, column2)
SELECT id, column1, column2
FROM source_tab
ON CONFLICT (id) DO UPDATE
SET
column1 = EXCLUDED.column1,
column2 = EXCLUDED.column2;
get diagnostics rows_upserted=row_count;
end_time := clock_timestamp();
RAISE NOTICE 'UPSERT approach execution time: %', end_time - start_time;
RAISE NOTICE 'UPSERT approach rows: %', rows_upserted;

rollback;
END $$;


***** Traditional Insert+update Testcase *****
drop table source_tab;
drop table target_tab;

CREATE TABLE source_tab (
id SERIAL PRIMARY KEY,
column1 VARCHAR(100),
column2 VARCHAR(100)
);

-- Create target table
CREATE TABLE target_tab (
id SERIAL PRIMARY KEY,
column1 VARCHAR(100),
column2 VARCHAR(100)
);

INSERT INTO source_tab (column1, column2)
SELECT
'Value ' || i,
'Value ' || (i * 2)
FROM generate_series(1, 100) AS i;

INSERT INTO target_tab (column1, column2)
SELECT
'Value ' || i,
'Value ' || (i * 2)
FROM generate_series(1, 50) AS i;

DO $$
DECLARE
start_time timestamp;
end_time timestamp;
rows_inserted integer:=0;
rows_updated integer:=0;
rows_upserted integer:=0;
rows_merged integer:=0;
BEGIN
-- Measure performance of insert/update approach
start_time := clock_timestamp();
-- Update existing records
UPDATE target_tab AS t
SET
column1 = s.column1,
column2 = s.column2
FROM source_tab AS s
WHERE t.id = s.id;
get diagnostics rows_updated=row_count;

-- Insert new records
INSERT INTO target_tab (id, column1, column2)
SELECT s.id, s.column1, s.column2
FROM source_tab AS s
LEFT JOIN target_tab AS t ON s.id = t.id
WHERE t.id IS NULL;
get diagnostics rows_inserted=row_count;

end_time := clock_timestamp();
RAISE NOTICE 'insert/update approach execution time: %', end_time -
start_time;
RAISE NOTICE 'insert/update approach update rows : %', rows_updated;
RAISE NOTICE 'insert/update approach Insert rows: %', rows_inserted;

rollback;
END $$;


***** MERGE Testcase *****
drop table source_tab;
drop table target_tab;

CREATE TABLE source_tab (
id SERIAL PRIMARY KEY,
column1 VARCHAR(100),
column2 VARCHAR(100)
);

-- Create target table
CREATE TABLE target_tab (
id SERIAL PRIMARY KEY,
column1 VARCHAR(100),
column2 VARCHAR(100)
);

INSERT INTO source_tab (column1, column2)
SELECT
'Value ' || i,
'Value ' || (i * 2)
FROM generate_series(1, 100) AS i;

INSERT INTO target_tab (column1, column2)
SELECT
'Value ' || i,
'Value ' || (i * 2)
FROM generate_series(1, 50) AS i;

DO $$
DECLARE
start_time timestamp;
end_time timestamp;
rows_inserted integer:=0;
rows_updated integer:=0;
rows_upserted integer:=0;
rows_merged integer:=0;
begin

start_time := clock_timestamp();

MERGE INTO target_tab t
USING source_tab s
ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET column1 = s.column1,
               column2 = s.column2
WHEN NOT MATCHED THEN
    INSERT VALUES (s.id, s.column1, s.column2);
get diagnostics rows_merged=row_count;

end_time := clock_timestamp();
RAISE NOTICE 'MERGE approach execution time: %', end_time - start_time;
RAISE NOTICE 'MERGE approach rows: %', rows_merged;

rollback;
END $$;


***** Truncate+load Testcase *****
drop table source_tab;
drop table target_tab;

CREATE TABLE source_tab (
id SERIAL PRIMARY KEY,
column1 VARCHAR(100),
column2 VARCHAR(100)
);

-- Create target table
CREATE TABLE target_tab (
id SERI

Re: Moving delta data faster

2024-04-04 Thread yudhi s
On Thu, Apr 4, 2024 at 9:04 PM Adrian Klaver 
wrote:

> On 4/3/24 22:24, yudhi s wrote:
> >
> > On Thu, Apr 4, 2024 at 10:16 AM Adrian Klaver wrote:
>
> S3 is not a database. You will need to be more specific about '... then
> from the S3 it will be picked and gets merged to the target postgres
> database.'
>
>
The data from S3 will be dumped into a stage table, and then upserted/merged
from that stage table into the actual table.
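
A rough sketch of that flow (stage_tab, target_tab and the CSV file name are
placeholders):

-- 1) Refresh the stage table with the delta file pulled down from S3
TRUNCATE stage_tab;
\copy stage_tab (id, column1, column2) FROM 'delta.csv' WITH (FORMAT csv)

-- 2) Merge the staged delta into the target table
MERGE INTO target_tab t
USING stage_tab s ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET column1 = s.column1,
               column2 = s.column2
WHEN NOT MATCHED THEN
    INSERT (id, column1, column2) VALUES (s.id, s.column1, s.column2);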


Re: Moving delta data faster

2024-04-03 Thread yudhi s
On Thu, Apr 4, 2024 at 10:16 AM Adrian Klaver 
wrote:

> On 4/3/24 20:54, yudhi s wrote:
> > On Thu, Apr 4, 2024 at 2:41 AM Adrian Klaver wrote:
> >
> >> Thank you Adrian.
> >
> > And one thing i forgot to mention this target postgresql database would
> > be on AWS RDS whereas the source Oracle databases is on premise. I think
> > we don't have the FDW extension currently in place but we can get that.
> > I am just not able to understand clearly  though, but do you mean export
> > the data from source using CSV and do truncate and import on target. And
> > as these data will be moved through the network won't that cause
> slowness?
> >
> > The source database here is Oracle database. Correct me if wrong, it
> > looks like foreign data wrapper is like a DB link. Or do you mean
> > writing a query on the target database (which can be UPSERT or MERGE)
> > but will be joining the table from the source database through the
> > DBlink/DDW? But my question was whether we should use UPSERT or MERGE
> > for comparing and loading the delta records to the target postgresql
> > database. Want to understand which is more performant , as I see in the
> > past Merge having performance issues in the past, but not very sure
> > about that.
>
> My motivation was to get some basic information about your setup and
> what you are trying to achieve.
>
> If I understand correctly you have:
>
> 1) An Oracle database with tables that you want to copy the complete
> data from to a Postgres database. For this sort of thing
> COPY(https://www.postgresql.org/docs/current/sql-copy.html) on the
> Postgres end using CSV data generated from the source is probably the
> quickest bulk load method.
>
> 2) After the initial load you want to do follow up INSERT/UPDATEs based
> on a delta of the source tables relative to the initial load. This is
> still a bit of mystery to me. How are determining the delta: a) On the
> source end entirely or b) Target relative to source? Also what is the
> anticipated size of the delta per transfer?
>
> Additional information needed:
>
> 1) Network distance between source and target?
>
> 2) Network capacity?
>
> 3) Expected load on both source and target servers from other operations?
>
>
Thank you. Actually, I was trying to understand how to handle the delta load
after the one-time load is done. The delta changes are planned to be
identified based on the primary keys of the tables: if the key is found, the
row is updated; if the key is not found, the row is inserted.

Basically, a SELECT query on the source database will fetch the data for a
certain time interval (based on the latest update timestamp, or the create
timestamp if available, or else a full dump) and put it on S3; from S3 it
will be picked up and merged into the target Postgres database. As upsert and
MERGE look similar, I was wondering which we should use here for loading the
delta records?


Re: Moving delta data faster

2024-04-03 Thread yudhi s
On Thu, Apr 4, 2024 at 2:41 AM Adrian Klaver 
wrote:

> On 4/3/24 13:38, yudhi s wrote:
> >   Hi All,
> >   It's postgresql database version 15.4. We have a requirement in which
> > we will be initially moving full table data for 3-4 tables, from source
> > database to target(i.e. postgres) . Maximum number of rows will be
> > ~10million rows in those tables. Then subsequently these rows will be
> > inserted/updated based on the delta number of rows that got
> > inserted/updated in the source database. In some cases these changed
> > data can flow multiple times per day to the downstream i.e. postgres
> > database and in other cases once daily.
>
> What is the source database?
>
> Can it be reached with a FDW?:
>
> https://wiki.postgresql.org/wiki/Foreign_data_wrappers
>
> Can the delta on the source be output as CSV?
>
>

Thank you Adrian.

One thing I forgot to mention: this target PostgreSQL database will be on AWS
RDS, whereas the source Oracle database is on premise. I think we don't
currently have the FDW extension in place, but we can get it. I am still not
able to understand clearly, though: do you mean export the data from the
source as CSV, then truncate and import on the target? And as this data will
be moved over the network, won't that cause slowness?

The source database here is Oracle. Correct me if I'm wrong, but it looks
like a foreign data wrapper is similar to a DB link. Or do you mean writing a
query on the target database (which could be an UPSERT or MERGE) that joins
to the source table through the DB link/FDW? But my question was whether we
should use UPSERT or MERGE for comparing and loading the delta records into
the target PostgreSQL database. I want to understand which is more
performant, as I have seen MERGE have performance issues in the past, but I
am not very sure about that.


Moving delta data faster

2024-04-03 Thread yudhi s
Hi All,
It's PostgreSQL database version 15.4. We have a requirement in which we will
initially move the full table data for 3-4 tables from the source database to
the target (i.e. Postgres). The maximum number of rows will be ~10 million
per table. Subsequently, rows will be inserted/updated based on the delta of
rows that got inserted/updated in the source database. In some cases this
changed data can flow to the downstream (i.e. Postgres) database multiple
times per day, and in other cases once daily.

We want to understand whether we should use upsert (INSERT ... ON CONFLICT),
MERGE statements, or anything else in such a scenario, so as to persist those
delta records faster in the target database while keeping the system online
for the users.

Regards
Yudhi


Re: How to add columns faster

2024-03-03 Thread yudhi s
On Mon, Mar 4, 2024 at 12:43 AM Christophe Pettus  wrote:

>
> > On Mar 3, 2024, at 11:06, yudhi s  wrote:
> > as the column addition using the traditional "Alter table" command in
> postgres looks to be a full table rewrite
>
> That's not always (or, really, usually) true.  Adding a new column in any
> recent version of PostgreSQL just alters the system catalogs; it does not
> rewrite the table.  Make sure the new column is either NULL-able, or has a
> simple DEFAULT expression (specifically, not using a VOLATILE function).
> Per the documentation:
>
> > When a column is added with ADD COLUMN and a non-volatile DEFAULT is
> specified, the default is evaluated at the time of the statement and the
> result stored in the table's metadata. That value will be used for the
> column for all existing rows. If no DEFAULT is specified, NULL is used. In
> neither case is a rewrite of the table required.
> >
> > Adding a column with a volatile DEFAULT or changing the type of an
> existing column will require the entire table and its indexes to be
> rewritten.
>
> https://www.postgresql.org/docs/current/sql-altertable.html


Thanks for the clarification. In the case of adding a column with a volatile
default (for example clock_timestamp(); note that current_timestamp is stable
rather than volatile, so it would not force a rewrite), or adding a NOT NULL
column whose existing values need some conditional population, it will be a
full table rewrite. In such scenarios the full table rewrite is going to take
a long time, so what would be the fastest way to achieve it with minimal to
no downtime?
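
As I understand it, the difference would be roughly the below (big_tab and
the column names are hypothetical):

ALTER TABLE big_tab ADD COLUMN col_a text;                                   -- nullable, metadata-only
ALTER TABLE big_tab ADD COLUMN col_b timestamptz DEFAULT now();              -- non-volatile default, metadata-only
ALTER TABLE big_tab ADD COLUMN col_c timestamptz DEFAULT clock_timestamp();  -- volatile default, full table rewrite
ALTER TABLE big_tab ADD COLUMN col_d text NOT NULL DEFAULT 'n/a';            -- constant default with NOT NULL, still metadata-only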

Apologies if this is a dumb question, but considering that partitions in
Postgres are effectively separate tables, can we somehow add the new column
at the table level by adding the column to each of the partitions
individually and then attaching them, or does it have to happen in one shot?


How to add columns faster

2024-03-03 Thread yudhi s
Hello,
We have an application in which some of the big transaction tables will have
approximately ~200 columns once we are serving all the business use cases.
Currently they have ~100 columns, serving the initial business use cases. As
the team follows an agile approach, attributes will be added gradually with
each future release. The concern is that the transaction tables will hold
billions of rows (because of historical data migration from the start), will
be range partitioned, and each partition can be as big as ~100GB, with the
full table size running to multiple terabytes.

So our concern was: as column addition using the traditional ALTER TABLE
command in Postgres looks to be a full table rewrite, it is going to take a
lot of time and resources with each such release for these big transaction
tables. What is the way to handle such scenarios and do these column
additions quickly, with no or minimal downtime?

Regards
Yudhi


Where the data stitching/update/deduplication should happen

2024-02-28 Thread yudhi s
Hello All,
This is about data flowing into an OLTP database, which is most likely going
to be Postgres.

We are designing a system that moves data from input files (in Avro format)
through GoldenGate to Kafka topics and then into the database: incoming
files --> GGS --> Kafka --> OLTP database. This will be a heavy transactional
system processing ~10K txn/second. The database is supposed to show near
real time transactions to the users. The messages coming from the Kafka
topics are asynchronous in nature, and there is a chance of duplicate data
being ingested from the topics. So the data has to be stitched, updated and
deduplicated before showing it to the users as a complete transaction, i.e.
before persisting it into the normalized data model that the end users will
query.
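
If it were done in a database stage layer, the dedup piece would look roughly
like the below (stage_txn and event_id are assumed names, with event_id being
the idempotency key carried on each Kafka message):

CREATE TABLE stage_txn (
    event_id    varchar(36) PRIMARY KEY,   -- dedup/idempotency key from the message
    txn_id      varchar(36) NOT NULL,
    payload     jsonb,
    received_at timestamptz NOT NULL DEFAULT now()
);

-- Duplicate deliveries from the topic are silently dropped
INSERT INTO stage_txn (event_id, txn_id, payload)
VALUES ($1, $2, $3)
ON CONFLICT (event_id) DO NOTHING;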

So where should we perform this stitching/update/deduplication in the
workflow? Should it happen inside the application, somewhere in the Kafka
consumer (using the poison pill concept), or should it happen in a stage
schema in the database, persisting all the pieces of a transaction as they
arrive from the Kafka topics? Adding another stage layer within the database
will add some time before the data is visible to the users, and thus it may
no longer be near real time, as it will take some more time to move the data
from the stage tables into the main transaction/normalized tables.

Or should we persist the data as-is in the stage area and serve it from the
stage directly for users who are okay with partial transaction data, while
serving the complete transaction data from the normalized tables, with some
delay, to the users who want to see it as a complete transaction?

What is the appropriate design to address such use cases?

Regards
Yudhi


Re: Creating table and indexes for new application

2024-02-24 Thread yudhi s
On Fri, Feb 23, 2024 at 5:26 PM sud  wrote:

>
>
> On Fri, 23 Feb, 2024, 1:28 pm yudhi s, 
> wrote:
>
>>
>>
>> On Fri, 23 Feb, 2024, 1:20 pm sud,  wrote:
>>
>>>
>>>
>>> On Fri, 23 Feb, 2024, 12:41 pm Laurenz Albe, 
>>> wrote:
>>>
>>>> On Fri, 2024-02-23 at 02:05 +0530, yudhi s
>>>>
>>>> > 2)Should we be creating composite indexes on each foreign key for
>>>> table2 and table3, because
>>>> >   any update or delete on parent is going to take lock on all child
>>>> tables?
>>>>
>>>> Every foreign key needs its own index.  A composite index is only
>>>> appropriate if the foreign
>>>> key spans multiple columns.
>>>>
>>>>
>>>
>>> From the DDL which OP posted it's using composite foreign key thus a
>>> composite index would be needed.
>>> However, if someone doesn't delete or update the parent table PK   , is
>>> it still advisable to have all the FK indexed? Like in general I think
>>> transaction id should not get updated in a normal scenario unless some
>>> special case.
>>>
>>>
>>>
>> Thank you. I can double check if we have confirmed use case of deleting
>> the parent table or updating PK in the parent table. But anyway it can
>> happen for data fix for sure in some scenario.
>>
>> But yes, we are certainly going to drop/purge partition from all the
>> parent and child table after specific days. So isn't that need the FK to be
>> indexed or else it will scan whole parent table partition?
>>
>
>
> I am not sure if drop partition of parent table, will have a lock or will
> do a full scan on the child table while doing the partition maintenance or
> dropping the partitions, in absence of foreign key index. Others may
> comment here.
>
>>
>>>
Can you please help me understand: is it true that statements like DELETE,
UPDATE and DROP PARTITION on the parent table will take a lock on the child
tables and full-scan them, and thus a foreign key index on every child table
is necessary, irrespective of the performance overhead it adds to all the
INSERT queries into the child tables?


Re: Creating table and indexes for new application

2024-02-22 Thread yudhi s
On Fri, 23 Feb, 2024, 1:20 pm sud,  wrote:

>
>
> On Fri, 23 Feb, 2024, 12:41 pm Laurenz Albe, 
> wrote:
>
>> On Fri, 2024-02-23 at 02:05 +0530, yudhi s
>>
>> > 2)Should we be creating composite indexes on each foreign key for
>> table2 and table3, because
>> >   any update or delete on parent is going to take lock on all child
>> tables?
>>
>> Every foreign key needs its own index.  A composite index is only
>> appropriate if the foreign
>> key spans multiple columns.
>>
>>
>
> From the DDL which OP posted it's using composite foreign key thus a
> composite index would be needed.
> However, if someone doesn't delete or update the parent table PK   , is it
> still advisable to have all the FK indexed? Like in general I think
> transaction id should not get updated in a normal scenario unless some
> special case.
>
>
>
Thank you. I can double-check whether we have a confirmed use case of
deleting from the parent table or updating its PK. But in any case it can
certainly happen for a data fix in some scenario.

And yes, we are certainly going to drop/purge partitions from all the parent
and child tables after a specific number of days. So doesn't that require the
FKs to be indexed, or else it will scan the whole child table partition?

>
>


Creating table and indexes for new application

2024-02-22 Thread yudhi s
Hello Friends,
We are creating new tables for a system that will consume transaction data
from customers and store it in a Postgres version 15+ database. There will be
~400 million transactions/rows per day in the main transaction table, almost
double that in some of the child tables, and fewer records in others.

Say TABLE1 is the main or parent transaction table, which will hold ~400
million transactions/day.
TABLE2 is a child table which will hold ~800 million rows/day. It has a
one-to-many relationship with TABLE1.
TABLE3 is another child table which will hold ~200 million rows per day.

We are considering partitioning all of these tables by the same
transaction_date column, with daily partitions. We have some questions:

1) While creating these tables and related indexes, do we need to be careful
about defining any other storage parameters such as tablespaces, or is it
fine to keep the tables/indexes in the default tablespace only? And are there
any constraints on tablespace size, given that 100s of GB of data will be
stored in each of the daily partitions?

2) Should we be creating composite indexes on each foreign key for TABLE2 and
TABLE3, because any update or delete on the parent is going to take a lock on
all child tables?

3) We were thinking of creating simple B-tree indexes on the columns used in
the search criteria of the queries, but the index documentation in Postgres
also mentions the INCLUDE keyword. I'm struggling to understand how it adds
value to read query performance if those additional columns are not part of
the index key but only of the INCLUDE clause. Will it give some benefit in
terms of index storage space, or should we always keep all the columns in the
index key itself except in some exceptional scenario? I'm struggling to
understand the real benefit of the INCLUDE clause.
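
For example, an index like the below is what I am trying to understand the
benefit of (amount is just a hypothetical non-key column that the queries
also select):

-- TXN_ID and txn_timestamp stay in the key part for filtering/ordering;
-- amount is carried only as payload, so an index-only scan can return it
-- without visiting the heap.
CREATE INDEX idx_tab2_lookup ON TABLE2 (TXN_ID, txn_timestamp) INCLUDE (amount);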

Below is a sample DDL of what it will look like.

 Create table syntax:-

CREATE TABLE TABLE1
(
TXN_ID varchar(36)  NOT NULL ,
txn_timestamp date  NOT NULL ,
CONSTRAINT TABLE1_PK PRIMARY KEY (TXN_ID,txn_timestamp)
) partition by range (txn_timestamp);


CREATE TABLE TABLE2
(
table2_id varchar(36) not null,
TXN_ID varchar(36)  NOT NULL ,
txn_timestamp date  NOT NULL ,
CONSTRAINT TABLE2_PK PRIMARY KEY (table2_id,txn_timestamp)
) partition by range (txn_timestamp);

alter table table2 add constraint fk1 foreign key (TXN_ID,txn_timestamp)
 references table1(TXN_ID,txn_timestamp);
Create Index idx1 on TABLE2(TXN_ID,txn_timestamp); -- Index for foreign Key

CREATE TABLE TABLE3
(
table3_id varchar(36) not null,
TXN_ID varchar(36)  NOT NULL ,
txn_timestamp date  NOT NULL ,
CONSTRAINT TABLE3_PK PRIMARY KEY (table3_id,txn_timestamp)
) partition by range (txn_timestamp);

alter table table3 add constraint fk2 foreign key (TXN_ID,txn_timestamp)
 references table1(TXN_ID,txn_timestamp);
Create Index idx2 on TABLE3(TXN_ID,txn_timestamp); -- Index for foreign key.

Thanks And Regards
Yudhi