Re: How to do faster DML

2024-02-17 Thread Peter J. Holzer
On 2024-02-16 01:34:01 +0100, Peter J. Holzer wrote: > On 2024-02-15 16:51:56 -0700, David G. Johnston wrote: > > What you see with an exact type is what you get, which allows for > > implementing > > equality, unlike inexact which requires epsilon checking. > > You can check binary fp values

Re: How to do faster DML

2024-02-16 Thread Peter J. Holzer
On 2024-02-16 12:10:20 +0530, veem v wrote: > > On Fri, 16 Feb 2024 at 06:04, Peter J. Holzer wrote: > > On 2024-02-15 16:51:56 -0700, David G. Johnston wrote: > > On Thu, Feb 15, 2024 at 4:31 PM Peter J. Holzer > wrote: > >     On 2024-02-14 22:55:01 -0700, David G. Johnston

Re: How to do faster DML

2024-02-15 Thread veem v
On Fri, 16 Feb 2024 at 06:04, Peter J. Holzer wrote: > On 2024-02-15 16:51:56 -0700, David G. Johnston wrote: > > On Thu, Feb 15, 2024 at 4:31 PM Peter J. Holzer > wrote: > > On 2024-02-14 22:55:01 -0700, David G. Johnston wrote: > > > On Tuesday, February 13, 2024, veem v wrote: > >

Re: How to do faster DML

2024-02-15 Thread Peter J. Holzer
On 2024-02-15 16:51:56 -0700, David G. Johnston wrote: > On Thu, Feb 15, 2024 at 4:31 PM Peter J. Holzer wrote: > On 2024-02-14 22:55:01 -0700, David G. Johnston wrote: > > On Tuesday, February 13, 2024, veem v wrote: > > > >     float data types rather than numeric. This will

Re: How to do faster DML

2024-02-15 Thread David G. Johnston
On Thu, Feb 15, 2024 at 4:31 PM Peter J. Holzer wrote: > On 2024-02-14 22:55:01 -0700, David G. Johnston wrote: > > On Tuesday, February 13, 2024, veem v wrote: > > > > float data types rather than numeric. This will give better > performance. > > > > > > Only use an inexact floating-point

Re: How to do faster DML

2024-02-15 Thread Peter J. Holzer
On 2024-02-14 22:55:01 -0700, David G. Johnston wrote: > On Tuesday, February 13, 2024, veem v wrote: > > float data types rather than numeric. This will give better performance. > > >  Only use an inexact floating-point data type if you truly understand what you > are getting yourself

Re: How to do faster DML

2024-02-15 Thread Greg Sabino Mullane
I really worry you are overthinking this. The only real concern is going from INT to BIGINT, jumping from 4 to 8 bytes or storage. That really covers 99% of real world cases, and the canonical advice is to start with BIGINT if you ever think your rows are going to be numbered in the billions.

Re: How to do faster DML

2024-02-15 Thread David G. Johnston
On Thu, Feb 15, 2024 at 12:18 PM veem v wrote: > > So one learning for me, i.e. one of the downside of fixed length data type > is, with fixed length data types any future changes to it , will be a full > table rewrite. And thus this sort of change for big tables will be a > nightmare. > > Yes,

Re: How to do faster DML

2024-02-15 Thread veem v
On Thu, 15 Feb 2024 at 22:40, Adrian Klaver wrote: > On 2/15/24 09:00, Greg Sabino Mullane wrote: > > On Thu, Feb 15, 2024 at 11:43 AM Adrian Klaver > > mailto:adrian.kla...@aklaver.com>> wrote: > > > > That is a mixed bag: > > > > > > Ha! Good point. Our contrived example table does suffer

Re: How to do faster DML

2024-02-15 Thread Adrian Klaver
On 2/15/24 09:00, Greg Sabino Mullane wrote: On Thu, Feb 15, 2024 at 11:43 AM Adrian Klaver mailto:adrian.kla...@aklaver.com>> wrote: That is a mixed bag: Ha! Good point. Our contrived example table does suffer from that, so perhaps the test should be: create table int_test(c1 int, c2

Re: How to do faster DML

2024-02-15 Thread Greg Sabino Mullane
On Thu, Feb 15, 2024 at 11:43 AM Adrian Klaver wrote: > That is a mixed bag: > Ha! Good point. Our contrived example table does suffer from that, so perhaps the test should be: create table int_test(c1 int, c2 int); Cheers, Greg

Re: How to do faster DML

2024-02-15 Thread Adrian Klaver
On 2/15/24 08:16, Greg Sabino Mullane wrote: So as I also tested the same as you posted, there has been no change in "ctid" , when I altered the column data type from 'int' to 'bigint' in the table, so that means full table rewriting won't happen in such a scenario. No it was

Re: How to do faster DML

2024-02-15 Thread Greg Sabino Mullane
> > So as I also tested the same as you posted, there has been no change in > "ctid" , when I altered the column data type from 'int' to 'bigint' in the > table, so that means full table rewriting won't happen in such a scenario. No it was definitely rewritten - do not depend on the ctid to

Re: How to do faster DML

2024-02-14 Thread David G. Johnston
On Tuesday, February 13, 2024, veem v wrote: > > float data types rather than numeric. This will give better performance. > Only use an inexact floating-point data type if you truly understand what you are getting yourself into. Quickly getting the wrong answer isn’t tolerable solution. David

Re: How to do faster DML

2024-02-14 Thread Ron Johnson
On Wed, Feb 14, 2024 at 11:58 PM veem v wrote: > > > On Thu, 15 Feb 2024 at 00:43, Adrian Klaver > wrote: > >> It depends: >> >> https://www.postgresql.org/docs/current/sql-altertable.html >> >> "Adding a column with a volatile DEFAULT or changing the type of an >> existing column will require

Re: How to do faster DML

2024-02-14 Thread veem v
On Thu, 15 Feb 2024 at 00:43, Adrian Klaver wrote: > It depends: > > https://www.postgresql.org/docs/current/sql-altertable.html > > "Adding a column with a volatile DEFAULT or changing the type of an > existing column will require the entire table and its indexes to be > rewritten. As an

Re: How to do faster DML

2024-02-14 Thread Adrian Klaver
On 2/14/24 10:11 AM, veem v wrote: On Wed, 14 Feb, 2024, 10:30 am Ron Johnson, wrote: On Tue, Feb 13, 2024 at 4:17 PM veem v wrote: [sni[] One question here, if we have defined one column as a fixed length data type "integer" and slowly we noticed the

Re: How to do faster DML

2024-02-14 Thread veem v
On Wed, 14 Feb, 2024, 10:30 am Ron Johnson, wrote: > On Tue, Feb 13, 2024 at 4:17 PM veem v wrote: > [sni[] > >> One question here, if we have defined one column as a fixed length data >> type "integer" and slowly we noticed the length of data keeps increasing >> (in case of a sequence

Re: How to do faster DML

2024-02-13 Thread Ron Johnson
On Tue, Feb 13, 2024 at 4:17 PM veem v wrote: [sni[] > One question here, if we have defined one column as a fixed length data > type "integer" and slowly we noticed the length of data keeps increasing > (in case of a sequence generated PK column which will keep increasing), and > we want to

Re: How to do faster DML

2024-02-13 Thread veem v
On Tue, 13 Feb 2024 at 20:32, Peter J. Holzer wrote: > Please do not conflate "char(n)" with native machine types like int or > float. These are very different things. A char(n) is string of fixed but > arbitrary length. This is not something a CPU can process in a single > instruction. It has

Re: How to do faster DML

2024-02-13 Thread Peter J. Holzer
On 2024-02-13 01:53:25 +0530, veem v wrote: > On Mon, 12 Feb 2024 at 03:40, Peter J. Holzer wrote: > > The fixed width types are those that the CPU can directly process: > Integers with 16, 32 and 64 bits, floating point numbers with 32 and 64 > bits. The CPU can read and write them

Re: How to do faster DML

2024-02-13 Thread Peter J. Holzer
On 2024-02-12 11:28:41 -0500, Greg Sabino Mullane wrote: > On Mon, Feb 12, 2024 at 1:50 AM veem v wrote: > > So we were thinking, adding many column to a table should be fine in > postgres (as here we have a use case in which total number of columns may > go till ~500+). But then, 

Re: How to do faster DML

2024-02-12 Thread veem v
On Tue, 13 Feb 2024 at 02:01, Ron Johnson wrote: > On Mon, Feb 12, 2024 at 3:23 PM veem v wrote: > [snip] > >> So it looks like the fixed length data type(like integer, float) should >> be the first choice while choosing the data type of the attributes >> wherever possible, as these are native

Re: How to do faster DML

2024-02-12 Thread Ron Johnson
On Mon, Feb 12, 2024 at 3:23 PM veem v wrote: [snip] > So it looks like the fixed length data type(like integer, float) should be > the first choice while choosing the data type of the attributes > wherever possible, as these are native types. > Correct. > (Like choosing "Integer/float" over

Re: How to do faster DML

2024-02-12 Thread veem v
Thank you so much for the clarification. On Mon, 12 Feb 2024 at 03:40, Peter J. Holzer wrote: > The fixed width types are those that the CPU can directly process: > Integers with 16, 32 and 64 bits, floating point numbers with 32 and 64 > bits. The CPU can read and write them with a single

Re: How to do faster DML

2024-02-12 Thread Greg Sabino Mullane
On Mon, Feb 12, 2024 at 1:50 AM veem v wrote: > So we were thinking, adding many column to a table should be fine in > postgres (as here we have a use case in which total number of columns may > go till ~500+). But then, considering the access of columns towards the > end of a row is going to

Re: How to do faster DML

2024-02-12 Thread Greg Sabino Mullane
On Tue, Feb 6, 2024 at 12:15 AM Lok P wrote: > Another thing I noticed the shared_buffer parameters set as 2029684 in > this instance, which comes to ~21MB and that seems very small for a > database operating in large scale. And I see we have RAM in the instance > showing as ~256GB. So thinking

Re: How to do faster DML

2024-02-12 Thread Dominique Devienne
On Mon, Feb 12, 2024 at 7:50 AM veem v wrote: > I think that's not much of a concern with PostgreSQL because you can't >> update a row in-place anyway because of MVCC. > > > Good to know. So it means here in postgres, there is no such concern like > "row chaining", "row migration" etc. > which

Re: How to do faster DML

2024-02-11 Thread veem v
Thank you . On Mon, 12 Feb 2024 at 03:52, Peter J. Holzer wrote: > On 2024-02-11 12:08:47 -0500, Ron Johnson wrote: > > On Sun, Feb 11, 2024 at 11:54 AM veem v wrote: > > When you said "you would normally prefer those over numeric " I was > > thinking the opposite. As you mentioned

Re: How to do faster DML

2024-02-11 Thread Peter J. Holzer
On 2024-02-11 12:08:47 -0500, Ron Johnson wrote: > On Sun, Feb 11, 2024 at 11:54 AM veem v wrote: > When you said "you would normally prefer those over numeric " I was > thinking the opposite. As you mentioned integer is a fixed length data > type > and will occupy 4 bytes whether

Re: How to do faster DML

2024-02-11 Thread Peter J. Holzer
On 2024-02-11 22:23:58 +0530, veem v wrote: > On Sun, 11 Feb 2024 at 19:02, Peter J. Holzer wrote: > > > Similarly for Number/Numeric data type. > > Number in Oracle and numeric in PostgreSQL are variable length types. > But in PostgreSQL you also have a lot of fixed length numeric

Re: How to do faster DML

2024-02-11 Thread Ron Johnson
On Sun, Feb 11, 2024 at 11:54 AM veem v wrote: [snip] > When you said *"you would normally prefer those over numeric " *I was > thinking the opposite. As you mentioned integer is a fixed length data type > and will occupy 4 bytes whether you store 15 or .But in case of > variable length

Re: How to do faster DML

2024-02-11 Thread David G. Johnston
On Sunday, February 11, 2024, veem v wrote: > > On Sun, 11 Feb 2024 at 19:02, Peter J. Holzer wrote: > >> > Similarly for Number/Numeric data type. >> >> Number in Oracle and numeric in PostgreSQL are variable length types. >> But in PostgreSQL you also have a lot of fixed length numeric types

Re: How to do faster DML

2024-02-11 Thread veem v
On Sun, 11 Feb 2024 at 19:02, Peter J. Holzer wrote: > > Similarly for Number/Numeric data type. > > Number in Oracle and numeric in PostgreSQL are variable length types. > But in PostgreSQL you also have a lot of fixed length numeric types > (from boolean to bigint as well as float4 and float8)

Re: How to do faster DML

2024-02-11 Thread Peter J. Holzer
On 2024-02-11 13:25:10 +0530, veem v wrote: > On Sun, 11 Feb 2024 at 05:55, Peter J. Holzer wrote: > Yes. Numbers in Oracle are variable length, so most Oracle tables > wouldn't contain many fixed length columns. In PostgreSQL must numeric > types are fixed length, so you'll have

Re: How to do faster DML

2024-02-11 Thread Laurenz Albe
On Sun, 2024-02-11 at 13:25 +0530, veem v wrote: > On Sun, 2024-02-11 at 01:25 +0100, Peter J. Holzer wrote: > > On 2024-02-06 11:25:05 +0530, veem v wrote: > > > With respect to the storage part:- In Oracle database we were supposed to > > > keep > > > the frequently accessed column first and

Re: How to do faster DML

2024-02-10 Thread veem v
Thank You so much for the detailed explanation. On Sun, 11 Feb 2024 at 05:55, Peter J. Holzer wrote: > > Yes. Numbers in Oracle are variable length, so most Oracle tables > wouldn't contain many fixed length columns. In PostgreSQL must numeric > types are fixed length, so you'll have quite a

Re: How to do faster DML

2024-02-10 Thread Peter J. Holzer
On 2024-02-06 11:25:05 +0530, veem v wrote: > With respect to the storage part:- In Oracle database we were supposed to keep > the frequently accessed column first and less frequently accessed columns > towards last in the table while creating table DDL. This used to help the > query >

Re: How to do faster DML

2024-02-08 Thread Greg Sabino Mullane
On Thu, Feb 8, 2024 at 12:12 AM Lok P wrote: > Rearranging the table columns by typlen desc, didn't give much storage > space reduction. > It's not so much "biggest to smallest" as making sure you don't have any gaps when you move from one data type to another. You may have not had any "holes"

Re: How to do faster DML

2024-02-07 Thread Lok P
> On Tue, Feb 6, 2024 at 8:34 AM Greg Sabino Mullane > wrote: > >> On Sun, Feb 4, 2024 at 3:52 PM Lok P wrote: >> >>> What I mean was, we will definitely need the data for querying purposes >>> by the users, but just to keep the storage space incontrol (and to help the >>> read queries), we were

Re: How to do faster DML

2024-02-05 Thread veem v
On Tue, 6 Feb 2024 at 10:45, Lok P wrote: > Thank you Greg. > > *"and keeping all your active stuff in cache. Since you have 127 columns, > only pull back the columns you absolutely need for each query."* > > Understood the point. As postgres is a "row store" database, so keeping > the size of

Re: How to do faster DML

2024-02-05 Thread Lok P
Thank you Greg. *"and keeping all your active stuff in cache. Since you have 127 columns, only pull back the columns you absolutely need for each query."* Understood the point. As postgres is a "row store" database, so keeping the size of the row lesser by making the number of columns to

Re: How to do faster DML

2024-02-05 Thread Greg Sabino Mullane
> So when you said *"In that case, and based on the numbers you provided, > daily partitioning seems a decent solution."* > , does it mean that size of the partition (which will be ~100-150GB per > partition here) is not an issue for the postgres to serve read latency > sensitive queries? > Yes,

Re: How to do faster DML

2024-02-05 Thread Lok P
On Tue, Feb 6, 2024 at 8:34 AM Greg Sabino Mullane wrote: > On Sun, Feb 4, 2024 at 3:52 PM Lok P wrote: > >> This table will always be queried on the transaction_date column as one >> of the filters criteria. But the querying/search criteria can span from a >> day to a month worth of

Re: How to do faster DML

2024-02-05 Thread Greg Sabino Mullane
On Sun, Feb 4, 2024 at 3:52 PM Lok P wrote: > This table will always be queried on the transaction_date column as one of > the filters criteria. But the querying/search criteria can span from a day > to a month worth of transaction date data. > In that case, and based on the numbers you

Re: How to do faster DML

2024-02-05 Thread veem v
I have not much experience with postgres on this scale though, others may provide better suggestions. However, with this scale you will have a single table size ~30TB+. And someone just pointed out in this thread, ~180 partitions/table as the good to have limit, and if that is true it would be

Re: How to do faster DML

2024-02-05 Thread Lok P
On Mon, 5 Feb, 2024, 2:21 am Lok P, wrote: On Sun, Feb 4, 2024 at 9:18 PM Greg Sabino Mullane wrote: Partitioning will depend a lot on how you access the data. Is it always using that transaction_date column in the where clause? Can you share the table definition via \d? We've talked about

Re: How to do faster DML

2024-02-04 Thread Ron Johnson
On Sat, Feb 3, 2024 at 11:09 AM Lok P wrote: [snip] > show maintenance_work_mem; - 4155MB > show work_mem; - 8MB > show shared_buffers ; -22029684 > Those are pretty small values. What are your server specs?

Re: How to do faster DML

2024-02-04 Thread Lok P
On Sun, Feb 4, 2024 at 9:25 PM Ron Johnson wrote: > 1. Load the children before attaching them to the parent. > 2. Create the child indices, PK and FKs before attaching to the parent. > 3. Do step 2 in multiple parallel jobs. (cron is your friend.) > 4. Attach the children to the "naked" (no

Re: How to do faster DML

2024-02-04 Thread Lok P
On Sun, Feb 4, 2024 at 9:18 PM Greg Sabino Mullane wrote: > Partitioning will depend a lot on how you access the data. Is it always > using that transaction_date column in the where clause? Can you share the > table definition via \d? We've talked about this table quite a bit, but not > actually

Re: How to do faster DML

2024-02-04 Thread Ron Johnson
On Sun, Feb 4, 2024 at 10:30 AM Lok P wrote: > On Sun, Feb 4, 2024 at 8:14 PM Dennis White > wrote: > >> I'm surprised no one has mentioned perhaps it's a good idea to partition >> this table while adding the pk. By your own statements the table is >> difficult to work with as is. Without

Re: How to do faster DML

2024-02-04 Thread Greg Sabino Mullane
Partitioning will depend a lot on how you access the data. Is it always using that transaction_date column in the where clause? Can you share the table definition via \d? We've talked about this table quite a bit, but not actually seen what it looks like. We are flying blind a little bit. You

Re: How to do faster DML

2024-02-04 Thread Lok P
On Sun, Feb 4, 2024 at 8:14 PM Dennis White wrote: > I'm surprised no one has mentioned perhaps it's a good idea to partition > this table while adding the pk. By your own statements the table is > difficult to work with as is. Without partitioning the table, row inserts > would need to walk the

Re: How to do faster DML

2024-02-04 Thread Dennis White
I'm surprised no one has mentioned perhaps it's a good idea to partition this table while adding the pk. By your own statements the table is difficult to work with as is. Without partitioning the table, row inserts would need to walk the pk index and could be a factor. If this is static table then

Re: How to do faster DML

2024-02-04 Thread Marcos Pegoraro
> > insert into mytable2 select * from mytable1 order by ctid limit 10_000_000 > offset 20_000_000; > You can use min/max values grouping them by 10_000_000 records, so you don´t need that offset, then generate commands and run them. select format('insert into mytable2 select * from mytable1

Re: How to do faster DML

2024-02-04 Thread Francisco Olarte
Lok: On Sat, 3 Feb 2024 at 21:44, Lok P wrote: > On Sun, Feb 4, 2024 at 12:50 AM Francisco Olarte > wrote: > However , as we have ~5billion rows in the base table and out of that , we > were expecting almost half i.e. ~2billion would be duplicates. And you said, > doing the inserts using

Re: How to do faster DML

2024-02-04 Thread Alban Hertroys
> On 3 Feb 2024, at 13:20, Lok P wrote: > > Hello All, > A non partitioned table having ~4.8 billion rows in it and having data size > as ~1.4TB , row size as ~313 bytes having ~127 columns in it. This has got > approx ~1billion+ duplicate rows inserted in it and we want to get the >

Re: How to do faster DML

2024-02-04 Thread Peter J. Holzer
On 2024-02-04 02:14:20 +0530, Lok P wrote: > However , as we have ~5billion rows in the base table and out of that , we > were > expecting almost half i.e. ~2billion would be duplicates. That's way more than I expected from your original description. And it of course raises the question whether

Re: How to do faster DML

2024-02-03 Thread Lok P
On Sun, Feb 4, 2024 at 12:50 AM Francisco Olarte wrote: > On Sat, 3 Feb 2024 at 19:29, Greg Sabino Mullane > wrote: > ... > > Given the size of your table, you probably want to divide that up. > > As long as nothing is changing the original table, you could do: > > > > insert into mytable2

Re: How to do faster DML

2024-02-03 Thread Peter J. Holzer
On 2024-02-03 19:25:12 +0530, Lok P wrote: > Apology. One correction, the query is like below. I. E filter will be on on > ctid which I believe is equivalent of rowid in oracle and we will not need the > index on Id column then.  > >  But, it still runs long, so thinking any other way to make the 

Re: How to do faster DML

2024-02-03 Thread Francisco Olarte
On Sat, 3 Feb 2024 at 19:29, Greg Sabino Mullane wrote: ... > Given the size of your table, you probably want to divide that up. > As long as nothing is changing the original table, you could do: > > insert into mytable2 select * from mytable1 order by ctid limit 10_000_000 > offset 0; > insert

Re: How to do faster DML

2024-02-03 Thread Greg Sabino Mullane
As a general rule, avoid heavy subselects like that. You don't need to build a full list of duplicates before starting. Another approach: create table mytable2 (like mytable1); alter table mytable2 add primary key (id); insert into mytable2 select * from mytable1 on conflict do nothing; Given

Re: How to do faster DML

2024-02-03 Thread Lok P
Ron Johnson 7:37 PM (1 hour ago) to *pgsql-general* On Sat, Feb 3, 2024 at 7:37 PM Ron Johnson wrote: > On Sat, Feb 3, 2024 at 8:55 AM Lok P wrote: > >> Apology. One correction, the query is like below. I. E filter will be on >> on ctid which I believe is equivalent of rowid in oracle and we

Re: How to do faster DML

2024-02-03 Thread Ron Johnson
On Sat, Feb 3, 2024 at 8:55 AM Lok P wrote: > Apology. One correction, the query is like below. I. E filter will be on > on ctid which I believe is equivalent of rowid in oracle and we will not > need the index on Id column then. > > But, it still runs long, so thinking any other way to make

Re: How to do faster DML

2024-02-03 Thread Todd Lewis
I copy/pasted your question into ChatGPT, and it gave me 10 specific suggestions. Have you tried those? On Sat, Feb 3, 2024 at 10:55 AM Lok P wrote: > Apology. One correction, the query is like below. I. E filter will be on > on ctid which I believe is equivalent of rowid in oracle and we will

Re: How to do faster DML

2024-02-03 Thread Lok P
Apology. One correction, the query is like below. I. E filter will be on on ctid which I believe is equivalent of rowid in oracle and we will not need the index on Id column then. But, it still runs long, so thinking any other way to make the duplicate removal faster? Also wondering , the index

How to do faster DML

2024-02-03 Thread Lok P
Hello All, A non partitioned table having ~4.8 billion rows in it and having data size as ~1.4TB , row size as ~313 bytes having ~127 columns in it. This has got approx ~1billion+ duplicate rows inserted in it and we want to get the duplicate data removed for this table and create a PK/unique