Re: How to do faster DML

2024-02-11 Thread veem v
Thank you.

On Mon, 12 Feb 2024 at 03:52, Peter J. Holzer  wrote:

> On 2024-02-11 12:08:47 -0500, Ron Johnson wrote:
> > On Sun, Feb 11, 2024 at 11:54 AM veem v  wrote:
> > When you said "you would normally prefer those over numeric " I was
> > thinking the opposite. As you mentioned integer is a fixed length
> data type
> > and will occupy 4 bytes whether you store 15 or .But in case
> of
> > variable length type like Number or numeric , it will resize itself
> based
> > on the actual data, So is there any downside of going with the
> variable
> > length data type like Numeric,
> >
> >
> > Consider a table with a bunch of NUMERIC fields.  One of those records
> has
> > small values (aka three bytes).  It fits neatly in 2KiB.
> >
> > And then you update all those NUMERIC fields to big numbers that take 15
> > bytes.  Suddenly (or eventually, if you update them at different times),
> the
> record does not fit in 2KiB, and so must be moved to its own page.
> That causes
> > extra IO.
>
> I think that's not much of a concern with PostgreSQL, because MVCC means
> you can't update a row in place anyway. So in any case you're
> writing a new row. If you're lucky there is enough free space in the same
> page and you can do a HOT update, but that's quite independent of
> whether the row changes size.
>
>
>
Good to know. So it means that here in Postgres there is no such concern as
"row chaining", "row migration" etc., which we normally have in a non-MVCC
database (like Oracle, say). And there it's not advisable to have more than
~255 columns in a table even though it's technically possible; if such a
requirement arises, we normally break the table into two tables that share
a common key column so they can be joined.

https://jonathanlewis.wordpress.com/2015/02/19/255-columns/

So we were thinking that adding many columns to a table should be fine in
Postgres (here we have a use case in which the total number of columns may
go up to ~500). But then, accessing columns towards the end of a row is
going to take more time than accessing a column at the start of the row.
As you mentioned, accessing the 100th column may cost 4 to 5 times more
than accessing the first column. So, is it advisable here to take a similar
approach of breaking the table into two if the total number of columns for
a table reaches a certain threshold?
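
For illustration, a rough sketch of that kind of split (all table and column
names below are made up): keep the frequently accessed columns in one table
and move the rarely used ones into a companion table that shares the same
primary key, joining only when those extra columns are needed.

-- Hypothetical "hot" part: the columns most queries touch.
CREATE TABLE txn_hot (
    txn_id      bigint PRIMARY KEY,
    customer_id bigint NOT NULL,
    txn_date    date   NOT NULL,
    amount      numeric(12,2)
    -- ... other frequently read columns ...
);

-- Hypothetical "cold" part: the rarely read columns, same key.
CREATE TABLE txn_cold (
    txn_id      bigint PRIMARY KEY REFERENCES txn_hot (txn_id),
    long_note   text,
    extra_attr  text
    -- ... the remaining columns ...
);

-- Join only when the extra columns are actually needed:
SELECT h.txn_id, h.amount, c.long_note
FROM   txn_hot h
JOIN   txn_cold c USING (txn_id)
WHERE  h.txn_id = 12345;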

Regards
Veem


Re: Query hangs (and then timeout) after using COPY to import data

2024-02-11 Thread Tom Lane
Ron Johnson  writes:
> Since the query works in PgAdmin, but not in npgsql, the problem has to be
> somewhere in Npgsql.

I'm wondering if npgsql is executing the query as a cursor and that
is causing the planner to make a poor choice of plan.  There is a
bias towards fast-start plans if you use a cursor, and that doesn't
always work well :-(
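
For illustration (not taken from the original report; table names follow the
query discussed in that thread), the fast-start bias can be seen directly in
psql by comparing the plain plan with the plan chosen for a cursor;
cursor_tuple_fraction controls how strongly the planner assumes only the
first rows will be fetched:

-- Plan for plain execution of the query:
EXPLAIN SELECT id FROM item
WHERE  id NOT IN (SELECT itemid FROM itemuom)
LIMIT  100;

-- Plan when the same query is declared as a cursor:
BEGIN;
EXPLAIN DECLARE c CURSOR FOR
    SELECT id FROM item
    WHERE  id NOT IN (SELECT itemid FROM itemuom)
    LIMIT  100;
COMMIT;

-- The planner's fast-start assumption for cursors (default 0.1):
SHOW cursor_tuple_fraction;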

regards, tom lane




Re: Query hangs (and then timeout) after using COPY to import data

2024-02-11 Thread Ron Johnson
Since the query works in PgAdmin, but not in npgsql, the problem has to be
somewhere in Npgsql.

https://www.npgsql.org/doc/diagnostics/overview.html

Maybe increasing the log level will lead to a solution.

On Sun, Feb 11, 2024 at 6:13 PM  wrote:

> Thanks, Adrian, for the suggestion, but same problem.
>
> I've just tried to execute "ANALYZE" (on the whole database) after the
> import of all the tables (with COPY) and before the other queries, but the
> query always hangs (I hope this was the way you suggested).
>
> Stefano
>
> > -Original Message-
> > From: Adrian Klaver 
> > Sent: Sunday, February 11, 2024 10:42 PM
> > To: ste...@gmail.com; pgsql-general@lists.postgresql.org
> > Subject: Re: Query hangs (and then timeout) after using COPY to import
> data
> >
> > On 2/11/24 13:37, ste...@gmail.com wrote:
> > > Hello,
> > >
> > > I’m new to PostgreSQL. I’m trying to migrate an app from SqlServer to
> > > Postgres (it’s written in C# and uses Npgsql)
> > >
> > > I’ve tried with Postgres 16.1 and 16.2 on Windows Server 2019.
> > >
> > > This app used SqlServer’s Bulk Insert to import some tables (about 50
> > > tables) from another database, I replaced it with Postgres’ COPY
> > > function: this part works correctly.
> > >
> > > After the import, I execute sequentially (not in parallel) some
> > > queries in these tables, to update some data and to make some
> validations.
> > >
> > > At some point, systematically, one of these queries hangs, and after
> > > 10 minutes (the CommandTimeout that I set) it throws this exception:
> > >
> > > Exception while reading from stream ---> System.TimeoutException:
> > > Timeout during reading attempt
> > >
> > > at Npgsql.Internal.NpgsqlConnector
> > >
> > > The query is this one:
> > >
> > >
> > > SELECT Id FROM Item
> > >
> > > WHERE Id NOT IN (
> > >
> > > SELECT ItemId FROM ItemUom)
> > >
> > > LIMIT 100
> > >
> > > The same query, executed from pgAdmin, returns the result in less than
> > > a second (even if it’s executed while the query from my app is
> running).
> > >
> > > (actually the result are 0 record, but it’s correct: the query it’s
> > > just a validation that there are no records in that query)
> > >
> > > While the query is running from my app, I noticed that the CPU goes
> > > beyond 95%, even up to 100%, due to 3 postgres.exe processes.
> > >
> > > The RAM usage is less than 70%.
> > >
> > > In pgAdmin I’ve executed a query to list the running queries, and I
> > > can see that one.
> > >
> > > My issue seems to be very similar to this one:
> > > https://stackoverflow.com/questions/77195107/npgsql-timeout-during-reading-attempt
> > > but I didn’t understand how that guy solved the problem.
> > >
> > > If I import less tables from the external database, the query doesn’t
> > > hang and runs correctly, so this make me think about some resources
> > > that could “finish”, but I haven’t understood which one (for example
> > > the connections used to import the tables and all the commands and
> > > datareader used to execute the queries seem disposed correctly).
> > >
> > > I don’t know if it could be due to some Postgres parameter.
> > >
> > > Do you have any suggestions to solve this problem?
> > >
> >
> > Run ANALYZE on the tables/database.
> >
> > See:
> > https://www.postgresql.org/docs/current/sql-analyze.html
> >
> >
> > --
> > Adrian Klaver
> > adrian.kla...@aklaver.com
>
>
>
>


RE: Query hangs (and then timeout) after using COPY to import data

2024-02-11 Thread steott
Thanks, Adrian, for the suggestion, but same problem.

I've just tried to execute "ANALYZE" (on the whole database) after the import 
of all the tables (with COPY) and before the other queries, but the query 
always hangs (I hope this was the way you suggested).

Stefano

> -Original Message-
> From: Adrian Klaver 
> Sent: Sunday, February 11, 2024 10:42 PM
> To: ste...@gmail.com; pgsql-general@lists.postgresql.org
> Subject: Re: Query hangs (and then timeout) after using COPY to import data
> 
> On 2/11/24 13:37, ste...@gmail.com wrote:
> > Hello,
> >
> > I’m new to PostgreSQL. I’m trying to migrate an app from SqlServer to
> > Postgres (it’s written in C# and uses Npgsql)
> >
> > I’ve tried with Postgres 16.1 and 16.2 on Windows Server 2019.
> >
> > This app used SqlServer’s Bulk Insert to import some tables (about 50
> > tables) from another database, I replaced it with Postgres’ COPY
> > function: this part works correctly.
> >
> > After the import, I execute sequentially (not in parallel) some
> > queries in these tables, to update some data and to make some validations.
> >
> > At some point, systematically, one of these queries hangs, and after
> > 10 minutes (the CommandTimeout that I set) it throws this exception:
> >
> > Exception while reading from stream ---> System.TimeoutException:
> > Timeout during reading attempt
> >
> > at Npgsql.Internal.NpgsqlConnector
> >
> > The query is this one:
> >
> >
> > SELECT Id FROM Item
> >
> > WHERE Id NOT IN (
> >
> > SELECT ItemId FROM ItemUom)
> >
> > LIMIT 100
> >
> > The same query, executed from pgAdmin, returns the result in less than
> > a second (even if it’s executed while the query from my app is running).
> >
> > (actually the result are 0 record, but it’s correct: the query it’s
> > just a validation that there are no records in that query)
> >
> > While the query is running from my app, I noticed that the CPU goes
> > beyond 95%, even up to 100%, due to 3 postgres.exe processes.
> >
> > The RAM usage is less than 70%.
> >
> > In pgAdmin I’ve executed a query to list the running queries, and I
> > can see that one.
> >
> > My issue seems to be very similar to this one:
> > https://stackoverflow.com/questions/77195107/npgsql-timeout-during-reading-attempt
> > but I didn’t understand how that guy solved the problem.
> >
> > If I import less tables from the external database, the query doesn’t
> > hang and runs correctly, so this make me think about some resources
> > that could “finish”, but I haven’t understood which one (for example
> > the connections used to import the tables and all the commands and
> > datareader used to execute the queries seem disposed correctly).
> >
> > I don’t know if it could be due to some Postgres parameter.
> >
> > Do you have any suggestions to solve this problem?
> >
> 
> Run ANALYZE on the tables/database.
> 
> See:
> https://www.postgresql.org/docs/current/sql-analyze.html
> 
> 
> --
> Adrian Klaver
> adrian.kla...@aklaver.com





Re: Query hangs (and then timeout) after using COPY to import data

2024-02-11 Thread Ron Johnson
On Sun, Feb 11, 2024 at 4:41 PM Adrian Klaver 
wrote:

> On 2/11/24 13:37, ste...@gmail.com wrote:
>
[snip]

> >
> > The same query, executed from pgAdmin, returns the result in less than a
> > second (even if it’s executed while the query from my app is running).
> >
> > (actually the result are 0 record, but it’s correct: the query it’s just
> > a validation that there are no records in that query)
>
[snip]

> > Do you have any suggestions to solve this problem?
> >
>
> Run ANALYZE on the tables/database.
>

If ANALYZE were the cure, why does the query run successfully from PgAdmin?


Re: How to do faster DML

2024-02-11 Thread Peter J. Holzer
On 2024-02-11 12:08:47 -0500, Ron Johnson wrote:
> On Sun, Feb 11, 2024 at 11:54 AM veem v  wrote:
> When you said "you would normally prefer those over numeric " I was
> thinking the opposite. As you mentioned integer is a fixed length data 
> type
> and will occupy 4 bytes whether you store 15 or .But in case of
> variable length type like Number or numeric , it will resize itself based
> on the actual data, So is there any downside of going with the variable
> length data type like Numeric,
> 
> 
> Consider a table with a bunch of NUMERIC fields.  One of those records has
> small values (aka three bytes).  It fits neatly in 2KiB.
> 
> And then you update all those NUMERIC fields to big numbers that take 15
> bytes.  Suddenly (or eventually, if you update them at different times), the
> record does not fit in 2KiB, and so must be moved to its own page.  That 
> causes
> extra IO.

I think that's not much of a concern with PostgreSQL, because MVCC means
you can't update a row in place anyway. So in any case you're
writing a new row. If you're lucky there is enough free space in the same
page and you can do a HOT update, but that's quite independent of
whether the row changes size.
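
As an aside, whether updates actually come out as HOT can be checked in the
statistics views, and leaving free space per page (fillfactor) makes HOT more
likely; a small sketch, with a placeholder table name:

-- Compare total vs. HOT updates for a table:
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM   pg_stat_user_tables
WHERE  relname = 'mytable';

-- Reserve some free space in each page so updated rows can stay on the same page:
ALTER TABLE mytable SET (fillfactor = 90);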

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: How to do faster DML

2024-02-11 Thread Peter J. Holzer
On 2024-02-11 22:23:58 +0530, veem v wrote:
> On Sun, 11 Feb 2024 at 19:02, Peter J. Holzer  wrote:
> 
> > Similarly for Number/Numeric data type.
> 
> Number in Oracle and numeric in PostgreSQL are variable length types.
> But in PostgreSQL you also have a lot of fixed length numeric types
> (from boolean to bigint as well as float4 and float8) and you would
> normally prefer those over numeric (unless you really need a decimal or
> very long type). So padding is something you would encounter in a
> typical PostgreSQL database while it just wouldn't happen in a typical
> Oracle database.
> 
> 
> 
> When you said "you would normally prefer those over numeric " I was thinking
> the opposite. As you mentioned integer is a fixed length data type and will
> occupy 4 bytes whether you store 15 or .But in case of variable
> length type like Number or numeric , it will resize itself based on the actual
> data, So is there any downside of going with the variable length data type 
> like
> Numeric, Varchar type always for defining the data elements?

The fixed width types are those that the CPU can directly process:
Integers with 16, 32 and 64 bits, floating point numbers with 32 and 64
bits. The CPU can read and write them with a single memory access, it
can do arithmetic with a single instruction, etc.

Number/Numeric are not native types on any CPU. To read them the CPU
needs several memory accesses (probably one per byte unless you get
really clever) and then it can't do any calculations with them
directly; instead it has to run a subroutine which does operations on
little chunks and then puts those chunks together again - basically the
same as you do when you're doing long addition or multiplication on
paper. So that's not very efficient.

Also the space savings aren't that great and probably even negative: In
my example the length of a numeric type with at most 10 digits varied
between 3 and 7 bytes. Only for values between -99 and +99 is this (3
bytes) actually shorter, for other values it's the same length or
longer. So you would only save space if most of your values are in that
±99 range. But not if all of them are, because then you could simply use
a smallint (Range -32768..32767) in PostgreSQL and save another byte.
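
Those sizes are easy to check directly; a quick sketch (the exact byte counts
reported can vary with the value stored):

SELECT pg_column_size(42::smallint)        AS smallint_bytes,   -- fixed 2 bytes
       pg_column_size(42::integer)         AS integer_bytes,    -- fixed 4 bytes
       pg_column_size(42::bigint)          AS bigint_bytes,     -- fixed 8 bytes
       pg_column_size(42::numeric)         AS numeric_small,    -- variable
       pg_column_size(1234567890::numeric) AS numeric_10_digits;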

Finally - and I'm probably biased in this as I learned programming 40
years ago - to me the native types feel more natural than product
specific variable-length decimal types.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Query hangs (and then timeout) after using COPY to import data

2024-02-11 Thread Adrian Klaver

On 2/11/24 13:37, ste...@gmail.com wrote:

Hello,

I’m new to PostgreSQL. I’m trying to migrate an app from SqlServer to 
Postgres (it’s written in C# and uses Npgsql)


I’ve tried with Postgres 16.1 and 16.2 on Windows Server 2019.

This app used SqlServer’s Bulk Insert to import some tables (about 50 
tables) from another database, I replaced it with Postgres’ COPY 
function: this part works correctly.


After the import, I execute sequentially (not in parallel) some queries 
in these tables, to update some data and to make some validations.


At some point, systematically, one of these queries hangs, and after 10 
minutes (the CommandTimeout that I set) it throws this exception:


Exception while reading from stream ---> System.TimeoutException: 
Timeout during reading attempt


    at Npgsql.Internal.NpgsqlConnector

The query is this one:


SELECT Id FROM Item

WHERE Id NOT IN (

SELECT ItemId FROM ItemUom)

LIMIT 100

The same query, executed from pgAdmin, returns the result in less than a 
second (even if it’s executed while the query from my app is running).


(actually the result is 0 records, but that’s correct: the query is just 
a validation that there are no records in that query)


While the query is running from my app, I noticed that the CPU goes 
beyond 95%, even up to 100%, due to 3 postgres.exe processes.


The RAM usage is less than 70%.

In pgAdmin I’ve executed a query to list the running queries, and I can 
see that one.


My issue seems to be very similar to this one:
https://stackoverflow.com/questions/77195107/npgsql-timeout-during-reading-attempt 

but I didn’t understand how that guy solved the problem.

If I import fewer tables from the external database, the query doesn’t 
hang and runs correctly, so this makes me think some resource is being 
exhausted, but I haven’t understood which one (for example, the 
connections used to import the tables and all the commands and 
data readers used to execute the queries seem to be disposed correctly).


I don’t know if it could be due to some Postgres parameter.

Do you have any suggestions to solve this problem?



Run ANALYZE on the tables/database.

See:
https://www.postgresql.org/docs/current/sql-analyze.html
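
For example, using the table names from the query in question, it may be
enough to analyze just the two tables right after the COPY:

ANALYZE VERBOSE item;
ANALYZE VERBOSE itemuom;

-- or simply analyze everything:
ANALYZE;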


--
Adrian Klaver
adrian.kla...@aklaver.com





Query hangs (and then timeout) after using COPY to import data

2024-02-11 Thread steott
Hello,

I'm new to PostgreSQL. I'm trying to migrate an app from SqlServer to
Postgres (it's written in C# and uses Npgsql).

I've tried with Postgres 16.1 and 16.2 on Windows Server 2019.

 

This app used SqlServer's Bulk Insert to import some tables (about 50
tables) from another database; I replaced it with Postgres' COPY function:
this part works correctly.

 

After the import, I execute sequentially (not in parallel) some queries in
these tables, to update some data and to make some validations.

At some point, systematically, one of these queries hangs, and after 10
minutes (the CommandTimeout that I set) it throws this exception: 

 

Exception while reading from stream ---> System.TimeoutException: Timeout
during reading attempt

   at Npgsql.Internal.NpgsqlConnector 



The query is this one:


SELECT Id FROM Item 

WHERE Id NOT IN (

SELECT ItemId FROM ItemUom) 

LIMIT 100

 

The same query, executed from pgAdmin, returns the result in less than a
second (even if it's executed while the query from my app is running).

(actually the result is 0 records, but that's correct: the query is just a
validation that there are no records in that query)

 

While the query is running from my app, I noticed that the CPU goes beyond
95%, even up to 100%, due to 3 postgres.exe processes.

The RAM usage is less than 70%.

In pgAdmin I've executed a query to list the running queries, and I can see
that one.
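
A query along these lines (shown here only as an illustration, not
necessarily the exact one I ran) lists the sessions and whether they are
actively running or waiting on something:

SELECT pid, state, wait_event_type, wait_event, query_start, query
FROM   pg_stat_activity
WHERE  state <> 'idle'
ORDER  BY query_start;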

 

My issue seems to be very similar to this one:
https://stackoverflow.com/questions/77195107/npgsql-timeout-during-reading-attempt
but I didn't understand how that guy solved the problem.

 

If I import fewer tables from the external database, the query doesn't hang
and runs correctly, so this makes me think some resource is being exhausted,
but I haven't understood which one (for example, the connections
used to import the tables and all the commands and data readers used to
execute the queries seem to be disposed correctly).

I don't know if it could be due to some Postgres parameter.

 

Do you have any suggestions to solve this problem?



Re: How to do faster DML

2024-02-11 Thread Ron Johnson
On Sun, Feb 11, 2024 at 11:54 AM veem v  wrote:
[snip]

> When you said *"you would normally prefer those over numeric " *I was
> thinking the opposite. As you mentioned integer is a fixed length data type
> and will occupy 4 bytes whether you store 15 or .But in case of
> variable length type like Number or numeric , it will resize itself based
> on the actual data, So is there any downside of going with the variable
> length data type like Numeric,
>

Consider a table with a bunch of NUMERIC fields.  One of those records has
small values (aka three bytes).  It fits neatly in 2KiB.

And then you update all those NUMERIC fields to big numbers that take 15
bytes.  Suddenly (or eventually, if you update them at different times),
the record does *not* fit in 2KiB, and so must be moved to its own page.
That causes extra IO.


> Varchar type always for defining the data elements?
>

Internally, all character-type fields are stored as TEXT.  CHAR and
VARCHAR(XX)'s only purposes are SQL-compliance and length-limitation.
And length-limitation is "just" a legacy carried forward from the card
punch days.
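
A small illustration of that point (the byte counts are whatever your server
reports; the declared limit reserves nothing):

-- The same string stored as text and as varchar(4000) occupies the same space:
SELECT pg_column_size('abc'::text)          AS text_bytes,
       pg_column_size('abc'::varchar(4000)) AS varchar_bytes;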


Re: How to do faster DML

2024-02-11 Thread David G. Johnston
On Sunday, February 11, 2024, veem v  wrote:

>
> On Sun, 11 Feb 2024 at 19:02, Peter J. Holzer  wrote:
>
>> > Similarly for Number/Numeric data type.
>>
>> Number in Oracle and numeric in PostgreSQL are variable length types.
>> But in PostgreSQL you also have a lot of fixed length numeric types
>> (from boolean to bigint as well as float4 and float8) and you would
>> normally prefer those over numeric (unless you really need a decimal or
>> very long type). So padding is something you would encounter in a
>> typical PostgreSQL database while it just wouldn't happen in a typical
>> Oracle database.
>>
>>
>> When you said *"you would normally prefer those over numeric " *I was
> thinking the opposite. As you mentioned integer is a fixed length data type
> and will occupy 4 bytes whether you store 15 or .But in case of
> variable length type like Number or numeric , it will resize itself based
> on the actual data, So is there any downside of going with the variable
> length data type like Numeric, Varchar type always for defining the data
> elements?
>

Regardless of the size of the actual data in a variable width column, expect
the size and computational overhead to make using that field
more costly than using a fixed width field.  You don’t have a choice for
text, it is always variable width, but for numbers, if you can use an integer
variant you will come out ahead of numeric.

David J.


Re: How to do faster DML

2024-02-11 Thread veem v
On Sun, 11 Feb 2024 at 19:02, Peter J. Holzer  wrote:

> > Similarly for Number/Numeric data type.
>
> Number in Oracle and numeric in PostgreSQL are variable length types.
> But in PostgreSQL you also have a lot of fixed length numeric types
> (from boolean to bigint as well as float4 and float8) and you would
> normally prefer those over numeric (unless you really need a decimal or
> very long type). So padding is something you would encounter in a
> typical PostgreSQL database while it just wouldn't happen in a typical
> Oracle database.
>
>
> When you said *"you would normally prefer those over numeric " *I was
thinking the opposite. As you mentioned integer is a fixed length data type
and will occupy 4 bytes whether you store 15 or .But in case of
variable length type like Number or numeric , it will resize itself based
on the actual data, So is there any downside of going with the variable
length data type like Numeric, Varchar type always for defining the data
elements?

Regards
Veem


Re: Partitioning options

2024-02-11 Thread Justin
Hi Marc,

Nested partitioning still allows for simple data deletion by dropping the
partition that falls in that date range.

You are probably thinking of partitioning by multi-column rules, which is very
complex to set up.
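
A rough sketch of that layout (names and ranges are made up): range-partition
by customer id, sub-partition each range by transaction date, and purge old
data by dropping a date sub-partition.

CREATE TABLE txn (
    customer_id bigint NOT NULL,
    txn_date    date   NOT NULL,
    amount      numeric(12,2)
) PARTITION BY RANGE (customer_id);

-- One customer-id range, itself partitioned by date:
CREATE TABLE txn_cust_0_1000
    PARTITION OF txn
    FOR VALUES FROM (0) TO (1000)
    PARTITION BY RANGE (txn_date);

CREATE TABLE txn_cust_0_1000_2024_01
    PARTITION OF txn_cust_0_1000
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Getting rid of that month for that customer range is just a DROP:
DROP TABLE txn_cust_0_1000_2024_01;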

On Fri, Feb 9, 2024, 10:29 AM Marc Millas  wrote:

>
>
>
> On Thu, Feb 8, 2024 at 10:25 PM Justin  wrote:
>
>> Hi Sud,
>>
>> Would not look at HASH partitioning as it is very expensive to add or
>> subtract the number of partitions.
>>
>> Would probably look at a nested partitioning using  customer ID using
>> range or list of IDs then  by transaction date,  Its easy to add
>> partitions and balance the partitions segments.
>>
>
>  I'll not do that because then, when getting rid of obsolete data, you
> must delete a huge number of records and vacuum each partition.
> If partitioning by date, you greatly ease the cleanup by just
> getting rid of obsolete partitions, which is quite speedy (no delete, no
> vacuum, no index updates, ...).
> Marc
>
>
>> Keep in mind that SELECT queries being used on the partition must  use
>> the partitioning KEY in the WHERE clause of the query or performance will
>> suffer.
>>
>> Suggest doing a query analysis before deploying partitioning, to confirm that
>> the queries' WHERE clauses match the planned partition rule.  I suggest that
>> 80% of the executed queries should match the partition rule; if
>> not, don't deploy partitioning, or change all the queries in the
>> application to match the partition rule.
>>
>>
>> On Thu, Feb 8, 2024 at 3:51 PM Greg Sabino Mullane 
>> wrote:
>>
>>> Out of curiosity, As OP mentioned that there will be Joins and also
 filters on column Customer_id column , so why don't you think that
 subpartition by customer_id will be a good option? I understand List
 subpartition may not be an option considering the new customer_ids gets
 added slowly in the future(and default list may not be allowed) and also OP
 mentioned, there is skewed distribution of data for customer_id column.
 However what is the problem if OP will opt for HASH subpartition on
 customer_id in this situation?

>>>
>>> It doesn't really gain you much, given you would be hashing it, the
>>> customers are unevenly distributed, and OP talked about filtering on the
>>> customer_id column. A hash partition would just be a lot more work and
>>> complexity for us humans and for Postgres. Partitioning for the sake of
>>> partitioning is not a good thing. Yes, smaller tables are better, but they
>>> have to be smaller targeted tables.
>>>
>>> sud wrote:
>>>
>>> 130GB of storage space as we verified using the "pg_relation_size"
 function, for a sample data set.
>>>
>>>
>>> You might also want to closely examine your schema. At that scale, every
>>> byte saved per row can add up.
>>>
>>> Cheers,
>>> Greg
>>>
>>>


Re: How to do faster DML

2024-02-11 Thread Peter J. Holzer
On 2024-02-11 13:25:10 +0530, veem v wrote:
> On Sun, 11 Feb 2024 at 05:55, Peter J. Holzer  wrote:
> Yes. Numbers in Oracle are variable length, so most Oracle tables
> wouldn't contain many fixed length columns. In PostgreSQL most numeric
> types are fixed length, so you'll have quite a lot of them.
> 
> 
> 
> So it means , say in other databases like (oracle database), we were careless
> choosing the data length , say for example Varchar2(4000), if the real data
> which is inserted into the table holds a varchar string of length 20 bytes 
> then
> Oracle trimmed it to occupy the 20 bytes length only in the storage. but in
> postgre here we need to be cautious and define the length as what the data
> attribute can max contains , because that amount of fixed space is allocated
> to every value which is inserted into the table for that attribute/data
> element.

No. Varchar is (as the name implies) a variable length data type and
both Oracle and PostgreSQL store only the actual value plus some length
indicator in the database. Indeed, in PostgreSQL you don't need to
specify the maximum length at all.

However, if you specify a column as "integer" in PostgreSQL it will
always occupy 4 bytes, whether you store the value 15 in it or
9. In Oracle, there is no "integer" type and the roughly
equivalent number(10,0) is actually a variable length floating point
number. So 15 will occupy only 3 bytes and 9 will occupy 7
bytes[1].

> Similarly for Number/Numeric data type.

Number in Oracle and numeric in PostgreSQL are variable length types.
But in PostgreSQL you also have a lot of fixed length numeric types
(from boolean to bigint as well as float4 and float8) and you would
normally prefer those over numeric (unless you really need a decimal or
very long type). So padding is something you would encounter in a
typical PostgreSQL database while it just wouldn't happen in a typical
Oracle database.

But as Laurenz wrote that's a micro optimization which you usually
shouldn't spend much time on. OTOH, if you have hundreds of columns in a
table, maybe it is worthwhile to spend some time on analyzing access
patterns and optimizing the data layout.

hp

[1] From memory. I may be misremembering the details.

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: How to do faster DML

2024-02-11 Thread Laurenz Albe
On Sun, 2024-02-11 at 13:25 +0530, veem v wrote:
> On Sun, 2024-02-11 at 01:25 +0100, Peter J. Holzer wrote:
> > On 2024-02-06 11:25:05 +0530, veem v wrote:
> > > With respect to the storage part:- In Oracle database we were supposed to 
> > > keep
> > > the frequently accessed column first and less frequently accessed columns
> > > towards last in the table while creating table DDL. This used to help the 
> > > query
> > > performance while accessing those columns as it's a row store database. 
> > > Are
> > > there any such performance implications exists in postgres? And there the 
> > > data
> > > types like varchar used to only occupy the space which the real data/value
> > > contains. 
> > >
> > > But here in postgres, it seems a bit different as it has significant
> > > overhead in storage, as these seem like being fixed length data types and 
> > > will
> > > occupy the space no matter what the read data holds.
> >
> > Yes. Numbers in Oracle are variable length, so most Oracle tables
> > wouldn't contain many fixed length columns. In PostgreSQL most numeric
> > types are fixed length, so you'll have quite a lot of them.
>
> So it means , say in other databases like (oracle database), we were careless
> choosing the data length , say for example Varchar2(4000), if the real data 
> which
> is inserted into the table holds a varchar string of length 20 bytes then 
> Oracle
> trimmed it to occupy the 20 bytes length only in the storage. but in postgre 
> here
> we need to be cautious and define the length as what the data attribute can 
> max
> contains , because that amount of fixed space is allocated to every value 
> which
> is inserted into the table for that attribute/data element. Similarly for
> Number/Numeric data type.  Please correct if my understanding is wrong.

Your understanding is wrong.  Oracle and PostgreSQL are not too different about
storing values.  The data type that Oracle calls "number", and that is called
"numeric" in PostgreSQL and in the SQL standard, is stored a variable length 
data
type in both Oracle and PostgreSQL.

It is just that Oracle has very few data types (I saw them gush about adding
"boolean" as a great new feature in version 23).  So, for example, there are
no integer data types in Oracle, and you have to store them as a variable
length data type.  PostgreSQL has integer data types, which are fixed length
(2, 4 or 8 bytes) and provide much more efficient storage for integers.

"character varying" is also pretty similar in Oracle and PostgreSQL, except
that Oracle calls it "varchar2".

The only fixed-length character data type is "character", but that is always
a bad choice, in Oracle as in PostgreSQL.

About your initial question: in PostgreSQL there is also a certain performance
gain if you store frequently used columns first, since the database has to
skip fewer columns to get to the data.  If the previous columns are fixed
length data types like integers, that is cheaper, because we don't have to
look at the data to know how long they are.

Another thing to consider is padding.  Each fixed-width data type has certain
alignment requirements (imposed by CPU processing) that you can find in
"pg_type.typalign".  This can lead to wasted space in the form of
"padding bytes".  For example, if a "bigint" follows a ASCII single-character
"text" or "varchar" value (which occupies two bytes), there will be six
padding bytes between them to align the "bigint" at a storage address that is
a multiple of eight.
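
You can look the alignments up and see the padding effect directly; a small
sketch (reported sizes may vary slightly by version):

-- Alignment and length of a few types:
SELECT typname, typlen, typalign
FROM   pg_type
WHERE  typname IN ('int2', 'int4', 'int8', 'text', 'numeric');

-- A bigint placed after a short text value forces padding, so the second
-- composite below typically reports a larger size than the first:
SELECT pg_column_size(ROW(1::bigint, 'a'::text)) AS bigint_first,
       pg_column_size(ROW('a'::text, 1::bigint)) AS text_first;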

But although both of these considerations (skipping over previous columns and
padding) are relevant for performance, they are often a micro-optimization
that you won't be able to measure, and you shouldn't lose too much sleep
over them.

> 
> > So there's a bit of a tradeoff between minimizing alignment overhead and
> > arranging columns for fastest access.

Precisely.

Yours,
Laurenz Albe




Re: How should we design our tables and indexes

2024-02-11 Thread Karsten Hilbert
Am Sun, Feb 11, 2024 at 12:53:10PM +0530 schrieb veem v:

> >> Pagination is already a hard problem, and does not even make sense when
> > combined with "a continuous stream of inserts". What should the user see
> > when they click on page 2?
> >
>
> When the user clicks to the second page, they will see the next set of rows,
> i.e. 100 to 200, then 200 to 300, and so on till the result set
> finishes.

Given a continuous stream of inserts, "second page" or "next
set of rows" is undefined -- if you aim for live data --
because interleaving data may have been inserted while the
user inspected the first batch of results.

A "second page" is only defined in terms of "what the original
query returned on the first run".
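
One way to keep "next set of rows" well-defined even while inserts keep
arriving is keyset pagination: remember the sort key of the last row shown
and fetch rows strictly after (or before) it, instead of using OFFSET. A
sketch with made-up table, columns and literal values:

-- First page, newest first; remember the (created_at, id) of the last row shown.
SELECT id, created_at, payload
FROM   events
ORDER  BY created_at DESC, id DESC
LIMIT  100;

-- "Next page": only rows strictly older than the last one already shown,
-- so rows inserted in the meantime cannot shift the window.
SELECT id, created_at, payload
FROM   events
WHERE  (created_at, id) < ('2024-02-11 12:00:00', 123456)
ORDER  BY created_at DESC, id DESC
LIMIT  100;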

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B