Re: Partitioning options

2024-02-08 Thread Jim Nasby

On 2/8/24 1:43 PM, veem v wrote:


On Thu, 8 Feb 2024 at 20:08, Greg Sabino Mullane wrote:




Should we go for simple daily range partitioning on the
transaction_date column?


This one gets my vote. That and some good indexes.





Hello Greg,

Out of curiosity: since the OP mentioned that there will be joins and 
also filters on the customer_id column, why don't you think that 
subpartitioning by customer_id would be a good option? I understand 
list subpartitioning may not be an option, considering that new 
customer_ids get added slowly over time (and a default list partition 
may not be allowed), and the OP also mentioned that the data 
distribution for customer_id is skewed. However, what is the problem if 
the OP opts for HASH subpartitioning on customer_id in this situation?


Is it because the number of partitions will be higher? i.e.:

If you go with simple range partitioning, five months of data means 
~150 daily range partitions, and each index adds another set of that 
size; e.g., with 10 indexes the total comes to ~150 table partitions + 
(10*150) index partitions = ~1,650 partitions.


If the OP goes for range-hash instead (the hash modulus is typically 
chosen as a power of two, say 8 hash subpartitions), then the total 
number of partitions will be (8*150) table partitions + (8*150*10) 
index partitions = ~13,200 partitions.


Though there is no theoretical limit to the number of partitions in 
Postgres, serious issues have been reported in the past with a high 
number of table partitions. One such report is below. Is this the reason?


https://www.kylehailey.com/post/postgres-partition-pains-lockmanager-waits 
<https://www.kylehailey.com/post/postgres-partition-pains-lockmanager-waits>


The issue with partitioning by customer_id is that it won't do much (if 
anything) to improve data locality. When partitioning by date, you can 
at least benefit from partition elimination *IF* your most frequent 
queries limit the number of days that the query will look at. Per the 
OP, all queries will include transaction date. Note that this does NOT 
actually mean the number of days/partitions will be limited (i.e., WHERE 
date > today - 150 will hit all the partitions), but if we assume that 
the majority of queries will limit themselves to the past few days then 
partitioning by date should greatly increase data locality.
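
For illustration, a minimal sketch of what the daily range partitioning 
could look like (table and column names are placeholders based on the 
thread, not the OP's actual schema):

CREATE TABLE transactions (
    transaction_id   bigint NOT NULL,
    customer_id      bigint NOT NULL,
    transaction_date date   NOT NULL,
    amount           numeric
) PARTITION BY RANGE (transaction_date);

-- one partition per day; in practice these would be created by a scheduled job
CREATE TABLE transactions_2024_02_08 PARTITION OF transactions
    FOR VALUES FROM ('2024-02-08') TO ('2024-02-09');

-- an index on the parent cascades to every partition
CREATE INDEX ON transactions (customer_id, transaction_date);

A query with WHERE transaction_date >= current_date - 3 would then only 
touch the last few daily partitions, which is the data-locality benefit 
described above.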


Also, when it comes to customer partitioning... really what you probably 
want there isn't partitioning but sharding.

--
Jim Nasby, Data Architect, Austin TX





Re: Mimic ALIAS in Postgresql?

2024-01-22 Thread Jim Nasby

On 1/16/24 6:41 PM, Rob Sargent wrote:

On 1/16/24 17:39, Jim Nasby wrote:

On 1/16/24 4:57 PM, Rob Sargent wrote:

    Or perhaps you have to beef the sed up to use word boundaries just
    in case.


I'm not a Java web developer... 


You need to adjust your glasses if that's what you see me as.


Reality is that basically all modern (as in last 20 years) SQL access 
is via frameworks that all use their own language and come up with SQL 
based on that. How hard it'd be to bulk change the schema depends 
entirely on the framework.

Hm, it's a string /somewhere/. The rest of this thread might be accused 
of adding to the problem.


No, it's not, at least not as a complete SQL statement. See [1] as an 
example of how this works in Ruby on Rails. Most modern frameworks work 
in a similar fashion: you DON'T write raw SQL, or anything that looks 
anything like it. In fact, many (most?) of these frameworks make it 
difficult to do anything in raw SQL because it completely breaks the 
paradigm of the framework.


Note that I'm talking about *frameworks*, not languages. But since most 
languages require huge amounts of boilerplate to create a web service or 
website it's not surprising that pretty much everyone uses frameworks. 
(Go is actually an interesting exception to this.)


1: https://guides.rubyonrails.org/active_record_querying.html#find
--
Jim Nasby, Data Architect, Austin TX





Re: unbale to list schema

2024-01-22 Thread Jim Nasby

On 1/17/24 12:46 PM, Atul Kumar wrote:

Hi,

I am not able to find any solution to list all schemas in all databases 
at once, to check the structure of the whole cluster.


Easiest way to do this is `pg_dumpall --schema-only`.
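
For example, something along these lines (the file name is arbitrary); 
the grep is a rough way to see which schemas exist in which database, 
since pg_dumpall emits a \connect line before each database's objects:

pg_dumpall --schema-only -f cluster_schema.sql
grep -E '^\\connect|^CREATE SCHEMA' cluster_schema.sql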
--
Jim Nasby, Data Architect, Austin TX





Re: Moving to Postgresql database

2024-01-17 Thread Jim Nasby

On 1/17/24 5:31 AM, Dominique Devienne wrote:

Is the pg_statement_rollback technically wrong? Can't what it does be done
better and more efficiently if it was in the core itself? Is it a lot of 
code?


I took a quick look at that extension, and it's doing pretty much what 
you'd do if this was baked into Postgres. The performance penalty that 
you'll suffer here is that you're going to assign a new transaction ID 
for every statement, which can be significantly more expensive than 
using one XID per BEGIN/COMMIT (depending of course on how many 
statements you have inside a BEGIN/COMMIT).
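
For comparison, here is a minimal sketch of doing the same thing 
manually with savepoints, which is roughly the behavior such an 
extension automates server-side (the table is just a placeholder):

CREATE TABLE t (i int);

BEGIN;
INSERT INTO t VALUES (1);
SAVEPOINT s1;
INSERT INTO t VALUES ('oops');   -- fails with a type error
ROLLBACK TO SAVEPOINT s1;        -- undoes only the failed statement
INSERT INTO t VALUES (2);
COMMIT;                          -- rows 1 and 2 are committed

Each SAVEPOINT starts a subtransaction, which is where the extra 
transaction-ID overhead described above comes from.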


By the way, you might take a look at Babelfish[1] since it has to solve 
this issue as well due to some of the error handling modes that T-SQL 
supports.


Basically implicit-statement-level-rollback is the norm, AFAIK, and 
PostgreSQL is the exception here.


I'm really curious what other databases you've seen that have this 
behavior, because the only time I've ever seen it was T-SQL. Way back in 
Sybase 11 days it was the only behavior you had, but at some point SQL 
Server (and maybe Sybase) added additional options.


Frankly, this paradigm has always seemed completely broken to me. The 
entire point of having transactions is so you have all-or-nothing 
behavior: either everything works or the transaction aborts. I realize 
that automatically rolling a statement back doesn't technically violate 
ACID, but IMO it definitely violates the spirit of it. While there are 
certainly *some* legitimate uses for rolling a statement back on error, 
in 30 years I've seen maybe one scenario where you'd want to roll a 
statement back on *any* error, and even then it was only on a specific 
statement - not every statement that might get sent to the server.


1: https://babelfishpg.org/
--
Jim Nasby, Data Architect, Austin TX





Re: postgres sql assistance

2024-01-17 Thread Jim Nasby

On 1/16/24 10:04 PM, arun chirappurath wrote:

Architect is pressing for a native procedure to data load.


It's possible to write a loader in pl/pgsql, but it would easily be twice 
as complex as what you came up with on your first attempt. It would also never 
perform anywhere near as well as a dedicated loader, because there's no 
way to avoid the temp table (which a native loader doesn't need to use).

--
Jim Nasby, Data Architect, Austin TX





Re: Parameter value in RDS

2024-01-16 Thread Jim Nasby

On 1/16/24 3:20 PM, Bruce Momjian wrote:

We can't answer RDS-specific questions here.  I suggest you ask the
vendor.

Okay, it seems someone here _did_ answer the question.


Yeah, but still best that people go to vendors for stuff that's not 
community. If for no other reason than one of the best ways to get 
documentation improved is for support to get tired of answering the same 
question over and over again ;p

--
Jim Nasby, Data Architect, Austin TX





Re: Mimic ALIAS in Postgresql?

2024-01-16 Thread Jim Nasby

On 1/16/24 4:57 PM, Rob Sargent wrote:

Or perhaps you have to beef the sed up to use word boundaries just
in case.


I'm not a Java web developer... 


You need to adjust your glasses if that's what you see me as.


Reality is that basically all modern (as in last 20 years) SQL access is 
via frameworks that all use their own language and come up with SQL 
based on that. How hard it'd be to bulk change the schema depends 
entirely on the framework.

--
Jim Nasby, Data Architect, Austin TX





Re: Mimic ALIAS in Postgresql?

2024-01-16 Thread Jim Nasby

On 1/16/24 11:20 AM, Ron Johnson wrote:
Some RDBMSs have CREATE ALIAS, which allows you to refer to a table by a 
different name (while also referring to it by the original name).


We have an application running on DB2/UDB which (for reasons wholly 
unknown to me, and probably also to the current developer) extensively 
uses this with two schemas: MTUSER and MTQRY.  For example, it sometimes 
refers to MTUSER.sometable and other times refers to it as MTQRY.sometable.


My goal is to present a way to migrate from UDB to PG with as few 
application changes as possible.  Thus, the need to mimic aliases.


Maybe updatable views?
CREATE VIEW mtqry.sometable AS SELECT * FROM mtuser.sometable;


Based on the schema names one possibility is that the aliases are there 
as a pseudo-api between people/tools writing queries and the base 
tables. IE: if you needed to make a (maybe backwards-incompatible) 
change to "sometable" you now at least have the option of creating a 
MTQRY.sometable *view* that hides whatever change you're making to 
MTUSER.sometable.
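
A hedged sketch of that idea (the column names are purely hypothetical): 
if a column in MTUSER.sometable gets renamed, a view in the query schema 
can preserve the old shape for existing queries:

CREATE VIEW mtqry.sometable AS
    SELECT new_name AS old_name,   -- keep the name existing queries expect
           other_col
    FROM mtuser.sometable;

Being a simple single-table view, it also remains auto-updatable in 
Postgres.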


In any case, yes, an updatable view would provide equivalent behavior in 
Postgres.

--
Jim Nasby, Data Architect, Austin TX





Re: pg_dump Running Slow

2024-01-16 Thread Jim Nasby

On 1/16/24 12:55 PM, Yongye Serkfem wrote:

Hello Engineers,
I trust you are all doing well. I need help on how to improve the speed 
of pg_dump. I took a pg_dump on a 1TB database, which took almost a 
whole day. I used this command: "pg_dump -U postgres -d dynamic -f 
/backups/." Also, how do I check on the progression of the process?


The advantage to pg_dump is that it's very simple and you'd have to work 
really hard to end up with a backup that won't restore. As such, I 
*always* recommend that it be *part* of your disaster recovery plan. It 
also directly supports restoring just a single table if that's all you need.


The flip-side is that all you get from pg_dump is (in one form or 
another) just a bunch of SQL that you'll then have to execute to 
restore. That process can be quite slow for anything but a very small 
cluster. (And as you've seen, pg_dump itself can be pretty slow.) While 
--jobs allows parallelizing that work, it's still a very expensive process.
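
On the dump side specifically, the directory format plus --jobs can 
speed things up considerably on a 1TB database; a sketch based on the 
command quoted above (the path and job count are just examples):

pg_dump -U postgres -d dynamic -Fd -j 4 -f /backups/dynamic_dump
pg_restore -d dynamic -j 4 /backups/dynamic_dump

Adding --verbose prints each table as it's dumped or restored, which is 
the closest thing to a progress indicator these tools offer.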


For clusters of any significant size you'll also want to look at some 
form of binary backup (aka: PITR) [1]. Since that operates directly at 
the OS level it's much faster than pg_dump on large clusters. The 
downside is there are lots of subtle ways to set it up incorrectly, 
resulting in backups that won't restore. Instead of trying to manage it 
by hand, I strongly recommend using a tool that's meant for managing 
Postgres binary backups. pgBackRest[2] and barman[3] are two popular 
choices for that.


Most importantly: ALWAYS TEST YOUR BACKUPS. As I mentioned, it's pretty 
hard to screw up pg_dump, but you don't want to be trying to figure out 
how to restore in the middle of a disaster. Given all the subtleties 
involved with PITR I would actually recommend you test restoring *every* 
base backup you take, especially if you're not using one of the tools to 
manage it.


BTW, one of the biggest advantages to those tools is that a lot of 
people use them, so any bugs (in the tool or in Postgres itself) are 
more likely to be found and eliminated. It's also a big advantage for 
cloud providers (as well as tools like PGO[4]): with many thousands of 
instances running regular backups it's harder for bugs to remain hidden.


1: https://www.postgresql.org/docs/current/continuous-archiving.html
2: https://pgbackrest.org/
3: https://pgbarman.org/
4: https://github.com/CrunchyData/postgres-operator
--
Jim Nasby, Data Architect, Austin TX





Re: postgres sql assistance

2024-01-16 Thread Jim Nasby

On 1/16/24 6:34 AM, arun chirappurath wrote:
I am trying to load data from the temp table to the main table and catch 
the exceptions inside another table.


I don't have a specific answer, but do have a few comments:

- There are much easier ways to do this kind of data load. Search for 
"postgres data loader" on google.


- When you're building your dynamic SQL you almost certainly should have 
some kind of ORDER BY on the queries pulling data from 
information_schema. SQL never mandates data ordering except when you 
specifically use ORDER BY, so the fact that your fields are lining up 
right now is pure luck.
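
For example, something along these lines when pulling the column list 
(the table name is illustrative):

SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name = 'target_table'
ORDER BY ordinal_position;   -- guarantees the columns line up every time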


- EXCEPTION WHEN others is kinda dangerous, because it traps *all* 
errors. It's much safer to find the exact error code. An easy way to do 
that in psql is \errverbose [1]. In this particular case that might not 
work well since there's a bunch of different errors you could get that 
are directly related to a bad row of data. BUT, there's also a bunch of 
errors you could get that have nothing whatsoever to do with the data 
you're trying to load (like if there's a bug in your code that's 
building the INSERT statement).


- You should look at the other details you can get via GET STACKED 
DIAGNOSTICS [2]. As far as I can tell, your script as-written will 
always return the first column in the target table. Instead you should 
use COLUMN_NAME. Note that not every error will set that though.
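
A minimal sketch of that pattern (the table names are placeholders, not 
the OP's actual objects):

DO $$
DECLARE
    v_state  text;
    v_msg    text;
    v_column text;
    v_detail text;
BEGIN
    INSERT INTO target_table SELECT * FROM staging_table;
EXCEPTION WHEN others THEN
    GET STACKED DIAGNOSTICS
        v_state  = RETURNED_SQLSTATE,
        v_msg    = MESSAGE_TEXT,
        v_column = COLUMN_NAME,          -- only populated for some errors
        v_detail = PG_EXCEPTION_DETAIL;
    -- log the failure instead of aborting the whole load
    INSERT INTO load_errors (sqlstate, message, column_name, detail)
    VALUES (v_state, v_msg, v_column, v_detail);
END;
$$;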


1: 
https://www.postgresql.org/docs/current/app-psql.html#APP-PSQL-META-COMMAND-ERRVERBOSE
2: 
https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-EXCEPTION-DIAGNOSTICS

--
Jim Nasby, Data Architect, Austin TX





Re: data migration using EXTENSION tds_fdw

2024-01-16 Thread Jim Nasby

On 1/15/24 1:17 PM, Adrian Klaver wrote:


I don't use tds_fdw, so the best I can do is point you at information

Hopefully someone with practical experience will chime in with more 
specific information.


The information I could find is:

https://www.freetds.org/userguide/configs.html#Localization


One other UTF8 "gotcha" in Postgres that may or may not be relevant 
here: UTF8 considers the value 0 (as in a 0 byte; not the glyph "0") to 
be a legit character, but for various reasons Postgres can't actually 
support that, so trying to insert one into a text/varchar/char field in 
PG will give you an error (not sure what exact error you'd get, though).

--
Jim Nasby, Data Architect, Austin TX





Re: Add support for data change delta tables

2024-01-16 Thread Jim Nasby

On 1/15/24 4:40 AM, Pavel Stehule wrote:
PostgreSQL is not a project where somebody says, you will do this, and 
you will do this, in this and this priority. There are a lot of nice 
features from standard - SQL standard is big, but a) you have to have 
funds to pay the developers, b) you have to have developers with free 
time who want to do free work. Sure, you can write a proposal, but 
without a) money or b) your hard work is not a big possibility so your 
proposal will be implemented in the next few years. With the patch the 
situation is different. Then developers can see, so it is possible to 
implement, and how much code (and work) needs it.


And in this scenario, the big thing that a patch provides is clarity on 
what you're actually trying to accomplish. Your original ask is frankly 
as clear as mud to me - I have no clue what the syntax you showed is 
supposed to be doing.


If you were to ask about something concrete in plain English (vs some 
unspecified SQL syntax) then you'd be much more likely to at least get 
some kind of response about your idea. It's certainly possible to 
convince people on -hackers to work on something that you'd like to see 
done, but the base prerequisite for that is you have to communicate what 
you're actually looking for.

--
Jim Nasby, Data Architect, Austin TX





Re: Why scan all columns when we select distinct c1?

2024-01-16 Thread Jim Nasby

On 1/14/24 9:46 AM, Tom Lane wrote:

Ron Johnson  writes:

You can't scan just one column of a row-oriented table.


Technically you kinda can, depending on your definition of "scan".


The real question is why it mentions c2.

The planner did that so that the SeqScan step doesn't have to
perform a projection: it can just return (a pointer to)
the physical tuple it found in the table, without doing extra
work to form a tuple containing only c1.


The piece of info that's not mentioned here is how tuples (rows) are 
actually processed to extract individual datums (columns). The full 
details are in heap_deform_tuple() in backend/access/common/heaptuple.c, 
but the general gist is that (ignoring nulls) to read a tuple the code 
has to go datum by datum, computing the size of each datum to determine 
the physical location of the *next* datum. So if you want the 3rd datum 
in a tuple, you need to calculate the size of the 1st datum to see where 
the 2nd datum lives, and then compute the size of the 2nd datum to see 
where the 3rd one lives.


In this example, if c1 is literally the first column in the table, then 
heap_deform_tuple is free to ignore everything else in the tuple, so 
long as the code calling heap_deform_tuple() knows to ask for only 1 datum.


If all that sounds kinda expensive and tedious: you're right, it is, and 
it's why deforming tuples is generally done as late as possible. Based 
on what Tom's saying, in 9.6+ the HashAggregate code would be calling 
heap_deform_tuple(), and I'd expect it to only be retrieving c1. Without 
going through all the code, I think what's happening in 9.4 is the 
projection ends up calling heap_deform_tuple instead of the HashAgg 
code. It's still only grabbing c1; it's just doing it sooner rather than 
later. In this particular case I don't think it'd make much difference, 
but in more complicated queries it could certainly have a noticeable effect.


Either way, the explain output is kinda confusing. IMO it'd be more 
accurate if it said something like "Output: t1". And Ron's main point 
that you're going to be reading an entire row of t1 from the OS is also 
true.


BTW, there's another place where the code waits as long as possible to 
access actual data in the hopes of avoiding needless work and that's 
values that have been TOASTed. heap_deform_tuple() doesn't actually need 
to de-toast data, so it will simply return a Datum that is a "pointer" 
(not a C pointer) to the toasted data. That will only be detoasted if 
something actually needs the actual data. In some cases that can be a big 
performance win.

--
Jim Nasby, Data Architect, Austin TX





Re: How to redirect output from PostgreSQL pg_recvlogical to a file or a pipe?

2024-01-16 Thread Jim Nasby

On 1/13/24 3:34 PM, David Ventimiglia wrote:

The business problem I'm trying to solve is:

"How do I capture logical decoding events with the wal2json output 
encoder, filter them with jq, and pipe them to psql, using pg_recvlogical?"


I think the missing piece here is that you can't simply pipe JSON into 
psql and expect anything useful to happen. Are you using jq to turn the 
JSON into actual SQL statements? What does some of your jq output look like?

--
Jim Nasby, Data Architect, Austin TX





Re: What should I expect when creating many logical replication slots?

2024-01-16 Thread Jim Nasby

On 1/11/24 6:17 PM, Antonin Bas wrote:

Hi all,

I have a use case for which I am considering using Postgres Logical 
Replication, but I would like to scale up to 100 or even 200 
replication slots.


I have increased max_wal_senders and max_replication_slots to 100 (also 
making sure that max_connections is large enough). Things seem to be 
working pretty well so far based on some PoC code I have written. 
Postgres is creating a walsender process for each replication slot, as 
expected, and the memory footprint of each one is around 4MB.


So I am quite happy with the way things are working, but I am a bit 
uneasy about increasing these configuration values by 10-20x compared to 
their defaults (both max_wal_senders and max_replication_slots default 
to 10).


Is there anything I should be looking out for specifically? Is it 
considered an anti-pattern to use that many replication slots and 
walsender processes? And, when my database comes under heavy write load, 
will walsender processes start consuming a large amount of CPU / memory 
(I recognize that this is a vague question, I am still working on some 
empirical testing).


The biggest issue with logical decoding (what drives logical 
replication) is that every subscriber has to completely decode 
everything for its publication, which can be extremely memory intensive 
under certain circumstances (long-running transactions being one 
potential trigger). Decoders also have to read through all WAL traffic, 
regardless of what their publication is set to - everything runs off the 
single WAL stream.


Note that this only applies to actually decoding - simply having a large 
number of slots isn't much of an issue. Even having a large number of 
subscribers that aren't consuming isn't a resource issue (though it IS 
an issue for MVCC / vacuuming!). To test realistically, you need to have 
all the decoders you expect to support actually consuming.
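
One thing worth watching with that many slots is how much WAL each one 
is holding back, since a lagging or idle slot prevents WAL removal (the 
xmin/catalog_xmin columns in the same view show the vacuum side). A 
query along these lines helps:

SELECT slot_name,
       active,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;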


Ultimately, I'd be concerned with trying to support 100+ slots unless 
you know that your change rate isn't super high and that you don't have 
long-running transactions.

--
Jim Nasby, Data Architect, Austin TX





Re: Moving to Postgresql database

2024-01-16 Thread Jim Nasby

On 1/16/24 11:59 AM, Ron Johnson wrote:

 >      > Hi. One of the biggest pitfall of PostgreSQL, from the app-dev
 >     perspective,
 >      > is the fact any failed statement fails the whole
transaction, with
 >      > ROLLBACK as the only recourse.
 >
 >     "SAVEPOINT establishes a new savepoint within the current
transaction.
 >
 >
 > I wish it was that easy.
 > I've been scared away from using them, after reading a few
articles...
 > Also, that incurs extra round trips to the server, from the extra
commands.

The point was that '...  with ROLLBACK as the only recourse.' is not
the
case. There is an alternative, whether you want to use it being a
separate question.


Performance-killing alternatives are not really alternatives.


What's the actual performance issue here?

I'm also wondering what the use case for constantly retrying errors is.
--
Jim Nasby, Data Architect, Austin TX





Re: [EXTERNAL]Re: Refresh Materialized View Issue

2024-01-12 Thread Jim Nasby

On 1/12/24 6:34 AM, Jeremiah Bauer wrote:


That'd be a band-aid at best, because we know that the query used to
define the materialized view runs in a reasonable amount of time on its
own, as does a CTAS. So either the REFRESH is doing something odd when
writing into the new relation (which looking at the code seems very
unlikely), or REFRESH is getting a different query plan for some reason.
Unfortunately, I don't know of any easy way to get the query plan for
the REFRESH (it might be possible via gdb, but I'm not sure). We do at
least know that the REFRESH is using parallel workers.

Can you post the output of EXPLAIN ANALYZE for the SELECT? That might
provide some clues.

Sure, here is the explain analyze for the select:

HashAggregate  (cost=123986470.20..129486707.63 rows=164493082 width=24) (actual time=697250.385..741548.965 rows=59015171 loops=1)
  Group Key: id1, id2, id3
  Planned Partitions: 64  Batches: 65  Memory Usage: 328209kB  Disk Usage: 6750176kB
  ->  Gather  (cost=64653301.50..107228737.47 rows=328986164 width=24) (actual time=357598.331..594226.355 rows=161151623 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  HashAggregate  (cost=64652301.50..74329121.07 rows=164493082 width=24) (actual time=357572.082..578038.457 rows=53717208 loops=3)
              Group Key: id1, id2, id3
              Planned Partitions: 64  Batches: 65  Memory Usage: 328209kB  Disk Usage: 25774088kB
              Worker 0:  Batches: 65  Memory Usage: 328209kB  Disk Usage: 25375784kB
              Worker 1:  Batches: 65  Memory Usage: 328209kB  Disk Usage: 25382936kB
              ->  Parallel Seq Scan on large_table  (cost=0.00..29740358.40 rows=685387840 width=24) (actual time=12.954..99596.289 rows=548310252 loops=3)
Planning Time: 5.380 ms
Execution Time: 745750.371 ms


Ok, so that's using a parallel query as well, just like REFRESH, so no 
help there.


At this point I think you'd need to do some OS-level investigation using 
a tool like perf to determine what the parent process is doing once the 
workers finish.
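
Roughly: find the pid of the backend running the REFRESH and point perf 
at it while the workers are idle (the query-text match below is just an 
example):

SELECT pid, state, query
FROM pg_stat_activity
WHERE query ILIKE 'refresh materialized view%';

-- then, at the OS level, something like:  perf top -p <that pid>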


BTW, I did just come across a script[1] that will print the query plan 
for a running query. Based on what you've said I suspect it wouldn't 
tell us much here, but I wanted to mention it.


1: https://github.com/StarfishStorage/explain-running-query
--
Jim Nasby, Data Architect, Austin TX





Re: Time zone offset in to_char()

2024-01-11 Thread Jim Nasby

On 1/11/24 6:20 PM, Jim Nasby wrote:

On 1/11/24 5:53 PM, Tom Lane wrote:

Adrian Klaver  writes:

test=# select to_char(now() AT TIME ZONE 'Europe/Amsterdam', 'YYYY-MM-DD
HH24:MI:SS.US0 TZH:TZM') ;
              to_char
------------------------------------
 2024-01-12 00:44:57.5421420 +00:00
(1 row)



You end up with a string that does not have the correct offset, as AT TIME
ZONE outputs a timestamp, not a timestamptz value.


Yeah.  to_char() does not have any source for the TZ/TZH/TZM fields
other than the prevailing value of the timezone parameter, so you
really have to set that the way you want if you desire to use these
format fields.  As noted upthread, SET LOCAL together with a (dummy)
"SET timezone" clause in the function definition can be used to get
the effect of a function-local setting of the parameter.  I don't
know of another way to achieve that result above the C-code level.

    regards, tom lane


Sorry, I was implying that you could use the generated timestamp without 
timezone as a string and supply the necessary timezone:


select to_char(timestamptz(timezone('UTC',tstz) || ' CST6CDT'), 
'YYYY-MM-DD HH24:MI:SS.US0 TZH:TZM') from tstz ;

              to_char
------------------------------------
 2024-01-11 23:29:00.0493300 -06:00
(1 row)


NEVERMIND... I see now that doesn't actually work.

Perhaps we should add a variant of timezone() that handles this use-case...
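
For the record, a minimal sketch of the SET LOCAL approach Tom describes 
above, using the Europe/Amsterdam zone from the original question:

BEGIN;
SET LOCAL timezone = 'Europe/Amsterdam';
SELECT to_char(now(), 'YYYY-MM-DD HH24:MI:SS.US0 TZH:TZM');
COMMIT;

With the transaction-local timezone set this way, TZH:TZM reflects the 
Amsterdam offset instead of +00.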
--
Jim Nasby, Data Architect, Austin TX





Re: Time zone offset in to_char()

2024-01-11 Thread Jim Nasby

On 1/11/24 5:53 PM, Tom Lane wrote:

Adrian Klaver  writes:

test=# select to_char(now() AT TIME ZONE 'Europe/Amsterdam', 'YYYY-MM-DD
HH24:MI:SS.US0 TZH:TZM') ;
              to_char
------------------------------------
 2024-01-12 00:44:57.5421420 +00:00
(1 row)



You end up with a string that does not have the correct offset, as AT TIME
ZONE outputs a timestamp, not a timestamptz value.


Yeah.  to_char() does not have any source for the TZ/TZH/TZM fields
other than the prevailing value of the timezone parameter, so you
really have to set that the way you want if you desire to use these
format fields.  As noted upthread, SET LOCAL together with a (dummy)
"SET timezone" clause in the function definition can be used to get
the effect of a function-local setting of the parameter.  I don't
know of another way to achieve that result above the C-code level.

regards, tom lane


Sorry, I was implying that you could use the generated timestamp without 
timezone as a string and supply the necessary timezone:


select to_char(timestamptz(timezone('UTC',tstz) || ' CST6CDT'), 
'YYYY-MM-DD HH24:MI:SS.US0 TZH:TZM') from tstz ;

              to_char
------------------------------------
 2024-01-11 23:29:00.0493300 -06:00
(1 row)

--
Jim Nasby, Data Architect, Austin TX





Re: Refresh Materialized View Issue

2024-01-11 Thread Jim Nasby

On 1/11/24 3:40 PM, Ron Johnson wrote:
On Thu, Jan 11, 2024 at 3:50 PM Jeremiah Bauer wrote:


My question is: what indexes are on public.large_table? 
Hopefully there's a compound b-tree index on id1, id2, id3.


There is not, after further investigation.  There are these 4
indexes that involve id1, id2, and id3.  Should I try creating an
index on all three of the columns?

CREATE INDEX IF NOT EXISTS idx_large_table_id1

[snip]

CREATE INDEX IF NOT EXISTS idx_large_table_id2

[snip]

CREATE INDEX IF NOT EXISTS idx_large_table_id3

[snip]

CREATE INDEX IF NOT EXISTS idx_large_table_id2_id3

[snip]
I'd strongly think about creating such an index, since the current 
indices don't help much.


That'd be a band-aid at best, because we know that the query used to 
define the materialized view runs in a reasonable amount of time on its 
own, as does a CTAS. So either the REFRESH is doing something odd when 
writing into the new relation (which looking at the code seems very 
unlikely), or REFRESH is getting a different query plan for some reason. 
Unfortunately, I don't know of any easy way to get the query plan for 
the REFRESH (it might be possible via gdb, but I'm not sure). We do at 
least know that the REFRESH is using parallel workers.


Can you post the output of EXPLAIN ANALYZE for the SELECT? That might 
provide some clues.

--
Jim Nasby, Data Architect, Austin TX





Re: Time zone offset in to_char()

2024-01-11 Thread Jim Nasby

On 1/11/24 9:06 AM, Alban Hertroijs wrote:
I'm basically looking for a one-liner to convert a timestamptz (or a 
timestamp w/o time zone if that turns out to be more convenient) to a 
string format equal to what MS uses for their datetimeoffset type. I got 
almost there with to_char(ts, 'YYYY-MM-DD HH24:MI:SS.US0 TZH:TZM'). 
Unfortunately(?), the server lives at time zone UTC, while we need to 
convert to both UTC and Europe/Amsterdam zones. The above always gives 
me +00 for the TZH output, while it should be +01 now and +02 in the 
summer...


The issue here is that timestamptz doesn't store the original timezone; 
it always converts whatever is passed in to UTC and stores that. When 
you read the timezone back, by default it will be in the timezone 
specified in the TimeZone GUC. While there's a bunch of ways you can set 
that, for what you're looking to do I don't think any of them are 
appropriate; instead you want to use either AT TIME ZONE or timezone():


create table tstz(tstz timestamptz);
insert into tstz values(now());
SHOW timezone;
 TimeZone
----------
 CST6CDT
(1 row)

select * from tstz ;
 tstz
-------------------------------
 2024-01-11 17:29:00.04933-06
(1 row)

select timezone('UTC',tstz) from tstz ;
 timezone
----------------------------
 2024-01-11 23:29:00.04933
(1 row)

select tstz AT TIME ZONE 'UTC' from tstz ;
 timezone
----------------------------
 2024-01-11 23:29:00.04933
(1 row)

--
Jim Nasby, Data Architect, Austin TX