Re: JSON query
> On Oct 18, 2021, at 10:02 PM, David G. Johnston wrote:
>
> (jsonb - text[]) = '{}'::jsonb …?

Aha, thank you!
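David's trick works because subtracting a text[] from a jsonb deletes those keys, so if nothing is left, no keys outside the list were present. A minimal sketch, assuming a hypothetical table `events` with jsonb column `payload` and allowed keys 'a', 'b', 'c':

```sql
-- Rows whose payload contains only keys from the allowed list
SELECT *
FROM events
WHERE (payload - ARRAY['a','b','c']) = '{}'::jsonb;
```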
JSON query
What's a good way to query a jsonb column for "no keys other than those in this list of keys", in other words "containing only keys from this list of keys"?

--
Scott Ribe
scott_r...@elevated-dev.com
https://www.linkedin.com/in/scottribe/
Re: export to parquet
> On Aug 26, 2020, at 1:11 PM, Chris Travers wrote:
>
> For simple exporting, the simplest thing is a single-node instance of Spark.

Thanks.

> You can read parquet files in Postgres using
> https://github.com/adjust/parquet_fdw if you so desire but it does not
> support writing as parquet files are basically immutable.

Yep, that's the next step. Well, really it is what I am interested in testing, but first I need my data in parquet format (and confirmation that it gets decently compressed).
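For posterity, reading an exported file back with parquet_fdw looks roughly like the sketch below; the file path and column definitions are hypothetical, and the server/option names are as I recall them from the extension's README, so check against the version you install:

```sql
CREATE EXTENSION parquet_fdw;
CREATE SERVER parquet_srv FOREIGN DATA WRAPPER parquet_fdw;

-- Hypothetical exported file; columns must match the parquet schema
CREATE FOREIGN TABLE trades_parquet (
    trade_date date,
    symbol     text,
    price      numeric
) SERVER parquet_srv
  OPTIONS (filename '/data/trades.parquet');
```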
export to parquet
I have no Hadoop, no HDFS. Just looking for the easiest way to export some PG tables into Parquet format for testing--need to determine what kind of space reduction we can get before deciding whether to look into it more. Any suggestions on particular tools? (PG 12, Linux) -- Scott Ribe scott_r...@elevated-dev.com https://www.linkedin.com/in/scottribe/
Re: privileges oddity
So, one last follow-up: perhaps \du or \du+ should show when a role is mapped that way. If I'd seen a clue to this setting, which had been made "before I got here", it would have been figured out sooner. I realize ALTER ROLE... SET... can be used to set many more defaults, and there could be some debate about how much to display with \du[+], but the fact that a role abandons all its privs and adopts a different set seems like pretty important info to surface ;-)
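In the meantime, settings made via ALTER ROLE ... SET ... can be surfaced with psql's \drds command, or queried directly; a minimal sketch:

```sql
-- Per-role settings stored by ALTER ROLE ... SET ...
-- (setrole = 0 rows, not shown here, are per-database defaults)
SELECT r.rolname, s.setconfig
FROM pg_db_role_setting s
JOIN pg_roles r ON r.oid = s.setrole;
```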
Re: privileges oddity
> On Aug 7, 2020, at 1:32 PM, Tom Lane wrote:
>
> Yes, you are. It looks like what you actually issued is
>
> ALTER USER akanzler SET role confidential_read_only;
>
> but that would have the effect that subsequent session starts would
> automatically do "SET ROLE confidential_read_only".

AHA! This is the correct answer, and it is solved now. I *know* I executed grant role properly--it's right there in the script. However, I think that SET ROLE had been accidentally misused instead of GRANT at some point in the past, *AND* that the role contained privs to the user-specific schemas when it should not have. So revoking all privs from the role and adding back the proper ones resulted in inadvertently removing privs from users who'd properly had them explicitly granted. (I've never even used SET ROLE and was unaware you could even do that!)

Anyway, thanks a million for being patient and sticking with this.
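A sketch of the distinction, using the role names from this thread:

```sql
-- Intended: make akanzler a member of the role
GRANT confidential_read_only TO akanzler;

-- What was accidentally run at some point: every new session for
-- akanzler silently starts with SET ROLE, so the role's privileges
-- replace akanzler's own
ALTER USER akanzler SET role confidential_read_only;

-- The fix: clear the per-role setting
ALTER USER akanzler RESET role;
```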
Re: privileges oddity
> On Aug 7, 2020, at 1:08 PM, Adrian Klaver wrote:
>
> "Using this command, it is possible to either add privileges or restrict
> one's privileges. If the session user role has the INHERIT attribute, then it
> automatically has all the privileges of every role that it could SET ROLE to;
> in this case SET ROLE effectively drops all the privileges assigned directly
> to the session user and to the other roles it is a member of, leaving only
> the privileges available to the named role. On the other hand, if the session
> user role has the NOINHERIT attribute, SET ROLE drops the privileges assigned
> directly to the session user and instead acquires the privileges available to
> the named role."

So it would only have removed privs if I had used SET ROLE in the session, which I had not.
Re: privileges oddity
> On Aug 7, 2020, at 12:45 PM, Tom Lane wrote:
>
> If I'm reading this correctly, you have set things up so that any
> session logging in as akanzler will immediately do "SET ROLE
> confidential_read_only", after which it's the privileges of that
> role not akanzler that determine what happens.

YES, confidential_read_only has privs on everything *except* individual users' schemas, and rolinherit was accidentally set, so that would certainly seem to be the problem. But I turned that off, and it still doesn't work--even in a new connection.
Re: privileges oddity
> On Aug 7, 2020, at 12:27 PM, Adrian Klaver wrote:
>
> So what privileges does role 'confidential_read_only' have?

read on everything

I tried creating a new user without it, just doing the same grants otherwise as for akanzler; that worked. Then I added that user to confidential_read_only, still worked.
Re: privileges oddity
> On Aug 7, 2020, at 12:17 PM, Adrian Klaver wrote:
>
> Well if this for the same line as before it represents table privileges. The
> problem is with schema access. Continuing grasping at straws:
>
> select * from pg_roles where rolname = 'akanzler';

 rolname  | rolsuper | rolinherit | rolcreaterole | rolcreatedb | rolcanlogin | rolreplication | rolconnlimit | rolpassword | rolvaliduntil | rolbypassrls |           rolconfig           |  oid
----------+----------+------------+---------------+-------------+-------------+----------------+--------------+-------------+---------------+--------------+-------------------------------+-------
 akanzler | f        | t          | f             | f           | t           | f              |           -1 |             |               | f            | {role=confidential_read_only} | 16391
Re: privileges oddity
> On Aug 7, 2020, at 11:31 AM, Scott Ribe wrote:
>
> Wondering if there's a code path somewhere that lets the default take
> precedence???

So, I changed the defaults; now I see akanzler=arwdDxt/srv_risk, problem persists.
Re: privileges oddity
> What happens if you do?:
>
> select has_schema_privilege('akanzler', 'zoewang', 'usage');

risk_oltp_prod=# select has_schema_privilege('akanzler', 'zoewang', 'usage');
 has_schema_privilege
----------------------
 t
(1 row)

> In psql what does
>
> \ddp
>
> show?

risk_oltp_prod=# \ddp
               Default access privileges
  Owner   | Schema  | Type  |    Access privileges
----------+---------+-------+-------------------------
 ...
 srv_risk | zoewang | table | akanzler=r/srv_risk    +
          |         |       | srv_risk=arwdD/srv_risk
 ...
(40 rows)

Wondering if there's a code path somewhere that lets the default take precedence???
Re: privileges oddity
Further update: creating a new user and granting all on the schema & the table works. A reboot of the server did not change anything, so the problem is in persistent state.
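The reproduction test above, as a sketch (the new role name is hypothetical; the schema/table names are from this thread):

```sql
-- A freshly created role with the same grants works fine
CREATE ROLE test_user LOGIN;
GRANT ALL ON SCHEMA zoewang TO test_user;
GRANT ALL ON ALL TABLES IN SCHEMA zoewang TO test_user;

-- As superuser, impersonate the new role to test
SET ROLE test_user;
SELECT count(*) FROM zoewang.sometable;
RESET ROLE;
```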
Re: privileges oddity
FYI, REVOKE ALL ON SCHEMA... followed by GRANT ALL ON SCHEMA... did not change anything
Re: privileges oddity
> On Aug 6, 2020, at 12:53 PM, Stephen Frost wrote:
>
> Are you 110% sure that you're actually connecting to the same instance
> in both cases (I'd say database too, but hopefully psql isn't lying to
> you about that on your prompt, but maybe double-check anyway...).

yes--double checked

> Have you re-tried from the app (maybe someone fixed it in the
> meantime)?

still seeing it from psql
Re: privileges oddity
> On Aug 6, 2020, at 12:38 PM, Tom Lane wrote:
>
> Hmph. Any chance of getting a stack trace from the point of the error?

possibly

> Also, which PG version is this?

12.3

It is probably relevant that we cleaned up roles & privs yesterday, lots of REVOKE & GRANT, and some DROP ROLE. I started out thinking I'd made a mistake with that, but now I'm starting to wonder if there's a bug I hit in some of that which fubar'd something in the system catalog...

Maybe also relevant that the original creator & owner of the schema (zoewang) was dropped after schema ownership was changed to srv_risk. (Likewise, changed ownership of tables in the schema...)
Re: privileges oddity
> On Aug 6, 2020, at 12:36 PM, Adrian Klaver wrote:
>
> No triggers or FOREIGN KEYS?

No. No keys or indexes either--that was the entire table def.
Re: privileges oddity
> On Aug 6, 2020, at 12:22 PM, Tom Lane wrote:
>
> Gonna need more context. The session-level user seems to have the
> right privileges, but maybe something is happening inside a
> security-definer function that doesn't have privileges?

The only security definer function in the db is a simple pg_shadow lookup used by pgbouncer. Hmm, I should check both direct to PG and through pgbouncer--even though he is getting connected as the correct user, per PG's error in the log.
Re: privileges oddity
> On Aug 6, 2020, at 12:22 PM, Adrian Klaver wrote:
>
> Schema for the table?

Nothing relevant:

    Column    |         Type          | Collation | Nullable | Default
--------------+-----------------------+-----------+----------+---------
 curve_name   | character varying(30) |           |          |
 curve_type   | character varying(15) |           |          |
 tenor_name   | character varying(10) |           |          |
 tenor_date   | date                  |           |          |
 value_date   | date                  |           |          |
 curve_value  | numeric               |           |          |
privileges oddity
when user akanzler tries to run the query "SELECT * FROM zoewang.sometable...", it triggers this error:

2020-08-06 17:27:27.664 UTC [15914]: [3] user=akanzler,db=risk_oltp_prod,app=[unknown],client=10.8.170.24: ERROR: permission denied for schema zoewang at character 15

--- YET ---

risk_oltp_prod=# \dn+ zoewang
                       List of schemas
  Name   |  Owner   |   Access privileges    | Description
---------+----------+------------------------+-------------
 zoewang | srv_risk | srv_risk=UC/srv_risk  +|
         |          | akanzler=UC/srv_risk  +|
         |          | srv_risk_ro=U/srv_risk |
(1 row)

HUH? (And the user also has all privs on all the tables in the schema...)

--
Scott Ribe
scott_r...@elevated-dev.com
https://www.linkedin.com/in/scottribe/
Re: is JIT available
> On Jul 27, 2020, at 9:33 PM, Tom Lane wrote:
>
> The general, non-hacker meaning of "jit is enabled" would seem to
> be pretty much what this function is already doing; and for that
> matter, the same can be said for "JIT compilation is available".
> We need something that's less tautological-looking. Maybe along
> the lines of

My problem was that it says "is enabled", then calls out just one of the conditions for it to be available, but not the other one. Either calling out no conditions, or all of them, would be clearer.
Re: is JIT available
> On Jul 27, 2020, at 6:04 PM, David Rowley wrote:
>
> "returns true if jit is enabled and JIT compilation is available in
> this session (see Chapter 31)."

That is clearer. I didn't submit a suggestion myself because I'm not clear on the actual circumstances. I know it won't be available if:

- jit is not on in config
- PG was not compiled with JIT support

But does compilation with JIT enabled and LLVM dev tools mean that all the LLVM compilation/optimization is built into the PG binaries, or does it require LLVM presence on the machine where deployed? And if so, does the function take that into account as well? I would guess the function is telling the truth under all circumstances, but I don't know for sure.

Perhaps: "returns true if JIT (see Chapter 31) is available in this session. Availability of JIT requires that PG was compiled with JIT support, JIT is enabled in config, .
Re: bad JIT decision
> On Jul 27, 2020, at 4:00 PM, Alvaro Herrera wrote:
>
> I don't quite understand why is it that a table with 1000 partitions
> means that JIT compiles the thing 1000 times. Sure, it is possible that
> some partitions have a different column layout, but it seems an easy bet
> that most cases are going to have identical column layout, and so tuple
> deforming can be shared. (I'm less sure about sharing a compile of an
> expression, since the varno would vary. But presumably there's a way to
> take the varno as an input value for the compiled expr too?) Now I
> don't actually know how this works so please correct if I misunderstand
> it.

I'm guessing it's because of inlining. You could optimize a function that takes parameters, no problem. But what's happening is inlining, with parameters, then optimizing.
Re: is JIT available
> On Jul 25, 2020, at 8:02 AM, Christoph Moench-Tegeder wrote:
>
> ## Scott Ribe (scott_r...@elevated-dev.com):
>
>> So JIT is enabled in your conf, how can you tell from within a client
>> session whether it's actually available (PG compiled with it and
>> compiler available)?
>
> pg_jit_available() boolean is JIT compilation available in this session
>
> https://www.postgresql.org/docs/12/functions-info.html

Thanks, that seems to be exactly what I was looking for. Even though the documentation is not clear, it does return false when jit = on but PG was not compiled with JIT.
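So the check from within a session boils down to comparing the setting with actual availability; a minimal sketch:

```sql
-- jit_setting can be 'on' while jit_usable is false
-- (e.g. PG built without --with-llvm)
SELECT current_setting('jit') AS jit_setting,
       pg_jit_available()     AS jit_usable;
```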
Re: is JIT available
> On Jul 25, 2020, at 6:21 AM, Pavel Stehule wrote:
>
> It shows if Postgres was compiled with JIT support.
>
> When you run EXPLAIN ANALYZE SELECT ... then you can see info about JIT
> overhead. If you don't see notices about JIT in EXPLAIN, then JIT was not
> used.

The presence of "jit = on" in the config file does not indicate whether the running PG was actually compiled with JIT. And I'm not sure whether, beyond that, compiling with JIT requires the presence of anything outside the PG install.
Re: is JIT available
> On Jul 24, 2020, at 9:55 PM, Pavel Stehule wrote:
>
> SELECT * FROM pg_config;

That doesn't tell me whether or not it can actually be used.
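For completeness, what pg_config can show is the compile-time configuration; a sketch:

```sql
-- --with-llvm in the CONFIGURE line indicates JIT support was built in,
-- though (as noted) that alone doesn't prove JIT is usable in this session
SELECT setting FROM pg_config WHERE name = 'CONFIGURE';
```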
is JIT available
So JIT is enabled in your conf; how can you tell from within a client session whether it's actually available (PG compiled with it and compiler available)? (In the other discussion I started, doing a dump and import of just the tables involved, onto a system where JIT was inadvertently not working for some reason, led me down a dead end for a bit.)

--
Scott Ribe
scott_r...@elevated-dev.com
https://www.linkedin.com/in/scottribe/
Re: bad JIT decision
> On Jul 24, 2020, at 4:37 PM, Tom Lane wrote:
>
> Yeah. I'm fairly convinced that the v12 defaults are far too low,
> because we are constantly seeing complaints of this sort.

They are certainly too low for our case; not sure whether they're way too low for folks who are not partitioning. The passive-aggressive approach would really not be good general advice for you, but I'm actually glad that in our case they were low enough to get our attention early ;-)

I think I will disable optimization, because with our partitioning scheme we will commonly see blow-ups of optimization time like this one. The inlining time in this case is still much more than the query, but it is low enough not to be noticed by users, and I think that with different variations of the parameters coming into the query, for the slower versions (more partitions requiring actual scans), inlining will help. Slowing down the fastest while speeding up the slower is a trade-off we can take.
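Disabling the optimization pass while keeping inlining comes down to the JIT cost GUCs; a sketch (the inlining threshold here is illustrative, not a recommendation):

```sql
-- -1 disables the expensive LLVM optimization pass entirely
SET jit_optimize_above_cost = -1;
-- inlining still kicks in for sufficiently expensive plans
SET jit_inline_above_cost = 100000;
```

The same settings can of course go in postgresql.conf or ALTER SYSTEM rather than per-session.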
Re: bad JIT decision
> On Jul 24, 2020, at 4:26 PM, David Rowley wrote:
>
> It does not really take into account the cost of jitting.

That is what I was missing. I read about JIT when 12 was pre-release; in re-reading after my post I see that it does not attempt to estimate JIT cost. And in thinking about it, I realize it would be next to impossible to anticipate how expensive LLVM optimization was going to be.

In the case where a set of functions is replicated across partitions, it would be possible to do them once, then project the cost of the copies. Perhaps for PG 14, as better support for the combination of JIT optimization and highly-partitioned data ;-)
bad JIT decision
I have come across a case where PG 12 with default JIT settings makes a dramatically bad decision. PG 11 without JIT executes the query in <1ms; PG 12 with JIT takes 7s--and explain analyze attributes all that time to JIT. (The plan is the same on both 11 & 12; it's just the JIT.)

It is a complex query, with joins to subqueries etc; there is a decent amount of data (~50M rows), and around 80 partitions (by date) on the main table. The particular query that I'm testing is intended as a sort of base case, in that it queries on a small set (4) of unique ids which will not match any rows, thus the complex bits never get executed, and this is reflected in the plan, where the innermost section is:

  -> Index Scan using equities_rds_id on equities e0 (cost=0.42..33.74 rows=1 width=37) (actual time=6751.892..6751.892 rows=0 loops=1)
       Index Cond: (rds_id = ANY ('{..., ..., ..., ...}'::uuid[]))
       Filter: (security_type = 'ETP'::text)
       Rows Removed by Filter: 4

And that is ultimately followed by a couple of sets of 80-ish scans of partitions, which show "never executed", pretty much as expected since there are no rows left to check. The final bit is:

  JIT:
    Functions: 683
    Options: Inlining true, Optimization true, Expressions true, Deforming true
    Timing: Generation 86.439 ms, Inlining 21.994 ms, Optimization 3900.318 ms, Emission 2561.409 ms, Total 6570.161 ms

Now I think the query is not so complex that there could possibly be 683 distinct functions. I think this count must be the result of a smaller number of functions created per partition. I can understand how that would make sense, and some testing in which I added conditions that would restrict the matches to a single partition seems to bear it out (JIT reports 79 functions in that case).

Given the magnitude of the miss in using JIT here, I am wondering: is it possible that the planner does not properly take into account the cost of JIT'ing a function for multiple partitions?
Or is it that the planner doesn't have enough info about the restrictiveness of conditions, and is therefore anticipating running the functions against a great many rows? -- Scott Ribe scott_r...@elevated-dev.com https://www.linkedin.com/in/scottribe/
Re: query, probably needs window functions
> On May 22, 2020, at 1:37 PM, Michael Lewis wrote:
>
> I believe something like this is what you want. You might be able to do it
> without a sub-query by comparing the current name value to the lag value and
> null it out if it's the same.
> ...

Thanks, that's what I needed! (And better than using lag in my case because there are 4 tables in the joins and many more columns involved. The repetition of "case when lag(...)..." would be really noisy, but it's good to know of that possibility anyway.)

One correction, just for posterity if someone else searches this question: the answer was missing "over" (and should order by the underlying name, not the blanked-out output column), so it should have been:

select case when row_number = 1 then id end as id,
       case when row_number = 1 then name end as name,
       number
from (
    select person.id, person.name, phone.number,
           row_number() over (partition by phone.person_id
                              order by phone.number) as row_number
    from person
    join phone on person.id = phone.person_id
) as sub
order by sub.name, row_number;
query, probably needs window functions
given, let's say:

create table person (id int not null, name varchar);
create table phone (id int not null, person_id int not null, number varchar);

select person.*, phone.number
from person
join phone on (person.id = phone.person_id)
order by ...

How would you get results where only the first row for a person was filled in, with the rest of that person's phones showing blanks for those columns? I'm guessing that window functions provide this capability, but I don't know how.

--
Scott Ribe
scott_r...@elevated-dev.com
https://www.linkedin.com/in/scottribe/
Re: parameter limit
> On Apr 23, 2020, at 8:48 AM, Adrian Klaver wrote:
>
> No.

Ah, thanks a lot for that info. It's not that I really normally want to have bazillions of params. But we have some moderately large inserts that are showing a sudden non-linear dropoff when scaling up, and a suspicion that the performance dropoff is NOT actually in PG. So, lots of spelunking...
parameter limit
In libpq, PQexecParams has nParams as type int. So on any reasonable platform, that's at least 4 bytes. My question then is: when I see documented limits of 65535 params in various drivers and libraries, that is NOT a restriction of libpq nor of the protocol, but rather an arbitrary limit of the driver/library, correct? -- Scott Ribe scott_r...@elevated-dev.com https://www.linkedin.com/in/scottribe/
logical replication protocol
I haven't been able to find documentation on the actual messages used in the logical replication protocol ('k' & 'w', lower case). I've figured things out mostly by reading pg_recvlogical.c, but "Read The Fine Source" doesn't seem in line with the way PG usually does it ;-) Did I miss a doc somewhere in my searches???