Re: Can user specification of a column value be required when querying a view ?

2023-11-25 Thread Peter J. Holzer
On 2023-11-25 10:49:56 -0500, Ron Johnson wrote:
> On Sat, Nov 25, 2023 at 4:49 AM Peter J. Holzer  wrote:
> On 2023-11-24 13:06:45 -0500, Ron Johnson wrote:
> > On Fri, Nov 24, 2023 at 1:01 PM Peter J. Holzer  
> wrote:
> >     On 2023-11-20 22:03:06 -0500, Ron Johnson wrote:
> >     > Or row level security.
> >
> >     Does that help here?
[...]
> It's an alternative to functions for restricting the client to only his data.

Which isn't the problem here. So RLS doesn't help.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Emitting JSON to file using COPY TO

2023-11-25 Thread Adrian Klaver

On 11/25/23 11:21, Davin Shearer wrote:

Hello!

I'm trying to emit a JSON aggregation of JSON rows to a file using COPY 
TO, but I'm running into problems with COPY TO double quoting the 
output.   Here is a minimal example that demonstrates the problem I'm 
having:




I have tried to get COPY TO to copy the results to file "as-is" by 
setting the escape and the quote characters to the empty string (''), 
but they only apply to the CSV format.


Is there a way to emit JSON results to file from within postgres?  
Effectively, nn "as-is" option to COPY TO would work well for this JSON 
use case.




Not using COPY.

See David Johnson's post for one way using the client psql.

Otherwise you will need to use any of the many ETL programs out there 
that are designed for this sort of thing.



Any assistance would be appreciated.

Thanks,
Davin


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Emitting JSON to file using COPY TO

2023-11-25 Thread David G. Johnston
On Sat, Nov 25, 2023 at 12:22 PM Davin Shearer 
wrote:

>
> Is there a way to emit JSON results to file from within postgres?
>

Use psql to directly output query results to a file instead of using COPY
to output structured output in a format you don't want.

David J.


Emitting JSON to file using COPY TO

2023-11-25 Thread Davin Shearer
Hello!

I'm trying to emit a JSON aggregation of JSON rows to a file using COPY TO,
but I'm running into problems with COPY TO double quoting the output.
Here is a minimal example that demonstrates the problem I'm having:

create table public.tbl_json_test (id int, t_test text);

-- insert text that includes double quotes
insert into public.tbl_json_test (id, t_test) values (1, 'here''s a "string"');

-- select a JSON aggregation of JSON rows
select json_agg(row_to_json(t)) from (select * from public.tbl_json_test) t;
-- this yields the correct result in proper JSON format:
-- [{"id":1,"t_test":"here's a \"string\""}]
copy (select json_agg(row_to_json(t)) from (select * from
public.tbl_json_test) t) to '/tmp/tbl_json_test.json';
-- once the JSON results are copied to file, the JSON is broken due to
double quoting:
-- [{"id":1,"t_test":"here's a \\"string\\""}]
-- this fails to be parsed using jq on the command line:
-- cat /tmp/tbl_json_test.json | jq .
-- jq: parse error: Invalid numeric literal at line 1, column 40


We populate a text field in a table with text containing at least one
double-quote (").  We then select from that table, formating the result as
a JSON aggregation of JSON rows.  At this point the JSON syntax is
correct, with the double quotes being properly quoted.  The problem is that
once we use COPY TO to emit the results to a file, the output gets quoted
again with a second escape character (\), breaking the JSON and causing a
syntax error (as we can see above using the `jq` command line tool).

I have tried to get COPY TO to copy the results to file "as-is" by setting
the escape and the quote characters to the empty string (''), but they only
apply to the CSV format.

Is there a way to emit JSON results to file from within postgres?
Effectively, nn "as-is" option to COPY TO would work well for this JSON use
case.

Any assistance would be appreciated.

Thanks,
Davin


Re: How to eliminate extra "NOT EXISTS"-query here?

2023-11-25 Thread Andreas Joseph Krogh


På lørdag 25. november 2023 kl. 17:08:28, skrev Tom Lane mailto:t...@sss.pgh.pa.us>>:
Andreas Joseph Krogh  writes:
> -- This works, but I'd rather not do the extra EXISTS
> select * from test t
> WHERE (NOT ARRAY ['x', 'y', 'z', 't']::varchar[] <@ (select array_agg(s.v) 
from
> stuffs WHERE s.test_id = t.id)
> OR NOT EXISTS (
> select * from stuff s where s.test_id = t.id
> )
> )
> ;

> So, I want to return all entries in test not having any of ARRAY ['x', 'y', 
> 'z', 't'] referenced in the table stuff, and I'd like to have test.id="d" 
> returned as well, but in order to do that I need to execute the “or not 
> exists”-query. Is it possible to avoid that?

Probably not directly, but perhaps you could improve the performance of
this query by converting the sub-selects into a left join:

select * from test t
 left join
 (select s.test_id, array_agg(s.v) as arr from stuffs group by s.test_id) ss
 on ss.test_id = t.id
WHERE (NOT ARRAY ['x', 'y', 'z', 't']::varchar[] <@ ss.arr)
 OR ss.test_id IS NULL;

Another possibility is

...
WHERE (ARRAY ['x', 'y', 'z', 't']::varchar[] <@ ss.arr) IS NOT TRUE

but I don't think that's more readable really, and it will save little.

In either case, this would result in computing array_agg once for
each group of test_id values in "stuffs", while your original computes
a similar aggregate for each row in "test". So whether this is better
depends on the relative sizes of the tables, although my proposal
avoids random access to "stuffs" so it will have some advantage.

regards, tom lane
Excellent, thanks!






--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com 
www.visena.com 
 


Re: How to eliminate extra "NOT EXISTS"-query here?

2023-11-25 Thread hector vass
Not sure you need to use array why not simple table joins, so a table with
your criteria x y z t joined to stuff to give you candidates that do match,
then left join with coalesce to add the 'd'

select

--a.id,b.test_id,

coalesce(a.id,b.test_id) as finalresult

from test a

left join (

select

test_id

from stuff a

inner join (values ('x'),('y'),('z'),('t')) b (v) using(v)

group by 1

)b on(a.id=b.test_id);


Regards
Hector Vass



On Sat, Nov 25, 2023 at 4:08 PM Tom Lane  wrote:

> Andreas Joseph Krogh  writes:
> > -- This works, but I'd rather not do the extra EXISTS
> > select * from test t
> > WHERE (NOT ARRAY ['x', 'y', 'z', 't']::varchar[] <@ (select
> array_agg(s.v) from
> > stuffs WHERE s.test_id = t.id)
> > OR NOT EXISTS (
> > select * from stuff s where s.test_id = t.id
> > )
> >  )
> > ;
>
> > So, I want to return all entries in test not having any of ARRAY ['x',
> 'y',
> > 'z', 't'] referenced in the table stuff, and I'd like to have test.id="d"
>
> > returned as well, but in order to do that I need to execute the “or not
> > exists”-query. Is it possible to avoid that?
>
> Probably not directly, but perhaps you could improve the performance of
> this query by converting the sub-selects into a left join:
>
> select * from test t
>   left join
> (select s.test_id, array_agg(s.v) as arr from stuffs group by
> s.test_id) ss
>   on ss.test_id = t.id
> WHERE (NOT ARRAY ['x', 'y', 'z', 't']::varchar[] <@ ss.arr)
>   OR ss.test_id IS NULL;
>
> Another possibility is
>
> ...
> WHERE (ARRAY ['x', 'y', 'z', 't']::varchar[] <@ ss.arr) IS NOT TRUE
>
> but I don't think that's more readable really, and it will save little.
>
> In either case, this would result in computing array_agg once for
> each group of test_id values in "stuffs", while your original computes
> a similar aggregate for each row in "test".  So whether this is better
> depends on the relative sizes of the tables, although my proposal
> avoids random access to "stuffs" so it will have some advantage.
>
> regards, tom lane
>
>
>


Re: How to eliminate extra "NOT EXISTS"-query here?

2023-11-25 Thread Tom Lane
Andreas Joseph Krogh  writes:
> -- This works, but I'd rather not do the extra EXISTS
> select * from test t
> WHERE (NOT ARRAY ['x', 'y', 'z', 't']::varchar[] <@ (select array_agg(s.v) 
> from
> stuffs WHERE s.test_id = t.id)
> OR NOT EXISTS (
> select * from stuff s where s.test_id = t.id
> )
>  )
> ;

> So, I want to return all entries in test not having any of ARRAY ['x', 'y', 
> 'z', 't'] referenced in the table stuff, and I'd like to have test.id="d" 
> returned as well, but in order to do that I need to execute the “or not 
> exists”-query. Is it possible to avoid that?

Probably not directly, but perhaps you could improve the performance of
this query by converting the sub-selects into a left join:

select * from test t
  left join
(select s.test_id, array_agg(s.v) as arr from stuffs group by s.test_id) ss
  on ss.test_id = t.id
WHERE (NOT ARRAY ['x', 'y', 'z', 't']::varchar[] <@ ss.arr)
  OR ss.test_id IS NULL;

Another possibility is

...
WHERE (ARRAY ['x', 'y', 'z', 't']::varchar[] <@ ss.arr) IS NOT TRUE

but I don't think that's more readable really, and it will save little.

In either case, this would result in computing array_agg once for
each group of test_id values in "stuffs", while your original computes
a similar aggregate for each row in "test".  So whether this is better
depends on the relative sizes of the tables, although my proposal
avoids random access to "stuffs" so it will have some advantage.

regards, tom lane




Re: Can user specification of a column value be required when querying a view ?

2023-11-25 Thread Ron Johnson
On Sat, Nov 25, 2023 at 4:49 AM Peter J. Holzer  wrote:

> On 2023-11-24 13:06:45 -0500, Ron Johnson wrote:
> > On Fri, Nov 24, 2023 at 1:01 PM Peter J. Holzer 
> wrote:
> > On 2023-11-20 22:03:06 -0500, Ron Johnson wrote:
> > > Or row level security.
> >
> > Does that help here? AIUI row level security can be used to limit
> access
> > to specific rows (e.g. user alex can access info about ssn
> '106-91-9930'
> > but not '234-56-7890') but not how many rows can be accessed in a
> single
> > query.
> >
> >
> > I don't think OP indicated that ssn in a unique key.
>
> No he didn't, but that's IMHO not relevant to the possibility of using
> row level security. If a row level security allows a user to select a
> row, that row can be selected by any query, including «select * from t».
> I don't see a way to use RLS to ensure that a query can only return a
> sufficiently small subset of the total rows a user has access to.
> How would you do that?
>

It's an alternative to functions for restricting the client to only his
data.


How to eliminate extra "NOT EXISTS"-query here?

2023-11-25 Thread Andreas Joseph Krogh


Hi, I'm testing if some dataset contains an array of elements and want to 
return all “not containing the specified array”, including entries in master 
table not being referenced.



I have the following schema:

drop table if exists stuff;
drop table if exists test;
CREATE TABLE test(
id varchar primary key
);

create table stuff(
id serial primary key,
test_id varchar NOT NULL REFERENCES test(id),
v varchar not null,
unique (test_id, v)
);

INSERT INTO test(id) values ('a');
INSERT INTO test(id) values ('b');
INSERT INTO test(id) values ('c');
INSERT INTO test(id) values ('d');

INSERT INTO stuff(test_id, v)
values ('a', 'x')
;

INSERT INTO stuff(test_id, v)
values ('b', 'x')
 , ('b', 'y')
;

INSERT INTO stuff(test_id, v)
values ('c', 'x')
 , ('c', 'y')
 , ('c', 'z')
;

select * from test t
WHERE NOT ARRAY['x']::varchar[] <@ (select array_agg(s.v) from stuff s WHERE s.
test_id= t.id)
;

select * from test t
WHERE NOT ARRAY['x', 'y']::varchar[] <@ (select array_agg(s.v) from stuff s 
WHERE s.test_id = t.id)
;

select * from test t
WHERE NOT ARRAY['x', 'y', 'z']::varchar[] <@ (select array_agg(s.v) from stuff 
s WHERE s.test_id = t.id)
;

select * from test t
WHERE NOT ARRAY ['x', 'y', 'z', 't']::varchar[] <@ (select array_agg(s.v) from 
stuffs WHERE s.test_id = t.id)
;

-- This works, but I'd rather not do the extra EXISTS
select * from test t
WHERE (NOT ARRAY ['x', 'y', 'z', 't']::varchar[] <@ (select array_agg(s.v) from
stuffs WHERE s.test_id = t.id)
OR NOT EXISTS (
select * from stuff s where s.test_id = t.id
)
 )
;



So, I want to return all entries in test not having any of ARRAY ['x', 'y', 
'z', 't'] referenced in the table stuff, and I'd like to have test.id="d" 
returned as well, but in order to do that I need to execute the “or not 
exists”-query. Is it possible to avoid that?






--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com 
www.visena.com 
 

RE: Odd Shortcut behaviour in PG14

2023-11-25 Thread Zahir Lalani
> -Original Message-
> From: Tom Lane 
> Sent: Friday, November 24, 2023 6:44 PM
> To: Zahir Lalani 
> Cc: Ron Johnson ; pgsql-
> generallists.postgresql.org 
> Subject: Re: Odd Shortcut behaviour in PG14
> 
> OK, so if this is a plpgsql function and ekey is a function variable, the 
> planner
> will definitely perceive this as a query parameterized by the value of "ekey".
> We will consider a "custom" plan where the value is directly substituted into
> the query (allowing plan-time folding based on whether ekey is zero or not),
> but we will also consider a "generic" plan where the value of ekey is not 
> known
> at plan time so no such folding occurs, and that's probably where your failure
> is happening.  Replanning for every query execution is expensive so there's a
> preference for using generic plans if we can.
> 
> I don't really understand why you wrote
> 
> > SELECT
> > CASE WHEN (ekey > 0) THEN 
> > convert_from(crypto_secretbox_open,
> 'utf8')::JSON ELSE NULL END AS edata
> > FROM crypto_secretbox_open(
> > sc.data,
> > sc.nonce,
> > boxkey)
> 
> rather than just
> 
> SELECT
> CASE WHEN (ekey > 0) THEN convert_from(
>  crypto_secretbox_open(sc.data,
>sc.nonce,
>boxkey),
>  'utf8')::JSON ELSE NULL END AS edata
> 
> I see no reason why you should feel entitled to assume that
> crypto_secretbox_open won't get called in the first formulation.
> The normal understanding of such a SELECT is that we evaluate FROM and
> then apply the SELECT expressions to its result, so the existence of a CASE in
> the SELECT expression doesn't cause the function call in FROM to get
> bypassed.
> 
> Likewise, the fact that the JOIN ON condition is false seems like a poor 
> reason
> to assume that the join's input relation won't get evaluated.
> 
> Another approach could be to force matters in the plpgsql logic:
> 
> IF ekey > 0 THEN
> RETURN QUERY query-with-decryption;
> ELSE
> RETURN QUERY query-without-decryption;
> END IF;
> 
> which seems a good deal safer than relying on undocumented details of
> planner optimization behavior.
> 
> I also wonder why you don't make crypto_secretbox_open a bit more robust -
> -- at the very least mark it strict (RETURNS NULL ON NULL INPUT).
> 
> regards, tom lane


Thank you for the detailed explanation Tom - much appreciated.

So our assumption was made as it worked for so long - ok - so we have learnt 
not to do that again

> IF ekey > 0 THEN
> RETURN QUERY query-with-decryption;
> ELSE
> RETURN QUERY query-without-decryption;
> END IF;

This is indeed what we have done - we were trying to keep code maintenance down 
as the above requires updates in two places rather than 1 - but better that 
than a broken system!

> I also wonder why you don't make crypto_secretbox_open a bit more robust -
> -- at the very least mark it strict (RETURNS NULL ON NULL INPUT).

This is a third party plugin (pgsodium).

Thank you to this group for your expertise!

Z




Re: Can user specification of a column value be required when querying a view ?

2023-11-25 Thread Peter J. Holzer
On 2023-11-24 13:06:45 -0500, Ron Johnson wrote:
> On Fri, Nov 24, 2023 at 1:01 PM Peter J. Holzer  wrote:
> On 2023-11-20 22:03:06 -0500, Ron Johnson wrote:
> > Or row level security.
> 
> Does that help here? AIUI row level security can be used to limit access
> to specific rows (e.g. user alex can access info about ssn '106-91-9930'
> but not '234-56-7890') but not how many rows can be accessed in a single
> query.
> 
> 
> I don't think OP indicated that ssn in a unique key. 

No he didn't, but that's IMHO not relevant to the possibility of using
row level security. If a row level security allows a user to select a
row, that row can be selected by any query, including «select * from t».
I don't see a way to use RLS to ensure that a query can only return a
sufficiently small subset of the total rows a user has access to.
How would you do that?

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature