Re: Packages, inner subprograms, and parameterizable anonymous blocks for PL/pgSQL

2021-12-17 Thread Bryn Llewellyn
> p...@bowt.ie wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> Modular design recommends exposing functionality through a purpose oriented 
>> interface and hiding all implementation details from the API’s user. A 
>> package achieves this with declarative syntax via the spec/body separation. 
>> The body encapsulates as many (top-level) subprograms as you want. Each of 
>> these is visible to all of its peers. But none is visible outside of the 
>> package unless the spec declares that it should be. This is a simple opt-in 
>> scheme.
> 
> I still don’t get it. It sounds like you’re mostly talking about 
> encapsulation, or Information hiding, for stored procedures. I can certainly 
> see how plpgsql doesn’t do those things very well, but it still seems like 
> there might be a lot of nuance that isn’t getting across. The list of 
> specific features that seem to be missing are not unreasonable, individually, 
> and yet it feels like I cannot see some bigger picture that's apparent to you.
> 
> Maybe you should explain your position by way of a motivating example, 
> involving a real world use case. Something that makes the issues concrete. 
> Are these items compelling because of how they allow an organization to 
> deploy a program in a production environment, complete with version control? 
> Does it have something to do with decoupling the mutable business data stored 
> in tables from the programs contained/run in the same database?

I can certainly make up an example. I’ll do this over the weekend. However, I 
fear that it will be time wasted because at least some of the addressees here 
who’ve expressed strong opposition to the notion of PL/pgSQL packages must 
understand very well what they’re objecting to. For example, 
pavel.steh...@gmail.com  with his “schema 
variables, LET command” work.

Anyway… I’ll give it my best shot. I’ll try to address your specific questions 
in my follow-up reply. Hang on for a couple of days, please.

Re: Packages, inner subprograms, and parameterizable anonymous blocks for PL/pgSQL

2021-12-17 Thread Peter Geoghegan
On Fri, Dec 17, 2021 at 11:43 AM Bryn Llewellyn  wrote:
> Modular design recommends exposing functionality through a purpose oriented 
> interface and hiding all implementation details from the API’s user. A 
> package achieves this with declarative syntax via the spec/body separation. 
> The body encapsulates as many (top-level) subprograms as you want. Each of 
> these is visible to all of its peers. But none is visible outside of the 
> package unless the spec declares that it should be. This is a simple opt-in 
> scheme.

I still don't get it. It sounds like you're mostly talking about
encapsulation, or Information hiding, for stored procedures. I can
certainly see how plpgsql doesn't do those things very well, but it
still seems like there might be a lot of nuance that isn't getting
across. The list of specific features that seem to be missing are not
unreasonable, individually, and yet it feels like I cannot see some
bigger picture that's apparent to you.

Maybe you should explain your position by way of a motivating example,
involving a real world use case. Something that makes the issues
concrete. Are these items compelling because of how they allow an
organization to deploy a program in a production environment, complete
with version control? Does it have something to do with decoupling the
mutable business data stored in tables from the programs contained/run
in the same database?

-- 
Peter Geoghegan




Re: Packages, inner subprograms, and parameterizable anonymous blocks for PL/pgSQL

2021-12-17 Thread Bryn Llewellyn
laurenz.a...@cybertec.at wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> The advantages are self-evident to these programmers…
> 
> I am not trying to belittle this, but when you are used to system A and start 
> working with system B you always miss some features of A, until you get to 
> know B better and figure out how to do things there.

Several people have responded with various points—some technical and some 
political. Thanks to all of you. I’m simply using the most recent turn in the 
thread as the hook for my present turn.

I made a terrible essay design choice with my “advantages of packages are 
self-evident”. I used this as a wrong-headed shortcut to save myself the effort 
of writing about modular software design principles—and to save you all the 
effort of (re)reading that stuff. So I’ll make the briefest attempt here.

«
Modular design recommends exposing functionality through a purpose oriented 
interface and hiding all implementation details from the API’s user. A package 
achieves this with declarative syntax via the spec/body separation. The body 
encapsulates as many (top-level) subprograms as you want. Each of these is 
visible to all of its peers. But none is visible outside of the package unless 
the spec declares that it should be. This is a simple opt-in scheme.

Inner subprograms are a very nice bonus because they allow locally useful 
helpers to be implemented right where they’re needed in one subprogram so that 
they’re invisible to all other subprograms in the body. The scope rules here a 
simply an extension of what PG already supports with block statements inside 
block statements.

There’s also the business of globals, at any level, and package-level globals 
that bring package state. Some programmers have religious objections here. But 
the value of constants declared in the package spec seems to be seen by all who 
program using packages as only useful.
»

I firmly believe that the intrinsic value of all of this has nothing to do with 
Oracle Database, with migrating from it to PG, or with Ada.  It’s just that 
Oracle’s PL/SQL has a working implementation. And many people find it easier to 
think when they can experiment with something concrete rather than trying to 
hold, and run, a pretty big abstract model entirely in their head.

Anyway… enough of this. I fear that even what I said above will annoy some 
folks on this list.

It seems to me that there’s sufficiently vigorous opposition to anything like 
packages for PL/pgSQL that such an enhancement will never happen. So I must 
just accept this and (as Laurenz recommends) learn the best design patterns for 
singleton PL/pgSQL functions and procedures in PG as it presently is.

When I first started to use PG, I read “43.13. Porting from Oracle PL/SQL” 
(www.postgresql.org/docs/current/plpgsql-porting.html 
). These are the 
relevant bullets: «Instead of packages, use schemas to organize your functions 
into groups.» and «Since there are no packages, there are no package-level 
variables either. This is somewhat annoying. You can keep per-session state in 
temporary tables instead.»

This says nothing about how to model the spec/body distinction. I experimented 
with various schemes. For example, one schema for the exposed API and another 
one for the to-be-hidden implementation. This depends on a careful, 
practice-based, use of the privileges scheme and implies using “security 
definer” units. But you can’t do transaction control in such a unit—and this 
brings its own problems. I tried to model package state using temporary tables 
but I hit what seemed to be a bootstrap conundrum. How, in the absence of a 
trigger that fires when a session starts, can I make sure that the table that I 
want is in place?

Can anybody please recommend a whitepaper, or similar, that explains the 
recommended practice in this space?



Re: SELECT DISTINCT scans the table?

2021-12-17 Thread Markus Demleitner
Dear Tom,

On Thu, Dec 16, 2021 at 03:47:57PM -0500, Tom Lane wrote:
> Markus Demleitner  writes:
> > Maximally stripped down, my problem is that
> 
> >   select distinct 300 from 
> 
> > seqscans  (at least in PostgreSQL 11.14).  To me, it seems
> > obvious that this ought be be just one row containing 300 once
> > Postgres has established that  is nonempty.
> 
> That seems like the sort of optimization that we absolutely should
> not spend cycles looking for.  If it were a trivial change consuming
> no detectable number of planning cycles, maybe it would be worth the
> development and maintenance effort; though I'd be dubious about the
> value.  But the fact that it'd have to be transformed into something
> testing whether the table is nonempty makes it fairly nontrivial.
> I doubt it's worth the development cost plus the cycles imposed
> on every other query.

I certainly understand that reasoning for this particular example.
However, in my actual use case, the one with the view consisting of a
large union containing constants from the original mail, 

  CREATE VIEW a_view AS (
SELECT 'abc' as coll, ...
FROM table1   -- with perhaps 1e6 rows
  UNION
SELECT 'def' as coll, ...
FROM table2   -- with perhaps another 1e7 rows
  UNION
SELECT coll, ... -- with an index on table3.coll
FROM table3
  ...)

being able to factor out constants would make a difference of
milliseconds versus a long time (~ a minute in my case, with about
1e8 total rows) when running SELECT DISTINCT coll FROM a_view.

Is there, perhaps, a non-obvious way to give the planner a nudge to
exploit the constant-ness of coll in table1 and table2?

Thanks,

   Markus