Re: [HACKERS] Review: Extensions Patch

David E. Wheeler Sun, 05 Dec 2010 20:59:25 -0800

On Dec 4, 2010, at 6:14 AM, Dimitri Fontaine wrote:

> Hi,
> 
> Thanks for the review, that's quite one! :)
> 
> I'm not sure to follow you all along, it seems like the reading is "try
> it first then understand and comment again", so sometimes I'm not sure
> if you're saying that docs are missing the point or that the behaviour
> ain't right.


Overall I think the docs could use a lot more fleshing out. Some of the stuff 
in the wiki would help a lot. At some point, though, I'll work over the docs 
myself and either send a patch to you or to the list (if it has been committed 
to core).

> "David E. Wheeler" <[email protected]> writes:
>> `make installcheck` fails (regression.diffs attached).
> 
> You forgot the attachment. I'd add mine if I didn't get "All 124 tests
> passed." result for the command :)

Sorry, attached now.

> (Note that I've been having problems too, but can't reproduce them
> easily, and they might have been related to forgotten maintainer-clean
> steps).

Yeah, I did that a bunch of times. This was the best run.

> Well, the exposed functionality aren't much (2 SQL commands and some
> support functions) and you need to have a contrib at hand to exercise
> them. So I'm not sure about how to add in regression tests for contrib
> infrastructure work and have them depend only on -core. I'll take ideas.

You write a very simple contrib module exclusively for testing. It doesn't have 
to actually do anything other than create a couple of objects. A domain perhaps.

> The other thing is that the very important feature here is dump and
> restore of a database containing extensions. You're not reporting about
> it, though. And I'm not sure about how to check for pg_dump content from
> pg_regress oriented tests.

Yeah, I haven't tried the dump and restore stuff. Are there no dump/restore 
tests in core? If there are, I expect all you'd have to do is add an extension 
or two to be dumped and restored.

>> Yes. Much of the execution is superuser-only, so we need to make sure
>> that such is actually the case (unit tests would help with that).
> 
> How do you mean? Trying to subvert the features by being able to run
> them as a non-superuser? How do you install tests around that?

If the tests run as a super user (I think they usually do), then all you have 
to do is create a new unprivileged user, switch to that user via SET ROLE, and 
call the function in question.

>> The pg_extension catalog docs are a little thin, but probably
>> sufficient. They appear to be duplicated, though, with the entries for
>> catalog-pg-extension and view-pg-extensions looking identical. One is
>> apparently a list of installed extensions, and the other an SRF that
>> lists installed and available extensions. But I find the duplication a
>> bit confusing. (Might be easer if I wasn't reading SGML though.)
> 
> Well yes the pg_extension catalog is filled at CREATE EXTENSION time so
> that we have an OID to register dependencies on. Then I added a SRF to
> also list available extensions so that it's easy to integrate the
> feature into pgadmin and the like.

Right.

> Now I'm open to suggestions as far as the names are involved, or maybe
> we should keep the SRF but remove the view. psql could easily enough use
> the function directly I think.

I think the catalog should be pg_created_extensions (or 
pg_installed_extensions) and the view should be pg_available_extensions.

Is there no way to have a catalog table visible to *all* databases that 
contains the list of available extensions?

>> I don't understand this paragraph at all:
>> This paragraph isn't very clear, either:
>> Examples might help.
> 
> Well, I think native English skills are required here. And I'm not too
> sure about the example.

I recognize that some documentation pages have no (or very few examples). But 
oftentimes nothing makes the utility of a feature more apparent than an 
example. I tend to use them liberally in my documentation.

As I said, I'll probably contribute a refactoring of the docs at some point.

>> Control file keys:
>> 
>> * comment -- Should mention that it's used for CREATE COMMENT ON EXTENSION, 
>> no?
>> * version -- If it's free-form, how will you do dependency-checking? Or will 
>> you?
>> * script
>> * encoding -- Seems unnecessary; one can explicitly set it in the .sql file, 
>> no?
>> * custom_variable_classes
> 
> Doc says:
>  http://pgsql.tapoueh.org/extensions/doc/html/extend-extension.html
> 
>  This control file allows for registering the extension and should
>  provide values for the keys described below.
> 
> Then there's the detail about fields. I think the context is enough to
> understand, but I'd be happy to integrate some revisions of the current
> content.

Yes, I think if you just mentioned that the comment is used for an implicit 
CREATE COMMENT that it would be useful. That's true, yes? The others are okay 
as-is.

> For the version, it has been agreed before (Developer Meeting) that v1
> would not care about inter-extension dependencies nor be smart about it,
> or I would have proposed a patch to have the following into core:
> 
>  http://packages.debian.org/sid/postgresql-9.0-debversion

OKay.

> 
>> I don't understand this paragraph:
>> 
>>     <para>
>>      The admin
>>      function <link 
>> linkend="functions-extension">pg_extension_flag_dump</link>
>>      can be used to revert the default <literal>pg_dump</literal> policy
>>      about objects that belong to an extension and force the flagged objects
>>      to be part of the backups.
>>     </para>
>> 
>> What's a pg_dump policy?
> 
> Here again not being a native speaker shows up it seems. What I mean is
> that the only goal of all this patch is for pg_dump to stop including
> extensions install scripts, but just mark the dependency.

Yeah. Say that then. Makes much more sense.

> So obviously, if your extension provides for some user editable content,
> you want to reverse the default behavior and have those content be parts
> of the dump, right?

Right.

> Now, to phrase that in a manual fashion...
> 
>> * There appears to be an irrelevant change to the docs for replace().
> 
> As detailed in another thread, the patch you're reviewing depends on
> another one that has been reviewed separately, marked as ready, but is
> not yet in. So the extension patch is including the pg_execute_sql_file
> patch, to ease your reviewer work (you don't have to deal with patch
> dependencies). So you're seeing unrelated-at-first-glance changes.

Got it.

>> Also, why are the placholders different than everywhere else? I see
>> 
>> + SELECT pg_execute_sql_string('CREATE SCHEMA @schema@;', '@schema@', 
>> 'utils');
>> 
>> and
>> 
>> + SELECT pg_execute_sql_string('CREATE SCHEMA s;', 's', 'bar');
>> 
>> Why not $1 like is used in PREPARE and user-defined functions? I think
>> the second example is particularly weird, as there's nothing to
>> indicate that "s" is a placeholder. I don't mind "@schema@", but would
>> like to see uses of such things be consistent throughout PostgreSQL
>> (and I expect that one cannot add such as placeholders to PREPARE).
> 
> The @extschema@ syntax has been proposed by Heikki, and I just adopted
> it. If another is to emerge, I'll adapt to it. While preparing this, I
> didn't want to mix mechanism and policy, so there's nothing imposed on
> the extension authors or the function users wrt to what a placeholder
> name should look like.

Okay. I think I see the need to differentiate between a variable in .control.in 
replaced by `make` and one replaced by `CREATE EXTENSION`. Something explaining 
that might be good.

>> In the CREATE EXTENSION docs, I don't understand this section:
>>      <varlistentry>
>>       <term>NO USER DATA</term>
>> Does it mean that no user data will be dumped when the extension is
>>      dumped?
> 
> Consider you have an extension with user data. At create extension time,
> you're creating a table to hold the data, and maybe you insert default
> values into it. At pg_restore time, CREATE EXTENSION will be called
> again. Do you want the extension's provided default user data to be
> restored, or the one from the backup?

Backup.

> So pg_dump will issue CREATE EXTENSION foo WITH NO USER DATA variant,
> and will also issue dump commands for objects marked as being user data,
> so that at pg_restore time you get the values from the backup when it
> makes sense.

Ah, I see.

> Updates on the manual style paragraph to explain that are welcome. It
> could be that what's needed is a better overview of the whole thing
> somewhere, but I though what's written would be enough. It seems to be
> only enough when you already know what it's all about --- and that's
> typical of Unix style man pages. Do we want to include an extension
> author tutorial?

Yes!

>> Why would multiple control files be necessary? And how would it map to
>> EXTVERSION? Would all the extensions get the one version in
>> EXTVERSION?
> 
> That part has been extensively re-hashed by Alvaro (and Heikki I think)
> already. The basis for multiple control files per extension is to be
> able to cope with contrib/spi, that contains 5 extensions:
> 
> MODULES = autoinc insert_username moddatetime refint timetravel
> EXTENSION = autoinc auto_username moddatetime refint timetravel
> EXTVERSION = $(VERSION)
> 
> Now, would you make the same layout choice and maintain different
> version numbering schemes, you still can do that, but you won't benefit
> from the EXTVERSION facility.

Okay. So a control file corresponds to a single extension with a single 
version, but that extension might have multiple files, right? No, wait, you 
have five *extensions* above (but only one version number). Why not just make 
it *on* extension in five scripts? That would be less confusing to me, given 
that you can have only one version number, it appears.

>>        <literal><function>pg_extension_flag_dump(<parameter>objid</> 
>> <type>oid</>)</function></literal>
>> Why is this necessary? Aren't all database objects "worthy" of being
>> dumped? And this:
> 
> Remember that the only goal of all that patch is *NOT* to dump the
> extension's install script?

I do now, yes.

>> What constitutes "user data" when it comes to extensions? In contrast,
>> the explanation below of "pg_extension_flag_dump" is very clear,
>> making it obvious what it's for and why it might be useful. I don't
>> see how it relates to the summary above, however. OTOH, it also says:
> 
> User data is anything that the extension's install script flagged so.

Okay. The scoping feels a little odd, given the example:


DO $$
BEGIN
IF pg_extension_with_user_data() THEN
  create schema foo;
  create table foo.bar(id serial primary key);
  perform pg_extension_flag_dump('foo.bar_id_seq'::regclass);
  perform pg_extension_flag_dump('foo.bar::regclass);
END IF;
END;
$$;

But not too bad I guess.

>>   <para>
>>    This function can only be called when loading a <acronym>SQL</> script
>>    using the command <command>CREATE
>>    EXTENSION</command>. See <xref linkend="sql-createextension">
>>    and <xref linkend="extend-extension">.
>>   </para>
>> 
>> And I have no idea how I'd call that function within another function
>> call. Does this mean that only the extension author can call it? Seems
>> like a weird scoping without precedent.
> 
> Indeed. Those two functions only make sense when used while running the
> CREATE EXTENSION sql command, so are only meaningful in the extension's
> install script. And yes that looks like a precedent.

Yeah.

>> +                            (errmsg("This function can only be used from 
>> CREATE EXTENSION.")));
>> 
>> I think that's a confusing error message, frankly. Perhaps better
>> would be something like "This function can only be called from SQL
>> files executed by CREATE EXTENSION". The docs should make that
>> clearer, too. It was only when I saw this error message that I
>> realized how it's scoped.
> 
> Any proposal here?

Yes. Change the error message to "This function can only be called from SQL
files executed by CREATE EXTENSION".

> Well maybe it's just me but I feel like wiki content is "free style"
> while the manual tone has to be sharp and concise, and the examples in
> there must be carefully chosen so as not to deprecate early. But if the
> consensus is to add the wiki content into the documentation chapter,
> that's what I'll do for next patch version.
> 
>  http://pgsql.tapoueh.org/extensions/doc/html/extend-extension.html

Yes, I definitely think that there should be a lot more information in that 
page, including an example or six. Some of the stuff from the wiki page would 
go a long way towards that. And that seems like the right place for it, too, as 
long as the CREATE EXTENSION page points to it.

> The example is in the docs, yes. The whole goal of the two functions
> pg_extension_flag_dump() and pg_extension_with_user_data() is for
> pg_restore to instruct the extension's install script if it should
> create the user data objects or bypass the step because they will be in
> the dump already.
> 
> I think we have to find a way to state that clearly in the manual.

Yes. I see now how it works, but it was a bit confusing for me to get there. I 
think I had some trouble with the term "user data," though I don't have a 
better term to offer, alas.

> 
>> Code Review
>> -----------
>> 
>> I find the use of variables in the control files confusing. It looks like 
>> this:
>> 
>>    version = 'EXTVERSION'
>> 
>> Why not use make syntax?
>> 
>>    version = $(EXTVERSION)
>> 
>> That makes it much clearer that it's a variable and provides the same
>> syntax a the extension developer will already be using in the
>> Makefile.
> 
> For starters, you should be comparing this facility with the .sql.in one
> where you can use the following notation:
> 
> CREATE OR REPLACE FUNCTION citext_cmp(citext, citext)
> RETURNS int4
> AS 'MODULE_PATHNAME'
> LANGUAGE C STRICT IMMUTABLE;

Right, I forgot about MODULE_PATHNAME in the .sql.in file.

> Then, if you insist on some strange cleverer input syntax, I'll welcome
> the C-coded patch to make that happen. You might want to know that the
> current extension patch is re-using the same parsing code that is used
> for postgresql.conf and recovery.conf, though.

It's probably fine, given the precedent of MODULE_PATHNAME. I'd rather see the 
.control file killed from the distribution altogether (that is, generated by 
`make`).

>> I've already mentioned that the @extschema@ placeholder is different from 
>> any other SQL placeholder:
>> 
>>    SET search_path = @extschema@;
>> 
>> I'm not sure of a better option, but I do think it should be made
>> consistent, if at all possible, since this is executed in the
>> database, not at `make` time.
> 
> Again, there's no technical restriction in there.
> 
> =# select replace('Are placeholders $2 and $1 better than @n...@?', '$1', 
> 'foo', '$2', 'bar', '@name@', '@');
>                   replace                   
> ---------------------------------------------
> Are placeholders bar and foo better than @?
> (1 row)
> 
> It looks like bikeshedding time. We managed to reduce it up until now,
> but well, it has to kick-in after a while :)

Oh, so they're not defined placeholders. I can do anything I want supported by 
variadic replace(). Seems kind of weird, though. Erm, except that @extschema@ 
is explicitly supported by CREATE EXTENSION, right?

>> Speaking of which, what happens if an extension has no search_path? Or
>> an old explicit one?
> 
> No. Please do not try to open this can of worms. It took one entire year
> for us to agree that extension should not concern themselves with schema
> and search_path issues, and 2 developer meetings.

Yeah, except that you *are* making us be concerned with them. I am required to 
put this line into my extension's .sql file:

    SET search_path = @extschema@;

I'd rather not have to.

> So what's in the patch is a very simple way for users to instruct the
> extension about a schema preference choice, which as been asked by
> Heikki and Robert this time. I'll try to come up with the archives link
> later, now ain't a good time for me to crawl the lists.

No, I agree that I should be able to do `CREATE EXTENSION foo WITH SCHEMA bar` 
and have it work. But it appears to *only* work if the extension author 
included the above magical line. It feels superfluous to me.

IOW, if I install extension "foo" and it does *not* have the above magic line, 
then this command will *not* do what I expect:

    CREATE EXTENSION foo WITH SCHEMA bar;

Extension "foo" will be in the public schema (usually) rather than "bar".

> The only other acceptable option would be to do nothing with schema at
> install time and to ask users to use the next patch in the series.
> 
>  CREATE EXTENSION foo;
>  ALTER EXTENSION foo SET SCHEMA bar;

Why is that the only other option?

> Now, this would have to fail if more than exactly one schema is
> depending on the extension's foo, and that's it. Considering that most
> (if not all) extensions are using only one schema, it was though nice to
> include the WITH SCHEMA bar; facility at CREATE EXTENSION time, and the
> simple implementation is what you're commenting on.

What I'm suggesting is that extension authors should not be concerned with 
schemas.

>> All except the last one could be written into the Makefile, and then
>> `make` would generate the control file for installation. Only if the
>> extension author wants to support custom_variable_classes would you
>> need to create an explicit control file. And even that could probably
>> be finessed with some sort of Make variable.
> 
> We've been trying to get there, and Tom and Alvaro both said that it's
> not worth the Makefile effort and dance, then Itagaki proposed the
> current mechanism, which is based on extension.control.in files and a
> simple implicit Make rule in pgxs.mk.

You already have key/value replacement of EXTVERSION. What's so different about 
EXTNAME, EXTSCRIPT, EXTENCODING, and EXTCOMMENT? If I don't need a custom 
variable class, why bother me with the added complexity of another file with 
another syntax?

>> BTW, you might want to put the changes to the existing contrib modules
>> in a separate patch, just to simplify things a bit. Not a big deal at
>> this point, but it might help the committer to have further
>> separation. But reviewing that bit, I was wondering, how are the
>> uninstall scripts handled? Are they necessary anymore?
> 
> Separating the only use case of the patch away does not sound like
> simplifying anything for me, but maybe that's only because I'm already
> hard at work maintaining 7 git branches (+ master) to be able to issue
> reviewable patches.

Yeah, was just a thought. I expect that everything in contrib can work even if 
it's not an extension, and could be converted in a later commit. But it's not a 
big deal to do it up-front.

> As far as the uninstall scripts goes, yes, I think they are not
> necessary anymore and I now realize I didn't remove them. Will wait for
> some more input before to add their removal in the v16 patch.

Cool.

>> ### Questions ###
>> 
>> * I assume that if an extension is installed that includes a DSO that
>> the server will need to be HUPed before you can CREATE EXTENSION, yes?
> 
> If by DSO you mean dynamic shared object (.so, .dylib or .dll) then no.

Yes, shared object. Don't you need to restart the server to have it pick up a 
new dso now?

>> * I trust that CREATE EXTENSION and DROP EXTENSION and ALTER EXTENSION
>> are transactional, yes?
> 
> Sure. But the patch you're reviewing is not featuring ALTER EXTENSION,
> that's for the next one :)

Right. I kept thinking it should be there, and then remembering it was 
separate. :-)

>> * Is there a regtype for extensions?
> 
> No. Do we need one?

Probably not.

>> * Why can only super users get a list of extensions? It might be
>> useful for me to be able to check from within a udf to see if a given
>> extension is installed, for example. I can see why you wouldn't want
>> to allow listing of extensions that are installed on the system but
>> not in the current database.
> 
> Anyone can SELECT FROM pg_extension, the catalog, and so list the
> installed extensions. Same for listing an extension's object.

Well then the limit on the function is arbitrary.

> The superuser only function is the one that has to parse the control
> files. I'm not good about security concerns, but it well seems to me
> that parsing files in $SHAREDIR should be superuser only.

Oh, agreed. Better still would be some way to have a catalog (heh) of available 
extensions readable from all databases in a cluster. Then, amusingly enough, 
one wouldn't need control files, either. But maybe that's not possible in 
PostgreSQL? I'm not sure.

> So I think what you want to happen is what's in the code. Right?

I think so. That function is used by the view, right?

>> * Is there no way to keep a list of available extensions somewhere in
>> the cluster? Seems kind of silly to parse all the control files from
>> within pg_extensions(). That can make a call to pg_extensions() pretty
>> expensive.
> 
> Where would you put that cache, and when do you invalidate its entries
> or add new ones? I don't think that listing available extensions is such
> a frequent operation that it warrant for its own cache, even if you're
> able to come up with a good enough management rule set.

No idea. I know zilch about that stuff.

> Oh, and I just did select * from pg_extensions(); 3 times here, to be
> sure we're on the same page. Time: 166,868 ms then 7,823 then 3,584.

That's pretty damned slow.

>> * I like that I can see a list of installed and available extensions,
>> but I'm not sure that \dx+ is the right command for the
>> latter. Usually + means "more columns to provide more information for
>> the same objects as you saw without the +". Is there any precedent for
>> showing different objects with + than without?
> 
> Well, I'm not sure about what the best user interface for it is in psql,
> but the feature is there already:
> 
> =# select name, comment, installed from pg_extensions limit 2;
>     name      |                  comment                   | installed 
> ---------------+--------------------------------------------+-----------
> adminpack     | Administrative functions for PostgreSQL    | t
> auto_username | functions for tracking who changed a table | f

Yeah, I'm not sure, either. I like the *functionality of \dx+, I'm just not 
sure that's the right name for it.

>> * I built my cluster with all the contrib extensions. \dx+ does a nice
>> job of listing what's available. So I ran `CREATE EXTENSION
>> hstore`. It seemed to work great, but was *very* verbose. Every SQL
>> statement was emitted to the terminal. I thought \echo was turned off
>> by CREATE EXTENSION. I was able to use hstore after that; worked
>> great. Yay!
> 
> The chatter has been discussed extensively with Alvaro and Tom in the
> pg_execute_from_file() branch already --- sorry about that, but really,
> I though that handing you a single patch would make it easier.
> 
> So this chatter is a result of 2 things:
> 
> - we're using SPI in CREATE EXTENSION
> 
> - hstore issues a WARNING on the => operator creation, and SPI reaction
>   to WARNING is to dump the query string, which is the install script
>   here.

Ah. SPI isn't able to be more precise about it, eh? Pity. Maybe for hstore we 
could silence the warning in the install script, eh?

>> * Next I tried dropping it. `DROP EXTENSION ex.citext;` did not
>> work. What if I have citext installed in two schemas and just want to
>> drop one of them? Ah, I see, one cannot have an extension with the
>> same name in different schemas. Currently the schema-qualification is
>> a syntax error, but maybe it should be a little more informative?
> 
> Well this point is not decided upon yet. I'm arguing that an extension
> is a database global object, and there's a point to be made for them to
> live in a schema.

Those sound like mutually-contradicting statements to me. But maybe it's 
because I tend to think of schemas as namespaces, and perhaps they're not?

> Currently, they depend on the schema you've been using at CREATE
> EXTENSION time, and the schema created by the script are depending on
> the extension, and as a result all is working fine and following the
> usual conventions (try drop schema cascade when its owner ain't a
> superuser but an extension has been installed there, e.g.).
> 
> It could be that the easiest way to fix the current mess is to remove
> WITH SCHEMA support at CREATE EXTENSION time and have the facility back
> in core with the ALTER EXTENSION SET SCHEMA patch.

It seems to me that if you can put an extension in a schema, then you can put 
another extension with the same name in another schema. That's how it works for 
all other schema-residing objects, AFAIK.

>> * The `pg_extension` catalog and `pg_extensions` views are both
>> useful, but too close in name. I found it confusing how similar they
>> are. It seems to me like the `pg_extension` class is well-named and
>> like the other catalog table names, but `pg_extenions` should perhaps
>> be `pg_available_extensions` or something.
> 
> Give me the name and I'll update the patch. Well, or this could be
> considered a commiter decision.

Covered above: pg_created_extensions and pg_available_extensions.

>> Creating a Third Party Extension
>> --------------------------------
>> 
>>    ! CREATE EXTENSION pair;
>>    ! NOTICE:  Installing extension 'pair' from 
>> '/usr/local/pgsql-devel/share/contrib/pair.sql
>>    ! ERROR:  could not execute sql file: 
>> '/usr/local/pgsql-devel/share/contrib/pair.sql'
>> 
>> It would be nice if it gave me more information. What failed, exactly?
>> It also fails when I use `WITH SCHEMA public`.
> 
> Yeah, that's a byproduct of using SPI_execute() in pg_execute_sql_file.
> Your script contains BEGIN; and COMMIT; commands. It's not supposed to.
> You won't find a single contrib module containing those either, with or
> without my patch.

Ah! Yep, that was the problem. Still, it would be great if I could get better 
feedback as to what the error actually is. I'm not sure I ever would have 
figured it out myself, given that it works just fine if I install it the 
old-fashioned way (psql -f).

>> * One could specify stuff *only* in the `Makefile` and let `make`
>> create a control file.
> 
> Been trying, failed. Sorry.

Keep trying man! I got your back!

>> * I could omit the @extschema@ business altogether and just let CREATE
>> EXTENSION set the path for the duration of its execution.
> 
> That won't fly.

Why not?

> The only workable solution is a 2-steps recipe:
>  CREATE EXTENSION pair;
>  ALTER EXTENSION pair SET SCHEMA utils;
> 
> And the second command will raise an error if 'pair' has registered
> dependencies for more than exactly one schema. Some work needs to be
> done to see that happen, though.

This seems orthogonal to me.

> I'm pleased to see that you mean 'Waiting on Author', because I intend
> to answer timely enough to still hope for the patch to make it in this
> commitfest, baring showstoppers.

D'oh! Sorry, yes, that's what I meant. I think I made the same mistake the last 
time I did a commitfest review. Duh. Thanks for changing it in the cf app.

> If you follow the git repository you will notice that the doc changes
> asked for in this review are pushed. I'll wait some more for us to
> decide on remaining points before to produce yet another patch version,
> though.

Sounds like a plan.

Best,

David

regression.diffs
Description: Binary data

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: Extensions Patch

Reply via email to