Re: Proposal for disk quota feature

2018-09-24 Thread Hubert Zhang
>
> The quotas, or object/resource limits, are pretty useful and necessary,
> but I don't see them as a new type of object; they are much more a
> property of existing objects. Because we already have one syntax for this
> purpose, I prefer it; it is not good to have two syntaxes for the same
> purpose.

SCHEMA and TABLE are fine with me, but as I mentioned before, ROLE is a
special case for ALTER ... SET at this moment.
TABLE and SCHEMA are both database-level: pg_class and pg_namespace both
reside within a single database. But ROLE is cluster-level; roles don't
belong to any database. At first glance, ALTER ROLE xxx SET disk_quota = xxx
means setting the quota for that user on all databases. But in our
first-stage design, a ROLE's quota is bound to a specific database. E.g.
role Jack could have a 10GB quota on database A and a 2GB quota on database B.
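For comparison, ALTER ROLE already has a per-database form for ordinary
configuration parameters, so the ALTER ... SET style could arguably express
per-database role quotas as well. A sketch (only the IN DATABASE syntax
exists today; the disk_quota parameter itself is hypothetical):

```sql
-- Existing PostgreSQL syntax: a role setting scoped to one database
ALTER ROLE jack IN DATABASE a SET work_mem = '64MB';

-- Hypothetical reuse of the same shape for per-database role quotas:
-- ALTER ROLE jack IN DATABASE a SET disk_quota = '10GB';
-- ALTER ROLE jack IN DATABASE b SET disk_quota = '2GB';
```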

The SQL syntax is not hard to modify, and I don't think it should block the
main design of the disk quota feature. Are there any comments on the design
and architecture? If not, we'll submit our patch first and continue the
discussion from there.

On Sat, Sep 22, 2018 at 3:03 PM Pavel Stehule 
wrote:

>
>
> On Sat, 22 Sep 2018 at 08:48, Hubert Zhang wrote:
>
>> But it looks redundant with the current GUC configuration and limits
>>
>> What do you mean by the current GUC configuration? Is that the general
>> block-number limit in your patch? If so, the difference between a GUC and
>> the pg_diskquota catalog is that pg_diskquota stores a different quota
>> limit per role, schema, or table instead of a single GUC value.
>>
>
> Storage is not relevant at this moment.
>
> I don't see it as consistent to set some limits via the SET command or
> ALTER X SET, and others with CREATE QUOTA ON.
>
> The quotas, or object/resource limits, are pretty useful and necessary,
> but I don't see them as a new type of object; they are much more a
> property of existing objects. Because we already have one syntax for this
> purpose, I prefer it; it is not good to have two syntaxes for the same
> purpose.
>
> So instead of CREATE DISK QUOTA ON SCHEMA xxx some_value, I prefer
>
> ALTER SCHEMA xxx SET disc_quota = xxx;
>
> The functionality is +/- the same. But ALTER XX SET was introduced first,
> and I don't feel comfortable having new syntax for a similar purpose.
>
> Regards
>
> Pavel
>
>
>
>
>
>>
>> On Sat, Sep 22, 2018 at 11:17 AM Pavel Stehule 
>> wrote:
>>
>>>
>>>
>>> On Fri, 21 Sep 2018 at 16:21, Hubert Zhang wrote:
>>>
just fast reaction - why a QUOTA object?
> Isn't ALTER SET enough?
> Something like
> ALTER TABLE a1 SET quota = '1MB';
> ALTER USER ...
> ALTER SCHEMA ...
> New DDL commands look like too hard a hammer.


It's an option. I prefer to consider where the quota settings are stored:
the CREATE DISK QUOTA way is more natural for storing quota settings in a
separate pg_diskquota catalog, while the ALTER SET way is closer to
storing quota settings in pg_class, pg_role, pg_namespace, etc. in an
integrated way.
(Note that natural/closer is not a must here: ALTER SET could also store
into pg_diskquota, and vice versa.)

>>>
>>> I have no problem with a new special table for storing this
>>> information. But it looks redundant with the current GUC configuration
>>> and limits. It can be messy to do some work with ALTER ROLE and other
>>> work via CREATE QUOTA.
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>>
Here are some differences I can think of:
1. pg_role is a global catalog, not per-database. It's harder to track a
user's disk usage across the whole cluster (consider 1000+ databases), so
the semantics of CREATE DISK QUOTA ON USER are limited: it only tracks
the user's disk usage inside the current database.
2. A separate pg_diskquota could hold more fields than just the quota
limit without adding too many fields to pg_class, e.g. a red zone to give
the user a warning, or the current disk usage of the db objects.

 On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule 
 wrote:

>
>
> On Fri, 21 Sep 2018 at 13:32, Hubert Zhang wrote:
>
>>
>>
>>
>>
>> Hi all,
>>
>> We redesigned the disk quota feature based on the comments from Pavel
>> Stehule and Chapman Flack. Here is the new design.
>>
>> Overview
>> Basically, the disk quota feature is used to support multi-tenancy
>> environments: different levels of database objects can be given a quota
>> limit to avoid overuse of disk space. A common case could be as follows:
>> the DBA enables disk quota on a specified list of databases and sets
>> disk quota limits for tables/schemas/roles in those databases. A
>> separate disk quota worker process monitors the disk usage of these
>> objects and detects the objects which exceed their quota limit. Queries
>> loading data into these “out of disk quota” tables/schemas/roles will be
>> cancelled. We are current

Re: Proposal for disk quota feature

2018-09-22 Thread Pavel Stehule
On Sat, 22 Sep 2018 at 08:48, Hubert Zhang wrote:

> But it looks redundant with the current GUC configuration and limits
>
> What do you mean by the current GUC configuration? Is that the general
> block-number limit in your patch? If so, the difference between a GUC and
> the pg_diskquota catalog is that pg_diskquota stores a different quota
> limit per role, schema, or table instead of a single GUC value.
>

Storage is not relevant at this moment.

I don't see it as consistent to set some limits via the SET command or
ALTER X SET, and others with CREATE QUOTA ON.

The quotas, or object/resource limits, are pretty useful and necessary,
but I don't see them as a new type of object; they are much more a
property of existing objects. Because we already have one syntax for this
purpose, I prefer it; it is not good to have two syntaxes for the same
purpose.

So instead of CREATE DISK QUOTA ON SCHEMA xxx some_value, I prefer

ALTER SCHEMA xxx SET disc_quota = xxx;

The functionality is +/- the same. But ALTER XX SET was introduced first,
and I don't feel comfortable having new syntax for a similar purpose.

Regards

Pavel





>
> On Sat, Sep 22, 2018 at 11:17 AM Pavel Stehule 
> wrote:
>
>>
>>
>> On Fri, 21 Sep 2018 at 16:21, Hubert Zhang wrote:
>>
>>> just fast reaction - why a QUOTA object?
Isn't ALTER SET enough?
Something like
ALTER TABLE a1 SET quota = '1MB';
ALTER USER ...
ALTER SCHEMA ...
New DDL commands look like too hard a hammer.
>>>
>>>
>>> It's an option. I prefer to consider where the quota settings are
>>> stored: the CREATE DISK QUOTA way is more natural for storing quota
>>> settings in a separate pg_diskquota catalog, while the ALTER SET way is
>>> closer to storing quota settings in pg_class, pg_role, pg_namespace,
>>> etc. in an integrated way.
>>> (Note that natural/closer is not a must here: ALTER SET could also
>>> store into pg_diskquota, and vice versa.)
>>>
>>
>> I have no problem with a new special table for storing this information.
>> But it looks redundant with the current GUC configuration and limits. It
>> can be messy to do some work with ALTER ROLE and other work via CREATE
>> QUOTA.
>>
>> Regards
>>
>> Pavel
>>
>>
>>> Here are some differences I can think of:
>>> 1. pg_role is a global catalog, not per-database. It's harder to track
>>> a user's disk usage across the whole cluster (consider 1000+
>>> databases), so the semantics of CREATE DISK QUOTA ON USER are limited:
>>> it only tracks the user's disk usage inside the current database.
>>> 2. A separate pg_diskquota could hold more fields than just the quota
>>> limit without adding too many fields to pg_class, e.g. a red zone to
>>> give the user a warning, or the current disk usage of the db objects.
>>>
>>> On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule 
>>> wrote:
>>>


On Fri, 21 Sep 2018 at 13:32, Hubert Zhang wrote:

>
>
>
>
> Hi all,
>
> We redesigned the disk quota feature based on the comments from Pavel
> Stehule and Chapman Flack. Here is the new design.
>
> Overview
> Basically, the disk quota feature is used to support multi-tenancy
> environments: different levels of database objects can be given a quota
> limit to avoid overuse of disk space. A common case could be as follows:
> the DBA enables disk quota on a specified list of databases and sets disk
> quota limits for tables/schemas/roles in those databases. A separate disk
> quota worker process monitors the disk usage of these objects and detects
> the objects which exceed their quota limit. Queries loading data into
> these “out of disk quota” tables/schemas/roles will be cancelled.
>
> We are currently working at the initial implementation stage. We would
> like to propose our idea first and get feedback from the community to
> iterate quickly.
>
> SQL Syntax (How to use disk quota)
> 1. Specify the databases with disk quota enabled in the GUC
> “diskquota_databases” in postgresql.conf and restart the database.
> 2. The DBA can set a disk quota limit per table/schema/role:
> CREATE DISK QUOTA tablea1 ON TABLE a1 WITH (quota = ‘1MB’);
> CREATE DISK QUOTA roleu1 ON USER u1 WITH (quota = ‘1GB’);
> CREATE DISK QUOTA schemas1 ON SCHEMA s1 WITH (quota = ‘3MB’);
>

just fast reaction - why a QUOTA object?

Isn't ALTER SET enough?

Something like

ALTER TABLE a1 SET quota = '1MB';
ALTER USER ...
ALTER SCHEMA ...

New DDL commands look like too hard a hammer.



> 3. Simulate a schema-out-of-quota case: suppose table a1 and table a2 are
> both under schema s1.
> INSERT INTO a1 SELECT generate_series(1,1000);
> INSERT INTO a2 SELECT generate_series(1,300);
> SELECT pg_sleep(5)
> INSERT INTO a1 SELECT generate_series(1,1000);
> ERROR:  schema's disk space quota exceeded
> DROP TABLE a2

Re: Proposal for disk quota feature

2018-09-21 Thread Hubert Zhang
>
> But it looks redundant with the current GUC configuration and limits

What do you mean by the current GUC configuration? Is that the general
block-number limit in your patch? If so, the difference between a GUC and
the pg_diskquota catalog is that pg_diskquota stores a different quota
limit per role, schema, or table instead of a single GUC value.
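To illustrate the distinction, a sketch (column names follow the
pg_diskquota definition proposed later in this thread; the rows, OIDs, and
GUC name are made up for illustration):

```sql
-- A GUC would be one cluster-wide value, e.g. in postgresql.conf:
--   diskquota.max_blocks = 100000      -- hypothetical single limit
--
-- The pg_diskquota catalog instead holds one limit per object:
-- SELECT quotaname, quotatype, quotatargetoid, quotalimit FROM pg_diskquota;
--   quotaname | quotatype | quotatargetoid | quotalimit (MB)
--   tablea1   | table     |          16384 |               1
--   roleu1    | role      |             10 |            1024
--   schemas1  | schema    |           2200 |               3
```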

On Sat, Sep 22, 2018 at 11:17 AM Pavel Stehule 
wrote:

>
>
> On Fri, 21 Sep 2018 at 16:21, Hubert Zhang wrote:
>
>> just fast reaction - why a QUOTA object?
>>> Isn't ALTER SET enough?
>>> Something like
>>> ALTER TABLE a1 SET quota = '1MB';
>>> ALTER USER ...
>>> ALTER SCHEMA ...
>>> New DDL commands look like too hard a hammer.
>>
>>
>> It's an option. I prefer to consider where the quota settings are
>> stored: the CREATE DISK QUOTA way is more natural for storing quota
>> settings in a separate pg_diskquota catalog, while the ALTER SET way is
>> closer to storing quota settings in pg_class, pg_role, pg_namespace,
>> etc. in an integrated way.
>> (Note that natural/closer is not a must here: ALTER SET could also store
>> into pg_diskquota, and vice versa.)
>>
>
> I have no problem with a new special table for storing this information.
> But it looks redundant with the current GUC configuration and limits. It
> can be messy to do some work with ALTER ROLE and other work via CREATE
> QUOTA.
>
> Regards
>
> Pavel
>
>
>> Here are some differences I can think of:
>> 1. pg_role is a global catalog, not per-database. It's harder to track a
>> user's disk usage across the whole cluster (consider 1000+ databases),
>> so the semantics of CREATE DISK QUOTA ON USER are limited: it only
>> tracks the user's disk usage inside the current database.
>> 2. A separate pg_diskquota could hold more fields than just the quota
>> limit without adding too many fields to pg_class, e.g. a red zone to
>> give the user a warning, or the current disk usage of the db objects.
>>
>> On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule 
>> wrote:
>>
>>>
>>>
>>> On Fri, 21 Sep 2018 at 13:32, Hubert Zhang wrote:
>>>




 *Hi all,We redesign disk quota feature based on the comments from Pavel
 Stehule and Chapman Flack. Here are the new design.OverviewBasically,  disk
 quota feature is used to support multi-tenancy environment, different level
 of database objects could be set a quota limit to avoid over use of disk
 space. A common case could be as follows: DBA could enable disk quota on a
 specified database list. DBA could set disk quota limit for
 tables/schemas/roles in these databases. Separate disk quota worker process
 will monitor the disk usage for these objects and detect the objects which
 exceed their quota limit. Queries loading data into these “out of disk
 quota” tables/schemas/roles will be cancelled.We are currently working at
 init implementation stage. We would like to propose our idea firstly and
 get feedbacks from community to do quick iteration.SQL Syntax (How to use
 disk quota)1 Specify the databases with disk quota enabled in GUC
 “diskquota_databases” in postgresql.conf and restart the database.2 DBA
 could set disk quota limit for table/schema/role.CREATE DISK QUOTA tablea1
 ON TABLE a1 with (quota = ‘1MB’);CREATE DISK QUOTA roleu1 ON USER u1 with
 (quota = ‘1GB’);CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota =
 ‘3MB’);*

>>>
>>> just fast reaction - why a QUOTA object?
>>>
>>> Isn't ALTER SET enough?
>>>
>>> Something like
>>>
>>> ALTER TABLE a1 SET quota = '1MB';
>>> ALTER USER ...
>>> ALTER SCHEMA ...
>>>
>>> New DDL commands look like too hard a hammer.
>>>
>>>
>>>

















3. Simulate a schema-out-of-quota case: suppose table a1 and table a2 are
both under schema s1.
INSERT INTO a1 SELECT generate_series(1,1000);
INSERT INTO a2 SELECT generate_series(1,300);
SELECT pg_sleep(5)
INSERT INTO a1 SELECT generate_series(1,1000);
ERROR:  schema's disk space quota exceeded
DROP TABLE a2;
SELECT pg_sleep(5)
INSERT INTO a1 SELECT generate_series(1,1000);
INSERT 0 1000

Architecture
Disk quota has the following components.
1. Quota Setting Store is where the disk quota settings are stored and
accessed. We plan to use a catalog table pg_diskquota to store this
information. pg_diskquota looks like:
CATALOG(pg_diskquota,6122,DiskQuotaRelationId)
{
    NameData quotaname;      /* diskquota name */
    int16    quotatype;      /* diskquota type */
    Oid      quotatargetoid; /* diskquota target db object oid */
    int32    quotalimit;     /* diskquota size limit in MB */
    int32    quotaredzone;   /* diskquota red zone in MB */
} FormData_pg_diskquota;
2. Quota Change Detector is the monitor of size changes of database
objects. We plan to use the stats collector to detect the ‘active’ table
list at the initial stage. But the stats collector has some limitation on
finding the active table which is in a runnin

Re: Proposal for disk quota feature

2018-09-21 Thread Pavel Stehule
On Fri, 21 Sep 2018 at 16:21, Hubert Zhang wrote:

> just fast reaction - why a QUOTA object?
>> Isn't ALTER SET enough?
>> Something like
>> ALTER TABLE a1 SET quota = '1MB';
>> ALTER USER ...
>> ALTER SCHEMA ...
>> New DDL commands look like too hard a hammer.
>
>
> It's an option. I prefer to consider where the quota settings are stored:
> the CREATE DISK QUOTA way is more natural for storing quota settings in a
> separate pg_diskquota catalog, while the ALTER SET way is closer to
> storing quota settings in pg_class, pg_role, pg_namespace, etc. in an
> integrated way.
> (Note that natural/closer is not a must here: ALTER SET could also store
> into pg_diskquota, and vice versa.)
>

I have no problem with a new special table for storing this information.
But it looks redundant with the current GUC configuration and limits. It
can be messy to do some work with ALTER ROLE and other work via CREATE
QUOTA.

Regards

Pavel


> Here are some differences I can think of:
> 1. pg_role is a global catalog, not per-database. It's harder to track a
> user's disk usage across the whole cluster (consider 1000+ databases), so
> the semantics of CREATE DISK QUOTA ON USER are limited: it only tracks
> the user's disk usage inside the current database.
> 2. A separate pg_diskquota could hold more fields than just the quota
> limit without adding too many fields to pg_class, e.g. a red zone to give
> the user a warning, or the current disk usage of the db objects.
>
> On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule 
> wrote:
>
>>
>>
>> On Fri, 21 Sep 2018 at 13:32, Hubert Zhang wrote:
>>
>>>
>>>
>>>
>>>
>>> *Hi all,We redesign disk quota feature based on the comments from Pavel
>>> Stehule and Chapman Flack. Here are the new design.OverviewBasically,  disk
>>> quota feature is used to support multi-tenancy environment, different level
>>> of database objects could be set a quota limit to avoid over use of disk
>>> space. A common case could be as follows: DBA could enable disk quota on a
>>> specified database list. DBA could set disk quota limit for
>>> tables/schemas/roles in these databases. Separate disk quota worker process
>>> will monitor the disk usage for these objects and detect the objects which
>>> exceed their quota limit. Queries loading data into these “out of disk
>>> quota” tables/schemas/roles will be cancelled.We are currently working at
>>> init implementation stage. We would like to propose our idea firstly and
>>> get feedbacks from community to do quick iteration.SQL Syntax (How to use
>>> disk quota)1 Specify the databases with disk quota enabled in GUC
>>> “diskquota_databases” in postgresql.conf and restart the database.2 DBA
>>> could set disk quota limit for table/schema/role.CREATE DISK QUOTA tablea1
>>> ON TABLE a1 with (quota = ‘1MB’);CREATE DISK QUOTA roleu1 ON USER u1 with
>>> (quota = ‘1GB’);CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota =
>>> ‘3MB’);*
>>>
>>
>> just fast reaction - why a QUOTA object?
>>
>> Isn't ALTER SET enough?
>>
>> Something like
>>
>> ALTER TABLE a1 SET quota = '1MB';
>> ALTER USER ...
>> ALTER SCHEMA ...
>>
>> New DDL commands look like too hard a hammer.
>>
>>
>>
>>> *3 Simulate a schema out of quota limit case: suppose table a1 and table
>>> a2 are both under schema s1.INSERT INTO a1 SELECT
>>> generate_series(1,1000);INSERT INTO a2 SELECT
>>> generate_series(1,300);SELECT pg_sleep(5)INSERT INTO a1 SELECT
>>> generate_series(1,1000);ERROR:  schema's disk space quota exceededDROP
>>> TABLE a2;SELECT pg_sleep(5)INSERT INTO a1 SELECT
>>> generate_series(1,1000);INSERT 0 1000ArchitectureDisk quota has the
>>> following components.1. Quota Setting Store is where the disk quota setting
>>> to be stored and accessed. We plan to use catalog table pg_diskquota to
>>> store these information. pg_diskquota is
>>> like:CATALOG(pg_diskquota,6122,DiskQuotaRelationId){ NameData quotaname; /*
>>> diskquota name */int16 quotatype; /* diskquota type name */ Oid
>>> quotatargetoid; /* diskquota target db object oid*/ int32 quotalimit; /*
>>> diskquota size limit in MB*/ int32 quotaredzone; /* diskquota redzone in
>>> MB*/} FormData_pg_diskquota;2. Quota Change Detector is the monitor of size
>>> change of database objects. We plan to use stat collector to detect the
>>> ‘active’ table list at initial stage. But stat collector has some
>>> limitation on finding the active table which is in a running transaction.
>>> Details see TODO section.3. Quota Size Checker is where to calculate the
>>> size and compare with quota limit for database objects. According to
>>> Pavel’s comment, autovacuum launcher and worker process could be a good
>>> reference to disk quota. So we plan to use a disk quota launcher daemon
>>> process and several disk quota worker process to finish this work. Launcher
>>> process is responsible for starting worker process based on a user defined
>>> database list from GUC. Worker process will connect to its target

Re: Proposal for disk quota feature

2018-09-21 Thread Jeremy Finzel
On Fri, Sep 21, 2018 at 9:21 AM Hubert Zhang  wrote:

> just fast reaction - why a QUOTA object?
>> Isn't ALTER SET enough?
>> Something like
>> ALTER TABLE a1 SET quota = '1MB';
>> ALTER USER ...
>> ALTER SCHEMA ...
>> New DDL commands look like too hard a hammer.
>
>
> It's an option. I prefer to consider where the quota settings are stored:
> the CREATE DISK QUOTA way is more natural for storing quota settings in a
> separate pg_diskquota catalog, while the ALTER SET way is closer to
> storing quota settings in pg_class, pg_role, pg_namespace, etc. in an
> integrated way.
> (Note that natural/closer is not a must here: ALTER SET could also store
> into pg_diskquota, and vice versa.)
>
> Here are some differences I can think of:
> 1. pg_role is a global catalog, not per-database. It's harder to track a
> user's disk usage across the whole cluster (considering 1000+ databases),
> so the semantics of CREATE DISK QUOTA ON USER are limited: it only tracks
> the user's disk usage inside the current database.
> 2. A separate pg_diskquota could hold more fields than just the quota
> limit without adding too many fields to pg_class, e.g. a red zone to give
> the user a warning, or the current disk usage of the db objects.
>
> On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule 
> wrote:
>
>>
>>
>> On Fri, 21 Sep 2018 at 13:32, Hubert Zhang wrote:
>>
>>>
>>>
>>>
>>>
>>> *Hi all,We redesign disk quota feature based on the comments from Pavel
>>> Stehule and Chapman Flack. Here are the new design.OverviewBasically,  disk
>>> quota feature is used to support multi-tenancy environment, different level
>>> of database objects could be set a quota limit to avoid over use of disk
>>> space. A common case could be as follows: DBA could enable disk quota on a
>>> specified database list. DBA could set disk quota limit for
>>> tables/schemas/roles in these databases. Separate disk quota worker process
>>> will monitor the disk usage for these objects and detect the objects which
>>> exceed their quota limit. Queries loading data into these “out of disk
>>> quota” tables/schemas/roles will be cancelled.We are currently working at
>>> init implementation stage. We would like to propose our idea firstly and
>>> get feedbacks from community to do quick iteration.SQL Syntax (How to use
>>> disk quota)1 Specify the databases with disk quota enabled in GUC
>>> “diskquota_databases” in postgresql.conf and restart the database.2 DBA
>>> could set disk quota limit for table/schema/role.CREATE DISK QUOTA tablea1
>>> ON TABLE a1 with (quota = ‘1MB’);CREATE DISK QUOTA roleu1 ON USER u1 with
>>> (quota = ‘1GB’);CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota =
>>> ‘3MB’);*
>>>
>>
>> just fast reaction - why a QUOTA object?
>>
>> Isn't ALTER SET enough?
>>
>> Something like
>>
>> ALTER TABLE a1 SET quota = '1MB';
>> ALTER USER ...
>> ALTER SCHEMA ...
>>
>> New DDL commands look like too hard a hammer.
>>
>>
>>
>>> *3 Simulate a schema out of quota limit case: suppose table a1 and table
>>> a2 are both under schema s1.INSERT INTO a1 SELECT
>>> generate_series(1,1000);INSERT INTO a2 SELECT
>>> generate_series(1,300);SELECT pg_sleep(5)INSERT INTO a1 SELECT
>>> generate_series(1,1000);ERROR:  schema's disk space quota exceededDROP
>>> TABLE a2;SELECT pg_sleep(5)INSERT INTO a1 SELECT
>>> generate_series(1,1000);INSERT 0 1000ArchitectureDisk quota has the
>>> following components.1. Quota Setting Store is where the disk quota setting
>>> to be stored and accessed. We plan to use catalog table pg_diskquota to
>>> store these information. pg_diskquota is
>>> like:CATALOG(pg_diskquota,6122,DiskQuotaRelationId){ NameData quotaname; /*
>>> diskquota name */int16 quotatype; /* diskquota type name */ Oid
>>> quotatargetoid; /* diskquota target db object oid*/ int32 quotalimit; /*
>>> diskquota size limit in MB*/ int32 quotaredzone; /* diskquota redzone in
>>> MB*/} FormData_pg_diskquota;2. Quota Change Detector is the monitor of size
>>> change of database objects. We plan to use stat collector to detect the
>>> ‘active’ table list at initial stage. But stat collector has some
>>> limitation on finding the active table which is in a running transaction.
>>> Details see TODO section.3. Quota Size Checker is where to calculate the
>>> size and compare with quota limit for database objects. According to
>>> Pavel’s comment, autovacuum launcher and worker process could be a good
>>> reference to disk quota. So we plan to use a disk quota launcher daemon
>>> process and several disk quota worker process to finish this work. Launcher
>>> process is responsible for starting worker process based on a user defined
>>> database list from GUC. Worker process will connect to its target database
>>> and monitor the disk usage for objects in this database. In init stage of
>>> worker process, it will call calculate_total_relation_size() to calculate
>>> the size for each user table. After init stage, worker process will re

Re: Proposal for disk quota feature

2018-09-21 Thread Hubert Zhang
>
> just fast reaction - why a QUOTA object?
> Isn't ALTER SET enough?
> Something like
> ALTER TABLE a1 SET quota = '1MB';
> ALTER USER ...
> ALTER SCHEMA ...
> New DDL commands look like too hard a hammer.


It's an option. I prefer to consider where the quota settings are stored:
the CREATE DISK QUOTA way is more natural for storing quota settings in a
separate pg_diskquota catalog, while the ALTER SET way is closer to
storing quota settings in pg_class, pg_role, pg_namespace, etc. in an
integrated way.
(Note that natural/closer is not a must here: ALTER SET could also store
into pg_diskquota, and vice versa.)

Here are some differences I can think of:
1. pg_role is a global catalog, not per-database. It's harder to track a
user's disk usage across the whole cluster (consider 1000+ databases), so
the semantics of CREATE DISK QUOTA ON USER are limited: it only tracks
the user's disk usage inside the current database.
2. A separate pg_diskquota could hold more fields than just the quota
limit without adding too many fields to pg_class, e.g. a red zone to give
the user a warning, or the current disk usage of the db objects.
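For reference, the two syntaxes being weighed in this thread, side by side
(neither exists in stock PostgreSQL; both are sketches taken from this
discussion):

```sql
-- Proposed dedicated DDL, backed by the separate pg_diskquota catalog:
CREATE DISK QUOTA schemas1 ON SCHEMA s1 WITH (quota = '3MB');

-- Pavel's alternative, reusing the existing ALTER ... SET style:
ALTER SCHEMA s1 SET disk_quota = '3MB';
```

Functionally the two are roughly equivalent for the simple case; the
difference is mainly where the setting is stored and whether new grammar is
added.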

On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule 
wrote:

>
>
> On Fri, 21 Sep 2018 at 13:32, Hubert Zhang wrote:
>
>>
>>
>>
>>
>> *Hi all,We redesign disk quota feature based on the comments from Pavel
>> Stehule and Chapman Flack. Here are the new design.OverviewBasically,  disk
>> quota feature is used to support multi-tenancy environment, different level
>> of database objects could be set a quota limit to avoid over use of disk
>> space. A common case could be as follows: DBA could enable disk quota on a
>> specified database list. DBA could set disk quota limit for
>> tables/schemas/roles in these databases. Separate disk quota worker process
>> will monitor the disk usage for these objects and detect the objects which
>> exceed their quota limit. Queries loading data into these “out of disk
>> quota” tables/schemas/roles will be cancelled.We are currently working at
>> init implementation stage. We would like to propose our idea firstly and
>> get feedbacks from community to do quick iteration.SQL Syntax (How to use
>> disk quota)1 Specify the databases with disk quota enabled in GUC
>> “diskquota_databases” in postgresql.conf and restart the database.2 DBA
>> could set disk quota limit for table/schema/role.CREATE DISK QUOTA tablea1
>> ON TABLE a1 with (quota = ‘1MB’);CREATE DISK QUOTA roleu1 ON USER u1 with
>> (quota = ‘1GB’);CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota =
>> ‘3MB’);*
>>
>
> just fast reaction - why a QUOTA object?
>
> Isn't ALTER SET enough?
>
> Something like
>
> ALTER TABLE a1 SET quota = '1MB';
> ALTER USER ...
> ALTER SCHEMA ...
>
> New DDL commands look like too hard a hammer.
>
>
>
>> *3 Simulate a schema out of quota limit case: suppose table a1 and table
>> a2 are both under schema s1.INSERT INTO a1 SELECT
>> generate_series(1,1000);INSERT INTO a2 SELECT
>> generate_series(1,300);SELECT pg_sleep(5)INSERT INTO a1 SELECT
>> generate_series(1,1000);ERROR:  schema's disk space quota exceededDROP
>> TABLE a2;SELECT pg_sleep(5)INSERT INTO a1 SELECT
>> generate_series(1,1000);INSERT 0 1000ArchitectureDisk quota has the
>> following components.1. Quota Setting Store is where the disk quota setting
>> to be stored and accessed. We plan to use catalog table pg_diskquota to
>> store these information. pg_diskquota is
>> like:CATALOG(pg_diskquota,6122,DiskQuotaRelationId){ NameData quotaname; /*
>> diskquota name */int16 quotatype; /* diskquota type name */ Oid
>> quotatargetoid; /* diskquota target db object oid*/ int32 quotalimit; /*
>> diskquota size limit in MB*/ int32 quotaredzone; /* diskquota redzone in
>> MB*/} FormData_pg_diskquota;2. Quota Change Detector is the monitor of size
>> change of database objects. We plan to use stat collector to detect the
>> ‘active’ table list at initial stage. But stat collector has some
>> limitation on finding the active table which is in a running transaction.
>> Details see TODO section.3. Quota Size Checker is where to calculate the
>> size and compare with quota limit for database objects. According to
>> Pavel’s comment, autovacuum launcher and worker process could be a good
>> reference to disk quota. So we plan to use a disk quota launcher daemon
>> process and several disk quota worker process to finish this work. Launcher
>> process is responsible for starting worker process based on a user defined
>> database list from GUC. Worker process will connect to its target database
>> and monitor the disk usage for objects in this database. In init stage of
>> worker process, it will call calculate_total_relation_size() to calculate
>> the size for each user table. After init stage, worker process will refresh
>> the disk model every N seconds. Refreshing will only recalculate the size
>> of tables in ‘active’ table list, which is generated by Quota Change
>> Detector to minimize the cost

Re: Proposal for disk quota feature

2018-09-21 Thread Pavel Stehule
On Fri, 21 Sep 2018 at 13:32, Hubert Zhang wrote:

>
>
>
>
> Hi all,
>
> We redesigned the disk quota feature based on the comments from Pavel
> Stehule and Chapman Flack. Here is the new design.
>
> Overview
> Basically, the disk quota feature is used to support multi-tenancy
> environments: different levels of database objects can be given a quota
> limit to avoid overuse of disk space. A common case could be as follows:
> the DBA enables disk quota on a specified list of databases and sets disk
> quota limits for tables/schemas/roles in those databases. A separate disk
> quota worker process monitors the disk usage of these objects and detects
> the objects which exceed their quota limit. Queries loading data into
> these “out of disk quota” tables/schemas/roles will be cancelled.
>
> We are currently working at the initial implementation stage. We would
> like to propose our idea first and get feedback from the community to
> iterate quickly.
>
> SQL Syntax (How to use disk quota)
> 1. Specify the databases with disk quota enabled in the GUC
> “diskquota_databases” in postgresql.conf and restart the database.
> 2. The DBA can set a disk quota limit per table/schema/role:
> CREATE DISK QUOTA tablea1 ON TABLE a1 WITH (quota = ‘1MB’);
> CREATE DISK QUOTA roleu1 ON USER u1 WITH (quota = ‘1GB’);
> CREATE DISK QUOTA schemas1 ON SCHEMA s1 WITH (quota = ‘3MB’);
>

just fast reaction - why a QUOTA object?

Isn't ALTER SET enough?

Something like

ALTER TABLE a1 SET quota = '1MB';
ALTER USER ...
ALTER SCHEMA ...

New DDL commands look like too hard a hammer.



> 3. Simulate a schema-out-of-quota case: suppose table a1 and table a2 are
> both under schema s1.
> INSERT INTO a1 SELECT generate_series(1,1000);
> INSERT INTO a2 SELECT generate_series(1,300);
> SELECT pg_sleep(5)
> INSERT INTO a1 SELECT generate_series(1,1000);
> ERROR:  schema's disk space quota exceeded
> DROP TABLE a2;
> SELECT pg_sleep(5)
> INSERT INTO a1 SELECT generate_series(1,1000);
> INSERT 0 1000
>
> Architecture
> Disk quota has the following components.
>
> 1. Quota Setting Store is where the disk quota settings are stored and
> accessed. We plan to use a catalog table pg_diskquota to store this
> information. pg_diskquota looks like:
> CATALOG(pg_diskquota,6122,DiskQuotaRelationId)
> {
>     NameData quotaname;      /* diskquota name */
>     int16    quotatype;      /* diskquota type */
>     Oid      quotatargetoid; /* diskquota target db object oid */
>     int32    quotalimit;     /* diskquota size limit in MB */
>     int32    quotaredzone;   /* diskquota red zone in MB */
> } FormData_pg_diskquota;
>
> 2. Quota Change Detector monitors size changes of database objects. We
> plan to use the stats collector to detect the ‘active’ table list at the
> initial stage, but the stats collector has some limitations on finding an
> active table that is in a running transaction. See the TODO section for
> details.
>
> 3. Quota Size Checker calculates the sizes of database objects and
> compares them with their quota limits. Per Pavel's comment, the
> autovacuum launcher and worker processes could be a good reference for
> disk quota, so we plan to use a disk quota launcher daemon process and
> several disk quota worker processes to do this work. The launcher process
> is responsible for starting worker processes based on a user-defined
> database list from a GUC. A worker process connects to its target
> database and monitors the disk usage of objects in that database. In its
> init stage, a worker process calls calculate_total_relation_size() to
> calculate the size of each user table. After the init stage, the worker
> process refreshes the disk model every N seconds. Refreshing only
> recalculates the sizes of tables in the ‘active’ table list, which is
> generated by the Quota Change Detector, to minimize the cost.
>
> 4. Quota Enforcement Operator checks the quota limits on the postgres
> backend side. We will first implement it in ExecCheckRTPerms() as
> pre-running enforcement. It checks the disk quota of tables being
> inserted into or updated, and reports an error if the quota limit of the
> table, the table's schema, or the table's owner is exceeded. As a native
> feature, we plan to add more checkpoints to do running-query enforcement.
> For example, if a disk quota has 10MB left, a query could still insert
> 1GB of data; such a query would be allowed by the pre-running enforcement
> check but cancelled by the running-query enforcement check. Therefore,
> it can improve the accurate of disk quota usage. To achieve this, we plan
> to add a checkpoint in lower API such as smgr_extened. Hence, the Quota
> Enforcement Operator will check the disk quota usage when smgr_extened is
> called. If the quota is over limited, current query will be cancelled.
> Highlight1. Native feature.Support native Create/Drop Disk Quota SQL
> statement.New catalog table pg_diskquota to store disk quota setting.2.
> Auto DML/DDL detection. Table
> create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,
>  Schema create/drop and Role create/drop will be detected by di

Re: Proposal for disk quota feature

2018-09-21 Thread Hubert Zhang
Hi all,

We redesigned the disk quota feature based on the comments from Pavel
Stehule and Chapman Flack. Here is the new design.

Overview

Basically, the disk quota feature is used to support multi-tenancy
environments: different levels of database objects can be given a quota
limit to avoid overuse of disk space. A common case could be as follows:
the DBA enables disk quota on a specified database list, then sets disk
quota limits for tables/schemas/roles in these databases. A separate disk
quota worker process monitors the disk usage of these objects and detects
the objects which exceed their quota limit. Queries loading data into
these “out of disk quota” tables/schemas/roles will be cancelled.

We are currently working at the initial implementation stage. We would
like to propose our idea first and get feedback from the community to do
quick iteration.

SQL Syntax (How to use disk quota)

1. Specify the databases with disk quota enabled in the GUC
“diskquota_databases” in postgresql.conf and restart the database.

2. The DBA can set a disk quota limit for a table/schema/role:

CREATE DISK QUOTA tablea1 ON TABLE a1 WITH (quota = ‘1MB’);
CREATE DISK QUOTA roleu1 ON USER u1 WITH (quota = ‘1GB’);
CREATE DISK QUOTA schemas1 ON SCHEMA s1 WITH (quota = ‘3MB’);

3. Simulate a schema-out-of-quota case: suppose table a1 and table a2 are
both under schema s1.

INSERT INTO a1 SELECT generate_series(1,1000);
INSERT INTO a2 SELECT generate_series(1,300);
SELECT pg_sleep(5);
INSERT INTO a1 SELECT generate_series(1,1000);
ERROR:  schema's disk space quota exceeded
DROP TABLE a2;
SELECT pg_sleep(5);
INSERT INTO a1 SELECT generate_series(1,1000);
INSERT 0 1000

Architecture

Disk quota has the following components.

1. Quota Setting Store is where the disk quota settings are stored and
accessed. We plan to use a catalog table pg_diskquota to store this
information. pg_diskquota looks like:

CATALOG(pg_diskquota,6122,DiskQuotaRelationId)
{
    NameData quotaname;  /* diskquota name */
    int16 quotatype;     /* diskquota type name */
    Oid quotatargetoid;  /* diskquota target db object oid */
    int32 quotalimit;    /* diskquota size limit in MB */
    int32 quotaredzone;  /* diskquota redzone in MB */
} FormData_pg_diskquota;

2. Quota Change Detector is the monitor of size changes of database
objects. We plan to use the stats collector to detect the ‘active’ table
list at the initial stage. But the stats collector has some limitations on
finding an active table which is in a running transaction. Details are in
the TODO section.

3. Quota Size Checker is where sizes are calculated and compared with the
quota limits of database objects. Following Pavel’s comment, the
autovacuum launcher and worker processes are a good reference for disk
quota, so we plan to use a disk quota launcher daemon process and several
disk quota worker processes to do this work. The launcher process is
responsible for starting worker processes based on a user-defined database
list from a GUC. A worker process connects to its target database and
monitors the disk usage of objects in that database. In the init stage,
the worker process calls calculate_total_relation_size() to calculate the
size of each user table. After the init stage, the worker process
refreshes the disk model every N seconds. Refreshing only recalculates the
sizes of tables in the ‘active’ table list, which is generated by the
Quota Change Detector, to minimize the cost.

4. Quota Enforcement Operator is where the quota limit is checked on the
postgres backend side. We will first implement it in ExecCheckRTPerms() as
pre-running enforcement. It checks the disk quota of tables being inserted
into or updated, and reports an error if the quota limit of the table, its
schema, or its owner is exceeded. As a native feature, we plan to add more
checkpoints to do running-query enforcement. For example, if a disk quota
has only 10MB left, a query could still insert 1GB of data: this query
would be allowed by the pre-running enforcement check, but would be
cancelled by the running-query enforcement check. This improves the
accuracy of disk quota enforcement. To achieve this, we plan to add a
checkpoint in a lower API such as smgr_extend(). Hence, the Quota
Enforcement Operator will check the disk quota usage whenever
smgr_extend() is called. If the quota is exceeded, the current query will
be cancelled.

Highlights

1. Native feature. Support native CREATE/DROP DISK QUOTA SQL statements.
New catalog table pg_diskquota to store disk quota settings.

2. Auto DML/DDL detection. Table
create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,
schema create/drop and role create/drop will be detected by disk quota
automatically.

3. Low-cost disk quota checker. The worker process of disk quota needs to
refresh the disk usage model every N seconds. Since recalculating file
sizes using the stat() system call is expensive for a large number of
files, we use an ‘active’ table list to reduce the real work in each
iteration. A basic experiment with our init-stage implementation on a
database with 20K tables shows that the refresh cost is 1% c

Re: Proposal for disk quota feature

2018-09-02 Thread Pavel Stehule
2018-09-03 3:49 GMT+02:00 Hubert Zhang :

> Thanks Pavel.
> Your patch did enforcement on storage level(md.c or we could also use
> smgr_extend). It's straight forward.
> But I prefer to implement disk_quota as a feature with following
> objectives:
> 1 set/alter disk quota setting on different database objects, e.g. user,
> database, schema etc. not only a general GUC, but we could set separate
> quota limit for a specific objects.
> 2 enforcement operator should work at two positions: before query is
> running and when query is running. The latter one's implementation maybe
> similar to your patch.
>

The patch was just an example. The resource quotas should be more complex -
per partition, table, schema, database, user - so GUCs are possible, but not
very user friendly.

Our case is specific, but not too much. The servers are used for
multidimensional analyses - and some tables can grow too fast (COPY, INSERT
SELECT). We need to enforce limits immediately. The implementation is simple,
so I did it. The same implementation at the database level or schema level
needs some more locks, so it will not be as effective. Resource management
can be very complex, and I expect it will be hard work.

Regards

Pavel


> On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule 
> wrote:
>
>> Hi
>>
>> 2018-09-02 14:18 GMT+02:00 Hubert Zhang :
>>
>>> Thanks Chapman.
>>> @Pavel,  could you please explain more about your second suggestion 
>>> "implement
>>> some quotas on storage level?"
>>>
>>
>> See attached patch - it is very simple - and good enough for our
>> purposes.
>>
>> Regards
>>
>> Pavel
>>
>>
>>
>>> We will not keep the long-lived processes attach to all databases(just
>>> like you mentioned servers with thousands of databases)
>>> And you are right, we could share ideas with autovacuum process, fork
>>> worker processes in need.
>>> "autovacuum checks for tables that have had a large number of inserted,
>>> updated or deleted tuples. These checks use the statistics collection
>>> facility"
>>> diskquota process is similar to autovacuum at caring about insert, but
>>> the difference is that it also care about vucuum full, truncate and drop.
>>> While update and delete may not be interested since no file change happens.
>>> So a separate diskquota process is preferred.
>>>
>>> So if we implemented disk quota as a full native feature, and in the
>>> first initial version I prefer to implement the following features:
>>> 1 Fork diskquota launcher process under Postmaster serverloop, which is
>>> long-lived.
>>> 2 Diskquota launcher process is responsible for creating diskquota
>>> worker process for every database.
>>> 3 DIskquota setting is stored in a separate catalog table for each
>>> database.
>>> 4 Initialization stage, Diskquota launcher process creates diskquota worker
>>> process for all the databases(traverse like autovacuum). Worker process
>>> calculates disk usage of db objects and their diskquota setting. If any
>>> db object exceeds its quota limit, put them into the blacklist in the
>>> shared memory, which will later be used by enforcement operator. Worker
>>> process exits when works are done.
>>> 5 Running stage, Diskquota launcher process creates diskquota worker
>>> process for the database with a large number of insert, copy, truncate,
>>> drop etc. or create disk quota statement. Worker process updates the file
>>> size for db objects containing the result relation, and compare with the
>>> diskquota setting. Again, if exceeds quota limit, put them into blacklist,
>>> remove from blacklist vice versa. Worker process exits when works are
>>> done and a GUC could control the frequency of worker process restart to a
>>> specific database. As you know, this GUC also controls the delay when we do
>>> enforcement.
>>> 6 Enforcement. When postgres backend executes queries, check the
>>> blacklist in shared memory to determine whether the query is allowed(before
>>> execute) or need rollback(is executing)?
>>>
>>> If we implemented disk quota as an extension, we could just use
>>> background worker to start diskquota launcher process and use
>>> RegisterDynamicBackgroundWorker() to fork child diskquota worker
>>> processes by the launcher process as suggested by @Chapman.
>>> Diskquota setting could be stored in user table in a separate schema for
>>> each database(Schema and table created by create extension statement) just
>>> like what Heikki has done in pg_quota project. But in this case, we need to
>>> create extension for each database before diskquota worker process can be
>>> set up for that database.
>>>
>>> Any comments on the above design and which is preferred, native feature
>>> or extension as the POC?
>>>
>>>
>>> -- Hubert
>>>
>>>
>>>
>>> On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule 
>>> wrote:
>>>


 2018-08-30 16:22 GMT+02:00 Chapman Flack :

> On 08/30/2018 09:57 AM, Hubert Zhang wrote:
>
> > 2 Keep one worker process for each database. But using a
> p

Re: Proposal for disk quota feature

2018-09-02 Thread Hubert Zhang
Thanks Pavel.
Your patch does enforcement at the storage level (md.c, or we could also use
smgr_extend). It's straightforward.
But I prefer to implement disk quota as a feature with the following
objectives:
1 set/alter disk quota settings on different database objects, e.g. user,
database, schema etc.; not only a general GUC, but separate quota limits
for specific objects.
2 the enforcement operator should work at two positions: before the query
runs and while the query is running. The latter's implementation may be
similar to your patch.

On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule 
wrote:

> Hi
>
> 2018-09-02 14:18 GMT+02:00 Hubert Zhang :
>
>> Thanks Chapman.
>> @Pavel,  could you please explain more about your second suggestion 
>> "implement
>> some quotas on storage level?"
>>
>
> See attached patch - it is very simple - and good enough for our purposes.
>
> Regards
>
> Pavel
>
>
>
>> We will not keep the long-lived processes attach to all databases(just
>> like you mentioned servers with thousands of databases)
>> And you are right, we could share ideas with autovacuum process, fork
>> worker processes in need.
>> "autovacuum checks for tables that have had a large number of inserted,
>> updated or deleted tuples. These checks use the statistics collection
>> facility"
>> diskquota process is similar to autovacuum at caring about insert, but
>> the difference is that it also care about vucuum full, truncate and drop.
>> While update and delete may not be interested since no file change happens.
>> So a separate diskquota process is preferred.
>>
>> So if we implemented disk quota as a full native feature, and in the
>> first initial version I prefer to implement the following features:
>> 1 Fork diskquota launcher process under Postmaster serverloop, which is
>> long-lived.
>> 2 Diskquota launcher process is responsible for creating diskquota
>> worker process for every database.
>> 3 DIskquota setting is stored in a separate catalog table for each
>> database.
>> 4 Initialization stage, Diskquota launcher process creates diskquota worker
>> process for all the databases(traverse like autovacuum). Worker process
>> calculates disk usage of db objects and their diskquota setting. If any
>> db object exceeds its quota limit, put them into the blacklist in the
>> shared memory, which will later be used by enforcement operator. Worker
>> process exits when works are done.
>> 5 Running stage, Diskquota launcher process creates diskquota worker
>> process for the database with a large number of insert, copy, truncate,
>> drop etc. or create disk quota statement. Worker process updates the file
>> size for db objects containing the result relation, and compare with the
>> diskquota setting. Again, if exceeds quota limit, put them into blacklist,
>> remove from blacklist vice versa. Worker process exits when works are
>> done and a GUC could control the frequency of worker process restart to a
>> specific database. As you know, this GUC also controls the delay when we do
>> enforcement.
>> 6 Enforcement. When postgres backend executes queries, check the
>> blacklist in shared memory to determine whether the query is allowed(before
>> execute) or need rollback(is executing)?
>>
>> If we implemented disk quota as an extension, we could just use
>> background worker to start diskquota launcher process and use
>> RegisterDynamicBackgroundWorker() to fork child diskquota worker
>> processes by the launcher process as suggested by @Chapman.
>> Diskquota setting could be stored in user table in a separate schema for
>> each database(Schema and table created by create extension statement) just
>> like what Heikki has done in pg_quota project. But in this case, we need to
>> create extension for each database before diskquota worker process can be
>> set up for that database.
>>
>> Any comments on the above design and which is preferred, native feature
>> or extension as the POC?
>>
>>
>> -- Hubert
>>
>>
>>
>> On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule 
>> wrote:
>>
>>>
>>>
>>> 2018-08-30 16:22 GMT+02:00 Chapman Flack :
>>>
 On 08/30/2018 09:57 AM, Hubert Zhang wrote:

 > 2 Keep one worker process for each database. But using a parent/global
 > quota worker process to manage the lifecycle of database level worker
 > processes. It could handle the newly created database(avoid restart
 > database) and save resource when a database is not used. But this
 needs to
 > change worker process to be hierarchical. Postmaster becomes the
 grandfather
 >  of database level worker processes in this case.

 I am using background workers this way in 9.5 at $work.

 In my case, one worker lives forever, wakes up on a set period, and
 starts a short-lived worker for every database, waiting for each
 one before starting the next.

 It was straightforward to implement. Looking back over the code,
 I see the global worker assigns its own PID to worker.bgw_

Re: Proposal for disk quota feature

2018-09-02 Thread Pavel Stehule
Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang :

> Thanks Chapman.
> @Pavel,  could you please explain more about your second suggestion "implement
> some quotas on storage level?"
>

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel



> We will not keep the long-lived processes attach to all databases(just
> like you mentioned servers with thousands of databases)
> And you are right, we could share ideas with autovacuum process, fork
> worker processes in need.
> "autovacuum checks for tables that have had a large number of inserted,
> updated or deleted tuples. These checks use the statistics collection
> facility"
> diskquota process is similar to autovacuum at caring about insert, but the
> difference is that it also care about vucuum full, truncate and drop. While
> update and delete may not be interested since no file change happens. So a
> separate diskquota process is preferred.
>
> So if we implemented disk quota as a full native feature, and in the
> first initial version I prefer to implement the following features:
> 1 Fork diskquota launcher process under Postmaster serverloop, which is
> long-lived.
> 2 Diskquota launcher process is responsible for creating diskquota worker
> process for every database.
> 3 DIskquota setting is stored in a separate catalog table for each
> database.
> 4 Initialization stage, Diskquota launcher process creates diskquota worker
> process for all the databases(traverse like autovacuum). Worker process
> calculates disk usage of db objects and their diskquota setting. If any
> db object exceeds its quota limit, put them into the blacklist in the
> shared memory, which will later be used by enforcement operator. Worker
> process exits when works are done.
> 5 Running stage, Diskquota launcher process creates diskquota worker
> process for the database with a large number of insert, copy, truncate,
> drop etc. or create disk quota statement. Worker process updates the file
> size for db objects containing the result relation, and compare with the
> diskquota setting. Again, if exceeds quota limit, put them into blacklist,
> remove from blacklist vice versa. Worker process exits when works are
> done and a GUC could control the frequency of worker process restart to a
> specific database. As you know, this GUC also controls the delay when we do
> enforcement.
> 6 Enforcement. When postgres backend executes queries, check the blacklist
> in shared memory to determine whether the query is allowed(before execute)
> or need rollback(is executing)?
>
> If we implemented disk quota as an extension, we could just use background
> worker to start diskquota launcher process and use
> RegisterDynamicBackgroundWorker() to fork child diskquota worker
> processes by the launcher process as suggested by @Chapman.
> Diskquota setting could be stored in user table in a separate schema for
> each database(Schema and table created by create extension statement) just
> like what Heikki has done in pg_quota project. But in this case, we need to
> create extension for each database before diskquota worker process can be
> set up for that database.
>
> Any comments on the above design and which is preferred, native feature or
> extension as the POC?
>
>
> -- Hubert
>
>
>
> On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule 
> wrote:
>
>>
>>
>> 2018-08-30 16:22 GMT+02:00 Chapman Flack :
>>
>>> On 08/30/2018 09:57 AM, Hubert Zhang wrote:
>>>
>>> > 2 Keep one worker process for each database. But using a parent/global
>>> > quota worker process to manage the lifecycle of database level worker
>>> > processes. It could handle the newly created database(avoid restart
>>> > database) and save resource when a database is not used. But this
>>> needs to
>>> > change worker process to be hierarchical. Postmaster becomes the
>>> grandfather
>>> >  of database level worker processes in this case.
>>>
>>> I am using background workers this way in 9.5 at $work.
>>>
>>> In my case, one worker lives forever, wakes up on a set period, and
>>> starts a short-lived worker for every database, waiting for each
>>> one before starting the next.
>>>
>>> It was straightforward to implement. Looking back over the code,
>>> I see the global worker assigns its own PID to worker.bgw_notify_pid
>>> of each of its children, and also obtains a handle for each child
>>> from RegisterDynamicBackgroundWorker().
>>>
>>> I imagine the global quota worker would prefer to start workers
>>> for every database and then just wait for notifications from any
>>> of them, but that seems equally straightforward at first glance.
>>>
>>
>> There are servers with thousands databases. Worker per database is not
>> good idea.
>>
>> It should to share ideas, code with autovacuum process.
>>
>> Not sure, how to effective implementation based on bg workers can be. On
>> servers with large set of databases, large set of tables it can identify
>> too big table too late.
>>
>> Isn't better to implement som

Re: Proposal for disk quota feature

2018-09-02 Thread Hubert Zhang
Thanks Chapman.
@Pavel, could you please explain more about your second suggestion,
"implement some quotas on storage level"?
We will not keep long-lived processes attached to all databases (just like
you mentioned, there are servers with thousands of databases).
And you are right, we could share ideas with the autovacuum process and fork
worker processes as needed.
"autovacuum checks for tables that have had a large number of inserted,
updated or deleted tuples. These checks use the statistics collection
facility"
The diskquota process is similar to autovacuum in caring about inserts, but
the difference is that it also cares about vacuum full, truncate and drop,
while update and delete may not be of interest since no file change happens.
So a separate diskquota process is preferred.

So if we implement disk quota as a full native feature, in the first
initial version I prefer to implement the following:
1 Fork a diskquota launcher process under the postmaster server loop, which
is long-lived.
2 The diskquota launcher process is responsible for creating a diskquota
worker process for every database.
3 The diskquota setting is stored in a separate catalog table for each
database.
4 Initialization stage: the diskquota launcher process creates a diskquota
worker process for all the databases (traversing them like autovacuum does).
The worker process calculates the disk usage of db objects against their
diskquota settings. If any db object exceeds its quota limit, it is put into
the blacklist in shared memory, which will later be used by the enforcement
operator. The worker process exits when its work is done.
5 Running stage: the diskquota launcher process creates a diskquota worker
process for any database with a large number of insert, copy, truncate,
drop etc. or create disk quota statements. The worker process updates the
file sizes of db objects containing the result relation and compares them
with the diskquota settings. Again, if a quota limit is exceeded, the
object is put into the blacklist, or removed from the blacklist vice versa.
The worker process exits when its work is done, and a GUC could control the
frequency of worker process restarts for a specific database. As you know,
this GUC also controls the delay before we do enforcement.
6 Enforcement: when a postgres backend executes queries, it checks the
blacklist in shared memory to determine whether the query is allowed
(before execution) or needs to be rolled back (while executing).

If we implement disk quota as an extension, we could just use a background
worker to start the diskquota launcher process and use
RegisterDynamicBackgroundWorker() to fork child diskquota worker processes
from the launcher process, as suggested by @Chapman. The diskquota setting
could be stored in a user table in a separate schema for each database
(schema and table created by the CREATE EXTENSION statement), just like
what Heikki has done in the pg_quota project. But in this case, we need to
create the extension for each database before a diskquota worker process
can be set up for that database.

Any comments on the above design, and which is preferred as the POC, native
feature or extension?


-- Hubert



On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule 
wrote:

>
>
> 2018-08-30 16:22 GMT+02:00 Chapman Flack :
>
>> On 08/30/2018 09:57 AM, Hubert Zhang wrote:
>>
>> > 2 Keep one worker process for each database. But using a parent/global
>> > quota worker process to manage the lifecycle of database level worker
>> > processes. It could handle the newly created database(avoid restart
>> > database) and save resource when a database is not used. But this needs
>> to
>> > change worker process to be hierarchical. Postmaster becomes the
>> grandfather
>> >  of database level worker processes in this case.
>>
>> I am using background workers this way in 9.5 at $work.
>>
>> In my case, one worker lives forever, wakes up on a set period, and
>> starts a short-lived worker for every database, waiting for each
>> one before starting the next.
>>
>> It was straightforward to implement. Looking back over the code,
>> I see the global worker assigns its own PID to worker.bgw_notify_pid
>> of each of its children, and also obtains a handle for each child
>> from RegisterDynamicBackgroundWorker().
>>
>> I imagine the global quota worker would prefer to start workers
>> for every database and then just wait for notifications from any
>> of them, but that seems equally straightforward at first glance.
>>
>
> There are servers with thousands databases. Worker per database is not
> good idea.
>
> It should to share ideas, code with autovacuum process.
>
> Not sure, how to effective implementation based on bg workers can be. On
> servers with large set of databases, large set of tables it can identify
> too big table too late.
>
> Isn't better to implement some quotas on storage level?
>
> Regards
>
> Pavel
>
>
>
>> -Chap
>>
>>
>


-- 
Thanks

Hubert Zhang


Re: Proposal for disk quota feature

2018-08-30 Thread Pavel Stehule
2018-08-30 16:22 GMT+02:00 Chapman Flack :

> On 08/30/2018 09:57 AM, Hubert Zhang wrote:
>
> > 2 Keep one worker process for each database. But using a parent/global
> > quota worker process to manage the lifecycle of database level worker
> > processes. It could handle the newly created database(avoid restart
> > database) and save resource when a database is not used. But this needs
> to
> > change worker process to be hierarchical. Postmaster becomes the
> grandfather
> >  of database level worker processes in this case.
>
> I am using background workers this way in 9.5 at $work.
>
> In my case, one worker lives forever, wakes up on a set period, and
> starts a short-lived worker for every database, waiting for each
> one before starting the next.
>
> It was straightforward to implement. Looking back over the code,
> I see the global worker assigns its own PID to worker.bgw_notify_pid
> of each of its children, and also obtains a handle for each child
> from RegisterDynamicBackgroundWorker().
>
> I imagine the global quota worker would prefer to start workers
> for every database and then just wait for notifications from any
> of them, but that seems equally straightforward at first glance.
>

There are servers with thousands of databases. A worker per database is not
a good idea.

It should share ideas and code with the autovacuum process.

I am not sure how effective an implementation based on bg workers can be. On
servers with a large set of databases and a large set of tables, it can
identify a too-big table too late.

Isn't it better to implement some quotas on the storage level?

Regards

Pavel



> -Chap
>
>


Re: Proposal for disk quota feature

2018-08-30 Thread Chapman Flack
On 08/30/2018 09:57 AM, Hubert Zhang wrote:

> 2 Keep one worker process for each database. But using a parent/global
> quota worker process to manage the lifecycle of database level worker
> processes. It could handle the newly created database(avoid restart
> database) and save resource when a database is not used. But this needs to
> change worker process to be hierarchical. Postmaster becomes the grandfather
>  of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

-Chap