subject:"\[HACKERS\] \[Patch\] Temporary tables that do not bloat pg

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-31 Thread Corey Huinker

On Wed, Aug 31, 2016 at 6:07 PM, Andres Freund  wrote:

>
> In my experience pg attribute is usually the worst affected. Many tech
> takes won't even have stays entries...
>
>
Mine too. One database currently has a 400GB pg_attribute table, because we
chew through temp tables like popcorn.

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-31 Thread Andres Freund



On August 31, 2016 3:00:15 PM PDT, Tomas Vondra  
wrote:
>
>
>On 08/31/2016 11:43 PM, Andres Freund wrote:
>> On 2016-08-31 23:40:46 +0200, Tomas Vondra wrote:
>>> It's an improvement (and it's pretty much exactly what I proposed
>>> upthread). But it does not solve the problems with pg_statistic for
>>> example (each backend needs it's own statistics. So we'd either
>bloat
>>> the pg_statistic (if we manage to solve the problem that the table
>has
>>> the same oid in all backends), or we would need in-memory tuples
>(just
>>> like discussed in the thread so far).
>> 
>> Creating a session private version of pg_statistic would be fairly
>> simple.
>
>Sure. I'm just saying it's not as simple as overriding relpath.
>
>ISTM we only need the pg_statistics (as other catalogs are connected to
>the pg_class entry), which does not have the dependency issues. Or do
>we
>need other catalogs?

In my experience pg attribute is usually the worst affected. Many tech takes 
won't even have stays entries...

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-31 Thread Tomas Vondra



On 08/31/2016 11:43 PM, Andres Freund wrote:
> On 2016-08-31 23:40:46 +0200, Tomas Vondra wrote:
>> It's an improvement (and it's pretty much exactly what I proposed
>> upthread). But it does not solve the problems with pg_statistic for
>> example (each backend needs it's own statistics. So we'd either bloat
>> the pg_statistic (if we manage to solve the problem that the table has
>> the same oid in all backends), or we would need in-memory tuples (just
>> like discussed in the thread so far).
> 
> Creating a session private version of pg_statistic would be fairly
> simple.

Sure. I'm just saying it's not as simple as overriding relpath.

ISTM we only need the pg_statistics (as other catalogs are connected to
the pg_class entry), which does not have the dependency issues. Or do we
need other catalogs?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-31 Thread Andres Freund

On 2016-08-31 23:40:46 +0200, Tomas Vondra wrote:
> It's an improvement (and it's pretty much exactly what I proposed
> upthread). But it does not solve the problems with pg_statistic for
> example (each backend needs it's own statistics. So we'd either bloat
> the pg_statistic (if we manage to solve the problem that the table has
> the same oid in all backends), or we would need in-memory tuples (just
> like discussed in the thread so far).

Creating a session private version of pg_statistic would be fairly
simple.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-31 Thread Tomas Vondra



On 08/31/2016 09:20 PM, Vik Fearing wrote:
> On 08/24/2016 06:16 PM, Robert Haas wrote:
>> On Tue, Aug 23, 2016 at 6:11 PM, Tomas Vondra
>>  wrote:
>>> Could someone please explain how the unlogged tables are supposed to fix the
>>> catalog bloat problem, as stated in the initial patch submission? We'd still
>>> need to insert/delete the catalog rows when creating/dropping the temporary
>>> tables, causing the bloat. Or is there something I'm missing?
>>
>> No, not really.  Jim just asked if the idea of partitioning the
>> columns was completely dead in the water, and I said, no, you could
>> theoretically salvage it.  Whether that does you much good is another
>> question.
>>
>> IMV, the point here is that you MUST have globally visible dependency
>> entries for this to work sanely.  If they're not in a catalog, they
>> have to be someplace else, and backend-private memory isn't good
>> enough, because that's not globally visible.  Until we've got a
>> strategy for that problem, this whole effort is going nowhere - even
>> though in other respects it may be a terrific idea.
> 
> Why not just have a regular-looking table, with a "global temporary"
> relpersistence (I don't care which letter it gets) and when a backend
> tries to access it, it uses its own private relfilenode instead of
> whatever is in pg_class, creating one if necessary.  That way the
> structure of the table is fixed, with all the dependencies and whatnot,
> but the content is private to each backend.  What's wrong with this idea?
> 

It's an improvement (and it's pretty much exactly what I proposed
upthread). But it does not solve the problems with pg_statistic for
example (each backend needs it's own statistics. So we'd either bloat
the pg_statistic (if we manage to solve the problem that the table has
the same oid in all backends), or we would need in-memory tuples (just
like discussed in the thread so far).



-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-31 Thread Vik Fearing

On 08/24/2016 06:16 PM, Robert Haas wrote:
> On Tue, Aug 23, 2016 at 6:11 PM, Tomas Vondra
>  wrote:
>> Could someone please explain how the unlogged tables are supposed to fix the
>> catalog bloat problem, as stated in the initial patch submission? We'd still
>> need to insert/delete the catalog rows when creating/dropping the temporary
>> tables, causing the bloat. Or is there something I'm missing?
> 
> No, not really.  Jim just asked if the idea of partitioning the
> columns was completely dead in the water, and I said, no, you could
> theoretically salvage it.  Whether that does you much good is another
> question.
> 
> IMV, the point here is that you MUST have globally visible dependency
> entries for this to work sanely.  If they're not in a catalog, they
> have to be someplace else, and backend-private memory isn't good
> enough, because that's not globally visible.  Until we've got a
> strategy for that problem, this whole effort is going nowhere - even
> though in other respects it may be a terrific idea.

Why not just have a regular-looking table, with a "global temporary"
relpersistence (I don't care which letter it gets) and when a backend
tries to access it, it uses its own private relfilenode instead of
whatever is in pg_class, creating one if necessary.  That way the
structure of the table is fixed, with all the dependencies and whatnot,
but the content is private to each backend.  What's wrong with this idea?
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-31 Thread Corey Huinker

On Wed, Aug 24, 2016 at 12:39 PM, Andres Freund  wrote:

>
>
> On August 24, 2016 9:32:48 AM PDT, Tomas Vondra <
> tomas.von...@2ndquadrant.com> wrote:
> >
> >
> >On 08/24/2016 12:20 AM, Andres Freund wrote:
> >> On 2016-08-23 19:18:04 -0300, Claudio Freire wrote:
> >>> On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra
> >>>  wrote:
>  Could someone please explain how the unlogged tables are supposed
> >to fix the
>  catalog bloat problem, as stated in the initial patch submission?
> >We'd still
>  need to insert/delete the catalog rows when creating/dropping the
> >temporary
>  tables, causing the bloat. Or is there something I'm missing?
> >>
> >> Beats me.
> >>
> >
> >Are you puzzled just like me, or are you puzzled why I'm puzzled?
>
> Like you. I don't think this addresses the problem to a significant enough
> degree to care.
>
> Andres
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Ok, here's a wild idea, and it probably depends on having native
partitioning implemented.

Propagate relpersistence, or a boolean flag on (relpersistence = 't') from
pg_class into the child pg_attribute records.

Partition the tables pg_class and pg_attribute first by relpersistence, and
then by oid.

The partitions holding data on persistent objects would basically stay
as-is, but the partition wouldn't have much activity and no temp-table
churn.

The temporary ones, however, would fall into essentially a rotating set of
partitions. Pick enough partitions such that the active transactions only
cover some of the partitions. The rest can be safely truncated by vacuum.

It would mitigate the bloat, existing dictionary queries would still work,
but the additional lookup cost might not be worth it.

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-24 Thread Andres Freund



On August 24, 2016 9:32:48 AM PDT, Tomas Vondra  
wrote:
>
>
>On 08/24/2016 12:20 AM, Andres Freund wrote:
>> On 2016-08-23 19:18:04 -0300, Claudio Freire wrote:
>>> On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra
>>>  wrote:
 Could someone please explain how the unlogged tables are supposed
>to fix the
 catalog bloat problem, as stated in the initial patch submission?
>We'd still
 need to insert/delete the catalog rows when creating/dropping the
>temporary
 tables, causing the bloat. Or is there something I'm missing?
>>
>> Beats me.
>>
>
>Are you puzzled just like me, or are you puzzled why I'm puzzled?

Like you. I don't think this addresses the problem to a significant enough 
degree to care.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-24 Thread Tomas Vondra




On 08/24/2016 12:20 AM, Andres Freund wrote:

On 2016-08-23 19:18:04 -0300, Claudio Freire wrote:

On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra
 wrote:

Could someone please explain how the unlogged tables are supposed to fix the
catalog bloat problem, as stated in the initial patch submission? We'd still
need to insert/delete the catalog rows when creating/dropping the temporary
tables, causing the bloat. Or is there something I'm missing?


Beats me.



Are you puzzled just like me, or are you puzzled why I'm puzzled?



--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-24 Thread Robert Haas

On Tue, Aug 23, 2016 at 6:11 PM, Tomas Vondra
 wrote:
> Could someone please explain how the unlogged tables are supposed to fix the
> catalog bloat problem, as stated in the initial patch submission? We'd still
> need to insert/delete the catalog rows when creating/dropping the temporary
> tables, causing the bloat. Or is there something I'm missing?

No, not really.  Jim just asked if the idea of partitioning the
columns was completely dead in the water, and I said, no, you could
theoretically salvage it.  Whether that does you much good is another
question.

IMV, the point here is that you MUST have globally visible dependency
entries for this to work sanely.  If they're not in a catalog, they
have to be someplace else, and backend-private memory isn't good
enough, because that's not globally visible.  Until we've got a
strategy for that problem, this whole effort is going nowhere - even
though in other respects it may be a terrific idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Claudio Freire

On Wed, Aug 24, 2016 at 2:04 AM, Alvaro Herrera
 wrote:
> Claudio Freire wrote:
>
>> After looking at it from a birdseye view, I agree it's conceptually
>> complex (reading HeapTupleSatisfiesSelf already makes one dizzy).
>>
>> But other than that, the implementation seems rather simple. It seems
>> to me, if one figures out that it is safe to do so (a-priori, xmin not
>> committed, xmax is current transaction), it would simply be a matter
>> of chasing the HOT chain root, setting all LP except the first to
>> LP_UNUSED and the first one to LP_DEAD.
>>
>> Of course I may be missing a ton of stuff.
>
> What you seem to be missing is that rows corresponding to temp tables
> are not "visible to its own transaction only".  The rows are valid
> after the transaction is gone; what makes the tables temporary is the
> fact that they are in a temporary schema.  And what makes them invisible
> to one backend is the fact that they are in the temporary schema for
> another backend.  Not that they are uncommitted.

Yeah, I was thinking of "on commit drop" behavior, but granted there's
all the others.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Alvaro Herrera

Claudio Freire wrote:

> After looking at it from a birdseye view, I agree it's conceptually
> complex (reading HeapTupleSatisfiesSelf already makes one dizzy).
> 
> But other than that, the implementation seems rather simple. It seems
> to me, if one figures out that it is safe to do so (a-priori, xmin not
> committed, xmax is current transaction), it would simply be a matter
> of chasing the HOT chain root, setting all LP except the first to
> LP_UNUSED and the first one to LP_DEAD.
> 
> Of course I may be missing a ton of stuff.

What you seem to be missing is that rows corresponding to temp tables
are not "visible to its own transaction only".  The rows are valid
after the transaction is gone; what makes the tables temporary is the
fact that they are in a temporary schema.  And what makes them invisible
to one backend is the fact that they are in the temporary schema for
another backend.  Not that they are uncommitted.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Claudio Freire

On Tue, Aug 23, 2016 at 9:12 PM, Tomas Vondra
 wrote:
> On 08/24/2016 12:38 AM, Claudio Freire wrote:
>>
>> On Tue, Aug 23, 2016 at 7:25 PM, Tomas Vondra
>>  wrote:
>
> Could someone please explain how the unlogged tables are supposed to
> fix
> the
> catalog bloat problem, as stated in the initial patch submission? We'd
> still
> need to insert/delete the catalog rows when creating/dropping the
> temporary
> tables, causing the bloat. Or is there something I'm missing?



 Wouldn't more aggressive vacuuming of catalog tables fix the bloat?

 Perhaps reserving a worker or N to run only on catalog schemas?

 That'd be far simpler.
>>>
>>>
>>>
>>> Maybe, although IIRC the issues with catalog bloat were due to a
>>> combination
>>> of long queries and many temporary tables being created/dropped. In that
>>> case simply ramping up autovacuum (or even having a dedicated workers for
>>> catalogs) would not realy help due to the xmin horizon being blocked by
>>> the
>>> long-running queries.
>>>
>>> Maybe it's entirely crazy idea due to the wine I drank at the dinner, but
>>> couldn't we vacuum the temporary table records differently? For example,
>>> couldn't we just consider them removable as soon as the backend that owns
>>> them disappears?
>>
>>
>> Or perhaps go all the way and generalize that to rows that never
>> become visible outside their parent transaction.
>>
>> As in, delete of rows created by the deleting transaction could clean
>> up, carefully to avoid voiding indexes and all that, but more
>> aggressively than regular deletes.
>>
>
> Maybe, but I wouldn't be surprised if such generalization would be an order
> of magnitude more complicated - and even the vacuuming changes I mentioned
> are undoubtedly a fair amount of work.

After looking at it from a birdseye view, I agree it's conceptually
complex (reading HeapTupleSatisfiesSelf already makes one dizzy).

But other than that, the implementation seems rather simple. It seems
to me, if one figures out that it is safe to do so (a-priori, xmin not
committed, xmax is current transaction), it would simply be a matter
of chasing the HOT chain root, setting all LP except the first to
LP_UNUSED and the first one to LP_DEAD.

Of course I may be missing a ton of stuff.

> Sadly, I don't see how this might fix the other issues mentioned in this
> thread (e.g. impossibility to create temp tables on standbys),

No it doesn't :(


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Tomas Vondra




On 08/24/2016 12:38 AM, Claudio Freire wrote:

On Tue, Aug 23, 2016 at 7:25 PM, Tomas Vondra
 wrote:

Could someone please explain how the unlogged tables are supposed to fix
the
catalog bloat problem, as stated in the initial patch submission? We'd
still
need to insert/delete the catalog rows when creating/dropping the
temporary
tables, causing the bloat. Or is there something I'm missing?



Wouldn't more aggressive vacuuming of catalog tables fix the bloat?

Perhaps reserving a worker or N to run only on catalog schemas?

That'd be far simpler.



Maybe, although IIRC the issues with catalog bloat were due to a combination
of long queries and many temporary tables being created/dropped. In that
case simply ramping up autovacuum (or even having a dedicated workers for
catalogs) would not realy help due to the xmin horizon being blocked by the
long-running queries.

Maybe it's entirely crazy idea due to the wine I drank at the dinner, but
couldn't we vacuum the temporary table records differently? For example,
couldn't we just consider them removable as soon as the backend that owns
them disappears?


Or perhaps go all the way and generalize that to rows that never
become visible outside their parent transaction.

As in, delete of rows created by the deleting transaction could clean
up, carefully to avoid voiding indexes and all that, but more
aggressively than regular deletes.



Maybe, but I wouldn't be surprised if such generalization would be an 
order of magnitude more complicated - and even the vacuuming changes I 
mentioned are undoubtedly a fair amount of work.


Sadly, I don't see how this might fix the other issues mentioned in this 
thread (e.g. impossibility to create temp tables on standbys),



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Claudio Freire

On Tue, Aug 23, 2016 at 7:25 PM, Tomas Vondra
 wrote:
>>> Could someone please explain how the unlogged tables are supposed to fix
>>> the
>>> catalog bloat problem, as stated in the initial patch submission? We'd
>>> still
>>> need to insert/delete the catalog rows when creating/dropping the
>>> temporary
>>> tables, causing the bloat. Or is there something I'm missing?
>>
>>
>> Wouldn't more aggressive vacuuming of catalog tables fix the bloat?
>>
>> Perhaps reserving a worker or N to run only on catalog schemas?
>>
>> That'd be far simpler.
>
>
> Maybe, although IIRC the issues with catalog bloat were due to a combination
> of long queries and many temporary tables being created/dropped. In that
> case simply ramping up autovacuum (or even having a dedicated workers for
> catalogs) would not realy help due to the xmin horizon being blocked by the
> long-running queries.
>
> Maybe it's entirely crazy idea due to the wine I drank at the dinner, but
> couldn't we vacuum the temporary table records differently? For example,
> couldn't we just consider them removable as soon as the backend that owns
> them disappears?

Or perhaps go all the way and generalize that to rows that never
become visible outside their parent transaction.

As in, delete of rows created by the deleting transaction could clean
up, carefully to avoid voiding indexes and all that, but more
aggressively than regular deletes.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Andres Freund

On 2016-08-23 19:33:33 -0300, Claudio Freire wrote:
> On Tue, Aug 23, 2016 at 7:20 PM, Andres Freund  wrote:
> >> Wouldn't more aggressive vacuuming of catalog tables fix the bloat?
> >
> > Not really in my experience, at least not without more drastic vacuum
> > changes. The issue is that if you have a single "long running"
> > transaction (in some workloads that can even just be a 3 min taking
> > query/xact), nothing will be cleaned up during that time. If you have a
> > few hundred temp tables created per sec, you'll be in trouble even
> > then. Not to speak of the case where you have queries taking hours (say
> > a backup).
> 
> Well, my experience isn't as extreme as that (just a few dozen temp
> tables per minute), but when I see bloat in catalog tables it's
> because all autovacuum workers are stuck vacuuming huge tables for
> huge periods of time (hours or days).

Well, that's because our defaults are batshit stupid.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Claudio Freire

On Tue, Aug 23, 2016 at 7:20 PM, Andres Freund  wrote:
>> Wouldn't more aggressive vacuuming of catalog tables fix the bloat?
>
> Not really in my experience, at least not without more drastic vacuum
> changes. The issue is that if you have a single "long running"
> transaction (in some workloads that can even just be a 3 min taking
> query/xact), nothing will be cleaned up during that time. If you have a
> few hundred temp tables created per sec, you'll be in trouble even
> then. Not to speak of the case where you have queries taking hours (say
> a backup).

Well, my experience isn't as extreme as that (just a few dozen temp
tables per minute), but when I see bloat in catalog tables it's
because all autovacuum workers are stuck vacuuming huge tables for
huge periods of time (hours or days).

So that's certainly another bloat case to consider.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Tomas Vondra




On 08/24/2016 12:18 AM, Claudio Freire wrote:

On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra
 wrote:

On 08/22/2016 10:32 PM, Robert Haas wrote:



...

1. The number of tables for which we would need to add a duplicate,
unlogged table is formidable.  You need pg_attribute, pg_attrdef,
pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc.
And the backend changes needed so that we used the unlogged copy for
temp tables and the permanent copy for regular tables is probably
really large.

2. You can't write to unlogged tables on standby servers, so this
doesn't help solve the problem of wanting to use temporary tables on
standbys.

3. While it makes creating temporary tables a lighter-weight
operation, because you no longer need to write WAL for the catalog
entries, there's probably still substantially more overhead than just
stuffing them in backend-local RAM.  So the performance benefits are
probably fairly modest.

Overall I feel like the development effort that it would take to make
this work would almost certainly be better-expended elsewhere.  But of
course I'm not in charge of how people who work for other companies
spend their time...



Could someone please explain how the unlogged tables are supposed to fix the
catalog bloat problem, as stated in the initial patch submission? We'd still
need to insert/delete the catalog rows when creating/dropping the temporary
tables, causing the bloat. Or is there something I'm missing?


Wouldn't more aggressive vacuuming of catalog tables fix the bloat?

Perhaps reserving a worker or N to run only on catalog schemas?

That'd be far simpler.


Maybe, although IIRC the issues with catalog bloat were due to a 
combination of long queries and many temporary tables being 
created/dropped. In that case simply ramping up autovacuum (or even 
having a dedicated workers for catalogs) would not realy help due to the 
xmin horizon being blocked by the long-running queries.


Maybe it's entirely crazy idea due to the wine I drank at the dinner, 
but couldn't we vacuum the temporary table records differently? For 
example, couldn't we just consider them removable as soon as the backend 
that owns them disappears?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Andres Freund

On 2016-08-23 19:18:04 -0300, Claudio Freire wrote:
> On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra
>  wrote:
> > Could someone please explain how the unlogged tables are supposed to fix the
> > catalog bloat problem, as stated in the initial patch submission? We'd still
> > need to insert/delete the catalog rows when creating/dropping the temporary
> > tables, causing the bloat. Or is there something I'm missing?

Beats me.

> Wouldn't more aggressive vacuuming of catalog tables fix the bloat?

Not really in my experience, at least not without more drastic vacuum
changes. The issue is that if you have a single "long running"
transaction (in some workloads that can even just be a 3 min taking
query/xact), nothing will be cleaned up during that time. If you have a
few hundred temp tables created per sec, you'll be in trouble even
then. Not to speak of the case where you have queries taking hours (say
a backup).

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Claudio Freire

On Tue, Aug 23, 2016 at 7:11 PM, Tomas Vondra
 wrote:
> On 08/22/2016 10:32 PM, Robert Haas wrote:
>>
>>
>> ...
>>
>> 1. The number of tables for which we would need to add a duplicate,
>> unlogged table is formidable.  You need pg_attribute, pg_attrdef,
>> pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc.
>> And the backend changes needed so that we used the unlogged copy for
>> temp tables and the permanent copy for regular tables is probably
>> really large.
>>
>> 2. You can't write to unlogged tables on standby servers, so this
>> doesn't help solve the problem of wanting to use temporary tables on
>> standbys.
>>
>> 3. While it makes creating temporary tables a lighter-weight
>> operation, because you no longer need to write WAL for the catalog
>> entries, there's probably still substantially more overhead than just
>> stuffing them in backend-local RAM.  So the performance benefits are
>> probably fairly modest.
>>
>> Overall I feel like the development effort that it would take to make
>> this work would almost certainly be better-expended elsewhere.  But of
>> course I'm not in charge of how people who work for other companies
>> spend their time...
>>
>
> Could someone please explain how the unlogged tables are supposed to fix the
> catalog bloat problem, as stated in the initial patch submission? We'd still
> need to insert/delete the catalog rows when creating/dropping the temporary
> tables, causing the bloat. Or is there something I'm missing?

Wouldn't more aggressive vacuuming of catalog tables fix the bloat?

Perhaps reserving a worker or N to run only on catalog schemas?

That'd be far simpler.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Tomas Vondra




On 08/22/2016 10:32 PM, Robert Haas wrote:


...

1. The number of tables for which we would need to add a duplicate,
unlogged table is formidable.  You need pg_attribute, pg_attrdef,
pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc.
And the backend changes needed so that we used the unlogged copy for
temp tables and the permanent copy for regular tables is probably
really large.

2. You can't write to unlogged tables on standby servers, so this
doesn't help solve the problem of wanting to use temporary tables on
standbys.

3. While it makes creating temporary tables a lighter-weight
operation, because you no longer need to write WAL for the catalog
entries, there's probably still substantially more overhead than just
stuffing them in backend-local RAM.  So the performance benefits are
probably fairly modest.

Overall I feel like the development effort that it would take to make
this work would almost certainly be better-expended elsewhere.  But of
course I'm not in charge of how people who work for other companies
spend their time...



Could someone please explain how the unlogged tables are supposed to fix 
the catalog bloat problem, as stated in the initial patch submission? 
We'd still need to insert/delete the catalog rows when creating/dropping 
the temporary tables, causing the bloat. Or is there something I'm missing?



--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-23 Thread Aleksander Alekseev

Thank you everyone for great comments!

> have a permanent pg_class which holds the records for permanent tables
> and an *unlogged* table, say pg_class_unlogged, which holds records
> for temporary tables.  Now everybody can see everybody else's data,
> yet we don't have to create permanent catalog entries.  So we are not
> dead.  All of the temporary catalog tables vanish on a crash, too, and
> in a very clean way, which is great.
>
> [...]
>
> Overall I feel like the development effort that it would take to make
> this work would almost certainly be better-expended elsewhere.

Agree. This is an interesting idea but considering named drawbacks,
especially:

> 2. You can't write to unlogged tables on standby servers, so this
> doesn't help solve the problem of wanting to use temporary tables on
> standbys.

... I don't think it's worth an effort.

>> when you use global temporary tables, then you create it only once -
>> like usual tables.
>>
>> you don't drop these tables.
>
> I share the view that this is a better/simpler solution to the problem.
> It will still require virtual (in-memory) tuples for pg_statistic
> records, but everything else works pretty much as for regular tables. In
> particular there are no problems with dependencies.
>
> The obvious disadvantage is that it requires changes to applications.

Frankly I have much more faith in Tom's idea of using virtual part of the
catalog for all temporary tables, i.e turning all temporary tables into
"fast" temporary tables. Instead of introducing a new type of temporary tables
that solve catalog bloating problem and forcing users to rewrite applications
why not just not to create a problem in a first place?

> I think one way to fix that would be to store the virtual tuples in
> shared memory (instead of process memory). That will certainly require
> locking and synchronization, but well - it needs to be shared.

I believe currently this is the most promising course of action. In first
implementation we could just place all virtual part of the catalog in a shared
memory and protect it with a single lock. If it will work as expected the next
step would be elimination of bottlenecks --- using multiple locks, moving part
of a virtual catalog to local backend's memory, etc.

As always, please don't hesitate to share any thoughts on this topic!

-- 
Best regards,
Aleksander Alekseev


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-22 Thread Andres Freund

On 2016-08-22 17:50:11 -0300, Alvaro Herrera wrote:
> > 2. You can't write to unlogged tables on standby servers, so this
> > doesn't help solve the problem of wanting to use temporary tables on
> > standbys.
> 
> Check.  We could think about relaxing this restriction, which would
> enable the feature to satisfy that use case.  (I think the main
> complication there is the init fork of btrees on those catalogs; other
> relations could just be truncated to empty on restart.)

Isn't the main complication that visibility currently requires xids to
be assigned?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-22 Thread Alvaro Herrera

Robert Haas wrote:

> However:
> 
> 1. The number of tables for which we would need to add a duplicate,
> unlogged table is formidable.  You need pg_attribute, pg_attrdef,
> pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc.
> And the backend changes needed so that we used the unlogged copy for
> temp tables and the permanent copy for regular tables is probably
> really large.

Check.  This is the most serious issue, IMV.

> 2. You can't write to unlogged tables on standby servers, so this
> doesn't help solve the problem of wanting to use temporary tables on
> standbys.

Check.  We could think about relaxing this restriction, which would
enable the feature to satisfy that use case.  (I think the main
complication there is the init fork of btrees on those catalogs; other
relations could just be truncated to empty on restart.)

> 3. While it makes creating temporary tables a lighter-weight
> operation, because you no longer need to write WAL for the catalog
> entries, there's probably still substantially more overhead than just
> stuffing them in backend-local RAM.  So the performance benefits are
> probably fairly modest.

You also save catalog bloat ... These benefits may not be tremendous,
but I think they may be good enough for many users.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-22 Thread Robert Haas

On Tue, Aug 16, 2016 at 8:03 PM, Jim Nasby  wrote:
> On 8/16/16 11:59 AM, Robert Haas wrote:
> ...
>>
>> That doesn't really solve the problem, because OTHER backends won't be
>> able to see them.  So, if I create a fast temporary table in one
>> session that depends on a permanent object, some other session can
>> drop the permanent object.  If there were REAL catalog entries, that
>> wouldn't work, because the other session would see the dependency.
>
> Some discussion about TEMP functions is happening on -general right now, and
> there's other things where temp objects are good to have, so it'd be nice to
> have a more generic fix for this stuff. Is the idea of "partitioning" the
> catalogs to store temp objects separate from permanent fatally flawed?

I wouldn't say it's fatally flawed.  But you might need a
world-renowned team of physicians working round the clock for days in
a class 1 trauma center to save it.  If you imagine that you have a
permanent pg_class which holds permanent tables and a temporary
pg_class per-backend which stores temporary tables, then you very
quickly end up with the same deadly flaw as in Aleksander's design:
other backends cannot see all of the dependency entries and can drop
things that they shouldn't be permitted to drop.  However, you could
have a permanent pg_class which holds the records for permanent tables
and an *unlogged* table, say pg_class_unlogged, which holds records
for temporary tables.  Now everybody can see everybody else's data,
yet we don't have to create permanent catalog entries.  So we are not
dead.  All of the temporary catalog tables vanish on a crash, too, and
in a very clean way, which is great.

However:

1. The number of tables for which we would need to add a duplicate,
unlogged table is formidable.  You need pg_attribute, pg_attrdef,
pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc.
And the backend changes needed so that we used the unlogged copy for
temp tables and the permanent copy for regular tables is probably
really large.

2. You can't write to unlogged tables on standby servers, so this
doesn't help solve the problem of wanting to use temporary tables on
standbys.

3. While it makes creating temporary tables a lighter-weight
operation, because you no longer need to write WAL for the catalog
entries, there's probably still substantially more overhead than just
stuffing them in backend-local RAM.  So the performance benefits are
probably fairly modest.

Overall I feel like the development effort that it would take to make
this work would almost certainly be better-expended elsewhere.  But of
course I'm not in charge of how people who work for other companies
spend their time...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-18 Thread Pavel Stehule

2016-08-18 16:33 GMT+02:00 Tomas Vondra :

> On 08/15/2016 12:23 PM, Pavel Stehule wrote:
>
>>
>>
>> 2016-08-15 12:18 GMT+02:00 Aleksander Alekseev
>> mailto:a.aleks...@postgrespro.ru>>:
>>
>> > The global temporary tables has persistent rows in the catalogue.
>> The
>> > mapping to files can be marked as special and real mapping should be
>> > only in memory.
>> >
>> > So the changes in catalogue related to global temporary tables are
>> > pretty less frequently.
>>
>> I'm afraid I still don't get it. Let say I have an application that
>> does `CREATE TEMP TABLE xx ; DROP TABLE xx` in every session all the
>> time. Naturally there is not only one temp table per session. Could
>> you
>> explain please in more detail how exactly do these persistent rows
>> help?
>>
>>
>> when you use global temporary tables, then you create it only once -
>> like usual tables.
>>
>> you don't drop these tables.
>>
>>
> I share the view that this is a better/simpler solution to the problem. It
> will still require virtual (in-memory) tuples for pg_statistic records, but
> everything else works pretty much as for regular tables. In particular
> there are no problems with dependencies.
>
> The obvious disadvantage is that it requires changes to applications.


sure - as plpgsql developer I can say, the global temp tables are much more
friendly - so rewriting in application is enjoy work.

Regards

Pavel


>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-18 Thread Tomas Vondra

On 08/15/2016 12:23 PM, Pavel Stehule wrote:

2016-08-15 12:18 GMT+02:00 Aleksander Alekseev
mailto:a.aleks...@postgrespro.ru>>:

> The global temporary tables has persistent rows in the catalogue. The
> mapping to files can be marked as special and real mapping should be
> only in memory.
>
> So the changes in catalogue related to global temporary tables are
> pretty less frequently.

I'm afraid I still don't get it. Let say I have an application that
does `CREATE TEMP TABLE xx ; DROP TABLE xx` in every session all the
time. Naturally there is not only one temp table per session. Could you
explain please in more detail how exactly do these persistent rows help?

when you use global temporary tables, then you create it only once -
like usual tables.

you don't drop these tables.

I share the view that this is a better/simpler solution to the problem. 
It will still require virtual (in-memory) tuples for pg_statistic 
records, but everything else works pretty much as for regular tables. In 
particular there are no problems with dependencies.

The obvious disadvantage is that it requires changes to applications.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-18 Thread Tomas Vondra


On 08/17/2016 11:50 AM, Aleksander Alekseev wrote:

That doesn't really solve the problem, because OTHER backends won't be
able to see them.  So, if I create a fast temporary table in one
session that depends on a permanent object, some other session can
drop the permanent object.  If there were REAL catalog entries, that
wouldn't work, because the other session would see the dependency.



This is a good point. However current implementation doesn't allow to
do that.


IMHO without handling that, the design is effectively broken and has 
very little change (or rather none at all) to get committed.


I think one way to fix that would be to store the virtual tuples in 
shared memory (instead of process memory). That will certainly require 
locking and synchronization, but well - it needs to be shared.


> There is a related bug though, a minor one.


In session 1:

```
CREATE TABLE cities2 (name text, population float, altitude int);
CREATE FAST TEMPORARY TABLE capitals2 (state char(2)) INHERITS (cities2);
```

In session 2:

```
DROP TABLE cities2;

ERROR:  cache lookup failed for relation 16401
```

Instead of "cache lookup failed" probably a better error message
should be displayed. Something like "cannot drop table cities2
because other objects depend on it". I will send a corrected patch
shortly.

Everything else seems to work as expected.

If you discover any other bugs please let me know!



While a better error message would be nice, this is curing the symptoms 
and not the cause. I think a proper design needs to prevent the DROP by 
using dependencies.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-17 Thread Aleksander Alekseev

> That doesn't really solve the problem, because OTHER backends won't be
> able to see them.  So, if I create a fast temporary table in one
> session that depends on a permanent object, some other session can
> drop the permanent object.  If there were REAL catalog entries, that
> wouldn't work, because the other session would see the dependency.
> 

This is a good point. However current implementation doesn't allow to
do that. There is a related bug though, a minor one.

In session 1:

```
CREATE TABLE cities2 (name text, population float, altitude int);
CREATE FAST TEMPORARY TABLE capitals2 (state char(2)) INHERITS (cities2);
```

In session 2:

```
DROP TABLE cities2;

ERROR:  cache lookup failed for relation 16401
```

Instead of "cache lookup failed" probably a better error message
should be displayed. Something like "cannot drop table cities2 because
other objects depend on it". I will send a corrected patch shortly.

Everything else seems to work as expected.

If you discover any other bugs please let me know!

-- 
Best regards,
Aleksander Alekseev


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-16 Thread Jim Nasby


On 8/16/16 11:59 AM, Robert Haas wrote:
...

That doesn't really solve the problem, because OTHER backends won't be
able to see them.  So, if I create a fast temporary table in one
session that depends on a permanent object, some other session can
drop the permanent object.  If there were REAL catalog entries, that
wouldn't work, because the other session would see the dependency.


Some discussion about TEMP functions is happening on -general right now, 
and there's other things where temp objects are good to have, so it'd be 
nice to have a more generic fix for this stuff. Is the idea of 
"partitioning" the catalogs to store temp objects separate from 
permanent fatally flawed?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-16 Thread Robert Haas

On Mon, Aug 15, 2016 at 5:12 AM, Aleksander Alekseev
 wrote:
> Just to keep things sane I would like to remind that in this concrete
> patch there _are_ catalog entries:
>
> ```
> [...]
> This file contents imlementation of special type of temporary tables ---
> fast temporary tables (FTT). From user perspective they work exactly as
> regular temporary tables. However there are no records about FTTs in
> pg_catalog. These records are stored in backend's memory instead and
> mixed with regular records during scans of catalog tables. We refer to
> corresponding tuples of catalog tables as "in-memory" or "virtual"
> tuples and to all these tuples together --- as "in-memory" or "virtual"
> catalog.
> [...]
> ```

That doesn't really solve the problem, because OTHER backends won't be
able to see them.  So, if I create a fast temporary table in one
session that depends on a permanent object, some other session can
drop the permanent object.  If there were REAL catalog entries, that
wouldn't work, because the other session would see the dependency.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Aleksander Alekseev

> Why are you sending this off-list?  Please let's keep the discussion
> on the mailing list.  I suggest resending this there.

Sorry for that. I accidentally removed pgsql-hackers@ from CC list or
maybe my email client somehow did it for me. Short after that I realized
my mistake and sent a copy to the mailing list. 

-- 
Best regards,
Aleksander Alekseev


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Robert Haas

Why are you sending this off-list?  Please let's keep the discussion
on the mailing list.  I suggest resending this there.

On Mon, Aug 15, 2016 at 5:01 AM, Aleksander Alekseev
 wrote:
>> > >>> I think the whole idea of a fast temporary table is that there
>> > >>> are no catalog entries.  If there are no catalog entries, then
>> > >>> dependencies are not visible.  If there ARE catalog entries, to
>> > >>> what do they refer? Without a pg_class entry for the table,
>> > >>> there's no table OID upon which to depend.
>> >
>> > >> TBH, I think that the chances of such a design getting committed
>> > >> are not distinguishable from zero.  Tables have to have OIDs;
>> > >> there is just too much code that assumes that.  And I seriously
>> > >> doubt that it will work (for any large value of "work") without
>> > >> catalog entries.
>> >
>> > > That seems a bit too defeatist.
>> >
>> > Huh?  I didn't say we shouldn't work on the problem --- I just
>> > think that this particular approach isn't good.  Which you seemed
>> > to agree with.
>>
>> I took your statement to mean that they need a pg_class entry - even
>> if there were a partial solution to the pg_depend problem allowing to
>> avoid pg_attribute entries, tha't still not really be a solution. If
>> that's not what you mean, sorry - and nice that we agree ;)
>>
>>
>
> Just to keep things sane I would like to remind that in this concrete
> patch there _are_ catalog entries:
>
> ```
> [...]
> This file contents imlementation of special type of temporary tables ---
> fast temporary tables (FTT). From user perspective they work exactly as
> regular temporary tables. However there are no records about FTTs in
> pg_catalog. These records are stored in backend's memory instead and
> mixed with regular records during scans of catalog tables. We refer to
> corresponding tuples of catalog tables as "in-memory" or "virtual"
> tuples and to all these tuples together --- as "in-memory" or "virtual"
> catalog.
> [...]
> ```
>
> As Tom pointed out a lot of PL/pgSQL code would stop working otherwise.
> Also I mentioned that in this case even \d and \d+ would not work.
>
> I personally find this discussion very confusing. Maybe we should
> concentrate on a concrete patch instead of some abstract ideas and
> topics that are still open.
>
> For instance it surprises me that apparently there is no one who
> objects "lets make all temporary tables fast temporary tables" idea.
> Since in this case code would use more memory for keeping a virtual
> catalog wouldn't it be considered a major change of behavior that could
> break someones production environment?
>
> --
> Best regards,
> Aleksander Alekseev



-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Christoph Berg

Re: To Tom Lane 2016-08-15 <20160815111057.v2mqqjp4aabvw...@msg.df7cb.de>
> Re: Tom Lane 2016-07-30 <1184.1469890...@sss.pgh.pa.us>
> > In short, I think that the way to make something like this work is to
> > figure out how to have "virtual" catalog rows describing a temp table.
> > Or maybe to partition the catalogs so that vacuuming away temp-table
> > rows is easier/cheaper than today.
> 
> We should also be thinking about how the opposite idea of "global"
> temp tables

(Obviously I should catch up on the rest of the thread when postponing
a message for an hour or two. Sorry for the duplicated idea here...)

Christoph


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Christoph Berg

Re: Tom Lane 2016-07-30 <1184.1469890...@sss.pgh.pa.us>
> In short, I think that the way to make something like this work is to
> figure out how to have "virtual" catalog rows describing a temp table.
> Or maybe to partition the catalogs so that vacuuming away temp-table
> rows is easier/cheaper than today.

We should also be thinking about how the opposite idea of "global"
temp tables (I believe that's what Oracle calls them) would work.
These have a persistent structure in the catalogs, just the data is
private to every session (or transaction); every connection starts
with an empty temp table and for their use.

I'd guess that type of global temp tables would fix the bloat problem
also very efficiently. (Ad-hoc temp tables shouldn't occur that often
so the bloat caused by them wouldn't matter that much. If they do,
their structure is likely always the same, and they could be made
"global" in the schema.)

The bit that needs to be thought out here would be how to maintain
statistics for these tables. Obviously ANALYZE shouldn't update any
globally visible data.

Christoph


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Pavel Stehule

2016-08-15 12:18 GMT+02:00 Aleksander Alekseev :

> > The global temporary tables has persistent rows in the catalogue. The
> > mapping to files can be marked as special and real mapping should be
> > only in memory.
> >
> > So the changes in catalogue related to global temporary tables are
> > pretty less frequently.
>
> I'm afraid I still don't get it. Let say I have an application that
> does `CREATE TEMP TABLE xx ; DROP TABLE xx` in every session all the
> time. Naturally there is not only one temp table per session. Could you
> explain please in more detail how exactly do these persistent rows help?
>
>
when you use global temporary tables, then you create it only once - like
usual tables.

you don't drop these tables.

Regards

Pavel


> --
> Best regards,
> Aleksander Alekseev
>

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Aleksander Alekseev

> The global temporary tables has persistent rows in the catalogue. The
> mapping to files can be marked as special and real mapping should be
> only in memory.
> 
> So the changes in catalogue related to global temporary tables are
> pretty less frequently.

I'm afraid I still don't get it. Let say I have an application that
does `CREATE TEMP TABLE xx ; DROP TABLE xx` in every session all the
time. Naturally there is not only one temp table per session. Could you
explain please in more detail how exactly do these persistent rows help?

-- 
Best regards,
Aleksander Alekseev


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Pavel Stehule

2016-08-15 12:00 GMT+02:00 Aleksander Alekseev :

> > But we can change this discussion little bit different. I believe so
> > solution should be *global temporary tables*. These tables has
> > persistent catalogue entries. Data are joined with session. These
> > tables can be effective solution of problem with temporary tables,
> > can be strong benefit for developers (more comfortable, possible
> > static analyse of PLpgSQL) and it simplify life to all people who has
> > do migration from Oracle. So only benefits are there :).
>
> I don't think that global temporary tables solve "catalog bloating that
> causes auto vacuum" problem. I suggest we don't change a topic. Or maybe
> I don't know something about global temporary tables?
>

The global temporary tables has persistent rows in the catalogue. The
mapping to files can be marked as special and real mapping should be only
in memory.

So the changes in catalogue related to global temporary tables are pretty
less frequently.

Regards

Pavel


>
> --
> Best regards,
> Aleksander Alekseev
>

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Aleksander Alekseev

> But we can change this discussion little bit different. I believe so
> solution should be *global temporary tables*. These tables has
> persistent catalogue entries. Data are joined with session. These
> tables can be effective solution of problem with temporary tables,
> can be strong benefit for developers (more comfortable, possible
> static analyse of PLpgSQL) and it simplify life to all people who has
> do migration from Oracle. So only benefits are there :).

I don't think that global temporary tables solve "catalog bloating that
causes auto vacuum" problem. I suggest we don't change a topic. Or maybe
I don't know something about global temporary tables?

-- 
Best regards,
Aleksander Alekseev


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Pavel Stehule

2016-08-15 11:01 GMT+02:00 Aleksander Alekseev :

> > > >>> I think the whole idea of a fast temporary table is that there
> > > >>> are no catalog entries.  If there are no catalog entries, then
> > > >>> dependencies are not visible.  If there ARE catalog entries, to
> > > >>> what do they refer? Without a pg_class entry for the table,
> > > >>> there's no table OID upon which to depend.
> > >
> > > >> TBH, I think that the chances of such a design getting committed
> > > >> are not distinguishable from zero.  Tables have to have OIDs;
> > > >> there is just too much code that assumes that.  And I seriously
> > > >> doubt that it will work (for any large value of "work") without
> > > >> catalog entries.
> > >
> > > > That seems a bit too defeatist.
> > >
> > > Huh?  I didn't say we shouldn't work on the problem --- I just
> > > think that this particular approach isn't good.  Which you seemed
> > > to agree with.
> >
> > I took your statement to mean that they need a pg_class entry - even
> > if there were a partial solution to the pg_depend problem allowing to
> > avoid pg_attribute entries, tha't still not really be a solution. If
> > that's not what you mean, sorry - and nice that we agree ;)
> >
> >
>
> Just to keep things sane I would like to remind that in this concrete
> patch there _are_ catalog entries:
>
> ```
> [...]
> This file contents imlementation of special type of temporary tables ---
> fast temporary tables (FTT). From user perspective they work exactly as
> regular temporary tables. However there are no records about FTTs in
> pg_catalog. These records are stored in backend's memory instead and
> mixed with regular records during scans of catalog tables. We refer to
> corresponding tuples of catalog tables as "in-memory" or "virtual"
> tuples and to all these tuples together --- as "in-memory" or "virtual"
> catalog.
> [...]
> ```
>
> As Tom pointed out a lot of PL/pgSQL code would stop working otherwise.
> Also I mentioned that in this case even \d and \d+ would not work.
>
> I personally find this discussion very confusing. Maybe we should
> concentrate on a concrete patch instead of some abstract ideas and
> topics that are still open.
>
> For instance it surprises me that apparently there is no one who
> objects "lets make all temporary tables fast temporary tables" idea.
> Since in this case code would use more memory for keeping a virtual
> catalog wouldn't it be considered a major change of behavior that could
> break someones production environment?
>

It is pretty hard discussion about cost or usability of FTT. The small FTT
(for usage in PLpgSQL) can be replaced by arrays. The overhead of
pg_catalog of big TT is not significant. So introduction special
proprietary table type is debatable.

Probably size of metadata of temporary tables should be minimal - currently
all metadata are cached in memory - and it is not a problem.

But we can change this discussion little bit different. I believe so
solution should be *global temporary tables*. These tables has persistent
catalogue entries. Data are joined with session. These tables can be
effective solution of problem with temporary tables, can be strong benefit
for developers (more comfortable, possible static analyse of PLpgSQL) and
it simplify life to all people who has do migration from Oracle. So only
benefits are there :).

Regards

Pavel


>
> --
> Best regards,
> Aleksander Alekseev
>

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-15 Thread Aleksander Alekseev

> > >>> I think the whole idea of a fast temporary table is that there
> > >>> are no catalog entries.  If there are no catalog entries, then
> > >>> dependencies are not visible.  If there ARE catalog entries, to
> > >>> what do they refer? Without a pg_class entry for the table,
> > >>> there's no table OID upon which to depend.
> > 
> > >> TBH, I think that the chances of such a design getting committed
> > >> are not distinguishable from zero.  Tables have to have OIDs;
> > >> there is just too much code that assumes that.  And I seriously
> > >> doubt that it will work (for any large value of "work") without
> > >> catalog entries.
> > 
> > > That seems a bit too defeatist.
> > 
> > Huh?  I didn't say we shouldn't work on the problem --- I just
> > think that this particular approach isn't good.  Which you seemed
> > to agree with.
> 
> I took your statement to mean that they need a pg_class entry - even
> if there were a partial solution to the pg_depend problem allowing to
> avoid pg_attribute entries, tha't still not really be a solution. If
> that's not what you mean, sorry - and nice that we agree ;)
> 
>   

Just to keep things sane I would like to remind that in this concrete
patch there _are_ catalog entries:

```
[...]
This file contents imlementation of special type of temporary tables ---
fast temporary tables (FTT). From user perspective they work exactly as
regular temporary tables. However there are no records about FTTs in
pg_catalog. These records are stored in backend's memory instead and
mixed with regular records during scans of catalog tables. We refer to
corresponding tuples of catalog tables as "in-memory" or "virtual"
tuples and to all these tuples together --- as "in-memory" or "virtual"
catalog.
[...]
```

As Tom pointed out a lot of PL/pgSQL code would stop working otherwise.
Also I mentioned that in this case even \d and \d+ would not work.

I personally find this discussion very confusing. Maybe we should
concentrate on a concrete patch instead of some abstract ideas, and
topics that are still open.

For instance it surprises me that apparently there is no one who
objects "lets make all temporary tables fast temporary tables" idea.
Since in this case code would use more memory for keeping a virtual
catalog wouldn't it be considered a major change of behavior that can
break someones production environment?

-- 
Best regards,
Aleksander Alekseev


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-14 Thread Andres Freund

On 2016-08-14 21:04:57 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-08-07 14:46:06 -0400, Tom Lane wrote:
> >> Robert Haas  writes:
> >>> I think the whole idea of a fast temporary table is that there are no
> >>> catalog entries.  If there are no catalog entries, then dependencies
> >>> are not visible.  If there ARE catalog entries, to what do they refer?
> >>> Without a pg_class entry for the table, there's no table OID upon
> >>> which to depend.
> 
> >> TBH, I think that the chances of such a design getting committed are
> >> not distinguishable from zero.  Tables have to have OIDs; there is just
> >> too much code that assumes that.  And I seriously doubt that it will
> >> work (for any large value of "work") without catalog entries.
> 
> > That seems a bit too defeatist.
> 
> Huh?  I didn't say we shouldn't work on the problem --- I just think that
> this particular approach isn't good.  Which you seemed to agree with.

I took your statement to mean that they need a pg_class entry - even if
there were a partial solution to the pg_depend problem allowing to avoid
pg_attribute entries, tha't still not really be a solution. If that's
not what you mean, sorry - and nice that we agree ;)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-14 Thread Tom Lane

Andres Freund  writes:
> On 2016-08-07 14:46:06 -0400, Tom Lane wrote:
>> Robert Haas  writes:
>>> I think the whole idea of a fast temporary table is that there are no
>>> catalog entries.  If there are no catalog entries, then dependencies
>>> are not visible.  If there ARE catalog entries, to what do they refer?
>>> Without a pg_class entry for the table, there's no table OID upon
>>> which to depend.

>> TBH, I think that the chances of such a design getting committed are
>> not distinguishable from zero.  Tables have to have OIDs; there is just
>> too much code that assumes that.  And I seriously doubt that it will
>> work (for any large value of "work") without catalog entries.

> That seems a bit too defeatist.

Huh?  I didn't say we shouldn't work on the problem --- I just think that
this particular approach isn't good.  Which you seemed to agree with.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-14 Thread Andres Freund

On 2016-08-07 14:46:06 -0400, Tom Lane wrote:
> Robert Haas  writes:
> > I think the whole idea of a fast temporary table is that there are no
> > catalog entries.  If there are no catalog entries, then dependencies
> > are not visible.  If there ARE catalog entries, to what do they refer?
> >  Without a pg_class entry for the table, there's no table OID upon
> > which to depend.
>
> TBH, I think that the chances of such a design getting committed are
> not distinguishable from zero.  Tables have to have OIDs; there is just
> too much code that assumes that.  And I seriously doubt that it will
> work (for any large value of "work") without catalog entries.

That seems a bit too defeatist. It's obviously not a small change to get
there - and I don't think the patch upthread is really attacking the
relevant problems yet - but saying that we'll never have temp tables
without pg_class/pg_depend bloat seems to be pretty close to just giving
up.  Having 8 byte oids (as explicit columns instead of magic? Or just
oid64?) and then reserving ranges for temp objects stored in a local
memory seems to be feasible.  The pinning problem could potentially be
solved by "session lifetime" pins in pg_depend, which prevents dependent
objects being dropped.  Obviously that's just spitballing; but I think
the problem is too big to just give up.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-08 Thread Anastasia Lubennikova


05.08.2016 19:41, Robert Haas:


2. This inserts additional code in a bunch of really low-level places
like heap_hot_search_buffer, heap_update, heap_delete, etc.  I think
what you've basically done here is create a new, in-memory heap AM and
then, because we don't have an API for adding new storage managers,
you've bolted it onto the existing heapam layer.  That's certainly a
reasonable approach for creating a PoC, but I think we actually need a
real API here.  Otherwise, when the next person comes along and wants
to add a third heap implementation, they've got to modify all of these
same places again.  I don't think this code is reasonably maintainable
in this form.


As I can see, you recommend to clean up the API of storage
management code. I strongly agree that it's high time to do it.

So, I started the discussion about refactoring and improving API
of heapam and heap relations.
You can find it on commitfest:
https://commitfest.postgresql.org/10/700/

I'll be glad to see your thoughts on the thread.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-07 Thread Tom Lane

Robert Haas  writes:
> I think the whole idea of a fast temporary table is that there are no
> catalog entries.  If there are no catalog entries, then dependencies
> are not visible.  If there ARE catalog entries, to what do they refer?
>  Without a pg_class entry for the table, there's no table OID upon
> which to depend.

TBH, I think that the chances of such a design getting committed are
not distinguishable from zero.  Tables have to have OIDs; there is just
too much code that assumes that.  And I seriously doubt that it will
work (for any large value of "work") without catalog entries.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-07 Thread Robert Haas

On Sat, Aug 6, 2016 at 4:05 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>> For example, suppose I create a fast temporary table and then I create a
>> functional index on the fast temporary table that uses some SQL function
>> defined in pg_proc.
> Just to clarify, did you mean something like this?
> ```
> create fast temp table fasttab(x int, s text);
> create or replace function test_function_for_index(t text) returns text as
> $$
> begin
> return lower(t);
> end;
> $$ language plpgsql immutable;
> create index fasttab_s_idx on fasttab (test_function_for_index(s));
> drop function test_function_for_index(t text);
> ```
> As far as I understand dependencies should protect in case of fasttable too,
> because everything is visible as in regular case, isn't it?

I think the whole idea of a fast temporary table is that there are no
catalog entries.  If there are no catalog entries, then dependencies
are not visible.  If there ARE catalog entries, to what do they refer?
 Without a pg_class entry for the table, there's no table OID upon
which to depend.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-06 Thread Dmitry Dolgov

Hi

I tried to dig into this patch (which seems pretty interesting) to help
bring
it in good shape. Here are few random notes, I hope they can be helpful:

> I think we actually need a real API here.
Definitely, there are plenty places in the new code with the same pattern:
 * figure out if it's an action related to the fast temporary tables based
on
   `ItemPointer`/relation OID/etc
 * if it is, add some extra logic or skip something in original
implementation

in `heapam.c`, `indexam.c`, `xact.c`, `dependency.c`. I believe it's
possible to
make it more generic (although it will contain almost the same logic), e.g.
separate regular and fasttable implementations completely, and decide which
one
we should choose in that particular case.

Btw, I'm wondering about the `heap_*` functions in `heapam.c` - some of
them are
wrapped and never used directly, although they're contains in
`INTERFACE_ROUTINES` (like `simple_heap_delete` -> `heap_delete`), some of
them
aren't. It looks like inconsistency in terms of function names, probably it
should be unified?

> What is needed is an overview of the approach, so that the reviewers can
read
> that first,
I feel lack of such information even in new version of this patch (but, I'm
probably a noob in these matters).  I noted that the `fasttab.c` file
contains some
general overview, but in terms of "what are we doing", and "why are we doing
this". I think general overview of "how are we doing this" also may be
useful.
And there are several slightly obvious commentaries like:

```
+ /* Is it a virtual TID? */
+ if (IsFasttabItemPointer(tid))
```

but I believe an introduction of a new API (see the previous note) will
solve
this eventually.

> Why do we need the new relpersistence value? ISTM we could easily got with
> just RELPERSISTENCE_TEMP, which would got right away of many chances as
the
> steps are exactly the same.
I agree, it looks like `RELPERSISTENCE_FAST_TEMP` hasn't any influence on
the
code.

> For example, suppose I create a fast temporary table and then I create a
> functional index on the fast temporary table that uses some SQL function
> defined in pg_proc.
Just to clarify, did you mean something like this?
```
create fast temp table fasttab(x int, s text);
create or replace function test_function_for_index(t text) returns text as
$$
begin
return lower(t);
end;
$$ language plpgsql immutable;
create index fasttab_s_idx on fasttab (test_function_for_index(s));
drop function test_function_for_index(t text);
```
As far as I understand dependencies should protect in case of fasttable too,
because everything is visible as in regular case, isn't it?

And finally one more question, why items of `FasttabIndexMethodsTable[]`
like
this one:
```
+ /* 2187, non-unique */
+ {InheritsParentIndexId, 1,
+ {Anum_pg_inherits_inhparent, 0, 0},
+ {CompareOid, CompareInvalid, CompareInvalid}
+ },
```
have this commentary before them? I assume it's an id and an extra
information,
and I'm concerned that they can easily become outdated inside commentary
block.

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-05 Thread Robert Haas

On Thu, Aug 4, 2016 at 8:14 AM, Aleksander Alekseev
 wrote:
> Thanks everyone for your remarks and comments!
>
> Here is an improved version of a patch.
>
> Main changes:
> * Patch passes `make installcheck`
> * Code is fully commented, also no more TODO's
>
> I wish I sent this version of a patch last time. Now I realize it was
> really hard to read and understand. Hope I managed to correct this
> flaw. If you believe that some parts of the code are still poorly
> commented or could be improved in any other way please let me know.
>
> And as usual, any other comments, remarks or questions are highly
> appreciated!

Three general comments:

1. It's always seemed to me that a huge problem with anything of this
sort is dependencies.  For example, suppose I create a fast temporary
table and then I create a functional index on the fast temporary table
that uses some SQL function defined in pg_proc.  Then, another user
drops the function.  Then, I try to use the index.  With regular
temporary tables, dependencies protect us here, but if there are no
catalog entries, I wonder how this can ever be made safe.  Similar
problems exist for triggers, constraints, RLS policies, and attribute
defaults.

2. This inserts additional code in a bunch of really low-level places
like heap_hot_search_buffer, heap_update, heap_delete, etc.  I think
what you've basically done here is create a new, in-memory heap AM and
then, because we don't have an API for adding new storage managers,
you've bolted it onto the existing heapam layer.  That's certainly a
reasonable approach for creating a PoC, but I think we actually need a
real API here.  Otherwise, when the next person comes along and wants
to add a third heap implementation, they've got to modify all of these
same places again.  I don't think this code is reasonably maintainable
in this form.

3. Currently, temporary tables are parallel-restricted: a query that
references temporary tables can use parallelism, but access to the
temporary tables is only possible from within the leader.  I suspect,
although I'm not entirely sure, that lifting this restriction would be
easier with our current temporary table implementation than with this
one, because the current temporary table implementation mostly relies
on a set of buffers that could be moved from backend-private memory to
DSM.  On a quick look, this implementation uses a bunch of new data
structures that are heavy on pointers, so that gets quite a bit more
complicated to make parallel-safe (unless we adopt threads instead of
processes!).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-04 Thread Geoff Winkless

On 30 July 2016 at 13:42, Tomas Vondra  wrote:
> I'd argue that if you mess with catalogs directly, you're on your own.

Interesting. What would you suggest people use instead?

Geoff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-04 Thread Aleksander Alekseev

Thanks everyone for your remarks and comments!

Here is an improved version of a patch.

Main changes:
* Patch passes `make installcheck`
* Code is fully commented, also no more TODO's

I wish I sent this version of a patch last time. Now I realize it was
really hard to read and understand. Hope I managed to correct this
flaw. If you believe that some parts of the code are still poorly
commented or could be improved in any other way please let me know.

And as usual, any other comments, remarks or questions are highly
appreciated!

-- 
Best regards,
Aleksander Alekseev
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 8f5332a..ac272e1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1693,7 +1693,7 @@
   
   
p = permanent table, u = unlogged table,
-   t = temporary table
+   t = temporary table, f = fast temporary table
   
  
 
diff --git a/src/backend/access/common/Makefile b/src/backend/access/common/Makefile
index 1fa6de0..56de4dc 100644
--- a/src/backend/access/common/Makefile
+++ b/src/backend/access/common/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/access/common
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = heaptuple.o indextuple.o printtup.o reloptions.o scankey.o \
+OBJS = fasttab.o heaptuple.o indextuple.o printtup.o reloptions.o scankey.o \
 	tupconvert.o tupdesc.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/common/fasttab.c b/src/backend/access/common/fasttab.c
new file mode 100644
index 000..610790d
--- /dev/null
+++ b/src/backend/access/common/fasttab.c
@@ -0,0 +1,1884 @@
+/*-
+ *
+ * fasttab.c
+ *	  virtual catalog and fast temporary tables
+ *
+ * This file contents imlementation of special type of temporary tables ---
+ * fast temporary tables (FTT). From user perspective they work exactly as
+ * regular temporary tables. However there are no records about FTTs in
+ * pg_catalog. These records are stored in backend's memory instead and mixed
+ * with regular records during scans of catalog tables. We refer to
+ * corresponding tuples of catalog tables as "in-memory" or "virtual" tuples
+ * and to all these tuples together --- as "in-memory" or "virtual" catalog.
+ *
+ * Note that since temporary tables are visiable only in one session there is
+ * no need to use shared memory or locks for FTTs. Transactions support is
+ * very simple too. There is no need to track xmin/xmax, etc.
+ *
+ * FTTs are designed to to solve pg_catalog bloating problem. The are
+ * applications that create and delete a lot of temporary tables. It causes
+ * bloating of pg_catalog and running auto vacuum on it. It's quite an
+ * expensive operation that affects entire database performance.
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/common/fasttab.c
+ *
+ *-
+ */
+
+#include "c.h"
+#include "postgres.h"
+#include "pgstat.h"
+#include "miscadmin.h"
+#include "access/amapi.h"
+#include "access/fasttab.h"
+#include "access/relscan.h"
+#include "access/valid.h"
+#include "access/sysattr.h"
+#include "access/htup_details.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "catalog/pg_depend.h"
+#include "catalog/pg_inherits.h"
+#include "catalog/pg_statistic.h"
+#include "storage/bufmgr.h"
+#include "utils/rel.h"
+#include "utils/inval.h"
+#include "utils/memutils.h"
+
+/*
+		  TYPEDEFS, MACRO DECLARATIONS AND CONST STATIC VARIABLES
+ */
+
+/* #define FASTTAB_DEBUG 1 */
+
+#ifdef FASTTAB_DEBUG
+static int32 fasttab_scan_tuples_counter = -1;
+#endif
+
+/* List of in-memory tuples. */
+typedef struct
+{
+	dlist_node	node;
+	HeapTuple	tup;
+}	DListHeapTupleData;
+
+typedef DListHeapTupleData *DListHeapTuple;
+
+/* Like strcmp but for integer types --- int, uint32, Oid, etc. */
+#define FasttabCompareInts(x, y) ( (x) == (y) ? 0 : ( (x) > (y) ? 1 : -1 ))
+
+/* Forward declaration is required for relation_is_inmem_tuple_function */
+struct FasttabSnapshotData;
+typedef struct FasttabSnapshotData *FasttabSnapshot;
+
+/* Predicate that determines whether given tuple should be stored in-memory */
+typedef bool (*relation_is_inmem_tuple_function)
+			(Relation relation, HeapTuple tup, FasttabSnapshot fasttab_snapshot,
+		 int tableIdx);
+
+/* Capacity of FasttabRelationMethods->attrNumbers, see below */
+#define FasttabRelationMaxOidAttributes 2
+
+/* FasttabRelationMethodsTable entry */
+typedef const struct
+{
+	/* relation oid */
+	Oid			relationId;
+	/* predicate that determines whether tuple s

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-01 Thread Tom Lane

Tomas Vondra  writes:
> What is needed is an overview of the approach, so that the reviewers can 
> read that first, instead of assembling the knowledge from pieces 
> scattered over comments in many pieces. But I see the fasttab.c contains 
> this:

> /* TODO TODO comment the general idea - in-memory tuples and indexes, 
> hooks principle, FasttabSnapshots, etc */

A fairly common answer when some feature needs an implementation overview
is to create a README file for it, or add a new section in an existing
README file.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-01 Thread Tomas Vondra


On 08/01/2016 11:45 AM, Aleksander Alekseev wrote:

Hello.

Thanks everyone for great comments!


Well, jokes aside, that's a pretty lousy excuse for not writing any
docs


I think maybe I put it in a wrong way. There are currently a lot of
comments in a code, more then enough to understand how this feature
works. What I meant is that this is not a final version of a patch and
a few paragraphs are yet to be written. At least it's how I see it. If
you believe that some parts of the code are currently hard to understand
and some comments could be improved, please name it and I will be happy
to fix it.


I don't think there's "a lot of comments in the code", not even 
remotely. At least not in the files I looked into - heapam, indexam, 
xact etc. There are a few comments in general, and most of them only 
comment obvious facts, like "ignore in-memory tuples" right before a 
trivial if statement.


What is needed is an overview of the approach, so that the reviewers can 
read that first, instead of assembling the knowledge from pieces 
scattered over comments in many pieces. But I see the fasttab.c contains 
this:


/* TODO TODO comment the general idea - in-memory tuples and indexes, 
hooks principle, FasttabSnapshots, etc */


The other thing that needs to happen is you need to modify comments in 
front of some of the modified methods - e.g. the comments may need a 
paragraph "But when the table is fast temporary, what happens is ..."





IMHO if this patch gets in, we should use it as the only temp table
implementation (Or can you think of cases where keeping rows in
pg_class has advantages?). That'd also eliminate the need for FAST
keyword in the CREATE TABLE command.



Probably has zero value to have slow and fast temp tables (from
catalogue cost perspective). So the FAST implementation should be used
everywhere.


If there are no objections I see no reason no to do it in a next
version of a patch.


I believe there will be a lot of discussion about this.




I'm getting failures in the regression suite


I've run regression suite like 10 times in a row in different
environments with different build flags but didn't manage to reproduce
it. Also our DBAs are testing this feature for weeks now on real-world
applications and they didn't report anything like this. Could you
please describe how to reproduce this issue?



Nothing special:

$ ./configure --prefix=/home/user/pg-temporary --enable-debug \
  --enable-cassert

$ make -s clean && make -s -j4 install

$ export PATH=/home/user/pg-temporary/bin:$PATH

$ pg_ctl -D ~/tmp/data-temporary init

$ pg_ctl -D ~/tmp/data-temporary -l ~/temporary.log start

$ make installcheck

I get the failures every time - regression diff attached. The first 
failure in "rolenames" is expected, because of clash with existing user 
name. The remaining two failures are not.


I only get the failure for "installcheck" but not "check" for some reason.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
*** /home/user/work/postgres/src/test/regress/expected/rolenames.out
2016-07-18 20:48:39.421444507 +0200
--- /home/user/work/postgres/src/test/regress/results/rolenames.out 
2016-08-01 15:07:04.59700 +0200
***
*** 42,47 
--- 42,48 
  CREATE ROLE "current_user";
  CREATE ROLE "session_user";
  CREATE ROLE "user";
+ ERROR:  role "user" already exists
  CREATE ROLE current_user; -- error
  ERROR:  CURRENT_USER cannot be used as a role name here
  LINE 1: CREATE ROLE current_user;
***
*** 956,958 
--- 957,960 
  DROP OWNED BY regress_testrol0, "Public", "current_user", regress_testrol1, 
regress_testrol2, regress_testrolx CASCADE;
  DROP ROLE regress_testrol0, regress_testrol1, regress_testrol2, 
regress_testrolx;
  DROP ROLE "Public", "None", "current_user", "session_user", "user";
+ ERROR:  current user cannot be dropped

==


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-01 Thread Aleksander Alekseev

Hello.

Thanks everyone for great comments!

> Well, jokes aside, that's a pretty lousy excuse for not writing any 
> docs

I think maybe I put it in a wrong way. There are currently a lot of
comments in a code, more then enough to understand how this feature
works. What I meant is that this is not a final version of a patch and
a few paragraphs are yet to be written. At least it's how I see it. If
you believe that some parts of the code are currently hard to understand
and some comments could be improved, please name it and I will be happy
to fix it.

> IMHO if this patch gets in, we should use it as the only temp table 
> implementation (Or can you think of cases where keeping rows in
> pg_class has advantages?). That'd also eliminate the need for FAST
> keyword in the CREATE TABLE command.

> Probably has zero value to have slow and fast temp tables (from
> catalogue cost perspective). So the FAST implementation should be used
> everywhere.

If there are no objections I see no reason no to do it in a next
version of a patch.

> I'm getting failures in the regression suite

I've run regression suite like 10 times in a row in different
environments with different build flags but didn't manage to reproduce
it. Also our DBAs are testing this feature for weeks now on real-world
applications and they didn't report anything like this. Could you
please describe how to reproduce this issue?

> This should to work on slaves - it is one of ToDo

Glad you noticed! In fact I'm currently researching a possibility of
using the same approach for creating writable temporary tables on
replicas.

> The key reason why I don't think that's negotiable is that if there
> aren't (apparently) catalog entries corresponding to the temp tables,
> that will almost certainly break many things in the backend and
> third-party extensions, not only user code patterns like this one.

> In short, I think that the way to make something like this work is to
> figure out how to have "virtual" catalog rows describing a temp table.

I'm afraid once again I put it in a wrong way. What I meant by
"Information about these tables is stored not in a catalog but in
backend's memory" is in fact that _records_ of pg_class, pg_type and
other catalog relations are stored in-memory. Naturally this records
are visible to the user (otherwise \d or \d+ would not work) and you
can do queries like ` select * from pg_class where relname = 'tt1' `.
In other words part of the catalog is indeed "virtual".

> I didn't see support for memory store for column's statistics. Some
> separate questions is about production statistics -
> pg_stat_user_table, ..

> That seems to work (both analyze and pg_stat_user_tables). Not sure 
> where it's in the code, and I'm not willing to reverse engineer it.

Right, `ANALYZE temp_table;` and everything else works. Besides
pg_class, pg_type, pg_attribute and other relations pg_statistic
records for temp tables are stored in-memory as well. IIRC a lot of
pg_stat* relations are in fact views and thus don't require any special
support. If you see that some statistics are broken please don't
hesitate to report it and I will fix it.

Hope I answered all questions so far. I look forward to receive more
comments and questions regarding this patch!

-- 
Best regards,
Aleksander Alekseev


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-07-31 Thread Tomas Vondra


On 07/30/2016 04:47 PM, Tom Lane wrote:

Pavel Stehule  writes:

But there are some patterns used with work with temp tables,that should not
working, and we would to decide if we prepare workaround or not.



-- problematic pattern (old code)
IF NOT EXISTS(SELECT * FROM pg_class WHERE ) THEN
  CREATE TEMP TABLE xxx()
ELSE
  TRUNCATE TABLE xxx;
END IF;



-- modern patter (new code)
BEGIN
  TRUNCATE TABLE xxx;
EXCEPTION WHEN . THEN
  CREATE TEMP TABLE(...)
END;


If the former stops working, that's a sufficient reason to reject the
patch: it hasn't been thought through carefully enough.  The key reason
why I don't think that's negotiable is that if there aren't (apparently)
catalog entries corresponding to the temp tables, that will almost
certainly break many things in the backend and third-party extensions,
not only user code patterns like this one.  We'd constantly be fielding
bug reports that "feature X doesn't work with temp tables anymore".



Agreed - breaking internal features for temporary tables is not 
acceptable. I was thinking more about external code messing with 
catalogs, but on second thought we probably need to keep the records in 
pg_class anyway.


>

In short, I think that the way to make something like this work is
to figure out how to have "virtual" catalog rows describing a temp
table. Or maybe to partition the catalogs so that vacuuming away
temp-table rows is easier/cheaper than today.



Yeah, and I think the patch tries to do that, although in a rather 
invasive / unprincipled way. But this will only work for the current 
behavior (i.e. mostly what SQL standard means by LOCAL). For GLOBAL 
temporary tables I think we need to keep physical catalog row, and only 
override the storage filename.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-07-30 Thread David Steele

On 7/30/16 10:47 AM, Tom Lane wrote:
> Pavel Stehule  writes:
>> But there are some patterns used with work with temp tables,that should not
>> working, and we would to decide if we prepare workaround or not.
> 
>> -- problematic pattern (old code)
>> IF NOT EXISTS(SELECT * FROM pg_class WHERE ) THEN
>>   CREATE TEMP TABLE xxx()
>> ELSE
>>   TRUNCATE TABLE xxx;
>> END IF;
> 
>> -- modern patter (new code)
>> BEGIN
>>   TRUNCATE TABLE xxx;
>> EXCEPTION WHEN . THEN
>>   CREATE TEMP TABLE(...)
>> END;
> 
> If the former stops working, that's a sufficient reason to reject the
> patch: it hasn't been thought through carefully enough.  The key reason
> why I don't think that's negotiable is that if there aren't (apparently)
> catalog entries corresponding to the temp tables, that will almost
> certainly break many things in the backend and third-party extensions,
> not only user code patterns like this one.  We'd constantly be fielding
> bug reports that "feature X doesn't work with temp tables anymore".
> 
> In short, I think that the way to make something like this work is to
> figure out how to have "virtual" catalog rows describing a temp table.
> Or maybe to partition the catalogs so that vacuuming away temp-table
> rows is easier/cheaper than today.

In addition the latter pattern burns an xid which can be a problem for
high-volume databases.

How about CREATE TEMP TABLE IF NOT EXISTS...?

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-07-30 Thread Tom Lane

Pavel Stehule  writes:
> But there are some patterns used with work with temp tables,that should not
> working, and we would to decide if we prepare workaround or not.

> -- problematic pattern (old code)
> IF NOT EXISTS(SELECT * FROM pg_class WHERE ) THEN
>   CREATE TEMP TABLE xxx()
> ELSE
>   TRUNCATE TABLE xxx;
> END IF;

> -- modern patter (new code)
> BEGIN
>   TRUNCATE TABLE xxx;
> EXCEPTION WHEN . THEN
>   CREATE TEMP TABLE(...)
> END;

If the former stops working, that's a sufficient reason to reject the
patch: it hasn't been thought through carefully enough.  The key reason
why I don't think that's negotiable is that if there aren't (apparently)
catalog entries corresponding to the temp tables, that will almost
certainly break many things in the backend and third-party extensions,
not only user code patterns like this one.  We'd constantly be fielding
bug reports that "feature X doesn't work with temp tables anymore".

In short, I think that the way to make something like this work is to
figure out how to have "virtual" catalog rows describing a temp table.
Or maybe to partition the catalogs so that vacuuming away temp-table
rows is easier/cheaper than today.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-07-30 Thread Tomas Vondra


On 07/30/2016 06:49 AM, Pavel Stehule wrote:

1) I wonder whether the FAST makes sense - does this really change
the performance significantly? IMHO you only move the catalog rows
to memory, so why should the tables be any faster? I also believe
this conflicts with SQL standard specification of CREATE TABLE.


Probably has zero value to have slow and fast temp tables (from
catalogue cost perspective). So the FAST implementation should be used
everywhere. But there are some patterns used with work with temp
tables,that should not working, and we would to decide if we prepare
workaround or not.

-- problematic pattern (old code)
IF NOT EXISTS(SELECT * FROM pg_class WHERE ) THEN
  CREATE TEMP TABLE xxx()
ELSE
  TRUNCATE TABLE xxx;
END IF;


I'd argue that if you mess with catalogs directly, you're on your own. 
Not only it's fragile, but this pattern is also prone to race conditions 
(although a concurrent session can't create a conflicting temporary table).




-- modern patter (new code)
BEGIN
  TRUNCATE TABLE xxx;
EXCEPTION WHEN . THEN
  CREATE TEMP TABLE(...)
END;

In this case we can use GUC, because visible behave should be same.


What GUC?



The benefit of zero catalogue cost temp tables is significant - and for
some larger applications the temp tables did hard performance issues.


Yeah, catalog bloat is a serious issue in such cases, and it's amplified 
by indexes created on the temporary tables.




Some other random notes:

1. With this code should not be hard to implement global temp tables -
shared persistent structure, temp local data - significant help for any
who have to migrate from Oracle.


The patch moves in pretty much the opposite direction - if anything, 
it'll make it more difficult to implement global temporary tables, 
because it removes the definitions from the catalog, thus impossible to 
share by catalogs. To get global temporary tables, I think the best 
approach would be to share the catalog definition and only override the 
filename. Or something like that.




2. This should to work on slaves - it is one of ToDo



No, it does not work on slaves, because it still does a read-write 
transaction.


test=# begin read only;
BEGIN
test=# create fast temporary table x (id int);
ERROR:  cannot execute CREATE TABLE in a read-only transaction

No idea how difficult it'd be to make it work.


3. I didn't see support for memory store for column's statistics. Some
separate questions is about production statistics - pg_stat_user_table, ..


That seems to work (both analyze and pg_stat_user_tables). Not sure 
where it's in the code, and I'm not willing to reverse engineer it.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-07-29 Thread Pavel Stehule

2016-07-30 1:46 GMT+02:00 Tomas Vondra :

> Hi,
>
> On 07/29/2016 01:15 PM, Aleksander Alekseev wrote:
>
>> Hello
>>
>> Some time ago we discussed an idea of "fast temporary tables":
>>
>> https://www.postgresql.org/message-id/20160301182500.2c81c3dc%40fujitsu
>>
>> In two words the idea is following.
>>
>> 
>>
>> PostgreSQL stores information about all relations in pg_catalog. Some
>> applications create and delete a lot of temporary tables. It causes a
>> bloating of pg_catalog and running auto vacuum on it. It's quite an
>> expensive operation which affects entire database performance.
>>
>> We could introduce a new type of temporary tables. Information about
>> these tables is stored not in a catalog but in backend's memory. This
>> way user can solve a pg_catalog bloating problem and improve overall
>> database performance.
>>
>> 
>>
>
> Great! Thanks for the patch, this is definitely an annoying issue worth
> fixing. I've spent a bit of time looking at the patch today, comments below
> ...
>

Yes, it some what we need long time

>
>
>> I took me a few months but eventually I made it work. Attached patch
>> has some flaws. I decided not to invest a lot of time in documenting
>> it or pgindent'ing all files yet. In my experience it will be rewritten
>> entirely 3 or 4 times before merging anyway :) But it _works_ and
>> passes all tests I could think of, including non-trivial cases like
>> index-only or bitmap scans of catalog tables.
>>
>>
> Well, jokes aside, that's a pretty lousy excuse for not writing any docs,
> and you're pretty much asking the reviewers to reverse-engineer your
> reasoning. So I doubt you'll get many serious reviews without fixing this
> gap.
>
> Usage example:
>>
>> ```
>> CREATE FAST TEMP TABLE fasttab_test1(x int, s text);
>>
>> INSERT INTO fasttab_test1 VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');
>>
>> UPDATE fasttab_test1 SET s = 'ddd' WHERE x = 2;
>>
>> DELETE FROM fasttab_test1 WHERE x = 3;
>>
>> SELECT * FROM fasttab_test1 ORDER BY x;
>>
>> DROP TABLE fasttab_test1;
>> ```
>>
>> More sophisticated examples could be find in regression tests:
>>
>> ./src/test/regress/sql/fast_temp.sql
>>
>> Any feedback on this patch will be much appreciated!
>>
>>
>>
> 1) I wonder whether the FAST makes sense - does this really change the
> performance significantly? IMHO you only move the catalog rows to memory,
> so why should the tables be any faster? I also believe this conflicts with
> SQL standard specification of CREATE TABLE.
>

Probably has zero value to have slow and fast temp tables (from catalogue
cost perspective). So the FAST implementation should be used everywhere.
But there are some patterns used with work with temp tables,that should not
working, and we would to decide if we prepare workaround or not.

-- problematic pattern (old code)
IF NOT EXISTS(SELECT * FROM pg_class WHERE ) THEN
  CREATE TEMP TABLE xxx()
ELSE
  TRUNCATE TABLE xxx;
END IF;

-- modern patter (new code)
BEGIN
  TRUNCATE TABLE xxx;
EXCEPTION WHEN . THEN
  CREATE TEMP TABLE(...)
END;

In this case we can use GUC, because visible behave should be same.

The benefit of zero catalogue cost temp tables is significant - and for
some larger applications the temp tables did hard performance issues.


>
> 2) Why do we need the new relpersistence value? ISTM we could easily got
> with just RELPERSISTENCE_TEMP, which would got right away of many chances
> as the steps are exactly the same.
>
> IMHO if this patch gets in, we should use it as the only temp table
> implementation (Or can you think of cases where keeping rows in pg_class
> has advantages?). That'd also eliminate the need for FAST keyword in the
> CREATE TABLE command.
>
> The one thin I'm not sure about is that our handling of temporary tables
> is not standard compliant - we require each session to create it's own
> private temporary table. Moving the rows from pg_class into backend memory
> seems to go in the opposite direction, but as no one was planning to fix
> this, I don't think it matters much.
>
> 3) I think the heapam/indexam/xact and various other places needs a major
> rework. You've mostly randomly sprinkled the code with function calls to
> make the patch work - that's fine for an initial version, but a more
> principled approach is needed.
>
> 4) I'm getting failures in the regression suite - apparently the patch
> somehow affects costing of index only scans, so that a several queries
> switch from index only scans to bitmap index scans etc. I haven't
> investigated this more closely, but it seems quite consistent (and I don't
> see it without the patch). It seems the patch delays building of visibility
> map, or something like that.
>

Some other random notes:

1. With this code should not be hard to implement global temp tables -
shared persistent structure, temp local data - significant help for any who
have to migrate from Oracle.

2. This should to work on slaves - it is one of ToDo

3. I didn't see support for memory store

Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-07-29 Thread Tomas Vondra


Hi,

On 07/29/2016 01:15 PM, Aleksander Alekseev wrote:

Hello

Some time ago we discussed an idea of "fast temporary tables":

https://www.postgresql.org/message-id/20160301182500.2c81c3dc%40fujitsu

In two words the idea is following.



PostgreSQL stores information about all relations in pg_catalog. Some
applications create and delete a lot of temporary tables. It causes a
bloating of pg_catalog and running auto vacuum on it. It's quite an
expensive operation which affects entire database performance.

We could introduce a new type of temporary tables. Information about
these tables is stored not in a catalog but in backend's memory. This
way user can solve a pg_catalog bloating problem and improve overall
database performance.




Great! Thanks for the patch, this is definitely an annoying issue worth 
fixing. I've spent a bit of time looking at the patch today, comments 
below ...




I took me a few months but eventually I made it work. Attached patch
has some flaws. I decided not to invest a lot of time in documenting
it or pgindent'ing all files yet. In my experience it will be rewritten
entirely 3 or 4 times before merging anyway :) But it _works_ and
passes all tests I could think of, including non-trivial cases like
index-only or bitmap scans of catalog tables.



Well, jokes aside, that's a pretty lousy excuse for not writing any 
docs, and you're pretty much asking the reviewers to reverse-engineer 
your reasoning. So I doubt you'll get many serious reviews without 
fixing this gap.



Usage example:

```
CREATE FAST TEMP TABLE fasttab_test1(x int, s text);

INSERT INTO fasttab_test1 VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');

UPDATE fasttab_test1 SET s = 'ddd' WHERE x = 2;

DELETE FROM fasttab_test1 WHERE x = 3;

SELECT * FROM fasttab_test1 ORDER BY x;

DROP TABLE fasttab_test1;
```

More sophisticated examples could be find in regression tests:

./src/test/regress/sql/fast_temp.sql

Any feedback on this patch will be much appreciated!




1) I wonder whether the FAST makes sense - does this really change the 
performance significantly? IMHO you only move the catalog rows to 
memory, so why should the tables be any faster? I also believe this 
conflicts with SQL standard specification of CREATE TABLE.


2) Why do we need the new relpersistence value? ISTM we could easily got 
with just RELPERSISTENCE_TEMP, which would got right away of many 
chances as the steps are exactly the same.


IMHO if this patch gets in, we should use it as the only temp table 
implementation (Or can you think of cases where keeping rows in pg_class 
has advantages?). That'd also eliminate the need for FAST keyword in the 
CREATE TABLE command.


The one thin I'm not sure about is that our handling of temporary tables 
is not standard compliant - we require each session to create it's own 
private temporary table. Moving the rows from pg_class into backend 
memory seems to go in the opposite direction, but as no one was planning 
to fix this, I don't think it matters much.


3) I think the heapam/indexam/xact and various other places needs a 
major rework. You've mostly randomly sprinkled the code with function 
calls to make the patch work - that's fine for an initial version, but a 
more principled approach is needed.


4) I'm getting failures in the regression suite - apparently the patch 
somehow affects costing of index only scans, so that a several queries 
switch from index only scans to bitmap index scans etc. I haven't 
investigated this more closely, but it seems quite consistent (and I 
don't see it without the patch). It seems the patch delays building of 
visibility map, or something like that.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-07-29 Thread Aleksander Alekseev

Hello

Some time ago we discussed an idea of "fast temporary tables":

https://www.postgresql.org/message-id/20160301182500.2c81c3dc%40fujitsu

In two words the idea is following.



PostgreSQL stores information about all relations in pg_catalog. Some
applications create and delete a lot of temporary tables. It causes a
bloating of pg_catalog and running auto vacuum on it. It's quite an
expensive operation which affects entire database performance.

We could introduce a new type of temporary tables. Information about
these tables is stored not in a catalog but in backend's memory. This
way user can solve a pg_catalog bloating problem and improve overall
database performance.



I took me a few months but eventually I made it work. Attached patch
has some flaws. I decided not to invest a lot of time in documenting
it or pgindent'ing all files yet. In my experience it will be rewritten
entirely 3 or 4 times before merging anyway :) But it _works_ and
passes all tests I could think of, including non-trivial cases like
index-only or bitmap scans of catalog tables.

Usage example:

```
CREATE FAST TEMP TABLE fasttab_test1(x int, s text);

INSERT INTO fasttab_test1 VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');

UPDATE fasttab_test1 SET s = 'ddd' WHERE x = 2;

DELETE FROM fasttab_test1 WHERE x = 3;

SELECT * FROM fasttab_test1 ORDER BY x;

DROP TABLE fasttab_test1;
```

More sophisticated examples could be find in regression tests:

./src/test/regress/sql/fast_temp.sql

Any feedback on this patch will be much appreciated!

-- 
Best regards,
Aleksander Alekseev
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 0689cc9..940210b 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1693,7 +1693,7 @@
   
   
p = permanent table, u = unlogged table,
-   t = temporary table
+   t = temporary table, f = fast temporary table
   
  
 
diff --git a/src/backend/access/common/Makefile b/src/backend/access/common/Makefile
index 1fa6de0..56de4dc 100644
--- a/src/backend/access/common/Makefile
+++ b/src/backend/access/common/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/access/common
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = heaptuple.o indextuple.o printtup.o reloptions.o scankey.o \
+OBJS = fasttab.o heaptuple.o indextuple.o printtup.o reloptions.o scankey.o \
 	tupconvert.o tupdesc.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/common/fasttab.c b/src/backend/access/common/fasttab.c
new file mode 100644
index 000..0a20247
--- /dev/null
+++ b/src/backend/access/common/fasttab.c
@@ -0,0 +1,1678 @@
+/* TODO TODO comment the general idea - in-memory tuples and indexes, hooks principle, FasttabSnapshots, etc */
+
+#include "c.h"
+#include "postgres.h"
+#include "pgstat.h"
+#include "miscadmin.h"
+#include "access/amapi.h"
+#include "access/fasttab.h"
+#include "access/relscan.h"
+#include "access/valid.h"
+#include "access/sysattr.h"
+#include "access/htup_details.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "catalog/pg_depend.h"
+#include "catalog/pg_inherits.h"
+#include "catalog/pg_statistic.h"
+#include "storage/bufmgr.h"
+#include "utils/rel.h"
+#include "utils/inval.h"
+#include "utils/memutils.h"
+
+/*
+		  TYPEDEFS, MACRO DECLARATIONS AND CONST STATIC VARIABLES
+ */
+
+/* #define FASTTAB_DEBUG 1 */
+
+#ifdef FASTTAB_DEBUG
+static int32 fasttab_scan_tuples_counter = -1;
+#endif
+
+/* list of in-memory catalog tuples */
+typedef struct
+{
+	dlist_node	node;
+	HeapTuple	tup;
+}	DListHeapTupleData;
+
+typedef DListHeapTupleData *DListHeapTuple;
+
+#define FasttabSnapshotIsAnonymous(sn) ( !PointerIsValid((sn)->name) )
+#define FasttabSnapshotIsRoot(sn) ( !PointerIsValid((sn)->prev) )
+#define FasttabTransactionInProgress() \
+	( PointerIsValid(FasttabSnapshotGetCurrent()->prev))
+/* like strcmp but for integer types --- int, uint32, Oid, etc */
+#define FasttabCompareInts(x, y) ( (x) == (y) ? 0 : ( (x) > (y) ? 1 : -1 ))
+
+struct FasttabSnapshotData;		/* forward declaration required for
+ * relation_is_inmem_tuple_function typedef */
+typedef struct FasttabSnapshotData *FasttabSnapshot;
+
+typedef bool (*relation_is_inmem_tuple_function)
+			(Relation relation, HeapTuple tup, FasttabSnapshot fasttab_snapshot,
+		 int tableIdx);
+
+#define FasttabRelationMaxOidAttributes 2
+
+typedef const struct
+{
+	Oid			relationId;
+	relation_is_inmem_tuple_function is_inmem_tuple_fn;
+	AttrNumber	noidattr;
+	AttrNumber	attrNumbers[FasttabRelationMaxOidAttributes];
+}	FasttabRelationMethodsData;
+
+typedef FasttabRelationMethodsData const *FasttabRelationMethods;
+
+static bool generic_is_inmem_tuple(Relation relation, HeapTuple tup,
+	   FasttabSnapshot fasttab_snapshot, int tableIdx);
+stat

61 matches

Mail list logo