On 2013-04-30 07:33:05 -0700, Kevin Grittner wrote: > Andres Freund <and...@2ndquadrant.com> wrote: > > 1) vacuum can truncate the table to an empty relation already if there is > > no data returned by the view's query which then leads to the wrong > > scannability state. > > > > S1: CREATE MATERIALIZED VIEW matview_empty AS SELECT false WHERE random() > >< -1; > > S2: S2: SELECT * FROM matview_empty; -- works > > S1: VACUUM matview_empty; > > S2: SELECT * FROM matview_empty; -- still works > > S3: SELECT * FROM matview_empty; -- errors out > > > > So we need to make vacuum skip cleaning out the last page. Once we > > get incrementally updated matviews there are more situations to get > > into this than just a query not returning anything. > > I remember this being discussed somewhere already, but couldn't find > > it quickly enough. > > Yeah, I posted a short patch earlier on this thread that fixes > that. Nobody has commented on it, and so far I have not committed > anything related to this without posting details and giving ample > opportunity for anyone to comment. If nobody objects, I can push > that, and this issue is gone.
Well, this bug is gone, but the multiple layer violations aren't. > > Imo this is quite an indicator that the idea of using the filesize is > > just plain wrong. Adding logic to vacuum not to truncate data because > > its a flag for matview scannability is quite the layer violation and > > a sign for a bad design decision. Such a hack has already been added > > to copy_heap_data(), while not as bad, shows that it is hard to find > > all the places where we need to add it. > > I agree, but if you review the thread I started with a flag in > pg_class, I tried using the "special" area of the first page of the > heap, and finally wound up with the current technique based on > several people reviewing the patch and offering opinions and > reaching an apparent consensus that this was the least evil way to > do it given current infrastructure for unlogged tables. This > really is a result of trying to preserver *correct* behavior while > catering to those emphasizing how important the performance of > unlogged matviews is. Imo it now uses the worst of both worlds... > > 2) Since we don't have a metapage to represent scannability in 9.3 we > > cannot easily use one in 9.4+ without pg_upgrade emptying all > > matviews unless we only rely on the catalogs which we currently > > cannot. > I am not following this argument at all. Doesn't pg_upgrade use > pg_dump to create the tables and matviews WITH NO DATA and take > later action for ones which are populated in the source? It would > not be hard at all to move to a new release which used a different > technique for tracking populated tables by changing what pg_dump > does for populated tables with the switch pg_upgrade uses. What I am thinking about is a 100GB materialized view which has been filled in 9.3 and should now be pg_upgraded into 9.4. If we don't have a metapage yet and we want to add one we would need to rewrite the whole 100GB which seems like a rather bad idea. Or how are you proposing to add the metapage? You cannot add it to the end or somesuch. > I am not seeing this at all. Given my comment above about how this > works for pg_upgrade and pg_dump, can you explain how this is a > problem? Well, that only works if there is a cheap way to add the new notation to existing matview heaps which I don't see. > > 3) Using the filesize as a flag will make other stuff like preallocating > > on-disk data in bigger chunks and related things more complicated. > > Why? You don't need to preallocate for a non-populated matview, > since its heap will be replaced on REFRESH. You would not *want* > to preallocate a lot of space for something guaranteed to remain at > zero length until deleted. But we would likely also want to optimize reducing the filesize in the same chunks because you otherwise would end up with even worse fragmentation. And I am not talking about an unscannable relation but about one which is currently empty (e.g. SELECT ... WHERE false). > > 4) doing the check for scannability in the executor imo is a rather bad > > idea. Note that we e.g. don't see an error about a matview which > > won't be scanned because of constraint exclusion or one-time > > filters. > > > > S1: CREATE MATERIALIZED VIEW matview_unit_false AS SELECT false WHERE > >false > > WITH NO DATA; > > S1: SELECT * FROM matview_unit_false; > > > > You can get there in far more reasonable cases than WHERE false. > > I am a little concerned about that, but the case you show doesn't demonstrate > a problem: > The view was defined WITH NO DATA and is behaving correctly for > that case. When populated (with zero rows) it behaves correctly: Ah. Tom already fixed the problem in 5194024d72f33fb209e10f9ab0ada7cc67df45b7. I was working in a branch based on 3ccae48f44d993351e1f881761bd6c556ebd6638 and it failed there. > I would be interested to see whether anyone can come up with an > actual mis-behavior related to this either before or after the > recent refactoring. Seems ok after Tom's refactoring. > > Not sure what the consequence of this is. The most reasonable solution > > seems to be to introduce a metapage somewhere *now* which sucks, but ... > > That seems far riskier to me than using the current > lame-but-functional approach now and improving it in a subsequent > release. Why? We have bitten by the lack of such metapages several times now and I don't think we have bitten by their presence in relation types that have them by now. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers