On Tue, Apr 30, 2013 at 7:29 AM, Kevin Grittner <kgri...@ymail.com> wrote:
> Let's look at the "corner" this supposedly paints us into.  If a
> later major release creates a better mechanism, current pg_dump and
> load will already use it, based on the way matviews are created
> empty and REFRESHed by pg_dump.  Worst case, we need to modify the
> behavior of pg_dump running with the switch used by pg_upgrade to
> use a new ALTER MATERIALIZED VIEW SET (populated); (or whatever
> syntax is chosen) -- a command we would probably want at that point
> anyway.  I'm not seeing cause for panic here.
>
> What is a real problem or risk with using this mechanism until we
> engineer something better?  What problems with converting to a
> later major release does anyone see?

Well, it's a pg_upgrade hazard, if nothing else, isn't it?

> Yeah, I know you don't get it, but as a DBA I would never have
> allowed a feature which could quietly generate false results to be
> used -- which is exactly what you have without a way to
> differentiate these cases.  If it comes down to a choice between a
> performance tweak like unlogged matviews and an issue of
> correctness of results, I will pick correctness.  Now, as I've
> already said, this tweak is significant (I expect it will make
> matviews useful in roughly twice as many cases), but it is just a
> performance tweak.

Sure, I wouldn't allow that either.  My point is that I feel that
could be engineered around in user space.  If you have a materialized
view which could legitimately be empty (there are many for which that
won't be an issue), then you can either arrange the view definition so
that it isn't (e.g. by including a dummy record that clients can look
for), or you can include a sentinel unlogged table that will contain a
row if and only if materialized views have been refreshed since the
last crash.  In the examples I can think of,
indefinitely-stale-but-valid-at-some-point wouldn't be very good
either, so I would anticipate needing to do some engineering around
relative freshness anyway - e.g. keeping a side table that lists the
last-refreshed-time for each matview.  Or, maybe I'd wouldn't care
about tracking elapsed time per se, but instead want to track
freshness relative to updates - e.g. set things up so that anyone who
performs an action that would invalidate a row in the materialized
view would also update a row someplace that would allow me to identify
the row as stale.  In either case, handling the case where the view is
altogether invalid doesn't seem to add a whole lot of additional
complexity.

Now, I can imagine cases where it does.  For example, suppose you have
a cron job (which you trust to work) to refresh your materialized
views every night.  Well, that means that you'll never be more than 24
hours stale - but if any of those materialized views are unlogged,
that also means that you could have no data at all for up to 24 hours
following a crash.  Not great, because now you need some logic to
handle just that one case that wouldn't be necessary if the DB did it
for you.  But I just think it's a judgement call how serious one
thinks that scenario is, vs. any other scenario where a boolean isn't
adequate either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to