On Tue, Apr 30, 2013 at 10:40 AM, Kevin Grittner <kgri...@ymail.com> wrote:
>>> What is a real problem or risk with using this mechanism until we
>>> engineer something better?  What problems with converting to a
>>> later major release does anyone see?
>>
>> Well, it's a pg_upgrade hazard, if nothing else, isn't it?
>
> I don't think so.  What do you see as a problem?

pg_upgrade only handles changes in catalog state, not on-disk
representation.  If the on-disk representation of an non-scannable
view might change in a future release, it's a pg_upgrade hazard.

>> Sure, I wouldn't allow that either.  My point is that I feel that
>> could be engineered around in user space.  If you have a materialized
>> view which could legitimately be empty (there are many for which that
>> won't be an issue), then you can either arrange the view definition so
>> that it isn't (e.g. by including a dummy record that clients can look
>> for), or you can include a sentinel unlogged table that will contain a
>> row if and only if materialized views have been refreshed since the
>> last crash.  In the examples I can think of,
>> indefinitely-stale-but-valid-at-some-point wouldn't be very good
>> either, so I would anticipate needing to do some engineering around
>> relative freshness anyway - e.g. keeping a side table that lists the
>> last-refreshed-time for each matview.  Or, maybe I'd wouldn't care
>> about tracking elapsed time per se, but instead want to track
>> freshness relative to updates - e.g. set things up so that anyone who
>> performs an action that would invalidate a row in the materialized
>> view would also update a row someplace that would allow me to identify
>> the row as stale.  In either case, handling the case where the view is
>> altogether invalid doesn't seem to add a whole lot of additional
>> complexity.
>>
>> Now, I can imagine cases where it does.  For example, suppose you have
>> a cron job (which you trust to work) to refresh your materialized
>> views every night.  Well, that means that you'll never be more than 24
>> hours stale - but if any of those materialized views are unlogged,
>> that also means that you could have no data at all for up to 24 hours
>> following a crash.  Not great, because now you need some logic to
>> handle just that one case that wouldn't be necessary if the DB did it
>> for you.  But I just think it's a judgement call how serious one
>> thinks that scenario is, vs. any other scenario where a boolean isn't
>> adequate either.
>
> "Staleness" is a completely different issue, in my view, from
> quietly returning results that are not, and never were, accurate.
> Sure we need to implement more refined "scannability" tests than
> whether valid data from *some* point in time is present.  But that
> should always be *part* of the scannability testing, and without it
> I don't feel we have a feature of a quality suitable for delivery.

I don't really see these as being all that separate.  The user is
going to want to know whether the data is usable or not.  The answer
to that question depends on the user's business rules, and those could
be anything.  The current system implements the following business
rule: "The data can be used if the matview has ever been populated."
But there are lots of other possibilities:

- The data can be used if the matview is fully up-to-date.
- The data can be used if the matview is not out of date by more than
a certain amount of time.
- The data can be used if the matview is out of date with respect to
one of its base tables, but not if it is out of date with respect to
another of its base tables.  For example, maybe you're OK with using
an order-summary view if new orders have arrived (but not if the view
is more than a day out of date); but not if any customers have been
renamed.

Not only can the business rules vary, but they can vary between
queries.  Query A might need an exact answer, hence can only use the
matview when its up to date.  Query B against the very same matview
might only need data that's up to date within some time threshold, and
query C might well want to just use whatever data exists.   I'm not at
all clear that it's even feasible to solve all of these problems
inside the database; my suspicion is that at least some of these
things are going to end up being things that have to be handled at
least partially in user-space, while others will be handled in the DB
through mechanisms not yet designed.

I am even willing to go so far as to say that I am unconvinced that
everyone will want a just-truncated materialized view to be
automatically set to non-scannable.  Suppose you have an unlogged view
that summarizes a real-time data stream - all SMTP traps received in
the last minute summarized by the node from which they were received.
When the server reboots unexpectedly, the raw data is lost along with
the summary.  But for the scannability flag, we'd read the summary as
empty, which would be right.  Adding the scannability flag forces the
user to add application logic to treat the error as equivalent to an
empty view.

The real concern I have about using an empty page to differentiate
whether or not the view is scannable is that I think it's a modularity
violation.  But on the value of the underlying feature, my concern is
that there are an essentially infinite number of possible business
rules here that someone might want to implement, and I don't think
it's the job of a materialized view, either now or in the future, to
lock the user into any particular set of policy decisions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to