> A major reason for this is that logical logs make recovery contingent
> on being able to execute the "parsed statements". This execution
> might, however, not be possible if the system is itself not in a
> consistent state .. as is normally the case during recovery.
I am not sure I follow you here. The log should (IMHO) be able to save either type of data: physical pages (what happens now), or the SQL statement when it is small but generates a large number of changes.
If the DB state cannot be put back to a consistent state prior to a SQL statement in the log, then NO amount of logging will help. The idea is that the state can be put back to what it was prior to a particular log entry, be it raw datafile blocks or a SQL statement.
> What if, for instance, it's the catalog tables that were hosed when
> the system went down ? It may be difficult to execute the parsed
> statements without the catalogs.
See above. If this cannot be resolved prior to re-executing a statement in the log, then the problem is beyond ANY subsequent logging.
> Having said that, page-oriented undo logging can be a pain when B-tree
> pages split. For higher concurrency, ARIES uses logical undo
> logging. In this case, the logs are akin to your "parsed statement"
> idea.
Yes, my experience exactly. Maybe we are the only company on the planet that experiences this sort of thing. Maybe not.
> In any case, the only place that parsed statements are useful, imo are
> with searched updates that cause a large number of records to change
> and with "insert into from select" statements.
Yes: correlated UPDATEs, INSERT INTO with subselects, and mass DELETEs on heavily indexed tables. Index creation... the list goes on and on. I have experienced, and live with, all of this on a daily basis with Oracle. And I despise it.
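To make that concrete, here is the sort of statement I mean (the table and column names below are hypothetical, purely for illustration). The statement itself is a few hundred bytes, but because every modified row also touches every index on the table, the physical log traffic can be orders of magnitude larger:

    -- Correlated UPDATE: one short statement, millions of changed rows,
    -- plus a log entry per index per row under physical logging.
    UPDATE addresses
       SET carrier_route = (SELECT s.carrier_route
                              FROM usps_staging s
                             WHERE s.addr_id = addresses.addr_id)
     WHERE EXISTS (SELECT 1
                     FROM usps_staging s
                    WHERE s.addr_id = addresses.addr_id);

    -- Mass DELETE on a heavily indexed table: same story.
    DELETE FROM addresses
     WHERE delivery_status = 'VACANT';

Logged as a statement, each of these is a single short log record; logged physically, it is a record (or page image) for every heap row and every index entry touched.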
The difference is, of course, I can't even have this kind of discussion with Oracle, but I can here. ;-)
> Then, there is also the case that this, the "parsed statements"
> approach, is not a general solution. How would you handle the "update
> current of cursor" scenarios ? In this case, there is some application
> logic that determines the precise records that change and how they
> change.
>
> Ergo, it is my claim that while logical redo logging does have some
> benefits, it is not a viable general solution.
Agreed, this is not a general solution. What it is, however, is a tremendous improvement over the current situation for transactions that do massive changes to heavily indexed datasets.
I am working on an application right now that will require current postal information on EVERY address in the U.S. -- street name, street address, directional, subunit, 5-digit ZIP, 3-digit ZIP, city, state, delivery point barcode, carrier route, latitude, longitude, etc. Most of these fields will need to be indexed, because they will be searched in real time via a web application several thousand times per day.
To keep the addresses current, we will be updating them all (150+ million) on a programmed basis, going through and updating several million addresses EVERY DAY, while ensuring that the updates happen atomically so that they don't disrupt web activity.
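As a rough sketch of one day's run (assuming PostgreSQL's UPDATE ... FROM syntax and made-up table/column names, with the day's changes staged in usps_updates), each batch is a single transaction, so the web application keeps seeing the previously committed rows until the COMMIT:

    BEGIN;

    -- Apply today's postal changes (several million rows) atomically.
    UPDATE addresses
       SET zip5          = u.zip5,
           carrier_route = u.carrier_route,
           dpbc          = u.dpbc
      FROM usps_updates u
     WHERE u.addr_id = addresses.addr_id;

    COMMIT;

And that is exactly the case I am talking about: the one statement above would generate physical log records for every changed row and every index entry, where a logical log record would be a single line.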
Maybe this is not a "traditional" RDBMS app, but I am not in the mood to write my own storage infrastructure for it.
Then again, maybe I don't know what I am talking about...
Marty
Sailesh Krishnamurthy wrote:
"Marty" == Marty Scholes <[EMAIL PROTECTED]> writes:
Marty> Why have I not seen this in any database?
Marty> There must be a reason.
For ARIES-style systems, logging parsed statements (commonly called
"logical" logging) is not preferred compared to logging data items
("physical" or "physiological" logging).
A major reason for this is that logical logs make recovery contingent
on being able to execute the "parsed statements". This execution
might, however, not be possible if the system is itself not in a
consistent state .. as is normally the case during recovery.
What if, for instance, it's the catalog tables that were hosed when
the system went down ? It may be difficult to execute the parsed
statements without the catalogs.
For this reason, a major goal of ARIES was to have each and every data object (tables/indexes) individually recoverable. So ARIES follows page-oriented redo logging.
Having said that, page-oriented undo logging can be a pain when B-tree pages split. For higher concurrency, ARIES uses logical undo logging. In this case, the logs are akin to your "parsed statement" idea.
In any case, the only place that parsed statements are useful, imo are with searched updates that cause a large number of records to change and with "insert into from select" statements.
Then, there is also the case that this, the "parsed statements"
approach, is not a general solution. How would you handle the "update
current of cursor" scenarios ? In this case, there is some application
logic that determines the precise records that change and how they
change.
Ergo, it is my claim that while logical redo logging does have some benefits, it is not a viable general solution.