Re: [HACKERS] Updateable cursors

Joshua D. Drake Mon, 22 Jan 2007 21:37:42 -0800

FAST PostgreSQL wrote:
> On Tue, 23 Jan 2007 15:48, Joshua D. Drake wrote:
>> FAST PostgreSQL wrote:
>>> We are trying to develop the updateable cursors functionality into
>>> Postgresql. I have given below details of the design and also issues we
>>> are facing.  Looking forward to the advice on how to proceed with these
>>> issues.
>>>
>>> Rgds,
>>> Arul Shaji
>> Would this be something that you would hope to submit for 8.3?
> 
> Yes definitely. If we can finish it before the feature freeze of course.


Great! I will put it on my, "Remember to bug Arul" list :)

Sincerely,

Joshua D. Drake

> 
> Rgds,
> Arul Shaji
> 
> 
>> Joshua D. Drake
>>
>>> 1. Introduction
>>> --------------
>>> This is a combined proposal and design document for adding updatable
>>> (insensitive) cursor capability to the PostgreSQL database.
>>> There have already been a couple of previous proposals since 2003 for
>>> implementing this feature so there appears to be community interest in
>>> doing so. This will enable the following constructs to be processed:
>>>
>>>
>>> UPDATE <table_name> SET value_list WHERE CURRENT OF <cursor_name>
>>> DELETE FROM <table_name> WHERE CURRENT OF <cursor_name>
>>>
>>> This has the effect of users being able to update or delete specific rows
>>> of a table, as defined by the row currently fetched into the cursor.
>>>
>>>
>>> 2. Overall Conceptual Design
>>> -----------------------------
>>> The design is considered from the viewpoint of progression of a command
>>> through the various stages of processing, from changes to the file
>>> ?gram.y? to implement the actual grammar changes, through to changes in
>>> the Executor portion of the database architecture.
>>>
>>> 2.1 Changes to the Grammar
>>> ------------------------------
>>> The following changes will be done to the PostgreSQL grammar:
>>>
>>> UPDATE statement has the option ?WHERE CURRENT OF <cursor_name>? added
>>> DELETE statement has the option ?WHERE CURRENT OF <cursor_name>? added
>>>
>>> The cursor_name data is held in the UpdateStmt and DeleteStmt structures
>>> and contains just the name of the cursor.
>>>
>>> The pl/pgsql grammar changes in the same manner.
>>>
>>> The word CURRENT will be added to the ScanKeywords array in keywords.c.
>>>
>>>
>>> 2.2 Changes to Affected Data Structures
>>> ------------------------------------------
>>> The following data structures are affected by this change:
>>>
>>> Portal structure, QueryDesc structure, the UpdateStmt and DeleteStmt
>>> structures
>>>
>>> The Portal will contain a list of structures of relation ids and tuple
>>> ids relating to the tuple held in the QueryDesc structure. There will be
>>> one entry in the relation and tuple id list for each entry in the
>>> relation-list of the statement below:
>>>
>>> DECLARE <cursor_name> [WITH HOLD] SELECT FOR UPDATE OF <relation-list>
>>>
>>> The QueryDesc structure will contain the relation id and the tuple id
>>> relating to the tuple obtained via the FETCH command so that it can be
>>> propagated back to the Portal for storage in the list described above.
>>>
>>> The UpdateStmt and DeleteStmt structures have the cursor name added so
>>> that the information is available for use in obtaining the portal
>>> structure related to the cursor previously opened via the DECLARE CURSOR
>>> request.
>>>
>>>
>>> 2.3 Changes to the SQL Parser
>>> ------------------------------------
>>> At present, although the FOR UPDATE clause of the DECLARE CURSOR command
>>> has been present in the grammar, it causes an error message later in the
>>> processing since cursors are currently not updatable. This now needs to
>>> change. The ?FOR UPDATE? clause has to be valid, but not the ?FOR SHARE?
>>> clause.
>>>
>>> The relation names that follow the ?FOR UPDATE? clause will be added to
>>> the rtable in the Query structure and identified by means of the rowMarks
>>> array. In the case of an updatable cursor the FOR SHARE option is not
>>> allowed therefore all entries in the rtable that are identified by the
>>> rowMarks array must relate to tables that are FOR UPDATE.
>>>
>>> In the UPDATE or DELETE statements the ?WHERE CURRENT OF <cursor_name>?
>>> clause results in the cursor name being placed in the UpdateStmt or
>>> DeleteStmt structure. During the processing of the functions -
>>> transformDeleteStmt() and transformUpdateStmt() - the cursor name is used
>>> to obtain a pointer to the related Portal structure and the tuple
>>> affected by the current UPDATE or DELETE statement is extracted from the
>>> Portal, where it has been placed as the result of a previous FETCH
>>> request. At this point all the information for the UPDATE or DELETE
>>> statement is available so the statements can be transformed into standard
>>> UPDATE or DELETE statements and sent for re-write/planning/execution as
>>> usual.
>>>
>>> 2.4 Changes to the Optimizer
>>> ------------------------------
>>> There is a need to add a TidScan node to planning UPDATE / DELETE
>>> statements where the statements are ?UPDATE / DELETE at position?. This
>>> is to enable the tuple ids of the tuples in the tables relating to the
>>> query to be obtained. There will need to be a new mechanism to achieve
>>> this, as at present, a Tid scan is done only if there is a standard WHERE
>>> condition on update or delete statements to provide Tid qualifier data.
>>>
>>>
>>> 2.5 Changes to the Executor
>>> -------------------------------
>>> There are various options that have been considered for this part of the
>>> enhancement. These are described in the sections below.
>>>
>>> We would like to hear opinions on which option is the best way to go or
>>> if none of these is acceptable, any alternate ideas ?
>>>
>>> Option 1  MVCC Via Continuous Searching of Database
>>>
>>> The Executor is to be changed in the following ways:
>>> 1)  When the FETCH statement is executed the id of the resulting tuple is
>>> extracted and passed back to the Portal structure to be saved to indicate
>>> the cursor is currently positioned on a tuple.
>>> 2)  When the UPDATE or DELETE request is executed the tuple id previously
>>> FETCHed is held in the QueryDesc structure so that it can be compared
>>> with the tuple ids returned from the TidScan node processed prior to the
>>> actual UPDATE / DELETE node in the plan. This enables a decision to be
>>> made as to whether the tuple held in the cursor is visible to the UPDATE
>>> / DELETE request according to the rules of concurrency. The result is
>>> that, at the cost of repeatedly searching the database at each UPDATE /
>>> DELETE command, the hash table is no longer required.
>>> This approach has the advantage that there is no hash table held in
>>> memory or on disk so it will not be memory intensive but will be
>>> processing intensive.
>>>
>>> This is a good ?one-off? solution to the problem and, taken in isolation
>>> is probably the best approach. However, if one considers the method(s)
>>> used in other areas of PostgreSQL, it is probably not the best solution.
>>> This option will probably not be used further.
>>>
>>> Option 2  MVCC via New Snapshot
>>>
>>> The executor can be changed by adding a new kind of snapshot that is
>>> specifically used for identifying if a given tuple, retrieved from the
>>> database during an update or delete statement should be visible during
>>> the current transaction.
>>>
>>> This approach requires a new kind of snapshot (this idea was used by
>>> Gavin for a previous updatable cursor patch but objections were raised.)
>>>
>>> Option 3  MVCC Via Hash Table in Memory
>>>
>>> The executor can be changed by saving into a hash table and comparing
>>> each tuple in the cursor with that set to check if the tuple should be
>>> visible. This approach has the advantage that it will be quick. It has
>>> the disadvantage that, since the hash table will contain all the tuples
>>> of the table being checked that it may use all local memory for a large
>>> table.
>>>
>>> Option 4  MVCC Via Hash Table on Disk
>>>
>>> When the UPDATE or DELETE request is executed the first time the Tid scan
>>> database retrieval will be done first. At this time the tuple id of each
>>> row in the table to be updated by the request will be available in the
>>> executor. These tuple ids need to be stored in a hash table that is
>>> stored to disk, as, if the table is large there could be a huge number of
>>> tuple ids. This data is then available for comparison with the individual
>>> tuple to be updated or deleted to check if it should be processed. The
>>> hash table will exist for the duration of the transaction, from BEGIN to
>>> END (or ABORT).
>>>
>>> The hash table is then used to identify if the tuple should be visible
>>> during the current transaction. If the tuple should be visible then the
>>> update or delete proceeds as usual.
>>>
>>> This approach has the advantage that it will use little memory but will
>>> be relatively slow as the data has to be accessed from disk.
>>>
>>> Option 5 Store Tuple Id in Snapshot.
>>>
>>> The Snapshot structure can be changed to include the tuple id. This
>>> enables the current state of the tuple to be identified with respect to
>>> the current transaction.
>>> The tuple id, as identified in the cursor at the point where the
>>> DELETE/UPDATE statement is being processed, can use the snapshot to
>>> identify if the tuple should be visible in the context of the current
>>> transaction.
>>>
>>>
>>> 2.6 Changes to the Catalog
>>> ----------------------------
>>> The Catalog needs to reflect changes introduced by the updatable cursor
>>> implementation. A boolean attribute ?is_for_update? is to be added to the
>>> pg_cursors implementation. It will define that the cursor is for update
>>> (value is FALSE) or for share (value is TRUE, the default value).
>>>
>>>
>>> 3 Design Assumptions
>>> ----------------------------
>>> The following design assumptions are made:
>>>
>>> As PostgreSQL8.2 does not support the SENSITIVE cursor option the tuples
>>> contained in a cursor can never be updated so these tuples will always
>>> appear in their ?original? form as at the start of the transaction. This
>>> is in breach of the SQL2003 Standard as described in
>>> 5WD-02-Foundation-2003-09.pdf, p 810. The standard requires the updatable
>>> cursor to be declared as sensitive.
>>>
>>> With respect to nested transactions ? In PostgreSQL nested transactions
>>> are implemented by defining ?save points? via the keyword SAVEPOINT. A
>>> ?ROLLBACK TO SAVEPOINT? rolls back the database contents to the last
>>> savepoint in this transaction or the begin statement, whichever is
>>> closer.
>>>
>>> It is assumed that the FETCH statement is used to return only a single
>>> row into the cursor with each command when the cursor is updatable.
>>>
>>> According to the SQL2003 Standard Update and Delete statements may
>>> contain only a single base table.
>>>
>>> The DECLARE CURSOR statement is supposed to use column level locking, but
>>> PostgreSQL supports only row level locking. The result of this is that
>>> the column list that the standard requires ?DECLARE <cursor_name> SELECT
>>> ? FOR UPDATE OF column-list? becomes a relation (table) list.
>>>
>>> This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN
>>> 27 003 693 481. It is confidential to the ordinary user of the email
>>> address to which it was addressed and may contain copyright and/or
>>> legally privileged information. No one else may read, print, store, copy
>>> or forward all or any of it or its attachments. If you receive this email
>>> in error, please return to sender. Thank you.
>>>
>>> If you do not wish to receive commercial email messages from Fujitsu
>>> Australia Software Technology Pty Ltd, please email
>>> [EMAIL PROTECTED]
>>>
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>>>        choose an index scan if your joining column's datatypes do not
>>>        match
> This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 
> 003 693 481. It is confidential to the ordinary user of the email address to 
> which it was addressed and may contain copyright and/or legally privileged 
> information. No one else may read, print, store, copy or forward all or any 
> of it or its attachments. If you receive this email in error, please return 
> to sender. Thank you.
> 
> If you do not wish to receive commercial email messages from Fujitsu 
> Australia Software Technology Pty Ltd, please email [EMAIL PROTECTED]
> 


-- 

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly

Re: [HACKERS] Updateable cursors

Reply via email to