Re: [HACKERS] Updateable cursors

Joshua D. Drake Mon, 22 Jan 2007 20:45:45 -0800

FAST PostgreSQL wrote:
> We are trying to develop the updateable cursors functionality into 
> Postgresql. I have given below details of the design and also issues we are 
> facing.  Looking forward to the advice on how to proceed with these issues.
> 
> Rgds,
> Arul Shaji


Would this be something that you would hope to submit for 8.3?

Joshua D. Drake


> 
> 
> 
> 
>  
> 1. Introduction
> --------------
> This is a combined proposal and design document for adding updatable 
> (insensitive) cursor capability to the PostgreSQL database. 
> There have already been a couple of previous proposals since 2003 for 
> implementing this feature so there appears to be community interest in doing 
> so. This will enable the following constructs to be processed:
> 
> 
> UPDATE <table_name> SET value_list WHERE CURRENT OF <cursor_name>
> DELETE FROM <table_name> WHERE CURRENT OF <cursor_name>
> 
> This has the effect of users being able to update or delete specific rows of 
> a table, as defined by the row currently fetched into the cursor.
> 
> 
> 2. Overall Conceptual Design
> -----------------------------
> The design is considered from the viewpoint of progression of a command 
> through the various stages of processing, from changes to the file ‘gram.y’ 
> to implement the actual grammar changes, through to changes in the Executor 
> portion of the database architecture.
> 
> 2.1 Changes to the Grammar
> ------------------------------
> The following changes will be done to the PostgreSQL grammar:
> 
> UPDATE statement has the option ‘WHERE CURRENT OF <cursor_name>’ added
> DELETE statement has the option ‘WHERE CURRENT OF <cursor_name>’ added
> 
> The cursor_name data is held in the UpdateStmt and DeleteStmt structures and 
> contains just the name of the cursor.
> 
> The pl/pgsql grammar changes in the same manner.
> 
> The word CURRENT will be added to the ScanKeywords array in keywords.c.
> 
> 
> 2.2 Changes to Affected Data Structures
> ------------------------------------------
> The following data structures are affected by this change: 
> 
> Portal structure, QueryDesc structure, the UpdateStmt and DeleteStmt 
> structures
> 
> The Portal will contain a list of structures of relation ids and tuple ids 
> relating to the tuple held in the QueryDesc structure. There will be one 
> entry in the relation and tuple id list for each entry in the relation-list 
> of the statement below: 
> 
> DECLARE <cursor_name> [WITH HOLD] SELECT FOR UPDATE OF <relation-list> 
> 
> The QueryDesc structure will contain the relation id and the tuple id 
> relating to the tuple obtained via the FETCH command so that it can be 
> propagated back to the Portal for storage in the list described above.
> 
> The UpdateStmt and DeleteStmt structures have the cursor name added so that 
> the information is available for use in obtaining the portal structure 
> related to the cursor previously opened via the DECLARE CURSOR request.
> 
> 
> 2.3 Changes to the SQL Parser
> ------------------------------------
> At present, although the FOR UPDATE clause of the DECLARE CURSOR command has 
> been present in the grammar, it causes an error message later in the 
> processing since cursors are currently not updatable. This now needs to 
> change. The ‘FOR UPDATE’ clause has to be valid, but not the ‘FOR SHARE’ 
> clause. 
> 
> The relation names that follow the ‘FOR UPDATE’ clause will be added to the 
> rtable in the Query structure and identified by means of the rowMarks array. 
> In the case of an updatable cursor the FOR SHARE option is not allowed 
> therefore all entries in the rtable that are identified by the rowMarks array 
> must relate to tables that are FOR UPDATE.
> 
> In the UPDATE or DELETE statements the ‘WHERE CURRENT OF <cursor_name>’ 
> clause results in the cursor name being placed in the UpdateStmt or 
> DeleteStmt structure. During the processing of the functions - 
> transformDeleteStmt() and transformUpdateStmt() - the cursor name is used to 
> obtain a pointer to the related Portal structure and the tuple affected by 
> the current UPDATE or DELETE statement is extracted from the Portal, where it 
> has been placed as the result of a previous FETCH request. At this point all 
> the information for the UPDATE or DELETE statement is available so the 
> statements can be transformed into standard UPDATE or DELETE statements and 
> sent for re-write/planning/execution as usual.
> 
> 2.4 Changes to the Optimizer
> ------------------------------
> There is a need to add a TidScan node to planning UPDATE / DELETE statements 
> where the statements are ‘UPDATE / DELETE at position’. This is to enable the 
> tuple ids of the tuples in the tables relating to the query to be obtained. 
> There will need to be a new mechanism to achieve this, as at present, a Tid 
> scan is done only if there is a standard WHERE condition on update or delete 
> statements to provide Tid qualifier data.
> 
> 
> 2.5 Changes to the Executor
> -------------------------------
> There are various options that have been considered for this part of the 
> enhancement. These are described in the sections below.
> 
> We would like to hear opinions on which option is the best way to go or if 
> none of these is acceptable, any alternate ideas ?
> 
> Option 1  MVCC Via Continuous Searching of Database
> 
> The Executor is to be changed in the following ways:
> 1)    When the FETCH statement is executed the id of the resulting tuple is 
> extracted and passed back to the Portal structure to be saved to indicate the 
> cursor is currently positioned on a tuple.
> 2)    When the UPDATE or DELETE request is executed the tuple id previously 
> FETCHed is held in the QueryDesc structure so that it can be compared with 
> the tuple ids returned from the TidScan node processed prior to the actual 
> UPDATE / DELETE node in the plan. This enables a decision to be made as to 
> whether the tuple held in the cursor is visible to the UPDATE / DELETE 
> request according to the rules of concurrency. The result is that, at the 
> cost of repeatedly searching the database at each UPDATE / DELETE command, 
> the hash table is no longer required.
> This approach has the advantage that there is no hash table held in memory or 
> on disk so it will not be memory intensive but will be processing intensive. 
> 
> This is a good ‘one-off’ solution to the problem and, taken in isolation is 
> probably the best approach. However, if one considers the method(s) used in 
> other areas of PostgreSQL, it is probably not the best solution. This option 
> will probably not be used further.
> 
> Option 2  MVCC via New Snapshot
> 
> The executor can be changed by adding a new kind of snapshot that is 
> specifically used for identifying if a given tuple, retrieved from the 
> database during an update or delete statement should be visible during the 
> current transaction.
> 
> This approach requires a new kind of snapshot (this idea was used by Gavin 
> for a previous updatable cursor patch but objections were raised.)
> 
> Option 3  MVCC Via Hash Table in Memory
>  
> The executor can be changed by saving into a hash table and comparing each 
> tuple in the cursor with that set to check if the tuple should be visible.
> This approach has the advantage that it will be quick. It has the 
> disadvantage that, since the hash table will contain all the tuples of the 
> table being checked that it may use all local memory for a large table.
> 
> Option 4  MVCC Via Hash Table on Disk
>  
> When the UPDATE or DELETE request is executed the first time the Tid scan 
> database retrieval will be done first. At this time the tuple id of each row 
> in the table to be updated by the request will be available in the executor. 
> These tuple ids need to be stored in a hash table that is stored to disk, as, 
> if the table is large there could be a huge number of tuple ids. This data is 
> then available for comparison with the individual tuple to be updated or 
> deleted to check if it should be processed. The hash table will exist for the 
> duration of the transaction, from BEGIN to END (or ABORT). 
> 
> The hash table is then used to identify if the tuple should be visible during 
> the current transaction. If the tuple should be visible then the update or 
> delete proceeds as usual.
> 
> This approach has the advantage that it will use little memory but will be 
> relatively slow as the data has to be accessed from disk. 
> 
> Option 5 Store Tuple Id in Snapshot.
> 
> The Snapshot structure can be changed to include the tuple id. This enables 
> the current state of the tuple to be identified with respect to the current 
> transaction.
> The tuple id, as identified in the cursor at the point where the 
> DELETE/UPDATE statement is being processed, can use the snapshot to identify 
> if the tuple should be visible in the context of the current transaction.
> 
> 
> 2.6 Changes to the Catalog 
> ----------------------------
> The Catalog needs to reflect changes introduced by the updatable cursor 
> implementation. A boolean attribute ‘is_for_update’ is to be added to the 
> pg_cursors implementation. It will define that the cursor is for update 
> (value is FALSE) or for share (value is TRUE, the default value).
>  
> 
> 3 Design Assumptions
> ----------------------------
> The following design assumptions are made:
> 
> As PostgreSQL8.2 does not support the SENSITIVE cursor option the tuples 
> contained in a cursor can never be updated so these tuples will always appear 
> in their ‘original’ form as at the start of the transaction. This is in 
> breach of the SQL2003 Standard as described in 5WD-02-Foundation-2003-09.pdf, 
> p 810. The standard requires the updatable cursor to be declared as sensitive.
> 
> With respect to nested transactions – In PostgreSQL nested transactions are 
> implemented by defining ‘save points’ via the keyword SAVEPOINT. A ‘ROLLBACK 
> TO SAVEPOINT’ rolls back the database contents to the last savepoint in this 
> transaction or the begin statement, whichever is closer.
> 
> It is assumed that the FETCH statement is used to return only a single row 
> into the cursor with each command when the cursor is updatable.
> 
> According to the SQL2003 Standard Update and Delete statements may contain 
> only a single base table.
> 
> The DECLARE CURSOR statement is supposed to use column level locking, but 
> PostgreSQL supports only row level locking. The result of this is that the 
> column list that the standard requires ‘DECLARE <cursor_name> SELECT … FOR 
> UPDATE OF column-list’ becomes a relation (table) list.
> 
> This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 
> 003 693 481. It is confidential to the ordinary user of the email address to 
> which it was addressed and may contain copyright and/or legally privileged 
> information. No one else may read, print, store, copy or forward all or any 
> of it or its attachments. If you receive this email in error, please return 
> to sender. Thank you.
> 
> If you do not wish to receive commercial email messages from Fujitsu 
> Australia Software Technology Pty Ltd, please email [EMAIL PROTECTED]
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
> 


-- 

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org

Re: [HACKERS] Updateable cursors

Reply via email to