[jira] Updated: (DERBY-1482) Update triggers on tables with blob columns stream blobs into memory even when the blobs are not referenced/accessed.

Mamta A. Satoor (JIRA) Fri, 26 Mar 2010 15:06:50 -0700

     [ 
https://issues.apache.org/jira/browse/DERBY-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Mamta A. Satoor updated DERBY-1482:
-----------------------------------

    Attachment: derby1482_patch1_stat.txt
                derby1482_patch1_diff.txt
                junitUpgradeTestFailureWithPatch1.out

I am attaching a patch, derby1482_patch1_diff.txt,  which is NOT READY FOR 
commit yet because the required upgrade work for system table change has not 
been done yet. If this patch is committed in it's current state, the older 
releases databases will fail with the trunk even with hard upgrade. I plan to 
work on the upgrade work soon.

I ran derbyall and junit All suite with my patch. During the junite All suite 
run, I am getting lots of errors in the upgrade suite. The failures are 
attached to this jira as junitUpgradeTestFailureWithPatch1.out I think this 
probably is because of the new column that I have added to SYSTRIGGERS. I have 
not investigated this failure enough to know for sure but I will be tackling 
the upgrade changes soon. 

Next, I am planning on focusing on following tasks
1)Implement Rick's suggestion of including the extra information we need about 
trigger in existing column SYSTRIGGERS.REFERENCEDCOLUMNS This columns is of 
type ReferencedColumnsDescriptorImpl and I can hopefully enhance that type to 
include the additional information. Hopefully this approach will make it the 
next task easier which is to make upgrade changes.
2)Work on the upgrade code so this change in system table does not break older 
release databases when they are run in soft/hard upgrade mode with the 
suggested work done through this jira.
3)Add more test cases in addition to what was added through TriggerTests.java

One thing to keep in mind is the improvement proposed by this patch will work 
only for UPDATE triggers defined at row level AND they identify specific 
columns on which the trigger is defined. eg
        CREATE TRIGGER tr1 AFTER UPDATE OF c12 ON table1 
            REFERENCING OLD AS oldt NEW AS newt
            FOR EACH ROW UPDATE table2 SET c24=oldt.c14;
The trigger above is a row level UPDATE trigger and it identifies column "c12" 
as the triggering column. For all other kinds of triggers(INSERT, DELETE, 
UPDATE trigger defined at table leve, UPDATE trigger defined at row level but 
does not identify trigger columns), the improvement proposed by this patch will 
not kick in. That is because all the other kinds of triggers do not identify 
specific trigger columns. In the absence of specific trigger columns, Derby 
decides to fetch in all the columns from the triggering table since it doesn't 
know what column it might need to fetch at trigger execution time.

Following is the description of the existing behavior and changes made through 
my patch. It is rather long but I wanted to provide as much information as I 
can since some part of the code is little tricky.

I am going to work with following example to show what is being done by Derby 
today and what changes I am suggesting.
        CREATE TABLE table1 (c11 int, c12 int, c13 int, c14 int, c15 int);
        INSERT INTO table1 VALUES(1,2,3,4,5);
        CREATE TABLE table2 (c21 int, c22 int, c23 int, c24 int, c25 int);
        INSERT INTO table2 VALUES(2,2,3,-1,5);
        CREATE TRIGGER tr1 AFTER UPDATE OF c12 ON table1 
            REFERENCING OLD AS oldt NEW AS newt
            FOR EACH ROW UPDATE table2 SET c24=oldt.c14;
In the update trigger tr1 above, the trigger column is c12 and trigger action 
references old/new transition variable oldt.c14 Note that the trigger table, 
table1, has 5 columns and of those only 2 columns are really needed by the 
trigger tr1. The trunk code converts the trigger action UPDATE table2 SET 
c24=oldt.c14 to UPDATE table2 SET 
c24=org.apache.derby.iapi.db.Factory::getTriggerExecutionContext().getOldRow().getInt(4)
Note that oldt.c14 gets referenced through it's column position in triggering 
table table1.
Now let's consider the triggering statement
        UPDATE table1 SET c12 = -9 WHERE c11=1;
The trunk code is written to fetch all the 5 columns in trigger table table1 
even though the trigger is only in need of columns c12 and c14. If in our 
example, if table1 had a LOB column with large amount of data, we would fetch 
that LOB column in memory too even though trigger does not need that column.

With my patch, I am keeping track of references to old/new transition variables 
in trigger action. Trunk code already keeps track of trigger columns in 
SYSTRIGGERS.REFERENCEDCOLUMNS. The combination of the two will tell us exactly 
what columns are needed for trigger execution from the triggering table and we 
fetch only those columns. For the trigger tr1, that would mean column c12(which 
is the trigger column) and column c14(which is in the trigger action). Because 
of this, columns c11, c13 and c15 will not be read into memory. If one or more 
of those columns were LOB columns, that can result in lot of performance and 
memory consumption improvement. My code will convert the trigger action UPDATE 
table2 SET c24=oldt.c14 
to 
UPDATE table2 SET 
c24=org.apache.derby.iapi.db.Factory::getTriggerExecutionContext().getOldRow().getInt(2)
Note that oldt.c14 get referenced through it's column position is what columns 
are fetched and not by it's column position in the trigger table table1. In our 
specific eg, we are fetching column c12 and c14 and column c14 is in the 2nd 
position in that set. But when we make a note of the trigger action columns in 
SYSTRIGGERS, we want to record it by it's column position in the trigger table 
to know exactly what columns we are talking about. So, for our trigger eg 
below, SYSTRIGGERS will look as follows
        CREATE TRIGGER tr1 AFTER UPDATE OF c12 ON table1 
            REFERENCING OLD AS oldt NEW AS newt
            FOR EACH ROW UPDATE table2 SET c24=oldt.c14;
select TRIGGERNAME, EVENT, TYPE, REFERENCEDCOLUMNS, REFCOLSTRIGGERACTION, 
TRIGGERDEFINITION from sys.systriggers;
TRIGGERNAME|EVENT|TYPE|REFERENCEDCOLUMNS|REFCOLSTRIGGERACTION|TRIGGERDEFINITION
--------------------------------------------------------------------------------------------
TR1        |U    |R   |(2)              |(4)                 |update table2 set 
c24=oldt.c14
So, the column numbers to recognize the trigger columns and trigger action 
columns, we use their position in triggering table. But to convert the trigger 
action to 
org.apache.derby.iapi.db.Factory::getTriggerExecutionContext().getxxxRow().getXXX(?),
 for ?, we 
need to use the relative column position in the columns actually fetched from 
triggering table. My apologies if this sounds confusing. In one of intermediate 
work on my machine, my code didn't account for this difference in column 
position in table vs column position in the columns 
actually fetched, and I started getting NPE because there is no column at 
position 4 since we are in a need to fetch only 2 columns. This 2 different 
column numbering scheme makes the code little tricky to follow but if we one 
understands the eg above, I think then it becomes easier to understand the 
code. I will be happy to hear if someone has ideas on may be code this 
differently. 


> Update triggers on tables with blob columns stream blobs into memory even 
> when the blobs are not referenced/accessed.
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-1482
>                 URL: https://issues.apache.org/jira/browse/DERBY-1482
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.2.1.6
>            Reporter: Daniel John Debrunner
>            Assignee: Mamta A. Satoor
>            Priority: Minor
>         Attachments: derby1482_patch1_diff.txt, derby1482_patch1_stat.txt, 
> derby1482DeepCopyAfterTriggerOnLobColumn.java, derby1482Repro.java, 
> derby1482ReproVersion2.java, junitUpgradeTestFailureWithPatch1.out, 
> TriggerTests_ver1_diff.txt, TriggerTests_ver1_stat.txt
>
>
> Suppose I have 1) a table "t1" with blob data in it, and 2) an UPDATE trigger 
> "tr1" defined on that table, where the triggered-SQL-action for "tr1" does 
> NOT reference any of the blob columns in the table. [ Note that this is 
> different from DERBY-438 because DERBY-438 deals with triggers that _do_ 
> reference the blob column(s), whereas this issue deals with triggers that do 
> _not_ reference the blob columns--but I think they're related, so I'm 
> creating this as subtask to 438 ]. In such a case, if the trigger is fired, 
> the blob data will be streamed into memory and thus consume JVM heap, even 
> though it (the blob data) is never actually referenced/accessed by the 
> trigger statement.
> For example, suppose we have the following DDL:
>     create table t1 (id int, status smallint, bl blob(2G));
>     create table t2 (id int, updated int default 0);
>     create trigger tr1 after update of status on t1 referencing new as n_row 
> for each row mode db2sql update t2 set updated = updated + 1 where t2.id = 
> n_row.id;
> Then if t1 and t2 both have data and we make a call to:
>     update t1 set status = 3;
> the trigger tr1 will fire, which will cause the blob column in t1 to be 
> streamed into memory for each row affected by the trigger. The result is 
> that, if the blob data is large, we end up using a lot of JVM memory when we 
> really shouldn't have to (at least, in _theory_ we shouldn't have to...).
> Ideally, Derby could figure out whether or not the blob column is referenced, 
> and avoid streaming the lob into memory whenever possible (hence this is 
> probably more of an "enhancement" request than a bug)... 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-1482) Update triggers on tables with blob columns stream blobs into memory even when the blobs are not referenced/accessed.

Reply via email to