[
https://issues.apache.org/jira/browse/DERBY-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mamta A. Satoor updated DERBY-1482:
-----------------------------------
Attachment: derby1482_patch1_stat.txt
derby1482_patch1_diff.txt
junitUpgradeTestFailureWithPatch1.out
I am attaching a patch, derby1482_patch1_diff.txt, which is NOT READY FOR
commit yet because the required upgrade work for system table change has not
been done yet. If this patch is committed in it's current state, the older
releases databases will fail with the trunk even with hard upgrade. I plan to
work on the upgrade work soon.
I ran derbyall and junit All suite with my patch. During the junite All suite
run, I am getting lots of errors in the upgrade suite. The failures are
attached to this jira as junitUpgradeTestFailureWithPatch1.out I think this
probably is because of the new column that I have added to SYSTRIGGERS. I have
not investigated this failure enough to know for sure but I will be tackling
the upgrade changes soon.
Next, I am planning on focusing on following tasks
1)Implement Rick's suggestion of including the extra information we need about
trigger in existing column SYSTRIGGERS.REFERENCEDCOLUMNS This columns is of
type ReferencedColumnsDescriptorImpl and I can hopefully enhance that type to
include the additional information. Hopefully this approach will make it the
next task easier which is to make upgrade changes.
2)Work on the upgrade code so this change in system table does not break older
release databases when they are run in soft/hard upgrade mode with the
suggested work done through this jira.
3)Add more test cases in addition to what was added through TriggerTests.java
One thing to keep in mind is the improvement proposed by this patch will work
only for UPDATE triggers defined at row level AND they identify specific
columns on which the trigger is defined. eg
CREATE TRIGGER tr1 AFTER UPDATE OF c12 ON table1
REFERENCING OLD AS oldt NEW AS newt
FOR EACH ROW UPDATE table2 SET c24=oldt.c14;
The trigger above is a row level UPDATE trigger and it identifies column "c12"
as the triggering column. For all other kinds of triggers(INSERT, DELETE,
UPDATE trigger defined at table leve, UPDATE trigger defined at row level but
does not identify trigger columns), the improvement proposed by this patch will
not kick in. That is because all the other kinds of triggers do not identify
specific trigger columns. In the absence of specific trigger columns, Derby
decides to fetch in all the columns from the triggering table since it doesn't
know what column it might need to fetch at trigger execution time.
Following is the description of the existing behavior and changes made through
my patch. It is rather long but I wanted to provide as much information as I
can since some part of the code is little tricky.
I am going to work with following example to show what is being done by Derby
today and what changes I am suggesting.
CREATE TABLE table1 (c11 int, c12 int, c13 int, c14 int, c15 int);
INSERT INTO table1 VALUES(1,2,3,4,5);
CREATE TABLE table2 (c21 int, c22 int, c23 int, c24 int, c25 int);
INSERT INTO table2 VALUES(2,2,3,-1,5);
CREATE TRIGGER tr1 AFTER UPDATE OF c12 ON table1
REFERENCING OLD AS oldt NEW AS newt
FOR EACH ROW UPDATE table2 SET c24=oldt.c14;
In the update trigger tr1 above, the trigger column is c12 and trigger action
references old/new transition variable oldt.c14 Note that the trigger table,
table1, has 5 columns and of those only 2 columns are really needed by the
trigger tr1. The trunk code converts the trigger action UPDATE table2 SET
c24=oldt.c14 to UPDATE table2 SET
c24=org.apache.derby.iapi.db.Factory::getTriggerExecutionContext().getOldRow().getInt(4)
Note that oldt.c14 gets referenced through it's column position in triggering
table table1.
Now let's consider the triggering statement
UPDATE table1 SET c12 = -9 WHERE c11=1;
The trunk code is written to fetch all the 5 columns in trigger table table1
even though the trigger is only in need of columns c12 and c14. If in our
example, if table1 had a LOB column with large amount of data, we would fetch
that LOB column in memory too even though trigger does not need that column.
With my patch, I am keeping track of references to old/new transition variables
in trigger action. Trunk code already keeps track of trigger columns in
SYSTRIGGERS.REFERENCEDCOLUMNS. The combination of the two will tell us exactly
what columns are needed for trigger execution from the triggering table and we
fetch only those columns. For the trigger tr1, that would mean column c12(which
is the trigger column) and column c14(which is in the trigger action). Because
of this, columns c11, c13 and c15 will not be read into memory. If one or more
of those columns were LOB columns, that can result in lot of performance and
memory consumption improvement. My code will convert the trigger action UPDATE
table2 SET c24=oldt.c14
to
UPDATE table2 SET
c24=org.apache.derby.iapi.db.Factory::getTriggerExecutionContext().getOldRow().getInt(2)
Note that oldt.c14 get referenced through it's column position is what columns
are fetched and not by it's column position in the trigger table table1. In our
specific eg, we are fetching column c12 and c14 and column c14 is in the 2nd
position in that set. But when we make a note of the trigger action columns in
SYSTRIGGERS, we want to record it by it's column position in the trigger table
to know exactly what columns we are talking about. So, for our trigger eg
below, SYSTRIGGERS will look as follows
CREATE TRIGGER tr1 AFTER UPDATE OF c12 ON table1
REFERENCING OLD AS oldt NEW AS newt
FOR EACH ROW UPDATE table2 SET c24=oldt.c14;
select TRIGGERNAME, EVENT, TYPE, REFERENCEDCOLUMNS, REFCOLSTRIGGERACTION,
TRIGGERDEFINITION from sys.systriggers;
TRIGGERNAME|EVENT|TYPE|REFERENCEDCOLUMNS|REFCOLSTRIGGERACTION|TRIGGERDEFINITION
--------------------------------------------------------------------------------------------
TR1 |U |R |(2) |(4) |update table2 set
c24=oldt.c14
So, the column numbers to recognize the trigger columns and trigger action
columns, we use their position in triggering table. But to convert the trigger
action to
org.apache.derby.iapi.db.Factory::getTriggerExecutionContext().getxxxRow().getXXX(?),
for ?, we
need to use the relative column position in the columns actually fetched from
triggering table. My apologies if this sounds confusing. In one of intermediate
work on my machine, my code didn't account for this difference in column
position in table vs column position in the columns
actually fetched, and I started getting NPE because there is no column at
position 4 since we are in a need to fetch only 2 columns. This 2 different
column numbering scheme makes the code little tricky to follow but if we one
understands the eg above, I think then it becomes easier to understand the
code. I will be happy to hear if someone has ideas on may be code this
differently.
> Update triggers on tables with blob columns stream blobs into memory even
> when the blobs are not referenced/accessed.
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: DERBY-1482
> URL: https://issues.apache.org/jira/browse/DERBY-1482
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.2.1.6
> Reporter: Daniel John Debrunner
> Assignee: Mamta A. Satoor
> Priority: Minor
> Attachments: derby1482_patch1_diff.txt, derby1482_patch1_stat.txt,
> derby1482DeepCopyAfterTriggerOnLobColumn.java, derby1482Repro.java,
> derby1482ReproVersion2.java, junitUpgradeTestFailureWithPatch1.out,
> TriggerTests_ver1_diff.txt, TriggerTests_ver1_stat.txt
>
>
> Suppose I have 1) a table "t1" with blob data in it, and 2) an UPDATE trigger
> "tr1" defined on that table, where the triggered-SQL-action for "tr1" does
> NOT reference any of the blob columns in the table. [ Note that this is
> different from DERBY-438 because DERBY-438 deals with triggers that _do_
> reference the blob column(s), whereas this issue deals with triggers that do
> _not_ reference the blob columns--but I think they're related, so I'm
> creating this as subtask to 438 ]. In such a case, if the trigger is fired,
> the blob data will be streamed into memory and thus consume JVM heap, even
> though it (the blob data) is never actually referenced/accessed by the
> trigger statement.
> For example, suppose we have the following DDL:
> create table t1 (id int, status smallint, bl blob(2G));
> create table t2 (id int, updated int default 0);
> create trigger tr1 after update of status on t1 referencing new as n_row
> for each row mode db2sql update t2 set updated = updated + 1 where t2.id =
> n_row.id;
> Then if t1 and t2 both have data and we make a call to:
> update t1 set status = 3;
> the trigger tr1 will fire, which will cause the blob column in t1 to be
> streamed into memory for each row affected by the trigger. The result is
> that, if the blob data is large, we end up using a lot of JVM memory when we
> really shouldn't have to (at least, in _theory_ we shouldn't have to...).
> Ideally, Derby could figure out whether or not the blob column is referenced,
> and avoid streaming the lob into memory whenever possible (hence this is
> probably more of an "enhancement" request than a bug)...
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.