[jira] Commented: (DERBY-4007) Optimization of IN with nested SELECT

Romi Ou (JIRA) Wed, 22 Apr 2009 23:48:10 -0700

    [ 
https://issues.apache.org/jira/browse/DERBY-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701825#action_12701825
 ]


Romi Ou commented on DERBY-4007:
--------------------------------

I ran into this issue too recently.
Here is my table schema:
create table TRADE_HISTORY (SEQID bigint not null unique, TRADEID varchar(16) 
not null, VERSION integer, STREAM varchar(20), TIMESTAMP varchar(31) not null, 
STATE char(1) default 'V', COMMENT varchar(512), CONTENT varchar(512), 
unique(TRADEID,DATASTREAMREF));

I added 50k records into the table. The SEQID of the records are from 0 to 
49999. 
Updated the STATE of 10k records:
update TRADE_HISTORY set STATE='E' where seqid >=40000 and seqid < 50000;

Then I query the table with below sql:
ij> select count(*) from TRADE_HISTORY where STATE= 'E' and SEQID in (select 
max(SEQID) from TRADE_HISTORY where seqid>=
40000 and seqid<40100 group by TRADEID) ;
1
-----------
100

1 row selected
ELAPSED TIME = 2375 milliseconds
ij> select count(*) from TRADE_HISTORY where STATE= 'E' and SEQID in (select 
max(SEQID) from TRADE_HISTORY where seqid>=
40000 and seqid<40200 group by TRADEID) ;
1
-----------
200

1 row selected
ELAPSED TIME = 9875 milliseconds
ij> select count(*) from TRADE_HISTORY where STATE= 'E' and SEQID in (select 
max(SEQID) from TRADE_HISTORY where seqid>=
40000 and seqid<40400 group by TRADEID) ;
1
-----------
400

1 row selected
ELAPSED TIME = 38812 milliseconds

After optimizing the query sql, the performance is much better. 
ij> select count(*) from (select * from TRADE_HISTORY where ( STATE= 'E' ) as 
errorTabl
e, (select max(SEQID) maxSeqID from TRADE_HISTORY where seqid>=40000 and 
seqid<40400 group by TRADEID) as tempT where er
rorTable.SEQID =tempT.maxSeqID ;
1
-----------
400

1 row selected
ELAPSED TIME = 47 milliseconds

By referring to Tuning Derby Doc 
(http://db.apache.org/derby/docs/10.4/tuning/tuning-single.html#rtuntransform582),
 I noticed that there is a "Simple IN predicate transformations" in Derby. 
I'm not quite sure about the root cause of this issue.
Is it estimated cost or the sql is transferred to another sql?


> Optimization of IN with nested SELECT
> -------------------------------------
>
>                 Key: DERBY-4007
>                 URL: https://issues.apache.org/jira/browse/DERBY-4007
>             Project: Derby
>          Issue Type: Bug
>          Components: Performance
>    Affects Versions: 10.4.2.0
>         Environment: Linux
>            Reporter: Mikkel Kamstrup Erlandsen
>            Priority: Minor
>         Attachments: dblook.log, dblook_p_index.log, derby.log, 
> derby_p_index.log
>
>
> The problem is with the following query:
> UPDATE summa_records SET base='foobar' WHERE id IN ( SELECT parentId FROM 
> summa_relations WHERE childId='horizon_2615441');
> It takes in the order of 30s to run when we expect something in the order of 
> 1-2ms.
> We have a setup with two tables
> summa_records:  1,5M rows
> summa_relations: ~350000 rows
> summa_records have and 'id' column that is also indexed and is the primary 
> key. The summa_relations table holds mappings between different ids.
> In our case the nested SELECT produces 2 hits, say, 'foo' and 'bar'. So the 
> UPDATE on these two hits should be quite snappy. If we run the SELECT alone 
> it runs in an instant, and also if we run with hardcoded ids for the IN 
> clause:
> UPDATE summa_records SET base='foobar' WHERE id IN ('foo', 'bar');
> We have instant execution. I'll attach a query plan in a sec.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-4007) Optimization of IN with nested SELECT

Reply via email to