[
https://issues.apache.org/jira/browse/DERBY-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701825#action_12701825
]
Romi Ou commented on DERBY-4007:
--------------------------------
I ran into this issue too recently.
Here is my table schema:
create table TRADE_HISTORY (SEQID bigint not null unique, TRADEID varchar(16)
not null, VERSION integer, STREAM varchar(20), TIMESTAMP varchar(31) not null,
STATE char(1) default 'V', COMMENT varchar(512), CONTENT varchar(512),
unique(TRADEID,DATASTREAMREF));
I added 50k records into the table. The SEQID of the records are from 0 to
49999.
Updated the STATE of 10k records:
update TRADE_HISTORY set STATE='E' where seqid >=40000 and seqid < 50000;
Then I query the table with below sql:
ij> select count(*) from TRADE_HISTORY where STATE= 'E' and SEQID in (select
max(SEQID) from TRADE_HISTORY where seqid>=
40000 and seqid<40100 group by TRADEID) ;
1
-----------
100
1 row selected
ELAPSED TIME = 2375 milliseconds
ij> select count(*) from TRADE_HISTORY where STATE= 'E' and SEQID in (select
max(SEQID) from TRADE_HISTORY where seqid>=
40000 and seqid<40200 group by TRADEID) ;
1
-----------
200
1 row selected
ELAPSED TIME = 9875 milliseconds
ij> select count(*) from TRADE_HISTORY where STATE= 'E' and SEQID in (select
max(SEQID) from TRADE_HISTORY where seqid>=
40000 and seqid<40400 group by TRADEID) ;
1
-----------
400
1 row selected
ELAPSED TIME = 38812 milliseconds
After optimizing the query sql, the performance is much better.
ij> select count(*) from (select * from TRADE_HISTORY where ( STATE= 'E' ) as
errorTabl
e, (select max(SEQID) maxSeqID from TRADE_HISTORY where seqid>=40000 and
seqid<40400 group by TRADEID) as tempT where er
rorTable.SEQID =tempT.maxSeqID ;
1
-----------
400
1 row selected
ELAPSED TIME = 47 milliseconds
By referring to Tuning Derby Doc
(http://db.apache.org/derby/docs/10.4/tuning/tuning-single.html#rtuntransform582),
I noticed that there is a "Simple IN predicate transformations" in Derby.
I'm not quite sure about the root cause of this issue.
Is it estimated cost or the sql is transferred to another sql?
> Optimization of IN with nested SELECT
> -------------------------------------
>
> Key: DERBY-4007
> URL: https://issues.apache.org/jira/browse/DERBY-4007
> Project: Derby
> Issue Type: Bug
> Components: Performance
> Affects Versions: 10.4.2.0
> Environment: Linux
> Reporter: Mikkel Kamstrup Erlandsen
> Priority: Minor
> Attachments: dblook.log, dblook_p_index.log, derby.log,
> derby_p_index.log
>
>
> The problem is with the following query:
> UPDATE summa_records SET base='foobar' WHERE id IN ( SELECT parentId FROM
> summa_relations WHERE childId='horizon_2615441');
> It takes in the order of 30s to run when we expect something in the order of
> 1-2ms.
> We have a setup with two tables
> summa_records: 1,5M rows
> summa_relations: ~350000 rows
> summa_records have and 'id' column that is also indexed and is the primary
> key. The summa_relations table holds mappings between different ids.
> In our case the nested SELECT produces 2 hits, say, 'foo' and 'bar'. So the
> UPDATE on these two hits should be quite snappy. If we run the SELECT alone
> it runs in an instant, and also if we run with hardcoded ids for the IN
> clause:
> UPDATE summa_records SET base='foobar' WHERE id IN ('foo', 'bar');
> We have instant execution. I'll attach a query plan in a sec.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.