Hi,

When using H2 for our tests I noticed that adding an IN condition into
the WHERE clause slows down the queries.

To check this in isolation from our application I've written this
simple class. This is not at all a comprehensive benchmark but I think
you may find my results interesting.


package test;

import static java.lang.System.currentTimeMillis;
import static java.lang.System.out;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.Random;

/**
 * @author [EMAIL PROTECTED] on 29 Aug 2008
 */
public class SearchByPKPerformance {

        public static void main(String[] args) throws SQLException {

                DriverManager.registerDriver(new org.h2.Driver());
                DriverManager.registerDriver(new com.mysql.jdbc.Driver());

                String h2url = "jdbc:h2:mem:test";
                String mysqlUrl = "jdbc:mysql://localhost:3306/test?
user=test&password=test";

                for (String url : Arrays.asList(h2url, mysqlUrl)) {

                        out.println("Testing " + url);

                        long start = System.currentTimeMillis();

                        Connection conn = DriverManager.getConnection(url);
                        Statement stmt = conn.createStatement();

                        Random rnd = new Random(1234);

                        stmt.execute("DROP TABLE IF EXISTS t");
                        stmt.execute("CREATE TABLE t (pk INT, PRIMARY KEY 
(pk))");

                        for (int i = 0; i < 10000; i++)
                                stmt.execute("INSERT INTO t (pk) VALUES (" + 
rnd.nextInt() + ")");

                        out.println("Initialization time: " + 
(currentTimeMillis() - start)
+ "ms");

                        start = currentTimeMillis();
                        for (int i = 0; i < 1000; i++)
                                stmt.executeQuery("SELECT pk FROM t WHERE pk = 
" + rnd.nextInt());
                        out.println("A. WHERE pk=?          [" + 
(currentTimeMillis() -
start) + "ms]");

                        start = currentTimeMillis();
                        for (int i = 0; i < 1000; i++)
                                stmt.executeQuery("SELECT pk FROM t WHERE pk IN 
(" + rnd.nextInt()
+ ", "
                                                + rnd.nextInt() + ")");
                        out.println("B. WHERE pk IN (?, ?)  [" + 
(currentTimeMillis() -
start) + "ms]");

                        start = currentTimeMillis();
                        for (int i = 0; i < 1000; i++)
                                stmt.executeQuery("SELECT pk FROM t WHERE pk = 
" + rnd.nextInt() +
" OR pk = "
                                                + rnd.nextInt());
                        out.println("C. WHERE pk=? OR pk=?  [" + 
(currentTimeMillis() -
start) + "ms]");

                        start = currentTimeMillis();
                        for (int i = 0; i < 1000; i++)
                                stmt.executeQuery("SELECT pk FROM t WHERE pk = 
" + rnd.nextInt() +
" AND pk = "
                                                + rnd.nextInt());
                        out.println("D. WHERE pk=? AND pk=? [" + 
(currentTimeMillis() -
start) + "ms]");

                        start = currentTimeMillis();
                        for (int i = 0; i < 1000; i++)
                                stmt.executeQuery("SELECT pk FROM t WHERE pk = 
" + rnd.nextInt() +
" AND pk > "
                                                + rnd.nextInt());
                        out.println("E. WHERE pk=? AND pk>? [" + 
(currentTimeMillis() -
start) + "ms]");

                }

        }

}



When run this code creates a one column table with an integer PK. It
is filled in with 10K records. Then I execute a few types of queries
containing a PK condition that limits number of returned rows to at
most 2.

The results from my laptop (Sun JDK 1.6, H2 1.0.76):


Testing jdbc:h2:mem:test
Initialization time: 469ms
A. WHERE pk=?          [109ms]
B. WHERE pk IN (?, ?)  [1000ms]
C. WHERE pk=? OR pk=?  [3188ms]
D. WHERE pk=? AND pk=? [31ms]
E. WHERE pk=? AND pk>? [531ms]
Testing jdbc:mysql://localhost:3306/test?user=test&password=test
Initialization time: 8985ms
A. WHERE pk=?          [219ms]
B. WHERE pk IN (?, ?)  [156ms]
C. WHERE pk=? OR pk=?  [140ms]
D. WHERE pk=? AND pk=? [110ms]
E. WHERE pk=? AND pk>? [125ms]


"A" shows that simple equality condition on the PK works fast. In "B"
placing an IN condition makes query 10 times slower. I would expect
slowing down by the factor of 2. "C" uses an OR condition equivalent
to the one used in "B" but it is 3 times slower. "D" doesn't trick the
H2 optimizer but "E" does because for "E" it would be sufficient to
evaluate the GREATER condition only for one row.

As a comparison I've done the same tests on the "real" database -
MySQL. MySQL is more predictable regarding these kind of queries. In
general, in these tests the SELECTs are 5 times faster on MySQL then
on H2. The cost of this is much longer INSERT time on MySQL.

I wish H2 was improved in this area because in our persistence
framework we use this type of queries quite commonly. However, H2 is
still a great tool for our tests.


Br,
Karol

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To post to this group, send email to h2-database@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/h2-database?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to