Re: [h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck
The patch seems good. Here are my test results: Typical query time after JVM warm-up h2 1.3.176 - 716 ms h2 1.4.185 (but not using MV_STORE) - 350 ms h2 1.4.185 (but not using MV_STORE) with patch - 243 ms Regards, Steve On Tuesday, 10 February 2015 23:36:34 UTC+6:30, Thomas Mueller wrote: Hi, I have committed my patch, could you please test it? You would need to compile H2 from the trunk, see https://code.google.com/p/support/source/checkout Regards, Thomas On Tue, Feb 10, 2015 at 7:45 AM, Thomas Mueller thomas.to...@gmail.com javascript: wrote: Hi, It might make sense to use a concurrent and global cache in the StringUtils class instead, similar to StringUtils.softCache (StringUtils.fromCacheOrNew). That way, other components could benefit from it as well, and small result sets would also benefit. I will write a patch. Regards, Thomas On Tuesday, February 10, 2015, Steve McLeod steve@gmail.com javascript: wrote: Hi Thomas and Noel, I've created a patch that caches the results of toUpperEnglish on a per JdbcResultSet basis. In my specific use case (fetching 95 columns by column name over tens of thousands of rows), the query speed-up was extremely significant, and utterly repeatable. Before: 510 milliseconds. After: 230 milliseconds. The patch is attached. I wrote the code, it's mine, and I'm contributing it to H2 for distribution multiple-licensed under the MPL 2.0, and the EPL 1.0 (http://h2database.com/html/license.html). Would either of you mind taking a look to see if you think it is worth committing? In particular, I'm concerned that my use of HashMap may be incorrect if multiple threads are sharing the result set. Regards, Steve On Monday, 9 February 2015 22:45:12 UTC+6:30, Steve McLeod wrote: Hi Noel, On Monday, 9 February 2015 19:14:32 UTC+6:30, Noel Grandin wrote: Hi On 2015-02-09 02:21 PM, Steve McLeod wrote: final ResultSet resultSet = conn.prepareStatement(SELECT * FROM foobar).executeQuery(); int rowCount = 0; while (resultSet.next()) { rowCount++; final int columnCount = resultSet.getMetaData().getColumnCount(); for (int column = 1; column = columnCount; column++) { final String columnName = resultSet.getMetaData().getColumnName(column); final int anInt = resultSet.getInt(columnName); } } You should rather be retrieving the column names once, and then retrieving the result-set columns using the getXXX(int columnIndex) methods. I agree - that would help my (contrived) case. My actual code doesn't use meta-data to get column names, but sources the column names from elsewhere. I can certainly fix the problems on my end. But I think there is an opportunity to add a tiny performance improvement in general here. Iterating over a result set, using getXXX(String columnName) is very common use of JDBC. We're not talking 10% speed improvement, but perhaps 1% in some cases? I don't know for sure - I'm just guessing based on the profiling. The only other thing I can think of that might speed it up would be to modify the current caching code to use a TreeMap with a custom comparator and then set the comparator to java.lang.String.CASE_INSENSITIVE_ORDER. That would avoid the extra String object creation, at the very least. Sounds good. Will java.lang.String.CASE_INSENSITIVE_ORDER be satisfactory? The Javadocs state: Note that this Comparator does not take locale into account, and will result in an unsatisfactory ordering for certain locales. I think that means it is acceptable for H2's case, but I'm not certain. Regards, Steve -- You received this message because you are subscribed to the Google Groups H2 Database group. To unsubscribe from this group and stop receiving emails from it, send an email to h2-database+unsubscr...@googlegroups.com. To post to this group, send email to h2-database@googlegroups.com. Visit this group at http://groups.google.com/group/h2-database. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups H2 Database group. To unsubscribe from this group and stop receiving emails from it, send an email to h2-database+unsubscr...@googlegroups.com. To post to this group, send email to h2-database@googlegroups.com. Visit this group at http://groups.google.com/group/h2-database. For more options, visit https://groups.google.com/d/optout.
Re: [h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck
Hi, I have committed my patch, could you please test it? You would need to compile H2 from the trunk, see https://code.google.com/p/support/source/checkout Regards, Thomas On Tue, Feb 10, 2015 at 7:45 AM, Thomas Mueller thomas.tom.muel...@gmail.com wrote: Hi, It might make sense to use a concurrent and global cache in the StringUtils class instead, similar to StringUtils.softCache (StringUtils.fromCacheOrNew). That way, other components could benefit from it as well, and small result sets would also benefit. I will write a patch. Regards, Thomas On Tuesday, February 10, 2015, Steve McLeod steve.mcl...@gmail.com wrote: Hi Thomas and Noel, I've created a patch that caches the results of toUpperEnglish on a per JdbcResultSet basis. In my specific use case (fetching 95 columns by column name over tens of thousands of rows), the query speed-up was extremely significant, and utterly repeatable. Before: 510 milliseconds. After: 230 milliseconds. The patch is attached. I wrote the code, it's mine, and I'm contributing it to H2 for distribution multiple-licensed under the MPL 2.0, and the EPL 1.0 (http://h2database.com/html/license.html). Would either of you mind taking a look to see if you think it is worth committing? In particular, I'm concerned that my use of HashMap may be incorrect if multiple threads are sharing the result set. Regards, Steve On Monday, 9 February 2015 22:45:12 UTC+6:30, Steve McLeod wrote: Hi Noel, On Monday, 9 February 2015 19:14:32 UTC+6:30, Noel Grandin wrote: Hi On 2015-02-09 02:21 PM, Steve McLeod wrote: final ResultSet resultSet = conn.prepareStatement(SELECT * FROM foobar).executeQuery(); int rowCount = 0; while (resultSet.next()) { rowCount++; final int columnCount = resultSet.getMetaData().getColumnCount(); for (int column = 1; column = columnCount; column++) { final String columnName = resultSet.getMetaData().getColumnName(column); final int anInt = resultSet.getInt(columnName); } } You should rather be retrieving the column names once, and then retrieving the result-set columns using the getXXX(int columnIndex) methods. I agree - that would help my (contrived) case. My actual code doesn't use meta-data to get column names, but sources the column names from elsewhere. I can certainly fix the problems on my end. But I think there is an opportunity to add a tiny performance improvement in general here. Iterating over a result set, using getXXX(String columnName) is very common use of JDBC. We're not talking 10% speed improvement, but perhaps 1% in some cases? I don't know for sure - I'm just guessing based on the profiling. The only other thing I can think of that might speed it up would be to modify the current caching code to use a TreeMap with a custom comparator and then set the comparator to java.lang.String.CASE_INSENSITIVE_ORDER. That would avoid the extra String object creation, at the very least. Sounds good. Will java.lang.String.CASE_INSENSITIVE_ORDER be satisfactory? The Javadocs state: Note that this Comparator does not take locale into account, and will result in an unsatisfactory ordering for certain locales. I think that means it is acceptable for H2's case, but I'm not certain. Regards, Steve -- You received this message because you are subscribed to the Google Groups H2 Database group. To unsubscribe from this group and stop receiving emails from it, send an email to h2-database+unsubscr...@googlegroups.com. To post to this group, send email to h2-database@googlegroups.com. Visit this group at http://groups.google.com/group/h2-database. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups H2 Database group. To unsubscribe from this group and stop receiving emails from it, send an email to h2-database+unsubscr...@googlegroups.com. To post to this group, send email to h2-database@googlegroups.com. Visit this group at http://groups.google.com/group/h2-database. For more options, visit https://groups.google.com/d/optout.
Re: [h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck
Hi Noel, On Monday, 9 February 2015 19:14:32 UTC+6:30, Noel Grandin wrote: Hi On 2015-02-09 02:21 PM, Steve McLeod wrote: final ResultSet resultSet = conn.prepareStatement(SELECT * FROM foobar).executeQuery(); int rowCount = 0; while (resultSet.next()) { rowCount++; final int columnCount = resultSet.getMetaData().getColumnCount(); for (int column = 1; column = columnCount; column++) { final String columnName = resultSet.getMetaData().getColumnName(column); final int anInt = resultSet.getInt(columnName); } } You should rather be retrieving the column names once, and then retrieving the result-set columns using the getXXX(int columnIndex) methods. I agree - that would help my (contrived) case. My actual code doesn't use meta-data to get column names, but sources the column names from elsewhere. I can certainly fix the problems on my end. But I think there is an opportunity to add a tiny performance improvement in general here. Iterating over a result set, using getXXX(String columnName) is very common use of JDBC. We're not talking 10% speed improvement, but perhaps 1% in some cases? I don't know for sure - I'm just guessing based on the profiling. The only other thing I can think of that might speed it up would be to modify the current caching code to use a TreeMap with a custom comparator and then set the comparator to java.lang.String.CASE_INSENSITIVE_ORDER. That would avoid the extra String object creation, at the very least. Sounds good. Will java.lang.String.CASE_INSENSITIVE_ORDER be satisfactory? The Javadocs state: Note that this Comparator does not take locale into account, and will result in an unsatisfactory ordering for certain locales. I think that means it is acceptable for H2's case, but I'm not certain. Regards, Steve -- You received this message because you are subscribed to the Google Groups H2 Database group. To unsubscribe from this group and stop receiving emails from it, send an email to h2-database+unsubscr...@googlegroups.com. To post to this group, send email to h2-database@googlegroups.com. Visit this group at http://groups.google.com/group/h2-database. For more options, visit https://groups.google.com/d/optout.
Re: [h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck
Hi Thomas and Noel, I've created a patch that caches the results of toUpperEnglish on a per JdbcResultSet basis. In my specific use case (fetching 95 columns by column name over tens of thousands of rows), the query speed-up was extremely significant, and utterly repeatable. Before: 510 milliseconds. After: 230 milliseconds. The patch is attached. I wrote the code, it's mine, and I'm contributing it to H2 for distribution multiple-licensed under the MPL 2.0, and the EPL 1.0 (http://h2database.com/html/license.html). Would either of you mind taking a look to see if you think it is worth committing? In particular, I'm concerned that my use of HashMap may be incorrect if multiple threads are sharing the result set. Regards, Steve On Monday, 9 February 2015 22:45:12 UTC+6:30, Steve McLeod wrote: Hi Noel, On Monday, 9 February 2015 19:14:32 UTC+6:30, Noel Grandin wrote: Hi On 2015-02-09 02:21 PM, Steve McLeod wrote: final ResultSet resultSet = conn.prepareStatement(SELECT * FROM foobar).executeQuery(); int rowCount = 0; while (resultSet.next()) { rowCount++; final int columnCount = resultSet.getMetaData().getColumnCount(); for (int column = 1; column = columnCount; column++) { final String columnName = resultSet.getMetaData().getColumnName(column); final int anInt = resultSet.getInt(columnName); } } You should rather be retrieving the column names once, and then retrieving the result-set columns using the getXXX(int columnIndex) methods. I agree - that would help my (contrived) case. My actual code doesn't use meta-data to get column names, but sources the column names from elsewhere. I can certainly fix the problems on my end. But I think there is an opportunity to add a tiny performance improvement in general here. Iterating over a result set, using getXXX(String columnName) is very common use of JDBC. We're not talking 10% speed improvement, but perhaps 1% in some cases? I don't know for sure - I'm just guessing based on the profiling. The only other thing I can think of that might speed it up would be to modify the current caching code to use a TreeMap with a custom comparator and then set the comparator to java.lang.String.CASE_INSENSITIVE_ORDER. That would avoid the extra String object creation, at the very least. Sounds good. Will java.lang.String.CASE_INSENSITIVE_ORDER be satisfactory? The Javadocs state: Note that this Comparator does not take locale into account, and will result in an unsatisfactory ordering for certain locales. I think that means it is acceptable for H2's case, but I'm not certain. Regards, Steve -- You received this message because you are subscribed to the Google Groups H2 Database group. To unsubscribe from this group and stop receiving emails from it, send an email to h2-database+unsubscr...@googlegroups.com. To post to this group, send email to h2-database@googlegroups.com. Visit this group at http://groups.google.com/group/h2-database. For more options, visit https://groups.google.com/d/optout. Index: h2/src/main/org/h2/jdbc/JdbcResultSet.java IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP +UTF-8 === --- h2/src/main/org/h2/jdbc/JdbcResultSet.java (revision 6033) +++ h2/src/main/org/h2/jdbc/JdbcResultSet.java (revision ) @@ -83,9 +83,11 @@ private Value[] insertRow; private Value[] updateRow; private HashMapString, Integer columnLabelMap; +private final MapString, String toUpperMap = New.hashMap(); private HashMapInteger, Value[] patchedRows; private JdbcPreparedStatement preparedStatement; + JdbcResultSet(JdbcConnection conn, JdbcStatement stat, ResultInterface result, int id, boolean closeStatement, boolean scrollable, boolean updatable) { @@ -3115,7 +3117,7 @@ preparedStatement.setCachedColumnLabelMap(columnLabelMap); } } -Integer index = columnLabelMap.get(StringUtils.toUpperEnglish(columnLabel)); +Integer index = columnLabelMap.get(toUpperEnglishCached(columnLabel)); if (index == null) { throw DbException.get(ErrorCode.COLUMN_NOT_FOUND_1, columnLabel); } @@ -3144,6 +3146,21 @@ } } throw DbException.get(ErrorCode.COLUMN_NOT_FOUND_1, columnLabel); +} + +/** + * This isn't thread-safe. Should we use a concurrent map instead? Or will an instance JdbcResultSet always be accessed from only one thread? + * @param columnLabel column name that can be lower, upper, or mixed case + * @return the column label in upper case according to the english locale, obtained from a cache if possible + */ +private String
Re: [h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck
Hi On 2015-02-09 02:21 PM, Steve McLeod wrote: final ResultSet resultSet = conn.prepareStatement(SELECT * FROM foobar).executeQuery(); int rowCount = 0; while (resultSet.next()) { rowCount++; final int columnCount = resultSet.getMetaData().getColumnCount(); for (int column = 1; column = columnCount; column++) { final String columnName = resultSet.getMetaData().getColumnName(column); final int anInt = resultSet.getInt(columnName); } } You should rather be retrieving the column names once, and then retrieving the result-set columns using the getXXX(int columnIndex) methods. The only other thing I can think of that might speed it up would be to modify the current caching code to use a TreeMap with a custom comparator and then set the comparator to java.lang.String.CASE_INSENSITIVE_ORDER. That would avoid the extra String object creation, at the very least. Regards, Noel. -- You received this message because you are subscribed to the Google Groups H2 Database group. To unsubscribe from this group and stop receiving emails from it, send an email to h2-database+unsubscr...@googlegroups.com. To post to this group, send email to h2-database@googlegroups.com. Visit this group at http://groups.google.com/group/h2-database. For more options, visit https://groups.google.com/d/optout.
[h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck
Hi all, I've got some H2 code that executes a query that returns possibly 100,000+ rows and 95 columns. I need to retrieve every column value by name, using JDBC. Using H2's built-in profiling tool, StringUtils.toUpperEnglish seems to be a bottleneck. It seems that JdbcResultSet.getColumnIndex(String columnLabel) is part of the problem. I can see a couple of ways I could improve this in H2: * caching the conversion from column-name to uppercase column name in JdbcResultSet.getColumnIndex(String columnLabel), on the assumption that if code calls, eg, rs.getInt(foobar) once it is rather likely to call it again. * make StringUtils.toUpperEnglish cache recent invocations, on the assumption that the same strings within H2 probably get converted to upper case frequently. * rework the logic in JdbcResultSet.getColumnIndex(String columnLabel) a little - there is already some code that tries to minimize the toUpperEnglish invocations, but I think it could be made smarter. * I also suspect there is a Turkish-locale bug in in the case where the column count is less than 3, but I'd want to test before stating this incontrovertibly. Of course, I could also implement a work-around in my code, but here is perhaps a chance for a tiny but worthwhile performance improvement for all users of H2. Thoughts? I'm using H2-1.3.176, but I've checked SVN and the relevant code seems unchanged at the present. Regards, Steve Contrived code to reproduce the problem follows: import org.h2.util.Profiler; import javax.swing.*; import java.io.IOException; import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.SQLException; public class ScratchSpace extends JFrame { public static void main(String[] args) throws SQLException, ClassNotFoundException, IOException { Class.forName(org.h2.Driver); Connection conn = DriverManager.getConnection(jdbc:h2:~/test); conn.prepareStatement(DROP TABLE IF EXISTS foobar).execute(); // create 100 column names, each of the form AA, AB, AC...etc. System.out.println(Creating table with 100 columns); String sql = create table foobar (; for (int i = 0; i 100; i++) { String col = getColumnName(i); if (i 0) { sql += , ; } sql += col + INT; } sql += ); System.out.println(sql = + sql); conn.prepareStatement(sql).execute(); final int MAX_ROWS = 10; System.out.println(Inserting + MAX_ROWS + rows); for (int j = 0; j MAX_ROWS; j++) { conn.prepareStatement(INSERT INTO foobar (Column_AA) VALUES (1)).execute(); } Profiler prof = new Profiler(); prof.startCollecting(); final ResultSet resultSet = conn.prepareStatement(SELECT * FROM foobar).executeQuery(); int rowCount = 0; while (resultSet.next()) { rowCount++; final int columnCount = resultSet.getMetaData().getColumnCount(); for (int column = 1; column = columnCount; column++) { final String columnName = resultSet.getMetaData().getColumnName(column); final int anInt = resultSet.getInt(columnName); } } System.out.println(Done); prof.stopCollecting(); System.out.println(prof.getTop(3)); conn.close(); } private static String getColumnName(int i) { return Column_ + (char) ((65 + i / 26)) + (char) (65 + i % 26); } } Profiler output: Profiler: top 3 stack trace(s) of of 2219 ms of 825 thread dumps: 108/825 (13%): at org.h2.util.StringUtils.toUpperEnglish(StringUtils.java:93) at org.h2.jdbc.JdbcResultSet.getColumnIndex(JdbcResultSet.java:3120) at org.h2.jdbc.JdbcResultSet.get(JdbcResultSet.java:3210) at org.h2.jdbc.JdbcResultSet.getInt(JdbcResultSet.java:340) at com.barbarysoftware.pokercopilot.ScratchSpace.main(ScratchSpace.java:52) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134) 72/825 (8%): at org.h2.store.fs.FileDisk.read(FilePathDisk.java:451) at org.h2.store.fs.FileUtils.readFully(FileUtils.java:345) at org.h2.store.FileStore.readFully(FileStore.java:274) at org.h2.result.ResultDiskBuffer.readRow(ResultDiskBuffer.java:198) at org.h2.result.ResultDiskBuffer.nextUnsorted(ResultDiskBuffer.java:221) at org.h2.result.ResultDiskBuffer.next(ResultDiskBuffer.java:214) at org.h2.result.LocalResult.next(LocalResult.java:234) at org.h2.jdbc.JdbcResultSet.nextRow(JdbcResultSet.java:3233) at org.h2.jdbc.JdbcResultSet.next(JdbcResultSet.java:124) at com.barbarysoftware.pokercopilot.ScratchSpace.main(ScratchSpace.java:47) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134) 65/825 (7%): at org.h2.engine.Session.isReconnectNeeded(Session.java:1336) at org.h2.jdbc.JdbcConnection.checkClosed(JdbcConnection.java:1470) at