subject:"\[h2\] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck"

Re: [h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck

2015-02-11 Thread Steve McLeod

The patch seems good. Here are my test results:

Typical query time after JVM warm-up

h2 1.3.176 - 716 ms
h2 1.4.185 (but not using MV_STORE) - 350 ms
h2 1.4.185 (but not using MV_STORE) with patch - 243 ms

Regards,

Steve


On Tuesday, 10 February 2015 23:36:34 UTC+6:30, Thomas Mueller wrote:

 Hi,

 I have committed my patch, could you please test it? You would need to 
 compile H2 from the trunk, see 
 https://code.google.com/p/support/source/checkout

 Regards,
 Thomas


 On Tue, Feb 10, 2015 at 7:45 AM, Thomas Mueller thomas.to...@gmail.com 
 javascript: wrote:

 Hi,

 It might make sense to use a concurrent and global cache in the 
 StringUtils class instead, similar to StringUtils.softCache 
 (StringUtils.fromCacheOrNew). That way, other components could benefit from 
 it as well, and small result sets would also benefit.

 I will write a patch.

 Regards,
 Thomas




 On Tuesday, February 10, 2015, Steve McLeod steve@gmail.com 
 javascript: wrote:

 Hi Thomas and Noel,

 I've created a patch that caches the results of toUpperEnglish on a per 
 JdbcResultSet basis. In my specific use case (fetching 95 columns by column 
 name over tens of thousands of rows), the query speed-up was extremely 
 significant, and utterly repeatable. Before: 510 milliseconds. After: 230 
 milliseconds.

 The patch is attached. I wrote the code, it's mine, and I'm 
 contributing it to H2 for distribution multiple-licensed under the MPL 2.0, 
 and the EPL 1.0 (http://h2database.com/html/license.html).

 Would either of you mind taking a look to see if you think it is worth 
 committing? In particular, I'm concerned that my use of HashMap may be 
 incorrect if multiple threads are sharing the result set. 

 Regards,

 Steve




 On Monday, 9 February 2015 22:45:12 UTC+6:30, Steve McLeod wrote:

 Hi Noel,



 On Monday, 9 February 2015 19:14:32 UTC+6:30, Noel Grandin wrote:

 Hi 


 On 2015-02-09 02:21 PM, Steve McLeod wrote: 
   final ResultSet resultSet = conn.prepareStatement(SELECT * 
 FROM foobar).executeQuery(); 
   int rowCount = 0; 
   while (resultSet.next()) { 
   rowCount++; 
   final int columnCount = 
  resultSet.getMetaData().getColumnCount(); 

   for (int column = 1; column = columnCount; column++) { 
   final String columnName = 
  resultSet.getMetaData().getColumnName(column); 

   final int anInt = resultSet.getInt(columnName); 
   } 
   } 

 You should rather be retrieving the column names once, and then 
 retrieving the result-set columns using the getXXX(int 
 columnIndex) methods. 


 I agree - that would help my (contrived) case. My actual code doesn't 
 use meta-data to get column names, but sources the column names from 
 elsewhere. I can certainly fix the problems on my end.

 But I think there is an opportunity to add a tiny performance 
 improvement in general here. Iterating over a result set, using 
 getXXX(String columnName) is very common use of JDBC. We're not talking 
 10% 
 speed improvement, but perhaps 1% in some cases? I don't know for sure - 
 I'm just guessing based on the profiling.
  

 The only other thing I can think of that might speed it up would be to 
 modify the current caching code to use a TreeMap 
 with a custom comparator and then set the comparator to 
 java.lang.String.CASE_INSENSITIVE_ORDER. 
 That would avoid the extra String object creation, at the very least. 


 Sounds good. Will java.lang.String.CASE_INSENSITIVE_ORDER be 
 satisfactory? The Javadocs state: Note that this Comparator does not take 
 locale into account, and will result in an unsatisfactory ordering for 
 certain locales. I think that means it is acceptable for H2's case, but 
 I'm not certain.


 Regards,

 Steve

  -- 
 You received this message because you are subscribed to the Google 
 Groups H2 Database group.
 To unsubscribe from this group and stop receiving emails from it, send 
 an email to h2-database+unsubscr...@googlegroups.com.
 To post to this group, send email to h2-database@googlegroups.com.
 Visit this group at http://groups.google.com/group/h2-database.
 For more options, visit https://groups.google.com/d/optout.




-- 
You received this message because you are subscribed to the Google Groups H2 
Database group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to h2-database+unsubscr...@googlegroups.com.
To post to this group, send email to h2-database@googlegroups.com.
Visit this group at http://groups.google.com/group/h2-database.
For more options, visit https://groups.google.com/d/optout.

Re: [h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck

2015-02-10 Thread Thomas Mueller

Hi,

I have committed my patch, could you please test it? You would need to
compile H2 from the trunk, see
https://code.google.com/p/support/source/checkout

Regards,
Thomas


On Tue, Feb 10, 2015 at 7:45 AM, Thomas Mueller 
thomas.tom.muel...@gmail.com wrote:

 Hi,

 It might make sense to use a concurrent and global cache in the
 StringUtils class instead, similar to StringUtils.softCache
 (StringUtils.fromCacheOrNew). That way, other components could benefit from
 it as well, and small result sets would also benefit.

 I will write a patch.

 Regards,
 Thomas




 On Tuesday, February 10, 2015, Steve McLeod steve.mcl...@gmail.com
 wrote:

 Hi Thomas and Noel,

 I've created a patch that caches the results of toUpperEnglish on a per
 JdbcResultSet basis. In my specific use case (fetching 95 columns by column
 name over tens of thousands of rows), the query speed-up was extremely
 significant, and utterly repeatable. Before: 510 milliseconds. After: 230
 milliseconds.

 The patch is attached. I wrote the code, it's mine, and I'm contributing
 it to H2 for distribution multiple-licensed under the MPL 2.0, and the EPL
 1.0 (http://h2database.com/html/license.html).

 Would either of you mind taking a look to see if you think it is worth
 committing? In particular, I'm concerned that my use of HashMap may be
 incorrect if multiple threads are sharing the result set.

 Regards,

 Steve




 On Monday, 9 February 2015 22:45:12 UTC+6:30, Steve McLeod wrote:

 Hi Noel,



 On Monday, 9 February 2015 19:14:32 UTC+6:30, Noel Grandin wrote:

 Hi


 On 2015-02-09 02:21 PM, Steve McLeod wrote:
   final ResultSet resultSet = conn.prepareStatement(SELECT *
 FROM foobar).executeQuery();
   int rowCount = 0;
   while (resultSet.next()) {
   rowCount++;
   final int columnCount = 
  resultSet.getMetaData().getColumnCount();

   for (int column = 1; column = columnCount; column++) {
   final String columnName = 
  resultSet.getMetaData().getColumnName(column);

   final int anInt = resultSet.getInt(columnName);
   }
   }

 You should rather be retrieving the column names once, and then
 retrieving the result-set columns using the getXXX(int
 columnIndex) methods.


 I agree - that would help my (contrived) case. My actual code doesn't
 use meta-data to get column names, but sources the column names from
 elsewhere. I can certainly fix the problems on my end.

 But I think there is an opportunity to add a tiny performance
 improvement in general here. Iterating over a result set, using
 getXXX(String columnName) is very common use of JDBC. We're not talking 10%
 speed improvement, but perhaps 1% in some cases? I don't know for sure -
 I'm just guessing based on the profiling.


 The only other thing I can think of that might speed it up would be to
 modify the current caching code to use a TreeMap
 with a custom comparator and then set the comparator to
 java.lang.String.CASE_INSENSITIVE_ORDER.
 That would avoid the extra String object creation, at the very least.


 Sounds good. Will java.lang.String.CASE_INSENSITIVE_ORDER be
 satisfactory? The Javadocs state: Note that this Comparator does not take
 locale into account, and will result in an unsatisfactory ordering for
 certain locales. I think that means it is acceptable for H2's case, but
 I'm not certain.


 Regards,

 Steve

  --
 You received this message because you are subscribed to the Google Groups
 H2 Database group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to h2-database+unsubscr...@googlegroups.com.
 To post to this group, send email to h2-database@googlegroups.com.
 Visit this group at http://groups.google.com/group/h2-database.
 For more options, visit https://groups.google.com/d/optout.



-- 
You received this message because you are subscribed to the Google Groups H2 
Database group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to h2-database+unsubscr...@googlegroups.com.
To post to this group, send email to h2-database@googlegroups.com.
Visit this group at http://groups.google.com/group/h2-database.
For more options, visit https://groups.google.com/d/optout.

Re: [h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck

2015-02-09 Thread Steve McLeod

Hi Noel,



On Monday, 9 February 2015 19:14:32 UTC+6:30, Noel Grandin wrote:

 Hi 


 On 2015-02-09 02:21 PM, Steve McLeod wrote: 
   final ResultSet resultSet = conn.prepareStatement(SELECT * 
 FROM foobar).executeQuery(); 
   int rowCount = 0; 
   while (resultSet.next()) { 
   rowCount++; 
   final int columnCount = 
 resultSet.getMetaData().getColumnCount(); 
   for (int column = 1; column = columnCount; column++) { 
   final String columnName = 
 resultSet.getMetaData().getColumnName(column); 
   final int anInt = resultSet.getInt(columnName); 
   } 
   } 

 You should rather be retrieving the column names once, and then retrieving 
 the result-set columns using the getXXX(int 
 columnIndex) methods. 


I agree - that would help my (contrived) case. My actual code doesn't use 
meta-data to get column names, but sources the column names from elsewhere. 
I can certainly fix the problems on my end.

But I think there is an opportunity to add a tiny performance improvement 
in general here. Iterating over a result set, using getXXX(String 
columnName) is very common use of JDBC. We're not talking 10% speed 
improvement, but perhaps 1% in some cases? I don't know for sure - I'm just 
guessing based on the profiling.
 

 The only other thing I can think of that might speed it up would be to 
 modify the current caching code to use a TreeMap 
 with a custom comparator and then set the comparator to 
 java.lang.String.CASE_INSENSITIVE_ORDER. 
 That would avoid the extra String object creation, at the very least. 


Sounds good. Will java.lang.String.CASE_INSENSITIVE_ORDER be satisfactory? 
The Javadocs state: Note that this Comparator does not take locale into 
account, and will result in an unsatisfactory ordering for certain 
locales. I think that means it is acceptable for H2's case, but I'm not 
certain.


Regards,

Steve

-- 
You received this message because you are subscribed to the Google Groups H2 
Database group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to h2-database+unsubscr...@googlegroups.com.
To post to this group, send email to h2-database@googlegroups.com.
Visit this group at http://groups.google.com/group/h2-database.
For more options, visit https://groups.google.com/d/optout.

Re: [h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck

2015-02-09 Thread Steve McLeod

Hi Thomas and Noel,

I've created a patch that caches the results of toUpperEnglish on a per 
JdbcResultSet basis. In my specific use case (fetching 95 columns by column 
name over tens of thousands of rows), the query speed-up was extremely 
significant, and utterly repeatable. Before: 510 milliseconds. After: 230 
milliseconds.

The patch is attached. I wrote the code, it's mine, and I'm contributing it 
to H2 for distribution multiple-licensed under the MPL 2.0, and the EPL 1.0 
(http://h2database.com/html/license.html).

Would either of you mind taking a look to see if you think it is worth 
committing? In particular, I'm concerned that my use of HashMap may be 
incorrect if multiple threads are sharing the result set. 

Regards,

Steve




On Monday, 9 February 2015 22:45:12 UTC+6:30, Steve McLeod wrote:

 Hi Noel,



 On Monday, 9 February 2015 19:14:32 UTC+6:30, Noel Grandin wrote:

 Hi 


 On 2015-02-09 02:21 PM, Steve McLeod wrote: 
   final ResultSet resultSet = conn.prepareStatement(SELECT * 
 FROM foobar).executeQuery(); 
   int rowCount = 0; 
   while (resultSet.next()) { 
   rowCount++; 
   final int columnCount = 
 resultSet.getMetaData().getColumnCount(); 
   for (int column = 1; column = columnCount; column++) { 
   final String columnName = 
 resultSet.getMetaData().getColumnName(column); 
   final int anInt = resultSet.getInt(columnName); 
   } 
   } 

 You should rather be retrieving the column names once, and then 
 retrieving the result-set columns using the getXXX(int 
 columnIndex) methods. 


 I agree - that would help my (contrived) case. My actual code doesn't use 
 meta-data to get column names, but sources the column names from elsewhere. 
 I can certainly fix the problems on my end.

 But I think there is an opportunity to add a tiny performance improvement 
 in general here. Iterating over a result set, using getXXX(String 
 columnName) is very common use of JDBC. We're not talking 10% speed 
 improvement, but perhaps 1% in some cases? I don't know for sure - I'm just 
 guessing based on the profiling.
  

 The only other thing I can think of that might speed it up would be to 
 modify the current caching code to use a TreeMap 
 with a custom comparator and then set the comparator to 
 java.lang.String.CASE_INSENSITIVE_ORDER. 
 That would avoid the extra String object creation, at the very least. 


 Sounds good. Will java.lang.String.CASE_INSENSITIVE_ORDER be satisfactory? 
 The Javadocs state: Note that this Comparator does not take locale into 
 account, and will result in an unsatisfactory ordering for certain 
 locales. I think that means it is acceptable for H2's case, but I'm not 
 certain.


 Regards,

 Steve



-- 
You received this message because you are subscribed to the Google Groups H2 
Database group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to h2-database+unsubscr...@googlegroups.com.
To post to this group, send email to h2-database@googlegroups.com.
Visit this group at http://groups.google.com/group/h2-database.
For more options, visit https://groups.google.com/d/optout.
Index: h2/src/main/org/h2/jdbc/JdbcResultSet.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
+UTF-8
===
--- h2/src/main/org/h2/jdbc/JdbcResultSet.java	(revision 6033)
+++ h2/src/main/org/h2/jdbc/JdbcResultSet.java	(revision )
@@ -83,9 +83,11 @@
 private Value[] insertRow;
 private Value[] updateRow;
 private HashMapString, Integer columnLabelMap;
+private final MapString, String toUpperMap = New.hashMap();
 private HashMapInteger, Value[] patchedRows;
 private JdbcPreparedStatement preparedStatement;
 
+
 JdbcResultSet(JdbcConnection conn, JdbcStatement stat,
 ResultInterface result, int id, boolean closeStatement,
 boolean scrollable, boolean updatable) {
@@ -3115,7 +3117,7 @@
 preparedStatement.setCachedColumnLabelMap(columnLabelMap);
 }
 }
-Integer index = columnLabelMap.get(StringUtils.toUpperEnglish(columnLabel));
+Integer index = columnLabelMap.get(toUpperEnglishCached(columnLabel));
 if (index == null) {
 throw DbException.get(ErrorCode.COLUMN_NOT_FOUND_1, columnLabel);
 }
@@ -3144,6 +3146,21 @@
 }
 }
 throw DbException.get(ErrorCode.COLUMN_NOT_FOUND_1, columnLabel);
+}
+
+/**
+ * This isn't thread-safe. Should we use a concurrent map instead? Or will an instance JdbcResultSet always be accessed from only one thread?
+ * @param columnLabel column name that can be lower, upper, or mixed case
+ * @return the column label in upper case according to the english locale, obtained from a cache if possible
+ */
+private String

Re: [h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck

2015-02-09 Thread Noel Grandin


Hi


On 2015-02-09 02:21 PM, Steve McLeod wrote:

 final ResultSet resultSet = conn.prepareStatement(SELECT * FROM 
foobar).executeQuery();
 int rowCount = 0;
 while (resultSet.next()) {
 rowCount++;
 final int columnCount = resultSet.getMetaData().getColumnCount();
 for (int column = 1; column = columnCount; column++) {
 final String columnName = 
resultSet.getMetaData().getColumnName(column);
 final int anInt = resultSet.getInt(columnName);
 }
 }


You should rather be retrieving the column names once, and then retrieving the result-set columns using the getXXX(int 
columnIndex) methods.


The only other thing I can think of that might speed it up would be to modify the current caching code to use a TreeMap 
with a custom comparator and then set the comparator to java.lang.String.CASE_INSENSITIVE_ORDER.

That would avoid the extra String object creation, at the very least.

Regards, Noel.

--
You received this message because you are subscribed to the Google Groups H2 
Database group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to h2-database+unsubscr...@googlegroups.com.
To post to this group, send email to h2-database@googlegroups.com.
Visit this group at http://groups.google.com/group/h2-database.
For more options, visit https://groups.google.com/d/optout.

[h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck

2015-02-09 Thread Steve McLeod

Hi all,

I've got some H2 code that executes a query that returns possibly 100,000+ 
rows and 95 columns. I need to retrieve every column value by name, using 
JDBC. Using H2's built-in profiling tool, StringUtils.toUpperEnglish seems 
to be a bottleneck. It seems that JdbcResultSet.getColumnIndex(String 
columnLabel) is part of the problem.

I can see a couple of ways I could improve this in H2:
* caching the conversion from column-name to uppercase column name 
in JdbcResultSet.getColumnIndex(String columnLabel), on the assumption that 
if code calls, eg, rs.getInt(foobar) once it is rather likely to call it 
again.
* make StringUtils.toUpperEnglish cache recent invocations, on the 
assumption that the same strings within H2 probably get converted to upper 
case frequently.
* rework the logic in JdbcResultSet.getColumnIndex(String columnLabel) a 
little - there is already some code that tries to minimize the 
toUpperEnglish invocations, but I think it could be made smarter.
* I also suspect there is a Turkish-locale bug in in the case where the 
column count is less than 3, but I'd want to test before stating this 
incontrovertibly.

Of course, I could also implement a work-around in my code, but here is 
perhaps a chance for a tiny but worthwhile performance improvement for all 
users of H2.

Thoughts? 

I'm using H2-1.3.176, but I've checked SVN and the relevant code seems 
unchanged at the present.

Regards,

Steve


Contrived code to reproduce the problem follows:

import org.h2.util.Profiler;

import javax.swing.*;
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ScratchSpace extends JFrame {
public static void main(String[] args) throws SQLException, 
ClassNotFoundException, IOException {


Class.forName(org.h2.Driver);
Connection conn = DriverManager.getConnection(jdbc:h2:~/test);

conn.prepareStatement(DROP TABLE IF EXISTS foobar).execute();
// create 100 column names, each of the form AA, AB, AC...etc.

System.out.println(Creating table with 100 columns);
String sql = create table foobar (;
for (int i = 0; i  100; i++) {
String col = getColumnName(i);
if (i  0) {
sql += , ;
}
sql += col +  INT;
}
sql += );
System.out.println(sql =  + sql);
conn.prepareStatement(sql).execute();

final int MAX_ROWS = 10;
System.out.println(Inserting  + MAX_ROWS +  rows);
for (int j = 0; j  MAX_ROWS; j++) {
conn.prepareStatement(INSERT INTO foobar (Column_AA) VALUES 
(1)).execute();
}

Profiler prof = new Profiler();
prof.startCollecting();


final ResultSet resultSet = conn.prepareStatement(SELECT * FROM 
foobar).executeQuery();
int rowCount = 0;
while (resultSet.next()) {
rowCount++;
final int columnCount = 
resultSet.getMetaData().getColumnCount();
for (int column = 1; column = columnCount; column++) {
final String columnName = 
resultSet.getMetaData().getColumnName(column);
final int anInt = resultSet.getInt(columnName);
}
}

System.out.println(Done);


prof.stopCollecting();
System.out.println(prof.getTop(3));
conn.close();


}

private static String getColumnName(int i) {
return Column_ + (char) ((65 + i / 26)) + (char) (65 + i % 26);
}
}

Profiler output:

Profiler: top 3 stack trace(s) of  of 2219 ms of 825 thread dumps:
108/825 (13%):
at org.h2.util.StringUtils.toUpperEnglish(StringUtils.java:93)
at org.h2.jdbc.JdbcResultSet.getColumnIndex(JdbcResultSet.java:3120)
at org.h2.jdbc.JdbcResultSet.get(JdbcResultSet.java:3210)
at org.h2.jdbc.JdbcResultSet.getInt(JdbcResultSet.java:340)
at com.barbarysoftware.pokercopilot.ScratchSpace.main(ScratchSpace.java:52)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
72/825 (8%):
at org.h2.store.fs.FileDisk.read(FilePathDisk.java:451)
at org.h2.store.fs.FileUtils.readFully(FileUtils.java:345)
at org.h2.store.FileStore.readFully(FileStore.java:274)
at org.h2.result.ResultDiskBuffer.readRow(ResultDiskBuffer.java:198)
at org.h2.result.ResultDiskBuffer.nextUnsorted(ResultDiskBuffer.java:221)
at org.h2.result.ResultDiskBuffer.next(ResultDiskBuffer.java:214)
at org.h2.result.LocalResult.next(LocalResult.java:234)
at org.h2.jdbc.JdbcResultSet.nextRow(JdbcResultSet.java:3233)
at org.h2.jdbc.JdbcResultSet.next(JdbcResultSet.java:124)
at com.barbarysoftware.pokercopilot.ScratchSpace.main(ScratchSpace.java:47)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
65/825 (7%):
at org.h2.engine.Session.isReconnectNeeded(Session.java:1336)
at org.h2.jdbc.JdbcConnection.checkClosed(JdbcConnection.java:1470)
at

Re: [h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck

Re: [h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck

Re: [h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck

Re: [h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck

Re: [h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck

[h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck

6 matches

Site Navigation

Mail list logo

Footer information