Re: [h2] org.h2.util.StringUtils.toUpperEnglish can be a bottleneck

Steve McLeod Mon, 09 Feb 2015 20:20:12 -0800

Hi Thomas and Noel,

I've created a patch that caches the results of toUpperEnglish on a per 
JdbcResultSet basis. In my specific use case (fetching 95 columns by column 
name over tens of thousands of rows), the query time dropped by more than 
half, and the result was repeatable. Before: 510 milliseconds. After: 230 
milliseconds.


The patch is attached. I wrote the code, it's mine, and I'm contributing it 
to H2 for distribution multiple-licensed under the MPL 2.0, and the EPL 1.0 
(http://h2database.com/html/license.html).

Would either of you mind taking a look to see if you think it is worth 
committing? In particular, I'm concerned that my use of HashMap may be 
incorrect if multiple threads are sharing the result set. 
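
To make the thread-safety question concrete, here is a minimal sketch of a cache that would tolerate concurrent callers. It is not the attached patch: the class name UpperCaseLabelCache is hypothetical, String.toUpperCase(Locale.ENGLISH) stands in for StringUtils.toUpperEnglish, and it assumes Java 8 or later for computeIfAbsent.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-alone cache for illustration; not part of H2. */
final class UpperCaseLabelCache {

    // ConcurrentHashMap allows concurrent readers and writers without
    // external locking. It rejects null keys, so callers must not pass null.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    /** Returns the upper-cased label, computing it at most once per distinct key. */
    String toUpperEnglishCached(String columnLabel) {
        // Assumption: toUpperCase(Locale.ENGLISH) stands in for
        // org.h2.util.StringUtils.toUpperEnglish.
        return cache.computeIfAbsent(columnLabel,
                label -> label.toUpperCase(Locale.ENGLISH));
    }

    /** Number of distinct labels seen so far. */
    int size() {
        return cache.size();
    }
}
```

If a result set is only ever used from one thread, a plain HashMap with a single get followed by a put on a miss would likely be cheaper than this; the concurrent map only pays off if result sets really are shared.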

Regards,

Steve




On Monday, 9 February 2015 22:45:12 UTC+6:30, Steve McLeod wrote:
>
> Hi Noel,
>
>
>
> On Monday, 9 February 2015 19:14:32 UTC+6:30, Noel Grandin wrote:
>>
>> Hi 
>>
>>
>> On 2015-02-09 02:21 PM, Steve McLeod wrote: 
>> >          final ResultSet resultSet = conn.prepareStatement("SELECT * FROM foobar").executeQuery();
>> >          int rowCount = 0;
>> >          while (resultSet.next()) {
>> >              rowCount++;
>> >              final int columnCount = resultSet.getMetaData().getColumnCount();
>> >              for (int column = 1; column <= columnCount; column++) {
>> >                  final String columnName = resultSet.getMetaData().getColumnName(column);
>> >                  final int anInt = resultSet.getInt(columnName);
>> >              }
>> >          }
>>
>> You should rather be retrieving the column names once, and then 
>> retrieving the result-set columns using the getXXX(int 
>> columnIndex) methods. 
>>
>>
> I agree - that would help my (contrived) case. My actual code doesn't use 
> meta-data to get column names, but sources the column names from elsewhere. 
> I can certainly fix the problems on my end.
>
> But I think there is an opportunity to add a tiny performance improvement 
> in general here. Iterating over a result set using getXXX(String 
> columnName) is a very common use of JDBC. We're not talking about a 10% 
> speed improvement, but perhaps 1% in some cases? I don't know for sure - 
> I'm just guessing based on the profiling.
>  
>
>> The only other thing I can think of that might speed it up would be to 
>> modify the current caching code to use a TreeMap 
>> with a custom comparator and then set the comparator to 
>> java.lang.String.CASE_INSENSITIVE_ORDER. 
>> That would avoid the extra String object creation, at the very least. 
>>
>
> Sounds good. Will java.lang.String.CASE_INSENSITIVE_ORDER be satisfactory? 
> The Javadocs state: "Note that this Comparator does not take locale into 
> account, and will result in an unsatisfactory ordering for certain 
> locales." I think that means it is acceptable for H2's case, but I'm not 
> certain.
>
>
> Regards,
>
> Steve
>
>
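
The TreeMap idea quoted above can be sketched roughly as follows. This is an illustration only, not H2's code: the class CaseInsensitiveColumnIndex and its methods are hypothetical, and putIfAbsent assumes Java 8 or later.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical label-to-index lookup for illustration; not part of H2. */
final class CaseInsensitiveColumnIndex {

    // The comparator makes "id", "Id" and "ID" the same key, so no
    // upper-cased copy of the label is allocated on lookup.
    private final Map<String, Integer> indexByLabel =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    CaseInsensitiveColumnIndex(String... labels) {
        for (int i = 0; i < labels.length; i++) {
            // JDBC column indexes are 1-based. putIfAbsent keeps the first
            // column when two labels differ only by case.
            indexByLabel.putIfAbsent(labels[i], i + 1);
        }
    }

    /** Returns the 1-based column index, or -1 if the label is unknown. */
    int find(String label) {
        Integer index = indexByLabel.get(label);
        return index == null ? -1 : index;
    }
}
```

Two caveats on the design. A TreeMap lookup costs a number of string comparisons that grows with the logarithm of the column count, so whether it beats upper-casing plus a HashMap lookup would need measuring. Also, String.CASE_INSENSITIVE_ORDER compares character by character, so it may not match toUpperCase in every case: for example, "ß" upper-cases to "SS", but the comparator does not treat "ß" and "SS" as equal.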


Index: h2/src/main/org/h2/jdbc/JdbcResultSet.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- h2/src/main/org/h2/jdbc/JdbcResultSet.java	(revision 6033)
+++ h2/src/main/org/h2/jdbc/JdbcResultSet.java	(revision )
@@ -83,9 +83,11 @@
     private Value[] insertRow;
     private Value[] updateRow;
     private HashMap<String, Integer> columnLabelMap;
+    private final Map<String, String> toUpperMap = New.hashMap();
     private HashMap<Integer, Value[]> patchedRows;
     private JdbcPreparedStatement preparedStatement;
 
+
     JdbcResultSet(JdbcConnection conn, JdbcStatement stat,
             ResultInterface result, int id, boolean closeStatement,
             boolean scrollable, boolean updatable) {
@@ -3115,7 +3117,7 @@
                     preparedStatement.setCachedColumnLabelMap(columnLabelMap);
                 }
             }
-            Integer index = columnLabelMap.get(StringUtils.toUpperEnglish(columnLabel));
+            Integer index = columnLabelMap.get(toUpperEnglishCached(columnLabel));
             if (index == null) {
                 throw DbException.get(ErrorCode.COLUMN_NOT_FOUND_1, columnLabel);
             }
@@ -3144,6 +3146,21 @@
             }
         }
         throw DbException.get(ErrorCode.COLUMN_NOT_FOUND_1, columnLabel);
+    }
+
+    /**
+     * This isn't thread-safe. Should we use a concurrent map instead? Or will an instance JdbcResultSet always be accessed from only one thread?
+     * @param columnLabel column name that can be lower, upper, or mixed case
+     * @return the column label in upper case according to the english locale, obtained from a cache if possible
+     */
+    private String toUpperEnglishCached(String columnLabel) {
+        if (toUpperMap.containsKey(columnLabel)) {
+            return toUpperMap.get(columnLabel);
+        } else {
+            final String columnLabelUpperEnglish = StringUtils.toUpperEnglish(columnLabel);
+            toUpperMap.put(columnLabel, columnLabelUpperEnglish);
+            return columnLabelUpperEnglish;
+        }
     }
 
     private static void mapColumn(HashMap<String, Integer> map, String label,
