[ https://issues.apache.org/jira/browse/DERBY-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758705#action_12758705 ]
Knut Anders Hatlen commented on DERBY-3882:
-------------------------------------------
I think the reason why comparing hash codes first is not used as a general
optimization technique for String.equals() is that some conditions must be
satisfied before it actually is an optimization:
1) The same String objects must be compared multiple times, otherwise the cost
of calculating the hash codes will be too high compared to the benefit.
2) There's nothing to gain by comparing the hash codes if the Strings are
equal, so it only speeds up the comparisons where we expect a high number of
mismatches.
3) String.equals() is very fast if the strings have different lengths or if the
strings differ in one of the first characters, so the optimization has the best
effect when comparing strings of the same length with a common prefix.
The cursor names generated by the network client satisfy all of these
conditions. They are of the form SQL_CURLH000C + serial#, which means they are
not equal, have a common prefix, and are likely to have the same length. Also,
the names are stored in the activation on the server, so the same String
objects are compared repeatedly and benefit from the caching of the hash code.
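To make the idea concrete, here is a minimal sketch of a hash-code check placed in front of String.equals(). It is not the code from check_hash.diff; the class name CursorNameMatch and the method sameName() are made up for illustration, and the null handling is an assumption about how unnamed activations would be treated.

```java
final class CursorNameMatch {
    private CursorNameMatch() {}

    /**
     * Returns true if the two names are equal. The hash codes are compared
     * first, so most mismatches are rejected without scanning the common
     * prefix character by character.
     */
    public static boolean sameName(String a, String b) {
        if (a == null || b == null) {
            // Assumption: two missing names count as a match, as they
            // would in a plain null-safe equality check.
            return a == b;
        }
        // java.lang.String caches its hash code after the first call, so
        // this test is cheap when the same objects are compared repeatedly.
        if (a.hashCode() != b.hashCode()) {
            return false;
        }
        // Equal hash codes do not prove equality; confirm with equals().
        return a.equals(b);
    }

    public static void main(String[] args) {
        // Same length, common prefix, different serial number.
        System.out.println(sameName("SQL_CURLH000C1", "SQL_CURLH000C2"));
        // "Aa" and "BB" have the same hash code, so equals() decides.
        System.out.println(sameName("Aa", "BB"));
    }
}
```

The final equals() call is what keeps the check correct when two different names happen to share a hash code; the hash comparison only filters out the common case of a mismatch.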
Another reason is, as you mentioned, that a hash table is normally used for
such lookups. A hash table could be used in this case as well, but there are
some complicating issues that may make it too complex to justify it:
a) The cursor names are not unique within a connection (open cursors cannot
have the same name, but open statements can share the same name as long as they
don't have open cursors at the same time). This means that one key (cursor
name) can map to many values, so some sort of multi-map must be implemented. In
the normal embedded case with no cursor name, all activations will be located
in the same bucket (key=null).
b) A statement can change its cursor name at any time. Currently, this is done by
simply changing the cursorName field in the activation. If we store the
activation list in a hash table, changing the cursor name means that we also
need to move the activation from one bucket to another.
c) There's some code to reclaim memory if the activation list has been big and
later shrinks. I'd imagine that this code would be somewhat more complex too if
the list is transformed into a multi-map.
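A rough sketch of what points a) to c) would mean for a hash-based structure follows. ActivationIndex and Act are hypothetical stand-ins, not Derby classes; the sketch only assumes that an activation carries a possibly null cursor name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ActivationIndex {
    /** Hypothetical stand-in for an activation; only the name matters here. */
    static final class Act {
        private String cursorName; // null when no cursor name is set
        Act(String cursorName) { this.cursorName = cursorName; }
        String getCursorName() { return cursorName; }
    }

    // (a) One name can map to several activations, hence a list per key.
    // HashMap permits a null key, so all unnamed activations share a bucket.
    private final Map<String, List<Act>> byName = new HashMap<>();

    void add(Act a) {
        byName.computeIfAbsent(a.cursorName, k -> new ArrayList<>()).add(a);
    }

    void remove(Act a) {
        List<Act> bucket = byName.get(a.cursorName);
        if (bucket != null) {
            bucket.remove(a);
            // (c) Drop empty buckets so memory can be reclaimed.
            if (bucket.isEmpty()) {
                byName.remove(a.cursorName);
            }
        }
    }

    // (b) Renaming is no longer a plain field update: the activation has
    // to be moved from the old bucket to the new one.
    void rename(Act a, String newName) {
        remove(a);
        a.cursorName = newName;
        add(a);
    }

    List<Act> lookup(String name) {
        List<Act> bucket = byName.get(name);
        return bucket == null ? Collections.<Act>emptyList() : bucket;
    }
}
```

Even in this small form, every rename touches two buckets and the index has to be kept in sync with the field, which is the extra complexity weighed against the simple hash check above.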
That said, I'm all for replacing the list with a data structure that's more
suited for effective lookups. I'd suggest that we go for the current patch
proposal for now, since it looks simple and rather harmless, and then revisit
the issue and try to come up with a more efficient data structure if this
optimization turns out to be insufficient.
> Expensive cursor name lookup in network server
> ----------------------------------------------
>
> Key: DERBY-3882
> URL: https://issues.apache.org/jira/browse/DERBY-3882
> Project: Derby
> Issue Type: Improvement
> Components: Network Server, SQL
> Affects Versions: 10.4.2.0
> Reporter: Knut Anders Hatlen
> Assignee: Knut Anders Hatlen
> Priority: Minor
> Attachments: check_hash.diff, Cursors.java
>
>
> I have sometimes seen in a profiler that an unreasonably high amount of the
> CPU time is spent in
> GenericLanguageConnectionContext.lookupCursorActivation() when the network
> server is running. That method is used to check that there is no active
> statement in the current transaction with the same cursor name as the
> statement currently being executed, and it is normally only used if the
> executing statement has a cursor name. None of the client-side statements had
> a cursor name when I saw this.
> The method is always called when the network server executes a statement
> because the network server assigns a cursor name to each statement even if no
> cursor name has been set on the client side. If the list of open statements
> is short, the method is relatively cheap. If one uses
> ClientConnectionPoolDataSource with the JDBC statement cache, the list of
> open statements can however be quite long, and lookupCursorActivation() needs
> to spend a fair amount of time iterating over the list and comparing strings.
> The time spent looking for duplicate names in lookupCursorActivation() is
> actually wasted time when it is called from the network server, since the
> network server assigns unique names to the statements it executes, even when
> there are duplicate names on the client. It would be good if we could reduce
> the cost of this operation, or perhaps eliminate it completely when the
> client doesn't use cursor names.