Hi Claes,
ok, I didn't spot any bugs so its fine as is.
On 1/3/2017 5:52 PM, Claes Redestad wrote:
Hi Roger,
On 2017-01-03 22:22, Roger Riggs wrote:
Hi Claes,
So Windows didn't suffer from having the '\' separator.
relatively speaking Windows won't see any speed-up when dealing
with file system paths, but most places where we see encodePath in
profiles is actually not files: jar/runtime image paths, HTTP etc...
Thus Windows shouldn't be left too far behind for typical workloads. :-)
ParseUtil:
firstEncodeIndex:121:
'a' - 'z' seems more frequent than '/' or '.'; does it improve the
stats to move that range to the beginning of the if.
(yes the compiler can re-order).
line 125:
Since 127 is known to need encoding it could be >= 0x007f
line 136: I suppose the arraycopy intrinsic already optimizes length
== 0;
Line 134: I question the math on * 2 + 16 -index; (But this is
pre-existing code)
if there were lots of characters that needed encoding it might be
possible to overflow the array since 1 char is replaced by at least 3
and up to 9.
16 seems like a questionable fudge factor; but perhaps it has not
been a problem in practice.
As the main objective here is to get rid of the allocations while not
regressing throughput when they can't be avoided I resisted *most*
urges to micro-optimize. :-)
The weird heuristic with len * 2 + 16 is ugly, yes, but I prefer to
leave it mostly alone and perhaps revisit this for a throughput
performance enhancement in the future.
'/' can actually be rather frequent in paths, and while '.' is likely
less common and should have been inserted after a-z, adding the check
improved performance of encoding to-be-rather-common paths like
"/java.base/bla/bla/bla/" remarkably while not visibly regressing
anything else.
What is clear from micros is that having to load and call into the
BitSet has a relatively large overhead compared to an extra comparison
(which is likely why there are separate checks for a-z, A-Z and 0-9 in
the first place).
A good optimizer should be able to bisect and get the result efficiently
given the constant values.
I should actually see if we can't remove the BitSet entirely: I think
the java.net.URI approach with final long masks and a simple shift and
check might be more efficient and relatively easy to duplicate here.
A couple of fixed 64-bit compile time bitmasks computed from the set of
encoded chars would be pretty efficient and compact too.
I did wonder if there was a way to have a common utility for the encoding;
but that's probably something to save for later.
Roger
Thanks!
/Claes
$.02, Roger
On 1/3/2017 9:46 AM, Claes Redestad wrote:
Hi,
some users reports high allocation rates in ParseUtil.encodePath,
regardless of whether paths actually need to be encoded or not.
Since this is a commonly used utility it makes sense.
Webrev: http://cr.openjdk.java.net/~redestad/8170785/webrev.01
Bug: https://bugs.openjdk.java.net/browse/JDK-8170785
This patch provides a semantically neutral fast-path for cases when
the path does not need to be encoded (up to 5x speedup), reduces
allocation when the string has a prefix that does not need to be
encoded (1-2x speedup) and no regression when using a separator
that's not '/' or the first char needs encoding.
Interpreted performance is not affected much either: small positive
when no encoding is needed, neutral or negligible regression
otherwise.
Thanks!
/Claes