Re: RFR: 8170785: Excessive allocation in ParseUtil.encodePath

Roger Riggs Wed, 04 Jan 2017 07:58:32 -0800

Hi Claes,

ok, I didn't spot any bugs so its fine as is.



On 1/3/2017 5:52 PM, Claes Redestad wrote:

Hi Roger,

On 2017-01-03 22:22, Roger Riggs wrote:

Hi Claes,

So Windows didn't suffer from having the '\' separator.


relatively speaking Windows won't see any speed-up when dealing
with file system paths, but most places where we see encodePath in
profiles is actually not files: jar/runtime image paths, HTTP etc...

Thus Windows shouldn't be left too far behind for typical workloads. :-)


ParseUtil:

firstEncodeIndex:121:
  'a' - 'z' seems more frequent than '/' or '.'; does it improve the
stats to move that range to the beginning of the if.
  (yes the compiler can re-order).

line 125:
    Since 127 is known to need encoding it could be >= 0x007f

line 136: I suppose the arraycopy intrinsic already optimizes length== 0;


Line 134:  I question the math on  * 2 + 16 -index;  (But this is
pre-existing code)
   if there were lots of characters that needed encoding it might be
possible to overflow the array since 1 char is replaced by at least 3
and up to 9.
   16 seems like a questionable fudge factor; but perhaps it has not
been a problem in practice.


As the main objective here is to get rid of the allocations while not
regressing throughput when they can't be avoided I resisted *most*
urges to micro-optimize. :-)

The weird heuristic with len * 2 + 16 is ugly, yes, but I prefer to
leave it mostly alone and perhaps revisit this for a throughput
performance enhancement in the future.

'/' can actually be rather frequent in paths, and while '.' is likely
less common and should have been inserted after a-z, adding the check
improved performance of encoding to-be-rather-common paths like
"/java.base/bla/bla/bla/" remarkably while not visibly regressing
anything else.

What is clear from micros is that having to load and call into the
BitSet has a relatively large overhead compared to an extra comparison
(which is likely why there are separate checks for a-z, A-Z and 0-9 in
the first place).

A good optimizer should be able to bisect and get the result efficientlygiven the constant values.


I should actually see if we can't remove the BitSet entirely: I think
the java.net.URI approach with final long masks and a simple shift and
check might be more efficient and relatively easy to duplicate here.

A couple of fixed 64-bit compile time bitmasks computed from the set ofencoded chars would be pretty efficient and compact too.

I did wonder if there was a way to have a common utility for the encoding;
but that's probably something to save for later.

Roger


Thanks!

/Claes


$.02, Roger



On 1/3/2017 9:46 AM, Claes Redestad wrote:

Hi,

some users reports high allocation rates in ParseUtil.encodePath,
regardless of whether paths actually need to be encoded or not.
Since this is a commonly used utility it makes sense.

Webrev: http://cr.openjdk.java.net/~redestad/8170785/webrev.01
Bug: https://bugs.openjdk.java.net/browse/JDK-8170785

This patch provides a semantically neutral fast-path for cases when
the path does not need to be encoded (up to 5x speedup), reduces
allocation when the string has a prefix that does not need to be
encoded (1-2x speedup) and no regression when using a separator
that's not '/' or the first char needs encoding.

Interpreted performance is not affected much either: small positive
when no encoding is needed, neutral or negligible regression
otherwise.

Thanks!

/Claes

Re: RFR: 8170785: Excessive allocation in ParseUtil.encodePath

Reply via email to