Re: Code Review Request: 7171415: java.net.URI.equals/hashCode not consistent for some URIs

Vitaly Davidovich Tue, 08 Jan 2013 20:30:20 -0800

Yes, I also thought about range checks not being eliminated if using
charAt() but I guess that depends on how smart the JIT is - if charAt is
inlined there's technically enough info there for the compiler to see that
range checks aren't needed.  Whether that happens or not I haven't
checked.  toCharArray brings us back to having allocations unless, again,
EA helps out.  I think a microbenchmark would help here (along with verbose
GC logging) to see which is better if this is a concern.


Why do you say you need to duplicate String.hashCode to be consistent with
what people are using already? As long as the hash quality is at least as
good as today (or not significantly worse) shouldn't you be able to change
the impl? If someone's relying on specific value for some input then their
code is broken.  Besides, doing toUpper will change the hash for URIs with
% anyway.  Perhaps I misunderstood your point though ...

Vitaly

Sent from my phone
On Jan 8, 2013 11:04 PM, "Kurchi Subhra Hazra" <
[email protected]> wrote:

>  On 1/8/13 6:55 PM, Vitaly Davidovich wrote:
>
> Also, I'm not sure how hot this method is in practice but allocating
> StringBuilder seems a bit heavy (maybe it gets partially escape analyzed
> out though).  Can you instead loop over all the chars in the string and
> build up the hash code as you go along? If you see a % then you handle next
> 2 chars specially, like you do now.  Or are you trying to take advantage of
> String intrinsic support in the JIT? I guess if perf is a concern you can
> write a micro benchmark comparing the approaches ...
>
> That did occur to me, but I guess we have to be consistent with the value
> that people have already been using, and that means I have
> to duplicate the code in String.hashCode() (that is what the original
> implementation was calling) - I was trying to avoid that. Also,
> String.hashCode() uses its internal char[] - whereas charAt() will involve
> several additional bound checks - but
> using toCharArray() may be better. Let me take another look at this, and
> get back with another webrev.
>
>  Sent from my phone
> On Jan 8, 2013 9:45 PM, "Vitaly Davidovich" <[email protected]> wrote:
>
>> Hi Kurchi,
>>
>> In the hash method, I suggest you move handling of strings with % into a
>> separate method to keep the hash method small for common case (no %).
>> Otherwise, there's a chance this method won't get inlined anymore due to
>> its (newly increased) size.
>>
>  - Yep, will do.
>
>   Also, I realize toLower does the same thing, but why does toUpper
>> return an int whereas it's really a char? Might be cleaner to declare
>> return type as char and do the cast inside the method as needed.
>>
>  - I followed the format of toLower(). But I agree this way it will be
> cleaner.
>
> Thanks a lot,
> Kurchi
>
>
>
>   Thanks
>>
>> Sent from my phone
>> On Jan 8, 2013 8:20 PM, "Kurchi Hazra" <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>>     According to RFC 3986[1], hexadecimal digits encoded by a '%' should
>>> be case-insensitive, for example,%A2 and %a2 should be
>>> considered equal. Although, URI.equals() does take this into
>>> consideration, the implementation of URI.hashCode() does not and
>>> returns different hashcodes for two URIs that are similar in all
>>> respects except for the case of the percent-encoded hexadecimal digits.
>>> This fix attempts to construct a normalized string from the string
>>> representing a component before calculating its hashCode.
>>> I converted to upper case for the normalization(and not lower case) as
>>> required by [1].
>>>
>>>     For testing the fix, I added an additional test scenario to an
>>> existing test (jdk/test/java/net/URI/Test.java). While I was there, I also
>>> made
>>> minor changes to the test so that it does not produce rawtype and other
>>> lint warnings.
>>>
>>> Bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7171415
>>> Webrev: http://cr.openjdk.java.net/~khazra/7171415/webrev.00/
>>>
>>> URI.compareTo() still suffers from the same problem - I am not sure if
>>> it should be dealt with as a separate bug.
>>>
>>> Thanks,
>>> Kurchi
>>>
>>> [1] http://tools.ietf.org/html/rfc3986#section-6.2.2.1
>>>
>>
>

Re: Code Review Request: 7171415: java.net.URI.equals/hashCode not consistent for some URIs

Reply via email to