Re: Code Review Request: 7171415: java.net.URI.equals/hashCode not consistent for some URIs

Vitaly Davidovich Tue, 15 Jan 2013 05:28:19 -0800

Looks good.  You can move ch, i, and index declarations into the for loop
scope, but it's fine as-is.


Thanks

Sent from my phone
On Jan 14, 2013 3:05 PM, "Kurchi Hazra" <kurchi.subhra.ha...@oracle.com>
wrote:

>  Thank you all for your comments. Here is an updated webrev where
> URI.hashCode() calculates the hashCode itself
> instead of relying on String.hashCode():
>
> http://cr.openjdk.java.net/~khazra/7171415/webrev.01/
>
> Thanks,
> Kurchi
>
> On 08.01.2013 20:29, Vitaly Davidovich wrote:
>
> Yes, I also thought about range checks not being eliminated if using
> charAt() but I guess that depends on how smart the JIT is - if charAt is
> inlined there's technically enough info there for the compiler to see that
> range checks aren't needed.  Whether that happens or not I haven't
> checked.  toCharArray brings us back to having allocations unless, again,
> EA helps out.  I think a microbenchmark would help here (along with verbose
> GC logging) to see which is better if this is a concern.
>
> Why do you say you need to duplicate String.hashCode to be consistent with
> what people are using already? As long as the hash quality is at least as
> good as today (or not significantly worse) shouldn't you be able to change
> the impl? If someone's relying on specific value for some input then their
> code is broken.  Besides, doing toUpper will change the hash for URIs with
> % anyway.  Perhaps I misunderstood your point though ...
>
> Vitaly
>
> Sent from my phone
> On Jan 8, 2013 11:04 PM, "Kurchi Subhra Hazra" <
> kurchi.subhra.ha...@oracle.com> wrote:
>
>>  On 1/8/13 6:55 PM, Vitaly Davidovich wrote:
>>
>> Also, I'm not sure how hot this method is in practice but allocating
>> StringBuilder seems a bit heavy (maybe it gets partially escape analyzed
>> out though).  Can you instead loop over all the chars in the string and
>> build up the hash code as you go along? If you see a % then you handle next
>> 2 chars specially, like you do now.  Or are you trying to take advantage of
>> String intrinsic support in the JIT? I guess if perf is a concern you can
>> write a micro benchmark comparing the approaches ...
>>
>> That did occur to me, but I guess we have to be consistent with the value
>> that people have already been using, and that means I have
>> to duplicate the code in String.hashCode() (that is what the original
>> implementation was calling) - I was trying to avoid that. Also,
>> String.hashCode() uses its internal char[] - whereas charAt() will
>> involve several additional bound checks - but
>> using toCharArray() may be better. Let me take another look at this, and
>> get back with another webrev.
>>
>>  Sent from my phone
>> On Jan 8, 2013 9:45 PM, "Vitaly Davidovich" <vita...@gmail.com> wrote:
>>
>>> Hi Kurchi,
>>>
>>> In the hash method, I suggest you move handling of strings with % into a
>>> separate method to keep the hash method small for common case (no %).
>>> Otherwise, there's a chance this method won't get inlined anymore due to
>>> its (newly increased) size.
>>>
>>  - Yep, will do.
>>
>>   Also, I realize toLower does the same thing, but why does toUpper
>>> return an int whereas it's really a char? Might be cleaner to declare
>>> return type as char and do the cast inside the method as needed.
>>>
>>  - I followed the format of toLower(). But I agree this way it will be
>> cleaner.
>>
>> Thanks a lot,
>> Kurchi
>>
>>
>>
>>   Thanks
>>>
>>> Sent from my phone
>>> On Jan 8, 2013 8:20 PM, "Kurchi Hazra" <kurchi.subhra.ha...@oracle.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>>     According to RFC 3986[1], hexadecimal digits encoded by a '%'
>>>> should be case-insensitive, for example,%A2 and %a2 should be
>>>> considered equal. Although, URI.equals() does take this into
>>>> consideration, the implementation of URI.hashCode() does not and
>>>> returns different hashcodes for two URIs that are similar in all
>>>> respects except for the case of the percent-encoded hexadecimal digits.
>>>> This fix attempts to construct a normalized string from the string
>>>> representing a component before calculating its hashCode.
>>>> I converted to upper case for the normalization(and not lower case) as
>>>> required by [1].
>>>>
>>>>     For testing the fix, I added an additional test scenario to an
>>>> existing test (jdk/test/java/net/URI/Test.java). While I was there, I also
>>>> made
>>>> minor changes to the test so that it does not produce rawtype and other
>>>> lint warnings.
>>>>
>>>> Bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7171415
>>>> Webrev: http://cr.openjdk.java.net/~khazra/7171415/webrev.00/
>>>>
>>>> URI.compareTo() still suffers from the same problem - I am not sure if
>>>> it should be dealt with as a separate bug.
>>>>
>>>> Thanks,
>>>> Kurchi
>>>>
>>>> [1] http://tools.ietf.org/html/rfc3986#section-6.2.2.1
>>>>
>>>
>>
> --
> -Kurchi
>
>

Re: Code Review Request: 7171415: java.net.URI.equals/hashCode not consistent for some URIs

Reply via email to