[ 
https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007172#comment-13007172
 ] 

Todd Lipcon commented on HADOOP-7183:
-------------------------------------

bq. This behavior wasn't changed by HADOOP-6881

It looks to me like it was. Prior to 6881, if nothing was in the cache, it did 
not "write back" to the cache in WritableComparator.get(). 6881 changed it to 
add {{comparators.put}} there.

bq. Have you actually verified that the shuffle's sorts actually do a get per a 
thread?

Not in the latest shuffle code, but it looked that way to me previously.


Using ThreadLocals seems like a reasonable alternate implementation. But we 
should still make the contract explicit that cached comparators must be 
threadsafe (yes, it's fairly obvious, but we seem to have missed it ourselves!)

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed 
> WritableComparator gets saved back into the static map. This is fine for 
> stateless comparators, but some comparators have per-instance state, and thus 
> this becomes thread-unsafe and causes errors in the shuffle where multiple 
> threads are doing comparisons. An example of a Comparator with per-instance 
> state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to