Thanks for bringing this up. If you can find any place where an array might realistically be larger than 67 million elements, then I guess file a JIRA for it. Also, the array needs to hold objects, not primitives (quicksort is apparently used for primitives in jdk7, and only object arrays go through TimSort). I can't think of any such place off the top of my head, but I might be missing something.
Potentially we could also file a JIRA just as a place to gather the discussion, even if the only outcome is a release note.

best,
Colin

On Thu, Feb 26, 2015 at 12:16 AM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Maybe we should discuss whether the elements of an array can be larger
> than 67108864 in our use cases - e.g. FairScheduler uses
> Collections.sort(), but the number of jobs isn't larger than 67108864
> in many use cases, so we can keep using it. It's also reasonable that
> we choose to use safe algorithms for stability.
>
> Thanks,
> - Tsuyoshi
>
> On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
>> Hi hadoop developers,
>>
>> In the last two weeks, a JDK bug in TimSort, which backs
>> Collections#sort, was reported. How can we deal with this problem?
>>
>> http://envisage-project.eu/timsort-specification-and-verification/
>> https://bugs.openjdk.java.net/browse/JDK-8072909
>>
>> The bug causes an ArrayIndexOutOfBoundsException if the number of
>> elements is larger than 67108864.
>>
>> We use the sort method in at least 77 places:
>> find . -name "*.java" | xargs grep "Collections.sort" | wc -l
>> 77
>>
>> One reasonable workaround is to set the
>> java.util.Arrays.useLegacyMergeSort system property by default.
>>
>> Thanks,
>> - Tsuyoshi
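[Editor's note: a minimal sketch of the workaround discussed above. It assumes the JVM reads the `java.util.Arrays.useLegacyMergeSort` property lazily, on the first merge sort of an object array, so setting it programmatically before any sort has the same effect as passing `-Djava.util.Arrays.useLegacyMergeSort=true` on the command line. The class name is illustrative.]

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LegacySortWorkaround {
    public static void main(String[] args) {
        // Fall back to the pre-TimSort merge sort for object arrays.
        // This must take effect before the first call to
        // Collections.sort / Arrays.sort(Object[]); the safest route is
        // -Djava.util.Arrays.useLegacyMergeSort=true on the JVM command line.
        System.setProperty("java.util.Arrays.useLegacyMergeSort", "true");

        List<Integer> list = new ArrayList<>();
        for (int i = 10; i > 0; i--) {
            list.add(i);
        }
        Collections.sort(list); // now dispatches to the legacy merge sort
        System.out.println(list.get(0)); // prints 1
    }
}
```

Note that this only changes which sort implementation is used for object arrays; primitive-array sorts (`Arrays.sort(int[])` etc.) use dual-pivot quicksort and are not affected by the TimSort bug either way.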