[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance
[ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795666#action_12795666 ] Benson Margulies commented on MAHOUT-230: - Grant, do you resolve or do I? Replace org.apache.mahout.math.Sorting with code of clear provenance Key: MAHOUT-230 URL: https://issues.apache.org/jira/browse/MAHOUT-230 Project: Mahout Issue Type: Bug Components: Math Affects Versions: 0.3 Reporter: Benson Margulies Assignee: Grant Ingersoll Fix For: 0.3 Attachments: replace-sorting.diff Original Estimate: 72h Remaining Estimate: 72h org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement. The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance
[ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795047#action_12795047 ] Robin Anil commented on MAHOUT-230: --- I ran the SortingTest instead of 3.2 sec it now takes 5.2 seconds. I repeated the patch and rechecked. Any idea why the perf dip? Notice the perf drop in the second block i.e. after the line break in each block {code:xml|title=Original Output} testcase time=0.016 classname=org.apache.mahout.math.SortingTest name=testBinarySearch/ testcase time=0.015 classname=org.apache.mahout.math.SortingTest name=testBinarySearchObjects/ testcase time=0.032 classname=org.apache.mahout.math.SortingTest name=testQuickSortBytes/ testcase time=0.234 classname=org.apache.mahout.math.SortingTest name=testQuickSortChars/ testcase time=0.188 classname=org.apache.mahout.math.SortingTest name=testQuickSortInts/ testcase time=0.171 classname=org.apache.mahout.math.SortingTest name=testQuickSortLongs/ testcase time=0.204 classname=org.apache.mahout.math.SortingTest name=testQuickSortShorts/ testcase time=0.218 classname=org.apache.mahout.math.SortingTest name=testQuickSortFloats/ testcase time=0.235 classname=org.apache.mahout.math.SortingTest name=testQuickSortDoubles/ testcase time=0.062 classname=org.apache.mahout.math.SortingTest name=testMergeSortBytes/ testcase time=0.313 classname=org.apache.mahout.math.SortingTest name=testMergeSortChars/ testcase time=0.281 classname=org.apache.mahout.math.SortingTest name=testMergeSortInts/ testcase time=0.391 classname=org.apache.mahout.math.SortingTest name=testMergeSortLongs/ testcase time=0.25 classname=org.apache.mahout.math.SortingTest name=testMergeSortShorts/ testcase time=0.359 classname=org.apache.mahout.math.SortingTest name=testMergeSortFloats/ testcase time=0.328 classname=org.apache.mahout.math.SortingTest name=testMergeSortDoubles/ {code} {code:xml|title=After Patching} testcase time=0.031 classname=org.apache.mahout.math.SortingTest name=testBinarySearch/ testcase time=0 classname=org.apache.mahout.math.SortingTest name=testBinarySearchObjects/ testcase time=0 classname=org.apache.mahout.math.SortingTest name=testQuickSortBytes/ testcase time=0.234 classname=org.apache.mahout.math.SortingTest name=testQuickSortChars/ testcase time=0.234 classname=org.apache.mahout.math.SortingTest name=testQuickSortInts/ testcase time=0.172 classname=org.apache.mahout.math.SortingTest name=testQuickSortLongs/ testcase time=0.266 classname=org.apache.mahout.math.SortingTest name=testQuickSortShorts/ testcase time=0.235 classname=org.apache.mahout.math.SortingTest name=testQuickSortFloats/ testcase time=0.25 classname=org.apache.mahout.math.SortingTest name=testQuickSortDoubles/ testcase time=0.015 classname=org.apache.mahout.math.SortingTest name=testMergeSortBytes/ testcase time=0.594 classname=org.apache.mahout.math.SortingTest name=testMergeSortChars/ testcase time=0.61 classname=org.apache.mahout.math.SortingTest name=testMergeSortInts/ testcase time=0.781 classname=org.apache.mahout.math.SortingTest name=testMergeSortLongs/ testcase time=0.562 classname=org.apache.mahout.math.SortingTest name=testMergeSortShorts/ testcase time=0.578 classname=org.apache.mahout.math.SortingTest name=testMergeSortFloats/ testcase time=0.61 classname=org.apache.mahout.math.SortingTest name=testMergeSortDoubles/ {code} Replace org.apache.mahout.math.Sorting with code of clear provenance Key: MAHOUT-230 URL: https://issues.apache.org/jira/browse/MAHOUT-230 Project: Mahout Issue Type: Bug Components: Math Affects Versions: 0.3 Reporter: Benson Margulies Assignee: Benson Margulies Fix For: 0.3 Attachments: replace-sorting.diff Original Estimate: 72h Remaining Estimate: 72h org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement. The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand. -- This message is automatically generated by JIRA. - You can reply to this email to
[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance
[ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795048#action_12795048 ] Grant Ingersoll commented on MAHOUT-230: I think we need to commit and then worry about performance. The legal issues outweigh the performance issues at this point. I'll commit and then we can open a new one for performance.. Replace org.apache.mahout.math.Sorting with code of clear provenance Key: MAHOUT-230 URL: https://issues.apache.org/jira/browse/MAHOUT-230 Project: Mahout Issue Type: Bug Components: Math Affects Versions: 0.3 Reporter: Benson Margulies Assignee: Grant Ingersoll Fix For: 0.3 Attachments: replace-sorting.diff Original Estimate: 72h Remaining Estimate: 72h org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement. The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance
Simple answer: the Harmony team didn't code as well as the Sun people did. This is not my metier, so if someone else can suggest algorithmic improvements ... On Tue, Dec 29, 2009 at 8:06 AM, Robin Anil (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795047#action_12795047 ] Robin Anil commented on MAHOUT-230: --- I ran the SortingTest instead of 3.2 sec it now takes 5.2 seconds. I repeated the patch and rechecked. Any idea why the perf dip? Notice the perf drop in the second block i.e. after the line break in each block {code:xml|title=Original Output} testcase time=0.016 classname=org.apache.mahout.math.SortingTest name=testBinarySearch/ testcase time=0.015 classname=org.apache.mahout.math.SortingTest name=testBinarySearchObjects/ testcase time=0.032 classname=org.apache.mahout.math.SortingTest name=testQuickSortBytes/ testcase time=0.234 classname=org.apache.mahout.math.SortingTest name=testQuickSortChars/ testcase time=0.188 classname=org.apache.mahout.math.SortingTest name=testQuickSortInts/ testcase time=0.171 classname=org.apache.mahout.math.SortingTest name=testQuickSortLongs/ testcase time=0.204 classname=org.apache.mahout.math.SortingTest name=testQuickSortShorts/ testcase time=0.218 classname=org.apache.mahout.math.SortingTest name=testQuickSortFloats/ testcase time=0.235 classname=org.apache.mahout.math.SortingTest name=testQuickSortDoubles/ testcase time=0.062 classname=org.apache.mahout.math.SortingTest name=testMergeSortBytes/ testcase time=0.313 classname=org.apache.mahout.math.SortingTest name=testMergeSortChars/ testcase time=0.281 classname=org.apache.mahout.math.SortingTest name=testMergeSortInts/ testcase time=0.391 classname=org.apache.mahout.math.SortingTest name=testMergeSortLongs/ testcase time=0.25 classname=org.apache.mahout.math.SortingTest name=testMergeSortShorts/ testcase time=0.359 classname=org.apache.mahout.math.SortingTest name=testMergeSortFloats/ testcase time=0.328 classname=org.apache.mahout.math.SortingTest name=testMergeSortDoubles/ {code} {code:xml|title=After Patching} testcase time=0.031 classname=org.apache.mahout.math.SortingTest name=testBinarySearch/ testcase time=0 classname=org.apache.mahout.math.SortingTest name=testBinarySearchObjects/ testcase time=0 classname=org.apache.mahout.math.SortingTest name=testQuickSortBytes/ testcase time=0.234 classname=org.apache.mahout.math.SortingTest name=testQuickSortChars/ testcase time=0.234 classname=org.apache.mahout.math.SortingTest name=testQuickSortInts/ testcase time=0.172 classname=org.apache.mahout.math.SortingTest name=testQuickSortLongs/ testcase time=0.266 classname=org.apache.mahout.math.SortingTest name=testQuickSortShorts/ testcase time=0.235 classname=org.apache.mahout.math.SortingTest name=testQuickSortFloats/ testcase time=0.25 classname=org.apache.mahout.math.SortingTest name=testQuickSortDoubles/ testcase time=0.015 classname=org.apache.mahout.math.SortingTest name=testMergeSortBytes/ testcase time=0.594 classname=org.apache.mahout.math.SortingTest name=testMergeSortChars/ testcase time=0.61 classname=org.apache.mahout.math.SortingTest name=testMergeSortInts/ testcase time=0.781 classname=org.apache.mahout.math.SortingTest name=testMergeSortLongs/ testcase time=0.562 classname=org.apache.mahout.math.SortingTest name=testMergeSortShorts/ testcase time=0.578 classname=org.apache.mahout.math.SortingTest name=testMergeSortFloats/ testcase time=0.61 classname=org.apache.mahout.math.SortingTest name=testMergeSortDoubles/ {code} Replace org.apache.mahout.math.Sorting with code of clear provenance Key: MAHOUT-230 URL: https://issues.apache.org/jira/browse/MAHOUT-230 Project: Mahout Issue Type: Bug Components: Math Affects Versions: 0.3 Reporter: Benson Margulies Assignee: Benson Margulies Fix For: 0.3 Attachments: replace-sorting.diff Original Estimate: 72h Remaining Estimate: 72h org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement. The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony,
[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance
[ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795053#action_12795053 ] Grant Ingersoll commented on MAHOUT-230: Committed revision 894390. Replace org.apache.mahout.math.Sorting with code of clear provenance Key: MAHOUT-230 URL: https://issues.apache.org/jira/browse/MAHOUT-230 Project: Mahout Issue Type: Bug Components: Math Affects Versions: 0.3 Reporter: Benson Margulies Assignee: Grant Ingersoll Fix For: 0.3 Attachments: replace-sorting.diff Original Estimate: 72h Remaining Estimate: 72h org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement. The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance
[ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795054#action_12795054 ] Benson Margulies commented on MAHOUT-230: - And it's all in the merge sorts. Replace org.apache.mahout.math.Sorting with code of clear provenance Key: MAHOUT-230 URL: https://issues.apache.org/jira/browse/MAHOUT-230 Project: Mahout Issue Type: Bug Components: Math Affects Versions: 0.3 Reporter: Benson Margulies Assignee: Grant Ingersoll Fix For: 0.3 Attachments: replace-sorting.diff Original Estimate: 72h Remaining Estimate: 72h org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement. The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance
[ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795058#action_12795058 ] Robin Anil commented on MAHOUT-230: --- What about hadoop?. I guess its their core operation. http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/MergeSort.html Replace org.apache.mahout.math.Sorting with code of clear provenance Key: MAHOUT-230 URL: https://issues.apache.org/jira/browse/MAHOUT-230 Project: Mahout Issue Type: Bug Components: Math Affects Versions: 0.3 Reporter: Benson Margulies Assignee: Grant Ingersoll Fix For: 0.3 Attachments: replace-sorting.diff Original Estimate: 72h Remaining Estimate: 72h org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement. The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance
[ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795060#action_12795060 ] Benson Margulies commented on MAHOUT-230: - Heck, a read of the reference cited in the JDK 1.6 doc would prove rewarding, no doubt. Anyone else willing? Replace org.apache.mahout.math.Sorting with code of clear provenance Key: MAHOUT-230 URL: https://issues.apache.org/jira/browse/MAHOUT-230 Project: Mahout Issue Type: Bug Components: Math Affects Versions: 0.3 Reporter: Benson Margulies Assignee: Grant Ingersoll Fix For: 0.3 Attachments: replace-sorting.diff Original Estimate: 72h Remaining Estimate: 72h org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement. The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.