[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

2009-12-31 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795666#action_12795666
 ] 

Benson Margulies commented on MAHOUT-230:
-

Grant, do you resolve or do I?

 Replace org.apache.mahout.math.Sorting with code of clear provenance
 

 Key: MAHOUT-230
 URL: https://issues.apache.org/jira/browse/MAHOUT-230
 Project: Mahout
  Issue Type: Bug
  Components: Math
Affects Versions: 0.3
Reporter: Benson Margulies
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: replace-sorting.diff

   Original Estimate: 72h
  Remaining Estimate: 72h

 org.apache.mahout.math.Sorting looks as if the original author borrowed from 
 the Sun JRE, based on the private internal function names and contents. That 
 code has a restrictive license. We need to take the equivalent file 
 (java.util.Arrays) from Apache Harmony and use it as the basis for a clean 
 replacement.
 The problematic code are the quickSort and mergeSort functions, which extend 
 'Arrays' by supporting slices of arrays and custom sorting predicate 
 functions. 
 One might also wistfully note that the more recent JDKs from Sun have 
 deployed different (and one hopes) better sort algorithms that 1.5 and/or 
 Harmony, so a really energetic person might build implementations in here to 
 match. However, expediency calls for just bashing on the Harmony 
 implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

2009-12-29 Thread Robin Anil (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795047#action_12795047
 ] 

Robin Anil commented on MAHOUT-230:
---

I ran the SortingTest instead of 3.2 sec it now takes 5.2 seconds. I repeated 
the patch and rechecked. Any idea why the perf dip?  Notice the perf drop in 
the second block  i.e. after the line break in each block

{code:xml|title=Original Output} 

  testcase time=0.016 classname=org.apache.mahout.math.SortingTest 
name=testBinarySearch/
  testcase time=0.015 classname=org.apache.mahout.math.SortingTest 
name=testBinarySearchObjects/
  testcase time=0.032 classname=org.apache.mahout.math.SortingTest 
name=testQuickSortBytes/
  testcase time=0.234 classname=org.apache.mahout.math.SortingTest 
name=testQuickSortChars/
  testcase time=0.188 classname=org.apache.mahout.math.SortingTest 
name=testQuickSortInts/
  testcase time=0.171 classname=org.apache.mahout.math.SortingTest 
name=testQuickSortLongs/
  testcase time=0.204 classname=org.apache.mahout.math.SortingTest 
name=testQuickSortShorts/
  testcase time=0.218 classname=org.apache.mahout.math.SortingTest 
name=testQuickSortFloats/
  testcase time=0.235 classname=org.apache.mahout.math.SortingTest 
name=testQuickSortDoubles/
  testcase time=0.062 classname=org.apache.mahout.math.SortingTest 
name=testMergeSortBytes/

  testcase time=0.313 classname=org.apache.mahout.math.SortingTest 
name=testMergeSortChars/
  testcase time=0.281 classname=org.apache.mahout.math.SortingTest 
name=testMergeSortInts/
  testcase time=0.391 classname=org.apache.mahout.math.SortingTest 
name=testMergeSortLongs/
  testcase time=0.25 classname=org.apache.mahout.math.SortingTest 
name=testMergeSortShorts/
  testcase time=0.359 classname=org.apache.mahout.math.SortingTest 
name=testMergeSortFloats/
  testcase time=0.328 classname=org.apache.mahout.math.SortingTest 
name=testMergeSortDoubles/
{code} 

{code:xml|title=After Patching} 

  testcase time=0.031 classname=org.apache.mahout.math.SortingTest 
name=testBinarySearch/
  testcase time=0 classname=org.apache.mahout.math.SortingTest 
name=testBinarySearchObjects/
  testcase time=0 classname=org.apache.mahout.math.SortingTest 
name=testQuickSortBytes/
  testcase time=0.234 classname=org.apache.mahout.math.SortingTest 
name=testQuickSortChars/
  testcase time=0.234 classname=org.apache.mahout.math.SortingTest 
name=testQuickSortInts/
  testcase time=0.172 classname=org.apache.mahout.math.SortingTest 
name=testQuickSortLongs/
  testcase time=0.266 classname=org.apache.mahout.math.SortingTest 
name=testQuickSortShorts/
  testcase time=0.235 classname=org.apache.mahout.math.SortingTest 
name=testQuickSortFloats/
  testcase time=0.25 classname=org.apache.mahout.math.SortingTest 
name=testQuickSortDoubles/
  testcase time=0.015 classname=org.apache.mahout.math.SortingTest 
name=testMergeSortBytes/

  testcase time=0.594 classname=org.apache.mahout.math.SortingTest 
name=testMergeSortChars/
  testcase time=0.61 classname=org.apache.mahout.math.SortingTest 
name=testMergeSortInts/
  testcase time=0.781 classname=org.apache.mahout.math.SortingTest 
name=testMergeSortLongs/
  testcase time=0.562 classname=org.apache.mahout.math.SortingTest 
name=testMergeSortShorts/
  testcase time=0.578 classname=org.apache.mahout.math.SortingTest 
name=testMergeSortFloats/
  testcase time=0.61 classname=org.apache.mahout.math.SortingTest 
name=testMergeSortDoubles/
{code} 

 Replace org.apache.mahout.math.Sorting with code of clear provenance
 

 Key: MAHOUT-230
 URL: https://issues.apache.org/jira/browse/MAHOUT-230
 Project: Mahout
  Issue Type: Bug
  Components: Math
Affects Versions: 0.3
Reporter: Benson Margulies
Assignee: Benson Margulies
 Fix For: 0.3

 Attachments: replace-sorting.diff

   Original Estimate: 72h
  Remaining Estimate: 72h

 org.apache.mahout.math.Sorting looks as if the original author borrowed from 
 the Sun JRE, based on the private internal function names and contents. That 
 code has a restrictive license. We need to take the equivalent file 
 (java.util.Arrays) from Apache Harmony and use it as the basis for a clean 
 replacement.
 The problematic code are the quickSort and mergeSort functions, which extend 
 'Arrays' by supporting slices of arrays and custom sorting predicate 
 functions. 
 One might also wistfully note that the more recent JDKs from Sun have 
 deployed different (and one hopes) better sort algorithms that 1.5 and/or 
 Harmony, so a really energetic person might build implementations in here to 
 match. However, expediency calls for just bashing on the Harmony 
 implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to 

[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

2009-12-29 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795048#action_12795048
 ] 

Grant Ingersoll commented on MAHOUT-230:


I think we need to commit and then worry about performance.  The legal issues 
outweigh the performance issues at this point.  I'll commit and then we can 
open a new one for performance..

 Replace org.apache.mahout.math.Sorting with code of clear provenance
 

 Key: MAHOUT-230
 URL: https://issues.apache.org/jira/browse/MAHOUT-230
 Project: Mahout
  Issue Type: Bug
  Components: Math
Affects Versions: 0.3
Reporter: Benson Margulies
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: replace-sorting.diff

   Original Estimate: 72h
  Remaining Estimate: 72h

 org.apache.mahout.math.Sorting looks as if the original author borrowed from 
 the Sun JRE, based on the private internal function names and contents. That 
 code has a restrictive license. We need to take the equivalent file 
 (java.util.Arrays) from Apache Harmony and use it as the basis for a clean 
 replacement.
 The problematic code are the quickSort and mergeSort functions, which extend 
 'Arrays' by supporting slices of arrays and custom sorting predicate 
 functions. 
 One might also wistfully note that the more recent JDKs from Sun have 
 deployed different (and one hopes) better sort algorithms that 1.5 and/or 
 Harmony, so a really energetic person might build implementations in here to 
 match. However, expediency calls for just bashing on the Harmony 
 implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

2009-12-29 Thread Benson Margulies
Simple answer: the Harmony team didn't code as well as the Sun people did.

This is not my metier, so if someone else can suggest algorithmic
improvements ...

On Tue, Dec 29, 2009 at 8:06 AM, Robin Anil (JIRA) j...@apache.org wrote:

    [ 
 https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795047#action_12795047
  ]

 Robin Anil commented on MAHOUT-230:
 ---

 I ran the SortingTest instead of 3.2 sec it now takes 5.2 seconds. I repeated 
 the patch and rechecked. Any idea why the perf dip?  Notice the perf drop in 
 the second block  i.e. after the line break in each block

 {code:xml|title=Original Output}

  testcase time=0.016 classname=org.apache.mahout.math.SortingTest 
 name=testBinarySearch/
  testcase time=0.015 classname=org.apache.mahout.math.SortingTest 
 name=testBinarySearchObjects/
  testcase time=0.032 classname=org.apache.mahout.math.SortingTest 
 name=testQuickSortBytes/
  testcase time=0.234 classname=org.apache.mahout.math.SortingTest 
 name=testQuickSortChars/
  testcase time=0.188 classname=org.apache.mahout.math.SortingTest 
 name=testQuickSortInts/
  testcase time=0.171 classname=org.apache.mahout.math.SortingTest 
 name=testQuickSortLongs/
  testcase time=0.204 classname=org.apache.mahout.math.SortingTest 
 name=testQuickSortShorts/
  testcase time=0.218 classname=org.apache.mahout.math.SortingTest 
 name=testQuickSortFloats/
  testcase time=0.235 classname=org.apache.mahout.math.SortingTest 
 name=testQuickSortDoubles/
  testcase time=0.062 classname=org.apache.mahout.math.SortingTest 
 name=testMergeSortBytes/

  testcase time=0.313 classname=org.apache.mahout.math.SortingTest 
 name=testMergeSortChars/
  testcase time=0.281 classname=org.apache.mahout.math.SortingTest 
 name=testMergeSortInts/
  testcase time=0.391 classname=org.apache.mahout.math.SortingTest 
 name=testMergeSortLongs/
  testcase time=0.25 classname=org.apache.mahout.math.SortingTest 
 name=testMergeSortShorts/
  testcase time=0.359 classname=org.apache.mahout.math.SortingTest 
 name=testMergeSortFloats/
  testcase time=0.328 classname=org.apache.mahout.math.SortingTest 
 name=testMergeSortDoubles/
 {code}

 {code:xml|title=After Patching}

  testcase time=0.031 classname=org.apache.mahout.math.SortingTest 
 name=testBinarySearch/
  testcase time=0 classname=org.apache.mahout.math.SortingTest 
 name=testBinarySearchObjects/
  testcase time=0 classname=org.apache.mahout.math.SortingTest 
 name=testQuickSortBytes/
  testcase time=0.234 classname=org.apache.mahout.math.SortingTest 
 name=testQuickSortChars/
  testcase time=0.234 classname=org.apache.mahout.math.SortingTest 
 name=testQuickSortInts/
  testcase time=0.172 classname=org.apache.mahout.math.SortingTest 
 name=testQuickSortLongs/
  testcase time=0.266 classname=org.apache.mahout.math.SortingTest 
 name=testQuickSortShorts/
  testcase time=0.235 classname=org.apache.mahout.math.SortingTest 
 name=testQuickSortFloats/
  testcase time=0.25 classname=org.apache.mahout.math.SortingTest 
 name=testQuickSortDoubles/
  testcase time=0.015 classname=org.apache.mahout.math.SortingTest 
 name=testMergeSortBytes/

  testcase time=0.594 classname=org.apache.mahout.math.SortingTest 
 name=testMergeSortChars/
  testcase time=0.61 classname=org.apache.mahout.math.SortingTest 
 name=testMergeSortInts/
  testcase time=0.781 classname=org.apache.mahout.math.SortingTest 
 name=testMergeSortLongs/
  testcase time=0.562 classname=org.apache.mahout.math.SortingTest 
 name=testMergeSortShorts/
  testcase time=0.578 classname=org.apache.mahout.math.SortingTest 
 name=testMergeSortFloats/
  testcase time=0.61 classname=org.apache.mahout.math.SortingTest 
 name=testMergeSortDoubles/
 {code}

 Replace org.apache.mahout.math.Sorting with code of clear provenance
 

                 Key: MAHOUT-230
                 URL: https://issues.apache.org/jira/browse/MAHOUT-230
             Project: Mahout
          Issue Type: Bug
          Components: Math
    Affects Versions: 0.3
            Reporter: Benson Margulies
            Assignee: Benson Margulies
             Fix For: 0.3

         Attachments: replace-sorting.diff

   Original Estimate: 72h
  Remaining Estimate: 72h

 org.apache.mahout.math.Sorting looks as if the original author borrowed from 
 the Sun JRE, based on the private internal function names and contents. That 
 code has a restrictive license. We need to take the equivalent file 
 (java.util.Arrays) from Apache Harmony and use it as the basis for a clean 
 replacement.
 The problematic code are the quickSort and mergeSort functions, which extend 
 'Arrays' by supporting slices of arrays and custom sorting predicate 
 functions.
 One might also wistfully note that the more recent JDKs from Sun have 
 deployed different (and one hopes) better sort algorithms that 1.5 and/or 
 Harmony, 

[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

2009-12-29 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795053#action_12795053
 ] 

Grant Ingersoll commented on MAHOUT-230:


Committed revision 894390.

 Replace org.apache.mahout.math.Sorting with code of clear provenance
 

 Key: MAHOUT-230
 URL: https://issues.apache.org/jira/browse/MAHOUT-230
 Project: Mahout
  Issue Type: Bug
  Components: Math
Affects Versions: 0.3
Reporter: Benson Margulies
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: replace-sorting.diff

   Original Estimate: 72h
  Remaining Estimate: 72h

 org.apache.mahout.math.Sorting looks as if the original author borrowed from 
 the Sun JRE, based on the private internal function names and contents. That 
 code has a restrictive license. We need to take the equivalent file 
 (java.util.Arrays) from Apache Harmony and use it as the basis for a clean 
 replacement.
 The problematic code are the quickSort and mergeSort functions, which extend 
 'Arrays' by supporting slices of arrays and custom sorting predicate 
 functions. 
 One might also wistfully note that the more recent JDKs from Sun have 
 deployed different (and one hopes) better sort algorithms that 1.5 and/or 
 Harmony, so a really energetic person might build implementations in here to 
 match. However, expediency calls for just bashing on the Harmony 
 implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

2009-12-29 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795054#action_12795054
 ] 

Benson Margulies commented on MAHOUT-230:
-

And it's all in the merge sorts.

 Replace org.apache.mahout.math.Sorting with code of clear provenance
 

 Key: MAHOUT-230
 URL: https://issues.apache.org/jira/browse/MAHOUT-230
 Project: Mahout
  Issue Type: Bug
  Components: Math
Affects Versions: 0.3
Reporter: Benson Margulies
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: replace-sorting.diff

   Original Estimate: 72h
  Remaining Estimate: 72h

 org.apache.mahout.math.Sorting looks as if the original author borrowed from 
 the Sun JRE, based on the private internal function names and contents. That 
 code has a restrictive license. We need to take the equivalent file 
 (java.util.Arrays) from Apache Harmony and use it as the basis for a clean 
 replacement.
 The problematic code are the quickSort and mergeSort functions, which extend 
 'Arrays' by supporting slices of arrays and custom sorting predicate 
 functions. 
 One might also wistfully note that the more recent JDKs from Sun have 
 deployed different (and one hopes) better sort algorithms that 1.5 and/or 
 Harmony, so a really energetic person might build implementations in here to 
 match. However, expediency calls for just bashing on the Harmony 
 implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

2009-12-29 Thread Robin Anil (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795058#action_12795058
 ] 

Robin Anil commented on MAHOUT-230:
---

What about hadoop?. I guess its their core operation.

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/MergeSort.html

 Replace org.apache.mahout.math.Sorting with code of clear provenance
 

 Key: MAHOUT-230
 URL: https://issues.apache.org/jira/browse/MAHOUT-230
 Project: Mahout
  Issue Type: Bug
  Components: Math
Affects Versions: 0.3
Reporter: Benson Margulies
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: replace-sorting.diff

   Original Estimate: 72h
  Remaining Estimate: 72h

 org.apache.mahout.math.Sorting looks as if the original author borrowed from 
 the Sun JRE, based on the private internal function names and contents. That 
 code has a restrictive license. We need to take the equivalent file 
 (java.util.Arrays) from Apache Harmony and use it as the basis for a clean 
 replacement.
 The problematic code are the quickSort and mergeSort functions, which extend 
 'Arrays' by supporting slices of arrays and custom sorting predicate 
 functions. 
 One might also wistfully note that the more recent JDKs from Sun have 
 deployed different (and one hopes) better sort algorithms that 1.5 and/or 
 Harmony, so a really energetic person might build implementations in here to 
 match. However, expediency calls for just bashing on the Harmony 
 implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

2009-12-29 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795060#action_12795060
 ] 

Benson Margulies commented on MAHOUT-230:
-

Heck, a read of the reference cited in the JDK 1.6 doc would prove rewarding, 
no doubt. Anyone else willing?

 Replace org.apache.mahout.math.Sorting with code of clear provenance
 

 Key: MAHOUT-230
 URL: https://issues.apache.org/jira/browse/MAHOUT-230
 Project: Mahout
  Issue Type: Bug
  Components: Math
Affects Versions: 0.3
Reporter: Benson Margulies
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: replace-sorting.diff

   Original Estimate: 72h
  Remaining Estimate: 72h

 org.apache.mahout.math.Sorting looks as if the original author borrowed from 
 the Sun JRE, based on the private internal function names and contents. That 
 code has a restrictive license. We need to take the equivalent file 
 (java.util.Arrays) from Apache Harmony and use it as the basis for a clean 
 replacement.
 The problematic code are the quickSort and mergeSort functions, which extend 
 'Arrays' by supporting slices of arrays and custom sorting predicate 
 functions. 
 One might also wistfully note that the more recent JDKs from Sun have 
 deployed different (and one hopes) better sort algorithms that 1.5 and/or 
 Harmony, so a really energetic person might build implementations in here to 
 match. However, expediency calls for just bashing on the Harmony 
 implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.