[ 
https://issues.apache.org/jira/browse/MAHOUT-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673031#comment-13673031
 ] 

Dawid Weiss commented on MAHOUT-1225:
-------------------------------------

Take a look at this test:
{code}
    @Test
    public void testClearTable() throws Exception {
        OpenObjectIntHashMap<Integer> m = new OpenObjectIntHashMap<Integer>();
        m.clear(); // rehash from the default capacity to the next prime after 1 (3).
        m.put(1, 2);
        m.clear(); // Should clear internal references.
        
        Field tableField = m.getClass().getDeclaredField("table");
        tableField.setAccessible(true);
        Object[] table = (Object[]) tableField.get(m);
        
        assertEquals(
            new HashSet<Object>(Arrays.asList(new Object [] { null } )),
            new HashSet<Object>(Arrays.asList(table)));
    }
{code}

This fails because clear() does not explicitly null out the table of 
references. It relies on rehash to replace the table, but rehash only runs 
when the capacity actually needs to change; when it does not, the old keys 
stay strongly referenced from the table. The fix is:

{code}
   public void clear() {
     Arrays.fill(this.state, FREE);
+    Arrays.fill(this.table, null);
+
     distinct = 0;
     freeEntries = table.length; // delta
     trimToSize();
{code}
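
To see the failure mode in isolation, here is a minimal sketch. It is not 
Mahout's code: StaleReferenceDemo, FULL, put(int, Object) and both clear 
variants are hypothetical names, and the fixed-size arrays only stand in for 
the real state/table pair.

```java
import java.util.Arrays;

/** Hypothetical stand-in for an open-addressing map; not Mahout's actual class. */
public class StaleReferenceDemo {
    static final byte FREE = 0;
    static final byte FULL = 1;

    // Parallel arrays: state[i] says whether table[i] is occupied.
    final byte[] state = new byte[4];
    final Object[] table = new Object[4];

    /** Stores a key directly in the given slot (no hashing, for illustration). */
    void put(int slot, Object key) {
        table[slot] = key;
        state[slot] = FULL;
    }

    /** Marks every slot free but leaves the key array untouched. */
    void clearStateOnly() {
        Arrays.fill(state, FREE);
    }

    /** Marks every slot free and drops the key references as well. */
    void clearStateAndTable() {
        Arrays.fill(state, FREE);
        Arrays.fill(table, null);
    }

    /** True if any slot still holds a non-null reference. */
    boolean holdsReferences() {
        for (Object o : table) {
            if (o != null) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        StaleReferenceDemo d = new StaleReferenceDemo();
        d.put(2, "key");
        d.clearStateOnly();
        System.out.println(d.holdsReferences()); // the key is still reachable

        d.put(2, "key");
        d.clearStateAndTable();
        System.out.println(d.holdsReferences()); // the key has been released
    }
}
```

After clearStateOnly() the map looks empty to callers, yet the key is still 
reachable through the table array, which is the condition the test above 
checks for.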

You could avoid the unconditional Arrays.fill on the table by returning a 
boolean from trimToSize() and checking whether the internal buffers have been 
reallocated (in which case the old references are already freed).
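
A rough sketch of that alternative, under stated assumptions: TrimAwareTable, 
MIN_CAPACITY, FULL and put(int, Object) are made up for illustration, and this 
trimToSize() only handles the empty case, whereas the real method rehashes 
based on the current size.

```java
import java.util.Arrays;

/** Hypothetical sketch; field names follow the snippet above, the rest is assumed. */
public class TrimAwareTable {
    static final byte FREE = 0;
    static final byte FULL = 1;
    static final int MIN_CAPACITY = 4; // assumed lower bound, for illustration only

    byte[] state = new byte[16];
    Object[] table = new Object[16];
    int distinct;
    int freeEntries = table.length;

    /** Stores a key directly in the given slot (no hashing, for illustration). */
    void put(int slot, Object key) {
        if (state[slot] != FULL) {
            distinct++;
            freeEntries--;
        }
        table[slot] = key;
        state[slot] = FULL;
    }

    /** Shrinks the buffers when empty; returns true if fresh arrays were allocated. */
    boolean trimToSize() {
        if (distinct == 0 && table.length > MIN_CAPACITY) {
            state = new byte[MIN_CAPACITY];
            table = new Object[MIN_CAPACITY];
            freeEntries = MIN_CAPACITY;
            return true;
        }
        return false;
    }

    public void clear() {
        Arrays.fill(state, FREE);
        distinct = 0;
        freeEntries = table.length;
        if (!trimToSize()) {
            // Buffers were kept, so the old keys must be released explicitly.
            Arrays.fill(table, null);
        }
    }
}
```

The trade-off is one branch in clear() against a redundant fill of an array 
that is about to be discarded; whether that matters depends on how large the 
tables get.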
                
> Sets and maps incorrectly clear() their state arrays (potential endless loops)
> ------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1225
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1225
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.7
>         Environment: Eclipse, linux Fedora 17, Java 1.7, Mahout Maths 
> collections (Set) 0.7, hppc 0.4.3
>            Reporter: Sophie Sperner
>            Assignee: Dawid Weiss
>              Labels: hashset, java, mahout, test
>             Fix For: 0.7
>
>         Attachments: hppc-0.4.3.jar, MAHOUT-1225.patch, MAHOUT-1225.patch, 
> MAHOUT-1225.patch, mahout-math-0.8-SNAPSHOT.jar
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The code I attached hangs forever; Eclipse does not print a stack trace 
> because the program never terminates. So I decided to make a small 
> test.java file that you can easily run.
> The main function simply runs the getItemList() method, which 
> successfully executes the getDataset() method (please download the 
> mushroom.dat dataset and set its full path in the filePath string variable) 
> and then hangs (the problem happens on the fourth columnValues.add() call). 
> After the dataset is loaded into the X array, the code simply goes through X 
> column by column and searches for the distinct items in it.
> If you uncomment IntSet columnValues = new IntOpenHashSet(); and 
> corresponding import headers then everything will work just fine (you will 
> also need to include hppc jar file found here 
> http://labs.carrotsearch.com/hppc.html or below in the attachment).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
