DistributedSparseMatrix should clean up after itself when doing times(Vector) 
and timesSquared(Vector)
------------------------------------------------------------------------------------------------------

                 Key: MAHOUT-666
                 URL: https://issues.apache.org/jira/browse/MAHOUT-666
             Project: Mahout
          Issue Type: Bug
          Components: Math
    Affects Versions: 0.5
         Environment: Linux x86_64 2.6.18, Mac OS 10.6 64-bit, Hadoop 0.20.2, 
Java 1.6
            Reporter: Jonathan Traupman
            Priority: Minor
             Fix For: 0.5


The times() and timesSquared() methods in DistributedSparseMatrix leave behind 
a lot of cruft in the directories they create. The individual files are tagged 
with deleteOnExit, but the directories are not. Also, because nothing is 
deleted until JVM exit, a job that does repeated matrix/vector multiplies, 
like DistributedLanczosSolver, creates a lot of temp files that stick around 
for the whole run, even though the results they contain are read once and 
then never again. 

Our cluster admins enforce both file count and size quotas. Since 5 temp 
files/directories are created on each iteration of DistributedLanczosSolver, 
we're constantly bumping into the quota with large SVDs. 
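
To illustrate the behavior being asked for, here is a minimal sketch of the 
"read the result once, then delete the whole output directory right away" 
pattern. It is not the Mahout code: it uses the local filesystem through 
java.nio.file (Java 8+) so that it runs standalone, and the class 
TempDirCleanup, the methods runIteration, deleteRecursively and countEntries, 
and the file name part-00000 are all hypothetical names chosen for the 
example. On HDFS the analogous step would presumably be a recursive 
FileSystem.delete(path, true) on the output directory once the result vector 
has been read.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TempDirCleanup {

    // Delete a directory tree, children before parents, so that the
    // directory itself is removed and not just the files inside it.
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts deeper paths before their parents.
            paths = walk.sorted(Comparator.reverseOrder())
                        .collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Hypothetical stand-in for one times() call: write a result into a
    // fresh temp directory, read it back once, then remove the directory
    // immediately instead of waiting for JVM exit.
    static String runIteration(Path workDir, int i) throws IOException {
        Path outDir = Files.createTempDirectory(workDir, "times-" + i + "-");
        try {
            Path part = outDir.resolve("part-00000");
            Files.write(part, ("result " + i).getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(part), StandardCharsets.UTF_8);
        } finally {
            deleteRecursively(outDir);
        }
    }

    // Count the direct children of a directory.
    static long countEntries(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path workDir = Files.createTempDirectory("lanczos-demo-");
        try {
            // Simulate several solver iterations sharing one work directory.
            for (int i = 0; i < 5; i++) {
                runIteration(workDir, i);
            }
            System.out.println("entries left: " + countEntries(workDir));
        } finally {
            deleteRecursively(workDir);
        }
    }
}
```

With cleanup done in a finally block per iteration, the number of leftover 
entries stays constant instead of growing with the iteration count, which is 
what matters under a file count quota.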

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
