DistributedSparseMatrix should clean up after itself when doing times(Vector)
and timesSquared(Vector)
------------------------------------------------------------------------------------------------------
Key: MAHOUT-666
URL: https://issues.apache.org/jira/browse/MAHOUT-666
Project: Mahout
Issue Type: Bug
Components: Math
Affects Versions: 0.5
Environment: Linux x86_64 2.6.18, Mac OS 10.6 64-bit, Hadoop 0.20.2,
Java 1.6
Reporter: Jonathan Traupman
Priority: Minor
Fix For: 0.5
The times() and timesSquared() methods in DistributedSparseMatrix leave behind
a lot of cruft in the directories they create. Although the individual files
are tagged with deleteOnExit, the directories are not. Also, by not deleting
them until JVM exit, a job that does repeated matrix/vector multiplies, like
DistributedLanczosSolver, creates a lot of temp files that stick around for
the whole run, even though the results they contain are read once and then
never again.
Our cluster admins enforce both file count and size quotas. Because 5 temp
files/directories are created on each iteration of DistributedLanczosSolver,
we're constantly bumping into the quota with large SVDs.
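
To illustrate the cleanup pattern being asked for, here is a minimal sketch.
It is not the Mahout code: it uses the local java.nio.file API as a stand-in
for the Hadoop FileSystem, and the class name TempDirCleanupSketch, the
multiplyStep() helper, and the "part-00000" file name are all hypothetical.
Each step writes its result into a fresh temp directory, reads it once, and
then removes the whole directory in a finally block instead of waiting for
JVM exit.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TempDirCleanupSketch {

  /**
   * Hypothetical stand-in for one times()-style step: write the result
   * into a temp directory, read it back once, then delete the directory.
   */
  public static List<String> multiplyStep(Path workRoot, int iteration) throws IOException {
    Path tmpDir = Files.createTempDirectory(workRoot, "times-" + iteration + "-");
    try {
      Path part = tmpDir.resolve("part-00000"); // assumed output file name
      Files.write(part,
          Collections.singletonList("result of iteration " + iteration),
          StandardCharsets.UTF_8);
      // The result is read exactly once, so nothing needs to outlive this call.
      return Files.readAllLines(part, StandardCharsets.UTF_8);
    } finally {
      // Remove the files AND the directory right away, not at JVM exit.
      deleteRecursively(tmpDir);
    }
  }

  /** Deletes a directory tree, children before parents. */
  public static void deleteRecursively(Path dir) throws IOException {
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(dir)) {
      // Reverse order puts deeper paths ahead of the directories holding them.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  /** Counts the direct entries of a directory. */
  public static long countEntries(Path dir) throws IOException {
    try (Stream<Path> entries = Files.list(dir)) {
      return entries.count();
    }
  }

  public static void main(String[] args) throws IOException {
    Path workRoot = Files.createTempDirectory("lanczos-sketch-");
    for (int i = 0; i < 3; i++) {
      multiplyStep(workRoot, i);
    }
    System.out.println("entries left after 3 iterations: " + countEntries(workRoot));
    deleteRecursively(workRoot);
  }
}
```

With cleanup done per step, the number of temp entries stays flat no matter
how many iterations run. On HDFS the analogous call would presumably be
FileSystem.delete(path, true) on the temp directory once its result has been
read; where exactly that belongs in DistributedSparseMatrix is left open here.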