[ https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Isabel Drost updated MAHOUT-11:
-------------------------------

    Attachment: MAHOUT-11.patch

I am not the original author of the source, but I managed to get the static fields out of the k-means clustering code. All unit tests still pass. However, I would feel a lot better if someone else double-checked the changes.

Looking at the code, I spotted some more points that could benefit from being revisited (e.g. the usage of deprecated MapReduce APIs and the introduction of status reports). But this should be done in a separate issue.

> Static fields used throughout clustering code (Canopy, K-Means).
> ----------------------------------------------------------------
>
>                 Key: MAHOUT-11
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-11
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.1
>            Reporter: Dawid Weiss
>             Fix For: 0.3
>
>         Attachments: MAHOUT-11.patch
>
>
> I file this as a bug, even though I'm not 100% sure it is one. In the current
> code the information is exchanged via static fields (for example, the distance
> measure and the thresholds for Canopies are static fields). Is it always true in
> Hadoop that one job runs inside one JVM with exclusive access? I haven't seen
> this stated anywhere in the Hadoop documentation, and my impression was that
> everything uses JobConf to pass configuration to jobs, but jobs are configured
> on a per-object basis (a job is an object, a mapper is an object and everything
> else is basically an object).
> If it's possible for two jobs to run in parallel inside one JVM, then this is
> a limitation and a bug in our code that needs to be addressed.
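
For readers following the issue, here is a minimal sketch of the concern it describes. It is not the Mahout code and not the attached patch. It uses a plain Map as a stand-in for Hadoop's JobConf, and the class names and the "canopy.t1" key are hypothetical, chosen only for illustration. It shows how a threshold held in a static field is overwritten when two differently configured jobs share one JVM, while a threshold held in an instance field stays per-object.

```java
import java.util.HashMap;
import java.util.Map;

public class StaticFieldDemo {

    /** Problematic pattern: the threshold lives in a static field shared JVM-wide. */
    static class StaticCanopyMapper {
        static double t1; // one value for every mapper instance in this JVM

        static void configure(Map<String, String> conf) {
            t1 = Double.parseDouble(conf.get("canopy.t1")); // hypothetical key
        }

        boolean isWithinT1(double distance) {
            return distance < t1;
        }
    }

    /** Per-object pattern: each mapper reads its own configuration into an instance field. */
    static class InstanceCanopyMapper {
        private double t1;

        void configure(Map<String, String> conf) {
            t1 = Double.parseDouble(conf.get("canopy.t1")); // hypothetical key
        }

        boolean isWithinT1(double distance) {
            return distance < t1;
        }
    }

    /** Builds a stand-in job configuration (a plain Map, not a real JobConf). */
    static Map<String, String> conf(double t1) {
        Map<String, String> conf = new HashMap<>();
        conf.put("canopy.t1", Double.toString(t1));
        return conf;
    }

    public static void main(String[] args) {
        // Two "jobs" with different thresholds inside the same JVM, static variant.
        StaticCanopyMapper staticA = new StaticCanopyMapper();
        StaticCanopyMapper.configure(conf(3.0));
        StaticCanopyMapper staticB = new StaticCanopyMapper();
        StaticCanopyMapper.configure(conf(1.0)); // silently overwrites job A's threshold

        // Job A expected t1 = 3.0, but now sees 1.0.
        System.out.println("static A:   " + staticA.isWithinT1(2.0)); // false
        System.out.println("static B:   " + staticB.isWithinT1(2.0)); // false

        // Same scenario with per-object configuration.
        InstanceCanopyMapper instA = new InstanceCanopyMapper();
        instA.configure(conf(3.0));
        InstanceCanopyMapper instB = new InstanceCanopyMapper();
        instB.configure(conf(1.0));

        System.out.println("instance A: " + instA.isWithinT1(2.0)); // true
        System.out.println("instance B: " + instB.isWithinT1(2.0)); // false
    }
}
```

Whether Hadoop actually runs two jobs in one JVM is the open question in the issue; the sketch only shows that the per-object variant does not depend on the answer.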