Re: [jira] Assigned: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).
On Thu Sean Owen wrote:
> Looks like Hudson is saying that broke the build but looks like easily
> addressable stuff.

Fixed it - but only shortly *after* Hudson had already started building the project :/ Triggered the build on Hudson manually a few minutes ago - now it runs successfully again.

Isabel
Re: [jira] Assigned: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).
Looks like Hudson is saying that broke the build but looks like easily addressable stuff.

On Dec 10, 2009 11:10 AM, "Isabel Drost (JIRA)" wrote:

[ https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuet. ..

Assignee: Drew Farris (was: Isabel Drost)

Thanks.

> Static fields used throughout clustering code (Canopy, K-Means).
> --...
> Assignee: Drew Farris
> Fix For: 0.3
>
> Attachments: MAHOUT-11-all-cleanup-20091128.patch, MAHOUT-11-...
[jira] Assigned: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).
[ https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Isabel Drost reassigned MAHOUT-11:

Assignee: Drew Farris (was: Isabel Drost)

Thanks.

> Static fields used throughout clustering code (Canopy, K-Means).
>
> Key: MAHOUT-11
> URL: https://issues.apache.org/jira/browse/MAHOUT-11
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.1
> Reporter: Dawid Weiss
> Assignee: Drew Farris
> Fix For: 0.3
>
> Attachments: MAHOUT-11-all-cleanup-20091128.patch, MAHOUT-11-kmeans-cleanup.patch, MAHOUT-11-RandomSeedGenerator.patch, MAHOUT-11.patch
>
> I file this as a bug, even though I'm not 100% sure it is one. In the current code the information is exchanged via static fields (for example, distance measure and thresholds for Canopies are static fields). Is it always true in Hadoop that one job runs inside one JVM with exclusive access? I haven't seen it anywhere in Hadoop documentation and my impression was that everything uses JobConf to pass configuration to jobs, but jobs are configured on a per-object basis (a job is an object, a mapper is an object and everything else is basically an object).
> If it's possible for two jobs to run in parallel inside one JVM then this is a limitation and bug in our code that needs to be addressed.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).
[ https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Isabel Drost reassigned MAHOUT-11:

Assignee: Isabel Drost

> Static fields used throughout clustering code (Canopy, K-Means).
>
> Key: MAHOUT-11
> URL: https://issues.apache.org/jira/browse/MAHOUT-11
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.1
> Reporter: Dawid Weiss
> Assignee: Isabel Drost
> Fix For: 0.3
>
> Attachments: MAHOUT-11-all-cleanup-20091128.patch, MAHOUT-11-kmeans-cleanup.patch, MAHOUT-11-RandomSeedGenerator.patch, MAHOUT-11.patch
>
> I file this as a bug, even though I'm not 100% sure it is one. In the current code the information is exchanged via static fields (for example, distance measure and thresholds for Canopies are static fields). Is it always true in Hadoop that one job runs inside one JVM with exclusive access? I haven't seen it anywhere in Hadoop documentation and my impression was that everything uses JobConf to pass configuration to jobs, but jobs are configured on a per-object basis (a job is an object, a mapper is an object and everything else is basically an object).
> If it's possible for two jobs to run in parallel inside one JVM then this is a limitation and bug in our code that needs to be addressed.
Re: [jira] Assigned: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).
I do see a few advantages of using static variables, actually -- I just wasn't sure if it's contractual for Hadoop jobs to run in isolation from other jobs. This is a refactoring rather than a functionality improvement, so I'll leave the issue open for some time; once I get a spare minute I'll look at Hadoop's code and see what's cooking there.

D.

Jeff Eastman wrote:

Dawid, I'm not sure either, as it seems to work on deployed jobs where each process only uses a single configuration of distance measure. I'm sure one can easily create use cases where different t1 and t2 values are required and this will break the static approach. I was going to move the static variables back into the object and require each instance to be configured individually, but I got sidetracked into vectors and matrices and have not gotten to it.

Go for it,
Jeff

-----Original Message-----
From: Dawid Weiss (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 06, 2008 4:59 AM
To: mahout-dev@lucene.apache.org
Subject: [jira] Assigned: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).

[ https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss reassigned MAHOUT-11:

Assignee: Dawid Weiss

Static fields used throughout clustering code (Canopy, K-Means).

Key: MAHOUT-11
URL: https://issues.apache.org/jira/browse/MAHOUT-11
Project: Mahout
Issue Type: Bug
Components: Clustering
Affects Versions: 0.1
Reporter: Dawid Weiss
Assignee: Dawid Weiss

I file this as a bug, even though I'm not 100% sure it is one. In the current code the information is exchanged via static fields (for example, distance measure and thresholds for Canopies are static fields). Is it always true in Hadoop that one job runs inside one JVM with exclusive access? I haven't seen it anywhere in Hadoop documentation and my impression was that everything uses JobConf to pass configuration to jobs, but jobs are configured on a per-object basis (a job is an object, a mapper is an object and everything else is basically an object).

If it's possible for two jobs to run in parallel inside one JVM then this is a limitation and bug in our code that needs to be addressed.
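The concern in the issue description can be illustrated with a minimal sketch. This is not Mahout's actual code: the class names StaticCanopy and InstanceCanopy, the keys "t1" and "t2", and the plain Map standing in for Hadoop's JobConf are all assumptions made for illustration. It only shows what happens if two differently configured jobs share one JVM.

```java
import java.util.HashMap;
import java.util.Map;

public class StaticVsInstanceConfig {

    // Hypothetical stand-in for the static approach: the thresholds live in
    // static fields, so every user of this class in the JVM shares them.
    static class StaticCanopy {
        static double t1;
        static double t2;

        static void configure(Map<String, String> conf) {
            t1 = Double.parseDouble(conf.get("t1"));
            t2 = Double.parseDouble(conf.get("t2"));
        }
    }

    // Hypothetical per-instance alternative: each object keeps the
    // thresholds it was configured with.
    static class InstanceCanopy {
        private final double t1;
        private final double t2;

        InstanceCanopy(Map<String, String> conf) {
            this.t1 = Double.parseDouble(conf.get("t1"));
            this.t2 = Double.parseDouble(conf.get("t2"));
        }

        double getT1() { return t1; }
        double getT2() { return t2; }
    }

    // Builds a plain map that stands in for a job configuration
    // (an assumption; the real code would read from Hadoop's JobConf).
    static Map<String, String> conf(double t1, double t2) {
        Map<String, String> c = new HashMap<>();
        c.put("t1", Double.toString(t1));
        c.put("t2", Double.toString(t2));
        return c;
    }

    public static void main(String[] args) {
        Map<String, String> jobA = conf(3.0, 1.5);
        Map<String, String> jobB = conf(8.0, 4.0);

        // Static fields: configuring job B overwrites job A's thresholds.
        StaticCanopy.configure(jobA);
        StaticCanopy.configure(jobB);
        System.out.println("static t1 seen by job A: " + StaticCanopy.t1);

        // Instance fields: each job keeps its own thresholds.
        InstanceCanopy a = new InstanceCanopy(jobA);
        InstanceCanopy b = new InstanceCanopy(jobB);
        System.out.println("instance t1 for job A: " + a.getT1());
        System.out.println("instance t1 for job B: " + b.getT1());
    }
}
```

In this sketch the static variant ends up with job B's thresholds for both jobs, while the per-instance variant keeps them separate. Whether Hadoop ever runs two jobs in one JVM is exactly the open question in this thread; the sketch only shows the consequence if it does.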
RE: [jira] Assigned: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).
Dawid, I'm not sure either, as it seems to work on deployed jobs where each process only uses a single configuration of distance measure. I'm sure one can easily create use cases where different t1 and t2 values are required and this will break the static approach. I was going to move the static variables back into the object and require each instance to be configured individually, but I got sidetracked into vectors and matrices and have not gotten to it.

Go for it,
Jeff

-----Original Message-----
From: Dawid Weiss (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 06, 2008 4:59 AM
To: mahout-dev@lucene.apache.org
Subject: [jira] Assigned: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).

[ https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss reassigned MAHOUT-11:

Assignee: Dawid Weiss

> Static fields used throughout clustering code (Canopy, K-Means).
>
> Key: MAHOUT-11
> URL: https://issues.apache.org/jira/browse/MAHOUT-11
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.1
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
>
> I file this as a bug, even though I'm not 100% sure it is one. In the current code the information is exchanged via static fields (for example, distance measure and thresholds for Canopies are static fields). Is it always true in Hadoop that one job runs inside one JVM with exclusive access? I haven't seen it anywhere in Hadoop documentation and my impression was that everything uses JobConf to pass configuration to jobs, but jobs are configured on a per-object basis (a job is an object, a mapper is an object and everything else is basically an object).
> If it's possible for two jobs to run in parallel inside one JVM then this is a limitation and bug in our code that needs to be addressed.
[jira] Assigned: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).
[ https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss reassigned MAHOUT-11:

Assignee: Dawid Weiss

> Static fields used throughout clustering code (Canopy, K-Means).
>
> Key: MAHOUT-11
> URL: https://issues.apache.org/jira/browse/MAHOUT-11
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.1
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
>
> I file this as a bug, even though I'm not 100% sure it is one. In the current code the information is exchanged via static fields (for example, distance measure and thresholds for Canopies are static fields). Is it always true in Hadoop that one job runs inside one JVM with exclusive access? I haven't seen it anywhere in Hadoop documentation and my impression was that everything uses JobConf to pass configuration to jobs, but jobs are configured on a per-object basis (a job is an object, a mapper is an object and everything else is basically an object).
> If it's possible for two jobs to run in parallel inside one JVM then this is a limitation and bug in our code that needs to be addressed.