Re: [jira] Commented: (MAHOUT-206) Separate and clearly label different SparseVector implementations
Practically speaking, the huge advantage of the abstract class is that you have lower update requirements and less duplicated code when augmenting the interface. Yes, you can do the dual thing, but the practical experience with Hadoop and Lucene has been that using just an abstract class, named as if it were an interface, works out better in the long run. The update requirements become very onerous when you are dealing with more than one package that has to be updated (and which, for some reason, can't be updated simultaneously). When adding methods, the standard practice is to add an implementation that throws UnsupportedOperationException or something similar. Yes, you can do this with interface + abstract class if *everybody* codes just the right way, but with the abstract-only approach there is one less thing for people to do wrong. It took me a long time to come around to this pattern of coding, but I finally agree that publishing abstract classes really is better except where you must have an interface (for RPC or multiple inheritance). It only takes a little bit of outside coding to run into the problem, and the social cost can be enormous.

On Tue, Nov 24, 2009 at 1:09 PM, Sean Owen wrote:
> ...
> Abstract classes afford the possibility of adding methods plus
> implementation, without breaking anybody, so yeah I'm into abstract
> classes. But then that's no argument against an abstract class +
> interface, which would add a small bit of flexibility too.
>
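A minimal Java sketch of the abstract-class-only approach described above. All names (`Vector`, `DenseVector`, `sum()`, `compress()`) are hypothetical and are not Mahout's actual API. The point is only that a method added later to a published abstract class gets either a generic body or, when none makes sense, a body that throws UnsupportedOperationException, so a subclass written against the older version (possibly in another package) keeps compiling.

```java
// Hypothetical sketch: a published type written as an abstract class but
// named as if it were an interface. Names are illustrative, not Mahout's API.
public class AbstractClassEvolutionDemo {

  public abstract static class Vector {
    // Methods from the original release.
    public abstract int size();

    public abstract double get(int index);

    // Added in a later release with a generic body: existing subclasses,
    // even ones in other packages, keep compiling and inherit this method.
    public double sum() {
      double total = 0.0;
      for (int i = 0; i < size(); i++) {
        total += get(i);
      }
      return total;
    }

    // Added later with no sensible generic body: fail loudly at runtime
    // until a subclass overrides it, instead of breaking compilation.
    public Vector compress() {
      throw new UnsupportedOperationException(
          getClass().getSimpleName() + " does not implement compress()");
    }
  }

  // Written against the original two-method version and never updated.
  public static final class DenseVector extends Vector {
    private final double[] values;

    public DenseVector(double... values) {
      this.values = values.clone();
    }

    @Override
    public int size() {
      return values.length;
    }

    @Override
    public double get(int index) {
      return values[index];
    }
  }

  public static void main(String[] args) {
    Vector v = new DenseVector(1.0, 2.0, 3.0);
    System.out.println(v.sum()); // prints 6.0
    try {
      v.compress();
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage()); // prints "DenseVector does not implement compress()"
    }
  }
}
```

With the interface + abstract class variant, outside code that implements the interface directly instead of extending the abstract class stops compiling as soon as a method is added to the interface; that is the "one less thing to do wrong" in the argument above. (Interface default methods in Java 8 and later narrow this gap, but they postdate this thread.)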
[jira] Commented: (MAHOUT-204) Better integration of Mahout matrix capabilities with Colt Matrix additions
[ https://issues.apache.org/jira/browse/MAHOUT-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782670#action_12782670 ] Sean Owen commented on MAHOUT-204:

OK, I have another pretty big round of changes queued up, as per my last comments. I've also deleted the 'test' classes and demo code, as neither appears to be maintained and neither consists of unit tests. Before I get into some fine-grained work, can anyone comment on what definitely isn't needed, so I don't bother with it? Otherwise I assume it's basically just linear algebra and matrices, not stats stuff, etc.

> Better integration of Mahout matrix capabilities with Colt Matrix additions
> ---
>
> Key: MAHOUT-204
> URL: https://issues.apache.org/jira/browse/MAHOUT-204
> Project: Mahout
> Issue Type: Improvement
> Affects Versions: 0.3
> Reporter: Grant Ingersoll
> Fix For: 0.3
>
> Attachments: MAHOUT-204-author-cleanup.patch
>
> Per MAHOUT-165, we need to refactor the matrix package structures a bit to be
> more coherent and clean. For instance, there are two levels of matrix
> packages now, so those should be rectified.

-- This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).
[ https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782470#action_12782470 ] Isabel Drost commented on MAHOUT-11:

Drew, go ahead then.

> Static fields used throughout clustering code (Canopy, K-Means).
> ---
>
> Key: MAHOUT-11
> URL: https://issues.apache.org/jira/browse/MAHOUT-11
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.1
> Reporter: Dawid Weiss
> Fix For: 0.3
>
> Attachments: MAHOUT-11-kmeans-cleanup.patch,
> MAHOUT-11-RandomSeedGenerator.patch, MAHOUT-11.patch
>
> I file this as a bug, even though I'm not 100% sure it is one. In the current
> code, information is exchanged via static fields (for example, the distance
> measure and thresholds for Canopies are static fields). Is it always true in
> Hadoop that one job runs inside one JVM with exclusive access? I haven't seen
> it anywhere in the Hadoop documentation, and my impression was that everything
> uses JobConf to pass configuration to jobs, and jobs are configured on a
> per-object basis (a job is an object, a mapper is an object, and everything
> else is basically an object).
> If it's possible for two jobs to run in parallel inside one JVM, then this is
> a limitation and bug in our code that needs to be addressed.
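To make the concern concrete, here is a minimal, self-contained Java sketch of what goes wrong if two jobs configured through static fields share one JVM. A plain `Map<String, String>` stands in for Hadoop's `JobConf`, and the class names and the `canopy.t1` key are hypothetical, not Mahout's actual code; the threshold is loosely modeled on the Canopy T1 distance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for JobConf; names are illustrative only.
public class StaticVsInstanceConfigDemo {

  // Problematic pattern: one static threshold shared by every instance in the JVM.
  public static final class StaticCanopyMapper {
    static double t1; // shared: whichever job configures last wins

    public static void configure(Map<String, String> conf) {
      t1 = Double.parseDouble(conf.get("canopy.t1"));
    }

    public boolean withinT1(double distance) {
      return distance < t1;
    }
  }

  // Per-object pattern: each mapper keeps its own copy from its own job's configuration.
  public static final class InstanceCanopyMapper {
    private double t1; // isolated from other jobs running in the same JVM

    public void configure(Map<String, String> conf) {
      t1 = Double.parseDouble(conf.get("canopy.t1"));
    }

    public boolean withinT1(double distance) {
      return distance < t1;
    }
  }

  // Builds a stand-in job configuration with the given threshold.
  public static Map<String, String> conf(double t1) {
    Map<String, String> c = new HashMap<>();
    c.put("canopy.t1", Double.toString(t1));
    return c;
  }

  public static void main(String[] args) {
    // Two "jobs" in one JVM: job A uses T1 = 3.0, job B uses T1 = 1.0.
    StaticCanopyMapper.configure(conf(3.0));
    StaticCanopyMapper jobA = new StaticCanopyMapper();
    StaticCanopyMapper.configure(conf(1.0)); // job B silently overwrites job A's threshold
    System.out.println(jobA.withinT1(2.0)); // prints false, although job A asked for T1 = 3.0

    InstanceCanopyMapper a = new InstanceCanopyMapper();
    a.configure(conf(3.0));
    InstanceCanopyMapper b = new InstanceCanopyMapper();
    b.configure(conf(1.0));
    System.out.println(a.withinT1(2.0)); // prints true: job A keeps its own threshold
  }
}
```

In the per-object version, each mapper reads its settings from its own job's configuration, so the result no longer depends on which job happened to configure the JVM last.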
[jira] Commented: (MAHOUT-204) Better integration of Mahout matrix capabilities with Colt Matrix additions
[ https://issues.apache.org/jira/browse/MAHOUT-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782466#action_12782466 ] Jake Mannix commented on MAHOUT-204:

I think we should be pretty aggressive in removing code from this stuff. There is some core stuff we want (linear stuff and collections, and morphisms which interact with them), and a ton of stuff we don't. Maybe we should have separate JIRA tickets for each thing that could or should be removed?
[jira] Commented: (MAHOUT-204) Better integration of Mahout matrix capabilities with Colt Matrix additions
[ https://issues.apache.org/jira/browse/MAHOUT-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782463#action_12782463 ] Sean Owen commented on MAHOUT-204:

I've been attacking this all day. The changes are already big enough that I'm committing what I have now, but I need to keep going. The major changes still left: replacing System.out with logging, and getting rid of all the type references that use the complete package name. There is lots of dead code, and there are other practices that kind of concern me. If there are changes I think deserve discussion, I'll surface them.

Note, I found some code in here that carries a different copyright: "Copyright PIERSOL Engineering"? See TestMatrix2D. It's commented out, but I think it best to kill it, along with the other commented-out code, actually. Also, class Gamma mentions it's a port of some code from http://www.sci.usq.edu.au/staff/leighb/graph/Top.html and a library called Cephes 2.2. I can't find these now. Should we be concerned?

Bottom line: there is a lot of work to be done on this code.
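A hypothetical before/after sketch of the two mechanical cleanups named above: replacing System.out with a logger, and replacing fully-qualified type references with imports. The class `RowSummer` and its method are invented for illustration, and `java.util.logging` is used only to keep the example dependency-free; the comment does not say which logging library the code should adopt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example class, not taken from the Mahout/Colt code base.
//
// Before (sketch):
//   public static java.util.List<java.lang.Double> rowSums(double[][] m) {
//     System.out.println("summing " + m.length + " rows");
//     ...
//   }
public final class RowSummer {

  // One logger per class replaces ad-hoc System.out printing.
  private static final Logger log = Logger.getLogger(RowSummer.class.getName());

  private RowSummer() {
  }

  // After: short type names via imports, diagnostics via the logger.
  public static List<Double> rowSums(double[][] matrix) {
    log.log(Level.FINE, "summing {0} rows", matrix.length);
    List<Double> sums = new ArrayList<>(matrix.length);
    for (double[] row : matrix) {
      double sum = 0.0;
      for (double value : row) {
        sum += value;
      }
      sums.add(sum);
    }
    return sums;
  }
}
```

Logging at FINE keeps the diagnostic available without printing anything by default, which is the usual reason to prefer a logger over System.out in library code.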
[jira] Commented: (MAHOUT-204) Better integration of Mahout matrix capabilities with Colt Matrix additions
[ https://issues.apache.org/jira/browse/MAHOUT-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782370#action_12782370 ] Sean Owen commented on MAHOUT-204:

I'm still working on this. The reformatting was simple, but IntelliJ is having a field day with its inspections and I'm slogging through them all. I'm focusing mostly on style issues.
Re: SVM algo, code, etc.
On Fri Grant Ingersoll wrote:
> On Nov 19, 2009, at 1:15 PM, Sean Owen wrote:
> > Post a patch if you'd like to proceed, IMHO.
> +1

+1 from me as well. I would love to see solid SVM support in Mahout.

Isabel