Re: Dangling collections in front of commons
Sounds good to me. On Fri, Apr 2, 2010 at 6:42 PM, Benson Margulies bimargul...@gmail.com wrote: Does anyone object if I send a suggestion to the commons PMC that mahout-collections would make more sense as commons-something-or-another? I don't expect to get anywhere, but I thought I'd try.
Re: Dangling collections in front of commons
I'm neutral... maybe let it marinate longer in Mahout, prove it's used and worthwhile and such? I think the question will be, well, doesn't that conflict with Commons Collections, and so, are we suggesting pushing into Collections, and can we make an argument that it complements Collections? On Sat, Apr 3, 2010 at 2:42 AM, Benson Margulies bimargul...@gmail.com wrote: Does anyone object if I send a suggestion to the commons PMC that mahout-collections would make more sense as commons-something-or-another? I don't expect to get anywhere, but I thought I'd try.
Re: [collections] and what about 'identity'?
The source code to HPPC is public and accessible, so you are more than welcome to peek/ contribute/ take whatever you want, Benson. Dawid On Fri, Apr 2, 2010 at 10:45 PM, Benson Margulies bimargul...@gmail.com wrote: Dawid, Now I recall why I stopped working on features of Mahout collections :-) HPPC. We'll see who gets where first. --benson On Fri, Apr 2, 2010 at 10:06 AM, Dawid Weiss dawid.we...@gmail.com wrote: What's the use case for needing to vary the hash function? It's one of those things where I assume there are incorrect ways to do it, and correct ways, and among the correct ways fairly clear arguments about which function will be better -- i.e. the object should provide the best function. Unfortunately this is not true -- just recently I hit a use case where the keys stored were Long values and their distribution had very low variance in the lower bits. HPPC implemented open hashing using 2^n arrays, and hashes were reduced modulo a bitmask... this caused really, really long conflict chains for values that were actually very different. I looked at how the JDK's HashMap solves this problem -- it does a simple rehashing scheme internally (so it's the object's hash and then a remixing hash in a cascade). I finally decided to allow external hash functions AND changed the _default_ hash function used for remixing to be murmur hash. Performance benchmarks show this yields virtually no degradation in execution time (the CPUs seem to spend most of their time waiting on cache misses anyway, so internal rehashing is not an issue). I must also apologize for a bit of inactivity with HPPC... Like I said, we have released it internally on our labs Web site here: http://labs.carrotsearch.com/hppc.html It doesn't mean we turn our backs on contributing HPPC to Mahout -- the opposite, we would love to do it. 
But contrary to what I originally thought (to push HPPC to Mahout as soon as possible) I kind of grew reluctant because so many things are missing (equals/hashcode, java collections adapters) or can be improved (documentation, faster iterators). So... I'm still going to experiment with HPPC in our labs, especially API-wise, release one or two versions in between and then kindly ask you to peek at the final (?) result and consider moving the code under Mahout umbrella. Sounds good? Dawid
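As an aside for readers following the hash discussion above: the remixing Dawid describes -- running the key's own hash through a murmur-style finalizer before masking into a power-of-two table -- can be sketched roughly as follows. This is a minimal illustration, not HPPC's actual code; the class name is invented, and the constants are murmur3's 64-bit finalizer constants.

```java
// Sketch (not HPPC code): murmur3's 64-bit finalizer used to remix a key's
// hash so that entropy from the high bits reaches the low bits before the
// table mask is applied.
public final class HashMixing {
    // murmur3 fmix64 finalizer.
    static long mix(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    public static void main(String[] args) {
        int mask = (1 << 16) - 1; // power-of-two table of 2^16 slots
        // Keys that differ only in their high bits collide without remixing:
        long a = 1L << 40, b = 2L << 40, c = 3L << 40;
        System.out.println((a & mask) + " " + (b & mask) + " " + (c & mask)); // all 0
        // After remixing they land in distinct, well-spread slots:
        System.out.println((mix(a) & mask) + " " + (mix(b) & mask) + " " + (mix(c) & mask));
    }
}
```

Because the xor-shifts and odd multiplications are each invertible, the finalizer is a bijection on 64-bit values, so distinct keys keep distinct mixed hashes; only the entropy distribution changes.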
Re: Dangling collections in front of commons
On Apr 3, 2010, at 5:17 AM, Sean Owen wrote: I'm neutral... maybe let it marinate longer in Mahout, prove it's used and worthwhile and such? Yeah, I'd tend to agree here. Let's see if we get some contributions on it and how it plays out for us. I think the question will be, well, doesn't that conflict with Commons Collections, and so, are we suggesting pushing into Collections, and can we make an argument that it complements Collections? I think it does, since it focuses on primitives. On Sat, Apr 3, 2010 at 2:42 AM, Benson Margulies bimargul...@gmail.com wrote: Does anyone object if I send a suggestion to the commons PMC that mahout-collections would make more sense as commons-something-or-another? I don't expect to get anywhere, but I thought I'd try.
[jira] Updated: (MAHOUT-344) Minhash based clustering
[ https://issues.apache.org/jira/browse/MAHOUT-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cristi Prodan updated MAHOUT-344: - Status: Patch Available (was: Open) Thank you guys for all the encouragement and advice. I'm committing my first patch for MinHash clustering. The patch contains the following:
- in core - minhash:
* MinHashMapRed - removed the distributed hash need; each mapper generates the same hash functions using the same seed (as per instructions from Ankur).
* RandomLinearHashFunction - added another random linear hash function of the form h(x) = (a*x + b) mod p. p will be as big as possible (1000) and it should be prime (not done yet, but committing in this form due to some time restrictions).
- in examples - minhash directory:
* DisplayMinHash - contains an example of running min-hash, with the options commented. It's basically the main function from MinHashMapRed.
* PrepareDataset - this class converts the last-fm database suggested above into a format readable by the MinHash algorithm. It also shows a progress bar with the percent done :). For the future I believe all the code in the algorithm should take a more generalized form and use Mahout's Vector classes; users could then either write their own version against the Vector interface or create a tool that converts their dataset to the vector format the code understands. MurmurHash is used by PrepareDataset to hash the strings which denote users (in the original last_fm dataset) to integers.
* TestClusterQuality - takes a clustered file generated by the minhash algorithm and computes the average similarity for each cluster, aggregated over all clusters. Within each cluster the mean is computed as SUM(similarity(item_i, item_j)) / TOTAL_SIMILARITIES for i != j, where TOTAL_SIMILARITIES = n! / (k! * (n - k)!), n = total number of items in the cluster, k = 2. The aggregated mean is the mean of all these values. As an example:
Having the following input (the first column is the user; the rest of each line are the items preferred - browsed, listened to - by that user; the contents of each cluster below use the same format):
1 1 2 3 4 5
2 1 2 3 4 6
3 7 6 3 8 9
4 7 8 9 6 1
5 5 6 7 8 9
6 8 7 5 6
we get the following output (PARAMETERS: 20 hash functions, 4 keygroups (hash indices in a bucket), 2 = minimum items within a cluster):
CLUSTER ID -- 2359983695385880352354530253637788 (items=2) =
2 1 2 3 4 6
1 1 2 3 4 5
CLUSTER ID -- 236643825172184878353970117486898894 (items=2) =
4 7 8 9 6 1
3 7 6 3 8 9
CLUSTER ID -- 35606006580772015548743126287496777 (items=2) =
6 8 7 5 6
5 5 6 7 8 9
CLUSTER ID -- 38797144231157365543316465389702468 (items=2) =
6 8 7 5 6
5 5 6 7 8 9
The aggregated average over these clusters is 0.6793650793650793. I'm now testing on the last_fm dataset. The problem I currently encounter is that the size of the clustered file is kind of big, but I'm working on tuning the params. Minhash based clustering - Key: MAHOUT-344 URL: https://issues.apache.org/jira/browse/MAHOUT-344 Project: Mahout Issue Type: Bug Components: Clustering Affects Versions: 0.3 Reporter: Ankur Assignee: Ankur Attachments: MAHOUT-344-v1.patch Minhash clustering performs probabilistic dimension reduction of high dimensional data. The essence of the technique is to hash each item using multiple independent hash functions such that the probability of collision of similar items is higher. Multiple such hash tables can then be constructed to answer near neighbor type of queries efficiently. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
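To make the seeded hash-function setup in the patch notes above concrete, here is a minimal sketch (not the MAHOUT-344 patch itself; the class name and the choice of prime are invented for illustration) of a family of random linear hash functions h(x) = (a*x + b) mod p and the MinHash signature built from an item set:

```java
import java.util.Random;

// Sketch (not the MAHOUT-344 patch): a seeded family of random linear hash
// functions h(x) = (a*x + b) mod p, p prime, and the MinHash signature of an
// item set as the per-function minimum hash value.
public final class MinHashSketch {
    static final int P = 2147483647; // 2^31 - 1, a Mersenne prime (invented choice)
    final int[] a, b;

    MinHashSketch(int numHashes, long seed) {
        // Same seed => every mapper regenerates the identical hash family,
        // which is the trick that removes the need to distribute the hashes.
        Random r = new Random(seed);
        a = new int[numHashes];
        b = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + r.nextInt(P - 1);
            b[i] = r.nextInt(P);
        }
    }

    // Signature = for each hash function, the minimum hash over the item set.
    long[] signature(int[] items) {
        long[] sig = new long[a.length];
        java.util.Arrays.fill(sig, Long.MAX_VALUE);
        for (int x : items)
            for (int i = 0; i < a.length; i++)
                sig[i] = Math.min(sig[i], ((long) a[i] * x + b[i]) % P);
        return sig;
    }
}
```

Items whose sets are similar agree on many signature positions, so hashing groups of signature positions ("keygroups") into bucket keys clusters similar users together with high probability.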
[jira] Updated: (MAHOUT-344) Minhash based clustering
[ https://issues.apache.org/jira/browse/MAHOUT-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cristi Prodan updated MAHOUT-344: - Attachment: MAHOUT-344-v2.patch See comment above for this patch. Minhash based clustering - Key: MAHOUT-344 URL: https://issues.apache.org/jira/browse/MAHOUT-344 Project: Mahout Issue Type: Bug Components: Clustering Affects Versions: 0.3 Reporter: Ankur Assignee: Ankur Attachments: MAHOUT-344-v1.patch, MAHOUT-344-v2.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
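The TestClusterQuality measure described in the earlier patch comment -- within each cluster, the mean of pairwise similarities over all C(n,2) = n!/(2!(n-2)!) item pairs, then averaged over clusters -- might look roughly like this. This is a sketch with invented names, using Jaccard similarity as an assumption; the patch's actual similarity function may differ.

```java
import java.util.*;

// Sketch (not the patch's code): per-cluster mean of pairwise similarities,
// where the pair count is the binomial coefficient C(n, 2).
public final class ClusterQuality {
    // Jaccard similarity of two item sets: |A ∩ B| / |A ∪ B|.
    static double jaccard(Set<Integer> a, Set<Integer> b) {
        Set<Integer> inter = new HashSet<>(a); inter.retainAll(b);
        Set<Integer> union = new HashSet<>(a); union.addAll(b);
        return (double) inter.size() / union.size();
    }

    // Mean similarity over the C(n,2) unordered item pairs in one cluster.
    static double clusterMean(List<Set<Integer>> items) {
        double sum = 0;
        int pairs = 0;
        for (int i = 0; i < items.size(); i++)
            for (int j = i + 1; j < items.size(); j++) {
                sum += jaccard(items.get(i), items.get(j));
                pairs++;
            }
        return pairs == 0 ? 0 : sum / pairs;
    }
}
```

For the first example cluster above (users 1 and 2, item sets {1,2,3,4,5} and {1,2,3,4,6}), the single pair has Jaccard similarity 4/6, so the cluster mean is 4/6 under this assumed similarity.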
Re: Dangling collections in front of commons
I'm neutral... maybe let it marinate longer in Mahout, prove it's used and worthwhile and such? Yeah, I'd tend to agree here. Let's see if we get some contributions on it and how it plays out for us. Marination is exactly why I work on HPPC separately from Mahout... Once you let the API out in the open, it's much more difficult/problematic to change it. D.
GSoC - Implementing SOM
Dear Mahout Developers, I am an undergraduate student finishing my final year. For my final year project I got to work with Hadoop MapReduce and HDFS; furthermore, I also had to use Mahout's clustering algorithms on some datasets. One of my project mentors proposed implementing Self Organizing Maps for that, but SOM has not yet been implemented in Mahout, so I thought: why not do it myself? Here I am, open to your comments and suggestions on the scope of this project. Thanking you all, Hifsa Kazmi
A request for prospective GSOC students
I am having a tough time separating Mahout proposals from the rest of Apache on the GSoC website. So I would request you all to reply to this thread when you have submitted a proposal, so that we don't miss out on reading the proposal you worked hard on. For now I could only find Zhao Zhendong's LIBLINEAR proposal. If anyone else has applied, do reply with the title of the proposal. Robin
Re: A request for prospective GSOC students
Dear Robin and other contributors, Nice to meet you. I am a PhD student at the University of Central Florida. I submitted a proposal to Google Summer of Code 2010 with the title Implement Map/Reduce Enabled Neural Networks (mahout-342). Any suggestions and advice are very welcome. I am still allowed to make corrections to it before April 9th. Thank you! -- Regards, Yinghua On Sat, Apr 3, 2010 at 11:37 AM, Robin Anil robin.a...@gmail.com wrote: I am having a tough time separating Mahout proposals from rest of Apache on gsoc website. So I would request you all to reply to this thread when you have submitted a proposal so that we don't miss out on reading your hard worked proposal. For now I could only find Zhao Zhendong's LIBLINEAR proposal. If anyone else have applied do reply back with the title of the proposal. Robin
Re: A request for prospective GSOC students
Thanks! I just noticed your proposal. My advice to everyone would be to be clear about what you want to do, rather than about the related content and theory of the algorithm. So really expand the design, implementation and timeline sections. Robin On Sat, Apr 3, 2010 at 9:18 PM, yinghua hu yinghua...@gmail.com wrote: Dear Robin and other contributors, Nice to meet you. I am a PhD student in University of Central Florida. I submitted a proposal to Google Summer of Code 2010 with title Implement Map/Reduce Enabled Neural Networks (mahout-342). Any suggestions and advice are very welcome. I am still allowed to do correction on it before April 9th. Thank you! -- Regards, Yinghua On Sat, Apr 3, 2010 at 11:37 AM, Robin Anil robin.a...@gmail.com wrote: I am having a tough time separating Mahout proposals from rest of Apache on gsoc website. So I would request you all to reply to this thread when you have submitted a proposal so that we don't miss out on reading your hard worked proposal. For now I could only find Zhao Zhendong's LIBLINEAR proposal. If anyone else have applied do reply back with the title of the proposal. Robin
[GSOC] 2010 Timelines
http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs#timeline
[jira] Commented: (MAHOUT-332) Create adapters for MYSQL and NOSQL(hbase, cassandra) to access data for all the algorithms to use
[ https://issues.apache.org/jira/browse/MAHOUT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853167#action_12853167 ] Necati Batur commented on MAHOUT-332: - I am a senior computer engineering student at iztech in Turkey. My areas of interest are information management, OOP (Object Oriented Programming) and, currently, bioinformatics. I have been working with an Assistant Professor (Jens Allmer) in the molecular biology and genetics department for one year. First, we worked on a protein database, 2DB, and we presented the project at the HIBIT09 organization. The project was "Database management system independence by amending 2DB with a database access layer (written in Java)". Currently, I am working on another project (Kerb) as my senior project, a general sequential task management system intended to reduce errors and save time in biological experiments. We will present this project at HIBIT2010 too. I am confident working with new technologies. I took the Data Structures I and II courses at university, so I am comfortable with data structures; most importantly, I am interested in databases. From my software engineering courses I know how to work on a project with iterative development and timelines. In order to add more functionality, I need a mentor to contact for this project. 
Create adapters for MYSQL and NOSQL(hbase, cassandra) to access data for all the algorithms to use --- Key: MAHOUT-332 URL: https://issues.apache.org/jira/browse/MAHOUT-332 Project: Mahout Issue Type: New Feature Reporter: Robin Anil A student with a good proposal - should be free to work for Mahout in the summer and should be thrilled to work in this area :) - should be able to program in Java and be comfortable with data structures and algorithms - must explore SQL and NOSQL implementations, and design a framework with which data from them could be fetched and converted to Mahout format or used directly as a matrix transparently - should have a plan to make it high performance with ample caching strategies or the ability to use it in a map/reduce job - should focus more on getting a working version than on implementing all functionality, so it's recommended that you divide features into milestones - must have clear deadlines and pace them evenly across the span of 3 months. If you can do something extra it counts, but make sure the plan is reasonable within the specified time frame. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Dangling collections in front of commons
Commons-collections turns out to be a very specific thing, which this is not. I have an intermediate proposal that I'll put in a separate thread. On Sat, Apr 3, 2010 at 7:06 AM, Dawid Weiss dawid.we...@gmail.com wrote: I'm neutral... maybe let it marinate longer in Mahout, prove it's used and worthwhile and such? Yeah, I'd tend to agree here. Let's see if we get some contributions on it and how it plays out for us. Marination is exactly why I work on HPPC separately from Mahout... Once you let the API out in the open, it's much more difficult/problematic to change it. D.
Proposal: make collections releases independent of the rest of Mahout
I propose to disconnect collections from the aggregate project and put it on its own release cycle. This was originally someone else's idea when we started on it. Collections is useful in its own right, and I'd like to make fixes to it available without having the whole rest of Mahout reach a release point. I confess that the slf4j dependency in collections is a very strong local motivation to me, but it also seems right in principle. When we go TLP, we can organize this more coherently in svn, but for now we can leave it where it is, but fix up the poms. This strikes me as consistent with the idea of marinating with possible intent that it would become its own thing some day.
[jira] Commented: (MAHOUT-332) Create adapters for MYSQL and NOSQL(hbase, cassandra) to access data for all the algorithms to use
[ https://issues.apache.org/jira/browse/MAHOUT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853174#action_12853174 ] Robin Anil commented on MAHOUT-332: --- Hi Necati, Take a look at the matrix and vector classes in Mahout, and read up on how Mahout converts text into vectors. We need a generic framework where data from databases can be iterated over as vectors, so that algorithms can use it seamlessly. The current VectorWritable could be extended to, say, a database-backed vector, which should read each field and convert it to a vector on the fly using a pre-populated dictionary. This could be easily consumed by the Mahout algorithms. The database-backed vector should be configurable enough that fields can be selected. I am sure there are frameworks that already do this. Drew Farris is working on a document structure for Mahout using Avro; I am sure he will have more input on how these adapters should fit with his structure. Create adapters for MYSQL and NOSQL(hbase, cassandra) to access data for all the algorithms to use --- Key: MAHOUT-332 URL: https://issues.apache.org/jira/browse/MAHOUT-332 Project: Mahout Issue Type: New Feature Reporter: Robin Anil -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
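One hedged sketch of the adapter idea Robin describes (all names are invented; a row is modeled as a field-to-value map rather than a JDBC ResultSet to keep the example self-contained): a pre-populated dictionary maps field values to vector indices, and the selected fields of each row are converted to sparse-vector entries on the fly.

```java
import java.util.*;

// Hypothetical sketch of a database-row-to-sparse-vector adapter. The
// dictionary maps field values to vector indices; only the configured
// fields of each row contribute entries.
final class RowVectorizer {
    final Map<String, Integer> dictionary; // field value -> vector index

    RowVectorizer(Map<String, Integer> dictionary) {
        this.dictionary = dictionary;
    }

    // Convert one row's selected fields into (index, weight) pairs,
    // using a simple count weighting as a placeholder strategy.
    Map<Integer, Double> vectorize(Map<String, String> row, List<String> fields) {
        Map<Integer, Double> v = new HashMap<>();
        for (String f : fields) {
            Integer idx = dictionary.get(row.get(f));
            if (idx != null) v.merge(idx, 1.0, Double::sum);
        }
        return v;
    }
}
```

A real implementation would wrap this in an Iterable over a result set or scan, so algorithms could stream vectors without materializing the table.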
Re: Proposal: make collections releases independent of the rest of Mahout
On Sat, Apr 3, 2010 at 6:47 PM, Benson Margulies bimargul...@gmail.com wrote: I confess that the slf4j dependency in collections is a very strong local motivation to me, but it also seems right in principle. I just killed this BTW. (There was one dangling log statement... not worth a dependency.) When we go TLP, we can organize this more coherently in svn, but for now we can leave it where it is, but fix up the poms. Actually it seems like this is a valid subproject of a Mahout TLP in its own right, if that would be a useful middle-ground status. This strikes me as consistent with the idea of marinating with possible intent that it would become its own thing some day. Yes, it's already its own module, which helps manage it independently. At the moment that means anyone can depend on it, and only it, via Maven, which is 80% of the value. I think it probably needs a fair bit of API rethinking and cleanup to truly stand as a general-purpose and reusable component, but that can happen.
Re: Proposal: make collections releases independent of the rest of Mahout
On Sat, Apr 3, 2010 at 2:07 PM, Sean Owen sro...@gmail.com wrote: On Sat, Apr 3, 2010 at 6:47 PM, Benson Margulies bimargul...@gmail.com wrote: I confess that the slf4j dependency in collections is a very strong local motivation to me, but it also seems right in principle. I just killed this BTW. (There was one dangling log statement... not worth a dependency.) Yes, thank you. My selfish short-term goal is to get a release with the log dependency removed out before Mahout 0.4 :-). When we go TLP, we can organize this more coherently in svn, but for now we can leave it where it is, but fix up the poms. Actually it seems like this a valid subproject of a Mahout TLP in its own right, if that would be a useful middle-ground status. I'm not trying to suggest anything different. I'm opposed to having 'separate committers', but I'm happy to have multiple releasable components all in the Mahout TLP. This strikes me as consistent with the idea of marinating with possible intent that it would become its own thing some day. Yes it's already its own module, which helps manage it independently. At the moment that means anyone can depend on it, and only it, via Maven, which is 80% of the value. I think it probably needs a fair bit of API rethinking and cleanup to truly stand as a general purpose and reusable component, but that can happen. No argument there. Practical point: it would be, all joking aside, good to make a very prompt release of this so that the rest of Mahout 0.4-SNAPSHOT could depend on it. If no one protests, I'll do the POM surgery in a couple of days.
[jira] Commented: (MAHOUT-332) Create adapters for MYSQL and NOSQL(hbase, cassandra) to access data for all the algorithms to use
[ https://issues.apache.org/jira/browse/MAHOUT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853177#action_12853177 ] Necati Batur commented on MAHOUT-332: - Well, it will not be too hard to understand the conversion of data into vectors if there is source code and an algorithm already :) However, could you please give me the necessary links to check out? On the website there are so many repositories that I can hardly tell what is where. Nonetheless, how should I write a proposal, if I am asked to write one? Thanks. Create adapters for MYSQL and NOSQL(hbase, cassandra) to access data for all the algorithms to use --- Key: MAHOUT-332 URL: https://issues.apache.org/jira/browse/MAHOUT-332 Project: Mahout Issue Type: New Feature Reporter: Robin Anil -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: A request for prospective GSOC students
Robin, These are very helpful suggestions! Design and implementation details are exactly what should be strengthened in this proposal. I will dig deeper if time allows. Thanks a lot! On Sat, Apr 3, 2010 at 11:51 AM, Robin Anil robin.a...@gmail.com wrote: Thanks! I just noticed your proposal. My advice to everyone would be to be clear on what you want to do instead of the related content and theory about any algorithm. So really expand the design, implementation and time line sections. Robin On Sat, Apr 3, 2010 at 9:18 PM, yinghua hu yinghua...@gmail.com wrote: Dear Robin and other contributors, Nice to meet you. I am a PhD student in University of Central Florida. I submitted a proposal to Google Summer of Code 2010 with title Implement Map/Reduce Enabled Neural Networks (mahout-342). Any suggestions and advice are very welcome. I am still allowed to do correction on it before April 9th. Thank you! -- Regards, Yinghua On Sat, Apr 3, 2010 at 11:37 AM, Robin Anil robin.a...@gmail.com wrote: I am having a tough time separating Mahout proposals from rest of Apache on gsoc website. So I would request you all to reply to this thread when you have submitted a proposal so that we don't miss out on reading your hard worked proposal. For now I could only find Zhao Zhendong's LIBLINEAR proposal. If anyone else have applied do reply back with the title of the proposal. Robin -- Regards, Yinghua
[jira] Commented: (MAHOUT-332) Create adapters for MYSQL and NOSQL(hbase, cassandra) to access data for all the algorithms to use
[ https://issues.apache.org/jira/browse/MAHOUT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853182#action_12853182 ] Robin Anil commented on MAHOUT-332: --- Conversion of arbitrary data in a database to vectors would be along the same lines as the conversion of the ARFF format to vectors. You can find the code under trunk/utils. It treats boolean, enum, numeric and string datatypes separately. That code may still need some more tweaking so that the entire ARFF spec is supported, but it's a good starting point for understanding how data is converted to vectors. Also look at SparseVectorsFromSequenceFiles to understand how text documents in a SequenceFile (you need to understand this as well) are converted to vectors using tf-idf based weighting. So, in short, there could be many weighting strategies. It would be really nice if you could make this pluggable, so that users of the library could provide custom weighting techniques for each field. Create adapters for MYSQL and NOSQL(hbase, cassandra) to access data for all the algorithms to use --- Key: MAHOUT-332 URL: https://issues.apache.org/jira/browse/MAHOUT-332 Project: Mahout Issue Type: New Feature Reporter: Robin Anil -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
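The pluggable weighting Robin suggests could be as simple as one interface per weighting scheme. Below is a sketch with invented names (Mahout's actual tf-idf code is not shown here) contrasting raw counts with classic tf-idf:

```java
// Sketch (invented names): pluggable term/field weighting, with raw-count and
// tf-idf strategies as interchangeable implementations.
interface TermWeight {
    // tf = term count in this document/row, df = documents containing the
    // term, n = total number of documents.
    double weight(int tf, int df, int n);
}

final class Weighting {
    // Raw counts: weight is just the term frequency.
    static final TermWeight RAW = (tf, df, n) -> tf;

    // Classic tf-idf: terms appearing in nearly every document (df close
    // to n) are damped toward zero.
    static final TermWeight TFIDF = (tf, df, n) -> tf * Math.log((double) n / df);
}
```

An adapter could then accept a per-field TermWeight, so a user id column and a free-text column get different weighting without changing the vectorization code.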
[jira] Commented: (MAHOUT-323) Classify new data using Decision Forest
[ https://issues.apache.org/jira/browse/MAHOUT-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853185#action_12853185 ] Deneche A. Hakim commented on MAHOUT-323: - Committed a basic mapreduce version of TestForest. If you pass -mr to TestForest it will use Hadoop to classify the data. Each input file is processed by exactly one mapper. For now you can't compute the confusion matrix with the mapreduce version... this should come in the next commit. Classify new data using Decision Forest --- Key: MAHOUT-323 URL: https://issues.apache.org/jira/browse/MAHOUT-323 Project: Mahout Issue Type: Improvement Components: Classification Affects Versions: 0.4 Reporter: Deneche A. Hakim Assignee: Deneche A. Hakim Attachments: mahout-323.patch When building a Decision Forest we should be able to store it somewhere and use it later to classify new datasets -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: A request for prospective GSOC students
Hi, My proposal had the following subject: Mahout GSoC 2010 proposal: Association Mining. It was missing a time schedule and further implementation details. I can work on those missing parts, but I was rather expecting some general discussion about the topic first, before I invest time in planning and other details. I can see that Mahout is getting a lot of proposals, and I think some of them will get reasonable interest from the community. That said, I think I am fine working on association mining my own way, without being limited/pushed by the GSoC timeline into compromises that I do not need to make now. However, comments from the community about my proposal are still warmly welcome. Regards, Lukas On Sat, Apr 3, 2010 at 5:37 PM, Robin Anil robin.a...@gmail.com wrote: I am having a tough time separating Mahout proposals from rest of Apache on gsoc website. So I would request you all to reply to this thread when you have submitted a proposal so that we don't miss out on reading your hard worked proposal. For now I could only find Zhao Zhendong's LIBLINEAR proposal. If anyone else have applied do reply back with the title of the proposal. Robin
Re: Proposal: make collections releases independent of the rest of Mahout
On Apr 3, 2010, at 2:22 PM, Benson Margulies wrote: On Sat, Apr 3, 2010 at 2:07 PM, Sean Owen sro...@gmail.com wrote: Actually it seems like this a valid subproject of a Mahout TLP in its own right, if that would be a useful middle-ground status. I'm not trying to suggest anything different. I'm opposed to having 'separate committers', but I'm happy to have multiple releasable components all in the Mahout TLP. For those following the sub project saga in Lucene, let's not go down that road. +1 to releasable components, though. We can release what we want when we want. It doesn't have to be the whole thing all the time. But I'd say no to separate committers, etc.