Algorithms for categorical data
I've noticed (correct me if I'm wrong) that Mahout lacks algorithms specialized in clustering data with categorical attributes. Would the community be interested in an implementation of algorithms like ROCK (http://www.cis.upenn.edu/~sudipto/mypapers/categorical.pdf)? I'm currently working in this area (an applied-research project) and I'd like to have my code open-sourced.
Re: Algorithms for categorical data
Do you already have one implemented?

2013/6/2 Florents Tselai tse...@dmst.aueb.gr wrote:
> I've noticed (correct me if I'm wrong) that mahout lacks algorithms specialized in clustering data with categorical attributes. Would the community be interested in the implementation of algorithms like ROCK http://www.cis.upenn.edu/~sudipto/mypapers/categorical.pdf ? I'm currently working on this area (applied-research project) and I'd like to have my code open-sourced.

--
Yexi Jiang, ECS 251, yjian...@cs.fiu.edu
School of Computer and Information Science, Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/
Re: Algorithms for categorical data
Not yet. I'm currently experimenting with various implementations in Python.

On Sun, Jun 2, 2013 at 9:43 PM, Yexi Jiang yexiji...@gmail.com wrote:
> Do you already have one implemented?
Re: Algorithms for categorical data
You mean you are testing on a single-machine version?

2013/6/2 Florents Tselai tse...@dmst.aueb.gr wrote:
> Not yet. I'm currently experimenting with various implementation in Python.
Re: Algorithms for categorical data
Yes

On Sun, Jun 2, 2013 at 9:56 PM, Yexi Jiang yexiji...@gmail.com wrote:
> You mean you are testing on the single machine version?
Re: Algorithms for categorical data
So Florents, can you say how this works better than 1 of n coding and then using a simple scaled Euclidean metric? Beyond that, how would this scale?

On Sun, Jun 2, 2013 at 2:39 PM, Florents Tselai tse...@dmst.aueb.gr wrote:
> I've noticed (correct me if I'm wrong) that mahout lacks algorithms specialized in clustering data with categorical attributes. Would the community be interested in the implementation of algorithms like ROCK http://www.cis.upenn.edu/~sudipto/mypapers/categorical.pdf ? I'm currently working on this area (applied-research project) and I'd like to have my code open-sourced.
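
For readers following the thread, the baseline mentioned above (1 of n coding followed by a scaled Euclidean metric) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the toy records, and the per-attribute weights are hypothetical and are not taken from Mahout or from the ROCK paper.

```python
import math


def one_of_n_encode(records):
    """Encode categorical records as 0/1 vectors (1 of n coding).

    Hypothetical helper for illustration only.
    Returns the vectors and, for each column, the index of the
    attribute that column was derived from.
    """
    n_attrs = len(records[0])
    index = {}      # (attribute index, value) -> column number
    col_attr = []   # column number -> attribute index
    for attr in range(n_attrs):
        # Sorted distinct values give a stable column order.
        for value in sorted({r[attr] for r in records}):
            index[(attr, value)] = len(col_attr)
            col_attr.append(attr)
    vectors = []
    for r in records:
        vec = [0.0] * len(col_attr)
        for attr, value in enumerate(r):
            vec[index[(attr, value)]] = 1.0
        vectors.append(vec)
    return vectors, col_attr


def scaled_euclidean(a, b, col_attr, weights):
    """Euclidean distance with one (assumed) scale weight per attribute."""
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        total += weights[col_attr[j]] * (x - y) ** 2
    return math.sqrt(total)


# Toy data: two categorical attributes (colour, size).
records = [("red", "small"), ("red", "large"), ("blue", "large")]
vectors, col_attr = one_of_n_encode(records)

# With a weight of 0.5 per attribute, each mismatched attribute adds
# exactly 1 to the squared distance (two columns differ, 0.5 each).
weights = [0.5, 0.5]
d01 = scaled_euclidean(vectors[0], vectors[1], col_attr, weights)
d02 = scaled_euclidean(vectors[0], vectors[2], col_attr, weights)
print(d01, d02)
```

With this particular weighting the squared distance is simply the number of attributes on which two records disagree; other scalings (for example, down-weighting attributes with many distinct values) are equally plausible choices.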
Re: Algorithms for categorical data
Also, I was just reading the paper you referred to. It makes what seem to me to be a series of somewhat strawman arguments against 1 of n encoding. First, actual practice often involves Euclidean distances between points on a sphere S^n rather than unrestricted points in R^n. This helps quite a lot. Another vein of usage is to encode points using 1 of n coding and then embed them based on cooccurrence in a user history matrix. Euclidean distance works well there as well.

Neither of these approaches is addressed in the justification of your paper. I haven't read enough or thought enough to talk about your methods yet.

On Sun, Jun 2, 2013 at 3:18 PM, Ted Dunning ted.dunn...@gmail.com wrote:
> So Florents, can you say how this works better than 1 of n coding and then using a simple scaled Euclidean metric? Beyond that, how would this scale?
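
As a rough illustration of the first point above (Euclidean distance between points restricted to a sphere), the sketch below rescales 1 of n coded vectors to unit length. For unit vectors the squared Euclidean distance equals 2 - 2*cos(a, b), so the metric then depends only on the angle between records. The helper names and sample vectors are assumptions made for this example; the thread does not specify any particular implementation.

```python
import math


def normalize(v):
    """Project a vector onto the unit sphere (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Two 1 of n coded records that agree on one of two attributes.
a = normalize([0.0, 1.0, 0.0, 1.0])
b = normalize([0.0, 1.0, 1.0, 0.0])

# On the unit sphere: ||a - b||^2 = 2 - 2 * cos(a, b).
dist = euclidean(a, b)
from_cosine = math.sqrt(2.0 - 2.0 * cosine(a, b))
print(dist, from_cosine)
```

The two printed values agree up to floating-point rounding, which is the sense in which Euclidean distance on the sphere behaves like an angular measure rather than an unrestricted distance in R^n.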