Algorithms for categorical data

2013-06-02 Thread Florents Tselai
I've noticed (correct me if I'm wrong) that Mahout lacks algorithms
specialized in clustering data with categorical attributes.

Would the community be interested in the implementation of algorithms like
ROCK http://www.cis.upenn.edu/~sudipto/mypapers/categorical.pdf ?

I'm currently working on this area (applied-research project) and I'd like
to have my code open-sourced.


Re: Algorithms for categorical data

2013-06-02 Thread Yexi Jiang
Do you already have one implemented?


-- 
--
Yexi Jiang,
ECS 251,  yjian...@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/


Re: Algorithms for categorical data

2013-06-02 Thread Florents Tselai
Not yet.

I'm currently experimenting with various implementations in Python.




Re: Algorithms for categorical data

2013-06-02 Thread Yexi Jiang
You mean you are testing a single-machine version?




Re: Algorithms for categorical data

2013-06-02 Thread Florents Tselai
Yes




Re: Algorithms for categorical data

2013-06-02 Thread Ted Dunning
So Florents, can you say how this works better than 1-of-n coding followed
by a simple scaled Euclidean metric?

Beyond that, how would this scale?
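
For readers unfamiliar with the baseline being asked about, here is a minimal
sketch of 1-of-n coding followed by a scaled Euclidean distance. The function
names, the sample records, and the particular scaling (dividing by
sqrt(2 * number of attributes), so that records differing in every attribute
sit at distance 1) are assumptions for illustration; the thread does not say
which scaling is meant.

```python
import math

def one_hot(records):
    """Encode each record (a tuple of categorical values) as a 0/1 vector."""
    n_attrs = len(records[0])
    # One column per (attribute index, value) pair, in a stable order.
    columns = sorted({(i, rec[i]) for rec in records for i in range(n_attrs)})
    index = {col: k for k, col in enumerate(columns)}
    vectors = []
    for rec in records:
        vec = [0.0] * len(columns)
        for i, value in enumerate(rec):
            vec[index[(i, value)]] = 1.0
        vectors.append(vec)
    return vectors

def scaled_euclidean(a, b, n_attrs):
    """Euclidean distance divided by sqrt(2 * n_attrs) (assumed scaling)."""
    raw = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return raw / math.sqrt(2 * n_attrs)

# Hypothetical records with two categorical attributes each.
records = [("red", "small"), ("red", "large"), ("blue", "large")]
vectors = one_hot(records)

d_one_diff = scaled_euclidean(vectors[0], vectors[1], 2)  # one attribute differs
d_all_diff = scaled_euclidean(vectors[0], vectors[2], 2)  # both attributes differ
```

With this encoding, every attribute on which two records disagree adds 2 to
the squared distance, so the metric is a monotone function of the number of
mismatched attributes.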







Re: Algorithms for categorical data

2013-06-02 Thread Ted Dunning
Also, I was just reading the paper you referred to.  It makes what seem to
me to be a series of somewhat strawman arguments against 1-of-n encoding.

First, actual practice often involves Euclidean distances between points on
a sphere S^n rather than unrestricted points in R^n.  This helps quite
a lot.
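
A small sketch of the sphere point, assuming "on a sphere" means each vector
is rescaled to unit length before distances are taken. For unit vectors the
squared Euclidean distance equals 2 - 2 * cos(angle), so Euclidean distance
ranks pairs the same way cosine similarity does. The helper names and sample
vectors are hypothetical.

```python
import math

def normalize(v):
    """Rescale a nonzero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Two hypothetical 1-of-n coded records that agree on one of two attributes.
a = normalize([1.0, 0.0, 1.0, 0.0])
b = normalize([1.0, 0.0, 0.0, 1.0])

squared_distance = euclidean(a, b) ** 2
cosine = dot(a, b)  # equals the cosine of the angle because both are unit vectors
# Identity for unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
identity_gap = abs(squared_distance - (2.0 - 2.0 * cosine))
```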

Another vein of usage is to embed points using 1-of-n coding and then
embed them again based on cooccurrence in a user history matrix.  Euclidean
distance works well there too.
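
One possible reading of the cooccurrence idea, sketched under assumptions:
each row of a binary user history matrix is the sum of the 1-of-n codes of
the items that user touched, and each item is then represented by its
(normalized) row of cooccurrence counts. The matrix contents and function
names are made up for illustration, and this may not be the exact
construction meant above.

```python
import math

# Hypothetical history: rows are users, columns are items, 1 = user touched item.
history = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]

def cooccurrence(history):
    """Count, for every item pair, how many users touched both items."""
    n_items = len(history[0])
    counts = [[0] * n_items for _ in range(n_items)]
    for row in history:
        seen = [i for i, flag in enumerate(row) if flag]
        for i in seen:
            for j in seen:
                counts[i][j] += 1
    return counts

def normalize(v):
    """Rescale a nonzero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Each item's embedding is its unit-length row of cooccurrence counts.
embeddings = [normalize(row) for row in cooccurrence(history)]

same_audience = euclidean(embeddings[0], embeddings[1])      # items 0 and 1 share users
different_audience = euclidean(embeddings[0], embeddings[2])  # items 0 and 2 share none
```

Items touched by the same users end up close together, while items with
disjoint audiences end up far apart, even though their raw 1-of-n codes are
equally distant from one another.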

Neither of these approaches is addressed in the justification of your paper.

I haven't read enough or thought enough to talk about your methods yet.



