Hi all, I am new to Mahout and would like to understand the following.
Why do Mahout's clustering algorithms require the actual records (JSON, for example) to be converted into numerical vectors? When a record has mixed data types and we convert it into a numerical vector, we may no longer be able to apply field-wise comparisons, and maintaining the mapping between the actual record and its vector is also a problem.

Is numerical vectorization only a performance optimization, or is there another reason? Does it make sense to apply clustering directly to the actual records?

Thanks & Regards,
B Anil Kumar
