For clustering analysis, we need a way to measure distances when the data contains different levels of measurement: *binary / categorical (nominal), counts (ordinal), and ratio (scale)*. To be concrete, consider working with the attributes *city, zip, satisfaction_level, price*.

Real data usually also contains string attributes, for example book titles. The distance between two strings can be measured by minimum edit distance.

SPSS provides Two-Step Cluster, which can handle both ratio-scale and ordinal numbers. What is the right algorithm for hierarchical clustering analysis over all four kinds of attributes above with *MLlib*?

If we cannot find a suitable metric to measure the distance, an alternative is to do a topological data analysis (e.g. linkage, etc.). Can we do this kind of analysis with *GraphX*?

-Rex
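
P.S. To make the kind of distance I am asking about concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions, not an MLlib or GraphX API: it averages per-attribute dissimilarities in the style of Gower's coefficient, using simple mismatch for the nominal attributes, range-normalized absolute difference for the ordinal and ratio attributes, and length-normalized minimum edit distance for strings. The names `edit_distance` and `mixed_distance`, the `title` field, and the `satisfaction_range` / `price_range` parameters are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum edit (Levenshtein) distance via dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def mixed_distance(x: dict, y: dict,
                   satisfaction_range: float,
                   price_range: float) -> float:
    """Gower-style average of per-attribute dissimilarities, each in [0, 1].

    Assumes the ranges are positive and cover the observed values.
    """
    longest_title = max(len(x["title"]), len(y["title"]), 1)
    parts = [
        0.0 if x["city"] == y["city"] else 1.0,  # nominal: mismatch
        0.0 if x["zip"] == y["zip"] else 1.0,    # nominal: mismatch
        abs(x["satisfaction_level"] - y["satisfaction_level"]) / satisfaction_range,  # ordinal
        abs(x["price"] - y["price"]) / price_range,                # ratio
        edit_distance(x["title"], y["title"]) / longest_title,     # string
    ]
    return sum(parts) / len(parts)


# Two made-up records, for illustration only.
a = {"city": "Austin", "zip": "78701", "satisfaction_level": 4,
     "price": 20.0, "title": "kitten"}
b = {"city": "Austin", "zip": "78702", "satisfaction_level": 2,
     "price": 30.0, "title": "sitting"}
d = mixed_distance(a, b, satisfaction_range=4, price_range=100.0)
```

A pairwise matrix of such distances is what a hierarchical (linkage-based) clustering would need as input. Equal weighting of the five attributes is an arbitrary choice here, and whether such a custom distance can be used with *MLlib* is exactly what I am unsure about.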