GitHub user srowen commented on the pull request: https://github.com/apache/spark/pull/4460#issuecomment-75170753

Call it `AttributeType` maybe?

So an `AttributeGroup` can contain both `Attribute`s and vector-valued columns, which sound like `AttributeGroup`s themselves. That's why it seemed like `AttributeGroup` should be an `Attribute`, or at least share a common superclass. But then I didn't know what to call the superclass, and it seemed like overkill. That was the logic behind `AttributeGroup extends Attribute` -- WDYT?

As for hierarchy, that's all I can think of: ordinal extends discrete extends continuous; binary extends, well, discrete and categorical I suppose.

Hm, I'd imagine most categorical features come in as strings. This feels like just the kind of thing a framework can accommodate if it has the type information. I don't think it's more or less complex to say that a string column can be categorical? It would take some work to inject a translation to integers where that's needed, but it's great if the framework can do that.
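To make the proposed hierarchy concrete, here is a minimal Scala sketch of how it could look. All trait and class names other than `Attribute` and `AttributeGroup` are hypothetical, the fields are assumptions, and nothing here is taken from the actual code in the pull request. It only illustrates three ideas from the comment: `AttributeGroup extends Attribute` so that groups can nest, binary being both discrete and categorical, and a categorical attribute carrying string values that can be translated to integers.

```scala
// Hypothetical sketch only; not the API from the pull request.
sealed trait Attribute { def name: String }

// "Ordinal extends discrete extends continuous"
trait Continuous extends Attribute
trait Discrete extends Continuous
trait Ordinal extends Discrete
trait Categorical extends Attribute

final case class ContinuousAttribute(name: String) extends Continuous
final case class OrdinalAttribute(name: String, levels: Seq[String]) extends Ordinal

// "binary extends, well, discrete and categorical"
final case class BinaryAttribute(name: String) extends Discrete with Categorical

// A categorical attribute whose values arrive as strings.
final case class CategoricalAttribute(name: String, values: Seq[String]) extends Categorical {
  // Translate a string label to an integer index where an algorithm needs one.
  def indexOf(label: String): Option[Int] = {
    val i = values.indexOf(label)
    if (i >= 0) Some(i) else None
  }
}

// A group is itself an attribute, so a vector-valued column can be a nested group.
final case class AttributeGroup(name: String, attributes: Seq[Attribute]) extends Attribute {
  // Flatten nested groups into their leaf attributes, in order.
  def leaves: Seq[Attribute] = attributes.flatMap {
    case g: AttributeGroup => g.leaves
    case a                 => Seq(a)
  }
}
```

With this shape, a group such as `AttributeGroup("features", Seq(ContinuousAttribute("age"), AttributeGroup("vec", Seq(BinaryAttribute("flag")))))` treats the nested vector column uniformly with scalar attributes, at the cost of `Attribute` no longer meaning "a single column".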