GitHub user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/4460#issuecomment-75106841
  
    > Rename FeatureType? and what's its value for AttributeGroup? GROUP or 
null?
    
    I wish we could use `type`, but it is already taken by Scala. `DataType` is 
taken by SQL. So `DatumType` or `MLDataType`? ... I don't really have good 
suggestions. I'm not sure whether we should make `AttributeGroup` an 
`Attribute`. What is the benefit of making it an `Attribute`?
    
    > You could imagine a more elaborate hierarchy of types: discrete is a 
special case of continuous, ordinal is a special case of discrete. It's nice to 
have that expressiveness; it adds somewhat to the complexity for the caller and 
the code. Maybe you could argue that the schema should force an interpretation 
for the algorithm. But I kind of like it. The type objects would have methods 
like isContinuous, isCategorical. Should I make a fuller hierarchy or stick to 
adding BINARY?
    
    I think having a full hierarchy is a good idea. Could you list all of the 
types you want to include? Then we can check the complexity. By the way, I 
don't know whether we should attach ML attributes to string columns. It seems 
to me that a string column should first be mapped to an integer column before 
it becomes an ML column with attributes. Hopefully that reduces the complexity.
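
To make the two ideas above concrete, here is a rough sketch of what a fuller hierarchy and the string-to-integer mapping might look like. Every name in it (`AttributeTypeSketch`, `MLFeatureType`, `isA`, `indexStrings`) is hypothetical and not taken from the pull request, and the placement of `Nominal` and `Binary` under `Discrete` is an assumption made only for illustration.

```scala
// A sketch under stated assumptions, not the API proposed in the pull request.
object AttributeTypeSketch {

  // Each type optionally names the more general type it is a special case of.
  sealed abstract class MLFeatureType(val parent: Option[MLFeatureType]) {
    // True if this type is `other` or a (transitive) special case of it.
    def isA(other: MLFeatureType): Boolean =
      this == other || parent.exists(_.isA(other))

    def isContinuous: Boolean = isA(Continuous)
    def isCategorical: Boolean = isA(Nominal)
  }

  // "discrete is a special case of continuous, ordinal is a special case of discrete"
  case object Continuous extends MLFeatureType(None)
  case object Discrete extends MLFeatureType(Some(Continuous))
  case object Ordinal extends MLFeatureType(Some(Discrete))
  // Assumption: nominal (categorical) sits under discrete, binary under nominal.
  case object Nominal extends MLFeatureType(Some(Discrete))
  case object Binary extends MLFeatureType(Some(Nominal))

  // Map each distinct string to an integer index (sorted order), so that
  // attributes only ever need to describe an integer column.
  def indexStrings(values: Seq[String]): (Seq[Int], Map[String, Int]) = {
    val index = values.distinct.sorted.zipWithIndex.toMap
    (values.map(index), index)
  }
}
```

With this reading, `AttributeTypeSketch.Binary.isCategorical` is true, and every type answers true to `isContinuous` because each one is ultimately a special case of `Continuous`. That may or may not be the intended semantics, which is part of the complexity worth checking once the full list of types is written down.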


