Spark's decision tree does not accept datasets in which a feature has only
a single value. It fails with the following error:

> requirement failed: DecisionTree Strategy given invalid
> categoricalFeaturesInfo setting: feature 645 has 1 categories.  The number
> of categories should be >= 2

This scenario is not uncommon, since large datasets can contain features
with only a single value (see the training data in [1], for example). Because
the error is raised by Spark itself, there should be a way to handle such
datasets on our side, before they are passed to Spark.

One possible solution is to let users discard features (columns), so that
they can drop single-valued features before training a decision tree.
Please suggest any other feasible solutions.
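
To make the idea concrete, here is a rough sketch in plain Python of what
dropping single-valued columns could look like. It is only an illustration,
not existing code and not a Spark API: the function name
drop_constant_features and the in-memory list-of-rows layout are assumptions,
and the categorical_info dictionary merely mirrors the shape of Spark's
categoricalFeaturesInfo map (feature index -> number of categories). The
indices in that map have to be shifted once columns are removed, which is the
part the sketch tries to show.

```python
from typing import Dict, List, Sequence, Tuple


def drop_constant_features(
    rows: Sequence[Sequence[float]],
    categorical_info: Dict[int, int],
) -> Tuple[List[List[float]], Dict[int, int], List[int]]:
    """Hypothetical helper: remove single-valued columns and re-index the rest.

    Returns the filtered rows, the categorical map keyed by the new column
    indices, and the list of original column indices that were kept.
    """
    if not rows:
        return [], {}, []

    n_features = len(rows[0])

    # Keep a column only if it takes at least two distinct values.
    kept = [j for j in range(n_features) if len({row[j] for row in rows}) >= 2]

    # Map each surviving original index to its position after the drop.
    new_index = {old: new for new, old in enumerate(kept)}

    filtered = [[row[j] for j in kept] for row in rows]

    # Entries for dropped columns are discarded; the others are re-keyed.
    remapped = {
        new_index[old]: arity
        for old, arity in categorical_info.items()
        if old in new_index
    }
    return filtered, remapped, kept


if __name__ == "__main__":
    # Column 0 is constant, so it is the one that gets discarded.
    data = [[0.0, 1.0, 2.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    info = {0: 1, 2: 3}
    filtered, remapped, kept = drop_constant_features(data, info)
    print(kept)      # original indices of the surviving columns
    print(remapped)  # categorical map keyed by the new indices
```

Returning the kept indices lets the same column selection be applied to test
data later, so that the model sees identical feature positions at training
and prediction time.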

Best regards,

[1] https://www.kaggle.com/c/digit-recognizer
-- 
Pruthuvi Maheshakya Wijewardena
Software Engineer
WSO2 Lanka (Pvt) Ltd
Email: mahesha...@wso2.com
Mobile: +94711228855