[ https://issues.apache.org/jira/browse/SPARK-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581253#comment-16581253 ]
Barry Becker commented on SPARK-9610:
-------------------------------------

All ML models should support having an optional weighting column set. Each value in the weighting column should be a positive real number. If any weight value is not > 0, that should throw an error. A weighting column is useful in several cases, like when the class labels are very skewed, or when you just want some records to count more heavily than others. For example, you might want a dataset of cities to be weighted by population, or a dataset of products to be weighted by price.

> Class and instance weighting for ML
> -----------------------------------
>
>                 Key: SPARK-9610
>                 URL: https://issues.apache.org/jira/browse/SPARK-9610
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Priority: Major
>
> This umbrella is for tracking tasks for adding support for label or instance
> weights to ML algorithms. These additions will help handle skewed or
> imbalanced data, ensemble methods, etc.
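
The rule proposed in the comment above (an optional weight column whose values must be strictly positive, with an error otherwise) can be sketched in plain Python. This is only an illustration of the idea, not Spark code: `validate_weights` and `weighted_mean` are hypothetical helpers, rejecting NaN and infinite values is an added assumption, and the city figures are made up.

```python
import math


def validate_weights(weights):
    """Return the weights as floats, raising ValueError unless every one is > 0.

    Rejecting NaN and infinity is an assumption of this sketch; the comment
    above only asks for positive real numbers.
    """
    checked = []
    for row, weight in enumerate(weights):
        value = float(weight)
        if not math.isfinite(value) or value <= 0.0:
            raise ValueError(f"weight in row {row} must be > 0, got {weight!r}")
        checked.append(value)
    return checked


def weighted_mean(values, weights):
    """Mean of values in which each record counts in proportion to its weight."""
    checked = validate_weights(weights)
    if len(values) != len(checked):
        raise ValueError("values and weights must have the same length")
    if not checked:
        raise ValueError("at least one record is required")
    total = sum(checked)
    return sum(v * w for v, w in zip(values, checked)) / total


# Hypothetical data: one measurement per city, weighted by population.
measurements = [10.0, 20.0]
populations = [1_000_000, 3_000_000]
print(weighted_mean(measurements, populations))
```

Validating once, before any statistic is computed, means a zero or negative weight fails loudly instead of silently skewing the result.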