[ 
https://issues.apache.org/jira/browse/SPARK-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581253#comment-16581253
 ] 

Barry Becker commented on SPARK-9610:
-------------------------------------

All ML models should support an optional weighting column. Each value in the 
weighting column should be a positive real number; if any weight is not > 0, 
an error should be thrown. A weighting column is useful in several cases, 
such as when the class labels are very skewed or when you simply want some 
records to count more heavily than others. For example, you might want a 
dataset of cities to be weighted by population, or a dataset of products to be 
weighted by price.
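
To make the request concrete, here is a minimal sketch in plain Python of the two behaviours described above: rejecting any weight that is not > 0, and deriving per-row weights for skewed class labels. It is not Spark code and does not reflect any existing Spark API. The function names validate_weights and balanced_class_weights are hypothetical, and the inverse-frequency formula n / (k * count) is just one common heuristic, assumed here for illustration.

```python
import math
from collections import Counter


def validate_weights(weights):
    """Return the weights as a list, or raise ValueError if any is not a finite real > 0."""
    for i, w in enumerate(weights):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(w, bool) or not isinstance(w, (int, float)):
            raise ValueError(f"weight at row {i} is not a real number: {w!r}")
        if not math.isfinite(w) or w <= 0:
            raise ValueError(f"weight at row {i} must be > 0, got {w!r}")
    return list(weights)


def balanced_class_weights(labels):
    """Per-row weights inversely proportional to class frequency (assumed heuristic).

    Each row of class c gets n / (k * count(c)), where n is the number of rows
    and k is the number of distinct classes, so rare classes count more.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


# Skewed labels: three rows of class "a", one row of class "b".
labels = ["a", "a", "a", "b"]
weights = validate_weights(balanced_class_weights(labels))
```

With these labels the single "b" row gets weight 2.0 and each "a" row gets 2/3, so both classes contribute equally in total. An instance weight such as city population would go through the same validate_weights check before being handed to a model.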

> Class and instance weighting for ML
> -----------------------------------
>
>                 Key: SPARK-9610
>                 URL: https://issues.apache.org/jira/browse/SPARK-9610
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Priority: Major
>
> This umbrella is for tracking tasks for adding support for label or instance 
> weights to ML algorithms.  These additions will help handle skewed or 
> imbalanced data, ensemble methods, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
