GitHub user iyerr3 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/246#discussion_r175924018
--- Diff: src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in ---
@@ -127,7 +132,11 @@ tree_train(
<DT>weights (optional)</DT>
<DD>TEXT. Column name containing numerical weights for each observation.
+ Can be any value greater than 0 (does not need to be
+ an integer).
This can be used to handle the case of unbalanced data sets.
+ For classification the row's vote is multiplied by the weight,
--- End diff --
I suggest rephrasing it as:
> The `weights` column is used to compute a weighted average in the output leaf
> node. For classification, the contribution of a row towards the vote of its
> corresponding level is multiplied by the weight (weighted mode). For
> regression, the output value of the row is multiplied by the weight (weighted
> mean).
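To make the mode/mean distinction concrete, here is a minimal Python sketch of the two leaf aggregations the suggested wording describes. It is only an illustration under that reading, not MADlib's actual implementation; `weighted_mode` and `weighted_mean` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def weighted_mode(labels: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Classification leaf (hypothetical helper): each row's vote for its own
    level counts `weight` times instead of once; the level with the largest
    total weight wins."""
    if len(labels) != len(weights) or not labels:
        raise ValueError("labels and weights must be non-empty and the same length")
    totals: dict = defaultdict(float)
    for label, w in zip(labels, weights):
        if w <= 0:
            raise ValueError("weights must be greater than 0")
        totals[label] += w
    # Ties go to the level seen first (dicts keep insertion order).
    return max(totals, key=totals.get)


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Regression leaf (hypothetical helper): each row's output value is
    multiplied by its weight, then normalized by the total weight."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be greater than 0")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

With labels `a, b, b` and weights `3, 1, 1`, an unweighted vote picks `b`, but `weighted_mode` picks `a` (total weight 3 vs. 2). Weights need not be integers: `weighted_mean([10.0, 20.0], [0.5, 1.5])` gives 17.5.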
---