GitHub user iyerr3 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/246#discussion_r176660561
--- Diff: src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in ---
@@ -127,18 +128,20 @@ tree_train(
<DT>grouping_cols (optional)</DT>
<DD>TEXT, default: NULL. Comma-separated list of column names to group the
- data by. This will result in multiple decision trees, one for
+ data by. This will produce multiple decision trees, one for
each group. </DD>
<DT>weights (optional)</DT>
- <DD>TEXT. Column name containing numerical weights for each observation.
- Can be any value greater than 0 (does not need to be
- an integer).
+ <DD>TEXT. Column name containing numerical weights for
+ each observation. Can be any value greater
+ than 0 (does not need to be an integer).
This can be used to handle the case of unbalanced data sets.
- For classification the row's vote is multiplied by the weight,
- and for regression we perform a weighted average at each node.
- If this parameter is not set, all observations (tuples)
- are treated equally with a weight of 1.0.</DD>
+ The weights are used to compute a weighted average in
+ the output leaf node. For classification, the contribution
+ of a row towards the vote of it's corresponding level
--- End diff --
`s/it's/its`. I had a typo in my comment as well.
---