GitHub user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/246#discussion_r176660561
  
    --- Diff: src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in ---
    @@ -127,18 +128,20 @@ tree_train(
     
       <DT>grouping_cols (optional)</DT>
       <DD>TEXT, default: NULL. Comma-separated list of column names to group the
    -      data by. This will result in multiple decision trees, one for
    +      data by. This will produce multiple decision trees, one for
           each group. </DD>
     
       <DT>weights (optional)</DT>
    -  <DD>TEXT. Column name containing numerical weights for each observation.
    -  Can be any value greater than 0 (does not need to be
    -  an integer).  
    +  <DD>TEXT. Column name containing numerical weights for 
    +  each observation.  Can be any value greater 
    +  than 0 (does not need to be an integer).  
       This can be used to handle the case of unbalanced data sets.
    -  For classification the row's vote is multiplied by the weight, 
    -  and for regression we perform a weighted average at each node.
    -  If this parameter is not set, all observations (tuples)
    -  are treated equally with a weight of 1.0.</DD>
    +  The weights are used to compute a weighted average in 
    +  the output leaf node. For classification, the contribution 
    +  of a row towards the vote of it's corresponding level 
    --- End diff --
    
    `s/it's/its`. I had a typo in my comment as well. 
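
For readers following the weights wording in this hunk, here is a minimal sketch of what the docs describe: a row's vote scaled by its weight for classification, and a weighted average for regression. This is only an illustration, not MADlib's actual implementation; the function names `weighted_vote` and `weighted_mean` are hypothetical.

```python
from collections import defaultdict


def weighted_vote(labels, weights):
    """Classification: each row's vote counts in proportion to its weight."""
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    # The predicted class is the one with the largest total weight.
    return max(totals, key=totals.get)


def weighted_mean(values, weights):
    """Regression: weighted average of the responses that reach a node."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical rows: a single heavily weighted "a" outvotes two "b" rows.
leaf_class = weighted_vote(["a", "b", "b"], [2.5, 1.0, 1.0])
leaf_value = weighted_mean([1.0, 3.0], [3.0, 1.0])
```

With all weights left at 1.0 this reduces to a plain majority vote and a plain mean, which matches the documented default.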

