[GitHub] madlib pull request #246: DT user doc updates

iyerr3 Fri, 23 Mar 2018 01:08:39 -0700

Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/246#discussion_r176661267
  
    --- Diff: src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in ---
    @@ -97,327 +264,220 @@ forest_train(training_table_name,
     
         <tr>
           <th>is_classification</th>
    -      <td>boolean. True if it is a classification model.</td>
    +      <td>BOOLEAN. True if it is a classification model, false
    +      if for regression.</td>
         </tr>
     
         <tr>
           <th>source_table</th>
    -      <td>text. Data source table name.</td>
    +      <td>TEXT. Data source table name.</td>
         </tr>
     
         <tr>
           <th>model_table</th>
    -      <td>text. Model table name.</td>
    +      <td>TEXT. Model table name.</td>
         </tr>
     
         <tr>
           <th>id_col_name</th>
    -    <td>text. The ID column name.</td>
    +    <td>TEXT. The ID column name.</td>
         </tr>
     
         <tr>
           <th>dependent_varname</th>
    -      <td>text. Dependent variable.</td>
    +      <td>TEXT. Dependent variable.</td>
         </tr>
     
         <tr>
    -      <th>independent_varname</th>
    -      <td>text. Independent variables</td>
    +      <th>independent_varnames</th>
    +      <td>TEXT. Independent variables</td>
         </tr>
     
         <tr>
           <th>cat_features</th>
    -      <td>text. Categorical feature names.</td>
    +      <td>TEXT. List of categorical features 
    +      as a comma-separated string.</td>
         </tr>
     
         <tr>
           <th>con_features</th>
    -      <td>text. Continuous feature names.</td>
    +      <td>TEXT. List of continuous feature
    +      as a comma-separated string.</td>
         </tr>
     
         <tr>
    -      <th>grouping_col</th>
    -      <td>int. Names of grouping columns.</td>
    +      <th>grouping_cols</th>
    +      <td>INTEGER. Names of grouping columns.</td>
         </tr>
     
         <tr>
           <th>num_trees</th>
    -      <td>int. Number of trees grown by the model.</td>
    +      <td>INTEGER. Number of trees grown by the model.</td>
         </tr>
     
         <tr>
           <th>num_random_features</th>
    -      <td>int. Number of features randomly selected for each split.</td>
    +      <td>INTEGER. Number of features randomly selected for each split.</td>
         </tr>
     
         <tr>
           <th>max_tree_depth</th>
    -      <td>int. Maximum depth of any tree in the random forest model_table.</td>
    +      <td>INTEGER. Maximum depth of any tree in the random forest model_table.</td>
         </tr>
     
         <tr>
           <th>min_split</th>
    -      <td>int. Minimum number of observations in a node for it to be split.</td>
    +      <td>INTEGER. Minimum number of observations in a node for it to be split.</td>
         </tr>
     
         <tr>
           <th>min_bucket</th>
    -      <td>int. Minimum number of observations in any terminal node.</td>
    +      <td>INTEGER. Minimum number of observations in any terminal node.</td>
         </tr>
     
         <tr>
           <th>num_splits</th>
    -      <td>int. Number of buckets for continuous variables.</td>
    +      <td>INTEGER. Number of buckets for continuous variables.</td>
         </tr>
     
         <tr>
           <th>verbose</th>
    -      <td>boolean. Whether or not to display debug info.</td>
    +      <td>BOOLEAN. Whether or not to display debug info.</td>
         </tr>
     
         <tr>
           <th>importance</th>
    -      <td>boolean. Whether or not to calculate variable importance.</td>
    +      <td>BOOLEAN. Whether or not to calculate variable importance.</td>
         </tr>
     
         <tr>
           <th>num_permutations</th>
    -      <td>int. Number of times feature values are permuted while calculating
    -      variable importance. The default value is 1.</td>
    +      <td>INTEGER. Number of times feature values are permuted while calculating
    +      variable importance.</td>
         </tr>
     
         <tr>
         <th>num_all_groups</th>
    -    <td>int. Number of groups during forest training.</td>
    +    <td>INTEGER. Number of groups during forest training.</td>
         </tr>
     
         <tr>
         <th>num_failed_groups</th>
    -    <td>int. Number of failed groups during forest training.</td>
    +    <td>INTEGER. Number of failed groups during forest training.</td>
         </tr>
     
         <tr>
           <th>total_rows_processed</th>
    -      <td>bigint. Total numbers of rows processed in all groups.</td>
    +      <td>BIG INTEGER. Total numbers of rows processed in all groups.</td>
    --- End diff --
    
    This should be `BIGINT`. Postgres doesn't expand the `INT` in this type name (there is no `BIG INTEGER` type). Same comment for the next item as well.
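For reference, a small sketch of the naming rule above: a hypothetical helper (not part of MADlib) that maps the type labels used in these docs to canonical Postgres type names. The alias table is only a partial subset of the aliases in the Postgres data-type docs: `INT`/`INT4` map to `INTEGER`, `INT8` maps to `BIGINT`, `BOOL` maps to `BOOLEAN`, and a label such as `BIG INTEGER` is rejected because it is not a Postgres type name.

```python
# Hypothetical helper (not part of MADlib): map a type label used in the
# user docs to its canonical PostgreSQL spelling. Only a small subset of
# PostgreSQL's documented type aliases is included here.
PG_TYPE_ALIASES = {
    "INT": "INTEGER",
    "INT4": "INTEGER",
    "INTEGER": "INTEGER",
    "INT8": "BIGINT",
    "BIGINT": "BIGINT",
    "BOOL": "BOOLEAN",
    "BOOLEAN": "BOOLEAN",
    "TEXT": "TEXT",
}


def canonical_pg_type(label: str) -> str:
    """Return the canonical PostgreSQL name for a doc type label.

    Raises ValueError for labels that are not PostgreSQL type names,
    such as "BIG INTEGER".
    """
    key = label.strip().upper()
    if key not in PG_TYPE_ALIASES:
        raise ValueError(f"not a PostgreSQL type name: {label!r}")
    return PG_TYPE_ALIASES[key]


if __name__ == "__main__":
    # Check a few labels that appear (or could appear) in the doc table.
    for label in ["int", "bigint", "BIG INTEGER"]:
        try:
            print(label, "->", canonical_pg_type(label))
        except ValueError as err:
            print(err)
```

With this rule, `total_rows_processed` (and the item after it) would be documented as `BIGINT`, not `BIG INTEGER`.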
