khannaekta commented on code in PR #615:
URL: https://github.com/apache/madlib/pull/615#discussion_r1511750633
##########
src/ports/postgres/modules/pmml/table_to_pmml.sql_in:
##########
@@ -146,10 +144,25 @@ Alternatively, the above can also be invoked as below if
custom names are needed
for fields in the Data Dictionary:
<pre class="example">
SELECT madlib.pmml('patients_logregr',
- 'out_attack~1+in_trait_anxiety+in_treatment');
+ 'out_attack~in_trait_anxiety+in_treatment');
</pre>
-\b Note: If the second argument of 'pmml' function is not specified, a default suffix "_pmml_prediction" will be automatically append to the column name to be predicted. This can help avoid name conflicts.
+\b Note: 1. If the second argument of 'pmml' function is not specified, a default suffix "_pmml_prediction" will be automatically append to the column name to be predicted. This can help avoid name conflicts.
+
+\b Note: 2. While training regression models, it is possible to use a non array expression. Consider this example:
+<pre>
+-- Create a table where a column named 'x' is an array of the independent variables
+CREATE TABLE patients2 AS SELECT second_attack AS y, ARRAY[1, treatment, trait_anxiety] AS x from patients;
+
+-- Now use the columns 'x' and 'y' created in the previous step
+SELECT madlib.logregr_train(
+ 'patients2',
+ 'patients_logregr2',
+ 'y',
+ 'x');
+</pre>
+In such scenarios, the pmml code always assumes that the intercept variable "1," was already included in the independent variable
+expression. If it is not included, the exported PMML would be incorrect.
Review Comment:
Will reword this to: `If it is not included, the exported PMML would not explicitly capture it as intercept.`
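
To make the distinction concrete for the docs, here is a rough sketch of the two cases. It assumes the `patients` table from the existing example; `patients3` and `patients_logregr3` are hypothetical names used only for illustration, and the comments restate the note under review rather than independently verified behavior.

```sql
-- Case A: the constant 1 is part of the array, so the model has an
-- intercept term and the PMML export can treat it as such.
CREATE TABLE patients2 AS
SELECT second_attack AS y,
       ARRAY[1, treatment, trait_anxiety] AS x
FROM patients;

SELECT madlib.logregr_train('patients2', 'patients_logregr2', 'y', 'x');
SELECT madlib.pmml('patients_logregr2');

-- Case B (hypothetical tables): the constant 1 is left out of the array.
-- Per the note, the export still assumes an intercept was included, so
-- no term is explicitly captured as the intercept in the resulting PMML.
CREATE TABLE patients3 AS
SELECT second_attack AS y,
       ARRAY[treatment, trait_anxiety] AS x
FROM patients;

SELECT madlib.logregr_train('patients3', 'patients_logregr3', 'y', 'x');
SELECT madlib.pmml('patients_logregr3');
```

Comparing the two exported documents side by side might be a quick way to confirm the reworded sentence matches what the code actually emits.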