[ 
https://issues.apache.org/jira/browse/MADLIB-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratik updated MADLIB-1317:
---------------------------
    Description: 
Hi team,

I am using the MADlib multinomial method on my dataset with a categorical 
independent variable (one-hot encoded), as below. 

 
{code:sql}
SELECT
    CASE WHEN multinom IS NOT NULL THEN TRUE ELSE FALSE END
FROM
 madlib.multinom(
    'TEMP_TEST_1',
    'TEMP_TEST_1_OP',
    'dep_var_col',
    'ARRAY[ 1,hot_encoded_GENDER_col_val1, hot_encoded_GENDER_col_val2]',
    '1',--REF CATEGORY 
    'logit',
    NULL,
    'max_iter=100,optimizer=irls,tolerance=0.0001',
    TRUE
 );{code}
Because GENDER is a categorical column, I one-hot encode it into two 0/1 columns.

When I compare the results with R's method, the coefficients match, but the 
standard errors and p-values are way off.

R method -
{code:java}
nnet::multinom
{code}
 

Is there anything special I need to do for multinom, or is this a bug?

Or is there a particular way I need to use R to compare its results with multinom?

*UPDATE:*

Is it mandatory to have a ref_category-like column for a categorical 
independent variable?

I removed hot_encoded_GENDER_col_val1 from the list of independent variables, 
and the results now match R's output.

 

Is there any documentation or reference to confirm this? 
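
One possible explanation, offered as a sketch and not a confirmed diagnosis: if GENDER has exactly two levels and no NULLs (an assumption about the data), then hot_encoded_GENDER_col_val1 + hot_encoded_GENDER_col_val2 equals 1 on every row, which duplicates the intercept term 1 in the ARRAY. The design matrix is then rank-deficient, which can leave standard errors and p-values unreliable even when the coefficients look plausible. R's formula interface drops one level automatically when the predictor is passed as a factor, which would be consistent with the results matching once one indicator is removed. The call below is the same as the one above, with val1 left out as the baseline level; the output table name is hypothetical.

{code:sql}
-- Sketch only: keep the intercept (1) and a single GENDER indicator,
-- so that val1 becomes the implicit baseline level of the predictor.
-- 'TEMP_TEST_1_OP_REF' is a hypothetical output table name.
SELECT
    CASE WHEN multinom IS NOT NULL THEN TRUE ELSE FALSE END
FROM
 madlib.multinom(
    'TEMP_TEST_1',
    'TEMP_TEST_1_OP_REF',
    'dep_var_col',
    'ARRAY[1, hot_encoded_GENDER_col_val2]',
    '1', -- reference category of the dependent variable, unchanged
    'logit',
    NULL,
    'max_iter=100,optimizer=irls,tolerance=0.0001',
    TRUE
 );{code}
Note that the '1' argument is the reference category of the dependent variable; it does not select a baseline level for the independent variables, so that level has to be dropped by hand in the ARRAY expression.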


> Multinomial results not matching with R method
> ----------------------------------------------
>
>                 Key: MADLIB-1317
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1317
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Multinomial Logistic Regression
>            Reporter: Pratik
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
