Dear list,

I am trying to run some regression models on a large data set using sparklyr. 
Some of the explanatory variables (Xs) in my model are categorical, so they 
have to be converted into dummy codes before the analysis. As I understand it, 
in Spark such columns need to be treated as string type and then passed to 
ft_one_hot_encoder to produce the dummy codes. There are some discussions 
online, but I could not figure out how to write the code properly. Could you 
give me some suggestions, please? Thank you very much.

My code and its output are as follows:

> sc_mtcars %>%
+   ft_string_indexer("gear", "gear1") %>%
+   ft_one_hot_encoder("gear1", "gear2") %>%
+   ml_linear_regression(hp ~ gear1 + wt)
Formula: hp ~ gear1 + wt

Coefficients:
(Intercept)       gear1          wt 
  -78.38285    36.41416    62.17596 

As you can see, ft_one_hot_encoder("gear1", "gear2") does not seem to have 
worked; otherwise there would be two coefficients for gear2. Any idea what 
went wrong?
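
For comparison, here is a minimal local sketch of the output shape I expect. It uses base R's lm() on the built-in mtcars data rather than Spark, and it assumes sc_mtcars is simply a copy of mtcars. Treating gear as a factor gives one dummy coefficient for each level other than the reference level:

```r
# Local base-R comparison only (no Spark involved).
# Assumption: sc_mtcars was created from the built-in mtcars data set.
fit <- lm(hp ~ factor(gear) + wt, data = mtcars)

# gear takes the values 3, 4 and 5, so with 3 as the reference level
# the model has two dummy terms for gear plus the intercept and wt.
names(coef(fit))
```

This is the kind of result (two gear terms) that I was hoping to reproduce with sparklyr.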

One more thing: some earlier posts online show regression results with 
significance-test information (standard errors and p-values). Is there any 
way to extract this information with the latest release of sparklyr? At 
least the standard errors?

Thank you very much.

Best regards,

YA. 
