[ https://issues.apache.org/jira/browse/SPARK-28295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nils Skotara updated SPARK-28295:
---------------------------------
    Description: 
Using pyspark.ml.regression, when I fit a GeneralizedLinearRegression like this:

    from pyspark.ml.regression import GeneralizedLinearRegression

    glr = GeneralizedLinearRegression(family="gaussian", link="identity",
                                      regParam=0.3, maxIter=10)
    model = glr.fit(someData)

There seems to be no way to tell which feature each coefficient or standard
error belongs to. For now I am using an ugly workaround:

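    # Use Java reflection (through Py4J) to read the internal Scala member
    # "coefficientsWithStatistics" of the training summary.
    field = model.summary._call_java('getClass').getDeclaredField("coefficientsWithStatistics")
    object2 = model._call_java('summary')
    field.setAccessible(True)
    value = field.get(object2)

    coef_value = {}
    for i in range(len(value)):
        # Each element is a Scala tuple; its toString looks like "(name,coefficient,...)".
        row = value[i].toString()
        values = row.split(',')
        # Map the feature name (parentheses removed) to its coefficient.
        coef_value[values[0].replace('(', '').replace(')', '')] = float(values[1])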
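A minimal sketch of a sturdier parser for those rows, assuming each element of
coefficientsWithStatistics is a Scala 5-tuple (feature name, coefficient,
standard error, t-value, p-value) whose toString looks like
"(name,coef,se,t,p)", and assuming the intercept row is labelled "(Intercept)".
Splitting on every comma breaks for feature names that contain commas, and
removing every parenthesis turns "(Intercept)" into "Intercept"; stripping only
the outer parentheses and splitting the last four numeric fields off from the
right avoids both. parse_coefficient_rows and CoefficientStats are hypothetical
names, not part of PySpark.

```python
from typing import Dict, Iterable, NamedTuple


class CoefficientStats(NamedTuple):
    """Statistics for one feature (assumed tuple layout, see lead-in)."""
    coefficient: float
    std_error: float
    t_value: float
    p_value: float


def parse_coefficient_rows(rows: Iterable[str]) -> Dict[str, CoefficientStats]:
    """Parse strings like '(name,coef,se,t,p)' into {name: CoefficientStats}.

    Only the outer parentheses are stripped and the split is done from the
    right, so names containing commas or parentheses stay intact.
    """
    result: Dict[str, CoefficientStats] = {}
    for row in rows:
        body = row.strip()
        if not (body.startswith("(") and body.endswith(")")):
            raise ValueError(f"unexpected row format: {row!r}")
        body = body[1:-1]  # drop only the tuple's own parentheses
        # The last four fields are numbers; everything before them is the name.
        name, coef, se, t, p = body.rsplit(",", 4)
        result[name] = CoefficientStats(float(coef), float(se), float(t), float(p))
    return result
```

With the reflection code above, it could be called as
parse_coefficient_rows(value[i].toString() for i in range(len(value))).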

Am I missing something?
If not, I'd like to request a method similar to model.coefficients that returns
the feature names in the same order, e.g. model.features or something like that.
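A workaround that avoids reflection, sketched under the assumption that the
features column was built by VectorAssembler (or another transformer that
writes ML attribute metadata): someData.schema["features"].metadata then holds
an "ml_attr" entry whose "attrs" groups attributes by type ("numeric",
"binary", "nominal"), each with an "idx" and usually a "name". Sorting by "idx"
gives the names in the same order as model.coefficients, and they can be paired
with model.summary.coefficientStandardErrors, whose last element corresponds to
the intercept when fitIntercept is true. The helpers take plain Python values so
the logic can be checked without a Spark session; their names and the
"V1"-style placeholder for unnamed attributes are hypothetical choices.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def feature_names_from_metadata(metadata: dict) -> List[str]:
    """Return feature names ordered by their index in the feature vector.

    Assumes `metadata` is df.schema[features_col].metadata with an
    "ml_attr" -> "attrs" entry mapping attribute types ("numeric",
    "binary", "nominal") to lists of {"idx": ..., "name": ...} dicts.
    """
    attrs_by_type = metadata["ml_attr"]["attrs"]
    # Flatten all attribute groups into one list.
    all_attrs = [attr for group in attrs_by_type.values() for attr in group]
    # Coefficient i belongs to vector index i, so order by "idx".
    all_attrs.sort(key=lambda attr: attr["idx"])
    # Hypothetical placeholder ("V1", "V2", ...) for unnamed attributes.
    return [attr.get("name", f"V{attr['idx'] + 1}") for attr in all_attrs]


def coefficients_by_feature(
    names: Sequence[str],
    coefficients: Sequence[float],
    std_errors: Optional[Sequence[float]] = None,
) -> Dict[str, Tuple[float, Optional[float]]]:
    """Map each feature name to (coefficient, standard error or None).

    std_errors may have one extra trailing element (the intercept when
    fitIntercept=True); that element is ignored here.
    """
    n = len(coefficients)
    if len(names) != n:
        raise ValueError(f"{len(names)} names but {n} coefficients")
    if std_errors is not None and len(std_errors) not in (n, n + 1):
        raise ValueError(f"unexpected number of standard errors: {len(std_errors)}")
    result: Dict[str, Tuple[float, Optional[float]]] = {}
    for i, name in enumerate(names):
        se = float(std_errors[i]) if std_errors is not None else None
        result[name] = (float(coefficients[i]), se)
    return result
```

In a Spark session it might be called as
names = feature_names_from_metadata(someData.schema["features"].metadata)
followed by coefficients_by_feature(names, model.coefficients.toArray(),
model.summary.coefficientStandardErrors).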


> Is there a way of getting feature names from pyspark.ml.regression 
> GeneralizedLinearRegression?
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-28295
>                 URL: https://issues.apache.org/jira/browse/SPARK-28295
>             Project: Spark
>          Issue Type: Request
>          Components: Build
>    Affects Versions: 2.3.1
>            Reporter: Nils Skotara
>            Priority: Minor
>              Labels: features
>             Fix For: 2.3.1
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
