[ https://issues.apache.org/jira/browse/SPARK-13641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181819#comment-15181819 ]
Gayathri Murali commented on SPARK-13641:
-----------------------------------------

[~xusen] Can you list the steps to reproduce the bug?

> getModelFeatures of ml.api.r.SparkRWrapper cannot (always) reveal the
> original column names
> -------------------------------------------------------------------------------------------
>
>         Key: SPARK-13641
>         URL: https://issues.apache.org/jira/browse/SPARK-13641
>     Project: Spark
>  Issue Type: Bug
>  Components: ML, SparkR
>    Reporter: Xusen Yin
>    Priority: Minor
>
> getModelFeatures of ml.api.r.SparkRWrapper cannot (always) reveal the
> original column names. Let's take the HouseVotes84 data set as an example:
> {code}
> import org.apache.spark.ml.attribute.AttributeGroup
> case m: XXXModel =>
>   // Read the ML attribute metadata attached to the features column.
>   val attrs = AttributeGroup.fromStructField(
>     m.summary.predictions.schema(m.summary.featuresCol))
>   // Return the name stored in each attribute.
>   attrs.attributes.get.map(_.name.get)
> {code}
> The code above reads the feature names from the features column. Usually,
> the features column is generated by RFormula, which uses a VectorAssembler
> internally, so the output attribute names differ from the original column
> names. For example, we want to recover the HouseVotes84 feature names
> "V1, V2, ..., V16", but with RFormula we only get "V1_n, V2_y, ..., V16_y",
> because [the transform function of
> VectorAssembler|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala#L75]
> appends suffixes to the column names.
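Given the naming behavior described in the issue, one possible workaround is to map each assembled attribute name back to an original input column. The following is a minimal sketch, not Spark code: it assumes every attribute name is either the bare column name or has the form `<column>_<suffix>` (as with "V1_n" in the HouseVotes84 example), and `RecoverNames`, `originalColumn`, and `recover` are hypothetical names introduced only for illustration.

```scala
// Hypothetical helper, not part of Spark: maps assembled attribute names
// such as "V1_n" back to the original input columns such as "V1".
// Assumption: names are "<column>" or "<column>_<suffix>".
object RecoverNames {
  def originalColumn(originalCols: Seq[String], attrName: String): Option[String] =
    originalCols
      // A column matches if the attribute name equals it exactly
      // or starts with "<column>_" (so "V10_y" does not match "V1").
      .filter(c => attrName == c || attrName.startsWith(c + "_"))
      // Prefer the longest match, so "a_b_y" maps to "a_b" rather than "a".
      .sortBy(c => -c.length)
      .headOption

  // Recover the distinct original columns, keeping first-seen order.
  def recover(originalCols: Seq[String], attrNames: Seq[String]): Seq[String] =
    attrNames.flatMap(a => originalColumn(originalCols, a)).distinct

  def main(args: Array[String]): Unit = {
    val cols = (1 to 16).map(i => s"V$i")
    val attrs = Seq("V1_n", "V2_y", "V10_y", "V16_y")
    println(recover(cols, attrs))
  }
}
```

Requiring the "_" separator (rather than a plain prefix test) is what keeps "V10_y" from being attributed to "V1"; the heuristic still breaks if a suffix itself could be mistaken for part of another column name.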