[ https://issues.apache.org/jira/browse/SPARK-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386365#comment-14386365 ]
Xusen Yin commented on SPARK-5895:
----------------------------------

I have another concern here. We cannot recover the column names from a `Vector`. Given the selected features "age" and "salary", how do we select these two columns from a vector? One solution is to give `VectorSlicer` a list of column names, say, `setColumnNames(List[String])`. But a more natural solution is to add the list of column names in `VectorAssembler`.

> Add VectorSlicer
> ----------------
>
>                 Key: SPARK-5895
>                 URL: https://issues.apache.org/jira/browse/SPARK-5895
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Xiangrui Meng
>
> `VectorSlicer` takes a vector column and outputs a vector column with a subset
> of features.
> {code}
> val vs = new VectorSlicer()
>   .setInputCol("user")
>   .setSelectedFeatures("age", "salary")
>   .setOutputCol("usefulUserFeatures")
> {code}
> We should allow specifying selected features by indices and by names. It
> should preserve the output names.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
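
To illustrate the concern raised in the comment, here is a minimal sketch in plain Scala (no Spark dependency) of resolving feature names to indices before slicing. `VectorSliceSketch`, `sliceByNames`, and the `columnNames` parameter are hypothetical names used only for illustration and are not part of Spark's API. The sketch assumes the column names are supplied alongside the values, for example recorded by whatever component assembled the vector.

```scala
// Hypothetical helper, not part of Spark: select vector entries by feature name.
object VectorSliceSketch {

  /** Returns the entries of `values` whose names appear in `selected`,
    * in the order given by `selected`.
    *
    * Assumption: `columnNames(i)` is the name of `values(i)`.
    */
  def sliceByNames(
      values: Array[Double],
      columnNames: Seq[String],
      selected: Seq[String]): Array[Double] = {
    require(
      values.length == columnNames.length,
      s"expected ${columnNames.length} values, got ${values.length}")

    // Build the name -> index lookup that a bare vector cannot provide by itself.
    val indexOf: Map[String, Int] = columnNames.zipWithIndex.toMap

    selected.map { name =>
      val i = indexOf.getOrElse(
        name,
        throw new NoSuchElementException(s"unknown feature: $name"))
      values(i)
    }.toArray
  }
}
```

For example, with names `Seq("age", "height", "salary")` and values `Array(31.0, 1.78, 52000.0)`, calling `VectorSliceSketch.sliceByNames(values, names, Seq("age", "salary"))` yields the entries `31.0` and `52000.0`. Without the name list there is nothing to build the lookup from, which is why the comment suggests either passing names explicitly or having `VectorAssembler` record them.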