[ https://issues.apache.org/jira/browse/SPARK-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648583#comment-14648583 ]
Joseph K. Bradley commented on SPARK-5895:
------------------------------------------

Comment copied from PR about design:

{quote}
Here are some initial thoughts:

We should definitely permit users to specify features with indices and names. Supporting both within the same type makes the API pretty complex. What if we instead had 1 Param for each way of selecting columns, where the full set of selected columns will be the union of each subset (without duplicates)?

2 params:
* selectedIndices: IntArrayParam
* selectedNames: StringArrayParam

E.g.:
* Input Vector col with length 10 and names "col1, col2, ..."
* selectedIndices = 1, 3
* selectedNames = "col1", "col5"
* Output Vector has columns 1, 3, 5
{quote}

> Add VectorSlicer
> ----------------
>
>                 Key: SPARK-5895
>                 URL: https://issues.apache.org/jira/browse/SPARK-5895
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Xiangrui Meng
>            Assignee: Xusen Yin
>
> `VectorSlicer` takes a vector column and outputs a vector column with a subset
> of features.
> {code}
> val vs = new VectorSlicer()
>   .setInputCol("user")
>   .setSelectedFeatures("age", "salary")
>   .setOutputCol("usefulUserFeatures")
> {code}
> We should allow specifying selected features by indices and by names. It
> should preserve the output names.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
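A minimal sketch of the union semantics proposed in the quoted comment, written in plain Scala with no Spark dependency. The object and method names (`VectorSlicerSketch`, `resolveIndices`, `slice`) are hypothetical, not part of any Spark API. The name-to-position mapping is also an assumption: it uses 0-based indices and feature names "col0" through "col9" (so "colK" sits at index K), which is one reading under which the comment's example output of columns 1, 3, 5 works out. Duplicates are dropped while keeping first-occurrence order, indices before names.

```scala
object VectorSlicerSketch {

  // Hypothetical helper: merge the two selection params (indices and names)
  // into one list of column indices, i.e. the union without duplicates.
  def resolveIndices(
      featureNames: Array[String],
      selectedIndices: Array[Int],
      selectedNames: Array[String]): Array[Int] = {
    // Assumption: each feature name maps to its position in featureNames.
    val nameToIndex: Map[String, Int] = featureNames.zipWithIndex.toMap
    val fromNames: Array[Int] = selectedNames.map { name =>
      nameToIndex.getOrElse(
        name,
        throw new IllegalArgumentException(s"Unknown feature name: $name"))
    }
    // distinct keeps the first occurrence, so explicit indices come first.
    (selectedIndices ++ fromNames).distinct
  }

  // Hypothetical helper: keep only the resolved columns of a dense vector.
  def slice(values: Array[Double], indices: Array[Int]): Array[Double] =
    indices.map(i => values(i))

  def main(args: Array[String]): Unit = {
    // Input vector of length 10 named "col0" .. "col9" (assumed naming).
    val names = Array.tabulate(10)(i => s"col$i")
    val idx = resolveIndices(names, Array(1, 3), Array("col1", "col5"))
    println(idx.mkString(", ")) // prints "1, 3, 5"
  }
}
```

Unknown names raise an error here rather than being silently skipped; that is a choice made for this sketch, not something the comment specifies.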