[ 
https://issues.apache.org/jira/browse/SPARK-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648583#comment-14648583
 ] 

Joseph K. Bradley commented on SPARK-5895:
------------------------------------------

Comment copied from PR about design:

{quote}
Here are some initial thoughts: We should definitely permit users to specify 
features by index and by name. Supporting both within the same type makes the 
API pretty complex. What if we instead had one Param for each way of selecting 
columns, with the full set of selected columns being the union of the two 
subsets (without duplicates)?

2 params:
* selectedIndices: IntArrayParam
* selectedNames: StringArrayParam

E.g.:
* Input Vector col with length 10 and names "col1, col2, ..."
* selectedIndices = 1, 3
* selectedNames = "col1", "col5"
* Output Vector has columns 1, 3, 5
{quote}
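
The union semantics proposed above can be sketched in plain Scala, without any Spark dependency. This is only an illustration under stated assumptions: `SliceSelection` and `resolve` are hypothetical names, not part of Spark; the feature names are assumed to line up with 0-based indices (so "col1" sits at index 1, as the example in the comment implies); and returning the result in sorted order is a choice made for this sketch, since the comment does not specify an output order.

```scala
// Hypothetical helper, not part of Spark: resolves the union of an
// index-based selection and a name-based selection into one index list.
object SliceSelection {

  /** Returns the de-duplicated, sorted indices selected by position or by name. */
  def resolve(
      featureNames: Array[String],
      selectedIndices: Array[Int],
      selectedNames: Array[String]): Array[Int] = {
    // Map each feature name to its position in the input vector.
    val nameToIndex: Map[String, Int] = featureNames.zipWithIndex.toMap

    // Reject indices that fall outside the input vector.
    selectedIndices.foreach { i =>
      require(i >= 0 && i < featureNames.length, s"Index $i is out of range")
    }

    // Translate names to indices, failing loudly on unknown names.
    val fromNames: Array[Int] = selectedNames.map { name =>
      nameToIndex.getOrElse(
        name,
        throw new IllegalArgumentException(s"Unknown feature name: $name"))
    }

    // Union of both selections, without duplicates (sorted: sketch assumption).
    (selectedIndices ++ fromNames).distinct.sorted
  }
}

object SliceSelectionDemo {
  def main(args: Array[String]): Unit = {
    // Ten features named col0 .. col9, so "col1" is index 1 and "col5" is index 5.
    val names = (0 until 10).map(i => s"col$i").toArray
    val selected = SliceSelection.resolve(names, Array(1, 3), Array("col1", "col5"))
    println(selected.mkString(", "))
  }
}
```

With these inputs, index 1 is requested twice (once by position, once as "col1") and appears only once in the result, which matches the "Output Vector has columns 1, 3, 5" line in the example.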


> Add VectorSlicer
> ----------------
>
>                 Key: SPARK-5895
>                 URL: https://issues.apache.org/jira/browse/SPARK-5895
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Xiangrui Meng
>            Assignee: Xusen Yin
>
> `VectorSlicer` takes a vector column and outputs a vector column with a subset 
> of features.
> {code}
> val vs = new VectorSlicer()
>   .setInputCol("user")
>   .setSelectedFeatures("age", "salary")
>   .setOutputCol("usefulUserFeatures")
> {code}
> We should allow specifying selected features by indices and by names. The 
> output should preserve the names of the selected features.



