[ https://issues.apache.org/jira/browse/SPARK-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234091#comment-14234091 ]
Apache Spark commented on SPARK-4494:
-------------------------------------

User 'yu-iskw' has created a pull request for this issue:
https://github.com/apache/spark/pull/3603

> IDFModel.transform() add support for single vector
> --------------------------------------------------
>
>                 Key: SPARK-4494
>                 URL: https://issues.apache.org/jira/browse/SPARK-4494
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.1.1, 1.2.0
>            Reporter: Jean-Philippe Quemener
>            Priority: Minor
>
> Currently, when using the TF-IDF implementation in MLlib, the only way to
> map the transformed data back onto, e.g., labels or IDs is a hackish
> workaround based on zipping:
> {quote}
> 1. Persist input RDD.
> 2. Transform it to just vectors and apply IDFModel.
> 3. Zip with original RDD.
> 4. Transform label and new vector to LabeledPoint.
> {quote}
> Source: [http://stackoverflow.com/questions/26897908/spark-mllib-tfidf-implementation-for-logisticregression]
> Since many production users want to map their data back to some
> identifier, it would be a good improvement to allow IDFModel.transform()
> to accept a single vector.
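For reference, the four-step zip workaround quoted in the issue can be sketched roughly as follows. This is only an illustration, not the code from the linked Stack Overflow question or from the pull request: the toy input, the object name `TfIdfZipWorkaround`, the local master setting, and the use of `HashingTF` to build term-frequency vectors are all assumptions made for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical driver object, named for illustration only.
object TfIdfZipWorkaround {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("tfidf-zip-workaround").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Assumed toy input: (label, tokenized document) pairs.
    val labeledDocs: RDD[(Double, Seq[String])] = sc.parallelize(Seq(
      (1.0, Seq("spark", "mllib", "tfidf")),
      (0.0, Seq("plain", "text", "document")),
      (1.0, Seq("spark", "idf", "model"))
    ))

    val hashingTF = new HashingTF()

    // 1. Persist the input RDD (here: labels paired with term-frequency vectors)
    //    so that both sides of the later zip are computed from the same data.
    val labeledTf: RDD[(Double, Vector)] = labeledDocs
      .map { case (label, terms) => (label, hashingTF.transform(terms)) }
      .cache()

    // 2. Strip the labels, fit an IDFModel on the bare vectors, and apply it.
    val tf: RDD[Vector] = labeledTf.map(_._2)
    val idfModel = new IDF().fit(tf)
    val tfidf: RDD[Vector] = idfModel.transform(tf)

    // 3. Zip the labels back onto the transformed vectors. zip requires both
    //    RDDs to have the same partitioning and per-partition element counts,
    //    which holds here because both derive from labeledTf via map-like steps.
    val zipped: RDD[(Double, Vector)] = labeledTf.map(_._1).zip(tfidf)

    // 4. Rebuild LabeledPoint instances from each label and TF-IDF vector.
    val points: RDD[LabeledPoint] = zipped.map { case (label, v) => LabeledPoint(label, v) }

    points.collect().foreach(println)
    sc.stop()
  }
}
```

If IDFModel.transform() accepted a single vector, as this issue requests, steps 2 through 4 could presumably collapse into one map over the labeled RDD, with no zip and no dependence on matching partitioning.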