Github user PowerToThePeople111 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20313#discussion_r208977114

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
    @@ -264,7 +265,9 @@ class CountVectorizerModel(
           Vectors.sparse(dictBr.value.size, effectiveCounts)
         }
    
    -    dataset.withColumn($(outputCol), vectorizer(col($(inputCol))))
    +    val attrs = vocabulary.map(_ => new NumericAttribute).asInstanceOf[Array[Attribute]]
    --- End diff --

    I don't think the information is totally useless. For example, it is very helpful if you want to know which feature-vector index (created by a CountVectorizer) corresponds to which LR coefficient. In general, given an arbitrary vector created by a properly implemented feature-generation transformer, it should be easy to get this information.
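To illustrate the use case in the comment, here is a minimal plain-Scala sketch, assuming you already hold the fitted CountVectorizerModel's vocabulary and a binomial logistic regression model's coefficients as arrays (in Spark these would come from model.vocabulary and lrModel.coefficients.toArray). The object TermCoefficients, the helper pairTermsWithCoefficients, and the sample values are hypothetical. The sketch relies on vocabulary(i) describing vector index i, which only holds if nothing between the CountVectorizer and the LR reorders or shifts indices.

```scala
// Hypothetical sketch: pair each CountVectorizer term with the LR coefficient
// at the same feature-vector index. Plain Scala; no Spark dependency.
object TermCoefficients {

  /** Returns (index, term, coefficient) triples, assuming vocabulary(i)
    * and coefficients(i) both describe feature-vector index i. */
  def pairTermsWithCoefficients(
      vocabulary: Array[String],
      coefficients: Array[Double]): Seq[(Int, String, Double)] = {
    // A length mismatch means the vector was not produced by this vocabulary alone.
    require(vocabulary.length == coefficients.length,
      s"size mismatch: ${vocabulary.length} terms vs ${coefficients.length} coefficients")
    vocabulary.indices.map(i => (i, vocabulary(i), coefficients(i)))
  }

  def main(args: Array[String]): Unit = {
    // Made-up example values, for illustration only.
    val vocabulary = Array("spark", "vector", "count")
    val coefficients = Array(0.8, -0.3, 0.1)
    // Print terms ordered by the magnitude of their coefficient.
    pairTermsWithCoefficients(vocabulary, coefficients)
      .sortBy { case (_, _, w) => -math.abs(w) }
      .foreach { case (i, term, w) => println(f"$i%d\t$term%s\t$w%.2f") }
  }
}
```

If a VectorAssembler or a similar stage sits in between, the indices shift and this manual pairing breaks down; that is where term names carried in the output column's attribute metadata would let you recover the mapping from the vector itself.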