Github user PowerToThePeople111 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20313#discussion_r208977114
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
    @@ -264,7 +265,9 @@ class CountVectorizerModel(
     
           Vectors.sparse(dictBr.value.size, effectiveCounts)
         }
    -    dataset.withColumn($(outputCol), vectorizer(col($(inputCol))))
    +    val attrs = vocabulary.map(_ => new NumericAttribute).asInstanceOf[Array[Attribute]]
    --- End diff --
    
    I do not think the information is totally useless. For example, it is very
    helpful if you want to know which feature-vector index (created by a
    CountVectorizer) corresponds to which LR coefficient. In general, it should
    be easy to get this information for an arbitrary vector that was created by
    a properly implemented feature-generation transformer.
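
    A minimal sketch of the lookup I have in mind, in plain Scala with no Spark
    dependency. The object and method names are hypothetical, and the terms and
    weights are made up. In Spark the two arrays would presumably come from
    `CountVectorizerModel.vocabulary` and
    `LogisticRegressionModel.coefficients.toArray`, assuming the CountVectorizer
    output is the only input to the LR features column, so that vector index i
    lines up with coefficient i.

```scala
// Hypothetical helper: pairs each vocabulary term with the coefficient
// stored at the same vector index.
object VocabularyCoefficients {
  def pairTermsWithCoefficients(
      vocabulary: Array[String],
      coefficients: Array[Double]): Array[(String, Double)] = {
    // The pairing is only meaningful if both arrays describe the same vector.
    require(vocabulary.length == coefficients.length,
      "vocabulary and coefficient vector must have the same length")
    vocabulary.zip(coefficients)
  }

  def main(args: Array[String]): Unit = {
    val vocabulary = Array("spark", "vector", "index") // made-up terms
    val coefficients = Array(0.8, -0.3, 0.1)           // made-up weights

    // Print one line per feature: index, term, coefficient.
    pairTermsWithCoefficients(vocabulary, coefficients).zipWithIndex.foreach {
      case ((term, weight), i) => println(s"$i\t$term\t$weight")
    }
  }
}
```

    This only works for CountVectorizer because the model exposes its
    vocabulary directly. Attribute metadata on the output column would make the
    same lookup possible for vectors produced by other transformers.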


---
