[GitHub] spark pull request #20313: [SPARK-22974][ML] Attach attributes to output col...

PowerToThePeople111 Thu, 09 Aug 2018 08:34:16 -0700

Github user PowerToThePeople111 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20313#discussion_r208977114
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
    @@ -264,7 +265,9 @@ class CountVectorizerModel(
     
           Vectors.sparse(dictBr.value.size, effectiveCounts)
         }
    -    dataset.withColumn($(outputCol), vectorizer(col($(inputCol))))
    +    val attrs = vocabulary.map(_ => new NumericAttribute).asInstanceOf[Array[Attribute]]
    --- End diff --
    
    I do not think that this information is totally useless: knowing which 
feature-vector index (created by a CountVectorizer) corresponds to which LR 
coefficient, for example, is very helpful. In general, it should be easy to 
get this information from any vector that was created by a properly 
implemented feature-generation transformer.
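
    To illustrate the lookup, here is a minimal sketch in plain Scala with no Spark dependency. It assumes the term list would come from `CountVectorizerModel.vocabulary` and the weights from `LogisticRegressionModel.coefficients.toArray`; the names `FeatureLookup`, `FeatureWeight` and `describe` are hypothetical and not part of this PR.

```scala
// Sketch only: pairs each feature-vector index with its term and LR coefficient.
// `vocabulary` stands in for CountVectorizerModel.vocabulary and `coefficients`
// for LogisticRegressionModel.coefficients.toArray (assumed inputs).
object FeatureLookup {
  // Hypothetical record type introduced for this illustration.
  final case class FeatureWeight(index: Int, term: String, coefficient: Double)

  def describe(vocabulary: Array[String], coefficients: Array[Double]): Seq[FeatureWeight] = {
    // The vector index is the position in the vocabulary, so both arrays must align.
    require(vocabulary.length == coefficients.length,
      s"expected ${vocabulary.length} coefficients, got ${coefficients.length}")
    vocabulary.indices.map(i => FeatureWeight(i, vocabulary(i), coefficients(i)))
  }
}
```

    With attributes attached to the output column, the same mapping could in principle be read from the column metadata instead of requiring access to the fitted model.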



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
