[GitHub] spark pull request #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to ...

huaxingao Wed, 14 Mar 2018 16:06:51 -0700

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20777#discussion_r174636559
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -679,6 +679,29 @@ def test_count_vectorizer_with_binary(self):
                 feature, expected = r
                 self.assertEqual(feature, expected)
     
    +    def test_count_vectorizer_with_maxDF(self):
    +        dataset = self.spark.createDataFrame([
    +            (0, "a b c d".split(' '), SparseVector(3, {0: 1.0, 1: 1.0, 2: 1.0}),),
    +            (1, "a b c".split(' '), SparseVector(3, {0: 1.0, 1: 1.0}),),
    +            (2, "a b".split(' '), SparseVector(3, {0: 1.0}),),
    +            (3, "a".split(' '), SparseVector(3,  {}),)], ["id", "words", "expected"])
    +        cv = CountVectorizer(inputCol="words", outputCol="features")
    +        model1 = cv.setMaxDF(3).fit(dataset)
    --- End diff --
    
    Hi Bryan, thanks for your comments. I will make these changes.
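
For readers following the thread, here is a rough pure-Python sketch of what the quoted test appears to check: with maxDF set to 3, a term that occurs in more than 3 documents ("a", which is in all 4) is dropped from the vocabulary, and the remaining terms are indexed by descending corpus count. This is not Spark's implementation. The helper names `fit_vocabulary` and `transform` are hypothetical, only the absolute-count form of the threshold is handled, and the alphabetical tie-break is an assumption made to keep the output deterministic.

```python
from collections import Counter


def fit_vocabulary(docs, max_df):
    """Hypothetical helper: keep terms found in at most max_df documents."""
    doc_freq = Counter()    # number of documents containing each term
    term_count = Counter()  # total occurrences of each term in the corpus
    for words in docs:
        term_count.update(words)
        doc_freq.update(set(words))
    kept = [t for t in term_count if doc_freq[t] <= max_df]
    # Most frequent term gets index 0; ties broken alphabetically (assumption).
    return sorted(kept, key=lambda t: (-term_count[t], t))


def transform(words, vocab):
    """Hypothetical helper: sparse {index: count} vector for one document."""
    index = {t: i for i, t in enumerate(vocab)}
    counts = Counter(w for w in words if w in index)
    return {index[t]: float(c) for t, c in counts.items()}


# Same four documents as in the quoted test.
docs = ["a b c d".split(' '), "a b c".split(' '), "a b".split(' '), "a".split(' ')]
vocab = fit_vocabulary(docs, max_df=3)
vectors = [transform(words, vocab) for words in docs]
```

Under these assumptions the vocabulary has three entries, which matches the size-3 SparseVector values in the "expected" column of the test data, and the last document ("a" only) maps to an empty vector.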



