Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20446#discussion_r165568368
  
    --- Diff: docs/ml-statistics.md ---
    @@ -89,4 +89,26 @@ Refer to the [`ChiSquareTest` Python docs](api/python/index.html#pyspark.ml.stat
     {% include_example python/ml/chi_square_test_example.py %}
     </div>
     
    +</div>
    +
    +## Summarizer
    +
    +We provide vector column summary statistics for `Dataframe` through `Summarizer`.
    +Available metrics are the column-wise max, min, mean, variance, and number of nonzeros, as well as the total count.
    +
    +<div class="codetabs">
    +<div data-lang="scala" markdown="1">
    +The following example demonstrates using [`Summarizer`](api/scala/index.html#org.apache.spark.ml.stat.Summarizer$)
    +to compute the mean and variance for the input dataframe, with and without a weight column.
    --- End diff --
    
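    Sorry, one more comment here.
    
    I think this should perhaps read "... to compute the mean and variance for a vector column of the input dataframe ...", since `Summarizer` summarizes a vector column rather than the dataframe as a whole.
    
    (The same applies below.)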
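
To make the distinction concrete, here is a minimal plain-Python sketch (not Spark's API) of what "column-wise mean and variance for a vector column" means: each row holds one vector, and the metrics are computed per vector component, optionally weighted by a second column. The function name `summarize`, the sample data, and the weighted-variance estimator (reliability weights, which reduces to the usual n - 1 sample variance when all weights are 1) are assumptions made for illustration; nothing in this thread specifies them.

```python
from typing import List, Optional, Sequence, Tuple


def summarize(
    vectors: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[List[float], List[float]]:
    """Return component-wise (mean, variance) over one column of vectors.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    if not vectors:
        raise ValueError("need at least one vector")
    if weights is None:
        # No weight column: every row counts equally.
        weights = [1.0] * len(vectors)

    w_sum = sum(weights)
    w_sq_sum = sum(w * w for w in weights)
    dim = len(vectors[0])

    # Weighted mean of each vector component.
    means = [
        sum(w * v[j] for v, w in zip(vectors, weights)) / w_sum
        for j in range(dim)
    ]

    # Assumed estimator: reliability-weights correction.
    # With unit weights the denominator is n - 1.
    denom = w_sum - w_sq_sum / w_sum
    if denom <= 0:
        # A single effective observation has no spread to estimate.
        return means, [0.0] * dim

    variances = [
        sum(w * (v[j] - means[j]) ** 2 for v, w in zip(vectors, weights)) / denom
        for j in range(dim)
    ]
    return means, variances


# One "vector column" with two rows, plus a matching weight column.
features = [(2.0, 3.0, 5.0), (4.0, 6.0, 7.0)]
weights = [1.0, 2.0]

print("without weight:", summarize(features))
print("with weight:   ", summarize(features, weights))
```

The result has one entry per vector component, not one per dataframe column, which is the point of the suggested wording.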


---
