GitHub user Dooyoung-Hwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22219#discussion_r213200433
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -3237,6 +3237,28 @@ class Dataset[T] private[sql](
         files.toSet.toArray
       }
     
    +  /**
    +   * Returns a tuple of the row count and an iterator containing all rows in this Dataset.
    +   *
    +   * The iterator consumes as much driver memory as the total size of the serialized results,
    +   * which can be limited with the config 'spark.driver.maxResultSize'. Rows are deserialized
    +   * lazily while iterating with the returned iterator. Callers can decide whether to collect
    +   * all deserialized rows at once or to iterate them incrementally, based on the total row
    +   * count and the available driver memory.
    +   */
    +  def collectCountAndIterator(): (Long, Iterator[T]) =
    --- End diff --
    
    Ok. I agree with you.
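
    For context, a minimal driver-side usage sketch of the proposed
    collectCountAndIterator(). The method exists only in this diff and is not
    part of the released Dataset API; the session setup, example data, and the
    10000-row threshold are illustrative assumptions.

        import org.apache.spark.sql.SparkSession

        object CollectCountAndIteratorSketch {
          def main(args: Array[String]): Unit = {
            val spark = SparkSession.builder()
              .appName("collectCountAndIterator-sketch")  // hypothetical app name
              .master("local[*]")
              .getOrCreate()
            import spark.implicits._

            val ds = (0 until 100000).toDS()  // example data, for illustration only

            // Proposed in this PR: returns the row count together with an iterator
            // over all rows; the serialized results live in driver memory (bounded
            // by spark.driver.maxResultSize) and rows are deserialized lazily
            // while iterating.
            val (count, it) = ds.collectCountAndIterator()

            if (count < 10000) {
              // Small result: materializing every deserialized row at once is fine.
              println(s"Collected ${it.toArray.length} rows eagerly")
            } else {
              // Large result: iterate incrementally instead of holding all
              // deserialized rows in memory at the same time.
              it.take(5).foreach(println)
              println(s"Total rows available: $count")
            }

            spark.stop()
          }
        }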


---
