Xiao Li created SPARK-21510:
-------------------------------

             Summary: Add isMaterialized() and eager persist() to Dataset APIs
                 Key: SPARK-21510
                 URL: https://issues.apache.org/jira/browse/SPARK-21510
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Xiao Li
            Assignee: Xiao Li


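Currently, Spark beginners often do not realize that our persist() API is 
lazy: it only marks a Dataset for caching, and nothing is cached until an 
action runs. They also do not know the most efficient way to materialize a 
persisted Dataset. Sometimes they just call collect(), which is very 
expensive when the dataset is large, because every row is sent back to the 
driver.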
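The lazy behavior can be illustrated with a small, self-contained toy model. This is a sketch under stated assumptions, not Spark's implementation. `ToyDataset`, its `computations` counter and the `eager` flag are hypothetical names. The model shows three things:

- `persist()` alone computes nothing.
- A cheap action such as `count()` fills the cache.
- `collect()` copies every row back to the caller.

With today's Spark APIs, a common workaround is likewise to call `count()` right after `persist()`.

```python
# Toy model (hypothetical, NOT Spark's API): persist() only marks the
# dataset for caching; the cache is filled by the first action that runs.
class ToyDataset:
    def __init__(self, partitions, transform):
        self._partitions = partitions  # list of lists of source rows
        self._transform = transform    # per-row function, applied lazily
        self._persisted = False
        self._cache = {}               # partition index -> computed rows
        self.computations = 0          # number of partitions computed so far

    def persist(self, eager=False):
        # Lazy by default: just record that caching was requested.
        self._persisted = True
        if eager:
            # Hypothetical eager mode: run a cheap action on the user's behalf.
            self.count()
        return self

    def _compute(self, i):
        if i in self._cache:
            return self._cache[i]      # served from the cache, no recomputation
        self.computations += 1
        rows = [self._transform(r) for r in self._partitions[i]]
        if self._persisted:
            self._cache[i] = rows
        return rows

    def count(self):
        # Touches every partition but returns only a number to the caller.
        return sum(len(self._compute(i)) for i in range(len(self._partitions)))

    def collect(self):
        # Touches every partition AND copies every row back to the caller.
        return [r for i in range(len(self._partitions)) for r in self._compute(i)]


ds = ToyDataset([[1, 2], [3, 4], [5]], lambda x: x * 10)
ds.persist()
print(ds.computations)  # 0: persist() alone computed nothing
print(ds.count())       # 5
print(ds.computations)  # 3: every partition computed and cached
print(ds.collect())     # [10, 20, 30, 40, 50], served from the cache
print(ds.computations)  # still 3
```

The hypothetical `eager=True` path mirrors the proposed eager persist(). It simply runs the inexpensive action immediately instead of leaving that step to the user.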

In addition, we need an API to check whether a Dataset has actually been 
cached and materialized, rather than only marked for caching.
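Below is a minimal sketch of the distinction an isMaterialized() check would need to draw. It again uses a hypothetical toy model rather than Spark's API. `ToyCachedDataset`, `is_cached` and `is_materialized` are illustrative names only.

Marking a Dataset for caching is not the same as having its data in the cache. For example, a limit-style action such as `take(n)` can stop after the first partitions and leave the cache only partly filled. Spark's existing `Dataset.storageLevel` reports only the requested storage level, which roughly corresponds to `is_cached()` here.

```python
# Toy model (hypothetical, NOT Spark's API) of cached vs. materialized.
class ToyCachedDataset:
    def __init__(self, partitions):
        self._partitions = partitions  # list of lists of rows
        self._persisted = False
        self._cache = {}               # partition index -> cached rows

    def persist(self):
        self._persisted = True         # lazy: marks only, computes nothing
        return self

    def _compute(self, i):
        rows = self._cache.get(i)
        if rows is None:
            rows = list(self._partitions[i])
            if self._persisted:
                self._cache[i] = rows
        return rows

    def take(self, n):
        # Stops as soon as it has n rows, so later partitions may never be
        # computed (and therefore never cached).
        out = []
        for i in range(len(self._partitions)):
            if len(out) >= n:
                break
            out.extend(self._compute(i))
        return out[:n]

    def count(self):
        return sum(len(self._compute(i)) for i in range(len(self._partitions)))

    def is_cached(self):
        # Only says that caching was requested.
        return self._persisted

    def is_materialized(self):
        # True only when every partition is actually held in the cache.
        return self._persisted and len(self._cache) == len(self._partitions)


ds = ToyCachedDataset([[1, 2], [3], [4, 5]]).persist()
print(ds.is_cached(), ds.is_materialized())  # True False
ds.take(1)
print(ds.is_materialized())                  # False: only partition 0 computed
ds.count()
print(ds.is_materialized())                  # True
```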



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
