planga82 edited a comment on issue #26082: [SPARK-29431][WebUI] Improve Web UI / Sql tab visualization with cached dataframes.
URL: https://github.com/apache/spark/pull/26082#issuecomment-551979197
 
 
   By a "partial" dataframe I mean the following: if you cache a dataframe and 
the first action you run on it is show(10), only part of the dataframe (the 
partitions needed to produce those rows) is cached. Running count() on the 
cached dataframe afterwards causes the remaining partitions to be cached as 
well. This is what I'm trying to show with the images.
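   To make the show(10)-then-count() behaviour concrete, here is a minimal sketch (not part of this PR) that counts how many source rows Spark actually computes rather than reads from the cache. It assumes a local SparkSession and uses a LongAccumulator incremented inside map() as a stand-in for inspecting the cache; the object name `PartialCacheDemo`, the accumulator name and the 8-partition range are hypothetical choices for illustration. If caching behaves as described above, show(10) should compute far fewer than 1,000,000 rows, and count() should compute only the partitions that show(10) skipped, so the cumulative total stays at 1,000,000 instead of recomputing everything.

```scala
import org.apache.spark.sql.SparkSession

object PartialCacheDemo {
  /** Returns (rows computed by show(10), rows computed in total after count()). */
  def run(spark: SparkSession): (Long, Long) = {
    import spark.implicits._

    // Hypothetical counter: incremented once per source row that is actually
    // computed, i.e. not served from the cache. (Accumulators updated inside
    // transformations can over-count on task retries; not expected locally.)
    val computed = spark.sparkContext.longAccumulator("rowsComputed")

    // 1,000,000 rows in 8 partitions (125,000 rows each), cached lazily.
    val ds = spark.range(0L, 1000000L, 1L, 8)
      .map { x => computed.add(1L); x.longValue() }
      .cache()

    ds.show(10)              // only needs enough partitions to produce 10 rows
    val afterShow = computed.sum

    ds.count()               // computes and caches the partitions show(10) skipped
    val afterCount = computed.sum

    (afterShow, afterCount)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("partial-cache-demo")
      .getOrCreate()
    try {
      val (afterShow, afterCount) = run(spark)
      println(s"rows computed after show(10): $afterShow")
      println(s"rows computed after count():  $afterCount")
    } finally {
      spark.stop()
    }
  }
}
```

   While this runs, the Storage tab of the Web UI should show the same dataset going from partially to fully cached between the two actions.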
   
   Here is an example using a simple parquet file:
   `sc.parallelize(1 to 100000).toDF("x").withColumn("x1", col("x") + 1).write.parquet("test.parquet")`
   `val df = spark.read.parquet("test.parquet").filter(col("x") < 100).cache()`
   `df.filter(col("x") < 50).count`
   
![image](https://user-images.githubusercontent.com/12819544/68508497-d287e180-026e-11ea-9c9a-ee51bb41a411.png)
   Running the same query a second time:
   `df.filter(col("x") < 50).count`
   
![image](https://user-images.githubusercontent.com/12819544/68508527-e6334800-026e-11ea-986b-7126c35fcf53.png)

