planga82 commented on issue #26082: [SPARK-29431][WebUI] Improve Web UI / Sql tab visualization with cached dataframes.
URL: https://github.com/apache/spark/pull/26082#issuecomment-551979197

By "partial dataframe" I mean the following. Suppose you cache a dataframe and the first action you run on it is `show(10)`. Only the part of the dataframe needed to produce those rows gets cached. If you then run `count()` on the cached dataframe, the remaining elements get cached as well. This is what I'm trying to show with the images.

Here is an example with a simple Parquet file:

`sc.parallelize(1 to 100000).toDF("x").withColumn("x1", col("x") + 1).write.parquet("test.parquet")`

`val df = spark.read.parquet("test.parquet").filter(col("x") < 100).cache()`

`df.filter(col("x") < 50).count`

![image](https://user-images.githubusercontent.com/12819544/68508497-d287e180-026e-11ea-9c9a-ee51bb41a411.png)

`df.filter(col("x") < 50).count`

![image](https://user-images.githubusercontent.com/12819544/68508527-e6334800-026e-11ea-986b-7126c35fcf53.png)
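To watch the partial caching described above directly, rather than only through the SQL tab, one option is to ask the SparkContext how many partitions of each persisted RDD are currently cached. This is a minimal sketch, assuming a spark-shell session (so `spark` and `sc` are predefined). It uses the developer API `sc.getRDDStorageInfo`. `printCacheStatus` is a hypothetical helper, and the `clearCache()` call and `mode("overwrite")` are additions so the snippet can be re-run. Exact partition counts depend on how many files the write produced, so no output is shown.

```scala
// Run inside spark-shell, where `spark` and `sc` are predefined.
import org.apache.spark.sql.functions.col
import spark.implicits._

// Start from a clean slate so previously cached DataFrames do not show up in the status.
spark.catalog.clearCache()

// Same data as in the comment; mode("overwrite") lets the snippet be re-run.
sc.parallelize(1 to 100000).toDF("x").withColumn("x1", col("x") + 1).write.mode("overwrite").parquet("test.parquet")

val df = spark.read.parquet("test.parquet").filter(col("x") < 100).cache()

// Hypothetical helper: prints cached vs. total partitions for every persisted RDD
// that currently has at least one cached partition.
def printCacheStatus(): Unit =
  sc.getRDDStorageInfo.foreach { info =>
    println(s"${info.name}: ${info.numCachedPartitions}/${info.numPartitions} partitions cached")
  }

df.show(10)        // stops once it has enough rows, so on multi-partition input usually only some partitions are cached
printCacheStatus()

df.count()         // scans every partition, so the remaining ones get cached too
printCacheStatus()
```

With several input partitions, the first status line will usually report fewer cached partitions than the total, and after `count()` every partition should be cached. The Storage tab of the Web UI reports the same per-RDD fraction.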