Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21070#discussion_r184156731

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala ---
    @@ -503,7 +503,7 @@ class InMemoryColumnarQuerySuite extends QueryTest with SharedSQLContext {
             case plan: InMemoryRelation => plan
           }.head
           // InMemoryRelation's stats is file size before the underlying RDD is materialized
    -      assert(inMemoryRelation.computeStats().sizeInBytes === 740)
    +      assert(inMemoryRelation.computeStats().sizeInBytes === 800)
    --- End diff --

    This is data-dependent, so it is hard to estimate. We write the stats for older readers only when the type uses a signed sort order, so they are limited to mostly primitive types and won't be written for byte arrays or UTF8 data. That limits the size to 16 bytes + thrift overhead per page, and you might have about 100 pages per row group. So that is roughly 1.5k per 128MB, which is about 0.001%.
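
The size estimate in the comment can be checked with a little arithmetic. The sketch below is only a back-of-envelope illustration, not code from the pull request: the object and method names are hypothetical, the inputs are the rough figures quoted in the comment (16 bytes per page, about 100 pages per row group, a 128MB row group), and the thrift overhead is left as a parameter set to 0 because the comment does not give a number for it.

```scala
// Back-of-envelope check of the stats-size estimate quoted in the comment.
// All names are hypothetical; the inputs are the comment's rough figures.
object StatsOverheadEstimate {
  val StatsBytesPerPage: Long = 16L                  // "16 bytes" per page, from the comment
  val PagesPerRowGroup: Long = 100L                  // "about 100 pages per row group"
  val RowGroupBytes: Long = 128L * 1024 * 1024       // 128MB row group

  // Total stats bytes written for one row group.
  // thriftOverheadPerPage is an assumption: the comment gives no figure for it.
  def statsBytesPerRowGroup(perPage: Long, thriftOverheadPerPage: Long, pages: Long): Long =
    (perPage + thriftOverheadPerPage) * pages

  // Stats size as a percentage of the row group size.
  def overheadPercent(statsBytes: Long, rowGroupBytes: Long): Double =
    100.0 * statsBytes.toDouble / rowGroupBytes.toDouble

  def main(args: Array[String]): Unit = {
    val bytes = statsBytesPerRowGroup(StatsBytesPerPage, 0L, PagesPerRowGroup)
    println(s"stats bytes per row group: $bytes")
    println(f"overhead: ${overheadPercent(bytes, RowGroupBytes)}%.4f%%")
  }
}
```

With zero thrift overhead this gives 1600 bytes per row group, in line with the "1.5k" figure, and a ratio on the order of 0.001%. Any realistic per-page thrift overhead raises the number somewhat but does not change the order of magnitude.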