Re: Default size of a datatype in SparkSQL

2015-10-08 Thread Michael Armbrust
It's purely for estimation: it is used when guessing whether it's safe to do a broadcast join. We picked a random number that we thought was larger than the common case (it's better to overestimate, to avoid OOM).

On Wed, Oct 7, 2015 at 10:11 PM, vivek bhaskar wrote:
> I want to understand
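To make the reasoning concrete, here is a minimal, simplified model of how per-type default sizes can feed a "broadcast or not" decision. This is not Spark's planner code. The type names, the byte values in `DEFAULT_SIZE`, and the helper functions are all hypothetical placeholders. The threshold is modeled on `spark.sql.autoBroadcastJoinThreshold` (10 MB by default), but the comparison here is only an illustration.

```python
# Hypothetical per-type default sizes in bytes. These are illustrative
# placeholders, not necessarily the values Spark uses.
DEFAULT_SIZE = {
    "int": 4,
    "long": 8,
    "double": 8,
    "string": 4096,  # deliberately generous guess for variable-length data
}

# Hypothetical threshold, modeled on spark.sql.autoBroadcastJoinThreshold (10 MB).
BROADCAST_THRESHOLD = 10 * 1024 * 1024


def estimated_row_size(column_types):
    """Guess bytes per row by summing each column's default size."""
    return sum(DEFAULT_SIZE[t] for t in column_types)


def estimated_table_size(column_types, row_count):
    """Guess total bytes as (estimated row size) * (row count)."""
    return estimated_row_size(column_types) * row_count


def safe_to_broadcast(column_types, row_count, threshold=BROADCAST_THRESHOLD):
    """Broadcast only if the (over-)estimated size fits under the threshold."""
    return estimated_table_size(column_types, row_count) <= threshold


if __name__ == "__main__":
    schema = ["int", "string"]
    print(safe_to_broadcast(schema, 1_000))   # (4 + 4096) * 1000 = 4,100,000 bytes
    print(safe_to_broadcast(schema, 10_000))  # 41,000,000 bytes, over the threshold
```

With this model, a 10,000-row table of `(int, string)` is estimated at about 41 MB and is not broadcast, even if its strings actually average only a few bytes. That is the trade-off described above: a generous default may skip some broadcasts that would have been safe, but it makes it less likely that a table which is really too large gets broadcast and causes an OOM.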

Default size of a datatype in SparkSQL

2015-10-07 Thread vivek bhaskar
I want to understand what the default size of a given datatype is used for. The following link mentions that it's for internal size estimation: https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/DataType.html The above behavior is also reflected in the code, where the default value seems to be used