The Spark SQL documentation states:
Tables with buckets: bucket is the hash partitioning within a Hive table
partition. Spark SQL doesn’t support buckets yet

What exactly does that mean?

   - that writing to a bucketed table won't respect this feature, and the
   data will be written in a non-bucketed manner (see the sketch after this
   list)?
   - that reading from a bucketed table won't use this feature to improve
   performance?
   - both?
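
For concreteness, here is a minimal Scala sketch of the write scenario I
mean (table and column names are made up, and I'm assuming a Spark 1.x
HiveContext, since that is the era the quoted doc comes from):

import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)  // assumes an existing SparkContext `sc`

// A Hive table bucketed by user_id into 32 buckets (hypothetical names):
hc.sql("""CREATE TABLE events (user_id INT, payload STRING)
          PARTITIONED BY (dt STRING)
          CLUSTERED BY (user_id) INTO 32 BUCKETS""")

// The first question in code: does this INSERT hash-distribute rows into
// the 32 bucket files, or does Spark SQL write them un-bucketed?
hc.sql("""INSERT INTO TABLE events PARTITION (dt = '2015-01-01')
          SELECT user_id, payload FROM staging_events""")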

Also, even if bucketing is not supported for reading, do we benefit from
having a bucketed table just because of the way the data is laid out in
HDFS? If we read a bucketed table in Spark, is it more likely that data
from the same bucket will be processed by the same task/executor? (A rough
way to check this empirically is sketched below.)
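
For that second question, here is a rough, hypothetical way one could check
empirically (reusing the events table and 32 buckets from the sketch above,
and approximating how Hive computes the bucket id for an INT column):

// If every input partition contains rows from only one bucket, the splits
// are bucket-aligned; many distinct bucket ids per partition would suggest
// the bucketing layout is ignored on read.
val df = hc.table("events")
val bucketsSeen = df.rdd.mapPartitions { rows =>
  val ids = rows.map(r => (r.getInt(0) & Int.MaxValue) % 32).toSet
  Iterator(ids.size)  // number of distinct bucket ids in this partition
}.collect()
println(bucketsSeen.mkString(", "))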
