Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by AMammenT:
http://wiki.apache.org/hadoop/Hive/LanguageManual/Sampling

The comment on the change is:
Clear up confusion around cluster vs. bucket and how they interact.

------------------------------------------------------------------------------
  So in the above example, if table 'source' was created with 'CLUSTERED BY id INTO 32 BUCKETS'
  {{{
- TABLESAMPLE(BUCKET 3 OUT OF 16)
+ TABLESAMPLE(BUCKET 3 OUT OF 16 ON id)
  }}}
- would pick out the 3rd and 19th buckets.
+ would pick out the 3rd and 19th clusters as each bucket would be composed of (32/16)=2 clusters.
  On the other hand the tablesample clause
  {{{
  TABLESAMPLE(BUCKET 3 OUT OF 64 ON id)
  }}}
- would pick out half of the 3rd bucket.
+ would pick out half of the 3rd cluster as each bucket would be composed of (32/64)=1/2 of a cluster.
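
The cluster arithmetic in the changed text can be checked with a small model. The Python sketch below is not Hive code; it assumes that a row lands in cluster (hash(id) mod 32) + 1 and that BUCKET x OUT OF y keeps rows with hash(id) mod y = x - 1, which is one reading consistent with the numbers quoted above. The function name sampled_clusters and its parameters are hypothetical.

```python
from fractions import Fraction
from math import gcd


def sampled_clusters(x, y, num_clusters):
    """Model TABLESAMPLE(BUCKET x OUT OF y ON col) on a table clustered
    into num_clusters on the same column.

    Assumption (not taken from Hive source): a row with hash value h is
    stored in cluster (h % num_clusters) + 1 and is sampled when
    h % y == x - 1. Returns {cluster_number: fraction_of_cluster_read}.
    """
    if not 1 <= x <= y:
        raise ValueError("x must satisfy 1 <= x <= y")
    # Hash residues repeat with this period for both modulo operations.
    period = num_clusters * y // gcd(num_clusters, y)
    residues_per_cluster = period // num_clusters
    hits = {}
    for h in range(period):
        if h % y == x - 1:
            cluster = h % num_clusters + 1
            hits[cluster] = hits.get(cluster, 0) + 1
    return {c: Fraction(n, residues_per_cluster) for c, n in sorted(hits.items())}


# BUCKET 3 OUT OF 16 on 32 clusters: all of clusters 3 and 19.
print(sampled_clusters(3, 16, 32))
# BUCKET 3 OUT OF 64 on 32 clusters: half of cluster 3.
print(sampled_clusters(3, 64, 32))
```

Under these assumptions the model reproduces both cases from the page: two whole clusters when 16 divides 32, and half of one cluster when the sample asks for more buckets than the table has clusters.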