[Hadoop Wiki] Update of "Hive/LanguageManual/Sampling" by AMammenT

Apache Wiki Thu, 06 Aug 2009 16:37:12 -0700


The following page has been changed by AMammenT:
http://wiki.apache.org/hadoop/Hive/LanguageManual/Sampling

The comment on the change is:
Clear up confusion around cluster vs. bucket and how they interact.  

------------------------------------------------------------------------------
  
  So in the above example, if table 'source' was created with 'CLUSTERED BY id INTO 32 BUCKETS' 
  {{{
-     TABLESAMPLE(BUCKET 3 OUT OF 16) 
+     TABLESAMPLE(BUCKET 3 OUT OF 16 ON id) 
  }}}
- would pick out the 3rd and 19th buckets. 
+ would pick out the 3rd and 19th clusters as each bucket would be composed of (32/16)=2 clusters. 
  
  On the other hand the tablesample clause
  {{{
      TABLESAMPLE(BUCKET 3 OUT OF 64 ON id) 
  }}}
- would pick out half of the 3rd bucket. 
+ would pick out half of the 3rd cluster as each bucket would be composed of (32/64)=1/2 of a cluster.
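
The cluster arithmetic in the two examples above can be sketched in a few lines of Python. This is a simplified model, not Hive's implementation: the function name `clusters_for_sample` is hypothetical, and the sketch assumes rows are assigned to clusters by hash modulo the cluster count, so that sample bucket x of y draws from the clusters whose 1-based index is congruent to x modulo y.

```python
from fractions import Fraction


def clusters_for_sample(x, y, num_clusters):
    """Model TABLESAMPLE(BUCKET x OUT OF y) on a table with num_clusters clusters.

    Returns (clusters, fraction): the 1-based cluster indices that are read and
    the fraction of each of those clusters that belongs to the sample.
    Assumption: cluster assignment is hash modulo num_clusters.
    """
    if not 1 <= x <= y:
        raise ValueError("x must satisfy 1 <= x <= y")
    if y <= num_clusters:
        # Each sample bucket is made of num_clusters / y whole clusters.
        if num_clusters % y != 0:
            raise ValueError("y must divide num_clusters in this sketch")
        return list(range(x, num_clusters + 1, y)), Fraction(1)
    # Each sample bucket is num_clusters / y of a single cluster.
    if y % num_clusters != 0:
        raise ValueError("y must be a multiple of num_clusters in this sketch")
    cluster = (x - 1) % num_clusters + 1
    return [cluster], Fraction(num_clusters, y)


# The two cases from the text, with 'source' clustered into 32 buckets.
print(clusters_for_sample(3, 16, 32))
print(clusters_for_sample(3, 64, 32))
```

The first call corresponds to BUCKET 3 OUT OF 16 (two whole clusters per bucket), the second to BUCKET 3 OUT OF 64 (half of one cluster per bucket).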
