I've got it, thank you!

On Thu, Jan 8, 2009 at 9:17 PM, Jeremy Chow <[email protected]> wrote:
> Is that the same as hash partitioning?
>
> On Thu, Jan 8, 2009 at 4:52 PM, Jeff Hammerbacher <[email protected]> wrote:
>
>> Hey Jeremy,
>>
>> Hive stores each "table" inside of HDFS in a folder. For example, all of
>> your weblogs could be stored in a folder called "/hive/weblogs". If you want
>> to partition those weblogs by day, you can use the PARTITIONED BY clause on
>> the CREATE TABLE statement to create a subfolder for each new day, e.g.
>> "/hive/weblogs/ds=2009-01-08". If you want to further partition a day's
>> logfiles by userid, for example, Hive can hash partition your logfiles into
>> "buckets" (subfolders) inside that day's folder, e.g.
>> "/hive/weblogs/ds=2009-01-08/0001", where 0001 is the name of the bucket. To
>> indicate that you want buckets, use the CLUSTERED BY clause on the
>> CREATE TABLE statement (see
>> http://wiki.apache.org/hadoop/Hive/HiveQL#head-6fb42f2747383d4375e56cc31bbae68860c88a3d
>> ).
>>
>> You can also use buckets with the TABLESAMPLE operator to run Hive queries
>> over subsets of your data; this is useful for rapidly prototyping new
>> analyses. See
>> http://wiki.apache.org/hadoop/Hive/HiveQL#head-c7c5e4391816048d290eb70091487b4f91beebc9
>> for the TABLESAMPLE syntax.
>>
>> Hive folks: in case I butchered that, feel free to jump in with a more
>> correct explanation. If it's correct, I'll toss it on the wiki. It would be
>> good to have actual HiveQL statements using buckets in the getting started
>> guide too, I'd imagine.
>>
>> Later,
>> Jeff
>>
>> On Thu, Jan 8, 2009 at 12:21 AM, Jeremy Chow <[email protected]> wrote:
>>
>>> Hi list,
>>>
>>> I came across the term "bucket" while reading the Hive source code. What
>>> does it mean?
>>>
>>> Thanks,
>>> Jeremy
>>> --
>>> My research interests are distributed systems, parallel computing and
>>> bytecode based virtual machine.
>>>
>>> http://coderplay.javaeye.com
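
For anyone who finds this thread later: the folder layout Jeff describes (a table folder, one subfolder per day, and hash buckets inside each day) can be sketched roughly as below. This is only a toy model in Python, not Hive's code. The bucket count of 32, the "userid mod N" hash, the four-digit bucket names, and all function names are assumptions made for illustration; the thread does not say how Hive actually hashes or names buckets beyond the "0001" example.

```python
# Toy model of the layout described in the thread; NOT Hive's implementation.
NUM_BUCKETS = 32  # assumed bucket count; the thread does not give one


def bucket_for(userid: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Pick a bucket from the clustering column (assumed here: userid mod N)."""
    return userid % num_buckets


def bucket_path(table_dir: str, ds: str, userid: int,
                num_buckets: int = NUM_BUCKETS) -> str:
    """Build '<table>/ds=<day>/<bucket>' with an assumed zero-padded bucket name."""
    bucket = bucket_for(userid, num_buckets)
    return f"{table_dir}/ds={ds}/{bucket:04d}"


def sample_bucket(rows, wanted_bucket: int, num_buckets: int = NUM_BUCKETS):
    """Keep only rows that fall in one bucket, i.e. roughly 1/N of the data.

    This mimics the idea behind sampling by bucket: a query can read a single
    bucket instead of the whole day's folder.
    """
    return [r for r in rows if bucket_for(r["userid"], num_buckets) == wanted_bucket]


if __name__ == "__main__":
    print(bucket_path("/hive/weblogs", "2009-01-08", 33))
    rows = [{"userid": 1}, {"userid": 2}, {"userid": 33}]
    print(sample_bucket(rows, wanted_bucket=1))
```

The point of the sketch is that the day partition is chosen by a column value (ds), while the bucket is chosen by a hash of another column, so rows for the same userid always land in the same bucket within a day.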
