The "Hive/IndexDev/Bitmap" page has been changed by MarquisWang.
http://wiki.apache.org/hadoop/Hive/IndexDev/Bitmap?action=diff&rev1=3&rev2=4

--------------------------------------------------

  
  This implementation confers some of the benefits of bitmap indexing and 
should be easy to implement given the already existing compact index, but it 
does few of the optimizations such as compression that a really good bitmap 
index should do.
  
- Like the complex index, this implementation uses an index table. The index 
table on a column "key" has three columns, _bucketname, _offset, and _bitmaps. 
_bucketname is a string pointing to the hadoop file that is storing this block 
in the table, _offset is the block offset of a block, and _bitmaps is a Map 
where the keys are all the values of the column "key" that exist in this block 
and a bitmap encoding (an Array of BigInts??) of every row in that block, with 
a 1 if that row has the value, 0 if not. If a key value does not appear in a 
block at all, the value is not stored in the map.
+ Like the compact index, this implementation uses an index table. The index 
table on a column "key" has four or more columns: first, the columns that are 
being indexed, then _bucketname, _offset, and _bitmaps. _bucketname is a string 
pointing to the hadoop file that is storing this block in the table, _offset is 
the block offset of a block, and _bitmaps is an uncompressed bitmap encoding 
(an Array of bytes) of the bitmap for this column value, bucketname, and row 
offset. Each bit in the bitmap corresponds to one row in the block. The bit is 
1 if that row has the value of the values in the columns being indexed, and a 0 
if not. If a key value does not appear in a block at all, the value is not 
stored in the map.
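
As an illustration of the per-block encoding described above, here is a minimal sketch in Python. It is not the Hive implementation: the function name build_block_bitmaps, the input shape (a list of indexed-column values in row order for one block), and the least-significant-bit-first ordering within each byte are all assumptions made for the example.

```python
def build_block_bitmaps(rows):
    """Build one uncompressed bitmap per distinct value in a single block.

    rows: values of the indexed column, in row order (hypothetical input shape).
    Returns {value: bytes}; bit i is 1 if row i holds that value.
    Values that never occur in the block get no entry at all.
    """
    nbytes = (len(rows) + 7) // 8  # one bit per row, rounded up to whole bytes
    bitmaps = {}
    for i, value in enumerate(rows):
        bitmap = bitmaps.setdefault(value, bytearray(nbytes))
        # Assumed bit order: row i maps to bit (i % 8) of byte (i // 8).
        bitmap[i // 8] |= 1 << (i % 8)
    return {value: bytes(bitmap) for value, bitmap in bitmaps.items()}


block_rows = ["a", "b", "a", "c"]
block_bitmaps = build_block_bitmaps(block_rows)
```

Each resulting entry would correspond to one index table row: the indexed value, then _bucketname and _offset for the block, then the byte array as _bitmaps.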
  
- When querying this index, we select each filename, block pair where the 
_bitmaps Map has a key that is the queried key value. If there are boolean AND 
or OR operations done on the predicates with bitmap indexes, we can use bitwise 
operations to try to eliminate blocks as well. We can use this data to generate 
the filename, array of block offsets format that the compact index handler uses 
and reuse that in the bitmap index query.
+ When querying this index, if there are boolean AND or OR operations done on 
the predicates with bitmap indexes, we can use bitwise operations to try to 
eliminate blocks as well. We can then eliminate blocks that do not contain the 
value combinations we are interested in. We can use this data to generate the 
filename, array of block offsets format that the compact index handler uses and 
reuse that in the bitmap index query.
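
The block elimination described above can be sketched as follows. This is a rough illustration under stated assumptions, not the actual query path: the function name combine_predicates and the dictionary shapes are hypothetical, and a block with no index entry for a predicate is treated as an all-zero bitmap.

```python
import operator


def combine_predicates(entries_a, entries_b, op):
    """Combine two predicates' bitmaps block by block and drop empty blocks.

    entries_a, entries_b: {(bucketname, offset): bytes} for the index rows
    matching each predicate (hypothetical shape).
    op: operator.and_ for AND, operator.or_ for OR.
    Returns {bucketname: [offsets]}, the filename plus array-of-offsets
    shape that the text says the compact index handler uses.
    """
    blocks = {}
    for key in entries_a.keys() | entries_b.keys():
        a = entries_a.get(key)
        b = entries_b.get(key)
        size = len(a if a is not None else b)
        # A missing entry means the value never occurs in that block.
        a = a if a is not None else bytes(size)
        b = b if b is not None else bytes(size)
        combined = bytes(op(x, y) for x, y in zip(a, b))
        if any(combined):  # keep the block only if some row still matches
            bucketname, offset = key
            blocks.setdefault(bucketname, []).append(offset)
    return {name: sorted(offsets) for name, offsets in blocks.items()}


pred_a = {("f1", 0): b"\x05", ("f1", 100): b"\x01"}
pred_b = {("f1", 0): b"\x02", ("f1", 100): b"\x01", ("f2", 0): b"\x01"}
and_blocks = combine_predicates(pred_a, pred_b, operator.and_)
or_blocks = combine_predicates(pred_a, pred_b, operator.or_)
```

With AND, block ("f1", 0) is dropped because no single row satisfies both predicates, even though each predicate matches some row in it. That is the extra pruning a per-row bitmap allows over a plain list of block offsets.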
  
  === Second iteration ===
  
