The "Hive/IndexDev/Bitmap" page has been changed by MarquisWang.
http://wiki.apache.org/hadoop/Hive/IndexDev/Bitmap?action=diff&rev1=3&rev2=4

--------------------------------------------------

  This implementation confers some of the benefits of bitmap indexing and should be easy to implement given the already existing compact index, but it performs few of the optimizations, such as compression, that a really good bitmap index should.

- Like the compact index, this implementation uses an index table. The index table on a column "key" has three columns, _bucketname, _offset, and _bitmaps. _bucketname is a string pointing to the Hadoop file that is storing this block in the table, _offset is the block offset of a block, and _bitmaps is a Map where the keys are all the values of the column "key" that exist in this block and a bitmap encoding (an Array of BigInts??) of every row in that block, with a 1 if that row has the value, 0 if not. If a key value does not appear in a block at all, the value is not stored in the map.
+ Like the compact index, this implementation uses an index table. The index table on a column "key" has four or more columns: first, the columns that are being indexed, then _bucketname, _offset, and _bitmaps. _bucketname is a string pointing to the Hadoop file that stores this block of the table, _offset is the block offset of the block, and _bitmaps is an uncompressed bitmap encoding (an Array of bytes) of the bitmap for this column value, bucketname, and row offset. Each bit in the bitmap corresponds to one row in the block. The bit is 1 if that row holds this index entry's values in the columns being indexed, and 0 if not. If a key value does not appear in a block at all, the value is not stored in the map.

- When querying this index, we select each filename, block pair where the _bitmaps Map has a key that is the queried key value. If there are boolean AND or OR operations done on the predicates with bitmap indexes, we can use bitwise operations to try to eliminate blocks as well. We can use this data to generate the filename, array of block offsets format that the compact index handler uses and reuse that in the bitmap index query.
+ When querying this index, if the predicates on bitmap-indexed columns are combined with boolean AND or OR, we can apply the corresponding bitwise operations to the bitmaps and then eliminate blocks that do not contain the value combinations we are interested in. We can use this data to generate the (filename, array of block offsets) format that the compact index handler uses and reuse that in the bitmap index query.

  === Second iteration ===
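
As a rough illustration of the index table and query described in the changed paragraphs above (the first-iteration design, not the second iteration), the following Python sketch builds one index row per (value, _bucketname, _offset) with an uncompressed byte-array bitmap, combines two predicates with bitwise AND or OR, and drops blocks whose combined bitmap is all zeros. The function names (build_index, lookup, combine, to_offsets), the in-memory row layout, the bit order, and the sample data are all assumptions made for illustration; none of this is Hive's actual index handler code.

```python
from collections import defaultdict


def build_index(rows_by_block):
    """Build hypothetical index rows (value, bucketname, offset, bitmap).

    rows_by_block maps (bucketname, offset) to the list of column values
    in that block, one entry per row.
    """
    index = []
    for (bucket, offset), values in rows_by_block.items():
        nbytes = (len(values) + 7) // 8
        bitmaps = {}
        for row, value in enumerate(values):
            bitmap = bitmaps.setdefault(value, bytearray(nbytes))
            bitmap[row // 8] |= 1 << (row % 8)  # bit i stands for row i
        # Values absent from the block get no index row at all.
        for value, bitmap in bitmaps.items():
            index.append((value, bucket, offset, bytes(bitmap)))
    return index


def lookup(index, value):
    """Return {(bucketname, offset): bitmap} for one queried value."""
    return {(b, o): bm for v, b, o, bm in index if v == value}


def combine(left, right, op):
    """Combine two lookups with bitwise 'and' or 'or', per block."""
    if op == "and":
        blocks = left.keys() & right.keys()
    else:
        blocks = left.keys() | right.keys()
    result = {}
    for block in blocks:
        a, b = left.get(block), right.get(block)
        if a is None:  # only reachable for 'or'
            a = bytes(len(b))
        if b is None:
            b = bytes(len(a))
        if op == "and":
            bitmap = bytes(x & y for x, y in zip(a, b))
        else:
            bitmap = bytes(x | y for x, y in zip(a, b))
        if any(bitmap):  # eliminate blocks with no matching row
            result[block] = bitmap
    return result


def to_offsets(blocks):
    """Convert to the filename -> list of block offsets format."""
    out = defaultdict(list)
    for bucket, offset in sorted(blocks):
        out[bucket].append(offset)
    return dict(out)


# Made-up sample data: two indexed columns over the same three blocks.
color = build_index({
    ("f1", 0): ["red", "blue", "red"],
    ("f1", 100): ["red", "blue"],
    ("f2", 0): ["blue"],
})
size = build_index({
    ("f1", 0): ["S", "S", "L"],
    ("f1", 100): ["L", "S"],
    ("f2", 0): ["L"],
})

red, small = lookup(color, "red"), lookup(size, "S")
both = to_offsets(combine(red, small, "and"))
either = to_offsets(combine(red, small, "or"))
print(both, either)
```

In this sample, block ("f1", 100) contains both "red" and "S" but never on the same row, so the AND of the two bitmaps is zero and the block is eliminated; an index that only recorded which values occur in each block could not rule it out.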
