GitHub user spanchamiamapr opened a pull request:
https://github.com/apache/drill/pull/122
DRILL-3492: Add support for encoding of Drill data types into byte-ordered
format
Description:
This change allows encoding/decoding of data from/to 'double', 'float',
'bigint', and 'int' data types to/from OrderedBytes format. It also allows
OrderedBytes-encoded row-keys to be stored in ascending as well as
descending order.
The following JIRA added the OrderedBytes encoding to HBase:
https://issues.apache.org/jira/browse/HBASE-8201
This encoding scheme will preserve the sort-order of the native data-type
when it is stored as sorted byte arrays on disk.
Thus, it will help the HBase storage plugin if the row-keys have been
encoded in OrderedBytes format.
This functionality allows us to prune the scan ranges, thus reading much
less data from the server.
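To illustrate the sort-order property, here is a minimal Python sketch for a signed 32-bit integer. It is not the HBase or Drill implementation: the function names are hypothetical, and the type-tag byte that the real OrderedBytes format prepends to each value is omitted for brevity.

```python
import struct

def encode_int32(value, descending=False):
    """Encode a signed 32-bit int so that byte order matches numeric order.

    Simplified sketch: no OrderedBytes header byte is written.
    """
    # Flip the sign bit and write big-endian, so negatives sort before positives.
    raw = struct.pack(">I", (value & 0xFFFFFFFF) ^ 0x80000000)
    if descending:
        # Inverting every byte reverses the resulting sort order.
        raw = bytes(b ^ 0xFF for b in raw)
    return raw

def decode_int32(raw, descending=False):
    """Inverse of encode_int32."""
    if descending:
        raw = bytes(b ^ 0xFF for b in raw)
    (unsigned,) = struct.unpack(">I", raw)
    flipped = unsigned ^ 0x80000000
    # Reinterpret the 32-bit pattern as a signed value.
    return flipped - (1 << 32) if flipped >= (1 << 31) else flipped
```

Sorting the encoded byte strings lexicographically then yields the same order as sorting the original integers (or the reverse order when descending=True), which is what makes range pruning on encoded row-keys possible.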
Testing Done:
Added a new unit-test class, TestOrderedBytesConvertFunctions.java, which
derives from the TestConvertFunctions.java class. Also added new test cases
to the TestHBaseFilterPushDown class that check whether filters are pushed
down correctly and whether the results are correct.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/spanchamiamapr/drill master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/122.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #122
----
commit e3642de28a5f03c702433fe581819acced7847a7
Author: spanchamia <[email protected]>
Date: 2015-07-29T23:59:31Z
DRILL-3364: Prune scan range if the filter is on the leading field with
byte comparable encoding
The change adds support to perform row-key range pruning when the row-key
prefix is interpreted as UINT4_BE, TIMESTAMP_EPOCH_BE, TIME_EPOCH_BE,
DATE_EPOCH_BE, or UINT8_BE encoded.
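As a rough illustration of the idea (not the code in this commit), the Python sketch below turns a filter on a UINT4_BE row-key prefix into scan bounds. The helper names are hypothetical. It assumes the usual HBase scan convention of an inclusive start row, an exclusive stop row, and an empty byte string for an open end.

```python
import struct

def uint4_be(value):
    """Big-endian unsigned 32-bit encoding; byte order equals numeric order."""
    return struct.pack(">I", value)

def scan_range(lower=None, upper=None):
    """Return (start_row, stop_row) for the filter lower <= prefix < upper.

    b"" stands for an unbounded side of the range.
    """
    start = uint4_be(lower) if lower is not None else b""
    stop = uint4_be(upper) if upper is not None else b""
    return start, stop

def rows_in_range(sorted_keys, start, stop):
    """Simulate a scan over sorted row-keys using the pruned range."""
    return [k for k in sorted_keys if k >= start and (not stop or k < stop)]
```

Because every row-key starting with a given prefix sorts between that prefix and the next larger prefix, only the rows inside the computed range need to be read.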
Testing Done: Added unit tests for the new feature, and also ran all
existing unit tests to make sure there is no regression.
commit 70e1f3b2ca0410748b9872535bb205651e86d6c9
Author: spanchamia <[email protected]>
Date: 2015-07-30T05:53:04Z
DRILL-3492: Add support for encoding/decoding to/from OrderedBytes
format
Description:
This change allows encoding/decoding of data from/to 'double', 'float',
'bigint', 'int' and 'utf8' data types to/from OrderedBytes format.
It also allows OrderedBytes-encoded row-keys to be stored in
ascending as well as descending order.
The following JIRA added the OrderedBytes encoding to HBase:
https://issues.apache.org/jira/browse/HBASE-8201
This encoding scheme will preserve the sort-order of the native
data-type when it is stored as sorted byte arrays on disk.
Thus, it will help the HBase storage plugin if the row-keys have been
encoded in OrderedBytes format.
This functionality allows us to prune the scan ranges, thus reading much
less data from the server.
Testing Done:
Added a new unit-test class, TestOrderedBytesConvertFunctions.java, which
derives from the TestConvertFunctions.java class.
Also added new test cases to the TestHBaseFilterPushDown class that check
whether filters are pushed down correctly and whether the results are
correct.
commit c9f8622b5cc0cf87dcdf88d73e608039556fedcb
Author: Smidth Panchamia <[email protected]>
Date: 2015-08-19T21:51:36Z
Merge remote-tracking branch 'apache/master'
Conflicts:
contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/CompareFunctionsProcessor.java
contrib/storage-hbase/src/test/java/org/apache/drill/hbase/HBaseTestsSuite.java
contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java
contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestTableGenerator.java
commit 2daacad4ca62e753bbcad7f3637512ca810ea491
Author: Smidth Panchamia <[email protected]>
Date: 2015-08-19T22:18:33Z
DRILL-3492:
* Remove repeated allocations of byte arrays and PositionedByteRange
objects on the heap (as suggested by Jason).
* Remove OrderedBytes encode/decode operations on UTF8 types.
Reasons:
1. These operations are slow and incur a lot of heap allocations.
2. UTF8 types maintain their natural sort order when stored as binary
arrays.
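The second reason can be checked with a small Python sketch (the helper name is hypothetical): sorting strings by their raw UTF-8 bytes gives the same order as sorting them by Unicode code point, which is how Python compares str values.

```python
def sort_by_utf8_bytes(strings):
    """Sort strings by the lexicographic order of their UTF-8 encodings."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))

words = ["zebra", "apple", "Émile", "éclair", "日本", "banana", "😀", "a"]
by_code_point = sorted(words)             # str comparison is by code point
by_utf8_bytes = sort_by_utf8_bytes(words)
```

Here by_code_point and by_utf8_bytes hold the same sequence, so no extra order-preserving encoding is needed for UTF8 row-keys stored in ascending order.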
commit 71b053006b587f39a47025302e7d3de8dcac482d
Author: Smidth Panchamia <[email protected]>
Date: 2015-08-19T22:27:02Z
DRILL-3492 - Remove test code that creates test tables with UTF8
OrderedBytes encoding.
----