GitHub user spanchamiamapr opened a pull request:

    https://github.com/apache/drill/pull/122

    DRILL-3492 - Add support for encoding of Drill data types into
    byte-ordered format

    Description:
    This change adds encoding and decoding of 'double', 'float', 'bigint',
    and 'int' values to and from the OrderedBytes format. It also allows
    OrderedBytes-encoded row-keys to be stored in ascending as well as
    descending order.
    
    The following JIRA added the OrderedBytes encoding to HBase:
    https://issues.apache.org/jira/browse/HBASE-8201
    
    This encoding scheme preserves the sort order of the native data type
    when values are stored as sorted byte arrays on disk.
    It therefore helps the HBase storage plugin when the row-keys have been
    encoded in OrderedBytes format.
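
To illustrate what "preserves the sort order" means, here is a minimal Java sketch of an order-preserving encoding for 'int' values, with an ascending and a descending variant. It is a simplified illustration only: the class and method names are hypothetical, and the byte layout is not the actual OrderedBytes wire format from HBASE-8201.

```java
// Hypothetical illustration; NOT the actual HBase OrderedBytes layout.
final class OrderPreservingIntSketch {

    // Encode a signed int as 4 big-endian bytes with the sign bit flipped, so
    // that unsigned lexicographic byte order matches numeric order.
    // For descending order every byte is inverted, which reverses the order.
    static byte[] encode(int value, boolean descending) {
        int bits = value ^ Integer.MIN_VALUE; // flip the sign bit
        byte[] out = new byte[4];
        for (int i = 0; i < 4; i++) {
            byte b = (byte) (bits >>> (24 - 8 * i)); // most significant byte first
            out[i] = descending ? (byte) ~b : b;
        }
        return out;
    }

    // Reverse of encode(): undo the optional inversion, then the sign-bit flip.
    static int decode(byte[] in, boolean descending) {
        int bits = 0;
        for (int i = 0; i < 4; i++) {
            int b = (descending ? ~in[i] : in[i]) & 0xFF;
            bits = (bits << 8) | b;
        }
        return bits ^ Integer.MIN_VALUE;
    }

    // Unsigned lexicographic comparison, the way sorted byte-array keys compare.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

With this sketch, the encoding of -5 sorts before the encoding of 3 in the ascending variant and after it in the descending variant, which is the property the storage plugin relies on when comparing row-keys as raw bytes.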
    
    This functionality allows us to prune the scan ranges, thus reading much
    less data from the server.
    
    Testing Done:
    Added a new unit-test class, TestOrderedBytesConvertFunctions.java, which
    derives from the TestConvertFunctions.java class. Also added new test
    cases to the TestHBaseFilterPushDown class that check whether filters are
    pushed down correctly and whether the results are correct.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/spanchamiamapr/drill master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/122.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #122
    
----
commit e3642de28a5f03c702433fe581819acced7847a7
Author: spanchamia <[email protected]>
Date:   2015-07-29T23:59:31Z

    DRILL-3364: Prune scan range if the filter is on the leading field with
    byte comparable encoding
    
    The change adds support to perform row-key range pruning when the
    row-key prefix is interpreted as UINT4_BE, TIMESTAMP_EPOCH_BE,
    TIME_EPOCH_BE, DATE_EPOCH_BE, or UINT8_BE encoded.
    
    Testing Done: Added unit tests for the new feature, also ran all
    existing unit-tests to make sure there is no regression.
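
As a rough illustration of range pruning on a byte-comparable prefix, the Java sketch below turns a filter "lo <= prefix <= hi" on a 4-byte big-endian unsigned prefix into a start row (inclusive) and a stop row (exclusive). All class and method names are hypothetical, and the inclusive-start, exclusive-stop, empty-means-unbounded convention is an assumption made for this sketch; it is not the code in CompareFunctionsProcessor.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of pruning a scan to the keys whose 4-byte big-endian
// unsigned prefix lies in [lo, hi]. Not the Drill implementation.
final class RowKeyRangeSketch {

    // 4-byte big-endian encoding of an unsigned 32-bit value held in a long.
    static byte[] uint4BE(long value) {
        return ByteBuffer.allocate(4).putInt((int) value).array();
    }

    // Build a row-key from a prefix value and an arbitrary suffix.
    static byte[] rowKey(long prefix, byte[] suffix) {
        return ByteBuffer.allocate(4 + suffix.length)
                .putInt((int) prefix).put(suffix).array();
    }

    // Returns {startRow, stopRow}: startRow is inclusive, stopRow is exclusive,
    // and an empty stopRow means "no upper bound" (assumed convention).
    static byte[][] scanRange(long lo, long hi) {
        byte[] start = uint4BE(lo);
        byte[] stop = (hi >= 0xFFFFFFFFL) ? new byte[0] : uint4BE(hi + 1);
        return new byte[][] {start, stop};
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // True if the row-key would be visited by a scan over the given range.
    static boolean inRange(byte[] key, byte[][] range) {
        boolean afterStart = compareUnsigned(key, range[0]) >= 0;
        boolean beforeStop = range[1].length == 0
                || compareUnsigned(key, range[1]) < 0;
        return afterStart && beforeStop;
    }
}
```

Because the prefix bytes compare in the same order as the numbers they encode, every key outside the computed range can be skipped without being decoded.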

commit 70e1f3b2ca0410748b9872535bb205651e86d6c9
Author: spanchamia <[email protected]>
Date:   2015-07-30T05:53:04Z

    DRILL-3492: Add support for encoding/decoding to/from OrderedBytes
    format
    
    Description:
    This change adds encoding and decoding of 'double', 'float', 'bigint',
    'int' and 'utf8' values to and from the OrderedBytes format.
    It also allows OrderedBytes-encoded row-keys to be stored in
    ascending as well as descending order.
    
    The following JIRA added the OrderedBytes encoding to HBase:
    https://issues.apache.org/jira/browse/HBASE-8201
    
    This encoding scheme will preserve the sort-order of the native
    data-type when it is stored as sorted byte arrays on disk.
    Thus, it will help the HBase storage plugin if the row-keys have been
    encoded in OrderedBytes format.
    
    This functionality allows us to prune the scan ranges, thus reading much
    less data from the server.
    
    Testing Done:
    Added a new unit-test class, TestOrderedBytesConvertFunctions.java, which
    derives from the TestConvertFunctions.java class.
    Also added new test cases to the TestHBaseFilterPushDown class that check
    whether filters are pushed down correctly and whether the results are
    correct.

commit c9f8622b5cc0cf87dcdf88d73e608039556fedcb
Author: Smidth Panchamia <[email protected]>
Date:   2015-08-19T21:51:36Z

    Merge remote-tracking branch 'apache/master'
    
    Conflicts:
        contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/CompareFunctionsProcessor.java
        contrib/storage-hbase/src/test/java/org/apache/drill/hbase/HBaseTestsSuite.java
        contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java
        contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestTableGenerator.java

commit 2daacad4ca62e753bbcad7f3637512ca810ea491
Author: Smidth Panchamia <[email protected]>
Date:   2015-08-19T22:18:33Z

    DRILL-3492 -
    * Remove repeated allocations of byte arrays and PositionedByteRange
      objects on heap (as suggested by Jason).
    * Remove OrderedBytes encode/decode operations on UTF8 types.
    Reasons -
    1. These operations are slow and incur a lot of heap allocations.
    2. UTF8 types maintain their natural sort order when stored as binary
       arrays.
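
The second reason can be checked with a small Java sketch: comparing two strings by Unicode code point gives the same ordering as comparing their UTF-8 bytes as unsigned values. The class and method names are hypothetical, and "natural sort order" is taken here to mean code-point order, which is an assumption about the commit's wording.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: UTF-8 byte order agrees with Unicode code-point order.
final class Utf8OrderSketch {

    // Compare two strings by their UTF-8 bytes, treating each byte as unsigned.
    static int compareUtf8Bytes(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(x[i] & 0xFF, y[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(x.length, y.length);
    }

    // Compare two strings code point by code point. Note that this differs
    // from String.compareTo, which compares UTF-16 code units.
    static int compareCodePoints(String a, String b) {
        int[] x = a.codePoints().toArray();
        int[] y = b.codePoints().toArray();
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(x[i], y[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(x.length, y.length);
    }
}
```

Since the two comparisons agree, plain UTF-8 bytes can serve as byte-comparable keys without an extra encoding step.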

commit 71b053006b587f39a47025302e7d3de8dcac482d
Author: Smidth Panchamia <[email protected]>
Date:   2015-08-19T22:27:02Z

    DRILL-3492 - Remove test code that creates test tables with UTF8
    OrderedByte encoding.

----

