GitHub user jaltekruse opened a pull request:

    https://github.com/apache/drill/pull/236

    DRILL-4028: Get off parquet fork

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jaltekruse/incubator-drill parquet-update-squash

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/236.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #236
    
----
commit afb72c81bbba69346c48c77852f2429bae47dea4
Author: Jason Altekruse <[email protected]>
Date:   2015-09-04T18:09:23Z

    DRILL-4028: Part 1 - Remove references to the shaded version of the Jackson @JsonCreator annotation from parquet, replacing them with the proper fasterxml version.
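
    For illustration only, the change amounts to an import swap along these lines (the shaded package path and the config class below are hypothetical; only com.fasterxml.jackson.annotation.JsonCreator is the real target import):

        // Before: the @JsonCreator annotation shaded inside the parquet fork
        // (exact shaded package path shown here is illustrative)
        // import parquet.org.codehaus.jackson.annotate.JsonCreator;

        // After: the proper fasterxml (Jackson 2.x) annotation
        import com.fasterxml.jackson.annotation.JsonCreator;

        public class ExampleFormatConfig {
          @JsonCreator  // Jackson invokes this constructor during deserialization
          public ExampleFormatConfig() { }
        }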

commit 0f51a6bf341699aa7f14457b2c49097e84fff936
Author: Jason Altekruse <[email protected]>
Date:   2015-09-04T18:17:21Z

    DRILL-4028: Part 2 - Fixing imports that used the wrong parquet packages after the rebase.
    
    Clean up imports in the generated source template

commit 4feb538da813f2f1a974337f5e6874866c3cd350
Author: Jason Altekruse <[email protected]>
Date:   2015-09-14T18:13:04Z

    DRILL-4028: Part 3 - Fixing issues with the Drill parquet read and write path after merging the Drill parquet fork back into mainline.
    
    Fixed the issue with the writer; the RecordConsumer needed to be flushed in the ParquetRecordWriter.
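
    A minimal sketch of the flush fix, assuming the writer holds a parquet RecordConsumer with a flush() hook as the commit describes (the surrounding class and method names are illustrative, not Drill's actual ParquetRecordWriter):

        import org.apache.parquet.io.api.RecordConsumer;

        class ExampleRecordWriter {
          private final RecordConsumer consumer;

          ExampleRecordWriter(RecordConsumer consumer) {
            this.consumer = consumer;
          }

          void writeOneRecord() {
            consumer.startMessage();
            // ... consumer.startField()/addBinary()/endField() per column ...
            consumer.endMessage();
          }

          void close() {
            // The fix: flush the RecordConsumer so any buffered values are
            // pushed down to the column writers before the file is finalized.
            consumer.flush();
          }
        }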
    
    Consolidate page reading code
    
    Fix buffer sizes; the uncompressed and compressed sizes were swapped.
    
    The issue was a mismatch in how byte buffers were used. Even though the buffer's position was being set, it appeared to be ignored by the setSafe method on the varbinary vector, which seems to read from the beginning of the buffer, so I needed to pass in the offset explicitly. I'm not sure this is how ByteBuffers are supposed to be used, but we use this pattern commonly enough that it could not easily be refactored.
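
    A hypothetical, self-contained sketch of the ByteBuffer pitfall described above: a consumer that reads with absolute indexing ignores the buffer's position, which is why passing the offset explicitly was necessary (the setSafe call itself is Drill's; this example only illustrates the buffer semantics):

        import java.nio.ByteBuffer;

        public class ByteBufferOffsetExample {
          // Mirrors a consumer that always reads from index 0, ignoring position()
          static byte absoluteRead(ByteBuffer buf) {
            return buf.get(0);
          }

          // Mirrors a consumer that honours the buffer's position
          static byte relativeRead(ByteBuffer buf) {
            return buf.get();
          }

          public static void main(String[] args) {
            ByteBuffer buf = ByteBuffer.wrap(new byte[] {10, 20, 30});
            buf.position(1);                        // caller intends offset 1

            System.out.println(absoluteRead(buf));  // prints 10: position ignored
            System.out.println(relativeRead(buf));  // prints 20: position honoured

            // Hence the fix: pass the offset (and length) explicitly to the
            // consumer instead of relying on buf.position().
          }
        }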
    
    Added some tests to print additional context when an ordered comparison of two datasets fails in a test.
    
    Removing usage of Drill classes from DirectCodecFactory, getting it ready 
to be moved into the parquet codebase.
    
    Fix up parquet API usage in Hive Module.
    
    Fix dictionary reading; the changes made may, I think, speed up reading dictionary-encoded files by avoiding an extra copy.
    
    Adding a unit test to read and write all types in parquet; the decimal types and interval year have some issues.
    
    Use the direct codec factory from its new package in the parquet library now that it has been moved.
    
    Moving the test for Direct Codec Factory out of the Drill source as the 
class itself has been moved.
    
    Small fix after consolidating two different ByteBuffer-based implementations of BytesInput.
    
    Small fixes to accommodate interface changes.
    
    Small changes to remove direct references to DirectCodecFactory; this class is not accessible outside of parquet, but an instance with the same contract is now available through a new factory method on CodecFactory.
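
    A sketch of what the call-site change could look like; the createDirectCodecFactory method, its parameters, and HeapByteBufferAllocator are given from memory of the parquet API of that era and should be treated as assumptions:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.parquet.bytes.HeapByteBufferAllocator;
        import org.apache.parquet.hadoop.CodecFactory;

        public class CodecFactoryExample {
          public static void main(String[] args) {
            Configuration conf = new Configuration();
            int pageSize = 1024 * 1024;

            // Before: new DirectCodecFactory(...), no longer visible outside parquet.
            // After: the factory method returns an instance with the same contract.
            CodecFactory codecs =
                CodecFactory.createDirectCodecFactory(conf, new HeapByteBufferAllocator(), pageSize);

            // ... obtain compressors/decompressors from codecs as needed ...
            codecs.release();  // free any codec resources when finished
          }
        }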
    
    Fixed a failing test using miniDFS when reading a larger parquet file.

----

