Re: Review Request 22174: HIVE-6394 Implement Timestamp in ParquetSerde
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22174/#review44805 ---

ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java
<https://reviews.apache.org/r/22174/#comment79343>

    A stupid question perhaps, but is INT96 reserved for timestamps in Parquet? I dug this up, but I'm not sure it's definitive: https://github.com/Parquet/parquet-mr/issues/101

- justin coffey

On June 5, 2014, 7:33 a.m., Szehon Ho wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22174/
> ---
>
> (Updated June 5, 2014, 7:33 a.m.)
>
> Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
>
> Bugs: HIVE-6394
>     https://issues.apache.org/jira/browse/HIVE-6394
>
> Repository: hive-git
>
> Description
> -----------
>
> This uses the Jodd library to convert the java.sql.Timestamp type used by Hive into the {julian-day:nanos} format expected by Parquet, and vice versa.
>
> Diffs
> -----
>
> data/files/parquet_types.txt 0be390b
> pom.xml 4bb8880
> ql/pom.xml 13c477a
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 4da0d30
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 29f7e11
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java 57161d8
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java fb2f5a8
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/utils/NanoTimeUtils.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 3490061
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java PRE-CREATION
> ql/src/test/queries/clientpositive/parquet_types.q 5d6333c
> ql/src/test/results/clientpositive/parquet_types.q.out c23f7f1
>
> Diff: https://reviews.apache.org/r/22174/diff/
>
> Testing
> -------
>
> Unit tests the new utilities, and also added timestamp data to the "parquet_types" q-test.
>
> Thanks,
> Szehon Ho
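The description above says the patch uses the Jodd library to convert java.sql.Timestamp into the {julian-day:nanos} pair that Parquet's INT96 NanoTime encoding stores. As a rough plain-Java sketch of that conversion, assuming UTC semantics and the standard astronomical constant 2440588 for the Julian Day Number of the Unix epoch (the actual NanoTimeUtils in the patch goes through Jodd's date classes; the class and method names here are hypothetical):

```java
import java.sql.Timestamp;
import java.util.concurrent.TimeUnit;

public class JulianNanoSketch {
    // Assumption: Julian Day Number of the Unix epoch, 1970-01-01 (the
    // well-known astronomical constant; the real patch derives this via Jodd).
    static final long JULIAN_DAY_OF_EPOCH = 2440588L;
    static final long NANOS_PER_DAY = TimeUnit.DAYS.toNanos(1);

    /**
     * Hypothetical helper: split a timestamp (treated as UTC) into the
     * {julian-day, nanos-of-day} pair that Parquet's INT96 layout stores.
     */
    static long[] toJulian(Timestamp ts) {
        long seconds = Math.floorDiv(ts.getTime(), 1000L);       // whole seconds since epoch
        long nanosSinceEpoch = TimeUnit.SECONDS.toNanos(seconds) // overflows far in the future; fine for a sketch
                + ts.getNanos();                                 // sub-second nanoseconds
        long julianDay = JULIAN_DAY_OF_EPOCH + Math.floorDiv(nanosSinceEpoch, NANOS_PER_DAY);
        long nanosOfDay = Math.floorMod(nanosSinceEpoch, NANOS_PER_DAY);
        return new long[] { julianDay, nanosOfDay };
    }
}
```

Reversing the conversion is symmetric: nanos since epoch = (julianDay - 2440588) * NANOS_PER_DAY + nanosOfDay, from which a Timestamp can be rebuilt.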
[jira] [Commented] (HIVE-6994) parquet-hive createArray strips null elements
[ https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004924#comment-14004924 ]

Justin Coffey commented on HIVE-6994:
-------------------------------------

Hmmm... good catch. It didn't get picked up by my q-test regex of parquet*, and now that I run it locally I see it failing. I'll debug.

> parquet-hive createArray strips null elements
> ---------------------------------------------
>
>                 Key: HIVE-6994
>                 URL: https://issues.apache.org/jira/browse/HIVE-6994
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.0, 0.14.0
>            Reporter: Justin Coffey
>            Assignee: Justin Coffey
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6994-1.patch, HIVE-6994.2.patch, HIVE-6994.3.patch, HIVE-6994.3.patch, HIVE-6994.patch
>
> The createArray method in ParquetHiveSerDe strips null values from resultant ArrayWritables.
> Tracked here as well: https://github.com/Parquet/parquet-mr/issues/377

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6994) parquet-hive createArray strips null elements
[ https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002985#comment-14002985 ]

Justin Coffey commented on HIVE-6994:
-------------------------------------

Thanks Szehon!
[jira] [Commented] (HIVE-6994) parquet-hive createArray strips null elements
[ https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1396#comment-1396 ]

Justin Coffey commented on HIVE-6994:
-------------------------------------

Any chance for another pass on this patch from QA?
[jira] [Updated] (HIVE-6994) parquet-hive createArray strips null elements
[ https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-6994:
--------------------------------
    Attachment: HIVE-6994.3.patch

The failed tests are unrelated to the patch. Submitting a patch rebased against trunk and retested. [~szehon], new RB link here: https://reviews.apache.org/r/21430/. Hope we're good :)
Review Request 21430: HIVE-6994 - parquet-hive createArray strips null elements
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21430/ ---

Review request for hive.

Repository: hive-git

Description
-----------

The createArray method in ParquetHiveSerDe strips null values from resultant ArrayWritables. This patch:
- removes an incorrect null check in createArray
- simplifies ParquetHiveSerDe
- totally refactors TestParquetHiveSerDe for better test coverage and easier regression testing

Diffs
-----

data/files/parquet_create.txt ccd48ee
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java b689336
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 3b56fc7
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetHiveSerDe.java PRE-CREATION
ql/src/test/queries/clientpositive/parquet_create.q 0b976bd
ql/src/test/results/clientpositive/parquet_create.q.out 3220be5

Diff: https://reviews.apache.org/r/21430/diff/

Testing
-------

Thanks,
justin coffey
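The null-stripping bug is easiest to see with a stripped-down model of createArray. The sketch below uses plain Java lists rather than Hadoop's ArrayWritable, and the class and method names are illustrative, not the actual ParquetHiveSerDe code: the buggy variant skips nulls and so silently shifts every later element's position, while the fix simply copies every slot.

```java
import java.util.ArrayList;
import java.util.List;

public class CreateArraySketch {
    // Buggy pattern: the null check drops elements, so [1, null, 3]
    // comes back as [1, 3] and element positions shift.
    static List<Object> createArrayDroppingNulls(List<Object> src) {
        List<Object> out = new ArrayList<>();
        for (Object o : src) {
            if (o != null) {   // the incorrect null check the patch removes
                out.add(o);
            }
        }
        return out;
    }

    // Fixed pattern: copy every element, nulls included, preserving positions.
    static List<Object> createArray(List<Object> src) {
        return new ArrayList<>(src);
    }
}
```

With the fix, a reader asking for element 1 of [1, null, 3] correctly sees null instead of 3.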
Re: Review Request 20899: HIVE-6994 - parquet-hive createArray strips null elements
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/20899/ ---

(Updated May 12, 2014, 2:18 p.m.)

Review request for hive.

Changes
-------

Added back finals and cleaned up commentary.

Repository: hive-git

Description
-----------

- Fix for bug in createArray() that strips null elements.
- In the process, refactored the serde for simplification purposes.
- Refactored tests for better regression testing.

Diffs (updated)
---------------

data/files/parquet_create.txt ccd48ee
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java b689336
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 3b56fc7
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetHiveSerDe.java PRE-CREATION
ql/src/test/queries/clientpositive/parquet_create.q 0b976bd
ql/src/test/results/clientpositive/parquet_create.q.out 3220be5

Diff: https://reviews.apache.org/r/20899/diff/

Testing
-------

Thanks,
justin coffey
[jira] [Updated] (HIVE-6994) parquet-hive createArray strips null elements
[ https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-6994:
--------------------------------
    Attachment: HIVE-6994.2.patch

Updated based on comments on Review Board, and fixed to include the right extension for retesting :).
[jira] [Updated] (HIVE-6994) parquet-hive createArray strips null elements
[ https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-6994:
--------------------------------
    Attachment: HIVE-6994-1.patch

Updated patch after rebasing against trunk. It applies for me :)
[jira] [Commented] (HIVE-6994) parquet-hive createArray strips null elements
[ https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989688#comment-13989688 ]

Justin Coffey commented on HIVE-6994:
-------------------------------------

Hello, is it just me or does it look like the patch wasn't actually applied? In the included output I don't see anything associated with my patch, and when looking at the precommit change log I don't see this ticket referenced: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/91/

{code}
Changes
HIVE-6741: HiveServer2 startup fails in secure (kerberos) mode due to backward incompatible hadoop change (Vaibhav Gumashta via Jason Dere) (detail/ViewSVN)
HIVE-6931 : Windows unit test fixes (Jason Dere via Sushanth Sowmyan) (detail/ViewSVN)
HIVE-6966 : More fixes for TestCliDriver on Windows (Jason Dere via Sushanth Sowmyan) (detail/ViewSVN)
HIVE-6982 : Export all .sh equivalent for windows (.cmd files) in bin, bin/ext (Hari Sankar Sivarama Subramaniyan via Sushanth Sowmyan) (detail/ViewSVN)
{code}

Not sure if I'm missing something. Do I need to rebase my patch to the trunk head and resubmit?
[jira] [Commented] (HIVE-6920) Parquet Serde Simplification
[ https://issues.apache.org/jira/browse/HIVE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985777#comment-13985777 ]

Justin Coffey commented on HIVE-6920:
-------------------------------------

BTW, in the superseding patch I killed the pom bump to 1.4.1, but I neglected to remove my spurious comments about serde stats :). I'm late out the door right now, but if you have a chance to check the new patch and review board request, I can clean that up before a final commit :D.

> Parquet Serde Simplification
> ----------------------------
>
>                 Key: HIVE-6920
>                 URL: https://issues.apache.org/jira/browse/HIVE-6920
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.13.0
>            Reporter: Justin Coffey
>            Assignee: Justin Coffey
>            Priority: Minor
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6920.patch
>
> Various fixes and code simplification in the ParquetHiveSerde (with minor optimizations)
[jira] [Commented] (HIVE-6994) parquet-hive createArray strips null elements
[ https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985772#comment-13985772 ]

Justin Coffey commented on HIVE-6994:
-------------------------------------

Review Board link: https://reviews.apache.org/r/20899/
[jira] [Updated] (HIVE-6920) Parquet Serde Simplification
[ https://issues.apache.org/jira/browse/HIVE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-6920:
--------------------------------
    Status: Open (was: Patch Available)

Please see the superseding issue here: HIVE-6994
Review Request 20899: HIVE-6994 - parquet-hive createArray strips null elements
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/20899/ ---

Review request for hive.

Repository: hive-git

Description
-----------

- Fix for bug in createArray() that strips null elements.
- In the process, refactored the serde for simplification purposes.
- Refactored tests for better regression testing.

Diffs
-----

data/files/parquet_create.txt ccd48ee
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java b689336
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java be518b9
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetHiveSerDe.java PRE-CREATION
ql/src/test/queries/clientpositive/parquet_create.q 0b976bd
ql/src/test/results/clientpositive/parquet_create.q.out 3220be5

Diff: https://reviews.apache.org/r/20899/diff/

Testing
-------

Thanks,
justin coffey
[jira] [Updated] (HIVE-6994) parquet-hive createArray strips null elements
[ https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-6994:
--------------------------------
    Description:
The createArray method in ParquetHiveSerDe strips null values from resultant ArrayWritables.
Tracked here as well: https://github.com/Parquet/parquet-mr/issues/377

  was: The createArray method in ParquetHiveSerDe strips null values from resultant ArrayWritables.
[jira] [Updated] (HIVE-6994) parquet-hive createArray strips null elements
[ https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-6994:
--------------------------------
    Attachment: HIVE-6994.patch
[jira] [Updated] (HIVE-6994) parquet-hive createArray strips null elements
[ https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-6994:
--------------------------------
    Status: Patch Available (was: Open)

This patch fixes the issue in ParquetHiveSerDe, but there may be an underlying issue in parquet (this is still under investigation).
[jira] [Created] (HIVE-6994) parquet-hive createArray strips null elements
Justin Coffey created HIVE-6994:
-----------------------------------
             Summary: parquet-hive createArray strips null elements
                 Key: HIVE-6994
                 URL: https://issues.apache.org/jira/browse/HIVE-6994
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.13.0, 0.14.0
            Reporter: Justin Coffey
            Assignee: Justin Coffey
             Fix For: 0.14.0

The createArray method in ParquetHiveSerDe strips null values from resultant ArrayWritables.
[jira] [Commented] (HIVE-6920) Parquet Serde Simplification
[ https://issues.apache.org/jira/browse/HIVE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984183#comment-13984183 ]

Justin Coffey commented on HIVE-6920:
-------------------------------------

Bump? I'd like to build off of this for a bug fix that I need to submit.
[jira] [Commented] (HIVE-6920) Parquet Serde Simplification
[ https://issues.apache.org/jira/browse/HIVE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981072#comment-13981072 ]

Justin Coffey commented on HIVE-6920:
-------------------------------------

It's actually mostly just code reduction. Here's the RB link: https://reviews.apache.org/r/20710/ Thanks :)
Review Request 20710: HIVE-6920 - Parquet Serde Simplification
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/20710/ ---

Review request for hive.

Repository: hive-git

Description
-----------

Refactoring for simplification of the parquet-hive serde.

Diffs
-----

pom.xml 426dca8
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java b689336
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java be518b9
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetHiveSerDe.java PRE-CREATION

Diff: https://reviews.apache.org/r/20710/diff/

Testing
-------

Thanks,
justin coffey
[jira] [Commented] (HIVE-6920) Parquet Serde Simplification
[ https://issues.apache.org/jira/browse/HIVE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972790#comment-13972790 ]

Justin Coffey commented on HIVE-6920:
-------------------------------------

cc: [~brocknoland] [~xuefuz]
[jira] [Updated] (HIVE-6920) Parquet Serde Simplification
[ https://issues.apache.org/jira/browse/HIVE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-6920:
--------------------------------
    Release Note:
  - Removed unused serde stats
  - Simplified initialize code
  - Renamed test class to match serde class name
  - Separated serialize and deserialize tests
          Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-6920) Parquet Serde Simplification
[ https://issues.apache.org/jira/browse/HIVE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-6920:
--------------------------------
    Release Note:
  - Removed unused serde stats
  - Simplified initialize code
  - Renamed test class to match serde class name
  - Separated serialize and deserialize tests
  - Bumped Parquet version to 1.4.1

  was:
  - Removed unused serde stats
  - Simplified initialize code
  - Renamed test class to match serde class name
  - Separated serialize and deserialize tests
[jira] [Updated] (HIVE-6920) Parquet Serde Simplification
[ https://issues.apache.org/jira/browse/HIVE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-6920:
--------------------------------
    Attachment: HIVE-6920.patch
[jira] [Created] (HIVE-6920) Parquet Serde Simplification
Justin Coffey created HIVE-6920:
-----------------------------------
             Summary: Parquet Serde Simplification
                 Key: HIVE-6920
                 URL: https://issues.apache.org/jira/browse/HIVE-6920
             Project: Hive
          Issue Type: Improvement
          Components: Serializers/Deserializers
    Affects Versions: 0.13.0
            Reporter: Justin Coffey
            Assignee: Justin Coffey
            Priority: Minor
             Fix For: 0.14.0

Various fixes and code simplification in the ParquetHiveSerde (with minor optimizations)
[jira] [Commented] (HIVE-6784) parquet-hive should allow column type change
[ https://issues.apache.org/jira/browse/HIVE-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966341#comment-13966341 ]

Justin Coffey commented on HIVE-6784:
-------------------------------------

You've cited a "lazy" serde. Parquet is not "lazy"; it is similar to ORC. Have a look at ORC's deserialize() method (org.apache.hadoop.hive.ql.io.orc.OrcSerde):

{code}
@Override
public Object deserialize(Writable writable) throws SerDeException {
  return writable;
}
{code}

A quick look through the ORC code indicates to me that they don't do any reparsing (though I might have missed something). Looking through other serdes, not a single one (that I checked) reparses values. Value parsing is handled in ObjectInspectors (poke around org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils).

In my opinion, the *substantial* performance penalty that you are introducing with this patch is going to be a much bigger negative for adopting Parquet than obliging people to rebuild their data set in the rare event that a type has to change. And if you do need to change a type, insert overwrite table is a good workaround.

-1

> parquet-hive should allow column type change
> --------------------------------------------
>
>                 Key: HIVE-6784
>                 URL: https://issues.apache.org/jira/browse/HIVE-6784
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats, Serializers/Deserializers
>    Affects Versions: 0.13.0
>            Reporter: Tongjie Chen
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6784.1.patch.txt, HIVE-6784.2.patch.txt
>
> See also the following parquet issue: https://github.com/Parquet/parquet-mr/issues/323
> Currently, if we change a parquet-format hive table using "alter table parquet_table change c1 c1 bigint" (assuming the original type of c1 is int), it will result in an exception thrown from the SerDe: "org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable" at query runtime.
> This is different behavior from hive with other file formats, where it will try to perform a cast (yielding null in case of an incompatible type).
> Parquet Hive's RecordReader returns an ArrayWritable (based on the schema stored in the footers of parquet files); ParquetHiveSerDe also creates a corresponding ArrayWritableObjectInspector (but using column type info from the metastore). Whenever there is a column type change, the object inspector will throw an exception, since WritableLongObjectInspector cannot inspect an IntWritable, etc.
> Conversion has to happen somewhere if we want to allow type change. SerDe's deserialize method seems a natural place for it.
> Currently, the serialize method calls createStruct (then createPrimitive) for every record, but it creates a new object regardless, which seems expensive. I think that could be optimized a bit by just returning the object passed in if it is already of the right type. deserialize also reuses this method; if there is a type change, a new object will have to be created, which I think is inevitable.
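The ORC snippet quoted in the comment illustrates the pass-through style it endorses: deserialize() returns the row object untouched and leaves all type interpretation to ObjectInspectors. A minimal model of that contract, using stand-in types rather than the real org.apache.hadoop.io interfaces (the names below are illustrative only):

```java
public class PassthroughSketch {
    // Stand-ins for org.apache.hadoop.io.Writable and IntWritable.
    interface Writable {}
    static final class IntWritable implements Writable {
        final int value;
        IntWritable(int value) { this.value = value; }
    }

    // ORC-style pass-through deserialize: O(1), no copying, no parsing.
    // The same object that came off the reader is handed to the
    // ObjectInspector layer, which interprets it lazily per field.
    static Object deserialize(Writable row) {
        return row;
    }
}
```

The objection in this thread is that converting or reparsing values inside deserialize() turns this constant-time step into per-record, per-column work on every read, just to support the rare case of a changed column type.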
[jira] [Commented] (HIVE-6784) parquet-hive should allow column type change
[ https://issues.apache.org/jira/browse/HIVE-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965279#comment-13965279 ]

Justin Coffey commented on HIVE-6784:
-------------------------------------

-1 on this patch. Looping over the ArrayWritable in deserialize() will cause a performance penalty at read time, and running parseXxxx(obj.toString) in the event of a type mismatch is also painful. Changing the type of a column is a rare event; we shouldn't write code that incurs performance penalties to handle it. Users should recreate the table with the new type and load it from the old table, casting and converting as appropriate in their query.
[jira] [Commented] (HIVE-6757) Remove deprecated parquet classes from outside of org.apache package
[ https://issues.apache.org/jira/browse/HIVE-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961892#comment-13961892 ]

Justin Coffey commented on HIVE-6757:
-------------------------------------

Much appreciated, Harish!

> Remove deprecated parquet classes from outside of org.apache package
> --------------------------------------------------------------------
>
>                 Key: HIVE-6757
>                 URL: https://issues.apache.org/jira/browse/HIVE-6757
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6757.2.patch, HIVE-6757.patch, parquet-hive.patch
>
> Apache shouldn't release projects with files outside of the org.apache namespace.
[jira] [Commented] (HIVE-6757) Remove deprecated parquet classes from outside of org.apache package
[ https://issues.apache.org/jira/browse/HIVE-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959178#comment-13959178 ]

Justin Coffey commented on HIVE-6757:
-------------------------------------

I find that to be an acceptable compromise. Consensus :).
[jira] [Commented] (HIVE-6757) Remove deprecated parquet classes from outside of org.apache package
[ https://issues.apache.org/jira/browse/HIVE-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955121#comment-13955121 ]

Justin Coffey commented on HIVE-6757:
-------------------------------------

I can +1 [~brocknoland]'s solution if that flies for everyone else. Actually, we joked about this in one of our review sessions here, thinking that it was a bit of a brute-force solution, but if this works for everyone it works for us (FYI, for one table we expect to have 47K partitions to update).
[jira] [Commented] (HIVE-6757) Remove deprecated parquet classes from outside of org.apache package
[ https://issues.apache.org/jira/browse/HIVE-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950996#comment-13950996 ]

Justin Coffey commented on HIVE-6757:
-------------------------------------

I guess my point is simply that early adopters are penalized for life whereas new users get the full benefit of the patch. I agree that the penalty is pretty small, but the two classes kicking around in the parquet package are even less of a penalty to the hive code base. Thus I remain against pulling them out.
[jira] [Commented] (HIVE-6757) Remove deprecated parquet classes from outside of org.apache package
[ https://issues.apache.org/jira/browse/HIVE-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950531#comment-13950531 ]

Justin Coffey commented on HIVE-6757:
-------------------------------------

Owen, the solution you're proposing means that there is no seamless upgrade path for existing parquet-hive users, and that somewhere on the hive wiki there will have to be a call-out: "attention existing parquet users, you must include the parquet-hive.jar when upgrading to hive 13. we're sorry, but this is the price you have to pay for being an early adopter and driving functionality". One of the goals of the #HIVE-5783 patch was to make the lives of parquet users easier (there were of course many other reasons, but ease of use is a good goal in and of itself). The classes as they are do no harm, and it's hard to see how they pollute the code base of Hive in any significant way. This patch kinda sorta seems a tiny bit punitive if you ask me. Please don't take any of this the wrong way, but I believe this is what a fair chunk of the parquet-hive community might think if this patch is committed.
Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18925/#review36644 ---

Ship it!

go for r3 with the getClass (and no instanceof) check and {} formatting.

- justin coffey

On March 8, 2014, 12:01 a.m., Szehon Ho wrote:
> --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18925/ ---
>
> (Updated March 8, 2014, 12:01 a.m.)
>
> Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
>
> Repository: hive-git
>
> Description
> -----------
>
> The issue is that, as part of a select * query, a DeepParquetHiveMapInspector is
> used for one column of the overall parquet-table struct object inspector.
>
> The problem lies in the ObjectInspectorFactory's cache for struct object
> inspectors. For performance, there is a cache keyed on an array list of all
> the column object inspectors. The second time the query is run, it attempts to
> look up the cached struct inspector. But when the hashmap looks up the part of
> the key consisting of the DeepParquetHiveMapInspector, java calls .equals
> against the existing DeepParquetHiveMapInspector. This fails, as the .equals
> method cast the "other" to a "StandardParquetHiveInspector".
>
> Regenerating the .equals and .hashCode from Eclipse.
>
> Also adding one more check in .equals before casting, to handle the case where
> another class of object inspector gets hashed to the same hashcode in the
> cache. Then java would call .equals against the other, which in this case is
> not of the same class.
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java 1d72747
>
> Diff: https://reviews.apache.org/r/18925/diff/
>
> Testing
> -------
>
> Manual testing.
>
> Thanks,
>
> Szehon Ho
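The getClass-based equals contract the review above asks for can be illustrated with a small self-contained sketch. The class names here (MapInspector, DeepMapInspector) are illustrative stand-ins for the real Hive inspectors; the point is only the structure of equals/hashCode.

```java
// Sketch of the equals/hashCode pattern described in the review: compare
// getClass() before casting, so a hashmap lookup against an inspector of a
// different class returns false instead of throwing ClassCastException.
public class InspectorEqualsSketch {

    static class MapInspector {
        protected final String keyType;
        MapInspector(String keyType) { this.keyType = keyType; }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) { return true; }
            // The guard the review insists on: no blind cast to a concrete class.
            if (obj == null || getClass() != obj.getClass()) { return false; }
            return keyType.equals(((MapInspector) obj).keyType);
        }

        @Override
        public int hashCode() {
            // Mixing in the class keeps distinct inspector classes apart in caches.
            return 31 * getClass().hashCode() + keyType.hashCode();
        }
    }

    // Stands in for the "deep" variant vs. the standard inspector.
    static class DeepMapInspector extends MapInspector {
        DeepMapInspector(String keyType) { super(keyType); }
    }

    public static void main(String[] args) {
        MapInspector std = new MapInspector("string");
        // Same class, same fields: equal.
        System.out.println(std.equals(new MapInspector("string")));
        // Different classes never compare equal, even with identical fields,
        // so the ObjectInspectorFactory-style cache cannot mix them up.
        System.out.println(std.equals(new DeepMapInspector("string")));
    }
}
```

A getClass comparison is stricter than instanceof here, and symmetric: neither the standard nor the deep inspector will claim equality with the other, which is exactly what a hash-keyed cache needs.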
[jira] [Updated] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-6414:
--------------------------------
    Attachment: HIVE-6414.3.patch

> ParquetInputFormat provides data values that do not match the object inspectors
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-6414
>                 URL: https://issues.apache.org/jira/browse/HIVE-6414
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.13.0
>            Reporter: Remus Rusanu
>            Assignee: Justin Coffey
>              Labels: Parquet
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6414.2.patch, HIVE-6414.3.patch, HIVE-6414.3.patch, HIVE-6414.3.patch, HIVE-6414.patch
>
> While working on HIVE-5998 I noticed that the ParquetRecordReader returns
> IntWritable for all 'int like' types, in disaccord with the row object
> inspectors. I thought fine, and worked my way around it. But I see now that
> the issue triggers failures in other places, e.g. in aggregates:
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"cint":528534767,"ctinyint":31,"csmallint":4963,"cfloat":31.0,"cdouble":4963.0,"cstring1":"cvLH6Eat2yFsyy7p"}
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
>     ... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524)
>     ... 9 more
> Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short
>     at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaShortObjectInspector.get(JavaShortObjectInspector.java:41)
>     at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:671)
>     at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631)
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109)
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96)
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803)
>     ... 15 more
> {noformat}
> My test is (I'm writing a test .q from HIVE-5998, but the repro does not involve vectorization):
> {noformat}
> create table if not exists alltypes_parquet (
>   cint int,
>   ctinyint tinyint,
>   csmallint smallint,
>   cfloat float,
>   cdouble double,
>   cstring1 string) stored as parquet;
>
> insert overwrite table alltypes_parquet
>   select cint, ctinyint, csmallint, cfloat, cdouble, cstring1
>   from alltypesorc;
>
> explain select * from alltypes_parquet limit 10;
> select * from alltypes_parquet limit 10;
>
> explain select ctinyint, max(cint), min(csmallint), count(cstring1), avg(cfloat), stddev_pop(cdouble)
>   from alltypes_parquet group by ctinyint;
> select ctinyint, max(cint), min(csmallint), count(cstring1), avg(cfloat), stddev_pop(cdouble)
>   from alltypes_parquet group by ctinyint;
> {noformat}

-- This message was sent by Atlassian JIRA (v6.2#6252)
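The ClassCastException above comes from handing the smallint column an IntWritable while its object inspector expects a short. The fix the thread converges on (including the later comment about returning nulls rather than throwing on overflow) can be sketched without any Hadoop dependencies; the names below (HiveType, narrow) are illustrative, not the actual HIVE-6414 patch.

```java
// Sketch of the narrowing the issue calls for: instead of returning an
// IntWritable for every int-like column, narrow the raw value to the type
// the object inspector declares. Per the review discussion, out-of-range
// values yield null rather than an exception.
public class IntNarrowingSketch {

    enum HiveType { TINYINT, SMALLINT, INT }

    // Narrow a raw int to the declared column type; null on overflow.
    static Object narrow(int raw, HiveType type) {
        switch (type) {
            case TINYINT:
                return (raw >= Byte.MIN_VALUE && raw <= Byte.MAX_VALUE)
                        ? (Object) (byte) raw : null;
            case SMALLINT:
                return (raw >= Short.MIN_VALUE && raw <= Short.MAX_VALUE)
                        ? (Object) (short) raw : null;
            default:
                return raw;  // already the right width
        }
    }

    public static void main(String[] args) {
        // 4963 fits in a smallint, so the reader can hand back a real short.
        System.out.println(narrow(4963, HiveType.SMALLINT));
        // 70000 overflows a smallint: null instead of a ClassCastException later.
        System.out.println(narrow(70000, HiveType.SMALLINT));
    }
}
```

With values narrowed at read time, downstream consumers such as GenericUDAFMin see exactly the type the inspector promised, and the GroupByOperator cast in the trace above never fires.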
[jira] [Commented] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924095#comment-13924095 ]

Justin Coffey commented on HIVE-6414:
-------------------------------------

hello, I don't think these are related to the patch, so resubmitting for retesting.
[jira] [Updated] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-6414:
--------------------------------
    Attachment: HIVE-6414.3.patch

Update patch based on comments from Xuefu.
[jira] [Commented] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912643#comment-13912643 ]

Justin Coffey commented on HIVE-6414:
-------------------------------------

[~xuefuz] ok, will recheck the qtest and resubmit with nulls rather than exceptions. I wasn't sure what the behavior should be in the case of an overflow.
[jira] [Updated] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-6414:
--------------------------------
    Attachment: HIVE-6414.2.patch

Updated patch with working unit and qtests, applicable to trunk commit 6010e22bd24d5004990c63f0aeb232d75693dd94 (#HIVE-5954).
[jira] [Commented] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911587#comment-13911587 ]

Justin Coffey commented on HIVE-6414:
-------------------------------------

Oh, and we don't appear to need the order by for deterministic tests, but I have added it and will submit an updated patch with it (once we have gotten to the bottom of these failures). btw, are your qtests passing in #HIVE-6477?
[jira] [Commented] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911585#comment-13911585 ] Justin Coffey commented on HIVE-6414: - Hi Szehon, I worked off of the trunk on this. We are applying cleanly to the latest commit and unit tests pass, but our qtest fails after the commit for #HIVE-5958. qtests for parquet_create.q work just fine though. We're digging into it. > ParquetInputFormat provides data values that do not match the object > inspectors > --- > > Key: HIVE-6414 > URL: https://issues.apache.org/jira/browse/HIVE-6414 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 0.13.0 >Reporter: Remus Rusanu >Assignee: Justin Coffey > Labels: Parquet > Fix For: 0.13.0 > > Attachments: HIVE-6414.patch > > > While working on HIVE-5998 I noticed that the ParquetRecordReader returns > IntWritable for all 'int like' types, in disaccord with the row object > inspectors. I thought, fine, and I worked my way around it. But I see now that > the issue triggers failures in other places, e.g. in aggregates: > {noformat} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > {"cint":528534767,"ctinyint":31,"csmallint":4963,"cfloat":31.0,"cdouble":4963.0,"cstring1":"cvLH6Eat2yFsyy7p"} > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) > ... 
8 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast > to java.lang.Short > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524) > ... 9 more > Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable > cannot be cast to java.lang.Short > at > org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaShortObjectInspector.get(JavaShortObjectInspector.java:41) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:671) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803) > ... 
15 more > {noformat} > My test is (I'm writing a test .q from HIVE-5998, but the repro does not > involve vectorization): > {noformat} > create table if not exists alltypes_parquet ( > cint int, > ctinyint tinyint, > csmallint smallint, > cfloat float, > cdouble double, > cstring1 string) stored as parquet; > insert overwrite table alltypes_parquet > select cint, > ctinyint, > csmallint, > cfloat, > cdouble, > cstring1 > from alltypesorc; > explain select * from alltypes_parquet limit 10; select * from > alltypes_parquet limit 10; > explain select ctinyint, > max(cint), > min(csmallint), > count(cstring1), > avg(cfloat), > stddev_pop(cdouble) > from alltypes_parquet > group by ctinyint; > select ctinyint, > max(cint), > min(csmallint), > count(cstring1), > avg(cfloat), > stddev_pop(cdouble) > from alltypes_parquet > group by ctinyint; > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
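[Editor's note] The ClassCastException above comes from the reader surfacing every Parquet INT32-backed column (tinyint, smallint, int) as the int wrapper, while the object inspector for a smallint column expects a Short. The following is a minimal, self-contained sketch of the idea behind the fix — convert according to the *declared* Hive type rather than the Parquet physical type. The class and method names here are invented for illustration; they are plain-Java stand-ins, not Hive's actual Writable or ObjectInspector API.

```java
import java.util.Map;
import java.util.function.IntFunction;

public class TypeAwareConversion {
    // Maps the declared Hive type to a conversion of the raw INT32 value.
    // Parquet stores tinyint/smallint/int all as INT32 on disk.
    static final Map<String, IntFunction<Object>> CONVERTERS = Map.of(
        "tinyint",  v -> (byte) v,   // boxed as Byte
        "smallint", v -> (short) v,  // boxed as Short
        "int",      v -> v           // boxed as Integer
    );

    static Object convert(String hiveType, int rawInt32) {
        return CONVERTERS.get(hiveType).apply(rawInt32);
    }

    public static void main(String[] args) {
        // A naive reader returns Integer for a smallint column...
        Object naive = 4963;
        boolean failed = false;
        try {
            // ...so this cast (effectively what JavaShortObjectInspector.get
            // does) fails with a ClassCastException, as in the trace above.
            short s = (Short) naive;
        } catch (ClassCastException e) {
            failed = true;
        }
        System.out.println("naive cast failed: " + failed);

        // Converting according to the declared Hive type succeeds:
        Object fixed = convert("smallint", 4963);
        System.out.println("fixed is Short: " + (fixed instanceof Short));
    }
}
```

The actual patch works at the ETypeConverter/ObjectInspector level, but the principle is the same: the converter, not the consumer, is responsible for producing the wrapper type the inspector declares.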
[jira] [Updated] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Coffey updated HIVE-6414: Attachment: HIVE-6414.patch Credit should be given to Remy Pecqueur > ParquetInputFormat provides data values that do not match the object > inspectors > --- > > Key: HIVE-6414 > URL: https://issues.apache.org/jira/browse/HIVE-6414 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 0.13.0 >Reporter: Remus Rusanu >Assignee: Justin Coffey > Labels: Parquet > Fix For: 0.13.0 > > Attachments: HIVE-6414.patch > > > While working on HIVE-5998 I noticed that the ParquetRecordReader returns > IntWritable for all 'int like' types, in disaccord with the row object > inspectors. I thought, fine, and I worked my way around it. But I see now that > the issue triggers failures in other places, e.g. in aggregates: > {noformat} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > {"cint":528534767,"ctinyint":31,"csmallint":4963,"cfloat":31.0,"cdouble":4963.0,"cstring1":"cvLH6Eat2yFsyy7p"} > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) > ... 
8 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast > to java.lang.Short > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524) > ... 9 more > Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable > cannot be cast to java.lang.Short > at > org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaShortObjectInspector.get(JavaShortObjectInspector.java:41) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:671) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803) > ... 
15 more > {noformat} > My test is (I'm writing a test .q from HIVE-5998, but the repro does not > involve vectorization): > {noformat} > create table if not exists alltypes_parquet ( > cint int, > ctinyint tinyint, > csmallint smallint, > cfloat float, > cdouble double, > cstring1 string) stored as parquet; > insert overwrite table alltypes_parquet > select cint, > ctinyint, > csmallint, > cfloat, > cdouble, > cstring1 > from alltypesorc; > explain select * from alltypes_parquet limit 10; select * from > alltypes_parquet limit 10; > explain select ctinyint, > max(cint), > min(csmallint), > count(cstring1), > avg(cfloat), > stddev_pop(cdouble) > from alltypes_parquet > group by ctinyint; > select ctinyint, > max(cint), > min(csmallint), > count(cstring1), > avg(cfloat), > stddev_pop(cdouble) > from alltypes_parquet > group by ctinyint; > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Coffey updated HIVE-6414: Fix Version/s: 0.13.0 Affects Version/s: 0.13.0 Status: Patch Available (was: Open) The patch was developed against this commit: b05004a863b09cbe5f4b734c5474092f328f0c41. Unit tests and qtests run fine against this commit. The latest commit (as of today), 1a3608d8b1f8cf41e9ba2fc7e9bacdecf271bb92, appears to have broken qtests (none will run), so I can't verify the patch-specific qtest. Unit tests, however, execute without error. > ParquetInputFormat provides data values that do not match the object > inspectors > --- > > Key: HIVE-6414 > URL: https://issues.apache.org/jira/browse/HIVE-6414 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 0.13.0 >Reporter: Remus Rusanu >Assignee: Justin Coffey > Labels: Parquet > Fix For: 0.13.0 > > > While working on HIVE-5998 I noticed that the ParquetRecordReader returns > IntWritable for all 'int like' types, in disaccord with the row object > inspectors. I thought, fine, and I worked my way around it. But I see now that > the issue triggers failures in other places, e.g. in aggregates: > {noformat} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > {"cint":528534767,"ctinyint":31,"csmallint":4963,"cfloat":31.0,"cdouble":4963.0,"cstring1":"cvLH6Eat2yFsyy7p"} > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) > ... 
8 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast > to java.lang.Short > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524) > ... 9 more > Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable > cannot be cast to java.lang.Short > at > org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaShortObjectInspector.get(JavaShortObjectInspector.java:41) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:671) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803) > ... 
15 more > {noformat} > My test is (I'm writing a test .q from HIVE-5998, but the repro does not > involve vectorization): > {noformat} > create table if not exists alltypes_parquet ( > cint int, > ctinyint tinyint, > csmallint smallint, > cfloat float, > cdouble double, > cstring1 string) stored as parquet; > insert overwrite table alltypes_parquet > select cint, > ctinyint, > csmallint, > cfloat, > cdouble, > cstring1 > from alltypesorc; > explain select * from alltypes_parquet limit 10; select * from > alltypes_parquet limit 10; > explain select ctinyint, > max(cint), > min(csmallint), > count(cstring1), > avg(cfloat), > stddev_pop(cdouble) > from alltypes_parquet > group by ctinyint; > select ctinyint, > max(cint), > min(csmallint), > count(cstring1), > avg(cfloat), > stddev_pop(cdouble) > from alltypes_parquet > group by ctinyint; > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6463) unit test for evolving schema in parquet files
Justin Coffey created HIVE-6463: --- Summary: unit test for evolving schema in parquet files Key: HIVE-6463 URL: https://issues.apache.org/jira/browse/HIVE-6463 Project: Hive Issue Type: Test Reporter: Justin Coffey Assignee: Justin Coffey Unit test(s) for patch found in #HIVE-6456 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6456) Implement Parquet schema evolution
[ https://issues.apache.org/jira/browse/HIVE-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905459#comment-13905459 ] Justin Coffey commented on HIVE-6456: - done and linked. > Implement Parquet schema evolution > -- > > Key: HIVE-6456 > URL: https://issues.apache.org/jira/browse/HIVE-6456 > Project: Hive > Issue Type: Improvement >Reporter: Brock Noland >Assignee: Brock Noland >Priority: Trivial > Attachments: HIVE-6456.patch > > > In HIVE-5783 we removed schema evolution: > https://github.com/Parquet/parquet-mr/pull/297/files#r9824155 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6456) Implement Parquet schema evolution
[ https://issues.apache.org/jira/browse/HIVE-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905450#comment-13905450 ] Justin Coffey commented on HIVE-6456: - brock and I had the same thought offline. Not sure what the protocol is here: should I open a separate ticket? > Implement Parquet schema evolution > -- > > Key: HIVE-6456 > URL: https://issues.apache.org/jira/browse/HIVE-6456 > Project: Hive > Issue Type: Improvement >Reporter: Brock Noland >Assignee: Brock Noland >Priority: Trivial > Attachments: HIVE-6456.patch > > > In HIVE-5783 we removed schema evolution: > https://github.com/Parquet/parquet-mr/pull/297/files#r9824155 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6456) Improve Parquet schema evolution
[ https://issues.apache.org/jira/browse/HIVE-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904423#comment-13904423 ] Justin Coffey commented on HIVE-6456: - good to go. thanks for the fast work! > Improve Parquet schema evolution > > > Key: HIVE-6456 > URL: https://issues.apache.org/jira/browse/HIVE-6456 > Project: Hive > Issue Type: Improvement >Reporter: Brock Noland >Assignee: Brock Noland >Priority: Trivial > Attachments: HIVE-6456.patch > > > In HIVE-5783 we removed schema evolution: > https://github.com/Parquet/parquet-mr/pull/297/files#r9824155 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
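[Editor's note] The schema evolution discussed in HIVE-6456 boils down to one reader-side requirement: a record written under an older file schema must be projected onto the table's current column list, with null for columns the old file does not contain. The sketch below is a hypothetical, Hive-independent illustration of that projection (the class, `project`, `tableColumns`, and `fileRecord` names are invented); it is not the actual read-support code from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SchemaEvolutionSketch {
    // Project a record written with an older file schema onto the table's
    // current column list; columns missing from the file come back as null.
    static List<Object> project(List<String> tableColumns,
                                Map<String, Object> fileRecord) {
        List<Object> row = new ArrayList<>();
        for (String col : tableColumns) {
            // getOrDefault(null) fills columns added after the file was written
            row.add(fileRecord.getOrDefault(col, null));
        }
        return row;
    }

    public static void main(String[] args) {
        // The table now has three columns; this file predates column "c".
        List<String> table = List.of("a", "b", "c");
        Map<String, Object> oldFileRow = Map.of("a", 1, "b", "x");
        System.out.println(project(table, oldFileRow)); // [1, x, null]
    }
}
```

Matching by column name (rather than position) is what lets old Parquet files keep working after an ALTER TABLE ... ADD COLUMNS.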
[jira] [Assigned] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Coffey reassigned HIVE-6414: --- Assignee: Justin Coffey > ParquetInputFormat provides data values that do not match the object > inspectors > --- > > Key: HIVE-6414 > URL: https://issues.apache.org/jira/browse/HIVE-6414 > Project: Hive > Issue Type: Bug >Reporter: Remus Rusanu > Assignee: Justin Coffey > > While working on HIVE-5998 I noticed that the ParquetRecordReader returns > IntWritable for all 'int like' types, in disaccord with the row object > inspectors. I thought, fine, and I worked my way around it. But I see now that > the issue triggers failures in other places, e.g. in aggregates: > {noformat} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > {"cint":528534767,"ctinyint":31,"csmallint":4963,"cfloat":31.0,"cdouble":4963.0,"cstring1":"cvLH6Eat2yFsyy7p"} > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) > ... 8 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast > to java.lang.Short > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524) > ... 
9 more > Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable > cannot be cast to java.lang.Short > at > org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaShortObjectInspector.get(JavaShortObjectInspector.java:41) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:671) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803) > ... 
15 more > {noformat} > My test is (I'm writing a test .q from HIVE-5998, but the repro does not > involve vectorization): > {noformat} > create table if not exists alltypes_parquet ( > cint int, > ctinyint tinyint, > csmallint smallint, > cfloat float, > cdouble double, > cstring1 string) stored as parquet; > insert overwrite table alltypes_parquet > select cint, > ctinyint, > csmallint, > cfloat, > cdouble, > cstring1 > from alltypesorc; > explain select * from alltypes_parquet limit 10; select * from > alltypes_parquet limit 10; > explain select ctinyint, > max(cint), > min(csmallint), > count(cstring1), > avg(cfloat), > stddev_pop(cdouble) > from alltypes_parquet > group by ctinyint; > select ctinyint, > max(cint), > min(csmallint), > count(cstring1), > avg(cfloat), > stddev_pop(cdouble) > from alltypes_parquet > group by ctinyint; > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899284#comment-13899284 ] Justin Coffey commented on HIVE-6414: - I'll investigate. > ParquetInputFormat provides data values that do not match the object > inspectors > --- > > Key: HIVE-6414 > URL: https://issues.apache.org/jira/browse/HIVE-6414 > Project: Hive > Issue Type: Bug >Reporter: Remus Rusanu >Assignee: Justin Coffey > > While working on HIVE-5998 I noticed that the ParquetRecordReader returns > IntWritable for all 'int like' types, in disaccord with the row object > inspectors. I thought, fine, and I worked my way around it. But I see now that > the issue triggers failures in other places, e.g. in aggregates: > {noformat} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > {"cint":528534767,"ctinyint":31,"csmallint":4963,"cfloat":31.0,"cdouble":4963.0,"cstring1":"cvLH6Eat2yFsyy7p"} > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) > ... 8 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast > to java.lang.Short > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524) > ... 
9 more > Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable > cannot be cast to java.lang.Short > at > org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaShortObjectInspector.get(JavaShortObjectInspector.java:41) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:671) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803) > ... 
15 more > {noformat} > My test is (I'm writing a test .q from HIVE-5998, but the repro does not > involve vectorization): > {noformat} > create table if not exists alltypes_parquet ( > cint int, > ctinyint tinyint, > csmallint smallint, > cfloat float, > cdouble double, > cstring1 string) stored as parquet; > insert overwrite table alltypes_parquet > select cint, > ctinyint, > csmallint, > cfloat, > cdouble, > cstring1 > from alltypesorc; > explain select * from alltypes_parquet limit 10; select * from > alltypes_parquet limit 10; > explain select ctinyint, > max(cint), > min(csmallint), > count(cstring1), > avg(cfloat), > stddev_pop(cdouble) > from alltypes_parquet > group by ctinyint; > select ctinyint, > max(cint), > min(csmallint), > count(cstring1), > avg(cfloat), > stddev_pop(cdouble) > from alltypes_parquet > group by ctinyint; > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896303#comment-13896303 ] Justin Coffey commented on HIVE-5783: - Thanks to all, and especially [~brocknoland] for all his help! > Native Parquet Support in Hive > -- > > Key: HIVE-5783 > URL: https://issues.apache.org/jira/browse/HIVE-5783 > Project: Hive > Issue Type: New Feature > Components: Serializers/Deserializers >Reporter: Justin Coffey > Assignee: Justin Coffey >Priority: Minor > Fix For: 0.13.0 > > Attachments: HIVE-5783.noprefix.patch, HIVE-5783.noprefix.patch, > HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, > HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, > HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, > HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, > HIVE-5783.patch > > > Problem Statement: > Hive would be easier to use if it had native Parquet support. Our > organization, Criteo, uses Hive extensively. Therefore we built the Parquet > Hive integration and would like to now contribute that integration to Hive. > About Parquet: > Parquet is a columnar storage format for Hadoop and integrates with many > Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, > Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native > Parquet integration. > Changes Details: > Parquet was built with dependency management in mind and therefore only a > single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Coffey updated HIVE-5783: Attachment: HIVE-5783.patch The updated patch. This fixes incorrect behavior when using HiveInputSplits. Regression tests have been added as a qtest (parquet_partitioned.q). > Native Parquet Support in Hive > -- > > Key: HIVE-5783 > URL: https://issues.apache.org/jira/browse/HIVE-5783 > Project: Hive > Issue Type: New Feature > Components: Serializers/Deserializers > Reporter: Justin Coffey > Assignee: Justin Coffey >Priority: Minor > Fix For: 0.13.0 > > Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, > HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch > > > Problem Statement: > Hive would be easier to use if it had native Parquet support. Our > organization, Criteo, uses Hive extensively. Therefore we built the Parquet > Hive integration and would like to now contribute that integration to Hive. > About Parquet: > Parquet is a columnar storage format for Hadoop and integrates with many > Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, > Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native > Parquet integration. > Changes Details: > Parquet was built with dependency management in mind and therefore only a > single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879960#comment-13879960 ] Justin Coffey commented on HIVE-5783: - We have unfortunately found a bug in MapredParquetInputFormat. We are working on a fix and will resubmit a patch once tested. Sorry :( > Native Parquet Support in Hive > -- > > Key: HIVE-5783 > URL: https://issues.apache.org/jira/browse/HIVE-5783 > Project: Hive > Issue Type: New Feature > Components: Serializers/Deserializers >Reporter: Justin Coffey > Assignee: Justin Coffey >Priority: Minor > Fix For: 0.13.0 > > Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, > HIVE-5783.patch, HIVE-5783.patch > > > Problem Statement: > Hive would be easier to use if it had native Parquet support. Our > organization, Criteo, uses Hive extensively. Therefore we built the Parquet > Hive integration and would like to now contribute that integration to Hive. > About Parquet: > Parquet is a columnar storage format for Hadoop and integrates with many > Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, > Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native > Parquet integration. > Changes Details: > Parquet was built with dependency management in mind and therefore only a > single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877509#comment-13877509 ] Justin Coffey commented on HIVE-5783: - [~leftylev], if you'd like I can give this a review and propose changes. > Native Parquet Support in Hive > -- > > Key: HIVE-5783 > URL: https://issues.apache.org/jira/browse/HIVE-5783 > Project: Hive > Issue Type: New Feature > Components: Serializers/Deserializers >Reporter: Justin Coffey >Assignee: Justin Coffey >Priority: Minor > Fix For: 0.13.0 > > Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, > HIVE-5783.patch, HIVE-5783.patch > > > Problem Statement: > Hive would be easier to use if it had native Parquet support. Our > organization, Criteo, uses Hive extensively. Therefore we built the Parquet > Hive integration and would like to now contribute that integration to Hive. > About Parquet: > Parquet is a columnar storage format for Hadoop and integrates with many > Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, > Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native > Parquet integration. > Changes Details: > Parquet was built with dependency management in mind and therefore only a > single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876712#comment-13876712 ] Justin Coffey commented on HIVE-5783: - Sorry for the spam in posts. Latest patch is good: - no author tags - no criteo copyright - builds against latest version of parquet (1.3.2) I attempted to create a review.apache.org review, but am unable to publish it because I can't assign any reviewers. > Native Parquet Support in Hive > -- > > Key: HIVE-5783 > URL: https://issues.apache.org/jira/browse/HIVE-5783 > Project: Hive > Issue Type: New Feature > Components: Serializers/Deserializers >Reporter: Justin Coffey >Assignee: Justin Coffey >Priority: Minor > Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch > > > Problem Statement: > Hive would be easier to use if it had native Parquet support. Our > organization, Criteo, uses Hive extensively. Therefore we built the Parquet > Hive integration and would like to now contribute that integration to Hive. > About Parquet: > Parquet is a columnar storage format for Hadoop and integrates with many > Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, > Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native > Parquet integration. > Changes Details: > Parquet was built with dependency management in mind and therefore only a > single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Coffey updated HIVE-5783: Attachment: HIVE-5783.patch this is the good one. had a final dependency to clean up. > Native Parquet Support in Hive > -- > > Key: HIVE-5783 > URL: https://issues.apache.org/jira/browse/HIVE-5783 > Project: Hive > Issue Type: New Feature > Components: Serializers/Deserializers > Reporter: Justin Coffey > Assignee: Justin Coffey >Priority: Minor > Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch > > > Problem Statement: > Hive would be easier to use if it had native Parquet support. Our > organization, Criteo, uses Hive extensively. Therefore we built the Parquet > Hive integration and would like to now contribute that integration to Hive. > About Parquet: > Parquet is a columnar storage format for Hadoop and integrates with many > Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, > Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native > Parquet integration. > Changes Details: > Parquet was built with dependency management in mind and therefore only a > single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-5783:
--------------------------------
    Attachment: (was: HIVE-5783.patch)
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-5783:
--------------------------------
    Attachment: HIVE-5783.patch

without license or author tags.
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-5783:
--------------------------------
    Attachment: (was: parquet-hive.patch)
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-5783:
--------------------------------
    Attachment: (was: hive-0.11-parquet.patch)
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875930#comment-13875930 ]

Justin Coffey commented on HIVE-5783:
-------------------------------------

Hi [~cwsteinbach]. Actually, that looks like just a boilerplate auto insertion in the affected class files. The ASF license is on our short list of approved OSS licenses, so I don't think it will be an issue for me to strip that out and resubmit. I'll just double check all is well and resubmit Monday.
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874666#comment-13874666 ]

Justin Coffey commented on HIVE-5783:
-------------------------------------

[~rusanu]: like so? https://reviews.facebook.net/differential/diff/47487/
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874626#comment-13874626 ]

Justin Coffey commented on HIVE-5783:
-------------------------------------

After much delay, here is the patch. This integrates the former "parquet-hive" project directly into ql.io.parquet. There is a qtest file (modeled on that of ORC) and unit tests for much of the code. This applies cleanly to the commit 3a7cea58ababfbbbdb6eac97fefa4298337b7c06 on the branch-0.11. Comments welcome :).
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-5783:
--------------------------------
    Attachment: parquet-hive.patch
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851661#comment-13851661 ]

Justin Coffey commented on HIVE-5783:
-------------------------------------

Yes this is true. We are refactoring to merge the whole parquet-hive project into hive. There are a couple of folks involved at this point and so it's taking a smidgen extra time what with holidays and all.
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844116#comment-13844116 ]

Justin Coffey commented on HIVE-5783:
-------------------------------------

[~cwsteinbach] all sounds good. Regarding test cases, I had some QTests prepared, but they were excluded from the initial patch to keep it as minimal as possible. We'll be sure to have full test coverage with the follow up patch.
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843428#comment-13843428 ]

Justin Coffey commented on HIVE-5783:
-------------------------------------

(sorry, errant trackpad submit on the last comment)

I wanted to add that I think the registry/format factory refactoring of the BaseSemanticAnalyzer still seems out of scope for this request. There is willingness to work on that on a different ticket, but I humbly submit that the two are not linked and one should not impede the other. Good?
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843425#comment-13843425 ]

Justin Coffey commented on HIVE-5783:
-------------------------------------

Hi [~cwsteinbach], so on the parquet-hive side, we're good to submit a new patch with direct serde integration. We'll work on that presently.
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841681#comment-13841681 ]

Justin Coffey commented on HIVE-5783:
-------------------------------------

{quote}
I think that was done before maven. I am sure there is a reason why RCFILE, ORCFILE and this add there own syntax, but this is something we might not want to copy-and-paste repeat just because the last person did it that way.
{quote}

I would normally agree with this, but I suppose I was trying to make as minor a change as possible.
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841666#comment-13841666 ]

Justin Coffey commented on HIVE-5783:
-------------------------------------

[~appodictic], regarding the support being built into the semantic analyzer, I mimicked what was done for ORC support. I agree that a hard coded switch statement is not the best approach, but thought a larger refactoring was out of scope for this request--and definitely not something to be done against the 0.11 branch :). Now with trunk support for parquet-hive I suppose we could tackle this in a more generic/robust way.

[~xuefuz], do you mean the actual parquet input/output formats and serde? If so, these are in the parquet-hive project (https://github.com/Parquet/parquet-mr/tree/master/parquet-hive).
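[Editor's note] The registry alternative to the hard-coded switch discussed in the comment above could look roughly like the sketch below. This is a hypothetical illustration, not Hive's actual API: the class and method names are invented, and only the ParquetHiveSerDe class name comes from the patch's file list.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a storage-format registry that could replace a
// hard-coded switch statement in a semantic analyzer: each format name maps
// to its serde class, and new formats register themselves instead of adding
// a new case.
public class StorageFormatRegistry {
    private final Map<String, String> serdeByFormat = new HashMap<>();

    /** Associates a storage format keyword (case-insensitive) with a serde class name. */
    public void register(String formatName, String serdeClass) {
        serdeByFormat.put(formatName.toLowerCase(), serdeClass);
    }

    /** Resolves a format keyword to its serde class, failing loudly for unknown formats. */
    public String lookup(String formatName) {
        String serde = serdeByFormat.get(formatName.toLowerCase());
        if (serde == null) {
            throw new IllegalArgumentException("Unknown storage format: " + formatName);
        }
        return serde;
    }

    public static void main(String[] args) {
        StorageFormatRegistry registry = new StorageFormatRegistry();
        registry.register("PARQUET",
            "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe");
        System.out.println(registry.lookup("parquet"));
    }
}
```

The point of the design is that `STORED AS <format>` resolution becomes data-driven, so adding a format touches the registry rather than the analyzer's control flow.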
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-5783:
--------------------------------
    Fix Version/s: 0.11.0
     Release Note: adds stored as parquet and setting parquet as the default storage engine.
           Status: Patch Available  (was: Open)

built and tested against hive 0.11--a rebase will be necessary to work against the trunk
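[Editor's note] The "stored as parquet" syntax named in the release note above would be exercised with a DDL statement along these lines (the table name and columns are illustrative, not from the patch):

```sql
-- Illustrative table; the STORED AS PARQUET clause is the syntax this patch adds.
CREATE TABLE parquet_test (
  id INT,
  name STRING
)
STORED AS PARQUET;
```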
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey updated HIVE-5783:
--------------------------------
    Attachment: hive-0.11-parquet.patch
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820168#comment-13820168 ]

Justin Coffey commented on HIVE-5783:
-------------------------------------

Thanks [~cwsteinbach] and [~ehans]. Regarding vectorization support the parquet team will review ASAP!
[jira] [Assigned] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Coffey reassigned HIVE-5783:
-----------------------------------

    Assignee: Justin Coffey
[jira] [Created] (HIVE-5783) Native Parquet Support in Hive
Justin Coffey created HIVE-5783:
-----------------------------------

             Summary: Native Parquet Support in Hive
                 Key: HIVE-5783
                 URL: https://issues.apache.org/jira/browse/HIVE-5783
             Project: Hive
          Issue Type: New Feature
            Reporter: Justin Coffey
            Priority: Minor