[jira] [Commented] (HIVE-22973) Handle 0 length batches in LlapArrowRowRecordReader
[ https://issues.apache.org/jira/browse/HIVE-22973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051851#comment-17051851 ] Hive QA commented on HIVE-22973: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 48s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 5s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 26s{color} | {color:blue} llap-ext-client in master has 2 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 42s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 10s{color} | {color:red} llap-ext-client: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 15s{color} | {color:red} The patch generated 2 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 19m 15s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20959/dev-support/hive-personality.sh | | git revision | master / deebfb6 | | Default Java | 1.8.0_111 | | findbugs | v3.0.1 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-20959/yetus/diff-checkstyle-llap-ext-client.txt | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20959/yetus/patch-asflicense-problems.txt | | modules | C: llap-ext-client itests/hive-unit U: . | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20959/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Handle 0 length batches in LlapArrowRowRecordReader > --- > > Key: HIVE-22973 > URL: https://issues.apache.org/jira/browse/HIVE-22973 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22973.01.patch, HIVE-22973.02.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In https://issues.apache.org/jira/browse/HIVE-22856, we allowed > {{LlapArrowBatchRecordReader}} to permit 0 length arrow batches. > {{LlapArrowRowRecordReader}} which is a wrapper over > {{LlapArrowBatchRecordReader}} should also handle this. > On one of the systems (cannot be reproduced easily) where we were running > test {{TestJdbcWithMiniLlapVectorArrow}}, we
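The fix direction described in HIVE-22973 - a row-level reader wrapping a batch reader must loop past 0-length batches instead of treating them as end-of-input - can be sketched as follows. This is an illustrative standalone sketch, not Hive's actual `LlapArrowRowRecordReader` code; the names `BatchSource` and `RowReader` are hypothetical.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the row-over-batch pattern from the issue:
// the row reader must skip 0-length batches from the underlying batch
// reader rather than interpreting them as end-of-stream.
public class RowReaderSketch {
  /** Stand-in for a batch-level reader: yields batches, possibly empty. */
  interface BatchSource {
    int[] nextBatch(); // null signals true end-of-input
  }

  static class RowReader {
    private final BatchSource source;
    private int[] batch = new int[0];
    private int rowIndex = 0;

    RowReader(BatchSource source) { this.source = source; }

    /** Returns the next row value, or null at end of input. */
    Integer nextRow() {
      // Key point: loop while the current batch is exhausted, so a
      // 0-length batch just triggers another fetch instead of EOF.
      while (batch != null && rowIndex >= batch.length) {
        batch = source.nextBatch();
        rowIndex = 0;
      }
      if (batch == null) return null;
      return batch[rowIndex++];
    }
  }

  public static void main(String[] args) {
    List<int[]> batches = List.of(new int[]{1, 2}, new int[]{}, new int[]{3});
    Iterator<int[]> it = batches.iterator();
    BatchSource src = () -> it.hasNext() ? it.next() : null;
    RowReader r = new RowReader(src);
    StringBuilder sb = new StringBuilder();
    for (Integer v = r.nextRow(); v != null; v = r.nextRow()) sb.append(v);
    System.out.println(sb); // the empty middle batch is skipped
  }
}
```

Without the `while` loop (i.e. a single `if`), the empty middle batch would end iteration early - which matches the symptom the issue describes.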
[jira] [Commented] (HIVE-22762) Leap day is incorrectly parsed during cast in Hive
[ https://issues.apache.org/jira/browse/HIVE-22762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051848#comment-17051848 ] Hive QA commented on HIVE-22762: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12995637/HIVE-22762.04.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:green}SUCCESS:{color} +1 due to 18096 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20958/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20958/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20958/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12995637 - PreCommit-HIVE-Build > Leap day is incorrectly parsed during cast in Hive > -- > > Key: HIVE-22762 > URL: https://issues.apache.org/jira/browse/HIVE-22762 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Minor > Fix For: 4.0.0 > > Attachments: HIVE-22762.01.patch, HIVE-22762.01.patch, > HIVE-22762.01.patch, HIVE-22762.01.patch, HIVE-22762.02.patch, > HIVE-22762.03.patch, HIVE-22762.03.patch, HIVE-22762.04.patch > > > While casting a string to a date with a custom date format having the day token > before the year and month tokens, the date is parsed incorrectly for leap days. > h3. How to reproduce > Execute {code}select cast("29 02 0" as date format "dd mm rr"){code} with > Hive. The query results in *2020-02-28*, incorrectly. 
> > Executing another cast with a slightly modified representation of the > date (day is preceded by year and month) is, however, parsed correctly: > {code}select cast("0 02 29" as date format "rr mm dd"){code} > It returns *2020-02-29*. -- This message was sent by Atlassian Jira (v8.3.4#803005)
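For comparison only (java.time, not Hive's parser): `java.time` collects all parsed fields first and resolves the date afterwards, so the order of day, month, and year tokens cannot change leap-day handling - the behavior the issue asks for. A minimal illustration:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;

// Illustration only: with java.time, "day before year" and "year before
// day" patterns resolve the leap day 2020-02-29 identically, because
// validation happens after every field has been read.
public class LeapDayParseDemo {
  public static void main(String[] args) {
    // "uu" is the two-digit reduced year (base 2000), so "20" -> 2020.
    DateTimeFormatter dayFirst =
        DateTimeFormatter.ofPattern("dd MM uu").withResolverStyle(ResolverStyle.STRICT);
    DateTimeFormatter yearFirst =
        DateTimeFormatter.ofPattern("uu MM dd").withResolverStyle(ResolverStyle.STRICT);

    LocalDate a = LocalDate.parse("29 02 20", dayFirst);
    LocalDate b = LocalDate.parse("20 02 29", yearFirst);
    System.out.println(a); // 2020-02-29
    System.out.println(b); // 2020-02-29
  }
}
```

With `ResolverStyle.STRICT`, an invalid combination such as day 29 in February of a non-leap year fails outright instead of being silently adjusted to the 28th.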
[jira] [Updated] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PRAVIN KUMAR SINHA updated HIVE-22865: -- Attachment: HIVE-22865.10.patch > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.10.patch, > HIVE-22865.2.patch, HIVE-22865.3.patch, HIVE-22865.4.patch, > HIVE-22865.5.patch, HIVE-22865.6.patch, HIVE-22865.7.patch, > HIVE-22865.8.patch, HIVE-22865.9.patch > > Time Spent: 4h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22972) Allow table id to be set for table creation requests
[ https://issues.apache.org/jira/browse/HIVE-22972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely updated HIVE-22972: -- Attachment: HIVE-22972.03.patch > Allow table id to be set for table creation requests > > > Key: HIVE-22972 > URL: https://issues.apache.org/jira/browse/HIVE-22972 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22972.01.patch, HIVE-22972.02.patch, > HIVE-22972.03.patch > > > Hive Metastore should accept requests for table creation where the id is set, > ignoring it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22972) Allow table id to be set for table creation requests
[ https://issues.apache.org/jira/browse/HIVE-22972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely updated HIVE-22972: -- Attachment: (was: HIVE-22972.03.patch) > Allow table id to be set for table creation requests > > > Key: HIVE-22972 > URL: https://issues.apache.org/jira/browse/HIVE-22972 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22972.01.patch, HIVE-22972.02.patch, > HIVE-22972.03.patch > > > Hive Metastore should accept requests for table creation where the id is set, > ignoring it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22762) Leap day is incorrectly parsed during cast in Hive
[ https://issues.apache.org/jira/browse/HIVE-22762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051822#comment-17051822 ] Hive QA commented on HIVE-22762: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 37s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 34s{color} | {color:blue} common in master has 63 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s{color} | {color:red} common: The patch generated 7 new + 0 unchanged - 0 fixed = 7 total (was 0) {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. 
Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 15s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 13m 24s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20958/dev-support/hive-personality.sh | | git revision | master / deebfb6 | | Default Java | 1.8.0_111 | | findbugs | v3.0.1 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-20958/yetus/diff-checkstyle-common.txt | | whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-20958/yetus/whitespace-eol.txt | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20958/yetus/patch-asflicense-problems.txt | | modules | C: common U: common | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20958/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. 
> Leap day is incorrectly parsed during cast in Hive > -- > > Key: HIVE-22762 > URL: https://issues.apache.org/jira/browse/HIVE-22762 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Minor > Fix For: 4.0.0 > > Attachments: HIVE-22762.01.patch, HIVE-22762.01.patch, > HIVE-22762.01.patch, HIVE-22762.01.patch, HIVE-22762.02.patch, > HIVE-22762.03.patch, HIVE-22762.03.patch, HIVE-22762.04.patch > > > While casting a string to a date with a custom date format having the day token > before the year and month tokens, the date is parsed incorrectly for leap days. > h3. How to reproduce > Execute {code}select cast("29 02 0" as date format "dd mm rr"){code} with > Hive. The query results in *2020-02-28*, incorrectly. > > Executing another cast with a slightly modified representation of the > date (day is preceded by year and month) is, however, parsed correctly: > {code}select cast("0 02 29" as date format "rr mm dd"){code} > It returns *2020-02-29*. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David McGinnis updated HIVE-21218: -- Attachment: HIVE-21218.6.patch > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.6.patch, HIVE-21218.patch > > Time Spent: 12.5h > Remaining Estimate: 0h > > According to [Google > groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A] > the Confluent Avro serializer uses a proprietary format for the Kafka value - > <magic byte> <4 bytes of schema ID> <Avro payload that conforms to the schema>. > This format does not cause any problem for the Confluent Kafka deserializer, which > respects the format; however, for the Hive Kafka handler it is a bit of a problem to > correctly deserialize the Kafka value, because Hive uses a custom deserializer from > bytes to objects and ignores the Kafka consumer ser/deser classes provided via > table property. > It would be nice to support the Confluent format with the magic byte. > Also it would be great to support Schema Registry as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051815#comment-17051815 ] Hive QA commented on HIVE-21218: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12995636/HIVE-21218.5.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:green}SUCCESS:{color} +1 due to 18096 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20957/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20957/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20957/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12995636 - PreCommit-HIVE-Build > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 12.5h > Remaining Estimate: 0h > > According to [Google > groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A] > the Confluent Avro serializer uses a proprietary format for the Kafka value - > <magic byte> <4 bytes of schema ID> <Avro payload that conforms to the schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, which > respects the format; however, for the Hive Kafka handler it is a bit of a problem to > correctly deserialize the Kafka value, because Hive uses a custom deserializer from > bytes to objects and ignores the Kafka consumer ser/deser classes provided via > table property. > It would be nice to support the Confluent format with the magic byte. > Also it would be great to support Schema Registry as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
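The wire layout described in the issue (1 magic byte, then a 4-byte big-endian schema ID, then the Avro-encoded record) can be split off without any Confluent dependency. A minimal sketch - a hypothetical helper for illustration, not the patch's `KafkaSerDe` code:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch (not Hive's implementation): peel the Confluent framing
// <magic byte 0x0><4-byte schema id><avro payload> off a Kafka value.
public class ConfluentFraming {
  static final int HEADER_SIZE = 5; // 1 magic byte + 4-byte schema id

  /** Reads the schema-registry ID from the 4 bytes after the magic byte. */
  static int schemaId(byte[] kafkaValue) {
    if (kafkaValue.length < HEADER_SIZE || kafkaValue[0] != 0) {
      throw new IllegalArgumentException("not a Confluent-framed message");
    }
    return ByteBuffer.wrap(kafkaValue, 1, 4).getInt(); // big-endian by default
  }

  /** Returns the raw Avro bytes with the 5-byte header stripped. */
  static byte[] avroPayload(byte[] kafkaValue) {
    return Arrays.copyOfRange(kafkaValue, HEADER_SIZE, kafkaValue.length);
  }

  public static void main(String[] args) {
    byte[] value = {0, 0, 0, 0, 42, 1, 2, 3}; // schema id 42, 3-byte payload
    System.out.println(schemaId(value));           // 42
    System.out.println(avroPayload(value).length); // 3
  }
}
```

The remaining payload is a plain Avro binary record, so it can then be handed to whatever Avro deserializer the table is configured with.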
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398141=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398141 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 05/Mar/20 05:16 Start Date: 05/Mar/20 05:16 Worklog Time Spent: 10m Work Description: davidov541 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r388082372 ## File path: kafka-handler/README.md ## @@ -50,6 +50,9 @@ ALTER TABLE SET TBLPROPERTIES ( "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe"); ``` + +If you use the Confluent Avro serializer/deserializer with Schema Registry, you may want to remove the first 5 bytes, which represent the magic byte + schema ID from the registry. Review comment: I'm not sure how that is germane to this thread or to this changelist at all. Skipping the bytes does not currently have any connection to having a schema, which is already implemented as part of the Kafka Avro support. The system would work the same as it would if you didn't specify a schema in the first place. I don't see any tests in the other parts which test this, although there are plenty of query tests which test the use of the literal and URL properties, so I suspect there is one such test there. Given that that is orthogonal to this problem, however, I see no need to add another test to this changeset for that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398141) Time Spent: 12.5h (was: 12h 20m) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 12.5h > Remaining Estimate: 0h > > According to [Google > groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A] > the Confluent Avro serializer uses a proprietary format for the Kafka value - > <magic byte> <4 bytes of schema ID> <Avro payload that conforms to the schema>. > This format does not cause any problem for the Confluent Kafka deserializer, which > respects the format; however, for the Hive Kafka handler it is a bit of a problem to > correctly deserialize the Kafka value, because Hive uses a custom deserializer from > bytes to objects and ignores the Kafka consumer ser/deser classes provided via > table property. > It would be nice to support the Confluent format with the magic byte. > Also it would be great to support Schema Registry as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398139=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398139 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 05/Mar/20 05:08 Start Date: 05/Mar/20 05:08 Worklog Time Spent: 10m Work Description: davidov541 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r388080591 ## File path: kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.kafka; + +import com.google.common.collect.Maps; +import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient; +import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig; +import io.confluent.kafka.serializers.KafkaAvroSerializer; +import org.apache.avro.Schema; +import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable; +import org.junit.Assert; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; + +import java.util.Arrays; +import java.util.HashMap; +import java.util.Map; + +/** + * Test class for Hive Kafka Avro SerDe with variable bytes skipped. + */ +public class AvroBytesConverterTest { + private static SimpleRecord simpleRecord = SimpleRecord.newBuilder().setId("123").setName("test").build(); + private static byte[] simpleRecordConfluentBytes; + + @Rule + public ExpectedException exception = ExpectedException.none(); + + /** + * Use the KafkaAvroSerializer from Confluent to serialize the simpleRecord. 
+ */ + @BeforeClass + public static void setUp() { +Map<String, Object> config = Maps.newHashMap(); +config.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081"); +KafkaAvroSerializer avroSerializer = new KafkaAvroSerializer(new MockSchemaRegistryClient()); +avroSerializer.configure(config, false); +simpleRecordConfluentBytes = avroSerializer.serialize("temp", simpleRecord); + } + + private void runConversionTest(KafkaSerDe.AvroBytesConverter conv, byte[] serializedSimpleRecord) { +AvroGenericRecordWritable simpleRecordWritable = conv.getWritable(serializedSimpleRecord); + +Assert.assertNotNull(simpleRecordWritable); +Assert.assertEquals(SimpleRecord.class, simpleRecordWritable.getRecord().getClass()); + +SimpleRecord simpleRecordDeserialized = (SimpleRecord) simpleRecordWritable.getRecord(); + +Assert.assertNotNull(simpleRecordDeserialized); +Assert.assertEquals(simpleRecord, simpleRecordDeserialized); + } + + /** + * Tests the default case of no skipped bytes per record works properly. + */ + @Test + public void convertWithAvroBytesConverter() { +// Since the serialized version was created by Confluent, lets remove the first five bytes to get the actual message. +byte[] simpleRecordWithNoOffset = Arrays.copyOfRange(simpleRecordConfluentBytes, 5, simpleRecordConfluentBytes.length); + +Schema schema = SimpleRecord.getClassSchema(); +KafkaSerDe.AvroBytesConverter conv = new KafkaSerDe.AvroBytesConverter(schema); +runConversionTest(conv, simpleRecordWithNoOffset); + } + + /** + * Tests that the skip converter skips 5 bytes properly, which matches what Confluent needs. + */ + @Test + public void convertWithConfluentAvroBytesConverter() { +Schema schema = SimpleRecord.getClassSchema(); +KafkaSerDe.AvroSkipBytesConverter conv = new KafkaSerDe.AvroSkipBytesConverter(schema, 5); +runConversionTest(conv, simpleRecordConfluentBytes); + } + + /** + * Tests that the skip converter skips a custom number of bytes properly. 
+ */ + @Test + public void convertWithCustomAvroSkipBytesConverter() { +int offset = 2; +// Remove all but two bytes of the five byte offset which Confluent adds, +// to simulate a message with only 2 bytes in front of each message. +byte[] simpleRecordAsOffsetBytes =
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398135 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 05/Mar/20 05:03 Start Date: 05/Mar/20 05:03 Worklog Time Spent: 10m Work Description: davidov541 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r388079696 ## File path: kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.kafka; + +import com.google.common.collect.Maps; +import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient; +import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig; +import io.confluent.kafka.serializers.KafkaAvroSerializer; +import org.apache.avro.Schema; +import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable; +import org.junit.Assert; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; + +import java.util.Arrays; +import java.util.HashMap; +import java.util.Map; + +/** + * Test class for Hive Kafka Avro SerDe with variable bytes skipped. + */ +public class AvroBytesConverterTest { + private static SimpleRecord simpleRecord = SimpleRecord.newBuilder().setId("123").setName("test").build(); + private static byte[] simpleRecordConfluentBytes; + + @Rule + public ExpectedException exception = ExpectedException.none(); + + /** + * Use the KafkaAvroSerializer from Confluent to serialize the simpleRecord. + */ + @BeforeClass + public static void setUp() { +Map<String, Object> config = Maps.newHashMap(); +config.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081"); +KafkaAvroSerializer avroSerializer = new KafkaAvroSerializer(new MockSchemaRegistryClient()); +avroSerializer.configure(config, false); +simpleRecordConfluentBytes = avroSerializer.serialize("temp", simpleRecord); + } + + private void runConversionTest(KafkaSerDe.AvroBytesConverter conv, byte[] serializedSimpleRecord) { Review comment: We are instantiating KafkaSerDe.AvroBytesConverter for the normal case and KafkaSerDe.AvroSkipBytesConverter for the other cases. If we wanted to move creation of the converter into the function, it would require an if-else statement, making it more complex than just doing this. Additionally, this gives us the leeway to better add tests if more converters are needed later. 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398135) Time Spent: 12h 10m (was: 12h) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 12h 10m > Remaining Estimate: 0h > > According to [Google > groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A] > the Confluent Avro serializer uses a proprietary format for the Kafka value - > <magic byte> <4 bytes of schema ID> <Avro payload that conforms to the schema>. > This format does not cause any
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398133=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398133 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 05/Mar/20 04:59 Start Date: 05/Mar/20 04:59 Worklog Time Spent: 10m Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r388078819 ## File path: kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.kafka; + +import com.google.common.collect.Maps; +import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient; +import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig; +import io.confluent.kafka.serializers.KafkaAvroSerializer; +import org.apache.avro.Schema; +import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable; +import org.junit.Assert; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; + +import java.util.Arrays; +import java.util.HashMap; +import java.util.Map; + +/** + * Test class for Hive Kafka Avro SerDe with variable bytes skipped. + */ +public class AvroBytesConverterTest { + private static SimpleRecord simpleRecord = SimpleRecord.newBuilder().setId("123").setName("test").build(); + private static byte[] simpleRecordConfluentBytes; + + @Rule + public ExpectedException exception = ExpectedException.none(); + + /** + * Use the KafkaAvroSerializer from Confluent to serialize the simpleRecord. + */ + @BeforeClass + public static void setUp() { +Map config = Maps.newHashMap(); +config.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081"); +KafkaAvroSerializer avroSerializer = new KafkaAvroSerializer(new MockSchemaRegistryClient()); +avroSerializer.configure(config, false); +simpleRecordConfluentBytes = avroSerializer.serialize("temp", simpleRecord); + } + + private void runConversionTest(KafkaSerDe.AvroBytesConverter conv, byte[] serializedSimpleRecord) { Review comment: Pass Integer in as a parameter, and you could move more code in here. For the case where there's no offset, pass in null This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398133) Time Spent: 12h (was: 11h 50m)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398132=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398132 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 05/Mar/20 04:59 Start Date: 05/Mar/20 04:59 Worklog Time Spent: 10m Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r388078182 ## File path: kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.kafka; + +import com.google.common.collect.Maps; +import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient; +import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig; +import io.confluent.kafka.serializers.KafkaAvroSerializer; +import org.apache.avro.Schema; +import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable; +import org.junit.Assert; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; + +import java.util.Arrays; +import java.util.HashMap; +import java.util.Map; + +/** + * Test class for Hive Kafka Avro SerDe with variable bytes skipped. + */ +public class AvroBytesConverterTest { + private static SimpleRecord simpleRecord = SimpleRecord.newBuilder().setId("123").setName("test").build(); + private static byte[] simpleRecordConfluentBytes; + + @Rule + public ExpectedException exception = ExpectedException.none(); + + /** + * Use the KafkaAvroSerializer from Confluent to serialize the simpleRecord. 
+ */ + @BeforeClass + public static void setUp() { +Map config = Maps.newHashMap(); +config.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081"); +KafkaAvroSerializer avroSerializer = new KafkaAvroSerializer(new MockSchemaRegistryClient()); +avroSerializer.configure(config, false); +simpleRecordConfluentBytes = avroSerializer.serialize("temp", simpleRecord); + } + + private void runConversionTest(KafkaSerDe.AvroBytesConverter conv, byte[] serializedSimpleRecord) { +AvroGenericRecordWritable simpleRecordWritable = conv.getWritable(serializedSimpleRecord); + +Assert.assertNotNull(simpleRecordWritable); +Assert.assertEquals(SimpleRecord.class, simpleRecordWritable.getRecord().getClass()); + +SimpleRecord simpleRecordDeserialized = (SimpleRecord) simpleRecordWritable.getRecord(); + +Assert.assertNotNull(simpleRecordDeserialized); +Assert.assertEquals(simpleRecord, simpleRecordDeserialized); + } + + /** + * Tests the default case of no skipped bytes per record works properly. + */ + @Test + public void convertWithAvroBytesConverter() { +// Since the serialized version was created by Confluent, lets remove the first five bytes to get the actual message. +byte[] simpleRecordWithNoOffset = Arrays.copyOfRange(simpleRecordConfluentBytes, 5, simpleRecordConfluentBytes.length); + +Schema schema = SimpleRecord.getClassSchema(); +KafkaSerDe.AvroBytesConverter conv = new KafkaSerDe.AvroBytesConverter(schema); +runConversionTest(conv, simpleRecordWithNoOffset); + } + + /** + * Tests that the skip converter skips 5 bytes properly, which matches what Confluent needs. + */ + @Test + public void convertWithConfluentAvroBytesConverter() { +Schema schema = SimpleRecord.getClassSchema(); +KafkaSerDe.AvroSkipBytesConverter conv = new KafkaSerDe.AvroSkipBytesConverter(schema, 5); +runConversionTest(conv, simpleRecordConfluentBytes); + } + + /** + * Tests that the skip converter skips a custom number of bytes properly. 
+ */ + @Test + public void convertWithCustomAvroSkipBytesConverter() { +int offset = 2; +// Remove all but two bytes of the five byte offset which Confluent adds, +// to simulate a message with only 2 bytes in front of each message. +byte[] simpleRecordAsOffsetBytes =
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398134=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398134 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 05/Mar/20 04:59 Start Date: 05/Mar/20 04:59 Worklog Time Spent: 10m Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r388077675 ## File path: kafka-handler/README.md ## @@ -50,6 +50,9 @@ ALTER TABLE SET TBLPROPERTIES ( "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe"); ``` + +If you use Confluent's Avro serialzier or deserializer with the Confluent Schema Registry, you will need to remove five bytes from beginning of each message. These five bytes represent [a magic byte and a four-byte schema ID from registry.](https://docs.confluent.io/current/schema-registry/serializer-formatter.html#wire-format) Review comment: Typo: serializer Remove five bytes from *the* beginning of each From *the* Registry This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398134) -- This message was sent by Atlassian Jira (v8.3.4#803005)
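For readers following the review threads above, the Confluent wire format under discussion can be sketched in code. This is an illustrative helper only, not code from the patch; the class and method names are hypothetical, and the only assumption is the layout documented by Confluent (one zero magic byte, a big-endian four-byte schema ID, then the Avro payload):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative only: splits a Confluent-framed Kafka value into its parts.
// Assumed layout (per the Confluent wire-format docs linked in the review):
//   byte 0      -> magic byte (0x0)
//   bytes 1..4  -> schema registry ID, big-endian int
//   bytes 5..   -> Avro-serialized record
public class ConfluentFraming {
    static final int HEADER_BYTES = 5;

    static int schemaId(byte[] kafkaValue) {
        ByteBuffer buf = ByteBuffer.wrap(kafkaValue);
        byte magic = buf.get();
        if (magic != 0x0) {
            throw new IllegalArgumentException("unexpected magic byte: " + magic);
        }
        return buf.getInt(); // next four bytes, big-endian
    }

    static byte[] avroPayload(byte[] kafkaValue) {
        // Dropping the five header bytes leaves what a plain Avro reader expects.
        return Arrays.copyOfRange(kafkaValue, HEADER_BYTES, kafkaValue.length);
    }
}
```

Skipping `HEADER_BYTES` here is exactly the "remove five bytes from the beginning of each message" step that the README change describes.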
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398130=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398130 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 05/Mar/20 04:52 Start Date: 05/Mar/20 04:52 Worklog Time Spent: 10m Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r388077250 ## File path: kafka-handler/README.md ## @@ -50,6 +50,9 @@ ALTER TABLE SET TBLPROPERTIES ( "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe"); ``` + +If you use Confluent Avro serialzier/deserializer with Schema Registry you may want to remove 5 bytes from beginning that represents magic byte + schema ID from registry. Review comment: Sorry if I missed it, but is there a test case for when you do skip the bytes, but there's no url or literal given? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398130) Time Spent: 11h 50m (was: 11h 40m) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398127=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398127 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 05/Mar/20 04:49 Start Date: 05/Mar/20 04:49 Worklog Time Spent: 10m Work Description: cricket007 commented on issue #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#issuecomment-595026999 You may want to rebase? I see your commit for 14888 in here This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398127) Time Spent: 11h 40m (was: 11.5h) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051800#comment-17051800 ] Hive QA commented on HIVE-21218: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 59s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 12s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 44s{color} | {color:blue} serde in master has 197 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 32s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 17s{color} | {color:red} kafka-handler in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 17s{color} | {color:red} kafka-handler in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 17s{color} | {color:red} kafka-handler in the patch failed. 
{color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 10s{color} | {color:red} kafka-handler: The patch generated 10 new + 1 unchanged - 0 fixed = 11 total (was 1) {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 16s{color} | {color:red} kafka-handler in the patch failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 15s{color} | {color:red} The patch generated 2 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 17m 28s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc xml compile findbugs checkstyle | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20957/dev-support/hive-personality.sh | | git revision | master / deebfb6 | | Default Java | 1.8.0_111 | | findbugs | v3.0.1 | | mvninstall | http://104.198.109.242/logs//PreCommit-HIVE-Build-20957/yetus/patch-mvninstall-kafka-handler.txt | | compile | http://104.198.109.242/logs//PreCommit-HIVE-Build-20957/yetus/patch-compile-kafka-handler.txt | | javac | http://104.198.109.242/logs//PreCommit-HIVE-Build-20957/yetus/patch-compile-kafka-handler.txt | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-20957/yetus/diff-checkstyle-kafka-handler.txt | | whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-20957/yetus/whitespace-eol.txt | | findbugs | http://104.198.109.242/logs//PreCommit-HIVE-Build-20957/yetus/patch-findbugs-kafka-handler.txt | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20957/yetus/patch-asflicense-problems.txt | | modules | C: serde kafka-handler U: . | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20957/yetus.txt | | Powered by | Apache Yetus http://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (HIVE-22977) Merge delta files instead of running a query in major/minor compaction
[ https://issues.apache.org/jira/browse/HIVE-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051792#comment-17051792 ] Hive QA commented on HIVE-22977: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12995635/HIVE-22977.01.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20956/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20956/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20956/ Messages: {noformat} This message was trimmed, see log for full details [loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/DispatcherType.class)]] [loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/Filter.class)]] [loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/FilterChain.class)]] [loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/FilterConfig.class)]] [loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/ServletException.class)]] [loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/ServletRequest.class)]] [loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/ServletResponse.class)]] [loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/annotation/WebFilter.class)]] [loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/http/HttpServletRequest.class)]] [loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/http/HttpServletResponse.class)]] [loading ZipFileIndexFileObject[/data/hiveptest/working/apache-github-source-source/classification/target/hive-classification-4.0.0-SNAPSHOT.jar(org/apache/hadoop/hive/common/classification/InterfaceAudience$LimitedPrivate.class)]] [loading ZipFileIndexFileObject[/data/hiveptest/working/apache-github-source-source/classification/target/hive-classification-4.0.0-SNAPSHOT.jar(org/apache/hadoop/hive/common/classification/InterfaceStability$Unstable.class)]] [loading ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/io/ByteArrayOutputStream.class)]] [loading ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/io/OutputStream.class)]] [loading ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/io/Closeable.class)]] [loading ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/lang/AutoCloseable.class)]] [loading ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/io/Flushable.class)]] [loading ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(javax/xml/bind/annotation/XmlRootElement.class)]] [loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/apache/commons/commons-exec/1.1/commons-exec-1.1.jar(org/apache/commons/exec/ExecuteException.class)]] [loading 
ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/security/PrivilegedExceptionAction.class)]] [loading ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/util/concurrent/ExecutionException.class)]] [loading ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/util/concurrent/TimeoutException.class)]] [loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/apache/hadoop/hadoop-common/3.1.0/hadoop-common-3.1.0.jar(org/apache/hadoop/fs/FileSystem.class)]] [loading ZipFileIndexFileObject[/data/hiveptest/working/apache-github-source-source/shims/common/target/hive-shims-common-4.0.0-SNAPSHOT.jar(org/apache/hadoop/hive/shims/HadoopShimsSecure.class)]] [loading ZipFileIndexFileObject[/data/hiveptest/working/apache-github-source-source/shims/common/target/hive-shims-common-4.0.0-SNAPSHOT.jar(org/apache/hadoop/hive/shims/ShimLoader.class)]] [loading
[jira] [Commented] (HIVE-22972) Allow table id to be set for table creation requests
[ https://issues.apache.org/jira/browse/HIVE-22972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051788#comment-17051788 ] Hive QA commented on HIVE-22972: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12995658/HIVE-22972.03.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 18096 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.metastore.TestMetastoreHousekeepingLeaderEmptyConfig.testHouseKeepingThreadExistence (batchId=252) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20955/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20955/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20955/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12995658 - PreCommit-HIVE-Build > Allow table id to be set for table creation requests > > > Key: HIVE-22972 > URL: https://issues.apache.org/jira/browse/HIVE-22972 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22972.01.patch, HIVE-22972.02.patch, > HIVE-22972.03.patch > > > Hive Metastore should accept requests for table creation where the id is set, > ignoring it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex
[ https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051768#comment-17051768 ] Gopal Vijayaraghavan commented on HIVE-22966: - LGTM - +1 > LLAP: Consider including waitTime for comparing attempts in same vertex > --- > > Key: HIVE-22966 > URL: https://issues.apache.org/jira/browse/HIVE-22966 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-22966.3.patch, HIVE-22966.4.patch > > > When attempts are compared within same vertex, it should pick up the attempt > with longest wait time to avoid starvation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
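The improvement above boils down to an ordering: among attempts of the same vertex, prefer the one that has waited longest, so no attempt starves. A minimal sketch of such a comparator (hypothetical names, not the actual LLAP scheduler code):

```java
import java.util.Comparator;

// Illustrative sketch of wait-time-aware ordering between attempts
// of the same vertex: the attempt that has waited longest sorts first.
public class WaitTimeOrdering {
    static class Attempt {
        final String id;
        final long waitTimeMs; // time spent waiting to be scheduled

        Attempt(String id, long waitTimeMs) {
            this.id = id;
            this.waitTimeMs = waitTimeMs;
        }
    }

    // Descending by waitTimeMs: longest wait first avoids starvation.
    static final Comparator<Attempt> LONGEST_WAIT_FIRST =
        Comparator.comparingLong((Attempt a) -> a.waitTimeMs).reversed();
}
```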
[jira] [Updated] (HIVE-22978) Fix decimal precision and scale inference for aggregate rewriting in Calcite
[ https://issues.apache.org/jira/browse/HIVE-22978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-22978: --- Attachment: HIVE-22978.patch > Fix decimal precision and scale inference for aggregate rewriting in Calcite > > > Key: HIVE-22978 > URL: https://issues.apache.org/jira/browse/HIVE-22978 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Attachments: HIVE-22978.patch > > > Calcite rules can do rewritings of aggregate functions, e.g., {{avg}} into > {{sum/count}}. When type of {{avg}} is decimal, inference of intermediate > precision and scale for the division is not done correctly. The reason is > that we miss support for some types in method {{getDefaultPrecision}} in > {{HiveTypeSystemImpl}}. Additionally, {{deriveSumType}} should be overridden > in {{HiveTypeSystemImpl}} to abide by the Hive semantics for sum aggregate > type inference. -- This message was sent by Atlassian Jira (v8.3.4#803005)
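For context on the precision/scale inference mentioned above: when {{avg}} over a decimal(p1, s1) is rewritten as {{sum/count}}, the intermediate division must infer a result type. As a sketch, assuming Hive follows the SQL Server-style rule described in its decimal arithmetic design (results capped at the maximum precision of 38), the division d1(p1, s1) / d2(p2, s2) yields:

```latex
% Assumed decimal division rule (SQL Server-style, as in Hive's decimal
% design); the avg -> sum/count rewrite must reproduce this inference.
\begin{align*}
  s &= \max\bigl(6,\; s_1 + p_2 + 1\bigr) \\
  p &= \min\bigl(38,\; p_1 - s_1 + s_2 + s\bigr)
\end{align*}
```

Missing these rules for some types in {{getDefaultPrecision}} is what leads to the incorrect intermediate types the issue describes.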
[jira] [Updated] (HIVE-22978) Fix decimal precision and scale inference for aggregate rewriting in Calcite
[ https://issues.apache.org/jira/browse/HIVE-22978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-22978: --- Status: Patch Available (was: In Progress) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-22978) Fix decimal precision and scale inference for aggregate rewriting in Calcite
[ https://issues.apache.org/jira/browse/HIVE-22978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-22978 started by Jesus Camacho Rodriguez. -- -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22972) Allow table id to be set for table creation requests
[ https://issues.apache.org/jira/browse/HIVE-22972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051759#comment-17051759 ] Hive QA commented on HIVE-22972: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 34s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 1m 14s{color} | {color:blue} standalone-metastore/metastore-server in master has 185 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 14s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 15m 51s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20955/dev-support/hive-personality.sh | | git revision | master / deebfb6 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20955/yetus/patch-asflicense-problems.txt | | modules | C: standalone-metastore/metastore-server U: standalone-metastore/metastore-server | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20955/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Allow table id to be set for table creation requests > > > Key: HIVE-22972 > URL: https://issues.apache.org/jira/browse/HIVE-22972 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22972.01.patch, HIVE-22972.02.patch, > HIVE-22972.03.patch > > > Hive Metastore should accept requests for table creation where the id is set, > ignoring it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-22978) Fix decimal precision and scale inference for aggregate rewriting in Calcite
[ https://issues.apache.org/jira/browse/HIVE-22978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez reassigned HIVE-22978: -- > Fix decimal precision and scale inference for aggregate rewriting in Calcite > > > Key: HIVE-22978 > URL: https://issues.apache.org/jira/browse/HIVE-22978 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > > Calcite rules can do rewritings of aggregate functions, e.g., {{avg}} into > {{sum/count}}. When type of {{avg}} is decimal, inference of intermediate > precision and scale for the division is not done correctly. The reason is > that we miss support for some types in method {{getDefaultPrecision}} in > {{HiveTypeSystemImpl}}. Additionally, {{deriveSumType}} should be overridden > in {{HiveTypeSystemImpl}} to abide by the Hive semantics for sum aggregate > type inference. -- This message was sent by Atlassian Jira (v8.3.4#803005)
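For context on the rewrite above: Calcite turns avg(x) into sum(x)/count(x), so the planner must infer a decimal type for the intermediate division. A minimal sketch of the SQL-Server-style precision/scale rule that Hive's decimal arithmetic broadly follows — the exact constants and the example input types are assumptions for illustration, not read from HiveTypeSystemImpl:

```java
// Sketch of decimal division type inference: for decimal(p1,s1) / decimal(p2,s2),
// a common SQL-Server-style rule (assumed here, not taken from Hive source) is:
//   scale     = max(6, s1 + p2 + 1)
//   precision = p1 - s1 + s2 + scale, capped at the maximum precision (38).
// Real implementations also shrink the scale when the precision overflows the
// cap; this sketch only caps the precision.
public class DecimalDivisionType {
    static final int MAX_PRECISION = 38;

    // Returns {precision, scale} for the division result.
    static int[] divideType(int p1, int s1, int p2, int s2) {
        int scale = Math.max(6, s1 + p2 + 1);
        int precision = Math.min(MAX_PRECISION, p1 - s1 + s2 + scale);
        return new int[]{precision, scale};
    }

    public static void main(String[] args) {
        // Hypothetical avg over decimal(10,2): a widened sum type decimal(20,2)
        // divided by a count carried as decimal(19,0) (both assumed).
        int[] t = divideType(20, 2, 19, 0);
        System.out.println("decimal(" + t[0] + "," + t[1] + ")");
    }
}
```

With these assumed inputs the rule yields scale = max(6, 2 + 19 + 1) = 22 and precision = min(38, 20 - 2 + 0 + 22) = 38, i.e. decimal(38,22) — which is exactly the kind of intermediate type the ticket says is currently inferred incorrectly when types are missing from getDefaultPrecision.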
[jira] [Commented] (HIVE-22256) Rewriting fails when `IN` clause has items in different order in MV and query.
[ https://issues.apache.org/jira/browse/HIVE-22256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051753#comment-17051753 ] Jesus Camacho Rodriguez commented on HIVE-22256: Thanks for checking [~vgarg]. I will change the approach for the fix slightly (instead of moving the rules around) so those issues go away. > Rewriting fails when `IN` clause has items in different order in MV and query. > -- > > Key: HIVE-22256 > URL: https://issues.apache.org/jira/browse/HIVE-22256 > Project: Hive > Issue Type: Sub-task > Components: CBO, Materialized views >Affects Versions: 3.1.2 >Reporter: Steve Carlin >Assignee: Jesus Camacho Rodriguez >Priority: Major > Attachments: HIVE-22256.patch, expr2.sql > > > Rewriting fails on following materialized view and query (script is also > attached): > create materialized view view2 stored as orc as (select prod_id, cust_id, > store_id, sale_date, qty, amt, descr from sales where cust_id in (1,2,3,4,5)); > explain extended select prod_id, cust_id from sales where cust_id in > (5,1,2,3,4); -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22954) Schedule Repl Load using Hive Scheduler
[ https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051745#comment-17051745 ] Hive QA commented on HIVE-22954: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12995664/HIVE-22954.13.patch {color:green}SUCCESS:{color} +1 due to 23 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 28 failed/errored test(s), 18089 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[repl_load_requires_admin] (batchId=107) org.apache.hadoop.hive.ql.parse.TestReplAcidTablesWithJsonMessage.testMultiDBTxn (batchId=277) org.apache.hadoop.hive.ql.parse.TestReplAcrossInstancesWithJsonMessageFormat.testBootStrapDumpOfWarehouse (batchId=268) org.apache.hadoop.hive.ql.parse.TestReplAcrossInstancesWithJsonMessageFormat.testIncrementalDumpOfWarehouse (batchId=268) org.apache.hadoop.hive.ql.parse.TestReplAcrossInstancesWithJsonMessageFormat.testMoveOptimizationIncrementalFailureAfterCopy (batchId=268) org.apache.hadoop.hive.ql.parse.TestReplAcrossInstancesWithJsonMessageFormat.testMoveOptimizationIncrementalFailureAfterCopyReplace (batchId=268) org.apache.hadoop.hive.ql.parse.TestReplTableMigrationWithJsonFormat.testIncrementalLoadMigrationManagedToAcidFailure (batchId=275) org.apache.hadoop.hive.ql.parse.TestReplTableMigrationWithJsonFormat.testIncrementalLoadMigrationManagedToAcidFailurePart (batchId=275) org.apache.hadoop.hive.ql.parse.TestReplWithJsonMessageFormat.testIncrementalLoadFailAndRetry (batchId=260) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalLoadFailAndRetry (batchId=269) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTables.testMultiDBTxn (batchId=279) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootStrapDumpOfWarehouse (batchId=273) 
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testIncrementalDumpOfWarehouse (batchId=273) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testMoveOptimizationIncrementalFailureAfterCopy (batchId=273) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testMoveOptimizationIncrementalFailureAfterCopyReplace (batchId=273) org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcidFailure (batchId=263) org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcidFailurePart (batchId=263) org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testRetryFailure (batchId=281) org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenariosACID.testRetryFailure (batchId=261) org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenariosMM.testRetryFailure (batchId=266) org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenariosMMNoAutogather.testRetryFailure (batchId=258) org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenariosMigration.testRetryFailure (batchId=265) org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenariosMigrationNoAutogather.testRetryFailure (batchId=276) org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenariosNoAutogather.testRetryFailure (batchId=278) org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow.testComplexQuery (batchId=293) org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow.testKillQuery (batchId=293) org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow.testKillQueryByTagNegative (batchId=293) org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow.testLlapInputFormatEndToEnd (batchId=293) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20954/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20954/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20954/ Messages: {noformat} Executing 
org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 28 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12995664 - PreCommit-HIVE-Build > Schedule Repl Load using Hive Scheduler > --- > > Key: HIVE-22954 > URL: https://issues.apache.org/jira/browse/HIVE-22954 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, > HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, > HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, >
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398065 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 05/Mar/20 01:55 Start Date: 05/Mar/20 01:55 Worklog Time Spent: 10m Work Description: davidov541 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r388038766 ## File path: kafka-handler/README.md ## @@ -50,6 +50,9 @@ ALTER TABLE SET TBLPROPERTIES ( "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe"); ``` + +If you use the Confluent Avro serializer/deserializer with Schema Registry, you may want to remove the 5 bytes at the beginning that represent the magic byte + schema ID from the registry. Review comment: How about this: for now, we do not call this Confluent, and instead document that if you are using Confluent, you need to use skip bytes = 5. Once we implement the feature to use the schema ID properly, then we can use Confluent at that point. That way it is clear what functionality must be in place in order to have a separate SerDe type. On a related note, I don't see any reference to a JIRA to implement the feature to use the schema ID. Do either of you have that one? If not, should I create that and link it for reference? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398065) Time Spent: 11.5h (was: 11h 20m) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers > Affects Versions: 3.1.1 > Reporter: Milan Baran > Assignee: David McGinnis > Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 11.5h > Remaining Estimate: 0h > > According to [Google > groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A] > the Confluent Avro serializer uses a proprietary format for the Kafka value - > <magic byte 0x0><4 bytes of schema ID><Avro-serialized bytes that conform to the schema>. > This format does not cause any problem for the Confluent Kafka deserializer, which > respects the format; however, for the Hive Kafka handler it is a bit of a problem to > correctly deserialize the Kafka value, because Hive uses a custom deserializer from > bytes to objects and ignores the Kafka consumer ser/deser classes provided via > table property. > It would be nice to support the Confluent format with the magic byte. > Also it would be great to support Schema Registry as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
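The 5-byte header discussed in this review thread follows the Confluent wire format: one magic byte (value 0) followed by the 4-byte big-endian schema-registry ID, then the Avro payload. A minimal, self-contained sketch of parsing that header — class and method names are illustrative, not part of the Hive patch:

```java
import java.nio.ByteBuffer;

// Sketch of the Confluent wire-format header: 1 magic byte (0x0) followed by
// a 4-byte big-endian schema-registry ID, then the Avro-serialized payload.
public class ConfluentHeader {
    static final int HEADER_LENGTH = 5;

    // Returns the schema ID, validating the magic byte first.
    static int schemaId(byte[] kafkaValue) {
        ByteBuffer buf = ByteBuffer.wrap(kafkaValue);
        byte magic = buf.get();
        if (magic != 0) {
            throw new IllegalArgumentException("Unexpected magic byte: " + magic);
        }
        return buf.getInt(); // ByteBuffer is big-endian by default
    }

    public static void main(String[] args) {
        // A hypothetical value carrying schema ID 42 and a 3-byte Avro payload.
        byte[] value = ByteBuffer.allocate(8)
                .put((byte) 0).putInt(42).put(new byte[]{1, 2, 3}).array();
        System.out.println(schemaId(value));              // 42
        System.out.println(value.length - HEADER_LENGTH); // 3 payload bytes remain
    }
}
```

This is why "skip bytes = 5" works as a stopgap: it discards the magic byte and the schema ID together, at the cost of losing the registry lookup the reviewer wants tracked in a follow-up JIRA.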
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398062=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398062 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 05/Mar/20 01:52 Start Date: 05/Mar/20 01:52 Worklog Time Spent: 10m Work Description: davidov541 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r388037768 ## File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java ## @@ -369,6 +402,26 @@ private SubStructObjectInspector(StructObjectInspector baseOI, int toIndex) { } } +/** + * The converter reads bytes from kafka message and skip first @skipBytes from beginning. + * + * For example: + * The Confluent Avro serializer adds 5 magic bytes that represents Schema ID as Integer to the message. + */ + static class AvroSkipBytesConverter extends AvroBytesConverter { +private final int skipBytes; + +AvroSkipBytesConverter(Schema schema, int skipBytes) { + super(schema); + this.skipBytes = skipBytes; +} + +@Override +Decoder getDecoder(byte[] value) { + return DecoderFactory.get().binaryDecoder(value, this.skipBytes, value.length - this.skipBytes, null); Review comment: BinaryDecoder already throws a nice ArrayIndexOutOfBoundsException in this case, so I'm going to update to catch that, wrap in a SerDe exception, and keep going. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398062) Time Spent: 11h 20m (was: 11h 10m) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers > Affects Versions: 3.1.1 > Reporter: Milan Baran > Assignee: David McGinnis > Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 11h 20m > Remaining Estimate: 0h > > According to [Google > groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A] > the Confluent Avro serializer uses a proprietary format for the Kafka value - > <magic byte 0x0><4 bytes of schema ID><Avro-serialized bytes that conform to the schema>. > This format does not cause any problem for the Confluent Kafka deserializer, which > respects the format; however, for the Hive Kafka handler it is a bit of a problem to > correctly deserialize the Kafka value, because Hive uses a custom deserializer from > bytes to objects and ignores the Kafka consumer ser/deser classes provided via > table property. > It would be nice to support the Confluent format with the magic byte. > Also it would be great to support Schema Registry as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
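The review above settles on catching the out-of-bounds case (a message shorter than the skip count) and rethrowing it as a SerDe-level error. A minimal, Avro-free sketch of that offset arithmetic and error wrapping — the exception class and method names here are illustrative stand-ins, not Hive's actual API:

```java
import java.util.Arrays;

public class SkipBytesDemo {
    // Illustrative stand-in for Hive's SerDeException.
    static class SerDeLikeException extends RuntimeException {
        SerDeLikeException(String msg, Throwable cause) { super(msg, cause); }
    }

    // Returns the payload after skipping skipBytes, wrapping too-short messages
    // in a SerDe-style exception as discussed in the review. This mirrors the
    // (value, offset, length) arithmetic passed to Avro's binaryDecoder.
    static byte[] payloadAfterSkip(byte[] value, int skipBytes) {
        if (value.length < skipBytes) {
            throw new SerDeLikeException(
                "Message of " + value.length + " bytes is shorter than the "
                + skipBytes + " bytes to skip", null);
        }
        return Arrays.copyOfRange(value, skipBytes, value.length);
    }

    public static void main(String[] args) {
        byte[] msg = {0, 0, 0, 0, 7, 10, 20};                // 5-byte header + 2 payload bytes
        System.out.println(payloadAfterSkip(msg, 5).length); // 2
    }
}
```

Wrapping the error at this boundary keeps the failure mode explicit: a truncated Kafka record surfaces as a deserialization error with the skip count in the message, rather than as a bare ArrayIndexOutOfBoundsException from deep inside the decoder.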
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398061=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398061 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 05/Mar/20 01:51 Start Date: 05/Mar/20 01:51 Worklog Time Spent: 10m Work Description: davidov541 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r388037563 ## File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java ## @@ -133,12 +134,40 @@ Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema is empty Can not go further"); Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty); LOG.debug("Building Avro Reader with schema {}", schemaFromProperty); - bytesConverter = new AvroBytesConverter(schema); + bytesConverter = getByteConverterForAvroDelegate(schema, tbl); } else { bytesConverter = new BytesWritableConverter(); } } + enum BytesConverterType { +CONFLUENT, +SKIP, +NONE; + +static BytesConverterType fromString(String value) { + try { +return BytesConverterType.valueOf(value.trim().toUpperCase()); + } catch (Exception e){ +return NONE; + } +} + } + + BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties tbl) { +String avroBytesConverterPropertyName = AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_TYPE.getPropName(); +String avroBytesConverterProperty = tbl.getProperty(avroBytesConverterPropertyName, + BytesConverterType.NONE.toString()); +BytesConverterType avroByteConverterType = BytesConverterType.fromString(avroBytesConverterProperty); +String avroSkipBytesPropertyName = AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES.getPropName(); +Integer avroSkipBytes = Integer.parseInt(tbl.getProperty(avroSkipBytesPropertyName)); +switch (avroByteConverterType) { + case CONFLUENT: return new AvroSkipBytesConverter(schema, 5); + case SKIP: return new 
AvroSkipBytesConverter(schema, avroSkipBytes); + default: return new AvroBytesConverter(schema); Review comment: This would be more confusing to me than the current code, personally. I will call out the NONE case, however, along with an error if it's not one of those three. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398061) Time Spent: 11h 10m (was: 11h) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers > Affects Versions: 3.1.1 > Reporter: Milan Baran > Assignee: David McGinnis > Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 11h 10m > Remaining Estimate: 0h > > According to [Google > groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A] > the Confluent Avro serializer uses a proprietary format for the Kafka value - > <magic byte 0x0><4 bytes of schema ID><Avro-serialized bytes that conform to the schema>. > This format does not cause any problem for the Confluent Kafka deserializer, which > respects the format; however, for the Hive Kafka handler it is a bit of a problem to > correctly deserialize the Kafka value, because Hive uses a custom deserializer from > bytes to objects and ignores the Kafka consumer ser/deser classes provided via > table property. > It would be nice to support the Confluent format with the magic byte. > Also it would be great to support Schema Registry as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
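One detail worth noting in the snippet under review: `Integer.parseInt(tbl.getProperty(avroSkipBytesPropertyName))` throws a `NumberFormatException` whenever the property is absent, since `getProperty` returns null — which is the common case for the CONFLUENT and NONE converter types. A defensive sketch of the same enum-plus-default parsing; the property names match the patch, but the fallback behavior (defaulting rather than failing) is an assumption:

```java
import java.util.Properties;

public class ConverterTypeDemo {
    // Mirrors the enum in the patch under review: unknown strings fall back to NONE.
    enum BytesConverterType {
        CONFLUENT, SKIP, NONE;

        static BytesConverterType fromString(String value) {
            try {
                return valueOf(value.trim().toUpperCase());
            } catch (Exception e) {
                return NONE;
            }
        }
    }

    // Parses the skip-byte count with an explicit default, so a missing
    // property does not surface as a NumberFormatException.
    static int skipBytes(Properties tbl, int defaultSkip) {
        String raw = tbl.getProperty("avro.serde.skip.bytes");
        if (raw == null) {
            return defaultSkip;
        }
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Properties tbl = new Properties();
        tbl.setProperty("avro.serde.type", "confluent");
        System.out.println(BytesConverterType.fromString(
                tbl.getProperty("avro.serde.type", "none"))); // CONFLUENT
        System.out.println(skipBytes(tbl, 5));                // property unset -> default 5
    }
}
```

Deferring the `parseInt` until the SKIP branch actually needs the value would achieve the same effect without introducing a default.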
[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler
[ https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-22954: --- Status: In Progress (was: Patch Available) > Schedule Repl Load using Hive Scheduler > --- > > Key: HIVE-22954 > URL: https://issues.apache.org/jira/browse/HIVE-22954 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, > HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, > HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, > HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.11.patch, > HIVE-22954.12.patch, HIVE-22954.13.patch, HIVE-22954.15.patch, > HIVE-22954.patch > > > [https://github.com/apache/hive/pull/932] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler
[ https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-22954: --- Attachment: HIVE-22954.15.patch Status: Patch Available (was: In Progress) > Schedule Repl Load using Hive Scheduler > --- > > Key: HIVE-22954 > URL: https://issues.apache.org/jira/browse/HIVE-22954 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, > HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, > HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, > HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.11.patch, > HIVE-22954.12.patch, HIVE-22954.13.patch, HIVE-22954.15.patch, > HIVE-22954.patch > > > [https://github.com/apache/hive/pull/932] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22954) Schedule Repl Load using Hive Scheduler
[ https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051726#comment-17051726 ] Hive QA commented on HIVE-22954: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 4s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 49s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 57s{color} | {color:blue} parser in master has 3 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 51s{color} | {color:blue} ql in master has 1531 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 43s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 28s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 11s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 41s{color} | {color:red} ql: The patch generated 2 new + 44 unchanged - 0 fixed = 46 total (was 44) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s{color} | {color:red} itests/hive-unit: The patch generated 1 new + 1327 unchanged - 1 fixed = 1328 total (was 1328) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 4m 4s{color} | {color:red} ql generated 1 new + 1531 unchanged - 0 fixed = 1532 total (was 1531) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 37s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 17s{color} | {color:red} The patch generated 2 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 35m 54s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:ql | | | Boxing/unboxing to parse a primitive org.apache.hadoop.hive.ql.parse.ReplicationSemanticAnalyzer.getCurrentLoadPath() At ReplicationSemanticAnalyzer.java:org.apache.hadoop.hive.ql.parse.ReplicationSemanticAnalyzer.getCurrentLoadPath() At ReplicationSemanticAnalyzer.java:[line 446] | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20954/dev-support/hive-personality.sh | | git revision | master / deebfb6 | | Default Java | 1.8.0_111 | | findbugs | v3.0.1 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-20954/yetus/diff-checkstyle-ql.txt | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-20954/yetus/diff-checkstyle-itests_hive-unit.txt | | findbugs | http://104.198.109.242/logs//PreCommit-HIVE-Build-20954/yetus/new-findbugs-ql.html | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20954/yetus/patch-asflicense-problems.txt | | modules | C: parser ql itests/hive-unit U: . | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20954/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Schedule
[jira] [Commented] (HIVE-22875) Refactor query creation in QueryCompactor implementations
[ https://issues.apache.org/jira/browse/HIVE-22875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051708#comment-17051708 ] Hive QA commented on HIVE-22875: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12995594/HIVE-22875.01.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 18096 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.TestWarehouseExternalDir.testManagedPaths (batchId=270) org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez.testMmMajorCompactionAfterMinor (batchId=270) org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez.testMultipleMmMinorCompactions (batchId=270) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20953/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20953/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20953/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12995594 - PreCommit-HIVE-Build > Refactor query creation in QueryCompactor implementations > - > > Key: HIVE-22875 > URL: https://issues.apache.org/jira/browse/HIVE-22875 > Project: Hive > Issue Type: Improvement >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Attachments: HIVE-22875.01.patch > > > There is a lot of repetition where creation/compaction/drop queries are > created in MajorQueryCompactor, MinorQueryCompactor, MmMajorQueryCompactor > and MmMinorQueryCompactor. 
> Initial idea is to create a CompactionQueryBuilder that all 4 implementations > would use. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22875) Refactor query creation in QueryCompactor implementations
[ https://issues.apache.org/jira/browse/HIVE-22875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051694#comment-17051694 ] Hive QA commented on HIVE-22875: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 49s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 10s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 2s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 50s{color} | {color:blue} ql in master has 1531 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 42s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 21s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 28s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s{color} | {color:red} ql: The patch generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 15s{color} | {color:red} The patch generated 2 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 31m 29s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20953/dev-support/hive-personality.sh | | git revision | master / deebfb6 | | Default Java | 1.8.0_111 | | findbugs | v3.0.1 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-20953/yetus/diff-checkstyle-ql.txt | | whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-20953/yetus/whitespace-eol.txt | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20953/yetus/patch-asflicense-problems.txt | | modules | C: ql itests/hive-unit U: . | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20953/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Refactor query creation in QueryCompactor implementations > - > > Key: HIVE-22875 > URL: https://issues.apache.org/jira/browse/HIVE-22875 > Project: Hive > Issue Type: Improvement >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Attachments: HIVE-22875.01.patch > > > There is a lot of repetition where creation/compaction/drop queries are > created in MajorQueryCompactor, MinorQueryCompactor, MmMajorQueryCompactor > and MmMinorQueryCompactor. > Initial idea is to create a CompactionQueryBuilder that all 4 implementations > would use. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397993=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397993 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 23:45 Start Date: 04/Mar/20 23:45 Worklog Time Spent: 10m Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r388001426 ## File path: kafka-handler/README.md ## @@ -50,6 +50,9 @@ ALTER TABLE SET TBLPROPERTIES ( "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe"); ``` + +If you use Confluent Avro serialzier/deserializer with Schema Registry you may want to remove 5 bytes from beginning that represents magic byte + schema ID from registry. +It can be done by setting `"avro.serde.type"="confluent"` or `"avro.serde.type"="skip"` with `"avro.serde.skip.bytes"="5"`. It's recommended to set an avro schema via `"avro.schema.url"="http://hostname/SimpleDocument.avsc"` or `"avro.schema.literal"="{"type" : "record","name" : "SimpleRecord","..."}`. If both properties are set then `avro.schema.literal` has higher priority. Review comment: It is still recommended to set the literal if using `confluent`? Link https://github.com/confluentinc/schema-registry/pull/686 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397993) Time Spent: 11h (was: 10h 50m) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 11h > Remaining Estimate: 0h > > According to [Google > groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A] > the Confluent Avro serializer uses a proprietary format for the Kafka value - > <magic byte><4 bytes of schema ID><Avro payload that conforms to the schema>. > This format does not cause any problem for the Confluent Kafka deserializer, which > respects the format; however, for the Hive Kafka handler it is a bit of a problem to > correctly deserialize the Kafka value, because Hive uses a custom deserializer from > bytes to objects and ignores the Kafka consumer ser/deser classes provided via > table property. > It would be nice to support the Confluent format with the magic byte. > Also it would be great to support Schema Registry as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
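The wire format described above (one magic byte, a 4-byte big-endian schema ID, then the Avro payload) can be sketched in a few lines. The class and method names below are hypothetical illustrations, not part of the Hive patch:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch of the 5-byte Confluent framing discussed above:
// byte 0     -> magic byte (0x0)
// bytes 1..4 -> big-endian int schema ID from the Schema Registry
// bytes 5..  -> the Avro-encoded record
public class ConfluentFraming {
  static final int HEADER_LENGTH = 5;

  // Extracts the Schema Registry ID from bytes 1..4.
  static int schemaId(byte[] kafkaValue) {
    if (kafkaValue.length < HEADER_LENGTH || kafkaValue[0] != 0) {
      throw new IllegalArgumentException("not a Confluent-framed message");
    }
    return ByteBuffer.wrap(kafkaValue, 1, 4).getInt();
  }

  // Returns the Avro payload with the 5-byte header stripped, which is
  // what a plain Avro reader expects to see.
  static byte[] avroPayload(byte[] kafkaValue) {
    return Arrays.copyOfRange(kafkaValue, HEADER_LENGTH, kafkaValue.length);
  }
}
```

This is the semantics behind `"avro.serde.type"="confluent"`; setting `"avro.serde.type"="skip"` with `"avro.serde.skip.bytes"="5"` drops the same header but discards the schema ID rather than interpreting it.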
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397991=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397991 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 23:43 Start Date: 04/Mar/20 23:43 Worklog Time Spent: 10m Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387999324 ## File path: kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.kafka; + +import com.google.common.collect.Maps; +import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient; +import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig; +import io.confluent.kafka.serializers.KafkaAvroSerializer; +import org.apache.avro.Schema; +import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable; +import org.junit.Assert; +import org.junit.BeforeClass; +import org.junit.Test; + +import java.util.Arrays; +import java.util.Map; + +/** + * Test class for Hive Kafka Avro bytes converter. + */ +public class AvroBytesConverterTest { + private static SimpleRecord simpleRecord1 = SimpleRecord.newBuilder().setId("123").setName("test").build(); + private static byte[] simpleRecord1AsBytes; + + /** + * Emulate confluent avro producer that add 4 magic bits (int) before value bytes. The int represents the schema ID from schema registry. + */ + @BeforeClass + public static void setUp() { +Map<String, Object> config = Maps.newHashMap(); +config.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081"); +KafkaAvroSerializer avroSerializer = new KafkaAvroSerializer(new MockSchemaRegistryClient()); +avroSerializer.configure(config, false); +simpleRecord1AsBytes = avroSerializer.serialize("temp", simpleRecord1); + } + + /** + * Emulate - avro.serde.type = none (Default). 
+ */ + @Test + public void convertWithAvroBytesConverter() { +Schema schema = SimpleRecord.getClassSchema(); +KafkaSerDe.AvroBytesConverter conv = new KafkaSerDe.AvroBytesConverter(schema); +AvroGenericRecordWritable simpleRecord1Writable = conv.getWritable(simpleRecord1AsBytes); + +Assert.assertNotNull(simpleRecord1Writable); +Assert.assertEquals(SimpleRecord.class, simpleRecord1Writable.getRecord().getClass()); + +SimpleRecord simpleRecord1Deserialized = (SimpleRecord) simpleRecord1Writable.getRecord(); + +Assert.assertNotNull(simpleRecord1Deserialized); +Assert.assertNotEquals(simpleRecord1, simpleRecord1Deserialized); + } + + /** + * Emulate - avro.serde.type = confluent. + */ + @Test + public void convertWithConfluentAvroBytesConverter() { +Schema schema = SimpleRecord.getClassSchema(); +KafkaSerDe.AvroSkipBytesConverter conv = new KafkaSerDe.AvroSkipBytesConverter(schema, 5); +AvroGenericRecordWritable simpleRecord1Writable = conv.getWritable(simpleRecord1AsBytes); + +Assert.assertNotNull(simpleRecord1Writable); +Assert.assertEquals(SimpleRecord.class, simpleRecord1Writable.getRecord().getClass()); + +SimpleRecord simpleRecord1Deserialized = (SimpleRecord) simpleRecord1Writable.getRecord(); + +Assert.assertNotNull(simpleRecord1Deserialized); +Assert.assertEquals(simpleRecord1, simpleRecord1Deserialized); Review comment: What does the `1` represent here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397991) Time Spent: 10h 50m (was: 10h 40m) > KafkaSerDe doesn't support topics created via Confluent Avro serializer >
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397988 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 23:43 Start Date: 04/Mar/20 23:43 Worklog Time Spent: 10m Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r388000371 ## File path: kafka-handler/pom.xml ## @@ -118,8 +118,27 @@ 1.7.30 test + + io.confluent + kafka-avro-serializer + 5.4.0 + test + + + org.apache.avro + avro Review comment: Avro itself is still needed as a compile-time & runtime dependency elsewhere. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397988) Time Spent: 10h 40m (was: 10.5h)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397983=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397983 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 23:43 Start Date: 04/Mar/20 23:43 Worklog Time Spent: 10m Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387997559 ## File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java ## @@ -133,12 +134,40 @@ Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema is empty Can not go further"); Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty); LOG.debug("Building Avro Reader with schema {}", schemaFromProperty); - bytesConverter = new AvroBytesConverter(schema); + bytesConverter = getByteConverterForAvroDelegate(schema, tbl); } else { bytesConverter = new BytesWritableConverter(); } } + enum BytesConverterType { +CONFLUENT, Review comment: Overall, I'm somewhat in agreement with @b-slim here. There is little reason to make a specific "subtype" if it is just documented that `avro.skip.bytes=5` will get the necessary Avro payload. **However**, you would not know _which_ of those 5 bytes actually represents the schema ID in order to set `schema.literal` behind the scenes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397983) Time Spent: 10h 10m (was: 10h)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397989=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397989 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 23:43 Start Date: 04/Mar/20 23:43 Worklog Time Spent: 10m Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387999601 ## File path: kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.kafka; + +import com.google.common.collect.Maps; +import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient; +import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig; +import io.confluent.kafka.serializers.KafkaAvroSerializer; +import org.apache.avro.Schema; +import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable; +import org.junit.Assert; +import org.junit.BeforeClass; +import org.junit.Test; + +import java.util.Arrays; +import java.util.Map; + +/** + * Test class for Hive Kafka Avro bytes converter. + */ +public class AvroBytesConverterTest { + private static SimpleRecord simpleRecord1 = SimpleRecord.newBuilder().setId("123").setName("test").build(); + private static byte[] simpleRecord1AsBytes; + + /** + * Emulate confluent avro producer that add 4 magic bits (int) before value bytes. The int represents the schema ID from schema registry. + */ + @BeforeClass + public static void setUp() { +Map config = Maps.newHashMap(); +config.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081;); +KafkaAvroSerializer avroSerializer = new KafkaAvroSerializer(new MockSchemaRegistryClient()); +avroSerializer.configure(config, false); +simpleRecord1AsBytes = avroSerializer.serialize("temp", simpleRecord1); + } + + /** + * Emulate - avro.serde.type = none (Default). 
+ */ + @Test + public void convertWithAvroBytesConverter() { +Schema schema = SimpleRecord.getClassSchema(); +KafkaSerDe.AvroBytesConverter conv = new KafkaSerDe.AvroBytesConverter(schema); +AvroGenericRecordWritable simpleRecord1Writable = conv.getWritable(simpleRecord1AsBytes); + +Assert.assertNotNull(simpleRecord1Writable); +Assert.assertEquals(SimpleRecord.class, simpleRecord1Writable.getRecord().getClass()); + +SimpleRecord simpleRecord1Deserialized = (SimpleRecord) simpleRecord1Writable.getRecord(); + +Assert.assertNotNull(simpleRecord1Deserialized); +Assert.assertNotEquals(simpleRecord1, simpleRecord1Deserialized); + } + + /** + * Emulate - avro.serde.type = confluent. + */ + @Test + public void convertWithConfluentAvroBytesConverter() { +Schema schema = SimpleRecord.getClassSchema(); +KafkaSerDe.AvroSkipBytesConverter conv = new KafkaSerDe.AvroSkipBytesConverter(schema, 5); +AvroGenericRecordWritable simpleRecord1Writable = conv.getWritable(simpleRecord1AsBytes); + +Assert.assertNotNull(simpleRecord1Writable); +Assert.assertEquals(SimpleRecord.class, simpleRecord1Writable.getRecord().getClass()); + +SimpleRecord simpleRecord1Deserialized = (SimpleRecord) simpleRecord1Writable.getRecord(); + +Assert.assertNotNull(simpleRecord1Deserialized); +Assert.assertEquals(simpleRecord1, simpleRecord1Deserialized); + } + + /** + * Emulate - avro.serde.type = skip. + */ + @Test + public void convertWithCustomAvroSkipBytesConverter() { +int offset = 2; +byte[] simpleRecordAsOffsetBytes = Arrays.copyOfRange(simpleRecord1AsBytes, 5 - offset, simpleRecord1AsBytes.length); + +Schema schema = SimpleRecord.getClassSchema(); +KafkaSerDe.AvroSkipBytesConverter conv = new KafkaSerDe.AvroSkipBytesConverter(schema, offset); +AvroGenericRecordWritable simpleRecord1Writable = conv.getWritable(simpleRecordAsOffsetBytes); + +Assert.assertNotNull(simpleRecord1Writable); +
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397990=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397990 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 23:43 Start Date: 04/Mar/20 23:43 Worklog Time Spent: 10m Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387998699 ## File path: kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.kafka; + +import com.google.common.collect.Maps; +import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient; +import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig; +import io.confluent.kafka.serializers.KafkaAvroSerializer; +import org.apache.avro.Schema; +import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable; +import org.junit.Assert; +import org.junit.BeforeClass; +import org.junit.Test; + +import java.util.Arrays; +import java.util.Map; + +/** + * Test class for Hive Kafka Avro bytes converter. + */ +public class AvroBytesConverterTest { + private static SimpleRecord simpleRecord1 = SimpleRecord.newBuilder().setId("123").setName("test").build(); + private static byte[] simpleRecord1AsBytes; + + /** + * Emulate confluent avro producer that add 4 magic bits (int) before value bytes. The int represents the schema ID from schema registry. Review comment: nit: `Confluent` is a noun/company This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397990)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397985 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 23:43 Start Date: 04/Mar/20 23:43 Worklog Time Spent: 10m Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387998774 ## File path: kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.kafka; + +import com.google.common.collect.Maps; +import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient; +import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig; +import io.confluent.kafka.serializers.KafkaAvroSerializer; +import org.apache.avro.Schema; +import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable; +import org.junit.Assert; +import org.junit.BeforeClass; +import org.junit.Test; + +import java.util.Arrays; +import java.util.Map; + +/** + * Test class for Hive Kafka Avro bytes converter. + */ +public class AvroBytesConverterTest { + private static SimpleRecord simpleRecord1 = SimpleRecord.newBuilder().setId("123").setName("test").build(); + private static byte[] simpleRecord1AsBytes; + + /** + * Emulate confluent avro producer that add 4 magic bits (int) before value bytes. The int represents the schema ID from schema registry. Review comment: `Schema Registry` is a noun/product This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397985) Time Spent: 10h 20m (was: 10h 10m)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397992=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397992 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 23:43 Start Date: 04/Mar/20 23:43 Worklog Time Spent: 10m Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387999851 ## File path: kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.kafka; + +import com.google.common.collect.Maps; +import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient; +import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig; +import io.confluent.kafka.serializers.KafkaAvroSerializer; +import org.apache.avro.Schema; +import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable; +import org.junit.Assert; +import org.junit.BeforeClass; +import org.junit.Test; + +import java.util.Arrays; +import java.util.Map; + +/** + * Test class for Hive Kafka Avro bytes converter. + */ +public class AvroBytesConverterTest { + private static SimpleRecord simpleRecord1 = SimpleRecord.newBuilder().setId("123").setName("test").build(); + private static byte[] simpleRecord1AsBytes; + + /** + * Emulate confluent avro producer that add 4 magic bits (int) before value bytes. The int represents the schema ID from schema registry. + */ + @BeforeClass + public static void setUp() { +Map config = Maps.newHashMap(); +config.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081;); +KafkaAvroSerializer avroSerializer = new KafkaAvroSerializer(new MockSchemaRegistryClient()); +avroSerializer.configure(config, false); +simpleRecord1AsBytes = avroSerializer.serialize("temp", simpleRecord1); + } + + /** + * Emulate - avro.serde.type = none (Default). 
+ */ + @Test + public void convertWithAvroBytesConverter() { +Schema schema = SimpleRecord.getClassSchema(); +KafkaSerDe.AvroBytesConverter conv = new KafkaSerDe.AvroBytesConverter(schema); +AvroGenericRecordWritable simpleRecord1Writable = conv.getWritable(simpleRecord1AsBytes); + +Assert.assertNotNull(simpleRecord1Writable); +Assert.assertEquals(SimpleRecord.class, simpleRecord1Writable.getRecord().getClass()); + +SimpleRecord simpleRecord1Deserialized = (SimpleRecord) simpleRecord1Writable.getRecord(); + +Assert.assertNotNull(simpleRecord1Deserialized); +Assert.assertNotEquals(simpleRecord1, simpleRecord1Deserialized); + } + + /** + * Emulate - avro.serde.type = confluent. + */ + @Test + public void convertWithConfluentAvroBytesConverter() { +Schema schema = SimpleRecord.getClassSchema(); +KafkaSerDe.AvroSkipBytesConverter conv = new KafkaSerDe.AvroSkipBytesConverter(schema, 5); +AvroGenericRecordWritable simpleRecord1Writable = conv.getWritable(simpleRecord1AsBytes); + +Assert.assertNotNull(simpleRecord1Writable); +Assert.assertEquals(SimpleRecord.class, simpleRecord1Writable.getRecord().getClass()); + +SimpleRecord simpleRecord1Deserialized = (SimpleRecord) simpleRecord1Writable.getRecord(); + +Assert.assertNotNull(simpleRecord1Deserialized); +Assert.assertEquals(simpleRecord1, simpleRecord1Deserialized); + } + + /** + * Emulate - avro.serde.type = skip. Review comment: You have three methods that roughly do the same thing. Might be a good use case for https://github.com/junit-team/junit4/wiki/Parameterized-tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id:
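The reviewer's point about the three near-identical test methods can also be illustrated without JUnit's Parameterized runner: a small table-driven loop over (serde type, configured offset, expected skip) tuples covers the same three modes. Everything below is a hypothetical sketch, not code from the patch:

```java
// Table-driven sketch of the three converter modes exercised by the tests:
// "none" skips nothing, "confluent" always skips the fixed 5-byte header,
// and "skip" honors a user-configured byte offset.
public class SkipBytesCases {
  static int skipBytesFor(String serdeType, int configuredSkip) {
    switch (serdeType) {
      case "confluent": return 5;              // magic byte + 4-byte schema ID
      case "skip":      return configuredSkip; // user-provided offset
      default:          return 0;              // "none": payload is plain Avro
    }
  }

  public static void main(String[] args) {
    // Each row: serde type, configured skip, expected bytes skipped.
    String[] types = {"none", "confluent", "skip"};
    int[] configured = {0, 0, 2};
    int[] expected = {0, 5, 2};
    for (int i = 0; i < types.length; i++) {
      if (skipBytesFor(types[i], configured[i]) != expected[i]) {
        throw new AssertionError("case: " + types[i]);
      }
    }
    System.out.println("all " + types.length + " cases pass");
  }
}
```

JUnit 4's `@RunWith(Parameterized.class)` would express the same table as a `@Parameters` method, keeping one assertion body for all three modes.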
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397984 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 23:43 Start Date: 04/Mar/20 23:43 Worklog Time Spent: 10m Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387997714 ## File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java ## @@ -133,12 +134,40 @@ Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema is empty Can not go further"); Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty); LOG.debug("Building Avro Reader with schema {}", schemaFromProperty); - bytesConverter = new AvroBytesConverter(schema); + bytesConverter = getByteConverterForAvroDelegate(schema, tbl); } else { bytesConverter = new BytesWritableConverter(); } } + enum BytesConverterType { +CONFLUENT, +SKIP, +NONE; + +static BytesConverterType fromString(String value) { + try { +return BytesConverterType.valueOf(value.trim().toUpperCase()); + } catch (Exception e){ +return NONE; + } +} + } + + BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties tbl) { +String avroBytesConverterPropertyName = AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_TYPE.getPropName(); +String avroBytesConverterProperty = tbl.getProperty(avroBytesConverterPropertyName, + BytesConverterType.NONE.toString()); +BytesConverterType avroByteConverterType = BytesConverterType.fromString(avroBytesConverterProperty); +String avroSkipBytesPropertyName = AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES.getPropName(); +Integer avroSkipBytes = Integer.parseInt(tbl.getProperty(avroSkipBytesPropertyName)); Review comment: nit `catch (NumberFormatException )` This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397984) Time Spent: 10h 20m (was: 10h 10m) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers > Affects Versions: 3.1.1 > Reporter: Milan Baran > Assignee: David McGinnis > Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 10h 20m > Remaining Estimate: 0h > > According to [Google groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A] the Confluent Avro serializer uses a proprietary format for the Kafka value - <magic byte 0x0><4 bytes of schema ID><Avro payload that conforms to the schema>. > This format does not cause any problem for the Confluent Kafka deserializer, which respects the format; however, for the Hive Kafka handler it is a bit of a problem to correctly deserialize the Kafka value, because Hive uses a custom deserializer from bytes to objects and ignores the Kafka consumer ser/deser classes provided via table property. > It would be nice to support the Confluent format with magic byte. > Also it would be great to support Schema Registry as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
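The `NumberFormatException` nit in the review above can be illustrated with a small, self-contained sketch. The helper name and default here are hypothetical (not part of the patch): the point is that `Integer.parseInt` throws on a missing or malformed table property, so the parse should be guarded with a fallback.

```java
import java.util.Properties;

public class SkipBytesProperty {
    // 1 magic byte + 4-byte schema ID in the Confluent wire format.
    static final int CONFLUENT_HEADER_BYTES = 5;

    // Parse an integer table property, falling back to a default when the
    // property is absent or not a valid integer. Note Integer.parseInt(null)
    // also throws NumberFormatException, so the missing-property case is
    // covered by the same catch.
    public static int parseSkipBytes(Properties tbl, String propName, int defaultValue) {
        try {
            return Integer.parseInt(tbl.getProperty(propName));
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        Properties tbl = new Properties();
        // Property missing: fall back to the default.
        System.out.println(parseSkipBytes(tbl, "avro.serde.skip.bytes", CONFLUENT_HEADER_BYTES));
        tbl.setProperty("avro.serde.skip.bytes", "7");
        System.out.println(parseSkipBytes(tbl, "avro.serde.skip.bytes", CONFLUENT_HEADER_BYTES));
        tbl.setProperty("avro.serde.skip.bytes", "oops");
        // Malformed value: fall back rather than fail table initialization.
        System.out.println(parseSkipBytes(tbl, "avro.serde.skip.bytes", CONFLUENT_HEADER_BYTES));
    }
}
```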
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397982=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397982 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 23:43 Start Date: 04/Mar/20 23:43 Worklog Time Spent: 10m

Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387995318

## File path: kafka-handler/README.md

## @@ -50,6 +50,9 @@
 ALTER TABLE SET TBLPROPERTIES (
 "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
 ```
+
+If you use Confluent Avro serialzier/deserializer with Schema Registry you may want to remove 5 bytes from beginning that represents magic byte + schema ID from registry.

Review comment:
```suggestion
If you use Confluent's Avro serializer/deserializer with their Schema Registry, you may want to remove the first 5 bytes from the beginning of the Kafka field, which represent a magic byte (0x0) & a numeric schema integer ID.
```
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397982) Time Spent: 10h (was: 9h 50m)
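The 5-byte prefix discussed in this review — a magic byte (0x0) followed by a 4-byte schema ID — can be sketched with a self-contained parse (the sample bytes are made up; the layout follows the Confluent wire-format documentation linked in the thread):

```java
import java.nio.ByteBuffer;

public class ConfluentHeader {
    public static void main(String[] args) {
        // Hypothetical Confluent-serialized Kafka value: 1 magic byte (0x0),
        // a 4-byte big-endian schema ID (42 here), then the Avro payload.
        byte[] value = {0x0, 0x00, 0x00, 0x00, 0x2A, 0x02};
        ByteBuffer buf = ByteBuffer.wrap(value); // big-endian by default
        byte magic = buf.get();
        int schemaId = buf.getInt();
        System.out.println("magic=" + magic + " schemaId=" + schemaId);
        // Everything after the first 5 bytes (buf.remaining() here) is plain
        // Avro binary — which is why skipping 5 bytes lets a schema-aware
        // Avro reader decode the rest.
    }
}
```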
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397987=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397987 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 23:43 Start Date: 04/Mar/20 23:43 Worklog Time Spent: 10m

Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387998553

## File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java

## @@ -369,6 +402,26 @@
 private SubStructObjectInspector(StructObjectInspector baseOI, int toIndex) {
 }
 }
+/**
+ * The converter reads bytes from kafka message and skip first @skipBytes from beginning.
+ *
+ * For example:
+ * The Confluent Avro serializer adds 5 magic bytes that represents Schema ID as Integer to the message.
+ */
+  static class AvroSkipBytesConverter extends AvroBytesConverter {
+    private final int skipBytes;
+
+    AvroSkipBytesConverter(Schema schema, int skipBytes) {
+      super(schema);
+      this.skipBytes = skipBytes;
+    }
+
+    @Override
+    Decoder getDecoder(byte[] value) {
+      return DecoderFactory.get().binaryDecoder(value, this.skipBytes, value.length - this.skipBytes, null);

Review comment: Should there be validation that `skipBytes.length < value.length`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397987)
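The boundary check the reviewer asks about could look like the following sketch. This is a hypothetical helper, not code from the patch; the real converter would hand these offsets to Avro's `DecoderFactory.get().binaryDecoder(bytes, offset, length, reuse)`, which is elided here.

```java
public class SkipBytesCheck {
    // Validate the skip count against the message before computing the
    // payload length handed to the Avro decoder; otherwise a message shorter
    // than skipBytes would yield a negative length.
    public static int payloadLength(byte[] value, int skipBytes) {
        if (value == null || skipBytes < 0 || skipBytes > value.length) {
            throw new IllegalArgumentException(
                "Cannot skip " + skipBytes + " bytes of a "
                + (value == null ? 0 : value.length) + "-byte message");
        }
        return value.length - skipBytes;
    }

    public static void main(String[] args) {
        // A 10-byte message with a 5-byte header leaves 5 payload bytes.
        System.out.println(payloadLength(new byte[10], 5));
    }
}
```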
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397981=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397981 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 23:43 Start Date: 04/Mar/20 23:43 Worklog Time Spent: 10m

Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387996074

## File path: kafka-handler/README.md

## @@ -50,6 +50,9 @@
 ALTER TABLE SET TBLPROPERTIES (
 "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
 ```
+
+If you use Confluent Avro serialzier/deserializer with Schema Registry you may want to remove 5 bytes from beginning that represents magic byte + schema ID from registry.
+It can be done by setting `"avro.serde.type"="confluent"` or `"avro.serde.type"="skip"` with `"avro.serde.skip.bytes"="5"`. It's recommended to set an avro schema via `"avro.schema.url"="http://hostname/SimpleDocument.avsc"` or `"avro.schema.literal"="{"type" : "record","name" : "SimpleRecord","..."}`. If both properties are set then `avro.schema.literal` has higher priority.

Review comment:
```suggestion
It can be done by setting `"avro.serde.type"="confluent"` or `"avro.serde.type"="skip"` with `"avro.serde.skip.bytes"="5"`. It's recommended to set an Avro schema via `"avro.schema.url"="http://hostname/SimpleDocument.avsc"` or `"avro.schema.literal"="{"type" : "record","name" : "SimpleRecord","..."}`. If both properties are set then `avro.schema.literal` has higher priority.
```
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397981) Time Spent: 10h (was: 9h 50m)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397986=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397986 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 23:43 Start Date: 04/Mar/20 23:43 Worklog Time Spent: 10m

Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387998210

## File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java

## @@ -133,12 +134,40 @@
 Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema is empty Can not go further");
 Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty);
 LOG.debug("Building Avro Reader with schema {}", schemaFromProperty);
-      bytesConverter = new AvroBytesConverter(schema);
+      bytesConverter = getByteConverterForAvroDelegate(schema, tbl);
     } else {
       bytesConverter = new BytesWritableConverter();
     }
   }

+  enum BytesConverterType {
+    CONFLUENT,
+    SKIP,
+    NONE;
+
+    static BytesConverterType fromString(String value) {
+      try {
+        return BytesConverterType.valueOf(value.trim().toUpperCase());
+      } catch (Exception e) {
+        return NONE;
+      }
+    }
+  }
+
+  BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties tbl) {
+    String avroBytesConverterPropertyName = AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_TYPE.getPropName();
+    String avroBytesConverterProperty = tbl.getProperty(avroBytesConverterPropertyName,
+        BytesConverterType.NONE.toString());
+    BytesConverterType avroByteConverterType = BytesConverterType.fromString(avroBytesConverterProperty);
+    String avroSkipBytesPropertyName = AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES.getPropName();
+    Integer avroSkipBytes = Integer.parseInt(tbl.getProperty(avroSkipBytesPropertyName));
+    switch (avroByteConverterType) {
+      case CONFLUENT: return new AvroSkipBytesConverter(schema, 5);
+      case SKIP: return new AvroSkipBytesConverter(schema, avroSkipBytes);
+      default: return new AvroBytesConverter(schema);

Review comment: Could fall-through on the cases.
```
int skipBytes = avroSkipBytes;
switch (avroByteConverterType) {
  case CONFLUENT: skipBytes = 5;
  case SKIP: return new AvroSkipBytesConverter(schema, avroSkipBytes);
  default: return new AvroBytesConverter(schema);
```
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397986) Time Spent: 10.5h (was: 10h 20m)
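The fall-through idea in this review can be sketched as a self-contained example. The string return values are hypothetical stand-ins for the real converter classes; note that for the `CONFLUENT` override to take effect, the shared `SKIP` branch has to read the reassigned local.

```java
public class ConverterSwitch {
    enum BytesConverterType { CONFLUENT, SKIP, NONE }

    // Stand-in for getByteConverterForAvroDelegate: CONFLUENT forces a
    // 5-byte skip (magic byte + 4-byte schema ID) and falls through to the
    // SKIP branch, which builds the skipping converter.
    public static String pickConverter(BytesConverterType type, int configuredSkipBytes) {
        int skipBytes = configuredSkipBytes;
        switch (type) {
            case CONFLUENT:
                skipBytes = 5;
                // falls through
            case SKIP:
                return "AvroSkipBytesConverter(skip=" + skipBytes + ")";
            default:
                return "AvroBytesConverter";
        }
    }

    public static void main(String[] args) {
        System.out.println(pickConverter(BytesConverterType.CONFLUENT, 0));
        System.out.println(pickConverter(BytesConverterType.SKIP, 7));
        System.out.println(pickConverter(BytesConverterType.NONE, 0));
    }
}
```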
[jira] [Updated] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PRAVIN KUMAR SINHA updated HIVE-22865: -- Attachment: HIVE-22865.9.patch > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch, > HIVE-22865.3.patch, HIVE-22865.4.patch, HIVE-22865.5.patch, > HIVE-22865.6.patch, HIVE-22865.7.patch, HIVE-22865.8.patch, HIVE-22865.9.patch > > Time Spent: 4h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-22126) hive-exec packaging should shade guava
[ https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051680#comment-17051680 ] Eugene Chung edited comment on HIVE-22126 at 3/4/20, 11:26 PM: --- [^HIVE-22126.07.patch] I missed some tests which also require dependency of calcite-core dependent modules. was (Author: euigeun_chung): [^HIVE-22126.07.patch] I missed some tests which requires dependency of calcite-core dependent modules. > hive-exec packaging should shade guava > -- > > Key: HIVE-22126 > URL: https://issues.apache.org/jira/browse/HIVE-22126 > Project: Hive > Issue Type: Bug > Reporter: Vihang Karajgaonkar > Assignee: Eugene Chung > Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, HIVE-22126.03.patch, HIVE-22126.04.patch, HIVE-22126.05.patch, HIVE-22126.06.patch, HIVE-22126.07.patch > > > The ql/pom.xml includes the complete guava library in hive-exec.jar https://github.com/apache/hive/blob/master/ql/pom.xml#L990 This causes problems for downstream clients of Hive which have hive-exec.jar in their classpath, since they are pinned to the same guava version as that of Hive. > We should shade guava classes so that other components which depend on hive-exec can independently use a different version of guava as needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397965=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397965 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 23:26 Start Date: 04/Mar/20 23:26 Worklog Time Spent: 10m

Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387995005

## File path: kafka-handler/README.md

## @@ -50,6 +50,9 @@
 ALTER TABLE SET TBLPROPERTIES (
 "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
 ```
+
+If you use Confluent Avro serialzier/deserializer with Schema Registry you may want to remove 5 bytes from beginning that represents magic byte + schema ID from registry.

Review comment: Links for Context
- https://docs.confluent.io/current/schema-registry/serializer-formatter.html#wire-format

However, there are *other* Registries, so I would like to make a point to make this as extensible as possible.
- https://github.com/hortonworks/registry
- https://github.com/Apicurio/apicurio-registry/

[Confluent's branding says `Confluent Schema Registry`](https://docs.confluent.io/current/schema-registry/index.html), so I think it's self-explanatory enough

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397965) Time Spent: 9h 40m (was: 9.5h)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397966=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397966 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 23:26 Start Date: 04/Mar/20 23:26 Worklog Time Spent: 10m

Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387995005

## File path: kafka-handler/README.md

## @@ -50,6 +50,9 @@
 ALTER TABLE SET TBLPROPERTIES (
 "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
 ```
+
+If you use Confluent Avro serialzier/deserializer with Schema Registry you may want to remove 5 bytes from beginning that represents magic byte + schema ID from registry.

Review comment: Links for Context
- https://docs.confluent.io/current/schema-registry/serializer-formatter.html#wire-format

However, there are *other* Registries, so I would like to make a point to have this be as extensible as possible.
- https://github.com/hortonworks/registry
- https://github.com/Apicurio/apicurio-registry/

[Confluent's branding says `Confluent Schema Registry`](https://docs.confluent.io/current/schema-registry/index.html), so I think it's self-explanatory enough

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397966) Time Spent: 9h 50m (was: 9h 40m)
[jira] [Issue Comment Deleted] (HIVE-22126) hive-exec packaging should shade guava
[ https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Chung updated HIVE-22126: Comment: was deleted (was: [~dlavati] Shading guava for Hive also requires shading calcite modules. And it leads to changing the FQCN of calcite-avatica JDBC driver. e.g. * org.apache.calcite.jdbc.Driver -> org.apache.hive.org.apache.calcite.jdbc.Driver I stopped there cause I was not sure it's okay to change it. If changing the name of driver is just internal or test concern, I think it's okay. I have some free time these days, so I am going to investigate this again.)
[jira] [Updated] (HIVE-22126) hive-exec packaging should shade guava
[ https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Chung updated HIVE-22126: Attachment: HIVE-22126.07.patch Status: Patch Available (was: Open) [^HIVE-22126.07.patch] I missed some tests which requires dependency of calcite-core dependent modules.
[jira] [Updated] (HIVE-22126) hive-exec packaging should shade guava
[ https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Chung updated HIVE-22126: Status: Open (was: Patch Available)
[jira] [Commented] (HIVE-22975) Optimise TopNKeyFilter with boundary checks
[ https://issues.apache.org/jira/browse/HIVE-22975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051677#comment-17051677 ] Hive QA commented on HIVE-22975: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12995588/HIVE-22975.1.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:green}SUCCESS:{color} +1 due to 18097 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20952/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20952/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20952/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12995588 - PreCommit-HIVE-Build > Optimise TopNKeyFilter with boundary checks > --- > > Key: HIVE-22975 > URL: https://issues.apache.org/jira/browse/HIVE-22975 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-22975.1.patch, Screenshot 2020-03-04 at 3.26.45 > PM.jpg > > > !Screenshot 2020-03-04 at 3.26.45 PM.jpg|width=507,height=322! > > It would be good to add boundary checks to reduce cycles spent on topN > filter. E.g Q43 spends good amount of time in topN. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397939=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397939 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 22:47 Start Date: 04/Mar/20 22:47 Worklog Time Spent: 10m Work Description: davidov541 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387980822 ## File path: kafka-handler/README.md ## @@ -50,6 +50,9 @@ ALTER TABLE SET TBLPROPERTIES ( "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe"); ``` + +If you use Confluent Avro serialzier/deserializer with Schema Registry you may want to remove 5 bytes from beginning that represents magic byte + schema ID from registry. Review comment: @b-slim You appear to be correct, based on the source code: https://github.com/confluentinc/schema-registry/blob/master/avro-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaAvroSerializer.java. Let's say we did implement as is, and later we implement the schema registry lookup and use the same identifier? Who would that break? Serialized messages that point to a bogus schema registry instance, or serialized messages that happened to need 5 bytes at the front of the message, but aren't from confluent, and some clever dev figured out he could use Confluent instead of the right way? The second case doesn't matter to me tbh. The first case is concerning and should be handled. I would expect that we would catch when we can't find a schema and print out a warning, but no error. That would allow this case to continue working. But we would be making assumptions on the implementation of a feature in the future, which is always a crapshoot... To be clear, if we make sure documentation is clear on this outside of just these parameters, and @cricket007 agrees with it as a heavy Confluent user, then I'm fine with it. 
It feels like we've covered this problem enough to be in a good spot either way. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397939) Time Spent: 9.5h (was: 9h 20m)
[jira] [Updated] (HIVE-22962) Reuse HiveRelFieldTrimmer instance across queries
[ https://issues.apache.org/jira/browse/HIVE-22962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-22962: --- Attachment: HIVE-22962.04.patch > Reuse HiveRelFieldTrimmer instance across queries > - > > Key: HIVE-22962 > URL: https://issues.apache.org/jira/browse/HIVE-22962 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Attachments: HIVE-22962.01.patch, HIVE-22962.02.patch, > HIVE-22962.03.patch, HIVE-22962.04.patch, HIVE-22962.patch > > > Currently we create multiple {{HiveRelFieldTrimmer}} instances per query. > {{HiveRelFieldTrimmer}} uses a method dispatcher that has a built-in caching > mechanism: given a certain object, it stores the method that was called for > the object class. However, by instantiating the trimmer multiple times per > query and across queries, we create a new dispatcher with each instantiation, > thus effectively removing the caching mechanism that is built within the > dispatcher. > This issue is to reutilize the same {{HiveRelFieldTrimmer}} instance within a > single query and across queries. -- This message was sent by Atlassian Jira (v8.3.4#803005)
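The caching argument in HIVE-22962 above can be sketched generically. This is an illustrative dispatcher with a per-instance lookup cache (hypothetical names, not Calcite or Hive code): creating a new instance per query discards the cache, while reusing one instance keeps it warm across calls.

```java
import java.util.HashMap;
import java.util.Map;

// Toy dispatcher: resolves a handler per runtime class and caches the result,
// mimicking how a reflective method dispatcher caches per-class lookups.
public class CachingDispatcher {
  private final Map<Class<?>, String> cache = new HashMap<>();
  private int misses = 0;

  public String dispatch(Object o) {
    return cache.computeIfAbsent(o.getClass(), c -> {
      misses++; // the expensive resolution happens only on a cache miss
      return "handler-for-" + c.getSimpleName();
    });
  }

  public int cacheMisses() {
    return misses;
  }
}
```

With one shared instance, repeated dispatch over the same classes costs one miss each; instantiating a fresh dispatcher per query (as with a new `HiveRelFieldTrimmer` per query) pays the resolution cost every time.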
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397930=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397930 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 22:34 Start Date: 04/Mar/20 22:34 Worklog Time Spent: 10m Work Description: b-slim commented on issue #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#issuecomment-594901087 @davidov541 thanks for the PR i left one important comment that need some work thanks for the contribution. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397930) Time Spent: 9h 20m (was: 9h 10m) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 9h 20m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397920=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397920 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 22:28 Start Date: 04/Mar/20 22:28 Worklog Time Spent: 10m Work Description: b-slim commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387973124 ## File path: kafka-handler/README.md ## @@ -50,6 +50,9 @@ ALTER TABLE SET TBLPROPERTIES ( "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe"); ``` + +If you use Confluent Avro serialzier/deserializer with Schema Registry you may want to remove 5 bytes from beginning that represents magic byte + schema ID from registry. Review comment: @davidov541 please help me understand this, the first 5 bytes on the record am guessing it is a schema id that can be used to fetch schema from the registry ? **If that is correct then I think we should not call this confluent and let me explain why.** Imagine in the near/far future someone implement a full fledge Avro reader that uses those five bytes to figure out schema from the schema registry what shall we call this now ? Confluent V2 ? that is why I think it is confusing. Thus in my opinion it is better to call it skip, and document how this can be used to read data from Confluent Avro SerDe. Please let me know if this makes sense to you. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397920) Time Spent: 9h 10m (was: 9h) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 9h 10m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22975) Optimise TopNKeyFilter with boundary checks
[ https://issues.apache.org/jira/browse/HIVE-22975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051662#comment-17051662 ] Hive QA commented on HIVE-22975: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 21s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 49s{color} | {color:blue} ql in master has 1531 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 14s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 24m 45s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20952/dev-support/hive-personality.sh | | git revision | master / deebfb6 | | Default Java | 1.8.0_111 | | findbugs | v3.0.1 | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20952/yetus/patch-asflicense-problems.txt | | modules | C: ql U: ql | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20952/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Optimise TopNKeyFilter with boundary checks > --- > > Key: HIVE-22975 > URL: https://issues.apache.org/jira/browse/HIVE-22975 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-22975.1.patch, Screenshot 2020-03-04 at 3.26.45 > PM.jpg > > > !Screenshot 2020-03-04 at 3.26.45 PM.jpg|width=507,height=322! > > It would be good to add boundary checks to reduce cycles spent on topN > filter. E.g Q43 spends good amount of time in topN. -- This message was sent by Atlassian Jira (v8.3.4#803005)
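The HIVE-22975 boundary-check idea above can be sketched in isolation. This is a hypothetical top-N filter, not the actual Hive `TopNKeyFilter`: it caches the current worst retained key so that most incoming keys are rejected with a single comparison instead of a heap probe.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Keeps the N smallest keys seen; 'boundary' caches the largest retained key
// so keys that cannot qualify are rejected with one comparison.
public class TopNFilter {
  private final int n;
  // max-heap over the N smallest keys, so peek() is the current worst
  private final PriorityQueue<Integer> heap =
      new PriorityQueue<>(Comparator.reverseOrder());
  private int boundary = Integer.MAX_VALUE;

  public TopNFilter(int n) {
    this.n = n;
  }

  // Returns true if the key belongs to the current top N.
  public boolean canForward(int key) {
    if (heap.size() == n && key >= boundary) {
      return false; // fast path: cannot beat the cached boundary
    }
    heap.add(key);
    if (heap.size() > n) {
      heap.poll(); // evict the current worst key
    }
    if (heap.size() == n) {
      boundary = heap.peek(); // refresh the cached boundary
    }
    return true;
  }
}
```

The boundary comparison is the cheap check the issue proposes; the heap is only touched for keys that could actually enter the top N.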
[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex
[ https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051650#comment-17051650 ] Hive QA commented on HIVE-22966: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12995412/HIVE-22966.4.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:green}SUCCESS:{color} +1 due to 18096 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20951/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20951/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20951/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12995412 - PreCommit-HIVE-Build > LLAP: Consider including waitTime for comparing attempts in same vertex > --- > > Key: HIVE-22966 > URL: https://issues.apache.org/jira/browse/HIVE-22966 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-22966.3.patch, HIVE-22966.4.patch > > > When attempts are compared within same vertex, it should pick up the attempt > with longest wait time to avoid starvation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
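The HIVE-22966 scheduling idea above can be sketched with a comparator. The fields and names here are hypothetical, not the actual LLAP comparator: among attempts of the same vertex, the attempt that has waited longest (earliest enqueue time) orders first, which avoids starvation.

```java
import java.util.Comparator;

// Toy task attempt: within the same vertex, longer wait time wins,
// so long-waiting attempts are not starved by newer ones.
public class AttemptOrder {
  public static class Attempt {
    final int vertexId;
    final long enqueueTimeMs;

    Attempt(int vertexId, long enqueueTimeMs) {
      this.vertexId = vertexId;
      this.enqueueTimeMs = enqueueTimeMs;
    }
  }

  // Earlier enqueue time == longer wait == scheduled first.
  public static Comparator<Attempt> sameVertexComparator() {
    return Comparator.comparingLong((Attempt a) -> a.enqueueTimeMs);
  }
}
```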
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397906=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397906 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 21:39 Start Date: 04/Mar/20 21:39 Worklog Time Spent: 10m Work Description: davidov541 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387949896 ## File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java ## @@ -133,12 +134,40 @@ Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema is empty Can not go further"); Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty); LOG.debug("Building Avro Reader with schema {}", schemaFromProperty); - bytesConverter = new AvroBytesConverter(schema); + bytesConverter = getByteConverterForAvroDelegate(schema, tbl); } else { bytesConverter = new BytesWritableConverter(); } } + enum BytesConverterType { +CONFLUENT, Review comment: That makes sense, @cricket007. So far Confluent has done a good job preventing backward compatibility issues, so I don't see a reason to assume that they will do so in the future at the cost of a worse developer experience. I'd be fine leaving it as is. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397906) Time Spent: 9h (was: 8h 50m) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 9h > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
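The `BytesConverterType.fromString` pattern discussed in the review above — case-insensitive parsing with a safe fallback — can be sketched as a standalone enum (mirroring the patch's shape, not copied from it):

```java
// Tolerant enum lookup: trims and upper-cases the input, and falls back to
// NONE instead of propagating IllegalArgumentException for unknown values.
public enum BytesConverterType {
  CONFLUENT,
  SKIP,
  NONE;

  public static BytesConverterType fromString(String value) {
    if (value == null) {
      return NONE;
    }
    try {
      return valueOf(value.trim().toUpperCase());
    } catch (IllegalArgumentException e) {
      return NONE;
    }
  }
}
```

Defaulting to `NONE` keeps existing tables working when the table property is absent or misspelled, which is the backward-compatibility concern raised in the thread.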
[jira] [Updated] (HIVE-22974) Metastore's table location check should be optional
[ https://issues.apache.org/jira/browse/HIVE-22974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-22974: - Attachment: HIVE-22974.1.patch > Metastore's table location check should be optional > --- > > Key: HIVE-22974 > URL: https://issues.apache.org/jira/browse/HIVE-22974 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22974.1.patch > > > In HIVE-22189 a check was introduced to make sure managed and external tables > are located at the proper space. This condition cannot be satisfied during an > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22974) Metastore's table location check should be optional
[ https://issues.apache.org/jira/browse/HIVE-22974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-22974: - Status: Open (was: Patch Available) > Metastore's table location check should be optional > --- > > Key: HIVE-22974 > URL: https://issues.apache.org/jira/browse/HIVE-22974 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22974.1.patch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22974) Metastore's table location check should be optional
[ https://issues.apache.org/jira/browse/HIVE-22974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-22974: - Attachment: (was: HIVE-22974.1.patch) > Metastore's table location check should be optional > --- > > Key: HIVE-22974 > URL: https://issues.apache.org/jira/browse/HIVE-22974 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22974.1.patch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22974) Metastore's table location check should be optional
[ https://issues.apache.org/jira/browse/HIVE-22974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-22974: - Status: Patch Available (was: Open) > Metastore's table location check should be optional > --- > > Key: HIVE-22974 > URL: https://issues.apache.org/jira/browse/HIVE-22974 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22974.1.patch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler
[ https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-22954: --- Status: In Progress (was: Patch Available) > Schedule Repl Load using Hive Scheduler > --- > > Key: HIVE-22954 > URL: https://issues.apache.org/jira/browse/HIVE-22954 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, > HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, > HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, > HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.11.patch, > HIVE-22954.12.patch, HIVE-22954.13.patch, HIVE-22954.patch > > > [https://github.com/apache/hive/pull/932] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler
[ https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-22954: --- Attachment: HIVE-22954.13.patch Status: Patch Available (was: In Progress) > Schedule Repl Load using Hive Scheduler > --- > > Key: HIVE-22954 > URL: https://issues.apache.org/jira/browse/HIVE-22954 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, > HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, > HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, > HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.11.patch, > HIVE-22954.12.patch, HIVE-22954.13.patch, HIVE-22954.patch > > > [https://github.com/apache/hive/pull/932] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397883=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397883 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 21:01 Start Date: 04/Mar/20 21:01 Worklog Time Spent: 10m Work Description: davidov541 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387931635 ## File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java ## @@ -133,12 +134,40 @@ Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema is empty Can not go further"); Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty); LOG.debug("Building Avro Reader with schema {}", schemaFromProperty); - bytesConverter = new AvroBytesConverter(schema); + bytesConverter = getByteConverterForAvroDelegate(schema, tbl); } else { bytesConverter = new BytesWritableConverter(); } } + enum BytesConverterType { +CONFLUENT, +SKIP, +NONE; + +static BytesConverterType fromString(String value) { + try { +return BytesConverterType.valueOf(value.trim().toUpperCase()); + } catch (Exception e){ +return NONE; + } +} + } + + BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties tbl) { +String avroBytesConverterProperty = tbl.getProperty(AvroSerdeUtils + .AvroTableProperties.AVRO_SERDE_TYPE +.getPropName(), BytesConverterType.NONE.toString()); +BytesConverterType avroByteConverterType = BytesConverterType.fromString(avroBytesConverterProperty); +Integer avroSkipBytes = Integer.getInteger(tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES Review comment: OK, made the fix. Please check it to make sure I got it right this time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397883) Time Spent: 8h 50m (was: 8h 40m) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 8h 50m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
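The review exchange above touches a subtle API pitfall worth spelling out: `Integer.getInteger(String)` looks up a *system property* whose name is the given string — it does not parse the string — so reading a numeric table property needs `Integer.parseInt` (or `Integer.valueOf`). A minimal illustration with hypothetical helper names:

```java
// Integer.getInteger treats its argument as a system-property NAME;
// Integer.parseInt actually parses the decimal digits.
public class ParseVsGetInteger {
  public static Integer viaGetInteger(String s) {
    return Integer.getInteger(s); // null unless a system property named s exists
  }

  public static int viaParseInt(String s) {
    return Integer.parseInt(s);   // parses the string as a decimal integer
  }
}
```

Passing a property *value* such as "5" to `Integer.getInteger` silently yields `null`, which is presumably the fix the reviewer asked to double-check.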
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397882=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397882 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 21:00 Start Date: 04/Mar/20 21:00 Worklog Time Spent: 10m Work Description: davidov541 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387931454 ## File path: kafka-handler/pom.xml ## @@ -190,5 +207,27 @@ + + + + org.apache.avro + avro-maven-plugin + 1.8.1 Review comment: Used the global avro.version variable. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397882) Time Spent: 8h 40m (was: 8.5h) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 8h 40m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397881=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397881 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 21:00 Start Date: 04/Mar/20 21:00 Worklog Time Spent: 10m Work Description: davidov541 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387931357 ## File path: kafka-handler/pom.xml ## @@ -118,8 +118,21 @@ 1.7.30 test + + io.confluent + kafka-avro-serializer + 5.4.0 + test Review comment: Added exclusion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397881) Time Spent: 8.5h (was: 8h 20m) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 8.5h > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397880=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397880 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 21:00 Start Date: 04/Mar/20 21:00 Worklog Time Spent: 10m Work Description: davidov541 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387931187 ## File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java ## @@ -369,6 +402,26 @@ private SubStructObjectInspector(StructObjectInspector baseOI, int toIndex) { } } +/** + * The converter reads bytes from kafka message and skip first @skipBytes from beginning. + * + * For example: + * Confluent kafka producer add 5 magic bytes that represents Schema ID as Integer to the message. Review comment: Fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397880) Time Spent: 8h 20m (was: 8h 10m) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 8h 20m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
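The skip-bytes converter whose javadoc is being reviewed above can be sketched independently of Hive (a hypothetical class, not the patch itself). Note the quoted javadoc's "5 magic bytes" is really one magic byte plus a 4-byte schema ID; with `skipBytes = 5` this converter drops exactly that Confluent prefix before the Avro reader sees the record.

```java
import java.util.Arrays;

// Drops the first skipBytes of each record; with skipBytes = 5 this strips
// Confluent's 1-byte magic marker and 4-byte schema ID from the value.
public class SkipBytesConverter {
  private final int skipBytes;

  public SkipBytesConverter(int skipBytes) {
    this.skipBytes = skipBytes;
  }

  public byte[] convert(byte[] record) {
    if (record.length < skipBytes) {
      throw new IllegalArgumentException(
          "Record shorter than " + skipBytes + " bytes");
    }
    return Arrays.copyOfRange(record, skipBytes, record.length);
  }
}
```

Making the skip count configurable, rather than hard-coding 5, is what lets the same converter serve non-Confluent producers that prepend a different-sized header — the generality argued for earlier in this thread.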
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397879=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397879 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 20:59 Start Date: 04/Mar/20 20:59 Worklog Time Spent: 10m Work Description: davidov541 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387931106 ## File path: kafka-handler/README.md ## @@ -50,6 +50,9 @@ ALTER TABLE SET TBLPROPERTIES ( "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe"); ``` + +If you use Confluent Avro serialzier/deserializer with schema registry you may want to remove 5 bytes from beginning that represents magic byte + schema ID from registry. Review comment: Fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397879) Time Spent: 8h 10m (was: 8h) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 8h 10m > Remaining Estimate: 0h > > According to [Google > groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A] > the Confluent avro serialzier uses propertiary format for kafka value - > <4 bytes of schema ID> conforms to schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, which > respects the format; however, for the Hive Kafka handler it is a bit of a problem to > correctly deserialize the Kafka value, because Hive uses a custom deserializer from > bytes to objects and ignores the Kafka consumer ser/deser classes provided via > table property. > It would be nice to support the Confluent format with the magic byte. > Also it would be great to support the Schema Registry as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
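The wire format the issue describes (one magic byte, a 4-byte schema-registry ID, then the Avro-encoded value) can be illustrated with a small standalone sketch. The class and helper below are hypothetical, not part of the patch under review:

```java
import java.nio.ByteBuffer;

public class ConfluentHeaderDemo {
    // Confluent wire format: 1 magic byte (0x00), then a 4-byte big-endian
    // schema-registry ID, then the Avro-encoded value (5 header bytes total).
    public static byte[] stripHeader(byte[] kafkaValue) {
        ByteBuffer buf = ByteBuffer.wrap(kafkaValue);
        if (buf.get() != 0) {
            throw new IllegalArgumentException("not a Confluent-framed message");
        }
        int schemaId = buf.getInt();  // would normally be used to fetch the writer schema
        byte[] avroPayload = new byte[buf.remaining()];
        buf.get(avroPayload);
        return avroPayload;
    }

    public static void main(String[] args) {
        // 0x00 magic byte, schema ID 42, then three payload bytes
        byte[] framed = {0, 0, 0, 0, 42, 1, 2, 3};
        System.out.println(stripHeader(framed).length); // prints 3
    }
}
```

This is what "remove 5 bytes from the beginning" in the README note above amounts to; a real deserializer would also use the schema ID to look up the writer schema in the registry.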
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397878&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397878 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 20:59 Start Date: 04/Mar/20 20:59 Worklog Time Spent: 10m Work Description: davidov541 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387931054 ## File path: kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java ## @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.kafka; + +import com.google.common.collect.Maps; +import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient; +import io.confluent.kafka.serializers.KafkaAvroSerializer; +import org.apache.avro.Schema; +import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable; +import org.junit.Assert; +import org.junit.BeforeClass; +import org.junit.Test; + +import java.util.Arrays; +import java.util.Map; + +/** + * Test class for Hive Kafka Avro bytes converter. 
+ */ +public class AvroBytesConverterTest { + private static SimpleRecord simpleRecord1 = SimpleRecord.newBuilder().setId("123").setName("test").build(); + private static byte[] simpleRecord1AsBytes; + + /** + * Emulate the Confluent Avro producer, which adds 4 magic bytes (an int) before the value bytes. The int represents the schema ID from the schema registry. + */ + @BeforeClass + public static void setUp() { +Map config = Maps.newHashMap(); +config.put("schema.registry.url", "http://localhost"); Review comment: Fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397878) Time Spent: 8h (was: 7h 50m) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 8h > Remaining Estimate: 0h > > According to [Google > groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A] > the Confluent Avro serializer uses a proprietary format for the Kafka value - > <magic byte> <4 bytes of schema ID> <Avro blob that conforms to the schema>. > This format does not cause any problem for the Confluent Kafka deserializer, which > respects the format; however, for the Hive Kafka handler it is a bit of a problem to > correctly deserialize the Kafka value, because Hive uses a custom deserializer from > bytes to objects and ignores the Kafka consumer ser/deser classes provided via > table property. > It would be nice to support the Confluent format with the magic byte. 
> Also it would be great to support the Schema Registry as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler
[ https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-22954: --- Attachment: HIVE-22954.12.patch Status: Patch Available (was: In Progress) > Schedule Repl Load using Hive Scheduler > --- > > Key: HIVE-22954 > URL: https://issues.apache.org/jira/browse/HIVE-22954 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, > HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, > HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, > HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.11.patch, > HIVE-22954.12.patch, HIVE-22954.patch > > > [https://github.com/apache/hive/pull/932] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler
[ https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-22954: --- Status: In Progress (was: Patch Available) > Schedule Repl Load using Hive Scheduler > --- > > Key: HIVE-22954 > URL: https://issues.apache.org/jira/browse/HIVE-22954 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, > HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, > HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, > HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.11.patch, > HIVE-22954.patch > > > [https://github.com/apache/hive/pull/932] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler
[ https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-22954: --- Status: In Progress (was: Patch Available) > Schedule Repl Load using Hive Scheduler > --- > > Key: HIVE-22954 > URL: https://issues.apache.org/jira/browse/HIVE-22954 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, > HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, > HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, > HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.11.patch, > HIVE-22954.patch > > > [https://github.com/apache/hive/pull/932] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler
[ https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-22954: --- Attachment: HIVE-22954.11.patch Status: Patch Available (was: In Progress) > Schedule Repl Load using Hive Scheduler > --- > > Key: HIVE-22954 > URL: https://issues.apache.org/jira/browse/HIVE-22954 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, > HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, > HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, > HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.11.patch, > HIVE-22954.patch > > > [https://github.com/apache/hive/pull/932] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex
[ https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051598#comment-17051598 ] Hive QA commented on HIVE-22966: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 41s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 45s{color} | {color:blue} llap-server in master has 90 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 15s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 14m 14s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20951/dev-support/hive-personality.sh | | git revision | master / deebfb6 | | Default Java | 1.8.0_111 | | findbugs | v3.0.1 | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20951/yetus/patch-asflicense-problems.txt | | modules | C: llap-server U: llap-server | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20951/yetus.txt | | Powered by | Apache Yetus http://yetus.apache.org | This message was automatically generated. > LLAP: Consider including waitTime for comparing attempts in same vertex > --- > > Key: HIVE-22966 > URL: https://issues.apache.org/jira/browse/HIVE-22966 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-22966.3.patch, HIVE-22966.4.patch > > > When attempts are compared within the same vertex, it should pick up the attempt > with the longest wait time to avoid starvation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22785) Update/delete/merge statements not optimized through CBO
[ https://issues.apache.org/jira/browse/HIVE-22785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa updated HIVE-22785: -- Status: Patch Available (was: Open) > Update/delete/merge statements not optimized through CBO > > > Key: HIVE-22785 > URL: https://issues.apache.org/jira/browse/HIVE-22785 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Krisztian Kasa >Priority: Critical > Attachments: HIVE-22785.1.patch, HIVE-22785.2.patch, > HIVE-22785.2.patch, HIVE-22785.3.patch > > > Currently, CBO is bypassed for update/delete/merge statements. > To support optimizing these statements through CBO, we need to complete three > main tasks: 1) support for sort in Calcite planner, 2) support for SORT in > AST converter, and 3) {{RewriteSemanticAnalyzer}} should extend > {{CalcitePlanner}} instead of {{SemanticAnalyzer}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22785) Update/delete/merge statements not optimized through CBO
[ https://issues.apache.org/jira/browse/HIVE-22785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa updated HIVE-22785: -- Status: Open (was: Patch Available) > Update/delete/merge statements not optimized through CBO > > > Key: HIVE-22785 > URL: https://issues.apache.org/jira/browse/HIVE-22785 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Krisztian Kasa >Priority: Critical > Attachments: HIVE-22785.1.patch, HIVE-22785.2.patch, > HIVE-22785.2.patch, HIVE-22785.3.patch > > > Currently, CBO is bypassed for update/delete/merge statements. > To support optimizing these statements through CBO, we need to complete three > main tasks: 1) support for sort in Calcite planner, 2) support for SORT in > AST converter, and 3) {{RewriteSemanticAnalyzer}} should extend > {{CalcitePlanner}} instead of {{SemanticAnalyzer}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22785) Update/delete/merge statements not optimized through CBO
[ https://issues.apache.org/jira/browse/HIVE-22785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa updated HIVE-22785: -- Attachment: HIVE-22785.3.patch > Update/delete/merge statements not optimized through CBO > > > Key: HIVE-22785 > URL: https://issues.apache.org/jira/browse/HIVE-22785 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Krisztian Kasa >Priority: Critical > Attachments: HIVE-22785.1.patch, HIVE-22785.2.patch, > HIVE-22785.2.patch, HIVE-22785.3.patch > > > Currently, CBO is bypassed for update/delete/merge statements. > To support optimizing these statements through CBO, we need to complete three > main tasks: 1) support for sort in Calcite planner, 2) support for SORT in > AST converter, and 3) {{RewriteSemanticAnalyzer}} should extend > {{CalcitePlanner}} instead of {{SemanticAnalyzer}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22974) Metastore's table location check should be optional
[ https://issues.apache.org/jira/browse/HIVE-22974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051583#comment-17051583 ] Hive QA commented on HIVE-22974: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12995577/HIVE-22974.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 18096 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.TestTxnExIm.testMM (batchId=342) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20950/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20950/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20950/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12995577 - PreCommit-HIVE-Build > Metastore's table location check should be optional > --- > > Key: HIVE-22974 > URL: https://issues.apache.org/jira/browse/HIVE-22974 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22974.1.patch > > > In HIVE-22189 a check was introduced to make sure managed and external tables > are located at the proper space. This condition cannot be satisfied during an > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PRAVIN KUMAR SINHA updated HIVE-22865: -- Attachment: HIVE-22865.8.patch > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch, > HIVE-22865.3.patch, HIVE-22865.4.patch, HIVE-22865.5.patch, > HIVE-22865.6.patch, HIVE-22865.7.patch, HIVE-22865.8.patch > > Time Spent: 4h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler
[ https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-22954: --- Attachment: HIVE-22954.10.patch Status: Patch Available (was: In Progress) > Schedule Repl Load using Hive Scheduler > --- > > Key: HIVE-22954 > URL: https://issues.apache.org/jira/browse/HIVE-22954 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, > HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, > HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, > HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.patch > > > [https://github.com/apache/hive/pull/932] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler
[ https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-22954: --- Status: In Progress (was: Patch Available) > Schedule Repl Load using Hive Scheduler > --- > > Key: HIVE-22954 > URL: https://issues.apache.org/jira/browse/HIVE-22954 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, > HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, > HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, > HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.patch > > > [https://github.com/apache/hive/pull/932] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397818 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 19:34 Start Date: 04/Mar/20 19:34 Worklog Time Spent: 10m Work Description: b-slim commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387888277 ## File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java ## @@ -133,12 +134,40 @@ Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema is empty Can not go further"); Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty); LOG.debug("Building Avro Reader with schema {}", schemaFromProperty); - bytesConverter = new AvroBytesConverter(schema); + bytesConverter = getByteConverterForAvroDelegate(schema, tbl); } else { bytesConverter = new BytesWritableConverter(); } } + enum BytesConverterType { +CONFLUENT, +SKIP, +NONE; + +static BytesConverterType fromString(String value) { + try { +return BytesConverterType.valueOf(value.trim().toUpperCase()); + } catch (Exception e){ +return NONE; + } +} + } + + BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties tbl) { +String avroBytesConverterProperty = tbl.getProperty(AvroSerdeUtils + .AvroTableProperties.AVRO_SERDE_TYPE +.getPropName(), BytesConverterType.NONE.toString()); +BytesConverterType avroByteConverterType = BytesConverterType.fromString(avroBytesConverterProperty); +Integer avroSkipBytes = Integer.getInteger(tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES Review comment: FYI the goal of code review is not to tell devs what to do, but to help them understand why the code needs to be changed This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397818) Time Spent: 7h 50m (was: 7h 40m) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 7h 50m > Remaining Estimate: 0h > > According to [Google > groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A] > the Confluent Avro serializer uses a proprietary format for the Kafka value - > <magic byte> <4 bytes of schema ID> <Avro blob that conforms to the schema>. > This format does not cause any problem for the Confluent Kafka deserializer, which > respects the format; however, for the Hive Kafka handler it is a bit of a problem to > correctly deserialize the Kafka value, because Hive uses a custom deserializer from > bytes to objects and ignores the Kafka consumer ser/deser classes provided via > table property. > It would be nice to support the Confluent format with the magic byte. > Also it would be great to support the Schema Registry as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397815&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397815 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 19:33 Start Date: 04/Mar/20 19:33 Worklog Time Spent: 10m Work Description: davidov541 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387887820 ## File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java ## @@ -133,12 +134,40 @@ Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema is empty Can not go further"); Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty); LOG.debug("Building Avro Reader with schema {}", schemaFromProperty); - bytesConverter = new AvroBytesConverter(schema); + bytesConverter = getByteConverterForAvroDelegate(schema, tbl); } else { bytesConverter = new BytesWritableConverter(); } } + enum BytesConverterType { +CONFLUENT, +SKIP, +NONE; + +static BytesConverterType fromString(String value) { + try { +return BytesConverterType.valueOf(value.trim().toUpperCase()); + } catch (Exception e){ +return NONE; + } +} + } + + BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties tbl) { +String avroBytesConverterProperty = tbl.getProperty(AvroSerdeUtils + .AvroTableProperties.AVRO_SERDE_TYPE +.getPropName(), BytesConverterType.NONE.toString()); +BytesConverterType avroByteConverterType = BytesConverterType.fromString(avroBytesConverterProperty); +Integer avroSkipBytes = Integer.getInteger(tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES Review comment: Thanks @b-slim , I definitely skipped that multiple times when I was reading the code. That is a bizarre issue, but I'm glad you pointed it out. I've fixed it, and am running the tests now. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397815) Time Spent: 7h 40m (was: 7.5h) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 7h 40m > Remaining Estimate: 0h > > According to [Google > groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A] > the Confluent Avro serializer uses a proprietary format for the Kafka value - > <magic byte> <4 bytes of schema ID> <Avro blob that conforms to the schema>. > This format does not cause any problem for the Confluent Kafka deserializer, which > respects the format; however, for the Hive Kafka handler it is a bit of a problem to > correctly deserialize the Kafka value, because Hive uses a custom deserializer from > bytes to objects and ignores the Kafka consumer ser/deser classes provided via > table property. > It would be nice to support the Confluent format with the magic byte. > Also it would be great to support the Schema Registry as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22972) Allow table id to be set for table creation requests
[ https://issues.apache.org/jira/browse/HIVE-22972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely updated HIVE-22972: -- Attachment: HIVE-22972.03.patch > Allow table id to be set for table creation requests > > > Key: HIVE-22972 > URL: https://issues.apache.org/jira/browse/HIVE-22972 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22972.01.patch, HIVE-22972.02.patch, > HIVE-22972.03.patch > > > Hive Metastore should accept requests for table creation where the id is set, > ignoring it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397810&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397810 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 19:30 Start Date: 04/Mar/20 19:30 Worklog Time Spent: 10m Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387885875 ## File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java ## @@ -133,12 +134,40 @@ Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema is empty Can not go further"); Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty); LOG.debug("Building Avro Reader with schema {}", schemaFromProperty); - bytesConverter = new AvroBytesConverter(schema); + bytesConverter = getByteConverterForAvroDelegate(schema, tbl); } else { bytesConverter = new BytesWritableConverter(); } } + enum BytesConverterType { +CONFLUENT, +SKIP, +NONE; + +static BytesConverterType fromString(String value) { + try { +return BytesConverterType.valueOf(value.trim().toUpperCase()); + } catch (Exception e){ +return NONE; + } +} + } + + BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties tbl) { +String avroBytesConverterProperty = tbl.getProperty(AvroSerdeUtils + .AvroTableProperties.AVRO_SERDE_TYPE +.getPropName(), BytesConverterType.NONE.toString()); +BytesConverterType avroByteConverterType = BytesConverterType.fromString(avroBytesConverterProperty); +Integer avroSkipBytes = Integer.getInteger(tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES Review comment: tl;dr - Just use `Integer.parseInt(tbl.getProperty())` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397810) Time Spent: 7.5h (was: 7h 20m) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 7.5h > Remaining Estimate: 0h > > According to [Google > groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A] > the Confluent Avro serializer uses a proprietary format for the Kafka value - > <magic byte> <4 bytes of schema ID> <Avro blob that conforms to the schema>. > This format does not cause any problem for the Confluent Kafka deserializer, which > respects the format; however, for the Hive Kafka handler it is a bit of a problem to > correctly deserialize the Kafka value, because Hive uses a custom deserializer from > bytes to objects and ignores the Kafka consumer ser/deser classes provided via > table property. > It would be nice to support the Confluent format with the magic byte. > Also it would be great to support the Schema Registry as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397800&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397800 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 19:21 Start Date: 04/Mar/20 19:21 Worklog Time Spent: 10m Work Description: b-slim commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387881145 ## File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java ## @@ -133,12 +134,40 @@ Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema is empty Can not go further"); Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty); LOG.debug("Building Avro Reader with schema {}", schemaFromProperty); - bytesConverter = new AvroBytesConverter(schema); + bytesConverter = getByteConverterForAvroDelegate(schema, tbl); } else { bytesConverter = new BytesWritableConverter(); } } + enum BytesConverterType { +CONFLUENT, +SKIP, +NONE; + +static BytesConverterType fromString(String value) { + try { +return BytesConverterType.valueOf(value.trim().toUpperCase()); + } catch (Exception e){ +return NONE; + } +} + } + + BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties tbl) { +String avroBytesConverterProperty = tbl.getProperty(AvroSerdeUtils + .AvroTableProperties.AVRO_SERDE_TYPE +.getPropName(), BytesConverterType.NONE.toString()); +BytesConverterType avroByteConverterType = BytesConverterType.fromString(avroBytesConverterProperty); +Integer avroSkipBytes = Integer.getInteger(tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES Review comment: @davidov541 please read the java doc of `java.lang.Integer#getInteger(java.lang.String)` In the first line it says `* Determines the integer value of the system property with the * specified name.` Reading the code should help as well This is an automated message from 
the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397800) Time Spent: 7h 20m (was: 7h 10m) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 7h 20m > Remaining Estimate: 0h > > According to [Google > groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A] > the Confluent Avro serializer uses a proprietary format for the Kafka value - > <magic byte><4 bytes of schema ID><Avro payload that conforms to the schema>. > This format does not cause any problem for the Confluent Kafka deserializer, which > respects the format; for the Hive Kafka handler, however, it is a problem to > correctly deserialize the Kafka value, because Hive uses a custom deserializer from > bytes to objects and ignores the Kafka consumer ser/deser classes provided via > table properties. > It would be nice to support the Confluent format with the magic byte. > Also it would be great to support Schema Registry as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
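The distinction b-slim is pointing at in the review above can be shown in isolation. A minimal sketch (not part of the patch; the string value is made up): `Integer.getInteger(String)` treats its argument as the *name* of a JVM system property, whereas `Integer.parseInt(String)` parses the string itself - so passing a table-property value like "5" to `getInteger` silently yields null.

```java
// Illustrates the API confusion discussed in the review: Integer.getInteger
// looks up a JVM system property by name; Integer.parseInt parses the string.
public class GetIntegerVsParseInt {
    public static void main(String[] args) {
        // Suppose a table-property lookup returned the string "5".
        String valueFromTableProperty = "5";

        // getInteger searches for a system property literally named "5";
        // none is set, so the result is null - not the number 5.
        Integer viaGetInteger = Integer.getInteger(valueFromTableProperty);
        System.out.println(viaGetInteger);   // null

        // parseInt actually parses the string.
        int viaParseInt = Integer.parseInt(valueFromTableProperty);
        System.out.println(viaParseInt);     // 5
    }
}
```

This is why the reviewer asks the author to read the javadoc: the two methods have the same parameter type but entirely different semantics.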
[jira] [Assigned] (HIVE-22561) Data loss on map join for bucketed, partitioned table
[ https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez reassigned HIVE-22561: -- Assignee: Aditya Shah (was: Jesus Camacho Rodriguez) > Data loss on map join for bucketed, partitioned table > - > > Key: HIVE-22561 > URL: https://issues.apache.org/jira/browse/HIVE-22561 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.2 >Reporter: Aditya Shah >Assignee: Aditya Shah >Priority: Blocker > Fix For: 3.0.0, 3.1.0 > > Attachments: HIVE-22561.1.branch-3.1.patch, > HIVE-22561.branch-3.1.patch, HIVE-22561.patch, Screenshot 2019-11-28 at > 8.45.17 PM.png, image-2019-11-28-20-46-25-432.png > > > A map join on a column (which is involved in neither bucketing nor partitioning) > causes data loss. > Steps to reproduce: > Env: [hive-dev-box|https://github.com/kgyrtkirk/hive-dev-box], Hive 3.1.2. > Create tables: > > {code:java} > CREATE TABLE `testj2`( > `id` int, > `bn` string, > `cn` string, > `ad` map<string,int>, > `mi` array<int>) > PARTITIONED BY ( > `br` string) > CLUSTERED BY ( > bn) > INTO 2 BUCKETS > ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' > STORED AS TEXTFILE > TBLPROPERTIES ( > 'bucketing_version'='2'); > CREATE TABLE `testj1`( > `id` int, > `can` string, > `cn` string, > `ad` map<string,int>, > `av` boolean, > `mi` array<int>) > PARTITIONED BY ( > `brand` string) > CLUSTERED BY ( > can) > INTO 2 BUCKETS > ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' > STORED AS TEXTFILE > TBLPROPERTIES ( > 'bucketing_version'='2'); > {code} > Insert some data into both: > {code:java} > insert into testj1 values (100, 'mes_1', 'customer_1', map('city1', 560077), > false, array(5, 10), 'brand_1'), > (101, 'mes_2', 'customer_2', map('city2', 560078), true, array(10, 20), > 'brand_2'), > (102, 'mes_3', 'customer_3', map('city3', 560079), false, array(15, 30), > 'brand_3'), > (103, 'mes_4', 'customer_4', map('city4', 560080), true, array(20, 40), > 'brand_4'), > (104, 'mes_5', 'customer_5', map('city5', 560081), false, array(25, 50), > 'brand_5'); > insert into table testj2 values (100, 'tv_0', 'customer_0', map('city0', > 560076),array(0, 0, 0), 'tv'), > (101, 'tv_1', 'customer_1', map('city1', 560077),array(20, 25, 30), 'tv'), > (102, 'tv_2', 'customer_2', map('city2', 560078),array(40, 50, 60), 'tv'), > (103, 'tv_3', 'customer_3', map('city3', 560079),array(60, 75, 90), 'tv'), > (104, 'tv_4', 'customer_4', map('city4', 560080),array(80, 100, 120), 'tv'); > {code} > Do a join between them: > {code:java} > select t1.id, t1.can, t1.cn, t2.bn,t2.ad, t2.br FROM testj1 t1 JOIN testj2 t2 > on (t1.id = t2.id) order by t1.id; > {code} > Observed results: > !image-2019-11-28-20-46-25-432.png|width=524,height=100! > In the plan, I can see a map join. Disabling it gives the correct result. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-22561) Data loss on map join for bucketed, partitioned table
[ https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez reassigned HIVE-22561: -- Assignee: Jesus Camacho Rodriguez (was: Aditya Shah) > Data loss on map join for bucketed, partitioned table > - > > Key: HIVE-22561 > URL: https://issues.apache.org/jira/browse/HIVE-22561 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.2 >Reporter: Aditya Shah >Assignee: Jesus Camacho Rodriguez >Priority: Blocker > Fix For: 3.0.0, 3.1.0 > > Attachments: HIVE-22561.1.branch-3.1.patch, > HIVE-22561.branch-3.1.patch, HIVE-22561.patch, Screenshot 2019-11-28 at > 8.45.17 PM.png, image-2019-11-28-20-46-25-432.png > > > A map join on a column (which is involved in neither bucketing nor partitioning) > causes data loss. > Steps to reproduce: > Env: [hive-dev-box|https://github.com/kgyrtkirk/hive-dev-box], Hive 3.1.2. > Create tables: > > {code:java} > CREATE TABLE `testj2`( > `id` int, > `bn` string, > `cn` string, > `ad` map<string,int>, > `mi` array<int>) > PARTITIONED BY ( > `br` string) > CLUSTERED BY ( > bn) > INTO 2 BUCKETS > ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' > STORED AS TEXTFILE > TBLPROPERTIES ( > 'bucketing_version'='2'); > CREATE TABLE `testj1`( > `id` int, > `can` string, > `cn` string, > `ad` map<string,int>, > `av` boolean, > `mi` array<int>) > PARTITIONED BY ( > `brand` string) > CLUSTERED BY ( > can) > INTO 2 BUCKETS > ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' > STORED AS TEXTFILE > TBLPROPERTIES ( > 'bucketing_version'='2'); > {code} > Insert some data into both: > {code:java} > insert into testj1 values (100, 'mes_1', 'customer_1', map('city1', 560077), > false, array(5, 10), 'brand_1'), > (101, 'mes_2', 'customer_2', map('city2', 560078), true, array(10, 20), > 'brand_2'), > (102, 'mes_3', 'customer_3', map('city3', 560079), false, array(15, 30), > 'brand_3'), > (103, 'mes_4', 'customer_4', map('city4', 560080), true, array(20, 40), > 'brand_4'), > (104, 'mes_5', 'customer_5', map('city5', 560081), false, array(25, 50), > 'brand_5'); > insert into table testj2 values (100, 'tv_0', 'customer_0', map('city0', > 560076),array(0, 0, 0), 'tv'), > (101, 'tv_1', 'customer_1', map('city1', 560077),array(20, 25, 30), 'tv'), > (102, 'tv_2', 'customer_2', map('city2', 560078),array(40, 50, 60), 'tv'), > (103, 'tv_3', 'customer_3', map('city3', 560079),array(60, 75, 90), 'tv'), > (104, 'tv_4', 'customer_4', map('city4', 560080),array(80, 100, 120), 'tv'); > {code} > Do a join between them: > {code:java} > select t1.id, t1.can, t1.cn, t2.bn,t2.ad, t2.br FROM testj1 t1 JOIN testj2 t2 > on (t1.id = t2.id) order by t1.id; > {code} > Observed results: > !image-2019-11-28-20-46-25-432.png|width=524,height=100! > In the plan, I can see a map join. Disabling it gives the correct result. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler
[ https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-22954: --- Attachment: HIVE-22954.09.patch Status: Patch Available (was: In Progress) > Schedule Repl Load using Hive Scheduler > --- > > Key: HIVE-22954 > URL: https://issues.apache.org/jira/browse/HIVE-22954 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, > HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, > HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, > HIVE-22954.09.patch, HIVE-22954.patch > > > [https://github.com/apache/hive/pull/932] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler
[ https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-22954: --- Status: In Progress (was: Patch Available) > Schedule Repl Load using Hive Scheduler > --- > > Key: HIVE-22954 > URL: https://issues.apache.org/jira/browse/HIVE-22954 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, > HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, > HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, > HIVE-22954.09.patch, HIVE-22954.patch > > > [https://github.com/apache/hive/pull/932] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397796=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397796 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 19:17 Start Date: 04/Mar/20 19:17 Worklog Time Spent: 10m Work Description: davidov541 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387878744 ## File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java ## @@ -133,12 +134,40 @@ Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema is empty Can not go further"); Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty); LOG.debug("Building Avro Reader with schema {}", schemaFromProperty); - bytesConverter = new AvroBytesConverter(schema); + bytesConverter = getByteConverterForAvroDelegate(schema, tbl); } else { bytesConverter = new BytesWritableConverter(); } } + enum BytesConverterType { +CONFLUENT, +SKIP, +NONE; + +static BytesConverterType fromString(String value) { + try { +return BytesConverterType.valueOf(value.trim().toUpperCase()); + } catch (Exception e){ +return NONE; + } +} + } + + BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties tbl) { +String avroBytesConverterProperty = tbl.getProperty(AvroSerdeUtils + .AvroTableProperties.AVRO_SERDE_TYPE +.getPropName(), BytesConverterType.NONE.toString()); +BytesConverterType avroByteConverterType = BytesConverterType.fromString(avroBytesConverterProperty); +Integer avroSkipBytes = Integer.getInteger(tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES Review comment: I think I'm confused what you're asking for then. The initialize function takes in a java.util.Properties object that has properties that have been set for the serde in the DDL for the table. 
It reads a few from that object, and then passes it to getByteConverterForAvroDelegate, where it is also used in the code added here. The usage of the properties object here matches what is being done in initialize, and seems to match what I would expect. These aren't pulling system properties of the JVM, or at least are not necessarily doing so, instead reading from the Properties object passed to us. Does that make sense, or am I way off base? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397796) Time Spent: 7h 10m (was: 7h) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 7h 10m > Remaining Estimate: 0h > > According to [Google > groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A] > the Confluent Avro serializer uses a proprietary format for the Kafka value - > <magic byte><4 bytes of schema ID><Avro payload that conforms to the schema>. > This format does not cause any problem for the Confluent Kafka deserializer, which > respects the format; for the Hive Kafka handler, however, it is a problem to > correctly deserialize the Kafka value, because Hive uses a custom deserializer from > bytes to objects and ignores the Kafka consumer ser/deser classes provided via > table properties. > It would be nice to support the Confluent format with the magic byte. > Also it would be great to support Schema Registry as well. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
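The wire format this issue describes can be split apart with a few buffer reads. A hedged sketch, not code from the patch under review: class, method, and field names here are made up, and the layout (magic byte 0x00, then a 4-byte big-endian schema ID, then the Avro payload) is Confluent's documented wire format rather than anything stated in this thread beyond the description above.

```java
import java.nio.ByteBuffer;

// Sketch of splitting a Kafka value written by Confluent's Avro serializer:
// <magic byte 0x00><4-byte big-endian schema id><avro payload>.
public final class ConfluentWireFormat {
    private static final byte MAGIC_BYTE = 0x0;

    /** Reads the schema id and leaves {@code buf} positioned at the Avro payload. */
    static int readSchemaId(ByteBuffer buf) {
        byte magic = buf.get();
        if (magic != MAGIC_BYTE) {
            throw new IllegalArgumentException("Unknown magic byte: " + magic);
        }
        // ByteBuffer's default byte order is big-endian, matching the format.
        return buf.getInt();
    }

    public static void main(String[] args) {
        // Fabricated value: magic byte, schema id 42, then a 3-byte dummy payload.
        ByteBuffer value = ByteBuffer.wrap(new byte[] {0x0, 0, 0, 0, 42, 1, 2, 3});
        int schemaId = readSchemaId(value);
        System.out.println("schema id = " + schemaId);              // schema id = 42
        System.out.println("payload bytes = " + value.remaining()); // payload bytes = 3
    }
}
```

The remaining bytes after the 5-byte header are what a plain Avro reader (as Hive's AvroBytesConverter expects) should be handed, which is why the patch needs a delegate that strips the prefix first.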
[jira] [Commented] (HIVE-22974) Metastore's table location check should be optional
[ https://issues.apache.org/jira/browse/HIVE-22974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051542#comment-17051542 ] Hive QA commented on HIVE-22974: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 39s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 55s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 2m 46s{color} | {color:blue} standalone-metastore/metastore-common in master has 35 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 1m 20s{color} | {color:blue} standalone-metastore/metastore-server in master has 185 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 16s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s{color} | {color:red} standalone-metastore/metastore-server: The patch generated 2 new + 375 unchanged - 0 fixed = 377 total (was 375) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 15s{color} | {color:red} The patch generated 2 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 25m 53s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20950/dev-support/hive-personality.sh | | git revision | master / deebfb6 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-20950/yetus/diff-checkstyle-standalone-metastore_metastore-server.txt | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20950/yetus/patch-asflicense-problems.txt | | modules | C: standalone-metastore/metastore-common standalone-metastore/metastore-server U: standalone-metastore | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20950/yetus.txt | | Powered by | Apache Yetus http://yetus.apache.org | This message was automatically generated. > Metastore's table location check should be optional > --- > > Key: HIVE-22974 > URL: https://issues.apache.org/jira/browse/HIVE-22974 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22974.1.patch > > > In HIVE-22189 a check was introduced to make sure managed and external tables > are located in the proper place. This condition cannot be satisfied during an > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397788=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397788 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 19:07 Start Date: 04/Mar/20 19:07 Worklog Time Spent: 10m Work Description: b-slim commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387873547 ## File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java ## @@ -133,12 +134,40 @@ Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema is empty Can not go further"); Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty); LOG.debug("Building Avro Reader with schema {}", schemaFromProperty); - bytesConverter = new AvroBytesConverter(schema); + bytesConverter = getByteConverterForAvroDelegate(schema, tbl); } else { bytesConverter = new BytesWritableConverter(); } } + enum BytesConverterType { +CONFLUENT, +SKIP, +NONE; + +static BytesConverterType fromString(String value) { + try { +return BytesConverterType.valueOf(value.trim().toUpperCase()); + } catch (Exception e){ +return NONE; + } +} + } + + BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties tbl) { +String avroBytesConverterProperty = tbl.getProperty(AvroSerdeUtils + .AvroTableProperties.AVRO_SERDE_TYPE +.getPropName(), BytesConverterType.NONE.toString()); +BytesConverterType avroByteConverterType = BytesConverterType.fromString(avroBytesConverterProperty); +Integer avroSkipBytes = Integer.getInteger(tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES Review comment: @davidov541 did you read the comment above ? That function you are using is reading from the system property of the JVM please see the function implementation `System.getProperty(nm)`. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397788) Time Spent: 7h (was: 6h 50m) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 7h > Remaining Estimate: 0h > > According to [Google > groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A] > the Confluent Avro serializer uses a proprietary format for the Kafka value - > <magic byte><4 bytes of schema ID><Avro payload that conforms to the schema>. > This format does not cause any problem for the Confluent Kafka deserializer, which > respects the format; for the Hive Kafka handler, however, it is a problem to > correctly deserialize the Kafka value, because Hive uses a custom deserializer from > bytes to objects and ignores the Kafka consumer ser/deser classes provided via > table properties. > It would be nice to support the Confluent format with the magic byte. > Also it would be great to support Schema Registry as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer
[ https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397779=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397779 ] ASF GitHub Bot logged work on HIVE-21218: - Author: ASF GitHub Bot Created on: 04/Mar/20 19:00 Start Date: 04/Mar/20 19:00 Worklog Time Spent: 10m Work Description: cricket007 commented on pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format URL: https://github.com/apache/hive/pull/933#discussion_r387866510 ## File path: kafka-handler/pom.xml ## @@ -190,5 +207,27 @@ + + + + org.apache.avro + avro-maven-plugin + 1.8.1 Review comment: Is the Avro version stored in properties anywhere else? Confluent uses Avro 1.9.x now This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397779) Time Spent: 6h 40m (was: 6.5h) > KafkaSerDe doesn't support topics created via Confluent Avro serializer > --- > > Key: HIVE-21218 > URL: https://issues.apache.org/jira/browse/HIVE-21218 > Project: Hive > Issue Type: Bug > Components: kafka integration, Serializers/Deserializers >Affects Versions: 3.1.1 >Reporter: Milan Baran >Assignee: David McGinnis >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, > HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch > > Time Spent: 6h 40m > Remaining Estimate: 0h > > According to [Google > groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A] > the Confluent Avro serializer uses a proprietary format for the Kafka value - > <magic byte><4 bytes of schema ID><Avro payload that conforms to the schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, which > respects the format; for the Hive Kafka handler, however, it is a problem to > correctly deserialize the Kafka value, because Hive uses a custom deserializer from > bytes to objects and ignores the Kafka consumer ser/deser classes provided via > table properties. > It would be nice to support the Confluent format with the magic byte. > Also it would be great to support Schema Registry as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)