[jira] [Commented] (HIVE-22973) Handle 0 length batches in LlapArrowRowRecordReader

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051851#comment-17051851
 ] 

Hive QA commented on HIVE-22973:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
48s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
 5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
26s{color} | {color:blue} llap-ext-client in master has 2 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
42s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
29s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
10s{color} | {color:red} llap-ext-client: The patch generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
15s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 19m 15s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-20959/dev-support/hive-personality.sh
 |
| git revision | master / deebfb6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20959/yetus/diff-checkstyle-llap-ext-client.txt
 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20959/yetus/patch-asflicense-problems.txt
 |
| modules | C: llap-ext-client itests/hive-unit U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20959/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Handle 0 length batches in LlapArrowRowRecordReader
> ---
>
> Key: HIVE-22973
> URL: https://issues.apache.org/jira/browse/HIVE-22973
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22973.01.patch, HIVE-22973.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In https://issues.apache.org/jira/browse/HIVE-22856, we allowed 
> {{LlapArrowBatchRecordReader}} to permit 0 length arrow batches. 
> {{LlapArrowRowRecordReader}}, which is a wrapper over 
> {{LlapArrowBatchRecordReader}}, should also handle this.
> On one of the systems (cannot be reproduced easily) where we were running 
> test {{TestJdbcWithMiniLlapVectorArrow}}, we 
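
Editorial sketch, not taken from the patch: one way a row-level reader can tolerate 0 length batches is to keep pulling batches until it finds a non-empty one or the stream ends, rather than treating the first empty batch as end of input. The names below (`RowOverBatchIterator`, `drain`) are hypothetical, plain Java lists stand in for Arrow batches, and this is not the actual `LlapArrowRowRecordReader` code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical row-level view over an iterator of batches; not the Hive class. */
final class RowOverBatchIterator<T> implements Iterator<T> {
  private final Iterator<List<T>> batches;
  private Iterator<T> current = Collections.emptyIterator();

  RowOverBatchIterator(Iterator<List<T>> batches) {
    this.batches = batches;
  }

  @Override
  public boolean hasNext() {
    // Loop instead of checking once: an empty batch must not be read as end of stream.
    while (!current.hasNext() && batches.hasNext()) {
      current = batches.next().iterator();
    }
    return current.hasNext();
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return current.next();
  }

  /** Convenience helper for the sketch: collects every row across all batches. */
  static <T> List<T> drain(List<List<T>> batches) {
    List<T> rows = new ArrayList<>();
    new RowOverBatchIterator<>(batches.iterator()).forEachRemaining(rows::add);
    return rows;
  }
}
```

With batches `[1, 2]`, `[]`, `[3]`, a reader that checked only once would stop after the second row; the loop in `hasNext()` moves past the empty batch and still reaches the third row.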

[jira] [Commented] (HIVE-22762) Leap day is incorrectly parsed during cast in Hive

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051848#comment-17051848
 ] 

Hive QA commented on HIVE-22762:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12995637/HIVE-22762.04.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 18096 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20958/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20958/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20958/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12995637 - PreCommit-HIVE-Build

> Leap day is incorrectly parsed during cast in Hive
> --
>
> Key: HIVE-22762
> URL: https://issues.apache.org/jira/browse/HIVE-22762
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-22762.01.patch, HIVE-22762.01.patch, 
> HIVE-22762.01.patch, HIVE-22762.01.patch, HIVE-22762.02.patch, 
> HIVE-22762.03.patch, HIVE-22762.03.patch, HIVE-22762.04.patch
>
>
> While casting a string to a date with a custom date format that has the day token 
> before the year and month tokens, the date is parsed incorrectly for leap days.
> h3. How to reproduce
> Execute {code}select cast("29 02 0" as date format "dd mm rr"){code} with 
> Hive. The query incorrectly results in *2020-02-28*.
> 
> Executing another cast with a slightly modified representation of the 
> date (day is preceded by year and month) is, however, parsed correctly:
> {code}select cast("0 02 29" as date format "rr mm dd"){code}
> It returns *2020-02-29*.
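
Editorial sketch, not taken from Hive: the symptom above is what one would see if the parsed fields were applied one at a time, in token order, to a default date in a non-leap year. The code below only illustrates that assumed mechanism with `java.time.LocalDate`; the class `LeapDayOrderSketch`, the method `applyInOrder`, and the 1970-01-01 base date are hypothetical, and the year 2020 is passed in directly rather than derived from the `rr` token.

```java
import java.time.LocalDate;

/** Hypothetical illustration of order-dependent field resolution; not Hive's parser. */
final class LeapDayOrderSketch {

  /**
   * Applies day ('d'), month ('m') and year ('y') to a 1970-01-01 base date,
   * one field at a time, in the order given by {@code order}.
   */
  static LocalDate applyInOrder(String order, int day, int month, int year) {
    LocalDate date = LocalDate.of(1970, 1, 1);
    for (char field : order.toCharArray()) {
      switch (field) {
        case 'd':
          date = date.withDayOfMonth(day);
          break;
        case 'm':
          // withMonth clamps the day to the last valid day of the new month.
          date = date.withMonth(month);
          break;
        case 'y':
          date = date.withYear(year);
          break;
        default:
          throw new IllegalArgumentException("unknown field: " + field);
      }
    }
    return date;
  }
}
```

Here `applyInOrder("dmy", 29, 2, 2020)` yields 2020-02-28, because day 29 is clamped to 28 when the month becomes February while the year is still 1970, and `applyInOrder("ymd", 29, 2, 2020)` yields 2020-02-29. Resolving all fields together, or applying the year first, avoids the order dependence.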



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22865) Include data in replication staging directory

2020-03-04 Thread PRAVIN KUMAR SINHA (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PRAVIN KUMAR SINHA updated HIVE-22865:
--
Attachment: HIVE-22865.10.patch

> Include data in replication staging directory
> -
>
> Key: HIVE-22865
> URL: https://issues.apache.org/jira/browse/HIVE-22865
> Project: Hive
>  Issue Type: Task
>Reporter: PRAVIN KUMAR SINHA
>Assignee: PRAVIN KUMAR SINHA
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22865.1.patch, HIVE-22865.10.patch, 
> HIVE-22865.2.patch, HIVE-22865.3.patch, HIVE-22865.4.patch, 
> HIVE-22865.5.patch, HIVE-22865.6.patch, HIVE-22865.7.patch, 
> HIVE-22865.8.patch, HIVE-22865.9.patch
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-22972) Allow table id to be set for table creation requests

2020-03-04 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-22972:
--
Attachment: HIVE-22972.03.patch

> Allow table id to be set for table creation requests
> 
>
> Key: HIVE-22972
> URL: https://issues.apache.org/jira/browse/HIVE-22972
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22972.01.patch, HIVE-22972.02.patch, 
> HIVE-22972.03.patch
>
>
> Hive Metastore should accept requests for table creation where the id is set, 
> ignoring it.





[jira] [Updated] (HIVE-22972) Allow table id to be set for table creation requests

2020-03-04 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-22972:
--
Attachment: (was: HIVE-22972.03.patch)

> Allow table id to be set for table creation requests
> 
>
> Key: HIVE-22972
> URL: https://issues.apache.org/jira/browse/HIVE-22972
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22972.01.patch, HIVE-22972.02.patch, 
> HIVE-22972.03.patch
>
>
> Hive Metastore should accept requests for table creation where the id is set, 
> ignoring it.





[jira] [Commented] (HIVE-22762) Leap day is incorrectly parsed during cast in Hive

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051822#comment-17051822
 ] 

Hive QA commented on HIVE-22762:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
34s{color} | {color:blue} common in master has 63 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
13s{color} | {color:red} common: The patch generated 7 new + 0 unchanged - 0 
fixed = 7 total (was 0) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
15s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 24s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-20958/dev-support/hive-personality.sh
 |
| git revision | master / deebfb6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20958/yetus/diff-checkstyle-common.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20958/yetus/whitespace-eol.txt
 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20958/yetus/patch-asflicense-problems.txt
 |
| modules | C: common U: common |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20958/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Leap day is incorrectly parsed during cast in Hive
> --
>
> Key: HIVE-22762
> URL: https://issues.apache.org/jira/browse/HIVE-22762
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-22762.01.patch, HIVE-22762.01.patch, 
> HIVE-22762.01.patch, HIVE-22762.01.patch, HIVE-22762.02.patch, 
> HIVE-22762.03.patch, HIVE-22762.03.patch, HIVE-22762.04.patch
>
>
> While casting a string to a date with a custom date format that has the day token 
> before the year and month tokens, the date is parsed incorrectly for leap days.
> h3. How to reproduce
> Execute {code}select cast("29 02 0" as date format "dd mm rr"){code} with 
> Hive. The query incorrectly results in *2020-02-28*.
> 
> Executing another cast with a slightly modified representation of the 
> date (day is preceded by year and month) is, however, parsed correctly:
> {code}select cast("0 02 29" as date format "rr mm dd"){code}
> It returns *2020-02-29*.





[jira] [Updated] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread David McGinnis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David McGinnis updated HIVE-21218:
--
Attachment: HIVE-21218.6.patch

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.6.patch, HIVE-21218.patch
>
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value - 
> <4 bytes of schema ID> conforms to schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, which 
> respects the format. For the Hive Kafka handler, however, it is a bit of a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from 
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via 
> table property.
> It would be nice to support Confluent format with magic byte.
> Also it would be great to support Schema registry as well.
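
Editorial sketch, not taken from the patch: the framing described above puts a 5-byte header (one magic byte followed by a 4-byte schema ID) in front of the Avro payload, so a reader that only understands plain Avro has to split that header off first. The class `ConfluentFraming` and its methods are hypothetical names for illustration. The sketch assumes the magic byte is `0` and that the schema ID is stored big-endian.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical helper that splits a Confluent-framed Kafka value; not Hive code. */
final class ConfluentFraming {
  /** 1 magic byte + 4 bytes of schema ID. */
  static final int HEADER_LENGTH = 5;

  /** Reads the schema ID stored in bytes 1..4 of the value. */
  static int schemaId(byte[] value) {
    checkHeader(value);
    // ByteBuffer reads big-endian by default.
    return ByteBuffer.wrap(value, 1, 4).getInt();
  }

  /** Returns the bytes after the header, i.e. the Avro-encoded record. */
  static byte[] avroPayload(byte[] value) {
    checkHeader(value);
    return Arrays.copyOfRange(value, HEADER_LENGTH, value.length);
  }

  private static void checkHeader(byte[] value) {
    if (value.length < HEADER_LENGTH || value[0] != 0) {
      throw new IllegalArgumentException("value does not start with the expected 5-byte header");
    }
  }
}
```

The returned payload can then be handed to an ordinary Avro decoder together with a schema obtained separately, for example from a table property.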





[jira] [Commented] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051815#comment-17051815
 ] 

Hive QA commented on HIVE-21218:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12995636/HIVE-21218.5.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 18096 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20957/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20957/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20957/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12995636 - PreCommit-HIVE-Build

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value - 
> <4 bytes of schema ID> conforms to schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, which 
> respects the format. For the Hive Kafka handler, however, it is a bit of a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from 
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via 
> table property.
> It would be nice to support Confluent format with magic byte.
> Also it would be great to support Schema registry as well.





[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398141&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398141
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 05/Mar/20 05:16
Start Date: 05/Mar/20 05:16
Worklog Time Spent: 10m 
  Work Description: davidov541 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r388082372
 
 

 ##
 File path: kafka-handler/README.md
 ##
 @@ -50,6 +50,9 @@ ALTER TABLE
 SET TBLPROPERTIES (
   "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
 ```
+
+If you use the Confluent Avro serializer/deserializer with Schema Registry, you may 
want to remove the 5 bytes from the beginning that represent the magic byte + schema ID 
from the registry.
 
 Review comment:
   I'm not sure how that is germane to this thread or to this changelist at 
all. Skipping the bytes does not currently have any connection to having a 
schema, which is already implemented as part of the Kafka Avro support. The 
system would work the same as it would if you didn't specify a schema in the 
first place. I don't see any tests in the other parts which test this, although 
there are plenty of query tests which test the use of the literal and URL 
properties, so I suspect there is one such test there. Given that that is 
orthogonal to this problem, however, I see no need to add another test to this 
changeset for that.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398141)
Time Spent: 12.5h  (was: 12h 20m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value - 
> <4 bytes of schema ID> conforms to schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, which 
> respects the format. For the Hive Kafka handler, however, it is a bit of a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from 
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via 
> table property.
> It would be nice to support Confluent format with magic byte.
> Also it would be great to support Schema registry as well.





[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398139
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 05/Mar/20 05:08
Start Date: 05/Mar/20 05:08
Worklog Time Spent: 10m 
  Work Description: davidov541 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r388080591
 
 

 ##
 File path: 
kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java
 ##
 @@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.kafka;
+
+import com.google.common.collect.Maps;
+import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient;
+import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig;
+import io.confluent.kafka.serializers.KafkaAvroSerializer;
+import org.apache.avro.Schema;
+import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Test class for Hive Kafka Avro SerDe with variable bytes skipped.
+ */
+public class AvroBytesConverterTest {
+  private static SimpleRecord simpleRecord = 
SimpleRecord.newBuilder().setId("123").setName("test").build();
+  private static byte[] simpleRecordConfluentBytes;
+
+  @Rule
+  public ExpectedException exception = ExpectedException.none();
+
+  /**
+   * Use the KafkaAvroSerializer from Confluent to serialize the simpleRecord. 
+   */
+  @BeforeClass
+  public static void setUp() {
+Map config = Maps.newHashMap();
+config.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, 
"http://localhost:8081");
+KafkaAvroSerializer avroSerializer = new KafkaAvroSerializer(new 
MockSchemaRegistryClient());
+avroSerializer.configure(config, false);
+simpleRecordConfluentBytes = avroSerializer.serialize("temp", 
simpleRecord);
+  }
+
+  private void runConversionTest(KafkaSerDe.AvroBytesConverter conv, byte[] 
serializedSimpleRecord) { 
+AvroGenericRecordWritable simpleRecordWritable = 
conv.getWritable(serializedSimpleRecord);
+
+Assert.assertNotNull(simpleRecordWritable);
+Assert.assertEquals(SimpleRecord.class, 
simpleRecordWritable.getRecord().getClass());
+
+SimpleRecord simpleRecordDeserialized = (SimpleRecord) 
simpleRecordWritable.getRecord();
+
+Assert.assertNotNull(simpleRecordDeserialized);
+Assert.assertEquals(simpleRecord, simpleRecordDeserialized);
+  }
+
+  /**
+   * Tests the default case of no skipped bytes per record works properly. 
+   */
+  @Test
+  public void convertWithAvroBytesConverter() {
+// Since the serialized version was created by Confluent, lets remove the 
first five bytes to get the actual message.
+byte[] simpleRecordWithNoOffset = 
Arrays.copyOfRange(simpleRecordConfluentBytes, 5, 
simpleRecordConfluentBytes.length);
+
+Schema schema = SimpleRecord.getClassSchema();
+KafkaSerDe.AvroBytesConverter conv = new 
KafkaSerDe.AvroBytesConverter(schema);
+runConversionTest(conv, simpleRecordWithNoOffset);
+  }
+
+  /**
+   * Tests that the skip converter skips 5 bytes properly, which matches what 
Confluent needs.
+   */
+  @Test
+  public void convertWithConfluentAvroBytesConverter() {
+Schema schema = SimpleRecord.getClassSchema();
+KafkaSerDe.AvroSkipBytesConverter conv = new 
KafkaSerDe.AvroSkipBytesConverter(schema, 5);
+runConversionTest(conv, simpleRecordConfluentBytes);
+  }
+
+  /**
+   * Tests that the skip converter skips a custom number of bytes properly.
+   */
+  @Test
+  public void convertWithCustomAvroSkipBytesConverter() {
+int offset = 2;
+// Remove all but two bytes of the five byte offset which Confluent adds, 
+// to simulate a message with only 2 bytes in front of each message.
+byte[] simpleRecordAsOffsetBytes = 

[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398135&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398135
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 05/Mar/20 05:03
Start Date: 05/Mar/20 05:03
Worklog Time Spent: 10m 
  Work Description: davidov541 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r388079696
 
 

 ##
 File path: 
kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java
 ##
 @@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.kafka;
+
+import com.google.common.collect.Maps;
+import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient;
+import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig;
+import io.confluent.kafka.serializers.KafkaAvroSerializer;
+import org.apache.avro.Schema;
+import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Test class for Hive Kafka Avro SerDe with variable bytes skipped.
+ */
+public class AvroBytesConverterTest {
+  private static SimpleRecord simpleRecord = 
SimpleRecord.newBuilder().setId("123").setName("test").build();
+  private static byte[] simpleRecordConfluentBytes;
+
+  @Rule
+  public ExpectedException exception = ExpectedException.none();
+
+  /**
+   * Use the KafkaAvroSerializer from Confluent to serialize the simpleRecord. 
+   */
+  @BeforeClass
+  public static void setUp() {
+Map config = Maps.newHashMap();
+config.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, 
"http://localhost:8081");
+KafkaAvroSerializer avroSerializer = new KafkaAvroSerializer(new 
MockSchemaRegistryClient());
+avroSerializer.configure(config, false);
+simpleRecordConfluentBytes = avroSerializer.serialize("temp", 
simpleRecord);
+  }
+
+  private void runConversionTest(KafkaSerDe.AvroBytesConverter conv, byte[] 
serializedSimpleRecord) { 
 
 Review comment:
   We are instantiating KafkaSerDe.AvroBytesConverter for the normal case and 
KafkaSerDe.AvroSkipBytesConverter for the other cases. If we wanted to move 
creation of the converter into the function, it would require an if-else 
statement, making it more complex than just doing this. Additionally, this 
gives us the leeway to better add tests if more converters are needed later.
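
Editorial sketch, not taken from the patch: the design argued for above is a shared check that accepts whichever converter the caller built, so the helper needs no branching and a new converter only needs a new call site. The classes below (`PlainConverter`, `SkipBytesConverter`, `ConversionChecks`) are hypothetical stand-ins that decode UTF-8 text instead of Avro. They are not `KafkaSerDe.AvroBytesConverter` or `KafkaSerDe.AvroSkipBytesConverter`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical base converter: decodes the whole value. */
class PlainConverter {
  String convert(byte[] value) {
    return new String(value, StandardCharsets.UTF_8);
  }
}

/** Hypothetical variant: drops a fixed number of leading bytes, then decodes. */
class SkipBytesConverter extends PlainConverter {
  private final int skip;

  SkipBytesConverter(int skip) {
    this.skip = skip;
  }

  @Override
  String convert(byte[] value) {
    return super.convert(Arrays.copyOfRange(value, skip, value.length));
  }
}

final class ConversionChecks {
  /** Shared check: the caller chooses the converter, so no if-else is needed here. */
  static void runConversionTest(PlainConverter conv, byte[] input, String expected) {
    String actual = conv.convert(input);
    if (!expected.equals(actual)) {
      throw new AssertionError("expected " + expected + " but got " + actual);
    }
  }
}
```

Each test then reduces to constructing its converter and calling `runConversionTest`, which mirrors how the quoted test methods pass their own converter instances to the shared helper.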
 



Issue Time Tracking
---

Worklog Id: (was: 398135)
Time Spent: 12h 10m  (was: 12h)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value - 
> <4 bytes of schema ID> conforms to schema>. 
> This format does not cause any 

[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398133
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 05/Mar/20 04:59
Start Date: 05/Mar/20 04:59
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r388078819
 
 

 ##
 File path: 
kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java
 ##
 @@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.kafka;
+
+import com.google.common.collect.Maps;
+import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient;
+import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig;
+import io.confluent.kafka.serializers.KafkaAvroSerializer;
+import org.apache.avro.Schema;
+import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Test class for Hive Kafka Avro SerDe with variable bytes skipped.
+ */
+public class AvroBytesConverterTest {
+  private static SimpleRecord simpleRecord = 
SimpleRecord.newBuilder().setId("123").setName("test").build();
+  private static byte[] simpleRecordConfluentBytes;
+
+  @Rule
+  public ExpectedException exception = ExpectedException.none();
+
+  /**
+   * Use the KafkaAvroSerializer from Confluent to serialize the simpleRecord. 
+   */
+  @BeforeClass
+  public static void setUp() {
+Map config = Maps.newHashMap();
+config.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, 
"http://localhost:8081");
+KafkaAvroSerializer avroSerializer = new KafkaAvroSerializer(new 
MockSchemaRegistryClient());
+avroSerializer.configure(config, false);
+simpleRecordConfluentBytes = avroSerializer.serialize("temp", 
simpleRecord);
+  }
+
+  private void runConversionTest(KafkaSerDe.AvroBytesConverter conv, byte[] 
serializedSimpleRecord) { 
 
 Review comment:
   Pass Integer in as a parameter, and you could move more code in here.
   
   For the case where there's no offset, pass in null 
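
The trade-off raised in this comment can be illustrated with a small stand-in, sketched here under stated assumptions. The class `SkipBytesSketch` and the method `prefixStripper` are hypothetical and do not appear in the patch; they replace `KafkaSerDe.AvroBytesConverter` and `KafkaSerDe.AvroSkipBytesConverter` with a plain byte-array function so the example runs without Hive or Avro on the classpath. A `null` offset stands for the case with no prefix.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for the test helper; none of these names come from the patch. */
final class SkipBytesSketch {

  /**
   * Builds a function that drops skipBytes leading bytes from a message.
   * A null offset means "no prefix" and returns the input unchanged.
   */
  static UnaryOperator<byte[]> prefixStripper(Integer skipBytes) {
    if (skipBytes == null) {
      // Plain case: the message is already bare Avro bytes.
      return bytes -> bytes;
    }
    if (skipBytes < 0) {
      throw new IllegalArgumentException("skipBytes must not be negative: " + skipBytes);
    }
    final int offset = skipBytes;
    // Skip case: copy everything after the first `offset` bytes.
    return bytes -> Arrays.copyOfRange(bytes, offset, bytes.length);
  }
}
```

The `null` check is the if-else branch that a helper taking an `Integer` would need; whether that is simpler than passing a ready-made converter is the judgment call being debated in the review.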
 



Issue Time Tracking
---

Worklog Id: (was: 398133)
Time Spent: 12h  (was: 11h 50m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A],
> the Confluent Avro serializer uses a proprietary format for the Kafka value: 
> <magic_byte 0x00><4 bytes of schema ID><regular Avro bytes for an object that 
> conforms to the schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, 
> which respects the format. For the Hive Kafka handler, however, it is a problem 
> to correctly deserialize the Kafka value, because Hive uses a custom 
> deserializer from bytes to objects and ignores the Kafka consumer 

[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398132
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 05/Mar/20 04:59
Start Date: 05/Mar/20 04:59
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r388078182
 
 

 ##
 File path: 
kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java
 ##
 @@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.kafka;
+
+import com.google.common.collect.Maps;
+import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient;
+import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig;
+import io.confluent.kafka.serializers.KafkaAvroSerializer;
+import org.apache.avro.Schema;
+import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Test class for Hive Kafka Avro SerDe with variable bytes skipped.
+ */
+public class AvroBytesConverterTest {
+  private static SimpleRecord simpleRecord = 
SimpleRecord.newBuilder().setId("123").setName("test").build();
+  private static byte[] simpleRecordConfluentBytes;
+
+  @Rule
+  public ExpectedException exception = ExpectedException.none();
+
+  /**
+   * Use the KafkaAvroSerializer from Confluent to serialize the simpleRecord. 
+   */
+  @BeforeClass
+  public static void setUp() {
+Map config = Maps.newHashMap();
+config.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, 
"http://localhost:8081");
+KafkaAvroSerializer avroSerializer = new KafkaAvroSerializer(new 
MockSchemaRegistryClient());
+avroSerializer.configure(config, false);
+simpleRecordConfluentBytes = avroSerializer.serialize("temp", 
simpleRecord);
+  }
+
+  private void runConversionTest(KafkaSerDe.AvroBytesConverter conv, byte[] 
serializedSimpleRecord) { 
+AvroGenericRecordWritable simpleRecordWritable = 
conv.getWritable(serializedSimpleRecord);
+
+Assert.assertNotNull(simpleRecordWritable);
+Assert.assertEquals(SimpleRecord.class, 
simpleRecordWritable.getRecord().getClass());
+
+SimpleRecord simpleRecordDeserialized = (SimpleRecord) 
simpleRecordWritable.getRecord();
+
+Assert.assertNotNull(simpleRecordDeserialized);
+Assert.assertEquals(simpleRecord, simpleRecordDeserialized);
+  }
+
+  /**
+   * Tests the default case of no skipped bytes per record works properly. 
+   */
+  @Test
+  public void convertWithAvroBytesConverter() {
+// Since the serialized version was created by Confluent, lets remove the 
first five bytes to get the actual message.
+byte[] simpleRecordWithNoOffset = 
Arrays.copyOfRange(simpleRecordConfluentBytes, 5, 
simpleRecordConfluentBytes.length);
+
+Schema schema = SimpleRecord.getClassSchema();
+KafkaSerDe.AvroBytesConverter conv = new 
KafkaSerDe.AvroBytesConverter(schema);
+runConversionTest(conv, simpleRecordWithNoOffset);
+  }
+
+  /**
+   * Tests that the skip converter skips 5 bytes properly, which matches what 
Confluent needs.
+   */
+  @Test
+  public void convertWithConfluentAvroBytesConverter() {
+Schema schema = SimpleRecord.getClassSchema();
+KafkaSerDe.AvroSkipBytesConverter conv = new 
KafkaSerDe.AvroSkipBytesConverter(schema, 5);
+runConversionTest(conv, simpleRecordConfluentBytes);
+  }
+
+  /**
+   * Tests that the skip converter skips a custom number of bytes properly.
+   */
+  @Test
+  public void convertWithCustomAvroSkipBytesConverter() {
+int offset = 2;
+// Remove all but two bytes of the five byte offset which Confluent adds, 
+// to simulate a message with only 2 bytes in front of each message.
+byte[] simpleRecordAsOffsetBytes = 

[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398134&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398134
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 05/Mar/20 04:59
Start Date: 05/Mar/20 04:59
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r388077675
 
 

 ##
 File path: kafka-handler/README.md
 ##
 @@ -50,6 +50,9 @@ ALTER TABLE
 SET TBLPROPERTIES (
   "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
 ```
+
+If you use Confluent's Avro serialzier or deserializer with the Confluent 
Schema Registry, you will need to remove five bytes from beginning of each 
message. These five bytes represent [a magic byte and a four-byte schema ID 
from 
registry.](https://docs.confluent.io/current/schema-registry/serializer-formatter.html#wire-format)
 
 Review comment:
   Typo: serializer
   
   Remove five bytes from *the* beginning of each
   
   From *the* Registry 
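
For readers unfamiliar with the prefix discussed in this README change, the following is a minimal Java sketch of how the five bytes could be split off by hand. It assumes the layout given in the linked Confluent wire-format documentation: one magic byte with value 0, then a four-byte big-endian schema ID, then the Avro payload. The class `ConfluentWireFormat` and its method names are hypothetical and are not part of Hive or of the Confluent client.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Illustrative helper only; not part of Hive or the Confluent client. */
final class ConfluentWireFormat {
  /** One magic byte plus a four-byte schema ID. */
  static final int HEADER_LENGTH = 5;

  /** Returns the schema ID stored in bytes 1..4 (big-endian). */
  static int schemaId(byte[] message) {
    checkHeader(message);
    // ByteBuffer reads big-endian by default, starting at offset 1.
    return ByteBuffer.wrap(message, 1, 4).getInt();
  }

  /** Returns the Avro payload that follows the five-byte header. */
  static byte[] avroPayload(byte[] message) {
    checkHeader(message);
    return Arrays.copyOfRange(message, HEADER_LENGTH, message.length);
  }

  private static void checkHeader(byte[] message) {
    if (message == null || message.length < HEADER_LENGTH) {
      throw new IllegalArgumentException("message is shorter than the five-byte header");
    }
    if (message[0] != 0) {
      throw new IllegalArgumentException("unexpected magic byte: " + message[0]);
    }
  }
}
```

Checking the magic byte before stripping is a design choice of this sketch: blindly skipping five bytes would silently corrupt messages that were not written by the Confluent serializer.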
 



Issue Time Tracking
---

Worklog Id: (was: 398134)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A],
> the Confluent Avro serializer uses a proprietary format for the Kafka value: 
> <magic_byte 0x00><4 bytes of schema ID><regular Avro bytes for an object that 
> conforms to the schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, 
> which respects the format. For the Hive Kafka handler, however, it is a problem 
> to correctly deserialize the Kafka value, because Hive uses a custom 
> deserializer from bytes to objects and ignores the Kafka consumer ser/deser 
> classes provided via table property.
> It would be nice to support the Confluent format with the magic byte.
> It would also be great to support the Schema Registry.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398130&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398130
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 05/Mar/20 04:52
Start Date: 05/Mar/20 04:52
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r388077250
 
 

 ##
 File path: kafka-handler/README.md
 ##
 @@ -50,6 +50,9 @@ ALTER TABLE
 SET TBLPROPERTIES (
   "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
 ```
+
+If you use Confluent Avro serialzier/deserializer with Schema Registry you may 
want to remove 5 bytes from beginning that represents magic byte + schema ID 
from registry.
 
 Review comment:
   Sorry if I missed it, but is there a test case for when you do skip the 
bytes, but there's no url or literal given? 
 



Issue Time Tracking
---

Worklog Id: (was: 398130)
Time Spent: 11h 50m  (was: 11h 40m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A],
> the Confluent Avro serializer uses a proprietary format for the Kafka value: 
> <magic_byte 0x00><4 bytes of schema ID><regular Avro bytes for an object that 
> conforms to the schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, 
> which respects the format. For the Hive Kafka handler, however, it is a problem 
> to correctly deserialize the Kafka value, because Hive uses a custom 
> deserializer from bytes to objects and ignores the Kafka consumer ser/deser 
> classes provided via table property.
> It would be nice to support the Confluent format with the magic byte.
> It would also be great to support the Schema Registry.





[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398127&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398127
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 05/Mar/20 04:49
Start Date: 05/Mar/20 04:49
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on issue #933: HIVE-21218: Adding 
support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#issuecomment-595026999
 
 
   You may want to rebase? I see your commit for 14888 in here 
 



Issue Time Tracking
---

Worklog Id: (was: 398127)
Time Spent: 11h 40m  (was: 11.5h)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A],
> the Confluent Avro serializer uses a proprietary format for the Kafka value: 
> <magic_byte 0x00><4 bytes of schema ID><regular Avro bytes for an object that 
> conforms to the schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, 
> which respects the format. For the Hive Kafka handler, however, it is a problem 
> to correctly deserialize the Kafka value, because Hive uses a custom 
> deserializer from bytes to objects and ignores the Kafka consumer ser/deser 
> classes provided via table property.
> It would be nice to support the Confluent format with the magic byte.
> It would also be great to support the Schema Registry.





[jira] [Commented] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051800#comment-17051800
 ] 

Hive QA commented on HIVE-21218:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 59s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 26s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 44s{color} | {color:blue} serde in master has 197 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 28s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 32s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 17s{color} | {color:red} kafka-handler in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 17s{color} | {color:red} kafka-handler in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 17s{color} | {color:red} kafka-handler in the patch failed. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 10s{color} | {color:red} kafka-handler: The patch generated 10 new + 1 unchanged - 0 fixed = 11 total (was 1) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 16s{color} | {color:red} kafka-handler in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 15s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 17m 28s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  xml  compile  findbugs  checkstyle  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20957/dev-support/hive-personality.sh |
| git revision | master / deebfb6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| mvninstall | http://104.198.109.242/logs//PreCommit-HIVE-Build-20957/yetus/patch-mvninstall-kafka-handler.txt |
| compile | http://104.198.109.242/logs//PreCommit-HIVE-Build-20957/yetus/patch-compile-kafka-handler.txt |
| javac | http://104.198.109.242/logs//PreCommit-HIVE-Build-20957/yetus/patch-compile-kafka-handler.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-20957/yetus/diff-checkstyle-kafka-handler.txt |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-20957/yetus/whitespace-eol.txt |
| findbugs | http://104.198.109.242/logs//PreCommit-HIVE-Build-20957/yetus/patch-findbugs-kafka-handler.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20957/yetus/patch-asflicense-problems.txt |
| modules | C: serde kafka-handler U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20957/yetus.txt |
| Powered by | Apache Yetus  http://yetus.apache.org |


This message was automatically generated.



> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  

[jira] [Commented] (HIVE-22977) Merge delta files instead of running a query in major/minor compaction

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051792#comment-17051792
 ] 

Hive QA commented on HIVE-22977:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12995635/HIVE-22977.01.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20956/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20956/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20956/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/DispatcherType.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/Filter.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/FilterChain.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/FilterConfig.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/ServletException.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/ServletRequest.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/ServletResponse.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/annotation/WebFilter.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/http/HttpServletRequest.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.27.v20190418/jetty-runner-9.3.27.v20190418.jar(javax/servlet/http/HttpServletResponse.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/apache-github-source-source/classification/target/hive-classification-4.0.0-SNAPSHOT.jar(org/apache/hadoop/hive/common/classification/InterfaceAudience$LimitedPrivate.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/apache-github-source-source/classification/target/hive-classification-4.0.0-SNAPSHOT.jar(org/apache/hadoop/hive/common/classification/InterfaceStability$Unstable.class)]]
[loading 
ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/io/ByteArrayOutputStream.class)]]
[loading 
ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/io/OutputStream.class)]]
[loading 
ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/io/Closeable.class)]]
[loading 
ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/lang/AutoCloseable.class)]]
[loading 
ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/io/Flushable.class)]]
[loading 
ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(javax/xml/bind/annotation/XmlRootElement.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/apache/commons/commons-exec/1.1/commons-exec-1.1.jar(org/apache/commons/exec/ExecuteException.class)]]
[loading 
ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/security/PrivilegedExceptionAction.class)]]
[loading 
ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/util/concurrent/ExecutionException.class)]]
[loading 
ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/util/concurrent/TimeoutException.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/apache/hadoop/hadoop-common/3.1.0/hadoop-common-3.1.0.jar(org/apache/hadoop/fs/FileSystem.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/apache-github-source-source/shims/common/target/hive-shims-common-4.0.0-SNAPSHOT.jar(org/apache/hadoop/hive/shims/HadoopShimsSecure.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/apache-github-source-source/shims/common/target/hive-shims-common-4.0.0-SNAPSHOT.jar(org/apache/hadoop/hive/shims/ShimLoader.class)]]
[loading 

[jira] [Commented] (HIVE-22972) Allow table id to be set for table creation requests

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051788#comment-17051788
 ] 

Hive QA commented on HIVE-22972:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12995658/HIVE-22972.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 18096 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.metastore.TestMetastoreHousekeepingLeaderEmptyConfig.testHouseKeepingThreadExistence
 (batchId=252)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20955/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20955/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20955/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12995658 - PreCommit-HIVE-Build

> Allow table id to be set for table creation requests
> 
>
> Key: HIVE-22972
> URL: https://issues.apache.org/jira/browse/HIVE-22972
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22972.01.patch, HIVE-22972.02.patch, 
> HIVE-22972.03.patch
>
>
> Hive Metastore should accept requests for table creation where the id is set, 
> ignoring it.





[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex

2020-03-04 Thread Gopal Vijayaraghavan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051768#comment-17051768
 ] 

Gopal Vijayaraghavan commented on HIVE-22966:
-

LGTM - +1

> LLAP: Consider including waitTime for comparing attempts in same vertex
> ---
>
> Key: HIVE-22966
> URL: https://issues.apache.org/jira/browse/HIVE-22966
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-22966.3.patch, HIVE-22966.4.patch
>
>
> When attempts are compared within same vertex, it should pick up the attempt 
> with longest wait time to avoid starvation.
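
The ordering described in this issue can be sketched as a comparator. This is an illustration only: the `Attempt` class, its fields, and `pickNext` are hypothetical names and do not reflect the actual LLAP scheduler code or the attached patch.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch; not the LLAP task scheduler implementation. */
final class AttemptOrderingSketch {

  /** Minimal stand-in for a task attempt that belongs to one vertex. */
  static final class Attempt {
    final String id;
    final long waitTimeMillis;

    Attempt(String id, long waitTimeMillis) {
      this.id = id;
      this.waitTimeMillis = waitTimeMillis;
    }
  }

  /** Orders attempts so that the one that has waited longest comes first. */
  static final Comparator<Attempt> LONGEST_WAIT_FIRST =
      Comparator.comparingLong((Attempt a) -> a.waitTimeMillis).reversed();

  /** Picks the attempt with the longest wait among attempts of the same vertex. */
  static Optional<Attempt> pickNext(List<Attempt> sameVertexAttempts) {
    return sameVertexAttempts.stream().min(LONGEST_WAIT_FIRST);
  }
}
```

Preferring the longest-waiting attempt means no attempt can be passed over indefinitely by newer attempts of the same vertex, which is the starvation concern the issue names.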





[jira] [Updated] (HIVE-22978) Fix decimal precision and scale inference for aggregate rewriting in Calcite

2020-03-04 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22978:
---
Attachment: HIVE-22978.patch

> Fix decimal precision and scale inference for aggregate rewriting in Calcite
> 
>
> Key: HIVE-22978
> URL: https://issues.apache.org/jira/browse/HIVE-22978
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-22978.patch
>
>
> Calcite rules can do rewritings of aggregate functions, e.g., {{avg}} into 
> {{sum/count}}. When type of {{avg}} is decimal, inference of intermediate 
> precision and scale for the division is not done correctly. The reason is 
> that we miss support for some types in method {{getDefaultPrecision}} in 
> {{HiveTypeSystemImpl}}. Additionally, {{deriveSumType}} should be overridden 
> in {{HiveTypeSystemImpl}} to abide by the Hive semantics for sum aggregate 
> type inference.
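
To make the sum-type part of this description concrete, here is a small Java sketch of one rule for the result type of a decimal sum. It rests on an assumption that should be checked against Hive's own sum UDAF before relying on it: that the result keeps the input scale and widens the precision by 10 digits, capped at a maximum precision of 38. The class `DecimalSumTypeSketch` is hypothetical and is not the `deriveSumType` override from the patch.

```java
/** Hypothetical sketch of a decimal sum type rule; not the code from the patch. */
final class DecimalSumTypeSketch {
  /** Assumed maximum decimal precision. */
  static final int MAX_PRECISION = 38;

  /**
   * Returns {precision, scale} for SUM over DECIMAL(precision, scale),
   * assuming the precision grows by 10 (capped) and the scale is unchanged.
   */
  static int[] sumType(int precision, int scale) {
    if (precision < 1 || precision > MAX_PRECISION || scale < 0 || scale > precision) {
      throw new IllegalArgumentException(
          "invalid decimal type: (" + precision + ", " + scale + ")");
    }
    return new int[] {Math.min(MAX_PRECISION, precision + 10), scale};
  }
}
```

Under this assumed rule, a column of type decimal(10,2) sums to decimal(20,2), while decimal(35,4) is capped at decimal(38,4). If the planner infers a different intermediate type for the rewritten `sum/count`, the final division can end up with a different precision and scale than a direct `avg` would produce.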





[jira] [Updated] (HIVE-22978) Fix decimal precision and scale inference for aggregate rewriting in Calcite

2020-03-04 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22978:
---
Status: Patch Available  (was: In Progress)

> Fix decimal precision and scale inference for aggregate rewriting in Calcite
> 
>
> Key: HIVE-22978
> URL: https://issues.apache.org/jira/browse/HIVE-22978
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> Calcite rules can do rewritings of aggregate functions, e.g., {{avg}} into 
> {{sum/count}}. When type of {{avg}} is decimal, inference of intermediate 
> precision and scale for the division is not done correctly. The reason is 
> that we miss support for some types in method {{getDefaultPrecision}} in 
> {{HiveTypeSystemImpl}}. Additionally, {{deriveSumType}} should be overridden 
> in {{HiveTypeSystemImpl}} to abide by the Hive semantics for sum aggregate 
> type inference.





[jira] [Work started] (HIVE-22978) Fix decimal precision and scale inference for aggregate rewriting in Calcite

2020-03-04 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-22978 started by Jesus Camacho Rodriguez.
--
> Fix decimal precision and scale inference for aggregate rewriting in Calcite
> 
>
> Key: HIVE-22978
> URL: https://issues.apache.org/jira/browse/HIVE-22978
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> Calcite rules can do rewritings of aggregate functions, e.g., {{avg}} into 
> {{sum/count}}. When type of {{avg}} is decimal, inference of intermediate 
> precision and scale for the division is not done correctly. The reason is 
> that we miss support for some types in method {{getDefaultPrecision}} in 
> {{HiveTypeSystemImpl}}. Additionally, {{deriveSumType}} should be overridden 
> in {{HiveTypeSystemImpl}} to abide by the Hive semantics for sum aggregate 
> type inference.





[jira] [Commented] (HIVE-22972) Allow table id to be set for table creation requests

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051759#comment-17051759
 ] 

Hive QA commented on HIVE-22972:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 24s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m 14s{color} | {color:blue} standalone-metastore/metastore-server in master has 185 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 20s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 14s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 15m 51s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20955/dev-support/hive-personality.sh |
| git revision | master / deebfb6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20955/yetus/patch-asflicense-problems.txt |
| modules | C: standalone-metastore/metastore-server U: standalone-metastore/metastore-server |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20955/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Allow table id to be set for table creation requests
> 
>
> Key: HIVE-22972
> URL: https://issues.apache.org/jira/browse/HIVE-22972
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22972.01.patch, HIVE-22972.02.patch, 
> HIVE-22972.03.patch
>
>
> Hive Metastore should accept requests for table creation where the id is set, 
> ignoring it.





[jira] [Assigned] (HIVE-22978) Fix decimal precision and scale inference for aggregate rewriting in Calcite

2020-03-04 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-22978:
--


> Fix decimal precision and scale inference for aggregate rewriting in Calcite
> 
>
> Key: HIVE-22978
> URL: https://issues.apache.org/jira/browse/HIVE-22978
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> Calcite rules can do rewritings of aggregate functions, e.g., {{avg}} into 
> {{sum/count}}. When type of {{avg}} is decimal, inference of intermediate 
> precision and scale for the division is not done correctly. The reason is 
> that we miss support for some types in method {{getDefaultPrecision}} in 
> {{HiveTypeSystemImpl}}. Additionally, {{deriveSumType}} should be overridden 
> in {{HiveTypeSystemImpl}} to abide by the Hive semantics for sum aggregate 
> type inference.





[jira] [Commented] (HIVE-22256) Rewriting fails when `IN` clause has items in different order in MV and query.

2020-03-04 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051753#comment-17051753
 ] 

Jesus Camacho Rodriguez commented on HIVE-22256:


Thanks for checking [~vgarg]. I will change the approach for the fix slightly 
(instead of moving the rules around) so those issues go away.

> Rewriting fails when `IN` clause has items in different order in MV and query.
> --
>
> Key: HIVE-22256
> URL: https://issues.apache.org/jira/browse/HIVE-22256
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-22256.patch, expr2.sql
>
>
> Rewriting fails on the following materialized view and query (the script is 
> also attached):
> create materialized view view2 stored as orc as (select prod_id, cust_id, 
> store_id, sale_date, qty, amt, descr from sales where cust_id in (1,2,3,4,5));
> explain extended select prod_id, cust_id from sales where cust_id in 
> (5,1,2,3,4);
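The two predicates above select the same rows, but their literal lists differ in order, so an order-sensitive comparison treats them as different. The sketch below shows one way to compare `IN` lists independently of order. It is an illustration of the problem only and is not the fix applied in Hive or Calcite; `InListNormalizer` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class InListNormalizer {
  // Sorts and de-duplicates the literals so that order no longer matters.
  static List<Integer> normalize(List<Integer> literals) {
    return new ArrayList<>(new TreeSet<>(literals));
  }

  // Two IN lists match when their normalized forms are equal.
  static boolean sameInList(List<Integer> a, List<Integer> b) {
    return normalize(a).equals(normalize(b));
  }
}
```

With this normalization, the list from the materialized view, (1,2,3,4,5), and the list from the query, (5,1,2,3,4), compare as equal.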





[jira] [Commented] (HIVE-22954) Schedule Repl Load using Hive Scheduler

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051745#comment-17051745
 ] 

Hive QA commented on HIVE-22954:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12995664/HIVE-22954.13.patch

{color:green}SUCCESS:{color} +1 due to 23 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 28 failed/errored test(s), 18089 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[repl_load_requires_admin] (batchId=107)
org.apache.hadoop.hive.ql.parse.TestReplAcidTablesWithJsonMessage.testMultiDBTxn (batchId=277)
org.apache.hadoop.hive.ql.parse.TestReplAcrossInstancesWithJsonMessageFormat.testBootStrapDumpOfWarehouse (batchId=268)
org.apache.hadoop.hive.ql.parse.TestReplAcrossInstancesWithJsonMessageFormat.testIncrementalDumpOfWarehouse (batchId=268)
org.apache.hadoop.hive.ql.parse.TestReplAcrossInstancesWithJsonMessageFormat.testMoveOptimizationIncrementalFailureAfterCopy (batchId=268)
org.apache.hadoop.hive.ql.parse.TestReplAcrossInstancesWithJsonMessageFormat.testMoveOptimizationIncrementalFailureAfterCopyReplace (batchId=268)
org.apache.hadoop.hive.ql.parse.TestReplTableMigrationWithJsonFormat.testIncrementalLoadMigrationManagedToAcidFailure (batchId=275)
org.apache.hadoop.hive.ql.parse.TestReplTableMigrationWithJsonFormat.testIncrementalLoadMigrationManagedToAcidFailurePart (batchId=275)
org.apache.hadoop.hive.ql.parse.TestReplWithJsonMessageFormat.testIncrementalLoadFailAndRetry (batchId=260)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalLoadFailAndRetry (batchId=269)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTables.testMultiDBTxn (batchId=279)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootStrapDumpOfWarehouse (batchId=273)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testIncrementalDumpOfWarehouse (batchId=273)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testMoveOptimizationIncrementalFailureAfterCopy (batchId=273)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testMoveOptimizationIncrementalFailureAfterCopyReplace (batchId=273)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcidFailure (batchId=263)
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigration.testIncrementalLoadMigrationManagedToAcidFailurePart (batchId=263)
org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenarios.testRetryFailure (batchId=281)
org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenariosACID.testRetryFailure (batchId=261)
org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenariosMM.testRetryFailure (batchId=266)
org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenariosMMNoAutogather.testRetryFailure (batchId=258)
org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenariosMigration.testRetryFailure (batchId=265)
org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenariosMigrationNoAutogather.testRetryFailure (batchId=276)
org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenariosNoAutogather.testRetryFailure (batchId=278)
org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow.testComplexQuery (batchId=293)
org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow.testKillQuery (batchId=293)
org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow.testKillQueryByTagNegative (batchId=293)
org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow.testLlapInputFormatEndToEnd (batchId=293)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20954/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20954/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20954/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 28 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12995664 - PreCommit-HIVE-Build

> Schedule Repl Load using Hive Scheduler
> ---
>
> Key: HIVE-22954
> URL: https://issues.apache.org/jira/browse/HIVE-22954
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, 
> HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, 
> HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, 
> 

[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398065=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398065
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 05/Mar/20 01:55
Start Date: 05/Mar/20 01:55
Worklog Time Spent: 10m 
  Work Description: davidov541 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r388038766
 
 

 ##
 File path: kafka-handler/README.md
 ##
 @@ -50,6 +50,9 @@ ALTER TABLE
 SET TBLPROPERTIES (
   "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
 ```
+
+If you use Confluent Avro serialzier/deserializer with Schema Registry you may want to remove 5 bytes from beginning that represents magic byte + schema ID from registry.
 
 Review comment:
   How about this: for now, we do not call this Confluent, and instead document 
that if you are using Confluent, you need to use skip bytes = 5. Once we 
implement the feature to use the schema ID properly, then we can use Confluent 
at that point. That way it is clear what functionality must be in place in 
order to have a separate SerDe type.
   
   In a related note, I don't see any reference to a JIRA to implement the 
feature to use the schema ID. Do either of you have that one? If not, should I 
create that and link it for reference?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398065)
Time Spent: 11.5h  (was: 11h 20m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value: 
> <magic byte><4 bytes of schema ID><Avro bytes that conform to schema>. 
> This format causes no problem for the Confluent Kafka deserializer, which 
> respects the format. For the Hive Kafka handler, however, it is a problem to 
> deserialize the Kafka value correctly, because Hive uses a custom 
> deserializer from bytes to objects and ignores the Kafka consumer ser/deser 
> classes provided via table property.
> It would be nice to support the Confluent format with the magic byte.
> It would also be great to support Schema Registry.
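The 5-byte prefix discussed in this issue can be read with a `ByteBuffer`. The sketch below assumes the Confluent wire format of one magic byte with value 0 followed by a 4-byte big-endian schema ID; it is not part of the Hive patch, and `ConfluentHeader` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ConfluentHeader {
  static final int HEADER_LENGTH = 5; // 1 magic byte + 4-byte schema ID
  static final byte MAGIC_BYTE = 0x0;

  // Returns the schema ID stored in bytes 1..4 of the Kafka value.
  static int schemaId(byte[] value) {
    if (value == null || value.length < HEADER_LENGTH) {
      throw new IllegalArgumentException("Value is shorter than the 5-byte header");
    }
    ByteBuffer buffer = ByteBuffer.wrap(value); // big-endian by default
    if (buffer.get() != MAGIC_BYTE) {
      throw new IllegalArgumentException("Unexpected magic byte");
    }
    return buffer.getInt();
  }

  // Returns the bytes after the header, which hold the Avro-encoded record.
  static byte[] avroPayload(byte[] value) {
    schemaId(value); // validates the header before slicing
    return Arrays.copyOfRange(value, HEADER_LENGTH, value.length);
  }
}
```

Skipping a fixed 5 bytes is enough to reach the Avro payload, while the schema ID is what a Schema Registry lookup would need.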





[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398062=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398062
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 05/Mar/20 01:52
Start Date: 05/Mar/20 01:52
Worklog Time Spent: 10m 
  Work Description: davidov541 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r388037768
 
 

 ##
 File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java
 ##
 @@ -369,6 +402,26 @@ private SubStructObjectInspector(StructObjectInspector baseOI, int toIndex) {
 }
   }
 
+/**
+ * The converter reads bytes from kafka message and skip first @skipBytes from beginning.
+ *
+ * For example:
+ *   The Confluent Avro serializer adds 5 magic bytes that represents Schema ID as Integer to the message.
+ */
+  static class AvroSkipBytesConverter extends AvroBytesConverter {
+private final int skipBytes;
+
+AvroSkipBytesConverter(Schema schema, int skipBytes) {
+  super(schema);
+  this.skipBytes = skipBytes;
+}
+
+@Override
+Decoder getDecoder(byte[] value) {
+  return DecoderFactory.get().binaryDecoder(value, this.skipBytes, value.length - this.skipBytes, null);
 
 Review comment:
   BinaryDecoder already throws a nice ArrayIndexOutOfBoundsException in this 
case, so I'm going to update to catch that, wrap in a SerDe exception, and keep 
going.
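The catch-and-wrap approach mentioned in this comment could look roughly like the sketch below. It is not the code from the pull request: `readAfterSkip` is a stand-in for the Avro decoder call so that the example runs without Avro on the classpath, and `SketchSerDeException` is a hypothetical, unchecked stand-in for Hive's `SerDeException`.

```java
import java.util.Arrays;

final class SkipBytesSketch {
  // Hypothetical stand-in for Hive's SerDeException (unchecked here for brevity).
  static final class SketchSerDeException extends RuntimeException {
    SketchSerDeException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Stand-in for the decoder call: returns value[skipBytes .. value.length).
  static byte[] readAfterSkip(byte[] value, int skipBytes) {
    if (skipBytes < 0 || skipBytes > value.length) {
      throw new ArrayIndexOutOfBoundsException(
          "skip of " + skipBytes + " bytes is outside a value of length " + value.length);
    }
    return Arrays.copyOfRange(value, skipBytes, value.length);
  }

  // Wraps the low-level failure so callers see one exception type.
  static byte[] decode(byte[] value, int skipBytes) {
    try {
      return readAfterSkip(value, skipBytes);
    } catch (ArrayIndexOutOfBoundsException e) {
      throw new SketchSerDeException(
          "Cannot skip " + skipBytes + " bytes of a " + value.length + "-byte Kafka value", e);
    }
  }
}
```

Keeping the original exception as the cause preserves the low-level detail, while the wrapper message names the configured skip count and the actual value length.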
 



Issue Time Tracking
---

Worklog Id: (was: 398062)
Time Spent: 11h 20m  (was: 11h 10m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value: 
> <magic byte><4 bytes of schema ID><Avro bytes that conform to schema>. 
> This format causes no problem for the Confluent Kafka deserializer, which 
> respects the format. For the Hive Kafka handler, however, it is a problem to 
> deserialize the Kafka value correctly, because Hive uses a custom 
> deserializer from bytes to objects and ignores the Kafka consumer ser/deser 
> classes provided via table property.
> It would be nice to support the Confluent format with the magic byte.
> It would also be great to support Schema Registry.





[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=398061=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398061
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 05/Mar/20 01:51
Start Date: 05/Mar/20 01:51
Worklog Time Spent: 10m 
  Work Description: davidov541 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r388037563
 
 

 ##
 File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java
 ##
 @@ -133,12 +134,40 @@
   Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema is empty Can not go further");
   Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty);
   LOG.debug("Building Avro Reader with schema {}", schemaFromProperty);
-  bytesConverter = new AvroBytesConverter(schema);
+  bytesConverter = getByteConverterForAvroDelegate(schema, tbl);
 } else {
   bytesConverter = new BytesWritableConverter();
 }
   }
 
+  enum BytesConverterType {
+CONFLUENT,
+SKIP,
+NONE;
+
+static BytesConverterType fromString(String value) {
+  try {
+return BytesConverterType.valueOf(value.trim().toUpperCase());
+  } catch (Exception e){
+return NONE;
+  }
+}
+  }
+
+  BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties tbl) {
+String avroBytesConverterPropertyName = AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_TYPE.getPropName();
+String avroBytesConverterProperty = tbl.getProperty(avroBytesConverterPropertyName, 
+  BytesConverterType.NONE.toString());
+BytesConverterType avroByteConverterType = BytesConverterType.fromString(avroBytesConverterProperty);
+String avroSkipBytesPropertyName = AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES.getPropName();
+Integer avroSkipBytes = Integer.parseInt(tbl.getProperty(avroSkipBytesPropertyName));
+switch (avroByteConverterType) {
+  case CONFLUENT: return new AvroSkipBytesConverter(schema, 5);
+  case SKIP: return new AvroSkipBytesConverter(schema, avroSkipBytes);
+  default: return new AvroBytesConverter(schema);
 
 Review comment:
   This would be more confusing to me than the current code, personally. I will 
call out the NONE case, however, along with an error if it's not one of those 
three.
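An explicit `NONE` case plus an error for anything else, as described in this comment, might be sketched as follows. This is not the code from the pull request: the property names `avro.serde.type` and `avro.serde.skip.bytes` are assumptions, the method returns only the number of bytes to skip rather than a converter, and `ConverterSelectionSketch` is a hypothetical name. The sketch also reads the skip count only in the `SKIP` branch, because `Integer.parseInt` throws `NumberFormatException` when the property is absent.

```java
import java.util.Locale;
import java.util.Properties;

final class ConverterSelectionSketch {
  enum BytesConverterType { CONFLUENT, SKIP, NONE }

  // Assumed property names; check them against AvroSerdeUtils.
  static final String TYPE_PROP = "avro.serde.type";
  static final String SKIP_PROP = "avro.serde.skip.bytes";

  // Returns how many leading bytes to skip for the configured converter type.
  static int skipBytesFor(Properties tbl) {
    String raw = tbl.getProperty(TYPE_PROP, BytesConverterType.NONE.name());
    BytesConverterType type;
    try {
      type = BytesConverterType.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      // Unknown values are reported instead of silently falling back to NONE.
      throw new IllegalArgumentException("Unsupported " + TYPE_PROP + ": " + raw, e);
    }
    switch (type) {
      case CONFLUENT:
        return 5; // magic byte + 4-byte schema ID
      case SKIP: {
        String rawSkip = tbl.getProperty(SKIP_PROP);
        if (rawSkip == null) {
          throw new IllegalArgumentException(SKIP_PROP + " is required for type SKIP");
        }
        return Integer.parseInt(rawSkip.trim());
      }
      case NONE:
        return 0;
      default:
        throw new IllegalStateException("Unhandled converter type: " + type);
    }
  }
}
```

Failing on an unrecognized type surfaces a misspelled table property at table setup rather than as garbled rows at read time.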
 



Issue Time Tracking
---

Worklog Id: (was: 398061)
Time Spent: 11h 10m  (was: 11h)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value: 
> <magic byte><4 bytes of schema ID><Avro bytes that conform to schema>. 
> This format causes no problem for the Confluent Kafka deserializer, which 
> respects the format. For the Hive Kafka handler, however, it is a problem to 
> deserialize the Kafka value correctly, because Hive uses a custom 
> deserializer from bytes to objects and ignores the Kafka consumer ser/deser 
> classes provided via table property.
> It would be nice to support the Confluent format with the magic byte.
> It would also be great to support Schema Registry.





[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler

2020-03-04 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-22954:
---
Status: In Progress  (was: Patch Available)

> Schedule Repl Load using Hive Scheduler
> ---
>
> Key: HIVE-22954
> URL: https://issues.apache.org/jira/browse/HIVE-22954
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, 
> HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, 
> HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, 
> HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.11.patch, 
> HIVE-22954.12.patch, HIVE-22954.13.patch, HIVE-22954.15.patch, 
> HIVE-22954.patch
>
>
> [https://github.com/apache/hive/pull/932]





[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler

2020-03-04 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-22954:
---
Attachment: HIVE-22954.15.patch
Status: Patch Available  (was: In Progress)

> Schedule Repl Load using Hive Scheduler
> ---
>
> Key: HIVE-22954
> URL: https://issues.apache.org/jira/browse/HIVE-22954
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, 
> HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, 
> HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, 
> HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.11.patch, 
> HIVE-22954.12.patch, HIVE-22954.13.patch, HIVE-22954.15.patch, 
> HIVE-22954.patch
>
>
> [https://github.com/apache/hive/pull/932]





[jira] [Commented] (HIVE-22954) Schedule Repl Load using Hive Scheduler

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051726#comment-17051726
 ] 

Hive QA commented on HIVE-22954:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m  4s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 20s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 57s{color} | {color:blue} parser in master has 3 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 51s{color} | {color:blue} ql in master has 1531 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 43s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 42s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 41s{color} | {color:red} ql: The patch generated 2 new + 44 unchanged - 0 fixed = 46 total (was 44) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 29s{color} | {color:red} itests/hive-unit: The patch generated 1 new + 1327 unchanged - 1 fixed = 1328 total (was 1328) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m  4s{color} | {color:red} ql generated 1 new + 1531 unchanged - 0 fixed = 1532 total (was 1531) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 17s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 35m 54s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Boxing/unboxing to parse a primitive org.apache.hadoop.hive.ql.parse.ReplicationSemanticAnalyzer.getCurrentLoadPath()  At ReplicationSemanticAnalyzer.java:org.apache.hadoop.hive.ql.parse.ReplicationSemanticAnalyzer.getCurrentLoadPath()  At ReplicationSemanticAnalyzer.java:[line 446] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20954/dev-support/hive-personality.sh |
| git revision | master / deebfb6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-20954/yetus/diff-checkstyle-ql.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-20954/yetus/diff-checkstyle-itests_hive-unit.txt |
| findbugs | http://104.198.109.242/logs//PreCommit-HIVE-Build-20954/yetus/new-findbugs-ql.html |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20954/yetus/patch-asflicense-problems.txt |
| modules | C: parser ql itests/hive-unit U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20954/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Schedule 

[jira] [Commented] (HIVE-22875) Refactor query creation in QueryCompactor implementations

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051708#comment-17051708
 ] 

Hive QA commented on HIVE-22875:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12995594/HIVE-22875.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 18096 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.TestWarehouseExternalDir.testManagedPaths (batchId=270)
org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez.testMmMajorCompactionAfterMinor (batchId=270)
org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez.testMultipleMmMinorCompactions (batchId=270)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20953/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20953/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20953/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12995594 - PreCommit-HIVE-Build

> Refactor query creation in QueryCompactor implementations
> -
>
> Key: HIVE-22875
> URL: https://issues.apache.org/jira/browse/HIVE-22875
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-22875.01.patch
>
>
> There is a lot of repetition where creation/compaction/drop queries are 
> created in MajorQueryCompactor, MinorQueryCompactor, MmMajorQueryCompactor 
> and MmMinorQueryCompactor.
> Initial idea is to create a CompactionQueryBuilder that all 4 implementations 
> would use.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22875) Refactor query creation in QueryCompactor implementations

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051694#comment-17051694
 ] 

Hive QA commented on HIVE-22875:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
49s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 2s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
50s{color} | {color:blue} ql in master has 1531 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
42s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
21s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
40s{color} | {color:red} ql: The patch generated 3 new + 0 unchanged - 0 fixed 
= 3 total (was 0) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
15s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 31m 29s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-20953/dev-support/hive-personality.sh
 |
| git revision | master / deebfb6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20953/yetus/diff-checkstyle-ql.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20953/yetus/whitespace-eol.txt
 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20953/yetus/patch-asflicense-problems.txt
 |
| modules | C: ql itests/hive-unit U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20953/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Refactor query creation in QueryCompactor implementations
> -
>
> Key: HIVE-22875
> URL: https://issues.apache.org/jira/browse/HIVE-22875
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-22875.01.patch
>
>
> There is a lot of repetition where creation/compaction/drop queries are 
> created in MajorQueryCompactor, MinorQueryCompactor, MmMajorQueryCompactor 
> and MmMinorQueryCompactor.
> Initial idea is to create a CompactionQueryBuilder that all 4 implementations 
> would use.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397993=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397993
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 23:45
Start Date: 04/Mar/20 23:45
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r388001426
 
 

 ##
 File path: kafka-handler/README.md
 ##
 @@ -50,6 +50,9 @@ ALTER TABLE
 SET TBLPROPERTIES (
   "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
 ```
+
+If you use Confluent Avro serialzier/deserializer with Schema Registry you may 
want to remove 5 bytes from beginning that represents magic byte + schema ID 
from registry.
+It can be done by setting `"avro.serde.type"="confluent"` or 
`"avro.serde.type"="skip"` with `"avro.serde.skip.bytes"="5"`. It's recommended 
to set an avro schema via 
`"avro.schema.url"="http://hostname/SimpleDocument.avsc"` or 
`"avro.schema.literal"="{"type" : "record","name" : "SimpleRecord","..."}`. If 
both properties are set then `avro.schema.literal` has higher priority.
 
 Review comment:
   Is it still recommended to set the literal if using `confluent`?
   
   Link https://github.com/confluentinc/schema-registry/pull/686
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397993)
Time Spent: 11h  (was: 10h 50m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value - 
> <magic_byte 0x00><4 bytes of schema ID><regular avro bytes for object that conforms to schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, which 
> respects the format; however, for the Hive Kafka handler it's a bit of a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from 
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via 
> table property.
> It would be nice to support the Confluent format with the magic byte.
> It would also be great to support Schema Registry.
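
For readers of this thread, a minimal sketch of the prefix handling discussed in the issue, assuming the commonly documented Confluent framing of one magic byte (0x00) plus a 4-byte schema ID ahead of the Avro payload. The class and method names (`ConfluentPrefixSketch`, `stripPrefix`) are hypothetical and are not part of Hive or the Confluent client:

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Hive: drops a fixed-size prefix before Avro decoding. */
public class ConfluentPrefixSketch {
  /** Assumed prefix size: 1 magic byte + 4 bytes of schema ID. */
  public static final int CONFLUENT_PREFIX_BYTES = 5;

  /** Returns a copy of the Kafka value without its first {@code skip} bytes. */
  public static byte[] stripPrefix(byte[] value, int skip) {
    if (skip < 0 || skip > value.length) {
      throw new IllegalArgumentException(
          "Cannot skip " + skip + " of " + value.length + " bytes");
    }
    return Arrays.copyOfRange(value, skip, value.length);
  }

  public static void main(String[] args) {
    // Magic byte 0x00, schema ID 1 (big-endian), then three placeholder payload bytes.
    byte[] kafkaValue = {0, 0, 0, 0, 1, 10, 20, 30};
    byte[] avroPayload = stripPrefix(kafkaValue, CONFLUENT_PREFIX_BYTES);
    System.out.println(Arrays.toString(avroPayload)); // prints [10, 20, 30]
  }
}
```

Keeping the skip count a parameter rather than hard-coding 5 mirrors the distinction in the pull request between a Confluent-specific setting and a generic skip of N bytes.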



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397991=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397991
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 23:43
Start Date: 04/Mar/20 23:43
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387999324
 
 

 ##
 File path: 
kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java
 ##
 @@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.kafka;
+
+import com.google.common.collect.Maps;
+import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient;
+import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig;
+import io.confluent.kafka.serializers.KafkaAvroSerializer;
+import org.apache.avro.Schema;
+import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Map;
+
+/**
+ * Test class for Hive Kafka Avro bytes converter.
+ */
+public class AvroBytesConverterTest {
+  private static SimpleRecord simpleRecord1 = 
SimpleRecord.newBuilder().setId("123").setName("test").build();
+  private static byte[] simpleRecord1AsBytes;
+
+  /**
+   * Emulate confluent avro producer that add 4 magic bits (int) before value 
bytes. The int represents the schema ID from schema registry.
+   */
+  @BeforeClass
+  public static void setUp() {
+Map config = Maps.newHashMap();
+config.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, 
"http://localhost:8081");
+KafkaAvroSerializer avroSerializer = new KafkaAvroSerializer(new 
MockSchemaRegistryClient());
+avroSerializer.configure(config, false);
+simpleRecord1AsBytes = avroSerializer.serialize("temp", simpleRecord1);
+  }
+
+  /**
+   * Emulate - avro.serde.type = none (Default).
+   */
+  @Test
+  public void convertWithAvroBytesConverter() {
+Schema schema = SimpleRecord.getClassSchema();
+KafkaSerDe.AvroBytesConverter conv = new 
KafkaSerDe.AvroBytesConverter(schema);
+AvroGenericRecordWritable simpleRecord1Writable = 
conv.getWritable(simpleRecord1AsBytes);
+
+Assert.assertNotNull(simpleRecord1Writable);
+Assert.assertEquals(SimpleRecord.class, 
simpleRecord1Writable.getRecord().getClass());
+
+SimpleRecord simpleRecord1Deserialized = (SimpleRecord) 
simpleRecord1Writable.getRecord();
+
+Assert.assertNotNull(simpleRecord1Deserialized);
+Assert.assertNotEquals(simpleRecord1, simpleRecord1Deserialized);
+  }
+
+  /**
+   * Emulate - avro.serde.type = confluent.
+   */
+  @Test
+  public void convertWithConfluentAvroBytesConverter() {
+Schema schema = SimpleRecord.getClassSchema();
+KafkaSerDe.AvroSkipBytesConverter conv = new 
KafkaSerDe.AvroSkipBytesConverter(schema, 5);
+AvroGenericRecordWritable simpleRecord1Writable = 
conv.getWritable(simpleRecord1AsBytes);
+
+Assert.assertNotNull(simpleRecord1Writable);
+Assert.assertEquals(SimpleRecord.class, 
simpleRecord1Writable.getRecord().getClass());
+
+SimpleRecord simpleRecord1Deserialized = (SimpleRecord) 
simpleRecord1Writable.getRecord();
+
+Assert.assertNotNull(simpleRecord1Deserialized);
+Assert.assertEquals(simpleRecord1, simpleRecord1Deserialized);
 
 Review comment:
   What does the `1` represent here?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397991)
Time Spent: 10h 50m  (was: 10h 40m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> 

[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397988
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 23:43
Start Date: 04/Mar/20 23:43
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r388000371
 
 

 ##
 File path: kafka-handler/pom.xml
 ##
 @@ -118,8 +118,27 @@
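   <version>1.7.30</version>
   <scope>test</scope>
 </dependency>
+<dependency>
+  <groupId>io.confluent</groupId>
+  <artifactId>kafka-avro-serializer</artifactId>
+  <version>5.4.0</version>
+  <scope>test</scope>
+  <exclusions>
+<exclusion>
+  <groupId>org.apache.avro</groupId>
+  <artifactId>avro</artifactId>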
 
 Review comment:
   Avro itself is still needed as a compile-time  & runtime dependency 
elsewhere. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397988)
Time Spent: 10h 40m  (was: 10.5h)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value - 
> <magic_byte 0x00><4 bytes of schema ID><regular avro bytes for object that conforms to schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, which 
> respects the format; however, for the Hive Kafka handler it's a bit of a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from 
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via 
> table property.
> It would be nice to support the Confluent format with the magic byte.
> It would also be great to support Schema Registry.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397983=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397983
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 23:43
Start Date: 04/Mar/20 23:43
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387997559
 
 

 ##
 File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java
 ##
 @@ -133,12 +134,40 @@
   Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema 
is empty Can not go further");
   Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty);
   LOG.debug("Building Avro Reader with schema {}", schemaFromProperty);
-  bytesConverter = new AvroBytesConverter(schema);
+  bytesConverter = getByteConverterForAvroDelegate(schema, tbl);
 } else {
   bytesConverter = new BytesWritableConverter();
 }
   }
 
+  enum BytesConverterType {
+CONFLUENT,
 
 Review comment:
   Overall, I'm somewhat in agreement with @b-slim here. There is little reason 
to make a specific "subtype" if it is just documented that `avro.skip.bytes=5` 
will get the necessary Avro payload. 
   
   **However**, you would not know _which_ of those 5 bytes actually represents 
the schema ID in order to set `schema.literal` behind the scenes.  
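
To make the byte layout in this comment concrete, here is a sketch that reads the schema ID out of the prefix, assuming the Confluent framing of a 0x00 magic byte at index 0 followed by a big-endian 4-byte ID at indexes 1 to 4. `SchemaIdSketch` and `readSchemaId` are illustrative names only, not Hive or Confluent APIs:

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: extracts the schema ID from a Confluent-framed Kafka value. */
public class SchemaIdSketch {
  /** Reads the 4-byte big-endian schema ID that follows the magic byte. */
  public static int readSchemaId(byte[] value) {
    if (value.length < 5) {
      throw new IllegalArgumentException("Value is shorter than the 5-byte prefix");
    }
    if (value[0] != 0) {
      throw new IllegalArgumentException("Unexpected magic byte: " + value[0]);
    }
    // ByteBuffer is big-endian by default; wrap bytes 1..4 and read one int.
    return ByteBuffer.wrap(value, 1, 4).getInt();
  }

  public static void main(String[] args) {
    // Magic byte, schema ID 0x00000102, then one placeholder payload byte.
    byte[] kafkaValue = {0, 0, 0, 1, 2, 9};
    System.out.println(readSchemaId(kafkaValue)); // prints 258
  }
}
```

A plain skip of 5 bytes discards this ID, which is why a generic skip setting alone cannot be used to look up the writer schema.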
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397983)
Time Spent: 10h 10m  (was: 10h)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value - 
> <magic_byte 0x00><4 bytes of schema ID><regular avro bytes for object that conforms to schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, which 
> respects the format; however, for the Hive Kafka handler it's a bit of a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from 
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via 
> table property.
> It would be nice to support the Confluent format with the magic byte.
> It would also be great to support Schema Registry.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397989=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397989
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 23:43
Start Date: 04/Mar/20 23:43
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387999601
 
 

 ##
 File path: 
kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java
 ##
 @@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.kafka;
+
+import com.google.common.collect.Maps;
+import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient;
+import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig;
+import io.confluent.kafka.serializers.KafkaAvroSerializer;
+import org.apache.avro.Schema;
+import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Map;
+
+/**
+ * Test class for Hive Kafka Avro bytes converter.
+ */
+public class AvroBytesConverterTest {
+  private static SimpleRecord simpleRecord1 = 
SimpleRecord.newBuilder().setId("123").setName("test").build();
+  private static byte[] simpleRecord1AsBytes;
+
+  /**
+   * Emulate confluent avro producer that add 4 magic bits (int) before value 
bytes. The int represents the schema ID from schema registry.
+   */
+  @BeforeClass
+  public static void setUp() {
+Map config = Maps.newHashMap();
+config.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, 
"http://localhost:8081");
+KafkaAvroSerializer avroSerializer = new KafkaAvroSerializer(new 
MockSchemaRegistryClient());
+avroSerializer.configure(config, false);
+simpleRecord1AsBytes = avroSerializer.serialize("temp", simpleRecord1);
+  }
+
+  /**
+   * Emulate - avro.serde.type = none (Default).
+   */
+  @Test
+  public void convertWithAvroBytesConverter() {
+Schema schema = SimpleRecord.getClassSchema();
+KafkaSerDe.AvroBytesConverter conv = new 
KafkaSerDe.AvroBytesConverter(schema);
+AvroGenericRecordWritable simpleRecord1Writable = 
conv.getWritable(simpleRecord1AsBytes);
+
+Assert.assertNotNull(simpleRecord1Writable);
+Assert.assertEquals(SimpleRecord.class, 
simpleRecord1Writable.getRecord().getClass());
+
+SimpleRecord simpleRecord1Deserialized = (SimpleRecord) 
simpleRecord1Writable.getRecord();
+
+Assert.assertNotNull(simpleRecord1Deserialized);
+Assert.assertNotEquals(simpleRecord1, simpleRecord1Deserialized);
+  }
+
+  /**
+   * Emulate - avro.serde.type = confluent.
+   */
+  @Test
+  public void convertWithConfluentAvroBytesConverter() {
+Schema schema = SimpleRecord.getClassSchema();
+KafkaSerDe.AvroSkipBytesConverter conv = new 
KafkaSerDe.AvroSkipBytesConverter(schema, 5);
+AvroGenericRecordWritable simpleRecord1Writable = 
conv.getWritable(simpleRecord1AsBytes);
+
+Assert.assertNotNull(simpleRecord1Writable);
+Assert.assertEquals(SimpleRecord.class, 
simpleRecord1Writable.getRecord().getClass());
+
+SimpleRecord simpleRecord1Deserialized = (SimpleRecord) 
simpleRecord1Writable.getRecord();
+
+Assert.assertNotNull(simpleRecord1Deserialized);
+Assert.assertEquals(simpleRecord1, simpleRecord1Deserialized);
+  }
+
+  /**
+   * Emulate - avro.serde.type = skip.
+   */
+  @Test
+  public void convertWithCustomAvroSkipBytesConverter() {
+int offset = 2;
+byte[] simpleRecordAsOffsetBytes = 
Arrays.copyOfRange(simpleRecord1AsBytes, 5 - offset, 
simpleRecord1AsBytes.length);
+
+Schema schema = SimpleRecord.getClassSchema();
+KafkaSerDe.AvroSkipBytesConverter conv = new 
KafkaSerDe.AvroSkipBytesConverter(schema, offset);
+AvroGenericRecordWritable simpleRecord1Writable = 
conv.getWritable(simpleRecordAsOffsetBytes);
+
+Assert.assertNotNull(simpleRecord1Writable);
+

[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397990=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397990
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 23:43
Start Date: 04/Mar/20 23:43
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387998699
 
 

 ##
 File path: 
kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java
 ##
 @@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.kafka;
+
+import com.google.common.collect.Maps;
+import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient;
+import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig;
+import io.confluent.kafka.serializers.KafkaAvroSerializer;
+import org.apache.avro.Schema;
+import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Map;
+
+/**
+ * Test class for Hive Kafka Avro bytes converter.
+ */
+public class AvroBytesConverterTest {
+  private static SimpleRecord simpleRecord1 = 
SimpleRecord.newBuilder().setId("123").setName("test").build();
+  private static byte[] simpleRecord1AsBytes;
+
+  /**
+   * Emulate confluent avro producer that add 4 magic bits (int) before value 
bytes. The int represents the schema ID from schema registry.
 
 Review comment:
   nit: `Confluent` is a noun/company
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397990)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value - 
> <magic_byte 0x00><4 bytes of schema ID><regular avro bytes for object that conforms to schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, which 
> respects the format; however, for the Hive Kafka handler it's a bit of a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from 
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via 
> table property.
> It would be nice to support the Confluent format with the magic byte.
> It would also be great to support Schema Registry.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397985
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 23:43
Start Date: 04/Mar/20 23:43
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387998774
 
 

 ##
 File path: 
kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java
 ##
 @@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.kafka;
+
+import com.google.common.collect.Maps;
+import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient;
+import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig;
+import io.confluent.kafka.serializers.KafkaAvroSerializer;
+import org.apache.avro.Schema;
+import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Map;
+
+/**
+ * Test class for Hive Kafka Avro bytes converter.
+ */
+public class AvroBytesConverterTest {
+  private static SimpleRecord simpleRecord1 = 
SimpleRecord.newBuilder().setId("123").setName("test").build();
+  private static byte[] simpleRecord1AsBytes;
+
+  /**
+   * Emulate confluent avro producer that add 4 magic bits (int) before value 
bytes. The int represents the schema ID from schema registry.
 
 Review comment:
   `Schema Registry` is a noun/product
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397985)
Time Spent: 10h 20m  (was: 10h 10m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A],
> the Confluent Avro serializer uses a proprietary format for the Kafka value: 
> <magic byte><4 bytes of schema ID><Avro bytes of an object that conforms to the schema>. 
> This format causes no problem for the Confluent Kafka deserializer, which 
> respects the format. For the Hive Kafka handler, however, it is a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from 
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via 
> table property.
> It would be nice to support the Confluent format with its magic byte.
> It would also be great to support Schema Registry.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397992=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397992
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 23:43
Start Date: 04/Mar/20 23:43
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387999851
 
 

 ##
 File path: 
kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java
 ##
 @@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.kafka;
+
+import com.google.common.collect.Maps;
+import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient;
+import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig;
+import io.confluent.kafka.serializers.KafkaAvroSerializer;
+import org.apache.avro.Schema;
+import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Map;
+
+/**
+ * Test class for Hive Kafka Avro bytes converter.
+ */
+public class AvroBytesConverterTest {
+  private static SimpleRecord simpleRecord1 = 
SimpleRecord.newBuilder().setId("123").setName("test").build();
+  private static byte[] simpleRecord1AsBytes;
+
+  /**
+   * Emulate confluent avro producer that add 4 magic bits (int) before value 
bytes. The int represents the schema ID from schema registry.
+   */
+  @BeforeClass
+  public static void setUp() {
+Map config = Maps.newHashMap();
+config.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, 
"http://localhost:8081");
+KafkaAvroSerializer avroSerializer = new KafkaAvroSerializer(new 
MockSchemaRegistryClient());
+avroSerializer.configure(config, false);
+simpleRecord1AsBytes = avroSerializer.serialize("temp", simpleRecord1);
+  }
+
+  /**
+   * Emulate - avro.serde.type = none (Default).
+   */
+  @Test
+  public void convertWithAvroBytesConverter() {
+Schema schema = SimpleRecord.getClassSchema();
+KafkaSerDe.AvroBytesConverter conv = new 
KafkaSerDe.AvroBytesConverter(schema);
+AvroGenericRecordWritable simpleRecord1Writable = 
conv.getWritable(simpleRecord1AsBytes);
+
+Assert.assertNotNull(simpleRecord1Writable);
+Assert.assertEquals(SimpleRecord.class, 
simpleRecord1Writable.getRecord().getClass());
+
+SimpleRecord simpleRecord1Deserialized = (SimpleRecord) 
simpleRecord1Writable.getRecord();
+
+Assert.assertNotNull(simpleRecord1Deserialized);
+Assert.assertNotEquals(simpleRecord1, simpleRecord1Deserialized);
+  }
+
+  /**
+   * Emulate - avro.serde.type = confluent.
+   */
+  @Test
+  public void convertWithConfluentAvroBytesConverter() {
+Schema schema = SimpleRecord.getClassSchema();
+KafkaSerDe.AvroSkipBytesConverter conv = new 
KafkaSerDe.AvroSkipBytesConverter(schema, 5);
+AvroGenericRecordWritable simpleRecord1Writable = 
conv.getWritable(simpleRecord1AsBytes);
+
+Assert.assertNotNull(simpleRecord1Writable);
+Assert.assertEquals(SimpleRecord.class, 
simpleRecord1Writable.getRecord().getClass());
+
+SimpleRecord simpleRecord1Deserialized = (SimpleRecord) 
simpleRecord1Writable.getRecord();
+
+Assert.assertNotNull(simpleRecord1Deserialized);
+Assert.assertEquals(simpleRecord1, simpleRecord1Deserialized);
+  }
+
+  /**
+   * Emulate - avro.serde.type = skip.
 
 Review comment:
   You have three methods that roughly do the same thing. 
   
   Might be a good use case for 
https://github.com/junit-team/junit4/wiki/Parameterized-tests
 



Issue Time Tracking
---

Worklog Id: 

[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397984
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 23:43
Start Date: 04/Mar/20 23:43
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387997714
 
 

 ##
 File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java
 ##
 @@ -133,12 +134,40 @@
   Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema 
is empty Can not go further");
   Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty);
   LOG.debug("Building Avro Reader with schema {}", schemaFromProperty);
-  bytesConverter = new AvroBytesConverter(schema);
+  bytesConverter = getByteConverterForAvroDelegate(schema, tbl);
 } else {
   bytesConverter = new BytesWritableConverter();
 }
   }
 
+  enum BytesConverterType {
+CONFLUENT,
+SKIP,
+NONE;
+
+static BytesConverterType fromString(String value) {
+  try {
+return BytesConverterType.valueOf(value.trim().toUpperCase());
+  } catch (Exception e){
+return NONE;
+  }
+}
+  }
+
+  BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties 
tbl) {
+String avroBytesConverterPropertyName = 
AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_TYPE.getPropName();
+String avroBytesConverterProperty = 
tbl.getProperty(avroBytesConverterPropertyName, 
+  BytesConverterType.NONE.toString());
+BytesConverterType avroByteConverterType = 
BytesConverterType.fromString(avroBytesConverterProperty);
+String avroSkipBytesPropertyName = 
AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES.getPropName();
+Integer avroSkipBytes = 
Integer.parseInt(tbl.getProperty(avroSkipBytesPropertyName));
 
 Review comment:
   nit: `catch (NumberFormatException e)` here; `Integer.parseInt` throws it when the property is missing or not a number.
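
A minimal sketch of what that guard could look like, using only the JDK. The class `SkipBytesProperty` and its fallback-to-default behaviour are assumptions made for illustration and are not part of the patch; the property name is the one quoted in the README change in this thread.

```java
import java.util.Properties;

/** Hypothetical helper, not part of the patch under review. */
final class SkipBytesProperty {
  // Property name as quoted in the README text of this pull request.
  static final String SKIP_BYTES = "avro.serde.skip.bytes";

  private SkipBytesProperty() { }

  /**
   * Returns the configured number of bytes to skip, or defaultValue when the
   * property is absent, not a number, or negative.
   */
  static int parse(Properties tbl, int defaultValue) {
    String raw = tbl.getProperty(SKIP_BYTES);
    if (raw == null) {
      // Integer.parseInt(null) would throw NumberFormatException, so handle it up front.
      return defaultValue;
    }
    try {
      int parsed = Integer.parseInt(raw.trim());
      return parsed >= 0 ? parsed : defaultValue;
    } catch (NumberFormatException e) {
      return defaultValue;
    }
  }
}
```

Falling back silently is only one option; rethrowing with a message that names the offending property may be preferable, since a mistyped table property would otherwise go unnoticed.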
 



Issue Time Tracking
---

Worklog Id: (was: 397984)
Time Spent: 10h 20m  (was: 10h 10m)



[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397982=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397982
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 23:43
Start Date: 04/Mar/20 23:43
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387995318
 
 

 ##
 File path: kafka-handler/README.md
 ##
 @@ -50,6 +50,9 @@ ALTER TABLE
 SET TBLPROPERTIES (
   "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
 ```
+
+If you use Confluent Avro serialzier/deserializer with Schema Registry you may 
want to remove 5 bytes from beginning that represents magic byte + schema ID 
from registry.
 
 Review comment:
   ```suggestion
   If you use Confluent's Avro serializer/deserializer with their Schema 
Registry, you may want to skip the first 5 bytes of the Kafka field, which 
hold a magic byte (0x0) and a 4-byte integer schema ID.
   ```
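
To make the 5-byte prefix concrete, here is a small sketch that reads it with `java.nio.ByteBuffer`. The class `ConfluentWireHeader` and its method names are hypothetical; the layout (one magic byte `0x0`, then a 4-byte big-endian schema ID) follows the wire-format description referenced in this thread.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper illustrating the 5-byte prefix; not part of the patch. */
final class ConfluentWireHeader {
  static final byte MAGIC_BYTE = 0x0;
  static final int HEADER_LENGTH = 5; // 1 magic byte + 4-byte schema ID

  private ConfluentWireHeader() { }

  /** Returns the schema ID stored in bytes 1..4, after checking the magic byte. */
  static int schemaId(byte[] value) {
    if (value == null || value.length < HEADER_LENGTH) {
      throw new IllegalArgumentException("Value is shorter than the 5-byte header");
    }
    ByteBuffer buffer = ByteBuffer.wrap(value); // ByteBuffer is big-endian by default
    if (buffer.get() != MAGIC_BYTE) {
      throw new IllegalArgumentException("Unexpected magic byte");
    }
    return buffer.getInt();
  }

  /** Number of payload bytes that follow the header. */
  static int payloadLength(byte[] value) {
    schemaId(value); // reuse the header validation
    return value.length - HEADER_LENGTH;
  }
}
```

For example, a value whose first five bytes are `0, 0, 0, 0, 42` carries schema ID 42, and everything from offset 5 onward is the Avro-encoded record.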
 



Issue Time Tracking
---

Worklog Id: (was: 397982)
Time Spent: 10h  (was: 9h 50m)



[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397987=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397987
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 23:43
Start Date: 04/Mar/20 23:43
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387998553
 
 

 ##
 File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java
 ##
 @@ -369,6 +402,26 @@ private SubStructObjectInspector(StructObjectInspector 
baseOI, int toIndex) {
 }
   }
 
+/**
+ * The converter reads bytes from kafka message and skip first @skipBytes 
from beginning.
+ *
+ * For example:
+ *   The Confluent Avro serializer adds 5 magic bytes that represents 
Schema ID as Integer to the message.
+ */
+  static class AvroSkipBytesConverter extends AvroBytesConverter {
+private final int skipBytes;
+
+AvroSkipBytesConverter(Schema schema, int skipBytes) {
+  super(schema);
+  this.skipBytes = skipBytes;
+}
+
+@Override
+Decoder getDecoder(byte[] value) {
+  return DecoderFactory.get().binaryDecoder(value, this.skipBytes, 
value.length - this.skipBytes, null);
 
 Review comment:
   Should there be validation that `skipBytes < value.length`?
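
One possible shape for that check, as a JDK-only sketch. `SkipRange` and `payloadLength` are hypothetical names; the returned length is the value the patch passes as the third argument to `binaryDecoder(value, skipBytes, value.length - skipBytes, null)`.

```java
/** Hypothetical bounds check for the skip offset; not part of the patch. */
final class SkipRange {
  private SkipRange() { }

  /**
   * Returns how many bytes remain after skipping skipBytes, rejecting
   * offsets that are negative or leave no payload to decode.
   */
  static int payloadLength(byte[] value, int skipBytes) {
    if (value == null) {
      throw new IllegalArgumentException("value must not be null");
    }
    if (skipBytes < 0 || skipBytes >= value.length) {
      throw new IllegalArgumentException(
          "Cannot skip " + skipBytes + " bytes of a " + value.length + "-byte value");
    }
    return value.length - skipBytes;
  }
}
```

Whether `skipBytes == value.length` (an empty payload) should be rejected or handed to the decoder is a judgment call; this sketch rejects it, matching the strict `<` in the question above.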
 



Issue Time Tracking
---

Worklog Id: (was: 397987)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 10.5h
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397981=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397981
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 23:43
Start Date: 04/Mar/20 23:43
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387996074
 
 

 ##
 File path: kafka-handler/README.md
 ##
 @@ -50,6 +50,9 @@ ALTER TABLE
 SET TBLPROPERTIES (
   "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
 ```
+
+If you use Confluent Avro serialzier/deserializer with Schema Registry you may 
want to remove 5 bytes from beginning that represents magic byte + schema ID 
from registry.
+It can be done by setting `"avro.serde.type"="confluent"` or 
`"avro.serde.type"="skip"` with `"avro.serde.skip.bytes"="5"`. It's recommended 
to set an avro schema via 
`"avro.schema.url"="http://hostname/SimpleDocument.avsc"` or 
`"avro.schema.literal"="{"type" : "record","name" : "SimpleRecord","..."}`. If 
both properties are set then `avro.schema.literal` has higher priority.
 
 Review comment:
   ```suggestion
   It can be done by setting `"avro.serde.type"="confluent"` or 
`"avro.serde.type"="skip"` with `"avro.serde.skip.bytes"="5"`. It's recommended 
to set an Avro schema via 
`"avro.schema.url"="http://hostname/SimpleDocument.avsc"` or 
`"avro.schema.literal"="{"type" : "record","name" : "SimpleRecord","..."}`. If 
both properties are set then `avro.schema.literal` has higher priority.
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 397981)
Time Spent: 10h  (was: 9h 50m)



[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397986=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397986
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 23:43
Start Date: 04/Mar/20 23:43
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387998210
 
 

 ##
 File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java
 ##
 @@ -133,12 +134,40 @@
   Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema 
is empty Can not go further");
   Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty);
   LOG.debug("Building Avro Reader with schema {}", schemaFromProperty);
-  bytesConverter = new AvroBytesConverter(schema);
+  bytesConverter = getByteConverterForAvroDelegate(schema, tbl);
 } else {
   bytesConverter = new BytesWritableConverter();
 }
   }
 
+  enum BytesConverterType {
+CONFLUENT,
+SKIP,
+NONE;
+
+static BytesConverterType fromString(String value) {
+  try {
+return BytesConverterType.valueOf(value.trim().toUpperCase());
+  } catch (Exception e){
+return NONE;
+  }
+}
+  }
+
+  BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties 
tbl) {
+String avroBytesConverterPropertyName = 
AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_TYPE.getPropName();
+String avroBytesConverterProperty = 
tbl.getProperty(avroBytesConverterPropertyName, 
+  BytesConverterType.NONE.toString());
+BytesConverterType avroByteConverterType = 
BytesConverterType.fromString(avroBytesConverterProperty);
+String avroSkipBytesPropertyName = 
AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES.getPropName();
+Integer avroSkipBytes = 
Integer.parseInt(tbl.getProperty(avroSkipBytesPropertyName));
+switch (avroByteConverterType) {
+  case CONFLUENT: return new AvroSkipBytesConverter(schema, 5);
+  case SKIP: return new AvroSkipBytesConverter(schema, avroSkipBytes);
+  default: return new AvroBytesConverter(schema);
 
 Review comment:
   Could fall through on the cases. 
   
   ```
   int skipBytes = avroSkipBytes;
   switch (avroByteConverterType) {
     case CONFLUENT: skipBytes = 5; // intentional fall-through
     case SKIP: return new AvroSkipBytesConverter(schema, skipBytes);
     default: return new AvroBytesConverter(schema);
   }
   ```
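
A self-contained sketch of the fall-through idea that compiles without Hive or Avro on the classpath. `ConverterSelection` and `bytesToSkip` are hypothetical stand-ins: the method returns the skip count instead of constructing a converter, and `0` stands in for the plain `AvroBytesConverter` case.

```java
import java.util.Locale;

/** Hypothetical stand-in for the converter selection; not part of the patch. */
final class ConverterSelection {
  enum BytesConverterType {
    CONFLUENT, SKIP, NONE;

    /** Maps a property value to a type, defaulting to NONE for null or unknown input. */
    static BytesConverterType fromString(String value) {
      try {
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
      } catch (IllegalArgumentException | NullPointerException e) {
        return NONE;
      }
    }
  }

  private ConverterSelection() { }

  /** Returns the number of leading bytes the chosen converter would skip. */
  static int bytesToSkip(BytesConverterType type, int configuredSkipBytes) {
    int skipBytes = configuredSkipBytes;
    switch (type) {
      case CONFLUENT:
        skipBytes = 5; // magic byte + 4-byte schema ID
        // intentional fall-through into SKIP
      case SKIP:
        return skipBytes;
      default:
        return 0; // stand-in for the non-skipping converter
    }
  }
}
```

The fall-through saves one line but is easy to misread, so the explicit comment matters; two separate `return` statements, as in the patch, are arguably just as clear.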
 



Issue Time Tracking
---

Worklog Id: (was: 397986)
Time Spent: 10.5h  (was: 10h 20m)



[jira] [Updated] (HIVE-22865) Include data in replication staging directory

2020-03-04 Thread PRAVIN KUMAR SINHA (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PRAVIN KUMAR SINHA updated HIVE-22865:
--
Attachment: HIVE-22865.9.patch

> Include data in replication staging directory
> -
>
> Key: HIVE-22865
> URL: https://issues.apache.org/jira/browse/HIVE-22865
> Project: Hive
>  Issue Type: Task
>Reporter: PRAVIN KUMAR SINHA
>Assignee: PRAVIN KUMAR SINHA
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch, 
> HIVE-22865.3.patch, HIVE-22865.4.patch, HIVE-22865.5.patch, 
> HIVE-22865.6.patch, HIVE-22865.7.patch, HIVE-22865.8.patch, HIVE-22865.9.patch
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>






[jira] [Comment Edited] (HIVE-22126) hive-exec packaging should shade guava

2020-03-04 Thread Eugene Chung (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051680#comment-17051680
 ] 

Eugene Chung edited comment on HIVE-22126 at 3/4/20, 11:26 PM:
---

[^HIVE-22126.07.patch]

I missed some tests which also require dependency of calcite-core dependent 
modules.


was (Author: euigeun_chung):
[^HIVE-22126.07.patch]

I missed some tests which requires dependency of calcite-core dependent modules.

> hive-exec packaging should shade guava
> --
>
> Key: HIVE-22126
> URL: https://issues.apache.org/jira/browse/HIVE-22126
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Eugene Chung
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, 
> HIVE-22126.03.patch, HIVE-22126.04.patch, HIVE-22126.05.patch, 
> HIVE-22126.06.patch, HIVE-22126.07.patch
>
>
> The ql/pom.xml includes the complete guava library in hive-exec.jar 
> (https://github.com/apache/hive/blob/master/ql/pom.xml#L990). This causes 
> problems for downstream clients of hive which have hive-exec.jar in their 
> classpath since they are pinned to the same guava version as that of hive. 
> We should shade guava classes so that other components which depend on 
> hive-exec can independently use a different version of guava as needed.





[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397965=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397965
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 23:26
Start Date: 04/Mar/20 23:26
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387995005
 
 

 ##
 File path: kafka-handler/README.md
 ##
 @@ -50,6 +50,9 @@ ALTER TABLE
 SET TBLPROPERTIES (
   "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
 ```
+
+If you use Confluent Avro serialzier/deserializer with Schema Registry you may 
want to remove 5 bytes from beginning that represents magic byte + schema ID 
from registry.
 
 Review comment:
   Links for Context - 
https://docs.confluent.io/current/schema-registry/serializer-formatter.html#wire-format
   
   However, there are *other* Registries, so I would like to make a point to 
make this as extensible as possible. 
   
   - https://github.com/hortonworks/registry
   - https://github.com/Apicurio/apicurio-registry/
   
   [Confluent's branding says `Confluent Schema 
Registry`](https://docs.confluent.io/current/schema-registry/index.html), so I 
think it's self-explanatory enough 
 



Issue Time Tracking
---

Worklog Id: (was: 397965)
Time Spent: 9h 40m  (was: 9.5h)



[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397966=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397966
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 23:26
Start Date: 04/Mar/20 23:26
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387995005
 
 

 ##
 File path: kafka-handler/README.md
 ##
 @@ -50,6 +50,9 @@ ALTER TABLE
 SET TBLPROPERTIES (
   "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
 ```
+
+If you use Confluent Avro serialzier/deserializer with Schema Registry you may 
want to remove 5 bytes from beginning that represents magic byte + schema ID 
from registry.
 
 Review comment:
   Links for Context - 
https://docs.confluent.io/current/schema-registry/serializer-formatter.html#wire-format
   
   However, there are *other* Registries, so I would like to make a point to 
have this be as extensible as possible. 
   
   - https://github.com/hortonworks/registry
   - https://github.com/Apicurio/apicurio-registry/
   
   [Confluent's branding says `Confluent Schema 
Registry`](https://docs.confluent.io/current/schema-registry/index.html), so I 
think it's self-explanatory enough 
 



Issue Time Tracking
---

Worklog Id: (was: 397966)
Time Spent: 9h 50m  (was: 9h 40m)



[jira] [Issue Comment Deleted] (HIVE-22126) hive-exec packaging should shade guava

2020-03-04 Thread Eugene Chung (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Chung updated HIVE-22126:

Comment: was deleted

(was: [~dlavati] Shading guava for Hive also requires shading the calcite modules, 
which in turn changes the FQCN of the calcite-avatica JDBC driver, e.g. 
 * org.apache.calcite.jdbc.Driver -> 
org.apache.hive.org.apache.calcite.jdbc.Driver

I stopped there because I was not sure whether it is okay to change it.

If changing the driver name is only an internal or test concern, I think it's 
okay.

I have some free time these days, so I am going to investigate this again.)

> hive-exec packaging should shade guava
> --
>
> Key: HIVE-22126
> URL: https://issues.apache.org/jira/browse/HIVE-22126
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Eugene Chung
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, 
> HIVE-22126.03.patch, HIVE-22126.04.patch, HIVE-22126.05.patch, 
> HIVE-22126.06.patch, HIVE-22126.07.patch
>
>
> The ql/pom.xml includes the complete guava library in hive-exec.jar 
> https://github.com/apache/hive/blob/master/ql/pom.xml#L990 This causes 
> problems for downstream clients of hive which have hive-exec.jar in their 
> classpath since they are pinned to the same guava version as that of hive. 
> We should shade guava classes so that other components which depend on 
> hive-exec can independently use a different version of guava as needed.





[jira] [Updated] (HIVE-22126) hive-exec packaging should shade guava

2020-03-04 Thread Eugene Chung (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Chung updated HIVE-22126:

Attachment: HIVE-22126.07.patch
Status: Patch Available  (was: Open)

[^HIVE-22126.07.patch]

I missed some tests that require a dependency on calcite-core dependent modules.

> hive-exec packaging should shade guava
> --
>
> Key: HIVE-22126
> URL: https://issues.apache.org/jira/browse/HIVE-22126
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Eugene Chung
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, 
> HIVE-22126.03.patch, HIVE-22126.04.patch, HIVE-22126.05.patch, 
> HIVE-22126.06.patch, HIVE-22126.07.patch
>
>
> The ql/pom.xml includes the complete guava library in hive-exec.jar 
> https://github.com/apache/hive/blob/master/ql/pom.xml#L990 This causes 
> problems for downstream clients of hive which have hive-exec.jar in their 
> classpath since they are pinned to the same guava version as that of hive. 
> We should shade guava classes so that other components which depend on 
> hive-exec can independently use a different version of guava as needed.





[jira] [Updated] (HIVE-22126) hive-exec packaging should shade guava

2020-03-04 Thread Eugene Chung (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Chung updated HIVE-22126:

Status: Open  (was: Patch Available)

> hive-exec packaging should shade guava
> --
>
> Key: HIVE-22126
> URL: https://issues.apache.org/jira/browse/HIVE-22126
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Eugene Chung
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, 
> HIVE-22126.03.patch, HIVE-22126.04.patch, HIVE-22126.05.patch, 
> HIVE-22126.06.patch
>
>
> The ql/pom.xml includes the complete guava library in hive-exec.jar 
> https://github.com/apache/hive/blob/master/ql/pom.xml#L990 This causes 
> problems for downstream clients of hive which have hive-exec.jar in their 
> classpath since they are pinned to the same guava version as that of hive. 
> We should shade guava classes so that other components which depend on 
> hive-exec can independently use a different version of guava as needed.





[jira] [Commented] (HIVE-22975) Optimise TopNKeyFilter with boundary checks

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051677#comment-17051677
 ] 

Hive QA commented on HIVE-22975:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12995588/HIVE-22975.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 18097 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20952/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20952/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20952/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12995588 - PreCommit-HIVE-Build

> Optimise TopNKeyFilter with boundary checks
> ---
>
> Key: HIVE-22975
> URL: https://issues.apache.org/jira/browse/HIVE-22975
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-22975.1.patch, Screenshot 2020-03-04 at 3.26.45 
> PM.jpg
>
>
> !Screenshot 2020-03-04 at 3.26.45 PM.jpg|width=507,height=322!
>  
> It would be good to add boundary checks to reduce the cycles spent on the topN 
> filter. E.g., Q43 spends a good amount of time in topN.
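
One possible reading of "boundary checks" is sketched below in Java: once N 
distinct keys are held, each incoming key is first compared against the 
current worst key, so keys outside the boundary are rejected with a single 
comparison instead of an insert. The class `TopNFilterSketch` is hypothetical 
and uses a `TreeSet`; it is not Hive's `TopNKeyFilter` and is not taken from 
the attached patch.

```java
import java.util.Comparator;
import java.util.TreeSet;

/** Hypothetical top-N key filter with a boundary check; not Hive's implementation. */
final class TopNFilterSketch<K> {
  private final int topN;
  private final Comparator<K> comparator;
  // Distinct keys kept in sorted order; last() is the current boundary (worst) key.
  private final TreeSet<K> keys;

  TopNFilterSketch(int topN, Comparator<K> comparator) {
    if (topN <= 0) {
      throw new IllegalArgumentException("topN must be positive");
    }
    this.topN = topN;
    this.comparator = comparator;
    this.keys = new TreeSet<>(comparator);
  }

  /** Returns true if the key can still be among the N smallest distinct keys. */
  boolean canForward(K key) {
    if (keys.size() == topN) {
      int cmp = comparator.compare(key, keys.last());
      if (cmp > 0) {
        return false; // boundary check: worse than the worst kept key
      }
      if (cmp == 0) {
        return true; // equal to the boundary key: forward without touching the set
      }
    }
    // Key is inside the boundary (or the set is not full yet): record it.
    if (keys.add(key) && keys.size() > topN) {
      keys.pollLast(); // evict the old boundary key
    }
    return true;
  }
}
```

With a skewed input where most keys fall outside the top N, most calls return 
at the first comparison.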





[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397939=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397939
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 22:47
Start Date: 04/Mar/20 22:47
Worklog Time Spent: 10m 
  Work Description: davidov541 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387980822
 
 

 ##
 File path: kafka-handler/README.md
 ##
 @@ -50,6 +50,9 @@ ALTER TABLE
 SET TBLPROPERTIES (
   "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
 ```
+
+If you use Confluent Avro serialzier/deserializer with Schema Registry you may 
want to remove 5 bytes from beginning that represents magic byte + schema ID 
from registry.
 
 Review comment:
   @b-slim You appear to be correct, based on the source code: 
https://github.com/confluentinc/schema-registry/blob/master/avro-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaAvroSerializer.java.
 Let's say we implement this as is, and later implement the schema registry 
lookup under the same identifier. Who would that break? Serialized messages 
that point to a bogus schema registry instance, or serialized messages that 
happen to need 5 bytes skipped at the front but aren't from Confluent, where 
some clever dev figured out he could use the Confluent setting instead of doing 
it the right way? 
   
   The second case doesn't matter to me tbh. 
   
   The first case is concerning and should be handled. I would expect that we 
would catch the case where we can't find a schema and print a warning rather 
than raise an error. That would allow this case to continue working. But we 
would be making assumptions about the implementation of a future feature, which 
is always a crapshoot...
   
   To be clear, if we make sure the documentation is clear on this beyond just 
these parameters, and @cricket007 agrees with it as a heavy Confluent user, 
then I'm fine with it. It feels like we've covered this problem enough to be in 
a good spot either way.
 



Issue Time Tracking
---

Worklog Id: (was: 397939)
Time Spent: 9.5h  (was: 9h 20m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value: 
> <magic byte 0x00><4 bytes of schema ID><regular Avro bytes for an object that 
> conforms to the schema>. 
> This format causes no problem for the Confluent Kafka deserializer, which 
> respects the format. For the Hive Kafka handler, however, it is a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer 
> from bytes to objects and ignores the Kafka consumer ser/deser classes provided 
> via table property.
> It would be nice to support the Confluent format with the magic byte.
> It would also be great to support Schema Registry.





[jira] [Updated] (HIVE-22962) Reuse HiveRelFieldTrimmer instance across queries

2020-03-04 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22962:
---
Attachment: HIVE-22962.04.patch

> Reuse HiveRelFieldTrimmer instance across queries
> -
>
> Key: HIVE-22962
> URL: https://issues.apache.org/jira/browse/HIVE-22962
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-22962.01.patch, HIVE-22962.02.patch, 
> HIVE-22962.03.patch, HIVE-22962.04.patch, HIVE-22962.patch
>
>
> Currently we create multiple {{HiveRelFieldTrimmer}} instances per query. 
> {{HiveRelFieldTrimmer}} uses a method dispatcher that has a built-in caching 
> mechanism: given a certain object, it stores the method that was called for 
> the object class. However, by instantiating the trimmer multiple times per 
> query and across queries, we create a new dispatcher with each instantiation, 
> thus effectively removing the caching mechanism that is built within the 
> dispatcher.
> This issue is to reuse the same {{HiveRelFieldTrimmer}} instance within a 
> single query and across queries.
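
The effect described above can be illustrated with a small Java sketch. The 
classes `CachingDispatcher` and `Trimmer` are hypothetical stand-ins, not 
Calcite's or Hive's actual classes; the counter only makes cache misses 
visible. The sketch assumes the shared instance is safe to use concurrently, 
which any real change would need to verify.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical dispatcher that caches one lookup result per object class. */
final class CachingDispatcher {
  private final Map<Class<?>, String> cache = new ConcurrentHashMap<>();
  private final AtomicInteger lookups = new AtomicInteger();

  /** Resolves a handler name for the object's class, caching the result. */
  String dispatch(Object node) {
    return cache.computeIfAbsent(node.getClass(), c -> {
      lookups.incrementAndGet(); // stands in for an expensive reflective lookup
      return "trim" + c.getSimpleName();
    });
  }

  /** Number of cache misses so far. */
  int lookups() {
    return lookups.get();
  }
}

/** Hypothetical trimmer: each instance owns its own dispatcher and cache. */
final class Trimmer {
  // A single shared instance keeps the dispatcher cache warm across queries.
  static final Trimmer SHARED = new Trimmer();

  private final CachingDispatcher dispatcher = new CachingDispatcher();

  String trim(Object node) {
    return dispatcher.dispatch(node);
  }

  int lookups() {
    return dispatcher.lookups();
  }
}
```

Creating a new `Trimmer` per call pays one lookup per instance, while 
`Trimmer.SHARED` pays one lookup per distinct class over its lifetime.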





[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397930=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397930
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 22:34
Start Date: 04/Mar/20 22:34
Worklog Time Spent: 10m 
  Work Description: b-slim commented on issue #933: HIVE-21218: Adding 
support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#issuecomment-594901087
 
 
   @davidov541 thanks for the PR. I left one important comment that needs some 
work. Thanks for the contribution. 
 



Issue Time Tracking
---

Worklog Id: (was: 397930)
Time Spent: 9h 20m  (was: 9h 10m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value: 
> <magic byte 0x00><4 bytes of schema ID><regular Avro bytes for an object that 
> conforms to the schema>. 
> This format causes no problem for the Confluent Kafka deserializer, which 
> respects the format. For the Hive Kafka handler, however, it is a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer 
> from bytes to objects and ignores the Kafka consumer ser/deser classes provided 
> via table property.
> It would be nice to support the Confluent format with the magic byte.
> It would also be great to support Schema Registry.





[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397920=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397920
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 22:28
Start Date: 04/Mar/20 22:28
Worklog Time Spent: 10m 
  Work Description: b-slim commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387973124
 
 

 ##
 File path: kafka-handler/README.md
 ##
 @@ -50,6 +50,9 @@ ALTER TABLE
 SET TBLPROPERTIES (
   "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
 ```
+
+If you use Confluent Avro serialzier/deserializer with Schema Registry you may 
want to remove 5 bytes from beginning that represents magic byte + schema ID 
from registry.
 
 Review comment:
   @davidov541 please help me understand this: I am guessing the first 5 bytes 
of the record are a schema ID that can be used to fetch the schema from the 
registry?
   **If that is correct, then I think we should not call this confluent; let me 
explain why.**
   Imagine that in the near or far future someone implements a full-fledged 
Avro reader that uses those five bytes to look up the schema in the schema 
registry. What shall we call that? Confluent V2? That is why I think the name 
is confusing. In my opinion it is better to call it skip, and to document how 
it can be used to read data from Confluent Avro SerDe. Please let me know if 
this makes sense to you.
 



Issue Time Tracking
---

Worklog Id: (was: 397920)
Time Spent: 9h 10m  (was: 9h)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value: 
> <magic byte 0x00><4 bytes of schema ID><regular Avro bytes for an object that 
> conforms to the schema>. 
> This format causes no problem for the Confluent Kafka deserializer, which 
> respects the format. For the Hive Kafka handler, however, it is a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer 
> from bytes to objects and ignores the Kafka consumer ser/deser classes provided 
> via table property.
> It would be nice to support the Confluent format with the magic byte.
> It would also be great to support Schema Registry.





[jira] [Commented] (HIVE-22975) Optimise TopNKeyFilter with boundary checks

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051662#comment-17051662
 ] 

Hive QA commented on HIVE-22975:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
49s{color} | {color:blue} ql in master has 1531 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
14s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 45s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-20952/dev-support/hive-personality.sh
 |
| git revision | master / deebfb6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20952/yetus/patch-asflicense-problems.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20952/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Optimise TopNKeyFilter with boundary checks
> ---
>
> Key: HIVE-22975
> URL: https://issues.apache.org/jira/browse/HIVE-22975
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-22975.1.patch, Screenshot 2020-03-04 at 3.26.45 
> PM.jpg
>
>
> !Screenshot 2020-03-04 at 3.26.45 PM.jpg|width=507,height=322!
>  
> It would be good to add boundary checks to reduce the cycles spent on the topN 
> filter. E.g., Q43 spends a good amount of time in topN.





[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051650#comment-17051650
 ] 

Hive QA commented on HIVE-22966:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12995412/HIVE-22966.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 18096 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20951/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20951/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20951/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12995412 - PreCommit-HIVE-Build

> LLAP: Consider including waitTime for comparing attempts in same vertex
> ---
>
> Key: HIVE-22966
> URL: https://issues.apache.org/jira/browse/HIVE-22966
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-22966.3.patch, HIVE-22966.4.patch
>
>
> When attempts are compared within the same vertex, the comparison should pick 
> the attempt with the longest wait time to avoid starvation.
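
As an illustration only, the ordering described above could be expressed in 
Java as a comparator that puts the longest-waiting attempt first. The 
`Attempt` class, its fields, and the tie-break on attempt ID are assumptions 
made for this sketch; they do not reflect LLAP's actual scheduler types.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: order attempts of one vertex by wait time. */
final class AttemptOrdering {
  /** Illustrative attempt record; the field names are assumptions. */
  static final class Attempt {
    final int attemptId;
    final long waitTimeMs;

    Attempt(int attemptId, long waitTimeMs) {
      this.attemptId = attemptId;
      this.waitTimeMs = waitTimeMs;
    }
  }

  /** Longest wait first; ties are broken by the lower attempt ID. */
  static final Comparator<Attempt> LONGEST_WAIT_FIRST =
      Comparator.comparingLong((Attempt a) -> a.waitTimeMs).reversed()
          .thenComparingInt(a -> a.attemptId);

  private AttemptOrdering() {}

  /** Picks the attempt that has waited longest among attempts of the same vertex. */
  static Attempt pickNext(List<Attempt> sameVertexAttempts) {
    return sameVertexAttempts.stream()
        .min(LONGEST_WAIT_FIRST)
        .orElseThrow(() -> new IllegalArgumentException("no attempts to compare"));
  }
}
```

Because the attempt that has waited longest always sorts first, no attempt can 
be passed over indefinitely by newer attempts of the same vertex.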





[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397906=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397906
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 21:39
Start Date: 04/Mar/20 21:39
Worklog Time Spent: 10m 
  Work Description: davidov541 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387949896
 
 

 ##
 File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java
 ##
 @@ -133,12 +134,40 @@
   Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema 
is empty Can not go further");
   Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty);
   LOG.debug("Building Avro Reader with schema {}", schemaFromProperty);
-  bytesConverter = new AvroBytesConverter(schema);
+  bytesConverter = getByteConverterForAvroDelegate(schema, tbl);
 } else {
   bytesConverter = new BytesWritableConverter();
 }
   }
 
+  enum BytesConverterType {
+CONFLUENT,
 
 Review comment:
   That makes sense, @cricket007. So far Confluent has done a good job of 
preventing backward compatibility issues, so I don't see a reason to assume 
that they will break compatibility in the future, at the cost of a worse 
developer experience. I'd be fine leaving it as is.
 



Issue Time Tracking
---

Worklog Id: (was: 397906)
Time Spent: 9h  (was: 8h 50m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value: 
> <magic byte 0x00><4 bytes of schema ID><regular Avro bytes for an object that 
> conforms to the schema>. 
> This format causes no problem for the Confluent Kafka deserializer, which 
> respects the format. For the Hive Kafka handler, however, it is a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer 
> from bytes to objects and ignores the Kafka consumer ser/deser classes provided 
> via table property.
> It would be nice to support the Confluent format with the magic byte.
> It would also be great to support Schema Registry.





[jira] [Updated] (HIVE-22974) Metastore's table location check should be optional

2020-03-04 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-22974:
-
Attachment: HIVE-22974.1.patch

> Metastore's table location check should be optional
> ---
>
> Key: HIVE-22974
> URL: https://issues.apache.org/jira/browse/HIVE-22974
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22974.1.patch
>
>
> In HIVE-22189 a check was introduced to make sure managed and external tables 
> are located at the proper space. This condition cannot be satisfied during an 
> upgrade.





[jira] [Updated] (HIVE-22974) Metastore's table location check should be optional

2020-03-04 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-22974:
-
Status: Open  (was: Patch Available)

> Metastore's table location check should be optional
> ---
>
> Key: HIVE-22974
> URL: https://issues.apache.org/jira/browse/HIVE-22974
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22974.1.patch
>
>
> In HIVE-22189 a check was introduced to make sure managed and external tables 
> are located at the proper space. This condition cannot be satisfied during an 
> upgrade.





[jira] [Updated] (HIVE-22974) Metastore's table location check should be optional

2020-03-04 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-22974:
-
Attachment: (was: HIVE-22974.1.patch)

> Metastore's table location check should be optional
> ---
>
> Key: HIVE-22974
> URL: https://issues.apache.org/jira/browse/HIVE-22974
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22974.1.patch
>
>
> In HIVE-22189 a check was introduced to make sure managed and external tables 
> are located at the proper space. This condition cannot be satisfied during an 
> upgrade.





[jira] [Updated] (HIVE-22974) Metastore's table location check should be optional

2020-03-04 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-22974:
-
Status: Patch Available  (was: Open)

> Metastore's table location check should be optional
> ---
>
> Key: HIVE-22974
> URL: https://issues.apache.org/jira/browse/HIVE-22974
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22974.1.patch
>
>
> In HIVE-22189 a check was introduced to make sure managed and external tables 
> are located at the proper space. This condition cannot be satisfied during an 
> upgrade.





[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler

2020-03-04 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-22954:
---
Status: In Progress  (was: Patch Available)

> Schedule Repl Load using Hive Scheduler
> ---
>
> Key: HIVE-22954
> URL: https://issues.apache.org/jira/browse/HIVE-22954
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, 
> HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, 
> HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, 
> HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.11.patch, 
> HIVE-22954.12.patch, HIVE-22954.13.patch, HIVE-22954.patch
>
>
> [https://github.com/apache/hive/pull/932]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler

2020-03-04 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-22954:
---
Attachment: HIVE-22954.13.patch
Status: Patch Available  (was: In Progress)

> Schedule Repl Load using Hive Scheduler
> ---
>
> Key: HIVE-22954
> URL: https://issues.apache.org/jira/browse/HIVE-22954
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, 
> HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, 
> HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, 
> HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.11.patch, 
> HIVE-22954.12.patch, HIVE-22954.13.patch, HIVE-22954.patch
>
>
> [https://github.com/apache/hive/pull/932]





[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397883&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397883
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 21:01
Start Date: 04/Mar/20 21:01
Worklog Time Spent: 10m 
  Work Description: davidov541 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387931635
 
 

 ##
 File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java
 ##
 @@ -133,12 +134,40 @@
   Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema 
is empty Can not go further");
   Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty);
   LOG.debug("Building Avro Reader with schema {}", schemaFromProperty);
-  bytesConverter = new AvroBytesConverter(schema);
+  bytesConverter = getByteConverterForAvroDelegate(schema, tbl);
 } else {
   bytesConverter = new BytesWritableConverter();
 }
   }
 
+  enum BytesConverterType {
+CONFLUENT,
+SKIP,
+NONE;
+
+static BytesConverterType fromString(String value) {
+  try {
+return BytesConverterType.valueOf(value.trim().toUpperCase());
+  } catch (Exception e){
+return NONE;
+  }
+}
+  }
+
+  BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties 
tbl) {
+String avroBytesConverterProperty = tbl.getProperty(AvroSerdeUtils
+
.AvroTableProperties.AVRO_SERDE_TYPE
+.getPropName(), 
BytesConverterType.NONE.toString());
+BytesConverterType avroByteConverterType = 
BytesConverterType.fromString(avroBytesConverterProperty);
+Integer avroSkipBytes = 
Integer.getInteger(tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES
 
 Review comment:
   OK, made the fix. Please check it to make sure I got it right this time.
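
The thread does not say which fix was made. One general Java point applies to
the quoted line, though: Integer.getInteger(String) looks up a JVM system
property with that name and returns null when none is set; it does not parse
its argument. A minimal sketch of reading an integer table property with
Integer.parseInt instead is shown below. The class name, the property key and
the default handling are hypothetical and are not taken from the patch.

```java
import java.util.Properties;

final class SkipBytesPropertySketch {
  // Hypothetical property key, used only for this illustration.
  static final String SKIP_BYTES_KEY = "example.skip.bytes";

  // Reads the property as an int, falling back to defaultValue when it is absent.
  static int readSkipBytes(Properties tbl, int defaultValue) {
    String raw = tbl.getProperty(SKIP_BYTES_KEY);
    if (raw == null) {
      return defaultValue;
    }
    try {
      // parseInt parses the text itself; getInteger would treat it as a property name.
      return Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          "Value of " + SKIP_BYTES_KEY + " is not an integer: " + raw, e);
    }
  }
}
```

Failing loudly on a malformed value is one possible choice; silently falling
back to the default would hide a misconfigured table.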
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397883)
Time Spent: 8h 50m  (was: 8h 40m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> According to [Google
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value:
> <magic byte><4 bytes of schema ID><Avro bytes that conform to the schema>.
> This format causes no problem for the Confluent Kafka deserializer, which
> respects the format. For the Hive Kafka handler, however, it is a problem to
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via
> table property.
> It would be nice to support the Confluent format with the magic byte.
> Also it would be great to support Schema Registry as well.
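
The framing this issue describes (a magic byte, then a 4-byte schema ID, then
the Avro payload) can be sketched in a few lines of Java. This is only an
illustration, not the converter from the pull request. The class and method
names are hypothetical, and the sketch assumes the magic byte is 0 and the
schema ID is big-endian, as in Confluent's documented wire format.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ConfluentHeaderSketch {
  static final byte MAGIC_BYTE = 0x0;
  static final int HEADER_LENGTH = 5; // 1 magic byte + 4-byte schema ID

  // Returns the schema ID stored in bytes 1..4 of the Kafka value.
  static int schemaId(byte[] value) {
    checkHeader(value);
    // ByteBuffer reads big-endian by default.
    return ByteBuffer.wrap(value, 1, 4).getInt();
  }

  // Returns the Avro payload, that is, everything after the 5-byte header.
  static byte[] avroPayload(byte[] value) {
    checkHeader(value);
    return Arrays.copyOfRange(value, HEADER_LENGTH, value.length);
  }

  private static void checkHeader(byte[] value) {
    if (value == null || value.length < HEADER_LENGTH || value[0] != MAGIC_BYTE) {
      throw new IllegalArgumentException("Not a Confluent-framed value");
    }
  }
}
```

The bytes returned by avroPayload could then be handed to an ordinary Avro
decoder built from the table's schema. Looking up the writer schema by ID
would additionally need a schema registry client.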





[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397882&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397882
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 21:00
Start Date: 04/Mar/20 21:00
Worklog Time Spent: 10m 
  Work Description: davidov541 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387931454
 
 

 ##
 File path: kafka-handler/pom.xml
 ##
 @@ -190,5 +207,27 @@
 
   
 
+
+  
+
+  <groupId>org.apache.avro</groupId>
+  <artifactId>avro-maven-plugin</artifactId>
+  <version>1.8.1</version>
 
 Review comment:
   Used the global avro.version variable.
 



Issue Time Tracking
---

Worklog Id: (was: 397882)
Time Spent: 8h 40m  (was: 8.5h)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> According to [Google
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value:
> <magic byte><4 bytes of schema ID><Avro bytes that conform to the schema>.
> This format causes no problem for the Confluent Kafka deserializer, which
> respects the format. For the Hive Kafka handler, however, it is a problem to
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via
> table property.
> It would be nice to support the Confluent format with the magic byte.
> Also it would be great to support Schema Registry as well.





[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397881&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397881
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 21:00
Start Date: 04/Mar/20 21:00
Worklog Time Spent: 10m 
  Work Description: davidov541 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387931357
 
 

 ##
 File path: kafka-handler/pom.xml
 ##
 @@ -118,8 +118,21 @@
   <version>1.7.30</version>
   <scope>test</scope>
 
+
+  <groupId>io.confluent</groupId>
+  <artifactId>kafka-avro-serializer</artifactId>
+  <version>5.4.0</version>
+  <scope>test</scope>
 
 Review comment:
   Added exclusion.
 



Issue Time Tracking
---

Worklog Id: (was: 397881)
Time Spent: 8.5h  (was: 8h 20m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> According to [Google
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value:
> <magic byte><4 bytes of schema ID><Avro bytes that conform to the schema>.
> This format causes no problem for the Confluent Kafka deserializer, which
> respects the format. For the Hive Kafka handler, however, it is a problem to
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via
> table property.
> It would be nice to support the Confluent format with the magic byte.
> Also it would be great to support Schema Registry as well.





[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397880&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397880
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 21:00
Start Date: 04/Mar/20 21:00
Worklog Time Spent: 10m 
  Work Description: davidov541 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387931187
 
 

 ##
 File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java
 ##
 @@ -369,6 +402,26 @@ private SubStructObjectInspector(StructObjectInspector 
baseOI, int toIndex) {
 }
   }
 
+/**
+ * The converter reads bytes from kafka message and skip first @skipBytes 
from beginning.
+ *
+ * For example:
+ *   Confluent kafka producer add 5 magic bytes that represents Schema 
ID as Integer to the message.
 
 Review comment:
   Fixed.
 



Issue Time Tracking
---

Worklog Id: (was: 397880)
Time Spent: 8h 20m  (was: 8h 10m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> According to [Google
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value:
> <magic byte><4 bytes of schema ID><Avro bytes that conform to the schema>.
> This format causes no problem for the Confluent Kafka deserializer, which
> respects the format. For the Hive Kafka handler, however, it is a problem to
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via
> table property.
> It would be nice to support the Confluent format with the magic byte.
> Also it would be great to support Schema Registry as well.





[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397879&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397879
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 20:59
Start Date: 04/Mar/20 20:59
Worklog Time Spent: 10m 
  Work Description: davidov541 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387931106
 
 

 ##
 File path: kafka-handler/README.md
 ##
 @@ -50,6 +50,9 @@ ALTER TABLE
 SET TBLPROPERTIES (
   "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
 ```
+
+If you use Confluent Avro serialzier/deserializer with schema registry you may 
want to remove 5 bytes from beginning that represents magic byte + schema ID 
from registry.
 
 Review comment:
   Fixed.
 



Issue Time Tracking
---

Worklog Id: (was: 397879)
Time Spent: 8h 10m  (was: 8h)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> According to [Google
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value:
> <magic byte><4 bytes of schema ID><Avro bytes that conform to the schema>.
> This format causes no problem for the Confluent Kafka deserializer, which
> respects the format. For the Hive Kafka handler, however, it is a problem to
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via
> table property.
> It would be nice to support the Confluent format with the magic byte.
> Also it would be great to support Schema Registry as well.





[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397878&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397878
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 20:59
Start Date: 04/Mar/20 20:59
Worklog Time Spent: 10m 
  Work Description: davidov541 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387931054
 
 

 ##
 File path: 
kafka-handler/src/test/org/apache/hadoop/hive/kafka/AvroBytesConverterTest.java
 ##
 @@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.kafka;
+
+import com.google.common.collect.Maps;
+import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient;
+import io.confluent.kafka.serializers.KafkaAvroSerializer;
+import org.apache.avro.Schema;
+import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.Map;
+
+/**
+ * Test class for Hive Kafka Avro bytes converter.
+ */
+public class AvroBytesConverterTest {
+  private static SimpleRecord simpleRecord1 = 
SimpleRecord.newBuilder().setId("123").setName("test").build();
+  private static byte[] simpleRecord1AsBytes;
+
+  /**
+   * Emulate confluent avro producer that add 4 magic bits (int) before value 
bytes. The int represents the schema ID from schema registry.
+   */
+  @BeforeClass
+  public static void setUp() {
+Map config = Maps.newHashMap();
+config.put("schema.registry.url", "http://localhost");
 
 Review comment:
   Fixed.
 



Issue Time Tracking
---

Worklog Id: (was: 397878)
Time Spent: 8h  (was: 7h 50m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> According to [Google
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value:
> <magic byte><4 bytes of schema ID><Avro bytes that conform to the schema>.
> This format causes no problem for the Confluent Kafka deserializer, which
> respects the format. For the Hive Kafka handler, however, it is a problem to
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via
> table property.
> It would be nice to support the Confluent format with the magic byte.
> Also it would be great to support Schema Registry as well.





[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler

2020-03-04 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-22954:
---
Attachment: HIVE-22954.12.patch
Status: Patch Available  (was: In Progress)

> Schedule Repl Load using Hive Scheduler
> ---
>
> Key: HIVE-22954
> URL: https://issues.apache.org/jira/browse/HIVE-22954
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, 
> HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, 
> HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, 
> HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.11.patch, 
> HIVE-22954.12.patch, HIVE-22954.patch
>
>
> [https://github.com/apache/hive/pull/932]





[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler

2020-03-04 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-22954:
---
Status: In Progress  (was: Patch Available)

> Schedule Repl Load using Hive Scheduler
> ---
>
> Key: HIVE-22954
> URL: https://issues.apache.org/jira/browse/HIVE-22954
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, 
> HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, 
> HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, 
> HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.11.patch, 
> HIVE-22954.patch
>
>
> [https://github.com/apache/hive/pull/932]





[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler

2020-03-04 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-22954:
---
Attachment: HIVE-22954.11.patch
Status: Patch Available  (was: In Progress)

> Schedule Repl Load using Hive Scheduler
> ---
>
> Key: HIVE-22954
> URL: https://issues.apache.org/jira/browse/HIVE-22954
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, 
> HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, 
> HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, 
> HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.11.patch, 
> HIVE-22954.patch
>
>
> [https://github.com/apache/hive/pull/932]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051598#comment-17051598
 ] 

Hive QA commented on HIVE-22966:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
45s{color} | {color:blue} llap-server in master has 90 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
15s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 14m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-20951/dev-support/hive-personality.sh
 |
| git revision | master / deebfb6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20951/yetus/patch-asflicense-problems.txt
 |
| modules | C: llap-server U: llap-server |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20951/yetus.txt |
| Powered by | Apache Yetus  http://yetus.apache.org |


This message was automatically generated.



> LLAP: Consider including waitTime for comparing attempts in same vertex
> ---
>
> Key: HIVE-22966
> URL: https://issues.apache.org/jira/browse/HIVE-22966
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-22966.3.patch, HIVE-22966.4.patch
>
>
> When attempts within the same vertex are compared, the attempt with the
> longest wait time should be picked to avoid starvation.
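
The selection rule in this description can be sketched as a filter on the
vertex followed by a maximum over the wait time. The Attempt class, its fields
and the method name below are hypothetical stand-ins, not LLAP classes, and the
sketch makes no claim about how the attached patches order attempts.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class AttemptWaitTimeSketch {
  /** Hypothetical stand-in for a task attempt; not an LLAP class. */
  static final class Attempt {
    final int vertexId;
    final String name;
    final long waitTimeMillis;

    Attempt(int vertexId, String name, long waitTimeMillis) {
      this.vertexId = vertexId;
      this.name = name;
      this.waitTimeMillis = waitTimeMillis;
    }
  }

  // Among the attempts of one vertex, pick the one that has waited longest.
  static Optional<Attempt> pickLongestWaiting(List<Attempt> attempts, int vertexId) {
    return attempts.stream()
        .filter(a -> a.vertexId == vertexId)
        .max(Comparator.comparingLong((Attempt a) -> a.waitTimeMillis));
  }
}
```

Preferring the longest-waiting attempt means that no attempt in a vertex can be
passed over indefinitely by newer attempts of the same vertex.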





[jira] [Updated] (HIVE-22785) Update/delete/merge statements not optimized through CBO

2020-03-04 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-22785:
--
Status: Patch Available  (was: Open)

> Update/delete/merge statements not optimized through CBO
> 
>
> Key: HIVE-22785
> URL: https://issues.apache.org/jira/browse/HIVE-22785
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Krisztian Kasa
>Priority: Critical
> Attachments: HIVE-22785.1.patch, HIVE-22785.2.patch, 
> HIVE-22785.2.patch, HIVE-22785.3.patch
>
>
> Currently, CBO is bypassed for update/delete/merge statements.
> To support optimizing these statements through CBO, we need to complete three 
> main tasks: 1) support for sort in Calcite planner, 2) support for SORT in 
> AST converter, and 3) {{RewriteSemanticAnalyzer}} should extend 
> {{CalcitePlanner}} instead of {{SemanticAnalyzer}}.





[jira] [Updated] (HIVE-22785) Update/delete/merge statements not optimized through CBO

2020-03-04 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-22785:
--
Status: Open  (was: Patch Available)

> Update/delete/merge statements not optimized through CBO
> 
>
> Key: HIVE-22785
> URL: https://issues.apache.org/jira/browse/HIVE-22785
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Krisztian Kasa
>Priority: Critical
> Attachments: HIVE-22785.1.patch, HIVE-22785.2.patch, 
> HIVE-22785.2.patch, HIVE-22785.3.patch
>
>
> Currently, CBO is bypassed for update/delete/merge statements.
> To support optimizing these statements through CBO, we need to complete three 
> main tasks: 1) support for sort in Calcite planner, 2) support for SORT in 
> AST converter, and 3) {{RewriteSemanticAnalyzer}} should extend 
> {{CalcitePlanner}} instead of {{SemanticAnalyzer}}.





[jira] [Updated] (HIVE-22785) Update/delete/merge statements not optimized through CBO

2020-03-04 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-22785:
--
Attachment: HIVE-22785.3.patch

> Update/delete/merge statements not optimized through CBO
> 
>
> Key: HIVE-22785
> URL: https://issues.apache.org/jira/browse/HIVE-22785
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Krisztian Kasa
>Priority: Critical
> Attachments: HIVE-22785.1.patch, HIVE-22785.2.patch, 
> HIVE-22785.2.patch, HIVE-22785.3.patch
>
>
> Currently, CBO is bypassed for update/delete/merge statements.
> To support optimizing these statements through CBO, we need to complete three 
> main tasks: 1) support for sort in Calcite planner, 2) support for SORT in 
> AST converter, and 3) {{RewriteSemanticAnalyzer}} should extend 
> {{CalcitePlanner}} instead of {{SemanticAnalyzer}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22974) Metastore's table location check should be optional

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051583#comment-17051583
 ] 

Hive QA commented on HIVE-22974:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12995577/HIVE-22974.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 18096 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.TestTxnExIm.testMM (batchId=342)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20950/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20950/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20950/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12995577 - PreCommit-HIVE-Build

> Metastore's table location check should be optional
> ---
>
> Key: HIVE-22974
> URL: https://issues.apache.org/jira/browse/HIVE-22974
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22974.1.patch
>
>
> In HIVE-22189 a check was introduced to make sure managed and external tables 
> are located at the proper space. This condition cannot be satisfied during an 
> upgrade.





[jira] [Updated] (HIVE-22865) Include data in replication staging directory

2020-03-04 Thread PRAVIN KUMAR SINHA (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PRAVIN KUMAR SINHA updated HIVE-22865:
--
Attachment: HIVE-22865.8.patch

> Include data in replication staging directory
> -
>
> Key: HIVE-22865
> URL: https://issues.apache.org/jira/browse/HIVE-22865
> Project: Hive
>  Issue Type: Task
>Reporter: PRAVIN KUMAR SINHA
>Assignee: PRAVIN KUMAR SINHA
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch, 
> HIVE-22865.3.patch, HIVE-22865.4.patch, HIVE-22865.5.patch, 
> HIVE-22865.6.patch, HIVE-22865.7.patch, HIVE-22865.8.patch
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler

2020-03-04 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-22954:
---
Attachment: HIVE-22954.10.patch
Status: Patch Available  (was: In Progress)

> Schedule Repl Load using Hive Scheduler
> ---
>
> Key: HIVE-22954
> URL: https://issues.apache.org/jira/browse/HIVE-22954
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, 
> HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, 
> HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, 
> HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.patch
>
>
> [https://github.com/apache/hive/pull/932]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler

2020-03-04 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-22954:
---
Status: In Progress  (was: Patch Available)

> Schedule Repl Load using Hive Scheduler
> ---
>
> Key: HIVE-22954
> URL: https://issues.apache.org/jira/browse/HIVE-22954
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, 
> HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, 
> HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, 
> HIVE-22954.09.patch, HIVE-22954.10.patch, HIVE-22954.patch
>
>
> [https://github.com/apache/hive/pull/932]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397818=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397818
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 19:34
Start Date: 04/Mar/20 19:34
Worklog Time Spent: 10m 
  Work Description: b-slim commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387888277
 
 

 ##
 File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java
 ##
 @@ -133,12 +134,40 @@
   Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema 
is empty Can not go further");
   Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty);
   LOG.debug("Building Avro Reader with schema {}", schemaFromProperty);
-  bytesConverter = new AvroBytesConverter(schema);
+  bytesConverter = getByteConverterForAvroDelegate(schema, tbl);
 } else {
   bytesConverter = new BytesWritableConverter();
 }
   }
 
+  enum BytesConverterType {
+CONFLUENT,
+SKIP,
+NONE;
+
+static BytesConverterType fromString(String value) {
+  try {
+return BytesConverterType.valueOf(value.trim().toUpperCase());
+  } catch (Exception e){
+return NONE;
+  }
+}
+  }
+
+  BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties 
tbl) {
+String avroBytesConverterProperty = tbl.getProperty(AvroSerdeUtils
+
.AvroTableProperties.AVRO_SERDE_TYPE
+.getPropName(), 
BytesConverterType.NONE.toString());
+BytesConverterType avroByteConverterType = 
BytesConverterType.fromString(avroBytesConverterProperty);
+Integer avroSkipBytes = 
Integer.getInteger(tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES
 
 Review comment:
   FYI the goal of code review is not to tell devs what to do, but to help them 
understand why the code needs to be changed 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397818)
Time Spent: 7h 50m  (was: 7h 40m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value - 
> <4 bytes of schema ID> conforms to schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, which 
> respects the format; for the Hive Kafka handler, however, it is a bit of a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from 
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via 
> table property.
> It would be nice to support Confluent format with magic byte.
> Also it would be great to support Schema registry as well.
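
The framing described in the issue can be sketched roughly as follows. This is only an illustration, not the code from the patch: the class and method names are hypothetical, and it assumes the commonly documented Confluent layout of a single zero magic byte followed by a 4-byte big-endian schema ID, with the Avro payload after that 5-byte header.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, not part of Hive: splits a Confluent-framed Kafka value
// into its schema ID and the plain Avro payload.
public class ConfluentHeader {
  // Assumed layout: 1 magic byte (0x0) + 4-byte big-endian schema ID + Avro bytes.
  public static final byte MAGIC_BYTE = 0x0;
  public static final int HEADER_LENGTH = 5;

  /** Returns the schema ID stored in bytes 1..4 of the value. */
  public static int schemaId(byte[] value) {
    checkHeader(value);
    // ByteBuffer reads big-endian by default.
    return ByteBuffer.wrap(value, 1, 4).getInt();
  }

  /** Returns the bytes after the 5-byte header, i.e. the regular Avro payload. */
  public static byte[] avroPayload(byte[] value) {
    checkHeader(value);
    return Arrays.copyOfRange(value, HEADER_LENGTH, value.length);
  }

  private static void checkHeader(byte[] value) {
    if (value == null || value.length < HEADER_LENGTH || value[0] != MAGIC_BYTE) {
      throw new IllegalArgumentException("Value is not Confluent-framed");
    }
  }

  public static void main(String[] args) {
    byte[] framed = {0x0, 0, 0, 0, 42, 1, 2, 3};
    System.out.println("schema id = " + schemaId(framed));
    System.out.println("payload bytes = " + avroPayload(framed).length);
  }
}
```

A deserializer that only needs the payload can drop the header and hand the remaining bytes to an ordinary Avro reader; looking the schema ID up in a Schema Registry is a separate step that this sketch does not attempt.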



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397815=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397815
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 19:33
Start Date: 04/Mar/20 19:33
Worklog Time Spent: 10m 
  Work Description: davidov541 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387887820
 
 

 ##
 File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java
 ##
 @@ -133,12 +134,40 @@
   Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema 
is empty Can not go further");
   Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty);
   LOG.debug("Building Avro Reader with schema {}", schemaFromProperty);
-  bytesConverter = new AvroBytesConverter(schema);
+  bytesConverter = getByteConverterForAvroDelegate(schema, tbl);
 } else {
   bytesConverter = new BytesWritableConverter();
 }
   }
 
+  enum BytesConverterType {
+CONFLUENT,
+SKIP,
+NONE;
+
+static BytesConverterType fromString(String value) {
+  try {
+return BytesConverterType.valueOf(value.trim().toUpperCase());
+  } catch (Exception e){
+return NONE;
+  }
+}
+  }
+
+  BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties 
tbl) {
+String avroBytesConverterProperty = tbl.getProperty(AvroSerdeUtils
+
.AvroTableProperties.AVRO_SERDE_TYPE
+.getPropName(), 
BytesConverterType.NONE.toString());
+BytesConverterType avroByteConverterType = 
BytesConverterType.fromString(avroBytesConverterProperty);
+Integer avroSkipBytes = 
Integer.getInteger(tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES
 
 Review comment:
   Thanks @b-slim , I definitely skipped that multiple times when I was reading 
the code. That is a bizarre issue, but I'm glad you pointed it out. I've fixed 
it, and am running the tests now.
 



Issue Time Tracking
---

Worklog Id: (was: 397815)
Time Spent: 7h 40m  (was: 7.5h)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value - 
> <4 bytes of schema ID> conforms to schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, which 
> respects the format; for the Hive Kafka handler, however, it is a bit of a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from 
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via 
> table property.
> It would be nice to support Confluent format with magic byte.
> Also it would be great to support Schema registry as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22972) Allow table id to be set for table creation requests

2020-03-04 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-22972:
--
Attachment: HIVE-22972.03.patch

> Allow table id to be set for table creation requests
> 
>
> Key: HIVE-22972
> URL: https://issues.apache.org/jira/browse/HIVE-22972
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22972.01.patch, HIVE-22972.02.patch, 
> HIVE-22972.03.patch
>
>
> Hive Metastore should accept requests for table creation where the id is set, 
> ignoring it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397810=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397810
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 19:30
Start Date: 04/Mar/20 19:30
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387885875
 
 

 ##
 File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java
 ##
 @@ -133,12 +134,40 @@
   Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema 
is empty Can not go further");
   Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty);
   LOG.debug("Building Avro Reader with schema {}", schemaFromProperty);
-  bytesConverter = new AvroBytesConverter(schema);
+  bytesConverter = getByteConverterForAvroDelegate(schema, tbl);
 } else {
   bytesConverter = new BytesWritableConverter();
 }
   }
 
+  enum BytesConverterType {
+CONFLUENT,
+SKIP,
+NONE;
+
+static BytesConverterType fromString(String value) {
+  try {
+return BytesConverterType.valueOf(value.trim().toUpperCase());
+  } catch (Exception e){
+return NONE;
+  }
+}
+  }
+
+  BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties 
tbl) {
+String avroBytesConverterProperty = tbl.getProperty(AvroSerdeUtils
+
.AvroTableProperties.AVRO_SERDE_TYPE
+.getPropName(), 
BytesConverterType.NONE.toString());
+BytesConverterType avroByteConverterType = 
BytesConverterType.fromString(avroBytesConverterProperty);
+Integer avroSkipBytes = 
Integer.getInteger(tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES
 
 Review comment:
   tl;dr - Just use `Integer.parseInt(tbl.getProperty())`
 



Issue Time Tracking
---

Worklog Id: (was: 397810)
Time Spent: 7.5h  (was: 7h 20m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value - 
> <4 bytes of schema ID> conforms to schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, which 
> respects the format; for the Hive Kafka handler, however, it is a bit of a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from 
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via 
> table property.
> It would be nice to support Confluent format with magic byte.
> Also it would be great to support Schema registry as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397800=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397800
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 19:21
Start Date: 04/Mar/20 19:21
Worklog Time Spent: 10m 
  Work Description: b-slim commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387881145
 
 

 ##
 File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java
 ##
 @@ -133,12 +134,40 @@
   Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema 
is empty Can not go further");
   Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty);
   LOG.debug("Building Avro Reader with schema {}", schemaFromProperty);
-  bytesConverter = new AvroBytesConverter(schema);
+  bytesConverter = getByteConverterForAvroDelegate(schema, tbl);
 } else {
   bytesConverter = new BytesWritableConverter();
 }
   }
 
+  enum BytesConverterType {
+CONFLUENT,
+SKIP,
+NONE;
+
+static BytesConverterType fromString(String value) {
+  try {
+return BytesConverterType.valueOf(value.trim().toUpperCase());
+  } catch (Exception e){
+return NONE;
+  }
+}
+  }
+
+  BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties 
tbl) {
+String avroBytesConverterProperty = tbl.getProperty(AvroSerdeUtils
+
.AvroTableProperties.AVRO_SERDE_TYPE
+.getPropName(), 
BytesConverterType.NONE.toString());
+BytesConverterType avroByteConverterType = 
BytesConverterType.fromString(avroBytesConverterProperty);
+Integer avroSkipBytes = 
Integer.getInteger(tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES
 
 Review comment:
   @davidov541 please read the Javadoc of 
`java.lang.Integer#getInteger(java.lang.String)`. Its first line says
   `* Determines the integer value of the system property with the
* specified name.`
   Reading the code should help as well 
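
The difference pointed out here can be sketched as below. This is a minimal illustration rather than the code that ended up in the patch: the class name, the helper `readSkipBytes`, and the property key string are assumptions made for the example. `Integer.getInteger(String)` treats its argument as the name of a JVM system property, while `Integer.parseInt(String)` parses the string itself.

```java
import java.util.Properties;

// Hypothetical example class, not part of Hive.
public class SkipBytesProperty {
  // Assumed key name, used only for this illustration.
  public static final String SKIP_BYTES_KEY = "avro.serde.skip.bytes";

  /** Parses an integer table property, falling back to defaultValue if absent or malformed. */
  public static int readSkipBytes(Properties tbl, int defaultValue) {
    String raw = tbl.getProperty(SKIP_BYTES_KEY);
    if (raw == null) {
      return defaultValue;
    }
    try {
      // parseInt converts the property value itself ("5" becomes 5).
      return Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      return defaultValue;
    }
  }

  public static void main(String[] args) {
    Properties tbl = new Properties();
    tbl.setProperty(SKIP_BYTES_KEY, "5");

    // getInteger looks for a JVM system property *named* "5" and returns
    // null when no such system property is set.
    Integer viaGetInteger = Integer.getInteger(tbl.getProperty(SKIP_BYTES_KEY));
    System.out.println("Integer.getInteger: " + viaGetInteger);
    System.out.println("Integer.parseInt:   " + readSkipBytes(tbl, 0));
  }
}
```

Assigning the result of `Integer.getInteger` to an `int`, or unboxing it later, would throw a `NullPointerException` whenever the lookup returns null, which is one way this kind of mistake tends to surface.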
 



Issue Time Tracking
---

Worklog Id: (was: 397800)
Time Spent: 7h 20m  (was: 7h 10m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value - 
> <4 bytes of schema ID> conforms to schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, which 
> respects the format; for the Hive Kafka handler, however, it is a bit of a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from 
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via 
> table property.
> It would be nice to support Confluent format with magic byte.
> Also it would be great to support Schema registry as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22561) Data loss on map join for bucketed, partitioned table

2020-03-04 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-22561:
--

Assignee: Aditya Shah  (was: Jesus Camacho Rodriguez)

> Data loss on map join for bucketed, partitioned table
> -
>
> Key: HIVE-22561
> URL: https://issues.apache.org/jira/browse/HIVE-22561
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Blocker
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-22561.1.branch-3.1.patch, 
> HIVE-22561.branch-3.1.patch, HIVE-22561.patch, Screenshot 2019-11-28 at 
> 8.45.17 PM.png, image-2019-11-28-20-46-25-432.png
>
>
> A map join on a column (which is involved in neither bucketing nor partitioning) 
> causes data loss. 
> Steps to reproduce:
> Env: [hive-dev-box|https://github.com/kgyrtkirk/hive-dev-box] hive 3.1.2.
> Create tables:
>  
> {code:java}
> CREATE TABLE `testj2`(
>   `id` int, 
>   `bn` string, 
>   `cn` string, 
>   `ad` map, 
>   `mi` array)
> PARTITIONED BY ( 
>   `br` string)
> CLUSTERED BY ( 
>   bn) 
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES (
>   'bucketing_version'='2');
> CREATE TABLE `testj1`(
>   `id` int, 
>   `can` string, 
>   `cn` string, 
>   `ad` map, 
>   `av` boolean, 
>   `mi` array)
> PARTITIONED BY ( 
>   `brand` string)
> CLUSTERED BY ( 
>   can) 
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES (
>   'bucketing_version'='2');
> {code}
> Insert some data into both:
> {code:java}
> insert into testj1 values (100, 'mes_1', 'customer_1',  map('city1', 560077), 
> false, array(5, 10), 'brand_1'),
> (101, 'mes_2', 'customer_2',  map('city2', 560078), true, array(10, 20), 
> 'brand_2'),
> (102, 'mes_3', 'customer_3',  map('city3', 560079), false, array(15, 30), 
> 'brand_3'),
> (103, 'mes_4', 'customer_4',  map('city4', 560080), true, array(20, 40), 
> 'brand_4'),
> (104, 'mes_5', 'customer_5',  map('city5', 560081), false, array(25, 50), 
> 'brand_5');
> insert into table testj2 values (100, 'tv_0', 'customer_0', map('city0', 
> 560076),array(0, 0, 0), 'tv'),
> (101, 'tv_1', 'customer_1', map('city1', 560077),array(20, 25, 30), 'tv'),
> (102, 'tv_2', 'customer_2', map('city2', 560078),array(40, 50, 60), 'tv'),
> (103, 'tv_3', 'customer_3', map('city3', 560079),array(60, 75, 90), 'tv'),
> (104, 'tv_4', 'customer_4', map('city4', 560080),array(80, 100, 120), 'tv');
> {code}
> Do a join between them:
> {code:java}
> select t1.id, t1.can, t1.cn, t2.bn,t2.ad, t2.br FROM testj1 t1 JOIN testj2 t2 
> on (t1.id = t2.id) order by t1.id;
> {code}
> Observed results:
> !image-2019-11-28-20-46-25-432.png|width=524,height=100!
> In the plan, I can see a map join. Disabling it gives the correct result.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22561) Data loss on map join for bucketed, partitioned table

2020-03-04 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-22561:
--

Assignee: Jesus Camacho Rodriguez  (was: Aditya Shah)

> Data loss on map join for bucketed, partitioned table
> -
>
> Key: HIVE-22561
> URL: https://issues.apache.org/jira/browse/HIVE-22561
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Aditya Shah
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-22561.1.branch-3.1.patch, 
> HIVE-22561.branch-3.1.patch, HIVE-22561.patch, Screenshot 2019-11-28 at 
> 8.45.17 PM.png, image-2019-11-28-20-46-25-432.png
>
>
> A map join on a column (which is involved in neither bucketing nor partitioning) 
> causes data loss. 
> Steps to reproduce:
> Env: [hive-dev-box|https://github.com/kgyrtkirk/hive-dev-box] hive 3.1.2.
> Create tables:
>  
> {code:java}
> CREATE TABLE `testj2`(
>   `id` int, 
>   `bn` string, 
>   `cn` string, 
>   `ad` map, 
>   `mi` array)
> PARTITIONED BY ( 
>   `br` string)
> CLUSTERED BY ( 
>   bn) 
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES (
>   'bucketing_version'='2');
> CREATE TABLE `testj1`(
>   `id` int, 
>   `can` string, 
>   `cn` string, 
>   `ad` map, 
>   `av` boolean, 
>   `mi` array)
> PARTITIONED BY ( 
>   `brand` string)
> CLUSTERED BY ( 
>   can) 
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES (
>   'bucketing_version'='2');
> {code}
> Insert some data into both:
> {code:java}
> insert into testj1 values (100, 'mes_1', 'customer_1',  map('city1', 560077), 
> false, array(5, 10), 'brand_1'),
> (101, 'mes_2', 'customer_2',  map('city2', 560078), true, array(10, 20), 
> 'brand_2'),
> (102, 'mes_3', 'customer_3',  map('city3', 560079), false, array(15, 30), 
> 'brand_3'),
> (103, 'mes_4', 'customer_4',  map('city4', 560080), true, array(20, 40), 
> 'brand_4'),
> (104, 'mes_5', 'customer_5',  map('city5', 560081), false, array(25, 50), 
> 'brand_5');
> insert into table testj2 values (100, 'tv_0', 'customer_0', map('city0', 
> 560076),array(0, 0, 0), 'tv'),
> (101, 'tv_1', 'customer_1', map('city1', 560077),array(20, 25, 30), 'tv'),
> (102, 'tv_2', 'customer_2', map('city2', 560078),array(40, 50, 60), 'tv'),
> (103, 'tv_3', 'customer_3', map('city3', 560079),array(60, 75, 90), 'tv'),
> (104, 'tv_4', 'customer_4', map('city4', 560080),array(80, 100, 120), 'tv');
> {code}
> Do a join between them:
> {code:java}
> select t1.id, t1.can, t1.cn, t2.bn,t2.ad, t2.br FROM testj1 t1 JOIN testj2 t2 
> on (t1.id = t2.id) order by t1.id;
> {code}
> Observed results:
> !image-2019-11-28-20-46-25-432.png|width=524,height=100!
> In the plan, I can see a map join. Disabling it gives the correct result.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler

2020-03-04 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-22954:
---
Attachment: HIVE-22954.09.patch
Status: Patch Available  (was: In Progress)

> Schedule Repl Load using Hive Scheduler
> ---
>
> Key: HIVE-22954
> URL: https://issues.apache.org/jira/browse/HIVE-22954
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, 
> HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, 
> HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, 
> HIVE-22954.09.patch, HIVE-22954.patch
>
>
> [https://github.com/apache/hive/pull/932]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22954) Schedule Repl Load using Hive Scheduler

2020-03-04 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-22954:
---
Status: In Progress  (was: Patch Available)

> Schedule Repl Load using Hive Scheduler
> ---
>
> Key: HIVE-22954
> URL: https://issues.apache.org/jira/browse/HIVE-22954
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22954.01.patch, HIVE-22954.02.patch, 
> HIVE-22954.03.patch, HIVE-22954.04.patch, HIVE-22954.05.patch, 
> HIVE-22954.06.patch, HIVE-22954.07.patch, HIVE-22954.08.patch, 
> HIVE-22954.09.patch, HIVE-22954.patch
>
>
> [https://github.com/apache/hive/pull/932]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397796=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397796
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 19:17
Start Date: 04/Mar/20 19:17
Worklog Time Spent: 10m 
  Work Description: davidov541 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387878744
 
 

 ##
 File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java
 ##
 @@ -133,12 +134,40 @@
   Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema 
is empty Can not go further");
   Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty);
   LOG.debug("Building Avro Reader with schema {}", schemaFromProperty);
-  bytesConverter = new AvroBytesConverter(schema);
+  bytesConverter = getByteConverterForAvroDelegate(schema, tbl);
 } else {
   bytesConverter = new BytesWritableConverter();
 }
   }
 
+  enum BytesConverterType {
+CONFLUENT,
+SKIP,
+NONE;
+
+static BytesConverterType fromString(String value) {
+  try {
+return BytesConverterType.valueOf(value.trim().toUpperCase());
+  } catch (Exception e){
+return NONE;
+  }
+}
+  }
+
+  BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties 
tbl) {
+String avroBytesConverterProperty = tbl.getProperty(AvroSerdeUtils
+
.AvroTableProperties.AVRO_SERDE_TYPE
+.getPropName(), 
BytesConverterType.NONE.toString());
+BytesConverterType avroByteConverterType = 
BytesConverterType.fromString(avroBytesConverterProperty);
+Integer avroSkipBytes = 
Integer.getInteger(tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES
 
 Review comment:
   I think I'm confused about what you're asking for then. The initialize function 
takes in a java.util.Properties object that has properties that have been set 
for the serde in the DDL for the table. It reads a few from that object, and 
then passes it to getByteConverterForAvroDelegate, where it is also used in the 
code added here. The usage of the properties object here matches what is being 
done in initialize, and seems to match what I would expect. These aren't 
pulling system properties of the JVM, or at least are not necessarily doing so, 
instead reading from the Properties object passed to us.
   
   Does that make sense, or am I way off base?
 



Issue Time Tracking
---

Worklog Id: (was: 397796)
Time Spent: 7h 10m  (was: 7h)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value - 
> <4 bytes of schema ID> conforms to schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, which 
> respects the format; for the Hive Kafka handler, however, it is a bit of a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from 
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via 
> table property.
> It would be nice to support Confluent format with magic byte.
> Also it would be great to support Schema registry as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22974) Metastore's table location check should be optional

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051542#comment-17051542
 ] 

Hive QA commented on HIVE-22974:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 39s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  3s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 33s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  2m 46s{color} | {color:blue} standalone-metastore/metastore-common in master has 35 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m 20s{color} | {color:blue} standalone-metastore/metastore-server in master has 185 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 16s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  2s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 21s{color} | {color:red} standalone-metastore/metastore-server: The patch generated 2 new + 375 unchanged - 0 fixed = 377 total (was 375) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 15s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 53s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20950/dev-support/hive-personality.sh |
| git revision | master / deebfb6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-20950/yetus/diff-checkstyle-standalone-metastore_metastore-server.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20950/yetus/patch-asflicense-problems.txt |
| modules | C: standalone-metastore/metastore-common standalone-metastore/metastore-server U: standalone-metastore |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20950/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Metastore's table location check should be optional
> ---
>
> Key: HIVE-22974
> URL: https://issues.apache.org/jira/browse/HIVE-22974
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Reporter: Attila Magyar
> Assignee: Attila Magyar
> Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22974.1.patch
>
>
> In HIVE-22189 a check was introduced to make sure managed and external tables 
> are located at the proper space. This condition cannot be satisfied during an 
> upgrade.





[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397788
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 19:07
Start Date: 04/Mar/20 19:07
Worklog Time Spent: 10m 
  Work Description: b-slim commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387873547
 
 

 ##
 File path: kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java
 ##
 @@ -133,12 +134,40 @@
   Preconditions.checkArgument(!schemaFromProperty.isEmpty(), "Avro Schema is empty Can not go further");
   Schema schema = AvroSerdeUtils.getSchemaFor(schemaFromProperty);
   LOG.debug("Building Avro Reader with schema {}", schemaFromProperty);
-  bytesConverter = new AvroBytesConverter(schema);
+  bytesConverter = getByteConverterForAvroDelegate(schema, tbl);
 } else {
   bytesConverter = new BytesWritableConverter();
 }
   }
 
+  enum BytesConverterType {
+    CONFLUENT,
+    SKIP,
+    NONE;
+
+    static BytesConverterType fromString(String value) {
+      try {
+        return BytesConverterType.valueOf(value.trim().toUpperCase());
+      } catch (Exception e){
+        return NONE;
+      }
+    }
+  }
+
+  BytesConverter getByteConverterForAvroDelegate(Schema schema, Properties tbl) {
+    String avroBytesConverterProperty = tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_TYPE.getPropName(), BytesConverterType.NONE.toString());
+    BytesConverterType avroByteConverterType = BytesConverterType.fromString(avroBytesConverterProperty);
+    Integer avroSkipBytes = Integer.getInteger(tbl.getProperty(AvroSerdeUtils.AvroTableProperties.AVRO_SERDE_SKIP_BYTES
 
 Review comment:
   @davidov541 did you read the comment above? The function you are using, 
`Integer.getInteger`, reads a JVM system property rather than parsing the 
string; please see its implementation, which calls `System.getProperty(nm)`. 
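
A minimal sketch of the difference the review comment points at, not the code from the pull request. The class name `SkipBytesExample`, its methods, and the key `example.skip.bytes` are hypothetical stand-ins; only `Integer.getInteger`, `Integer.parseInt`, and `Properties.getProperty` are real JDK calls.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not part of Hive.
final class SkipBytesExample {
  // Hypothetical key standing in for the table property that holds the skip-byte count.
  static final String SKIP_BYTES_KEY = "example.skip.bytes";

  // Mirrors the pattern under review: the property VALUE (e.g. "5") is passed to
  // Integer.getInteger, which treats it as the NAME of a JVM system property.
  static Integer viaGetInteger(Properties tbl) {
    return Integer.getInteger(tbl.getProperty(SKIP_BYTES_KEY));
  }

  // Parses the value held in the Properties object itself, falling back to a
  // default when the key is missing or the value is not a valid integer.
  static int viaParseInt(Properties tbl, int defaultValue) {
    String raw = tbl.getProperty(SKIP_BYTES_KEY);
    if (raw == null) {
      return defaultValue;
    }
    try {
      return Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      return defaultValue;
    }
  }
}
```

With the key set to `"5"`, `viaGetInteger` looks up a system property named `5` and yields `null` unless the JVM happens to define one, while `viaParseInt` returns the configured value.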
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397788)
Time Spent: 7h  (was: 6h 50m)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
> Issue Type: Bug
> Components: kafka integration, Serializers/Deserializers
> Affects Versions: 3.1.1
> Reporter: Milan Baran
> Assignee: David McGinnis
> Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A],
> the Confluent Avro serializer uses a proprietary format for the Kafka value: 
> <magic_byte 0x00><4 bytes of schema ID><regular avro bytes for object that 
> conforms to schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, 
> which respects the format. For the Hive Kafka handler, however, it is a bit of 
> a problem to deserialize the Kafka value correctly, because Hive uses a custom 
> deserializer from bytes to objects and ignores the Kafka consumer ser/deser 
> classes provided via table property.
> It would be nice to support the Confluent format with the magic byte.
> It would also be great to support Schema Registry.
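
The framing the description refers to (a magic byte, then a 4-byte schema ID, then the Avro bytes) can be sketched as below. This is an illustration under assumptions, not the patch attached to the issue: the class `ConfluentFraming` and its methods are hypothetical, and the sketch assumes a magic byte of `0x00` and a big-endian schema ID.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Hive or Confluent.
final class ConfluentFraming {
  static final byte MAGIC_BYTE = 0x0;
  static final int HEADER_LENGTH = 5; // 1 magic byte + 4-byte schema ID

  // Returns the schema ID stored in bytes 1..4 (big-endian, ByteBuffer's default order).
  static int schemaId(byte[] value) {
    checkHeader(value);
    return ByteBuffer.wrap(value, 1, 4).getInt();
  }

  // Returns the Avro-encoded payload that follows the 5-byte header.
  static byte[] avroPayload(byte[] value) {
    checkHeader(value);
    return Arrays.copyOfRange(value, HEADER_LENGTH, value.length);
  }

  // Rejects values that are too short or do not start with the magic byte.
  private static void checkHeader(byte[] value) {
    if (value == null || value.length < HEADER_LENGTH || value[0] != MAGIC_BYTE) {
      throw new IllegalArgumentException("Not a Confluent-framed value");
    }
  }
}
```

A deserializer that is unaware of this header would hand all bytes, header included, to the Avro decoder, which is why plain Avro decoding of such values fails or yields garbage.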





[jira] [Work logged] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?focusedWorklogId=397779&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397779
 ]

ASF GitHub Bot logged work on HIVE-21218:
-

Author: ASF GitHub Bot
Created on: 04/Mar/20 19:00
Start Date: 04/Mar/20 19:00
Worklog Time Spent: 10m 
  Work Description: cricket007 commented on pull request #933: HIVE-21218: 
Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387866510
 
 

 ##
 File path: kafka-handler/pom.xml
 ##
 @@ -190,5 +207,27 @@
 
   
 
+
+  
+
+  <groupId>org.apache.avro</groupId>
+  <artifactId>avro-maven-plugin</artifactId>
+  <version>1.8.1</version>
 
 Review comment:
   Is the Avro version stored in properties anywhere else? Confluent uses Avro 
1.9.x now 
 



Issue Time Tracking
---

Worklog Id: (was: 397779)
Time Spent: 6h 40m  (was: 6.5h)

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
> Issue Type: Bug
> Components: kafka integration, Serializers/Deserializers
> Affects Versions: 3.1.1
> Reporter: Milan Baran
> Assignee: David McGinnis
> Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.patch
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A],
> the Confluent Avro serializer uses a proprietary format for the Kafka value: 
> <magic_byte 0x00><4 bytes of schema ID><regular avro bytes for object that 
> conforms to schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, 
> which respects the format. For the Hive Kafka handler, however, it is a bit of 
> a problem to deserialize the Kafka value correctly, because Hive uses a custom 
> deserializer from bytes to objects and ignores the Kafka consumer ser/deser 
> classes provided via table property.
> It would be nice to support the Confluent format with the magic byte.
> It would also be great to support Schema Registry.




