[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-03 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357000#comment-17357000
 ] 

Zhihua Deng commented on HIVE-25188:


{quote}

The "data" field is not a valid JSON String type and therefore we should not 
allow this type of interaction. 

{quote}

That's a good clarification, thanks. Closing the PR.

> JsonSerDe: Unable to read the string value from a nested json
> -
>
> Key: HIVE-25188
> URL: https://issues.apache.org/jira/browse/HIVE-25188
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> create table json_table(data string, messageid string, publish_time bigint, attributes string);
>  
> If the table's data is stored like this:
> {code:java}
> {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}}{code}
> An exception is thrown when deserializing the data:
>  
> Caused by: java.lang.IllegalArgumentException
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
>  at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitLeafNode(HiveJsonReader.java:374)
>  at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:216)
>  at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitStructNode(HiveJsonReader.java:327)
>  at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:221)
>  at org.apache.hadoop.hive.serde2.json.HiveJsonReader.parseStruct(HiveJsonReader.java:198)
>  at org.apache.hadoop.hive.serde2.JsonSerDe.deserialize(JsonSerDe.java:181)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-03 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356997#comment-17356997
 ] 

Zhihua Deng commented on HIVE-25188:


Thank you for the comments, [~belugabehr]! Agreed that being "too lenient" could 
bring some problems; the main concern is that, weighed against the maintenance 
cost afterwards, the problem is not worth solving or improving. This Jira only 
aims to support reading a nested JSON value; no further features or complex 
logic are introduced.



[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-03 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356612#comment-17356612
 ] 

David Mollitor commented on HIVE-25188:
---

[~dengzh] As I understand the request, I am not in support of it. The "data" 
field is not a valid JSON String type and therefore we should not allow this 
type of interaction. Hive is already far too lenient in what it allows, which 
leads to breakdowns in testing, knowledge debt, and a larger testing surface 
area. Just my opinion on the matter; maybe others disagree and can chime in.



[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-02 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356065#comment-17356065
 ] 

Zhihua Deng commented on HIVE-25188:


Sorry for the malformed JSON. Yes, we can encode the data and decode it when 
using it, inspecting the value with some JSON function. That way, however, the 
JSON file may not be easy to read and understand at first glance, and an extra 
encoding/decoding step is needed before Hive can read the JSON. In fact, 
[org.openx.data.jsonserde.JsonSerDe|https://github.com/rcongiu/Hive-JSON-Serde/blob/develop/json-serde/src/main/java/org/openx/data/jsonserde/JsonSerDe.java]
 can retrieve the "data" field directly, but we hope to migrate to Hive's 
built-in JSON SerDe.
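One possible encoding for the "encode, then decode with a JSON function" route (an assumption; the thread does not specify which encoding) is to store the nested document as an escaped JSON string literal. The record then carries a valid JSON String for "data", so a STRING column fits, and the text could later be inspected with a JSON function such as Hive's {{get_json_object}}. The class and method names below ({{JsonStringEncoding}}, {{toJsonStringLiteral}}) are hypothetical; this is a minimal JDK-only sketch of the escaping step, not code from Hive.

{code:java}
/**
 * Hypothetical helper: turns raw text (here, a nested JSON document) into a
 * JSON string literal so that it can be stored in a STRING column.
 * Escapes only what JSON strings require (RFC 8259, section 7):
 * quote, backslash and control characters below 0x20.
 */
final class JsonStringEncoding {

    private JsonStringEncoding() {}

    static String toJsonStringLiteral(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 2);
        sb.append('"');
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters use the six-character hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String nested = "{\"H\":{\"event\":\"track_active\",\"platform\":\"Android\"}}";
        // "data" now holds a JSON String instead of a JSON object.
        String record = "{\"data\":" + toJsonStringLiteral(nested)
                + ",\"messageId\":\"2475185636801962\"}";
        System.out.println(record);
    }
}
{code}

The trade-off Zhihua describes is visible in the printed record: the nested document is no longer readable at a glance because every quote is escaped.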



[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-02 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355908#comment-17355908
 ] 

David Mollitor commented on HIVE-25188:
---

[~dengzh] I've formatted the JSON to make it easier to read for discussion's 
sake. FYI, there are a few stray characters at the end of your example that 
were giving me issues during formatting.

{code:json}
{
  "data": {
    "H": {
      "event": "track_active",
      "platform": "Android"
    },
    "B": {
      "device_type": "Phone",
      "uuid": "[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"
    }
  },
  "messageId": "2475185636801962",
  "publish_time": 1622514629783,
  "attributes": {
    "region": "IN"
  }
}
{code}
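The stray characters at the end of the reported line can be located mechanically. The hypothetical class {{TrailingText}} below (plain JDK, a sketch rather than a validator) tracks only brace/bracket nesting and string state and returns whatever follows the first complete top-level object or array. On the line from the original report it returns the trailing quote and two closing braces.

{code:java}
/**
 * Hypothetical sanity check: returns the text that follows the first
 * complete top-level JSON object or array, trimmed. It tracks only nesting
 * and string state, so it does not validate the JSON itself.
 */
final class TrailingText {

    private TrailingText() {}

    static String after(String json) {
        int depth = 0;
        boolean inString = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                // Skip the escaped character; an unescaped quote ends the string.
                if (c == '\\') {
                    i++;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{' || c == '[') {
                depth++;
            } else if ((c == '}' || c == ']') && --depth == 0) {
                // The root value closes here; everything after it is leftover.
                return json.substring(i + 1).trim();
            }
        }
        throw new IllegalArgumentException("no complete top-level object or array");
    }

    public static void main(String[] args) {
        String reported = "{\"data\":{\"H\":{\"event\":\"track_active\",\"platform\":\"Android\"},"
                + "\"B\":{\"device_type\":\"Phone\",\"uuid\":\"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,"
                + "11883bee-a7aa-4010-8a66-6c3c63a73f16]\"}},\"messageId\":\"2475185636801962\","
                + "\"publish_time\":1622514629783,\"attributes\":{\"region\":\"IN\"}}\"}}";
        System.out.println("leftover: " + after(reported));
    }
}
{code}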

create table json_table(data string, messageid string, publish_time bigint, attributes string);

The {{data}} field is not a String type; it is itself a struct. If you intend 
to stuff arbitrary data into that field, then "data" should be a Base64 string, 
and you can declare it as a BINARY type in Hive. I think that's the preferred 
approach, rather than allowing an overloaded String type.
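The Base64 suggestion can be sketched on the producer side with {{java.util.Base64}} alone: the nested payload is encoded before the record is written, so "data" becomes an ordinary JSON String. The class name {{Base64Payload}} is hypothetical, and whether a given Hive version's JsonSerDe decodes such a field into a BINARY column exactly as expected is an assumption worth verifying.

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Hypothetical producer-side helper: carries the nested payload as a
 * standard (RFC 4648) Base64 string so that the JSON record stays well typed.
 */
final class Base64Payload {

    private Base64Payload() {}

    /** Encodes the raw nested JSON text as Base64; the output never needs JSON escaping. */
    static String encode(String nestedJson) {
        return Base64.getEncoder()
                .encodeToString(nestedJson.getBytes(StandardCharsets.UTF_8));
    }

    /** Reverses encode(); throws IllegalArgumentException on malformed input. */
    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String nested = "{\"H\":{\"event\":\"track_active\",\"platform\":\"Android\"}}";
        String record = "{\"data\":\"" + encode(nested)
                + "\",\"messageId\":\"2475185636801962\"}";
        System.out.println(record);
        System.out.println(decode(encode(nested)));
    }
}
{code}

Base64 output uses only letters, digits, {{+}}, {{/}} and {{=}}, so it can be dropped between quotes without any further escaping.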




[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-02 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355802#comment-17355802
 ] 

Zhihua Deng commented on HIVE-25188:


Hey [~belugabehr], as you have seen, if a user creates such a table to retrieve 
the "data" field, an exception is thrown even though the input is valid, 
complete JSON. So I propose returning the struct value as a string for the 
"data" field; that way, we can get the value mapped to the field "data" using a 
JSON library.

Thanks, Zhihua Deng
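To make the proposal concrete: when a STRING column meets a JSON object, it would receive the raw JSON text of that object. The hypothetical {{RawFieldValue.rawValueOf}} below is a hand-rolled, JDK-only sketch of that idea for top-level fields of a root object; it does not validate its input and is NOT how {{HiveJsonReader}} is implemented.

{code:java}
/**
 * Hypothetical sketch of the proposal: return the raw JSON text of a
 * top-level field's value (object, array, string, number, ...).
 * Root object only; no validation. Not HiveJsonReader's implementation.
 */
final class RawFieldValue {

    private RawFieldValue() {}

    /** Raw text of the value of a top-level field, or null if the field is absent. */
    static String rawValueOf(String json, String field) {
        String key = "\"" + field + "\"";
        int depth = 0;
        boolean inString = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                // Skip the escaped character; an unescaped quote ends the string.
                if (c == '\\') {
                    i++;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '{' || c == '[') {
                depth++;
            } else if (c == '}' || c == ']') {
                depth--;
            } else if (c == '"') {
                // A key of the root object sits at depth 1 and is followed by ':'.
                if (depth == 1 && json.startsWith(key, i)) {
                    int colon = skipWs(json, i + key.length());
                    if (colon < json.length() && json.charAt(colon) == ':') {
                        int start = skipWs(json, colon + 1);
                        return start < json.length()
                                ? json.substring(start, endOfValue(json, start))
                                : null;
                    }
                }
                inString = true;
            }
        }
        return null;
    }

    /** Index just past the JSON value that begins at {@code start}. */
    private static int endOfValue(String json, int start) {
        char first = json.charAt(start);
        if (first == '"') {
            int i = start + 1;
            while (json.charAt(i) != '"') {
                i += json.charAt(i) == '\\' ? 2 : 1;
            }
            return i + 1;
        }
        if (first == '{' || first == '[') {
            int depth = 0;
            boolean inString = false;
            for (int i = start; i < json.length(); i++) {
                char c = json.charAt(i);
                if (inString) {
                    if (c == '\\') {
                        i++;
                    } else if (c == '"') {
                        inString = false;
                    }
                } else if (c == '"') {
                    inString = true;
                } else if (c == '{' || c == '[') {
                    depth++;
                } else if ((c == '}' || c == ']') && --depth == 0) {
                    return i + 1;
                }
            }
            throw new IllegalArgumentException("unterminated value at " + start);
        }
        // Number, boolean or null: runs until a delimiter or whitespace.
        int i = start;
        while (i < json.length() && ",}] \t\r\n".indexOf(json.charAt(i)) < 0) {
            i++;
        }
        return i;
    }

    private static int skipWs(String json, int i) {
        while (i < json.length() && Character.isWhitespace(json.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String json = "{\"data\":{\"H\":{\"event\":\"track_active\",\"platform\":\"Android\"},"
                + "\"B\":{\"device_type\":\"Phone\",\"uuid\":\"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,"
                + "11883bee-a7aa-4010-8a66-6c3c63a73f16]\"}},\"messageId\":\"2475185636801962\","
                + "\"publish_time\":1622514629783,\"attributes\":{\"region\":\"IN\"}}";
        System.out.println(rawValueOf(json, "data"));
        System.out.println(rawValueOf(json, "messageId"));
    }
}
{code}

Note that for an ordinary string field this sketch returns the quoted raw text (for example {{"2475185636801962"}} with its quotes), which highlights one detail such a change would have to settle: whether raw text is returned only for objects and arrays.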

 



[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-02 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355717#comment-17355717
 ] 

David Mollitor commented on HIVE-25188:
---

Hello and thanks for the report.

 

The "data" field is not a String, it's a struct.  What do you propose is the 
expected behavior here?
