[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json
[ https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357000#comment-17357000 ]

Zhihua Deng commented on HIVE-25188:

{quote}
The "data" field is not a valid JSON String type and therefore we should not allow this type of interaction.
{quote}
That is a good clarification, thanks. I will close the PR.

> JsonSerDe: Unable to read the string value from a nested json
> -------------------------------------------------------------
>
> Key: HIVE-25188
> URL: https://issues.apache.org/jira/browse/HIVE-25188
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Affects Versions: 4.0.0
> Reporter: Zhihua Deng
> Assignee: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Steps to reproduce:
> create table json_table(data string, messageid string, publish_time bigint, attributes string);
>
> If the data of the table is stored like:
> {code:java}
> {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}}{code}
> An exception will be thrown when trying to deserialize the data:
>
> Caused by: java.lang.IllegalArgumentException
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
> at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitLeafNode(HiveJsonReader.java:374)
> at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:216)
> at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitStructNode(HiveJsonReader.java:327)
> at org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:221)
> at org.apache.hadoop.hive.serde2.json.HiveJsonReader.parseStruct(HiveJsonReader.java:198)
> at org.apache.hadoop.hive.serde2.JsonSerDe.deserialize(JsonSerDe.java:181)

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json
[ https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356997#comment-17356997 ]

Zhihua Deng commented on HIVE-25188:

Thank you for the comments, [~belugabehr]! Agreed that being "too lenient" could bring some problems. The main concern is that, weighed against the maintenance cost afterwards, solving or improving this problem is not worth it. The Jira was only meant to solve reading a nested JSON value; no further features or complex logic were introduced.
[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json
[ https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356612#comment-17356612 ]

David Mollitor commented on HIVE-25188:
---------------------------------------

[~dengzh] As I understand the request, I am not in support of it. The "data" field is not a valid JSON String type and therefore we should not allow this type of interaction. Hive is already far too lenient in what it allows, which leads to breakdowns in testing, knowledge debt, and a larger testing surface area.

Just my opinion on the matter; maybe others disagree and can chime in.
[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json
[ https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356065#comment-17356065 ]

Zhihua Deng commented on HIVE-25188:

Oh, sorry for the malformed JSON. Yes, we can encode the data, decode it when using it, and inspect the value with a JSON function. Done that way, though, the JSON file is not easy to read and understand at first glance, and an extra encoding/decoding step is introduced just so that Hive can read the JSON.

In fact, [org.openx.data.jsonserde.JsonSerDe|https://github.com/rcongiu/Hive-JSON-Serde/blob/develop/json-serde/src/main/java/org/openx/data/jsonserde/JsonSerDe.java] can retrieve the "data" field directly, but we hope to migrate from it to Hive's embedded JSON SerDe utilities.
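The encode-then-decode workaround discussed in the comment above can be sketched in a few lines. This is only an illustration in Python (not Hive code), using a shortened version of the sample payload; the variable names are made up for the example. It shows the nested payload written as an escaped JSON string, so that "data" really is a JSON String, and the second parse a reader then needs.

```python
import json

# Shortened version of the nested payload from the issue (illustrative only).
payload = {"H": {"event": "track_active", "platform": "Android"}}

# Writer side: embed the payload as an escaped JSON string,
# so the "data" field is a genuine JSON String.
line = json.dumps({"data": json.dumps(payload), "messageId": "2475185636801962"})

# Reader side: the first parse yields a plain str for "data";
# a second parse is needed to get the struct back.
outer = json.loads(line)
inner = json.loads(outer["data"])

print(line)
print(inner["H"]["event"])
```

The printed line contains escaped quotes inside the "data" value, which is the readability cost mentioned above.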
[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json
[ https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355908#comment-17355908 ]

David Mollitor commented on HIVE-25188:
---------------------------------------

[~dengzh] I've formatted the JSON to make it easier to read for discussion's sake. FYI, there are a few stray characters at the end of your example that were giving me issues during formatting.

{code:json}
{
  "data": {
    "H": {
      "event": "track_active",
      "platform": "Android"
    },
    "B": {
      "device_type": "Phone",
      "uuid": "[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"
    }
  },
  "messageId": "2475185636801962",
  "publish_time": 1622514629783,
  "attributes": {
    "region": "IN"
  }
}
{code}

create table json_table(data string, messageid string, publish_time bigint, attributes string);

The {{data}} field is not a String type. It is itself a struct. If you intend to do something like stuffing arbitrary data in that field, then "data" should be a Base-64 string and then you can declare it as a Binary type in Hive. I think that's the preferred approach instead of just allowing an overloaded String type.
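The Base-64 suggestion above can be sketched from the writer's side. This is a minimal Python illustration, not Hive code; it assumes the producer controls the file format, and it does not show how the table itself would be declared or configured. The payload is serialized, Base-64 encoded, and stored as an ordinary JSON string, and a consumer decodes it back.

```python
import base64
import json

# Nested payload from the issue's sample record.
payload = {
    "H": {"event": "track_active", "platform": "Android"},
    "B": {"device_type": "Phone"},
}

# Writer side: serialize the payload, then Base-64 encode the bytes.
raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
record = {
    "data": base64.b64encode(raw).decode("ascii"),  # a valid JSON String
    "messageId": "2475185636801962",
    "publish_time": 1622514629783,
}
line = json.dumps(record)

# Consumer side: decode the Base-64 text, then parse the bytes as JSON.
decoded = json.loads(base64.b64decode(json.loads(line)["data"]))

print(line)
print(decoded == payload)
```

Every top-level value in the written line is now a scalar, so no column declared as a primitive type has to absorb a struct.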
[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json
[ https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355802#comment-17355802 ]

Zhihua Deng commented on HIVE-25188:

Hey [~belugabehr],

As you have seen, if a user creates such a table to retrieve the "data" field, an exception is thrown even though the input is valid, complete JSON. So here I propose returning the struct value for the "data" field, so that the value mapped to "data" can then be read with a JSON library.

Thanks,
Zhihua Deng
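One reading of the proposal above is a lenient read: when a column is declared as string but the JSON value is a struct, hand back the struct's JSON text. The Python sketch below illustrates that behaviour only; `read_string_column` is a hypothetical helper, and nothing here is taken from the pull request or from Hive's implementation.

```python
import json


def read_string_column(record_text, field):
    """Lenient read: return JSON strings unchanged and any other
    value (struct, array, number) as compact JSON text."""
    value = json.loads(record_text)[field]
    if isinstance(value, str):
        return value
    # Nested or non-string value: re-serialize without extra whitespace.
    return json.dumps(value, separators=(",", ":"))


record = '{"attributes":{"region":"IN"},"messageId":"2475185636801962"}'
print(read_string_column(record, "attributes"))
print(read_string_column(record, "messageId"))
```

A downstream query would then have to parse the returned text again with a JSON function, which is the overloading of the String type that the rest of the thread argues against.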
[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json
[ https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355717#comment-17355717 ]

David Mollitor commented on HIVE-25188:
---------------------------------------

Hello and thanks for the report.

The "data" field is not a String, it's a struct. What do you propose is the expected behavior here?
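The type mismatch pointed out above can be checked directly against the sample record. The Python sketch below is an illustration, not Hive's logic: `json_kind` and `mismatched_columns` are hypothetical helpers, the stray trailing characters of the sample are removed so that it parses, and matching JSON keys to column names case-insensitively is an assumption made for the example.

```python
import json

# Sample record from the issue, without the stray trailing characters.
RECORD = (
    '{"data":{"H":{"event":"track_active","platform":"Android"},'
    '"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,'
    '11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962",'
    '"publish_time":1622514629783,"attributes":{"region":"IN"}}'
)

# Column types as declared in the CREATE TABLE statement from the issue.
COLUMNS = {
    "data": "string",
    "messageid": "string",
    "publish_time": "bigint",
    "attributes": "string",
}


def json_kind(value):
    """Name the kind of a decoded JSON value (hypothetical helper)."""
    if isinstance(value, dict):
        return "struct"
    if isinstance(value, list):
        return "array"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if value is None:
        return "null"
    return "string"


def mismatched_columns(record_text, columns):
    """Return {column: (declared, actual)} where the JSON kind differs."""
    # Assumption: keys are matched to column names case-insensitively.
    record = {k.lower(): v for k, v in json.loads(record_text).items()}
    return {
        name: (declared, json_kind(record[name]))
        for name, declared in columns.items()
        if name in record and json_kind(record[name]) != declared
    }


print(mismatched_columns(RECORD, COLUMNS))
```

Under these assumptions the check flags both "data" and "attributes": each is declared as string in the table but is a struct in the record.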