Hi all, Recently, I'm working on a patch to add a built-in function for parsing JSON strings and extracting values in them: https://gerrit.cloudera.org/#/c/10950. We've not reached an agreement on the name of this function. So I reach here for a broader discussion.
*-- Background --* Hive has a built-in function (get_json_object) for this purpose. But it's a Java UDF. Impala can't track its memory usage. The current patch adds built-in support for this function. Greg suggested that we name this function as json_value(). It's a function in ANSI SQL standard. Note that json_value() only returns scalar types so json_query() in the standard may be more fit. The discussion is about which of the following options we should choose for JSON support: (a) Support get_json_object for compatibility to Hive; (b) Support JSON functions in ANSI SQL standard; (c) Support both. I used to be in favor of (c). But Zoltan points out there's a json_object() function in ANSI standard with a quite different meaning which will confuse users with the Hive get_json_object UDF. Maybe we can only choose (a) or (b). *-- SQL supports in other systems --* Oracle supports JSON functions in ANSI standard, while MySQL does not. There's a JSON_EXTRACT function in MySQL contains all the ability of Hive's get_json_object function. Presto and Greenplum do not support ANSI JSON standards too: https://prestodb.io/docs/current/functions/json.html, https://gpdb.docs.pivotal.io/5100/admin_guide/query/topics/json-data.html Here're the previous discussions: https://gerrit.cloudera.org/#/c/10950/13/common/function-registry/impala_functions.py@514 What do you think? Thanks, Quanlong