Re: JSON support: Hive compatibility or ANSI SQL standard
+1 for option c

On Mon, Sep 10, 2018 at 10:34 AM, Tim Armstrong wrote:
> I think I agree with Lars. Having the (mostly) Hive-compatible version is
> useful if there are shared views between Hive/Impala and for people
> migrating queries from Hive or some of the Impala JSON UDFs that I've seen
> floating around.
Re: JSON support: Hive compatibility or ANSI SQL standard
I think I agree with Lars. Having the (mostly) Hive-compatible version is useful if there are shared views between Hive/Impala and for people migrating queries from Hive or some of the Impala JSON UDFs that I've seen floating around.

On Mon, Sep 10, 2018 at 8:20 AM, Lars Volker wrote:
> Thanks for the comprehensive summary, Quanlong.
>
> I'm in favor of (c), but I don't feel strongly about the order in which we
> should prioritize them. Being compatible with Hive and following the SQL
> standard will help users switch more easily between systems. I'm not too
> worried about confusing users. Even if the functions have similar names,
> people are not going to confuse them by accident, and we can add a warning
> to the docs to point out that get_json_object should not be confused with
> the ANSI SQL function json_object.
>
> Cheers, Lars
Re: JSON support: Hive compatibility or ANSI SQL standard
Thanks for the comprehensive summary, Quanlong.

I'm in favor of (c), but I don't feel strongly about the order in which we should prioritize them. Being compatible with Hive and following the SQL standard will help users switch more easily between systems. I'm not too worried about confusing users. Even if the functions have similar names, people are not going to confuse them by accident, and we can add a warning to the docs to point out that get_json_object should not be confused with the ANSI SQL function json_object.

Cheers, Lars

On Mon, Sep 10, 2018 at 7:57 AM Quanlong Huang wrote:
> Hi all,
>
> Recently, I've been working on a patch to add a built-in function for
> parsing JSON strings and extracting values from them:
> https://gerrit.cloudera.org/#/c/10950. We haven't reached an agreement on
> the name of this function, so I'm reaching out here for a broader
> discussion.
>
> *-- Background --*
>
> Hive has a built-in function (get_json_object) for this purpose, but it's
> a Java UDF, so Impala can't track its memory usage. The current patch adds
> built-in support for this function.
>
> Greg suggested that we name this function json_value(), which is a
> function in the ANSI SQL standard. Note that json_value() only returns
> scalar types, so json_query() in the standard may be a better fit.
>
> The discussion is about which of the following options we should choose
> for JSON support:
>
> (a) Support get_json_object for compatibility with Hive;
> (b) Support the JSON functions in the ANSI SQL standard;
> (c) Support both.
>
> I used to be in favor of (c). But Zoltan points out that there's a
> json_object() function in the ANSI standard with a quite different
> meaning, which users may confuse with the Hive get_json_object UDF. Maybe
> we can only choose (a) or (b).
>
> *-- SQL support in other systems --*
>
> Oracle supports the JSON functions in the ANSI standard, while MySQL does
> not. MySQL has a JSON_EXTRACT function that covers all the capabilities of
> Hive's get_json_object function.
>
> Presto and Greenplum do not support the ANSI JSON standard either:
> https://prestodb.io/docs/current/functions/json.html,
> https://gpdb.docs.pivotal.io/5100/admin_guide/query/topics/json-data.html
>
> Here are the previous discussions:
> https://gerrit.cloudera.org/#/c/10950/13/common/function-registry/impala_functions.py@514
>
> What do you think?
>
> Thanks,
> Quanlong
JSON support: Hive compatibility or ANSI SQL standard
Hi all,

Recently, I've been working on a patch to add a built-in function for parsing JSON strings and extracting values from them: https://gerrit.cloudera.org/#/c/10950. We haven't reached an agreement on the name of this function, so I'm reaching out here for a broader discussion.

*-- Background --*

Hive has a built-in function (get_json_object) for this purpose, but it's a Java UDF, so Impala can't track its memory usage. The current patch adds built-in support for this function.

Greg suggested that we name this function json_value(), which is a function in the ANSI SQL standard. Note that json_value() only returns scalar types, so json_query() in the standard may be a better fit.

The discussion is about which of the following options we should choose for JSON support:

(a) Support get_json_object for compatibility with Hive;
(b) Support the JSON functions in the ANSI SQL standard;
(c) Support both.

I used to be in favor of (c). But Zoltan points out that there's a json_object() function in the ANSI standard with a quite different meaning, which users may confuse with the Hive get_json_object UDF. Maybe we can only choose (a) or (b).

*-- SQL support in other systems --*

Oracle supports the JSON functions in the ANSI standard, while MySQL does not. MySQL has a JSON_EXTRACT function that covers all the capabilities of Hive's get_json_object function.

Presto and Greenplum do not support the ANSI JSON standard either:
https://prestodb.io/docs/current/functions/json.html,
https://gpdb.docs.pivotal.io/5100/admin_guide/query/topics/json-data.html

Here are the previous discussions: https://gerrit.cloudera.org/#/c/10950/13/common/function-registry/impala_functions.py@514

What do you think?

Thanks,
Quanlong
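
To make the naming question concrete, here is a rough Python sketch of how the three behaviours discussed above might differ. It is only an illustration under stated assumptions, not the code in the patch and not Hive's implementation: the helper `_walk` and its tiny path subset (`$`, `.name`, `[index]`) are hypothetical, and returning None from `json_query` for scalars is an assumption, since implementations of the standard differ on that point.

```python
import json
import re

# One path step: ".name" or "[index]". Real path languages support far more.
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")
_MISSING = object()


def _walk(doc, path):
    """Hypothetical helper: return the node at `path` in JSON text `doc`."""
    if not path.startswith("$"):
        return _MISSING
    try:
        node = json.loads(doc)
    except ValueError:
        return _MISSING
    pos = 1
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            return _MISSING
        key, index = m.group(1), m.group(2)
        if key is not None:
            if not isinstance(node, dict) or key not in node:
                return _MISSING
            node = node[key]
        else:
            i = int(index)
            if not isinstance(node, list) or i >= len(node):
                return _MISSING
            node = node[i]
        pos = m.end()
    return node


def _to_text(node):
    # Strings come back unquoted; everything else as compact JSON text.
    if isinstance(node, str):
        return node
    return json.dumps(node, separators=(",", ":"))


def get_json_object(doc, path):
    """Hive-style sketch: any matched node is returned as a string."""
    node = _walk(doc, path)
    if node is _MISSING or node is None:
        return None
    return _to_text(node)


def json_value(doc, path):
    """ANSI-style sketch: scalars only; objects and arrays yield None."""
    node = _walk(doc, path)
    if node is _MISSING or node is None or isinstance(node, (dict, list)):
        return None
    return _to_text(node)


def json_query(doc, path):
    """ANSI-style sketch: objects and arrays only (assumption for scalars)."""
    node = _walk(doc, path)
    if not isinstance(node, (dict, list)):
        return None
    return _to_text(node)
```

With a document such as `{"a": {"b": [1, 2]}, "s": "x"}`, the sketch's `get_json_object` answers both `$.s` and `$.a`, whereas `json_value` answers only the scalar path and `json_query` only the object path. That split is why a single ANSI name does not map cleanly onto the Hive function.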
Re: [VOTE] Removing emeritus and activity requirements
+1 (non-binding)

On Mon, Sep 10, 2018 at 3:13 PM Yongjun Zhang wrote:
> +1 (non-binding)
>
> Thanks.
>
> --Yongjun
Re: [VOTE] Removing emeritus and activity requirements
+1 (non-binding)

Thanks.

--Yongjun

On Sun, Sep 9, 2018 at 9:58 PM, Henry Robinson wrote:
> +1 (binding)
>
> On Sun, 9 Sep 2018 at 18:41, Lars Volker wrote:
> > +1 (binding)
> >
> > On Sun, Sep 9, 2018, 18:36 Brock Noland wrote:
> > > +1
> > >
> > > On Sun, Sep 9, 2018 at 8:15 PM, Jim Apple wrote:
> > > > Please vote on the following diff to the bylaws:
> > > >
> > > > https://gerrit.cloudera.org/#/c/11407/
> > > >
> > > > The vote will conclude Thursday morning, California time. Only PMC
> > > > votes are binding, but everyone is welcomed and encouraged to vote.
> > > > You can vote here or on the diff itself on gerrit. This vote will pass
> > > > with 3 binding +1 votes and more binding +1 votes than -1 votes.
> > > >
> > > > For posterity, this diff is on our reviews@ mailing list, as are all
> > > > code reviews and gerrit traffic:
> > > >
> > > > https://lists.apache.org/api/source.lua/fe0a4c170824112c0efd69265a8951afe162226e0c998d669a4b7216@%3Creviews.impala.apache.org%3E