Re: JSON support: Hive compatibility or ANSI SQL standard

2018-09-10 Thread Brock Noland
+1 for option c

On Mon, Sep 10, 2018 at 10:34 AM, Tim Armstrong wrote:

> I think I agree with Lars. Having the (mostly) Hive-compatible version is
> useful if there are shared views between Hive/Impala and for people
> migrating queries from Hive or some of the Impala JSON UDFs that I've seen
> floating around.
>
> On Mon, Sep 10, 2018 at 8:20 AM, Lars Volker wrote:
>
> > Thanks for the comprehensive summary, Quanlong.
> >
> > I'm in favor of (c), but I don't feel strongly about the order in which we
> > should prioritize them. Being compatible with Hive and following the SQL
> > standard will help users switch more easily between systems. I'm not too
> > worried about confusing users. Even if the functions have similar names,
> > people are not going to confuse them by accident, and we can add a warning
> > to the docs to point out that get_json_object should not be confused with
> > the ANSI SQL function json_object.
> >
> > Cheers, Lars
> >
> > On Mon, Sep 10, 2018 at 7:57 AM Quanlong Huang wrote:
> >
> > >> Hi all,
> > >>
> > >> I've recently been working on a patch to add a built-in function for
> > >> parsing JSON strings and extracting values from them:
> > >> https://gerrit.cloudera.org/#/c/10950. We haven't reached an agreement
> > >> on the name of this function, so I'm bringing it here for a broader
> > >> discussion.
> > >>
> > >> *-- Background --*
> > >>
> > >> Hive has a built-in function (get_json_object) for this purpose, but
> > >> it's a Java UDF, so Impala can't track its memory usage. The current
> > >> patch adds built-in support for this function.
> > >>
> > >> Greg suggested that we name this function json_value(), which is a
> > >> function in the ANSI SQL standard. Note that json_value() only returns
> > >> scalar types, so json_query() in the standard may be a better fit.
> > >>
> > >> The discussion is about which of the following options we should choose
> > >> for JSON support:
> > >>
> > >> (a) Support get_json_object for compatibility with Hive;
> > >> (b) Support the JSON functions in the ANSI SQL standard;
> > >> (c) Support both.
> > >>
> > >> I was originally in favor of (c), but Zoltan points out that the ANSI
> > >> standard has a json_object() function with a quite different meaning,
> > >> which users would confuse with Hive's get_json_object UDF. So maybe we
> > >> can choose only (a) or (b).
> > >>
> > >> *-- SQL support in other systems --*
> > >>
> > >> Oracle supports the JSON functions in the ANSI standard, while MySQL
> > >> does not. MySQL has a JSON_EXTRACT function that covers all the
> > >> capabilities of Hive's get_json_object function.
> > >>
> > >> Presto and Greenplum do not support the ANSI JSON standard either:
> > >> https://prestodb.io/docs/current/functions/json.html,
> > >> https://gpdb.docs.pivotal.io/5100/admin_guide/query/topics/json-data.html
> > >>
> > >> Here are the previous discussions:
> > >> https://gerrit.cloudera.org/#/c/10950/13/common/function-registry/impala_functions.py@514
> > >>
> > >> What do you think?
> > >>
> > >> Thanks,
> > >> Quanlong
> >>
> >
>


Re: JSON support: Hive compatibility or ANSI SQL standard

2018-09-10 Thread Tim Armstrong
I think I agree with Lars. Having the (mostly) Hive-compatible version is
useful if there are shared views between Hive/Impala and for people
migrating queries from Hive or some of the Impala JSON UDFs that I've seen
floating around.

On Mon, Sep 10, 2018 at 8:20 AM, Lars Volker wrote:

> Thanks for the comprehensive summary, Quanlong.
>
> I'm in favor of (c), but I don't feel strongly about the order in which we
> should prioritize them. Being compatible with Hive and following the SQL
> standard will help users switch more easily between systems. I'm not too
> worried about confusing users. Even if the functions have similar names,
> people are not going to confuse them by accident, and we can add a warning
> to the docs to point out that get_json_object should not be confused with
> the ANSI SQL function json_object.
>
> Cheers, Lars
>
> On Mon, Sep 10, 2018 at 7:57 AM Quanlong Huang wrote:
>
>> Hi all,
>>
>> I've recently been working on a patch to add a built-in function for
>> parsing JSON strings and extracting values from them:
>> https://gerrit.cloudera.org/#/c/10950. We haven't reached an agreement on
>> the name of this function, so I'm bringing it here for a broader
>> discussion.
>>
>> *-- Background --*
>>
>> Hive has a built-in function (get_json_object) for this purpose, but it's
>> a Java UDF, so Impala can't track its memory usage. The current patch adds
>> built-in support for this function.
>>
>> Greg suggested that we name this function json_value(), which is a
>> function in the ANSI SQL standard. Note that json_value() only returns
>> scalar types, so json_query() in the standard may be a better fit.
>>
>> The discussion is about which of the following options we should choose
>> for JSON support:
>>
>> (a) Support get_json_object for compatibility with Hive;
>> (b) Support the JSON functions in the ANSI SQL standard;
>> (c) Support both.
>>
>> I was originally in favor of (c), but Zoltan points out that the ANSI
>> standard has a json_object() function with a quite different meaning,
>> which users would confuse with Hive's get_json_object UDF. So maybe we can
>> choose only (a) or (b).
>>
>> *-- SQL support in other systems --*
>>
>> Oracle supports the JSON functions in the ANSI standard, while MySQL does
>> not. MySQL has a JSON_EXTRACT function that covers all the capabilities of
>> Hive's get_json_object function.
>>
>> Presto and Greenplum do not support the ANSI JSON standard either:
>> https://prestodb.io/docs/current/functions/json.html,
>> https://gpdb.docs.pivotal.io/5100/admin_guide/query/topics/json-data.html
>>
>> Here are the previous discussions:
>> https://gerrit.cloudera.org/#/c/10950/13/common/function-registry/impala_functions.py@514
>>
>> What do you think?
>>
>> Thanks,
>> Quanlong
>>
>


Re: JSON support: Hive compatibility or ANSI SQL standard

2018-09-10 Thread Lars Volker
Thanks for the comprehensive summary, Quanlong.

I'm in favor of (c), but I don't feel strongly about the order in which we
should prioritize them. Being compatible with Hive and following the SQL
standard will help users switch more easily between systems. I'm not too
worried about confusing users. Even if the functions have similar names,
people are not going to confuse them by accident, and we can add a warning
to the docs to point out that get_json_object should not be confused with
the ANSI SQL function json_object.

Cheers, Lars

On Mon, Sep 10, 2018 at 7:57 AM Quanlong Huang wrote:

> Hi all,
>
> I've recently been working on a patch to add a built-in function for
> parsing JSON strings and extracting values from them:
> https://gerrit.cloudera.org/#/c/10950. We haven't reached an agreement on
> the name of this function, so I'm bringing it here for a broader
> discussion.
>
> *-- Background --*
>
> Hive has a built-in function (get_json_object) for this purpose, but it's
> a Java UDF, so Impala can't track its memory usage. The current patch adds
> built-in support for this function.
>
> Greg suggested that we name this function json_value(), which is a
> function in the ANSI SQL standard. Note that json_value() only returns
> scalar types, so json_query() in the standard may be a better fit.
>
> The discussion is about which of the following options we should choose
> for JSON support:
>
> (a) Support get_json_object for compatibility with Hive;
> (b) Support the JSON functions in the ANSI SQL standard;
> (c) Support both.
>
> I was originally in favor of (c), but Zoltan points out that the ANSI
> standard has a json_object() function with a quite different meaning,
> which users would confuse with Hive's get_json_object UDF. So maybe we can
> choose only (a) or (b).
>
> *-- SQL support in other systems --*
>
> Oracle supports the JSON functions in the ANSI standard, while MySQL does
> not. MySQL has a JSON_EXTRACT function that covers all the capabilities of
> Hive's get_json_object function.
>
> Presto and Greenplum do not support the ANSI JSON standard either:
> https://prestodb.io/docs/current/functions/json.html,
> https://gpdb.docs.pivotal.io/5100/admin_guide/query/topics/json-data.html
>
> Here are the previous discussions:
> https://gerrit.cloudera.org/#/c/10950/13/common/function-registry/impala_functions.py@514
>
> What do you think?
>
> Thanks,
> Quanlong
>


JSON support: Hive compatibility or ANSI SQL standard

2018-09-10 Thread Quanlong Huang
Hi all,

I've recently been working on a patch to add a built-in function for
parsing JSON strings and extracting values from them:
https://gerrit.cloudera.org/#/c/10950. We haven't reached an agreement on
the name of this function, so I'm bringing it here for a broader
discussion.

*-- Background --*

Hive has a built-in function (get_json_object) for this purpose, but it's a
Java UDF, so Impala can't track its memory usage. The current patch adds
built-in support for this function.

Greg suggested that we name this function json_value(), which is a function
in the ANSI SQL standard. Note that json_value() only returns scalar types,
so json_query() in the standard may be a better fit.
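
To make the return-type distinction concrete, here is a minimal Python
sketch, assuming heavily simplified semantics. The function names
(get_json_object_like, json_value_like, json_query_like) and the tiny
`$.key[index]` path syntax are hypothetical illustrations, not the patch,
Hive's implementation, or the full SQL/JSON path language. Returning None
for a wrong-shaped result only loosely stands in for a NULL result; the
standard has ON EMPTY / ON ERROR clauses that control this.

```python
import json
import re

# Hypothetical, simplified path syntax: "$" followed by ".key" or "[index]" steps.
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def _eval_path(doc, path):
    """Walk a parsed JSON value along a simplified path; return (found, value)."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    rest, pos, value = path[1:], 0, doc
    while pos < len(rest):
        m = _STEP.match(rest, pos)
        if m is None:
            raise ValueError(f"unsupported path syntax: {path!r}")
        key, index = m.group(1), m.group(2)
        if key is not None:
            if not isinstance(value, dict) or key not in value:
                return False, None
            value = value[key]
        else:
            i = int(index)
            if not isinstance(value, list) or i >= len(value):
                return False, None
            value = value[i]
        pos = m.end()
    return True, value


def _is_scalar(v):
    return not isinstance(v, (dict, list))


def get_json_object_like(json_str, path):
    """Hive-style (simplified): non-null matches come back as strings,
    with objects/arrays serialized as JSON text."""
    found, v = _eval_path(json.loads(json_str), path)
    if not found or v is None:
        return None
    if isinstance(v, str):
        return v
    return json.dumps(v, separators=(",", ":"))


def json_value_like(json_str, path):
    """JSON_VALUE-style (simplified): scalars only; objects/arrays give None."""
    found, v = _eval_path(json.loads(json_str), path)
    if not found or not _is_scalar(v):
        return None
    return v


def json_query_like(json_str, path):
    """JSON_QUERY-style (simplified): objects/arrays only, as JSON text;
    scalars give None."""
    found, v = _eval_path(json.loads(json_str), path)
    if not found or _is_scalar(v):
        return None
    return json.dumps(v, separators=(",", ":"))


doc = '{"store": {"owner": "amy", "fruit": [{"type": "apple"}, {"type": "pear"}]}}'
print(get_json_object_like(doc, "$.store.owner"))     # amy
print(get_json_object_like(doc, "$.store.fruit[0]"))  # {"type":"apple"}
print(json_value_like(doc, "$.store.owner"))          # amy
print(json_value_like(doc, "$.store.fruit"))          # None
print(json_query_like(doc, "$.store.fruit[1]"))       # {"type":"pear"}
print(json_query_like(doc, "$.store.owner"))          # None
```

In this sketch, one Hive-style call covers both shapes, while the
standard-style pair splits scalars and containers, which is why json_query()
may be the closer match when the result can be an object or array.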

The discussion is about which of the following options we should choose for
JSON support:

(a) Support get_json_object for compatibility with Hive;
(b) Support the JSON functions in the ANSI SQL standard;
(c) Support both.

I was originally in favor of (c), but Zoltan points out that the ANSI
standard has a json_object() function with a quite different meaning, which
users would confuse with Hive's get_json_object UDF. So maybe we can choose
only (a) or (b).

*-- SQL support in other systems --*

Oracle supports the JSON functions in the ANSI standard, while MySQL does
not. MySQL has a JSON_EXTRACT function that covers all the capabilities of
Hive's get_json_object function.

Presto and Greenplum do not support the ANSI JSON standard either:
https://prestodb.io/docs/current/functions/json.html,
https://gpdb.docs.pivotal.io/5100/admin_guide/query/topics/json-data.html

Here are the previous discussions:
https://gerrit.cloudera.org/#/c/10950/13/common/function-registry/impala_functions.py@514

What do you think?

Thanks,
Quanlong


Re: [VOTE] Removing emeritus and activity requirements

2018-09-10 Thread Quanlong Huang
+1 (non-binding)

On Mon, Sep 10, 2018 at 3:13 PM Yongjun Zhang wrote:

> +1 (non-binding)
>
> Thanks.
>
> --Yongjun
>
> On Sun, Sep 9, 2018 at 9:58 PM, Henry Robinson wrote:
>
> > +1 (binding)
> >
> > On Sun, 9 Sep 2018 at 18:41, Lars Volker wrote:
> >
> > > +1 (binding)
> > >
> > > On Sun, Sep 9, 2018, 18:36 Brock Noland wrote:
> > >
> > > > +1
> > > >
> > > > On Sun, Sep 9, 2018 at 8:15 PM, Jim Apple wrote:
> > > >
> > > > > Please vote on the following diff to the bylaws:
> > > > >
> > > > > https://gerrit.cloudera.org/#/c/11407/
> > > > >
> > > > > The vote will conclude Thursday morning, California time. Only PMC
> > > > > votes are binding, but everyone is welcome and encouraged to vote.
> > > > > You can vote here or on the diff itself on gerrit. This vote will
> > > > > pass with 3 binding +1 votes and more binding +1 votes than -1 votes.
> > > > >
> > > > > For posterity, this diff is on our reviews@ mailing list, as are all
> > > > > code reviews and gerrit traffic:
> > > > >
> > > > > https://lists.apache.org/api/source.lua/fe0a4c170824112c0efd69265a8951afe162226e0c998d669a4b7216@%3Creviews.impala.apache.org%3E
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Removing emeritus and activity requirements

2018-09-10 Thread Yongjun Zhang
+1 (non-binding)

Thanks.

--Yongjun

On Sun, Sep 9, 2018 at 9:58 PM, Henry Robinson wrote:

> +1 (binding)
>
> On Sun, 9 Sep 2018 at 18:41, Lars Volker wrote:
>
> > +1 (binding)
> >
> > On Sun, Sep 9, 2018, 18:36 Brock Noland wrote:
> >
> > > +1
> > >
> > > On Sun, Sep 9, 2018 at 8:15 PM, Jim Apple wrote:
> > >
> > > > Please vote on the following diff to the bylaws:
> > > >
> > > > https://gerrit.cloudera.org/#/c/11407/
> > > >
> > > > The vote will conclude Thursday morning, California time. Only PMC
> > > > votes are binding, but everyone is welcome and encouraged to vote.
> > > > You can vote here or on the diff itself on gerrit. This vote will
> > > > pass with 3 binding +1 votes and more binding +1 votes than -1 votes.
> > > >
> > > > For posterity, this diff is on our reviews@ mailing list, as are all
> > > > code reviews and gerrit traffic:
> > > >
> > > > https://lists.apache.org/api/source.lua/fe0a4c170824112c0efd69265a8951afe162226e0c998d669a4b7216@%3Creviews.impala.apache.org%3E
> > > >
> > >
> >
>