I got it. You mean TimeSpentQuerying, PageType, TotalRevenue and UserAgent
are all UDFs that takes a JSON object and outputs a STRING.

Allowing such a new object type just for UDFs are simpler than supporting a
new type in all parts of the system, because we don't need to care about
serialization/deserialization.


If we go that route, the final query will look like this. Note that
StringToJson takes a STRING and returns a JSON, and TimeSpentQuerying (etc)
takes a JSON and returns either STRING or other types that are supported by
Hive.

SELECT TimeSpentQuerying(txnjson), PageType(txnjson), TotalRevenue(txnjson)
from (SELECT StringToJson(txn) as txnjson FROM txn_logs) tnxjson_logs
where UserAgent(txnjson) like '%Safari%';

Is this good enough?


Zheng

On Fri, Oct 2, 2009 at 11:50 PM, Bobby Rullo <[email protected]> wrote:

> Zheng,
> Thanks for your reply.
>
> I'm not sure what this "get_json_objects" function would do...do you mean
> it would return a canned set of fields? That's not what I'm interested in.
> The UDFs that I have in mind dig deep into the JSON and
> fetch/filter/transform data. I don't want a few canned fields to play with,
> I want our users to be able to create AdHoc queries on the JSON using any
> and all of the data inside.
>
> Here's a more realistic example. 'Txn' is a JSON field:
>
> select TimeSpentQuerying(txn), PageType(txn), TotalRevenue(txn) from
> txn_logs where UserAgent(txn) like '%Safari%'
>
> I have no problem writing the UDFs, I'm just trying to avoid parsing the
> JSON in each one of them - in other words, I want the input to the UDF's to
> not be a String (or Text) which I need to parse into JSON, but rather a JSON
> object.
>
> Is that possible? Can I create new field types? And if so, where do I
> register them?
>
> My fantasy is to be able to do this:
>
> CREATE TABLE txn_logs (tid String, txn JSON);
>
> Thanks,
>
> Bobby
>
>
>
> On Oct 2, 2009, at 10:54 PM, Zheng Shao wrote:
>
> We have 2 example serdes, one for text data (regexserde), one for
> binary data (thriftserde).
>
> But the simplest solution for this is to add a udf get_json_objects
> that returns a bunch of fields in an array. Then we can use sub query
> to extract the array elements as individual elements, until we have
> support for common expression extraction.
>
>
> There are a few example udf in the ctrib package illustrating the way
> to accept variable number of arguments.
>
> Let us know how it goes.
>
> Zheng
>
>
>
> On 10/2/09, Bobby Rullo <[email protected]> wrote:
>
> Hi there,
>
>
> I want to create a new JSON Field/Column type. I know there exists
>
> get_json_object(), but the things is I want to multiple JSON
>
> operations in a single select statement and don't want to incur the
>
> cost of parsing the JSON over and over again, because our json
>
> structures are quite large.
>
>
> Our schema is two fields - the first is a string of some sort used as
>
> an id, and the second is a huge JSON structure. I want to be able
>
> write sql like the following:
>
>
> select Foo(json), Bar(json), Baz(json) where Quux(json) like '%hello%';
>
>
> Foo(), Bar(), Baz() and Quux() are all UDF's that operate on json.
>
> Right now what I'm doing is storing the JSON as a string and in each
>
> of those UDF's I am de-serializing the json, and returning another
>
> string.
>
>
> Ideally I'd want to be able to return JSON objects from these UDF's so
>
> I could compose my functions like "Foo(Bar(Baz(json))"
>
>
> I'm thinking I need to create a SerDe but SerDe's seem to work at the
>
> table level, not at the field level (in other words, I just need a
>
> custom field, not a entire custom table type...or do I!?)
>
>
> Also, if there's any docs on writing custom SerDe's please let me
>
> know...I haven't been able to find much.
>
>
> Thanks!
>
>
> Bobby
>
>
>
> --
> Sent from Gmail for mobile | mobile.google.com
>
> Yours,
> Zheng
>
>
>


-- 
Yours,
Zheng

Reply via email to