Zheng,
Thanks for your reply.
I'm not sure what this "get_json_objects" function would do...do you
mean it would return a canned set of fields? That's not what I'm
interested in. The UDFs that I have in mind dig deep into the JSON and
fetch/filter/transform data. I don't want a few canned fields to play
with, I want our users to be able to create AdHoc queries on the JSON
using any and all of the data inside.
Here's a more realistic example. 'Txn' is a JSON field:
select TimeSpentQuerying(txn), PageType(txn), TotalRevenue(txn) from
txn_logs where UserAgent(txn) like '%Safari%'
I have no problem writing the UDFs, I'm just trying to avoid parsing
the JSON in each one of them - in other words, I want the input to the
UDF's to not be a String (or Text) which I need to parse into JSON,
but rather a JSON object.
Is that possible? Can I create new field types? And if so, where do I
register them?
My fantasy is to be able to do this:
CREATE TABLE txn_logs (tid String, txn JSON);
Thanks,
Bobby
On Oct 2, 2009, at 10:54 PM, Zheng Shao wrote:
We have 2 example serdes, one for text data (regexserde), one for
binary data (thriftserde).
But the simplest solution for this is to add a udf get_json_objects
that returns a bunch of fields in an array. Then we can use sub query
to extract the array elements as individual elements, until we have
support for common expression extraction.
There are a few example udf in the ctrib package illustrating the way
to accept variable number of arguments.
Let us know how it goes.
Zheng
On 10/2/09, Bobby Rullo <[email protected]> wrote:
Hi there,
I want to create a new JSON Field/Column type. I know there exists
get_json_object(), but the things is I want to multiple JSON
operations in a single select statement and don't want to incur the
cost of parsing the JSON over and over again, because our json
structures are quite large.
Our schema is two fields - the first is a string of some sort used as
an id, and the second is a huge JSON structure. I want to be able
write sql like the following:
select Foo(json), Bar(json), Baz(json) where Quux(json) like '%hello
%';
Foo(), Bar(), Baz() and Quux() are all UDF's that operate on json.
Right now what I'm doing is storing the JSON as a string and in each
of those UDF's I am de-serializing the json, and returning another
string.
Ideally I'd want to be able to return JSON objects from these UDF's
so
I could compose my functions like "Foo(Bar(Baz(json))"
I'm thinking I need to create a SerDe but SerDe's seem to work at the
table level, not at the field level (in other words, I just need a
custom field, not a entire custom table type...or do I!?)
Also, if there's any docs on writing custom SerDe's please let me
know...I haven't been able to find much.
Thanks!
Bobby
--
Sent from Gmail for mobile | mobile.google.com
Yours,
Zheng