On Oct 3, 2009, at 1:28 AM, Zheng Shao wrote:
I got it. You mean TimeSpentQuerying, PageType, TotalRevenue and
UserAgent are all UDFs that takes a JSON object and outputs a STRING.
Exactly!
Allowing such a new object type just for UDFs are simpler than
supporting a new type in all parts of the system, because we don't
need to care about serialization/deserialization.
Interesting. So you're saying it's ok to return whatever type from
UDF.evaluate(), as long as it doesn't get serialized (i.e. as long as
it's not the final output)?
If we go that route, the final query will look like this. Note that
StringToJson takes a STRING and returns a JSON, and
TimeSpentQuerying (etc) takes a JSON and returns either STRING or
other types that are supported by Hive.
SELECT TimeSpentQuerying(txnjson), PageType(txnjson),
TotalRevenue(txnjson) from (SELECT StringToJson(txn) as txnjson FROM
txn_logs) tnxjson_logs
where UserAgent(txnjson) like '%Safari%';
Is this good enough?
That's pretty damn good. I like that all I have to do is create one
more UDF which does the JSON parsing (and maybe JsonToString as well.)
The only thing that I'm concerned about is whether that sub-query is
going to create an extra map job. Is the planner smart enough to stick
stick all that work in one map?
Thanks!
bobby
Zheng
On Fri, Oct 2, 2009 at 11:50 PM, Bobby Rullo <[email protected]>
wrote:
Zheng,
Thanks for your reply.
I'm not sure what this "get_json_objects" function would do...do you
mean it would return a canned set of fields? That's not what I'm
interested in. The UDFs that I have in mind dig deep into the JSON
and fetch/filter/transform data. I don't want a few canned fields to
play with, I want our users to be able to create AdHoc queries on
the JSON using any and all of the data inside.
Here's a more realistic example. 'Txn' is a JSON field:
select TimeSpentQuerying(txn), PageType(txn), TotalRevenue(txn) from
txn_logs where UserAgent(txn) like '%Safari%'
I have no problem writing the UDFs, I'm just trying to avoid parsing
the JSON in each one of them - in other words, I want the input to
the UDF's to not be a String (or Text) which I need to parse into
JSON, but rather a JSON object.
Is that possible? Can I create new field types? And if so, where do
I register them?
My fantasy is to be able to do this:
CREATE TABLE txn_logs (tid String, txn JSON);
Thanks,
Bobby
On Oct 2, 2009, at 10:54 PM, Zheng Shao wrote:
We have 2 example serdes, one for text data (regexserde), one for
binary data (thriftserde).
But the simplest solution for this is to add a udf get_json_objects
that returns a bunch of fields in an array. Then we can use sub query
to extract the array elements as individual elements, until we have
support for common expression extraction.
There are a few example udf in the ctrib package illustrating the way
to accept variable number of arguments.
Let us know how it goes.
Zheng
On 10/2/09, Bobby Rullo <[email protected]> wrote:
Hi there,
I want to create a new JSON Field/Column type. I know there exists
get_json_object(), but the things is I want to multiple JSON
operations in a single select statement and don't want to incur the
cost of parsing the JSON over and over again, because our json
structures are quite large.
Our schema is two fields - the first is a string of some sort used
as
an id, and the second is a huge JSON structure. I want to be able
write sql like the following:
select Foo(json), Bar(json), Baz(json) where Quux(json) like
'%hello%';
Foo(), Bar(), Baz() and Quux() are all UDF's that operate on json.
Right now what I'm doing is storing the JSON as a string and in each
of those UDF's I am de-serializing the json, and returning another
string.
Ideally I'd want to be able to return JSON objects from these
UDF's so
I could compose my functions like "Foo(Bar(Baz(json))"
I'm thinking I need to create a SerDe but SerDe's seem to work at
the
table level, not at the field level (in other words, I just need a
custom field, not a entire custom table type...or do I!?)
Also, if there's any docs on writing custom SerDe's please let me
know...I haven't been able to find much.
Thanks!
Bobby
--
Sent from Gmail for mobile | mobile.google.com
Yours,
Zheng
--
Yours,
Zheng