I got it. You mean TimeSpentQuerying, PageType, TotalRevenue and UserAgent are all UDFs that takes a JSON object and outputs a STRING.
Allowing such a new object type just for UDFs are simpler than supporting a new type in all parts of the system, because we don't need to care about serialization/deserialization. If we go that route, the final query will look like this. Note that StringToJson takes a STRING and returns a JSON, and TimeSpentQuerying (etc) takes a JSON and returns either STRING or other types that are supported by Hive. SELECT TimeSpentQuerying(txnjson), PageType(txnjson), TotalRevenue(txnjson) from (SELECT StringToJson(txn) as txnjson FROM txn_logs) tnxjson_logs where UserAgent(txnjson) like '%Safari%'; Is this good enough? Zheng On Fri, Oct 2, 2009 at 11:50 PM, Bobby Rullo <[email protected]> wrote: > Zheng, > Thanks for your reply. > > I'm not sure what this "get_json_objects" function would do...do you mean > it would return a canned set of fields? That's not what I'm interested in. > The UDFs that I have in mind dig deep into the JSON and > fetch/filter/transform data. I don't want a few canned fields to play with, > I want our users to be able to create AdHoc queries on the JSON using any > and all of the data inside. > > Here's a more realistic example. 'Txn' is a JSON field: > > select TimeSpentQuerying(txn), PageType(txn), TotalRevenue(txn) from > txn_logs where UserAgent(txn) like '%Safari%' > > I have no problem writing the UDFs, I'm just trying to avoid parsing the > JSON in each one of them - in other words, I want the input to the UDF's to > not be a String (or Text) which I need to parse into JSON, but rather a JSON > object. > > Is that possible? Can I create new field types? And if so, where do I > register them? > > My fantasy is to be able to do this: > > CREATE TABLE txn_logs (tid String, txn JSON); > > Thanks, > > Bobby > > > > On Oct 2, 2009, at 10:54 PM, Zheng Shao wrote: > > We have 2 example serdes, one for text data (regexserde), one for > binary data (thriftserde). > > But the simplest solution for this is to add a udf get_json_objects > that returns a bunch of fields in an array. Then we can use sub query > to extract the array elements as individual elements, until we have > support for common expression extraction. > > > There are a few example udf in the ctrib package illustrating the way > to accept variable number of arguments. > > Let us know how it goes. > > Zheng > > > > On 10/2/09, Bobby Rullo <[email protected]> wrote: > > Hi there, > > > I want to create a new JSON Field/Column type. I know there exists > > get_json_object(), but the things is I want to multiple JSON > > operations in a single select statement and don't want to incur the > > cost of parsing the JSON over and over again, because our json > > structures are quite large. > > > Our schema is two fields - the first is a string of some sort used as > > an id, and the second is a huge JSON structure. I want to be able > > write sql like the following: > > > select Foo(json), Bar(json), Baz(json) where Quux(json) like '%hello%'; > > > Foo(), Bar(), Baz() and Quux() are all UDF's that operate on json. > > Right now what I'm doing is storing the JSON as a string and in each > > of those UDF's I am de-serializing the json, and returning another > > string. > > > Ideally I'd want to be able to return JSON objects from these UDF's so > > I could compose my functions like "Foo(Bar(Baz(json))" > > > I'm thinking I need to create a SerDe but SerDe's seem to work at the > > table level, not at the field level (in other words, I just need a > > custom field, not a entire custom table type...or do I!?) > > > Also, if there's any docs on writing custom SerDe's please let me > > know...I haven't been able to find much. > > > Thanks! > > > Bobby > > > > -- > Sent from Gmail for mobile | mobile.google.com > > Yours, > Zheng > > > -- Yours, Zheng
