Much of this is about mapping from logical fields (i.e. the fields you can reference in SQL) down to the Avro representation; I’m no expert on that mapping, so I’ll focus on the SQL stuff.
First, SQL doesn't allow a record to have two fields of the same name, so you wouldn't be allowed to have two "name" fields. When you do a join, you might need to alias output columns:

  select stream orders.id, products.id as productId
  from orders
  join products on orders.id = products.id;

Second, JOIN isn't the only SQL operator that combines records; GROUP BY also combines records. JOIN combines records from different streams, and they usually have different types (i.e. different numbers/types of fields), whereas GROUP BY combines records from the same stream. Use whichever best suits your purpose.

  select stream zipcode, floor(rowtime to hour), array_agg(orderid) as orderIds
  from orders
  group by zipcode, floor(rowtime to hour);

(array_agg is an aggregate function, recently added to the SQL standard, that gathers input values into an array. See http://www.craigkerstiens.com/2013/04/17/array-agg/.)

Output:

  { zipcode: "94705", rowtime: "2015-04-09 11:00:00", orderIds: [123, 156, 1056] },
  { zipcode: "94117", rowtime: "2015-04-09 11:00:00", orderIds: [45, 777] },
  { zipcode: "94705", rowtime: "2015-04-09 12:00:00", orderIds: [55] }

Julian

On Apr 9, 2015, at 12:07 PM, Yi Pan <nickpa...@gmail.com> wrote:

> Hi, Roger,
>
> Good question on that. I am actually not aware of any "automatic" way of
> doing this in Avro. I have tried to add generic Schema and Data interface
> in samza-sql branch to address the morphing of the schemas from input
> streams to the output streams. The basic idea is to have wrapper Schema and
> Data classes on-top-of the deserialized objects to access the data fields
> according to the schema w/o changing and copying the actual data fields.
> Hence, when there is a need to morph the input data schemas into a new
> output data schema, we just need an implementation of the new output data
> Schema class that can read the corresponding data fields from the input
> data and write them out in the output schema.
> An interface function
> transform() is added in the Schema class for this exact purpose. Currently,
> it only takes one input data, and one example of "projection" transformation
> can be found in the implementation of AvroSchema class. A join case as you
> presented may well be a reason to have an implementation of "join" with
> multiple input data.
>
> All the above solution is still experimental, and please feel free to
> provide your feedback and comments on that. If we agree that this solution
> is good and suits a broader use case, it can be considered for use
> outside the "SQL" context as well.
>
> Best regards!
>
> -Yi
>
> On Thu, Apr 9, 2015 at 8:55 AM, Roger Hoover <roger.hoo...@gmail.com> wrote:
>
>> Hi Milinda and others,
>>
>> This is an Avro question, but since you guys are working on Avro support for
>> stream SQL, I thought I'd ask you for help.
>>
>> If I have two records of type A and B as below and want to join them
>> similar to "SELECT *" in SQL to produce a record of type AB, is there a
>> simple way to do this with Avro without writing code to copy each field
>> individually?
>>
>> I appreciate any help.
>>
>> Thanks,
>>
>> Roger
>>
>> {
>>   "name": "A",
>>   "type": "record",
>>   "namespace": "fubar",
>>   "fields": [{"name": "a", "type": "int"}]
>> }
>>
>> {
>>   "name": "B",
>>   "type": "record",
>>   "namespace": "fubar",
>>   "fields": [{"name": "b", "type": "int"}]
>> }
>>
>> {
>>   "name": "AB",
>>   "type": "record",
>>   "namespace": "fubar",
>>   "fields": [{"name": "a", "type": "int"}, {"name": "b", "type": "int"}]
>> }
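[Editor's note] The generic "concatenate the fields of A and B into AB" step that Roger asks about can be sketched without touching Avro at all, if you treat record schemas as the JSON dicts shown above. This is a minimal, hypothetical illustration in Python using plain dictionaries, not the Avro library; the helper names make_join_schema and join_records are made up for the example, and real code would still need Avro serialization on top.

```python
# Hypothetical sketch: build an "AB"-style joined schema and record
# generically, treating Avro-like schemas as plain dicts.

def make_join_schema(name, *schemas):
    """Concatenate the field lists of several record schemas."""
    fields = []
    seen = set()
    for s in schemas:
        for f in s["fields"]:
            if f["name"] in seen:
                # SQL would force you to alias here; we just fail loudly.
                raise ValueError("duplicate field: " + f["name"])
            seen.add(f["name"])
            fields.append(f)
    return {"name": name, "type": "record",
            "namespace": "fubar", "fields": fields}

def join_records(schema, *records):
    """Produce one output dict holding every field of every input record."""
    merged = {}
    for r in records:
        merged.update(r)
    # Emit only the fields declared in the joined schema, in schema order.
    return {f["name"]: merged[f["name"]] for f in schema["fields"]}

A = {"name": "A", "type": "record", "namespace": "fubar",
     "fields": [{"name": "a", "type": "int"}]}
B = {"name": "B", "type": "record", "namespace": "fubar",
     "fields": [{"name": "b", "type": "int"}]}

AB = make_join_schema("AB", A, B)
print([f["name"] for f in AB["fields"]])       # ['a', 'b']
print(join_records(AB, {"a": 1}, {"b": 2}))    # {'a': 1, 'b': 2}
```

The duplicate-name check mirrors Julian's first point: where SQL would require an alias (products.id as productId), a generic merge has to either rename or reject the colliding field.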