Hi all,

In case this helps anyone, I was able to create a simple class to do the
join, and it works nicely for my use case.  It assumes you have schemas for
both the input and output records.

Example (
https://github.com/Quantiply/rico/blob/master/avro-serde/src/test/java/com/quantiply/avro/JoinTest.java#L44-L52
):

        GenericRecord in1 = getIn1();
        GenericRecord in2 = getIn2();

        GenericRecord joined = new Join(getJoinedSchema())
                .merge(in1)
                .merge(in2)
                .getBuilder()
                .set("charlie", "blah blah")
                .build();

Class is here:
https://github.com/Quantiply/rico/blob/master/avro-serde/src/main/java/com/quantiply/avro/Join.java
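
For readers who would rather not follow the link, the field-matching idea
discussed in this thread can be sketched without Avro on the classpath. This
is not the Join class linked above. JoinSketch is a hypothetical name, plain
Maps stand in for GenericRecord, and a list of field names stands in for the
output schema. Skipping fields the output does not declare, and letting the
last merged record win on a name clash, are assumptions of this sketch rather
than documented behaviour of the real class. With Avro itself, the loop would
run over in.getSchema().getFields() and copy values into a
GenericRecordBuilder.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a schema-driven join: records are plain Maps and
// the output "schema" is just an ordered list of field names.
class JoinSketch {
    private final List<String> outputFields;
    private final Map<String, Object> values = new LinkedHashMap<>();

    JoinSketch(List<String> outputFields) {
        this.outputFields = outputFields;
    }

    // Copy every input field that the output schema also declares.
    // Assumption: undeclared fields are skipped, and on a name clash the
    // record merged last overwrites the earlier value.
    JoinSketch merge(Map<String, Object> in) {
        for (Map.Entry<String, Object> e : in.entrySet()) {
            if (outputFields.contains(e.getKey())) {
                values.put(e.getKey(), e.getValue());
            }
        }
        return this;
    }

    // Set an output field that none of the inputs supplies.
    JoinSketch set(String name, Object value) {
        if (!outputFields.contains(name)) {
            throw new IllegalArgumentException("Unknown field: " + name);
        }
        values.put(name, value);
        return this;
    }

    // Fail if any output field is still unset, then return the values in
    // output-schema order.
    Map<String, Object> build() {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String field : outputFields) {
            if (!values.containsKey(field)) {
                throw new IllegalStateException("Field not set: " + field);
            }
            out.put(field, values.get(field));
        }
        return out;
    }
}
```

Usage mirrors the example above: construct it with the output field names,
call merge() once per input record, set() any extra fields, then build().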

Cheers,

Roger

On Thu, Apr 9, 2015 at 12:54 PM, Roger Hoover <roger.hoo...@gmail.com>
wrote:

> Yi Pan,
>
> Thanks for your response.  I'm thinking that I'll iterate over the fields
> of the input schemas (similar to this
> https://github.com/apache/samza/blob/samza-sql/samza-sql/src/main/java/org/apache/samza/sql/metadata/AvroSchemaConverter.java#L58-L62),
> match them up with the output schema and then copy the values.  I'll let
> you know how it goes in case it's useful.
>
> Cheers,
>
> Roger
>
> On Thu, Apr 9, 2015 at 12:07 PM, Yi Pan <nickpa...@gmail.com> wrote:
>
>> Hi, Roger,
>>
>> Good question. I am not aware of any "automatic" way of doing this in
>> Avro. I have tried adding generic Schema and Data interfaces in the
>> samza-sql branch to address the morphing of schemas from input streams
>> to output streams. The basic idea is to have wrapper Schema and Data
>> classes on top of the deserialized objects that access the data fields
>> according to the schema without changing or copying the actual data
>> fields. Hence, when the input data schemas need to be morphed into a new
>> output data schema, we just need an implementation of the new output
>> data Schema class that can read the corresponding data fields from the
>> input data and write them out in the output schema. An interface
>> function transform() is added to the Schema class for this exact
>> purpose. Currently, it only takes one input data object, and one example
>> of a "projection" transformation can be found in the implementation of
>> the AvroSchema class. A join case like the one you presented may well be
>> a reason to have a "join" implementation that takes multiple input data
>> objects.
>>
>> The solution above is still experimental, so please feel free to
>> provide feedback and comments on it. If we agree that this solution
>> is good and suitable for broader use cases, it could also be used
>> outside the "SQL" context.
>>
>> Best regards!
>>
>> -Yi
>>
>> On Thu, Apr 9, 2015 at 8:55 AM, Roger Hoover <roger.hoo...@gmail.com>
>> wrote:
>>
>> > Hi Milinda and others,
>> >
>> > This is an Avro question, but since you guys are working on Avro
>> > support for stream SQL, I thought I'd ask you for help.
>> >
>> > If I have two records of type A and B as below and want to join them,
>> > similar to "SELECT *" in SQL, to produce a record of type AB, is there a
>> > simple way to do this with Avro without writing code to copy each field
>> > individually?
>> >
>> > I appreciate any help.
>> >
>> > Thanks,
>> >
>> > Roger
>> >
>> > {
>> >   "name": "A",
>> >   "type": "record",
>> >   "namespace": "fubar",
>> >   "fields": [{"name": "a", "type" : "int"}]
>> > }
>> >
>> > {
>> >   "name": "B",
>> >   "type": "record",
>> >   "namespace": "fubar",
>> >   "fields": [{"name": "b", "type" : "int"}]
>> > }
>> >
>> > {
>> >   "name": "AB",
>> >   "type": "record",
>> >   "namespace": "fubar",
>> >   "fields": [{"name": "a", "type" : "int"}, {"name": "b", "type" : "int"}]
>> > }
>> >
>>
>
>
