Re: Avro Schema Question

2018-05-29 Thread Motoko Kusanagi
Hi Elliot,

Thanks for that bit of info. It is helpful. Where do you draw the line between 
complex and simple unions? In other words, what criteria do you use to decide 
that a union is too complex?

Thanks,

Scott



Re: Avro Schema Question

2018-05-26 Thread Elliot West
A word of caution on the union type. You may find support for unions very
patchy if you are hoping to process records using well-known data
processing engines. We’ve been unable to usefully read union types in either
Apache Spark or Hive, for example. The simple null union construct is the
exception: [null, typeA], as it is usually represented by a nullable
column of typeA. We’ve resorted to prohibiting schemas with complex unions
so that our producers can’t create data that is not fully readable by our
consumers.

Elliot.
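
For illustration, here is a minimal Python sketch of how a producer-side check along these lines might look. The thread does not state the exact rule used, so the criterion here (a union is "simple" only if it has exactly two branches and one of them is "null") is an assumption, and the function names are hypothetical. It works on the parsed JSON form of a schema and only descends into records, arrays and maps.

```python
def is_simple_nullable(union):
    """True for a two-branch union where exactly one branch is "null"."""
    return len(union) == 2 and sum(1 for b in union if b == "null") == 1


def find_complex_unions(schema, path="$"):
    """Return the paths of all unions that are not of the [null, T] form."""
    found = []
    if isinstance(schema, list):  # a JSON array denotes a union
        if not is_simple_nullable(schema):
            found.append(path)
        for i, branch in enumerate(schema):
            found += find_complex_unions(branch, f"{path}[{i}]")
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            for field in schema.get("fields", []):
                found += find_complex_unions(
                    field["type"], f"{path}.{field['name']}"
                )
        elif kind == "array":
            found += find_complex_unions(schema["items"], f"{path}.items")
        elif kind == "map":
            found += find_complex_unions(schema["values"], f"{path}.values")
    return found


# Hypothetical schema, used only to exercise the check.
schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": ["null", "string"]},
        {"name": "payload", "type": ["int", "string"]},
        {"name": "tags",
         "type": {"type": "array", "items": ["null", "int", "string"]}},
    ],
}

print(find_complex_unions(schema))  # ['$.payload', '$.tags.items']
```

A check like this could run when a schema is registered, rejecting any schema for which the returned list is non-empty.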



Re: Avro Schema Question

2018-05-25 Thread Motoko Kusanagi
Hi Michael,

Thanks!! Yes, it does.

Scott



Re: Avro Schema Question

2018-05-25 Thread Michael Smith
{"type": "int"}, {"type": "string"} is not valid JSON, so you definitely
can't do that. But

[{"type": "int"}, {"type": "string"}] is a valid schema -- it can encode a
single value that is either an int or a string. At the highest level, your
schema can only be one type, but that type may be (and in fact probably
will be) a complex type -- a union of records or a single record.

Does that answer your question?
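
The JSON half of this point can be checked with a short Python sketch using only the standard json module. The helper name is hypothetical, and the check covers JSON well-formedness only; confirming that the array is a valid Avro schema would need an Avro library, which is not used here.

```python
import json


def parses_as_json(text):
    """Return True if the text is a single well-formed JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Two objects separated by a comma: not one JSON document.
bare_sequence = '{"type": "int"}, {"type": "string"}'

# The same two objects wrapped in an array: one JSON document,
# which Avro reads as a union of int and string.
union = '[{"type": "int"}, {"type": "string"}]'

print(parses_as_json(bare_sequence))  # False
print(parses_as_json(union))          # True
```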



Avro Schema Question

2018-05-25 Thread Motoko Kusanagi
Hi,


I have read the specification multiple times. It says "A Schema is 
represented in JSON by one of:" in the Schema Declaration section. The "one" 
confuses me, as I am interpreting it as exactly one of the three forms that 
it lists.


In short, can I do this as a single schema?

{type : int},

{type : string},

{type : int},


Or do the following as a single schema?

{type : int},

{type : record },

{type : record }, // Not the same as the previous.

{type : string},


Or do I have to "embed" the above under a complex type like a record if I want 
a complex schema? Or does "one of" mean I have to choose one and exactly one 
for the top-most level of the schema?


Thanks!!




Re: AvroStorage/Avro Schema Question

2012-04-17 Thread Russell Jurney
The fix was this: 

{
  "type":"record",
  "name":"Email",
  "fields":
  [
    {
      "name":"message_id",
      "type":["null","string"],
      "doc":""
    },
    {
      "name":"in_reply_to",
      "type": ["string", "null"]
    },
    {
      "name":"subject",
      "type": ["string", "null"]
    },
    {
      "name":"body",
      "type": ["string", "null"]
    },
    {
      "name":"date",
      "type": ["string", "null"]
    },
    {
      "name":"froms",
      "type":
      [
        "null",
        {
          "type":"array",
          "items":
          [
            "null",
            {
              "type":"record",
              "name":"from",
              "fields":
              [
                {
                  "name":"real_name",
                  "type":["null","string"],
                  "doc":""
                },
                {
                  "name":"address",
                  "type":["null","string"],
                  "doc":""
                }
              ]
            }

          ]
        }
      ],
      "doc":""
    },
    {
      "name":"tos",
      "type":
      [
        "null",
        {
          "type":"array",
          "items":
          [
            "null",
            {
              "type":"record",
              "name":"to",
              "fields":
              [
                {
                  "name":"real_name",
                  "type":["null","string"],
                  "doc":""
                },
                {
                  "name":"address",
                  "type":["null","string"],
                  "doc":""
                }
              ]
            }

          ]
        }
      ],
      "doc":""
    },
    {
      "name":"ccs",
      "type":
      [
        "null",
        {
          "type":"array",
          "items":
          [
            "null",
            {
              "type":"record",
              "name":"cc",
              "fields":
              [
                {
                  "name":"real_name",
                  "type":["null","string"],
                  "doc":""
                },
                {
                  "name":"address",
                  "type":["null","string"],
                  "doc":""
                }
              ]
            }

          ]
        }
      ],
      "doc":""
    },
    {
      "name":"bccs",
      "type":
      [
        "null",
        {
          "type":"array",
          "items":
          [
            "null",
            {
              "type":"record",
              "name":"bcc",
              "fields":
              [
                {
                  "name":"real_name",
                  "type":["null","string"],
                  "doc":""
                },
                {
                  "name":"address",
                  "type":["null","string"],
                  "doc":""
                }
              ]
            }

          ]
        }
      ],
      "doc":""
    },
    {
      "name":"reply_tos",
      "type":
      [
        "null",
        {
          "type":"array",
          "items":
          [
            "null",
            {
              "type":"record",
              "name":"reply_to",
              "fields":
              [
                {
                  "name":"real_name",
                  "type":["null","string"],
                  "doc":""