Re: Avro schema having Map of Records

2019-08-06 Thread Ryan Skraba
Funny, I'm familiar with Avro, but I'm currently looking closely at Parquet!

Interestingly enough, I just ran across the conversion utilities in
Spark that could have answered your original question[1].

It looks like you're using ReflectData to get the schema.  Is the
exception occurring during ReflectData.getSchema() or .induce()?
Can you share the full stack trace or, better yet, the POJO that
reproduces the error?

I _think_ I may have run across something similar when getting a
schema via reflection, where the class had a raw collection field (List
instead of List<T>).  I can't clearly recall, but that might be
a useful hint.
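
To illustrate the raw-collection hint, here is a minimal sketch that uses
only java.lang.reflect (no Avro classes). It shows that a raw List field
exposes no element type through reflection, while a parameterized one does.
The class and field names are hypothetical, and it is only an assumption
that reflection-based schema generation depends on this generic information
in the way the error message suggests.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

final class RawCollectionCheck {
    // Hypothetical POJO with one raw and one parameterized collection field.
    static class Pojo {
        @SuppressWarnings("rawtypes")
        List rawItems;
        List<String> typedItems;
    }

    /** Returns the element type name of a collection field, or null if the field is raw. */
    static String elementTypeOf(Class<?> owner, String fieldName) {
        Field field;
        try {
            field = owner.getDeclaredField(fieldName);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No such field: " + fieldName, e);
        }
        Type generic = field.getGenericType();
        if (generic instanceof ParameterizedType) {
            // Parameterized field: the first type argument is the element type.
            Type[] typeArgs = ((ParameterizedType) generic).getActualTypeArguments();
            return typeArgs[0].getTypeName();
        }
        // Raw field: reflection has no element type to report.
        return null;
    }

    public static void main(String[] args) {
        System.out.println("rawItems   -> " + elementTypeOf(Pojo.class, "rawItems"));
        System.out.println("typedItems -> " + elementTypeOf(Pojo.class, "typedItems"));
    }
}
```

If the POJO has a field like rawItems, declaring its element type
explicitly would be the first thing to try.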

[1]: https://github.com/apache/spark/blob/master/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L136


Re: Schema parses in C# Avro lib but not in Kafka Schema registry (assume that it is the java lib)

2019-08-06 Thread Patrick Farry
Thanks Ryan and Bryan.

Turned out it was user error :(. One of our guys made a tweak to the schema
before trying to upload it.

On Tue, Aug 6, 2019, 2:20 AM Ryan Skraba  wrote:

> Hello!  I successfully loaded the schema into the Confluent
> schema-registry version 5.3.0 (containing Avro 1.8.1), using the docker
> quickstart [1] and the command line below.
>
> I only tested that the load worked -- I didn't try reading or writing
> binary.
>
> # Save the schema into registry.
> curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
>   --data "@PackageCreateInformationSchema.json" \
>   http://localhost:8081/subjects/PackageCreateInformation/versions
> # Fetch and check the schema from the registry.
> curl -X GET \
>   http://localhost:8081/subjects/PackageCreateInformation/versions/1
>
> The PackageCreateInformationSchema.json file has a slightly unusual
> format, {"schema": "your_schema_string_representation"}, which requires
> escaping a lot of quotes (the exact contents I used follow if you want to
> reproduce).
>
> How are you loading the schema, and what version of the schema-registry
> are you using?  Perhaps we can narrow it down to a specific avro version
> and see if we can reproduce it outside of the registry.
>
> Best regards, Ryan
>
> [1]
> https://docs.confluent.io/current/quickstart/cos-docker-quickstart.html
>
> PackageCreateInformationSchema.json
>
>
> 
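
The {"schema": "..."} wrapper described in the quoted message can be
produced programmatically rather than escaped by hand. The sketch below is
one way to do that with plain Java string handling. The class name, the
helper names, and the tiny "Example" schema are hypothetical, and a JSON
library would normally be a better choice than hand-rolled escaping.

```java
final class RegistryPayload {
    /** Escapes a string so it can be embedded inside a JSON string literal. */
    static String escapeJson(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become 4-digit hex escapes.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    /** Wraps an Avro schema (as JSON text) in the {"schema": "..."} envelope. */
    static String wrap(String schemaJson) {
        return "{\"schema\": \"" + escapeJson(schemaJson) + "\"}";
    }

    public static void main(String[] args) {
        // Hypothetical schema, used only to show the escaping.
        String schema = "{\"type\": \"record\", \"name\": \"Example\", \"fields\": []}";
        System.out.println(wrap(schema));
    }
}
```

The printed text is what would go into the file passed to curl via --data.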

Re: Avro schema having Map of Records

2019-08-06 Thread Edgar H
Thanks a lot for the quick reply Ryan! That was exactly what I was looking
for :)

I've been trying to include the changes in my code, and currently it's
throwing the following exception... Caused by:
org.apache.avro.AvroRuntimeException: Can't find element type of Collection

I suspect the POJO may be missing the classes for the inner record fields
(I only have a getter and setter for the one_level field; the rest are
types within that one). If so, how should they be represented within the
parent POJO?

Sorry if the questions sound too simple, but I'm so used to working with
Parquet that Avro feels like a shift from time to time :)


Re: Avro schema having Map of Records

2019-08-06 Thread Ryan Skraba
Hello -- Avro supports a map type:
https://avro.apache.org/docs/1.9.0/spec.html#Maps

Generating an Avro schema from a JSON example can be ambiguous since a
JSON object can either be converted to a record or a map.  You're
probably looking for something like this:

{
  "type" : "record",
  "name" : "MyClass",
  "namespace" : "com.acme.avro",
  "fields" : [ {
    "name" : "one_level",
    "type" : {
      "type" : "record",
      "name" : "one_level",
      "fields" : [ {
        "name" : "inner_level",
        "type" : {
          "type" : "map",
          "values" : {
            "type" : "record",
            "name" : "sample",
            "fields" : [ {
              "name" : "sample1",
              "type" : "string"
            }, {
              "name" : "sample2",
              "type" : "string"
            } ]
          }
        }
      } ]
    }
  } ]
}
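
Note that the record above lists sample1 and sample2 as string fields,
whereas the Spark type in the question has field1 (a string) and field2 (an
array of strings) inside each map value. As a rough sketch, the same
map-of-records shape with those two fields could be assembled as below. This
is plain Java string building rather than the Avro API. The helper names and
the record name "inner_value" are hypothetical, and the namespace is left
out for brevity.

```java
final class MapOfRecordsSchema {
    /** One record field: {"name": ..., "type": ...}. */
    static String field(String name, String typeJson) {
        return "{\"name\": \"" + name + "\", \"type\": " + typeJson + "}";
    }

    /** A named record with the given fields. */
    static String record(String name, String... fields) {
        return "{\"type\": \"record\", \"name\": \"" + name + "\", \"fields\": ["
                + String.join(", ", fields) + "]}";
    }

    /** A map type; Avro map keys are always strings, so only values are declared. */
    static String map(String valuesJson) {
        return "{\"type\": \"map\", \"values\": " + valuesJson + "}";
    }

    /** An array type with the given item type. */
    static String array(String itemsJson) {
        return "{\"type\": \"array\", \"items\": " + itemsJson + "}";
    }

    static String build() {
        // Map value: a record with field1 (string) and field2 (array of strings).
        String value = record("inner_value",
                field("field1", "\"string\""),
                field("field2", array("\"string\"")));
        String oneLevel = record("one_level", field("inner_level", map(value)));
        return record("MyClass", field("one_level", oneLevel));
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

Spark's StructField defaults to nullable = true, so a faithful translation
of field1 and field2 may also need unions with "null"; the sketch ignores
nullability.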


Avro schema having Map of Records

2019-08-06 Thread Edgar H
I'm trying to translate a schema that I have in Spark which is defined for
Parquet, and I would like to use it within Avro too.

StructField("one_level", StructType(List(StructField(
  "inner_level",
  MapType(
    StringType,
    StructType(
      List(
        StructField("field1", StringType),
        StructField("field2", ArrayType(StringType))
      )
    )
  )
))), nullable = false)

However, in Avro I haven't seen any examples of Maps containing Record type
objects...

I tried an online Avro schema generator with this sample input:

{
  "one_level": {
    "inner_level": {
      "sample1": {
        "field1": "sample",
        "field2": ["a", "b"]
      },
      "sample2": {
        "field1": "sample2",
        "field2": ["a", "b"]
      }
    }
  }
}

It produces this output:

{
  "name": "MyClass",
  "type": "record",
  "namespace": "com.acme.avro",
  "fields": [
    {
      "name": "one_level",
      "type": {
        "name": "one_level",
        "type": "record",
        "fields": [
          {
            "name": "inner_level",
            "type": {
              "name": "inner_level",
              "type": "record",
              "fields": [
                {
                  "name": "sample1",
                  "type": {
                    "name": "sample1",
                    "type": "record",
                    "fields": [
                      {
                        "name": "field1",
                        "type": "string"
                      },
                      {
                        "name": "field2",
                        "type": {
                          "type": "array",
                          "items": "string"
                        }
                      }
                    ]
                  }
                },
                {
                  "name": "sample2",
                  "type": {
                    "name": "sample2",
                    "type": "record",
                    "fields": [
                      {
                        "name": "field1",
                        "type": "string"
                      },
                      {
                        "name": "field2",
                        "type": {
                          "type": "array",
                          "items": "string"
                        }
                      }
                    ]
                  }
                }
              ]
            }
          }
        ]
      }
    }
  ]
}

That isn't what I'm looking for at all. Is it possible to define such a
schema in Avro?