Re: AVRO definition question - record within a record?

2020-04-01 Thread fady
This code, using your schema: 

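// "person.avsc" is a placeholder name for the file that holds your schema.
Schema schema = new Schema.Parser().parse(new File("person.avsc"));
// The third argument enables pretty-printing of the JSON output.
JsonEncoder out = EncoderFactory.get().jsonEncoder(schema, System.out, true);
DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);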
GenericRecord person = new GenericData.Record(schema);
person.put("location", 5);
person.put("country", "TH");
person.put("animal_number", "7");
person.put("alert_id", "ab1");
person.put("alert_date", "2014-12-05");
person.put("type_of_alert", "tu");
person.put("alert_name", "zu");
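// The nested record is built from the schema of the "additionalInformation" field.
GenericRecord test = new GenericData.Record(schema.getField("additionalInformation").schema());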
test.put("calving_date", "2014-12-05");
test.put("parity", "p");
test.put("create_dtm_dl", "12:12:12");
person.put("additionalInformation", test);
writer.write(person, out);
// The encoder buffers its output, so flush it to see the result.
out.flush();

Produces this result: 

{
  "location" : 5,
  "country" : "TH",
  "animal_number" : "7",
  "alert_id" : "ab1",
  "alert_date" : "2014-12-05",
  "type_of_alert" : "tu",
  "alert_name" : "zu",
  "additionalInformation" : {
    "calving_date" : "2014-12-05",
    "parity" : "p",
    "create_dtm_dl" : "12:12:12"
  }
}

So nested records are properly supported in Avro and widely used.

Maybe something is wrong in the code you are using?


On 31.03.2020 08:51, Erwin Speybroeck wrote:


I need to be able to make a POST call to an API and the body should look like this : 


{
  "location" : "355669",
  "countryCode" : "NL",
  "identificationNumber" : "NL 672760327",
  "externalId" : "KTSPRED_01_817997491",
  "dateTime" : "2019-11-08T04:33:41.000Z",
  "value" : "GEMIDDELD_RISICO",
  "type" : "ketosis_prediction",
  "additionalInformation" : {
    "calvingDate": "2018-10-01",
    "parity": "3",
    "create_date": "2019-11-08T04:33:41.000Z"
  }
}



I tried the following AVRO definition for serialisation (starting from a csv file) : 


{
  "type" : "record",
  "name" : "person",
  "namespace" : "nifi",
  "fields" : [
    {"name" : "location", "type" : "int"},
    {"name" : "country", "type" : "string"},
    {"name" : "animal_number", "type" : "string"},
    {"name" : "alert_id", "type" : "string"},
    {"name" : "alert_date", "type" : "string"},
    {"name" : "type_of_alert", "type" : "string"},
    {"name" : "alert_name", "type" : "string"},
    {"name" : "additionalInformation",
     "type" : {
       "type" : "record",
       "name" : "test",
       "fields" : [
         {"name" : "calving_date", "type" : "string"},
         {"name" : "parity", "type" : "string"},
         {"name" : "create_dtm_dl", "type" : "string"}
       ]
     },
     "default" : {}
    }
  ]
}




But it does not work. Is it possible to define a new record within a record? Or should it be done in another way? 

My hive tables are in CSV and I have to convert them to JSON so I can post them. 

To create this JSON I have to use an AVRO schema. It works fine until the field "additionalInformation". 

I'm not able to generate the fields inside additionalInformation, the only thing I can do is to say that additionalInformation is a string. But then it doesn't create the fields that I want and it doesn't post it. 

Above is my AVRO schema trying to create the JSON. The BOLD part is the one trying to create the additionalInformation field as a record, but it doesn't work and I have to change the type to string so that it works, but then the POST body is not JSON.

The csv file looks like this - maybe I need to change this input file in some way? 


"ketosis_prediction";"NL 743169121";"NL";83618;"KTSPRED_01_817997482";"HOOG_RISICO";"2019-11-08 04:33:38.0";2019-11-07 00:00:00.0;4;2019-11-09 19:13:29.484 

"ketosis_prediction";"NL 672760327";"NL";355669;"KTSPRED_01_817997491";"GEMIDDELD_RISICO";"2019-11-08 04:33:41.0";2019-11-07 00:00:00.0;3;2019-11-09 19:13:29.484 

Re: New Committer: Ryan Skraba

2019-12-17 Thread fady
Excellent news. Ryan, you are doing a fantastic job. Congratulations! 


Re: AvroTypeException with "Expected" case : field name is not provided

2019-09-13 Thread fady
On 13.09.2019 11:46, Yanna elina wrote:

> Hi guys ,
> in 1.9 when there is exception like this  "org.apache.avro.AvroTypeException: 
> Expected start-union. Got END_OBJECT" .   
> and Using genericRecord =, decoder);
> its could be nice to provide the field in error. it could be  more easy to 
> debug the schema.
> thx !

That would be great, yes. The field name may not be sufficient though;
the location relative to the schema root would be better, I think.

Internally we have defined a simple spec we call avroloc that looks like
recorda.field1.field2.1.recordb.field3 where the numeric step .1 is the
zero-based alternative in case you traverse a union. To signal a
particular item in an array, field2[3] is used in the avroloc. 
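
To make the notation concrete, here is a minimal sketch of a builder that
produces such a location string. The class and method names (AvroLocSketch,
name, unionBranch, arrayItem) are made up for this illustration; they are not
part of Avro and not our actual internal code. Only the output format follows
the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds a location string in the avroloc style. */
public class AvroLocSketch {
    private final List<String> steps = new ArrayList<>();

    /** Step into a named record or field. */
    public AvroLocSketch name(String name) {
        steps.add(name);
        return this;
    }

    /** Step into the zero-based alternative of a union. */
    public AvroLocSketch unionBranch(int index) {
        steps.add(Integer.toString(index));
        return this;
    }

    /** Select one item of the array reached by the previous step. */
    public AvroLocSketch arrayItem(int index) {
        if (steps.isEmpty()) {
            throw new IllegalStateException("arrayItem needs a preceding step");
        }
        int last = steps.size() - 1;
        steps.set(last, steps.get(last) + "[" + index + "]");
        return this;
    }

    /** Joins all steps with dots, starting at the schema root. */
    @Override
    public String toString() {
        return String.join(".", steps);
    }

    public static void main(String[] args) {
        String loc = new AvroLocSketch()
            .name("recorda").name("field1").name("field2")
            .unionBranch(1).name("recordb").name("field3")
            .toString();
        System.out.println(loc); // prints recorda.field1.field2.1.recordb.field3
    }
}
```

A decoder could keep such a builder up to date while it walks the schema and
append its string form to the exception message when a read fails.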

While we are on the topic of read error reporting, record number and
offset within the record where the offending data is would also help. 


Re: Avro schema properties contention on multithread read

2017-07-06 Thread fady
On 05.07.2017 21:53, Zoltan Farkas wrote:

> The synchronization in JsonProperties is currently inconsistent (see 
> getObjectProps()) which makes current implementation @NotThreadSafe 
> I think it would be probably best to remove synchronization from those 
> methods... and add @NotThreadSafe to the class... 
> Utilities like Schemas.synchronizedSchema(...) and 
> Schemas.unmodifiableSchema(...) could be added to help with various use 
> cases... 
> --Z

Thank you for your reply. I like your Schemas.unmodifiableSchema(...) a

While what you are describing would be ideal, a simpler solution might
be to change the LinkedHashMap that backs jsonProperties into something
like a ConcurrentHashMap, avoiding the need for synchronization. 

That said, ConcurrentHashMap does not preserve insertion order, so it
is not a drop-in replacement for LinkedHashMap.
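
The ordering difference can be seen with a small stand-alone sketch. It uses
only java.util classes, not Avro's JsonProperties; the class name
PropsOrderSketch and the sample keys are made up for illustration. The
Collections.synchronizedMap wrapper is included as one possible way to keep
insertion order while making individual calls thread-safe. It is an option I
am adding here, not something proposed earlier in this thread, and it brings
back a lock on every access.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical demo comparing the iteration order of candidate backing maps. */
public class PropsOrderSketch {

    /** Inserts the same keys into the given map and returns its iteration order. */
    public static List<String> keyOrder(Map<String, String> map) {
        for (String key : new String[] {"zeta", "alpha", "mid", "beta"}) {
            map.put(key, "v");
        }
        // Iterating a synchronizedMap requires holding its lock;
        // for the other map types this is harmless.
        synchronized (map) {
            return new ArrayList<>(map.keySet());
        }
    }

    public static void main(String[] args) {
        // Insertion order is guaranteed: zeta, alpha, mid, beta.
        System.out.println(keyOrder(new LinkedHashMap<String, String>()));
        // The wrapper delegates to the LinkedHashMap, so the order is kept.
        System.out.println(keyOrder(
            Collections.synchronizedMap(new LinkedHashMap<String, String>())));
        // No ordering guarantee: the result depends on the key hashes.
        System.out.println(keyOrder(new ConcurrentHashMap<String, String>()));
    }
}
```

Whether property order matters in practice depends on the callers, for
example anything that compares or prints the schema JSON text.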

Re: Avro in OSGi environment

2016-01-24 Thread Fady

On 22/01/2016 17:15, Bernd Wiswedel wrote:

Has anyone successfully run Avro in an OSGi environment? There was an
issue fixed for release 1.7.6 [1] and I also see the 'correct'
MANIFEST.MF in the build. This is what it looks like (from [2]):

  29 Implementation-Title: Apache Avro
  30 Implementation-Version: 1.7.7
  31 Built-By: cutting
  32 Specification-Vendor: The Apache Software Foundation
  33 Tool: Bnd-0.0.357
  34 Bundle-Name: Apache Avro
  35 Created-By: Apache Maven Bundle Plugin
  36 Implementation-Vendor: The Apache Software Foundation
  37 Bundle-Vendor: The Apache Software Foundation
  38 Implementation-Vendor-Id: org.apache.avro
  39 Bundle-Version: 1.7.7
  40 Build-Jdk: 1.7.0_45
  41 Bnd-LastModified: 1405714122455
  42 Bundle-ManifestVersion: 2
  43 Specification-Title: Apache Avro
  44 Bundle-Description: Avro core components
  45 Bundle-License:
  46 Bundle-DocURL:
  47 Import-Package: com.thoughtworks.paranamer,org.apache.commons.compress
  48  .compressors.bzip2;version="1.4",org.apache.commons.compress.compress
  49  ors.xz;version="1.4",org.apache.commons.compress.utils;version="1.4",
  50  org.codehaus.jackson;version="1.9",;version="1
  51  .9",;version="1.9",org.codehaus.jackson.node;
  52  version="1.9",org.codehaus.jackson.util;version="1.9",org.slf4j;versi
  53  on="1.6",org.xerial.snappy;resolution:=optional;version="1.0",sun.mis
  54  c
  55 Bundle-SymbolicName: avro
  56 Specification-Version: 1.7.7

But then in line 53 it imports a package 'sun.misc', which is part of
the java environment (and therefore not provided by any other standard
bundle). Package not available = bundle will not start. I know how to
work around it but shouldn't that be excluded?



I managed to use avro-1.7.7 in an OSGi environment. You can either use 
it directly and get the sun.misc OSGified version here:


Or use the servicemix rebundling of avro which has less mandatory 
