Re: AVRO definition question - record within a record?

2020-04-01 Thread fady
This code, using your schema: 


import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

// Parse the schema and set up a pretty-printing JSON encoder on stdout.
Schema schema = new Schema.Parser().parse(
        new File("src/test/data/nestedrecord.avsc"));
JsonEncoder out = EncoderFactory.get().jsonEncoder(schema, System.out, true);
DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);

// Populate the outer record.
GenericRecord person = new GenericData.Record(schema);
person.put("location", 5);
person.put("country", "TH");
person.put("animal_number", "7");
person.put("alert_id", "ab1");
person.put("alert_date", "2014-12-05");
person.put("type_of_alert", "tu");
person.put("alert_name", "zu");

// The nested record is built against the sub-schema of the
// "additionalInformation" field.
GenericRecord test = new GenericData.Record(
        schema.getField("additionalInformation").schema());
test.put("calving_date", "2014-12-05");
test.put("parity", "p");
test.put("create_dtm_dl", "12:12:12");
person.put("additionalInformation", test);

writer.write(person, out);
out.flush();

Produces this result: 


{
  "location" : 5,
  "country" : "TH",
  "animal_number" : "7",
  "alert_id" : "ab1",
  "alert_date" : "2014-12-05",
  "type_of_alert" : "tu",
  "alert_name" : "zu",
  "additionalInformation" : {
    "calving_date" : "2014-12-05",
    "parity" : "p",
    "create_dtm_dl" : "12:12:12"
  }
}

So nested records are properly supported in Avro and widely used.

Maybe something is wrong in the code you are using?

Cheers 


On 31.03.2020 08:51, Erwin Speybroeck wrote:

Hi, 

I need to be able to make a POST call to an API and the body should look like this : 

{
  "location" : "355669",
  "countryCode" : "NL",
  "identificationNumber" : "NL 672760327",
  "externalId" : "KTSPRED_01_817997491",
  "dateTime" : "2019-11-08T04:33:41.000Z",
  "value" : "GEMIDDELD_RISICO",
  "type" : "ketosis_prediction",
  "additionalInformation" : "{
    "calvingDate": "2018-10-01",
    "parity": "3",
    "create_date": "2019-11-08T04:33:41.000Z"
  }"
}

I tried the following AVRO definition for serialisation (starting from a CSV file):

{
  "type" : "record",
  "name" : "person",
  "namespace" : "nifi",
  "fields" : [
    {"name" : "location", "type" : "int"},
    {"name" : "country", "type" : "string"},
    {"name" : "animal_number", "type" : "string"},
    {"name" : "alert_id", "type" : "string"},
    {"name" : "alert_date", "type" : "string"},
    {"name" : "type_of_alert", "type" : "string"},
    {"name" : "alert_name", "type" : "string"},
    {"name" : "additionalInformation",
     "type" : {
       "type" : "record",
       "name" : "test",
       "fields" : [
         {"name" : "calving_date", "type" : "string"},
         {"name" : "parity", "type" : "string"},
         {"name" : "create_dtm_dl", "type" : "string"}
       ]
     },
     "default" : {}
    }
  ]
}

But it does not work. Is it possible to define a new record within a record? Or should it be done in another way? 

My Hive tables are in CSV and I have to convert them to JSON so I can post them.

To create this JSON I have to use an AVRO schema. It works fine until the field "additionalInformation".

I'm not able to generate the fields inside additionalInformation; the only thing I can do is declare additionalInformation as a string. But then it doesn't create the fields I want, and the POST doesn't go through.

Above is my AVRO schema for creating the JSON. The part in bold (the nested additionalInformation record definition) is the one that doesn't work; I have to change its type to string to make the conversion run, but then the POST body is not valid JSON.

The CSV file looks like this; maybe I need to change the input file in some way?

alert_name;animal_number;country;location;alert_id;type_of_alert;alert_date;calving_date;parity;create_dtm_dl 

"ketosis_prediction";"NL 743169121";"NL";83618;"KTSPRED_01_817997482";"HOOG_RISICO";"2019-11-08 04:33:38.0";2019-11-07 00:00:00.0;4;2019-11-09 19:13:29.484 

"ketosis_prediction";"NL 672760327";"NL";355669;"KTSPRED_01_817997491";"GEMIDDELD_RISICO";"2019-11-08 04:33:41.0";2019-11-07 00:00:00.0;3;2019-11-09 19:13:29.484 

Kind regards,

Erwin Speybroeck 

_Lead Business Consultant | BU Data_ 

(0)26-3898621 

0032475-252401 

erwin.speybro...@crv4all.com 



Re: New Committer: Ryan Skraba

2019-12-17 Thread fady
Excellent news. Ryan, you are doing a fantastic job. Congratulations! 

Fady

Re: AvroTypeException with "Expected" case : field name is not provided

2019-09-13 Thread fady
On 13.09.2019 11:46, Yanna elina wrote:

> Hi guys,
> 
> In 1.9, when there is an exception like "org.apache.avro.AvroTypeException:
> Expected start-union. Got END_OBJECT" while using
> genericRecord = reader.read(null, decoder);
> 
> it would be nice to report the field in error. That would make the schema
> easier to debug.
> 
> thx!

That would be great, yes. The field name may not be sufficient, though;
the location relative to the schema root would be better, I think.

Internally we have defined a simple spec we call avroloc that looks like
recorda.field1.field2.1.recordb.field3, where the numeric step .1 is the
zero-based alternative taken when traversing a union. To signal a
particular item in an array, field2[3] is used in the avroloc.
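
For illustration, a minimal sketch of a helper that could accumulate such a
location while a reader descends into a datum. The class and method names
here are hypothetical, not our actual implementation and not part of Avro:

import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical tracker that renders the current position in the
// avroloc style described above.
class AvroLoc {
    private final Deque<String> steps = new ArrayDeque<>();

    void pushField(String name) { steps.addLast(name); }

    // A bare numeric step records the zero-based union alternative taken.
    void pushUnionBranch(int zeroBasedBranch) {
        steps.addLast(String.valueOf(zeroBasedBranch));
    }

    // An array index attaches to the step holding the array (field2[3]),
    // so it is appended to the last step instead of becoming its own step.
    void enterArrayItem(int index) {
        steps.addLast(steps.removeLast() + "[" + index + "]");
    }

    void pop() { steps.removeLast(); }

    @Override
    public String toString() { return String.join(".", steps); }
}

Reproducing the example above:

AvroLoc loc = new AvroLoc();
loc.pushField("recorda");
loc.pushField("field1");
loc.pushField("field2");
loc.pushUnionBranch(1);   // second alternative of the union
loc.pushField("recordb");
loc.pushField("field3");
System.out.println(loc);  // recorda.field1.field2.1.recordb.field3

A reader maintaining such a tracker could include its toString() in
AvroTypeException messages.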

While we are on the topic of read error reporting, the record number, and
the offset within the record of the offending data, would also help.

Fady

Re: Avro schema properties contention on multithread read

2017-07-06 Thread fady
On 05.07.2017 21:53, Zoltan Farkas wrote:

> The synchronization in JsonProperties is currently inconsistent (see
> getObjectProps()), which makes the current implementation @NotThreadSafe.
> 
> I think it would probably be best to remove synchronization from those
> methods... and add @NotThreadSafe to the class...
> Utilities like Schemas.synchronizedSchema(...) and
> Schemas.unmodifiableSchema(...) could be added to help with various use
> cases...
> 
> --Z

Thank you for your reply. I like your Schemas.unmodifiableSchema(...) a
lot. 

While what you are describing would be ideal, a simpler solution might
be to change the LinkedHashMap that backs jsonProperties into something
like a ConcurrentHashMap, avoiding the need for synchronization. 

That being said, ConcurrentHashMap itself does not preserve insertion
order, so it is not a drop-in replacement for LinkedHashMap.
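
To illustrate that caveat with a standalone sketch (plain JDK code,
nothing Avro-specific):

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class OrderDemo {
    public static void main(String[] args) {
        Map<String, Integer> linked = new LinkedHashMap<>();
        Map<String, Integer> concurrent = new ConcurrentHashMap<>();
        for (String key : new String[] {"zebra", "apple", "mango"}) {
            linked.put(key, 1);
            concurrent.put(key, 1);
        }
        // LinkedHashMap iterates in insertion order: [zebra, apple, mango].
        System.out.println(linked.keySet());
        // ConcurrentHashMap iterates in a hash-determined order that
        // generally differs from insertion order.
        System.out.println(concurrent.keySet());
    }
}

So with a plain ConcurrentHashMap the properties could come back in a
different order, which matters wherever the schema JSON is re-serialized
and compared textually.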

Re: Avro in OSGi environment

2016-01-24 Thread Fady

On 22/01/2016 17:15, Bernd Wiswedel wrote:

Has anyone successfully run Avro in an OSGi environment? There was an
issue fixed for release 1.7.6 [1] and I also see the 'correct'
MANIFEST.MF in the build. This is what it looks like (from [2]):


  29 Implementation-Title: Apache Avro
  30 Implementation-Version: 1.7.7
  31 Built-By: cutting
  32 Specification-Vendor: The Apache Software Foundation
  33 Tool: Bnd-0.0.357
  34 Bundle-Name: Apache Avro
  35 Created-By: Apache Maven Bundle Plugin
  36 Implementation-Vendor: The Apache Software Foundation
  37 Bundle-Vendor: The Apache Software Foundation
  38 Implementation-Vendor-Id: org.apache.avro
  39 Bundle-Version: 1.7.7
  40 Build-Jdk: 1.7.0_45
  41 Bnd-LastModified: 1405714122455
  42 Bundle-ManifestVersion: 2
  43 Specification-Title: Apache Avro
  44 Bundle-Description: Avro core components
  45 Bundle-License: http://www.apache.org/licenses/LICENSE-2.0.txt
  46 Bundle-DocURL: http://www.apache.org/
  47 Import-Package: com.thoughtworks.paranamer,org.apache.commons.compress
  48  .compressors.bzip2;version="1.4",org.apache.commons.compress.compress
  49  ors.xz;version="1.4",org.apache.commons.compress.utils;version="1.4",
  50  org.codehaus.jackson;version="1.9",org.codehaus.jackson.io;version="1
  51  .9",org.codehaus.jackson.map;version="1.9",org.codehaus.jackson.node;
  52  version="1.9",org.codehaus.jackson.util;version="1.9",org.slf4j;versi
  53  on="1.6",org.xerial.snappy;resolution:=optional;version="1.0",sun.mis
  54  c
  55 Bundle-SymbolicName: avro
  56 Specification-Version: 1.7.7

But then on line 53 it imports the package 'sun.misc', which is part of
the Java environment (and therefore not provided by any other standard
bundle). Package not available = bundle will not start. I know how to
work around it, but shouldn't that import be excluded?

Thanks!

[1] https://issues.apache.org/jira/browse/AVRO-987
[2]
http://search.maven.org/remotecontent?filepath=org/apache/avro/avro/1.7.7/avro-1.7.7.jar


I managed to use avro-1.7.7 in an OSGi environment. You can either use 
it directly and get the sun.misc OSGified version here:

<dependency>
  <groupId>com.github.livesense</groupId>
  <artifactId>org.liveSense.fragment.sun.misc</artifactId>
  <version>1.0.5</version>
</dependency>

Or use the servicemix rebundling of avro, which has fewer mandatory
dependencies:

<dependency>
  <groupId>org.apache.servicemix.bundles</groupId>
  <artifactId>org.apache.servicemix.bundles.avro</artifactId>
  <version>1.7.7_1</version>
</dependency>
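
A further workaround, not mentioned in this thread but a standard OSGi
mechanism: have the framework itself export sun.misc, so that the
Import-Package in Avro's manifest resolves against the system bundle. Using
the standard framework launching property (shown here as it would appear in
Felix's conf/config.properties):

org.osgi.framework.system.packages.extra=sun.misc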