Re: AVRO schema evolution: adding optional column with default fails deserialization

2019-08-01 Thread Svante Karlsson
First of all, you can use Confluent's schema registry as you wish - it's not
in the paid bundle as long as you are not hosting Kafka as a service (i.e.
Amazon et al.). And I would recommend that you do: it's good and trivial to
operate.

Second, take a look at the serializer in my pet project at:
https://github.com/bitbouncer/kspp/blob/master/include/kspp/avro/avro_serdes.h
(around line 96).

Note that this encoder/decoder does not support schema evolution, but it
discovers the actual written schema and gets an avro::ValidSchema from the
schema registry on read. And this is what you need.

This is of course C++, but you can probably figure out what you need to do.

In the end you will need a rest/grpc service somewhere that your serializer
can use to get an id that you can refer to across your infrastructure. I
did write one some years ago but reverted to Confluent's since most people
use that.
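
Roughly, the read side then looks something like this in C++ (an untested
sketch; fetch_writer_schema() is a placeholder for whatever REST/gRPC lookup
you end up with, and the framing is the Confluent one described further down
in this thread):

// Sketch only: decode a Confluent-framed Kafka value with schema resolution.
#include <avro/Decoder.hh>
#include <avro/Specific.hh>
#include <avro/Stream.hh>
#include <avro/ValidSchema.hh>
#include <cstdint>
#include <stdexcept>

// placeholder: ask your schema registry for the schema registered under this id
avro::ValidSchema fetch_writer_schema(int32_t schema_id);

template <class T>
T decode_confluent(const uint8_t* payload, size_t size,
                   const avro::ValidSchema& reader_schema) {
  if (size < 5 || payload[0] != 0)
    throw std::runtime_error("not a confluent-framed message");
  // byte 0: magic (0), bytes 1..4: schema id, big endian
  int32_t schema_id = (int32_t(payload[1]) << 24) | (int32_t(payload[2]) << 16) |
                      (int32_t(payload[3]) << 8) | int32_t(payload[4]);
  avro::ValidSchema writer_schema = fetch_writer_schema(schema_id);
  auto in = avro::memoryInputStream(payload + 5, size - 5);
  // the resolving decoder applies the schema evolution rules (writer -> reader)
  auto decoder = avro::resolvingDecoder(writer_schema, reader_schema,
                                        avro::binaryDecoder());
  decoder->init(*in);
  T value;
  avro::decode(*decoder, value);
  return value;
}

The same steps apply in Java: strip the 5-byte header, look the id up in the
registry, and hand both writer and reader schema to the datum reader.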

/svante


On Thu, Aug 1, 2019 at 18:05, Martin Mucha wrote:

> Thanks for answer!
>
> Regarding "which byte[] are we talking about?" — actually I don't know.
> Let's break it down together.
>
> I'm pretty sure that we're not using the Confluent platform (IIUC the paid
> bundle, right?). I shared a serializer before [1], so you're saying that
> this won't include either the schema ID or the schema, OK? OK, let's assume
> that. Next: we're using the Spring Kafka project to get this serialized
> data and send it over Kafka. So we don't have any schema registry, but in
> principle it could be possible to include the schema within each message.
> But I cannot see how that could be done. Spring Kafka requires us to
> provide it
> org.apache.kafka.clients.producer.ProducerConfig#VALUE_SERIALIZER_CLASS_CONFIG,
> which we did, but it's just a class calling serializer [1], and from that
> point on I have no idea how it could figure out the schema that was used.
> The question I'm asking here is whether, when sending Avro bytes (obtained
> via the provided serializer [1]), they are or can be somehow paired with
> the schema used to serialize the data. Is this what Kafka senders do, or
> can do? Include the ID/whole schema somewhere in the headers or ...? And
> when I read Kafka messages, will the schema be (or could it be) stored
> somewhere in the ConsumerRecord or somewhere like that?
>
> Sorry for the confused questions, but I'm really missing the knowledge to
> even ask properly.
>
> thanks,
> Martin.
>
> [1]
> public static <T extends GenericContainer> byte[] serialize(T data,
>         boolean useBinaryDecoder, boolean pretty) {
>     try {
>         if (data == null) {
>             return new byte[0];
>         }
>
>         log.debug("data='{}'", data);
>         Schema schema = data.getSchema();
>         ByteArrayOutputStream byteArrayOutputStream = new
>                 ByteArrayOutputStream();
>         Encoder binaryEncoder = useBinaryDecoder
>                 ? EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null)
>                 : EncoderFactory.get().jsonEncoder(schema,
>                         byteArrayOutputStream, pretty);
>
>         DatumWriter<T> datumWriter = new GenericDatumWriter<>(schema);
>         datumWriter.write(data, binaryEncoder);
>
>         binaryEncoder.flush();
>         byteArrayOutputStream.close();
>
>         byte[] result = byteArrayOutputStream.toByteArray();
>         log.debug("serialized data='{}'",
>                 DatatypeConverter.printHexBinary(result));
>         return result;
>     } catch (IOException ex) {
>         throw new SerializationException(
>                 "Can't serialize data='" + data, ex);
>     }
> }
>
> On Thu, Aug 1, 2019 at 17:06, Svante Karlsson wrote:
>
>> For clarity: What byte[] are we talking about?
>>
>> You are slightly missing my point if we are speaking about kafka.
>>
>> Confluent encoding:
>>
>> magic byte (0) | schema_id (int32, big endian) | avro_binary_payload
>>
>> avro_binary_payload does not in any case contain the schema or the schema
>> id. The schema id is a Confluent thing. (In an Avro data file the schema is
>> prepended by value in the file.)
>>
>> While it's trivial to build a schema registry that, for example, instead
>> gives you an md5 hash of the schema, you then have to use it throughout
>> your infrastructure OR use known reader and writer schemas (i.e. hardcoded).
>>
>> In the Confluent world, id=N is the N+1'th registered schema in the
>> database (a kafka topic), if I remember right. Lose that database and you
>> cannot read your kafka topics.
>>
>> So you have to use some other encoder, homegrown or not, that embeds
>> either the full schema in every message (expensive) or some id. Does this
>> make sense?
>>
>> /svante
>>

Re: AVRO schema evolution: adding optional column with default fails deserialization

2019-08-01 Thread Svante Karlsson
For clarity: What byte[] are we talking about?

You are slightly missing my point if we are speaking about kafka.

Confluent encoding:

magic byte (0) | schema_id (int32, big endian) | avro_binary_payload

avro_binary_payload does not in any case contain the schema or the schema id.
The schema id is a Confluent thing. (In an Avro data file the schema is
prepended by value in the file.)

While it's trivial to build a schema registry that, for example, instead
gives you an md5 hash of the schema, you then have to use it throughout your
infrastructure OR use known reader and writer schemas (i.e. hardcoded).

In the Confluent world, id=N is the N+1'th registered schema in the database
(a kafka topic), if I remember right. Lose that database and you cannot
read your kafka topics.

So you have to use some other encoder, homegrown or not, that embeds either
the full schema in every message (expensive) or some id. Does this make
sense?
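
For completeness, producing that framing yourself is just five bytes in front
of the Avro binary; a minimal C++ sketch (the schema id being whatever your
registry handed you):

// Sketch: wrap an Avro binary payload in the Confluent single-record framing.
#include <cstdint>
#include <vector>

std::vector<uint8_t> confluent_frame(int32_t schema_id,
                                     const std::vector<uint8_t>& avro_payload) {
  std::vector<uint8_t> out;
  out.reserve(5 + avro_payload.size());
  out.push_back(0);                                  // magic byte
  out.push_back(uint8_t((schema_id >> 24) & 0xFF));  // schema id, big endian
  out.push_back(uint8_t((schema_id >> 16) & 0xFF));
  out.push_back(uint8_t((schema_id >> 8) & 0xFF));
  out.push_back(uint8_t(schema_id & 0xFF));
  out.insert(out.end(), avro_payload.begin(), avro_payload.end());
  return out;
}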

/svante

On Thu, Aug 1, 2019 at 16:38, Martin Mucha wrote:

> Thanks for answer.
>
> What I already knew is that each message _somehow_ contains either _some_
> schema ID or the full schema. I saw some byte array manipulations to get a
> _somehow_ defined schema ID from the byte[], which worked, but that's
> definitely not how it should be done. What I'm looking for is some
> documentation of _how_ to do these things right. I really cannot find a
> single thing, yet there must be some util functions, or something. Is there
> some devel-first-steps page where I can find answers to:
>
> * How to test whether a byte[] contains the full schema or just an ID?
> * How to control whether a message is serialized with the ID or with the
> full schema?
> * How to get the ID from a byte[]?
> * How to get the full schema from a byte[]?
>
> I don't have the Confluent platform, and cannot have it, but implementing
> "get schema by ID" should be an easy task, provided that I have that ID. In
> my scenario I know that the message will be written using one schema, just
> different versions of it. So I just need to know which version it is, so
> that I can configure the deserializer to enable schema evolution.
>
> thanks in advance,
> Martin
>
> On Thu, Aug 1, 2019 at 15:55, Svante Karlsson wrote:
>
>> In an Avro data file the schema is at the beginning, but if you mean
>> single-record serialization like Kafka, then you have to add something that
>> you can use to get hold of the schema. Confluent's Avro encoder for Kafka
>> uses Confluent's schema registry, which uses an int32 as schema id. This is
>> prepended (+ a magic byte) to the binary Avro. Thus, using the schema
>> registry again, you can get the writer schema.
>>
>> /Svante
>>
>> On Thu, Aug 1, 2019, 15:30 Martin Mucha  wrote:
>>
>>> Hi,
>>>
>>> just one more question, not strictly related to the subject.
>>>
>>> Initially I thought I'd be OK with using some initial version of the schema
>>> in place of the writer schema. That works, but all columns from a schema
>>> older than this initial one would just be ignored. So I need to know
>>> EXACTLY the schema which the writer used. I know that Avro messages contain
>>> either the full schema or at least its ID. Can you point me to the
>>> documentation where this is discussed? In my deserializer I have a byte[]
>>> as input, from which I need to get the schema information first, in order
>>> to be able to deserialize the record. I really do not know how to do that;
>>> I'm pretty sure I never saw this anywhere, and I cannot find it anywhere.
>>> But in principle it must be possible, since the reader need not necessarily
>>> have any control over which schema the writer used.
>>>
>>> thanks a lot.
>>> M.
>>>
>>> On Tue, Jul 30, 2019 at 18:16, Martin Mucha wrote:
>>>
>>>> Thank you very much for the in-depth answer. I understand how it works
>>>> better now; I will test it shortly.
>>>> Thank you for your time.
>>>>
>>>> Martin.
>>>>
>>>> On Tue, Jul 30, 2019 at 17:09, Ryan Skraba wrote:
>>>>
>>>>> Hello!  It's the same issue in your example code as allegro, even with
>>>>> the SpecificDatumReader.
>>>>>
>>>>> This line: datumReader = new SpecificDatumReader<>(schema)
>>>>> should be: datumReader = new SpecificDatumReader<>(originalSchema,
>>>>> schema)
>>>>>
>>>>> In Avro, the original schema is commonly known as the writer schema
>>>>> (the instance that originally wrote the binary data).  Schema
>>>>> evolution applies when you are using the constructor of the
>>>>> 

Re: AVRO schema evolution: adding optional column with default fails deserialization

2019-08-01 Thread Svante Karlsson
In an Avro data file the schema is at the beginning, but if you mean
single-record serialization like Kafka, then you have to add something that
you can use to get hold of the schema. Confluent's Avro encoder for Kafka
uses Confluent's schema registry, which uses an int32 as schema id. This is
prepended (+ a magic byte) to the binary Avro. Thus, using the schema
registry again, you can get the writer schema.

/Svante

On Thu, Aug 1, 2019, 15:30 Martin Mucha  wrote:

> Hi,
>
> just one more question, not strictly related to the subject.
>
> Initially I thought I'd be OK with using some initial version of the schema
> in place of the writer schema. That works, but all columns from a schema
> older than this initial one would just be ignored. So I need to know EXACTLY
> the schema which the writer used. I know that Avro messages contain either
> the full schema or at least its ID. Can you point me to the documentation
> where this is discussed? In my deserializer I have a byte[] as input, from
> which I need to get the schema information first, in order to be able to
> deserialize the record. I really do not know how to do that; I'm pretty
> sure I never saw this anywhere, and I cannot find it anywhere. But in
> principle it must be possible, since the reader need not necessarily have
> any control over which schema the writer used.
>
> thanks a lot.
> M.
>
> On Tue, Jul 30, 2019 at 18:16, Martin Mucha wrote:
>
>> Thank you very much for the in-depth answer. I understand how it works
>> better now; I will test it shortly.
>> Thank you for your time.
>>
>> Martin.
>>
>> On Tue, Jul 30, 2019 at 17:09, Ryan Skraba wrote:
>>
>>> Hello!  It's the same issue in your example code as allegro, even with
>>> the SpecificDatumReader.
>>>
>>> This line: datumReader = new SpecificDatumReader<>(schema)
>>> should be: datumReader = new SpecificDatumReader<>(originalSchema,
>>> schema)
>>>
>>> In Avro, the original schema is commonly known as the writer schema
>>> (the instance that originally wrote the binary data).  Schema
>>> evolution applies when you are using the constructor of the
>>> SpecificDatumReader that takes *both* reader and writer schemas.
>>>
>>> As a concrete example, if your original schema was:
>>>
>>> {
>>>   "type": "record",
>>>   "name": "Simple",
>>>   "fields": [
>>> {"name": "id", "type": "int"},
>>> {"name": "name","type": "string"}
>>>   ]
>>> }
>>>
>>> And you added a field:
>>>
>>> {
>>>   "type": "record",
>>>   "name": "SimpleV2",
>>>   "fields": [
>>> {"name": "id", "type": "int"},
>>> {"name": "name", "type": "string"},
>>> {"name": "description","type": ["null", "string"]}
>>>   ]
>>> }
>>>
>>> You could do the following safely, assuming that Simple and SimpleV2
>>> classes are generated from the avro-maven-plugin:
>>>
>>> @Test
>>> public void testSerializeDeserializeEvolution() throws IOException {
>>>   // Write a Simple v1 to bytes using your exact method.
>>>   byte[] v1AsBytes = serialize(new Simple(1, "name1"), true, false);
>>>
>>>   // Read as Simple v2, same as your method but with the writer and
>>> reader schema.
>>>   DatumReader<SimpleV2> datumReader =
>>>   new SpecificDatumReader<>(Simple.getClassSchema(),
>>> SimpleV2.getClassSchema());
>>>   Decoder decoder = DecoderFactory.get().binaryDecoder(v1AsBytes, null);
>>>   SimpleV2 v2 = datumReader.read(null, decoder);
>>>
>>>   assertThat(v2.getId(), is(1));
>>>   assertThat(v2.getName(), is(new Utf8("name1")));
>>>   assertThat(v2.getDescription(), nullValue());
>>> }
>>>
>>> This demonstrates with two different schemas and SpecificRecords in
>>> the same test, but the same principle applies if it's the same record
>>> that has evolved -- you need to know the original schema that wrote
>>> the data in order to apply the schema that you're now using for
>>> reading.
>>>
>>> I hope this clarifies what you are looking for!
>>>
>>> All my best, Ryan
>>>
>>>
>>>
>>> On Tue, Jul 30, 2019 at 3:30 PM Martin Mucha  wrote:
>>> >
>>> > Thanks for answer.
>>> >
>>> > Actually I have exactly the same behavior with Avro 1.9.0 and the
>>> > following deserializer in our other app, which uses strictly the Avro
>>> > codebase, and fails with the same exceptions. So let's leave the "allegro"
>>> > library and lots of other tools out of our discussion.
>>> > I can use whichever approach. All I need is a single way in which I can
>>> > deserialize a byte[] into a class generated by avro-maven-plugin, and
>>> > which will respect the documentation regarding schema evolution. Currently
>>> > we're using the following deserializer and serializer, and these do not
>>> > work when it comes to schema evolution. What is the correct way to
>>> > serialize and deserialize Avro data?
>>> >
>>> > I probably don't understand your mention of GenericRecord or
>>> > GenericDatumReader. I tried to use GenericDatumReader in the deserializer
>>> > below, but then it seems I got back just a GenericData$Record instance,
>>> > which I can then use to access an array of instances, which is not what
>>> > I'm looking for (IIUC), since 

Re: Avro C++: How to serialize data as Data Object File to send in Kafka?

2019-07-12 Thread Svante Karlsson
This is maybe not the nicest implementation, since it feels way too
complicated, but it's the only one I found. Check out encode() starting at
line 95.
Note that the example encodes data using Confluent's schema registry format
(i.e. 5 extra bytes) and does a double copy - I have not found a way to get
rid of that.

https://github.com/bitbouncer/kspp/blob/master/include/kspp/avro/avro_serdes.h
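
If you really want the data file (object container) format rather than a
single-record encoding, recent Avro C++ versions have a DataFileWriter
constructor that takes a stream instead of a file name. A rough, untested
sketch (older releases used std::auto_ptr instead of std::unique_ptr, and
"MyRecord" stands in for whatever class avrogencpp generated from your schema):

#include <avro/DataFile.hh>
#include <avro/Stream.hh>
#include <avro/ValidSchema.hh>
#include <memory>
#include <vector>

std::vector<uint8_t> to_container_bytes(const MyRecord& rec,
                                        const avro::ValidSchema& schema) {
  std::unique_ptr<avro::OutputStream> out = avro::memoryOutputStream();
  avro::OutputStream* raw = out.get();   // non-owning handle; the writer takes ownership
  avro::DataFileWriter<MyRecord> dfw(std::move(out), schema);
  dfw.write(rec);
  dfw.flush();                           // pushes header + data block into the stream
  auto bytes = avro::snapshot(*raw);     // copies the stream contents, if your version has it
  return *bytes;                         // hand this to the kafka producer
}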

/svante


On Fri, Jul 12, 2019 at 14:42, steinio wrote:

> I am trying to serialize some data, created by a .json definition, and I
> would like to send the data that DataFileWriter writes.
> DataFileWriter takes a file name and writes it to a binary file.
> I can get around this by reading back the file to a string and sending the
> string over by stream (kafka.producer).
> This is not really a viable solution for a high speed producer application,
> and by looking at the DataFileWriter, it looks like it should also be able
> to take a std::unique_ptr<avro::OutputStream> instead of a file name, and write
> it to a stream.
> But this gives an error when trying to build the application.
>
> /error C2280:
>
> 'std::unique_ptr>::unique_ptr(const
> std::unique_ptr<_Ty,std::default_delete<_Ty>> &)': attempting to reference
> a
> deleted function
> with
> [
> _Ty=avro::OutputStream
> ]
> c:\program files (x86)\microsoft visual studio
> 14.0\vc\include\memory(1435):
> note: see declaration of
>
> 'std::unique_ptr>::unique_ptr'
> with
> [
> _Ty=avro::OutputStream
> ]/
>
> The error, I guess, is that I am trying to pass a std::unique_ptr as an
> argument, which is not possible since they cannot be copied; I should
> rather pass std::move(myUniquePtr) as the argument instead.
> But this gives me another error:
>
> /error C2664: 'avro::DataFileWriter::DataFileWriter(const
> avro::DataFileWriter &)': cannot convert argument 1 from
> 'std::shared_ptr' to 'const char *'/
>
> There are no examples on how to send data as an object data file that
> includes the header data, so I am just trying and failing here. Is there a
> "correct" way of doing this?
> I see that this is really easy to do in the Java library, it is just
>
> ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
> DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
> DataFileWriter<GenericRecord> dataFileWriter = new
> DataFileWriter<>(writer);
> dataFileWriter.create(schema, outputStream);
>
> dataFileWriter.append(data);
> dataFileWriter.close();
>
> What I have done so far is this:
>
> /std::ifstream ifs("data.json");
> avro::ValidSchema dataSchm;
> avro::compileJsonSchema(ifs, dataSchm);
> const char* file = "data.bin";
> std::shared_ptr out = avro::memoryOutputStream();
> avro::DataFileWriter dfw(file, dataSchm);
> dfw.write(data);
> dfw.close();
> std::ifstream ifs("processmsg.bin");
> std::string str((std::istreambuf_iterator(ifs)),
> builder.payload(str);
> producer.produce(builder);/
>
> How can I avoid having to write this to a file, and instead just write the
> binary data directly to an output stream that I can encode and send?
>
>
>
> --
> Sent from: http://apache-avro.679487.n3.nabble.com/Avro-Users-f679479.html
>


Re: C++ How to get the length of the encoded data

2018-10-27 Thread Svante Karlsson
You need to call flush(). See a common use case below:

auto bin_os = avro::memoryOutputStream();
avro::EncoderPtr bin_encoder = avro::binaryEncoder();
bin_encoder->init(*bin_os.get());
avro::encode(*bin_encoder, src);
bin_encoder->flush(); /* hands unused buffer space back to the output
stream; without this, byteCount() will report a multiple of 4096 */
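
After the flush, something like this (from memory - check the exact helper
names in your Avro C++ version) gets you the real length and the bytes to
hand to the Kafka producer:

  /* byteCount() now reflects the encoded size, not the 4096-byte chunk */
  size_t content_length = bin_os->byteCount();

  /* copy the bytes out of the memory stream; avro::snapshot() returns a
     shared_ptr<std::vector<uint8_t>>, or use avro::memoryInputStream(*bin_os)
     and walk the buffers yourself */
  auto bytes = avro::snapshot(*bin_os);
  /* bytes->data() / bytes->size() is what you pass to the producer */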


On Sat, Oct 27, 2018, 20:34 Olivier Delbeke wrote:

> Hi all,
>
> Just started with AVRO (in C++) and a bit stuck (so this will be an easy
> question).
> All examples show how to encode data and decode it right after you did
> that, so you can immediately connect the OutputStream to the InputStream,
> completely ignoring the size of the encoded data. However, what I'd like to
> do is to send the data to Kafka, so I need to know how many bytes I need to
> read from the stream. After encoding, the byteCount() of the OutputStream
> always returns 4096 which looks huge for 3 doubles and a short string. When
> I read back those 4096 bytes of data, I can indeed see that only the first
> 40 bytes are non-zeros. I cannot find any method in the Encoder or in the
> OutputStream that does return the length of the encoded data. Am I
> overlooking something ?
>
> Thanks,
>
> Olivier
>


Re: fromJson is failing with null as uniontype

2016-02-29 Thread Svante Karlsson
The problem is that Avro has its own JSON representation of union values:
a non-null union member is wrapped in an object keyed by its type. So your
dataset would have to look like

{"experience": {"string": "da"}, "age": 50}

for fromjson to accept it ("age" stays plain because it is not a union).

In a recent project we ended up writing a slightly modded JSON parser to be
able to use Avro schemas on existing JSON REST calls.
2016-02-29 9:20 GMT+01:00 Chris Miller :

> Did you ever figure this out? I was having the same problem.
>
>
> --
> Chris Miller
>
> On Fri, Feb 19, 2016 at 2:53 AM, Siva  wrote:
>
>> Can someone help with this? Has anyone faced a similar issue?
>>
>> Thanks,
>> Sivakumar Bhavanari.
>>
>> On Wed, Feb 17, 2016 at 4:21 PM, Siva  wrote:
>>
>>> Hi Everyone,
>>>
>>> I'm new to Avro and running into issues when a type is combined with "null",
>>> like ["null","int"] or ["null", "string"].
>>>
>>> I have a schema like below
>>>
>>> {
>>>   "type": "record", "namespace": "tutorialspoint",
>>>   "name": "empdetails",
>>>   "fields": [
>>>     { "name": "experience", "type": ["null", "string"], "default": null },
>>>     { "name": "age", "type": "int" }
>>>   ]
>>> }
>>>
>>> Below is the json dataset.
>>>
>>> {"experience" : "da",  "age": 50}
>>>
>>> java -jar avro-tools-1.7.7.jar fromjson --schema-file test.avsc
>>> test.json > test.avro
>>>
>>> If I have a "null" value in the "experience" column it goes through, but if
>>> it has some string it gives the error below. Similar error with int types as
>>> well (VALUE_NUMBER_INT).
>>>
>>> Exception in thread "main" org.apache.avro.AvroTypeException: Expected
>>> start-union. Got VALUE_STRING
>>> at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
>>> at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
>>> at
>>> org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
>>> at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>>>
>>> I have columns with strings or nulls in json, is there a work around to
>>> this error without changing the json data?
>>>
>>> Thanks in advance.
>>>
>>> Thanks,
>>> Sivakumar Bhavanari.
>>>
>>
>>
>


Re: Using Avro for encoding messages

2015-07-09 Thread Svante Karlsson
I had the same problem a while ago, and for the same reasons as you mention
we decided to use fingerprints (an MD5 hash of the schema); however, there
are some catches here.

First, I believe that the normalisation of the schema is incomplete, so you
might end up with different hashes for the same schema.

Second, a 128-bit integer prepended to both keys and values takes more
space than a 32-bit one. Not a big issue for values, but for keys this
doubles our size.

Third, we have already started to use Confluent's registry as well, because
of the existing integration with other pieces of infrastructure (camus,
bottledwater, etc.).

What would be useful, given this perspective, is a byte or two prepended to
the schema id, defining the registry namespace.
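
Concretely, I am thinking of something like this (a sketch only - the
fingerprint bytes must of course be laid down in one agreed byte order):

// Sketch of the framing discussed above: one registry/namespace byte, then the
// 16-byte MD5 fingerprint of the writer schema, then the Avro binary payload.
#include <array>
#include <cstdint>
#include <vector>

std::vector<uint8_t> frame_message(uint8_t registry_ns,
                                   const std::array<uint8_t, 16>& schema_md5,
                                   const std::vector<uint8_t>& avro_payload) {
  std::vector<uint8_t> out;
  out.reserve(1 + 16 + avro_payload.size());
  out.push_back(registry_ns);                                   // which registry/namespace the fingerprint belongs to
  out.insert(out.end(), schema_md5.begin(), schema_md5.end());  // fingerprint, fixed byte order
  out.insert(out.end(), avro_payload.begin(), avro_payload.end());
  return out;
}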

I've added the fingerprint schema registry as an example in the C++ Kafka
library at
https://github.com/bitbouncer/csi-kafka/tree/master/examples/schema-registry


We run a couple of those in a Mesos cluster and use HAProxy to find them.


/svante


2015-07-09 10:36 GMT+02:00 Daniel Schierbeck daniel.schierb...@gmail.com:

 I'm working on a system that will store Avro-encoded messages in Kafka.
 The system will have both producers and consumers in different languages,
 including Ruby (not JRuby) and Java.

 At the moment I'm encoding each message as a data file, which means that
 the full schema is included in each encoded message. This is obviously
 suboptimal, but it doesn't seem like there's a standardized format for
 single-message Avro encodings.

 I've reviewed Confluent's schema-registry offering, but that seems to be
 overkill for my needs, and would require me to run and maintain yet another
 piece of infrastructure. Ideally, I wouldn't have to use anything besides
 Kafka.

 Is this something that other people have experience with?

 I've come up with a scheme that would seem to work well independently of
 what kind of infrastructure you're using: whenever a writer process is
 asked to encode a message m with schema s for the first time, it broadcasts
 (s', s) to a schema registry, where s' is the fingerprint of s. The schema
 registry in this case can be pluggable, and can be any mechanism that
 allows different processes to access the schemas. The writer then encodes
 the message as (s', m), i.e. only includes the schema fingerprint. A
 reader, when first encountering a message with a schema fingerprint s',
 looks up s from the schema registry and uses s to decode the message.

 Here, the concept of a schema registry has been abstracted away and is not
 tied to the concept of schema ids and versions. Furthermore, there are
 some desirable traits:

 1. Schemas are identified by their fingerprints, so there's no need for an
 external system to issue schema ids.
 2. Writing (s', s) pairs is idempotent, so there's no need to coordinate
 that task. If you've got a system with many writers, you can let all of
 them broadcast their schemas when they boot or when they need to encode
 data using the schemas.
 3. It would work using a range of different backends for the schema
 registry. Simple key-value stores would obviously work, but for my case I'd
 probably want to use Kafka itself. If the schemas are written to a topic
 with key-based compaction, where s' is the message key and s is the message
 value, then Kafka would automatically clean up duplicates over time. This
 would save me from having to add more pieces to my infrastructure.

 Has this problem been solved already? If not, would it make sense to
 define a common message format that defined the structure of (s', m)
 pairs?

 Cheers,
 Daniel Schierbeck



Re: Using Avro for encoding messages

2015-07-09 Thread Svante Karlsson
> What causes the schema normalization to be incomplete?

Bad implementation: I use the C++ Avro library and it's not complete and not
very actively developed.

> And is that a problem? As long as the reader can get the schema, it
> shouldn't matter that there are duplicates – as long as the differences
> between the duplicates do not affect decoding.

Not really a problem; we tend to use machine-generated schemas and they are
always identical.

I think there are holes in the simplification of types, if I remember
correctly. Namespaces should be collapsed,
{"type": "string"} -> "string", etc.

The current implementation can't reliably decide whether two types are
identical. If you correct the problem later, then a registered schema would
actually change its hash, since it can now be simplified. Whether this is a
problem depends on your application.

We currently encode this as you suggest: schema_type (byte), schema_id
(32/128 bits), avro (binary).
The binary fields should probably have a defined endianness also.

I agree that a de facto way of encoding this would be nice. Currently I
would say that the Confluent / LinkedIn way is the norm.


Re: Schema default values in C++ implementation

2015-04-16 Thread Svante Karlsson
I think you are hit by https://issues.apache.org/jira/browse/AVRO-1335

I recently extended the avrogencpp tool so it also generates the
following members for your class:

...
   static inline const boost::uuids::uuid schema_hash() { static const
boost::uuids::uuid
_hash(boost::uuids::string_generator()("eea03bf9-7719-1af0-1dfb-e8049f677f7d"));
return _hash; }
   static inline const char* schema_as_string() { return
"{\"type\":\"record\",\"name\":\"cpx\",\"fields\":[{\"name\":\"numbername\",\"type\":\"string\"},{\"name\":\"re\",\"type\":\"double\"},{\"name\":\"im\",\"type\":\"double\"}]}";
}
   static const avro::ValidSchema valid_schema() { static const
avro::ValidSchema
_validSchema(avro::compileJsonSchemaFromString(schema_as_string())); return
_validSchema; }
...

As you can see, the existing C++ code seems to lose the default values
somewhere, and this of course also makes the schema hash unusable (in my
use case).

/svante


Re: Deserialize Avro Object Without Schema

2015-03-25 Thread svante karlsson
The schema is written inside an Avro data file; that's why you don't need to
provide it. You really do need the schema to decode Avro data - either by
providing a schema from somewhere and using a generic datum reader, or by
generating a hardcoded decoder that knows the schema at compile time.

regards
svante

2015-03-25 11:10 GMT+01:00 Alexander Zenger a.zen...@cetec.cc:

 Hi,

  -----Original Message-----
  From: Rendy Bambang Junior [mailto:rendy.b.jun...@gmail.com]
  Sent: Wednesday, March 25, 2015 10:08
  To: user@avro.apache.org
  Subject: Deserialize Avro Object Without Schema
 
  It should be possible right? Since the schema itself is embedded in the
 data.
 
 Yes, and it is working for me. Although I'm reading data from a file, and I
 create a DataFileReader from the GenericDatumReader, which then reads the
 deserialized data.
 On a quick look, I didn't find a FileReader for streams.
 Here is my example:

 DatumReader<GenericRecord> datumReader = new
 GenericDatumReader<GenericRecord>();
 DataFileReader<GenericRecord> dataFileReader = null;
 try {
   dataFileReader = new DataFileReader<>(DATA_FILE, datumReader);
 } catch (IOException exp) {
   System.out.println("Could not read file " + DATA_FILE.getName());
   System.exit(1);
 }

 GenericRecord person = null;
 try {
   person = dataFileReader.next(person);
 } catch (IOException exp) {
   System.out.println("Could not read user from file " +
 DATA_FILE.getName());
   System.exit(1);
 }

 System.out.println("Id:     " + person.get("id"));
 System.out.println("Name:   " + person.get("name"));
 System.out.println("Email:  " + person.get("email"));

 --
 Regards
 Alexander Zenger





Re: Error building/running tests on avro c++

2014-08-01 Thread svante karlsson
I had some issues with the cmake file when I built Avro C++ for Windows a
month or two ago. If I remember correctly it did not find, or could not
figure out, the configuration of Boost. I ended up doing some small hacks in
the CMakeLists.txt file to get it to compile. This was on Windows, so the
changes are not relevant to you, but after that the tests compile fine.

I think the changes are as follows

...
#windows fix
SET(EXECUTABLE_OUTPUT_PATH  ${CMAKE_SOURCE_DIR}/bin/$(Platform))
SET(LIBRARY_OUTPUT_PATH ${CMAKE_SOURCE_DIR}/lib/$(Platform))

set(Boost_INCLUDE_DIRS ${CMAKE_SOURCE_DIR}/../boost_1_55_0)

#set(Boost_USE_STATIC_LIBS   ON)
#set(Boost_USE_MULTITHREADED ON)
#set(Boost_LIBRARIES boost_filesystem-vc120-mt-1_55
boost_system-vc120-mt-1_55 boost_program_options-vc120-mt-1_55
boost_iostreams-vc120-mt-1_55)
#boost_filesystem-vc120-mt-1_55.lib
# boost_filesystem-vc120-mt-1_55

set(BOOST_LIBRARYDIR ${Boost_INCLUDE_DIRS}/lib/$(Platform)/lib)

#add_definitions (-DHAVE_BOOST_ASIO)
link_directories(${BOOST_LIBRARYDIR})

#find_package (Boost 1.55 REQUIRED
#COMPONENTS filesystem system program_options iostreams)


add_definitions (${Boost_LIB_DIAGNOSTIC_DEFINITIONS})

include_directories (api ${CMAKE_CURRENT_BINARY_DIR} ${Boost_INCLUDE_DIRS})



As you can see, I removed the find_package and basically pointed it out
myself.

Sooner or later I'll have to fix this on Linux as well, as that is my final
target, but I prefer the Visual Studio development environment for debugging
purposes. If you can't figure this out I might give you a hand.

If it is of any help, there is a GitHub repo with cross-compilation
directives for, among others, Avro:

https://github.com/bitbouncer/csi-build-scripts/blob/master/raspberry_rebuild_ia32.sh

the relevant portion is
...
export BOOST_VERSION=1_55_0
export AVRO_VERSION=1.7.6

cd avro-cpp-$AVRO_VERSION
export BOOST_ROOT=$PWD/../boost_$BOOST_VERSION
export Boost_INCLUDE_DIR=$PWD/../boost_$BOOST_VERSION/boost
export PI_TOOLS_HOME=~/xtools/tools
rm -rf avro
rm -rf build
mkdir build
cd build
cmake 
-DCMAKE_TOOLCHAIN_FILE=../csi-build-scripts/toolchains/raspberry.ia32.cmake
..
make -j4
cd ..
mkdir avro
cp -r api/*.* avro
cd ..

skip the pi tools part and give it a try.


There are a lot of other missing features in Avro C++ that are on my
todo list.

/svante



2014-07-31 22:40 GMT+02:00 jeff saremi jeffsar...@hotmail.com:

 Does anyone know what the problem might be? appreciated it:


 [ 97%] Building CXX object CMakeFiles/buffertest.dir/test/buffertest.cc.o
 In file included from /temp/boost/boost/thread/detail/platform.hpp:17,
  from /temp/boost/boost/thread/thread_only.hpp:12,
  from /temp/boost/boost/thread/thread.hpp:12,
  from /temp/boost/boost/thread.hpp:13,
  from /temp/avro/test/buffertest.cc:21:
 /temp/boost/boost/config/requires_threads.hpp:47:5: error: #error
 Compiler threading support is not turned on. Please set the correct
 command line options for threading: -pthread (Linux), -pthreads (Solaris)
 or -mthreads (Mingw32)

 and 100's of similar messages follow.
 or error like:

 /temp/boost/boost/thread/detail/thread.hpp:93: error: expected class-name
 before '{' token
 /temp/boost/boost/thread/detail/thread.hpp:127: error: expected class-name
 before '{' token
 /temp/boost/boost/thread/detail/thread.hpp:144: error: expected class-name
 before '{' token
 /temp/boost/boost/thread/detail/thread.hpp:163: error: 'thread_attributes'
 does not name a type
 /temp/boost/boost/thread/detail/thread.hpp:172: error: 'thread_data_ptr'
 in namespace 'boost::detail' does not name a type
 /temp/boost/boost/thread/detail/thread.hpp:176: error: expected ',' or
 '...' before '' token
 /temp/boost/boost/thread/detail/thread.hpp:176: error: ISO C++ forbids
 declaration of 'attributes' with no type
 /temp/boost/boost/thread/detail/thread.hpp:185: error: expected ',' or
 '...' before '' token
 /temp/boost/boost/thread/detail/thread.hpp:185: error: ISO C++ forbids
 declaration of 'attributes' with no



Re: Error building/running tests on avro c++

2014-08-01 Thread svante karlsson
Since you're using Solaris, check the ticket below (it speaks about a bug in
Boost 1.53 that has been fixed in 1.54):

https://svn.boost.org/trac/boost/ticket/8212

/svante


2014-08-01 15:28 GMT+02:00 jeff saremi jeffsar...@hotmail.com:

 Svante, thanks very much for the info.
 I looked at the shell file as well. I'm not doing much different than that.
 So I believe this has to do with the Boost compilation on my platform:
 Solaris.
 The shell file in the link did the default invocation of the Boost build,
 which is what I did.
 But I think some flags are needed wrt the multi-threading build.
 If I figure it out I'll share that with everyone.

 --
 Date: Fri, 1 Aug 2014 14:37:36 +0200
 Subject: Re: Error building/running tests on avro c++
 From: s...@csi.se
 To: user@avro.apache.org


 I had some issues with the cmake file when I built Avro C++ for Windows a
 month or two ago. If I remember correctly it did not find, or could not
 figure out, the configuration of Boost. I ended up doing some small hacks in
 the CMakeLists.txt file to get it to compile. This was on Windows, so the
 changes are not relevant to you, but after that the tests compile fine.

 I think the changes are as follows

 ...
 #windows fix
 SET(EXECUTABLE_OUTPUT_PATH  ${CMAKE_SOURCE_DIR}/bin/$(Platform))
 SET(LIBRARY_OUTPUT_PATH ${CMAKE_SOURCE_DIR}/lib/$(Platform))

 set(Boost_INCLUDE_DIRS ${CMAKE_SOURCE_DIR}/../boost_1_55_0)

 #set(Boost_USE_STATIC_LIBS   ON)
 #set(Boost_USE_MULTITHREADED ON)
 #set(Boost_LIBRARIES boost_filesystem-vc120-mt-1_55
 boost_system-vc120-mt-1_55 boost_program_options-vc120-mt-1_55
 boost_iostreams-vc120-mt-1_55)
 #boost_filesystem-vc120-mt-1_55.lib
 # boost_filesystem-vc120-mt-1_55

 set(BOOST_LIBRARYDIR ${Boost_INCLUDE_DIRS}/lib/$(Platform)/lib)

 #add_definitions (-DHAVE_BOOST_ASIO)
 link_directories(${BOOST_LIBRARYDIR})

 #find_package (Boost 1.55 REQUIRED
 #COMPONENTS filesystem system program_options iostreams)


 add_definitions (${Boost_LIB_DIAGNOSTIC_DEFINITIONS})

 include_directories (api ${CMAKE_CURRENT_BINARY_DIR} ${Boost_INCLUDE_DIRS})
 


 As you can see, I removed the find_package and basically pointed it out
 myself.

 Sooner or later I'll have to fix this on Linux as well, as that is my final
 target, but I prefer the Visual Studio development environment for debugging
 purposes. If you can't figure this out I might give you a hand.

 If it is of any help, there is a GitHub repo with cross-compilation
 directives for, among others, Avro:


 https://github.com/bitbouncer/csi-build-scripts/blob/master/raspberry_rebuild_ia32.sh

 the relevant portion is
 ...
 export BOOST_VERSION=1_55_0
 export AVRO_VERSION=1.7.6

 cd avro-cpp-$AVRO_VERSION
 export BOOST_ROOT=$PWD/../boost_$BOOST_VERSION
 export Boost_INCLUDE_DIR=$PWD/../boost_$BOOST_VERSION/boost
 export PI_TOOLS_HOME=~/xtools/tools
 rm -rf avro
 rm -rf build

 mkdir build
 cd build
 cmake 
 -DCMAKE_TOOLCHAIN_FILE=../csi-build-scripts/toolchains/raspberry.ia32.cmake ..
 make -j4
 cd ..

 mkdir avro
 cp -r api/*.* avro
 cd ..

 skip the pi tools part and give it a try.


 There are a lot of other missing features in Avro C++ that are on my
 todo list.

 /svante



 2014-07-31 22:40 GMT+02:00 jeff saremi jeffsar...@hotmail.com:

 Does anyone know what the problem might be? appreciated it:


 [ 97%] Building CXX object CMakeFiles/buffertest.dir/test/buffertest.cc.o
 In file included from /temp/boost/boost/thread/detail/platform.hpp:17,
  from /temp/boost/boost/thread/thread_only.hpp:12,
  from /temp/boost/boost/thread/thread.hpp:12,
  from /temp/boost/boost/thread.hpp:13,
  from /temp/avro/test/buffertest.cc:21:
 /temp/boost/boost/config/requires_threads.hpp:47:5: error: #error
 Compiler threading support is not turned on. Please set the correct
 command line options for threading: -pthread (Linux), -pthreads (Solaris)
 or -mthreads (Mingw32)

 and 100's of similar messages follow.
 or error like:

 /temp/boost/boost/thread/detail/thread.hpp:93: error: expected class-name
 before '{' token
 /temp/boost/boost/thread/detail/thread.hpp:127: error: expected class-name
 before '{' token
 /temp/boost/boost/thread/detail/thread.hpp:144: error: expected class-name
 before '{' token
 /temp/boost/boost/thread/detail/thread.hpp:163: error: 'thread_attributes'
 does not name a type
 /temp/boost/boost/thread/detail/thread.hpp:172: error: 'thread_data_ptr'
 in namespace 'boost::detail' does not name a type
 /temp/boost/boost/thread/detail/thread.hpp:176: error: expected ',' or
 '...' before '' token
 /temp/boost/boost/thread/detail/thread.hpp:176: error: ISO C++ forbids
 declaration of 'attributes' with no type
 /temp/boost/boost/thread/detail/thread.hpp:185: error: expected ',' or
 '...' before '' token
 /temp/boost/boost/thread/detail/thread.hpp:185: error: ISO C++ forbids
 declaration of 'attributes' with no





128 bit integers

2014-05-28 Thread svante karlsson
I'm having issues with endian conversion of 128-bit integers (UUIDs in my
case), but the problem is generic.

I currently encode them as fixed, but that leaves the swapping of bytes
(for endianness) up to the user. I had not given the matter any thought
until we stretched some existing 64-bit ids in an existing database to 128
bits by simply adding 0 in the upper 64 bits. It turns out that the C/C++
and Java versions are not (of course) compatible.

I think we have the same issue in the spec.

The spec exemplifies serverHash in Avro RPC as an MD5:

 {"type": "fixed", "name": "MD5", "size": 16}

 Java to Java this works fine...

What's the best way to tackle this?
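
One way I can think of to sidestep it (a sketch, not necessarily what the
spec intends) is to declare that the fixed(16) always holds the value in
network byte order and convert explicitly on both sides:

// Sketch: write a 128-bit id into an Avro fixed(16) in a defined (big-endian)
// byte order, regardless of host endianness, so C/C++ and Java agree.
#include <array>
#include <cstdint>

std::array<uint8_t, 16> to_fixed16(uint64_t hi, uint64_t lo) {
  std::array<uint8_t, 16> out{};
  for (int i = 0; i < 8; ++i) {
    out[i]     = uint8_t(hi >> (56 - 8 * i));  // bytes 0..7: high 64 bits
    out[8 + i] = uint8_t(lo >> (56 - 8 * i));  // bytes 8..15: low 64 bits
  }
  return out;
}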

/svante


c++11 http client and server library

2014-04-26 Thread svante karlsson
I've started to work on a C++ library that I intend to use for performing
Avro-encoded REST calls. If/when I understand how to implement Avro RPC, it
should be simple enough to extend the existing code base to that as well.

The client is implemented using libcurl and boost asio.

The server is based on some of the boost asio samples, but using
Joyent's/NGINX's HTTP parser.

The client has both async and sync methods, and HTTP 1.1 is (partially?)
supported; Connection: keep-alive is implemented.

The code that I based this on supported OpenSSL as well, but I have not yet
completed that part. Fragments are there.

I've noticed that the existing avrogencpp can't be used for avro-rpc specs -
can anyone shed some light on how it's supposed to be implemented?

It should be portable; it currently runs on Ubuntu 13.10 and Windows
(currently adding Raspberry Pi support).

All included code should be in various states of open source, and my own
contributions are distributed under the Boost license.

The documentation is sparse, but there are some examples that are rather
simple.

dependencies
boost
avrocpp
libcurl
cmake
C++11 (tested with gcc and Visual Studio 2013)

The code can be found here: https://github.com/bitbouncer/csi-http


Comments most welcome.

/svante