Re: Avro schema evolution support in AVRO CPP

2024-01-12 Thread John McClean
fwiw, I'm using it and it works fine, at least for my use cases.
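The basic wiring, for reference, looks like this — a minimal sketch along the lines of the docs example linked below; the schema and file names are illustrative:

#include <avro/Compiler.hh>
#include <avro/Decoder.hh>
#include <avro/Generic.hh>
#include <avro/GenericDatum.hh>
#include <avro/Stream.hh>
#include <avro/ValidSchema.hh>
#include <fstream>
#include <memory>

int main() {
    // Writer's schema: what the producer serialized with.
    // Reader's schema: what this application wants to read with.
    std::ifstream wifs("writer.avsc");
    std::ifstream rifs("reader.avsc");
    avro::ValidSchema writerSchema, readerSchema;
    avro::compileJsonSchema(wifs, writerSchema);
    avro::compileJsonSchema(rifs, readerSchema);

    // Binary data previously written with the writer's schema.
    std::unique_ptr<avro::InputStream> in = avro::fileInputStream("data.bin");

    // The resolving decoder applies the spec's schema-resolution rules
    // (field reordering, defaults for added fields, type promotions).
    avro::DecoderPtr d = avro::resolvingDecoder(writerSchema, readerSchema,
                                                avro::binaryDecoder());
    d->init(*in);

    avro::GenericDatum datum(readerSchema);
    avro::decode(*d, datum); // datum now has the reader-schema shape
    return 0;
}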

J

On Fri, Jan 12, 2024 at 1:55 AM Martin Grigorov wrote:

> Hi Vivek,
>
> I am not sure there is anyone who can give you an exact answer. The C++ SDK
> has not been actively developed in the last few years.
> The best approach is to try it for your use cases and see whether it works. The
> next step is to contribute pull requests for the missing functionality!
>
> Martin
>
> On Thu, Jan 11, 2024 at 8:59 AM Vivek Kumar <
> vivek.ku...@eclipsetrading.com.invalid> wrote:
>
>> +dev
>>
>>
>> Regards,
>> Vivek Kumar
>>
>> 
>> From: Vivek Kumar 
>> Sent: Thursday, January 11, 2024 11:07 AM
>> To: user@avro.apache.org 
>> Subject: Avro schema evolution support in AVRO CPP
>>
>> Hi Avro team,
>>
>> I am writing to ask about support for Avro schema evolution in the C++
>> bindings - i.e. providing both the producer's and the consumer's schema when
>> decoding data.
>>
>> I can see that there's a resolvingDecoder function in AVRO CPP that takes
>> two schemas. See
>>
>> https://avro.apache.org/docs/1.10.2/api/cpp/html/index.html#ReadingDifferentSchema
>>
>> But there's a FIXME comment in this function. See
>> https://issues.apache.org/jira/browse/AVRO-3720 and
>> https://github.com/apache/avro/blob/main/lang/c%2B%2B/api/Decoder.hh#L218.
>> Does this mean resolvingDecoder does not work properly? Could you please
>> explain which scenarios are not covered by resolvingDecoder, and how we can
>> use it to support "Avro Schema Evolution" in C++?
>>
>> Thanks
>>
>>
>> Regards,
>> Vivek Kumar
>>
>>
>


Re: Serialization with optional fields using C++ library

2021-12-22 Thread John McClean
I only skimmed this, but the schema should read "default", not "defaults".
I've no idea if that's the only issue.
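Note also that, per the spec's JSON encoding, a non-null union value has to
be wrapped in a single-key object naming the branch, so once the "default"
typo is fixed your failing record would presumably need to be:

{"username":"miguno","data":{"string":"test"},"timestamp":1366150681}

rather than carrying a bare "test".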

J

On Wed, Dec 22, 2021 at 8:40 AM Anton  wrote:

> Hi Martin,
>
>
>
> SchemaTests.cc only checks that schemas compile, and my schema compiles fine
> in my code.
>
> Is there any test that checks the full serialization/deserialization cycle
> when a JSON datum and a schema are provided?
>
>
>
>
>
> From: Martin Grigorov [mailto:mgrigo...@apache.org]
> Sent: Wednesday, December 22, 2021 3:24 PM
> To: user@avro.apache.org
> Subject: Re: Serialization with optional fields using C++ library
>
>
>
> Hi Anton,
>
>
>
> I don't see any unit tests / examples for this under the test/ or examples/
> directories at https://github.com/apache/avro/tree/master/lang/c%2B%2B/,
> so I guess it is not implemented yet.
>
> You could add an entry for basicSchemas (
> https://github.com/apache/avro/blob/1aa963c44d1b9da3dfcf74acb3eeed56439332a0/lang/c%2B%2B/test/SchemaTests.cc#L30)
> and if it fails then create an issue / PR.
>
>
>
> On Wed, Dec 22, 2021 at 10:13 AM Anton  wrote:
>
> Hello, I have a problem with serialization of data that has optional fields.
> When I pass null in the corresponding field it works, but when it is non-null,
> serialization fails.
>
> Schema:
>
> {
>   "type": "record",
>   "name": "schema",
>   "namespace": "space",
>   "fields": [
>     { "name": "username", "type": "string" },
>     { "name": "data", "type": [ "null", "string" ], "defaults": null },
>     { "name": "timestamp", "type": "long" }
>   ]
> }
>
>
>
> Data that works:
>
> {"username":"miguno","data":null,"timestamp": 1366150681 }
>
>
>
> Data that fails:
>
> {"username":"miguno","data":"test","timestamp": 1366150681 }
>
>
>
> Should this work, or do I have an error in my schema? I didn't find any open
> issues in JIRA, so I guess the concept of optional fields should work just
> fine, also in C++.
>
>
>
> The code is:
>
>
>
> std::unique_ptr<avro::InputStream> in = avro::memoryInputStream(
>     (const uint8_t*)&json[0], json.size()); // json is incoming data
> avro::ValidSchema validSchema;
> std::istringstream ins(schema); // schema is avro-schema
> try {
>     avro::compileJsonSchema(ins, validSchema);
> }
> catch (const std::exception& e1) {
>     std::string errstr = e1.what();
> }
> avro::DecoderPtr jd = avro::jsonDecoder(validSchema);
> avro::GenericDatum datum(validSchema);
> jd->init(*in);
> avro::decode(*jd, datum); // serialization with non-null data fails somewhere inside this step
>
>


Re: (Nothing -> C++ Union Decoding)

2021-05-21 Thread John McClean
From the spec:

> A union is encoded by first writing an int value indicating the
zero-based position within the union of the schema of its value. The value
is then encoded per the indicated schema within the union.

In other words, your 'value' JSON doesn't give the decoder a way to figure out
which type in the union it is looking at.

(Where are you getting this example from?)

J

On Fri, May 21, 2021 at 12:36 AM svend frolund  wrote:

> Hello,
>
> I cannot seem to get avro union types to work properly in the c++ codebase
> that I pulled from your github repo a couple of weeks ago. I want to
> specify that an object attribute can be either null or a string in order to
> capture some notion of optional attributes in my json data. However, when
> decoding data that actually has a string value for the "optional" attribute
> in question, I get the following exception: "Incorrect token in the stream.
> Expected: Object start, found String". Here is a small program that
> replicates the issue:
>
>   std::string schema;
>   schema += "{";
>   schema += "   \"name\" : \"simple\", ";
>   schema += "   \"type\" : \"record\", ";
>   schema += "   \"fields\" : [ { \"name\" : \"last\", \"type\" : [
> \"null\", \"string\"] } ] ";
>   schema += "}";
>
>   std::string value;
>   value += "{";
>   value += "   \"last\" : \"dog\" ";
>   value += "}";
>
>   std::istringstream schemass(schema);
>   std::istringstream valuess(value);
>
>   avro::ValidSchema cpxSchema;
>   avro::compileJsonSchema(schemass, cpxSchema);
>
>   std::unique_ptr<avro::InputStream> json_is =
> avro::istreamInputStream(valuess);
>
>   /* JSON decoder */
>   avro::DecoderPtr json_decoder = avro::jsonDecoder(cpxSchema);
>   avro::GenericDatum *datum = new avro::GenericDatum(cpxSchema);
>
>   try
>   {
>  /* Decode JSON to Avro datum */
>  json_decoder->init(*json_is);
>  avro::decode(*json_decoder, *datum);
>   }
>   catch(const avro::Exception &_e)
>   {
>   // throws Incorrect token in the stream. Expected: Object start,
> found String
>   }
>
> Do I need to configure the system in a particular way for this to work, or
> does the current implementation simply not support these types of unions?
>
> I sincerely hope someone can help!
>
> All the best,
>
>Svend
>


Re: Tool for creating uml diagrams

2018-04-09 Thread John McClean
I'd consider generating Java from the Avro schemas and then using one of the
many tools that will generate UML from Java.
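
For the first step, avro-tools can generate the Java classes; something
along these lines (jar name and version illustrative):

  java -jar avro-tools-1.8.2.jar compile schema your_schema.avsc generated-src/

and then point the UML tool at the generated sources.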

On Mon, Apr 9, 2018 at 4:06 PM, Rob Torop  wrote:

> You could try using plantuml.  You'd have to write some code to generate
> the files it takes (it's like graphviz - a simple text format) and then run
> plantuml to generate image files.
>
>
> On Mon, Apr 9, 2018 at 1:16 PM David Espinosa  wrote:
>
>> Hi all,
>> I was wondering if somebody knows about a tool to create UML diagrams
>> from Avro files.
>>
>> Thanks in advance!
>> David
>>
>


Re: Customizing JSON encoding for Avro C++ bindings

2017-03-07 Thread John McClean
If you want a string representation, can you use 'string' as the type
rather than 'fixed'? Or is the issue that you have data conversion /
backwards compatibility constraints?
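
For instance (an illustrative field, not your actual schema):

  { "name": "id", "type": "string", "doc": "UUID in RFC 4122 text form" }

which would make the JSON output readable for free, at the cost of roughly
36 bytes instead of 16 per UUID in the binary encoding.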

On Tue, Mar 7, 2017 at 6:51 AM, Tim Upthegrove wrote:

> Hi folks,
>
> I am trying to add a convenience method for my team that can serialize an
> Avro record in the C++ bindings to JSON with a few minor customizations.
> For example, our schema encodes UUIDs as Fixed types, and they are the only
> fields which are a Fixed type with the particular length of a UUID.  We'd
> like to encode those UUIDs in the JSON output according to the RFC 4122
> specified UUID format instead of a unicode escape sequence for ease of
> debugging.
>
> Since we only have a few minor changes, I initially thought it would make
> sense to use inheritance for this, but looking at the JsonGenerator and
> JsonEncoder classes that currently exist, this seems like it could be
> challenging.  Continuing with the UUID example, the specific method I'd
> want to override (escapeCtl at
> https://github.com/apache/avro/blob/master/lang/c++/impl/json/JsonIO.hh#L214)
> and the StreamWriter for building the JSON string are both private to the
> JsonGenerator class.
>
> Is there an easy way to achieve what I describe in the example above in
> the C++ bindings that I am missing or overlooking?  If not, is there any
> reason why those methods should not be changed from private visibility to
> protected, so I could accomplish this through inheritance?
>
> Thanks,
> --
>
> Tim Upthegrove
>


Re: Avro -> JDBC ?

2016-12-19 Thread John McClean
I've never done this, but if I were to try, I'd start by looking at
Confluent's JDBC sink connector.

http://docs.confluent.io/3.1.1/connect/connect-jdbc/docs/index.html
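
If it fits, the sink is driven by a small properties file, roughly like
this (property names along the lines of the linked docs; the topic and
connection values here are illustrative):

  name=jdbc-sink
  connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
  topics=monitoring-events
  connection.url=jdbc:derby://localhost:1527/monitoring
  auto.create=true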

It's going to have all the moving parts you need. The question will be
whether it's too heavy-weight and requires too much setup. Not having used
it, I don't know.

J

On Mon, Dec 19, 2016 at 10:49 AM, tl  wrote:

> Hi,
>
> I wrote a small tool[0] that converts network monitoring data from a
> homegrown format to Avro, Parquet and JSON, using Avro 1.8.1 with the
> 'specific' API (with code generation).
>
> Now I need to import the same data into an RDBMS (a Derby derivative)
> through JDBC. The database schema closely mimics the Avro schemata.
>
> I wonder if there’s an easy way to hook into the machinery that I already
> built and make it talk to an RDBMS too. Does this sound reasonable? Has it
> been done before? Any hints on how to proceed or where to look?
>
> Thanks,
> Thomas
>
>
>
> .
> [0] https://github.com/tomlurge/converTor


Re: How to extract String from GenericRecord as null instead of null string?

2016-12-16 Thread John McClean
This isn't an Avro issue; it's to do with your use of 'valueOf'. You need
to check for null before you call it. More info here:

http://stackoverflow.com/questions/13577892/what-is-difference-between-null-and-null-of-string-valueofstring-object
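
In short: payload.get("userId") returns a real Java null for the null
branch, and String.valueOf(Object) turns that null into the four-character
string "null". A minimal fix (an untested sketch):

  Object raw = payload.get("userId");
  String userid = (raw == null) ? null : raw.toString();

The toString() also covers Avro handing back org.apache.avro.util.Utf8
rather than String for string fields.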

J

On Fri, Dec 16, 2016 at 5:54 PM, Check Peck  wrote:

> I am working with Avro and in my avro schema I have one field like this:
>
>  {
> "name" : "userId",
> "type" : [ "null", "string" ],
> "doc" : "some doc"
>   },
>
> This is how I am extracting userId field from GenericRecord:
>
> GenericRecord payload = decoder.decode(record.value());
> String userid = String.valueOf(payload.get("userId"));
> // sometimes userid comes back as the literal string "null"
> System.out.println(Strings.isNullOrEmpty(userid));
>
> And because of that "null" string, my sysout prints false.  Is
> there any way to extract userid as null instead of a "null" String?
>
> Because when I check for a null string it fails, and to work around it I'd
> have to add an extra check with ".equals", which I want to avoid if
> possible. Is there any way?
>
>


Re: Alternative to Avro container files for long-term Avro storage

2016-11-15 Thread John McClean
One approach is to have a separate Kafka topic per schema, with evolution
managed by a schema registry: https://github.com/confluentinc/schema-registry.
You'd write to the topic with the schema id carried in each message. You'd
then write normal Avro container files, knowing when to split them based on
the changing schema id in the Kafka messages.
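
For reference, the registry's serializers frame each Kafka message body
roughly as

  [magic byte 0x00][4-byte schema id, big-endian][Avro binary payload]

so the S3 writer only has to peek at the first five bytes of each message
to notice a schema change and start a new file.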

On Tue, Nov 15, 2016 at 2:24 AM, Josh  wrote:

> Hi all,
>
> I am using a typical Avro->Kafka solution where data is serialized to Avro
> before it gets written to Kafka and each message is prepended with a schema
> ID which can be looked up in my schema repository.
>
> Now, I want to store the data in long-term storage by writing data from
> Kafka->S3.
>
> I know that the usual way to store Avro in storage is using Avro container
> files, however a container file can only contain messages encoded with a
>> single Avro schema. In my case, the messages may be encoded with different
> schemas, and I need to retain the order of the messages (so that they can
> be replayed into Kafka, in order). Therefore, a single file in S3 needs to
> contain messages encoded with different schemas and so I can't use Avro
> container files.
>
> I was wondering what would be a good solution to this? What format could I
> use to store my Avro data, such that a single data file can contain
> messages encoded with different schemas? Should I store the messages with a
> prepended schema ID, similar to what I do in Kafka? In that case, how could
> I separate the messages in the file?
>
> Thanks for any advice,
> Josh
>