Re: Formal spec for Avro Schema

2024-05-22 Thread Oscar Westra van Holthe - Kind
Hi everyone,

A bit late, but I thought I’d add a few thoughts on this as well.

For one, I love the idea of improving our documentation. Separating the schema 
specification from the encoding makes perfect sense to me, and also allows us 
to clearly state which encoding to use. Currently, most problems I see in 
online questions arise from using raw / internal encodings, which I think is an 
easy problem to prevent.

As to the specification, I think it’s a good start. Some things I really like:

- the introduction of an Avro Schema Document, limiting top types to a (union
  of) named type(s)
- an explicit "no imports" rule to ensure a schema document is self-contained

There are some things I think we can improve, such as explicitly mentioning all 
attributes (the ’type’ attribute is not introduced), fixing a few errors in the 
document, etc. I’ve taken the liberty of doing so.

One notable addition is a de facto requirement: that names and aliases must be 
unique within their context.

I’ve put my changes in a fork of Clemens’ gist: 
https://gist.github.com/opwvhk/38481bf19a175a86c703d8ba0ab08866


As a followup to this schema specification, we can write specifications for the 
binary encoding (with a warning never to use it directly), Avro files, the 
Single-Object encoding, protocols, the protocol wire format, and the IDL schema 
and protocol definitions.


Kind regards,
Oscar


-- 
Oscar Westra van Holthe - Kind 

> On 15 May 2024, at 11:17, Clemens Vasters via user  
> wrote:
> 
> Hi Martin,
>  
> I am saying that the specification of the schema is currently entangled with 
> the specification of the serialization framework. Avro Schema is useful and 
> usable even if you never touch the Avro binaries (the framework, an 
> implementation using the spec). 
>  
> I am indeed proposing to separate the schema spec from the specs of the Avro 
> binary encoding and the Avro JSON encoding, which also avoids strange 
> entanglements like the JSON encoding pointing to the schema description’s 
> default values section, which is in itself rather lacking in precision; e.g. 
> the encoding rule for binary or fixed is “defined” with a rather terse 
> example: "\u00ff"
>  
> Microsoft would like to propose Avro and Avro Schema in several 
> standardization efforts, but we need a spec that works in those contexts and 
> that can stand on its own. I would also like to see “application/avro” as a 
> formal media type, but the route towards that only goes via formal 
> standardization of both schema and encodings.
>  
> I believe the Avro project’s reach and importance is such that schema and 
> encodings should have formal specs that can stand on their own as JSON and 
> CBOR and AMQP and XML and OPC/Binary and other serialization schemas/formats 
> do. I don’t think existence of a formal spec gets in the way of progress and 
> Avro is so mature that the spec captures a fairly stable state.
>  
> Best Regards
> Clemens
>  
> From: Martin Grigorov 
> Sent: Wednesday, May 15, 2024 10:54 AM
> To: d...@avro.apache.org
> Cc: user@avro.apache.org
> Subject: Re: Formal spec for Avro Schema
>  
> Hi Clemens,
>  
> On Wed, May 15, 2024 at 11:18 AM Clemens Vasters 
> mailto:cleme...@microsoft.com.invalid>> 
> wrote:
> Hi Martin,
> 
> we find Avro Schema to be a great fit for describing application data 
> structures in general and even independent of wire-serialization scenarios.
> 
> Therefore, I would like to have a spec that focuses specifically on the 
> schema format, is grounded in the IETF RFC specs, and which follows the 
> conventions set by IETF, so that folks who need a sane schema format to 
> describe data structures independent of implementation can use that.
>  
> Do you say that the specification document is implementation dependent ?
> If this is the case then maybe we should work on improving it instead of 
> duplicating it.
>  
> 
> The benefit for the Avro serialization framework of having such a formal spec 
> that is untangled from the wire-serialization specs is that all schemas 
> defined by that schema model are compatible with the framework.
>  
> What do you mean by "framework" here ?
>  
> 
> The differences are organization, scope, and language style (including 
> keywords etc.). The expressed ruleset is the same.
>  
> I don't think it is a good idea to add a second document that is very similar 
> to the specification but uses a different language style.
> To me this looks like a duplication.
> IMO it would be better to suggest (many) (smaller) improvements for the 
> existing document. 
>  
>  
> 
> Best Regards
> Clemens
> 
> -Original Message-
> From: Martin Grigorov mailto:mgrigo...@apache.org>>
> Sent: Wednesd

Re: Schema version in namespace? good practice

2024-05-16 Thread Oscar Westra van Holthe - Kind
Hi,

As far as best practices are concerned, I have never seen any that change
the names and/or namespace of a schema. But that is also because of
generating code from the Avro schema.

What I usually see is adding an extra property to the root schema, named
"version" or similar, combined with a version name in the .avsc file name.
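For illustration, that pattern could look like this (schema and property names
are made up; Avro ignores the extra "version" property, but your tooling can
read it), e.g. in a file named user-v2.avsc:

```json
{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "version": "2",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
```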

Kind regards,
Oscar

-- 
Oscar Westra van Holthe - Kind 

Op do 16 mei 2024 07:48 schreef Vignesh Kumar Kathiresan via user <
user@avro.apache.org>:

> Hi,
>
> I am new to avro and started working on it recently. I am in the process
> of designing a schema evolution process. We use java applications and make
> use of maven plugin to auto generate the classes from .avsc schema files.
>
> I am thinking of adding the version in the namespace during each evolution
> as compatibility is based on unqualified-name only according to
> specification. This is because I can now have a central library which keeps
> track of all the versions and all the client applications can just import
> the library and use different versions of schemas(java classes). Instead of
> every client importing the required schema files and auto-generating at
> their end every time they are upgrading to a newer version of the schema.
> Is this a good practice to include version_id in the namespace?
>
> Also we use a schema registry with full compatibility checks on.
>
>  Thanks,
> Vignesh
>


Re: Avro JSON Encoding

2024-04-23 Thread Oscar Westra van Holthe - Kind
Hi,

Using a JSON encoding to bridge Avro to/from JSON is indeed a good idea.

But the systems still need to talk the same “language” (data structure). I 
rarely encounter systems that allow fully free-form objects (and never in 
production); there’s always some data structure behind it. This data structure 
(dict/struct/record/…) limits what can be transferred, and in at least 19 out 
of 20 cases covers records (with required fields), arrays and 
primitives.

In the cases where I do see more complex data structures, the ones that use the 
more advanced XML features (like mixing namespaces) or free-form JSON options 
(like mixing fixed properties with patternProperties/additionalProperties), the 
data is generally so tied to that format that bridging to Avro makes little to 
no sense.

That doesn’t mean that there is no use case, but it does mean that you’re 
better served by a dedicated parser that emits Avro records. The kind of thing 
I’ve dabbled with a bit: https://github.com/opwvhk/avro-conversions


Kind regards,
Oscar

-- 
Oscar Westra van Holthe - Kind 

> On 23 Apr 2024, at 15:40, Clemens Vasters via user  
> wrote:
> 
> I don't think you get around either maps or unions for a model where Avro 
> Schema can describe a JSON originating from an existing producer that isn't 
> aware of Avro Schema being used by the consumer. That is the test I would 
> apply for whether the encoder (or decoder in this case) is practically 
> useful. Avro Binary sufficiently covers the scenario where both parties are 
> known to be implemented with Avro. JSON is primarily useful as a bridge 
> to/from producers and consumers which do not use Avro bits and thus likely 
> not Avro Schema.
> From: Oscar Westra van Holthe - Kind 
> Sent: Tuesday, April 23, 2024 1:10:16 PM
> To: user@avro.apache.org 
> Subject: Re: Avro JSON Encoding
>  
> Hi everyone,
> 
> Having looked a bit more into what I usually see when using JSON to transfer 
> data, I think we can limit cross-format support (what this essentially is) to 
> a common denominator as we can see between Python objects / dicts, Rust 
> structs, Java POJOs/records, and Parquet MessageTypes, just to name a few.
> 
> This essentially boils down to all Avro constructs except for maps and unions 
> other than a single type plus null (i.e., the recent ‘?’ addition to the IDL 
> syntax). It also means we can omit support for most of the esoteric JSON 
> schema constructs, like additionalProperties, patternProperties, 
> if/then/else, etc.
> 
> 
> However, as Ryan noted, it still makes sense to find a way to promote the 
> Avro binary format. Especially the single-message encoding, I’d add: most 
> questions about using JSON, for example, concern single records. Currently, 
> however, only the Rust and Java SDKs mention the byte marker for the 
> single-message encoding at all. It’s very much lacking from Python.
> 
> In fact, if we want to promote the use of Avro (and especially its binary 
> format), we must have better documentation and implementations of the 
> single-message encoding.
> 
> 
> Kind regards,
> Oscar
> 
> -- 
> Oscar Westra van Holthe - Kind 
> 
>> On 19 Apr 2024, at 23:45, Andrew Otto  wrote:
>> 
>> > There's probably a nice balance between a rigorous and interoperable (but 
>> > less customizable) JSON encoding, and trying to accommodate arbitrary JSON 
>> > in the Avro project.
>> 
>> For my own purposes, I'd only need a very limited set of JSON support. For 
>> event streaming, we limit JSONSchema usages to those that can be easily and 
>> explicitly mapped to SQL (Hive, Spark, Flink) type systems. e.g. No 
>> undefined additionalProperties 
>> <https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#No_object_additionalProperties>,
>>  no union types 
>> <https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#No_union_types_/_No_null_values>,
>>  etc. etc.  
>> 
>> 
>> 
>> 
>> On Fri, Apr 19, 2024 at 11:58 AM Ryan Skraba > <mailto:r...@skraba.com>> wrote:
>> Hello!
>> 
>> A bit tongue in cheek: the one advantage of the current Avro JSON
>> encoding is that it drives users rapidly to prefer the binary
>> encoding!  In its current state, Avro isn't really a satisfactory
>> toolkit for JSON interoperability, while it shines for binary
>> interoperability. Using JSON with Avro schemas is pretty unwieldy and
>> a JSON data designer will almost never be entirely satisfied with the
>> JSON "shape" they can get... today it's 

Re: Avro JSON Encoding

2024-04-23 Thread Oscar Westra van Holthe - Kind
Hi everyone,

Having looked a bit more into what I usually see when using JSON to transfer 
data, I think we can limit cross-format support (what this essentially is) to a 
common denominator as we can see between Python objects / dicts, Rust structs, 
Java POJOs/records, and Parquet MessageTypes, just to name a few.

This essentially boils down to all Avro constructs except for maps and unions 
other than a single type plus null (i.e., the recent ‘?’ addition to the IDL 
syntax). It also means we can omit support for most of the esoteric JSON schema 
constructs, like additionalProperties, patternProperties, if/then/else, etc.
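For reference, the ‘?’ shorthand mentioned above looks roughly like this in
IDL (a sketch with made-up names; `string?` declares an optional, i.e.
nullable, string):

```
record Person {
  string name;
  string? nickname;  // shorthand for an optional (nullable) string
}
```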


However, as Ryan noted, it still makes sense to find a way to promote the Avro 
binary format. Especially the single-message encoding, I’d add: most questions 
about using JSON, for example, concern single records. Currently, however, only 
the Rust and Java SDKs mention the byte marker for the single-message encoding 
at all. It’s very much lacking from Python.

In fact, if we want to promote the use of Avro (and especially its binary 
format), we must have better documentation and implementations of the 
single-message encoding.
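To make the format concrete: per the spec, a single-object message is a
two-byte marker (0xC3 0x01), then the 8-byte little-endian CRC-64-AVRO
fingerprint of the writer schema, then the Avro binary body. A dependency-free
Python sketch following the fingerprint algorithm from the spec (the example
schema `"string"` and body bytes are illustrative):

```python
# CRC-64-AVRO, as given by the reference algorithm in the Avro spec.
EMPTY = 0xC15D213AA4D7A795  # initial value / polynomial


def _fp_table():
    table = []
    for i in range(256):
        fp = i
        for _ in range(8):
            fp = (fp >> 1) ^ (EMPTY & -(fp & 1))
        table.append(fp)
    return table


_TABLE = _fp_table()


def crc64_avro(data: bytes) -> int:
    """64-bit Rabin fingerprint of the (canonical form) schema bytes."""
    fp = EMPTY
    for byte in data:
        fp = (fp >> 8) ^ _TABLE[(fp ^ byte) & 0xFF]
    return fp


def single_object_message(canonical_schema: str, body: bytes) -> bytes:
    """Marker + little-endian schema fingerprint + Avro-binary body."""
    fingerprint = crc64_avro(canonical_schema.encode("utf-8"))
    return b"\xc3\x01" + fingerprint.to_bytes(8, "little") + body


# b"\x06foo" is the string "foo" in Avro binary (zigzag length 3, then bytes).
msg = single_object_message('"string"', b"\x06foo")
print(msg[:2].hex())  # c301
```

A consumer can then match the first two bytes to recognize the encoding and
look the fingerprint up in a schema registry.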


Kind regards,
Oscar

-- 
Oscar Westra van Holthe - Kind 

> On 19 Apr 2024, at 23:45, Andrew Otto  wrote:
> 
> > There's probably a nice balance between a rigorous and interoperable (but 
> > less customizable) JSON encoding, and trying to accommodate arbitrary JSON 
> > in the Avro project.
> 
> For my own purposes, I'd only need a very limited set of JSON support. For 
> event streaming, we limit JSONSchema usages to those that can be easily and 
> explicitly mapped to SQL (Hive, Spark, Flink) type systems. e.g. No undefined 
> additionalProperties 
> <https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#No_object_additionalProperties>,
>  no union types 
> <https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#No_union_types_/_No_null_values>,
>  etc. etc.  
> 
> 
> 
> 
> On Fri, Apr 19, 2024 at 11:58 AM Ryan Skraba  <mailto:r...@skraba.com>> wrote:
>> Hello!
>> 
>> A bit tongue in cheek: the one advantage of the current Avro JSON
>> encoding is that it drives users rapidly to prefer the binary
>> encoding!  In its current state, Avro isn't really a satisfactory
>> toolkit for JSON interoperability, while it shines for binary
>> interoperability. Using JSON with Avro schemas is pretty unwieldy and
>> a JSON data designer will almost never be entirely satisfied with the
>> JSON "shape" they can get... today it's useful for testing and
>> debugging.
>> 
>> That being said, it's hard to argue with improving this experience
>> where it can help developers that really want to use Avro JSON for
>> data transfer, especially for things accepting JSON where the
>> intention is clearly unambiguous or allowing optional attributes to be
>> missing.  I'd be enthusiastic to see some of these improvements,
>> especially if we keep the possibility of generating strict Avro JSON
>> for forwards and backwards compatibility.
>> 
>> My preference would be to avoid adding JSON-specific attributes to the
>> spec where possible.  Maybe we could consider implementing Avro JSON
>> "variants" by implementing encoder options, or alternative encoders
>> for an SDK. There's probably a nice balance between a rigorous and
>> interoperable (but less customizable) JSON encoding, and trying to
>> accommodate arbitrary JSON in the Avro project.
>> 
>> All my best and thanks for this analysis -- I'm excited to see where
>> this leads!  Ryan
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Thu, Apr 18, 2024 at 8:01 PM Oscar Westra van Holthe - Kind
>> mailto:os...@westravanholthe.nl>> wrote:
>> >
>> > Thank you Clemens,
>> >
>> > This is a very detailed set of proposals, and it looks like it would work.
>> >
>> > I do, however, feel we'd need to define a way to encode unions with 
>> > records. Your proposal lists various options, of which the discriminator 
>> > option seems most portable to me.
>> >
>> > You mention the "displayName" proposal. I don't like that, as it mixes 
>> > data with UI elements. The discriminator option can specify a fixed or 
>> > configurable field to hold the type of the record.
>> >
>> > Kind regards,
>> > Oscar
>> >
>> >
>> > --
>> > Oscar Westra van Holthe - Kind > > <mailto:os...@westravanholthe.nl>>
>> >
>> > Op do 18 apr. 2024 10:12 schreef Clemens Vasters via user 
>> > mail

Re: Avro JSON Encoding

2024-04-18 Thread Oscar Westra van Holthe - Kind
Thank you Clemens,

This is a very detailed set of proposals, and it looks like it would work.

I do, however, feel we'd need to define a way to encode unions with records.
Your proposal lists various options, of which the discriminator option seems
most portable to me.

You mention the "displayName" proposal. I don't like that, as it mixes data
with UI elements. The discriminator option can specify a fixed or
configurable field to hold the type of the record.

Kind regards,
Oscar


-- 
Oscar Westra van Holthe - Kind 

Op do 18 apr. 2024 10:12 schreef Clemens Vasters via user <
user@avro.apache.org>:

> Hi everyone,
>
>
>
> the current JSON Encoding approach severely limits interoperability with
> other JSON serialization frameworks. In my view, the JSON Encoding is only
> really useful if it acts as a bridge into and from JSON-centric
> applications and it currently gets in its own way.
>
>
>
> The current encoding being what it is, there should be an alternate mode
> that emphasizes interoperability with JSON “as-is” and allows Avro Schema
> to describe existing JSON document instances such that I can take someone’s
> existing JSON document in on one side of a piece of software and emit Avro
> binary on the other side while acting on the same schema.
>
>
>
> There are four specific issues:
>
>
>
>1. Binary Values
>2. Unions with Primitive Type Values and Enum Values
>3. Unions with Record Values
>4. DateTime
>
>
>
> One by one:
>
>
>
> 1. Binary values:
>
> -
>
>
>
> Binary values (fixed and bytes) are encoded as escaped Unicode
> literals. While I appreciate the creative trick, it costs 6 bytes for each
> encoded byte. I have a hard time finding any JSON library that provides a
> conversion of such strings from/to byte arrays, so this approach appears to
> be idiosyncratic to Avro’s JSON Encoding.
>
>
>
> The common way to encode binary in JSON is base64 encoding, which is widely
> and well supported in libraries. Base64 is 33% larger than plain bytes,
> while the encoding chosen here is 500% (!) larger than plain bytes.
>
>
>
> The Avro decoder is schema-informed and it knows that a field is expected
> to hold bytes, so it’s easy to mandate base64 for the field content in the
> alternate mode.
>
>
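A quick, self-contained illustration of the size difference described above
(Python standard library only; the 16-byte payload is arbitrary):

```python
import base64
import json

raw = b"\xff" * 16  # 16 arbitrary non-ASCII bytes

# Current Avro JSON Encoding style: each byte becomes a Unicode code point,
# which JSON then serializes as a \u00XX escape (6 characters per byte).
escaped = json.dumps(raw.decode("latin-1"))

# Conventional approach: base64 (4 characters per 3 bytes of input).
b64 = json.dumps(base64.b64encode(raw).decode("ascii"))

print(len(escaped))  # 98 = 2 quotes + 16 * 6 escape characters
print(len(b64))      # 26 = 2 quotes + 24 base64 characters
```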
>
> 2. Unions with Primitive Type Values and Enum Values
>
> -
>
>
>
> It’s common to express optionality in Avro Schema by creating a union with
> the “null” type, e.g. [“string”, “null”]. The Avro JSON Encoding opts to
> encode such unions, like any union, as { “{type}”: {value} } when the value
> is non-null.
>
>
>
> This choice ignores common practice and the fact that JSON’s values are
> dynamically typed (RFC8259 Section-3
> <https://www.rfc-editor.org/rfc/rfc8259#section-3>) and inherently
> accommodate unions. The conformant way to encode a value choice of null or
> “string” into a JSON value is plainly null and “string”.
>
>
>
> “foo” : null
>
> “foo”: “value”
>
>
>
> The “field default values” table in the Avro spec maps Avro types to the
> JSON types null, boolean, integer, number, string, object, and array, all
> of which can be encoded into and, more importantly, unambiguously decoded
> from a JSON value. The only semi-ambiguous case is integer vs. number,
> which is a convention in JSON rather than a distinct type, but any Avro
> serializer is guided by type information and can easily make that
> distinction.
>
>
>
> 3. Unions with Record Values
>
> -
>
>
>
> The JSON Encoding pattern of unions also covers “record” typed values, of
> course, and this is indeed a tricky scenario during deserialization since
> JSON does not have any built-in notion of type hints for “object” typed
> values.
>
>
>
> The problem of having to disambiguate instances of different types in a
> field value is a common one also for users of JSON Schema when using the
> “oneOf” construct, which is equivalent to Avro unions. There are two common
> strategies:
>
>
>
> - “Duck Typing”:  Every conformant JSON Schema Validator determines the
> validity of a JSON node against a “oneOf" rule by testing the instance
> against all available alternative schema definitions. Validation fails if
> there is not exactly one valid match.
>
> - Discriminators: OpenAPI, for instance, mandates a “discriminator” field
> (see https://spec.openapis.org/oas/latest.html#discriminator-object) for
> disambiguating “oneOf” constructs, whereby the discriminator property is
> part of each instance. That approach informs numerous JSON serialization
> frameworks, which im

Re: Avro query on Testing if Beta feature of Generating faster code is enabled

2024-02-05 Thread Oscar Westra van Holthe - Kind
Hi,

What do you mean by testing if the flag is successfully turned on? Do you
need to test its effects? Are you willing to trust the tests we have on the
flag?

As for testing the performance difference, we do have a performance test
module. Perhaps you can use the same technique?

Kind regards,
Oscar

-- 
Oscar Westra van Holthe - Kind 

Op di 6 feb. 2024 07:15 schreef chirag :

> Hi Team,
>
> On the Avro Docs it is mentioned that: to turn on the new approach to
> generating code that speeds up decoding and encoding, set the feature
> flag/system flag org.apache.avro.specific.use_custom_coders to true at
> runtime (here
> <https://avro.apache.org/docs/1.11.1/getting-started-java/#beta-feature-generating-faster-code>).
>
> Enquiring if:
>
>1. There is a way to see if this flag is successfully turned on during
>runtime?
>2. There is a way to measure the performance improvement in doing so?
>
> I have added this system flag to my distributed enterprise application but
> I am not sure if it is enabled and if there is a performance improvement on
> doing so.
>
> Sincerely
> Chirag Nahata
>


Re: Metadata / Annotations support

2023-10-11 Thread Oscar Westra van Holthe - Kind
On wed 11 okt. 2023 17:24, Gustavo Monarin  wrote:

> Sometimes it is useful to give contextual information to a field.
>
> Other protocols like protobuf support annotations through what they call
> Option <https://protobuf.dev/programming-guides/proto3/#customoptions>
> which allows the following customization (uuid field annotation):
>
>
> ```
> message FarmerRegisteredEvent {
>
>   string uuid = 1[(pi2schema.subject_identifier) = true];
>
>   oneof personalData {
> ContactInfo contactInfo = 2;
> pi2schema.EncryptedPersonalData encryptedPersonalData = 6;
>   }
>
>   google.protobuf.Timestamp registeredAt = 4;
>   string referrer = 5;
>
> }
> ```
>
> Would it be possible to add such metadata information in an avro schema?
>
Yes. On schemata, fields and messages, you can add any property other than
the standard ones (like 'name', 'doc', etc.), with any JSON value as its value.

As far as Avro is concerned these are ignored, but you can use them in your
code.

Sometimes you want more though, and for that there are logical types and
(for Java code generation) the 'javaAnnotation' property.
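Concretely, such a schema could look like this (the "subjectIdentifier"
property and the annotation value are made up for illustration;
"javaAnnotation" is the compiler-recognized property mentioned above):

```json
{
  "type": "record",
  "name": "FarmerRegisteredEvent",
  "fields": [
    {"name": "uuid", "type": "string", "subjectIdentifier": true},
    {
      "name": "registeredAt",
      "type": {"type": "long", "logicalType": "timestamp-millis"},
      "javaAnnotation": "javax.annotation.Nonnull"
    }
  ]
}
```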

Kind regards,
Oscar

-- 
Oscar Westra van Holthe - Kind 


Re: Skip Namespace while Decoding Union Types from JSON

2022-12-05 Thread Oscar Westra van Holthe - Kind
Hi Chirag,

Please note that JSON-encoded Avro is NOT the same as plain JSON. It's a
special JSON dialect that is generally not compatible with regular JSON
data.

Sad news: there exists no JSON parser that yields Avro records. JSON simply
lacks the context information that Avro union parsing needs.

Kind regards,
Oscar

-- 
Oscar Westra van Holthe - Kind 

Op ma 5 dec. 2022 12:47 schreef Chirag Dewan :

> Hello,
>
> My system is receiving JSON data serialized from JSON schemas. I am trying
> to represent the *minProperties *and *maxProperties *types in JSON schema
> in AVRO using Union types. But the problem is, AVRO JSON decoder expect the
> Union branch to be present in data and it searches the union type by fully
> qualified name i.e. namespace and type name.
>
> Unfortunately, this is not how my data is encoded.
>
> My input JSON is encoded like this:
>
> {"Request": {
> 
>   }
> }
>
> And I represent it as:
>
> {
> "name": "Schemas",
> "namespace": "com.sample",
> "type": ["null", {
> "type": "record"
> "name": "Request"
> ...
>
> }
> }
>
> So AVRO JSON decoder expects the following:
>
> {"com.sample.Request": {
> 
>   }
> }
>
> One way I thought I could solve this is by using blank namespaces. But that 
> messes up my Java class generation, putting classes in the default package.
>
>
> Any way around this? Any help is appreciated.
>
> Thank you.
>
>
>
>


Re: Modifying a field's schema property in Java

2022-11-12 Thread Oscar Westra van Holthe - Kind
On sun 13 nov. 2022 05:34, Julien Phalip  wrote:

> I've got a schema with multiple levels of depths (sub-records) that I
> would need to change slightly. [...]
>
> Is there a way to make this type of modification on an existing schema, or
> do you have to recreate the entire schema from scratch?
>

After creation, Avro schemata are immutable. To make such modifications you
can use a visitor. There already is some code available to help you along:
you can find an example in the avro-compiler module, which replaces
references to named schemata with the actual schema.

IIRC, you're looking for the Schemas class. The interface you need to
implement has the word 'visitor' in the name.

Kind regards,
Oscar

-- 
Oscar Westra van Holthe - Kind 

>


Re: GenericDatumReader writer's schema question

2022-07-22 Thread Oscar Westra van Holthe - Kind
Hi Ivan,

You're correct about the GenericDatumReader javadoc, but the writer schema
can be adjusted after creation. This is what the DataFileReader does.

So after the DataFileReader is initialised, the underlying
GenericDatumReader uses the schema in the file as the writer schema (to
understand the data), and the schema you provided as the reader schema (to
give data to you via dataFileReader.next(user)).

Does that clarify things for you?


Kind regards,
Oscar


On Wed, 20 Jul 2022 at 10:37, Ivan Tsyba  wrote:

> Hello
>
> As stated in Avro Getting Started
> <https://avro.apache.org/docs/current/gettingstartedjava.html#Deserializing> 
> about
> deserialization without code generation: "The data will be read using the
> writer's schema included in the file, and the reader's schema provided to
> the GenericDatumReader". Here is how GenericDatumReader is created in the
> example
>
> DatumReader datumReader = new
> GenericDatumReader(schema);
>
> But when you look at this GenericDatumReader constructor Javadoc it states
> "Construct where the writer's and reader's schemas are the same." (and
> actual code corresponds to this).
>
> So the writer's schema isn’t taken from a serialized file but from a
> constructor parameter?
>


-- 

✉️ Oscar Westra van Holthe - Kind 


Re: What exception to specify in Avro RPC message so that it generates to AvroRemoteException?

2022-07-01 Thread Oscar Westra van Holthe - Kind
Hi Abhishek,

Avro 1.9.0 introduced a change in the generated protocols that removes the
throws clause for AvroRemoteException.
Since then, any undeclared exception is wrapped in an AvroRuntimeException.

Kind regards,
Oscar

On Fri, 24 Jun 2022 at 15:01, Abhishek Dasgupta <
abhishekdasgupta...@gmail.com> wrote:

> Hi Oscar,
>   Thanks for the reply. Actually I'm trying to migrate from Avro
> 1.7.6-cdh5.12.0 to 1.11.0 in my codebase.  We had these Avro IDL files that
> generated the POJOs having methods with "throws  AvroRemoteException"
> clause even though there is no "throws" clause specified in their
> corresponding IDL file's RPC message.
>
>
> Currently in avro idl files with avro version 1.7.6-cdh5.12.0:
>
> void foo(int arg); // no throws clause present
>
> so the generated Avro POJO has:
>
> void foo(int arg) throws AvroRemoteException.// AvroRemoteException 
> generated even though no throws clause was present in IDL file
>
>
> When I upgraded the avro version to 1.11.0, the generated POJOs don't
> contain the throws clause in these methods.
>
>
> After avro version upgrade to 1.11.0,  :
>
> void foo(int arg); // no throws clause present
>
> so the generated Avro POJO has:
>
> void foo(int arg).  // no throws clause present after upgrade
>
>
>
>  I am unable to understand how these "throws" clauses were generated
> earlier ?
>
> What to specify in the avro IDL file's RPC messages so that they exactly
> throw AvroRemoteException in their generated POJOs ? Since a lot error
> handling is based on this exception in my codebase which results in
> compilation issues.
>
> On Thu, Jun 16, 2022 at 12:54 PM Oscar Westra van Holthe - Kind <
> os...@westravanholthe.nl> wrote:
>
>> Hi Abhishek,
>>
>> The throws something in your protocol will be compiled into a throws
>> something in Java.
>> The definition of something must be an error (not a record), which will
>> be compiled into a subclass of AvroRemoteException.
>>
>> So while your exact requirement cannot be satisfied, you will get
>> something similar.
>>
>>
>> Kind regards,
>> Oscar
>>
>>
>> On Wed, 15 Jun 2022 at 14:57, Abhishek Dasgupta <
>> abhishekdasgupta...@gmail.com> wrote:
>>
>>> I want the Avro generated Java class methods to have throws
>>> AvroRemoteException ? How to code this in my Avro IDL file ?
>>>
>>> Suppose I have this RPC message in my Avro protocol:
>>>
>>> void foo(int arg) throws something;
>>>
>>> so the generated Avro POJO has:
>>>
>>> void foo(int arg) throws AvroRemoteException
>>>
>>> What should I put instead of something?
>>>
>>> Using Avro 1.11.0
>>>
>>
>>
>> --
>>
>> ✉️ Oscar Westra van Holthe - Kind 
>>
>>

-- 

✉️ Oscar Westra van Holthe - Kind 


Re: What exception to specify in Avro RPC message so that it generates to AvroRemoteException?

2022-06-16 Thread Oscar Westra van Holthe - Kind
Hi Abhishek,

The throws something in your protocol will be compiled into a throws
something in Java.
The definition of something must be an error (not a record), which will be
compiled into a subclass of AvroRemoteException.

So while your exact requirement cannot be satisfied, you will get something
similar.


Kind regards,
Oscar


On Wed, 15 Jun 2022 at 14:57, Abhishek Dasgupta <
abhishekdasgupta...@gmail.com> wrote:

> I want the Avro generated Java class methods to have throws
> AvroRemoteException ? How to code this in my Avro IDL file ?
>
> Suppose I have this RPC message in my Avro protocol:
>
> void foo(int arg) throws something;
>
> so the generated Avro POJO has:
>
> void foo(int arg) throws AvroRemoteException
>
> What should I put instead of something?
>
> Using Avro 1.11.0
>


-- 

✉️ Oscar Westra van Holthe - Kind 


Re: Converting an AVDL file into something that the avro python package can parse

2022-04-22 Thread Oscar Westra van Holthe - Kind
Hi Eric,

You did everything right, except that you ended up with a protocol file.

Please use the tool idl2schemata instead, to generate schema file(s):

java -jar avro-tools.jar idl2schemata src/test/idl/input/namespaces.avdl
/tmp/

This will create a .avsc file per schema that you can use.
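For what it's worth, the 'No "type" property' error makes sense once you put
the two shapes side by side: a protocol (.avpr) wraps its schemas in a "types"
array and has no top-level "type" key, which is what avro.schema.parse looks
for. A small standalone sketch (protocol contents are illustrative):

```python
import json

# Minimal protocol (.avpr) shape, as produced by the idl tool (illustrative).
avpr = json.loads("""
{
  "protocol": "Example",
  "namespace": "com.example",
  "types": [
    {"type": "record", "name": "Rec", "fields": []}
  ],
  "messages": {}
}
""")

# The protocol itself has no top-level "type", hence the parse error ...
print("type" in avpr)  # False

# ... but every entry under "types" is a plain schema, like a .avsc file.
print(all("type" in s for s in avpr["types"]))  # True
```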

Kind regards,
Oscar

-- 
Oscar Westra van Holthe - Kind 

Op vr 22 apr. 2022 14:27 schreef Eric Gorr :

> What I would like to be able to do is take an .avdl file and parse it into
> python. I would like to make use of the information from within python.
>
> According to the documentation, Apache's python package does not handle
> .avdl files. I need to use their `avro-tools` to convert the .avdl file
> into something it does know how to parse.
>
> According to the documentation at
> https://avro.apache.org/docs/current/idl.html, I can convert a .avdl file
> into a .avpr file with the following command:
>
> > java -jar avro-tools.jar idl src/test/idl/input/namespaces.avdl
> /tmp/namespaces.avpr
>
> I ran through my .avdl file through Avro-tools, and it produced an .avpr
> file.
>
> What is unclear is how I can use the python package to interpret this
> data. I tried something simple...
>
> > schema = avro.schema.parse(open("my.avpr", "rb").read())
>
> but that generates the error:
>
> > SchemaParseException: No "type" property:
>
> I believe that `avro.schema.parse` is designed to parse .avsc files (?).
> However, it is unclear how I can use `avro-tools` to convert my .avdl into
> .avsc. Is that possible?
>
> I am guessing there are many pieces I am missing and do not quite
> understand (yet) what the purpose of all of these files are.
>
> It does appear that an .avpr is a JSON file (?) so I can just read and
> interpret it myself, but I was hoping that there would be a python package
> that would assist me in navigating the data.
>
> Can anyone provide some insights into this? Thank you.
>


Re: Spec wording on fullnames is not clear

2021-12-27 Thread Oscar Westra van Holthe - Kind
On Mon 27 dec. 2021 17:42, Askar Safin  wrote:

> Hi. I'm writing Avro implementation in Rust for personal use. I have a
> question. Consider this Avro scheme:
>
> {
>   "type": "record",
>   "name": "a.b",
>   "fields": [
> {
>   "name": "c",
>   "type": {
> "type": "record",
> "name": "d",
> "fields": []
>   }
> }
>   ]
> }
>
> What is the fullname of record "c"? "a.c" or "c"? I think the Avro
> specification is vague about this and should be fixed. When I attempt to
> interpret the Avro spec literally, I conclude that the fullname is "a.c".
> But this contradicts my common sense.
>

c is a field in record a.d, so it has no fullname of its own. The inner
record d has a simple name (no dot) and no namespace attribute, so its
namespace is taken from the innermost enclosing named type, a.b, whose
namespace is a. The fullname of the inner record is therefore a.d.
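The resolution rule can be sketched in a few lines of plain Python (a simplified illustration of the spec's rule, not the avro library):

```python
# Simplified fullname rule from the Avro spec: a name containing dots is
# already a fullname; otherwise an explicit "namespace" attribute wins;
# failing both, the namespace of the innermost enclosing named type applies.
def fullname(name, namespace=None, enclosing_namespace=None):
    if "." in name:
        return name  # already a fullname
    ns = namespace if namespace is not None else enclosing_namespace
    return f"{ns}.{name}" if ns else name

# Outer record: name "a.b" is a fullname, so its namespace is "a".
outer = fullname("a.b")
outer_namespace = outer.rsplit(".", 1)[0]

# Inner record "d" inherits the enclosing namespace "a".
inner = fullname("d", enclosing_namespace=outer_namespace)
print(outer, inner)  # a.b a.d
```

Fields like c are addressed relative to their record; only named types (records, enums, fixed) get fullnames.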

Kind regards,
Oscar


Re: New website

2021-11-02 Thread Oscar Westra van Holthe - Kind
Hi,

This is a huge improvement. Responsive, excellent navigation, syntax
highlighting, ...

The only downside I see was already mentioned by Lee: the landing page is
too empty (also in a mobile browser).
I think we could really benefit from mentioning the unique selling point of
Avro here: "Your Data. Any Time, Anywhere." And then mention the language
availability & excellent schema evolution.

Kind regards,
Oscar

On Thu, 28 Oct 2021 at 10:43, Martin Grigorov  wrote:

> Hi all,
>
> Please check the new candidate for Apache Avro website:
> https://avro-website.netlify.app/
>
> It is based on Hugo and uses Docsy theme.
> Its source code and instructions how to build could be found at
> https://github.com/martin-g/avro-website.
> The JIRA ticket is: https://issues.apache.org/jira/browse/AVRO-2175
>
> I am not a web designer, so some things may not look finished.
> I've just copied the HTML content from the old site (
> https://avro.apache.org/) and converted it to Markdown for Hugo.
>
> Any feedback is welcome! With Pull Requests would be awesome!
>
> Regards,
> Martin
>


Re: Schema Compatibility and Nullable Unions

2021-05-07 Thread Oscar Westra van Holthe - Kind
On Thu, 6 May 2021 at 02:15, Joseph Lorenzini 
wrote:

> Let’s say I have a record with a single union field. In Schema A, the
> first union may be null or a string. In schema B, the union may be null or
> an int. The record names are the same between the two schemas. According to
> avro spec section on Schema Resolution:
>
>
>
> if both are unions: The first schema in the reader's union that matches
> the selected writer's union schema is recursively resolved against it. if
> none match, an error is signalled.
>
>
>
> Does this mean that Schema A and Schema B are compatible because both
> unions can be null even though the other type is not compatible between the
> two schemas? I would have expected that compatibility would only be true if
> both types in a union matched between the two schemas.
>

In general no (the schemas are not compatible for all cases), but some A
records can still be read as / projected to B records.

Specifically, records using schema A with null values are compatible (as
the null type in the target schema is matched). Records with string values
are not.
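The union-resolution rule can be sketched as follows (a deliberately simplified illustration: real resolution recurses into the matched branch and allows type promotions, while this only compares primitive type names):

```python
# Simplified union resolution per the Avro spec: the writer has already
# selected one branch of its union; the reader takes the first branch of
# its own union that matches that selection, or signals an error.
def resolve_union(writer_branch, reader_union):
    for reader_branch in reader_union:
        if reader_branch == writer_branch:  # simplified match
            return reader_branch
    raise ValueError(f"no match for {writer_branch!r}")

writer_union = ["null", "string"]  # schema A's field
reader_union = ["null", "int"]     # schema B's field

print(resolve_union("null", reader_union))  # null values project fine
# resolve_union("string", reader_union) would raise: "string" has no match
```

This mirrors the situation above: records written with a null value resolve, records written with a string do not.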


Kind regards,
Oscar


Re: Companies using Apache Avro

2021-01-27 Thread Oscar Westra van Holthe - Kind
On wed 27 jan. 2021 11:20, Ismaël Mejía  wrote:

> A good reference  and comparison between formats and the advantages of
> Avro that you can refer on your paper is on Martin Kleppmann book:
>
> https://www.oreilly.com/library/view/designing-data-intensive-applications/9781491903063/ch04.html


This is an excellent reference (and primer) on many distributed data
subjects. It lists many more references, in case you would need them.



Kind regards,
Oscar


New plugin for Jetbrains (IntelliJ / PyCharm / ...)

2021-01-11 Thread Oscar Westra van Holthe - Kind
Hello everyone,

Does anyone use IntelliJ, PyCharm or another Jetbrains IDE, and would like
to edit Avro schemas with it?
I've built a plugin that recognizes Avro schema and protocol files, and I
would appreciate feedback and constructive criticism.

It's an early version that supports:
- .avsc schema files: recognized as a JSON 'dialect', using a JSON schema to
  supply code completion and semantic checks
- .avpr protocol files: recognized as a JSON 'dialect', using a JSON schema to
  supply code completion and semantic checks
- .avdl protocol files: syntax highlighting, correct formatting, code
  completion, semantic checks, and named schema navigation

You can find it in the plugin marketplace in your IDE (search for "avro
idl"), or here:
https://plugins.jetbrains.com/plugin/15728-apache-avro-idl-schema-support

Feedback, bugs, ideas (pull requests), etc. are most welcome via github:
https://github.com/opwvhk/avro-schema-support


Kind regards,
Oscar

-- 

✉️ Oscar Westra van Holthe - Kind 
 https://plugins.jetbrains.com/plugin/15728-apache-avro-idl-schema-support