Ruby gem fork - contribute back?

2014-06-25 Thread Willem van Bergen
Hi,

For a Ruby project, I am using Avro schemas to validate Ruby objects. I ran 
into some issues with the official avro gem, so I forked it: 
https://github.com/wvanbergen/tros. (The name probably only makes sense to 
Dutch people :)


### Changes

- Fixed a round-trip encoding issue for union(double, int) types. 
Integers were being encoded as floats and read back as floats. In Ruby versions 
2.0 and later, a Float == Integer comparison is exact, so it returns false when 
the float cannot represent the original integer. This caused a test to fail.

- Fixed UTF-8 support for Ruby 1.9+ and JRuby.
The original code was written for Ruby 1.8, and there are significant changes 
to how string encodings must be handled in Ruby 1.9+ and JRuby.

- Removed monkey patching of Enumerable.
Monkey patching built-in objects is frowned upon, especially in libraries. 
Fixing it was easy:
https://github.com/wvanbergen/tros/commit/c81d6189277111008ebb05239af91d286dd01061

- Dropped the dependency on yajl-ruby/multi_json. 
The yajl-ruby dependency was causing compatibility issues with the rest of my 
application, and there is no released version yet with working multi_json 
support (1.7.6 cannot be installed because multi_json is misspelled as 
multi-json). Instead of fixing that, I decided to simply use Ruby's built-in 
JSON support. For libraries, the fewer external dependencies, the better.


I also did some heavy refactoring to make the Ruby codebase work outside of the 
context of the greater Avro project, and applied some best practices of the 
Ruby ecosystem. Finally, I set up CI (https://travis-ci.org/wvanbergen/tros) 
that checks the gem on multiple Ruby versions.


### Contributing back?

I would like to contribute my changes back if you are interested. However, 
maintaining Ruby 1.8 support would make this very hard. Ruby 1.8 doesn't come 
with built-in JSON support, its Unicode handling is severely broken, and it is 
also no longer maintained: 
https://www.ruby-lang.org/en/news/2013/12/17/maintenance-of-1-8-7-and-1-9-2/

Is it acceptable to drop support for Ruby 1.8? If so, I can work with you to 
get my changes back into the main codebase.


Cheers,
Willem van Bergen

Re: Ruby gem fork - contribute back?

2014-06-25 Thread Sean Busbey
how far back did you fork?

could we have a Ruby 1.8 gem and a Ruby 1.9+ gem?

we have python and python 3 support broken out, for example.


On Wed, Jun 25, 2014 at 3:51 AM, Willem van Bergen wil...@vanbergen.org
wrote:

-- 
Sean


Re: Ruby gem fork - contribute back?

2014-06-25 Thread Willem van Bergen
I forked off trunk 2 days ago. 

It's possible to have two different gems, but this is not very common in the 
Ruby world. Because Ruby 1.8 is no longer maintained, not even for security 
issues, most people have moved on to newer versions. This is in contrast with 
Python 2, which is still maintained and heavily used.

My preference would be to document that the last Avro release that supports 
Ruby 1.8 is 1.7.5. (Version 1.7.6 won't install because of the multi_json 
issue.) Maintaining 1.8 compatibility will become harder and harder over time 
and hold back development; for example, it is already hard even to install 
Ruby 1.8 on a recent OS X due to compiler changes.


Cheers,
Willem


On Jun 25, 2014, at 5:06 AM, Sean Busbey bus...@cloudera.com wrote:

 how far back did you fork?
 
 could we have a Ruby 1.8 gem and a Ruby 1.9+ gem?
 
 we have python and python 3 support broken out, for example.
 
 



Re: Ruby gem fork - contribute back?

2014-06-25 Thread Sean Busbey
IIRC, the multi_json issue is fixed in the current snapshot.

I dunno, I certainly stopped using Ruby 1.8 several years ago. The issue is
that Avro has a strong history of favoring compatibility. It would be
surprising for us to drop Ruby 1.8 support while still in the Avro 1.7 line.

We could plan to only support Ruby 1.9+ in Avro 1.8 and take a contribution
that targeted that, maybe?

-- 
Sean
On Jun 25, 2014 4:16 AM, Willem van Bergen wil...@vanbergen.org wrote:





[jira] [Updated] (AVRO-1532) Field deletion not possible for ReflectData: NPE

2014-06-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/AVRO-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

O. Reißig updated AVRO-1532:


Attachment: RemovalOfUnionSubtype.java

Thanks for your suggestion to use AvroAlias; that's a lot better than my first 
try :-)

Explicitly stating the schema does indeed resolve the issue with removed 
fields, but now I run into problems with union types. In my schema I have a Map 
containing an abstract type, which has several possible implementations via 
@Union. I'd like the rest of the serialized data to remain readable when one of 
the union subtypes is removed.

I attached a test case to illustrate my example. In the real world, accessing 
the removed type would yield a ClassCastException, but of course I cannot 
actually remove a class in this test case. Since the data is stored in a Map, I 
should still be able to access the remaining map entries.

Did I make my use case clear? Is it realistic?
Why store the schema inside the serialized file at all, if it is overridden 
anyway? I thought it better to parse serialized data with its corresponding 
schema.

 Field deletion not possible for ReflectData: NPE
 

 Key: AVRO-1532
 URL: https://issues.apache.org/jira/browse/AVRO-1532
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.7.6
Reporter: O. Reißig
  Labels: java, reflection
 Attachments: AVRO-1532.patch, ReflectDataFieldRemovalTest.java, 
 ReflectDataFieldRemovalTest.java, RemovalOfUnionSubtype.java


 *Actual behaviour:*
 I have a field in my reflection-based schema like this:
 {code}
 @Nullable @AvroDefault(null)
 public Long someField;
 {code}
 When removing this field, parsing the previous serialized blob yields 
 NullPointerException:
 {noformat}
 java.lang.NullPointerException
   at org.apache.avro.reflect.ReflectData.setField(ReflectData.java:128)
   at 
 org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
   at 
 org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:230)
   at 
 org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
   at 
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
   at 
 org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
   at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
   at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
   at 
 ReflectDataFieldRemovalTest.testFieldRemoval(ReflectDataFieldRemovalTest.java:41)
 {noformat}
 *Expected behaviour:*
 Field removal is crucial for schema evolution and must be possible with 
 ReflectData.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Ruby gem fork - contribute back?

2014-06-25 Thread Willem van Bergen
Dropping support for Ruby 1.8 in the Avro 1.8.x series sounds like a plan. Is 
there already a branch for the 1.8 series?
Until then, I can maintain my fork for people who need proper UTF-8 support on 
Ruby 1.9+.

I know the multi_json issue is fixed in trunk. However, due to the project's 
structure, it's very hard to use an unreleased version inside a project.
Because the project doesn't include a gemspec file, you cannot make Bundler use 
the latest trunk version.
(In my case, I use avro inside another gem. Gems can only depend on released 
versions of other gems, so I had to fork & release it.)
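A sketch of the constraint being described: Bundler can only build a gem straight from a git repository when that repository contains a .gemspec. The Gemfile below is hypothetical (the gem name and URL are only illustrative):

```ruby
# Gemfile (hypothetical) -- works because the fork ships a gemspec at its root.
source "https://rubygems.org"

gem "tros", git: "https://github.com/wvanbergen/tros"

# Pointing at a git repository that has no gemspec fails at `bundle install`,
# because Bundler has no metadata from which to build the gem.
```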


Willem

On Jun 25, 2014, at 5:33 AM, Sean Busbey bus...@cloudera.com wrote:




[jira] [Commented] (AVRO-1124) RESTful service for holding schemas

2014-06-25 Thread Thunder Stumpges (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043745#comment-14043745
 ] 

Thunder Stumpges commented on AVRO-1124:


Hi Francois, I noticed you applied these patches, and the code tweaks from 
Felix in mate1/release-1.7.5-with-AVRO-1124. I was able to pull and build it 
perfectly. Thanks a lot!

For others, here's the branch from trunk with patches for both AVRO-1315 and 
AVRO-1124 applied:
https://github.com/mate1/avro/tree/from-98ec5f2a172391cb5dfa7b4d85f39065bae22754-with-AVRO-1315-and-AVRO-1124

BTW, are we expecting AVRO-1124 to end up in an 'official' Avro release, or 
will it remain an open issue with a set of patches? I don't necessarily mind it 
this way; we have figured it out and gotten it working. I'm just curious what 
the end plan is. I'd be glad to help out in whatever way I can.

Thanks again everyone.
Thunder


 RESTful service for holding schemas
 ---

 Key: AVRO-1124
 URL: https://issues.apache.org/jira/browse/AVRO-1124
 Project: Avro
  Issue Type: New Feature
Reporter: Jay Kreps
Assignee: Jay Kreps
 Attachments: AVRO-1124-can-read-with.patch, AVRO-1124-draft.patch, 
 AVRO-1124-validators-preliminary.patch, AVRO-1124.patch, AVRO-1124.patch


 Motivation: It is nice to be able to pass around data in serialized form but 
 still know the exact schema that was used to serialize it. The overhead of 
 storing the schema with each record is too high unless the individual records 
 are very large. There are workarounds for some common cases: in the case of 
 files a schema can be stored once with a file of many records amortizing the 
 per-record cost, and in the case of RPC the schema can be negotiated ahead of 
 time and used for many requests. For other uses, though it is nice to be able 
 to pass a reference to a given schema using a small id and allow this to be 
 looked up. Since only a small number of schemas are likely to be active for a 
 given data source, these can easily be cached, so the number of remote 
 lookups is very small (one per active schema version).
 Basically this would consist of two things:
 1. A simple REST service that stores and retrieves schemas
 2. Some helper java code for fetching and caching schemas for people using 
 the registry
 We have used something like this at LinkedIn for a few years now, and it 
 would be nice to standardize this facility to be able to build up common 
 tooling around it. This proposal will be based on what we have, but we can 
 change it as ideas come up.
 The facilities this provides are super simple, basically you can register a 
 schema which gives back a unique id for it or you can query for a schema. 
 There is almost no code, and nothing very complex. The contract is that 
 before emitting/storing a record you must first publish its schema to the 
 registry or know that it has already been published (by checking your cache 
 of published schemas). When reading you check your cache and if you don't 
 find the id/schema pair there you query the registry to look it up. I will 
 explain some of the nuances in more detail below. 
 An added benefit of such a repository is that it makes a few other things 
 possible:
 1. A graphical browser of the various data types that are currently used and 
 all their previous forms.
 2. Automatic enforcement of compatibility rules. Data is always compatible in 
 the sense that the reader will always deserialize it (since they are using 
 the same schema as the writer) but this does not mean it is compatible with 
 the expectations of the reader. For example if an int field is changed to a 
 string that will almost certainly break anyone relying on that field. This 
 definition of compatibility can differ for different use cases and should 
 likely be pluggable.
 Here is a description of one of our uses of this facility at LinkedIn. We use 
 this to retain a schema with log data end-to-end from the producing app to 
 various real-time consumers as well as a set of resulting AvroFile in Hadoop. 
 This schema metadata can then be used to auto-create hive tables (or add new 
 fields to existing tables), or inferring pig fields, all without manual 
 intervention. One important definition of compatibility that is nice to 
 enforce is compatibility with historical data for a given table. Log data 
 is usually loaded in an append-only manner, so if someone changes an int 
 field in a particular data set to be a string, tools like pig or hive that 
 expect static columns will be unusable. Even using plain-vanilla map/reduce 
 processing data where columns and types change willy nilly is painful. 
 However the person emitting this kind of data may not know all the details of 
 compatible schema evolution. We use the schema repository to 

[jira] [Commented] (AVRO-1124) RESTful service for holding schemas

2014-06-25 Thread Thunder Stumpges (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043777#comment-14043777
 ] 

Thunder Stumpges commented on AVRO-1124:


Hi Felix, was just reading through comments, and saw this from you. We are 
going through the exact same thing right now as well.
{quote}
There are certain avro schemas that we use across many Kafka topics (1:M 
relationship between schema and topics). I would like to benefit from the 
facilitated evolution capabilities of the schema repo, but I'm not 100% sure of 
the best way to proceed. I would like to avoid:
1. Having to register the same schema (and each further schema evolutions) into 
many subjects.
2. Having to externally manage a mapping of Kafka topic => subject 
registered into the repo.
{quote}

We are also trying to avoid this. We are currently planning a topic naming 
convention where we combine the subject name (the Avro class FQN) with a topic 
suffix, so a topic would be named 'subject_name--topic_suffix', where the '--' 
delimiter is not used in either the subject name or the suffix. I think this 
avoids both of the above issues, and it doesn't require each message to carry a 
subject ID. It does, however, add complexity to all consumers, which must know 
how to parse topic names.
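The convention described above is easy to sketch (the names here are hypothetical; only the '--' delimiter comes from the message):

```ruby
# Split a Kafka topic named "subject_name--topic_suffix" back into its parts.
# The '--' delimiter is reserved, so splitting on its first occurrence is
# unambiguous; a topic without a suffix yields a nil suffix.
def parse_topic(topic)
  subject, suffix = topic.split("--", 2)
  { subject: subject, suffix: suffix }
end

parts = parse_topic("com.example.PageView--clickstream")
puts parts[:subject]   # com.example.PageView
puts parts[:suffix]    # clickstream
```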

{quote}
Another possibility would be to introduce the concept of a SubjectAlias. The 
way it would work is that you would register a SubjectAlias with an aliasName 
and a targetName. If the aliasName already exists, or if the targetName does 
not exist, the operation would fail. Afterwards, any lookup for the aliasName 
would return a DelegatingSubject containing the Subject referenced by the 
targetName of the alias that was looked up.
This change seems clean and not too intrusive, and also wouldn't require 
encoding both subject ID and schema ID in my message payloads. But perhaps 
there are problems to this approach that I haven't thought of.
Do you think this approach makes sense? And would it be worth contributing back 
into the main schema repo code?
{quote}

I think this would be a fine approach; it would free our Kafka consumers from 
needing to understand the convention we came up with. It would also let you use 
any convenient topic name for any subject schema without having to adhere to a 
naming convention. 
Have you had any progress or other thoughts on this issue since January? I 
realize I'm a little late to the party :)

Cheers,
Thunder



 RESTful service for holding schemas
 ---

 Key: AVRO-1124
 URL: https://issues.apache.org/jira/browse/AVRO-1124
 Project: Avro
  Issue Type: New Feature
Reporter: Jay Kreps
Assignee: Jay Kreps
 Attachments: AVRO-1124-can-read-with.patch, AVRO-1124-draft.patch, 
 AVRO-1124-validators-preliminary.patch, AVRO-1124.patch, AVRO-1124.patch



[jira] [Commented] (AVRO-1124) RESTful service for holding schemas

2014-06-25 Thread Felix GV (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043801#comment-14043801
 ] 

Felix GV commented on AVRO-1124:


Hi Thunder,

Disclaimer: I don't work for Mate1 anymore, so what I'm going to say might be 
out of date by now. The Mate1 guys will need to chime in for the latest state 
of their work on this...

I did not end up coding SubjectAliases into Avro proper because there didn't 
seem to be any interest from the OSS community and I had limited time to build 
this completely generically.

Mate1 had a strategy that is kind of similar to the one you're describing. We 
had arbitrary topic names, which could be dynamically appended with certain 
suffixes (like __PROCESSING_FAILURE, __DECODING_FAILURE or whatever). Those 
dynamically created topics could then be re-processed later on, as convenient, 
and would contain the same schema as the topic they derived from.

In our Camus decoders, we hard-coded some topic alias resolution right there in 
the code (since the number of suffixes was limited in our case). There was talk 
of porting that topic alias resolution logic to our schema-repo-client 
implementation ( https://github.com/mate1/schema-repo-client/ ), so that it 
would be more conveniently available to all Kafka consumers (not just Camus), 
but that didn't end up happening before I left. So essentially, we went for 
option 2, above. For an organization with more numerous/diverse suffixes, that 
strategy would probably not be ideal, but for a small number of suffixes (or 
prefixes), it was deemed acceptable.

Hopefully, that sheds some light (:

-F

 RESTful service for holding schemas
 ---

 Key: AVRO-1124
 URL: https://issues.apache.org/jira/browse/AVRO-1124
 Project: Avro
  Issue Type: New Feature
Reporter: Jay Kreps
Assignee: Jay Kreps
 Attachments: AVRO-1124-can-read-with.patch, AVRO-1124-draft.patch, 
 AVRO-1124-validators-preliminary.patch, AVRO-1124.patch, AVRO-1124.patch



Re: Ruby gem fork - contribute back?

2014-06-25 Thread Sean Busbey
There isn't a branch for 1.8. Patches that target that version just get
generated based on trunk and attached to tickets with a fix version of
1.8.0. Generally, they also get the incompatible flag.

Sure, I agree that using the unreleased versions isn't tenable. Doug made a
call for a 1.7.7 release back at the end of May[1]. It would be good to
ping that thread with the importance of getting something out soon for the
Ruby folks.


[1]: http://s.apache.org/1LB


On Wed, Jun 25, 2014 at 4:54 AM, Willem van Bergen wil...@vanbergen.org
wrote:

 Dropping support for Ruby 1.8 in the Avro 1.8.x series sounds like a plan.
 Is there already a branch for the 1.8 series?
 Until then, I can maintain my fork for people requiring UTF-8 support on
 Ruby 1.9+.

 I know the multi_json issue is fixed in trunk. However, due to the
 project's structure, it's very hard to use a non-released version inside a
 project.
 Because the project doesn't include a gemspec file, you cannot make
 Bundler use the latest trunk version.
 (In my case, I use avro inside of another gem. Gems can only depend on
 released versions of other gems, so I had to fork & release it.)


 Willem

 On Jun 25, 2014, at 5:33 AM, Sean Busbey bus...@cloudera.com wrote:

  IIRC, the multi_json issue is fixed in the current snapshot.
 
  I dunno, I certainly stopped using Ruby 1.8 several years ago. The issue
 is
  that Avro has a strong history of favoring compatibility. It would be
  surprising for us to drop Ruby 1.8 support while still in the Avro 1.7
 line.
 
  We could plan to only support Ruby 1.9+ in Avro 1.8 and take a
 contribution
  that targeted that, maybe?
 
  --
  Sean
  On Jun 25, 2014 4:16 AM, Willem van Bergen wil...@vanbergen.org
 wrote:
 
  I forked off trunk 2 days ago.
 
  It's possible to have 2 different gems, but this is not very common in
 the
  Ruby world. Because Ruby 1.8 is not maintained anymore, not even for
  security issues, most people have moved on to newer versions. This is in
  contrast with Python 2, which is still maintained and heavily used.
 
  My preference would be to document that the last release of avro that
  supports Ruby 1.8 is 1.7.5. (Version 1.7.6 won't install because of the
  multi_json issue.) Maintaining 1.8 compatibility will become harder and
  harder over time and hold back development. E.g., it is already hard to
  even install Ruby 1.8 on a recent OS X due to compiler changes.
 
 
  Cheers,
  Willem
 
 
  On Jun 25, 2014, at 5:06 AM, Sean Busbey bus...@cloudera.com wrote:
 
  how far back did you fork?
 
  could we have a Ruby 1.8 gem and a Ruby 1.9+ gem?
 
  we have python and python 3 support broken out, for example.
 
 
  On Wed, Jun 25, 2014 at 3:51 AM, Willem van Bergen 
 wil...@vanbergen.org
 
  wrote:
 
  Hi,
 
  For a Ruby project, I am using AVRO schemas to validate Ruby objects.
  I ran into some issues with the official avro gem, so I forked it:
  https://github.com/wvanbergen/tros. (The name probably only makes sense
  to Dutch people :)
 
 
  ### Changes
 
  - Fixed a round-trip encoding issue for union(double, int) types.
  Integers were being encoded as floats, and read back as floats. In Ruby
  versions 2.0 and later, a float == bigint equality check will return
  false. This caused a test to fail.
 
  - Fix UTF-8 support for Ruby 1.9+ and JRuby.
  The original code was written for Ruby 1.8, and there are significant
  changes to how to do this properly in Ruby 1.9+ and JRuby.
 
  - Remove monkey patching of Enumerable
  Monkey patching built-in objects is frowned upon, especially in
  libraries. Fixing it was easy:
  https://github.com/wvanbergen/tros/commit/c81d6189277111008ebb05239af91d286dd01061
 
  - Dropped the dependency on yajl-ruby / multi_json.
  The yajl-ruby dependency was causing compatibility issues with the rest
  of my application, and there's no released version yet with a working
  multi_json dependency (1.7.6 cannot be installed because multi_json is
  misspelled multi-json). Instead of fixing that, I decided to simply use
  Ruby's built-in JSON support. For libraries, the fewer external
  dependencies, the better.
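The float == bigint behavior change behind the first fix above can be demonstrated in isolation (values chosen for illustration):

```ruby
# Ruby >= 2.0 compares a Float against an Integer exactly, instead of first
# coercing the Integer to a Float. A bigint that was encoded as an Avro
# double therefore no longer compares equal to itself after a round trip.
big = 2**64 + 1        # not exactly representable as an IEEE-754 double
decoded = big.to_f     # what a union(double, int) round trip hands back

puts decoded == big    # false: exact comparison, and precision was lost
puts decoded == big.to_f
```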
 
 
  I also did some heavy refactoring to make the Ruby codebase work
 outside
  of the context of the greater Avro project, and applied some best
  practices
  of the Ruby ecosystem. Finally, I set up CI (
  https://travis-ci.org/wvanbergen/tros) that checks the gem on
 multiple
  Ruby versions.
 
 
  ### Contributing back?
 
  I would like to contribute back my changes if you are interested.
  However,
  maintaining Ruby 1.8 support will make this very hard. Ruby 1.8 doesn't
  come with built-in JSON support, and its Unicode handling is severely
  broken. It is also no longer maintained:
 
 
 https://www.ruby-lang.org/en/news/2013/12/17/maintenance-of-1-8-7-and-1-9-2/
 
  Is it acceptable to drop support for Ruby 1.8? If so, I can work with
  you
  to get my changes back into the main codebase.
 
 
  Cheers,
  Willem van 

Re: Ruby gem fork - contribute back?

2014-06-25 Thread Doug Cutting
On Wed, Jun 25, 2014 at 2:16 AM, Willem van Bergen wil...@vanbergen.org wrote:
 It's possible to have 2 different gems, but this is not very common in the 
 Ruby world.
 Because Ruby 1.8 is not maintained anymore, not even for security issues, most
 people have moved on to newer versions.

I can see a couple of options:
  1. Assume that no one actually uses Ruby 1.8 anymore, and upgrade to
1.9 in Avro 1.7.7.  A change that doesn't break anyone isn't
incompatible.
  2. Assume some folks still use Ruby 1.8 and add a ruby1.9 fork in
Avro 1.7.7.  Ruby users who upgrade to Avro 1.7.7 would need to opt-in
to the Ruby 1.9 version.
  3. Wait until we release 1.8.0 to upgrade Avro to support Ruby 1.9.

(3) seems like a bad option unless we're confident we're going to
release a 1.8.0 soon, which I am not.  Folks hate getting broken by
upgrades.  Avro is a dependency of a lot of Java applications.  Having
an incompatible release makes it hard for one component to upgrade
without forcing all to upgrade.  Either you end up with a broken stack
or with one that can never upgrade.  Which of (1) or (2) seems more
palatable to Ruby folks?  Are there other options?

Doug


[jira] [Commented] (AVRO-1124) RESTful service for holding schemas

2014-06-25 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044190#comment-14044190
 ] 

Doug Cutting commented on AVRO-1124:


 BTW, are we expecting AVRO-1124 to end up in an 'official' AVRO release or 
 will this constantly be an open issue with a set of patches?

I'd love to see this in a release sooner rather than later.  Since it's new 
functionality there should be no compatibility issues.  All it takes is someone 
to declare a particular patch ready to be committed, and for one or more other 
folks to endorse that.  We also need to have some confidence that, even if it's 
incomplete, the public APIs it exposes can be supported compatibly as 
functionality is improved.

 RESTful service for holding schemas
 ---

 Key: AVRO-1124
 URL: https://issues.apache.org/jira/browse/AVRO-1124
 Project: Avro
  Issue Type: New Feature
Reporter: Jay Kreps
Assignee: Jay Kreps
 Attachments: AVRO-1124-can-read-with.patch, AVRO-1124-draft.patch, 
 AVRO-1124-validators-preliminary.patch, AVRO-1124.patch, AVRO-1124.patch


 Motivation: It is nice to be able to pass around data in serialized form but 
 still know the exact schema that was used to serialize it. The overhead of 
 storing the schema with each record is too high unless the individual records 
 are very large. There are workarounds for some common cases: in the case of 
 files a schema can be stored once with a file of many records amortizing the 
 per-record cost, and in the case of RPC the schema can be negotiated ahead of 
 time and used for many requests. For other uses, though it is nice to be able 
 to pass a reference to a given schema using a small id and allow this to be 
 looked up. Since only a small number of schemas are likely to be active for a 
 given data source, these can easily be cached, so the number of remote 
 lookups is very small (one per active schema version).
 Basically this would consist of two things:
 1. A simple REST service that stores and retrieves schemas
 2. Some helper java code for fetching and caching schemas for people using 
 the registry
 We have used something like this at LinkedIn for a few years now, and it 
 would be nice to standardize this facility to be able to build up common 
 tooling around it. This proposal will be based on what we have, but we can 
 change it as ideas come up.
 The facilities this provides are super simple, basically you can register a 
 schema which gives back a unique id for it or you can query for a schema. 
 There is almost no code, and nothing very complex. The contract is that 
 before emitting/storing a record you must first publish its schema to the 
 registry or know that it has already been published (by checking your cache 
 of published schemas). When reading you check your cache and if you don't 
 find the id/schema pair there you query the registry to look it up. I will 
 explain some of the nuances in more detail below. 
 An added benefit of such a repository is that it makes a few other things 
 possible:
 1. A graphical browser of the various data types that are currently used and 
 all their previous forms.
 2. Automatic enforcement of compatibility rules. Data is always compatible in 
 the sense that the reader will always deserialize it (since they are using 
 the same schema as the writer) but this does not mean it is compatible with 
 the expectations of the reader. For example if an int field is changed to a 
 string that will almost certainly break anyone relying on that field. This 
 definition of compatibility can differ for different use cases and should 
 likely be pluggable.
 Here is a description of one of our uses of this facility at LinkedIn. We use 
 this to retain a schema with log data end-to-end from the producing app to 
 various real-time consumers as well as a set of resulting AvroFile in Hadoop. 
 This schema metadata can then be used to auto-create hive tables (or add new 
 fields to existing tables), or inferring pig fields, all without manual 
 intervention. One important definition of compatibility that is nice to 
 enforce is compatibility with historical data for a given table. Log data 
 is usually loaded in an append-only manner, so if someone changes an int 
 field in a particular data set to be a string, tools like pig or hive that 
 expect static columns will be unusable. Even using plain-vanilla map/reduce 
 processing data where columns and types change willy nilly is painful. 
 However the person emitting this kind of data may not know all the details of 
 compatible schema evolution. We use the schema repository to validate that 
 any change made to a schema doesn't violate the compatibility model, and reject 
 the update if it does. We do this check both at run time, and also as part of 
 the ant task 
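The register/lookup contract described in the issue is simple enough to sketch. Here is a minimal hypothetical Ruby client with the local caching the issue calls for; the endpoint paths and JSON response shape are assumptions for illustration, not the LinkedIn implementation:

```ruby
require 'net/http'
require 'json'
require 'uri'

# Sketch of the registry contract: register a schema to get back an id, and
# cache id<->schema pairs locally so remote lookups happen at most once per
# active schema version. Endpoint paths and response shape are hypothetical.
class SchemaRegistryClient
  def initialize(base_url)
    @base = URI(base_url)
    @by_id = {}      # id     -> schema (cache for readers)
    @by_schema = {}  # schema -> id     (cache for writers)
  end

  # Publish a schema before emitting records with it; returns its id.
  def register(schema_json)
    @by_schema[schema_json] ||= begin
      res = Net::HTTP.post(URI.join(@base.to_s, '/schemas'), schema_json,
                           'Content-Type' => 'application/json')
      id = JSON.parse(res.body).fetch('id')
      @by_id[id] = schema_json
      id
    end
  end

  # Readers resolve an id to a schema, hitting the registry only on a miss.
  def lookup(id)
    @by_id[id] ||= Net::HTTP.get(URI.join(@base.to_s, "/schemas/#{id}"))
  end
end
```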

[jira] [Commented] (AVRO-1530) Java DataFileStream does not allow distinguishing between empty files and corrupt files

2014-06-25 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044193#comment-14044193
 ] 

Doug Cutting commented on AVRO-1530:


That would be an incompatible change.  Some folks might rely on the current 
behaviour.

One can detect an empty file by looking at its length.  No valid avro data file 
will ever be empty.
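A sketch of that length check (in Ruby, for consistency with the rest of the digest; the "Obj\x01" magic is from the Avro container-file spec, the rest is illustrative):

```ruby
# Distinguish "empty" from "corrupt" before handing the stream to an Avro
# reader: no valid Avro data file is zero bytes, so an empty file can be
# skipped up front, and anything non-empty that fails the 4-byte magic
# check ("Obj" plus version byte 1) is corrupt.
def classify_avro_file(path)
  return :empty if File.zero?(path)
  magic = File.binread(path, 4)
  magic == "Obj\x01" ? :maybe_valid : :corrupt
end
```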

 Java DataFileStream does not allow distinguishing between empty files and 
 corrupt files
 ---

 Key: AVRO-1530
 URL: https://issues.apache.org/jira/browse/AVRO-1530
 Project: Avro
  Issue Type: Bug
Reporter: Brock Noland

 When writing data to HDFS, especially with Flume, it's possible to write 
 empty files. When you run Hive queries over this data, the job fails with 
 "Not a data file." thrown from here: 
 https://github.com/apache/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/file/DataFileStream.java#L102



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (AVRO-695) Cycle Reference Support

2014-06-25 Thread Sachin Goyal (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Goyal updated AVRO-695:
--

Attachment: circular_refs_and_nonstring_map_keys_2014_06_25.zip

 Cycle Reference Support
 ---

 Key: AVRO-695
 URL: https://issues.apache.org/jira/browse/AVRO-695
 Project: Avro
  Issue Type: New Feature
  Components: spec
Affects Versions: 1.7.6
Reporter: Moustapha Cherri
 Attachments: avro-1.4.1-cycle.patch.gz, avro-1.4.1-cycle.patch.gz, 
 avro_circular_references.zip, avro_circular_refs_2014_06_14.zip, 
 circular_refs_and_nonstring_map_keys_2014_06_25.zip

   Original Estimate: 672h
  Remaining Estimate: 672h

 This is a proposed implementation to add cycle-reference support to Avro. It 
 basically introduces a new type named Cycle. A Cycle contains a string 
 representing the path to the other reference.
 For example, suppose we have an object of type Message that has a member named 
 previous, also of type Message, with this hierarchy:
 message
   previous : message2
 message2
   previous : message2
 When serializing, the cycle path for message2.previous will be previous.
 The implementation depends on ANTLR to evaluate those cycles at read time to 
 resolve them. I used ANTLR 3.2. This dependency is not mandated; I just used 
 ANTLR to speed things up. I kept the ANTLR-generated code in this 
 implementation, though it should really be generated during the build. I only 
 updated the Java code.
 I did not do full unit testing, but you can find an avrotest.Main class that 
 can be used as a preliminary test.
 Please do not hesitate to contact me for further clarification if this seems 
 interesting.
 Best regards,
 Moustapha Cherri





[jira] [Commented] (AVRO-695) Cycle Reference Support

2014-06-25 Thread Sachin Goyal (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044266#comment-14044266
 ] 

Sachin Goyal commented on AVRO-695:
---


h3.Circular References


*Serialization*
Extra API required (not optional): 
{code}ReflectData.setCircularRefIdPrefix("some-field-name"){code}

If the above is set, the following happens:
  # During serialization, each record contains the extra field specified above. 
The value for this field is just a monotonically increasing number meant to 
uniquely identify each record in one particular serialization.
  # While writing the schema, each RECORD schema is converted into a UNION 
schema such that it can be either a record or a string. During object 
serialization, if a record has been seen before, it is not written as a record. 
Rather, it is written as a string of the form: some-field-name + the ID 
generated in #1 above. With this structure, reading applications have enough 
information to restore the circular reference if they want. This structure is 
also usable by languages that do not support circular references, because they 
will read that circular reference as a normal string.
(AllowNull also works with this.)
  # The above field-name is included in each record as a property. This allows 
readers to become aware of the field-name, so clients do not have to specify it 
just to populate the circular references. Basically, it makes the schema 
self-sufficient.
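Stripped of the Avro schema plumbing, the id-assignment and back-reference substitution in #1 and #2 can be sketched on plain hashes (toy code; the objId field name and the traversal are illustrative, not from the patch):

```ruby
# Toy version of the proposed scheme: walk a hash-based object graph, tag
# each record with a monotonically increasing id under the configured field
# name, and replace any record already seen with a string reference of the
# form "<prefix><id>".
def write_with_refs(obj, prefix = 'objId', seen = {}, counter = [0])
  return obj unless obj.is_a?(Hash)
  if seen.key?(obj.object_id)
    return "#{prefix}#{seen[obj.object_id]}"   # back-reference as a string
  end
  id = (counter[0] += 1)
  seen[obj.object_id] = id
  out = { prefix => id }
  obj.each { |k, v| out[k] = write_with_refs(v, prefix, seen, counter) }
  out
end
```

For example, a self-referencing record serializes to a plain structure that a reader without cycle support can still consume, since the back-reference is just a string.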

*Deserialization*
Extra API required (optional):
{code}GenericDatumReader.setResolveCircularRefs(boolean){code}

Based on #3 above, GenericDatumReader becomes circular-reference-aware.
But since all GenericDatumReaders share a common GenericData instance, they 
are provided with another flag, resolveCircularRefs, to control whether they 
want to resolve circular references or not.
If this flag is set and the serialized schema has a non-null value for the 
circular-reference field, GenericDatumReader does the following:
  # If any record has the circular-ref field, store its value and the 
corresponding record in a map.
  # Look for unions which can be serialized as a record as well as a string. On 
finding such a record serialized as a string, replace the string with the 
record retrieved from the map created in #1.




h3.Non-string map-keys


*Serialization*
No extra API required.

Without this patch, Avro throws an exception for non-string map-keys.
This patch converts such maps into an array of records where each record has 
two fields: key and value. Example:
Map<ObjX, ObjY> is converted to [{key:{ObjX}, value:{ObjY}}]
To do this, the following is done:
  # In ReflectData.java, create a schema for the key as well as the value of 
the non-string hash-map.
  Encapsulate these two schemas into a record schema and create an array schema 
of such records.
  Set the property NS_MAP_FLAG to 1 and store the actual class of the map as 
CLASS_PROP.

  # While writing out a non-string map field, if NS_MAP_FLAG is set, convert 
the map to an array of records using map.entrySet()



*Deserialization*
No extra API required.

Deserialization for non-string map-keys is pretty simple, since the data and 
the schema match exactly, so it just deserializes automatically.
To create an actual map (as when using ReflectDatumReader with an actual-class 
type-parameter), the map is instantiated using CLASS_PROP if the property 
NS_MAP_FLAG is set to 1.
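At the data level, the conversion and its inverse amount to the following (a sketch on plain Ruby hashes; the real patch operates on schemas and ReflectData, and the NS_MAP_FLAG/CLASS_PROP plumbing is omitted):

```ruby
# Sketch of the data-level transform from the patch: a map with non-string
# keys becomes an array of {key, value} records, and the reverse rebuilds
# the map from those records.
def map_to_records(map)
  map.map { |k, v| { 'key' => k, 'value' => v } }
end

def records_to_map(records, map_class = Hash)
  records.each_with_object(map_class.new) do |rec, m|
    m[rec['key']] = rec['value']
  end
end
```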



h3.Testcases included

The unit tests cover the following:
# Circular references at multiple levels of hierarchy
# Circular references within Collections and Maps. 
# Circular and non-circular deserialization of circularly serialized objects.
# Non-string map-keys having circular references.
# Non-string map-keys with nested maps.



Re: Ruby gem fork - contribute back?

2014-06-25 Thread Sean Busbey
Personally, I'd rather see #2.

I think it's very hard to know what the current use of Ruby 1.8 is. Support
from the MRI community only ended ~1 year ago[1]. JRuby still supports
running in 1.8 mode. They'll be dropping it in their next major release,
but there isn't a schedule for that yet and they expect the current major
release line to continue for some time after that[2]. Additionally, Heroku
won't be ending support until the end of this month[3]. Even after that,
it's not clear to me that they won't allow users to keep using it.

As mentioned previously, I'm a JRuby-in-1.9-mode user and I usually just
work with the Java libraries directly. So this won't directly impact me,
but I agree that it sucks when upgrades break things. So I don't feel like
#1 is an option.

We could also investigate maintaining a single gem with two implementations
within it, with the active one determined by the Ruby version.

[1]: https://www.ruby-lang.org/en/news/2013/06/30/we-retire-1-8-7/
[2]: https://groups.google.com/d/msg/jruby-users/qmLpZ7qDwZo/J_iYViplcq4J
[3]: https://devcenter.heroku.com/articles/ruby-support#ruby-versions

-Sean

On Wed, Jun 25, 2014 at 6:41 PM, Doug Cutting cutt...@apache.org wrote:

 On Wed, Jun 25, 2014 at 2:16 AM, Willem van Bergen wil...@vanbergen.org
 wrote:
  It's possible to have 2 different gems, but this is not very common in
 the Ruby world.
  Because Ruby 1.8 is not maintained anymore, not even for security
 issues, most
  people have moved on to newer versions.

 I can see a couple of options:
   1. Assume that no one actually uses Ruby 1.8 anymore, and upgrade to
 1.9 in Avro 1.7.7.  A change that doesn't break anyone isn't
 incompatible.
   2. Assume some folks still use Ruby 1.8 and add a ruby1.9 fork in
 Avro 1.7.7.  Ruby users who upgrade to Avro 1.7.7 would need to opt-in
 to the Ruby 1.9 version.
   3. Wait until we release 1.8.0 to upgrade Avro to support Ruby 1.9.

 (3) seems like a bad option unless we're confident we're going to
 release a 1.8.0 soon, which I am not.  Folks hate getting broken by
 upgrades.  Avro is a dependency of a lot of Java applications.  Having
 an incompatible release makes it hard for one component to upgrade
 without forcing all to upgrade.  Either you end up with a broken stack
 or with one that can never upgrade.  Which of (1) or (2) seems more
 palatable to Ruby folks?  Are there other options?

 Doug




-- 
Sean