Re: What is the status of Date coding?

2015-07-07 Thread Lukas Steiblys
Probably the best practice is to simply encode the number of 
seconds/milliseconds since 1970-01-01 UTC, also known as Unix or Epoch time, 
and then convert to whatever timezone you want on the client side. This has 
been the only sane approach that worked for me over the years.
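
For illustration, a minimal sketch of this approach in Java (the record and field names here are made up; assumes Java 8's java.time on the consumer side):

import java.time.Instant;
import java.time.ZoneId;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class EpochTimeExample {
    // Hypothetical schema: a record with a single long field holding epoch millis.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
      + "{\"name\":\"ts\",\"type\":\"long\"}]}");

    public static void main(String[] args) {
        // Producer side: store the timestamp as epoch milliseconds (UTC).
        GenericRecord event = new GenericData.Record(SCHEMA);
        event.put("ts", System.currentTimeMillis());

        // Client side: convert to whatever timezone is needed for display.
        long ts = (Long) event.get("ts");
        System.out.println(Instant.ofEpochMilli(ts)
            .atZone(ZoneId.of("America/Los_Angeles")));
    }
}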

Lukas

From: Zijing Guo 
Sent: Tuesday, July 7, 2015 5:58 AM
To: user@avro.apache.org ; Zijing Guo 
Subject: Re: What is the status of Date coding?

I also posted this question on Stack Overflow: "What is the best practice for encoding a Date field in Apache Avro as of now?"

On Tuesday, July 7, 2015 8:51 AM, Zijing Guo alter...@yahoo.com wrote:




I have been searching around and saw the JIRA 
https://issues.apache.org/jira/browse/AVRO-739 on this matter, but I don't 
have a good sense of what Avro supports for date/time from the user 
documentation. What I'm trying to achieve is to encode the date with timezone 
information (ISO 8601) on the Kafka producer side in Python, so that all the 
downstream consumers, which are written in Java, can decode it properly. What is 
the best practice for this?

Thanks in advance
Edwin




Re: What is the status of Date coding?

2015-07-07 Thread Lukas Steiblys
Why not convert it to epoch time before serialization?

Lukas

From: Zijing Guo 
Sent: Tuesday, July 7, 2015 7:10 AM
To: Lukas Steiblys ; user@avro.apache.org 
Subject: Re: What is the status of Date coding?


  Thanks for your reply. The downside is that our data comes from different 
timezones, so in order to reassemble the epoch time into a date with its 
timezone, I have to carry the timezone information within each record.
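
For example, something along these lines (a sketch; the record and field names are made up):

import java.time.Instant;
import java.time.ZoneId;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class ZonedEventExample {
    // Hypothetical schema: epoch millis plus the zone the event originated in.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"ZonedEvent\",\"fields\":["
      + "{\"name\":\"ts\",\"type\":\"long\"},"
      + "{\"name\":\"tz\",\"type\":\"string\"}]}");

    public static void main(String[] args) {
        GenericRecord r = new GenericData.Record(SCHEMA);
        r.put("ts", System.currentTimeMillis());
        r.put("tz", "Europe/Paris"); // zone where the event originated

        // The consumer reassembles the original local time from both fields.
        System.out.println(Instant.ofEpochMilli((Long) r.get("ts"))
            .atZone(ZoneId.of(r.get("tz").toString())));
    }
}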




Re: various number of records in records

2015-04-13 Thread Lukas Steiblys
The type of foo should be an array, not a record.
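
For example, something like this (a sketch reusing the names from your schema):

import org.apache.avro.Schema;

public class FooListSchema {
    public static void main(String[] args) {
        // "foo" as an array of ints instead of a nested record.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"fooList\",\"fields\":["
          + "{\"name\":\"count\",\"type\":\"int\"},"
          + "{\"name\":\"foo\",\"type\":{\"type\":\"array\",\"items\":\"int\"}}]}");
        System.out.println(schema.toString(true));
    }
}

With that, each record can hold one count and any number of ids.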

Lukas

On Mon, Apr 13, 2015 at 1:35 AM, Marius m.die0...@googlemail.com wrote:

  Hi,

 I want to put some data into an Avro file, however in my case it doesn't
 work...

 {
   "name" : "fooList",
   "type" : "record",
   "fields" : [
     {"name" : "count", "type" : "int"},
     {"name" : "foo", "type" : {
       "type" : "record",
       "name" : "fooData",
       "fields" : [
         {"name" : "ids", "type" : "int"}
       ]
     }}
   ]
 }


 I want to put exactly one count (the number of ids) in it and a variable
 number of ids... However, I can only append one count and one id...
 I'm new to Avro and JSON, so I'm not sure if my idea is right or if I have
 to rewrite my code...
 BTW, I tried to Google it but I couldn't find anything ;)

 Thanks and Greetings

 Marius



Re: Issue with reading old data with a new Avro Schema

2015-04-08 Thread Lukas Steiblys
The schema is not valid JSON. Maybe you forgot the “[“ after “fields:”?
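
Also, since this is raw binary data rather than an object container file, the reader has to be given the writer schema as well; resolving it against the new reader schema is what fills in the default. A minimal sketch (schemas simplified from your post):

import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class EvolveRead {
    public static void main(String[] args) throws Exception {
        Schema oldSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Toto\",\"namespace\":\"com.hello.world\","
          + "\"fields\":[{\"name\":\"b\",\"type\":\"string\"}]}");
        Schema newSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Toto\",\"namespace\":\"com.hello.world\","
          + "\"fields\":[{\"name\":\"b\",\"type\":\"string\"},"
          + "{\"name\":\"c\",\"type\":\"string\",\"default\":\"na\"}]}");

        // Write a record with the old schema.
        GenericRecord rec = new GenericData.Record(oldSchema);
        rec.put("b", "hello");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(oldSchema).write(rec, enc);
        enc.flush();

        // Read it back, resolving the writer schema against the reader schema.
        DatumReader<GenericRecord> reader =
            new GenericDatumReader<>(oldSchema, newSchema);
        Decoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        System.out.println(reader.read(null, dec)); // "c" comes back as "na"
    }
}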

Lukas
From: Nicolas Phung 
Sent: Wednesday, April 8, 2015 9:45 AM
To: user@avro.apache.org 
Subject: Issue with reading old data with a new Avro Schema

Hello, 

I'm trying to read old Avro binary data with a new schema (I added a new field).


This is the Avro Schema (OLD) I was using to write Avro binary data before:
{
  "namespace": "com.hello.world",
  "type": "record",
  "name": "Toto",
  "fields":
    {
      "name": "a",
      "type": [
        "string",
        "null"
      ]
    },
    {
      "name": "b",
      "type": "string"
    }
  ]
}

This is the Avro Schema (NEW) I'm using to read the Avro binary data:

{
  "namespace": "com.hello.world",
  "type": "record",
  "name": "Toto",
  "fields":
    {
      "name": "a",
      "type": [
        "string",
        "null"
      ]
    },
    {
      "name": "b",
      "type": "string"
    },
    {
      "name": "c",
      "type": "string",
      "default": "na"
    }
  ]
}

However, I can't read the old data with the new schema. I get the following 
errors:

15/04/08 17:32:22 ERROR executor.Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.io.EOFException
at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:259)
at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:272)
at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:113)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:353)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:157)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
at com.miguno.kafka.avro.AvroDecoder.fromBytes(AvroDecoder.scala:31)

From my understanding, I should be able to read the old data with the new 
schema, since the new field has a default value. But it doesn't seem to work. 
Am I doing something wrong?

I have posted a report https://issues.apache.org/jira/browse/AVRO-1661

Regards,
Nicolas PHUNG

Re: Adding new field with default value to an Avro schema

2015-02-03 Thread Lukas Steiblys
On a related note, is there a tool that can check the backwards compatibility 
of schemas? I found some old messages talking about it, but no actual tool. I 
guess I could hack it together using some functions in the Avro library.
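
For instance, something like this seems to do it (a sketch; assumes the org.apache.avro.SchemaCompatibility class that ships with recent Avro releases):

import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;

public class CompatCheck {
    public static void main(String[] args) {
        Schema writer = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"T\",\"fields\":["
          + "{\"name\":\"a\",\"type\":\"string\"}]}");
        Schema reader = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"T\",\"fields\":["
          + "{\"name\":\"a\",\"type\":\"string\"},"
          + "{\"name\":\"b\",\"type\":\"string\",\"default\":\"na\"}]}");

        // Can data written with `writer` be read with `reader`?
        System.out.println(SchemaCompatibility
            .checkReaderWriterCompatibility(reader, writer)
            .getType()); // COMPATIBLE
    }
}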

Lukas

From: Burak Emre 
Sent: Tuesday, February 3, 2015 9:01 AM
To: user@avro.apache.org 
Subject: Re: Adding new field with default value to an Avro schema

@Sean thanks for the explanation. 

I have multiple writers but only one reader, and the only schema migration 
operation is adding a new field, so I thought I could use the same schema for 
all datasets, since the field ordering will be the same in all of them, even 
though some may contain extra fields that are also defined in the schema 
definition.

Actually, I wanted to avoid using an external database for sequential schema 
IDs, since it would make the system more complex than it should be in my case, 
but it seems this is the only option for now.

-- 
Burak Emre
Koc University

On Tuesday 3 February 2015 at 18:22, Sean Busbey wrote:

  Schema evolution in Avro requires access to both the schema used when writing 
the data and the desired Schema for reading the data. 

  Normally, Avro data is stored in some container format (i.e. the one in the 
spec[1]) and the parsing library takes care of pulling the schema used when 
writing out of said container.

  If you are using Avro data in some other location, you must have the writer 
schema as well. One common use case is a shared messaging system focused on 
small messages (but that doesn't use Avro RPC). In such cases, Doug Cutting has 
some guidance he's previously given (quoted with permission, albeit very late):

   A best practice for things like this is to prefix each Avro record
   with a (small) numeric schema ID.  This is used as the key for a
   shared database of schemas.  The schema corresponding to a key never
   changes, so the database can be cached heavily.  It never gets very
   big either.  It could be as simple as a .java file, with the
   constraint that you'd need to upgrade things downstream before
   upstream, or as complicated as an enterprise-wide REST schema service
   (AVRO-1124).  A variation is to use schema fingerprints as keys.
   
   Potentially relevant stuff:
   
   https://issues.apache.org/jira/browse/AVRO-1124
   http://avro.apache.org/docs/current/spec.html#Schema+Fingerprints


  If you take the integer schema ID approach, you can use Avro's built-in 
utilities for zig-zag encoding, which will ensure that most of the time your 
identifier only takes a small amount of space.
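
  For example, a sketch of the prefixing side (the framing here is illustrative, not a standard wire format):

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class SchemaIdPrefix {
    // Prefix a serialized record with a small schema ID. Avro's binary
    // encoding writes ints as zig-zag varints, so small IDs cost one byte.
    public static byte[] prefix(int schemaId, byte[] payload) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        enc.writeInt(schemaId);
        enc.flush();
        out.write(payload);
        return out.toByteArray();
    }
}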

  [1]: http://avro.apache.org/docs/current/spec.html#Object+Container+Files


  On Tue, Feb 3, 2015 at 5:57 AM, Burak Emre emrekaba...@gmail.com wrote:

  I added a field with a default value to an Avro schema that was previously 
used for writing data. Is it possible to read the previous data using only the 
new schema, which has the new field at the end?
  I tried this scenario, but unfortunately it throws an EOFException while 
reading the third field. Even though the field has a default value and the 
previous fields are read successfully, I'm not able to deserialize the record 
back without providing the writer schema I used previously.

Schema schema = Schema.createRecord("test", null, "avro.test", false);
schema.setFields(Lists.newArrayList(
    new Field("project", Schema.create(Type.STRING), null, null),
    new Field("city",
        Schema.createUnion(Lists.newArrayList(Schema.create(Type.NULL),
            Schema.create(Type.STRING))), null, NullNode.getInstance())
));

GenericData.Record record = new GenericRecordBuilder(schema)
    .set("project", "ff").build();

GenericDatumWriter<GenericRecord> w = new GenericDatumWriter<>(schema);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);

w.write(record, encoder);
encoder.flush();

schema = Schema.createRecord("test", null, "avro.test", false);
schema.setFields(Lists.newArrayList(
    new Field("project", Schema.create(Type.STRING), null, null),
    new Field("city",
        Schema.createUnion(Lists.newArrayList(Schema.create(Type.NULL),
            Schema.create(Type.STRING))), null, NullNode.getInstance()),
    new Field("newField",
        Schema.createUnion(Lists.newArrayList(Schema.create(Type.NULL),
            Schema.create(Type.STRING))), null, NullNode.getInstance())
));

DatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
Decoder decoder =
    DecoderFactory.get().binaryDecoder(outputStream.toByteArray(), null);
GenericRecord result = reader.read(null, decoder);




  -- 

  Sean


Generated enum dollar sign in front of a symbol.

2014-10-08 Thread Lukas Steiblys
Has anyone run into the problem where the generated Java class for an enum has 
a dollar sign on one of the enum values?

The schema {"type": "enum", "name": "ButtonTypeID", "symbols": ["default", 
"keyboard"]} generates the following class:

public final class ButtonTypeID extends java.lang.Enum<ButtonTypeID> {
  public static final ButtonTypeID default$;
  public static final ButtonTypeID keyboard;
  public static final org.apache.avro.Schema SCHEMA$;
  public static ButtonTypeID[] values();
  public static ButtonTypeID valueOf(java.lang.String);
  public static org.apache.avro.Schema getClassSchema();
  static {};
}

(this is what “javap ButtonTypeID.class” produces)

When I try to read my data that has the “default” value for ButtonTypeID, I get 
the exception:
java.lang.IllegalArgumentException: No enum constant ButtonTypeID.default
at java.lang.Enum.valueOf(Enum.java:236)
at org.apache.avro.specific.SpecificData.createEnum(SpecificData.java:106)
at org.apache.avro.generic.GenericDatumReader.createEnum(GenericDatumReader.java:205)
...

Strangely, everything was working fine a day before. Where is this dollar sign 
coming from?

Lukas

Re: Generated enum dollar sign in front of a symbol.

2014-10-08 Thread Lukas Steiblys
I realized now that “default” is a keyword in Java and can’t be used as an enum 
value. The files were generated in Python using the Python Avro library, where 
“default” is not a keyword and can be used freely. I assume there should be a 
conversion somewhere in the Java Avro library, where a dollar sign is 
automatically added for enum values that are Java keywords. Is that actually 
the case? Why did it fail this time then? Should I file a bug?
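
In the meantime, reading such data through the generic API sidesteps the generated class entirely, so the keyword clash never comes up; a sketch:

import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class EnumKeywordWorkaround {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"enum\",\"name\":\"ButtonTypeID\","
          + "\"symbols\":[\"default\",\"keyboard\"]}");

        // Write the "default" symbol.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<Object>(schema)
            .write(new GenericData.EnumSymbol(schema, "default"), enc);
        enc.flush();

        // Read it back generically; no generated Java enum is involved.
        Decoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        Object symbol = new GenericDatumReader<Object>(schema).read(null, dec);
        System.out.println(symbol); // prints: default
    }
}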

Lukas
