[jira] [Commented] (AVRO-2076) Combine already serialized Avro records to an Avro file

2017-09-14 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166858#comment-16166858
 ] 

Doug Cutting commented on AVRO-2076:


Doesn't DataFileWriter#appendEncoded() already provide this?

https://avro.apache.org/docs/1.8.2/api/java/org/apache/avro/file/DataFileWriter.html#appendEncoded(java.nio.ByteBuffer)


> Combine already serialized Avro records to an Avro file
> ---
>
> Key: AVRO-2076
> URL: https://issues.apache.org/jira/browse/AVRO-2076
> Project: Avro
>  Issue Type: Wish
>Reporter: Erik van Oosten
>
> In some use cases Avro events arrive already serialized (e.g. when listening 
> to a Kafka topic). It would be great if there were an API that allows 
> writing an Avro file without the need to deserialize and reserialize these 
> Avro records.
> Providing such an API would allow very efficient creation of Avro files: given 
> that these Avro records are written with the same schema, an Avro file would 
> contain exactly the same bytes anyway (before block compression).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AVRO-1810) GenericDatumWriter broken with Enum

2017-09-14 Thread Bridger Howell (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166847#comment-16166847
 ] 

Bridger Howell commented on AVRO-1810:
--

No particular use case, I was just looking at the way that 
{{GenericData.EnumSymbol}} supports being compared with any other 
{{GenericEnumSymbol}} and wondering if there was a good reason for that.

Considering that {{equals}} is final for enums, it seems harder to keep that 
property for generated enums (although {{compareTo}} should be fine, because 
the signature of {{java.lang.Enum}}'s {{compareTo}} only accepts the enum 
type, not {{GenericEnumSymbol}}?).

At the least, it would be nice to avoid weird cases where the following test fails:
{noformat}
/* assuming SpecificEnum is generated from enumSchema */
final SpecificEnum specificSymbol = SpecificEnum.FOO;
final GenericData.EnumSymbol genericSymbol =
    new GenericData.EnumSymbol(enumSchema, "FOO");

assertTrue(genericSymbol.equals(specificSymbol) ==
    specificSymbol.equals(genericSymbol));
{noformat}
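
A self-contained sketch of how such an asymmetry could arise. The names Symbol, GenericSymbol, and SpecificEnum are hypothetical stand-ins, not Avro classes, and the lenient equals is an assumption modelled on the behaviour described above, where a generic symbol accepts any other symbol with the same text. Because java.lang.Enum's equals is final and identity-based, the enum side cannot return the favour.

```java
final class EnumSymmetryDemo {
    /** Hypothetical stand-in for a shared symbol interface such as GenericEnumSymbol. */
    interface Symbol { }

    /** Hypothetical stand-in for an enum generated from a schema. */
    enum SpecificEnum implements Symbol { FOO, BAR }

    /** Hypothetical generic symbol: equal to any Symbol whose text matches. */
    static final class GenericSymbol implements Symbol {
        private final String symbol;

        GenericSymbol(String symbol) { this.symbol = symbol; }

        @Override public boolean equals(Object o) {
            if (o == this) return true;
            return o instanceof Symbol && symbol.equals(o.toString());
        }

        @Override public int hashCode() { return symbol.hashCode(); }

        @Override public String toString() { return symbol; }
    }

    /** True when equals() gives the same answer in both directions. */
    static boolean isSymmetric(Object a, Object b) {
        return a.equals(b) == b.equals(a);
    }

    public static void main(String[] args) {
        SpecificEnum specificSymbol = SpecificEnum.FOO;
        GenericSymbol genericSymbol = new GenericSymbol("FOO");
        // The generic side matches on text; the enum side only matches itself.
        System.out.println("generic.equals(specific) = " + genericSymbol.equals(specificSymbol));
        System.out.println("specific.equals(generic) = " + specificSymbol.equals(genericSymbol));
        System.out.println("symmetric = " + isSymmetric(genericSymbol, specificSymbol));
    }
}
```

In this sketch the only ways to restore symmetry are to tighten the generic side's equals or to compare through something other than equals.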

> GenericDatumWriter broken with Enum
> ---
>
> Key: AVRO-1810
> URL: https://issues.apache.org/jira/browse/AVRO-1810
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Ryon Day
>Priority: Blocker
> Fix For: 1.9.0, 1.8.4
>
>
> {panel:title=Description|titleBGColor=#3FA|bgColor=#DDD}
> Using the GenericDatumWriter with either Generic OR SpecificRecord will break 
> if an Enum is present.
> {panel}
> {panel:title=Steps To Reproduce|titleBGColor=#8DB|bgColor=#DDD}
> I have been tracking Avro decoding oddities for a while.
> The tests for this issue can be found 
> [here|https://github.com/ryonday/avroDecodingHelp/blob/master/src/test/java/com/ryonday/test/Avro180EnumFail.java]
> {panel}
> {panel:title=Notes|titleBGColor=#3AF|bgColor=#DDD}
> Due to the debacle that is the Avro "UTF8" object, we have been avoiding it 
> by using the following scheme:
> * Write incoming records to a byte array using the GenericDatumWriter
> * Read back the byte array to our compiled Java domain objects using a 
> SpecificDatumWriter
> This worked great with Avro 1.7.7, and this is a binary-incompatible breaking 
> change with 1.8.0.
> This would appear to be caused by an addition in the 
> {{GenericDatumWriter:163-164}}:
> {code}
>   if (!data.isEnum(datum))
>     throw new AvroTypeException("Not an enum: "+datum);
> {code}
> {panel}





[jira] [Commented] (AVRO-1810) GenericDatumWriter broken with Enum

2017-09-14 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166749#comment-16166749
 ] 

Zoltan Farkas commented on AVRO-1810:
-

[~howellbridger] 
java.lang.Enum's equals, hashCode, and compareTo are final and cannot be 
overridden.

So if one needs to compare generated enums with generic enums, a custom 
comparator would be the way to go.

What is the use case you are thinking of?
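
A minimal sketch of what such a custom comparator might look like, assuming that both kinds of symbol expose their text through toString(). Suit and GenericSymbol are hypothetical stand-ins, not Avro classes. The ordering is lexicographic on the symbol text, which is an assumption of this sketch and differs from the declaration order used by Enum's own compareTo.

```java
import java.util.Comparator;

final class SymbolComparatorDemo {
    /** Hypothetical stand-in for an enum generated from a schema. */
    enum Suit { SPADES, HEARTS }

    /** Hypothetical stand-in for a generic enum symbol that carries only its text. */
    static final class GenericSymbol {
        private final String symbol;

        GenericSymbol(String symbol) { this.symbol = symbol; }

        @Override public String toString() { return symbol; }
    }

    /** Compares any two symbol-like objects by their symbol text. */
    static final Comparator<Object> BY_SYMBOL = Comparator.comparing(Object::toString);

    /** True when both objects carry the same symbol text. */
    static boolean sameSymbol(Object a, Object b) {
        return BY_SYMBOL.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        GenericSymbol generic = new GenericSymbol("SPADES");
        // The comparator gives the same answer whichever side comes first.
        System.out.println(sameSymbol(Suit.SPADES, generic));
        System.out.println(sameSymbol(generic, Suit.SPADES));
        System.out.println(sameSymbol(generic, Suit.HEARTS));
    }
}
```

Keeping the comparison outside the classes sidesteps the final methods on java.lang.Enum, at the cost of callers having to remember to use it instead of equals.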








[jira] [Comment Edited] (AVRO-1810) GenericDatumWriter broken with Enum

2017-09-14 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166377#comment-16166377
 ] 

Zoltan Farkas edited comment on AVRO-1810 at 9/14/17 5:58 PM:
--

The way I resolved this in my fork was to make the Generated enums implement 
org.apache.avro.generic.GenericEnumSymbol:

https://github.com/zolyfarkas/avro/blob/trunk/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/enum.vm#L29

Also changed GenericEnumSymbol from:

{code}
/** An enum symbol. */
public interface GenericEnumSymbol
    extends GenericContainer, Comparable<GenericEnumSymbol> {
  /** Return the symbol. */
  String toString();
}
{code}

to:

{code}
/** An enum symbol. */
public interface GenericEnumSymbol<E extends GenericEnumSymbol>
    extends GenericContainer, Comparable<E> {
  /** Return the symbol. */
  String toString();
}
{code}

I can prepare a PR if this approach is OK with everyone.









[jira] [Commented] (AVRO-2003) Report specific location of schema incompatibilities

2017-09-14 Thread Elliot West (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166634#comment-16166634
 ] 

Elliot West commented on AVRO-2003:
---

[~nkollar], I believe this is ready for review. Apologies for the stream of 
small commits; I had to back out a bunch of formatting changes that were 
perhaps introduced from an early AVRO-1933 patch (I think...). Anyway, I 
believe the PR now includes only changes related to this feature. Thank you. 

> Report specific location of schema incompatibilities
> 
>
> Key: AVRO-2003
> URL: https://issues.apache.org/jira/browse/AVRO-2003
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.8.1
> Environment: Any java env
>Reporter: Elliot West
>Assignee: Elliot West
>Priority: Minor
> Fix For: 1.9.0
>
> Attachments: AVRO-2003.patch
>
>
> h2. Overview
> Building on the work to improve schema incompatibility reporting in 
> AVRO-1933, it would be useful if the {{SchemaCompatibility}} classes could 
> also report the location in the schema where any incompatibility was 
> encountered.
> It is recommended that the reported location be easily readable by both 
> humans and machines. In the first case this would help schema developers 
> pinpoint issues in their schema documents, and in the latter case it 
> provides a useful mechanism for schema tooling, such as IDEs and editors, to 
> easily select the pertinent nodes in the Schema document tree.
> h2. Implementation specifics
> To meet these requirements it is suggested that the location be encoded using 
> the [JSON Pointer specification|https://tools.ietf.org/html/rfc6901]. This is 
> easily parsed by users and is also supported by a number of libraries 
> for a range of common programming languages and platforms.
> h2. Examples
> Given the following example schema, consider some incompatibility scenarios. 
> For each case an expected JSON Pointer description of the incompatibility 
> location is described:
> {code}
> {
>   "type": "record",
>   "name": "myRecord",
>   "fields": [
>     {"name": "pField", "type": "long"},
>     {"name": "uField", "type":
>       ["null", "int", "string"]
>     },
>     {"name": "eField", "type":
>       {"type": "enum", "name": "Suit", "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]}
>     },
>     {"name": "aField", "type":
>       {"type": "array", "items": "string"}
>     },
>     {"name": "mField", "type":
>       {"type": "map", "values": "long"}
>     },
>     {"name": "fField", "type":
>       {"type": "fixed", "size": 16, "name": "md5"}
>     }
>   ]
> }
> {code}
> Possible incompatibility scenarios and the location that would be reported 
> back to the user/tool: 
> * Root type incompatibility; report location: {{/}}
> * Record name mismatch; report location: {{/name}}
> * {{pField}} type incompatibility; report location: {{/fields/0/type}}
> * {{uField}} field type incompatibility; report location: {{/fields/1/type}}
> * {{uField}} missing union branch {{string}}; report location: 
> {{/fields/1/type/2}}
> * {{eField}} field type incompatibility; report location: {{/fields/2/type}}
> * {{eField}} missing enum symbol; report location: {{/fields/2/type/symbols}}
> * {{eField}} enum name mismatch; report location: {{/fields/2/type/name}}
> * {{aField}} field type incompatibility; report location: {{/fields/3/type}}
> * {{aField}} array element type incompatibility; report location: 
> {{/fields/3/type/items}}
> * {{mField}} field type incompatibility; report location: {{/fields/4/type}}
> * {{mField}} map value type incompatibility; report location: 
> {{/fields/4/type/values}}
> * {{fField}} field type incompatibility; report location: {{/fields/5/type}}
> * {{fField}} fixed name mismatch; report location: {{/fields/5/type/name}}
> * {{fField}} fixed size type incompatibility; report location: 
> {{/fields/5/type/size}}
> * {{fField}} missing default value; report location: {{/fields/5}}
> h2. Notes
> * This ticket depends on AVRO-1933 and associated patches.
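
To illustrate how a tool might consume such locations, here is a small sketch of an RFC 6901 resolver over a schema held as nested maps and lists. The names resolve, obj, and sampleSchema are invented for this example and are not part of Avro or of the patch; only the first two fields of the example schema are modelled. Note that under RFC 6901 the empty string addresses the whole document, while {{/}} addresses a member whose name is empty, so this sketch treats the two differently.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class JsonPointerDemo {
    /** Resolves an RFC 6901 pointer against a tree of Map (JSON objects) and List (JSON arrays). */
    static Object resolve(Object root, String pointer) {
        if (pointer.isEmpty()) {
            return root; // the empty pointer addresses the whole document
        }
        if (pointer.charAt(0) != '/') {
            throw new IllegalArgumentException("pointer must be empty or start with '/': " + pointer);
        }
        Object node = root;
        for (String raw : pointer.substring(1).split("/", -1)) {
            // Unescape "~1" to "/" first, then "~0" to "~", as RFC 6901 requires.
            String token = raw.replace("~1", "/").replace("~0", "~");
            if (node instanceof Map) {
                Map<?, ?> object = (Map<?, ?>) node;
                if (!object.containsKey(token)) {
                    throw new IllegalArgumentException("no member named '" + token + "'");
                }
                node = object.get(token);
            } else if (node instanceof List) {
                node = ((List<?>) node).get(Integer.parseInt(token));
            } else {
                throw new IllegalArgumentException("cannot descend into scalar at '" + token + "'");
            }
        }
        return node;
    }

    /** Builds a JSON-object-like map from alternating keys and values. */
    static Map<String, Object> obj(Object... keysAndValues) {
        Map<String, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            map.put((String) keysAndValues[i], keysAndValues[i + 1]);
        }
        return map;
    }

    /** The first two fields of the example schema, as nested maps and lists. */
    static Map<String, Object> sampleSchema() {
        return obj(
            "type", "record",
            "name", "myRecord",
            "fields", Arrays.asList(
                obj("name", "pField", "type", "long"),
                obj("name", "uField", "type", Arrays.asList("null", "int", "string"))));
    }

    public static void main(String[] args) {
        Object schema = sampleSchema();
        System.out.println(resolve(schema, "/name"));
        System.out.println(resolve(schema, "/fields/0/type"));
        System.out.println(resolve(schema, "/fields/1/type/2"));
    }
}
```

An editor plugin could use the resolved node to highlight the offending part of the schema document.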





[jira] [Commented] (AVRO-1810) GenericDatumWriter broken with Enum

2017-09-14 Thread Bridger Howell (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166383#comment-16166383
 ] 

Bridger Howell commented on AVRO-1810:
--

[~zolyfarkas] Would it be bad for {{GenericData.EnumSymbol}} instances to 
interoperate with generated classes in terms of {{equals}} and {{compareTo}}?






[jira] [Commented] (AVRO-1810) GenericDatumWriter broken with Enum

2017-09-14 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166377#comment-16166377
 ] 

Zoltan Farkas commented on AVRO-1810:
-

The way I resolved this in my fork was to make the generated enums implement 
org.apache.avro.generic.GenericEnumSymbol:

https://github.com/zolyfarkas/avro/blob/trunk/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/enum.vm#L29

Also changed GenericEnumSymbol from:

{code}
/** An enum symbol. */
public interface GenericEnumSymbol
    extends GenericContainer, Comparable<GenericEnumSymbol> {
  /** Return the symbol. */
  String toString();
}
{code}

to:

{code}
/** An enum symbol. */
public interface GenericEnumSymbol<E extends GenericEnumSymbol>
    extends GenericContainer, Comparable<E> {
  /** Return the symbol. */
  String toString();
}
{code}

I can prepare a PR if this approach is OK with everyone.







[jira] [Updated] (AVRO-1810) GenericDatumWriter broken with Enum

2017-09-14 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated AVRO-1810:
--
Fix Version/s: 1.9.0
   1.8.4






[jira] [Commented] (AVRO-1810) GenericDatumWriter broken with Enum

2017-09-14 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166299#comment-16166299
 ] 

Sean Busbey commented on AVRO-1810:
---

Personally I'd love to see this get fixed. I'm happy to review, but almost 
certainly don't have the time to post a patch. If anyone wants to try to 
work through things, please either ping here or ping me off-list and I'll help 
get you up and running.






[jira] [Created] (AVRO-2076) Combine already serialized Avro records to an Avro file

2017-09-14 Thread Erik van Oosten (JIRA)
Erik van Oosten created AVRO-2076:
-

 Summary: Combine already serialized Avro records to an Avro file
 Key: AVRO-2076
 URL: https://issues.apache.org/jira/browse/AVRO-2076
 Project: Avro
  Issue Type: Wish
Reporter: Erik van Oosten


In some use cases Avro events arrive already serialized (e.g. when listening to 
a Kafka topic). It would be great if there were an API that allows writing an 
Avro file without the need to deserialize and reserialize these Avro records.

Providing such an API would allow very efficient creation of Avro files: given 
that these Avro records are written with the same schema, an Avro file would 
contain exactly the same bytes anyway (before block compression).
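
To make the "same bytes" point concrete, here is a rough sketch of how one uncompressed data block of an Avro object container file is laid out: an object count, the byte size of the serialized objects, the objects themselves, and the file's 16-byte sync marker, with both numbers written as zig-zag variable-length longs. The class and method names are invented for illustration. The sketch ignores the file header and codecs, so it is not a substitute for DataFileWriter; it only shows that the record bytes are copied through unchanged.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

final class AvroBlockSketch {
    /** Writes n using Avro's zig-zag, variable-length encoding for long values. */
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63); // zig-zag: small magnitudes become small unsigned values
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            z >>>= 7;
        }
        out.write((int) z);
    }

    /** Frames already serialized records as one uncompressed data block. */
    static byte[] frameBlock(List<byte[]> records, byte[] syncMarker) {
        if (syncMarker.length != 16) {
            throw new IllegalArgumentException("sync marker must be 16 bytes");
        }
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (byte[] record : records) {
            body.write(record, 0, record.length); // bytes are copied as-is, never decoded
        }
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        writeLong(block, records.size()); // number of objects in the block
        writeLong(block, body.size());    // size in bytes of the serialized objects
        block.write(body.toByteArray(), 0, body.size());
        block.write(syncMarker, 0, syncMarker.length);
        return block.toByteArray();
    }

    public static void main(String[] args) {
        // Two records of a schema that is a single long: the values 1 and 2.
        List<byte[]> records = Arrays.asList(new byte[] {0x02}, new byte[] {0x04});
        byte[] sync = new byte[16]; // arbitrary marker for the demo; real files use random bytes
        byte[] block = frameBlock(records, sync);
        StringBuilder hex = new StringBuilder();
        for (byte b : block) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim());
    }
}
```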


