[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2020-01-31 Thread Raymie Stata (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027873#comment-17027873
 ] 

Raymie Stata commented on AVRO-2247:


I'm in favor.

> Improve Java reading performance with a new reader
> --
>
> Key: AVRO-2247
> URL: https://issues.apache.org/jira/browse/AVRO-2247
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Reporter: Martin Jubelgas
> Priority: Major
> Fix For: 1.10.0
>
> Attachments: Perf-Comparison.md
>
>
> Complementary to AVRO-2090, I have been working on decoding of Avro objects 
> in Java and am suggesting a new implementation of a DatumReader that improves 
> read performance for both generic and specific records by approximately 20% 
> (and even more in cases of nested objects with defaults, a case I encounter a 
> lot in practical use).
> The key concept is to create a detailed execution plan once, when the 
> DatumReader is set up. This execution plan contains all required 
> defaulting/lookup values so they need not be looked up during object 
> traversal while reading.
> The reader implementation can be enabled and disabled per GenericData 
> instance. The system default is set via the system variable 
> "org.apache.avro.fastread" (defaults to "false").
> Attached a performance comparison of the existing implementation with the 
> proposed one. Will open a pull request with respective code in a bit (not 
> including interoperability with the optimizations of AVRO-2090 yet). Please 
> let me know your opinion of whether this is worth pursuing further.
>  
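
The "execution plan" idea in the description above can be sketched in plain 
Java. This is only an illustration under stated assumptions, not the code 
from the pull request: `ReadPlanSketch`, `Step`, `buildPlan` and `read` are 
hypothetical names, records are modelled as flat lists of field names, and 
the encoded input is modelled as an iterator of already-decoded values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from Avro.
final class ReadPlanSketch {
  /** One precomputed step: consume a written value and/or fill a reader slot. */
  interface Step {
    void apply(Iterator<Object> in, Object[] record);
  }

  /** Built once per (writer, reader) pair; lookups and defaults are resolved here. */
  static List<Step> buildPlan(List<String> writerFields, List<String> readerFields,
      Map<String, Object> defaults) {
    List<Step> plan = new ArrayList<>();
    for (String field : writerFields) {
      final int pos = readerFields.indexOf(field);
      if (pos >= 0) {
        plan.add((in, record) -> record[pos] = in.next()); // store in reader slot
      } else {
        plan.add((in, record) -> in.next()); // writer-only field: skip the value
      }
    }
    for (int i = 0; i < readerFields.size(); i++) {
      String field = readerFields.get(i);
      if (writerFields.contains(field)) {
        continue;
      }
      if (!defaults.containsKey(field)) {
        throw new IllegalArgumentException("no default for reader field " + field);
      }
      final int pos = i;
      final Object value = defaults.get(field); // looked up once, at plan time
      plan.add((in, record) -> record[pos] = value);
    }
    return plan;
  }

  /** Per-record work: run the steps; no name lookups or default resolution. */
  static Object[] read(List<Step> plan, int readerWidth, Iterator<Object> in) {
    Object[] record = new Object[readerWidth];
    for (Step step : plan) {
      step.apply(in, record);
    }
    return record;
  }

  public static void main(String[] args) {
    List<String> writer = Arrays.asList("id", "name", "legacy");
    List<String> reader = Arrays.asList("name", "id", "country");
    Map<String, Object> defaults = new HashMap<>();
    defaults.put("country", "unknown");

    List<Step> plan = buildPlan(writer, reader, defaults);
    Object[] record = read(plan, reader.size(),
        Arrays.<Object>asList(1, "ann", "x").iterator());
    System.out.println(Arrays.toString(record)); // prints [ann, 1, unknown]
  }
}
```

The point of the design is that `buildPlan` pays for the name matching and 
default lookup once, so `read` does a fixed sequence of steps per record.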



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AVRO-2400) Avro 1.9.0 can't resolve schemas that can be resolved in 1.8.2

2019-05-29 Thread Raymie Stata (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850557#comment-16850557
 ] 

Raymie Stata commented on AVRO-2400:


I don't have a horse in this race, other than getting the definition of schema 
resolution and the schema-compatibility check to agree.  Whether 1.9.x goes 
with the stricter (SchemaCompatibility) definition or the looser (schema 
resolution) approach doesn't much matter to me.  Happy to let the larger 
community decide.

That said, it doesn't seem like the larger community is likely to weigh in on 
this topic.  As a result, I think we're obligated to go with what we believe 
most people in the community are assuming.  From this perspective, it seems 
like many more people are dependent on what the Java schema-resolution logic in 
1.8.x is doing, which would make it the de facto standard (any evidence to the 
contrary?).  Thus, I'd lean towards bringing the 1.9.x behavior of 
schema-resolution and SchemaCompatibility in line with the 1.8.x behavior of 
schema-resolution.  But definitely open to counter arguments!

> Avro 1.9.0 can't resolve schemas that can be resolved in 1.8.2
> --
>
> Key: AVRO-2400
> URL: https://issues.apache.org/jira/browse/AVRO-2400
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Reporter: Jacob Tolar
> Priority: Blocker
> Fix For: 1.10.0, 1.9.1
>
>
> The failure occurs in ResolvingGrammarGenerator when reader and writer schema 
> have an array of records with different full names (e.g. different 
> namespace). 
> {code:java}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.avro.Resolver$ReaderUnion cannot be cast to 
> org.apache.avro.Resolver$Container{code}
> Avro 1.8.2 allowed this behavior but it now fails in 1.9.0. Looking at the 
> jiras and code, I don't believe this was intentional ( AVRO-2275,  
> [https://github.com/apache/avro/commit/39d959e1c6a1f339f03dab18289e47f27c10be7f]
>   ).
>  
> It looks like there were some attempts to keep compatibility ( 
> [https://github.com/apache/avro/blob/branch-1.9/lang/java/avro/src/main/java/org/apache/avro/Resolver.java]
>  , e.g. see the commented out check for w.getFullName() in resolve()) but 
> this case was missed.
>  
> See this simple example to reproduce. 
> [https://gist.github.com/jacobtolar/c88d43ab4e8767227891e5cdc188ffad]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AVRO-2400) Avro 1.9.0 can't resolve schemas that can be resolved in 1.8.2

2019-05-27 Thread Raymie Stata (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849231#comment-16849231
 ] 

Raymie Stata commented on AVRO-2400:


[~jtolar] made the following observation via email:

bq. Note that the resolver and SchemaCompatibility disagree on this. (The test 
provided would be marked 'incompatible' by SchemaCompatibility...and would have 
been in 1.8.x as well, even though the actual parser/resolver allowed it). If 
the spec is indeed updated per @rstata's suggestion then 
SchemaCompatibility.java should probably be updated to match as well.

Yes, good catch, lines 97 and 98 of SchemaCompatibility also need to be updated 
to use getName vs getFullName.  Note, however, that line 102 needs to use the 
writer's _full_ name, because the aliasing part of the spec is clear as to what to 
do with qualified vs unqualified names.
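
A minimal sketch of the check described above, assuming the rule is 
"unqualified names match, or one of the reader's aliases equals the writer's 
full name". `NameMatchSketch`, `Named` and `namesMatch` are hypothetical 
stand-ins, not the actual SchemaCompatibility code, and aliases are assumed 
to be stored fully qualified.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; Named is a stand-in for a named Avro schema.
final class NameMatchSketch {
  static final class Named {
    final String namespace;
    final String name;
    final Set<String> aliases; // assumed to hold fully qualified names

    Named(String namespace, String name, String... aliases) {
      this.namespace = namespace;
      this.name = name;
      this.aliases = new HashSet<>(Arrays.asList(aliases));
    }

    String fullName() {
      return namespace == null || namespace.isEmpty() ? name : namespace + "." + name;
    }
  }

  /** Unqualified names are compared first; aliases use the writer's full name. */
  static boolean namesMatch(Named reader, Named writer) {
    if (reader.name.equals(writer.name)) {
      return true;
    }
    return reader.aliases.contains(writer.fullName());
  }

  public static void main(String[] args) {
    Named writer = new Named("com.b", "User");
    System.out.println(namesMatch(new Named("com.a", "User"), writer));
    System.out.println(namesMatch(new Named("com.a", "Person", "com.b.User"), writer));
    System.out.println(namesMatch(new Named("com.a", "Person", "com.c.User"), writer));
  }
}
```

The first two calls match (same unqualified name, then a qualified alias); 
the third does not, because the alias names a different namespace.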






[jira] [Commented] (AVRO-2400) Avro 1.9.0 can't resolve schemas that can be resolved in 1.8.2

2019-05-23 Thread Raymie Stata (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846683#comment-16846683
 ] 

Raymie Stata commented on AVRO-2400:


Sorry for the delay getting back to you.  Thanks for reporting this issue!

The underlying issue here is an ambiguity in the specification.  The spec 
reads: "both schemas are enums whose names match, [or] both schemas are fixed 
whose sizes and names match, [or] both schemas are records with the same name." 
 But the specification is not clear as to whether the "name" here is the 
name-space qualified name, or the unqualified name.  The old implementation 
took the position that "name" here meant the unqualified name, and there 
doesn't seem to be a good reason to reverse this approach right now.

The following would be a good way to fix this bug:

1) Yes, please do submit your reproduction as a test case for the future.

2) Please also update the spec to replace "name[s]" with "(unqualified) 
name[s]" in the places just quoted.

3) On line 695 of Resolver.java (i.e., in the unionEquiv method), please 
replace the three occurrences of ".getFullName" with ".getName" -- that should 
fix the problem by the most surgical means possible.  (I'm a bit nervous about 
re-ordering the WRITER_UNION and READER_UNION cases as you do in your first 
suggested fix.  And wiping out the guard on line 694 would eliminate any 
name-based checking entirely, which would relax the spec even further, which 
I don't think we want to do.)
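
To make the ambiguity concrete, here is a small sketch of the quoted spec 
rule under both readings of "name". It is an illustration only: 
`SpecNameRuleSketch`, `Named` and `matches` are hypothetical and are not 
the code in Resolver.java.

```java
// Hypothetical sketch of the quoted rule for enums, fixed types and records.
final class SpecNameRuleSketch {
  enum Kind { ENUM, FIXED, RECORD }

  static final class Named {
    final Kind kind;
    final String fullName;
    final int size; // only meaningful for FIXED

    Named(Kind kind, String fullName, int size) {
      this.kind = kind;
      this.fullName = fullName;
      this.size = size;
    }
  }

  /** Strips the namespace, e.g. "com.a.User" becomes "User". */
  static String unqualified(String fullName) {
    int dot = fullName.lastIndexOf('.');
    return dot < 0 ? fullName : fullName.substring(dot + 1);
  }

  /** qualified=true is the stricter reading; false is the 1.8.x reading. */
  static boolean matches(Named reader, Named writer, boolean qualified) {
    if (reader.kind != writer.kind) {
      return false;
    }
    String r = qualified ? reader.fullName : unqualified(reader.fullName);
    String w = qualified ? writer.fullName : unqualified(writer.fullName);
    if (!r.equals(w)) {
      return false;
    }
    // "both schemas are fixed whose sizes and names match"
    return reader.kind != Kind.FIXED || reader.size == writer.size;
  }

  public static void main(String[] args) {
    Named reader = new Named(Kind.RECORD, "com.a.User", 0);
    Named writer = new Named(Kind.RECORD, "com.b.User", 0);
    System.out.println("qualified:   " + matches(reader, writer, true));
    System.out.println("unqualified: " + matches(reader, writer, false));
  }
}
```

Records that differ only in namespace, as in the reported reproduction, 
match under the unqualified reading and fail under the qualified one.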






[jira] [Created] (AVRO-2275) Refactor schema-resolution code from grammar-generation

2018-11-25 Thread Raymie Stata (JIRA)
Raymie Stata created AVRO-2275:
--

 Summary: Refactor schema-resolution code from grammar-generation
 Key: AVRO-2275
 URL: https://issues.apache.org/jira/browse/AVRO-2275
 Project: Apache Avro
  Issue Type: Improvement
  Components: java
Reporter: Raymie Stata
Assignee: Raymie Stata


In my own work to extend AVRO-2090, and also in AVRO-2247, an alternative 
approach to optimizing decoders, we were forced to re-implement Schema resolution 
logic because it's currently embedded deeply in ResolvingGrammarGenerator.  
However, in the past the Avro community found it hard to maintain multiple 
implementations of the schema resolution code, as it is tedious and error-prone 
code.

In this JIRA we've refactored the resolution code into a new class called 
Resolver, and have rewritten ResolvingGrammarGenerator to be a client of this 
class.  This rewrite passes the full regression suite, including bug-for-bug 
compatibility with a few questionable resolution rules, such as the "soft 
matching" rule for records in unions.
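
The shape of this refactoring can be sketched as one resolution routine with 
several clients. The sketch below is a deliberate simplification and not the 
Resolver class itself: types are plain strings, only the primitive promotions 
from the Avro specification are covered, and `SharedResolverSketch`, 
`resolve`, `canRead` and `describe` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: one resolution routine, two independent clients.
final class SharedResolverSketch {
  enum Kind { DO_NOTHING, PROMOTE, ERROR }

  /** Primitive promotions from the spec: writer type -> allowed reader types. */
  private static final Map<String, Set<String>> PROMOTIONS = new HashMap<>();
  static {
    PROMOTIONS.put("int", new HashSet<>(Arrays.asList("long", "float", "double")));
    PROMOTIONS.put("long", new HashSet<>(Arrays.asList("float", "double")));
    PROMOTIONS.put("float", Collections.singleton("double"));
    PROMOTIONS.put("string", Collections.singleton("bytes"));
    PROMOTIONS.put("bytes", Collections.singleton("string"));
  }

  /** The single place where the resolution decision is made. */
  static Kind resolve(String writer, String reader) {
    if (writer.equals(reader)) {
      return Kind.DO_NOTHING;
    }
    Set<String> targets = PROMOTIONS.get(writer);
    return targets != null && targets.contains(reader) ? Kind.PROMOTE : Kind.ERROR;
  }

  /** Client 1: a compatibility check. */
  static boolean canRead(String writer, String reader) {
    return resolve(writer, reader) != Kind.ERROR;
  }

  /** Client 2: a decoder that turns the same decision into a read step. */
  static String describe(String writer, String reader) {
    switch (resolve(writer, reader)) {
      case DO_NOTHING:
        return "read " + reader;
      case PROMOTE:
        return "read " + writer + ", convert to " + reader;
      default:
        throw new IllegalArgumentException("cannot resolve " + writer + " as " + reader);
    }
  }

  public static void main(String[] args) {
    System.out.println(canRead("int", "long"));
    System.out.println(describe("int", "long"));
    System.out.println(canRead("long", "int"));
  }
}
```

Because both clients call `resolve`, they cannot disagree about which pairs 
of schemas are resolvable.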





[jira] [Created] (AVRO-2274) Improve resolving performance when schemas don't change

2018-11-25 Thread Raymie Stata (JIRA)
Raymie Stata created AVRO-2274:
--

 Summary: Improve resolving performance when schemas don't change
 Key: AVRO-2274
 URL: https://issues.apache.org/jira/browse/AVRO-2274
 Project: Apache Avro
  Issue Type: Improvement
  Components: java
Reporter: Raymie Stata
Assignee: Raymie Stata


These decoding optimizations are based on the observation that schemas don't change 
very much.  We add special-case paths to optimize the case where a _sub_schema of 
the reader and the writer are the same.  The specific cases are:

* In the case of an enumeration, if the reader and writer are the same, then we 
can simply return the tag written by the writer rather than "adjust" it as if 
it might have been re-ordered.  In fact, we can do this (directly return the 
tag written by the writer) as long as the reader-schema is an "extension" of 
the writer's in that it may have added new symbols but hasn't renumbered any of 
the writer's symbols.  Enumerations that either don't change at all or are 
"extended" as defined here are the common ways to extend enumerations.  (Our 
tests show this optimization improves performance by about 3%.)

* When the reader and writer subschemas are both unions, resolution is 
expensive: we have an outer union preceded by a "writer-union action", but each 
branch of this outer union consists of union-adjust actions, which are 
heavyweight.  We optimize this case when the reader and writer unions are the same: 
we fall back on the standard grammar used for a union, avoiding all these 
adjustments.  Since unions are commonly used to encode "nullable" fields in 
Avro, and nullability rarely changes as a schema evolves, this optimization 
should help many users.  (Our tests show this optimization improves performance 
by 25-30%, a significant win.)

* The "custom code" generated for reading records has to read fields in a loop 
that uses a switch statement to deal with writers that may have re-ordered 
fields.  In most cases, however, fields have not been reordered (esp. in more 
complex records with many record sub-schemas).  So we've added a new method to 
ResolvingDecoder called readFieldOrderIfDiff, which is a variant of the 
existing readFieldOrder.  If the field order has indeed changed, then 
readFieldOrderIfDiff returns the new field order, just like readFieldOrder 
does.  However, if the field-order hasn't changed, then readFieldOrderIfDiff 
returns null.  We then modified the generation of custom-decoders for records 
to add a special-case path that simply reads the record's fields in order, 
without incurring the overhead of the loop or the switch statement.  (Our tests 
show this optimization improves performance by 8-9%, on top of the 35-40% 
produced by the original custom-coder optimization.)
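
Two of the checks described above can be sketched in plain Java. This is a 
sketch under assumptions, not the ResolvingDecoder code: enum symbols and 
record fields are modelled as lists of strings, the reader and writer are 
assumed to have the same set of fields, and `isEnumExtension`, 
`fieldOrderIfDiff` and `readRecord` are hypothetical helpers (the real 
readFieldOrderIfDiff is only known here by its name and its null-on-unchanged 
contract).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "unchanged schema" fast paths.
final class UnchangedSchemaFastPaths {
  /** True when every writer symbol keeps its index in the reader's symbol list. */
  static boolean isEnumExtension(List<String> writerSymbols, List<String> readerSymbols) {
    if (readerSymbols.size() < writerSymbols.size()) {
      return false;
    }
    return readerSymbols.subList(0, writerSymbols.size()).equals(writerSymbols);
  }

  /** Reader position of each writer field, or null when the order is unchanged. */
  static int[] fieldOrderIfDiff(List<String> writerFields, List<String> readerFields) {
    if (writerFields.equals(readerFields)) {
      return null;
    }
    int[] order = new int[writerFields.size()];
    for (int i = 0; i < order.length; i++) {
      order[i] = readerFields.indexOf(writerFields.get(i));
    }
    return order;
  }

  /** wire holds the values in the order the writer wrote them. */
  static Object[] readRecord(List<String> writerFields, List<String> readerFields,
      Object[] wire) {
    int[] order = fieldOrderIfDiff(writerFields, readerFields);
    Object[] record = new Object[readerFields.size()];
    if (order == null) {
      // Fast path: fields are read straight through, in declaration order.
      System.arraycopy(wire, 0, record, 0, wire.length);
    } else {
      // Slow path: each value is routed to its reader position.
      for (int i = 0; i < order.length; i++) {
        record[order[i]] = wire[i];
      }
    }
    return record;
  }

  public static void main(String[] args) {
    System.out.println(isEnumExtension(Arrays.asList("A", "B"), Arrays.asList("A", "B", "C")));
    System.out.println(isEnumExtension(Arrays.asList("A", "B"), Arrays.asList("B", "A")));
    Object[] record = readRecord(Arrays.asList("b", "a"), Arrays.asList("a", "b"),
        new Object[] {"B", "A"});
    System.out.println(Arrays.toString(record));
  }
}
```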



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AVRO-2269) Improve usability of Perf.java

2018-11-25 Thread Raymie Stata (JIRA)


 [ 
https://issues.apache.org/jira/browse/AVRO-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymie Stata updated AVRO-2269:
---
Description: 
The class {{org.apache.avro.ipc.io.Perf}} is Avro's performance test suite.  
This JIRA aims to make it easier to use.  Specifically:

* Added a file {{performance-testing.html}} with guidance on how to use the 
suite
* Added script {{run-script.sh}} that uses {{Perf}} to run structured 
experiments.
* Added tests for performance of resolution of unchanged unions and 
enumerations, which will be subject to future optimizations.
* Tweaks to {{Perf}} for better experimentation (e.g., support for minimum as 
well as average aggregation).


  was:
In attempting to use Perf.java to show that proposed performance changes 
actually improved performance, different runs of Perf.java using the exact same 
code base resulted in variances of 5% or greater – and often 10% or greater – for 
about half the test cases. With variance this high within a code base, it's 
impossible to tell if a proposed "improved" code base indeed improves 
performance. I will post to the wiki and elsewhere some documents and scripts I 
developed to reduce this variance. This JIRA is for changes to Perf.java that 
reduce the variance. Specifically:
 * Accessed the {{reader}} and {{writer}} instance variables directly in the 
inner-loop for {{SpecificTest}}, and switched to a "reuse" object for 
reading records, rather than constructing fresh objects for each read. Both 
helped to significantly reduce variance for {{FooBarSpecificRecordTestWrite}}, 
a major target of recent performance-improvement efforts.

 * Switched to {{DirectBinaryEncoder}} instead of {{BufferedBinaryEncoder}} for 
write tests. Although this slowed writer-tests a bit, it reduced variance a 
lot, especially for performance tests of primitives like booleans, making it a 
better choice for measuring the performance-impact of code changes.

 * Started the timer of a test after the encoder/decoder for the test is 
constructed, rather than before. Helps a little.

 * Added the ability to output the _minimum_ runtime of a test case across 
multiple cycles (vs the total runtime across all cycles). This was inspired by 
JVMSpec, which used to use a minimum.  I was able to reduce the variance of 
total runtime enough to obviate the need for this metric, but since it's 
helpful diagnostically, I left it in.


> Improve usability of Perf.java
> --
>
> Key: AVRO-2269
> URL: https://issues.apache.org/jira/browse/AVRO-2269
> Project: Apache Avro
> Issue Type: Test
> Components: java
> Reporter: Raymie Stata
> Assignee: Raymie Stata
> Priority: Major
>
> The class {{org.apache.avro.ipc.io.Perf}} is Avro's performance test suite.  
> This JIRA aims to make it easier to use.  Specifically:
> * Added a file {{performance-testing.html}} with guidance on how to use the 
> suite
> * Added script {{run-script.sh}} that uses {{Perf}} to run structured 
> experiments.
> * Added tests for performance of resolution of unchanged unions and 
> enumerations, which will be subject to future optimizations.
> * Tweaks to {{Perf}} for better experimentation (e.g., support for minimum as 
> well as average aggregation).





[jira] [Commented] (AVRO-2269) Improve usability of Perf.java

2018-11-24 Thread Raymie Stata (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698096#comment-16698096
 ] 

Raymie Stata commented on AVRO-2269:


This work has changed direction.  The focus shifted away from variance and 
towards usability.  I've updated the subject and description accordingly.

> Improve usability of Perf.java
> --
>
> Key: AVRO-2269
> URL: https://issues.apache.org/jira/browse/AVRO-2269
> Project: Apache Avro
> Issue Type: Test
> Components: java
> Reporter: Raymie Stata
> Assignee: Raymie Stata
> Priority: Major
>
> In attempting to use Perf.java to show that proposed performance changes 
> actually improved performance, different runs of Perf.java using the exact 
> same code base resulted in variances of 5% or greater – and often 10% or greater 
> – for about half the test cases. With variance this high within a code base, 
> it's impossible to tell if a proposed "improved" code base indeed improves 
> performance. I will post to the wiki and elsewhere some documents and scripts 
> I developed to reduce this variance. This JIRA is for changes to Perf.java 
> that reduce the variance. Specifically:
>  * Access the {{reader}} and {{writer}} instance variables directly in the 
> inner-loop for {{SpecificTest}}, as well as switched to a "reuse" object for 
> reading records, rather than constructing fresh objects for each read. Both 
> helped to significantly reduce variance for 
> {{FooBarSpecificRecordTestWrite}}, a major target of recent 
> performance-improvement efforts.
>  * Switched to {{DirectBinaryEncoder}} instead of {{BufferedBinaryEncoder}} 
> for write tests. Although this slowed writer-tests a bit, it reduced variance 
> a lot, especially for performance tests of primitives like booleans, making 
> it a better choice for measuring the performance-impact of code changes.
>  * Started the timer of a test after the encoder/decoder for the test is 
> constructed, rather than before. Helps a little.
>  * Added the ability to output the _minimum_ runtime of a test case across 
> multiple cycles (vs the total runtime across all cycles). This was inspired 
> by JVMSpec, which used to use a minimum.  I was able to reduce the variance 
> of total runtime enough to obviate the need for this metric, but since it's 
> helpful diagnostically, I left it in.





[jira] [Updated] (AVRO-2269) Improve usability of Perf.java

2018-11-24 Thread Raymie Stata (JIRA)


 [ 
https://issues.apache.org/jira/browse/AVRO-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymie Stata updated AVRO-2269:
---
Summary: Improve usability of Perf.java  (was: Improve variances seen 
across Perf.java runs)






[jira] [Reopened] (AVRO-1658) Add avroDoc on reflect

2018-11-20 Thread Raymie Stata (JIRA)


 [ 
https://issues.apache.org/jira/browse/AVRO-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymie Stata reopened AVRO-1658:

  Assignee: Raymie Stata  (was: Zhaonan Sun)

The file {{AvroDoc.java}} doesn't have a license, causing the build to break 
(grumble).  Will send a patch for this shortly.

> Add avroDoc on reflect
> --
>
> Key: AVRO-1658
> URL: https://issues.apache.org/jira/browse/AVRO-1658
> Project: Apache Avro
> Issue Type: New Feature
> Components: java
> Affects Versions: 1.7.7
> Reporter: Zhaonan Sun
> Assignee: Raymie Stata
> Priority: Major
> Labels: reflection
> Fix For: 1.9.0
>
> Attachments: 
> 0001-AVRO-1658-Java-Add-reflection-annotation-AvroDoc.patch, 
> 0001-AVRO-1658-Java-Add-reflection-annotation-AvroDoc.patch, 
> 0001-AVRO-1658-Java-Add-reflection-annotation-AvroDoc.patch
>
>
> Looks like @AvroMeta can't add reserved fields; for example, 
> @AvroMeta("doc", "some doc") throws an exception.
> It would be great if we had an @AvroDoc("some documentation") in 
> org.apache.avro.reflect





[jira] [Commented] (AVRO-2269) Improve variances seen across Perf.java runs

2018-11-16 Thread Raymie Stata (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689164#comment-16689164
 ] 

Raymie Stata commented on AVRO-2269:


I took a look at JMH.  I think it'd be great to convert `Perf.java` over to 
JMH.  I didn't pursue it because I couldn't find good enough docs on JMH to 
feel comfortable using it myself.

The forthcoming patch I have for AVRO-2269 makes changes that are orthogonal to 
what JMH does.  JMH does things like warm up the JIT and various caches, and so 
forth, and it runs tests a dynamic number of times in order to "seek" stable 
statistics on performance metrics.  The current `Perf.main` does some of this 
already – I didn't touch any of that code – but JMH seems to do a much more 
professional job of it.  Thus, again, it'd be great to convert `Perf.java` to 
JMH.

That said, while JMH might do a pretty good job of finding the "true" running 
time of a high-variance piece of code, it doesn't turn a high-variance piece 
of code into a low-variance one.  The forthcoming patch for AVRO-2269 does the 
latter: it tries to reduce the inherent variance of the tests (for example, by 
reducing the allocations done for `FooBarSpecificRecord` tests).  JMH together 
with this forthcoming patch would be a great combination.

I just submitted a pull request for AVRO-2268 containing a little bug fix that 
I want to depend upon, but which is pretty independent of the changes I have 
for AVRO-2269.  If someone could pull AVRO-2268, I'd like to rebase onto that 
change before submitting the AVRO-2269 patch.

> Improve variances seen across Perf.java runs
> 
>
> Key: AVRO-2269
> URL: https://issues.apache.org/jira/browse/AVRO-2269
> Project: Apache Avro
> Issue Type: Test
> Components: java
> Reporter: Raymie Stata
> Assignee: Raymie Stata
> Priority: Major
>
> In attempting to use Perf.java to show that proposed performance changes 
> actually improved performance, different runs of Perf.java using the exact 
> same code base resulted in variances of 5% or greater – and often 10% or greater 
> – for about half the test cases. With variance this high within a code base, 
> it's impossible to tell if a proposed "improved" code base indeed improves 
> performance. I will post to the wiki and elsewhere some documents and scripts 
> I developed to reduce this variance. This JIRA is for changes to Perf.java 
> that reduce the variance. Specifically:
>  * Access the {{reader}} and {{writer}} instance variables directly in the 
> inner-loop for {{SpecificTest}}, as well as switched to a "reuse" object for 
> reading records, rather than constructing fresh objects for each read. Both 
> helped to significantly reduce variance for 
> {{FooBarSpecificRecordTestWrite}}, a major target of recent 
> performance-improvement efforts.
>  * Switched to {{DirectBinaryEncoder}} instead of {{BufferedBinaryEncoder}} 
> for write tests. Although this slowed writer-tests a bit, it reduced variance 
> a lot, especially for performance tests of primitives like booleans, making 
> it a better choice for measuring the performance-impact of code changes.
>  * Started the timer of a test after the encoder/decoder for the test is 
> constructed, rather than before. Helps a little.
>  * Added the ability to output the _minimum_ runtime of a test case across 
> multiple cycles (vs the total runtime across all cycles). This was inspired 
> by JVMSpec, which used to use a minimum.  I was able to reduce the variance 
> of total runtime enough to obviate the need for this metric, but since it's 
> helpful diagnostically, I left it in.





[jira] [Commented] (AVRO-2268) Perf.java SpecificRecord input data not working

2018-11-16 Thread Raymie Stata (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689155#comment-16689155
 ] 

Raymie Stata commented on AVRO-2268:


My patch for AVRO-2269 assumes this fix is in place.  I wanted to submit this 
patch separately because the issue is independent of AVRO-2269 and the problem 
should be fixed whether or not AVRO-2269 is accepted.

> Perf.java SpecificRecord input data not working
> ---
>
> Key: AVRO-2268
> URL: https://issues.apache.org/jira/browse/AVRO-2268
> Project: Apache Avro
> Issue Type: Test
> Components: java
> Reporter: Raymie Stata
> Assignee: Raymie Stata
> Priority: Major
>
> In {{FooBarSpecificRecordTest.genSingleRecord}}, the {{nicknames}} field is 
> given an instance of what is returned by {{Arrays.asList}}, which does 
> _not_ support the {{clear}} method.  When reusing objects during a read, the 
> {{clear}} method is used to clear the contents of array-valued fields, 
> which causes an {{UnsupportedOperationException}}.  So 
> {{genSingleRecord}} needs to change to set {{nicknames}} to a type that 
> implements {{clear}}.
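
The underlying Java behaviour can be reproduced without Avro. The sketch 
below is not the Perf.java code; `FixedSizeListSketch` and `supportsClear` 
are hypothetical names used only to show that the fixed-size list returned by 
`java.util.Arrays.asList` rejects `clear`, while an `ArrayList` copy of it 
does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the failure mode and one possible fix.
final class FixedSizeListSketch {
  /** Tries to clear the list and reports whether the operation is supported. */
  static boolean supportsClear(List<String> list) {
    try {
      list.clear();
      return true;
    } catch (UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Fixed-size view over an array: structural changes are rejected.
    List<String> fixed = Arrays.asList("bob", "bobby");
    // Copying into an ArrayList gives a list that can be cleared and reused.
    List<String> reusable = new ArrayList<>(Arrays.asList("bob", "bobby"));

    System.out.println("Arrays.asList supports clear: " + supportsClear(fixed));
    System.out.println("ArrayList supports clear:     " + supportsClear(reusable));
  }
}
```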





[jira] [Created] (AVRO-2269) Improve variances seen across Perf.java runs

2018-11-15 Thread Raymie Stata (JIRA)
Raymie Stata created AVRO-2269:
--

 Summary: Improve variances seen across Perf.java runs
 Key: AVRO-2269
 URL: https://issues.apache.org/jira/browse/AVRO-2269
 Project: Apache Avro
  Issue Type: Test
  Components: java
Reporter: Raymie Stata
Assignee: Raymie Stata


In attempting to use Perf.java to show that proposed performance changes 
actually improved performance, different runs of Perf.java using the exact same 
code base resulted in variances of 5% or greater – and often 10% or greater – for 
about half the test cases. With variance this high within a code base, it's 
impossible to tell if a proposed "improved" code base indeed improves 
performance. I will post to the wiki and elsewhere some documents and scripts I 
developed to reduce this variance. This JIRA is for changes to Perf.java that 
reduce the variance. Specifically:
 * Accessed the {{reader}} and {{writer}} instance variables directly in the 
inner-loop for {{SpecificTest}}, and switched to a "reuse" object for 
reading records, rather than constructing fresh objects for each read. Both 
helped to significantly reduce variance for {{FooBarSpecificRecordTestWrite}}, 
a major target of recent performance-improvement efforts.

 * Switched to {{DirectBinaryEncoder}} instead of {{BufferedBinaryEncoder}} for 
write tests. Although this slowed writer-tests a bit, it reduced variance a 
lot, especially for performance tests of primitives like booleans, making it a 
better choice for measuring the performance-impact of code changes.

 * Started the timer of a test after the encoder/decoder for the test is 
constructed, rather than before. Helps a little.

 * Added the ability to output the _minimum_ runtime of a test case across 
multiple cycles (vs the total runtime across all cycles). This was inspired by 
JVMSpec, which used to use a minimum.  I was able to reduce the variance of 
total runtime enough to obviate the need for this metric, but since it's 
helpful diagnostically, I left it in.
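
Two of the measures listed above, starting the timer only after setup and 
reporting the minimum as well as the total across cycles, can be sketched as 
follows. This is an illustration and not the Perf.java implementation: 
`CycleTimingSketch`, `timeCycles`, `total` and `minimum` are hypothetical 
names.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch of per-cycle timing with two aggregations.
final class CycleTimingSketch {
  /** Runs the body once per cycle; setup work is kept outside the timed region. */
  static <T> long[] timeCycles(int cycles, Supplier<T> setup, Consumer<T> body) {
    long[] nanos = new long[cycles];
    for (int i = 0; i < cycles; i++) {
      T state = setup.get(); // e.g. building an encoder or decoder; not timed
      long start = System.nanoTime();
      body.accept(state);
      nanos[i] = System.nanoTime() - start;
    }
    return nanos;
  }

  /** Total runtime across all cycles. */
  static long total(long[] cycleNanos) {
    long sum = 0;
    for (long n : cycleNanos) {
      sum += n;
    }
    return sum;
  }

  /** Fastest single cycle; less sensitive to one-off pauses than the total. */
  static long minimum(long[] cycleNanos) {
    long min = Long.MAX_VALUE;
    for (long n : cycleNanos) {
      min = Math.min(min, n);
    }
    return min;
  }

  public static void main(String[] args) {
    long[] nanos = timeCycles(5, () -> new StringBuilder(), sb -> {
      for (int i = 0; i < 100_000; i++) {
        sb.append(i);
      }
    });
    System.out.println("total ns:   " + total(nanos));
    System.out.println("minimum ns: " + minimum(nanos));
  }
}
```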





[jira] [Commented] (AVRO-2252) I'd like to improve Avro .NET (C#) library (many points)

2018-11-06 Thread Raymie Stata (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677387#comment-16677387
 ] 

Raymie Stata commented on AVRO-2252:


As annoying as bad style can be, large-scale changes to a code base for the 
sake of stylistic improvement make it difficult to navigate the history of a 
code base, because diffs of post-change code against pre-change code show a lot 
of spurious changes not relevant to the problem you're trying to track down.

 

My suggestion is to focus on substantive changes for now (bug fixes, 
performance improvements, new features, etc).  Make your stylistic changes 
right after the new release is shipped.  This way, if any of the new 
substantive changes causes regressions, it will be easier to debug them (and 
fix them both on the release branch and on master).

> I'd like to improve Avro .NET (C#) library (many points)
> 
>
> Key: AVRO-2252
> URL: https://issues.apache.org/jira/browse/AVRO-2252
> Project: Avro
> Issue Type: Wish
> Components: csharp
> Reporter: Anton Ryzhov
> Priority: Major
>
> Hello all,
> The company where I'm working as a .NET developer is actively using Avro 
> format.
> I'd like to improve Avro .NET (C#) library:
> 1) clean-up the code:
>  - remove trailing spaces, unused namespace usings, etc.
>  - remove unused dependency of log4net library
>  - replace dependency of json library from direct reference to Nuget package
> 2) format the code to unify code style everywhere in the library
>  - possibly using the Microsoft recommended code style for C# 
> [https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/inside-a-program/coding-conventions]
> 3) use the latest C# 7.0 language features to make the code more compact and 
> readable
> 4) make .NET 4.5 and .NET standard 2.0 versions of the library, keeping the 
> existing compatibility with the .NET 3.5
>  - add asynchronous API to the .NET 4.5 and .NET standard 2.0 versions (async 
> methods along with the synchronous ones).
> What do you think?





[jira] [Reopened] (AVRO-2251) Modify Perf.java to better support automation scripts

2018-11-02 Thread Raymie Stata (JIRA)


 [ 
https://issues.apache.org/jira/browse/AVRO-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymie Stata reopened AVRO-2251:


As I continue to work on performance testing, I want to experiment with 
values for Perf.COUNT and Perf.CYCLES without having to recompile.  An 
additional patch is forthcoming that allows for this.
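
One way this kind of override could work is sketched below. This is only an assumption about the mechanism, not the forthcoming patch: the class name `PerfSettings` and the property names used in the usage note are hypothetical.

```java
// Hypothetical sketch; the mechanism and all names here are assumptions.
final class PerfSettings {
  /**
   * Reads an integer setting from a JVM system property.
   * Integer.getInteger returns the fallback when the property is unset
   * or is not a valid integer.
   */
  static int intSetting(String property, int fallback) {
    return Integer.getInteger(property, fallback);
  }
}
```

With something like this, a (hypothetical) `-Dexample.perf.cycles=7` on the JVM command line would change the value at run time, and leaving it off would keep the compiled-in default.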

> Modify Perf.java to better support automation scripts
> -
>
> Key: AVRO-2251
> URL: https://issues.apache.org/jira/browse/AVRO-2251
> Project: Avro
>  Issue Type: Test
>Reporter: Raymie Stata
>Assignee: Raymie Stata
>Priority: Major
> Fix For: 1.9.0
>
>
> To better support automated performance-test suites, this patch proposes two 
> new arguments to the 'Perf.java' command-line tool:
> The `-o` argument gives 'Perf.java' the name of a file that should get the 
> results of the run.  Currently, Perf.java sends output to System.out – but if 
> 'Perf.java' is invoked using Maven, which is the easiest way to invoke it, 
> then System.out is polluted with a bunch of other output.  Redirecting 
> 'Perf.java' output metrics to a file makes it easier for automation scripts 
> to process those metrics.
> The `-c [spec]` argument tells 'Perf.java' to generate a comma-separated 
> output.  By default, all benchmark metrics are output, but the optional 
> `spec` argument can be used to indicate exactly which metrics should be 
> included in the CSV output.  The default output of 'Perf.java' is optimized 
> for human inspection – for example, it includes the text "ms" in the 
> running-time column so humans will understand the units of the running-time 
> metric.  The `-c` option will tell 'Perf.java' to generate machine-optimized 
> output that is easier to consume by automation scripts.





[jira] [Created] (AVRO-2251) Modify Perf.java to better support automation scripts

2018-10-30 Thread Raymie Stata (JIRA)
Raymie Stata created AVRO-2251:
--

 Summary: Modify Perf.java to better support automation scripts
 Key: AVRO-2251
 URL: https://issues.apache.org/jira/browse/AVRO-2251
 Project: Avro
  Issue Type: Test
Reporter: Raymie Stata
Assignee: Raymie Stata


To better support automated performance-test suites, this patch proposes two 
new arguments to the 'Perf.java' command-line tool:

The `-o` argument gives 'Perf.java' the name of a file that should get the 
results of the run.  Currently, Perf.java sends output to System.out – but if 
'Perf.java' is invoked using Maven, which is the easiest way to invoke it, then 
System.out is polluted with a bunch of other output.  Redirecting 'Perf.java' 
output metrics to a file makes it easier for automation scripts to process 
those metrics.

The `-c [spec]` argument tells 'Perf.java' to generate a comma-separated 
output.  By default, all benchmark metrics are output, but the optional `spec` 
argument can be used to indicate exactly which metrics should be included in 
the CSV output.  The default output of 'Perf.java' is optimized for human 
inspection – for example, it includes the text "ms" in the running-time column 
so humans will understand the units of the running-time metric.  The `-c` 
option will tell 'Perf.java' to generate machine-optimized output that is 
easier to consume by automation scripts.
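
The difference between the two output styles can be sketched as follows. This is an illustration under assumptions, not the patch itself: the class `PerfRow`, its fields, and the exact column layout are invented for the example.

```java
import java.util.Locale;

// Hypothetical illustration of human-oriented vs machine-oriented output.
final class PerfRow {
  final String testName;
  final long millis;

  PerfRow(String testName, long millis) {
    this.testName = testName;
    this.millis = millis;
  }

  /** Human-oriented line: padded columns and a unit suffix ("ms"). */
  String toHuman() {
    return String.format(Locale.ROOT, "%-30s %8d ms", testName, millis);
  }

  /** Machine-oriented line: bare values separated by commas, no units. */
  String toCsv() {
    return testName + "," + millis;
  }
}
```

A script consuming the CSV form can split on commas and parse the second field as a number directly, without first stripping padding or the "ms" suffix.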





[jira] [Assigned] (AVRO-1022) Error in validate name

2018-10-29 Thread Raymie Stata (JIRA)


 [ 
https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymie Stata reassigned AVRO-1022:
--

Assignee: Raymie Stata

> Error in validate name
> --
>
> Key: AVRO-1022
> URL: https://issues.apache.org/jira/browse/AVRO-1022
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Reporter: Raymie Stata
>Assignee: Raymie Stata
>Priority: Minor
> Attachments: AVRO-1022.patch, AVRO-1022.patch, 
> unicode-recommendation.html
>
>
> Fix schema.validateName to allow only ASCII letters, not Unicode letters.
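
For illustration, an ASCII-only check might look like the sketch below. It is not the attached patch; the class and method names are hypothetical. It follows the naming rule in the Avro spec (first character `[A-Za-z_]`, remaining characters `[A-Za-z0-9_]`) and avoids `Character.isLetter`, which also accepts non-ASCII Unicode letters.

```java
// Hypothetical sketch of ASCII-only name validation; not the AVRO-1022 patch.
final class AsciiNames {
  static boolean isAsciiLetter(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
  }

  static boolean isAsciiDigit(char c) {
    return c >= '0' && c <= '9';
  }

  /** True if the name starts with [A-Za-z_] and continues with [A-Za-z0-9_] only. */
  static boolean isValidName(String name) {
    if (name == null || name.isEmpty()) {
      return false;
    }
    char first = name.charAt(0);
    if (!(isAsciiLetter(first) || first == '_')) {
      return false;
    }
    for (int i = 1; i < name.length(); i++) {
      char c = name.charAt(i);
      if (!(isAsciiLetter(c) || isAsciiDigit(c) || c == '_')) {
        return false;
      }
    }
    return true;
  }
}
```

For example, `isValidName("user_id2")` is true, while a name containing 'é' is rejected even though `Character.isLetter('é')` returns true.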





[jira] [Assigned] (AVRO-419) Consistent laziness when resolving partially-compatible changes

2018-10-29 Thread Raymie Stata (JIRA)


 [ 
https://issues.apache.org/jira/browse/AVRO-419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymie Stata reassigned AVRO-419:
-

Assignee: Raymie Stata

> Consistent laziness when resolving partially-compatible changes
> ---
>
> Key: AVRO-419
> URL: https://issues.apache.org/jira/browse/AVRO-419
> Project: Avro
>  Issue Type: Bug
>  Components: spec
>Reporter: Raymie Stata
>Assignee: Raymie Stata
>Priority: Major
>
> Avro schema resolution is generally "lazy" when it comes to dealing with 
> incompatible changes.  If the writer writes a union of "int" and "null", and 
> the reader expects just an "int", Avro doesn't raise an exception unless the 
> writer _actually_ writes a "null" (and the reader attempts to read it).
> This laziness is a powerful feature for supporting "forward compatibility" 
> (old readers reading data written by new writers).  In the example just 
> given, for example, we might decide at some point that a column needs to be 
> "nullable" but there's a lot of old code that assumes that it's not.  When 
> using old code, we can often arrange to avoid sending the old code any new 
> records that have null-values in that column.  It's powerful to allow new 
> writers to write against the nullable schema and allow readers to read those 
> records.  (For this to be safe, it's also important that this be _checked,_ 
> i.e., that a run-time error is thrown if a bad value is passed to the reader.)
> Avro is lazy in many places (e.g., in the union example just given, and for 
> enumerations).  But it's not _consistently_ lazy.  I propose we comb through 
> the spec and make it lazy in all places we can, unless there's a compelling 
> reason not to.
> Numeric types are one area where Avro is not consistently lazy.  I propose 
> that we fairly liberally allow any change from one numeric type to another, 
> and raise errors at runtime if bad values are found.  An "int" can be changed 
> to a "long", for example, and an error is raised when a reader gets an 
> out-of-bounds value.  A "double" can be changed to an "int", and an error is 
> raised if the reader gets a non-integer value or an out-of-bounds value.  
> (I'm not sure if there are types beyond numerics where we could be more 
> consistently lazy, but I decided to write this issue generically just in 
> case.)
> (One might object that these checks are expensive, but note that they are 
> only needed when the reader and writer specs don't agree.  Thus, if these 
> checks are induced, then the system designer _wanted_ these checks; we're 
> only adding value here, not inducing costs.)
> I'm not sure if there are other a
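
The kind of run-time check proposed for numeric types can be sketched in Java as below. This illustrates the proposal only (the issue was resolved as Won't Fix), so it does not describe what Avro does; the class and method names are hypothetical.

```java
// Hypothetical sketch of "lazy", checked numeric conversions; not Avro behavior.
final class LazyNumeric {
  /** long -> int: throws ArithmeticException if the value is out of int range. */
  static int longToInt(long value) {
    return Math.toIntExact(value);
  }

  /** double -> int: throws if the value is not finite, not an integer, or out of int range. */
  static int doubleToInt(double value) {
    if (Double.isNaN(value) || Double.isInfinite(value)) {
      throw new ArithmeticException("not finite: " + value);
    }
    if (value != Math.rint(value)) {
      throw new ArithmeticException("not an integer: " + value);
    }
    if (value < Integer.MIN_VALUE || value > Integer.MAX_VALUE) {
      throw new ArithmeticException("out of int range: " + value);
    }
    return (int) value;
  }
}
```

The check costs nothing when reader and writer agree on the type, because the conversion is only invoked when the two schemas differ.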





[jira] [Resolved] (AVRO-419) Consistent laziness when resolving partially-compatible changes

2018-10-29 Thread Raymie Stata (JIRA)


 [ 
https://issues.apache.org/jira/browse/AVRO-419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymie Stata resolved AVRO-419.
---
Resolution: Won't Fix

This is ancient history: will not fix.

> Consistent laziness when resolving partially-compatible changes
> ---
>
> Key: AVRO-419
> URL: https://issues.apache.org/jira/browse/AVRO-419
> Project: Avro
>  Issue Type: Bug
>  Components: spec
>Reporter: Raymie Stata
>Priority: Major
>
> Avro schema resolution is generally "lazy" when it comes to dealing with 
> incompatible changes.  If the writer writes a union of "int" and "null", and 
> the reader expects just an "int", Avro doesn't raise an exception unless the 
> writer _actually_ writes a "null" (and the reader attempts to read it).
> This laziness is a powerful feature for supporting "forward compatibility" 
> (old readers reading data written by new writers).  In the example just 
> given, for example, we might decide at some point that a column needs to be 
> "nullable" but there's a lot of old code that assumes that it's not.  When 
> using old code, we can often arrange to avoid sending the old code any new 
> records that have null-values in that column.  It's powerful to allow new 
> writers to write against the nullable schema and allow readers to read those 
> records.  (For this to be safe, it's also important that this be _checked,_ 
> i.e., that a run-time error is thrown if a bad value is passed to the reader.)
> Avro is lazy in many places (e.g., in the union example just given, and for 
> enumerations).  But it's not _consistently_ lazy.  I propose we comb through 
> the spec and make it lazy in all places we can, unless there's a compelling 
> reason not to.
> Numeric types are one area where Avro is not consistently lazy.  I propose 
> that we fairly liberally allow any change from one numeric type to another, 
> and raise errors at runtime if bad values are found.  An "int" can be changed 
> to a "long", for example, and an error is raised when a reader gets an 
> out-of-bounds value.  A "double" can be changed to an "int", and an error is 
> raised if the reader gets a non-integer value or an out-of-bounds value.  
> (I'm not sure if there are types beyond numerics where we could be more 
> consistently lazy, but I decided to write this issue generically just in 
> case.)
> (One might object that these checks are expensive, but note that they are 
> only needed when the reader and writer specs don't agree.  Thus, if these 
> checks are induced, then the system designer _wanted_ these checks; we're 
> only adding value here, not inducing costs.)
> I'm not sure if there are other a





[jira] [Comment Edited] (AVRO-2244) Problems with TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord:148

2018-10-24 Thread Raymie Stata (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661817#comment-16661817
 ] 

Raymie Stata edited comment on AVRO-2244 at 10/24/18 6:51 AM:
--

If there's any doubt about this issue being resolved, I just got the following 
error:

{{testAbilityToReadJsr310RecordWrittenAsJodaRecord(org.apache.avro.specific.TestSpecificLogicalTypes) Time elapsed: 0.085 sec <<< FAILURE!}}
{{java.lang.AssertionError:}}
{{Expected: is "23:43:30.800"}}
{{ but: was "23:43:30.8"}}
{{at org.apache.avro.specific.TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord(TestSpecificLogicalTypes.java:150)}}

Personally, I would revert AVRO-2241 and figure out how to get 
`TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord` to 
output zero-padded, three-digit time stamps for the Jsr310 case.


was (Author: raymie):
If there's any doubt about this issue being resolved, I just got the following 
error:
```
testAbilityToReadJsr310RecordWrittenAsJodaRecord(org.apache.avro.specific.TestSpecificLogicalTypes)
  Time elapsed: 0.085 sec  <<< FAILURE!
java.lang.AssertionError: 

Expected: is "23:43:30.800"
 but: was "23:43:30.8"
at 
org.apache.avro.specific.TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord(TestSpecificLogicalTypes.java:150)
```
Personally, I would revert AVRO-2241 and figure out how to get 
`TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord` to 
output zero-padded, three-digit time stamps for the Jsr310 case.

> Problems with 
> TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord:148
> ---
>
> Key: AVRO-2244
> URL: https://issues.apache.org/jira/browse/AVRO-2244
> Project: Avro
>  Issue Type: Bug
>  Components: logical types
>Reporter: Raymie Stata
>Priority: Major
>
> I've seen an intermittent test failure that looks like this:
> {{Failed tests:}}
> {{  
> TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord:148}}
> {{Expected: is "20:35:18.720"}}
> {{ but: was "20:35:18.72"}}
> When I see this failure, it's always the case that the trailing digit is 
> zero.  I suspect that it's a bug where the trailing zero is not printed.  
> Since the test cases use the current time, then most of the time the trailing 
> digit isn't zero and the bug isn't tickled.  But once-in-a-while the current 
> time has a trailing zero, which tickles the bug.
> If this diagnosis is correct, then in addition to fixing the bug, it might be 
> a good idea to add tests with hard-wired, static times that cover corner 
> cases like this one.





[jira] [Commented] (AVRO-2244) Problems with TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord:148

2018-10-24 Thread Raymie Stata (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661817#comment-16661817
 ] 

Raymie Stata commented on AVRO-2244:


If there's any doubt about this issue being resolved, I just got the following 
error:
```
testAbilityToReadJsr310RecordWrittenAsJodaRecord(org.apache.avro.specific.TestSpecificLogicalTypes)
  Time elapsed: 0.085 sec  <<< FAILURE!
java.lang.AssertionError: 

Expected: is "23:43:30.800"
 but: was "23:43:30.8"
at 
org.apache.avro.specific.TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord(TestSpecificLogicalTypes.java:150)
```
Personally, I would revert AVRO-2241 and figure out how to get 
`TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord` to 
output zero-padded, three-digit time stamps for the Jsr310 case.

> Problems with 
> TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord:148
> ---
>
> Key: AVRO-2244
> URL: https://issues.apache.org/jira/browse/AVRO-2244
> Project: Avro
>  Issue Type: Bug
>  Components: logical types
>Reporter: Raymie Stata
>Priority: Major
>
> I've seen an intermittent test failure that looks like this:
> {{Failed tests:}}
> {{  
> TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord:148}}
> {{Expected: is "20:35:18.720"}}
> {{ but: was "20:35:18.72"}}
> When I see this failure, it's always the case that the trailing digit is 
> zero.  I suspect that it's a bug where the trailing zero is not printed.  
> Since the test cases use the current time, then most of the time the trailing 
> digit isn't zero and the bug isn't tickled.  But once-in-a-while the current 
> time has a trailing zero, which tickles the bug.
> If this diagnosis is correct, then in addition to fixing the bug, it might be 
> a good idea to add tests with hard-wired, static times that cover corner 
> cases like this one.





[jira] [Commented] (AVRO-2244) Problems with TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord:148

2018-10-23 Thread Raymie Stata (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661123#comment-16661123
 ] 

Raymie Stata commented on AVRO-2244:


I don't believe the fix for AVRO-2241 addresses the problem in AVRO-2244: 2244 
seems to be related to the _formatting_ of times, rather than the truncation of 
them.  However, I think the reverse is true: A fix to AVRO-2244 would (have) 
addressed the problem seen in AVRO-2241.

> Problems with 
> TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord:148
> ---
>
> Key: AVRO-2244
> URL: https://issues.apache.org/jira/browse/AVRO-2244
> Project: Avro
>  Issue Type: Bug
>  Components: logical types
>Reporter: Raymie Stata
>Priority: Major
>
> I've seen an intermittent test failure that looks like this:
> {{Failed tests:}}
> {{  
> TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord:148}}
> {{Expected: is "20:35:18.720"}}
> {{ but: was "20:35:18.72"}}
> When I see this failure, it's always the case that the trailing digit is 
> zero.  I suspect that it's a bug where the trailing zero is not printed.  
> Since the test cases use the current time, then most of the time the trailing 
> digit isn't zero and the bug isn't tickled.  But once-in-a-while the current 
> time has a trailing zero, which tickles the bug.
> If this diagnosis is correct, then in addition to fixing the bug, it might be 
> a good idea to add tests with hard-wired, static times that cover corner 
> cases like this one.





[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-22 Thread Raymie Stata (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658852#comment-16658852
 ] 

Raymie Stata commented on AVRO-2090:


Any more feedback on this patch?

> Improve encode/decode time for SpecificRecord using code generation
> ---
>
> Key: AVRO-2090
> URL: https://issues.apache.org/jira/browse/AVRO-2090
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Raymie Stata
>Assignee: Raymie Stata
>Priority: Major
> Attachments: customcoders.md, perf-data.txt
>
>
> New implementation for generation of code for SpecificRecord that improves 
> decoding by over 10% and encoding over 30% (more improvements are on the 
> way).  This feature is behind a feature flag 
> ({{org.apache.avro.specific.use_custom_coders}}) and (for now) turned off by 
> default.  See [Getting Started 
> (Java)|https://avro.apache.org/docs/current/gettingstartedjava.html#Beta+feature:+Generating+faster+code]
>  for instructions.
> (A bit more info: Compared to GenericRecords, SpecificRecords offer 
> type-safety plus the performance of traditional getters/setters/instance 
> variables.  But these are only beneficial to Java code accessing those 
> records.  SpecificRecords inherit serialization and deserialization code from 
> GenericRecords, which is dynamic and thus slow (in fact, benchmarks show that 
> serialization and deserialization is _slower_ for SpecificRecord than for 
> GenericRecord).  This patch extends record.vm to generate custom, 
> higher-performance encoder and decoder functions for SpecificRecords.)
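
For anyone wanting to check whether the flag is on in a given JVM, a minimal sketch follows. It assumes the flag is an ordinary JVM system property (the description only calls it a feature flag); the helper class `FeatureFlags` is hypothetical, and only the property name comes from the issue.

```java
// Hypothetical helper; assumes the feature flag is a JVM system property.
final class FeatureFlags {
  // Property name taken from the issue description.
  static final String CUSTOM_CODERS = "org.apache.avro.specific.use_custom_coders";

  /** True only when the property is set to "true" (case-insensitive); unset means off. */
  static boolean customCodersEnabled() {
    return Boolean.parseBoolean(System.getProperty(CUSTOM_CODERS, "false"));
  }
}
```

Under that assumption, the flag would be switched on with `-Dorg.apache.avro.specific.use_custom_coders=true` on the JVM command line.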





[jira] [Updated] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-19 Thread Raymie Stata (JIRA)


 [ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymie Stata updated AVRO-2090:
---
Description: 
New implementation for generation of code for SpecificRecord that improves 
decoding by over 10% and encoding over 30% (more improvements are on the way).  
This feature is behind a feature flag 
({{org.apache.avro.specific.use_custom_coders}}) and (for now) turned off by 
default.  See [Getting Started 
(Java)|https://avro.apache.org/docs/current/gettingstartedjava.html#Beta+feature:+Generating+faster+code]
 for instructions.

(A bit more info: Compared to GenericRecords, SpecificRecords offer type-safety 
plus the performance of traditional getters/setters/instance variables.  But 
these are only beneficial to Java code accessing those records.  
SpecificRecords inherit serialization and deserialization code from 
GenericRecords, which is dynamic and thus slow (in fact, benchmarks show that 
serialization and deserialization is _slower_ for SpecificRecord than for 
GenericRecord).  This patch extends record.vm to generate custom, 
higher-performance encoder and decoder functions for SpecificRecords.)


  was:
New implementation for generation of code for SpecificRecord that improves 
decoding by over 10% and encoding over 30% (more improvements are on the way).  
This feature is behind a feature flag 
({{org.apache.avro.specific.use_custom_coders}}) and (for now) turned off by 
default.  See [Getting Started 
(Java)|https://avro.apache.org/docs/current/gettingstartedjava.html#Beta+feature:+Generating
 faster+code] for instructions.

(A bit more info: Compared to GenericRecords, SpecificRecords offer type-safety 
plus the performance of traditional getters/setters/instance variables.  But 
these are only beneficial to Java code accessing those records.  
SpecificRecords inherit serialization and deserialization code from 
GenericRecords, which is dynamic and thus slow (in fact, benchmarks show that 
serialization and deserialization is _slower_ for SpecificRecord than for 
GenericRecord).  This patch extends record.vm to generate custom, 
higher-performance encoder and decoder functions for SpecificRecords.)



> Improve encode/decode time for SpecificRecord using code generation
> ---
>
> Key: AVRO-2090
> URL: https://issues.apache.org/jira/browse/AVRO-2090
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Raymie Stata
>Assignee: Raymie Stata
>Priority: Major
> Attachments: customcoders.md, perf-data.txt
>
>
> New implementation for generation of code for SpecificRecord that improves 
> decoding by over 10% and encoding over 30% (more improvements are on the 
> way).  This feature is behind a feature flag 
> ({{org.apache.avro.specific.use_custom_coders}}) and (for now) turned off by 
> default.  See [Getting Started 
> (Java)|https://avro.apache.org/docs/current/gettingstartedjava.html#Beta+feature:+Generating+faster+code]
>  for instructions.
> (A bit more info: Compared to GenericRecords, SpecificRecords offer 
> type-safety plus the performance of traditional getters/setters/instance 
> variables.  But these are only beneficial to Java code accessing those 
> records.  SpecificRecords inherit serialization and deserialization code from 
> GenericRecords, which is dynamic and thus slow (in fact, benchmarks show that 
> serialization and deserialization is _slower_ for SpecificRecord than for 
> GenericRecord).  This patch extends record.vm to generate custom, 
> higher-performance encoder and decoder functions for SpecificRecords.)





[jira] [Updated] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-19 Thread Raymie Stata (JIRA)


 [ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymie Stata updated AVRO-2090:
---
Description: 
New implementation for generation of code for SpecificRecord that improves 
decoding by over 10% and encoding over 30% (more improvements are on the way).  
This feature is behind a feature flag 
({{org.apache.avro.specific.use_custom_coders}}) and (for now) turned off by 
default.  See [Getting Started 
(Java)|https://avro.apache.org/docs/current/gettingstartedjava.html#Beta+feature:+Generating
 faster+code] for instructions.

(A bit more info: Compared to GenericRecords, SpecificRecords offer type-safety 
plus the performance of traditional getters/setters/instance variables.  But 
these are only beneficial to Java code accessing those records.  
SpecificRecords inherit serialization and deserialization code from 
GenericRecords, which is dynamic and thus slow (in fact, benchmarks show that 
serialization and deserialization is _slower_ for SpecificRecord than for 
GenericRecord).  This patch extends record.vm to generate custom, 
higher-performance encoder and decoder functions for SpecificRecords.)


  was:
New implementation for generation of code for SpecificRecord that improves 
decoding by over 10% and encoding over 30% (more improvements are on the way).  
This feature is behind a feature flag 
({{org.apache.avro.specific.use_custom_coders}}) and (for now) turned off by 
default.  See [Getting Started 
(Java)|https://avro.apache.org/docs/current/gettingstartedjava.html#Beta+feature:+Faster+code+generation]
 for instructions.

(A bit more info: Compared to GenericRecords, SpecificRecords offer type-safety 
plus the performance of traditional getters/setters/instance variables.  But 
these are only beneficial to Java code accessing those records.  
SpecificRecords inherit serialization and deserialization code from 
GenericRecords, which is dynamic and thus slow (in fact, benchmarks show that 
serialization and deserialization is _slower_ for SpecificRecord than for 
GenericRecord).  This patch extends record.vm to generate custom, 
higher-performance encoder and decoder functions for SpecificRecords.)



> Improve encode/decode time for SpecificRecord using code generation
> ---
>
> Key: AVRO-2090
> URL: https://issues.apache.org/jira/browse/AVRO-2090
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Raymie Stata
>Assignee: Raymie Stata
>Priority: Major
> Attachments: customcoders.md, perf-data.txt
>
>
> New implementation for generation of code for SpecificRecord that improves 
> decoding by over 10% and encoding over 30% (more improvements are on the 
> way).  This feature is behind a feature flag 
> ({{org.apache.avro.specific.use_custom_coders}}) and (for now) turned off by 
> default.  See [Getting Started 
> (Java)|https://avro.apache.org/docs/current/gettingstartedjava.html#Beta+feature:+Generating
>  faster+code] for instructions.
> (A bit more info: Compared to GenericRecords, SpecificRecords offer 
> type-safety plus the performance of traditional getters/setters/instance 
> variables.  But these are only beneficial to Java code accessing those 
> records.  SpecificRecords inherit serialization and deserialization code from 
> GenericRecords, which is dynamic and thus slow (in fact, benchmarks show that 
> serialization and deserialization is _slower_ for SpecificRecord than for 
> GenericRecord).  This patch extends record.vm to generate custom, 
> higher-performance encoder and decoder functions for SpecificRecords.)





[jira] [Updated] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-19 Thread Raymie Stata (JIRA)


 [ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymie Stata updated AVRO-2090:
---
Description: 
New implementation for generation of code for SpecificRecord that improves 
decoding by over 10% and encoding over 30% (more improvements are on the way).  
This feature is behind a feature flag 
({{org.apache.avro.specific.use_custom_coders}}) and (for now) turned off by 
default.  See [Getting Started 
(Java)|https://avro.apache.org/docs/current/gettingstartedjava.html#Beta+feature:+Faster+code+generation]
 for instructions.

(A bit more info: Compared to GenericRecords, SpecificRecords offer type-safety 
plus the performance of traditional getters/setters/instance variables.  But 
these are only beneficial to Java code accessing those records.  
SpecificRecords inherit serialization and deserialization code from 
GenericRecords, which is dynamic and thus slow (in fact, benchmarks show that 
serialization and deserialization is _slower_ for SpecificRecord than for 
GenericRecord).  This patch extends record.vm to generate custom, 
higher-performance encoder and decoder functions for SpecificRecords.)


  was:
Compared to GenericRecords, SpecificRecords offer type-safety plus the 
performance of traditional getters/setters/instance variables.  But these are 
only beneficial to Java code accessing those records.  SpecificRecords inherit 
serialization and deserialization code from GenericRecords, which is dynamic 
and thus slow (in fact, benchmarks show that serialization and deserialization 
is _slower_ for SpecificRecord than for GenericRecord).

This patch extends record.vm to generate custom, higher-performance encoder and 
decoder functions for SpecificRecords.  We've run a public benchmark showing 
that the new code reduces serialization time by 2/3 and deserialization time by 
close to 50%.



> Improve encode/decode time for SpecificRecord using code generation
> ---
>
> Key: AVRO-2090
> URL: https://issues.apache.org/jira/browse/AVRO-2090
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Raymie Stata
>Assignee: Raymie Stata
>Priority: Major
> Attachments: customcoders.md, perf-data.txt
>
>
> New implementation for generation of code for SpecificRecord that improves 
> decoding by over 10% and encoding over 30% (more improvements are on the 
> way).  This feature is behind a feature flag 
> ({{org.apache.avro.specific.use_custom_coders}}) and (for now) turned off by 
> default.  See [Getting Started 
> (Java)|https://avro.apache.org/docs/current/gettingstartedjava.html#Beta+feature:+Faster+code+generation]
>  for instructions.
> (A bit more info: Compared to GenericRecords, SpecificRecords offer 
> type-safety plus the performance of traditional getters/setters/instance 
> variables.  But these are only beneficial to Java code accessing those 
> records.  SpecificRecords inherit serialization and deserialization code from 
> GenericRecords, which is dynamic and thus slow (in fact, benchmarks show that 
> serialization and deserialization is _slower_ for SpecificRecord than for 
> GenericRecord).  This patch extends record.vm to generate custom, 
> higher-performance encoder and decoder functions for SpecificRecords.)





[jira] [Commented] (AVRO-2244) Problems with TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord:148

2018-10-19 Thread Raymie Stata (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656687#comment-16656687
 ] 

Raymie Stata commented on AVRO-2244:


The spec for 
[ISO_LOCAL_TIME|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#ISO_LOCAL_TIME]
 says: "One to nine digits for the nano-of-second. As many digits will be output as 
required."  So it's going to drop the trailing zeros.

The spec for the [JODA 
formatter|https://www.joda.org/joda-time/apidocs/org/joda/time/format/ISODateTimeFormat.html#time--]
 that's being used here says: "Returns a formatter for a two digit hour of day, 
two digit minute of hour, two digit second of minute, three digit fraction of 
second, and time zone offset (HH:mm:ss.SSSZZ).", i.e., it will pad with 
trailing zeros.

So the test code in this case is buggy.
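
The mismatch can be reproduced on the java.time side alone, with hard-wired times instead of the current time. The sketch below is an illustration, not the test code in question: the class `TimeFormatDemo` and its method names are hypothetical, and the fixed `HH:mm:ss.SSS` pattern is just one possible way to get zero-padded, three-digit fractions.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical demo class; shows ISO_LOCAL_TIME vs a fixed three-digit fraction.
final class TimeFormatDemo {
  // Fixed-width pattern: always prints exactly three fraction digits.
  static final DateTimeFormatter MILLIS =
      DateTimeFormatter.ofPattern("HH:mm:ss.SSS", Locale.ROOT);

  /** Formats with ISO_LOCAL_TIME, which prints only as many fraction digits as needed. */
  static String isoLocal(LocalTime t) {
    return t.format(DateTimeFormatter.ISO_LOCAL_TIME);
  }

  /** Formats with the fixed-width pattern, keeping trailing zeros. */
  static String fixedMillis(LocalTime t) {
    return t.format(MILLIS);
  }

  public static void main(String[] args) {
    LocalTime t = LocalTime.of(23, 43, 30, 800_000_000); // 23:43:30.800
    System.out.println(isoLocal(t));    // prints 23:43:30.8
    System.out.println(fixedMillis(t)); // prints 23:43:30.800
  }
}
```

Static times whose millisecond value ends in zero (such as .800 or .720) hit the corner case on every run, rather than only when the clock happens to land on one.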

> Problems with 
> TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord:148
> ---
>
> Key: AVRO-2244
> URL: https://issues.apache.org/jira/browse/AVRO-2244
> Project: Avro
>  Issue Type: Bug
>  Components: logical types
>Reporter: Raymie Stata
>Priority: Major
>
> I've seen an intermittent test failure that looks like this:
> {{Failed tests:}}
> {{  
> TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord:148}}
> {{Expected: is "20:35:18.720"}}
> {{ but: was "20:35:18.72"}}
> When I see this failure, it's always the case that the trailing digit is 
> zero.  I suspect that it's a bug where the trailing zero is not printed.  
> Since the test cases use the current time, then most of the time the trailing 
> digit isn't zero and the bug isn't tickled.  But once-in-a-while the current 
> time has a trailing zero, which tickles the bug.
> If this diagnosis is correct, then in addition to fixing the bug, it might be 
> a good idea to add tests with hard-wired, static times that cover corner 
> cases like this one.





[jira] [Comment Edited] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-18 Thread Raymie Stata (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654739#comment-16654739
 ] 

Raymie Stata edited comment on AVRO-2090 at 10/18/18 7:19 AM:
--

I've attached my two runs of Perf.java combined into a single file 
([^perf-data.txt]).  The first four columns of numbers in this file are the 
results with custom-encoders turned off; the next four columns are the results 
with custom-encoders on.

For the two SpecificRecord cases: On my machine, FooBarSpecificRecordTestWrite 
improved 36% (from 3577 ms to 2296 ms), while FooBarSpecificRecordTestRead 
improved 12% (4728 ms to 4130 ms).  It's not surprising that the read case 
improved less: the overhead of accommodating schema migration is high.  I have 
some ideas on how to improve performance even more, esp. for the read case.  That 
said, a >10% improvement is not bad, and 36% improvement is quite good, so I 
suggest we commit this change as-is and save further improvements to future 
patches.

(Thiru points out that FooBarSpecificRecord is a very small class that probably 
understates the performance-improvements of this patch.  In our work at Aqfer, 
we've seen larger improvements.)
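
For reference, the percentages follow from the timings quoted above as (before - after) / before. A small sketch of that arithmetic (the class and method names are illustrative only):

```java
import java.util.Locale;

// Hypothetical helper for the arithmetic above; not part of Perf.java.
class ImprovementCalc {
    // Percentage reduction in elapsed time relative to the "before" run.
    static double percentReduction(double beforeMs, double afterMs) {
        return 100.0 * (beforeMs - afterMs) / beforeMs;
    }

    public static void main(String[] args) {
        // Timings taken from the comment above.
        double write = percentReduction(3577, 2296); // about 35.8
        double read = percentReduction(4728, 4130);  // about 12.6
        System.out.println(String.format(Locale.ROOT, "write: %.1f%%, read: %.1f%%", write, read));
    }
}
```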


was (Author: raymie):
I've attached my two runs of Perf.java combined into a single file 
([^perf-data.txt]).  The first four columns of numbers in this file are the 
results with custom-encoders turned off; the next four columns are the results 
with custom-encoders on.

For the two SpecificRecord cases: On my machine, FooBarSpecificRecordTestWrite 
improved 36% (from 3577 ms to 2296 ms), while FooBarSpecificRecordTestRead 
improved 12% (4728 ms to 4130 ms).  It's not surprising that the read case 
improved less: the overhead of accommodating schema migration is high.  I have 
some ideas on how to improve performance even more, esp. for the read case.  That 
said, a >10% improvement is not bad, and 36% improvement is quite good, so I 
suggest we commit this change as-is and save further improvements to future 
patches.

> Improve encode/decode time for SpecificRecord using code generation
> ---
>
> Key: AVRO-2090
> URL: https://issues.apache.org/jira/browse/AVRO-2090
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Raymie Stata
>Assignee: Raymie Stata
>Priority: Major
> Attachments: customcoders.md, perf-data.txt
>
>
> Compared to GenericRecords, SpecificRecords offer type-safety plus the 
> performance of traditional getters/setters/instance variables.  But these are 
> only beneficial to Java code accessing those records.  SpecificRecords 
> inherit serialization and deserialization code from GenericRecords, which is 
> dynamic and thus slow (in fact, benchmarks show that serialization and 
> deserialization is _slower_ for SpecificRecord than for GenericRecord).
> This patch extends record.vm to generate custom, higher-performance encoder 
> and decoder functions for SpecificRecords.  We've run a public benchmark 
> showing that the new code reduces serialization time by 2/3 and 
> deserialization time by close to 50%.





[jira] [Resolved] (AVRO-2235) Regenerate TestRecordWithLogicalTypes

2018-10-16 Thread Raymie Stata (JIRA)


 [ 
https://issues.apache.org/jira/browse/AVRO-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymie Stata resolved AVRO-2235.

  Resolution: Won't Fix
Release Note: Based on Thiru's input, the old generated code should stay in 
place because it's there to test backward compatibility.

> Regenerate TestRecordWithLogicalTypes
> -
>
> Key: AVRO-2235
> URL: https://issues.apache.org/jira/browse/AVRO-2235
> Project: Avro
>  Issue Type: Bug
>  Components: logical types
>Reporter: Raymie Stata
>Assignee: Raymie Stata
>Priority: Major
>
> TestRecordWithLogicalTypes.java is code that was generated by the specific 
> compiler and then moved into the testing code tree.  It hasn't been changed 
> in a while, although the compiler is evolving.  I tried to regenerate it and 
> found there is a problem with record_with_logical_types.avsc.  I will fix the 
> schema file and then regenerate TestRecordWithLogicalTypes and check both in.
>  





[jira] [Created] (AVRO-2244) Problems with TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord:148

2018-10-16 Thread Raymie Stata (JIRA)
Raymie Stata created AVRO-2244:
--

 Summary: Problems with 
TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord:148
 Key: AVRO-2244
 URL: https://issues.apache.org/jira/browse/AVRO-2244
 Project: Avro
  Issue Type: Bug
  Components: logical types
Reporter: Raymie Stata


I've seen an intermittent test failure that looks like this:

{{Failed tests:}}
{{  
TestSpecificLogicalTypes.testAbilityToReadJsr310RecordWrittenAsJodaRecord:148}}
{{Expected: is "20:35:18.720"}}
{{ but: was "20:35:18.72"}}

When I see this failure, it's always the case that the trailing digit is zero.  
I suspect that it's a bug where the trailing zero is not printed.  Since the 
test cases use the current time, then most of the time the trailing digit isn't 
zero and the bug isn't tickled.  But once-in-a-while the current time has a 
trailing zero, which tickles the bug.

If this diagnosis is correct, then in addition to fixing the bug, it might be a 
good idea to add tests with hard-wired, static times that cover corner cases 
like this one.
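
A sketch of what such hard-wired cases might look like, using only java.time. This is an illustration rather than the actual Avro test: the class name, the chosen times, and the idea of comparing parsed values instead of strings are all assumptions.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch; not the Avro test class.
class FixedTimeCases {
    // Hard-wired corner cases: millisecond values with no, one, two and three trailing zeros.
    static final LocalTime[] CASES = {
        LocalTime.of(20, 35, 18, 721_000_000),
        LocalTime.of(20, 35, 18, 720_000_000),
        LocalTime.of(20, 35, 18, 700_000_000),
        LocalTime.of(20, 35, 18, 0),
    };

    // Compare by parsed value rather than by text, so "18.72" and "18.720" match.
    static boolean sameTime(String a, String b) {
        return LocalTime.parse(a).equals(LocalTime.parse(b));
    }

    public static void main(String[] args) {
        DateTimeFormatter fixed = DateTimeFormatter.ofPattern("HH:mm:ss.SSS");
        for (LocalTime t : CASES) {
            String variable = DateTimeFormatter.ISO_LOCAL_TIME.format(t);
            String padded = fixed.format(t);
            System.out.println(variable + " vs " + padded + " -> " + sameTime(variable, padded));
        }
    }
}
```

Static inputs like these would exercise the trailing-zero case on every run instead of only when the current time happens to end in zero.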





[jira] [Commented] (AVRO-2235) Regenerate TestRecordWithLogicalTypes

2018-10-03 Thread Raymie Stata (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637359#comment-16637359
 ] 

Raymie Stata commented on AVRO-2235:


I'm going to keep this open in case someone else wants to comment.  But based 
on what Thiru says, I think I'm going to close this issue as "won't fix."

> Regenerate TestRecordWithLogicalTypes
> -
>
> Key: AVRO-2235
> URL: https://issues.apache.org/jira/browse/AVRO-2235
> Project: Avro
>  Issue Type: Bug
>  Components: logical types
>Reporter: Raymie Stata
>Assignee: Raymie Stata
>Priority: Major
>
> TestRecordWithLogicalTypes.java is code that was generated by the specific 
> compiler and then moved into the testing code tree.  It hasn't been changed 
> in a while, although the compiler is evolving.  I tried to regenerate it and 
> found there is a problem with record_with_logical_types.avsc.  I will fix the 
> schema file and then regenerate TestRecordWithLogicalTypes and check both in.
>  





[jira] [Commented] (AVRO-2235) Regenerate TestRecordWithLogicalTypes

2018-10-02 Thread Raymie Stata (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636484#comment-16636484
 ] 

Raymie Stata commented on AVRO-2235:


I looked at the comment at the top of TestSpecificLogicalTypes and realized 
that maybe I shouldn't have updated TestRecordWithLogicalTypes.  That comment 
reads:
{quote}The classes [TestRecordWithLogicalTypes and 
TestRecordWithoutLogicalTypes] should not be re-generated because they test 
compatibility of Avro with existing Avro-generated sources. When using classes 
generated before AVRO-1684, logical types should not be applied by the read or 
write paths. Those files should behave as they did before.
{quote}
So it sounds like this code here is for testing backward compatibility and thus 
shouldn't be updated.

At the same time, in my (GitHub) pull request for AVRO-2090, [~nkollar] 
suggests that I _do_ regenerate these classes.  At this point, I'm not sure 
what is the right thing to do.  Any suggestions?

> Regenerate TestRecordWithLogicalTypes
> -
>
> Key: AVRO-2235
> URL: https://issues.apache.org/jira/browse/AVRO-2235
> Project: Avro
>  Issue Type: Bug
>  Components: logical types
>Reporter: Raymie Stata
>Assignee: Raymie Stata
>Priority: Major
>
> TestRecordWithLogicalTypes.java is code that was generated by the specific 
> compiler and then moved into the testing code tree.  It hasn't been changed 
> in a while, although the compiler is evolving.  I tried to regenerate it and 
> found there is a problem with record_with_logical_types.avsc.  I will fix the 
> schema file and then regenerate TestRecordWithLogicalTypes and check both in.
>  





[jira] [Commented] (AVRO-2235) Regenerate TestRecordWithLogicalTypes

2018-10-02 Thread Raymie Stata (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636466#comment-16636466
 ] 

Raymie Stata commented on AVRO-2235:


I want to re-regenerate TestRecordWithLogicalTypes.java with my modifications to 
the specific compiler, but would prefer to do that after successfully 
re-generating it on the master.

> Regenerate TestRecordWithLogicalTypes
> -
>
> Key: AVRO-2235
> URL: https://issues.apache.org/jira/browse/AVRO-2235
> Project: Avro
>  Issue Type: Bug
>  Components: logical types
>Reporter: Raymie Stata
>Assignee: Raymie Stata
>Priority: Major
>
> TestRecordWithLogicalTypes.java is code that was generated by the specific 
> compiler and then moved into the testing code tree.  It hasn't been changed 
> in a while, although the compiler is evolving.  I tried to regenerate it and 
> found there is a problem with record_with_logical_types.avsc.  I will fix the 
> schema file and then regenerate TestRecordWithLogicalTypes and check both in.
>  





[jira] [Commented] (AVRO-2235) Regenerate TestRecordWithLogicalTypes

2018-10-02 Thread Raymie Stata (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636464#comment-16636464
 ] 

Raymie Stata commented on AVRO-2235:


Does anyone know how TestRecordWithoutLogicalTypes.java was generated?  Was it 
a hand-edit of the generated code for TestRecordWithLogicalTypes.java?

> Regenerate TestRecordWithLogicalTypes
> -
>
> Key: AVRO-2235
> URL: https://issues.apache.org/jira/browse/AVRO-2235
> Project: Avro
>  Issue Type: Bug
>  Components: logical types
>Reporter: Raymie Stata
>Assignee: Raymie Stata
>Priority: Major
>
> TestRecordWithLogicalTypes.java is code that was generated by the specific 
> compiler and then moved into the testing code tree.  It hasn't been changed 
> in a while, although the compiler is evolving.  I tried to regenerate it and 
> found there is a problem with record_with_logical_types.avsc.  I will fix the 
> schema file and then regenerate TestRecordWithLogicalTypes and check both in.
>  





[jira] [Created] (AVRO-2235) Regenerate TestRecordWithLogicalTypes

2018-10-02 Thread Raymie Stata (JIRA)
Raymie Stata created AVRO-2235:
--

 Summary: Regenerate TestRecordWithLogicalTypes
 Key: AVRO-2235
 URL: https://issues.apache.org/jira/browse/AVRO-2235
 Project: Avro
  Issue Type: Bug
  Components: logical types
Reporter: Raymie Stata
Assignee: Raymie Stata


TestRecordWithLogicalTypes.java is code that was generated by the specific 
compiler and then moved into the testing code tree.  It hasn't been changed in 
a while, although the compiler is evolving.  I tried to regenerate it and found 
there is a problem with record_with_logical_types.avsc.  I will fix the schema 
file and then regenerate TestRecordWithLogicalTypes and check both in.

 





[jira] [Updated] (AVRO-2091) Eliminate org.apache.avro.specific.use_custom_coder feature flag

2018-09-30 Thread Raymie Stata (JIRA)


 [ 
https://issues.apache.org/jira/browse/AVRO-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymie Stata updated AVRO-2091:
---
Description: After the implementation of "custom coders" (AVRO-2090) is 
complete and has seen more production usage, this feature flag should be 
eliminated.  (More specifically, the initial release of AVRO-2090 should set 
USE_CUSTOM_CODERS to false by default, to get some initial production testing.  
The release after that should set this flag to true by default, but allow folks 
to fall back on the old way in case there are corner cases that aren't working. 
 The release after that should remove this feature flag altogether, under the 
assumption that it works just fine and there's no need to maintain two ways of 
doing things.)  (was: After the implementation of "custom coders" (AVRO-2090) 
is complete and seen more production usage, this feature flag should be 
eliminated.)

> Eliminate org.apache.avro.specific.use_custom_coder feature flag
> 
>
> Key: AVRO-2091
> URL: https://issues.apache.org/jira/browse/AVRO-2091
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Raymie Stata
>Priority: Minor
>
> After the implementation of "custom coders" (AVRO-2090) is complete and has 
> seen more production usage, this feature flag should be eliminated.  (More 
> specifically, the initial release of AVRO-2090 should set USE_CUSTOM_CODERS 
> to false by default, to get some initial production testing.  The release 
> after that should set this flag to true by default, but allow folks to fall 
> back on the old way in case there are corner cases that aren't working.  The 
> release after that should remove this feature flag altogether, under the 
> assumption that it works just fine and there's no need to maintain two ways 
> of doing things.)





[jira] [Updated] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2017-10-07 Thread Raymie Stata (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymie Stata updated AVRO-2090:
---
Attachment: customcoders.md

Attaching a design document for (forthcoming) patch.

> Improve encode/decode time for SpecificRecord using code generation
> ---
>
> Key: AVRO-2090
> URL: https://issues.apache.org/jira/browse/AVRO-2090
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Raymie Stata
> Attachments: customcoders.md
>
>
> Compared to GenericRecords, SpecificRecords offer type-safety plus the 
> performance of traditional getters/setters/instance variables.  But these are 
> only beneficial to Java code accessing those records.  SpecificRecords 
> inherit serialization and deserialization code from GenericRecords, which is 
> dynamic and thus slow (in fact, benchmarks show that serialization and 
> deserialization is _slower_ for SpecificRecord than for GenericRecord).
> This patch extends record.vm to generate custom, higher-performance encoder 
> and decoder functions for SpecificRecords.  We've run a public benchmark 
> showing that the new code reduces serialization time by 2/3 and 
> deserialization time by close to 50%.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AVRO-2094) Extend "custom coders" to support logical types

2017-10-07 Thread Raymie Stata (JIRA)
Raymie Stata created AVRO-2094:
--

 Summary: Extend "custom coders" to support logical types
 Key: AVRO-2094
 URL: https://issues.apache.org/jira/browse/AVRO-2094
 Project: Avro
  Issue Type: Improvement
Reporter: Raymie Stata


The initial implementation of "custom coders" (AVRO-2090) does not support 
Avro's logical types.  This JIRA extends that implementation to remove this 
limitation.






[jira] [Created] (AVRO-2093) Extend "custom coders" to fully support union types

2017-10-07 Thread Raymie Stata (JIRA)
Raymie Stata created AVRO-2093:
--

 Summary: Extend "custom coders" to fully support union types
 Key: AVRO-2093
 URL: https://issues.apache.org/jira/browse/AVRO-2093
 Project: Avro
  Issue Type: Improvement
Reporter: Raymie Stata


The initial implementation of "custom coders" for SpecificRecord (AVRO-2090) 
only supports "nullable unions" (two-branch unions where one branch is the null 
type).  This JIRA extends that implementation to support all forms of unions.
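
To make the "nullable union" definition concrete, here is a small sketch of the check. It deliberately models union branches as plain type-name strings rather than using Avro's Schema classes, so the class and method names are hypothetical and nothing here is taken from the patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a union as a list of branch type names; not Avro's API.
class NullableUnionSketch {
    // True for two-branch unions where exactly one branch is the null type.
    static boolean isNullableUnion(List<String> branches) {
        return branches.size() == 2
            && ("null".equals(branches.get(0)) ^ "null".equals(branches.get(1)));
    }

    public static void main(String[] args) {
        System.out.println(isNullableUnion(Arrays.asList("null", "string")));        // true
        System.out.println(isNullableUnion(Arrays.asList("int", "string")));         // false
        System.out.println(isNullableUnion(Arrays.asList("null", "int", "string"))); // false
    }
}
```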





[jira] [Created] (AVRO-2092) Flip default value of org.apache.avro.specific.use_custom_coder to true

2017-10-07 Thread Raymie Stata (JIRA)
Raymie Stata created AVRO-2092:
--

 Summary: Flip default value of 
org.apache.avro.specific.use_custom_coder to true
 Key: AVRO-2092
 URL: https://issues.apache.org/jira/browse/AVRO-2092
 Project: Avro
  Issue Type: Improvement
  Components: java
Reporter: Raymie Stata
Priority: Minor


The initial implementation of "custom coders" for SpecificRecord is incomplete 
(it didn't initially handle logical types) and hasn't been battle-tested.  
Thus, it includes a feature flag (org.apache.avro.specific.use_custom_coder) to 
toggle between the new code and the old code.  The initial default for this 
feature flag is false -- defaulting to the old code -- but when the 
implementation of SpecificRecord is completed and it's seen more production 
use, we should switch the default to true, on the way to eliminating the flag 
altogether (AVRO-2091).
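
As an illustration of how a system-property feature flag with a changeable default typically behaves, here is a minimal sketch. The property name is the one given in this issue; the class, the method, and the way the default is passed in are assumptions and are not taken from the Avro code.

```java
// Hypothetical sketch of a system-property feature flag; not Avro's implementation.
class FeatureFlagSketch {
    // Property name as written in this issue.
    static final String FLAG = "org.apache.avro.specific.use_custom_coder";

    // Returns the configured value, or defaultValue when the property is unset.
    static boolean useCustomCoders(boolean defaultValue) {
        String raw = System.getProperty(FLAG);
        return raw == null ? defaultValue : Boolean.parseBoolean(raw);
    }

    public static void main(String[] args) {
        // Flipping the default only changes what callers get when the property is unset.
        System.out.println("default false -> " + useCustomCoders(false));
        System.out.println("default true  -> " + useCustomCoders(true));
    }
}
```

Under this scheme, running with -Dorg.apache.avro.specific.use_custom_coder=false would still select the old code path after the default is flipped.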






[jira] [Created] (AVRO-2091) Eliminate org.apache.avro.specific.use_custom_coder feature flag

2017-10-07 Thread Raymie Stata (JIRA)
Raymie Stata created AVRO-2091:
--

 Summary: Eliminate org.apache.avro.specific.use_custom_coder 
feature flag
 Key: AVRO-2091
 URL: https://issues.apache.org/jira/browse/AVRO-2091
 Project: Avro
  Issue Type: Improvement
  Components: java
Reporter: Raymie Stata
Priority: Minor


After the implementation of "custom coders" (AVRO-2090) is complete and has 
seen more production usage, this feature flag should be eliminated.





[jira] [Created] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2017-10-07 Thread Raymie Stata (JIRA)
Raymie Stata created AVRO-2090:
--

 Summary: Improve encode/decode time for SpecificRecord using code 
generation
 Key: AVRO-2090
 URL: https://issues.apache.org/jira/browse/AVRO-2090
 Project: Avro
  Issue Type: Improvement
  Components: java
Reporter: Raymie Stata


Compared to GenericRecords, SpecificRecords offer type-safety plus the 
performance of traditional getters/setters/instance variables.  But these are 
only beneficial to Java code accessing those records.  SpecificRecords inherit 
serialization and deserialization code from GenericRecords, which is dynamic 
and thus slow (in fact, benchmarks show that serialization and deserialization 
is _slower_ for SpecificRecord than for GenericRecord).

This patch extends record.vm to generate custom, higher-performance encoder and 
decoder functions for SpecificRecords.  We've run a public benchmark showing 
that the new code reduces serialization time by 2/3 and deserialization time by 
close to 50%.






[jira] [Updated] (AVRO-806) add a column-major codec for data files

2012-04-27 Thread Raymie Stata (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymie Stata updated AVRO-806:
--


In about a month we will have some Hive benchmarks, but the data won't be very 
wide, so they won't be good for testing column-major formats.  However, maybe 
we should walk before we run: If someone puts Avro SerDe's in place against the 
regular Avro format, we could benchmark and maybe even help tune that 
configuration, which would provide a baseline for testing a column-major 
configuration.  (Unfortunately, we can't do the SerDe work itself.)

 add a column-major codec for data files
 ---

 Key: AVRO-806
 URL: https://issues.apache.org/jira/browse/AVRO-806
 Project: Avro
  Issue Type: New Feature
  Components: java, spec
Reporter: Doug Cutting
Assignee: Doug Cutting
 Fix For: 1.7.0

 Attachments: AVRO-806-v2.patch, AVRO-806.patch, avro-file-columnar.pdf


 Define a codec that, when a data file's schema is a record schema, writes 
 blocks within the file in column-major order.  This would permit better 
 compression and also permit efficient skipping of fields that are not of 
 interest.
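
 The idea of column-major order within a block can be sketched as a simple transposition. This is only an illustration of the layout, not the codec in the attached patches: records are modeled as Object arrays and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of row-major to column-major regrouping; not the AVRO-806 patch.
class ColumnMajorSketch {
    // Regroup a block of records so that all values of one field sit together.
    static List<List<Object>> toColumns(List<Object[]> rows, int fieldCount) {
        List<List<Object>> columns = new ArrayList<>();
        for (int f = 0; f < fieldCount; f++) {
            columns.add(new ArrayList<>());
        }
        for (Object[] row : rows) {
            for (int f = 0; f < fieldCount; f++) {
                columns.get(f).add(row[f]);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<Object[]> block = Arrays.asList(
            new Object[] {1, "a"},
            new Object[] {2, "b"});
        // A reader interested only in field 1 can ignore the other column entirely.
        System.out.println(toColumns(block, 2).get(1)); // [a, b]
    }
}
```

 Values of the same field tend to resemble each other, which is where the better compression would come from.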

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (AVRO-806) add a column-major codec for data files

2012-04-24 Thread Raymie Stata (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260732#comment-13260732
 ] 

Raymie Stata commented on AVRO-806:
---

This is the second attempt at a column-major codec.  The whole goal of 
col-major formats is to optimize performance.  Thus, to drive this exercise 
forward it seems necessary to have some kind of benchmark to do some testing.  
(I don't think a micro-benchmark is sufficient -- rather the right benchmark is 
with a query planner (Hive?) that can take advantage of these formats.)  With 
such a benchmark in place, we'd compare the performance of the existing 
row-major Avro formats (as a baseline) with the various proposed col-major 
formats to make sure that we're getting the kind of performance improvements 
(2x, 4x or more) to justify the complexity of a col-major format.

Some comments more specific to this proposal: First, I'd like to see the Type 
Mapping section for Avro filled in; this would give us a much better idea of 
what you're trying.  Second, at first glance, it seems like your design 
replicates some of the features of RCFiles that the CIF paper claims cause 
performance problems (but, again, maybe this issue is better addressed via some 
benchmarking).

Regarding your implementation of this proposal, it re-implements all the 
lower-levels of Avro.  It seems like this double-implementation will be a 
maintenance problem.  

 add a column-major codec for data files
 ---

 Key: AVRO-806
 URL: https://issues.apache.org/jira/browse/AVRO-806
 Project: Avro
  Issue Type: New Feature
  Components: java, spec
Reporter: Doug Cutting
Assignee: Doug Cutting
 Fix For: 1.7.0

 Attachments: AVRO-806-v2.patch, AVRO-806.patch, avro-file-columnar.pdf


 Define a codec that, when a data file's schema is a record schema, writes 
 blocks within the file in column-major order.  This would permit better 
 compression and also permit efficient skipping of fields that are not of 
 interest.
